XMELLT Cross-lingual Multi-word Expression Lexicons for Language Technology Multilingual Information Access and Management International Research Co-operation.


1 XMELLT: Cross-lingual Multi-word Expression Lexicons for Language Technology
Multilingual Information Access and Management International Research Co-operation
Nancy Ide, Department of Computer Science, Vassar College

2 XMELLT Participants
- Department of Computer Science, Vassar College
- International Computer Science Institute, University of California, Berkeley
- Department of Computer Science, New York University
- Computing Research Laboratory, New Mexico State University

3 XMELLT Framework
- Planning project
  - one-year time frame
- Originally submitted as a joint NSF-EU project with additional European partners
  - Istituto di Linguistica Computazionale, CNR, Pisa
  - Institut für Maschinelle Sprachverarbeitung, Stuttgart
  - LexiQuest, Paris

4 XMELLT Overall goal
- define a core international infrastructure to support the creation of a multi-lingual multi-word expression lexicon incorporating both morpho-syntactic and semantic information

5 XMELLT Specific aims
- determine the type and dimensions of information to serve the needs of critical NLP applications
- specify an overall architecture for a joint software and lingware development project

6 XMELLT Aims...
- Explore the possibilities for recognizing and acquiring multi-word lexical units from corpora by means of partial parsing, statistics, etc.
- Outline a collaborative project to acquire and represent multi-word lexical entries for multiple languages
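As an illustration of the statistical route mentioned on this slide, here is a minimal sketch that ranks adjacent word pairs by pointwise mutual information (PMI). PMI is one common association measure for finding multi-word candidates; the slides do not say which measures the project would use, and the function name `pmi_bigrams`, the `min_count` threshold, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only; not part of any XMELLT tool.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus (invented for this sketch).
corpus = (
    "the prime minister met the press . "
    "the prime minister left the room . "
    "the press left ."
).split()

candidates = pmi_bigrams(corpus)
for pair, score in candidates:
    print(pair, round(score, 2))
```

On this toy input the pair ("prime", "minister") ranks first, because both words occur almost only together, while pairs that involve the frequent word "the" score lower. A real acquisition pipeline would need a much larger corpus and would likely combine such scores with partial parsing, as the slide suggests.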

7 XMELLT Motivation
- Multi-word constructions are extremely frequent in language
  - ~30% of the lexical stock
- Existing resources do not adequately treat multi-word expressions

8 XMELLT Limitations
- constructed for particular system or application
  - incorporate tailored information (e.g., primarily syntax with little semantics)
  - not reusable
- most devoted to a single language and/or approach

9 XMELLT Limitations...
- not flexible or expandable to multiple languages
  - MT systems' lexicons are typically little more than "translation memories"
  - No interface among single-word entries, multi-word entries, syntax, and semantics

10 XMELLT Approach
- Broad view of multi-word expressions
  - idioms, compounds, collocations, co-occurrence patterns
- focus on linking of individual language lexicons
  - individual words and multi-word expressions
  - different types of multi-word expressions
    - e.g., English noun-noun vs Romance noun-PP

11 XMELLT Considerations
- internal variation
- sub-categorization properties
- idiosyncratic constraints on inflection
- meaning (non-)compositionality
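To make these considerations concrete, the sketch below shows one way a lexicon entry could record them, together with cross-lingual links. It is only an illustration under stated assumptions: the class `MWEEntry`, all of its field names, and the example values are hypothetical and do not reproduce the XMELLT encoding or any EAGLES, PAROLE/SIMPLE, or NOMLEX schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MWEEntry:
    """Hypothetical multi-word expression entry (illustration only)."""

    lemma: str                      # citation form of the expression
    language: str                   # language code, e.g. "en"
    mwe_type: str                   # e.g. "idiom", "compound", "collocation"
    components: List[str]           # the individual words, in order
    allows_internal_variation: bool  # can modifiers or other words intervene?
    subcat_frame: Optional[str] = None  # sub-categorization properties, if any
    inflection_constraints: List[str] = field(default_factory=list)
    compositional: bool = True      # is the meaning built from the parts?
    # Links to corresponding entries in other language lexicons.
    translations: Dict[str, str] = field(default_factory=dict)


# Example: an English noun-noun compound linked to Romance noun-PP
# equivalents (example values chosen for this sketch).
coffee_cup = MWEEntry(
    lemma="coffee cup",
    language="en",
    mwe_type="compound",
    components=["coffee", "cup"],
    allows_internal_variation=False,
    inflection_constraints=["only the head noun 'cup' takes the plural"],
    compositional=True,
    translations={"it": "tazza da caffè", "fr": "tasse à café"},
)

print(coffee_cup.lemma, "->", coffee_cup.translations)
```

Keeping the cross-lingual links on the entry itself is the simplest option for a sketch; a real multilingual lexicon would more plausibly link entries through a shared identifier or an interlingual layer, so that each language lexicon can be maintained independently.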

12 XMELLT Encoding Model
- Compatible and integrated with existing and de facto standards
  - e.g., EAGLES, PAROLE/SIMPLE, NOMLEX

13 XMELLT Activities
- Assessment of existing lexical resources for multi-word expressions
  - Delivery of survey

14 XMELLT Activities...
- Creation of a small set of sample entries
  - add lexical information on support verb constructions to 50 nouns drawn from NOMLEX for English, Italian, German, and French
  - create lexical entries for 50 N-N English constructs from the PAROLE/SIMPLE lexicons and corresponding constructs in Italian, German, and French

15 XMELLT Activities...
- Develop preliminary specifications for structuring and encoding multi-lingual, multi-word expression lexicons
  - required linguistic information
  - harmonized data architecture and encoding format

16 XMELLT Activities...
- Exploration of techniques for automatic acquisition
  - Months 1-6: Survey of acquisition techniques, typology of MWE
  - Months 7-12: Design of architecture for MWE acquisition

17 XMELLT Project information
- Start date: June (?)
- Web site:
- Contact: Nancy Ide (PI), Department of Computer Science, Vassar College
  ide@cs.vassar.edu
  http://www.cs.vassar.edu/~ide/XMELLT.html

