XMELLT: Cross-lingual Multi-word Expression Lexicons for Language Technology
Multilingual Information Access and Management International Research Co-operation
Nancy Ide, Department of Computer Science, Vassar College
Participants
- Department of Computer Science, Vassar College
- International Computer Science Institute, University of California, Berkeley
- Department of Computer Science, New York University
- Computing Research Laboratory, New Mexico State University
Framework
- Planning project
  - One-year time frame
- Originally submitted as a joint NSF-EU project with additional European partners:
  - Istituto di Linguistica Computazionale, CNR, Pisa
  - Institut für Maschinelle Sprachverarbeitung, Stuttgart
  - LexiQuest, Paris
Overall goal
- Define a core international infrastructure to support the creation of a multilingual multi-word expression lexicon incorporating both morpho-syntactic and semantic information
Specific aims
- Determine the type and dimensions of information needed to serve critical NLP applications
- Specify an overall architecture for a joint software and lingware development project
Aims (continued)
- Explore the possibilities for recognizing and acquiring multi-word lexical units from corpora by means of partial parsing, statistics, etc.
- Outline a collaborative project to acquire and represent multi-word lexical entries for multiple languages
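A common statistical starting point for recognizing multi-word candidates in a corpus is to score adjacent word pairs with an association measure. The sketch below uses pointwise mutual information (PMI). PMI is chosen only for illustration and is not a technique the project has committed to. The function name `pmi_bigrams` and the `min_count` threshold are assumptions. The sketch expects an already tokenized corpus and does no partial parsing.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information.

    PMI(w1, w2) = log2(P(w1 w2) / (P(w1) * P(w2))), with probabilities
    estimated from raw counts. Pairs seen fewer than min_count times are
    skipped, because PMI is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    # Highest-scoring pairs first: these are the candidate multi-word units.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = ("she paid by credit card . the credit card was new . "
          "the card was lost . the bill was paid .").split()
for pair, score in pmi_bigrams(tokens):
    print(pair, round(score, 2))  # ("credit", "card") ranks first
```

PMI tends to overrate rare pairs, which is why the count threshold is applied. A realistic extractor would combine several association measures with linguistic filters such as part-of-speech patterns.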
Motivation
- Multi-word constructions are extremely frequent in language
  - ~30% of the lexical stock
- Existing resources do not adequately treat multi-word expressions
Limitations of existing resources
- Constructed for a particular system or application
  - Incorporate tailored information (e.g., primarily syntax with little semantics)
  - Not reusable
- Most are devoted to a single language and/or approach
Limitations (continued)
- Not flexible or expandable to multiple languages
  - MT systems' lexicons are typically little more than "translation memories"
  - No interface among single-word entries, multi-word entries, syntax, and semantics
XMELLT Approach
- Broad view of multi-word expressions
  - Idioms, compounds, collocations, co-occurrence patterns
- Focus on linking the individual language lexicons, across:
  - Individual words and multi-word expressions
  - Different types of multi-word expressions
    - e.g., English noun-noun vs. Romance noun-PP
Considerations
- Internal variation
- Subcategorization properties
- Idiosyncratic constraints on inflection
- Meaning (non-)compositionality
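The considerations above map fairly directly onto fields of a lexical record. Below is a minimal sketch using a Python dataclass. The class `MWEEntry`, its field names, and the `accepts` method are hypothetical and do not reflect any format the project has specified. The `accepts` method checks a lemmatized token span against two of the constraints. The first is internal variation: whether other tokens may intervene. The second is frozen inflection: components whose surface form may not change.

```python
from dataclasses import dataclass, field


@dataclass
class MWEEntry:
    """Hypothetical lexical record for one multi-word expression."""
    lemma: str                        # citation form, e.g. "kick the bucket"
    language: str                     # ISO 639-1 code, e.g. "en"
    components: list[str]             # component lemmas, in canonical order
    head_index: int = 0               # which component carries inflection
    frozen: set[int] = field(default_factory=set)    # components whose form may not change
    allows_insertion: bool = False    # internal variation: may other tokens intervene?
    subcat: list[str] = field(default_factory=list)  # complements, e.g. ["NP"]
    compositional: bool = True        # is the meaning predictable from the parts?

    def accepts(self, tokens: list[str], lemmas: list[str]) -> bool:
        """Check a lemmatized token span against this entry's constraints."""
        if not self.allows_insertion and len(tokens) != len(self.components):
            return False
        matched = 0
        for token, lemma in zip(tokens, lemmas):
            if matched < len(self.components) and lemma == self.components[matched]:
                # A frozen component must appear exactly in its citation form.
                if matched in self.frozen and token != self.components[matched]:
                    return False
                matched += 1
            elif not self.allows_insertion:
                return False  # an unexpected token inside a fixed expression
            # otherwise the token is an allowed insertion and is skipped
        return matched == len(self.components)


kick = MWEEntry("kick the bucket", "en", ["kick", "the", "bucket"],
                frozen={1, 2}, compositional=False)
print(kick.accepts(["kicked", "the", "bucket"], ["kick", "the", "bucket"]))   # True
print(kick.accepts(["kicked", "the", "buckets"], ["kick", "the", "bucket"]))  # False
```

In this sketch the head ("kick") may inflect freely, while the frozen noun phrase may not. An entry for a support verb construction would instead set `allows_insertion=True` so that modifiers and determiners can intervene.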
Encoding Model
- Compatible and integrated with existing and de facto standards
  - e.g., EAGLES, PAROLE/SIMPLE, NOMLEX
Activities
- Assessment of existing lexical resources for multi-word expressions
  - Delivery of a survey
Activities (continued)
- Creation of a small set of sample entries
  - Add lexical information on support verb constructions to 50 nouns drawn from NOMLEX, for English, Italian, German, and French
  - Create lexical entries for 50 English N-N constructs from the PAROLE/SIMPLE lexicons, and for the corresponding constructs in Italian, German, and French
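The sketch below makes the two kinds of sample entries concrete. It pairs one support verb construction, built on a NOMLEX-style nominalization (decision), and two English N-N compounds with their Italian, German, and French counterparts. The data layout, the `SAMPLE` table, and both helper functions are illustrative assumptions, not entries produced by the project. What it shows is that cross-lingual links must tolerate different structural patterns: English N-N, Romance N-P-N, and German closed compounds.

```python
# Hypothetical sample data, for illustration only: each key names one linked
# entry; each language maps to (citation form, pattern of the citation form).
SAMPLE = {
    "decision/SVC": {
        "en": ("make a decision", "V Det N"),
        "it": ("prendere una decisione", "V Det N"),
        "de": ("eine Entscheidung treffen", "Det N V"),
        "fr": ("prendre une décision", "V Det N"),
    },
    "credit card/NN": {
        "en": ("credit card", "N N"),
        "it": ("carta di credito", "N P N"),
        "de": ("Kreditkarte", "N+N closed compound"),
        "fr": ("carte de crédit", "N P N"),
    },
    "coffee cup/NN": {
        "en": ("coffee cup", "N N"),
        "it": ("tazza da caffè", "N P N"),
        "de": ("Kaffeetasse", "N+N closed compound"),
        "fr": ("tasse à café", "N P N"),
    },
}


def equivalents(key: str, source_lang: str) -> dict[str, str]:
    """Return the forms linked to one entry in every language except the source."""
    return {lang: form for lang, (form, _) in SAMPLE[key].items() if lang != source_lang}


def pattern_mismatches(key: str, source_lang: str = "en") -> list[str]:
    """List the languages whose structural pattern differs from the source's."""
    source_pattern = SAMPLE[key][source_lang][1]
    return sorted(lang for lang, (_, pattern) in SAMPLE[key].items()
                  if pattern != source_pattern)


print(equivalents("coffee cup/NN", "en"))
print(pattern_mismatches("credit card/NN"))  # ['de', 'fr', 'it']
print(pattern_mismatches("decision/SVC"))    # ['de']
```

The German pattern for the support verb construction reflects infinitival citation order only. In a main clause the verb moves to second position, which is exactly the kind of variation the entries would need to record.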
Activities (continued)
- Develop preliminary specifications for structuring and encoding multilingual, multi-word expression lexicons
  - Required linguistic information
  - Harmonized data architecture and encoding format
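As one possible illustration of a harmonized encoding, the sketch below uses Python's standard `xml.etree.ElementTree` to serialize a single multilingual entry to XML. Each language's form is grouped under a shared entry identifier. The element and attribute names (`mweEntry`, `variant`, `lang`, `pattern`) and the function `entry_to_xml` are hypothetical. An actual specification would follow the standards named in the encoding model rather than this ad hoc markup.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(entry_id: str, forms: dict[str, tuple[str, str]]) -> str:
    """Serialize one multilingual entry; all element and attribute names are hypothetical."""
    root = ET.Element("mweEntry", id=entry_id)
    # One <variant> per language, in a stable (alphabetical) order.
    for lang, (form, pattern) in sorted(forms.items()):
        variant = ET.SubElement(root, "variant", lang=lang, pattern=pattern)
        variant.text = form
    return ET.tostring(root, encoding="unicode")


print(entry_to_xml("credit-card", {
    "en": ("credit card", "N N"),
    "fr": ("carte de crédit", "N P N"),
}))
```

Keying every language variant to one identifier keeps the per-language forms linked in a single record. The structural pattern is recorded as data rather than assumed to be the same across languages.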
Activities (continued)
- Exploration of techniques for automatic acquisition
  - Months 1-6: survey of acquisition techniques; typology of multi-word expressions (MWEs)
  - Months 7-12: design of an architecture for MWE acquisition
Project information
- Start date: June (?)
- Web site:
- Contact: Nancy Ide (PI), Department of Computer Science, Vassar College