XMELLT Cross-lingual Multi-word Expression Lexicons for Language Technology Multilingual Information Access and Management International Research Co-operation.

Presentation transcript:

XMELLT: Cross-lingual Multi-word Expression Lexicons for Language Technology
Multilingual Information Access and Management International Research Co-operation
Nancy Ide
Department of Computer Science, Vassar College

Participants
- Department of Computer Science, Vassar College
- International Computer Science Institute, University of California, Berkeley
- Department of Computer Science, New York University
- Computing Research Laboratory, New Mexico State University

Framework
- Planning project
  - One-year time frame
- Originally submitted as a joint NSF-EU project with additional European partners
  - Istituto di Linguistica Computazionale, CNR, Pisa
  - Institut für Maschinelle Sprachverarbeitung, Stuttgart
  - LexiQuest, Paris

Overall goal
- Define a core international infrastructure to support the creation of a multi-lingual multi-word expression lexicon incorporating both morpho-syntactic and semantic information

Specific aims
- Determine the type and dimensions of information to serve the needs of critical NLP applications
- Specify an overall architecture for a joint software and lingware development project

Aims (continued)
- Explore the possibilities for recognizing and acquiring multi-word lexical units from corpora by means of partial parsing, statistics, etc.
- Outline a collaborative project to acquire and represent multi-word lexical entries for multiple languages
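
The slides name statistics as one route to recognizing multi-word units in corpora but do not say which measure was intended. As a minimal sketch, assuming pointwise mutual information (PMI) over adjacent word pairs as the association measure, candidate units could be ranked as follows. The function name `pmi_bigrams`, the `min_count` threshold, and the toy corpus are illustrative assumptions, not part of the project.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI is an assumed choice of statistic; the slides only say "statistics".
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus, tokenized by whitespace
toy_corpus = (
    "she will take a walk . he will take a shower . "
    "they take a walk in the park . a dog sat in the park ."
).split()

candidates = pmi_bigrams(toy_corpus)
for pair, score in candidates:
    print(pair, round(score, 2))
```

On real corpora a measure like this would normally be combined with the partial parsing mentioned above, since raw adjacency also surfaces frequent function-word pairs that are not lexical units.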

Motivation
- Multi-word constructions are extremely frequent in language
  - ~30% of the lexical stock
- Existing resources do not adequately treat multi-word expressions

Limitations
- Constructed for a particular system or application
  - Incorporate tailored information (e.g., primarily syntax with little semantics)
  - Not reusable
- Most devoted to a single language and/or approach

Limitations (continued)
- Not flexible or expandable to multiple languages
  - MT systems' lexicons are typically little more than "translation memories"
  - No interface among single-word entries, multi-word entries, syntax, and semantics

XMELLT Approach
- Broad view of multi-word expressions
  - Idioms, compounds, collocations, co-occurrence patterns
- Focus on linking of individual language lexicons
  - Individual words and multi-word expressions
  - Different types of multi-word expressions
    - e.g., English noun-noun vs Romance noun-PP
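
To illustrate what linking individual language lexicons might look like, here is a minimal sketch in which an English noun-noun compound is linked to Romance noun-PP equivalents. The class names `LexEntry` and `Link`, the `pattern` labels, and the helper `equivalents` are hypothetical and chosen for illustration; the slides do not specify a linking format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexEntry:
    lang: str     # language code of the monolingual lexicon
    lemma: str    # single word or multi-word expression
    pattern: str  # illustrative structural label, e.g. "N-N" or "N-PP"


@dataclass(frozen=True)
class Link:
    source: LexEntry
    target: LexEntry


def equivalents(entry, links, lang):
    """Return entries in `lang` linked to `entry`, in either direction."""
    found = []
    for link in links:
        if link.source == entry and link.target.lang == lang:
            found.append(link.target)
        elif link.target == entry and link.source.lang == lang:
            found.append(link.source)
    return found


# English noun-noun compound vs. Romance noun-PP constructions
en = LexEntry("en", "apple juice", "N-N")
it = LexEntry("it", "succo di mela", "N-PP")
fr = LexEntry("fr", "jus de pomme", "N-PP")
links = [Link(en, it), Link(en, fr)]

print(equivalents(en, links, "it"))
```

Keeping the links separate from the entries means each monolingual lexicon can be maintained on its own, and the structural mismatch (N-N vs N-PP) is recorded on the entries instead of being hidden in a translation pair.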

Considerations
- Internal variation
- Sub-categorization properties
- Idiosyncratic constraints on inflection
- Meaning (non-)compositionality
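
The four considerations above can be read as fields that a multi-word lexical entry would need to carry. The following is a sketch under that assumption only: the class `MWEEntry` and all of its field names are hypothetical and do not reproduce the XMELLT encoding, which the slides leave unspecified.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MWEEntry:
    """Hypothetical record covering the four considerations listed above."""
    components: List[str]
    language: str
    # Internal variation: can material be inserted or components reordered?
    allows_internal_variation: bool = False
    # Sub-categorization: arguments the expression takes as a whole
    subcat_frame: List[str] = field(default_factory=list)
    # Idiosyncratic inflection constraints, keyed by component
    inflection_constraints: Dict[str, str] = field(default_factory=dict)
    # Whether the meaning follows from the meanings of the parts
    compositional: bool = True

    def constrained_components(self):
        """Components whose inflection is restricted in this expression."""
        return [c for c in self.components if c in self.inflection_constraints]


# The English idiom "kick the bucket": the verb inflects freely, the noun
# stays singular, and the meaning is not compositional.
entry = MWEEntry(
    components=["kick", "the", "bucket"],
    language="en",
    allows_internal_variation=False,
    subcat_frame=["subject"],
    inflection_constraints={"bucket": "singular only"},
    compositional=False,
)

print(entry.constrained_components())
```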

Encoding Model
- Compatible and integrated with existing and de facto standards
  - e.g., EAGLES, PAROLE/SIMPLE, NOMLEX

Activities
- Assessment of existing lexical resources for multi-word expressions
  - Delivery of survey

Activities (continued)
- Creation of a small set of sample entries
  - Add lexical information on support verb constructions to 50 nouns drawn from NOMLEX for English, Italian, German, and French
  - Create lexical entries for 50 N-N English constructs from the PAROLE/SIMPLE lexicons and corresponding constructs in Italian, German, and French

Activities (continued)
- Develop preliminary specifications for structuring and encoding multi-lingual, multi-word expression lexicons
  - Required linguistic information
  - Harmonized data architecture and encoding format

Activities (continued)
- Exploration of techniques for automatic acquisition
  - Months 1-6: Survey of acquisition techniques, typology of MWE
  - Months 7-12: Design of architecture for MWE acquisition

Project information
- Start date: June (?)
- Web site:
- Contact: Nancy Ide (PI), Department of Computer Science, Vassar College