REPORT on Computational Lexicon Working Group on Multilingual Lexicon. EU-WG Meeting, December 1st-2nd 2000, Pisa. UPenn, December 11 2000.

The Multilingual ISLE Lexical Entry (MILE)
General methodological principles (from EAGLES):
1. Basic requirements for the MILE:
- Modular and layered
- Granular
- Allow for underspecification
- ISLE should discover and list (the maximal set of) basic notions to be included in the MILE
- The leading principle for the design of the MILE should be the edited union of existing lexicons / models (redundancy should not be a problem)

MILE
3. Objective: definition of MILE, its basic notions, architecture,
- such that we can write a DTD & have a tool to support it
- discover a methodology of work towards this

Modularity in MILE
Modularity under at least three respects:
- in the macrostructure and general architecture of the MILE
- in the microstructure of the MILE
- in the specific microstructure of the MILE word-sense
Some advantages:
- Flexibility of representation
- Easy to customise and update
- Easy integration of existing resources
- High versatility towards different applications

A. Modularity in the macrostructure and general architecture of the MILE
1. Meta-information: versioning of the lexicon, languages, updates, status, project, origin, etc. (see e.g. OLIF, GENELEX)
2. Possible architecture(s) of multilingual lexicon(s): interactions of the different modules within the general structure. Issues related to transfer-based, interlingua-based approaches, and hybrid solutions.

B. Modularity in the microstructure of the MILE
The MILE could be organized in at least the following modules:
1. Monolingual linguistic representation
2. Collocational information
3. Multilingual apparatus (e.g. transfer conditions and actions)

Monolingual Linguistic Representation
It includes the morphosyntactic, syntactic, and semantic information characterizing the MILE in a certain source language. It possibly corresponds to the typology of information contained in existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet (EWN), COMLEX, FrameNet, etc.

Monolingual Linguistic Representation: a Provisional List
Morphological layer
- Grammatical category and subcategory
- Gender, number, person, mood
- Inflectional class
- Modifications of the lemma
- Mass/count, 'pluralia tantum'
- …

Monolingual Linguistic Representation: a Provisional List
Syntactic layer
- Idiosyncratic behaviour with respect to specific syntactic rules (passivisation, middle, etc.)
- Attributive vs. predicative function, gradability
- List of syntactic positions forming subcategorization frames
- Syntactic constraints and properties of the possible 'slot filler'
- Morphosyntactic and/or lexical features (agreement, auxiliary, prepositions and particles introducing clausal complements)
- Information on control (subject control, object control, etc.) and raising properties
- …

Monolingual Linguistic Representation: a Provisional List
Semantic layer
- Characterization of senses through links to an Ontology
- Domain information, gloss
- Argument structure, semantic roles, selectional preferences
- Event type for verbs, to characterize their actionality behaviour
- Link to the syntactic realization of the arguments
- Basic semantic relations between word senses (synonymy / synset, hyponymy, meronymy, etc.)
- Semantic/world-knowledge relations among word senses (such as EWN relations and SIMPLE Qualia Structure)
- Information about regular polysemous alternation
- Information concerning cross-part-of-speech relations
- …

Collocational Information
More or less typical and/or fixed syntactic-semantic patterns
- Typical or idiosyncratic syntactic constructions
- Typical collocates
- Support verb constructions
- Phraseological or multiword constructions
- Compounds (e.g. noun-noun, noun-PP, adjective-noun, etc.)
- Corpus-driven examples of MILE
- …

Multilingual Apparatus
Transfer conditions and actions
- possible starting points: OLIF, GENELEX, etc.
- devise possible cases of problematic transfer (cf. e.g. the list of linguistic phenomena circulated)
- identify which conditions must be expressible and which transformation actions are necessary
- select which types of information these conditions must access
- examine the variability in granularity needed when translating in different languages, and the architectural implications of this
- which role for an Interlingua?
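The idea of a transfer condition paired with a transformation action can be sketched in code. The following is only an illustration under stated assumptions, not a format proposed by the working group: the class `TransferRule`, the function `apply_rules`, the feature name `object_type` and the output key `target_lemma` are all hypothetical. The English-to-Italian example (know: sapere vs. conoscere) is a textbook case chosen for illustration and is not taken from the list of linguistic phenomena circulated at the meeting.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical feature bundle describing the source-language context.
Features = Dict[str, str]


@dataclass
class TransferRule:
    """One transfer rule: a condition on the source context plus an action."""
    source_lemma: str
    condition: Callable[[Features], bool]   # which contexts the rule covers
    action: Callable[[Features], Features]  # what the rule produces


def apply_rules(lemma: str, features: Features,
                rules: List[TransferRule]) -> Optional[Features]:
    """Return the result of the first rule whose condition holds, else None."""
    for rule in rules:
        if rule.source_lemma == lemma and rule.condition(features):
            return rule.action(features)
    return None


# Illustrative rules: "know" + clausal object -> "sapere",
# "know" + nominal object -> "conoscere".
RULES = [
    TransferRule("know",
                 lambda f: f.get("object_type") == "clause",
                 lambda f: {**f, "target_lemma": "sapere"}),
    TransferRule("know",
                 lambda f: f.get("object_type") == "np",
                 lambda f: {**f, "target_lemma": "conoscere"}),
]

result = apply_rules("know", {"object_type": "clause"}, RULES)
```

The sketch makes one of the questions above concrete: the condition has to access a specific type of information (here a syntactic property of the object), so that information type must be available in the monolingual part of the entry.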

C. Modularity in the specific microstructure of the MILE word-sense
- Word-senses are the basic units at the multilingual level
- Senses should also have a modular structure
  - Coarse-grained (general purpose) characterisation in terms of prototypical properties, captured by the formal means in (B.1)
  - Fine-grained (domain or text dependent) characterisation mostly in terms of collocational/syntagmatic properties (B.2) (particularly useful for specific tasks, such as WSD and translation)

MILE
A. MILE Macrostructure
- Meta-information
- Architecture
B. MILE Microstructure
1. Monolingual
2. Collocational
3. Multilingual
C. Word-Sense Microstructure
1. Coarse-grained
2. Fine-grained
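The layered outline above can be mirrored by a small data model. This is a minimal sketch assuming Python dataclasses as the notation; the report itself aims at a DTD, and every class and field name here (`MileLexicon`, `MileEntry`, `WordSense`, `Correspondence`, `MetaInformation`) is hypothetical. The sample entries are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetaInformation:
    """A. Macrostructure: meta-information about the lexicon."""
    version: str
    languages: List[str]
    status: str = "draft"


@dataclass
class WordSense:
    """C. Word-sense microstructure."""
    sense_id: str
    # C.1 coarse-grained: prototypical properties (formal means of B.1)
    coarse_grained: Dict[str, str] = field(default_factory=dict)
    # C.2 fine-grained: collocational/syntagmatic properties (B.2)
    fine_grained: List[str] = field(default_factory=list)


@dataclass
class MileEntry:
    """B. Microstructure, modules 1 (monolingual) and 2 (collocational)."""
    lemma: str
    language: str
    morphology: Dict[str, str] = field(default_factory=dict)
    syntax: Dict[str, str] = field(default_factory=dict)
    senses: List[WordSense] = field(default_factory=list)
    collocations: List[str] = field(default_factory=list)


@dataclass
class Correspondence:
    """B. Microstructure, module 3: a link between two word senses."""
    source_sense: str
    target_sense: str


@dataclass
class MileLexicon:
    meta: MetaInformation
    entries: List[MileEntry] = field(default_factory=list)
    correspondences: List[Correspondence] = field(default_factory=list)


# Invented sample data: one English and one Italian entry linked by sense.
bank_en = MileEntry(
    lemma="bank", language="en",
    morphology={"category": "noun", "number": "singular"},
    senses=[WordSense("bank_1", {"domain": "finance"}, ["open a bank account"])],
)
banca_it = MileEntry(
    lemma="banca", language="it",
    morphology={"category": "noun", "gender": "feminine"},
    senses=[WordSense("banca_1", {"domain": "finance"})],
)
lexicon = MileLexicon(
    meta=MetaInformation(version="0.1", languages=["en", "it"]),
    entries=[bank_en, banca_it],
    correspondences=[Correspondence("bank_1", "banca_1")],
)
```

Keeping the correspondences outside the monolingual entries reflects the modular design: the monolingual modules stay usable on their own, and the multilingual module links word senses rather than lemmas.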

Monolingual Linguistic Representation
A strategy:
- consider as the starting point for MILE the edited union of the basic notions represented in the existing syntactic/semantic lexicons (their models)
- evaluate their notions wrt EAGLES recommendations for syntax and semantics
- evaluate their usefulness & adequacy for multilingual tasks
- evaluate integrability of their notions in a unitary MILE
- look for deficient areas.
To be decided: should ISLE reach a consensus at the level of the “types” of information only, or also at the level of their “token” values?
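The "edited union" step and the search for deficient areas can be illustrated with plain set operations. This is a rough sketch only: the function names `notion_union` and `deficient_areas` are hypothetical, and the assignment of notions to lexicons below is a simplified illustration, not the outcome of the survey described in this report.

```python
from typing import Dict, Set


def notion_union(lexicon_notions: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each basic notion to the set of lexicon models that represent it."""
    union: Dict[str, Set[str]] = {}
    for lexicon, notions in lexicon_notions.items():
        for notion in notions:
            union.setdefault(notion, set()).add(lexicon)
    return union


def deficient_areas(union: Dict[str, Set[str]], required: Set[str]) -> Set[str]:
    """Notions needed for multilingual tasks that no existing lexicon covers."""
    return required - set(union)


# Simplified, illustrative inventory (an assumption, not survey data).
inventory = {
    "PAROLE-SIMPLE": {"subcategorization frames", "qualia structure"},
    "EuroWordNet": {"semantic relations"},
    "COMLEX": {"subcategorization frames"},
    "FrameNet": {"semantic roles"},
}

union = notion_union(inventory)
missing = deficient_areas(union, {"semantic roles", "transfer conditions"})
```

Recording which models support each notion keeps redundancy visible instead of hiding it, which fits the principle that redundancy should not be a problem; the editing step, deciding which overlapping notions to merge, remains a manual task in this sketch.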

Collocational Information
Open issues:
- what is relevant
- what can be generalised and formally characterised
- what must be simply listed (but even lists may be partially categorised)
- what type of representation and analysis to be provided of these phenomena (e.g. a Mel'čuk style analysis for support verb constructions, FrameNet style description of syntactic-semantic “constructions”, etc.)

Agreed Principles
- MILE incorporates previous recommendations: it is the “complete” entry (to be evaluated wrt usefulness & adequacy for multilingual tasks)
- MILE builds on the monolingual entry & expands it (at least) with an additional module where correspondences between languages are defined
We consider 2 broad categories of applications (label info types wrt application):
- translation
- CLIR (linking module may be simpler)

Paths to discover Basic Notions of MILE
- Clues in dictionaries to decide on target equivalent
- Guidelines for lexicographers
- Clues (to disambiguate/translate) in corpus concordances
- Lexical requirements from various types of transfer conditions and actions in MT systems
- Lexical requirements from interlingua-based systems
Examined guidelines for bilingual dictionaries provided by SA

Classification of Basic Notions of MILE
For all the notions:
- notion already in previous work (Eagles / Parole / Simple / EWN / Comlex / FrameNet / …)
  - evaluate if the existing specs are adequate
- draw a list of “not yet recommended/adopted” notions:
  - method of work
  - priorities
  - for which applications
  - assign tasks
  - need of further development

Organisational Proposal
- Start from available EAGLES recommendations, e.g. as instantiated in Parole/Simple (P/S)
- adopt as starting point the P/S DTD, to be revised & augmented
  - see Barcelona tool
- Evaluate if we can combine in a “hybrid super-model” the transfer & interlingua approaches

Organisational Proposal
The tasks should lead to:
1. Select a list of critical information types that will compose each module of the MILE
2. Start an in-depth analysis of each of these areas aiming at identifying:
- The most stable solutions adopted in the community
- Linguistic specifications and criteria
- Possible representational solutions, their compatibility, etc.
- An evaluation of their respective weight/importance in a multilingual lexicon (towards a layered approach to recommendations)
- Identify the open issues and the current boundaries of the state of the art (which cannot be standardised yet)
- …

Information Types

Argument structure
1. How to represent it (e.g. frames, a selection of theta-roles, etc.)
2. Typology of arguments
3. Representational problems
4. Applicative constraints and needs
5. Linking with syntax (how to express it)
6. Open issues

Semantic relations
1. Typology (e.g. hyponymy, meronymy, etc.)
2. Available tests
3. Representational format(s)
4. Applicative constraints and needs
5. Expressive limits
6. Open issues

Information Types

Modification relations
1. Types of modifiers
2. Representational issues
3. Open issues

Multiword Expressions
1. Typology
2. How to represent the “internal” structure of MWEs (e.g. Mel'čuk relations, etc.)
3. Encoding criteria
4. Application needs and biases
5. Open issues

Selectional preferences
1. How to represent them (e.g. features, reference to an ontology, word-senses, etc.)
2. Different status of the preferences
3. Criteria to identify them
4. Expressive limits of existing formal resources

Information Types

Transfer conditions and actions
1. Identification of categories of transfer phenomena
2. Ranking of hard cases
3. Possible parameterisation wrt language types
4. How to formalise them
5. Types of actions

Ontology
1. Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.)
2. Inheritance
3. Which roles for ontologies in the MILE
4. Representational issues
5. Customisation and development criteria
6. Limits

Collocational Patterns
1. Typology
2. How to represent them
3. Interaction with selectional preferences

Organisational Proposal
Highlighted some hot issues & assigned tasks:
- sense indicators (Issco)
- selection preferences (Thurmair)
- argument structure (US?….)
- MWE (Pisa)
- modifiers (Jock)
- semantic relations (Piek?)
- transfer conditions (…)
- collocational patterns (…)
- ontology (…)
- ….

Organisational Proposal
Ask the Americans to, e.g.:
- evaluate existing EAGLES etc. recommendations wrt usefulness, coverage, adequacy, …
- analyse some of the above info types
- look at other languages (Japanese, Chinese, Korean, …) for transfer conditions
- look at transfer-based MT systems
- look at interlingua MT systems (e.g. Mikrokosmos): additional info types?
- …
Joint US & EU meeting, e.g. end February, beg. March?

DIET Tool
From ISSCO:
- for text annotation (of test suites for semantic annotation)
- to be used for evaluation purposes
- …

Survey: List of Received Materials
(columns: Comparison table / Linguistic phenomena)
- Collins, Hachette-Oxford: Yes
- Van Dale Lexicons: Yes / No
- FrameNet: Yes / No
- Collins-Robert lexical-semantic db: Yes / No
- PAROLE-Simple: Yes
- EuroWordNet: Yes
- Eurotra: Yes
- OLIF: No
- Genelex: No
- EDR: No

Other Surveys Expected
Surveys from US?
- Microsoft
- IBM
- CMU
- NMSU
- ISI
- Systran
- Logos