REPORT on Computational Lexicon Working Group on Multilingual Lexicon. EU-WG Meeting, December 1st-2nd 2000, Pisa. UPenn, December 11 2000.


1 REPORT on Computational Lexicon Working Group on Multilingual Lexicon. EU-WG Meeting, December 1st-2nd 2000, Pisa. UPenn, December 11 2000

2 The Multilingual ISLE Lexical Entry (MILE)
General methodological principles (from EAGLES):
1. Basic requirements for the MILE:
- Modular and layered
- Granular
- Allow for underspecification
- ISLE should discover and list (the maximal set of) basic notions to be included in the MILE
- The leading principle for the design of the MILE should be the edited union of existing lexicons / models (redundancy should not be a problem)

3 MILE
3. Objective: definition of MILE, its basic notions, architecture, such that we can write a DTD & have a tool to support it
- discover a methodology of work towards this

4 Modularity in MILE
Modularity in at least three respects:
- in the macrostructure and general architecture of the MILE
- in the microstructure of the MILE
- in the specific microstructure of the MILE word-sense
Some advantages:
- Flexibility of representation
- Easy to customise and update
- Easy integration of existing resources
- High versatility towards different applications

5 A. Modularity in the macrostructure and general architecture of the MILE
1. Meta-information: versioning of the lexicon, languages, updates, status, project, origin, etc. (see e.g. OLIF, GENELEX)
2. Possible architecture(s) of multilingual lexicon(s): interactions of the different modules within the general structure. Issues related to transfer-based, interlingua-based approaches, and hybrid solutions.

6 Modularity in MILE
B. Modularity in the microstructure of the MILE
The MILE could be organized in at least the following modules:
1. Monolingual linguistic representation
2. Collocational information
3. Multilingual apparatus (e.g. transfer conditions and actions)

7 Monolingual Linguistic Representation
It includes the morphosyntactic, syntactic, and semantic information characterizing the MILE in a certain source language. It possibly corresponds to the typology of information contained in existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet (EWN), COMLEX, FrameNet, etc.

8 Monolingual Linguistic Representation: a Provisional List
Morphological layer
- Grammatical category and subcategory
- Gender, number, person, mood
- Inflectional class
- Modifications of the lemma
- Mass/count, 'pluralia tantum'
- …

9 Monolingual Linguistic Representation: a Provisional List
Syntactic layer
- Idiosyncratic behaviour with respect to specific syntactic rules (passivisation, middle, etc.)
- Attributive vs. predicative function, gradability
- List of syntactic positions forming subcategorization frames
- Syntactic constraints and properties of the possible 'slot filler'
- Morphosyntactic and/or lexical features (agreement, auxiliary, prepositions and particles introducing clausal complements)
- Information on control (subject control, object control, etc.) and raising properties
- …

10 Monolingual Linguistic Representation: a Provisional List
Semantic layer
- Characterization of senses through links to an Ontology
- Domain information, gloss
- Argument structure, semantic roles, selectional preferences
- Event type for verbs, to characterize their actionality behaviour
- Link to the syntactic realization of the arguments
- Basic semantic relations between word senses (synonymy / synset, hyponymy, meronymy, etc.)
- Semantic/world-knowledge relations among word senses (such as EWN relations and SIMPLE Qualia Structure)
- Information about regular polysemous alternation
- Information concerning cross-part-of-speech relations
- …

11 Collocational Information
- More or less typical and/or fixed syntactic-semantic patterns
- Typical or idiosyncratic syntactic constructions
- Typical collocates
- Support verb constructions
- Phraseological or multiword constructions
- Compounds (e.g. noun-noun, noun-PP, adjective-noun, etc.)
- Corpus-driven examples of MILE
- …

12 Multilingual Apparatus
Transfer conditions and actions
- possible starting points: OLIF, GENELEX, etc.
- devise possible cases of problematic transfer (cf. e.g. the list of linguistic phenomena circulated)
- identify which conditions must be expressible and which transformation actions are necessary
- select which types of information these conditions must access
- examine the variability in granularity needed when translating into different languages, and the architectural implications of this
- which role for an Interlingua?
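The notion of a transfer condition paired with an action can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the `TransferRule` class, the `complement` feature, and the reduction of an "action" to plain lexical substitution are hypothetical, and the English-to-Italian pair is a deliberately simplified textbook case, not a rule taken from OLIF, GENELEX, or any ISLE document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical representation: features of the source word in its context.
Context = Dict[str, str]


@dataclass
class TransferRule:
    """One source-to-target correspondence guarded by a condition."""
    source_lemma: str
    condition: Callable[[Context], bool]  # when the rule applies
    target_lemma: str                     # action: here only lexical substitution


def transfer(lemma: str, context: Context, rules: List[TransferRule]) -> Optional[str]:
    """Return the target lemma of the first rule whose condition holds."""
    for rule in rules:
        if rule.source_lemma == lemma and rule.condition(context):
            return rule.target_lemma
    return None  # no rule applies: the case is left unresolved


# Simplified illustration: English "know" translated differently into Italian
# depending on the type of its complement.
RULES = [
    TransferRule("know", lambda c: c.get("complement") == "clause", "sapere"),
    TransferRule("know", lambda c: c.get("complement") == "np", "conoscere"),
]

if __name__ == "__main__":
    print(transfer("know", {"complement": "clause"}, RULES))
    print(transfer("know", {"complement": "np"}, RULES))
```

The sketch makes one of the listed questions concrete: the condition has to access a specific type of monolingual information (here a syntactic feature), so the choice of conditions constrains what the monolingual modules must encode.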

13 Modularity in MILE
C. Modularity in the specific microstructure of the MILE word-sense
- Word-senses are the basic units at the multilingual level
- Senses should also have a modular structure
  - Coarse-grained (general purpose) characterisation in terms of prototypical properties, captured by the formal means in (B.1)
  - Fine-grained (domain or text dependent) characterisation mostly in terms of collocational/syntagmatic properties (B.2) (particularly useful for specific tasks, such as WSD and translation)

14 MILE
A. MILE Macrostructure
   - Meta-information
   - Architecture
B. MILE Microstructure
   1. Monolingual
   2. Collocational
   3. Multilingual
C. Word-Sense Microstructure
   1. Coarse-grained
   2. Fine-grained
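As a rough picture of how this modular outline could map onto a data structure, here is a minimal Python sketch. It is not the MILE DTD or any agreed ISLE model: all class names, field names, and the sample identifiers (`bank_1`, `riva_1`) are hypothetical, chosen only to mirror the A/B/C outline. Empty layers stand in for underspecification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Monolingual:
    """B.1: layered monolingual description; any layer may stay empty."""
    morphology: Dict[str, str] = field(default_factory=dict)
    syntax: Dict[str, str] = field(default_factory=dict)
    semantics: Dict[str, str] = field(default_factory=dict)


@dataclass
class WordSense:
    """C: the basic unit at the multilingual level."""
    sense_id: str
    coarse: Monolingual = field(default_factory=Monolingual)  # C.1, via B.1
    collocations: List[str] = field(default_factory=list)     # C.2, via B.2
    # B.3: target language -> ids of corresponding senses (hypothetical ids)
    correspondences: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class MileEntry:
    lemma: str
    language: str
    meta: Dict[str, str] = field(default_factory=dict)  # A: meta-information
    senses: List[WordSense] = field(default_factory=list)


entry = MileEntry(lemma="bank", language="en", meta={"version": "0.1"})
entry.senses.append(
    WordSense(
        sense_id="bank_1",
        coarse=Monolingual(morphology={"pos": "noun"}),  # syntax/semantics left unspecified
        collocations=["river bank"],
        correspondences={"it": ["riva_1"]},
    )
)
```

Keeping each module in its own class means an application that needs only the coarse-grained layer can ignore the collocational and multilingual fields.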

15 Monolingual Linguistic Representation
A strategy:
- consider as the starting point for MILE the edited union of the basic notions represented in the existing syntactic/semantic lexicons (their models)
- evaluate their notions wrt EAGLES recommendations for syntax and semantics
- evaluate their usefulness & adequacy for multilingual tasks
- evaluate integrability of their notions in a unitary MILE
- look for deficient areas
To be decided: should ISLE reach a consensus at the level of the “types” of information only, or also at the level of their “token” values?
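The mechanical part of this strategy, taking the union of notions while remembering which model each one comes from, can be sketched in a few lines of Python. The inventories below are placeholders (`lexicon_A`, `lexicon_B`, `lexicon_C` are not real resources and the notion lists are invented for illustration); the "edited" part of the edited union, i.e. the evaluation steps listed above, remains manual work that this sketch does not attempt.

```python
from collections import defaultdict
from typing import Dict, List, Set


def union_of_notions(lexicons: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each notion to the set of lexicon models that represent it."""
    provenance: Dict[str, Set[str]] = defaultdict(set)
    for name, notions in lexicons.items():
        for notion in notions:
            provenance[notion].add(name)
    return dict(provenance)


def shared_notions(provenance: Dict[str, Set[str]]) -> List[str]:
    """Notions found in more than one model (redundancy is kept, not removed)."""
    return sorted(n for n, sources in provenance.items() if len(sources) > 1)


# Placeholder inventories, invented for this example only.
MODELS = {
    "lexicon_A": {"subcategorization frame", "semantic role"},
    "lexicon_B": {"semantic role", "synonymy"},
    "lexicon_C": {"synonymy", "domain"},
}

provenance = union_of_notions(MODELS)
candidates = sorted(provenance)  # the raw union, to be edited by hand
```

Recording provenance rather than a bare set keeps the information needed for the later evaluation steps, for example checking a notion against the model it originated in.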

16 Collocational Information
Open issues:
- what is relevant
- what can be generalised and formally characterised
- what must be simply listed (but even lists may be partially categorised)
- what type of representation and analysis to be provided of these phenomena (e.g. a Mel'cuk style analysis for support verb constructions, FrameNet style description of syntactic-semantic “constructions”, etc.)

17 Agreed Principles
- MILE incorporates previous recommendations: is the “complete” entry (to be evaluated wrt usefulness & adequacy for multilingual tasks)
- MILE builds on the monolingual entry & expands it (at least) with an additional module where correspondences between languages are defined
We consider 2 broad categories of applications:
- translation
- CLIR (linking module may be simpler)
(label info types wrt application)

18 Paths to discover Basic Notions of MILE
- Clues in dictionaries to decide on target equivalent
- Guidelines for lexicographers
- Clues (to disambiguate/translate) in corpus concordances
- Lexical requirements from various types of transfer conditions and actions in MT systems
- Lexical requirements from interlingua-based systems
Examined guidelines for bilingual dictionaries provided by SA

19 Classification of Basic Notions of MILE
For all the notions:
- notion already in previous work (Eagles/Parole/Simple/EWN/Comlex/Framenet/…)
  - evaluate if the existing specs are adequate
- draw a list of “not yet recommended/adopted” notions:
  - method of work
  - priorities
  - for which applications
  - assign tasks
  - need of further development

20 Organisational Proposal
- Start from available EAGLES recommendations, e.g. as instantiated in Parole/Simple
- adopt as starting point the P/S DTD, to be revised & augmented
  - see Barcelona tool
- Evaluate if we can combine in a “hybrid super-model” the transfer & interlingua approaches

21 Organisational Proposal
The tasks should lead to:
1. Select a list of critical information types that will compose each module of the MILE
2. Start an in-depth analysis of each of these areas aiming at identifying:
  - The most stable solutions adopted in the community
  - Linguistic specifications and criteria
  - Possible representational solutions, their compatibility, etc.
  - An evaluation of their respective weight/importance in a multilingual lexicon (towards a layered approach to recommendations)
  - Identify the open issues and the current boundaries of the state of the art (which cannot be standardised yet)
  - …

22 Information Types
Argument structure
1. How to represent it (e.g. frames, a selection of theta-roles, etc.)
2. Typology of arguments
3. Representational problems
4. Applicative constraints and needs
5. Linking with syntax (how to express it)
6. Open issues
Semantic relations
1. Typology (e.g. hyponymy, meronymy, etc.)
2. Available tests
3. Representational format(s)
4. Applicative constraints and needs
5. Expressive limits
6. Open issues

23 Information Types
Modification relations
1. Types of modifiers
2. Representational issues
3. Open issues
Multiword Expressions
1. Typology
2. How to represent the “internal” structure of MWEs (e.g. Mel’cuk relations, etc.)
3. Encoding criteria
4. Application needs and biases
5. Open issues
Selectional preferences
1. How to represent them (e.g. features, reference to an ontology, word-senses, etc.)
2. Different status of the preferences
3. Criteria to identify them
4. Expressive limits of existing formal resources

24 Information Types
Transfer conditions and actions
1. Identification of categories of transfer phenomena
2. Ranking of hard cases
3. Possible parameterisation wrt language types
4. How to formalise them
5. Types of actions
Ontology
1. Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.)
2. Inheritance
3. Which roles for ontologies in the MILE
4. Representational issues
5. Customisation and development criteria
6. Limits
Collocational Patterns
1. Typology
2. How to represent them
3. Interaction with selectional preferences

25 Organisational Proposal
Highlighted some hot issues & assigned tasks:
- sense indicators (Issco)
- selection preferences (Thurmair)
- argument structure (US?….)
- MWE (Pisa)
- modifiers (Jock)
- semantic relations (Piek?)
- transfer conditions (…)
- collocational patterns (…)
- ontology (…)
- …

26 Organisational Proposal
Ask the Americans to, e.g.:
- evaluate existing EAGLES etc. recommendations wrt usefulness, coverage, adequacy,…
- analyse some of the above info types
- look at other languages (Japanese, Chinese, Korean, …) for transfer conditions
- look at transfer-based MT systems
- look at interlingua MT systems (e.g. Mikrokosmos): additional info types?
- …
Meeting together US & EU, e.g. end of February, beginning of March?

27 DIET Tool
From ISSCO:
- for text annotation (of test suites for semantic annotation)
- to be used for evaluation purposes
- …

28 Survey: List of Received Materials
Columns: Comparison table, Linguistic phenomena
- Collins, Hachette-Oxford: Yes
- Van Dale Lexicons: Yes, No
- FrameNet: Yes, No
- Collins-Robert lexical-semantic db: Yes, No
- PAROLE-Simple: Yes
- EuroWordNet: Yes
- Eurotra: Yes
- OLIF: No
- Genelex: No
- EDR: No

29 Other Surveys Expected
Surveys from US?
- Microsoft
- IBM
- CMU
- NMSU
- ISI
- Systran
- Logos

