
1 Pisa, September 2004
Infrastructural Language Resources & Standards for Multilingual Computational Lexicons
Nicoletta Calzolari … with many others
Istituto di Linguistica Computazionale - CNR - Pisa
glottolo@ilc.cnr.it

2 The ENABLER Mission
- Language Resources (LRs) & Evaluation: a central component of the “linguistic infrastructure”
- LRs are supported by national funding in National Projects
- Availability of LRs is also a “sensitive” issue: it touches the sphere of linguistic and cultural identity, but also has economic and political implications
- The ENABLER Network of National initiatives aims at “enabling” the realisation of a cooperative framework, to:
  - formulate a common agenda of medium- & long-term research priorities
  - contribute to the definition of an overall framework for the provision of LRs

3 Towards …
Only by:
- combining the strengths of different initiatives & communities
- exploiting at best the ‘modus operandi’ of the national funding authorities in different national situations
- responding to/anticipating the needs and priorities of the R&D & industrial communities
- promoting the adoption of [de facto] standards and best practices
- keeping a clear distinction of tasks & roles for different actors
can we produce the synergies, economies of scale, convergence & critical mass necessary to provide the infrastructural LRs needed to realise the full potential of a multilingual global information society.

4 Lexicon and Corpus: a multi-faceted interaction (L = Lexicon, C = Corpus)
- L → C: tagging
- C → L: frequencies (of different linguistic “objects”)
- C → L: proper nouns, acronyms, …
- L → C: parsing, chunking, …
- C → L: training of parsers
- C → L: lexicon updating
- C → L: “collocational” data (MWE, idioms, gram. patterns...)
- C → L: “nuances” of meanings & semantic clustering
- C → L: acquisition of lexical (syntactic/semantic) knowledge
- L → C: semantic tagging/word-sense disambiguation (e.g. in Senseval)
- C → L: more semantic information on LE
- C → L: corpus-based computational lexicography
- C → L: validation of lexical models
- C → L: …
- L → C: ...

5 … Language as a “Continuum”
Interesting, and intriguing, aspects of corpus use:
- descriptions based on a clear-cut boundary between what is admitted and what is not are impossible
- in actual usage, language displays a large number of properties behaving as a continuum, not as properties of a “yes/no” type
- the same is true for the so-called “rules”: corpus evidence shows a “tendency” towards rules rather than precise rules
- it is difficult to constrain word meaning within a rigorously defined organisation: by its very nature it tends to evade any strict boundary
BUT
Lexicon & Corpus are two viewpoints on the same linguistic object … even more so in a multilingual context

6 Extraction from texts vs. formal representation in lexicons
- It is difficult to constrain word meaning within a rigorously defined organisation: by its very nature it tends to evade any strict boundary
- The rigour and inflexibility of formal representation languages make it difficult to map into them NL word meaning, which is ambiguous and flexible by nature
- There is no clear-cut boundary when analysing many phenomena: it is more of a continuum
- The same impression arises from examples of types of alternations: there are no clear-cut classes across languages, or even within one language

7 Correlation between different levels of linguistic description in the design of a lexical entry
To understand word meaning:
- Focus on the correlation between syntactic and semantic aspects
- But other linguistic levels, such as morphology, morphosyntax, lexical co-occurrence, collocational data, etc., are closely interrelated/involved
- These relations must be captured when accounting for meaning discrimination
- The complexity of these interrelationships is what makes semantic disambiguation such a hard task in NLP
- Textual corpora serve as a device to discover and reveal the intricacy of these relationships
- Frame/SIMPLE semantics serves as a device to unravel and disentangle this complex situation into elementary, computationally manageable pieces

8 Towards corpus-based semantic lexicons … at least in principle
- both in the design of the model, &
- in the building of the lexicon (at least partially) with (semi-)automatic means
Design of the lexical entry with a combined approach:
- theoretical: e.g. Fillmore's Frame Semantics / Pustejovsky's Generative Lexicon, …
- empirical: corpus evidence
  - even if sound and explicit criteria for classification according to “frame elements”/qualia relations/... are not always available

9 But … they will never be “complete”
Infrastructure of Language Resources (static):
- Semantic networks: Euro-/ItalWordNet
- Lexicons: PAROLE/SIMPLE/CLIPS
- TreeBanks
Infrastructure of tools (dynamic):
- Lexical acquisition systems (syntactic & semantic) from corpora
- Robust morphosyntactic & syntactic analysers
- Word-sense disambiguation systems
- Sense classifiers
- ...
International Standards

10 ItalWordNet Semantic Network [Italian module of EuroWordNet]
- ~50,000 lemmas organised in synonym groups (synsets), structured in hierarchies & linked by ~130,000 semantic relations:
  - ~50,000 hyperonymy/hyponymy relations
  - ~16,000 relations among different POS (role, cause, derivation, etc.)
  - ~2,000 part-whole relations
  - ~1,500 antonymy relations, … etc.
- Synsets linked to the InterLingual Index (ILI = Princeton WordNet)
- Through the ILI, linked to all the European WordNets (a de facto standard) & to the common Top Ontology
- Possibility of plug-in with domain terminological lexicons (legal, maritime)
- Usable in IR, CLIR, IE, QA, ...

11 EuroWordNet Multilingual Data Structure

12 Synsets linked by Semantic Relations in ItalWordNet
{casa, abitazione, dimora} (English equivalents shown: home, domicile, .. ; house)
- Hyperonym: {edificio, ..}
- Hyponym: {villetta} {catapecchia, bicocca, ..} {cottage} {bungalow}
- Role_location: {stare, abitare, ...}
- Role_target_direction: {rincasare}
- Role_patient: {affitto, locazione}
- Mero_part: {vestibolo} {stanza}
- Holo_part: {casale} {frazione} {caseggiato}
- TOP Concepts: Object, Artifact, Building
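To make the data structure of slide 12 concrete, here is a minimal Python sketch, assuming a plain in-memory representation in which a synset is a frozenset of lemmas and each semantic relation is a typed, directed edge between synsets. The relation labels are copied from the slide; the representation and the helper functions `related` and `synsets_of` are hypothetical and are not the EuroWordNet/ItalWordNet database format.

```python
from collections import defaultdict

# Hypothetical in-memory model: a synset is a frozenset of lemmas,
# and relations are typed, directed edges between synsets.
Synset = frozenset

CASA = Synset({"casa", "abitazione", "dimora"})
EDIFICIO = Synset({"edificio"})
VILLETTA = Synset({"villetta"})
CATAPECCHIA = Synset({"catapecchia", "bicocca"})
STANZA = Synset({"stanza"})
VESTIBOLO = Synset({"vestibolo"})
ABITARE = Synset({"stare", "abitare"})

ALL_SYNSETS = [CASA, EDIFICIO, VILLETTA, CATAPECCHIA, STANZA, VESTIBOLO, ABITARE]

# (source synset, relation label) -> list of target synsets
RELATIONS = defaultdict(list)


def add_relation(source, label, target):
    RELATIONS[(source, label)].append(target)


add_relation(CASA, "Hyperonym", EDIFICIO)
add_relation(CASA, "Hyponym", VILLETTA)
add_relation(CASA, "Hyponym", CATAPECCHIA)
add_relation(CASA, "Mero_part", VESTIBOLO)
add_relation(CASA, "Mero_part", STANZA)
add_relation(CASA, "Role_location", ABITARE)


def related(synset, label):
    """Targets of one relation type; .get() avoids creating empty entries."""
    return RELATIONS.get((synset, label), [])


def synsets_of(lemma):
    """All synsets containing a lemma (a polysemous lemma may appear in several)."""
    return [s for s in ALL_SYNSETS if lemma in s]


print(sorted(sorted(s) for s in related(CASA, "Hyponym")))
# [['bicocca', 'catapecchia'], ['villetta']]
```

Keying relations by synset rather than by lemma mirrors the point of the slide: relations hold between meanings, so a lemma such as "dimora" reaches "edificio" only through the synset it belongs to.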

13 Jur-WordNet
With ITTIG-CNR (Istituto di Teoria e Tecniche dell’Informazione Giuridica)
- Extension of ItalWordNet for the juridical domain
- Knowledge base for multilingual access to sources of legal information
- Source of metadata for the semantic mark-up of legal texts
- To be used, together with the generic ItalWordNet, in applications of Information Extraction, Question Answering, Automatic Tagging, Knowledge Sharing, Norm Comparison, etc.

14 Terminological Lexicon of Navigation & Sea Transportation
[Figure: nolo]
- Synsets: 1,614
- Lemmas: 2,116
- Senses: 2,232
- Nouns: 1,621
- Verbs: 205
- Adjectives: 35
- Proper Nouns: 236

15 PAROLE / SIMPLE / CLIPS
- PAROLE, Italian syntactic lexicon ('96-'98): morphology, 20,000 entries; syntax, 20,000 words; SGML
- SIMPLE, Italian semantic lexicon ('98-2000): semantics, 10,000 senses; SGML
- CLIPS (2000-2004): phonology; morphology, 55,000 words; syntax; semantics, 55,000 senses; XML
- PAROLE Corpus
- PAROLE/SIMPLE: 12 harmonised computational lexicons
- http://www.ilc.cnr.it/clips/

16 machine language learning

17 Machine language learning
- development of conceptual networks
- linguistic learning
- adaptive classification systems
- information extraction
- bootstrapping of grammars
- linguistic change models
- language usage models
- bootstrapping of lexical information

18 Architecture for linguistic knowledge acquisition... LKG
[Figure components: unstructured text data; annotation tools; annotated data; machine learning for linguistic knowledge acquisition; structured knowledge; lexica; lexicon model; user needs; applications: cross-lingual information retrieval, multilingual information extraction, multilingual text mining]
… towards “dynamic” lexicons, able to auto-enrich terminology

19 Harmonisation
- More & more need of a global view for global interoperability
- Integration/sharing of data & software/tools
- Need of compatibility among various components
- An “exemplary cycle” [figure]: Formalisms; Grammars; Software: Taggers, Chunkers, Parsers, …; Representation; Annotation; Lexicon; Corpora; Terminology; Software: Acquisition Systems; I/O Interfaces; Languages

20 A short guide to ISLE/EAGLES
http://www.ilc.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm
Multilingual Computational Lexicon Working Group

21 Target: … the Multilingual ISLE Lexical Entry (MILE)
General methodological principles (from EAGLES):
- high granularity: factor out the (maximal) set of primitive units of lexical info (basic notions) with the highest degree of inter-theoretical agreement
- modular and layered: various degrees of specification possible
- explicit representation of info
- allow for underspecification (& hierarchical structure)
- leading principle: edited union of existing lexicons/models (redundancy is not a problem)
- open to different paradigms of multilinguality
- oriented to the creation of large-scale & distributed lexicons

22 Paths to Discover the Basic Notions of MILE
- clues in dictionaries to decide on the target equivalent
- guidelines for lexicographers
- clues (to disambiguate/translate) in corpus concordances
- lexical requirements from various types of transfer conditions & actions in MT systems
- lexical requirements from interlingua-based systems
… a list of critical information types that will compose each module of the MILE

23 Designing MILE
Steps towards MILE:
- Creating entries (Bertagna, Reeves, Bouillon)
- Identifying the MILE Basic Notions (Bertagna, Monachini, Atkins, Bouillon)
- Defining the MILE Lexical Model (Lenci, Calzolari, etc.)
- Formalising MILE (Ide)
- Development of the ISLE Lexical Tool (Bel)
- ISLE & spoken language & multimodality (Gibbon)
- Metadata for the lexicon (Peters, Wittenburg)
- A case study: MWEs in MILE (Quochi, Lenci, Calzolari)
→ the MILE Basic Notions
→ the MILE Lexical Model

24 The MILE Basic Notions (the EAGLES/ISLE CLWG)
- Basic lexical dimensions & info types relevant to establishing multilingual links
- A typology of lexical multilingual correspondences (relevant conditions & actions)
Identified by:
- creating sample multilingual lexical entries (Bertagna, Reeves)
- investigating the use of sense indicators in traditional bilingual dictionaries (Atkins, Bouillon)
- …

25 The MILE Lexical Classes – Data Categories for Content Interoperability
Francesca Bertagna*, Alessandro Lenci°, Monica Monachini*, Nicoletta Calzolari*
*ILC–CNR – Pisa °Pisa University

26 Overview
1. MILE Lexical Model with Lexical Objects and Data Categories
2. Mapping of existing lexicons onto MILE
3. RDF schema and DC Registry for some pre-instantiated lexical objects, together with a sample entry from the PAROLE-SIMPLE lexicons in MILE
4. Future …

27 [Figure: GENELEX Model; PAROLE-SIMPLE Lexicons; Multilingual Lexicons (EuroWordNet, etc.); Computational Lexicon Working Group guidelines for syntactic & semantic lexicons; the MILE Lexical Model]
… what next?

28 The MILE Main Features
- A general architecture devised as a common representational layer for multilingual computational lexicons
  - both for hand-coded and corpus-driven lexical data
- Key features:
  - Modularity
  - Granularity
  - Extensibility and “openness”, user-adaptability
  - Resource sharing
  - Content interoperability
  - Reusability
- Semantic Web technologies & standards applied to lexicon modelling

29 The MILE Lexical Model (MLM)
- The MLM core is the Multilingual ISLE Lexical Entry (MILE):
  - a general schema for multilingual lexical resources
  - a lexical meta-entry as a common representational layer for multilingual lexicons
- Computational lexicons can be viewed as different instances of the MILE schema
[Figure: lexicon#1, lexicon#2 and lexicon#3 as instances of the MILE Lexical Model]

30 MILE: the building-block model
The MILE architecture is designed according to the building-block model:
- Lexical entries are obtained by combining various types of lexical objects (atomic and complex)
- Users design their lexicon by:
  - selecting and/or specifying the relevant lexical objects
  - combining the lexical objects into lexical entries
- Lexical objects may be shared:
  - within the same lexicon (intra-lexicon reusability)
  - among different lexicons (inter-lexicon reusability)

31 MILE: the building-block model
[Figure: lexical objects (syntactic frame, phrase, slot, Syn feature, Sem feature) combined into Lexical entry 1, Lexical entry 2 and Lexical entry 3]
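The building-block idea can be sketched in Python, assuming lexical objects are immutable values that entries reference rather than copy. The class names loosely echo the figure (slot, syntactic frame, lexical entry), but they are hypothetical, as are the example lemmas; the MILE itself is specified as an E-R model and an RDF schema, not as Python classes.

```python
from dataclasses import dataclass, field

# Hypothetical Python rendering of MILE-style lexical objects.
# Frozen (immutable) dataclasses are safe to share between entries.


@dataclass(frozen=True)
class Slot:
    function: str  # grammatical function, e.g. "SUBJ"
    phrase: str    # phrase type, e.g. "NP"


@dataclass(frozen=True)
class SyntacticFrame:
    slots: tuple  # ordered tuple of Slot objects


@dataclass
class LexicalEntry:
    lemma: str
    frames: list = field(default_factory=list)


# Atomic objects, defined once ...
SUBJ_NP = Slot("SUBJ", "NP")
OBJ_NP = Slot("OBJ", "NP")
# ... combined into a complex object, also defined once.
TRANSITIVE = SyntacticFrame((SUBJ_NP, OBJ_NP))

# Intra-lexicon reuse: two entries of one lexicon share the same frame object.
english = {
    "eat": LexicalEntry("eat", [TRANSITIVE]),
    "read": LexicalEntry("read", [TRANSITIVE]),
}
# Inter-lexicon reuse: an entry in a second lexicon points to the same object.
italian = {"mangiare": LexicalEntry("mangiare", [TRANSITIVE])}

print(english["eat"].frames[0] is italian["mangiare"].frames[0])  # True
```

Making the shared objects immutable is the design choice that makes reuse safe: no entry can modify a frame that other entries, or other lexicons, also point to.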

32 Modularity in MILE
[Figure: two mono-MILE entries, each with a morphological, a syntactic and a semantic layer plus linking conditions, connected through a multi-MILE of multilingual correspondence conditions]
- Multiple levels of modularity
- Horizontal organisation: independent but interlinked modules make it possible to express different dimensions of lexical entries

33 The Mono-MILE
Each monolingual layer within Mono-MILE identifies a basic unit of lexical description:
- morphological layer: MU, the basic unit describing the inflectional and derivational morphological properties of the word
- syntactic layer: SynU, the basic unit describing the syntactic behaviour of the MU
- semantic layer: SemU, the basic unit describing the semantic properties of the MU

34 The Mono-MILE
[Figure: MU, SynU, SemU]
Within each layer, a basic linguistic information unit is identified.

35 Granularity in MILE
Granularity concerns the vertical dimension: within a given lexical layer, varying degrees of depth of lexical description are allowed, from shallow to deep lexical representations.

36 Defining the MLM
- The MLM is designed as an E-R model (MILE Entry Schema) that defines the lexical objects and the ways they can be combined into a lexical entry
- The MLM includes 3 types of lexical objects:
  - MILE Lexical Classes (MLC)
  - MILE Lexical Data Categories (MDC)
  - MILE Lexical Operations (MLO)

37 The MILE Lexical Objects
Within each layer, basic lexical notions are represented by lexical objects:
- MILE Lexical Classes (MLC)
- MILE Data Categories (MDC)
- Lexical operations
Together they form an ontology of lexical objects, an abstraction over different lexical models and architectures.

38 The MILE E/R diagrams
The lexical objects are described with E-R diagrams, which define them and the ways they can be combined into a lexical entry.

39 MILE Lexical Objects: Syntactic Layer
[E-R diagram. Classes: MLC:SynU, MLC:SyntacticFrame, MLC:FrameSet, MLC:Composition, MLC:CorrespSynUSemU, MLC:SemU. Relations: hasSyntacticFrame, hasFrameSet, composedby, correspondTo. Cardinalities: 1..*, *, *, *]

40 [Figure: expanding one node of the SyntacticFrame: Construction, Self, Slot, SynU, Function, Phrase, …]

41 MILE Lexical Objects: Semantic Layer
[E-R diagram: MLC:SemU with the relations belongsToSynset (MLC:Synset), hasSemFrame (MLC:SemanticFrame), hasSemFeature (MLC:SemanticFeature), hasCollocation (MLC:Collocation) and semanticRelation (MLC:SemanticRelation, to another MLC:SemU). Cardinalities: *, 0..1, *, *, *]

42 MILE Lexical Objects: Synt-Sem Linking
[E-R diagram: MLC:CorrespSynUSemU with the relations hasSourceSynu (MLC:SynU), hasTargetSemu (MLC:SemU) and hasPredicativeCorresp (MLC:PredicativeCorresp); MLC:PredicativeCorresp IncludesSlotArgCorresp (MLC:SlotArgCorresp). Cardinalities: 1, 1, 1, 0..*]

43 Syntax-Semantics Linking
[Figure: a CorrespSynUSemU links a SynU (Frame: Slot0, Slot1) to a SemU (Predicate: Arg_0, Arg_1) through a PredCorresp with the slot-argument correspondences Slot0:Arg1 and Slot1:Arg0, subject to filters & conditions]

44 Syntax-Semantics Linking
- “John gave the book to Mary”: SynU#1 (subj_NP, obj_NP, obl_PP_to)
- “John gave Mary the book”: SynU#2 (subj_NP, obj_NP)
- Both are linked to SemU#1, Semantic_Frame: GIVE, with Arg1 Agent, Arg2 Theme, Arg3 Goal
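The point of slide 44 is that two SynUs with different slot orders can share one SemU, with the difference absorbed by the slot-argument correspondences. A minimal Python sketch, assuming plain dictionaries stand in for MILE objects and that a parser has already identified the slot fillers; the `link` helper is hypothetical, and since the slot labels of SynU#2 are only partly legible in the transcript, the ones used here are assumptions.

```python
# Hypothetical encoding of slide 44: one SemU (GIVE) linked to two SynUs.
SEMU_GIVE = {"frame": "GIVE", "args": {"Arg1": "Agent", "Arg2": "Theme", "Arg3": "Goal"}}

# Slot labels in surface order. For SynU#2 the slot labels are assumed.
SYNU_1 = ("subj_NP", "obj_NP", "obl_PP_to")  # John gave the book to Mary
SYNU_2 = ("subj_NP", "obj_NP", "obj_NP")     # John gave Mary the book

# Slot-argument correspondences: slot position -> semantic argument.
SLOT_ARG_1 = {0: "Arg1", 1: "Arg2", 2: "Arg3"}
SLOT_ARG_2 = {0: "Arg1", 1: "Arg3", 2: "Arg2"}


def link(synu, slot_arg, semu, fillers):
    """Map the slot fillers of one SynU onto the semantic roles of a SemU."""
    if len(fillers) != len(synu):
        raise ValueError(f"expected {len(synu)} fillers, got {len(fillers)}")
    return {semu["args"][slot_arg[i]]: filler for i, filler in enumerate(fillers)}


print(link(SYNU_1, SLOT_ARG_1, SEMU_GIVE, ["John", "the book", "Mary"]))
# {'Agent': 'John', 'Theme': 'the book', 'Goal': 'Mary'}
print(link(SYNU_2, SLOT_ARG_2, SEMU_GIVE, ["John", "Mary", "the book"]))
# {'Agent': 'John', 'Goal': 'Mary', 'Theme': 'the book'}
```

Both sentences yield the same role assignment, which is exactly why the semantic description can be stated once and reused by both syntactic realisations.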

45 Syntax-Semantic Linking in SIMPLE
[Figure: SynU_migliorare has a Frameset with a transitive structure (Slot0, Slot1) and an intransitive structure (Slot0, Ø). CorrespSynUSemU and SlotArgCorresp links connect these structures to SemU2_migliorare (CAUSE_CHANGE_OF_STATE) and SemU1_migliorare (CHANGE_OF_STATE), which share PRED_migliorare (ARG0: Agent, ARG1: Patient); the links are labelled isomorphic and non-isomorphic]
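The isomorphic/non-isomorphic distinction can be made explicit with a small Python sketch. It assumes a working definition that is ours, not SIMPLE's: a linking is isomorphic when every argument is realised and slot i realises argument i. It also assumes our reading of the figure, in which the transitive frame links to the causative SemU and the intransitive frame, whose subject is the Patient, leaves ARG0 unrealised. The helper names are hypothetical.

```python
# Hypothetical encoding of slide 45. Both SemUs share PRED_migliorare.
PRED_MIGLIORARE = ("ARG0:Agent", "ARG1:Patient")

# SlotArgCorresp as {slot index: argument index}.
TRANSITIVE = {0: 0, 1: 1}  # SemU2_migliorare, CAUSE_CHANGE_OF_STATE
INTRANSITIVE = {0: 1}      # SemU1_migliorare, CHANGE_OF_STATE; ARG0 has no slot


def unrealised(slot_arg, n_args):
    """Argument indices that no syntactic slot realises (the Ø of the figure)."""
    return [a for a in range(n_args) if a not in slot_arg.values()]


def is_isomorphic(slot_arg, n_args):
    """Working definition (an assumption): every argument is realised and
    slot i realises argument i."""
    return not unrealised(slot_arg, n_args) and all(s == a for s, a in slot_arg.items())


n = len(PRED_MIGLIORARE)
print(is_isomorphic(TRANSITIVE, n), is_isomorphic(INTRANSITIVE, n))  # True False
print([PRED_MIGLIORARE[a] for a in unrealised(INTRANSITIVE, n)])     # ['ARG0:Agent']
```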

46 The Multilingual layer
[E-R diagram: MultiCorresp with the relations hasMUMUCorr (MUMUCorresp), hasSynUSynuCorr (SynUSynUCorresp), hasSemUSemUCorr (SemUSemUCorresp), hasSynsetMultCorr (SynsetMultCorresp) and hasSemFrameCorr (SemanticFrameMultCorresp). Cardinality label: 1..0]

47 MILE approach to multilinguality
Open to various approaches:
- transfer-based: monolingual descriptions are used to state correspondences (tests and actions) between source and target entries
- interlingua-based: monolingual entries are linked to language-independent lexical objects (e.g. semantic frames, “primitive predicates”, etc.)

48 The Multi-MILE
- Multi-MILE specifies a formal environment for expressing multilingual correspondences between lexical items
- Source and target lexical entries can be linked by exploiting (possibly combined) aspects of their monolingual descriptions
- Monolingual lexicons act as pivot lexical repositories, on top of which language-to-language multilingual modules can be defined

49 The Multi-MILE
Multi-MILE may include:
- Multilingual operations to establish transfer links between source and target mono-MILE
- Multilingual lexical objects, which
  - enrich the source and target lexical descriptions, but
  - do not belong to the monolingual lexicons
- Language-independent lexical objects:
  - primitive semantic frames, “interlingual synsets”, etc.
  - relevant for interlingua approaches to multilinguality

50 [Figure: an Italian mono-MILE entry (MU_1 with SynU_1, SemU_1, SynU_2, SemU_2) linked to an English mono-MILE entry (MU_1 with SynU_1, SemU_1) through an IT-to-EN multi-MILE]
Multi-MILE correspondences:
- IT_SemU_2 ↔ EN_SemU_1
- IT_SynU_2 ↔ EN_SynU_1
- IT_Slot_0 ↔ EN_Slot_1
- IT_Slot_1 ↔ EN_Slot_0
Operations:
- AddFeature to source SemU: +HUMAN
- AddSlot to target SynU: MODIF [PP_with]
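The IT-to-EN module of slide 50 combines correspondences (including a swap of the two slots) with two add-operations. A minimal Python sketch, assuming dictionaries for the mono-MILE fragments; the identifiers come from the slide, while the helpers `add_feature`, `add_slot` and `transfer` and the placeholder fillers "x" and "y" are hypothetical. The add-operations return enriched copies, reflecting the statement that multilingual lexical objects enrich the source and target descriptions without belonging to the monolingual lexicons.

```python
from copy import deepcopy

# Hypothetical mono-MILE fragments, using the identifiers of slide 50.
IT_SEMU_2 = {"id": "IT_SemU_2", "features": set()}
EN_SYNU_1 = {"id": "EN_SynU_1", "slots": ["Slot_0", "Slot_1"]}

# Multi-MILE correspondences from the slide.
UNIT_MAP = {"IT_SemU_2": "EN_SemU_1", "IT_SynU_2": "EN_SynU_1"}
SLOT_MAP = {"Slot_0": "Slot_1", "Slot_1": "Slot_0"}


def add_feature(semu, feature):
    """Add-operation: return an enriched copy, leaving the monolingual entry untouched."""
    enriched = deepcopy(semu)
    enriched["features"].add(feature)
    return enriched


def add_slot(synu, slot):
    """Add-operation on the target side, again applied to a copy."""
    enriched = deepcopy(synu)
    enriched["slots"].append(slot)
    return enriched


def transfer(it_unit, it_fillers):
    """Return the English unit and the fillers moved to its corresponding slots."""
    return UNIT_MAP[it_unit], {SLOT_MAP[slot]: filler for slot, filler in it_fillers.items()}


source = add_feature(IT_SEMU_2, "+HUMAN")
target = add_slot(EN_SYNU_1, "MODIF[PP_with]")
print(source["features"], IT_SEMU_2["features"])  # {'+HUMAN'} set()
print(target["slots"])  # ['Slot_0', 'Slot_1', 'MODIF[PP_with]']
print(transfer("IT_SynU_2", {"Slot_0": "x", "Slot_1": "y"}))
# ('EN_SynU_1', {'Slot_1': 'x', 'Slot_0': 'y'})
```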

51 Multi-MILE
[Figure: multilingual conditions linking the IT Lexicon and the EN Lexicon]
- dito ↔ finger, under the condition modif(mano) (“hand”); dito ↔ toe, under the condition modif(piede) (“foot”)
- run + PP_into ↔ entrare (“to enter”) + PP_di_corsa (“running”)
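The multilingual conditions of slide 51 can be read as conditional transfer rules: a condition on the source context selects the target, and an add-operation may supply material the source lacks. A minimal Python sketch, assuming the contexts are plain dictionaries and that the run/entrare link is applied in the EN-to-IT direction; the rule table and the `translate` function are hypothetical illustrations, not the MILE operation syntax.

```python
# Hypothetical transfer rules for slide 51:
# (source lemma, condition on the source context, target lemma, slots to add).
RULES = [
    ("dito", lambda ctx: "mano" in ctx.get("modif", ()), "finger", ()),
    ("dito", lambda ctx: "piede" in ctx.get("modif", ()), "toe", ()),
    ("run", lambda ctx: "PP_into" in ctx.get("slots", ()), "entrare", ("PP_di_corsa",)),
]


def translate(lemma, ctx):
    """Return (target, added slots) for every rule whose condition holds."""
    return [(tgt, add) for src, cond, tgt, add in RULES if src == lemma and cond(ctx)]


print(translate("dito", {"modif": ["piede"]}))   # [('toe', ())]
print(translate("run", {"slots": ["PP_into"]}))  # [('entrare', ('PP_di_corsa',))]
print(translate("dito", {}))                     # []
```

Returning an empty list when no condition holds keeps the sketch honest: the rule for "run" is only meant to fire with PP_into, so an unconstrained "run" is simply outside this module.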

52 MILE Lexical Classes
- Represent the main building blocks of lexical entries
- Formalize the MILE Basic Notions
- Define an ontology of lexical objects
  - represent lexical notions such as semantic unit, syntactic feature, syntactic frame, semantic predicate, semantic relation, synset, etc.
- Similar to class definitions in OO languages:
  - specify the relevant attributes
  - define the relations with other classes
  - hierarchically structured

53 MILE Lexical Classes: an ontology of lexical objects

54 MILE Lexical Data Categories
- MDC are instances of the MILE Lexical Classes
- They can be used “off the shelf” or as a departure point for the definition of new or modified categories
- They enable modular specification of lexical entities using all or part of the lexical information in the repository
- Each MDC represents a resource uniquely identified by a URI
- Two types of MDC:
  - Core MDC: belong to shared repositories (Lexical Data Category Registry); lexical objects and linguistic notions with wide consensus
  - User-defined MDC: user-specific or language-specific lexical objects

55 The MILE Data Categories
- Instances of the MILE Lexical Classes are Data Categories
- MDC can belong to a shared repository or be user-defined
[Figure: MLC instantiated as Core MDC and User-defined MDC]

56 The MILE Data Categories: user-adaptability and extensibility
[Figure: HUMAN, ARTIFACT, EVENT, ANIMAL, GROUP, AGE and MAMMAL as instance_of MLC:SemanticFeature, divided into Core and UserDefined data categories]

57 MILE Lexical Data Categories
[Figure: MLM:Feature with MLM:SemFeature and MLM:SynFeature below it. Each class has Core and UserDefined MDC linked by instance_of:
- MLM:SemFeature: HUMAN, ARTIFACTUAL, EVENT, DURATION, GROUP, AGE, ANIMATE
- MLM:SynFeature: GENDER, CASE, PERSON, TENSE, CONTROL, ASPECT
- MLM:GrammaticalFunction: SUBJ, OBJ, IOBJ, PRED, X_COMP, C_COMP]
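Since each MDC is a resource identified by a URI, a registry can be sketched as a mapping from URIs to (lexical class, name) pairs. A minimal Python sketch under several assumptions: the `example.org` namespaces are placeholders (the transcript gives no registry address), the `DataCategoryRegistry` class is hypothetical, and the split of the sample categories into core and user-defined is chosen for illustration only.

```python
# Hypothetical registry keyed by URI. The example.org namespaces are placeholders:
# the transcript does not give the actual registry addresses.
CORE_NS = "http://example.org/mile/core#"
USER_NS = "http://example.org/mile/user#"


class DataCategoryRegistry:
    def __init__(self):
        self._by_uri = {}  # URI -> (lexical class, name)

    def register(self, uri, lexical_class, name):
        if uri in self._by_uri:
            raise ValueError(f"URI already registered: {uri}")
        self._by_uri[uri] = (lexical_class, name)

    def resolve(self, uri):
        return self._by_uri[uri]

    def instances_of(self, lexical_class):
        return sorted(name for cls, name in self._by_uri.values() if cls == lexical_class)


registry = DataCategoryRegistry()
for name in ("HUMAN", "EVENT", "GROUP"):  # assumed to be core categories
    registry.register(CORE_NS + name, "SemFeature", name)
registry.register(USER_NS + "AGE", "SemFeature", "AGE")  # assumed user-defined
registry.register(CORE_NS + "GENDER", "SynFeature", "GENDER")

print(registry.instances_of("SemFeature"))  # ['AGE', 'EVENT', 'GROUP', 'HUMAN']
print(registry.resolve(USER_NS + "AGE"))    # ('SemFeature', 'AGE')
```

Keying on the URI rather than on the bare name lets a user-defined category coexist with a core one of the same name in a different namespace, which is one way to read the extensibility claim of slide 56.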

58 MILE Lexical Operations
They are used to state conditions and perform operations over lexical entries:
- link syntactic slots and semantic arguments
- constrain the syntax-semantics link
- express tests and actions in the transfer conditions in the multi-MILE
- …
They provide the “glue” linking the various independent intra-lexical and inter-lexical components.

59 Multilingual Operations
Source-to-target language transfer conditions can be expressed by combining multilingual operations. There are three types of multilingual operations:
- Multilingual correspondences: link a source lexical object (MU, SemU, SynU, semantic argument, syntactic slot) and a target lexical object (MU, SemU, SynU, semantic argument, syntactic slot)
- Add-operations: add lexical information that is relevant for the cross-lingual link but not present in the source or target mono-MILE
- Constrain-operations: constrain the transfer link to some portions of the source and target mono-MILE

60 Defining the MLM
[Figure: MILE Entry Schema; MILE Lexical Classes; User Defined MDC Registry; RDF/S Descriptions; Monolingual/Multilingual Lexicon]

61 RDF Instantiation of the MLM
[Figure: Lexicon#1, Lexicon#2, Lexicon#3; Resources; Lexical Objects; Lexical Classes; Lexical Data Categories; Resources Metadata]

62 MILE Lexical Model
- An ideal structure for rendering in RDF: a hierarchy of lexical objects built up by combining atomic data categories via clearly defined relations
- Proof of concept:
  - create an RDF schema for the MILE Lexical Model, version 1.2
  - instantiate MILE Lexical Data Categories

63 User-Adaptability and Resource Sharing in MILE
- Compatible with different models of lexical analysis:
  - relational semantic models (e.g. WordNet)
  - syntactic and semantic frames
  - ontology-based lexicons
- Compatible with different degrees of specification:
  - deep lexical representations (e.g. PAROLE-SIMPLE)
  - terminological lexicons
- Compatible with different paradigms of multilinguality:
  - lexicons for transfer-based MT
  - interlingua-based lexicons
  - …

64 The MILE Lexical Model
[Figure: lexicon_1, lexicon_2, lexicon_3 and DTD_1, DTD_2, … DTD_n related to the MILE Lexical Model]

65 RDF Instantiation of the MLM
- Enable universal access to sophisticated linguistic information
- Provide means for inferencing over lexical information
- Incorporate lexical information into the Semantic Web
- W3C standards:
  - Resource Description Framework (RDF)
  - Web Ontology Language (OWL)
  - built on the XML web infrastructure to enable the creation of a Semantic Web:
    - web objects are classified according to their properties
    - the semantics of relations (links) to other web objects is precisely defined

66 The RDF Schema
- Defines classes of objects (MLC) and their relations to other objects
  - like a class definition in Java, etc.
- Classes and properties in the schema correspond to the E-R model
- Can specify sub-classes/sub-properties and inheritance
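Inheritance in an RDF schema means that an instance of a subclass is also an instance of its superclasses. The plain-Python sketch below emulates two RDFS entailment rules (rdfs11: transitivity of `rdfs:subClassOf`; rdfs9: propagation of `rdf:type` up the hierarchy) over a hierarchy assumed from the Feature/SemFeature/SynFeature figure of slide 57. A real deployment would use an RDF store and reasoner; the triple sets and function names here are hypothetical.

```python
# Hypothetical class hierarchy assumed from the slide-57 figure.
SUBCLASS_OF = {("SemFeature", "Feature"), ("SynFeature", "Feature")}
TYPE_OF = {("HUMAN", "SemFeature"), ("GENDER", "SynFeature")}


def superclasses(cls):
    """All classes reachable via subClassOf (transitive, like RDFS rule rdfs11)."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in found:
                found.add(sup)
                stack.append(sup)
    return found


def types(instance):
    """Direct types plus inherited ones (like RDFS rule rdfs9)."""
    direct = {cls for inst, cls in TYPE_OF if inst == instance}
    inherited = set()
    for cls in direct:
        inherited |= superclasses(cls)
    return direct | inherited


print(sorted(types("HUMAN")))  # ['Feature', 'SemFeature']
```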

67 Goals
- Lexical information will form a central component of semantic information
- We need a standardized, machine-processable format so that information can be used and merged with other information
- Main task: get the data model right
See Semantic Web

68 Advantages of RDF
Modularity: “instances” of bits of lexical information can be created for re-use in a single lexicon or across lexicons; instances can be stored in a central repository for use by others; either partial information or all of it can be used; a building-block approach to lexicon creation.
Web compatibility: the RDF instantiation will integrate into the Semantic Web.
Inferencing capabilities.

69 Example
Three parts:
RDF Schema for lexical entries: defines classes and properties, sub-classes, etc.
Sample repository of RDF-instantiated lexical objects, at three levels of granularity.
Sample lexicon entries, using repository information at different levels.

70 Sample Repositories
1. A repository of enumerated classes for lexical objects at the lowest level of granularity: definitions of sets of possible values for various lexical objects
2. A repository of phrases for common phrase types, e.g., NP, VP, etc.
3. A repository of constructions for common syntactic constructions

71 Enumerated classes
[Figure: enumerated classes and their values: Subj, Obj, Comp, Arg, Iobj; tense; gender (masculine, feminine); control (subject_control, object_control); person; aux (have, be)]

72 Sample LDCR for a Phrase Object
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:mlc="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#">

73 Sample LDCR entry for a Construction object
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#">
<filledBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
<filledBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>

74 Full entry: John ate the cake
Continued…

75 Continued from previous slide…

76 Entry Using Phrase: John ate the cake
<headedBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#Vauxhave"/>
<filledBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>
<filledBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#NP"/>

77 Entry Using Construction: John ate the cake
<headedBy rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Phrases#Vauxhave"/>
<hasConstruction rdf:resource="http://www.cs.vassar.edu/~ide/rdf/isle-datcats/Constructions#TransIntrans"/>
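To make the construction-based entry style concrete, here is a hedged sketch using Python's standard xml.etree.ElementTree. It serializes an entry that refers to shared repository objects by URI instead of spelling them out. The schema namespace and the headedBy/hasConstruction values are copied from slides 72, 73 and 77. Wrapping the properties in an rdf:Description with the ID eat_1 is a hypothetical choice, since the slides do not show the enclosing element.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MLC = "http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#"
DATCATS = "http://www.cs.vassar.edu/~ide/rdf/isle-datcats/"

# Serialize with the rdf: and mlc: prefixes instead of ns0:, ns1:, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("mlc", MLC)


def entry_with_construction(entry_id, head, construction):
    """Build an rdf:RDF document whose entry points to shared repository objects."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # Hypothetical wrapper: the slides elide the element that carries these properties.
    entry = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": f"#{entry_id}"})
    ET.SubElement(entry, f"{{{MLC}}}headedBy",
                  {f"{{{RDF}}}resource": DATCATS + "Phrases#" + head})
    ET.SubElement(entry, f"{{{MLC}}}hasConstruction",
                  {f"{{{RDF}}}resource": DATCATS + "Constructions#" + construction})
    return root


doc = entry_with_construction("eat_1", "Vauxhave", "TransIntrans")
print(ET.tostring(doc, encoding="unicode"))
```

Only the URI of the construction travels with the entry; its slots and fillers stay in the shared repository.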

78 Semantic Representation
The data model underlying RDF, UML, etc. is universal: abstract enough to capture all types of information.
Semantic representations: a registry of basic data categories, covering “meta”-categories (addressee, utterance, etc.) and information categories (eyebrow movement, gestures, pitch, …), with a supporting ONTOLOGY of information categories.
Interpretative procedures yield another level of meaning representation.
Registry of categories….
[Figure: UNINTERPRETED REPRESENTATION → INTERPRETATION PROCESS → INTERPRETED REPRESENTATION]

79 MILE Lexical Data Category Registry (MDC)
Instantiation of pre-defined lexical objects.
Extension of the shared class schema with lexicon-specific sub-classes and sub-properties.
Can be used “off the shelf” or as a departure point for the definition of new or modified categories.
Enables modular specification of lexical entities: eliminates redundancy; identifies lexical entries or sub-entries with shared properties.

80 MLC in RDF/S: features
[Figure: mlm:feature links mlm:LexObject to mlm:Values; mlm:synFeature and mlm:semFeature are rdfs:subPropertyOf mlm:feature; mlm:SynValues and mlm:SemValues are rdfs:subClassOf mlm:Values]
Features are properties of lexical objects.
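Declaring mlm:synFeature and mlm:semFeature as sub-properties of mlm:feature means that any statement made with a specific feature also holds for the generic one (the RDFS sub-property rule). The sketch below shows this over plain triples. It is a minimal illustration: mlm:synCat, ex:eat_1 and mlm:V are hypothetical names.

```python
# Minimal sketch of RDFS sub-property entailment over plain triples.
# mlm:synCat, ex:eat_1 and mlm:V are hypothetical; prefixed names stand in for URIs.
SUBPROPERTY_OF = "rdfs:subPropertyOf"

schema = [
    ("mlm:synFeature", SUBPROPERTY_OF, "mlm:feature"),
    ("mlm:semFeature", SUBPROPERTY_OF, "mlm:feature"),
    ("mlm:synCat", SUBPROPERTY_OF, "mlm:synFeature"),
]
data = [("ex:eat_1", "mlm:synCat", "mlm:V")]


def entail_superproperties(schema, data):
    """Add (s, q, o) for every property q that p is transitively a sub-property of."""
    parents = {}
    for sub, p, sup in schema:
        if p == SUBPROPERTY_OF:
            parents.setdefault(sub, set()).add(sup)
    result = set(data)
    for s, p, o in data:
        stack, seen = [p], set()
        while stack:
            for sup in parents.get(stack.pop(), ()):
                if sup not in seen:
                    seen.add(sup)
                    result.add((s, sup, o))
                    stack.append(sup)
    return result


print(sorted(p for _, p, _ in entail_superproperties(schema, data)))
# ['mlm:feature', 'mlm:synCat', 'mlm:synFeature']
```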

81 MLC in RDF/S: syntactic features
<rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#synFeature"/>
<rdfs:range rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SynCatValues"/>
<rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#SynValues"/>
...
[Label: feature values]

82 MLC in RDF/S: semantic features
<rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#semFeature"/>
<rdfs:range rdf:resource="http://webilc.ilc.cnr.it/~lenci/isle/mile-schema-v.1#DomainValues"/>
...
[Label: “domain ontology”]

83 Synsets in RDF/S
[Figure: mlm:Synset with mlm:word (rdfs:Literal), mlm:gloss (rdfs:Literal), mlm:feature (mlm:Values) and mlm:synsetRelation (to another mlm:Synset)]
cf. also http://www.semanticweb.org/library/wordnet/wordnet-20000620.rdfs

84 Synsets in RDF/S: relation between synsets
Synset: “This class formalizes the notion of synset as defined in WordNet (Fellbaum 1998).”
The WordNet hypernym relation and the WordNet meronym relation: different types of synset relations.

85 WordNet 1.7 Synsets
<mlm:Synset rdf:about="http://www.cogsci.princeton.edu/~wn1.7/concept#01752990" mlm:source="WordNet1.7">
A member of the genus Canis; dog; domestic dog; Canis familiaris
<mdc:hypernym rdf:resource="http://www.cogsci.princeton.edu/~wn1.7/concept#01752283"/>
[Labels: features; hypernym]
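Here is a hedged sketch of reading such a synset with xml.etree.ElementTree. The synset URI, source, gloss, words and hypernym come from the slide. Marking them up as mlm:gloss and mlm:word elements follows the properties in slide 83 and is an assumption. The namespace URIs bound to mlm and mdc are placeholders (example.org), because the transcript does not give them.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MLM = "http://example.org/mile-schema#"   # placeholder namespace URI (assumption)
MDC = "http://example.org/mile-datcats#"  # placeholder namespace URI (assumption)

SYNSET_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:mlm="{MLM}" xmlns:mdc="{MDC}">
  <mlm:Synset rdf:about="http://www.cogsci.princeton.edu/~wn1.7/concept#01752990"
              mlm:source="WordNet1.7">
    <mlm:gloss>A member of the genus Canis</mlm:gloss>
    <mlm:word>dog</mlm:word>
    <mlm:word>domestic dog</mlm:word>
    <mlm:word>Canis familiaris</mlm:word>
    <mdc:hypernym rdf:resource="http://www.cogsci.princeton.edu/~wn1.7/concept#01752283"/>
  </mlm:Synset>
</rdf:RDF>"""


def read_synset(xml_text):
    """Extract URI, source, words, gloss and hypernym links from the first synset."""
    synset = ET.fromstring(xml_text).find(f"{{{MLM}}}Synset")
    return {
        "uri": synset.get(f"{{{RDF}}}about"),
        "source": synset.get(f"{{{MLM}}}source"),
        "words": [w.text for w in synset.findall(f"{{{MLM}}}word")],
        "gloss": synset.findtext(f"{{{MLM}}}gloss"),
        "hypernyms": [h.get(f"{{{RDF}}}resource") for h in synset.findall(f"{{{MDC}}}hypernym")],
    }


print(read_synset(SYNSET_XML)["words"])  # ['dog', 'domestic dog', 'Canis familiaris']
```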

86 Foundations of the Mapping Experiment

87 1. The MILE building-block model
The MILE Lexical Classes and the MILE Lexical Data Categories are the main building blocks of the MILE lexical architecture.
Building blocks allow two kinds of reusability: intra-lexicon reusability (within the same lexicon) and inter-lexicon reusability (among different lexicons).

88 How do building blocks work?
[Figure: lexical objects (syntactic frame, phrase, slot, Syn feature, Sem feature) shared by Lexical entry 1, Lexical entry 2 and Lexical entry 3]
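Once entries refer to building blocks by identifier, both kinds of reusability can be checked mechanically. This sketch assumes toy lexicons: lexicon_A, lexicon_B and every entry and block ID are hypothetical. For each block, it reports whether the block is reused within one lexicon, across lexicons, or both.

```python
from collections import defaultdict

# Hypothetical lexicons: each entry lists the IDs of the shared building blocks it uses.
lexicons = {
    "lexicon_A": {
        "eat": ["Constructions#TransIntrans", "Phrases#NP", "SynFeature#aux_have"],
        "drink": ["Constructions#TransIntrans", "Phrases#NP"],
    },
    "lexicon_B": {
        "mangiare": ["Constructions#TransIntrans"],
    },
}


def reuse_report(lexicons):
    """Classify each building block as intra-lexicon and/or inter-lexicon reused."""
    users = defaultdict(list)  # block ID -> lexicon name, once per use
    for lex_name, entries in lexicons.items():
        for blocks in entries.values():
            for block in blocks:
                users[block].append(lex_name)
    report = {}
    for block, lex_names in users.items():
        distinct = set(lex_names)
        report[block] = {
            "intra": len(lex_names) > len(distinct),  # some lexicon uses it more than once
            "inter": len(distinct) > 1,               # more than one lexicon uses it
        }
    return report


print(reuse_report(lexicons))
```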

89 2. MILE: a meta-entry
MILE is a general schema for multilingual lexical resources: a lexical meta-entry, a common representational layer for multilingual lexicons.
Computational lexicons can be viewed as different instances of the MILE schema.
[Figure: MILE above lexicon#1, lexicon#2, lexicon#3]

90 MILE and Content Interoperability
This common, shared, compatible representation of lexical objects is particularly suited to manipulating objects available in different lexical resources, understanding their deep semantics, and applying the same operations to lexical objects of the same type: key elements of Content Interoperability.

91 The Mapping Experiment: Why?
It is a concrete experiment aimed at testing the expressive potential and capabilities of MILE.
The idea is that if the atomic MILE notions, combined in different ways, suit the different “visions” underlying two lexicons such as FrameNet and NOMLEX, then MILE will come out strengthened, its adoption as an interface between differently conceived lexical architectures can be pushed further, and key issues for content interoperability between resources can be addressed.

92 The mapping scenarios
1. High-level mapping of the objects of a lexicon into the objects of the abstract model: the native structure is maintained and no format conversion is performed.
2. Translating instances of lexical entries directly into MILE: MILE acts as a true interchange format.

93 FrameNet to MILE

94 FrameNet-MILE: Observations
The mapping is promising:
Frame ↔ Predicate (primitive)
Frame Elements ↔ Argument (enlarge the set of possible values)
Lexical_Unit ↔ SemU
The link SemU-Predicate (obligatory) should become underspecified.
But … the lack of an inheritance mechanism in the Predicate means that the hierarchical organization of Frames and Sub-frames, the temporal ordering among Frames, and subsumption relations among Frames cannot be represented. We could add a new object, PredicateRelation, to allow for the description of relations occurring between predicates and sub-predicates.
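One way to picture the proposed PredicateRelation object is as typed links between Predicates that can be followed transitively, for example to recover a frame hierarchy. The class below is a hypothetical sketch of that proposal, not part of MILE. The predicate names and relation labels are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredicateRelation:
    """Hypothetical typed link from one MILE Predicate to another."""
    source: str
    target: str
    relation: str  # illustrative labels: "inherits_from", "subframe_of", "precedes"


def related(relations, predicate, relation):
    """Follow one relation type transitively (breadth-first) from a predicate."""
    found, frontier = [], [predicate]
    while frontier:
        current = frontier.pop(0)
        for r in relations:
            if r.source == current and r.relation == relation and r.target not in found:
                found.append(r.target)
                frontier.append(r.target)
    return found


relations = [
    PredicateRelation("Pred_Child", "Pred_Parent", "inherits_from"),
    PredicateRelation("Pred_Parent", "Pred_Root", "inherits_from"),
    PredicateRelation("Pred_Sub1", "Pred_Parent", "subframe_of"),
    PredicateRelation("Pred_Sub1", "Pred_Sub2", "precedes"),
]
print(related(relations, "Pred_Child", "inherits_from"))  # ['Pred_Parent', 'Pred_Root']
```

Keeping the relation type as data, rather than one class per relation, lets hierarchy, sub-frame and temporal-order links share a single traversal.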

95 [Figure: mapping of a NOMLEX entry (:nom-type ((subject))) to MILE: MLC:SynU and MLC:SemU linked by MLC:CorrespSynUSemU (TypeOfLink: Agentnom; IncludedArg: 0); MLC:SemanticFrame with MLC:Predicate and MLC:Argument]

96 NOMLEX-MILE: Observations
The mapping is promising: notions represented in NOMLEX have a correspondent in MILE.
But they are expressed with two opposite lexical structures.
In NOMLEX, lexical information is expressed in a very compact way, with no clear-cut boundaries between the levels of linguistic description.
In MILE, the compressed information must be decompressed and spread over different MILE lexical layers and objects: SynU, SemU, and SemanticFrame with its Predicate and relevant Arguments, to account for the incorporation of the Agent.
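The decompression step can be sketched for the case in slide 95. A compact :nom-type ((subject)) value is spread over a SynU, a SemU, a SemanticFrame with its Predicate and Arguments, and a SynU-SemU correspondence recording TypeOfLink Agentnom and IncludedArg 0. The dictionary layout, the two-argument predicate and the teacher/teach example are assumptions for illustration, and only the subject case is handled.

```python
def decompress_subject_nominal(nominal, verb):
    """Spread a NOMLEX-style `:nom-type ((subject))` entry over MILE-style objects.

    Only the subject (agent) case is handled; the two-argument predicate
    and the dictionary layout are illustrative assumptions.
    """
    synu = {"class": "SynU", "lemma": nominal}
    frame = {
        "class": "SemanticFrame",
        "predicate": {"class": "Predicate", "name": verb},
        "arguments": [{"class": "Argument", "index": 0}, {"class": "Argument", "index": 1}],
    }
    semu = {"class": "SemU", "lemma": nominal, "frame": frame}
    corresp = {
        "class": "CorrespSynUSemU",
        "synu": synu,
        "semu": semu,
        # The nominal denotes the verb's agent, so argument 0 is incorporated.
        "typeOfLink": "Agentnom",
        "includedArg": 0,
    }
    return {"SynU": synu, "SemU": semu, "SemanticFrame": frame, "Corresp": corresp}


# Hypothetical compact entry: a subject nominalization of "teach".
objects = decompress_subject_nominal("teacher", "teach")
print(objects["Corresp"]["typeOfLink"], objects["Corresp"]["includedArg"])  # Agentnom 0
```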

97 Lessons Learned from the Mapping
The results of the experiments are promising.
FrameNet offers the chance to compare two similar lexical models whose lexical objects do not overlap perfectly: a test of the adequacy of the linguistic objects.
NOMLEX gives the opportunity to work with two lexicons whose linguistic notions correspond but are expressed with opposite lexicon structures: a test of the adequacy of the architectural model.
The high granularity and modularity of MILE allow compatibility with differently packaged linguistic objects, and allow new objects and relations to be added without distorting the general architecture.

98 RDF and MILE: Why?
Some reasons (from Nancy Ide et al. 2003):
MILE, as a hierarchy of lexical objects built up by combining data categories via clearly defined relations, is an ideal structure for rendering in RDF.
The RDF mechanism, with its capacity to express named relations between objects, offers a web-based means to represent the MILE architecture.
An RDF representation of linguistic information is an invaluable resource for language processing applications in the Semantic Web.
RDF description and instantiation are in line with the goal of ISO TC37 SC4.

99 RDF Representation of MILE
MILE was already supplied with an RDF schema for the MILE Syntactic Layer and an instantiation of pre-defined syntactic objects.
We extended the repository of shared lexical objects with the RDF description and (partial!) instantiations of the objects of the semantic and linking layers.
This was carried out with the intent to submit it within ISO TC37/SC4, and to foster the adoption of MILE by offering a library of ready-to-use RDF objects.

100 An RDF Schema for the synt-sem linking
<!-- An RDF Schema for ISLE lexical entries v 0.1 2004/05/05 Author: Monachini -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:mlc="http://www.cs.vassar.edu/~ide/rdf/isle-schema-v.6#" xmlns:mlc="http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#">
Classes:
CorrespSynUSemU: this class links a SynU to a SemU.
PredicativeCorresp: this class contains the associations between the syntactic slots and the semantic arguments.
SlotArgCorresp: this class links a syntactic slot to a semantic argument.

101 An RDF Schema for the synt-sem linking: Properties
hasSourceSynU, hasTargetSemU, hasPredicativeCorresp, includesSlotArgCorresp
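The classes and properties of the synt-sem linking schema map naturally onto plain record types. This sketch uses Python dataclasses, with snake_case field names standing in for the RDF properties (has_source_synu for hasSourceSynU, and so on). The SynU/SemU identifiers are hypothetical. The slot-argument pairs follow the Arg0Slot0/Arg1Slot1 objects of slide 103.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SlotArgCorresp:
    """Links one syntactic slot to one semantic argument."""
    slot_number: int
    arg_number: int


@dataclass
class PredicativeCorresp:
    """Groups the slot-to-argument associations (includesSlotArgCorresp)."""
    includes_slot_arg_corresp: List[SlotArgCorresp] = field(default_factory=list)

    def arg_for_slot(self, slot: int) -> Optional[int]:
        """Return the semantic argument linked to a slot, or None if unlinked."""
        for sac in self.includes_slot_arg_corresp:
            if sac.slot_number == slot:
                return sac.arg_number
        return None


@dataclass
class CorrespSynUSemU:
    """Links a SynU (hasSourceSynU) to a SemU (hasTargetSemU)."""
    has_source_synu: str
    has_target_semu: str
    has_predicative_corresp: PredicativeCorresp


# Hypothetical identifiers; slot 0 -> arg 0 and slot 1 -> arg 1 as in slide 103.
pairs = PredicativeCorresp([SlotArgCorresp(0, 0), SlotArgCorresp(1, 1)])
link = CorrespSynUSemU("SynU_amare_1", "SemU_amare_1", pairs)
print(link.has_predicative_corresp.arg_for_slot(1))  # 1
```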

102 The library of Pre-instantiated objects
Enables modular specification of lexical entities: eliminates redundancy; identifies lexical entries or sub-entries with shared properties; creates ready-to-use packages that can be combined in different ways.
Can be used “off the shelf” or as a departure point for the definition of new or modified categories.

103 MDCR for some objects
<!-- Sample LDCR entry for PredicativeCorresp and SlotArgCorresp objects
     DataCats for ISLE lexical entries
     v 0.1 2004/05/17
     Author: Monachini -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" …
…
<includesSlotArgCorresp rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/SlotArgCorresp#Arg0Slot0"/>
<includesSlotArgCorresp rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/SlotArgCorresp#Arg1Slot1"/>
</PredicativeCorresp>
<SlotArgCorresp rdf:ID="Arg0Slot0" SlotNumber="0" ArgNumber="0"></SlotArgCorresp>
<SlotArgCorresp rdf:ID="Arg1Slot1" SlotNumber="1" ArgNumber="1"></SlotArgCorresp>
</rdf:RDF>
[Labels: pre-instantiated PredicativeCorresp; pre-instantiated SlotArgCorresp]
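The pre-instantiated SlotArgCorresp objects can be loaded with xml.etree.ElementTree, as the sketch below shows. It uses well-formed attribute syntax (ArgNumber="0"), which any XML parser requires. It assumes the classes live in the synt-sem linking namespace declared in slide 100, since the transcript elides the actual namespace declarations of this file.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the classes are in the synt-sem linking schema namespace of slide 100.
SSL = "http://www.ilc.cnr.it/clips/rdf/isle-schema-syntsemlinking_v.1#"

MDCR = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{SSL}">
  <SlotArgCorresp rdf:ID="Arg0Slot0" SlotNumber="0" ArgNumber="0"/>
  <SlotArgCorresp rdf:ID="Arg1Slot1" SlotNumber="1" ArgNumber="1"/>
</rdf:RDF>"""


def slot_arg_table(xml_text):
    """Map each pre-instantiated SlotArgCorresp ID to its (slot, argument) pair."""
    root = ET.fromstring(xml_text)
    return {
        # rdf:ID is namespaced; SlotNumber/ArgNumber are plain (unqualified) attributes.
        sac.get(f"{{{RDF}}}ID"): (int(sac.get("SlotNumber")), int(sac.get("ArgNumber")))
        for sac in root.findall(f"{{{SSL}}}SlotArgCorresp")
    }


print(slot_arg_table(MDCR))  # {'Arg0Slot0': (0, 0), 'Arg1Slot1': (1, 1)}
```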

104 A Sample Entry in MILE
The entry is shown in two alternative forms:
1. with the full specification of a PredicativeCorresp lexical object;
2. with an already instantiated PredicativeCorresp object.
The advantage is that the object does not need to be specified in the entry, and can be used and reused in other entries. This explores the potential of MILE for the representation of lexical data.

105 Sample full entry for amareV
… </hasSynu>
[Label: the “full” PredicativeCorresp object]

106 … the abbreviated entry
<hasPredicativeCorresp rdf:resource="http://www.ilc.cnr.it/clips/rdf/isle-datacats/PredicativeCorresp#isobivalent"/>
[Label: instantiated PredicativeCorresp object]
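The abbreviated entry works because its reference can be resolved against the library of pre-instantiated objects. The sketch below is minimal: an in-memory dictionary stands in for the library published at www.ilc.cnr.it/clips/rdf/. The contents assigned to isobivalent (slots 0 and 1 linked to arguments 0 and 1) and the entry layout are assumptions.

```python
import copy

BASE = "http://www.ilc.cnr.it/clips/rdf/isle-datacats/PredicativeCorresp#"

# Hypothetical stand-in for the library of pre-instantiated objects;
# the (slot, argument) pairs given for "isobivalent" are assumed.
LIBRARY = {
    BASE + "isobivalent": {"slotArg": [(0, 0), (1, 1)]},
}


def expand(entry, library):
    """Replace each value that is a known library URI by a deep copy of that object."""
    return {
        prop: copy.deepcopy(library[value]) if isinstance(value, str) and value in library else value
        for prop, value in entry.items()
    }


abbreviated = {"lemma": "amare", "hasPredicativeCorresp": BASE + "isobivalent"}
full = expand(abbreviated, LIBRARY)
print(full["hasPredicativeCorresp"])  # {'slotArg': [(0, 0), (1, 1)]}
```

Copying on expansion keeps the shared library object intact if one expanded entry is later edited.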

107 The RDF Schema, the DCR for MILE objects and the entries are available at www.ilc.cnr.it/clips/rdf/

108 And INTERA? …
INTERA Multilingual Terminological Lexica will follow and merge the two frameworks: MILE and ISO TMF (Terminological Markup Framework).

109 Beyond MILE: future work
The MILE Lexical Model is oriented towards an Open Distributed Lexical Infrastructure: Lexical Information Servers for multiple access to lexical information repositories.
Enhance user-adaptivity, resource sharing and cooperative creation.
Develop integration and interchange tools.

110 Broadening MILE: ... other languages
Ongoing enlargement to Asian languages (Chinese, Japanese, Korean, Thai, Hindi...), to promote common initiatives between Asia and Europe (e.g. within the EU 6th FP).
The creation of an Open Distributed Lexical Infrastructure, also supported by Asian institutions: AFNLP; University of Tokyo (Dept. of Computer Science); KAIST and KORTERM (Korea); Academia Sinica (Taiwan).
… world-wide: to valorise results and increase the visibility of LR and standardisation initiatives in a world-wide context, while concretely promoting the launch of a new common platform for multilingual LR creation and management.

111 Using semantically tagged corpora to …
acquire semantic information and enhance lexicons; evaluate the disambiguating power of the semantic types of the lexicon; assess the need to integrate lexicons with attested senses and/or phraseology; identify inadequate sense distinctions in lexicons; check the actual frequency of known senses in different text types; get a more precise and complete view of the semantics of a lemma; identify the most general senses; capture the most specific shifts of meaning.
Capture just the core, basic distinctions in a core lexicon. Corpus analysis must not lead to excessive granularity of sense distinctions, but must draw a distinction between: sense discrimination, to be kept “under control” through clustering (manual or automatic); and additional, more granular information (often collocational in nature), which can or must be acquired and encoded within the broader senses, e.g. to help translation.

112 … Dynamic lexicon
Current computational lexicons (even WordNets) are static objects, still shaped on traditional dictionaries and suffering from the limitations induced by paper support.
Thinking about the complex relationships between lexicon and corpus leads towards a flexible model of a dynamic lexicon: one that extends the expressiveness of a core static lexicon, adapting to the requirements of language in use as attested in corpora, with semantic clustering techniques, etc.
Convert the extreme flexibility and multidimensionality of meaning into large-scale, exploitable (VIRTUAL?) resources: a Lexicon and Corpus together.

113 What to annotate?
A mix of: word-sense annotation (implicit semantic markup); semantic/conceptual markup; …; syntagmatic relations (dependency relations, semantic roles, …)

114 Need for a common Encoding Policy?
Agree on common policy issues? Is it feasible? Desirable? To what extent?
This would imply, among other things: an analysis of needs (including applicative/industrial needs) before any large development initiative; basing semantic tagging on commonly accepted standards/guidelines?? Up to which level?
A common semantic tagset: Gold Standard??
Build a core set of semantically tagged corpora, encoded in a harmonised way, for a number of languages?? Make annotated corpora available to the community at large; involve the community, and collect and analyse existing semantically tagged corpora; devise a common set of parameters for analysis.

115 A few issues for discussion: MILE & lexicon standards
More standardisation initiatives? MILE: a general schema for encoding multilingual lexical information, as a meta-entry, as a common representational layer.
Short- and medium-term requirements with respect to standards for multilingual lexicons and content encoding, including industrial requirements.
Relation with the spoken language community (see ELRA).
Semantic Web standards and the needs of content processing technologies: the importance of reaching consensus on (linguistic and non-linguistic) “content”, in addition to agreement on formats and encoding issues (…words convey content and knowledge).
Define the further steps necessary to converge on common priorities.

116 NLP, lexicons, terminologies, ontologies, Semantic Web: a continuum?
Knowledge management is critical. For “content” interoperability, we need to converge around agreed standards also for the semantic/conceptual level. Is the field ‘mature’ enough to do so (e.g. to automatically establish links among different languages)? Is the field of multilingual lexical resources ready to tackle the challenges set by the development of the Semantic Web?
Foster better integration with corpus-driven data; the terminology/ontology/Semantic Web communities; multimodal and multimedia aspects.
Broadening MILE: ... other communities
Oriented towards open, distributed lexical resources: Lexical Information Servers for multiple access to lexical information repositories.

117 A few issues for discussion: NLP, lexicons, content, ontologies, Semantic Web: … a continuum?
Need for robust systems able to acquire/tune multilingual lexical/linguistic/conceptual knowledge and to auto-enrich static basic resources.
Relation between lexical standards, acquisition, and text annotation protocols.

118 Target….. Multilingual Knowledge Management
Technical feasibility. Prerequisite: is a commonly agreed text/lexicon annotation protocol, also for the semantic/conceptual level (to be able to automatically establish links among different languages), an achievable goal?
Yes, at the lexical level. More complex for corpus annotation? EAGLES/ISLE

119 HLT
Natural convergence with HLT: multilingual semantic processing; ontologies; semantic-syntactic computational lexicons.
To make the Semantic Web a reality... …we need to tackle the twofold challenge of content availability and multilinguality.

120 … enables a new role for multilingual lexicons: to become an essential component of the Semantic Web
Language (and lexicons) are the gateway to knowledge.
Semantic Web developers need repositories of words and terms, and knowledge of their relations in language use and of their ontological classification.
The cost of adding this structured, machine-understandable lexical information can be one of the factors that delays the full deployment of the Semantic Web.
The effort of making millions of ‘words’ available for dozens of languages is something that no small group can afford.
A radical shift in the lexical paradigm, whereby many participants add linguistic content descriptions in an open distributed lexical framework, is required to make the Web usable.

121 Beyond MILE: next steps... …towards an Open Distributed Lexical Infrastructure
Create a first repository of shared lexical entries “extracted” from different lexical resources and mapped to MILE (choosing, e.g., lexical entries in areas related to the Olympic Games), to test the mapping of different lexicon models to MILE.
Provide a grid with all the ISLE Basic Notions, short descriptions, attributes and sub-elements, to be filled with the corresponding “notions”.
Create a list (Open Lexicon Interest Group)...
Enhance user-adaptivity, resource sharing, cooperative creation and management.
Lexical Information Servers for multiple access to lexical information repositories.
[Figure labels: Language; Knowledge]

122 A new paradigm for a “new generation” of LRs?
New strategic vision: towards a Distributed Open Lexical Infrastructure.
Focus on cooperation, also between different communities, for the distributed and cooperative creation, management, etc. of Lexical Resources.
MILE as a common platform.
Technical and organisational requirements.

123 Beyond MILE: towards open & distributed lexicons
[Figure: a Monolingual/Multilingual Lexicon linked by URI to distributed resources: Semantic Lexicon (URI = http://www.xxx…), Syntactic Constructions (URI = http://www.yyy…), Ontology (URI = http://www.zzz…), corpora. Examples: Lex_object: semFeature, URI = http://www.xxx…#HUMAN; Lex_object: syntagmaNT, URI = http://www.zzz…#NP]

124 A few issues for the future...
Integration between WLR/SLR/MMR (see e.g. LREC); integration between LRs and the SemWeb; integration of lexicons/terminologies/ontologies: towards Knowledge Resources; multilingual resources: an open infrastructure; integration of lexicon/corpus (see e.g. FrameNet); parallel evolution of LRs and language technology.

125 From Computational Lexicons to Knowledge Resources
A unified framework for lexicons, ontologies, terminologies, etc.
Towards an open, distributed infrastructure for lexical resources: Lexical Information Servers, flexible and extensible, integrated with multimodal and multimedia data, integrated with Web technology.
Related initiatives: INTERA, ICWLRE.

126 …with world-wide participation
Looking for an appropriate call… pushing to launch an Open & Distributed Lexical Infrastructure for content description and content interoperability, to make lexical resources usable within the emerging Semantic Web scenario.
[Label: for Language Resources & Semantic Web…]

127 How to move to a framework allowing incremental creation/merging/…
How to: “organise” the creation/acquisition of multilingual LRs, evaluating different models; cope with/affect maintenance; organise technology transfer among languages; support BLARK (a commonly agreed list of minimal requirements for “national” LRs); launch an international initiative linking the Semantic Web and LRs; bootstrap this by “opening” a few LRs; the role of standards.

128 Lexical WEB & Content Interoperability
As a critical step for semantic mark-up in the SemWeb.
[Figure: ComLex, SIMPLE, WordNets, FrameNet, NomLex, Lex_x, Lex_y connected through MILE, with intelligent agents??]

129 A new paradigm for a “new generation” of LRs?
[Figure: Semantic Lexicon (http://www.xxx…), Syntactic Lexicon (http://www.yyy…), Ontology (http://www.zzz…) and corpora, connected by cross-lingual links]

