

N. Calzolari, PhD course (Dottorato), Pisa, May 2009. Nicoletta Calzolari, Istituto di Linguistica Computazionale, CNR, Pisa: Language Resources.




1 Nicoletta Calzolari, Istituto di Linguistica Computazionale, CNR, Pisa. Language Resources (lexicons, corpora, ontologies, …): standards and language technologies. With many others at ILC.

2 Why were such needed LRs still lacking after 30 years of R&D in the field? (Old slide with Antonio Zampolli, 80s/early 90s.) 1) Because the main trend until the mid-80s was to privilege the processing of critical phenomena studied by the dominating linguistic theories, rather than focusing on a deep analysis of the real uses of a language. As a result, CL focused on: few examples, often artificially built; lexicons made of few entries (toy lexicons); grammars with poor coverage. 2) Because large-scale LRs are costly and their production requires a big organizing effort. Why do we still lack them??

3 Historical notes. Work on Machine Readable Dictionaries: the beginnings… Pioneering research, after many years of complete disregard, or even disdain and contempt, for LRs, due mainly to the prevalence and influence of the generativist school. Early interest: to make MRDs machine-tractable; to extract information from them, with much less powerful tools than now. A precursor of the trend of automatic acquisition from corpora. Acquilex (Pisa et al.); work on/with the Longman dictionary (Las Cruces); NSF & EC International Cooperation grant, promoted by Wilks, Zampolli, Calzolari (Las Cruces & Pisa); Don Walker & Antonio Zampolli.

4 … back from the 70s/80s. Automatic acquisition of lexical information from MRDs was at the centre of the activities of the Pisa group, Amsler, Briscoe, Boguraev, the Wilks group, IBM, then Japanese groups, … The trend was large-scale computational methods for transforming machine-readable dictionaries (MRDs) into machine-tractable dictionaries. It became evident that part of the results of meaning extraction, e.g. many meaning distinctions that could be generalised over lexicographic definitions and automatically captured, were unmanageable at the formal representation level and had to be blurred into unique features and values. Unfortunately, it is still difficult today to constrain word meanings within a rigorously defined organization: by their very nature they tend to evade any strict boundaries.

5 Data-driven approaches. After that pioneering era, production and use of adequate LRs strongly increased. The lexicon has become ever more relevant. Both international and national authorities started investing in the field as never before, interested in technologies and systems that really work and are economically interesting. The need for empirical methods, based on the analysis of large amounts of data, has been recognized: LRs must be robust enough to analyse the concrete uses of a language, whether theoretically interesting or not.

6 Since then … LRs have acquired larger resonance in the last 2 decades, when many activities, in Europe and world-wide, have contributed to substantial advances in knowledge and capability of how to represent, create, acquire, access, exploit, harmonise, tune, maintain, distribute, etc. large lexical and textual repositories. In Europe an essential role was played by the EC, through initiatives (NERC, PAROLE, SIMPLE, EuroWordNet, EAGLES, ISLE, ELSNET, RELATOR, …) that saw the participation of many EU groups, linked over the years by sharing common approaches and visions.

7 After acquisition from MRDs, … back from the late 80s: automatic acquisition of information from texts. This trend has become a consolidated fact today, and we have moved from focusing on the acquisition of linguistic information (as at the beginning) to the broader acquisition of general knowledge, with more data-intensive, robust, reliable methods.

8 We started building LRs (lexicons/corpora) as a necessary infrastructure both for research and for applications: LRs give NLP systems the knowledge needed for the various stages of linguistic processing. We realised that most of the needed information escapes individual introspection and can only be acquired by analysing large textual corpora attesting language use in different fields/communicative contexts, BUT adequate models are needed to handle actual language usage. Sub-product?: the importance of statistical methods. Lesson: going from core sets to large coverage has implications not just in quantitative terms but, more interestingly, in terms of changes to the models and to the processing strategies.

9 What have we (LT & LR) been assembling for many years? Components of a very large infrastructure of LRs & LT: lexicons & their ontologies (written, spoken, ItalWordNets, PAROLE/SIMPLE, FrameNets, …); annotated corpora/treebanks; basic tools; an integrated architecture for annotation at various levels (from morphological to conceptual), acquisition/learning, classification, ontology creation, …; methodologies; know-how & expertise; infrastructural bodies (on which to build); standards …

10 History: some international LR initiatives: ACQUILEX [since 88], MULTILEX, ET-7, ET-10, TEI, NERC, RELATOR, ONOMASTICA, MULTEXT, COLSIT, LSGRAM, DELIS, EAGLES, PAROLE, SIMPLE, SPARKLE, ELSNET, EuroWordNet, MATE, NITE, Cluster 488 (Italian), TAL (Italian), ISLE, ENABLER, INTERA, LIRICS, … Senseval/Semeval, WRITE, Forum TAL (Italian), … ISO, ELRA, LREC, LRE Journal, NEDO, Language Grid, BootStrep, KYOTO, … The EC played an essential role in starting a basic infrastructure: the EU was at the forefront in the areas of LRs and standards in the 90s and established a model.

11 Today: a broad potential infrastructure. Cooperative initiatives at EU, international and national level, with links to: RELATOR, EAGLES/ISLE, ENABLER, ELSNET, TELRI, INTERA, LIRICS, …, ELRA, BLARK, Unified Lexicon (W/S), LREC, LRE journal, …, ERANET-LangNet, …; LDC & others, ISO, COCOSDA/WRITE, US Cyberinfrastructure, Japan COE21, NEDO Language Grid, …; FLaReNet (ICT), CLARIN (ESFRI). Vitality & success signs… for LRs.

12 WordNets: synsets linked by semantic relations. Example: {casa, abitazione, dimora} ('house, dwelling, abode'). Hyperonym: {edificio, ..} ('building'). Hyponym: {villetta} ('small villa'), {catapecchia, bicocca, ..} ('hovel, shack'), {cottage}, {bungalow}. Role_location: {stare, abitare, ...} ('stay, live'). Role_target_direction: {rincasare} ('return home'). Role_patient: {affitto, locazione} ('rent, lease'). Mero_part: {vestibolo} ('vestibule'), {stanza} ('room'). Holo_part: {casale}, {frazione}, {caseggiato}. Equivalents: {home, domicile, ..}, {house}. TOP Concepts: Object, Artifact, Building.
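To make the synset/relation structure concrete, here is a minimal Python sketch of the casa example. Synsets are modelled as frozen sets of lemmas and relations as (source, relation, target) triples, using the relation names of the slide; the automatic addition of inverse links and the helper names (`build_index`, `synsets_of`) are our assumptions, not the actual ItalWordNet format.

```python
from collections import defaultdict

# Toy fragment of the ItalWordNet "casa" example: a synset is a set of
# synonymous lemmas, and semantic relations hold between synsets.
Synset = frozenset

casa = Synset({"casa", "abitazione", "dimora"})
edificio = Synset({"edificio"})
villetta = Synset({"villetta"})
catapecchia = Synset({"catapecchia", "bicocca"})
stanza = Synset({"stanza"})
caseggiato = Synset({"caseggiato"})
ALL_SYNSETS = [casa, edificio, villetta, catapecchia, stanza, caseggiato]

# Relations as stated on the slide, from the point of view of casa.
RELATIONS = [
    (casa, "hyperonym", edificio),
    (casa, "hyponym", villetta),
    (casa, "hyponym", catapecchia),
    (casa, "mero_part", stanza),
    (casa, "holo_part", caseggiato),
]

# Inverse pairs (assumption: only the two pairs needed here are covered).
INVERSE = {"hyperonym": "hyponym", "hyponym": "hyperonym",
           "mero_part": "holo_part", "holo_part": "mero_part"}


def build_index(triples):
    """Index relations by source synset, adding the inverse of each link."""
    index = defaultdict(lambda: defaultdict(list))
    for src, rel, tgt in triples:
        index[src][rel].append(tgt)
        if rel in INVERSE:
            index[tgt][INVERSE[rel]].append(src)
    return index


def synsets_of(lemma):
    """Return every synset that contains the lemma."""
    return [s for s in ALL_SYNSETS if lemma in s]


index = build_index(RELATIONS)
for syn in synsets_of("dimora"):
    # prints: ['abitazione', 'casa', 'dimora'] -> [['edificio']]
    print(sorted(syn), "->", [sorted(t) for t in index[syn]["hyperonym"]])
```

Because relations are attached to the synset rather than to single words, all three synonyms share the same hyperonym link.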

13 ItalWordNet semantic network [Italian module of EuroWordNet]: ~ lemmas organized in synonym groups (synsets), structured in hierarchies and linked by ~ semantic relations: hyperonymy/hyponymy relations; relations among different POS (role, cause, derivation, etc.); part-whole relations; antonymy relations, etc. Synsets are linked to the InterLingual Index (ILI = Princeton WordNet); through the ILI they are linked to all the European WordNets (de-facto standard) and to the common Top Ontology. Usable in IR, CLIR, IE, QA, ... Possibility of plug-in with domain terminological lexicons (legal, maritime, …, linguistic).
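The ILI linking can be sketched as follows, assuming each synset points to exactly one ILI record. The synset and ILI identifiers are hypothetical placeholders, and the real EuroWordNet model also distinguishes several kinds of (near-)equivalence, which this sketch leaves out.

```python
# Sketch of ILI-based linking between wordnets. The ILI record ids
# ("ILI-house", ...) and the synset ids are hypothetical placeholders.
ITALIAN = {
    "it:casa": {"lemmas": {"casa", "abitazione", "dimora"}, "ili": "ILI-house"},
    "it:edificio": {"lemmas": {"edificio"}, "ili": "ILI-building"},
}
ENGLISH = {
    "en:house": {"lemmas": {"house"}, "ili": "ILI-house"},
    "en:home": {"lemmas": {"home", "domicile"}, "ili": "ILI-home"},
    "en:building": {"lemmas": {"building"}, "ili": "ILI-building"},
}

# In this sketch Top Ontology concepts are attached to ILI records, so every
# wordnet linked to a record inherits the same classification.
TOP_CONCEPTS = {"ILI-house": {"Object", "Artifact", "Building"}}


def equivalents(synset_id, source, target):
    """Synsets in `target` that share the ILI record of `synset_id`."""
    ili = source[synset_id]["ili"]
    return [sid for sid, syn in target.items() if syn["ili"] == ili]


def top_concepts(synset_id, wordnet):
    """Top Ontology concepts reached through the synset's ILI record."""
    return TOP_CONCEPTS.get(wordnet[synset_id]["ili"], set())


print(equivalents("it:casa", ITALIAN, ENGLISH))  # ['en:house']
print(sorted(top_concepts("it:casa", ITALIAN)))  # ['Artifact', 'Building', 'Object']
```

The design point is that no language links directly to another: every wordnet links only to the ILI, so adding a new language requires one mapping instead of one per language pair.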

14 ItalWordNet: clusters of Base Concepts classified according to the Ontology Top Concepts. Top Concepts = words = features. Lexicon or ontology???

15 EWN Top-Ontology, as used in ItalWordNet. 1stOrderEntity: Origin (Natural (Living (Plant, Human, Creature, Animal)), Artifact); Form (Substance (Solid, Liquid, Gas), Object); Composition (Part, Group); Function (Vehicle, Representation (MoneyRepresentation, LanguageRepresentation, ImageRepresentation), Software, Place, Occupation, Instrument, Garment, Furniture, Covering, Container, Comestible, Building). 2ndOrderEntity: SituationType (Dynamic (BoundedEvent, UnboundedEvent), Static (Property, Relation)); SituationComponent (Cause (Agentive, Phenomenal, Stimulating), Communication, Condition, Existence, Experience, Location, Manner, Mental, Modal, Physical, Possession, Purpose, Quantity, Social, Time, Usage). 3rdOrderEntity.

16 EWN/IWN Ontology top nodes. 1stOrderEntity: any concrete entity (publicly) perceivable by the senses and located at any point in time, in a three-dimensional space. 2ndOrderEntity: any static situation (property, relation) or dynamic situation, which cannot be grasped, heard, seen or felt as an independent physical thing. They can be located in time and occur or take place rather than exist; e.g. continue, occur, apply. 3rdOrderEntity: an unobservable proposition which exists independently of time and space. They can be true or false rather than real; they can be asserted or denied, remembered or forgotten; e.g. idea, thought, information, theory, plan.
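A small Python sketch of how a Base Concept, described by a combination of Top Concepts (as for casa: Object, Artifact, Building), can be placed in one of the three entity orders. The parent map encodes only a fragment of the Top Ontology listed on slide 15; the rule that a consistent description must not mix orders is our assumption.

```python
# Fragment of the EWN Top Ontology as a child -> parent map.
PARENT = {
    "Origin": "1stOrderEntity", "Form": "1stOrderEntity",
    "Composition": "1stOrderEntity", "Function": "1stOrderEntity",
    "Natural": "Origin", "Living": "Natural", "Human": "Living",
    "Artifact": "Origin", "Substance": "Form", "Liquid": "Substance",
    "Object": "Form", "Part": "Composition",
    "Building": "Function", "Instrument": "Function", "Comestible": "Function",
    "SituationType": "2ndOrderEntity", "SituationComponent": "2ndOrderEntity",
    "Dynamic": "SituationType", "Static": "SituationType",
    "BoundedEvent": "Dynamic", "Property": "Static",
    "Cause": "SituationComponent", "Agentive": "Cause",
}
ORDERS = {"1stOrderEntity", "2ndOrderEntity", "3rdOrderEntity"}


def order_of(top_concept):
    """Climb the hierarchy until one of the three entity orders is reached."""
    node = top_concept
    while node not in ORDERS:
        node = PARENT[node]  # KeyError for concepts outside this fragment
    return node


def classify(top_concepts):
    """Entity order of a Base Concept described by a set of Top Concepts.

    Assumption (ours): a consistent description uses a single order.
    """
    orders = {order_of(tc) for tc in top_concepts}
    if len(orders) != 1:
        raise ValueError(f"mixed entity orders: {sorted(orders)}")
    return orders.pop()


print(classify({"Object", "Artifact", "Building"}))  # 1stOrderEntity
print(classify({"BoundedEvent", "Agentive"}))         # 2ndOrderEntity
```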

17 EuroWordNet Multilingual Data Structure. [Figure: language modules, e.g. English, …]

18 Terminological WordNets, e.g. Jur-WordNet. Jur-WordNet: an extension of ItalWordNet for the juridical domain (with ITTIG-CNR, Istituto di Teoria e Tecniche dell'Informazione Giuridica). A knowledge base for multilingual access to sources of legal information, and a source of metadata for the semantic markup of legal texts. To be used, together with the generic ItalWordNet, in applications of Information Extraction, Question Answering, Automatic Tagging, Knowledge Sharing, Norm Comparison, etc.

19 Terminological Lexicon of Navigation (example entry: Nolo). [Table: synsets, lemmas and senses for nouns, verbs, adjectives and proper nouns; only the values 205 and 35 survive.]

20 SIMPLE Lexicon & Ontology: a multidimensional type hierarchy, shared by 12 European languages. Theoretical background: Generative Lexicon (Pustejovsky). 157 language-independent SIMPLE semantic types, based on hierarchical and non-hierarchical conceptual relations. The types differ in internal complexity: simple types (one-dimensional) are characterised in terms of hyperonymic relations; unified types (multi-dimensional) are only definable through the combination of the relation to their supertype + the reference to orthogonal dimensions of meaning (through the Qualia Structure).
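One way to picture the simple/unified distinction is a type object that either stands alone or adds orthogonal qualia dimensions to a supertype. The sketch below is hypothetical: treating Artifact as Concrete_entity plus an ArtifactAgentive dimension, and composing a path by appending new dimensions to the supertype's path, are our assumptions, chosen so that the output matches the unification paths shown for Instrument and Disease in the semantic entries of this presentation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SemanticType:
    name: str
    supertype: Optional[SemanticType] = None
    # Orthogonal qualia dimensions this type adds; empty for simple types.
    dimensions: Tuple[str, ...] = ()

    def __post_init__(self):
        if self.dimensions and self.supertype is None:
            raise ValueError("a unified type needs a supertype")

    @property
    def is_unified(self) -> bool:
        return bool(self.dimensions)

    def unification_path(self) -> list[str]:
        # Simple types are identified by their hyperonymic position alone.
        if not self.is_unified:
            return [self.name]
        # Unified types: the supertype's path plus the newly added dimensions.
        return self.supertype.unification_path() + list(self.dimensions)


# Hypothetical reconstruction of a few SIMPLE types.
concrete_entity = SemanticType("Concrete_entity")
artifact = SemanticType("Artifact", concrete_entity, ("ArtifactAgentive",))
instrument = SemanticType("Instrument", artifact, ("Telic",))
event = SemanticType("Event")
phenomenon = SemanticType("Phenomenon", event)
disease = SemanticType("Disease", phenomenon, ("Agentive",))

print(" | ".join(instrument.unification_path()))  # Concrete_entity | ArtifactAgentive | Telic
print(" | ".join(disease.unification_path()))     # Phenomenon | Agentive
```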

21 PAROLE-SIMPLE-CLIPS Lexicon: …a harmonised model for 12 European languages.

22 Overall organization... [Figure: the SemU with its predicative layer (predicate, arguments, selection restrictions), Qualia, Derivation, Polysemy, Event Type, Instantiation, …; a Type Ontology (150 types) and Templates shared by the Italian, Catalan, Danish, Greek, … lexicons.]

23 Model architecture. The first three levels: information content. Phonological Unit: stress position, vowel openness, consonant pronunciation. Morphological Unit: PoS (& PoS subcategory), inflectional paradigm. Syntactic Unit: a Frameset of syntactic structures (Synt. Struct 1, Synt. Struct 2, …), each with a. head properties and b. a subcategorization frame (position list, position restrictions, syntactic arguments); syntactic behaviour. Correspondences: PhnU-MrphU and MrphU-SynU.

24 The semantic level: information types. Semantic Unit. FEATURES: ontological type, domain, event type, semantic properties. RELATIONS AMONG SEMUS: Extended Qualia Structure, synonymy, regular polysemy alternation, derivation. Predicative representation: lexical predicate; arguments (semantic role, semantic restriction). Link to the syntactic unit.

25 Aumento ('increase'). SEMANTIC ENTRY CONTENT. Gloss: accrescimento in dimensione o quantità ('growth in size or quantity'); example: l'aumento dei prezzi di un venti% ('the increase in prices by 20%'). ONTOLOGICAL INFO: semantic type: Cause_change_of_value; supertype: Cause_relational_change; event type: transition; domain: general, economics; agentive cause: yes. EXTENDED QUALIA INFO: aumento isa cambiamento ('change'); aumento resulting_state maggiore ('greater'); direction: up; morphological derivation: eventverb aumentare. PREDICATIVE REPRESENTATION: semantic predicate: PRED_aumentare, 3 arguments; type of link: event nominalization; arguments description (range, semantic role & selectional restriction): Arg0 Protoagent Human/Institution; Arg1 ProtoPatient Entity; Arg2 Quantifier Amount.

26 Semantic entry: vaporizzatore ('spray'). Fields: ontological type, event type, domain information, qualia features (Extended Qualia Structure), regular polysemy, predicative representation. Ontological type: semantic type: Instrument; unification_path: [Concrete_entity | ArtifactAgentive | Telic]. Event type: =====. Domain: cleaning, gardening, cosmetics. Extended Qualia Structure: USem3527vaporizzatore isa Usem3479apparecchio; USem3527vaporizzatore has_as_part Usem61633pulsante; USem3527vaporizzatore created_by UsemD387fabbricare; USem3527vaporizzatore used_for UsemD66019nebulizzare. Regular polysemy: =====. Free definition: apparecchio usato per vaporizzare ('device used to vaporize'); example: un vaporizzatore per piante ('a plant sprayer'). Semantic relations: USem3527vaporizzatore synonymy USem72288nebulizzatore; USem3527vaporizzatore instrumentverb Usem5239vaporizzare. Predicative representation: semantic predicate: PRED_vaporizzare-1; type of link: instrument nominalization; arguments description (range, semantic role, selectional restrictions): arg0_vaporizzare_1 Protoagent Human/Instrument; arg1_vaporizzare_1 Protopatient +liquid; arg2_vaporizzare_1 Location Concrete_entity. Summary view: semantic type: instrument; supertype: artifact; domain: cleaning, gardening, cosmetics; semantic class: apparatus; synonymy: nebulizzatore; morpho. derivation: eventverb vaporizzare; formal: vaporizzatore isa apparecchio ('device'); constitutive: vaporizzatore has_as_part pulsante ('button'); agentive: vaporizzatore created_by fabbricare ('manufacture'); telic: vaporizzatore used_for atomizzare; semantic predicate: PRED_vaporizzare; type of link: instrument nominalization; arg0 Protoagent Human/Instrument; arg1 Protopatient +liquid; arg2 Location Concrete_entity. From Nilda Ruimy.
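The vaporizzatore entry can be sketched as a Python record that keeps qualia and non-qualia relations together and filters them by role on demand. Field names, the class layout and the partial relation-to-role table are illustrative assumptions, not the SIMPLE/CLIPS encoding format; the values are copied from the entry above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Partial relation -> qualia role table (assumed subset of the Extended Qualia Structure).
QUALIA_ROLE = {
    "isa": "formal",
    "has_as_part": "constitutive",
    "made_of": "constitutive",
    "created_by": "agentive",
    "used_for": "telic",
}


@dataclass
class Argument:
    name: str
    role: str
    restriction: str


@dataclass
class Predicate:
    name: str
    link_type: str
    arguments: List[Argument]


@dataclass
class SemU:
    semu_id: str
    semantic_type: str
    domains: List[str]
    # (relation, target SemU id); qualia and non-qualia relations together
    relations: List[Tuple[str, str]] = field(default_factory=list)
    predicate: Optional[Predicate] = None

    def qualia(self, role: str) -> List[Tuple[str, str]]:
        """Relations of this SemU that belong to the given qualia role."""
        return [(r, t) for r, t in self.relations if QUALIA_ROLE.get(r) == role]


vaporizzatore = SemU(
    semu_id="USem3527vaporizzatore",
    semantic_type="Instrument",
    domains=["cleaning", "gardening", "cosmetics"],
    relations=[
        ("isa", "Usem3479apparecchio"),
        ("has_as_part", "Usem61633pulsante"),
        ("created_by", "UsemD387fabbricare"),
        ("used_for", "UsemD66019nebulizzare"),
        ("synonymy", "USem72288nebulizzatore"),
        ("instrumentverb", "Usem5239vaporizzare"),
    ],
    predicate=Predicate(
        "PRED_vaporizzare", "instrument nominalization",
        [Argument("arg0", "Protoagent", "Human/Instrument"),
         Argument("arg1", "Protopatient", "+liquid"),
         Argument("arg2", "Location", "Concrete_entity")],
    ),
)

print(vaporizzatore.qualia("telic"))  # [('used_for', 'UsemD66019nebulizzare')]
```

Synonymy and instrumentverb are kept in the same list but are never returned by `qualia`, since they are not qualia relations.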

27 Semantic entry: USem79678regulate (biomedicine). Ontological type: semantic type: Cause_change_of_state; supertype: Cause_relational_change. Event type: transition. Domain: biomedicine. Qualia features: agentive_cause: yes; resulting_state: yes. Extended Qualia Structure: formal: Usem79678regulate isa Usem64875process; constitutive: =====; agentive: =====; telic: =====. Regular polysemy: =====. Predicative representation: semantic predicate: PRED_regulate-1; type of link: master; arguments description (range, semantic role, selectional restrictions): arg0_regulate_1 Protoagent Natural_Substance; arg1_regulate_1 Protopatient Natural_Substance. Free definition: regulation of a function or a physiological process; example: IL2 negatively regulates IL7. Semantic relations: synonymy: =====; morpho. derivation: =====. From Nilda Ruimy.

28 Semantic entry: UsemTH31676parotite ('parotitis, mumps'). Ontological type: semantic type: Disease; unification_path: [Phenomenon | Agentive]. Event type: =====. Domain: Ear-Nose-Throat. Qualia features: agentive_cause: yes. Extended Qualia Structure: USemTH31676parotite isa USem3868malattia ('disease'); USemTH31676parotite affects USem1788ghiandola ('gland'); USemTH31676parotite causes Usem72131gonfiore ('swelling'); USemTH31676parotite caused_by USem1971virus; USemTH31676parotite typical_of USem3593bambino ('child'). Regular polysemy: =====. Free definition: Infiammazione delle ghiandole parotidi ('inflammation of the parotid glands'); example: il bambino ha una parotite ('the child has mumps'). Semantic relations: USemTH31676parotite synonymy USem79528orecchione. From Nilda Ruimy.

29 Syntactic entry SYNU_regulateV. Head properties: verb; auxiliary: have; passivization: +. Subcategorization frame (syntactic arguments): P0: subject, mandatory, NP; P1: object, mandatory, NP. Example: NF-AT positively regulates IL2, which negatively regulates IL7. Link to Semantic Unit: USem79678regulate. From Nilda Ruimy.

30 Syntax-semantics mapping (1). Syntactic Unit: a Frameset with syntactic structure 1, syntactic structure 2, …, each with a. head properties and b. subcat. frame (position, synt. restr.). Semantic Unit: ontological type, domain, semantic class, event type, semantic features, semantic relations (synonymy, derivation), Extended Qualia Structure (formal, constitutive, agentive, telic roles), regular polysemy, predicative representation (predicate; arguments with sem. restr.; type of link). Correspondences: SynU-SemU and Syntax-Semantics. From Nilda Ruimy.

31 Regulate: Syntax-Semantics mapping. SYNTAX: subcategorization frame id: np-v-np; syntactic arguments: P0: subject, mandatory, NP; P1: object, mandatory, NP. SEMANTICS: predicative representation: semantic predicate: PRED_regulate-1; type of link: master; semantic arguments description (range, semantic role, select. restrictions): arg0_regulate_1 Protoagent Natural_Substance; arg1_regulate_1 Protopatient Natural_Substance. Synsem correspondence between the two levels. From Nilda Ruimy.

32 SYNTAX-SEMANTIC MAPPING. SYNTACTIC LEVEL: SynU_aumentare_V, Frameset: transitive structure (P0, P1, P2); intransitive structure (P0, P1). SEMANTIC LEVEL: semantic predicate PRED_aumentare_1 ('to increase'): ARG0: Agent, Entity; ARG1: Patient, Entity; ARG2: Undersc., Amount. Semantic units: SemU1_aumentare, Sem.Type: CAUSE_CHANGE_OF_VALUE; SemU2_aumentare, Sem.Type: CHANGE_OF_VALUE. Link predicate-semantic unit. From N. Ruimy.

33 SYNTAX-SEMANTIC MAPPING: correspondence between syntactic and semantic frames. PRED_aumentare: ARG0: Agent; ARG1: Patient; ARG2: Undersc. SynU_aumentare_V, Frameset: transitive structure (P0, P1, P2); intransitive structure (P0, P1). Isomorphic correspondence vs. non-isomorphic correspondence. Semantic units: SemU1_aumentare, SemU2_aumentare (CHANGE_OF_VALUE, CAUSE_CHANGE_OF_VALUE). From N. Ruimy.
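The isomorphic/non-isomorphic distinction can be sketched as a mapping from syntactic slots to semantic arguments. The two correspondences below are an assumed illustration (in the transitive structure the subject fills ARG0; in the intransitive one it fills ARG1), not a copy of the lexicon's encoding, and the example sentences are hypothetical.

```python
# Semantic arguments of PRED_aumentare (roles from the slide).
PRED_AUMENTARE = {"ARG0": "Agent", "ARG1": "Patient", "ARG2": "Undersc."}

# Assumed slot -> argument correspondences for the two structures of
# SynU_aumentare_V (illustration only).
TRANSITIVE = {"P0": "ARG0", "P1": "ARG1", "P2": "ARG2"}
INTRANSITIVE = {"P0": "ARG1", "P1": "ARG2"}


def is_isomorphic(correspondence):
    """True when every slot Pn maps to the argument with the same index ARGn."""
    return all(slot[1:] == arg[3:] for slot, arg in correspondence.items())


def interpret(correspondence, fillers):
    """Map syntactic fillers to (semantic argument, role, filler) triples."""
    result = []
    for slot, filler in fillers.items():
        arg = correspondence[slot]
        result.append((arg, PRED_AUMENTARE[arg], filler))
    return result


# Hypothetical sentences: "Gianni aumenta i prezzi" / "i prezzi aumentano".
print(interpret(TRANSITIVE, {"P0": "Gianni", "P1": "i prezzi"}))
print(interpret(INTRANSITIVE, {"P0": "i prezzi"}))
```

In both sentences i prezzi ends up as the Patient, even though it is the object in one and the subject in the other; this is what the single shared predicate captures.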

34 Relations and predicates: SemU Sell (V), SemU Sale (N) and SemU Seller (N), linked to Pred_SELL; relations: Event_noun, Is_the_agent_of.

35 Predicate - semantic unit(s) link & relations: PRED_ACCUSARE is linked to accusare ('to accuse', master), accusa ('accusation', process nominalisation), accusatore ('accuser', agent nominalisation) and accusato ('accused', patient nominalisation); relations: Is_the_agent_of, Event_noun. From Nilda Ruimy.
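Sharing one predicate among several SemUs lets lexical relations be read off the link types. This Python sketch uses the PRED_ACCUSARE links from the slide; the table that turns link types into Is_the_agent_of and Event_noun relations is a hypothetical rule of ours.

```python
# Semantic units linked to PRED_ACCUSARE, with their link types (from the slide).
PREDICATE_LINKS = {
    "PRED_ACCUSARE": [
        ("accusare", "master"),
        ("accusa", "process nominalisation"),
        ("accusatore", "agent nominalisation"),
        ("accusato", "patient nominalisation"),
    ],
}

# Assumed mapping from link types to the lexical relations shown on the slide.
DERIVED_RELATION = {
    "agent nominalisation": "Is_the_agent_of",
    "process nominalisation": "Event_noun",
}


def master_of(predicate):
    """The SemU whose link to the predicate is of type 'master'."""
    return next(lemma for lemma, link in PREDICATE_LINKS[predicate]
                if link == "master")


def derived_relations(predicate):
    """(noun, relation, verb) triples implied by the shared-predicate links."""
    master = master_of(predicate)
    return [(lemma, DERIVED_RELATION[link], master)
            for lemma, link in PREDICATE_LINKS[predicate]
            if link in DERIVED_RELATION]


for triple in derived_relations("PRED_ACCUSARE"):
    print(triple)
```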

36 The SIMPLE ontology: a multidimensional type hierarchy based on both hierarchical and non-hierarchical conceptual relations. In the SIMPLE ontology, types are not mere labels but the repository of a specific set of structured semantic information. From Nilda Ruimy.

37 The SIMPLE ontology: multidimensionality. [Figure: the type hierarchy under TOP, with the types ENTITY, CONCRETE_ENTITY, ABSTRACT_ENTITY, PROPERTY, REPRESENTATION, EVENT and the qualia-based dimensions TELIC, AGENTIVE (CAUSE), CONSTITUTIVE (PART, GROUP, AMOUNT). Lower types include: Location, Material, Artifact, Food, Physical Object, Organic Object, Living Entity, Substance; Quality, Psych Property, Physical Property, Social Property; Domain, Time, Moral Standards, Cognitive Fact, Movement of Thought, Institution, Convention, Abstract Location; Language, Sign, Information, Number, Unit of Measure, Metalanguage; Human, Animal, Vegetal Entity; Artifact Material, Furniture, Clothing, Container, Artwork, Instrument, Money, Vehicle, Semiotic Artifact; Aspectual, Cause Aspectual; Phenomenon (Weather verbs, Disease, Stimuli); State (Exist, Relational State); Act (Non-Relational Act, Relational Act, Move, Cause Act, Speech Act); Psychological_event (Cognitive Event, Experience Event); Change (Relational Change, Change Possession, Change Location, Natural Transition, Acquire Knowledge); Cause_change (Cause Relational Change, Cause Change Location, Cause Natural Transition, Creation, Give Knowledge).] From Nilda Ruimy.

38 Ontology of structured semantic types: a Template schema providing a set of structured information crucial to the definition of a semantic type. Interface between ontology and lexicon; guide for the lexicographer.

39 Semantic type in the SIMPLE Ontology: not just a label but rather a classificatory device consisting of a cluster of structured semantic information, distinguishing a sense from other senses of the same word and expressing its similarity with other words. Type assignment means endowing a word sense with a structured set of semantic features and relations with a view to: expressing its relationships to other words; drawing inferences from this information. Each semantic type is associated with a template, i.e. a schematic structure that contains a cluster of type-defining properties and imposes constraints on lexical items for type membership. Templates are the interface between Ontology and Lexicon: a template-driven encoding methodology ensures internal and cross-lexicon consistency. From Nilda Ruimy.
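Template-driven encoding can be approximated as a completeness check: each semantic type's template lists the qualia roles an entry must fill. In this sketch the Instrument template (formal, agentive, telic) is a simplified assumption based on the vaporizzatore entry, and the incomplete entry is hypothetical.

```python
# Hypothetical, simplified template: qualia roles an Instrument must encode.
TEMPLATES = {"Instrument": {"formal", "agentive", "telic"}}

# Relation -> qualia role (assumed subset of the Extended Qualia Structure).
QUALIA_ROLE = {"isa": "formal", "has_as_part": "constitutive",
               "made_of": "constitutive", "created_by": "agentive",
               "used_for": "telic"}


def missing_roles(semantic_type, relations):
    """Qualia roles required by the type's template but absent from the entry."""
    present = {QUALIA_ROLE[rel] for rel, _ in relations if rel in QUALIA_ROLE}
    return sorted(TEMPLATES[semantic_type] - present)


vaporizzatore = [("isa", "apparecchio"), ("has_as_part", "pulsante"),
                 ("created_by", "fabbricare"), ("used_for", "nebulizzare")]
# Hypothetical incomplete entry, e.g. from another lexicon sharing the template
incomplete = [("isa", "apparecchio"), ("created_by", "fabbricare")]

print(missing_roles("Instrument", vaporizzatore))  # []
print(missing_roles("Instrument", incomplete))     # ['telic']
```

Because the check depends only on the type, the same template applies unchanged to entries of every language that shares the ontology.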

40 Template for the semantic type Instrument: ontological information, extended qualia structure, predicative representation. From Nilda Ruimy.

41 Top; qualia roles Formal, Constitutive, Agentive, Telic; relations such as Is_a, Is_a_part_of, Property, Contains, Created_by, Agentive_cause, Indirect_telic, Purpose, Instrumental, Is_the_habit_of, Used_for, Used_as, ..., Activity, .... The targets of relations identify: prototypical semantic information associated with a SemU; elements of dictionary definitions of SemUs; typical corpus collocates of the SemU. 100 relations... For a BioLexicon.

42 Qualia Structure: one of the four levels of semantic representation in the theory of the Generative Lexicon. It consists of four qualia roles encoding orthogonal dimensions of meaning: formal role (general identification); constitutive role (composition); agentive role (origin); telic role (function).

43 Extended Qualia Structure. Formal: isa, antonym_comp, antonym_grad, mult_opposition. Agentive: result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source; artifactual agentive: created_by, derived_from. Constitutive (with PROPERTY and LOCATION sub-groups): made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses, causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling, is_in, lives_in, typical_location. Telic: instrumental (used_for, used_as, used_by, used_against); direct telic; indirect_telic, purpose, object_of_activity; activity (is_the_activity_of, is_the_ability_of, is_the_habit_of). Further relations: regulates, is_regulated_by, ….. Example pairs: proiettile, colpire (projectile, hit); bisturi, chirurgo (lancet, surgeon); medico, curare (doctor, cure); disgusto, provare (disgust, feel); casa, costruire (house, build); mohair, capra (mohair, goat); pane, farina (bread, flour); senatore, senato (senator, senate); manubrio, bicicletta (handlebar, bicycle).

44 Extended Qualia Structure for the biomedical domain: the same relation inventory as slide 43, with regulates and is_regulated_by marked NEW!. Example pairs: T-cell, Blood Stem Cell; Ribose, Nucleotide; Catalyze, Enzyme.

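45 botte ('barrel'): meaning dimensions expressed by Qualia relations over a traditional dictionary definition: recipiente di legno, fatto di doghe arcuate tenute unite da cerchi di ferro, che serve per la conservazione e il trasporto di liquidi, specialmente vino ('wooden container, made of curved staves held together by iron hoops, used for the storage and transport of liquids, especially wine'). recipiente: Formal: isa; di legno: Constitutive: made_of; fatto: Agentive: created_by; di doghe arcuate tenute unite da cerchi di ferro: Constitutive: made_of; che serve per la conservazione e il trasporto: Telic: used_for; di liquidi, specialmente vino: Constitutive: contains. From Nilda Ruimy.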

46 Multidimensional knowledge bases …by using lexical resources. Ala ('wing') has four SemUs: SemU 3232, type [Part]: parte di aeroplano ('part of an airplane'); SemU 3268, type [Part]: parte di edificio ('part of a building'); SemU D358, type [Body_part]: organo degli uccelli ('organ of birds'); SemU 3467, type [Role]: ruolo nel gioco del calcio ('role in the game of football'). Relations shown: part_of aeroplano, part_of uccello ('bird'), part_of edificio; used_for volare ('to fly'); isa giocatore ('player'); member_of squadra ('team'); agentive fabbricare ('to manufacture').

47 Semantic multidimensionality & NLP: NLP tasks (IE, WSD, NP recognition, etc.) need to access multidimensional aspects of word meaning, such as the Extended Qualia relations: Is_a_part_of: la pagina del libro (the page of the book); Member_of: il difensore della Juventus (Juventus fullback); Telic: il suonatore di liuto (the lute player); Made_of: il tavolo di legno (the wooden table).

48 Disambiguation = Interpretation of conceptual relations in context (from Nilda Ruimy)
Which relation does "di" express in each phrase: made_of, is_a_part_of or contains?
- duna di sabbia (sand dune): made_of
- bicchiere di birra (glass of beer): contains
- fetta di pane (slice of bread): is_a_part_of
Ontology fragment shown: …….. SUBSTANCE, ARTIFACTUAL_DRINK ………. liquid

49 Domain - Semantic class: the neighbourhood of mangiare (to eat) (from Nilda Ruimy)
- Telic, Used_for: tavola (table) [FURNITURE], forchetta (fork), posata (cutlery) [INSTRUMENT], ristorante (restaurant) [BUILDING] for mangiare; mestolo (ladle), pentola (pot) [CONTAINER] for cucinare, cuocere (to cook); friggitrice (fryer) for friggere (to fry); bollitore (kettle) for bollire (to boil); pesciera (fish kettle) for pesce (fish)
- Telic, Is_the_activity_of: cuoco (cook) [PROFESSION]: cucinare
- Object_of_the_activity of mangiare (+edible): coniglio (rabbit), carne (meat), mela (apple), carota (carrot), arrosto (roast) [ARTIFACT_FOOD, VEGETABLES, FRUIT, FOOD, SUBSTANCE_FOOD]; zucchero (sugar), alloro (bay leaf), tartufo (truffle) [VEGETAL_ENTITY, FLAVOURING, NATURAL_SUBSTANCE]
- Agentive, Created_by: cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare ……

50 Noun Compounds/Complex Nominals … are pervasive
There is a motivation behind most N+N constructions: the context provides it.
The FrameNet (SIMPLE) way:
- appeal to the specific frame structures (qualia structures) associated with the head noun
- determine from corpus attestations which frame elements (qualia) can be instantiated as a modifier word
E.g. for container, complex nominals can specify:
- material (aluminium c., glass c., …)
- contents (food c., trash c., …)
- size (3 quart c., …)
- function (shipping c., storage c., …)
......

51 Noun Compounds/Complex Nominals & multidimensional semantic approaches
a. FrameNet
Container Frame Structure, Frame Elements:
- Material: aluminum container, glass c., metal c., tin c.
- Contents: food container, beverage c., trash c., water c., milk c., fuel c.
- Size: 3 quart container
- Function: shipping container, storage c.
b. SIMPLE
Qualia Relations of "container" as used in compounds:
- Constitutive: made_of [MATERIAL]: aluminum container, glass c., metal c., tin c.
- Telic: contains [ENTITY]: food container, beverage c., trash c., water c., milk c., fuel c.
- Constitutive: size [QUANTITY]: 3 quart container
- Telic: is_used_for [EVENT]: shipping container, storage c.

52 Complex Nominals: PP disambiguation
E.g. knife (coltello) triggers:
- a cutting frame (FrameNet)
- specific (SIMPLE) dimensions of meaning
The SIMPLE Extended Qualia structure serves to interpret the semantic relation between nouns (the internal relational structure of the MWE):
- butcher's knife (coltello da macellaio): TELIC (used_by) Y [Human], PP da
- plastic knife (coltello di plastica): CONST (made_of) X [Material], PP di
- table knife (coltello da tavola): TELIC (used_in) Z [Location], PP da
- hunting knife (coltello da caccia): TELIC (used_in_activity) E [Activity], PP da
- piatto di legno (wooden plate): CONST (made_of) X [Material], PP di
- piatto di pasta (plate of pasta): CONST (contains) X [Food], PP di
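The table pairs a preposition and the semantic type of the modifier with a qualia relation. A minimal sketch of that lookup follows; the rule table is distilled from the slide's examples, while the toy type lexicon and the names `PP_RULES`, `MODIFIER_TYPE` and `interpret` are assumptions for illustration.

```python
# Hypothetical rule table distilled from the slide's examples:
# (preposition, semantic type of the modifier) -> (quale, relation).
PP_RULES = {
    ("da", "Human"): ("TELIC", "used_by"),
    ("da", "Location"): ("TELIC", "used_in"),
    ("da", "Activity"): ("TELIC", "used_in_activity"),
    ("di", "Material"): ("CONST", "made_of"),
    ("di", "Food"): ("CONST", "contains"),
}

# Toy semantic-type lexicon for the modifiers on the slide (assumed).
MODIFIER_TYPE = {
    "macellaio": "Human",
    "tavola": "Location",
    "caccia": "Activity",
    "plastica": "Material",
    "legno": "Material",
    "pasta": "Food",
}


def interpret(head: str, prep: str, modifier: str):
    """Return (head, quale, relation, modifier), or None if no rule applies."""
    rule = PP_RULES.get((prep, MODIFIER_TYPE.get(modifier)))
    if rule is None:
        return None
    quale, relation = rule
    return head, quale, relation, modifier
```

For instance, `interpret("coltello", "da", "macellaio")` yields the TELIC `used_by` reading. The sketch ignores the head noun; a fuller version would also check the head's own qualia, since only container-like heads can plausibly take a `contains` reading.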

53 SIMPLE: possible extension for deverbal nominalisation
Deverbal nominalisation:
- noun murder (uccisione, delitto, omicidio; different semantic preferences): PP di; PP da_parte_di, di
- verb murder (uccidere):
  subj: NP
  obj: NP
  instr: PP con [Weapon] (knife m., con coltello)
  means: PP per [Action] (strangulation m., per strangolamento)
  loc: PP loc|di [Location] (Kent State murders, nel...)
  time: PP time|di [Time] (1983 murders, del 1983)
As if it were a Situation:
PRED: MURDER (uccidere)
ARG1: agent [Hum/Anim?]
ARG2: patient [Hum/Anim?]
MOD1: instr [Weapon]
MOD2: means [Action]
MOD3: ... […]
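The Situation-style entry can be read as a frame whose slots are filled by syntactic dependents matching both a marker and a semantic type. Below is a minimal sketch under that reading; the `Slot` class, `UCCIDERE` frame and `fill_frame` helper are hypothetical, and the loc/time modifiers are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    role: str              # semantic role, e.g. "agent"
    sem_types: frozenset   # admissible semantic types of the filler
    markers: frozenset     # syntactic realisations: "subj", "obj" or a preposition


# Hypothetical frame for uccidere ("to murder"), following the slide.
UCCIDERE = (
    Slot("agent", frozenset({"Human", "Animal"}), frozenset({"subj"})),
    Slot("patient", frozenset({"Human", "Animal"}), frozenset({"obj"})),
    Slot("instr", frozenset({"Weapon"}), frozenset({"con"})),
    Slot("means", frozenset({"Action"}), frozenset({"per"})),
)


def fill_frame(frame, dependents):
    """Map (marker, word, sem_type) triples onto the frame's roles.

    A dependent fills the first unfilled slot whose marker and semantic
    type both match; dependents matching no slot are ignored.
    """
    filled = {}
    for marker, word, sem_type in dependents:
        for slot in frame:
            if (slot.role not in filled and marker in slot.markers
                    and sem_type in slot.sem_types):
                filled[slot.role] = word
                break
    return filled
```

Requiring both conditions is what separates "con coltello" (instrument) from, say, "con un amico" (with a friend), which shares the preposition but not the semantic type.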

54 Ontologisation of SIMPLE (from Antonio Toral)
Automatically converting and enriching a computational lexicon into a formal Ontology, for NLP semantic tasks
Potential of ontologies in NLP:
- as Backbone in LKBs
- as Pivot in multilingual architectures (e.g. KYOTO)
- Reasoning capabilities
Ontologisation of SIMPLE into OWL:
- conversion of the SIMPLE ontology
- bottom-up enrichment: promoting lexicon knowledge to the ontology level, i.e. deriving language-independent knowledge from Italian lexico-semantic information
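One common way to lift a lexicon sense into OWL is to make its semantic type a superclass and each qualia relation an existential restriction. The sketch below emits such Turtle text; it is not necessarily the conversion used in the SIMPLE ontologisation work, and the `simple:` namespace IRI, the class names and the `semu_to_turtle` helper are placeholders.

```python
# Placeholder namespace; the real SIMPLE-OWL IRIs are not given on the slide.
PREFIXES = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix simple: <http://example.org/simple#> .
"""


def semu_to_turtle(semu: str, sem_type: str, relations) -> str:
    """Render one lexicon sense as an OWL class in Turtle syntax.

    The semantic type becomes a superclass; each (property, filler) qualia
    relation becomes an owl:someValuesFrom restriction.
    """
    lines = [
        f"simple:{semu} a owl:Class .",
        f"simple:{semu} rdfs:subClassOf simple:{sem_type} .",
    ]
    for prop, filler in relations:
        lines.append(
            f"simple:{semu} rdfs:subClassOf [ a owl:Restriction ; "
            f"owl:onProperty simple:{prop} ; owl:someValuesFrom simple:{filler} ] ."
        )
    return PREFIXES + "\n".join(lines) + "\n"


print(semu_to_turtle("Botte", "Container", [("made_of", "Legno")]))
```

The restriction reads "every Botte is made_of some Legno", which keeps the statement at the class level instead of linking two classes directly as if they were individuals.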

55 Named Entity Repository (from Antonio Toral)
Automatically build LRs from existing LRs and Web 2.0 semi-structured resources, combining:
- authoritative lexicographic experience: precision
- collaborative "wisdom of the crowds": recall
Case study: a multilingual NE repository from LRs (English WordNet, Spanish WordNet, Italian SIMPLE) & Wikipedia
- NEs linked to three LRs and two ontologies (SUMO, SIMPLE)
- interoperable resource: LMF compliant
- applied to cross-lingual QA (to validate answers): precision +16.3%

56 Use of SIMPLE Lexicon & Ontology for Time and Event detection/annotation (from Tommaso Caselli)
Different PoS may realise an event: verbs, nouns, adjectives, prepositional phrases.
The SIMPLE Lexicon helps in identifying & classifying Events (eventive nouns & adjectives) in a 10K-word annotation experiment:
- each event is associated with an Ontological Type
- the Event Type from the SIMPLE Ontology can be used as a default value to provide event composition, and consequently to instantiate a temporal representation for each Event
- annotators improved in both identification & classification of Events: 81.17% accuracy (vs. 72.35%) and K-coefficient = 0.84 (vs. 0.7)
Processing steps shown: Morpho-Syntactic Analysis, SIMPLE Lexicon, Event Detection & Classification
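The slide does not say which K-coefficient was computed. Assuming Cohen's kappa for two annotators, the sketch below shows how such a chance-corrected agreement figure is obtained; the toy labels use TimeML event class names, and the function name is ours.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from each annotator's label counts.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label throughout;
        # by convention we report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


a = ["OCCURRENCE", "STATE", "OCCURRENCE", "STATE"]
b = ["OCCURRENCE", "STATE", "STATE", "STATE"]
print(cohen_kappa(a, b))  # p_o = 0.75, p_e = 0.5 -> 0.5
```

The example shows why kappa is reported next to raw agreement: 75% observed agreement shrinks to 0.5 once chance agreement is discounted.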

57 Mapping SIMPLE Semantic Types to TimeML Classes (from Tommaso Caselli)

58 GLML: Generative Lexicon Markup Language (with James Pustejovsky, Olga Batiukova, Anna Rumshisky, Marc Verhagen; from Valeria Quochi)
Annotating texts with Argument Selection, Argument Coercion & Qualia Roles.
The corpus brings reality to the model and provides statistical cues to improve language models.
Lexical semantic information, like type coercion/selection, is required for applications such as WSD, categorisation, IR (query reformulation, filtering…), IE (coreference resolution, relation extraction…), entailment, ...
Predicate–Argument constructions:
- Predicate Sense Disambiguation
- Argument selection: type selection/coercion
- Qualia role/relation selection
Modification constructions:
- Noun Sense Disambiguation
- Qualia role/relation selection in Adjectival Modification
- Qualia role/relation selection in Nominal Modification
Complex Types:
- Type selection in modification of Dot Objects
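To illustrate the selection/coercion distinction that GLML annotates, here is a textbook-style Generative Lexicon sketch (not the GLML specification): a predicate selecting an Event either finds one directly or coerces an entity noun through its Telic or Agentive quale, as in iniziare il libro (begin the book). The `Noun` class, `compose` function and type labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Noun:
    lemma: str
    sem_type: str
    qualia: dict = field(default_factory=dict)  # quale -> event predicate


def compose(selected_type: str, arg: Noun):
    """Classify how a predicate selecting `selected_type` combines with `arg`.

    "selection": the argument already has the selected type;
    "coercion": an Event is wanted and the noun supplies one via its
    Telic or Agentive quale; "mismatch": otherwise.
    """
    if arg.sem_type == selected_type:
        return ("selection", [arg.lemma])
    if selected_type == "Event":
        events = [arg.qualia[q] for q in ("telic", "agentive") if q in arg.qualia]
        if events:
            return ("coercion", events)
    return ("mismatch", [arg.lemma])


libro = Noun("libro", "Artifact", {"telic": "leggere", "agentive": "scrivere"})
print(compose("Event", libro))  # iniziare il libro: start reading or writing it
```

An annotator's job in this picture is to pick which of the coerced events (here leggere or scrivere) the context actually intends.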

59 Using Existing Resources for Italian (from Valeria Quochi)
SIMPLE Lexicon & Ontology / ItalWordNet for:
- Sense Disambiguation
- Type selection/coercion
- Type selection in Dot Objects
SIMPLE Extended Qualia Structure for the selection of Qualia roles/relations, e.g.:
- Constitutive Relations, e.g. Is_a_part_of, Is_a_member_of
- Telic Relations, e.g. Purpose, Object_of_the_activity
- Agentive Relations, e.g. Source, Result_of

60 Ontology & Lexicon; Lexicon & Corpus
Today we can easily say that ontology learning, i.e. the practical feasibility of supporting knowledge acquisition in a domain, depends on developing automatic methods for acquiring conceptual representations from natural language text.
Semantic Web initiatives are also focusing on building ontological representations from texts, and in this respect show a large conceptual overlap with the notion of a dynamic lexicon.
Based on various experiences, and as a work strategy for lexical/textual resources:
- we should push towards innovative types of lexicons: a sort of example-based "living" lexicons that share properties of both lexicons and corpora
- in such a lexicon redundancy is not a problem, but rather a benefit

61 Mismatch between LRs and LT
There is often a gap between advances in LRs and in LT:
either adequate LRs are missing … or there are no systems able to use knowledge-intensive LRs effectively.
Shortcomings:
- lack of usable implementations fully exploiting new types of LRs
- LR claims are not empirically evaluated
BUT… a parallel evolution of R&D for both LRs and LT is needed

62 Phenomena to be represented / What is missing?? (from Ed Hovy)
1. Bracketing/grouping of predications around entities (basic frame structure)
2. Concepts:
- choice of meaning/sense, with frames in some cases
- definition and nature of the concept repository/ontology
- major high-level concept groupings and classes
3. Labels on (dependency) arcs (thematic roles, types of attributes, modifiers, etc.)
4. Coreference (explicit and indirect):
- intra-sentential
- inter-sentential and cross-document
5. Information Structure and Discourse structure:
- theme-rheme and topic-focus
- salience
- coordination
- non-semantic inter-clausal relations (RST's interpersonal ones)
- etc.
Status marks on the slide: done, done, done??, done??

63 Phenomena to be represented / What is missing?? (Ed Hovy, cont.)
6. Pragmatics:
- Speech Acts
- Participants and audience modeling
- Modality: epistemic modalities, deontic modalities
- Personal attitudes
- Deixis/reference to external world (or databases)
- Social register, genre, and style
7. Polarity (including scoping)
8. Microtheories (many of them to be incorporated elsewhere):
- Time (Reichenbach)
- Space (OWL upper ontology of space, etc.)
- Cardinality, Quantification, Manner, Degree and comparison, Possession, Existentials, Copular constructions, Conditionals, Consequences and inference
- Co-text and intertextuality (including formatting and other media)
- Meaning of prosody and other speech-related effects
Status marks on the slide: done??, done??
Towards a common encoding policy???

64 Lexicon and Corpus: a multi-faceted interaction (L = lexicon, C = corpus)
- L → C: tagging
- C → L: frequencies (of different linguistic objects)
- C → L: proper nouns, acronyms, …
- L → C: parsing, chunking, …
- C → L: training of parsers
- C → L: lexicon updating
- C → L: collocational data (MWEs, idioms, grammatical patterns...)
- C → L: nuances of meaning & semantic clustering
- C → L: acquisition of lexical (syntactic/semantic) knowledge
- L → C: semantic tagging/word-sense disambiguation (e.g. in Senseval)
- C → L: more semantic information on LE
- C → L: corpus-based computational lexicography
- C → L: validation of lexical models
- C → L: …
- L → C: ...
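One corpus-to-lexicon contribution on this slide, collocational data, is often bootstrapped with an association measure. The sketch below uses pointwise mutual information (PMI), a standard choice that the slide itself does not name; the toy corpus and the `pmi_bigrams` helper are invented for illustration.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information.

    PMI(x, y) = log2(P(x, y) / (P(x) P(y))), with unigram probabilities
    estimated over the n tokens and bigram probabilities over the n - 1
    adjacent pairs. Pairs seen fewer than `min_count` times are dropped,
    because PMI is unreliable for rare events.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / (n - 1)
        p_x, p_y = unigrams[x] / n, unigrams[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


corpus = "vado a piedi oggi e torno a piedi domani ma vado a casa".split()
for pair, score in pmi_bigrams(corpus):
    print(pair, round(score, 3))
```

On this tiny corpus only ("vado", "a") and ("a", "piedi") survive the frequency cut-off; on real data the ranked candidates would still need a lexicographer's validation before entering the lexicon.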

65 … Dynamic lexicons
Current computational lexicons (even WordNets) are static objects, still shaped on traditional dictionaries.
Towards a flexible model of dynamic lexicon:
- extending the expressiveness of a core static lexicon
- adapting to the requirements of language in use as attested in corpora
- with semantic clustering techniques, etc.
Convert the extreme flexibility & multidimensionality of meaning into large-scale and exploitable (VIRTUAL?) resources: a Lexicon & Corpus together, a sort of Example-based Lexicon
BUT…

66 Verb/Arguments Interaction at the Lexical-Semantic Level
Verb meaning determines/selects the sense of its subject and/or direct object.
E.g. arrestare, meaning both "to arrest" and "to stop", selects direct objects which have, or receive from the verb, a negative connotation:
Dobj (sense, gloss)            Sem. type          Conn. feat.
ladro 1 (thief)                agent_temp_act     neg
spacciatore 1 (drug dealer)    agent_temp_act     neg
trafficante 1 (trafficker)     agent_temp_act     neg
traffico 2 (trafficking)       act                neg
invasione 1 (invasion)         cause_act          neg
massacro 1 (massacre)          cause_nat_trans    neg
inflazione 1 (inflation)       event              neg
pregiudicato 1 (ex-convict)    human              neg
balordo 1 (lowlife)            human              neg
maniaco 1 (maniac)             human              neg
strozzino 1 (loan shark)       agent_temp_act     neg
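Rows like these can be summarised into a selectional profile for the verb. A minimal sketch, with hypothetical names (`ARRESTARE_DOBJ`, `selectional_profile`): the rows are exactly those on the slide, and the function returns the distribution of semantic types plus the share of objects carrying the negative connotation feature.

```python
from collections import Counter

# Direct objects of "arrestare" as listed on the slide:
# (lemma, sense number, semantic type, connotation feature).
ARRESTARE_DOBJ = [
    ("ladro", 1, "agent_temp_act", "neg"),
    ("spacciatore", 1, "agent_temp_act", "neg"),
    ("trafficante", 1, "agent_temp_act", "neg"),
    ("traffico", 2, "act", "neg"),
    ("invasione", 1, "cause_act", "neg"),
    ("massacro", 1, "cause_nat_trans", "neg"),
    ("inflazione", 1, "event", "neg"),
    ("pregiudicato", 1, "human", "neg"),
    ("balordo", 1, "human", "neg"),
    ("maniaco", 1, "human", "neg"),
    ("strozzino", 1, "agent_temp_act", "neg"),
]


def selectional_profile(rows):
    """Return (semantic-type counts, share of objects marked 'neg')."""
    types = Counter(sem_type for _, _, sem_type, _ in rows)
    neg_share = sum(1 for _, _, _, conn in rows if conn == "neg") / len(rows)
    return types, neg_share


types, neg_share = selectional_profile(ARRESTARE_DOBJ)
print(types.most_common(2), neg_share)
```

Here the semantic types are scattered while the connotation feature is constant, which is the slide's point: the preference is stated more economically on the connotation than on any single type.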

67 Complexity of Word Sense in context: many potential clues
A particular meaning (of a verb) may be selected by:
- a specific syntactic pattern
  comprendere + that-clause = to understand [not = to include]
  aprire + PP introduced by a (preferably with a human head) = to be ready, open, well disposed towards someone (e.g. Cossiga apre a La Malfa, "Cossiga opens up to La Malfa")
- the semantic type of subjects, direct objects, indirect objects
  a human subject (if not of collective type) always selects the meaning "to understand" of the verb comprendere
- the domain of use
  perseguire un reato = to prosecute a crime (domain = law)
- a specific modifier
  perseguire penalmente = to prosecute at the penal level, not to pursue (a goal)
  comprendere benissimo = to understand very well, not to include
Two different senses of a lemma cannot be selected simultaneously in the same context. BUT…
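The clues above can be written as a first-match rule list. The sketch is hypothetical (`SENSE_CLUES`, `select_sense` and the feature names are ours); it transcribes only the slide's examples and returns None when no clue fires, rather than guessing a default sense.

```python
# Hypothetical clue rules transcribed from the slide's examples:
# (lemma, clue feature, clue value, selected sense). First match wins.
SENSE_CLUES = [
    ("comprendere", "complement", "that-clause", "to understand"),
    ("comprendere", "subj_type", "human", "to understand"),
    ("comprendere", "modifier", "benissimo", "to understand"),
    ("aprire", "pp_a_head", "human", "to be open/well disposed towards"),
    ("perseguire", "domain", "law", "to prosecute"),
    ("perseguire", "modifier", "penalmente", "to prosecute"),
]


def select_sense(lemma: str, context: dict):
    """Return the sense selected by the first matching clue, or None.

    `context` maps clue features (e.g. "modifier") to the values observed
    in the sentence; None means no clue fired and the word stays undecided.
    """
    for rule_lemma, feature, value, sense in SENSE_CLUES:
        if rule_lemma == lemma and context.get(feature) == value:
            return sense
    return None
```

Note that a collective subject (`{"subj_type": "collective"}`) leaves comprendere undecided, matching the slide's "if not collective type" caveat. The fixed rule order is exactly the weakness the next slide raises: which clue is decisive cannot be predicted in advance.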

68 Complexity of Word Sense identification
The problem:
- the tests are not reliable: they have only partial validity & are not completely discriminating
- moreover, it's not easy to predict when to apply which test
Word Sense Disambiguation (WSD) in different contexts is better achieved using information types at different levels of linguistic description: morphosyntactic/syntactic/semantic/pragmatic…, even multilingual
BUT it is unpredictable a priori where the clue lies

69 Complexity of Word Sense & use of Corpora
The availability of large quantities of semantically tagged corpora helps to:
- analyse the impact of different clues to perform WSD in different contexts
- study the interaction of clues belonging to different levels of linguistic description, to improve WSD strategies
not just statistics!!
Automatically acquire syntactic, semantic, collocational (lexical) indicators which can help in the identification of a word sense
List them in the lexicon??
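A minimal sketch of acquiring such indicators from a sense-tagged corpus: keep a feature when it is frequent enough and points overwhelmingly to one sense. The thresholds, feature strings, helper name and the toy perseguire instances are all invented for illustration; as the slide warns, counts like these are a starting point for linguistic analysis, not a substitute for it.

```python
from collections import Counter, defaultdict


def acquire_indicators(instances, min_count=2, min_purity=0.9):
    """Find features that reliably point to one sense in a sense-tagged corpus.

    `instances` is a list of (features, sense) pairs, where features is a set
    of strings such as "obj:reato" or "mod:penalmente". A feature becomes an
    indicator if it occurs at least `min_count` times and at least
    `min_purity` of its occurrences carry the same sense.
    """
    by_feature = defaultdict(Counter)
    for features, sense in instances:
        for feature in features:
            by_feature[feature][sense] += 1
    indicators = {}
    for feature, senses in by_feature.items():
        total = sum(senses.values())
        sense, count = senses.most_common(1)[0]
        if total >= min_count and count / total >= min_purity:
            indicators[feature] = sense
    return indicators


# Toy sense-tagged instances of "perseguire" (invented for illustration).
tagged = [
    ({"obj:reato", "subj:human"}, "prosecute"),
    ({"obj:reato", "mod:penalmente"}, "prosecute"),
    ({"obj:obiettivo", "subj:human"}, "pursue"),
    ({"obj:obiettivo"}, "pursue"),
]
print(acquire_indicators(tagged))
```

In the toy data "subj:human" is rejected because it splits evenly between senses, and "mod:penalmente" is rejected only because it is too rare, even though it is a good clue; raising corpus size, not lowering thresholds, is the usual remedy.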

70 Problem of regular polysemy … and more
BUT… actual occurrence of two senses in the same context…
e.g. both act & result (for deverbal nouns, etc.):
- In una comunicazione al Parlamento la Commissione ha illustrato le sue riflessioni su … (In a communication to Parliament, the Commission set out its reflections on …)
- Berlusconi dovrà scegliere se fare l'uomo di governo o mantenere il controllo delle sue tv (Berlusconi will have to choose whether to act as a man of government or keep control of his TV channels)
Underspecified meanings?
maybe subsuming more granular distinctions, to be used only when disambiguation is feasible/useful in a context
Theoretical language, invented by lexicographers/linguists who have/want to classify in disjoint classes, vs. actual usage:
- a continuum
- resistant to clear-cut disjunctions
- by necessity ambiguous with respect to imposed classifications

71 … what cannot be easily encoded at the Lexical-Semantic Level
In a Senseval framework …
When sense interpretation requires appeal to extra-linguistic knowledge (not to be captured at the lexical-semantic level of description)
When corpus annotation either diverges from the lexical resource or further specifies it, e.g.:
- words acquiring a specific sense, strictly dependent on the context
  la donna Pauline Collins, che ha già visto arrestare il marito dai tedeschi,… (the woman Pauline Collins, who has already seen her husband arrested by the Germans,…)
- variety of nuances of a verb, e.g. according to the semantic type of the co-occurring direct object
- metaphors extended to an entire sentence
  l'auto verde arriva sul tavolo del governo (lit. the green car arrives on the table of the government)
Not all these shifts of meaning can/must be captured through lexical-semantic annotation

72 With respect to Senseval: jargon, neologisms, evaluative suffixation, titles, …
- vetturetta (small car)
- minitaxi
- fumantino (adj.; una persona fumantina, a hot-tempered person)
- komeinista (Khomeinist)
- …
- Primula rossa (lit. Scarlet Pimpernel; = boss mafioso, a mafia boss)
- Scarpa d'oro (lit. Golden Boot; = un bravo giocatore, a good player)
- …
Not in any lexicon… a semantic type is easier to assign than a word sense in a lexicon

73 Compounds and idioms
- uscire di scena (to leave the stage), farla franca (to get away with it), fare fuoco (to open fire), andare in onda (to go on air), …
- fare [in tempo] (to make it in time), andare [a piedi] (to go on foot), essere [in testa] (= essere il primo, to be in the lead), vincere [per un soffio] (to win by a whisker), partire [a razzo] (to shoot off like a rocket)
- Croce Rossa (Red Cross), Caschi Blu (Blue Helmets), conflitto a fuoco (gunfight), atletica leggera (track and field), famiglia bene (well-to-do family), un bagno di folla (being surrounded by an enthusiastic crowd), …
Where is the boundary of the MWE?
"andare_a_piedi" vs. andare (PoS V) a_piedi (PoS Adv.loc)?
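The boundary question becomes concrete as soon as MWEs are recognised automatically: whatever the lexicon lists decides the segmentation. A minimal sketch using greedy left-to-right longest match, a common baseline rather than a method from the slide; the lexicon contents and the `join_mwes` helper are hypothetical.

```python
# Hypothetical MWE lexicon of token tuples.
MWE_LEXICON = {
    ("uscire", "di", "scena"),
    ("andare", "in", "onda"),
    ("conflitto", "a", "fuoco"),
    ("a", "piedi"),
    ("a", "buon", "mercato"),
}


def join_mwes(tokens, lexicon=MWE_LEXICON):
    """Greedy longest match: each recognised MWE becomes one underscored token."""
    longest = max(len(entry) for entry in lexicon)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no MWE starts here: keep the single token
            out.append(tokens[i])
            i += 1
    return out


print(join_mwes("acquistare un oggetto a buon mercato".split()))
print(join_mwes("andare a piedi".split()))
```

With only ("a", "piedi") listed, the second call yields andare + a_piedi; adding ("andare", "a", "piedi") to the lexicon would instead produce the single token andare_a_piedi, so the lexicon entry encodes the boundary decision.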

74 Locutions and Figurative usages
- per carità (for heaven's sake), in questione (in question), per caso (by chance), in lizza (in the running), a volontà (at will), a buon mercato (cheaply), …
- ci mancherebbe! (of course! / don't mention it), c'è mancato poco (that was a close call), …
- due lavoratori su tre sono a casa (= essere disoccupato) (two workers out of three are at home, i.e. unemployed) [the collocation with lavoratori disambiguates the expression]
- uomo [di polso] (a firm, resolute man), zona medaglia d'oro (= tra i primi, among the top), a cielo aperto (open-air; discarica a.., open-air dump), la bella vita (fare …) (the good life), …
If the individual components are annotated, the semantic contribution of the MWE is lost:
acquistare un oggetto a buon (PoS A) mercato (PoS S) !!

75 Usual issues: Is there a fixed set of senses? Do senses exist as separate objects?
- criteria for sense distinction are very application-dependent
- greater vs. lesser granularity depends on the task/domain/situation/etc., i.e. the communication purpose
- & there is no inherently true (upper or lower) limit to granularity...
A checklist theory of meaning is impossible: meaning as a piece of information with an autonomous status, independent of its use
Computational resources should provide:
- multi-dimensional information
- the highest expressiveness in terms of sense-discriminating power
- contextual information
Are we dealing with semantic annotation in the right way??

76 Divergences between Lexicon encoding & Corpus annotation
In the lexicon:
- senses are de-contextualized (a necessity to capture generalizations)
- sense discrimination must be kept under control
- clustering (manually or automatically)
In the corpus sense annotation task:
- contextualization plays a predominant role
- it brings in a range of pragmatic issues
- corpus analysis per se would lead to excessive granularity of sense distinctions
Capture just the core basic distinctions in a core lexicon & acquire additional, more granular information (usually of a collocational nature) from corpora, to be encoded within the broader senses, e.g. to help translation
Not yet solved

77 Between LRs and Linguistics
A consequence of the corpus-based approach is that it compels us to break hypotheses too easily taken for granted in mainstream linguistics:
- in actual usage, language characteristically displays many properties which behave as a continuum, not as yes/no properties
- the same holds true for so-called rules: we more frequently find tendencies towards a rule than precise rules
- many of the theoretical rules appear to be simplifications or idealisations, in fact dispelled by real usage
- a number of dichotomies must then be reconciled
Lesson learned: [IN-]Adequacy of Lexical resources
There is a long way to go before we can recognise & integrate the many dimensions relevant to content interpretation

78 A number of dichotomies, not as opposite views but as complementary perspectives
Language as a continuum:
- rules vs. tendencies
- absolute constraints vs. preferences
- discreteness vs. continuum/gradedness
- theoretical/potential vs. actual
- intuition/introspection vs. empirical evidence
- theory-driven vs. data-driven
- symbolic vs. statistical
The right part must be highlighted, then the two combined
Choices on the syntagmatic axis are pervasive
Lexicon & Corpus must converge

