Published by Jasmine Stowers. Modified over 9 years ago.
1
Acquisition of Lexical Knowledge for NLP
German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
2
Outline
- Setting
- Words and Works
- Structured Sources
  - MRDs, thesauri
- Unstructured Sources
  - corpora
3
Setting
- NLP and the Lexicon
  - Theoretical: WG, GPSG, HPSG
  - Practical: realistic complexity and coverage
- Lexical Bottleneck (Briscoe 91)
  - Even worse for languages other than English
4
Setting
- Which lexical knowledge (LK) is needed by a concrete NLP system?
- Where is this LK located?
- Which procedures can be applied?
5
Setting
- Which LK is needed by a concrete NLP system?
  - Phonology: phonemes, stress, etc.
  - Morphology: POS, etc.
  - Syntax: category, subcategorisation, etc.
  - Semantics: class, selectional restrictions (SRs), etc.
  - Pragmatics: usage, registers, TDs, etc.
  - Translations: translation links
6
Setting
- Where is this LK located?
  - Human brain
  - Structured Lexical Resources:
    - Monolingual and bilingual MRDs
    - Thesauri
  - Unstructured Lexical Resources:
    - Monolingual and bilingual corpora
  - Mixing resources
7
Setting
- Which procedures can be applied?
  - Prescriptive approach
    - Machine-aided manual construction
  - Descriptive approach
    - Automatic acquisition from pre-existing Lexical Resources
  - Mixed approach
8
Outline
- Setting
- Words and Works
- Structured Sources
  - MRDs, thesauri
- Unstructured Sources
  - corpora
9
Words and Works
Where is this Lexical Knowledge located?
- Human brain:
  - Linguistic String Project (Fox et al. 88)
    - Lexical information for 10,000 entries
  - WordNet (Miller et al. 90)
    - Semantic information; v1.6 with 99,642 synsets
  - Comlex (Grishman et al. 94)
    - Syntactic information for 38,000 English words
  - CYC Ontology (Lenat 95)
    - a person-century of effort to produce 100,000 terms
  - LDOCE3-NLP
    - dictionary with 80,000 senses
10
Words and Works
Where is this Lexical Knowledge located?
- Structured Lexical Resources
  - Monolingual MRDs:
    - LDOCE
      - learner's dictionary
      - 35,956 entries and 76,059 definitions
      - 86% semantic and 44% pragmatic codes
      - controlled vocabulary of 2,000 words
      - (Boguraev & Briscoe 89)
      - (Vossen & Serail 90)
      - (Bruce & Guthrie 92), (Wilks et al. 93)
      - (Dolan et al. 93), (Richardson 97)
11
Words and Works
Where is this Lexical Knowledge located?
- Structured Lexical Resources
  - Other monolingual MRDs:
    - Webster's (Jensen & Ravin 87)
    - LPPL (Artola 93)
    - DGILE (Castellón 93), (Taulé 95), (Rigau 98)
    - CIDE (Harley & Glennon 97)
    - AHD (Richardson 97)
    - WordNet (Harabagiu 98)
  - Bilingual MRDs:
    - Collins Spanish/English (Knight & Luk 94)
    - Vox/Harrap's Spanish/English (Rigau 98)
12
Words and Works
Where is this Lexical Knowledge located?
- Structured Lexical Resources
  - Thesauri:
    - Roget's Thesaurus
      - 60,071 words in 1,000 categories
      - (Yarowsky 92), (Grefenstette 93), (Resnik 95)
    - Roget's II and The New Collins Thesaurus
      - (Byrd 89)
    - Macquarie's thesaurus
      - (Grefenstette 93)
    - Bunrui Goi Hyou Japanese thesaurus
      - (Utsuro et al. 93)
13
Words and Works
Where is this Lexical Knowledge located?
- Structured Lexical Resources
  - Encyclopaedias
    - Grolier's Encyclopaedia (Yarowsky 92)
    - Encarta (Richardson et al. 98)
  - Others
    - Telephone directories
  - Mixing structured lexical resources
    - Roget's Thesaurus and Grolier's (Yarowsky 92)
    - LDOCE, WN, Collins, ONTOS, UM (Knight & Luk 94)
    - Japanese MRD to WN (Okumura & Hovy 94)
    - LLOCE, LDOCE (Chen & Chang 98)
14
Words and Works
Where is this Lexical Knowledge located?
- Unstructured Lexical Resources
  - Corpora:
    - WSJ, Brown Corpus (SemCor), Hansard
    - Proper nouns (Hearst & Schütze 95)
    - Idiosyncratic collocations (Church et al. 91)
    - Preposition preferences (Resnik & Hearst 93)
    - Subcategorization structures (Briscoe & Carroll 97)
    - Selectional restrictions (Resnik 93), (Ribas 95)
    - Thematic structure (Basili et al. 92)
    - Word semantic classes (Dagan et al. 94)
    - Bilingual lexicons for MT (Fung 95)
15
Words and Works
Where is this Lexical Knowledge located?
- Mixing structured and unstructured Lexical Resources
  - MRDs and corpora
    - (Liddy & Paik 92)
    - (Klavans & Tzoukermann 96)
  - WordNet and corpora
    - (Resnik 93), (Ribas 95), (Li & Abe 95), (McCarthy 01)
    - (Mihalcea & Moldovan 99)
16
Words and Works
Lexical Acquisition from MRDs
- Syntactic disambiguation (Dolan et al. 93)
- Semantic processing (Vanderwende 95)
- WSD (Lesk 86), (Wilks & Stevenson 97), (Rigau 98)
- IR (Krovetz & Croft 92)
- MT (Knight & Luk 94), (Tanaka & Umemura 94)
- Semantically enriching MRDs
  - (Yarowsky 92), (Knight 93), (Chen & Chang 98)
- Building LKBs
  - (Bruce & Guthrie 92)
  - (Dolan et al. 93)
  - (Artola 93)
  - (Castellón 93), (Taulé 95), (Rigau 98)
17
Words and Works
International Projects on Lexical Acquisition
- Japanese projects
  - EDR (Yokoi 95)
    - nine-year project oriented to MT
    - bilingual corpora with 250,000 words
    - monolingual, bilingual and co-occurrence dictionaries
    - 200,000 words of general vocabulary
    - 100,000 technical terms
    - 400,000 concepts
18
Words and Works
International Projects on Lexical Acquisition
- American projects
  - Comlex (Grishman et al. 94)
    - syntactic information for 38,000 words
  - WordNet (Miller 90)
    - semantic information
    - more than 123,000 words organised in 99,000 synsets
    - more than 116,000 relations between synsets
  - Pangloss (Knight & Luk 94)
    - PUM, ONTOS, LDOCE semantic categories, WordNet
  - Cyc (Lenat 95)
    - common-sense knowledge
    - 100,000 concepts and 1,000,000 axioms
19
Words and Works
International Projects on Lexical Acquisition
- European projects
  - Acquilex I and II
    - lexical acquisition from monolingual and bilingual MRDs and corpora
  - LE-Parole
    - large-scale harmonised set of corpora and lexicons for all the EU languages
  - EuroWordNet
    - to develop a multilingual WordNet for several European languages
20
Setting
- Acquilex I
- Acquilex II
- EuroWordNet
21
Words and Works
Acquilex
- Lexical Knowledge Acquisition
- Mixed approach
- Dictionaries (MRD -> MTD -> LDB -> LKB)
- Partners
  - Cambridge University
  - Istituto di Linguistica Computazionale di Pisa
  - Amsterdam University
  - Dublin University
- 30 months
- Theses
  - (Castellón 1993)
  - (Taulé 1995)
  - (Rigau 1998)
22
Words and Works
Acquilex II
- Lexical Knowledge Acquisition
- Mixed approach
- Corpora
- Partners
  - Cambridge University
  - Istituto di Linguistica Computazionale di Pisa
  - Amsterdam University
- 30 months
- Theses
  - [Ribas 1995] (Acquisition of Selectional Restrictions)
  - [Ageno...] (Robust Parsing)
  - [Padró 1998] (Relaxation Labelling)
  - [Màrquez 1999] (Decision Trees)
23
Words and Works
EuroWordNet
- Multilingual WordNet
- Partners
  - English, Spanish, Dutch, Italian
  - (and French, German, Czech, Estonian)
- 25,000 noun synsets and 5,000 verb synsets
- 30 months
- Theses
  - [Farreres...] (Mapping of Bilingual Dictionaries)
  - [Daudé...] (Mapping of Hierarchies)
24
Outline
- Setting
- Words and Works
- Structured Sources
  - MRDs, thesauri
- Unstructured Sources
  - corpora
25
Structured Sources
Acquisition of LK from MRDs
- Focusing on:
  - the massive acquisition of LK
  - from MRDs (conventional, in any language)
  - using automatic methodologies
- Why MRDs? Conventional dictionaries for human use usually "contain spelling, pronunciation, hyphenation, capitalization, usage notes for semantic domains, geographic regions, and propriety; etymological, syntactic and semantic information about the most basic units of the language" (Amsler 81)
26
Structured Sources
Dictionaries
- LDOCE (Longman Dictionary of Contemporary English)
- DGILE (Diccionario General Ilustrado de la Lengua Española)
- DGLC (Diccionari General de la Llengua Catalana)
- DVHE (Diccionari Vox-Harrap's Esencial)
27
Structured Sources
Dictionaries: LDOCE
- Highly coded, restricted vocabulary
- 76,059 senses in 30,373 entries
  - Fields: LDOCE id, POS, grammatical code, idiom, pragmatic code, semantic code (subject-preference), object-preference, indirect-object-preference, definition
- Example entries: |cheese_0_1|, |cheese_0_2|, |cheese_0_3|
28
Structured Sources
Dictionaries: DGILE
- Poorly coded, no restricted vocabulary
- 157,843 senses in 89,043 entries
- 1.4 million words in definitions and examples
Example entry (queso, "cheese"):
((queso ) (ETIM l. caseu )
 (Sense 1) (CA m.) (DEF Masa que se obtiene cuajando la leche, exprimiéndola para que deje suero y echándole sal para que se conserve: ~ de Gruyère; ~ de Roquefort; ~ de bola, el de tipo holandés, de forma esférica; ~ de hierba, el que se hace cuajando la leche con hierba a propósito; ~ manchego, el de pasta compacta, algo dura, crudo, de leche de oveja.)
 (Sense 2) (CA m.) (DEF ~ de cerdo, manjar hecho con carne de cerdo o jabalí, picada y prensada.)
 (Sense 3)(CA m.)(DEF ~ helado, helado compacto hecho en molde.)
 (Sense 4)(CA m.)(DEF Medio ~, tablero grueso, semicircular, que usan los sastres para planchar cuellos y solapas y para sentar costuras curvas.)
 (Sense 5)(CA m.)(REG fam.)(DEF Pie.)
 (Sense 6)(CA m.)(GEO Venez.)(DEF ~ frito, estafa.)
 (RELA 1)(TIPOR Rel.)(TXR Del l. caseu derivan numerosos tecn. como caseína, cáseo, caseificar, caseico, caseoso.))
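The DGILE entry is a flat sequence of parenthesised fields (ETIM, Sense, CA, DEF, REG, GEO, ...). As a rough illustration of turning such a record into per-sense records, here is a minimal Python sketch. It is not SEISD code: the name `parse_entry` and the output layout are hypothetical, and it assumes field values never contain nested parentheses (a definition ending in a parenthesised Latin name would break it).

```python
import re

# One field = "(" TAG value ")". Assumption: values contain no nested parentheses.
FIELD = re.compile(r"\((\w+)\s*([^()]*?)\s*\)")


def parse_entry(text):
    """Split a DGILE-style entry into headword, header fields and senses.

    Fields before the first (Sense n) go into "header"; each (Sense n) opens a
    new sense dict. Fields after the last sense (e.g. RELA) are attached to that
    last sense in this simplified sketch.
    """
    fields = FIELD.findall(text)
    headword = fields[0][0]  # the first field holds only the headword
    entry = {"headword": headword, "header": {}, "senses": []}
    current = entry["header"]
    for tag, value in fields[1:]:
        if tag == "Sense":
            current = {"n": int(value)}
            entry["senses"].append(current)
        else:
            current[tag] = value
    return entry


if __name__ == "__main__":
    sample = ("((queso ) (ETIM l. caseu ) (Sense 1) (CA m.) "
              "(DEF Masa que se obtiene cuajando la leche.) "
              "(Sense 5)(CA m.)(REG fam.)(DEF Pie.))")
    for sense in parse_entry(sample)["senses"]:
        print(sense["n"], sense["CA"], sense["DEF"])
```

On the shortened sample this prints one line per sense, `1 m. Masa que se obtiene cuajando la leche.` and `5 m. Pie.`.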
29
Structured Sources
Dictionaries: DGLC (Fabra)
- Poorly coded, no restricted vocabulary
- 89,360 senses in 51,135 entries
Example entry (formatge, "cheese"):
((formatge)(CC m.)
 (NS 1 > 1 > 1 > 0 > 0 > 0)(CG m.)(DF Massa alimentosa que s’obté coagulant la llet, esprement-ne el xerigot i consolidant la part presa.)
 (NS 2 > 1 > 0 > 1 > 0 > 0)(CG m.)(EX Formatge de Maó.)
 (NS 3 > 1 > 0 > 2 > 0 > 0)(CG m.)(EX Formatge fresc, salat.)
 (NS 4 > 1 > 0 > 3 > 0 > 0)(CG m.)(EX Ratllar formatge.)
 (NS 5 > 1 > 0 > 4 > 0 > 0)(CG m.)(EX Un formatge.)
 (FI f4) )
30
Structured Sources
Acquilex
- Morphological information
  - POS (n, v, adj, adv, etc.)
  - Derivative forms
  - Compound forms
  - Derivative model (verbs)
- Syntactic information
  - Idioms
  - Implicit knowledge:
    barrer_1_1 limpiar (el suelo) con la escoba. (clean (the floor) with a broom.)
    freír_1_1 cocer (un manjar) en aceite o grasa hirviendo. (cook (a dish) in boiling oil or fat.)
    comprar_1_1 adquirir (una cosa) a cambio de cierta cantidad de dinero. (acquire (a thing) in exchange for a certain amount of money.)
    cazar_1_1 buscar o perseguir (a las aves, fieras, etc.) para cogerlas o matarlas. (seek or pursue (birds, wild animals, etc.) in order to catch or kill them.)
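In these verb definitions the parenthesised phrase names the kind of thing the verb typically applies to (the floor for barrer, birds and wild animals for cazar). A minimal Python sketch of harvesting that implicit information, assuming the genus verb is the first word of the definition and the typical object is the first parenthesised phrase; the stop-word list and the name `implicit_object` are illustrative, not part of the Acquilex tools.

```python
import re

# Words ignored inside the parenthesised phrase: Spanish articles, the
# preposition "a" (which introduces animate objects) and "etc".
# Illustrative list only.
STOP = {"el", "la", "los", "las", "un", "una", "unos", "unas", "a", "al", "etc"}


def implicit_object(definition):
    """Return (genus_verb, object_words) for a verb definition.

    Assumes the genus is the first word and the typical object is the first
    parenthesised phrase; object_words is None when there is no parenthesis.
    """
    genus = definition.split()[0]
    match = re.search(r"\(([^)]*)\)", definition)
    if match is None:
        return genus, None
    words = [w.strip(".,;") for w in match.group(1).split()]
    return genus, [w for w in words if w and w.lower() not in STOP]


if __name__ == "__main__":
    for d in ["limpiar (el suelo) con la escoba.",
              "buscar o perseguir (a las aves, fieras, etc.) para cogerlas o matarlas."]:
        print(implicit_object(d))
```

Note that a coordinated genus such as "buscar o perseguir" is reduced to its first verb here; a real system would need the parser output instead of a first-word heuristic.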
31
Structured Sources
Semantic Information
- Explicit (LDOCE: pragmatic codes, semantic codes, etc.; DGILE: topic, synonyms, antonyms, figurative senses, etc.)
- Implicit:
  jardín_1_1 Terreno donde se cultivan plantas y flores ornamentales. (Land where ornamental plants and flowers are grown.)
  florero_1_4 Maceta con flores. (Pot with flowers.)
  ramo_1_3 Conjunto natural o artificial de flores, ramas o hierbas. (Natural or artificial bunch of flowers, branches or herbs.)
  pétalo_1_1 Hoja que forma la corola de la flor. (Leaf that forms the corolla of the flower.)
  tálamo_1_3 Receptáculo de la flor. (Receptacle of the flower.)
  miel_1_1 Substancia viscosa y muy dulce que elaboran las abejas, en una distensión del esófago, con el jugo de las flores y luego depositan en las celdillas de sus panales. (Viscous, very sweet substance that bees make, in a distension of the oesophagus, from the juice of flowers and then deposit in the cells of their honeycombs.)
  florería_1_1 Floristería; tienda o puesto donde se venden flores. (Florist's; shop or stall where flowers are sold.)
  florista_1_1 Persona que tiene por oficio hacer o vender flores. (Person whose trade is making or selling flowers.)
  camelia_1_1 Arbusto cameliáceo de jardín, originario de Oriente, de hojas perennes y lustrosas, y flores grandes, blancas, rojas o rosadas (Camellia japonica). (Garden shrub of the camellia family, native to the East, with evergreen glossy leaves and large white, red or pink flowers.)
  camelia_1_2 Flor de este arbusto. (Flower of this shrub.)
  rosa_1_1 Flor del rosal. (Flower of the rose bush.)
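All the implicit relations on this slide surface as the word flor/flores inside the definitions of other senses. A first step towards exploiting them is an inverted index from definition words to the senses that use them. The sketch below assumes a tiny hand-written lemma table (a hypothetical stand-in for a real morphological analyser, e.g. MACO+, which appears later in the SEISD architecture). It only finds candidate senses; it does not decide which relation (part, location, producer, ...) holds.

```python
import re
from collections import defaultdict

# Hypothetical toy lemma table standing in for a morphological analyser.
LEMMAS = {"flores": "flor", "plantas": "planta", "abejas": "abeja"}


def index_definitions(senses):
    """Map each lemmatised definition word to the set of senses that use it."""
    index = defaultdict(set)
    for sense_id, definition in senses.items():
        for token in re.findall(r"\w+", definition.lower()):
            index[LEMMAS.get(token, token)].add(sense_id)
    return index


if __name__ == "__main__":
    senses = {
        "jardín_1_1": "Terreno donde se cultivan plantas y flores ornamentales.",
        "pétalo_1_1": "Hoja que forma la corola de la flor.",
        "rosa_1_1": "Flor del rosal.",
        "tálamo_1_3": "Receptáculo de la flor.",
    }
    print(sorted(index_definitions(senses)["flor"]))
```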
32
Structured Sources
Main Problems
- Conventional dictionaries are not systematic
- Dictionaries are built for human use
- Implicit knowledge
  - words are described/translated in terms of other words
33
Structured Sources
SEISD (Rigau 98)
- The System
  - General Frame
  - Methodology
  - SEISD
  - Application of the methodology
34
SEISD
General Frame
- Characteristics of the Lexical Resources used
- Lexical Knowledge to be extracted
- Lexical Knowledge Representation
- The acquisition process
35
SEISD
General Frame
- Characteristics of the Lexical Resources used
  - DGILE
  - Spanish/English bilingual dictionaries
  - WordNet
  - Type System of the LKB
36
SEISD
General Frame
- Characteristics of the Lexical Resources used
  - DGILE
    - 89,043 entries and 157,842 senses
    - 1.4 million words in definitions and examples
    - neither semantic nor pragmatic codes
    - no restricted vocabulary
Example entry (vino, "wine"):
vino (l. vinu) m. Zumo de uvas fermentado;... 2 fig. Bautizar o cristianizar, el ~, echarle agua. 3 fig. Dormir uno el ~, dormir mientras dura la borrachera; tener uno mal ~, ser pendenciero en la embriaguez. 4 p.ext. Zumo. | HOMOF.: vino (v.), bino (v.). REL. Enológico, enólogo, enotecnia, derivados de enología, ciencia de la vinicultura, formada del gr. oinos.
37
SEISD
General Frame
- Characteristics of the Lexical Resources used
  - Spanish/English bilingual dictionaries
    - EEI (Spanish-English): 16,463 entries with 28,002 translation fields
    - EIE (English-Spanish): 15,352 entries with 27,033 translation fields
Example entries:
  vino m wine. ~ de Jerez, sherry; ~ tinto, red wine.
  wine n vino
38
SEISD
General Frame
- Characteristics of the Lexical Resources used
  - WordNet
    - v1.6 has 123,497 content words and 99,642 synsets
Example (hyponyms of wine):
Sense 1
wine, vino -- (fermented juice (of grapes especially))
  => sake, saki -- (Japanese beverage from fermented rice...)
  => vintage -- (a season's yield of wine from a vineyard)
  => red wine -- (wine having a red color derived from skins...)
  => Pinot noir -- (dry red California table wine...)
  => claret, red Bordeaux -- (dry red Bordeaux or Bordeaux-like wine)
  => Saint Emilion -- (full-bodied red wine from...)
  => Chianti -- (dry red Italian table wine from the Chianti...)
  => Cabernet, Cabernet Sauvignon -- (superior Bordeaux-type red wine)
  => Rioja -- (dry red table wine from the Rioja...)
  => zinfandel -- (dry fruity red wine from California)
39
SEISD
General Frame
- Characteristics of the Lexical Resources used
  - Type System of the LKB (Copestake 92)
    - 527 types with 196 features
40
SEISD
General Frame
- Lexical Knowledge to be extracted
  - Explicit information (POS, TD, uses, etc.)
  - Implicit information
    - Hypernym/hyponym relations (class/subclass)
    - Synonymy/antonymy relations
    - Meronym/holonym relations (part/whole, ...)
    - Case role relations (agentive, telic, ...)
    - Content relations (qualia, form, constitutive, ...)
    - Collocational relations (compounds, idioms, ...)
    - Selectional restrictions (typical subject, object, ...)
    - Translation equivalences
41
SEISD
General Frame
- Lexical Knowledge Representation
  - LKB (Copestake 92)
    - represents both syntactic and semantic information
    - Typed Feature Structure formalism (Carpenter 92)
    - default inheritance
    - lexical and phrasal rules
    - multilingual relations
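Default inheritance means that a type passes its feature values down to its subtypes unless a subtype overrides them. The Python sketch below shows only that lookup behaviour on a toy hierarchy. It is not the LKB, which uses typed feature structures and (default) unification rather than a dictionary walk; the class `LexType`, the types and the feature values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LexType:
    """A node in a toy type hierarchy with default (overridable) feature values."""
    name: str
    parent: Optional["LexType"] = None
    defaults: dict = field(default_factory=dict)

    def lookup(self, feature):
        """Return the value from the most specific type that sets `feature`."""
        node = self
        while node is not None:
            if feature in node.defaults:
                return node.defaults[feature]
            node = node.parent
        raise KeyError(feature)


if __name__ == "__main__":
    drink = LexType("drink", defaults={"state": "liquid", "telic": "drink"})
    wine = LexType("wine", drink, {"origin": "grape"})
    cooking_wine = LexType("cooking_wine", wine, {"telic": "cook"})  # overrides telic
    print(wine.lookup("telic"), cooking_wine.lookup("telic"), cooking_wine.lookup("state"))
```

Here `wine` inherits the telic value of `drink`, while `cooking_wine` overrides it but still inherits `state`.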
42
Structured Sources
SEISD
- The System
  - General Frame
  - Methodology
  - SEISD
  - Application of the methodology
43
SEISD
Methodology
[Figure: MRD1 ... MRDn, LDB1 ... LDBn, Tax1 ... Taxn, LKB1 ... LKBn, MLKB]
44
SEISD
Methodology
- Problems of a purely descriptive approach
  - Circularity
  - Errors and inconsistencies
  - Definitions with omitted genus
  - Top dictionary senses do not usually represent useful knowledge for the LKB
    - Too general
    - Too specific
45
SEISD
Methodology
Mixed Methodology
- Prescriptive approach: manual construction of the Type System
46
SEISD
Methodology
Mixed Methodology
- Descriptive approach: acquiring implicit information from MRDs
- Prescriptive approach: manual construction of the Type System
48
SEISD
Methodology
- Step 1: Selection of the main top beginners for a semantic primitive
- Step 2: Exploiting genus, construction of taxonomies
- Step 3: Exploiting differentia
- Step 4: Mapping the LK into the LKB
- Step 5: Tlinks generation
- Step 6: Validation and exploitation of the LKB
49
Structured Sources
SEISD
- The System
  - General Frame
  - Methodology
  - SEISD
  - Application of the methodology
50
Structured Sources
SEISD
- SEISD: Sistema d’Extracció d’Informació Semàntica de Diccionaris (System for the Extraction of Semantic Information from Dictionaries) (Ageno et al. 92)
  - designed to support the main methodology
  - taking into account the characteristics of the lexical resources used
  - reusability of software and lexical resources
  - allowing modular improvements
  - minimal effort
51
SEISD
[Figure: SEISD architecture. Labels: User; LKB System: PRE, SemBuild, TaxBuild, CRS, TGE; LDB/LKB system; LDB System; Linguistic Knowledge: SegWord and FPar, MACO+, Relax and SinPar; Lexical Knowledge; LDB: DGILE, English/Spanish and Spanish/English MTDs, WordNet, Taxonomies; LKB: Type System, Lexicons; LDB/LKB Lexicons]
52
Structured Sources
SEISD (Rigau 98)
- The System
  - General Frame
  - Methodology
  - SEISD
  - Application of the methodology
53
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive (Rigau et al. 98)
Word sense: zumo_1_1
Attached-to: c_art_subst type
Definition: líquido que se extrae de las flores, hierbas, frutos, etc. (liquid extracted from flowers, herbs, fruits, etc.)
54
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
- A) Attaching DGILE senses to semantic primitives
  - 1) First labelling: Conceptual Distance (Rigau 94)
  - 2) Second labelling: Salient Words (Yarowsky 92)
- B) Filtering process
55
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
- A.1) First labelling: Conceptual Distance (Agirre et al. 94)
  - length of the shortest path
  - specificity of the concepts
  - using WordNet
  - using a bilingual dictionary
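As the measure is usually given in the literature, the conceptual distance between two words is dist(w1, w2) = min over sense pairs (c1, c2) of the sum, over the concepts c on the shortest path between c1 and c2, of 1/depth(c). Short paths give small distances, and paths through deep (specific) concepts cost less than paths through general ones. A minimal Python sketch of that formula on a hypothetical single-rooted toy tree follows; the real method works over WordNet, reaching English synsets for the Spanish words through the bilingual dictionary.

```python
from itertools import product


def ancestors(concept, parent):
    """[concept, parent, grandparent, ..., root] in a single-rooted tree."""
    chain = [concept]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def depth(concept, parent):
    """Depth counted from 1 at the root, so 1/depth is always defined."""
    return len(ancestors(concept, parent))


def path(c1, c2, parent):
    """Concepts on the tree path from c1 to c2, lowest common ancestor included once."""
    up1, up2 = ancestors(c1, parent), ancestors(c2, parent)
    lca = next(c for c in up1 if c in up2)  # assumes a single root
    return up1[: up1.index(lca) + 1] + up2[: up2.index(lca)]


def conceptual_distance(senses1, senses2, parent):
    """Minimum, over sense pairs, of the sum of 1/depth along the connecting path."""
    return min(
        sum(1 / depth(c, parent) for c in path(c1, c2, parent))
        for c1, c2 in product(senses1, senses2)
    )


# Hypothetical toy hierarchy (child -> parent), standing in for WordNet.
TOY = {
    "object": "entity", "artifact": "object", "food": "object",
    "building": "artifact", "church": "building", "monastery": "building",
    "juice": "food",
}

if __name__ == "__main__":
    print(conceptual_distance(["church"], ["monastery"], TOY))  # 1/5 + 1/4 + 1/5
    print(conceptual_distance(["church"], ["juice"], TOY))
```

Church and monastery meet at the specific concept building, so their distance (about 0.65) is much smaller than that between church and juice, which only meet at object.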
56
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
abadía_1_2 Iglesia o monasterio regido por un abad o abadesa (abbey: a church or a monastery ruled by an abbot or an abbess)
[Figure: WordNet nodes <entity>, <monastery><abbey> and <convent><abbey>]
57
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
abadía_1_2 Iglesia o monasterio regido por un abad o abadesa (abbey: a church or a monastery ruled by an abbot or an abbess)
[Figure: WordNet nodes <entity>, <monastery><abbey> and <convent><abbey>; file 06 ARTIFACT]
58
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
- A.1) First labelling (results)
  - 29,205 labelled definitions (31%)
  - 61% accuracy at sense level
  - 64% accuracy at file level (WordNet lexicographer file, e.g. 06 ARTIFACT)
59
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
- A.2) Second labelling: Salient Words (Yarowsky 92)
  - Importance
    - local frequency
    - a salient word appears significantly more often in the corpus of a semantic category than at other points in the whole corpus
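Salience can be sketched as the log ratio between a word's probability inside a category's text and its probability in the whole corpus; a definition is then labelled with the category whose salient words give the highest summed weight. The Python sketch below is a simplified version of that idea (no smoothing, no frequency cut-offs, no category prior) on invented toy data. The real labelling works on the definitions already attached to each semantic file, and scores such as 4.8399 for biberón_1_1 come from that setting, not from this code.

```python
import math
from collections import Counter


def salient_weights(labelled):
    """labelled: iterable of (category, tokens) pairs.

    Returns {category: {word: log(Pr(w|C) / Pr(w))}}, keeping only words whose
    relative frequency inside C exceeds their overall relative frequency.
    """
    total, per_cat = Counter(), {}
    for cat, tokens in labelled:
        total.update(tokens)
        per_cat.setdefault(cat, Counter()).update(tokens)
    n_total = sum(total.values())
    weights = {}
    for cat, counts in per_cat.items():
        n_cat = sum(counts.values())
        weights[cat] = {}
        for w, c in counts.items():
            ratio = (c / n_cat) / (total[w] / n_total)
            if ratio > 1:  # salient for this category
                weights[cat][w] = math.log(ratio)
    return weights


def classify(tokens, weights):
    """Return the category whose salient words give the largest summed weight."""
    scores = {cat: sum(ws.get(t, 0.0) for t in tokens) for cat, ws in weights.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    toy = [("FOOD", ["leche", "queso", "sal"]), ("FOOD", ["leche", "zumo"]),
           ("ARTIFACT", ["frasco", "cristal"]), ("ARTIFACT", ["frasco", "leche"])]
    w = salient_weights(toy)
    print(classify(["frasco", "de", "cristal"], w), classify(["leche", "con", "sal"], w))
```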
60
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
- A.2) Second labelling (results):
  - 86,759 labelled definitions (93%)
  - 80% accuracy at file level
Examples:
  biberón_1_1 ARTIFACT 4.8399 Frasco de cristal... (glass flask...)
  biberón_1_2 FOOD 7.4443 Leche que contiene este frasco... (milk contained in that flask...)
61
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
- B) Filtering process (FOODs)
  - removes the genus terms that:
    - FILTER 1: are not FOOD according to the bilingual mapping
    - FILTER 2: appear more often as genus in other semantic categories (SC)
    - FILTER 3: have a low frequency
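The three filters can be read as successive tests on each candidate FOOD genus term. A sketch under stated assumptions: genus occurrences are given per semantic category, the bilingual mapping is reduced to a set of terms it classes as FOOD, and "low frequency" is a hypothetical threshold `min_freq`, since the slide gives no cut-off. The toy data is invented.

```python
from collections import Counter


def filter_genus_terms(genus_by_category, bilingual_food, target="FOOD", min_freq=2):
    """Return the genus terms of `target` that survive the three filters.

    genus_by_category: {category: list of genus terms, one per definition}
    bilingual_food:    terms the bilingual mapping classes as FOOD
    min_freq:          hypothetical cut-off for FILTER 3
    """
    counts = {cat: Counter(terms) for cat, terms in genus_by_category.items()}
    kept = set()
    for term, freq in counts[target].items():
        if term not in bilingual_food:  # FILTER 1: not FOOD by the bilingual mapping
            continue
        if any(c[term] > freq for cat, c in counts.items() if cat != target):
            continue  # FILTER 2: more frequent as genus in another category
        if freq < min_freq:  # FILTER 3: low frequency
            continue
        kept.add(term)
    return kept


if __name__ == "__main__":
    genus = {"FOOD": ["zumo"] * 3 + ["bebida"] * 2 + ["pasta"] * 2 + ["masa", "licor"],
             "ARTIFACT": ["pasta"] * 3 + ["masa"]}
    print(sorted(filter_genus_terms(genus, {"zumo", "bebida", "pasta", "licor"})))
```

In the toy run, masa fails FILTER 1, pasta fails FILTER 2 (it is more frequent as an ARTIFACT genus) and licor fails FILTER 3, leaving bebida and zumo.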
62
SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
- B) Filtering process (FOOD results)
63
SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus (Rigau et al. 97)
Word sense: vino_1_1
Hypernym: zumo_1_1
Definition: zumo de uvas fermentado. (fermented juice of grapes)
Word sense: rueda_2_1
Hypernym: vino_1_1
Definition: vino procedente de la región de Rueda (Valladolid). (wine from the region of Rueda)
64
SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
- Genus sense identification
  - 97% accuracy for nouns
- Genus sense disambiguation
  - Unsupervised WSD
  - Unrestricted WSD (100% coverage)
  - Eight heuristics (McRoy 92)
    - Combining several lexical resources
    - Combining several methods
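The slide does not say how the eight heuristics are combined. One simple scheme, assumed here, normalises each heuristic's scores over the candidate senses of the genus word and adds the normalised votes, so that no heuristic dominates merely because of the scale of its scores; heuristics that cannot decide abstain. The candidate sense zumo_1_2 and all scores are hypothetical.

```python
def combine_heuristics(votes):
    """Combine per-heuristic scores over the candidate senses of a genus word.

    votes: one {sense: score} dict per heuristic; an empty dict means the
    heuristic abstains. Each vote is normalised to sum to 1, the normalised
    votes are added, and the best-scoring sense is returned (None if all abstain).
    """
    total = {}
    for vote in votes:
        mass = sum(vote.values())
        if mass <= 0:
            continue  # abstaining heuristic
        for sense, score in vote.items():
            total[sense] = total.get(sense, 0.0) + score / mass
    return max(total, key=total.get) if total else None


if __name__ == "__main__":
    # Genus "zumo" in the definition of vino_1_1; one dict per heuristic.
    votes = [{"zumo_1_1": 3, "zumo_1_2": 1}, {"zumo_1_1": 1},
             {"zumo_1_1": 1, "zumo_1_2": 1}, {}]
    print(combine_heuristics(votes))
```

Normalising matters: a heuristic that scores {a: 100, b: 90} contributes only slightly more to a than to b, so a second heuristic voting for b alone can outweigh it.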
65
SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
- Results
66
SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
- Knowledge provided by each heuristic
67
SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
- F2+F3>9: 35,099 definitions
- F2+F3>4: 40,754 definitions
- No filters: 111,624 definitions
68
SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
...
zumo_1_1 vino_1_1 quianti_1_1
zumo_1_1 vino_1_1 raya_1_8
zumo_1_1 vino_1_1 requena_1_1
zumo_1_1 vino_1_1 reserva_1_12
zumo_1_1 vino_1_1 ribeiro_1_1
zumo_1_1 vino_1_1 rioja_1_1
zumo_1_1 vino_1_1 roete_1_1
zumo_1_1 vino_1_1 rosado_1_3
zumo_1_1 vino_1_1 rueda_2_1
zumo_1_1 vino_1_1 sherry_1_1
zumo_1_1 vino_1_1 tarragona_1_1
zumo_1_1 vino_1_1 tintilla_1_1
zumo_1_1 vino_1_1 tintorro_1_1
zumo_1_1 vino_1_1 toro_3_1
...
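Once each definition's genus has been disambiguated, every sense points to its hypernym sense, and chains like those above are paths from a top sense down to a leaf. A small Python sketch, assuming the disambiguated links are given as a dict; the name `chains` is illustrative. It also guards against the circular definitions listed among the problems of a purely descriptive approach.

```python
def chains(hypernym):
    """Root-to-leaf hypernym chains.

    hypernym: {sense: disambiguated genus sense}. A leaf is a sense that is
    nobody's hypernym. Climbing stops at a sense without a hypernym or when a
    sense would repeat (a circular definition).
    """
    parents = set(hypernym.values())
    result = []
    for leaf in sorted(s for s in hypernym if s not in parents):
        path, seen, node = [leaf], {leaf}, leaf
        while node in hypernym and hypernym[node] not in seen:
            node = hypernym[node]
            seen.add(node)
            path.append(node)
        result.append(path[::-1])  # top sense first, as in the listing
    return result


if __name__ == "__main__":
    links = {"vino_1_1": "zumo_1_1", "rueda_2_1": "vino_1_1", "rioja_1_1": "vino_1_1"}
    for chain in chains(links):
        print(" ".join(chain))
```

This prints `zumo_1_1 vino_1_1 rioja_1_1` and `zumo_1_1 vino_1_1 rueda_2_1`, the same format as the listing.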
69
SEISD: Application of the methodology
Step 3 (SemBuild): Exploiting Differentia
Word sense: rueda_2_1
Definition: vino procedente de la región de Rueda
SinPar: sn:[n:vino] origin:[n:región, sp:[r0d:de, sn:[n:Rueda]]].
70
SEISD: Application of the methodology
Step 3 (SemBuild): Exploiting Differentia
- MACO+, Relax (Padró 97), SinPar
71
SEISD: Application of the methodology
Step 4 (CRS): Placing the LK into the LKB
rueda x_1_1
  = ("VOX")
  = ("rueda")
  = ("2")
  = ("1")
  =
  = <("Rueda").
72
SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation (Ageno et al. 93)
rueda_x_2_1 linked to wine_l_1_1 (parent)
rueda_x_2_1 linked to drink_l_2_1 (grandparent)
rueda_x_2_1 linked to (parent)
rueda_x_2_1 linked to (grandparent)
73
SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
- Simple tlink
- Partial tlink
  - Rioja_x_1_1 linked to wine_l_1_1
- Phrasal tlink
  - ahumado_x_1_1 linked to smoked_food_l_1_1
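The rueda_x_2_1 example suggests how a partial tlink can be derived: a sense with no direct English equivalent borrows the link of its nearest Spanish ancestor that has one. The sketch below encodes that reading, which is an assumption: "parent" is taken as the English equivalent of the nearest linked Spanish ancestor, and "grandparent" as that English sense's hypernym. The function name `tlinks` and the dictionaries are hypothetical toy data built from the names on the slides.

```python
def tlinks(sense, es_hypernym, simple, en_hypernym):
    """Return [(english_sense, kind)] for a Spanish sense.

    simple:       Spanish sense -> English sense (direct equivalents)
    es_hypernym:  Spanish sense -> its genus sense (Spanish taxonomy)
    en_hypernym:  English sense -> its hypernym (target taxonomy)
    """
    if sense in simple:
        return [(simple[sense], "simple")]
    seen, node = {sense}, sense
    while node in es_hypernym and es_hypernym[node] not in seen:  # stop on cycles
        node = es_hypernym[node]
        seen.add(node)
        if node in simple:  # nearest ancestor with a simple tlink
            target = simple[node]
            links = [(target, "parent")]
            if target in en_hypernym:
                links.append((en_hypernym[target], "grandparent"))
            return links
    return []


if __name__ == "__main__":
    es = {"rueda_2_1": "vino_1_1", "vino_1_1": "zumo_1_1"}
    simple = {"vino_1_1": "wine_l_1_1"}
    en = {"wine_l_1_1": "drink_l_2_1"}
    print(tlinks("rueda_2_1", es, simple, en))
```

Under these assumptions, rueda_2_1 receives wine_l_1_1 as parent and drink_l_2_1 as grandparent, the two recoverable links on the rueda_x_2_1 slide.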
74
SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
- First experiment
  - (semi)automatic approach using PRE
  - linking DGILE to LDOCE
  - drink taxonomy (235 definitions)
75
SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
- First experiment (results)
76
SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
- Second experiment
  - automatic approach using PRE
    - Conceptual Distance
  - linking DGILE to WordNet
  - food taxonomy (140 definitions)
77
SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
[Figure: zumo_1_1, vino_1_1 and rueda_2_1 alongside the WordNet nodes <juice>, <foodstuff>, <object> and <entity>]
78
SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
[Figure: zumo_1_1, vino_1_1 and rueda_2_1 alongside the WordNet nodes <juice>, <foodstuff>, <object> and <entity>, with links labelled simple-tlink, simple-tlink and partial-tlink]
79
SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
- Second experiment (results)
80
SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet (Atserias et al. 97)
- Ten class methods
  - four monosemic criteria
  - four polysemic criteria
  - two hybrid criteria
- Three conceptual distance methods
  - CD1: using pairwise word co-occurrences
  - CD2: using headword and genus
  - CD3: using bilingual Spanish entries with multiple translations
81
SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
- Ten class methods
  - Four monosemic criteria
[Figure: translation patterns between Spanish words (SW) and English words (EW)]
82
SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
- Ten class methods
  - Four monosemic criteria
[Figure: the SW/EW translation patterns, each linked to a WordNet synset]
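Reading SW as a Spanish word and EW as an English word, the monosemic criteria link a Spanish word to the only synset of an English translation, and differ in how many translations and back-translations are involved. The Python sketch below is one assumed reading of those four patterns (1:1, 1:N, N:1, N:M), not the exact criteria of Atserias et al. 97; `monosemic_links` and synset names such as S_wine are placeholders.

```python
from collections import defaultdict

# (SW has several EWs, EW translates several SWs) -> pattern label
PATTERNS = {(False, False): "1:1", (True, False): "1:N",
            (False, True): "N:1", (True, True): "N:M"}


def monosemic_links(es_to_en, en_synsets):
    """Link Spanish words (SW) to the single synset of monosemous English translations (EW).

    es_to_en:   {SW: set of EW translations}
    en_synsets: {EW: list of synsets}
    Each link is tagged with the translation pattern of the SW/EW pair.
    """
    en_to_es = defaultdict(set)
    for sw, ews in es_to_en.items():
        for ew in ews:
            en_to_es[ew].add(sw)
    links = []
    for sw, ews in sorted(es_to_en.items()):
        for ew in sorted(ews):
            synsets = en_synsets.get(ew, [])
            if len(synsets) != 1:
                continue  # polysemous or unknown EW: left to the polysemic criteria
            pattern = PATTERNS[(len(ews) > 1, len(en_to_es[ew]) > 1)]
            links.append((sw, synsets[0], pattern))
    return links


if __name__ == "__main__":
    es_to_en = {"vino": {"wine"}, "caldo": {"wine", "broth"}, "zumo": {"juice"}}
    en_synsets = {"wine": ["S_wine"], "broth": ["S_broth"], "juice": ["S_juice1", "S_juice2"]}
    for link in monosemic_links(es_to_en, en_synsets):
        print(link)
```

In the toy data zumo gets no link because juice has two synsets; under this reading it would fall to the polysemic criteria.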
83
SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
- Ten class methods
  - Four polysemic criteria
[Figure: SW/EW translation patterns linked to several synsets (Synset+)]
84
SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
- Ten class methods
  - Variant criterion
  - Field criterion
[Figure: SW]
85
SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
- Ten class methods (results)
86
SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
- Three CD methods (results)
87
SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
- Combining methods (results)
88
SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
- Resulting Spanish WordNets
89
LK from MRDs: Taxonomies
...
zumo_1_1 vino_1_1 quianti_1_1
zumo_1_1 vino_1_1 raya_1_8
zumo_1_1 vino_1_1 requena_1_1
zumo_1_1 vino_1_1 reserva_1_12
zumo_1_1 vino_1_1 ribeiro_1_1
zumo_1_1 vino_1_1 rioja_1_1
zumo_1_1 vino_1_1 roete_1_1
zumo_1_1 vino_1_1 rosado_1_3
zumo_1_1 vino_1_1 rueda_2_1
zumo_1_1 vino_1_1 sherry_1_1
zumo_1_1 vino_1_1 tarragona_1_1
zumo_1_1 vino_1_1 tintilla_1_1
zumo_1_1 vino_1_1 tintorro_1_1
zumo_1_1 vino_1_1 toro_3_1
...
90
LK from MRDs: MTDs
371,616 connections
Co-occurrence pairs for queso:

Weight    MI   Pair freq.  Word 1       Word 2    Freq. 1  Freq. 2
11.8004   9.8  16          elaborado    queso     35       113
10.8938   8.0  23          pasta        queso     178      113
10.4846   7.5  25          leche        queso     274      113
10.2483   9.2  13          oveja        queso     45       113
 9.1513   7.6  16          queso        sabor     113      160
 7.4956   8.3   8          queso        tortilla  113      51
 6.7732   7.5   8          queso        vaca      113      84
 6.5830   6.1  12          maíz         queso     347      113
 6.2208   8.9   5          queso        suero     113      21
 6.1509   8.8   5          mantequilla  queso     22       113
 6.1474   7.9   6          compacta     queso     50       113
 5.9918   7.7   6          picante      queso     55       113
 5.9002   9.8   4          manchego     queso     9        113
 5.6805   7.3   6          cabra        queso     75       113
 5.6300   5.9   9          pan          queso     287      113
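The slide itself does not label the table's columns. Reading them as a weight, a mutual-information-like score, the pair frequency, the two words and their individual frequencies, the first column equals the second multiplied by log10 of the third in every row (9.8 × log10 16 ≈ 11.8004; 5.9 × log10 9 ≈ 5.6300). The sketch below reproduces that arithmetic. `pmi` shows the standard pointwise mutual information formula; whether the second column was computed exactly that way, and over what sample size, is an assumption, since the slide does not say.

```python
import math


def pmi(f12, f1, f2, n):
    """Pointwise mutual information in bits: log2(n * f12 / (f1 * f2))."""
    return math.log2(n * f12 / (f1 * f2))


def weighted_score(mi, f12):
    """MI multiplied by log10 of the pair frequency, which down-weights rare pairs."""
    return mi * math.log10(f12)


if __name__ == "__main__":
    # (MI, pair frequency, word) for three rows of the queso table.
    for mi, f12, word in [(9.8, 16, "elaborado"), (8.0, 23, "pasta"), (9.8, 4, "manchego")]:
        print(word, round(weighted_score(mi, f12), 4))
```

This prints 11.8004, 10.8938 and 5.9002, the values in the first column. The log10 factor explains why manchego, with the same MI as elaborado but only 4 co-occurrences, ranks near the bottom.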
92
LK from MRDs: EuroWordNet

SPANISH   synsets   words    variants
adjs      12,461    8,714    16,713
nouns     43,573    47,813   62,319
verbs      8,298    6,010    13,230
Sum       64,332    62,537   92,262

CATALAN   synsets   words    variants
adjs       1,386    1,507     2,030
nouns     30,701    32,988   42,320
verbs      4,487    4,289    10,297
Sum       36,574    38,784   54,647