Acquisition of Lexical Knowledge for NLP German Rigau i Claramunt TALP Research Center Departament de Llenguatges i Sistemes.


1 Acquisition of Lexical Knowledge for NLP
German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

2 Outline
• Setting
• Words and Works
• Structured Sources
  – MRDs, thesauri
• Unstructured Sources
  – corpora

3 Setting
• NLP and the Lexicon
  – Theoretical: WG, GPSG, HPSG.
  – Practical: realistic complexity and coverage
• Lexical Bottleneck (Briscoe 91)
  – Even worse for languages other than English

4 Setting
• Which LK is needed by a concrete NLP system?
• Where is this LK located?
• Which procedures can be applied?

5 Setting
• Which LK is needed by a concrete NLP system?
  – Phonology: phonemes, stress, etc.
  – Morphology: POS, etc.
  – Syntactic: category, subcat., etc.
  – Semantic: class, SRs, etc.
  – Pragmatic: usage, registers, TDs, etc.
  – Translations: translation links

6 Setting
• Where is this LK located?
  – Human brain
  – Structured Lexical Resources:
    • Monolingual and bilingual MRDs
    • Thesauri
  – Unstructured Lexical Resources:
    • Monolingual and bilingual Corpora
  – Mixing resources

7 Setting
• Which procedures can be applied?
  – Prescriptive approach
    • Machine-aided manual construction
  – Descriptive approach
    • Automatic acquisition from pre-existing Lexical Resources
  – Mixed approach

8 Outline
• Setting
• Words and Works
• Structured Sources
  – MRDs, thesauri
• Unstructured Sources
  – corpora

9 Words and Works
Where is this Lexical Knowledge located?
– Human brain:
  • Linguistic String Project (Fox et al. 88)
    – Lexical information for 10,000 entries
  • WordNet (Miller et al. 90)
    – Semantic information; v1.6 has 99,642 synsets
  • Comlex (Grishman et al. 94)
    – Syntactic information for 38,000 English words
  • CYC Ontology (Lenat 95)
    – a person-century of effort to produce 100,000 terms
  • LDOCE3-NLP
    – dictionary with 80,000 senses

10 Words and Works
Where is this Lexical Knowledge located?
– Structured Lexical Resources
  • Monolingual MRDs:
    – LDOCE
      • learner’s dictionary
      • 35,956 entries and 76,059 definitions
      • 86% semantic and 44% pragmatic codes
      • controlled vocabulary of 2,000 words
      • (Boguraev & Briscoe 89)
      • (Vossen & Serail 90)
      • (Bruce & Guthrie 92), (Wilks et al. 93)
      • (Dolan et al. 93), (Richardson 97)

11 Words and Works
Where is this Lexical Knowledge located?
– Structured Lexical Resources
  • Other Monolingual MRDs:
    – Webster’s (Jensen & Ravin 87)
    – LPPL (Artola 93)
    – DGILE (Castellón 93), (Taulé 95), (Rigau 98)
    – CIDE (Harley & Glennon 97)
    – AHD (Richardson 97)
    – WordNet (Harabagiu 98)
  • Bilingual MRDs
    – Collins Spanish/English (Knight & Luk 94)
    – Vox/Harrap’s Spanish/English (Rigau 98)

12 Words and Works
Where is this Lexical Knowledge located?
– Structured Lexical Resources
  • Thesauri:
    – Roget’s Thesaurus
      • 60,071 words in 1,000 categories
      • (Yarowsky 92), (Grefenstette 93), (Resnik 95)
    – Roget’s II and The New Collins Thesaurus
      • (Byrd 89)
    – Macquarie’s thesaurus
      • (Grefenstette 93)
    – Bunrui Goi Hyou Japanese thesaurus
      • (Utsuro et al. 93)

13 Words and Works
Where is this Lexical Knowledge located?
– Structured Lexical Resources
  • Encyclopaedias
    – Grolier’s Encyclopaedia (Yarowsky 92)
    – Encarta (Richardson et al. 98)
  • Others
    – Telephone directories
  • Mixing structured lexical resources
    – Roget’s Thesaurus and Grolier’s (Yarowsky 92)
    – LDOCE, WN, Collins, ONTOS, UM (Knight & Luk 94)
    – Japanese MRD to WN (Okumura & Hovy 94)
    – LLOCE, LDOCE (Chen & Chang 98)

14 Words and Works
Where is this Lexical Knowledge located?
– Unstructured Lexical Resources
  • Corpora:
    – WSJ, Brown Corpus (SemCor), Hansard
    – Proper Nouns (Hearst & Schütze 95)
    – Idiosyncratic Collocations (Church et al. 91)
    – Preposition preferences (Resnik and Hearst 93)
    – Subcategorization structures (Briscoe and Carroll 97)
    – Selectional restrictions (Resnik 93), (Ribas 95)
    – Thematic structure (Basili et al. 92)
    – Word semantic classes (Dagan et al. 94)
    – Bilingual Lexicons for MT (Fung 95)

15 Words and Works
Where is this Lexical Knowledge located?
– Mixing structured and non-structured Lexical Resources
  • MRDs and Corpora
    – (Liddy & Paik 92)
    – (Klavans & Tzoukermann 96)
  • WordNet and Corpora
    – (Resnik 93), (Ribas 95), (Li & Abe 95), (McCarthy 01)
    – (Mihalcea & Moldovan 99)

16 Words and Works
Lexical Acquisition from MRDs
– Syntactic Disambiguation (Dolan et al. 93)
– Semantic Processing (Vanderwende 95)
– WSD (Lesk 86), (Wilks & Stevenson 97), (Rigau 98)
– IR (Krovetz & Croft 92)
– MT (Knight and Luk 94), (Tanaka & Umemura 94)
– Semantically enriching MRDs
  • (Yarowsky 92), (Knight 93), (Chen & Chang 98)
– Building LKBs
  • (Bruce & Guthrie 92)
  • (Dolan et al. 93)
  • (Artola 93)
  • (Castellón 93), (Taulé 95), (Rigau 98)

17 Words and Works
International Projects on Lexical Acquisition
– Japanese Projects
  • EDR (Yokoi 95)
    – Nine-year project oriented to MT
    – Bilingual corpora with 250,000 words
    – Monolingual, bilingual and co-occurrence dictionaries
    – 200,000 general vocabulary
    – 100,000 technical terminology
    – 400,000 concepts

18 Words and Works
International Projects on Lexical Acquisition
– American Projects
  • Comlex (Grishman et al. 94)
    – Syntactic information for 38,000 words
  • WordNet (Miller 90)
    – Semantic information
    – more than 123,000 words organised in 99,000 synsets
    – more than 116,000 relations between synsets
  • Pangloss (Knight & Luk 94)
    – PUM, ONTOS, LDOCE semantic categories, WordNet
  • Cyc (Lenat 95)
    – common-sense knowledge
    – 100,000 concepts and 1,000,000 axioms

19 Words and Works
International Projects on Lexical Acquisition
– European Projects
  • Acquilex I and II
    – LA from monolingual and bilingual MRDs and corpora
  • LE-Parole
    – Large-scale harmonised set of corpora and lexicons for all the EU languages
  • EuroWordNet
    – To develop a multilingual WordNet for several European languages

20 Setting
• Acquilex I
• Acquilex II
• EuroWordNet

21 Words and Works: Acquilex
• Lexical Knowledge Acquisition
• Mixed approach
• Dictionaries (MRD -> MTD -> LDB -> LKB)
• Partners
  – Cambridge University
  – Istituto di Linguistica Computazionale, Pisa
  – Amsterdam University
  – Dublin University
• 30 months
• Theses
  – (Castellón 1993)
  – (Taulé 1995)
  – (Rigau 1998)

22 Words and Works: Acquilex II
• Lexical Knowledge Acquisition
• Mixed approach
• Corpora
• Partners
  – Cambridge University
  – Istituto di Linguistica Computazionale, Pisa
  – Amsterdam University
• 30 months
• Theses
  – [Ribas 1995] (Acquisition of Selectional Restrictions)
  – [Ageno...] (Robust Parsing)
  – [Padró 1998] (Relaxation labelling)
  – [Màrquez 1999] (Decision Trees)

23 Words and Works: EuroWordNet
• Multilingual WordNet
• Partners
  – English, Spanish, Dutch, Italian
  – (and French, German, Czech, Estonian)
• 25,000 noun synsets and 5,000 verb synsets
• 30 months
• Theses
  – [Farreres...] (Mapping of Bilingual dictionaries)
  – [Daudé...] (Mapping of hierarchies)

24 Outline
• Setting
• Words and Works
• Structured Sources
  – MRDs, thesauri
• Unstructured Sources
  – corpora

25 Structured Sources
Acquisition of LK from MRDs
• Focusing on:
  – the massive acquisition of LK
  – from MRDs (conventional, in any language)
  – using automatic methodologies
• Why MRDs?
  Conventional dictionaries for human use usually “contain spelling, pronunciation, hyphenation, capitalization, usage notes for semantic domains, geographic regions, and propriety; etymological, syntactic and semantic information about the most basic units of the language” (Amsler 81)

26 Structured Sources: Dictionaries
• LDOCE (Longman Dictionary of Contemporary English)
• DGILE (Diccionario General Ilustrado de la Lengua Española)
• DGLC (Diccionari General de la Llengua Catalana)
• DVHE (Diccionari Vox-Harrap’s Esencial)

27 Structured Sources
Dictionaries: LDOCE
• Highly coded, restricted vocabulary
• 76,059 senses in 30,373 entries
  – LDOCE id, POS, Grammatical Code, Idiom, Pragmatic Code,
  – Semantic Code (subject-preference), object-preference,
  – indirect-object-preference, definition.
  |cheese_0_1| <> <> <>
  |cheese_0_2| <> <> <>
  |cheese_0_3| <> <> <>

28 Structured Sources
Dictionaries: DGILE
• Poorly coded, no restricted vocabulary
• 157,843 senses in 89,043 entries
• 1.4 million words in definitions and examples
  ((queso ) (ETIM l. caseu )
   (Sense 1) (CA m.) (DEF Masa que se obtiene cuajando la leche, exprimiéndola para que deje suero y echándole sal para que se conserve: ~ de Gruyère; ~ de Roquefort; ~ de bola, el de tipo holandés, de forma esférica; ~ de hierba, el que se hace cuajando la leche con hierba a propósito; ~ manchego, el de pasta compacta, algo dura, crudo, de leche de oveja.)
   (Sense 2) (CA m.) (DEF ~ de cerdo, manjar hecho con carne de cerdo o jabalí, picada y prensada.)
   (Sense 3)(CA m.)(DEF ~ helado, helado compacto hecho en molde.)
   (Sense 4)(CA m.)(DEF Medio ~, tablero grueso, semicircular, que usan los sastres para planchar cuellos y solapas y para sentar costuras curvas.)
   (Sense 5)(CA m.)(REG fam.)(DEF Pie.)
   (Sense 6)(CA m.)(GEO Venez.)(DEF ~ frito, estafa.)
   (RELA 1)(TIPOR Rel.)(TXR Del l. caseu derivan numerosos tecn. como caseína, cáseo, caseificar, caseico, caseoso.))
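
As a rough illustration of how such a bracketed entry can be read mechanically, the sketch below splits an excerpt of the queso entry into one record per sense. The reading of the format as flat `(TAG value)` fields, the function name `parse_senses`, and the expansion of `~` to the headword are assumptions made for this example; they do not describe the tools used in the project.

```python
import re

# Excerpt of the "queso" entry as shown on the slide (senses 2 and 3 only).
ENTRY = (
    "(Sense 2) (CA m.) (DEF ~ de cerdo, manjar hecho con carne de cerdo "
    "o jabalí, picada y prensada.) "
    "(Sense 3)(CA m.)(DEF ~ helado, helado compacto hecho en molde.)"
)

# One flat field: "(TAG value)" with no nested parentheses inside the value.
FIELD = re.compile(r"\((\w+)\s+([^()]*)\)")


def parse_senses(entry, headword):
    """Group flat (TAG value) fields into one dict per sense."""
    senses = []
    for tag, value in FIELD.findall(entry):
        value = value.strip()
        if tag == "Sense":
            # A Sense field opens a new record.
            senses.append({"sense": int(value)})
        elif senses:
            # Assumption: "~" stands for the headword inside definitions.
            senses[-1][tag] = value.replace("~", headword)
    return senses


senses = parse_senses(ENTRY, "queso")
for record in senses:
    print(record)
```

Definitions that themselves contain parentheses would need a real bracket-matching parser; the regular expression here only handles the flat case shown in the excerpt.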

29 Structured Sources
Dictionaries: DGLC (Fabra)
• Poorly coded, no restricted vocabulary
• 89,360 senses in 51,135 entries
  ((formatge)(CC m.)
   (NS 1 > 1 > 1 > 0 > 0 > 0)(CG m.)(DF Massa alimentosa que s’obté coagulant la llet, esprement-ne el xerigot i consolidant la part presa.)
   (NS 2 > 1 > 0 > 1 > 0 > 0)(CG m.)(EX Formatge de Ma.)
   (NS 3 > 1 > 0 > 2 > 0 > 0)(CG m.)(EX Formatge fresc, salat.)
   (NS 4 > 1 > 0 > 3 > 0 > 0)(CG m.)(EX Ratllar formatge.)
   (NS 5 > 1 > 0 > 4 > 0 > 0)(CG m.)(EX Un formatge.)
   (FI f4) )

30 Structured Sources: Acquilex
– Morphological Information
  • POS (n, v, adj, adv, etc.)
  • Derivative forms
  • Composed forms
  • Derivative Model (verbs)
– Syntactic Information
  • Idioms
  • Implicit Knowledge
    barrer_1_1    limpiar (el suelo) con la escoba.
    freír_1_1     cocer (un manjar) en aceite o grasa hirviendo.
    comprar_1_1   adquirir (una cosa) a cambio de cierta cantidad de dinero.
    cazar_1_1     buscar o perseguir (a las aves, fieras, etc.) para cogerlas o matarlas.

31 Structured Sources: Semantic Information
• Explicit (LDOCE: Pragmatic Codes, Semantic Code, etc.; DGILE: topic, synonyms, antonyms, figurative senses, etc.)
• Implicit
  jardín_1_1     Terreno donde se cultivan plantas y flores ornamentales.
  florero_1_4    Maceta con flores.
  ramo_1_3       Conjunto natural o artificial de flores, ramas o hierbas.
  pétalo_1_1     Hoja que forma la corola de la flor.
  tálamo_1_3     Receptáculo de la flor.
  miel_1_1       Substancia viscosa y muy dulce que elaboran las abejas, en una distensión del esófago, con el jugo de las flores y luego depositan en las celdillas de sus panales.
  florería_1_1   Floristería; tienda o puesto donde se venden flores.
  florista_1_1   Persona que tiene por oficio hacer o vender flores.
  camelia_1_1    Arbusto cameliáceo de jardín, originario de Oriente, de hojas perennes y lustrosas, y flores grandes, blancas, rojas o rosadas (Camellia japonica).
  camelia_1_2    Flor de este arbusto.
  rosa_1_1       Flor del rosal.

32 Structured Sources
Main Problems
– Conventional dictionaries are not systematic
– Dictionaries are built for human use
– Implicit Knowledge
  • words are described/translated in terms of words

33 Structured Sources
SEISD (Rigau 98)
• The System
  – General Frame
  – Methodology
  – SEISD
  – Application of the methodology

34 SEISD: General Frame
– Characteristics of the Lexical Resources used
– Lexical Knowledge to be extracted
– Lexical Knowledge Representation
– The acquisition process

35 SEISD: General Frame
– Characteristics of the Lexical Resources used
  • DGILE
  • Spanish/English bilingual Dictionaries
  • WordNet
  • Type System of the LKB

36 SEISD: General Frame
– Characteristics of the Lexical Resources used
  • DGILE
    – 89,043 entries and 157,842 senses
    – 1.4 million words in definitions and examples
    – neither semantic nor pragmatic codes
    – no restricted vocabulary
  vino (l. vinu) m. Zumo de uvas fermentado;... 2 fig. Bautizar o cristianizar, el ~, echarle agua. 3 fig. Dormir uno el ~, dormir mientras dura la borrachera; tener uno mal ~, ser pendenciero en la embriaguez. 4 p.ext. Zumo. | HOMOF.: vino (v.), bino (v.). REL. Enológico, enólogo, enotecnia, derivados de enología, ciencia de la vinicultura, formada del gr. oinos.

37 SEISD: General Frame
– Characteristics of the Lexical Resources used
  • Spanish/English bilingual Dictionaries
    – EEI: 16,463 entries with 28,002 translation fields
    – EIE: 15,352 entries with 27,033 translation fields
  vino m wine. ~ de Jerez, sherry; ~ tinto, red wine.
  wine n vino

38 SEISD: General Frame
– Characteristics of the Lexical Resources used
  • WordNet
    – v1.6 has 123,497 content words and 99,642 synsets
  Sense 1
  wine, vino -- (fermented juice (of grapes especially))
    => sake, saki -- (Japanese beverage from fermented rice...)
    => vintage -- (a season's yield of wine from a vineyard)
    => red wine -- (wine having a red color derived from skins...)
    => Pinot noir -- (dry red California table wine...)
    => claret, red Bordeaux -- (dry red Bordeaux or Bordeaux-like wine)
    => Saint Emilion -- (full-bodied red wine from...)
    => Chianti -- (dry red Italian table wine from the Chianti...)
    => Cabernet, Cabernet Sauvignon -- (superior Bordeaux-type red wine)
    => Rioja -- (dry red table wine from the Rioja...)
    => zinfandel -- (dry fruity red wine from California)

39 SEISD: General Frame
– Characteristics of the Lexical Resources used
  • Type System of the LKB (Copestake 92)
    – 527 types with 196 features

40 SEISD: General Frame
– Lexical Knowledge to be extracted
  • Explicit information (POS, TD, uses, etc.)
  • Implicit information
    – Hypernym/hyponym relations (class/subclass)
    – Synonymy/Antonymy relations
    – Meronym/Holonym relation (part/whole,...)
    – Case role relations (agentive, telic,...)
    – Content relations (qualia, form, constitutive,...)
    – Collocational relations (compounds, idioms,...)
    – Selectional restrictions (typical subject, object,...)
    – Translation Equivalences

41 SEISD: General Frame
– Lexical Knowledge Representation
  • LKB (Copestake 92)
    – represents both syntactic and semantic information
    – Typed Feature Structure formalism (Carpenter 92)
    – default inheritance
    – lexical and phrasal rules
    – multilingual relations

42 Structured Sources: SEISD
• The System
  – General Frame
  – Methodology
  – SEISD
  – Application of the methodology

43 SEISD: Methodology
[Diagram; labels: MRD1 ... MRDn, LDB1 ... LDBn, Tax1 ... Taxn, LKB1 ... LKBn, MLKB]

44 SEISD: Methodology
– Problems following a pure descriptive approach
  • Circularity
  • Errors and inconsistencies
  • Definitions with omitted genus
  • Top dictionary senses do not usually represent useful knowledge for the LKB
    – Too general
    – Too specific

45 SEISD: Methodology
Mixed Methodology
• Prescriptive approach: manual construction of the Type System

46 SEISD: Methodology
Mixed Methodology
• Descriptive approach: acquiring implicit information from MRDs
• Prescriptive approach: manual construction of the Type System

47 SEISD: Methodology
Mixed Methodology
• Descriptive approach: acquiring implicit information from MRDs
• Prescriptive approach: manual construction of the Type System

48 SEISD: Methodology
– Step 1: Selection of the main top beginners for a semantic primitive
– Step 2: Exploiting genus, construction of taxonomies
– Step 3: Exploiting differentia
– Step 4: Mapping the LK into the LKB
– Step 5: Tlinks Generation
– Step 6: Validation and exploitation of the LKB

49 Structured Sources: SEISD
• The System
  – General Frame
  – Methodology
  – SEISD
  – Application of the methodology

50 Structured Sources: SEISD
– SEISD: Sistema d’Extracció d’Informació Semàntica de Diccionaris (System for Extracting Semantic Information from Dictionaries) (Ageno et al. 92)
  • designed to support the main methodology
  • taking into account the characteristics of the lexical resources used
  • reusability of software and lexical resources
  • allowing modular improvements
  • minimal effort

51 SEISD
[Architecture diagram; labels: User; LKB System; PRE; SemBuild; TaxBuild; CRS; TGE; LDB/LKB system; LDB System; Linguistic Knowledge: SegWord and FPar; MACO+, Relax and SinPar; Lexical Knowledge; User LDB; DGILE; English/Spanish and Spanish/English MTDs; WordNet; Taxonomies; LKB Type System; Lexicons; LDB/LKB Lexicons]

52 Structured Sources
SEISD (Rigau 98)
• The System
  – General Frame
  – Methodology
  – SEISD
  – Application of the methodology

53 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive (Rigau et al. 98)
  Word sense:   zumo_1_1
  Attached-to:  c_art_subst type.
  Definition:   líquido que se extrae de las flores, hierbas, frutos, etc. (liquid extracted from flowers, herbs, fruits, etc).

54 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
– A) Attaching DGILE senses to semantic primitives
  • 1) First labelling:
    – Conceptual Distance (Rigau 94)
  • 2) Second labelling:
    – Salient Words (Yarowsky 92)
– B) Filtering Process

55 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
– A.1) First labelling:
  • Conceptual Distance (Agirre et al. 94)
    – length of the shortest path
    – specificity of the concepts
  • using WordNet
  • Bilingual dictionary
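
The two ingredients named on this slide, path length and concept specificity, can be combined in a small sketch. The toy hypernym tree below is a hypothetical stand-in for WordNet, and weighting each node on the path by 1/depth is an assumption about how the two criteria interact, chosen so that paths through shallow (general) concepts cost more; the slides do not give the exact formula.

```python
# Toy hypernym tree (child -> parent); a hypothetical stand-in for WordNet.
PARENT = {
    "object": "entity",
    "artifact": "object",
    "building": "artifact",
    "church": "building",
    "monastery": "building",
    "abbey": "monastery",
    "food": "object",
    "wine": "food",
}


def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(node):
    """The root has depth 1, so 1/depth is always defined."""
    return len(ancestors(node))


def path(a, b):
    """Nodes on the unique tree path from a to b, via their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(n for n in up_a if n in up_b)
    left = up_a[: up_a.index(common) + 1]
    right = up_b[: up_b.index(common)]
    return left + right[::-1]


def conceptual_distance(a, b):
    """Sum of 1/depth over the path: longer paths and more general nodes cost more."""
    return sum(1.0 / depth(n) for n in path(a, b))


print(path("abbey", "church"))
print(conceptual_distance("abbey", "church") < conceptual_distance("abbey", "wine"))
```

In a real setting each Spanish word would first be mapped to candidate WordNet synsets through the bilingual dictionary, and the distance would be minimised over all candidate pairs.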

56 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
  abadía_1_2   Iglesia o monasterio regido por un abad o abadesa (abbey, a church or a monastery ruled by an abbot or an abbess)
[Diagram; labels: <entity>, <monastery>, <abbey>, <convent>, <abbey>]

57 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
  abadía_1_2   Iglesia o monasterio regido por un abad o abadesa (abbey, a church or a monastery ruled by an abbot or an abbess)
[Diagram; labels: <entity>, 06 ARTIFACT, <monastery>, <abbey>, <convent>, <abbey>]

58 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
– A.1) First labelling (Results)
  • 29,205 labelled definitions (31%)
  • 61% accuracy at a sense level
  • 64% accuracy at a file level

59 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
– A.2) Second labelling:
  • Salient Words (Yarowsky 92)
  • Importance
    – local frequency
    – appears significantly more often in the corpus of a semantic category than at other points in the whole corpus
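
A minimal sketch of the salient-word idea follows: a word is salient for a semantic class when it is relatively more frequent in the text of that class than in the text overall, and a new definition receives the class whose salient words score highest. The log-ratio weight, the tiny labelled sample, and the function names are assumptions made for illustration; there is no smoothing, so words unseen in a class simply contribute nothing.

```python
import math
from collections import Counter

# Hypothetical toy training data: definitions already labelled with a class.
LABELLED = [
    ("FOOD", "leche que contiene este frasco"),
    ("FOOD", "zumo de uvas fermentado"),
    ("FOOD", "masa que se obtiene cuajando la leche"),
    ("ARTIFACT", "frasco de cristal"),
    ("ARTIFACT", "tablero grueso que usan los sastres"),
]


def train(labelled):
    """Return {(class, word): log(P(word | class) / P(word))}."""
    per_class = {}
    overall = Counter()
    for label, text in labelled:
        words = text.split()
        per_class.setdefault(label, Counter()).update(words)
        overall.update(words)
    total = sum(overall.values())
    weights = {}
    for label, counts in per_class.items():
        class_total = sum(counts.values())
        for word, count in counts.items():
            p_word_given_class = count / class_total
            p_word = overall[word] / total
            weights[(label, word)] = math.log(p_word_given_class / p_word)
    return weights


def label_definition(text, weights, classes):
    """Pick the class with the highest summed salience over the words of `text`."""
    scores = {c: sum(weights.get((c, w), 0.0) for w in text.split()) for c in classes}
    return max(scores, key=scores.get), scores


WEIGHTS = train(LABELLED)
CLASSES = ["FOOD", "ARTIFACT"]
print(label_definition("frasco de cristal", WEIGHTS, CLASSES)[0])
print(label_definition("leche fermentada", WEIGHTS, CLASSES)[0])
```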

60 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
– A.2) Second labelling (Results):
  • 86,759 labelled definitions (93%)
  • 80% accuracy at a file level
  biberón_1_1   ARTIFACT   4.8399   Frasco de cristal... (glass flask...)
  biberón_1_2   FOOD       7.4443   Leche que contiene este frasco... (milk contained in that flask...)

61 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
– B) Filtering process (FOODs)
  • removes all genus terms
    – FILTER 1: not FOODs by the bilingual mapping
    – FILTER 2: appear more often as genus in other SC
    – FILTER 3: with a low frequency
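
The three filters can be read as successive tests on each candidate genus term of a semantic class. The sketch below is one such reading: the genus frequencies, the set of terms the bilingual mapping places under FOOD, and the `min_freq` threshold are all invented for illustration, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical genus-term frequencies per semantic class (SC).
GENUS_COUNTS = {
    "FOOD": Counter({"vino": 40, "zumo": 12, "frasco": 3, "planta": 20, "bizcocho": 1}),
    "ARTIFACT": Counter({"frasco": 30}),
    "PLANT": Counter({"planta": 300}),
}

# Hypothetical result of the bilingual mapping: terms that map to FOOD concepts.
FOOD_BY_BILINGUAL = {"vino", "zumo", "planta", "bizcocho"}


def filter_genus_terms(genus_counts, target, in_target_by_bilingual, min_freq):
    """Keep the genus terms of `target` that pass all three filters."""
    kept = []
    for term, freq in genus_counts[target].items():
        # FILTER 1: the bilingual mapping does not place the term in the target class.
        if term not in in_target_by_bilingual:
            continue
        # FILTER 2: the term is more frequent as a genus in some other class.
        if any(c[term] > freq for cls, c in genus_counts.items() if cls != target):
            continue
        # FILTER 3: the term is too rare as a genus in the target class.
        if freq < min_freq:
            continue
        kept.append(term)
    return kept


TOP_BEGINNERS = filter_genus_terms(GENUS_COUNTS, "FOOD", FOOD_BY_BILINGUAL, min_freq=2)
print(TOP_BEGINNERS)
```

With these toy numbers, frasco falls to filter 1, planta to filter 2 and bizcocho to filter 3.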

62 SEISD: Application of the methodology
Step 1: Selection of the main top beginners for a semantic primitive
– B) Filtering process (FOOD Results)

63 SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus (Rigau et al. 97)
  Word sense:  vino_1_1
  Hypernym:    zumo_1_1.
  Definition:  zumo de uvas fermentado. (fermented juice of grapes).

  Word sense:  rueda_2_1
  Hypernym:    vino_1_1.
  Definition:  vino procedente de la región de Rueda (Valladolid). (wine from the region of Rueda).

64 SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
– Genus Sense Identification
  • 97% accuracy for nouns
– Genus Sense Disambiguation
  • Unsupervised WSD
  • Unrestricted WSD (coverage 100%)
  • Eight Heuristics (McRoy 92)
    – Combining several lexical resources
    – Combining several methods

65 SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
– Results:

66 SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
– Knowledge provided by each heuristic:

67 SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
– F2+F3>9: 35,099 definitions
– F2+F3>4: 40,754 definitions
– No filters: 111,624 definitions

68 SEISD: Application of the methodology
Step 2 (TaxBuild): Exploiting Genus
  ...
  zumo_1_1 vino_1_1 quianti_1_1
  zumo_1_1 vino_1_1 raya_1_8
  zumo_1_1 vino_1_1 requena_1_1
  zumo_1_1 vino_1_1 reserva_1_12
  zumo_1_1 vino_1_1 ribeiro_1_1
  zumo_1_1 vino_1_1 rioja_1_1
  zumo_1_1 vino_1_1 roete_1_1
  zumo_1_1 vino_1_1 rosado_1_3
  zumo_1_1 vino_1_1 rueda_2_1
  zumo_1_1 vino_1_1 sherry_1_1
  zumo_1_1 vino_1_1 tarragona_1_1
  zumo_1_1 vino_1_1 tintilla_1_1
  zumo_1_1 vino_1_1 tintorro_1_1
  zumo_1_1 vino_1_1 toro_3_1
  ...
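
Chains like the ones listed above follow directly from the (word sense, hypernym) pairs that genus disambiguation produces. The sketch below rebuilds a few of them from pairs taken from these slides; the function name `chain` and the cycle guard (motivated by the circularity problem mentioned earlier in the deck) are illustrative choices, not the TaxBuild implementation.

```python
# (sense -> hypernym sense) pairs taken from the TaxBuild slides.
HYPERNYM = {
    "vino_1_1": "zumo_1_1",
    "rueda_2_1": "vino_1_1",
    "rioja_1_1": "vino_1_1",
    "sherry_1_1": "vino_1_1",
}


def chain(sense, hypernym):
    """Path from the top beginner down to `sense`, stopping on circular definitions."""
    path = [sense]
    seen = {sense}
    while path[-1] in hypernym:
        parent = hypernym[path[-1]]
        if parent in seen:  # circularity: stop rather than loop forever
            break
        path.append(parent)
        seen.add(parent)
    return path[::-1]


# Print one line per leaf sense, in the same layout as the slide.
leaves = sorted(s for s in HYPERNYM if s not in HYPERNYM.values())
for leaf in leaves:
    print(" ".join(chain(leaf, HYPERNYM)))
```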

69 SEISD: Application of the methodology
Step 3 (SemBuild): Exploiting Differentia
  Word Sense:  rueda_2_1
  Definition:  vino procedente de la región de Rueda
  SinPar:      sn:[n:vino] origin:[n:región, sp:[r0d:de, sn:[n:Rueda]]].

70 SEISD: Application of the methodology
Step 3 (SemBuild): Exploiting Differentia
– MACO+, Relax (Padró 97), SinPar

71 SEISD: Application of the methodology
Step 4 (CRS): Placing the LK into the LKB
  rueda x_1_1 = (“VOX”) = (“rueda”) = (“2”) = (“1”) = = <(“Rueda”).

72 SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation (Ageno et al. 93)
  rueda_x_2_1 linked to wine_l_1_1 (parent)
  rueda_x_2_1 linked to drink_l_2_1 (grandparent)
  rueda_x_2_1 linked to (parent)
  rueda_x_2_1 linked to (grandparent)

73 SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
– Simple tlink:
– Partial tlink
  – Rioja_x_1_1 linked to wine_l_1_1
– Phrasal tlink
  – ahumado_x_1_1 linked to smoked_food_l_1_1
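
One way to picture the three kinds of tlink is as a fallback procedure: use the direct translation when there is one, mark it as phrasal when the target is a multiword expression, and otherwise climb the source taxonomy to the nearest translated ancestor. This is an interpretation of the examples on the slides, not the TGE algorithm; the sense identifiers, the tiny lexicon, and the function name are hypothetical.

```python
# Hypothetical toy data: Spanish hypernym links and a tiny bilingual lexicon.
HYPERNYM = {"rioja": "vino", "rueda": "vino", "vino": "bebida"}
TRANSLATION = {"vino": "wine", "bebida": "drink", "ahumado": "smoked food"}


def tlink(sense, hypernym, translation):
    """Return (kind, target) for one source sense, or None if nothing can be linked."""
    if sense in translation:
        target = translation[sense]
        # Assumption: a multiword target is what makes a tlink "phrasal".
        kind = "phrasal" if " " in target else "simple"
        return kind, target
    # No direct equivalent: link to the closest ancestor that has a translation.
    node = sense
    seen = {sense}
    while node in hypernym and hypernym[node] not in seen:
        node = hypernym[node]
        seen.add(node)
        if node in translation:
            return "partial", translation[node]
    return None


for word in ["vino", "rioja", "ahumado"]:
    print(word, tlink(word, HYPERNYM, TRANSLATION))
```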

74 SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
• First experiment
  – (semi)automatic approach using PRE
  – linking DGILE to LDOCE
  – drink taxonomy (235 definitions)

75 SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
• First experiment (results)

76 SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
• Second experiment
  – automatic approach using PRE
    • Conceptual Distance
  – linking DGILE to WordNet
  – food taxonomy (140 definitions)

77 SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
[Diagram; labels: zumo_1_1, vino_1_1, rueda_2_1, <juice>, <foodstuff>, <object>, <entity>]

78 SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
[Diagram; labels: zumo_1_1, vino_1_1, rueda_2_1, <juice>, <foodstuff>, <object>, <entity>; links: simple-tlink, simple-tlink, partial-tlink]

79 SEISD: Application of the methodology
Step 5 (TGE): Tlinks Generation
• Second experiment (results)

80 SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet (Atserias et al. 97)
• Ten class methods
  – Four monosemic criteria
  – Four polysemic criteria
  – Two hybrid criteria
• Three conceptual distance methods
  – CD1: using pairwise word co-occurrences
  – CD2: using headword and genus
  – CD3: using bilingual Spanish entries with multiple translations

81 SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
• Ten class methods
  – Four monosemic criteria
[Four diagrams of SW/EW connection patterns; graphics not recoverable]
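
The diagram labels suggest that the criteria are distinguished by how Spanish words (SW) and English words (EW) are connected in the bilingual dictionary, and by whether the English word has one or several WordNet synsets. The sketch below is one plausible reading, assumed for illustration only: it labels each translation pair by its local connection pattern, which simplifies any criterion that inspects a whole group of connected words at once. The word pairs and synset counts are invented.

```python
from collections import defaultdict

# Hypothetical bilingual pairs (Spanish word, English word).
PAIRS = [
    ("abadía", "abbey"),
    ("frasco", "flask"),
    ("frasco", "bottle"),
    ("botella", "bottle"),
]

# Invented synset counts for the English words (not real WordNet figures).
SYNSET_COUNT = {"abbey": 1, "flask": 1, "bottle": 2}


def classify_connections(pairs):
    """Label each (SW, EW) pair as 1:1, 1:N, N:1 or M:N."""
    es_to_en = defaultdict(set)
    en_to_es = defaultdict(set)
    for sw, ew in pairs:
        es_to_en[sw].add(ew)
        en_to_es[ew].add(sw)
    labels = {}
    for sw, ew in pairs:
        many_en = len(es_to_en[sw]) > 1  # SW has several translations
        many_es = len(en_to_es[ew]) > 1  # EW translates several SWs
        if not many_en and not many_es:
            labels[(sw, ew)] = "1:1"
        elif many_en and not many_es:
            labels[(sw, ew)] = "1:N"
        elif not many_en and many_es:
            labels[(sw, ew)] = "N:1"
        else:
            labels[(sw, ew)] = "M:N"
    return labels


def criterion(pair, labels, synset_count):
    """Combine the connection pattern with the sense count of the English word."""
    sense_type = "monosemic" if synset_count[pair[1]] == 1 else "polysemic"
    return f"{sense_type} {labels[pair]}"


LABELS = classify_connections(PAIRS)
for pair in PAIRS:
    print(pair, criterion(pair, LABELS, SYNSET_COUNT))
```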

82 SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
• Ten class methods
  – Four monosemic criteria
[Four diagrams of SW/EW connection patterns, each EW attached to a single Synset; graphics not recoverable]

83 SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
• Ten class methods
  – Four polysemic criteria
[Four diagrams of SW/EW connection patterns, each EW attached to Synset+; graphics not recoverable]

84 SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
• Ten class methods
  – Variant criterion
  – Field criterion
[Two diagrams starting from SW; graphics not recoverable]

85 SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
• Ten class methods (results)

86 SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
• Three CD methods (results)

87 SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
• Combining methods (results)

88 SEISD: Application of the methodology
Step 5: Mapping bilingual entries to WordNet
• Resulting Spanish WordNets

89 LK from MRDs: Taxonomies
  ...
  zumo_1_1 vino_1_1 quianti_1_1
  zumo_1_1 vino_1_1 raya_1_8
  zumo_1_1 vino_1_1 requena_1_1
  zumo_1_1 vino_1_1 reserva_1_12
  zumo_1_1 vino_1_1 ribeiro_1_1
  zumo_1_1 vino_1_1 rioja_1_1
  zumo_1_1 vino_1_1 roete_1_1
  zumo_1_1 vino_1_1 rosado_1_3
  zumo_1_1 vino_1_1 rueda_2_1
  zumo_1_1 vino_1_1 sherry_1_1
  zumo_1_1 vino_1_1 tarragona_1_1
  zumo_1_1 vino_1_1 tintilla_1_1
  zumo_1_1 vino_1_1 tintorro_1_1
  zumo_1_1 vino_1_1 toro_3_1
  ...

90 LK from MRDs: MTDs
371,616 connections
  11.8004   9.8   16   elaborado     queso      35    113
  10.8938   8.0   23   pasta         queso      178   113
  10.4846   7.5   25   leche         queso      274   113
  10.2483   9.2   13   oveja         queso      45    113
   9.1513   7.6   16   queso         sabor      113   160
   7.4956   8.3    8   queso         tortilla   113   51
   6.7732   7.5    8   queso         vaca       113   84
   6.5830   6.1   12   maíz          queso      347   113
   6.2208   8.9    5   queso         suero      113   21
   6.1509   8.8    5   mantequilla   queso      22    113
   6.1474   7.9    6   compacta      queso      50    113
   5.9918   7.7    6   picante       queso      55    113
   5.9002   9.8    4   manchego      queso      9     113
   5.6805   7.3    6   cabra         queso      75    113
   5.6300   5.9    9   pan           queso      287   113
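
The slide gives no column headers, so the following is only an observation about the numbers: in every row checked, the first column equals the second column multiplied by the base-10 logarithm of the third. Reading the second column as an association value (possibly mutual information) and the third as the pair's co-occurrence count is an assumption; the sketch simply verifies the arithmetic on a few rows copied from the slide.

```python
import math

# Rows copied from the slide: (first column, second column, third column, word 1, word 2).
ROWS = [
    (11.8004, 9.8, 16, "elaborado", "queso"),
    (10.8938, 8.0, 23, "pasta", "queso"),
    (10.2483, 9.2, 13, "oveja", "queso"),
    (9.1513, 7.6, 16, "queso", "sabor"),
]


def weighted_score(assoc, cooccurrences):
    """Association value scaled by log10 of the assumed pair frequency."""
    return assoc * math.log10(cooccurrences)


for score, assoc, freq, w1, w2 in ROWS:
    print(w1, w2, round(weighted_score(assoc, freq), 4), score)
```

Scaling by the logarithm of the pair frequency has the effect of ranking frequent pairs above rare ones with a similar association value, which would explain why manchego and queso (second column 9.8, third column 4) sit near the bottom of the list.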


92 LK from MRDs: EuroWordNet

            synsets   words     variants
  SPANISH
    adjs    12,461    8,714     16,713
    nouns   43,573    47,813    62,319
    verbs   8,298     6,010     13,230
    Sum     64,332    62,537    92,262
  CATALAN
    adjs    1,386     1,507     2,030
    nouns   30,701    32,988    42,320
    verbs   4,487     4,289     10,297
    Sum     36,574    38,784    54,647

