
1 Semi-automatic methods for WordNet construction German Rigau i Claramunt http://www.lsi.upc.es/~rigau TALP Research Center Universitat Politècnica de Catalunya Eneko Agirre http://www.ji.si.upc.es/users/eneko IxA NLP Group University of the Basque Country 2002 International WordNet Conference

2 Setting: NLP and the Lexicon
   Theoretical: WG, GPSG, HPSG
   Practical: realistic complexity and coverage
   Lexical bottleneck (Briscoe 91)
   Even worse for languages other than English

3 Setting
   Which LK (lexical knowledge) is needed by a concrete NLP system?
   Where is this LK located?
   Which procedures can be applied?

4 Setting: Which LK is needed by a concrete NLP system?
   Phonology: phonemes, stress, etc.
   Morphology: POS, etc.
   Syntactic: category, subcat., etc.
   Semantic: class, SRs, etc.
   Pragmatic: usage, registers, TDs, etc.
   Translations: translation links

5 Setting: Where is this LK located?
   Human brain
   Structured Lexical Resources:
      Monolingual and bilingual MRDs (machine-readable dictionaries)
      Thesauri
   Unstructured Lexical Resources:
      Monolingual and bilingual corpora
   Mixing resources

6 Setting: Which procedures can be applied?
   Prescriptive approach: machine-aided manual construction
   Descriptive approach: automatic acquisition from pre-existing Lexical Resources
   Mixed approach

7 Outline
   Setting
   Words and Works
   Merge approach
      Taxonomy construction: monolingual MRDs
      Mapping taxonomies: bilingual MRDs
   Expand approach
      Translation of synsets: bilingual MRDs
   Interface for manual revision
   Conclusions

8 Words and Works: Where is this Lexical Knowledge located?
   Human brain:
      Linguistic String Project (Fox et al. 88): lexical information for 10,000 entries
      WordNet (Miller et al. 90): semantic information, v1.6 with 99,642 synsets
      Comlex (Grishman et al. 94): syntactic information for 38,000 English words
      CYC Ontology (Lenat 95): a person-century of effort to produce 100,000 terms
      LDOCE3-NLP: dictionary with 80,000 senses

9 Words and Works: Where is this Lexical Knowledge located?
   Structured Lexical Resources
   Monolingual MRDs: LDOCE learner's dictionary
      35,956 entries and 76,059 definitions
      86% semantic and 44% pragmatic codes
      controlled vocabulary of 2,000 words
      (Boguraev & Briscoe 89), (Vossen & Serail 90), (Bruce & Guthrie 92), (Wilks et al. 93), (Dolan et al. 93), (Richardson 97)

10 Words and Works: Where is this Lexical Knowledge located?
   Structured Lexical Resources
   Other monolingual MRDs:
      Webster's (Jensen & Ravin 87)
      LPPL (Artola 93)
      DGILE (Castellón 93), (Taulé 95), (Rigau 98)
      CIDE (Harley & Glennon 97)
      AHD (Richardson 97)
      WordNet (Harabagiu 98)
   Bilingual MRDs:
      Collins Spanish/English (Knight & Luk 94)
      Vox/Harraps Spanish/English (Rigau 98)

11 Words and Works: Where is this Lexical Knowledge located?
   Structured Lexical Resources
   Thesauri:
      Roget's Thesaurus, 60,071 words in 1,000 categories (Yarowsky 92), (Grefenstette 93), (Resnik 95)
      Roget's II and The New Collins Thesaurus (Byrd 89)
      Macquarie's thesaurus (Grefenstette 93)
      Bunrui Goi Hyou Japanese thesaurus (Utsuro et al. 93)

12 Words and Works: Where is this Lexical Knowledge located?
   Structured Lexical Resources
   Encyclopaedia:
      Grolier's Encyclopaedia (Yarowsky 92)
      Encarta (Richardson et al. 98)
   Others:
      Telephonic Guides
   Mixing structured lexical resources:
      Roget's Thesaurus and Grolier's (Yarowsky 92)
      LDOCE, WN, Collins, ONTOS, UM (Knight & Luk 94)
      Japanese MRD to WN (Okumura & Hovy 94)
      LLOCE, LDOCE (Chen & Chang 98)

13 Words and Works: Where is this Lexical Knowledge located?
   Unstructured Lexical Resources
   Corpora: WSJ, Brown Corpus (SemCor), Hansard
      Proper nouns (Hearst & Schütze 95)
      Idiosyncratic collocations (Church et al. 91)
      Preposition preferences (Resnik and Hearst 93)
      Subcategorization structures (Briscoe and Carroll 97)
      Selectional restrictions (Resnik 93), (Ribas 95)
      Thematic structure (Basili et al. 92)
      Word semantic classes (Dagan et al. 94)
      Bilingual lexicons for MT (Fung 95)

14 Words and Works: Where is this Lexical Knowledge located?
   Using both structured and non-structured Lexical Resources
      MRDs and corpora (Liddy & Paik 92), (Klavans & Tzoukermann 96)
      WordNet and corpora (Resnik 93), (Ribas 95), (Li & Abe 95), (McCarthy 01)

15 Words and Works: International Projects on Lexical Acquisition
   Japanese Projects
   EDR (Yokoi 95)
      Nine-year project oriented to MT
      Bilingual corpora with 250,000 words
      Monolingual, bilingual and co-occurrence dictionaries
      200,000 general vocabulary
      100,000 technical terminology
      400,000 concepts

16 Words and Works: International Projects on Lexical Acquisition
   American Projects
   Comlex (Grishman et al. 94)
      Syntactic information for 38,000 words
   WordNet (Miller 90)
      Semantic information
      more than 123,000 words organised in 99,000 synsets
      more than 116,000 relations between synsets
   Pangloss (Knight & Luk 94)
      PUM, ONTOS, LDOCE semantic categories, WordNet
   Cyc (Lenat 95)
      common-sense knowledge
      100,000 concepts and 1,000,000 axioms

17 Words and Works: International Projects on Lexical Acquisition
   European Projects
   Acquilex I and II
      LA from monolingual and bilingual MRDs and corpora
   LE-Parole
      Large-scale harmonised set of corpora and lexicons for all the EU languages
   EuroWordNet
      Multilingual WordNet for several European languages
   Meaning
      Large-scale acquisition of LK from the web
      Large-scale WSD

18 Words and Works: Lexical Acquisition from MRDs
   Syntactic disambiguation (Dolan et al. 93)
   Semantic processing (Vanderwende 95)
   WSD (Lesk 86), (Wilks & Stevenson 97), (Rigau 98)
   IR (Krovetz & Croft 92)
   MT (Knight and Luk 94), (Tanaka & Umemura 94)
   Semantically enriching MRDs (Yarowsky 92), (Knight 93), (Chen & Chang 98)
   Building LKBs (Bruce & Guthrie 92), (Dolan et al. 93), (Artola 93), (Castellón 93), (Taulé 95), (Rigau 98)

19 Words and Works: Acquisition of LK from MRDs
   This tutorial focuses on:
      the massive acquisition of LK
      from MRDs (conventional, in any language)
      using (semi-)automatic methodologies
   Why MRDs?
      Conventional dictionaries for human use usually contain spelling, pronunciation, hyphenation, capitalization, usage notes for semantic domains, geographic regions, and propriety; etymological, syntactic and semantic information about the most basic units of the language (Amsler 81)

20 Words and Works: Main Problems of MRDs
   Conventional dictionaries are not systematic
   Dictionaries are built for human use
   Implicit knowledge
   Words are described/translated in terms of words

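21 Words and Works: MRDs and Semantic Knowledge
   jardín_1_1 Terreno donde se cultivan plantas y flores ornamentales. (land where ornamental plants and flowers are grown)
   florero_1_4 Maceta con flores. (pot with flowers)
   ramo_1_3 Conjunto natural o artificial de flores, ramas o hierbas. (natural or artificial bunch of flowers, branches or herbs)
   pétalo_1_1 Hoja que forma la corola de la flor. (leaf that forms the corolla of the flower)
   tálamo_1_3 Receptáculo de la flor. (receptacle of the flower)
   miel_1_1 Substancia viscosa y muy dulce que elaboran las abejas, en una distensión del esófago, con el jugo de las flores y luego depositan en las celdillas de sus panales. (viscous, very sweet substance that bees make, in a distension of the oesophagus, from the juice of flowers and then deposit in the cells of their honeycombs)
   florería_1_1 Floristería; tienda o puesto donde se venden flores. (florist's; shop or stall where flowers are sold)
   florista_1_1 Persona que tiene por oficio hacer o vender flores. (person whose trade is making or selling flowers)
   camelia_1_1 Arbusto cameliáceo de jardín, originario de Oriente, de hojas perennes y lustrosas, y flores grandes, blancas, rojas o rosadas (Camellia japonica). (garden shrub of the camellia family, native to the East, with glossy evergreen leaves and large white, red or pink flowers)
   camelia_1_2 Flor de este arbusto. (flower of this shrub)
   rosa_1_1 Flor del rosal. (flower of the rose bush)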


23 Merge approach: Main Methodology
   [Diagram with nodes: MRD1 ... MRDn, LDB1 ... LDBn, Tax1 ... Taxn, LKB1 ... LKBn, MLKB]

24 Merge approach: Main Methodology
   Taxonomy construction (Rigau et al. 98, 97): monolingual MRDs
      Step 1: Selection of the main top beginners for a semantic primitive
      Step 2: Exploiting genus, construction of taxonomies for each semantic primitive
   Mapping taxonomies (Daudé et al. 99): bilingual MRDs
      Step 3: Creation of translation links

25 Merge approach: Taxonomy Construction. Methodology
   Problems following a pure descriptive approach:
      Circularity
      Errors and inconsistencies
      Definitions with omitted genus
      Top dictionary senses do not usually represent useful knowledge for the LKB
         Too general
         Too specific

26 Merge approach: Taxonomy Construction. Methodology
   Mixed Methodology
      Prescriptive approach: manual construction of the Top Structure

27 Merge approach: Taxonomy Construction. Methodology
   Mixed Methodology
      Descriptive approach: acquiring implicit information from MRDs
      Prescriptive approach: manual construction of the Top Structure


29 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   Word sense: zumo_1_1
   Attached-to: c_art_subst type.
   Definition: líquido que se extrae de las flores, hierbas, frutos, etc. (liquid extracted from flowers, herbs, fruits, etc.)

30 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   A) Attaching DGILE senses to semantic primitives
      1) First labelling: Conceptual Distance (Rigau 94)
      2) Second labelling: Salient Words (Yarowsky 92)
   B) Filtering process

31 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   A.1) First labelling: Conceptual Distance (Agirre et al. 94)
      length of the shortest path
      specificity of the concepts
   Using:
      WordNet
      Bilingual dictionary
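The two factors named on this slide (path length and concept specificity) can be sketched on a toy hierarchy. The exact formula is not given in the slides; the version below assumes that each node on the shortest path contributes 1/depth, so paths through general (shallow) concepts cost more. The hierarchy and the function names are hypothetical.

```python
# Toy hypernym tree (child -> parent). Illustrative only, not WordNet data.
PARENT = {
    "artifact": "entity",
    "building": "artifact",
    "church": "building",
    "monastery": "building",
    "person": "entity",
    "abbot": "person",
}

def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(node):
    """Depth of a node in the tree; the root has depth 1."""
    return len(ancestors(node))

def shortest_path(a, b):
    """Nodes on the path from a to b through their lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(n for n in up_a if n in up_b)
    return up_a[: up_a.index(common) + 1] + up_b[: up_b.index(common)][::-1]

def conceptual_distance(a, b):
    """Assumed weighting: sum of 1/depth over the nodes of the shortest path."""
    return sum(1.0 / depth(n) for n in shortest_path(a, b))

if __name__ == "__main__":
    print(shortest_path("church", "monastery"))
    print(conceptual_distance("church", "monastery"))
    print(conceptual_distance("church", "abbot"))
```

With this weighting, two specific siblings such as church and monastery come out closer than a pair whose path climbs to the root.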

32 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   abadía_1_2 Iglesia o monasterio regido por un abad o abadesa (abbey, a church or a monastery ruled by an abbot or an abbess)

33 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   abadía_1_2 Iglesia o monasterio regido por un abad o abadesa (abbey, a church or a monastery ruled by an abbot or an abbess)
   Label: 06 ARTIFACT

34 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   A.1) First labelling (Results)
      29,205 labelled definitions (31%)
      61% accuracy at a sense level
      64% accuracy at a file level

35 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   A.2) Second labelling: Salient Words (Yarowsky 92)
      Importance: local frequency
      A salient word appears significantly more often in the corpus of a semantic category than at other points in the whole corpus
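A minimal sketch of salient-word labelling, assuming the usual log-ratio weight log(P(w | category) / P(w)) and a sum of weights over the words of a definition. The training data, the function names and the absence of smoothing are simplifications for illustration, not the tutorial's actual setup.

```python
import math
from collections import Counter

def train_salience(labelled):
    """labelled: list of (category, words). Returns {category: {word: weight}}."""
    global_counts = Counter()
    category_counts = {}
    for category, words in labelled:
        global_counts.update(words)
        category_counts.setdefault(category, Counter()).update(words)
    total = sum(global_counts.values())
    weights = {}
    for category, counts in category_counts.items():
        category_total = sum(counts.values())
        weights[category] = {
            # Positive when the word is more frequent in the category than overall.
            word: math.log((count / category_total) / (global_counts[word] / total))
            for word, count in counts.items()
        }
    return weights

def label(words, weights):
    """Score each category by summing the weights of the words; return the best."""
    scores = {
        category: sum(table.get(word, 0.0) for word in words)
        for category, table in weights.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical, already-labelled definitions reduced to content words.
TRAINING = [
    ("ARTIFACT", ["frasco", "cristal", "tapa"]),
    ("ARTIFACT", ["frasco", "metal"]),
    ("FOOD", ["leche", "queso"]),
    ("FOOD", ["leche", "frasco"]),
]

if __name__ == "__main__":
    table = train_salience(TRAINING)
    print(label(["frasco", "cristal"], table))
    print(label(["leche", "frasco"], table))
```

In this toy run the word "frasco" alone leans towards ARTIFACT, but a definition that also contains "leche" is pulled to FOOD, which is the kind of split the two senses of a word can show.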

36 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   A.2) Second labelling (Results):
      86,759 labelled definitions (93%)
      80% accuracy at a file level
   biberón_1_1 ARTIFACT 4.8399 Frasco de cristal... (glass flask...)
   biberón_1_2 FOOD 7.4443 Leche que contiene este frasco... (milk contained in that flask...)

37 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   B) Filtering process (FOODs)
   Removes all genus terms that:
      FILTER 1: are not FOODs according to the bilingual mapping
      FILTER 2: appear more often as genus in another semantic primitive
      FILTER 3: have a low frequency
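The three filters can be read as successive tests on each candidate genus term. The sketch below is one possible reading; the data structures, the frequency threshold and the example counts are hypothetical.

```python
def filter_genus_terms(candidates, primitive, bilingual_primitives, genus_counts, min_freq):
    """Keep only the candidate genus terms that pass all three filters.

    bilingual_primitives: {term: set of primitives reached through the bilingual mapping}
    genus_counts: {term: {primitive: times the term is the genus of a definition so labelled}}
    """
    kept = []
    for term in candidates:
        # FILTER 1: the bilingual mapping must allow this primitive.
        if primitive not in bilingual_primitives.get(term, set()):
            continue
        counts = genus_counts.get(term, {})
        own = counts.get(primitive, 0)
        # FILTER 2: the term must not be more frequent as genus elsewhere.
        if any(n > own for other, n in counts.items() if other != primitive):
            continue
        # FILTER 3: drop low-frequency genus terms.
        if own < min_freq:
            continue
        kept.append(term)
    return kept

# Hypothetical toy data for the FOOD primitive.
CANDIDATES = ["vino", "pan", "frasco", "masa", "caldo"]
BILINGUAL = {
    "vino": {"FOOD"},
    "pan": {"FOOD"},
    "frasco": {"ARTIFACT"},
    "masa": {"FOOD", "SUBSTANCE"},
    "caldo": {"FOOD"},
}
GENUS_COUNTS = {
    "vino": {"FOOD": 40},
    "pan": {"FOOD": 25, "ARTIFACT": 1},
    "frasco": {"ARTIFACT": 30, "FOOD": 3},
    "masa": {"FOOD": 5, "SUBSTANCE": 12},
    "caldo": {"FOOD": 2},
}

if __name__ == "__main__":
    print(filter_genus_terms(CANDIDATES, "FOOD", BILINGUAL, GENUS_COUNTS, min_freq=5))
```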

38 Merge approach: Taxonomy Construction. Step 1: Selection of the main top beginners
   B) Filtering process (FOOD results)

39 Merge approach: Taxonomy Construction. Step 2: Exploiting Genus
   Word sense: vino_1_1
   Hypernym: zumo_1_1
   Definition: zumo de uvas fermentado. (fermented juice of grapes)

   Word sense: rueda_2_1
   Hypernym: vino_1_1
   Definition: vino procedente de la región de Rueda (Valladolid). (wine from the region of Rueda)

40 Merge approach: Taxonomy Construction. Step 2: Exploiting Genus
   Genus Sense Identification
      97% accuracy for nouns
   Genus Sense Disambiguation
      Unrestricted WSD (coverage 100%)
      Knowledge-based WSD (not supervised)
      Eight heuristics (McRoy 92)
         Combining several lexical resources
         Combining several methods

41 Merge approach: Taxonomy Construction. Step 2: Exploiting Genus
   Results:

42 Merge approach: Taxonomy Construction. Step 2: Exploiting Genus
   Knowledge provided by each heuristic:

43 Merge approach: Taxonomy Construction. Step 2: Exploiting Genus
   F2+F3>9: 35,099 definitions
   F2+F3>4: 40,754 definitions
   No filters: 111,624 definitions

44 Merge approach: Taxonomy Construction. Step 2: Exploiting Genus
   ...
   zumo_1_1 vino_1_1 quianti_1_1
   zumo_1_1 vino_1_1 raya_1_8
   zumo_1_1 vino_1_1 requena_1_1
   zumo_1_1 vino_1_1 reserva_1_12
   zumo_1_1 vino_1_1 ribeiro_1_1
   zumo_1_1 vino_1_1 rioja_1_1
   zumo_1_1 vino_1_1 roete_1_1
   zumo_1_1 vino_1_1 rosado_1_3
   zumo_1_1 vino_1_1 rueda_2_1
   zumo_1_1 vino_1_1 sherry_1_1
   zumo_1_1 vino_1_1 tarragona_1_1
   zumo_1_1 vino_1_1 tintilla_1_1
   zumo_1_1 vino_1_1 tintorro_1_1
   zumo_1_1 vino_1_1 toro_3_1
   ...
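Chains like the ones listed on this slide follow from the disambiguated genus of each definition: once every sense points to its hypernym sense, walking the links upwards gives the path to the top beginner. A minimal sketch, with a guard for the circular definitions mentioned earlier in the tutorial (the dictionary layout and function name are assumptions):

```python
# Sense -> hypernym sense, as obtained from the genus of each definition.
# The two links for vino_1_1 and rueda_2_1 come from the slides; rioja_1_1 is
# taken from the chain listing.
HYPERNYM = {
    "vino_1_1": "zumo_1_1",
    "rueda_2_1": "vino_1_1",
    "rioja_1_1": "vino_1_1",
}

def chain(sense, hypernym):
    """Return the path from the topmost reachable sense down to `sense`."""
    path = [sense]
    seen = {sense}
    while path[-1] in hypernym:
        parent = hypernym[path[-1]]
        if parent in seen:  # circular definitions: stop instead of looping
            break
        path.append(parent)
        seen.add(parent)
    return path[::-1]

if __name__ == "__main__":
    for sense in ("rueda_2_1", "rioja_1_1"):
        print(" ".join(chain(sense, HYPERNYM)))
```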

45 Merge approach: Mapping Taxonomies. Step 3: Creation of translation links
   [Diagram: concepts C1 to C6]


47 Merge approach: Mapping Taxonomies. Step 3: Creation of translation links
   Connecting already existing hierarchies
      Relaxation labelling algorithm
      Constraints
   Between
      Spanish taxonomy automatically derived from an MRD (Rigau et al. 98)
      WordNet
   using a bilingual MRD

48 Merge approach: Mapping Taxonomies. Step 3: Creation of translation links
   [Diagram: Spanish senses animal, ave, rapaz and faisán, each with candidate WordNet classes drawn from Tops, person, animal, artifact and food]

49 Merge approach: Mapping Taxonomies. Step 3: Relaxation Labelling algorithm
   Iterative algorithm for function optimisation based on local information
      it can deal with any kind of constraints
      variables (senses of the taxonomy)
      labels (synsets)
   Finds a weight assignment for each possible label for each variable
      weights for the labels of the same variable add up to one
      the weight assignment satisfies, to the maximum possible extent, the set of constraints

50 Merge approach: Mapping Taxonomies. Step 3: Relaxation Labelling algorithm
   1) Start with a random weight assignment.
   2) Compute the support value for each label of each variable (according to the constraints).
   3) Increase the weights of the labels more compatible with the context and decrease those of the less compatible labels.
   4) If a stopping/convergence criterion is satisfied, stop; otherwise go to step 2.
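The four steps can be sketched as follows. This is a generic relaxation labelling loop under stated assumptions, not the tutorial's implementation: the support of a label is the weighted sum of its compatibilities clipped to [-1, 1], the update rule is the common multiplicative one, and the start is uniform rather than random so that the example is reproducible. Variable names, labels and the single constraint are hypothetical.

```python
def support(var, lab, weights, compat):
    """Sum of compatibility * current weight over all constraints touching (var, lab)."""
    total = 0.0
    for (a, b), r in compat.items():
        if a == (var, lab):
            total += r * weights[b[0]][b[1]]
        elif b == (var, lab):
            total += r * weights[a[0]][a[1]]
    return max(-1.0, min(1.0, total))

def relaxation_labelling(labels, compat, max_iter=100, eps=1e-6):
    """labels: {variable: [label, ...]}; compat: {((var, label), (var, label)): r}."""
    # Step 1 (uniform here, random in the slides).
    weights = {v: {l: 1.0 / len(ls) for l in ls} for v, ls in labels.items()}
    for _ in range(max_iter):
        updated, change = {}, 0.0
        for v, ls in labels.items():
            # Step 2: support of each label given the current weights.
            s = {l: support(v, l, weights, compat) for l in ls}
            # Step 3: reward supported labels, then renormalise to sum to one.
            norm = sum(weights[v][l] * (1.0 + s[l]) for l in ls)
            if norm == 0.0:
                updated[v] = dict(weights[v])
                continue
            updated[v] = {l: weights[v][l] * (1.0 + s[l]) / norm for l in ls}
            change = max(change, max(abs(updated[v][l] - weights[v][l]) for l in ls))
        weights = updated
        # Step 4: stop when the weights no longer move.
        if change < eps:
            break
    return weights

# Two Spanish senses, each with two candidate synsets (hypothetical labels).
LABELS = {
    "animal": ["animal/animal", "animal/person"],
    "ave": ["bird/animal", "bird/food"],
}
# One constraint: the two 'animal' readings support each other.
COMPAT = {(("ave", "bird/animal"), ("animal", "animal/animal")): 1.0}

if __name__ == "__main__":
    for var, dist in relaxation_labelling(LABELS, COMPAT).items():
        print(var, {lab: round(w, 3) for lab, w in dist.items()})
```

Each variable keeps a distribution that sums to one, and the mutually compatible pair of labels ends up with almost all of the weight.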

51 Merge approach: Mapping Taxonomies. Step 3: Constraints
   Rely on the taxonomy structure
   Coded with three characters XYZ:
      X: Spanish taxonomy, I (immediate) or A (ancestor)
      Y: English taxonomy, I (immediate) or A (ancestor)
      Z: relation, E (hypernym), O (hyponym), B (both)
   Examples: IIE, AAB

52 Merge approach: Mapping Taxonomies. Step 3: Results

   Poly            TOK, FOK     TOK, FNOK    total
   animal          279 (90%)    30 (91%)     209 (90%)
   food            166 (94%)    3 (100%)     169 (94%)
   cognition       198 (67%)    27 (90%)     225 (69%)
   communication   533 (77%)    40 (97%)     573 (78%)

   all             TOK, FOK     TOK, FNOK    total
   animal          424 (93%)    62 (95%)     486 (90%)
   food            166 (94%)    83 (100%)    249 (96%)
   cognition       200 (67%)    245 (90%)    445 (82%)
   communication   536 (77%)    234 (97%)    760 (81%)

53 Merge approach: Mapping Taxonomies. Step 3: Example
   [Diagram: Spanish senses piel, visón and marta, with candidate class (substance)]


55 Expand approach
   Take one WordNet as starting point
   Translate synsets:
      English:
      Basque:
   We obtain a structurally similar WordNet in another language, but some of the synsets will be missing
   Use bilingual dictionary:
      maintien n.m. (attitude) bearing; (conservation) maintenance
   1. Keep bilingual senses (Agirre & Rigau 95)
      maintien1: (attitude) bearing
      maintien2: (conservation) maintenance
   2. Produce all translation pairs (Atserias et al. 97)
      maintien - bearing
      maintien - maintenance
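The difference between the two options is only in what is kept from the bilingual entry. A small sketch using the entry from this slide; the tuple layout chosen for the entry and the function names are assumptions.

```python
# (headword, [(cue, [translations]), ...]) is an assumed in-memory layout.
ENTRY = ("maintien", [("attitude", ["bearing"]), ("conservation", ["maintenance"])])

def bilingual_senses(entry):
    """Option 1: keep one numbered sense per subentry, with its cue."""
    headword, senses = entry
    return [
        (f"{headword}{i}", cue, translations)
        for i, (cue, translations) in enumerate(senses, start=1)
    ]

def translation_pairs(entry):
    """Option 2: flatten the entry into headword-translation pairs."""
    headword, senses = entry
    return sorted({(headword, t) for _, translations in senses for t in translations})

if __name__ == "__main__":
    print(bilingual_senses(ENTRY))
    print(translation_pairs(ENTRY))
```

Option 1 keeps the cue that can later help to choose a synset; option 2 discards it and leaves the choice entirely to the pairing methods.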

56 Expand approach - produce all pairings
   Used to produce the first version of the nominal part of the Spanish WordNet
   Based on WN 1.5
   Both directions in bilingual dictionary merged:
      Spanish/English: 19,443 translation pairs
      English/Spanish: 16,324 translation pairs
      Harmonized bilingual: 28,131 translation pairs
      Overlap with WordNet: 12,665 nouns (14%)
   Two methods:
      class methods: consider only pairings
      conceptual distance methods: consider similarity of synsets

57 Expand approach - produce all pairings
   Ten class methods
      Four monosemic criteria
      Four polysemic criteria
      Two hybrid criteria
   Three conceptual distance methods
      CD1: using pairwise word co-occurrences
      CD2: using headword and genus
      CD3: using bilingual Spanish entries with multiple translations

58 Expand approach - produce all pairings
   Class methods:
      Four possible configurations for pairs which either share an English word (EW) or a Spanish word (SW): connected graph.
   [Diagram: four configurations of linked SW and EW nodes]
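Grouping pairs that share a word amounts to finding connected components in the bipartite graph of Spanish and English words. The sketch below assumes that the four configurations are 1:1, 1:N, N:1 and M:N (Spanish words : English words); the diagram did not survive in this transcript, so that reading, the function names and the toy pairs are all assumptions.

```python
from collections import defaultdict

def components(pairs):
    """Connected components of the bipartite graph built from (SW, EW) pairs."""
    adjacent = defaultdict(set)
    for sw, ew in pairs:
        adjacent[("SW", sw)].add(("EW", ew))
        adjacent[("EW", ew)].add(("SW", sw))
    seen, result = set(), []
    for start in sorted(adjacent):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacent[node] - component)
        seen |= component
        result.append(component)
    return result

def configuration(component):
    """Name a component by how many Spanish and English words it contains."""
    n_sw = sum(1 for side, _ in component if side == "SW")
    n_ew = len(component) - n_sw
    if n_sw == 1 and n_ew == 1:
        return "1:1"
    if n_sw == 1:
        return "1:N"
    if n_ew == 1:
        return "N:1"
    return "M:N"

def classify(pairs):
    """Map every Spanish word to the configuration of its component."""
    labels = {}
    for component in components(pairs):
        name = configuration(component)
        for side, word in component:
            if side == "SW":
                labels[word] = name
    return labels

# Toy translation pairs, for illustration only.
PAIRS = [
    ("abadía", "abbey"),
    ("casa", "house"), ("casa", "home"),
    ("coche", "car"), ("automóvil", "car"),
    ("banco", "bank"), ("banco", "bench"), ("orilla", "bank"),
]

if __name__ == "__main__":
    print(classify(PAIRS))
```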

59 Expand approach - produce all pairings
   4 monosemous class methods (M1, M2, M3, M4):
      All English words involved are monosemous in WN
   [Diagram: the four SW/EW configurations, each linked to a single synset]

60 Expand approach - produce all pairings
   4 polysemous class methods (P1, P2, P3, P4):
      At least 1 English word involved is polysemous
   [Diagram: the four SW/EW configurations, each linked to one or more synsets]

61 Expand approach - produce all pairings
   2 other class methods
      Variant criterion (VC): two synonyms share a single SW
      Field criterion (FC): use field indicators in bilingual entry when available

62 Expand approach - produce all pairings
   Ten class methods (results)

63 Expand approach - produce all pairings
   Conceptual Distance Methods (Agirre et al. 94)
      length of the shortest path
      specificity of the concepts
   Using:
      WordNet
      Bilingual dictionary

64 Expand approach - produce all pairings
   Three conceptual distance methods
      CD1: using pairwise word co-occurrences from monolingual dict.
      CD2: using headword and genus from monolingual def.
      CD3: using bilingual Spanish entries with multiple translations

65 Expand approach - produce all pairings
   CD2
   abadía_1_2 Iglesia o monasterio regido por un abad o abadesa (abbey, a church or a monastery ruled by an abbot or an abbess)

66 Expand approach - produce all pairings
   Three conceptual distance methods

67 Expand approach - produce all pairings
   Keep SW-synset pairs produced by methods with precision above 85%:
      mono1, mono2, mono3, mono4, variant
   But if two different methods propose the same SW-synset pair, that pair could deserve higher confidence:
      try pairwise combinations of methods
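A pairwise combination keeps only the SW-synset pairs proposed by both methods and measures precision on that intersection. The sketch below assumes a manually checked set of correct pairs is available; the method names, pair identifiers and figures are made up for illustration.

```python
from itertools import combinations

def precision(proposed, correct):
    """Fraction of proposed pairs that are in the checked-correct set."""
    return len(proposed & correct) / len(proposed) if proposed else 0.0

def pairwise_precision(proposals, correct):
    """Precision of the intersection of every two methods that overlap."""
    scores = {}
    for a, b in combinations(sorted(proposals), 2):
        shared = proposals[a] & proposals[b]
        if shared:
            scores[(a, b)] = precision(shared, correct)
    return scores

# Hypothetical proposals: sets of (Spanish word, synset id) pairs.
PROPOSALS = {
    "poly1": {("sw1", "syn1"), ("sw2", "syn2"), ("sw3", "syn3"), ("sw4", "syn9")},
    "cd1": {("sw1", "syn1"), ("sw2", "syn2"), ("sw5", "syn8"), ("sw6", "syn7")},
}
CORRECT = {("sw1", "syn1"), ("sw2", "syn2"), ("sw3", "syn3")}

if __name__ == "__main__":
    for name, pairs in PROPOSALS.items():
        print(name, precision(pairs, CORRECT))
    print(pairwise_precision(PROPOSALS, CORRECT))
```

In this toy data neither method reaches the 85% threshold alone, while the pairs they agree on do, which is the effect the slide is after.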

68 Expand approach - produce all pairings
   Combinations of methods: higher precision in some cases

69 Expand approach - produce all pairings
   Results
      SpWN v 0.1
      BasqueWN v 0.1:
         2 bilingual dictionaries
         apply first 8 class methods only

70 Expand approach - bilingual senses
   Smaller experiment with French bilingual dictionary
   Based on WN 1.5
   Keep structure of bilingual dictionary: bilingual senses
      21,322 entries, 31,502 subentries (senses)
      16,917 nominal subentries
   Disambiguation is possible when:
      1) one of the translation words is monosemous in WordNet.
         folie 1: n.f. madness
      2) the translation is given by a list of words.
         provision 1: n.f. supply, store
      3) a cue in French is provided alongside the translation.
      4) a semantic field is provided.
         trésor 2: n.m. (ressources) (comm.) finances

71 Expand approach - bilingual senses
   Possible disambiguation case by case

72 Expand approach - bilingual senses
   Disambiguation: Conceptual Density [Agirre & Rigau 95]
      The relatedness of a certain word sense to the words in the context (cue, other translations and/or semantic field) allows us to select that sense over the others
   Uses: bilingual dictionary + English WordNet

73 Expand approach - summary
   all pairings
      coverage and precision produce a good starting point for manual revision
   bilingual senses
      keeping bilingual senses might help precision
      very low coverage


75 Interface for manual revision


77 Interface for manual revision
   Client/server architecture
   Database:
      EWN design implemented on SQL tables
      English, Spanish, Catalan and Basque
   Interface:
      Perl CGIs that access the databases


79 Conclusions
   methods to automatically produce preliminary versions
      methods mainly for nouns
      need to manually revise
   merge approach
      method to produce native hierarchies and word senses
      trusts lexicographers' hierarchies
      need to map to ILI in an independent process
   expand approach
      method to translate English WN's synsets
      trusts WN's hierarchies and sense distinctions
      mapping to ILI for free

80 Conclusions
   merge approach
      manual work:
         revising and re-organizing the automatic hierarchies (hard)
         revising automatic mapping (very hard)
      allows for integration of data from monolingual dictionary:
         definition text itself
         lexico-semantic relations from definitions
   expand approach
      manual work:
         revise proposed translations (fast)
         review the rest of the synsets (many)
         include glosses

81 Conclusions
   Interface to speed up manual work
   Downloadable soon:
      WN 1.5 in database format
      Interface
   WordNets can be checked at:
      http://www.lsi.upc.es/~nlp
      http://ixa.si.ehu.es/wei3.html
   These slides will (shortly) be available at:
      http://...
      http://www.ji.si.ehu.es/users/eneko

82 Bibliography


