Results and Evaluation of Hungarian Nominal WordNet v1.0 Márton Miháltz MorphoLogic The 2 nd Global WordNet Conference, 2004, Brno.

1 Results and Evaluation of Hungarian Nominal WordNet v1.0 Márton Miháltz MorphoLogic The 2 nd Global WordNet Conference, 2004, Brno

2 Outline
1. About the Hungarian WordNet project
2. Automatic methods
3. Evaluation
4. Combination of results
5. Future work

3 Hungarian WN project 1.
• Started in 2001
• MA Thesis; MorphoLogic project
• 1st (current) phase: nominal database
• Minimizing costs:
– Expand method

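4 Hungarian WN project 1.
[Diagram: English synsets and Hungarian words connected by Hypernym, Meronym and Antonym relations. Synsets: {human, person}, {parent}, {mother}, {father}, {body}, {arm}, {leg}. Hungarian words: ember, személy, individum; hozzátartozó, rokon; anya; apa; kar; láb; test.]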

5 Hungarian WN project 2.
• Started in 2000
• MA Thesis; MorphoLogic project
• 1st phase: nominal database
• Minimizing costs:
– Expand method
– Semantic Similarity Hypothesis
– Automatic methods

6 Hungarian WN project 3.
• Ambiguity problem:
• 9 disambiguation heuristics (Atserias et al., 1997)
[Diagram: ló → horse, knight → {horse, Equus caballus}; {horse} (gymnastic apparatus); {knight, horse} (chess figure); {knight} (person of noble origin). Labels: (avg. 1.71), (avg. 2.16).]
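The ambiguity on this slide can be made concrete with a small sketch. It holds only the slide's own example as toy data; the names `BILINGUAL`, `SYNSET_MEMBERS` and `candidate_synsets`, as well as the synset labels, are illustrative assumptions and not part of the project's resources.

```python
# Toy data: only the example from the slide. Real resources are far larger.
BILINGUAL = {"ló": ["horse", "knight"]}

# Hypothetical synset labels mapped to the English words they contain.
SYNSET_MEMBERS = {
    "horse/animal": {"horse", "Equus caballus"},
    "horse/gymnastic apparatus": {"horse"},
    "knight/chess figure": {"knight", "horse"},
    "knight/person of noble origin": {"knight"},
}


def candidate_synsets(hu_word):
    """Return every synset reachable from a Hungarian word via its translations."""
    translations = set(BILINGUAL.get(hu_word, []))
    return sorted(
        label
        for label, members in SYNSET_MEMBERS.items()
        if members & translations
    )


# Without disambiguation, "ló" is a candidate for all four synsets.
print(candidate_synsets("ló"))
```

Each disambiguation heuristic can then be read as a filter that keeps only some of these candidate connections.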

7 Hungarian WN Project 4.
• Electronic resources:
– Princeton WN 1.6
– Hungarian-English bilingual dictionary
  • 17,000 / 12,400 headwords (WN)
– Monolingual (Hungarian) explanatory dictionary
  • 42,000 nominal entries
  • 64,000 definitions

8 Disambiguation Heuristics 1.
A) Heuristics based on bilingual dictionary:
• Monosemous translation: [diagram: Hu, En 1, {ss 1}, …]
• Variant English words: [diagram: Hu, En 1, En 2, …, {ss 1}]
• Intersection method: [diagram: Hu 1, Hu 2, …, En 1, En 2, …, {ss 1}]
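The slide names these heuristics without defining them, so the following is only a sketch under stated assumptions. It assumes "monosemous translation" means an English translation that belongs to exactly one synset, and "variant English words" means two or more translations of the same Hungarian word occurring together in one synset. The intersection method is left out because its diagram cannot be read reliably. All names and the toy data (including the koala entry) are hypothetical.

```python
# Toy data (hypothetical labels); the "ló" entries follow the earlier slide.
bilingual = {"ló": ["horse", "knight"], "koala": ["koala"]}
synset_members = {
    "horse/animal": {"horse", "Equus caballus"},
    "horse/gymnastic apparatus": {"horse"},
    "knight/chess figure": {"knight", "horse"},
    "knight/person of noble origin": {"knight"},
    "koala/animal": {"koala"},
}


def synsets_of(en_word):
    """All synsets that contain the given English word."""
    return {label for label, members in synset_members.items() if en_word in members}


def monosemous_links(hu_word):
    """Assumed reading: link via translations that have exactly one synset."""
    links = set()
    for en_word in bilingual.get(hu_word, []):
        senses = synsets_of(en_word)
        if len(senses) == 1:
            links |= senses
    return links


def variant_links(hu_word):
    """Assumed reading: link to synsets holding at least two of the translations."""
    translations = set(bilingual.get(hu_word, []))
    return {
        label
        for label, members in synset_members.items()
        if len(members & translations) >= 2
    }


print(monosemous_links("koala"))  # {'koala/animal'}
print(monosemous_links("ló"))     # set(): both translations are polysemous
print(variant_links("ló"))        # {'knight/chess figure'}
```

Under these assumptions the two heuristics are complementary: the first fires only when a translation is unambiguous, the second only when several translations agree on a synset.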

9 Disambiguation Heuristics 2.
A) Heuristics based on bilingual dictionary (cont’d):
• Identifying derivational hypernyms:
– Hungarian endocentric N+N compounds
– Humor analyzer: last segment (head) = hypernym
  hangverseny+zongora → zongora (‘concert+piano’ → ‘piano’)
– Conceptual Distance
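The head-as-hypernym rule can be sketched as follows. The real interface of the Humor analyzer is not shown on the slide, so this sketch assumes the compound has already been segmented and is passed in with "+" between segments, as in the slide's example. The function name `derivational_hypernym` is hypothetical.

```python
def derivational_hypernym(segmented):
    """Return the head (last segment) of a '+'-segmented compound.

    Returns None when the input has fewer than two segments, since a
    non-compound has no derivational hypernym under this rule.
    """
    segments = [part for part in segmented.split("+") if part]
    if len(segments) < 2:
        return None
    return segments[-1]


print(derivational_hypernym("hangverseny+zongora"))  # zongora
print(derivational_hypernym("zongora"))              # None
```

The rule only holds for endocentric compounds, which is why the slide restricts it to that class.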

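10 Disambiguation Heuristics 3.
B) Parsing monolingual definitions:
• Synonyms: lélekelemzés_1_1: A tudat alatti lelki jelenségek vizsgálata; pszichoanalízis [psychoanalysis] (English: "the study of subconscious mental phenomena; psychoanalysis")
• Hypernyms: koala_1_1: Ausztráliában honos, fán élő, medvére emlékeztető erszényes emlős. [mammal] (English: "a marsupial mammal native to Australia that lives in trees and resembles a bear")
• Latin equivalents: ló_1_1 [horse]: Vontatásra és lovaglásra haszn., páratlan ujjú patás háziállat (Equus Caballus) (English: "an odd-toed hoofed domestic animal used for draught and riding")
[Diagrams, one per method, with nodes Hu, Syn, Hyp, Lat, En 1, En 2, …, {ss 1}, {ss 2} and the label "min".]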
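The Latin-equivalent heuristic lends itself to a short sketch. It assumes that the Latin name is the last parenthesised expression in the definition and consists of a capitalised word followed by one or more further words, as in the ló example; how the project actually parsed definitions is not stated on the slide. The function names and the single-synset toy inventory are hypothetical.

```python
import re

# Assumed pattern: "(Genus species)" at the very end of the definition.
LATIN_AT_END = re.compile(r"\(([A-Z][a-z]+(?: [A-Za-z][a-z]+)+)\)\s*\.?\s*$")

# Toy synset inventory with a hypothetical label.
synset_members = {"horse/animal": {"horse", "Equus caballus"}}


def latin_equivalent(definition):
    """Extract a trailing parenthesised Latin name, or None if absent."""
    match = LATIN_AT_END.search(definition)
    return match.group(1) if match else None


def synsets_with_latin(latin_name):
    """Find synsets containing the Latin name, ignoring letter case."""
    wanted = latin_name.lower()
    return sorted(
        label
        for label, members in synset_members.items()
        if wanted in {member.lower() for member in members}
    )


horse_def = ("Vontatásra és lovaglásra haszn., páratlan ujjú patás "
             "háziállat (Equus Caballus)")
koala_def = ("Ausztráliában honos, fán élő, medvére emlékeztető "
             "erszényes emlős.")

print(latin_equivalent(horse_def))           # Equus Caballus
print(latin_equivalent(koala_def))           # None
print(synsets_with_latin("Equus Caballus"))  # ['horse/animal']
```

The case-insensitive comparison matters here because the dictionary writes "Equus Caballus" while the synset on the earlier slide has "Equus caballus".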

11 Disambiguation Heuristics 4.
C) Methods for increasing coverage (+9.2%):
• Derivational hypernym of hyp./syn.: [diagram: Hu, Hyp/Syn (→ Eng), DerivHyp, Eng 2]
• Lookup of hyp./syn. in monolingual: [diagram: Hu, Hyp/Syn (→ Eng), Monolingual: monosemous? YES, Hyp, Eng 1, Eng 2]

12 Results & Validation
• Results from 9 unsupervised heuristics:
– Total: 13,948 Hungarian nouns → 12,085 PWN synsets (22,169 connections)
– Different methods: different confidence!
• Validation:
– Gold standard: 400 nouns picked at random from the bilingual dictionary (Hungarian side)
– Manual disambiguation (2,201 possible connections)
– IAA (inter-annotator agreement): 84.7%
– Evaluation of 9 result sets against the gold standard (GS)
– Precision: 49% to 92%
– Coverage: 49% to 0.5%
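Evaluation against a gold standard can be sketched as below. The slide does not define its measures, so the sketch assumes that precision is the share of a heuristic's proposed connections for gold-standard words that the annotators marked correct, and that coverage is the share of gold-standard words for which the heuristic proposes at least one connection. The function name and all data are hypothetical toy values, not the project's results.

```python
def evaluate(proposed, gold_correct, gold_words):
    """Return (precision, coverage) under the assumed definitions.

    proposed:     set of (hu_word, synset_label) pairs output by one heuristic
    gold_correct: set of pairs the annotators judged correct
    gold_words:   set of Hungarian nouns in the gold standard
    """
    # Only connections for gold-standard words can be judged.
    judged = {(word, label) for word, label in proposed if word in gold_words}
    if not judged or not gold_words:
        return 0.0, 0.0
    precision = len(judged & gold_correct) / len(judged)
    coverage = len({word for word, _ in judged}) / len(gold_words)
    return precision, coverage


# Toy example: 4 gold words, 3 judged connections, 2 of them correct.
gold_words = {"ló", "koala", "zongora", "test"}
gold_correct = {("ló", "horse/animal"), ("koala", "koala/animal")}
proposed = {
    ("ló", "horse/animal"),
    ("ló", "knight/chess figure"),
    ("koala", "koala/animal"),
    ("asztal", "table/furniture"),  # not in the gold standard, so ignored
}

precision, coverage = evaluate(proposed, gold_correct, gold_words)
print(f"precision={precision:.2%} coverage={coverage:.2%}")
```

A highly selective heuristic scores well on the first measure and poorly on the second, which matches the opposite ordering of the precision and coverage ranges on the slide.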

13 Evaluation of Individual Methods
Method         Precision   #Words
Variant        92.01%      85
Latin          82.00%      1,600
Synonym        80.00%      1,360
DerivHyp       70.31%      2,974
Incr. cov      %           1,275
Mono           65.15%      11,772
Intersection   58.56%      2,975
Incr. cov      %           1,024
Hypernym       48.55%      8,372

14 Combining results 1.
• Combining different result sets:
– 2 different confidence thresholds
– Methods 1-4: precision ≥ 75% (2,445 nouns, 2,170 synsets)
– Methods 1-6: precision ≥ 63% (12,275 nouns, 12,004 synsets)
• Validating and combining results not included in the previous step:
– 8 of 13 intersection sets: precision ≥ 75%
– 9 intersection sets: precision ≥ 63%

15 Combining results 2.
• Combination of the 2 base sets & the intersection sets w.r.t. the 2 thresholds
Result set   #Words   #Synsets   #Connections   Precision
1. set                                          %
2. set                                          %

16 Further Work 1.
• Increase precision:
– Complete manual checking of words in synsets
– Editing of hierarchies
• Increase coverage:
– Use additional bilingual dictionaries with the best automatic methods
– Use Hungarian taxonomies from the monolingual dictionary
– Add multiwords
– Add derivational links
– Upgrade to WN 2.0

17 Further Work 2.
• Funding from IKTA grant ( ?):
– Manual supervision
– Connect to EuroWordNet Top Ontology/ILI
– Do verbs (adjectives, adverbs)
– Add special domain: financial terms

18 Thank you for your attention! Márton Miháltz

