
1 Automatic Assignment of Domain Labels to WordNet Mauro Castillo V. Francis Real V. German Rigau C. GWC 2004 Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya

2 Outline Introduction WordNet WN Domains Experimentation Evaluation and results Discussion Conclusions

3 Introduction
Goal: to semantically enrich any WN version with the semantic domain labels of MultiWordNet Domains.
WN is a standard resource for semantic processing.
Word Domain Disambiguation has proven effective.
The work presented explores the automatic and systematic assignment of domain labels to glosses.
The proposed method can also be used to verify and correct the suggested labelling.

4 WordNet
Version WN1.6 was used because WN Domains is available for that version.

5 WN Domains
WordNet Domain hierarchy developed at IRST (Magnini and Cavaglià, 2000). Example branch:
TOP
  pure_science
    biology
      botany
      zoology
        entomology
      anatomy
    mathematics
      geometry
      statistics
    ...

6 WN Domains
The synsets have been annotated semi-automatically with one or more labels.
Most synsets have a single label.
(The slide showed a table with the distribution of domain labels per synset and the average number of labels per synset for nouns, verbs, adjectives and adverbs.)

7 WN Domains
A domain may include synsets of different syntactic categories, e.g. MEDICINE: doctor#1 (n), operate#7 (v), medical#1 (a), clinically#1 (r).
A domain label may also contain senses from different WN subhierarchies, e.g. SPORT: athlete#1 → life-form#1; game-equipment#1 → physical-object#1; sport#1 → act#2; playing-field#1 → location#1.

8 WN Domains
Synsets that have more than one label do not seem to follow any pattern:
sultana#n#1 (pale yellow seedless grape used for raisins and wine) — BOTANY, GASTRONOMY
morocco#n#2 (a soft pebble-grained leather made from goatskin; used for shoes and book bindings etc.) — ANATOMY, ZOOLOGY
canicola_fever#n#1 (an acute feverish disease in people and in dogs marked by gastroenteritis and mild jaundice) — MEDICINE, PHYSIOLOGY, ZOOLOGY
blue#n#1, blueness#n#1 (the color of the clear sky in the daytime; "he had eyes of bright blue") — COLOR, QUALITY

9 WN Domains
FACTOTUM: used to mark the senses of WN that do not have a specific domain.
STOP senses: synsets that appear frequently in different contexts, for instance numbers, colours, etc.
Applications of WN Domains: Word Sense Disambiguation, Word Domain Disambiguation, Text Categorization, etc.

10 Experimentation
A process to automatically assign domain labels to WN1.6 glosses.
Validation procedures for the consistency of the domain assignments in WN1.6 and, especially, for the automatic assignment of factotum labels.
Distribution of synsets with and without the factotum domain label in WN1.6.

11 Experimentation
A test set was randomly selected (around 1% of the synsets) and the remaining synsets were used as a training set.
Test corpora were built for nouns and verbs.

12 Experimentation
Example: castle#n#4, castling#n#1 — domains CHESS, SPORT; gloss: castle castling | interchanging the positions of the king and a rook.
Each content word of the gloss is paired with each domain label: castle–chess, castle–sport, castling–chess, castling–sport, interchanging–chess, interchanging–sport, king–chess, king–sport, rook–chess, rook–sport.
Frequencies are then calculated over the whole training set, e.g. for castle: chess 68, sport 27, history 18, architecture 57, law 12, tourism 24, …
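The pairing-and-counting step above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the toy training set and its counts are invented for the example.

```python
from collections import Counter

def count_word_domain_pairs(training_synsets):
    """Count (word, domain) co-occurrences over a training set.

    training_synsets: iterable of (gloss_words, domain_labels) pairs.
    Every content word of a gloss is paired with every domain label of
    the synset, as on the slide (castle-chess, castle-sport, ...).
    """
    c_wd = Counter()  # c(w, D): word-domain co-occurrence count
    c_w = Counter()   # c(w): marginal count of the word over all pairs
    c_d = Counter()   # c(D): marginal count of the domain over all pairs
    n = 0             # N: total number of word-domain pairs
    for words, domains in training_synsets:
        for w in words:
            for d in domains:
                c_wd[(w, d)] += 1
                c_w[w] += 1
                c_d[d] += 1
                n += 1
    return c_wd, c_w, c_d, n

# Toy training set in the spirit of slide 12 (illustrative only):
synsets = [
    (["castle", "castling", "interchanging", "king", "rook"],
     ["chess", "sport"]),
    (["castle", "tower"], ["architecture"]),
]
c_wd, c_w, c_d, n = count_word_domain_pairs(synsets)
```

The marginals c(w) and c(D) are taken over pairs, so they stay consistent with the co-occurrence counts used by the measures on the next slide.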

13 Experimentation: Measures
M1: square root formula: (c(w,D) − (1/N)·c(w)·c(D)) / √c(w,D)
M2: association ratio: AR(w,D) = Pr(w|D) · log2(Pr(w|D) / Pr(w))
M3: logarithm formula: log2(N·c(w,D) / (c(w)·c(D)))
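The three measures can be sketched directly from the counts. This is a reconstruction from the slide's formulas; in particular, the placement of the square root in M1 is an assumption recovered from the garbled transcript, and the zero-count guards are mine.

```python
import math

def m1_sqrt(c_wd, c_w, c_d, n):
    """M1, square root formula: (c(w,D) - c(w)c(D)/N) / sqrt(c(w,D))."""
    if c_wd == 0:
        return 0.0
    return (c_wd - c_w * c_d / n) / math.sqrt(c_wd)

def m2_association_ratio(c_wd, c_w, c_d, n):
    """M2, association ratio: Pr(w|D) * log2(Pr(w|D) / Pr(w))."""
    if c_wd == 0:
        return 0.0
    p_w_given_d = c_wd / c_d   # Pr(w|D)
    p_w = c_w / n              # Pr(w)
    return p_w_given_d * math.log2(p_w_given_d / p_w)

def m3_log(c_wd, c_w, c_d, n):
    """M3, logarithm formula (pointwise mutual information):
    log2(N * c(w,D) / (c(w) * c(D)))."""
    if c_wd == 0:
        return 0.0
    return math.log2(n * c_wd / (c_w * c_d))
```

All three take the same four counts (c(w,D), c(w), c(D), N), so any of them can be plugged into the weight-matrix calculation on the next slide.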

14 Experimentation
Calculation of the matrix of weights (word × domain), e.g. for the word orange: a weight against each of botany, gastronomy, color, jewellery, entomology, quality, hunting, geology, chemistry, biology, …
The data are split into a TRAINING part (to build the matrix) and a VALIDATION part.

15 Experimentation
For each gloss, every candidate domain D is scored as V_D = Σ weight(w_i, d_j) · percentage over the gloss words (variants).
Example: leader#n#1 (gold domain PERSON), gloss: leader | a person who rules or guides or inspires others.
Each gloss word contributes its own ranked domain-weight vector; e.g. for person the slide lists law 8.01, economy 4.74, religion 4.24, anthropology 3.74, sexuality 3.53, politics 3.49, …, with similar vectors for the other gloss words.
Combined ranking — position 1: person; position 2: politics; position 3: law.
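The scoring step can be sketched as below. This is an illustrative reconstruction: the weight matrix is hypothetical (only the vector for "person" borrows figures from the slide, and their attribution to that word is my reading of the garbled transcript), and the slide's per-word percentage factor is assumed uniform and omitted.

```python
def score_domains(gloss_words, weights):
    """Rank candidate domains for a gloss.

    Simplified form of V_D = sum_i weight(w_i, D): each gloss word adds
    its weight for every domain it is associated with, and domains are
    returned sorted by total score, best first.
    """
    totals = {}
    for w in gloss_words:
        for domain, weight in weights.get(w, {}).items():
            totals[domain] = totals.get(domain, 0.0) + weight
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical weight matrix for some words of leader#n#1's gloss;
# only the "person" row echoes figures shown on the slide.
weights = {
    "person": {"law": 8.01, "economy": 4.74, "religion": 4.24},
    "rules": {"law": 2.70, "politics": 1.35},
    "guides": {"tourism": 1.64, "person": 1.46},
}
ranking = score_domains(["person", "rules", "guides"], weights)
```

With these toy weights, law accumulates contributions from two gloss words and tops the ranking; on the real matrix the combined evidence instead puts person first, as the slide shows.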

16 Evaluation and Results: nouns
Results for nouns with factotum (CF) and without factotum (SF).
AP: accuracy of the first label; AT: accuracy over all labels; P: precision; R: recall; F1: 2PR/(P+R).
MiA: measures the success of each formula (M1, M2 or M3) when the first proposed label is correct.
MiD: measures the success of each formula (M1, M2 or M3) when the first proposed label is correct, or is subsumed by a correct one in the domain hierarchy.
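The slide's evaluation measures reduce to standard precision/recall arithmetic; a minimal sketch (the tp/fp/fn counting convention is an assumption, not from the slides):

```python
def evaluation_metrics(tp, fp, fn):
    """Precision, recall and F1 as defined on the slide: F1 = 2PR/(P+R).

    tp: proposed labels that are correct; fp: proposed labels that are
    wrong; fn: gold labels the system failed to propose.
    """
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For example, 8 correct proposals out of 10 proposed and 10 gold labels gives P = R = F1 = 0.8.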

17 Evaluation and Results: verbs
Results for verbs with factotum (CF) and without factotum (SF), using the same measures as for nouns (AP, AT, P, R, F1, MiA, MiD).

18 Evaluation and Results
On average, the method assigns 1.23 domain labels per noun synset (1.170) and 1.20 per verb synset (1.078).
We obtain better results for nouns.
The best average results were obtained with the M1 measure.
The first proposed label for nouns reaches 70% accuracy.
The results for verbs are worse than for nouns; one reason may be the high number of verbal synsets labelled with the factotum domain.

19 Discussion
Monosemic words: credit application#n#1 (an application for a line of credit).
Domains: SCHOOL. Proposal 1: BANKING. Proposal 2: ECONOMY.
In the domain hierarchy, banking is a subdomain of economy.

20 Discussion
Relation between labels: academic_program#n#1 (a program of education in liberal arts and sciences (usually in preparation for higher education)).
Domains: PEDAGOGY. Proposal 1: SCHOOL. Proposal 2: UNIVERSITY.
In the hierarchy, school and university are subdomains of pedagogy.

21 Discussion
Relation between labels: shopping#n#1 (searching for or buying goods or services: "went shopping for a reliable plumber"; "does her shopping at the mall rather than down town").
Domains: ECONOMY. Proposal 1: COMMERCE.
In the hierarchy, commerce and economy are both subdomains of social_science.

22 Discussion
Relation between labels: fire_control_radar#n#1 (radar that controls the delivery of fire on a military target).
Domains: MERCHANT_NAVY. Proposal 1: MILITARY.
In the hierarchy, merchant_navy falls under transport, and military under social_science.

23 Discussion
Uncertain cases:
birthmark#n#1 (a blemish on the skin formed before birth). Domains: QUALITY. Proposal 1: MEDICINE.
bardolatry#n#1 (idolization of William Shakespeare). Domains: RELIGION. Proposal 1: HISTORY. Proposal 2: LITERATURE.

24 Conclusions
Automatically assigning domain labels to WN glosses is a difficult task.
The proposed process is very reliable for the first proposed labels.
The proposed labels are ordered by priority.
It is possible to add new correct labels or to validate the existing ones.

25 Mauro Castillo V. Francis Real V. German Rigau C. Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Automatic Assignment of Domain Labels to WordNet GWC 2004

