Automatic Assignment of Domain Labels to WordNet Mauro Castillo V. Francis Real V. German Rigau C. GWC 2004 Departament de Llenguatges i Sistemes Informàtics.


Automatic Assignment of Domain Labels to WordNet Mauro Castillo V. Francis Real V. German Rigau C. GWC 2004 Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya

Outline: Introduction; WordNet; WN Domains; Experimentation; Evaluation and results; Discussion; Conclusions

Introduction Goal: to semantically enrich any WN version with the semantic domain labels of MultiWordNet Domains. WN is a standard resource for semantic processing. Motivation: the effectiveness of Word Domain Disambiguation. The work presented explores the automatic and systematic assignment of domain labels to glosses. The proposed method can also be used to correct and verify the suggested labeling.

WordNet The version WN1.6 was used because of the availability of WN Domains

WN Domains WordNet Domain hierarchy developed at IRST (Magnini and Cavaglià, 2000). Fragment shown: TOP, pure_science, biology, botany, zoology, entomology, anatomy, mathematics, geometry, statistics...

WN Domains The synsets have been annotated semiautomatically with one or more labels. Most synsets have a single label. [Table: distribution of domain labels per synset and average number of labels per synset for noun, verb, adj and adv]

WN Domains A domain may include synsets of different syntactic categories, e.g. MEDICINE: doctor#1 (n), operate#7 (v), medical#1 (a), clinically#1 (r). A domain label may also contain senses from different WN subhierarchies, e.g. SPORT: athlete#1 (life-form#1), game-equipment#1 (physical-object#1), sport#1 (act#2), playing-field#1 (location#1).

WN Domains Synsets that have more than one label do not seem to follow any pattern: sultana#n#1 (pale yellow seedless grape used for raisins and wine): Botany, Gastronomy; morocco#n#2 (a soft pebble-grained leather made from goatskin; used for shoes and book bindings etc.): Anatomy, Zoology; canicola_fever#n#1 (an acute feverish disease in people and in dogs marked by gastroenteritis and mild jaundice): Medicine, Physiology, Zoology; blue#n#1, blueness#n#1 (the color of the clear sky in the daytime; "he had eyes of bright blue"): Color, Quality.

WN Domains FACTOTUM: used to mark the senses of WN that do not have a specific domain. STOP senses: the synsets that appear frequently in different contexts, for instance numbers, colours, etc. Applications of WN Domains: Word Sense Disambiguation, Word Domain Disambiguation, Text Categorization, etc.

Experimentation A process to automatically assign domain labels to WN1.6 glosses. Validation of the consistency of the domain assignment in WN1.6 and, especially, of the automatic assignment of the factotum label. [Table: distribution of synsets with and without the domain label factotum in WN1.6]

Experimentation The test set was randomly selected (around 1% of the synsets) and the remaining synsets were used as the training set. [Table: test corpus for nouns and verbs]
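The random hold-out described on this slide can be sketched as follows. This is only an illustration: the function name, the fixed seed and the integer synset identifiers are assumptions, and the slide does not say how the authors actually drew the sample.

```python
import random

def split_synsets(synset_ids, test_fraction=0.01, seed=0):
    """Randomly hold out a fraction of the synsets for testing.

    Returns (training, test). The 1% default follows the slide; the seed
    and the rounding rule are assumptions made for reproducibility.
    """
    rng = random.Random(seed)
    ids = list(synset_ids)
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]

# Hypothetical identifiers standing in for WN1.6 synsets.
train, test = split_synsets(range(1000))
print(len(train), len(test))
```

Every synset lands in exactly one of the two sets, so weights learned on the training set are never evaluated on synsets that contributed to them.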

Experimentation Example: castle#n#4, castling#n#1, labeled CHESS and SPORT. Variants | gloss: castle castling | interchanging the positions of the king and a rook. Word-domain pairs generated: (castle, chess), (castle, sport), (castling, chess), (castling, sport), (interchanging, chess), (interchanging, sport), (king, chess), (king, sport), (rook, chess), (rook, sport). Calculation of frequency: castle-chess 68, castle-sport 27, castle-history 18, castle-architecture 57, castle-law 12, castle-tourism 24, …
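The pairing and counting step in the castle example can be sketched like this. The stopword list and the whitespace tokenization are assumptions (the slide does not state which gloss words were kept), so the pairs produced here may differ slightly from the ones shown on the slide.

```python
from collections import Counter

# Assumed stopword list; the slide does not say how gloss words were filtered.
STOPWORDS = {"the", "of", "a", "and"}

def word_domain_pairs(variants, gloss, domains):
    """Pair every variant and every non-stopword gloss word with every domain label."""
    words = list(variants) + [w for w in gloss.lower().split() if w not in STOPWORDS]
    return [(word, domain) for word in words for domain in domains]

pair_counts = Counter()
pair_counts.update(
    word_domain_pairs(
        ["castle", "castling"],
        "interchanging the positions of the king and a rook",
        ["chess", "sport"],
    )
)
print(pair_counts[("castle", "chess")])
```

Running the same update over every training synset accumulates counts like the castle-chess and castle-sport frequencies listed on the slide.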

Experimentation Measures. M1: Square root formula $\dfrac{c(w,D) - \frac{1}{N}\,c(w)\,c(D)}{\sqrt{c(w,D)}}$. M2: Association Ratio $Ar(w,D) = \Pr(w|D)\,\log_2\dfrac{\Pr(w|D)}{\Pr(w)}$. M3: Logarithm formula $\log_2\dfrac{N\,c(w,D)}{c(w)\,c(D)}$.
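A minimal sketch of the three measures, computed from raw counts. Several things here are assumptions rather than statements from the slides: c(w,D) is read as the co-occurrence count of word w with domain D, c(w) and c(D) as the marginal counts, N as the total count, Pr(w|D) is estimated as c(w,D)/c(D), Pr(w) as c(w)/N, and M1 is read as dividing by the square root of c(w,D). The counts in the example are toy values.

```python
import math

def _check(c_wd):
    # All three measures are undefined for a zero co-occurrence count.
    if c_wd <= 0:
        raise ValueError("c(w,D) must be positive")

def m1_square_root(c_wd, c_w, c_d, n):
    """M1: (c(w,D) - c(w)c(D)/N) / sqrt(c(w,D))."""
    _check(c_wd)
    return (c_wd - c_w * c_d / n) / math.sqrt(c_wd)

def m2_association_ratio(c_wd, c_w, c_d, n):
    """M2: Pr(w|D) * log2(Pr(w|D) / Pr(w)), with assumed count-based estimates."""
    _check(c_wd)
    p_w_given_d = c_wd / c_d
    p_w = c_w / n
    return p_w_given_d * math.log2(p_w_given_d / p_w)

def m3_logarithm(c_wd, c_w, c_d, n):
    """M3: log2(N * c(w,D) / (c(w) * c(D)))."""
    _check(c_wd)
    return math.log2(n * c_wd / (c_w * c_d))

# Toy counts: the pair occurs twice as often as independence would predict.
counts = dict(c_wd=4, c_w=8, c_d=16, n=64)
print(m1_square_root(**counts), m2_association_ratio(**counts), m3_logarithm(**counts))
```

All three scores are positive when the word co-occurs with the domain more often than expected under independence, which is what makes them usable as word-domain weights.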

Experimentation TRAINING: calculation of the matrix of weights. VALIDATION. Example entries of the matrix for the word orange: botany, gastronomy, color, jewellery, entomology, quality, hunting, geology, chemistry, biology.

Experimentation Example: leader#n#1, labeled PERSON. Variant | gloss: leader | a person who rules or guides or inspires others. Score per domain: $V_D = \mathrm{weight}(w_i, d_j) \times \mathrm{percentage}$. Resulting ranking: POSITION 1: person; POSITION 2: politics; POSITION 3: law. Domain weights shown for the gloss words: politics 4.30, history 3.33, religion 2.19, person 1.78, mythology 1.17, commerce 1.11; person; law 8.01, economy 4.74, religion 4.24, anthropology 3.74, sexuality 3.53, politics 3.49; law 2.70, factotum 2.09, computer_science 2.05, mathematics 1.83, grammar 1.68, play 1.57, linguistics 1.54, politics 1.35; tourism 1.64, industry 1.54, person 1.46, mechanics 1.26, factotum 1.24, occultism 0.98, pedagogy 0.93; psychology 0.96, factotum 0.82.
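One way to read the leader#n#1 example is that each word of the variant and gloss contributes its domain weights and the domains are then ranked by total score. The sketch below follows that reading under stated assumptions: summation as the aggregation rule, a per-word factor standing in for the undefined "percentage" term (defaulting to 1.0), and toy weights that are not the values from the slide.

```python
from collections import defaultdict

def rank_domains(words, weights, factor=None):
    """Rank domains by the summed weight of the words (assumed aggregation rule).

    weights maps word -> {domain: weight}; factor optionally maps word -> multiplier.
    """
    factor = factor or {}
    scores = defaultdict(float)
    for word in words:
        for domain, value in weights.get(word, {}).items():
            scores[domain] += value * factor.get(word, 1.0)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical weights, chosen only to illustrate the mechanics.
toy_weights = {
    "leader": {"politics": 4.0, "person": 2.0},
    "person": {"law": 8.0, "politics": 3.0},
    "rules": {"law": 2.5},
}
ranking = rank_domains(["leader", "person", "rules", "guides"], toy_weights)
print(ranking)
```

Words without an entry in the weight matrix (here "guides") simply contribute nothing, and the ordered list corresponds to the POSITION 1, 2, 3 proposals on the slide.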

Evaluation and Results: nouns [Tables: results for nouns with factotum (CF) and without factotum (SF)] AP: accuracy of the first label. AT: accuracy of all labels. P: precision. R: recall. F1: 2PR/(P+R). MiA: success of formula Mi (M1, M2 or M3) when the first proposed label is correct. MiD: success of formula Mi (M1, M2 or M3) when the first proposed label is correct or counts as correct through subsumption in the domain hierarchy.
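Precision, recall and F1 = 2PR/(P+R) over proposed and reference label sets can be sketched as below. Pooling the counts over all synsets (micro-averaging) is an assumption, since the slides do not say how the figures were averaged, and the proposed labels in the example are made up.

```python
def precision_recall_f1(proposed, gold):
    """Micro-averaged P, R and F1 over parallel lists of label collections."""
    true_pos = sum(len(set(p) & set(g)) for p, g in zip(proposed, gold))
    n_proposed = sum(len(set(p)) for p in proposed)
    n_gold = sum(len(set(g)) for g in gold)
    precision = true_pos / n_proposed if n_proposed else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical system output for two synsets, with their reference labels.
proposed = [["person", "politics", "law"], ["chess"]]
gold = [["person"], ["chess", "sport"]]
p, r, f1 = precision_recall_f1(proposed, gold)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Proposing more labels per synset tends to raise recall and lower precision, which is why F1 is reported alongside the two.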

Evaluation and Results: verbs [Tables: results for verbs with factotum (CF) and without factotum (SF)] AP: accuracy of the first label. AT: accuracy of all labels. P: precision. R: recall. F1: 2PR/(P+R). MiA: success of formula Mi (M1, M2 or M3) when the first proposed label is correct. MiD: success of formula Mi (M1, M2 or M3) when the first proposed label is correct or counts as correct through subsumption in the domain hierarchy.
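The difference between the MiA and MiD criteria can be illustrated with a small hierarchy check. This sketch assumes that "subsumed" means the proposed label is an ancestor or a descendant of a reference label; the parent table is a hypothetical fragment (pedagogy above school and university, social_science above commerce and economy), not the full WN Domains hierarchy.

```python
# Assumed hierarchy fragment: child -> parent.
PARENT = {
    "school": "pedagogy",
    "university": "pedagogy",
    "commerce": "social_science",
    "economy": "social_science",
}

def ancestors(label):
    """Return the chain of ancestors of a label, nearest first."""
    chain = []
    while label in PARENT:
        label = PARENT[label]
        chain.append(label)
    return chain

def exact_match(first_proposal, gold_labels):
    """MiA-style check: the first proposed label is one of the reference labels."""
    return first_proposal in gold_labels

def hierarchy_match(first_proposal, gold_labels):
    """MiD-style check: exact match, or related by subsumption (assumed reading)."""
    return any(
        first_proposal == g or g in ancestors(first_proposal) or first_proposal in ancestors(g)
        for g in gold_labels
    )

print(exact_match("school", ["pedagogy"]), hierarchy_match("school", ["pedagogy"]))
```

Under this reading, proposing school for a synset labeled pedagogy counts as a hit for MiD but not for MiA, while sibling labels such as commerce and economy match under neither.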

Evaluation and Results On average, the method assigns 1.23 domain labels per noun (1.170) and 1.20 domain labels per verb (1.078). Better results are obtained with nouns. The best average results were obtained with the M1 measure. The first proposed label (nouns) reaches 70% accuracy. The results for verbs are worse than for nouns; one reason may be the high number of verbal synsets labeled with the factotum domain.

Discussion Monosemic words: credit application#n#1 (an application for a line of credit) Domains: SCHOOL Proposal 1. Banking Proposal 2. Economy [Hierarchy: economy > banking]

Discussion Relation between labels: academic_program#n#1 (a program of education in liberal arts and sciences (usually in preparation for higher education)) Domains: PEDAGOGY Proposal 1. School Proposal 2. University [Hierarchy: pedagogy > school, university]

Discussion Relation between labels: shopping#n#1 (searching for or buying goods or services: "went shopping for a reliable plumber"; "does her shopping at the mall rather than down town") Domains: ECONOMY Proposal 1. Commerce [Hierarchy: social_science > commerce, economy]

Discussion Relation between labels: fire_control_radar#n#1 (radar that controls the delivery of fire on a military target) Domains: MERCHANT_NAVY Proposal 1. Military [Hierarchy labels shown: social_science, transport, merchant_navy, military]

Discussion Uncertain cases: birthmark#n#1 (a blemish on the skin formed before birth) Domains: QUALITY Proposal 1. Medicine; bardolatry#n#1 (idolization of William Shakespeare) Domains: RELIGION Proposal 1. History Proposal 2. Literature

Conclusions Automatically assigning domain labels to WN glosses appears to be a difficult task. The proposed process is very reliable for the first proposed labels. The proposed labels are ordered by priority. It is possible to add new correct labels or to validate the existing ones.


Discussion Relations in WN: bowling#n#2 (a game in which balls are rolled at an object or group of objects with the aim of knocking them over) Domains: BOWLING Proposal 1. Play [Diagram: WN relations (hyp, hol) among game#n#2, bowling#n#2 and play#n#16, with the domain labels free_time, play, bowling and sport]

WN Domains Example (B. Magnini et al., 2001)

WN Domains
N | SF | DOMAINS | SUMO | TOP ONTOLOGY
#1 | Group | Economy | Corporation | Function Group Human
#2 | Object | Geography Geology | Land-area | Natural Place Substance
#3 | Possession | Economy | Keeping | Function Moneyrepresentation Part
#4 | Artifact | Architecture Economy | Building | Artifact Function Object
#5 | Group | Factotum | Collection | Group
#6 | Artifact | Economy | Artifact | Artifact Container Instrument Object
#7 | Object | Geography Geology | Land-area | Natural Place Solid Substance
#8 | Possession | Economy Play | Currency-measure | Function
#9 | Object | Architecture | Land-area | Natural Place Substance
#10 | Act | Transport | Motion | Agentive Boundedevent Cause Condition Dynamic Purpose