
Word sense disambiguation (1) Instructor: Paul Tarau, based on Rada Mihalcea’s original slides Note: Some of the material in this slide set was adapted from a tutorial given by Rada Mihalcea & Ted Pedersen at ACL 2005

Slide 1 Definitions
Word sense disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities.
The sense inventory usually comes from a dictionary or thesaurus.
Approaches: knowledge intensive methods, supervised learning, and (sometimes) bootstrapping.
Word sense discrimination is the problem of dividing the usages of a word into different meanings, without regard to any particular existing sense inventory.
Approaches: unsupervised techniques.

Slide 2 Computers versus Humans
Polysemy: most words have many possible meanings.
A computer program has no basis for knowing which one is appropriate, even if it is obvious to a human…
Ambiguity is rarely a problem for humans in their day-to-day communication, except in extreme cases…

Slide 3 Ambiguity for Humans - Newspaper Headlines!
DRUNK GETS NINE YEARS IN VIOLIN CASE
FARMER BILL DIES IN HOUSE
PROSTITUTES APPEAL TO POPE
STOLEN PAINTING FOUND BY TREE
RED TAPE HOLDS UP NEW BRIDGE
DEER KILL 300,000
RESIDENTS CAN DROP OFF TREES
INCLUDE CHILDREN WHEN BAKING COOKIES
MINERS REFUSE TO WORK AFTER DEATH

Slide 4 Ambiguity for a Computer
The fisherman jumped off the bank and into the water.
The bank down the street was robbed!
Back in the day, we had an entire bank of computers devoted to this problem.
The bank in that road is entirely too steep and is really dangerous.
The plane took a bank to the left, and then headed off towards the mountains.

Slide 5 Early Days of WSD
Noted as a problem for Machine Translation (Weaver, 1949).
A word can often only be translated if you know the specific sense intended (a “bill” in English could be a “pico” or a “cuenta” in Spanish).
Bar-Hillel (1960) posed the following:
Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.
Is “pen” a writing instrument or an enclosure where children play?
…he declared the problem unsolvable and left the field of MT!

Slide 6 Since then…
1970s-1980s: Rule based systems
Rely on hand crafted knowledge sources
1990s: Corpus based approaches
Dependence on sense tagged text
(Ide and Veronis, 1998) give an overview of the history from the early days to 1998
2000s: Hybrid systems
Minimizing or eliminating use of sense tagged text
Taking advantage of the Web

Slide 7 Practical Applications
Machine Translation: translate “bill” from English to Spanish. Is it a “pico” or a “cuenta”? Is it a bird's beak or an invoice?
Information Retrieval: find all Web pages about “cricket”. The sport or the insect?
Question Answering: what is George Miller’s position on gun control? The psychologist or the US congressman?
Knowledge Acquisition: add to KB: Herb Bergson is the mayor of Duluth. Minnesota or Georgia?

Slide 8 Knowledge-based WSD
Task definition: knowledge-based WSD = class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text
Resources
Yes: machine readable dictionaries, raw corpora
No: manually annotated corpora
Scope: all open-class words

Slide 9 Machine Readable Dictionaries
In recent years, most dictionaries have been made available in machine readable format (MRD):
Oxford English Dictionary
Collins
Longman Dictionary of Contemporary English (LDOCE)
Thesauruses add synonymy information:
Roget Thesaurus
Semantic networks add more semantic relations:
WordNet
EuroWordNet

Slide 10 MRD – A Resource for Knowledge-based WSD
For each word in the language vocabulary, an MRD provides:
A list of meanings
Definitions (for all word meanings)
Typical usage examples (for most word meanings)
WordNet definitions/examples for the noun “plant”:
1. buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles"
2. a living organism lacking the power of locomotion
3. something planted secretly for discovery by another; "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant"
4. an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience

Slide 11 MRD – A Resource for Knowledge-based WSD
A thesaurus adds: an explicit synonymy relation between word meanings
A semantic network adds: hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, entailment, etc.
WordNet synsets for the noun “plant”:
1. plant, works, industrial plant
2. plant, flora, plant life
WordNet related concepts for the meaning “plant life” {plant, flora, plant life}:
hypernym: {organism, being}
hyponym: {house plant}, {fungus}, …
meronym: {plant tissue}, {plant part}
holonym: {Plantae, kingdom Plantae, plant kingdom}

Slide 12 Lesk Algorithm
(Michael Lesk 1986): Identify senses of words in context using definition overlap
Algorithm:
1. Retrieve from MRD all sense definitions of the words to be disambiguated
2. Determine the definition overlap for all possible sense combinations
3. Choose senses that lead to highest overlap
Example: disambiguate PINE CONE
PINE
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
CONE
1. solid body which narrows to a point
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees
Pine#1 ∩ Cone#1 = 0
Pine#2 ∩ Cone#1 = 0
Pine#1 ∩ Cone#2 = 1
Pine#2 ∩ Cone#2 = 0
Pine#1 ∩ Cone#3 = 2
Pine#2 ∩ Cone#3 = 0
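The pairwise overlaps in the PINE CONE example can be reproduced with a short Python sketch. The stopword list and the crude suffix stripping below are assumptions made for illustration (the slide does not say how words are normalized); they are chosen so that "tree"/"trees" and "shape"/"shaped" count as matches.

```python
import re
from itertools import combinations, product

# Assumed stopword list; function words are ignored when counting overlap.
STOPWORDS = {"a", "of", "or", "to", "with", "this", "which",
             "whether", "through", "away"}

def stem(word):
    # Crude suffix stripping (an assumption, not a real stemmer).
    if word.endswith("ed") and len(word) > 3:
        return word[:-1]              # shaped -> shape
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]              # trees -> tree
    return word

def content_words(definition):
    tokens = re.findall(r"[a-z]+", definition.lower())
    return {stem(t) for t in tokens if t not in STOPWORDS}

def overlap(def_a, def_b):
    # Number of distinct content words shared by two definitions.
    return len(content_words(def_a) & content_words(def_b))

# Sense definitions copied from the slide.
SENSES = {
    "pine": ["kinds of evergreen tree with needle-shaped leaves",
             "waste away through sorrow or illness"],
    "cone": ["solid body which narrows to a point",
             "something of this shape whether solid or hollow",
             "fruit of certain evergreen trees"],
}

def original_lesk(senses):
    # Score every sense combination by the summed pairwise definition overlap.
    words = list(senses)
    best, best_score = None, -1
    for combo in product(*(range(len(senses[w])) for w in words)):
        picked = list(zip(words, combo))
        score = sum(overlap(senses[a][i], senses[b][j])
                    for (a, i), (b, j) in combinations(picked, 2))
        if score > best_score:
            best, best_score = combo, score
    # Report 1-based sense numbers, as on the slide.
    return {w: i + 1 for w, i in zip(words, best)}
```

Calling `original_lesk(SENSES)` selects Pine#1 and Cone#3, the pair with overlap 2.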

Slide 13 Lesk Algorithm for More than Two Words?
I saw a man who is 98 years old and can still walk and tell jokes
Nine open class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)
43,929,600 sense combinations! How to find the optimal sense combination?
Simulated annealing (Cowie, Guthrie, Guthrie 1992)
Define a function E = combination of word senses in a given text.
Find the combination of senses that leads to the highest definition overlap (redundancy):
1. Start with E = the most frequent sense for each word
2. At each iteration, replace the sense of a random word in the set with a different sense, and measure E
3. Stop iterating when there is no change in the configuration of senses
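A minimal Python sketch of this search follows. It first checks the size of the search space from the sense counts on the slide, then runs a generic simulated-annealing loop. The scoring function `toy_score`, the cooling schedule, and the fixed number of steps are all assumptions for illustration; a real system would score a configuration by definition overlap and could stop once the configuration no longer changes.

```python
import math
import random

# Sense counts per open-class word, as listed on the slide.
SENSE_COUNTS = {"see": 26, "man": 11, "year": 4, "old": 8, "can": 5,
                "still": 4, "walk": 10, "tell": 8, "joke": 3}

def num_combinations(counts):
    # Size of the exhaustive search space: product of the sense counts.
    return math.prod(counts.values())

def toy_score(assignment):
    # Hypothetical stand-in for definition overlap: rewards picking the
    # last sense index of each word. Not a real overlap measure.
    return sum(1 for w, s in assignment.items() if s == SENSE_COUNTS[w] - 1)

def anneal(counts, score, steps=2000, temp=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    words = list(counts)
    # Sense index 0 stands in for "most frequent sense" of each word.
    current = {w: 0 for w in words}
    current_score = score(current)
    best, best_score = dict(current), current_score
    for _ in range(steps):
        # Replace the sense of one random word with a different sense.
        w = rng.choice(words)
        candidate = dict(current)
        candidate[w] = rng.choice(
            [s for s in range(counts[w]) if s != current[w]])
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Accept non-worsening moves; accept a worse move with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = dict(current), current_score
        temp *= cooling
    return best, best_score
```

`num_combinations(SENSE_COUNTS)` gives the 43,929,600 figure quoted on the slide, which is why exhaustive enumeration is avoided.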

Slide 14 Lesk Algorithm: A Simplified Version
Original Lesk definition: measure overlap between sense definitions for all words in context
Identify simultaneously the correct senses for all words in context
Simplified Lesk (Kilgarriff & Rosenzweig 2000): measure overlap between the sense definitions of a word and the current context
Identify the correct sense for one word at a time
Search space significantly reduced

Slide 15 Lesk Algorithm: A Simplified Version
Algorithm for simplified Lesk:
1. Retrieve from MRD all sense definitions of the word to be disambiguated
2. Determine the overlap between each sense definition and the current context
3. Choose the sense that leads to highest overlap
Example: disambiguate PINE in “Pine cones hanging in a tree”
PINE
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
Pine#1 ∩ Sentence = 1
Pine#2 ∩ Sentence = 0
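The simplified variant compares each definition with the sentence itself, so it needs only one pass over the senses of the target word. A minimal Python sketch, assuming a small illustrative stopword list and exact word matching:

```python
import re

# Assumed stopword list for this example only.
STOPWORDS = {"a", "in", "of", "or", "with", "through", "away"}

def content_words(text):
    return {t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS}

# Definitions of PINE copied from the slide.
PINE_SENSES = ["kinds of evergreen tree with needle-shaped leaves",
               "waste away through sorrow or illness"]

def simplified_lesk(sense_definitions, context):
    # Overlap of each sense definition with the words of the context.
    context_words = content_words(context)
    overlaps = [len(content_words(d) & context_words)
                for d in sense_definitions]
    best = max(range(len(overlaps)), key=overlaps.__getitem__)
    return best + 1, overlaps   # 1-based sense number, per-sense overlaps

sense, overlaps = simplified_lesk(PINE_SENSES, "Pine cones hanging in a tree")
```

Here the only shared content word is "tree", so Pine#1 is chosen with overlap 1 against 0 for Pine#2.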

Slide 16 Evaluations of Lesk Algorithm
Initial evaluation by M. Lesk: 50-70% on short samples of manually annotated text, with respect to the Oxford Advanced Learner’s Dictionary
Simulated annealing: 47% on 50 manually annotated sentences
Evaluation on Senseval-2 all-words data, with back-off to random sense (Mihalcea & Tarau 2004): Original Lesk 35%, Simplified Lesk 47%
Evaluation on Senseval-2 all-words data, with back-off to most frequent sense (Vasilescu, Langlais, Lapalme 2004): Original Lesk 42%, Simplified Lesk 58%

Slide 17 Selectional Preferences
A way to constrain the possible meanings of words in a given context
E.g. “Wash a dish” vs. “Cook a dish”: WASH-OBJECT vs. COOK-FOOD
Capture information about possible relations between semantic classes
Common sense knowledge
Alternative terminology: Selectional Restrictions, Selectional Preferences, Selectional Constraints

Slide 18 Acquiring Selectional Preferences
From annotated corpora
Circular relationship with the WSD problem:
Need WSD to build the annotated corpus
Need selectional preferences to derive WSD
From raw corpora
Frequency counts
Information theory measures
Class-to-class relations

Slide 19 Preliminaries: Learning Word-to-Word Relations
An indication of the semantic fit between two words
1. Frequency counts: pairs of words connected by a syntactic relation
2. Conditional probabilities: condition on one of the words
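Both measures can be sketched in a few lines of Python. The triples below are hypothetical stand-ins for what a parser would extract from a corpus, and the function name `cond_prob` is introduced only for this example.

```python
from collections import Counter

# Hypothetical (verb, relation, noun) triples extracted from parsed text.
TRIPLES = [
    ("drink", "obj", "coffee"), ("drink", "obj", "coffee"),
    ("drink", "obj", "tea"), ("drink", "obj", "water"),
    ("wash", "obj", "dish"), ("cook", "obj", "dish"),
]

# 1. Frequency counts of word pairs connected by a syntactic relation.
pair_counts = Counter(TRIPLES)
head_counts = Counter((verb, rel) for verb, rel, _ in TRIPLES)

def cond_prob(noun, verb, relation):
    # 2. Conditional probability, conditioning on the verb:
    # P(noun | verb, relation) = Count(verb, relation, noun) / Count(verb, relation)
    total = head_counts[(verb, relation)]
    return pair_counts[(verb, relation, noun)] / total if total else 0.0
```

With these toy counts, "coffee" accounts for two of the four objects of "drink", so `cond_prob("coffee", "drink", "obj")` is 0.5.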

Slide 20 Learning Selectional Preferences (1)
Word-to-class relations (Resnik 1993)
Quantify the contribution of a semantic class using all the concepts subsumed by that class (the slide's formulas were not preserved in this transcript)
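For reference, the word-to-class measure from Resnik's work is usually stated as follows. This is the standard formulation of selectional preference strength and selectional association, written with a predicate v and a class c; the notation on the original slide may have differed.

```latex
% Selectional preference strength of a predicate v:
% how much v constrains the class of its argument.
S(v) = \sum_{c} P(c \mid v) \, \log \frac{P(c \mid v)}{P(c)}

% Selectional association between v and a class c:
% the share of S(v) contributed by c.
A(v, c) = \frac{1}{S(v)} \, P(c \mid v) \, \log \frac{P(c \mid v)}{P(c)}
```

The class probabilities are estimated from counts of the words subsumed by each class.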

Slide 21 Learning Selectional Preferences (2)
Determine the contribution of a word sense based on the assumption of equal sense distributions:
e.g. “plant” has two senses → 50% of occurrences are sense 1, 50% are sense 2
Example: learning restrictions for the verb “to drink”
Find high-scoring verb-object pairs
Find “prototypical” object classes (high association score)
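The equal-distribution assumption amounts to splitting each noun's count evenly across the classes of its senses. A minimal Python sketch, in which the object counts and the class inventory are hypothetical:

```python
from collections import defaultdict

# Hypothetical counts of direct objects observed with the verb "drink".
OBJECT_COUNTS = {"coffee": 6, "tea": 4}

# Hypothetical sense inventory: one semantic class per sense of each noun.
NOUN_CLASSES = {
    "coffee": ["beverage", "tree", "color"],
    "tea": ["beverage", "meal"],
}

def class_counts(object_counts, noun_classes):
    # Credit each class with an equal share of the noun's count,
    # since the untagged corpus does not say which sense was used.
    totals = defaultdict(float)
    for noun, count in object_counts.items():
        classes = noun_classes[noun]
        for cls in classes:
            totals[cls] += count / len(classes)
    return dict(totals)

def class_prob(cls, totals):
    # Estimate of P(class | verb) from the split counts.
    return totals.get(cls, 0.0) / sum(totals.values())

totals = class_counts(OBJECT_COUNTS, NOUN_CLASSES)
```

Classes shared by several objects, such as "beverage" here, accumulate mass and come out as the prototypical object class.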

Slide 22 Using Selectional Preferences for WSD
Algorithm:
1. Learn a large set of selectional preferences for a given syntactic relation R
2. Given a pair of words W1 – W2 connected by a relation R
3. Find all selectional preferences W1 – C (word-to-class) or C1 – C2 (class-to-class) that apply
4. Select the meanings of W1 and W2 based on the selected semantic class
Example: disambiguate coffee in “drink coffee”
1. (beverage) a beverage consisting of an infusion of ground coffee beans
2. (tree) any of several small trees native to the tropical Old World
3. (color) a medium to dark brown color
Given the selectional preference “DRINK BEVERAGE”: coffee#1
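Steps 2 to 4 reduce to a lookup once the preferences have been learned. In the Python sketch below, the preference table and its scores are hypothetical; only the three classes of "coffee" come from the slide.

```python
# Sense number -> semantic class for "coffee", as on the slide.
COFFEE_SENSES = {1: "beverage", 2: "tree", 3: "color"}

# Hypothetical learned preferences: (word, relation, class) -> score.
PREFERENCES = {
    ("drink", "obj", "beverage"): 0.9,
    ("plant", "obj", "tree"): 0.8,
}

def disambiguate(verb, relation, noun_senses, preferences):
    # Score each sense by the preference of the verb for its class.
    scored = {sense: preferences.get((verb, relation, cls), 0.0)
              for sense, cls in noun_senses.items()}
    best = max(scored, key=scored.get)
    # Return None when no learned preference applies.
    return best if scored[best] > 0 else None
```

`disambiguate("drink", "obj", COFFEE_SENSES, PREFERENCES)` returns sense 1, matching the “DRINK BEVERAGE” example.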

Slide 23 Evaluation of Selectional Preferences for WSD
Data set: mainly verb-object and subject-verb relations extracted from SemCor
Compare against random baseline
Results (Agirre and Martinez, 2000): average results on 8 nouns
Similar figures reported in (Resnik 1997)

Slide 24 Semantic Similarity
Words in a discourse must be related in meaning for the discourse to be coherent (Halliday and Hasan, 1976)
Use this property for WSD: identify related meanings for words that share a common context
Context span:
1. Local context: semantic similarity between pairs of words
2. Global context: lexical chains

Slide 25 Semantic Similarity in a Local Context
Similarity determined between pairs of concepts, or between a word and its surrounding context
Relies on similarity metrics on semantic networks (Rada et al. 1989)
[Figure: fragment of a semantic network rooted at “carnivore”, with nodes fissiped mammal/fissiped, canine/canid, feline/felid, bear, wolf, wild dog, dingo, hyena dog, hyena, dog, hunting dog, dachshund, terrier]

Slide 26 Semantic Similarity Metrics for WSD
Disambiguate target words based on similarity with one word to the left and one word to the right (Patwardhan, Banerjee, Pedersen 2002)
Evaluation: 1,723 ambiguous nouns from Senseval-2
Among 5 similarity metrics, (Jiang and Conrath 1997) provide the best precision (39%)
Example: disambiguate PLANT in “plant with flowers”
PLANT
1. plant, works, industrial plant
2. plant, flora, plant life
Similarity (plant#1, flower) = 0.2
Similarity (plant#2, flower) = 1.5
Result: plant#2
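The selection rule in this example can be sketched in Python as a maximization over senses. The similarity table holds only the two scores quoted on the slide; in a real system the values would come from a metric computed over a semantic network, and the function name `best_sense` is an assumption for this example.

```python
# Similarity scores quoted on the slide, keyed by (sense, neighbor word).
SIMILARITY = {
    ("plant#1", "flower"): 0.2,
    ("plant#2", "flower"): 1.5,
}

def best_sense(senses, neighbors, similarity):
    # Sum each sense's similarity to the neighboring words and
    # keep the sense with the highest total.
    def total(sense):
        return sum(similarity.get((sense, n), 0.0) for n in neighbors)
    return max(senses, key=total)

choice = best_sense(["plant#1", "plant#2"], ["flower"], SIMILARITY)
```

With a window of one word on each side, `neighbors` would hold at most two words; here only "flower" carries a score.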

Slide 27 Semantic Similarity in a Global Context
Lexical chains (Hirst and St-Onge 1998), (Halliday and Hasan 1976)
“A lexical chain is a sequence of semantically related words, which creates a context and contributes to the continuity of meaning and the coherence of a discourse”
Algorithm for finding lexical chains:
1. Select the candidate words from the text. These are words for which we can compute similarity measures, and therefore most of the time they have the same part of speech.
2. For each such candidate word, and for each meaning of this word, find a chain to receive the candidate word sense, based on a semantic relatedness measure between the concepts that are already in the chain and the candidate word meaning.
3. If such a chain is found, insert the word in this chain; otherwise, create a new chain.
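A greedy version of this procedure can be sketched in Python as follows. The candidate senses, the set of related sense pairs, the averaging of relatedness over chain members, and the threshold are all assumptions for illustration; the slide does not fix a particular relatedness measure.

```python
# Hypothetical candidate words with their sense labels.
CANDIDATES = [
    ("train", ["train#1", "train#2", "train#3"]),
    ("travel", ["travel#1", "travel#2"]),
    ("rail", ["rail#1", "rail#2", "rail#3"]),
]

# Hypothetical pairs of senses treated as semantically related.
RELATED = {frozenset(p) for p in [
    ("train#1", "travel#2"), ("train#1", "rail#2"), ("travel#2", "rail#2"),
]}

def relatedness(a, b):
    return 1.0 if frozenset((a, b)) in RELATED else 0.0

def build_chains(candidates, relatedness, threshold=0.5):
    chains = []
    for _word, senses in candidates:
        for sense in senses:
            # Look for the chain whose members are, on average,
            # most related to this candidate sense.
            best_chain, best_score = None, threshold
            for chain in chains:
                score = sum(relatedness(sense, m) for m in chain) / len(chain)
                if score >= best_score:
                    best_chain, best_score = chain, score
            if best_chain is not None:
                best_chain.append(sense)   # insert into the existing chain
            else:
                chains.append([sense])     # otherwise start a new chain
    return chains

chains = build_chains(CANDIDATES, relatedness)
longest = max(chains, key=len)
```

Senses that end up in the longest chain are the ones supported by the rest of the text; the remaining single-element chains hold the senses that found no support.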

Slide 28 Semantic Similarity of a Global Context
A very long train traveling along the rails with a constant velocity v in a certain direction …
train #1: public transport; #2: ordered set of things; #3: piece of cloth
travel #1: change location; #2: undergo transportation
rail #1: a barrier; #2: a bar of steel for trains; #3: a small bird

Slide 29 Lexical Chains for WSD
Identify lexical chains in a text
Usually target one part of speech at a time
Identify the meaning of words based on their membership in a lexical chain
Evaluation:
(Galley and McKeown 2003) lexical chains on 74 SemCor texts give 62.09%
(Mihalcea and Moldovan 2000) on five SemCor texts give 90% with 60% recall; lexical chains are “anchored” on monosemous words
(Okumura and Honda 1994) lexical chains on five Japanese texts give 63.4%

Slide 30 Heuristics: Most Frequent Sense
Identify the most often used meaning and use this meaning by default
Example: “plant/flora” is used more often than “plant/factory”, so annotate any instance of PLANT as “plant/flora”
Word meanings exhibit a Zipfian distribution, e.g. the distribution of word senses in SemCor
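As a baseline, this heuristic needs nothing beyond sense counts. A minimal Python sketch, in which the tagged occurrences are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical sense-tagged occurrences: (word, sense label).
TAGGED = ([("plant", "plant/flora")] * 7 +
          [("plant", "plant/factory")] * 3)

def most_frequent_sense(tagged):
    # Count sense labels per word and keep the most common one.
    counts = defaultdict(Counter)
    for word, sense in tagged:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

mfs = most_frequent_sense(TAGGED)
```

Every new instance of "plant" would then be labeled with `mfs["plant"]`, regardless of context.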

Slide 31 Heuristics: One Sense Per Discourse
A word tends to preserve its meaning across all its occurrences in a given discourse (Gale, Church, Yarowsky 1992)
What does this mean? E.g. if the ambiguous word PLANT occurs 10 times in a discourse, all instances of “plant” carry the same meaning
Evaluation:
8 words with two-way ambiguity, e.g. plant, crane, etc.
98% of the two-word occurrences in the same discourse carry the same meaning
The grain of salt: performance depends on granularity
(Krovetz 1998) experiments with words with more than two senses
Performance of “one sense per discourse” measured on SemCor is approx. 70%
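One common way to apply this heuristic is as a post-processing step: take the sense labels some other method assigned within a discourse and overwrite them with the majority label. The Python sketch below assumes that setup; the predictions are hypothetical.

```python
from collections import Counter

def one_sense_per_discourse(predictions):
    # predictions: list of (word, sense) pairs for a single discourse.
    votes = {}
    for word, sense in predictions:
        votes.setdefault(word, Counter())[sense] += 1
    # Majority sense per word within this discourse.
    majority = {w: c.most_common(1)[0][0] for w, c in votes.items()}
    # Relabel every occurrence with the majority sense of its word.
    return [(w, majority[w]) for w, _ in predictions]

# Hypothetical first-pass labels for 10 occurrences of PLANT.
first_pass = ([("plant", "plant/flora")] * 8 +
              [("plant", "plant/factory")] * 2)
relabeled = one_sense_per_discourse(first_pass)
```

The roughly 70% figure on SemCor is a reminder that this smoothing can also overwrite correct minority labels when senses are fine-grained.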

Slide 32 Heuristics: One Sense per Collocation
A word tends to preserve its meaning when used in the same collocation (Yarowsky 1993)
Strong for adjacent collocations
Weaker as the distance between words increases
An example: the ambiguous word PLANT preserves its meaning in all its occurrences within the collocation “industrial plant”, regardless of the context where this collocation occurs
Evaluation:
97% precision on words with two-way ambiguity
Finer granularity: (Martinez and Agirre 2000) tested the “one sense per collocation” hypothesis on text annotated with WordNet senses
70% precision on SemCor words
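For adjacent collocations, the heuristic can be sketched in Python as a table from (previous word, target word) to the sense most often seen with that pair. The training examples and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical tagged examples: (previous word, target word, sense).
EXAMPLES = [
    ("industrial", "plant", "plant/factory"),
    ("industrial", "plant", "plant/factory"),
    ("flowering", "plant", "plant/flora"),
]

def learn_collocations(examples):
    # For each adjacent collocation, keep its most frequent sense.
    counts = defaultdict(Counter)
    for prev, target, sense in examples:
        counts[(prev, target)][sense] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}

def tag(prev, target, table, default=None):
    # Reuse the learned sense whenever the same collocation recurs.
    return table.get((prev, target), default)

table = learn_collocations(EXAMPLES)
```

Unseen collocations fall through to `default`, where a back-off such as the most frequent sense could be plugged in.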