CS 4705 Word Sense Disambiguation

Overview
Selectional restriction based approaches
Robust techniques
–Machine Learning (supervised, unsupervised)
–Dictionary-based techniques

Disambiguation via Selectional Restrictions
A step toward semantic parsing
–Different verbs select for different thematic roles
  wash the dishes (takes a washable-thing as patient)
  serve delicious dishes (takes a food-type as patient)
Method: rule-to-rule syntactico-semantic analysis
–Semantic attachment rules are applied as sentences are syntactically parsed
  VP --> V NP
  V --> serve {theme: food-type}
–Selectional restriction violation: no parse
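Below is a minimal sketch of how such a restriction check might look in code. The type hierarchy, lexicon, and restrictions are invented for illustration; they are not the grammar used in the lecture.

```python
# Toy selectional-restriction check: a verb sense imposes a semantic type on
# its object; object senses whose type is incompatible are ruled out.
TYPE_PARENTS = {
    "food-type": "physical-object",
    "washable-thing": "physical-object",
    "crockery": "washable-thing",
    "meal": "food-type",
}
NOUN_SENSES = {"dishes": ["crockery", "meal"], "ragout": ["meal"]}
VERB_RESTRICTIONS = {"serve": "food-type", "wash": "washable-thing"}

def is_a(sem_type, ancestor):
    """True if sem_type equals ancestor or is a descendant of it."""
    while sem_type is not None:
        if sem_type == ancestor:
            return True
        sem_type = TYPE_PARENTS.get(sem_type)
    return False

def compatible_senses(verb, obj_noun):
    """Object senses (semantic types) that satisfy the verb's restriction."""
    required = VERB_RESTRICTIONS[verb]
    return [t for t in NOUN_SENSES[obj_noun] if is_a(t, required)]

print(compatible_senses("serve", "dishes"))  # ['meal']     -> food reading
print(compatible_senses("wash", "dishes"))   # ['crockery'] -> plate reading
```

If no object sense survives the check, the analysis fails, which is the "violation: no parse" behavior described above.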

Requires:
–Writing selectional restrictions for each sense of each predicate, or using FrameNet (serve alone has 15 verb senses)
–Hierarchical type information about each argument (a la WordNet)
  How many hypernyms does dish have? How many lexemes are hyponyms of dish?
But also:
–Sometimes selectional restrictions don't restrict enough (Which dishes do you like?)
–Sometimes they restrict too much (Eat dirt, worm! I'll eat my hat!)

Can we take a more statistical approach?
How likely is dish/crockery to be the object of serve? dish/food?
A simple approach (baseline): predict the most likely sense
–Why might this work? When will it fail?
A better approach: learn from a tagged corpus
–What needs to be tagged?
An even better approach: Resnik's selectional association (1997, 1998)
–Estimate conditional probabilities of word senses from a corpus tagged only with verbs and their arguments (e.g. ragout is an object of serve: Jane served/V ragout/Obj)

How do we get the word sense probabilities?
–For each verb object (e.g. ragout):
  Look up its hypernym classes in WordNet
  Distribute "credit" for this object occurring with this verb among all the classes to which the object belongs
  Brian served/V the dish/Obj; Jane served/V food/Obj
  If ragout has N hypernym classes in WordNet, add 1/N to each class's count (including food) as an object of serve
  If tureen has M hypernym classes in WordNet, add 1/M to each class's count (including dish) as an object of serve
–Pr(Class | v) = count(Class, v) / count(v)
–How can this work? Ambiguous words have many superordinate classes
  John served food/the dish/tuna/curry
  There is a common sense among these which gets "credit" in each instance, eventually dominating the likelihood score
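A rough sketch of this credit-distribution step, assuming NLTK's WordNet interface is available. The tiny verb–object list stands in for a corpus tagged only with verbs and their arguments, and only Pr(Class|v) is computed here, not Resnik's full information-theoretic selectional association.

```python
from collections import defaultdict
from nltk.corpus import wordnet as wn

# Toy "corpus" of (verb, object-head-noun) pairs extracted from parses.
pairs = [("serve", "ragout"), ("serve", "food"), ("serve", "dish"), ("serve", "tuna")]

class_count = defaultdict(float)   # count(class, verb)
verb_count = defaultdict(float)    # count(verb)

for verb, noun in pairs:
    # Every class (synset) the noun could belong to, over all of its senses.
    classes = set()
    for sense in wn.synsets(noun, pos=wn.NOUN):
        classes.add(sense)
        classes.update(sense.closure(lambda s: s.hypernyms()))
    if not classes:
        continue
    credit = 1.0 / len(classes)    # spread one observation evenly over the classes
    for c in classes:
        class_count[(c, verb)] += credit
    verb_count[verb] += 1.0

def p_class_given_verb(c, verb):
    return class_count[(c, verb)] / verb_count[verb]

# Classes shared by many of serve's objects (e.g. food.n.01) accumulate credit
# from every example.  Note that very general classes (entity, object) also
# accumulate credit; Resnik's full method discounts these with an
# information-theoretic measure.
print(round(p_class_given_verb(wn.synset("food.n.01"), "serve"), 3))
```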

To determine the most likely sense of 'bass' in Bill served bass:
–Having previously distributed 'credit' for objects like fish and musical instruments to all of their hypernym classes (e.g. 'fish' and 'musical instrument')
–Find the hypernym classes of bass (which include fish and musical instrument)
–Choose the class C with the highest probability, given that the verb is serve
Results:
–Baselines: random choice of word sense: 26.8%; choosing the most frequent sense (NB: requires a sense-labeled training corpus): 58.2%
–Resnik's method: 44% correct, with only predicate/argument relations labeled

Machine Learning Approaches
Learn a classifier to assign one of the possible word senses to each word
–Acquire knowledge from a labeled or unlabeled corpus
–Human intervention only in labeling the corpus and selecting the set of features to use in training
Input: feature vectors
–Target (dependent variable)
–Context (set of independent variables)
Output: classification rules for unseen text

Supervised Learning
Training and test sets with words labeled with their correct sense (It was the biggest [fish: bass] I've seen.)
–Obtain values of the independent variables automatically (POS, co-occurrence information, …)
–Train the classifier on the training data
–Test on the test data
–Result: a classifier for use on unlabeled data

Input Features for WSD
POS tags of the target and its neighbors
Surrounding context words (stemmed or not)
Punctuation, capitalization and formatting
Partial parsing to identify thematic/grammatical roles and relations
Collocational information
–How likely are the target and its left/right neighbor to co-occur?
Co-occurrence with neighboring words
–Intuition: how often does a word like sea occur near bass?

–How do we proceed?
  Look at a window around the word to be disambiguated in the training data
  Which features accurately predict the correct tag?
  Can you think of other features that might be useful in general for WSD?
–Input to the learner, e.g. for Is the bass fresh today?
  [w-2, w-2/pos, w-1, w-1/pos, w+1, w+1/pos, w+2, w+2/pos, …]
  [is, V, the, DET, fresh, JJ, today, N, …]
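A small sketch of extracting such a window-based feature vector, assuming NLTK's tokenizer and POS tagger are installed; the ±2 window mirrors the example above, and the exact tags depend on the tagger used.

```python
import nltk  # assumes punkt and averaged_perceptron_tagger data are installed

def window_features(tokens, target_index, window=2):
    """Words and POS tags in a +/- window around the target word."""
    tagged = nltk.pos_tag(tokens)
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        i = target_index + offset
        if 0 <= i < len(tagged):
            word, pos = tagged[i]
            feats[f"w{offset:+d}"] = word.lower()
            feats[f"w{offset:+d}/pos"] = pos
    return feats

tokens = nltk.word_tokenize("Is the bass fresh today ?")
print(window_features(tokens, tokens.index("bass")))
# e.g. {'w-2': 'is', 'w-2/pos': 'VBZ', 'w-1': 'the', 'w-1/pos': 'DT',
#       'w+1': 'fresh', 'w+1/pos': 'JJ', 'w+2': 'today', 'w+2/pos': 'NN'}
```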

Types of Classifiers
Naïve Bayes
–ŝ = argmax_s p(s|V) = argmax_s p(V|s) p(s) / p(V)
–where s is one of the possible senses and V is the input vector of features
–Assume the features are independent, so the probability of V given s is the product of the probabilities of the individual features given s: p(V|s) ≈ ∏_j p(v_j|s)
–and p(V) is the same for every candidate sense
–Then ŝ = argmax_s p(s) ∏_j p(v_j|s)
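A minimal Naïve Bayes sense classifier over such feature vectors, with add-one smoothing. The training examples and feature names are invented for illustration.

```python
from collections import Counter, defaultdict
import math

# Toy sense-labeled training data for 'bass': (feature dict, sense) pairs.
train = [
    ({"w-1": "striped", "topic": "fishing"}, "bass_fish"),
    ({"w-1": "sea", "topic": "fishing"}, "bass_fish"),
    ({"w+1": "player", "topic": "music"}, "bass_music"),
    ({"w+1": "guitar", "topic": "music"}, "bass_music"),
]

sense_counts = Counter(sense for _, sense in train)
feat_counts = defaultdict(Counter)   # feat_counts[sense][(feature, value)]
vocab = set()
for feats, sense in train:
    for item in feats.items():
        feat_counts[sense][item] += 1
        vocab.add(item)

def classify(feats):
    """argmax_s  log p(s) + sum_j log p(v_j | s), with add-one smoothing."""
    best, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / len(train))
        denom = sum(feat_counts[sense].values()) + len(vocab)
        for item in feats.items():
            score += math.log((feat_counts[sense][item] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best

print(classify({"w-1": "sea", "topic": "fishing"}))  # expected: bass_fish
```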

Rule Induction Learners (e.g. Ripper)
Given feature vectors of values for the independent variables associated with labeled observations in the training set (e.g. [fishing, NP, 3, …] + bass₂)
Produce the set of rules that performs best on the training data, e.g.
–bass₂ if w-1 == 'fishing' & pos == NP
–…
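At prediction time an induced rule set behaves roughly like the sketch below: rules are tried in order and the first one whose conditions all hold fires. The rules and the default sense here are made up; Ripper would learn them, and their ordering, from the training data.

```python
# Hypothetical induced rules: (conditions, sense), tried in order.
RULES = [
    ({"w-1": "fishing", "pos": "NP"}, "bass_2"),
    ({"w+1": "player"}, "bass_1"),
]
DEFAULT_SENSE = "bass_2"   # e.g. the most frequent sense in training

def apply_rules(features):
    for conditions, sense in RULES:
        if all(features.get(f) == v for f, v in conditions.items()):
            return sense
    return DEFAULT_SENSE

print(apply_rules({"w-1": "fishing", "pos": "NP"}))  # bass_2
```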

Decision Lists
–Like case statements, applying tests to the input in turn
  fish within window --> bass₁
  striped bass --> bass₁
  guitar within window --> bass₂
  bass player --> bass₁
  …
–Yarowsky ('96) orders the tests by their individual accuracy on the entire training set, based on the log-likelihood ratio
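A rough sketch of building a decision list in the spirit of that approach: score each (feature, sense) test by a smoothed log-likelihood ratio and apply the tests strongest-first. The data set, smoothing constant, and feature representation are assumptions of this sketch.

```python
from collections import Counter
import math

# Toy sense-labeled contexts for 'bass': (set of context features, sense).
data = [
    ({"fish", "sea"}, "bass_1"),
    ({"striped", "fish"}, "bass_1"),
    ({"guitar", "play"}, "bass_2"),
    ({"player", "band"}, "bass_2"),
]
SENSES = ("bass_1", "bass_2")
ALPHA = 0.1   # smoothing constant, an arbitrary choice here

# Count how often each feature occurs with each sense.
counts = Counter()
for feats, sense in data:
    for f in feats:
        counts[(f, sense)] += 1

# Score each test by |log(count(f, sense_1) / count(f, sense_2))|, smoothed.
decision_list = []
for f in {f for feats, _ in data for f in feats}:
    c1 = counts[(f, SENSES[0])] + ALPHA
    c2 = counts[(f, SENSES[1])] + ALPHA
    llr = math.log(c1 / c2)
    decision_list.append((abs(llr), f, SENSES[0] if llr > 0 else SENSES[1]))
decision_list.sort(reverse=True)   # strongest evidence first

def classify(context_features, default="bass_1"):
    for _, feature, sense in decision_list:
        if feature in context_features:
            return sense           # the first matching test decides
    return default

print(classify({"guitar", "loud"}))  # expected: bass_2
```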

Bootstrapping I
–Start with a few labeled instances of the target item as seeds to train an initial classifier, C
–Use high-confidence classifications of C on unlabeled data as additional training data
–Iterate
Bootstrapping II
–Start with sentences containing words strongly associated with each sense (e.g. sea and music for bass), chosen intuitively, from a corpus, or from dictionary entries
–One Sense per Discourse hypothesis: a word tends to keep the same sense throughout a discourse, so labels can propagate within a document
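A bare-bones self-training loop of the Bootstrapping I flavor. The confidence threshold, stopping rule, and the classifier plugged in (e.g. the Naïve Bayes sketch above) are assumptions, not a specific published recipe.

```python
def self_train(seeds, unlabeled, train_fn, predict_fn, threshold=0.9, rounds=5):
    """Grow the labeled set by repeatedly adding the classifier's most
    confident predictions on unlabeled examples.

    seeds:      list of (features, sense) pairs, hand-labeled
    unlabeled:  list of feature representations with no sense labels
    train_fn:   builds a classifier from labeled pairs
    predict_fn: (classifier, features) -> (sense, confidence in [0, 1])
    """
    labeled = list(seeds)
    pool = list(unlabeled)
    for _ in range(rounds):
        clf = train_fn(labeled)
        newly_labeled, remaining = [], []
        for feats in pool:
            sense, confidence = predict_fn(clf, feats)
            if confidence >= threshold:
                newly_labeled.append((feats, sense))
            else:
                remaining.append(feats)
        if not newly_labeled:      # nothing confident enough; stop early
            break
        labeled.extend(newly_labeled)
        pool = remaining
    return train_fn(labeled)
```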

Unsupervised Learning
Cluster feature vectors to 'discover' word senses using some similarity metric (e.g. cosine distance); a small sketch of this scheme appears after the list below
–Represent each cluster as the average of the feature vectors it contains
–Label the clusters by hand with known senses
–Classify unseen instances by proximity to these known, labeled clusters
Evaluation problem
–What are the 'right' senses?

–Cluster impurity
–How do you know how many clusters to create?
–Some clusters may not map to 'known' senses
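A compact sketch of the cluster-and-label scheme described above: average each hand-labeled cluster into a centroid and assign new instances by cosine similarity. The vectors and labels are toy values; a real system would cluster automatically (e.g. with k-means) before hand-labeling the clusters.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Toy context vectors: counts of the cue words [fish, sea, guitar, music],
# already grouped into clusters and hand-labeled with senses.
clusters = {
    "bass_fish": [[3, 2, 0, 0], [2, 1, 0, 0]],
    "bass_music": [[0, 0, 2, 3], [0, 0, 1, 2]],
}
centroids = {sense: centroid(vecs) for sense, vecs in clusters.items()}

def classify(vector):
    """Assign the sense whose cluster centroid is most similar."""
    return max(centroids, key=lambda s: cosine(vector, centroids[s]))

print(classify([0, 1, 0, 2]))  # closer to the music centroid -> bass_music
```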

Dictionary Approaches
Problem of scale for all ML approaches
–Must build a classifier for each sense ambiguity
Machine-readable dictionaries (Lesk '86)
–Retrieve the definitions of all content words occurring in the context of the target (e.g. the happy seafarer ate the bass)
–Compare them for overlap with the sense definitions of the target's entry (bass₂: a type of fish that lives in the sea)
–Choose the sense with the most overlap
Limits: entries are short --> expand entries with 'related' words
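A minimal sketch of the overlap count, using the simplified variant that compares context words directly with the target's sense glosses (the original formulation compares gloss to gloss). The two-sense mini-dictionary and stop-word list are invented; a real system would use a full machine-readable dictionary plus stemming.

```python
# Tiny made-up sense inventory for 'bass'.
SENSE_DEFINITIONS = {
    "bass_fish": "a type of fish that lives in the sea",
    "bass_music": "the lowest part in musical harmony, or a low-pitched instrument",
}
STOP_WORDS = {"a", "the", "of", "that", "in", "or", "by", "ate"}

def content_words(text):
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def lesk(context):
    """Pick the sense whose definition shares the most words with the context."""
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, definition in SENSE_DEFINITIONS.items():
        overlap = len(content_words(definition) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(lesk("the happy seafarer ate the bass by the sea"))  # bass_fish
```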

Summary
Many useful approaches have been developed for WSD
–Supervised and unsupervised ML techniques
–Novel uses of existing resources (WordNet, dictionaries)
Future
–More tagged training corpora becoming available
–New learning techniques being tested, e.g. co-training
Next class:
–Ch. 17.3–17.5