600.465 - Intro to NLP - J. Eisner: Splitting Words a.k.a. “Word Sense Disambiguation”

Slide 1: Splitting Words a.k.a. “Word Sense Disambiguation”

[Figure slide, courtesy of D. Yarowsky]

Slide 9: Representing Word as Vector
- Could average over many occurrences of the word...
- Each word type has a different vector?
- Each word token has a different vector?
- Each word sense has a different vector? (For this one we need sense-tagged training data. Is this more like a type vector or a token vector?)
- What is each of these good for?

Slide 10: Each word type has a different vector
- We saw this yesterday
- It's good for grouping words:
  - similar semantics?
  - similar syntax?
  - depends on how you build the vector
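A minimal sketch of the type-vector idea (not from the slides; the toy corpus, window size, and cosine comparison are illustrative assumptions): pool co-occurrence counts over every token of a word type, then compare types by vector similarity in order to group them.

```python
from collections import Counter, defaultdict
import math

def type_vectors(corpus, window=2):
    """One co-occurrence Counter per word *type*, pooled over all of that type's tokens."""
    vecs = defaultdict(Counter)
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "the bank raised its interest rate".split(),
    "the bank cut the interest rate".split(),
    "she walked along the river bank".split(),
]
vecs = type_vectors(corpus)
# "bank" gets a single pooled vector; similar types end up with similar vectors
print(cosine(vecs["bank"], vecs["rate"]))
```

Because all tokens of bank share one pooled vector, its two senses are blurred together, which is exactly why the next slides move to token-level vectors.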

Slide 11: Each word token has a different vector
- Good for splitting words: unsupervised WSD
- Cluster the tokens: each cluster is a sense!
- Tokens of party in context:
    have turned it into the hot dinner-party topic. The comedy is the
    selection for the World Cup party, which will be announced on May 1
    the by-pass there will be a street party. "Then," he says, "we are going
    in the 1983 general election for a party which, when it could not bear to
    to attack the Scottish National Party, who look set to seize Perth and
    number-crunchers within the Labour party, there now seems little doubt
    that had been passed to a second party who made a financial decision
    A future obliges each party to the contract to fulfil it by
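A rough sketch of that clustering step, assuming scikit-learn is available (the example sentences, the bag-of-words representation, and k = 2 are illustrative assumptions, not part of the lecture):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

# one context window per *token* of "party"
contexts = [
    "have turned it into the hot dinner party topic",
    "selection for the world cup party which will be announced",
    "the by pass there will be a street party then he says",
    "in the 1983 general election for a party which could not bear",
    "to attack the scottish national party who look set to seize perth",
    "number crunchers within the labour party there now seems little doubt",
]
X = CountVectorizer().fit_transform(contexts)      # one bag-of-words vector per token
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for sense, ctx in zip(labels, contexts):
    print(sense, ctx)                              # each cluster is (we hope) one sense
```

With luck the celebration tokens and the political-party tokens land in different clusters; nothing tells us which cluster is which sense, only that the tokens split into two groups.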

Slide 12: Each word sense has a different vector
- Represent each new word token as a vector, too
- Now assign each token the closest sense
- (Could lump together all tokens of the word in the same document: assume they all have the same sense)
- Tokens of party in context (two of them are marked "?" on the slide, still to be assigned a sense):
    have turned it into the hot dinner-party topic. The comedy is the
    selection for the World Cup party, which will be announced on May 1
    the by-pass there will be a street party. "Then," he says, "we are going
    let you know that there’s a party at my house tonight. Directions: Drive
    in the 1983 general election for a party which, when it could not bear to
    to attack the Scottish National Party, who look set to seize Perth and
    number-crunchers within the Labour party, there now seems little doubt
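One way to flesh out "assign each token the closest sense", as a hedged sketch (the toy vectors, the sense names, and the use of cosine similarity are illustrative assumptions): average the context vectors of sense-tagged training tokens to get one vector per sense, then give each new token the sense whose vector it is closest to.

```python
import numpy as np

def sense_vector(token_vectors):
    """Average the context vectors of the sense-tagged training tokens."""
    return np.mean(token_vectors, axis=0)

def nearest_sense(token_vec, sense_vecs):
    def cos(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return max(sense_vecs, key=lambda s: cos(token_vec, sense_vecs[s]))

# toy 4-dimensional context counts (dimensions: cup, street, election, labour)
sense_vecs = {
    "party/celebration": sense_vector([np.array([1, 1, 0, 0]), np.array([2, 1, 0, 0])]),
    "party/political":   sense_vector([np.array([0, 0, 1, 2]), np.array([0, 0, 2, 1])]),
}
new_token = np.array([0, 1, 0, 0])            # its context mentions "street"
print(nearest_sense(new_token, sense_vecs))    # -> party/celebration
```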

Slide 13: Where can we get sense-labeled training data?
- To do supervised WSD, we need many examples of each sense in context
- (The slide repeats the party concordance lines from the previous slide, with the "?" tokens still unassigned.)

Slide 14: Where can we get sense-labeled training data?
- To do supervised WSD, we need many examples of each sense in context
- Sources of sense-labeled training text:
  - Human-annotated text: expensive
  - Bilingual text (a Rosetta stone): we can figure out which sense of "plant" is meant from how it translates
  - A dictionary definition of a sense is one sample context
  - A Roget's thesaurus entry of a sense is one sample context
  - (The last two give hardly any data per sense, but we'll use them later to get unsupervised training started)
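As a toy illustration of the bilingual-text idea (the word alignment, the French words, and the translation-to-sense table are hypothetical assumptions): each aligned sentence pair whose translation of plant is unambiguous yields one sense-labeled English context for free.

```python
# Hypothetical mapping from the French translation of "plant" to its English sense.
SENSE_OF_TRANSLATION = {"usine": "plant/factory", "plante": "plant/vegetation"}

def harvest(aligned_pairs):
    """aligned_pairs: list of (english_sentence, french_word_aligned_to_'plant')."""
    labeled = []
    for eng, fr_word in aligned_pairs:
        sense = SENSE_OF_TRANSLATION.get(fr_word)
        if sense is not None:
            labeled.append((eng, sense))           # a free sense-labeled training context
    return labeled

pairs = [
    ("the chemical plant was closed for repairs", "usine"),
    ("this plant needs water and sunlight", "plante"),
]
print(harvest(pairs))
```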

Slide 15: A problem with the vector model
- It's a bad idea to treat all context positions equally.
- Possible solutions:
  - Faraway words don't count as strongly?
  - Words in different positions relative to plant are different elements of the vector? (i.e., (pesticide, -1) and (pesticide, +1) are different features)
  - Words in different syntactic relationships to plant are different elements of the vector?
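A small sketch of the second option (the window size, the example sentence, and the distance-weighting idea in the comment are illustrative assumptions): make the feature a (word, offset) pair, so the same context word at a different relative position is a different element of the vector.

```python
def positional_features(tokens, i, window=3):
    """Features for the token at position i: the same word at a different relative
    position is a different feature, e.g. (pesticide, -1) vs. (pesticide, +1)."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens):
            feats[(tokens[j], offset)] = 1.0
            # Alternative "faraway words count less" idea:
            # feats[tokens[j]] = feats.get(tokens[j], 0.0) + 1.0 / abs(offset)
    return feats

sent = "spray the plant with pesticide every week".split()
print(positional_features(sent, sent.index("plant")))
```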

Slide 16: Just one cue is sometimes enough... [figure omitted; slide courtesy of D. Yarowsky, modified]

Slide 17: An assortment of possible cues... [figure omitted; slide courtesy of D. Yarowsky, modified]
- This generates a whole bunch of potential cues; use data to find out which ones work best.

Slide 18: An assortment of possible cues... [figure omitted; slide courtesy of D. Yarowsky, modified]
- The figure shows a merged ranking of all cues of all these types.
- Some of them are only weak cues... but we'll trust a weak cue if there's nothing better.

Slide 19: Final decision list for lead (abbreviated) [slide courtesy of D. Yarowsky, modified]
- To disambiguate a token of lead:
  - Scan down the sorted list.
  - The first cue that is found gets to make the decision all by itself.
  - Not as subtle as combining cues, but it works well for WSD.
- A cue's score is its log-likelihood ratio: log [ p(cue | sense A) [smoothed] / p(cue | sense B) ]
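A compact sketch of such a decision list (the toy training examples, the representation of cues as plain string sets, and the smoothing constant are illustrative assumptions): score each cue by the smoothed log-likelihood ratio from the slide, sort by strength of evidence, and let the first cue found in a test token decide its sense by itself.

```python
import math
from collections import Counter

def build_decision_list(examples, alpha=0.1):
    """examples: list of (set_of_cues, sense) with sense in {'A', 'B'}.
    Scores each cue by the smoothed log-likelihood ratio log p(cue|A) / p(cue|B)."""
    countA, countB = Counter(), Counter()
    nA = nB = 0
    for cues, sense in examples:
        if sense == "A":
            countA.update(cues); nA += 1
        else:
            countB.update(cues); nB += 1
    scored = []
    for cue in set(countA) | set(countB):
        pA = (countA[cue] + alpha) / (nA + 2 * alpha)   # smoothed p(cue | sense A)
        pB = (countB[cue] + alpha) / (nB + 2 * alpha)   # smoothed p(cue | sense B)
        llr = math.log(pA / pB)
        scored.append((abs(llr), "A" if llr > 0 else "B", cue))
    return sorted(scored, reverse=True)                 # strongest evidence first

def classify(cues, decision_list, default="A"):
    for _, sense, cue in decision_list:
        if cue in cues:                                  # first matching cue decides by itself
            return sense
    return default

train = [
    ({"narrow", "lead", "in the polls"}, "A"),           # lead = advantage
    ({"lead", "poisoning", "paint"}, "B"),               # lead = metal
    ({"took the lead", "race"}, "A"),
    ({"lead", "pipe", "plumbing"}, "B"),
]
dl = build_decision_list(train)
print(classify({"lead", "paint", "old house"}, dl))       # -> B
```

These two helpers are reused in the bootstrapping sketch a few slides below.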

[Figure slide, courtesy of D. Yarowsky, modified]
- Unsupervised learning! Sketched on the following slides...
- A very readable paper (the link is given on the original slide) describes it.

[Figure slide, courtesy of D. Yarowsky]

[Figure slide, courtesy of D. Yarowsky, modified. Annotations: "unsupervised learning!", "reasonably accurate" (twice), "1%", "98%".]

[Figure slide, courtesy of D. Yarowsky] Unsupervised learning!

[Figure slide, courtesy of D. Yarowsky, modified] Unsupervised learning!
- No surprise what the top cues are, but other cues are also good for discriminating these seed examples.

Unsupervised learning!
- The strongest of the new cues help us classify more examples... from which we can extract and rank even more cues that discriminate them...

[Figure slide, courtesy of D. Yarowsky] Unsupervised learning!

[Figure slide, courtesy of D. Yarowsky, modified]
- Now use the final decision list to classify test examples: the top-ranked cue that appears in a test example makes the call.
- life and manufacturing are no longer even in the top cues! Many unexpected cues were extracted, without supervised training.
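Pulling the last few slides together, here is a compressed sketch of that bootstrapping loop (a simplification, not Yarowsky's exact procedure: the real algorithm thresholds on cue confidence and uses one-sense-per-discourse; the seed cues and toy data are illustrative). It reuses the build_decision_list and classify helpers sketched after slide 19.

```python
def bootstrap(unlabeled, seed_cues, rounds=10):
    """unlabeled: list of cue-sets, one per token of the ambiguous word.
    seed_cues: dict mapping a seed cue to a sense, e.g. {"life": "A", "manufacturing": "B"}."""
    labels = {}
    for i, cues in enumerate(unlabeled):               # round 0: label only what a seed cue covers
        for cue, sense in seed_cues.items():
            if cue in cues:
                labels[i] = sense
                break
    dl = []
    for _ in range(rounds):
        train = [(unlabeled[i], s) for i, s in labels.items()]
        dl = build_decision_list(train)                # re-extract and re-rank cues from current labels
        new = {i: classify(unlabeled[i], dl) for i in range(len(unlabeled))}
        if new == labels:                              # stop when no label changes
            break
        labels = new
    return labels, dl

unlabeled = [
    {"plant", "life", "animal"}, {"plant", "manufacturing", "automated"},
    {"plant", "species", "animal"}, {"plant", "equipment", "automated"},
]
labels, final_dl = bootstrap(unlabeled, {"life": "A", "manufacturing": "B"})
print(labels)   # cues like "animal" and "automated" get picked up along the way
```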

Slide 28: “One sense per discourse” [slide courtesy of D. Yarowsky, modified] Unsupervised learning!
- A final trick: all tokens of plant in the same document probably have the same sense.
- 3 tokens in the same document gang up on the 4th.
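A minimal sketch of that trick (the document ids and sense labels are toy data): within each document, relabel every token of the target word with the majority sense among that document's tokens.

```python
from collections import Counter

def one_sense_per_discourse(doc_ids, senses):
    """Relabel each token with the majority sense among tokens of the same document."""
    by_doc = {}
    for d, s in zip(doc_ids, senses):
        by_doc.setdefault(d, []).append(s)
    majority = {d: Counter(ss).most_common(1)[0][0] for d, ss in by_doc.items()}
    return [majority[d] for d in doc_ids]

doc_ids = ["doc1", "doc1", "doc1", "doc1", "doc2"]
senses  = ["A",    "A",    "B",    "A",    "B"]      # 3 tokens in doc1 gang up on the 4th
print(one_sense_per_discourse(doc_ids, senses))       # ['A', 'A', 'A', 'A', 'B']
```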


Slide 30: A Note on Combining Cues [figure omitted; slide courtesy of D. Yarowsky, modified]
- These stats come from term papers of known authorship (i.e., supervised training).

Slide 31: A Note on Combining Cues [slide courtesy of D. Yarowsky, modified]
- A "Naive Bayes" model for classifying text. (Note the naive independence assumptions!)
- We'll look at it again in a later lecture.
- Would this kind of sentence be more typical of a student A paper or a student B paper?

Slide 32: A Note on Combining Cues [slide courtesy of D. Yarowsky, modified]
- The same "Naive Bayes" model for classifying text, used here for word sense disambiguation (the classes are now the senses plantA and plantB).
- Would this kind of sentence be more typical of a plant A context or a plant B context?
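A self-contained sketch of that Naive Bayes classifier (the toy training contexts, the add-one smoothing, and the bag-of-words features are illustrative assumptions): pick the sense that maximizes log p(sense) plus the sum of log p(word | sense) over the context words, treating the words as independent given the sense.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (list_of_words, sense). Returns priors, per-sense word counts, vocabulary."""
    priors, word_counts = Counter(), {}
    for words, sense in docs:
        priors[sense] += 1
        word_counts.setdefault(sense, Counter()).update(words)
    vocab = {w for c in word_counts.values() for w in c}
    return priors, word_counts, vocab

def classify_nb(words, priors, word_counts, vocab):
    total_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for sense in priors:
        n = sum(word_counts[sense].values())
        # log p(sense) + sum_i log p(word_i | sense), add-one smoothed
        # (this is where the naive independence assumption lives)
        score = math.log(priors[sense] / total_docs)
        score += sum(math.log((word_counts[sense][w] + 1) / (n + len(vocab))) for w in words)
        if score > best_score:
            best, best_score = sense, score
    return best

docs = [
    ("the plant sprouted new leaves in spring".split(), "plantA"),
    ("water the plant and give it sunlight".split(), "plantA"),
    ("the plant shut down its assembly line".split(), "plantB"),
]
model = train_nb(docs)
# -> 'plantB' (the factory sense) with this toy data
print(classify_nb("workers at the plant went on strike".split(), *model))
```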