Word Sense Disambiguation


Word Sense Disambiguation September 27, 2006 11/11/2018

Word-Sense Disambiguation
Word sense disambiguation (WSD) is the task of selecting the right sense for a word from among the senses the word is known to have. Semantic selectional restrictions can be used to disambiguate:
- ambiguous arguments to unambiguous predicates
- ambiguous predicates with unambiguous arguments
- ambiguity all around

Word-Sense Disambiguation
We can use selectional restrictions for disambiguation:
- He cooked simple dishes. / He broke the dishes.
But sometimes selectional restrictions are not enough to disambiguate:
- What kind of dishes do you recommend? (we cannot tell which sense is used)
There can be two (or more) lexemes, each with multiple senses:
- They serve vegetarian dishes.
Selectional restrictions may block the finding of any meaning:
- If you want to kill Turkey, eat its banks.
- Kafayı yedim. (Turkish idiom, literally "I ate the head", meaning "I lost my mind")
These situations leave the system with no possible meanings, and they can indicate a metaphor.

WSD Approaches
- Disambiguation based on manually created rules
- Disambiguation using machine-readable dictionaries
- Disambiguation using thesauri
- Disambiguation based on unsupervised machine learning with corpora

Disambiguation based on manually created rules
Weiss' approach [Weiss 1973]: a set of rules to disambiguate five words
- context rule: within 5 words
- template rule: specific location
- accuracy: 90%; IR improvement: 1%
Small & Rieger's approach [Small 1982]: an expert system

WSD and Selectional Restrictions
- Ambiguous arguments: Prepare a dish / Wash a dish
- Ambiguous predicates: Serve Denver / Serve breakfast
- Both: Serves vegetarian dishes

WSD and Selectional Restrictions
This approach is complementary to the compositional analysis approach. You need a parse tree and some form of predicate-argument analysis derived from:
- the tree and its attachments
- all the word senses coming up from the lexemes at the leaves of the tree
Ill-formed analyses are eliminated by noting any selectional restriction violations.

Problems
As we saw last time, selectional restrictions are violated all the time. This does not mean that the sentences are ill-formed or less preferred than others. This approach needs some way of categorizing and dealing with the various ways that restrictions can be violated.

Can we take a more statistical approach?
How likely is dish/crockery to be the object of serve? dish/food?
- A simple approach (baseline): predict the most likely sense. Why might this work? When will it fail?
- A better approach: learn from a tagged corpus. What needs to be tagged?
- An even better approach: Resnik's selectional association (1997, 1998). Estimate conditional probabilities of word senses from a corpus tagged only with verbs and their arguments (e.g. ragout as an object of serve: Jane served/V ragout/Obj).

How do we get the word sense probabilities?
For each verb object (e.g. ragout):
- Look up its hypernym classes in WordNet
- Distribute "credit" for this object occurring with this verb among all the classes to which the object belongs
Example: Brian served/V the dish/Obj; Jane served/V food/Obj
- If ragout has N hypernym classes in WordNet, add 1/N to each class count (including food) as object of serve
- If tureen has M hypernym classes in WordNet, add 1/M to each class count (including dish) as object of serve
P(Class|v) is then count(c,v) / count(v).
How can this work? Ambiguous words have many superordinate classes (John served food/the dish/tuna/curry), but there is a common sense among them which gets "credit" in each instance, eventually dominating the likelihood score.

To determine the most likely sense of 'bass' in "Bill served bass":
- Having previously assigned 'credit' for the occurrences of things like fish and things like musical instruments to all their hypernym classes (e.g. 'fish' and 'musical instrument')
- Find the hypernym classes of bass (including fish and musical instrument)
- Choose the class C with the highest probability, given that the verb is serve
Results:
- Baselines: random choice of word sense: 26.8%; choosing the most frequent sense (NB: requires a sense-labeled training corpus): 58.2%
- Resnik's method: 44% correct with only predicate/argument relations labeled
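The credit-distribution scheme above can be sketched in a few lines. This is a toy illustration, not Resnik's full selectional-association measure: the hypernym inventory below is invented (real systems would query WordNet), and the score is the raw P(class|verb) rather than an information-theoretic association.

```python
from collections import defaultdict

# Invented hypernym classes standing in for WordNet (illustrative only).
HYPERNYMS = {
    "ragout": ["food", "substance"],
    "tureen": ["dish", "container"],
    "bass":   ["fish", "food", "musical_instrument"],
}

def train(pairs):
    """Distribute 1/N credit for each (verb, object) pair over the object's N classes."""
    class_count = defaultdict(float)  # count(c, v) for a single verb
    total = 0
    for verb, obj in pairs:
        classes = HYPERNYMS[obj]
        for c in classes:
            class_count[c] += 1.0 / len(classes)
        total += 1
    return {c: n / total for c, n in class_count.items()}  # P(c | v)

def disambiguate(obj, p_class_given_v):
    """Pick the hypernym class of obj with the highest P(c | v)."""
    return max(HYPERNYMS[obj], key=lambda c: p_class_given_v.get(c, 0.0))

probs = train([("serve", "ragout"), ("serve", "tureen"), ("serve", "ragout")])
print(disambiguate("bass", probs))  # the food class dominates for "serve bass"
```

Note how "food" gets credit from every serving event involving ragout, so it wins for the ambiguous "bass" even though bass itself was never observed as an object of serve.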

Machine Learning Approaches
- Learn a classifier to assign one of the possible word senses to each word
- Acquire knowledge from a labeled or unlabeled corpus
- Human intervention only in labeling the corpus and selecting the set of features to use in training
- Input: feature vectors; target (dependent variable) and context (set of independent variables)
- Output: classification rules for unseen text

WSD Tags
What's a tag? A dictionary sense? For example, with WordNet an instance of "bass" in a text has 8 possible tags or labels (bass1 through bass8).

WordNet Bass
The noun "bass" has 8 senses in WordNet:
1. bass - (the lowest part of the musical range)
2. bass, bass part - (the lowest part in polyphonic music)
3. bass, basso - (an adult male singer with the lowest voice)
4. sea bass, bass - (the flesh of lean-fleshed saltwater fish of the family Serranidae)
5. freshwater bass, bass - (any of various North American lean-fleshed freshwater fishes, especially of the genus Micropterus)
6. bass, bass voice, basso - (the lowest adult male singing voice)
7. bass - (the member with the lowest range of a family of musical instruments)
8. bass - (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)

Representations
Most supervised ML approaches require a very simple representation for the input training data: vectors of feature/value pairs, i.e. files of comma-separated values. So our first task is to extract training data from a corpus with respect to a particular instance of a target word. This typically consists of a characterization of the window of text surrounding the target.

Representations
This is where ML and NLP intersect:
- If you stick to trivial surface features that are easy to extract from a text, then most of the work is in the ML system
- If you decide to use features that require more analysis (say, parse trees), then the ML part may be doing relatively less work, if these features are truly informative

Surface Representations
Collocational and co-occurrence information:
- Collocational: encode features about the words that appear in specific positions to the right and left of the target word; often limited to the words themselves as well as their part of speech
- Co-occurrence: features characterizing the words that occur anywhere in the window, regardless of position; typically limited to frequency counts

Collocational
Position-specific information about the words in the window:
"guitar and bass player stand" --> [guitar, NN, and, CJC, player, NN, stand, VVB]
In other words, a vector consisting of [position n word, position n part-of-speech, ...].

Co-occurrence
Information about the words that occur within the window:
- First derive a set of terms to place in the vector
- Then note how often each of those terms occurs in a given window
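Both feature types above can be extracted with simple window operations. A minimal sketch, using the slide's "guitar and bass player stand" example; the POS tags and the co-occurrence vocabulary are taken from the example or invented for illustration.

```python
# Collocational features: position-specific words and POS tags around the target.
def collocational_features(tokens, tags, i, window=2):
    """[w-2, pos-2, w-1, pos-1, w+1, pos+1, w+2, pos+2] around position i."""
    feats = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the target word itself
        j = i + offset
        if 0 <= j < len(tokens):
            feats += [tokens[j], tags[j]]
        else:
            feats += ["<pad>", "<pad>"]
    return feats

# Co-occurrence features: frequency of each vocabulary term anywhere in the window.
def cooccurrence_features(tokens, i, vocab, window=2):
    ctx = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
    return [ctx.count(term) for term in vocab]

tokens = ["guitar", "and", "bass", "player", "stand"]
tags   = ["NN", "CJC", "NN", "NN", "VVB"]
print(collocational_features(tokens, tags, 2))
# vocabulary chosen by hand for illustration
print(cooccurrence_features(tokens, 2, ["guitar", "fish", "player"]))
```

The first call reproduces the vector shown on the Collocational slide; the second counts how often each chosen term appears near "bass", regardless of position.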

Supervised Learning
- Training and test sets with words labeled with the correct sense (It was the biggest [fish: bass] I've seen.)
- Obtain the values of the independent variables automatically (POS, co-occurrence information, ...)
- Run the classifier on the training data
- Test on the test data
- Result: a classifier for use on unlabeled data

Input Features for WSD
- POS tags of the target and its neighbors
- Surrounding context words (stemmed or not)
- Punctuation, capitalization and formatting
- Partial parsing to identify thematic/grammatical roles and relations
- Collocational information: how likely are the target and its left/right neighbor to co-occur?
- Co-occurrence of neighboring words; intuition: how often do words like sea occur near bass?

How do we proceed?
- Look at a window around the word to be disambiguated in the training data
- Ask which features accurately predict the correct tag
- Can you think of other features that might be useful in general for WSD?
Input to the learner, e.g. "Is the bass fresh today?":
[w-2, w-2/pos, w-1, w-1/pos, w+1, w+1/pos, w+2, w+2/pos, ...] = [is, V, the, DET, fresh, RB, today, N, ...]

Classifiers
Once we cast the WSD problem as a classification problem, all sorts of techniques are possible:
- Naïve Bayes (the right thing to try first)
- Decision lists
- Decision trees
- Neural nets
- Support vector machines
- Nearest-neighbor methods
- ...

Classifiers
The choice of technique depends, in part, on the set of features that have been used:
- Some techniques work better or worse with features that have numerical values
- Some techniques work better or worse with features that have large numbers of possible values
- For example, the feature "the word to the left" has a fairly large number of possible values

Types of Classifiers
Naïve Bayes: choose ŝ = argmax_s P(s|V), where s is one of the possible senses and V is the input vector of features.
By Bayes' rule, P(s|V) = P(V|s) P(s) / P(V).
Assume the features are independent, so the probability of V is the product of the probabilities of its features given s: P(V|s) = P(v1|s) × P(v2|s) × ... × P(vn|s), and P(V) is the same for any candidate ŝ.
Then ŝ = argmax_s P(s) × P(v1|s) × ... × P(vn|s).
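The Naïve Bayes formulation above is short enough to implement directly. A minimal sketch over bag-of-words context features, with add-one smoothing and log probabilities to avoid underflow; the training sentences and sense labels are invented for illustration.

```python
from collections import Counter, defaultdict
import math

def train_nb(examples):
    """examples: list of (context_words, sense) pairs."""
    sense_count = Counter()
    word_count = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_count[sense] += 1
        for w in words:
            word_count[sense][w] += 1
            vocab.add(w)
    return sense_count, word_count, vocab

def classify_nb(words, sense_count, word_count, vocab):
    """Return argmax_s log P(s) + sum_j log P(w_j | s)."""
    total = sum(sense_count.values())
    best, best_lp = None, -math.inf
    for sense, n in sense_count.items():
        lp = math.log(n / total)  # log P(s)
        denom = sum(word_count[sense].values()) + len(vocab)
        for w in words:
            lp += math.log((word_count[sense][w] + 1) / denom)  # smoothed log P(w|s)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

model = train_nb([
    (["caught", "fish", "river"], "bass/fish"),
    (["fresh", "fish", "sea"], "bass/fish"),
    (["guitar", "player", "music"], "bass/music"),
])
print(classify_nb(["sea", "fish"], *model))  # fish-context words favor bass/fish
```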

Rule Induction Learners (e.g. Ripper)
Given feature vectors of values for the independent variables, associated with observed values of the dependent variable in the training set (e.g. [fishing, NP, 3, ...] --> bass2), produce the set of rules that performs best on the training data, e.g.:
bass2 if w-1 == 'fishing' & pos == NP
...

Decision Lists
Like case statements, applying tests to the input in turn:
- fish within window --> bass1
- striped bass --> bass1
- guitar within window --> bass2
- bass player --> bass2
- ...
Yarowsky's approach [Yarowsky 1994] orders the tests by their individual accuracy on the entire training set, based on the log-likelihood ratio.
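A decision-list learner in this style can be sketched as follows: score each (feature, sense) rule by a smoothed log-likelihood ratio, sort the rules by score, and at classification time fire the first rule that matches. The training data, feature sets, and smoothing constant are invented for illustration.

```python
import math

def learn_decision_list(examples, senses=("bass/fish", "bass/music")):
    """examples: list of (feature_set, sense). Returns (score, feature, sense) rules."""
    features = {f for feats, _ in examples for f in feats}
    rules = []
    for f in features:
        counts = {s: 0.1 for s in senses}  # small smoothing constant
        for feats, sense in examples:
            if f in feats:
                counts[sense] += 1
        best = max(senses, key=lambda s: counts[s])
        other = min(senses, key=lambda s: counts[s])
        score = abs(math.log(counts[best] / counts[other]))  # log-likelihood ratio
        rules.append((score, f, best))
    return sorted(rules, reverse=True)  # most reliable tests first

def classify(feats, rules, default="bass/fish"):
    for score, f, sense in rules:
        if f in feats:
            return sense  # first matching test wins
    return default

rules = learn_decision_list([
    ({"fish", "river"}, "bass/fish"),
    ({"striped", "fish"}, "bass/fish"),
    ({"guitar", "player"}, "bass/music"),
])
print(classify({"guitar", "amplifier"}, rules))
```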

Bootstrapping
Bootstrapping I:
- Start with a few labeled instances of the target item as seeds to train an initial classifier C
- Use high-confidence classifications of C on unlabeled data as new training data
- Iterate
Bootstrapping II:
- Start with sentences containing words strongly associated with each sense (e.g. sea and music for bass), chosen intuitively, from a corpus, or from dictionary entries
- One Sense per Discourse hypothesis

Statistical Word-Sense Disambiguation
Choose ŝ = argmax_s P(s|V), where s ranges over the senses and V is the vector representation of the input.
By Bayes' rule, P(s|V) = P(V|s) P(s) / P(V).
By making an independence assumption over the features, P(V|s) is the product of the probabilities of the individual features given the sense: P(V|s) = P(v1|s) × P(v2|s) × ... × P(vn|s).

Problems
- Given these general ML approaches, how many classifiers do we need to perform WSD robustly? One for each ambiguous word in the language.
- How do you decide what set of tags/labels/senses to use for a given word? It depends on the application.

Unsupervised Learning
- Cluster feature vectors to 'discover' word senses using some similarity metric (e.g. cosine distance)
- Represent each cluster as the average of the feature vectors it contains
- Label clusters by hand with known senses
- Classify unseen instances by proximity to these known, labeled clusters
Evaluation problem: what are the 'right' senses?

How do you know how many clusters to create?
- Cluster impurity
- Some clusters may not map to 'known' senses

Dictionary Approaches
Problem of scale for all ML approaches: build a classifier for each sense ambiguity.
Machine-readable dictionaries (Lesk '86):
- Retrieve all definitions of the content words occurring in the context of the target (e.g. "the happy seafarer ate the bass")
- Compare for overlap with the sense definitions of the target's entry (bass2: a type of fish that lives in the sea)
- Choose the sense with the most overlap
Limits: entries are short --> expand entries to 'related' words

Disambiguation using machine-readable dictionaries
Lesk's approach [Lesk 1988]:
- Senses are represented by their different definitions
- Look up the definitions of the context words
- Find co-occurring words
- Select the most similar sense
- Accuracy: 50%-70%
- Problem: not enough overlapping words between definitions
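The overlap idea behind Lesk's approach can be sketched in a simplified form: score each sense of the target by the number of content words shared between its definition and the context. The mini dictionary, stopword list, and context sentence below are invented for illustration.

```python
# Invented two-sense dictionary entry for "bass" (illustrative only).
DEFS = {
    "bass": {
        "bass/fish": "a type of fish that lives in the sea",
        "bass/music": "the lowest part of the musical range",
    }
}
STOPWORDS = {"a", "the", "of", "in", "that", "to"}

def lesk(word, context):
    """Pick the sense whose definition overlaps most with the context words."""
    ctx = {w for w in context.lower().split() if w not in STOPWORDS}
    best, best_overlap = None, -1
    for sense, definition in DEFS[word].items():
        gloss = {w for w in definition.split() if w not in STOPWORDS}
        overlap = len(ctx & gloss)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

print(lesk("bass", "the happy seafarer ate the fish from the sea"))
```

The slide's "not enough overlapping words" problem shows up immediately: if the context shares no words with any gloss, every sense scores zero and the choice is arbitrary, which is what Wilks' definition expansion tries to fix.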

Disambiguation using machine-readable dictionaries
Wilks' approach [Wilks 1990]:
- Attempts to solve Lesk's problem by expanding the dictionary definitions
- Uses the Longman Dictionary of Contemporary English (LDOCE)
- More word co-occurrence evidence is collected
- Accuracy: between 53% and 85%

Wilks' approach [Wilks 1990]
(Figure: commonly co-occurring words in LDOCE [Wilks 1990])

Disambiguation using machine-readable dictionaries
Luk's approach [Luk 1995]: statistical sense disambiguation
- Uses definitions from LDOCE
- Co-occurrence data collected from the Brown corpus
- Defining concepts: the 1792 words used to write the definitions of LDOCE
- LDOCE is pre-processed into a conceptual expansion

Luk's approach [Luk 1995]: the noun "sentence" and its conceptual expansion

Entry in LDOCE:
1. (an order given by a judge which fixes) a punishment for a criminal found guilty in court
2. a group of words that forms a statement, command, exclamation, or question, usu. contains a subject and a verb, and (in writing) begins with a capital letter and ends with one of the marks . ! ?

Conceptual expansion:
1. {order, judge, punish, crime, criminal, find, guilt, court}
2. {group, word, form, statement, command, question, contain, subject, verb, write, begin, capital, letter, end, mark}

Luk's approach [Luk 1995] cont.
Collect co-occurrence data for the defining concepts by constructing a two-dimensional Concept Co-occurrence Data Table (CCDT):
- The Brown corpus is divided into sentences
- Conceptual co-occurrence data is collected for each defining concept that occurs in a sentence
- The collected data is inserted into the CCDT

Luk's approach [Luk 1995] cont.
Score each sense S with respect to the context C (scoring formula given in [Luk 1995]).

Luk's approach [Luk 1995] cont.
- Select the sense with the highest score
- Accuracy: 77% (human accuracy: 71%)

Approaches using Roget's Thesaurus [Yarowsky 1992]
Resources used:
- Roget's Thesaurus
- Grolier Multimedia Encyclopedia
Senses of a word: categories in Roget's Thesaurus, i.e. 1042 broad categories covering areas like tools/machinery or animals/insects

Approaches using Roget's Thesaurus [Yarowsky 1992] cont.
Some words placed into the tools/machinery category [Yarowsky 1992]:
tool, implement, appliance, contraption, apparatus, utensil, device, gadget, craft, machine, engine, motor, dynamo, generator, mill, lathe, equipment, gear, tackle, tackling, rigging, harness, trappings, fittings, accoutrements, paraphernalia, equipage, outfit, appointments, furniture, material, plant, appurtenances, wheel, jack, clockwork, wheel-work, spring, screw, ...

Approaches using Roget's Thesaurus [Yarowsky 1992] cont.
Collect context for each category: from the Grolier Encyclopedia, each occurrence of each member of the category contributes its 100 surrounding words.
(Figure: sample occurrences of words in the tools/machinery category [Yarowsky 1992])

Approaches using Roget's Thesaurus [Yarowsky 1992] cont.
Identify and weight salient words (sample salient words for Roget categories 348 and 414 are shown in [Yarowsky 1992]).
To disambiguate a word: sum the weights of all salient words appearing in its context.
Accuracy: 92% when disambiguating 12 words

Summary
- Many useful approaches have been developed to do WSD: supervised and unsupervised ML techniques, and novel uses of existing resources (WordNet, dictionaries)
- Future: more tagged training corpora becoming available; new learning techniques being tested, e.g. co-training
- Next class: Ch 17:3-5

Introduction to WordNet (1)
- An online thesaurus system
- Synsets: sets of synonymous words
- Hierarchical relationships

Introduction to WordNet (2)
(Figure: WordNet example [Sanderson 2000])

Voorhees' Disambiguation Experiment
- Calculation of semantic distance between a synset and the context words
- A word's sense: the synset closest to the context words
- Retrieval result: worse than without disambiguation

Gonzalo's IR experiment (1)
Two questions:
- Can WordNet really offer any potential for text retrieval?
- How is text retrieval performance affected by disambiguation errors?

Gonzalo's IR experiment (2)
Text collection: summaries and documents
Experiments:
1. Standard SMART run
2. Indexing in terms of word senses
3. Indexing in terms of synsets
4. Introduction of disambiguation errors

Gonzalo's IR experiment (3)

Experiment                         % correct documents retrieved
Indexing by synsets                62.0
Indexing by word senses            53.2
Indexing by words                  48.0
Indexing by synsets (5% errors)    62.0
Id. with 10% errors                60.8
Id. with 20% errors                56.1
Id. with 30% errors                54.4
Id. with all possible senses       52.6
Id. with 60% errors                49.1

Gonzalo's IR experiment (4)
- Disambiguation with WordNet can improve text retrieval
- The solution lies in a reliable automatic WSD technique

Disambiguation With Unsupervised Learning
Yarowsky's unsupervised method:
- One sense per collocation, e.g. plant (manufacturing/life)
- One sense per discourse, e.g. defense (war/sports)

Yarowsky's Unsupervised Method cont.
Algorithm details:
- Step 1: Store each occurrence of the word and its context as a line, e.g. "...zonal distribution of plant life..."
- Step 2: Identify a few seed words that represent each sense, e.g. plant (manufacturing/life)
- Step 3a: Learn rules from the training set: plant + X => A, with a weight; plant + Y => B, with a weight
- Step 3b: Use the rules created in 3a to classify all occurrences of plant in the sample set

Yarowsky's Unsupervised Method cont.
- Step 3c: Use the one-sense-per-discourse rule to filter or augment the additions
- Step 3d: Repeat steps 3a-c iteratively
- Step 4: The training converges on a stable residual set
- Step 5: The result is a set of rules used to disambiguate the word "plant", e.g. plant + growth => life; plant + car => manufacturing
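The iterative steps above can be sketched as follows. This is a heavily simplified bootstrapping loop, not Yarowsky's full algorithm: contexts and seed collocations are invented, rules are unweighted, and the one-sense-per-discourse filtering is omitted. The point is only to show seeds labeling high-confidence contexts, new collocation rules being promoted, and iteration until nothing changes.

```python
# Invented contexts for the ambiguous word "plant" (illustrative only).
contexts = [
    {"zonal", "distribution", "plant", "life"},
    {"plant", "life", "growth"},
    {"car", "assembly", "plant"},
    {"plant", "car", "workers"},
    {"plant", "growth", "water"},
    {"plant", "workers", "strike"},
]
seeds = {"life": "life-sense", "car": "manufacturing-sense"}

labels = {}          # context index -> assigned sense
rules = dict(seeds)  # collocation word -> sense
for _ in range(5):   # iterate a fixed number of passes (real systems test convergence)
    changed = False
    # label every context matched by a current rule
    for i, ctx in enumerate(contexts):
        for word, sense in rules.items():
            if word in ctx and labels.get(i) != sense:
                labels[i] = sense
                changed = True
    # promote words that so far occur only with one sense into new rules
    for i, sense in labels.items():
        for w in contexts[i] - {"plant"}:
            senses_seen = {labels[j] for j, c in enumerate(contexts)
                           if w in c and j in labels}
            if senses_seen == {sense}:
                rules.setdefault(w, sense)
    if not changed:
        break  # converged: no context changed its label this pass

print(labels)
```

After two passes the seed rules have pulled in "growth" and "workers" as new rules, which in turn label the contexts that contain neither seed, mirroring how the residual set shrinks in the slides' steps 3a-3d.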

Yarowsky's Unsupervised Method cont.
Advantages of this method:
- Better accuracy than other unsupervised methods
- No need for the costly hand-tagged training sets of supervised methods

Schütze and Pedersen's approach [Schütze 1995]
Source of word sense definitions:
- No dictionary or thesaurus; only the corpus to be disambiguated is used (the Category B TREC-1 collection)
Thesaurus construction:
- Collect a (symmetric) term-term matrix C, where entry c_ij is the number of times words i and j co-occur in a symmetric window of total size k
- Use SVD to reduce the dimensionality

Schütze and Pedersen's approach [Schütze 1995] cont.
- Thesaurus vectors: the columns of the reduced matrix
- Semantic similarity: the cosine between columns
- Thesaurus: associate each word with its nearest neighbors
- Context vector: the sum of the thesaurus vectors of the context words

Schütze and Pedersen's approach [Schütze 1995] cont.
Disambiguation algorithm:
- Identify the context vectors corresponding to all occurrences of a particular word
- Partition them into regions of high density
- Assign a sense to each such region
Disambiguating a word:
- Compute the context vector of its occurrence
- Find the closest centroid of a region
- Assign the occurrence the sense of that centroid
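The sense-assignment step can be sketched with toy vectors. Here, two-dimensional word vectors stand in for the SVD-reduced thesaurus vectors, and two hand-labeled centroids stand in for the density-based partition of training occurrences; all numbers are invented for illustration.

```python
import math

# Toy 2-d thesaurus vectors (a real system would get these from the SVD step).
WORD_VEC = {
    "fish": (1.0, 0.1), "sea": (0.9, 0.2), "caught": (0.8, 0.0),
    "guitar": (0.1, 1.0), "player": (0.2, 0.9), "music": (0.0, 0.8),
}

def context_vector(words):
    """Context vector = sum of the thesaurus vectors of the context words."""
    xs = [WORD_VEC[w] for w in words if w in WORD_VEC]
    return tuple(sum(col) for col in zip(*xs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Centroids of two clusters of training occurrences, labeled by hand.
centroids = {
    "bass/fish": context_vector(["fish", "sea", "caught"]),
    "bass/music": context_vector(["guitar", "player", "music"]),
}

def disambiguate(context_words):
    """Assign the sense of the centroid closest (by cosine) to the context vector."""
    v = context_vector(context_words)
    return max(centroids, key=lambda s: cosine(v, centroids[s]))

print(disambiguate(["caught", "sea"]))
```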

Schütze and Pedersen's approach [Schütze 1995] cont.
Accuracy: 90%
Application to IR (replacing words by word senses):
- Sense-based retrieval: average precision over 11 points of recall increased 4% with respect to word-based retrieval
- Combining the rankings for each document: average precision increased 11%
- Assigning each occurrence n (2, 3, 4, 5) senses: average precision increased 14% for n = 3


Conclusion
How much can WSD help improve IR effectiveness? An open question:
- Weiss: 1% improvement; Voorhees' method: negative result
- Krovetz and Croft, Sanderson: only useful for short queries
- Schütze and Pedersen's approach and Gonzalo's experiment: positive results
WSD must be accurate to be useful for IR:
- Schütze and Pedersen's and Yarowsky's algorithms: promising for IR
- Luk's approach: robust to data sparseness, suitable for small corpora

References
[Krovetz 1992] R. Krovetz & W.B. Croft. "Lexical Ambiguity and Information Retrieval", ACM Transactions on Information Systems, 10(1), 1992.
[Gonzalo 1998] J. Gonzalo, F. Verdejo, I. Chugur and J. Cigarran. "Indexing with WordNet synsets can improve Text Retrieval", Proceedings of the COLING/ACL '98 Workshop on Usage of WordNet for NLP, Montreal, 1998.
[Lesk 1988] M. Lesk. "They said true things, but called them by wrong names" - vocabulary problems in retrieval systems, Proceedings of the 4th Annual Conference of the University of Waterloo Centre for the New OED, 1988.
[Luk 1995] A.K. Luk. "Statistical sense disambiguation with relatively small corpora using dictionary definitions", Proceedings of the 33rd Annual Meeting of the ACL, Columbus, Ohio, June 1995.
[Salton 1983] G. Salton & M.J. McGill. Introduction to Modern Information Retrieval, New York: McGraw-Hill, 1983.
[Sanderson 1997] M. Sanderson. Word Sense Disambiguation and Information Retrieval, PhD Thesis, Technical Report TR-1997-7, Department of Computing Science, University of Glasgow, UK, 1997.
[Sanderson 2000] M. Sanderson. "Retrieving with Good Sense", http://citeseer.nj.nec.com/sanderson00retrieving.html, 2000.

References cont.
[Schütze 1995] H. Schütze & J.O. Pedersen. "Information retrieval based on word senses", Proceedings of the Symposium on Document Analysis and Information Retrieval, 4: 161-175, 1995.
[Small 1982] S. Small & C. Rieger. "Parsing and comprehending with word experts (a theory and its realisation)", in Strategies for Natural Language Processing, W.G. Lehnert & M.H. Ringle, Eds., LEA: 89-148, 1982.
[Voorhees 1993] E.M. Voorhees. "Using WordNet to disambiguate word senses for text retrieval", Proceedings of the ACM SIGIR Conference, (16): 171-180, 1993.
[Weiss 1973] S.F. Weiss. "Learning to disambiguate", Information Storage and Retrieval, 9: 33-41, 1973.
[Wilks 1990] Y. Wilks, D. Fass, C. Guo, J.E. McDonald, T. Plate, B.M. Slator. "Providing Machine Tractable Dictionary Tools", Machine Translation, 5: 99-154, 1990.
[Yarowsky 1992] D. Yarowsky. "Word sense disambiguation using statistical models of Roget's categories trained on large corpora", Proceedings of COLING: 454-460, 1992.
[Yarowsky 1994] D. Yarowsky. "Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French", Proceedings of the 32nd Annual Meeting of the ACL, Las Cruces, NM, 1994.
[Yarowsky 1995] D. Yarowsky. "Unsupervised word sense disambiguation rivaling supervised methods", Proceedings of the 33rd Annual Meeting of the ACL: 189-196, Cambridge, MA, 1995.