DISTRIBUTIONAL WORD SIMILARITY David Kauchak CS159 Fall 2014

Admin Assignment 5 out

Word similarity
How similar are two words? sim(w1, w2) = ?
- score: a numeric similarity value for w1 and w2
- rank: given a word w, order candidates w1, w2, w3 by similarity to w
- list: decide whether w1 and w2 are synonyms

Word similarity
Four categories of approaches (maybe more):
- Character-based: turned vs. truned; cognates (night, nacht, nicht, natt, nat, noc, noch) (edit-distance sketch below)
- Semantic web-based (e.g. WordNet)
- Dictionary-based
- Distributional similarity-based: similar words occur in similar contexts
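A minimal illustration (mine, not from the slides) of the character-based idea: Levenshtein edit distance, under which a typo like truned and cognate pairs like night/nacht score as near matches.

```python
# Levenshtein edit distance: minimum number of insertions, deletions,
# and substitutions needed to turn s1 into s2 (classic dynamic program).
def edit_distance(s1: str, s2: str) -> int:
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # delete c1
                            curr[j - 1] + 1,      # insert c2
                            prev[j - 1] + cost))  # substitute c1 -> c2
        prev = curr
    return prev[-1]

print(edit_distance("turned", "truned"))  # 2 (a transposition = 2 edits here)
print(edit_distance("night", "nacht"))    # 2
```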

Dictionary-based similarity
Word: dictionary blurb
- aardvark: a large, nocturnal, burrowing mammal, Orycteropus afer, of central and southern Africa, feeding on ants and termites and having a long, extensile tongue, strong claws, and long ears.
- beagle: One of a breed of small hounds having long ears, short legs, and a usually black, tan, and white coat.
- dog: Any carnivore of the family Canidae, having prominent canine teeth and, in the wild state, a long and slender muzzle, a deep-chested muscular body, a bushy tail, and large, erect ears. Compare canid.

Dictionary-based similarity
sim(dog, beagle) = sim( definition(dog), definition(beagle) )
definition(dog): Any carnivore of the family Canidae, having prominent canine teeth and, in the wild state, a long and slender muzzle, a deep-chested muscular body, a bushy tail, and large, erect ears. Compare canid.
definition(beagle): One of a breed of small hounds having long ears, short legs, and a usually black, tan, and white coat.
Utilize our text similarity measures
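A hedged sketch of "utilize our text similarity measures": one simple measure is Jaccard word overlap between the two glosses; the course's actual measures (e.g. TF-IDF cosine) could be dropped in instead.

```python
# Toy gloss similarity via Jaccard word overlap: |A & B| / |A | B|.
def jaccard(text1: str, text2: str) -> float:
    a = set(text1.lower().split())
    b = set(text2.lower().split())
    return len(a & b) / max(1, len(a | b))

dog = ("Any carnivore of the family Canidae, having prominent canine "
       "teeth and, in the wild state, a long and slender muzzle, a "
       "deep-chested muscular body, a bushy tail, and large, erect ears.")
beagle = ("One of a breed of small hounds having long ears, short legs, "
          "and a usually black, tan, and white coat.")
print(jaccard(dog, beagle))  # small but non-zero ("of", "a", "having", ...)
```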

Dictionary-based similarity What about words that have multiple senses/parts of speech?

Dictionary-based similarity 1.part of speech tagging 2.word sense disambiguation 3.most frequent sense 4.average similarity between all senses 5.max similarity between all senses 6.sum of similarity between all senses

Dictionary + WordNet
WordNet also includes a "gloss", similar to a dictionary definition.
Other variants measure the overlap between the glosses of the word senses, as well as the glosses of related senses (e.g. hypernyms, hyponyms, etc.):
- incorporates some of the path information as well
- Banerjee and Pedersen, 2003
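A rough sketch of the extended gloss overlap idea. This simplification counts shared word types; Banerjee and Pedersen's actual measure scores shared multi-word phrases, which this does not attempt.

```python
from nltk.corpus import wordnet as wn

def extended_gloss(synset) -> str:
    """Gloss of the sense plus glosses of its hypernyms and hyponyms."""
    parts = [synset.definition()]
    for rel in synset.hypernyms() + synset.hyponyms():
        parts.append(rel.definition())
    return " ".join(parts)

def overlap(g1: str, g2: str) -> int:
    """Count shared word types between two extended glosses."""
    return len(set(g1.lower().split()) & set(g2.lower().split()))

dog, beagle = wn.synsets("dog")[0], wn.synsets("beagle")[0]
print(overlap(extended_gloss(dog), extended_gloss(beagle)))
```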

Word similarity
Four general categories:
- Character-based: turned vs. truned; cognates (night, nacht, nicht, natt, nat, noc, noch)
- Semantic web-based (e.g. WordNet)
- Dictionary-based
- Distributional similarity-based: similar words occur in similar contexts

Corpus-based approaches
For each word (aardvark, beagle, dog, …), instead of a dictionary blurb, use ANY blurb with the word.
Ideas?

Corpus-based
Occurrences of "beagle" in a corpus:
- The Beagle is a breed of small to medium-sized dog. A member of the Hound Group, it is similar in appearance to the Foxhound but smaller, with shorter leg…
- Beagles are intelligent, and are popular as pets because of their size, even temper, and lack of inherited health problems.
- Dogs of similar size and purpose to the modern Beagle can be traced in Ancient Greece[2] back to around the 5th century BC.
- From medieval times, beagle was used as a generic description for the smaller hounds, though these dogs differed considerably from the modern breed.
- In the 1840s, a standard Beagle type was beginning to develop: the distinction between the North Country Beagle and Southern…

Corpus-based: feature extraction
We'd like to utilize our vector-based approach. How could we create a vector from these occurrences?
- collect word counts from all documents with the word in it
- collect word counts from all sentences with the word in it
- collect word counts from all words within X words of the word (sketch below)
- collect word counts from words in a specific relationship: subject-object, etc.
The Beagle is a breed of small to medium-sized dog. A member of the Hound Group, it is similar in appearance to the Foxhound but smaller, with shorter leg…
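A minimal sketch of the window option: collect counts of all words within X words of each occurrence of the target. Whitespace tokenization and a case-folded exact match on the target are simplifying assumptions (no stemming, so "Beagles" would not match).

```python
from collections import Counter

def context_counts(sentences, target, window=2):
    """Count words within `window` positions of each target occurrence."""
    counts = Counter()
    for sent in sentences:
        tokens = sent.split()
        for i, tok in enumerate(tokens):
            if tok.lower() == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo)
                              if j != i)  # skip the target itself
    return counts

sents = ["The Beagle is a breed of small to medium-sized dog .",
         "From medieval times , beagle was used as a generic description ."]
print(context_counts(sents, "beagle", window=2))
# Counter({'The': 1, 'is': 1, 'a': 1, 'times': 1, ',': 1, 'was': 1, 'used': 1})
```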

Word-context co-occurrence vectors
(the same beagle occurrences as above)

Word-context co-occurrence vectors
Context windows around each occurrence of "beagle":
- The Beagle is a breed
- Beagles are intelligent, and
- to the modern Beagle can be traced
- From medieval times, beagle was used as
- 1840s, a standard Beagle type was beginning
Counts: the:2 is:1 a:2 breed:1 are:1 intelligent:1 and:1 to:1 modern:1 …
Often do some preprocessing like lowercasing and removing stop words
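A small sketch of that optional preprocessing step (the slide's example vector above keeps stop words). The stop-word list here is a toy stand-in for whatever list is actually used.

```python
from collections import Counter

STOP_WORDS = {"the", "is", "a", "are", "and", "to", "was", "as", "of"}

windows = ["The Beagle is a breed",
           "Beagles are intelligent , and",
           "to the modern Beagle can be traced"]

# Lowercase, drop punctuation tokens and stop words, then count.
counts = Counter(tok for w in windows
                 for tok in w.lower().split()
                 if tok.isalpha() and tok not in STOP_WORDS)
print(counts)
# Counter({'beagle': 2, 'breed': 1, 'beagles': 1, 'intelligent': 1,
#          'modern': 1, 'can': 1, 'be': 1, 'traced': 1})
```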

Corpus-based similarity
sim(dog, beagle) = sim( context_vector(dog), context_vector(beagle) )
context_vector(beagle): the:2 is:1 a:2 breed:1 are:1 intelligent:1 and:1 to:1 modern:1 …
context_vector(dog): the:5 is:1 a:4 breeds:2 are:1 intelligent:5 …
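A minimal sketch of the comparison itself: cosine similarity over the sparse context vectors, stored as word-to-count dictionaries (counts taken from the slide).

```python
import math

def cosine(v1: dict, v2: dict) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v2.get(w, 0) for w, c in v1.items())
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2)

beagle = {"the": 2, "is": 1, "a": 2, "breed": 1, "are": 1,
          "intelligent": 1, "and": 1, "to": 1, "modern": 1}
dog = {"the": 5, "is": 1, "a": 4, "breeds": 2, "are": 1, "intelligent": 5}
print(cosine(dog, beagle))  # sim(dog, beagle)
```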

Web-based similarity Ideas?

Web-based similarity
[search engine results for the query "beagle"]

Web-based similarity
- Concatenate the snippets for the top N results
- Concatenate the web page text for the top N results
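A sketch only: no search API is specified in the slides, so search_snippets below is a hypothetical stand-in that would need to be backed by a real search engine.

```python
def search_snippets(query: str, n: int) -> list[str]:
    """Hypothetical: return the snippets of the top-n results for query."""
    raise NotImplementedError("back this with a real search API")

def web_context(word: str, n: int = 10) -> str:
    """Pseudo-document for a word: its top-n snippets, concatenated."""
    return " ".join(search_snippets(word, n))

# sim(dog, beagle) then becomes a text similarity between
# web_context("dog") and web_context("beagle").
```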

Another feature weighting
TF-IDF weighting takes into account the general importance of a feature.
For distributional similarity, we have the feature (f_i), but we also have the word itself (w) that we can use for information.
sim( context_vector(dog), context_vector(beagle) )
context_vector(beagle): the:2 is:1 a:2 breed:1 are:1 intelligent:1 and:1 to:1 modern:1 …
context_vector(dog): the:5 is:1 a:4 breeds:2 are:1 intelligent:5 …

Another feature weighting
sim( context_vector(dog), context_vector(beagle) )
context_vector(beagle): the:2 is:1 a:2 breed:1 are:1 intelligent:1 and:1 to:1 modern:1 …
context_vector(dog): the:5 is:1 a:4 breeds:2 are:1 intelligent:5 …
Feature weighting ideas given this additional information?

Another feature weighting
sim( context_vector(dog), context_vector(beagle) )
Count how likely feature f_i and word w are to occur together:
- incorporates co-occurrence
- but also incorporates how often w and f_i occur in other instances
Does IDF capture this? Not really. IDF only accounts for f_i, regardless of w.

Mutual information
A bit more probability:
I(X;Y) = Σ_x Σ_y p(x,y) log( p(x,y) / (p(x) p(y)) )
When will this be high and when will this be low?

Mutual information
A bit more probability: if x and y are independent (i.e. one occurring doesn't impact the other occurring), then:
p(x,y) = p(x) p(y)

Mutual information
If x and y are independent, then p(x,y) / (p(x) p(y)) = 1, and log(1) = 0.
What does this do to the sum?

Mutual information
If they are dependent, then the ratio differs from 1 and the terms are non-zero. Rewriting p(x,y) / (p(x) p(y)) as p(y|x) / p(y):
I(X;Y) = Σ_x Σ_y p(x,y) log( p(y|x) / p(y) )

Mutual information
What is this asking? When is this high?
log( p(y|x) / p(y) ): how much more likely are we to see y given x has a particular value!

Point-wise mutual information
Mutual information: how related are two variables (i.e. over all possible values/events):
I(X;Y) = Σ_x Σ_y p(x,y) log( p(x,y) / (p(x) p(y)) )
Point-wise mutual information: how related are two particular events/values:
PMI(x, y) = log( p(x,y) / (p(x) p(y)) )
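A worked toy example (the numbers are mine, not from the slides) showing the relationship: MI is the expected value of PMI over all event pairs, weighted by p(x, y).

```python
import math

# Toy joint distribution over binary x and y.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = {0: 0.5, 1: 0.5}
p_y = {0: 0.5, 1: 0.5}

def pmi(x, y):
    """PMI of two particular events."""
    return math.log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))

# MI: average PMI over all event pairs, weighted by p(x, y).
mi = sum(p * pmi(x, y) for (x, y), p in p_xy.items())
print(pmi(1, 1))  # ~0.678: this particular pair co-occurs often
print(mi)         # ~0.278: the variables are moderately dependent
```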

PMI weighting
Mutual information is often used for feature selection in many problem areas.
PMI weighting weights co-occurrences based on their correlation (i.e. high PMI):
PMI(w, f_i) = log( p(w, f_i) / (p(w) p(f_i)) )
context_vector(beagle): the:2 is:1 a:2 breed:1 are:1 intelligent:1 and:1 to:1 modern:1 …
How do we calculate these?
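One way to calculate them, as a hedged sketch: estimate p(w, f), p(w), and p(f) from the co-occurrence counts themselves, then reweight each feature count by its PMI. The counts below are the toy values from the slides, not real corpus statistics.

```python
import math
from collections import Counter

cooccur = {
    "dog": Counter({"the": 5, "is": 1, "a": 4, "breeds": 2,
                    "are": 1, "intelligent": 5}),
    "beagle": Counter({"the": 2, "is": 1, "a": 2, "breed": 1,
                       "are": 1, "intelligent": 1}),
}

# Maximum-likelihood estimates from the co-occurrence table.
total = sum(sum(c.values()) for c in cooccur.values())
word_totals = {w: sum(c.values()) for w, c in cooccur.items()}
feat_totals = Counter()
for c in cooccur.values():
    feat_totals.update(c)

def pmi_weight(word: str, feat: str) -> float:
    """PMI(w, f) = log p(w, f) / (p(w) p(f))."""
    p_wf = cooccur[word][feat] / total
    return math.log2(p_wf / ((word_totals[word] / total) *
                             (feat_totals[feat] / total)))

print({f: round(pmi_weight("beagle", f), 2) for f in cooccur["beagle"]})
```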