Extracting lexical information with statistical models Dennis (2003) compares two methods for inducing a word similarity measure from local context in.

1 Extracting lexical information with statistical models. Dennis (2003) compares two methods for inducing a word similarity measure from local context in a corpus, and claims that such measures capture syntactic, semantic, and associative information about words.

2 The Syntagmatic-Paradigmatic (SP) Model. Partition the corpus into equivalence classes of equal-length sentence fragments:
A nice picture OF THE / A quick copy OF THE / A nice description OF THE
ONTO THE picture OF / ONTO THE copy OF

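3 The Syntagmatic-Paradigmatic (SP) Model. Example class: A picture OF THE / A copy OF THE / A description OF THE
Define similarity within an equivalence class $C$:
$\Pr_C(w_1, w_2) = \frac{\#\,\text{words that the } w_1 \text{ and } w_2 \text{ fragments share}}{\text{total } \#\,\text{words shared with the } w_1 \text{ fragment in } C}$
Overall similarity: $\Pr(w_1, w_2)$ is the mean of $\Pr_C(w_1, w_2)$ over the classes $C$ that contain $w_1$.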
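The similarity definition on this slide can be sketched in code. This is only one possible reading of the slide, not Dennis's implementation: it assumes each class is given as a mapping from a target word to a single fragment, that "shared words" means identical tokens at the same position, and that a class in which w2 does not appear contributes 0 to the mean. The function names and the lowercased example fragments are illustrative.

```python
from fractions import Fraction

def shared(f1, f2):
    """Count positions at which two equal-length fragments hold the same token."""
    return sum(a == b for a, b in zip(f1, f2))

def class_similarity(cls, w1, w2):
    """Pr_C(w1, w2) for one class, given as a dict: target word -> fragment."""
    if w1 not in cls or w2 not in cls or w1 == w2:
        return Fraction(0)  # assumption: an absent w2 contributes zero
    # Denominator: words shared between w1's fragment and every other fragment in C
    total = sum(shared(cls[w1], frag) for w, frag in cls.items() if w != w1)
    if total == 0:
        return Fraction(0)
    return Fraction(shared(cls[w1], cls[w2]), total)

def overall_similarity(classes, w1, w2):
    """Mean of Pr_C(w1, w2) over the classes that contain w1."""
    with_w1 = [c for c in classes if w1 in c]
    if not with_w1:
        return Fraction(0)
    return sum(class_similarity(c, w1, w2) for c in with_w1) / len(with_w1)

# The two example classes from the slides, lowercased and keyed by target word
classes = [
    {
        "picture": ("a", "nice", "picture", "of", "the"),
        "copy": ("a", "quick", "copy", "of", "the"),
        "description": ("a", "nice", "description", "of", "the"),
    },
    {
        "picture": ("onto", "the", "picture", "of"),
        "copy": ("onto", "the", "copy", "of"),
    },
]

picture_copy = overall_similarity(classes, "picture", "copy")
picture_description = overall_similarity(classes, "picture", "description")
```

Under these assumptions, "copy" scores higher than "description" as a neighbour of "picture" because it co-occurs with "picture" in both classes.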

4 The Pooled Adjacent Context (PAC) Model. Scan the corpus for five-word-wide windows:
found a picture of the
found a picture in her
a pretty picture of her
Assign each word a high-dimensional vector: One component for each. Component values are occurrence counts. Similarity: use Spearman’s rank correlation.
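A rough sketch of the PAC pipeline follows. The slide does not say what each vector component stands for, so this sketch assumes one component per (offset, context word) pair around the centre word of a window; that choice, the function names, and the two "copy" windows are illustrative assumptions. Spearman's rank correlation is written out by hand (Pearson correlation on tie-averaged ranks) to keep the example dependency-free.

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(windows):
    """Count (offset, context word) pairs around the centre word of each window."""
    vectors = defaultdict(Counter)
    for window in windows:
        tokens = window.split()
        mid = len(tokens) // 2
        for i, tok in enumerate(tokens):
            if i != mid:
                vectors[tokens[mid]][(i - mid, tok)] += 1
    return vectors

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

def pac_similarity(vectors, w1, w2):
    """Compare two words over every component seen anywhere in the data."""
    dims = sorted(set().union(*(v.keys() for v in vectors.values())))
    return spearman([vectors[w1][d] for d in dims], [vectors[w2][d] for d in dims])

windows = [
    "found a picture of the",   # from the slide
    "found a picture in her",   # from the slide
    "a pretty picture of her",  # from the slide
    "found a copy of the",      # made up for illustration
    "a pretty copy in her",     # made up for illustration
]
vectors = context_vectors(windows)
similarity = pac_similarity(vectors, "picture", "copy")
```

On a real corpus the vectors are very sparse, so most components tie at zero; the tie-averaged ranking above handles that case.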

5 Sample results. Most similar words (SP and PAC):
Band: group, kind, piece, statement, degree, bridge, hat, amount, lot, set, … clock, tribe, scene, …
Agree: want, believe, deal, depend, forget, realize, listen, play, try, talk, … survive, seek, recognize, …
Nine: six, four, several, five, twelve, fifteen, fifty, twenty, lunch, seven, eight, ten, least, younger, rough, thirty, dinner, …

6 Syntactic results. 90% of the time, the cue and its “similar” words share a basic WordNet syntactic category (N, V, ADJ, ADV) (both SP & PAC, top 10 similar words, chance 60%). The figure is 60-70% with all 45 extended WordNet categories (chance 25%). Can we do POS tagging with less labeled data? Note: no phrase structure yet.

7 Semantic results 1. Most similar words (SP and PAC):
Nine: six, four, several, five, twelve, fifteen, fifty, twenty, lunch, seven, eight, ten, least, … younger, rough, thirty, dinner, …
Australia: China, India, Europe, Philadelphia, Brazil, Florida, Kansas, power, Canada, California, Cuba, vapor, senate, males, England, … Pennsylvania …
Mean LSA cosine between the cue and its similar words is about 0.15 (both SP & PAC, top 10 similar words, chance 0.1).
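The mean-cosine figure on this slide can be illustrated with a small sketch. It assumes LSA vectors are already available as a word-to-vector mapping; the three-dimensional vectors below are made up for illustration and are not real LSA output, and the function names are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_cosine(lsa, cue, similar):
    """Mean cosine between the cue's vector and each similar word's vector."""
    scores = [cosine(lsa[cue], lsa[w]) for w in similar if w in lsa]
    return sum(scores) / len(scores) if scores else 0.0

# Made-up vectors standing in for real LSA vectors
lsa = {
    "nine": [1.0, 0.0, 0.0],
    "six": [1.0, 0.0, 0.0],
    "lunch": [0.0, 1.0, 0.0],
}
result = mean_cosine(lsa, "nine", ["six", "lunch"])
```

In the evaluation described on the slide, `similar` would be the top 10 words returned by SP or PAC for each cue, and the result would be averaged over cues.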

8 Semantic results 2. The similarities can be compared to human free association (FA) studies. Looking at 1,934 words:
300 FA words were judged most similar (SP)
1,000 FA words were in the top 5 most similar (SP)
1,400 FA words were in the top 10 most similar (SP)
(chance: fewer than 100 in all cases)
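The top-k counts on this slide amount to a simple hit count, sketched below. The sketch assumes one free-association response per cue and a ranked neighbour list per cue; the function name and all of the data are invented for illustration and are not taken from the study.

```python
def count_topk_hits(free_assoc, neighbours, k):
    """Count cues whose free-association response is among their k most similar words."""
    return sum(
        1
        for cue, response in free_assoc.items()
        if response in neighbours.get(cue, [])[:k]
    )

# Invented free-association responses (cue -> response)
free_assoc = {"nine": "ten", "agree": "disagree", "band": "group"}

# Invented ranked neighbour lists, most similar first
neighbours = {
    "nine": ["six", "four", "several", "five", "seven", "eight", "ten"],
    "agree": ["want", "believe", "deal"],
    "band": ["group", "kind", "piece"],
}

hits_top1 = count_topk_hits(free_assoc, neighbours, 1)
hits_top5 = count_topk_hits(free_assoc, neighbours, 5)
hits_top10 = count_topk_hits(free_assoc, neighbours, 10)
```

The count can only grow as k increases, which matches the pattern of the reported figures for top 1, top 5, and top 10.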

9 Discussion - uses. What can we do with similarities? Bootstrapping other learning processes (e.g. learning color words); retrieval of related information from a DB; ???

10 Discussion – reference. What can we not do with similarities alone? Person to ATM: “I need ninety dollars.”
Ninety: seventy, sixty, ten, most, eighty, lunch, rough, dinner, …
Intuitively, useful agents need to know more than what “ninety”, “Australia”, etc., are similar to; they need to know what they refer to.

11 Discussion – inference. DB contains “Cats only eat mice.” Query: “Do cats eat dogs?”
only: every, just, no, usually, none, forever, lunch, …
Communication by inference is ubiquitous. Intuitively, to answer such queries, we need to know not just what “only” is similar to, but also what inferences it licenses.

