Extracting lexical information with statistical models
Dennis (2003) compares two methods for inducing a word-similarity measure from local context in a corpus, and claims that such measures capture syntactic, semantic, and associative information about words.
The Syntagmatic-Paradigmatic (SP) Model
Partition the corpus into equivalence classes of equal-length sentence fragments. Two example classes:
A nice picture OF THE
A quick copy OF THE
A nice description OF THE

ONTO THE picture OF
ONTO THE copy OF
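Example class:
A picture OF THE
A copy OF THE
A description OF THE
Define similarity within an equivalence class C:
$$\Pr_C(w_1, w_2) = \frac{\text{number of words the } w_1 \text{ and } w_2 \text{ fragments share}}{\text{total number of words shared with the } w_1 \text{ fragment in } C}$$
Overall similarity is the mean of the within-class scores over the classes that contain $w_1$:
$$\Pr(w_1, w_2) = \operatorname{mean}_{C \ni w_1} \Pr_C(w_1, w_2)$$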
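The within-class score can be worked through on the five-word class from the previous slide. The Python below is a minimal sketch, not Dennis's implementation. It assumes that each class holds one fragment per target word, that shared words are counted position by position, and that $\Pr_C(w_1, w_2)$ is 0 when $w_2$ does not occur in C. The names `shared`, `pr_c` and `pr` are hypothetical.

```python
from statistics import mean


def shared(f1, f2):
    """Count positions at which two equal-length fragments hold the same word."""
    return sum(a == b for a, b in zip(f1, f2))


def pr_c(cls, w1, w2):
    """Pr_C(w1, w2): words the w1 and w2 fragments share, divided by the total
    number of words shared with the w1 fragment by the other fragments in C."""
    if w1 not in cls or w2 not in cls:
        return 0.0  # assumption: no fragment for one of the words, no evidence
    f1 = cls[w1]
    total = sum(shared(f1, f) for w, f in cls.items() if w != w1)
    return shared(f1, cls[w2]) / total if total else 0.0


def pr(classes, w1, w2):
    """Overall Pr(w1, w2): mean of Pr_C(w1, w2) over the classes containing w1."""
    scores = [pr_c(c, w1, w2) for c in classes if w1 in c]
    return mean(scores) if scores else 0.0


# The two example classes, lower-cased; each fragment is keyed by its target word
# (a hypothetical layout chosen for this sketch).
classes = [
    {
        "picture": ("a", "nice", "picture", "of", "the"),
        "copy": ("a", "quick", "copy", "of", "the"),
        "description": ("a", "nice", "description", "of", "the"),
    },
    {
        "picture": ("onto", "the", "picture", "of"),
        "copy": ("onto", "the", "copy", "of"),
    },
]

print(pr_c(classes[0], "picture", "description"))  # 4 shared out of 7 -> 0.571...
print(pr(classes, "picture", "copy"))  # mean of 3/7 and 3/3 -> 0.714...
```

For a fixed $w_1$ the within-class scores sum to 1 (3/7 + 4/7 in the first class), so $\Pr_C(w_1, \cdot)$ behaves like a distribution over the other words in C.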
The Pooled Adjacent Context (PAC) Model
Scan the corpus for five-word-wide windows, e.g.:
found a picture of the
found a picture in her
a pretty picture of her
Assign each word a high-dimensional vector with one component for each context word at each window position. Component values are occurrence counts.
Similarity: Spearman's rank correlation between two words' vectors.
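A minimal Python sketch of the PAC pipeline follows, assuming that the centre word of each window is the word being described and that each component is keyed by (relative position, context word). The function names and the three windows marked as extra are hypothetical; Spearman's rho is computed here as the Pearson correlation of average ranks over every component seen in the toy corpus.

```python
from collections import Counter
from math import sqrt
from statistics import mean


def pac_vectors(windows):
    """Pool, for each centre word, counts of (relative position, context word)."""
    vecs = {}
    for window in windows:
        centre = len(window) // 2
        vec = vecs.setdefault(window[centre], Counter())
        for pos, word in enumerate(window):
            if pos != centre:
                vec[(pos - centre, word)] += 1
    return vecs


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(u, v, dims):
    """Spearman's rho: Pearson correlation of the ranked component values."""
    x = ranks([u.get(d, 0) for d in dims])
    y = ranks([v.get(d, 0) for d in dims])
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


windows = [
    "found a picture of the".split(),
    "found a picture in her".split(),
    "a pretty picture of her".split(),
    # Hypothetical extra windows so there is something to compare against.
    "found a copy of the".split(),
    "a quick copy of her".split(),
    "dogs often ran to school".split(),
]
vecs = pac_vectors(windows)
dims = sorted(set().union(*vecs.values()))  # every component seen in the corpus
print(vecs["picture"][(1, "of")])  # 2
print(spearman(vecs["picture"], vecs["copy"], dims))  # positive
print(spearman(vecs["picture"], vecs["ran"], dims))  # negative
```

Ranking the counts makes the score insensitive to any monotone rescaling of raw frequencies. With a realistic vocabulary most components are zero for both words, so the ranking contains large ties; the average-rank step handles them.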
Syntactic results
90% of the time, a cue and its "similar" words share a basic WordNet syntactic category (N, V, ADJ, ADV) (both SP & PAC, top 10 similar words; chance 60%).
60-70% agreement with all 45 extended WordNet categories (chance 25%).
Can we do POS tagging with less labeled data?
Note: no phrase structure yet.
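The agreement figure can be read as the fraction of (cue, similar word) pairs whose categories match. The Python sketch below assumes one category per word and uses a hypothetical toy lexicon and neighbour list; the slide does not say how its chance level was computed, so `chance_rate` shows just one plausible baseline (the probability that two words drawn at random from the vocabulary share a category).

```python
from collections import Counter


def agreement_rate(neighbours, category):
    """Fraction of (cue, similar word) pairs that share a syntactic category."""
    pairs = [(cue, w) for cue, similar in neighbours.items() for w in similar]
    return sum(category[cue] == category[w] for cue, w in pairs) / len(pairs)


def chance_rate(category):
    """Assumed baseline: chance that two randomly drawn words share a category."""
    counts = Counter(category.values())
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical toy lexicon and neighbour list (top 3 rather than top 10).
category = {"picture": "N", "copy": "N", "description": "N", "found": "V", "nice": "ADJ"}
neighbours = {"picture": ["copy", "description", "found"]}
print(agreement_rate(neighbours, category))  # 2 of 3 pairs match
print(chance_rate(category))  # (3/5)^2 + (1/5)^2 + (1/5)^2, about 0.44
```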
Semantic results 1
Most similar words (SP and PAC):
Nine: six, four, several, five, twelve, fifteen, fifty, twenty, lunch, seven, eight, ten, least, … younger, rough, thirty, dinner, …
Australia: China, India, Europe, Philadelphia, Brazil, Florida, Kansas, power, Canada, California, Cuba, vapor, senate, males, England, … Pennsylvania …
The mean LSA cosine between a cue and its similar words is roughly 0.15 (both SP & PAC, top 10 similar words; chance 0.1).
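The LSA measure is an average of cosines between the cue's vector and each similar word's vector. The Python sketch below shows only that averaging step; the three-dimensional vectors are hypothetical stand-ins for real LSA vectors, which are obtained from a truncated SVD of a term-document matrix.

```python
from math import sqrt
from statistics import mean


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_cosine(cue_vec, similar_vecs):
    """Mean cosine between a cue's vector and the vectors of its similar words."""
    return mean(cosine(cue_vec, v) for v in similar_vecs)


# Hypothetical vectors: one identical, one orthogonal, one at 45 degrees.
cue = (1.0, 0.0, 0.0)
similar = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print(mean_cosine(cue, similar))  # (1 + 0 + 1/sqrt(2)) / 3, about 0.569
```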
Semantic results 2
We can compare against human free-association (FA) studies. Of 1,934 words:
300 FA words were judged the most similar word (SP)
1,000 FA words were among the top 5 similar words (SP)
1,400 FA words were among the top 10 similar words (SP)
(chance: fewer than 100 in all cases)
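The three counts are top-k hit counts: how many cues have their free-association response within the first k entries of the model's similarity ranking. The Python sketch below assumes one FA response per cue; the cues, responses and rankings are hypothetical toy data, not figures from the study.

```python
def fa_hits(fa_response, ranked_similar, k):
    """Count cues whose FA response is among their k most similar words."""
    return sum(
        fa_response[cue] in ranked_similar.get(cue, [])[:k] for cue in fa_response
    )


# Hypothetical toy data: one FA response per cue, most similar word first.
fa_response = {"cat": "dog", "bread": "butter", "nine": "ten"}
ranked_similar = {
    "cat": ["dog", "mouse"],
    "bread": ["cake", "milk", "butter"],
    "nine": ["six", "four", "several"],
}
for k in (1, 5, 10):
    print(k, fa_hits(fa_response, ranked_similar, k))
```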
Discussion – uses
What can we do with similarities?
Bootstrap other learning processes (e.g., learning color words)
Retrieve related information from a DB
???
Discussion – reference
What can we not do with similarities alone?
Person to ATM: “I need ninety dollars.”
Ninety: seventy, sixty, ten, most, eighty, lunch, rough, dinner, …
Intuitively, useful agents need to know more than what “ninety”, “Australia”, etc. are similar to; they need to know what these words refer to.
Discussion – inference
DB contains “Cats only eat mice.” Query: “Do cats eat dogs?”
Only: every, just, no, usually, none, forever, lunch, …
Communication by inference is ubiquitous. Intuitively, to answer such queries we need to know not only what “only” is similar to, but also what inferences it licenses.