
1 Collective Word Sense Disambiguation David Vickrey Ben Taskar Daphne Koller

2 Word Sense Disambiguation
The electricity plant supplies 500 homes with power.
vs.
A plant requires water and sunlight to survive.
Clues: the surrounding context words.
Tricky: That plant produces bottled water.

3 WSD as Classification
Senses s_1, s_2, …, s_k correspond to classes c_1, c_2, …, c_k
Features: properties of the context of the word occurrence
– Subject or verb of the sentence
– Any word occurring within 4 words of the occurrence (see the sketch below)
Document: the set of features corresponding to one occurrence
Example: The electricity plant supplies 500 homes with power.
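
As an illustration of the context-window features described above, here is a minimal Python sketch; the function name and feature encoding are our own, not from the slides:

    from typing import Dict, List

    def context_features(tokens: List[str], idx: int, window: int = 4) -> Dict[str, bool]:
        """Bag-of-words features: every word within `window` positions of the target occurrence."""
        feats = {}
        lo, hi = max(0, idx - window), min(len(tokens), idx + window + 1)
        for j in range(lo, hi):
            if j != idx:
                feats["ctx=" + tokens[j].lower()] = True
        return feats

    tokens = "The electricity plant supplies 500 homes with power .".split()
    print(context_features(tokens, tokens.index("plant")))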

4 Simple Approaches
Only features are which words appear in context
– Naïve Bayes
– Discriminative, e.g. SVM
Problems:
– Feature set not rich enough
– Data extremely sparse: "space" occurs 38 times in a corpus of 200,000 words
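
A hedged sketch of the Naïve Bayes baseline using scikit-learn; the two-sentence training set and sense labels are invented purely for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy (invented) training data: contexts of "plant" with their sense labels.
    train_contexts = [
        "the electricity plant supplies homes with power",
        "the plant requires water and sunlight to survive",
    ]
    train_senses = ["factory", "living_thing"]

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(train_contexts, train_senses)
    print(model.predict(["a nuclear plant generates power"]))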

5 Available Data
WordNet – electronic thesaurus
– Words grouped by meaning into synsets
– Slightly over 100,000 synsets
– For nouns and verbs, a hierarchy over synsets
[Hierarchy example: Animal → Mammal, Bird; Mammal → Dog, Hound, Canine → Retriever, Terrier]
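
For concreteness, synsets and the noun hierarchy can be inspected with NLTK's WordNet interface (a small illustrative snippet; assumes nltk is installed and the WordNet data has been downloaded):

    from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

    # All synsets (senses) that the noun "dog" belongs to.
    for syn in wn.synsets("dog", pos=wn.NOUN):
        print(syn.name(), "-", syn.definition())

    # Walk up the hypernym hierarchy from the 'domestic dog' sense.
    syn = wn.synset("dog.n.01")
    while syn.hypernyms():
        syn = syn.hypernyms()[0]
        print(syn.name())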

6 Available Data
– A corpus of around 400,000 words labeled with synsets from WordNet
– Sample sentences from WordNet
– Very sparse for most words

7 What Hasn't Worked
Intuition: the context of "dog" is similar to the context of "retriever"
– Use the hierarchy to determine possibly useful data
– Using cross-validation, learn which data is actually useful
This hasn't worked out very well

8 Why?
Lots of parameters (not even counting parameters estimated using MLE)
– >100K for one model, ~20K for another
Not much data (400K words)
– "a", "the", "and", "of", "to" together occur ~65K times
Hierarchy may not be very useful
– Hand-built; not designed for this task
Features not very expressive
Luke is looking at this more closely using an SVM

9 Collective WSD
Ideas:
– Determine the senses of all words in a document simultaneously (allows for richer features)
– Train on unlabeled data as well as labeled (lots and lots of unlabeled text available)

10 Model
Variables:
– S_1, S_2, …, S_n – synsets
– W_1, W_2, …, W_n – words, always observed
[Diagram: a chain of synset nodes S_1 … S_5, each emitting its observed word W_1 … W_5]

11 Model
Each synset is generated from the previous context – size of the context is a parameter (4)
P(S, W) = ∏_{i=1..n} P(W_i | S_i) · P(S_i | S_{i-3}, S_{i-2}, S_{i-1})
P(S_i = s | S_{i-3}, S_{i-2}, S_{i-1}) = (1 / Z(s_{i-3}, s_{i-2}, s_{i-1})) · exp(λ_s(s_{i-3}) + λ_s(s_{i-2}) + λ_s(s_{i-1}) + λ_s), where Z is the normalizer
P(W) = Σ_S P(S, W)
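
The transition model above is a log-linear (softmax) distribution over the candidate senses. A minimal Python sketch, purely illustrative: the parameter tables lam_pair and lam_bias are hypothetical stand-ins for λ_s(s') and λ_s, initialized to zero here.

    import math
    from collections import defaultdict

    lam_pair = defaultdict(float)   # λ_s(s'): weight for sense s given context sense s'
    lam_bias = defaultdict(float)   # λ_s: bias term for sense s

    def transition_probs(candidates, context):
        """P(S_i = s | previous context senses), normalized over the candidate senses."""
        scores = {s: math.exp(sum(lam_pair[(s, sp)] for sp in context) + lam_bias[s])
                  for s in candidates}
        z = sum(scores.values())    # plays the role of Z(s_{i-3}, s_{i-2}, s_{i-1})
        return {s: v / z for s, v in scores.items()}

    # With all-zero parameters this is just uniform over the candidates.
    print(transition_probs(["plant.factory", "plant.flora"],
                           ["electricity.n.01", "supply.v.01", "home.n.01"]))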

12 Learning
Two sets of parameters:
– P(W_i | S_i) – given current estimates of the marginals P(S_i), use expected counts
– λ_s(s') – for s' ∈ Domain(S_{i-1}), s ∈ Domain(S_i), gradient ascent on the log likelihood gives:
λ_s(s') += Σ_{s_{i-3}, s_{i-2}} [ P(w, s_{i-3}, s_{i-2}, s', s) − P(w, s_{i-3}, s_{i-2}, s') · P(s | s_{i-3}, s_{i-2}, s') ]
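
For the emission parameters, the expected-count update can be sketched as follows. This is our own illustration (function and variable names are hypothetical), not the authors' code; posteriors holds the current marginals P(S_i) for each token.

    from collections import defaultdict

    def reestimate_emissions(documents, posteriors):
        """documents[d]: token list; posteriors[d][i]: dict {sense: P(S_i = sense | W)}."""
        counts = defaultdict(lambda: defaultdict(float))
        for d, tokens in enumerate(documents):
            for i, w in enumerate(tokens):
                for sense, p in posteriors[d][i].items():
                    counts[sense][w] += p          # expected count of (sense, word)
        # Normalize per sense to get P(W | S).
        return {s: {w: c / sum(ws.values()) for w, c in ws.items()}
                for s, ws in counts.items()}

    docs = [["the", "plant", "supplies", "power"]]
    post = [[{"det.x": 1.0},
             {"plant.factory": 0.9, "plant.flora": 0.1},
             {"supply.v": 1.0},
             {"power.n": 1.0}]]
    print(reestimate_emissions(docs, post)["plant.factory"])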

13 Efficiency
Only need to calculate marginals over contexts
– Forwards–backwards
Issue: some words have many possible synsets (40–50) – want very fast inference
– Possibly prune values?
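
A minimal forward–backward sketch for computing sense marginals. For simplicity it uses a first-order chain (the actual model conditions on three previous synsets, but the same dynamic program applies with a larger context state); all names are our own.

    def forward_backward(domains, emit, trans):
        """domains[i]: candidate senses at position i; emit[i][s]: P(w_i | s); trans(sp, s): P(s | sp)."""
        n = len(domains)
        alpha = [{s: emit[0][s] for s in domains[0]}]          # uniform prior folded in
        for i in range(1, n):
            alpha.append({s: emit[i][s] * sum(alpha[i-1][sp] * trans(sp, s) for sp in domains[i-1])
                          for s in domains[i]})
        beta = [dict() for _ in range(n)]
        beta[n-1] = {s: 1.0 for s in domains[n-1]}
        for i in range(n - 2, -1, -1):
            beta[i] = {s: sum(trans(s, sn) * emit[i+1][sn] * beta[i+1][sn] for sn in domains[i+1])
                       for s in domains[i]}
        marginals = []
        for i in range(n):
            unnorm = {s: alpha[i][s] * beta[i][s] for s in domains[i]}
            z = sum(unnorm.values())
            marginals.append({s: v / z for s, v in unnorm.items()})
        return marginals

    doms = [["a"], ["plant.factory", "plant.flora"], ["b"]]
    emit = [{"a": 1.0}, {"plant.factory": 0.7, "plant.flora": 0.3}, {"b": 1.0}]
    print(forward_backward(doms, emit, lambda sp, s: 0.5))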

14 WordNet and Synsets
The model uses WordNet to determine the domain of S_i
– Synset information should be more reliable
This allows us to learn without any labeled data
Consider the synsets {eagle, hawk}, {eagle (golf shot)}, and {hawk (to sell)}
– Since parameters depend only on the synset, even without labeled data we can find the correct clustering
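
The "domain" of each S_i is simply the set of WordNet synsets its word can take. For the example words on this slide, via NLTK (illustrative; assumes the WordNet data is installed):

    from nltk.corpus import wordnet as wn

    for word in ("eagle", "hawk"):
        print(word)
        for syn in wn.synsets(word):                 # candidate senses = Domain(S_i)
            print("  ", syn.name(), "-", syn.definition())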

15 Richer Features
Heuristic: "one sense per discourse" – usually, within a document any given word takes only one of its possible senses
Can capture this using long-range links (see the sketch below)
– Could assume each word is independent of all other occurrences besides the ones immediately before and after
– Or, could use approximate inference (Kikuchi)
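
One possible way to encode the first option – linking each occurrence of a word only to its immediately previous and next occurrences – is sketched below; this is our own illustration, not the authors' construction.

    def discourse_links(tokens):
        """Long-range edges connecting consecutive occurrences of the same word."""
        last_seen, links = {}, []
        for i, w in enumerate(tokens):
            if w in last_seen:
                links.append((last_seen[w], i))
            last_seen[w] = i
        return links

    print(discourse_links("the plant supplies power ; the plant needs water".split()))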

16 Richer Features
Can reduce feature sparsity using the hierarchy (e.g., replace all occurrences of "dog" and "cat" with "animal")
– Need collective classification to do this (see the sketch below)
Could add "global" hidden variables to try to capture the document subject
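
A rough sketch of backing a disambiguated occurrence off to a shared ancestor in the hierarchy; the number of levels and the naming are our own choices, and it presupposes the synset is already known, which is why collective classification is needed.

    from nltk.corpus import wordnet as wn

    def hypernym_feature(synset_name, levels=2):
        """Replace a synset by an ancestor `levels` steps up the hypernym hierarchy."""
        syn = wn.synset(synset_name)
        for _ in range(levels):
            if not syn.hypernyms():
                break
            syn = syn.hypernyms()[0]
        return "hyper=" + syn.name()

    print(hypernym_feature("dog.n.01"), hypernym_feature("cat.n.01"))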

17 Advanced Parameters
Lots of parameters
– Regularization likely helpful
Could tie parameters together based on similarity in the WordNet hierarchy
– Ties in with what I was working on before
– More data in this situation (unlabeled)

18 Experiments Soon

