1 Learning Analogies and Semantic Relations Nov 29 2010 William Cohen

2 Announcements
Upcoming assignments:
– Wiki pages for October should be revised
– Wiki pages for November due tomorrow 11/30
– Projects due Fri 12/10
Project presentations next week:
– Monday 12/6 and Wed 12/8
– 20 min including time for Q/A
– 30 min for the group project
– (Order is the reverse of the mid-term project reports)

3 [Machine Learning, 2005]

4 Motivation
Information extraction is about understanding entity names in text… and also relations between entities.
How do you determine if you "understand" an arbitrary relation?
– For fixed relations R: labeled data (ACE)
– For arbitrary relations: … ?

5 Evaluation

6 How do you measure the similarity of relation instances?
1. Create a feature vector r_x:y for each instance x:y
– e.g. mason:stone, soldier:gun
2. Use cosine distance.
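
A minimal sketch of the cosine comparison between two instance vectors (the toy vectors below are made up; the real feature dimensions come from the query patterns on the next slide):

```python
import numpy as np

def cosine_sim(r1, r2):
    """Cosine similarity between two relation-instance feature vectors."""
    return float(np.dot(r1, r2) / (np.linalg.norm(r1) * np.linalg.norm(r2)))

# Hypothetical toy vectors for mason:stone and soldier:gun
r_mason_stone = np.array([3.2, 0.0, 1.7, 2.1])
r_soldier_gun = np.array([0.0, 2.8, 1.5, 2.0])
print(cosine_sim(r_mason_stone, r_soldier_gun))
```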

7 Creating an instance vector for x:y
Generate a bunch of queries:
– "X of the Y" ("stone of the mason")
– "X with the Y" ("soldier with the gun")
– …
For each query q_j(X,Y), record the number of hits in a search engine as r_x:y,j
– Actually record log(#hits+1)
– Actually sometimes replace X with stem(X)*
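
A rough sketch of the vector construction; `count_hits` and the short template list are hypothetical stand-ins for the search-engine interface and the full query set:

```python
import math

# Hypothetical joining-phrase templates; the full set is shown on the next slide.
QUERY_TEMPLATES = ["{x} of the {y}", "{x} with the {y}", "{y} of the {x}"]

def instance_vector(x, y, count_hits):
    """Feature vector r_x:y, one log(#hits+1) entry per joining-phrase query.

    count_hits is a hypothetical callable wrapping a search-engine API.
    """
    return [math.log(count_hits(q.format(x=x, y=y)) + 1) for q in QUERY_TEMPLATES]
```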

8 The queries used
Similar to Hearst '92 & followups

9 Some results
Ranking 369 possible x:y pairs as possible answers

10 How do you measure the similarity of relation instances?
1. Create a feature vector r_x:y for each instance x:y
2. Use cosine distance to rank (a),…,(d)
3. Test-taking strategy:
– Define margin = (bestScore − secondBest)
– If margin < θ and θ ≥ 0 then skip
– If margin < θ and θ < 0 then guess the top 2
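
The exact thresholding is hard to read off the slide, so the sketch below only illustrates the general margin-based idea (skip when the best and second-best scores are too close):

```python
def answer_question(choice_scores, theta):
    """choice_scores: {choice_label: cosine score}; returns a label, or None to skip."""
    ranked = sorted(choice_scores.items(), key=lambda kv: -kv[1])
    margin = ranked[0][1] - ranked[1][1]
    if margin < theta:
        return None          # low margin: skip (or guess among the top choices)
    return ranked[0][0]      # confident: answer with the single best choice
```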

11 Results

12

13

14 Followup work
Given x:y pairs, replace vectors with rows in M′:
1. Look up synonyms x′, y′ of x and y and construct "near analogies" x′:y, x:y′. Drop any that don't occur frequently.
– e.g. "mason:stone" → "mason:rock"
2. Search for the phrase "x Q y" or "y Q x", using near analogies as well as the original pair x:y, where Q is any sequence of up to three words.
3. For each phrase, create patterns by introducing wildcards.
4. Build a pair-pattern frequency matrix M.
5. Apply SVD to M to get the best 300 dimensions → M′.
Define sim1(x:y, u:v) = cosine distance in M′.
Compute the similarity of x:y and u:v as the average of sim1(p1, p2) over all pairs p1, p2 where (a) p1 is x:y or an alternate; (b) p2 is u:v or an alternate; and (c) sim1(p1, p2) ≥ sim1(x:y, u:v).
[Turney, CL 2006]
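
A minimal sketch of the SVD step on the pair-pattern frequency matrix and the sim1 comparison in the reduced space; the 300-dimension target is from the slide, while the rest (scipy's svds, dense input) is an illustrative choice:

```python
import numpy as np
from scipy.sparse.linalg import svds

def reduce_pair_pattern_matrix(M, k=300):
    """Project the pair-by-pattern frequency matrix onto its top-k singular vectors.

    k must be smaller than both matrix dimensions.
    """
    U, s, _ = svds(np.asarray(M, dtype=float), k=k)
    return U * s                      # one k-dimensional row per word pair

def sim1(M_prime, i, j):
    """Cosine similarity of word pairs i and j in the reduced space M'."""
    a, b = M_prime[i], M_prime[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```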

15 Results for LRA
On the 50B-word WMTS corpus:
– LRA: 56.5
– VSM-WMTS: 40.3

16 Additional application: relation classification

17 Relation classification

18 Ablation experiments - 1

19 Ablation experiments - 2
What is the effect of using many automatically-generated patterns vs. only 64 manually-generated ones? (Most of the manual patterns are also found automatically.)
Feature selection in pattern space instead of SVD

20 Lessons and questions
How are relations and surface patterns correlated?
– One-many? (several class-subclass patterns)
– Many-one? (some patterns are ambiguous)
– Many-many? (and is it 10-10, 100-100, 1000-1000?)
Is it surprising that information about relation similarity is spread out across
– so much text?
– so many surface patterns?

21 Followup 2 … a pure corpus-based approach
Given M word pairs X,Y, construct feature vectors f_XY like this:
– Find phrases: left? X middle{0,3} Y right? (e.g., "the mason cut the stone with") and stem
– In each phrase, replace the words other than X and Y with wildcards, creating 2^(n-2) patterns (e.g., "* mason cut the stone with", "the mason * the stone with", …, "* mason * * stone *")
– Retain the 20M examples associated with the most X,Y pairs
– Weight a pattern that appears i times for X,Y as log(i+1)
– Normalize vectors to unit length
Use supervised learning on this representation
[Turney, COLING 2008]
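
A sketch of the wildcard-pattern generation for one extracted phrase; tokenization and the in-place X/Y matching are simplified for illustration:

```python
from itertools import product

def wildcard_patterns(phrase_tokens, x, y):
    """All 2^(n-2) patterns obtained by optionally wildcarding each non-X/Y token."""
    slots = [i for i, tok in enumerate(phrase_tokens) if tok not in (x, y)]
    out = []
    for mask in product([False, True], repeat=len(slots)):
        toks = list(phrase_tokens)
        for use_star, i in zip(mask, slots):
            if use_star:
                toks[i] = "*"
        out.append(" ".join(toks))
    return out

# wildcard_patterns("the mason cut the stone with".split(), "mason", "stone")
# yields 16 patterns, from "the mason cut the stone with" up to "* mason * * stone *"
```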

22 Followup 2 … a pure corpus-based approach
Given M word pairs X,Y, construct feature vectors f_XY
Use supervised learning for synonym-or-not [Turney, COLING 2008]
Use 10-fold CV on 80 questions = 320 word pairs
Accuracy 76.2%
Rank = 9/15 compared to prior approaches (best 97.5; avg human 64.5)

23 Followup 2 … a pure corpus-based approach
Given M word pairs X,Y, construct feature vectors f_XY
Use supervised learning for synonym-vs-antonym [Turney, COLING 2008]
Use 10-fold CV on 136 sample questions
Accuracy 75%
First published results

24 Followup 2 … a pure corpus-based approach
Given M word pairs X,Y, construct feature vectors f_XY
Use supervised learning for synonym-vs-antonym [Turney, COLING 2008]
Use 10-fold CV on 136 sample questions
Accuracy 75%
First published results

25 Followup 2 … a pure corpus-based approach
Given M word pairs X,Y, construct feature vectors f_XY
Use supervised learning for similar/associated/both [Turney, COLING 2008]
Use 10-fold CV on 144 pairs labeled in psychological experiments
Accuracy 77.1%
First published results

26 Followup 2 … a pure corpus-based approach
Given M word pairs X,Y, construct feature vectors f_XY
Use supervised learning for analogies [Turney, COLING 2008]
– Negative examples come from another problem
– Repeat 10x with a different "negative" example and average scores for test cases, then pick the best answer
Accuracy: 52.1%
Rank: 3/12 vs. prior papers (best 56.1%; avg student 57%)

27 Summary

28 Background for Wed: pair HMMs and generative models of alignment

29 Alignments and expectations
Simplified version of the idea from "Learning String Edit Distance", Ristad and Yianilos, PAMI 1998

30 HMM Example
Two states, 1 and 2, with transitions Pr(1→1), Pr(1→2), Pr(2→1), Pr(2→2).
Emission probabilities: state 1 emits d 0.3, h 0.5, b 0.2; state 2 emits a 0.3, e 0.5, o 0.2.
Sample output: x^T = heehahaha, s^T = 122121212
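
A small sketch that samples from this two-state HMM; the emission tables match the slide, but the transition probabilities are placeholders since the slide gives them only symbolically:

```python
import random

EMIT  = {1: {"d": 0.3, "h": 0.5, "b": 0.2},    # emission tables from the slide
         2: {"a": 0.3, "e": 0.5, "o": 0.2}}
TRANS = {1: {1: 0.5, 2: 0.5},                   # placeholder transition values
         2: {1: 0.5, 2: 0.5}}

def sample(T, start_state=1):
    """Sample a state sequence s_1..s_T and the corresponding output string x_1..x_T."""
    states, chars = [start_state], []
    for t in range(T):
        if t > 0:
            nxt = TRANS[states[-1]]
            states.append(random.choices(list(nxt), weights=list(nxt.values()))[0])
        em = EMIT[states[-1]]
        chars.append(random.choices(list(em), weights=list(em.values()))[0])
    return "".join(chars), states

# e.g. sample(9) might return ("heehahaha", [1, 2, 2, 1, 2, 1, 2, 1, 2])
```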

31 HMM Inference
[Trellis diagram over time steps t = 1…T and states l = 1…K, with observations x_1, x_2, …, x_T]
Key point: Pr(s_i = l) depends only on Pr(l′→l) and s_{i-1}, so you can propagate probabilities forward.
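
A minimal sketch of that forward propagation over the trellis, with dict-based parameters (start distribution, transition and emission tables) left as assumptions:

```python
def forward(x, states, start, trans, emit):
    """alpha[t][l] = Pr(x_1..x_t, s_t = l), filled left to right along the trellis."""
    alpha = [{l: start[l] * emit[l].get(x[0], 0.0) for l in states}]
    for t in range(1, len(x)):
        prev = alpha[-1]
        alpha.append({l: emit[l].get(x[t], 0.0) *
                         sum(prev[lp] * trans[lp][l] for lp in states)
                      for l in states})
    return alpha   # Pr(x) is the sum of the last row's values
```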

32 Pair HMM Notation Andrew will use “null”

33 Pair HMM Example
[Table of edit-pair emission probabilities Pr(e): 0.10, 0.10, 0.10, 0.05, 0.05, 0.01, …]

34 Pair HMM Example
[Same edit-pair emission table as the previous slide; the sampled edit-pair sequence z^T did not survive the transcript]
Strings x, y produced by z^T: x = heehee, y = teehe
Notice that x, y is also produced by many other edit strings.

35 Distances based on pair HMMs

36 Pair HMM Inference
Dynamic programming is possible: fill out the matrix left-to-right, top-down
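
A sketch of that DP for the simplest case, a one-state memoryless pair HMM in the spirit of Ristad & Yianilos; `delta(a, b)` is a hypothetical edit-pair probability with `None` as the null symbol, and `delta_end` a stopping probability:

```python
def pair_forward(x, y, delta, delta_end):
    """alpha[i][j] = Pr of generating x[:i] and y[:j]; filled left-to-right, top-down."""
    n, m = len(x), len(y)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                alpha[i][j] += delta(x[i - 1], None) * alpha[i - 1][j]          # deletion
            if j > 0:
                alpha[i][j] += delta(None, y[j - 1]) * alpha[i][j - 1]          # insertion
            if i > 0 and j > 0:
                alpha[i][j] += delta(x[i - 1], y[j - 1]) * alpha[i - 1][j - 1]  # substitution
    return alpha[n][m] * delta_end    # joint probability Pr(x, y)
```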

37 Pair HMM Inference
[Trellis diagram with axes t = 1…T and v = 1…K]

38 Pair HMM Inference
[Trellis diagram with axes t = 1…T and v = 1…K]
One difference: after i emissions of a pair HMM, we do not know the column position.

39 Pair HMM Inference: Forward-Backward
[Trellis diagram with axes t = 1…T and v = 1…K]

40 Multiple states
[Edit-pair emission tables for states SUB, IX, and IY]

41 An extension: multiple states
Conceptually, add a "state" dimension to the model (e.g. SUB, IX); EM methods generalize easily to this setting.
[Two trellis diagrams, one per state (SUB and IX), each with axes t = 1…T and v = 1…K]

