Presentation is loading. Please wait.

Presentation is loading. Please wait.

Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms EMNLP 2008 2016/5/27 Mamoru Komachi †, Taku Kudo ‡, Masashi Shimbo † and.

Similar presentations


Presentation on theme: "Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms EMNLP 2008 2016/5/27 Mamoru Komachi †, Taku Kudo ‡, Masashi Shimbo † and."— Presentation transcript:

1 Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms EMNLP 2008 2016/5/27 Mamoru Komachi †, Taku Kudo ‡, Masashi Shimbo † and Yuji Matsumoto † † Nara Institute of Science and Technology, Japan ‡ Google Inc.

2 5/27/20162 Bootstrapping and semantic drift  Bootstrapping grows small amount of seed instances Buy # at Apple Store Generic patterns = Patterns which co-occur with many (irrelevant) instances Seed instanceExtracted instance iPhone iPod # for sale Co-occurrence pattern iPod MacBook Air car house

3 5/27/20163 Main contributions of this work 3 1. Suggest a parallel between semantic drift in Espresso- like bootstrapping and topic drift in HITS (Kleinberg, 1999) 2. Solve semantic drift by graph kernels used in link analysis community

4 Espresso Algorithm [Pantel and Pennacchiotti, 2006]  Repeat  Pattern extraction  Pattern ranking  Pattern selection  Instance extraction  Instance ranking  Instance selection  Until a stopping criterion is met 5/27/20164

5 Espresso uses pattern-instance matrix A for ranking patterns and instances |P|×|I| -dimensional matrix holding the (normalized) pointwise mutual information (pmi) between patterns and instances 5/27/20165 12...i |I| 1 2 : p : |P| [A] p,i = pmi(p,i) / max p,i pmi(p,i) instance indices pattern indices

6 Pattern/instance ranking in Espresso p = pattern score vector i = instance score vector A = pattern-instance matrix 5/27/20166... pattern ranking... instance ranking Reliable instances are supported by reliable patterns, and vice versa |P| = number of patterns |I| = number of instances normalization factors to keep score vectors not too large

7 Espresso Algorithm [Pantel and Pennacchiotti, 2006]  Repeat  Pattern extraction  Pattern ranking  Pattern selection  Instance extraction  Instance ranking  Instance selection  Until a stopping criterion is met 5/27/20167 For graph-theoretic analysis, we will introduce 3 simplifications to Espresso For graph-theoretic analysis, we will introduce 3 simplifications to Espresso

8 First simplification  Compute the pattern-instance matrix  Repeat  Pattern extraction  Pattern ranking  Pattern selection  Instance extraction  Instance ranking  Instance selection  Until a stopping criterion is met 5/27/20168 Simplification 1 Remove pattern/instance extraction steps Instead, pre-compute all patterns and instances once in the beginning of the algorithm Simplification 1 Remove pattern/instance extraction steps Instead, pre-compute all patterns and instances once in the beginning of the algorithm

9 Second simplification  Compute the pattern-instance matrix  Repeat  Pattern ranking  Pattern selection  Instance ranking  Instance selection  Until a stopping criterion is met 5/27/20169 Simplification 2 Remove pattern/instance selection steps which retain only highest scoring k patterns / m instances for the next iteration i.e., reset the scores of other items to 0 Instead, retain scores of all patterns and instances Simplification 2 Remove pattern/instance selection steps which retain only highest scoring k patterns / m instances for the next iteration i.e., reset the scores of other items to 0 Instead, retain scores of all patterns and instances

10 Third simplification  Compute the pattern-instance matrix  Repeat  Pattern ranking  Instance ranking  Until a stopping criterion is met 5/27/201610 Until score vectors p and i converge Simplification 3 No early stopping i.e., run until convergence Simplification 3 No early stopping i.e., run until convergence

11 Input  Initial score vector of seed instances  Pattern-instance co-occurrence matrix A Main loop Repeat Until i and p converge Output Instance and pattern score vectors i and p... pattern ranking... instance ranking 5/27/201611 Simplified Espresso

12 5/27/201612 Simplified Espresso is HITS Simplified Espresso =HITS in a bipartite graph whose adjacency matrix is A Problem  No matter which seed you start with, the same instance is always ranked topmost  Semantic drift (also called topic drift in HITS) The ranking vector i tends to the principal eigenvector of A T A as the iteration proceeds regardless of the seed instances!

13 How about the original Espresso? Original Espresso has heuristics not present in Simplified Espresso  Early stopping  Pattern and instance selection Do these heuristics really help reduce semantic drift? 5/27/201613

14 Experiments on semantic drift Does the heuristics in original Espresso help reduce drift? 5/27/201614

15 5/27/201615 Word sense disambiguation task of Senseval-3 English Lexical Sample Predict the sense of “bank” … the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ), bring this up to … In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden Possibly aligned to water a sort of bank(???) by a rushing river. Training instances are annotated with their sense Predict the sense of target word in the test set

16 5/27/201616 Word sense disambiguation by Original Espresso Seed instance = the instance to predict its sense System output = k-nearest neighbor (k=3) Heuristics of Espresso  Pattern and instance selection  # of patterns to retain p=20 (increase p by 1 on each iteration)  # of instance to retain m=100 (increase m by 100 on each iteration)  Early stopping Seed instance

17 5/27/201617 Convergence process of Espresso Heuristics in Espresso helps reducing semantic drift (However, early stopping is required for optimal performance) Output the most frequent sense regardless of input Original Espresso Simplified Espresso Most frequent sense (baseline) Semantic drift occurs (always outputs the most frequent sense)

18 Learning curve of Original Espresso: per-sense breakdown 5/27/201618 # of most frequent sense predictions increases Recall for infrequent senses worsens even with original Espresso Most frequent sense Other senses

19 5/27/201619 Summary: Espresso and semantic drift Semantic drift happens because  Espresso is designed like HITS  HITS gives the same ranking list regardless of seeds Some heuristics reduce semantic drift  Early stopping is crucial for optimal performance Still, these heuristics require  many parameters to be calibrated  but calibration is difficult

20 5/27/201620 Main contributions of this work 20 1. Suggest a parallel between semantic drift in Espresso- like bootstrapping and topic drift in HITS (Kleinberg, 1999) 2. Solve semantic drift by graph kernels used in link analysis community

21 Q. What caused drift in Espresso? A. Espresso's resemblance to HITS HITS is an importance computation method (gives a single ranking list for any seeds) Why not use a method for another type of link analysis measure - which takes seeds into account? "relatedness" measure (it gives different rankings for different seeds) 5/27/201621

22 5/27/201622 The regularized Laplacian kernel  A relatedness measure  Takes higher-order relations into account  Has only one parameter Graph Laplacian Regularized Laplacian matrix A :adjacency matrix of the graph D :(diagonal) degree matrix β:parameter Each column of R β gives the rankings relative to a node

23 Evaluation of regularized Laplacian Comparison to (Agirre et al. 2006) 5/27/201623

24 algorithmF measure Most frequent sense (baseline)54.5 HyperLex64.6 PageRank64.6 Simplified Espresso44.1 Espresso (after convergence)46.9 Espresso (optimal stopping)66.5 Regularized Laplacian ( β =10 -2 )67.1 5/27/201624 WSD on all nouns in Senseval-3 Outperforms other graph-based methods Espresso needs optimal stopping to achieve an equivalent performance

25 5/27/201625 Conclusions  Semantic drift in Espresso is a parallel form of topic drift in HITS  The regularized Laplacian reduces semantic drift  inherently a relatedness measure (  importance measure)

26 5/27/201626 Future work  Investigate if a similar analysis is applicable to a wider class of bootstrapping algorithms (including co-training)  Try other popular tasks of bootstrapping such as named entity extraction  Selection of seed instances matters

27 5/27/201627

28 AlgorithmMost frequent senseOther senses Simplified Espresso100.00.0 Espresso (after convergence)100.030.2 Espresso (optimal stopping)94.467.4 Regularized Laplacian ( β =10 -2 )92.162.8 5/27/201628 Label prediction of “bank” (F measure) The regularized Laplacian keeps high recall for infrequent senses Espresso suffers from semantic drift (unless stopped at optimal stage)

29 Regularized Laplacian is mostly stable across a parameter 5/27/201629

30 Pattern/instance ranking in Espresso Score for pattern p Score for instance i 08.10.2430 p: pattern i: instance P: set of patterns I: set of instances pmi: pointwise mutual information max pmi: max of pmi in all the patterns and instances


Download ppt "Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms EMNLP 2008 2016/5/27 Mamoru Komachi †, Taku Kudo ‡, Masashi Shimbo † and."

Similar presentations


Ads by Google