
1 Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology

2 Supervised learning has succeeded in many natural language processing tasks, but it needs time-consuming annotation (training data creation). Why not learn from a minimally annotated resource? (Diagram: a high-quality dictionary and a large-scale corpus feed a high-accuracy classifier.)

3 Corpus-based extraction of a semantic category
Input: seed instance (e.g., Singapore). Patterns extracted from the corpus (e.g., "Visa for __", "History of __", from contexts such as "Visa for Singapore", "Travel guide to Singapore"). Output: new instances (e.g., Hong Kong, China, Australia, Egypt). The instance and pattern steps alternate, step by step.

4 Semantic drift is the central problem of bootstrapping
Input instance: Singapore → patterns "__ is", "greeting __" → new instances: card, visa, messages, words. The semantic category has changed! Errors propagate to successive iterations.
Generic patterns: patterns co-occurring with many irrelevant instances.

5 Two major problems solved by this work
Why does semantic drift occur? Is there any way to prevent semantic drift?

6 Answers to the problems of semantic drift
1. Suggest a parallel between semantic drift in Espresso [Pantel and Pennacchiotti, 2006] style bootstrapping and topic drift in HITS [Kleinberg, 1999]
2. Solve semantic drift using a "relatedness" measure (regularized Laplacian) instead of the "importance" measure (HITS authority) used in the link analysis community

7 Table of contents
2. Overview of Bootstrapping Algorithms
3. Espresso-style Bootstrapping Algorithms
4. Graph-based Analysis of Espresso-style Bootstrapping Algorithms
5. Word Sense Disambiguation
6. Bilingual Dictionary Construction
7. Learning Semantic Categories

8 Preliminaries: Espresso and HITS

9 Espresso Algorithm [Pantel and Pennacchiotti, 2006]
Repeat:
  Pattern extraction
  Pattern ranking
  Pattern selection
  Instance extraction
  Instance ranking
  Instance selection
Until a stopping criterion is met

10 Pattern/instance ranking is mutually defined in Espresso
Score for pattern p: r_π(p) = (1/|I|) Σ_{i∈I} (pmi(i,p) / max_pmi) · r_ι(i)
Score for instance i: r_ι(i) = (1/|P|) Σ_{p∈P} (pmi(i,p) / max_pmi) · r_π(p)
p: pattern; i: instance; P: set of patterns; I: set of instances; pmi: pointwise mutual information; max_pmi: max of pmi over all patterns and instances
Reliable instances are supported by reliable patterns, and vice versa.
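These mutually recursive updates can be sketched in code. This is a minimal illustration, not the full Espresso system: the pmi values below are toy numbers rather than corpus statistics.

```python
# Sketch of Espresso's mutual pattern/instance ranking.
# pmi[p][i]: pointwise mutual information between pattern p and instance i
# (toy values, assumed for illustration).
pmi = [
    [2.0, 0.0, 1.0],   # pattern 0
    [0.5, 1.5, 0.0],   # pattern 1
]
max_pmi = max(v for row in pmi for v in row)
P, I = len(pmi), len(pmi[0])

def rank_patterns(inst_scores):
    """Pattern score: average of normalized pmi weighted by instance scores."""
    return [sum(pmi[p][i] / max_pmi * inst_scores[i] for i in range(I)) / I
            for p in range(P)]

def rank_instances(pat_scores):
    """Instance score: average of normalized pmi weighted by pattern scores."""
    return [sum(pmi[p][i] / max_pmi * pat_scores[p] for p in range(P)) / P
            for i in range(I)]

# Seed: only instance 0 is trusted initially.
inst = [1.0, 0.0, 0.0]
for _ in range(10):
    pat = rank_patterns(inst)
    inst = rank_instances(pat)
    s = sum(inst)                    # renormalize so scores stay comparable
    inst = [x / s for x in inst]

print(inst)  # instance 0 stays on top: it is backed by the strongest pattern
```

Note how each update feeds the other: reliable instances raise the scores of the patterns they co-occur with, and vice versa.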

11 HITS (Hypertext Induced Topic Search) finds hubs and authorities in a linked graph
Hub score of h = sum of the authority scores of all nodes pointed to by h
Authority score of a = sum of the hub scores of all nodes pointing to a

12 HITS Algorithm [Kleinberg 1999]
Input: initial hub score vector h, adjacency matrix A
Main loop: repeat a ← α A^T h, h ← β A a until a and h converge
Output: hub and authority score vectors a and h
(α, β: normalization factors)

13 HITS converges to a fixed point regardless of the initial input
a(k): vector a at the k-th iteration; h(k): vector h at the k-th iteration
HITS authority vector a = the principal eigenvector of A^T A
HITS hub vector h = the principal eigenvector of A A^T
where A^T A = co-citation matrix, A A^T = bibliographic coupling matrix
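The convergence claim is easy to check numerically. Below is a minimal power-iteration sketch of HITS on a toy 3-node graph (graph and starting vectors are invented for illustration):

```python
# HITS power iteration (Kleinberg, 1999) on a toy graph, showing that
# the authority vector converges to the same fixed point for two
# different initial hub vectors.
A = [  # adjacency: A[u][v] = 1 if u links to v
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
]
n = len(A)

def normalize(v):
    s = sum(x * x for x in v) ** 0.5
    return [x / s for x in v]

def hits(h):
    a = [0.0] * n
    for _ in range(100):
        a = normalize([sum(A[u][v] * h[u] for u in range(n))   # a = A^T h
                       for v in range(n)])
        h = normalize([sum(A[u][v] * a[v] for v in range(n))   # h = A a
                       for u in range(n)])
    return a, h

a1, _ = hits([1.0, 0.0, 0.0])
a2, _ = hits([0.0, 0.0, 1.0])
print(a1, a2)  # both starts yield (numerically) the same authority vector
```

The normalization inside the loop plays the role of the α and β factors on the slide; the iterate a is driven toward the principal eigenvector of A^T A.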

14 Graph-based Analysis of Espresso-style Bootstrapping Algorithms How Espresso works, and how Espresso fails to solve semantic drift

15 Making Espresso look like HITS
p = pattern score vector, i = instance score vector, A = pattern-instance matrix
Pattern ranking: p ← (1/|I|) A i
Instance ranking: i ← (1/|P|) A^T p
|P| = number of patterns, |I| = number of instances; 1/|I| and 1/|P| are normalization factors that keep the score vectors from growing too large.

16 Espresso uses the pattern-instance matrix A as the adjacency matrix in HITS
A is the |P|×|I|-dimensional matrix holding the (normalized) pointwise mutual information (pmi) between patterns and instances:
[A]_{p,i} = pmi(p,i) / max_{p,i} pmi(p,i)
(rows indexed by patterns 1 … |P|, columns by instances 1 … |I|)
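Building such a matrix from co-occurrence counts can be sketched as follows. The counts are invented, and clipping negative pmi to zero is a simplification assumed here, not a detail taken from the slides:

```python
# Toy construction of the |P|x|I| matrix [A]_{p,i} = pmi(p,i) / max pmi.
import math

# count[p][i]: times pattern p matched instance i (invented numbers)
count = [
    [8, 0, 2],
    [1, 5, 4],
]
P, I = len(count), len(count[0])
total = sum(sum(row) for row in count)
p_marg = [sum(count[p]) / total for p in range(P)]                       # P(p)
i_marg = [sum(count[p][i] for p in range(P)) / total for i in range(I)]  # P(i)

def pmi(p, i):
    joint = count[p][i] / total
    if joint == 0.0:
        return 0.0   # a common convention for unseen pairs
    return math.log(joint / (p_marg[p] * i_marg[i]))

raw = [[pmi(p, i) for i in range(I)] for p in range(P)]
max_pmi = max(v for row in raw for v in row)
# clip negative pmi to 0 (simplification), then normalize by the max
A = [[max(v, 0.0) / max_pmi for v in row] for row in raw]
print(A)
```

After normalization all entries lie in [0, 1], with the largest weight on the most strongly associated pattern-instance pair.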

17 Three simplifications to reduce Espresso to HITS
For graph-theoretic analysis, we introduce three simplifications to Espresso.
Repeat:
  Pattern extraction
  Pattern ranking
  Pattern selection
  Instance extraction
  Instance ranking
  Instance selection
Until a stopping criterion is met

18 Keep the pattern-instance matrix constant in the main loop
Simplification 1: remove the pattern/instance extraction steps. Instead, pre-compute all patterns and instances once at the beginning of the algorithm.
Compute the pattern-instance matrix
Repeat:
  Pattern ranking
  Pattern selection
  Instance ranking
  Instance selection
Until a stopping criterion is met

19 Remove the pattern/instance selection heuristics
Simplification 2: remove the pattern/instance selection steps, which retain only the k highest-scoring patterns / m highest-scoring instances for the next iteration (i.e., reset the scores of all other items to 0). Instead, retain the scores of all patterns and instances.
Compute the pattern-instance matrix
Repeat:
  Pattern ranking
  Instance ranking
Until a stopping criterion is met

20 Remove the early stopping heuristic
Simplification 3: no early stopping, i.e., run until convergence.
Compute the pattern-instance matrix
Repeat:
  Pattern ranking
  Instance ranking
Until score vectors p and i converge

21 Simplified Espresso
Input: initial score vector of seed instances i, pattern-instance co-occurrence matrix A
Main loop: repeat p ← (1/|I|) A i (pattern ranking), i ← (1/|P|) A^T p (instance ranking) until i and p converge
Output: instance and pattern score vectors i and p

22 HITS Algorithm [Kleinberg 1999] (repeated for comparison with Simplified Espresso)
Input: initial hub score vector h, adjacency matrix A
Main loop: repeat a ← α A^T h, h ← β A a until a and h converge
Output: hub and authority score vectors a and h
(α, β: normalization factors)

23 Simplified Espresso is essentially HITS
Simplified Espresso = HITS: the ranking vector i tends to the principal eigenvector of A^T A as the iteration proceeds, regardless of the seed instances!
Problem: no matter which seed you start with, the same instance is always ranked topmost → semantic drift (also called topic drift in HITS)
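The seed-independence can be demonstrated directly: running the simplified update from two different seeds yields the same ranking. The weight matrix below is a toy stand-in for normalized pmi values:

```python
# Iterating p <- (1/|I|) A i, i <- (1/|P|) A^T p is power iteration on
# A^T A, so the instance ranking converges to the same vector for any
# seed -- this is the mechanism behind semantic drift.
A = [  # 2 patterns x 3 instances, toy normalized pmi weights
    [1.0, 0.2, 0.6],
    [0.3, 0.9, 0.4],
]
P, I = len(A), len(A[0])

def run(seed, iters=200):
    i = seed[:]
    for _ in range(iters):
        p = [sum(A[x][y] * i[y] for y in range(I)) / I for x in range(P)]
        i = [sum(A[x][y] * p[x] for x in range(P)) / P for y in range(I)]
        s = sum(i)
        i = [v / s for v in i]       # renormalize each iteration
    return i

r1 = run([1.0, 0.0, 0.0])  # seed = instance 0
r2 = run([0.0, 1.0, 0.0])  # seed = instance 1
print(r1, r2)  # the two rankings coincide: the seed has been forgotten
```

Whatever instance is favored by the principal eigenvector of A^T A ends up topmost for every seed.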

24 What about the original Espresso?
Espresso has two heuristics not present in Simplified Espresso: early stopping, and pattern and instance selection.
Do these heuristics really help reduce semantic drift? And how?

25 Experiments on semantic drift: do the heuristics in original Espresso help reduce drift?

26 Word sense disambiguation task of the Senseval-3 English Lexical Sample
Predict the sense of "bank":
"… the financial benefits of the bank (finance)'s employee package (cheap mortgages and pensions, etc.), bring this up to …"
"In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden."
"Possibly aligned to water a sort of bank (???) by a rushing river."
Training instances are annotated with their sense; the task is to predict the sense of the target word in the test set.

27 Word sense disambiguation by Espresso
Seed instance = the instance whose sense is to be predicted
Proximity measure = the instance score vector given by Espresso (obtained by iterating the pattern ranking and instance ranking steps from the seed)

28 Example of k-NN classification by Espresso
System output = majority sense among the k nearest neighbors (k = 3)
i = (0.9, 0.1, 0.8, 0.5, 0, 0, 0.95, 0.3, 0.2, 0.4) → sense A
The seed instance is compared with labeled training instances such as:
"… the financial benefits of the bank (finance)'s employee package (cheap mortgages and pensions, etc.), bring this up to …"
"In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne …"
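The k-NN step itself can be sketched as follows. The score vector is taken from the slide, while the sense labels of the training instances are hypothetical:

```python
# k-NN classification over Espresso's instance score vector: predict
# the majority sense among the k highest-scoring labeled neighbors.
from collections import Counter

i = [0.9, 0.1, 0.8, 0.5, 0.0, 0.0, 0.95, 0.3, 0.2, 0.4]  # from the slide
labels = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]  # assumed labels

def knn_predict(scores, labels, k=3):
    """Majority vote among the k instances with the highest scores."""
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return Counter(labels[j] for j in top).most_common(1)[0][0]

print(knn_predict(i, labels))  # -> A (neighbors 6, 0, 2 are all labeled A)
```

With k = 3 the nearest neighbors are the instances scored 0.95, 0.9, and 0.8; under the assumed labels all three carry sense A, matching the slide's prediction.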

29 Two heuristics in Espresso
Early stopping: plot results at each iteration.
Pattern and instance selection: number of patterns to retain p = 20 (increase p by 1 at each iteration); number of instances to retain m = 100 (increase m by 100 at each iteration).
Evaluation metric: recall = |correct instances| / |total true instances|

30 Convergence process of Espresso
(Plot: recall over iterations for original Espresso, Simplified Espresso, and the most-frequent-sense baseline.)
Simplified Espresso: semantic drift occurs; it outputs the most frequent sense regardless of the input.
Original Espresso: its heuristics help reduce semantic drift (however, early stopping is required for optimal performance).

31 Learning curve of Espresso: per-sense breakdown
(Plot: recall over iterations for the most frequent sense vs. the other senses.)
The number of most-frequent-sense predictions increases with iterations; recall for infrequent senses worsens even with original Espresso.

32 Summary: Espresso and semantic drift
Semantic drift happens because Espresso is designed like HITS, and HITS gives the same ranking list regardless of the seeds.
Some heuristics reduce semantic drift; early stopping is crucial for optimal performance.
Still, these heuristics require many parameters, and calibrating them is difficult.

33 Main contributions of this work
1. Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999)
2. Solve semantic drift by graph kernels used in the link analysis community

34 Q. What causes drift in Espresso? A. Espresso's resemblance to HITS
HITS is an importance computation method: it gives a single ranking list for any seeds.
Why not use another type of link analysis measure, one that takes the seeds into account? A "relatedness" measure gives different rankings for different seeds.

35 The regularized Laplacian kernel
A relatedness measure with only one parameter β:
Normalized graph Laplacian: L = I − D^{−1/2} A D^{−1/2}
Regularized Laplacian matrix: R_β = Σ_{n=0}^{∞} β^n (−L)^n = (I + βL)^{−1}
A: adjacency matrix of the graph; D: (diagonal) degree matrix; β: parameter
Each column of R_β gives the rankings relative to one node.
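A toy sketch of computing R_β = (I + βL)^{−1} on a small path graph follows; the graph and the β value are illustrative assumptions. Unlike the HITS-style principal-eigenvector ranking, each column of R_β ranks nodes relative to one seed node, so different seeds give different rankings:

```python
# Regularized Laplacian kernel R_beta = (I + beta*L)^{-1} with
# L = I - D^{-1/2} A D^{-1/2}, on a 4-node path graph 0-1-2-3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
n = len(A)
deg = [sum(row) for row in A]
# normalized Laplacian
L = [[(1.0 if i == j else 0.0) - A[i][j] / (deg[i] * deg[j]) ** 0.5
      for j in range(n)] for i in range(n)]

def inverse(M):
    """Invert a small matrix by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        piv_val = m[col][col]
        m[col] = [v / piv_val for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

beta = 0.9
R = inverse([[(1.0 if i == j else 0.0) + beta * L[i][j]
              for j in range(n)] for i in range(n)])

# Column s of R scores every node relative to seed node s.
col0 = [R[i][0] for i in range(n)]
col3 = [R[i][3] for i in range(n)]
print(col0, col3)  # seed 0 ranks node 0 highest; seed 3 ranks node 3 highest
```

This seed-dependence is exactly what the HITS-style "importance" ranking lacks, and is why the relatedness measure avoids drifting toward a single global top item.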

36 Word Sense Disambiguation: evaluation of the regularized Laplacian against Espresso and other graph-based algorithms

37 Label prediction of "bank" (recall, %)
Algorithm                         | Most frequent sense | Other senses
Simplified Espresso               | 100.0               |  0.0
Espresso (after convergence)      | 100.0               | 30.2
Espresso (optimal stopping)       |  94.4               | 67.4
Regularized Laplacian (β = 10^-2) |  92.1               | 62.8
The regularized Laplacian keeps high recall for infrequent senses; Espresso suffers from semantic drift unless stopped at the optimal stage.

38 WSD on all nouns in Senseval-3
Algorithm                         | Recall (%)
Most frequent sense (baseline)    | 54.5
HyperLex [Agirre et al. 2005]     | 64.6
PageRank [Agirre et al. 2005]     | 64.6
Simplified Espresso               | 44.1
Espresso (after convergence)      | 46.9
Espresso (optimal stopping)       | 66.5
Regularized Laplacian (β = 10^-2) | 67.1
The regularized Laplacian outperforms the other graph-based methods; Espresso needs optimal stopping to achieve equivalent performance.

39 The regularized Laplacian is stable across values of its parameter β

40 Conclusions
Semantic drift in Espresso is a parallel of topic drift in HITS.
The regularized Laplacian reduces semantic drift in bootstrapping for natural language processing tasks, since it is inherently a relatedness measure (≠ importance measure).

41 Future work
Investigate whether a similar analysis is applicable to a wider class of bootstrapping algorithms (including co-training).
Investigate the influence of seed selection on bootstrapping algorithms and propose a way to select effective seed instances.

