Presentation is loading. Please wait.

Presentation is loading. Please wait.

Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology.

Similar presentations


Presentation on theme: "Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology."— Presentation transcript:

1 Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology

2 Corpus-based extraction of semantic knowledge 2 Singapore Hong Kong ___ visa Hong Kong China ___ history Australia Egypt InstancePatternNew instance Alternate step by step InputOutput(Extracted from corpus) Singapore visa Singapore map

3 Semantic drift is the central problem of bootstrapping Singapore card visa ___ Australia messages greeting __ card words InstancePatternNew instance Errors propagate to successive iteration InputOutput(Extracted from corpus) Semantic category changed! Generic patterns Patterns co-occurring with many irrelevant instances Generic patterns Patterns co-occurring with many irrelevant instances 3

4 Main contributions of this work 1.Suggest a parallel between semantic drift in Espresso [Pantel and Pennachiotti, 2006] style bootstrapping and topic drift in HITS [Kleinberg, 1999] 2.Solve semantic drift using “relatedness” measure (regularized Laplacian) instead of “importance” measure (HITS authority) used in link analysis community 4

5 Table of contents 5 4.Graph-based Analysis of Espresso-style Bootstrapping Algorithms 3.Espresso-style Bootstrapping Algorithms 2. Overview of Bootstrapping Algorithms 5.Word Sense Disambiguation 6.Bilingual Dictionary Construction 7.Learning Semantic Categories

6 Espresso Algorithm [Pantel and Pennacchiotti, 2006] Repeat Pattern extraction Pattern ranking Pattern selection Instance extraction Instance ranking Instance selection Until a stopping criterion is met 6

7 Pattern/instance ranking in Espresso Score for pattern p Score for instance i 7 p: pattern i: instance P: set of patterns I: set of instances pmi: pointwise mutual information max pmi: max of pmi in all the patterns and instances

8 Espresso uses pattern-instance matrix A for ranking patterns and instances |P|×|I| -dimensional matrix holding the (normalized) pointwise mutual information (pmi) between patterns and instances 8 12...i |I| 1 2 : p : |P| [A] p,i = pmi(p,i) / max p,i pmi(p,i) instance indices pattern indices

9 p = pattern score vector i = instance score vector A = pattern-instance matrix Pattern/instance ranking in Espresso 9... pattern ranking... instance ranking Reliable instances are supported by reliable patterns, and vice versa |P| = number of patterns |I| = number of instances normalization factors to keep score vectors not too large

10 Espresso Algorithm [Pantel and Pennacchiotti, 2006] Repeat Pattern extraction Pattern ranking Pattern selection Instance extraction Instance ranking Instance selection Until a stopping criterion is met 10 For graph-theoretic analysis, we will introduce 3 simplifications to Espresso For graph-theoretic analysis, we will introduce 3 simplifications to Espresso

11 First simplification Compute the pattern-instance matrix Repeat Pattern extraction Pattern ranking Pattern selection Instance extraction Instance ranking Instance selection Until a stopping criterion is met 11 Simplification 1 Remove pattern/instance extraction steps Instead, pre-compute all patterns and instances once in the beginning of the algorithm Simplification 1 Remove pattern/instance extraction steps Instead, pre-compute all patterns and instances once in the beginning of the algorithm

12 Second simplification Compute the pattern-instance matrix Repeat Pattern ranking Pattern selection Instance ranking Instance selection Until a stopping criterion is met 12 Simplification 2 Remove pattern/instance selection steps which retain only highest scoring k patterns / m instances for the next iteration i.e., reset the scores of other items to 0 Instead, retain scores of all patterns and instances Simplification 2 Remove pattern/instance selection steps which retain only highest scoring k patterns / m instances for the next iteration i.e., reset the scores of other items to 0 Instead, retain scores of all patterns and instances

13 Third simplification Compute the pattern-instance matrix Repeat Pattern ranking Instance ranking Until a stopping criterion is met 13 Until score vectors p and i converge Simplification 3 No early stopping i.e., run until convergence Simplification 3 No early stopping i.e., run until convergence

14 Simplified Espresso Input Initial score vector of seed instances Pattern-instance co-occurrence matrix A Main loop Repeat Until i and p converge Output Instance and pattern score vectors i and p 14... pattern ranking... instance ranking

15 Simplified Espresso is HITS Simplified Espresso =HITS in a bipartite graph whose adjacency matrix is A Problem  No matter which seed you start with, the same instance is always ranked topmost  Semantic drift (also called topic drift in HITS) 15 The ranking vector i tends to the principal eigenvector of A T A as the iteration proceeds regardless of the seed instances!

16 How about Espresso? Espresso has heuristics not present in Simplified Espresso Early stopping Pattern and instance selection Do these heuristics really help reduce semantic drift? 16

17 Experiments on semantic drift Does the heuristics in original Espresso help reduce drift?

18 Word sense disambiguation task of Senseval-3 English Lexical Sample Predict the sense of “bank” 18 … the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ), bring this up to … In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden Possibly aligned to water a sort of bank(???) by a rushing river. Training instances are annotated with their sense Predict the sense of target word in the test set

19 Word sense disambiguation by Espresso Seed instance = the instance to predict its sense System output = k-nearest neighbor (k=3) Heuristics of Espresso Pattern and instance selection # of patterns to retain p=20 (increase p by 1 on each iteration) # of instance to retain m=100 (increase m by 100 on each iteration) Early stopping 19 Seed instance

20 Convergence process of Espresso 20 Heuristics in Espresso helps reducing semantic drift (However, early stopping is required for optimal performance) Output the most frequent sense regardless of input Espresso Simplified Espresso Most frequent sense (baseline) Semantic drift occurs (always outputs the most frequent sense)

21 Learning curve of Espresso: per-sense breakdown 21 # of most frequent sense predictions increases Precision for infrequent senses worsens even with original Espresso Most frequent sense Other senses

22 Summary: Espresso and semantic drift Semantic drift happens because Espresso is designed like HITS HITS gives the same ranking list regardless of seeds Some heuristics reduce semantic drift Early stopping is crucial for optimal performance Still, these heuristics require many parameters to be calibrated but calibration is difficult 22

23 Main contributions of this work 1.Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999) 2.Solve semantic drift by graph kernels used in link analysis community 23

24 Q. What caused drift in Espresso? A. Espresso's resemblance to HITS HITS is an importance computation method (gives a single ranking list for any seeds) Why not use a method for another type of link analysis measure - which takes seeds into account? "relatedness" measure (it gives different rankings for different seeds) 24

25 The regularized Laplacian kernel A relatedness measure Takes higher-order relations into account Has only one parameter 25 Normalized Graph Laplacian Regularized Laplacian matrix A :adjacency matrix of the graph D :(diagonal) degree matrix β:parameter Each column of R β gives the rankings relative to a node

26 Label propagation algorithm [Zhou et al. 2004] Form the affinity matrix W defined by if i != j and W ii = 0. Construct the matrix in which D is a diagonal matrix with its (i,i)-element equal to the sum of the i-th row of W. Iterate until convergence, where alpha is a parameter in (0,1) Let F * denote the limit of the sequence {F(t)}. Label each point x i as a label 26 X is a set of instances and x i is an instance

27 Laplacian label propagation algorithm Form the affinity matrix W defined by where A is a row-normalized instance-pattern co- occurrence matrix Construct the normalized Laplacian matrix Iterate until convergence, where alpha is a parameter in (0,1) Let F * denote the limit of the sequence {F(t)}. Label each point x i as a label 27

28 The sequence {F(t)} converges to F* = (1-α)(I-αS) -1 Y Proof: Suppose F(0) = Y. By iterative algorithm, Since 0 < α < 1 and the eigenvalues of (-L) in [-1, 1] Hence for classification, which is equivalent to 28 This is also a form of the regularized Laplacian kernel

29 Word Sense Disambiguation Evaluation of regularized Laplacian against Espresso

30 Label prediction of “bank” (F measure) AlgorithmMost frequent senseOther senses Simplified Espresso100.00.0 Espresso (after convergence)100.030.2 Espresso (optimal stopping)94.467.4 Regularized Laplacian ( β =10 -2 )92.162.8 30 The regularized Laplacian keeps high recall for infrequent senses Espresso suffers from semantic drift (unless stopped at optimal stage)

31 WSD on all nouns in Senseval-3 algorithmF measure Most frequent sense (baseline)54.5 HyperLex64.6 PageRank64.6 Simplified Espresso44.1 Espresso (after convergence)46.9 Espresso (optimal stopping)66.5 Regularized Laplacian ( β =10 -2 )67.1 31 Outperforms other graph-based methods Espresso needs optimal stopping to achieve an equivalent performance

32 Regularized Laplacian is stable across a parameter 32

33 Learning Semantic Categories Evaluation of regularized Laplacian by large-scaled web data

34 Graph-based approach to learn semantic knowledge We showed that some of bootstrapping algorithms can be regarded as computing similarity over a graph 34 Singapore Hong Kong ___ map ___ visa UFJ ANA ___ history ? China ___ airlines Never tried large- scale data ->Does it scale? Tested on word sense disambiguation task ->Is it task- independent? Never tried large- scale data ->Does it scale? Tested on word sense disambiguation task ->Is it task- independent?

35 Bootstrapping achieves high precision, but requires calibration in practice Pros Dedicated to Japanese web search query logs Outperforms previous methods on this task Orders of magnitude faster than previous bootstrapping methods Cons Needs to control many parameters and configurations Has little theoretical background Does not scale to large corpus 35

36 Graph-based methods are simple but effective on large-scale data such as web documents Pros Can scale to massive amount of raw data Firm mathematical background (Can employ classical optimization methods) Cons Computational efficiency (Can be approximated) No obvious “good” graph Requires computational resource (CPU, disk, memory, etc) and technical know-how 36

37 Quetchup algorithm (QUEry Term CHUnk Processor) Using clickthrough logs as source of information Semi-supervised learning with Laplacian Label Propagation Efficient computation over a graph 37

38 Learning semantic categories from clickthrough graph Query “Singapore” –(click)-> http://www.cikm2009.org/ 38 Singapore Hong Kong http://en.wikipedia.org/wiki/Hong_Kong http://www.bk.mufg.jp/ UFJ ANA http://www.singaporair.com/hk.jsp China http://www.china-airlines.co.jp/ http://www.ana.co.jp/ http://www.acl-ijcnlp-2009.org/ http://www.cikm2009.org/

39 Experimental setting Search logs: Japanese web search logs collected in August 2008; retained the top 10 million distinct queries Target categories: Travel and Finance [Komachi and Suzuki, 2008] Evaluation: precision at k; relative recall [Pantel and Ravichandran, 2004] 39 where R A|B is relative recall of system A given B, C X is the number of correct output from system X, C is the true number of the correct instances, P X is precison of system X, and |X| is the number of input instances for system X

40 Seed instances for each category CategorySeed Traveljal (Japan Airlines), ana (All Nippon Airways), jr (Japan Railways), じゃらん (jalan: online travel guide site), his (H.I.S.Co.,Ltd.: travel agency) Finance みずほ銀行 (Mizuho Bank), 三井住友銀行 (Sumitomo Mitsui Banking Corporation), jcb, 新生銀行 (Shinsei Bank), 野村證券 (Nomura Securities) 40

41 Precision of Travel domain 41 The regularized Laplacian gives best precision

42 Precision of Finance domain 42 Again, the regularized Laplacian gives best precision

43 Relative recall of Travel domain Clickthrough logs not only achieve high precision but also high recall 43

44 Relative recall of Finance domain 44

45 Relative recall of Quetchup with varying click:query 45

46 Precision of Quethcup with varying click:query 46

47 Instances and patterns with the top 10 highest scores SystemInstancePattern Quetchup click じゃらん 宿泊 (jalan accommodation), じゃラ ン (jalan), ジャラン (jalan), jarann, jaran, じゃ らん net (jalan.net), jalan, じゅらん (julan), ana 予約 (ana reservation), ana.co.jp www.jalan.net/, www.ana.co.jp/, www.his-j.com/, www.jreast.co.jp/, www.jtb.co.jp/, www.jtb.co.jp/ace/, www.westjr.co.jp/, www.jtb.co.jp/kaigai/, nippon.his.co.jp/, www.jr.cyberstation.ne.jp/ Quetchup query 中部発 (Traveling from Midland), his 関西 (his Kansai), 伊平屋島 (Iheya Island), ホテルコン チネンタル横浜 (Hotel Continental Yokohama), げんじいの森 (Genjii-no-mori; spa), フジサ ファリパーク (Fuji Safari Park), ad-box, アダ コミ (adacomi; offensive), スカイチーム (SkyTeam), ノースウェスト (Northwest) # 時刻表 (timetable), # 国内旅行 (domestic tour), # 宿泊 (accommodation), # 北海道 (Hokkaido), # 関西 (Kansai), # 九州 (Kyushu), # マイレージ (mileage), # 名 古屋 (Nagoya), # 沖縄 (Okinawa), # 温 泉 (spa) Tchai 静鉄バス (Seitetsu Bus), 相鉄バス (Sotetsu Bus), 函館バス (Hakodate Bus), 大阪地下鉄 (Osaka subway), 琴電 (Kotoden railways), 地下 鉄御堂筋線 (Subway Midosuji Line), 芸陽バス (Geiyo Bus), 新京成バス (Shin-keisei Bus), jr 阪和線 (Japan Railways Hanna Line), 常磐線 (Joban Line) # 時刻表, # 路線図 (route map), # 運 賃 (fare), # 料金 (fare), # 定期 (season ticket), # 運行状況 (service situation), # 路線 (route), # 定期代 (season ticket fare), # 定期券 (season ticket), # 時刻 47

48 Random samples of extracted instances Type (Frequency)Example Transportation (54) 広島 新幹線 (Hiroshima Super Express), 東海道線 (Tokaido Line), jr 飯田線 (JR Iida Line), jr 博多 (JR Hakata), 京都 新幹線 (Kyoto Super Express) Accommodation (10) ホテルビーナス (Hotel Venus), リーガロイヤルホテル大阪 (Rihga Royal Hotel Osaka), www.route-inn.co.jp, ホテル京阪ユニバーサル・シティ (Hotel Keihan Universal City), 札幌全日空ホテル (ANA Hotel Sapporo) Travel Information (10) 外務省 安全 (Foreign Ministry safety), チケットショップ 大阪 (ticket shop Osaka), 観光 関西 (Site seeing Kansai), 高山観光協会 (Takayama Tourism Association), グーグル ナビ (Google navi) Travel Agency (6) jr おでかけネット (Odekake net), 近畿ツー (Kinki Tourist), タビックス 静岡 (Tabix Shizuoka), フレックスインターナショナル (Flex International), オリ オンツアー (Orion Tour) Others (2) プロテカ (Proteca; bag for travel), jal 紀行倶楽部 (JAL Travel Club) Not Related (20) 格安航空チケット 海外 (discount flight ticket overseas), 新幹線予約状況 (Super express reservation situation), 新幹線 時刻表 (Super express timetable), 温泉宿 (spa accommodation), 新幹線 停車駅 (Super express stops), 虎 (tiger), youtubu 海外ドラマ (overseas drama), 法務部採用 (legal department recruitment), おくりびと (Okuribito; film), 社会人野球 (amateur baseball) 48

49 Learning curve of Quetchup click The more the data, the higher the precision 49

50 Sensitivity to Quetchup click to parameter α Clickthrough graph seems much denser than query graph, and large alpha (less weight on initial labels) yields better performance than small alpha 50

51 Conclusions Semantic drift in Espresso is a parallel form of topic drift in HITS The regularized Laplacian reduces semantic drift in two tasks: word sense disambiguation and named entity extraction inherently a relatedness measure (  importance measure) 51

52 Future work Investigate if a similar analysis is applicable to a wider class of bootstrapping algorithms (including co-training) Investigate the influence of seed selection to bootstrapping algorithms and propose a way to select effective seed instances Explore relation between Markov random walk and label propagation methods 52


Download ppt "Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology."

Similar presentations


Ads by Google