
Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms (EMNLP 2008)
Mamoru Komachi†, Taku Kudo‡, Masashi Shimbo† and Yuji Matsumoto†
† Nara Institute of Science and Technology, Japan   ‡ Google Inc.

Bootstrapping and semantic drift
Bootstrapping grows a small set of seed instances into a larger set by alternating between instances and the patterns they co-occur with:
- Seed instance: iPhone
- Co-occurrence pattern: buy # at Apple Store
- Extracted instance: iPod
- Co-occurrence pattern: # for sale
- Extracted instances: iPod, MacBook Air, car, house
Generic patterns = patterns which co-occur with many (irrelevant) instances. Here "# for sale" pulls in car and house, and the extracted set drifts away from the meaning of the seed.

Main contributions of this work
1. Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999)
2. Solve semantic drift by graph kernels used in the link analysis community

Espresso Algorithm [Pantel and Pennacchiotti, 2006]
Repeat:
- Pattern extraction
- Pattern ranking
- Pattern selection
- Instance extraction
- Instance ranking
- Instance selection
Until a stopping criterion is met

Espresso uses a pattern-instance matrix A for ranking patterns and instances
A is a |P| × |I| matrix holding the (normalized) pointwise mutual information (pmi) between patterns and instances:
[A]_{p,i} = pmi(p,i) / max_{p,i} pmi(p,i)
Rows are indexed by patterns (1 … |P|), columns by instances (1 … |I|).
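
As an illustration (our own sketch, not from the paper), A can be built from a raw pattern-instance co-occurrence count matrix; the pmi estimate below is one standard choice:

```python
import numpy as np

def pmi_matrix(counts):
    """Espresso-style pattern-instance matrix from a |P| x |I|
    co-occurrence count matrix: [A]_{p,i} = pmi(p,i) / max pmi."""
    total = counts.sum()
    p_marg = counts.sum(axis=1, keepdims=True) / total   # P(p)
    i_marg = counts.sum(axis=0, keepdims=True) / total   # P(i)
    joint = counts / total                               # P(p, i)
    with np.errstate(divide="ignore"):
        pmi = np.log(joint / (p_marg * i_marg))
    pmi[counts == 0] = 0.0     # treat pmi of unseen pairs as 0
    return pmi / pmi.max()     # normalize by the maximum pmi

# Toy data: 3 patterns x 4 instances
counts = np.array([[5., 2., 0., 0.],
                   [1., 4., 3., 0.],
                   [0., 1., 2., 6.]])
A = pmi_matrix(counts)         # largest entry is exactly 1
```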

Pattern/instance ranking in Espresso
p = pattern score vector, i = instance score vector, A = pattern-instance matrix
Pattern ranking: p = (1/|I|) A i
Instance ranking: i = (1/|P|) A^T p
Reliable instances are supported by reliable patterns, and vice versa. The factors 1/|I| and 1/|P| (|I| = number of instances, |P| = number of patterns) are normalization factors that keep the score vectors from growing too large.
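
In code, one round of ranking is just two matrix-vector products (a minimal numpy sketch; the toy matrix and seed are ours):

```python
import numpy as np

A = np.array([[0.9, 0.3, 0.0],     # toy |P| x |I| pmi matrix
              [0.2, 0.8, 0.4]])
n_patterns, n_instances = A.shape

i = np.array([1.0, 0.0, 0.0])      # seed: only the first instance

p = A @ i / n_instances            # pattern ranking
i = A.T @ p / n_patterns           # instance ranking
print(p, i)
```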

Espresso Algorithm [Pantel and Pennacchiotti, 2006]
Repeat:
- Pattern extraction
- Pattern ranking
- Pattern selection
- Instance extraction
- Instance ranking
- Instance selection
Until a stopping criterion is met
For graph-theoretic analysis, we will introduce 3 simplifications to Espresso.

First simplification
Simplification 1: Remove the pattern/instance extraction steps. Instead, pre-compute all patterns and instances once at the beginning of the algorithm.
- Compute the pattern-instance matrix
- Repeat:
  - Pattern ranking
  - Pattern selection
  - Instance ranking
  - Instance selection
- Until a stopping criterion is met

Second simplification
Simplification 2: Remove the pattern/instance selection steps, which retain only the highest-scoring k patterns / m instances for the next iteration (i.e., reset the scores of all other items to 0). Instead, retain the scores of all patterns and instances.
- Compute the pattern-instance matrix
- Repeat:
  - Pattern ranking
  - Instance ranking
- Until a stopping criterion is met
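
The selection step being removed amounts to a top-k mask over the score vector; a sketch of the difference, assuming the scores live in a numpy array:

```python
import numpy as np

def select_top(scores, k):
    """Espresso's selection: keep the k highest scores, reset the
    rest to 0. Simplified Espresso skips this and keeps all scores."""
    kept = np.zeros_like(scores)
    top = np.argsort(scores)[-k:]     # indices of the k largest scores
    kept[top] = scores[top]
    return kept

scores = np.array([0.1, 0.7, 0.3, 0.9])
print(select_top(scores, k=2))        # [0.  0.7 0.  0.9]
```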

Third simplification
Simplification 3: No early stopping, i.e., run until convergence.
- Compute the pattern-instance matrix
- Repeat:
  - Pattern ranking
  - Instance ranking
- Until the score vectors p and i converge

Simplified Espresso
Input:
- Initial score vector of seed instances i
- Pattern-instance co-occurrence matrix A
Main loop: repeat
- Pattern ranking: p = (1/|I|) A i
- Instance ranking: i = (1/|P|) A^T p
until i and p converge
Output: instance and pattern score vectors i and p
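
Putting the two ranking updates in a loop gives the whole algorithm. A minimal runnable sketch (the toy matrix, tolerance, and explicit renormalization are our choices):

```python
import numpy as np

def simplified_espresso(A, seed, tol=1e-9, max_iter=1000):
    """Iterate pattern and instance ranking until the vectors converge."""
    n_patterns, n_instances = A.shape
    i = seed / np.linalg.norm(seed)
    for _ in range(max_iter):
        p = A @ i / n_instances           # pattern ranking
        i_new = A.T @ p / n_patterns      # instance ranking
        i_new /= np.linalg.norm(i_new)    # keep scores from shrinking away
        if np.linalg.norm(i_new - i) < tol:
            break
        i = i_new
    return p, i

A = np.array([[0.9, 0.3, 0.0],
              [0.2, 0.8, 0.4]])
p, i = simplified_espresso(A, seed=np.array([1.0, 0.0, 0.0]))
```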

Simplified Espresso is HITS
Simplified Espresso is HITS (Kleinberg, 1999) on the bipartite graph of patterns and instances whose adjacency matrix is A.
Problem: no matter which seed you start with, the same instance is always ranked topmost. This is semantic drift (called topic drift in HITS): the ranking vector i tends to the principal eigenvector of A^T A as the iteration proceeds, regardless of the seed instances!
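
The claim is easy to verify numerically: two different seeds converge to the same instance ranking, which matches the principal eigenvector of A^T A up to sign (a self-contained numpy check on a toy matrix of ours):

```python
import numpy as np

A = np.array([[0.9, 0.3, 0.0],
              [0.2, 0.8, 0.4]])

def converge(seed, n_iter=500):
    i = seed / np.linalg.norm(seed)
    for _ in range(n_iter):
        i = A.T @ (A @ i)          # one round of both ranking steps
        i /= np.linalg.norm(i)
    return i

print(converge(np.array([1.0, 0.0, 0.0])))   # seed on instance 0
print(converge(np.array([0.0, 0.0, 1.0])))   # seed on instance 2: same result

w, v = np.linalg.eigh(A.T @ A)
print(v[:, np.argmax(w)])          # principal eigenvector (up to sign)
```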

How about the original Espresso?
Original Espresso has heuristics not present in Simplified Espresso:
- Early stopping
- Pattern and instance selection
Do these heuristics really help reduce semantic drift?

Experiments on semantic drift
Do the heuristics in the original Espresso help reduce drift?

Word sense disambiguation task of Senseval-3 English Lexical Sample
Training instances are annotated with their sense; predict the sense of the target word in the test set.
Example: predict the sense of "bank"
- "... the financial benefits of the bank (finance)'s employee package (cheap mortgages and pensions, etc), bring this up to ..."
- "In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden."
- "Possibly aligned to water a sort of bank (???) by a rushing river."

Word sense disambiguation by Original Espresso
- Seed instance = the instance whose sense is to be predicted
- System output = k-nearest neighbors (k = 3)
Heuristics of Espresso:
- Pattern and instance selection: number of patterns to retain p = 20 (increase p by 1 on each iteration); number of instances to retain m = 100 (increase m by 100 on each iteration)
- Early stopping
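
Our reading of this setup as code (a hypothetical sketch; variable names and data are ours): after running Espresso with the test instance as seed, predict the majority sense among the k = 3 highest-ranked labeled training instances:

```python
from collections import Counter
import numpy as np

def knn_sense(instance_scores, train_labels, k=3):
    """Majority vote over the k highest-ranked labeled instances."""
    ranked = np.argsort(instance_scores)[::-1]      # best first
    top = [train_labels[j] for j in ranked if j in train_labels][:k]
    return Counter(top).most_common(1)[0][0]

scores = np.array([0.9, 0.1, 0.6, 0.5, 0.3])        # converged instance scores
labels = {0: "finance", 2: "finance", 3: "river"}   # annotated training instances
print(knn_sense(scores, labels))                    # -> finance
```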

Convergence process of Espresso
[Figure: accuracy over iterations for Original Espresso, Simplified Espresso, and the most frequent sense baseline]
- The heuristics in Espresso help reduce semantic drift (however, early stopping is required for optimal performance)
- In Simplified Espresso, semantic drift occurs: it converges to outputting the most frequent sense regardless of input

Learning curve of Original Espresso: per-sense breakdown
[Figure: recall over iterations for the most frequent sense vs. other senses]
- The number of most-frequent-sense predictions increases with iterations
- Recall for infrequent senses worsens even with original Espresso

Summary: Espresso and semantic drift
Semantic drift happens because:
- Espresso is designed like HITS
- HITS gives the same ranking list regardless of seeds
Some heuristics reduce semantic drift:
- Early stopping is crucial for optimal performance
Still, these heuristics require many parameters to be calibrated, and calibration is difficult.

Main contributions of this work
1. Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999)
2. Solve semantic drift by graph kernels used in the link analysis community

Q. What caused drift in Espresso?
A. Espresso's resemblance to HITS. HITS is an importance computation method: it gives a single ranking list for any set of seeds.
Why not use another type of link analysis measure, one which takes the seeds into account? A "relatedness" measure gives different rankings for different seeds.

The regularized Laplacian kernel
- A relatedness measure
- Takes higher-order relations into account
- Has only one parameter
Graph Laplacian: L = D - A, where A is the adjacency matrix of the graph and D is the (diagonal) degree matrix.
Regularized Laplacian matrix: R_β = Σ_{n=0}^{∞} β^n (-L)^n = (I + βL)^{-1}, where β is a parameter.
Each column of R_β gives the rankings relative to a node.
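
A minimal numpy sketch of this kernel (the toy graph is ours); for small graphs the closed form (I + βL)^{-1} can be computed directly:

```python
import numpy as np

def regularized_laplacian(A, beta):
    """R_beta = (I + beta * L)^{-1} with L = D - A.
    Column j holds relatedness scores of every node to node j."""
    D = np.diag(A.sum(axis=1))          # diagonal degree matrix
    L = D - A                           # graph Laplacian
    return np.linalg.inv(np.eye(len(A)) + beta * L)

# Toy undirected graph: path 0 - 1 - 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
R = regularized_laplacian(A, beta=0.01)
print(R[:, 0])    # rankings relative to node 0: node 0 > 1 > 2
```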

Evaluation of the regularized Laplacian
Comparison to (Agirre et al., 2006)

WSD on all nouns in Senseval-3

Algorithm                            F measure
Most frequent sense (baseline)       54.5
HyperLex                             64.6
PageRank                             64.6
Simplified Espresso                  44.1
Espresso (after convergence)         46.9
Espresso (optimal stopping)          66.5
Regularized Laplacian (β = 10^-2)    67.1

- The regularized Laplacian outperforms the other graph-based methods
- Espresso needs optimal stopping to achieve an equivalent performance

Conclusions
- Semantic drift in Espresso is a parallel form of topic drift in HITS
- The regularized Laplacian reduces semantic drift: it is inherently a relatedness measure, not an importance measure

Future work
- Investigate whether a similar analysis is applicable to a wider class of bootstrapping algorithms (including co-training)
- Try other popular tasks of bootstrapping, such as named entity extraction
  - Selection of seed instances matters


Label prediction of "bank" (F measure)

Algorithm                            Most frequent sense    Other senses
Simplified Espresso
Espresso (after convergence)
Espresso (optimal stopping)
Regularized Laplacian (β = 10^-2)

- The regularized Laplacian keeps high recall for infrequent senses
- Espresso suffers from semantic drift (unless stopped at the optimal stage)

The regularized Laplacian is mostly stable across values of its parameter β
[Figure: F measure as a function of β]

Pattern/instance ranking in Espresso
Score for pattern p:   r_π(p) = (1/|I|) Σ_{i∈I} pmi(p,i)/max pmi · r_ι(i)
Score for instance i:  r_ι(i) = (1/|P|) Σ_{p∈P} pmi(p,i)/max pmi · r_π(p)
where p is a pattern, i an instance, P the set of patterns, I the set of instances, pmi the pointwise mutual information, and max pmi the maximum of pmi over all patterns and instances.
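
Transliterated directly into code (a sketch with a toy pmi table; names and data are ours):

```python
def score_pattern(p, inst_score, pmi, max_pmi, I):
    """r_pi(p) = (1/|I|) * sum_i pmi(p,i)/max_pmi * r_iota(i)"""
    return sum(pmi.get((p, i), 0.0) / max_pmi * inst_score[i] for i in I) / len(I)

def score_instance(i, pat_score, pmi, max_pmi, P):
    """r_iota(i) = (1/|P|) * sum_p pmi(p,i)/max_pmi * r_pi(p)"""
    return sum(pmi.get((p, i), 0.0) / max_pmi * pat_score[p] for p in P) / len(P)

# Toy example with one seed instance and two patterns
pmi = {("buy # at Apple Store", "iPhone"): 2.0, ("# for sale", "iPhone"): 0.5}
I, P, max_pmi = ["iPhone"], ["buy # at Apple Store", "# for sale"], 2.0
inst_score = {"iPhone": 1.0}
pat_score = {p: score_pattern(p, inst_score, pmi, max_pmi, I) for p in P}
print(pat_score)   # the Apple Store pattern scores higher
```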