Exploiting Background Knowledge for Relation Extraction
Yee Seng Chan and Dan Roth
University of Illinois at Urbana-Champaign
Relation Extraction
Relation extraction (RE):
- "David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team"
Supervised RE:
- Train on sentences annotated with entity mentions and predefined target relations
- Common features: BOW, POS tags, syntactic/dependency parses, kernel functions based on structured representations of the sentence

Background Knowledge
Features employed are usually restricted to being defined on the various representations of the target sentences.
Humans rely on background knowledge to recognize relations.
Overall aim of this work:
- Propose methods of using knowledge or resources that exist beyond the sentence: Wikipedia, word clusters, hierarchy of relations, entity type constraints, coreference
- Used as additional features, or under the Constrained Conditional Model (CCM) framework with Integer Linear Programming (ILP)

Using Background Knowledge
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team

Using Background Knowledge
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
Wikipedia excerpts:
- "David Brian Cone (born January 2, 1963) is a former Major League Baseball pitcher. He compiled an 8–3 postseason record over 21 postseason starts and was a part of five World Series championship teams (1992 with the Toronto Blue Jays and 1996, 1998, 1999 & 2000 with the New York Yankees). ... He is the subject of the book A Pitcher's Story: Innings With David Cone by Roger Angell. Fans of David are known as "Cone-Heads." Cone lives in Stamford, Connecticut, and is formerly a color commentator for the Yankees on the YES Network."
- "Partly because of the resulting lack of leadership, after the 1994 season the Royals decided to reduce payroll by trading pitcher David Cone and outfielder Brian McRae, then continued their salary dump in the 1995 season. In fact, the team payroll, which was always among the league's highest, was sliced in half from $40.5 million in 1994 (fourth-highest in the major leagues) to $18.5 million in 1996 (second-lowest in the major leagues)"

Using Background Knowledge
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
fine-grained
  Employment:Staff      0.20
  Employment:Executive  0.15
  Personal:Family       0.10
  Personal:Business     0.10
  Affiliation:Citizen   0.20
  Affiliation:Based-in  0.25

Using Background Knowledge
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
fine-grained                  coarse-grained
  Employment:Staff      0.20    Employment
  Employment:Executive  0.15
  Personal:Family       0.10    Personal
  Personal:Business     0.10
  Affiliation:Citizen   0.20    Affiliation
  Affiliation:Based-in  0.25

Basic Relation Extraction (RE) System
Our BasicRE system:
- Given a sentence "... m1 ... m2 ...", predict whether any predefined relation holds
- Asymmetric relations, e.g. m1:r:m2 vs m2:r:m1

Basic Relation Extraction (RE) System
Most of the features are based on the work of Zhou et al. (2005):
- Lexical: head words, BOW, bigrams, ...
- Collocations: words to the left/right of the mentions, ...
- Structural: m1-in-m2, #mentions between m1 and m2, ...
- Entity typing: m1, m2 entity types, ...
- Dependency: dependency path between m1 and m2, ...
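To make the feature families concrete, here is a minimal sketch (not the authors' implementation; all feature names and the head-word approximation are hypothetical) extracting a few such features for a mention pair given as (start, end) token spans:

```python
def extract_features(tokens, m1, m2, types):
    """Sketch of a few Zhou et al. (2005)-style features for a mention
    pair; m1, m2 are (start, end) token offsets, types are coarse
    entity types. Feature names here are made up for illustration."""
    feats = []
    # Lexical: head word of each mention (approximated as its last token)
    feats.append("HW1=" + tokens[m1[1] - 1])
    feats.append("HW2=" + tokens[m2[1] - 1])
    # Lexical: bag of words between the two mentions
    for w in tokens[m1[1]:m2[0]]:
        feats.append("BOW_BETWEEN=" + w)
    # Collocations: single word to the left of m1 / right of m2
    if m1[0] > 0:
        feats.append("LEFT_M1=" + tokens[m1[0] - 1])
    if m2[1] < len(tokens):
        feats.append("RIGHT_M2=" + tokens[m2[1]])
    # Structural: number of tokens between the mentions
    feats.append("DIST=%d" % (m2[0] - m1[1]))
    # Entity typing: coarse-grained entity type pair
    feats.append("TYPES=%s_%s" % types)
    return feats
```

For the running example, the pair ("David Cone", "Kansas City") would yield features such as HW1=Cone, BOW_BETWEEN=a, and TYPES=PER_GPE.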

Knowledge Sources
As additional features:
- Wikipedia
- Word clusters
As constraints:
- Hierarchy of relations
- Entity type constraints
- Coreference

Knowledge 1: Wikipedia (as additional feature)
We use a Wikifier system (Ratinov et al., 2010) which performs context-sensitive mapping of mentions to Wikipedia pages.
- Introduce a new feature based on the Wikifier output for m_i, m_j
- Introduce a new feature by combining the above with the coarse-grained entity types of m_i, m_j
[Figure: mention pair m_i, m_j; does a relation r hold between them?]

Knowledge 1: Wikipedia (as additional feature)
Given m_i, m_j, we use a Parent-Child system (Do and Roth, 2010) to predict whether they have a parent-child relation.
- Introduce a new feature based on the Parent-Child prediction
- Combine the above with the coarse-grained entity types of m_i, m_j
[Figure: mention pair m_i, m_j; parent-child?]

Knowledge 2: Word Class Information (as additional feature)
Supervised systems face an issue of data sparseness (of lexical features).
Use class information of words to support better generalization; instantiated as word clusters in our work.
- Automatically generated from unlabeled text using the algorithm of Brown et al. (1992)
[Figure: a binary hierarchy over the words apple, pear, Apple, IBM, bought, run, of, in]

Knowledge 2: Word Class Information
All lexical features consisting of single words are duplicated with their corresponding bit-string representation.
[Figure: the word hierarchy, with each word mapped to a bit string, e.g. 011]
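The duplication step can be sketched as follows. The bit strings below are hypothetical stand-ins for the slide's toy vocabulary; real clusters are induced from unlabeled text with the Brown et al. (1992) algorithm:

```python
# Hypothetical Brown-cluster bit strings (for illustration only).
CLUSTERS = {"apple": "0110", "pear": "0111", "Apple": "0100", "IBM": "0101",
            "bought": "1010", "run": "1011", "of": "1100", "in": "1101"}

def add_cluster_features(feats, clusters=CLUSTERS):
    """Duplicate every single-word lexical feature ("NAME=word") with
    its bit-string cluster representation, as described on the slide."""
    out = list(feats)
    for f in feats:
        name, _, word = f.partition("=")
        bits = clusters.get(word)
        if bits is not None:
            out.append("%s_CLUSTER=%s" % (name, bits))
    return out
```

A feature like HW1=bought would thus be accompanied by HW1_CLUSTER=1010, letting the classifier generalize from "bought" to other words in the same cluster.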

Knowledge Sources
As additional features:
- Wikipedia
- Word clusters
As constraints:
- Hierarchy of relations
- Entity type constraints
- Coreference

Constrained Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)

argmax_y  Σ_i w_i f_i(x, y)  −  Σ_k ρ_k d(y, 1_{C_k}(x))

- w: weight vector for "local" models
- f_i: collection of classifiers

Constrained Conditional Models (CCMs)

argmax_y  Σ_i w_i f_i(x, y)  −  Σ_k ρ_k d(y, 1_{C_k}(x))

- w: weight vector for "local" models
- f_i: collection of classifiers
- ρ_k: penalty for violating the constraint
- d(y, 1_{C_k}(x)): how far y is from a "legal" assignment

Constrained Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)
Knowledge sources used:
- Wikipedia
- word clusters
- hierarchy of relations
- entity type constraints
- coreference

Constrained Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)
Goal of CCMs: predict multiple variables while exploiting the fact that they are related.
- Encode knowledge as constraints to exploit interactions between the multiple predictions
- Impose constraints on the predictions of the various models: this is a global inference problem
We learn separate models and then perform joint global inference to arrive at the final predictions.

Constrained Conditional Models (CCMs)
David Cone, a Kansas City native, was originally signed by the Royals and broke into the majors with the team
fine-grained                  coarse-grained
  Employment:Staff      0.20    Employment
  Employment:Executive  0.15
  Personal:Family       0.10    Personal
  Personal:Business     0.10
  Affiliation:Citizen   0.20    Affiliation
  Affiliation:Based-in  0.25

Constrained Conditional Models (CCMs) (Roth and Yih, 2007; Chang et al., 2008)
Key steps:
- Write down a linear objective function
- Write down constraints as linear inequalities
- Solve using integer linear programming (ILP) packages

Knowledge 3: Relations between our target relations
[Figure: relation hierarchy; coarse-grained labels such as personal and employment, with fine-grained children family and biz (under personal) and executive and staff (under employment)]

Knowledge 3: Hierarchy of Relations
[Figure: the relation hierarchy; a coarse-grained classifier predicts over the coarse labels (personal, employment, ...) and a fine-grained classifier predicts over the fine labels (family, biz, executive, staff, ...)]

Knowledge 3: Hierarchy of Relations
[Figure: for a mention pair m_i, m_j, the two classifiers ask: which coarse-grained label? which fine-grained label?]

Knowledge 3: Hierarchy of Relations
Write down a linear objective function:

max  Σ_rc p(rc) x_rc  +  Σ_rf p(rf) x_rf

- p(rc), p(rf): coarse-grained and fine-grained prediction probabilities
- x_rc, x_rf: coarse-grained and fine-grained indicator variables
- an indicator variable set to 1 == a relation assignment

Knowledge 3: Hierarchy of Relations
Write down constraints:
- If a relation R is assigned a coarse-grained label rc, then we must also assign to R a fine-grained relation rf which is a child of rc
- (Capturing the inverse relationship) If we assign rf to R, then we must also assign to R the parent of rf, which is the corresponding coarse-grained label
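On a label set this small the effect of the constrained objective is easy to see by enumeration. The sketch below maximizes the same objective under the hierarchy constraint by brute force (a real system would hand the objective and constraints to an ILP solver); the fine-grained scores come from the running example, while the coarse-grained scores are hypothetical, since the slide does not give them:

```python
import itertools

# Fine-grained scores from the running example.
FINE = {"Employment:Staff": 0.20, "Employment:Executive": 0.15,
        "Personal:Family": 0.10, "Personal:Business": 0.10,
        "Affiliation:Citizen": 0.20, "Affiliation:Based-in": 0.25}
# Hypothetical coarse-grained scores (not given on the slide).
COARSE = {"Employment": 0.5, "Personal": 0.3, "Affiliation": 0.2}

def infer(coarse_scores, fine_scores):
    """Pick the (coarse, fine) pair maximizing the summed scores,
    subject to the hierarchy constraint that the fine label must be a
    child of the chosen coarse label. Enumeration stands in for ILP."""
    best, best_score = None, float("-inf")
    for rc, rf in itertools.product(coarse_scores, fine_scores):
        if rf.split(":")[0] != rc:  # hierarchy constraint
            continue
        score = coarse_scores[rc] + fine_scores[rf]
        if score > best_score:
            best, best_score = (rc, rf), score
    return best
```

With these scores the unconstrained fine-grained argmax would be Affiliation:Based-in, but joint inference prefers the globally consistent pair (Employment, Employment:Staff).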

Knowledge 4: Entity Type Constraints (Roth and Yih, 2004, 2007)
Entity types are useful for constraining the possible labels that a relation R can assume.
[Figure: mention pair m_i, m_j and candidate labels: Employment:Staff, Employment:Executive, Personal:Family, Personal:Business, Affiliation:Citizen, Affiliation:Based-in]

Knowledge 4: Entity Type Constraints (Roth and Yih, 2004, 2007)
Entity types are useful for constraining the possible labels that a relation R can assume.
[Figure: each candidate label paired with the entity types (per, org, gpe) its arguments may take]

Knowledge 4: Entity Type Constraints (Roth and Yih, 2004, 2007)
We gather information on entity type constraints from the ACE-2004 documentation and impose them on the coarse-grained relations.
- By improving the coarse-grained predictions and combining with the hierarchical constraints defined earlier, the improvements propagate to the fine-grained predictions
[Figure: candidate labels annotated with their allowed argument types]
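Applied before inference, such constraints simply rule out type-incompatible labels. A minimal sketch, with a hypothetical subset of constraints (the real tables come from the ACE-2004 documentation):

```python
# Hypothetical argument-type constraints, for illustration only.
ALLOWED = {"Employment":  {("per", "org"), ("per", "gpe")},
           "Personal":    {("per", "per")},
           "Affiliation": {("per", "org"), ("per", "gpe"), ("org", "gpe")}}

def apply_type_constraints(scores, t1, t2, allowed=ALLOWED):
    """Zero out any coarse-grained relation label whose argument-type
    signature (t1, t2) violates the entity-type constraints."""
    return {r: (p if (t1, t2) in allowed.get(r, set()) else 0.0)
            for r, p in scores.items()}
```

For a (per, org) mention pair, a Personal label is eliminated before the hierarchy constraints are applied, so the pruning also benefits the fine-grained predictions.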

Knowledge 5: Coreference
In this work, we assume that we are given the coreference information, which is available from the ACE annotation.
[Figure: for a coreferent mention pair m_i, m_j, the relation label is constrained to null]

Experiments
Used the ACE-2004 dataset for our experiments:
- Relations do not cross sentence boundaries
- We model the argument order (of the mentions): m1:r:m2 vs m2:r:m1
- Allow a null label prediction when mentions are not related
Classifiers:
- Regularized averaged perceptrons implemented within the SNoW framework (Carlson et al., 1999)
- Followed prior work (Jiang and Zhai, 2007) and performed 5-fold cross validation

Performance of the Basic RE System
Build a BasicRE system using only the basic features.
Compare against the state-of-the-art feature-based RE system of Jiang and Zhai (2007):
- The authors performed their evaluation using undirected coarse-grained relations (7 relation labels + 1 null label)
- Evaluation on the nwire and bnews corpora of ACE-2004
- Performance (F1%):

  Jiang and Zhai (2007)   BasicRE
  71.5%                   71.2%

Experimental Settings
ACE-2004: 7 coarse-grained and 23 fine-grained relations.
Trained two classifiers:
- coarse-grained (15 relation labels)
- fine-grained (47 relation labels)
Focus on evaluation of fine-grained relations.
Use the nwire corpus for our experiments:
- Two of our knowledge sources (the Wikifier system, word clusters) assume mixed-case text as input; the bnews corpus is in lower-cased text
- 28,943 relation instances, of which 2,226 are non-null

Evaluation Settings

  Prior work                                Our work
  Train-test data splits at mention level   Train-test data splits at document level
  Evaluation at mention level               Evaluation at entity level (more realistic)

Experimental Settings
Evaluate our performance at the entity level:
- Prior work calculated RE performance at the level of mentions
- ACE annotators rarely duplicate a relation link for coreferent mentions
- Given a pair of entities, we establish the set of relation types existing between them, based on their mention annotations
[Figure: mentions m_i, m_j, m_k; relation r is annotated between m_i and m_j, while the link between m_k and m_j is left unannotated (null?)]
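The entity-level aggregation described above can be sketched as follows (a hedged illustration of the evaluation setup; all names are hypothetical):

```python
def entity_level_relations(mention_predictions, entity_of):
    """Collapse mention-level relation predictions to the entity level:
    for each entity pair, collect the set of non-null relation types
    predicted between any of their mentions."""
    relations = {}
    for (mi, mj), label in mention_predictions:
        if label == "null":
            continue
        pair = (entity_of[mi], entity_of[mj])
        relations.setdefault(pair, set()).add(label)
    return relations
```

Precision and recall are then computed by comparing these predicted per-entity-pair relation sets against the sets derived from the gold mention annotations, so a relation duplicated across coreferent mentions counts only once.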

Experiment Results (F1%): fine-grained relations

             All nwire   10% of nwire
  BasicRE    50.5%       31.0%

Experiment Results (F1%): fine-grained relations

             All nwire   10% of nwire
  BasicRE    50.5%       31.0%

[Figure: F1% improvement from using each knowledge source]

Related Work
Ji et al. (2005), Zhou et al. (2005, 2008), Jiang (2009)

Conclusion
We proposed a broad range of methods to inject background knowledge into an RE system.
Some methods (e.g. exploiting the relation hierarchy) are general in nature.
To combine the various relation predictions, we perform global inference within an ILP framework:
- This allows for easy injection of knowledge as constraints
- It ensures globally coherent models and predictions