Illinois-Coref: The UI System in the CoNLL-2012 Shared Task
Kai-Wei Chang, Rajhans Samdani, Alla Rozovskaya, Mark Sammons, and Dan Roth
Supported by ARL, and by DARPA, under the Machine Reading Program

Coreference
Coreference resolution is the task of grouping all the mentions of entities into equivalence classes so that each class represents a discourse entity. In the example below, the mentions are colour-coded to indicate which mentions are co-referent (overlapping mentions have been omitted for clarity).
An American official announced that American President Bill Clinton met his Russian counterpart, Vladimir Putin, today. The president said that Russia was a great country.

System Architecture
 Mention Detection
 Coreference Resolution with a Pairwise Coreference Model
 Inference procedure: Best-Link strategy with knowledge-based constraints
 Post-Processing

CoNLL Shared Task 2012
The CoNLL Shared Task 2012 is an extension of last year's coreference task. We use the Illinois-Coref system from CoNLL-2011 as the basis for our current system. We improve over the baseline system by:
 Improving mention detection.
 Using separate models for coreference resolution on pronominal and non-pronominal mentions.
 Designing a latent structured learning protocol for Best-Link inference.

Experiments and Results
We present the performance of the system on both the OntoNotes-4.0 and OntoNotes-5.0 datasets:
 Results on the CoNLL-12 DEV set with predicted mentions, comparing the Baseline, Sep. Pronouns, Latent Structure, and Final System (+Both) configurations on MD, MUC, BCUB, CEAF, and AVG.
 Results on the CoNLL-11 DEV set with predicted mentions, comparing the Baseline and the New System on MUC, BCUB, CEAF, and AVG.
 Official scores on the CoNLL-12 TEST set, reported for English (predicted mentions), English (gold mention boundaries), English (gold mentions), and Chinese (predicted mentions) on MD, MUC, BCUB, CEAF, and AVG. For the official submission we train the system on both the TRAIN and DEV sets.

Discussions and Conclusions
 We described strategies for improving mention detection and proposed an online latent structured learning algorithm for coreference resolution.
 Using separate classifiers for coreference resolution on pronominal and non-pronominal mentions improves the system.
 The performance of our Chinese coreference system is not as good as that for English. This may be because we use the same features for English and Chinese, and the feature set was developed for the English corpus.
 Overall, the system improves over the baseline by 5% MUC F1, 0.8% BCUB F1, and 1.7% CEAF F1.

Pronoun Resolution
 The baseline uses an identical coreference model for both pronominal and non-pronominal mentions.
 However, the features useful for pronouns and non-pronouns are usually different: lexical features are more important for non-pronominal coreference resolution, while gender features are more important for pronoun anaphora resolution.
New System: Use separate models.
 We train two separate classifiers with different sets of features for pronoun and non-pronoun coreference.
 The pronoun anaphora classifier includes features that identify the speaker and reflect the document type.

Learning Protocol for Best-Link
We investigate two learning protocols for Best-Link inference.
Baseline: Binary Classification. The baseline system applies the strategy of (Bengtson and Roth, 2008) to learn the pairwise scoring function w on:
 Positive examples: for each mention u, we construct a positive example (u, v), where v is the closest preceding mention in u's equivalence class.
 Negative examples: all mention pairs (u, v), where v is a preceding mention of u and u and v are in different classes.
Drawbacks:
 The protocol suffers from a severe label imbalance problem.
 It does not relate well to the Best-Link inference. E.g., consider three mentions belonging to the same cluster: {m1: "President Bush", m2: "he", m3: "George Bush"}. The positive pair chosen by the baseline system is (m2, m3); a better choice is (m1, m3).
New System: Latent Structured Learning. We consider a latent structured learning algorithm with:
 Input: a mention v and its preceding mentions {u | u < v}.
 Output: y(v), the set of antecedents that co-refer with v.
 h(v): a latent structure that denotes the Best-Link decision.
 A loss function used for loss-augmented inference.
At each iteration, the algorithm takes the following steps (see the sketch below):
1. Pick a mention v and find the Best-Link decision u* that is consistent with the gold cluster.
2. Find the Best-Link decision u' under the current model by solving a loss-augmented inference.
3. Update the model with the difference of the feature vectors Φ(u', v) - Φ(u*, v).
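The following Python sketch illustrates one such latent structured update, under simplifying assumptions: mentions are plain strings, the feature function phi is a toy stand-in for the system's real feature set, and the update direction follows the standard structured-perceptron convention (the poster does not specify the sign or a learning rate). All names are illustrative, not the actual Illinois-Coref code.

```python
# Minimal sketch of the online latent structured update for Best-Link,
# assuming a sparse perceptron-style weight vector and toy pairwise features.
from collections import defaultdict

def phi(u, v):
    """Toy pairwise features for mentions given as strings (stand-in for the
    real lexical/gender/speaker features)."""
    return {
        "bias": 1.0,
        "head_match": 1.0 if u.split()[-1].lower() == v.split()[-1].lower() else 0.0,
    }

def score(w, u, v):
    return sum(w[f] * x for f, x in phi(u, v).items())

def latent_structured_update(w, v, preceding, gold_antecedents, delta=1.0, lr=1.0):
    """One online update for mention v (assumes v has at least one gold antecedent).

    preceding        -- mentions appearing before v (candidate antecedents)
    gold_antecedents -- subset of `preceding` in v's gold cluster, i.e. y(v)
    delta            -- loss charged to links that leave the gold cluster
    """
    # Step 1: Best-Link decision consistent with the gold cluster (the latent h(v)).
    u_star = max(gold_antecedents, key=lambda u: score(w, u, v))

    # Step 2: Best-Link decision under the current model via loss-augmented
    # inference: links outside the gold cluster get an extra margin `delta`,
    # so the most violated antecedent is selected.
    u_prime = max(preceding,
                  key=lambda u: score(w, u, v) + (0.0 if u in gold_antecedents else delta))

    # Step 3: update with the difference of the feature vectors Phi(u', v) - Phi(u*, v),
    # moving the weights toward the gold-consistent link (perceptron convention assumed).
    if u_prime != u_star:
        for f, x in phi(u_star, v).items():
            w[f] += lr * x
        for f, x in phi(u_prime, v).items():
            w[f] -= lr * x
    return w

# Usage: w = defaultdict(float); for each mention v in document order, call
# latent_structured_update(w, v, preceding_mentions, gold_antecedents_of_v).
```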
Best-Link Inference Procedure
The inference procedure takes as input a set of pairwise mention scores over a document and aggregates them into globally consistent cliques representing entities.
Best-Link inference: for each mention, Best-Link considers the best mention on its left to connect to (best according to the pairwise score) and creates a link between them if the pairwise score is above some threshold.

Improving Mention Detection
The baseline system implements a high-recall, low-precision rule-based mention detector. Error analysis shows two main types of errors.
Incorrect Mention Boundaries
There are two main reasons for boundary errors:
 Parse mistakes: a wrong attachment, or extra words added to a mention. For example, if the parser attaches the relative clause in "President Bush, who traveled to China yesterday" to a different noun, the system may predict only "President Bush" as the mention.
  Solutions (we can only fix a subset of such errors):
   Rule-based: fix mentions starting with a stop word and mentions ending with a punctuation mark.
   Use training data to learn patterns of inappropriate mention boundaries.
 Annotation inconsistency: a punctuation mark or an apostrophe marking the possessive form is inconsistently included at the end of mentions.
  Solutions:
   In the training phase, perform a relaxed matching between predicted mentions and gold mentions.
   We cannot fix this problem in the test phase.
Non-referential Noun Phrases
 Candidate noun phrases that are unlikely to refer to any entity in the real world (e.g., "the same time").
 They are not considered mentions in OntoNotes.
Solutions: use statistics from the training data (see the sketch below).
1. Count the number of times each candidate mention appears as a gold mention in the training data.
2. Remove candidate mentions that appear frequently in the training data but never appear as gold mentions.
3. The predicted head word and the words before and after the mention can also be taken into account (e.g., this helps to remove the noun "fact" in the phrase "in fact").
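As a rough illustration of this statistics-based filter, the Python sketch below counts how often each candidate string ever appears as a gold mention in training data and drops candidates that are frequent but never gold. The data format, function names, and the frequency threshold are assumptions for illustration, not the system's actual implementation.

```python
# Minimal sketch of the frequency-based filter for non-referential noun phrases,
# assuming training documents are dicts holding lists of candidate and gold
# mention strings.
from collections import Counter

def build_mention_statistics(training_docs):
    """Count how often each candidate surface string occurs, and how often it
    also occurs as a gold mention, in the training data."""
    candidate_counts, gold_counts = Counter(), Counter()
    for doc in training_docs:
        gold_surfaces = {g.lower() for g in doc["gold_mentions"]}
        for cand in doc["candidate_mentions"]:
            key = cand.lower()
            candidate_counts[key] += 1
            if key in gold_surfaces:
                gold_counts[key] += 1
    return candidate_counts, gold_counts

def filter_non_referential(candidates, candidate_counts, gold_counts, min_seen=10):
    """Drop candidates (e.g. "the same time") that occur frequently in training
    but are never annotated as gold mentions; `min_seen` is an illustrative threshold."""
    kept = []
    for cand in candidates:
        key = cand.lower()
        frequent_but_never_gold = candidate_counts[key] >= min_seen and gold_counts[key] == 0
        if not frequent_but_never_gold:
            kept.append(cand)
    return kept

# The same statistics could also be keyed on the predicted head word plus the
# surrounding words (e.g. to drop the noun "fact" inside "in fact"), as step 3
# above suggests.
```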