Inference Protocols for Coreference Resolution
Kai-Wei Chang, Rajhans Samdani, Alla Rozovskaya, Nick Rizzolo, Mark Sammons, and Dan Roth

This research is supported by ARL, and by DARPA, under the Machine Reading Program.

Coreference
Coreference resolution is the task of grouping all the mentions of entities into equivalence classes so that each class represents a discourse entity. In the example below, the mentions are colour-coded to indicate which mentions are co-referent (overlapping mentions have been omitted for clarity):

An American official announced that American President Bill Clinton met his Russian counterpart, Vladimir Putin, today. The president said that Russia was a great country.

System Architecture
The system is a pipeline: mention detection, coreference resolution with a pairwise coreference model, an inference procedure (the Best-Link or All-Link strategy) augmented with knowledge-based constraints, and post-processing.

CoNLL Shared Task 2011
- Coreference resolution on the OntoNotes-4.0 data set.
- Based on Bengtson and Roth (2008); our system is built on Learning Based Java (Rizzolo and Roth, 2010).
- We participated in the "closed" track of the shared task.

Mention Detection
Compared to the ACE 2004 corpus, the OntoNotes-4.0 data set has two main differences:
- Singleton mentions are not annotated in OntoNotes-4.0.
- OntoNotes-4.0 takes the largest logical span to represent a mention.
We design a high-recall (~90%) and low-precision (~35%) rule-based mention detection system.

Experiments and Results
(Results tables: MD, MUC, BCUB, and CEAF scores and their average, comparing Best-Link, Best-Link w/ Const., All-Link, and All-Link w/ Const. on the DEV set with predicted mentions, on the DEV set with gold mentions, and as official scores on the TEST set; the numeric values are omitted.)
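The high-recall, low-precision candidate extraction idea can be sketched as follows. This is a minimal illustration only: the pronoun list and POS-tag patterns below are assumptions for the sketch, not the system's actual rules, which are not reproduced in this text.

```python
# Sketch of an over-generating, rule-based mention candidate extractor:
# propose pronouns and maximal noun-phrase-like runs, and leave precision
# to the downstream coreference stage. Tag patterns are illustrative.

PRONOUNS = {"he", "she", "it", "they", "his", "her", "its", "their", "him", "them"}

def candidate_mentions(tokens, pos_tags):
    """Return candidate (start, end) spans over one POS-tagged sentence (end exclusive)."""
    spans = set()
    for i, (tok, tag) in enumerate(zip(tokens, pos_tags)):
        if tok.lower() in PRONOUNS or tag == "PRP":
            spans.add((i, i + 1))  # pronouns are always candidates
    i = 0
    while i < len(tokens):
        if pos_tags[i].startswith(("NN", "JJ", "DT")):
            j = i
            while j < len(tokens) and pos_tags[j].startswith(("NN", "JJ", "DT")):
                j += 1
            # keep the run only if it actually contains a noun
            if any(pos_tags[k].startswith("NN") for k in range(i, j)):
                spans.add((i, j))
            i = j
        else:
            i += 1
    return sorted(spans)
```

On a sentence such as "The president said that he met Putin", this proposes "The president", "he", and "Putin" as candidates, over-generating by design.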
Post-Processing
As a post-processing step, we remove all predicted mentions that remain in singleton clusters after the inference stage. The system achieves 64.88% in F1 score on the TEST set.

Pairwise Mention Score
The score of a mention pair (u, v) combines a weight vector w learned from training data, the extracted features Φ(u, v), a compatibility score c(u, v) given by the constraints, and a threshold parameter t (to be tuned):

w_uv = w · Φ(u, v) + c(u, v) − t

Inference Procedure
The inference procedure takes as input the pairwise mention scores over a document and aggregates them into globally consistent cliques representing entities. We investigate two techniques: Best-Link and All-Link.

Best-Link Inference: For each mention, Best-Link considers the best mention on its left to connect to (according to the pairwise score) and creates a link between them if the pairwise score is above some threshold.

All-Link Inference: The All-Link approach scores a clustering of mentions by including all possible pairwise links in the score. It is also known as correlation clustering (Bansal et al., 2002).

ILP Formulation: Both Best-Link and All-Link can be written as an integer linear programming (ILP) problem, where w_uv is the compatibility score of a pair of mentions and y_uv is a binary variable indicating a coreference link:

Best-Link: max Σ_{u<v} w_uv y_uv, subject to Σ_{u: u<v} y_uv ≤ 1 for each mention v, and y_uv ∈ {0, 1}

All-Link: max Σ_{u<v} w_uv y_uv, subject to y_uw ≥ y_uv + y_vw − 1 for each triple of mentions (u, v, w), and y_uv ∈ {0, 1}

For the All-Link clustering, we drop one of the three transitivity constraints for each triple of mention variables; similar to Denis and Baldridge (2009), we observe that this improves accuracy.

Training Procedure
We use the same features as Bengtson and Roth (2008), with the knowledge extracted from OntoNotes-4.0. We explored two types of learning strategies that can be used to learn w in Best-Link and All-Link; the choice of learning strategy depends on the inference procedure.

1. Binary Classification: Following Bengtson and Roth (2008), we learn the pairwise scoring function w on:
- Positive examples: for each mention u, we construct a positive example (u, v), where v is the closest preceding mention in u's equivalence class.
- Negative examples: all mention pairs (u, v), where v is a preceding mention of u and u, v are in different classes.

As singleton mentions are not annotated, the sample distributions in the training and inference phases are inconsistent. We therefore apply the mention detector to the training set and train the classifier on the union of the gold and predicted mentions.

2. Structured Learning (Structured Perceptron): We present a structured perceptron algorithm, similar to the supervised clustering algorithm of Finley and Joachims (2005), to learn w.

Knowledge-based Constraints
We define three high-precision constraints that improve recall on NPs with definite determiners and on mentions whose heads are named entities. Examples of mention pairs correctly linked by the constraints are: [Governor Bush] and [Bush]; [a crucial swing state, Florida] and [Florida]; [Sony itself] and [Sony].

Contributions
- We investigate two inference methods: Best-Link and All-Link.
- We provide a flexible architecture for incorporating constraints.
- We compare and evaluate the two inference approaches and the contribution of constraints.

Discussion and Conclusions
- Best-Link outperforms All-Link. This raises a natural algorithmic question about the inherent nature of the clustering style best suited to coreference resolution, and about possible ways of infusing more knowledge into different coreference clustering styles.
- Constraints improve recall on a subset of mentions. Other common system errors might be fixed by additional constraints.
- Our approach accommodates the infusion of knowledge via constraints, and we have demonstrated its utility in an end-to-end coreference system.
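Because the Best-Link objective decomposes over mentions (each mention takes at most one incoming link), its ILP can be solved exactly by a greedy left-to-right pass. The sketch below assumes pairwise scores with the threshold already subtracted, so a positive score means "link"; the `score` callable and the union-find bookkeeping are illustrative choices, not the original system's code.

```python
# Best-Link inference as a greedy pass: each mention links to its
# highest-scoring preceding mention when that score is positive, and
# links are merged into entity clusters with union-find.

def best_link(n_mentions, score):
    """Cluster mentions 0..n_mentions-1 given score(u, v) for u < v."""
    parent = list(range(n_mentions))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for v in range(1, n_mentions):
        best_u, best_s = None, 0.0
        for u in range(v):  # candidates to the left of v
            s = score(u, v)
            if s > best_s:
                best_u, best_s = u, s
        if best_u is not None:  # positive score => create the link
            parent[find(v)] = find(best_u)

    clusters = {}
    for m in range(n_mentions):
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())
```

For instance, with four mentions and positive scores only for the pairs (0, 2) and (1, 3), the pass produces the two clusters {0, 2} and {1, 3}; singleton clusters would then be discarded in post-processing.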