Detecting Anaphoricity and Antecedenthood for Coreference Resolution Olga Uryupina Institute of Linguistics, RAS.


Overview
- Anaphoricity and Antecedenthood
- Experiments
- Incorporating A&A detectors into a CR system
- Conclusion

A&A: example Shares in Loral Space will be distributed to Loral shareholders. The new company will start life with no debt and $700 million in cash. Globalstar still needs to raise $600 million, and Schwartz said that the company would try to raise the money in the debt market.

Anaphoricity
- Likely anaphors: pronouns, definite descriptions
- Unlikely anaphors: indefinites
- Unknown: proper names
Poesio & Vieira: more than 50% of definite descriptions in newswire text are not anaphoric!

Antecedenthood
- Related to referentiality (Karttunen, 1976): "no debt" etc.
- Antecedenthood vs. referentiality: a corpus-based decision

Experiments
- Can we learn anaphoricity/antecedenthood classifiers?
- Do they help for coreference resolution?

Methodology
- MUC-7 dataset
- Anaphoricity/antecedenthood labels induced from the MUC annotations
- Learners: Ripper, SVM
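The setup above (binary anaphoricity decisions learned from coreference annotations) can be sketched as follows. This is an illustrative toy, not the talk's actual pipeline: the feature vectors are hypothetical, and scikit-learn's linear SVM stands in for the SVM learner used in the experiments.

```python
from sklearn.svm import LinearSVC

# Toy mention encodings: [is_pronoun, is_definite, is_indefinite, is_proper_name].
# Labels (1 = anaphoric) are the kind induced from MUC coreference chains:
# a mention is anaphoric if it corefers with an earlier mention.
X = [
    [1, 0, 0, 0],  # "it"           -> anaphoric
    [0, 1, 0, 0],  # "the company"  -> anaphoric
    [0, 0, 1, 0],  # "no debt"      -> not anaphoric
    [0, 0, 0, 1],  # "Globalstar"   -> not anaphoric (first mention)
    [1, 0, 0, 0],  # another pronoun
    [0, 0, 1, 0],  # another indefinite
]
y = [1, 1, 0, 0, 1, 0]

clf = LinearSVC().fit(X, y)
print(clf.predict([[1, 0, 0, 0], [0, 0, 1, 0]]))
```

The same scaffolding works for antecedenthood: only the label definition changes (a mention is a positive example if some later mention corefers with it).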

Features
- Surface form (12)
- Syntax (20)
- Semantics (3)
- Salience (10)
- "Same-head" (2)
- From Karttunen, 1976 (7)
49 features (123 boolean/continuous values)
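A minimal sketch of what surface-form features of this kind might look like; the specific feature names and word lists below are hypothetical illustrations, not the actual 12 surface features from the talk:

```python
def surface_features(mention: str) -> dict:
    """Compute a few illustrative surface-form features for a mention string."""
    tokens = mention.split()
    first = tokens[0].lower()
    return {
        # Definite/indefinite determiners signal likely/unlikely anaphors
        "is_definite": first in {"the", "this", "that", "these", "those"},
        "is_indefinite": first in {"a", "an", "no", "some", "any"},
        # Single-token pronouns are the strongest anaphoricity cue
        "is_pronoun": len(tokens) == 1 and first in {"it", "he", "she", "they"},
        "length": len(tokens),
    }

print(surface_features("the new company"))
print(surface_features("it"))
```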

Results: anaphoricity
[Table: R / P / F per feature group; numeric values did not survive the transcript]
Feature groups: Baseline, All, Surface, Syntax, Semantics, Salience, Same-head, Karttunen's, Synt+SH

Results: antecedenthood
[Table: R / P / F per feature group; numeric values did not survive the transcript]
Feature groups: Baseline, All, Surface, Syntax, Semantics, Salience, Same-head, Karttunen's

Integrating A&A into a CR system
Apply A&A prefiltering before CR starts:
- saves time
- improves precision
Problem: we may filter out good candidates:
- will lose some recall
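The prefiltering step can be sketched as below: before the resolver scores any mention pair, anaphor candidates must pass the anaphoricity detector and antecedent candidates the antecedenthood detector. The detector functions here are hypothetical stand-ins for the learned classifiers.

```python
def prefilter(mentions, is_anaphoric, is_antecedent):
    """Return only the (antecedent, anaphor) pairs the CR system needs to score.

    Skipping non-anaphoric mentions saves time and improves precision,
    but a detector error here loses recall: the resolver never sees the pair.
    """
    pairs = []
    for j, anaphor in enumerate(mentions):
        if not is_anaphoric(anaphor):
            continue  # mention never considered as an anaphor
        for antecedent in mentions[:j]:
            if is_antecedent(antecedent):
                pairs.append((antecedent, anaphor))
    return pairs

# Toy detectors: definites are anaphoric; "no debt" cannot be an antecedent.
mentions = ["Globalstar", "no debt", "the company"]
pairs = prefilter(mentions,
                  is_anaphoric=lambda m: m.startswith("the"),
                  is_antecedent=lambda m: m != "no debt")
print(pairs)  # [('Globalstar', 'the company')]
```

Without prefiltering, the resolver would score all three pairs; with it, only the one plausible pair survives.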

Oracle-based A&A prefiltering
- Take the MUC-based A&A classifiers ("gold standard")
- CR system: Soon et al. (2001) with SVMs
- MUC-7 validation set (3 "training" documents)

Oracle-based A&A prefiltering
[Table: R / P / F; numeric values did not survive the transcript]
Settings: no prefiltering, ±ana, ±ante, ±ana & ±ante

Automatically induced classifiers
- Precision is more crucial than recall
- Learn Ripper classifiers with different values of L (loss ratio)
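Ripper's loss ratio has no direct scikit-learn equivalent, but the same precision/recall trade can be sketched with asymmetric class weights: penalizing false positives more than false negatives pushes the classifier toward higher-precision positive decisions. The two-feature toy data below is purely illustrative.

```python
from sklearn.svm import LinearSVC

# Toy data: feature 0 perfectly predicts the class; feature 1 is noise.
X = [[1, 0], [1, 1], [0, 1], [0, 0]] * 10
y = [1, 1, 0, 0] * 10

# Weighting the negative class more heavily plays the role of tuning
# Ripper's loss ratio L toward precision on the positive class:
# the learner pays 5x for each false positive relative to a false negative.
high_precision = LinearSVC(class_weight={0: 5.0, 1: 1.0}).fit(X, y)

print(high_precision.predict([[1, 0], [0, 1]]))
```

Sweeping the weight (as the talk sweeps L) traces out a precision/recall curve from which a suitably conservative prefilter can be picked.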

Anaphoricity prefiltering

Antecedenthood prefiltering

Conclusion
Automatically induced detectors:
- reliable for anaphoricity
- much less reliable for antecedenthood (a corpus explicitly annotated for referentiality could help)
A&A prefiltering:
- ideally, should help
- in practice, substantial optimization is required

Thank You!