1 Toward Opinion Summarization: Linking the Sources
Veselin Stoyanov and Claire Cardie
Department of Computer Science, Cornell University, Ithaca, NY 14850, USA
ACL 2006 Workshop on Sentiment and Subjectivity in Text
Advisor: Hsin-Hsi Chen
Speaker: Yong-Sheng Lo
Date: 2006/10/23

2 Agenda
Introduction
Toward opinion summarization
–Source coreference resolution
Data set
The method
–Transformation to standard noun phrase coreference resolution
–Coreference resolution, as in Ng and Cardie (2002)
Evaluation
Conclusion

3 Introduction 1/4
The problem of opinion summarization
–Addressing the dearth of approaches for summarizing opinion information
Source coreference resolution
–Deciding which source mentions (opinion holders) are associated with opinions that belong to the same real-world entity
–Example (see next page)
Coreference resolution
–Deciding what noun phrases in the text refer to the same real-world entities
–e.g., 阿扁 (A-bian), 陳總統 (President Chen), and 中華民國陳總統 (President Chen of the Republic of China) all refer to 陳水扁 (Chen Shui-bian)
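
To make the task concrete, here is a minimal sketch (not from the paper) of the objects involved: opinion-source mentions with character spans, grouped into chains when they refer to the same real-world entity. The class name, fields, and offsets below are illustrative assumptions only.

from dataclasses import dataclass

@dataclass(frozen=True)
class SourceMention:
    doc_id: str
    start: int   # character offset of the mention in the document
    end: int
    text: str

# A source coreference chain is simply a set of mentions of one entity.
# Here "he" and "Bulgarian Prime Minister Sergey Stanishev" form one chain,
# while "the EU" forms another (offsets are made up).
chain_stanishev = {
    SourceMention("doc1", 75, 120, "Bulgarian Prime Minister Sergey Stanishev"),
    SourceMention("doc1", 260, 262, "he"),
}
chain_eu = {SourceMention("doc1", 330, 336, "the EU")}

print(len(chain_stanishev), "mentions of Stanishev;", len(chain_eu), "of the EU")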

4 Introduction 2/4
Example (from a corpus of manually annotated opinions):
"[Target Delaying of Bulgaria's accession to the EU] would be a serious mistake," [Source Bulgarian Prime Minister Sergey Stanishev] said in an interview for the German daily Suddeutsche Zeitung. "[Target Our country] serves as a model and encourages countries from the region to follow despite the difficulties," [Source he] added. [Target Bulgaria] is criticized by [Source the EU] because of slow reforms in the judiciary branch, the newspaper notes. Stanishev was elected prime minister in 2005. Since then, [Source he] has been a prominent supporter of [Target his country's accession to the EU].

5 Introduction 3/4

6 Introduction 4/4
Example (source coreference resolution):
"[Target Delaying of Bulgaria's accession to the EU] would be a serious mistake," [Source Bulgarian Prime Minister Sergey Stanishev] said in an interview for the German daily Suddeutsche Zeitung. "[Target Our country] serves as a model and encourages countries from the region to follow despite the difficulties," [Source he] added. [Target Bulgaria] is criticized by [Source the EU] because of slow reforms in the judiciary branch, the newspaper notes. Stanishev was elected prime minister in 2005. Since then, [Source he] has been a prominent supporter of [Target his country's accession to the EU].

7 Data set 1/2
MPQA corpus (Wilson and Wiebe, 2003)
–Multi-Perspective Question Answering
–Annotations developed using GATE
 »General Architecture for Text Engineering
 »Example (see next page)
–535 manually annotated documents with phrase-level opinion information
–Collected over an 11-month period, between June 2001 and May 2002
–Suited to the political, government, and commercial domains
–Source coreference chains can be derived from the annotations
–Contains no coreference information for general NPs (those that are not sources)

8 Data set 2/2 Example of annotations in GATE

9 The method 1/10
Goal: solve source coreference resolution
Transformation
–How can source coreference resolution (SCR) be transformed into standard noun phrase coreference resolution (NPCR)?
Differences between SCR and NPCR:
1. The sources of opinions do not correspond exactly to the automatic extractors' notion of noun phrases (NPs)
2. The time-consuming nature of coreference annotation (the corpus lacks coreference labels for general NPs)

10 The method 2/10
The general approach to SCR
1. Preprocessing
 –Obtain an augmented set of NPs in the text
 –Done as in Ng and Cardie (2002)
  »Running a tokenizer, sentence splitter, POS tagger, parser, base NP finder, and named entity finder
2. Source-to-noun-phrase mapping
 –Three problems (next slide)
 –Handled with a set of heuristics
3. Coreference resolution
 –Applying a state-of-the-art coreference resolution approach to the transformed data
  »"Improving Machine Learning Approaches to Coreference Resolution" (Ng and Cardie, 2002)
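
The three-stage pipeline above can be sketched in a few lines of Python. Every component below is a deliberately naive stand-in (a regex NP "extractor", string-overlap mapping, a head-match "classifier"), not the authors' system; it only shows how the stages fit together.

import re
from itertools import combinations

def extract_noun_phrases(text):
    # Stand-in for the real preprocessing chain (tokenizer, POS tagger, parser,
    # base NP finder, named entity finder): just grab capitalized word sequences.
    return [m.group(0).strip() for m in re.finditer(r"(?:[A-Z][a-z]+\s?)+", text)]

def map_source_to_np(source, nps):
    # Stand-in for the mapping heuristics (Rules 1-3, sketched later): prefer an
    # exact string match, otherwise any NP that overlaps the source string.
    for np in nps:
        if np == source:
            return np
    return next((np for np in nps if source in np or np in source), source)

def coreferent(np_a, np_b):
    # Stand-in for the pairwise coreference classifier: heads (last tokens) match.
    return np_a.split()[-1].lower() == np_b.split()[-1].lower()

text = "Bulgarian Prime Minister Sergey Stanishev spoke. Stanishev was elected."
sources = ["Bulgarian Prime Minister Sergey Stanishev", "Stanishev"]
nps = extract_noun_phrases(text)
source_nps = [map_source_to_np(s, nps) for s in sources]
positive = [(a, b) for a, b in combinations(source_nps, 2) if coreferent(a, b)]
print(positive)  # the two mentions of Stanishev end up linked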

11 The method 3/10
Three problems
–Inexact span match
 »"Venezuelan people" vs. "the Venezuelan people"
 »"Muslims rulers" was not recognized, while "Muslims" and "rulers" were recognized by the NP extractor
–Multiple matching NPs
 »"the country's new president, Eduardo Duhalde"
 »"Latin American leaders at a summit meeting in Costa Rica"
 »"Britain, Canada and Australia"
–No matching NP
 »"Carmona named new ministers, including two military officers who rebelled against Chavez"
 »"many", "which", and "domestically"
 »"lash" and "taskforce"

12 The method 4/10
Using a set of heuristics
–Rule 1
 »If a source matches any NP exactly in span, match that source to the NP; do this even if multiple NPs overlap the source
 »Example 1
  ·[determiner] (the annotated source) "the Venezuelan people"
  ·[NP extractor] "the Venezuelan people"
 »Example 2
  ·[determiner] "the country's new president, Eduardo Duhalde"
  ·[NP extractor] "the country's new president", "Eduardo Duhalde"

13 The method 5/10
Rule 2
If no NP matches exactly in span, then:
–If a single NP overlaps the source,
 »then map the source to that NP
–If multiple NPs overlap the source,
 »then prefer three cases:
  »the outermost NP
   ·because longer NPs contain more information
  »the last NP
   ·because it is likely to be the head NP of the phrase
  »the NP before a preposition
   ·because a preposition signals an explanatory prepositional phrase

14 The method 6/10
Examples
1. The outermost NP
 –[determiner] "Prime Minister Sergey Stanishev"
 –[NP extractor] "Bulgarian Prime Minister", "Sergey Stanishev", "Bulgarian Prime Minister Sergey Stanishev"
2. The last NP
 –[determiner] "new president, Eduardo Duhalde"
 –[NP extractor] "the country's new president", "Eduardo Duhalde"
3. The NP before a preposition
 –[determiner] "Latin American leaders at a summit meeting in Costa Rica"
 –[NP extractor] "Latin American leaders", "summit meeting", "Costa Rica"

15 The method 7/10
Rule 3
If no NP overlaps the source, select the last NP before the source.
–"Stanishev was elected prime minister in 2005. Since then, [Source he] has been a prominent supporter."
 »[determiner] "he"
 »[NP extractor] "Stanishev", "prime minister", "prominent supporter"
In half of these cases we are dealing with the word "who", which typically refers to the last preceding NP.
–"Carmona named new ministers, including two military officers who rebelled against Chavez"
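
Taken together, Rules 1–3 amount to a small mapping function. The sketch below is one reading of these slides, with spans as (start, end) character offsets; the precedence among the Rule 2 preferences, the helper name, and the offsets in the usage example are assumptions, not the authors' code.

def map_source(source_span, np_spans, nps_before_prep=frozenset()):
    # nps_before_prep: NP spans known to be immediately followed by a preposition
    # (this would be computed from the POS-tagged text in a full system).
    s_start, s_end = source_span

    # Rule 1: an NP whose span matches the source exactly wins outright.
    for np in np_spans:
        if np == source_span:
            return np

    overlapping = [np for np in np_spans if np[0] < s_end and s_start < np[1]]
    if overlapping:
        # Rule 2: among overlapping NPs prefer the outermost ones, then an NP
        # that precedes a preposition, then the last NP (the precedence among
        # the three preferences is a guess; the slide just lists them).
        outermost = [np for np in overlapping
                     if not any(o[0] <= np[0] and np[1] <= o[1] and o != np
                                for o in overlapping)]
        candidates = [np for np in outermost if np in nps_before_prep] or outermost
        return max(candidates, key=lambda np: np[1])

    # Rule 3: nothing overlaps, so fall back to the last NP before the source.
    preceding = [np for np in np_spans if np[1] <= s_start]
    return max(preceding, key=lambda np: np[1]) if preceding else None

# "Latin American leaders at a summit meeting in Costa Rica": the source covers
# the whole phrase, the extractor found three smaller NPs, and the NP before the
# preposition "at" is preferred (character offsets here are made up).
print(map_source((0, 52), [(0, 22), (28, 43), (47, 57)], nps_before_prep={(0, 22)}))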

16 The method 8/10
Coreference resolution
Using the standard combination of classification and single-link clustering
–Soon et al. (2001) and Ng and Cardie (2002)
Machine learning approach
–Compute a vector of 57 features for every pair of source noun phrases from the preprocessed corpus
 »(source NP_i, source NP_j)
–Training
 »Learn to predict whether a source NP pair should be classified as positive (the NPs refer to the same entity) or negative
–Testing
 »Predict whether each source NP pair is positive
 »Then use single-link clustering to group together sources that belong to the same entity

17 The method 9/10
Example (single-link clustering)
Training (positive instances)
–(source, NP) + feature set
–(李登輝 Lee Teng-hui, 李前總統 former President Lee) + 57 features
–(李登輝 Lee Teng-hui, 登輝先生 Mr. Teng-hui) + 57 features
–(阿輝伯 "A-hui-bo" (a nickname for Lee Teng-hui), 登輝先生 Mr. Teng-hui) + 57 features
Testing
–(李前總統 former President Lee, 登輝先生 Mr. Teng-hui) => positive
–(阿輝伯 A-hui-bo, 李前總統 former President Lee) => positive
 »resulting chain: 阿輝伯 -- 李前總統 -- 登輝先生 (all mentions of Lee Teng-hui)
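
The single-link clustering step can be sketched with a simple union-find: every pair the classifier labels positive merges two sources into the same entity. The example data mirrors the Lee Teng-hui chain above (with glossed English names); this is an illustration, not the authors' implementation.

class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def cluster_sources(sources, positive_pairs):
    # Single-link clustering: any positive pairwise decision merges two sources.
    uf = UnionFind(sources)
    for a, b in positive_pairs:
        uf.union(a, b)
    clusters = {}
    for s in sources:
        clusters.setdefault(uf.find(s), set()).add(s)
    return list(clusters.values())

sources = ["Lee Teng-hui", "former President Lee", "Mr. Teng-hui", "A-hui-bo"]
positive = [("former President Lee", "Mr. Teng-hui"), ("A-hui-bo", "former President Lee")]
print(cluster_sources(sources, positive))
# -> one cluster with the three linked mentions, plus a singleton "Lee Teng-hui"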

18 The method 10/10
Machine learning techniques
Trying the reportedly best techniques for pairwise classification
–RIPPER (Cohen, 1995)
 »Repeated Incremental Pruning to Produce Error Reduction
 »Run with 24 different settings
–SVM-light
 »Support Vector Machines
 »Run with 56 different settings
Feature set: 57 = ??
–12 features from Soon et al. (2001)
–41 additional features from Ng and Cardie (ACL 2002)
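
The paper's experiments used RIPPER and SVM-light. Purely as an illustration of the pairwise-classification setup, the sketch below trains a linear SVM with scikit-learn (a stand-in, not the authors' toolkit) on toy three-dimensional feature vectors; the real system uses 57 features per pair.

import numpy as np
from sklearn.svm import SVC

# Each row is a feature vector for one (source NP, source NP) pair, e.g.
# [string-match, same-sentence, sentence-distance]; label 1 means coreferent.
X_train = np.array([[1, 1, 0], [1, 0, 2], [1, 0, 1], [0, 1, 0], [0, 0, 5], [0, 0, 3]])
y_train = np.array([1, 1, 1, 0, 0, 0])

clf = SVC(kernel="linear").fit(X_train, y_train)

X_test = np.array([[1, 0, 2], [0, 0, 4]])
print(clf.predict(X_test))  # pairwise coreference decisions for two unseen pairs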

19 Feature set (the 12 features of Soon et al. (2001), computed for each pair (NP_i, NP_j); table shown as a figure)
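
A few of the Soon et al. (2001)-style pair features can be computed with simple string tests, as sketched below; the remaining features (number, gender, and semantic-class agreement, alias, appositive, and so on) need more linguistic machinery and are only noted in a comment. The feature names and the crude tests here are illustrative assumptions.

PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "their"}

def soon_style_features(np_i, np_j, sent_i, sent_j):
    # A handful of Soon et al.-style pair features; number/gender/semantic-class
    # agreement, alias, and appositive are omitted here.
    strip = lambda s: " ".join(w for w in s.lower().split() if w not in {"a", "an", "the"})
    return {
        "DIST": sent_j - sent_i,                      # sentence distance
        "I_PRONOUN": np_i.lower() in PRONOUNS,        # is NP_i a pronoun?
        "J_PRONOUN": np_j.lower() in PRONOUNS,        # is NP_j a pronoun?
        "STR_MATCH": strip(np_i) == strip(np_j),      # string match, determiners removed
        "DEF_NP_J": np_j.lower().startswith("the "),  # is NP_j a definite NP?
        "BOTH_PROPER": np_i[:1].isupper() and np_j[:1].isupper(),  # crude proper-name test
    }

print(soon_style_features("Sergey Stanishev", "he", sent_i=0, sent_j=2))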

20 Feature set (41 features)

21 Feature set (41 features) cont.

22 Evaluation
MPQA corpus (535 documents)
–400 documents for the training set (selected at random)
–135 documents for the test set (the remainder)
Purpose of the evaluation
–To create a strong baseline
 »Using the best settings for NP coreference resolution

23 Evaluation
Instance selection
Adopting the method of Soon et al. (2001)
–For each NP, select the pairs with the n preceding coreferent instances and all intervening non-coreferent pairs
Settings compared:
–Soon 1 (n=1), as in Ng and Cardie (2002)
–Soon 2 (n=2), as in Ng and Cardie (2002)
–None (no instance selection)
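
The instance-selection scheme can be sketched as follows: for each mention, take its n nearest preceding coreferent mentions as positive pairs and the non-coreferent mentions in between as negatives (the "None" setting simply uses all pairs instead). This is one reading of the slide; the mention ordering, entity ids, and helper name are assumptions.

def select_instances(mentions, entity_of, n=1):
    # mentions: source NPs in textual order; entity_of: gold entity id per mention.
    positives, negatives = [], []
    for j, m_j in enumerate(mentions):
        antecedents = [i for i in range(j - 1, -1, -1)
                       if entity_of[mentions[i]] == entity_of[m_j]][:n]
        if not antecedents:
            continue
        positives += [(mentions[i], m_j) for i in antecedents]
        # Negatives: non-coreferent mentions between the farthest chosen
        # antecedent and the current mention.
        negatives += [(mentions[i], m_j) for i in range(min(antecedents) + 1, j)
                      if entity_of[mentions[i]] != entity_of[m_j]]
    return positives, negatives

mentions = ["Sergey Stanishev", "the EU", "he", "the newspaper", "Stanishev"]
entity_of = {"Sergey Stanishev": "S", "he": "S", "Stanishev": "S",
             "the EU": "E", "the newspaper": "N"}
print(select_instances(mentions, entity_of, n=1))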

24 Evaluation
Performance measures for coreference resolution
–B-CUBED (Bagga and Baldwin, 1998)
–MUC score (Vilain et al., 1995)
–Positive identification
 »Precision, recall, and F1 on identification of the positive class
 »Computed on the pairwise decisions exactly as the classifier outputs them
 »Example (see next page)
–Actual positive identification
 »Precision, recall, and F1 on identification of the positive class
 »Computed by first clustering the source NPs and then counting a pairwise decision as positive if the two source NPs belong to the same cluster
 »Example (see next page)
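
As a reference point, here is a sketch of mention-level B-CUBED precision and recall (Bagga and Baldwin, 1998); published variants differ in which mention set they average over, and the link-based MUC score (Vilain et al., 1995) is not shown. The clusters in the usage example are made up.

def b_cubed(key_clusters, response_clusters):
    # Assumes every mention in the key also appears in the response clustering.
    key = {m: c for c in key_clusters for m in c}
    response = {m: c for c in response_clusters for m in c}
    mentions = list(key)
    precision = sum(len(key[m] & response[m]) / len(response[m]) for m in mentions)
    recall = sum(len(key[m] & response[m]) / len(key[m]) for m in mentions)
    return precision / len(mentions), recall / len(mentions)

key = [{"Stanishev", "he1", "he2"}, {"the EU"}]   # gold source chains
resp = [{"Stanishev", "he1"}, {"he2", "the EU"}]  # system source chains
p, r = b_cubed(key, resp)
print(round(p, 3), round(r, 3))  # 0.75 0.667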

25 Sample answer set
Answer set (gold source-NP groupings):
–(source) 陳水扁 (Chen Shui-bian) – (NP) 陳水扁總統 (President Chen Shui-bian)
–(source) 馬英九 (Ma Ying-jeou) – (NP) 市長馬英九 (Mayor Ma Ying-jeou)
–(source) 陳總統 (President Chen), 阿扁 (A-bian) – (NP) 陳總統 (President Chen), 阿扁總統 (President A-bian)
Positive identification: the classifier's pairwise output (positive pairs; ** marks pairs that are not coreferent according to the answer set)
–(陳水扁, 陳水扁總統), (陳水扁, 陳總統)**, (陳水扁, 阿扁總統)**, (馬英九, 市長馬英九), (陳總統, 陳水扁總統)**, (陳總統, 陳總統), (陳總統, 阿扁總統), (阿扁, 陳水扁總統)**, (阿扁, 陳總統), (阿扁, 阿扁總統)
–Answer-set pairs: (陳水扁, 陳水扁總統), (馬英九, 市長馬英九), (陳總統, 陳總統), (阿扁, 阿扁總統)
Actual positive identification: pairwise decisions counted as positive after clustering the source NPs
–(陳水扁, 陳水扁總統), (馬英九, 市長馬英九), (陳總統, 陳總統), (陳總統, 阿扁總統), (阿扁, 陳總統), (阿扁, 阿扁總統)
–Answer-set pairs: (陳水扁, 陳水扁總統), (馬英九, 市長馬英九), (陳總統, 陳總統), (阿扁, 阿扁總統)
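
The difference between the two pairwise metrics can be reproduced on a small toy example: "positive identification" scores the classifier's raw pairwise decisions, while "actual positive identification" first takes the transitive closure of those decisions (the single-link clusters) and scores every within-cluster pair. The mentions and decisions below are made up for illustration and are not the slide's exact numbers.

from itertools import combinations

def f1(pred_pairs, gold_pairs):
    pred = set(map(frozenset, pred_pairs))
    gold = set(map(frozenset, gold_pairs))
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def closure_pairs(mentions, positive_pairs):
    # Single-link clusters = transitive closure of the positive decisions;
    # every pair inside a cluster then counts as a positive decision.
    clusters = [{m} for m in mentions]
    for a, b in positive_pairs:
        ca = next(c for c in clusters if a in c)
        cb = next(c for c in clusters if b in c)
        if ca is not cb:
            clusters.remove(cb)
            ca |= cb
    return [pair for c in clusters for pair in combinations(sorted(c), 2)]

mentions = ["Chen Shui-bian", "President Chen Shui-bian", "President Chen", "A-bian"]
gold = [("Chen Shui-bian", "President Chen Shui-bian"), ("President Chen", "A-bian")]
raw = [("Chen Shui-bian", "President Chen Shui-bian"),
       ("President Chen Shui-bian", "President Chen")]
print("positive identification F1:", round(f1(raw, gold), 3))                                  # 0.5
print("actual positive identification F1:", round(f1(closure_pairs(mentions, raw), gold), 3))  # 0.4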

26–30 Evaluation (results shown as tables and figures in the original slides)

31 Conclusion
As a first step toward opinion summarization
–We target the problem of source coreference resolution
–We show that this problem can be tackled effectively as noun phrase coreference resolution
–We create a strong baseline
Next step
–Develop a method that utilizes the unlabeled NPs in the corpus using a structured rule learner