Multi-Perspective Question Answering Using the OpQA Corpus (HLT/EMNLP 2005). Veselin Stoyanov, Claire Cardie, Janyce Wiebe. Cornell University, University of Pittsburgh.

Presentation transcript:

1 Multi-Perspective Question Answering Using the OpQA Corpus (HLT/EMNLP 2005)
Veselin Stoyanov, Claire Cardie, Janyce Wiebe
Cornell University, University of Pittsburgh

2/18
- Abstract
- Introduction: Multi-Perspective Question Answering (MPQA)
- OpQA Corpus
- Analysis of characteristics of opinion answers
  - Answer length
  - Partial answers
  - Syntactic constituent of the answer
- Experiments on filters
  - Subjectivity filters
  - Opinion source filters
- Conclusion

3/18 Introduction to MPQA
- Fact-based QA: "When did McDonald's open its first restaurant?"
  - A lot of research has been done already.
- Multi-Perspective QA (MPQA): "How do the Chinese regard the human rights record of the United States?"
  - Relatively little research has been done here. (Will successful approaches from fact-based QA work well for MPQA?)

4/18 Introduction to the OpQA Corpus
- 98 documents: June 2001 ~ May 2002
- Phrase-level opinion information
- 4 general and controversial topics:
  - President Bush's alternative to the Kyoto protocol (kyoto)
  - The US annual human rights report (humanrights)
  - The 2002 coup d'etat in Venezuela (venezuela)
  - The 2002 elections in Zimbabwe and Mugabe's reelection (mugabe)
- 19~33 documents per topic
- Questions: 6~8 per topic, evenly distributed → 30 questions in total

5/18 Introduction to the OpQA Corpus
- Answers: every text segment that contributes to an answer to any of the 30 questions
- Mark the minimum answer spans: "a Tokyo organization representing about 150 Japanese groups" → "a Tokyo organization"
- Partial answers:
  - Lack the specificity needed to constitute a full answer
    - Q: "When was the Kyoto protocol ratified?" A: "before May" (when a specific date is known)
  - Need to be combined with at least one additional answer segment to fully answer the question
    - Q: "Are the Japanese unanimous in their opposition to Bush's position on the Kyoto protocol?" is answered only partially by a segment expressing a single opinion

6/18 Characteristics of opinion answers
Use the OpQA corpus to analyze and compare the characteristics of fact vs. opinion questions.
Traditional QA architectures:
- IR module
- Linguistic filters
  - Semantic filters: when → date/time; who → person/organization
  - Syntactic filters: who → noun phrase
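The semantic-filter idea above can be sketched as a type check on answer candidates. This is a minimal illustration, not the implementation of any system discussed in the talk; the type labels and function names are my own, and the candidates are assumed to come from an upstream named-entity tagger.

```python
# Map a question word to the entity types an answer candidate must carry.
EXPECTED_TYPES = {
    "when": {"DATE", "TIME"},
    "who": {"PERSON", "ORGANIZATION"},
    "where": {"LOCATION"},
}

def semantic_filter(question_word, candidates):
    """Keep only candidates whose entity type matches the question word.

    `candidates` is a list of (text, entity_type) pairs.
    """
    expected = EXPECTED_TYPES.get(question_word.lower())
    if expected is None:  # no type constraint known for this question word
        return candidates
    return [(text, etype) for text, etype in candidates if etype in expected]

candidates = [("April 1955", "DATE"), ("Ray Kroc", "PERSON")]
print(semantic_filter("when", candidates))  # [('April 1955', 'DATE')]
```

A syntactic filter works the same way, with constituent labels (e.g. noun phrase) in place of entity types.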

7/18 Answer length
Opinion answers are approximately twice as long as fact answers, so they are likely to span more than a single syntactic constituent, rendering the syntactic and semantic filters less effective.

8/18 Partial answers
Opinion answers are much more likely than fact answers to represent partial rather than complete answers. An answer generator must therefore:
- Distinguish between partial and full answers
- Recognize redundant partial answers
- Identify which subset of the partial answers to combine
- Determine whether additional documents need to be examined to find a complete answer
- Assemble the final answer from partial pieces of information

9/18 Syntactic constituent of the answer
Use Abney's (1996) CASS partial parser, and count the number of times an answer segment for the question matches each constituent type.
4 constituent types:
- noun phrase (n)
- verb phrase (v)
- prepositional phrase (p)
- clause (c)

10/18 Syntactic constituent of the answer
3 matching criteria:
- ex: answer segments whose spans exactly correspond to a constituent in the CASS output
- up: the constituent completely contains the answer and no more than three additional (non-answer) tokens
- up/dn: the answer matches according to the up criterion, or the answer completely contains the constituent and no more than three additional tokens
Results
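The three matching criteria can be made concrete by treating answers and constituents as token spans. The sketch below assumes end-exclusive (start, end) token offsets; the function names are mine, not from the paper.

```python
def matches_ex(answer, constituent):
    """ex: the answer span equals the constituent span exactly."""
    return answer == constituent

def matches_up(answer, constituent):
    """up: the constituent contains the answer with at most 3 extra tokens."""
    a0, a1 = answer
    c0, c1 = constituent
    contains = c0 <= a0 and a1 <= c1
    return contains and (c1 - c0) - (a1 - a0) <= 3

def matches_up_dn(answer, constituent):
    """up/dn: an up-match, or the answer contains the constituent with
    at most 3 extra tokens."""
    if matches_up(answer, constituent):
        return True
    a0, a1 = answer
    c0, c1 = constituent
    contains = a0 <= c0 and c1 <= a1
    return contains and (a1 - a0) - (c1 - c0) <= 3

print(matches_ex((5, 8), (5, 8)))      # True
print(matches_up((5, 8), (4, 10)))     # True: 3 extra non-answer tokens
print(matches_up_dn((3, 12), (5, 8)))  # False: answer has 6 extra tokens
```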

11/18 Characteristics of opinion answers: overview
Compared to fact answers, opinion answers:
- Are approximately twice as long
- Are much more likely to represent partial rather than complete answers
- Vary much more widely with respect to syntactic category (fact answers are overwhelmingly associated with noun phrases)
- Are roughly half as likely to correspond to a single syntactic constituent type

12/18 Subjectivity Filters for MPQA Systems
3 subjectivity filters:
- Manual: consider a sentence to be opinion if it contains at least one opinion of intensity medium or higher, and fact otherwise
- Rule-based: use a bootstrapping algorithm to perform sentence-based opinion classification
- Naïve Bayes: train a Naïve Bayes subjectivity classifier on the labeled set
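As a rough illustration of the Naïve Bayes filter, here is a toy bag-of-words subjectivity classifier with add-one smoothing. The two training sentences are invented for the example; the actual classifier in the talk was trained on labeled sentence data, not this toy set.

```python
from collections import Counter
import math

def train_nb(docs):
    """docs: list of (tokens, label). Returns (priors, per-label counts, vocab)."""
    priors, counts, vocab = Counter(), {}, set()
    for tokens, label in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return priors, counts, vocab

def classify_nb(tokens, model):
    """Return the label with the highest log-posterior for the token list."""
    priors, counts, vocab = model
    total = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / total)
        n = sum(counts[label].values())
        for tok in tokens:
            # add-one smoothing over the shared vocabulary
            lp += math.log((counts[label][tok] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train = [
    (["i", "believe", "this", "is", "wrong"], "subjective"),
    (["the", "report", "was", "released", "monday"], "objective"),
]
model = train_nb(train)
print(classify_nb(["i", "believe", "this"], model))  # subjective
```

As a sentence filter, any sentence classified "objective" would be removed before answer ranking.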

13/18 Experiments on Subjectivity Filters
Answer rank experiments: can subjectivity filters improve the answer identification phase?
For each opinion question, do the following:
Results
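Answer rank experiments are conventionally scored with mean reciprocal rank (MRR): for each question, take 1/rank of the first correct answer in the ranked list (0 if none appears). A minimal sketch, with my own data layout:

```python
def mrr(ranked_lists, gold):
    """ranked_lists: {qid: [candidate, ...]} in ranked order.
    gold: {qid: set of correct answers}. Returns mean reciprocal rank."""
    total = 0.0
    for qid, candidates in ranked_lists.items():
        for rank, cand in enumerate(candidates, start=1):
            if cand in gold[qid]:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(ranked_lists)

ranked = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
gold = {"q1": {"b"}, "q2": {"z"}}
print(mrr(ranked, gold))  # (1/2 + 0) / 2 = 0.25
```

A subjectivity filter helps under this metric when removing fact sentences pushes correct opinion answers toward rank 1.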

14/18 Experiments on Subjectivity Filters
Answer probability experiments: can opinion information be used in an answer generator?
Compute the probabilities.
Results: (probability comparison table; specific values not preserved in this transcript)

15/18 Opinion Source Filters for MPQA Systems
- Source filter: removes all sentences that do not have an opinion annotation whose source matches the (manually identified) source of the question
- Uses the manual source annotation only
- Answer rank experiment
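The source filter itself is a simple predicate over annotated sentences. A minimal sketch, assuming each sentence record carries a set of manually annotated opinion sources (the record layout is illustrative, not the corpus format):

```python
def source_filter(sentences, question_source):
    """Keep only sentences with an opinion attributed to the question's source.

    sentences: list of dicts with 'text' and an 'opinion_sources' set.
    """
    return [s for s in sentences
            if question_source in s["opinion_sources"]]

sentences = [
    {"text": "Mugabe said the vote was fair.", "opinion_sources": {"Mugabe"}},
    {"text": "Turnout was 55 percent.", "opinion_sources": set()},
]
print(source_filter(sentences, "Mugabe")[0]["text"])
```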

16/18 Opinion Source Filters for MPQA Systems
Results:
- Outperforms the baseline on some questions and performs worse on others
- MRR is worse than the baseline ( vs )
- MRFA is the best (11.26 vs ): reflects the ability to recognize the answers to the hardest questions
  - M7: "What did South Africa want Mugabe to do after the 2002 election?" (rank: 153 → 21)
  - M8: "What is Mugabe's opinion about the West's attitude and actions towards the 2002 Zimbabwe election?" (rank: 182 → 11)
  - Exception, V3: "Did anything surprising happen when Hugo Chavez regained power in Venezuela after he was removed by a coup?" (no clear source; only a single answer; opinion not clear)
- Always ranked an answer within the first 25 answers
- Especially useful in the additional processing phase

17/18 Conclusion
- Used the OpQA corpus to compare the characteristics of answers to fact and opinion questions
- Surmised that traditional QA approaches may not be as effective for MPQA as they have been for fact-based QA
- Investigated machine-learning and rule-based opinion filters and showed that they can be used to guide MPQA systems

18/18 Q & A

19/18 Questions in the OpQA collection by topic

20/18 Syntactic Constituent Type
- Shows the % of correct answers that would remain after filtering
- Opinion answers are roughly half as likely to correspond to a single syntactic constituent type, and vary much more widely with respect to syntactic category

21/18 Results for the subjectivity filters
With no filtering, performance is at least as high as in the baseline.