Opinion Holders in Opinion Text from Online Newspapers Youngho Kim, Yuchul Jung and Sung-Hyon Myaeng Reporter: Chia-Ying Lee Advisor: Prof. Hsin-Hsi Chen.


Opinion Holders in Opinion Text from Online Newspapers
Youngho Kim, Yuchul Jung and Sung-Hyon Myaeng
Information and Communications University, South Korea
2007 Conference on Granular Computing
Reporter: Chia-Ying Lee
Advisor: Prof. Hsin-Hsi Chen

Introduction
The web contains a wealth of opinions on various topics.
“Opinion holder identification” helps to analyze how people think about social issues.
Online news articles are challenging:
  Newspapers carry various opinions from very different holders.
  There may be many possible candidates between an anaphor and its proper antecedent.
The authors propose an opinion holder identification method based on anaphor resolution that exploits lexical and syntactic information.
Corpus: NTCIR-6, English

Related Work
“Automatic extraction of opinion propositions and their holders”, S. Bethard et al.
  Semantic parser based system.
“Identifying Opinion Holders for Question Answering in Opinion Texts”, S. Kim and E. Hovy.
  Maximum Entropy ranking model using syntactic features.
“Anaphora resolution by antecedent identification followed by anaphoricity determination”, R. Iida et al.
  Anaphor resolution technique; maximizes the use of lexical and structural information.

Method
An opinion holder identification method based on anaphor resolution that exploits lexical and syntactic information.
1. System Architecture
   The system consists of subjectivity classification and opinion holder identification.
2. Resolving Coreference Problem
3. Identifying Actual Opinion Holder
4. Selecting Training Features

System Architecture
1. Determine the subjectivity of a sentence
   Opinionated or non-opinionated
2. Identify opinion holders, including anaphor resolution
   Anaphoric holder or non-anaphoric holder
3. Identify the actual opinion holder
   Select the most probable holder among the candidates
Each sentence is associated with a triplet.

Resolving Coreference Problem (1/2)
Model: SVM light (MIT)
Named Entity Recognizer: MALLET (UMASS)
Noun Phrase: (t|T)he(ADJ ?)(Noun+)
Some opinions are carried by “the author”, an implicit holder that has no explicit mention in the text.
Opinions are classified into three classes: Anaphoric, Non-anaphoric, or The author
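The noun phrase pattern (t|T)he(ADJ ?)(Noun+) can be read as "the" or "The", an optional adjective, then one or more nouns. A minimal sketch of that reading over POS-tagged tokens follows; the coarse tag names "ADJ" and "NOUN", the (word, tag) input format, and the function name are assumptions made for illustration, not details taken from the slides.

```python
from typing import List, Tuple

# (word, coarse POS tag); the tag names are hypothetical placeholders.
Token = Tuple[str, str]


def definite_noun_phrases(tokens: List[Token]) -> List[str]:
    """Collect spans shaped like: the/The, optional adjective, one or more nouns."""
    phrases = []
    i = 0
    while i < len(tokens):
        if tokens[i][0] in ("the", "The"):
            j = i + 1
            # Optional single adjective.
            if j < len(tokens) and tokens[j][1] == "ADJ":
                j += 1
            first_noun = j
            # One or more nouns.
            while j < len(tokens) and tokens[j][1] == "NOUN":
                j += 1
            if j > first_noun:
                phrases.append(" ".join(word for word, _ in tokens[i:j]))
                i = j
                continue
        i += 1
    return phrases


sentence = [
    ("The", "DET"), ("ministry", "NOUN"), ("speaker", "NOUN"),
    ("criticized", "VERB"),
    ("the", "DET"), ("new", "ADJ"), ("policy", "NOUN"),
]
candidates = definite_noun_phrases(sentence)
```

Spans found this way would only be candidates; deciding which candidate is the holder is a separate step.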

Resolving Coreference Problem (2/2)
[Slide figure; labels: Anaphoric, Non-anaphoric, The author]

Identifying Actual Opinion Holder
Candidate lists:
  Anaphoric: entities or opinion holders from the previous sentences
  Non-anaphoric: entities from the current sentence
Model: decision rule based on a probabilistic model
  Candidate list: $\{h_1, h_2, \ldots, h_C\}$; context: $e$
  $f_k$ is a feature function (0, 1); $\lambda_k$ is the weight of each $f_k$
  $S_{all}$: all opinionated sentences
  $S_O$: opinions that contain opinion holders
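The slide names a probabilistic decision rule with feature functions f_k and weights λ_k, but the formula itself did not survive in the transcript. One common model of that shape is a log-linear (maximum-entropy style) score, and the sketch below assumes that form and assumes the feature functions are binary. The two feature functions, their weights, and the context dictionary are hypothetical, and S_all and S_O are not used because their role is not stated.

```python
import math
from typing import Callable, Dict, List

# A feature function maps (candidate, context) to 0 or 1.
FeatureFn = Callable[[str, Dict[str, str]], int]


def holder_probabilities(
    candidates: List[str],
    context: Dict[str, str],
    features: List[FeatureFn],
    weights: List[float],
) -> Dict[str, float]:
    """Normalized exponential of weighted feature sums (assumed log-linear form).

    Candidates are assumed to be distinct strings.
    """
    scores = [
        sum(w * f(h, context) for f, w in zip(features, weights))
        for h in candidates
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {h: e / total for h, e in zip(candidates, exps)}


def most_probable_holder(
    candidates: List[str],
    context: Dict[str, str],
    features: List[FeatureFn],
    weights: List[float],
) -> str:
    probs = holder_probabilities(candidates, context, features, weights)
    return max(probs, key=probs.get)


# Hypothetical features, for illustration only.
def is_subject(h: str, ctx: Dict[str, str]) -> int:
    return int(ctx.get("subject") == h)


def is_capitalized(h: str, ctx: Dict[str, str]) -> int:
    return int(h[:1].isupper())


context = {"subject": "Ms. Barbara"}
candidates = ["Ms. Barbara", "the policy"]
features = [is_subject, is_capitalized]
weights = [1.5, 0.5]
best = most_probable_holder(candidates, context, features, weights)
```

In a trained system the weights would be learned from annotated data rather than set by hand as they are here.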

Selecting Training Features
Feature functions in the learning model

Experiments (1/3)
The lenient standard counts a result as correct when one or more words overlap.
The strict standard requires an exact match.
About 38% of all holders are anaphoric in the gold standard.
The system accomplishes the following tasks:
1. Anaphoricity classification (Anaphoric holder / Non-anaphoric holder / The author)
2. Non-anaphoric holder resolution: ranking the candidates based on the features (Table 2)
3. Anaphoric holder resolution: ranking the candidates based on the features (Table 3)
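The two evaluation standards can be illustrated with a short sketch. Whitespace tokenization and case-insensitive comparison are assumptions made here; the slides do not say how words were tokenized or normalized.

```python
def _words(text: str) -> list:
    # Assumed normalization: lowercase and split on whitespace.
    return text.lower().split()


def strict_match(predicted: str, gold: str) -> bool:
    """Strict standard: the predicted holder matches the gold holder exactly."""
    return _words(predicted) == _words(gold)


def lenient_match(predicted: str, gold: str) -> bool:
    """Lenient standard: at least one word is shared with the gold holder."""
    return bool(set(_words(predicted)) & set(_words(gold)))


gold = "The ministry speaker, Ms. Barbara"
predicted = "Ms. Barbara"
is_lenient_hit = lenient_match(predicted, gold)
is_strict_hit = strict_match(predicted, gold)
```

Under these assumptions a partial holder such as “Ms. Barbara” counts as correct under the lenient standard but not under the strict one.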

Experiments (2/3)
Non-anaphoric: lexical clues (N3, 4, 5, 6) are dominant, whereas syntactic clues (N1, 2) are less effective.
Anaphoric: combining all features increases performance dramatically.
The NER could not generate full candidates such as “The ministry speaker, Ms. Barbara”; it caught only “Ms. Barbara”.

Experiments (3/3)
Anaphoricity classification used various features, such as particular phrases and a mixture of syntactic and lexical clues.
Using syntactic features (A2, 5, 7, 8, 10, 11) alone is less effective than using all features.
The author is not clearly revealed by structural clues alone.

Discussion & Conclusion
The system addresses the task with a novel approach that focuses on coreference resolution.
Most errors are related to the named-entity hypothesis.
  General noun phrases such as “31 smokers” cannot be found.
A more complicated problem arises when opinions are expressed by both anaphoric and non-anaphoric holders.
Apposition is also a source of problems.
  “Jackson” for “The singer, Jackson” is not acceptable, since “Jackson” is too ambiguous.

Thank you!