
Extraction of Opinions on the Web Richard Johansson Presentation at the LK summer school August 31, 2011 Computer Science and Engineering Department University of Trento Funded by EU FP7: LivingKnowledge and EternalS

Personal Background Defended doctoral dissertation in December 2008 at Lund University, Sweden I now work as a postdoctoral researcher at the University of Trento, Italy PhD work focused on NLP tasks such as syntactic parsing and shallow-semantic extraction Postdoc work on the applications of these methods in areas such as opinion extraction

Overview Introduction Coarse-grained methods Fine-grained methods Resources Advanced topics: recent research from LK

Introduction Extraction of opinions expressed on the web is a task with many practical applications “give me all positive opinions expressed by Sarkozy last week” “what is the overall perception (positive/negative) on the New Start treaty?” “Vaclav Klaus expressed his [disapproval] of the treaty while French Prime Minister Sarkozy [supported] it.”

Direct applications Consumer information Quickly surveying evaluations from other consumers Conversely, companies may survey what customers think Social and political sciences Surveying popular opinion on contentious issues Track the development of opinion over time Measure the effect of some event on opinions

Indirect applications Retrieval systems given a topic, identify documents that express attitudes toward this topic Question-answering systems Obvious: What does X think about Y? Also: Filtering out opinionated text before returning answers

A note on terminology Opinion extraction/analysis/mining etc Sentiment analysis/extraction Subjectivity analysis/extraction Etc etc etc

Coarse-grained Opinion Extraction Classification of fairly large units of text (e.g. documents) Examples: Distinguish editorials from “objective” news text Given a review (product, movie, restaurant, …), predict the number of stars

Lexicon-based Methods Simplest solution: count “positive” and “negative” words listed in some lexicon Weighted variants also exist Lexicons may be generic or domain-specific Example (with SentiWordNet, first sense): “This movie is awful with really boring actors” awful: negative really: positive boring: 0.25 negative
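
The counting approach above can be sketched in a few lines. The lexicon entries and weights here are invented for illustration; a real system would load them from SentiWordNet or a similar resource:

```python
# Minimal sketch of lexicon-based polarity scoring.
# LEXICON weights are assumed values, not actual SentiWordNet scores.
LEXICON = {
    "awful": -1.0,
    "boring": -0.25,
    "really": 0.125,   # some lexicons give intensifiers a weak positive prior
    "fantastic": 1.0,
}

def lexicon_score(tokens):
    """Sum the prior polarities of all tokens found in the lexicon."""
    return sum(LEXICON.get(t.lower(), 0.0) for t in tokens)

tokens = "This movie is awful with really boring actors".split()
score = lexicon_score(tokens)                  # -1.0 + 0.125 - 0.25 = -1.125
label = "positive" if score > 0 else "negative"
```

With the assumed weights, the negative terms outweigh the intensifier and the document is labeled negative.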

Classification using machine learning Coarse-grained opinion extraction is a type of text categorization Categorize the text As factual or opinionated As positive or negative (or the number of stars) We can then apply classical text categorization methods (Pang and Lee, 2002)

Classification using machine learning Represent a document using a bag of words representation (i.e. a histogram) Optionally, add extra features for words that appear in some lexicon Apply some machine learning method to learn to separate the documents into classes (e.g. SVM, MaxEnt, Naïve Bayes, …)
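
A minimal, self-contained sketch of this route, using a mistake-driven perceptron over bag-of-words histograms as a stand-in for SVM/MaxEnt (the toy documents and labels are invented for illustration):

```python
# Bag-of-words document classifier: histogram features + linear model.
from collections import Counter

def bow(text):
    """Bag-of-words representation: a histogram of lowercased tokens."""
    return Counter(text.lower().split())

def train_perceptron(docs, labels, epochs=10):
    """Mistake-driven training; labels are +1 (positive) or -1 (negative)."""
    w = Counter()
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            feats = bow(x)
            score = sum(w[f] * v for f, v in feats.items())
            if y * score <= 0:                 # update only on mistakes
                for f, v in feats.items():
                    w[f] += y * v
    return w

def predict(w, text):
    return 1 if sum(w[f] * v for f, v in bow(text).items()) > 0 else -1

docs = ["great movie loved it", "awful boring movie",
        "loved the acting", "boring and awful"]
labels = [1, -1, 1, -1]
w = train_perceptron(docs, labels)
```

In practice one would add the optional lexicon-membership features as extra dimensions of the same feature vector.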

But the context… “The price is high – I saw many cheaper options elsewhere” In practice, expressions of opinion are highly context-sensitive: Unigram (BOW or lexicon) models may run into difficulties Possible solutions: Bigrams, trigrams, … Syntax-based representations Very large feature spaces: feature selection needed
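
The n-gram remedy amounts to adding each contiguous token sequence as its own feature, so a context-sensitive phrase like “price is high” gets a feature of its own (at the cost of a much larger feature space). A small sketch:

```python
# Unigram + bigram feature extraction for a bag-of-n-grams model.
def ngram_features(tokens, n_max=2):
    """All n-grams up to length n_max, joined with '_' as feature names."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append("_".join(tokens[i:i + n]))
    return feats

feats = ngram_features("the price is high".split())
# contains unigrams like "high" plus bigrams like "price_is", "is_high"
```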

Domain Adaptation Problem: an opinion classifier trained on one collection (e.g. reviews of hotels) may not perform well on a collection from a different domain (e.g. reviews of cars) We may apply domain adaptation methods (Blitzer et al., 2007, inter alia) Similar methods may be applied for lexicon-based opinion classifiers (Jijkoun et al., 2010)

Structural Correspondence Learning (Blitzer et al., 2007) Idea: Some pivot features generalize across domains (e.g. “good”, “awful”) Some features are completely domain-specific (“plastic”, “noisy”, “dark”) Find correlations between pivot and domain-specific features Example experiment: DVD movies -> kitchen appliances Baseline 0.74, upper bound 0.88 With domain adaptation: 0.81
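
A deliberately simplified sketch of the SCL intuition only: accumulate, for each domain-specific word, the polarity of the pivot words it co-occurs with, so that e.g. “noisy” inherits a negative signal. The pivot set and toy documents are assumptions; real SCL trains per-pivot predictors and applies SVD to their weight matrix, which this does not do:

```python
# Co-occurrence correlation between pivot and domain-specific features
# (the intuition behind SCL, not the actual algorithm).
from collections import defaultdict

PIVOTS = {"good": 1, "great": 1, "awful": -1, "bad": -1}  # assumed pivot set

def pivot_correlation(docs):
    corr = defaultdict(float)
    for doc in docs:
        tokens = doc.lower().split()
        pivot_signal = sum(PIVOTS.get(t, 0) for t in tokens)
        for t in tokens:
            if t not in PIVOTS:
                corr[t] += pivot_signal       # accumulate pivot polarity
    return corr

docs = ["good sturdy blender",                # toy kitchen-appliance reviews
        "awful noisy appliance",
        "bad noisy motor"]
corr = pivot_correlation(docs)
# "noisy" ends up negatively correlated, "sturdy" positively
```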

Fine-grained Opinion Extraction We may want to pose more complex queries: “give me all positive opinions expressed by Sarkozy last week” “what is the overall perception (positive/negative) on the New Start treaty?” “what is good and what is bad about the new Canon camera?” “Vaclav Klaus expressed his [disapproval] of the treaty while French Prime Minister Sarkozy [supported] it.”

Common subtasks Mark up opinion expressions in the text Label expressions with polarity values Find opinion holders for the opinions Find the topics (targets) of the opinions

Opinion Expressions An opinion expression is a piece of text that allows us to conclude that some entity has some opinion – a private state The MPQA corpus (Wiebe et al., 2005) defines two main types of expressions: Direct-subjective: typically emotion, communication, and categorization verbs Expressive subjective: typically qualitative adjectives and “loaded language”

Examples of opinion expressions I [love] DSE this [fantastic] ESE conference. [However] ESE, it is becoming [rather fashionable] ESE to [exchange harsh words] DSE with each other [like kids] ESE. The software is [not so easy] ESE to use.

Opinion Holders For every opinion expression, there is an associated opinion holder. Also annotated in the MPQA Our system finds three types of holders: Explicitly mentioned holders in the same sentence The writer of the text Implicit holder, such as in passive sentences (“he was widely condemned”)

Examples of opinion holders Explicitly mentioned holder: I [love] DSE this [fantastic] ESE conference. Writer (red) and implicit (green): [However] ESE, it is becoming [rather fashionable] ESE to [exchange harsh words] DSE with each other [like kids] ESE.

Nested structure of opinion scopes Sharon [insinuated] ESE+DSE that Arafat [hated] DSE Israel. Writer: negative opinion on Sharon Sharon: negative opinion on Arafat Arafat: negative opinion on Israel The MPQA corpus annotates the nested structure of opinion/holder scopes Our system does not take the nesting into account

Opinion polarities Every opinion expression has a polarity: positive, negative, or neutral (for non-evaluative opinions) I [love] this [fantastic] conference. [However], it is becoming [rather fashionable] to [exchange harsh words] with each other [like kids]. The software is [not so easy] to use.

Tagging Opinion Expressions The obvious approach – which we used as a baseline – is a standard sequence labeler with Viterbi decoding Sequence labeler using word, POS tag, and lemma features in a sliding window Can also use prior polarity/intensity features derived from the MPQA subjectivity lexicon This was the approach of Breck et al. (2007)
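
The per-token feature extraction for such a tagger can be sketched as below. The tiny prior-polarity dictionary is a stand-in for the MPQA subjectivity lexicon, and the feature names are illustrative:

```python
# Sliding-window feature extraction for a BIO-style opinion-expression tagger.
# PRIOR is a toy stand-in for the MPQA subjectivity lexicon.
PRIOR = {"love": "strong_pos", "fantastic": "strong_pos", "harsh": "strong_neg"}

def token_features(tokens, i, window=1):
    """Word features in a window around position i, plus a lexicon prior."""
    feats = {"word": tokens[i].lower()}
    for d in range(1, window + 1):
        feats[f"word-{d}"] = tokens[i - d].lower() if i - d >= 0 else "<s>"
        feats[f"word+{d}"] = tokens[i + d].lower() if i + d < len(tokens) else "</s>"
    if tokens[i].lower() in PRIOR:
        feats["prior"] = PRIOR[tokens[i].lower()]
    return feats

tokens = "I love this fantastic conference".split()
f1 = token_features(tokens, 1)   # features for "love"
```

A real tagger would add POS-tag and lemma features in the same window and feed the feature dictionaries to a sequence model with Viterbi decoding.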

Example

Extracting Opinion Holders For opinion holder extraction, we trained a classifier based on techniques common in semantic role labeling Applies to the noun phrases in a sentence A separate classifier detects implicit and writer opinion holders At prediction time, the opinion holder candidate with the maximal score is selected

Syntactic structure and semantic roles We used the LTH syntactic/semantic parser to extract features (Johansson and Nugues, 2008) Outputs dependency parse trees and semantic role structures

Classifying Expression Polarity Given an opinion expression, assign a polarity label (Positive, Neutral, Negative) SVM classifier with BOW representation of the expression and its context, lexicon features

Resources: Collections Pang: Movie reviews (pos/neg) Liu: Product features Dredze: Multi-domain product reviews (pos/neg) MPQA: Fine-grained annotation: expressions, holders, polarities, intensities, holder coreference

Resources: Lexicons MPQA lexicon SentiWordNet

Advanced topic 1: Opinion extraction with an interaction model Previous work used bracketing methods with local features and Viterbi decoding In a sequence labeler using local features only, the model can’t take into account the interactions between opinion expressions Opinions tend to be structurally close in the sentence, and occur in patterns, for instance Verb of categorization dominating evaluation: He denounced as a human rights violation … Discourse connections: Zürich is beautiful but its restaurants are expensive

Interaction (opinion holders) For verbs of evaluation/categorization, opinion holder extraction is fairly easy (basically SRL) They may help us find the holder of other opinions expressed in the sentence: He denounced as a human rights violation … This is a human rights violation … Linguistic structure may be useful to determine whether two opinions have the same holder

Interaction (polarity) The relation between opinion expressions may influence polarity: He denounced as a human rights violation … Discourse relations are also important: Expansion: Zürich is beautiful and its restaurants are good Contrast: Zürich is beautiful but its restaurants are expensive

Learning the Interaction model We need a new model based on interactions between opinions We use a standard linear model: the best analysis y for a sentence x maximizes a score w · Φ(x, y) We decompose the feature representation Φ into local parts (one per opinion expression) and interaction parts (over pairs of expressions) But: Exact inference in a model with interaction features is intractable (it can be reduced to a weighted CSP)

Approximate inference Apply a standard Viterbi-based sequence labeler based on local context features but no structural interaction features. Generate a small candidate set of size k. Generate opinion holders/polarities for every proposed opinion expression. Apply a reranker using interaction features – which can be arbitrarily complex – to pick the top candidate from the candidate set.
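
The pipeline can be sketched as a k-best reranking step. The candidate structures and both scoring functions below are hypothetical stand-ins, not the actual system's features:

```python
# k-best reranking: a base model proposes candidates with local scores,
# a reranker with interaction features picks the final analysis.
def rerank(candidates, base_scores, interaction_score, k=3):
    """Keep the k best base candidates, then pick the reranker's favourite."""
    ranked = sorted(zip(candidates, base_scores), key=lambda p: -p[1])[:k]
    return max(ranked, key=lambda p: p[1] + interaction_score(p[0]))[0]

# Toy example: the interaction model rewards candidate markups whose
# opinion expressions share a holder.
cands = [{"exprs": 2, "same_holder": False},
         {"exprs": 2, "same_holder": True},
         {"exprs": 1, "same_holder": False}]
scores = [1.0, 0.9, 0.5]
best = rerank(cands, scores, lambda c: 0.5 if c["same_holder"] else 0.0)
```

Here the second candidate overtakes the base model's favourite once the interaction bonus is added; because the reranker only scores k candidates, its features can be arbitrarily complex without making inference intractable.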

Evaluation (Johansson and Moschitti 2010a, 2010b, 2011)

Opinion markup (F-measure): Baseline 53.8, Reranked 58.5
Holder identification (F-measure): Baseline 50.8, Extended 54.2
Markup + polarity (F-measure): Baseline 45.7, Extended 49.7

Advanced topic 2: Extraction of Feature Evaluations Extraction of evaluations of product features (Hu and Liu, 2004) “This player boasts a decent size and weight, a relatively-intuitive navigational system that categorizes based on id3 tags, and excellent sound” size +2, weight +2, navigational system +2, sound +2 We used only the signs (positive/negative)

Extraction of Feature Evaluations We built a system that used features derived from the MPQA-style opinion expressions We compared with two baselines: Simple baseline using local features only Stronger baseline using sentiment lexicon

Extraction of Feature Evaluations

References
E. Breck, Y. Choi, C. Cardie. Identifying expressions of opinion in context. Proc. IJCAI-2007.
J. Blitzer, M. Dredze, F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain adaptation for sentiment classification. Proc. ACL-2007.
Y. Choi, C. Cardie. Hierarchical sequential learning for extracting opinions and their attributes. Proc. ACL-2010.
M. Hu, B. Liu. Mining opinion features in customer reviews. Proc. AAAI-2004.
V. Jijkoun, M. de Rijke, W. Weerkamp. Generating focused topic-specific sentiment lexicons. Proc. ACL-2010.
R. Johansson, A. Moschitti. Syntactic and semantic structure for opinion expression detection. Proc. CoNLL-2010.
R. Johansson, A. Moschitti. Reranking models in fine-grained opinion analysis. Proc. Coling-2010.

References
R. Johansson, A. Moschitti. Extracting opinion expressions and their polarities – exploration of pipelines and joint models. Proc. ACL-2011.
R. Johansson, P. Nugues. Dependency-based syntactic–semantic analysis with PropBank and NomBank. Proc. CoNLL-2008.
B. Pang, L. Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. Proc. ACL-2004.
S. Somasundaran, G. Namata, J. Wiebe, L. Getoor. Supervised and unsupervised methods in employing discourse relations for improving opinion polarity classification. Proc. EMNLP-2009.
J. Wiebe, T. Wilson, C. Cardie. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3), 2005.

Acknowledgements We have received funding from the European Community’s Seventh Framework Programme (FP7/ ) under the following grants: Grant : LivingKnowledge – Facts, Opinions and Bias in Time; Grant : Trustworthy Eternal Systems via Evolving Software, Data and Knowledge (EternalS). We would also like to thank Eric Breck and Yejin Choi for explaining their results and experimental setup.