807 - TEXT ANALYTICS Massimo Poesio Lecture 4: Sentiment analysis (aka Opinion Mining)


FACTS AND OPINIONS Two main types of textual information on the Web: FACTS and OPINIONS.
– Current search engines search for facts (and assume they are true); facts can be expressed with topic keywords.
– Search engines do not search for opinions: opinions are hard to express with just a few keywords (e.g., "What do people think of Motorola cell phones?"), and the current search ranking strategy is not appropriate for opinion retrieval/search.

THERE ARE PLENTY OF OPINIONS ON THE WEB

SENTIMENT ANALYSIS (also known as opinion mining) Attempts to identify the opinion/sentiment that a person may hold towards an object, classifying it as positive, negative, or neutral.

Components of an opinion Basic components of an opinion:
– Opinion holder: the person or organization that holds a specific opinion on a particular object.
– Object: the entity on which an opinion is expressed.
– Opinion: a view, attitude, or appraisal of an object from an opinion holder.

SENTIMENT ANALYSIS GRANULARITY At the document (or review) level:
– Task: sentiment classification of reviews.
– Classes: positive, negative, and neutral.
– Assumption: each document (or review) focuses on a single object (not true in many discussion posts) and contains opinions from a single opinion holder.

DOCUMENT-LEVEL SENTIMENT ANALYSIS EXAMPLE

SENTIMENT ANALYSIS GRANULARITY At the sentence level:
– Task 1: identifying subjective/opinionated sentences. Classes: objective and subjective (opinionated).
– Task 2: sentiment classification of sentences. Classes: positive, negative, and neutral.
– Assumption: a sentence contains only one opinion (not true in many cases).
We can then also consider clauses or phrases.

SENTENCE-LEVEL SENTIMENT ANALYSIS EXAMPLE Id: Abc123 on “I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear too. It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, …”

SENTIMENT ANALYSIS GRANULARITY At the feature level:
– Task 1: identify and extract object features that have been commented on by an opinion holder (e.g., a reviewer).
– Task 2: determine whether the opinions on the features are positive, negative or neutral.
– Task 3: group feature synonyms.
Produce a feature-based opinion summary of multiple reviews.
Opinion holders: identifying holders is also useful, e.g., in news articles, but they are usually known in user-generated content, i.e., the authors of the posts.

FEATURE-LEVEL SENTIMENT ANALYSIS

ENTITY AND ASPECT (Hu and Liu, 2004; Liu, 2006)

OPINION TARGET

A DEFINITION OF OPINION (Liu, Ch. in NLP handbook, 2010)
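The original slide (a figure) is not reproduced in the transcript. Liu's definition models an opinion as a quintuple: the target entity, the aspect of that entity being evaluated, the sentiment orientation, the opinion holder, and the time the opinion was posted. A minimal data-structure sketch of that quintuple (the field values below are illustrative only):

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """Liu's quintuple view of an opinion: which aspect of which entity is
    evaluated, with what orientation, by whom, and when."""
    entity: str       # e.g. "iPhone"
    aspect: str       # e.g. "touch screen" (or GENERAL for the entity as a whole)
    orientation: str  # "positive", "negative" or "neutral"
    holder: str       # e.g. "Abc123"
    time: str         # e.g. "2011-11-04" (illustrative value)

op = Opinion("iPhone", "touch screen", "positive", "Abc123", "2011-11-04")
print(op)
```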

SENTIMENT ANALYSIS: THE TASK

Applications Businesses and organizations:
– product and service benchmarking, market intelligence.
– Businesses spend a huge amount of money to find consumer sentiments and opinions (consultants, surveys, focus groups, etc.).
Individuals: interested in others' opinions when
– purchasing a product or using a service,
– finding opinions on political topics.
Ad placement: placing ads in user-generated content
– place an ad when someone praises a product,
– place an ad from a competitor if someone criticizes a product.
Opinion retrieval/search: providing general search for opinions.

DOCUMENT-LEVEL SENTIMENT ANALYSIS

DOCUMENT-LEVEL SENTIMENT ANALYSIS = TEXT CLASSIFICATION

ASSUMPTIONS AND GOALS

LEXICON-BASED APPROACHES Use sentiment and subjectivity lexicons. Rule-based classifier:
– A sentence is subjective if it has at least two words in the lexicon.
– A sentence is objective otherwise.
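As an illustration, here is a minimal sketch of such a rule-based classifier in Python. The tiny lexicon is purely illustrative; a real system would load a resource such as the OpinionFinder subjectivity lexicon.

```python
# Minimal rule-based subjectivity classifier (sketch).
# SUBJECTIVITY_LEXICON is a tiny illustrative sample, not a real resource.
SUBJECTIVITY_LEXICON = {"nice", "cool", "terrible", "mad", "expensive", "love", "hate"}

def is_subjective(sentence, threshold=2):
    """A sentence is classified as subjective if it contains at least
    `threshold` words from the subjectivity lexicon, objective otherwise."""
    tokens = sentence.lower().split()
    hits = sum(1 for tok in tokens if tok.strip(".,!?") in SUBJECTIVITY_LEXICON)
    return hits >= threshold

print(is_subjective("It is such a nice phone and the screen is really cool."))  # True
print(is_subjective("I bought an iPhone a few days ago."))                      # False
```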

SUPERVISED CLASSIFICATION Treat sentiment analysis as a type of classification. Use corpora annotated for subjectivity and/or sentiment to train machine learning algorithms:
– Naïve Bayes
– Decision trees
– SVM
– …
and learn to automatically annotate new text.
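A typical setup looks like the following scikit-learn sketch. The four training examples are toy placeholders; in practice one trains on an annotated corpus such as the Pang & Lee movie review data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy placeholder corpus; a real system would use an annotated review corpus.
train_texts = ["such a nice phone", "the screen is terrible",
               "I love the battery life", "awful voice quality"]
train_labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words / tf-idf features fed to a Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["awful terrible screen"]))  # likely ['negative'] given the toy data
```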

TYPICAL SUPERVISED APPROACH

FEATURES FOR SUPERVISED DOCUMENT-LEVEL SENTIMENT ANALYSIS A large set of features has been tried by researchers (see, e.g., work here at Essex by Roseline Antai):
– term frequency and different IR weighting schemes, as in other work on classification
– part of speech (POS) tags
– opinion words and phrases
– negations
– syntactic dependencies
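One of the simplest of these tricks, negation handling, is often implemented by rewriting tokens that follow a negation word (good becomes NOT_good) before computing term weights. A small sketch of that idea (the negation list and clause heuristic are illustrative simplifications):

```python
import re

NEGATION_WORDS = {"not", "no", "never", "cannot"}
CLAUSE_END = re.compile(r"[.,;!?]")

def mark_negation(text):
    """Prefix tokens that follow a negation word with 'NOT_' until the next
    clause-ending punctuation, a standard feature-engineering trick."""
    tokens, negated, out = text.lower().split(), False, []
    for tok in tokens:
        out.append("NOT_" + tok if negated else tok)
        if tok in NEGATION_WORDS or tok.endswith("n't"):
            negated = True
        elif CLAUSE_END.search(tok):
            negated = False
    return out

print(mark_negation("the phone is not good, but the screen is great"))
# ['the', 'phone', 'is', 'not', 'NOT_good,', 'but', 'the', 'screen', 'is', 'great']
```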

EASIER AND HARDER PROBLEMS
– Tweets from Twitter are probably the easiest: short and thus usually straight to the point.
– Reviews are next: the entities are (almost always) given and there is little noise.
– Discussions, comments, and blogs are hard: multiple entities, comparisons, noise, sarcasm, etc.

ASPECT-BASED SENTIMENT ANALYSIS Sentiment classification at the document or sentence (or clause) level is useful, but it does not find what people liked and disliked. It does not identify the targets of opinions, i.e., ENTITIES and their ASPECTS. Without knowing the targets, opinions are of limited use.

ASPECT-BASED SENTIMENT ANALYSIS Much of the research is based on online reviews. For reviews, aspect-based sentiment analysis is easier because the entity (i.e., the product name) is usually known; reviewers simply express positive and negative opinions on different aspects of the entity. For blogs, forum discussions, etc., it is harder:
– both the entity and the aspects of the entity are unknown,
– there may also be many comparisons,
– and there is also a lot of irrelevant information.

BRIEF DIGRESSION Regular opinions: sentiment/opinion expressions on some target entities.
– Direct opinions: "The touch screen is really cool."
– Indirect opinions: "After taking the drug, my pain has gone."
COMPARATIVE opinions: comparisons of more than one entity.
– "iPhone is better than Blackberry."

Find entities (entity set expansion) Although similar, this is somewhat different from traditional named entity recognition (NER); see the next lectures. E.g., to study opinions on phones: given Motorola and Nokia, find all phone brands and models in a corpus, e.g., Samsung, Moto, …

Feature/Aspect extraction
– Extract frequent nouns and noun phrases, sometimes limited to a set known to be related to the entity of interest or found using part discriminators (e.g., for a scanner entity: "scanner", "scanner has").
– Exploit opinion and target relations: proximity or syntactic dependency.
– Standard IE methods: rule-based or supervised learning, often HMMs or CRFs (as in standard IE).
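A hedged sketch of the frequent-noun idea using NLTK's POS tagger (assuming the relevant NLTK tokenizer and tagger data are downloaded); a real system would additionally prune candidates with the discriminators and opinion-target relations listed above.

```python
from collections import Counter
import nltk  # assumes the NLTK tokenizer and POS-tagger data are downloaded

reviews = [
    "The touch screen is really cool.",
    "The voice quality is clear too.",
    "The screen is too small for my taste.",
]

def frequent_aspects(reviews, min_count=2):
    """Collect nouns from POS-tagged reviews and keep those that occur often,
    a simple approximation of frequency-based aspect extraction."""
    counts = Counter()
    for review in reviews:
        tags = nltk.pos_tag(nltk.word_tokenize(review.lower()))
        counts.update(tok for tok, tag in tags if tag.startswith("NN"))
    return [noun for noun, c in counts.items() if c >= min_count]

print(frequent_aspects(reviews))  # e.g. ['screen']
```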

Aspect extraction using dependency grammar
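The slide's figure is not reproduced in the transcript; the idea is to link opinion words to their targets by following dependency relations. A minimal sketch with spaCy (assuming the en_core_web_sm model is installed; the opinion word list is illustrative), using two of the typical relations:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this small English model is installed
OPINION_WORDS = {"cool", "clear", "terrible", "nice"}  # tiny illustrative lexicon

def aspects_via_dependencies(text):
    """Pair opinion adjectives with candidate aspect nouns via dependency links:
    adjectival modifiers (amod) and predicative adjectives linked to a subject."""
    pairs = []
    for tok in nlp(text):
        if tok.lemma_.lower() not in OPINION_WORDS:
            continue
        if tok.dep_ == "amod" and tok.head.pos_ == "NOUN":
            pairs.append((tok.head.text, tok.text))             # "a cool screen"
        elif tok.dep_ == "acomp":
            subjects = [c for c in tok.head.children if c.dep_ == "nsubj"]
            pairs.extend((s.text, tok.text) for s in subjects)  # "the screen is cool"
    return pairs

print(aspects_via_dependencies("The touch screen is really cool and the voice quality is clear."))
# e.g. [('screen', 'cool'), ('quality', 'clear')]
```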

RESOURCES FOR SENTIMENT ANALYSIS
Lexicons:
– General Inquirer (Stone et al., 1966)
– OpinionFinder lexicon (Wiebe & Riloff, 2005)
– SentiWordNet (Esuli & Sebastiani, 2006)
Annotated corpora:
– used in statistical approaches (Hu & Liu, 2004; Pang & Lee, 2004)
– MPQA corpus (Wiebe et al., 2005)
Tools:
– algorithm based on minimum cuts (Pang & Lee, 2004)
– OpinionFinder (Wiebe et al., 2005)
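SentiWordNet, for example, can be queried directly through NLTK's corpus reader (assuming the sentiwordnet and wordnet corpora have been downloaded):

```python
from nltk.corpus import sentiwordnet as swn  # assumes NLTK sentiwordnet/wordnet data are installed

# Each WordNet sense carries a positivity, negativity and objectivity score.
for sense in swn.senti_synsets("terrible", "a"):
    print(sense, "pos =", sense.pos_score(), "neg =", sense.neg_score(), "obj =", sense.obj_score())
```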

Lexical resources for Sentiment and Subjectivity Analysis Overview

Sentiment (or opinion) lexica

Sentiment lexica

Sentiment-bearing words: adjectives (Hatzivassiloglou & McKeown 1997, Wiebe 2000, Kamps & Marx 2002, Andreevskaia & Bergler 2006)
– positive: honest, important, mature, large, patient
"Ron Paul is the only honest man in Washington."
"Kitchell's writing is unbelievably mature and is only likely to get better."
"To humour me my patient father agrees yet again to my choice of film."

Negative adjectives
– negative: harmful, hypocritical, inefficient, insecure
"It was a macabre and hypocritical circus."
"Why are they being so inefficient?"

Subjective adjectives
– subjective (but not positive or negative sentiment): curious, peculiar, odd, likely, probable
"He spoke of Sue as his probable successor."
"The two species are likely to flower at different times."

Other parts of speech (Turney & Littman 2003; Riloff, Wiebe & Wilson 2003; Esuli & Sebastiani 2006)
– Verbs: positive: praise, love; negative: blame, criticize; subjective: predict
– Nouns: positive: pleasure, enjoyment; negative: pain, criticism; subjective: prediction, feeling

Phrases containing adjectives and adverbs (Turney 2002; Takamura, Inui & Okumura 2007)
– positive: high intelligence, low cost
– negative: little variation, many troubles

Sentiment lexica
– Human-created: WordNet Affect
– Semi-automatic: SentiWordNet 3.0
– Fully automatic: SenticNet 2.0

(Semi-)automatic creation of sentiment lexica
– Find relevant words, phrases, and patterns that can be used to express subjectivity.
– Determine the polarity of subjective expressions.
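A classic way to carry out the second step is corpus-based, in the spirit of Turney & Littman (2003): score a candidate word by how much more strongly it co-occurs with positive seed words than with negative ones. A toy sketch of that idea, with made-up co-occurrence counts standing in for corpus statistics:

```python
import math

POS_SEEDS = {"good", "excellent"}
NEG_SEEDS = {"bad", "poor"}

# Toy co-occurrence counts (candidate word, seed word) -> count within some window.
# In Turney & Littman these come from a large corpus or search hit counts, not hand-made numbers.
cooc = {("reliable", "good"): 40, ("reliable", "excellent"): 15,
        ("reliable", "bad"): 5,   ("reliable", "poor"): 3,
        ("buggy", "good"): 4,     ("buggy", "excellent"): 1,
        ("buggy", "bad"): 30,     ("buggy", "poor"): 12}

def semantic_orientation(word):
    """Positive minus negative association, in the spirit of SO-PMI
    (log of co-occurrence with positive seeds vs. negative seeds)."""
    pos = sum(cooc.get((word, s), 0) for s in POS_SEEDS) + 0.01  # smoothing
    neg = sum(cooc.get((word, s), 0) for s in NEG_SEEDS) + 0.01
    return math.log2(pos / neg)

for w in ("reliable", "buggy"):
    print(w, round(semantic_orientation(w), 2))  # reliable > 0, buggy < 0
```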

FINDING POLARITY IN CORPORA USING PATTERNS

USING PATTERNS Lexico-syntactic patterns (Riloff & Wiebe 2003)
– way with: "… to ever let China use force to have its way with …"
– expense of: "at the expense of the world's security and stability"
– underlined: "Jiang's subdued tone … underlined his desire to avoid disputes …"

DICTIONARY-BASED METHODS
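The slide content is not reproduced in the transcript; the usual dictionary-based strategy is to start from a small seed set of polar words and grow it by following synonym (and antonym) links in WordNet. A hedged sketch of the synonym-expansion loop using NLTK's WordNet interface (assuming the wordnet corpus is downloaded); the result is noisy and normally has to be cleaned manually or filtered further.

```python
from nltk.corpus import wordnet as wn  # assumes the NLTK wordnet corpus is installed

def expand_seeds(seeds, iterations=2):
    """Grow a polarity word list by following WordNet synonym links from a seed set,
    the core loop of the classic dictionary-based bootstrapping approach."""
    words = set(seeds)
    for _ in range(iterations):
        new = set()
        for word in words:
            for synset in wn.synsets(word, pos=wn.ADJ):
                new.update(lemma.name().lower() for lemma in synset.lemmas())
        words |= new
    return words

positive = expand_seeds({"good", "nice"})
print(sorted(positive)[:10])  # a (noisy) list of candidate positive adjectives
```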

SEMI-SUPERVISED LEARNING (Esuli and Sebastiani, 2005)

Corpora for Sentiment and Subjectivity Analysis Overview

MPQA corpus (version 2) English-language versions of articles from the world press (187 news sources). Also includes contextual polarity annotations (see later). Themes of the annotation instructions:
– No rules about how particular words should be annotated.
– Don't take expressions out of context and think about what they could mean; judge them as they are used in that sentence.

Definitions and Annotation Scheme Manual annotation: human markup of corpora (bodies of text). Why?
– Understand the problem.
– Create gold standards (and training data).
References: Wiebe, Wilson & Cardie, LRE 2005; Wilson & Wiebe, ACL 2005 workshop; Somasundaran, Wiebe, Hoffmann & Litman, ACL 2006 workshop; Somasundaran, Ruppenhofer & Wiebe, SIGdial 2007; Wilson, 2008 PhD dissertation.

Overview Fine-grained: expression-level rather than sentence- or document-level annotation. Annotated:
– subjective expressions
– material attributed to a source, but presented objectively

OTHER CORPORA
– The Movie Review data created by Pang and Lee
– The SemEval 2007 and 2014 (sentiment analysis in Twitter) shared task data
– The Kaggle 2014 competition on sentiment analysis of movie reviews
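The Pang & Lee movie review data also ships with NLTK, which gives a quick way to inspect it (assuming the movie_reviews corpus has been downloaded):

```python
from nltk.corpus import movie_reviews  # assumes the NLTK movie_reviews corpus is downloaded

# The Pang & Lee movie review data as packaged by NLTK: 1000 positive + 1000 negative reviews.
print(movie_reviews.categories())                  # ['neg', 'pos']
print(len(movie_reviews.fileids("pos")))           # 1000
sample = movie_reviews.fileids("neg")[0]
print(" ".join(movie_reviews.words(sample)[:20]))  # first 20 tokens of one negative review
```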

Gold Standards Derived from manually annotated data, or from "found" data (examples):
– blog tags (Balog, Mishne & de Rijke, EACL 2006)
– websites for reviews, complaints, political arguments: amazon.com (Pang and Lee, ACL 2004), complaints.com (Kim and Hovy, ACL 2006), bitterlemons.com (Lin and Hauptmann, ACL 2006)
Word lists (example):
– General Inquirer (Stone et al., 1966)

TOOLS

OPINE

OPINION SUMMARIES

GOOGLE PRODUCTS

READINGS Bo Pang & Lillian Lee, 2008 – Opinion Mining and Sentiment Analysis – Foundations and Trends in Information Retrieval, v. 2, 1-2 – On the website

ACKNOWLEDGMENTS Some slides borrowed from – Janyce Wiebe’s tutorials – Bing Liu’s tutorials – Ronen Feldman’s IJCAI 2013 tutorial