Mining and Searching Opinions in User-Generated Contents Bing Liu Department of Computer Science University of Illinois at Chicago.

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

Trends in Sentiments of Yelp Reviews Namank Shah CS 591.
Document Summarization using Conditional Random Fields Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, Zheng Chen IJCAI 2007 Hao-Chin Chang Department of Computer.
Chapter 5: Introduction to Information Retrieval
Farag Saad i-KNOW 2014 Graz- Austria,
Distant Supervision for Emotion Classification in Twitter posts 1/17.
COMP423 Intelligent Agents. Recommender systems Two approaches – Collaborative Filtering Based on feedback from other users who have rated a similar set.
Comparing Twitter Summarization Algorithms for Multiple Post Summaries David Inouye and Jugal K. Kalita SocialCom May 10 Hyewon Lim.
TEMPLATE DESIGN © Identifying Noun Product Features that Imply Opinions Lei Zhang Bing Liu Department of Computer Science,
Opinion Spam and Analysis Nitin Jindal and Bing Liu Department of Computer Science University of Illinois at Chicago.
Sentiment Analysis Bing Liu University Of Illinois at Chicago
A Survey on Text Categorization with Machine Learning Chikayama lab. Dai Saito.
CIS630 Spring 2013 Lecture 2 Affect analysis in text and speech.
Product Feature Discovery and Ranking for Sentiment Analysis from Online Reviews. __________________________________________________________________________________________________.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Mining and Summarizing Customer Reviews Advisor : Dr.
Product Review Summarization from a Deeper Perspective Duy Khang Ly, Kazunari Sugiyama, Ziheng Lin, Min-Yen Kan National University of Singapore.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Evaluating Search Engine
Search Engines and Information Retrieval
Predicting Text Quality for Scientific Articles AAAI/SIGART-11 Doctoral Consortium Annie Louis : Louis A. and Nenkova A Automatically.
CS 5831 CS583 – Data Mining and Text Mining Course Web Page 05/cs583.html.
Chapter 5: Information Retrieval and Web Search
CS583 – Data Mining and Text Mining Course Web Page 07/cs583.html.
A Holistic Lexicon-Based Approach to Opinion Mining
Chapter 11. Opinion Mining. Bing Liu, UIC ACL-07 2 Introduction – facts and opinions Two main types of information on the Web.  Facts and Opinions Current.
Mining and Summarizing Customer Reviews
Mining and Summarizing Customer Reviews Minqing Hu and Bing Liu University of Illinois SIGKDD 2004.
Sudeshna Sarkar IIT Kharagpur
Search Engines and Information Retrieval Chapter 1.
Opinion Mining : A Multifaceted Problem Lei Zhang University of Illinois at Chicago Some slides are based on Prof. Bing Liu’s presentation.
A Holistic Lexicon-Based Approach to Opinion Mining Xiaowen Ding, Bing Liu and Philip Yu Department of Computer Science University of Illinois at Chicago.
1 Entity Discovery and Assignment for Opinion Mining Applications (ACM KDD 09’) Xiaowen Ding, Bing Liu, Lei Zhang Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor:
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
Identifying Comparative Sentences in Text Documents
 Text Representation & Text Classification for Intelligent Information Retrieval Ning Yu School of Library and Information Science Indiana University.
WSDM’08 Xiaowen Ding 、 Bing Liu 、 Philip S. Yu Department of Computer Science University of Illinois at Chicago Conference on Web Search and Data Mining.
CS 5831 CS583 – Data Mining and Text Mining Course Web Page 06/cs583.html.
Learning from Multi-topic Web Documents for Contextual Advertisement KDD 2008.
Chapter 6: Information Retrieval and Web Search
Mining Topic-Specific Concepts and Definitions on the Web Bing Liu, etc KDD03 CS591CXZ CS591CXZ Web mining: Lexical relationship mining.
Opinion Mining of Customer Feedback Data on the Web Presented By Dongjoo Lee, Intelligent Databases Systems Lab. 1 Dongjoo Lee School of Computer Science.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Searching the web Enormous amount of information –In 1994, 100 thousand pages indexed –In 1997, 100 million pages indexed –In June, 2000, 500 million pages.
Entity Set Expansion in Opinion Documents Lei Zhang Bing Liu University of Illinois at Chicago.
Department of Software and Computing Systems Research Group of Language Processing and Information Systems The DLSIUAES Team’s Participation in the TAC.
Creating Subjective and Objective Sentence Classifier from Unannotated Texts Janyce Wiebe and Ellen Riloff Department of Computer Science University of.
Exploring in the Weblog Space by Detecting Informative and Affective Articles Xiaochuan Ni, Gui-Rong Xue, Xiao Ling, Yong Yu Shanghai Jiao-Tong University.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
McCormick Northwestern Engineering 1 Electrical Engineering & Computer Science Mining Millions of Reviews: A Technique to Rank Products Based on Importance.
Opinion Observer: Analyzing and Comparing Opinions on the Web
From Words to Senses: A Case Study of Subjectivity Recognition Author: Fangzhong Su & Katja Markert (University of Leeds, UK) Source: COLING 2008 Reporter:
Extracting and Ranking Product Features in Opinion Documents Lei Zhang #, Bing Liu #, Suk Hwan Lim *, Eamonn O’Brien-Strain * # University of Illinois.
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
UIC at TREC 2006: Blog Track Wei Zhang Clement Yu Department of Computer Science University of Illinois at Chicago.
Opinion Observer: Analyzing and Comparing Opinions on the Web WWW 2005, May 10-14, 2005, Chiba, Japan. Bing Liu, Minqing Hu, Junsheng Cheng.
COMP423 Summary Information retrieval and Web search  Vecter space model  Tf-idf  Cosine similarity  Evaluation: precision, recall  PageRank 1.
Twitter as a Corpus for Sentiment Analysis and Opinion Mining
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Summary Report 3  Opinion: user generated media ◦ Opinion on the Web: ◦ contains personal experiences and opinions (sentiments) on almost anything,
Research Progress Kieu Que Anh School of Knowledge, JAIST.
Session 5: How Search Engines Work. Focusing Questions How do search engines work? Is one search engine better than another?
Opinion spam and Analysis 소프트웨어공학 연구실 G 최효린 1 / 35.
COMP423 Intelligent Agents. Recommender systems Two approaches – Collaborative Filtering Based on feedback from other users who have rated a similar set.
Data mining (KDD) process
University of Computer Studies, Mandalay
Aspect-based sentiment analysis
CS583 – Data Mining and Text Mining
CSE 635 Multimedia Information Retrieval
Introduction to Information Retrieval
Presentation transcript:

Mining and Searching Opinions in User-Generated Contents Bing Liu Department of Computer Science University of Illinois at Chicago

Bing UIC 2 Introduction User-generated content on the Web: reviews, forums and group discussions, blogs, questions and answers, etc. Our interest: opinions in user-generated content  The Web has dramatically changed the way that people express their views and opinions.  One can express opinions on almost anything at review sites, forums, discussion groups, blogs.  An intellectually challenging problem.

Bing UIC 3 Motivations: Opinion search Businesses and organizations: marketing intelligence, product and service benchmarking and improvement.  Business spends a huge amount of money to find consumer sentiments and opinions. Consultants Surveys and focused groups, etc Individuals: interested in other’s opinions on products, services, topics, events, etc.

Bing UIC 4 Search opinions We use the product reviews as an example: Searching for opinions in product reviews is different from general Web search.  E.g., search for consumer opinions on a digital camera General Web search: rank pages according to some authority and relevance scores.  The user looks at the first page (if the search is perfect). Review search: rank is still needed, however  Reading only the review ranked at the top is dangerous because it is only opinion of one person.

Bing UIC 5 Search opinions (contd) Ranking:  produce two rankings Positive opinions and negative opinions Some kind of summary of both, e.g., # of each  Or, one ranking but The top (say 30) reviews should reflect the natural distribution of all reviews (assume that there is no spam), i.e., with the right balance of positive and negative reviews. Questions:  Should the user reads all the top reviews?  Or should the system prepare a summary of the reviews?

Bing UIC 6 Reviews are like surveys Reviews are like traditional surveys.  In traditional survey, returned survey forms are treated as raw data.  Analysis is performed to summarize the survey results. E.g., % against or for a particular issue, etc. In review search,  Can a summary be provided?  What should the summary be?

Bing UIC 7 Two types of evaluations Direct Opinions: sentiment expressions on some objects/entities, e.g., products, events, topics, individuals, organizations, etc  E.g., “the picture quality of this camera is great”  Subjective Comparisons: relations expressing similarities, differences, or ordering of more than one objects.  E.g., “car x is cheaper than car y.”  Objective or subjective

Bing UIC 8 Roadmap Sentiment classification Feature-based opinion extraction and summarization  Problems  Some existing techniques Comparative sentence and relation extraction  Problems  Some existing techniques

Bing UIC 9 Sentiment classification Classify documents (e.g., reviews) based on the overall sentiments expressed by authors,  Positive, negative and (possibly) neutral Similar but also different from topic-based text classification.  In topic-based classification, topic words are important.  In sentiment classification, sentiment words are more important, e.g., great, excellent, horrible, bad, worst, etc.

Bing UIC 10 Can we go further? Sentiment classification is useful, but it does not find what the reviewer liked and disliked. An negative sentiment on an object does not mean that the reviewer does not like anything about the object. A positive sentiment on an object does not mean that the reviewer likes everything. Go to the sentence level and feature level.

Bing UIC 11 Roadmap Sentiment classification Feature-based opinion extraction and summarization  Problems  Some existing techniques Comparative sentence and relation extraction  Problems  Some existing techniques.

Bing UIC 12 Feature-based opinion mining and summarization (Hu and Liu 2004, Liu et al 2005) Interesting in what reviewers liked and disliked,  features and components Since the number of reviews for an object can be large, we want to produce a simple summary of opinions. The summary can be easily visualized and compared.

Bing UIC 13 Three main tasks Task 1: Identifying and extracting object features that have been commented on in each review. Task 2: Determining whether the opinions on the features are positive, negative or neutral. Task 3: Grouping synonyms of features. Produce a feature-based opinion summary, which is simple after the above three tasks are performed.

Bing UIC 14 Example 1: Format 1

Bing UIC 15 Example 2: Format 2

Bing UIC 16 Example 3: Format 3 (with summary) GREAT Camera., Jun 3, 2004 Reviewer: jprice174 from Atlanta, Ga. I did a lot of research last year before I bought this camera... It kinda hurt to leave behind my beloved nikon 35mm SLR, but I was going to Italy, and I needed something smaller, and digital. The pictures coming out of this camera are amazing. The 'auto' feature takes great pictures most of the time. And with digital, you're not wasting film if the picture doesn't come out. … …. Feature Based Summary : Feature1: picture Positive: 12 The pictures coming out of this camera are amazing. Overall this is a good camera with a really good picture clarity. … Negative: 2 The pictures come out hazy if your hands shake even for a moment during the entire process of taking a picture. Focusing on a display rack about 20 feet away in a brightly lit room during day time, pictures produced by this camera were blurry and in a shade of orange. Feature2: battery life …

Bing UIC 17 Visual Summarization & Comparison Summary of reviews of Digital camera 1 PictureBatterySizeWeightZoom + _ Comparison of reviews of Digital camera 1 Digital camera 2 _ +

Bing UIC 18 Roadmap Sentiment classification Feature-based opinion extraction  Problems  Some existing techniques Comparative sentence and relation extraction  Problems  Some existing techniques.

Bing UIC 19 Extraction of features Reviews of these formats are usually complete sentences e.g., “the pictures are very clear.”  Explicit feature: picture “It is small enough to fit easily in a coat pocket or purse.”  Implicit feature: size Extraction: Frequency based approach  Frequent features (main features)  Infrequent features

Bing UIC 20 Identify opinion orientation of features Using sentiment words and phrases  Identify words that are often used to express positive or negative sentiments  There are many ways. Use dominate orientation of opinion words as the sentence orientation, e.g.,  Sum: a negative word is near the feature, -1, a positive word is near a feature, +1 Text machine learning methods can be employed too.

Bing UIC 21 Roadmap Sentiment classification Feature-based opinion extraction  Problems  Some existing techniques Comparative sentence and relation extraction  Problems  Some existing techniques.

Bing UIC 22 Extraction of Comparatives (Jinal and Liu 2006a, 2006b, Liu’s Web mining book 2006) Two types of evaluation  Direct opinions: “I don’t like this car”  Comparisons: “Car X is not as good as car Y” They use different language constructs. Comparative Sentence Mining  Identify comparative sentences, and  extract comparative relations from them.

Bing UIC 23 Linguistic Perspective Comparative sentences use morphemes like  more/most, -er/-est, less/least and as. than and as are used to make a ‘standard’ against which an entity is compared. Limitations Limited coverage  Ex: “In market capital, Intel is way ahead of Amd” Non-comparatives with comparative words  Ex1: “In the context of speed, faster means better”  Ex2: “More men than James like scotch on the rocks” (meaningless comparison) For human consumption; no computational methods

Bing UIC 24 Comparative sentences An Object (or entity) is the name of a person, a product brand, a company, a location, etc, under comparison in a comparative sentence. A feature is a part or property (attribute) of the object/entity that is being compared. Definition: A comparative sentence expresses a relation based on similarities, or differences of more than one objects/entities.  It usually orders the objects involved.

Bing UIC 25 Types of Comparatives: Gradable Gradable  Non-Equal Gradable: Relations of the type greater or less than Keywords like better, ahead, beats, etc Ex: “optics of camera A is better than that of camera B”  Equative: Relations of the type equal to Keywords and phrases like equal to, same as, both, all Ex: “camera A and camera B both come in 7MP”  Superlative: Relations of the type greater or less than all others Keywords and phrases like best, most, better than all Ex: “camera A is the cheapest camera available in market”

Bing UIC 26 Types of comparatives: non-gradable Non-Gradable: Sentences that compare features of two or more objects, but do not grade them. Sentences which imply:  Object A is similar to or different from Object B with regard to some features.  Object A has feature F 1, Object B has feature F 2 (F 1 and F 2 are usually substitutable).  Object A has feature F, but object B does not have.

Bing UIC 27 Comparative Relation: gradable Definition: A gradable comparative relation captures the essence of a gradable comparative sentence and is represented with the following: (relationWord, features, entityS1, entityS2, type)  relationWord: The keyword used to express a comparative relation in a sentence.  features: a set of features being compared.  entityS1 and entityS2: Sets of entities being compared. Entities in entityS1 appear to the left of the relation word and entities in entityS2 appear to the right of the relation word.  type: non-equal gradable, equative or superlative.

Bing UIC 28 Ex1: “car X has better controls than car Y” (relationWord = better, features = controls, entityS1 = car X, entityS2 = car Y, type = non-equal-gradable) Ex2: “car X and car Y have equal mileage” (relationWord = equal, features = mileage, entityS1 = car X, entityS2 = car Y, type = equative) Ex3: “Car X is cheaper than both car Y and car Z” (relationWord = cheaper, features = null, entityS1 = car X, entityS2 = {car Y, car Z}, type = non-equal-gradable ) Ex4: “company X produces variety of cars, but still best cars come from company Y” (relationWord = best, features = cars, entityS1 = company Y, entityS2 = null, type = superlative) Examples: Comparative relations

Bing UIC 29 Tasks Given a collection of evaluative texts Task 1: Identify comparative sentences. Task 2: Categorize different types of comparative sentences. Task 2: Extract comparative relations from the sentences. Focus on gradable comparatives in this talk.

Bing UIC 30 Roadmap Sentiment classification Feature-based opinion extraction  Problems  Some existing techniques Comparative sentence and relation extraction  Problems  Some existing techniques.

Bing UIC 31 Identify comparative sentences (Jinal and Liu, SIGIR-06) Keyword strategy An observation: It is easy to find a small set of keywords that covers almost all comparative sentences, i.e., with a very high recall and a reasonable precision We have compiled a list of 83 keywords used in comparative sentences, which includes:  Words with POS tags of JJR, JJS, RBR, RBS POS tags are used as keyword instead of individual words. Exceptions: more, less, most and least  Other indicative words like beat, exceed, ahead, etc  Phrases like in the lead, on par with, etc

Bing UIC 32 2-step learning strategy Step1: Extract sentences which contain at least a keyword (recall = 98%, precision = 32% on our data set for gradables) Step2: Use the naïve Bayes (NB) classifier to classify sentences into two classes  comparative and  non-comparative sentences.  using class sequential rules (CSRs) generated from sentences in step1 as attributes, e.g.,  {1}{3}{7, 8}   class i [sup = 2/5, conf = 3/4]

Bing UIC 33 Classify comparative sentences into three types: non-equal gradable, equative, and superlative  SVM learner gave the best result.  Attribute set is the set of keywords.  If the sentence has a particular keyword in the attribute set, the corresponding value is 1, and 0 otherwise. Classify different types of comparatives

Bing UIC 34 Extraction of comparative relations (Jindal and Liu, AAAI-06; Liu’s Web mining book 2006) Assumptions There is only one relation in a sentence. Entities and features are nouns (includes nouns, plural nouns and proper nouns) and pronouns. 3 steps Sequence data generation Label sequential rule (LSR) generation Build a sequential cover/extractor from LSRs

Bing UIC 35 Experimental results Identifying Gradable Comparative Sentences  NB using CSRs and manual rules as attribute precision = 82% and recall = 81%.  NB using CSRs alone: precision = 76% and recall = 74%.  SVM: precision = 71% and recall = 69% Classification into three different gradable types  SVM gave accuracy of 96%  NB gave accuracy of 87%

Bing UIC 36 Extraction of comparative relations LSR gave F-score = 72% CRF gave F-score = 58% LSR extracted  32% of complete relations  32% relations where one item was not extracted Extracting relation words:  Non-Equal Gradable Precision = 97%. Recall = 88%  Equative: Precision = 93%. Recall = 91%  Superlative: Precision = 96%. Recall = 89%

Bing UIC 37 LSR vs. CRF on relation item extraction

Bing UIC 38 Conclusion Two types of evaluations are discussed Direct opinions: A lot of interesting work to do: Accuracy is the key: 1. Feature extraction 2. Opinion orientations on features Comparison extraction: a lot of work to do too, 1. identify comparative sentences 2. Group them into different types 3. Extraction of relations A lot of interesting research to be done. Industrial applications are coming…  General search engines  Specific domains or industries