Presentation is loading. Please wait.

Presentation is loading. Please wait.

Sentiment Analysis and Opinion Mining

Similar presentations

Presentation on theme: "Sentiment Analysis and Opinion Mining"— Presentation transcript:

1 Sentiment Analysis and Opinion Mining
Akshat Bakliwal Search and Information Extraction Lab (SIEL)

2 “What other people think ?”
What others think has always been an important piece of information Before making any decision, we look for suggestions and opinions from others. A big question “So whom shall I ask ?”.

3 Evolution History Present Friends Acquaintances Consumer Reports
Unknowns No Limitations ! Across Globe

4 Is moving to web a Solution ?
Partly Yes ! New problems How and Where to look for reviews or opinions ? Will Normal web search help ? Overwhelming amount of Information For some products – millions of reviews. Difficult to read all. For some less popular products – hardly a few reviews.

5 More Problems !! Biased views Fake Reviews Spam Reviews
Contradicting Reviews

6 Solution ! – Subjectivity Analysis
General Text can be divided into two segments Objective – which don’t carry any opinion or sentiment. Facts (news, encyclopedias, etc) Subjective Subjectivity Analysis Linguistic expressions of somebody’s opinions, sentiments, emotions .. that is not open to verification.

7 Flavors of Subjectivity Analysis
Synonyms and Used Interchangeably !! Sentiment Analysis Opinion Mining Mood Classification Emotion Analysis

8 like/dislike or good/bad, etc.
What is Sentiment? Subjective impressions Generally, Sentiment == Feelings Opinions Emotions Attitude like/dislike or good/bad, etc.

9 What is Sentiment Analysis?
Sentiment Analysis is a study of human behavior in which we extract user opinion and emotion from plain text. Identifying the orientation of opinions in a piece of text. This movie was fabulous. [Sentiment]  This movie stars Mr. X.     [Factual] This movie was boring.     [Sentiment] 

10 Motivation Enormous amount of information. Real time update
Monetary benefits

11 Applications ! Helpful for Business Intelligence (BI).
Aide in decision making. Geo-Spatial reaction modeling of Events. Ads Placements

12 Does Web really contain Sentiments ?
Yes, Where ? Blogs Reviews User Comments Discussion Forums Social Network (Twitter, Facebook, etc.)

13 Challenges Negation Handling Un-Structured Data, Slangs, Abbreviations
I don’t like Apple products. This is not a good read. Un-Structured Data, Slangs, Abbreviations Lol, rofl, omg! ….. Gr8, IMHO, … Noise Smiley Special Symbols ( ! , ? , …. )

14 Challenges Ambiguous words Sarcasm detection and handling
This music cd is literal waste of time. (negative) Please throw your waste material here. (neutral) Sarcasm detection and handling “All the features you want - too bad they don’t work. :-P” (Almost) No resources and tools for low/scarce resource languages like Indian languages.

15 Basics .. Basic components Opinion Holder – Who is talking ?
Object – Item on which opinion is expressed. Opinion – Attitude or view of the opinion holder. This is a good book. Opinion Holder Opinion Object

16 Types of Opinions Direct Comparison “This is a great book.”
“Mobile with awesome functions.” Comparison “Samsung Galaxy S3 is better than Apple iPhone 4S.” “Hyundai Eon is not as good as Maruti Alto ! .”

17 What is Sentiment Classification
Classify given text on the overall sentiments expresses by the author Different levels Document Sentence Feature Classification levels Binary Multi Class

18 Document Level Sentiment Classification
Documents can be reviews, blog posts, .. Assumption: Each document focuses on single object. Only single opinion holder. Task : determine the overall sentiment orientation of the document.

19 Sentence Level Sentiment Classification
Considers each sentence as a separate unit. Assumption : sentence contain only one opinion. Task 1: identify if sentence is subjective or objective Task 2: identify polarity of sentence.

20 Feature Level Sentiment Classification
Task 1: identify and extract object features Task 2: determine polarity of opinions on features Task 3: group same features Task 4: summarization Ex. This mobile has good camera but poor battery life.

21 Approaches Prior Learning Subjective Lexicon
(Un)Supervised Machine Learning

22 Approach 1: Prior Learning
Utilize available pre-annotated data Amazon Product Review (star rated) Twitter Dataset(s) IMDb movie reviews (star rated) Learn keywords, N-Gram with polarity

23 1.1 Keywords Selection from Text
Pang et. al. (2002) Two human’s hired to pick keywords Binary Classification of Keywords Positive Negative Unigram method reached 80% accuracy.

24 1.2 N-Gram based classification
Learn N-Grams (frequencies) from pre-annotated training data. Use this model to classify new incoming sample. Classification can be done using Counting method Scoring function(s)

25 1.3 Part-of-Speech based patterns
Extract POS patterns from training data. Usually used for subjective vs objective classification. Adjectives and Adverbs contain sentiments Example patterns *-JJ-NN : trigram pattern JJ-NNP : bigram pattern *-JJ : bigram pattern

26 Approach 2: Subjective Lexicon
Heuristic or Hand Made Can be General or Domain Specific Difficult to Create Sample Lexicons General Inquirer (1966) Dictionary of Affective Language SentiWordNet (2006)

27 2.1 General Inquirer Positive and Negative connotations.
List of words manually created. 1915 Positive Words 2291 Negative Words

28 2.2 Dictionary of Affective Language
9000 Words with Part-of-speech information Each word has a valance score range 1 – 3. 1 for Negative 3 for Positive App

29 2.3 SentiWordNet Approx 1.7 Million words
Using WordNet and Ternary Classifier. Classifier is based on Bag-of-Synset model. Each synset is assigned three scores Positive Negative Objective

30 Example :Scores from SentiWordNet
Very comfortable, but straps go loose quickly. comfortable Positive: 0.75 Objective: 0.25 Negative: 0.0 loose Positive: 0.0 Objective: 0.375 Negative: 0.625 Overall - Positive Objective: 0.625

31 Advantages and Disadvantages
Fast No Training data necessary Good initial accuracy Disadvantages Does not deal with multiple word senses Does not work for multiple word phrases

32 Approach 3: Machine Learning
Sensitive to sparse and insufficient data. Supervised methods require annotated data. Training data is used to create a hyper plane between the two classes. New instances are classified by finding their position on hyper plane.

33 Machine Learning SVMs are widely used ML Technique for creating feature-vector-based classifiers. Commonly used features N-Grams or Keywords Presence : Binary Count : Real Numbers Special Symbols like !, #, etc. Smiley

34 Some unanswered Questions !
Sarcasm Handling Word Sense Disambiguation Pre-processing and cleaning Multi-class classification

35 Datasets Movie Review Dataset Product Review Dataset
Bo Pang and Lillian Lee Product Review Dataset Blitzer et. al. product reviews 25 product domains

36 Datasets MPQA Corpus Twitter Dataset
Multi Perspective Question Answering News Article, other text documents Manually annotated 692 documents Twitter Dataset 1.6 million annotated tweets Bi-Polar classification

37 Reading Opinion Mining and Sentiment Analysis
Bo Pang and Lillian Lee (2008) Book: Sentiment Analysis and Opinion Mining Bing Liu (2012)

38 Thank You

Download ppt "Sentiment Analysis and Opinion Mining"

Similar presentations

Ads by Google