Presentation on theme: "Sentiment Analysis and Opinion Mining"— Presentation transcript:
1Sentiment Analysis and Opinion Mining Akshat BakliwalSearch and Information Extraction Lab (SIEL)
2“What other people think ?” What others think has always been an important piece of informationBefore making any decision, we look for suggestions and opinions from others.A big question “So whom shall I ask ?”.
4Is moving to web a Solution ? Partly Yes !New problemsHow and Where to look for reviews or opinions ?Will Normal web search help ?Overwhelming amount of InformationFor some products – millions of reviews. Difficult to read all.For some less popular products – hardly a few reviews.
6Solution ! – Subjectivity Analysis General Text can be divided into two segmentsObjective – which don’t carry any opinion or sentiment.Facts (news, encyclopedias, etc)SubjectiveSubjectivity AnalysisLinguistic expressions of somebody’s opinions, sentiments, emotions .. that is not open to verification.
7Flavors of Subjectivity Analysis Synonyms andUsed Interchangeably !!Sentiment AnalysisOpinionMiningMood ClassificationEmotionAnalysis
8like/dislike or good/bad, etc. What is Sentiment?Subjective impressionsGenerally, Sentiment ==FeelingsOpinionsEmotionsAttitudelike/dislike or good/bad, etc.
9What is Sentiment Analysis? Sentiment Analysis is a study of human behavior in which we extract user opinion and emotion from plain text.Identifying the orientation of opinions in a piece of text.This movie was fabulous. [Sentiment] This movie stars Mr. X. [Factual]This movie was boring. [Sentiment]
10Motivation Enormous amount of information. Real time update Monetary benefits
11Applications ! Helpful for Business Intelligence (BI). Aide in decision making.Geo-Spatial reaction modeling of Events.Ads Placements
12Does Web really contain Sentiments ? Yes, Where ?BlogsReviewsUser CommentsDiscussion ForumsSocial Network (Twitter, Facebook, etc.)
13Challenges Negation Handling Un-Structured Data, Slangs, Abbreviations I don’t like Apple products.This is not a good read.Un-Structured Data, Slangs, AbbreviationsLol, rofl, omg! …..Gr8, IMHO, …NoiseSmileySpecial Symbols ( ! , ? , …. )
14Challenges Ambiguous words Sarcasm detection and handling This music cd is literal waste of time. (negative)Please throw your waste material here. (neutral)Sarcasm detection and handling“All the features you want - too bad they don’t work. :-P”(Almost) No resources and tools for low/scarce resource languages like Indian languages.
15Basics .. Basic components Opinion Holder – Who is talking ? Object – Item on which opinion is expressed.Opinion – Attitude or view of the opinion holder.This is a good book.Opinion HolderOpinionObject
16Types of Opinions Direct Comparison “This is a great book.” “Mobile with awesome functions.”Comparison“Samsung Galaxy S3 is better than Apple iPhone 4S.”“Hyundai Eon is not as good as Maruti Alto ! .”
17What is Sentiment Classification Classify given text on the overall sentiments expresses by the authorDifferent levelsDocumentSentenceFeatureClassification levelsBinaryMulti Class
18Document Level Sentiment Classification Documents can be reviews, blog posts, ..Assumption:Each document focuses on single object.Only single opinion holder.Task : determine the overall sentiment orientation of the document.
19Sentence Level Sentiment Classification Considers each sentence as a separate unit.Assumption : sentence contain only one opinion.Task 1: identify if sentence is subjective or objectiveTask 2: identify polarity of sentence.
20Feature Level Sentiment Classification Task 1: identify and extract object featuresTask 2: determine polarity of opinions on featuresTask 3: group same featuresTask 4: summarizationEx. This mobile has good camera but poor battery life.
22Approach 1: Prior Learning Utilize available pre-annotated dataAmazon Product Review (star rated)Twitter Dataset(s)IMDb movie reviews (star rated)Learn keywords, N-Gram with polarity
231.1 Keywords Selection from Text Pang et. al. (2002)Two human’s hired to pick keywordsBinary Classification of KeywordsPositiveNegativeUnigram method reached 80% accuracy.
241.2 N-Gram based classification Learn N-Grams (frequencies) from pre-annotated training data.Use this model to classify new incoming sample.Classification can be done usingCounting methodScoring function(s)
251.3 Part-of-Speech based patterns Extract POS patterns from training data.Usually used for subjective vs objective classification.Adjectives and Adverbs contain sentimentsExample patterns*-JJ-NN : trigram patternJJ-NNP : bigram pattern*-JJ : bigram pattern
26Approach 2: Subjective Lexicon Heuristic or Hand MadeCan be General or Domain SpecificDifficult to CreateSample LexiconsGeneral Inquirer (1966)Dictionary of Affective LanguageSentiWordNet (2006)
272.1 General Inquirer Positive and Negative connotations. List of words manually created.1915 Positive Words2291 Negative Words
282.2 Dictionary of Affective Language 9000 Words with Part-of-speech informationEach word has a valance score range 1 – 3.1 for Negative3 for PositiveApp
292.3 SentiWordNet Approx 1.7 Million words Using WordNet and Ternary Classifier.Classifier is based on Bag-of-Synset model.Each synset is assigned three scoresPositiveNegativeObjective
30Example :Scores from SentiWordNet Very comfortable, but straps go loose quickly.comfortablePositive: 0.75Objective: 0.25Negative: 0.0loosePositive: 0.0Objective: 0.375Negative: 0.625Overall - PositiveObjective: 0.625
31Advantages and Disadvantages FastNo Training data necessaryGood initial accuracyDisadvantagesDoes not deal with multiple word sensesDoes not work for multiple word phrases
32Approach 3: Machine Learning Sensitive to sparse and insufficient data.Supervised methods require annotated data.Training data is used to create a hyper plane between the two classes.New instances are classified by finding their position on hyper plane.
33Machine LearningSVMs are widely used ML Technique for creating feature-vector-based classifiers.Commonly used featuresN-Grams or KeywordsPresence : BinaryCount : Real NumbersSpecial Symbols like !, #, etc.Smiley
34Some unanswered Questions ! Sarcasm HandlingWord Sense DisambiguationPre-processing and cleaningMulti-class classification
35Datasets Movie Review Dataset Product Review Dataset Bo Pang and Lillian LeeProduct Review DatasetBlitzer et. al.Amazon.com product reviews25 product domains
36Datasets MPQA Corpus Twitter Dataset Multi Perspective Question AnsweringNews Article, other text documentsManually annotated692 documentsTwitter Dataset1.6 million annotated tweetsBi-Polar classification
37Reading Opinion Mining and Sentiment Analysis Bo Pang and Lillian Lee (2008)Book: Sentiment Analysis and Opinion MiningBing Liu (2012)