SENTIMENT ANALYSIS AND OPINION MINING. Akshat Bakliwal, Search and Information Extraction Lab (SIEL)

1 SENTIMENT ANALYSIS AND OPINION MINING. Akshat Bakliwal, Search and Information Extraction Lab (SIEL)

2 “WHAT OTHER PEOPLE THINK?” What others think has always been an important piece of information. Before making any decision, we look for suggestions and opinions from others. The big question: “So whom shall I ask?”

3 EVOLUTION: History: friends, acquaintances, consumer reports. Present: friends and acquaintances, plus unknowns; no limitations, across the globe.

4 IS MOVING TO WEB A SOLUTION? Partly yes, but with new problems. How and where to look for reviews or opinions? Will normal web search help? The amount of information is overwhelming: some products have millions of reviews, too many to read, while less popular products have only a few.

5 MORE PROBLEMS!! Biased views; fake reviews; spam reviews; contradicting reviews.

6 SOLUTION! SUBJECTIVITY ANALYSIS: General text can be divided into two segments. Objective: text that does not carry any opinion or sentiment, i.e. facts (news, encyclopedias, etc.). Subjective: linguistic expressions of somebody’s opinions, sentiments, emotions, etc. that are not open to verification; this is the target of subjectivity analysis.

7 FLAVORS OF SUBJECTIVITY ANALYSIS: Sentiment analysis, opinion mining, mood classification, emotion analysis. These terms are treated as synonyms and used interchangeably!

8 WHAT IS SENTIMENT? Subjective impressions. Generally, sentiment == feelings, opinions, emotions, attitude: like/dislike, good/bad, etc.

9 WHAT IS SENTIMENT ANALYSIS? Sentiment analysis is a study of human behavior in which we extract user opinion and emotion from plain text, i.e. we identify the orientation of opinions in a piece of text. Examples: “This movie was fabulous.” [Sentiment] “This movie stars Mr. X.” [Factual] “This movie was boring.” [Sentiment]

10 MOTIVATION: Enormous amount of information; real-time updates; monetary benefits.

11 APPLICATIONS! Helpful for business intelligence (BI); aid in decision making; geo-spatial reaction modeling of events; ad placement.

12 DOES WEB REALLY CONTAIN SENTIMENTS? Yes. Where? Blogs, reviews, user comments, discussion forums, social networks (Twitter, Facebook, etc.).

13 CHALLENGES: Negation handling: “I don’t like Apple products.” “This is not a good read.” Unstructured data, slang, abbreviations: lol, rofl, omg!, gr8, IMHO, ... Noise: smileys, special symbols (!, ?, ...).
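
The slide names negation handling as a challenge but does not prescribe a method. One common workaround, sketched here as an assumption rather than as the presenter's approach, is to prefix every token that follows a negation cue with NOT_ until the clause ends, so that "good" and "NOT_good" become different features. The cue list and the function name are hypothetical.

```python
import re

# Hypothetical, deliberately small list of negation cues (an assumption).
NEGATION_CUES = {"not", "no", "never", "don't", "doesn't", "isn't", "won't", "can't"}
CLAUSE_END = {".", ",", "!", "?", ";"}


def mark_negation(text):
    """Prefix tokens that follow a negation cue with NOT_ until the clause ends."""
    text = text.lower().replace("’", "'")  # normalise curly apostrophes
    tokens = re.findall(r"[a-z0-9']+|[.,!?;]", text)
    marked = []
    negated = False
    for token in tokens:
        if token in CLAUSE_END:
            negated = False          # punctuation closes the negation scope
            marked.append(token)
        elif token in NEGATION_CUES:
            negated = True           # the cue itself is kept unchanged
            marked.append(token)
        elif negated:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
    return marked


if __name__ == "__main__":
    print(mark_negation("This is not a good read."))
    print(mark_negation("I don’t like Apple products."))
```

The scope rule (stop at the next punctuation mark) is a crude heuristic; it will over-mark long clauses and miss negation expressed without an explicit cue word.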

14 CHALLENGES: Ambiguous words: “This music CD is a literal waste of time.” (negative) “Please throw your waste material here.” (neutral) Sarcasm detection and handling: “All the features you want - too bad they don’t work. :-P” (Almost) no resources and tools for low-resource languages such as Indian languages.

15 BASICS: Basic components. Opinion holder: who is talking? Object: the item on which the opinion is expressed. Opinion: the attitude or view of the opinion holder. Example: “This is a good book.” (annotated on the slide with its opinion holder, object, and opinion).
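
The three components can be held in a small record. This is only an illustrative sketch: the class name, the field names, and the extra polarity field are assumptions, and the holder is set to "author" because the example sentence does not name one explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One opinion: who holds it, what it is about, and how it is expressed."""
    holder: str      # opinion holder: who is talking
    target: str      # object: the item the opinion is about
    expression: str  # opinion: the word or phrase carrying the attitude
    polarity: str    # assumed extra field: "positive" or "negative"


# "This is a good book." The holder is implicit, so the author is assumed.
example = Opinion(holder="author", target="book", expression="good", polarity="positive")

if __name__ == "__main__":
    print(example)
```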

16 TYPES OF OPINIONS: Direct: “This is a great book.” “Mobile with awesome functions.” Comparison: “Samsung Galaxy S3 is better than Apple iPhone 4S.” “Hyundai Eon is not as good as Maruti Alto!”

17 WHAT IS SENTIMENT CLASSIFICATION? Classify a given text by the overall sentiment expressed by the author. Different levels: document, sentence, feature. Classification levels: binary, multi-class.

18 DOCUMENT LEVEL SENTIMENT CLASSIFICATION: Documents can be reviews, blog posts, etc. Assumptions: each document focuses on a single object and has a single opinion holder. Task: determine the overall sentiment orientation of the document.

19 SENTENCE LEVEL SENTIMENT CLASSIFICATION: Considers each sentence as a separate unit. Assumption: each sentence contains only one opinion. Task 1: identify whether the sentence is subjective or objective. Task 2: identify the polarity of the sentence.
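
The two tasks can be chained as a small pipeline. The sketch below assumes a toy word list in place of a real subjectivity classifier: a sentence counts as subjective if it contains any listed opinion word, and its polarity is the sign of positive minus negative hits. The word lists and function names are hypothetical.

```python
import re

# Hypothetical toy lexicons (assumptions, not taken from the slides).
POSITIVE = {"fabulous", "good", "great"}
NEGATIVE = {"boring", "bad", "poor"}


def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(sentence):
    """Task 1: subjective if the sentence contains any known opinion word."""
    return any(t in POSITIVE or t in NEGATIVE for t in tokenize(sentence))


def polarity(sentence):
    """Task 2: compare the number of positive and negative words."""
    tokens = tokenize(sentence)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_sentence(sentence):
    """Run Task 1 first; only subjective sentences reach Task 2."""
    if not is_subjective(sentence):
        return "objective"
    return polarity(sentence)


if __name__ == "__main__":
    for s in ["This movie was fabulous.", "This movie stars Mr. X.", "This movie was boring."]:
        print(s, "->", classify_sentence(s))
```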

20 FEATURE LEVEL SENTIMENT CLASSIFICATION: Task 1: identify and extract object features. Task 2: determine the polarity of opinions on the features. Task 3: group identical features. Task 4: summarization. Example: “This mobile has good camera but poor battery life.”
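
A minimal sketch of the four tasks on the example sentence, under strong assumptions: features come from a hypothetical alias table (which also does the grouping of Task 3), opinion words come from a hypothetical four-word lexicon, and a feature takes the polarity of the first opinion word found in the same clause. Real systems extract features and opinions with far richer methods.

```python
import re

# Hypothetical alias table: surface form -> canonical feature (Tasks 1 and 3).
FEATURE_ALIASES = {"camera": "camera", "battery life": "battery", "battery": "battery"}
# Hypothetical opinion lexicon (Task 2).
OPINION_WORDS = {"good": "positive", "great": "positive", "poor": "negative", "bad": "negative"}


def feature_opinions(review):
    """Return {feature: [polarities]} as a simple summary (Task 4)."""
    summary = {}
    # Split into clauses so that each opinion word stays close to its feature.
    clauses = re.split(r"\bbut\b|\band\b|[.,;!?]", review.lower())
    for clause in clauses:
        polarities = [OPINION_WORDS[w] for w in clause.split() if w in OPINION_WORDS]
        if not polarities:
            continue
        # A set avoids counting "battery life" and "battery" twice in one clause.
        features = {
            feature
            for alias, feature in FEATURE_ALIASES.items()
            if re.search(r"\b" + re.escape(alias) + r"\b", clause)
        }
        for feature in sorted(features):
            summary.setdefault(feature, []).append(polarities[0])
    return summary


if __name__ == "__main__":
    print(feature_opinions("This mobile has good camera but poor battery life."))
```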

21 APPROACHES: Prior learning; subjective lexicon; (un)supervised machine learning.

22 APPROACH 1: PRIOR LEARNING. Utilize available pre-annotated data: Amazon product reviews (star rated), Twitter dataset(s), IMDb movie reviews (star rated). Learn keywords and n-grams with polarity.

23 1.1 KEYWORDS SELECTION FROM TEXT: Pang et al. (2002). Two humans were hired to pick keywords. Binary classification of keywords: positive, negative. The unigram method reached 80% accuracy.

24 1.2 N-GRAM BASED CLASSIFICATION: Learn n-grams (and their frequencies) from pre-annotated training data. Use this model to classify each new incoming sample. Classification can be done using a counting method or scoring function(s).
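
The counting method can be sketched as follows, assuming whitespace tokenization and four made-up training sentences: count each n-gram per class during training, then label a new sample with the class whose training n-grams it hits most often. The function names and the tiny training set are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token tuples in the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(samples, n=1):
    """Count n-gram frequencies separately for each class label."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in samples:
        counts[label].update(ngrams(text.lower().split(), n))
    return counts


def classify(text, counts, n=1):
    """Counting method: sum the training frequencies of the sample's n-grams."""
    grams = ngrams(text.lower().split(), n)
    scores = {label: sum(c[g] for g in grams) for label, c in counts.items()}
    return max(scores, key=scores.get)  # ties fall to the first label, "pos"


# Made-up training data, for illustration only.
TRAINING = [
    ("a great fabulous movie", "pos"),
    ("great acting and great story", "pos"),
    ("a boring movie", "neg"),
    ("boring and dull story", "neg"),
]

if __name__ == "__main__":
    model = train(TRAINING)
    print(classify("this movie was fabulous", model))
    print(classify("this movie was boring", model))
```

Swapping the raw sum for a normalized or smoothed score would give one of the scoring functions the slide mentions; the raw sum favors whichever class has more training text.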

25 1.3 PART-OF-SPEECH BASED PATTERNS: Extract POS patterns from training data. Usually used for subjective vs. objective classification. Adjectives and adverbs carry sentiment. Example patterns: *-JJ-NN (trigram pattern), JJ-NNP (bigram pattern), *-JJ (bigram pattern).
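
The example patterns can be matched against a tagged sentence with a sliding window. This sketch assumes the input is already tagged with Penn Treebank style tags (the sample sentence is hand-tagged here) and treats * as a wildcard for any tag; the function names are hypothetical.

```python
# The three example patterns from the slide; "*" matches any tag.
PATTERNS = [("*", "JJ", "NN"), ("JJ", "NNP"), ("*", "JJ")]


def matches(pattern, window):
    """True if every tag in the window fits the pattern at the same position."""
    return all(p == "*" or p == tag for p, (_, tag) in zip(pattern, window))


def extract(tagged, patterns=PATTERNS):
    """Return (pattern, phrase) pairs for every window that fits a pattern."""
    found = []
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if matches(pattern, window):
                phrase = " ".join(word for word, _ in window)
                found.append(("-".join(pattern), phrase))
    return found


# Hand-tagged sample sentence (an assumption; a real pipeline would use a POS tagger).
TAGGED = [("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("good", "JJ"), ("book", "NN")]

if __name__ == "__main__":
    for pattern, phrase in extract(TAGGED):
        print(pattern, "->", phrase)
```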

26 APPROACH 2: SUBJECTIVE LEXICON. Heuristic or hand-made; can be general or domain specific; difficult to create. Sample lexicons: General Inquirer (1966), Dictionary of Affective Language, SentiWordNet (2006).

27 2.1 GENERAL INQUIRER: Positive and negative connotations. Manually created list of words: 1915 positive words, 2291 negative words.

28 2.2 DICTIONARY OF AFFECTIVE LANGUAGE: 9000 words with part-of-speech information. Each word has a valence score in the range 1 to 3: 1 for negative, 3 for positive. App: AL_app/index.php

29 2.3 SENTIWORDNET: More than 100,000 WordNet synsets. Built using WordNet and a ternary classifier; the classifier is based on a bag-of-synsets model. Each synset is assigned three scores: positive, negative, objective.

30 EXAMPLE: SCORES FROM SENTIWORDNET. “Very comfortable, but straps go loose quickly.” comfortable: Positive 0.75, Objective 0.25, Negative 0.0. loose: Positive 0.0, Objective 0.375, Negative 0.625. Overall (sum of the word scores): Positive 0.75, Objective 0.625, Negative 0.625, so the sentence is classified as positive.
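
The arithmetic of this example can be reproduced in a few lines. The scores are the ones shown on the slide; keying them by word rather than by synset, and deciding the label by comparing the positive and negative totals, are simplifying assumptions of this sketch.

```python
# Scores copied from the slide; words missing from the table contribute nothing.
SCORES = {
    "comfortable": {"positive": 0.75, "objective": 0.25, "negative": 0.0},
    "loose": {"positive": 0.0, "objective": 0.375, "negative": 0.625},
}


def aggregate(words, lexicon=SCORES):
    """Sum the three scores over all words that appear in the lexicon."""
    totals = {"positive": 0.0, "objective": 0.0, "negative": 0.0}
    for word in words:
        for label, value in lexicon.get(word, {}).items():
            totals[label] += value
    return totals


def overall_polarity(totals):
    """Compare the positive and negative totals; the objective total is ignored."""
    if totals["positive"] > totals["negative"]:
        return "positive"
    if totals["negative"] > totals["positive"]:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    words = "very comfortable but straps go loose quickly".split()
    totals = aggregate(words)
    print(totals, "->", overall_polarity(totals))
```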

31 ADVANTAGES AND DISADVANTAGES: Advantages: fast; no training data necessary; good initial accuracy. Disadvantages: does not deal with multiple word senses; does not work for multi-word phrases.

32 APPROACH 3: MACHINE LEARNING. Sensitive to sparse and insufficient data. Supervised methods require annotated data. Training data is used to fit a hyperplane that separates the two classes. New instances are classified according to which side of the hyperplane they fall on.
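
Classifying with a hyperplane reduces to checking the sign of w·x + b. The sketch below skips training entirely: the weights and bias are made-up values standing in for what a learner such as an SVM would produce, and the two features are assumed to be counts of positive and negative words.

```python
def decision_value(weights, bias, features):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(weights, bias, features):
    return "positive" if decision_value(weights, bias, features) >= 0 else "negative"


# Made-up parameters (an assumption): feature 0 = positive-word count,
# feature 1 = negative-word count.
WEIGHTS = [1.0, -1.0]
BIAS = 0.0

if __name__ == "__main__":
    print(classify(WEIGHTS, BIAS, [2, 0]))  # two positive words, no negative ones
    print(classify(WEIGHTS, BIAS, [0, 1]))  # one negative word only
```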

33 MACHINE LEARNING: SVMs are a widely used ML technique for creating feature-vector-based classifiers. Commonly used features: n-grams or keywords (presence: binary; count: real numbers); special symbols such as !, ?, @, #, etc.; smileys.
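
The difference between presence and count features can be shown on one sentence. This is a sketch under assumptions: the vocabulary is hypothetical, the tokenizer is a simple regular expression that keeps the listed special symbols and a few basic smileys as separate tokens, and the resulting vectors would then be handed to a classifier such as an SVM.

```python
import re
from collections import Counter


def featurize(text, vocabulary, mode="presence"):
    """Map text to a vector over the vocabulary: 0/1 flags or raw counts."""
    if mode not in ("presence", "count"):
        raise ValueError("mode must be 'presence' or 'count'")
    # Words, the special symbols ! ? @ #, and smileys such as :) :-) :( :-(
    tokens = re.findall(r"[a-z0-9']+|[!?@#]|:-?[()]", text.lower())
    counts = Counter(tokens)
    if mode == "presence":
        return [1 if counts[term] > 0 else 0 for term in vocabulary]
    return [counts[term] for term in vocabulary]


# Hypothetical vocabulary mixing keywords, a special symbol, and a smiley.
VOCABULARY = ["good", "bad", "!", ":)"]

if __name__ == "__main__":
    text = "Good, good movie!! :)"
    print(featurize(text, VOCABULARY, mode="presence"))
    print(featurize(text, VOCABULARY, mode="count"))
```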

34 SOME UNANSWERED QUESTIONS! Sarcasm handling; word sense disambiguation; pre-processing and cleaning; multi-class classification.

35 DATASETS: Movie Review Dataset: Bo Pang and Lillian Lee (data/). Product Review Dataset: Blitzer et al., product reviews from 25 product domains.

36 DATASETS: MPQA Corpus (Multi-Perspective Question Answering): news articles and other text documents, manually annotated, 692 documents. Twitter Dataset: 1.6 million annotated tweets, bi-polar classification.

37 READING: Opinion Mining and Sentiment Analysis, Bo Pang and Lillian Lee (2008). Book: Sentiment Analysis and Opinion Mining, Bing Liu (2012), and-OpinionMining.html

