
Sentiment and Polarity Extraction Arzucan Ozgur SI/EECS 767 January 15, 2010.


1 Sentiment and Polarity Extraction Arzucan Ozgur SI/EECS 767 January 15, 2010

2 Introduction
Suppose you would like to buy a digital camera. How do you decide which camera to buy?
- Product specifications and price
- Ask friends for their opinions
- Read on-line product reviews
Thumbs up: "This is great camera. Its a very quick camera, the auto feature works very well as the red eye correction, picture quality is excellent and the image estabilization work great. I will recommend this camera to everyone looking for a great camera."
Thumbs down: "Normally I am a big fan of Canons but this model is horrible. The pictures are always out of focus and the image quality is so poor not nearly as good as some of the older models. No matter what you do, close ups, far away they all take crummy pictures. Don't waste your money."

3 Introduction
- Growth in on-line discussion groups and review sites. An important characteristic of posted articles is their sentiment (the opinion expressed toward the subject matter), e.g. whether a product review is positive or negative.
- Sentiment and polarity extraction: identifying the sentiments, opinions, and emotions expressed in text.

4 Applications
- Classify reviews (e.g. movie reviews or product reviews) as positive or negative.
- Improved search: summary statistics for search engines.
  "Paris travel review": 80% positive & 20% negative
  "Paris travel review: positive"
- Summarization of reviews: pick the sentences with the highest positive semantic orientation.
- Filtering "flames" in newsgroups.
- Analysis of survey responses to open-ended questions.

5 Approaches
- Classifying words (or phrases) as having positive, negative, or neutral semantic orientation (polarity).
  + semantic orientation -> praise (e.g. excellent, honest)
  - semantic orientation -> criticism (e.g. bad, poor, negative)
  (Hatzivassiloglou & McKeown, 1997), (Takamura et al., 2005), (Turney & Littman, 2003)
- Classifying documents (e.g. reviews) based on the overall sentiment expressed by the author as positive (thumbs up), negative (thumbs down), or neutral.
  (Turney, 2002), (Pang et al., 2002)
- Subjectivity analysis: classifying sentences as subjective or objective.
  I bought this camera four days ago. (objective)
  This is a great camera. (subjective)
  (Riloff & Wiebe, 2003)

6 Predicting the semantic orientation of adjectives (Hatzivassiloglou & McKeown, 1997)

7 Introduction  Task: Classify adjectives as having positive or negative semantic orientation.  Motivation: Use semantic orientation as a component in a larger system to identify antonyms or near synonyms. Antonyms usually have different semantic orientations (e.g. good vs. bad) Some near synonyms have different semantic orientation - implies desirability or not (e.g. simple vs. simplistic).  Approach: Corpus-based approach, infer semantic orientation using the conjunctions between adjectives.

8 Conjunctions between adjectives -> semantic orientation
- Adjectives conjoined with "and" are usually of the same orientation:
  fair and legitimate; corrupt and brutal
  fair and brutal (not natural, semantically anomalous)
- Adjectives conjoined with "but" are usually of different orientation:
  fair but brutal
  fair but legitimate (semantically anomalous)
- The tax proposal was {simple and well-received | simplistic but well-received | *simplistic and well-received} by the public (* = semantically anomalous).
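The conjunction cue above is easy to sketch in code. A minimal Python illustration (the mini-corpus and the hand-picked adjective set below are illustrative stand-ins; the paper used a parsed Wall Street Journal corpus with part-of-speech information):

```python
import re

# Hypothetical mini-corpus; the original work used the 1987 WSJ corpus.
CORPUS = (
    "The deal was fair and legitimate, but the regime was corrupt and brutal. "
    "The proposal seemed simplistic but well-received."
)

# Known adjective set: a stand-in for real part-of-speech filtering.
ADJECTIVES = {"fair", "legitimate", "corrupt", "brutal", "simplistic", "well-received"}

def extract_conjoined_pairs(text, adjectives):
    """Find ADJ-and-ADJ / ADJ-but-ADJ pairs and the orientation link they suggest."""
    pairs = []
    for w1, conj, w2 in re.findall(r"([\w-]+)\s+(and|but)\s+([\w-]+)", text.lower()):
        if w1 in adjectives and w2 in adjectives:
            # "and" suggests same orientation; "but" suggests different orientation
            pairs.append((w1, w2, "same" if conj == "and" else "different"))
    return pairs
```

Running this over a real corpus yields the link-labeled adjective pairs that feed the graph-clustering step described next.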

9 System Overview  All conjunctions of adjectives are extracted from the corpus.  A supervised learning algorithm classifies each pair of conjoined adjectives as having the same or different orientation. The result is a graph, where nodes are adjectives and links indicate the inferred same or different semantic orientation.  A clustering algorithm is applied to the graph to separate the adjectives into two groups of different orientation (place as many same orientation words into the same cluster as possible).  The group with the higher average frequency is labeled as having positive semantic orientation.

10 Corpus
- 21 million word 1987 Wall Street Journal corpus.
- Training set: adjectives occurring > 20 times
  - remove adjectives that have no orientation (e.g. medical, domestic)
  - remove adjectives to which a unique label can't be assigned (label depends on context)
    cheap (+ when a synonym for inexpensive; - when it implies inferior quality)
- Final set: 1,336 adjectives (657 positive, 679 negative)
- Inter-annotator agreement: 96.97%

11 Validation of the Conjunction Hypothesis
- 15,048 conjunction tokens extracted from the corpus (4,024 with both members in the set of pre-selected adjectives).
- 9,296 distinct conjoined adjective pairs (2,748 with both members in the set of pre-selected adjectives).
- Conjoined adjectives are not evenly split between same and different orientation: conjoined adjectives are usually of the same orientation (except with "but").
- Each token is classified by the parser according to three variables:
  - conjunction used (and, but, either-or, neither-nor)
  - type of modification (attributive, predicative, appositive, resultative)
  - number of the modified noun (singular or plural)

12 Prediction of Link Type
- Morphological relationships between adjectives:
  adequate - inadequate; thoughtful - thoughtless
  97.06% accurate (but applies to only very few of the possible pairs)
- Log-linear regression model:
  - Feature vector for an adjective pair: observed counts in the various conjunction categories (e.g. conjunction used: and; type of modification: attributive; modified noun: singular)
  - only a small improvement in accuracy, but it rates each prediction between 0 and 1
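The morphological cue can be sketched as a simple string check. This is a hedged simplification (the affix list below is partial and invented for illustration; the paper's rules are more complete):

```python
# Adjectives related by a negating affix usually have *different* orientations.
NEGATING_PREFIXES = ("in", "un", "im", "dis")

def morphologically_different(a, b):
    """Return True if one adjective looks like the negation of the other."""
    for w1, w2 in ((a, b), (b, a)):
        # prefix relation: adequate / inadequate
        if any(w1 == p + w2 for p in NEGATING_PREFIXES):
            return True
        # suffix relation: thoughtful / thoughtless
        if w1.endswith("ful") and w2.endswith("less") and w1[:-3] == w2[:-4]:
            return True
    return False
```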

13 Clustering Adjectives
- Construct a graph whose nodes are adjectives.
- Links are associated with dissimilarity values in [0, 1]:
  - same-orientation adjectives: low dissimilarity, 1 - P(classification correct)
  - different-orientation adjectives: high dissimilarity, P(classification correct)
  - non-conjoined adjectives: neutral dissimilarity (0.5)
  - with the log-linear model: dissimilarity = 1 - y, where y is the model's rating that the pair has the same orientation.
- Clustering: partition the adjectives into two clusters C1 and C2 minimizing the objective
  Φ(C1, C2) = Σ_{i=1,2} (1/|C_i|) Σ_{x,y in C_i} d(x, y),
  i.e. place as many same-orientation words as possible into the same cluster.
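A toy sketch of the two-cluster separation: greedy local search that moves single words between clusters while the objective improves. The dissimilarity values in the test are invented, and the paper's exchange-based optimization is more careful; this only illustrates the objective being minimized:

```python
from itertools import combinations

def objective(clusters, d):
    """Sum over clusters of within-cluster dissimilarity, normalized by cluster size."""
    total = 0.0
    for c in clusters:
        if c:
            total += sum(d.get(frozenset(p), 0.5)  # 0.5 = neutral (non-conjoined pair)
                         for p in combinations(sorted(c), 2)) / len(c)
    return total

def two_cluster(words, d):
    """Greedy local search: start with one cluster, move words while it helps."""
    a, b = set(words), set()
    improved = True
    while improved:
        improved = False
        for w in words:
            src, dst = (a, b) if w in a else (b, a)
            before = objective((a, b), d)
            src.remove(w)
            dst.add(w)
            if objective((a, b), d) < before:
                improved = True
            else:  # move did not help: undo it
                dst.remove(w)
                src.add(w)
    return a, b
```

The group with the higher average corpus frequency would then be labeled positive, as the slide above describes.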

14 Results
- Experimented with different test sets by varying graph connectivity (denser and sparser test sets).
- Test set: all adjectives that have at least α connections; training set: the remaining adjectives.
- The goodness of fit of each word in its cluster can be used as a quantitative measure of orientation.

15 Measuring praise and criticism: Inference of semantic orientation from association (Turney & Littman, 2003)

16 Introduction
- Infer the semantic orientation of a word from its statistical association with a set of positive and negative seed words: SO-A(word) is the word's association with the positive seeds minus its association with the negative seeds.
- Hypothesis: the semantic orientation of a word tends to correspond to the semantic orientation of its neighbors.
  - SO-A(word) > 0: positive semantic orientation
  - SO-A(word) < 0: negative semantic orientation
  - |SO-A(word)|: strength of the semantic orientation
- Unsupervised approach: only 14 labeled seed words (7 positive, 7 negative)
  - chosen for their lack of sensitivity to context
  - opposing pairs (e.g. good/bad, superior/inferior)

17 Semantic Orientation from PMI
- Uses Pointwise Mutual Information (PMI) to measure the strength of the semantic association between two words:
  PMI(word1, word2) = log2( p(word1 & word2) / (p(word1) p(word2)) )
  - positive: the words tend to co-occur
  - negative: the presence of one word makes it likely that the other is absent
- PMI is estimated by issuing queries to the AltaVista search engine (350 million web pages) using the NEAR operator (words within 10 words of each other).
- Word occurrence probabilities are estimated from the number of hits (matching documents).
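SO-PMI over the 14 seed words reduces to a ratio of hit counts, which can be sketched as follows. The hit tables in the test are toy numbers (the paper queried AltaVista), and the smoothing constant is just a small value to avoid log(0):

```python
import math

# Turney & Littman's 14 seed words (7 positive, 7 negative).
POS_SEEDS = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NEG_SEEDS = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def so_pmi(word, near_hits, hits, smoothing=0.01):
    """SO-PMI from hit counts.

    near_hits[(word, seed)]: hits for `word NEAR seed`; hits[seed]: hits for the
    seed alone. SO-PMI > 0 means the word associates more with positive seeds.
    """
    num, den = 1.0, 1.0
    for p in POS_SEEDS:
        num *= near_hits.get((word, p), 0) + smoothing
        den *= hits[p]
    for n in NEG_SEEDS:
        num *= hits[n]
        den *= near_hits.get((word, n), 0) + smoothing
    return math.log2(num / den)
```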

18 Semantic Orientation from LSA
- Applies Latent Semantic Analysis (LSA) to measure the strength of association between words.
- Singular Value Decomposition of the word-by-context matrix: X = UΣV^T. Keeping the k largest singular values and their corresponding singular vectors from U and V gives X_k = U_k Σ_k V_k^T, the rank-k approximation to X with the smallest error.
- Compressed version of the original matrix: the rank lowering is expected to merge the dimensions associated with terms that have similar meanings.

19 Evaluation
- Two different lexicons:
  - HM: adjectives from (Hatzivassiloglou & McKeown, 1997)
  - GI: General Inquirer lexicon
    - 3,596 adjectives, adverbs, nouns, and verbs (1,614 positive, 1,982 negative)
    - ambiguous words eliminated, e.g. "mind" in the sense of intellect (positive) vs. "mind" in the sense of beware (negative)
- Three corpora of different sizes:
  - AV-ENG: ~350 million English web pages indexed in AltaVista
  - AV-CA: ~7 million English web pages in the Canadian domain (~2 billion words)
  - TASA: ~10 million words of short documents from novels, newspaper articles, etc.

20 Evaluation
- Accuracy of the HM algorithm: between 78.08% and 92.37%.
- Comparable with SO-PMI on the medium-sized corpus; when the large corpus is used, SO-PMI is better.

21 Effect of Corpus Size GI lexicon slightly lower results, but same trend.

22 Effect of Neighborhood Size
- On the small TASA corpus, a neighborhood size > 100 works better.
- The NEAR operator works better than AND.

23 LSA vs PMI LSA better for small corpora, but not scalable to large corpora.

24 Effect of Seed Words Selection of seed words important.

25 Effect of Seed Words
- pick, raise, capital are negative only in certain contexts (context dependent), e.g. raise a protest, capital offense.

26 Extracting semantic orientations of words using spin model (Takamura et al., 2005)

27 Introduction  Each electron has a direction of spin (up or down).  Each word has a semantic orientation (positive or negative).  Regard words as a set of electrons and use spin models for electrons to identify semantic orientations of words.

28 Spin Model
- Also called the Ising spin model.
- A spin system: an array of N electrons, each with a spin of +1 (up) or -1 (down). Two electrons next to each other energetically tend to have the same spin value.
- Energy function of a spin system: E(x, W) = -(1/2) Σ_{i,j} w_ij x_i x_j
- The spin variable x follows the Boltzmann distribution: P(x|W) = exp(-βE(x, W)) / Z(W)
  - Z(W): normalization factor (a sum over all configurations)
  - β: a constant called the inverse temperature
  - a spin configuration with higher energy has smaller probability
- There are 2^N configurations of spins, so exact computation is intractable.
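For a tiny system the Boltzmann distribution can be computed exactly by enumerating all 2^N configurations, which also shows why the mean field approximation on the next slide is needed for an 88,000-word network. A sketch (the weight matrix in the test is a toy two-spin example):

```python
import math
from itertools import product

def energy(x, W):
    """E(x) = -(1/2) * sum_ij w_ij x_i x_j: neighbors prefer equal spins."""
    n = len(x)
    return -0.5 * sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def boltzmann(W, beta):
    """Exact Boltzmann distribution over all 2^N spin configurations."""
    n = len(W)
    configs = list(product([-1, 1], repeat=n))
    weights = [math.exp(-beta * energy(x, W)) for x in configs]
    Z = sum(weights)  # normalization factor Z(W)
    return {x: w / Z for x, w in zip(configs, weights)}
```

With a single positive link between two spins, the aligned configurations (+1, +1) and (-1, -1) come out more probable than the anti-aligned ones, exactly the "higher energy, smaller probability" statement above.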

29 Mean Field Approximation
- Approximate P(x|W) with a simple factorized function Q(x; θ).
- Select the parameters θ such that P and Q are as similar to each other as possible.
- Distance between P and Q: the variational free energy F, the difference between the mean energy with respect to Q and the entropy of Q.
- Minimizing F yields the mean field equation, which is solved by an iterative update rule.
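A hedged sketch of the iterative update: a standard mean-field form for the average spins is x̄_i ← tanh(β Σ_j w_ij x̄_j). Clamping seed words to their labels is a simplification here (the paper incorporates seeds through the probabilistic model itself), and the chain network in the test is invented:

```python
import math

def mean_field(W, beta, seeds, iters=100):
    """Iterate x_i <- tanh(beta * sum_j W[i][j] * x_j); seed spins stay clamped.

    seeds: {node index: +1.0 or -1.0} for labeled words. The sign of the
    converged average spin is read off as each word's semantic orientation.
    """
    n = len(W)
    x = [seeds.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            if i not in seeds:
                x[i] = math.tanh(beta * sum(W[i][j] * x[j] for j in range(n)))
    return x
```

On a three-node chain with one positive seed, the orientation propagates along the links and all average spins settle on the positive side.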

30 Construction of Lexical Networks
- Gloss (G) network: link two words if one appears in the gloss of the other word.
  - SL: same-orientation links; DL: different-orientation links
  - if one word precedes a negation word in the gloss of the other word, the link is in DL
  - d(i): degree of word i
- Gloss-Thesaurus (GT) network: also link synonyms, antonyms, and hypernyms; only the antonym links are in DL.
- Gloss-Thesaurus-Corpus (GTC) network: additionally use the method of (Hatzivassiloglou & McKeown, 1997):
  - if adjectives are connected with "and", the link is in SL
  - if adjectives are connected with "but", the link is in DL

31 Prediction of β: Magnetization
- At high temperatures, spins are randomly oriented (paramagnetic phase) => m ≈ 0.
- At low temperatures, most spins point in the same direction (ferromagnetic phase) => m ≠ 0.
- Phase transition: at some intermediate temperature, the ferromagnetic phase changes to the paramagnetic phase.
- Slightly before the phase transition, spins are locally polarized: strongly connected spins have the same polarity.
- The desired state of the lexical network is locally polarized.
- Calculate the magnetization m for different values of β and select the value just before the phase transition.

32 Evaluation
- Construct an English lexical network using the glosses, synonyms, antonyms, and hypernyms of WordNet (~88,000 words).
- 804 conjunctive expressions extracted from the Wall Street Journal.
- GI lexicon used as the gold standard.
- For comparison, (Turney & Littman) report 82.84% with 14 seed words.

33 Evaluation

34 Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews (Turney, 2002)

35 Introduction
- Unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down).
- The semantic orientation of phrases containing adjectives or adverbs is calculated using PMI-IR (Pointwise Mutual Information - Information Retrieval).
- Two-word phrases containing adjectives or adverbs are extracted, to capture more context than single words:
  - unpredictable steering (negative in a car review)
  - unpredictable plot (positive in a movie review)
- If the average semantic orientation of the phrases in a review is positive, the review is classified as recommended; if negative, as not recommended.
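The review-level decision can be sketched as follows. This is a simplification: the paper extracts phrases with specific part-of-speech tag patterns, while here any adjective/adverb followed by a noun or adjective is taken, and the SO values are assumed inputs (the table in the test is invented):

```python
def classify_review(tagged_words, so):
    """Average the SO of adjective/adverb two-word phrases; recommend iff positive.

    tagged_words: list of (word, pos) pairs (Penn tags: JJ=adjective, RB=adverb,
    NN=noun); so: dict mapping a phrase to its semantic orientation value.
    """
    total, count = 0.0, 0
    for (w1, t1), (w2, t2) in zip(tagged_words, tagged_words[1:]):
        if t1 in ("JJ", "RB") and t2 in ("NN", "JJ"):
            phrase = f"{w1} {w2}"
            if phrase in so:
                total += so[phrase]
                count += 1
    if count == 0:
        return "no opinion phrases"
    return "recommended" if total / count > 0 else "not recommended"
```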

36 Semantic Orientation of a Phrase
- Estimate the semantic orientation of a phrase using PMI-IR.
- Five-star review rating system -> 5: excellent, 1: poor.
- SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")
- SO(phrase) is positive if the phrase is more strongly associated with "excellent", negative if more strongly associated with "poor".
- The AltaVista search engine with the NEAR operator is used; hits(query) is the number of hits (matching documents) returned.
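In terms of hit counts, the PMI difference above reduces to one log-ratio. A sketch, with a mocked hit table standing in for AltaVista NEAR queries and a small smoothing constant (value illustrative) to avoid division by zero:

```python
import math

def so_phrase(phrase, hits, smoothing=0.01):
    """SO(phrase) from hit counts.

    hits[(phrase, ref)]: hits for `phrase NEAR ref`; hits["excellent"] and
    hits["poor"] are the reference words' own hit counts.
    """
    return math.log2(
        (hits.get((phrase, "excellent"), 0) + smoothing) * (hits["poor"] + smoothing)
        / ((hits.get((phrase, "poor"), 0) + smoothing) * (hits["excellent"] + smoothing))
    )
```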

37 Example processed reviews Recommended review (of Bank of America) Not recommended review (of Bank of America)

38 Results
- 410 reviews from Epinions, randomly sampled from four different domains: 170 (41%) not recommended, 240 (59%) recommended.
- Little variation in accuracy within a domain (except travel).
- Strong positive correlation between the average semantic orientation and the author's rating out of five stars.

39 Movie reviews hard to classify - Positive reviews mention unpleasant things (e.g. violent scenes) - Negative reviews mention pleasant things (e.g. talented actor) - Different elements in movie reviews: actors, events, style, art. e.g. talented actors might not add up to a good movie

40 Thumbs up? sentiment classification using machine learning techniques (Pang et al., 2002)

41 Introduction  Classifying documents (movie reviews) by overall sentiment (positive or negative).  Examine the effectiveness of applying machine learning techniques (Naïve Bayes, Maximum Entropy Classification, Support Vector Machines) to the sentiment classification problem.  Sentiment classification more challenging than topic-based classification. “How could anyone sit through this movie?”  no word that is obviously negative Sentiment seems to require more understanding than the usual topic-based classification.

42 Movie Reviews Domain
- Data source: the IMDB archive of the rec.arts.movies.reviews newsgroup.
- Selected reviews where the author's rating was expressed with stars or a numerical value, automatically converted to one of three categories: positive, negative, or neutral.
- Imposed a limit of fewer than 20 reviews per author per sentiment category.
- 1,301 positive reviews, 752 negative reviews, 144 reviewers.

43 Closer Look at the Problem

44 Results
- Randomly selected 700 positive and 700 negative documents; accuracy reported for 3-fold cross-validation.
- Features: unigrams, bigrams, part-of-speech tags, adjectives, and position (first quarter, last quarter, or middle half of the document).
- Modeling negation ("good" vs. "not very good"): add the tag NOT_ to every word between a negation word ("not", "isn't", "didn't", etc.) and the first punctuation mark following the negation word.
- NB tends to do the worst and SVMs tend to do the best.
- Unigram presence information is the most effective (which contradicts topic-based classification results, where frequency helps).
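The negation-tagging step can be sketched directly (the negation word list below is a small illustrative subset of the paper's):

```python
import re

# Prefix NOT_ to every token between a negation word and the next punctuation mark.
NEGATIONS = {"not", "isn't", "didn't", "no", "never"}

def tag_negation(text):
    out, negating = [], False
    for token in re.findall(r"[\w']+|[.,!?;]", text.lower()):
        if token in NEGATIONS:
            negating = True
            out.append(token)
        elif re.fullmatch(r"[.,!?;]", token):
            negating = False  # punctuation ends the negation scope
            out.append(token)
        else:
            out.append("NOT_" + token if negating else token)
    return out
```

This turns "not very good" into `not NOT_very NOT_good`, so the unigram features for negated and non-negated contexts no longer collide.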

45 Discussion  “thwarted expectations” narrative: Author sets up deliberate contrast to earlier discussion. "This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up."  Some form of discourse analysis is necessary (using more sophisticated techniques than the positional features mentioned above), or at least some way of determining the focus of each sentence.

46 Learning extraction patterns for subjective expressions (Riloff & Wiebe, 2003)

47 Introduction
- Classify sentences as subjective or objective.
- Subjective language can be expressed with a variety of words or phrases, some of which are very rare:
  - strongly subjective adjectives: preposterous, unseemly
  - metaphorical or idiomatic phrases: drives (someone) up the wall, swept off one's feet
- To acquire a broad and comprehensive subjectivity vocabulary, subjectivity learning systems must be trained on large text collections.
- Use bootstrapping to allow subjectivity classifiers to learn from unannotated text.

48 Bootstrapping Process
- HP-Subj (high-precision subjective classifier): a sentence is subjective if it contains at least 2 strongly subjective clues (91.5% precision, 31.9% recall).
- HP-Obj (high-precision objective classifier): a sentence is objective if there are no strongly subjective clues and at most one weakly subjective clue in the current, previous, and next sentences (82.6% precision, 16.4% recall).
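The two high-precision rules above can be sketched as follows. The clue lexicons are hypothetical stand-ins for the paper's manually derived clue lists, and reading HP-Obj's "at most one weakly subjective clue" as a combined count over the three sentences is an assumption:

```python
# Hypothetical clue lexicons (stand-ins for the paper's subjectivity clues).
STRONG = {"preposterous", "unseemly", "horrible", "great"}
WEAK = {"quite", "seems", "maybe"}

def count_clues(sentence, lexicon):
    return sum(1 for w in sentence.lower().split() if w.strip(".,!?") in lexicon)

def hp_subj(sentence):
    """Subjective if the sentence contains >= 2 strongly subjective clues."""
    return count_clues(sentence, STRONG) >= 2

def hp_obj(prev, cur, nxt):
    """Objective if no strong clue and <= 1 weak clue across the three sentences."""
    strong = sum(count_clues(s, STRONG) for s in (prev, cur, nxt))
    weak = sum(count_clues(s, WEAK) for s in (prev, cur, nxt))
    return strong == 0 and weak <= 1
```

Sentences labeled by these two classifiers become the training data from which new extraction patterns are learned, which is the bootstrapping loop.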

49 Learning Subjective Extraction Patterns
- Apply syntactic templates to the training corpus to extract candidate patterns.
  [Tables of templates, example learned patterns, and patterns with interesting behavior shown on slide.]
- Rank patterns using frequency and conditional probability; a pattern is retained if
  freq(pattern_i) >= θ1 and Pr(subjective | pattern_i) >= θ2.
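The ranking step is a simple threshold filter over pattern counts. A sketch (the thresholds and the count table in the test are illustrative, not the paper's values):

```python
def select_patterns(counts, theta1=5, theta2=0.95):
    """Keep patterns with freq >= theta1 and Pr(subjective | pattern) >= theta2.

    counts: dict mapping pattern -> (freq, freq_in_subjective_sentences).
    """
    selected = []
    for pattern, (freq, subj_freq) in counts.items():
        if freq >= theta1 and subj_freq / freq >= theta2:
            selected.append(pattern)
    return sorted(selected)
```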

50 Evaluation
- Evaluation of the learned patterns and of the bootstrapping process (result tables shown on slide).

51 Summary and Discussion
- Three approaches to sentiment and polarity extraction.
- Semantic orientation of words: (Hatzivassiloglou & McKeown, 1997), (Takamura et al., 2005), (Turney & Littman, 2003). Performances are comparable to each other.
  - (Hatzivassiloglou & McKeown, 1997) works only for adjectives. Can it be extended to other classes of words?
    - Adverbs: "He ran quickly but awkwardly."
    - Nouns & verbs? "the rise and fall of the Roman Empire", "love and death"
  - (Turney & Littman, 2003): querying AltaVista takes time, and LSA is not scalable to large corpora.
  - (Takamura et al., 2005): slightly lower performance, but a strong theoretical model.

52 Summary and Discussion
- None of the methods deals with ambiguous words.
  - "lose one's mind": negative; "right mind": positive. The semantic orientation depends on the context.
- Can Word Sense Disambiguation help?
  - "unpredictable steering": negative
  - "unpredictable plot": positive
- What methods can be used?

53 Summary and Discussion
- Classifying documents (Turney, 2002): average semantic orientation of the phrases in the text.
- Classifying movie reviews is more difficult (Pang et al., 2002): traditional ML methods; more difficult than topic-based classification.
- These methods assume a document expresses either a positive or a negative sentiment about a single subject, but a document might discuss several different aspects of an object (e.g. good actors, but a bad movie). What methods can be applied?

54 Summary and Discussion  Subjectivity Analysis: Classify sentences as subjective/objective. Bootstrapping improved performance. Can it be used to improve performance of word-based or document-based sentiment extraction? How?

55 Thank you!

