Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Mining the peanut gallery: Opinion extraction and semantic classification of product reviews Kushal DaveSteve LawrenceDavid M. Pennock IBM Google Overture.

Similar presentations


Presentation on theme: "1 Mining the peanut gallery: Opinion extraction and semantic classification of product reviews Kushal DaveSteve LawrenceDavid M. Pennock IBM Google Overture."— Presentation transcript:

1 1 Mining the peanut gallery: Opinion extraction and semantic classification of product reviews Kushal DaveSteve LawrenceDavid M. Pennock IBM Google Overture Work done at NEC Laboratories

2 2 Problem Many reviews spread out across Web –Product-specific sites –Editorial reviews at C|net, magazines, etc. –User reviews at C|net, Amazon, Epinions, etc. –Reviews in blogs, author sites –Reviews in Google Groups (Usenet) No easy aggregation, but would like to get numerous diverse opinions Even at one site, synthesis difficult

3 3 Solution! Find reviews on the web and mine them… –Filtering (find the reviews) –Classification (positive or negative) –Separation(identify and rate specific attributes)

4 4 Existing work Classification varies in granularity/purpose: Objectivity –Does this text express fact or opinion? Words –What connotations does this word have? Sentiments –Is this text expressing a certain type of opinion?

5 5 Objectivity classification Best features: relative frequencies of parts of speech (Finn 02) Subjectivity is…subjective (Wiebe 01)

6 6 Word classification Polarity vs. intensity Conjunctions –(Hatzivassiloglou and McKeown, 97) Colocations –(Turney and Littman 02) Linguistic colocations –(Lin 98), (Pereira 93)

7 7 Sentiment classification Manual lexicons –Fuzzy logic affect sets (Subasic and Huettener 01) –Directionality – block/enable (Hearst 92) –Common Sense and emotions (Liu et al 03) Recommendations –Stocks in Yahoo (Das and Chen 01) –URLs in Usenet (Terveen 97) –Movie reviews in IMDb (Pang et al 02)

8 8 Applied tools AvaQuest’s GoogleMovies –http:// :8080/demos/ GoogleMovies/GoogleMovies.cgihttp:// :8080/demos/ GoogleMovies/GoogleMovies.cgi Clipping services –PlanetFeedback, Webclipping, eWatch, TracerLock, etc. –Commonly use templates or simple searches Feedback analysis: –NEC’s Survey Analyzer –IBM’s Eclassifier Targeted aggregation –BlogStreet, onfocus, AllConsuming, etc. –Rotten Tomatoes

9 9 Our approach

10 10 Train/test Existing tagged corpora! C|net – two cases –even, mixed 10-fold test 4,480 reviews 4 categories –messy, by category 7-fold test 32,970 reviews

11 11 Other corpora Preliminary work with Amazon –(Potentially) easier problem Pang et al IMDb movie reviews corpus –Our simple algorithm insignificantly worse 80.6 vs (unigram SVM) –Different sort of problem

12 12 Domain obstacles Typos, user error, inconsistencies, repeated reviews Skewed distributions – 5x as many positive reviews – 13,000 reviews of MP3 players, 350 of networking kits – fewer than 10 reviews for ½ of products Variety of language, sparse data – 1/5 of reviews have fewer than 10 tokens – More than 2/3 of terms occur in fewer than 3 documents Misleading passages (ambivalence/comparison) – Battery life is good, but... Good idea, but...

13 13 Base Features Unigrams – great, camera, support, easy, poor, back, excellent Bigrams – can't beat, simply the, very disappointed, quit working Trigrams, substrings – highly recommend this, i am very happy, not worth it, i sent it back, it stopped working Near – greater would, find negative, refund this, automatic poor, company totally

14 14 Generalization Replacing product names,(metadata) –the iPod is an excellent ==> the _productname is an excellent domain-specific words,(statistical) – excellent [picture|sound] quality ==> excellent _producttypeword quality rare words (statistical) – overshadowed by ==> _unique by

15 15 Generalization (2) Finding synsets in WordNet –anxious ==> –nervous ==> Stemming –was ==> be –compromised ==> compromis

16 16 Qualification Parsing for colocation – this stupid piece of garbage ==> (piece(N):mod:stupid(A))... – this piece is stupid and ugly ==> (piece(N):mod:stupid(A))... Negation –not good or useful ==> NOTgood NOTor NOTuseful

17 17 Optimal features Confidence/precision tradeoff Try to find best-length substrings –How to find features? Marginal improvement when traversing suffix tree Information gain worked best –How to score? [p(C|w) – p(C’|w)] * df during construction Dynamic programming or simple during use –Still usually short ( n < 5 ) Although… “i have had no problems with”

18 18 Clean up Thresholding –Reduce space with minimal impact –But not too much! (sparse data) –> 3 docs Smoothing: allocate weight to unseen features, regularize other values –Laplace, Simple Good-Turing, Witten-Bell tried –Laplace helps Naïve Bayes

19 19 Scoring SVMlight, naïve Bayes –Neither does better in both tests –Our scoring is simpler, less reliant on confidence, document length, corpus regularity Give each word a score ranging –1 to 1 –Our metric: –Tried Fisher discriminant, information gain, odds ratio If sum > 0, it’s a positive review Other probability models – presence Bootstrapping barely helps score(f i ) = p(f i |C) – p(f i |C’) p(f i |C) + p(f i |C’)

20 20 Reweighting Document frequency –df, idf, normalized df, ridf –Some improvement from logdf and gaussian (arbitrary) Analogues for term frequency, product frequency, product type frequency Tried bootstrapping, different ways of assigning scores

21 21 Test 1 (clean) Best Results Baseline85.0 % Naïve Bayes + Laplace87.0 % Weighting (log df)85.7 % Weighting (Gaussian tf)85.7 % Baseline88.7 % + _productname88.9 % Unigrams Trigrams

22 22 Test 2 (messy) Best Results Baseline82.2 % Prob. after threshold83.3 % Presence prob. model83.1 % + Odds ratio83.3 % Baseline84.6 % + Odds ratio85.4 % + SVM85.8 % Baseline85.1 % + _productname85.3 % Unigrams Bigrams Variable

23 23 Extraction Can we use the techniques from classification to mine reviews?

24 24 Obstacles Words have different implications in general usage –“main characters” is negatively biased in reviews, but has innocuous uses elsewhere Much non-review subjective content –previews, sales sites, technical support, passing mentions and comparisons, lists

25 25 Results Manual test set + web interface Substrings better…bigrams too general? Initial filtering is bad –Make “review” part of the actual search –Threshold sentence length, score –Still get non-opinions and ambiguous opinions Test corpus, with red herrings removed, 76% accuracy in top confidence tercile

26 26

27 27

28 28 Attribute heuristics Look for _producttypewords Or, just look for things starting with “the” –Of course, some thresholding –And stopwords

29 29 Results No way of judging accuracy, arbitrary For example, Amstel Light –Ideally: taste, price, image... –Statistical method: beer, bud, taste, adjunct, other, brew, lager, golden, imported... –Dumb heuristic: the taste, the flavor, the calories, the best…

30 30

31 31

32 32 Future work Methodology –More precise corpus –More tests, more precise tests Algorithm –New combinations of techniques –Decrease overfitting to improve web search –Computational complexity –Ambiguity detection? New pieces –Separate review/non-review classifier –Review context (source, time) –Segment pages


Download ppt "1 Mining the peanut gallery: Opinion extraction and semantic classification of product reviews Kushal DaveSteve LawrenceDavid M. Pennock IBM Google Overture."

Similar presentations


Ads by Google