Great Food, Lousy Service Topic Modeling for Sentiment Analysis in Sparse Reviews Robin Melnick Dan Preston


1 Great Food, Lousy Service Topic Modeling for Sentiment Analysis in Sparse Reviews Robin Melnick rmelnick@stanford.edu Dan Preston dpreston@stanford.edu

2 OpenTable.com

3 Short Characters Words

4 Sparse “An unexpected combination of Left-Bank Paris and Lower Manhattan in Omaha. Divine. Inspirational and a great value.” Food? Ambiance? Service? Noise?

5 Skewed

6 Correlations

7 SVM + Features, Features, Features! tokenize punctuation; "white list" (only use sentiment words); id, neutralize proper nouns; remove stop words; strip numbers; POS tagging, ADJ only; contraction splitting; POS tagging, add ADV; lower casing; Brill tagger; unigram (Bag of Words); sentiment "white list" (Harvard lexicon); bigram; count of sentiment words (pos/neg); trigram; balanced training set; mixed n-grams; binary accuracy; ignore stop words; sub-topic classifiers, hand list; stemming; WordNet topic list expansion; negation processing; topic-filtered n-grams; expanded negation processing; topic-word proximity filtering; large training set size; strict entropy modeling; varying dictionary size; frequency-weighted entropy modeling; SVM scaling. 30+ preprocessing and SVM classification features, ~50 configurations
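A few of the preprocessing steps listed above (lower casing, stripping numbers, removing stop words, mixed n-grams) can be sketched in plain Python. This is only an illustration: the stop-word list and the function names `preprocess` and `ngram_features` are hypothetical and do not come from the slides, and the resulting counts would still need to be passed to an SVM.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "was", "of", "in", "our", "but"}

def preprocess(text, remove_stop_words=True):
    """Lower-case, strip numbers and punctuation, optionally drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)        # strip numbers
    tokens = re.findall(r"[a-z']+", text)   # keep alphabetic tokens only
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens

def ngram_features(tokens, max_n=2):
    """Count every n-gram for n = 1..max_n (a 'mixed n-grams' bag)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

tokens = preprocess("Great food, lousy service in 2009!")
features = ngram_features(tokens)
```

With `max_n=1` the same function yields a plain unigram bag of words; raising it to 3 adds trigrams.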

8 Key Features Stemming: Porter 1980, via NLTK. Negation processing (enhanced approach from Pang et al. 2002): “Not a great experience.” → NOT_great; “They never disappoint!” → NOT_disappoint. Net sentiment count: pos/neg lexicon (Harvard General Inquirer), running +/- count: “Incredible(+) food, but our server was rude(-).” → (0)
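The negation tagging and running sentiment count on this slide can be sketched as follows. The sketch follows the basic Pang et al. (2002) scheme, which prefixes NOT_ to every token between a negator and the next punctuation mark; the authors' enhanced variant is not described in the slides and is not reproduced here. The three word sets are hypothetical stand-ins for a real lexicon such as the Harvard General Inquirer, and flipping the polarity of NOT_ tokens is an assumption.

```python
import re

# Hypothetical miniature lexicons, for illustration only.
NEGATORS = {"not", "never", "no"}
POSITIVE = {"great", "incredible"}
NEGATIVE = {"rude", "disappoint"}
PUNCTUATION = set(".,!?;:")

def tag_negation(text):
    """Prefix NOT_ to each token after a negator, up to the next punctuation."""
    tokens = re.findall(r"[a-z']+|[.,!?;:]", text.lower())
    tagged, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False            # punctuation ends the negation scope
        elif tok in NEGATORS:
            negating = True             # the negator itself is dropped
        else:
            tagged.append("NOT_" + tok if negating else tok)
    return tagged

def net_sentiment(tagged_tokens):
    """Running +/- count; NOT_ tokens contribute the opposite polarity (assumed)."""
    score = 0
    for tok in tagged_tokens:
        negated = tok.startswith("NOT_")
        word = tok[4:] if negated else tok
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        score += -polarity if negated else polarity
    return score

mixed = net_sentiment(tag_negation("Incredible food, but our server was rude."))
```

For the mixed review from the slide, the positive and negative hits cancel, so `mixed` is 0.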

9 Results (so far) Trained on 10,000 reviews; tested on ~80,000 reviews. Accuracy: baseline 50.0%; intermediate model 56.6% (1.13x). abs(average scoring delta): 0.56

10 Topic Modeling Hand-seeded topic-word list expanded via WordNet SynSets: 1. sub-topic classifiers 2. topic-filtered n-grams 3. topic-word proximity filtering both above. Results:
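Topic-word proximity filtering can be read as keeping only the tokens that fall within a small window around a topic word, so that each sub-topic sees just the part of the review about it. The sketch below assumes that reading. The topic lists, the window size, and the name `proximity_filter` are hypothetical; in the slides the lists are hand-seeded and then expanded via WordNet, which is not done here.

```python
# Hypothetical hand-seeded topic-word lists (no WordNet expansion applied).
TOPIC_WORDS = {
    "food": {"food", "dish", "meal"},
    "service": {"service", "server", "waiter"},
}

def proximity_filter(tokens, topic, window=2):
    """Keep tokens at most `window` positions away from any topic word."""
    seeds = TOPIC_WORDS[topic]
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in seeds:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            keep.update(range(start, end))
    return [tokens[i] for i in sorted(keep)]

tokens = "incredible food but our server was rude".split()
food_view = proximity_filter(tokens, "food")
service_view = proximity_filter(tokens, "service")
```

Here `food_view` retains "incredible" while `service_view` retains "rude", which is the separation the title "Great Food, Lousy Service" calls for.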

11 Word-Rating Distributions “worst” “mediocre” “decent” “solid” “exceeded”

12 Frequency-Weighted Entropy Model Accuracy: baseline 50.0%; intermediate model 56.6%; best (entropy) model 58.6% (1.17x). abs(average scoring delta): 0.56 → 0.52
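The slides do not define the frequency-weighted entropy model. One plausible reading, given the word-rating distributions on the previous slide, is to score each word by how concentrated its rating distribution is (low Shannon entropy) and weight that by how often the word occurs. The sketch below implements that assumed reading only; the scoring formula, the function names, and the toy reviews are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def rating_entropy(rating_counts):
    """Shannon entropy (bits) of a word's distribution over ratings."""
    total = sum(rating_counts.values())
    probs = [c / total for c in rating_counts.values() if c]
    return sum(-p * math.log2(p) for p in probs)

def word_scores(reviews, num_ratings=5):
    """Assumed score: document frequency * (max entropy - word entropy)."""
    dist = defaultdict(Counter)
    for tokens, rating in reviews:
        for tok in set(tokens):          # count each word once per review
            dist[tok][rating] += 1
    max_h = math.log2(num_ratings)
    return {w: sum(c.values()) * (max_h - rating_entropy(c))
            for w, c in dist.items()}

# Hypothetical toy data: (tokens, star rating from 1 to 5).
reviews = [
    (["worst", "food"], 1),
    (["worst", "service"], 1),
    (["decent", "food"], 3),
    (["decent", "service"], 4),
    (["food", "exceeded"], 5),
]
scores = word_scores(reviews)
```

Under this scoring, "worst" (always rated 1 in the toy data) outranks "decent" (split across two ratings), and both outrank "food", which appears at every rating level.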

