Great Food, Lousy Service Topic Modeling for Sentiment Analysis in Sparse Reviews Robin Melnick Dan Preston

Great Food, Lousy Service Topic Modeling for Sentiment Analysis in Sparse Reviews Robin Melnick rmelnick@stanford.edu Dan Preston dpreston@stanford.edu

OpenTable.com

Short [Figure: review length, in characters and in words]

Sparse “An unexpected combination of Left-Bank Paris and Lower Manhattan in Omaha. Divine. Inspirational and a great value.” Food? Ambiance? Service? Noise?

Skewed

Correlations

SVM + Features, Features, Features! tokenize punctuation; "white list" (only use sentiment words); id, neutralize proper nouns; remove stop words; strip numbers; POS tagging, ADJ only; contraction splitting; POS tagging, add ADV; lower casing; Brill tagger; unigram (Bag of Words); sentiment "white list" (Harvard lexicon); bigram; count of sentiment words (pos/neg); trigram; balanced training set; mixed n-grams; binary accuracy; ignore stop words; sub-topic classifiers, hand list; stemming; WordNet topic list expansion; negation processing; topic-filtered n-grams; expanded negation processing; topic-word proximity filtering; large training set size; strict entropy modeling; varying dictionary size; frequency-weighted entropy modeling; SVM scaling. 30+ preprocessing and SVM classification features, ~50 configurations.
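As a rough illustration of a few of the preprocessing steps listed above (lower casing, stripping numbers, ignoring stop words, mixed n-grams), here is a minimal sketch in plain Python. The stop-word list and the function names `tokenize` and `ngram_features` are assumptions made for the example, not the authors' code, and the SVM itself is not shown.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the slides do not give one).
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "was", "our"}

def tokenize(text):
    """Lower-case the text, strip numbers, and split into word tokens."""
    text = re.sub(r"\d+", " ", text.lower())
    return re.findall(r"[a-z']+", text)

def ngram_features(text, n_values=(1, 2), drop_stop_words=True):
    """Return a Counter of n-gram counts for each n in n_values."""
    tokens = tokenize(text)
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    feats = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

# Mixed unigram + bigram features for a short review.
features = ngram_features("Great food, lousy service")
```

Each key of `features` would become one dimension of the feature vector handed to the classifier.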

Key Features. Stemming: Porter (1980) via NLTK. Negation processing (enhanced approach from Pang et al. 2002): “Not a great experience.” → NOT_great; “They never disappoint!” → NOT_disappoint. Net sentiment count: pos/neg lexicon (Harvard General Inquirer), running +/- count: “Incredible(+) food, but our server was rude(-).” → (0)
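The negation and net-sentiment examples above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' implementation: the slide examples suggest that only the first content word after a negator is marked, so the sketch does that and ends the negation scope at punctuation. The word sets `NEGATORS`, `SKIP`, `POSITIVE`, and `NEGATIVE` are tiny hypothetical stand-ins (the last two for the Harvard General Inquirer lexicon).

```python
import re

# All word lists below are illustrative assumptions, not taken from the slides.
NEGATORS = {"not", "no", "never"}
SKIP = {"a", "an", "the", "very", "so"}   # words a negation passes over
PUNCT = set(".,!?;:")
POSITIVE = {"incredible", "great", "divine"}
NEGATIVE = {"rude", "lousy", "worst"}

def mark_negation(text):
    """Prefix NOT_ to the first content word that follows a negator."""
    tokens = re.findall(r"[a-z']+|[.,!?;:]", text.lower())
    out = []
    pending = False
    for tok in tokens:
        if tok in PUNCT:
            pending = False          # negation scope ends at punctuation
        elif tok in NEGATORS:
            pending = True           # the negator itself is dropped
        elif pending and tok not in SKIP:
            out.append("NOT_" + tok)
            pending = False
        else:
            out.append(tok)
    return out

def net_sentiment(text):
    """Running +/- count: +1 per positive word, -1 per negative word."""
    score = 0
    for tok in re.findall(r"[a-z']+", text.lower()):
        if tok in POSITIVE:
            score += 1
        elif tok in NEGATIVE:
            score -= 1
    return score
```

With these toy lists, `mark_negation("They never disappoint!")` yields `["they", "NOT_disappoint"]`, and `net_sentiment` returns 0 for the mixed review on the slide, since one positive and one negative word cancel.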

Results (so far): Trained on 10,000 reviews; tested on ~80,000 reviews. Accuracy: Baseline: 50.0%; Intermediate model: 56.6% (1.13x). abs(average scoring delta): 0.56

Topic Modeling: Hand-seeded topic-word list expanded via WordNet synsets. 1. sub-topic classifiers 2. topic-filtered n-grams 3. topic-word proximity filtering, both above. Results:
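One plausible reading of topic-word proximity filtering is to keep only the tokens that fall within a fixed window of a topic word, so that sentiment words are attributed to the nearby topic. The sketch below assumes that reading. The seed lists in `TOPIC_WORDS`, the function name `tokens_near_topic`, and the window size are hypothetical, and the WordNet expansion step is omitted.

```python
import re

# Hand-seeded topic words (illustrative assumptions; no WordNet expansion here).
TOPIC_WORDS = {
    "food": {"food", "dish", "meal"},
    "service": {"service", "server", "waiter"},
}

def tokens_near_topic(text, topic, window=2):
    """Keep tokens within `window` positions of any seed word for `topic`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    seeds = TOPIC_WORDS[topic]
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in seeds:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            keep.update(range(lo, hi))
    return [tokens[i] for i in sorted(keep)]

review = "Incredible food, but our server was rude."
food_context = tokens_near_topic(review, "food")
service_context = tokens_near_topic(review, "service")
```

Here "incredible" lands in the food context and "rude" in the service context, which is the separation the slide title "Great Food, Lousy Service" calls for.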

Word-Rating Distributions “worst” “mediocre” “decent” “solid” “exceeded”

Frequency-Weighted Entropy Model. Accuracy: Baseline: 50.0%; Intermediate model: 56.6%; Best (entropy) model: 58.6% (1.17x). abs(average scoring delta): 0.56 → 0.52
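The slides do not spell out how the entropy model is computed. As a hedged sketch of the general idea, the code below measures the Shannon entropy of each word's distribution over ratings (a word such as "worst" that appears almost only in low-rated reviews has low entropy) and then weights it by how often the word occurs. The weighting formula in `weighted_score`, the five-point rating scale, and all function names are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

def word_rating_counts(reviews):
    """Map each word to a Counter of the ratings of reviews containing it."""
    counts = defaultdict(Counter)
    for text, rating in reviews:
        for tok in set(text.lower().split()):
            counts[tok][rating] += 1
    return counts

def rating_entropy(rating_counts):
    """Shannon entropy (bits) of a word's distribution over ratings."""
    total = sum(rating_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in rating_counts.values() if c)

def weighted_score(rating_counts, num_ratings=5):
    """Assumed weighting: word frequency times distance from maximum entropy."""
    total = sum(rating_counts.values())
    return total * (math.log2(num_ratings) - rating_entropy(rating_counts))

# Toy data, invented for the example: (review text, rating).
reviews = [("worst meal", 1), ("worst service", 1),
           ("decent meal", 3), ("decent food", 4)]
counts = word_rating_counts(reviews)
```

Under this assumed scoring, a frequent word concentrated on one rating scores higher than an equally frequent word spread across ratings, so it would be favored when choosing dictionary entries.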
