Extracting Hidden Components from Text Reviews for Restaurant Evaluation
Juanita Ordonez
Data Mining Final Project
Instructor: Dr. Shahriar Hossain
Computer Science at University of Texas at El Paso

Overall Process
• Extract Reviews
• Pre-process data
• Sentiment Model
• Restaurants Grouping
• Terms Analysis

Extract Reviews
• Reviews dataset was filtered
  • Using the category feature
  • Searched "Restaurants" and extracted business ids
  • Extracted reviews with the same business ids
• Created polar target
  • Removed three-star reviews
  • One and two stars are negative
  • Four and five stars are positive
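The filtering and labelling steps above can be sketched as follows. This is a minimal illustration, not the author's code: the record field names (`business_id`, `categories`, `stars`, `text`) are assumptions about how the review data is laid out.

```python
from typing import Optional


def restaurant_ids(businesses):
    """Collect the ids of businesses whose category list includes 'Restaurants'."""
    return {
        b["business_id"]
        for b in businesses
        if "Restaurants" in b.get("categories", [])
    }


def polar_label(stars: int) -> Optional[str]:
    """Map a star rating to a polar target; three-star reviews get no label."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return None


def extract_reviews(businesses, reviews):
    """Keep restaurant reviews that have a polar label."""
    ids = restaurant_ids(businesses)
    kept = []
    for r in reviews:
        label = polar_label(r["stars"])
        if r["business_id"] in ids and label is not None:
            kept.append({"business_id": r["business_id"],
                         "text": r["text"],
                         "label": label})
    return kept
```

Three-star reviews are dropped rather than assigned to either class, which matches the polar target described on the slide.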

Extract Reviews
• Dataset was unbalanced
  • 20% were negative
  • 80% were positive
• Selected an equal number of examples from each class
• Extracted dates as well for each example
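One common way to even out a 20/80 split is to downsample the larger class at random. The slides do not say how the examples were selected, so the sketch below is an assumption; the `label` key and the `seed` parameter are illustrative names.

```python
import random
from collections import defaultdict


def balance(examples, seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    if not by_label:
        return []
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced
```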

Pre-Process Data
• Removed
  • Stop words (except descriptive nouns and negatives)
  • Nonsensical words (except common slang words)
  • Punctuation and numbers
  • Hyperlinks and invalid inputs
• Spelling corrector
• Stemming
• All words were converted to lower case
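A rough sketch of the removal steps, assuming a tiny illustrative stop-word list (the project's actual list is not given). Spelling correction and stemming are left out, and negation words are deliberately absent from the stop list so they survive cleaning.

```python
import re

# Illustrative stop words only; negatives such as "not" are intentionally excluded.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean(text):
    """Strip hyperlinks, punctuation and numbers, lower-case, drop stop words."""
    text = URL_RE.sub(" ", text).lower()
    text = NON_LETTER_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

Note that this version lower-cases everything immediately, so any marking of all-caps words would have to happen before this step.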

Pre-Process Data
• Used symbols to represent words
  • Negated words: "~"
    • For example: not great = ~great
  • All-caps words: "!"
    • For example: HATE = !hate
• Used bigrams to separate terms
  • Example: "service slow food nasty not so great"
    • "service slow" "slow food" "food nasty" "nasty ~so" "~so great"
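The symbol marking and bigram steps can be sketched like this. The negation word list and the order of the combined prefix ("~" before "!") are assumptions; the slides only show each symbol on its own.

```python
NEGATIONS = {"not", "no", "never"}  # assumed list of negation words


def mark_tokens(tokens):
    """Fold a negation into the next word with '~' and flag all-caps words with '!'."""
    marked = []
    negate = False
    for tok in tokens:
        if tok.lower() in NEGATIONS:
            negate = True  # the negation itself is dropped
            continue
        prefix = ""
        if negate:
            prefix += "~"
            negate = False
        if len(tok) > 1 and tok.isupper():
            prefix += "!"
        marked.append(prefix + tok.lower())
    return marked


def bigrams(tokens):
    """Pair each token with its successor."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


tokens = mark_tokens("service slow food nasty not so great".split())
print(bigrams(tokens))
```

For this input the marked tokens are `service slow food nasty ~so great`, so the negation attaches only to the word that immediately follows it.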

Sentiment Model
Naive Bayes Classifier
• Classes: negative and positive
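For reference, a multinomial Naive Bayes classifier over token or bigram features can be written in a few lines. This is a generic textbook sketch with add-one (Laplace) smoothing, not the project's implementation; the smoothing constant `alpha` is an assumption.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def log_scores(self, feats):
        """Return log P(class) + sum of log P(feature | class) for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for f in feats:
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log((counts[f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when a review contains many features.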

Sentiment Model
• NBSVM (Naive Bayes Support Vector Machine)
  • Has not been run on the Yelp dataset
  • Matlab implementation available online [4]
[Diagram labels: Feature Vector, Author's SVM model]
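As background, the NBSVM approach in [4] scales binarized features by a Naive Bayes log-count ratio before they are passed to a linear SVM. The sketch below covers only that feature step, based on a reading of [4]; it is not the Matlab implementation, and the `"positive"` label string and `alpha` default are assumptions.

```python
import math
from collections import Counter


def log_count_ratio(docs, labels, alpha=1.0):
    """r[f] = log((p[f] / |p|_1) / (q[f] / |q|_1)) with smoothed, binarized counts."""
    vocab = sorted({f for d in docs for f in d})
    p, q = Counter(), Counter()
    for feats, label in zip(docs, labels):
        target = p if label == "positive" else q
        target.update(set(feats))  # binarized: count each feature once per document
    p_norm = sum(p[f] + alpha for f in vocab)
    q_norm = sum(q[f] + alpha for f in vocab)
    return {
        f: math.log(((p[f] + alpha) / p_norm) / ((q[f] + alpha) / q_norm))
        for f in vocab
    }


def nb_features(feats, ratio):
    """Scale a document's binarized features by the log-count ratio."""
    return {f: ratio[f] for f in set(feats) if f in ratio}
```

Features that appear equally often in both classes get a ratio of zero and so contribute nothing to the SVM input.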

Sentiment Evaluation Results
• 10-fold cross-validation
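A 10-fold evaluation splits the examples into ten parts and holds each part out once. The sketch below shows one way to do it; `train_fn` and `predict_fn` are hypothetical callables standing in for whichever classifier is being evaluated, and the interleaved fold assignment assumes the examples were shuffled beforehand.

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) pairs for k interleaved folds."""
    if n < k:
        raise ValueError("need at least k examples")
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def cross_val_accuracy(examples, labels, train_fn, predict_fn, k=10):
    """Average held-out accuracy over k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(examples), k):
        model = train_fn([examples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict_fn(model, examples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)
```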

Restaurants Grouping
• K-Means
  • K=2
• Attributes
  • Overall sentiment
    • Using probabilities of 20,000 examples
  • Number of days since the business opened
  • Average star rating
• Clustered 100 businesses
  • Consisting of ~4,000 reviews
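A plain K-Means with K=2 over the three attributes might look like the sketch below. The slides do not mention scaling or initialization, so the min-max scaling and random initial centroids here are assumptions, and the sample values are invented purely for illustration.

```python
import random


def min_max_scale(rows):
    """Rescale every attribute to [0, 1] so that days open does not dominate."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iters=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iters):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical rows: (overall sentiment, days since opening, average stars)
rows = [(0.9, 2000, 4.5), (0.85, 1800, 4.2), (0.2, 300, 2.0), (0.3, 400, 2.5)]
assignment, centroids = kmeans(min_max_scale(rows), k=2)
```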

Clustering Results

References
[1] San Francisco Restaurants, Dentists, Bars, Beauty Salons, Doctors. (n.d.). Retrieved April 2, 2015, from http://www.yelp.com/
[2] Naive Bayes Text Classification Book Chapter, Stanford
[3] Luca, M. (2011). Reviews, reputation, and revenue: The case of Yelp.com (September 16, 2011). Harvard Business School NOM Unit Working Paper, (12-016).
[4] Wang, Sida, and Christopher D. Manning. "Baselines and bigrams: Simple, good sentiment and topic classification." Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2. Association for Computational Linguistics, 2012.
