
Extracting Hidden Components from Text Reviews for Restaurant Evaluation Juanita Ordonez Data Mining Final Project Instructor: Dr Shahriar Hossain Computer.


1 Extracting Hidden Components from Text Reviews for Restaurant Evaluation
Juanita Ordonez
Data Mining Final Project
Instructor: Dr. Shahriar Hossain
Computer Science, University of Texas at El Paso

2 Overall Process
• Extract reviews
• Pre-process data
• Sentiment model
• Restaurant grouping
• Terms analysis

3 Extract Reviews  Reviews Dataset was filtered  Using category feature  Searched "Restaurants" and extract business ids  Extracted reviews with the same business ids  Created polar target  remove three star reviews  one and two stars are negative  four and five stars are positive 3

4 Extract Reviews  Dataset was unbalance  20 % were negative  80% were positive  Selected even number of examples  Extracted dates as well for each example 4

5 Pre-Process Data  Removed  Stop words- Except descriptive nouns and negatives  Nonsensical words- Except common slang words  Punctuation and numbers  Hyperlinks and invalid inputs  Spelling Corrector  Stemming  All words were converted into lower case 5

6 Pre-Process Data  Use symbols to represents words  Negative words "~"  for example: not great = ~great  All caps words " ! "  for example HATE = !hate  Used bigrams to separate terms  example:  "service slow food nasty no so great "  "service slow" "slow food" "food nasty" "nasty ~so" "~so great" 6

7 Sentiment Model
• Naive Bayes classifier
• Classes: negative and positive
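As an illustration, a multinomial Naive Bayes classifier with add-one smoothing, in the style of the Stanford chapter [2], fits in a few lines. This is a sketch, not the author's implementation, and the class and method names are hypothetical. `predict_proba` returns per-review class probabilities, which is the kind of score slide 10 aggregates into overall sentiment.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[int]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            total = sum(counts.values()) + len(self.vocab)  # add-one denominator
            self.log_lik[c] = {t: math.log((counts[t] + 1) / total) for t in self.vocab}
        return self

    def log_scores(self, doc: list[str]) -> dict[int, float]:
        # log P(c) + sum of log P(t | c); terms unseen in training are ignored.
        return {c: self.log_prior[c]
                + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
                for c in self.classes}

    def predict_proba(self, doc: list[str]) -> dict[int, float]:
        """P(class | doc), normalised via log-sum-exp for numerical stability."""
        scores = self.log_scores(doc)
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

    def predict(self, doc: list[str]) -> int:
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)
```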

8 Sentiment Model  NBSVM (Naive Bayes Support Vector Machine)  Have not been run for Yelp dataset  Matlab implementation available online [4] Feature Vector Author's SVM model 8

9 Sentiment Evaluation Results  10-fold evaluation 9

10 Restaurants Grouping  K-Means  K=2  Attributes  Sentiment Overall  Using probabilities of 20000 examples  Number of days since business open  Average star ratings  Cluster 100 business  Consist of ~4,ooo reviews 10

11 Clustering Results

12 References  [1] San Francisco Restaurants, Dentists, Bars, Beauty Salons, Doctors. (n.d.). Retrieved April 2, 2015, from http://www.yelp.com/http://www.yelp.com/  [2] Naive Bayes Text Classification Book Chapter, Stanford  [3] Luca, M. (2011). Reviews, reputation, and revenue: The case of Yelp.com.com (September 16, 2011). Havard Business School NOM Unit Working Paper, (12-016).  [4] Wang, Sida, and Christopher D. Manning. "Baselines and bigrams: Simple, good sentiment and topic classification." Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2. Association for Computational Linguistics, 2012.

