Extending SASI to Satirical Product Reviews: A Preview
Bernease Herman
University of Michigan
Monday, April 22, 2013

2 Satirical Amazon Reviews
For a fun list: http://www.geekosystem.com/funny-amazon-reviews/
Extending SASI to Satirical Product Reviews: A Preview April 22, 2013 2

3 Defining Irony, Sarcasm and Satire
Irony: “the use of words to convey a meaning that is the opposite of its literal meaning”
Sarcasm: “a sharply ironical taunt; sneering or cutting remark”
Satire: “the use of irony, sarcasm, ridicule, or the like, in exposing, denouncing, or deriding vice, folly, etc.”

4 Sarcastic Review: Shure SE110 Sound Isolating Earphones

5 Satirical Review: BIC Cristal For Her ballpoint pens

6 Satirical Review: Zenith Men’s Defy Xtreme Titanium Watch

7 Semi-supervised Algorithm for Sarcasm Identification (SASI): Overview
Outline: Overview, Data preprocessing, Data enrichment, Pattern features, Punctuation features, Additional features, Classification, Baseline options, Summary
SASI detects sarcasm in individual sentences using a k-nearest-neighbors (kNN) style classifier.
Its features include pattern matches and punctuation.
Satire may call for additional features that the sarcasm model does not include.
The classification baseline still needs to be chosen from several options.
SASI is a sentence-level sarcasm detector; it does not classify whole documents.

8 SASI: Data Preprocessing
Jindal and Liu (2008) provide a data set of 66,000 book and product reviews.
Filatova (2012) provides a corpus of Amazon reviews labeled ironic, sarcastic, both, or regular.
Specific products, authors, companies, and book titles were replaced with placeholders such as [product] and [author].
HTML and special symbols were removed from the text.
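A minimal Python sketch of this preprocessing step, assuming the entity names in each review are already known. The `entities` map and the `preprocess` function are hypothetical, not taken from the SASI code. The sketch strips HTML tags, decodes HTML entities, swaps names for placeholder tags, and drops the remaining special symbols.

```python
import html
import re


def preprocess(text: str, entities: dict[str, str]) -> str:
    """Replace known names with placeholder tags and strip HTML and symbols.

    `entities` (hypothetical) maps a surface string, e.g. a product name,
    to its placeholder tag, e.g. "[product]".
    """
    # Remove HTML tags, then decode entities such as &amp; into characters.
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    # Replace longer names first so a full name wins over a shorter prefix.
    for name in sorted(entities, key=len, reverse=True):
        tag = entities[name]
        text = re.sub(re.escape(name), lambda _m: tag, text, flags=re.IGNORECASE)
    # Keep word characters, whitespace, common punctuation, and tag brackets.
    text = re.sub(r"[^\w\s.,!?'’\"\[\]-]", " ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `preprocess("<p>The 571B Banana Slicer &amp; more ★★★</p>", {"571B Banana Slicer": "[product]"})` returns `"The [product] more"`.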

9 SASI: Data Enrichment
Tsur et al. (2010) posited that sarcastic sentences tend to co-occur with other sarcastic sentences.
They gathered nearby sentences with the Yahoo! BOSS API, using labeled sentences as seeds.
In these reviews, the co-occurrence assumption appears to hold for satire but not for sarcasm.
[Figure panels: Sarcasm, Satire]
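The enrichment idea can be sketched as follows, under explicit assumptions: `search` is a hypothetical stand-in for the web search API (Yahoo! BOSS in the original) that returns documents as lists of sentences, and every sentence within `window` positions of a seed simply inherits the seed's label. This is a simplified reading of Tsur et al. (2010), not their exact procedure.

```python
from typing import Callable


def enrich(
    seeds: list[tuple[str, int]],
    search: Callable[[str], list[list[str]]],
    window: int = 1,
) -> list[tuple[str, int]]:
    """Expand labeled seed sentences with the sentences found next to them."""
    enriched: dict[str, int] = {}
    for sentence, label in seeds:
        # `search` is hypothetical: any function returning documents that contain the seed.
        for doc in search(sentence):
            for i, s in enumerate(doc):
                if s != sentence:
                    continue
                # Label the seed and its neighbours within `window` positions;
                # the first label assigned to a sentence is kept.
                lo, hi = max(0, i - window), min(len(doc), i + window + 1)
                for j in range(lo, hi):
                    enriched.setdefault(doc[j], label)
    return list(enriched.items())
```

With an in-memory corpus `[["A.", "B.", "C.", "D."]]` and a search that returns documents containing the query, the seed `("B.", 1)` yields labels for "A.", "B.", and "C.".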

10 SASI: Pattern Features
Following Davidov and Rappoport (2006, 2008), words are split into high-frequency words (HFWs) and content words (CWs).
Example: “What can I say about the 571B Banana Slicer that hasn't already been said about the wheel, penicillin or the iPhone…”
With the product name replaced by [product], this sentence yields patterns such as “What can I CW CW the”, “I CW CW the [product]”, “[product] that hasn’t CW been CW about”, “about the CW”, and “CW or the CW”.
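A rough sketch of HFW/CW pattern extraction. It assumes the HFWs are given as a set (in practice they would come from corpus frequency thresholds), treats placeholder tags such as [product] as HFWs, and keeps every contiguous window with 2 to 6 HFWs and 1 to 6 CW slots. These limits are our reading of Tsur et al. (2010) and should be checked against the paper; the function names are hypothetical.

```python
def to_hfw_cw(tokens: list[str], hfws: set[str]) -> list[str]:
    """Keep HFWs and placeholder tags as-is; replace every other token with 'CW'."""
    return [
        t if t in hfws or (t.startswith("[") and t.endswith("]")) else "CW"
        for t in tokens
    ]


def extract_patterns(
    tokens: list[str],
    hfws: set[str],
    min_hfw: int = 2,
    max_hfw: int = 6,
    min_cw: int = 1,
    max_cw: int = 6,
) -> set[tuple[str, ...]]:
    """Return all contiguous HFW/CW windows within the assumed size limits."""
    mapped = to_hfw_cw(tokens, hfws)
    patterns: set[tuple[str, ...]] = set()
    for start in range(len(mapped)):
        n_hfw = n_cw = 0
        for end in range(start, len(mapped)):
            if mapped[end] == "CW":
                n_cw += 1
            else:
                n_hfw += 1
            # Longer windows from this start can only exceed the limits further.
            if n_hfw > max_hfw or n_cw > max_cw:
                break
            if n_hfw >= min_hfw and n_cw >= min_cw:
                patterns.add(tuple(mapped[start : end + 1]))
    return patterns
```

For "what can i say about the [product]" with HFWs {what, can, i, about, the}, the result includes ("can", "i", "CW", "about") but not ("what", "can"), which has no CW slot.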

11 SASI: Pattern Features (continued)
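Slide 11 has no recoverable text. As a hedged illustration of how a pattern could become a feature value, this sketch follows our reading of Tsur et al. (2010): 1 for an exact match, α for a sparse match (all components in order, with gaps), γ·n/N for an incomplete match of n out of N components, and 0 otherwise. The defaults α = γ = 0.1 are assumptions, and the original requirement that an incomplete match include at least one HFW is omitted for brevity.

```python
def match_value(pattern, sentence, alpha: float = 0.1, gamma: float = 0.1) -> float:
    """Feature value of one pattern for one sentence, both in HFW/CW form."""
    pattern, sentence = list(pattern), list(sentence)
    n, m = len(pattern), len(sentence)
    # Exact match: the pattern appears as one contiguous run.
    for i in range(m - n + 1):
        if sentence[i : i + n] == pattern:
            return 1.0
    # Longest common subsequence: how many components appear in order.
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if pattern[i] == sentence[j]:
                lcs[i + 1][j + 1] = lcs[i][j] + 1
            else:
                lcs[i + 1][j + 1] = max(lcs[i][j + 1], lcs[i + 1][j])
    matched = lcs[n][m]
    if matched == n:
        return alpha  # sparse match: every component, with gaps allowed
    if matched > 1:
        return gamma * matched / n  # incomplete match (HFW check omitted)
    return 0.0
```

The pattern ("i", "CW", "about") scores 1.0 against "what can i CW about the [product]", α against "i CW CW about", and γ·2/3 against "i CW the".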

12 SASI: Punctuation Features
Generic punctuation-related features, each normalized to [0, 1]:
Sentence length in words
Number of “!” characters
Number of “?” characters
Number of quotes in the sentence
Number of capitalized or all-capitals words
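The five punctuation features can be computed as below. Two choices here are assumptions: "quotes" counts quote characters rather than quoted spans, and normalization divides each feature by its maximum over the data set, which may differ from SASI's exact scheme.

```python
QUOTE_CHARS = "\"“”"
STRIP_CHARS = "\"'“”‘’.,!?;:()"


def raw_punctuation_features(sentence: str) -> list[float]:
    """Unnormalized features: length, '!', '?', quote characters, capitalized words."""
    words = sentence.split()
    # A word counts if its first letter (after stripping punctuation) is uppercase,
    # which covers both Capitalized and ALL-CAPS words.
    capitalized = sum(1 for w in words if w.strip(STRIP_CHARS)[:1].isupper())
    return [
        float(len(words)),
        float(sentence.count("!")),
        float(sentence.count("?")),
        float(sum(sentence.count(q) for q in QUOTE_CHARS)),
        float(capitalized),
    ]


def normalize(rows: list[list[float]]) -> list[list[float]]:
    """Scale each column to [0, 1] by its maximum (a column of zeros stays zero)."""
    maxima = [max(col) or 1.0 for col in zip(*rows)]
    return [[v / mx for v, mx in zip(row, maxima)] for row in rows]
```

For the sentences "WOW, best pen EVER!!!" and "It works.", the normalized rows are `[1.0, 1.0, 0.0, 0.0, 1.0]` and `[0.5, 0.0, 0.0, 0.0, 0.5]`.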

13 SASI: Additional Features
Burfoot and Baldwin (2009) introduced a validity feature that models absurdity with a measure close to pointwise mutual information (PMI).
It relates to the number of made-up or mismatched named entities.
It works well for their satire task but does not carry over to this setting.
Features to consider:
Absurdity of the product
Relevancy of the product
How often the product is reviewed
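One hypothetical way to score "absurdity of the product" is plain pointwise mutual information between words in a product's title, computed over a collection of product descriptions. This is standard document-level PMI, log2 p(x, y) / (p(x) p(y)), not Burfoot and Baldwin's validity formula, and the toy documents below are made up for illustration.

```python
import math


def pmi(x: str, y: str, docs: list[set[str]]) -> float:
    """Document-level PMI: log2 of p(x, y) / (p(x) * p(y))."""
    n = len(docs)
    cx = sum(1 for d in docs if x in d)
    cy = sum(1 for d in docs if y in d)
    cxy = sum(1 for d in docs if x in d and y in d)
    if cxy == 0:
        # The words never co-occur: treat the pairing as maximally unexpected.
        return float("-inf")
    return math.log2((cxy / n) / ((cx / n) * (cy / n)))


docs = [
    {"banana", "slicer"},
    {"banana", "bread"},
    {"pen", "ink"},
    {"banana", "slicer", "kitchen"},
]
```

Here `pmi("banana", "slicer", docs)` is log2(4/3), about 0.415, because the words co-occur more often than chance, while `pmi("pen", "slicer", docs)` is negative infinity. Low PMI between a product word and its descriptors could be one weak signal of an absurd pairing.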

14 SASI: Classification
Each training sentence is represented as a feature vector with one entry per pattern, plus the punctuation features.
A new sentence is compared, by Euclidean distance, with each training vector that shares at least one pattern with it; the nearest of these determine its label.
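A simplified kNN sketch of the classification step. Assumptions: each vector stores its `n_patterns` pattern features first, labels are numeric sarcasm scores, k defaults to 5, and the prediction is a plain average of the k nearest labels rather than SASI's weighted scheme. The function name is hypothetical.

```python
import math


def knn_label(query, train, n_patterns: int, k: int = 5):
    """Average label of the k nearest training vectors that share a pattern.

    `train` is a list of (vector, label) pairs; returns None when no
    training vector shares a pattern with the query.
    """
    # Pattern features that are active (non-zero) in the query.
    active = {i for i in range(n_patterns) if query[i] > 0}
    candidates = [
        (math.dist(query, vec), label)
        for vec, label in train
        if any(vec[i] > 0 for i in active)
    ]
    if not candidates:
        return None
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(label for _, label in nearest) / len(nearest)
```

With training vectors `[1.0, 0.0, 0.2]` (label 5), `[0.1, 0.0, 0.1]` (label 3), and `[0.0, 1.0, 0.0]` (label 1), a query `[1.0, 0.0, 0.1]` with two pattern features only considers the first two; k = 1 gives 5.0 and k = 5 gives 4.0.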

15 SASI: Baseline Options
Because SASI is semi-supervised, its classification baseline exploits the definition of sarcasm: it assumes a low star rating paired with text whose literal meaning is positive.
Satire is not as clear-cut. Options include:
Variation in ratings for the product
Purchases vs. page views of the product
Number of people who found the review helpful
Other heuristics
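Two of these baseline ideas, sketched with hypothetical thresholds and a tiny made-up positive-word lexicon: a star-sentiment check for sarcasm (low rating, literally positive text), and rating spread as one candidate signal for satire. Neither the lexicon nor the thresholds come from the source.

```python
import statistics

# Hypothetical lexicon; a real system would use a proper sentiment resource.
POSITIVE_WORDS = {"great", "best", "love", "amazing", "perfect"}


def star_sentiment_candidate(
    text: str, stars: int, min_positive: int = 2, max_stars: int = 2
) -> bool:
    """Flag a review whose rating is low but whose wording is literally positive."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    positive = sum(1 for w in words if w in POSITIVE_WORDS)
    return stars <= max_stars and positive >= min_positive


def rating_spread(ratings: list[int]) -> float:
    """Population standard deviation of a product's star ratings."""
    return statistics.pstdev(ratings)
```

"Best pen ever, I love it!" with one star is flagged, while the same text with five stars is not; ratings of 1, 5, 1, 5 have a spread of 2.0.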

16 SASI: Summary
Satire seems to have a distinct advantage over sarcasm in the data enrichment phase.
Satire seems to have a large disadvantage compared to sarcasm in the baseline options for classification.
This detail must be worked out before moving forward with implementation.

17 Future Goals
After the course ends, I plan to implement SASI, taking into account the features discussed today.
Extend the model to sarcasm in other domains.
Any questions or comments?
