
Ling 570 Day 9: Text Classification and Sentiment Analysis


1 Ling 570 Day 9: Text Classification and Sentiment Analysis

2 Outline
- Questions on HW #3
- Discussion of Project #1
- Text Classification
- Sentiment Analysis

3 Project #1

4 Your goal: political text analysis
- Take a document, predict whether it is more Republican or Democratic
- We have harvested blog posts from:
  - The Democratic National Committee
  - The Republican National Committee
  - Fox News
  - The Huffington Post

5 First task
- Can you reconstruct the party affiliation of a given document?
- We will gather some novel posts, held out from your training data
- You predict the political party of each of these posts to the best of your ability

6 Second task
- Is the media biased? Is a particular news source biased?
- Using the classifier that you've learned, see whether documents from a particular news source seem to be left- or right-leaning.
- What features are most indicative of the party of a given document?
- Do you think your classifier is effective in detecting media bias? Why or why not?

7 Text Classification

8 Text classification
- Also known as "text categorization"
- Often an instance of supervised learning:
  - Start with a large body of pre-classified data
  - Try to map new documents into one of these classes

9 Text classification
[Figure: training documents grouped into classes (often hierarchical), e.g. linguistics > phonology ("acoustics", "IPA", …) and morphology ("morpheme", "template", …) vs. brewing > varieties ("IPA", "hefeweizen", …); a test document to classify: "We transcribed the samples of this unusual language in IPA…"]

10 Classification methods
- Manual:
  - Yahoo, back in the day, had a manually curated hierarchy of useful web content
  - Can be very accurate and consistent…
  - …but it's very expensive
- Need to move to automatic methods

11 Text categorization

12 Machine learning: Supervised classification

13 Bayesian methods
- Learning based on probability theory
- Bayes' theorem plays a big role
- Build a generative model that approximates how the data is produced:
  - Prior probability of each class
  - Model gives a posterior probability of output given inputs
- Naïve Bayes:
  - Bag of features (generally words)
  - Assumes each feature is independent

14 Bag of words representation
According to a study published in the October issue of Current Biology entitled 'Spontaneous human speech mimicry by a cetacean,' whales can talk. Not to burst your bubble ring or anything, but now that we've suckered you in, let's clarify what we mean by 'talk.' A beluga whale named 'NOC' (he was named for an incredibly annoying sort of Canadian gnat), who lived at the National Marine Mammal Foundation (NMMF) in San Diego up until his death five years ago, had been heard making some weird kinds of vocalizations. At first, nobody was sure that it was him: divers heard what sounded like 'two people conversing in the distance just out of range for our understanding.' But then one day, a diver in NOC's tank left the water after clearly hearing someone tell him to get out. It wasn't someone, though: it was some whale, and that some whale was NOC.

15 Bag of words representation
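The word-count table on this slide didn't survive the transcript. As a minimal sketch of the same idea, here is one way to build a bag-of-words representation with scikit-learn (the toy documents are made up, not the slide's):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the slide's example documents
docs = [
    "whales can talk and the whale NOC made humanlike sounds",
    "the diver heard someone tell him to get out",
]

# Bag of words: each document becomes a vector of word counts,
# discarding word order entirely
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                         # one count vector per document
```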

16-19 Bayes' Rule for text classification
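The equations on these slides were images and are lost from the transcript. As a hedged reconstruction of the standard derivation they presumably built up: choose the class c with the highest posterior probability given document d, and note that the denominator P(d) is the same for every class, so it can be dropped:

\[
\hat{c} \;=\; \arg\max_{c} P(c \mid d)
\;=\; \arg\max_{c} \frac{P(d \mid c)\,P(c)}{P(d)}
\;=\; \arg\max_{c} P(d \mid c)\,P(c)
\]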

20-23 Back to text classification

24 The "Naïve" part of Naïve Bayes
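Again the formulas did not survive; a standard reconstruction: representing d by its words w_1, …, w_n, the full likelihood P(w_1, …, w_n | c) has far too many parameters to estimate, so Naïve Bayes assumes the features are conditionally independent given the class:

\[
P(d \mid c) \;=\; P(w_1, \ldots, w_n \mid c) \;\approx\; \prod_{i=1}^{n} P(w_i \mid c),
\qquad
\hat{c} \;=\; \arg\max_{c}\; P(c)\prod_{i=1}^{n} P(w_i \mid c)
\]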

25-29 Return of smoothing…
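The worked examples on these slides are missing; presumably they showed why the maximum-likelihood estimate zeroes out the whole product whenever a test word never occurred with class c in training, and fixed it with add-one (Laplace) smoothing over the vocabulary V:

\[
P_{\mathrm{MLE}}(w \mid c) \;=\; \frac{\mathrm{count}(w,c)}{\sum_{w'} \mathrm{count}(w',c)}
\qquad\longrightarrow\qquad
P_{+1}(w \mid c) \;=\; \frac{\mathrm{count}(w,c) + 1}{\sum_{w'} \mathrm{count}(w',c) + |V|}
\]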

30 Exercise

           document                               label
  TRAIN    Apple poised to unveil iPad Mini       TECH
           Apple product leaks                    TECH
           Researchers test apple, cherry trees   SCIENCE
  TEST     Dangerous apple, cherry pesticides?    ?
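A sketch of how one might work the exercise in code, using the add-one-smoothed Naïve Bayes estimates above (the tokenization choices here are assumptions; the slide doesn't specify them):

```python
import re
from collections import Counter
from math import log

train = [
    ("Apple poised to unveil iPad Mini", "TECH"),
    ("Apple product leaks", "TECH"),
    ("Researchers test apple, cherry trees", "SCIENCE"),
]
test_doc = "Dangerous apple, cherry pesticides?"

def tokenize(text):
    # Lowercase and keep alphabetic tokens -- an assumed preprocessing choice
    return re.findall(r"[a-z]+", text.lower())

# Per-class word counts and class priors
word_counts = {}
class_counts = Counter()
for doc, label in train:
    class_counts[label] += 1
    word_counts.setdefault(label, Counter()).update(tokenize(doc))

vocab = {w for counts in word_counts.values() for w in counts}

def log_posterior(doc, label):
    counts = word_counts[label]
    total = sum(counts.values())
    # log P(c) + sum over words of log P(w|c) with add-one smoothing
    score = log(class_counts[label] / len(train))
    for w in tokenize(doc):
        score += log((counts[w] + 1) / (total + len(vocab)))
    return score

for label in class_counts:
    print(label, log_posterior(test_doc, label))
```

With this tokenization the classifier picks SCIENCE: "cherry" never occurs in the TECH training text, and SCIENCE's smaller word total gives its seen words higher smoothed probabilities, outweighing TECH's 2/3 prior.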

31 Benchmark dataset #1: 20 newsgroups
- 18,000 documents from 20 distinct newsgroups
- Newsgroups: a now mostly unused technology for sharing textual information, with hierarchical topical groups

comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x
rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey
sci.crypt, sci.electronics, sci.med, sci.space
misc.forsale, talk.politics.misc, talk.politics.guns, talk.politics.mideast, talk.religion.misc, alt.atheism, soc.religion.christian

32 Results:

33 Evaluation methods
- "macro"-averaging:
  - Compute precision and recall for each category
  - Take the average of the per-category precision and recall values

                          gold category
                   news  sports  arts  science | totals
predicted news       15       7     0        1 |     23
category  sports      6      17     0        0 |     23
          arts        0       0     4        0 |      4
          science     1       0     0        7 |      8
totals               22      24     4        8 |

35
                          gold category
                   news  sports  arts  science |  prec
predicted news       15       7     0        1 |  0.65
category  sports      6      17     0        0 |  0.74
          arts        0       0     4        0 |  1.00
          science     1       0     0        7 |  0.88
recall             0.68    0.71  1.00     0.88

36 Evaluation methods
- What is the analogue of precision and recall for multiclass classification?
- We can still compute the contingency counts (true positives, false positives, false negatives) for each category, one-vs-rest
- Then pool these counts across categories and compute precision and recall from the pooled sums
- This is called "micro"-averaging, and it focuses on document-level accuracy
[Figure: a 2x2 one-vs-rest table, classifier output (category / all other categories) against gold standard (category / all other categories)]
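In symbols (a reconstruction consistent with the tables on the surrounding slides), with per-category true positives TP_c, false positives FP_c, and false negatives FN_c:

\[
P_{\mathrm{macro}} \;=\; \frac{1}{|C|}\sum_{c \in C}\frac{TP_c}{TP_c + FP_c},
\qquad
P_{\mathrm{micro}} \;=\; \frac{\sum_{c} TP_c}{\sum_{c}\,(TP_c + FP_c)}
\]

and analogously for recall, with FN_c in place of FP_c.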

37
                          gold category
                   news  sports  arts  science |  prec   (macro avg: 0.82)
predicted news       15       7     0        1 |  0.65
category  sports      6      17     0        0 |  0.74
          arts        0       0     4        0 |  1.00
          science     1       0     0        7 |  0.88
recall             0.68    0.71  1.00     0.88     (macro avg: 0.82)

38 Per-category one-vs-rest tables (derived from the confusion matrix on slide 33):

  news:                gold news    gold other
    classified news        15           8
    classified other        7           -

  sports:              gold sports  gold other
    classified sports      17           6
    classified other        7           -

  science:             gold sci     gold other
    classified sci          7           1
    classified other        1           -

  arts:                gold arts    gold other
    classified arts         4           0
    classified other        0           -

39 Pooling the one-vs-rest counts across all four categories:

  total:               gold correct  gold other
    classified correct      43           15
    classified other        15            -

  micro-averaged prec = 43/(43+15) = 0.74, recall = 43/(43+15) = 0.74

40 Feature selection

41 Sentiment Analysis

42 Sentiment Analysis
- Consider movie reviews:
  - Given a review from a site like Rotten Tomatoes, try to detect whether the reviewer liked the movie
- Some observations:
  - Humans can quickly and easily identify sentiment
  - It is often easier than topic classification
  - Suspicion: certain words may be indicative of sentiment

43 Simple Experiment [Pang, Lee, Vaithyanathan, EMNLP 2002]
- Ask two grad students to come up with a list of words charged with sentiment
- Create a very simple, deterministic classifier based on this:
  - Count the number of positive and negative hits
  - Break ties to increase accuracy

44 Simple Experiment [Pang, Lee, Vaithyanathan, EMNLP 2002]
- Ask two grad students to come up with a list of words charged with sentiment
- Create a very simple, deterministic classifier based on this:
  - Count the number of positive and negative hits
  - Break ties to increase accuracy
- Compare to automatically extracted lists
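A minimal sketch of that decision rule (the word lists here are invented stand-ins, not the students' actual lists):

```python
# Hypothetical sentiment word lists -- stand-ins for the human-built ones
POSITIVE = {"dazzling", "brilliant", "gripping", "moving"}
NEGATIVE = {"boring", "cliched", "awful", "unwatchable"}

def classify(review_tokens, tie_label="positive"):
    # Count hits against each list; the tie-breaking label is a free
    # parameter, chosen to maximize accuracy on the data
    pos = sum(tok in POSITIVE for tok in review_tokens)
    neg = sum(tok in NEGATIVE for tok in review_tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return tie_label

print(classify("a gripping and moving film not at all boring".split()))
```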

45 Toward more solid machine learning
- The prior decision rule was very heuristic:
  - Just count the number of charged words
  - Ties are a significant issue
- What happens when we shift to something more complex?

46 Toward more solid machine learning
- The prior decision rule was very heuristic:
  - Just count the number of charged words
  - Ties are a significant issue
- What happens when we shift to something more complex?
  - Naïve Bayes
  - Maximum Entropy (aka logistic regression, aka log-linear models)
  - Support Vector Machines

47 Experimental results
The baseline was 69% accuracy. Here we get just under 79% with all words, just using frequency counts. What happens when we use binary (presence/absence) features instead?
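In scikit-learn terms (a modern stand-in for whatever feature extraction the paper used), the switch from frequency to presence is a single flag:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Frequency features: cell (i, j) holds how often word j occurs in doc i
freq_features = CountVectorizer()

# Binary "presence" features: the same cell is 1 if word j occurs at all
presence_features = CountVectorizer(binary=True)
```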

48 Experimental results
Unigrams are pretty good – what happens when we add bigrams?

49 Experimental results
Why are bigrams alone worse than unigrams and bigrams together?

50-51 Experimental results

52 Domain Adaptation

53 What are we learning?
- The primary features are unigrams.
- For a movie, "unpredictable" is a good thing – likely to be an interesting thriller.

54 What are we learning?
- The primary features are unigrams.
- For a movie, "unpredictable" is a good thing – likely to be an interesting thriller.
- For a dishwasher, "unpredictable" is not so great.

55 Domain shift [Blitzer, Dredze, Pereira, 2007]
- What happens when we move to another domain?
- Gather Amazon reviews from four domains:
  - Books, DVDs, Electronics, Kitchen appliances
- Each review has:
  - Rating (0–5 stars)
  - Reviewer name and location
  - Product name
  - Review (title, date, and body)
- Ratings above 3 become positive, below 3 negative; 3-star reviews are considered ambiguous and discarded
- 1000 positive and 1000 negative reviews in each domain

56-58 Domain adaptation effects

59 Lessons learned
- Be careful with your classifier:
  - Just because you get high accuracy on one test set doesn't guarantee high accuracy on another
  - Domain shift can take a major bite out of accuracy
- What can we do about this?

60 Lessons learned
- Be careful with your classifier:
  - Just because you get high accuracy on one test set doesn't guarantee high accuracy on another
  - Domain shift can take a major bite out of accuracy
- What can we do about this?
  - Supervised approaches: if we have a little training data in the NEW domain and a lot in the OLD domain, learn features from both ("Frustratingly Easy Domain Adaptation", Daumé III 2007)
  - Unsupervised approaches (Structural Correspondence Learning)
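A minimal sketch of the "frustratingly easy" idea, which Daumé's paper does describe: duplicate every feature into a shared copy plus a domain-specific copy, and let the learner decide which version to weight (the feature-name scheme below is illustrative):

```python
def augment(features, domain):
    """Daume (2007)-style feature augmentation.

    Each original feature appears twice: once in a shared 'general'
    space and once in a domain-specific space, so the learner can
    share weights across domains or specialize them per domain.
    """
    out = {}
    for name, value in features.items():
        out[f"general:{name}"] = value   # shared across domains
        out[f"{domain}:{name}"] = value  # domain-specific copy
    return out

# e.g. a unigram feature from a kitchen-appliance review
print(augment({"unpredictable": 1}, domain="kitchen"))
# {'general:unpredictable': 1, 'kitchen:unpredictable': 1}
```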

