Ling 570 Day 9: Text Classification and Sentiment Analysis


Outline
 Questions on HW #3
 Discussion of Project #1
 Text Classification
 Sentiment Analysis

Project #1

Your goal: political text analysis
 Take a document, predict whether it is more Republican or Democratic
 We have harvested blog posts from:
  The Democratic National Committee
  The Republican National Committee
  Fox News
  The Huffington Post

First task
 Can you reconstruct the party affiliation of a given document?
 We will gather some novel posts, held out from your training data
 You predict the political party of each of these posts to the best of your ability

Second task
 Is the media biased? Is a particular news source biased?
 Using the classifier that you’ve learned, see whether documents from a particular news source seem to be left- or right-leaning.
 What features are most indicative of the party of a given document?
 Do you think your classifier is effective in detecting media bias? Why or why not?

Text Classification

Text classification
 Also known as “text categorization”
 Often an instance of supervised learning:
  Start with a large body of pre-classified data
  Try to map new documents into one of these classes

Text classification
 train: documents labeled with classes – often hierarchical
 test: classify a new document

[Diagram: a class hierarchy – linguistics → phonology (“acoustics”, “IPA”, …), morphology (“morpheme”, “template”, …), …; brewing → varieties (“IPA”, “hefeweizen”, …); test document: “We transcribed the samples of this unusual language in IPA…” – the ambiguous term “IPA” must be resolved to the right class.]

Classification methods
 Manual
  Yahoo, back in the day, had a manually curated hierarchy of useful web content
  Can be very accurate, consistent…
  …but it’s very expensive
 Need to move to automatic methods

Text categorization

Machine learning: Supervised classification

Bayesian methods
 Learning based on probability theory
  Bayes’ theorem plays a big role
 Build a generative model that approximates how data is produced
  Prior probability of each class
  Model gives a posterior probability of output given inputs
 Naïve Bayes:
  Bag of features (generally words)
  Assumes each feature is independent

Bag of words representation

According to a study published in the October issue of Current Biology entitled 'Spontaneous human speech mimicry by a cetacean,' whales can talk. Not to burst your bubble ring or anything, but now that we've suckered you in, let's clarify what we mean by 'talk.' A beluga whale named 'NOC' (he was named for an incredibly annoying sort of Canadian gnat), that lived at the National Marine Mammal Foundation (NMMF) in San Diego up until his death five years ago, had been heard making some weird kinds of vocalizations. At first, nobody was sure that it was him: divers hearing what sounded like 'two people were conversing in the distance just out of range for our understanding.' But then one day, a diver in NOC's tank left the water after clearly hearing someone tell him to get out. It wasn't someone, though: it was some whale, and that some whale was NOC.

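The bag-of-words idea can be sketched as a word-count dictionary; a minimal sketch (the tokenization here – lowercasing and splitting on non-letters – is an assumption, not the slides' exact method), using a condensed version of the whale text:

```python
from collections import Counter
import re

text = ("According to a study published in the October issue of Current "
        "Biology, whales can talk. A beluga whale named NOC had been heard "
        "making some weird kinds of vocalizations.")

# Lowercase and split on non-letter runs: word order is discarded,
# only per-word counts remain -- the "bag of words".
bag = Counter(w for w in re.split(r"[^a-z]+", text.lower()) if w)

print(bag["of"], bag["whale"])
```

Any document, whatever its length, is reduced to this one sparse count vector before classification.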

Bayes’ Rule for text classification

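The rule these slides step through, in its standard form for a document d and class c:

```latex
P(c \mid d) \;=\; \frac{P(d \mid c)\, P(c)}{P(d)}
\qquad\Longrightarrow\qquad
c^{*} \;=\; \operatorname*{argmax}_{c}\; P(c \mid d)
      \;=\; \operatorname*{argmax}_{c}\; P(d \mid c)\, P(c)
```

Since P(d) is the same for every candidate class, it can be dropped from the argmax.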

Back to text classification


The “Naïve” part of Naïve Bayes
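The “naïve” part is the conditional-independence assumption: given the class, each word is treated as independent of the others. In standard notation, for a document with words w₁…wₙ:

```latex
P(d \mid c) \;=\; P(w_1, \ldots, w_n \mid c)
\;\approx\; \prod_{i=1}^{n} P(w_i \mid c)
```

This is clearly false of real language, but it reduces the model to per-word probabilities that are easy to estimate from counts.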

Return of smoothing…

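The smoothing being revisited is the usual add-one (Laplace) estimate, which keeps a single unseen word from zeroing out an entire class. In standard notation, with vocabulary V:

```latex
\hat{P}(w \mid c) \;=\;
\frac{\mathrm{count}(w, c) + 1}
     {\sum_{w' \in V} \mathrm{count}(w', c) \;+\; |V|}
```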

Exercise

        document                                  label
TRAIN   Apple poised to unveil iPad Mini          TECH
        Apple product leaks                       TECH
        Researchers test apple, cherry trees      SCIENCE
TEST    Dangerous apple, cherry pesticides?
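One way to work the exercise, as a sketch: Naïve Bayes with add-one smoothing over this tiny training set (the lowercasing and whitespace tokenization are assumptions, and unseen test words like “dangerous” get only the smoothing mass):

```python
from collections import Counter
import math

train = [
    ("apple poised to unveil ipad mini", "TECH"),
    ("apple product leaks", "TECH"),
    ("researchers test apple cherry trees", "SCIENCE"),
]
test_doc = "dangerous apple cherry pesticides"

priors = Counter(label for _, label in train)   # documents per class
words = {c: Counter() for c in priors}          # per-class word counts
for text, label in train:
    words[label].update(text.split())
vocab = {w for counts in words.values() for w in counts}

def log_posterior(doc, c):
    # log P(c) + sum_i log P(w_i | c), with add-one smoothing
    score = math.log(priors[c] / len(train))
    total = sum(words[c].values())
    for w in doc.split():
        score += math.log((words[c][w] + 1) / (total + len(vocab)))
    return score

prediction = max(priors, key=lambda c: log_posterior(test_doc, c))
print(prediction)
```

The prior favors TECH (2 of 3 documents), but “cherry” occurs only in the SCIENCE document, which is enough to tip the posterior.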

Benchmark dataset #1: 20 newsgroups
 18,000 documents from 20 distinct newsgroups
 A now mostly unused technology for sharing textual information, with hierarchical topical groups

comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x
rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey
sci.crypt, sci.electronics, sci.med, sci.space
misc.forsale
talk.politics.misc, talk.politics.guns, talk.politics.mideast, talk.religion.misc
alt.atheism, soc.religion.christian

Results:

Evaluation methods
 “macro”-averaging:
  Compute Precision and Recall for each category
  Take average of per-category precision and recall values

                        gold category
                   news  sports  arts  science  totals
predicted  news     15      7      0      1       23
category   sports    6     17      0      0       23
           arts      0      0      4      0        4
           science   1      0      0      7        8
totals              22     24      4      8       58


                        gold category
                   news  sports  arts  science   prec
predicted  news     15      7      0      1      0.65
category   sports    6     17      0      0      0.74
           arts      0      0      4      0      1.00
           science   1      0      0      7      0.88
recall             0.68   0.71   1.00   0.88

Evaluation methods
 What is the analogue of precision and recall for multiclass classification?
 We can still compute precision and recall as usual for each category
 Then add up these numbers to compute precision and recall
 This is called “micro-averaging”, and focuses on document-level accuracy

                         Gold standard
                         category    all other categories
Classifier  category
output      all other
            categories

                        gold category
                   news  sports  arts  science   prec
predicted  news     15      7      0      1      0.65
category   sports    6     17      0      0      0.74
           arts      0      0      4      0      1.00
           science   1      0      0      7      0.88
recall             0.68   0.71   1.00   0.88

macro-averaged precision ≈ 0.82; macro-averaged recall ≈ 0.82

news                Gold standard        sports              Gold standard
                    news   other                             sports  other
Classifier  news     15      8           Classifier  sports   17       6
output      other     7     28           output      other     7      28

science             Gold standard        arts                Gold standard
                    sci    other                             arts   other
Classifier  sci       7      1           Classifier  arts      4      0
output      other     1     49           output      other     0     54

total               Gold standard
                    correct  other
Classifier  correct    43      15
output      other      15

micro-averaged prec = 43/58 ≈ 0.74; micro-averaged recall = 43/58 ≈ 0.74
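The two averages can be computed directly from a confusion matrix; a sketch using the per-category counts from these slides (rows are predicted categories, columns are gold, in the order news, sports, arts, science):

```python
# Confusion matrix: rows = predicted, columns = gold.
conf = [
    [15,  7, 0, 1],   # predicted news
    [ 6, 17, 0, 0],   # predicted sports
    [ 0,  0, 4, 0],   # predicted arts
    [ 1,  0, 0, 7],   # predicted science
]
n = len(conf)
tp = [conf[i][i] for i in range(n)]            # diagonal = correct decisions
pred_totals = [sum(row) for row in conf]       # per-category predicted counts

# Macro: average the per-category precisions (every category weighs equally).
macro_prec = sum(t / p for t, p in zip(tp, pred_totals)) / n
# Micro: pool all decisions, then take a single precision (documents weigh equally).
micro_prec = sum(tp) / sum(pred_totals)

print(round(macro_prec, 2), round(micro_prec, 2))
```

Macro-averaging rewards the small, easy categories (arts is perfect), while micro-averaging is dominated by the big ones – hence 0.82 versus 0.74 on the same matrix.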

Feature selection

Sentiment Analysis

Sentiment Analysis
 Consider movie reviews:
  Given a review from a site like Rotten Tomatoes, try to detect whether the reviewer liked the movie
 Some observations:
  Humans can quickly and easily identify sentiment
  Often easier than topic classification
  Suspicion: certain words may be indicative of sentiment

Simple Experiment [Pang, Lee, Vaithyanathan, EMNLP 2002]
 Ask two grad students to come up with a list of words charged with sentiment
 Create a very simple, deterministic classifier based on this:
  Count the number of positive and negative hits
  Break ties to increase accuracy
 Compare to automatically extracted lists
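The decision rule amounts to a hit counter over two word lists; a minimal sketch, where the lexicons below are hypothetical stand-ins for the hand-built lists (not the ones the grad students actually produced):

```python
# Hypothetical sentiment lexicons -- the real lists were hand-built.
POSITIVE = {"dazzling", "brilliant", "moving", "gripping"}
NEGATIVE = {"bad", "dull", "boring", "tedious"}

def classify(review: str, tie_break: str = "positive") -> str:
    """Count positive and negative hits; break ties with a fixed default."""
    words = review.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos != neg:
        return "positive" if pos > neg else "negative"
    return tie_break

print(classify("a dull and tedious film"))
```

Because so many reviews contain no lexicon word at all, the choice of tie-break default materially changes accuracy – which is the weakness the next slides address.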

Toward more solid machine learning
 Prior decision rule was very heuristic
  Just count the number of charged words
  Ties are a significant issue
 What happens when we shift to something more complex?
  Naïve Bayes
  Maximum Entropy (aka logistic regression, aka log-linear models)
  Support Vector Machines

Experimental results

Baseline was 69% accuracy. Here we get just under 79% with all words, just using frequency. What happens when we use binary features instead?

Experimental results

Unigrams are pretty good – what happens when we add bigrams?

Experimental results

Why are just bigrams worse than unigrams and bigrams together?


Domain Adaptation

What are we learning?
 Primary features are unigrams.
 For a movie, “unpredictable” is a good thing – likely to be an interesting thriller.
 For a dishwasher, “unpredictable” is not so great.

Domain shift [Blitzer, Dredze, Pereira, ACL 2007]
 What happens when we move to another domain?
 Gather Amazon reviews from four domains:
  Books, DVDs, Electronics, Kitchen appliances
 Each review has:
  Rating (0–5 stars)
  Reviewer name and location
  Product name
  Review (title, date, and body)
 Ratings above 3 become positive, below 3 negative; the remainder are considered ambiguous and discarded
 1000 positive and 1000 negative reviews in each domain

Domain adaptation effects


Lessons learned
 Be careful with your classifier:
  Just because you get high accuracy on one test set doesn’t guarantee high accuracy on another test set
  Domain shift can cause a major accuracy hit
 What can we do about this?
  Supervised approaches – if we have a little training data in the NEW domain and a lot in the OLD domain, learn features from both (“Frustratingly Easy”, Daumé 2007)
  Unsupervised approaches (Structural Correspondence Learning)