SI485i : NLP Set 5 Using Naïve Bayes.


Motivation
We want to predict something, and we have some text related to that something.
- something = the target label Y
- text = the text features X
Given X, what is the most probable Y?

Motivation: Author Detection
X = "Alas the day! take heed of him; he stabbed me in mine own house, and that most beastly: in good faith, he cares not what mischief he does. If his weapon be out: he will foin like any devil; he will spare neither man, woman, nor child."
Y ∈ { Charles Dickens, William Shakespeare, Herman Melville, Jane Austen, Homer, Leo Tolstoy }
Y = ?

The Naïve Bayes Classifier
Recall Bayes' rule:
    P(Y | X) = P(X | Y) P(Y) / P(X)
which is short for:
    P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / P(X = x_j)
We can re-write this as:
    P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / Σ_k P(X = x_j | Y = y_k) P(Y = y_k)
(Remaining slides adapted from Tom Mitchell.)
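A quick worked instance of the re-written form, with made-up numbers: suppose Y is spam vs. not-spam with P(Y = spam) = 0.2, and X is the presence of the word "free" with P(X | Y = spam) = 0.5 and P(X | Y = not-spam) = 0.1. Then P(Y = spam | X) = (0.5)(0.2) / ((0.5)(0.2) + (0.1)(0.8)) = 0.10 / 0.18 ≈ 0.56.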

Deriving Naïve Bayes
Idea: use the training data to directly estimate P(X | Y) and P(Y).
We can then use these values to estimate P(Y_new | X_new) using Bayes' rule.
Recall that representing the full joint probability P(X_1, X_2, ..., X_n | Y) is not practical: even with binary attributes it requires on the order of 2^n parameters per class.

Deriving Naïve Bayes
However, if we make the assumption that the attributes are conditionally independent given Y, estimation is easy:
    P(X_1, ..., X_n | Y) = Π_i P(X_i | Y)
This assumption is often violated in practice, but more on that later…

Deriving Naïve Bayes
Let X = <X_1, ..., X_n> and let the label Y be discrete. Then we can estimate P(X_i | Y) and P(Y) directly from the training data by counting!

Sky    Temp  Humid   Wind    Water  Forecast  Play?
sunny  warm  normal  strong  warm   same      yes
sunny  warm  high    strong  warm   same      yes
rainy  cold  high    strong  warm   change    no
sunny  warm  high    strong  cool   change    yes

P(Sky = sunny | Play = yes) = ?
P(Humid = high | Play = yes) = ?
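Worked from the four rows above (a reconstruction of the classic EnjoySport training data): all three Play = yes rows have Sky = sunny, so P(Sky = sunny | Play = yes) = 3/3 = 1, and two of those three rows have Humid = high, so P(Humid = high | Play = yes) = 2/3.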

The Naïve Bayes Classifier
Now we have:
    P(Y = y_k | X_1, ..., X_n) = P(Y = y_k) Π_i P(X_i | Y = y_k) / Σ_j P(Y = y_j) Π_i P(X_i | Y = y_j)
To classify a new point X_new:
    Y_new ← argmax_{y_k} P(Y = y_k) Π_i P(X_i^new | Y = y_k)

The Naïve Bayes Algorithm
For each value y_k:
- Estimate P(Y = y_k) from the data.
- For each value x_ij of each attribute X_i, estimate P(X_i = x_ij | Y = y_k).
Classify a new point via:
    Y_new ← argmax_{y_k} P(Y = y_k) Π_i P(X_i^new | Y = y_k)
In practice the independence assumption often doesn't hold, but Naïve Bayes performs very well despite it. A minimal code sketch follows.
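Here is a minimal sketch of the algorithm in Python, assuming categorical string-valued attributes. The add-one (Laplace) smoothing and log-space scoring are additions not on the slides, there to keep unseen values from zeroing out the product and to avoid underflow.

import math
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (attributes, label) pairs, attributes a list of strings.
    Estimates P(Y) and P(X_i | Y) by counting."""
    label_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)   # (attr index, label) -> value counts
    values_seen = defaultdict(set)        # attr index -> observed values
    for attrs, label in examples:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return label_counts, value_counts, values_seen

def classify_nb(attrs, label_counts, value_counts, values_seen):
    """Return argmax_y of log P(Y=y) + sum_i log P(X_i = attrs[i] | Y=y),
    with add-one smoothing over each attribute's observed value set."""
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in label_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(attrs):
            v = len(values_seen[i])
            score += math.log((value_counts[(i, label)][value] + 1) / (n + v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Usage with the (reconstructed) EnjoySport rows above:
data = [(["sunny", "warm", "normal", "strong", "warm", "same"], "yes"),
        (["sunny", "warm", "high", "strong", "warm", "same"], "yes"),
        (["rainy", "cold", "high", "strong", "warm", "change"], "no"),
        (["sunny", "warm", "high", "strong", "cool", "change"], "yes")]
model = train_nb(data)
print(classify_nb(["sunny", "warm", "high", "strong", "cool", "change"], *model))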

An alternate view of NB as LMs
Each class gets its own language model, trained only on that class's text:
    Y1 = dickens: score with P(Y1) * P(X | Y1)
    Y2 = twain:   score with P(Y2) * P(X | Y2)
where P(X | Y1) = P_Y1(X) and P(X | Y2) = P_Y2(X).
With bigrams:
    P_Y1(X) = Π_i P_Y1(x_i | x_{i-1})
    P_Y2(X) = Π_i P_Y2(x_i | x_{i-1})
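A sketch of this view, assuming each class's bigram model uses add-one smoothing (a choice made here, not something the slides specify):

import math
from collections import Counter

def train_bigram_lm(tokens):
    """Bigram and context counts for one class's training text."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    return bigrams, contexts, set(tokens)

def log_prob(tokens, lm):
    """log P_Yk(X) = sum_i log P_Yk(x_i | x_{i-1}), add-one smoothed."""
    bigrams, contexts, vocab = lm
    v = len(vocab)
    return sum(math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
               for prev, cur in zip(tokens, tokens[1:]))

def classify(tokens, lms, priors):
    """argmax over classes of log P(Yk) + log P_Yk(X)."""
    return max(priors, key=lambda y: math.log(priors[y]) + log_prob(tokens, lms[y]))

Here lms would map labels like "dickens" and "twain" to models trained on each author's text, and priors would hold P(Y1) and P(Y2).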

Naïve Bayes Applications
Text classification:
- Which e-mails are spam?
- Which e-mails are meeting notices?
- Which author wrote a document?
- Which webpages are about current events?
- Which blog contains angry writing?
- What sentence in a document talks about company X?
- etc.

Text and Features
What is X_i? It could be unigrams, and hopefully bigrams too, but it can be anything that is computed from the text X. Yes, really anything: creativity and intuition about language are where the real gains come from in NLP.
Non-n-gram examples (sketched in code below):
- X_10 = "the number of sentences that begin with conjunctions"
- X_356 = "existence of a semi-colon in the paragraph"
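A minimal sketch of those two feature extractors, assuming naive punctuation-based sentence splitting and a small hand-picked conjunction list (both assumptions, not from the slides):

import re

CONJUNCTIONS = {"and", "but", "or", "nor", "for", "so", "yet"}  # assumed list

def x10_conjunction_starts(text):
    """X_10: number of sentences beginning with a conjunction."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(1 for s in sentences if s.split()[0].lower() in CONJUNCTIONS)

def x356_has_semicolon(paragraph):
    """X_356: 1 if the paragraph contains a semi-colon, else 0."""
    return int(";" in paragraph)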

Features
In machine learning, "features" are the attributes to which you assign weights (probabilities, in Naïve Bayes) that help in the final classification.
Up until now your features have been n-grams; now you want to consider other types of features as well. You count features just like n-grams: how many of each did you see?
- X = the set of features
- P(Y | X) = the probability of a label Y given the set of features
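As a closing sketch of that framing, assuming the per-feature log probabilities have already been estimated (the names here are hypothetical, not from the slides):

import math

def nb_score(label, feature_counts, log_prior, log_likelihood, unseen=-20.0):
    """log P(Y = label) + sum over features of count * log P(feature | label).
    feature_counts: dict feature -> count; unseen is a floor log probability
    for features with no estimated entry."""
    return log_prior[label] + sum(
        count * log_likelihood.get((feature, label), unseen)
        for feature, count in feature_counts.items())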