MLE’s, Bayesian Classifiers and Naïve Bayes
Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University
January 30, 2008

Required reading: Mitchell draft chapter, sections 1 and 2 (available on the class website).

Naïve Bayes in a Nutshell

Bayes rule:
$$P(Y=y_k \mid X_1,\dots,X_n) = \frac{P(Y=y_k)\,P(X_1,\dots,X_n \mid Y=y_k)}{\sum_j P(Y=y_j)\,P(X_1,\dots,X_n \mid Y=y_j)}$$

Assuming conditional independence among the $X_i$'s:
$$P(Y=y_k \mid X_1,\dots,X_n) \propto P(Y=y_k)\,\prod_i P(X_i \mid Y=y_k)$$

So, the classification rule for $X^{new} = \langle X_1,\dots,X_n\rangle$ is:
$$Y^{new} \leftarrow \arg\max_{y_k}\; P(Y=y_k)\,\prod_i P(X_i^{new} \mid Y=y_k)$$
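A minimal Python sketch of this decision rule, assuming the priors $P(Y=y_k)$ and conditionals $P(X_i \mid Y=y_k)$ have already been estimated; the function name and data layout are illustrative assumptions, not from the lecture:

```python
import math

def nb_classify(x_new, priors, conditionals):
    """Return the y_k maximizing P(Y=y_k) * prod_i P(X_i = x_i | Y = y_k).

    priors: dict mapping y_k -> estimated P(Y = y_k)
    conditionals: dict mapping (i, x_i, y_k) -> estimated P(X_i = x_i | Y = y_k)
    Sums log-probabilities instead of multiplying, so products of many
    small probabilities do not underflow.
    """
    best_y, best_score = None, -math.inf
    for y_k, prior in priors.items():
        score = math.log(prior)
        for i, x_i in enumerate(x_new):
            score += math.log(conditionals[(i, x_i, y_k)])
        if score > best_score:
            best_y, best_score = y_k, score
    return best_y
```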

Naïve Bayes Algorithm – discrete $X_i$

Train_Naïve_Bayes(examples):
  for each* value $y_k$: estimate $\pi_k \equiv P(Y = y_k)$
  for each* value $x_{ij}$ of each attribute $X_i$: estimate $\theta_{ijk} \equiv P(X_i = x_{ij} \mid Y = y_k)$

Classify($X^{new}$):
$$Y^{new} \leftarrow \arg\max_{y_k}\; \pi_k \prod_i P(X_i^{new} \mid Y = y_k)$$

* probabilities must sum to 1, so we need estimate only $n-1$ of every $n$ such parameters...
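A sketch of the training step, counting as in the MLE formulas on the next slide; the input format (a list of (x, y) pairs with x a tuple of discrete values) is an assumption for illustration:

```python
from collections import Counter, defaultdict

def nb_train(examples):
    """Return MLE estimates of P(Y=y_k) and P(X_i=x_ij | Y=y_k),
    keyed to match nb_classify above."""
    class_counts = Counter(y for _, y in examples)   # #D{Y = y_k}
    feature_counts = defaultdict(Counter)            # (i, y_k) -> counts of each x_ij
    for x, y in examples:
        for i, x_i in enumerate(x):
            feature_counts[(i, y)][x_i] += 1         # #D{X_i = x_ij and Y = y_k}
    n = len(examples)
    priors = {y: c / n for y, c in class_counts.items()}
    conditionals = {(i, x_i, y): c / class_counts[y]
                    for (i, y), counter in feature_counts.items()
                    for x_i, c in counter.items()}
    return priors, conditionals
```

Note that with pure MLE counting, any (attribute value, class) pair never seen in training is simply absent, i.e., its estimated probability is zero; that is Subtlety #1 below.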

Estimating Parameters: $Y$, $X_i$ discrete-valued

Maximum likelihood estimates (MLE's):
$$\hat\pi_k = \hat P(Y=y_k) = \frac{\#D\{Y=y_k\}}{|D|}$$
$$\hat\theta_{ijk} = \hat P(X_i = x_{ij} \mid Y=y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y=y_k\}}{\#D\{Y=y_k\}}$$
where $\#D\{c\}$ is the number of items in dataset $D$ for which condition $c$ holds (e.g., $\#D\{Y=y_k\}$ is the number of items for which $Y=y_k$).
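A quick worked instance of these formulas (the numbers are illustrative, not from the lecture): if 3 of $|D| = 10$ training examples have $Y=1$, then $\hat P(Y=1) = 3/10$; if 2 of those 3 also have $X_1 = 1$, then $\hat P(X_1 = 1 \mid Y = 1) = 2/3$.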

Example: Live in Sq Hill? $P(S \mid G, D, M)$
– $S = 1$ iff live in Squirrel Hill
– $G = 1$ iff shop at Giant Eagle
– $D = 1$ iff drive to CMU
– $M = 1$ iff Dave Matthews fan
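A worked parameter count this example makes concrete (the counting is standard, though the transcript does not show it): modeling $P(S \mid G, D, M)$ directly requires one parameter per setting of $\langle G, D, M\rangle$, i.e., $2^3 = 8$ parameters, and in general $2^n$ for $n$ binary attributes. Naïve Bayes instead needs $\hat P(S=1)$ plus $\hat P(G=1 \mid S=s)$, $\hat P(D=1 \mid S=s)$, and $\hat P(M=1 \mid S=s)$ for $s \in \{0,1\}$: $1 + 3 \cdot 2 = 7$ parameters here, and only $1 + 2n$ in general, linear rather than exponential in $n$.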

Naïve Bayes: Subtlety #1

If we are unlucky, our MLE estimate for some $P(X_i \mid Y)$ may be zero (e.g., $X_{373}$ = Birthday_Is_January30).
– Why worry about just one parameter out of many? (Because a single zero factor drives the entire product $\prod_i P(X_i \mid Y=y_k)$ to zero, no matter what the other features say.)
– What can be done to avoid this?

Estimating Parameters: $Y$, $X_i$ discrete-valued

Maximum likelihood estimates:
$$\hat\theta_{ijk} = \hat P(X_i = x_{ij} \mid Y=y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y=y_k\}}{\#D\{Y=y_k\}}$$

MAP estimates (Dirichlet priors):
$$\hat\theta_{ijk} = \hat P(X_i = x_{ij} \mid Y=y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y=y_k\} + l}{\#D\{Y=y_k\} + lJ}$$
where $J$ is the number of values $X_i$ can take on.

Only difference: "imaginary" examples; the prior behaves as if we had observed $l$ extra examples of each value $x_{ij}$ before seeing the data.
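A one-line sketch of the MAP estimate, assuming a symmetric Dirichlet prior that adds the same $l$ imaginary examples to every value (the function name is illustrative):

```python
def map_estimate(count_xi_and_y, count_y, num_values_j, l=1):
    """MAP estimate of P(X_i = x_ij | Y = y_k):
    (#D{X_i = x_ij and Y = y_k} + l) / (#D{Y = y_k} + l * J)."""
    return (count_xi_and_y + l) / (count_y + l * num_values_j)

# A binary attribute value never observed among 25 Y=y_k examples gets
# probability 1/27 rather than the MLE's 0:
print(map_estimate(0, 25, 2))  # 0.037...
```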

Naïve Bayes: Subtlety #2

Often the $X_i$ are not really conditionally independent.
We use Naïve Bayes in many cases anyway, and it often works pretty well:
– often it gives the right classification, even when not the right probability (see [Domingos & Pazzani, 1996])
What is the effect on the estimated $P(Y \mid X)$?
– Special case: what if we add two copies of a feature, i.e., $X_k = X_i$? (see the worked case below)
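A sketch of that special case: if $X_k$ is an exact copy of $X_i$, Naïve Bayes counts the same evidence twice,
$$P(Y=y \mid X) \;\propto\; P(Y=y)\, P(X_i \mid Y=y)^2 \prod_{j \neq i,\,k} P(X_j \mid Y=y),$$
so the estimated posteriors are pushed toward 0 or 1 (overconfident), even though the argmax, and hence the classification itself, may well be unchanged.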

Learning to classify text documents
– Classify which emails are spam
– Classify which emails are meeting invites
– Classify which web pages are student home pages
How shall we represent text documents for Naïve Bayes?

Baseline: Bag of Words Approach

(word counts for an example document)
aardvark   0
about      2
all        2
Africa     1
apple      0
anxious    0
...
gas        1
...
oil        1
...
Zaire      0
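A minimal sketch of this representation, assuming a fixed vocabulary and a simple lowercase, letters-only tokenizer (both illustrative choices):

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map a document to its vector of per-word counts over `vocabulary`."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return [counts[w] for w in vocabulary]

vocab = ["aardvark", "about", "all", "africa", "gas", "oil", "zaire"]
print(bag_of_words("All about Africa: all the oil and gas", vocab))
# -> [0, 1, 2, 1, 1, 1, 0]
```

Each position of the resulting vector can then serve as a discrete $X_i$ for Naïve Bayes.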

For code and data, see [link]; click on "Software and Data".

What if we have continuous $X_i$? E.g., image classification: $X_i$ is the $i$th pixel.

What if we have continuous $X_i$? E.g., image classification: $X_i$ is the $i$th pixel.

Gaussian Naïve Bayes (GNB): assume
$$P(X_i = x \mid Y = y_k) = \frac{1}{\sigma_{ik}\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu_{ik})^2}{2\sigma_{ik}^2}\right)$$

Sometimes we additionally assume the variance is
– independent of $Y$ (i.e., $\sigma_i$),
– or independent of $X_i$ (i.e., $\sigma_k$),
– or both (i.e., $\sigma$).

Gaussian Naïve Bayes Algorithm – continuous $X_i$ (but still discrete $Y$)

Train_Naïve_Bayes(examples):
  for each value $y_k$: estimate* $\pi_k \equiv P(Y = y_k)$
  for each attribute $X_i$: estimate the class-conditional mean $\mu_{ik}$ and variance $\sigma_{ik}^2$

Classify($X^{new}$):
$$Y^{new} \leftarrow \arg\max_{y_k}\; \pi_k \prod_i P(X_i^{new} \mid Y = y_k)$$

* probabilities must sum to 1, so we need estimate only $n-1$ of every $n$ such parameters...

Estimating Parameters: $Y$ discrete, $X_i$ continuous

Maximum likelihood estimates, where superscript $j$ indexes the $j$th training example, $i$ the $i$th feature, $k$ the $k$th class, and $\delta(z) = 1$ if $z$ is true, else 0:
$$\hat\mu_{ik} = \frac{\sum_j X_i^j\,\delta(Y^j = y_k)}{\sum_j \delta(Y^j = y_k)}$$
$$\hat\sigma_{ik}^2 = \frac{\sum_j \left(X_i^j - \hat\mu_{ik}\right)^2 \delta(Y^j = y_k)}{\sum_j \delta(Y^j = y_k)}$$
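A numpy sketch of GNB training and classification using exactly these MLE formulas; the array layout and function names are assumptions for illustration:

```python
import numpy as np

def gnb_train(X, y):
    """X: (n_examples, n_features) array; y: (n_examples,) label array.
    Returns classes, priors pi_k, per-class means mu_ik, and variances sigma2_ik."""
    classes = np.unique(y)
    priors = {k: np.mean(y == k) for k in classes}
    means = {k: X[y == k].mean(axis=0) for k in classes}        # mu_ik
    variances = {k: X[y == k].var(axis=0) for k in classes}     # sigma2_ik (MLE, ddof=0)
    return classes, priors, means, variances

def gnb_classify(x, classes, priors, means, variances):
    """Pick the class maximizing log pi_k + sum_i log N(x_i; mu_ik, sigma2_ik)."""
    def log_posterior(k):
        log_likelihood = -0.5 * np.sum(
            np.log(2 * np.pi * variances[k])
            + (x - means[k]) ** 2 / variances[k])
        return np.log(priors[k]) + log_likelihood
    return max(classes, key=log_posterior)
```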

GNB Example: Classify a person's cognitive activity, based on brain image
– are they reading a sentence or viewing a picture?
– reading the word "Hammer" or "Apartment"?
– viewing a vertical or horizontal line?
– answering the question, or getting confused?

[Figure: example stimuli (e.g., "ant") presented over time] Stimuli for our study: 60 distinct exemplars, presented 6 times each.

[Figure] fMRI voxel means for "bottle": the per-voxel means defining $P(X_i \mid Y = \text{"bottle"})$, shown next to the mean fMRI activation over all stimuli, and "bottle" minus the mean activation (color scale: high / average / below average).

Scaling up: 60 exemplars

Categories            Exemplars
BODY PARTS            leg, arm, eye, foot, hand
FURNITURE             chair, table, bed, desk, dresser
VEHICLES              car, airplane, train, truck, bicycle
ANIMALS               horse, dog, bear, cow, cat
KITCHEN UTENSILS      glass, knife, bottle, cup, spoon
TOOLS                 chisel, hammer, screwdriver, pliers, saw
BUILDINGS             apartment, barn, house, church, igloo
PART OF A BUILDING    window, door, chimney, closet, arch
CLOTHING              coat, dress, shirt, skirt, pants
INSECTS               fly, ant, bee, butterfly, beetle
VEGETABLES            lettuce, tomato, carrot, corn, celery
MAN MADE OBJECTS      refrigerator, key, telephone, watch, bell

[Figure] Rank accuracy when distinguishing among the 60 words.

Where in the brain is activity that distinguishes tools vs. buildings? [Figure] Accuracy of a radius-one "searchlight" classifier centered at each voxel.

Voxel clusters: searchlights. [Figure] Accuracies of cubical 27-voxel classifiers centered at each significant voxel.

What you should know:
– Training and using classifiers based on Bayes rule
– Conditional independence
  – what it is
  – why it's important
– Naïve Bayes
  – what it is
  – why we use it so much
  – training using MLE, MAP estimates
  – discrete variables (Bernoulli) and continuous (Gaussian)

Questions:
– Can you use Naïve Bayes for a combination of discrete and real-valued $X_i$?
– How can we easily model just 2 of $n$ attributes as dependent?
– What does the decision surface of a Naïve Bayes classifier look like?

What is the form of the decision surface for a Naïve Bayes classifier?
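A sketch of the answer for one case from the required reading, Gaussian Naïve Bayes with binary $Y$ and variances independent of $Y$ ($\sigma_i$ rather than $\sigma_{ik}$): the log odds works out to
$$\ln\frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \ln\frac{P(Y=1)}{P(Y=0)} + \sum_i \left(\frac{\mu_{i1}-\mu_{i0}}{\sigma_i^2}\,X_i + \frac{\mu_{i0}^2-\mu_{i1}^2}{2\sigma_i^2}\right),$$
which is linear in $X$. The decision surface (where the log odds equals zero) is therefore a hyperplane, the same functional form as logistic regression.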