
1 MLE’s, Bayesian Classifiers and Naïve Bayes
Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University
January 30, 2008
Required reading: Mitchell draft chapter, sections 1 and 2 (available on class website)

2 Naïve Bayes in a Nutshell
Bayes rule:

    P(Y = y_k | X_1 \ldots X_n) = \frac{P(Y = y_k) \, P(X_1 \ldots X_n | Y = y_k)}{\sum_j P(Y = y_j) \, P(X_1 \ldots X_n | Y = y_j)}

Assuming conditional independence among the X_i's:

    P(Y = y_k | X_1 \ldots X_n) = \frac{P(Y = y_k) \prod_i P(X_i | Y = y_k)}{\sum_j P(Y = y_j) \prod_i P(X_i | Y = y_j)}

So the classification rule for X^{new} = \langle X_1, \ldots, X_n \rangle is:

    Y^{new} \leftarrow \arg\max_{y_k} P(Y = y_k) \prod_i P(X_i^{new} | Y = y_k)

3 Naïve Bayes Algorithm – discrete X_i

Train_Naive_Bayes(examples):
    for each* value y_k, estimate \pi_k \equiv P(Y = y_k)
    for each* value x_{ij} of each attribute X_i, estimate \theta_{ijk} \equiv P(X_i = x_{ij} | Y = y_k)

Classify(X^{new}):
    Y^{new} \leftarrow \arg\max_{y_k} \pi_k \prod_i P(X_i^{new} | Y = y_k)

* probabilities must sum to 1, so for each such distribution we need to estimate only n-1 of its n parameters...
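To make the procedure concrete, here is a minimal Python sketch of the discrete-attribute algorithm above, using simple count-based (MLE) estimates. The function and variable names are illustrative, not from the lecture.

    from collections import Counter, defaultdict

    def train_naive_bayes(examples):
        """examples: list of (x, y) pairs, where x is a tuple of discrete attribute values.
        Returns MLE estimates of the class priors and per-attribute conditionals."""
        class_counts = Counter(y for _, y in examples)
        # cond_counts[(i, x_ij, y_k)] = number of examples with X_i = x_ij and Y = y_k
        cond_counts = defaultdict(int)
        for x, y in examples:
            for i, x_ij in enumerate(x):
                cond_counts[(i, x_ij, y)] += 1
        n = len(examples)
        priors = {y: c / n for y, c in class_counts.items()}
        conditionals = {key: c / class_counts[key[2]] for key, c in cond_counts.items()}
        return priors, conditionals

    def classify(priors, conditionals, x_new):
        """Return argmax_k P(Y=y_k) * prod_i P(X_i^new | Y=y_k)."""
        best_y, best_score = None, -1.0
        for y, prior in priors.items():
            score = prior
            for i, x_ij in enumerate(x_new):
                score *= conditionals.get((i, x_ij, y), 0.0)  # unseen value -> MLE of 0
            if score > best_score:
                best_y, best_score = y, score
        return best_y

For instance, in the Squirrel Hill example on the next slides, each x would be the (G, D, M) triple and y the value of S.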

4 Estimating Parameters: Y, X_i discrete-valued

Maximum likelihood estimates (MLE's):

    \hat{\pi}_k = \hat{P}(Y = y_k) = \frac{\#D\{Y = y_k\}}{|D|}

    \hat{\theta}_{ijk} = \hat{P}(X_i = x_{ij} | Y = y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}

where \#D\{\cdot\} denotes the number of items in training set D for which the condition holds (e.g., \#D\{Y = y_k\} is the number of items with Y = y_k).

5 Example: Live in Sq Hill? P(S|G,D,M)
– S=1 iff live in Squirrel Hill
– G=1 iff shop at Giant Eagle
– D=1 iff drive to CMU
– M=1 iff Dave Matthews fan

6 Example: Live in Sq Hill? P(S|G,D,M)
– S=1 iff live in Squirrel Hill
– G=1 iff shop at Giant Eagle
– D=1 iff drive to CMU
– M=1 iff Dave Matthews fan

7 Naïve Bayes: Subtlety #1

If unlucky, our MLE estimate for P(X_i | Y) may be zero. (e.g., X_{373} = Birthday_Is_January30)

Why worry about just one parameter out of many? Because the classification rule multiplies all the P(X_i | Y = y_k) estimates together, a single zero factor drives the whole product to zero, vetoing class y_k no matter how strongly the other attributes support it.

What can be done to avoid this?

8 Estimating Parameters: Y, X_i discrete-valued

Maximum likelihood estimates:

    \hat{\pi}_k = \frac{\#D\{Y = y_k\}}{|D|}        \hat{\theta}_{ijk} = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}

MAP estimates (Dirichlet priors), obtained by adding l "imaginary" examples of each value:

    \hat{\pi}_k = \frac{\#D\{Y = y_k\} + l}{|D| + lK}        \hat{\theta}_{ijk} = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\} + l}{\#D\{Y = y_k\} + lJ}

where K is the number of classes and J the number of values X_i can take. Only difference: the "imaginary" examples.
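A minimal Python sketch of the smoothed (MAP) estimates, reusing the counting scheme from the earlier sketch. Adding l imaginary examples per value matches Laplace smoothing when l = 1; the function name and argument layout are illustrative assumptions.

    from collections import Counter, defaultdict

    def map_estimates(examples, attr_values, class_values, l=1.0):
        """MAP (Dirichlet-prior) estimates: add l 'imaginary' examples of each value.
        attr_values[i] = set of possible values of attribute X_i."""
        class_counts = Counter(y for _, y in examples)
        cond_counts = defaultdict(int)
        for x, y in examples:
            for i, x_ij in enumerate(x):
                cond_counts[(i, x_ij, y)] += 1
        n = len(examples)
        K = len(class_values)
        priors = {y: (class_counts[y] + l) / (n + l * K) for y in class_values}
        conditionals = {}
        for y in class_values:
            for i, values in enumerate(attr_values):
                J = len(values)  # number of values X_i can take
                for v in values:
                    conditionals[(i, v, y)] = (cond_counts[(i, v, y)] + l) / (class_counts[y] + l * J)
        return priors, conditionals

Note that every (value, class) pair now gets a nonzero estimate, which removes the zero-probability veto from the previous slide.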

9 Naïve Bayes: Subtlety #2

Often the X_i are not really conditionally independent. We use Naïve Bayes in many cases anyway, and it often works pretty well
– often giving the right classification, even when not the right probability (see [Domingos & Pazzani, 1996])
What is the effect on the estimated P(Y|X)?
– Special case: what if we add two copies of a feature, X_i = X_k? (see the numeric sketch below)
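A small numeric sketch of that special case: duplicating a feature squares its likelihood factor, pushing the estimated posterior toward 0 or 1 even though no new evidence was added. The numbers below are made up for illustration.

    # Binary class, one binary feature observed as X=1.
    p_y1, p_y0 = 0.5, 0.5            # priors
    p_x1_y1, p_x1_y0 = 0.8, 0.4      # P(X=1 | Y=1), P(X=1 | Y=0)

    def posterior(copies):
        """Naive Bayes posterior P(Y=1 | X=1) when the same feature is counted `copies` times."""
        s1 = p_y1 * p_x1_y1 ** copies
        s0 = p_y0 * p_x1_y0 ** copies
        return s1 / (s1 + s0)

    print(posterior(1))  # 0.667  (correct posterior)
    print(posterior(2))  # 0.800  (overconfident: the duplicate copy squared the factor)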

10 Learning to classify text documents
– Classify which emails are spam
– Classify which emails are meeting invites
– Classify which web pages are student home pages
How shall we represent text documents for Naïve Bayes?

11

12

13 Baseline: Bag of Words Approach

    aardvark  0
    about     2
    all       2
    Africa    1
    apple     0
    anxious   0
    ...
    gas       1
    ...
    oil       1
    ...
    Zaire     0
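The bag-of-words representation maps a document to its vector of word counts, as in the table above. Here is a minimal Python sketch; the tokenizer is deliberately naive and the vocabulary handling is an assumption, not something specified in the lecture.

    import re
    from collections import Counter

    def bag_of_words(document, vocabulary=None):
        """Map a document to its word-count vector (a Counter).
        If a vocabulary is given, restrict counts to those words."""
        tokens = re.findall(r"[a-z']+", document.lower())
        counts = Counter(tokens)
        if vocabulary is not None:
            counts = Counter({w: counts[w] for w in vocabulary})
        return counts

    doc = "Oil and gas prices rose. About half of all exports from Zaire are about minerals."
    print(bag_of_words(doc)["about"])  # 2

Each vocabulary word (or word count) then becomes an attribute X_i for Naïve Bayes.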

14

15 For code and data, see www.cs.cmu.edu/~tom/mlbook.html and click on “Software and Data”.

16

17

18 What if we have continuous X_i? E.g., image classification: X_i is the i-th pixel.

19 What if we have continuous X_i? E.g., image classification: X_i is the i-th pixel.

Gaussian Naïve Bayes (GNB): assume

    P(X_i = x | Y = y_k) = \frac{1}{\sigma_{ik}\sqrt{2\pi}} \exp\left(-\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2}\right)

Sometimes we additionally assume the variance is independent of Y (i.e., \sigma_i), or independent of X_i (i.e., \sigma_k), or both (i.e., \sigma).

20 Gaussian Naïve Bayes Algorithm – continuous X_i (but still discrete Y)

Train_Naive_Bayes(examples):
    for each value y_k, estimate* \pi_k \equiv P(Y = y_k)
    for each attribute X_i, estimate the class-conditional mean \mu_{ik} and variance \sigma_{ik}^2

Classify(X^{new}):
    Y^{new} \leftarrow \arg\max_{y_k} \pi_k \prod_i N(X_i^{new}; \mu_{ik}, \sigma_{ik}^2)

* probabilities must sum to 1, so we need to estimate only n-1 parameters...

21 Estimating Parameters: Y discrete, X_i continuous

Maximum likelihood estimates (jth training example; \delta(z) = 1 if z is true, else 0):

    \hat{\mu}_{ik} = \frac{\sum_j X_i^j \, \delta(Y^j = y_k)}{\sum_j \delta(Y^j = y_k)}

    \hat{\sigma}_{ik}^2 = \frac{\sum_j (X_i^j - \hat{\mu}_{ik})^2 \, \delta(Y^j = y_k)}{\sum_j \delta(Y^j = y_k)}

(i indexes the feature, k the class.)
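A minimal Python/NumPy sketch of GNB training and classification under these MLE formulas; the function names and array layout are illustrative assumptions, not from the lecture.

    import numpy as np

    def train_gnb(X, y):
        """X: (m, n) array of continuous features; y: (m,) array of discrete labels.
        Returns the class labels, priors, per-class means (K, n), and variances (K, n)."""
        classes = np.unique(y)
        priors = np.array([np.mean(y == c) for c in classes])
        means = np.array([X[y == c].mean(axis=0) for c in classes])
        variances = np.array([X[y == c].var(axis=0) for c in classes])  # MLE: divides by N_k
        return classes, priors, means, variances

    def classify_gnb(x_new, classes, priors, means, variances):
        """argmax_k  log pi_k + sum_i log N(x_i; mu_ik, sigma_ik^2), using logs for stability."""
        log_lik = -0.5 * (np.log(2 * np.pi * variances)
                          + (x_new - means) ** 2 / variances).sum(axis=1)
        return classes[np.argmax(np.log(priors) + log_lik)]

Working in log space avoids underflow when the product runs over many features (e.g., thousands of voxels in the fMRI example that follows).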

22 GNB Example: Classify a person’s cognitive activity, based on brain image
– are they reading a sentence or viewing a picture?
– reading the word “Hammer” or “Apartment”?
– viewing a vertical or horizontal line?
– answering the question, or getting confused?

23 Stimuli for our study: 60 distinct exemplars, presented 6 times each. [Figure: example stimuli (e.g., “ant”) presented over time.]

24 fMRI voxel means for “bottle”: the per-voxel means defining P(X_i | Y = “bottle”). [Figure: mean fMRI activation over all stimuli; activation for “bottle” minus the mean activation; scale from below-average to high fMRI activation.]

25 Scaling up: 60 exemplars

Categories and exemplars:
    BODY PARTS: leg, arm, eye, foot, hand
    FURNITURE: chair, table, bed, desk, dresser
    VEHICLES: car, airplane, train, truck, bicycle
    ANIMALS: horse, dog, bear, cow, cat
    KITCHEN UTENSILS: glass, knife, bottle, cup, spoon
    TOOLS: chisel, hammer, screwdriver, pliers, saw
    BUILDINGS: apartment, barn, house, church, igloo
    PART OF A BUILDING: window, door, chimney, closet, arch
    CLOTHING: coat, dress, shirt, skirt, pants
    INSECTS: fly, ant, bee, butterfly, beetle
    VEGETABLES: lettuce, tomato, carrot, corn, celery
    MAN MADE OBJECTS: refrigerator, key, telephone, watch, bell

26 Rank accuracy: distinguishing among 60 words.
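The slide does not define rank accuracy; a common definition, and as an assumption the one sketched here, scores each test item by the normalized rank of the correct label among all candidates:

    def rank_accuracy(posteriors, true_label):
        """Rank accuracy (assumed definition: normalized rank of the true label).
        posteriors: dict mapping label -> P(Y=label | X). Returns 1.0 when the true
        label ranks first and 0.0 when it ranks last among the N >= 2 candidates."""
        ranked = sorted(posteriors, key=posteriors.get, reverse=True)
        r = ranked.index(true_label)          # 0-based rank of the correct label
        return 1.0 - r / (len(ranked) - 1)

This rewards the classifier for ranking the correct word near the top even when it is not the single best guess, which is a gentler metric than 1-of-60 accuracy.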

27 Where in the brain is activity that distinguishes tools vs. buildings? [Figure: accuracy of a radius-one “searchlight” classifier centered at each voxel.]

28 Voxel clusters (“searchlights”). [Figure: accuracies of cubical 27-voxel classifiers centered at each statistically significant voxel, in the range 0.7–0.8.]

29 What you should know:
– Training and using classifiers based on Bayes rule
– Conditional independence
    – What it is
    – Why it’s important
– Naïve Bayes
    – What it is
    – Why we use it so much
    – Training using MLE, MAP estimates
    – Discrete variables (Bernoulli) and continuous (Gaussian)

30 Questions:
– Can you use Naïve Bayes for a combination of discrete and real-valued X_i?
– How can we easily model just 2 of n attributes as dependent?
– What does the decision surface of a Naïve Bayes classifier look like?

31 What is the form of the decision surface for a Naïve Bayes classifier?
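The slide poses this as a question; as a hedged sketch of the standard argument (not given in the lecture) for the two-class case, the Naïve Bayes log-odds decompose into a sum of per-attribute terms:

    \log \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \log \frac{P(Y=1)}{P(Y=0)} + \sum_i \log \frac{P(X_i \mid Y=1)}{P(X_i \mid Y=0)}

For binary X_i \in \{0, 1\} each term is linear in X_i, giving w_0 + \sum_i w_i X_i with

    w_i = \log \frac{P(X_i=1 \mid Y=1) \, P(X_i=0 \mid Y=0)}{P(X_i=1 \mid Y=0) \, P(X_i=0 \mid Y=1)}

so the decision surface \{x : w_0 + w^\top x = 0\} is a hyperplane; the same holds for Gaussian Naïve Bayes when the variances are independent of the class.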

