Presentation transcript: "Learning Gaussian Bayes Classifiers" (Andrew W. Moore, Carnegie Mellon University, Sep 10th, 2001)

Slide 1: Learning Gaussian Bayes Classifiers
Andrew W. Moore, Associate Professor, School of Computer Science, Carnegie Mellon University
www.cs.cmu.edu/~awm, awm@cs.cmu.edu, 412-268-7599
Sep 10th, 2001. Copyright © 2001, Andrew W. Moore.
Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials. Comments and corrections gratefully received.

Slide 2: Maximum Likelihood Learning of Gaussians for Classification
- Why we should care
- 3 seconds to teach you a new learning algorithm
- What if there are 10,000 dimensions?
- What if there are categorical inputs?
- Examples "out the wazoo"

Slide 3: Why we should care
- One of the original "Data Mining" algorithms
- Very simple and effective
- Demonstrates the usefulness of our earlier groundwork

Slide 4: Where we were at the end of the MLE lecture…

                                   Categorical inputs only   Mixed Real/Cat okay   Real-valued inputs only
  Classifier (predict category)    Joint BC, Naïve BC        Dec Tree              -
  Density Estimator (probability)  Joint DE, Naïve DE        -                     Gauss DE
  Regressor (predict real no.)     -                         -                     -

Slide 5: This lecture… (the same table, with Gauss BC added)

                                   Categorical inputs only   Mixed Real/Cat okay   Real-valued inputs only
  Classifier (predict category)    Joint BC, Naïve BC        Dec Tree              Gauss BC
  Density Estimator (probability)  Joint DE, Naïve DE        -                     Gauss DE
  Regressor (predict real no.)     -                         -                     -

Slide 6: Road Map: Probability → PDFs → Gaussians → MLE of Gaussians → MLE Density Estimation → Bayes Classifiers → Decision Trees

Slide 7: Road Map (this lecture adds Gaussian Bayes Classifiers): Probability → PDFs → Gaussians → MLE of Gaussians → MLE Density Estimation → Bayes Classifiers → Gaussian Bayes Classifiers → Decision Trees

Slide 8: Gaussian Bayes Classifier Assumption
The i'th record in the database is created using the following algorithm:
1. Generate the output (the "class") by drawing y_i ~ Multinomial(p_1, p_2, …, p_Ny).
2. Generate the inputs from a Gaussian PDF that depends on the value of y_i: x_i ~ N(μ_yi, Σ_yi).
Test your understanding: given N_y classes and m input attributes, how many distinct scalar parameters need to be estimated?
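A worked answer, as a hedged check (my own count, not shown on the slide): the priors contribute N_y − 1 free parameters (they sum to one), and each class needs an m-vector mean plus a symmetric m × m covariance:

\[
(N_y - 1) \;+\; N_y\left(m + \frac{m(m+1)}{2}\right)
\]

So the count is quadratic in m, a fact the lecture returns to on Slides 31-38.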

Slide 9: MLE Gaussian Bayes Classifier
(Same generative assumption as Slide 8.) Let DB_i = the subset of database DB in which the output class is y = i. The MLE class prior is
p_i^MLE = |DB_i| / |DB|

Slides 10–11: MLE Gaussian Bayes Classifier (continued)
(μ_i^MLE, Σ_i^MLE) = the MLE Gaussian fitted to DB_i
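Putting Slides 9-11 together, here is a minimal sketch of the fitting step (my own illustration in Python with NumPy; names like fit_gaussian_bayes are not from the slides):

```python
import numpy as np

def fit_gaussian_bayes(X, y):
    """X: (n, m) real-valued inputs; y: (n,) integer class labels."""
    params, n = {}, len(y)
    for c in np.unique(y):
        Xc = X[y == c]                       # DB_c: records whose class is c
        params[c] = {
            "prior": len(Xc) / n,            # p_c^MLE = |DB_c| / |DB|
            "mean": Xc.mean(axis=0),         # mu_c^MLE
            # MLE covariance divides by |DB_c| (bias=True), not |DB_c| - 1
            "cov": np.cov(Xc, rowvar=False, bias=True),
        }
    return params
```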

Slide 12: Gaussian Bayes Classification

Slide 13: Gaussian Bayes Classification. How do we deal with that?
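The equations on Slides 12-13 were images and did not survive transcription. The rule being built is standard Bayes classification: predict the class i maximizing p_i · N(x; μ_i, Σ_i); "that" on Slide 13 is the denominator P(x), which is the same for every class and can therefore be ignored. A minimal prediction sketch under that reading (SciPy's multivariate_normal is my choice, not the slides'; log densities avoid underflow):

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict(params, x):
    """Return the class c maximizing log p_c + log N(x; mu_c, Sigma_c)."""
    scores = {
        c: np.log(p["prior"]) + multivariate_normal.logpdf(x, p["mean"], p["cov"])
        for c, p in params.items()
    }
    return max(scores, key=scores.get)
```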

Slide 14: Here is a dataset: 48,000 records, 16 attributes [Kohavi 1995].

Slides 15–16: Predicting wealth from age

Slide 17: Wealth from hours worked

Slide 18: Wealth from years of education

Slides 19–20: age, hours → wealth

Slides 21–22: age, hours → wealth
Having 2 inputs instead of one helps in two ways:
1. Combining evidence from two 1-d Gaussians
2. Off-diagonal covariance distinguishes class "shape"

Slides 23–24: age, edunum → wealth

Slides 25–26: hours, edunum → wealth

Slide 27: Accuracy

Slides 28–29: An "MPG" example

Slide 30: An "MPG" example. Things to note:
- Class boundaries can be weird shapes (hyperconic sections)
- Class regions can be non-simply-connected
- But it's impossible to model arbitrarily weirdly shaped regions
Test your understanding: with one input, must classes be simply connected?

Slides 31–32: Overfitting dangers
Problem with the "Joint" Bayes classifier: the number of parameters is exponential in the number of dimensions, which means we just memorize the training data and can overfit.
Problemette with the Gaussian Bayes classifier: the number of parameters is quadratic in the number of dimensions. With 10,000 dimensions and only 1,000 datapoints we could overfit.
Question: any suggested solutions?

Slides 33–34: General: O(m²) parameters

Slides 35–36: Aligned: O(m) parameters

Slides 37–38: Spherical: O(1) covariance parameters
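Slides 33-38 answer Slide 32's question by restricting each class covariance; the slides show this as pictures, so the following is my framing: "aligned" keeps only a diagonal covariance (axis-aligned ellipses) and "spherical" keeps a single shared variance per class. A hedged sketch of the three estimators:

```python
import numpy as np

def fit_covariance(Xc, mode="general"):
    """MLE covariance of one class under three restriction regimes."""
    if mode == "general":                     # full matrix: O(m^2) parameters
        return np.cov(Xc, rowvar=False, bias=True)
    d = Xc.var(axis=0)                        # per-attribute MLE variances
    if mode == "aligned":                     # diagonal: O(m) parameters
        return np.diag(d)
    if mode == "spherical":                   # sigma^2 * I: O(1) parameters
        return np.eye(Xc.shape[1]) * d.mean()
    raise ValueError(mode)
```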

Slides 39–40: BCs that have both real and categorical inputs?

                                   Categorical inputs only   Mixed Real/Cat okay    Real-valued inputs only
  Classifier (predict category)    Joint BC, Naïve BC        Dec Tree, BC here???   Gauss BC
  Density Estimator (probability)  Joint DE, Naïve DE        -                      Gauss DE
  Regressor (predict real no.)     -                         -                      -

Easy! Guess how?

Slides 41–42: BCs that have both real and categorical inputs? The answer:

                                   Categorical inputs only   Mixed Real/Cat okay                        Real-valued inputs only
  Classifier (predict category)    Joint BC, Naïve BC        Dec Tree, Gauss/Joint BC, Gauss Naïve BC   Gauss BC
  Density Estimator (probability)  Joint DE, Naïve DE        Gauss/Joint DE, Gauss Naïve DE             Gauss DE
  Regressor (predict real no.)     -                         -                                          -

Slide 43: Mixed Categorical / Real Density Estimation
Write x = (u, v) = (u_1, u_2, …, u_q, v_1, v_2, …, v_{m−q}), where u_1 … u_q are real-valued and v_1 … v_{m−q} are categorical. Then P(x | M) = P(u, v | M), where M is any density estimation model.

Slide 44: Not sure which tasty DE to enjoy? Try our… Joint / Gauss DE Combo
P(u, v | M) = P(u | v, M) · P(v | M)
where P(u | v, M) is a Gaussian with parameters depending on v, and P(v | M) is a big (m−q)-dimensional lookup table.

Slides 45–46: MLE learning of the Joint / Gauss DE Combo
P(u, v | M) = P(u | v, M) · P(v | M), with u | v, M ~ N(μ_v, Σ_v) and P(v | M) = q_v.
Let R_v = the number of records that match v. Then:
- μ_v = mean of u among records matching v
- Σ_v = covariance of u among records matching v
- q_v = fraction of records that match v ( = R_v / R, where R is the total number of records)
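A minimal sketch of that MLE (my own illustration; the function name is hypothetical): bucket the records by the categorical tuple v, then fit one Gaussian and one count per bucket.

```python
import numpy as np
from collections import defaultdict

def fit_joint_gauss_de(U, V):
    """U: (n, q) real attributes; V: (n, m-q) categorical attributes."""
    buckets = defaultdict(list)
    for u, v in zip(U, V):
        buckets[tuple(v)].append(u)          # one bucket per categorical tuple v
    model = {}
    for v, rows in buckets.items():
        Uv = np.asarray(rows)                # the R_v records matching v
        model[v] = {
            "q": len(Uv) / len(U),           # q_v = R_v / R
            "mean": Uv.mean(axis=0),         # mu_v
            "cov": np.cov(Uv, rowvar=False, bias=True),  # Sigma_v (MLE)
        }
    return model
```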

Slide 47: Gender and Hours Worked*
(*As with all the results from the UCI "adult census" dataset, we can't draw any real-world conclusions, since it's such a non-real-world sample.)

Slide 48: Joint / Gauss DE Combo (what we just did)

Slide 49: Joint / Gauss BC Combo (what we do next)

Slide 50: Joint / Gauss BC Combo

Slide 51: Joint / Gauss BC Combo
The prediction formula (shown on the slide in "rather so-so notation" for a Gaussian with mean μ_{i,v} and covariance Σ_{i,v} evaluated at u) uses:
- μ_{i,v} = mean of u among records matching v and in which y = i
- Σ_{i,v} = covariance of u among records matching v and in which y = i
- q_{i,v} = fraction of "y = i" records that match v
- p_i = fraction of records in which y = i
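The formula itself was an image; assembling it from the definitions above (my reconstruction, hedged accordingly), the predicted class is

\[
y^{\text{predict}} = \arg\max_i \; p_i \, q_{i,v} \, N\!\left(u;\, \mu_{i,v}, \Sigma_{i,v}\right)
\]

i.e. the per-class Joint/Gauss DE of Slides 45-46, reweighted by the class prior p_i.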

Slides 52–53: Gender, Hours → Wealth

Slide 54: Joint / Gauss DE Combo and Joint / Gauss BC Combo: the downside
(Yawn… we've done this before…) With more than a few categorical attributes: blah blah blah massive table, blah blah lots of parameters, blah blah just memorize the training data, blah blah do worse on future data, blah blah need to be more conservative, blah.

Slides 55–56: Naïve/Gauss combo for Density Estimation
Under the naïve assumption the attributes are independent:
P(u, v | M) = ∏_{j=1..q} P(u_j | M) × ∏_{k=1..m−q} P(v_k | M)
with each real attribute u_j modeled by its own 1-d Gaussian ("Real") and each categorical attribute v_k by its own multinomial ("Categorical"). How many parameters?
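A hedged count (mine; the slide's own tally was part of the image): each 1-d Gaussian needs a mean and a variance, and a multinomial over n_k values needs n_k − 1 probabilities, giving

\[
2q \;+\; \sum_{k=1}^{m-q} (n_k - 1)
\]

parameters in total: linear in the number of attributes, versus quadratic for the full Gaussian and exponential for the Joint model.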

Slides 57–58: Naïve/Gauss DE Example

Slide 59: Naïve / Gauss BC
- μ_{ij} = mean of u_j among records in which y = i
- σ²_{ij} = variance of u_j among records in which y = i
- q_{ij}[h] = fraction of "y = i" records in which v_j = h
- p_i = fraction of records in which y = i
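The classification rule on this slide was also an image; under the naïve assumption it scores each class with the prior times per-attribute 1-d Gaussian densities (reals) and multinomial probabilities (categoricals). A minimal sketch of that reading (hypothetical data layout; log space to avoid underflow):

```python
import numpy as np
from scipy.stats import norm

def predict_naive_gauss_bc(stats, u, v):
    """stats[c] = (p_c, mu, var, q): mu/var are arrays over real attributes,
    q[j][h] is the fraction of y=c records with v_j = h."""
    def log_score(c):
        p_c, mu, var, q = stats[c]
        s = np.log(p_c)
        s += norm.logpdf(u, loc=mu, scale=np.sqrt(var)).sum()   # real attrs
        s += sum(np.log(q[j][h]) for j, h in enumerate(v))      # categorical
        return s
    return max(stats, key=log_score)
```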

Slides 60–61: Gauss / Naïve BC Example

Slide 62: Learn Wealth from 15 attributes

Slide 63: Learn Wealth from 15 attributes (same data, except all real values discretized to 3 levels)

Slide 64: Learn Race from 15 attributes

Slide 65: What you should know
- A lot of this should have just been a corollary of what you already knew
- Turning Gaussian DEs into Gaussian BCs
- Mixing categorical and real-valued attributes

Slide 66: Questions to Ponder
- Suppose you wanted to create an example dataset where a BC involving Gaussians crushed decision trees like a bug. What would you do?
- Could you combine Decision Trees and Bayes Classifiers? How? (Maybe there is more than one possible way.)

