
1 Bayes Classifier, Linear Regression
10701/15781 Recitation, January 29, 2008
Parts of the slides are from previous years' recitation and lecture notes, and from Prof. Andrew Moore's data mining tutorials.

2 Classification and Regression
- Classification. Goal: learn the underlying function f: X (features) → Y (class, or category), e.g. words → "spam" or "not spam".
- Regression: f: X (features) → Y (continuous values), e.g. GPA → salary.

3 Supervised Classification
- How to find an unknown function f: X → Y (features → class), or equivalently P(Y|X)?
- Two kinds of classifier:
  1. Find P(X|Y) and P(Y), and use Bayes rule - generative
  2. Find P(Y|X) directly - discriminative

4 Classification
Learn P(Y|X):
1. Bayes rule: P(Y|X) = P(X|Y)P(Y) / P(X) ∝ P(X|Y)P(Y). Learn P(X|Y) and P(Y) → a "generative" classifier.
2. Learn P(Y|X) directly → a "discriminative" classifier (to be covered later in class), e.g. logistic regression.
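To see the generative recipe with made-up numbers (the probabilities below are purely illustrative, not from the slides): suppose P(Y = spam) = 0.3, P("Rolex" appears | spam) = 0.2, and P("Rolex" appears | not spam) = 0.01. For an email containing "Rolex", P(spam | X) ∝ 0.2 × 0.3 = 0.06, while P(not spam | X) ∝ 0.01 × 0.7 = 0.007, so we predict spam. The common denominator P(X) cancels when comparing classes, which is why the proportionality in step 1 suffices.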

5 Generative Classifier: Bayes Classifier
Learn P(X|Y), P(Y). E.g. an email classification problem:
- 3 classes for Y = {spam, not spam, maybe}
- 10,000 binary features for X = {"Cash", "Rolex", …}
How many parameters do we have?
- P(Y): 3 − 1 = 2 (the three class probabilities must sum to 1)
- P(X|Y): 3 × (2^10000 − 1) — one probability per joint configuration of the 10,000 binary features, for each class, minus one per class for normalization

6 Generative learning: Naïve Bayes
Introduce conditional independence:
P(X1, X2 | Y) = P(X1 | Y) P(X2 | Y)
For X = (X1, …, Xn):
P(Y|X) = P(X|Y) P(Y) / P(X)
       = P(X1|Y) … P(Xn|Y) P(Y) / P(X)
       = [ prod_i P(Xi|Y) ] P(Y) / P(X)
Learn P(X1|Y), …, P(Xn|Y), and P(Y), instead of learning P(X1, …, Xn | Y) directly.
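A minimal sketch of how such a classifier could be implemented for binary features (illustrative code, not the recitation's; the function names and add-alpha smoothing are my own choices):

  import numpy as np

  def fit(X, y, n_classes, alpha=1.0):
      # X: (n, d) binary feature matrix, y: (n,) integer class labels
      n, d = X.shape
      counts = np.array([(y == c).sum() for c in range(n_classes)])
      prior = counts / n                          # MLE of P(Y)
      # cond[c, i] = P(X_i = 1 | Y = c), with add-alpha smoothing
      cond = np.array([(X[y == c].sum(axis=0) + alpha) / (counts[c] + 2 * alpha)
                       for c in range(n_classes)])
      return prior, cond

  def predict(X, prior, cond):
      # work in log space to avoid underflow with many features
      log_lik = X @ np.log(cond).T + (1 - X) @ np.log(1 - cond).T
      return np.argmax(np.log(prior) + log_lik, axis=1)

Usage: prior, cond = fit(X_train, y_train, n_classes=3), then predict(X_test, prior, cond). Each class needs only d conditional probabilities, matching the parameter count on the next slide.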

7 Naïve Bayes
- 3 classes for Y = {spam, not spam, maybe}
- 10,000 binary features for X = {"Cash", "Rolex", …}
Now, how many parameters?
- P(Y): still 2
- P(X|Y): 3 × 10,000 = 30,000 — one P(Xi = 1 | Y = y) per feature, per class
Far fewer parameters: the model is "simpler", and hence less likely to overfit.

8 Full Bayes vs. Naïve Bayes
XOR:
X1  X2  Y
 1   0  1
 0   1  1
 1   1  0
 0   0  0
P(Y=1 | (X1,X2) = (0,1)) = ?
- Full Bayes: P(Y=1) = ?  P((X1,X2) = (0,1) | Y=1) = ?
- Naïve Bayes: P(Y=1) = ?  P((X1,X2) = (0,1) | Y=1) = ?
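Filling in the blanks with MLE counts from the table: P(Y=1) = 2/4 = 1/2 for both classifiers. Full Bayes stores the joint directly: P((X1,X2) = (0,1) | Y=1) = 1/2, and since P((X1,X2) = (0,1)) = 1/4, Bayes rule gives P(Y=1 | (0,1)) = (1/2 × 1/2) / (1/4) = 1, the correct answer. Naïve Bayes factorizes: P(X1=0 | Y=1) = P(X2=1 | Y=1) = 1/2, so P((0,1) | Y=1) = 1/4; by symmetry P((0,1) | Y=0) = 1/4 as well, so P(Y=1 | (0,1)) = 1/2. The conditional-independence assumption discards exactly the correlation that defines XOR, reducing naïve Bayes to chance.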

9 Regression
- Prediction of continuous variables, e.g. I want to predict salaries from GPA. I can regress that…
- Learn the mapping f: X → Y
- Model is linear in the parameters (plus some noise) → linear regression
- Assume Gaussian noise; learn the MLE Θ

10 1-parameter linear regression
Normal linear regression: yi = θ xi + εi, with εi ~ N(0, σ²); or equivalently, yi | xi ~ N(θ xi, σ²).
- MLE θ?
- MLE σ²?
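Working these out for the through-the-origin model written above: maximizing the log-likelihood in θ is equivalent to minimizing the squared error Σi (yi − θ xi)². Setting the derivative to zero gives

  θ̂_MLE = (Σi xi yi) / (Σi xi²)

and substituting θ̂ back in and maximizing over σ² gives

  σ̂²_MLE = (1/n) Σi (yi − θ̂ xi)²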

11 Multivariate linear regression
- What if the inputs are vectors? Write the matrix X and vector Y (n data points, k features per data point).
- MLE Θ = (XᵀX)⁻¹ XᵀY (the normal equations)
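A quick sketch in NumPy (synthetic data; the dimensions and the true parameter vector are made up for illustration):

  import numpy as np

  rng = np.random.default_rng(0)
  n, k = 100, 3
  X = rng.normal(size=(n, k))                         # n data points, k features
  theta_true = np.array([1.5, -2.0, 0.5])
  Y = X @ theta_true + rng.normal(scale=0.1, size=n)  # linear model + Gaussian noise

  # normal equations: solve (X^T X) Theta = X^T Y
  # (numerically preferable to forming the explicit inverse)
  theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
  print(theta_hat)  # close to theta_true

Solving the linear system rather than inverting XᵀX is the standard numerically stable choice.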

12 Constant term?
- We may expect linear data that does not go through the origin.
- Trick: augment each input with a constant feature that is always 1; its learned weight becomes the intercept.

13 The constant term
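A minimal sketch of the constant-term trick (the toy data below is my own, chosen so the true line is roughly y = 2x + 1):

  import numpy as np

  X = np.array([[2.0], [3.0], [5.0]])            # a single feature
  Y = np.array([5.1, 7.0, 11.2])                 # nonzero intercept
  ones = np.ones((X.shape[0], 1))
  X_aug = np.hstack([ones, X])                   # constant feature in the first column
  theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ Y)
  print(theta)                                   # [intercept, slope] ≈ [1, 2]

The augmented design matrix is then used exactly as on the previous slide; no change to the estimator itself is needed.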

14 Regression: another example
- Assume the following model to fit the data. The model has one unknown parameter θ to be learned from the data.
- What is the maximum likelihood estimate of θ?

