
1 Logistic Regression 10701/15781 Recitation February 5, 2008 Parts of the slides are from previous years’ recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.

2 Discriminative Classifier
 Learn P(Y|X) directly
 Logistic regression for binary classification:
P(Y=1|X,w) = 1 / (1 + exp(−(w_0 + Σ_i w_i X_i))),  P(Y=0|X,w) = 1 − P(Y=1|X,w)
 Note: a generative classifier instead learns P(X|Y) and P(Y) to get P(Y|X) under some modeling assumption, e.g. P(X|Y) ~ N(μ_y, 1), etc.
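A minimal sketch of this model in Python (the weights w0, w and the input x below are made-up values for illustration, not from the slides):

```python
import numpy as np

def p_y1_given_x(x, w0, w):
    """P(Y=1 | X=x, w) under the logistic model."""
    return 1.0 / (1.0 + np.exp(-(w0 + np.dot(w, x))))

x = np.array([1.0, 2.0])
w0, w = -0.5, np.array([0.3, -0.2])
p1 = p_y1_given_x(x, w0, w)
print(p1, 1.0 - p1)  # P(Y=1|x) and P(Y=0|x); they sum to 1
```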

3 Decision Boundary
 For which X is P(Y=1|X,w) ≥ P(Y=0|X,w)? Exactly when w_0 + Σ_i w_i X_i ≥ 0, so the boundary w_0 + Σ_i w_i X_i = 0 is a hyperplane: a linear classification rule!
 Decision boundary from NB? Gaussian NB with class-independent variances also yields a linear boundary.
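Since the boundary is linear, prediction reduces to a sign check on the score; a one-line sketch, reusing the hypothetical weights from the snippet above:

```python
import numpy as np

def classify(x, w0, w):
    """Predict Y=1 iff w0 + w.x >= 0, i.e. iff P(Y=1|x,w) >= 0.5."""
    return 1 if w0 + np.dot(w, x) >= 0 else 0

print(classify(np.array([1.0, 2.0]), -0.5, np.array([0.3, -0.2])))  # score -0.6, so 0
```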

4 LR more generally
 In the more general case, Y ∈ {1, …, K}:
for k < K:  P(Y=k|X) = exp(w_k0 + Σ_i w_ki X_i) / (1 + Σ_{j=1}^{K−1} exp(w_j0 + Σ_i w_ji X_i))
for k = K:  P(Y=K|X) = 1 / (1 + Σ_{j=1}^{K−1} exp(w_j0 + Σ_i w_ji X_i))
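A sketch of these K-class probabilities in NumPy (the weight matrix W below holds one row per class k < K, with the intercept w_k0 in column 0; all numbers are illustrative):

```python
import numpy as np

def multiclass_lr_probs(x, W):
    """P(Y=k | x) for k = 1..K, where W is (K-1, n+1) and x is (n,)."""
    scores = W[:, 0] + W[:, 1:] @ x   # w_k0 + sum_i w_ki x_i for k < K
    e = np.exp(scores)
    Z = 1.0 + e.sum()                 # shared normalizer
    return np.append(e / Z, 1.0 / Z)  # classes 1..K-1, then class K

probs = multiclass_lr_probs(np.array([1.0, -2.0]),
                            np.array([[0.1, 0.5, -0.3],
                                      [0.0, -0.2, 0.4]]))
print(probs, probs.sum())  # K=3 probabilities summing to 1
```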

5 How to learn P(Y|X)
 Logistic regression: maximize the conditional log likelihood
l(w) = Σ_j [ Y^j (w_0 + Σ_i w_i X_i^j) − ln(1 + exp(w_0 + Σ_i w_i X_i^j)) ]
 Good news: l(w) is a concave function of w
 Bad news: no closed form solution → gradient ascent
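The objective itself is straightforward to evaluate; a short sketch, assuming X carries a leading column of ones so that w includes the intercept:

```python
import numpy as np

def cond_log_likelihood(w, X, y):
    """l(w) = sum_j [ y_j * z_j - ln(1 + exp(z_j)) ], with z_j = w . X_j.
    np.logaddexp(0, z) computes ln(1 + exp(z)) without overflow."""
    z = X @ w
    return np.sum(y * z - np.logaddexp(0.0, z))
```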

6 Gradient ascent (/descent)
 General framework for finding a maximum (or minimum) of a continuous (differentiable) function, say f(w)
 Start with some initial value w^(1) and compute the gradient vector ∇f(w^(1))
 The next value w^(2) is obtained by moving some distance from w^(1) in the direction of steepest ascent, i.e., along the gradient (for a minimum, along the negative of the gradient): w^(t+1) = w^(t) + η ∇f(w^(t))
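A minimal sketch of this update loop in Python (the step size eta, the tolerance, the iteration budget, and the toy objective are all illustrative choices, not from the slides):

```python
import numpy as np

def gradient_ascent(grad, w0, eta=0.1, tol=1e-6, max_iters=10000):
    """Repeatedly step along the gradient until the step falls below tol."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        step = eta * grad(w)
        w = w + step
        if np.linalg.norm(step) < tol:
            break
    return w

# Toy example: maximize f(w) = -||w - [3, -1]||^2, whose gradient is -2(w - [3, -1])
print(gradient_ascent(lambda w: -2 * (w - np.array([3.0, -1.0])), [0.0, 0.0]))  # ~[3, -1]
```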

7 Gradient ascent for LR
Iterate until change < threshold: for all i,
w_i ← w_i + η Σ_j X_i^j ( Y^j − P̂(Y^j = 1 | X^j, w) )
(taking X_0^j = 1 so the same update covers the intercept w_0)
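Plugging this update into the generic loop gives a runnable sketch of batch LR training (assuming X has a leading column of ones and y holds 0/1 labels; eta and the stopping rule are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, eta=0.1, tol=1e-6, max_iters=10000):
    """Batch gradient ascent on the conditional log likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        p = sigmoid(X @ w)          # P-hat(Y=1 | X^j, w) for every example j
        step = eta * X.T @ (y - p)  # the update from the slide, for all i at once
        w += step
        if np.linalg.norm(step) < tol:
            break
    return w
```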

8 Regularization
 Overfitting is a problem, especially when the data is very high-dimensional and the training data is sparse
 Regularization: use a "penalized log likelihood function" which penalizes large values of w, e.g. l(w) − (λ/2) Σ_i w_i²
 The modified gradient ascent update: w_i ← w_i + η [ Σ_j X_i^j ( Y^j − P̂(Y^j = 1 | X^j, w) ) − λ w_i ]
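The only change to the training sketch above is the extra −λw_i term in the step (this sketch penalizes the intercept too; leaving w_0 unpenalized is an equally common choice the slides don't settle):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_l2(X, y, lam=1.0, eta=0.1, tol=1e-6, max_iters=10000):
    """Gradient ascent on the L2-penalized conditional log likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        p = sigmoid(X @ w)
        step = eta * (X.T @ (y - p) - lam * w)  # extra -lam*w from the penalty
        w += step
        if np.linalg.norm(step) < tol:
            break
    return w
```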

9 Applet: http://www.cs.technion.ac.il/~rani/LocBoost/

10 NB vs LR
 Consider Y boolean, X continuous, X = (X_1, …, X_n)
 Number of parameters
 NB: 4n + 1 (per-class mean and variance for each X_i, plus the prior P(Y=1))
 LR: n + 1
 Parameter estimation method
 NB: uncoupled (each parameter has its own closed-form estimate)
 LR: coupled (all weights optimized jointly; no closed form)

11 NB vs LR
 Asymptotic comparison (#training examples → ∞)
 When the model assumptions are correct
 NB and LR produce identical classifiers
 When the model assumptions are incorrect
 LR is less biased: it does not assume conditional independence, and is therefore expected to outperform NB

