Logistic Regression
10701/15781 Recitation, February 5, 2008
Parts of the slides are from previous years' recitation and lecture notes, and from Prof. Andrew Moore's data mining tutorials.
Discriminative Classifier
Learn P(Y|X) directly. Logistic regression for binary classification:

P(Y=1|X,w) = 1 / (1 + exp(−(w_0 + Σ_i w_i X_i)))
P(Y=0|X,w) = exp(−(w_0 + Σ_i w_i X_i)) / (1 + exp(−(w_0 + Σ_i w_i X_i)))

Note: a generative classifier instead learns P(X|Y) and P(Y) to get P(Y|X) under some modeling assumption, e.g. P(X_i|Y=y) ~ N(μ_y, 1), etc.
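A minimal sketch of this model in code (assuming NumPy; the helper name `p_y1_given_x` and the argument layout are illustrative, not from the slides):

```python
import numpy as np

def p_y1_given_x(x, w, w0):
    """P(Y=1 | X, w) under the binary logistic model."""
    # Sigmoid of the linear score w0 + w . x
    return 1.0 / (1.0 + np.exp(-(w0 + np.dot(w, x))))
```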
Decision Boundary
For which X is P(Y=1|X,w) ≥ P(Y=0|X,w)? Taking the ratio of the two expressions above, P(Y=1|X,w) / P(Y=0|X,w) = exp(w_0 + Σ_i w_i X_i), which is ≥ 1 exactly when w_0 + Σ_i w_i X_i ≥ 0, so logistic regression gives a linear classification rule! (What does the decision boundary from NB look like?)
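Continuing the hypothetical helper above, the linear rule reduces to checking the sign of the score:

```python
import numpy as np

def predict(x, w, w0):
    """Predict the label from the sign of the linear score."""
    # P(Y=1|X,w) >= P(Y=0|X,w) exactly when w0 + w . x >= 0
    return 1 if w0 + np.dot(w, x) >= 0 else 0
```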
LR more generally
In the more general case with K classes, for k < K:

P(Y=k|X) = exp(w_k0 + Σ_i w_ki X_i) / (1 + Σ_{j=1}^{K−1} exp(w_j0 + Σ_i w_ji X_i))

and for k = K:

P(Y=K|X) = 1 / (1 + Σ_{j=1}^{K−1} exp(w_j0 + Σ_i w_ji X_i))
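A sketch of these K-class probabilities (assuming NumPy; class K is treated as the reference class, and the weight layout is an illustrative choice):

```python
import numpy as np

def multiclass_lr_probs(x, W, w0):
    """P(Y=k|X) for k = 1..K, with class K as the reference class.

    W:  (K-1, n) weights for classes 1..K-1
    w0: (K-1,) intercepts for classes 1..K-1
    """
    scores = np.exp(w0 + W @ x)      # exp(w_k0 + sum_i w_ki X_i), k < K
    denom = 1.0 + scores.sum()       # shared normalizer
    return np.append(scores / denom, 1.0 / denom)  # last entry: class K
```

Note that the returned probabilities sum to 1 by construction.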
How to learn P(Y|X)
Logistic regression: maximize the conditional log likelihood

l(w) = Σ_l [ Y^l (w_0 + Σ_i w_i X_i^l) − ln(1 + exp(w_0 + Σ_i w_i X_i^l)) ]

Good news: l(w) is a concave function of w. Bad news: there is no closed-form solution, so we resort to gradient ascent.
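As a sketch, the objective can be computed directly (assuming NumPy; rows of `X` are training examples and `y` holds labels in {0, 1}):

```python
import numpy as np

def cond_log_likelihood(X, y, w, w0):
    """Conditional log likelihood l(w) for binary logistic regression."""
    z = w0 + X @ w                        # linear score per example
    # sum_l [ Y^l z^l - ln(1 + exp(z^l)) ]; log1p improves stability
    return np.sum(y * z - np.log1p(np.exp(z)))
```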
Gradient ascent (/descent)
A general framework for finding a maximum (or minimum) of a continuous, differentiable function, say f(w). Start with some initial value w^(1) and compute the gradient vector ∇f(w^(1)). The next value w^(2) is obtained by moving some distance from w^(1) in the direction of steepest ascent, i.e., along the gradient (for a minimum, move along the negative of the gradient):

w^(t+1) = w^(t) + η ∇f(w^(t))
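A generic sketch of that loop (the step size `eta` and stopping threshold are illustrative assumptions):

```python
import numpy as np

def gradient_ascent(grad_f, w_init, eta=0.1, tol=1e-6, max_iter=10000):
    """Maximize f by repeatedly stepping along its gradient."""
    w = np.asarray(w_init, dtype=float)
    for _ in range(max_iter):
        step = eta * grad_f(w)            # move along steepest ascent
        w = w + step
        if np.linalg.norm(step) < tol:    # stop once the change is tiny
            break
    return w
```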
Gradient ascent for LR
Iterate until the change is below a threshold: for all i,

w_i ← w_i + η Σ_l X_i^l ( Y^l − P(Y^l = 1 | X^l, w) )

(with the intercept w_0 updated the same way using X_0^l = 1).
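Putting the pieces together, a minimal training sketch under the same conventions (assuming NumPy; keeping the intercept as a separate scalar rather than a constant feature is a presentation choice):

```python
import numpy as np

def train_lr(X, y, eta=0.01, tol=1e-6, max_iter=10000):
    """Fit binary logistic regression by gradient ascent on l(w)."""
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))  # P(Y=1 | X^l, w)
        err = y - p                               # Y^l - P(Y^l=1 | X^l, w)
        dw, dw0 = X.T @ err, err.sum()            # gradient of l(w)
        w, w0 = w + eta * dw, w0 + eta * dw0
        if eta * np.linalg.norm(np.append(dw, dw0)) < tol:
            break
    return w, w0
```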
Regularization
Overfitting is a problem, especially when the data is very high-dimensional and the training data is sparse. Regularization: use a "penalized log likelihood function" which penalizes large values of w, e.g.

l(w) − (λ/2) Σ_i w_i^2

The modified gradient ascent update is

w_i ← w_i + η [ Σ_l X_i^l ( Y^l − P(Y^l = 1 | X^l, w) ) − λ w_i ]
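The penalty only changes the gradient by a shrinkage term, as in this sketch (assuming NumPy; whether to penalize the intercept is a design choice, and it is left unpenalized here):

```python
import numpy as np

def regularized_step(X, y, w, w0, eta=0.01, lam=0.1):
    """One gradient-ascent step on the penalized log likelihood."""
    p = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))
    err = y - p
    w_new = w + eta * (X.T @ err - lam * w)   # shrink weights toward 0
    w0_new = w0 + eta * err.sum()             # intercept left unpenalized
    return w_new, w0_new
```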
Applet http://www.cs.technion.ac.il/~rani/LocBoost/
NB vs LR
Consider Y boolean and X = (X_1, …, X_n) continuous.
Number of parameters: NB: 4n + 1 (a mean and a variance per feature per class, plus the class prior); LR: n + 1.
Parameter estimation method: NB: uncoupled (each parameter is estimated independently in closed form); LR: coupled (the weights are optimized jointly).
NB vs LR
Asymptotic comparison (# training examples → infinity):
When the model assumptions are correct, NB and LR produce identical classifiers.
When the model assumptions are incorrect, LR is less biased (it does not assume conditional independence) and is therefore expected to outperform NB.