Logistic Regression [Many of the slides were originally created by Prof. Dan Jurafsky from Stanford.]
Logistic regression classifier vs. Naive Bayes classifier. Naive Bayes uses a likelihood P(x|y) and a prior P(y). Logistic regression has no likelihood and no prior; it uses a different way to estimate probabilities, modeling the posterior P(y|x) directly.
Features and weights. Movie review example: N = 4 features, 2 classes (+, −). The weight for 'enjoy' paired with the negative class is negative: 'enjoy' is evidence against the class negative.
Posterior probability. x = ( 'great', 'second-rate', 'no', 'enjoy' ). A feature function fi that takes on only the values 0 and 1 is called an indicator function. For this example: f1(+,x)=1, f2(+,x)=0, f3(+,x)=0, f4(+,x)=0 and f1(−,x)=0, f2(−,x)=1, f3(−,x)=1, f4(−,x)=1. The posterior is
P(y|x) = exp(Σi wi fi(y,x)) / Σy′ exp(Σi wi fi(y′,x))
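A minimal Python sketch of these indicator features and the posterior, assuming one weight per feature; the function names are my own, and the weight values are taken from the classification example below.

```python
import math

def features(y, x):
    """Joint indicator features f_i(y, x): each is 1 only when the word
    appears in x AND y is the class the feature is tied to."""
    words = set(x)
    return [
        1.0 if y == '+' and 'great' in words else 0.0,        # f1
        1.0 if y == '-' and 'second-rate' in words else 0.0,  # f2
        1.0 if y == '-' and 'no' in words else 0.0,           # f3
        1.0 if y == '-' and 'enjoy' in words else 0.0,        # f4
    ]

def posterior(y, x, w, classes=('+', '-')):
    """P(y|x) = exp(sum_i w_i f_i(y,x)) / sum_y' exp(sum_i w_i f_i(y',x))."""
    score = lambda c: math.exp(sum(wi * fi for wi, fi in zip(w, features(c, x))))
    return score(y) / sum(score(c) for c in classes)

x = ['great', 'second-rate', 'no', 'enjoy']
print(features('+', x))                        # [1.0, 0.0, 0.0, 0.0]
print(features('-', x))                        # [0.0, 1.0, 1.0, 1.0]
print(posterior('+', x, [1.9, 0.9, 0.7, -0.8]))  # ~0.75 with the example weights
```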
Classification
Classify the example: the score for + is 1.9, so P(+|x) ∝ exp(1.9); the score for − is 0.9 + 0.7 − 0.8 = 0.8, so P(−|x) ∝ exp(0.8). Since 1.9 > 0.8, the classifier chooses +.
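The same arithmetic checked by hand, under the assumption that the scores above are summed weights that get exponentiated and normalized:

```python
import math

score_pos = 1.9                  # sum of weights of '+' features that fire
score_neg = 0.9 + 0.7 - 0.8      # sum of weights of '-' features that fire

z = math.exp(score_pos) + math.exp(score_neg)
p_pos = math.exp(score_pos) / z  # ~0.75
p_neg = math.exp(score_neg) / z  # ~0.25
print(p_pos, p_neg)              # the classifier picks '+'
```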
Training logistic regression. How are the parameters of the model, the weights w, learned? Logistic regression is trained with conditional maximum likelihood estimation, usually with a regularization term:
ŵ = argmax_w Σj log P(yj|xj) − α ||w||²
where ||w|| is the Euclidean norm of the weight vector and α is a constant. Naive Bayes probabilities are estimated from relative frequencies by counting; logistic regression weights are found by numerical optimization, and the α||w||² penalty is equivalent to a zero-mean Gaussian prior on the weights.
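A sketch of this training procedure as plain batch gradient ascent on the L2-penalized conditional log-likelihood; the toy training data, learning rate, and α value are illustrative choices, not values from the slides.

```python
import math

def features(y, x):
    """Joint indicator features from the movie-review example."""
    words = set(x)
    return [
        1.0 if y == '+' and 'great' in words else 0.0,
        1.0 if y == '-' and 'second-rate' in words else 0.0,
        1.0 if y == '-' and 'no' in words else 0.0,
        1.0 if y == '-' and 'enjoy' in words else 0.0,
    ]

CLASSES = ('+', '-')

def posteriors(x, w):
    """P(y|x) for every class y, using the posterior formula above."""
    scores = {c: math.exp(sum(wi * fi for wi, fi in zip(w, features(c, x))))
              for c in CLASSES}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def train(data, alpha=0.1, lr=0.5, epochs=200):
    """Gradient ascent on sum_j log P(y_j|x_j) - alpha * ||w||^2."""
    w = [0.0] * 4
    for _ in range(epochs):
        grad = [0.0] * 4
        for x, y in data:
            p = posteriors(x, w)
            observed = features(y, x)        # f_i at the true class
            expected = [0.0] * 4             # sum_y P(y|x) * f_i(y, x)
            for c in CLASSES:
                fc = features(c, x)
                for i in range(4):
                    expected[i] += p[c] * fc[i]
            for i in range(4):
                grad[i] += observed[i] - expected[i]
        for i in range(4):
            w[i] += lr * (grad[i] - 2 * alpha * w[i])  # gradient of the L2 penalty
    return w

data = [(['great', 'enjoy'], '+'), (['second-rate', 'no'], '-')]
print(train(data))  # weight for 'enjoy' with class '-' comes out negative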
Multinomial logistic regression Logistic regression has two possible classes. Multinomial logistic regression has more than two classes. Multinomial logistic regression is also called maximum entropy modeling, MaxEnt for short.
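A small sketch of the multiclass case: one score per class, normalized with a softmax. The third class 'neutral' and all score values here are invented purely for illustration.

```python
import math

def softmax(scores):
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

classes = ['+', '-', 'neutral']
scores = [1.9, 0.8, 0.3]                 # one summed weight score per class
probs = softmax(scores)
print(dict(zip(classes, probs)))         # the argmax is the predicted class
```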
Bayes vs regression Naive Bayes assumes conditional independence. Regression does not. When there are many correlated features, logistic regression will assign a more accurate probability than naive Bayes. Naive Bayes works well on small datasets or short documents. Naive Bayes is easy to implement and fast to train.