Naive Bayes (Generative Classifier) vs. Logistic Regression (Discriminative Classifier)
Minkyoung Kim, February 19, 2008



Contents
1. Naive Bayes Classifier
   1. Supervised Learning based on Bayes Rule (Impractical)
   2. Conditional Independence
   3. Naive Bayes Classifier: Discrete-Valued Input
   4. Naive Bayes Classifier: Continuous-Valued Input (GNB)
2. Logistic Regression
   1. Supervised Learning using a Parametric Model
   2. Logistic Regression from GNB
   3. Regularization in Logistic Regression
3. Naive Bayes Classifier vs. Logistic Regression

Naive Bayes classifier

Supervised Learning
Training examples are fed to a supervised learner, which outputs a target function that maps a sample X to an answer Y.

Learning Based on Bayes Rule
As before, training examples are used to learn a target function mapping a sample X to an answer Y, but now via Bayes rule:
P(Y = y_k | X) = P(X | Y = y_k) P(Y = y_k) / P(X),
where P(X | Y = y_k) is the likelihood and P(Y = y_k) is the prior.

Number of Parameters for Estimating P(X|Y) with Bayes Rule Alone
When X = (X_1, ..., X_n) with boolean attributes X_i and Y is boolean, let's count the number of parameters of P(X | Y): for each of the 2 values of Y, X can take 2^n joint values, and the sum rule removes one free parameter per class, so 2(2^n − 1) parameters must be estimated in total. Impractical!

Number of Parameters for Estimating P(X|Y) with Naive Bayes
When the attributes X_i are conditionally independent given Y, P(X | Y) factors as ∏_i P(X_i | Y). Let's count the number of parameters again: for boolean X_i and boolean Y, each P(X_i | Y = y_k) needs one free parameter (the sum rule fixes the other), so only 2n parameters are required. For example, with n = 30 attributes this is 60 parameters instead of 2(2^30 − 1) ≈ 2 × 10^9.

Naive Bayes Algorithm
When X = (X_1, ..., X_n) and Y is discrete-valued, apply Bayes rule
P(Y = y_k | X_1, ..., X_n) = P(Y = y_k) P(X_1, ..., X_n | Y = y_k) / Σ_j P(Y = y_j) P(X_1, ..., X_n | Y = y_j)
together with the conditional independence assumption P(X_1, ..., X_n | Y) = ∏_i P(X_i | Y), giving the Naive Bayes classifier
P(Y = y_k | X_1, ..., X_n) = P(Y = y_k) ∏_i P(X_i | Y = y_k) / Σ_j P(Y = y_j) ∏_i P(X_i | Y = y_j).
Then the most probable value of Y (the answer) is
y ← argmax_{y_k} P(Y = y_k) ∏_i P(X_i | Y = y_k),
since the denominator does not depend on y_k and can simply be dropped. Here ∏_i P(X_i | Y = y_k) is the ① likelihood and P(Y = y_k) is the ② prior.

Naive Bayes for Discrete-Valued Input
When each X_i takes one of J discrete values and Y takes one of K values:
① Likelihood: let's count the parameters θ_ijk = P(X_i = x_ij | Y = y_k). For each attribute i and class k, the sum rule leaves J − 1 free values, so nK(J − 1) parameters in total.
② Prior: the parameters π_k = P(Y = y_k) number K − 1 in total (again by the sum rule).

Training: Maximum Likelihood Estimates (Relative Frequencies)
① Likelihood, with smoothing: θ̂_ijk = P̂(X_i = x_ij | Y = y_k) = (#{X_i = x_ij ∧ Y = y_k} + l) / (#{Y = y_k} + lJ), where l is a small smoothing constant that avoids zero probabilities.
② Prior, with smoothing: π̂_k = P̂(Y = y_k) = (#{Y = y_k} + l) / (|D| + lK), where |D| is the number of training examples.
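These estimates are just event counts plus a smoothing constant. A minimal sketch in Python (the data encoding, the assumption that every attribute has the same number of values J = num_values, and the default smoothing constant l = 1 are illustrative choices, not from the slides):

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(X, Y, num_values, num_classes, l=1.0):
    """MLE with add-l smoothing for a discrete naive Bayes classifier.

    X: list of feature vectors (each a list of ints in 0..num_values-1)
    Y: list of class labels (ints in 0..num_classes-1)
    Returns (prior, likelihood) where
      prior[k]            ~ P(Y = k)
      likelihood[i][k][v] ~ P(X_i = v | Y = k)
    """
    n = len(X[0])
    class_count = Counter(Y)
    # Smoothed prior: (#{Y=k} + l) / (|D| + l*K)
    prior = [(class_count[k] + l) / (len(Y) + l * num_classes)
             for k in range(num_classes)]
    # Smoothed likelihood: (#{X_i=v and Y=k} + l) / (#{Y=k} + l*J)
    joint = defaultdict(float)
    for x, y in zip(X, Y):
        for i, v in enumerate(x):
            joint[(i, y, v)] += 1.0
    likelihood = [[[(joint[(i, k, v)] + l) / (class_count[k] + l * num_values)
                    for v in range(num_values)]
                   for k in range(num_classes)]
                  for i in range(n)]
    return prior, likelihood

def predict(x, prior, likelihood):
    # argmax_k  log P(Y=k) + sum_i log P(X_i = x_i | Y=k)
    scores = [math.log(prior[k]) +
              sum(math.log(likelihood[i][k][v]) for i, v in enumerate(x))
              for k in range(len(prior))]
    return max(range(len(prior)), key=lambda k: scores[k])
```

Working in log space in `predict` avoids underflow when the product over many attributes becomes very small.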

Naive Bayes for Continuous Input: the Gaussian Naive Bayes (GNB) Classifier
When the X_i are continuous-valued:
① Likelihood: assume P(X_i | Y = y_k) = N(μ_ik, σ_ik). In order to train the likelihood, we must estimate the mean and standard deviation for each attribute i and class k (2nK parameters in total). Two assumptions are made:
1. Gaussian: X_i is generated by a mixture of class-conditional Gaussians (i.e. which Gaussian depends on the value of the class variable Y).
2. Naive Bayes: the attribute values X_i are conditionally independent of one another given Y.
② Prior: as before, π_k = P(Y = y_k), K − 1 parameters in total.

Training: Maximum Likelihood Estimates (Relative Frequencies)
① Likelihood: μ̂_ik = (1/|D_k|) Σ_{l ∈ D_k} X_i^l and σ̂_ik² = (1/|D_k|) Σ_{l ∈ D_k} (X_i^l − μ̂_ik)², where D_k denotes the training examples in the case of Y = y_k. Dividing by |D_k| − 1 instead of |D_k| gives the minimum variance unbiased estimator (MVUE) of the variance.
② Prior, with smoothing: π̂_k = (#{Y = y_k} + l) / (|D| + lK), as before.
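A sketch of these estimators in Python (assumes NumPy arrays; the prior here is the plain relative frequency, without the smoothing term, and all names are illustrative):

```python
import numpy as np

def train_gnb(X, Y, num_classes, unbiased=True):
    """Estimate Gaussian Naive Bayes parameters.

    X: (m, n) array of continuous features, Y: (m,) array of integer class labels.
    Returns prior (K,), mu (K, n), sigma (K, n): one Gaussian per class/attribute pair.
    """
    prior = np.array([(Y == k).mean() for k in range(num_classes)])
    mu = np.array([X[Y == k].mean(axis=0) for k in range(num_classes)])
    # ddof=1 gives the minimum variance unbiased estimator (MVUE) of the variance;
    # ddof=0 is the plain maximum likelihood estimate.
    sigma = np.array([X[Y == k].std(axis=0, ddof=1 if unbiased else 0)
                      for k in range(num_classes)])
    return prior, mu, sigma

def gnb_log_posterior(x, prior, mu, sigma):
    """log P(Y=k) + sum_i log N(x_i; mu_ik, sigma_ik), up to a constant shared by all k."""
    log_lik = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)
    return np.log(prior) + log_lik.sum(axis=1)
```

The predicted class is the argmax of `gnb_log_posterior`; the dropped constant (the −0.5·log 2π per attribute) is the same for every class and so does not affect the argmax.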

Logistic Regression

Supervised Learning Using a Parametric Model
Training examples are again used to learn a target function mapping a sample X to an answer Y, but instead of estimating the likelihood and prior and applying Bayes rule, we model P(Y | X) directly with a parametric form governed by parameters W.

Logistic Regression for a Boolean Label
When Y is boolean and X = (X_1, ..., X_n):
① Parametric model (logistic function): P(Y = 1 | X) = 1 / (1 + exp(−(w_0 + Σ_i w_i X_i))), and P(Y = 0 | X) = 1 − P(Y = 1 | X).
② We assign the label Y = 1 if w_0 + Σ_i w_i X_i > 0, and assign Y = 0 otherwise. This leads to a simple linear expression for classification!
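A minimal sketch of the logistic model and the resulting linear decision rule (the sign convention P(Y = 1 | X) = 1/(1 + exp(−(w_0 + Σ_i w_i X_i))) is the one used above and is kept consistently in the later sketches):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_y1_given_x(x, w0, w):
    """P(Y = 1 | X = x) under the logistic model."""
    return sigmoid(w0 + np.dot(w, x))

def classify(x, w0, w):
    # P(Y=1|x) > 0.5 exactly when w0 + w.x > 0, so the decision boundary is linear in x.
    return 1 if w0 + np.dot(w, x) > 0 else 0
```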

Logistic Regression from GNB

Logistic Regression from GNB
Start from Bayes rule and the conditional independence (Naive Bayes) assumption:
P(Y = 1 | X) = P(Y = 1) P(X | Y = 1) / [P(Y = 1) P(X | Y = 1) + P(Y = 0) P(X | Y = 0)]
             = 1 / (1 + exp(ln[(1 − π)/π] + Σ_i ln[P(X_i | Y = 0) / P(X_i | Y = 1)])),
where π = P(Y = 1).

Logistic Regression from GNB
Now apply the Gaussian assumption P(X_i | Y = y_k) = N(μ_ik, σ_i) with a class-independent variance σ_i²:
Σ_i ln[P(X_i | Y = 0) / P(X_i | Y = 1)] = Σ_i [ (μ_i0 − μ_i1)/σ_i² · X_i + (μ_i1² − μ_i0²)/(2σ_i²) ],
which is linear in the X_i.

Logistic Regression from GNB
Thus, under the GNB assumptions, P(Y = 1 | X) = 1 / (1 + exp(−(w_0 + Σ_i w_i X_i))), where
w_i = (μ_i1 − μ_i0)/σ_i² and w_0 = ln[π/(1 − π)] + Σ_i (μ_i0² − μ_i1²)/(2σ_i²).
Also we have P(Y = 0 | X) = 1 − P(Y = 1 | X). Thus GNB with shared per-attribute variances implies exactly the parametric form assumed by logistic regression.
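A quick numerical check of this equivalence (a sketch only: it reuses the hypothetical `train_gnb` and `gnb_log_posterior` helpers sketched earlier, forces a shared per-attribute variance as the derivation requires, and uses arbitrary random data):

```python
import numpy as np

def gnb_to_lr_weights(prior, mu, sigma_shared):
    """Map GNB parameters (with class-independent sigma_i) to logistic-regression weights."""
    pi = prior[1]                                   # pi = P(Y = 1)
    w = (mu[1] - mu[0]) / sigma_shared ** 2
    w0 = np.log(pi / (1.0 - pi)) + ((mu[0] ** 2 - mu[1] ** 2) / (2.0 * sigma_shared ** 2)).sum()
    return w0, w

# Tiny check on arbitrary data (assumed setup, not from the slides):
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
Y = (rng.random(500) < 0.4).astype(int)
prior, mu, sigma = train_gnb(X, Y, num_classes=2)    # helper sketched earlier
sigma_shared = np.sqrt((sigma ** 2).mean(axis=0))    # force a shared per-attribute variance
sigma_both = np.vstack([sigma_shared, sigma_shared]) # same sigma for both classes

w0, w = gnb_to_lr_weights(prior, mu, sigma_shared)
x = X[0]
scores = gnb_log_posterior(x, prior, mu, sigma_both) # helper sketched earlier
p_gnb = np.exp(scores[1]) / np.exp(scores).sum()
p_lr = 1.0 / (1.0 + np.exp(-(w0 + w @ x)))
print(p_gnb, p_lr)   # the two posteriors should agree up to floating-point error
```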

Training: Choosing W to Maximize the Conditional Data Log Likelihood
We choose parameters W that satisfy
W ← argmax_W ∏_l P(Y^l | X^l, W),
where l indexes the training examples. Equivalently, we can work with the log of the conditional likelihood:
l(W) = Σ_l ln P(Y^l | X^l, W).

Training: Choosing W to Maximize the Conditional Data Log Likelihood
Here we introduce the logistic model for P(Y | X, W). Then, writing z^l = w_0 + Σ_i w_i X_i^l,
l(W) = Σ_l [ Y^l ln P(Y^l = 1 | X^l, W) + (1 − Y^l) ln P(Y^l = 0 | X^l, W) ] = Σ_l [ Y^l z^l − ln(1 + exp(z^l)) ].

Training: Choosing W to Maximize the Conditional Data Log Likelihood
Taking the derivative gives
∂l(W)/∂w_i = Σ_l X_i^l ( Y^l − P̂(Y^l = 1 | X^l, W) ),
where Y^l is the observed label, P̂(Y^l = 1 | X^l, W) is the predicted probability, X_i^l is the responsibility of weight w_i for this prediction, and the difference is the prediction error: we want this to be zero!

Training: Gradient Ascent Rule to Optimize the Weights W
Step 1: Initialize the weights W (e.g., to zero).
Step 2: For all training examples, repeatedly update the weights in the direction of the gradient,
w_i ← w_i + η Σ_l X_i^l ( Y^l − P̂(Y^l = 1 | X^l, W) ),
where η is the step size and l indexes the training examples. Because the conditional log likelihood is a concave function of W, this gradient ascent procedure will converge to a global maximum.
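A compact sketch of this batch gradient-ascent loop in Python (the step size, iteration count, and zero initialization are illustrative choices):

```python
import numpy as np

def train_logistic_regression(X, Y, eta=0.01, num_iters=1000):
    """Batch gradient ascent on the conditional log likelihood.

    X: (m, n) array of features, Y: (m,) array of 0/1 labels.
    Returns (w0, w) for P(Y=1 | x) = 1 / (1 + exp(-(w0 + w.x))).
    """
    m, n = X.shape
    w0, w = 0.0, np.zeros(n)                         # Step 1: initialize the weights
    for _ in range(num_iters):                       # Step 2: repeated gradient steps
        p = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))      # predicted P(Y^l = 1 | X^l, W)
        error = Y - p                                # prediction error for each example l
        # w_i <- w_i + eta * sum_l X_i^l (Y^l - p^l); since the gradient is a sum
        # (not an average) over examples, eta is often scaled by 1/m in practice.
        w0 += eta * error.sum()
        w += eta * (X.T @ error)
    return w0, w
```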

Regularization in Logistic Regression
Overfitting is a problem especially when the data is very high dimensional and training data is sparse. One approach to reducing overfitting is regularization, in which we create a modified "penalized log likelihood function" that penalizes large values of W:
W ← argmax_W Σ_l ln P(Y^l | X^l, W) − (λ/2) ||W||²,
where λ controls the strength of the penalty. The derivative of this penalized log likelihood function is
∂/∂w_i = Σ_l X_i^l ( Y^l − P̂(Y^l = 1 | X^l, W) ) − λ w_i.

Regularization in Logistic Regression The penalty term can be interpreted as the result of imposing a Normal prior on W, with zero mean, and whose variance is related to 1/λ.

Training: Modified Gradient Ascent Rule to Optimize the Weights W
Step 1: Initialize the weights W (e.g., to zero).
Step 2: Repeatedly update the weights in the direction of the gradient of the penalized log likelihood,
w_i ← w_i + η Σ_l X_i^l ( Y^l − P̂(Y^l = 1 | X^l, W) ) − η λ w_i,
where η is the step size. Because the penalized conditional log likelihood is a concave function of W, this gradient ascent procedure will converge to a global maximum.
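In code, the only change from the earlier loop is the extra −η λ w term in the update (a sketch; `lam` denotes λ, its default is arbitrary, and leaving the intercept w0 unpenalized is an assumed but common choice):

```python
import numpy as np

def train_logistic_regression_l2(X, Y, eta=0.01, lam=0.1, num_iters=1000):
    """Gradient ascent on the penalized conditional log likelihood
    l(W) - (lam/2) * ||w||^2.  The intercept w0 is left unpenalized."""
    m, n = X.shape
    w0, w = 0.0, np.zeros(n)                        # Step 1: initialize the weights
    for _ in range(num_iters):                      # Step 2: repeated gradient steps
        p = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))
        error = Y - p
        w0 += eta * error.sum()
        w += eta * (X.T @ error - lam * w)          # the extra -eta*lam*w shrinks w toward 0
    return w0, w
```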

Logistic Regression for a Discrete Label
When Y takes one of K discrete values y_1, ..., y_K:
① Parametric model (logistic function): for k < K,
P(Y = y_k | X) = exp(w_k0 + Σ_i w_ki X_i) / (1 + Σ_{j=1}^{K−1} exp(w_j0 + Σ_i w_ji X_i)),
and P(Y = y_K | X) = 1 / (1 + Σ_{j=1}^{K−1} exp(w_j0 + Σ_i w_ji X_i)).
② Gradient ascent rule with regularization:
w_ki ← w_ki + η Σ_l X_i^l ( δ(Y^l = y_k) − P̂(Y^l = y_k | X^l, W) ) − η λ w_ki.
The previous boolean case is a special case of this new learning rule, when K = 2.
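A sketch of the multiclass model with the last class as the reference (so that K = 2 recovers the boolean rule above); the array shapes, defaults, and the unpenalized intercepts are illustrative choices:

```python
import numpy as np

def softmax_probs(X, W, b):
    """P(Y = y_k | x) for k = 0..K-1, with the last class as the reference class.

    W: (K-1, n) weights, b: (K-1,) intercepts, X: (m, n) features.
    """
    scores = X @ W.T + b                                      # (m, K-1)
    scores = np.hstack([scores, np.zeros((X.shape[0], 1))])   # reference class gets score 0
    scores -= scores.max(axis=1, keepdims=True)               # for numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def train_multiclass_lr(X, Y, num_classes, eta=0.01, lam=0.1, num_iters=1000):
    """Regularized gradient ascent:
    w_ki <- w_ki + eta * ( sum_l X_i^l (delta(Y^l = y_k) - P_hat_k^l) - lam * w_ki )."""
    m, n = X.shape
    W, b = np.zeros((num_classes - 1, n)), np.zeros(num_classes - 1)
    onehot = np.eye(num_classes)[Y]                # indicator delta(Y^l = y_k), shape (m, K)
    for _ in range(num_iters):
        P = softmax_probs(X, W, b)                 # predicted P(Y^l = y_k | X^l, W), (m, K)
        err = (onehot - P)[:, :num_classes - 1]    # only the K-1 parameterized classes update
        W += eta * (err.T @ X - lam * W)
        b += eta * err.sum(axis=0)                 # intercepts left unpenalized (a choice)
    return W, b
```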

Naive Bayes classifier vs. Logistic Regression

Naive Bayes Classifier vs. Logistic Regression

Nickname
- (Naive) Bayes classifier: a generative classifier. We can view the distribution P(X|Y) as describing how to generate random instances X conditioned on the target attribute Y.
- Logistic regression: a discriminative classifier. We can view the distribution P(Y|X) as directly discriminating the value of the target attribute Y for any given instance X.

Assumption
- Naive Bayes classifier: all attributes of X are conditionally independent given Y, which reduces the number of parameters dramatically.
- Logistic regression: function approximation with the logistic function y = 1/(1 + exp(−x)).

Choice
- GNB and LR converge to the same classifier (GNB = LR) as the number of training examples l → ∞, provided the Naive Bayes assumptions hold.
- GNB converges with on the order of log n training examples, whereas LR needs on the order of n (n being the number of attributes).
- GNB therefore tends to outperform LR when training data is scarce, and vice versa.
- NB has greater bias but lower variance than LR.
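One way to see the sample-efficiency claim empirically is a small synthetic experiment. This is only a sketch: it reuses the hypothetical `train_gnb`, `gnb_log_posterior`, and `train_logistic_regression` helpers from earlier, the data-generating parameters are arbitrary, and the Naive Bayes assumptions hold by construction, so GNB is expected to look comparatively good for small training sets:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m_test = 20, 5000
mu0, mu1 = np.zeros(n), 0.3 * np.ones(n)        # class-conditional means; NB assumptions hold

def sample(m):
    y = rng.integers(0, 2, size=m)
    x = rng.normal(np.where(y[:, None] == 1, mu1, mu0), 1.0)
    return x, y

X_test, Y_test = sample(m_test)
for m_train in (20, 50, 200, 1000):
    X_tr, Y_tr = sample(m_train)
    prior, mu, sigma = train_gnb(X_tr, Y_tr, num_classes=2)
    gnb_pred = np.array([np.argmax(gnb_log_posterior(x, prior, mu, sigma)) for x in X_test])
    # eta is scaled by 1/m_train so the summed-gradient step stays stable
    w0, w = train_logistic_regression(X_tr, Y_tr, eta=1.0 / m_train, num_iters=2000)
    lr_pred = (w0 + X_test @ w > 0).astype(int)
    print(m_train, (gnb_pred == Y_test).mean(), (lr_pred == Y_test).mean())
```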