Logistic Regression Rong Jin

Logistic Regression Model  In Gaussian generative model:  Generalize the ratio to a linear model Parameters: w and c

Logistic Regression Model  In Gaussian generative model:  Generalize the ratio to a linear model Parameters: w and c

Logistic Regression Model  The log-ratio of positive class to negative class  Results

Logistic Regression Model  The log-ratio of positive class to negative class  Results

Logistic Regression Model  Assume the inputs and outputs are related in the log linear function  Estimate weights: MLE approach

Example 1: Heart Disease
Input feature x: age group id (eight five-year bins)
  1: 25-29   2: 30-34   3: 35-39   4: 40-44
  5: 45-49   6: 50-54   7: 55-59   8: 60-64
Output y: having heart disease or not
  +1: having heart disease
  −1: no heart disease

Example 1: Heart Disease
Logistic regression model: p(+|i) = 1 / (1 + exp(−(i·w + c))) for age group i.
Learning w and c by MLE; numerical optimization gives w = 0.58, c = −3.34.

Example 1: Heart Disease  W = 0.58 An old person is more likely to have heart disease  C = i  w+c < 0  p(+|i) < p(-|i) i  w+c > 0  p(+|i) > p(-|i) i  w+c = 0  decision boundary  i* = 5.78  53 year old

Naïve Bayes Solution
Fitting is inaccurate: the age distribution within each class is not Gaussian.
Resulting decision boundary: i* = 5.59, close to the logistic regression estimate (5.78).
Even though naïve Bayes does not fit the input patterns well, it still works fine for the decision boundary.
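Where the naïve Bayes boundary comes from: with one Gaussian fitted per class, the boundary is the point where the two posteriors are equal. As a sketch, assuming equal variances σ² in both classes (an assumption made here, not stated on the slides):

```latex
p(+)\,\mathcal{N}(i^*;\mu_+,\sigma^2) = p(-)\,\mathcal{N}(i^*;\mu_-,\sigma^2)
\;\Longrightarrow\;
i^* = \frac{\mu_+ + \mu_-}{2} + \frac{\sigma^2}{\mu_+ - \mu_-}\,\log\frac{p(-)}{p(+)}
```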

Problems with Using Histogram Data?

Uneven Sampling for Different Ages

Solution
w = 0.63, c ≈ −3.56 (implied by i* = −c/w), giving decision boundary i* = 5.65.

Example: Text Classification  Input x: a binary vector Each word is a different dimension x i = 0 if the ith word does not appear in the document x i = 1 if it appears in the document  Output y: interesting document or not +1: interesting -1: uninteresting

Example 2: Text Classification
Doc 1: "The purpose of the Lady Bird Johnson Wildflower Center is to educate people around the world, …"
Doc 2: "Rain Bird is one of the leading irrigation manufacturers in the world, providing complete irrigation solutions for people…"

  term   | the | world | people | company | center | …
  Doc 1  |  1  |   1   |   1    |    0    |   1    | …
  Doc 2  |  1  |   1   |   1    |    0    |   0    | …

Example 2: Text Classification  Logistic regression model Every term t i is assigned with a weight w i  Learning parameters: MLE approach  Need numerical solutions

Example 2: Text Classification  Weight w i w i > 0: term t i is a positive evidence w i < 0: term t i is a negative evidence w i = 0: term t i is irrelevant to whether the document is intesting The larger the | w i |, the more important t i term is determining whether the document is interesting.  Threshold c

Example 2: Text Classification
Dataset: Reuters
Classification accuracy:
  Naïve Bayes: 77%
  Logistic regression: 88%

Why Does Logistic Regression Work Better for Text Classification?
Common words:
  Small weights in logistic regression
  Large weights in naïve Bayes, where a word's weight behaves like p(w|+) − p(w|−)
Independence assumption:
  Naïve Bayes assumes each word is generated independently
  Logistic regression is able to take the correlation of words into account

Comparison

Generative model (e.g., naïve Bayes):
  Models P(x|y), i.e. the input patterns
  Usually converges fast; computation is cheap
  Robust to noisy data
  But: usually performs worse

Discriminative model (e.g., logistic regression):
  Models P(y|x) directly, i.e. the decision boundary
  Usually better performance
  But: slower convergence, more expensive computation, sensitive to noisy data