CS489/698: Intro to ML. Lecture 04: Logistic Regression. Yao-Liang Yu, 9/21/17.
Outline: Announcements; Bernoulli model; Logistic regression; Computation.
Announcements: Assignment 1 due next Tuesday.
Next section: Bernoulli model.
Classification revisited: ŷ = sign(xᵀw + b). How confident are we about ŷ? |xᵀw + b| seems a good indicator, but it is real-valued and hard to interpret; we need a way to transform it into [0,1]. Better(?) idea: learn the confidence directly.
Conditional probability P(Y=1 | X=x): conditional on seeing x, what is the chance of this instance being positive, i.e., Y=1? Obviously a value in [0,1]. P(Y=0 | X=x) = 1 − P(Y=1 | X=x) if there are two classes; more generally, the conditional probabilities sum to 1. Notation (simplex): Δ^{k−1} := { p ∈ Rᵏ : p ≥ 0, Σᵢ pᵢ = 1 }.
Reduction to a harder problem: P(Y=1 | X=x) = E(1_{Y=1} | X=x). Let Z = 1_{Y=1}; then P(Y=1 | X=x) is the regression function for (X, Z). Use linear regression for the binary Z? Exploit structure! Conditional probabilities live in a simplex. Never reduce to an unnecessarily harder problem.
Bernoulli model: let P(Y=1 | X=x) = p(x; w), parameterized by w. Conditional likelihood on {(x₁, y₁), …, (xₙ, yₙ)}: if the examples are independent, it simplifies to Πᵢ P(Y=yᵢ | X=xᵢ) = Πᵢ p(xᵢ; w)^{yᵢ} (1 − p(xᵢ; w))^{1−yᵢ}, assuming yᵢ is {0,1}-valued.
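A minimal numpy sketch of this conditional log-likelihood (the function names are mine, and `p` stands in for whatever parameterization p(x; w) we choose later):

```python
import numpy as np

def cond_log_likelihood(p, X, y, w):
    """Conditional log-likelihood of {0,1}-valued labels y given the rows of X,
    assuming the examples are independent: sum_i log P(Y=y_i | X=x_i)."""
    q = p(X, w)                                   # q_i = p(x_i; w) = P(Y=1 | X=x_i)
    return np.sum(y * np.log(q) + (1 - y) * np.log(1 - q))
```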
Naïve solution: find w to maximize the conditional likelihood. What is the solution if p(x; w) does not depend on x? And what if p(x; w) does not depend on … ?
Generalized linear models (GLM). Logistic regression: y ~ Bernoulli(p), with p = p(x; w). (Weighted) least-squares regression: y ~ Normal(μ, σ²), with μ = μ(x; w). GLM in general: y ~ exp(θ φ(y) − A(θ)), with natural parameter θ, sufficient statistic φ(y), and log-partition function A(θ).
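For concreteness (standard algebra, not shown on the slide), the Bernoulli case fits this form exactly:

\[
P(Y=y) = p^{y}(1-p)^{1-y} = \exp\!\big(y\log\tfrac{p}{1-p} + \log(1-p)\big) = \exp\!\big(\theta\,\varphi(y) - A(\theta)\big),
\]
\[
\theta = \log\tfrac{p}{1-p},\qquad \varphi(y)=y,\qquad A(\theta)=\log(1+e^{\theta}),\qquad p=\tfrac{1}{1+e^{-\theta}}.
\]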
Next section: Logistic regression.
Logit transform. Try p(x; w) = wᵀx? But p ≥ 0 (indeed p ∈ [0,1]) is not guaranteed… Try log p(x; w) = wᵀx? Better, but the LHS is nonpositive while the RHS is real-valued… Logit transform: log [ p(x; w) / (1 − p(x; w)) ] = wᵀx, or equivalently the odds ratio p/(1 − p) = exp(wᵀx), i.e., p(x; w) = 1 / (1 + exp(−wᵀx)).
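A small sketch of the transform and its inverse (function names are mine):

```python
import numpy as np

def logit(p):
    """Logit transform: maps p in (0,1) to the log-odds log(p / (1 - p)) in R."""
    return np.log(p / (1 - p))

def sigmoid(t):
    """Inverse of the logit: maps t = w^T x in R back to a probability in (0,1)."""
    return 1.0 / (1.0 + np.exp(-t))

def p_model(X, w):
    """Logistic regression model: P(Y=1 | X=x) = sigmoid(w^T x), one value per row of X."""
    return sigmoid(X @ w)
```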
Prediction with confidence: ŷ = 1 if p = P(Y=1 | X=x) > ½, which holds iff wᵀx > 0. Decision boundary: wᵀx = 0. So ŷ = sign(wᵀx) as before, but now with confidence p(x; w).
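A hedged sketch of this prediction rule (helper name is mine):

```python
import numpy as np

def predict_with_confidence(X, w):
    """Predict 1 iff w^T x > 0 (equivalently, iff sigmoid(w^T x) > 1/2),
    and return the confidence p(x; w) alongside the label."""
    scores = X @ w
    p = 1.0 / (1.0 + np.exp(-scores))
    y_hat = (scores > 0).astype(int)
    return y_hat, p
```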
Not just a classification algorithm. Logistic regression does more than classification: it estimates conditional probabilities, under the logit-transform assumption. Having confidence in the prediction is nice; the price is an assumption that may or may not hold. If classification is the sole goal, we are doing extra work; as we shall see, SVM only estimates the decision boundary.
More than logistic regression. F(p) transforms p from [0,1] to R; we then equate F(p) with a linear function wᵀx. But there are many other choices for F: precisely the inverse of any (cumulative) distribution function!
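For example, the logit is the inverse of the logistic CDF, while choosing the standard normal CDF instead gives probit regression; a sketch (assuming scipy is available; names are mine):

```python
import numpy as np
from scipy.stats import logistic, norm   # assumes scipy is available

def p_logistic(X, w):
    """F = logit, so F^{-1} is the logistic CDF: ordinary logistic regression."""
    return logistic.cdf(X @ w)

def p_probit(X, w):
    """F^{-1} is the standard normal CDF: probit regression, another valid choice of F."""
    return norm.cdf(X @ w)
```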
Logistic distribution: CDF F(x; μ, s) = 1 / (1 + exp(−(x − μ)/s)), with mean μ and variance s²π²/3.
Next section: Computation.
Maximum likelihood: minimize the negative log-likelihood, ℓ(w) = −Σᵢ [ yᵢ log p(xᵢ; w) + (1 − yᵢ) log(1 − p(xᵢ; w)) ] = Σᵢ [ log(1 + exp(wᵀxᵢ)) − yᵢ wᵀxᵢ ].
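A minimal numpy sketch of this objective and a simple way to minimize it by gradient descent (the optimizer choice and all names are mine; the lecture's own method, Newton's algorithm, is on the next slide):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def nll(w, X, y):
    """Negative log-likelihood for y in {0,1}: sum_i log(1 + exp(w^T x_i)) - y_i * w^T x_i."""
    t = X @ w
    return np.sum(np.logaddexp(0.0, t) - y * t)

def nll_grad(w, X, y):
    """Gradient of the NLL: X^T (sigmoid(Xw) - y)."""
    return X.T @ (sigmoid(X @ w) - y)

def fit_gradient_descent(X, y, lr=0.1, iters=1000):
    """Plain gradient descent on the NLL (step size and iteration count are arbitrary choices)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= lr * nll_grad(w, X, y)
    return w
```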
Newton's algorithm: w ← w − η H⁻¹ g, with gradient g = Σᵢ (p(xᵢ; w) − yᵢ) xᵢ and Hessian H = Σᵢ p(xᵢ; w)(1 − p(xᵢ; w)) xᵢxᵢᵀ, which is PSD. Uncertain predictions (p near ½) get bigger weight. With η = 1 this is iteratively reweighted least-squares.
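A sketch of the update (the ridge term and all names are my additions for numerical safety, not part of the slide):

```python
import numpy as np

def newton_logistic(X, y, iters=20, eta=1.0, ridge=1e-8):
    """Newton / IRLS updates w <- w - eta * H^{-1} g, with
    g = X^T (p - y) and H = X^T R X, where R = diag(p * (1 - p)).
    The small ridge term is my addition to keep H safely invertible."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        g = X.T @ (p - y)
        r = p * (1 - p)                              # largest when p is near 1/2 (uncertain predictions)
        H = X.T @ (X * r[:, None]) + ridge * np.eye(d)
        w -= eta * np.linalg.solve(H, g)
    return w
```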
A word about implementation: numerically computing exponentials can be tricky; they easily underflow or overflow. The usual trick: estimate the range of the exponents and shift the mean of the exponents to 0.
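A common variant of this trick shifts by the maximum rather than the mean; a sketch for the log(1 + exp(t)) term that appears in the logistic loss (numpy's built-in np.logaddexp(0, t) computes the same quantity stably):

```python
import numpy as np

def log1pexp(t):
    """Numerically stable log(1 + exp(t)), the per-example logistic loss term.
    Shifting by max(t, 0) keeps every exponent non-positive, so exp never overflows:
    log(1 + e^t) = max(t, 0) + log(1 + e^{-|t|})."""
    t = np.asarray(t, dtype=float)
    return np.maximum(t, 0.0) + np.log1p(np.exp(-np.abs(t)))

# e.g. log1pexp(1000.0) returns 1000.0, whereas np.log(1 + np.exp(1000.0)) overflows to inf
```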
Robustness: the logistic loss has a bounded derivative. Via a variational representation in terms of the exponential loss, examples with a larger exp loss get smaller weights.
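A one-line check of the bounded-derivative claim (my notation, with labels in {±1} and margin t = y wᵀx):

\[
\ell(t) = \log(1 + e^{-t}), \qquad \ell'(t) = \frac{-e^{-t}}{1 + e^{-t}} = -\sigma(-t), \qquad |\ell'(t)| \le 1 \ \text{for all } t,
\]

so a single badly misclassified example changes the gradient by at most a bounded amount, unlike the exponential loss e^{−t}, whose derivative is unbounded.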
More than 2 classes: softmax, p_c(x; W) = exp(w_cᵀx) / Σ_{c'} exp(w_{c'}ᵀx). Again the entries are nonnegative and sum to 1. Negative log-likelihood (with y one-hot): −Σᵢ Σ_c y_{ic} log p_c(xᵢ; W).
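A short numpy sketch of the softmax and the one-hot NLL (shapes and names are my assumptions: X is n×d, W is d×k, Y is n×k one-hot):

```python
import numpy as np

def softmax(S):
    """Row-wise softmax: entries are nonnegative and each row sums to 1.
    Subtracting the row maximum first is the usual overflow guard."""
    S = S - S.max(axis=1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def multiclass_nll(W, X, Y):
    """Negative log-likelihood for one-hot labels Y (n x k) and scores X @ W (n x k):
    -sum_i log softmax(x_i^T W)_{c_i}, where c_i is the true class of example i."""
    P = softmax(X @ W)
    return -np.sum(Y * np.log(P + 1e-12))    # tiny epsilon is my addition to avoid log(0)
```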
Questions?