# Multivariate linear models for regression and classification

Outline:

1) multivariate linear regression
2) linear classification (perceptron)
3) logistic regression

Logistic Regression (lecture 9 on amlbook.com)

Neuron analogy: the dot product $w^T x$ combines the attributes into a scalar signal $s$. How the signal is used defines the hypothesis set.

In logistic regression, the signal becomes the argument of a function whose output behaves like a probability: the logistic function $\theta(s) = e^s / (1 + e^s)$, which rises smoothly from 0 to 1.
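As a minimal sketch, assuming the function referred to here is the standard logistic function $\theta(s) = 1/(1 + e^{-s})$ (the name `theta` is chosen for illustration), the following shows how a signal of any size is squashed into the interval (0, 1):

```python
import math

def theta(s: float) -> float:
    """Logistic function 1 / (1 + e^(-s)); branches on sign to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)          # s < 0, so z is in (0, 1) and cannot overflow
    return z / (1.0 + z)

# Signals far below, at, and far above zero
for s in (-10.0, 0.0, 10.0):
    print(f"s = {s:6.1f}  theta(s) = {theta(s):.5f}")
```

A strongly negative signal maps close to 0, a zero signal maps to exactly 0.5, and a strongly positive signal maps close to 1.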

Application: risk of heart attack. Objective: find $w$ such that the risk score $s = w^T x \gg 0$ for patients who have had a heart attack ($\theta(s) \approx 1$) and $s \ll 0$ for those who have not ($\theta(s) \approx 0$).

More specifically (see text p. 91): the dataset is drawn from a distribution $P(y \mid x)$, which is related to the hypothesis $h(x)$ by

$$P(y_n \mid x_n) = \begin{cases} h(x_n) & \text{if } y_n = +1 \\ 1 - h(x_n) & \text{if } y_n = -1 \end{cases}$$

The logistic function has the property $\theta(-s) = 1 - \theta(s)$. Hence, with $h(x) = \theta(w^T x)$, both cases are satisfied by the single expression $P(y_n \mid x_n) = \theta(y_n w^T x_n)$.

Now use maximum likelihood estimation (MLE) to derive an error function that we minimize to find the optimum $w$.
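The single-formula claim can be checked numerically. The sketch below assumes $h(x) = \theta(w^T x)$ with the standard logistic function; the weights and attributes are made-up illustration values, not data from the lecture.

```python
import math

def theta(s: float) -> float:
    """Logistic function 1 / (1 + e^(-s)); branches on sign to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)

def prob_of_label(w, x, y):
    """P(y | x) = theta(y * w^T x) for a label y in {+1, -1}."""
    s = sum(wi * xi for wi, xi in zip(w, x))   # signal w^T x
    return theta(y * s)

w = [-1.0, 0.8, 0.5]      # hypothetical weights (first entry is the bias weight)
x = [1.0, 2.0, -1.0]      # hypothetical attributes, with x[0] = 1 for the bias

h = theta(sum(wi * xi for wi, xi in zip(w, x)))   # h(x) = theta(w^T x)
p_pos = prob_of_label(w, x, +1)                   # should match h
p_neg = prob_of_label(w, x, -1)                   # should match 1 - h

print(p_pos, h)
print(p_neg, 1 - h)
```

The two printed pairs agree up to floating-point rounding, and the two probabilities sum to 1.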

Recall that MLE is used to estimate the parameters of a probability distribution given a sample $X$ drawn from that distribution. In logistic regression, the parameters are the weights $w$.

Likelihood of $w$ given the sample $X$: $l(w \mid X) = p(X \mid w) = \prod_n p(x_n \mid w)$

Log likelihood: $L(w \mid X) = \log l(w \mid X) = \sum_n \log p(x_n \mid w)$

In logistic regression, the factor contributed by example $n$ is $P(y_n \mid x_n) = \theta(y_n w^T x_n)$.

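Since log is a monotonically increasing function, maximizing the log likelihood is equivalent to minimizing the negative log likelihood. The text also normalizes by dividing by $N$; hence the error function becomes

$$E_{in}(w) = -\frac{1}{N}\sum_{n=1}^{N} \ln \theta(y_n w^T x_n) = \frac{1}{N}\sum_{n=1}^{N} \ln\left(1 + e^{-y_n w^T x_n}\right)$$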
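A sketch of this error function in code, assuming it takes the form $E_{in}(w) = \frac{1}{N}\sum_n \ln(1 + e^{-y_n w^T x_n})$, which is what the normalized negative log likelihood gives for the standard logistic function. The function names and the toy data are illustrative.

```python
import math

def softplus(a: float) -> float:
    """ln(1 + e^a), computed without overflow for large |a|."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def cross_entropy_error(w, X, y):
    """E_in(w) = (1/N) * sum_n ln(1 + exp(-y_n * w^T x_n))."""
    total = 0.0
    for x_n, y_n in zip(X, y):
        s = sum(wi * xi for wi, xi in zip(w, x_n))   # signal w^T x_n
        total += softplus(-y_n * s)
    return total / len(X)

# Toy data: x[0] = 1 is the bias coordinate, labels are +1 / -1
X = [[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.5]]
y = [+1, +1, -1, -1]

print(cross_entropy_error([0.0, 0.0], X, y))   # all-zero weights
print(cross_entropy_error([0.0, 2.0], X, y))   # weights that separate the toy data
```

With all-zero weights every example has probability 0.5, so the error equals $\ln 2$; weights that separate the toy data give a smaller value.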

The error function of logistic regression (called cross entropy) has the desired properties. If $x_n$ are the attributes of a person who has had a heart attack, then $w^T x_n \gg 0$ and $y_n = +1$, so the contribution to $E_{in}(w)$ is small. If $x_n$ are the attributes of a person who has not had a heart attack, then $w^T x_n \ll 0$ and $y_n = -1$, so the contribution to $E_{in}(w)$ is again small.
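To put numbers on "small", the sketch below evaluates a single example's contribution, assuming the per-example term is $\ln(1 + e^{-y_n w^T x_n})$. The signal values are arbitrary illustration values.

```python
import math

def contribution(y: int, s: float) -> float:
    """Per-example cross-entropy term ln(1 + e^(-y*s)) for label y and signal s."""
    a = -y * s
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))   # overflow-safe ln(1 + e^a)

for y, s in [(+1, 6.0), (-1, -6.0), (+1, -6.0), (-1, 6.0)]:
    print(f"y = {y:+d}, s = {s:+.1f}: contribution = {contribution(y, s):.4f}")
```

The first two cases, where label and signal agree, give values near zero. The last two, where they disagree, give values slightly above $|s|$, so confidently misclassified examples dominate the error.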

The error function of linear regression allows "1-step" optimization. This is not true for the error function of logistic regression: optimization is iterative, and the method is "steepest descent".
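For contrast, the "1-step" solution of linear regression can be written out. This is the standard least-squares result, stated under the assumption that $X$ is the data matrix with rows $x_n^T$, $y$ is the vector of targets, and $X^T X$ is invertible:

```latex
E_{in}(w) = \frac{1}{N}\,\lVert Xw - y \rVert^2,
\qquad
\nabla E_{in}(w) = \frac{2}{N}\,X^T (Xw - y) = 0
\;\Longrightarrow\;
w_{lin} = (X^T X)^{-1} X^T y .
```

For the cross-entropy error, setting the gradient to zero gives an equation that is nonlinear in $w$ and has no closed-form solution, which is why an iterative method is needed.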

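Method of steepest (gradient) descent, fixed step size $\eta$: $w(1) = w(0) + \eta \hat{v}$, where $\hat{v}$ is the unit vector in the direction of the negative gradient, $\hat{v} = -\nabla E_{in}(w(0)) / \lVert \nabla E_{in}(w(0)) \rVert$.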

Method of steepest (gradient) descent, fixed learning rate $\eta$: $w(1) = w(0) + \Delta w$ with $\Delta w = -\eta \nabla E_{in}(w(0))$. Weights change fastest where the gradient is largest. For $E_{in}$ = cross entropy, the gradient is analytical.
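The two update rules can be compared side by side. This is a sketch that assumes each step moves against the gradient (downhill); the function names, the value of `eta`, and the example gradient are illustrative.

```python
import math

def step_fixed_size(w, grad, eta):
    """Move a fixed distance eta along the unit vector v_hat = -grad / ||grad||."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(w)                      # already at a stationary point
    return [wi - eta * g / norm for wi, g in zip(w, grad)]

def step_fixed_rate(w, grad, eta):
    """Move by delta_w = -eta * grad, so the step shrinks as the gradient shrinks."""
    return [wi - eta * g for wi, g in zip(w, grad)]

w0 = [0.0, 0.0]
grad = [3.0, 4.0]                           # example gradient with norm 5

w_size = step_fixed_size(w0, grad, eta=0.1) # step of length 0.1
w_rate = step_fixed_rate(w0, grad, eta=0.1) # step of length 0.1 * 5 = 0.5
print(w_size)
print(w_rate)
```

With a fixed learning rate the step length is proportional to the gradient norm, so steps are large on steep slopes and shrink automatically near a minimum.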

Logistic regression algorithm
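A sketch of how such an algorithm might look, assuming batch gradient descent on the cross-entropy error with a fixed learning rate, the standard logistic function, and the gradient $\nabla E_{in}(w) = -\frac{1}{N}\sum_n y_n x_n / (1 + e^{y_n w^T x_n})$. The names, the learning rate, the iteration cap, and the toy data are all illustrative choices.

```python
import math

def theta(s: float) -> float:
    """Logistic function 1 / (1 + e^(-s)); branches on sign to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)

def gradient(w, X, y):
    """Gradient of E_in: -(1/N) * sum_n y_n * x_n * theta(-y_n * w^T x_n)."""
    N = len(X)
    g = [0.0] * len(w)
    for x_n, y_n in zip(X, y):
        s = sum(wi * xi for wi, xi in zip(w, x_n))
        coeff = -y_n * theta(-y_n * s) / N      # 1 / (1 + e^(y*s)) equals theta(-y*s)
        for i, xi in enumerate(x_n):
            g[i] += coeff * xi
    return g

def logistic_regression(X, y, eta=0.1, max_iters=2000):
    """Batch gradient descent with a fixed learning rate and a fixed iteration count."""
    w = [0.0] * len(X[0])                       # initialize weights at zero
    for _ in range(max_iters):
        g = gradient(w, X, y)
        w = [wi - eta * gi for wi, gi in zip(w, g)]   # w(t+1) = w(t) - eta * gradient
    return w

# Toy data (not linearly separable): x[0] = 1 is the bias coordinate
X = [[1.0, 2.0], [1.0, 1.0], [1.0, -0.5], [1.0, -1.0], [1.0, -2.0], [1.0, 0.5]]
y = [+1, +1, +1, -1, -1, -1]

w = logistic_regression(X, y)
for x_n in X:
    p = theta(sum(wi * xi for wi, xi in zip(w, x_n)))
    print(x_n[1], round(p, 3))                  # attribute value and estimated P(y = +1 | x)
```

The loop here simply runs for a fixed number of iterations; in practice the iteration cap is usually combined with other termination checks.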

How to compute the gradient of $E_{in}$
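A short derivation, assuming the cross-entropy error takes the form $E_{in}(w) = \frac{1}{N}\sum_n \ln(1 + e^{-y_n w^T x_n})$ and $\theta$ is the standard logistic function. Differentiating each term with the chain rule gives:

```latex
\nabla E_{in}(w)
  = \frac{1}{N}\sum_{n=1}^{N}
    \frac{-y_n x_n\, e^{-y_n w^T x_n}}{1 + e^{-y_n w^T x_n}}
  = -\frac{1}{N}\sum_{n=1}^{N}
    \frac{y_n x_n}{1 + e^{\,y_n w^T x_n}}
  = \frac{1}{N}\sum_{n=1}^{N} -y_n x_n\,\theta(-y_n w^T x_n).
```

Examples that are already classified correctly with a large margin contribute almost nothing to the gradient, because $\theta(-y_n w^T x_n) \approx 0$ for them.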

How to know when to stop
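Common practice combines several criteria. The sketch below assumes three of them: a cap on the number of iterations, a small gradient norm, and a small change in the error between iterations. The threshold values and the function name are illustrative defaults, not values from the lecture.

```python
import math

def should_stop(t, grad, err_prev, err_curr,
                max_iters=10_000, grad_tol=1e-6, err_tol=1e-9):
    """Return (stop, reason) for iteration t of gradient descent."""
    if t >= max_iters:
        return True, "iteration cap reached"
    if math.sqrt(sum(g * g for g in grad)) < grad_tol:
        return True, "gradient norm below tolerance"
    if err_prev is not None and abs(err_prev - err_curr) < err_tol:
        return True, "error no longer changing"
    return False, "continue"

print(should_stop(5, [0.3, -0.2], 0.70, 0.65))        # still making progress
print(should_stop(5, [1e-8, 0.0], 0.70, 0.65))        # gradient has vanished
print(should_stop(10_000, [0.3, -0.2], 0.70, 0.65))   # out of iterations
```

Relying on a small gradient alone can stop too early on a flat region, and relying on the iteration cap alone says nothing about the quality of the result, which is why the criteria are usually combined.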

Assignment 6: Due 10-30-14
