Published by Martha Carson. Modified over 4 years ago.

Slide 1: Multivariate linear models for regression and classification

Outline:
1) multivariate linear regression
2) linear classification (perceptron)
3) logistic regression

Slide 2: Logistic Regression (lecture 9 on amlbook.com)

Slide 3: Neuron analogy

The dot product wᵀx combines the attributes into a scalar signal s. How the signal is used defines the hypothesis set.

Slide 4: In logistic regression, the signal becomes the argument of a function with the properties of a probability: θ(s) = eˢ / (1 + eˢ) maps any real signal into (0, 1).
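As a sketch of this property, a minimal Python implementation of the logistic function (the function name is mine, not from the slides):

```python
import math

def theta(s):
    """Logistic function theta(s) = e^s / (1 + e^s) = 1 / (1 + e^-s).
    Maps any real-valued signal into (0, 1), like a probability."""
    return 1.0 / (1.0 + math.exp(-s))

# Large positive signals map near 1, large negative signals near 0.
print(theta(0.0))    # 0.5 exactly
print(theta(10.0))   # close to 1
print(theta(-10.0))  # close to 0
```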

Slide 5: Application: risk of heart attack

Objective: find w such that the risk score wᵀx >> 0 for patients who have had a heart attack (θ(s) ≈ 1) and wᵀx << 0 for those who have not (θ(s) ≈ 0).

Slide 6: More specifically (see text p. 91)

The dataset is drawn from a distribution P(y|x), which is related to the hypothesis h(x) by
P(y_n|x_n) = h(x_n) if y_n = +1
P(y_n|x_n) = 1 − h(x_n) if y_n = −1
The logistic function has the property θ(−s) = 1 − θ(s). Hence both relationships are satisfied by
P(y_n|x_n) = θ(y_n wᵀx_n)
Now use maximum likelihood estimation (MLE) to derive an error function that we minimize to find the optimum w.
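A quick numerical check that the symmetry θ(−s) = 1 − θ(s) really does collapse both branches of P(y_n|x_n) into the single expression θ(y_n wᵀx_n) (a sketch; the helper names are mine):

```python
import math

def theta(s):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-s))

# Symmetry used on the slide: theta(-s) = 1 - theta(s).
for s in (-3.0, -0.5, 0.0, 1.2, 7.0):
    assert abs(theta(-s) - (1.0 - theta(s))) < 1e-12

def p_y_given_x(y, s):
    """Two-branch definition: P(y|x) = h(x) if y = +1, else 1 - h(x),
    where h(x) = theta(s) and s = w.x."""
    return theta(s) if y == +1 else 1.0 - theta(s)

# Both branches agree with the single formula theta(y * s).
for y in (+1, -1):
    for s in (-2.0, 0.3, 4.0):
        assert abs(p_y_given_x(y, s) - theta(y * s)) < 1e-12
print("both checks pass")
```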

Slide 7: Recall that MLE is used to estimate the parameters of a probability distribution given a sample X drawn from that distribution. In logistic regression, the parameters are the weights w.

Likelihood of w given the sample: l(w|X) = p(X|w) = ∏_n p(y_n|x_n, w)
Log likelihood: L(w|X) = log l(w|X) = ∑_n log p(y_n|x_n, w)
In logistic regression, p(y_n|x_n, w) = θ(y_n wᵀx_n).

Slide 8: Since log is a monotone increasing function, maximizing the log-likelihood is equivalent to minimizing −log(likelihood). The text also normalizes by dividing by N; hence the error function becomes

E_in(w) = (1/N) ∑_{n=1}^N ln(1 + exp(−y_n wᵀx_n))
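A direct transcription of that error function into Python (a sketch; the toy data and names are illustrative only):

```python
import math

def e_in(w, X, y):
    """Cross-entropy error: E_in(w) = (1/N) * sum_n ln(1 + exp(-y_n * w.x_n))."""
    total = 0.0
    for xn, yn in zip(X, y):
        s = sum(wi * xi for wi, xi in zip(w, xn))  # signal w.x_n
        total += math.log(1.0 + math.exp(-yn * s))
    return total / len(X)

# Toy data: first coordinate of each x is the constant bias term.
X = [(1.0, 2.0), (1.0, -1.5)]
y = [+1, -1]
print(e_in([0.0, 0.0], X, y))  # ln(2) ~ 0.693 when w = 0
```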

Slide 9: The error function of logistic regression (called cross-entropy) has the desired properties. If x_n are the attributes of a person who has had a heart attack, then wᵀx_n >> 0 and y_n > 0, so the contribution to E_in(w) is small. If x_n are the attributes of a person who has not had a heart attack, then wᵀx_n << 0 and y_n < 0, so the contribution to E_in(w) is again small.

Slide 10: The error function of linear regression allows "one-step" (closed-form) optimization. This is not true for the error function of logistic regression: optimization is iterative, by the method of steepest descent.

Slide 11: Method of steepest (gradient) descent, fixed step size

w(1) = w(0) + η v̂

where v̂ is the unit vector pointing in the direction of steepest descent, i.e. opposite the gradient, and η is the fixed step size.

Slide 12: Method of steepest (gradient) descent, fixed learning rate

w(1) = w(0) + Δw, with Δw = −η ∇E_in(w(0))

The weights change fastest where the gradient is largest. For E_in = cross-entropy, the gradient is analytical:

∇E_in(w) = −(1/N) ∑_{n=1}^N y_n x_n / (1 + exp(y_n wᵀx_n))
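Under those definitions, one fixed-learning-rate update can be sketched as follows (η and the toy data are illustrative, not from the slides):

```python
import math

def gradient(w, X, y):
    """Analytic gradient of cross-entropy:
    grad E_in(w) = -(1/N) * sum_n y_n * x_n / (1 + exp(y_n * w.x_n))."""
    N, d = len(X), len(w)
    g = [0.0] * d
    for xn, yn in zip(X, y):
        s = sum(wi * xi for wi, xi in zip(w, xn))
        coef = -yn / ((1.0 + math.exp(yn * s)) * N)
        for i in range(d):
            g[i] += coef * xn[i]
    return g

def update(w, X, y, eta=0.1):
    """Fixed-learning-rate step: w(1) = w(0) + delta_w, delta_w = -eta * grad."""
    g = gradient(w, X, y)
    return [wi - eta * gi for wi, gi in zip(w, g)]

X = [(1.0, 2.0), (1.0, -1.5)]   # bias coordinate first
y = [+1, -1]
w1 = update([0.0, 0.0], X, y)
print(w1)  # second weight grows positive, toward separating the two points
```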

Slide 13: Logistic regression algorithm
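The algorithm itself was an image on the original slide; what follows is my own minimal sketch of the standard recipe (initialize w, then loop: compute the gradient, take a fixed-rate step, stop when the gradient is small):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_logistic(X, y, eta=0.5, max_iters=2000, tol=1e-3):
    """Batch gradient descent on the cross-entropy error."""
    N, d = len(X), len(X[0])
    w = [0.0] * d                                   # 1. initialize weights
    for _ in range(max_iters):
        g = [0.0] * d                               # 2. gradient of E_in
        for xn, yn in zip(X, y):
            s = sum(wi * xi for wi, xi in zip(w, xn))
            coef = -yn * sigmoid(-yn * s) / N       # 1/(1+e^{ys}) = sigmoid(-ys)
            for i in range(d):
                g[i] += coef * xn[i]
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break                                   # 4. stop: gradient ~ 0
        w = [wi - eta * gi for wi, gi in zip(w, g)] # 3. fixed-rate update
    return w

# Toy separable data, bias coordinate first.
X = [(1.0, 2.0), (1.0, 1.0), (1.0, -1.0), (1.0, -2.0)]
y = [+1, +1, -1, -1]
w = train_logistic(X, y)
scores = [sum(wi * xi for wi, xi in zip(w, xn)) for xn in X]
print(scores)  # positive for the y = +1 examples, negative for y = -1
```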

Slide 14: How to compute the gradient of E_in
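The computation on this slide was an image in the original; one way to sketch it, and to verify the analytic formula against central finite differences, is the following (all names and data are mine):

```python
import math

def e_in(w, X, y):
    """Cross-entropy error (1/N) sum ln(1 + exp(-y_n w.x_n))."""
    N = len(X)
    return sum(math.log(1.0 + math.exp(-yn * sum(wi * xi for wi, xi in zip(w, xn))))
               for xn, yn in zip(X, y)) / N

def analytic_grad(w, X, y):
    """-(1/N) sum y_n x_n / (1 + exp(y_n w.x_n))."""
    N = len(X)
    g = [0.0] * len(w)
    for xn, yn in zip(X, y):
        s = sum(wi * xi for wi, xi in zip(w, xn))
        c = -yn / ((1.0 + math.exp(yn * s)) * N)
        g = [gi + c * xi for gi, xi in zip(g, xn)]
    return g

def numeric_grad(w, X, y, h=1e-6):
    """Central differences: dE/dw_i ~ (E(w + h e_i) - E(w - h e_i)) / (2h)."""
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        g.append((e_in(wp, X, y) - e_in(wm, X, y)) / (2.0 * h))
    return g

X = [(1.0, 0.5), (1.0, -2.0), (1.0, 1.5)]
y = [+1, -1, -1]
w = [0.3, -0.7]
print(analytic_grad(w, X, y))
print(numeric_grad(w, X, y))   # should agree to several decimal places
```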

Slide 15: How to know when to stop
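The slide's criteria were in an image; common choices for terminating gradient descent, sketched with illustrative thresholds of my own:

```python
import math

def should_stop(grad, delta_e, t, grad_tol=1e-3, e_tol=1e-6, max_iters=10000):
    """Typical termination tests for gradient descent (any one suffices):
    - the gradient norm is nearly zero (a minimum of E_in is reached),
    - E_in changed by a negligible amount since the last iteration,
    - an iteration budget is exhausted."""
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return grad_norm < grad_tol or abs(delta_e) < e_tol or t >= max_iters
```

In practice a tolerance test is usually combined with an iteration cap, since on separable data the cross-entropy gradient approaches zero only as the weights grow without bound.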

Slide 16: Assignment 6, due 10-30-14
