1 Logistic Regression Gary Cottrell 6/8/2018
Why perceptrons? You'll see!

2 A generalization of the linear discriminant:
Apply a monotonic activation function g() to the linear discriminant. This is still considered a linear discriminant, because if g is monotonic the decision boundary will still be linear (even if the output "ramps up" smoothly). To motivate this, imagine two Gaussian-distributed categories with equal variance.
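As a sketch of the form this slide describes, in standard notation (the weight vector w, bias w_0, and input x are assumed here):

y(x) = g(w^T x + w_0)

where g() is monotonic. With g the identity this is the ordinary linear discriminant, and because g is monotonic the decision boundary w^T x + w_0 = 0 is the same hyperplane in either case.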

3 Gaussian probability density functions:
By Bayes' rule, the posterior probability of each class is its class-conditional density times its prior, divided by a normalizing constant. And since the posteriors have to sum to 1 (if there are only two classes), the denominator is just the sum of the two numerators.
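A sketch of the two-class setup, assuming one-dimensional Gaussian class-conditionals with means \mu_1, \mu_2 and shared variance \sigma^2 (notation assumed):

p(x \mid C_k) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma^2}\right), \quad k = 1, 2

p(C_1 \mid x) = \frac{p(x \mid C_1)\,p(C_1)}{p(x \mid C_1)\,p(C_1) + p(x \mid C_2)\,p(C_2)}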

4 A somewhat counterintuitive derivation:
Call these two terms A and B; then, as sketched below, the probability of class C1 follows a sigmoid as a function of the log ratio of the probability of class C1 to the probability of class C2. (Stop here to plot the logistic.)
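A sketch of that step, writing A = p(x \mid C_1)\,p(C_1) and B = p(x \mid C_2)\,p(C_2) for the two terms in the denominator above:

p(C_1 \mid x) = \frac{A}{A+B} = \frac{1}{1 + B/A} = \frac{1}{1 + e^{-a}} = \sigma(a), \quad a = \ln\frac{A}{B}

so the posterior for C_1 is the logistic (sigmoid) function \sigma applied to the log ratio a.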

5 The logistic activation function:
The logistic allows us to interpret the output as a posterior probability – the probability of category C1 given x. Note: a can be written as a linear function of x (and there is a generalization to multi-dimensional Gaussians, where w and x are vectors), as sketched below.
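For the one-dimensional equal-variance case above, a works out to be linear in x. A sketch of the standard result (the multivariate version replaces \sigma^2 with a shared covariance matrix and the products with dot products):

\sigma(a) = \frac{1}{1+e^{-a}}, \quad a = w\,x + w_0, \quad w = \frac{\mu_1 - \mu_2}{\sigma^2}, \quad w_0 = \frac{\mu_2^2 - \mu_1^2}{2\sigma^2} + \ln\frac{p(C_1)}{p(C_2)}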

6 That's nice, but we don't know the data is Gaussian, and we want to learn the weights
What to do? We are going to use something called the Maximum Likelihood Principle. The MLP says: "set your parameters so that they maximize the probability of your training data."
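Written out, with D standing for the training data and \theta for the parameters (a sketch in standard notation):

\theta^* = \arg\max_\theta \; p(D \mid \theta)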

7 Motivating Example (on board – gaussian model of grades)
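A minimal sketch of what such an on-board example might look like: fitting a one-dimensional Gaussian to a handful of grades by maximum likelihood. The grade values and variable names below are hypothetical, made up purely for illustration.

import math

# Hypothetical exam grades (illustrative data only)
grades = [62.0, 71.0, 75.0, 80.0, 84.0, 90.0]

# Maximum-likelihood estimates for a 1-D Gaussian:
# the mean is the sample mean, the variance is the biased
# sample variance (divide by N, not N - 1).
N = len(grades)
mu_hat = sum(grades) / N
sigma2_hat = sum((g - mu_hat) ** 2 for g in grades) / N

def gaussian_pdf(x, mu, sigma2):
    """Density of a Gaussian with mean mu and variance sigma2, evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Log-likelihood of the grades under the fitted model
log_likelihood = sum(math.log(gaussian_pdf(g, mu_hat, sigma2_hat)) for g in grades)
print(f"mu_hat = {mu_hat:.2f}  sigma2_hat = {sigma2_hat:.2f}  log-likelihood = {log_likelihood:.2f}")

Any other choice of mean and variance gives a lower (log-)likelihood on these points, which is exactly what the Maximum Likelihood Principle on the previous slide asks for.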

8 What does this mean for us?
We are trying to learn a mapping from x to t. We have a training set of (x, t) pairs, where t = 1 means x is in category 1 and t = 0 means x is in category 2. A complete model of this would be to find the distribution that best models p(x, t) – the joint probability of the data. We then define the likelihood of the data as the probability of the whole training set under the model.
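A sketch of the standard definition, assuming the N training pairs (x_n, t_n) are drawn independently (the indexing is assumed notation):

L(\theta) = p(\text{data} \mid \theta) = \prod_{n=1}^{N} p(x_n, t_n \mid \theta)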

9 What does this mean for us?
The likelihood of the data is indexed by our parameters θ – in the Gaussian example, these would be μ and σ. So now, the Maximum Likelihood Principle says we should choose the parameters that maximize this likelihood.
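Spelled out in the notation above (a sketch; maximizing the log-likelihood is equivalent because the log is monotonic, and the sum is what one actually works with in practice):

\theta_{ML} = \arg\max_\theta \prod_{n=1}^{N} p(x_n, t_n \mid \theta) = \arg\max_\theta \sum_{n=1}^{N} \ln p(x_n, t_n \mid \theta)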

