580.691 Learning Theory Reza Shadmehr Linear and quadratic decision boundaries Kernel estimates of density Missing data.

Presentation transcript:

1 580.691 Learning Theory Reza Shadmehr Linear and quadratic decision boundaries Kernel estimates of density Missing data

2 Bayesian classification Suppose we wish to classify a vector x as belonging to one of the classes {1, …, L}. We are given labeled data and need to form a classification function: classify x into the class l that maximizes the posterior probability p(l | x) = p(x | l) p(l) / p(x), where p(l) is the prior, p(x | l) is the likelihood, and p(x) is the marginal.
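The rule on this slide can be sketched in a few lines. This is a minimal illustration, not the course's code; the class labels and Gaussian parameters below are invented for the height example that follows.

```python
import math

def gauss(x, mu, var):
    """1-D Gaussian likelihood p(x | class) with mean mu, variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative parameters: (mean height in cm, variance, prior).
classes = {
    "female": (165.0, 36.0, 0.5),
    "male":   (178.0, 36.0, 0.5),
}

def classify(x):
    """Pick the class maximizing likelihood * prior. The marginal p(x)
    is the same for every class, so it can be ignored in the argmax."""
    return max(classes,
               key=lambda l: gauss(x, classes[l][0], classes[l][1]) * classes[l][2])

print(classify(170.0))  # "female": 170 is closer to the female mean
```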

3 Classification when distributions have equal variance Suppose we wish to classify a person as male or female based on height. What we have: the class-conditional densities of height for each class. What we want: the posterior probability of each class given a height. Note that the two densities have equal variance. Assume equal probability of being male or female. (Plots: the female and male height densities and the resulting posteriors, over roughly 160–200 cm.)

4 Classification when distributions have equal variance (Plots: the two class densities and the log-likelihood ratio as a function of height.)

5 Estimating the decision boundary between data of equal variance Suppose the distribution of the data in each class is Gaussian. The decision boundary between any two classes is where the log of the ratio of their posteriors is zero. If the data in each class have a Gaussian density with equal variance, then the boundary between any two classes is a line.
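In one dimension the equal-variance boundary has a closed form: solving log[(p1 N(x; mu1, s²)) / (p2 N(x; mu2, s²))] = 0 for x gives x* = (mu1 + mu2)/2 − s² log(p1/p2) / (mu1 − mu2). A minimal sketch, with illustrative means, variance, and priors:

```python
import math

def linear_boundary(mu1, mu2, var, p1=0.5, p2=0.5):
    """x where the log posterior ratio of two equal-variance
    1-D Gaussian classes crosses zero."""
    return (mu1 + mu2) / 2 - var * math.log(p1 / p2) / (mu1 - mu2)

# With equal priors the boundary is just the midpoint of the means.
print(linear_boundary(165.0, 178.0, 36.0))  # 171.5
```

Raising the prior of one class shifts the boundary toward the other class's mean, as the log(p1/p2) term shows.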

6 Estimating the decision boundary from estimated densities From the data we can get ML estimates of the Gaussian parameters for each class. (Plot: estimated densities and boundaries for Class 1, Class 2, and Class 3.)
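For one class, the ML estimates are the sample mean and the 1/n sample variance (not the unbiased 1/(n−1) version). A minimal sketch; the data values are invented for illustration:

```python
def ml_gaussian(samples):
    """ML estimates (mean, variance) of a 1-D Gaussian from samples."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n  # ML variance divides by n
    return mu, var

mu, var = ml_gaussian([1.0, 2.0, 3.0])
print(mu, var)  # mean 2.0, variance 2/3
```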

7 Relationship between Bayesian classification and the Fisher discriminant If we have two classes, class −1 and class +1, then the decision boundary is where the discriminant equals 0. For the Bayesian classifier, under the assumption of equal variance, the decision boundary is where the log posterior ratio is zero. The Fisher decision boundary is the same as the Bayesian one when the two classes have equal variance and equal prior probability.

8 Classification when distributions have unequal variance What we have: the class-conditional densities, now with unequal variances. Classification: pick the class with the larger posterior. Assume equal prior probabilities. (Plots: the two class densities, the posterior probabilities, and the resulting decision regions over roughly 140–200 cm.)

9 (Plots: the log-likelihood ratio and the class densities for the unequal-variance case.)

10 Quadratic discriminant: when data come from unequal-variance Gaussians The decision boundary between any two classes is where the log of the ratio of their posteriors is zero. If the data in each class have a Gaussian density with unequal variance, then the boundary between any two classes is a quadratic function of x. (Plot: quadratic decision boundaries separating the green, red, and blue classes.)
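Writing out the log ratio for two 1-D Gaussians with different variances makes the quadratic dependence on x explicit. A hedged sketch; the parameter values below are illustrative, not from the slides:

```python
import math

def quad_discriminant(x, mu1, var1, mu2, var2, p1=0.5, p2=0.5):
    """log[p1 N(x; mu1, var1)] - log[p2 N(x; mu2, var2)].
    Positive -> class 1, negative -> class 2; the zero set is the
    decision boundary, and it is quadratic in x when var1 != var2."""
    return (math.log(p1 / p2)
            + 0.5 * math.log(var2 / var1)
            - (x - mu1) ** 2 / (2 * var1)
            + (x - mu2) ** 2 / (2 * var2))

print(quad_discriminant(170.0, 165.0, 25.0, 178.0, 49.0))  # > 0: class 1
print(quad_discriminant(180.0, 165.0, 25.0, 178.0, 49.0))  # < 0: class 2
```

With unequal variances the boundary can cross zero at two points, so a class can occupy a bounded interval between two pieces of the other class.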

11 Non-parametric estimate of densities: kernel density estimate Suppose we have points x(i) that belong to class l, and we cannot assume that these points come from a Gaussian distribution. To estimate the density, we need to form a function that assigns a weight to each point x in our space, with the integral of this function equal to 1. Intuitively, the more data points x(i) we find around x, the greater the weight of x should be. The kernel density estimate puts a Gaussian centered at each data point: where there are more data points, there are more Gaussians, and their normalized sum is the density estimate. (Plots: histogram of the sampled data belonging to class l, ML estimate of a Gaussian density, and the density estimate using a Gaussian kernel.)
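The construction on this slide can be sketched directly: average one unit-area Gaussian per sample, so the estimate itself integrates to 1. The sample values and bandwidth below are illustrative, not the slide's data:

```python
import math

def kde(x, samples, h):
    """Gaussian-kernel density estimate at x: the average of Gaussians
    with bandwidth h centered at each sample point."""
    return sum(
        math.exp(-(x - xi) ** 2 / (2 * h * h)) / math.sqrt(2 * math.pi * h * h)
        for xi in samples
    ) / len(samples)

samples = [-3.0, -2.5, 0.0, 0.4, 5.0]
print(kde(0.0, samples, 1.0))   # relatively high: two samples nearby
print(kde(10.0, samples, 1.0))  # near zero: far from all samples
```

The bandwidth h plays the role the variance plays for a single Gaussian: small h gives a spiky estimate that follows the samples, large h smooths them together.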

12 Non-parametric estimate of densities: kernel density estimate (Plot: kernel density estimates and the resulting boundaries for the green, red, and blue classes.)

13 Classification with missing data Suppose that we have built a Bayesian classifier and are now given a new data point to classify, but this new data point is missing some of the “features” that we normally expect to see. In the example below, we have two features (x1 and x2) and four classes; the likelihood function is plotted. Suppose that we are given the data point (*, −1) to classify, which is missing a value for x1. If we assume the missing value is the average of the previously observed x1 values, then we would estimate it to be about 1. Assuming that the prior probabilities are equal among the four classes, we would classify (1, −1) as class c2. However, c4 is the better choice: when x2 = −1, c4 is the most likely class, as it has the highest likelihood once the missing x1 is marginalized out.
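The contrast between mean imputation and marginalization can be sketched concretely. The class parameters below are invented to echo the slide's situation (they are not read off its figure), with independent features and equal priors:

```python
import math

def gauss(x, mu, var):
    """1-D Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative classes: ((mu_x1, var_x1), (mu_x2, var_x2)) per class.
classes = {
    "c1": ((0.0, 1.0), ( 2.0, 1.0)),
    "c2": ((1.0, 1.0), ( 1.0, 1.0)),
    "c3": ((2.0, 1.0), ( 3.0, 1.0)),
    "c4": ((4.0, 1.0), (-1.0, 1.0)),
}

def classify_imputed(x1, x2):
    """Fill in the missing x1 (e.g. with its mean) and use the full likelihood."""
    return max(classes,
               key=lambda c: gauss(x1, *classes[c][0]) * gauss(x2, *classes[c][1]))

def classify_marginal(x2):
    """Integrate the likelihood over the missing x1; with independent
    features this leaves just the x2 marginal."""
    return max(classes, key=lambda c: gauss(x2, *classes[c][1]))

print(classify_imputed(1.0, -1.0))  # imputation picks the class near (1, -1)
print(classify_marginal(-1.0))      # marginalizing picks the class likeliest at x2 = -1
```

With these made-up parameters, imputing x1 = 1 selects c2, while marginalizing over x1 selects c4, mirroring the slide's point that imputation can mislead the classifier.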

14 Classification with missing data (Panels: good data; bad (or missing) data.)

