Lecture 5: Statistical Methods for Classification CAP 5415: Computer Vision Fall 2006.


1 Lecture 5: Statistical Methods for Classification CAP 5415: Computer Vision Fall 2006

2 Classifiers: The Swiss Army Tool of Vision
A HUGE number of vision problems can be reduced to: Is this a _____ or not?
The next two lectures will focus on making that decision.
Classifiers that we will cover:
- Bayesian classification
- Logistic regression
- Boosting
- Support Vector Machines
- Nearest-Neighbor Classifiers

3 Motivating Problem Which pixels in this image are “skin pixels”? Useful for tracking, finding people, finding images with too much skin.

4 How could you find skin pixels? Step 1: Get data. Label every pixel as skin or not skin.

5 Getting Probabilities
Now that I have a bunch of examples, I can create probability distributions.
- P([r,g,b]|skin) = probability of an [r,g,b] tuple given that the pixel is skin
- P([r,g,b]|~skin) = probability of an [r,g,b] tuple given that the pixel is not skin
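As a sketch of how those two distributions might be estimated in practice (an illustration, not the lecture's code; the array names and bin count are assumptions), each class-conditional distribution can be stored as a normalized 3-D color histogram over the labeled pixels:

```python
import numpy as np

def color_histogram(pixels, bins=32):
    """Estimate P([r,g,b] | class) as a normalized 3-D color histogram.

    pixels: (N, 3) array of RGB values in 0..255 for one class (skin or not skin).
    bins:   bins per channel, so the table has bins**3 cells.
    """
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist / hist.sum()        # cells now sum to 1, i.e. a probability table

# Hypothetical usage with the pixel labels from step 1:
# p_rgb_given_skin    = color_histogram(skin_pixels)
# p_rgb_given_notskin = color_histogram(nonskin_pixels)
```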

6 (From Jones and Rehg)

7 Using Bayes Rule x: the observation; y: some underlying cause (skin/not skin)

8 Using Bayes Rule The three terms are the prior, the likelihood, and the normalizing constant:
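The equation on the slide does not survive in this transcript; written out, Bayes' rule for the skin case with those three terms is

```latex
P(\mathrm{skin} \mid x)
  = \frac{\overbrace{P(x \mid \mathrm{skin})}^{\text{likelihood}}
          \; \overbrace{P(\mathrm{skin})}^{\text{prior}}}
         {\underbrace{P(x)}_{\text{normalizing constant}}}
```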

9 Classification
In this case P[skin|x] = 1 - P[~skin|x], so the classifier reduces to: is P[skin|x] > 0.5?
We can change this to P[skin|x] > c and vary c.
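A minimal sketch of that thresholded decision, reusing the histogram tables from the earlier sketch and a prior P(skin) taken from the label counts (function and argument names are assumptions):

```python
def p_skin_given_rgb(rgb, p_rgb_skin, p_rgb_notskin, prior_skin, bins=32):
    """Posterior P(skin | [r,g,b]) via Bayes' rule over two histogram tables."""
    idx = tuple(int(v) * bins // 256 for v in rgb)        # histogram cell for this color
    joint_skin = p_rgb_skin[idx] * prior_skin             # P(x | skin) P(skin)
    joint_not  = p_rgb_notskin[idx] * (1.0 - prior_skin)  # P(x | ~skin) P(~skin)
    denom = joint_skin + joint_not                        # P(x), the normalizing constant
    return joint_skin / denom if denom > 0 else 0.0

def is_skin(rgb, c=0.5, **tables):
    """Threshold the posterior; varying c trades detections against false alarms."""
    return p_skin_given_rgb(rgb, **tables) > c
```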

10 The effect of varying c This is called a Receiver Operating Characteristic curve (or ROC). (From Jones and Rehg)

11 Application: Finding Adult Pictures Let's say you needed to build a web filter for a library. You could look at a few simple measurements based on the skin model.

12 Example of Misclassified Image

13 Example of Correctly Classified Image

14 ROC Curve

15 Generative versus Discriminative Models The classifier that I have just described is known as a generative model: once you know all of the probabilities, you can generate new samples of the data. That may be more work than is needed. You could also optimize a function to just discriminate skin from not skin.

16 Discriminative Classification using Logistic Regression Imagine we had two measurements and we plotted each sample on a 2D chart

17 Discriminative Classification using Logistic Regression Imagine we had two measurements and we plotted each sample on a 2D chart. To separate the two groups, we'll project each point onto a line. Some points will be projected to positive values and some to negative values.

18 Discriminative Classification using Logistic Regression This defines a separating line. Each point is classified based on where it falls relative to the line.

19 How do we get the line? Common Option: Logistic Regression Logistic Function:
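The logistic function itself is missing from the transcript; it is

```latex
g(z) = \frac{1}{1 + e^{-z}}
```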

20 The logistic function Notice that the logistic function goes from 0 to 1. We can use this to estimate the probability of something being an x or an o. We need to find a function that has large positive values for x's and large negative values for o's.

21 Fitting the Line Remember, we want a line. For the diagram below, x = +1 and o = -1; y = the label of the point (-1 or +1).

22 Fitting the line The logistic function gives us an estimate of the probability of an example being either +1 or -1. We can fit the line by maximizing the conditional probability of the correct labeling of the training set, given the measurements (also called features).
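Written out (the slide's equation is not reproduced here): with weight vector w applied to the feature vector x and labels y in {-1, +1},

```latex
P(y = +1 \mid x) = g(w^{\top} x), \qquad
P(y = -1 \mid x) = 1 - g(w^{\top} x) = g(-w^{\top} x),
\quad\text{so compactly}\quad
P(y \mid x) = g(y\, w^{\top} x).
```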

23 Fitting the Line We have multiple samples that we assume are independent, so the probability of the whole training set is:
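Under that independence assumption the missing product is

```latex
P(\text{training set}) \;=\; \prod_{i} P(y_i \mid x_i; w) \;=\; \prod_{i} g\!\left(y_i\, w^{\top} x_i\right).
```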

24 Fitting the line It is usually easier to optimize the log conditional probability:
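Taking the log of the product above gives the objective that is actually maximized:

```latex
\ell(w) \;=\; \sum_{i} \log P(y_i \mid x_i; w) \;=\; \sum_{i} \log g\!\left(y_i\, w^{\top} x_i\right).
```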

25 Optimizing Lots of options. Easiest option: gradient ascent, with learning-rate parameter η (there are many ways to choose it).
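The gradient and update rule are not reproduced in the transcript; reconstructed from the log-likelihood above (so the exact form here is an inference, not a quote), they are

```latex
\nabla_{w}\,\ell(w) \;=\; \sum_{i} \bigl(1 - g(y_i\, w^{\top} x_i)\bigr)\, y_i\, x_i,
\qquad
w \;\leftarrow\; w + \eta\, \nabla_{w}\,\ell(w),
```

with η the learning rate.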

26 Choosing η
My (current) personal favorite method:
- Choose some value for η.
- Update w and compute the new probability.
- If the new probability does not rise, divide η by 2; otherwise multiply it by 1.1 (or something similar).
Called the "Bold-Driver" heuristic.
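A minimal sketch of batch gradient ascent with the bold-driver adjustment of η (an illustration of the heuristic, not the lecture's code; names and constants are assumptions, and rejecting the failed step is one common variant):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_bold_driver(X, y, eta=0.1, n_iters=100):
    """Gradient ascent on the log-likelihood; X is (N, D), y is (N,) in {-1, +1}."""
    w = np.zeros(X.shape[1])

    def log_likelihood(w):
        return np.sum(np.log(sigmoid(y * (X @ w))))

    best = log_likelihood(w)
    for _ in range(n_iters):
        grad = (1.0 - sigmoid(y * (X @ w))) @ (y[:, None] * X)  # full-batch gradient
        w_new = w + eta * grad
        new = log_likelihood(w_new)
        if new > best:           # probability rose: keep the step and grow eta a little
            w, best = w_new, new
            eta *= 1.1
        else:                    # probability did not rise: halve eta (and, here, reject the step)
            eta /= 2.0
    return w
```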

27 Faster Option Computing the gradient requires summing over every training example, which can be slow for a large training set. Speed-up: stochastic gradient ascent. Instead of computing the gradient over the whole training set, choose one point at random and do the update based on that one point.
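The stochastic variant replaces the full-batch gradient with the gradient of one randomly chosen example; a sketch reusing `sigmoid` from the block above:

```python
import numpy as np

def sgd_step(w, X, y, eta):
    """One stochastic gradient ascent update based on a single random training point."""
    i = np.random.randint(len(y))                              # choose one point at random
    grad_i = (1.0 - sigmoid(y[i] * (X[i] @ w))) * y[i] * X[i]  # gradient of that term only
    return w + eta * grad_i
```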

28 Limitations Remember, we are only separating the two classes with a line. Try to separate this data with a line: it can't be done. This is a fundamental problem; most things can't be separated by a line.

29 Overcoming these limitations
Two options:
- Train on a more complicated function (quadratic, cubic)
- Make a new set of features (see the sketch below)
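A sketch of the second option for two measurements, expanding each point into quadratic terms so that a linear classifier on the new features traces a quadratic boundary in the original space (the function name and exact set of terms are illustrative):

```python
import numpy as np

def quadratic_features(X):
    """Map 2-D points (x1, x2) to [1, x1, x2, x1**2, x2**2, x1*x2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])
```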

30 Advantages We achieve non-linear classification by doing linear classification on non-linear transformations of the features. We only have to rewrite the feature-generation code; the learning code stays the same.

31 Nearest Neighbor Classifier Is the “?” an x or an o? ?

32 Nearest Neighbor Classifier Is the “?” an x or an o? ?

33 Nearest Neighbor Classifier Is the “?” an x or an o? ?

34 Basic idea
For your new example, find the k nearest neighbors in the training set. Each neighbor casts a vote; the label with the most votes wins.
Disadvantages:
- Have to find the nearest neighbors, which can be slow for a large training set; good approximate methods are available (LSH, Indyk)
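A brute-force sketch of the voting scheme (no LSH speed-up; names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(query, X_train, y_train, k=3):
    """Label a new example by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - query, axis=1)  # Euclidean distance to every example
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)     # each neighbor casts a vote
    return votes.most_common(1)[0][0]                # label with the most votes wins
```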

