
Methods for classification and image representation

Presentation on theme: "Methods for classification and image representation"— Presentation transcript:

1 Methods for classification and image representation
Object recognition Methods for classification and image representation

2 Credits Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04 Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Yoav Freund Robert E. Schapire, A Short Introduction to Boosting

3 Object recognition What is it? Where is it? How many are there?
What is it? Instance / Category / Something with a tail. Where is it? Localization / Segmentation. How many are there? Object recognition is a broad, interesting and important area. We will think of it as a number of specific questions of varying difficulty. The most specific question is "Is this my blue cup?" Here we aim to recognize a very specific instance of the object. It quickly becomes inconvenient to reason about every single object separately; we would rather recognize that an object belongs to some category. Here we aim to separate cars from airplanes, people from cats, dogs from tables, and so on. It is hard to believe that we can reasonably enumerate everything in advance. When we encounter something we have never seen before, we still want to reason about it: we may say it has a tail, or that it is made of metal. Once we can tell road from trees, we would like to say where they are in the image. A simple and powerful recipe is to look at small sub-images and ask the "what is it?" question over and over again. When we know what it is, we say "here". After that we may want to remove that object; for that we need to know which pixels belong to the object and which don't. We call this segmentation. The scanning-window recipe is great for many cases, but sometimes things get tricky. To reliably count objects we need to understand the relations between objects and their parts. In an image, we would need to associate which parts belong to which object and check whether we missed anything.


5 Face detection Pipeline: image window x → features → classify F(x) → y = +1 (face) or −1 (not face)
We slide a window over the image Extract features for each window Classify each window into face/non-face We start with a walk through a face detector.
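The three steps above can be sketched in a few lines. This is a hypothetical sketch, not the actual detector: `classify` stands in for any trained face/non-face classifier, and the window size and stride are illustrative.

```python
import numpy as np

def sliding_window_detect(image, classify, window=(24, 24), stride=4):
    """Slide a window over the image; return top-left corners classified +1."""
    h, w = image.shape
    wh, ww = window
    detections = []
    for y in range(0, h - wh + 1, stride):
        for x in range(0, w - ww + 1, stride):
            patch = image[y:y + wh, x:x + ww]
            if classify(patch) == +1:      # +1 = face, -1 = not face
                detections.append((x, y))
    return detections
```

In practice the same scan is repeated at several scales, and overlapping detections are merged.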

6 What is a face? Eyes are dark (eyebrows+shadows)
Cheeks and forehead are bright. Nose is bright Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04

7 Basic feature extraction
Information type: intensity. Sum over: gray and white rectangles. Output: gray − white. Separate output value for each type, each scale, and each position in the window: FEX(im) = x = [x1, x2, …, xn] (e.g. features x120, x357, x629, x834). Basic features for face detection work on just the intensity values. Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04
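The rectangle sums can be computed in constant time with an integral image, which is what makes these features fast. A minimal numpy sketch (the function names are mine, and the two-rectangle feature shown is just one of the feature types):

```python
import numpy as np

def integral_image(im):
    """Cumulative sums so any rectangle sum costs at most 4 lookups."""
    return im.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r, c, h, w):
    """Sum of im[r:r+h, c:c+w] using the integral image ii."""
    total = ii[r + h - 1, c + w - 1]
    if r > 0:
        total -= ii[r - 1, c + w - 1]
    if c > 0:
        total -= ii[r + h - 1, c - 1]
    if r > 0 and c > 0:
        total += ii[r - 1, c - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    """Gray-minus-white two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)
```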

8 Face detection Pipeline: image window x → features → classify F(x) → y = +1 (face) or −1 (not face)
We slide a window over the image Extract features for each window Classify each window into face/non-face We start with a walk through a face detector.

9 Classification Examples are points in Rn.
Positives are separated from negatives by the hyperplane w: y = sign(wTx − b). We have converted all windows to n-dimensional vectors, or points in n-dimensional space.

10 Classification x ∈ Rn: the data points.
P(x): distribution of the data. y(x): true value of y for each x. F: decision function y = F(x, θ). θ: parameters of F, e.g. θ = (w, b). We want an F that makes few mistakes. We have converted all windows to n-dimensional vectors, or points in n-dimensional space. Not all points are equally likely, and P(x) captures the distribution of the data. For each point there is a correct prediction value y(x).

11 Loss function POSSIBLE CANCER vs. ABSOLUTELY NO RISK OF CANCER: our decision may have severe implications.
L(y(x), F(x, θ)) is the loss function: how much we pay for predicting F(x, θ) when the true value is y(x). Classification (0-1) error: L = 1 if F(x, θ) ≠ y(x), else 0. Hinge loss: L = max(0, 1 − y(x)·(wTx − b)).
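Both losses can be written directly. In this sketch F(x, θ) is represented by its raw score f = wTx − b, with labels y ∈ {−1, +1}:

```python
import numpy as np

def zero_one_loss(y, f):
    """Classification error: 1 if the predicted sign disagrees with y, else 0."""
    return float(np.sign(f) != y)

def hinge_loss(y, f):
    """Hinge loss max(0, 1 - y*f): a convex upper bound on the 0-1 loss."""
    return max(0.0, 1.0 - y * f)
```

The hinge loss is what SVMs (later slides) optimize, precisely because the 0-1 loss is not convex.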

12 Learning Total loss shows how good a function (F, θ) is: L(θ) = ∫ L(y(x), F(x, θ)) P(x) dx
Learning is to find a function to minimize the loss: θ* = argminθ L(θ). How can we see all possible x?
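Since we cannot see all possible x, we replace the integral over P(x) with an average over a finite sample and minimize that instead. A toy sketch of this idea, using a brute-force search over a parameter grid (the 1-D model F(x, θ) = sign(w·x − b) and the grid are illustrative):

```python
import numpy as np

def empirical_loss(theta, data, loss):
    """Average loss of the score w*x - b over a finite sample of (x, y) pairs."""
    w, b = theta
    return np.mean([loss(y, w * x - b) for x, y in data])

def learn(data, loss, grid):
    """Pick the parameters with the smallest empirical loss (brute force)."""
    return min(grid, key=lambda theta: empirical_loss(theta, data, loss))
```

Real learners replace the grid search with gradient-based or convex optimization, but the objective is the same.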

13 Datasets Dataset is a finite sample {xi} from P(x)
Dataset has labels {(xi, yi)}. Datasets today are big to ensure the sampling is fair.

              #images   #classes   #instances
Caltech 256     30608        256
Pascal VOC       4340         20        10363
LabelMe        176975        ???       414687

14 Overfitting A simple dataset. Two models: non-linear and linear.

15 Overfitting Let’s get more data.
Simple model has better generalization.

16 Overfitting As complexity increases, the model overfits the data
Training loss decreases, but the real loss increases. We need to penalize model complexity, that is, to regularize. (Plot: training loss and real loss against model complexity.)

17 Overfitting Split the dataset
Training set / Validation set / Test set. Use the training set to optimize model parameters. Use the validation set to choose the best model. Use the test set only to measure the expected loss. (Plot: training, validation, and test set loss against model complexity; the stopping point is where the validation loss is lowest.)
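A minimal sketch of such a split; the 60/20/20 proportions are illustrative, and shuffling once with a fixed seed keeps the split reproducible:

```python
import random

def split_dataset(data, train=0.6, val=0.2, seed=0):
    """Shuffle once, then cut into train / validation / test portions."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = int(n * train)
    n_val = int(n * val)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```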

18 Classification methods
K Nearest Neighbors Decision Trees Linear SVMs Kernel SVMs Boosted classifiers

19 K Nearest Neighbors Memorize all training data
Find the K closest points to the query. The neighbors vote for the label, e.g. Vote(+)=2, Vote(−)=1.
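The whole method fits in a few lines; a sketch using Euclidean distance and majority vote over labels in {−1, +1}:

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Vote among the k training points closest to the query."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = train_y[nearest]
    return 1 if (votes == 1).sum() > (votes == -1).sum() else -1
```

Note that "training" is just memorization; all the work happens at query time.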

20 K-Nearest Neighbors Nearest Neighbors (silhouettes)
Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes

21 K-Nearest Neighbors Silhouettes from other views 3D Visual hull
Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes

22 Decision tree Each internal node tests one feature against a threshold (x1 > 2? yes/no, then x2 > 1? yes/no); each leaf stores the vote counts of the training examples that reach it, e.g. V(+)=8, or V(+)=2 with V(−)=8.

23 Decision Tree Training
Partition the data into pure chunks: find a good rule, split the training data, build the left tree, build the right tree. Count the examples in the leaves to get the votes V(+), V(−). Stop when purity is high, the data size is small, or at a fixed level. (Example leaf purities: V(−)=57%, V(+)=80%, V(−)=80%, V(+)=64%, V(−)=100%.)

24 Decision trees Stump: a very simple "weak classifier"
1 root, 2 leaves: if xi > a then positive, else negative (e.g. on features x120, x357, x629, x834). Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04
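A stump can be trained by exhaustive search over thresholds. The sketch below (hypothetical names) takes per-example weights, since boosting on the later slides trains stumps on a weighted distribution; with uniform weights it is plain error minimization:

```python
import numpy as np

def train_stump(x, y, weights):
    """Pick threshold a and polarity s minimizing the weighted error of the
    rule "predict s if x > a else -s" (a 1-root, 2-leaf tree)."""
    best = (np.inf, 0.0, 1)                 # (error, threshold, polarity)
    for a in np.unique(x):
        for s in (+1, -1):
            pred = np.where(x > a, s, -s)
            err = weights[pred != y].sum()
            if err < best[0]:
                best = (err, a, s)
    return best
```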

25 Support vector machines
Simple decision. Good classification. Good generalization. The hyperplane w separates the two classes with the largest margin.

26 Support vector machines
Support vectors: the training points that lie on the margin and determine the hyperplane w.

27 How do I solve the problem?
It’s a convex optimization problem. Can solve in Matlab (don’t). Download from the web: SMO (Sequential Minimal Optimization), SVM-Light, LibSVM, LibLinear, SVM-Perf, Pegasos.
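Of the solvers listed, Pegasos is simple enough to sketch: it runs stochastic sub-gradient descent on the regularized hinge-loss objective λ/2·||w||² + mean hinge loss. A minimal sketch (not the reference implementation; the bias is assumed folded into x as a constant feature):

```python
import numpy as np

def pegasos(x, y, lam=0.01, epochs=200, seed=0):
    """Stochastic sub-gradient descent on the SVM objective; returns w."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            if y[i] * w.dot(x[i]) < 1:            # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * x[i]
            else:
                w = (1 - eta * lam) * w
    return w
```

For serious use, the downloadable packages above are both faster and more carefully tuned.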

28 Linear SVM for pedestrian detection

29 Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

30 Gradient filters: centered, diagonal, uncentered, cubic-corrected, Sobel.
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

31 Histogram of gradient orientations
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

32 Descriptor: 8 orientations × 15×7 cells. Slides by Pete Barnum
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
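The core of the descriptor, a per-cell orientation histogram, can be sketched as below. This is a simplified sketch of the idea, not the full HOG pipeline (no block normalization, no bilinear bin interpolation); with 8 bins per cell and 15×7 cells, concatenating the histograms gives the 840-dimensional window descriptor:

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=8):
    """Histogram of gradient orientations for one cell, weighted by magnitude."""
    gy, gx = np.gradient(cell.astype(float))      # image gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), mag.ravel()):
        hist[b] += m
    return hist
```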

33 pedestrian Slides by Pete Barnum
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

34 Kernel SVM The decision function is a linear combination of support vectors: F(x) = sign(Σi αi yi xiTx − b). Prediction is a dot product of the query with each support vector. A kernel is a function that computes the dot product of data points in some unknown space: K(x, z) = φ(x)·φ(z). We can compute the decision without knowing the space: F(x) = sign(Σi αi yi K(xi, x) − b).

35 Useful kernels Linear! RBF Histogram intersection Pyramid match
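Three of the kernels above are one-liners; a sketch for single vectors (the RBF bandwidth gamma is illustrative):

```python
import numpy as np

def linear_kernel(a, b):
    """Plain dot product: the kernel of the ordinary linear SVM."""
    return a.dot(b)

def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function: similarity decays with squared distance."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two histograms."""
    return np.minimum(h1, h2).sum()
```

The pyramid match kernel applies histogram intersection at several resolutions and weights the levels, which is what the next slides build up to.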

36 Histogram intersection
Assign each patch to its texture cluster (+1 in that bin) and count, giving a histogram. S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.

37 (Spatial) Pyramid Match
S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.

38 Boosting Weak classifier: a classifier that is slightly better than random guessing.
Weak learner: builds weak classifiers.

39 Boosting Start with a uniform distribution over the training examples. Iterate:
Get a weak classifier fk. Compute its 0-1 error εk under the current distribution. Take αk = ½ ln((1 − εk)/εk). Update the distribution, up-weighting the examples fk gets wrong. Output the final "strong" classifier F(x) = sign(Σk αk fk(x)). Yoav Freund, Robert E. Schapire, A Short Introduction to Boosting
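The loop above can be sketched as follows. This is a minimal AdaBoost sketch under the assumption that `weak_learner(x, y, d)` returns a ±1-valued classifier trained on the weighted distribution d:

```python
import numpy as np

def adaboost(x, y, weak_learner, rounds=10):
    """AdaBoost: reweight examples each round, combine weak classifiers."""
    n = len(x)
    d = np.full(n, 1.0 / n)                  # start with a uniform distribution
    ensemble = []
    for _ in range(rounds):
        f = weak_learner(x, y, d)
        pred = f(x)
        eps = d[pred != y].sum()             # weighted 0-1 error
        if eps >= 0.5:                       # no longer better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        d *= np.exp(-alpha * y * pred)       # up-weight the mistakes
        d /= d.sum()
        ensemble.append((alpha, f))
    def strong(z):
        return np.sign(sum(a * f(z) for a, f in ensemble))
    return strong
```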

40 Face detection Pipeline: image window x → features → classify F(x) → y = +1 (face) or −1 (not face)
We slide a window over the image Extract features for each window Classify each window into face/non-face We start with a walk through a face detector.

41 Face detection Use haar-like features
A stump such as "x234 > 1.3? yes → +1 face, no → −1 non-face". Use Haar-like features (e.g. x120, x357, x629, x834). Use decision stumps as weak classifiers. Use boosting to build a strong classifier. Use a sliding window to detect the face.
