
Methods for classification and image representation

Presentation on theme: "Methods for classification and image representation"— Presentation transcript:

1 Methods for classification and image representation
Object recognition Methods for classification and image representation

2 Credits Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04 Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05 Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Yoav Freund Robert E. Schapire, A Short Introduction to Boosting

3 Object recognition What is it? Where is it? How many are there?
What is it? Instance / Category / Something with a tail. Where is it? Localization / Segmentation. How many are there? Object recognition is a broad, interesting and important area. We will think of it as a number of specific questions of varying difficulty. The most specific question is "Is this my blue cup?" Here we aim to recognize a very specific instance of the object. It quickly becomes inconvenient to reason about every single object separately; we would rather recognize that an object belongs to some category. Here we aim to separate cars from airplanes, people from cats, dogs from tables, and so on. It is hard to believe that we can reasonably enumerate everything in advance. When we encounter something we have never seen before, we still want to reason about it: we may say it has a tail, or that it is made of metal. Once we can tell road from trees, we would like to say where they are in the image. A simple and powerful recipe is to look at small sub-images and ask the "what is it?" question over and over again. When we know what it is, we say "here". After that we may want to remove that object; for that we need to know which pixels belong to the object and which don't. We call this segmentation. The scanning-window recipe is great for many cases, but sometimes things get tricky. To reliably count objects we need to understand the relations between objects and their parts. In an image, we would need to associate which parts belong to which object and check whether we missed anything.


5 Face detection Pipeline: image window x → features → classify F(x) → y = +1 (face) or −1 (not face)
We slide a window over the image Extract features for each window Classify each window into face/non-face We start with a walk through a face detector.
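The three steps above can be sketched in a few lines. This is a hypothetical sketch, not the actual detector: `classify` stands in for any trained face/non-face classifier, and the window size and stride are illustrative.

```python
import numpy as np

def sliding_window_detect(image, classify, window=(24, 24), stride=4):
    """Slide a window over the image; return top-left corners classified +1."""
    h, w = image.shape
    wh, ww = window
    detections = []
    for y in range(0, h - wh + 1, stride):
        for x in range(0, w - ww + 1, stride):
            patch = image[y:y + wh, x:x + ww]
            if classify(patch) == +1:      # +1 = face, -1 = not face
                detections.append((x, y))
    return detections
```

In practice the same scan is repeated at several scales, and overlapping detections are merged.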

6 What is a face? Eyes are dark (eyebrows+shadows)
Cheeks and forehead are bright. Nose is bright Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04

7 Basic feature extraction
Information type: intensity. Sum over: gray and white rectangles. Output: gray − white. Separate output value for each type, each scale, and each position in the window: FEX(im) = x = [x1, x2, …, xn] (e.g. features x120, x357, x629, x834). Basic features for face detection work on just the intensity values. Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04
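The rectangle sums can be computed in constant time with an integral image, which is what makes these features fast. A minimal numpy sketch (the function names are mine, and the two-rectangle feature shown is just one of the feature types):

```python
import numpy as np

def integral_image(im):
    """Cumulative sums so any rectangle sum costs at most 4 lookups."""
    return im.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r, c, h, w):
    """Sum of im[r:r+h, c:c+w] using the integral image ii."""
    total = ii[r + h - 1, c + w - 1]
    if r > 0:
        total -= ii[r - 1, c + w - 1]
    if c > 0:
        total -= ii[r + h - 1, c - 1]
    if r > 0 and c > 0:
        total += ii[r - 1, c - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    """Gray-minus-white two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)
```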

8 Face detection Pipeline: image window x → features → classify F(x) → y = +1 (face) or −1 (not face)
We slide a window over the image Extract features for each window Classify each window into face/non-face We start with a walk through a face detector.

9 Classification Examples are points in Rn.
Positives are separated from negatives by the hyperplane w: y = sign(wTx − b). We have converted all windows to n-dimensional vectors, or points in n-dimensional space.

10 Classification x ∈ Rn: the data points.
P(x): distribution of the data. y(x): true value of y for each x. F: decision function y = F(x, θ). θ: parameters of F, e.g. θ = (w, b). We want an F that makes few mistakes. We have converted all windows to n-dimensional vectors, or points in n-dimensional space. Not all points are equally likely, and P(x) captures the distribution of the data. For each point there is a correct prediction value y(x).

11 Loss function POSSIBLE CANCER vs. ABSOLUTELY NO RISK OF CANCER: our decision may have severe implications.
L(y(x), F(x, θ)) is the loss function: how much we pay for predicting F(x, θ) when the true value is y(x). Classification (0-1) error: L = 1 if F(x, θ) ≠ y(x), else 0. Hinge loss: L = max(0, 1 − y(x)·(wTx − b)).
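Both losses can be written directly. In this sketch F(x, θ) is represented by its raw score f = wTx − b, with labels y ∈ {−1, +1}:

```python
import numpy as np

def zero_one_loss(y, f):
    """Classification error: 1 if the predicted sign disagrees with y, else 0."""
    return float(np.sign(f) != y)

def hinge_loss(y, f):
    """Hinge loss max(0, 1 - y*f): a convex upper bound on the 0-1 loss."""
    return max(0.0, 1.0 - y * f)
```

The hinge loss is what SVMs (later slides) optimize, precisely because the 0-1 loss is not convex.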

12 Learning Total loss shows how good a function (F, θ) is: L(θ) = ∫ L(y(x), F(x, θ)) P(x) dx
Learning is to find a function to minimize the loss: θ* = argminθ L(θ). How can we see all possible x?
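Since we cannot see all possible x, we replace the integral over P(x) with an average over a finite sample and minimize that instead. A toy sketch of this idea, using a brute-force search over a parameter grid (the 1-D model F(x, θ) = sign(w·x − b) and the grid are illustrative):

```python
import numpy as np

def empirical_loss(theta, data, loss):
    """Average loss of the score w*x - b over a finite sample of (x, y) pairs."""
    w, b = theta
    return np.mean([loss(y, w * x - b) for x, y in data])

def learn(data, loss, grid):
    """Pick the parameters with the smallest empirical loss (brute force)."""
    return min(grid, key=lambda theta: empirical_loss(theta, data, loss))
```

Real learners replace the grid search with gradient-based or convex optimization, but the objective is the same.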

13 Datasets Dataset is a finite sample {xi} from P(x)
Dataset has labels {(xi, yi)}. Datasets today are big to ensure the sampling is fair.

              #images   #classes   #instances
Caltech 256     30608        256
Pascal VOC       4340         20        10363
LabelMe        176975        ???       414687

14 Overfitting A simple dataset. Two models: non-linear and linear.

15 Overfitting Let’s get more data.
Simple model has better generalization.

16 Overfitting As complexity increases, the model overfits the data
Training loss decreases, but the real loss increases. We need to penalize model complexity, that is, to regularize. (Plot: training loss and real loss against model complexity.)

17 Overfitting Split the dataset
Training set / Validation set / Test set. Use the training set to optimize model parameters. Use the validation set to choose the best model. Use the test set only to measure the expected loss. (Plot: training, validation, and test set loss against model complexity; the stopping point is where the validation loss is lowest.)
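A minimal sketch of such a split; the 60/20/20 proportions are illustrative, and shuffling once with a fixed seed keeps the split reproducible:

```python
import random

def split_dataset(data, train=0.6, val=0.2, seed=0):
    """Shuffle once, then cut into train / validation / test portions."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = int(n * train)
    n_val = int(n * val)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```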

18 Classification methods
K Nearest Neighbors Decision Trees Linear SVMs Kernel SVMs Boosted classifiers

19 K Nearest Neighbors Memorize all training data
Find the K closest points to the query. The neighbors vote for the label, e.g. Vote(+)=2, Vote(−)=1.
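The whole method fits in a few lines; a sketch using Euclidean distance and majority vote over labels in {−1, +1}:

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Vote among the k training points closest to the query."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = train_y[nearest]
    return 1 if (votes == 1).sum() > (votes == -1).sum() else -1
```

Note that "training" is just memorization; all the work happens at query time.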

20 K-Nearest Neighbors Nearest Neighbors (silhouettes)
Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes

21 K-Nearest Neighbors Silhouettes from other views 3D Visual hull
Kristen Grauman, Gregory Shakhnarovich, and Trevor Darrell, Virtual Visual Hulls: Example-Based 3D Shape Inference from Silhouettes

22 Decision tree Each internal node tests one feature against a threshold (x1 > 2? yes/no, then x2 > 1? yes/no); each leaf stores the vote counts of the training examples that reach it, e.g. V(+)=8, or V(+)=2 with V(−)=8.

23 Decision Tree Training
Partition the data into pure chunks: find a good rule, split the training data, build the left tree, build the right tree. Count the examples in the leaves to get the votes V(+), V(−). Stop when purity is high, the data size is small, or at a fixed level. (Example leaf purities: V(−)=57%, V(+)=80%, V(−)=80%, V(+)=64%, V(−)=100%.)

24 Decision trees Stump: a very simple "weak classifier"
1 root, 2 leaves: if xi > a then positive, else negative (e.g. on features x120, x357, x629, x834). Paul Viola, Michael Jones, Robust Real-time Object Detection, IJCV 04
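A stump can be trained by exhaustive search over thresholds. The sketch below (hypothetical names) takes per-example weights, since boosting on the later slides trains stumps on a weighted distribution; with uniform weights it is plain error minimization:

```python
import numpy as np

def train_stump(x, y, weights):
    """Pick threshold a and polarity s minimizing the weighted error of the
    rule "predict s if x > a else -s" (a 1-root, 2-leaf tree)."""
    best = (np.inf, 0.0, 1)                 # (error, threshold, polarity)
    for a in np.unique(x):
        for s in (+1, -1):
            pred = np.where(x > a, s, -s)
            err = weights[pred != y].sum()
            if err < best[0]:
                best = (err, a, s)
    return best
```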

25 Support vector machines
Simple decision. Good classification. Good generalization. The hyperplane w separates the two classes with the largest margin.

26 Support vector machines
Support vectors: the training points that lie on the margin and determine the hyperplane w.

27 How do I solve the problem?
It’s a convex optimization problem. Can solve in Matlab (don’t). Download from the web: SMO (Sequential Minimal Optimization), SVM-Light, LibSVM, LibLinear, SVM-Perf, Pegasos.
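Of the solvers listed, Pegasos is simple enough to sketch: it runs stochastic sub-gradient descent on the regularized hinge-loss objective λ/2·||w||² + mean hinge loss. A minimal sketch (not the reference implementation; the bias is assumed folded into x as a constant feature):

```python
import numpy as np

def pegasos(x, y, lam=0.01, epochs=200, seed=0):
    """Stochastic sub-gradient descent on the SVM objective; returns w."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            if y[i] * w.dot(x[i]) < 1:            # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * x[i]
            else:
                w = (1 - eta * lam) * w
    return w
```

For serious use, the downloadable packages above are both faster and more carefully tuned.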

28 Linear SVM for pedestrian detection

29 Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

30 Gradient filters: centered, diagonal, uncentered, cubic-corrected, Sobel.
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

31 Histogram of gradient orientations
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

32 Descriptor: 8 orientations × 15×7 cells. Slides by Pete Barnum
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
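The core of the descriptor, a per-cell orientation histogram, can be sketched as below. This is a simplified sketch of the idea, not the full HOG pipeline (no block normalization, no bilinear bin interpolation); with 8 bins per cell and 15×7 cells, concatenating the histograms gives the 840-dimensional window descriptor:

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=8):
    """Histogram of gradient orientations for one cell, weighted by magnitude."""
    gy, gx = np.gradient(cell.astype(float))      # image gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), mag.ravel()):
        hist[b] += m
    return hist
```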

33 pedestrian Slides by Pete Barnum
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

34 Kernel SVM The decision function is a linear combination of support vectors: F(x) = sign(Σi αi yi xiTx − b). Prediction is a dot product of the query with each support vector. A kernel is a function that computes the dot product of data points in some unknown space: K(x, z) = φ(x)·φ(z). We can compute the decision without knowing the space: F(x) = sign(Σi αi yi K(xi, x) − b).

35 Useful kernels Linear! RBF Histogram intersection Pyramid match
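Three of the kernels above are one-liners; a sketch for single vectors (the RBF bandwidth gamma is illustrative):

```python
import numpy as np

def linear_kernel(a, b):
    """Plain dot product: the kernel of the ordinary linear SVM."""
    return a.dot(b)

def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function: similarity decays with squared distance."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two histograms."""
    return np.minimum(h1, h2).sum()
```

The pyramid match kernel applies histogram intersection at several resolutions and weights the levels, which is what the next slides build up to.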

36 Histogram intersection
Assign each patch to its texture cluster (+1 in that bin) and count, giving a histogram. S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.

37 (Spatial) Pyramid Match
S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.

38 Boosting Weak classifier: a classifier that is slightly better than random guessing.
Weak learner: builds weak classifiers.

39 Boosting Start with a uniform distribution over the training examples. Iterate:
Get a weak classifier fk. Compute its 0-1 error εk under the current distribution. Take αk = ½ ln((1 − εk)/εk). Update the distribution, up-weighting the examples fk gets wrong. Output the final "strong" classifier F(x) = sign(Σk αk fk(x)). Yoav Freund, Robert E. Schapire, A Short Introduction to Boosting
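The loop above can be sketched as follows. This is a minimal AdaBoost sketch under the assumption that `weak_learner(x, y, d)` returns a ±1-valued classifier trained on the weighted distribution d:

```python
import numpy as np

def adaboost(x, y, weak_learner, rounds=10):
    """AdaBoost: reweight examples each round, combine weak classifiers."""
    n = len(x)
    d = np.full(n, 1.0 / n)                  # start with a uniform distribution
    ensemble = []
    for _ in range(rounds):
        f = weak_learner(x, y, d)
        pred = f(x)
        eps = d[pred != y].sum()             # weighted 0-1 error
        if eps >= 0.5:                       # no longer better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        d *= np.exp(-alpha * y * pred)       # up-weight the mistakes
        d /= d.sum()
        ensemble.append((alpha, f))
    def strong(z):
        return np.sign(sum(a * f(z) for a, f in ensemble))
    return strong
```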

40 Face detection Pipeline: image window x → features → classify F(x) → y = +1 (face) or −1 (not face)
We slide a window over the image Extract features for each window Classify each window into face/non-face We start with a walk through a face detector.

41 Face detection Use haar-like features
A stump such as "x234 > 1.3? yes → +1 face, no → −1 non-face". Use Haar-like features (e.g. x120, x357, x629, x834). Use decision stumps as weak classifiers. Use boosting to build a strong classifier. Use a sliding window to detect the face.
