
1 Part 3: discriminative methods

2 Discriminative methods
Object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether it contains a target object or not. Each window is represented by extracting a large number of features that encode information such as boundaries, textures, color, and spatial structure. The classification function, which maps an image window to a binary decision, is learned using methods such as SVMs, boosting, or neural networks.
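To make the window-scanning formulation concrete, here is a minimal Matlab sketch. extractFeatures and strongClassifier are hypothetical placeholders for the feature extraction and learned classification function described in this part, not functions from any particular toolbox.

    img = imread('scene.jpg');                        % input image (path is illustrative)
    [h, w] = size(rgb2gray(img));
    winSize = 64; step = 8;                           % window size and scan stride
    detections = [];
    for r = 1:step:(h - winSize + 1)
        for c = 1:step:(w - winSize + 1)
            window = img(r:r+winSize-1, c:c+winSize-1, :);
            x = extractFeatures(window);              % feature vector for this window
            score = strongClassifier(x);              % real-valued classifier output
            if score > 0                              % the sign gives the binary decision
                detections = [detections; r, c, score];   % keep positive windows
            end
        end
    end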

3 Overview General introduction, formulation of the problem in this framework, and some history. Explore the demo and related theory. Current challenges and future trends. There are many discriminative approaches for object recognition, and the literature is very large. In this presentation I will not try to survey all the important methods. Instead, I will go into depth on one single technique that is easy enough to get a good handle on in just one hour. I will also review some other techniques with the goal of illustrating the typical formulation of the object recognition problem in a discriminative framework. Then, at the end of the talk, I will discuss some directions I think will become important in the next few years.

4 Discriminative methods
Object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows … and a decision is taken at each window about whether it contains a target object or not. (Slide figure: a decision boundary separating 'computer screen' from 'background' patches in some feature space; 'Where are the screens?'; a bag of image patches.)

5 Formulation Formulation: binary classification
Training data: image patches x1, x2, x3, …, xN, each labeled as containing the object (y = +1) or background (y = -1). Test data: patches xN+1, xN+2, …, xN+M, whose labels are unknown. Classification function: y = F(x), where F belongs to some family of functions.

6 Formulation The loss function
The loss function measures the cost of the classifier's errors on the training samples. Find the F that minimizes the loss on the training set, e.g. F* = arg minF Σt L(F(xt), yt).

7 Discriminative vs. generative
Generative model: fit p(x|y) for each class, where x = data. Discriminative model: fit the conditional p(y|x) directly. Classification function: model only the decision, F(x) in {-1, +1}. (The original slide showed these three as plots over the data axis.) This is one of the arguments sometimes used to indicate advantages that a discriminative framework might have over a generative framework.

8 Classifiers Neural networks (NN). SVM. K-NN, nearest neighbors. Additive models.

9 Nearest neighbor classifiers
Learning involves adjusting the parameters that define the distance

10 Object models Invariance: search strategy. Part-based models. Template matching.

11 Object models Big emphasis in the past on face detection:
It provided a clear problem with a well-defined visual category and many applications. It allowed making progress on efficient techniques. Rowley, Schneiderman, Poggio, Viola, Ullman, Geman.

12 A simple object detector with Boosting
Download: toolbox for manipulating the dataset, code and dataset. Matlab code: gentle boosting, an object detector using a part-based model, and a dataset with cars and computer monitors. In the next part I will focus on one particular technique, and I will provide a detailed description of the things you need to know to make it work. This part is accompanied by a Matlab toolbox that you can find on the web site of the class. It is entirely self-contained and written in Matlab. It has an implementation of gentle boosting and an object detector. The demo shows results on detecting screens and cars and can easily be trained to detect other objects.

13 Boosting A simple algorithm for learning robust classifiers (Schapire, Friedman). Provides an efficient algorithm for sparse visual feature selection (Tieu & Viola, Viola & Jones).

14 Boosting Defines a classifier using an additive model:
H(x) = Σm αm hm(x), where H is the strong classifier, the hm are weak classifiers, the αm are their weights, and x is the feature vector. We need to define a family of weak classifiers for the algorithm to select from.

15 Boosting It is a greedy procedure. Each data point xt has
a class label yt in {+1, -1} and a weight, initialized to wt = 1.

16 Toy example Weak learners from the family of lines. Each data point has
a class label yt in {+1, -1} and a weight wt = 1. Any single line h has p(error) = 0.5: it is at chance.

17 Toy example Each data point has a class label yt in {+1, -1}
and a weight wt = 1. This line seems to be the best. This is a 'weak classifier': it performs slightly better than chance.

18 Toy example Each data point has a class label yt in {+1, -1}.
We update the weights: wt ← wt exp(-yt Ht). We set a new problem for which the previous weak classifier performs at chance again.

19 Toy example Each data point has a class label yt in {+1, -1}.
We update the weights: wt ← wt exp(-yt Ht). We set a new problem for which the previous weak classifier performs at chance again.

20 Toy example Each data point has a class label yt in {+1, -1}.
We update the weights: wt ← wt exp(-yt Ht). We set a new problem for which the previous weak classifier performs at chance again.

21 Toy example Each data point has a class label yt in {+1, -1}.
We update the weights: wt ← wt exp(-yt Ht). We set a new problem for which the previous weak classifier performs at chance again.

22 Toy example f1, f2, f3, f4: the strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers.

23 Boosting Different cost functions and minimization algorithms result in various flavors of Boosting. In this demo I will use gentleBoosting: it is simple to implement and numerically stable.

24 gentleBoosting GentleBoosting fits the additive model
F(x) = Σm fm(x) by minimizing the exponential loss J(F) = Σt exp(-yt F(xt)) over the training samples. Minimization is performed using Newton steps.

25 gentleBoosting Iterative procedure. At each step we add a new weak classifier fm to the model: F(x) ← F(x) + fm(x).
We choose the fm that minimizes the cost J(F + fm) = Σt exp(-yt (F(xt) + fm(xt))). A Taylor approximation of this term gives J ≈ Σt wt (yt - fm(xt))^2, with weights wt = exp(-yt F(xt)) at this iteration. So at each iteration we just need to solve a weighted least-squares problem. For more details: Friedman, Hastie, Tibshirani. “Additive Logistic Regression: a Statistical View of Boosting” (1998)

26 Weak classifiers Generic form: fm(x), e.g. a decision tree.
Regression stumps (single-split trees) are simple but commonly used in object detection. A stump thresholds one feature dimension x at θ: fm(x) = b if x > θ and fm(x) = a if x ≤ θ, where b = Ew(y·[x > θ]) and a = Ew(y·[x < θ]) are the weighted class averages on each side of the threshold. Four parameters: the feature index, θ, a, and b. A sketch of fitting such a stump by weighted least squares follows below.
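To make the weighted least-squares step concrete, here is a sketch of fitting one regression stump, in the spirit of what the toolbox's selectBestWeakClassifier (called on the next slide) must do. This is an assumed reading of the slides, not the actual toolbox code; x is Nfeatures x Nsamples, y holds labels in {-1, +1}, and w holds the current boosting weights.

    function [f, theta, a, b] = fitStump(x, y, w)
    w = w / sum(w);                                   % normalize the boosting weights
    bestErr = inf;
    for k = 1:size(x, 1)                              % try every feature dimension
        for t = unique(x(k, :))                       % candidate thresholds
            above = x(k, :) > t;
            bT = sum(w(above)  .* y(above))  / max(sum(w(above)),  eps);  % Ew(y | x > t)
            aT = sum(w(~above) .* y(~above)) / max(sum(w(~above)), eps);  % Ew(y | x <= t)
            pred = bT * above + aT * ~above;          % stump output on all samples
            err  = sum(w .* (y - pred).^2);           % weighted squared error
            if err < bestErr
                bestErr = err; f = k; theta = t; a = aT; b = bT;
            end
        end
    end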

27 gentleBoosting.m

    function classifier = gentleBoost(x, y, Nrounds)
    w = ones(1, size(x, 2));                     % initialize weights w = 1
    for m = 1:Nrounds
        fm = selectBestWeakClassifier(x, y, w);  % solve weighted least-squares
        w = w .* exp(-y .* fm);                  % re-weight training samples
        % store parameters of fm in classifier
    end
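A hypothetical call on toy 2D data, with shapes matching the code above (the variable names are assumptions):

    x = [randn(2, 100), randn(2, 100) + 2];   % 2 x 200 feature matrix, two classes
    y = [ones(1, 100), -ones(1, 100)];        % labels in {-1, +1}
    classifier = gentleBoost(x, y, 50);       % run 50 boosting rounds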

28 Demo gentleBoosting Demo using gentle boosting and stumps with hand-selected 2D data: > demoGentleBoost.m

29 Probabilistic interpretation
Generative model: p(x|y). Discriminative (Boosting) model: in a probabilistic framework, as opposed to the generative model, here we fit a conditional probability. Boosting is a way of fitting a logistic regression, P(y = +1 | x) = e^F(x) / (e^F(x) + e^-F(x)), where F can be built from a set of arbitrary functions. This provides great flexibility, difficult to beat with current generative models. But there is also the danger of not understanding what they are really doing.

30 From images to features: Weak detectors
We will now define a family of visual features that can be used as weak classifiers (“weak detectors”)

31 Weak detectors Textures of textures Tieu and Viola, CVPR 2000
One of the first papers to use boosting for a vision task. Take one filter, filter the image, take the absolute value, and downsample. Take another filter, take the absolute value of the output, and downsample. Do this once more. Then, for each color channel independently, sum the values of all the pixels; the resulting number is one feature. Every combination of three filters generates a different feature, so doing this for all triples of filters gives thousands of features. Boosting selects a sparse subset, so computations at test time are very efficient. Boosting is also very robust to overfitting.
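A rough Matlab sketch of computing one such feature, under my reading of the description above (grayscale instead of per color channel for brevity; the three filters are illustrative choices, not the paper's filter bank):

    img = double(rgb2gray(imread('scene.jpg')));
    g1 = fspecial('sobel');                         % first filter
    g2 = fspecial('laplacian');                     % second filter
    g3 = fspecial('gaussian', 5, 1);                % third filter
    r = imresize(abs(imfilter(img, g1)), 0.5);      % filter, rectify, downsample
    r = imresize(abs(imfilter(r,   g2)), 0.5);      % ... and again
    r = imresize(abs(imfilter(r,   g3)), 0.5);      % ... once more
    feature = sum(r(:));                            % one scalar feature for this filter triple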

32 Weak detectors Haar filters and integral image
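The key trick behind Haar filters is the integral image, which makes any box sum a constant-time operation. A generic sketch (not tied to the course toolbox; the patch coordinates are illustrative):

    img = double(rgb2gray(imread('scene.jpg')));
    ii = cumsum(cumsum(img, 1), 2);           % integral image: ii(r,c) = sum(img(1:r,1:c))
    ii = padarray(ii, [1 1], 0, 'pre');       % zero-pad so boxes touching row/column 1 work
    % sum of img(r1:r2, c1:c2) from four lookups:
    boxsum = @(r1,c1,r2,c2) ii(r2+1,c2+1) - ii(r1,c2+1) - ii(r2+1,c1) + ii(r1,c1);
    % a two-rectangle Haar feature: left half minus right half of a patch
    f = boxsum(10, 10, 40, 25) - boxsum(10, 26, 40, 41);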

33 Weak detectors Edge probes & chamfer distances

34 Weak detectors Part-based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location (screen model, car model). These features are the ones used for the detector on the course web site.

35 Weak detectors We can create a family of “weak detectors” by collecting a set of filtered templates from a set of training objects. The simplest weak detector uses template matching to detect the presence of a part of the object; once the part is detected, it votes for the object center in the object coordinate frame. In this example, we are trying to detect a computer screen using the top-left corner of the screen. First we compute a normalized correlation between the image and the template. This gives a peak in the response near the top-left corner of the screen and, of course, many other matches that do not correspond to screens. Then we displace the output of the correlation, taking into account the relative position between the top-left corner of the screen and the center of the screen, and we also blur the output. If we now apply a threshold, the center of the screen has a value above the threshold while most of the background gets suppressed. There are many other locations that are also above the threshold, so this is not a good detector. But it is better than chance!
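A sketch of this single weak detector in Matlab. The template, the displacement, and the threshold are assumptions for illustration, and the alignment of normxcorr2's full-size output is handled only approximately:

    img = double(rgb2gray(imread('scene.jpg')));
    % template: a patch cropped from a training image, e.g. a screen's top-left corner
    score = normxcorr2(template, img);                    % normalized correlation
    score = score(1:size(img,1), 1:size(img,2));          % crop back to image size
    dr = 30; dc = 40;                                     % offset from part to object center
    score = circshift(score, [dr dc]);                    % displace votes to the center
    score = imfilter(score, fspecial('gaussian', 21, 5)); % blur to tolerate misalignment
    detections = score > 0.5;                             % threshold: weak, but above chance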

36 Weak detectors We can do a better job using filtered images.
Still a weak detector, but better than before. We create the family of “weak detectors” by collecting a set of filtered templates from a set of training objects.

37 From images to features: Weak detectors
In summary, first we create a family of weak detectors: 1) we collect many patches, i.e. a set of filtered templates from a set of training objects; 2) each weak detector works by doing template matching using one of the templates.

38 Weak detectors Generative model vs. discriminative (Boosting) model.
In a probabilistic framework, as opposed to the generative model, here we fit a conditional probability; boosting is a way of fitting a logistic regression. (Slide figure labels: image; part template fi; relative position Pi with respect to the object center; feature gi.)

39 Training First we evaluate all the N features on all the training images. Then we sample the feature outputs at the object center and at random locations in the background. To train, we need positive and negative examples, so we do the following: get positive samples by taking the values of all the features at the center of the target, and take negative samples from the background.
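A sketch of this sampling step for one training image, under assumed variable names: featureMaps is the H x W x N stack of feature outputs and (rowC, colC) is the labeled object center.

    xpos = squeeze(featureMaps(rowC, colC, :));      % positive: all N features at the center
    nNeg = 10;                                       % background samples for this image
    rNeg = randi(size(featureMaps, 1), 1, nNeg);     % random background rows...
    cNeg = randi(size(featureMaps, 2), 1, nNeg);     % ...and columns (ideally away from the center)
    xneg = zeros(size(featureMaps, 3), nNeg);
    for k = 1:nNeg
        xneg(:, k) = squeeze(featureMaps(rNeg(k), cNeg(k), :));
    end
    x = [xpos, xneg];                                % feature vectors for the classifier
    y = [1, -ones(1, nNeg)];                         % labels: center = +1, background = -1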

40 Example of weak ‘screen detector’
We run boosting: at each iteration we select the best feature hm(v) on the weighted classification problem. A decision stump is a threshold on a single feature. Each decision stump has 4 parameters: {f, θ, a, b}, where f = template index (selected from a dictionary of templates), θ = threshold, and a, b = average class value (-1, +1) on each side of the threshold.
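As a one-line sketch, applying a learned stump {f, θ, a, b} to a matrix v of feature outputs (one row per template, one column per candidate window), consistent with the convention on slide 26:

    hm = b * (v(f, :) > theta) + a * (v(f, :) <= theta);   % weak detector output per window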

41 Representation Selected features for the screen detector (the slide showed the features chosen at iterations 1, 2, 3, 4, 10, …, 100).

42 Representation Selected features for the car detector (the slide showed the features chosen at iterations 1, 2, 3, 4, 10, …, 100).

43 Example: screen detection
Feature output

44 Example: screen detection
Feature output Thresholded output Weak ‘detector’ Produces many false alarms.

45 Example: screen detection
Feature output Thresholded output Strong classifier at iteration 1

46 Example: screen detection
Feature output Thresholded output Strong classifier Second weak ‘detector’ Produces a different set of false alarms.

47 Example: screen detection
Feature output Thresholded output Strong classifier + Strong classifier at iteration 2

48 Example: screen detection
Feature output Thresholded output Strong classifier + Strong classifier at iteration 10

49 Example: screen detection
Feature output Thresholded output Strong classifier + Adding features Final classification Strong classifier at iteration 200

50 Demo Demo of screen and car detectors using parts, Gentle boost, and stumps: > runDetector.m

51 Performance measures ROC Precision-recall
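A sketch of computing both curves from the detector's real-valued window scores; scores and labels are assumed test-set outputs, not toolbox variables:

    [s, order] = sort(scores, 'descend');     % rank windows by confidence
    lab = labels(order) > 0;                  % true for target windows
    tp = cumsum(lab); fp = cumsum(~lab);      % detections above each score threshold
    recall    = tp / sum(lab);                % = ROC true-positive rate
    precision = tp ./ (tp + fp);
    fpr       = fp / sum(~lab);               % ROC false-positive rate
    plot(recall, precision); xlabel('recall'); ylabel('precision');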

52 Cascade of classifiers
Geman Viola
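The idea, in the spirit of Viola and Geman, is a sequence of increasingly expensive classifiers in which each stage may reject a window early, so most background windows are discarded cheaply. A minimal sketch; the stage functions and thresholds are assumptions:

    function isObject = runCascade(x, stages, thresholds)
    % stages: cell array of functions, each returning a partial classifier score
    isObject = false;
    score = 0;
    for s = 1:numel(stages)
        score = score + stages{s}(x);     % accumulate this stage's contribution
        if score < thresholds(s)          % early rejection test
            return;                       % window rejected: stop computing features
        end
    end
    isObject = true;                      % survived all stages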

53 Hybrid methods In many cases, algorithms cannot be classified as purely generative or discriminative; many are a mixture of both. (Just one slide to mention these methods.) Some examples: Hinton, Schiele, Perona (Welling, ICCV 2005), Poggio (morphable models and learning a face from one example).

54 “Head in the coffee beans problem”
Can you find the head in this image?

55 Multiclass object detection
Multiview Multiclass

56 Multiclass object detection

57 Sharing representations
Multiview Multiclass

58 Sharing representations
Ullman & Evgyne

59 Context Multiview Multiclass

60 Context: relationships between objects
First detect simple objects (with reliable detectors) that provide strong contextual constraints on the target (screen -> keyboard -> mouse).

61

62 Session 3: discriminative methods
20' demo: simple (Ada-) boosting + object detector. 20' review + new material: Viola & Jones, Schneiderman et al., Opelt et al., Torralba et al., SVM (Triggs, CVPR 05), NN (LeCun).

63 Session 3: Demo Simple (Ada-) boosting. LeCun demo? Viola demo?

64 Session 3: Demo gentleBoost

65 Session 3: Review Viola & Jones, Schneiderman et al., Opelt et al.,
Torralba et al., SVM (Triggs, CVPR 05 – get video), NN (LeCun).

66 Session 3: multiclass Multiclass boosting, some examples.

67 Single category object detection and the “Head in the coffee beans problem”

68 Discriminative methods
Object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows. (Slide figure: a bag of image patches.)

69 Discriminative methods
… and a decision is taken at each window about whether it contains a target object or not. (Slide figure: 'background' vs. 'computer screen' patches separated by a decision boundary.)

70 Discriminative methods
… and a decision is taken at each window about whether it contains a target object or not. (Slide figure: 'background' vs. 'computer screen' patches separated by a decision boundary.)

