1
Robust Real Time Face Detection
P. Viola & Michael Jones
Presented by: Matan Protter
Visual Recognition, Spring 2005, Technion
2
Face Detection – What & Why?
Given an image:
- Ultimately: find all faces, while making no mistakes.
- Practically: find most faces, while making few mistakes.
What for?
- Mainly: face / people recognition.
What is a face?
- Standard testing set: MIT + CMU.
3
Face Detection Methods
Too many to count…
- Look for meaningful features: eyes, nose, ears, chin line, etc.
- Train a detector: a combination of a number of weighted weak classifiers.
Detectors differ by:
- Feature set
- Training set
- Training method
- Etc.
4
The Proposed Method
Another learning-based detector, with novel ideas though:
- Detector structure designed to run quickly: the “Cascade”
- Feature set: Haar-like
- Feature selection method (training method): modified boosting
Also:
- A very extensive testing set
- A way to automatically generate negative examples
All will be explained in the following slides.
5
Detector Structure – Cascade
- Detectors usually scan every window in the image, at every scale.
- Conventional detectors run all weak classifiers on all windows.
- Yet some windows can be discarded very quickly, so computation time is wasted.
Solution: construct a sequential set of tests.
- Only windows that pass a test move on to the next; the rest are discarded and ignored.
- Windows that survive all tests are declared faces.
- Each test is more computationally expensive than the one before, but also more discriminative.
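The early-rejection idea can be sketched in a few lines. This is only an illustration, not the authors' implementation: the two stage functions (`stage_bright`, `stage_contrast`) and the toy "windows" (short lists of pixel values) are hypothetical stand-ins for real cascade levels.

```python
def run_cascade(window, stages):
    """Return True only if the window passes every stage, in order."""
    for stage in stages:
        if not stage(window):
            return False  # rejected: the later, costlier stages never run
    return True

# Hypothetical stages: a cheap test first, a (notionally) costlier one second.
def stage_bright(window):
    return sum(window) / len(window) > 50

def stage_contrast(window):
    return max(window) - min(window) > 30

stages = [stage_bright, stage_contrast]
windows = [
    [10, 12, 11, 9],       # fails the first test
    [100, 101, 99, 100],   # passes the first test, fails the second
    [60, 120, 80, 40],     # passes both tests
]
detections = [w for w in windows if run_cascade(w, stages)]
```

Most windows are expected to exit at the first stage, so the average cost per window stays close to the cost of the cheapest test.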
6
Cascade – Graphically
[Figure: all windows, at all scales, enter Test 1, then Test 2, more tests, and the last test. Windows that pass a test move on to the next one; windows that fail are rejected as background. Computational price and discrimination ability both grow along the cascade.]
7
Training : Building The Cascade
Training is done off-line, as a pre-processing step.
- Takes a lot of time: the non-parallel version took weeks, the parallel version took a day.
- Who cares? It is done once!
Data sets:
- Positive examples: about 5000 faces (each a 24×24 pixel window), not fully aligned.
- Negative examples: about images which contain no faces (random crawl through Google).
- Independent validation set.
8
Training : Building The Cascade Cont.
Each level is trained separately, in sequential order.
Training set:
- Positive examples: all input faces.
- Negative examples: the first 5000 false detections found by running the cascade up to that level on the non-face image set.
Each level is trained based on its predecessors’ errors!
[Figure: training level #N. Positive examples: all faces. Negative examples: mis-detections produced by running detector levels 1 to (N-1) on all non-face images.]
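A minimal sketch of how the negative set for the next level could be gathered, assuming each trained level is a function that returns True when a window passes. The name `collect_negatives` and the integer "windows" in the usage line are hypothetical.

```python
def collect_negatives(non_face_windows, cascade_so_far, limit=5000):
    """Keep the first `limit` non-face windows that the current cascade
    wrongly lets through (false detections)."""
    negatives = []
    for window in non_face_windows:
        # With an empty cascade (level 1), every window counts as a detection.
        if all(stage(window) for stage in cascade_so_far):
            negatives.append(window)
            if len(negatives) == limit:
                break
    return negatives

# Toy usage: "windows" are plain numbers, and the single stage passes values above 5.
toy_negatives = collect_negatives(range(10), [lambda w: w > 5], limit=3)
```

Because the windows come only from images without faces, every detection is by construction a mistake, which is what makes the negative examples automatic to generate.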
9
Training : Building The Cascade Cont.
Setting goals for the entire cascade:
- Total Probability of Detection (PD)
- Total False Alarm Rate (FAR)
- Higher PD: more complicated levels
- Lower FAR: more levels
Setting goals for each level:
- Can derive PD & FAR for each level from the total detector’s PD & FAR.
Trade-offs: PD, FAR, number of features, number of levels (running time).
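One way to derive per-level goals, as a sketch: a window must pass every level, so the total PD and total FAR are the products of the per-level values. Assuming, for illustration only, that all levels share the same goals, each per-level goal is the n-th root of the total. The function name and the example numbers are hypothetical.

```python
def per_level_targets(total_pd, total_far, num_levels):
    """Per-level PD and FAR, assuming every level gets the same goal and
    the totals are products over the levels."""
    level_pd = total_pd ** (1.0 / num_levels)
    level_far = total_far ** (1.0 / num_levels)
    return level_pd, level_far

# Example: a 10-level cascade aiming at PD = 0.9 and FAR = 1e-6 overall.
level_pd, level_far = per_level_targets(0.9, 1e-6, 10)
```

In this example each level needs a PD of roughly 0.99 but may let through about a quarter of the non-faces it sees, which is why individual levels can stay simple.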
10
Training : One Level
Each level is made up of a combination of weak classifiers.
Each classifier:
- Is made up of:
  - A function to run on the window (feature value)
  - Threshold level
  - Polarity (faces are above threshold or below)
- Returns a 0/1 answer.
- Is assigned a weight.
The level gives an answer by comparing the weighted sum of its classifiers’ answers to the level’s threshold.
Decreasing the threshold results in higher PD and higher FAR.
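A small sketch of one level, under assumptions that are not on the slide: polarity is encoded as +1 (faces lie above the threshold) or -1 (faces lie below), and the level answers 1 when the weighted sum of the 0/1 answers reaches the level threshold. All names and numbers are hypothetical.

```python
def weak_classify(feature_value, threshold, polarity):
    """0/1 answer of one weak classifier.
    polarity = +1: faces are above the threshold; -1: faces are below it."""
    return 1 if polarity * feature_value > polarity * threshold else 0

def level_classify(feature_values, classifiers, level_threshold):
    """classifiers is a list of (threshold, polarity, weight) tuples,
    one per entry of feature_values."""
    score = sum(
        weight * weak_classify(value, threshold, polarity)
        for value, (threshold, polarity, weight) in zip(feature_values, classifiers)
    )
    return 1 if score >= level_threshold else 0

classifiers = [(10, +1, 0.6), (5, -1, 0.4)]
strict = level_classify([12, 7], classifiers, level_threshold=0.8)   # only the first classifier fires
lenient = level_classify([12, 7], classifiers, level_threshold=0.5)  # same window, lower threshold
```

The same window is rejected with the stricter threshold and accepted with the lower one, which is the PD / FAR trade-off in miniature.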
11
Training : One Level – Selecting Classifiers
Adding another (optimal) weak classifier:
1. Normalize the examples’ (training set) weights.
2. Select the weak classifier that minimizes the error (sum of weights of misclassified examples).
3. Decrease the weights of correctly classified examples.
Check if good enough:
1. Run on the validation set.
2. Determine the level’s threshold such that the desired PD is met.
3. Check if the desired FAR is also met.
   - Yes: finish the level.
   - No: add another weak classifier.
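The "adding another weak classifier" step can be sketched as one boosting round. The weight update used here (multiply the weights of correctly classified examples by beta = error / (1 - error)) is the standard AdaBoost-style rule and is an assumption about what "modified boosting" does in detail; the function name and data layout are hypothetical.

```python
def boosting_round(weights, candidate_outputs, labels):
    """One round of classifier selection.
    candidate_outputs[j][i] is the 0/1 answer of candidate j on example i.
    Returns (index of chosen candidate, its weighted error, updated weights)."""
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize the example weights

    # Weighted error of a candidate = sum of weights of the examples it gets wrong.
    errors = [
        sum(w for w, out, y in zip(weights, outputs, labels) if out != y)
        for outputs in candidate_outputs
    ]
    best = min(range(len(errors)), key=errors.__getitem__)
    eps = errors[best]

    # Down-weight correctly classified examples (beta < 1 whenever eps < 0.5).
    # Note: eps == 0 would zero every weight; a real implementation must handle that.
    beta = eps / (1.0 - eps)
    new_weights = [
        w * beta if out == y else w
        for w, out, y in zip(weights, candidate_outputs[best], labels)
    ]
    return best, eps, new_weights

labels = [1, 1, 0, 0]
candidates = [
    [1, 1, 1, 0],  # wrong on one example
    [1, 0, 1, 0],  # wrong on two examples
]
best, eps, new_weights = boosting_round([1, 1, 1, 1], candidates, labels)
```

After the round, the one example the chosen classifier got wrong carries more weight than the others, so the next classifier is pushed to fix it.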
12
Classifiers - Description
Each classifier is the sum of 2, 3 or 4 adjacent rectangles of the same size.
- Each rectangle represents the sum of the pixels inside it.
- Different configurations of rectangles.
- All possible sizes & locations of rectangles.
- Rectangles are summed either as positive (blue) or negative (yellow).
[Figure: two-rectangle horizontal feature, two-rectangle vertical feature, three-rectangle horizontal feature, three-rectangle vertical feature, four-rectangle feature.]
13
Classifiers – Cont.
- For a 24 by 24 pixel window: over 160,000 possible classifiers (all types, all locations, all sizes).
- Requires the selection process to be efficient.
- Example: the first two classifiers selected.
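The order of magnitude can be checked by brute-force counting. This sketch assumes the five rectangle configurations named on the previous slide, each scaled by whole multiples of its base shape and placed at every position that fits in the window; the exact enumeration behind the slide's figure may differ.

```python
def count_features(window=24, base_shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count rectangle features inside a window x window patch.
    Each base shape (width, height) in rectangle units is stretched by
    integer factors, then placed at every location where it fits."""
    total = 0
    for base_w, base_h in base_shapes:
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                total += (window - w + 1) * (window - h + 1)
    return total

n_features = count_features()  # 162336 under these assumptions
```

With more than 160,000 candidates against only 576 pixels in the window, the feature set is heavily overcomplete, which is why selection has to be efficient.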
14
Classifiers - Explanation
They are reminiscent of Haar basis functions.
Can also give an intuitive explanation:
- Two-rectangle feature: evaluates the first derivative.
- Three-rectangle feature: similar to the second derivative, also a line detector.
- Four-rectangle feature: evaluates the derivative in XY.
OK, so what’s new?
15
Classifiers – Very Efficient
Computationally efficient: using the Integral Image (I_I).
- The Integral Image is, formally: $I_I(x, y) = \sum_{x' \le x,\; y' \le y} I(x', y')$, where $I$ is the input image.
- And literally: the sum of the pixels up to that point.
- Can be computed in one pass over the image.
- The features can be computed in 6-9 matrix references.
Multi-scale efficient:
- The detector can be scaled, instead of the image.
- Ensures better efficiency than any method that requires pyramids.
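A minimal pure-Python sketch of the integral image and of a rectangle sum read from it. The zero-padded first row and column are an implementation convenience assumed here, and the function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x.
    The extra zero row and column remove the need for border checks."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):  # a single pass over the image
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): 4 references."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_horizontal(ii, x, y, w, h):
    """Left rectangle minus the adjacent right rectangle, each w x h."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
bottom_right = rect_sum(ii, 1, 1, 2, 2)           # 5 + 6 + 8 + 9
left_minus_middle = two_rect_horizontal(ii, 0, 0, 1, 3)  # (1 + 4 + 7) - (2 + 5 + 8)
```

This two-rectangle version makes 8 lookups for clarity; the two rectangles share two corners, so 6 distinct references suffice. The cost is independent of the rectangle size, which is what allows the detector itself to be scaled instead of the image.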
16
Results
The only point that matters…
18
Results Compared To Others
And speed?
- 15 frames per second (each frame is 384 × 288 pixels) on a 700 MHz Pentium III.
- Today can probably achieve real time.
- Training takes weeks (unless parallelized).
19
Summary
- Cascade
- Haar-like features
- Modified boosting
- Acceptable results
- Speedy results
[Figure: the cascade. All windows, at all scales, pass through Test 1, Test 2, more tests, and the last test; windows that fail a test are rejected as background.]
20
Improvements
- A more extensive feature set.
- Using the cascade idea with different tests.
- More efficient learning (weeks???)
21
Questions?
22
Thank You