Face Detection and Recognition Reading: Chapter 18.10 and, optionally, “Face Recognition using Eigenfaces” by M. Turk and A. Pentland.


1 Face Detection and Recognition Reading: Chapter 18.10 and, optionally, “Face Recognition using Eigenfaces” by M. Turk and A. Pentland

2 Face Detection Problem Scan a window over the image Classify each window as either: –Face –Non-face [Diagram: window → classifier → face / non-face]

3 http://www.ri.cmu.edu/projects/project_320.html Face Detection Problem

4 Face Detection in most Consumer Cameras and Smartphones for Autofocus

5 The Viola-Jones Real-Time Face Detector P. Viola and M. Jones, 2004 Challenges: Each image contains 10,000 – 50,000 locations and scales where a face may occur Faces are rare: 0 – 50 per image >1,000 times as many non-face windows as faces Want a very small false positive rate: about 10^-6

6 Training Data (grayscale) 5,000 faces (frontal) 10^8 non-faces Faces are normalized –Scale, translation Many variations Across individuals Illumination Pose (rotation both in plane and out) Use Machine Learning to Create a 2-Class Classifier

7 Use Classifier at All Locations and Scales

8 Building a Classifier Compute lots of very simple features Efficiently choose best features Each feature is used to define a “weak classifier” Combine weak classifiers into an ensemble classifier based on boosting Learn multiple ensemble classifiers and “chain” them together to improve classification accuracy

9 Computing Features At each position and scale, use a sub-image (“window”) of size 24 x 24 Compute multiple candidate features for each window Want to rapidly compute these features

10 What Types of Features? Useful domain knowledge: –The eye region is darker than the forehead or the upper cheeks –The nose bridge region is brighter than the eyes –The mouth is darker than the chin Encoding –Location and size: eyes, nose bridge, mouth, etc. –Value: darker vs. lighter

11 Features 4 feature types (similar to “Haar wavelets”): Two-rectangle Three-rectangle Four-rectangle Value = ∑ (pixels in white area) - ∑ (pixels in black area)
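To make the feature definition concrete, here is a minimal Python/NumPy sketch of a two-rectangle feature computed by direct pixel sums over a 24 x 24 window; the rectangle placement is illustrative, not one of the learned features.

```python
import numpy as np

def two_rectangle_feature(window, top, left, h, w):
    """Vertical two-rectangle feature: sum of pixels in the upper (white)
    rectangle minus the sum in the lower (black) rectangle of equal size."""
    white = window[top:top + h, left:left + w].sum()
    black = window[top + h:top + 2 * h, left:left + w].sum()
    return float(white - black)

# Example on a random 24 x 24 grayscale window
window = np.random.rand(24, 24)
print(two_rectangle_feature(window, top=2, left=4, h=4, w=8))
```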

12 Huge Number of Features 160,000 features for each window!

13 Computing Features Efficiently: The Integral Image Intermediate representation of the image –Sum of all pixels above and to the left of (x, y) in image i Computed in one pass over the image: ii(x, y) = i(x, y) + ii(x-1, y) + ii(x, y-1) − ii(x-1, y-1)
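A minimal sketch of the one-pass construction above; the nested loops follow the recurrence literally (NumPy's cumulative sums give the same result much faster). Note that image coordinates (x, y) map to array indices [y, x].

```python
import numpy as np

def integral_image(i):
    """ii(x, y) = i(x, y) + ii(x-1, y) + ii(x, y-1) - ii(x-1, y-1),
    with ii treated as 0 outside the image."""
    h, w = i.shape
    ii = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            ii[y, x] = (i[y, x]
                        + (ii[y, x - 1] if x > 0 else 0.0)
                        + (ii[y - 1, x] if y > 0 else 0.0)
                        - (ii[y - 1, x - 1] if x > 0 and y > 0 else 0.0))
    return ii

# Equivalent vectorized form:
# ii = i.cumsum(axis=0).cumsum(axis=1)
```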

14 Using the Integral Image With the integral image representation, we can compute the value of any rectangular sum in constant time For example, the integral sum in rectangle D (with corner points 1, 2, 3, 4) is computed as: ii(4) + ii(1) – ii(2) – ii(3) The integral image itself is built with the pair of recurrences s(x, y) = s(x, y-1) + i(x, y) and ii(x, y) = ii(x-1, y) + s(x, y), where s(x, y) is the cumulative row sum and (0, 0) is the image origin
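A sketch of the constant-time rectangle sum, following the ii(4) + ii(1) − ii(2) − ii(3) rule above; padding the integral image with a zero row and column avoids special cases at the border. The rectangle coordinates here are illustrative.

```python
import numpy as np

def rect_sum(ii, top, left, h, w):
    """Sum over the h x w rectangle with top-left corner (top, left),
    using four lookups into the (zero-padded) integral image."""
    p = np.pad(ii, ((1, 0), (1, 0)))
    return (p[top + h, left + w] + p[top, left]
            - p[top, left + w] - p[top + h, left])

# Sanity check against a direct sum on a random window
window = np.random.rand(24, 24)
ii = window.cumsum(axis=0).cumsum(axis=1)
assert np.isclose(rect_sum(ii, 3, 5, 4, 8), window[3:7, 5:13].sum())
```

A two-rectangle feature then needs only a handful of such lookups (six array references), independent of the rectangle size.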

15 Features as Weak Classifiers Given window x, feature detector f_t, and threshold θ_t, construct a weak classifier, h_t
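The classifier definition on this slide was a formula image in the original deck; a standard form, consistent with the feature/threshold/parity description on slides 43-44, is:

```latex
\[
h_t(x) =
\begin{cases}
1 & \text{if } p_t\, f_t(x) < p_t\, \theta_t \\
0 & \text{otherwise}
\end{cases}
\]
```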

16 Combining Classifiers: Ensemble Methods Aggregation of the predictions of multiple classifiers, with the goal of improving accuracy by reducing the variance of an estimated prediction function (“mixture of experts”) Combining multiple classifiers often produces higher accuracy than any individual classifier

17 Ensemble Strategies Bagging –Create classifiers using different training sets, where each training set is created by “bootstrapping,” i.e., drawing examples (with replacement) from all possible training examples –Used by Random Forests Boosting –Sequential production of classifiers, where each classifier is dependent on the previous one –Make examples misclassified by the current classifier more important in the next classifier

18 Boosting Boosting is a class of ensemble methods for sequentially producing multiple weak classifiers, where each classifier is dependent on the previous ones Make examples misclassified by the previous classifier more important in the next classifier Combine all weak classifiers linearly into the combined classifier H(x) = Σ_t α_t h_t(x), where α_t is the weight of weak classifier h_t

19 Boosting How to select the best features? How to learn the final classification function? C(x) = α_1 h_1(x) + α_2 h_2(x) + ...

20 AdaBoost Algorithm Given a set of training windows labelled +/-, initially give equal weight to each training example Repeat T times 1.Select best weak classifier (min total weighted error on all training examples) 2.Increase weights of examples misclassified by current weak classifier Each round greedily selects the best feature given all previously selected features Final classifier weights weak classifiers by their accuracy
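A minimal Python/NumPy sketch of this loop, assuming the rectangle-feature values have already been computed into a matrix F (rows = training windows, columns = features); the brute-force threshold search is only for clarity.

```python
import numpy as np

def adaboost(F, y, T):
    """F: (n_examples, n_features) feature values, y: labels in {+1, -1},
    T: rounds. Returns a list of (feature, threshold, polarity, alpha)."""
    n, k = F.shape
    w = np.full(n, 1.0 / n)                    # equal initial weights
    strong = []
    for _ in range(T):
        best = None
        for j in range(k):                     # 1. pick weak classifier with
            for theta in np.unique(F[:, j]):   #    minimum weighted error
                for p in (+1, -1):
                    pred = np.where(p * F[:, j] < p * theta, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, theta, p, pred)
        err, j, theta, p, pred = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)         # 2. up-weight mistakes
        w /= w.sum()
        strong.append((j, theta, p, alpha))
    return strong
```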

21 AdaBoost Algorithm 1. Initialize Weights 2. Construct a classifier. Compute the error. 3. Update the weights, and repeat step 2. … 4. Finally, sum hypotheses… Slide by T. Holloway

22 AdaBoost Example Each data point x_t has a class label y_t ∈ {+1, −1} and a weight w_t = 1

23 Example Weak learners come from the family of decision lines Each data point has a class label y_t ∈ {+1, −1} and a weight w_t = 1

24 Example This decision line is the best ‘weak classifier’: it performs slightly better than chance Each data point has a class label y_t ∈ {+1, −1} and a weight w_t = 1

25 Example Re-weight the misclassified examples Update the weights: w_t ← w_t exp{−y_t h_t} for each data point with class label y_t ∈ {+1, −1}

26 Example Select the 2nd weak classifier on the re-weighted data Update the weights: w_t ← w_t exp{−y_t h_t}

27 Example Re-weight the misclassified examples again Update the weights: w_t ← w_t exp{−y_t h_t}

28 Example Select the 3rd weak classifier Update the weights: w_t ← w_t exp{−y_t h_t}

29 Example The final classifier is the combination of all the weak, linear classifiers h_1, h_2, h_3, h_4

30 Classifications (colors) and weights (sizes) after 1, 3, and 20 iterations of AdaBoost From Elder, John, From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods, 2007 Slide by T. Holloway

31 AdaBoost with Rectangle Features For each round of boosting: –Evaluate each rectangle feature on each example –Sort examples by feature values –Select best threshold (θ) for each feature (one with lowest error) –Select best feature/threshold combination from all candidate features –Compute weight (α) and incorporate feature into strong classifier: F(x) ← F(x) + α f(x) –Re-weight examples O(mn) time per round, where m = #features, n = #examples

32 AdaBoost - Adaptive Boosting Learn a single simple classifier Classify the data Look at where it makes errors Re-weight the data so that the inputs where we made errors get higher weight in the learning process Now learn a 2nd simple classifier on the weighted data Combine the 1st and 2nd classifiers and weight the data according to where they make errors Learn a 3rd classifier based on the weighted data … and so on until we learn T simple classifiers Final classifier is the weighted combination of all T classifiers

33 AdaBoost Advantages –Very little code –Reduces variance Disadvantages –Sensitive to noise and outliers Slide by T. Holloway

34 Example Classifier A classifier with T=200 rectangle features was learned using AdaBoost 95% correct detection rate on the test set with a false positive rate of 1 in 14,084 Not quite competitive [Figure: ROC curve for the 200-feature classifier, correct detection rate vs. false positive rate]

35 Learned Features

36 Classifier Error is Driven Down as T Increases, but Cost Increases

37 AdaBoost: Training Error Freund and Schapire (1997) proved that AdaBoost ADApts to the error rates of the individual weak hypotheses
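The bound on this slide was a formula image in the original deck; the standard statement of the result, with ε_t the weighted error of the t-th weak hypothesis and γ_t = 1/2 − ε_t its edge over random guessing, is:

```latex
\[
\text{training error}(H) \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^{2}}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big)
\]
```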

38 AdaBoost: Generalization Error Freund and Schapire (1997) also bounded the generalization error of the combined classifier (see below)
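The bound referred to here was also an image; the standard form of the Freund and Schapire (1997) result, with d the VC dimension of the weak-classifier space, m the number of training examples, and Õ hiding logarithmic factors, is:

```latex
\[
\Pr\big[H(x)\neq y\big] \;\le\; \widehat{\Pr}\big[H(x)\neq y\big]
\;+\; \tilde{O}\!\left(\sqrt{\frac{T\,d}{m}}\right)
\]
```

This bound grows with the number of rounds T, which is what suggests overfitting for large T.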

39 AdaBoost: Generalization Error An alternative analysis, which better matches the empirical findings, was presented by Schapire et al. (1998)
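Their margin-based bound (again reconstructed here from the standard statement, with f the normalized weighted vote and θ > 0 any margin) does not depend on the number of rounds T:

```latex
\[
\Pr\big[y\,f(x)\le 0\big] \;\le\;
\widehat{\Pr}\big[y\,f(x)\le \theta\big]
\;+\; \tilde{O}\!\left(\sqrt{\frac{d}{m\,\theta^{2}}}\right)
\]
```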

40 AdaBoost: Generalization Error One might expect boosting to overfit if run for too many rounds However, it has been observed empirically that AdaBoost often does not overfit, even when run for thousands of rounds Moreover, the generalization error often continues to go down long after the training error reaches 0

41 AdaBoost: Different Point of View We try to solve the problem of approximating the y’s using a linear combination of weak hypotheses In other words, we are interested in the problem of finding a vector of parameters α such that f(x_i) = Σ_t α_t h_t(x_i) is a good approximation of y_i For classification problems we try to match the sign of f(x_i) to y_i

42 AdaBoost: Different Point of View Sometimes it is advantageous to minimize some other (non-negative) loss function instead of the number of classification errors For AdaBoost the loss function is the exponential loss (see below) This point of view was used by Collins, Schapire, and Singer (2002) to demonstrate that AdaBoost converges to optimality
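The loss referred to above is the exponential loss of the combined function f(x) = Σ_t α_t h_t(x) over the m training examples:

```latex
\[
L(\alpha) \;=\; \sum_{i=1}^{m} \exp\big(-y_i f(x_i)\big)
\;=\; \sum_{i=1}^{m} \exp\Big(-y_i \sum_{t} \alpha_t h_t(x_i)\Big)
\]
```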

43 Weak Classifiers Defined by a feature, f, computed on a window in the image, a threshold, θ, and a parity, p Thresholds are obtained by computing the mean feature value on each class and then averaging the two means Parity defines the direction of the inequality

44 Weak Classifiers Weak Classifier: a feature that best separates a set of training examples Given a sub-window (x), a feature (f), a threshold (θ), and a polarity (p) indicating the direction of the inequality: h(x, f, p, θ) = 1 if p·f(x) < p·θ, and 0 otherwise
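As a one-line sketch in Python (the feature value f(x) is assumed to be precomputed, e.g., from the integral-image rectangle sums above):

```python
def weak_classifier(f_value, theta, p):
    """h(x, f, p, theta) = 1 if p * f(x) < p * theta, else 0.
    p in {+1, -1} flips the direction of the inequality."""
    return 1 if p * f_value < p * theta else 0

# Example: with p = -1 the classifier fires when the feature value
# exceeds the threshold.
print(weak_classifier(f_value=12.3, theta=10.0, p=-1))   # -> 1
```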

45 Weak Classifiers A weak classifier is a combination of a feature and a threshold We have K features We have N thresholds where N is the number of examples Thus there are KN weak classifiers

46 Weak Classifier Selection For each feature, sort the examples based on feature value For each element, evaluate the total sum of positive/negative example weights (T+ / T−) and the sum of positive/negative weights below the current example (S+ / S−) The error for a threshold which splits the range between the current and previous example in the sorted list is e = min( S+ + (T− − S−), S− + (T+ − S+) )
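A minimal Python/NumPy sketch of this single-pass search; it returns the lowest-error threshold and the matching polarity for the h(x) = 1 if p·f(x) < p·θ convention used above.

```python
import numpy as np

def best_threshold(f, y, w):
    """f: feature values, y: labels in {+1, -1}, w: example weights.
    Scan the examples in sorted order, keeping running sums S+ / S- of
    positive / negative weight below the current split, and evaluate
    e = min(S+ + (T- - S-), S- + (T+ - S+)) at each candidate threshold.
    Returns (error, threshold, polarity)."""
    order = np.argsort(f)
    f, y, w = f[order], y[order], w[order]
    T_pos, T_neg = w[y == +1].sum(), w[y == -1].sum()
    S_pos = S_neg = 0.0
    best = (np.inf, None, None)
    for i in range(len(f)):
        theta = f[i] if i == 0 else 0.5 * (f[i] + f[i - 1])
        # p = +1 predicts "face" below theta: errors are the negatives
        # below plus the positives above.
        e_plus = S_neg + (T_pos - S_pos)
        # p = -1 predicts "face" above theta: errors are the positives
        # below plus the negatives above.
        e_minus = S_pos + (T_neg - S_neg)
        for e, p in ((e_plus, +1), (e_minus, -1)):
            if e < best[0]:
                best = (e, theta, p)
        if y[i] == +1:
            S_pos += w[i]
        else:
            S_neg += w[i]
    return best
```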

47 Example Threshold selection for one feature, with five examples sorted by feature value f. Each example has weight W = 1/5; the total positive and negative weights are T+ = 3/5 and T− = 2/5; S+ and S− are the positive/negative weights below the current example; the two error candidates are A = S+ + (T− − S−) and B = S− + (T+ − S+), and e = min(A, B):

 x    y    f    W     T+    T−    S+    S−    A     B     e
 X1   −1   2    1/5   3/5   2/5   0     0     2/5   3/5   2/5
 X2   −1   3    1/5   3/5   2/5   0     1/5   1/5   4/5   1/5
 X3   +1   5    1/5   3/5   2/5   0     2/5   0     5/5   0
 X4   +1   7    1/5   3/5   2/5   1/5   2/5   1/5   4/5   1/5
 X5   +1   8    1/5   3/5   2/5   2/5   2/5   2/5   3/5   2/5

The best threshold (error 0) therefore lies between f = 3 and f = 5, separating X1, X2 from X3, X4, X5

48 AdaBoost Given a training set of images –For t = 1, …, T (rounds of boosting) A weak classifier is trained using a single feature The error of the classifier is calculated –The classifier (or single feature) with the lowest error is selected, and combined with the others to make a strong classifier After T rounds, a strong classifier is created –It is the weighted linear combination of the T weak classifiers selected

49 Cascading Improve accuracy and speed by “cascading” multiple classifiers together Start with a simple (small T) classifier that rejects many of the negative windows while detecting almost all positive windows A positive result from the first classifier triggers the evaluation of a second (more complex; larger T) classifier, and so on –Each classifier is trained with the false positives from the previous classifier A negative outcome at any point leads to the immediate rejection of the window (i.e., no face detected)
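At test time the cascade is just a short-circuiting loop over stages; a minimal sketch, where each stage is a hypothetical callable returning True (pass) or False (reject):

```python
def cascade_classify(window, stages):
    """Run a window through increasingly complex stages; a single
    rejection ends the evaluation early, which is what makes the
    cascade fast on the overwhelmingly common non-face windows."""
    for stage in stages:
        if not stage(window):
            return False          # rejected: non-face
    return True                   # survived every stage: face
```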

50 Building Fast Classifiers Given a nested set of classifier hypothesis classes, perform computational risk minimization: trade off false negatives against false positives (and computation) at each stage [Figure: % Detection vs. % False Pos plot, and a cascade diagram: IMAGE WINDOW → Classifier 1 → Classifier 2 → Classifier 3 → FACE, where an F (reject) output at any stage leads to NON-FACE]

51 Cascaded Classifier [Diagram: IMAGE WINDOW → 1-feature classifier → 5-feature classifier (50%) → 20-feature classifier (20% → 2%) → FACE; an F (reject) output at any stage leads to NON-FACE] A T=1 feature classifier achieves 100% detection rate and about 50% false positive rate A T=5 feature classifier achieves 100% detection rate and 40% false positive rate (20% cumulative) –using data from the previous stage A T=20 feature classifier achieves 100% detection rate with 10% false positive rate (2% cumulative)
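For a cascade of K stages with per-stage false positive rates f_i and detection rates d_i, the cumulative rates multiply, which is where the 50% × 40% = 20% and 20% × 10% = 2% figures above come from:

```latex
\[
F \;=\; \prod_{i=1}^{K} f_i, \qquad D \;=\; \prod_{i=1}^{K} d_i
\]
```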

52 Training a Cascade User selects values for: –Maximum acceptable false positive rate per node –Minimum acceptable detection rate per node –Target overall false positive rate User gives a set of positive and negative training examples

53 Training a Cascade Algorithm While the overall false positive rate is not met: –While the false positive rate of the current node is greater than the maximum allowed: Train a classifier with n features using AdaBoost on the set of positive and negative training examples Decrease the threshold of the current classifier until the detection rate of the layer (measured on a tuning set) exceeds the minimum Evaluate the current cascade classifier on the tuning set –Evaluate the current cascade detector on a set of non-face images and put any false detections into the negative training set
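A hedged sketch of this training loop; train_stage, detection_rate, false_positive_rate, and the stage's threshold attribute are hypothetical stand-ins for the AdaBoost trainer and evaluation routines, not part of any particular library.

```python
def train_cascade(pos, neg, tune, train_stage, detection_rate,
                  false_positive_rate, f_max=0.5, d_min=0.99, F_target=1e-6):
    """Sketch of cascade training: grow each stage until its false positive
    rate is low enough, tuning its threshold to keep the detection rate up,
    then mine the next stage's negatives from the current cascade's mistakes."""
    cascade, F_overall = [], 1.0
    while F_overall > F_target and neg:
        n, f_stage, stage = 0, 1.0, None
        while f_stage > f_max:                 # add features until FP rate met
            n += 1
            stage = train_stage(pos, neg, n)   # AdaBoost with n features
            # Lower the stage threshold until its detection rate on the
            # tuning set exceeds the per-stage minimum.
            while detection_rate(cascade + [stage], tune) < d_min:
                stage.threshold -= 0.01
            f_stage = false_positive_rate(cascade + [stage], tune)
        cascade.append(stage)
        F_overall *= f_stage
        # Keep only negatives (false detections) that the cascade still passes.
        neg = [x for x in neg if all(s(x) for s in cascade)]
    return cascade
```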

54 Results

55 Detection in Real Images Basic classifier operates on 24 x 24 windows Scaling –Scale the detector (rather than the images) –Features can easily be evaluated at any scale –Scale by factors of 1.25 Location –Move detector around the image (e.g., 1 pixel increments) Final Detection –A real face may result in multiple nearby detections –Post-process detected sub-windows to combine overlapping detections into a single detection
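A minimal sketch of the scanning strategy described above, assuming a hypothetical classify(top, left, size) callable that evaluates the (scaled) detector at that window; overlapping detections would still need to be merged afterwards.

```python
def scan_image(image_shape, classify, base=24, scale_step=1.25, stride=1):
    """Enumerate all window positions and scales (size grows by 1.25 each
    level) and collect the windows the detector accepts."""
    H, W = image_shape
    detections = []
    size = base
    while size <= min(H, W):
        for top in range(0, H - size + 1, stride):
            for left in range(0, W - size + 1, stride):
                if classify(top, left, size):
                    detections.append((top, left, size))
        size = int(round(size * scale_step))
    return detections
```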

56 Training Data Set 4,916 hand-labeled face images (24 x 24) 9,500 non-face images

57 Structure of the Detector 38 layer cascade 6,060 features

58 Speed of Final Detector In 2003, on a 700 MHz processor, the algorithm could process a 384 x 288 image in about 0.067 seconds Real-time implementations exist for face tracking in video

59 Results Notice detection at multiple scales

60 Results

61 Solving other “Face” Tasks Facial Feature Localization Demographic Analysis Profile Detection

62

63 Profile Features

64 Demo Implementing Viola & Jones system Frank Fritze, 2004

65 http://vimeo.com/12774628#

66 Real-Time Demos http://flashfacedetection.com/ http://flashfacedetection.com/camdemo2.html

67 Face detection: recent approaches X. Zhu and D. Ramanan, Face Detection, Pose Estimation, and Landmark Localization in the Wild, CVPR, 2012

68 Face detection: recent approaches Shen et al., Detecting and Aligning Faces by Image Retrieval, CVPR, 2013

69 Face detection: recent approaches Shen et al., Detecting and Aligning Faces by Image Retrieval, CVPR 2013

70 Face Alignment and Landmark Localization Goal of face alignment: automatically align a face (usually non-rigidly) to a canonical reference Goal of face landmark localization: automatically locate face landmarks of interest http://www.mathworks.com/matlabcentral/fx_files/32704/4/icaam.jpg http://homes.cs.washington.edu/~neeraj/projects/face-parts/images/teaser.png

71 Face Image Parsing Given an input face image, automatically segment the face into its constituent parts Smith, Zhang, Brandt, Lin, and Yang, Exemplar-Based Face Parsing, CVPR 2013

72 Face Image Parsing: Results [Figure columns: Input | Soft segments | Hard segments | Ground truth]

