1 Fast Asymmetric Learning for Cascade Face Detection. Jiaxin Wu and Charles Brubaker, IEEE PAMI, 2008. Presenter: Chun-Hao Chang (張峻豪), 2009/12/01.

2 Outline: 1. Introduction 2. Recall AdaBoost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Results 7. Conclusion

3 1. Introduction 2. Recall AdaBoost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Results 7. Conclusion

4 1. Introduction. Three asymmetries are observed in the face detection problem: 1. Uneven class priors: the training database contains far more negatives than positives. 2. Goal asymmetry: what matters is the detection rate vs. the false positive rate, rather than the equal error rate (EER). 3. Unequal complexity of the positive and negative classes: Face vs. Car (non-face) is easy to classify, while Face vs. Animal (non-face) is hard. This paper presents a framework similar to AdaBoost, but faster in training and with the freedom to design the ensemble classifier.

5 1. Introduction. The classifier design step is decoupled into feature selection and ensemble classifier design (e.g., FDA, SVM, ...). Forward Feature Selection (FFS) and the Linear Asymmetric Classifier (LAC) are proposed. Advantages: 1. FFS is about 2.5~3.5 times faster than Fast AdaBoost and 50~100 times faster than AdaBoost in training. 2. FFS requires only about 3% of the memory used by AdaBoost. 3. We have the freedom to design the ensemble classifier.

6 1. Introduction: AdaBoost vs. FFS+LAC (diagram). AdaBoost learns the weak classifiers h_1..h_5 together with their voting weights α_1..α_5. In FFS+LAC, FFS first selects h_1..h_5 with uniform weight 1, and LAC then re-weights them as α'_1..α'_5. Notation from the diagram: z_i is an image sample with weight w_i = 1/N_p; N = N_p + N_n, with N_p positive and N_n negative samples; c_i is the label of z_i; h_1 is a weak classifier with weight α_1; k is the number of weak classifiers.
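In formulas (a reconstruction of what the diagram depicts, not shown explicitly on the slide), the node classifiers being contrasted have the form:

```latex
H_{\text{AdaBoost}}(z) = \operatorname{sign}\Big(\sum_{t=1}^{k} \alpha_t\, h_t(z) - \theta\Big), \qquad
H_{\text{FFS}}(z) = \operatorname{sign}\Big(\sum_{t=1}^{k} h_t(z) - \theta\Big), \qquad
H_{\text{FFS+LAC}}(z) = \operatorname{sign}\Big(\sum_{t=1}^{k} \alpha'_t\, h_t(z) - b\Big)
```

so FFS fixes all voting weights to 1 during selection, and LAC only afterwards assigns the weights α'_t and the threshold b.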

7 1. Introduction 2. Recall AdaBoost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Results 7. Conclusion

8 2. Recall AdaBoost (1/2). Training flow: 1. Input data: N training samples, N = N_p + N_n. 2. Cascaded framework: learn a node; if the learning goal is not yet satisfied, add a new node H_{k+1}. Node learning (AdaBoost, T iterations): 1. normalize the weights; 2. pick an appropriate threshold for each weak classifier h_i, 1 ≤ i ≤ M, where M is the number of features (each h uses an input example z, its corresponding mask (feature) m, and a threshold τ); 3. choose the classifier h_t with the lowest weighted error (α_t is the weight of h_t); 4. update the sample weights. Here feature selection and ensemble classifier design are coupled together (not separable). 3. The cascaded detector is the sequence of nodes H_1, H_2, H_3, ...

9 2. Recall AdaBoost (2/2). α_t is decided once h_t is chosen. Each sample weight w_{t,i} (the weight of sample i at iteration t) is updated at the end of each iteration using the error rate ε_t of the chosen classifier: correctly classified samples are down-weighted, so misclassified samples gain relative weight. Terminology: Feature = (Filter, Position); Feature value = Feature * example, where * denotes convolution; Classifier = (Feature, Threshold).
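To make the round described above concrete, here is a minimal sketch of one Viola-Jones-style AdaBoost round in Python/NumPy. It assumes the weak classifiers are simple threshold stumps over precomputed feature responses; the function name and the exhaustive threshold search are illustrative, not the paper's implementation.

```python
import numpy as np

def adaboost_round(feature_values, labels, weights):
    """One AdaBoost round: pick the stump with the lowest weighted error,
    then re-weight the samples (Viola-Jones style).

    feature_values: (M, N) array, one row of feature responses per feature
    labels:         (N,) array of 0/1 labels (1 = face)
    weights:        (N,) array of current sample weights
    """
    weights = weights / weights.sum()               # 1. normalize weights

    best = None
    for i, values in enumerate(feature_values):     # 2. best threshold per feature
        for tau in np.unique(values):
            for polarity in (1, -1):
                pred = (polarity * values < polarity * tau).astype(int)
                err = np.sum(weights * (pred != labels))
                if best is None or err < best[0]:
                    best = (err, i, tau, polarity)

    err, i, tau, polarity = best                    # 3. stump with the lowest error
    beta = err / (1.0 - err)
    alpha = np.log(1.0 / beta)                      # voting weight alpha_t of h_t

    pred = (polarity * feature_values[i] < polarity * tau).astype(int)
    weights = weights * np.where(pred == labels, beta, 1.0)  # 4. down-weight correct samples
    return (i, tau, polarity, alpha), weights
```

Note how steps 2-4 are interleaved: the weight update at the end of each round is exactly what forces AdaBoost to retrain the weak classifiers every iteration.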

10 1. Introduction 2. Recall AdaBoost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Results 7. Conclusion

11 3. System Flowchart: Notations. z: input example. x: vector of feature values of a positive example. y: vector of feature values of a negative example. Σ_x: covariance matrix of x. a: optimal weight vector. b: optimal threshold. (The accompanying diagram shows a sample x_i being convolved with the weak classifiers h_1, h_2, h_3, h_4 to produce its feature values.)

12 3. System Flowchart: FFS+LAC. Training flow: 1. Input data: N training samples, N = N_p + N_n. 2. Cascaded framework: learn a node; if the learning goal is not yet satisfied, add a new node H_{k+1}. Node learning (T iterations), now split into two separable steps: Feature selection (FFS): 1. build the feature table; 2. choose the weak classifier h_i that gives the current ensemble H' the smallest error rate. Ensemble classifier (LAC). Θ is the threshold of the node classifier H(z). 3. The cascaded detector is the sequence of nodes H_1, H_2, H_3, ...

13 3. System Flowchart: Q&A (1/2). Q1: What is the difference between AdaBoost and FFS+LAC? A1: AdaBoost cannot be separated into a feature selection step and an ensemble classifier step: in AdaBoost, α_i is decided as soon as h_i is chosen, and every sample weight w_i is updated at the end of each round; in FFS, α_i is simply 1 for every h_i. Q2: Why use FFS instead of AdaBoost? A2: FFS stores only 1 bit per entry where AdaBoost stores 32-bit weights (about 3% of the memory). Q3: Can AdaBoost be sped up by a pre-computing strategy? A3: Yes; if the weights are kept unchanged (no weight update), we get Fast AdaBoost.

14 3. System Flowchart: Q&A (2/2). Conclusions: 1. In training, FFS is (a) about 2.5~3.5 times faster than Fast AdaBoost, (b) 50~100 times faster than AdaBoost, and (c) uses only about 3% of the memory. 2. It is much easier to implement on a platform (e.g., in hardware). 3. We have the freedom to design our own ensemble algorithm (e.g., SVM, FDA, ...) for different problems.

15 1. Introduction 2. Recall AdaBoost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Results 7. Conclusion

16 4. Forward Feature Selection (FFS). Fig. 1. AdaBoost vs. FFS (per-node training; total costs over the T rounds shown). (a) AdaBoost: train all weak classifiers, O(NMT log N); add the feature with the minimum weighted error to the ensemble, O(T); adjust the threshold of the ensemble to meet the learning goal, O(N). (b) FFS: train all weak classifiers once (the weights never change), O(NM log N); add the feature that minimizes the error of the current ensemble, O(NMT); adjust the threshold of the ensemble to meet the learning goal, O(N).

17 4. FFS: AdaBoost vs. FFS, the AdaBoost side. Samples with weights w_1..w_6; candidate weak classifiers h_1..h_4. Iteration 1: weighted errors ε_1 = 2, ε_2 = 5, ε_3 = 7, ε_4 = 4; take the minimum, so h_1 (ε_1) is chosen. The sample weights are then updated to w_1'..w_6'. Iteration 2: with the updated weights, ε_2' = 9, ε_3' = 5, ε_4' = 3; take the minimum, so h_4 (ε_4') is chosen.

18 4. FFS: AdaBoost vs. FFS, the FFS side. Samples with weights w_1..w_6, which stay unchanged across iterations. Iteration 1: errors ε_1 = 2, ε_2 = 5, ε_3 = 7, ε_4 = 4; take the minimum, so h_1 is chosen. Iteration 2: the errors are now measured for the ensemble (the h_1 chosen in the first iteration plus each remaining candidate): ε_2' = 6, ε_3' = 10, ε_4' = 8; take the minimum, so h_2 is chosen.

19 4. FFS: Training Process (N samples/images, M features). 1. Train all weak classifiers: for each feature i, (a) sort its feature values V_i1..V_iN, and (b) choose a threshold τ with the smallest error. 2. Build the feature table: an M x N table of the decisions of each weak classifier h (with its corresponding mask m and threshold τ) on each input example z. 3. Add the feature that minimizes the error of the current ensemble (the selected set S starts empty; repeat for t = 1..T): (a) for i = 1 to M, find the θ that gives H' the smallest error rate and set ε_i to that error; (b) k <= argmin_{1 ≤ i ≤ M} ε_i; (c) add the chosen h_k to the ensemble. 4. Adjust the value of θ: set θ so that H has a 50% false positive rate on the training set. (A sketch of this selection loop follows below.)
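A minimal sketch of the selection loop just described, assuming the feature table has already been built as a binary M x N matrix of weak-classifier decisions; the function name, the exhaustive θ search, and the simple median-based threshold adjustment are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def ffs_select(table, labels, T, target_fp=0.5):
    """Greedy forward feature selection over a precomputed decision table.

    table:  (M, N) 0/1 matrix, table[i, j] = decision of weak classifier i on sample j
    labels: (N,) 0/1 ground-truth labels (1 = face)
    T:      number of weak classifiers to select
    """
    M, N = table.shape
    votes = np.zeros(N)                  # running vote total of the current ensemble
    selected = []

    for t in range(T):
        best = None
        for i in range(M):
            if i in selected:
                continue
            cand = votes + table[i]      # ensemble votes if classifier i were added
            # try every ensemble threshold theta in {0, ..., t+1} and keep the best
            err = min(np.mean((cand >= theta) != labels) for theta in range(t + 2))
            if best is None or err < best[0]:
                best = (err, i)
        selected.append(best[1])         # h_k = argmin of the ensemble error
        votes += table[best[1]]

    # adjust theta so the node has roughly a target_fp false positive rate on negatives
    neg_votes = np.sort(votes[labels == 0])
    theta = neg_votes[int((1 - target_fp) * len(neg_votes))] if len(neg_votes) else 0
    return selected, theta
```

Because only binary table entries and integer vote counts are touched, no sample weights ever need to be stored or updated, which is the source of the memory and speed savings claimed earlier.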

20 4. FFS: Example of training one weak classifier (Algorithm 3, p. 5 of the paper). For a given feature i (1 ≤ i ≤ M), compute the feature value z^T m for each of the N = 6 examples (a mix of faces and non-faces), sort the examples by feature value, and sweep the threshold τ across the sorted values while updating the weighted error incrementally. With sample weights 0.6, 0.5, 0.4, 0.1, 0.3, 0.2 the sweep in this example goes: initial ε = 0.2 + 0.3 + 0.1 + 0.4 = 1; then 1. ε = 1 - 0.2 = 0.8; 2. ε = 0.8 - 0.3 = 0.5; 3. ε = 0.5 - 0.1 = 0.4; 4. ε = 0.4 - 0.4 = 0; 5. ε = 0 + 0.6 = 0.6; 6. ε = 0.6 + 0.5 = 1.1. The threshold with the smallest error is kept, here τ = 16 with ε = 0.
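A sketch of that single-sweep threshold search, assuming the decision rule "predict face if the feature value is greater than τ"; the function name and the tie-breaking behavior are illustrative.

```python
import numpy as np

def train_stump(values, labels, weights):
    """Pick the threshold for one feature by a single sweep over sorted values,
    updating the weighted error incrementally (cf. Algorithm 3, p. 5 of the paper).

    values:  (N,) feature values z_i^T m for this feature
    labels:  (N,) 0/1 labels (1 = face)
    weights: (N,) sample weights
    """
    order = np.argsort(values)
    values, labels, weights = values[order], labels[order], weights[order]

    # start with the threshold below all values: every sample is called positive,
    # so the error is the total weight of the negatives
    err = np.sum(weights[labels == 0])
    best_err, best_tau = err, values[0] - 1.0

    for v, lab, w in zip(values, labels, weights):
        # moving the threshold past sample v flips its prediction to negative
        err = err - w if lab == 0 else err + w
        if err < best_err:
            best_err, best_tau = err, v
    return best_tau, best_err
```

The sort dominates the cost, which is where the O(NM log N) term for training all weak classifiers in Fig. 1 comes from.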

21 4. FFS: Example of feature selection using the table (1/2). M = 4 features, N = 6 samples (Pos/Neg rows mark the labels). Round t = 1: h_3 is chosen as the first weak classifier. (The highlighted table entry is the classification result of applying h_3 to sample 2.)

22 4. FFS: Example of feature selection using the table (2/2). M = 4, N = 6. Round t = 2: h_1 is chosen as the second weak classifier.

23 4. FFS: FFS vs. AdaBoost. Three major differences between FFS and AdaBoost in implementation: 1. No weight update, so FFS is faster thanks to the precomputed table. 2. The total vote (the confidence value before normalization) in FFS is an integer between 0 and T, whereas in AdaBoost it can be any real number. 3. Selection criterion: FFS selects the feature that gives the ensemble classifier the smallest error on the training set; AdaBoost chooses the feature with the smallest weighted error on the training set.

24 1. Introduction 2. Recall AdaBoost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Results 7. Conclusion

25 5. Linear Asymmetric Classifier (LAC). We can treat β as (1 - false positive rate); the goal is to optimize the node's detection rate under that constraint. The optimization problem on this slide is reconstructed below.
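The LAC objective this slide refers to, reconstructed from the original paper (the slide's formula image is not in the transcript):

```latex
\max_{a \neq 0,\; b}\ \Pr_{x \sim (\bar{x}, \Sigma_x)}\!\left[ a^{\mathsf{T}} x \ge b \right]
\quad \text{subject to} \quad
\Pr_{y \sim (\bar{y}, \Sigma_y)}\!\left[ a^{\mathsf{T}} y \le b \right] = \beta
```

In words: maximize the detection rate on the positive class x while fixing the fraction β of correctly rejected negatives y, i.e., the false positive rate is held at 1 - β (0.5 in a typical cascade node).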

26 5. LAC: Definitions. (The definition formulas on this slide were images and are not in the transcript; the visible labels are "k" and "Normalize term".)

27 5. LAC: Derivation (1/3). Constraint (1) can be rewritten as (2). We want to maximize the objective, which is equivalent to a minimization problem; substituting (2) into b gives the next step. (The formulas on this slide were images and are not in the transcript.)

28 5. LAC: Derivation (2/3). Assuming y has a symmetric distribution, the constraint simplifies, and for β = 0.5 a closed form follows. (Fig. 2; the formulas and the terms k, k_1, k_2 on this slide were images and are not in the transcript.)

29 5. LAC: Derivation (3/3). Fig. 3. Normality test for a^T y, where y is a feature vector extracted from non-face data and a is drawn from the uniform distribution on [0, 1]^T. The closer the points lie to the red line, the more nearly a^T y follows a normal distribution.

30 5. LAC: Optimal result compared with FDA. (The slide's table contrasting the closed-form solutions of LAC and FDA was an image and is not in the transcript.) The output of each method is a linear classifier.
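For reference, the closed-form solutions as given in the paper are a = Σ_x^{-1}(x̄ - ȳ) for LAC (with β = 0.5) and a = (Σ_x + Σ_y)^{-1}(x̄ - ȳ) for FDA, with b = aᵀȳ. Below is a minimal NumPy sketch of both; the function name, the small ridge term added for numerical stability, and using b = aᵀȳ for FDA as well are my own assumptions rather than details taken from the slide.

```python
import numpy as np

def lac_and_fda(H_pos, H_neg, ridge=1e-6):
    """Closed-form linear classifiers over weak-classifier outputs.

    H_pos: (Np, T) responses of the T selected weak classifiers on positive samples
    H_neg: (Nn, T) responses on negative samples
    Returns ((a_lac, b_lac), (a_fda, b_fda)); classify z as a face if a @ h(z) >= b.
    """
    x_bar, y_bar = H_pos.mean(axis=0), H_neg.mean(axis=0)
    cov_x = np.cov(H_pos, rowvar=False)
    cov_y = np.cov(H_neg, rowvar=False)
    eye = np.eye(cov_x.shape[0])

    # LAC (beta = 0.5): a = cov_x^{-1} (x_bar - y_bar), b = a . y_bar
    a_lac = np.linalg.solve(cov_x + ridge * eye, x_bar - y_bar)
    b_lac = a_lac @ y_bar

    # FDA: a = (cov_x + cov_y)^{-1} (x_bar - y_bar); b chosen as for LAC (assumption)
    a_fda = np.linalg.solve(cov_x + cov_y + ridge * eye, x_bar - y_bar)
    b_fda = a_fda @ y_bar

    return (a_lac, b_lac), (a_fda, b_fda)
```

The only difference is the covariance used: LAC normalizes by the positive-class covariance alone, which reflects the asymmetric goal of a very high detection rate at a fixed rejection rate on negatives.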

31 1. Introduction 2. Recall AdaBoost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Results 7. Conclusion

32 6. Experimental Results: LAC vs. FDA. Fig. 4. Comparing LAC and FDA on a synthetic data set in which both x and y are Gaussian (red: positives, blue: negatives).

33 6. Experimental Results: Synthetic Data. Fig. 5. Synthetic data where y is not symmetric.

34 6. Experimental Results: AdaBoost vs. FDA & LAC. Fig. 6. Experiments comparing different linear discriminant functions. In 6(a), the training data sets are collected from nodes 11 to 21 of an AdaBoost+FDA cascade; in 6(b), they are collected from an AdaBoost+LAC cascade.

35 6. Experimental Results: AdaBoost vs. FDA & LAC. Fig. 7. Experiments comparing different linear discriminant functions; the training sets were collected from an AdaBoost cascade.

36 6. Experimental Results: AdaBoost vs. FFS. Fig. 8. Experiments comparing cascade performance on the MIT+CMU test set (ROC curves).

37 6. Experimental Results: Effect of Post-Processing? Fig. 8. Experiments comparing cascade performance on the MIT+CMU test set: (a) with post-processing, (b) without post-processing. (What the post-processing step consists of is not described; this was left as an open question.)

38 1. Introduction 2. Recall AdaBoost 3. System Flowchart 4. Forward Feature Selection (FFS) 5. Linear Asymmetric Classifier (LAC) 6. Experimental Results 7. Conclusion

39 7. Conclusion (Contributions). Three types of asymmetry in face detection are categorized. The classifier design step is decoupled into feature selection and ensemble classifier design. FFS is proposed for feature selection; it is about 2.5~3.5 times faster than Fast AdaBoost (50~100 times faster than AdaBoost) and uses only about 3% of AdaBoost's memory. LAC is proposed as the ensemble classifier to address the asymmetric learning goal.

40 Problems / Q&A

41 References. [1] J. Wu and C. Brubaker, "Fast Asymmetric Learning for Cascade Face Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 369-382, March 2008. [2] P. Viola and M. Jones, "Robust Real-time Object Detection," International Journal of Computer Vision, 57(2), pp. 137-154, 2004. [3] P. Viola and M. Jones, "Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade," NIPS, pp. 1311-1318, 2001.

