
Published by Claire Thornton. Modified over 2 years ago.

1
Kullback-Leibler Boosting
Ce Liu, Heung-Yeung Shum
Microsoft Research Asia

2
A General Two-layer Classifier
[Diagram: input → intermediate → output, labeled with the projection function, discriminating function, identification function, and coefficients]

3
Issues under Two-layer Framework
- How to choose the type of projection function?
- How to choose the type of discriminating function?
- How to learn the parameters from samples?
[Figure labels: projection function (sigmoid, RBF, polynomial); discriminating function]

4
Our proposal
- How to choose the type of projection function? → Kullback-Leibler linear feature
- How to choose the type of discriminating function? → Histogram divergences
- How to learn the parameters from samples? → Sample re-weighting (boosting)
Together: Kullback-Leibler Boosting (KLBoosting)

5
Intuitions
- Linear projection is robust and easy to compute.
- The histograms of the two classes on a projection provide evidence for classification.
  - The linear feature on which the two classes' histograms differ most should be selected.
- If the weight distribution of the sample set changes, the histograms change as well.
  - Increase the weights of misclassified samples and decrease the weights of correctly classified samples.
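A minimal sketch of the intuition above, assuming the two classes' histograms are compared with the symmetric KL divergence and that a small `eps` smooths empty bins. The function names and the toy data are hypothetical, not the authors' code. It measures how differently the weighted histograms of the two classes look along a projection direction `phi`:

```python
import math

def project(x, phi):
    """Linear projection z = phi . x."""
    return sum(a * b for a, b in zip(x, phi))

def weighted_histogram(values, weights, lo, hi, bins, eps=1e-6):
    """Normalized histogram of `values` on [lo, hi]; each sample adds its own weight.
    `eps` is added to every bin so later log ratios stay finite (an assumption)."""
    h = [eps] * bins
    width = (hi - lo) / bins
    for v, w in zip(values, weights):
        i = min(max(int((v - lo) / width), 0), bins - 1)  # clamp to the edge bins
        h[i] += w
    total = sum(h)
    return [c / total for c in h]

def symmetric_kl(p, q):
    """Symmetric KL divergence: sum over bins of (p_b - q_b) * log(p_b / q_b)."""
    return sum((pb - qb) * math.log(pb / qb) for pb, qb in zip(p, q))

def kl_along(phi, pos, neg, w_pos, w_neg, bins=20):
    """How strongly the two classes' weighted histograms differ along direction phi."""
    z_pos = [project(x, phi) for x in pos]
    z_neg = [project(x, phi) for x in neg]
    lo, hi = min(z_pos + z_neg), max(z_pos + z_neg)
    if hi == lo:  # degenerate projection: every sample lands on one value
        hi = lo + 1.0
    h_pos = weighted_histogram(z_pos, w_pos, lo, hi, bins)
    h_neg = weighted_histogram(z_neg, w_neg, lo, hi, bins)
    return symmetric_kl(h_pos, h_neg)

# Hypothetical toy data: the classes differ along the first axis only.
pos = [(1.0 + 0.1 * i, (i % 5) - 2.0) for i in range(20)]
neg = [(-1.0 - 0.1 * i, (i % 5) - 2.0) for i in range(20)]
w = [1.0 / 20] * 20
print(kl_along((1.0, 0.0), pos, neg, w, w))  # large: the histograms do not overlap
print(kl_along((0.0, 1.0), pos, neg, w, w))  # 0.0: the histograms coincide
```

Because the histograms are weighted, changing the sample weights changes the value returned for the same direction, which is what lets boosting favour a different feature in the next round.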

6
Linear projections and histograms

7
KLBoosting (1)
At the kth iteration:
- Kullback-Leibler feature
- Discriminating function
- Reweighting
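The three steps of one iteration can be sketched as follows. The slide's own formulas are not preserved, so this sketch rests on stated assumptions: the KL feature `phi` has already been found, the discriminating function is the log ratio of the two weighted histograms at the projected value, and the reweighting is an AdaBoost-style exponential update w_i ← w_i·exp(−y_i·F(x_i)). All names are hypothetical.

```python
import math

def weighted_hist(values, weights, lo, hi, bins, eps=1e-6):
    """Weighted, normalized histogram; eps keeps empty bins from giving log(0) (an assumption)."""
    h = [eps] * bins
    width = (hi - lo) / bins
    for v, w in zip(values, weights):
        h[min(max(int((v - lo) / width), 0), bins - 1)] += w
    total = sum(h)
    return [c / total for c in h]

def make_discriminant(phi, pos, neg, w_pos, w_neg, bins=20):
    """Discriminating function f(x) = log(h_pos(phi . x) / h_neg(phi . x)) from weighted samples."""
    def dot(x):
        return sum(a * b for a, b in zip(x, phi))

    z_pos = [dot(x) for x in pos]
    z_neg = [dot(x) for x in neg]
    lo, hi = min(z_pos + z_neg), max(z_pos + z_neg)
    if hi == lo:
        hi = lo + 1.0
    width = (hi - lo) / bins
    h_pos = weighted_hist(z_pos, w_pos, lo, hi, bins)
    h_neg = weighted_hist(z_neg, w_neg, lo, hi, bins)

    def f(x):
        b = min(max(int((dot(x) - lo) / width), 0), bins - 1)
        return math.log(h_pos[b] / h_neg[b])  # positive values favour the positive class
    return f

def reweight(weights, labels, scores):
    """Assumed update w_i <- w_i * exp(-y_i * F(x_i)), renormalized to sum to 1.
    Samples with y_i * F(x_i) < 0 (misclassified) gain weight; correct ones lose it."""
    new = [w * math.exp(-y * s) for w, y, s in zip(weights, labels, scores)]
    total = sum(new)
    return [w / total for w in new]

# Hypothetical toy data; phi = (1, 0) plays the role of this round's KL feature.
pos = [(1.0 + 0.1 * i, (i % 5) - 2.0) for i in range(20)]
neg = [(-1.0 - 0.1 * i, (i % 5) - 2.0) for i in range(20)]
w = [1.0 / 20] * 20
f = make_discriminant((1.0, 0.0), pos, neg, w, w)
print(f((2.0, 0.0)) > 0, f((-2.0, 0.0)) < 0)      # True True
print(reweight([0.5, 0.5], [1, -1], [1.0, 1.0]))  # the misclassified second sample gains weight
```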

8
KLBoosting (2)
- Two types of parameters to learn:
  - KL features
  - Combination coefficients
- Learning KL features in low dimensions: MCMC
- Learning the combination coefficients (weights) to minimize training error
  - Optimization: brute-force search
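A minimal sketch of an MCMC search for a KL feature in a low-dimensional space, assuming a Metropolis random walk over unit vectors whose target density is proportional to exp(KL(phi)/T), with uniform sample weights for brevity. The step size, temperature, number of steps, and function names are hypothetical choices, not values from the talk.

```python
import math
import random

def sym_kl_along(phi, pos, neg, bins=20, eps=1e-6):
    """Symmetric KL divergence between the classes' projected histograms (uniform weights)."""
    zp = [sum(a * b for a, b in zip(x, phi)) for x in pos]
    zn = [sum(a * b for a, b in zip(x, phi)) for x in neg]
    lo, hi = min(zp + zn), max(zp + zn)
    if hi == lo:
        hi = lo + 1.0
    width = (hi - lo) / bins

    def hist(zs):
        h = [eps] * bins
        for z in zs:
            h[min(max(int((z - lo) / width), 0), bins - 1)] += 1.0 / len(zs)
        total = sum(h)
        return [c / total for c in h]

    p, q = hist(zp), hist(zn)
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def mcmc_kl_feature(pos, neg, dim, steps=500, step_size=0.3, temperature=1.0, seed=0):
    """Metropolis random walk over unit directions; returns the best direction seen and its KL."""
    rng = random.Random(seed)
    phi = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    kl = sym_kl_along(phi, pos, neg)
    best_phi, best_kl = phi, kl
    for _ in range(steps):
        # Isotropic Gaussian perturbation, then projection back onto the unit sphere.
        cand = normalize([c + rng.gauss(0.0, step_size) for c in phi])
        cand_kl = sym_kl_along(cand, pos, neg)
        # Metropolis acceptance: always move uphill, sometimes downhill.
        if cand_kl >= kl or rng.random() < math.exp((cand_kl - kl) / temperature):
            phi, kl = cand, cand_kl
            if kl > best_kl:
                best_phi, best_kl = phi, kl
    return best_phi, best_kl

# Hypothetical toy data: the classes differ along the first axis only.
pos = [(1.0 + 0.1 * i, (i % 5) - 2.0) for i in range(20)]
neg = [(-1.0 - 0.1 * i, (i % 5) - 2.0) for i in range(20)]
phi, kl = mcmc_kl_feature(pos, neg, dim=2)
print(phi, kl)
```

Keeping the best direction seen, rather than the final state of the chain, turns the sampler into a stochastic optimizer. Its cost grows quickly with dimension, which is why the talk turns to sequential 1D optimization for image space.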

9
Flowchart
Input → Initialize weights → Learn KL feature → Learn combining coefficients → Update weights → Recognition error small enough? (Y: output classifier; N: next iteration)
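The "learn combining coefficients" step can be sketched as an exhaustive grid search that minimizes training error, given the outputs f_k(x_i) of the discriminating functions learned so far. The grid values and the tie rule (a total score of exactly 0 counts as the negative class) are assumptions. Exhaustive search costs len(grid)**K evaluations, so this form is only practical for a handful of coefficients.

```python
import itertools

def training_error(alphas, feature_scores, labels):
    """Fraction of samples where sign(sum_k alpha_k * f_k(x_i)) disagrees with y_i.
    A total score of exactly 0 is treated as the negative class (an assumption)."""
    wrong = 0
    for scores, y in zip(feature_scores, labels):
        total = sum(a * s for a, s in zip(alphas, scores))
        prediction = 1 if total > 0 else -1
        if prediction != y:
            wrong += 1
    return wrong / len(labels)

def brute_force_coefficients(feature_scores, labels, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every coefficient vector on `grid`; keep the first one with the lowest training error."""
    k = len(feature_scores[0])
    best_alphas, best_err = None, float("inf")
    for alphas in itertools.product(grid, repeat=k):
        err = training_error(alphas, feature_scores, labels)
        if err < best_err:
            best_alphas, best_err = alphas, err
    return best_alphas, best_err

# Hypothetical outputs f_k(x_i) of two discriminating functions on two samples.
scores = [(2.0, -1.0), (-1.0, 2.0)]
labels = [1, -1]
print(training_error((1.0, 1.0), scores, labels))  # 0.5: equal weights misclassify sample 2
print(brute_force_coefficients(scores, labels))    # ((0.25, 0.0), 0.0)
```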

10
A Simple Example
[Figure panels: KL features; histograms; decision manifold]

11
A Complicated Case

12
Kullback-Leibler Analysis (KLA)
- Finding the KL feature in image space is a challenging task
- Sequential 1D optimization:
  - Construct a feature bank
  - Build a set of the most promising features
  - Sequentially perform 1D optimization along the promising features
Conjecture: the global optimum of an objective function can be reached by searching along as many linear features as needed.
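A sketch of sequential 1D optimization under stated assumptions: starting from one feature, each promising feature f_j is tried in turn, the step t in phi ← normalize(phi + t·f_j) is chosen by a grid line search, and the best point on that line is kept before moving on. In KLA the objective would be the KL divergence of the projected face and non-face histograms; the usage below swaps in a hypothetical cosine-similarity objective so the effect is easy to check.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def sequential_1d(objective, start, promising, steps=None):
    """Maximize objective(phi) over unit vectors by 1D line searches along each promising feature.
    For each feature f, try phi <- normalize(phi + t * f) for every t in `steps`,
    keep the best candidate, then move on to the next feature."""
    if steps is None:
        steps = [i / 4.0 for i in range(-16, 17)]  # t in [-4, 4], spacing 0.25 (an assumption)
    phi = normalize(start)
    best = objective(phi)
    for f in promising:
        line_phi, line_val = phi, best
        for t in steps:
            raw = [p + t * c for p, c in zip(phi, f)]
            if not any(raw):  # phi + t * f vanished; no direction to evaluate
                continue
            cand = normalize(raw)
            val = objective(cand)
            if val > line_val:
                line_phi, line_val = cand, val
        phi, best = line_phi, line_val  # commit the best point on this line
    return phi, best

# Hypothetical stand-in objective: cosine similarity to a hidden target direction.
target = normalize([1.0, 2.0, 2.0])

def objective(phi):
    return sum(a * b for a, b in zip(phi, target))

bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
start_val = objective(normalize(bank[0]))
phi, val = sequential_1d(objective, bank[0], bank[1:])
print(round(start_val, 3), round(val, 3))  # 0.333 0.998
```

Each line search only moves within the plane spanned by the current direction and one feature, so the result depends on which features are in the promising set and in what order they are visited.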

13
Intuition of Sequential 1D Optimization
[Figure panels: feature bank; promising feature set; result of sequential 1D optimization; MCMC feature]

14
Optimization in Image Space
- An image is a random field, not a single random variable
- Local statistics can be captured by wavelets:
  - 111×400 small-scale wavelets for the whole 20×20 patch
  - 80×100 large-scale wavelets for the inner 10×10 patch
  - 52,400 wavelets in total compose the feature bank
  - The 2,800 most promising wavelets are selected
[Figure panels: Gaussian-family wavelets; Haar wavelets; feature bank]
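The feature-bank size follows from the counts above, reading 111 and 80 as the number of wavelets per lattice position and 400 and 100 as the number of positions in the 20×20 and 10×10 lattices:

```latex
\underbrace{111 \times 400}_{\text{small-scale, } 20\times 20}
+ \underbrace{80 \times 100}_{\text{large-scale, inner } 10\times 10}
= 44{,}400 + 8{,}000 = 52{,}400,
\qquad
\frac{2{,}800}{52{,}400} \approx 5.3\%
```

So the promising set keeps roughly one wavelet in twenty from the bank.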

15
Compose the KL feature by sequential 1D optimization
[Figure: data-driven KLA; face patterns and non-face patterns; feature bank (111 wavelets); promising feature set (2,800 features in total)]
At each position of the 20×20 lattice, compute the histograms of the 111 wavelet responses and the KL divergence between face and non-face images. Large-scale wavelets are applied on the inner 10×10 lattice to capture global statistics.
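The data-driven selection of the promising set can be sketched as scoring every (position, wavelet) pair by the symmetric KL divergence between its face and non-face response histograms and keeping the top N (2,800 in the talk). The binning, the smoothing constant, the feature ids, and the example responses are all hypothetical.

```python
import math

def sym_kl(a_vals, b_vals, bins=30, eps=1e-6):
    """Symmetric KL divergence between histograms of two 1D samples on a shared range."""
    lo, hi = min(a_vals + b_vals), max(a_vals + b_vals)
    if hi == lo:
        hi = lo + 1.0
    width = (hi - lo) / bins

    def hist(vals):
        h = [eps] * bins
        for v in vals:
            h[min(max(int((v - lo) / width), 0), bins - 1)] += 1.0
        total = sum(h)
        return [c / total for c in h]

    p, q = hist(a_vals), hist(b_vals)
    return sum((x - y) * math.log(x / y) for x, y in zip(p, q))

def select_promising(responses, n):
    """Rank feature ids by the KL divergence of their face vs non-face responses; keep the top n.
    `responses` maps a feature id, e.g. (lattice position, wavelet index), to a pair
    (face_responses, nonface_responses) of lists."""
    ranked = sorted(responses, key=lambda fid: sym_kl(*responses[fid]), reverse=True)
    return ranked[:n]

# Hypothetical responses of three (position, wavelet) features.
responses = {
    ((10, 10), 0): ([1.0 + 0.01 * i for i in range(50)], [-1.0 - 0.01 * i for i in range(50)]),
    ((10, 10), 1): ([0.01 * i for i in range(50)], [0.01 * i for i in range(50)]),
    ((5, 12), 7): ([0.01 * i for i in range(50)], [0.25 + 0.01 * i for i in range(50)]),
}
print(select_promising(responses, 2))
```

The first feature separates the classes completely, the third overlaps them partly, and the second gives identical histograms (KL of 0), so it ranks last.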

16
Comparison with Other Features
[Figure panels: best Haar wavelet; MCMC feature; KL feature]
- Best Haar wavelet: KL = 2.944
- MCMC feature: KL = 3.246
- KL feature: KL =

17
Application: Face Detection
- Experimental setup:
  - A 20×20 patch represents a face
  - 17,520 frontal faces
  - 1,339,856,947 non-faces from 2,484 images
  - 300 bins in the histogram representation
- A cascade of KLBoosting classifiers:
  - In each classifier, keep the false negative rate below 0.01% and the false alarm rate below 35%
  - 22 classifiers in total form the cascade (450 features)
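Two consequences of the cascade design can be made concrete. A window counts as a face only if every stage accepts it, and if each of the 22 stages met its targets on the samples that reach it, the per-stage rates would multiply. The bounds below are idealized arithmetic from the stated targets, not reported results; the `score` functions and thresholds are placeholders.

```python
def cascade_accepts(x, stages):
    """Report x as a face only if every stage accepts it; most non-faces exit at an early stage."""
    for score, threshold in stages:
        if score(x) < threshold:
            return False
    return True

def cascade_rate_bounds(n_stages, max_fnr, max_far):
    """Idealized overall rates when each stage meets its targets on the samples that reach it."""
    detection_lower = (1.0 - max_fnr) ** n_stages  # a face must survive every stage
    false_alarm_upper = max_far ** n_stages        # a non-face must fool every stage
    return detection_lower, false_alarm_upper

det, fa = cascade_rate_bounds(n_stages=22, max_fnr=0.0001, max_far=0.35)
print(f"detection >= {det:.4f}, false alarm <= {fa:.1e}")
# detection >= 0.9978, false alarm <= 9.3e-11
```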

18
KL Features of Face Detector
[Figure: face patterns and non-face patterns; the first 10 KL features; some other KL features, grouped as global semantics, frequency filters, and local features]

19
ROC Curve

20
Some Detection Results

21
Comparison with AdaBoost

22
Compared with AdaBoost
- Base classifier: KLBoosting uses a KL feature + histogram divergence; AdaBoost's base classifier is selected from experience.
- Combining coefficients: KLBoosting globally optimizes them to minimize training error; AdaBoost sets them empirically to be incrementally optimal.

23
Summary
- KLBoosting is an optimal classifier:
  - Projection function: linear projection
  - Discriminating function: histogram divergence
  - Coefficients: optimized by minimizing training error
- KLA: a data-driven approach to pursue KL features
- Applications in face detection

24
Thank you!
Harry Shum
Microsoft Research Asia

25
Compared with SVM
- Support vectors: KLBoosting uses KL features learned to optimize the KL divergence (a few); SVM's are selected from the training samples (many).
- Kernel function: KLBoosting uses histogram divergence (flexible); SVM's is selected from experience (fixed).
