
1 Max-Margin Additive Classifiers for Detection Subhransu Maji & Alexander Berg University of California at Berkeley Columbia University ICCV 2009, Kyoto, Japan

2-6 Accuracy vs. Evaluation Time for SVM Classifiers [Plot builds, accuracy vs. evaluation time: linear kernels are fast to evaluate but less accurate; non-linear kernels are more accurate but slow; additive kernels with our CVPR 08 method reach near non-linear accuracy at near-linear evaluation time.] Our CVPR 08 made it possible to use SVMs with additive kernels for detection.

7 Additive Classifiers Much work already uses them! – SVMs with additive kernels are additive classifiers. Histogram-based kernels: – Histogram intersection, chi-squared kernel – Pyramid Match Kernel (Grauman & Darrell, ICCV 05) – Spatial Pyramid Match Kernel (Lazebnik et al., CVPR 06) – …
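The histogram kernels named above are additive: each evaluates as a sum of independent per-dimension terms. A minimal sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def intersection_kernel(x, y):
    # Histogram intersection (min kernel): sum of element-wise minima.
    return float(np.minimum(x, y).sum())

def chi2_kernel(x, y, eps=1e-12):
    # One common form of the chi-squared similarity; also additive.
    return float((2.0 * x * y / (x + y + eps)).sum())

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.4, 0.4, 0.2])
print(intersection_kernel(x, y))  # 0.2 + 0.4 + 0.2 = 0.8
```

Because each kernel decomposes over dimensions, so does the resulting SVM decision function, which is what the rest of the talk exploits.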

8-14 Accuracy vs. Training Time for SVM Classifiers [Plot builds, accuracy vs. training time: through the 1990s both linear and non-linear SVMs were slow to train; today fast solvers (e.g. cutting plane, stochastic gradient descent, dual coordinate descent) make linear SVMs cheap to train, while non-linear SVMs remain slow; after our CVPR 08, additive classifiers were fast to evaluate but still as slow to train as non-linear SVMs.] This paper makes it possible to train additive classifiers very fast.
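One of the fast linear solvers named above, stochastic (sub)gradient descent, fits in a few lines. A Pegasos-style sketch (hyperparameters and the toy data are illustrative):

```python
import numpy as np

def pegasos_sgd(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent for the linear SVM
    objective: lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)  # standard Pegasos step size
            if y[i] * (w @ X[i]) < 1.0:
                # shrink from the regularizer, step along the hinge subgradient
                w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1.0 - eta * lam) * w
    return w

# toy separable problem with a margin, labels in {-1, +1}
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (300, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)
keep = np.abs(X[:, 0] - X[:, 1]) > 0.2
X, y = X[keep], y[keep]
w = pegasos_sgd(X, y)
acc = float(np.mean(np.sign(X @ w) == y))
```

Each update touches one example, so one pass costs O(n * d); that linear scaling is why these solvers made large-scale linear SVM training routine.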

15 Summary Additive classifiers are widely used and can provide better accuracy than linear. Our CVPR 08: SVMs with additive kernels are additive classifiers and can be evaluated in O(#dim) -- same as linear. This work: additive classifiers can be trained directly, as efficiently (up to a small constant) as the best approaches for training linear classifiers. An example:

              Additive Kernel SVM    Our Additive Classifier   Linear SVM
  Time        Train 1000, Test 1000  Train 10, Test 1          Train 10, Test 1
  Accuracy    95%                    94%                       82%
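The O(#dim) evaluation claimed above works because, for an additive kernel, the decision function regroups into a sum of one-dimensional functions, one per dimension, which can be precomputed into tables. A sketch for the min kernel with a nearest-grid lookup (the actual method interpolates; all names are illustrative):

```python
import numpy as np

def build_tables(sv, coeffs, grid):
    # h_d(s) = sum_i coeffs[i] * min(s, sv[i, d]), tabulated on the grid.
    # After this one-time cost, evaluating f costs O(#dims), not O(#SVs).
    d = sv.shape[1]
    tables = np.zeros((d, len(grid)))
    for j in range(d):
        for g, s in enumerate(grid):
            tables[j, g] = np.sum(coeffs * np.minimum(s, sv[:, j]))
    return tables

def fast_decision(x, tables, grid, b=0.0):
    # Nearest grid point per dimension (a real implementation interpolates).
    idx = np.clip(np.searchsorted(grid, x), 0, len(grid) - 1)
    return float(sum(tables[j, idx[j]] for j in range(len(x))) + b)
```

On grid points the lookup is exact; off the grid, the tabulation error is what the encodings later in the talk control.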

16 Support Vector Machines Kernel trick: the kernel function computes an inner product in an embedded space, K(x, y) = <phi(x), phi(y)>, so the classifier can learn non-linear boundaries in the input space. Classification function: f(x) = sum_i alpha_i y_i K(x, x_i) + b.
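The classification function on this slide can be sketched directly; note that the cost of each evaluation grows with the number of support vectors, which is what the embeddings that follow are designed to avoid (names are illustrative):

```python
import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    # f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    # Each call costs O(#support_vectors * cost of one kernel evaluation).
    return sum(a * y * kernel(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b

min_kernel = lambda u, v: float(np.minimum(u, v).sum())

sv = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
f = svm_decision(np.array([1.0, 1.0]), sv, [0.5, 0.5], [1, -1], 0.0, min_kernel)
```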

17 Embeddings… These embeddings can be high dimensional (even infinite). Our approach is based on embeddings that approximate kernels. We'd like the approximation to be as accurate as possible. We are going to use fast linear classifier training algorithms on the embedded features, so sparseness is important.

18 Key Idea: Embedding an Additive Kernel Additive kernels are easy to embed: just embed each dimension independently. Linear embedding for the min kernel on integers: encode x as x ones followed by zeros; the inner product of two such codes is exactly min(x, y). For non-integers, approximate by quantizing.
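The integer min-kernel embedding on this slide is a unary code; the dot product of two codes counts overlapping ones, which is exactly the minimum. A sketch (function name is mine):

```python
import numpy as np

def unary_embed(x, n):
    # phi(x): first x entries are 1, the rest 0 (assumes 0 <= x <= n).
    v = np.zeros(n)
    v[:x] = 1.0
    return v

# <phi(x), phi(y)> counts positions where both codes are 1: exactly min(x, y)
print(unary_embed(3, 8) @ unary_embed(5, 8))  # 3.0
```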

19 Issues: Embedding Error [Plot with axes x, y: quantization leads to large errors; a better encoding tracks the target function closely.]

20 Issues: Sparsity The unary code is dense; instead, represent each value with a sparse encoding (only a few nonzero entries per dimension).
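One way to keep the encoding sparse is to spread each value over just the two nearest bin centers by linear interpolation, so every input dimension contributes at most two nonzeros. This is a sketch of the idea under that assumption, not the paper's exact encoding:

```python
import numpy as np

def sparse_encode(x, n_bins, x_max=1.0):
    # Linear interpolation between the two nearest bin centers:
    # at most two nonzero entries per input dimension.
    v = np.zeros(n_bins)
    t = np.clip(x / x_max, 0.0, 1.0) * (n_bins - 1)  # fractional bin index
    lo = min(int(t), n_bins - 2)
    frac = t - lo
    v[lo] = 1.0 - frac
    v[lo + 1] = frac
    return v

print(sparse_encode(0.3, 5))  # bin centers at 0, 0.25, 0.5, 0.75, 1
```

The interpolation weights sum to one, so the encoding degrades gracefully between bins instead of jumping as hard quantization does.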

21 Linear vs. Encoded SVMs Linear SVM objective (solve with LIBLINEAR). Encoded SVM objective (not practical).

22 Linear SVM objective (solve with LIBLINEAR). Encoded SVM objective, modified (custom solver): encourages smooth functions and closely approximates the min kernel SVM. Custom solver: PWLSGD (see paper).

23 Linear vs. Encoded SVMs Linear SVM objective (solve with LIBLINEAR). Encoded SVM objective (solve with LIBLINEAR).
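Since the encoded objective can be handed to a standard linear solver, the whole pipeline is: encode each dimension, concatenate, train a linear SVM. A sketch using scikit-learn's LinearSVC (which wraps LIBLINEAR) on a toy concept whose boundary is additive but non-linear; the encoding and all setup values here are illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

def encode(X, n_bins=10):
    # Per-dimension unary-style code with a fractional last entry;
    # concatenating the codes makes any linear classifier additive in X.
    n, d = X.shape
    t = np.clip(X, 0.0, 1.0) * n_bins
    out = np.zeros((n, d * n_bins))
    for b in range(n_bins):
        out[:, b::n_bins] = np.clip(t - b, 0.0, 1.0)
    return out

rng = np.random.default_rng(0)
X = rng.random((400, 5))
# a disk in the first two dims: a level set of an additive function
y = ((X[:, 0] - 0.5) ** 2 + (X[:, 1] - 0.5) ** 2 < 0.09).astype(int)
clf = LinearSVC(C=1.0, max_iter=5000).fit(encode(X), y)
acc = clf.score(encode(X), y)
```

A linear SVM on the raw features cannot separate this disk, but the same solver on the encoded features can, because the encoding lets each dimension contribute a piecewise-linear term.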

24-29 Additive Classifier Choices [Table builds: classifiers organized by regularization and by encoding (linear vs. piecewise linear), including IKSVM. Accuracy increases across the choices while evaluation times stay similar. Some variants need only a few lines of code plus a standard solver (e.g. LIBLINEAR or LIBSVM); others need a custom solver. The slide fixes the classifier notations used in the experiments.]

30 Experiments Small scale: Caltech 101 (Fei-Fei et al.) Medium scale: DC Pedestrians (Munder & Gavrila) Large scale: INRIA Pedestrians (Dalal & Triggs)

31 Experiment: DC Pedestrians 20,000 features, 656 dimensional; 100 bins for encoding; 6-fold cross validation. 100x faster training: training time ~ linear SVM, accuracy ~ kernel SVM. [Plot points (training time, accuracy): (1.89 s, 72.98%), (2.98 s, 85.71%), (1.86 s, 88.80%), (3.18 s, 89.25%), (363 s, 89.05%).]

32 Experiment: Caltech 101 Training examples per category; 100 bins for encoding; Pyramid HOG + Spatial Pyramid Match Kernel. 10x faster, with a small loss in accuracy. [Plot points (training time, accuracy): (41 s, 46.15%), (2687 s, 56.49%), (291 s, 55.35%), (102 s, 54.8%), (90 s, 51.64%).]

33 Experiment: INRIA Pedestrians SPHOG: 39,000 features, 2268 dimensional; 100 bins for encoding; cross-validation plots. 300x faster training: training time ~ linear SVM, accuracy ~ kernel SVM; trains the detector in < 2 mins. [Plot points (training time, score): (20 s, 0.82), (27 s, 0.88), (140 min, 0.95), (76 s, 0.94), (122 s, 0.85).]


35 Take Home Messages Additive models are practical for large-scale data. They can be trained discriminatively: – Poor man's version: encode + linear SVM solver – Middle man's version: encode + custom solver – Rich man's version: min kernel SVM. The embedding only approximates kernels, leading to a small loss in accuracy but up to 100x speedup in training time. Everyone should use it: see code on our websites – fast IKSVM from CVPR 08, encoded SVMs, etc.

36 Thank You
