
1
**Max-Margin Additive Classifiers for Detection**

Subhransu Maji & Alexander Berg, University of California at Berkeley / Columbia University. ICCV 2009, Kyoto, Japan. Thank you. Good afternoon, everybody. I am going to present ways to train additive classifiers efficiently. This work is part of an ongoing collaboration with Alex Berg.

2
**Accuracy vs. Evaluation Time for SVM Classifiers**

[Plot: accuracy vs. evaluation time for linear and non-linear kernel SVMs] For any classification task, the two main things we care about are accuracy and evaluation time. Especially for object detection, where one evaluates a classifier on thousands of windows per image, the evaluation time becomes very important. In the past, linear SVMs, though relatively less accurate, were preferred over kernel SVMs for real-time applications.

3
**Accuracy vs. Evaluation Time for SVM Classifiers**

[Plot: accuracy vs. evaluation time, with our CVPR 08 work marked] In our CVPR 08 paper…

4
**Accuracy vs. Evaluation Time for SVM Classifiers**

[Plot: additive kernels placed between linear and non-linear] We identified a subset of non-linear kernels, called additive kernels, that are used in many current object recognition tasks. These kernels have the special form that they decompose as a sum of kernels over individual dimensions.


6
**Accuracy vs. Evaluation Time for SVM Classifiers**

[Plot: additive kernels moved toward the linear-kernel evaluation time] We showed that they can be evaluated efficiently. This makes it possible to use more accurate classifiers with essentially no loss in speed. In fact, more than half of this year's submissions to the PASCAL VOC object detection challenge use variants of additive kernels. Made it possible to use SVMs with additive kernels for detection.

7
**Additive Classifiers Much work already uses them!**

SVMs with additive kernels are additive classifiers. Histogram-based kernels: histogram intersection, chi-squared kernel; Pyramid Match Kernel (Grauman & Darrell, ICCV'05); Spatial Pyramid Match Kernel (Lazebnik et al., CVPR'06); … In this talk we are going to discuss additive models in general, where the classifier decomposes over dimensions. This may seem restrictive, but it is a useful class of classifiers which is strictly more general than linear classifiers. In fact, if the underlying kernel of the SVM is additive, then the classifier is also additive.

8
**Accuracy vs. Training Time for SVM Classifiers**

[Plot: accuracy vs. training time for linear and non-linear kernel SVMs] The picture looks similar to that for evaluation time. It is important to note that this was not the case even somewhat recently.

9
**Accuracy vs. Training Time for SVM Classifiers**

[Plot: accuracy vs. training time in the <= 1990s: linear vs. non-linear]

10
**Accuracy vs. Training Time for SVM Classifiers**

[Plot: accuracy vs. training time today: linear training has become much faster] E.g. cutting plane, stochastic gradient descent, dual coordinate descent.
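The fast linear training methods the slide lists can be illustrated with a minimal Pegasos-style stochastic subgradient sketch. This is a toy illustration of the stochastic-gradient family, not the actual solvers cited; the regularization constant `lam` and the epoch count are arbitrary choices of mine.

```python
import numpy as np

def sgd_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.
    Labels y must be in {-1, +1}; each pass visits samples in random order."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)        # decaying step size
            w *= (1.0 - eta * lam)       # step on the L2 regularizer
            if y[i] * (X[i] @ w) < 1:    # margin violated: hinge subgradient
                w += eta * y[i] * X[i]
    return w

# toy separable data
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = sgd_linear_svm(X, y)
```

Each update touches only one sample, which is why such solvers scale linearly with the training set.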

11
**Accuracy vs. Training Time for SVM Classifiers**

[Plot: additive classifiers added, with our CVPR 08 work marked] As mentioned before, our previous work identified a subset of non-linear classifiers with an additive structure and showed they could be evaluated efficiently, but unfortunately it did not address improving efficiency for training.

12
**Accuracy vs. Training Time for SVM Classifiers**

[Plot: same as above, with fast training of additive classifiers marked as unsolved (✗)]

13
**Accuracy vs. Training Time for SVM Classifiers**

[Plot: this paper moves additive classifiers to near-linear training time] This paper addresses efficient training for additive classifiers, developing training methods that are about as efficient as the best methods for training linear classifiers. We also demonstrate the accuracy advantages on some popular datasets.

14
**Accuracy vs. Training Time for SVM Classifiers**

[Plot: final accuracy vs. training time picture] Makes it possible to train additive classifiers very fast.

15
Summary: Additive classifiers are widely used and can provide better accuracy than linear. Our CVPR 08: SVMs with additive kernels are additive classifiers and can be evaluated in O(#dim), the same as linear. This work: additive classifiers can be trained directly, as efficiently (up to a small constant) as the best approaches for training linear classifiers. An example:

| | Additive Kernel SVM | Our Additive Classifier | Linear SVM |
|---|---|---|---|
| Training time (relative) | 1000 | 10 | 1 |
| Accuracy | 95% | 94% | 82% |

16
**Support Vector Machines**

Embedded space, input space, kernel function: the kernel is an inner product in the embedded space, so one can learn non-linear boundaries in the input space. Classification function; kernel trick. The idea of support vector machines is to find a separating hyperplane after mapping the data into a high-dimensional space using a kernel. The final classifier is of course a hyperplane in a very high-dimensional space, but it can be expressed using only the kernel function via the so-called kernel trick. If the embedded space is low-dimensional, one can take advantage of the very fast linear SVM training algorithms, which scale linearly with the training data, as opposed to the quadratic growth for kernel SVMs.
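To make the kernel-trick classifier concrete, here is a minimal sketch of evaluating a kernel SVM decision function with the histogram intersection (min) kernel. The support vectors, coefficients, and bias are made-up toy values; the point is that evaluation cost grows with the number of support vectors.

```python
import numpy as np

def min_kernel(x, z):
    """Histogram intersection (min) kernel: sum of element-wise minima."""
    return np.minimum(x, z).sum()

def kernel_svm_decision(x, support_vectors, coeffs, bias):
    """Kernel SVM decision function: f(x) = sum_i coeff_i * K(sv_i, x) + bias,
    where coeff_i = alpha_i * y_i from the dual solution."""
    return sum(c * min_kernel(sv, x)
               for sv, c in zip(support_vectors, coeffs)) + bias

# toy support vectors and coefficients (hypothetical values)
svs = [np.array([1.0, 2.0]), np.array([0.0, 3.0])]
coeffs = [0.5, -0.25]
score = kernel_svm_decision(np.array([1.0, 1.0]), svs, coeffs, 0.1)
```

With thousands of support vectors, this sum is exactly what makes kernel SVM evaluation slow, motivating the embeddings below.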

17
**Embeddings**

These embeddings can be high-dimensional (even infinite). Our approach is based on embeddings that approximate kernels; we would like these to be as accurate as possible. We are going to use fast linear classifier training algorithms on the embedded features, so sparseness is important. Unfortunately, these embeddings are often high-dimensional. Our approach can be seen as finding embeddings that are both sparse and accurate, so that we can use the very best linear SVM training algorithms for training the classifier. In fact, we would ideally like the number of non-zero entries in the embedded features to be a small multiple of the non-zero entries in the input features.

18
**Key Idea: Embedding an Additive Kernel**

Additive kernels are easy to embed: just embed each dimension independently. A key idea of the paper is that, because additive kernels decompose over dimensions, the final embedding is just a concatenation of the individual per-dimension embeddings. As an example, take the min kernel, or histogram intersection kernel. A well-known embedding of the min kernel for integers is the unary encoding, where each number is represented in unary. For non-integers, one may approximate this by quantization.
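The unary encoding mentioned above can be checked in a few lines: for non-negative integers, the inner product of two unary codes recovers the min kernel exactly, and non-integers can be quantized first. The function names here are my own.

```python
import numpy as np

def unary_encode(x, n_bins):
    """Unary encoding of a non-negative integer x <= n_bins:
    x ones followed by zeros."""
    v = np.zeros(n_bins)
    v[:x] = 1.0
    return v

def quantized_encode(x, n_bins, x_max):
    """Approximate encoding of a real x in [0, x_max] by quantizing first."""
    return unary_encode(int(round(x / x_max * n_bins)), n_bins)

# inner product of unary codes equals the min kernel on integers
assert unary_encode(3, 8) @ unary_encode(5, 8) == min(3, 5)
```

The quantized variant is the naive approximation whose error and density the next two slides address.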

19
**Issues: Embedding Error**

Quantization leads to large errors; a better encoding is needed. [Plot: encoding error, x vs. y]

20
**Issues: Sparsity**

Represent with sparse values.
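One way to get both low error and sparsity is to spread each value over the two nearest bin centers (piecewise-linear interpolation), so each input dimension contributes at most two non-zeros instead of a dense unary code. The sketch below, with my own function name, is an illustration in the spirit of the encoding described in the slides, not the paper's exact formula.

```python
import numpy as np

def sparse_pwl_encode(x, n_bins, x_max):
    """Sparse piecewise-linear code for a scalar x in [0, x_max]:
    interpolate between the two nearest bin centers, so at most two
    entries per input dimension are non-zero."""
    v = np.zeros(n_bins)
    t = x / x_max * (n_bins - 1)   # position on the bin grid
    lo = int(np.floor(t))
    frac = t - lo
    v[lo] = 1.0 - frac
    if lo + 1 < n_bins:
        v[lo + 1] = frac
    return v

code = sparse_pwl_encode(0.37, 10, 1.0)
```

Because the interpolation is exact at bin centers and linear in between, the encoding error shrinks while the representation stays sparse.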

21
**Linear vs. Encoded SVMs**

Linear SVM objective (solve with LIBLINEAR):

Encoded SVM objective (not practical):

22
**Linear vs. Encoded SVMs**

Linear SVM objective (solve with LIBLINEAR):

Encoded SVM, modified (custom solver): encourages smooth functions and closely approximates the min kernel SVM. Custom solver: PWLSGD (see paper).

23
**Linear vs. Encoded SVMs**

Linear SVM objective (solve with LIBLINEAR):

Encoded SVM objective (solve with LIBLINEAR) :
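Once the features are encoded, the objective can be handed to any off-the-shelf linear solver. Below is a sketch using scikit-learn's `LinearSVC`, which wraps LIBLINEAR, on a simple quantized unary encoding; the encoding function and toy data are my own illustration, not the paper's exact setup.

```python
import numpy as np
from sklearn.svm import LinearSVC  # LIBLINEAR backend

def encode_matrix(X, n_bins=10):
    """Quantized unary encoding of each feature, concatenated per dimension,
    so a linear SVM on the codes approximates a min-kernel SVM."""
    x_max = X.max(axis=0) + 1e-12
    n, d = X.shape
    out = np.zeros((n, d * n_bins))
    for j in range(d):
        idx = np.minimum((X[:, j] / x_max[j] * n_bins).astype(int), n_bins - 1)
        for i in range(n):
            out[i, j * n_bins : j * n_bins + idx[i] + 1] = 1.0
    return out

# toy problem: the label depends non-linearly (threshold) on one feature,
# which a linear SVM on the raw features could not capture well
rng = np.random.default_rng(0)
X = rng.random((40, 3))
y = (X[:, 0] > 0.5).astype(int)
clf = LinearSVC(C=1.0).fit(encode_matrix(X), y)
acc = clf.score(encode_matrix(X), y)
```

The training cost is that of a linear SVM on a (sparse) expanded feature space, which is the efficiency argument of the paper.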

24
**Additive Classifier Choices**

[Table: classifier choices organized by regularization and encoding (linear, piecewise linear); IKSVM shown for comparison]

25
**Additive Classifier Choices**

[Table: classifier choices by regularization and encoding (linear, piecewise linear); IKSVM shown for comparison] Accuracy increases along the encoding axis; evaluation times are similar.

26
**Additive Classifier Choices**

[Table: classifier choices by regularization and encoding (linear, piecewise linear); IKSVM shown for comparison] Accuracy increases along both axes; evaluation times are similar.

27
**Additive Classifier Choices**

[Table: classifier choices by regularization and encoding (linear, piecewise linear); IKSVM shown for comparison] Some choices need only a few lines of code plus a standard solver, e.g. LIBLINEAR; others use a standard solver, e.g. LIBSVM.

28
**Additive Classifier Choices**

[Table: classifier choices by regularization and encoding (linear, piecewise linear); IKSVM shown for comparison] One choice requires a custom solver.

29
**Additive Classifier Choices**

[Table: classifier choices by regularization and encoding (linear, piecewise linear); IKSVM shown for comparison] Introduces the classifier notation used in the experiments.

30
**Experiments**

“Small” scale: Caltech 101 (Fei-Fei et al.); “Medium” scale: DC Pedestrians (Munder & Gavrila); “Large” scale: INRIA Pedestrians (Dalal & Triggs)

31
**Experiment: DC Pedestrians**

100× faster training time; training time ~ linear SVM; accuracy ~ kernel SVM (1.89 s, 72.98%). 20,000 features, 656-dimensional; 100 bins for encoding; 6-fold cross-validation.

32
**Experiment: Caltech 101**

10× faster, with a small loss in accuracy (41 s, 46.15%). 30 training examples per category; 100 bins for encoding; Pyramid HOG + Spatial Pyramid Match Kernel.

33
**Experiment: INRIA Pedestrians**

(140 mins, 0.95), (76 s, 0.94), (27 s, 0.88), (122 s, 0.85), (20 s, 0.82). 300× faster training time; training time ~ linear SVM; accuracy ~ kernel SVM; trains the detector in < 2 mins. SPHOG: 39,000 features, 2268-dimensional; 100 bins for encoding; cross-validation plots.


35
**Take Home Messages**

Additive models are practical for large-scale data and can be trained discriminatively. Poor man's version: encode + linear SVM solver. Middle man's version: encode + custom solver. Rich man's version: min kernel SVM. The embedding only approximates the kernel, leading to a small loss in accuracy but up to a 100× speedup in training time. Everyone should use it: see the code on our websites (fast IKSVM from CVPR'08, encoded SVMs, etc.).

36
Thank You
