Presentation on theme: "Max-Margin Additive Classifiers for Detection"— Presentation transcript:
1 Max-Margin Additive Classifiers for Detection. Subhransu Maji & Alexander Berg; University of California at Berkeley, Columbia University. ICCV 2009, Kyoto, Japan. Thank you. Good afternoon, everybody. I am going to present ways to train additive classifiers efficiently. This work is part of an ongoing collaboration with Alex Berg.
2 Accuracy vs. Evaluation Time for SVM Classifiers. (Plot axes: accuracy, evaluation time. Points: Non-linear Kernel, Linear Kernel.) For any classification task, the two main things we care about are accuracy and evaluation time. Evaluation time becomes especially important for object detection, where one evaluates a classifier on thousands of windows per image. In the past, linear SVMs, though less accurate, were preferred over kernel SVMs for real-time applications.
3 Accuracy vs. Evaluation Time for SVM Classifiers. (Plot axes: accuracy, evaluation time. Points: Non-linear Kernel, Linear Kernel, Our CVPR 08.) In our CVPR 08 paper…
4 Accuracy vs. Evaluation Time for SVM Classifiers. (Plot axes: accuracy, evaluation time. Points: Non-linear Kernel, Additive Kernel, Linear Kernel, Our CVPR 08.) We identified a subset of non-linear kernels, called additive kernels, that are used in many current object recognition tasks. These kernels have a special form: they decompose as a sum of kernels over individual dimensions.
5 Accuracy vs. Evaluation Time for SVM Classifiers. (Plot axes: accuracy, evaluation time. Points: Additive Kernel, Non-linear Kernel, Linear Kernel, Our CVPR 08.)
6 Accuracy vs. Evaluation Time for SVM Classifiers. (Plot axes: accuracy, evaluation time. Points: Additive Kernel, Non-linear Kernel, Linear Kernel, Our CVPR 08.) And we showed that they can be evaluated efficiently. This makes it possible to use more accurate classifiers with almost no loss in speed. In fact, more than half of this year’s submissions to the PASCAL VOC object detection challenge use variants of additive kernels. Made it possible to use SVMs with additive kernels for detection.
7 Additive Classifiers. Much work already uses them! SVMs with additive kernels are additive classifiers. Histogram based kernels: histogram intersection, chi-squared kernel; Pyramid Match Kernel (Grauman & Darrell, ICCV’05); Spatial Pyramid Match Kernel (Lazebnik et al., CVPR’06); …. In this talk we are going to talk about additive models in general, where the classifier decomposes over dimensions. This may seem restrictive, but it is a useful class of classifiers that is strictly more general than linear classifiers. In fact, if the underlying kernel for the SVM is additive, then the classifier is also additive.
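The last point follows directly from the form of the SVM decision function. A short derivation, writing $\alpha_j$, $y_j$, $x_j$ and $b$ for the usual dual coefficients, labels, support vectors and bias (this notation is assumed here, not taken from the slides):

```latex
K(x, z) = \sum_{i=1}^{d} K_i(x_i, z_i)
\;\Longrightarrow\;
f(x) = \sum_{j} \alpha_j y_j \, K(x, x_j) + b
     = \sum_{i=1}^{d} \underbrace{\sum_{j} \alpha_j y_j \, K_i(x_i, x_{j,i})}_{f_i(x_i)} + b
```

Each $f_i$ depends on a single input dimension, which is what makes the classifier additive. A linear classifier is the special case $f_i(x_i) = w_i x_i$.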
8 Accuracy vs. Training Time for SVM Classifiers. (Plot axes: accuracy, training time. Points: Non-linear, Linear Kernel.) The picture looks similar to that for evaluation time… It is important to note that this was not the case even somewhat recently…
9 Accuracy vs. Training Time for SVM Classifiers. (Plot axes: accuracy, training time. Points: Non-linear, Linear. Label: <=1990s.)
10 Accuracy vs. Training Time for SVM Classifiers. (Plot axes: accuracy, training time. Points: Non-linear, Linear. Label: Today.) E.g., cutting plane, stochastic gradient descent, dual coordinate descent.
11 Accuracy vs. Training Time for SVM Classifiers. (Plot axes: accuracy, training time. Points: Non-linear, Additive, Linear, Our CVPR 08.) As mentioned before, our previous work identified a subset of non-linear classifiers with an additive structure and showed that they could be evaluated efficiently, but unfortunately it did not address improving efficiency for training…
12 Accuracy vs. Training Time for SVM Classifiers. (Plot axes: accuracy, training time. Points: Non-linear, Additive, Linear, Our CVPR 08 ✗.)
13 Accuracy vs. Training Time for SVM Classifiers. (Plot axes: accuracy, training time. Points: Non-linear, Additive, Linear, This Paper.) This paper addresses efficient training for additive classifiers, developing training methods that are about as efficient as the best methods for training linear classifiers. We also demonstrate the accuracy advantages on some popular datasets.
14 Accuracy vs. Training Time for SVM Classifiers. (Plot axes: accuracy, training time. Points: Non-linear, Linear, Additive, This Paper.) Makes it possible to train additive classifiers very fast.
15 Summary. Additive classifiers are widely used and can provide better accuracy than linear classifiers. Our CVPR 08: SVMs with additive kernels are additive classifiers and can be evaluated in O(#dim), the same as linear. This work: additive classifiers can be trained directly as efficiently (up to a small constant) as the best approaches for training linear classifiers. An example (columns: Additive Kernel SVM, Our Additive Classifier, Linear SVM). Time: Train Test 1000 Train 10 Test 1. Accuracy: 95 %, 94 %, 82 %.
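To make the O(#dim) evaluation claim concrete, here is a minimal sketch, not the authors' implementation. It assumes a min-kernel SVM whose support vectors happen to lie on a uniform grid, so that each per-dimension function is piecewise linear with breakpoints on that grid and can be tabulated once. The support vectors, coefficients and bias are random stand-ins for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sv, dim, n_grid = 50, 8, 21

# Hypothetical "trained" min-kernel SVM: support vectors, signed weights, bias.
support = rng.integers(0, n_grid, size=(n_sv, dim)) / (n_grid - 1)
coef = rng.normal(size=n_sv)  # stands in for alpha_j * y_j
bias = 0.1

def decide_kernel(x):
    """Direct kernel evaluation: cost grows with n_sv * dim."""
    return coef @ np.minimum(x, support).sum(axis=1) + bias

# Tabulate each per-dimension function f_i on the grid, once, after training.
grid = np.linspace(0.0, 1.0, n_grid)
# table[i, g] = sum_j coef[j] * min(grid[g], support[j, i])
table = np.einsum(
    "j,jig->ig", coef, np.minimum(grid[None, None, :], support[:, :, None])
)

def decide_table(x):
    """Table evaluation: one interpolated lookup per dimension."""
    return sum(np.interp(x[i], grid, table[i]) for i in range(dim)) + bias

x = rng.random(dim)
print(decide_kernel(x), decide_table(x))
```

The two values agree up to floating-point error because linear interpolation is exact between breakpoints. With support vectors off the grid, the table would only approximate the kernel decision function.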
16 Support Vector Machines. Input Space, Embedded Space. Kernel Function: inner product in the embedded space. Can learn non-linear boundaries in input space. Classification Function. Kernel Trick. The idea of support vector machines is to find a separating hyperplane for the data after mapping it into a high-dimensional space using a kernel. The final classifier is of course a hyperplane in a very high-dimensional space, but it can be expressed using only the kernel function via the so-called kernel trick. If the embedded space is low dimensional, one can take advantage of very fast linear SVM training algorithms, which scale linearly with the training data, as opposed to the quadratic growth for the kernel SVM.
17 Embeddings… These embeddings can be high dimensional (even infinite). Our approach is based on embeddings that approximate kernels. We’d like this to be as accurate as possible. We are going to use fast linear classifier training algorithms on the embedded features, so sparseness is important. Unfortunately, these embeddings are often high dimensional. Our approach can be seen as finding embeddings that are both sparse and accurate, so that we can use the very best of the linear SVM training algorithms for training the classifier. In fact, we would ideally like the number of non-zero entries in the embedded features to be a small multiple of the number of non-zero entries in the input features.
18 Key Idea: Embedding an Additive Kernel. Additive kernels are easy to embed: just embed each dimension independently. Linear embedding for the min kernel for integers. For non-integers, can approximate by quantizing. A key idea of the paper is that additive kernels are easy to embed, since the final embedding is just a concatenation of the individual dimension embeddings. As an example, take the min kernel, or histogram intersection kernel, defined as K(x, y) = Σ_i min(x_i, y_i). A well-known embedding for the min kernel for integers is the unary encoding, where each number is represented in unary. Example … For non-integers, one may approximate this by quantization.
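The unary embedding described above can be sketched in a few lines. This is an illustration only; the function names and the `max_value` parameter are made up for the example. It checks that the inner product of the concatenated unary codes equals the min kernel for non-negative integer features.

```python
import numpy as np

def unary_encode(x, max_value):
    """Unary code of non-negative integers: x ones followed by zeros."""
    x = np.asarray(x, dtype=int)
    # Position k (0-based) is 1 exactly when k < x.
    return (np.arange(max_value) < x[..., None]).astype(float)

def embed(features, max_value):
    """Embed each dimension independently and concatenate the codes."""
    codes = unary_encode(features, max_value)      # shape (..., dim, max_value)
    return codes.reshape(*codes.shape[:-2], -1)    # shape (..., dim * max_value)

def min_kernel(x, y):
    """Min (histogram intersection) kernel: sum of per-dimension minima."""
    return np.minimum(x, y).sum()

x = np.array([3, 0, 5])
y = np.array([1, 4, 5])

print(unary_encode(3, 5))                # [1. 1. 1. 0. 0.]
print(min_kernel(x, y))                  # 6
print(embed(x, 5) @ embed(y, 5))         # 6.0
```

The identity holds because two unary codes overlap in exactly min(a, b) positions, and concatenation turns the sum over dimensions into a single inner product.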
19 Issues: Embedding Error. Quantization leads to large errors. Better encoding. (Plot axes: x, y.)
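A small numerical sketch of this issue for a single dimension with values in [0, 1]. Two encodings are compared: a unary code of the rounded value, and a unary code whose last active entry stores the fractional remainder. The second is one encoding with a smaller error; whether it matches the "better encoding" on the slide is an assumption, and the function names are hypothetical.

```python
import numpy as np

def encode_rounded(x, n_bins):
    """Quantize x in [0, 1] to the nearest of n_bins levels, then unary-encode."""
    level = int(np.rint(x * n_bins))
    code = np.zeros(n_bins)
    code[:level] = 1.0
    return code / np.sqrt(n_bins)

def encode_fractional(x, n_bins):
    """Unary code whose last active entry holds the fractional remainder."""
    scaled = x * n_bins
    level = int(np.floor(scaled))
    code = np.zeros(n_bins)
    code[:level] = 1.0
    if level < n_bins:
        code[level] = scaled - level
    return code / np.sqrt(n_bins)

def max_error(encode, n_bins, grid):
    """Largest gap between min(x, y) and the embedded inner product on a grid."""
    codes = np.array([encode(v, n_bins) for v in grid])
    approx = codes @ codes.T
    exact = np.minimum(grid[:, None], grid[None, :])
    return np.abs(approx - exact).max()

n_bins = 10
grid = np.linspace(0.0, 1.0, 201)
err_round = max_error(encode_rounded, n_bins, grid)
err_frac = max_error(encode_fractional, n_bins, grid)
print(err_round, err_frac)
```

With the fractional code, the inner product is exact whenever the two values fall in different bins, and the error inside a shared bin is at most 1/(4 * n_bins). Both codes have the same length, so the gain comes at no cost in dimensionality.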
21 Linear vs. Encoded SVMs Linear SVM objective (solve with LIBLINEAR): Encoded SVM objective (not practical):
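For reference, the linear objective in the standard L2-regularized hinge-loss form that LIBLINEAR solves can be written as below. The second line is an assumption about what the encoded objective looks like: the same problem with each input replaced by its encoding $\phi(x_i)$. The symbols $w$, $C$ and $\phi$ are notation chosen here, not taken from the slide.

```latex
\min_{w}\ \tfrac{1}{2}\, w^{\top} w + C \sum_{i} \max\bigl(0,\ 1 - y_i\, w^{\top} x_i\bigr)
\qquad \text{(linear SVM)}

\min_{w}\ \tfrac{1}{2}\, w^{\top} w + C \sum_{i} \max\bigl(0,\ 1 - y_i\, w^{\top} \phi(x_i)\bigr)
\qquad \text{(encoded SVM, assumed form)}
```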
22 Linear vs. Encoded SVMs. Linear SVM objective (solve with LIBLINEAR): Encoded SVM modified (custom solver): Encourages smooth functions. Closely approximates min kernel SVM. Custom solver: PWLSGD (see paper).
23 Linear vs. Encoded SVMs. Linear SVM objective (solve with LIBLINEAR): Encoded SVM objective (solve with LIBLINEAR):
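A rough end-to-end sketch of the "encode, then use a linear solver" recipe on toy data. Several things here are assumptions made for the illustration: the encoding (a unary code with a fractional last entry), the synthetic data, and the solver, which is plain gradient descent on an L2-regularized squared-hinge objective standing in for LIBLINEAR so that the example needs only NumPy.

```python
import numpy as np

def encode(X, n_bins):
    """Encode each dimension of X (values in [0, 1]) independently, then concatenate."""
    scaled = X[:, :, None] * n_bins - np.arange(n_bins)   # shape (n, dim, n_bins)
    codes = np.clip(scaled, 0.0, 1.0)                      # ones, one fractional entry, zeros
    return codes.reshape(len(X), -1)

def train_linear_svm(F, y, reg=1e-3, lr=0.02, n_iter=3000):
    """Gradient descent on a squared-hinge linear SVM (stand-in for LIBLINEAR)."""
    F = np.hstack([F, np.ones((len(F), 1))])   # constant column acts as the bias
    w = np.zeros(F.shape[1])
    for _ in range(n_iter):
        slack = np.maximum(0.0, 1.0 - y * (F @ w))
        grad = reg * w - 2.0 * (F.T @ (slack * y)) / len(y)
        w -= lr * grad
    return w

def accuracy(w, F, y):
    F = np.hstack([F, np.ones((len(F), 1))])
    return np.mean(np.sign(F @ w) == y)

rng = np.random.default_rng(0)

def make_data(n):
    """Toy problem: the label depends non-linearly on the first feature only."""
    X = rng.random((n, 2))
    y = np.where(np.abs(X[:, 0] - 0.5) > 0.25, 1.0, -1.0)
    return X, y

X_train, y_train = make_data(400)
X_test, y_test = make_data(400)

w_lin = train_linear_svm(X_train, y_train)
w_enc = train_linear_svm(encode(X_train, 10), y_train)

acc_lin = accuracy(w_lin, X_test, y_test)
acc_enc = accuracy(w_enc, encode(X_test, 10), y_test)
print(acc_lin, acc_enc)
```

The same linear training routine is used in both runs; only the features change. On the raw features the classifier cannot represent the "ends versus middle" rule, while on the encoded features the learned weights define a piecewise-linear function of each input dimension.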
31 Experiment: DC Pedestrians. 100x faster: training time ~ linear SVM, accuracy ~ kernel SVM. (1.89s, 72.98%). 20,000 features, 656 dimensional; 100 bins for encoding; 6-fold cross validation.
32 Experiment: Caltech 101. 10x faster, with a small loss in accuracy. (41s, 46.15%). 30 training examples per category; 100 bins for encoding; Pyramid HOG + Spatial Pyramid Match Kernel.
33 Experiment: INRIA Pedestrians. Plot points: (140 mins, 0.95), (76s, 0.94), (27s, 0.88), (122s, 0.85), (20s, 0.82). 300x faster: training time ~ linear SVM, accuracy ~ kernel SVM; trains the detector in < 2 mins. SPHOG: 39,000 features, 2268 dimensional; 100 bins for encoding. Cross Validation Plots.
34 Experiment: INRIA Pedestrians. 300x faster: training time ~ linear SVM, accuracy ~ kernel SVM; trains the detector in < 2 mins. SPHOG: 39,000 features, 2268 dimensional; 100 bins for encoding. Cross Validation Plots.
35 Take Home Messages. Additive models are practical for large scale data. Can be trained discriminatively: Poor man’s version: encode + Linear SVM Solver; Middle man’s version: encode + Custom Solver; Rich man’s version: Min Kernel SVM. Embedding only approximates kernels, which leads to a small loss in accuracy but up to 100x speedup in training time. Everyone should use: see code on our websites (Fast IKSVM from CVPR’08, Encoded SVMs, etc.).