Sreekanth Vempati ( 200402044 ) Advisors: Dr. C. V. Jawahar ( IIIT Hyderabad ), Dr. Andrew Zisserman ( Univ. of Oxford ) Efficient SVM based object classification.


1 Efficient SVM based object classification and detection. Sreekanth Vempati (200402044). Advisors: Dr. C. V. Jawahar (IIIT Hyderabad), Dr. Andrew Zisserman (Univ. of Oxford). 14 December 2010

2 Large Visual Data: cheap capture, storage, and internet devices

3 Rapid growth in the amount of data available: video sharing (for example, YouTube) and image sharing

4 Problems. Scene/Object Classification – find specified categories of scenes/objects (Is there a bus in this image? Is there a demonstration/protest in this image?). Object Detection – find the location of specified categories of scenes/objects (Output the bounding box of the bus in this image).

5 Challenges: intra-class variation; inter-class similarity. Ex: Boat/Ship category. Example images: Protest, Flowers, Cityscape

6 Challenges: viewpoint variation; occlusions/truncations

7 Scalability: We need solutions that scale to large amounts of data. For example, to test 140,000 images at the best-performing settings: – Feature representation (visual-words based, 6300 dimensions) takes ~50 seconds, so the total time would be ~57 days – Classification (SVM with non-linear kernel, 20 classes) runs at 3 images/second, a total time of ~10 days
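The totals can be checked with a few lines of arithmetic. The sketch below assumes that the ~50 seconds is a per-image cost and that the 3 images/second rate applies per class, so 20 classes cost 20/3 seconds per image; both readings are assumptions. Under them, classification comes to about 10.8 days, in line with the ~10 days quoted. Feature extraction comes to about 81 days, so the ~57 days quoted presumably reflects a smaller image count or per-image cost (50 seconds over 100,000 images gives about 58 days).

```python
SECONDS_PER_DAY = 24 * 60 * 60

def total_days(num_images, seconds_per_image):
    """Sequential processing time, in days, for a batch of images."""
    return num_images * seconds_per_image / SECONDS_PER_DAY

NUM_IMAGES = 140_000

# Assumption: ~50 s of feature extraction per image.
feature_days = total_days(NUM_IMAGES, 50.0)

# Assumption: 3 images/second per class, 20 classes, so 20/3 s per image.
classify_days = total_days(NUM_IMAGES, 20 / 3)

print(f"feature extraction: {feature_days:.1f} days")
print(f"classification:     {classify_days:.1f} days")
```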

8 Overview: large-scale semantic concept retrieval in videos; modeling subcategories; efficient detection using GRBF feature maps; conclusions

9 1. Semantic video retrieval: Given a large set of videos, retrieve the videos of a specific category – Ex: Find all the videos containing soccer

10 Overview of the approach. Training: annotated video frames (example videos), feature extraction (Ex: PHOW, PHOG, GIST), classifier (Ex: SVM, Random Forests). Testing: unseen videos, feature extraction, ranked shots

11 Features: GIST – Oliva and Torralba, IJCV 2001 – Image divided into an m x m grid – For each cell, a set of filters (different scales, orientations) is applied – Final descriptor: per-cell averages of the filter responses, collected over all cells. Images from “Image Classification for large number of object categories”, Anna Bosch, 2006

12 Features: PHOG (Pyramid Histogram of Oriented Gradients). Images from “Image Classification for large number of object categories”, Anna Bosch, 2006

13 Features: PHOW (Pyramid Histogram of Visual Words), built by vector quantization of dense SIFT (Scale Invariant Feature Transform) descriptors. “Beyond bag of features: Spatial pyramid matching for recognizing natural scene categories”, S. Lazebnik et al., CVPR 2006
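As a rough illustration of how vector quantization and the spatial pyramid fit together, the sketch below assigns each local descriptor to its nearest codebook entry and concatenates the word counts over 1x1 and 2x2 grids. It is a minimal sketch, not the pipeline used in the talk: the function names, the toy 2-D descriptors (real SIFT descriptors are 128-D), the tiny codebook, and the omission of per-level weighting are all assumptions made for brevity.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest_word(descriptor, codebook):
    """Vector quantization: index of the closest codebook entry."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))

def word_histogram(descriptors, codebook):
    """Count how many descriptors fall on each visual word."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    return counts

def pyramid_histogram(points, descriptors, codebook, levels=2):
    """Concatenate word histograms over 1x1, 2x2, ... grids.

    points holds (x, y) positions normalized to [0, 1).
    """
    feature = []
    for level in range(levels):
        cells = 2 ** level
        for cell_y in range(cells):
            for cell_x in range(cells):
                in_cell = [d for (x, y), d in zip(points, descriptors)
                           if int(x * cells) == cell_x and int(y * cells) == cell_y]
                feature.extend(word_histogram(in_cell, codebook))
    return feature

# Toy data (hypothetical): three descriptors sampled at three positions.
codebook = [[0.0, 0.0], [1.0, 1.0]]
points = [(0.1, 0.1), (0.6, 0.2), (0.7, 0.9)]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]
phow = pyramid_histogram(points, descriptors, codebook)
```

The resulting vector has one block of word counts per grid cell, so its length is the codebook size times the total number of cells across levels.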

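14 Support Vector Machines (SVM): training points $x_i$, $i = 1, \dots, N$, with labels $y_i$, $i = 1, \dots, N$; weight vector $w$ and bias $b$. Decision boundary $w^\top x + b = 0$, with margin boundaries $w^\top x + b = +1$ and $w^\top x + b = -1$. Figure labels: support vector, misclassified point, $\xi = 0$, $\xi < 1$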

15 SVM formulation. Evaluation function: $f(x) = w^\top x + b$

16 Kernel Trick: Use a function that maps the input space to a feature space, then build the classifier in that feature space.

17 Dot product in feature space. Moving to a different space: $f(x) = w^\top x + b = \sum_i \alpha_i y_i \, x_i^\top x + b$, with $x$ and $x_i$ replaced by their images in the feature space

18 Kernelizing SVMs: Replace the dot product with a kernel function
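The step from a dot product in feature space to a kernel function can be made concrete with a small example. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs, chosen only because its feature map is short enough to write out; the support vectors, coefficients, and bias are made-up values, not results from the talk. It shows that the kernelized score $\sum_i \alpha_i y_i K(x_i, x) + b$ equals the linear score $w^\top \phi(x) + b$ computed in the explicit feature space.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, evaluated in the input space."""
    return dot(x, z) ** 2

def poly2_feature_map(x):
    """Explicit map phi with phi(x) . phi(z) = (x . z)^2 for 2-D inputs."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def kernel_decision(x, support_vectors, alphas, labels, b, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

# Hypothetical model parameters, for illustration only.
support_vectors = [[1.0, 2.0], [-1.0, 0.5]]
alphas = [0.3, 0.7]
labels = [+1, -1]
b = 0.1
x = [0.5, -1.5]

kernel_score = kernel_decision(x, support_vectors, alphas, labels, b, poly2_kernel)

# Same classifier written as a linear function in feature space:
# w = sum_i alpha_i * y_i * phi(x_i).
w = [sum(a * y * poly2_feature_map(sv)[k]
         for sv, a, y in zip(support_vectors, alphas, labels))
     for k in range(3)]
linear_score = dot(w, poly2_feature_map(x)) + b
```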

19 Kernels: linear; polynomial; intersection kernel; generalized RBF kernel; weighted combination of multiple kernels
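For reference, the kernels named on this slide are commonly written as in the sketch below. These are textbook forms under assumed parameter names (`degree`, `offset`, `sigma`, `weights`); the exact parameterization used in the talk may differ. The generalized RBF kernel is shown with the χ² distance, which gives the exp-χ² kernel.

```python
import math

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, offset=1.0):
    return (linear_kernel(x, y) + offset) ** degree

def intersection_kernel(x, y):
    """Histogram intersection: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))

def chi2_distance(x, y):
    """Chi-squared distance between non-negative histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)

def generalized_rbf_kernel(x, y, sigma=1.0, distance=chi2_distance):
    """exp(-D^2(x, y) / (2 sigma^2)) for a squared distance D^2."""
    return math.exp(-distance(x, y) / (2.0 * sigma ** 2))

def combined_kernel(x, y, kernels, weights):
    """Weighted combination of several base kernels."""
    return sum(wt * k(x, y) for k, wt in zip(kernels, weights))

# Two toy histograms (hypothetical values).
x = [0.2, 0.8]
y = [0.5, 0.5]
mixed = combined_kernel(x, y, [linear_kernel, intersection_kernel], [0.5, 0.5])
```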

20 TRECVID competition. Objective: rank video shots based on the presence of a given concept. Participated in the High-level Feature Extraction task of TRECVID, organized by NIST, USA. 2008: around 180 submissions by 40 teams from all over the world

21 Some of the classes (High-level Feature Extraction): Mountain, Hand, Street, Telephone, Flower, Bridge, Airplane flying, Boat/Ship, Bus, Dog, Cityscape, Classroom, Driver, Two People, Emergency Vehicle, Harbor, Kitchen, Nighttime, Singing, Demonstration/Protest

22 Data Statistics. Evaluation measure: Average Precision (area under the precision-recall curve)
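A minimal sketch of how average precision can be computed from a ranked list follows. It uses the plain non-interpolated form (the mean of the precision values at the rank of each relevant item); evaluation campaigns often use interpolated or sampled variants, so this is an illustration of the idea rather than the official scoring code. The function name and the toy scores are assumptions.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each relevant item."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Toy ranking (hypothetical): relevant shots at ranks 1 and 3.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

For this toy ranking the precision is 1/1 at the first relevant shot and 2/3 at the second, so the average is 5/6.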

23 Our Approach: Performance compared using different features and SVM parameters – PHOW with the intersection kernel is efficient – Testing is very fast, with little drop in performance. Testing time: ~200,000 (2 lakh) frames in 10 seconds. “Classification using Intersection Kernel SVMs is Efficient”, S. Maji, A. Berg, and J. Malik, CVPR 2008
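The reason intersection kernel SVMs can be evaluated quickly is that the score decomposes over dimensions, and each one-dimensional term can be looked up in sorted support-vector values instead of looping over all support vectors. The sketch below shows the exact variant of that idea with a binary search per dimension; the class name, the toy model, and the use of `coeffs` for the products $\alpha_l y_l$ are assumptions for illustration, and the cited paper also describes faster approximate variants not shown here.

```python
from bisect import bisect_right

def direct_score(x, support_vectors, coeffs, b):
    """Naive evaluation: one intersection kernel per support vector."""
    return sum(c * sum(min(xi, si) for xi, si in zip(x, sv))
               for sv, c in zip(support_vectors, coeffs)) + b

class FastIntersectionSVM:
    """Per-dimension sorted tables, so each test costs O(d log m), not O(d m)."""

    def __init__(self, support_vectors, coeffs, b):
        self.b = b
        self.tables = []
        for dim in range(len(support_vectors[0])):
            pairs = sorted((sv[dim], c) for sv, c in zip(support_vectors, coeffs))
            values = [v for v, _ in pairs]
            # prefix_weighted[k]: sum of c * v over the k smallest values
            # prefix_coeff[k]:    sum of c over the k smallest values
            prefix_weighted, prefix_coeff = [0.0], [0.0]
            for v, c in pairs:
                prefix_weighted.append(prefix_weighted[-1] + c * v)
                prefix_coeff.append(prefix_coeff[-1] + c)
            self.tables.append((values, prefix_weighted, prefix_coeff))

    def score(self, x):
        total = self.b
        for xi, (values, prefix_weighted, prefix_coeff) in zip(x, self.tables):
            k = bisect_right(values, xi)  # support values <= xi contribute c * v
            total += prefix_weighted[k]
            total += xi * (prefix_coeff[-1] - prefix_coeff[k])  # the rest contribute c * xi
        return total

# Hypothetical model: coeffs[l] stands for alpha_l * y_l.
support_vectors = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.1, 0.1, 0.8]]
coeffs = [0.7, -0.4, 0.25]
model = FastIntersectionSVM(support_vectors, coeffs, b=-0.05)
x = [0.3, 0.3, 0.4]
fast = model.score(x)
slow = direct_score(x, support_vectors, coeffs, b=-0.05)
```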

24 Variation with features

25 Variation with kernels

26 Results. More results

27 1. Summary: a method of visual concept retrieval suitable for large-scale data; PHOW with the fast intersection kernel is particularly useful

28 2. Modeling subcategories

29 Subcategories in the real world

30 What did we achieve?

31 Structural SVM vs SVM: - Joint feature map between input and output - Allows the output label to be a complex variable - Our case: the output is a combination of category and subcategory labels. “Support Vector Learning for Interdependent and Structured Output Spaces”, I. Tsochantaridis et al., ICML 2004

32 Use of latent variables. “Learning structural SVMs with latent variables”, C.-N. Yu et al., ICML 2009
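One simple way to picture a joint feature map over an input and a (category, subcategory) output is the block construction sketched below: the input vector is copied into the block indexed by the label pair, and prediction maximizes the linear score over all pairs. This is a common multiclass-style construction used here as an assumption; the slides do not specify the exact map, and the function names and weights are hypothetical. In the latent setting the subcategory is not annotated, and the same maximization is used to infer it.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def joint_feature_map(x, category, subcategory, num_categories, num_subcategories):
    """Copy x into the block for (category, subcategory); other blocks stay zero."""
    psi = [0.0] * (num_categories * num_subcategories * len(x))
    start = (category * num_subcategories + subcategory) * len(x)
    psi[start:start + len(x)] = x
    return psi

def predict(x, w, num_categories, num_subcategories):
    """Return the (category, subcategory) pair with the highest score w . Psi(x, y)."""
    candidates = [(c, s) for c in range(num_categories)
                  for s in range(num_subcategories)]
    return max(candidates,
               key=lambda cs: dot(w, joint_feature_map(
                   x, cs[0], cs[1], num_categories, num_subcategories)))

# Hypothetical weights: 2 categories x 2 subcategories, 2-D inputs.
w = [0.1, 0.0,   # block (0, 0)
     0.2, 0.0,   # block (0, 1)
     0.9, 0.0,   # block (1, 0)
     0.3, 0.0]   # block (1, 1)
category, subcategory = predict([1.0, 0.0], w, 2, 2)
```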

33 Toy Datasets

34 Real-world datasets: TRECVID 2009 dataset; PASCAL VOC (Visual Object Classes) 2007 – object detection dataset

35 Results on TRECVID dataset

36 Improvement with latent SVM

37 Effect of the number of subclasses

38 2. Summary: a method for modeling subcategories using structural SVM; application of latent structural SVM for further improvements; improved performance of the linear kernel; various experiments on toy and real data

39 3. Generalized RBF feature maps for Efficient Detection

40 Object Detection. Example categories: aeroplane, horse, bicycle, car, cow, motorbike

41 Part 3 Outline: Introduction (kernels and feature maps); explicit feature maps for GRBF kernels; experiments and results

42 General framework for detection: any image; feature representations; classifier (Ex: SVM, from linear SVM to non-linear SVM); Ex: Car. “Multiple Kernels for Object Detection”, Vedaldi et al., ICCV 2009; “Cascade Object Detection with Deformable Part Models”, Felzenszwalb et al., CVPR 2010

43 Kernels, from faster to more discriminative: linear SVM; additive kernels (Ex: intersection kernel); generalized RBF kernels (Ex: exp-χ² kernel). Fast linear SVMs: stochastic SVM (PEGASOS), primal SVM (liblinear), one-slack SVM (SVM-perf)

44 Kernels. Problem: GRBF kernels, which have high computational complexity, are required to get good performance. Our solution: approximate generalized RBF kernels with a linear one by using a feature map

45 Speeding up non-linear SVMs: A kernel is a dot product in a high-dimensional feature space. Define a feature map approximating the kernel

46 Explicit feature maps. Feature maps for RBF/multiplicative kernels – [Rahimi and Recht, NIPS 07] – [F. Li et al., DAGM 2010]. Feature maps for additive kernels – [Maji and Berg, ICCV 09] – [Vedaldi and Zisserman, CVPR 2010] – [Perronnin et al., CVPR 2010]. Our contribution: feature maps for generalized RBF kernels, with a 2x to 3x speedup (only a small drop in performance)

47 Part 3 Outline: Introduction (kernels and feature maps); explicit feature maps for GRBF kernels; experiments and results

48 Additive kernels. Examples: χ², intersection, and Hellinger’s kernels

49 Feature maps for additive kernels [Vedaldi & Zisserman 10]: a closed-form function, approximated by sampling. “Efficient Additive Kernels via Explicit Feature Maps”, A. Vedaldi and A. Zisserman, CVPR 2010
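To make the "closed form, approximated by sampling" idea concrete, the sketch below builds a sampled feature map for the χ² kernel $k(x, y) = 2xy / (x + y)$ (applied per dimension and summed). It follows the general recipe of the cited paper as I understand it: each input value is expanded into $2n + 1$ components obtained by sampling the kernel's spectrum $\kappa(\omega) = \operatorname{sech}(\pi\omega)$ at multiples of a period. The parameter names `n` and `period` and their default values are assumptions chosen for accuracy rather than compactness.

```python
import math

def chi2_kernel(x, y):
    """Exact additive chi-squared kernel between non-negative histograms."""
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

def chi2_feature_map(x, n=12, period=0.25):
    """Sampled explicit map: 2n + 1 components per input dimension."""
    features = []
    for value in x:
        if value <= 0.0:
            features.extend([0.0] * (2 * n + 1))
            continue
        log_value = math.log(value)
        # Zero-frequency component; the spectrum at 0 is sech(0) = 1.
        features.append(math.sqrt(period * value))
        for j in range(1, n + 1):
            freq = j * period
            spectrum = 1.0 / math.cosh(math.pi * freq)  # sech(pi * freq)
            scale = math.sqrt(2.0 * period * spectrum * value)
            features.append(scale * math.cos(freq * log_value))
            features.append(scale * math.sin(freq * log_value))
    return features

# Toy histograms (hypothetical values).
x = [0.2, 0.8]
y = [0.5, 0.5]
exact = chi2_kernel(x, y)
approx = sum(a * b for a, b in zip(chi2_feature_map(x), chi2_feature_map(y)))
```

A linear SVM trained on `chi2_feature_map(x)` then behaves approximately like a χ² kernel SVM on `x`; smaller `n` gives shorter vectors at the cost of a coarser approximation.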

50 Feature maps for RBF kernels: random Fourier features [Rahimi & Recht 07]. “Random Features for Large-Scale Kernel Machines”, Ali Rahimi and Ben Recht, NIPS 2007
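A minimal sketch of random Fourier features for the Gaussian RBF kernel $\exp(-\lVert x - y\rVert^2 / (2\sigma^2))$ follows. Projection directions are drawn from a Gaussian with standard deviation $1/\sigma$, and each direction contributes a cosine and a sine component, so the dot product of two feature vectors is a Monte Carlo estimate of the kernel. The function names, the fixed seed, and the number of projections are assumptions for illustration.

```python
import math
import random

def rbf_kernel(x, y, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sample_projections(dim, num_projections, sigma, seed=0):
    """Draw directions w ~ N(0, I / sigma^2), the spectrum of the RBF kernel."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
            for _ in range(num_projections)]

def random_fourier_features(x, projections):
    """Map x to [cos(w.x), sin(w.x)] / sqrt(D) for each sampled direction w."""
    scale = 1.0 / math.sqrt(len(projections))
    features = []
    for w in projections:
        angle = sum(wi * xi for wi, xi in zip(w, x))
        features.append(scale * math.cos(angle))
        features.append(scale * math.sin(angle))
    return features

# Toy inputs (hypothetical values).
x = [0.2, 0.5, 0.3]
y = [0.6, 0.1, 0.3]
sigma = 0.5
projections = sample_projections(dim=3, num_projections=2000, sigma=sigma)
zx = random_fourier_features(x, projections)
zy = random_fourier_features(y, projections)
approx = sum(a * b for a, b in zip(zx, zy))
exact = rbf_kernel(x, y, sigma)
```

The estimate improves as the number of projections grows, which is the accuracy versus test-time trade-off discussed later in the talk.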

51 Generalized RBF kernels: definition; trick: express the distance in terms of a feature map; example: the kernel and distance for χ²
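A sketch of the construction this slide refers to, under assumed notation: $k$ is an additive kernel with feature map $\Psi$, $D^2$ is the squared distance it induces, and $\sigma$ is the bandwidth. These symbols are chosen here for illustration and may not match the original slide.

```latex
\begin{align*}
K(x, y) &= \exp\!\left(-\frac{D^2(x, y)}{2\sigma^2}\right), \\
D^2(x, y) &= k(x, x) + k(y, y) - 2\,k(x, y) = \lVert \Psi(x) - \Psi(y) \rVert^2, \\
k_{\chi^2}(x, y) &= \sum_i \frac{2\,x_i y_i}{x_i + y_i}
\quad\Longrightarrow\quad
D^2_{\chi^2}(x, y) = \sum_i \frac{(x_i - y_i)^2}{x_i + y_i}.
\end{align*}
```

The last step uses $k_{\chi^2}(x, x) = \sum_i x_i$, so each term is $x_i + y_i - 4 x_i y_i / (x_i + y_i) = (x_i - y_i)^2 / (x_i + y_i)$. Because $D^2$ is a Euclidean distance between the mapped vectors, the generalized RBF kernel is an ordinary RBF kernel applied to $\Psi(x)$ and $\Psi(y)$.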

52 GRBF feature maps algorithm

53 Outline: Introduction (kernels and feature maps); explicit feature maps for generalized RBF kernels; experiments and results

54 Experimental Setup: PASCAL VOC (Visual Object Classes) 2007 – object detection dataset – 20 object categories. Cascade of classifiers [Vedaldi et al., ICCV 2009]: exp-χ² kernel SVM, linear SVM, additive kernel SVM. PHOG features. exp-χ² feature map SVM

55 Approximate vs Exact Kernels: Average precision improves, but testing time increases, with the number of projections.

56 Approximate vs Exact Kernels: Average precision and testing time both increase with the number of projections

57 A large number of projections is required for good performance. Additional improvement in testing time from sparsity: SVM SPARSE – L1-regularized, L2-loss function; LR SPARSE – L1-regularized, logistic regression loss function

58 Speedup with L1 regularization

59 Effect of C on Sparsity: The parameter that controls sparsity is the SVM parameter C (recall the SVM objective function). A smaller C gives a sparser solution with only a slight drop in performance
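A sketch of the two L1-regularized objectives being recalled, written in the form used by LIBLINEAR. That the talk used exactly this formulation is an assumption; $\Psi(x_i)$ here stands for the approximate feature vector of training example $i$, and the bias term is omitted.

```latex
\begin{align*}
\text{SVM SPARSE:}\quad
& \min_{w}\; \lVert w \rVert_1 + C \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i\, w^\top \Psi(x_i)\bigr)^2, \\
\text{LR SPARSE:}\quad
& \min_{w}\; \lVert w \rVert_1 + C \sum_{i=1}^{N} \log\bigl(1 + e^{-y_i\, w^\top \Psi(x_i)}\bigr).
\end{align*}
```

Since $C$ multiplies the data term, lowering $C$ gives the $\ell_1$ penalty relatively more weight and drives more components of $w$ to zero. Components of $w$ that are zero correspond to projections that need not be computed at test time.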

60 Example Results

61 Results on all 20 categories: SVM dense is faster than exact exp-χ² and performs better than χ²

62 Results on all 20 categories: LR sparse is 2 to 3 times faster than exact exp-χ² and performs better than χ²

63 3. Summary: feature maps for generalized RBF kernels; a method for reducing the number of projections; results on VOC 2007: – nearly 2x to 3x speedup with a slight loss in performance

64 Conclusions: Proposed efficient SVM-based methods for visual scene/object categorization and detection. Validated these methods on a large amount of data. Further work: porting these techniques to GPUs; including time information to improve average precision. Example images: Aeroplane, Motorbike

65 Publications
– Generalized RBF feature maps for efficient detection. Sreekanth Vempati, Andrea Vedaldi, Andrew Zisserman, C. V. Jawahar. 21st British Machine Vision Conference (BMVC), 2010 (oral presentation), Aberystwyth, UK.
– 2009: Oxford/IIIT - TRECVID 2009 - Notebook paper. Sreekanth Vempati, Mihir Jain, Omkar M. Parkhi, C. V. Jawahar, Andrea Vedaldi, Marcin Marszalek, Andrew Zisserman. TRECVID 2009 Workshop, Gaithersburg, Md., USA.
– 2008: Oxford/IIIT - TRECVID 2008 - Notebook paper. James Philbin, Manuel Marin-Jimenez, Siddharth Srinivasan, Andrew Zisserman, Mihir Jain, Sreekanth Vempati, Pramod Sankar, C. V. Jawahar. TRECVID 2008 Workshop, Gaithersburg, Md., USA.

66 Thank You

67 Object Detection: Should we put our results or this ground truth?

68 GRBF Algorithm: 1. Compute the approximate feature map corresponding to the additive kernel

69 GRBF Algorithm: 1. Compute the approximate feature map corresponding to the additive kernel 2. Compute the RBF feature map, using the result of step 1 as the input vector
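The two steps can be sketched end to end for the exp-χ² kernel. The code below is a minimal illustration under several assumptions, not the implementation from the talk: step 1 uses a sampled χ² feature map with spectrum $\operatorname{sech}(\pi\omega)$, step 2 uses random Fourier features with Gaussian directions of standard deviation $1/\sigma$, and the names `n`, `period`, `num_projections`, and `seed` are hypothetical parameters.

```python
import math
import random

def chi2_feature_map(x, n=12, period=0.25):
    """Step 1: sampled explicit map for the additive chi-squared kernel."""
    features = []
    for value in x:
        if value <= 0.0:
            features.extend([0.0] * (2 * n + 1))
            continue
        log_value = math.log(value)
        features.append(math.sqrt(period * value))
        for j in range(1, n + 1):
            freq = j * period
            scale = math.sqrt(2.0 * period * value / math.cosh(math.pi * freq))
            features.append(scale * math.cos(freq * log_value))
            features.append(scale * math.sin(freq * log_value))
    return features

def exp_chi2_kernel(x, y, sigma):
    """Exact generalized RBF kernel with the chi-squared distance."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-dist / (2.0 * sigma ** 2))

def make_grbf_feature_map(input_dim, num_projections, sigma, n=12, period=0.25, seed=0):
    """Build the composed map: additive map first, then random Fourier features."""
    rng = random.Random(seed)
    mapped_dim = input_dim * (2 * n + 1)
    projections = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(mapped_dim)]
                   for _ in range(num_projections)]
    scale = 1.0 / math.sqrt(num_projections)

    def feature_map(x):
        psi = chi2_feature_map(x, n, period)      # step 1
        features = []
        for w in projections:                     # step 2: RBF map applied to psi
            angle = sum(wi * pi for wi, pi in zip(w, psi))
            features.append(scale * math.cos(angle))
            features.append(scale * math.sin(angle))
        return features

    return feature_map

# Toy histograms (hypothetical values).
x = [0.2, 0.5, 0.3]
y = [0.6, 0.1, 0.3]
sigma = 0.5
grbf_map = make_grbf_feature_map(input_dim=3, num_projections=2000, sigma=sigma)
fx, fy = grbf_map(x), grbf_map(y)
approx = sum(a * b for a, b in zip(fx, fy))
exact = exp_chi2_kernel(x, y, sigma)
```

A linear classifier trained on the output of `grbf_map` then approximates an exp-χ² kernel classifier, and the number of projections sets the trade-off between accuracy and test time.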

70 Choice of features: PHOG features are used in our experiments – exp-χ² performs better than χ² (using exact kernels; PHOG vs PHOW)

71 Sparsity vs Performance

