
1 From Dictionary of Visual Words to Subspaces: Locality-constrained Affine Subspace Coding (LASC). Peihua Li, Xiaoxiao Lu, Qilong Wang. Presented by Peihua Li, Dalian University of Technology. http://ice.dlut.edu.cn/PeihuaLi/ Email: peihuali@dlut.edu.cn

2 Collaborators: Xiaoxiao Lu (Master's student), Qilong Wang (PhD student), Peihua Li.

3 Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

4 Overview: Bag of visual words (BoW, BoF). J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV, 2003 (cited by 4535). M. Cimpoi, S. Maji, and A. Vedaldi. Deep filter banks for texture recognition and segmentation. CVPR, 2015 (cited by 32).

5 Overview: Bag of visual words (BoW, BoF). J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV, 2003 (cited by 4535). M. Cimpoi, S. Maji, and A. Vedaldi. Deep filter banks for texture recognition and segmentation. CVPR, 2015 (cited by 32). S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR, 2006 (cited by 5464).

6 BoW pipeline: Image → Feature extraction → Feature coding (with a codebook) → Pooling → Classifier.
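
To make the pipeline concrete, here is a minimal NumPy sketch of the classic hard-assignment variant. The random descriptors and codebook are placeholders standing in for dense SIFT features and a k-means vocabulary, and the function names are illustrative, not taken from any of the cited papers.

```python
# A minimal sketch of the BoW pipeline stages shown on the slide, using plain NumPy.
import numpy as np

def vq_histogram(descriptors, codebook):
    """Hard-assign each descriptor to its nearest codeword and pool into a histogram."""
    # descriptors: (N, d), codebook: (K, d)
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)               # feature coding (vector quantization)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)               # sum-pooling + L1 normalization

# Usage: descriptors from one image and a learned codebook (placeholders here)
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 128))            # e.g. dense SIFT, 128-D
codebook = rng.normal(size=(256, 128))               # e.g. 256 visual words from k-means
image_representation = vq_histogram(descriptors, codebook)  # fed to a linear SVM
```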

7 BoW pipeline: Image → Feature extraction → Feature coding (with a codebook) → Pooling → Classifier. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. NIPS, 2012 (cited by 3956).

8 BoW on the shoulders of deep learning: the BoW pipeline with CNN features, i.e. Image → CNN feature extraction → Feature coding (with a codebook) → Pooling → Classifier.

9 BoW pipeline with CNN features: Image → CNN feature extraction → Feature coding (with a codebook) → Pooling → Classifier.

10 Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

11 BoW: from VQ to SC. Jianchao Yang, Kai Yu, Yihong Gong, and Thomas S. Huang. Linear spatial pyramid matching using sparse coding for image classification. CVPR 2009: 1794-1801 (cited by 1917).

12 BoW: from VQ to SC. Jianchao Yang, Kai Yu, Yihong Gong, and Thomas S. Huang. Linear spatial pyramid matching using sparse coding for image classification. CVPR 2009: 1794-1801 (cited by 1613).

13 BoW: from SC to LLC. J. Wang, J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, 2010 (cited by 1454).

14 J. Wang, J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, 2010 (cited by 1766).
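
For reference, a minimal sketch of the approximated LLC coding step described in Wang et al. (CVPR 2010): a small constrained least-squares problem over the k nearest visual words. The neighborhood size and regularization constant below are illustrative choices, not the paper's exact settings.

```python
# Approximated locality-constrained linear coding for a single descriptor.
import numpy as np

def llc_code(x, codebook, knn=5, beta=1e-4):
    """LLC code of descriptor x (d,) over a codebook (K, d); returns a sparse (K,) code."""
    K, d = codebook.shape
    dists = np.linalg.norm(codebook - x, axis=1)
    nn = np.argsort(dists)[:knn]                      # k nearest visual words
    B = codebook[nn] - x                              # shift codewords to the descriptor
    C = B @ B.T                                       # local covariance (knn, knn)
    C += beta * np.trace(C) * np.eye(knn)             # regularization for numerical stability
    w = np.linalg.solve(C, np.ones(knn))
    w /= w.sum()                                      # enforce the sum-to-one constraint
    code = np.zeros(K)
    code[nn] = w                                      # nonzero only on the k neighbors
    return code
```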

15 Motivations of LASC. Downside 1 of LLC: the dictionary. G. D. Canas, T. Poggio, and L. Rosasco. Learning manifolds with k-means and k-flats. NIPS, 2012.

16 Motivations of LASC. Downside 1 of LLC: the dictionary. It designates only representative points, neglecting the local structure around them; it is a crude, piecewise-constant approximation of the manifold. G. D. Canas, T. Poggio, and L. Rosasco. Learning manifolds with k-means and k-flats. NIPS, 2012.

17 Motivations of LASC. Can we address this problem by increasing the number of visual words? The curse of dimensionality: (1) samples are sparsely populated; (2) clusters are far apart. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2009 (cited by 26,377). Y. Huang, Z. Wu, L. Wang, and T. Tan. Feature coding in image classification: A comprehensive study. TPAMI, 36(3):493-506, 2014 (cited by 70).

18 Motivations of LASC. Motivation 1: the dictionary. LLC uses a dictionary of visual words, i.e. representative points, a crude, piecewise-constant approximation. Idea: leverage the local geometric structure immediately surrounding each visual word. G. D. Canas, T. Poggio, and L. Rosasco. Learning manifolds with k-means and k-flats. NIPS, 2012.

19 Motivations of LASC. Motivation 1: the dictionary. LLC: a dictionary of visual words (representative points, a crude, piecewise-constant approximation). LASC: a dictionary of affine subspaces (a piecewise-linear approximation). G. D. Canas, T. Poggio, and L. Rosasco. Learning manifolds with k-means and k-flats. NIPS, 2012.

20 Motivations of LASC. Downside 2 of LLC: no higher-order information. Soft-assignment is 0th-order coding.

21 Motivations of LASC. Higher-order information helps considerably: 1st-order coding (VLAD or SV). Hervé Jégou, Matthijs Douze, Cordelia Schmid, and Patrick Pérez. Aggregating local descriptors into a compact image representation. CVPR, 2010 (cited by 734). X. Zhou, K. Yu, T. Zhang, and T. S. Huang. Image classification using super-vector coding of local image descriptors. In ECCV, 2010 (cited by 346).
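
As a reminder of what 1st-order coding means here, a minimal sketch of VLAD aggregation follows (SV coding is similar in spirit); the signed-square-root and L2 normalization steps reflect common practice and are illustrative.

```python
# VLAD: accumulate first-order residuals of descriptors w.r.t. their nearest visual word.
import numpy as np

def vlad(descriptors, codebook):
    """descriptors: (N, d); codebook: (K, d); returns a (K*d,) image vector."""
    K, d = codebook.shape
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    v = np.zeros((K, d))
    for k in range(K):
        members = descriptors[assignments == k]
        if len(members):
            v[k] = (members - codebook[k]).sum(axis=0)   # residuals to visual word k
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))                  # power (signed square-root) normalization
    return v / max(np.linalg.norm(v), 1e-12)             # L2 normalization
```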

22 Motivations of LASC. Higher-order information helps considerably: 2nd-order coding (FV). Jorge Sánchez, Florent Perronnin, Thomas Mensink, and Jakob J. Verbeek. Image Classification with the Fisher Vector: Theory and Practice. IJCV, 105(3):222-245, 2013 (cited by 329).

23 Motivations of LASC. Higher-order information helps considerably: 2nd-order coding (FV). Fisher kernel → Fisher vector. T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In NIPS, 1998 (cited by 1281).
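
For completeness, the Fisher kernel of Jaakkola and Haussler can be written as follows; the Fisher vector used above is essentially the whitened gradient employed as an explicit feature.

```latex
% Fisher kernel: compare samples through the gradient of the log-likelihood
% of a generative model p(x | theta); F is the Fisher information matrix.
\[
  g_x = \nabla_{\theta} \log p(x \mid \theta), \qquad
  F = \mathbb{E}_{x \sim p(\cdot \mid \theta)}\big[\, g_x g_x^{\top} \big], \qquad
  K(x, y) = g_x^{\top} F^{-1} g_y .
\]
% The Fisher vector is the whitened gradient F^{-1/2} g_x, used as an explicit feature
% so that a linear classifier on it approximates the Fisher kernel machine.
```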

24 Motivations of LASC. Motivation 2. Dictionary of visual words: representative points, a crude, piecewise-constant approximation. Idea: exploit higher-order statistics of the subspaces.

25 Motivation 2. Dictionary of visual words (representative points, a crude, piecewise-constant approximation) vs. dictionary of subspaces (Fisher kernel, a higher-order approximation). G. D. Canas, T. Poggio, and L. Rosasco. Learning manifolds with k-means and k-flats. NIPS, 2012.

26 Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

27 LASC: 1st-order formulation. Find the k nearest subspaces by proximity measures. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

28 LASC: 1st-order formulation. Find the k nearest subspaces by proximity measures. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

29 LASC: 1st-order formulation. Find the k nearest subspaces by proximity measures. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

30 LASC: 1st-order formulation. Find the k nearest subspaces by proximity measures. [Figure: the resulting code vector is zero except in the blocks of the selected subspaces.] P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.
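
A minimal sketch of the first-order coding step as described on these slides, assuming each dictionary item is an affine subspace (cluster mean plus an orthonormal PCA basis) and that the locality weights come from a softmax over the reconstruction-error proximity; the exact weighting scheme in the paper may differ.

```python
# First-order LASC-style coding: project a descriptor onto its k nearest affine
# subspaces and place the weighted projection coordinates in a block-sparse code.
import numpy as np

def lasc_first_order(x, means, bases, knn=2, sigma=1.0):
    """x: (d,); means: (K, d); bases: (K, d, p) with orthonormal columns; returns (K*p,)."""
    K, d, p = bases.shape
    resid = x - means                                    # (K, d)
    proj = np.einsum('kdp,kd->kp', bases, resid)         # subspace coordinates U_k^T (x - m_k)
    recon = np.einsum('kdp,kp->kd', bases, proj)         # back-projection into the ambient space
    recon_err = np.linalg.norm(resid - recon, axis=1)    # proximity d_r (reconstruction error)
    nn = np.argsort(recon_err)[:knn]                     # k nearest subspaces
    w = np.exp(-recon_err[nn] / sigma)
    w /= w.sum()                                         # locality weights (illustrative choice)
    code = np.zeros((K, p))
    code[nn] = w[:, None] * proj[nn]                     # nonzero only in the selected blocks
    return code.ravel()
```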

31 LASC: 2nd-order formulation. We propose to leverage second-order information based on the Fisher information metric (FIM). Assuming that z_i follows a Gaussian distribution, we obtain the Fisher vector associated with z_i and, accordingly, the second-order LASC vector. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

32 LASC: 2nd-order formulation. We assume that z_i follows a Gaussian distribution. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

33 LASC: 2nd-order formulation. We assume that z_i follows a Gaussian distribution and apply the Fisher kernel (FK). [Figure: the second-order code is likewise zero outside the blocks of the selected subspaces.] P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.
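
A hedged sketch of the second-order term for one selected subspace, taking the slide's Gaussian assumption on z_i at face value and using the standard diagonal-Gaussian Fisher-vector gradients with respect to the mean and standard deviation; the normalization constants here are illustrative and may differ from the paper's derivation.

```python
# Second-order block for one selected subspace, built from the subspace coordinates z.
import numpy as np

def lasc_second_order(z, mu, sigma, weight=1.0):
    """z, mu, sigma: (p,) coordinates and diagonal-Gaussian parameters; returns (2p,)."""
    u = (z - mu) / sigma                              # gradient w.r.t. the Gaussian mean
    v = (u * u - 1.0) / np.sqrt(2.0)                  # gradient w.r.t. the standard deviation
    return weight * np.concatenate([u, v])            # weighted second-order block
```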

34 LASC: illustration of 1st- and 2nd-order coding. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding (LASC). In CVPR, 2015.

35 LASC: proximity measures. Measure d_r, defined from the perspective of the reconstruction error. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

36 LASC: proximity measures. Measure d_s, the Euclidean distance; statistically equivalent to soft-assignment under the assumption of an isotropic, identical Gaussian for each cluster. Lingqiao Liu, Lei Wang, and Xinwang Liu. In defense of soft-assignment coding. In ICCV, 2011.

37 LASC: proximity measures. Measure d_p, defined from the statistical perspective. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.
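
A minimal sketch of two of these proximity measures, assuming the same affine-subspace dictionary as before: d_r measures the reconstruction error of x by a subspace, and d_s measures the Euclidean distance to the subspace origin (the visual word). The statistical measure d_p is omitted here since its exact form is not spelled out on the slides.

```python
# Two proximity measures between a descriptor and an affine subspace (mean, basis).
import numpy as np

def d_r(x, mean, basis):
    """Reconstruction error: distance from x to the affine subspace spanned by basis at mean."""
    resid = x - mean
    proj = basis @ (basis.T @ resid)                  # orthogonal projection onto span(basis)
    return np.linalg.norm(resid - proj)

def d_s(x, mean):
    """Euclidean distance to the visual word (the subspace origin)."""
    return np.linalg.norm(x - mean)
```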

38 Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

39 Experiments (SIFT): effect of k. Setup: dense SIFT features, 3-layer SPM, dictionary size 256, linear one-vs-all SVM. Local coding brings benefits. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

40 Experiments (SIFT): effect of subspace dimension. Setup: dense SIFT features, 3-layer SPM, dictionary size 256, linear one-vs-all SVM. Dimensions that are too small cannot describe the subspace structure, while much larger ones bring little additional benefit. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

41 Experiments (SIFT): effect of higher-order coding. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

42 Experiments (SIFT): effect of the proximity measure. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

43 Experiments (SIFT): PASCAL VOC 2007. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015. T. Kobayashi. Dirichlet-based histogram feature transform for image classification. In CVPR, 2014.

Method                    mAP (%)
LLC (25k) [CVPR 2010]     57.6
SV (1k) [ECCV 2010]       58.2
The winners               59.4
FV (256) [IJCV 2013]      61.8
Kobayashi [CVPR 2014]     63.8
LASC (256)                63.2
LASC (512)                63.6

44 Experiments (SIFT): Caltech-256.

Method                   15 train    30 train    45 train    60 train
SC (1k) [CVPR 2009]      27.7 (0.5)  34.0 (0.4)  37.5 (0.6)  40.1 (0.9)
LLC (4k) [CVPR 2010]     34.4 (-)    41.2 (-)    45.3 (-)    47.7 (-)
SV (256) [ECCV 2010]     36.1 (-)    42.4 (-)    46.3 (-)    48.8 (-)
FV (256) [IJCV 2013]     38.5 (0.2)  47.4 (0.1)  52.1 (0.4)  54.8 (0.4)
Kobayashi [CVPR 2014]    41.8 (0.2)  49.8 (0.1)  54.4 (0.3)  57.4 (0.4)
Bo et al. [CVPR 2013]    42.7 (-)    50.7 (-)    54.8 (-)    58.0 (-)
LASC (256)               43.7 (0.4)  52.1 (0.1)  57.2 (0.3)  60.1 (0.3)

45 Experiments (SIFT): MIT-Indoor. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

Method                         Acc. (%)
Quattoni et al. [CVPR 2009]    26.0
SV (1k) [ECCV 2010]            56.2
FV (256) [IJCV 2013]           61.3
Bo et al. [CVPR 2013]          51.2
Kobayashi [CVPR 2014]          63.4
Xie et al. [CVPR 2014]         63.5
LASC (256)                     62.9
LASC (512)                     63.4

46 Experiments (SIFT): SUN397. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

Method                    5 train     10 train    20 train    50 train
Xiao et al. [CVPR 2010]   14.5        20.9        28.1        38.0
LLC (4k) [CVPR 2010]      13.5        18.7        24.5        32.4
SV (128) [ECCV 2010]      16.4        21.9        28.4        36.6
FV (256) [IJCV 2013]      19.2 (0.4)  26.6 (0.4)  34.2 (0.3)  43.3 (0.2)
LASC (256)                19.4 (0.4)  27.3 (0.3)  35.6 (0.1)  45.3 (0.4)

47 Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

48 Comparison with CNN-based methods: image classification. VLAD-CNN (MOP-CNN): Image → CNN (deep feature extraction) → k-means codebook → VLAD coding → Pooling → Classifier. Yunchao Gong, Liwei Wang, Ruiqi Guo, and Svetlana Lazebnik. Multi-scale Orderless Pooling of Deep Convolutional Activation Features. In ECCV, 2014.

49 Comparison with CNN-based methods: image classification. FV-CNN: Image → CNN (deep feature extraction) → GMM → FV coding → Pooling → Classifier. Mircea Cimpoi, Subhransu Maji, and Andrea Vedaldi. Deep convolutional filter banks for texture recognition and segmentation. In CVPR, 2015 (cited by 32).
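
The common thread in these CNN-based encoders is to treat convolutional activations as a bag of local descriptors. A minimal sketch of that step follows, with illustrative shapes that do not correspond to the exact configuration of any cited method.

```python
# Turn a convolutional feature map into local descriptors for BoW-style encoding.
import numpy as np

def conv_map_to_descriptors(feature_map):
    """Flatten an (H, W, C) activation map into H*W local descriptors of dimension C."""
    H, W, C = feature_map.shape
    return feature_map.reshape(H * W, C)

# Usage with illustrative shapes: a 14x14x512 conv map yields 196 descriptors of 512-D,
# which can then be PCA-reduced, encoded with VLAD/FV/LASC, and pooled across scales.
feature_map = np.random.default_rng(0).normal(size=(14, 14, 512))
descriptors = conv_map_to_descriptors(feature_map)    # shape (196, 512)
```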

50 Comparison with CNN-based methods: image classification. Improved FV-CNN (DSP): (1) convolutional-layer features; (2) matrix normalization, no PCA; (3) spatial pyramid; (4) fewer GMM components. Bin-Bin Gao, Xiu-Shen Wei, Jianxin Wu, and Weiyao Lin. Deep Spatial Pyramid: The Devil is Once Again in the Details. CoRR abs/1504.05277, 2015.

51 Comparison with CNN-based methods: image classification. Cross-convolutional-layer Pooling (CLP). Lingqiao Liu, Chunhua Shen, and Anton van den Hengel. The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification. In CVPR, 2015.

52 Comparison with CNN-based methods: image classification. Classification results with VGG-M features.

Method (dim)                    VOC07   CUB200             MIT-67   FMD
FC (VGG-M)                      76.8    46.1               62.5     70.3±1.8
VLAD-CNN (CAFFE)                —       —                  68.9     —
CLP (CAFFE)                     77.8    73.5 (bb)          71.5     —
FV-CNN (VGG-M), 2x512x64        76.4    49.9               74.2     73.5±1.8
LASC-CNN (VGG-M), 2x256x128     80.1    65.4 (72.8 (bb))   76.5     74.7±1.1

53 Comparison with CNN-based methods: image classification. Classification results with VGG-VD features.

Method       dim         VOC07   CUB200   MIT-67   FMD        Caltech-256   SUN-397
FC           4096        81.7    54.6     67.6     77.4±1.8   —             —
FV-CNN       2x512x64    84.9    66.7     81.0     79.8±1.8   79.8±0.1      62.2
DSP          —           89.3    —        78.3     —          85.5±0.1      59.8
LASC-CNN     2x256x128   87.3    79.9     81.7     82.0±1.0   88.3±0.1      64.6

54 Comparison with CNN-based methods: image retrieval.
[1] Florent Perronnin and Diane Larlus. Fisher Vectors Meet Neural Networks: A Hybrid Classification Architecture. In CVPR, 2015.
[2] Mattis Paulin et al. Local convolutional features with unsupervised training for image retrieval. In ICCV, 2015.
[3] Artem Babenko and Victor Lempitsky. Aggregating local deep features for image retrieval. In ICCV, 2015.
[4] A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky. Neural codes for image retrieval. In ECCV, 2014.
[5] Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale Orderless Pooling of Deep Convolutional Activation Features. In ECCV, 2014.
[6] Lingxi Xie, Qi Tian, Richang Hong, and Bo Zhang. Image Classification and Retrieval are ONE. In ACM ICMR (Best Paper Award), 2015.

55 Comparison with CNN-based methods: image retrieval. LASC settings: VGG-VD16 features, 1st-order LASC, feature dimension 256, vocabulary size 128. Comparison of retrieval results.

Holidays (mAP)
LASC-l2          89.6
LASC-root        91.1
LASC-matrix      88.8
FV+NN [1]        84.7
CKN-mix [2]      79.3
SPoC [3]         80.2
Global-CNN [4]   79.3
MOP-CNN [5]      80.2
One [6]          88.7

UKB
LASC-l2          3.75
LASC-root        3.77
LASC-matrix      3.82
FV+NN [1]        3.43
CKN-mix [2]      3.76
SPoC [3]         3.65
Global-CNN [4]   3.56
One [6]          3.87

56 Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

57 LASC is an extension of LLC: a dictionary of affine subspaces, plus second-order information.

58 Conclusion. LASC is an extension of LLC: a dictionary of affine subspaces, plus second-order information. Advantages of LASC: simple and highly competitive.

59 Conclusion. Advantages of LASC: simple and highly competitive. Future work: end-to-end learning within the BoW framework.

60 Thanks Q & A?

