From Dictionary of Visual Words to Subspaces: Locality-constrained Affine Subspace Coding (LASC). Peihua Li, Xiaoxiao Lu, Qilong Wang. Presented by Peihua Li.

From Dictionary of Visual Words to Subspaces: Locality-constrained Affine Subspace Coding (LASC) Peihua Li, Xiaoxiao Lu, Qilong Wang Presented by Peihua Li Dalian University of Technology

Collaborators: Xiaoxiao Lu (Master's student), Qilong Wang (PhD student), Peihua Li

Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

Overview: Bag of visual words (BoW, BoF). J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV, 2003 (cited by 4535). M. Cimpoi, S. Maji, and A. Vedaldi. Deep filter banks for texture recognition and segmentation. CVPR, 2015 (cited by 32).

Overview: Bag of visual words (BoW, BoF). J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV, 2003 (cited by 4535). M. Cimpoi, S. Maji, and A. Vedaldi. Deep filter banks for texture recognition and segmentation. CVPR, 2015 (cited by 32). S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR, 2006 (cited by 5464).

BoW pipeline: Image → Feature extraction → Feature coding (with a learned codebook) → Pooling → Classifier
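The pipeline above can be sketched end to end. The toy example below (function names `build_codebook` and `encode_bow` are illustrative, not from the paper) learns a codebook with a small k-means loop and encodes an image by hard vector quantization with sum pooling:

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Toy k-means: learn a codebook of k visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def encode_bow(descriptors, centers):
    """Hard vector quantization + sum pooling -> normalized histogram."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The histogram that comes out is exactly the 0th-order representation that the later slides contrast with 1st- and 2nd-order coding.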

BoW pipeline: Image → Feature extraction → Feature coding (codebook) → Pooling → Classifier. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. NIPS, 2012 (cited by 3956).

BoW pipeline with CNN features: Image → Feature extraction (CNN) → Feature coding (codebook) → Pooling → Classifier. BoW on the shoulders of deep learning.

Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

BoW: from VQ to SC. Jianchao Yang, Kai Yu, Yihong Gong, Thomas S. Huang. Linear spatial pyramid matching using sparse coding for image classification. CVPR, 2009 (cited by 1917).

BoW: from SC to LLC. J. Wang, J. Yang, K. Yu, F. Lv, T. S. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, 2010 (cited by 1454).
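A minimal sketch of the approximated LLC solver from Wang et al.: reconstruct each descriptor from its k nearest codewords with codes summing to one. The regularization weight `beta` is an assumed default here, not a value taken from the paper:

```python
import numpy as np

def llc_encode(x, codebook, knn=5, beta=1e-4):
    """Approximate LLC: local reconstruction over knn nearest codewords."""
    d2 = ((codebook - x) ** 2).sum(1)
    idx = np.argsort(d2)[:knn]
    B = codebook[idx]                      # (knn, dim) local base
    z = B - x                              # shift local base to the origin
    C = z @ z.T                            # local covariance
    C += beta * np.trace(C) * np.eye(knn)  # regularization for stability
    w = np.linalg.solve(C, np.ones(knn))
    w /= w.sum()                           # enforce the sum-to-one constraint
    code = np.zeros(len(codebook))
    code[idx] = w                          # sparse code: nonzero only on knn words
    return code
```

The locality constraint is what makes the code sparse by construction, in contrast to the l1 penalty used by sparse coding.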

Motivations of LASC. Downside 1 of LLC (the dictionary): it only designates representative points, neglecting their local structure; the result is a crude, piecewise constant approximation of the manifold. G. D. Canas, T. Poggio, and L. Rosasco. Learning manifolds with k-means and k-flats. NIPS, 2012.

Motivations of LASC. Can we address this problem by increasing the number of visual words? No: the curse of dimensionality means (1) samples are sparsely populated and (2) clusters are far apart. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer (cited by 26,377). Y. Huang, Z. Wu, L. Wang, and T. Tan. Feature coding in image classification: A comprehensive study. TPAMI, 36(3):493–506, 2014 (cited by 70).

Motivations of LASC. Motivation 1 (dictionary). LLC: a dictionary of visual words, i.e., representative points, giving a crude, piecewise constant approximation. Idea: leverage the local geometric structure immediately surrounding the visual words. G. D. Canas, T. Poggio, and L. Rosasco. Learning manifolds with k-means and k-flats. NIPS, 2012.

Motivations of LASC. Motivation 1 (dictionary). LLC uses a dictionary of visual words: representative points, a crude, piecewise constant approximation. LASC uses a dictionary of affine subspaces: a piecewise linear approximation. G. D. Canas, T. Poggio, and L. Rosasco. Learning manifolds with k-means and k-flats. NIPS, 2012.
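One way to build such a dictionary of affine subspaces, in the spirit of k-means/k-flats, is to cluster the descriptors and fit a low-dimensional PCA basis per cluster. This is a hedged sketch of the idea, not the authors' exact training procedure:

```python
import numpy as np

def learn_affine_subspaces(X, k, p, seed=0):
    """Cluster X with k-means, then fit a p-dim affine subspace
    (cluster mean + top-p PCA basis) to each cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(15):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    subspaces = []
    for j in range(k):
        pts = X[labels == j]
        mu = pts.mean(0)
        # top-p right singular vectors span the local affine subspace
        _, _, Vt = np.linalg.svd(pts - mu, full_matrices=False)
        subspaces.append((mu, Vt[:p].T))   # (mean, d x p orthonormal basis)
    return subspaces
```

Each pair (mean, basis) is a flat patch of the descriptor manifold, which is exactly the piecewise linear approximation contrasted with LLC's point dictionary.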

Motivations of LASC. Downside 2 of LLC: no higher-order information. Soft assignment amounts to 0th-order coding.

Motivations of LASC. Higher-order information improves performance considerably: 1st-order coding (VLAD or SV). Hervé Jégou, Matthijs Douze, Cordelia Schmid, and Patrick Pérez. Aggregating local descriptors into a compact image representation. CVPR, 2010 (cited by 734). X. Zhou, K. Yu, T. Zhang, and T. S. Huang. Image classification using super-vector coding of local image descriptors. In ECCV, 2010 (cited by 346).
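For reference, VLAD's 1st-order aggregation can be sketched as follows: accumulate residuals to the nearest center and l2-normalize the concatenation:

```python
import numpy as np

def vlad_encode(X, centers):
    """VLAD: per-center sum of residuals, flattened and l2-normalized."""
    labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    k, d = centers.shape
    v = np.zeros((k, d))
    for j in range(k):
        if (labels == j).any():
            v[j] = (X[labels == j] - centers[j]).sum(0)
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)
```

Keeping the residual direction, rather than just the assignment count, is what makes this a 1st-order code.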

Motivations of LASC. Higher-order information improves performance considerably: 2nd-order coding (FV). Jorge Sánchez, Florent Perronnin, Thomas Mensink, Jakob J. Verbeek. Image Classification with the Fisher Vector: Theory and Practice. IJCV, 105(3), 2013 (cited by 329).
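For completeness, the standard Fisher vector statistics of Sánchez et al.: with posterior (soft-assignment) weights \(\gamma_i(k)\) for Gaussian component \(k\) with mixture weight \(\pi_k\), mean \(\mu_k\), and diagonal deviation \(\sigma_k\), the 1st- and 2nd-order gradients per dimension are

```latex
\mathcal{G}^{X}_{\mu_k} = \frac{1}{N\sqrt{\pi_k}} \sum_{i=1}^{N} \gamma_i(k)\, \frac{x_i - \mu_k}{\sigma_k},
\qquad
\mathcal{G}^{X}_{\sigma_k} = \frac{1}{N\sqrt{2\pi_k}} \sum_{i=1}^{N} \gamma_i(k) \left[ \frac{(x_i - \mu_k)^2}{\sigma_k^2} - 1 \right].
```

The second gradient is the "2nd-order" statistic the slides refer to: it captures how the sample spread deviates from the model variance.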

Motivations of LASC. Higher-order information improves performance considerably: 2nd-order coding (FV), built on the Fisher kernel and Fisher vector. T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In NIPS, 1998 (cited by 1281).

Motivations of LASC. Motivation 2. Dictionary of visual words: representative points, a crude, piecewise constant approximation. Idea: make use of higher-order statistics of the subspaces.

Motivation 2. Dictionary of visual words (representative points, a crude, piecewise constant approximation) versus dictionary of subspaces (Fisher kernel, a higher-order approximation). G. D. Canas, T. Poggio, and L. Rosasco. Learning manifolds with k-means and k-flats. NIPS, 2012.

Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

LASC 1st-order formulation: find the k nearest subspaces by proximity measures. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.
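The 1st-order encoding step can be sketched as follows. The exponential soft weight is a hypothetical choice for illustration (the paper derives weights from its proximity measures), and `lasc_encode_1st` is an illustrative name:

```python
import numpy as np

def lasc_encode_1st(x, subspaces, knn=2):
    """1st-order LASC sketch: pick the knn nearest affine subspaces by
    reconstruction error and store weighted local coordinates blockwise."""
    errs, coords = [], []
    for mu, U in subspaces:                 # U: d x p orthonormal basis
        c = U.T @ (x - mu)                  # local subspace coordinates
        recon = mu + U @ c
        errs.append(np.linalg.norm(x - recon))
        coords.append(c)
    errs = np.array(errs)
    idx = np.argsort(errs)[:knn]            # locality constraint
    w = np.exp(-errs[idx])                  # hypothetical soft weighting
    w /= w.sum()
    p = subspaces[0][1].shape[1]
    code = np.zeros(len(subspaces) * p)
    for wi, j in zip(w, idx):
        code[j * p:(j + 1) * p] = wi * coords[j]
    return code
```

Only the blocks of the selected subspaces are nonzero, mirroring the sparsity of LLC but with a coordinate vector per block instead of a scalar.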

LASC 2nd-order formulation. We propose to leverage second-order information based on the Fisher information metric (FIM): assuming that z_i follows a Gaussian distribution, we obtain the Fisher vector associated with z_i and, accordingly, the second-order LASC vector. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.
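Under the Gaussian assumption on the within-subspace coordinates, the 2nd-order part has the same shape as the Fisher vector's variance gradient. A sketch, where `sigma` is an assumed per-dimension standard deviation (how it is estimated is not specified here):

```python
import numpy as np

def lasc_2nd_order(c, sigma):
    """Augment 1st-order coordinates c with an FV-style 2nd-order term,
    assuming a diagonal Gaussian with per-dimension std sigma."""
    u = c / sigma                          # standardized coordinates
    second = (u ** 2 - 1.0) / np.sqrt(2.0) # variance-gradient analogue
    return np.concatenate([u, second])
```

When the coordinates match the model spread exactly (c equals sigma elementwise), the 2nd-order half vanishes, just as the FV variance gradient does at the model.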

LASC: illustration of combined 1st- and 2nd-order coding. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding (LASC). In CVPR, 2015.

LASC proximity measures. Measure d_r: defined from the perspective of the reconstruction error. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

LASC proximity measures. Measure d_s: Euclidean distance, statistically equivalent to soft assignment under the assumption of an isotropic, identical Gaussian for each cluster. Lingqiao Liu, Lei Wang, and Xinwang Liu. In defense of soft-assignment coding. In ICCV, 2011.

LASC proximity measures. Measure d_p: defined from the statistical perspective. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.
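The three measures can be sketched side by side. The isotropic-Gaussian form used for `d_p` here is an assumption for illustration, not the paper's exact statistical distance:

```python
import numpy as np

def proximity_measures(x, mu, U, var=1.0):
    """d_s: Euclidean distance to the cluster mean mu;
    d_r: reconstruction error of x by the affine subspace (mu, U);
    d_p: hypothetical Mahalanobis-style distance, assuming isotropic
         Gaussian clusters with variance var."""
    diff = x - mu
    d_s = np.linalg.norm(diff)
    c = U.T @ diff                      # coordinates within the subspace
    d_r = np.linalg.norm(diff - U @ c)  # distance orthogonal to the subspace
    d_p = d_s ** 2 / var
    return d_s, d_r, d_p
```

Note that d_r vanishes for any point lying on the affine subspace, however far it is from the mean, whereas d_s and d_p keep growing; this is the practical difference between the geometric and point-based measures.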

Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

Experiments (SIFT): effect of the number k of nearest subspaces. Setup: dense SIFT features; 3-layer SPM; dictionary size 256; linear one-vs-all SVM. Local coding can bring benefits. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

Experiments (SIFT): effect of subspace dimension. Setup: dense SIFT features; 3-layer SPM; dictionary size 256; linear one-vs-all SVM. Dimensions that are too small are insufficient to describe the structure of the subspace, while much larger ones give little benefit. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

Experiments SIFT — high order P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

Experiments SIFT — Proximity measure P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

Experiments (SIFT): PASCAL VOC 2007. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015. T. Kobayashi. Dirichlet-based histogram feature transform for image classification. In CVPR, 2014.

Method                   mAP (%)
LLC (25k) [CVPR 2010]    57.6
SV (1k) [ECCV 2010]      58.2
The winners              59.4
FV (256) [IJCV 2013]     61.8
Kobayashi [CVPR 2014]    63.8
LASC (256)               63.2
LASC (512)               63.6

Experiments (SIFT): Caltech-256.

Method                   15 train    30 train    45 train    60 train
SC (1k) [CVPR 2009]      27.7 (0.5)  34.0 (0.4)  37.5 (0.6)  40.1 (0.9)
LLC (4k) [CVPR 2010]     34.4 (-)    41.2 (-)    45.3 (-)    47.7 (-)
SV (256) [ECCV 2010]     36.1 (-)    42.4 (-)    46.3 (-)    48.8 (-)
FV (256) [IJCV 2013]     38.5 (0.2)  47.4 (0.1)  52.1 (0.4)  54.8 (0.4)
Kobayashi [CVPR 2014]    41.8 (0.2)  49.8 (0.1)  54.4 (0.3)  57.4 (0.4)
Bo et al. [CVPR 2013]    42.7 (-)    50.7 (-)    54.8 (-)    58.0 (-)
LASC (256)               43.7 (0.4)  52.1 (0.1)  57.2 (0.3)  60.1 (0.3)

Experiments (SIFT): MIT-Indoor. P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

Method                        Acc. (%)
Quattoni et al. [CVPR 2009]   26.0
SV (1k) [ECCV 2010]           56.2
FV (256) [IJCV 2013]          61.3
Bo et al. [CVPR 2013]         51.2
Kobayashi [CVPR 2014]         63.4
Xie et al. [CVPR 2014]        63.5
LASC (256)                    62.9
LASC (512)                    63.4

Experiments (SIFT): SUN397 (values for the first three rows were not legible in the source). P. Li, X. Lu, and Q. Wang. From Dictionary of Visual-Words to Subspaces: Locality-constrained Affine Subspace Coding. In CVPR, 2015.

Method                    5 train     10 train    20 train    50 train
Xiao et al. [CVPR 2010]
LLC (4k) [CVPR 2010]
SV (128) [ECCV 2010]
FV (256) [IJCV 2013]      19.2 (0.4)  26.6 (0.4)  34.2 (0.3)  43.3 (0.2)
LASC (256)                19.4 (0.4)  27.3 (0.3)  35.6 (0.1)  45.3 (0.4)

Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

Comparison with CNN-based methods: image classification. VLAD-CNN (MOP-CNN): Image → deep feature extraction (CNN) → k-means codebook → VLAD coding → Pooling → Classifier. Yunchao Gong, Liwei Wang, Ruiqi Guo, Svetlana Lazebnik. Multi-scale Orderless Pooling of Deep Convolutional Activation Features. In ECCV, 2014.

Comparison with CNN-based methods: image classification. FV-CNN: Image → deep feature extraction (CNN) → GMM → FV coding → Pooling → Classifier. Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi. Deep convolutional filter banks for texture recognition and segmentation. In CVPR, 2015 (cited by 32).

Comparison with CNN-based methods: image classification. Improved FV-CNN (DSP): (1) convolutional-layer features; (2) matrix normalization, no PCA; (3) spatial pyramid; (4) fewer components in the GMM. Bin-Bin Gao, Xiu-Shen Wei, Jianxin Wu, Weiyao Lin. Deep Spatial Pyramid: The Devil is Once Again in the Details. CoRR abs/ (2015).

Comparison with CNN-based methods: image classification. Cross-layer Pooling (CLP). Lingqiao Liu, Chunhua Shen, Anton van den Hengel. The Treasure beneath Convolutional Layers: Cross-convolutional-layer Pooling for Image Classification. In CVPR, 2015.

Comparison with CNN-based methods: image classification. Comparison of classification results (VGG-M); many numeric entries were not legible in the source and are left blank:

Method              dim       VOC07   CUB200   MIT-67        FMD
FC (VGG-M)                                                   ±1.8
VLAD-CNN (CAFFE)              —       —        68.9          —
CLP (CAFFE)                           (bb)     71.5          —
FV-CNN (VGG-M)      2x512x                                   ±1.8
LASC-CNN (VGG-M)    2x256x                     (72.8 (bb))   ±1.1

Comparison with CNN-based methods: image classification. Comparison of classification results (VGG-VD); many numeric entries were not legible in the source and are left blank:

Method       dim      VOC07   CUB200   MIT-67   FMD    Caltech256   Sun-397
FC                                              ±1.8   —            —
FV-CNN       2x512x
DSP                   89.3    —        78.3     —      85.5±
LASC-CNN     2x256x

Comparison with CNN-based methods: image retrieval.
[1] Florent Perronnin, Diane Larlus. Fisher Vectors Meet Neural Networks: A Hybrid Classification Architecture. In CVPR, 2015.
[2] M. Paulin et al. Local convolutional features with unsupervised training for image retrieval. In ICCV, 2015.
[3] A. Babenko and V. Lempitsky. Aggregating local deep features for image retrieval. In ICCV, 2015.
[4] A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky. Neural codes for image retrieval. In ECCV, 2014.
[5] Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale Orderless Pooling of Deep Convolutional Activation Features. In ECCV, 2014.
[6] Lingxi Xie, Qi Tian, Richang Hong, and Bo Zhang. Image Classification and Retrieval are ONE. In ACM ICMR (Best Paper Award), 2015.

Comparison with CNN-based methods: image retrieval. Comparison of retrieval results. LASC settings: VGG-VD16 features; 1st-order LASC; feature dimension 256; vocabulary size 128.

Holidays (mAP)
LASC-l2          89.6
LASC-root        91.1
LASC-matrix      88.8
FV+NN [1]        84.7
CKN-mix [2]      79.3
SPoC [3]         80.2
Global-CNN [4]   79.3
MOP-CNN [5]      80.2
One [6]          88.7

UKB
LASC-l2          3.75
LASC-root        3.77
LASC-matrix      3.82
FV+NN [1]        3.43
CKN-mix [2]      3.76
SPoC [3]         3.65
Global-CNN [4]   3.56
One [6]          3.87

Outline Overview Related work and Motivations Formulation of LASC Experiments Comparison with CNN-based methods Conclusion

Conclusion. LASC as an extension of LLC: a dictionary of affine subspaces plus second-order information. Advantages: simple, yet highly competitive.

Conclusion. LASC advantages: simple, yet highly competitive. Future work: end-to-end learning in the BoW pipeline.

Thanks! Q & A?