
1
Sparse Coding and Its Extensions for Visual Recognition
Kai Yu, Media Analytics Department, NEC Labs America, Cupertino, CA

2
Visual Recognition is HOT in Computer Vision
Benchmarks: Caltech 101, PASCAL VOC, 80 Million Tiny Images, ImageNet

3
The pipeline of machine visual perception
Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference (prediction, recognition)
The feature steps are most critical for accuracy, account for most of the computation, are the most time-consuming part of the development cycle, and are often hand-crafted in practice. Most efforts in machine learning go into the inference step.

4
Computer vision features: SIFT, HoG, Spin image, RIFT, GLOH. (Slide credit: Andrew Ng)

5
Learning everything from data
Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference (prediction, recognition), with machine learning applied to the whole pipeline.

6
BoW + SPM Kernel
Bag-of-visual-words representation (BoW) based on vector quantization (VQ), combined with the spatial pyramid matching (SPM) kernel. Combining multiple features, this method had been the state of the art on Caltech-101, PASCAL, 15 Scene Categories, and more. (Figure credit: Fei-Fei Li, Svetlana Lazebnik)

7
Winning Method in PASCAL VOC
Multiple feature sampling methods + multiple visual descriptors → VQ coding, histogram, SPM → nonlinear SVM

8
Convolutional Neural Networks
Conv. filtering → pooling. The architectures of some successful methods are not so different from CNNs.

9
BoW+SPM: the same architecture
Local gradients (e.g., SIFT, HOG) → VQ coding → average pooling (obtain histogram) → nonlinear SVM
Observations:
1. Nonlinear SVM is not scalable
2. VQ coding may be too coarse
3. Average pooling is not optimal
Why not learn the whole thing?
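The VQ-coding-plus-average-pooling stage of this pipeline can be sketched in a few lines of numpy. This is an illustrative toy (random data in place of real SIFT descriptors and a k-means codebook), not code from the talk:

```python
import numpy as np

def vq_histogram(descriptors, codebook):
    """Hard vector quantization followed by average pooling.

    descriptors: (n, d) local descriptors (e.g. SIFT) from one image
    codebook:    (k, d) visual words, normally learned by k-means
    Returns a k-dim normalized histogram (the BoW representation).
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)  # VQ: index of the nearest codeword
    hist = np.bincount(assign, minlength=len(codebook)).astype(float)
    return hist / hist.sum()    # average pooling -> histogram

# Toy usage with random data (illustrative only).
rng = np.random.default_rng(0)
desc = rng.normal(size=(200, 8))
words = rng.normal(size=(16, 8))
h = vq_histogram(desc, words)
```

In the full pipeline, this histogram would feed the SPM kernel and a nonlinear SVM.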

10
Develop better methods
Better coding → better pooling → scalable linear classifier

11
Sparse Coding
Sparse coding (Olshausen & Field, 1996) was originally developed to explain early visual processing in the brain (edge detection).
Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …].
Coding: for a data vector x, solve the LASSO to find the sparse coefficient vector a.
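The coding step solves a LASSO problem, min_a 0.5‖x − Da‖² + λ‖a‖₁. A minimal numpy sketch using ISTA (iterative shrinkage-thresholding); the dictionary, λ, and sizes here are illustrative assumptions, not values from the talk:

```python
import numpy as np

def ista_lasso(x, D, lam=0.1, n_iter=300):
    """Sparse coding of x over dictionary D by solving the LASSO
        min_a 0.5*||x - D a||^2 + lam*||a||_1
    with ISTA. D: (d, k), columns are the bases Phi_j."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

# Toy usage: a signal that truly is a sparse combination of two bases.
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 50))
D /= np.linalg.norm(D, axis=0)      # unit-norm bases
x = 0.8 * D[:, 0] + 0.5 * D[:, 3]
a = ista_lasso(x, D, lam=0.1)
```

The recovered code `a` is sparse, with its large coefficients on the bases that generated `x`.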

12
Sparse Coding Example
Natural images → learned bases (Φ1, …, Φ64): edges.
Test example: x ≈ 0.8 × Φ36 + 0.3 × Φ42 + 0.5 × Φ63, so
[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0] (feature representation)
Compact and easily interpretable. (Slide credit: Andrew Ng)

13
Self-taught Learning [Raina, Lee, Battle, Packer & Ng, ICML 07]
Learn features from unlabeled images, then classify labeled examples: motorcycles vs. not motorcycles. Testing: what is this? (Slide credit: Andrew Ng)

14
Classification Result on Caltech 101 (9K images, 101 classes)
SIFT VQ + nonlinear SVM vs. pixel sparse coding + linear SVM (50%)

15
Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR09]
Local gradients (e.g., SIFT, HOG) → sparse coding → max pooling → scalable linear classifier
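The max-pooling stage over sparse codes, combined with a spatial pyramid, can be sketched as follows. This is a rough illustration of the ScSPM-style feature under assumed toy sizes and grid levels, not the authors' implementation:

```python
import numpy as np

def spm_max_pool(codes, xy, levels=(1, 2)):
    """Spatial-pyramid max pooling over sparse codes.

    codes: (n, k) sparse codes of n local descriptors
    xy:    (n, 2) descriptor positions, normalized to [0, 1)
    For each pyramid level g, splits the image into a g x g grid,
    max-pools |codes| within each cell, and concatenates everything."""
    k = codes.shape[1]
    feats = []
    for g in levels:
        cell = np.clip((xy * g).astype(int), 0, g - 1)  # grid cell per point
        for i in range(g):
            for j in range(g):
                m = (cell[:, 0] == i) & (cell[:, 1] == j)
                pooled = np.abs(codes[m]).max(axis=0) if m.any() else np.zeros(k)
                feats.append(pooled)        # max pooling within the cell
    return np.concatenate(feats)            # length k * sum(g*g for g in levels)

# Toy usage with random sparse-ish codes (illustrative only).
rng = np.random.default_rng(0)
codes = rng.normal(size=(100, 10)) * (rng.random((100, 10)) < 0.2)
xy = rng.random((100, 2))
f = spm_max_pool(codes, xy)
```

The pooled vector `f` then goes straight into a linear classifier, which is what makes the pipeline scalable.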

16
Sparse Coding on SIFT [Yang, Yu, Gong & Huang, CVPR09]
Caltech-101: SIFT sparse coding + linear SVM reaches 73%, vs. SIFT VQ + nonlinear SVM.

17
What have we learned?
Local gradients (e.g., SIFT, HOG) → sparse coding → max pooling → scalable linear classifier
1. Sparse coding is useful (why?)
2. A hierarchical architecture is needed

18
MNIST Experiments
Errors: 4.54%, 3.75%, 2.64%. When SC achieves the best classification accuracy, the learned bases look like digits: each basis has a clear local class association.

19
Distribution of coefficients (SIFT, Caltech101): neighboring bases tend to get nonzero coefficients.

20
Interpretation 1: discover subspaces. Each basis is a direction; sparsity means each datum is a linear combination of only a few bases. Related to topic models.
Interpretation 2: geometry of the data manifold. Each basis is an anchor point; sparsity is induced by locality: each datum is a linear combination of neighboring anchors.

21
A Function Approximation View to Coding
Setting: f(x) is a nonlinear feature extraction function on image patches x.
Coding: a nonlinear mapping x → a; typically a is high-dimensional and sparse.
Nonlinear learning: f(x) = wᵀa, a linear function of the code.
A coding scheme is good if it helps learning f(x).

22
A Function Approximation View to Coding: the general formulation. Minimizing the function approximation error yields an unsupervised learning objective.

23
Local Coordinate Coding (LCC) [Yu, Zhang & Gong, NIPS 09; Wang, Yang, Yu, Lv, Huang, CVPR 10]
Dictionary learning: k-means (or hierarchical k-means).
Coding for x, to obtain its sparse representation a:
Step 1, ensure locality: find the K nearest bases.
Step 2, ensure low coding error.
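The two steps above can be sketched in the style of the fast approximate LLC scheme: keep the K nearest bases, then solve a small constrained least-squares problem over them. A numpy sketch with assumed toy sizes and regularization (not the authors' code):

```python
import numpy as np

def llc_code(x, B, K=5, reg=1e-4):
    """Locality-constrained coding of one descriptor.

    x: (d,) descriptor; B: (k, d) dictionary (e.g. k-means anchors).
    Returns a k-dim code with at most K nonzeros that sum to one."""
    d2 = ((B - x) ** 2).sum(axis=1)
    idx = np.argsort(d2)[:K]          # Step 1: locality, K nearest bases
    z = B[idx] - x                    # shift the neighbors to the origin
    G = z @ z.T                       # local covariance
    C = G + reg * np.trace(G) * np.eye(K)   # small ridge for stability
    w = np.linalg.solve(C, np.ones(K))      # Step 2: minimize coding error
    w /= w.sum()                      # enforce the sum-to-one constraint
    code = np.zeros(len(B))
    code[idx] = w
    return code

# Toy usage (random anchors stand in for k-means centers).
rng = np.random.default_rng(0)
B = rng.normal(size=(64, 8))
x = rng.normal(size=8)
c = llc_code(x, B, K=5)
```

The resulting code is sparse by construction: only the K local anchors participate.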

24
Super-Vector Coding (SVC) [Zhou, Yu, Zhang, and Huang, ECCV 10]
Dictionary learning: k-means (or hierarchical k-means).
Coding for x, to obtain its sparse representation a:
Step 1: find the nearest basis of x and obtain its VQ coding, e.g. [0, 0, 1, 0, …] (zero-order part).
Step 2: form the super-vector coding, e.g. [0, 0, 1, 0, …, 0, 0, (x − m3), 0, …] (zero order plus local tangent).
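These two steps amount to stacking, into the slot of the winning center, a scaled VQ indicator and the local tangent x − m. A minimal numpy sketch (scale s, sizes, and data are illustrative assumptions):

```python
import numpy as np

def super_vector(x, centers, s=1.0):
    """Super-vector coding of one descriptor x.

    centers: (k, d) k-means centers. Hard-assigns x to the nearest
    center m_j, then fills that center's (d+1)-slot with the zero-order
    indicator s and the first-order local tangent x - m_j."""
    k, d = centers.shape
    j = ((centers - x) ** 2).sum(axis=1).argmin()  # Step 1: VQ assignment
    sv = np.zeros(k * (d + 1))
    block = j * (d + 1)
    sv[block] = s                                  # zero-order part
    sv[block + 1: block + 1 + d] = x - centers[j]  # local tangent part
    return sv

# Toy usage (random centers stand in for k-means output).
rng = np.random.default_rng(0)
centers = rng.normal(size=(16, 8))
x = rng.normal(size=8)
sv = super_vector(x, centers)
```

The code is k(d+1)-dimensional but has at most d+1 nonzeros, all in one center's block; image-level features come from pooling such codes.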

25
Function Approximation based on LCC [Yu, Zhang & Gong, NIPS 09]
(Figure: data points and bases.) The target function over the data points is approximated as locally linear over the bases.

26
Function Approximation based on SVC [Zhou, Yu, Zhang, and Huang, ECCV 10]
(Figure: data points and cluster centers.) The approximation is piecewise locally linear (first order), using the local tangent at the nearest center.

27
PASCAL VOC Challenge
(Per-class table: ours vs. best of other teams, with differences.) No. 1 in 18 of 20 categories. We used only the HOG feature on gray images.

28
ImageNet Challenge (1.4 million images, 1000 classes, top-5 hit rate)
VQ + intersection kernel: ~40%
Various coding methods + linear SVM: 64%~73%

29
Hierarchical sparse coding [Yu, Lin, & Lafferty, CVPR 11]
Conv. filtering → pooling, learned from unlabeled data.

30
A two-layer sparse coding formulation

31
MNIST results: classification. HSC vs. CNN: HSC provides even better performance than CNN; more surprisingly, HSC learns its features in an unsupervised manner.

32
MNIST results: effect of hierarchical learning. Comparing the Fisher scores of HSC and SC, discriminative power is significantly improved by HSC, even though HSC is unsupervised coding.

33
MNIST results: learned codebook. One dimension in the second layer shows invariance to translation, rotation, and deformation.

34
Caltech101 results: classification. The learned descriptor performs slightly better than SIFT + SC.

35
Caltech101 results: learned codebook. The first-layer bases look very much like edge detectors.

36
Conclusion and Future Work
A function approximation view can be used to derive novel sparse coding methods. Locality is one way to achieve sparsity, and it is really useful. But we need a deeper understanding of feature learning methods.
Interesting directions:
1. Hierarchical coding / deep learning (many papers now!)
2. Faster methods for sparse coding (e.g. from LeCun's group)
3. Learning features from a richer structure of data, e.g. video (learning invariance to out-of-plane rotation)

37
References
Learning Image Representations from Pixel Level via Hierarchical Sparse Coding. Kai Yu, Yuanqing Lin, John Lafferty. CVPR 2011.
Large-scale Image Classification: Fast Feature Extraction and SVM Training. Yuanqing Lin, Fengjun Lv, Liangliang Cao, Shenghuo Zhu, Ming Yang, Timothee Cour, Thomas Huang, Kai Yu. CVPR 2011.
ECCV 2010 Tutorial. Kai Yu, Andrew Ng (with links to some source code).
Deep Coding Networks. Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu. NIPS 2010.
Image Classification using Super-Vector Coding of Local Image Descriptors. Xi Zhou, Kai Yu, Tong Zhang, Thomas Huang. ECCV 2010.
Efficient Highly Over-Complete Sparse Coding using a Mixture Model. Jianchao Yang, Kai Yu, Thomas Huang. ECCV 2010.
Improved Local Coordinate Coding using Local Tangents. Kai Yu, Tong Zhang. ICML 2010.
Supervised Translation-Invariant Sparse Coding. Jianchao Yang, Kai Yu, Thomas Huang. CVPR 2010.
Learning Locality-Constrained Linear Coding for Image Classification. Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang. CVPR 2010.
Nonlinear Learning using Local Coordinate Coding. Kai Yu, Tong Zhang, Yihong Gong. NIPS 2009.
Linear Spatial Pyramid Matching using Sparse Coding for Image Classification. Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang. CVPR 2009.
