
Class 3: Feature Coding and Pooling Liangliang Cao, Feb 7, 2013 EECS 6890 - Topics in Information Processing Spring 2013, Columbia University


1 Class 3: Feature Coding and Pooling Liangliang Cao, Feb 7, 2013 EECS 6890 - Topics in Information Processing Spring 2013, Columbia University http://rogerioferis.com/VisualRecognitionAndSearch

2 Problem: The Myth of Human Vision. Can you understand the following? 婴儿, बच्चा, 아가, bebé (all meaning "baby"). People may have difficulty understanding text in a foreign language, but NOT images. Photo courtesy of luster.

3 Problem: Can a computer vision system recognize objects or scenes the way humans do?

4 Why Is This Problem Important? Traditional companies, search engines, and mobile apps.

5 Problem: Examples of object recognition datasets: http://www.vision.caltech.edu

6 Problem: The SUN scene dataset: http://groups.csail.mit.edu/vision/SUN/

7 Overview of the Classification Model. Coding and pooling across the four approaches covered today:
- Histogram of SIFT: coding = vector quantization; pooling = histogram aggregation
- Uncertainty-based (soft): coding = soft quantization; pooling = histogram aggregation
- Sparse coding: coding = soft (sparse) quantization; pooling = max pooling
- Fisher vector / supervector: coding = GMM probability estimation; pooling = GMM adaptation
Coding: map local features into a compact representation. Pooling: aggregate these compact representations together.

8 Outline: histogram of local features; bag of words model; soft quantization and sparse coding; Fisher vector and supervector.

9 Histogram of Local Features

10 Bag of Words Models: Recall of Last Class. Powerful local features: DoG; Hessian, Harris; dense sampling. The number of local regions per image is not fixed!

11 Bag of Words Models: Recall of Last Class (2). Histograms provide a fixed-size representation of images. A spatial pyramid or gridding can enhance the histogram representation with spatial information.

12 Bag of Words Models: Histogram of Local Features. Each bin counts one codeword; dim = #codewords.

13 Bag of Words Models: Histogram of Local Features (2). With spatial gridding, dim = #codewords × #grids.
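To make the dimensionality argument concrete, here is a minimal Python sketch of spatial gridding, assuming the local features have already been quantized into codeword indices and that keypoint coordinates are available; the 2×2 grid and all names (spatial_grid_histogram, word_ids, xy) are illustrative, not from the course materials.

```python
import numpy as np

def spatial_grid_histogram(word_ids, xy, image_size, n_codewords, grid=(2, 2)):
    """Concatenate per-cell codeword histograms: dim = n_codewords * #grids."""
    w, h = image_size              # image width and height
    gx, gy = grid                  # number of grid cells along x and y
    # grid cell index of each keypoint
    cx = np.minimum((xy[:, 0] * gx / w).astype(int), gx - 1)
    cy = np.minimum((xy[:, 1] * gy / h).astype(int), gy - 1)
    cell = cy * gx + cx
    hist = np.zeros((gx * gy, n_codewords))
    for c, wid in zip(cell, word_ids):
        hist[c, wid] += 1
    # L1-normalize each cell histogram, then concatenate
    hist /= np.maximum(hist.sum(axis=1, keepdims=True), 1)
    return hist.ravel()            # length = n_codewords * gx * gy
```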

14 Bag of Words Models: Local Feature Quantization. Slide courtesy of Fei-Fei Li.

15 Bag of Words Models: Local Feature Quantization (continued).

16 Bag of Words Models: Local Feature Quantization: vector quantization; dictionary learning.

17 Histogram of Local Features: Dictionary for Codewords.
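As a concrete illustration of vector quantization with a learned dictionary, here is a minimal sketch using scikit-learn's KMeans; the codebook size, function names, and the assumption that descriptors are 128-D SIFT vectors are illustrative, not the course's own implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_dictionary(all_descriptors, n_codewords=1024, seed=0):
    """Learn codewords by k-means over a pool of local descriptors (e.g. SIFT)."""
    km = KMeans(n_clusters=n_codewords, random_state=seed, n_init=4)
    km.fit(all_descriptors)                       # all_descriptors: (M, 128) stacked SIFT
    return km

def bow_histogram(km, descriptors):
    """Hard-quantize one image's descriptors and pool them into a histogram."""
    words = km.predict(descriptors)               # nearest codeword per descriptor
    hist = np.bincount(words, minlength=km.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)            # L1-normalized, fixed-size representation
```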

18 Bag of Words Models. Most slides in this section are courtesy of Fei-Fei Li.

19 Object → Bag of 'words'.

20 Bag of Words Models: Underlying Assumptions (Text). Two example documents and the word lists they reduce to: a passage on visual perception (keywords: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel) and a news story on China's trade surplus (keywords: China, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value).

21 Bag of Words Models: Underlying Assumptions (Image).

22 The bag-of-words pipeline: feature detection & representation → image representation (K-means codebook) → learning: category models (and/or) classifiers → recognition: category decision.

23 Bag of Words Models: Borrowing Techniques from Text Classification (pLSA, Naïve Bayes). Notation: w_n is one patch in an image, encoded as a one-hot codeword indicator w_n = [0,0,…,1,…,0,0]^T; w = [w_1, w_2, …, w_N] is the collection of all N patches in an image; d_j is the j-th image in the collection; c is the category of the image; z is the theme or topic of a patch.

24 Bag of Words Models: graphical models. Probabilistic Latent Semantic Analysis (pLSA): plate diagram d → z → w over N words and D documents (Hofmann, 2001). Latent Dirichlet Allocation (LDA): category c and per-document topic proportions generate z → w over N words and D documents (Blei et al., 2001).

25 Bag of Words Models: pLSA applied to images, discovering an object topic such as "face" (Sivic et al., ICCV 2005).

26 Bag of Words Models: the pLSA decomposition. p(w_i | d_j) = Σ_{k=1}^{K} p(w_i | z_k) p(z_k | d_j): the observed codeword distribution per image is a mixture of codeword distributions per theme (topic), weighted by the theme distribution of that image. Parameters are estimated by EM or Gibbs sampling. Slide credit: Josef Sivic.

27 Bag of Words Models: Recognition using pLSA: z* = argmax_z p(z | d). Slide credit: Josef Sivic.

28 Bag of Words Models: Scene Recognition using LDA (e.g. the "beach" category). Latent Dirichlet Allocation with category c and latent topics z generating words w (Fei-Fei et al., ICCV 2005).
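For a runnable flavor of topic-model recognition on bag-of-words histograms, here is a small sketch using scikit-learn's LatentDirichletAllocation as a stand-in (scikit-learn ships LDA rather than pLSA); classifying by the dominant topic mirrors z* = argmax_z p(z | d). The array names and the choice of 10 topics are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# bow_counts: (n_images, n_codewords) integer count histograms, one row per image
def fit_topics(bow_counts, n_topics=10, seed=0):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    theta = lda.fit_transform(bow_counts)     # per-image topic distributions p(z | d)
    return lda, theta

def dominant_topic(lda, bow_counts):
    theta = lda.transform(bow_counts)
    return np.argmax(theta, axis=1)           # z* = argmax_z p(z | d)
```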

29 Bag of Words Models: Spatially Coherent Latent Topic Model (Cao and Fei-Fei, ICCV 2007).

30 Bag of Words Models: Simultaneous Segmentation and Recognition.

31 Bag of Words Models: Pros and Cons. Images differ from texts! Bag of words models are good at modeling prior knowledge and providing intuitive interpretation. But they suffer from loss of information in the quantization of "visual words" and loss of spatial information, which motivates better coding and better pooling.

32 Soft Quantization and Sparse Coding

33 Soft Quantization: Hard Quantization.

34 Soft Quantization: Uncertainty-Based Quantization models the uncertainty across multiple codewords (van Gemert et al., Visual Word Ambiguity, PAMI 2009).

35 Soft Quantization: Intuition of UNC. Hard quantization vs. soft quantization based on uncertainty (van Gemert et al., Visual Word Ambiguity, PAMI 2009).
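In the spirit of this kernel-codebook idea, here is a minimal sketch of uncertainty-based soft assignment: each descriptor votes for every codeword with a Gaussian weight on its distance, instead of voting only for the nearest one. The bandwidth sigma and the summation pooling are illustrative choices, not necessarily the exact setting of van Gemert et al.

```python
import numpy as np
from scipy.spatial.distance import cdist

def soft_quantize(descriptors, codebook, sigma=100.0):
    """Uncertainty-based soft assignment: weights exp(-d^2 / (2 sigma^2)) per codeword."""
    d2 = cdist(descriptors, codebook, metric='sqeuclidean')   # (N, K) squared distances
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= np.maximum(w.sum(axis=1, keepdims=True), 1e-12)      # normalize per descriptor
    hist = w.sum(axis=0)                                      # pool by summation over descriptors
    return hist / hist.sum()
```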

36 Soft Quantization: Improvement of UNC.

37 Soft Quantization: Sparse Coding-Based Quantization. Hard quantization can be viewed as an "extremely sparse representation": min_c ||x − Dc||² s.t. c has exactly one nonzero entry, equal to 1. A more general but hard-to-solve representation only limits the number of nonzeros (an L0 constraint). In practice we consider sparse coding: min_c ||x − Dc||² + λ||c||_1.


39 Soft Quantization: Yang et al. obtain good recognition accuracy by combining sparse coding with a spatial pyramid and dictionary training. More details will be given in the group presentation. (Yang et al., Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009.)
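A minimal sketch in the spirit of ScSPM, assuming a precomputed dictionary (rows are atoms, e.g. from k-means or dictionary learning): each descriptor is sparse-coded with an L1 penalty and the codes are max-pooled over the image. The regularization weight and the use of scikit-learn's SparseCoder are illustrative, not Yang et al.'s released code.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

def sparse_code_max_pool(descriptors, dictionary, lam=0.15):
    """Solve min_c ||x - cD||^2 + lam*||c||_1 per descriptor, then max-pool |c|."""
    coder = SparseCoder(dictionary=dictionary,              # (K, d) codebook atoms
                        transform_algorithm='lasso_lars',
                        transform_alpha=lam)
    codes = coder.transform(descriptors)                    # (N, K) sparse codes
    return np.abs(codes).max(axis=0)                        # (K,) max-pooled image feature
```

Spatial pyramid pooling would apply the same max pooling within each grid cell and concatenate the results.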

40 Fisher Vector and Supervector: one of the most powerful image/video classification techniques. Thanks to Zhen Li and Qiang Chen for their constructive suggestions on this section.

41 Fisher Vector and Supervector: Winning Systems. Winning systems in the classification task, 2009-2012, and in the Large Scale Visual Recognition Challenge, 2010 and 2011.

42 Fisher Vector and Supervector: Literature.
[1] Yan et al., Regression from patch-kernel, CVPR 2008.
[2] Zhou et al., SIFT-Bag kernel for video event analysis, ACM Multimedia 2008.
[3] Zhou et al., Hierarchical Gaussianization for image classification, ICCV 2009.
[4] Zhou et al., Image classification using super-vector coding of local image descriptors, ECCV 2010.
[5] Perronnin et al., Improving the Fisher kernel for large-scale image classification, ECCV 2010.
[6] Jégou et al., Aggregating local image descriptors into compact codes, PAMI 2011.
These papers are not very easy to read, so let me take a simplified perspective via the coding & pooling framework.

43 Fisher Vector and Supervector: Coding without Information Loss. Coding with hard assignment; coding with soft assignment; how do we keep all the information?


45 Fisher Vector and Supervector: An Intuitive Illustration (Coding).

46 Fisher Vector and Supervector: An Intuitive Illustration (Coding). Component 1, Component 2, Component 3.

47 Fisher Vector and Supervector: An Intuitive Illustration (Pooling). The descriptors assigned to each component (Component 1, Component 2, Component 3) are aggregated separately.

48 Fisher Vector and Supervector: Implementation of the Supervector. In speech (speaker identification), the supervector refers to the stacked means of an adapted GMM: supervector = [μ_1; μ_2; …; μ_K].

49 Fisher Vector and Supervector: Interpretation of the Supervector. Original distribution → new (adapted) distribution. Picture from Reynolds, Quatieri, and Dunn, Digital Signal Processing, 2001.

50 Fisher Vector and Supervector: Normalization of the Supervector. In practice, normalizing with the covariance matrix often improves performance. Moreover, we can subtract the original mean vectors for ease of normalization. The resulting representation is also called Hierarchical Gaussianization (HG).
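Putting the last three slides together, here is a minimal sketch of a normalized supervector (Hierarchical-Gaussianization-like), assuming a diagonal-covariance GMM fitted on pooled descriptors as the background model; the relevance factor and all names are illustrative assumptions rather than the exact formulation in the cited papers.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_background_gmm(all_descriptors, n_components=64, seed=0):
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='diag', random_state=seed)
    gmm.fit(all_descriptors)
    return gmm

def supervector(gmm, descriptors, relevance=16.0):
    """Stacked, normalized adapted means for one image."""
    post = gmm.predict_proba(descriptors)              # (N, K) posteriors p(k | x_i)
    nk = post.sum(axis=0) + 1e-10                      # soft counts per component
    xbar = post.T @ descriptors / nk[:, None]          # posterior-weighted means (K, d)
    alpha = (nk / (nk + relevance))[:, None]           # MAP-style adaptation weight
    adapted = alpha * xbar + (1.0 - alpha) * gmm.means_
    # subtract the original means and whiten with the diagonal covariances
    sv = (adapted - gmm.means_) / np.sqrt(gmm.covariances_)
    return sv.ravel()                                  # stack components into one vector
```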

51 Fisher Vector and Supervector: Fisher Vector. Let p(X | θ) be the probability density function with parameters θ. [Jaakkola and Haussler, NIPS 98] suggested that a sample X can be described by the gradient of its log-likelihood with respect to the parameters, G_X = ∇_θ log p(X | θ). Now we can define the Fisher kernel K(X, Y) = G_X^T F^{-1} G_Y, where F = E[G_X G_X^T] is called the Fisher information matrix.

52 Fisher Vector and Supervector: Fisher Vector with GMM. Consider the Gaussian mixture model p(x | θ) = Σ_k w_k N(x | μ_k, Σ_k) with diagonal covariances, and take the derivative with respect to the means. Let γ_i(k) = p(k | x_i) be the posterior of component k for descriptor x_i. The Fisher vector stacks, for each component k, G_k = (1 / (N √w_k)) Σ_i γ_i(k) (x_i − μ_k) / σ_k.
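Here is a minimal Python sketch of the Fisher vector with respect to the GMM means, the variant most commonly used in practice (following Perronnin et al.'s formulation), assuming a diagonal-covariance GaussianMixture from scikit-learn; the signed square-root and L2 normalization at the end are the usual post-processing, not part of the basic definition.

```python
import numpy as np

def fisher_vector_means(gmm, descriptors):
    """G_k = (1 / (N sqrt(w_k))) * sum_i gamma_i(k) (x_i - mu_k) / sigma_k, stacked."""
    N = descriptors.shape[0]
    gamma = gmm.predict_proba(descriptors)                    # (N, K) posteriors
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_   # diag covariances: (K, d)
    diff = (descriptors[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]
    G = (gamma[:, :, None] * diff).sum(axis=0)                # (K, d) per-component gradients
    G /= (N * np.sqrt(w)[:, None])
    fv = G.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # signed square-root ("power") norm
    return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalization
```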

53 Fisher Vector and Supervector: Comparison. Supervector vs. Fisher vector.

54 Fisher Vector and Supervector: Comparison. Both use a diagonal covariance matrix and a posterior estimation of the component assignments, and the Fisher vector follows from the same derivation; the two representations are almost the same even though they start from different motivations.

55 Fisher Vector and Supervector: How to Code Your Own. Learn from existing code, e.g. http://lear.inrialpes.fr/src/inria_fisher/ (Linux or Mac), and from public GMM code. Be careful with pitfalls: probabilities can be comparable to the machine's rounding error, so compute log P instead of P; try different normalization strategies; and try to make the code efficient.
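The first pitfall (probabilities comparable to the machine's rounding error) is usually handled with the log-sum-exp trick; here is a small sketch computing GMM posteriors entirely in the log domain, assuming diagonal covariances. The function names are illustrative.

```python
import numpy as np

def log_gaussian_diag(X, mu, var):
    """Per-component log N(x | mu_k, diag(var_k)); returns an (N, K) array."""
    d = X.shape[1]
    diff2 = (X[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :]
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.log(var).sum(axis=1)[None, :]
                   + diff2.sum(axis=2))

def log_posteriors(X, mu, var, w):
    """Numerically stable p(k | x): work with log P, exponentiate only at the end."""
    log_joint = np.log(w)[None, :] + log_gaussian_diag(X, mu, var)   # log w_k + log N_k(x)
    m = log_joint.max(axis=1, keepdims=True)                         # log-sum-exp trick
    log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
    return np.exp(log_joint - log_norm)
```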

56 Summary: coding and pooling across the four approaches:
- Histogram of SIFT: coding = vector quantization; pooling = histogram aggregation
- Uncertainty-based (soft): coding = soft quantization; pooling = histogram aggregation
- Sparse coding: coding = soft (sparse) quantization; pooling = max pooling
- Fisher vector / supervector: coding = GMM probability estimation; pooling = GMM adaptation


58 Summary. Simplest model: histogram of visual words. Models with intuitive interpretation: pLSA, LDA, S-LTM, … Soft coding: soft quantization and sparse coding. Very good performance: Fisher vector or supervector.

