
CS598: Visual Information Retrieval. Lecture IV: Image Representation: Feature Coding and Pooling.




1 CS598: Visual Information Retrieval. Lecture IV: Image Representation: Feature Coding and Pooling

2 Recap of Lecture III: Blob detection; brief review of the Gaussian filter; scale selection; Laplacian of Gaussian (LoG) detector; Difference of Gaussian (DoG) detector; affine co-variant regions; learning local image descriptors (optional reading)

3 Outline: Histogram of local features; bag-of-words model; soft quantization and sparse coding; supervector with Gaussian mixture model

4 Lecture IV: Part I

5 Bag-of-features models

6 Origin 1: Texture recognition. Texture is characterized by the repetition of basic elements, or textons. For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters. Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

7 Origin 1: Texture recognition. Universal texton dictionary; histogram. Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

8 Origin 2: Bag-of-words models. Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)

9 Origin 2: Bag-of-words models. Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983). US Presidential Speeches Tag Cloud, http://chir.ag/phernalia/preztags/


12 Bag-of-features steps: 1. Extract features. 2. Learn the "visual vocabulary". 3. Quantize features using the visual vocabulary. 4. Represent images by frequencies of "visual words".

13 1. Feature extraction: regular grid or interest regions

14 1. Feature extraction: detect patches, normalize each patch, compute descriptor. Slide credit: Josef Sivic

15 … Slide credit: Josef Sivic

16 2. Learning the visual vocabulary. … Slide credit: Josef Sivic

17 2. Learning the visual vocabulary: clustering. … Slide credit: Josef Sivic

18 2. Learning the visual vocabulary: clustering yields the visual vocabulary. … Slide credit: Josef Sivic

19 K-means clustering. We want to minimize the sum of squared Euclidean distances between points $x_i$ and their nearest cluster centers $m_k$: $D = \sum_k \sum_{i \in \text{cluster } k} \|x_i - m_k\|^2$. Algorithm: randomly initialize K cluster centers, then iterate until convergence: assign each data point to the nearest center; recompute each cluster center as the mean of all points assigned to it.
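The two alternating steps above can be sketched directly in NumPy. This is an illustrative toy implementation (the function name, random initialization, and fixed iteration count are my own choices; production code would use k-means++ initialization and a convergence test):

```python
import numpy as np

def kmeans(X, K, iters=50, seed=0):
    """Minimize the sum of squared distances between points and their centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]  # random init from data
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels
```

Each resulting center is one codevector of the visual vocabulary.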

20 Clustering and vector quantization. Clustering is a common method for learning a visual vocabulary or codebook: an unsupervised learning process in which each cluster center produced by k-means becomes a codevector. The codebook can be learned on a separate training set; provided the training set is sufficiently representative, the codebook will be "universal". The codebook is used for quantizing features: a vector quantizer takes a feature vector and maps it to the index of the nearest codevector in a codebook. Codebook = visual vocabulary; codevector = visual word.
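As a concrete sketch, the quantizer and the resulting bag-of-features histogram can be written as follows (function names are my own):

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codevector."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def bof_histogram(features, codebook):
    """Normalized frequency of each visual word in one image."""
    words = quantize(features, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```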

21 Example codebook. … Appearance codebook. Source: B. Leibe

22 Another codebook. Appearance codebook. … Source: B. Leibe

23 Yet another codebook. Fei-Fei et al., 2005

24 Visual vocabularies: issues. How to choose the vocabulary size? Too small: visual words are not representative of all patches. Too large: quantization artifacts, overfitting. Computational efficiency: vocabulary trees (Nister & Stewenius, 2006).

25 Spatial pyramid representation. Extension of a bag of features: a locally orderless representation at several levels of resolution. Level 0. Lazebnik, Schmid & Ponce (CVPR 2006)

26 Spatial pyramid representation. Extension of a bag of features: a locally orderless representation at several levels of resolution. Levels 0 and 1. Lazebnik, Schmid & Ponce (CVPR 2006)

27 Spatial pyramid representation. Extension of a bag of features: a locally orderless representation at several levels of resolution. Levels 0, 1 and 2. Lazebnik, Schmid & Ponce (CVPR 2006)
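The three resolution levels can be sketched as below, assuming each local feature has already been quantized to a visual word and has image coordinates normalized to [0, 1). For simplicity I omit the per-level weighting used in the pyramid match kernel of Lazebnik et al.; this sketch only builds the concatenated per-cell histograms:

```python
import numpy as np

def spatial_pyramid(words, positions, K, levels=3):
    """Concatenate per-cell visual-word histograms at several resolutions.

    words:     (N,) visual-word index of each local feature
    positions: (N, 2) feature coordinates, normalized to [0, 1)
    K:         vocabulary size
    """
    parts = []
    for level in range(levels):
        cells = 2 ** level                      # cells per axis at this level
        cx = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                parts.append(np.bincount(words[in_cell], minlength=K))
    return np.concatenate(parts)
```

With levels 0, 1 and 2 the descriptor has K * (1 + 4 + 16) = 21K dimensions.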

28 Scene category dataset. Multi-class classification results (100 training images per class)

29 Caltech 101 dataset. http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html Multi-class classification results (30 training images per class)

30 Bags of features for action recognition: space-time interest points. Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words", IJCV 2008.

31 Bags of features for action recognition. Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words", IJCV 2008.

32 Image classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

33 Lecture IV: Part II

34 Outline: Histogram of local features; bag-of-words model; soft quantization and sparse coding; supervector with Gaussian mixture model

35 Hard quantization. Slides credit: Cao & Feris

36 Soft quantization based on uncertainty. Quantize each local feature into the multiple codewords closest to it, splitting its weight appropriately. van Gemert et al., Visual Word Ambiguity, PAMI 2009

37 Soft quantization: hard quantization vs. soft quantization
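A minimal sketch of soft assignment in the spirit of the kernel codebook of van Gemert et al.: each feature spreads its unit weight over codewords with a Gaussian kernel of its distance to each codevector. The function name and the default σ (a smoothing parameter that must be tuned in practice) are my own choices:

```python
import numpy as np

def soft_histogram(features, codebook, sigma=1.0):
    """Kernel-codebook histogram: each feature spreads its unit weight
    over all codewords, weighted by a Gaussian of its distance."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)   # each feature contributes total weight 1
    hist = w.sum(axis=0)
    return hist / hist.sum()
```

Hard quantization is recovered in the limit sigma → 0, where all weight collapses onto the nearest codeword.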

38 Some experiments on soft quantization: improvement in classification rate from soft quantization

39 Sparse coding. Hard quantization is an extremely sparse representation. Generalizing from it, we may consider an $\ell_0$-constrained formulation for soft quantization, but it is hard to solve. In practice, we instead solve an $\ell_1$-regularized sparse coding problem for quantization.
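As an illustration of the ℓ1 formulation, here is a minimal ISTA (iterative soft-thresholding) solver for $\min_a \frac{1}{2}\|x - Da\|^2 + \lambda\|a\|_1$. ISTA is one generic way to solve this problem, not necessarily the solver used in the papers cited in these slides:

```python
import numpy as np

def sparse_code(x, D, lam=0.1, iters=200):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 by ISTA.
    D holds the codewords (dictionary atoms) as columns."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a
```

The soft-thresholding step is what produces exact zeros in the code, giving a sparse assignment over the codebook.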

40 Soft quantization and pooling. [Yang et al., 2009]: Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification

41 Outline: Histogram of local features; bag-of-words model; soft quantization and sparse coding; supervector with Gaussian mixture model

42 Quiz: What is the potential shortcoming of a bag-of-features representation based on quantization and pooling?

43 Modeling the feature distribution. The bag-of-features histogram represents the distribution of a set of features, but the quantization step introduces information loss! How about modeling the distribution without quantization? Use a Gaussian mixture model.

44 Quiz

45 The Gaussian distribution. Multivariate Gaussian with mean $\mu$ and covariance $\Sigma$: $\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \exp\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\}$. Define the precision to be the inverse of the covariance, $\Lambda = \Sigma^{-1}$. In one dimension: $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\{-\frac{(x-\mu)^2}{2\sigma^2}\}$. Slides credit: C. M. Bishop

46 Likelihood function. Data set $X = \{x_1, \dots, x_N\}$. Assume the observed data points are generated independently: $p(X \mid \mu, \Sigma) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \Sigma)$. Viewed as a function of the parameters, this is known as the likelihood function. Slides credit: C. M. Bishop

47 Maximum likelihood. Set the parameters by maximizing the likelihood function, or equivalently by maximizing the log likelihood. Slides credit: C. M. Bishop

48 Maximum likelihood solution. Maximizing w.r.t. the mean gives the sample mean, $\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n$. Maximizing w.r.t. the covariance gives the sample covariance, $\Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu_{ML})(x_n - \mu_{ML})^T$. Slides credit: C. M. Bishop

49 Bias of maximum likelihood. Consider the expectations of the maximum likelihood estimates under the Gaussian distribution: $\mathbb{E}[\mu_{ML}] = \mu$, but $\mathbb{E}[\Sigma_{ML}] = \frac{N-1}{N}\Sigma$. The maximum likelihood solution systematically underestimates the covariance. This is an example of over-fitting. Slides credit: C. M. Bishop

50 Intuitive explanation of over-fitting. Slides credit: C. M. Bishop

51 Unbiased variance estimate. Clearly we can remove the bias by using $\tilde{\Sigma} = \frac{1}{N-1}\sum_{n=1}^{N}(x_n - \mu_{ML})(x_n - \mu_{ML})^T$, since this gives $\mathbb{E}[\tilde{\Sigma}] = \Sigma$. For an infinite data set the two expressions are equal. Slides credit: C. M. Bishop
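The ML estimates and the bias-corrected covariance can be checked numerically with a toy sketch (the function name is mine):

```python
import numpy as np

def gaussian_mle(X):
    """ML estimates for a Gaussian fit to rows of X:
    sample mean, biased (ML) covariance, and unbiased covariance."""
    N = len(X)
    mu = X.mean(axis=0)
    Xc = X - mu                          # centered data
    cov_ml = Xc.T @ Xc / N               # biased: divides by N
    cov_unbiased = Xc.T @ Xc / (N - 1)   # Bessel's correction removes the bias
    return mu, cov_ml, cov_unbiased
```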

52 Gaussian mixtures. Linear superposition of Gaussians: $p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)$. Normalization and positivity require $\sum_{k=1}^{K} \pi_k = 1$ and $0 \le \pi_k \le 1$. We can interpret the mixing coefficients as prior probabilities. BCS Summer School, Exeter, 2003. Christopher M. Bishop

53 Example: mixture of 3 Gaussians. Slides credit: C. M. Bishop

54 Contours of probability distribution. Slides credit: C. M. Bishop

55 Surface plot. Slides credit: C. M. Bishop

56 Sampling from the Gaussian mixture. To generate a data point: first pick one of the components with probability $\pi_k$, then draw a sample from that component. Repeat these two steps for each new data point. Slides credit: C. M. Bishop
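The two-step (ancestral) sampling procedure, sketched for a 1-D mixture (function name mine):

```python
import numpy as np

def sample_gmm(pis, mus, sigmas, n, seed=0):
    """Ancestral sampling: pick a component by its mixing coefficient,
    then draw a sample from that (1-D) Gaussian component."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(pis), size=n, p=pis)   # step 1: pick components
    x = rng.normal(np.asarray(mus)[comps], np.asarray(sigmas)[comps])  # step 2
    return x, comps
```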

57 Synthetic data set. Slides credit: C. M. Bishop

58 Fitting the Gaussian mixture. We wish to invert this process: given the data set, find the corresponding parameters (mixing coefficients, means, and covariances). If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster. Problem: the data set is unlabelled. We shall refer to the labels as latent (= hidden) variables. Slides credit: C. M. Bishop

59 Synthetic data set without labels. Slides credit: C. M. Bishop

60 Posterior probabilities. We can think of the mixing coefficients as prior probabilities for the components. For a given value of $x$ we can evaluate the corresponding posterior probabilities, called responsibilities. These are given from Bayes' theorem by $\gamma_k(x) \equiv p(k \mid x) = \frac{\pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \mathcal{N}(x \mid \mu_j, \Sigma_j)}$. Slides credit: C. M. Bishop

61 Posterior probabilities (colour coded). Slides credit: C. M. Bishop

62 Posterior probability map. Slides credit: C. M. Bishop

63 Maximum likelihood for the GMM. The log likelihood function takes the form $\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}$. Note: the sum over components appears inside the log. There is no closed-form solution for maximum likelihood! Slides credit: C. M. Bishop

64 Over-fitting in Gaussian mixture models. Singularities arise in the likelihood function when a component 'collapses' onto a data point: consider the limit of the likelihood as that component's variance goes to zero. The likelihood function also gets larger as we add more components (and hence parameters) to the model, so it is not clear how to choose the number K of components. Slides credit: C. M. Bishop

65 Problems and solutions. How to maximize the log likelihood: solved by the expectation-maximization (EM) algorithm. How to avoid singularities in the likelihood function: solved by a Bayesian treatment. How to choose the number K of components: also solved by a Bayesian treatment. Slides credit: C. M. Bishop

66 EM algorithm: informal derivation. Let us proceed by simply differentiating the log likelihood. Setting the derivative with respect to $\mu_k$ equal to zero gives $\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n$ with $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$, which is simply the weighted mean of the data. Slides credit: C. M. Bishop

67 EM algorithm: informal derivation. Similarly for the covariances, $\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\,(x_n - \mu_k)(x_n - \mu_k)^T$. For the mixing coefficients, use a Lagrange multiplier to give $\pi_k = \frac{N_k}{N}$. Slides credit: C. M. Bishop

68 EM algorithm: informal derivation. The solutions are not closed form since they are coupled. This suggests an iterative scheme for solving them: make initial guesses for the parameters, then alternate between the following two stages: 1. E-step: evaluate the responsibilities. 2. M-step: update the parameters using the ML results. Slides credit: C. M. Bishop
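The alternating scheme can be sketched for a 1-D mixture in NumPy. The quantile-based initialization is my own simplification (the slides only say "make initial guesses"), and the fixed iteration count stands in for a proper convergence check:

```python
import numpy as np

def gmm_em(X, K, iters=100):
    """EM for a 1-D Gaussian mixture: alternate responsibilities (E-step)
    and weighted ML updates (M-step)."""
    N = len(X)
    mu = np.quantile(X, np.linspace(0.1, 0.9, K))  # spread-out initial means
    var = np.full(K, X.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities gamma[n, k] proportional to pi_k * N(x_n | mu_k, var_k)
        g = pi * np.exp(-0.5 * (X[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        g /= g.sum(axis=1, keepdims=True)
        # M-step: weighted means, variances and mixing coefficients
        Nk = g.sum(axis=0)
        mu = (g * X[:, None]).sum(axis=0) / Nk
        var = (g * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / N
    return pi, mu, var
```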

69–74 (Figures only.) Slides credit: C. M. Bishop

75 The supervector representation (1). Given a set of features from a set of images, train a Gaussian mixture model; this is called a Universal Background Model (UBM). Given the UBM and the set of features from a single image, adapt the UBM to the image's feature set by Bayesian EM (check the equations in the paper below). Original distribution → new distribution. Zhou et al., "A Novel Gaussianized Vector Representation for Natural Scene Categorization", ICPR 2008
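A simplified sketch of the adaptation idea, in the spirit of GMM-UBM relevance-MAP adaptation: only the component means are adapted, blending each UBM mean with the image's soft-assigned feature mean via a relevance factor r. This is an illustration, not the exact Bayesian-EM update of Zhou et al. (see their paper for the precise equations); the function name and r value are my own:

```python
import numpy as np

def adapt_means(X, pi, mu, var, r=16.0):
    """MAP (relevance) adaptation of diagonal-covariance UBM means to one
    image's features; the supervector concatenates the adapted means.
    X: (N, D) features; pi: (K,); mu, var: (K, D)."""
    # Responsibilities of each UBM component for each feature (log-domain Gaussians).
    logp = (-0.5 * ((X[:, None, :] - mu) ** 2 / var
                    + np.log(2 * np.pi * var))).sum(axis=2)
    g = pi * np.exp(logp)
    g /= g.sum(axis=1, keepdims=True)
    nk = g.sum(axis=0)                                   # soft counts per component
    xbar = (g[:, :, None] * X[:, None, :]).sum(axis=0) / nk[:, None]
    alpha = nk / (nk + r)                                # data-vs-prior blend weight
    mu_adapted = alpha[:, None] * xbar + (1 - alpha[:, None]) * mu
    return mu_adapted.reshape(-1)                        # the supervector
```

Components that see many features move toward the image data; components with little support stay near the UBM prior, which is what makes supervectors of different images comparable.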

76 The supervector representation (2). Zhou et al., "A Novel Gaussianized Vector Representation for Natural Scene Categorization", ICPR 2008

