
1 Small Codes and Large Image Databases for Recognition, CVPR 2008. Antonio Torralba (MIT), Rob Fergus (NYU), Yair Weiss (Hebrew University)

2 Outline Introduction Methods Experiment Conclusion

3 Outline Introduction Methods Experiment Conclusion

4 Summary Goal – efficient, real-time image search on web-sized databases that requires little memory and runs on standard hardware or handheld devices Approach – use machine learning to convert the Gist descriptor into a compact binary code with a few hundred bits per image

5 Gist descriptor Global image representation Describes the shapes occurring in an image with a single descriptor – Subdivide the image into a 4×4 grid of sub-images – Compute Gabor filter responses in each sub-image – Build histograms of the Gabor responses in each sub-image Slide by James Hays and Alexei Efros

6 Gist descriptor Slide by James Hays and Alexei Efros

7 Gist descriptor In this paper – 8 orientations, 4 frequencies (scales), 16 sub-images: 4×8×16 = 512-dimensional vector – For smaller images (32×32 pixels), 3 frequencies are used: 3×8×16 = 384 dimensions
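Below is a minimal sketch of such a Gist-like descriptor: rectified Gabor responses averaged over a 4×4 grid. The filter parameters (envelope width, carrier frequency) and the pooling are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_bank(size=32, n_orient=8, n_scale=3):
    """Build a small bank of Gabor filters (hypothetical parameters)."""
    ys, xs = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
    filters = []
    for s in range(n_scale):
        sigma = 2.0 * (2 ** s)          # envelope width per scale
        freq = 0.25 / (2 ** s)          # carrier frequency per scale
        for o in range(n_orient):
            theta = np.pi * o / n_orient
            xr = xs * np.cos(theta) + ys * np.sin(theta)
            env = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))
            filters.append(env * np.cos(2 * np.pi * freq * xr))
    return filters

def gist_like(img, grid=4):
    """Average rectified Gabor responses over a grid x grid partition."""
    feats = []
    for f in gabor_bank(n_orient=8, n_scale=3):
        resp = np.abs(fftconvolve(img, f, mode="same"))
        h, w = resp.shape
        for by in range(grid):
            for bx in range(grid):
                block = resp[by * h // grid:(by + 1) * h // grid,
                             bx * w // grid:(bx + 1) * w // grid]
                feats.append(block.mean())
    # 3 scales x 8 orientations x 16 blocks = 384 dims, as for 32x32 images
    return np.asarray(feats)
```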

8 Binary Code Three reasons – compression: it is possible to represent images with a very small number of bits and still retain the information needed for recognition

9 Binary Code – memory: scaling up to web-sized databases requires doing the calculations in memory; fitting hundreds of millions of images into a few GB of memory leaves a budget of very few bytes per image (e.g., 500 million images at 4 bytes each already fill 2 GB) – speed: short binary codes allow very fast querying on standard hardware, using either hash tables or efficient bit-count operations

10 Locality Sensitive Hashing (LSH) For points in a high-dimensional Euclidean space – finds approximate nearest neighbors in constant time A point is encoded by a number of random projections onto R1 – each projection contributes a few bits When the number of bits is fixed and small – LSH can perform quite poorly In this paper – N = 30 bits
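For concreteness, here is a minimal sign-bit LSH sketch in the style of random-hyperplane hashing, where each random projection onto R1 contributes one bit; the paper's LSH variant quantizes each projection into a few bits, so this is a simplification that only illustrates the idea.

```python
import numpy as np

def lsh_codes(X, n_bits=30, seed=0):
    """Sign-bit LSH: one bit per random projection onto R^1."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], n_bits))  # random directions
    return X @ R > 0                               # (n, n_bits) boolean codes

# toy usage with stand-in 512-d Gist descriptors
rng = np.random.default_rng(1)
db = rng.standard_normal((1000, 512))
codes = lsh_codes(db, n_bits=30)
query = lsh_codes(db[:1], n_bits=30)      # same seed -> same projections
dists = (codes != query).sum(axis=1)      # Hamming distance to every item
print(dists.argsort()[:5])                # 5 nearest codes
```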

11 Outline Introduction Methods Experiment Conclusion

12 Learning binary codes Given a database of images {xi}, a distance function D(i, j), and a binary feature vector yi = f(xi) compared with the Hamming distance – N100(xi): the 100 nearest neighbors of xi according to the distance function D(i, j) – N100(yi): the 100 descriptors yj that are closest to yi in terms of Hamming distance – Objective: N100(xi) = N100(yi) for all examples in the training set
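Since querying reduces to Hamming-distance comparisons, here is a small sketch of the neighbor lookup on bit-packed codes using a byte-wise popcount table; the helper names are illustrative.

```python
import numpy as np

# number of set bits for every possible byte value
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_neighbors(query, codes, k=100):
    """Indices of the k codes nearest to `query` in Hamming distance.
    `codes`: (n, n_bytes) uint8 array of bit-packed codes (np.packbits);
    `query`: one packed code of shape (n_bytes,)."""
    diff = np.bitwise_xor(codes, query)   # bytes holding the differing bits
    dists = POPCOUNT[diff].sum(axis=1)    # popcount per code
    return np.argsort(dists)[:k]

# usage: pack 30-bit boolean codes into bytes, then query
bits = np.random.default_rng(0).random((10000, 30)) < 0.5
codes = np.packbits(bits, axis=1)         # (10000, 4) uint8, zero-padded
print(hamming_neighbors(codes[0], codes, k=5))
```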

13 BoostSSC Boosting similarity sensitive coding Learns a mapping from the original input space into a new space in which distances between images can be computed using a weighted Hamming distance – Binary feature (M bits): yi = [h1(xi), ..., hM(xi)] – Weighted Hamming distance: D(xi, xj) = Σm αm |hm(xi) − hm(xj)|

14 BoostSSC Positive examples – pairs of images xi, xj with j ∈ N(xi) Negative examples – pairs of images that are not neighbors Regression stump – each weak learner thresholds a single descriptor dimension, hm(x) = a[x(d) > θ] + b

15 BoostSSC Minimize the square loss over training pairs: Σk=1..K (zk − f(xik, xjk))^2, where f is the boosted pairwise similarity – K is the number of training pairs – zk = 1 if the two images are neighbors, −1 otherwise – In this paper – M is around 30 bits
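The following is a simplified GentleBoost-style sketch of similarity-sensitive coding: each round fits a regression stump on one descriptor dimension, and the learned threshold defines one code bit. The stump search, threshold grid, and reweighting are illustrative simplifications, not the paper's exact BoostSSC procedure.

```python
import numpy as np

def fit_stump(Xi, Xj, z, w, n_thresh=8):
    """Best stump a*g + b by weighted square loss; g = +1 if the pair
    falls on the same side of the threshold, -1 otherwise."""
    best, sw = None, w.sum()
    for d in range(Xi.shape[1]):
        vals = np.concatenate([Xi[:, d], Xj[:, d]])
        for t in np.quantile(vals, np.linspace(0.1, 0.9, n_thresh)):
            g = np.where((Xi[:, d] > t) == (Xj[:, d] > t), 1.0, -1.0)
            gm, zm = (w * g).sum() / sw, (w * z).sum() / sw
            var = (w * (g - gm) ** 2).sum() / sw
            a = 0.0 if var < 1e-12 else ((w * (g - gm) * (z - zm)).sum() / sw) / var
            b = zm - a * gm
            err = (w * (z - (a * g + b)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, d, t, a, b)
    return best[1:]                          # (dim, thresh, a, b)

def boost_ssc(Xi, Xj, z, M=30):
    """Greedily pick M stumps on labeled pairs (z = +1 neighbors, -1 not)."""
    w = np.ones(len(z)) / len(z)
    stumps = []
    for _ in range(M):
        d, t, a, b = fit_stump(Xi, Xj, z, w)
        g = np.where((Xi[:, d] > t) == (Xj[:, d] > t), 1.0, -1.0)
        w *= np.exp(-z * (a * g + b))        # GentleBoost-style reweighting
        w /= w.sum()
        stumps.append((d, t, a))
    return stumps

def encode(X, stumps):
    """One bit per stump; the weights a act as the Hamming weights alpha."""
    return np.stack([X[:, d] > t for d, t, _ in stumps], axis=1)
```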

16 Restricted Boltzmann Machines Network of binary stochastic units – visible units v and hidden units h – connected by symmetric weights W, with biases b

17 Restricted Boltzmann Machines A probability can be assigned to a binary vector at the visible units – P(v) ∝ Σh exp(−E(v, h)), with energy E(v, h) = −b'v − c'h − v'Wh – Convenient conditional distributions – P(hj = 1|v) = σ(cj + Σi wij vi) and P(vi = 1|h) = σ(bi + Σj wij hj), with σ the logistic function – Learn weights and biases using Contrastive Divergence
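As a concrete reference, here is a minimal CD-1 update for a binary RBM with logistic units; the names follow the slide (W weights, b visible biases, c hidden biases), and the learning rate is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One Contrastive Divergence (CD-1) update on a batch of visible vectors v0."""
    # positive phase: P(h = 1 | v0) and a binary sample of h
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # negative phase: one Gibbs step down to the visible units and back up
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # approximate log-likelihood gradient: data statistics minus model statistics
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
```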

18 Multi-Layer RBM architecture

19 Training RBM models Pre-training – unsupervised – use Contrastive Divergence to learn weights and biases – gets the parameters into the right ballpark Fine-tuning – supervised – units are no longer stochastic – backpropagate the error to update the parameters – moves the parameters to a local minimum
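The pre-training stage can be sketched as greedy layer-wise stacking, reusing cd1_step from the sketch above: train one RBM with CD-1, then feed its hidden-unit probabilities to the next layer. Layer sizes, epochs, and learning rate below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def pretrain_stack(X, layer_sizes, epochs=10, lr=0.1, seed=0):
    """Greedy layer-wise pre-training of a stack of binary RBMs with CD-1."""
    rng = np.random.default_rng(seed)
    data, params = X, []
    for n_hid in layer_sizes:
        n_vis = data.shape[1]
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        b, c = np.zeros(n_vis), np.zeros(n_hid)
        for _ in range(epochs):
            cd1_step(data, W, b, c, lr=lr)   # CD-1 update defined earlier
        params.append((W, b, c))
        data = 1.0 / (1.0 + np.exp(-(data @ W + c)))  # hidden probabilities
    return params

# e.g. a hypothetical 512 -> 256 -> 30 stack producing 30-bit codes:
# params = pretrain_stack(gist_features, [512, 256, 30])
```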

20 Outline Introduction Methods Experiment Conclusion

21 Two test datasets LabelMe – 22,000 images – ground-truth segmentations for all – can define distance between images using these segmentations Web data [28] – 12.9 million 32×32 color images – subset of 80 million images – no labels, so use L2 distance between Gist vectors as ground truth [28] A. Torralba, R. Fergus, and W. T. Freeman. Tiny images. Technical Report MIT-CSAIL-TR-2007-024, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 2007.

22 LabelMe retrieval

23 Ground-truth semantic similarity – spatial pyramid matching over object labels

24 LabelMe retrieval

25 On 2000 test images, N = 50

26 Web images retrieval

27

28 Retrieval speed evaluation Using multi-threading (M/T) on a quad-core CPU

29

30 Pixel labeling On 2000 test images

31 Web image recognition On 2000 test images

32 Outline Introduction Methods Experiment Conclusion

33 Conclusion Possible to build compact codes for retrieval – fast and small on a standard PC – suitable for use on large databases – much room for improvement

