1
Small Codes and Large Image Databases for Recognition
CVPR 2008
Antonio Torralba (MIT), Rob Fergus (NYU), Yair Weiss (Hebrew University)
2
Outline: Introduction, Methods, Experiment, Conclusion
3
Outline: Introduction
4
Summary
Goal
– Efficient image search: real-time queries on web-sized databases, with little memory, runnable on standard hardware or handheld devices
Approach
– Use machine learning to convert the Gist descriptor to a compact binary code with a few hundred bits per image
5
Gist descriptor
Global image representation: describes the shapes occurring in an image with one descriptor
– Subdivide the image into 4×4 sub-images
– Calculate Gabor responses in each sub-image
– Create histograms of the Gabor responses in each sub-image
Slide by James Hays and Alexei Efros
6
Gist descriptor Slide by James Hays and Alexei Efros
7
Gist descriptor
In this paper
– 8 orientations and 4 frequencies over the 16 sub-images: 4 × 8 × 16 = 512-dimensional vector
– For smaller images (32×32 pixels), 3 frequencies are used: 3 × 8 × 16 = 384 dimensions
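The dimension count above can be checked with a small Gist-style sketch. This is not the authors' implementation: it averages Gabor-like filter energy per grid cell instead of building histograms, and the filter shapes, the centre frequencies, and the function name `gist_like` are assumptions chosen for illustration.

```python
import numpy as np

def gist_like(image, n_scales=4, n_orient=8, grid=4):
    """Gist-style descriptor: mean Gabor-like energy in each grid cell."""
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequency, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequency, cycles per pixel
    radius = np.hypot(fx, fy)
    angle = np.arctan2(fy, fx)
    spectrum = np.fft.fft2(image)

    features = []
    for s in range(n_scales):
        f0 = 0.3 / (2 ** s)           # assumed centre frequency for this scale
        radial = np.exp(-((radius - f0) ** 2) / (2 * (0.35 * f0) ** 2))
        for o in range(n_orient):
            theta = np.pi * o / n_orient
            # wrapped angular distance to the filter orientation
            d = np.angle(np.exp(1j * (angle - theta)))
            angular = np.exp(-(d ** 2) / (2 * (np.pi / n_orient) ** 2))
            response = np.abs(np.fft.ifft2(spectrum * radial * angular))
            # average the response magnitude inside each of the grid x grid cells
            for rows in np.array_split(response, grid, axis=0):
                for cell in np.array_split(rows, grid, axis=1):
                    features.append(cell.mean())
    return np.asarray(features)
```

With the defaults the vector has 4 × 8 × 16 = 512 entries; calling `gist_like(img, n_scales=3)` on a 32×32 image gives 3 × 8 × 16 = 384.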
8
Binary code
Three reasons
– Compression: it is possible to represent images with a very small number of bits and still retain the information needed for recognition
9
Binary code
– Memory: scaling up to web-size databases requires doing the calculations in memory. Fitting hundreds of millions of images into a few GB of memory leaves a budget of very few bytes per image.
– Speed: short binary codes allow very fast querying on standard hardware, using either hash tables or efficient bit-count operations
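Both query styles named above, and the memory arithmetic, can be sketched in a few lines. The helper names are hypothetical, codes are stored as Python integers for clarity, and rounding each code up to whole bytes is an assumption.

```python
def bytes_per_image(n_bits):
    """Storage for one code, rounded up to whole bytes (an assumption)."""
    return (n_bits + 7) // 8

def database_bytes(n_images, n_bits):
    return n_images * bytes_per_image(n_bits)

def hamming(a, b):
    """Hamming distance between two integer codes: XOR, then count set bits."""
    return bin(a ^ b).count("1")

def build_table(codes):
    """Hash table mapping each code to the indices of the images that share it."""
    table = {}
    for idx, code in enumerate(codes):
        table.setdefault(code, []).append(idx)
    return table

def lookup(table, query, n_bits, radius=1):
    """Indices whose code is within Hamming radius 0 or 1 of the query."""
    hits = list(table.get(query, []))
    if radius >= 1:
        for bit in range(n_bits):
            # flip one bit at a time to enumerate all codes at distance 1
            hits.extend(table.get(query ^ (1 << bit), []))
    return hits
```

For example, `database_bytes(12_900_000, 30)` is 51,600,000 bytes, so 12.9 million 30-bit codes fit in roughly 52 MB under this packing.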
10
Locality Sensitive Hashing (LSH)
For points in a high-dimensional Euclidean space
– Finds nearest neighbors in constant time
The code for a point is built from a number of random projections of that point into R^1
– Each projection contributes a few bits
When the number of bits is fixed and small
– LSH can perform quite poorly
In this paper
– N = 30 bits
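A minimal sketch of random-projection hashing follows. It uses one common simplification, keeping only the sign of each projection (one bit per projection), whereas the slide describes each projection contributing a few bits. The function name `lsh_codes` and the mean-centring step are assumptions.

```python
import numpy as np

def lsh_codes(X, n_bits=30, seed=0):
    """Binary codes from the signs of random projections of mean-centred data."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((X.shape[1], n_bits))  # one column per bit
    centred = X - X.mean(axis=0)
    return (centred @ directions > 0).astype(np.uint8)
```

Because the directions are drawn at random rather than learned, nearby points can easily fall on opposite sides of a projection when only 30 bits are available, which is consistent with the slide's remark about poor performance at short code lengths.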
11
Outline: Methods
12
Learning binary codes
– A database of images {x_i}
– A distance function D(i, j)
– A binary feature vector y_i = f(x_i), compared using Hamming distance
– N100(x_i): the 100 nearest neighbors of x_i according to the distance function D(i, j)
– N100(y_i): the 100 descriptors y_j that are closest to y_i in terms of Hamming distance
– We would like N100(x_i) = N100(y_i) for all examples in our training set
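How closely a code satisfies this goal can be measured as the average overlap between the two neighbor sets. The sketch below assumes Euclidean distance for D and breaks ties by index; `neighbourhood_overlap` and `knn_indices` are hypothetical names, and the dense pairwise matrices only suit small samples.

```python
import numpy as np

def knn_indices(dist, k):
    """Indices of the k nearest items per row, excluding the item itself."""
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1, kind="stable")[:, :k]

def neighbourhood_overlap(X, Y, k=100):
    """Mean fraction of N_k(x_i) that is also in N_k(y_i)."""
    d_input = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d_hamming = (Y[:, None, :] != Y[None, :, :]).sum(axis=2)
    nn_x = knn_indices(d_input, k)
    nn_y = knn_indices(d_hamming, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_x, nn_y)]
    return float(np.mean(overlaps))
```

A value of 1.0 means N_k(x_i) = N_k(y_i) for every example, which is the stated target with k = 100.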
13
BoostSSC
Boosting similarity sensitive coding
– Learns a mapping from the original input space into a new space in which distances between images can be computed using a weighted Hamming distance
– Binary feature (M bits): equation shown on the original slide
– Weighted Hamming distance: equation shown on the original slide
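The slide's own equation is not preserved in this transcript. One common form of a weighted Hamming distance sums a per-bit weight over the bits where two codes disagree; the sketch below assumes that form, and the weight vector `alpha` is a hypothetical name.

```python
import numpy as np

def weighted_hamming(y_i, y_j, alpha):
    """Sum of per-bit weights alpha over the bits where y_i and y_j differ."""
    y_i, y_j, alpha = (np.asarray(a) for a in (y_i, y_j, alpha))
    return float(np.sum(alpha * (y_i != y_j)))
```

With all weights equal to 1 this reduces to the ordinary Hamming distance.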
14
BoostSSC
– Positive examples: pairs of images x_i, x_j with j ∈ N(x_i)
– Negative examples: pairs of images that are not neighbors
– Weak learner: a regression stump (equation shown on the original slide)
15
BoostSSC
– Minimize the square loss (equation shown on the original slide)
– K is the number of training pairs
– z_k = 1 if the two images are neighbors, −1 otherwise
In this paper
– M is around 30 bits
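One boosting round can be sketched as a weighted least-squares fit of a regression stump to the pair labels z_k. The stump form assumed here, f = alpha · [both images fall on the same side of a threshold] + beta, is an assumption, since the slide's equations are missing; `fit_stump`, `best_threshold`, and the pair weights `w` are hypothetical names.

```python
import numpy as np

def fit_stump(agree, z, w):
    """Closed-form weighted least-squares fit of f = alpha * agree + beta."""
    agree = np.asarray(agree, dtype=bool)
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    # beta is the weighted mean label of pairs whose bits disagree
    beta = np.sum(w[~agree] * z[~agree]) / np.sum(w[~agree])
    # alpha + beta is the weighted mean label of pairs whose bits agree
    alpha = np.sum(w[agree] * z[agree]) / np.sum(w[agree]) - beta
    loss = np.sum(w * (z - (alpha * agree + beta)) ** 2)
    return alpha, beta, loss

def best_threshold(proj_i, proj_j, z, w, thresholds):
    """Pick the threshold on a 1-D projection that minimises the square loss."""
    best = None
    for t in thresholds:
        agree = (proj_i > t) == (proj_j > t)
        if agree.all() or not agree.any():
            continue  # the stump is undefined when one side is empty
        alpha, beta, loss = fit_stump(agree, z, w)
        if best is None or loss < best[3]:
            best = (t, alpha, beta, loss)
    return best
```

A full boosting loop would repeat this over candidate projections, keep the best stump as one bit of the code, and reweight the training pairs before the next round.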
16
Restricted Boltzmann Machines
Network of binary stochastic units
– Parameters: weights W, biases b
– Hidden units h and visible units v, connected by symmetric weights w
17
Restricted Boltzmann Machines
– A probability can be assigned to a binary vector at the visible units (equation shown on the original slide)
– Convenient conditional distributions (equations shown on the original slide)
– Weights and biases are learned using Contrastive Divergence
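For a standard binary RBM the conditionals factorise into sigmoids, p(h = 1 | v) = σ(vW + b_hid) and p(v = 1 | h) = σ(hWᵀ + b_vis), which is what makes one step of Contrastive Divergence (CD-1) cheap. The sketch below assumes binary visible and hidden units and a plain gradient step; the learning rate and the function name `cd1_step` are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1, rng=None):
    """One CD-1 update on a batch v0 of shape (n, n_visible)."""
    rng = np.random.default_rng(0) if rng is None else rng
    # positive phase: hidden probabilities given the data, then a binary sample
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # negative phase: reconstruct the visibles, then recompute the hiddens
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    n = v0.shape[0]
    # move towards the data statistics and away from the reconstruction statistics
    W = W + lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```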
18
Multi‐Layer RBM architecture
19
Training RBM models
Pre-training
– Unsupervised
– Use Contrastive Divergence to learn weights and biases
– Gets parameters into the right ballpark
Fine-tuning
– Supervised
– No longer stochastic
– Backpropagate error to update parameters
– Moves parameters to a local minimum
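After fine-tuning the network is deterministic, so producing a code is a plain forward pass through the stacked layers. In the sketch below, the 0.5 threshold on the final activations, the sigmoid units, and the function name `encode` are assumptions made for illustration, not details taken from the slides.

```python
import numpy as np

def encode(x, layers):
    """Deterministic forward pass; layers is a list of (W, b) pairs."""
    a = np.asarray(x, dtype=float)
    for W, b in layers:
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))   # sigmoid activation
    return (a > 0.5).astype(np.uint8)            # binarise the final layer
```

For example, with hypothetical layer sizes 512 → 256 → 30, a batch of Gist vectors of shape (n, 512) maps to codes of shape (n, 30).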
20
Outline: Experiment
21
Two test datasets
LabelMe
– 22,000 images
– Ground-truth segmentations for all images
– Distance between images can be defined using these segmentations
Web data [28]
– 12.9 million 32 × 32 color images
– Subset of 80 million images
– No labels, so the L2 distance between Gist vectors is used as ground truth
[28] A. Torralba, R. Fergus, and W. T. Freeman. Tiny images. Technical Report MIT-CSAIL-TR-2007-024, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 2007.
22
LabelMe retrieval
23
Ground-truth semantic similarity: spatial pyramid matching over object labels
24
LabelMe retrieval
25
On 2000 test images, N = 50
26
Web images retrieval
28
Retrieval speed evaluation Using multi-threading (M/T) on a quad-core
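A brute-force Hamming scan split across threads might look like the sketch below. It assumes codes packed into `uint8` bytes and uses a 256-entry lookup table for the bit count. The thread count of 4 mirrors the quad-core setting, but any speed-up depends on how much of the work NumPy performs outside the interpreter lock, so no timing is claimed. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

# number of set bits for every possible byte value
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_scan(db, query):
    """db: (n, n_bytes) uint8 packed codes; query: (n_bytes,) uint8."""
    return POPCOUNT[np.bitwise_xor(db, query)].sum(axis=1)

def threaded_top_k(db, query, k=50, n_threads=4):
    """Scan database chunks in parallel, then return the k closest indices."""
    chunks = np.array_split(db, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(lambda chunk: hamming_scan(chunk, query), chunks))
    dist = np.concatenate(parts)
    order = np.argsort(dist, kind="stable")[:k]
    return order, dist[order]
```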
30
Pixel label On 2000 test images
31
Web images recognition On 2000 test images
32
Outline: Conclusion
33
Possible to build compact codes for retrieval
– Fast and small on a standard PC
– Suitable for use on large databases
– Much room for improvement