Efficient Image Search and Retrieval using Compact Binary Codes

Presentation on theme: "Efficient Image Search and Retrieval using Compact Binary Codes"— Presentation transcript:

1 Efficient Image Search and Retrieval using Compact Binary Codes
Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.)

2 Large scale image search
The Internet contains many billions of images. How can we search them based on visual content? The challenge: we need a way of measuring similarity between images, and it needs to scale to the Internet.

3 Existing approaches to Content-Based Image Retrieval
Focus on scaling rather than on understanding the image. A variety of simple, hand-designed cues: color and/or texture histograms, shape, PCA, etc., with various distance metrics, e.g. the Earth Mover's Distance (Rubner et al. '98). Most recognition approaches are slow (~1 sec/image).

4 Our Approach
Learn the metric from training data. Use compact binary codes for speed. Do both together.

5 Large scale image/video search
The representation must fit in memory (disk is too slow). Facebook has ~10 billion images (10^10); a PC has ~10 GB of memory (10^11 bits) → a budget of 10^1 bits/image. YouTube has ~a trillion video frames (10^12); a big cluster of PCs has ~10 TB (10^14 bits) → a budget of 10^2 bits/frame.
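The bit budgets above follow from simple order-of-magnitude arithmetic; a quick check in Python (the sizes are the slide's estimates):

```python
# Order-of-magnitude bit budgets, using the slide's estimates.
images = 10**10            # ~10 billion images (Facebook-scale)
pc_bits = 10**11           # ~10 GB of PC memory, in bits
bits_per_image = pc_bits // images
print(bits_per_image)      # -> 10

frames = 10**12            # ~a trillion video frames (YouTube-scale)
cluster_bits = 10**14      # ~10 TB across a cluster of PCs, in bits
bits_per_frame = cluster_bits // frames
print(bits_per_frame)      # -> 100
```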

6 Binary codes for images
Want images with similar content to have similar binary codes. Use the Hamming distance between codes: the number of bit flips needed to turn one code into the other. E.g. Semantic Hashing [Salakhutdinov & Hinton, 2007] for text documents.
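A minimal sketch of the Hamming distance between two codes (the example codes are illustrative, not from the slides):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which codes a and b differ."""
    return bin(a ^ b).count("1")

# Codes differing in one bit are at Hamming distance 1.
print(hamming_distance(0b10110, 0b10111))  # -> 1
print(hamming_distance(0b10110, 0b01110))  # -> 2
```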

7 Semantic Hashing
[Salakhutdinov & Hinton, 2007] for text documents. A semantic hash function maps the query image to a binary code that is used as an address into the database; semantically similar images lie at nearby addresses. Quite different from a (conventional) randomizing hash.

8 Semantic Hashing
Each image code is a memory address. Find neighbors by exploring the Hamming ball around the query address. Lookup time is independent of the number of data points; it depends only on the radius of the ball and the length of the code.
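The Hamming-ball lookup can be sketched as follows; the number of probed addresses depends only on the code length and the chosen radius, never on the dataset size:

```python
from itertools import combinations

def hamming_ball(code, n_bits, radius):
    """All addresses within the given Hamming radius of a code."""
    addresses = []
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            addr = code
            for p in positions:
                addr ^= 1 << p  # flip bit p
            addresses.append(addr)
    return addresses

# With a 4-bit code and radius 1: the code itself plus 4 one-bit flips.
print(len(hamming_ball(0b0000, n_bits=4, radius=1)))  # -> 5
```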

9 Code requirements
Similar images → similar codes. Very compact (<10^2 bits/image). Fast to compute. Does NOT have to reconstruct the image. Three approaches: Locality Sensitive Hashing (LSH), Boosting, Restricted Boltzmann Machines (RBMs).

10 Input image representation: Gist vectors
Pixels are not a convenient representation. Use the Gist descriptor instead (Oliva & Torralba, IJCV 2001): 512 real-valued dimensions per image (16,384 bits). L2 distance between Gist vectors is not a bad substitute for human perceptual distance. No color information is used.

11 1. Locality Sensitive Hashing
Gionis, A., Indyk, P. & Motwani, R. (1999). Take random projections of the Gist descriptor and quantize each projection with a few bits. No learning is involved.
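A minimal sketch of such an LSH encoding, assuming one sign bit per Gaussian random projection (the sizes and the stand-in descriptor are illustrative):

```python
import random

def lsh_code(x, projections):
    """One bit per random projection: the sign of the dot product."""
    return [1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) > 0 else 0
            for w in projections]

random.seed(0)
dim, n_bits = 512, 32  # illustrative sizes: Gist is 512-D, code is 32 bits
projections = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
x = [random.random() for _ in range(dim)]  # stand-in for a Gist descriptor
code = lsh_code(x, projections)
print(len(code))  # -> 32
```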

12 2. Boosting
A modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003]. Positive examples are pairs of similar images; negative examples are pairs of unrelated images. Learn a threshold and a dimension for each bit (a weak classifier).
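A minimal sketch of how such per-bit weak classifiers would encode a descriptor; the (dimension, threshold) pairs here are hypothetical stand-ins, not learned values:

```python
# Hypothetical "learned" parameters: one (dimension, threshold) pair per bit.
params = [(3, 0.2), (7, -0.1), (0, 0.5)]

def encode(x, params):
    """Each bit is a weak classifier: 1 if the chosen dimension exceeds its threshold."""
    return [1 if x[d] > t else 0 for d, t in params]

x = [0.6, 0.0, 0.1, 0.3, 0.2, 0.9, 0.4, -0.2]  # stand-in descriptor
print(encode(x, params))  # -> [1, 0, 1]
```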

13 3. Restricted Boltzmann Machine (RBM)
A type of Deep Belief Network (Hinton & Salakhutdinov, Science 2006). A single RBM layer connects visible units to hidden units with symmetric weights W, and attempts to reconstruct the input at the visible layer from the activation of the hidden layer.
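A minimal sketch of a single RBM layer's up (inference) and down (reconstruction) passes with symmetric weights; the tiny weight matrix is illustrative and biases are omitted for brevity:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rbm_up(v, W):
    """Hidden-unit activation probabilities given the visible units."""
    n_vis, n_hid = len(W), len(W[0])
    return [sigmoid(sum(W[i][j] * v[i] for i in range(n_vis)))
            for j in range(n_hid)]

def rbm_down(h, W):
    """Reconstruct the visible layer from the hidden activations,
    using the same (symmetric) weights."""
    n_vis, n_hid = len(W), len(W[0])
    return [sigmoid(sum(W[i][j] * h[j] for j in range(n_hid)))
            for i in range(n_vis)]

W = [[0.5, -0.2], [-0.3, 0.8], [0.1, 0.1]]  # 3 visible x 2 hidden, illustrative
v = [1.0, 0.0, 1.0]
h = rbm_up(v, W)
v_recon = rbm_down(h, W)
print(len(h), len(v_recon))  # -> 2 3
```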

14 Multi-Layer RBM: non-linear dimensionality reduction
Input: Gist vector (512 dimensions, linear units at the first layer) → Layer 1 (weights w1, 512 units) → Layer 2 (weights w2, 256 units) → Layer 3 (weights w3, N units) → output binary code (N dimensions).

15 Training RBM models
1st phase, pre-training: unsupervised; can use unlabeled data (unlimited quantity); learn parameters greedily, layer by layer; gets them into the right ballpark. 2nd phase, fine-tuning: supervised; requires labeled data (limited quantity); back-propagate gradients of a chosen error function; moves the parameters to a local minimum.
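RBM pre-training of this kind is typically done with contrastive divergence; a minimal CD-1 sketch of one weight update (sizes, the learning rate, and the omission of biases are simplifying assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cd1_update(v, W, lr=0.1):
    """One CD-1 step: raise data correlations, lower reconstruction correlations.
    Biases are omitted to keep the sketch short."""
    n_vis, n_hid = len(W), len(W[0])
    # Up: hidden probabilities from the data.
    h0 = [sigmoid(sum(W[i][j] * v[i] for i in range(n_vis))) for j in range(n_hid)]
    # Down, then up again: reconstruction and its hidden probabilities.
    v1 = [sigmoid(sum(W[i][j] * h0[j] for j in range(n_hid))) for i in range(n_vis)]
    h1 = [sigmoid(sum(W[i][j] * v1[i] for i in range(n_vis))) for j in range(n_hid)]
    # Update: <v h>_data - <v h>_reconstruction.
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v[i] * h0[j] - v1[i] * h1[j])
    return W

W = [[0.1, -0.1], [0.2, 0.0], [-0.1, 0.3]]  # 3 visible x 2 hidden, illustrative
W = cd1_update([1.0, 0.0, 1.0], W)
print(len(W), len(W[0]))  # -> 3 2
```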

16 Greedy pre-training (Unsupervised)
Input: Gist vector (512 real dimensions) → weights w1 → Layer 1 (512 units).

17 Greedy pre-training (Unsupervised)
Activations of hidden units from layer 1 (512 binary dimensions) → weights w2 → Layer 2 (256 units).

18 Greedy pre-training (Unsupervised)
Activations of hidden units from layer 2 (256 binary dimensions) → weights w3 → Layer 3 (N units).

19 Fine-tuning: back-propagation of Neighborhood Components Analysis objective
Input: Gist vector (512 real dimensions) → Layer 1 (512 units, weights w1 + ∆w1) → Layer 2 (256 units, weights w2 + ∆w2) → Layer 3 (N units, weights w3 + ∆w3) → output binary code (N dimensions).

20 Neighborhood Components Analysis
Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004. Tries to preserve the neighborhood structure of the input space; assumes this structure is given (explained later). Toy example with 2 classes and N=2 units at the top of the network: points plotted in output space (each coordinate is the activation probability of a unit).

21 Neighborhood Components Analysis
Adjust the network parameters (weights and biases) to move points of the SAME class closer and points of DIFFERENT classes apart.

22 Neighborhood Components Analysis
Adjust the network parameters (weights and biases) to move points of the SAME class closer and points of DIFFERENT classes apart. Points close in input space (Gist) will then be close in output code space.
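A minimal sketch of the NCA criterion in the output space: each point's soft neighbors are weighted by exp(-distance²), and the objective sums the probability mass placed on same-class neighbors (the toy points and labels are illustrative):

```python
import math

def nca_objective(points, labels):
    """Expected number of correctly classified points under NCA's
    soft-neighbor distribution in the output space."""
    n = len(points)
    total = 0.0
    for i in range(n):
        # Squared Euclidean distances from point i to every point.
        d2 = [sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
              for k in range(n)]
        # Soft-neighbor weights (a point is never its own neighbor).
        w = [0.0 if k == i else math.exp(-d2[k]) for k in range(n)]
        z = sum(w)
        total += sum(w[j] / z for j in range(n)
                     if j != i and labels[j] == labels[i])
    return total

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
labels = [0, 0, 1, 1]
# Well-separated classes put most neighbor mass on same-class points.
print(nca_objective(points, labels) > 2.0)  # -> True
```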

23 Simple Binarization Strategy
Deliberately add noise. Set a threshold per unit, e.g. use the median.
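The median-threshold binarization can be sketched as follows (the toy activations are illustrative); thresholding each unit at its median makes every bit fire for half the images:

```python
def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])

def binarize(rows):
    """Threshold each output unit at its median activation over the dataset."""
    dims = len(rows[0])
    thresholds = [median([r[d] for r in rows]) for d in range(dims)]
    return [[1 if r[d] > thresholds[d] else 0 for d in range(dims)]
            for r in rows]

rows = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.6], [0.3, 0.7]]  # 4 images x 2 units
print(binarize(rows))  # -> [[0, 1], [1, 0], [1, 0], [0, 1]]
```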

24 Overall Query Scheme
Query image → compute Gist descriptor (~1 ms, in Matlab) → RBM (<10 μs) → binary code → semantic hash → retrieved images.

25 Retrieval Experiments

26 Test set 1: LabelMe
22,000 images (20,000 train | 2,000 test), with ground-truth segmentations for all. A ground-truth distance between images can be defined using these segmentations.

27 Defining ground truth
Boosting and NCA back-propagation require a ground-truth distance between images. Define this using labeled images from LabelMe.

28 Defining ground truth Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)

29 Defining ground truth
Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005) over object label maps (building, car, road, sky, tree), varying the spatial resolution to capture approximate spatial correspondence.

30 Examples of LabelMe retrieval
12 closest neighbors under different distance metrics

31 LabelMe Retrieval
[Plot: % of 50 true neighbors in the retrieval set vs. size of the retrieval set]

32 LabelMe Retrieval
[Plots: % of 50 true neighbors in the retrieval set vs. size of the retrieval set; % of 50 true neighbors in the first 500 retrieved vs. number of bits]

33 Test set 2: Web images
12.9 million images collected from the Internet. No labels, so use the Euclidean distance between Gist vectors as the ground-truth distance.

34 Web images retrieval
[Plot: % of 50 true neighbors in the retrieval set vs. size of the retrieval set]

35 Web images retrieval
[Plot: % of 50 true neighbors in the retrieval set vs. size of the retrieval set]

36 Examples of Web retrieval
12 neighbors using different distance metrics

37 Retrieval Timings

38 Summary
Explored various approaches to learning binary codes for hashing-based retrieval. Very quick, with performance comparable to complex descriptors. More recent work on binarization: Spectral Hashing (Weiss, Torralba & Fergus, NIPS 2008).

