Efficient Image Search and Retrieval using Compact Binary Codes

Presentation transcript:

Efficient Image Search and Retrieval using Compact Binary Codes Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.)

Large scale image search. The Internet contains many billions of images; how can we search them based on visual content? The challenge: we need a way of measuring similarity between images, and it needs to scale to the Internet.

Existing approaches to Content-Based Image Retrieval. They focus on scaling rather than on understanding the image, using a variety of simple, hand-designed cues (color and/or texture histograms, shape, PCA, etc.) and various distance metrics, e.g. Earth Mover's Distance (Rubner et al. '98). Most recognition approaches are slow (~1 sec/image).

Our Approach: do both together. Learn the metric from training data AND use compact binary codes for speed.

Large scale image/video search. The representation must fit in memory (disk is too slow). Facebook has ~10 billion images (10^10); a PC has ~10 GB of memory (10^11 bits) → a budget of ~10^1 bits/image. YouTube has ~a trillion video frames (10^12); a big cluster of PCs has ~10 TB (10^14 bits) → a budget of ~10^2 bits/frame.

Binary codes for images. We want images with similar content to have similar binary codes, and use Hamming distance between codes (the number of bit flips): e.g. Ham_Dist(10001010, 10001110) = 1, Ham_Dist(10001010, 11101110) = 3. Cf. Semantic Hashing [Salakhutdinov & Hinton, 2007] for text documents.
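Hamming distance over packed codes is just XOR plus a popcount; a minimal Python sketch, reproducing the slide's two examples:

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit flips between two binary codes stored as integers."""
    return bin(a ^ b).count("1")

# The slide's examples:
assert hamming_distance(0b10001010, 0b10001110) == 1
assert hamming_distance(0b10001010, 0b11101110) == 3
```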

Semantic Hashing [Salakhutdinov & Hinton, 2007], originally for text documents. A semantic hash function maps the query image to a binary code, which is used directly as an address into the space of database images; semantically similar images land at nearby addresses. This is quite different to a (conventional) randomizing hash.

Semantic Hashing. Each image code is a memory address; find neighbors by exploring the Hamming ball around the query address. Lookup time is independent of the number of data points; it depends only on the radius of the ball and the length of the code.
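To make that cost structure concrete, here is a hedged sketch of a Hamming-ball probe; the dict-based table in the usage comment is an assumed layout for illustration, not the authors' implementation:

```python
from itertools import combinations

def hamming_ball(code: int, n_bits: int, radius: int):
    """Yield every address within `radius` bit flips of `code`.
    The count (the sum over r <= radius of C(n_bits, r)) is
    independent of how many images are stored."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p
            yield flipped

# Hypothetical usage with a dict keyed by 30-bit codes:
# neighbors = [img for addr in hamming_ball(query_code, 30, 2)
#              for img in table.get(addr, [])]
```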

Code requirements: similar images → similar codes; very compact (<10^2 bits/image); fast to compute; does NOT have to reconstruct the image. Three approaches: 1. Locality Sensitive Hashing (LSH), 2. Boosting, 3. Restricted Boltzmann Machines (RBMs).

Input image representation: Gist vectors. Pixels are not a convenient representation, so we use the Gist descriptor instead (Oliva & Torralba, IJCV 2001): 512 real-valued dimensions per image (at 32 bits/float, 16,384 bits). L2 distance between Gist vectors is not a bad substitute for human perceptual distance. Note: no color information.

1. Locality Sensitive Hashing (Gionis, Indyk & Motwani, 1999). Take random projections of the Gist descriptor and quantize each projection with a few bits. No learning involved.
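A minimal sketch of this kind of LSH code, assuming one sign bit per Gaussian random projection (a common variant; the slide's "few bits per projection" generalizes this):

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_code(gist: np.ndarray, projections: np.ndarray) -> np.ndarray:
    """One bit per random projection: the sign of the dot product.
    No learning: `projections` is sampled once and reused for all images."""
    return (gist @ projections.T > 0).astype(np.uint8)

n_bits, dim = 32, 512                    # 512-d Gist, 32-bit code (assumed)
projections = rng.standard_normal((n_bits, dim))
code = lsh_code(rng.standard_normal(dim), projections)   # toy input
```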

2. Boosting. A modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003]: learn a threshold and dimension for each bit (a weak classifier). Positive examples are pairs of similar images; negative examples are pairs of unrelated images.
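As a rough illustration of the weak learner, a simplified, unweighted stand-in for BoostSSC's per-bit choice of dimension and threshold; real boosting would also reweight the pairs between rounds:

```python
import numpy as np

def pick_stump(x1, x2, similar, n_thresh=16):
    """Pick the (dimension, threshold) whose single output bit most often
    agrees on similar pairs and disagrees on dissimilar ones.
    x1, x2: (n_pairs, dim) descriptor pairs; similar: (n_pairs,) bool.
    Simplified: ignores the boosting weights on the pairs."""
    best = (0, 0.0, -1.0)                 # (dimension, threshold, score)
    for d in range(x1.shape[1]):
        for t in np.linspace(x1[:, d].min(), x1[:, d].max(), n_thresh):
            agree = (x1[:, d] > t) == (x2[:, d] > t)
            score = np.mean(agree == similar)
            if score > best[2]:
                best = (d, t, score)
    return best
```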

3. Restricted Boltzmann Machine (RBM), the building block of Deep Belief Networks (Hinton & Salakhutdinov, Science 2006). A single RBM layer connects visible units to hidden units via symmetric weights W, and attempts to reconstruct the input at the visible layer from the activations of the hidden layer.
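A minimal contrastive-divergence (CD-1) update for one binary RBM layer, the standard training rule for this model; using the mean-field reconstruction rather than sampled visible states is a common simplification:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.01):
    """One CD-1 update: sample the hidden units, reconstruct the visible
    layer through the symmetric weights W, and nudge W toward the data
    statistics and away from the reconstruction statistics."""
    p_h0 = sigmoid(v0 @ W + b_hid)                       # hidden given data
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sample hiddens
    v1 = sigmoid(h0 @ W.T + b_vis)                       # reconstruction
    p_h1 = sigmoid(v1 @ W + b_hid)                       # hidden given recon
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(v0)
    b_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```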

Multi-Layer RBM: non-linear dimensionality reduction. Input Gist vector (512 dimensions, linear units at the first layer) → Layer 1 (w1): 512 units → Layer 2 (w2): 256 units → Layer 3 (w3): N units → output binary code (N dimensions).
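Once trained, encoding is just a feed-forward pass through the stack; a sketch (the list-of-(W, bias) layout is an assumption):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def encode(gist, weights):
    """Forward pass through the stacked layers (512 -> 512 -> 256 -> N).
    `weights`: list of (W, bias) per layer. Returns the activation
    probabilities of the top units, later thresholded to bits."""
    x = gist
    for W, b in weights:
        x = sigmoid(x @ W + b)
    return x
```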

Training RBM models. 1st phase: pre-training — unsupervised; can use unlabeled data (unlimited quantity); learn parameters greedily, one layer at a time; gets them into the right ballpark. 2nd phase: fine-tuning — supervised; requires labeled data (limited quantity); back-propagate gradients of a chosen error function; moves the parameters to a local minimum.

Greedy pre-training (unsupervised): Layer 1 (w1), 512 → 512 units, trained on the input Gist vector (512 real dimensions).

Greedy pre-training (unsupervised): Layer 2 (w2), 512 → 256 units, trained on the activations of the hidden units from layer 1 (512 binary dimensions).

Greedy pre-training (unsupervised): Layer 3 (w3), 256 → N units, trained on the activations of the hidden units from layer 2 (256 binary dimensions).
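Stringing the three slides together, a hedged sketch of the greedy layer-wise procedure, reusing cd1_step from the RBM sketch above (N=32 is an assumed code length; feeding forward the activation probabilities is a common simplification of sampling binary states):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def pretrain_stack(data, layer_sizes=(512, 256, 32), epochs=10):
    """Train each RBM on the hidden activations of the one below,
    matching the 512 -> 512 -> 256 -> N architecture on the slides."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = 0.01 * rng.standard_normal((x.shape[1], n_hidden))
        b_v, b_h = np.zeros(x.shape[1]), np.zeros(n_hidden)
        for _ in range(epochs):
            W, b_v, b_h = cd1_step(x, W, b_v, b_h)   # see RBM sketch above
        weights.append((W, b_h))
        x = sigmoid(x @ W + b_h)       # activations feed the next layer
    return weights
```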

Fine-tuning: back-propagation of the Neighborhood Components Analysis objective through the whole stack, from the input Gist vector (512 real dimensions) to the output binary code (N dimensions), updating every layer's weights (w1 + Δw1, w2 + Δw2, w3 + Δw3).

Neighborhood Components Analysis (Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004). Tries to preserve the neighborhood structure of the input space; assumes this structure is given (explained later). Toy example with 2 classes and N=2 units at the top of the network: each point's coordinates in output space are the activation probabilities of the two units.

Neighborhood Components Analysis. Adjust the network parameters (weights and biases) to move points of the SAME class closer together and points of DIFFERENT classes further apart. Points close in the input (Gist) space will then be close in the output code space.
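For reference, the NCA objective being back-propagated (Goldberger et al., 2004): each point i softly picks a neighbor j with probability decaying in code-space distance, and training maximizes the probability mass on same-class neighbors:

```latex
p_{ij} = \frac{\exp\!\left(-\lVert f(x_i) - f(x_j) \rVert^2\right)}
              {\sum_{k \neq i} \exp\!\left(-\lVert f(x_i) - f(x_k) \rVert^2\right)},
\qquad
\mathcal{O}_{\mathrm{NCA}} = \sum_i \sum_{j \in C_i} p_{ij}
```

Here f(x) is the network's N-dimensional output for image x, and C_i is the set of points sharing i's class (in this setting, neighbors under the ground-truth distance).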

Simple Binarization Strategy. One option is to deliberately add noise during training to push activations toward binary values; simpler is to set a threshold per unit, e.g. use the median.
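A sketch of the median rule, assuming the real-valued unit activations for the whole database are stacked in a matrix:

```python
import numpy as np

def binarize(activations: np.ndarray):
    """Threshold each output unit at its database-wide median, so every
    bit is on for exactly half the images (balanced bits). Returns the
    binary codes and the thresholds, which are reused at query time.
    activations: (n_images, n_units) real-valued codes."""
    thresholds = np.median(activations, axis=0)
    return (activations > thresholds).astype(np.uint8), thresholds
```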

Overall Query Scheme: query image → compute Gist descriptor (~1 ms, in Matlab) → RBM produces the binary code (<10 μs) → semantic hash lookup → retrieved images.
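Tying the sketches together, a hypothetical end-to-end query; encode, hamming_ball, and the stored medians come from the sketches above, and the dict table is an assumed layout, not the authors' implementation:

```python
def pack(bits) -> int:
    """Pack a 0/1 bit vector into one integer memory address."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | int(b)
    return addr

def query(gist, weights, thresholds, table, n_bits=30, radius=2):
    """End-to-end lookup: Gist -> RBM code -> threshold -> Hamming-ball
    probe of `table` (assumed dict: code -> list of image ids)."""
    bits = (encode(gist, weights) > thresholds).astype(int)
    return [img for addr in hamming_ball(pack(bits), n_bits, radius)
            for img in table.get(addr, [])]
```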

Retrieval Experiments

Test set 1: LabelMe. 22,000 images (20,000 train | 2,000 test), with ground truth segmentations for all; these segmentations let us define a ground truth distance between images.

Defining ground truth. Boosting and NCA back-propagation require a ground truth distance between images; we define this using the labeled images from LabelMe.

Defining ground truth: Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005). Object-label maps (e.g. car, sky, tree, road, building) are compared at varying spatial resolutions to capture approximate spatial correspondence.

Examples of LabelMe retrieval 12 closest neighbors under different distance metrics

LabelMe Retrieval. [Plots: % of the 50 true neighbors in the retrieval set vs. size of retrieval set (0 to 20,000), and % of the 50 true neighbors among the first 500 retrieved vs. number of bits.]

Test set 2: Web images. 12.9 million images collected from the Internet. No labels, so we use Euclidean distance between Gist vectors as the ground truth distance.

Web images retrieval. [Plots: % of the 50 true neighbors in the retrieval set vs. size of retrieval set.]

Examples of Web retrieval 12 neighbors using different distance metrics

Retrieval Timings

Summary. We explored various approaches to learning binary codes for hashing-based retrieval; they are very quick, with performance comparable to complex descriptors. More recent work on binarization: Spectral Hashing (Weiss, Torralba & Fergus, NIPS 2008).