Herv´ eJ´ egouMatthijsDouzeCordeliaSchmid INRIA INRIA INRIA

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Information Retrieval in Practice
The A-tree: An Index Structure for High-dimensional Spaces Using Relative Approximation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa.
Aggregating local image descriptors into compact codes
Chapter 5: Introduction to Information Retrieval
Indexing. Efficient Retrieval Documents x terms matrix t 1 t 2... t j... t m nf d 1 w 11 w w 1j... w 1m 1/|d 1 | d 2 w 21 w w 2j... w 2m 1/|d.
Presented by Xinyu Chang
Content-Based Image Retrieval
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Computer vision: models, learning and inference Chapter 13 Image preprocessing and feature extraction.
A NOVEL LOCAL FEATURE DESCRIPTOR FOR IMAGE MATCHING Heng Yang, Qing Wang ICME 2008.
Query Specific Fusion for Image Retrieval
Neurocomputing,Neurocomputing, Haojie Li Jinhui Tang Yi Wang Bin Liu School of Software, Dalian University of Technology School of Computer Science,
CS4670 / 5670: Computer Vision Bag-of-words models Noah Snavely Object
CMU SCS : Multimedia Databases and Data Mining Lecture #16: Text - part III: Vector space model and clustering C. Faloutsos.
Stephan Gammeter, Lukas Bossard, Till Quack, Luc Van Gool.
Empowering visual categorization with the GPU Present by 陳群元 我是強壯 !
Multimedia Indexing and Retrieval Kowshik Shashank Project Advisor: Dr. C.V. Jawahar.
Learning for Text Categorization
CVPR 2008 James Philbin Ondˇrej Chum Michael Isard Josef Sivic
Packing bag-of-features ICCV 2009 Herv´e J´egou Matthijs Douze Cordelia Schmid INRIA.
Bundling Features for Large Scale Partial-Duplicate Web Image Search Zhong Wu ∗, Qifa Ke, Michael Isard, and Jian Sun CVPR 2009.
Large-scale matching CSE P 576 Larry Zitnick
Effective Image Database Search via Dimensionality Reduction Anders Bjorholm Dahl and Henrik Aanæs IEEE Computer Society Conference on Computer Vision.
Object retrieval with large vocabularies and fast spatial matching
Database Management Systems, R. Ramakrishnan1 Computing Relevance, Similarity: The Vector Space Model Chapter 27, Part B Based on Larson and Hearst’s slides.
Compression Word document: 1 page is about 2 to 4kB Raster Image of 1 page at 600 dpi is about 35MB Compression Ratio, CR =, where is the number of bits.
Lecture 28: Bag-of-words models
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman ICCV 2003 Presented by: Indriyati Atmosukarto.
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
Object Class Recognition Using Discriminative Local Features Gyuri Dorko and Cordelia Schmid.
Video Fingerprinting: Features for Duplicate and Similar Video Detection and Query- based Video Retrieval Anindya Sarkar, Pratim Ghosh, Emily Moxley and.
Multimedia Databases Text II. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Text databases Image and video.
Map-Reduce and Parallel Computing for Large-Scale Media Processing Youjie Zhou.
WMES3103 : INFORMATION RETRIEVAL INDEXING AND SEARCHING.
(Fri) Young Ki Baik Computer Vision Lab.
VEHICLE NUMBER PLATE RECOGNITION SYSTEM. Information and constraints Character recognition using moments. Character recognition using OCR. Signature.
CS 766: Computer Vision Computer Sciences Department, University of Wisconsin-Madison Indexing and Retrieval James Hill, Ozcan Ilikhan, Mark Lenz {jshill4,
Indexing Techniques Mei-Chen Yeh.
Utilising software to enhance your research Eamonn Hynes 5 th November, 2012.
Project 2 SIFT Matching by Hierarchical K-means Quantization
Project 4 Image Search based on BoW model with Inverted File System
1 Vector Space Model Rong Jin. 2 Basic Issues in A Retrieval Model How to represent text objects What similarity function should be used? How to refine.
Keypoint-based Recognition Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 03/04/10.
Special Topic on Image Retrieval
Near Duplicate Image Detection: min-Hash and tf-idf weighting
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
Community Architectures for Network Information Systems
CSE 6331 © Leonidas Fegaras Information Retrieval 1 Information Retrieval and Web Search Engines Leonidas Fegaras.
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
A Statistical Approach to Speed Up Ranking/Re-Ranking Hong-Ming Chen Advisor: Professor Shih-Fu Chang.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 A Comparison of SOM Based Document Categorization Systems.
1 Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval Ondrej Chum, James Philbin, Josef Sivic, Michael Isard and.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Video Google: A Text Retrieval Approach to Object Matching in Videos Josef Sivic and Andrew Zisserman.
Term Frequency. Term frequency Two factors: – A term that appears just once in a document is probably not as significant as a term that appears a number.
Beyond Sliding Windows: Object Localization by Efficient Subwindow Search The best paper prize at CVPR 2008.
Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.
CVPR2013 Poster Detecting and Naming Actors in Movies using Generative Appearance Models.
CIS 530 Lecture 2 From frequency to meaning: vector space models of semantics.
CS654: Digital Image Analysis
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
Project 3 SIFT Matching by Binary SIFT
January, 7th 2011Simon Giraudot & Ugo Martin - MoSIG - Machine learning1/1/ Large-Scale Image Retrieval with Compressed Fisher Vectors Presentation of.
Another Example: Circle Detection
Indexing & querying text
Video Google: Text Retrieval Approach to Object Matching in Videos
Mixtures of Gaussians and Advanced Feature Encoding
Design open relay based DNS blacklist system
Video Google: Text Retrieval Approach to Object Matching in Videos
Presentation transcript:

Herv´ eJ´ egouMatthijsDouzeCordeliaSchmid INRIA INRIA INRIA LEAR (Learning and Recognition in Vision)

Introduction Background Structure & Approach Experiments Conclusion

One of the main limitations of image search based on bag-of-feature is the memory usage per image Provide a method which Reduce the memory usage and faster than standard bag-of -feature

1. Extract local image descriptors (features) (Hessian-Affine detector [8] & SIFT descriptor [6]) 2. Learn “visual vocabulary” (Clustering of the descriptor) 3.Quantize features using visual vocabulary (K-means quantizer) 4. Represent images by frequencies of “visual words” histogram of visual word occurrence is weighted using the tf-idf. normalized with the L2 norm Producing a frequency vector f i of length k

tf = ( 3/100 ) 100 vocabularies in a document, ‘a’ 3 times idf = ( log (10,000,000/1,000) ) 1,000 documents have ‘a’, total number of documents 10,000,000 if-idf = ( 0.03 * ) Visual word (feature)image

INRIA Holidays dataset [4] University of Kentucky recognition benchmark [9] Flickr1M & Flickr1M*

Discard the information about the exact number of occurrences of a given visual word in the image Binary BOF components only indicates the presence or absence of a particular visual word in the image A sequential coding using 1 bit per component. The memory usage is, then, ┌ k/8 ┐ byte per image. The memory usage per image would be typically 10kB per image [12] J.SivicandA.Zisserman. Video Google: A text retrieval Approach to object matching in videos.In ICCV,pages1470–1477,2003.

[16]J.ZobelandA.Moffat. Inverted files for text search engines. ACM Computing Surveys,38(2):6,2006. Compared with a standard inverted file, about 4 times more image scan be indexed using the same amount of memory. The amount of memory to be read is proportionally reduced at query time. This may compensate the decoding cost of the decompression algorithm.

1) Producing 2) Indexing 3) Fusing Performed a query image

Sparse projection matrices A = {A1,...,Am} of sizes d × k d = dimension of the output descriptor k = dimension of the initial BOF For each projection vector (a matrix row ), the number of non-zero components is nz = k/d. Typically set nz = 8 for k=1000,resulting d = 125

The other aggregators are defined by shuffling the input BOF vector components using random permutation. For k =12, d=3 the random permutation (11,2,12,8,9,4,10,1,7,5,6,3 Image i, m miniBOFs ωi,j, 1 ≤ j ≤ m fi = BOF frequency vector

Quantization k’ = number of codebook entries of the indexing structure The set of k-means codebooks qj(.), 1<= j <= m, is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset. The miniBOFs is not related to the one associated with the initial SIFT descriptors, hence we may choose k ≠ k’. Typically set k’ = 20000

Binary signature generation b i,j Length of d, refine the localization of the miniBOF within the cell Using the method of [4] The miniBOF is projected using a random rotation matrix R, producing d components Each bit of the vector bi,j is obtained by comparing the value projected by R to the median value of the elements having the same quantized index. The median values for all quantizing cells and all projection directions are learned off-line on our independent dataset.

j th miniBOF associated with image i is represented by the tuple 4 bytes to store the image identifier i ┌ d/8 ┐ byte to store the binary vector b i,j Total memory usage pre image is C i = m*( 4 + ┌ d/8 ┐ )

Multi-probe strategy [7] Retrieving not only the inverted list associated with the quantized index c i,j, but the set of inverted lists associated with the closest t centroids of the quantizer codebook It increases the number of image hits because t times more inverted lists are visited

b i,j The signature associated with the query image q b q,j The signature of the database image I b q = [ b q,1,...,b q,m ] b i = [ b i,1,...,b i,m ] h(x,y) represents the Hamming distance

equal to 0 for images having no observed binary signature equal to d * m/2 if the database image I is the query image itself The query speed improved by a threshold on the Hamming distance, we use τ = d/2

Used the following parameters in all the miniBOF experiments On University of Kentucky object recognition benchmark On Holidays + Flickr1M

On Holiday + Flickr1M

This paper we have introduced a way of packing BOFs : miniBOFs An efficient indexing structure based on Hamming Embedding allows for rapid access and an expected distance criterion for the fusion of the scores. It Reduces Memory usage the quantity of memory scanned (hits) query time