Packing bag-of-features (ICCV 2009), Hervé Jégou, Matthijs Douze, Cordelia Schmid, INRIA

Outline: Introduction, Proposed method, Experiments, Conclusion

Introduction

Bag-of-features
– Extracting local image descriptors
– Clustering of the descriptors with a k-means quantizer (visual words)
– The histogram of visual words is weighted using the tf-idf weighting scheme of [12] and subsequently L2-normalized, producing a frequency vector $f_i$ of length k
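
A minimal pure-Python sketch of the steps on this slide, assuming the visual-word centroids and per-word idf weights have already been learned (the toy 2-D descriptors, `centroids`, and `idf` values below are hypothetical stand-ins for SIFT data): each descriptor is assigned to its nearest centroid, the counts are tf-idf weighted, and the vector is L2-normalized.

```python
import math

def nearest_word(desc, centroids):
    """Index of the closest centroid (visual word), by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda w: sum((a - b) ** 2 for a, b in zip(desc, centroids[w])))

def bof_vector(descriptors, centroids, idf):
    """tf-idf weighted, L2-normalized bag-of-features vector of length k."""
    k = len(centroids)
    if not descriptors:
        return [0.0] * k
    counts = [0] * k
    for desc in descriptors:
        counts[nearest_word(desc, centroids)] += 1
    n = len(descriptors)
    # term frequency (count / number of descriptors) times inverse document frequency
    f = [(c / n) * idf[w] for w, c in enumerate(counts)]
    norm = math.sqrt(sum(x * x for x in f))
    return [x / norm for x in f] if norm > 0 else f

# Toy example: 2-D stand-ins for SIFT descriptors, k = 3 visual words.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
idf = [1.0, 2.0, 0.5]  # hypothetical idf values, normally learned on the database
descs = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
f_i = bof_vector(descs, centroids, idf)
print([round(x, 3) for x in f_i])
```

The word counts here are (1, 2, 1); the idf weights then boost word 1 and damp word 2 before normalization.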

TF–IDF weighting

TF–IDF weighting
– tf: a document contains 100 words, of which 'a' occurs 3 times, so tf = 3/100 = 0.03
– idf: 1,000 documents contain 'a', out of 10,000,000 documents in total, so idf = ln(10,000,000 / 1,000) = 9.21
– tf-idf = 0.03 × 9.21 = 0.28
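
The arithmetic on this slide can be checked with a few lines of Python. `tf` and `idf` are illustrative helper names, and the natural logarithm is used, as on the slide.

```python
import math

def tf(term_count, doc_length):
    """Relative frequency of a term inside one document."""
    return term_count / doc_length

def idf(num_docs, docs_with_term):
    """Inverse document frequency with the natural logarithm."""
    return math.log(num_docs / docs_with_term)

t = tf(3, 100)               # 0.03
i = idf(10_000_000, 1_000)   # ln(10^4)
print(round(t, 2), round(i, 2), round(t * i, 2))  # prints 0.03 9.21 0.28
```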

Binary BOF [12]
– Discards the information about the exact number of occurrences of a given visual word in the image.
– Binary BOF vector components only indicate the presence or absence of a particular visual word in the image.
– With a sequential coding using 1 bit per component, i.e., ⌈k/8⌉ bytes per image, the memory usage would typically be 10 kB per image.
[12] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, pages 1470–1477, 2003.
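
A sketch of the 1-bit-per-component coding, assuming components are packed in order, 8 visual words per byte, least significant bit first (an arbitrary choice). `pack_binary_bof` and `unpack_binary_bof` are hypothetical names.

```python
def pack_binary_bof(counts):
    """Keep only presence/absence of each visual word, 1 bit per component."""
    k = len(counts)
    packed = bytearray((k + 7) // 8)   # ceil(k / 8) bytes
    for w, c in enumerate(counts):
        if c > 0:
            packed[w // 8] |= 1 << (w % 8)
    return bytes(packed)

def unpack_binary_bof(packed, k):
    """Recover the 0/1 vector of length k."""
    return [(packed[w // 8] >> (w % 8)) & 1 for w in range(k)]

counts = [0, 3, 0, 0, 1, 0, 0, 0, 2, 0]   # k = 10 visual-word counts
p = pack_binary_bof(counts)
print(len(p), unpack_binary_bof(p, len(counts)))
```

The counts 3, 1 and 2 all collapse to a single set bit, which is exactly the information the binary BOF discards.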

Binary BOF (Holidays dataset)

Inverted-file index (sparsity)
Documents
– T0 = "it is what it is"
– T1 = "what is it"
– T2 = "it is a banana"
Index
– "a": {2}
– "banana": {2}
– "is": {0, 1, 2}
– "it": {0, 1, 2}
– "what": {0, 1}
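
The slide's toy index can be rebuilt with a dictionary from term to sorted document ids. For a BOF the "terms" would be visual-word indices, and a query only scans the lists of the words it contains, which is why sparsity pays off. `build_inverted_index` is an illustrative name.

```python
def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}

docs = ["it is what it is", "what is it", "it is a banana"]
for term, postings in build_inverted_index(docs).items():
    print(term, postings)
```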

Binary BOF

Compressed inverted file
– Compression can come close to the vector entropy.
– Compared with a standard inverted file, about 4 times more images can be indexed using the same amount of memory.
– This may compensate for the decoding cost of the decompression algorithm.
[16] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2):6, 2006.
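
As one concrete example of a compression scheme of the kind surveyed in [16], the sketch below stores a sorted posting list as gaps between consecutive image ids and writes each gap as a variable-byte integer. This codec is an assumption for illustration; the slide does not say which one reaches the quoted factor of about 4.

```python
def vbyte_encode(sorted_ids):
    """Gap-encode a sorted posting list, then write each gap as a variable-byte integer."""
    out = bytearray()
    prev = 0
    for doc_id in sorted_ids:
        gap = doc_id - prev
        prev = doc_id
        # 7 payload bits per byte; the high bit marks the last byte of a number
        while gap >= 128:
            out.append(gap & 0x7F)
            gap >>= 7
        out.append(gap | 0x80)
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode: rebuild the gaps, then the absolute ids."""
    ids, value, shift, prev = [], 0, 0, 0
    for byte in data:
        if byte & 0x80:
            value |= (byte & 0x7F) << shift
            prev += value
            ids.append(prev)
            value, shift = 0, 0
        else:
            value |= byte << shift
            shift += 7
    return ids

postings = [3, 7, 150, 151, 100_000]
enc = vbyte_encode(postings)
print(len(enc), "bytes instead of", 4 * len(postings))
```

Small gaps, which dominate long posting lists, fit in a single byte; the price is the extra decoding work mentioned on the slide.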

Proposed method

MiniBOFs

Projection of a BOF
– Sparse projection matrices $A_1, \dots, A_m \in \mathbb{R}^{d \times k}$
– d: dimension of the output descriptor
– k: dimension of the input BOF
– For each matrix row, the number of non-zero components is $n_z = k/d$; we typically set $n_z = 8$ for $k = 1000$, resulting in $d = 125$.
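
The slide's numbers imply d = k / n_z (1000 / 8 = 125). The sketch below represents a sparse 0/1 projection matrix by the column indices of each row; placing the n_z ones of row r on consecutive columns and setting the non-zero values to 1 are assumptions made for illustration.

```python
def contiguous_groups(k, nz):
    """Rows of a d x k sparse 0/1 matrix with nz ones per row, taken here as
    consecutive blocks of nz columns (one illustrative choice)."""
    if k % nz != 0:
        raise ValueError("k must be a multiple of nz")
    d = k // nz
    return [list(range(r * nz, (r + 1) * nz)) for r in range(d)]

def sparse_project(f, groups):
    """Compute A f when row r of A has ones exactly at the columns in groups[r]."""
    return [sum(f[c] for c in cols) for cols in groups]

k, nz = 1000, 8
groups = contiguous_groups(k, nz)
print(len(groups))                      # d = k / nz = 125
print(sparse_project([1.0] * k, groups)[:3])
```

Each projected component only touches n_z BOF entries, so a projection costs O(k) operations rather than O(dk) for a dense matrix.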

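Projection of a BOF
– The other matrices are defined by random permutations.
– For example, for k = 12 and d = 3, the random permutation (11, 2, 12, 8; 9, 4, 10, 1; 7, 5, 6, 3) gives the positions of the $n_z = 4$ non-zero components of each of the three rows.
– Image i is described by m miniBOFs $\omega_{i,j} = A_j f_i$, $j = 1, \dots, m$.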
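
A sketch of the permutation construction, assuming each group of n_z = k/d entries in the permutation lists the (1-based) columns of one row of a projection matrix A_j, and that the non-zero values are 1 (both assumptions). `mini_bofs` computes one miniBOF ω_j = A_j f per matrix; all function names are hypothetical.

```python
import random

def permutation_groups(perm, d):
    """Split a permutation of 1..k into d rows of nz = k/d (1-based) column indices."""
    nz = len(perm) // d
    return [perm[r * nz:(r + 1) * nz] for r in range(d)]

def sparse_project(f, groups):
    """A_j f with ones at the listed columns; f is 0-based, indices are 1-based."""
    return [sum(f[c - 1] for c in cols) for cols in groups]

def mini_bofs(f, matrices):
    """One miniBOF per projection matrix: omega_j = A_j f."""
    return [sparse_project(f, groups) for groups in matrices]

# The slide's example: k = 12, d = 3, so each row has nz = 4 non-zero entries.
A_1 = permutation_groups([11, 2, 12, 8, 9, 4, 10, 1, 7, 5, 6, 3], 3)
# A further matrix drawn from a fresh random permutation (seeded for repeatability).
rng = random.Random(0)
A_2 = permutation_groups(rng.sample(range(1, 13), 12), 3)

f = [float(c) for c in range(1, 13)]   # toy BOF whose c-th component equals c
print(A_1)
print(mini_bofs(f, [A_1, A_2])[0])     # row sums 11+2+12+8, 9+4+10+1, 7+5+6+3
```

Because every column appears in exactly one row, each matrix partitions the BOF components, and different permutations give m different views of the same BOF.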

Indexing structure
Quantization
– The miniBOF $\omega_{i,j}$ is quantized by the quantizer $q_j : \mathbb{R}^d \to \{1, \dots, k'\}$ associated with matrix $A_j$, where $k'$ is the number of codebook entries of the indexing structure.
– The set of k-means codebooks is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset.
– The dictionary size $k'$ associated with the miniBOFs is not related to the one associated with the initial SIFT descriptors, hence we may choose $k' \neq k$. We typically set $k' = \dots$
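
A minimal sketch of a quantizer q_j as a nearest-centroid lookup, with plain Lloyd k-means standing in for the off-line codebook learning. The toy training vectors, the naive initialization, and the codebook size (written k′ here and set to 2) are assumptions for illustration only.

```python
def quantize(omega, codebook):
    """q_j: index of the nearest centroid (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(omega, c))
    return min(range(len(codebook)), key=lambda idx: dist2(codebook[idx]))

def kmeans(points, k_prime, iters=10):
    """Plain Lloyd iterations; naive init with the first k_prime points."""
    centroids = [tuple(p) for p in points[:k_prime]]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[quantize(p, centroids)].append(p)
        # keep the old centroid if a cluster becomes empty
        centroids = [tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

# Off-line: learn a tiny codebook (k' = 2) from training miniBOFs.
training = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.0), (10.0, 10.2)]
codebook_j = kmeans(training, 2)
print(codebook_j, quantize((9.0, 9.5), codebook_j))
```

The returned index selects the inverted list in which the miniBOF is stored.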

Indexing structure
Binary signature generation
– The miniBOF is projected using a random rotation matrix R, producing d components.
– Each bit of the signature is obtained by comparing the value projected by R to the median value of the elements having the same quantized index. The median values for all quantizing cells and all projection directions are learned off-line on our independent dataset.
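
A sketch of the signature step under stated assumptions: R is a random orthogonal matrix (generated here by Gram-Schmidt on Gaussian vectors), `quantizer` maps a miniBOF to its cell, and a bit is 1 when the projected value is strictly above its cell's median. The function names and toy data are hypothetical.

```python
import math
import random
import statistics

def random_orthogonal(d, rng):
    """Random d x d orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:
            basis.append([x / norm for x in v])
    return basis

def project(R, omega):
    return [sum(r * x for r, x in zip(row, omega)) for row in R]

def learn_medians(training, R, quantizer):
    """Off-line: median of every projected component, separately for each cell."""
    per_cell = {}
    for omega in training:
        per_cell.setdefault(quantizer(omega), []).append(project(R, omega))
    return {cell: [statistics.median(col) for col in zip(*vecs)]
            for cell, vecs in per_cell.items()}

def binary_signature(omega, R, medians, quantizer):
    """One bit per projected component: 1 if above the median of its cell."""
    cell = quantizer(omega)
    z = project(R, omega)
    return [1 if z_l > m_l else 0 for z_l, m_l in zip(z, medians[cell])]

def one_cell(omega):
    return 0  # toy quantizer with a single cell

# Toy usage with the identity as a trivial "rotation".
identity = [[1.0, 0.0], [0.0, 1.0]]
training = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
medians = learn_medians(training, identity, one_cell)              # {0: [1.5, 3.0]}
print(binary_signature((2.0, 1.0), identity, medians, one_cell))   # [1, 0]
R = random_orthogonal(3, random.Random(0))
```

Thresholding at per-cell medians makes each bit split the training vectors of a cell roughly in half.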

Quantizing cells
[4] H. Jégou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.

Indexing structure
– The miniBOF associated with image i is represented by the tuple …
– The total memory usage per image is … bytes.
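
As a back-of-the-envelope sketch, assume (hypothetically) that each of the m inverted-list entries of an image holds a 4-byte image identifier and a d-bit binary signature. `bytes_per_image` and this layout are illustrative assumptions, not taken from the slides.

```python
import math

def bytes_per_image(m, d, id_bytes=4):
    """Hypothetical layout: each of the m inverted-list entries of an image holds
    an id_bytes image identifier plus a d-bit binary signature."""
    return m * (id_bytes + math.ceil(d / 8))

per_image = bytes_per_image(m=8, d=125)
print(per_image, "bytes per image")                               # 160
print(per_image * 1_000_000 / 1e6, "MB for one million images")   # 160.0
```

Under this assumption, m = 8 and d = 125 give 160 bytes per image, the same order as the 160 MB for m = 8 quoted for Holidays+Flickr1M (about one million images).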

Multi-probe strategy
– Retrieve not only the inverted list associated with the quantized index, but the set of inverted lists associated with the t closest centroids of the quantizer codebook.
– This yields about t times as many image hits.
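
A sketch of multi-probe lookup, assuming each cell's inverted list is a plain list of image ids (in the structure described above each entry would also carry its binary signature). `closest_cells` and `multi_probe` are hypothetical names.

```python
def closest_cells(omega, codebook, t):
    """Indices of the t centroids nearest to omega (squared Euclidean distance)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(omega, c)) for c in codebook]
    return sorted(range(len(codebook)), key=d2.__getitem__)[:t]

def multi_probe(omega, codebook, inverted_lists, t):
    """Concatenate the inverted lists of the t closest cells (t = 1: single probe)."""
    hits = []
    for cell in closest_cells(omega, codebook, t):
        hits.extend(inverted_lists.get(cell, []))
    return hits

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
inverted_lists = {0: [1, 4], 1: [2], 2: [3, 5], 3: [6]}   # cell -> image ids
print(multi_probe((0.2, 0.1), codebook, inverted_lists, 1))  # [1, 4]
print(multi_probe((0.2, 0.1), codebook, inverted_lists, 3))  # [1, 4, 2, 3, 5]
```

Probing more cells recovers neighbors that fell just across a cell boundary, at the cost of scanning more list entries.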

Fusion (figure: query signature vs. database signature)

Fusion
– The score is equal to 0 for images having no observed binary signatures.
– It is equal to … if the database image i is the query image itself.
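
The expected distance criterion is not spelled out on these slides, so the sketch below uses a plainly different stand-in: it sums 1 − h/d over the observed signatures of an image, where h is the Hamming distance to the query signature. Like the criterion described here, it gives 0 to images with no observed signature; under this stand-in the query image itself, if indexed, scores m. All names are hypothetical.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def fuse_scores(query_sigs, retrieved):
    """Stand-in fusion (NOT the expected distance criterion).
    query_sigs[j]: binary signature of the j-th query miniBOF.
    retrieved[j]: (image_id, signature) pairs found for that miniBOF.
    Each observed signature adds 1 - hamming / d; unseen images keep score 0."""
    scores = {}
    for q_sig, entries in zip(query_sigs, retrieved):
        d = len(q_sig)
        for image_id, sig in entries:
            scores[image_id] = scores.get(image_id, 0.0) + 1.0 - hamming(q_sig, sig) / d
    return scores

query_sigs = [[1, 0, 1, 1], [0, 0, 1, 0]]          # m = 2 miniBOFs, d = 4 bits
retrieved = [[("q", [1, 0, 1, 1]), ("a", [1, 1, 0, 1])],
             [("q", [0, 0, 1, 0]), ("b", [1, 1, 0, 1])]]
scores = fuse_scores(query_sigs, retrieved)
print(scores)
```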

Fusion

Experiments

Dataset
– Two annotated datasets: the INRIA Holidays dataset [4] and the University of Kentucky recognition benchmark [9]
– Distractor dataset: one million images downloaded from Flickr (Flickr1M)
– Learning parameters: Flickr1M*

Details
– Descriptor extraction: images are resized to a maximum of … pixels, a slight intensity normalization is performed, and SIFT descriptors are extracted.
– Evaluation: mAP, memory usage, image hits
Parameters
# Using a value of $n_z$ between 8 and 12 provides the best accuracy for vocabulary sizes ranging from 1k to 20k.

mAP (mean average precision)
Example:
– Two query images, A and B
– A has 4 duplicate images, retrieved at ranks 1, 2, 4, 7
– B has 5 duplicate images, of which only 3 are retrieved, at ranks 1, 3, 5
– Average precision A = (1/1 + 2/2 + 3/4 + 4/7)/4 = 0.83
– Average precision B = (1/1 + 2/3 + 3/5 + 0 + 0)/5 = 0.45
– mAP = (0.83 + 0.45)/2 = 0.64
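
The worked example can be reproduced with a small average-precision function that treats relevant images never retrieved as contributing 0 (`average_precision` is an illustrative name):

```python
def average_precision(ranks, num_relevant):
    """ranks: 1-based positions at which relevant images were retrieved.
    Relevant images that were never retrieved contribute 0."""
    precisions = [(n + 1) / r for n, r in enumerate(sorted(ranks))]
    return sum(precisions) / num_relevant

ap_a = average_precision([1, 2, 4, 7], 4)
ap_b = average_precision([1, 3, 5], 5)
print(round(ap_a, 2), round(ap_b, 2), round((ap_a + ap_b) / 2, 2))  # prints 0.83 0.45 0.64
```

Dividing by the number of relevant images (5 for B), not by the number retrieved, is what penalizes the two missed duplicates.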

Table 1 (Holidays)
# The number of bytes used per inverted list entry is 4 bytes for binary BOF and 5 bytes for BOF.

Table 2 (Kentucky)

Table 3 (Holidays+Flickr1M)

Figure (Holidays+Flickr1M)
# Our approach requires 160 MB for m = 8 and performs a query in 132 ms, compared with 8 GB and 3 s, respectively, for BOF.

Sample

Conclusion

This paper has introduced a way of packing BOFs: miniBOFs
– An efficient indexing structure for rapid access and an expected distance criterion for the fusion of the scores
– Reduces memory usage
– Reduces the quantity of memory scanned (hits)
– Reduces query time