Bag of Features Approach: recent work, using geometric information
Problem: search for object occurrences in a very large image collection
Two sub-problems: object category recognition and specific object recognition
Motivation: look for product information; look for similar products
Related work on large scale image search
Most systems build upon the BoF framework [Sivic & Zisserman 03]
– Large (hierarchical) vocabularies [Nistér & Stewénius 06]
– Improved descriptor representation [Jégou et al. 08, Philbin et al. 08]
– Geometry used in index [Jégou et al. 08, Perďoch et al. 09]
– Query expansion [Chum et al. 07]
– …
Efficiency improved by:
– Min-hash and geometrical min-hash [Chum et al.]
– Compressing the BoF representation [Jégou et al. 09]
Local Features - SIFT
Creating a visual vocabulary
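A minimal sketch of vocabulary creation, assuming the vocabulary is built by k-means clustering of local descriptors as in the standard BoF framework. The function names (`build_vocabulary`, `bof_histogram`), the naive initialization and the toy 2-D descriptors are illustrative assumptions; real systems cluster 128-D SIFT descriptors with far larger k.

```python
def nearest(centers, x):
    """Index of the center closest to descriptor x (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(centers[i], x)),
    )


def build_vocabulary(descriptors, k, iterations=10):
    """Plain Lloyd's k-means; the first k descriptors seed the centers (assumption)."""
    centers = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        for i, group in enumerate(groups):
            if group:  # keep the old center if no descriptor was assigned to it
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bof_histogram(descriptors, centers):
    """Quantize each descriptor to its nearest visual word and count occurrences."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1
    return hist


# Toy usage: two well-separated groups of 2-D "descriptors".
training = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
vocabulary = build_vocabulary(training, k=2)
image_hist = bof_histogram([(0, 0), (9, 9), (10, 12)], vocabulary)
```

Each image is then represented only by `image_hist`, the count of each visual word, which is what the inverted index stores.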
Inverted Index: index construction and searching
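A small sketch of the two steps, index construction and searching, under simplifying assumptions: images are given as lists of visual word ids, and the score is an unweighted dot product of word counts (practical systems typically add tf-idf weighting and normalization). `build_index` and `search` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_index(image_words):
    """Map each visual word to a posting list of (image_id, count)."""
    index = defaultdict(list)
    for image_id, words in image_words.items():
        for word, count in Counter(words).items():
            index[word].append((image_id, count))
    return index


def search(index, query_words):
    """Score only the images that share at least one visual word with the query."""
    scores = defaultdict(int)
    for word, query_count in Counter(query_words).items():
        for image_id, count in index.get(word, []):
            scores[image_id] += query_count * count
    # Highest score first; ties broken by image id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


database = {"a": [1, 2, 2, 5], "b": [2, 3], "c": [7, 8]}
index = build_index(database)
ranking = search(index, [2, 5])
```

Only the posting lists of the query's words are visited, so images with no word in common (here `"c"`) are never touched.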
Use geometry
Possible directions:
– Change/optimize the spatial verification stage
– Insert new geometric information into the index: ordered BOF, bundled features, visual phrases
– Change the searching algorithm
Survey for today
– Spatial Bag-of-features [Cao et al., CVPR 2010]
– Image Retrieval with Geometry-Preserving Visual Phrases [Zhang, Jia & Chen, CVPR 2011]
– Smooth Object Retrieval using a Bag of Boundaries [Arandjelović & Zisserman, ICCV 2011]
Spatial BOF Basic idea:
Spatial BOF Constructing linear and circular ordered bag-of-features:
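A rough sketch of how the two ordered histograms could be built, assuming each feature is an `(x, y, word)` triple: the linear variant projects feature positions onto a line at angle `theta` and splits the line into equal bins, while the circular variant splits the plane into equal angular sectors around a center. The function names, the equal-width binning and the toy inputs are assumptions for illustration and not necessarily the paper's exact construction.

```python
import math


def linear_ordered_bof(features, theta, bins, vocab_size):
    """Concatenate one word histogram per bin along the projection line.

    Assumes a non-empty feature list.
    """
    dx, dy = math.cos(theta), math.sin(theta)
    projections = [x * dx + y * dy for x, y, _ in features]
    lo, hi = min(projections), max(projections)
    span = (hi - lo) or 1.0  # avoid division by zero when all points coincide
    hist = [0] * (bins * vocab_size)
    for (_, _, word), p in zip(features, projections):
        b = min(int((p - lo) / span * bins), bins - 1)
        hist[b * vocab_size + word] += 1
    return hist


def circular_ordered_bof(features, center, sectors, vocab_size):
    """Concatenate one word histogram per angular sector around `center`."""
    cx, cy = center
    hist = [0] * (sectors * vocab_size)
    for x, y, word in features:
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        s = int(angle / (2 * math.pi) * sectors) % sectors
        hist[s * vocab_size + word] += 1
    return hist


linear = linear_ordered_bof(
    [(0, 0, 0), (1, 0, 1), (2, 0, 1), (3, 0, 2)], theta=0.0, bins=2, vocab_size=3
)
circular = circular_ordered_bof(
    [(1, 1, 0), (-1, 1, 1), (-1, -1, 2), (1, -1, 0)],
    center=(0, 0), sectors=4, vocab_size=3,
)
```

The output is an ordinary, longer histogram, so it can be indexed and compared in the same way as a standard BOF vector.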
Spatial BOF Translation invariance:
Spatial BOF
Pros:
– Performs better than BOF+RANSAC on large-scale datasets*
– Same format as standard BOF
Cons:
– Dataset dependent, because it requires training
– No results are presented for a large-scale dataset with transfer learning from another dataset
Future work:
– Check it with cross-training on a large dataset. Otherwise, it is not worth pursuing further.
Geometry-Preserving Visual Phrases Basic idea:
Geometry-Preserving Visual Phrases
Representation:
– Quantize the image to a 10x10 grid
– Histogram of GVPs of length k
– GVP dictionary size is “choose k from N visual words”
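One way to count the GVPs shared by two images without enumerating the dictionary is sketched below, under stated assumptions: feature positions are first quantized to grid cells, every pair of features with the same visual word votes for its cell offset, and an offset that collects n votes contributes "choose k from n" co-occurring phrases of length k. The names `quantize` and `gvp_cooccurrences` are hypothetical, and only translation between the two images is handled.

```python
from collections import Counter
from math import comb


def quantize(features, width, height, grid=10):
    """Map (x, y, word) pixel positions to (cell_x, cell_y, word) on a grid x grid lattice."""
    cells = []
    for x, y, word in features:
        cx = min(int(x / width * grid), grid - 1)
        cy = min(int(y / height * grid), grid - 1)
        cells.append((cx, cy, word))
    return cells


def gvp_cooccurrences(cells_a, cells_b, k):
    """Count length-k phrases whose words appear in both images with the same layout."""
    offsets = Counter()
    for ax, ay, word_a in cells_a:
        for bx, by, word_b in cells_b:
            if word_a == word_b:
                offsets[(bx - ax, by - ay)] += 1
    # n matches sharing one offset form comb(n, k) phrases of length k.
    return sum(comb(n, k) for n in offsets.values())


image_a = [(0, 0, 1), (1, 0, 2), (2, 1, 3)]
image_b = [(3, 2, 1), (4, 2, 2), (5, 3, 3), (0, 0, 2)]
shared_pairs = gvp_cooccurrences(image_a, image_b, k=2)
```

With k=1 the count reduces to the number of matching word pairs, i.e. an ordinary bag-of-words match count.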
Geometry-Preserving Visual Phrases
Pros:
– Outperforms BOV + RANSAC
Cons:
– Only translation invariant, because of memory
Future work
BOF for smooth objects Idea: the information used for retrieval (query object, segment, gradient)
BOF for smooth objects Results:
BOF for smooth objects
Segmentation phase:
– Over-segmentation with super-pixels
– Classification of super-pixels: 3208-dimensional feature vector (median(Mag(Grad)), 4 bits, color histogram, BOF), SVM
– Post-processing
BOF for smooth objects
Boundary description phase:
– Sample points on the boundary
– Calculate HoG at each point at 3 scales
– 340-dimensional L2-normalized vector
* The descriptor is not rotation invariant
BOF for smooth objects
Retrieval procedure:
– Boundary descriptors are quantized (k=10k)
– Standard BOF scheme*
– Spatial verification for the top 200 with a loose affine homography (errors up to 100 pixels)
* No spatial information is recorded in the histogram
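The re-ranking step can be sketched as inlier counting under a loose tolerance. This sketch assumes the affine transformation for each shortlisted image has already been estimated elsewhere (the estimation itself is omitted), and that a shortlist entry is a hypothetical `(image_id, matches, affine)` triple in BOF rank order; only the top 200 and the 100-pixel tolerance come from the slide.

```python
import math


def apply_affine(affine, point):
    """Apply a 2x3 affine transformation ((a, b, tx), (c, d, ty)) to a 2-D point."""
    (a, b, tx), (c, d, ty) = affine
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def count_inliers(matches, affine, tolerance=100.0):
    """Count matches whose transformed query point lands within `tolerance` pixels."""
    inliers = 0
    for query_pt, target_pt in matches:
        px, py = apply_affine(affine, query_pt)
        if math.hypot(px - target_pt[0], py - target_pt[1]) <= tolerance:
            inliers += 1
    return inliers


def rerank(shortlist, top_n=200):
    """Re-order the first top_n results by inlier count; the rest keep their BOF order."""
    head, tail = shortlist[:top_n], shortlist[top_n:]
    scored = [
        (count_inliers(matches, affine), -rank, image_id)
        for rank, (image_id, matches, affine) in enumerate(head)
    ]
    scored.sort(reverse=True)  # more inliers first, earlier BOF rank breaks ties
    return [image_id for _, _, image_id in scored] + [entry[0] for entry in tail]


identity = ((1, 0, 0), (0, 1, 0))
shortlist = [
    ("a", [((0, 0), (500, 0)), ((10, 10), (15, 12))], identity),
    ("b", [((0, 0), (50, 0)), ((10, 10), (20, 20))], identity),
    ("c", [], identity),
]
reranked = rerank(shortlist, top_n=2)
```

A tolerance this loose accepts large localization errors, which suits boundary descriptors whose sample points are not precisely repeatable.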
BOF for smooth objects
Pros:
– Solves the smooth object retrieval problem
– Fast
Cons:
– Dataset dependent, because it requires training
– Limited to objects with “solid” materials
– Segmentation has to catch the object’s boundary
Future work:
– Eliminate the training step
Summary
– There is active research in the field of CBIR on exploiting geometric information.
– Each method has its limitations.
– Still no widely accepted solution (as spatial verification with RANSAC is).