Presentation is loading. Please wait.

Presentation is loading. Please wait.

Query Specific Fusion for Image Retrieval

Similar presentations

Presentation on theme: "Query Specific Fusion for Image Retrieval"— Presentation transcript:

1 Query Specific Fusion for Image Retrieval
Shaoting Zhang, Ming Yang NEC Laboratories, America

2 Outline Overview of image retrieval/search Query specific fusion
Basic paradigm Local features indexed by vocabulary trees Global features indexed by compact hash codes Query specific fusion Graph construction Graph fusion Graph-based ranking Experiments

3 Content-based Image retrieval/search
Scalability !!! Computational efficiency Memory consumption Retrieval accuracy Features Data base Images Hashing codes feature extraction hashing indexing Inverted indexing Query image search Rank list re-rank Offline Online

4 Local Features Indexed by Vocabulary Trees
Features: SIFT features Hashing: visual word IDs by hierarchical K-means. Indexing: vocabulary trees Search: voting and sorting An example: ~1K SIFT features per image 10^6~1M leaf nodes in the tree Query time: ~ ms for 1M images in the database Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR’06 Scalable Recognition with a Vocabulary Tree D. Nister and H. Stewenius, CVPR’06

5 Global Features Indexed by Compact Hash Codes
Feature: GIST, RGB or HSV histograms, etc. Hashing: compact binary codes, e.g., PCA+rotation+binarization. Indexing: a flat storage with/out inverted indexes Search: exhaustive search with Hamming distances + re-ranking An example: GIST -> PCA -> Binarization 960 floats -> 256 floats -> 256 bits (217 times smaller). Query time: ms, search 1M images using Hamming dist Modeling the Shape of the Scene: A Holistic Representation A. Oliva and A. Torralba, IJCV’01 C implementation for Lear (INRIA): Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, Y. Weiss, CVPR’08 Modeling the Shape of the Scene: A Holistic Representation, A. Oliva and A. Torralba, IJCV’01 Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, Y. Weiss, CVPR’08 Iterative Quantization: A Procrustean Approach to Learning Binary Codes, Y. Gong and S. Lazebnik, CVPR’11

6 Motivation Pros and Cons Can we combine or fuse these two approaches?
Improve the retrieval precision No sacrifice of the efficiency Early fusion (feature level)? Late fusion (rank list level)? Retrieval speed Memory usage Retrieval precision applications Image properties to attend to Local feat. fast high near duplicate local patterns Global feat. faster low general images global statistics Web image search Near duplicate image retrieval Vocabulary trees of local features Hashing global features

7 Challenges The features and algorithms are dramatically different.
Hard for the feature-level fusion Hard for the rank aggregation The fusion is query specific and database dependent Hard to learn how to combine cross different datasets No supervision and relevance feedback! Hard to evaluate the retrieval quality online

8 Query Specific Fusion How to evaluate online the quality of retrieve results from methods using local or global features? Assumption: The consensus degree among top candidate images reveal the retrieval quality The consistency of top candidates’ nearest neighborhoods. A graph-based approach to fusing and re-ranking retrieval results of different methods.

9 Graph Construction Construct a weighted undirected graph to represent a set of retrieval results of a query image q. Given the query q, image database D, a similarity function S(.,.), top-k neighborhood Edge: the reciprocal neighbor relation Edge weight: the Jaccard similarity between neighborhoods

10 Graph Fusion Fuse multiple graphs to one graph
Union of the nodes/edges and sum of the weights

11 Graph-based Ranking Ranking by a local Page Rank
Perform a link analysis on G Rank the nodes by their connectivity in G Ranking by maximizing weighted density

12 Experiments Datasets: 4 public benchmark datasets Baseline methods
UKBench : 2,550*4=10200 images (k=5) Corel-5K : 50*100 = 5000 images (k=15) Holidays : 1491 images in 500 groups (k=5) SFLandmark : 1.06M PCI and 638K PFI images (k=30) Baseline methods Local features: VOC (contextual weighting, ICCV11) Global features: GIST (960D=>256bits), HSV (2000D=>256bits) Rank aggregation A fusion method based on an SVM classifier Nearest neighbors are stored offline for the database

13 UKBench Evaluation: 4 x recall at the first four returned images, referred as N-S score (maximum = 4). Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR’06

14 Corel-5K Corel 5K: 50 categories, each category has 100 images. Average top-1 precision for leave-one-out retrievals.

15 Holidays Evaluation: mAP (%) for 1491 queries.

16 San Francisco Landmark
Database images: Perspective central images (PCIs): 1.07M Perspective frontal images (PFIs): 638K. Query images: 803 image taken with a smart phone Evaluation: The recall rate in terms of buildings

17 San Francisco Landmark
The fusion is applied to the top-50 candidates given by VOC.

18 Computation and Memory Cost
The average query time Memory cost 340MB extra storage for the top-50 nearest neighbor for 1.7M images in the SFLandmark.

19 Sample Query Results (1)
In the UKbench

20 Sample Query Results (2)
In the Corel-5K

21 Sample Query Results (3)
In the SFlankmark

22 Conclusions A graph-based query specific fusion of retrieval sets based on local and global features Requires no supervision Retains the efficiency of both methods Improves the retrieval precision consistently on 4 datasets Easy to be reproduced by other motivated researchers Limitations No reciprocal neighbor for certain queries in either methods Dynamical insertion or removal of database images

Download ppt "Query Specific Fusion for Image Retrieval"

Similar presentations

Ads by Google