Presentation on theme: "Query Specific Fusion for Image Retrieval"— Presentation transcript:
1 Query Specific Fusion for Image Retrieval
Shaoting Zhang and Ming Yang, NEC Laboratories America
2 Outline
- Overview of image retrieval/search
- Basic paradigm
  - Local features indexed by vocabulary trees
  - Global features indexed by compact hash codes
- Query specific fusion
  - Graph construction
  - Graph fusion
  - Graph-based ranking
- Experiments
4 Local Features Indexed by Vocabulary Trees
- Features: SIFT descriptors
- Hashing: visual word IDs obtained by hierarchical k-means
- Indexing: vocabulary trees
- Search: voting and sorting
- An example:
  - ~1K SIFT features per image
  - 10^6 (~1M) leaf nodes in the tree
  - Query time: ~ ms for 1M images in the database
- Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR'06
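The quantize-then-vote pipeline above can be sketched as follows. This is a toy illustration, not the paper's code: the tree centers are random stand-ins for hierarchical k-means trained on real SIFT data, the class/function names are mine, and scoring uses plain histogram intersection instead of the tf-idf weighted scoring of Nister and Stewenius.

```python
import numpy as np

rng = np.random.default_rng(0)

class Node:
    """Internal nodes hold cluster centers and children; leaves hold an id."""
    def __init__(self, centers=None, children=None, leaf_id=None):
        self.centers, self.children, self.leaf_id = centers, children, leaf_id

def build_toy_tree(dim=8, branch=4, depth=2):
    """Random centers stand in for hierarchical k-means on training descriptors."""
    counter = iter(range(branch ** depth))
    def build(level):
        if level == depth:
            return Node(leaf_id=next(counter))
        centers = rng.normal(size=(branch, dim))
        return Node(centers=centers, children=[build(level + 1) for _ in range(branch)])
    return build(0)

def quantize(tree, desc):
    """Descend the tree, greedily picking the nearest center at each level."""
    node = tree
    while node.children is not None:
        i = int(np.argmin(np.linalg.norm(node.centers - desc, axis=1)))
        node = node.children[i]
    return node.leaf_id

def leaf_histogram(tree, descriptors, n_leaves):
    """Bag-of-visual-words histogram of an image's descriptors over the leaves."""
    h = np.zeros(n_leaves, dtype=int)
    for d in descriptors:
        h[quantize(tree, d)] += 1
    return h

def score(query_hist, db_hists):
    """Vote for database images by histogram intersection at the leaves."""
    return {img: int(np.minimum(query_hist, h).sum()) for img, h in db_hists.items()}

tree = build_toy_tree()                                   # 4^2 = 16 leaves
db = {i: rng.normal(size=(30, 8)) for i in range(3)}      # toy "SIFT" sets
hists = {i: leaf_histogram(tree, d, 16) for i, d in db.items()}
scores = score(leaf_histogram(tree, db[0], 16), hists)    # query with image 0
```

Querying with image 0's own descriptors gives it the maximum possible score, since its histogram intersects itself completely.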
5 Global Features Indexed by Compact Hash Codes
- Features: GIST, RGB or HSV histograms, etc.
- Hashing: compact binary codes, e.g., PCA + rotation + binarization
- Indexing: a flat storage, with or without inverted indexes
- Search: exhaustive search with Hamming distances + re-ranking
- An example:
  - GIST -> PCA -> binarization: 960 floats -> 256 floats -> 256 bits (217 times smaller)
  - Query time: ms to search 1M images using Hamming distance
  - C implementation from Lear (INRIA)
- Modeling the Shape of the Scene: A Holistic Representation, A. Oliva and A. Torralba, IJCV'01
- Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, Y. Weiss, CVPR'08
- Iterative Quantization: A Procrustean Approach to Learning Binary Codes, Y. Gong and S. Lazebnik, CVPR'11
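A minimal sketch of the PCA-hash-then-Hamming-search idea, under simplifying assumptions: random Gaussian vectors stand in for GIST descriptors, the learned rotation of ITQ (Gong and Lazebnik) is omitted, and the codes are kept as byte arrays rather than packed bits.

```python
import numpy as np

rng = np.random.default_rng(1)

def pca_binary_codes(X, n_bits):
    """Center the features, project onto the top principal directions
    (via SVD), and binarize by sign. ITQ-style rotation is omitted."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (Xc @ Vt[:n_bits].T > 0).astype(np.uint8)

def hamming_search(query_code, db_codes, topk=5):
    """Exhaustive Hamming search; real systems pack bits and use popcount."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")[:topk]
    return order, dists[order]

X = rng.normal(size=(100, 64))        # toy stand-ins for 960-D GIST features
codes = pca_binary_codes(X, 32)       # 64 floats -> 32 bits per image
idx, d = hamming_search(codes[0], codes)
```

Using a database image's own code as the query returns it first at Hamming distance 0, which is the expected sanity check for this kind of index.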
6 Motivation
- Pros and cons:

  |              | Retrieval speed | Memory usage | Retrieval precision | Applications   | Image properties to attend to
  | Local feat.  | fast            | high         |                     | near duplicate | local patterns
  | Global feat. | faster          | low          |                     | general images | global statistics

- Can we combine or fuse these two approaches?
  - Improve the retrieval precision
  - No sacrifice of the efficiency
  - Early fusion (feature level)?
  - Late fusion (rank-list level)?
- (Diagram: web image search and near-duplicate image retrieval, served by vocabulary trees of local features and by hashing of global features.)
7 Challenges
- The features and algorithms are dramatically different:
  - hard to fuse at the feature level
  - hard to aggregate the rank lists
- The fusion is query specific and database dependent:
  - hard to learn how to combine across different datasets
- No supervision and no relevance feedback:
  - hard to evaluate the retrieval quality online
8 Query Specific Fusion
- How to evaluate online the quality of retrieval results from methods using local or global features?
- Assumption: the degree of consensus among the top candidate images reveals the retrieval quality
  - i.e., the consistency of the top candidates' nearest neighborhoods
- A graph-based approach to fusing and re-ranking the retrieval results of different methods
9 Graph Construction
- Construct a weighted undirected graph to represent a set of retrieval results of a query image q
- Given the query q, the image database D, a similarity function S(.,.), and the top-k neighborhood N_k(.):
  - Edge: the reciprocal-neighbor relation (i and j are each in the other's top-k neighborhood)
  - Edge weight: the Jaccard similarity between the two neighborhoods, |N_k(i) ∩ N_k(j)| / |N_k(i) ∪ N_k(j)|
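The construction above can be sketched as follows. Function names and the exact node set (the query plus its top-k results) are my simplification; the weight follows the slide's definition, with each image included in its own neighborhood.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two neighborhoods."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def build_graph(query, Nk):
    """Nk[i] is image i's ranked top-k retrieval neighborhood.
    Nodes: the query and its top-k results.
    Edge: only between reciprocal neighbors.
    Weight: Jaccard similarity of the endpoints' neighborhoods
    (each neighborhood augmented with the image itself)."""
    nodes = {query} | set(Nk[query])
    graph = {}
    for i in nodes:
        for j in nodes:
            if i >= j:                      # visit each undirected pair once
                continue
            if j in Nk[i] and i in Nk[j]:   # reciprocal-neighbor relation
                graph[(i, j)] = jaccard(set(Nk[i]) | {i}, set(Nk[j]) | {j})
    return graph

# Toy neighborhoods (k = 2): "q" is the query, "a".."c" are database images.
Nk = {"q": ["a", "b"], "a": ["q", "c"], "b": ["q", "a"], "c": ["a", "b"]}
G = build_graph("q", Nk)
```

Here "b" is reciprocal with "q" and shares its whole neighborhood (weight 1.0), while "a" is reciprocal but overlaps less (weight 0.5); "a"-"b" gets no edge because "b" is not in Nk["a"].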
10 Graph Fusion
- Fuse multiple graphs into one graph:
  - union of the nodes and edges
  - sum of the weights of shared edges
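The fusion step is a simple merge. A minimal sketch, assuming each graph is a dict keyed by undirected edges (sorted (i, j) tuples); the function name is mine:

```python
def fuse_graphs(*graphs):
    """Union of nodes/edges across the input graphs; weights of edges that
    appear in more than one graph are summed."""
    fused = {}
    for g in graphs:
        for edge, w in g.items():
            fused[edge] = fused.get(edge, 0.0) + w
    return fused

# Toy rank-list graphs, e.g. one from local features and one from global features.
g_local = {("a", "q"): 0.5, ("b", "q"): 1.0}
g_global = {("b", "q"): 0.5, ("c", "q"): 0.75}
fused = fuse_graphs(g_local, g_global)
```

Candidates retrieved by both methods (here "b") accumulate weight from both graphs, which is exactly how the consensus between the two retrieval results is expressed.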
11 Graph-based Ranking
- Ranking by a local PageRank:
  - perform a link analysis on the fused graph G
  - rank the nodes by their connectivity in G
- Alternatively, ranking by maximizing weighted density
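The PageRank-style link analysis can be sketched with a standard power iteration over the fused weighted graph. This is a generic weighted PageRank, not the paper's exact formulation; the damping factor 0.85 and the function name are my choices.

```python
import numpy as np

def weighted_pagerank(graph, alpha=0.85, iters=100):
    """Power iteration on an undirected weighted graph given as
    {(i, j): weight}. Assumes every node has at least one incident edge
    (true here, since nodes only enter the graph through edges)."""
    nodes = sorted({n for edge in graph for n in edge})
    idx = {n: i for i, n in enumerate(nodes)}
    W = np.zeros((len(nodes), len(nodes)))
    for (i, j), w in graph.items():
        W[idx[i], idx[j]] = W[idx[j], idx[i]] = w
    P = W / W.sum(axis=1, keepdims=True)         # row-stochastic transitions
    r = np.full(len(nodes), 1.0 / len(nodes))    # uniform initial ranking
    for _ in range(iters):
        r = (1 - alpha) / len(nodes) + alpha * (P.T @ r)
    return dict(zip(nodes, r))

# Toy fused graph: "q" is connected to three candidates.
scores = weighted_pagerank({("a", "q"): 1.0, ("b", "q"): 1.0, ("c", "q"): 1.0})
```

The best-connected node collects the most rank mass, which matches the slide's intuition of ranking nodes by their connectivity in G.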
12 Experiments
- Datasets: 4 public benchmark datasets
  - UKBench: 2,550 x 4 = 10,200 images (k = 5)
  - Corel-5K: 50 x 100 = 5,000 images (k = 15)
  - Holidays: 1,491 images in 500 groups (k = 5)
  - SFLandmark: 1.06M PCI and 638K PFI images (k = 30)
- Baseline methods:
  - Local features: VOC (vocabulary tree with contextual weighting, ICCV'11)
  - Global features: GIST (960D -> 256 bits), HSV (2000D -> 256 bits)
  - Rank aggregation
  - A fusion method based on an SVM classifier
- The nearest neighbors of database images are precomputed and stored offline
13 UKBench
- Evaluation: 4 x recall at the first four returned images, referred to as the N-S score (maximum = 4)
- Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR'06
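Since each UKBench object has exactly 4 relevant images, 4 x recall@4 is simply the number of ground-truth images found in the top four results, averaged over queries. A small sketch of that metric (the function name is mine):

```python
def ns_score(rank_lists, groundtruth):
    """UKBench N-S score: for each query, count its ground-truth images among
    the top 4 returned (each object has 4 relevant images, so this equals
    4 x recall@4); then average over queries. The maximum is 4."""
    hits = [len(set(r[:4]) & set(gt)) for r, gt in zip(rank_lists, groundtruth)]
    return sum(hits) / len(hits)

# One perfect query and one query that finds 3 of its 4 relevant images.
score = ns_score([[0, 1, 2, 3], [4, 9, 5, 6]],
                 [[0, 1, 2, 3], [4, 5, 6, 7]])
```

For the example above the per-query hits are 4 and 3, giving an average N-S score of 3.5.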
14 Corel-5K
- Corel-5K: 50 categories, each with 100 images
- Evaluation: average top-1 precision for leave-one-out retrievals
16 San Francisco Landmark
- Database images:
  - perspective central images (PCIs): 1.07M
  - perspective frontal images (PFIs): 638K
- Query images: 803 images taken with a smartphone
- Evaluation: the recall rate in terms of buildings
17 San Francisco Landmark
- The fusion is applied to the top-50 candidates given by VOC.
18 Computation and Memory Cost
- The average query time
- Memory cost: 340MB of extra storage for the top-50 nearest neighbors of the 1.7M images in SFLandmark (consistent with one 4-byte image ID per neighbor: 1.7M x 50 x 4 bytes = 340MB)
22 Conclusions
- A graph-based, query-specific fusion of retrieval results based on local and global features:
  - requires no supervision
  - retains the efficiency of both methods
  - improves the retrieval precision consistently on the 4 datasets
  - easy for other motivated researchers to reproduce
- Limitations:
  - some queries have no reciprocal neighbors in either method
  - dynamic insertion or removal of database images