Query Specific Fusion for Image Retrieval

Presentation transcript:

Query Specific Fusion for Image Retrieval
Shaoting Zhang, Ming Yang
NEC Laboratories America

Outline
Overview of image retrieval/search
  Basic paradigm
  Local features indexed by vocabulary trees
  Global features indexed by compact hash codes
Query specific fusion
  Graph construction
  Graph fusion
  Graph-based ranking
Experiments

Content-based Image Retrieval/Search
Scalability is the key concern: computational efficiency, memory consumption, and retrieval accuracy.
Offline: database images -> feature extraction -> hashing -> indexing (hash codes, inverted indexes).
Online: query image -> search -> rank list -> re-rank.

Local Features Indexed by Vocabulary Trees
Features: SIFT features.
Hashing: visual word IDs obtained by hierarchical k-means.
Indexing: vocabulary trees.
Search: voting and sorting.
An example: ~1K SIFT features per image; ~10^6 (1M) leaf nodes in the tree; query time ~100-200 ms for 1M images in the database.
Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR'06
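The indexing-and-voting step above can be sketched with a toy inverted index (hypothetical names such as `build_inverted_index`, not the authors' implementation; a real vocabulary tree quantizes each SIFT descriptor by descending the hierarchical k-means tree and weights votes per level):

```python
from collections import defaultdict
import numpy as np

def build_inverted_index(db_words):
    """db_words[i] is the list of visual-word IDs for database image i.
    Returns word -> {image: term frequency}."""
    index = defaultdict(lambda: defaultdict(int))
    for img, words in enumerate(db_words):
        for w in words:
            index[w][img] += 1
    return index

def search(index, n_images, query_words):
    """Voting and sorting: each query word casts TF-IDF-weighted votes
    for the images it occurs in; images are sorted by accumulated score."""
    scores = np.zeros(n_images)
    for w in set(query_words):
        postings = index.get(w)
        if not postings:
            continue
        idf = np.log(n_images / len(postings))  # rare words vote harder
        for img, tf in postings.items():
            scores[img] += tf * idf
    return np.argsort(-scores)
```

With ~1K words per image, a query only touches the short posting lists of its own words, which is what keeps vocabulary-tree search fast at the million-image scale.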

Global Features Indexed by Compact Hash Codes
Features: GIST, RGB or HSV histograms, etc.
Hashing: compact binary codes, e.g., PCA + rotation + binarization.
Indexing: a flat storage, with or without inverted indexes.
Search: exhaustive search with Hamming distances, followed by re-ranking.
An example: GIST -> PCA -> binarization: 960 floats -> 256 floats -> 256 bits (120 times smaller, assuming 32-bit floats).
Query time: ~50-100 ms to search 1M images using Hamming distance.
Modeling the Shape of the Scene: A Holistic Representation, A. Oliva and A. Torralba, IJCV'01
Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, Y. Weiss, CVPR'08
Iterative Quantization: A Procrustean Approach to Learning Binary Codes, Y. Gong and S. Lazebnik, CVPR'11
C implementation from Lear (INRIA): http://lear.inrialpes.fr/src/lear_gist-1.1.tgz
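A minimal sketch of the PCA -> rotation -> binarization pipeline (hypothetical function names; as a simplification, a random orthogonal rotation stands in for the rotation that ITQ would actually learn):

```python
import numpy as np

def train_hash(features, n_bits=32, seed=0):
    """Learn a hashing projection: center, PCA to n_bits dimensions,
    then apply a random orthogonal rotation before binarization."""
    rng = np.random.default_rng(seed)
    mean = features.mean(axis=0)
    # PCA directions via SVD of the centered data matrix
    _, _, Vt = np.linalg.svd(features - mean, full_matrices=False)
    W = Vt[:n_bits].T
    # Random orthogonal rotation (ITQ would optimize this instead)
    R, _ = np.linalg.qr(rng.normal(size=(n_bits, n_bits)))
    return mean, W @ R

def encode(mean, P, x):
    """Sign-binarize the projected feature and pack the bits into bytes."""
    bits = ((x - mean) @ P > 0).astype(np.uint8)
    return np.packbits(bits)

def hamming(a, b):
    """Hamming distance between two packed codes."""
    return int(np.unpackbits(a ^ b).sum())
```

A 960-D float GIST (30,720 bits at 32-bit precision) compressed to a 256-bit code is a 120x reduction, which is why exhaustive Hamming-distance search over a million images stays in the tens of milliseconds.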

Motivation
Pros and cons of the two paradigms:
Vocabulary trees of local features: fast retrieval, high memory usage, precise for near-duplicates, attends to local patterns; typical application: near-duplicate image retrieval.
Hashing of global features: even faster retrieval, low memory usage, suited to general images, attends to global statistics; typical application: web image search.
Can we combine or fuse these two approaches?
Improve the retrieval precision without sacrificing efficiency.
Early fusion (feature level)? Late fusion (rank-list level)?

Challenges
The features and algorithms are dramatically different: hard for feature-level fusion, and hard for rank aggregation.
The fusion is query specific and database dependent: hard to learn how to combine across different datasets.
No supervision and no relevance feedback: hard to evaluate the retrieval quality online.

Query Specific Fusion
How can we evaluate online the quality of retrieval results from methods using local or global features?
Assumption: the degree of consensus among the top candidate images reveals the retrieval quality, i.e., the consistency of the top candidates' nearest neighborhoods.
A graph-based approach to fusing and re-ranking retrieval results of different methods.

Graph Construction
Construct a weighted undirected graph to represent a set of retrieval results for a query image q.
Given the query q, the image database D, a similarity function S(·,·), and the top-k neighborhood N_k(·):
Edges: the reciprocal-neighbor relation (two images are linked when each appears in the other's top-k neighborhood).
Edge weights: the Jaccard similarity between neighborhoods, w(a, b) = |N_k(a) ∩ N_k(b)| / |N_k(a) ∪ N_k(b)|.
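The construction can be sketched as follows (toy code with hypothetical names; `neighbors[x]` stands for the ranked retrieval list of x, which the method precomputes offline for database images):

```python
def build_graph(neighbors, query, k=5):
    """Weighted undirected graph over the query and its top-k candidates.
    Edge (a, b): a and b are reciprocal top-k neighbors.
    Weight: Jaccard similarity of their k-neighborhoods (each including itself)."""
    def Nk(x):
        return set(neighbors.get(x, [])[:k]) | {x}

    graph = {}
    nodes = [query] + list(neighbors.get(query, [])[:k])
    for a in nodes:
        for b in nodes:
            if a >= b:  # undirected: visit each pair once (ids assumed comparable)
                continue
            if b in Nk(a) and a in Nk(b):  # reciprocal-neighbor test
                w = len(Nk(a) & Nk(b)) / len(Nk(a) | Nk(b))
                graph.setdefault(a, {})[b] = w
                graph.setdefault(b, {})[a] = w
    return graph
```

A candidate that merely matches the query but shares no neighbors with it receives no strong edges, which is how low-quality results are discounted without supervision.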

Graph Fusion
Fuse multiple graphs into one: take the union of their nodes and edges, and sum the weights of shared edges.
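Fusing the local-feature graph and the global-feature graph then reduces to a dictionary merge (a sketch, assuming each graph is stored as a nested dict of edge weights):

```python
def fuse_graphs(graphs):
    """Union of nodes and edges across graphs; weights of edges that
    appear in several graphs are summed, so consensus edges grow heavier."""
    fused = {}
    for g in graphs:
        for a, nbrs in g.items():
            fused.setdefault(a, {})
            for b, w in nbrs.items():
                fused[a][b] = fused[a].get(b, 0.0) + w
    return fused
```

An edge confirmed by both the vocabulary-tree results and the hashing results is counted twice, which is exactly the query-specific consensus signal the method relies on.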

Graph-based Ranking
Ranking by a local PageRank: perform link analysis on G and rank the nodes by their connectivity in G.
Alternatively, ranking by maximizing weighted density.
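The PageRank option can be sketched as a power iteration on the fused graph (illustrative code; assumes the graph is stored symmetrically so every node appears as a key):

```python
import numpy as np

def rank_nodes(graph, damping=0.85, iters=50):
    """Local PageRank: build a column-stochastic transition matrix from the
    edge weights and iterate; well-connected nodes accumulate higher scores."""
    nodes = sorted(graph)
    idx = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    M = np.zeros((n, n))
    for a, nbrs in graph.items():
        total = sum(nbrs.values())
        for b, w in nbrs.items():
            M[idx[b], idx[a]] = w / total  # probability of walking a -> b
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (M @ r)
    return sorted(nodes, key=lambda x: -r[idx[x]])
```

Because the graph only spans the query's top candidates, this "local" PageRank runs over a handful of nodes and adds little to the query time.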

Experiments
Datasets: 4 public benchmarks
UKBench: 2,550 x 4 = 10,200 images (k=5)
Corel-5K: 50 x 100 = 5,000 images (k=15)
Holidays: 1,491 images in 500 groups (k=5)
SFLandmark: 1.06M PCIs and 638K PFIs (k=30)
Baseline methods
Local features: VOC (vocabulary tree with contextual weighting, ICCV'11)
Global features: GIST (960-D -> 256 bits), HSV (2000-D -> 256 bits)
Rank aggregation
A fusion method based on an SVM classifier
The nearest neighbors of database images are computed and stored offline.

UKBench
Evaluation: 4 x recall at the first four returned images, referred to as the N-S score (maximum = 4).
Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR'06
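For reference, the N-S score can be computed as below (a generic sketch, not the authors' evaluation script; `groups[q]` is assumed to be the 4-image group containing query q):

```python
def ns_score(ranked_lists, groups):
    """UKBench N-S score: for each query, count how many of the 4 relevant
    images (the query's group, including the query itself) appear in the
    top-4 results, then average over queries. Equals 4 x recall@4; max is 4."""
    total = 0.0
    for q, ranked in ranked_lists.items():
        total += sum(1 for r in ranked[:4] if r in groups[q])
    return total / len(ranked_lists)
```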

Corel-5K
Corel-5K: 50 categories, each with 100 images.
Evaluation: average top-1 precision for leave-one-out retrievals.

Holidays Evaluation: mAP (%) for 1491 queries.
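Holidays is scored by mean average precision; the per-query average precision can be sketched as follows (the generic definition, not the official Holidays evaluation script, which also excludes the query image itself from the ranking):

```python
def average_precision(ranked, relevant):
    """AP for one query: average of precision@i over the ranks i at which
    a relevant item appears, normalized by the number of relevant items."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, r in enumerate(ranked, start=1):
        if r in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant)
```

mAP is then the mean of these AP values over the 1,491 queries.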

San Francisco Landmark
Database images: perspective central images (PCIs): 1.06M; perspective frontal images (PFIs): 638K.
Query images: 803 images taken with a smart phone.
Evaluation: the recall rate in terms of buildings.

San Francisco Landmark The fusion is applied to the top-50 candidates given by VOC.

Computation and Memory Cost
Average query time: per-dataset timings (table not preserved in this transcript).
Memory cost: 340MB of extra storage for the top-50 nearest neighbors of the 1.7M images in SFLandmark.

Sample Query Results (1): on UKBench

Sample Query Results (2): on Corel-5K

Sample Query Results (3): on SFLandmark

Conclusions
A graph-based, query-specific fusion of retrieval results based on local and global features:
Requires no supervision.
Retains the efficiency of both methods.
Improves the retrieval precision consistently on 4 datasets.
Is easy for other motivated researchers to reproduce.
Limitations:
Certain queries have no reciprocal neighbors in either method.
Dynamic insertion or removal of database images is not handled.