Unsupervised Auxiliary Visual Words Discovery for Large-Scale Image Object Retrieval Yin-Hsi Kuo1,2, Hsuan-Tien Lin 1, Wen-Huang Cheng 2, Yi-Hsuan Yang.


Unsupervised Auxiliary Visual Words Discovery for Large-Scale Image Object Retrieval
Yin-Hsi Kuo 1,2, Hsuan-Tien Lin 1, Wen-Huang Cheng 2, Yi-Hsuan Yang 1, and Winston H. Hsu 1
1 National Taiwan University and 2 Academia Sinica, Taipei, Taiwan
CVPR 2011

Outline
– Introduction
– Key Observations: the problems of the BoW model
– Graph construction and Image Clustering
– Semantic Visual Features Propagation
– Common Visual Words Selection
– Solution & Optimization
  – Gradient Descent Solver
  – Analytic Solver
– Experiment and Result
– Conclusion & Future Work

Introduction
Image object retrieval, i.e. retrieving images that contain the target image object, is one of the key techniques for managing exponentially growing image/video collections
It is a challenging problem because the target may cover only a small region of the image
Figure: a query image and its retrieval result

Introduction
Although BoW is popular and has been shown to be effective for image object retrieval [14], BoW-like methods fail to address issues related to:
☻ Noisily quantized visual features
☻ Vast variations in viewpoints
☻ Lighting conditions
☻ Occlusions
Thus BoW suffers from a low recall rate

Traditional BoW vs. Proposed

Introduction
The contributions of this paper:
☺ Observing two problems of the conventional BoW model in large-scale image object retrieval
☺ Proposing auxiliary visual words (AVW) discovery through visual and textual clusters in an unsupervised and scalable fashion
☺ Investigating different optimization methods for efficiency and accuracy in AVW discovery
☺ Conducting experiments on consumer photos and showing an improved recall rate for image object retrieval

Prob. 1: Sparseness of the Visual Words
Total of 540,321 images in the Flickr 550 dataset
– Half of the VWs occur in less than 0.11% of the images (57 images)
– Most (96%) VWs occur in about 0.5% of the images (2702 images)
As a result, similar images have very few VWs in common
This is known as the uniqueness of VWs [2]
It is partly due to quantization errors or noisy features
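The sparseness statistics above are document frequencies: for each VW, the number of images it occurs in. A minimal sketch of that count on a toy collection, assuming each image is given as a list of VW ids (the function name, the data, and the 25% threshold are all hypothetical, not from the slides):

```python
from collections import Counter

def document_frequency(images):
    """Count, for every visual word id, the number of images containing it."""
    df = Counter()
    for words in images:
        df.update(set(words))  # count each VW once per image
    return df

# Toy collection: four images, each a list of VW ids (hypothetical data).
images = [[1, 2, 2, 7], [2, 3], [2, 7, 9], [4]]
df = document_frequency(images)
n_images = len(images)

# VWs that occur in at most 25% of the images (an arbitrary toy threshold).
rare = sorted(w for w, c in df.items() if c / n_images <= 0.25)

# VWs shared by the first two images: the "common VWs" of a similar pair.
common = set(images[0]) & set(images[1])
print(rare, common)
```

On a real collection the same count would be run over the full VW vocabulary, which is where the long tail of rarely occurring VWs shows up.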

Prob. 2: Lack of Semantics-Related Features

Graph construction and Image Clustering
Image clustering is based on graph construction
Images are represented by 1M VWs and by 90K text tokens obtained from Google snippets of their associated tags
The large-scale image graph is constructed with the MapReduce [4] algorithm (large-scale computation)

Graph construction and Image Clustering
To cluster images on the image graph, we apply Affinity Propagation (AP) [5]
AP's advantages:
– Automatically determines the number of clusters
– Automatically detects the canonical image within each cluster

Graph construction and Image Clustering
Apply the Affinity Propagation algorithm to both the textual and the visual relations

Semantic Visual Features Propagation
Conduct the propagation on each extended visual cluster (Fig. b)
Even if a visual cluster contains only a single image (Fig. b, point H), that image can still obtain AVWs in the extended visual cluster
The VW histograms X are given and the propagation matrix P is unknown (X_i is the VW histogram of image i)

Semantic Visual Features Propagation
Propose to formulate propagation as a minimization over P with two terms
First term: avoid propagating too many VWs
Second term: keep similarity to the original propagation matrix
The norm used is the Frobenius (Euclidean) norm
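A sketch of an objective consistent with the two terms described on this slide. The weights $\alpha_1, \alpha_2 > 0$ are assumed notation (the Optimization slide refers to $\alpha_2$), $P_0$ denotes the original propagation matrix, and any normalization constants are omitted, so this is an illustration rather than the slide's exact equation:

```latex
f_P \;=\; \min_{P}\; \alpha_1 \,\lVert P X \rVert_F^2 \;+\; \alpha_2 \,\lVert P - P_0 \rVert_F^2
```

The first term penalizes the total amount of propagated features $PX$; the second keeps $P$ close to $P_0$.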

Common Visual Words Selection

Common Visual Words Selection
Let X be the VW combinations and S the (unknown) selection matrix
Propose to formulate selection as a minimization over S with two terms
First term: avoid too much distortion from the original features
Second term: reduce the number of selected features
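A sketch of a selection objective consistent with the two terms above. The weights $\beta_1, \beta_2 > 0$ and the initial selection matrix $S_0$ are assumed notation, and normalization constants are omitted:

```latex
f_S \;=\; \min_{S}\; \beta_1 \,\lVert X S_0 - X S \rVert_F^2 \;+\; \beta_2 \,\lVert S \rVert_F^2
```

The first term measures the distortion between the originally selected features $XS_0$ and the newly selected features $XS$; the second shrinks $S$ so that fewer features are retained.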

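Finding Solutions
Stack the columns of P into a vector $p = \mathrm{vec}(P)$, and likewise $p_0 = \mathrm{vec}(P_0)$
Replace $\mathrm{vec}(PX)$ with $(X^T \otimes I_M)\,p$, where $\otimes$ is the Kronecker product
The propagation function then becomes a quadratic function of p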

Kronecker product
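The vectorization step can be checked numerically. The sketch below verifies the identity vec(PX) = (X^T ⊗ I_M) vec(P) on random toy matrices, assuming P is M×M and X is M×D with one row per image (the sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
M, D = 3, 5                      # toy sizes: M images in a cluster, D visual words
P = rng.standard_normal((M, M))  # propagation matrix
X = rng.standard_normal((M, D))  # VW histograms, one row per image

def vec(A):
    """Stack the columns of A into a single vector (column-major order)."""
    return A.reshape(-1, order="F")

K = np.kron(X.T, np.eye(M))      # Kronecker product, shape (M*D, M*M)
lhs = vec(P @ X)
rhs = K @ vec(P)
print(K.shape, np.allclose(lhs, rhs))
```

The Kronecker matrix has M·D rows and M² columns, which hints at why forming it explicitly becomes expensive when D is the full 1M-word vocabulary.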

Optimization
The first term of (5) is positive semi-definite
The second term of (5) is positive definite because α2 > 0
So the propagation function has a unique optimal solution
The same holds for the selection function
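The definiteness argument can be written out as follows, assuming the propagation objective has the form $\alpha_1\lVert PX\rVert_F^2 + \alpha_2\lVert P - P_0\rVert_F^2$ (an assumed form, with $A = X^T \otimes I_M$ and $p = \mathrm{vec}(P)$):

```latex
\begin{aligned}
f(p) &= \alpha_1 \lVert A p \rVert_2^2 + \alpha_2 \lVert p - p_0 \rVert_2^2, \\
\nabla f(p) &= 2\alpha_1 A^T A\, p + 2\alpha_2 (p - p_0), \\
H &= 2\alpha_1 A^T A + 2\alpha_2 I, \qquad A^T A = X X^T \otimes I_M \succeq 0 .
\end{aligned}
```

Since $A^T A$ is positive semi-definite and $2\alpha_2 I$ is positive definite for $\alpha_2 > 0$, their sum $H$ is positive definite, so $f$ is strictly convex and its minimizer is unique.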

Optimization
The two equations are strictly convex quadratic programming problems
A quadratic programming solver can therefore find the optimal solutions
Two solvers are used for evaluation:
– Gradient Descent Solver
– Analytic Solver

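Gradient Descent Solver
Update p by a gradient step on the propagation function; η is called the learning rate
Calculating the gradient in this vectorized form is time-consuming, so the function is rearranged

Gradient Descent Solver
Finally, get the update rule for P
The initial P is P0
Do the same for the selection formula to get the update rule for S, but with the initial S set to the zero matrix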
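A minimal sketch of the gradient descent solver in matrix form, which avoids building the Kronecker product. It assumes the objectives a1·||PX||² + a2·||P − P0||² and b1·||XS0 − XS||² + b2·||S||² (assumed forms without normalization). The function names, the toy data, the choice of P0 and S0, and the step size (one over the Lipschitz constant of the gradient) are all illustrative choices, not taken from the slides:

```python
import numpy as np

def gd_propagation(X, P0, a1=0.5, a2=0.5, iters=500):
    """Minimize a1*||P X||_F^2 + a2*||P - P0||_F^2 by gradient descent."""
    G = X @ X.T                                           # M x M Gram matrix, computed once
    eta = 1.0 / (2 * (a1 * np.linalg.norm(G, 2) + a2))    # step = 1 / Lipschitz constant
    P = P0.copy()                                         # the initial P is P0
    for _ in range(iters):
        grad = 2 * a1 * P @ G + 2 * a2 * (P - P0)
        P = P - eta * grad
    return P

def gd_selection(X, S0, b1=0.5, b2=0.5, iters=500):
    """Minimize b1*||X S0 - X S||_F^2 + b2*||S||_F^2 by gradient descent."""
    C = X.T @ X                                           # D x D; only feasible for toy D
    eta = 1.0 / (2 * (b1 * np.linalg.norm(C, 2) + b2))
    S = np.zeros_like(S0)                                 # the initial S is the zero matrix
    for _ in range(iters):
        grad = 2 * b1 * C @ (S - S0) + 2 * b2 * S
        S = S - eta * grad
    return S

# Toy data (hypothetical): 4 images, 6 visual words.
rng = np.random.default_rng(0)
M, D = 4, 6
X = rng.random((M, D))
P0 = np.full((M, M), 1.0 / M)   # hypothetical initial propagation matrix
S0 = np.eye(D)                  # hypothetical initial selection matrix
P_gd = gd_propagation(X, P0)
S_gd = gd_selection(X, S0)
```

A fixed, conservative step size keeps the sketch simple; the slides' learning rate η may have been chosen differently.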

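Analytic Solver
The optimal solution should satisfy the zero-gradient condition
From eq. (4), this condition can be expressed with H, the positive definite Hessian matrix, so H is invertible and the optimal p follows directly
Converting back to matrix form gives the optimal P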

Analytic Solver
Similarly, S can be solved in closed form
Using a matrix inverse identity, S can be expressed with $XX^T$ instead of $X^T X$
$X^T X$ is 1M×1M, but $XX^T$ is much smaller (time saving)
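A sketch of the closed-form solutions, assuming the objectives a1·||PX||² + a2·||P − P0||² and b1·||XS0 − XS||² + b2·||S||² (assumed forms without normalization). Setting the gradients to zero gives P = a2·P0·(a1·XX^T + a2·I)^{-1} and S = (b1·X^TX + b2·I)^{-1}·b1·X^TX·S0; the identity (b1·X^TX + b2·I)^{-1}X^T = X^T(b1·XX^T + b2·I)^{-1} then moves the inverse from a D×D to an M×M matrix. Function names and toy data are hypothetical:

```python
import numpy as np

def solve_propagation(X, P0, a1=0.5, a2=0.5):
    """Closed-form minimizer of a1*||P X||_F^2 + a2*||P - P0||_F^2."""
    M = X.shape[0]
    H = a1 * (X @ X.T) + a2 * np.eye(M)      # M x M, positive definite since a2 > 0
    # P = a2 * P0 @ inv(H); H is symmetric, so solve H Z = P0^T and transpose.
    return a2 * np.linalg.solve(H, P0.T).T

def solve_selection(X, S0, b1=0.5, b2=0.5):
    """Closed-form minimizer of b1*||X S0 - X S||_F^2 + b2*||S||_F^2."""
    M = X.shape[0]
    K = b1 * (X @ X.T) + b2 * np.eye(M)      # M x M system instead of D x D
    return b1 * X.T @ np.linalg.solve(K, X @ S0)

# Toy data (hypothetical): 4 images, 6 visual words.
rng = np.random.default_rng(1)
M, D = 4, 6
X = rng.random((M, D))
P0 = np.full((M, M), 1.0 / M)
S0 = np.eye(D)
P_star = solve_propagation(X, P0)
S_star = solve_selection(X, S0)

# Both gradients should vanish at the closed-form solutions.
grad_P = 2 * 0.5 * P_star @ (X @ X.T) + 2 * 0.5 * (P_star - P0)
grad_S = 2 * 0.5 * (X.T @ X) @ (S_star - S0) + 2 * 0.5 * S_star
print(np.abs(grad_P).max(), np.abs(grad_S).max())
```

With D on the order of 1M and M the size of one cluster, solving the M×M system is what makes the analytic route practical.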

Experiments
Flickr 550 is used as the main dataset
56 query images are selected (1282 ground truths)
Images are picked from Flickr 550 to form a smaller subset called Flickr 11k

Experiments
Mean Average Precision (MAP) over all queries is used to evaluate performance
The query expansion technique of pseudo-relevance feedback (PRF) is applied
L1 distance is taken as the baseline for the BoW model
The baseline MAP is measured with 22M feature points
MAP after PRF is 0.297
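For reference, a minimal sketch of how MAP over a set of queries can be computed, using the common definition of average precision (precision at each relevant hit, averaged over all ground-truth images of the query). The function names and the toy rankings are hypothetical, and the slides do not state which exact MAP variant was used:

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against its ground truth."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank      # precision at this relevant hit
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    """Mean of the per-query average precisions."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)

# Two toy queries: (ranked result ids, set of ground-truth ids).
results = [
    (["a", "b", "c", "d"], {"a", "c"}),   # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),             # AP = 1/2
]
map_score = mean_average_precision(results)
print(round(map_score, 4))
```

Ground-truth images missing from the ranked list contribute zero, which is how a low recall rate pulls the score down.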

Result and Discussions
Table: MAP of the AVW results with the best iteration number and with PRF on Flickr11K, with 22M (SIFT) feature points in total
Note that the MAP of the baseline BoW model [14] after PRF is 0.297 (+21.2%)
#F represents the total number of features retained; M is short for million
% indicates the relative MAP gain over the BoW baseline

Result and Discussions
Two orderings are compared:
1. Propagation then selection
2. Selection then propagation
Propagation then selection is more accurate, because ordering 2 might lose some common VWs before propagation

Result and Discussions
Only one or two iterations are needed to achieve a better result
– Informative and representative VWs have already been propagated or selected in the early iteration steps
The number of features is significantly reduced from 22.2M to 0.3M (1.4%)
Using α=β=0.5
Table: learning time (s) of the gradient descent solver (GDS) and the analytic solver (AS) for propagation and selection

Search Result by Auxiliary VWs

Result and Discussions From the figure, α=0.6 should work well

Conclusions & Future Work
Conclusions:
– Showed the problems of the current BoW model and the need for semantic visual words to improve the recall rate
– Formulated the process as unsupervised optimization problems
– Improved accuracy by 111% relative to the BoW model
Future Work:
– Look for other solvers to maximize accuracy and efficiency

Thank you