
1 Unsupervised Auxiliary Visual Words Discovery for Large-Scale Image Object Retrieval. Yin-Hsi Kuo 1,2, Hsuan-Tien Lin 1, Wen-Huang Cheng 2, Yi-Hsuan Yang 1, and Winston H. Hsu 1. 1 National Taiwan University and 2 Academia Sinica, Taipei, Taiwan. CVPR 2011

2 Outline
– Introduction
– Key Observations: the problems of the BoW model
– Graph Construction and Image Clustering
– Semantic Visual Features Propagation
– Common Visual Words Selection
– Solution & Optimization: Gradient Descent Solver, Analytic Solver
– Experiments and Results
– Conclusions & Future Work

3 Introduction. Image object retrieval – retrieving the images that contain a target image object – is one of the key techniques for managing exponentially growing image/video collections. It is a challenging problem because the target may cover only a small region of an image. (Slide shows a query image and its retrieval results.)

4 Introduction. Although the bag-of-words (BoW) model is popular and has been shown effective for image object retrieval [14], BoW-like methods fail to address: noisily quantized visual features; vast variations in viewpoint; lighting conditions; occlusions. As a result, BoW suffers from a low recall rate.

5 Traditional BoW vs. the proposed approach (illustration).

6 Introduction. The contributions of this paper:
– Observing two problems in large-scale image object retrieval with the conventional BoW model
– Proposing auxiliary visual word (AVW) discovery through visual and textual clusters in an unsupervised and scalable fashion
– Investigating different optimization methods for efficiency and accuracy in AVW discovery
– Conducting experiments on consumer photos and showing an improved recall rate for image object retrieval

7 Problem 1: Sparseness of visual words. The Flickr550 dataset contains 540,321 images in total.
– Half of the VWs occur in fewer than 0.11% of the images (57 images)
– Most (96%) of the VWs occur in no more than about 0.5% of the images (2,702 images)
Similar images therefore share very few common VWs. This is known as the uniqueness of VWs [2], partly due to quantization errors and noisy features. (A rough way to measure this is sketched below.)
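As an illustration of how such sparseness statistics can be computed, here is a minimal sketch; the function name is hypothetical and it assumes the BoW histograms are stored as a SciPy sparse matrix with one row per image:

```python
import numpy as np

def vw_document_frequency(X):
    """Fraction of images in which each visual word occurs.

    X: sparse (num_images x num_visual_words) BoW count matrix.
    """
    df = np.asarray((X > 0).sum(axis=0)).ravel()  # images containing each VW
    return df / X.shape[0]

# On Flickr550, "half of the VWs occur in < 0.11% of images" would show up as
# np.median(vw_document_frequency(X)) < 0.0011.
```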

8 Problem 2: Lack of semantics-related features.

9 Graph Construction and Image Clustering. Image clustering is based on graph construction. Images are represented by 1M VWs and 90K text tokens obtained from Google snippets for the associated tags. The large-scale image graph is constructed with the MapReduce [4] framework to handle the large-scale computation.
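A minimal single-machine sketch of the graph-construction step (the paper distributes this with MapReduce [4]; the cosine-similarity choice and function names here are illustrative assumptions, not the authors' exact formulation):

```python
from sklearn.preprocessing import normalize

def build_graphs(V, T):
    """Build visual and textual image graphs from sparse histograms.

    V: sparse (n_images x 1M) visual-word histograms.
    T: sparse (n_images x 90K) text-token histograms.
    Returns two (n_images x n_images) affinity matrices.
    """
    Vn, Tn = normalize(V), normalize(T)  # L2-normalize rows
    visual_sim = Vn @ Vn.T               # cosine similarity, visual relation
    textual_sim = Tn @ Tn.T              # cosine similarity, textual relation
    return visual_sim, textual_sim
```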

10 Graph Construction and Image Clustering. To cluster images on the image graph, we apply Affinity Propagation (AP) [5]. AP's advantages:
– It automatically determines the number of clusters
– It automatically detects the canonical (exemplar) image within each cluster
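A minimal sketch of this clustering step using scikit-learn's AffinityPropagation on a precomputed similarity matrix; the library choice is mine, not necessarily the authors' implementation:

```python
from sklearn.cluster import AffinityPropagation

def ap_cluster(sim):
    """Cluster images from a dense (n x n) similarity matrix.

    Returns cluster labels and the index of each cluster's
    exemplar, i.e. the canonical image of that cluster.
    """
    ap = AffinityPropagation(affinity="precomputed", random_state=0)
    labels = ap.fit_predict(sim)
    return labels, ap.cluster_centers_indices_
```

Note that, as the slide states, the number of clusters is not specified in advance; AP infers it from the similarities (and its preference parameter).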

11 Graph Construction and Image Clustering. The Affinity Propagation algorithm is applied to both the textual and the visual relations.

12 Semantic Visual Features Propagation. The propagation is conducted within each extended visual cluster (Fig. b). Even an image that is alone in its visual cluster (point H in Fig. b) can still obtain AVWs through its extended visual cluster. We have the VW histograms X, while the propagation matrix P is unknown (X_i is the VW combination of image i).

13 Semantic Visual Features Propagation. We formulate the propagation as
min_P (1/2) ||P X||_F^2 + (α/2) ||P − P0||_F^2,
where ||·||_F is the Frobenius (matrix Euclidean) norm. The first term avoids propagating too many VWs; the second term keeps P close to the original propagation matrix P0.

14 Common Visual Words Selection

15 Let X be the VW combinations and S the (unknown) selection matrix. We formulate the selection as
min_S (1/2) ||S X − X||_F^2 + (β/2) ||S||_F^2.
The first term avoids too much distortion of the original features; the second term shrinks S and thereby reduces the number of selected features.
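A minimal numpy sketch of the two objectives as reconstructed above; the 1/2 and α/2, β/2 factors follow the convexity discussion on slide 18, and the exact forms should be read as my reconstruction from the slides rather than the paper's verbatim equations:

```python
import numpy as np

def propagation_objective(P, X, P0, alpha):
    # 0.5 * ||P X||_F^2: discourages propagating too many VWs
    # (alpha/2) * ||P - P0||_F^2: stays close to the initial matrix P0
    return (0.5 * np.linalg.norm(P @ X, "fro") ** 2
            + 0.5 * alpha * np.linalg.norm(P - P0, "fro") ** 2)

def selection_objective(S, X, beta):
    # 0.5 * ||S X - X||_F^2: limits distortion of the original features
    # (beta/2) * ||S||_F^2: shrinks S, reducing the selected features
    return (0.5 * np.linalg.norm(S @ X - X, "fro") ** 2
            + 0.5 * beta * np.linalg.norm(S, "fro") ** 2)
```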

16 Finding Solutions. Stack the columns of P into a vector p = vec(P), and likewise p0 = vec(P0). Replace vec(PX) with (X^T ⊗ I_M) p, where ⊗ is the Kronecker product. The propagation function becomes
f(p) = (1/2) ||(X^T ⊗ I_M) p||^2 + (α/2) ||p − p0||^2.

17 Kronecker product. For an m×n matrix A and a p×q matrix B, A ⊗ B is the mp×nq block matrix whose (i, j) block is a_ij B.
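The vec/Kronecker identity used on slide 16, vec(PX) = (X^T ⊗ I_M) vec(P), can be checked numerically; note that vec stacks columns, i.e. column-major order:

```python
import numpy as np

M, N = 4, 6
rng = np.random.default_rng(0)
P = rng.normal(size=(M, M))
X = rng.normal(size=(M, N))

vec = lambda A: A.reshape(-1, order="F")  # stack columns into a vector
lhs = vec(P @ X)
rhs = np.kron(X.T, np.eye(M)) @ vec(P)    # (X^T kron I_M) p
assert np.allclose(lhs, rhs)
```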

18 Optimization. The Hessian of the first term of (5) is positive semi-definite; the second term contributes a positive definite part because α/2 > 0. The propagation function therefore has a unique optimal solution. The same holds for the selection function.

19 Optimization. Both formulations are strictly convex quadratic programming problems, so a quadratic programming solver can find their optimal solutions. Two solvers are evaluated:
– Gradient Descent Solver
– Analytic Solver

20 Gradient Descent Solver. Update p by p ← p − η ∇f(p), where η is the learning rate. Computing the gradient directly in the vectorized Kronecker form is time-consuming; rearranging with vec(PX) = (X^T ⊗ I_M) vec(P) gives the gradient back in matrix form:
∇f = P X X^T + α (P − P0).

21 Gradient Descent Solver. Finally, we obtain the matrix-form update
P ← P − η (P X X^T + α (P − P0)),
with P initialized to P0. The same derivation for the selection formula gives
S ← S − η ((S X − X) X^T + β S),
but with S initialized to the zero matrix.
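A minimal sketch of the matrix-form gradient-descent updates as reconstructed above; the learning rate η and the fixed iteration count are assumptions (the slides later note that one or two iterations already suffice):

```python
import numpy as np

def gd_propagation(X, P0, alpha, eta=1e-3, iters=2):
    P = P0.copy()                            # initial P is P0
    for _ in range(iters):
        grad = P @ X @ X.T + alpha * (P - P0)
        P -= eta * grad
    return P

def gd_selection(X, beta, eta=1e-3, iters=2):
    M = X.shape[0]
    S = np.zeros((M, M))                     # initial S is the zero matrix
    for _ in range(iters):
        grad = (S @ X - X) @ X.T + beta * S
        S -= eta * grad
    return S
```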

22 Analytic Solver. The optimal solution must satisfy ∇f(p) = 0. From Eq. (4) this can be written as H p = α p0, where H = (X X^T ⊗ I_M) + α I is the positive definite Hessian matrix, so p = α H^{-1} p0. Back in matrix form,
P = α P0 (X X^T + α I)^{-1}.

23 Analytic Solver. Similarly, S can be solved from (S X − X) X^T + β S = 0; using the matrix inverse,
S = X X^T (X X^T + β I)^{-1}.
Note that X^T X is 1M×1M, whereas X X^T is much smaller, which saves computation time.
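A minimal sketch of the closed-form solutions as reconstructed above; note that only the small M×M matrix X X^T is factorized (a linear solve is used instead of an explicit inverse for numerical stability):

```python
import numpy as np

def analytic_propagation(X, P0, alpha):
    # P = alpha * P0 * (X X^T + alpha I)^{-1}
    M = X.shape[0]
    G = X @ X.T + alpha * np.eye(M)
    return np.linalg.solve(G.T, (alpha * P0).T).T  # right-division by G

def analytic_selection(X, beta):
    # S = X X^T * (X X^T + beta I)^{-1}
    M = X.shape[0]
    XXt = X @ X.T
    return np.linalg.solve((XXt + beta * np.eye(M)).T, XXt.T).T
```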

24 Experiments. Flickr550 is used as the main dataset. 56 query images are selected (with 1,282 ground-truth images). 10,000 images are picked from Flickr550 to form a smaller subset called Flickr11K.

25 Experiments. Mean Average Precision (MAP) over all queries is used to evaluate performance. The query expansion technique of pseudo-relevance feedback (PRF) is applied. L1 distance is taken as the baseline for the BoW model. The baseline MAP is 0.245 with 22M feature points; after PRF it is 0.297.
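For reference, a minimal sketch of the MAP evaluation using the standard non-interpolated average precision; this is the textbook definition, not the authors' exact evaluation script:

```python
import numpy as np

def average_precision(ranked_ids, relevant_ids):
    """AP of one query: ranked_ids is the retrieval order,
    relevant_ids the ground-truth set."""
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for rank, img_id in enumerate(ranked_ids, start=1):
        if img_id in relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at each hit
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(run, ground_truth):
    """MAP over all queries; run/ground_truth map query id -> lists."""
    return np.mean([average_precision(run[q], ground_truth[q])
                    for q in ground_truth])
```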

26 Results and Discussion. The MAP of the AVW results with the best iteration number and PRF on Flickr11K, which has 22M (SIFT) feature points in total. Note that the MAP of the baseline BoW model [14] is 0.245, and 0.297 after PRF (+21.2%). #F denotes the total number of features retained; M is short for million; % indicates the relative MAP gain over the BoW baseline.

27 Results and Discussion. Two orderings are compared: (1) propagation then selection; (2) selection then propagation. Propagation followed by selection is more accurate, because selecting first may discard some common VWs before they can be propagated.

28 Results and Discussion. Only one or two iterations are needed to achieve better results, since the informative and representative VWs are propagated or selected in the early iteration steps. The number of features is significantly reduced from 22.2M to 0.3M (1.4%). Using α = β = 0.5.
Learning time (s): GDS / AS
– Propagation: 2720 / 123
– Selection: 1468 / 895

29 Search results using auxiliary VWs (example retrieval results).

30 Results and Discussion. From the figure, α = 0.6 works well.

31 Conclusions & Future Work. Conclusions:
– Showed the problems of the current BoW model and the need for semantic visual words to improve the recall rate
– Formulated the discovery process as unsupervised optimization problems
– Improved accuracy by 111% relative to the BoW model
Future work:
– Investigate other solvers to further improve accuracy and efficiency

32 Thank you

