
1 Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos Yiming Liu, Dong Xu, Ivor W. Tsang, Jiebo Luo Nanyang Technological University & Kodak Research Lab

2 Motivation Digital cameras and mobile phone cameras are spreading rapidly: –More and more personal photos; –Retrieving images from enormous collections of personal photos becomes an important topic. How can these photos be retrieved?

3 Previous Work Content-Based Image Retrieval (CBIR) –Users provide images as queries to retrieve personal photos. The paramount challenge -- the semantic gap: –The gap between low-level visual features and high-level semantic concepts. [Diagram: a query image carrying a high-level concept is mapped to a low-level feature vector and compared against the feature vectors in the database; the semantic gap lies between the two levels.]

4 A More Natural Way For Consumer Applications Let the user retrieve the desired personal photos using textual queries. Image annotation is used to classify images w.r.t. high-level semantic concepts. –Semantic concepts are analogous to the textual terms describing document contents. –Annotation serves as an intermediate stage for textual query based image retrieval. [Diagram: a textual query such as "Sunset" is compared with the high-level concept annotations of the database photos, and the photos are ranked accordingly.]

5 Our Goal Web images are accompanied by contextual information such as tags, categories, and titles (e.g. "building", "people, family", "people, wedding", "sunset"). Leverage this information from web images to retrieve consumer photos in personal photo collections. –No intermediate image annotation process; –A real-time textual query based consumer photo retrieval system without any intermediate annotation stage.

6 System Framework When the user provides a textual query, it is used, together with WordNet, to find relevant/irrelevant images in a large collection of web images with descriptive words. A classifier is then trained on these web images, and the raw consumer photos are ranked by the classifier's decision values. The user can also give relevance feedback to refine the top-ranked consumer photos.

7 Automatic Web Image Retrieval For the user's textual query (e.g. "boat"), first search it in the semantic word trees built from WordNet (e.g. boat → ark, barge, dredger, houseboat, …). Using an inverted file, the web images containing the query word are taken as "relevant web images". The web images containing neither the query word nor its two-level WordNet descendants are taken as "irrelevant web images".
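The relevant/irrelevant split above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `inverted_file` (word → set of image ids) and `descendants` (a word's WordNet descendants up to a given depth) are assumed helper structures.

```python
# Sketch of the relevant/irrelevant web-image split. Assumed inputs:
# `inverted_file` maps a word to the set of image ids whose description
# contains it; `descendants(word, depth)` returns the word's WordNet
# descendants down to the given depth.

def split_web_images(query, inverted_file, descendants, all_ids):
    # Images whose description contains the query word are "relevant".
    relevant = set(inverted_file.get(query, set()))
    # Exclude images containing the query word or any of its
    # two-level WordNet descendants from the "irrelevant" pool.
    excluded = relevant.copy()
    for word in descendants(query, depth=2):
        excluded |= inverted_file.get(word, set())
    irrelevant = all_ids - excluded
    return relevant, irrelevant
```

An image tagged only with a descendant word (e.g. "ark" for the query "boat") is thus neither relevant nor irrelevant and is simply left out of training.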

8 Decision Stump Ensemble Train a decision stump on each feature dimension. Combine them, weighted by their training error rates.

9 Why Decision Stump Ensemble? Main reason: low time cost. –Our goal: a (quasi) real-time retrieval system; –As base classifiers, SVMs are much slower; –For combination, boosting is also much slower. The advantages of the decision stump ensemble: –Low training cost; –Low testing cost; –Very easy to parallelize.

10 Asymmetric Bagging Imbalance: count(irrelevant) >> count(relevant) –Side effects, e.g. overfitting. Solution: asymmetric bagging. –Build 100 training sets, each pairing all relevant images with a different randomly sampled subset of the irrelevant web images.
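The bagging step can be sketched as below: every bag keeps all relevant examples and draws a fresh balanced subsample of the irrelevant pool; training on each bag is left abstract.

```python
import numpy as np

def asymmetric_bags(relevant, irrelevant, n_bags=100, rng=None):
    """relevant, irrelevant: arrays of examples (irrelevant much larger).
    Returns n_bags balanced (relevant, sampled-irrelevant) training sets."""
    rng = rng or np.random.default_rng(0)
    bags = []
    for _ in range(n_bags):
        # A different random subsample of irrelevant images for each bag.
        idx = rng.choice(len(irrelevant), size=len(relevant), replace=False)
        bags.append((relevant, irrelevant[idx]))
    return bags
```

One stump ensemble is trained per bag, and the 100 resulting classifiers are averaged, which damps the overfitting an imbalanced single training set would cause.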

11 Relevance Feedback The user labels n_l relevant or irrelevant consumer photos. –Use this information to further refine the retrieval results. Challenge 1: usually n_l is small. Challenge 2: cross-domain learning. –The source classifier is trained on the web image domain; –The user labels some personal photos.

12 Method 1: Cross-Domain Combination of Classifiers Re-train classifiers with data from both domains? –Neither effective nor efficient. A simple but effective method: –Train an SVM on the consumer photo domain with the user-labeled photos; –Convert the responses of the source classifier and the SVM classifier to probabilities, and add them up; –Rank consumer photos by this sum. Referred to as DS_S+SVM_T.
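A sketch of this combination, with a plain sigmoid standing in for the score-to-probability conversion (the slide does not specify the mapping; Platt-style scaling would be a natural choice in practice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def combined_ranking(ds_scores, svm_scores):
    """ds_scores: source decision-stump ensemble outputs per photo;
    svm_scores: target-domain SVM outputs. Rank by summed probabilities."""
    prob = sigmoid(ds_scores) + sigmoid(svm_scores)
    return np.argsort(-prob)          # photo indices, best first
```

Summing probabilities rather than raw decision values keeps the two classifiers on a comparable scale even though they were trained on different domains.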

13 Method 2: Cross-Domain Regularized Regression (CDRR) Construct a linear regression function f_T(x): –For labeled photos: f_T(x_i) ≈ y_i; –For unlabeled photos: f_T(x_i) ≈ f_s(x_i), where f_s is the source classifier.

14 Design a target linear classifier f_T(x) = w^T x. –For the user-labeled images x_1, …, x_l, f_T(x) should match the user's label y(x); –For the other images, f_T(x) should match the source classifier output f_s(x); –A regularizer controls the complexity of the target classifier f_T(x). This problem can be solved with a least-squares solver.
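The objective above is quadratic in w and has a closed-form solution. A minimal NumPy sketch, with illustrative trade-off weights lam_u and lam_r (assumptions, not the paper's values):

```python
import numpy as np

def cdrr(X_l, y_l, X_u, fs_u, lam_u=1.0, lam_r=0.1):
    """Minimize ||X_l w - y_l||^2 + lam_u ||X_u w - fs_u||^2 + lam_r ||w||^2.
    X_l, y_l: user-labeled photos and labels; X_u, fs_u: unlabeled photos
    and the source classifier's outputs on them."""
    d = X_l.shape[1]
    # Normal equations of the regularized least-squares problem.
    A = X_l.T @ X_l + lam_u * (X_u.T @ X_u) + lam_r * np.eye(d)
    b = X_l.T @ y_l + lam_u * (X_u.T @ fs_u)
    return np.linalg.solve(A, b)      # weight vector w of f_T(x) = w^T x
```

Solving one d×d linear system (d = 103 after PCA) per feedback round is what keeps this method fast enough for interactive use.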

15 Hybrid Method A combination of the two methods. For labeled consumer photos: –Measure the average distance d_avg to their 30 nearest unlabeled neighbors in feature space; –If d_avg < ε: use DS_S+SVM_T; –Otherwise: use CDRR. Reason: –Consumer photos that are visually similar to user-labeled images should be influenced more by the user-labeled images.
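The switching rule can be sketched as below; the threshold `eps` is a tunable parameter (its value here is an assumption for illustration).

```python
import numpy as np

def choose_method(labeled, unlabeled, k=30, eps=0.5):
    """labeled: (l, d) user-labeled photo features; unlabeled: (n, d).
    Pick a feedback method from the average k-nearest-neighbor distance."""
    dists = []
    for x in labeled:
        d = np.linalg.norm(unlabeled - x, axis=1)
        dists.append(np.sort(d)[:k].mean())   # mean of k nearest distances
    d_avg = np.mean(dists)
    return "DS_S+SVM_T" if d_avg < eps else "CDRR"
```

When the labeled photos sit close to the unlabeled pool, the SVM trained on them generalizes well, so the classifier-combination method is preferred; otherwise CDRR's pull toward the source classifier is safer.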

16 Experimental Results

17 Dataset and Experimental Setup Web image database: –1.3 million photos from photoSIG; –Relatively professional photos. Text descriptions for web images: –Title, portfolio, and categories accompanying the web images; –Remove the common high-frequency words; –Remove the rarely used words; –Finally, 21,377 words in our vocabulary.

18 Dataset and Experimental Setup Testing Dataset #1: Kodak dataset –Collected by Eastman Kodak Company: from about 100 real users, over a period of one year. –1,358 images: the first keyframe from each video. –21 concepts: we merge "group_of_two" and "group_of_three_or_more" into one concept.

19 Dataset and Experimental Setup Testing Dataset #2: Corel dataset –4,999 images, each 192×128 or 128×192. –43 concepts: we remove all concepts with fewer than 100 images.

20 Visual Features Grid-based color moments (225D): –Three moments of three color channels from each block of a 5×5 grid. Edge direction histogram (73D): –72 edge direction bins plus one non-edge bin. Wavelet texture (128D). Concatenate all three kinds of features: –Normalize each dimension to mean 0, standard deviation 1; –Use the first 103 principal components.
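The feature pipeline above can be sketched in plain NumPy (PCA via SVD here is an illustrative stand-in for whatever implementation the authors used):

```python
import numpy as np

def preprocess(color, edge, wavelet, n_components=103):
    """color: (n, 225), edge: (n, 73), wavelet: (n, 128) descriptors.
    Concatenate, standardize each dimension, keep 103 principal components."""
    X = np.hstack([color, edge, wavelet])            # (n, 426)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # PCA of the standardized (zero-mean) data via SVD.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T                   # (n, 103)
```

Reducing 426 raw dimensions to 103 components keeps the per-dimension stump training and the CDRR linear system small.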

21 Retrieval without Relevance Feedback For all concepts: –Average number of relevant images: 3703.5.

22 Retrieval without Relevance Feedback kNN: rank consumer photos by their average distance to the 300 nearest neighbors among the relevant web images. DS_S: our decision stump ensemble.

23 Retrieval without Relevance Feedback Time cost: –We use OpenMP to parallelize our method; –With 8 threads, both methods achieve interactive speed; –But kNN is expected to be much more costly on large-scale datasets.

24 Retrieval with Relevance Feedback In each round, the user labels at most 1 positive and 1 negative image in the top 40. Methods for comparison: –kNN_RF: add user-labeled photos to the relevant image set, and re-apply kNN; –SVM_T: train an SVM on the user-labeled images in the target domain; –A-SVM: Adaptive SVM; –MR: Manifold Ranking based relevance feedback.

25 Retrieval with Relevance Feedback Setting of y(x) for CDRR: –Positive: +1.0; –Negative: -0.1. Reason: –The top-ranked negative images are not extremely negative; –Positive feedback says "what it is"; negative feedback only says "what it is not".

26 Retrieval with Relevance Feedback On Corel dataset:

27 Retrieval with Relevance Feedback On Kodak dataset:

28 Retrieval with Relevance Feedback Time cost: –All methods except A-SVM can achieve real-time speed.

29 System Demonstration

30 Query: Sunset

31 Query: Plane

32 The User Provides Relevance Feedback …

33 After 2 positive and 2 negative feedback labels…

34 Summary Our goal: (quasi) real-time textual query based consumer photo retrieval. Our method: –Use web images and their surrounding text descriptions as an auxiliary database; –Asymmetric bagging with decision stumps; –Several simple but effective cross-domain learning methods to help relevance feedback.

35 Future Work How to efficiently use more powerful source classifiers? How to further improve the speed: –Keep training time within 1 second; –Control testing time when the consumer photo set is very large.

36 Thank you! Any questions?

