Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data Manik Varma Microsoft Research India Vidit Jain Yahoo! Labs India.


1 Learning to Re-rank: Query-Dependent Image Re-Ranking using Click Data. Manik Varma (Microsoft Research India), Vidit Jain (Yahoo! Labs India).

2 Keyword-based image search. Best actress, Academy Awards 2011: Natalie Portman. Highest h-index in computer science: Scott Shenker, professor at UC Berkeley.

3 Query: Scott Shenker. Similar results from other search engines.

4 Our holy grail

5 In this talk: huge improvements in performance for tail queries; only the raw click data; efficient use of visual content.

6 Limitation 1: Visual content is ignored. Simple visual processing could improve results.

7 Limitation 2: Static ranker within a vertical. $\mathrm{Score}(x) = w^{t} x$ with a fixed $w$. Example queries: taj mahal, delhi. Example images: delhi.jpg, tajmahal.jpg.

8 Limitation 3: Noisy training labels. Query: night train.

9 Overview of our solution (diagram): query, ranker, original ranked list, clicked images, pseudo-click estimation (f*_text, f*_visual), re-ranked list.

10 Novelty in our use of click data. Existing work for text document search: linear scanning of ranked list; relationship between relevance and clicks [Joachims07, Radlinski08, Agrawal09, Cutrell07, …]. Our contribution for image search: (challenge) 2D display of results; (challenge) no model for browsing/click behavior; use only raw click data.

11 Evidence for clicks-relevance relationship (short snippet)(thumbnail) [Agichtein06]

12 Naïve solution: ClickBoosting. Drawbacks: rich gets richer; the curious case of distracting images; little gain when only a few were clicked. (Figure: original ranked list vs. click-boosted ranked list.)
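The slide does not spell out how ClickBoosting reorders results, so the following is only a minimal sketch under one assumed reading: clicked images are promoted to the front in order of click count, and everything else keeps its original order. The function name `click_boost` and the toy data are hypothetical.

```python
def click_boost(ranked_ids, clicks):
    """Promote clicked items to the front, most-clicked first.

    ranked_ids: item ids in their original rank order.
    clicks: dict mapping item id -> click count (missing ids count as 0).
    Python's sort is stable, so items with equal click counts
    (including all unclicked items) keep their original relative order.
    """
    return sorted(ranked_ids, key=lambda item: -clicks.get(item, 0))


# Toy example (made-up ids and counts, not data from the talk).
original = ["a", "b", "c", "d", "e"]
clicks = {"c": 5, "e": 2}
boosted = click_boost(original, clicks)
print(boosted)
```

With only two clicked items, the remaining three positions are unchanged, which illustrates the "little gain when only a few were clicked" point on the slide.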

13 Pseudo-click estimation: Regression. Query: night train (per-image #clicks shown in the figure).

14 Visual features are not enough: both visual and text features are needed. Query: night train (per-image #clicks shown in the figure); image label: night rod.

15 Re-ranking function. Given a query: compute text features (query dependent) and visual features (color, texture, shape; query independent); regression on each yields $y_{text}$ and $y_{visual}$. Score: $s_R(x) = a_1 s_O(x) + a_2 y_{text}(x) + a_3 y_{visual}(x)$, where $s_O$ is the original ranker's score and $s_R$ is the re-ranking score.
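The re-ranking score is a weighted sum of the original score and the two pseudo-click estimates. A minimal sketch of that combination follows; the weights `a`, the helper names `rescore` and `rerank`, and the toy scores are illustrative assumptions, since the slides do not say how the weights are chosen.

```python
def rescore(s_orig, y_text, y_visual, a=(1.0, 1.0, 1.0)):
    """s_R(x) = a1 * s_O(x) + a2 * y_text(x) + a3 * y_visual(x)."""
    a1, a2, a3 = a
    return a1 * s_orig + a2 * y_text + a3 * y_visual


def rerank(items, a=(1.0, 1.0, 1.0)):
    """items: list of (image_id, s_orig, y_text, y_visual) tuples.

    Returns image ids sorted by decreasing re-ranking score.
    """
    ordered = sorted(items, key=lambda it: -rescore(it[1], it[2], it[3], a))
    return [it[0] for it in ordered]


# Made-up scores for three images.
items = [
    ("img1", 0.9, 0.1, 0.0),
    ("img2", 0.5, 0.8, 0.6),
    ("img3", 0.7, 0.1, 0.1),
]
baseline_order = rerank(items, a=(1.0, 0.0, 0.0))  # a2 = a3 = 0: original ranker only
reranked_order = rerank(items, a=(1.0, 1.0, 1.0))
print(baseline_order, reranked_order)
```

Setting a2 = a3 = 0 reproduces the original ranking, which is the baseline configuration used in the evaluation.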

16 Regression: challenges. #features (~3000) >> #clicked images (~10). Dimensionality reduction: unsupervised, since only positively labeled data is available; and the winner is… PCA! Bayesian formulation of regression: Gaussian Process regression prevents over-fitting to a small number of examples. Non-optimized Matlab: 20 ms for a query with 20 clicked images.

17 Development set for experiments. Bing: top 1000 retrieved images for 19 tail queries, e.g., gnats, Bern, fracture, child drinking water. Relevance: highly relevant, relevant, non-relevant; referred to parent documents when needed; used only for evaluation, not training. Evaluation: nDCG@20 (normalized discounted cumulative gain).
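To make the dimensionality problem concrete, here is a rough sketch of PCA on a feature matrix with about 10 clicked images and about 3000 features, computed with NumPy's SVD. The random data, the choice of 5 components, and the helper names `pca_fit` and `pca_transform` are assumptions for illustration; the slides do not state how many components were kept.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the principal directions."""
    return (X - mean) @ components.T


rng = np.random.default_rng(0)
X_clicked = rng.normal(size=(10, 3000))  # stand-in for clicked-image features
mean, comps = pca_fit(X_clicked, n_components=5)
Z = pca_transform(X_clicked, mean, comps)
print(Z.shape)  # (10, 5)
```

Because PCA needs no labels, it fits the setting on the slide where only positively labeled (clicked) examples are available.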

18 Pseudo-click estimation: Regression. Baseline: mean nDCG@20 for Bing = 0.6854.
Approach             Mean nDCG@20   Relative improvement
Linear Regression    0.6871         +0.2%
GP Regression        0.7692         +12.2%
SVR and 1-NN performance lies in between.

19 Re-scoring Function: $s_R(x) = a_1 s_O(x) + a_2 y_{Text}(x) + a_3 y_{Visual}(x)$
Approach                         Mean nDCG@20   Relative improvement
Baseline (a_2 = a_3 = 0)         0.6854         -
Baseline + y_Text (a_3 = 0)      0.7077         +3.3%
Baseline + y_Visual (a_2 = 0)    0.6136         -10.5%
Baseline + y_Text + y_Visual     0.7692         +12.2%
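The relative-improvement column appears to be the percentage change in mean nDCG@20 over the Bing baseline of 0.6854. A small sketch that recomputes it from the reported scores (the helper name `relative_improvement` is hypothetical):

```python
BASELINE = 0.6854  # mean nDCG@20 reported for Bing


def relative_improvement(score, baseline=BASELINE):
    """Percentage change of score relative to the baseline."""
    return 100.0 * (score - baseline) / baseline


reported = {
    "Baseline + y_Text": 0.7077,
    "Baseline + y_Visual": 0.6136,
    "Baseline + y_Text + y_Visual": 0.7692,
}
improvements = {name: round(relative_improvement(s), 1) for name, s in reported.items()}
for name, pct in improvements.items():
    print(f"{name}: {pct:+.1f}%")
```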

20 Evaluation on 193 Queries

21

22 Query: fracture. Bing vs. our results. Clicks predominantly (~92%) on images of bone fracture.

23 Query: gnats. Bing vs. our results. Re-ranked list has only highly relevant images.

24 Query: camel caravan. Bing vs. our results. Anecdotally, our results were perceived as more visually pleasant.

25 Query: turkey. Bing vs. our results. Multiple interpretations are retained if manifested by clicks (per-image #clicks shown in the figure).

26 Query: Stargate (1994). Bing vs. our results. Leads to visually diverse results for some queries.

27 Conclusions. Significant improvement in nDCG@20 over a commercial image ranking system. Use of raw click data. Addresses three limitations of existing search engines: incorporates visual features; uses user clicks to handle noisy expert labels; query-dependent re-ranking using GP regression.

28 Thank You!

29 Additional slides

30 Measuring Search Performance: nDCG. Given a ranked list of relevance judgments $R$:
Cumulative Gain at $P$: $CG_P(R) = \sum_{i=1}^{P} (2^{R_i} - 1)$
Discounted Cumulative Gain: $DCG_P(R) = \sum_{i=1}^{P} (2^{R_i} - 1) / \log_2(i+1)$
Normalized Discounted Cumulative Gain: $nDCG_P(R) = DCG_P(R) / DCG_P(I)$, where $I$ is the judgment list for the ideal ranked list.
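The nDCG definition above can be sketched in a few lines. Two assumptions are made here: the three relevance levels are mapped to 2 (highly relevant), 1 (relevant), and 0 (non-relevant), and the ideal list is obtained by sorting the same judgments in decreasing order.

```python
import math


def dcg_at(rels, p):
    """DCG_P(R) = sum over i = 1..P of (2^R_i - 1) / log2(i + 1)."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels[:p], start=1))


def ndcg_at(rels, p):
    """nDCG_P(R) = DCG_P(R) / DCG_P(I), with I the ideally ordered judgments."""
    ideal = dcg_at(sorted(rels, reverse=True), p)
    return dcg_at(rels, p) / ideal if ideal > 0 else 0.0


# Toy judgments: 2 = highly relevant, 1 = relevant, 0 = non-relevant.
perfect = ndcg_at([2, 1, 0], 3)
reversed_order = ndcg_at([0, 1, 2], 3)
print(perfect, reversed_order)
```

A perfectly ordered list scores 1, and any list that places less relevant images ahead of more relevant ones scores below 1.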

31 Click Estimation - Dimensionality Reduction. We only have positive training data, so discriminative methods did not work well (generating negative training data is non-trivial). Simple methods did work well.
Approach                  Mean nDCG@20   Relative improvement
Average click rank        0.6266         -8.6%
Correlation with score    0.7209         +5.2%
Correlation with clicks   0.7409         +8.1%
PCA                       0.7692         +12.2%

32 Gaussian Process regression.
$y(x) = k(x, x_{Train}) \, [\, k(x_{Train}, x_{Train}) + \sigma^2 I \,]^{-1} \, y_{Train} = d^{t}(x, x_{Train}) \, y_{Train} = w^{t} \phi(x)$
where $y$ is the predicted number of clicks and $y_{Train}$ the number of clicks for the set of training images; $x$ are the features extracted from a novel image; $x_{Train}$ are the training set features; $\sigma$ is a noise parameter; $k$ is a Gaussian kernel function.
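The predictive mean on this slide can be sketched directly with NumPy. This is not the authors' Matlab code; the kernel length scale, the noise value, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, length_scale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))


def gp_predict(X_train, y_train, X_new, noise=0.1, length_scale=1.0):
    """Predictive mean: k(x, X_train) [k(X_train, X_train) + noise^2 I]^-1 y_train."""
    K = gaussian_kernel(X_train, X_train, length_scale)
    # Solve the linear system instead of forming the inverse explicitly.
    alpha = np.linalg.solve(K + noise ** 2 * np.eye(len(X_train)), y_train)
    return gaussian_kernel(X_new, X_train, length_scale) @ alpha


# Toy 1-D features with made-up click counts.
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([1.0, 3.0, 2.0])
pred_train = gp_predict(X_train, y_train, X_train, noise=1e-3)
pred_far = gp_predict(X_train, y_train, np.array([[100.0]]), noise=1e-3)
print(pred_train, pred_far)
```

With a small noise term the prediction nearly interpolates the training click counts, while an image far from every training example in feature space gets a predicted click count near zero.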

33 Click Estimation - Regression (relative improvement over the Bing baseline of 0.6854)
Approach                    Mean nDCG@20   Relative improvement
Linear Regression           0.6871         +0.2%
Support Vector Regression   0.6997         +2.1%
Nearest Neighbour           0.7428         +8.3%
GP Regression               0.7692         +12.2%

