Published by Melanie Cahill. Modified over 3 years ago.

1
An Interactive Learning Approach to Optimizing Information Retrieval Systems
Yahoo!, August 24th, 2010
Yisong Yue, Cornell University

2
Information Systems

3
Designer
– How to collect feedback? What to ask human judges? What results to show users?
– How to interpret feedback? What is the utility function? What are useful signals?
Optimization
– Optimize utility function
– Tune parameters (signals)
Collecting Feedback
– Explicit judgments: Is this document good?
– Usage logs (clicks): clicks on search results

8
Interactive Learning Setting
Find the best ranking function (out of 1000)
– Show results, evaluate using click logs
– Clicks are biased (users only click on what they see)
– Explore / exploit problem
Technical issues
– How to interpret clicks?
– What is the utility function?
– What results to show users?

10
Team-Game Interleaving
Example: (u = thorsten, q = svm)
Ranking $r_1 = f_1(u,q)$:
1. Kernel Machines
2. Support Vector Machine
3. An Introduction to Support Vector Machines
4. Archives of SUPPORT-VECTOR-MACHINES
5. SVM-Light Support Vector Machine light/
Ranking $r_2 = f_2(u,q)$:
1. Kernel Machines
2. SVM-Light Support Vector Machine light/
3. Support Vector Machine and Kernel... References
4. Lucent Technologies: SVM demo applet
5. Royal Holloway Support Vector Machine
Interleaving($r_1$, $r_2$), with the team (T1 or T2) that contributed each result:
1. Kernel Machines (T2)
2. Support Vector Machine (T1)
3. SVM-Light Support Vector Machine light/ (T2)
4. An Introduction to Support Vector Machines (T1)
5. Support Vector Machine and Kernel... References (T2)
6. Archives of SUPPORT-VECTOR-MACHINES... (T1)
7. Lucent Technologies: SVM demo applet (T2)
Interpretation: $(r_1 \succ r_2) \Leftrightarrow \mathrm{clicks}(T_1) > \mathrm{clicks}(T_2)$
Invariant: for all k, in expectation the same number of team members in the top k from each team.
[Radlinski, Kurup, Joachims; CIKM 2008]
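
The interleaving step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `team_draft_interleave` and `preferred_ranking` are hypothetical, and the stopping rule (stop as soon as either ranking has no unused result) is an assumption. The team with fewer picks chooses next, and ties are broken by a coin toss, which is what keeps the two teams balanced in every prefix of the list.

```python
import random


def team_draft_interleave(r1, r2, rng=None):
    """Interleave two rankings; return a list of (document, team) pairs."""
    rng = rng or random.Random()
    rankings = {"T1": r1, "T2": r2}
    picks = {"T1": 0, "T2": 0}
    used = set()
    interleaved = []

    def next_unused(ranking):
        # Highest-ranked document of this ranking not yet shown.
        return next((doc for doc in ranking if doc not in used), None)

    # Assumed stopping rule: stop once either ranking is exhausted.
    while next_unused(r1) is not None and next_unused(r2) is not None:
        if picks["T1"] == picks["T2"]:
            team = rng.choice(["T1", "T2"])      # coin toss on ties
        else:
            team = min(picks, key=picks.get)     # team that is behind picks next
        doc = next_unused(rankings[team])
        interleaved.append((doc, team))
        used.add(doc)
        picks[team] += 1
    return interleaved


def preferred_ranking(interleaved, clicked_docs):
    """Credit each click to the team that contributed the clicked result."""
    clicks = {"T1": 0, "T2": 0}
    for doc, team in interleaved:
        if doc in clicked_docs:
            clicks[team] += 1
    if clicks["T1"] == clicks["T2"]:
        return "tie"
    return "r1" if clicks["T1"] > clicks["T2"] else "r2"
```

Calling `team_draft_interleave` on the two example rankings yields one possible interleaved list; which team wins each coin toss determines whether the slide's exact ordering is reproduced.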

11
Setting
Find the best ranking function (out of 1000)
– Evaluate using click logs
– Clicks are biased (users only click on what they see)
– Explore / exploit problem
Technical issues
– How to interpret clicks?
– What is the utility function?
– What results to show users?

13
Dueling Bandits Problem
Given K bandits $b_1, \ldots, b_K$
Each iteration: compare (duel) two bandits
– E.g., interleaving two retrieval functions
Cost function (regret): $R_T = \sum_{t=1}^{T} \left[ P(b^* > b_t) + P(b^* > b_t') - 1 \right]$
– $(b_t, b_t')$ are the two bandits chosen
– $b^*$ is the overall best one
– (% users who prefer best bandit over chosen ones)
[Yue & Joachims, ICML 2009] [Yue, Broder, Kleinberg, Joachims, COLT 2009]

14
Example 1
– P(f* > f) = 0.9
– P(f* > f') = 0.8
– Incurred regret = 0.7
Example 2
– P(f* > f) = 0.7
– P(f* > f') = 0.6
– Incurred regret = 0.3
Example 3
– P(f* > f) = 0.51
– P(f* > f') = 0.55
– Incurred regret = 0.06
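
The three regret values above are consistent with a per-comparison regret of P(f* > f) + P(f* > f') - 1, that is, the sum of how far each chosen function falls short of an even match against the best one. The short check below assumes that formula; the function name `duel_regret` is hypothetical.

```python
def duel_regret(p_best_over_first, p_best_over_second):
    """Regret of one duel, assuming regret = P(f* > f) + P(f* > f') - 1."""
    return p_best_over_first + p_best_over_second - 1.0


# (P(f* > f), P(f* > f')) for the three examples on the slide.
examples = [(0.9, 0.8), (0.7, 0.6), (0.51, 0.55)]
regrets = [round(duel_regret(p1, p2), 2) for p1, p2 in examples]
print(regrets)
```

Note that comparing the best function with itself gives 0.5 + 0.5 - 1 = 0, so exploiting the best function incurs no regret.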

15
Assumptions
$P(b_i > b_j) = \frac{1}{2} + \epsilon_{ij}$ (distinguishability)
Strong Stochastic Transitivity
– For three bandits $b_i > b_j > b_k$: $\epsilon_{ik} \ge \max\{\epsilon_{ij}, \epsilon_{jk}\}$
– Monotonicity property
Stochastic Triangle Inequality
– For three bandits $b_i > b_j > b_k$: $\epsilon_{ik} \le \epsilon_{ij} + \epsilon_{jk}$
– Diminishing returns property
Satisfied by many standard models
– E.g., Logistic / Bradley-Terry
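
As a sanity check on the last point, the sketch below tests both properties numerically for a logistic (Bradley-Terry style) model. It assumes the usual forms of the two conditions, with ε_ij = P(b_i > b_j) - 1/2: strong stochastic transitivity as ε_ik ≥ max(ε_ij, ε_jk) and the stochastic triangle inequality as ε_ik ≤ ε_ij + ε_jk. The utility values and function names are made up for illustration.

```python
import itertools
import math


def win_probability(u_i, u_j):
    """Logistic model: P(b_i > b_j) depends only on the utility difference."""
    return 1.0 / (1.0 + math.exp(-(u_i - u_j)))


def satisfies_assumptions(utilities, tol=1e-12):
    """Check both assumptions for every triple b_i > b_j > b_k."""
    ordered = sorted(utilities, reverse=True)
    # combinations() preserves order, so u_i >= u_j >= u_k in every triple.
    for u_i, u_j, u_k in itertools.combinations(ordered, 3):
        e_ij = win_probability(u_i, u_j) - 0.5
        e_jk = win_probability(u_j, u_k) - 0.5
        e_ik = win_probability(u_i, u_k) - 0.5
        if e_ik < max(e_ij, e_jk) - tol:   # strong stochastic transitivity
            return False
        if e_ik > e_ij + e_jk + tol:       # stochastic triangle inequality
            return False
    return True


print(satisfies_assumptions([2.0, 1.0, 0.5, 0.0, -1.0]))
```

Both conditions hold here because the logistic gap grows with the utility difference (monotonicity) but flattens out as the difference gets large (diminishing returns).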

16
Explore then Exploit
First explore
– Try to gather as much information as possible
– Accumulates regret based on which bandits we decide to compare
Then exploit
– We have a (good) guess as to which bandit is best
– Repeatedly compare that bandit with itself (i.e., interleave that ranking with itself)

17
Naïve Approach
In the deterministic case, O(K) comparisons to find the max
Extend to the noisy case:
– Repeatedly compare until confident one is better
Problem: comparing two awful (but similar) bandits
Example:
– P(A > B) = 0.85
– P(A > C) = 0.85
– P(B > C) = 0.51
– Comparing B and C requires many comparisons!
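
To see why the B versus C duel is so expensive, a rough estimate from Hoeffding's inequality helps: to pin down a win probability to within ε of its true value with failure probability at most δ, about ln(2/δ) / (2ε²) comparisons suffice. The sketch below applies that bound to the example; the choice δ = 0.05 and the function name are assumptions for illustration.

```python
import math


def comparisons_needed(epsilon, delta):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Gap to 1/2 is 0.35 for A vs B (0.85) and only 0.01 for B vs C (0.51).
easy = comparisons_needed(0.35, 0.05)
hard = comparisons_needed(0.01, 0.05)
print(easy, hard)
```

Under these assumptions the A versus B duel needs on the order of tens of comparisons, while B versus C needs more than eighteen thousand, even though neither B nor C is the bandit we want.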

22
Interleaved Filter
Choose candidate bandit at random
Make noisy comparisons (Bernoulli trials) against all other bandits simultaneously
– Maintain mean and confidence interval for each pair of bandits being compared
…until another bandit is better
– With confidence 1 – δ
Repeat process with new candidate
– (Remove all empirically worse bandits)
Continue until 1 candidate left
[Yue, Broder, Kleinberg, Joachims, COLT 2009]
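
The steps above can be sketched as follows. This is a simplified illustration rather than the published algorithm: the confidence radius uses a Hoeffding-style constant chosen for this sketch, dropping bandits that the candidate confidently beats during a round is an assumption not stated on the slide, and `duel(i, j)` is a hypothetical comparison oracle that returns True when bandit i wins one noisy comparison against bandit j.

```python
import math
import random


def interleaved_filter(num_bandits, duel, horizon, rng=None):
    """Return the index of the bandit believed to be best."""
    rng = rng or random.Random()
    delta = 1.0 / (horizon * num_bandits ** 2)    # delta = 1 / (T K^2)
    candidate = rng.randrange(num_bandits)        # random initial candidate
    remaining = set(range(num_bandits)) - {candidate}

    while remaining:
        wins = {b: 0 for b in remaining}          # candidate's wins over b this round
        plays = 0
        challenger = None
        while remaining and challenger is None:
            plays += 1
            for b in remaining:                   # one comparison against each bandit
                wins[b] += duel(candidate, b)
            # Assumed Hoeffding-style confidence radius after `plays` comparisons.
            radius = math.sqrt(math.log(1.0 / delta) / (2.0 * plays))
            # Drop bandits the candidate beats with confidence 1 - delta.
            remaining -= {b for b in remaining if wins[b] / plays - radius > 0.5}
            # Look for a bandit that beats the candidate with confidence 1 - delta.
            for b in remaining:
                if wins[b] / plays + radius < 0.5:
                    challenger = b
                    break
        if challenger is not None:
            # Remove the empirically worse bandits, then promote the challenger.
            remaining = {b for b in remaining
                         if b != challenger and wins[b] / plays <= 0.5}
            candidate = challenger
    return candidate
```

For example, with `duel = lambda i, j: random.random() < P[i][j]` for some preference matrix `P`, the function runs rounds until a single candidate is left. Each outer iteration either empties the remaining set or replaces the candidate, so the number of rounds is at most the number of bandits.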

24
Intuition
Simulate comparing candidate with all remaining bandits simultaneously
Example:
– P(A > B) = 0.85
– P(A > C) = 0.85
– P(B > C) = 0.51
– B is candidate
– Number of comparisons between B and C is bounded by the number of comparisons between B and A!
[Yue, Broder, Kleinberg, Joachims, COLT 2009]

25
Regret Analysis
Round: all the time steps for a particular candidate bandit
– Halts when better bandit found …
– … with 1 – δ confidence
– Choose $\delta = 1/(TK^2)$
Match: all the comparisons between two bandits in a round
– At most K matches in each round
– Candidate plays one match against each remaining bandit
[Yue, Broder, Kleinberg, Joachims, COLT 2009]

26
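Regret Analysis
O(log K) total rounds
O(K) total matches
Each match incurs regret
– Depends on $\delta = K^{-2} T^{-1}$
Finds best bandit w.p. 1 – 1/T
Expected regret: $E[R_T] = O\left(\frac{K}{\epsilon_{1,2}} \log T\right)$, where $\epsilon_{1,2}$ is the distinguishability between the two best bandits
[Yue, Broder, Kleinberg, Joachims, COLT 2009]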

27
Removing Inferior Bandits
At conclusion of each round
– Remove any empirically worse bandits
Intuition:
– High confidence that winner is better than incumbent candidate
– Empirically worse bandits cannot be much better than incumbent candidate
– Can show via Hoeffding bound that winner is also better than empirically worse bandits with high confidence
– Preserves 1 – 1/T confidence overall that we'll find the best bandit

29
Summary
Provably efficient online algorithm
– (In a regret sense)
– Also results for continuous (convex) setting
Requires comparison oracle
– Informative (not too noisy)
– Independence / Unbiased
– Strong transitivity
– Triangle inequality

30
Revisiting Assumptions
$P(b_i > b_j) = \frac{1}{2} + \epsilon_{ij}$ (distinguishability)
Strong Stochastic Transitivity
– For three bandits $b_i > b_j > b_k$: $\epsilon_{ik} \ge \max\{\epsilon_{ij}, \epsilon_{jk}\}$
– Monotonicity property
Stochastic Triangle Inequality
– For three bandits $b_i > b_j > b_k$: $\epsilon_{ik} \le \epsilon_{ij} + \epsilon_{jk}$
– Diminishing returns property
Satisfied by many standard models
– E.g., Logistic / Bradley-Terry

31
Directions to Explore
Relaxing assumptions
– E.g., strong transitivity & triangle inequality
Integrating context
Dealing with large K
– Assume additional structure on retrieval functions?
Other cost models
– PAC setting (fixed budget, find the best possible)
Dynamic or changing user interests / environment

32
Improving Comparison Oracle

33
Dueling Bandits Problem
– Interactive learning framework
– Provably minimizes regret
– Assumes ideal comparison oracle
Can we improve the comparison oracle?
– Can we improve how we interpret results?

34
Determining Statistical Significance
For each query q, interleave A(q) and B(q), log clicks
t-Test
– For each q, score: % clicks on A(q) (e.g., 3/4 = 0.75)
– Sample mean score (e.g., 0.6)
– Compute confidence (p value), e.g., want p = 0.05 (i.e., 95% confidence)
– More data, more confident
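
A minimal sketch of the scoring and test statistic described above, using only the Python standard library. It assumes the null hypothesis is a mean score of 0.5 (no preference between A and B), that queries without clicks are skipped, and that click logs are available as hypothetical `(clicks_on_A, clicks_on_B)` pairs; converting the statistic to a p value additionally requires the t distribution with n - 1 degrees of freedom.

```python
import math
from statistics import mean, stdev


def query_scores(click_counts):
    """Fraction of clicks on A for each query, skipping queries with no clicks."""
    return [a / (a + b) for a, b in click_counts if a + b > 0]


def t_statistic(scores, null_mean=0.5):
    """One-sample t statistic of the per-query scores against `null_mean`."""
    n = len(scores)
    standard_error = stdev(scores) / math.sqrt(n)
    return (mean(scores) - null_mean) / standard_error


# Hypothetical click logs: (clicks on A's results, clicks on B's results).
logs = [(3, 1), (1, 1), (2, 0), (1, 3)]
scores = query_scores(logs)
print(scores, t_statistic(scores))
```

Because the standard error shrinks with the square root of the number of queries, the same mean score becomes more significant as more data arrives.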

36
Limitation
Example: query session with 2 clicks
– One click at rank 1 (from A)
– Later click at rank 4 (from B)
– Normally would count this query session as a tie
– But second click is probably more informative…
– …so B should get more credit for this query

37
Linear Model
Feature vector: $\phi(q,c)$
Weight of click is $w^T \phi(q,c)$
[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]

40
Example
$w^T \phi(q,c)$ differentiates last clicks and other clicks
Interleave A vs B
– 3 clicks per session
– Last click 60% on result from A
– Other 2 clicks random
Conventional w = (1,1)
– Has significant variance
Only count last click, w = (1,0)
– Minimizes variance
[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
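
The variance claim in this example can be checked by exact enumeration. The sketch below assumes a two-dimensional feature vector (is last click, is other click), a session score that adds a click's weight when the click is on A and subtracts it when the click is on B, and independent clicks; these modelling details are illustrative assumptions, not taken from the slide.

```python
import itertools
import math

P_LAST_ON_A = 0.6    # last click lands on a result from A 60% of the time
P_OTHER_ON_A = 0.5   # the other two clicks are random


def score_moments(w_last, w_other):
    """Exact mean and variance of the session score for weights (w_last, w_other)."""
    first_moment = 0.0
    second_moment = 0.0
    # +1 means the click was on A, -1 means it was on B.
    for last, other1, other2 in itertools.product([+1, -1], repeat=3):
        prob = P_LAST_ON_A if last == +1 else 1.0 - P_LAST_ON_A
        prob *= P_OTHER_ON_A if other1 == +1 else 1.0 - P_OTHER_ON_A
        prob *= P_OTHER_ON_A if other2 == +1 else 1.0 - P_OTHER_ON_A
        score = w_last * last + w_other * (other1 + other2)
        first_moment += prob * score
        second_moment += prob * score ** 2
    return first_moment, second_moment - first_moment ** 2


for weights in [(1, 1), (1, 0)]:
    m, v = score_moments(*weights)
    print(weights, "mean", m, "variance", v, "mean/std", m / math.sqrt(v))
```

Both weightings have the same mean score, but the random clicks add variance under w = (1,1), so counting only the last click gives a larger mean-to-standard-deviation ratio and therefore needs fewer sessions to reach significance.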

42
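Scoring Query Sessions
Feature representation for query session: $\psi_q = \sum_{c \in \text{clicks on } A} \phi(q,c) - \sum_{c \in \text{clicks on } B} \phi(q,c)$
Weighted score for query: $w^T \psi_q$
Positive score favors A, negative favors B
[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]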

43
Supervised Learning
Will optimize for z-Test: Inverse z-Test
– Approximately equal to the t-Test for large samples
– z-Score = mean / standard deviation (assumes A > B)
[Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010]
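
To make the objective concrete, the sketch below scores sessions with a weight vector and evaluates the resulting z-test. It is an illustration under assumptions: a session score is taken to be the weighted click features on A minus those on B, the test statistic is the mean-to-standard-deviation ratio scaled by the square root of the number of sessions, and the one-sided p value comes from the normal tail. It does not show how the weights are learned, and all names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def session_score(w, clicks_on_a, clicks_on_b):
    """Weighted click features on A minus weighted click features on B."""
    def weight(phi):
        return sum(w_k * phi_k for w_k, phi_k in zip(w, phi))
    return sum(map(weight, clicks_on_a)) - sum(map(weight, clicks_on_b))


def z_test(scores):
    """Return (mean / std, z statistic, one-sided p value for mean > 0)."""
    ratio = mean(scores) / stdev(scores)
    z = ratio * math.sqrt(len(scores))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail of the normal
    return ratio, z, p_value


# Hypothetical features: (is last click, is other click).
w = (1.0, 0.0)
sessions = [
    ([(1, 0)], [(0, 1)]),            # last click on A, one other click on B
    ([(0, 1)], [(1, 0)]),            # last click on B
    ([(1, 0), (0, 1)], []),          # both clicks on A
    ([(1, 0)], [(0, 1), (0, 1)]),    # last click on A, two other clicks on B
]
scores = [session_score(w, a, b) for a, b in sessions]
print(scores, z_test(scores))
```

Since the number of sessions is fixed for a given experiment, choosing weights that maximize mean / standard deviation is the same as maximizing the z statistic, which is why that ratio serves as the training objective.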

44
Experiment Setup
Data collection
– Pool of retrieval functions
– Hash users into partitions
– Run interleaving of different pairs in parallel
Collected on arXiv.org
– 2 pools of retrieval functions
– Training Pool: (6 pairs) known A > B
– New Pool: (12 pairs)

45
Training Pool – Cross Validation

47
Experimental Results
Inverse z-Test works well
– Beats baseline on most of the new interleaving pairs
– Direction of tests all in agreement
– In 6/12 pairs, for p = 0.1, reduces sample size by 10%
– In 4/12 pairs, achieves p = 0.05 where the baseline does not
400 to 650 queries per interleaving experiment
Weights hard to interpret (features correlated)
Largest weight: feature that is 1 if single click & rank > 1

49
Interactive Learning
System dynamically decides how to service users
– Learn about user preferences
– Maximize total user utility over time
– Exploration / exploitation tradeoff
Existing mechanism (interleaving) to collect feedback
– Can also improve how we interpret feedback
Simple yet practical cost model
– Compatible with interleaving oracle
– Yields efficient algorithms

50
Thank You! Work supported by NSF IIS , Yahoo! KSC Award, and Microsoft Graduate Fellowship. Slides, papers & software available at
