Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem ICML 2009 Yisong Yue Thorsten Joachims Cornell University.

1 Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem ICML 2009 Yisong Yue Thorsten Joachims Cornell University

2 Learning To Rank Supervised Learning Problem – Extension of classification/regression – Relatively well understood – High applicability in Information Retrieval Requires explicitly labeled data – Expensive to obtain – Expert-judged labels == search user utility? – Doesn't generalize to other search domains.

3 Our Contribution Learn from implicit feedback (user clicks) – Reduce labeling cost – More representative of end user information needs Learn using pairwise comparisons – Humans are more adept at making pairwise judgments – Via Interleaving [Radlinski et al., 2008] On-line framework (Dueling Bandits Problem) – We leverage users when exploring new retrieval functions – Exploration vs exploitation tradeoff (regret)

4 Team-Game Interleaving (u=thorsten, q=svm) [Radlinski, Kurup, Joachims; CIKM 2008]
f1(u,q) → r1: 1. Kernel Machines 2. Support Vector Machine 3. An Introduction to Support Vector Machines 4. Archives of SUPPORT-VECTOR-MACHINES... 5. SVM-Light Support Vector Machine
f2(u,q) → r2: 1. Kernel Machines 2. SVM-Light Support Vector Machine 3. Support Vector Machine and Kernel... References 4. Lucent Technologies: SVM demo applet 5. Royal Holloway Support Vector Machine
Interleaving(r1, r2): 1. Kernel Machines (T2) 2. Support Vector Machine (T1) 3. SVM-Light Support Vector Machine (T2) 4. An Introduction to Support Vector Machines (T1) 5. Support Vector Machine and Kernel... References (T2) 6. Archives of SUPPORT-VECTOR-MACHINES... (T1) 7. Lucent Technologies: SVM demo applet (T2); NEXT PICK
Invariant: For all k, in expectation same number of team members in top k from each team.
Interpretation: (r2 ≻ r1) ⇔ clicks(T2) > clicks(T1)
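The interleaving step on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes the procedure is the one usually called team-draft interleaving in Radlinski et al. (CIKM 2008): the team with fewer picks goes next, ties are broken by a coin flip, and each team contributes its highest-ranked document not yet shown. The names `team_draft_interleave` and `infer_preference` are hypothetical.

```python
import random


def team_draft_interleave(r1, r2, rng=None):
    """Interleave two rankings; return (combined list, team label per position)."""
    rng = rng or random.Random()
    rankings = {1: r1, 2: r2}
    count = {1: 0, 2: 0}
    combined, teams = [], []
    total = len(set(r1) | set(r2))
    while len(combined) < total:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if count[1] != count[2]:
            team = 1 if count[1] < count[2] else 2
        else:
            team = rng.choice([1, 2])
        # That team contributes its highest-ranked document not yet shown.
        pick = next((d for d in rankings[team] if d not in combined), None)
        if pick is None:
            # Assumption: once a ranking is exhausted, the other team fills in.
            team = 3 - team
            pick = next(d for d in rankings[team] if d not in combined)
        combined.append(pick)
        teams.append(team)
        count[team] += 1
    return combined, teams


def infer_preference(teams, clicked_positions):
    """Return 1 or 2 for the ranking whose team got more clicks, 0 for a tie."""
    c1 = sum(1 for p in clicked_positions if teams[p] == 1)
    c2 = sum(1 for p in clicked_positions if teams[p] == 2)
    if c1 == c2:
        return 0
    return 1 if c1 > c2 else 2


if __name__ == "__main__":
    r1 = ["Kernel Machines", "Support Vector Machine", "An Introduction to SVMs",
          "Archives of SUPPORT-VECTOR-MACHINES", "SVM-Light"]
    r2 = ["Kernel Machines", "SVM-Light", "SVM and Kernel References",
          "Lucent SVM demo applet", "Royal Holloway SVM"]
    combined, teams = team_draft_interleave(r1, r2, random.Random(0))
    for doc, team in zip(combined, teams):
        print(f"T{team}  {doc}")
```

A click on a position is credited to the team that contributed it, so `infer_preference(teams, clicks)` returning 2 corresponds to the slide's reading that r2 is preferred over r1.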

5 Dueling Bandits Problem Continuous space bandits F – E.g., parameter space of retrieval functions (i.e., weight vectors) Each time step compares two bandits – E.g., interleaving test on two retrieval functions – Comparison is noisy & independent

6 Dueling Bandits Problem Continuous space bandits F – E.g., parameter space of retrieval functions (i.e., weight vectors) Each time step compares two bandits – E.g., interleaving test on two retrieval functions – Comparison is noisy & independent Choose pair (f_t, f_t') to minimize regret: Regret_T = Σ_{t=1..T} [ P(f* > f_t) + P(f* > f_t') − 1 ] (% users who prefer best bandit over chosen ones)

7 Example 1: P(f* > f) = 0.9, P(f* > f') = 0.8, Incurred Regret = 0.7. Example 2: P(f* > f) = 0.7, P(f* > f') = 0.6, Incurred Regret = 0.3. Example 3: P(f* > f) = 0.51, P(f* > f') = 0.55, Incurred Regret = 0.06.
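The three examples are consistent with a per-duel regret equal to the sum of the two win probabilities of the best bandit minus 1. The sketch below checks that arithmetic; the function name `incurred_regret` is hypothetical, and the formula is inferred from the numbers on the slide rather than quoted from it.

```python
def incurred_regret(p_best_beats_f, p_best_beats_f_prime):
    """Regret of one duel, assuming regret = P(f* > f) + P(f* > f') - 1."""
    return p_best_beats_f + p_best_beats_f_prime - 1.0


# The three (P(f* > f), P(f* > f')) pairs from the slide.
examples = [(0.9, 0.8), (0.7, 0.6), (0.51, 0.55)]

for p, p_prime in examples:
    print(round(incurred_regret(p, p_prime), 2))  # 0.7, then 0.3, then 0.06
```

Under this reading, dueling the best bandit against itself gives 0.5 + 0.5 − 1 = 0, so zero regret is only reached when both chosen bandits are optimal.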

8 Modeling Assumptions Each bandit f ∈ F has intrinsic value v(f) – Never observed directly – Assume v(f) is strictly concave (unique f*) Comparisons based on v(f) – P(f > f') = σ( v(f) – v(f') ) – P is L-Lipschitz – For example: the logistic transfer function σ(x) = 1/(1+exp(-x)) used in the simulations
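A small sketch of this comparison model, assuming the logistic transfer function and the value function v(x) = -x^T x that the talk names for its synthetic experiments. The function names are hypothetical, and any other strictly concave v would serve equally well here.

```python
import math


def value(x):
    """Illustrative strictly concave value function v(x) = -x^T x (maximized at 0)."""
    return -sum(xi * xi for xi in x)


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_prefers(f, f_prime):
    """P(f > f') = sigma(v(f) - v(f')): chance that f wins one noisy comparison."""
    return sigmoid(value(f) - value(f_prime))


if __name__ == "__main__":
    best = [0.0, 0.0]
    other = [1.0, 1.0]
    print(prob_prefers(best, other))   # above 0.5: the better point usually wins
    print(prob_prefers(other, other))  # exactly 0.5: a bandit ties with itself
```

Because the logistic function satisfies σ(z) + σ(-z) = 1, the two outcomes of a duel always have probabilities that sum to 1.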

9 Probability Functions

10 Dueling Bandit Gradient Descent Maintain f_t – Compare with f_t' (close to f_t, as defined by the step size) – Update if f_t' wins comparison Expectation of update close to gradient of P(f_t > f) – Builds on Bandit Gradient Descent [Flaxman et al., 2005]
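The loop described on this slide can be sketched as follows. This is a hedged illustration, not the authors' code. It assumes the parameter space is a Euclidean ball of radius `radius`, that the candidate is the current point moved by the explore step size δ along a uniformly random unit direction, and that a winning candidate moves the current point by the exploit step size γ along that same direction. The names `dbgd`, `duel`, and `logistic_duel` are hypothetical, and the step sizes in the demo are arbitrary.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly from the unit sphere via normalized Gaussians."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            return [x / norm for x in g]


def project_to_ball(w, radius):
    """Project w onto the ball of the given radius centered at the origin."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= radius:
        return list(w)
    return [x * radius / norm for x in w]


def dbgd(w0, duel, steps, delta, gamma, radius=1.0, rng=None):
    """Run the loop; duel(w, candidate) returns True when the candidate wins."""
    rng = rng or random.Random()
    w = project_to_ball(w0, radius)
    for _ in range(steps):
        u = random_unit_vector(len(w), rng)
        # Explore: propose a nearby candidate at distance delta along u.
        candidate = project_to_ball([wi + delta * ui for wi, ui in zip(w, u)], radius)
        # Exploit: move by gamma along u only if the candidate wins the comparison.
        if duel(w, candidate):
            w = project_to_ball([wi + gamma * ui for wi, ui in zip(w, u)], radius)
    return w


def logistic_duel(rng):
    """Simulated user: candidate wins with probability sigma(v(cand) - v(w)), v(x) = -x^T x."""
    def v(x):
        return -sum(xi * xi for xi in x)

    def duel(w, candidate):
        # Inside the unit ball the argument stays in [-1, 1], so exp cannot overflow.
        p = 1.0 / (1.0 + math.exp(-(v(candidate) - v(w))))
        return rng.random() < p

    return duel


if __name__ == "__main__":
    rng = random.Random(0)
    start = [0.5] * 3
    end = dbgd(start, logistic_duel(rng), steps=2000, delta=0.3, gamma=0.05, rng=rng)
    print(sum(x * x for x in start), sum(x * x for x in end))
```

The update never uses a gradient directly: the only feedback is the binary outcome of each duel, which is what makes the approach compatible with interleaving tests on live users.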

11-19 Dueling Bandit Gradient Descent (animation, nine frames) δ – explore step size γ – exploit step size Legend: current point, losing candidate, winning candidate

20 Analysis (Sketch) Dueling Bandit Gradient Descent – Sequence of partially convex functions c_t(f) = P(f_t > f) – Random binary updates (expectation close to gradient) Bandit Gradient Descent [Flaxman et al., SODA 2005] – Sequence of convex functions – Use randomized update (expectation close to gradient) – Can be extended to our setting (Assumes more information)

21 Analysis (Sketch) Convex functions satisfy – Both additive and multiplicative error – Depends on exploration step size δ – Main analytical contribution: bounding multiplicative error

22 Regret Bound Regret grows as O(T^{3/4}) Average regret shrinks as O(T^{-1/4}) – In the limit, we do as well as knowing f* in hindsight Step sizes: δ = O(T^{-1/4}), γ = O(T^{-1/2})
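The step from the regret bound to the average-regret claim is a single division by T. The symbol Δ_T for cumulative regret after T comparisons is an assumption made here for illustration; it does not appear in the transcript.

```latex
\[
\Delta_T = O\!\left(T^{3/4}\right)
\quad\Longrightarrow\quad
\frac{\Delta_T}{T} = O\!\left(T^{-1/4}\right)
\xrightarrow[\;T \to \infty\;]{} 0 .
\]
```

Since the average regret per comparison vanishes, the pairs chosen by the algorithm become, on average, as good as always playing f*.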

23 Practical Considerations Need to set step size parameters – Depends on P(f > f') Cannot be set optimally – We don't know the specifics of P(f > f') – Algorithm should be robust to parameter settings Set parameters approximately in experiments

24 50-dimensional parameter space Value function v(x) = -x^T x Logistic transfer function Random point has regret almost 1 More experiments in paper.

25 Web Search Simulation Leverage web search dataset – 1000 Training Queries, 367 Dimensions Simulate users issuing queries – Value function based on NDCG (ranking measure) – Use logistic to make probabilistic comparisons Use linear ranking function. Not intended to compete with supervised learning – Feasibility check for online learning w/ users – Supervised labels difficult to acquire in the wild

26 Chose parameters with best final performance Curves basically identical for validation and test sets (no over-fitting) Sampling multiple queries makes no difference

27 What Next? Better simulation environments – More realistic user modeling assumptions DBGD simple and extensible – Incorporate pairwise document preferences – Deal with ranking discontinuities Test on real search systems – Varying scales of user communities – Sheds insight / guides future development

28 Extra Slides

29 Active vs Passive Learning Passive Data Collection (offline) – Biased by current retrieval function Point-wise Evaluation – Design retrieval function offline – Evaluate online Active Learning (online) – Automatically propose new rankings to evaluate – Our approach

30 Relative vs Absolute Metrics Our framework based on relative metrics – E.g., comparing pairs of results or rankings – Relatively recent development Absolute Metrics – E.g., absolute click-through rate – More common in literature – Suffers from presentation bias – Less robust to the many different sources of noise

31 What Results do Users View/Click? [Joachims et al., TOIS 2007]

33 Analysis (Sketch) Convex functions satisfy – We have both multiplicative and additive error – Depends on exploration step size δ – Main technical contribution: bounding multiplicative error Existing results yield sub-linear bounds on:

34 Analysis (Sketch) We know how to bound Regret: We can show using Lipschitz and symmetry of σ:

35 More Simulation Experiments Logistic transfer function σ(x) = 1/(1+exp(-x)) 4 choices of value functions δ, γ set approximately

37 NDCG Normalized Discounted Cumulative Gain Multiple Levels of Relevance DCG: – contribution of i-th rank position: – Ex: has DCG score of NDCG is normalized DCG – best possible ranking has score NDCG = 1
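The formulas on this slide did not survive in the transcript. The sketch below therefore uses one common variant of DCG as an explicit assumption: gain 2^rel − 1 and discount log2(i + 1) for rank position i starting at 1. The slide may have used a different gain or logarithm base. The function names are hypothetical.

```python
import math


def dcg(relevances):
    """DCG of a ranking given graded relevance labels in ranked order.

    Assumed variant: position i (1-based) contributes (2^rel - 1) / log2(i + 1).
    """
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    print(ndcg([3, 2, 1, 0]))  # ideal ordering, so the score is 1
    print(ndcg([0, 1, 2, 3]))  # reversed ordering, so the score is below 1
```

Small changes to a ranking function's weights leave NDCG unchanged until two documents swap places, at which point the score jumps. This step-like behavior is the discontinuity in the function parameters that the talk refers to.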

38 Considerations NDCG is discontinuous w.r.t. function parameters – Try larger values of δ, γ – Try sampling multiple queries per update Homogeneous user values – Not an optimization concern – Modeling limitation Not intended to compete with supervised learning – Sanity check of feasibility for online learning w/ users

