1 Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem
ICML 2009
Yisong Yue, Thorsten Joachims
Cornell University
2 Learning to Rank
- Supervised learning problem
  - Extension of classification/regression
  - Relatively well understood
  - Highly applicable in information retrieval
- Requires explicitly labeled data
  - Expensive to obtain
  - Do expert-judged labels equal search-user utility?
  - Doesn't generalize to other search domains
3 Our Contribution
- Learn from implicit feedback (users' clicks)
  - Reduces labeling cost
  - More representative of end-user information needs
- Learn using pairwise comparisons
  - Humans are more adept at making pairwise judgments
  - Via interleaving [Radlinski et al., 2008]
- Online framework (the Dueling Bandits Problem)
  - We leverage users when exploring new retrieval functions
  - Exploration vs. exploitation tradeoff (regret)
4 Team-Game Interleaving (u=thorsten, q="svm")

Ranking r1 = f1(u,q):
1. Kernel Machines
2. Support Vector Machine
3. An Introduction to Support Vector Machines
4. Archives of SUPPORT-VECTOR-MACHINES ...
5. SVM-Light Support Vector Machine

Ranking r2 = f2(u,q):
1. Kernel Machines
2. SVM-Light Support Vector Machine
3. Support Vector Machine and Kernel ... References
4. Lucent Technologies: SVM demo applet
5. Royal Holloway Support Vector Machine

Interleaving(r1, r2):
1. Kernel Machines (T2)
2. Support Vector Machine (T1)
3. SVM-Light Support Vector Machine (T2)
4. An Introduction to Support Vector Machines (T1)
5. Support Vector Machine and Kernel ... References (T2)
6. Archives of SUPPORT-VECTOR-MACHINES ... (T1)
7. Lucent Technologies: SVM demo applet (T2)

Invariant: for all k, the top k contains, in expectation, the same number of team members from each team.

Speaker notes: This is the evaluation method. Get the ranking from the learned retrieval function and from the standard retrieval function (e.g., Google), then combine both rankings into one in a "fair and unbiased" way: at each position in the combined ranking, the number of links from "learned" equals the number from "Google", plus or minus one. So if the user has no preference between the ranking functions, they will click on links from either one with 50/50 chance. We then evaluate whether users click on links from one ranking function significantly more often. In the example, the lowest click in the combined ranking is at position 7; due to the fair merging, the user has seen the top 4 from both rankings. Tracing the clicked links back, 3 of them were in the top 4 from "learned" but only 1 in the top 4 from "Google", so "learned" wins on this query. Note that this is a blind test: users do not know which retrieval function a link came from; in particular, we use the same abstract generator.

Interpretation: (r2 ≻ r1) ↔ clicks(T2) > clicks(T1)
[Radlinski, Kurup, Joachims; CIKM 2008]
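The team-draft interleaving step described above might be sketched as follows. This is a minimal Python rendering, not the paper's implementation; the function and variable names are my own, and the sketch simply stops when either input ranking is exhausted:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Team-draft interleaving sketch: teams A and B take turns (a coin
    flip breaks ties on who picks next), each adding its highest-ranked
    document not already in the combined list.  Stops when either input
    ranking is exhausted."""
    rng = rng or random.Random()
    interleaved, teams = [], []
    count_a = count_b = 0
    ia = ib = 0
    while ia < len(ranking_a) and ib < len(ranking_b):
        # The team with fewer picks so far goes next; coin flip on ties.
        a_turn = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        if a_turn:
            while ia < len(ranking_a) and ranking_a[ia] in interleaved:
                ia += 1  # skip documents already placed
            if ia < len(ranking_a):
                interleaved.append(ranking_a[ia])
                teams.append('A')
                count_a += 1
                ia += 1
        else:
            while ib < len(ranking_b) and ranking_b[ib] in interleaved:
                ib += 1
            if ib < len(ranking_b):
                interleaved.append(ranking_b[ib])
                teams.append('B')
                count_b += 1
                ib += 1
    return interleaved, teams
```

By construction, every prefix of the combined list contains the same number of picks from each team, plus or minus one, which is the fairness invariant stated on the slide.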
5-6 Dueling Bandits Problem
- Continuous space of bandits F
  - E.g., the parameter space of retrieval functions (i.e., weight vectors)
- Each time step compares two bandits
  - E.g., an interleaving test on two retrieval functions
- Comparisons are noisy and independent
- Choose the pair (f_t, f_t') to minimize regret
  - (the fraction of users who prefer the best bandit over the chosen ones)
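One way to formalize the regret in the parenthetical above, consistent with the slide's description (f* denotes the best bandit; the exact normalization in the paper may differ), is:

```latex
\Delta_T \;=\; \sum_{t=1}^{T} \Big[\, \epsilon(f^*, f_t) + \epsilon(f^*, f_t') \,\Big],
\qquad
\epsilon(f, f') \;=\; P(f \succ f') - \tfrac{1}{2}
```

Here each term charges the algorithm for how strongly users would prefer the optimum over the two bandits it actually compared at step t.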
8 Modeling Assumptions
- Each bandit f ∈ F has an intrinsic value v(f)
  - Never observed directly
- Assume v(f) is strictly concave (so there is a unique optimum f*)
- Comparisons are based on v(f):
  - P(f > f') = σ( v(f) - v(f') )
  - P is L-Lipschitz
  - For example, the logistic link σ(x) = 1 / (1 + exp(-x))

Speaker notes: We want to find assumptions that are minimal, realistic, and yield good algorithms. These modeling assumptions are one attempt to do so.
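Under these assumptions, the comparison probability with the logistic link mentioned above is a one-liner (a sketch with illustrative names, not code from the paper):

```python
import math

def pref_prob(v_f, v_fprime):
    """P(f > f') = sigma(v(f) - v(f')) with a logistic link
    sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-(v_f - v_fprime)))
```

Note the symmetry P(f > f') + P(f' > f) = 1, and that equal values give a 50/50 comparison, which is what makes the comparison outcomes informative about the sign of the value difference.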
9 Probability Functions
- Same global optimum
- Partially convex
- Gradient descent is attractive
10 Dueling Bandit Gradient Descent
- Maintain a current point f_t
- Compare f_t with f_t' (a nearby point, offset by the exploration step size)
- Update toward f_t' if f_t' wins the comparison
  - Expectation of the update is close to the gradient of P(f_t > f')
- Builds on Bandit Gradient Descent [Flaxman et al., 2005]
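The update loop above can be sketched as follows. This is my own minimal rendering under the stated assumptions, not the paper's code: the projection of f_t back onto the feasible set F is omitted, and `compare` stands in for the interleaving duel (it returns True when the candidate f' wins):

```python
import math
import random

def dbgd(compare, f0, delta, gamma, horizon, rng=None):
    """Dueling Bandit Gradient Descent (sketch).
    Each round: draw a random unit direction u, propose the nearby point
    f' = f + delta * u, run the noisy duel, and take a small step gamma * u
    only when f' wins.  The expected step approximates a gradient step on
    the comparison probability."""
    rng = rng or random.Random(0)
    f = list(f0)
    d = len(f0)
    for _ in range(horizon):
        # Uniform random direction on the unit sphere (via normalized Gaussian).
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
        f_prime = [fi + delta * ui for fi, ui in zip(f, u)]
        if compare(f, f_prime):  # f' won the duel: move toward it
            f = [fi + gamma * ui for fi, ui in zip(f, u)]
    return f
```

With a noiseless comparator and gamma < delta, each winning step strictly improves a concave value such as v(f) = -||f||^2, which is the intuition behind the gradient interpretation on the slide.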
20 Analysis (Sketch)
- Dueling Bandit Gradient Descent
  - Sequence of partially convex functions c_t(f) = P(f_t > f)
  - Random binary updates (expectation close to gradient)
- Bandit Gradient Descent [Flaxman et al., SODA 2005]
  - Sequence of convex functions
  - Uses a randomized update (expectation close to gradient)
  - Can be extended to our setting (assumes more information)
21 Analysis (Sketch)
- Convex functions satisfy c_t(f_t) - c_t(f*) ≤ ∇c_t(f_t) · (f_t - f*)
- Both additive and multiplicative error
  - Depends on the exploration step size δ
- Main analytical contribution: bounding the multiplicative error
22 Regret Bound
- Regret grows as O(T^{3/4})
  - Average regret shrinks as O(T^{-1/4})
  - In the limit, we do as well as knowing f* in hindsight
- Parameter settings: δ = O(T^{-1/4}), γ = O(T^{-1/2})
23 Practical Considerations
- Need to set the step-size parameters
  - Depends on P(f > f')
- Cannot be set optimally
  - We don't know the specifics of P(f > f')
- Algorithm should be robust to parameter settings
  - Parameters set approximately in experiments
24 Simulation Experiments (plot omitted)
- 50-dimensional parameter space
- Value function v(x) = -x^T x
- Logistic transfer function
- A random point has regret of almost 1
- More experiments in the paper
25 Web Search Simulation
- Leverage a web search dataset
  - 1000 training queries, 367 dimensions
- Simulate "users" issuing queries
  - Value function based on NDCG (a ranking measure)
  - Use a logistic link to make probabilistic comparisons
  - Use a linear ranking function
- Not intended to compete with supervised learning
  - Feasibility check for online learning with users
  - Supervised labels are difficult to acquire "in the wild"
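The simulated user comparison described above, a logistic link applied to the difference of the two functions' values (e.g., average NDCG), might look like the following sketch. The names are illustrative, not from the paper:

```python
import math
import random

def noisy_duel(value_f, value_fprime, rng=None):
    """Simulated user comparison: the candidate f' wins with probability
    sigma(v(f') - v(f)), the logistic of the value difference."""
    rng = rng or random.Random()
    p = 1.0 / (1.0 + math.exp(-(value_fprime - value_f)))
    return rng.random() < p
```

Over many simulated queries the win frequency concentrates around the logistic of the value gap, so the duel outcomes carry exactly the noisy, independent comparison signal the Dueling Bandits setup assumes.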
26 Web Search Simulation Results (plot omitted)
- Parameters chosen with the best final performance
- Curves essentially identical for validation and test sets (no over-fitting)
- Sampling multiple queries makes no difference
27 What Next?
- Better simulation environments
  - More realistic user-modeling assumptions
- DBGD is simple and extensible
  - Incorporate pairwise document preferences
  - Deal with ranking discontinuities
- Test on real search systems
  - Varying scales of user communities
  - Provides insight and guides future development
29 Active vs. Passive Learning
- Passive data collection (offline)
  - Biased by the current retrieval function
  - Point-wise evaluation
  - Design the retrieval function offline, evaluate online
- Active learning (online)
  - Automatically propose new rankings to evaluate
  - Our approach
30 Relative vs. Absolute Metrics
- Our framework is based on relative metrics
  - E.g., comparing pairs of results or rankings
  - A relatively recent development
- Absolute metrics
  - E.g., absolute click-through rate
  - More common in the literature
  - Suffer from presentation bias
  - Less robust to the many different sources of noise
31 What Results do Users View/Click? [Joachims et al., TOIS 2007]
33 Analysis (Sketch)
- Convex functions satisfy c_t(f_t) - c_t(f*) ≤ ∇c_t(f_t) · (f_t - f*)
- We have both multiplicative and additive error
  - Depends on the exploration step size δ
- Main technical contribution: bounding the multiplicative error
- Existing results yield sub-linear bounds on: (expression omitted on slide)
34 Analysis (Sketch)
- We know how to bound: (expression omitted on slide)
- Regret: (equation omitted on slide)
- We can show, using the Lipschitz property and symmetry of σ: (inequality omitted on slide)
35 More Simulation Experiments
- Logistic transfer function σ(x) = 1/(1 + exp(-x))
- 4 choices of value functions
- δ, γ set approximately
37 NDCG: Normalized Discounted Cumulative Gain
- Handles multiple levels of relevance
- DCG: the contribution of the i-th rank position is discounted by log2(i + 1)
  - (worked example omitted on slide)
- NDCG is normalized DCG
  - Divide by the DCG of the best possible ranking, so the best ranking has NDCG = 1
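The slide's exact gain formula and worked example are not recoverable from the extraction, so the sketch below uses the common (2^rel - 1) gain with the log2(i + 1) position discount; other DCG variants differ only in the gain term:

```python
import math

def dcg(relevances, k=None):
    """Discounted Cumulative Gain: rank i (1-indexed) contributes
    (2**rel - 1) / log2(i + 1)."""
    rels = relevances[:k] if k else relevances
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels, start=1))

def ndcg(relevances, k=None):
    """NDCG: DCG divided by the DCG of the ideal (relevance-sorted)
    ranking, so a perfect ranking scores exactly 1."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, a list already sorted by decreasing relevance gets NDCG = 1, while any inversion (a less relevant document ranked higher) strictly lowers the score.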
38 Considerations
- NDCG is discontinuous w.r.t. the function parameters
  - Try larger values of δ, γ
  - Try sampling multiple queries per update
- Homogeneous user values
  - Not an optimization concern
  - A modeling limitation
- Not intended to compete with supervised learning
  - Sanity check of feasibility for online learning with users