Beat the Mean Bandit Yisong Yue (CMU) & Thorsten Joachims (Cornell)

Presentation transcript:

Optimizing Information Retrieval Systems
- Retrieval systems are increasingly reliant on user feedback (e.g., clicks on search results).
- Online learning is a popular modeling tool, especially in partial-information (bandit) settings.
- Our focus: learning from relative preferences.
- Motivated by recent work on interleaved retrieval evaluation.

Team Draft Interleaving (Comparison Oracle for Search) [Radlinski et al. 2008]
(A code sketch of this interleaving step follows the Regret Guarantee slide below.)

Ranking A
1. Napa Valley – The authority for lodging... www.napavalley.com
2. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries
3. Napa Valley College www.napavalley.edu/homex.asp
4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681
5. Napa Valley Wineries and Wine www.napavintners.com
6. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley

Ranking B
1. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley
2. Napa Valley – The authority for lodging... www.napavalley.com
3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
4. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com
5. NapaValley.org www.napavalley.org
6. The Napa Valley Marathon www.napavalleymarathon.org

Presented Ranking
1. Napa Valley – The authority for lodging... www.napavalley.com
2. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley
3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
4. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries
5. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com
6. Napa Valley College www.napavalley.edu/homex.asp
7. NapaValley.org www.napavalley.org

User clicks on the presented ranking are credited to the ranking (A or B) that contributed the clicked result. In this example, B wins!

Assumptions
Assumptions on preference behavior, required for the theoretical analysis (both conditions are written out in symbols after the Regret Guarantee slide below):
- Distinguishability: P(bi > bj) = ½ + εij.
- Relaxed stochastic transitivity: an internal-consistency property over any three bandits b* > bj > bk, parameterized by γ.
  -- Previous work requires γ = 1, and requires it to hold for all bandit triplets.
  -- γ = 1.5 in the Example Pairwise Preferences shown later in this transcript.
- Stochastic triangle inequality: a diminishing-returns property over any three bandits b* > bj > bk.

Regret Guarantee
- Playing against the mean bandit calibrates the preference scores.
  -- Estimates for the active bandits are directly comparable.
  -- One estimate per active bandit, i.e., only a linear number of estimates.
- We can bound the number of comparisons needed to remove the worst bandit.
  -- The bound varies smoothly with the transitivity parameter γ.
  -- It holds with high probability.
- We can bound the regret incurred by each comparison.
- Together these give a high-probability bound on the total regret.
  -- γ is typically close to 1.
  -- This is not possible with previous work!
- We also have a similar PAC guarantee.
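The assumptions and the per-comparison regret can be written compactly in symbols. The following is a reconstruction consistent with the slide's definitions and with the incurred-regret numbers in the Example Pairwise Preferences later on, not a verbatim statement from the paper (here \epsilon_{*,j} abbreviates \epsilon(b^*, b_j)):

\epsilon_{i,j} = P(b_i \succ b_j) - \tfrac{1}{2} \quad \text{(distinguishability: } \epsilon_{i,j} > 0 \text{ iff } b_i \text{ is preferred to } b_j\text{)}
\gamma\,\epsilon_{*,k} \ge \max(\epsilon_{*,j},\, \epsilon_{j,k}) \quad \text{for } b^* \succ b_j \succ b_k,\ \gamma \ge 1 \quad \text{(relaxed stochastic transitivity; } \gamma = 1 \text{ is the strong form)}
\epsilon_{*,k} \le \epsilon_{*,j} + \epsilon_{j,k} \quad \text{for } b^* \succ b_j \succ b_k \quad \text{(stochastic triangle inequality)}
r_t = \epsilon(b^*, b_t) + \epsilon(b^*, b_t'), \qquad R_T = \sum_{t=1}^{T} r_t \quad \text{(regret of dueling } b_t \text{ and } b_t'\text{, summed over iterations)}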
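The interleaving oracle itself fits in a few lines. Below is a minimal Python sketch of the Team Draft scheme from the slide above (Radlinski et al. 2008); the function names team_draft_interleave and interleaving_winner are illustrative, and rankings are simply lists of result identifiers.

import random

def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Team Draft Interleaving (Radlinski et al. 2008): rankings A and B take
    turns drafting their best not-yet-shown result into a combined list; a coin
    flip decides who drafts first when both teams have made equally many picks."""
    rng = rng or random.Random()
    interleaved, shown = [], set()
    team_a, team_b = set(), set()

    def remaining(source):
        return [doc for doc in source if doc not in shown]

    while remaining(ranking_a) or remaining(ranking_b):
        # The team with fewer picks drafts next; a coin flip breaks ties.
        a_turn = (len(team_a) < len(team_b)
                  or (len(team_a) == len(team_b) and rng.random() < 0.5))
        # If the chosen team has nothing new left, the other team drafts instead.
        if a_turn and not remaining(ranking_a):
            a_turn = False
        elif not a_turn and not remaining(ranking_b):
            a_turn = True
        source, team = (ranking_a, team_a) if a_turn else (ranking_b, team_b)
        pick = remaining(source)[0]  # the team's highest-ranked unseen result
        interleaved.append(pick)
        shown.add(pick)
        team.add(pick)
    return interleaved, team_a, team_b

def interleaving_winner(clicked, team_a, team_b):
    """Credit each click to the team whose ranking contributed that result."""
    a_clicks = len(set(clicked) & team_a)
    b_clicks = len(set(clicked) & team_b)
    return "A" if a_clicks > b_clicks else "B" if b_clicks > a_clicks else "tie"

A sequence of such interleaving experiments, each producing a winner, supplies exactly the kind of noisy pairwise comparison ("duel") used in the problem defined next.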
Dueling Bandits Problem [Yue et al. 2009]
- Given K bandits b1, ..., bK.
- Each iteration: compare (duel) two bandits, e.g., by interleaving two retrieval functions.
- Cost function (regret): with bt and bt' the two bandits chosen at iteration t and b* the overall best bandit, the regret of a comparison measures the fraction of users who would have preferred the best bandit over the chosen ones.

Example Pairwise Preferences
The slide shows a 6 x 6 matrix of pairwise preferences among retrieval functions A-F, with entries Pr(row > col) - 0.5, derived from interleaving experiments on http://arXiv.org; representative values appear in the comparisons below.
- Compare A & B: P(A > A) = 0.50, P(A > B) = 0.55, incurred regret = 0.05.
- Compare D & F: P(A > D) = 0.54, P(A > F) = 0.61, incurred regret = 0.15.
- Compare E & F: P(A > E) = 0.61, P(A > F) = 0.61, incurred regret = 0.22.
- Violation of internal consistency! Under strong stochastic transitivity, the A-over-D entry should be at least 0.06 (it is 0.04) and the C-over-E entry should be at least 0.04; this is the γ = 1.5 example referenced in the Assumptions slide.

Beat-the-Mean
- Each bandit (one row of the working table) maintains a score against the "mean bandit".
- The mean bandit is the average over all active bandits (averaging over columns A-F).
- Each score carries lower and upper confidence bounds (the last two columns).
- When one bandit dominates another (its lower bound exceeds the other's upper bound), the dominated bandit is removed (greyed out).
- The removed bandit's comparisons are dropped from the score estimates (greyed-out columns no longer count).
- The remaining scores then estimate performance against the new mean bandit, formed from the remaining active bandits.
- Continue until a single bandit remains.
The slides step through successive snapshots of this working table for bandits A-F, showing click-by-click win counts, totals, mean scores, and lower/upper confidence bounds as bandits are eliminated. (A code sketch of this elimination loop appears at the end of this transcript.)

Empirical Results
- Simulation experiment with γ = 1: light curves (Beat-the-Mean) vs. dark curves (Interleaved Filter [Yue et al. 2009]); Beat-the-Mean exhibits lower variance.
- Simulation experiment with γ = 1.3: light curves (Beat-the-Mean) vs. dark curves (Interleaved Filter [Yue et al. 2009]); Interleaved Filter has quadratic regret in the worst case.

Conclusions
- An online learning approach using pairwise feedback:
  -- well suited for optimizing information retrieval systems from user feedback;
  -- models the exploration/exploitation tradeoff;
  -- models violations of preference transitivity.
- Algorithm: Beat-the-Mean.
  -- Regret linear in the number of bandits and logarithmic in the number of iterations.
  -- Degrades smoothly with transitivity violations.
  -- Stronger guarantees than previous work.
  -- Also has PAC guarantees.
  -- Empirically supported.
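As a closing sketch, here is the elimination loop from the Beat-the-Mean slide in Python. The duel(i, j) oracle (for instance, one interleaving experiment between retrieval functions i and j) is an assumed interface, and the comparison schedule and confidence radius below are simplified placeholders rather than the constants from the paper.

import math
import random

def beat_the_mean(duel, n_bandits, horizon, delta=0.05, rng=None):
    """Minimal sketch of the Beat-the-Mean elimination loop.

    duel(i, j) is an assumed comparison oracle (e.g. one interleaving
    experiment) returning True iff bandit i beats bandit j."""
    rng = rng or random.Random()
    active = set(range(n_bandits))
    wins = [[0] * n_bandits for _ in range(n_bandits)]   # wins[i][j]: times i beat j
    plays = [[0] * n_bandits for _ in range(n_bandits)]  # plays[i][j]: times i dueled j

    def score(i):
        """Empirical win rate of i against the mean bandit, i.e. averaged over active opponents."""
        n = sum(plays[i][j] for j in active if j != i)
        w = sum(wins[i][j] for j in active if j != i)
        return (w / n if n else 0.5), n

    def radius(n):
        # Hoeffding-style confidence radius (a placeholder for the paper's bound).
        return 1.0 if n == 0 else math.sqrt(math.log(2.0 * n_bandits * horizon / delta) / (2.0 * n))

    for _ in range(horizon):
        if len(active) == 1:
            break
        # Duel the least-compared active bandit against a random active opponent (the "mean bandit").
        i = min(active, key=lambda b: score(b)[1])
        j = rng.choice([b for b in active if b != i])
        plays[i][j] += 1
        if duel(i, j):
            wins[i][j] += 1
        # Remove any bandit whose upper confidence bound lies below another bandit's lower bound.
        stats = {b: score(b) for b in active}
        dominated = {b for b in active
                     if any(stats[c][0] - radius(stats[c][1]) > stats[b][0] + radius(stats[b][1])
                            for c in active if c != b)}
        active -= dominated
    return active  # the surviving bandits (a single winner once enough duels have been run)

Because each bandit's score is averaged only over the opponents that are still active, removing a dominated bandit automatically removes its comparisons from the remaining estimates, mirroring the "don't count greyed-out columns" step on the slide.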