Hierarchical Exploration for Accelerating Contextual Bandits Yisong Yue Carnegie Mellon University Joint work with Sue Ann Hong (CMU) & Carlos Guestrin.


Recommend: Sports → Like!

Topic     # Likes   # Displayed   Average
Sports    1         1             1
Politics  0         0             N/A
Economy   0         0             N/A

Recommend: Politics → Boo!

Topic     # Likes   # Displayed   Average
Sports    1         1             1
Politics  0         1             0
Economy   0         0             N/A

Recommend: Economy → Like!

Topic     # Likes   # Displayed   Average
Sports    1         1             1
Politics  0         1             0
Economy   1         1             1

Recommend: Sports → Boo!

Topic     # Likes   # Displayed   Average
Sports    1         2             0.5
Politics  0         1             0
Economy   1         1             1

Recommend: Politics → Boo!

Topic     # Likes   # Displayed   Average
Sports    1         2             0.5
Politics  0         2             0
Economy   1         1             1

Recommend: Politics → Boo!

Topic     # Likes   # Displayed   Average
Sports    1         2             0.5
Politics  0         2             0
Economy   1         1             1

Exploration / Exploitation Tradeoff!
– Learning "on-the-fly"
– Modeled as a contextual bandit problem
– Exploration is expensive
– Our Goal: use prior knowledge to reduce exploration
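The running tally in the example above can be sketched in a few lines of Python (illustrative only; the topic names and the like/boo sequence are taken from the slides):

```python
# Minimal tally of the slides' example: track per-topic like counts and
# display counts, and compute the empirical average rating per topic.
topics = ["Sports", "Politics", "Economy"]
likes = {t: 0 for t in topics}
shown = {t: 0 for t in topics}

def average(topic):
    # N/A (None) until the topic has been displayed at least once
    return likes[topic] / shown[topic] if shown[topic] > 0 else None

def record(topic, liked):
    shown[topic] += 1
    likes[topic] += int(liked)

# Replay the slides' sequence: Sports Like, Politics Boo, Economy Like,
# Sports Boo, Politics Boo.
for topic, liked in [("Sports", True), ("Politics", False),
                     ("Economy", True), ("Sports", False),
                     ("Politics", False)]:
    record(topic, liked)

print({t: average(t) for t in topics})
# Pure exploitation would now always pick Economy (average 1.0), even
# though it has been shown only once -- the exploration dilemma above.
```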

Linear Stochastic Bandit Problem

At time t:
– Set of available actions A_t = {a_{t,1}, …, a_{t,n}} (articles to recommend)
– Algorithm chooses action â_t from A_t (recommends an article)
– User provides stochastic feedback ŷ_t (user clicks on or "likes" the article), with E[ŷ_t] = w*ᵀ â_t, where w* is unknown
– Algorithm incorporates the feedback
– t = t + 1

Regret: R(T) = Σ_{t=1}^{T} ( max_{a ∈ A_t} w*ᵀ a − w*ᵀ â_t )
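The interaction protocol above can be simulated directly (a sketch with illustrative names; a random, non-learning policy stands in for the bandit algorithm, so its cumulative regret grows roughly linearly in T):

```python
import numpy as np

rng = np.random.default_rng(0)
D, T, n = 5, 100, 10
w_star = rng.normal(size=D)            # unknown user preference vector
w_star /= np.linalg.norm(w_star)

total_regret = 0.0
for t in range(T):
    A_t = rng.normal(size=(n, D))      # candidate articles at time t
    a_hat = A_t[rng.integers(n)]       # placeholder policy: pick at random
    # stochastic feedback with mean w_star^T a_hat
    y_hat = w_star @ a_hat + rng.normal(scale=0.1)
    # per-step regret: best achievable mean reward minus the chosen one
    total_regret += (A_t @ w_star).max() - w_star @ a_hat

print(total_regret)
```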

Balancing Exploration vs. Exploitation

At each iteration, score each action by Estimated Gain + Uncertainty of Estimate (the "Upper Confidence Bound") and select the maximizer.

[Figure: estimated gain by topic with uncertainty bars; in the example shown, the article on the economy is selected.]

Conventional Bandit Approach

LinUCB algorithm [Dani et al. 2008; Rusmevichientong & Tsitsiklis 2008; Abbasi-Yadkori et al. 2011]
– Uses a particular way of defining uncertainty
– Achieves regret that is linear in the dimensionality D and linear in the norm of w*

How can we do better?
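A LinUCB-style selection rule (ridge-regression estimate plus an ellipsoidal uncertainty bonus) can be sketched as follows. This is a simplified illustration: `alpha` is a hand-tuned exploration weight, not the calibrated confidence radius from the papers cited above.

```python
import numpy as np

def lin_ucb(T=200, D=5, n=10, alpha=1.0, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=D)
    w_star /= np.linalg.norm(w_star)
    V = lam * np.eye(D)               # regularized Gram matrix
    b = np.zeros(D)                   # running sum of y_t * a_t
    regret = 0.0
    for t in range(T):
        A_t = rng.normal(size=(n, D))
        w_hat = np.linalg.solve(V, b)      # ridge estimate of w*
        V_inv = np.linalg.inv(V)
        # UCB score: estimated gain + uncertainty (quadratic form in V^-1)
        widths = np.sqrt(np.einsum('ij,jk,ik->i', A_t, V_inv, A_t))
        a = A_t[np.argmax(A_t @ w_hat + alpha * widths)]
        y = w_star @ a + rng.normal(scale=0.1)
        V += np.outer(a, a)
        b += y * a
        regret += (A_t @ w_star).max() - w_star @ a
    return regret

print(lin_ucb())
```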

More Efficient Bandit Learning

LinUCB naively explores the full D-dimensional space (with S = ‖w*‖).

Assume w* lies mostly in a subspace:
– Dimensionality K << D
– E.g., "European vs. Asian News"
– Estimated using prior knowledge, e.g., existing user profiles

Two-tiered exploration (the feature hierarchy):
– First in the subspace
– Then in the full space
Significantly less exploration.

CoFineUCB: Coarse-to-Fine Hierarchical Exploration

At time t:
– Least squares in the subspace (uncertainty in subspace)
– Least squares in the full space, regularized to the projection onto the subspace (uncertainty in full space)
– Recommend the article a that maximizes the estimated gain plus both uncertainty terms
– Receive feedback ŷ_t
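The two least-squares steps can be sketched as follows (illustrative only: the exploration bonuses are omitted, and the regularization constants are placeholders):

```python
import numpy as np

def cofine_estimates(U, actions, rewards, lam_z=1.0, lam_w=1.0):
    """Two-level estimate: ridge regression in the K-dim subspace spanned
    by the columns of U, then full-space ridge regression regularized
    toward the lifted subspace estimate U @ z_hat."""
    A = np.asarray(actions)          # shape (t, D)
    y = np.asarray(rewards)          # shape (t,)
    D, K = U.shape
    Z = A @ U                        # actions projected into the subspace
    # least squares in the subspace
    z_hat = np.linalg.solve(Z.T @ Z + lam_z * np.eye(K), Z.T @ y)
    # least squares in the full space, regularized to w0 = U @ z_hat:
    # argmin_w ||A w - y||^2 + lam_w ||w - w0||^2
    w0 = U @ z_hat
    w_hat = w0 + np.linalg.solve(A.T @ A + lam_w * np.eye(D),
                                 A.T @ (y - A @ w0))
    return z_hat, w_hat
```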

Theoretical Intuition

Regret analysis of UCB algorithms requires two things:
– A rigorous confidence region for the true w*
– The shrinkage rate of the confidence region's size

CoFineUCB uses tighter confidence regions:
– Can prove w* lies mostly in the K-dim subspace
– Convolution of a K-dim ellipse with a small D-dim ellipse

Constructing Feature Hierarchies (One Simple Approach)

Empirical sample of learned user preferences: W = [w_1, …, w_N]

LearnU(W, K):
  [A, Σ, B] = SVD(W)   (i.e., W = AΣBᵀ)
  Return U = (AΣ^{1/2})_{(1:K)} / C   (C a normalizing constant)

Approximately minimizes the norms in the regret bound.
Similar to approaches for multi-task structure learning [Argyriou et al. 2007; Zhang & Yeung 2010].
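LearnU transcribes almost directly into numpy. The choice of normalizing constant C is left open on the slide, so unit spectral norm is assumed here:

```python
import numpy as np

def learn_U(W, K):
    """W stacks one learned user preference vector per column.
    Returns a D x K basis for the coarse subspace."""
    A, s, Bt = np.linalg.svd(W, full_matrices=False)   # W = A diag(s) B^T
    U = A[:, :K] * np.sqrt(s[:K])   # (A Sigma^{1/2}) restricted to top K
    C = np.linalg.norm(U, 2)        # normalizing constant (spectral norm)
    return U / C
```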

Simulation Comparison

Leave-one-out validation using existing user profiles from a previous personalization study [Yue & Guestrin 2011] (D = 100, K = 5).

Methods:
– Naïve (LinUCB, regularized to the mean of existing users)
– Reshaped Full Space (LinUCB using LearnU(W, D))
– Subspace (LinUCB using LearnU(W, K)); often what people resort to in practice
– CoFineUCB: combines the reshaped full space and subspace approaches

[Figure: simulation regret curves for the Naïve baselines, Reshaped Full Space, Subspace, and the Coarse-to-Fine approach, including a panel for "atypical users".]

User Study

– 10 days, 10 articles per day, drawn from thousands of articles for that day (from Spinn3r, Jan/Feb 2012)
– Submodular bandit extension to model the utility of multiple articles [Yue & Guestrin 2011]
– 100 topics, 5-dimensional subspace
– Users rate articles; we count the number of likes

User Study

~27 users per study.

[Figure: wins/ties/losses of Coarse-to-Fine vs. Naïve LinUCB, and of Coarse-to-Fine vs. LinUCB with Reshaped Full Space.]

*The short time horizon (T = 10) made comparison with Subspace LinUCB not meaningful.

Conclusions

Coarse-to-Fine approach for saving exploration:
– Principled approach for transferring prior knowledge
– Theoretical guarantees that depend on the quality of the constructed feature hierarchy
– Validated via simulations and a live user study

Future directions:
– Multi-level feature hierarchies
– Learning the feature hierarchy online (requires learning simultaneously from multiple users)
– Knowledge transfer for sparse models in the bandit setting

Research supported by ONR (PECASE) N , ONR YIP N , and by the Intel Science and Technology Center for Embedded Computing.

Extra Slides

Submodular Bandit Extension

– Algorithm recommends a set of articles
– Features depend on the articles above ("submodular basis features")
– User provides stochastic feedback

CoFine LSBGreedy

At time t:
– Least squares in the subspace
– Least squares in the full space (regularized to the projection onto the subspace)
– Start with A_t empty
– For i = 1, …, L: recommend the article a that maximizes the estimated marginal gain plus its uncertainty
– Receive feedback y_{t,1}, …, y_{t,L}
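The greedy inner loop can be sketched as follows. The feature map `phi(a, selected)` producing the submodular basis features is left abstract (a hypothetical callable), as is the uncertainty `bonus`:

```python
import numpy as np

def greedy_select(articles, phi, w_hat, bonus, L=5):
    """At each of L slots, score the remaining articles by estimated
    marginal gain plus uncertainty and add the argmax to the set."""
    selected = []
    remaining = list(articles)
    for _ in range(min(L, len(remaining))):
        feats = [phi(a, selected) for a in remaining]
        scores = [f @ w_hat + bonus(f) for f in feats]
        best = int(np.argmax(scores))
        selected.append(remaining.pop(best))
    return selected
```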

Comparison with Sparse Linear Bandits

Another possible assumption: w* is sparse
– At most B parameters are non-zero
– Sparse bandit algorithms achieve regret that depends on B (e.g., Carpentier & Munos 2011)

Limitations:
– No transfer of prior knowledge; e.g., we don't know WHICH parameters are non-zero
– Typically K < B, so CoFineUCB achieves lower regret (e.g., under fast singular value decay, S ≈ S_P)