Machine Learning for Online Decision Making + applications to Online Pricing Your guide: Avrim Blum, Carnegie Mellon University

Plan for Today: An interesting algorithm for online decision making. The problem of "combining expert advice". The same problem but now with very limited feedback: the "multi-armed bandit problem". Application to online pricing.

Using "expert" advice Say we want to predict the stock market. We solicit n "experts" for their advice. (Will the market go up or down?) We then want to use their advice somehow to make our prediction. Or think of spam filtering: we ask our friends for copies of their filtering strategies, and we want to do nearly as well as the best of them for us. Basic question: Is there a strategy that allows us to do nearly as well as the best of these in hindsight? ["expert" = someone with an opinion. Not necessarily someone who knows anything.]

Simpler question We have n "experts". One of these is perfect (never makes a mistake). We just don't know which one. Can we find a strategy that makes no more than lg(n) mistakes? Answer: sure. Just take a majority vote over all experts that have been correct so far. Each mistake cuts the number of available experts by a factor of 2. Note: this means it is OK for n to be very large. E.g., if we have a bunch of rules and want to look at all possible combinations, we might end up with a million combinations; taking the log gives a reasonable number like 20.
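Below is a minimal Python sketch of this majority-vote ("halving") idea; the class name and interface are illustrative, not from the lecture:

```python
class HalvingForecaster:
    """Majority vote over the experts that have never been wrong.

    Assumes some expert is perfect; makes at most log2(n) mistakes.
    """

    def __init__(self, n):
        self.alive = set(range(n))  # experts that are still consistent

    def predict(self, advice):
        # advice[i] in {0, 1}: expert i's prediction this round
        votes = sum(advice[i] for i in self.alive)
        return 1 if 2 * votes >= len(self.alive) else 0

    def update(self, advice, outcome):
        # cross off every expert that just made a mistake
        self.alive = {i for i in self.alive if advice[i] == outcome}
```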

What if no expert is perfect? One idea: just run the above protocol until all experts are crossed off, then restart with all experts and repeat. This makes at most log(n) mistakes per mistake of the best expert (plus an initial log(n)). Can we do better?

What if no expert is perfect? Intuition: Making a mistake doesn't completely disqualify an expert. So, instead of crossing off, just lower its weight. Weighted Majority Alg: Start with all experts having weight 1. Predict based on weighted majority vote. Penalize mistakes by cutting weight in half.
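A minimal sketch of the Weighted Majority algorithm just described (binary predictions, weight halved on each mistake); again the interface is illustrative:

```python
class WeightedMajority:
    """Deterministic Weighted Majority: halve an expert's weight on each mistake."""

    def __init__(self, n, penalty=0.5):
        self.w = [1.0] * n      # one weight per expert, all start at 1
        self.penalty = penalty  # multiply a wrong expert's weight by this

    def predict(self, advice):
        # weighted vote: advice[i] in {0, 1}
        up = sum(wi for wi, a in zip(self.w, advice) if a == 1)
        down = sum(wi for wi, a in zip(self.w, advice) if a == 0)
        return 1 if up >= down else 0

    def update(self, advice, outcome):
        for i, a in enumerate(advice):
            if a != outcome:
                self.w[i] *= self.penalty
```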

Analysis: do nearly as well as best expert in hindsight M = # mistakes we've made so far. m = # mistakes the best expert has made so far. W = total weight (starts at n). After each of our mistakes, W drops by at least 25%. So, after M mistakes, W is at most n(3/4)^M. The weight of the best expert is (1/2)^m. So (1/2)^m ≤ n(3/4)^M, which solves to M ≤ 2.4(m + lg n). So, if m is small, then M is pretty small too.

Randomized Weighted Majority M ≤ 2.4(m + lg n) is not so good if the best expert makes a mistake 20% of the time. Can we do better? Yes. Instead of taking a majority vote, use the weights as probabilities. (E.g., if 70% of the weight is on up and 30% on down, then predict up/down with probabilities 70:30.) Idea: smooth out the worst case. Also, generalize the penalty ½ to (1 − ε). M = expected number of mistakes.
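The same sketch adapted to Randomized Weighted Majority: predict with probability proportional to weight, and penalize by a factor (1 − ε). Purely illustrative:

```python
import random

class RandomizedWeightedMajority:
    """RWM: predict with probability proportional to weight; penalize by (1 - eps)."""

    def __init__(self, n, eps=0.1):
        self.w = [1.0] * n
        self.eps = eps

    def predict(self, advice):
        # probability of predicting 1 = fraction of weight on experts saying 1
        total = sum(self.w)
        p_up = sum(wi for wi, a in zip(self.w, advice) if a == 1) / total
        return 1 if random.random() < p_up else 0

    def update(self, advice, outcome):
        for i, a in enumerate(advice):
            if a != outcome:
                self.w[i] *= (1.0 - self.eps)
```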

Analysis Say at time t we have a fraction F_t of the weight on experts that made a mistake. So, we have probability F_t of making a mistake, and we remove an ε·F_t fraction of the total weight. W_final = n(1 − εF_1)(1 − εF_2)… ln(W_final) = ln(n) + ∑_t ln(1 − εF_t) ≤ ln(n) − ε ∑_t F_t (using ln(1−x) ≤ −x) = ln(n) − εM (since ∑_t F_t = E[# mistakes] = M). If the best expert makes m mistakes, then ln(W_final) ≥ ln((1−ε)^m). Now solve: ln(n) − εM ≥ m ln(1−ε). So if M is large, W_final must have dropped by a lot.
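Spelling out the final solve (a standard step, not shown on the slide): for ε ≤ 1/2,

$$
\ln(n) - \varepsilon M \ge m \ln(1-\varepsilon)
\;\Longrightarrow\;
M \le \frac{m \ln\tfrac{1}{1-\varepsilon} + \ln(n)}{\varepsilon}
\le (1+\varepsilon)\, m + \frac{1}{\varepsilon}\ln(n),
$$

using $\ln\tfrac{1}{1-\varepsilon} \le \varepsilon + \varepsilon^2$. This is the bound quoted on the next slide with OPT = m.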

Additive regret So, we have M ≤ OPT + ε·OPT + (1/ε) log(n). Say we know we will play for T time steps. Then we can set ε = (log(n)/T)^{1/2} and get M ≤ OPT + 2(T·log(n))^{1/2}. If we don't know T in advance, we can guess and double. These are called "additive regret" bounds.

Extensions What if experts are actions? (Rows in a matrix game, the choice of deterministic algorithm to run, …) At each time t, each action has a loss (cost) in {0,1}. We can still run the algorithm: rather than viewing it as "pick a prediction with probability proportional to its weight", view it as "pick an expert with probability proportional to its weight". The same analysis applies.

Extensions What if losses (costs) are in [0,1]? Here is a simple way to extend the results. Given cost vector c, view c_i as the bias of a coin. Flip to create a boolean vector c' such that E[c'_i] = c_i, and feed c' to alg A. (Write cost' for cost measured on the c' vectors, and $ for the randomness of the coin flips.) For any sequence of vectors c', we have: E_A[cost'(A)] ≤ min_i cost'(i) + [regret term]. So, E_$[E_A[cost'(A)]] ≤ E_$[min_i cost'(i)] + [regret term]. The LHS is E_A[cost(A)]. The RHS ≤ min_i E_$[cost'(i)] + [r.t.] = min_i cost(i) + [r.t.]. In other words, costs between 0 and 1 just make the problem easier…
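A minimal sketch of that reduction, assuming a hypothetical experts algorithm A with pick/update methods for {0,1} costs (such as the RWM sketch above):

```python
import random

def round_costs(costs):
    """Randomized rounding: replace each cost c_i in [0,1] by a {0,1} flip with E[c'_i] = c_i."""
    return [1 if random.random() < c else 0 for c in costs]

# Usage sketch (A is any experts algorithm that handles {0,1} costs):
# for each round:
#     i = A.pick_expert()
#     c_prime = round_costs(true_costs)   # boolean cost vector fed to A
#     A.update(c_prime)                   # A records cost c_prime[i]
```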

Online pricing Say you are selling lemonade (or a cool new software tool, or bottles of water at the world expo). Protocol #1: for t = 1, 2, …, T: Seller sets price p_t. Buyer arrives with valuation v_t. If v_t ≥ p_t, the buyer purchases and pays p_t; else doesn't. v_t is revealed to the algorithm. Repeat. Protocol #2: same as Protocol #1 but without v_t revealed. Assume all valuations are in [1,h]. Goal: do nearly as well as the best fixed price in hindsight.

Online pricing Say you are selling lemonade (or a cool new software tool, or bottles of water at the world expo). Protocol #1: for t = 1, 2, …, T: Seller sets price p_t. Buyer arrives with valuation v_t. If v_t ≥ p_t, the buyer purchases and pays p_t; else doesn't. v_t is revealed to the algorithm. Bad algorithm: "best price in the past". What if the sequence of buyers is 1, h, 1, …, 1, h, 1, …, 1, h, …? The algorithm makes T/h while OPT makes T: a factor of h worse!

Online pricing Say you are selling lemonade (or a cool new software tool, or bottles of water at the world expo). Protocol #1: for t = 1, 2, …, T: Seller sets price p_t. Buyer arrives with valuation v_t. If v_t ≥ p_t, the buyer purchases and pays p_t; else doesn't. v_t is revealed to the algorithm. Good algorithm: Randomized Weighted Majority! Define one expert for each price p ∈ [1,h], so #experts = h. The best price of this form gives profit OPT. Run the RWM algorithm. Get expected gain at least OPT/(1+ε) − O(ε^{-1} h log h). [The extra factor of h comes from the range of the gains.]
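A minimal sketch of RWM for Protocol #1, with one expert per price in {1, …, h}; the function name, the integer price grid, and the particular multiplicative update are illustrative assumptions:

```python
import random

def rwm_pricing(valuations, h, eps=0.1):
    """Sketch: randomized weighted majority with one expert per price 1..h.

    In Protocol #1 the valuation v_t is revealed afterwards, so every expert's
    gain (its price if it would have sold, else 0) can be computed each round.
    """
    prices = list(range(1, h + 1))               # one expert per candidate price
    w = [1.0] * len(prices)
    revenue = 0.0
    for v in valuations:
        p = random.choices(prices, weights=w)[0]  # pick a price with prob. ~ weight
        if v >= p:
            revenue += p
        # full-information update: gain in [0, h], normalized to [0, 1]
        for i, price in enumerate(prices):
            gain = price if v >= price else 0
            w[i] *= (1.0 - eps) ** (1.0 - gain / h)  # penalize in proportion to loss
    return revenue
```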

Online pricing Say you are selling lemonade (or a cool new software tool, or bottles of water at the world expo). What about Protocol #2? [We only see the accept/reject decision.] Now we can't run RWM directly since we don't know how to penalize the experts! This is called the "adversarial multi-armed bandit problem". How can we solve that?

Multi-armed bandit problem Exponential Weights for Exploration and Exploitation (Exp3) [Auer, Cesa-Bianchi, Freund, Schapire]. n = #experts. Exp3 runs RWM as a subroutine: RWM proposes a distribution p^t over experts; Exp3 mixes in exploration, q^t = (1−γ)p^t + γ·(uniform), and pulls expert i ~ q^t. It observes only that expert's gain g_i^t, and feeds RWM the estimated gain vector ĝ^t = (0, …, 0, g_i^t/q_i^t, 0, …, 0), whose nonzero entry is at most nh/γ. 1. RWM believes its gain is: p^t · ĝ^t = p_i^t (g_i^t/q_i^t) ≡ g^t_RWM. 2. ∑_t g^t_RWM ≥ ÔPT/(1+ε) − O(ε^{-1} (nh/γ) log n), where ÔPT = max_j ∑_t ĝ_j^t. 3. The actual gain is: g_i^t = g^t_RWM (q_i^t/p_i^t) ≥ g^t_RWM (1−γ). 4. E[ÔPT] ≥ OPT, because E[ĝ_j^t] = (1−q_j^t)·0 + q_j^t(g_j^t/q_j^t) = g_j^t, so E[max_j ∑_t ĝ_j^t] ≥ max_j E[∑_t ĝ_j^t] = OPT.
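A minimal Python sketch of Exp3 as described above; `get_gain`, the parameter names, and the exact update scaling are illustrative assumptions (gains assumed to lie in [0, h]):

```python
import random

def exp3(n, h, gamma, eps, get_gain, T):
    """Sketch of Exp3: RWM on estimated gains, mixed with gamma-uniform exploration.

    get_gain(t, i) returns the gain of expert i at time t, assumed to lie in [0, h];
    only the pulled arm's gain is ever observed.
    """
    w = [1.0] * n
    total_gain = 0.0
    for t in range(T):
        total = sum(w)
        p = [wi / total for wi in w]                      # RWM's distribution
        q = [(1 - gamma) * pi + gamma / n for pi in p]    # mix in exploration
        i = random.choices(range(n), weights=q)[0]        # pull one arm
        g = get_gain(t, i)                                # only this gain is observed
        total_gain += g
        ghat = g / q[i]                                   # unbiased estimate of expert i's gain
        w[i] *= (1 + eps) ** (ghat / (n * h / gamma))     # RWM update on gain scaled into [0,1]
    return total_gain
```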

Multi-armed bandit problem (Exp3, continued) Conclusion (setting γ = ε): E[Exp3] ≥ OPT/(1+ε)² − O(ε^{-2} n h log(n)). Quick improvement: choose expert i to be the price (1+ε)^i. This gives n = log_{1+ε}(h) experts, and only hurts OPT by at most a (1+ε) factor.

Multi-armed bandit problem (Exp3, continued) Can even reduce the ε^{-2} to ε^{-1} with more care in the analysis. Conclusion (γ = ε and n = log_{1+ε}(h)): E[Exp3] ≥ OPT/(1+ε)³ − O(ε^{-2} h log(h) loglog(h)). Almost as good as Protocol #1!

Summary Algorithms for online decision-making with strong guarantees on performance compared to the best fixed choice. Application: play a repeated game against an adversary, performing nearly as well as the best fixed strategy in hindsight. Can apply even with very limited feedback. E.g., an example might represent different variable settings in some experiment, with the label representing the outcome. What we've been describing would correspond to an algorithm that passively observes the experiments of others (or, in the case above, gets to choose among some fixed set). But one might also allow the algorithm to experiment: pick its own settings and try them out. Strategies are often very intuitive, e.g., take a positive example and a negative example and walk one toward the other to see what made the change, much like the notion that in a scientific experiment you should only change one variable at a time. Often you need a combination of active and passive: purely active experimentation might never produce a successful outcome, so you need observations to get you started. Application: online pricing, even if we only have buy/no-buy feedback.