Online Learning. Your guide: Avrim Blum, Carnegie Mellon University. [Machine Learning Summer School 2012]


Itinerary
Stop 1: Minimizing regret and combining expert advice.
–Randomized Weighted Majority / Multiplicative Weights algorithm
–Connections to game theory
Stop 2: Extensions
–Online learning from limited feedback (bandit algorithms)
–Algorithms for large action spaces, sleeping experts
Stop 3: Powerful online LTF algorithms
–Winnow, Perceptron
Stop 4: Powerful tools for using these algorithms
–Kernels and similarity functions
Stop 5: Something completely different
–Distributed machine learning

Stop 1: Minimizing regret and combining expert advice

Consider the following setting…
 Each morning, you need to pick one of N possible routes to drive to work.
 But traffic is different each day, and it is not clear a priori which route will be best. When you get there you find out how long your route took. (And maybe you find out the other routes' times too, or maybe not.)
 Is there a strategy for picking routes so that in the long run, whatever the sequence of traffic patterns has been, you’ve done nearly as well as the best fixed route in hindsight? (In expectation, over internal randomness in the algorithm.)
 Yes.

“No-regret” algorithms for repeated decisions
A bit more generally:
 The algorithm has N options. The world chooses a cost vector. We can view this as a matrix (rows = the algorithm's options, columns = the world's choices; there may be infinitely many columns).
 At each time step, the algorithm picks a row and life picks a column. The algorithm pays the cost of the action chosen, and gets the whole column as feedback (or just its own cost in the “bandit” model). We need to assume some bound on the max cost; let’s say all costs are between 0 and 1.

“No-regret” algorithms for repeated decisions (contd)
Define the average regret in T time steps as:
(avg per-day cost of the algorithm) – (avg per-day cost of the best fixed row in hindsight).
We want this to go to 0 or better as T gets large. [Such an algorithm is called a “no-regret” algorithm.]
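Written out as a formula (the per-step cost notation c_t(·) and the algorithm's action a_t are our own, not from the slides):

```latex
\text{average regret} \;=\; \frac{1}{T}\sum_{t=1}^{T} c_t(a_t)
\;-\; \min_{i \in \{1,\dots,N\}} \frac{1}{T}\sum_{t=1}^{T} c_t(i).
```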

Some intuition & properties of no-regret algs.  Let’s look at a small example:  Note: Not trying to compete with best adaptive strategy – just best fixed path in hindsight.  No-regret algorithms can do much better than playing minimax optimal, and never much worse.  Existence of no-regret algs yields immediate proof of minimax thm! Algorithm World – life - fate dest Will define this later This too

Some intuition & properties of no-regret algs.  Let’s look at a small example:  View of world/life/fate: unknown sequence LRLLRLRR...  Goal: do well (in expectation) no matter what the sequence is.  Algorithms must be randomized or else it’s hopeless.  Viewing as game: algorithm against the world. (World as adversary) Algorithm World – life - fate dest

History and development (abridged)
 [Hannan’57, Blackwell’56]: Algorithms with regret O((N/T)^{1/2}). Re-phrasing: we need only T = O(N/ε²) steps to get the time-average regret down to ε (we will call this quantity T_ε).
 This is the optimal dependence on T (or ε). Game theorists viewed the number of rows N as a constant, not so important as T, so the problem was considered pretty much done.
 Why optimal in T? Say the world flips a fair coin each day. Any algorithm, in T days, has expected cost T/2. But E[min(#heads, #tails)] = T/2 – O(T^{1/2}). So the per-day gap is O(1/T^{1/2}).
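In symbols, the coin-flipping lower bound (a sketch; the √T term is the standard deviation of the binomial distribution):

```latex
E[\text{cost of any alg}] = \frac{T}{2},\qquad
E\big[\min(\#\text{heads},\,\#\text{tails})\big] = \frac{T}{2} - \Theta(\sqrt{T})
\;\Rightarrow\;
\text{per-day regret} = \Theta\!\big(1/\sqrt{T}\big).
```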

History and development (abridged, contd)
 [Hannan’57, Blackwell’56]: Algorithms with regret O((N/T)^{1/2}); i.e., T_ε = O(N/ε²), with optimal dependence on T (or ε), as above.
 Learning theory, ’80s–’90s: “combining expert advice”. Imagine a large class C of N prediction rules; perform (nearly) as well as the best f ∈ C.
 [Littlestone-Warmuth ’89]: the Weighted Majority algorithm. E[cost] ≤ OPT·(1+ε) + ε^{-1} log N. Regret O(((log N)/T)^{1/2}); T_ε = O((log N)/ε²).
 Optimal as a function of N too, plus lots of work on exact constants, 2nd-order terms, etc. [CFHHSW93]…
 Extensions to the bandit model (adds an extra factor of N).

To think about this, let’s look at the problem of “combining expert advice”.

Using “expert” advice
E.g., say we want to predict the stock market. We solicit n “experts” for their advice (will the market go up or down?) and then want to use their advice somehow to make our prediction.
Basic question: Is there a strategy that allows us to do nearly as well as the best of these experts in hindsight?
[“Expert” = someone with an opinion. Not necessarily someone who knows anything.]

Simpler question
We have n “experts”. One of these is perfect (never makes a mistake); we just don’t know which one. Can we find a strategy that makes no more than lg(n) mistakes?
Answer: sure. Just take the majority vote over all experts that have been correct so far (the “halving algorithm”).
 Each mistake we make cuts the # of surviving experts by at least a factor of 2.
 Note: this means it is fine for n to be very large.
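A minimal sketch of the halving algorithm in Python (function and variable names are our own, not from the slides):

```python
def halving_algorithm(expert_predictions, outcomes):
    """Predict with the majority vote of experts that have never erred.

    expert_predictions: list of T lists, each holding n 0/1 predictions.
    outcomes: list of T true 0/1 labels.
    Assumes some expert is perfect; then we make <= log2(n) mistakes.
    """
    n = len(expert_predictions[0])
    alive = set(range(n))            # experts with no mistakes so far
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        votes_for_1 = sum(preds[i] for i in alive)
        guess = 1 if 2 * votes_for_1 >= len(alive) else 0   # majority vote
        if guess != truth:
            mistakes += 1            # majority was wrong => alive shrinks by half
        alive = {i for i in alive if preds[i] == truth}     # cross off the wrong
    return mistakes
```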

What if no expert is perfect?
One idea: just run the above protocol until all experts are crossed off, then restart from scratch. This makes at most log(n) of our mistakes per mistake of the best expert (plus an initial log(n)).
Seems wasteful: we are constantly forgetting what we’ve “learned”. Can we do better?

Weighted Majority Algorithm
Intuition: making a mistake doesn’t completely disqualify an expert. So, instead of crossing it off, just lower its weight.
Weighted Majority Alg:
– Start with all experts having weight 1.
– Predict based on the weighted majority vote.
– Penalize mistakes by cutting the offending expert’s weight in half.
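A sketch in Python (again with invented names; the halving algorithm above is the special case where the penalty is total disqualification):

```python
def weighted_majority(expert_predictions, outcomes, penalty=0.5):
    """Deterministic Weighted Majority: halve the weight of each wrong expert."""
    n = len(expert_predictions[0])
    w = [1.0] * n                    # all experts start with weight 1
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        weight_for_1 = sum(w[i] for i in range(n) if preds[i] == 1)
        guess = 1 if weight_for_1 >= sum(w) / 2 else 0   # weighted majority vote
        if guess != truth:
            mistakes += 1
        for i in range(n):           # cut wrong experts' weight in half
            if preds[i] != truth:
                w[i] *= penalty
    return mistakes
```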

Analysis: do nearly as well as the best expert in hindsight
M = # mistakes we’ve made so far. m = # mistakes the best expert has made so far. W = total weight (starts at n).
After each of our mistakes, W drops by at least 25% (at least half the weight voted wrong and gets halved). So, after M mistakes, W is at most n(3/4)^M. The weight of the best expert is (1/2)^m. So (1/2)^m ≤ n(3/4)^M, which solves to M ≤ 2.4(m + lg n).
So, if m is small, then M is pretty small too.
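Filling in the one-line solve (the constant comes from 1/lg(4/3) ≈ 2.41):

```latex
\left(\tfrac{1}{2}\right)^{m} \;\le\; W \;\le\; n\left(\tfrac{3}{4}\right)^{M}
\;\Rightarrow\;
M \;\le\; \frac{m + \lg n}{\lg(4/3)} \;\le\; 2.4\,(m + \lg n).
```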

Randomized Weighted Majority
M ≤ 2.4(m + lg n) is not so good if the best expert makes a mistake 20% of the time. Can we do better? Yes.
Instead of taking the majority vote, use the weights as probabilities (e.g., if 70% of the weight is on up and 30% on down, then predict up with probability 0.7). Idea: smooth out the worst case.
Also, generalize the penalty ½ to 1-ε, and let M now denote the expected # of mistakes.
Unlike most worst-case bounds, the numbers here are pretty good.
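A sketch of Randomized Weighted Majority in Python (names are ours; `epsilon` is the parameter in the 1-ε penalty on the slide):

```python
import random

def randomized_weighted_majority(expert_predictions, outcomes, epsilon=0.1):
    """RWM: follow an expert drawn with probability proportional to its weight;
    multiply each wrong expert's weight by (1 - epsilon)."""
    n = len(expert_predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        i = random.choices(range(n), weights=w)[0]   # sample an expert
        if preds[i] != truth:
            mistakes += 1
        for j in range(n):                           # penalize wrong experts
            if preds[j] != truth:
                w[j] *= (1 - epsilon)
    return mistakes
```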

Analysis
Say at time t we have fraction F_t of the weight on experts that made a mistake. So, we have probability F_t of making a mistake, and we remove an εF_t fraction of the total weight:
–W_final = n(1-εF_1)(1-εF_2)…
–ln(W_final) = ln(n) + Σ_t ln(1-εF_t) ≤ ln(n) - ε Σ_t F_t (using ln(1-x) < -x) = ln(n) - εM. (Σ_t F_t = E[# mistakes] = M)
If the best expert makes m mistakes, then ln(W_final) ≥ ln((1-ε)^m). Now solve: ln(n) - εM ≥ m ln(1-ε).
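Finishing the solve (this last step is implicit on the slide; it uses ln(1/(1-ε)) ≤ ε + ε² for ε ≤ 1/2):

```latex
\epsilon M \;\le\; \ln n + m \ln\frac{1}{1-\epsilon}
\;\Rightarrow\;
M \;\le\; \frac{\ln n}{\epsilon} + m\,\frac{\epsilon+\epsilon^{2}}{\epsilon}
\;=\; (1+\epsilon)\,m + \frac{\ln n}{\epsilon}.
```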

Summarizing
E[# mistakes] ≤ (1+ε)m + ε^{-1} log(n).
If we set ε = (log(n)/m)^{1/2} to balance the two terms (or use guess-and-double), we get the bound E[# mistakes] ≤ m + 2(m · log n)^{1/2}.
Since m ≤ T, this is at most m + 2(T log n)^{1/2}. So, the average regret → 0.

What can we use this for? Can use to combine multiple algorithms to do nearly as well as best in hindsight. But what about cases like choosing paths to work, where “experts” are different actions, not different predictions?

Extensions
What if experts are actions? (paths in a network, rows in a matrix game, …) At each time t, each action has a loss (cost) in {0,1}. We can still run the algorithm:
–rather than viewing it as “pick a prediction with probability proportional to its weight”,
–view it as “pick an expert with probability proportional to its weight”;
–choose expert i with probability p_i = w_i / Σ_j w_j.
The same analysis applies.

Extensions (contd)
What if the losses (costs) are in [0,1]? If expert i has cost c_i, do: w_i ← w_i(1 - c_i ε).
Our expected cost = Σ_i c_i w_i / W. Amount of weight removed = ε Σ_i w_i c_i. So, the fraction of weight removed = ε · (our expected cost). The rest of the proof continues as before…
So, now we can drive to work! (assuming full feedback)
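A sketch of this actions-with-fractional-costs version in Python (our own names; each day's feedback is a full cost vector with entries in [0,1]):

```python
import random

def rwm_actions(cost_vectors, epsilon=0.1):
    """Multiplicative weights over N actions with costs in [0,1]:
    pick action i with prob w[i]/W, then update w[j] *= (1 - epsilon * c[j])."""
    n = len(cost_vectors[0])
    w = [1.0] * n
    total_cost = 0.0
    for costs in cost_vectors:                 # one cost vector per day
        i = random.choices(range(n), weights=w)[0]
        total_cost += costs[i]
        for j in range(n):                     # full-feedback update
            w[j] *= (1 - epsilon * costs[j])
    return total_cost
```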

Connections to Game Theory

Consider the following scenario… Shooter has a penalty shot. Can choose to shoot left or shoot right. Goalie can choose to dive left or dive right. If goalie guesses correctly, (s)he saves the day. If not, it’s a goooooaaaaall! Vice-versa for shooter.

2-Player Zero-Sum games
Two players, R and C. Zero-sum means that what’s good for one is bad for the other. The game is defined by a matrix with a row for each of R’s options and a column for each of C’s options; the matrix tells who wins how much. An entry (x,y) means: x = payoff to the row player, y = payoff to the column player. “Zero-sum” means that y = -x.
E.g., penalty shot (rows = shooter, columns = goalie; (0,0) = no goal, (1,-1) = GOAALLL!!!):

                 goalie Left   goalie Right
shooter Left        (0,0)         (1,-1)
shooter Right       (1,-1)        (0,0)

Game theory terminology
Rows and columns are called pure strategies. Randomized algorithms are called mixed strategies. “Zero-sum” means that the game is purely competitive: every (x,y) satisfies x+y=0. (The game doesn’t have to be fair.)

Minimax-optimal strategies
A minimax optimal strategy is a (randomized) strategy that has the best guarantee on its expected gain over all choices of the opponent [it maximizes the minimum]. I.e., it is the thing to play if your opponent knows you well.

Minimax-optimal strategies
What are the minimax optimal strategies for the penalty-shot game above?
The minimax optimal strategy for both players is 50/50. This gives an expected gain of ½ for the shooter (-½ for the goalie). Any other strategy guarantees less.

Minimax-optimal strategies (contd)
How about a penalty shot with a goalie who’s weaker on the left, so a shot to the left scores half the time even when the goalie guesses correctly?

                 goalie Left   goalie Right
shooter Left       (½,-½)        (1,-1)
shooter Right      (1,-1)        (0,0)

Minimax optimal for the shooter is (2/3, 1/3). This guarantees an expected gain of at least 2/3. Minimax optimal for the goalie is also (2/3, 1/3), guaranteeing an expected loss of at most 2/3.
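Checking the (2/3, 1/3) claim (this computation is not spelled out on the slide): if the shooter shoots Left with probability p, his expected gain is

```latex
\text{vs. goalie Left: } \tfrac{1}{2}p + 1\cdot(1-p) = 1 - \tfrac{p}{2};
\qquad
\text{vs. goalie Right: } 1\cdot p + 0\cdot(1-p) = p.
```

Equalizing, 1 - p/2 = p gives p = 2/3 and a guaranteed gain of 2/3; the symmetric computation for the goalie also yields (2/3, 1/3).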

Minimax-optimal strategies (contd)
We can solve for minimax-optimal strategies using linear programming.
No-regret strategies will do nearly as well as, or better than, minimax optimal against any sequence of opponent plays!
–They do nearly as well as the best fixed choice in hindsight.
–This implies they do nearly as well as the best distribution in hindsight,
–which implies they do nearly as well as minimax optimal!

Minimax Theorem (von Neumann 1928)
Every 2-player zero-sum game has a unique value V. The minimax optimal strategy for R guarantees R’s expected gain is at least V; the minimax optimal strategy for C guarantees C’s expected loss is at most V.
Counterintuitive: this means it doesn’t hurt to publish your strategy if both players are optimal. (Borel had proved the theorem for symmetric 5x5 games but thought it was false for larger ones.)

Proof of the minimax theorem using RWM
Suppose for contradiction it were false. This means some game G has V_C > V_R:
–If the Column player commits first, there exists a row that gets the Row player at least V_C.
–But if the Row player has to commit first, the Column player can hold him to only V_R.
Scale the matrix so payoffs to the row player are in [-1,0]. Say V_R = V_C - ε.

Proof contd
Now, consider playing the randomized weighted-majority algorithm as Row, against a Col who plays optimally against Row’s distribution. In T steps,
–Alg gets ≥ [best row in hindsight] – 2(T log n)^{1/2}
–BRiH ≥ T · V_C [best response against the opponent’s empirical distribution]
–Alg ≤ T · V_R [each time, the opponent knows your randomized strategy]
–Gap is εT. This contradicts the assumption once εT > 2(T log n)^{1/2}, i.e., once T > 4 log(n)/ε².
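Chaining the bullets together (a restatement of the contradiction, with BRiH = best row in hindsight):

```latex
T \cdot V_C - 2\sqrt{T\log n}
\;\le\; \text{BRiH} - 2\sqrt{T\log n}
\;\le\; \text{Alg}
\;\le\; T \cdot V_R
\;=\; T\,(V_C - \epsilon).
```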

Proof contd
Note that this procedure also gives a fast way to compute approximately minimax-optimal strategies, if we can simulate Col (i.e., compute a best response) quickly.
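A minimal sketch of that procedure in Python (names are ours; payoffs here are Row's losses in [0,1], and the returned time-averaged distribution is approximately minimax optimal for large T):

```python
def approx_minimax(payoff, T=10_000, epsilon=0.05):
    """Approximate minimax-optimal row strategy for a zero-sum game,
    by running multiplicative weights for Row against a best-responding Col.
    payoff[i][j] = loss to Row (in [0,1]) when Row plays i and Col plays j."""
    n, m = len(payoff), len(payoff[0])
    w = [1.0] * n
    avg_row = [0.0] * n                    # time-averaged row distribution
    for _ in range(T):
        W = sum(w)
        p = [wi / W for wi in w]
        for i in range(n):
            avg_row[i] += p[i] / T
        # Col best-responds: the column maximizing Row's expected loss
        j = max(range(m), key=lambda c: sum(p[i] * payoff[i][c] for i in range(n)))
        for i in range(n):                 # multiplicative-weights update
            w[i] *= (1 - epsilon * payoff[i][j])
    return avg_row                         # approx minimax-optimal mixed strategy
```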

An interesting game: “smuggler vs. border guard”
Given a graph G with source s and sink t: the smuggler chooses an s-t path, and the border guard chooses an edge to watch. If the watched edge is on the path, the guard wins; otherwise the smuggler wins.
What are the minimax optimal strategies?

An interesting game: “smuggler vs. border guard” (contd)
Border guard: find a minimum cut and pick a uniformly random edge in it. Smuggler: find a maximum flow, scale it to a unit flow, and use the induced probability distribution on paths. (These are minimax optimal: every s-t path crosses the min cut, so the guard catches the smuggler with probability at least 1/|min cut|, while the scaled flow puts probability at most 1/|min cut| on any single edge.)

An interesting game (contd)
The latest fast approximate max-flow algorithms are based on applying RWM to variations on this game:
–Run RWM for the border guard (experts = edges).
–The best-response computation is a shortest-path or linear-system solve.