Experts Learning and the Minimax Theorem for Zero-Sum Games
Maria Florina Balcan, December 8th, 2011


Motivation
Many situations involve repeated decision making:
- Deciding how to invest your money (buy or sell stocks)
- What route to drive to work each day
- Playing a game repeatedly against an opponent with unknown strategy
This course: learning algorithms for such settings, with connections to game-theoretic notions of equilibria.

Roadmap
Last lecture: online learning; combining expert advice; the Weighted Majority Algorithm.
This lecture: online learning, game theory, minimax optimality.

Recap: Online learning, minimizing regret, and combining expert advice.
- "The Weighted Majority Algorithm", N. Littlestone & M. Warmuth
- "Online Algorithms in Machine Learning" (survey), A. Blum
- Algorithmic Game Theory, Nisan, Roughgarden, Tardos, Vazirani (eds), Chapter 4

Online learning, minimizing regret, and combining expert advice. [Figure: predictions from Expert 1, Expert 2, Expert 3]

Using "expert" advice
Assume we want to predict the stock market: will the market go up or down? We solicit n "experts" for their advice, and then want to use their advice somehow to make our prediction. Can we do nearly as well as the best expert in hindsight?
Note: "expert" ≡ someone with an opinion. [Not necessarily someone who knows anything.]

Formal model
There are n experts. For each round t = 1, 2, …, T:
- Each expert makes a prediction in {0,1}.
- The learner (using the experts' predictions) makes a prediction in {0,1}.
- The learner observes the actual outcome. There is a mistake if the predicted outcome is different from the actual outcome.
Goal: do nearly as well as the best expert in hindsight.

Weighted Majority Algorithm
Key point: a mistake doesn't completely disqualify an expert. Instead of crossing it off, just lower its weight.
- Start with all experts having weight 1.
- Predict based on weighted majority vote: if the total weight of experts predicting 1 is at least the total weight predicting 0, predict 1; else predict 0.

Weighted Majority Algorithm
Key point: a mistake doesn't completely disqualify an expert. Instead of crossing it off, just lower its weight.
- Start with all experts having weight 1.
- Predict based on weighted majority vote.
- Penalize mistakes by cutting the weight in half.
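The voting-and-halving procedure above can be sketched as follows (a minimal illustration; the function name and data layout are mine, not from the slides):

```python
def weighted_majority(expert_preds, outcomes):
    """Deterministic Weighted Majority: predict by weighted majority
    vote, then halve the weight of every expert that erred.
    expert_preds[t][i] = expert i's {0,1} prediction in round t."""
    n = len(expert_preds[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        # Weighted majority vote: compare weight on "1" to half the total.
        weight_on_1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        guess = 1 if weight_on_1 >= sum(w) / 2 else 0
        if guess != y:
            mistakes += 1
        # Penalize every expert that was wrong this round.
        w = [wi / 2 if p != y else wi for wi, p in zip(w, preds)]
    return mistakes, w
```

On any sequence, the mistake count of this procedure satisfies the 2.4(OPT + lg n) bound from the analysis slide.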

Analysis: do nearly as well as the best expert in hindsight
Theorem: If M = # mistakes we've made so far and OPT = # mistakes the best expert has made so far, then M ≤ 2.4(OPT + lg n).

Randomized Weighted Majority
Can we do better? 2.4(OPT + lg n) is not so good if the best expert makes a mistake 20% of the time.
Yes. Instead of taking a majority vote, use the weights as probabilities: this is equivalent to selecting an expert with probability proportional to its weight. (E.g., if 70% of the weight is on up and 30% on down, then pick 70:30.) Also, generalize the penalty ½ to 1 - ε.
Key point: smooth out the worst case.

Randomized Weighted Majority

Formal Guarantee for Randomized Weighted Majority
Theorem: If M = expected # mistakes we've made so far and OPT = # mistakes the best expert has made so far, then M ≤ (1 + ε)·OPT + (1/ε)·log(n).

Randomized Weighted Majority: analysis
Let F_t be the fraction of weight on the wrong answer in round t, so M = ∑_t F_t. Each round the total weight drops by a factor (1 - ε·F_t), so the final total weight is W = n·∏_t (1 - ε·F_t), and the best expert contributes at least (1 - ε)^OPT ≤ W. Taking logs and using ln(1 - x) ≤ -x, this solves to: M ≤ (1 + ε)·OPT + (1/ε)·log(n).

Summarizing
E[# mistakes] ≤ (1 + ε)·OPT + ε⁻¹·log(n).
If we set ε = (log(n)/OPT)^{1/2} to balance the two terms (or use guess-and-double), we get the bound E[mistakes] ≤ OPT + 2(OPT·log n)^{1/2}.
Note: of course we might not know OPT, so if running for T time steps, since OPT ≤ T, set ε to get additive regret (2T·log n)^{1/2}: E[mistakes] ≤ OPT + 2(T·log n)^{1/2}.
So regret/T → 0. [no-regret algorithm]
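A runnable sketch of Randomized Weighted Majority for this binary-prediction setting (a hedged illustration; the function name and interface are mine, not from the slides):

```python
import random

def randomized_weighted_majority(expert_preds, outcomes, eps=0.1, rng=None):
    """RWM: pick an expert with probability proportional to its weight,
    predict what it predicts, then multiply the weight of every wrong
    expert by (1 - eps)."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    n = len(expert_preds[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        # Sample an expert index with probability w[i] / sum(w).
        r, acc, chosen = rng.random() * sum(w), 0.0, n - 1
        for i, wi in enumerate(w):
            acc += wi
            if r <= acc:
                chosen = i
                break
        if preds[chosen] != y:
            mistakes += 1
        w = [wi * (1 - eps) if p != y else wi for wi, p in zip(w, preds)]
    return mistakes
```

With eps = 0.1 and a perfect expert present (OPT = 0), the expected number of mistakes stays around (1/ε)·log(n), independent of T.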

What if we have n options, not n predictors?
E.g., n different ways to drive to work each day, or n different ways to invest our money. We're not combining n experts, we're choosing one. Can we still do it?
Nice feature of RWM: it can be applied when the experts are n different options. We did not use the predictions in order to select an expert (we only needed to see their losses to update our weights).

Decision-Theoretic Version: Formal model
There are n experts. No predictions. For each round t = 1, 2, …, T:
- The learner produces a probability distribution p_t over experts based on their past performance.
- The learner is given a loss vector l_t and incurs expected loss l_t · p_t.
- The learner updates the weights.
The guarantee also applies to this model! [Interesting for connections between game theory and learning.]

Can generalize to losses in [0,1]
If expert i has loss l_i, do: w_i ← w_i·(1 - ε·l_i). [Before, if an expert had a loss of 1 we multiplied its weight by (1 - ε), and if it had a loss of 0 we left it alone; now we interpolate linearly in between.] Same analysis as before.
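The decision-theoretic version with real-valued losses can be sketched in a few lines (an illustrative implementation; the function name is mine, not from the slides):

```python
def dt_weighted_majority(loss_vectors, eps=0.1):
    """Decision-theoretic RWM: play the distribution proportional to
    the weights, suffer expected loss l_t . p_t, then update each
    weight by w_i <- w_i * (1 - eps * l_i), for losses in [0,1].
    Returns the total expected loss over all rounds."""
    n = len(loss_vectors[0])
    w = [1.0] * n
    total_loss = 0.0
    for l in loss_vectors:
        s = sum(w)
        p = [wi / s for wi in w]                      # play p_t
        total_loss += sum(pi * li for pi, li in zip(p, l))
        w = [wi * (1 - eps * li) for wi, li in zip(w, l)]
    return total_loss
```

Note that no predictions are involved: only the loss vector l_t is needed, which is what makes this version usable for repeated game play below.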

This lecture: Online Learning, Game Theory, and Minimax Optimality
Reference: "Game Theory, On-line Prediction, and Boosting", Freund & Schapire, GEB.

Zero-Sum Games
A game is defined by a matrix M. Example (Mindy's loss matrix for Rock-Paper-Scissors):

              Rock   Paper  Scissors
    Rock      1/2    1      0
    Paper     0      1/2    1
    Scissors  1      0      1/2

Row player (Mindy) chooses row i; column player (Max) chooses column j (simultaneously).
Mindy's goal: minimize her loss M(i,j). Max's goal: maximize this loss (zero-sum).
Assume wlog entries are in [0,1].

Randomized Play
Mindy chooses a distribution P over rows; Max chooses a distribution Q over columns [simultaneously].
Mindy's expected loss: M(P,Q) = ∑_{i,j} P(i)·M(i,j)·Q(j).
Notation: i, j denote pure strategies, and P, Q denote mixed strategies.
M(P,j) = Mindy's expected loss when she plays P and Max plays j; M(i,Q) = Mindy's expected loss when she plays i and Max plays Q.
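These expected-loss quantities are just matrix products; a small sketch using the Rock-Paper-Scissors loss matrix above (entry orientation as in that table):

```python
# Rock-Paper-Scissors loss matrix for the row player (Mindy):
# rows/cols ordered Rock, Paper, Scissors; 1/2 = tie, 1 = Mindy loses.
M = [[0.5, 1.0, 0.0],
     [0.0, 0.5, 1.0],
     [1.0, 0.0, 0.5]]

def loss_P_Q(M, P, Q):
    """M(P,Q) = sum_ij P(i) M(i,j) Q(j): Mindy's expected loss."""
    return sum(P[i] * M[i][j] * Q[j]
               for i in range(len(P)) for j in range(len(Q)))

def loss_P_j(M, P, j):
    """M(P,j): Mindy plays mixed P, Max plays pure column j."""
    return sum(P[i] * M[i][j] for i in range(len(P)))

uniform = [1/3, 1/3, 1/3]
# Against the uniform strategy, every reply gives expected loss 1/2.
```

This makes the game's symmetry concrete: the uniform distribution yields loss exactly 1/2 against any column, which is the value of the game.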

Sequential Play
Say Mindy plays before Max. If Mindy chooses P, then Max will pick Q to maximize M(P,Q), so the loss will be L(P) = max_Q M(P,Q).
So Mindy should pick P to minimize L(P); the loss will be min_P max_Q M(P,Q).
Similarly, if Max plays first, the loss will be max_Q min_P M(P,Q).

Minimax Theorem
Playing second cannot be worse than playing first: max_Q min_P M(P,Q) ≤ min_P max_Q M(P,Q) (on the right Mindy plays first, on the left Mindy plays second).
Von Neumann's minimax theorem: min_P max_Q M(P,Q) = max_Q min_P M(P,Q).
No advantage to playing second! Regardless of who goes first, the outcome is always the same.

Optimal Play
Von Neumann's minimax theorem: v = min_P max_Q M(P,Q) = max_Q min_P M(P,Q) is the value of the game.
Optimal strategies:
1. ∃ a min-max strategy P* s.t. for any Q, M(P*,Q) ≤ v. Even if Max knows Mindy's strategy, Max cannot get a better outcome than v; the outcome is at worst v for Mindy.
2. ∃ a max-min strategy Q* s.t. for any P, M(P,Q*) ≥ v. No matter what strategy Mindy uses, v is the best possible value she can achieve.

Optimal Play
P* and Q* are optimal strategies if the opponent is also optimal. For a two-person zero-sum game against a good opponent, your best bet is to find your min-max optimal strategy and always play it.

Optimal Play
Note: (P*, Q*) is a Nash equilibrium: P* is a best response to Q*, and Q* is a best response to P*.
All Nash equilibria have the same value in zero-sum games. This is not true in general; it is very specific to zero-sum games!

Beyond the Classic Theory
Often we have limited information about the game or the opponent:
- M may be unknown or very large.
- The opponent may not be fully adversarial. E.g., if Bart Simpson always plays Rock instead of choosing the uniform distribution, you can play Paper and always beat Bart.
- As the game is played over and over, there is an opportunity to learn the game and/or the opponent's strategy.

Repeated Play
M unknown. For each round t = 1, 2, …, T:
- Mindy chooses P_t.
- Max chooses Q_t (possibly based on P_t).
- Mindy's loss is M(P_t, Q_t).
- Mindy observes the loss M(i, Q_t) of each pure strategy i.
This exactly fits the decision-theoretic experts model, with loss vector l_t = M·Q_t and expected loss P_t · (M·Q_t). Mindy can run RWM to ensure:
(1/T)·∑_t M(P_t, Q_t) ≤ min_P (1/T)·∑_t M(P, Q_t) + Δ_T = min_P M(P, (Q_1 + … + Q_T)/T) + Δ_T,
where Δ_T → 0 is the per-round RWM regret, and min_P M(P, (Q_1 + … + Q_T)/T) ≤ max_Q min_P M(P,Q) = v.
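This repeated-play loop can be sketched on the Rock-Paper-Scissors matrix, with Mindy running the multiplicative-weights update and Max playing a pure best response each round (an illustrative simulation; the function name and parameters are mine, not from the slides):

```python
import math

# Rock-Paper-Scissors loss matrix for Mindy (1/2 = tie, 1 = loss):
M = [[0.5, 1.0, 0.0],
     [0.0, 0.5, 1.0],
     [1.0, 0.0, 0.5]]

def repeated_play(M, T=2000):
    """Mindy runs RWM over rows; Max best-responds with the pure
    column maximizing Mindy's expected loss M(P_t, j).
    Returns Mindy's average loss over T rounds."""
    n = len(M)
    eps = math.sqrt(math.log(n) / T)  # tuned as on the Summarizing slide
    w = [1.0] * n
    avg_loss = 0.0
    for _ in range(T):
        s = sum(w)
        P = [wi / s for wi in w]
        # Max's best response: the column j maximizing M(P, j).
        j = max(range(len(M[0])),
                key=lambda c: sum(P[i] * M[i][c] for i in range(n)))
        losses = [M[i][j] for i in range(n)]   # l_t = M Q_t (Q_t pure)
        avg_loss += sum(pi * li for pi, li in zip(P, losses)) / T
        w = [wi * (1 - eps * li) for wi, li in zip(w, losses)]
    return avg_loss
```

As the regret bound predicts, the average loss approaches the game's value v = 1/2 as T grows, even though Max sees P_t before choosing.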

Prove minimax theorem as corollary
Need to prove: min_P max_Q M(P,Q) ≤ max_Q min_P M(P,Q). [The ≥ part is trivial.]
Imagine the game is played repeatedly. In each round t:
- Mindy plays using RWM, producing P_t.
- Max chooses the best response Q_t = argmax_Q M(P_t, Q).
Define: P̄ = (1/T)·∑_t P_t and Q̄ = (1/T)·∑_t Q_t.

One-slide proof of the minimax theorem:
min_P max_Q M(P,Q) ≤ max_Q M(P̄,Q) = max_Q (1/T)·∑_t M(P_t,Q) ≤ (1/T)·∑_t M(P_t,Q_t) [Q_t is a best response] ≤ min_P (1/T)·∑_t M(P,Q_t) + Δ_T [RWM guarantee] = min_P M(P,Q̄) + Δ_T ≤ max_Q min_P M(P,Q) + Δ_T.
Since Δ_T → 0 as T → ∞, the two sides are equal. Moreover, P̄ is a strategy that you can use if you have to go first.