INFORMS 2006, Pittsburgh, November 8, 2006 © 2006 M. A. Zinkevich, AICML 1 Games, Optimization, and Online Algorithms Martin Zinkevich University of Alberta.


1 INFORMS 2006, Pittsburgh, November 8, 2006 © 2006 M. A. Zinkevich, AICML 1 Games, Optimization, and Online Algorithms Martin Zinkevich University of Alberta November 8th, 2006

2 Question
Does pop culture have anything to offer advanced research projects?

3 Fun and Games for Scientists
Fun problem (in scientist-ese):
(1) A problem which has a wide base of players at a variety of levels.
(2) A problem which has aspects which provide interesting challenges for the human mind.

4 Fun and Games for Scientists
Game problem (in scientist-ese):
(1) A problem which has a formal structure (rules) with a variety of parameter settings (opponents).
(2) A problem where the world IS out to get you.

5 Fun and Games
“Fun” can capture aspects of difficulty that are orthogonal to the size of the state space or the algorithmic complexity of the problems involved.
“Games” are environments where issues such as the following can be studied:
- learning-to-learn, amongst a variety of opponents
- non-stationarity, in the presence of other learning agents

6 Two Objectives of This Talk
- Finding Nash equilibria
- Developing “experts” a priori in games

7 Main Point
Algorithms that learn in self-play can be used to generate both an equilibrium and experts. Constraint/column generation is among these algorithms.

8 Question in This Talk
What are interesting unbalanced strategies to consider?

9 Outline
- Introduction
- Iterated Best Response
- Iterated Generalized Best Response
- Other Applications
- Conclusion

10 Iterated Best Response (Broken Version)
One broken idea:
1. INIT: Start with an arbitrary strategy.
2. RESPONSE: Compute the best response.
3. REPEAT: Repeat step 2 until satisfied.
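
To see why this version is broken, here is a minimal sketch on a toy hide-and-seek game. The game itself is an assumption made for illustration (n hiding spots, the seeker wins on a match, the hider otherwise), as are the function names; the slides only show the game as a figure.

```python
def seeker_best_response(hide_spot):
    # Against a hider who always uses one spot, the seeker looks there.
    return hide_spot


def hider_best_response(seek_spot, n):
    # Against a seeker who always looks in one spot, any other spot wins;
    # this sketch arbitrarily picks the next one.
    return (seek_spot + 1) % n


def iterate_best_response(n, rounds):
    """Alternate pure best responses and record the (hide, seek) pairs."""
    hide = 0  # INIT: arbitrary starting strategy
    history = []
    for _ in range(rounds):
        seek = seeker_best_response(hide)      # RESPONSE
        history.append((hide, seek))
        hide = hider_best_response(seek, n)    # RESPONSE to the response
    return history


history = iterate_best_response(n=3, rounds=6)
print(history)
```

The pairs repeat with period three, so the process never settles: each pure strategy is beaten by the next one, and no mixing ever appears.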

11 Hide and Seek
[Figure legend: hide actions in blue, seek actions in red]

12 Hide and Seek
[Figure legend: hide actions in blue, seek actions in red]

13 Problem: No Balance
There is no one killer strategy in some games. Without adding some balance, there is no way to fully explore the space.

14 What Games Require Balance?
- Simultaneous move games
- Imperfect information games (games with private information)

15 Balancing Existing Strategies
[Figure: hide actions (blue), seek actions (red), +1 payoffs; labels: RESTRICTED NASH, 50/50]

16 Iterated Balanced Best Response
1. INIT: Start with strategies S for player 1 and T for player 2.
2. BALANCE: Make a bimatrix game and solve for equilibrium.
3. RESPONSE: Add the best responses to the equilibrium of the game to S and T.
4. REPEAT: Repeat steps 2 and 3 until satisfied.
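
The loop above can be sketched for a small zero-sum matrix game. Several things here are assumptions and not taken from the talk: the game is rock-paper-scissors, strategies are row and column indices, the restricted game is balanced only approximately by fictitious play (a stand-in for whatever equilibrium solver is actually used), and all function names are hypothetical.

```python
# Payoffs to the row player; rows and columns are rock, paper, scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def solve_restricted(A, rows, cols, iters=2000):
    """Approximate equilibrium of the game restricted to rows x cols,
    using fictitious play (assumed solver, approximate only)."""
    row_counts = [0] * len(rows)
    col_counts = [0] * len(cols)
    row_counts[0] = col_counts[0] = 1
    for _ in range(iters):
        # Each side best-responds to the other's empirical play so far.
        row_vals = [sum(A[r][c] * col_counts[j] for j, c in enumerate(cols))
                    for r in rows]
        col_vals = [sum(A[r][c] * row_counts[i] for i, r in enumerate(rows))
                    for c in cols]
        row_counts[row_vals.index(max(row_vals))] += 1
        col_counts[col_vals.index(min(col_vals))] += 1
    p = [x / sum(row_counts) for x in row_counts]
    q = [x / sum(col_counts) for x in col_counts]
    return p, q


def iterated_balanced_best_response(A, max_rounds=20):
    S, T = [0], [0]                              # INIT
    for _ in range(max_rounds):
        p, q = solve_restricted(A, S, T)         # BALANCE
        # RESPONSE: best pure responses in the FULL game to the mixtures.
        row_pay = [sum(A[r][c] * q[j] for j, c in enumerate(T))
                   for r in range(len(A))]
        col_pay = [sum(A[r][c] * p[i] for i, r in enumerate(S))
                   for c in range(len(A[0]))]
        br_row = row_pay.index(max(row_pay))
        br_col = col_pay.index(min(col_pay))
        if br_row in S and br_col in T:
            break                                # nothing new to add
        if br_row not in S:
            S.append(br_row)
        if br_col not in T:
            T.append(br_col)
    return S, T, p, q


S, T, p, q = iterated_balanced_best_response(RPS)
print(S, T, p, q)
```

Starting from rock alone, each round adds the strategy that beats the current restricted mixture, until all three actions are in play and the restricted equilibrium has no outside best response.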

17 What’s The Point?
In general, equilibrium computations are significantly harder than best responses. In practice, it is easier to compute an approximate best response than an approximate Nash equilibrium.

18 Pure Poker
Player 1 and Player 2 each receive a “card” in [0,1] (a real number). Then Player 1 bets or checks. If Player 1 bets, Player 2 calls or folds.
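
The rules above can be turned into a small payoff function. The slide does not give stakes or a showdown rule, so this sketch assumes a one-unit ante, a one-unit bet, and that the higher card wins at showdown; it also uses deterministic strategies for brevity, while the talk allows mixing. All names are hypothetical.

```python
def payoff_p1(card1, card2, p1_bets, p2_calls, ante=1.0, bet=1.0):
    """Player 1's payoff for one deal (zero-sum: Player 2 gets the negative).

    p1_bets(card) -> True to bet, False to check.
    p2_calls(card) -> True to call, False to fold.
    ante and bet sizes are ASSUMED; the slide does not specify them.
    """
    def showdown(stake):
        if card1 > card2:
            return stake
        if card1 < card2:
            return -stake
        return 0.0

    if not p1_bets(card1):
        return showdown(ante)        # check: showdown for the antes
    if not p2_calls(card2):
        return ante                  # bet, fold: Player 1 takes the ante
    return showdown(ante + bet)      # bet, call: showdown for ante + bet


# Illustrative threshold strategies (assumed, not from the talk).
def bet_high(card):
    return card > 0.7


def call_high(card):
    return card > 0.5


print(payoff_p1(0.9, 0.6, bet_high, call_high))  # bet, call, Player 1 wins
print(payoff_p1(0.9, 0.3, bet_high, call_high))  # bet, fold
print(payoff_p1(0.4, 0.6, bet_high, call_high))  # check, Player 1 loses
```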

19 Strategies
[Figure: Player 1 strategy (Bet/Check) and Player 2 strategies (Call/Fold); horizontal axis: card, from 0 to 1; vertical axis: probability mass, from 0 to 1]

20 Pure Poker
Continuous state space.
Given a strategy that splits [0,1] into a finite number of intervals and plays a fixed distribution in each interval, the best response is also of this form.
[Figure labels: Check, Call, Fold]

21 Pure Poker
[Figure: payoff matrices between Player 1 (Bet/Check) and Player 2 (Call/Fold) interval strategies]

22 Real Poker
In one abstraction we are currently working with, each player has 625 private states, and there are about 16,000 betting sequences, for a total of several BILLION states. While it is possible to iterate over all possible states in a short period of time, you can’t really perform complex operations on a problem of this size.

23 Positive Results
In under a hundred iterations, this technique can approximately solve simple variants of poker, such as Kuhn and Leduc Poker.

24 Outline
- Introduction
- Iterated Best Response
- Iterated Generalized Best Response
- Other Applications
- Conclusion

25 Practical Problem
Although the balance-response technique above works, it can generate lots of strategies before equilibrium is achieved. Is there a way to cut down on this?

26 Robustness
How do you develop a strategy that is robust assuming that your opponent will play a strategy you have already seen?

27 Robustness: Generalized Best Response
Maximize the MINIMUM against a set of opponents.
Strat   a   b   c   Min
X       3   7   5   3
Y       5   4   4   4
Z       7   3   1   1
B       9   2  10   2
A       3   1   2   1

28 Robustness: Generalized Best Response
Maximize the MINIMUM against a set of opponents.
Strat   a   b   c   Min
X       3   7   5   3
Y       5   4   4   4
Z       7   3   1   1
B       9   2  10   2
A       3   1   2   1
The set of possible actions could be INFINITE.
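
As a small illustration of "maximize the minimum", the sketch below picks the row with the best worst case from the slide's table. The numbers are a best reading of the table as it survives in the transcript, the function name is hypothetical, and the search is over a finite candidate list only, whereas the slide notes the real action set could be infinite.

```python
# Payoff of each candidate strategy (rows) against opponents a, b, c.
# Values are a best-effort reading of the slide's table (assumed).
payoffs = {
    "X": {"a": 3, "b": 7, "c": 5},
    "Y": {"a": 5, "b": 4, "c": 4},
    "Z": {"a": 7, "b": 3, "c": 1},
    "B": {"a": 9, "b": 2, "c": 10},
    "A": {"a": 3, "b": 1, "c": 2},
}


def generalized_best_response(table):
    """Return the candidate with the largest worst-case payoff,
    together with that worst-case value."""
    worst_case = {name: min(row.values()) for name, row in table.items()}
    best = max(worst_case, key=worst_case.get)
    return best, worst_case[best]


best, value = generalized_best_response(payoffs)
print(best, value)
```

Note that B has the highest payoffs against a and c, but its weak result against b makes it less robust than Y.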

29 Iterated Generalized Best Response
1. Start with strategies S and T.
2. Add to T a generalized best response to S.
3. Add to S a generalized best response to T.
4. Repeat until satisfied.

30 Hide and Seek
[Figure legend: hide actions in blue, seek actions in red]

31 How to Compute a Generalized Best Response?
1. Use a linear program.
   - Could be slow
   - Could be arbitrarily high precision
2. Use iterated best response.
   1. Start with sets of strategies S and possibly empty T.
   2. Compute a Nash equilibrium between S and T.
   3. Find a best response to the mixture over S.
   4. Add it to T.
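
For the linear-program option, one standard way to write the problem is sketched below. The notation is an assumption and does not come from the slides: x_t is the probability placed on one's own pure strategy t, u(t, s) is the payoff of t against opponent strategy s in S, and v is the guaranteed (worst-case) value.

```latex
\begin{aligned}
\max_{x,\,v}\quad & v \\
\text{subject to}\quad & \sum_{t} x_t \, u(t, s) \;\ge\; v
    \qquad \text{for all } s \in S, \\
& \sum_{t} x_t = 1, \qquad x_t \ge 0 \ \text{ for all } t .
\end{aligned}
```

There is one constraint per opponent strategy in S but one variable per own strategy, which is why the iterated option, which only ever adds best responses to T, can be attractive when the own strategy space is very large.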

32 Results in Poker
Using this technique (iterated GBR), we solved a four-round game of Texas Hold’Em.
We beat Opti4 (Sparbot)! By 0.01 small bets/hand.

33 Other Applications
- Economics (non-zero sum)
- Counterstrike/RTS games (best response not easy)

34 Extensions
- Non-zero sum games
- Approximate best response operation (through reinforcement learning)
- Learning the abstraction while learning the strategy

35 Conclusions
Algorithms that learn in self-play (such as iterated generalized best response) yield a wealth of useful strategies, including approximate Nash equilibria.

36 How Hard is a Game?
For a game to be hard, it has to be at least POSSIBLE to play it badly: otherwise, regardless of how complex it is, it is still easy. The depth of human skill in a particular game indicates its complexity.

37 How Hard is a Game?

38 Formalism
If the complexity of a game is at least k, then there exist people 1 to k such that, for any two people i > j in the list, person i can beat person j with probability at least 2/3.

39 How Hard is a Game?

40 Important Property: Transitivity in The List

41 Formalism
If the complexity of a game is at least k, then there exist people 1 to k such that, for any two people i > j in the list, person i can beat person j with probability at least 2/3.

42 Why People?

43 Why People?
Choose a number between 1 and 100. Highest number wins a dollar; no money is exchanged on a tie.

44 Formalism
If the complexity of a game is at least k, then there exist strategies 1 to k such that, for any two strategies i > j in the list, strategy i can beat strategy j with probability at least 2/3.

45 Formalism
The epsilon-complexity of a game is at least k if there exist strategies 1 to k such that, for any two strategies i > j, EV[i playing against j] > epsilon.
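
The definition can be checked mechanically on the number-choosing game from slide 43. In this sketch the function names are hypothetical, strategies are restricted to pure number choices, and the payoff is taken as +1 for a win, -1 for a loss, and 0 for a tie, following that slide's description.

```python
def number_game_ev(i, j):
    """Expected value (in dollars) for choosing i against a player choosing j."""
    return (i > j) - (i < j)


def is_epsilon_chain(strategies, ev, epsilon):
    """True if every later strategy in the list beats every earlier one
    by more than epsilon in expected value."""
    return all(
        ev(strategies[later], strategies[earlier]) > epsilon
        for later in range(len(strategies))
        for earlier in range(later)
    )


chain = list(range(1, 101))  # "always pick 1", "always pick 2", ..., "always pick 100"
print(is_epsilon_chain(chain, number_game_ev, epsilon=0.5))
```

The chain of 100 pure strategies passes the check, so this trivial game has epsilon-complexity of at least 100 for epsilon = 0.5 when arbitrary strategies are allowed in the list.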

46 Make it a Linear Program?
The linear program (sequence form) has numbers of constraints and variables that are each roughly proportional to the size of the game tree. The coefficient matrix is big: this makes inversion difficult. Also: numerical instabilities.

47 A Theoretical Guarantee? (No!)
[Figure legend: hide actions in blue, seek actions in red]

48 The Theoretical Problem
Each new bot is a best response to a particular mixture of the previous bots. There could be a different mixture over those bots which would do BETTER against that new bot: in fact, it could even beat the new bot!

49 A Theoretical Guarantee? (No!)
[Figure legend: hide actions in blue, seek actions in red]

50 A Theoretical Guarantee? (No!)
[Figure legend: hide actions in blue, seek actions in red]

51 A Theoretical Guarantee? (Yes!)

52 Relation to Complexity
Theorem: If the epsilon-complexity is k, then after k steps of iterated generalized best response, there is no strategy that can beat every one of your strategies by epsilon.

53 An Epsilon Equilibrium
At the end of iterated generalized best response, you have a guarantee of how hard it is to beat the sequence, not necessarily an individual from the sequence. However, using a similar trick, you can easily compute a mixture over these strategies against which no strategy can gain more than epsilon.

54 Two Techniques for Finding Nash of Extensive-Form Games
1. Take ALL strategies of the full game, and make a bimatrix game where each is an action (Thomson; Kuhn). The bimatrix game is exponential in the size of the tree.
2. Form a linear program using the sequence form. The LP is linear in the size of the tree (Koller et al.).

