Finding equilibria in large sequential games of imperfect information
CMU Theory Lunch – November 9, 2005
(joint work with Tuomas Sandholm)


2 Motivation: Poker AI
Poker is a wildly popular card game
– 2005 World Series of Poker: total prize pool > $103M ($56M for the Main Event); ESPN broadcasting portions of the tournament
Poker presents several challenges for AI
– Imperfect information
– Risk assessment and management
– Deception (bluffing, slow-playing)
– Counter-deception (calling a bluff, avoiding a slow-play trap)

3 Sneak preview of results
Rhode Island Hold'em poker invented as a testbed for AI research [Shi & Littman 2001]
Game tree has more than 3.1 billion nodes
Previously, the best techniques did not scale to games this large
Using our algorithm we have computed optimal minimax strategies for this game
This is the largest poker game solved to date, by over four orders of magnitude

4 Rhode Island Hold’em poker: The Deal

5 Rhode Island Hold’em poker: Round 1

6 Rhode Island Hold’em poker: Round 2

7 Rhode Island Hold’em poker: Round 3

8 Rhode Island Hold’em poker: Showdown

9 Outline of this talk
1. Game theory: Representation, computing equilibrium
2. Model: Ordered games
3. Abstraction mechanism: Information filters
4. Strategic equivalence: Game isomorphisms
5. Algorithm: GameShrink
6. Solving Rhode Island Hold'em

10 Game Theory
In multi-agent systems, an agent's outcome depends on the actions of the other agents
Consequently, an agent's optimal action depends on the actions of the other agents
Game theory provides guidance as to how an agent should act
A game-theoretic equilibrium specifies a strategy for each agent such that no agent wishes to deviate
– Such an equilibrium always exists in finite games [Nash 1950]

11 Nash equilibrium and a simple example
A Nash equilibrium specifies a strategy (possibly randomized) for each agent such that no agent wishes to deviate
Example: Rock-Paper-Scissors (entries are row player's payoff, column player's payoff)

             Rock     Paper    Scissors
  Rock       0, 0     -1, 1     1, -1
  Paper      1, -1     0, 0    -1, 1
  Scissors  -1, 1      1, -1    0, 0

The unique Nash equilibrium plays Rock, Paper, and Scissors each with probability 1/3

12 Complexity of computing equilibria
Finding a Nash equilibrium is "…the most important concrete open question on the boundary of P today" [Papadimitriou 2001]
– Even for games with only two players
There are algorithms for computing Nash equilibria, but they require exponential time in the worst case
Good news: two-person zero-sum matrix games can be solved in poly-time using linear programming

13 Computing equilibria in zero-sum games
Among all best responses, there is always at least one pure strategy
Thus, with payoff matrix A and mixed strategies x and y, player 1's optimization problem is:
  max_x min_y x^T A y = max_x min_j (x^T A)_j
This is equivalent to the linear program:
  max v subject to (x^T A)_j ≥ v for every column j, Σ_i x_i = 1, x ≥ 0
By LP duality, player 2's optimal strategy is given by the dual variables
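As a concrete (unofficial) illustration of this LP, the following Python sketch solves a zero-sum matrix game with scipy.optimize.linprog, using the Rock-Paper-Scissors matrix from the earlier slide; the helper name solve_zero_sum is mine, not from the talk.

import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    # Solve max_x min_j (x^T A)_j for the row player of a zero-sum matrix game.
    # Variables z = (x_1, ..., x_m, v); minimize -v subject to
    # A^T x >= v (one inequality per column), sum(x) = 1, x >= 0, v free.
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                             # maximize v == minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])                # -A^T x + v <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])    # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]                # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]                              # mixed strategy, game value

# Rock-Paper-Scissors: the equilibrium strategy is (1/3, 1/3, 1/3), value 0.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
print(solve_zero_sum(A))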

14 What about sequential games?
Sequential games involve turn-taking, moves of chance, and imperfect information
One option: convert the sequential game into a matrix game by enumerating pure strategies (one action choice per information set)
This approach leads to an exponential number of strategies

15 Sequence form
Instead of a move for every information set, consider choices necessary for each leaf
These choices are sequences and constitute the pure strategies in the sequence form
Example:
  S_1 = {{}, l, r, L, R}
  S_2 = {{}, c, d}
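To get a feel for the size difference, here is a back-of-the-envelope Python snippet (illustrative numbers of my own, not from the slides) comparing the two representations for a player with k information sets and b actions at each:

# A pure strategy fixes one action at every information set, giving b**k strategies,
# while in a perfect-recall game there is one sequence per (information set, action)
# pair plus the empty sequence, giving only 1 + k*b sequences.
k, b = 20, 3
print("pure strategies:", b ** k)   # 3**20 = 3,486,784,401
print("sequences:", 1 + k * b)      # 61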

16 Realization plans
Players' strategies are specified as realization plans over sequences: a realization plan x gives, for each sequence, the probability that the player's own actions produce it, so
  x({}) = 1 and, for each information set h with incoming sequence σ_h and action set A_h, x(σ_h) = Σ_{a ∈ A_h} x(σ_h a), with x ≥ 0
Prop. Realization plans are equivalent to behavior strategies (in games with perfect recall)
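A toy check of that equivalence, using player 2's sequences {}, c, d from the example (my own illustration):

# Behavior strategy at player 2's only information set: play c and d with prob 0.5 each.
behavior = {"c": 0.5, "d": 0.5}
# Corresponding realization plan over the sequences {}, c, d.
realization = {"{}": 1.0}
for action, prob in behavior.items():
    realization[action] = realization["{}"] * prob
print(realization)                                                    # {'{}': 1.0, 'c': 0.5, 'd': 0.5}
print(sum(realization[a] for a in behavior) == realization["{}"])     # realization constraint holds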

17 Computing equilibria via sequence form
Players 1 and 2 have realization plans x and y
Realization constraint matrices E and F specify the constraints on realizations: Ex = e and Fy = f, where e and f are the vectors (1, 0, …, 0)^T
(Figure: E is indexed by player 1's sequences {}, l, r, L, R and F by player 2's sequences {}, c, d; u and v, v′ are the associated dual variables)

18 Computing equilibria via sequence form
Payoffs for players 1 and 2 are x^T A y and x^T (-A) y
Creating the payoff matrix A (rows indexed by player 1's sequences {}, l, r, L, R; columns by player 2's sequences {}, c, d):
– Initialize each entry to 0
– For each leaf, there is a (unique) pair of sequences corresponding to an entry in the payoff matrix; add the leaf's payoff there
– Weight the entry by the product of chance probabilities along the path from the root to the leaf
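A minimal sketch of that construction (the (seq1, seq2, chance probability, payoff) leaf representation is a hypothetical stand-in; only the initialize/locate/weight logic follows the slide):

import numpy as np

def build_payoff_matrix(leaves, seqs1, seqs2):
    # `leaves` is assumed to yield tuples (seq1, seq2, chance_prob, payoff1):
    # the players' sequences leading to the leaf, the product of chance
    # probabilities on the root-to-leaf path, and player 1's payoff there.
    row = {s: i for i, s in enumerate(seqs1)}
    col = {s: j for j, s in enumerate(seqs2)}
    A = np.zeros((len(seqs1), len(seqs2)))          # initialize each entry to 0
    for seq1, seq2, chance_prob, payoff1 in leaves:
        # Several leaves (differing only in chance moves) can share a sequence
        # pair, so the chance-weighted payoffs are accumulated.
        A[row[seq1], col[seq2]] += chance_prob * payoff1
    return A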

19 Computing equilibria via sequence form
Holding y fixed, player 1's best response is a small LP over x; holding x fixed, player 2's best response is a small LP over y
Dualizing the inner best-response problem yields a single primal/dual LP pair whose solutions are the equilibrium realization plans:
  Primal: min e^T p subject to -Ay + E^T p ≥ 0, Fy = f, y ≥ 0
  Dual:   max f^T q subject to -A^T x + F^T q ≤ 0, Ex = e, x ≥ 0
Player 2's equilibrium plan y comes from the primal, player 1's plan x from the dual

20 Computing equilibria via sequence form: An example

min p1
subject to
  x1:  p1 - p2 - p3 >= 0
  x2:  0y1 + p2 >= 0
  x3:  -y2 + y3 + p2 >= 0
  x4:  2y2 - 4y3 + p3 >= 0
  x5:  -y1 + p3 >= 0
  q1:  -y1 = -1
  q2:  y1 - y2 - y3 = 0
bounds
  y1 >= 0
  y2 >= 0
  y3 >= 0
  p1 free
  p2 free
  p3 free
end
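For the curious, the same LP written out by hand for scipy.optimize.linprog (variable order y1, y2, y3, p1, p2, p3; the optimum here works out to p1 = 1 with y = (1, 0.5, 0.5)):

import numpy as np
from scipy.optimize import linprog

c = np.array([0, 0, 0, 1, 0, 0], dtype=float)       # minimize p1
M = np.array([                                       # ">= 0" rows x1..x5
    [ 0,  0,  0, 1, -1, -1],   # x1: p1 - p2 - p3 >= 0
    [ 0,  0,  0, 0,  1,  0],   # x2: p2 >= 0
    [ 0, -1,  1, 0,  1,  0],   # x3: -y2 + y3 + p2 >= 0
    [ 0,  2, -4, 0,  0,  1],   # x4: 2y2 - 4y3 + p3 >= 0
    [-1,  0,  0, 0,  0,  1],   # x5: -y1 + p3 >= 0
], dtype=float)
A_eq = np.array([
    [-1,  0,  0, 0, 0, 0],     # q1: -y1 = -1
    [ 1, -1, -1, 0, 0, 0],     # q2: y1 - y2 - y3 = 0
], dtype=float)
b_eq = np.array([-1.0, 0.0])
bounds = [(0, None)] * 3 + [(None, None)] * 3        # y >= 0, p free
res = linprog(c, A_ub=-M, b_ub=np.zeros(5), A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("game value (p1):", res.fun)
print("player 2's realization plan (y1, y2, y3):", res.x[:3])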

21 Sequence form summary
The sequence form is an alternative, more compact representation [Romanovskii 1962; Koller, Megiddo & von Stengel 1994]
Two-player zero-sum games with perfect recall can be solved in time polynomial in the size of the game tree
– Not enough to solve RI Hold'em (~10^9 nodes) or Texas Hold'em (~10^18 nodes)

22 Our approach: Automated abstraction
Instead of developing an equilibrium-finding algorithm per se, we introduce an automated abstraction technique that results in a smaller, equivalent game
We prove that a Nash equilibrium in the smaller game corresponds to a Nash equilibrium in the original game
Our technique applies to n-player sequential games with observed actions and ordered signals

23 Illustration of our approach
(Diagram: original game → abstraction → abstracted game → compute Nash → Nash equilibrium, which is then interpreted as an equilibrium of the original game)

24 Game with ordered signals (a.k.a. ordered game)
1. Players I = {1,…,n}
2. Stage games G = G_1,…,G_r
3. Player label L
4. Game-ending nodes ω
5. Signal alphabet Θ
6. Signal quantities κ = κ_1,…,κ_r and γ = γ_1,…,γ_r
7. Signal probability distribution p
8. Partial ordering ≥ of subsets of Θ
9. Utility function u (increasing in private signals)
Example (Rhode Island Hold'em): I = {1,2}, Θ = {2♠,…,A♦}, κ = (0,1,1), γ = (1,0,0), p uniform, u given by hand rank

25 Information filters
Observation: we can make games smaller by filtering the information a player receives
Instead of observing a specific signal exactly, a player instead observes a filtered set of signals
– E.g., receiving the signal {A♠,A♣,A♥,A♦} instead of A♠
The filter is represented as a partition of the signal space
Combining an ordered game and an information filter yields a filtered ordered game
Prop. A filtered ordered game is a finite sequential game with perfect recall
– Corollary: if the filtered ordered game is two-person zero-sum, we can solve it in poly-time using LP
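As a toy illustration of a filter (my own example, not from the slides), a suit-blind filter over a 52-card signal space can be written as a partition keyed by rank:

from itertools import product

RANKS = "23456789TJQKA"
SUITS = "shdc"   # spades, hearts, diamonds, clubs
CARDS = [r + s for r, s in product(RANKS, SUITS)]

# The filter maps each concrete signal to the partition block containing it.
# Grouping by rank means e.g. the ace of spades is observed only as the set
# {'As', 'Ah', 'Ad', 'Ac'}: the player cannot distinguish suits.
rank_filter = {card: frozenset(card[0] + s for s in SUITS) for card in CARDS}

print(sorted(rank_filter["As"]))   # ['Ac', 'Ad', 'Ah', 'As']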

26 Filtered signal trees
Every filtered ordered game has a corresponding filtered signal tree
– Each edge corresponds to the revelation of some signal
– Each path corresponds to the revelation of a set of signals
Our algorithm operates directly on the filtered signal tree
– We never load the full game into memory

27 Ordered game isomorphic relation
Captures the notion of strategic symmetry between nodes
We define the relationship recursively:
– Two leaves are ordered game isomorphic if, for each action history, the payoffs are the same at both leaves
– Two internal nodes are ordered game isomorphic if they are siblings and there is a bijection between their children such that only ordered game isomorphic nodes are matched
We can compute this relationship efficiently using dynamic programming and perfect matching computations in a bipartite graph
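A minimal sketch of that test (the children and leaf_payoffs accessors into the filtered signal tree are hypothetical, and in the real algorithm the recursive results are memoized, which gives the dynamic program):

import numpy as np
from scipy.optimize import linear_sum_assignment

def isomorphic(u, v, children, leaf_payoffs):
    # The relation is only ever applied to sibling nodes of the filtered signal tree.
    cu, cv = children(u), children(v)
    if not cu and not cv:
        # Two leaves: payoffs must agree for every action history.
        return leaf_payoffs(u) == leaf_payoffs(v)
    if len(cu) != len(cv):
        return False
    # Bipartite "compatibility" graph: cost 0 iff the two children are isomorphic.
    cost = np.array([[0.0 if isomorphic(a, b, children, leaf_payoffs) else 1.0
                      for b in cv] for a in cu])
    rows, cols = linear_sum_assignment(cost)    # min-cost perfect matching
    # A perfect matching of pairwise-isomorphic children exists iff its cost is 0.
    # (The approximation algorithm mentioned later would instead compare this
    # total cost against a penalty threshold.)
    return cost[rows, cols].sum() == 0.0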

28 Ordered game isomorphic abstraction transformation
Transforms an existing information filter into a new filter that merges two ordered game isomorphic nodes
The new filter yields a smaller, abstracted game
Thm. If a strategy profile is a Nash equilibrium in the smaller, abstracted game, then its interpretation in the original game is a Nash equilibrium in that game

29 Applying the ordered game isomorphic abstraction transformation

30 Applying the ordered game isomorphic abstraction transformation

31 Applying the ordered game isomorphic abstraction transformation

32 GameShrink: Efficiently computing ordered game isomorphic abstraction transformations
Recall: we have a dynamic program for determining if two nodes of the filtered signal tree are ordered game isomorphic
Algorithm: starting from the top of the filtered signal tree, perform the transformation where applicable
Approximation algorithm: instead of requiring a perfect matching, require a matching with a penalty below some threshold
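A rough sketch of that top-down pass under simplifying assumptions (node objects with a .signal attribute, a children accessor, an isomorphism test like the one sketched earlier, and the Union-Find filter from the next slide are all hypothetical stand-ins):

def game_shrink(root, children, isomorphic, filter_uf):
    # Walk the filtered signal tree from the top; whenever two sibling nodes are
    # ordered game isomorphic, merge their signals in the information filter.
    frontier = [root]
    while frontier:
        node = frontier.pop()
        kids = children(node)
        for i in range(len(kids)):
            for j in range(i + 1, len(kids)):
                if isomorphic(kids[i], kids[j]):
                    filter_uf.union(kids[i].signal, kids[j].signal)
        frontier.extend(kids)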

33 Algorithmic techniques to speed up GameShrink
A Union-Find data structure provides an efficient representation of the information filter
– Linear memory and almost linear time
Eliminate some perfect matching computations using easy-to-check necessary conditions
– Compact histogram databases for storing win/loss frequencies speed up the checks
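A bare-bones Union-Find in Python showing how the filter can be stored as mergeable signal blocks (illustrative only; path compression and union by size give the near-linear running time):

class UnionFind:
    # Disjoint-set forest representing an information filter: each set is one
    # block of the signal-space partition.
    def __init__(self, elements):
        self.parent = {e: e for e in elements}
        self.size = {e: 1 for e in elements}

    def find(self, e):
        while self.parent[e] != e:
            self.parent[e] = self.parent[self.parent[e]]   # path compression
            e = self.parent[e]
        return e

    def union(self, a, b):
        # Merge the blocks containing a and b, e.g. after two signals are found
        # to lead to ordered game isomorphic nodes.
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra                               # union by size
        self.size[ra] += self.size[rb]

uf = UnionFind(["As", "Ah", "Ad", "Ac", "Ks"])
uf.union("As", "Ah")
print(uf.find("As") == uf.find("Ah"))   # True: As and Ah are now the same filtered signal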

34 Solving Rhode Island Hold'em poker
Without abstraction, the LP has 91,224,226 rows & columns
GameShrink computes all ordered game isomorphic abstraction transformations in under one second
Now the LP has only 1,237,238 rows & columns
Solving this LP yields optimal minimax strategies
– CPLEX's barrier method takes 7 days, 17 hours and 25 GB of RAM
Largest poker game solved to date, by over four orders of magnitude

35 Comparison to previous research
Rule-based
– Limited success in even small poker games
Simulation/learning
– Does not take the multi-agent aspect into account
Game-theoretic approaches
– Tiny games
– Manual abstraction: "Approximating Game-Theoretic Optimal Strategies for Full-scale Poker", Billings, Burch, Davidson, Holte, Schaeffer, Schauenberg, Szafron, IJCAI-03
– Our approach: automated abstraction

36 Directions for future work
Computing strategies for larger games
– Requires approximation of solutions
Tournament poker
More than two players
Other types of abstraction

37 Summary
Introduced an automatic method for performing abstractions in a broad class of games
Introduced information filters as a technique for working with games with imperfect information
Developed an equilibrium-preserving abstraction transformation, along with an efficient algorithm
Described a simple extension that yields an approximation algorithm for tackling even larger games
Solved the largest poker game to date
– Playable on-line at
Thank you very much for your interest