A competitive Texas Hold’em poker player via automated abstraction and real-time equilibrium computation. Andrew Gilpin and Tuomas Sandholm, Carnegie Mellon University, Computer Science Department.

Motivation: Poker
Poker games are wildly popular card games
– 2006 World Series of Poker: $82M at the World Championship event; portions broadcast on ESPN
Poker presents several challenges for AI:
– Imperfect information
– Risk assessment and management
– Deception (bluffing, slow-playing)
– Counter-deception (calling a bluff, addressing slow play)

Prior poker research
Simulation/learning [e.g. Findler 77, Billings et al. 99, 02]
– Does not take the multi-agent aspect directly into account
Game-theoretic
– Small games [e.g. von Neumann & Morgenstern 44, Nash & Shapley 50, Kuhn 50]
– Tournament games [Miltersen & Sørensen 06]
– Manual abstraction for large games [Billings et al. 03]
– Ours: automated abstraction for large games
  – As computing speed increases, we can automatically take advantage of it by simply rerunning the abstraction algorithm with a different parameter to produce a finer-grained abstraction
We apply our techniques to Texas Hold’em poker, the most popular poker variant

Computing equilibrium
In two-person zero-sum games:
– Nash equilibria are minimax equilibria, so there is no equilibrium selection problem
– An equilibrium can be found using linear programming (LP)
Any extensive-form game (satisfying perfect recall) can be converted into a matrix game
– Create one pure strategy in the matrix game for every possible pure contingency plan in the sequential game (the set product of actions at information sets)
– Leads to an exponential blowup in the number of strategies, even in the reduced normal form
Sequence form: a more compact representation based on sequences of moves rather than pure strategies [von Stengel 96, Koller & Megiddo 92, Romanovskii 62]
– Two-person zero-sum games with perfect recall can be solved in time polynomial in the size of the game tree
– Still not enough to solve Rhode Island Hold’em (3.1 billion nodes) or Texas Hold’em (~10^18 nodes)
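The minimax-via-LP fact can be illustrated on a toy zero-sum game. The sketch below uses fictitious play as a dependency-free stand-in for the LP solve (it is not the authors' method): in two-person zero-sum games, the empirical average strategies of fictitious play converge to a minimax equilibrium.

```python
# Fictitious play on a tiny two-person zero-sum matrix game (Matching Pennies).
# Row player maximizes, column player minimizes; their empirical action
# frequencies converge to the minimax equilibrium the LP would compute.

A = [[1, -1],   # row payoff: +1 when the two choices match
     [-1, 1]]

def fictitious_play(A, iters=40000):
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_br = col_br = 0
    for _ in range(iters):
        row_counts[row_br] += 1
        col_counts[col_br] += 1
        # Each player best-responds to the opponent's empirical mixture so far.
        row_br = max(range(m),
                     key=lambda i: sum(A[i][j] * col_counts[j] for j in range(n)))
        col_br = min(range(n),
                     key=lambda j: sum(A[i][j] * row_counts[i] for i in range(m)))
    x = [c / iters for c in row_counts]
    y = [c / iters for c in col_counts]
    value = sum(A[i][j] * x[i] * y[j] for i in range(m) for j in range(n))
    return x, y, value

x, y, v = fictitious_play(A)
# x and y approach the uniform (1/2, 1/2) equilibrium; v approaches the game value 0
```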

Our prior work on automated abstraction [EC-06]
Automatic method for performing abstractions in a broad class of sequential games of imperfect information
– Equilibrium-preserving game transformation in which certain information sets are merged and certain nodes within an information set are collapsed
– GameShrink: an algorithm for identifying and applying all the game transformations
  – Õ(n^2) time, where n = #nodes in the signal tree (in poker, these are the possible card deals in the game)
  – Run-time tends to be highly sublinear in the size of the game tree
Used these techniques to solve Rhode Island Hold’em
– Largest poker game solved to date, by over four orders of magnitude
Also developed an approximate (lossy) version of GameShrink
– Uses a similarity metric on nodes in the signal tree (e.g. |#wins_1 − #wins_2| + |#losses_1 − #losses_2|) and a similarity threshold
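The lossy similarity test can be sketched as follows; the `(wins, losses)` tuple interface and the `should_merge` helper are illustrative assumptions, not the authors' code.

```python
# Sketch of the lossy GameShrink similarity metric: two signal-tree nodes
# (hands) are candidates for merging when the metric falls below a threshold.

def similarity(hand_a, hand_b):
    """hand = (wins, losses) counted over rollouts of the remaining cards."""
    return abs(hand_a[0] - hand_b[0]) + abs(hand_a[1] - hand_b[1])

def should_merge(hand_a, hand_b, threshold):
    # Merge when the hands look strategically similar enough.
    return similarity(hand_a, hand_b) <= threshold
```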

Example: Applying the ordered game isomorphic abstraction transformation

Optimized approximate abstractions
The original version of GameShrink yielded lopsided abstractions when used as an approximation algorithm
We now instead find an abstraction via clustering. For each level of the tree (starting from the root):
– For each group i of hands: use k-means clustering to split the group into k_i abstract “states”
  – Win probability is the similarity metric (ties count as half a win)
  – For each value of k_i, compute the expected error (considering hand probabilities)
– Using integer programming, find the abstraction (the split of K into the k_i’s) that minimizes this expected error, subject to a constraint on the total number of states K at that level (= the size of the resulting LP in the zero-sum case)
Solving this class of integer programs is quite easy in practice
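A minimal sketch of this clustering step, assuming hands are represented only by their win probabilities. The dynamic program in `allocate` stands in for the integer program on the slide (for small instances it solves the same k_i-allocation exactly); both function interfaces are hypothetical.

```python
def kmeans_1d(values, k, iters=25):
    """Cluster scalar win probabilities into k abstract 'states'.
    Returns (centers, total absolute error)."""
    vs = sorted(values)
    if k == 1:
        c = sum(vs) / len(vs)
        return [c], sum(abs(v - c) for v in vs)
    # Deterministic init: spread centers across the sorted values.
    centers = [vs[i * (len(vs) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vs:
            clusters[min(range(k), key=lambda c: abs(v - centers[c]))].append(v)
        centers = [sum(cl) / len(cl) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    error = sum(min(abs(v - c) for c in centers) for v in vs)
    return centers, error

def allocate(groups, K, kmax=5):
    """Choose k_i per group to minimize total clustering error subject to
    sum(k_i) <= K.  A small dynamic program standing in for the slide's
    integer program."""
    errs = [{k: kmeans_1d(g, k)[1] for k in range(1, min(kmax, len(g)) + 1)}
            for g in groups]
    best = {0: (0.0, [])}          # states used -> (error, [k_i, ...])
    for e in errs:
        nxt = {}
        for used, (cost, ks) in best.items():
            for k, c in e.items():
                u = used + k
                if u <= K and (u not in nxt or cost + c < nxt[u][0]):
                    nxt[u] = (cost + c, ks + [k])
        best = nxt
    return min(best.values())[1]
```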

Application to Texas Hold’em
The two-person game tree has ~10^18 leaves
– Too large to run lossless GameShrink on
– Even after abstraction, the LP would be too large (it was already too large when we applied this approach to the first two rounds alone)
We split the 4 betting rounds into two phases
– Phase I (first 3 rounds): solved offline using the new approximate version of GameShrink, followed by LP
– Phase II (last 2 rounds): abstractions computed offline; real-time equilibrium computation using updated hand probabilities and an anytime LP

Phase I (first three rounds)
Automated abstraction using the approximate version of GameShrink:
– Round 1
  – There are 1,326 possible hands, of which 169 are strategically different
  – We consider 15 strategically different hands
– Round 2
  – There are 25,989,600 distinct possible hands
  – GameShrink (in lossless mode) determines that about a million are strategically different; this is still too large to solve
  – We used GameShrink to compute an abstraction that considers 225 strategically different hands
– Round 3
  – There are 1,221,511,200 distinct possible hands
  – We consider 900 strategically different hands
This abstraction process took about 3 days running on 4 CPUs
The LP solve took 7 days and 80 gigabytes of memory using CPLEX’s barrier method (an interior-point method for linear programming)
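The Round 1 counts can be verified directly: enumerating all two-card hole-card combinations and collapsing each to its rank pair plus suitedness yields the 169 strategically different hands.

```python
from itertools import combinations

# All 52 cards as (rank, suit) pairs; ranks 0..12, suits 0..3.
DECK = [(r, s) for r in range(13) for s in range(4)]

hands = list(combinations(DECK, 2))           # all hole-card combinations
classes = {(max(r1, r2), min(r1, r2), s1 == s2)
           for (r1, s1), (r2, s2) in hands}   # (high rank, low rank, suited?)

# 1,326 raw hands collapse to 169 classes:
# 13 pairs + 78 suited + 78 offsuit rank combinations.
```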

Mitigating the effect of round-based abstraction (i.e., having 2 phases)
For leaves in the first phase, we could assume no betting in the later rounds
– This ignores implied odds
We can do better by estimating the amount of betting that occurs in later rounds and incorporating this information into the LP for the first phase
– For each possible hand strength and in each possible betting situation, we store the probability of each possible action
– These probabilities are mined from the betting history in the later rounds of hundreds of thousands of played hands
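A sketch of the frequency table described above; the `ActionModel` class and its bucket/situation keys are hypothetical illustrations, not the authors' data structures.

```python
from collections import defaultdict

class ActionModel:
    """Estimate Pr(action | hand-strength bucket, betting situation)
    from logged hands, as described for the later betting rounds."""

    def __init__(self):
        # (bucket, situation) -> action -> count
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, bucket, situation, action):
        self.counts[(bucket, situation)][action] += 1

    def prob(self, bucket, situation, action):
        c = self.counts[(bucket, situation)]
        total = sum(c.values())
        return c[action] / total if total else 0.0
```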

Example of betting in the fourth round
Player 1 has bet; Player 2 is to fold, call, or raise

Phase II (last two rounds)
Abstractions computed offline
– The betting history doesn’t matter here, so there are (52 choose 4) situations
– Simple suit isomorphisms at the root of Phase II halve this number
– For each such situation, we use GameShrink to generate an abstraction with 10 and 100 strategically different hands in the last two rounds, respectively
Real-time equilibrium computation (using LP)
– So that our strategies are specific to the particular hand (there are too many to precompute)
– Hand probabilities are updated from the Phase I equilibrium via Bayes’ rule, using the betting histories and community card history, where s_i is player i’s strategy and h is an information set
– Conditional choice of primal vs. dual simplex
  – Achieves an anytime capability for the player that is us
– Dealing with running off the equilibrium path
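The hand-probability update is Bayes’ rule, with the Phase I equilibrium strategy supplying the likelihood of the observed actions. This sketch assumes a hypothetical dictionary interface for the strategy s_i and the observed history; it is not the authors' implementation.

```python
def update_hand_probs(prior, strategy, history):
    """Bayesian update sketch.

    prior:    hand -> probability
    strategy: (hand, information set) -> {action: probability}  (Phase I equilibrium)
    history:  observed (information set, action) pairs for the opponent
    """
    posterior = {}
    for hand, p in prior.items():
        like = p
        # Multiply in the equilibrium probability of each observed action.
        for infoset, action in history:
            like *= strategy.get((hand, infoset), {}).get(action, 0.0)
        posterior[hand] = like
    total = sum(posterior.values())
    return {h: (v / total if total else 0.0) for h, v in posterior.items()}
```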

We precompute several databases
db5: possible wins and losses (for a single player) for every combination of two hole cards and three community cards (25,989,600 entries)
– Used by GameShrink for quickly comparing the similarity of two hands
db223: possible wins and losses (for both players) for every combination of pairs of two hole cards and three community cards, based on a roll-out of the remaining cards (14,047,378,800 entries)
– Used for computing payoffs of the Phase I game, to speed up the LP creation
handval: a concise encoding of a 7-card hand rank, used for fast comparisons of hands (133,784,560 entries)
– Used in several places, including the construction of db5 and db223
A colexicographical ordering is used to compute indices into the databases, allowing for very fast lookups

Experimental results
GS1: game theory-based player, old version of manual abstraction, no strategy simulation in later rounds [GS 2006]
Sparbot: game theory-based player, manual abstraction [Billings et al. 2003]
Vexbot: opponent modeling, miximax search with statistical sampling [Billings et al. 2004]

Opponent   Series won   Win rate (small bets per 100)
GS1        38 of …      …
Sparbot    28 of …      …
Vexbot     32 of …      …

Summary
Competitive Texas Hold’em player, automatically generated
– First phase (rounds 1, 2 & 3): automated abstraction and LP solved offline, using statistical data to compute payoffs at the end of round 3
– Second phase (rounds 3 & 4): abstraction precomputed automatically; LP solved in real time using updated hand probabilities, in an anytime fashion
The techniques are applicable to many sequential games of imperfect information

Where to from here?
– The top poker-playing programs are fairly evenly matched
– Recent experimental results show our player is competitive with (but not better than) expert human players
– Provable approximation guarantees, e.g., ex post
– Other types of abstraction
– More scalable equilibrium-finding algorithms
– Tournament poker [e.g. Miltersen & Sørensen 06]
– More than two players [e.g. Nash & Shapley 50]
Thank you