Yikan Chen, Weikeng Qin
Evolutionary Algorithm Poker!
Evolution Process
  Evolutionary Algorithm: Crossover, Mutation, Natural Selection
Encoding and Crossover
Mutation
Natural Selection
  Run roulette-wheel selection based on the fitness values of the candidates
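A minimal sketch of roulette-wheel (fitness-proportional) selection, to make the selection step concrete. The function name `roulette_wheel_select` and the `rng` parameter are illustrative and not from the slides; the sketch assumes all fitness values are non-negative.

```python
import random

def roulette_wheel_select(candidates, fitnesses, rng=random):
    """Pick one candidate with probability proportional to its fitness.

    Assumes every fitness is non-negative and at least one is positive.
    """
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("roulette-wheel selection needs a positive total fitness")
    spin = rng.random() * total  # where the wheel stops, in [0, total)
    running = 0.0
    for candidate, fitness in zip(candidates, fitnesses):
        running += fitness       # each candidate owns a slice of width `fitness`
        if spin < running:
            return candidate
    return candidates[-1]        # guard against floating-point round-off
```

A candidate with three times the fitness of another is chosen about three times as often, and a candidate with zero fitness is effectively never chosen.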
Important Parameters
  Crossover rate
  Mutation rate
  Elite rate
  Fitness function
Demo
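One way the three rates could interact in a single generation is sketched below. This is an assumption-laden illustration, not the authors' implementation: it assumes chromosomes are lists of floats in [0, 1], one-point crossover, uniform re-sampling as the mutation operator, and fitness-proportional parent selection. The name `next_generation` is hypothetical.

```python
import random

def next_generation(population, fitnesses, crossover_rate, mutation_rate,
                    elite_rate, rng=random):
    """Build the next population from real-valued chromosomes (lists of floats)."""
    size = len(population)
    # Rank by fitness only (the key avoids comparing chromosomes on ties).
    ranked = [c for _, c in sorted(zip(fitnesses, population),
                                   key=lambda pair: pair[0], reverse=True)]
    n_elite = int(elite_rate * size)
    new_population = [list(c) for c in ranked[:n_elite]]  # elites survive unchanged

    # Small offset keeps the weights valid even if every fitness is zero.
    weights = [max(f, 0.0) + 1e-9 for f in fitnesses]
    while len(new_population) < size:
        mother, father = rng.choices(population, weights=weights, k=2)
        child = list(mother)
        if len(child) > 1 and rng.random() < crossover_rate:
            cut = rng.randrange(1, len(child))          # one-point crossover
            child = list(mother[:cut]) + list(father[cut:])
        # Each gene is re-sampled independently with probability mutation_rate.
        child = [rng.random() if rng.random() < mutation_rate else gene
                 for gene in child]
        new_population.append(child)
    return new_population
```

With `elite_rate = 0.25` and a population of four, the single fittest chromosome is copied over untouched and the other three slots are filled by selection, crossover, and mutation.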
AKQ
  2-player game
  $1 blind for each player
  Player 1: bet or fold
  Player 2: call or fold
Derive the optimal strategy using EA
  Chromosomal representation
    Fij: fold threshold when player Pi holds Cardj
  Fitness functions

        Card1   Card2   Card3
  P1    2/3     0       0
  P2    1       2/3     0
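To evaluate a chromosome of fold thresholds, one hand of the AKQ game can be simulated roughly as follows. Several points are assumptions rather than facts from the slides: the bet size (`BET = 1`), reading Fij as the probability of folding with card j, and ranking cards by index (a higher index beats a lower one). The names `play_hand` and `average_winnings` are hypothetical.

```python
import random

BLIND = 1  # from the slide: $1 blind per player
BET = 1    # assumption: the slides do not state the bet size

def play_hand(f1, f2, card1, card2, rng=random):
    """Return Player 1's net winnings for one hand.

    f1[j] and f2[j] are read here as the probability that the player folds
    when holding card j (0, 1 or 2; a higher index beats a lower one).
    """
    if rng.random() < f1[card1]:
        return -BLIND            # Player 1 folds and loses the blind
    if rng.random() < f2[card2]:
        return BLIND             # Player 2 folds to the bet
    stake = BLIND + BET          # showdown: each player has put in blind + bet
    return stake if card1 > card2 else -stake

def average_winnings(f1, f2, hands=10000, rng=random):
    """Estimate Player 1's mean winnings per hand against Player 2."""
    total = 0
    for _ in range(hands):
        card1, card2 = rng.sample(range(3), 2)  # deal two distinct cards
        total += play_hand(f1, f2, card1, card2, rng)
    return total / hands
```

Averaging `play_hand` over many deals gives the kind of head-to-head result from which a fitness value for each candidate could be computed.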
Fitness functions
  Fi: fitness of candidate i
  Wij: money won by candidate i against candidate j
Decreased fluctuation; further decreased fluctuation (plot axis: generations)

  Fitness function            Var(f11); Var(f22)    Mean(f11); Mean(f22)
  Count only wins             .065; [missing]       [missing]; .60
  Penalize failure            .037; [missing]       [missing]; .70
  Penalize failure heavier    .028; [missing]       [missing]; .74
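The three fitness variants named here (count only wins, penalize failure, penalize failure heavier) could be expressed with a single loss weight, as in the sketch below. This is only one possible reading: the slides do not give the formulas, and the `loss_weight` parameter and the function name `fitness` are assumptions introduced for illustration.

```python
def fitness(winnings_row, loss_weight):
    """Fitness of candidate i from winnings_row[j] = W_ij.

    loss_weight = 0 counts only wins, loss_weight = 1 subtracts losses at
    face value, and loss_weight > 1 penalizes losses more heavily.
    (Hypothetical parameterization; not taken from the slides.)
    """
    score = 0.0
    for w in winnings_row:
        if w >= 0:
            score += w
        else:
            score += loss_weight * w  # w is negative, so this lowers the score
    return score
```

For example, with winnings `[2, -1, 1]` the three settings `0`, `1`, and `2` give fitness values of 3, 2, and 1 respectively.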
Real Texas Hold’em
  Encoding Strategy (Turn and River)
    Hand strength (player confidence)
    Fraction of opponent raise (opponent confidence)
    Total raise (profit)
Fitness Criterion
Performance
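[Figure: a single neuron. Inputs a1, a2, …, an with weights w1, w2, …, wn, plus a bias b on a constant input 1, are summed (∑) and passed through a function f to give the output.]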
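The neuron pictured on this slide computes a weighted sum of its inputs plus a bias and passes it through f. A minimal sketch follows; the choice of `math.tanh` as the default for f is an assumption, since the slide does not say which function is used.

```python
import math

def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Compute f(w1*a1 + w2*a2 + ... + wn*an + b) for one neuron."""
    weighted_sum = sum(a * w for a, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)
```

Passing `activation=lambda x: x` returns the raw weighted sum, which is convenient for checking the arithmetic by hand.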
[Figure: neural network with input, hidden, and output layers]
Simplest Encoding Method
  [Figure: network with connections labeled a, b, c, d]
NeuroEvolution of Augmenting Topologies (NEAT)
  Encoding Strategy: Node-based
    Neuron gene table
    Link gene table
  Innovation number
    Global database of innovations
    Each innovation has a unique ID number
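A rough sketch of how the two gene tables and the global innovation database might be represented. The class and field names (`LinkGene`, `InnovationDB`, `Genome`, `add_link`) are hypothetical, and keying innovations by the (source, destination) pair is a simplification chosen for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class LinkGene:
    innovation: int        # ID from the global innovation database
    src: int               # source neuron ID
    dst: int               # destination neuron ID
    weight: float
    enabled: bool = True

@dataclass
class InnovationDB:
    """Global record of structural innovations, shared by all genomes."""
    ids: dict = field(default_factory=dict)

    def innovation_id(self, src, dst):
        key = (src, dst)
        if key not in self.ids:
            self.ids[key] = len(self.ids) + 1  # next unused ID
        return self.ids[key]

@dataclass
class Genome:
    neurons: list = field(default_factory=list)  # neuron gene table (neuron IDs)
    links: list = field(default_factory=list)    # link gene table

    def add_link(self, db, src, dst, weight):
        """'Add a link gene' mutation: the ID comes from the shared database."""
        gene = LinkGene(db.innovation_id(src, dst), src, dst, weight)
        self.links.append(gene)
        return gene
```

Because the database is shared, two genomes that independently add the same link receive the same innovation number, which is what later lets their genes be lined up.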
Mutation
  Perturb weights
  Add a link gene
  Add a neuron gene
Crossover
  By innovation number
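Crossover
  [Figure: link genes of two genomes, listed as innovation number: link; some labels are truncated in the source]
  Genome 1: […]->4, 2: 2->4, 3: 3->4, 4: 2->5, 5: 5->4, 8: 1->5
  Genome 2: 1: 1->4, 2: 2->4, 3: 3->4, 4: 2->5, 5: 5->4, 6: 5->6, 7: 6->4, 9: 3->[…], […]->6
Crossover
  [Figure: link genes, listed as innovation number: link; some labels are truncated in the source]
  […]->5, 1: 1->4, 2: 2->4, 3: 3->4, 4: 2->5, 5: 5->4, 6: 5->6, 7: 6->4, 9: 3->[…], […]->6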
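Crossover by innovation number can be sketched as below. Genes are aligned by innovation number; a matching gene is taken at random from either parent, and genes present only in the fitter parent are kept. The sketch assumes one parent is strictly fitter, and the dictionary representation and the name `crossover` are illustrative choices, not taken from the slides.

```python
import random

def crossover(fitter, other, rng=random):
    """Combine two link-gene tables aligned by innovation number.

    fitter, other: dicts mapping innovation number -> (src, dst, weight).
    """
    child = {}
    for innovation, gene in fitter.items():
        if innovation in other and rng.random() < 0.5:
            child[innovation] = other[innovation]  # matching gene, other parent
        else:
            child[innovation] = gene  # matching gene or gene unique to fitter
    return child
```

Genes that exist only in the less fit parent are dropped in this sketch, so the child always has the same set of innovation numbers as the fitter parent.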
Simplified Poker Model
  1-10
  Initial credit: 10 chips
  One-chip ante at the beginning
  Call, raise (1 chip each time), fold
  Tournament
Two-player game
Four different types of opponents
  Tight Aggressive (TA)    Tight Passive (TP)
  Loose Aggressive (LA)    Loose Passive (LP)
α: minimum win probability to call
β: minimum win probability to raise
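The two thresholds suggest a simple rule-based opponent, sketched below. The function name `opponent_action` and the assumption that α ≤ β are illustrative; the slides do not give the threshold values used for each opponent type.

```python
def opponent_action(win_probability, alpha, beta):
    """Choose an action from the estimated win probability.

    alpha: minimum win probability to call
    beta:  minimum win probability to raise (assumed >= alpha)
    """
    if win_probability >= beta:
        return "raise"
    if win_probability >= alpha:
        return "call"
    return "fold"
```

Under this reading, a tighter opponent would presumably have a higher α (it folds more hands) and a more aggressive one a lower β (it raises more readily).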
A: player type
B: player action
Bluffing……
Thanks!