Evolving Board Game Players Without Using Expert Knowledge. A presentation of research by Amit Benbassat. Advisor: Moshe Sipper. Includes results from: A. Benbassat and M. Sipper, “Evolving Lose-Checkers Players using Genetic Programming”, IEEE Conference on Computational Intelligence and Games (CIG'10), 2010; and new, as yet unpublished results.
Synopsis. Tree-based GP in a nutshell. Applying tree-based GP to Lose Checkers. Expanding work to other games. Available projects.
A Bit About Tree-Based GP. A method of solving problems by evolving solver programs. The programs are represented in memory in tree form (i.e. the genomes are trees). Initially promoted mostly through the efforts of John Koza.
Tree-Based GP. Turning expressions into a tree-shaped data structure. Examples: (X + 1) − (√X) and IF (X ≤ 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X). [Figure: the two expressions drawn as trees with the nodes +, −, SQRT, IFT, ≤, *, X, Y, 1, 3.]
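The two example expressions can be written as trees and evaluated recursively. The following is a minimal sketch, not the system's actual representation: the Node class, the helper names const and var, and the operator spellings ("<=", "SQRT", "IFT") are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    op: str                                   # function name, variable name, or "const"
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0                        # used only when op == "const"


def const(v: float) -> Node:
    return Node("const", value=v)


def var(name: str) -> Node:
    return Node(name)


def evaluate(node: Node, env: Dict[str, float]) -> float:
    """Recursively evaluate an expression tree under the variable bindings in env."""
    if node.op == "const":
        return node.value
    if not node.children:                     # variable terminal such as X or Y
        return env[node.op]
    if node.op == "IFT":                      # evaluate only the branch that is taken
        cond, then_branch, else_branch = node.children
        return evaluate(then_branch if evaluate(cond, env) else else_branch, env)
    args = [evaluate(child, env) for child in node.children]
    if node.op == "+":
        return args[0] + args[1]
    if node.op == "-":
        return args[0] - args[1]
    if node.op == "*":
        return args[0] * args[1]
    if node.op == "<=":
        return float(args[0] <= args[1])
    if node.op == "SQRT":
        return math.sqrt(args[0])
    raise ValueError(f"unknown node type: {node.op}")


# (X + 1) - sqrt(X)
expr1 = Node("-", [Node("+", [var("X"), const(1)]), Node("SQRT", [var("X")])])

# IF (X <= 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X)
expr2 = Node("IFT", [
    Node("<=", [var("X"), const(3)]),
    Node("+", [Node("+", [var("X"), var("Y")]), const(3)]),
    Node("*", [Node("*", [var("X"), var("Y")]), var("X")]),
])
```

For example, evaluate(expr1, {"X": 4.0}) computes (4 + 1) − 2, and evaluate(expr2, {"X": 2.0, "Y": 5.0}) takes the THEN branch.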
Generic Genetic Operators: Self-Replication. [Figure: the IFT example tree and an identical copy of it.]
Generic Genetic Operators: Rebuild Mutation. [Figure: the IFT example tree with one subtree replaced by a new subtree built from the nodes −, Y, and 4.]
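A rough sketch of rebuild mutation, assuming it means picking a random node and replacing the subtree rooted there with a freshly grown random subtree. Trees are nested lists here ([op, child, child, ...]) purely for brevity. The function and terminal sets, the 0.3 terminal probability, and max_depth are hypothetical, and the sketch ignores the type constraints a strongly typed system would enforce.

```python
import copy
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}      # hypothetical function set: name -> arity
TERMINALS = ["X", "Y", 1, 3, 4]           # hypothetical terminal set


def random_tree(rng, depth):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return [op] + [random_tree(rng, depth - 1) for _ in range(FUNCTIONS[op])]


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return copy.deepcopy(new_subtree)
    result = copy.deepcopy(tree)
    parent = result
    for i in path[:-1]:
        parent = parent[i]
    parent[path[-1]] = copy.deepcopy(new_subtree)
    return result


def rebuild_mutation(tree, rng, max_depth=2):
    """Replace a uniformly chosen subtree with a newly generated one."""
    target = rng.choice(list(paths(tree)))
    return replace_at(tree, target, random_tree(rng, max_depth))


example = ["IFT", ["<=", "X", 3], ["+", ["+", "X", "Y"], 3], ["*", ["*", "X", "Y"], "X"]]
mutant = rebuild_mutation(example, random.Random(0))
```

The original tree is left untouched; the mutant is a modified copy.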
Generic Genetic Operators: Two-Way Crossover. [Figure: two parent trees exchanging subtrees.]
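Two-way crossover can be sketched as swapping one randomly chosen subtree of each parent, so that both offspring change. This is an illustrative sketch on nested-list trees; the helper names are hypothetical, and no type check is made on the swapped subtrees, which a strongly typed system would need.

```python
import copy
import random


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get_at(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return copy.deepcopy(new_subtree)
    result = copy.deepcopy(tree)
    parent = result
    for i in path[:-1]:
        parent = parent[i]
    parent[path[-1]] = copy.deepcopy(new_subtree)
    return result


def two_way_crossover(a, b, rng):
    """Swap a random subtree of a with a random subtree of b; return both offspring."""
    path_a = rng.choice(list(paths(a)))
    path_b = rng.choice(list(paths(b)))
    child_a = replace_at(a, path_a, get_at(b, path_b))
    child_b = replace_at(b, path_b, get_at(a, path_a))
    return child_a, child_b


def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in paths(tree))


parent_a = ["IFT", ["<=", "X", 3], ["+", "X", "Y"], ["*", "X", "Y"]]
parent_b = ["-", ["+", "X", 1], ["SQRT", "X"]]
child_a, child_b = two_way_crossover(parent_a, parent_b, random.Random(0))
```

Because the subtrees are exchanged rather than copied, the total number of nodes across the two offspring equals that of the two parents.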
Synopsis. Previous results in games using GP and GAs. Applying tree-based GP to Lose Checkers. Design. Algorithm and operators. Results. Expanding work to other games. Conclusions and future work.
Applying GP to Lose Checkers: From Genotype to Phenotype. Strongly typed tree-based GP was used. Trees are seen as board-state evaluators. The individual players are built around the evaluator, using it (integrated with alpha-beta search) to decide which move to take.
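How an evaluator plugs into alpha-beta search can be sketched on a toy game tree. Everything here is an illustrative assumption: the position names, the SUCCESSORS and SCORES tables, and the function signatures. In the real system the evaluate argument would be the evolved tree applied to a Lose Checkers board, and successors would generate legal moves.

```python
import math

# Toy game tree (hypothetical): position name -> list of successor positions.
SUCCESSORS = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
# Toy evaluator table (hypothetical): score from the maximizing player's view.
SCORES = {"a": 0, "b": 0, "a1": 3, "a2": 5, "b1": 2, "b2": 9, "root": 0}


def successors(pos):
    return SUCCESSORS.get(pos, [])


def alphabeta(pos, depth, alpha, beta, maximizing, successors, evaluate):
    """Depth-limited alpha-beta; evaluate() scores positions at the search horizon."""
    children = successors(pos)
    if depth == 0 or not children:
        return evaluate(pos)
    if maximizing:
        best = -math.inf
        for child in children:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False,
                                       successors, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:              # the opponent will never allow this line
                break
        return best
    best = math.inf
    for child in children:
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True,
                                   successors, evaluate))
        beta = min(beta, best)
        if alpha >= beta:
            break
    return best


def choose_move(pos, depth, successors, evaluate):
    """Pick the successor with the best alpha-beta value for the player to move."""
    return max(
        successors(pos),
        key=lambda child: alphabeta(child, depth - 1, -math.inf, math.inf, False,
                                    successors, evaluate),
    )


best_move = choose_move("root", 2, successors, SCORES.__getitem__)
```

Swapping in a different evaluate callable changes the player without touching the search, which is the sense in which the player is built around the evaluator.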
Terminal Nodes
Terminal Nodes (cont’d)
Function Nodes
Applying GP to Lose Checkers. Algorithm:
  Generate a random population of individuals of tree height 5 for generation 0.
  Repeat for each generation i:
    Evaluate fitness.
    Selection().
    Procreation(XOprob, mutProb).
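The generational loop can be sketched as a small driver that takes the three phases as callables. The function name evolve and all parameter names and signatures are assumptions for illustration; the sketch fixes only the order of the phases, not how each one works.

```python
import random
from typing import Callable, List


def evolve(
    init_population: Callable[[random.Random], List],
    evaluate_fitness: Callable[[List, random.Random], None],
    selection: Callable[[List, random.Random], List],
    procreation: Callable[[List, float, float, random.Random], List],
    generations: int,
    xo_prob: float,
    mut_prob: float,
    rng: random.Random,
) -> List:
    """Run the loop: evaluate fitness, select parents, procreate, repeat."""
    population = init_population(rng)                 # generation 0
    for _ in range(generations):
        evaluate_fitness(population, rng)             # assumed to store fitness on each individual
        parents = selection(population, rng)
        population = procreation(parents, xo_prob, mut_prob, rng)
    return population
```

A call such as evolve(make_pop, fitness_fn, select_fn, procreate_fn, 120, 0.8, 0.2, random.Random(0)) would run 120 generations, where the four callables are whatever implementations are plugged in.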
Fitness Calculations. The system supports a sequence of guides. Each guide is assigned a number of rounds and a number of games per round. The system also supports play between individuals in the population (referred to in the EA literature as coevolution), with a parameter coPlayNum for the number of games. Players get 1 fitness point for winning a game and 0.5 points for a draw.
Fitness Calculations (cont’d).
  for each guide i do
    for j ← 1 to guide i's number of rounds do
      Have every individual in the population deemed fit enough play guide i's round-size games against guide i.
  Have every individual in the population play coPlayNum games as black against coPlayNum random opponents in the population.
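The fitness procedure can be sketched as follows. Several points are assumptions, since the slides do not settle them: the Individual and Guide classes, the play_game(black, white, rng) callable returning black's score (1, 0.5, or 0), the individual playing black against guides, the fit_enough predicate (the actual criterion is not given), co-play opponents drawn with replacement, and the white player in a co-play game also collecting its points.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Individual:
    genome: Any
    fitness: float = 0.0


@dataclass
class Guide:
    player: Any
    rounds: int
    games_per_round: int


def evaluate_fitness(
    population: List[Individual],
    guides: List[Guide],
    co_play_num: int,
    play_game: Callable[[Any, Any, random.Random], float],
    rng: random.Random,
    fit_enough: Callable[[Individual], bool] = lambda ind: True,
) -> None:
    """Accumulate fitness: 1 point per win, 0.5 per draw."""
    for ind in population:
        ind.fitness = 0.0
    # Guide play: each guide, round by round, against every individual still deemed fit enough.
    for guide in guides:
        for _ in range(guide.rounds):
            for ind in population:
                if fit_enough(ind):
                    for _ in range(guide.games_per_round):
                        ind.fitness += play_game(ind.genome, guide.player, rng)
    # Co-play: every individual plays co_play_num games as black against random opponents.
    for ind in population:
        others = [o for o in population if o is not ind]
        for opponent in rng.choices(others, k=co_play_num):
            score = play_game(ind.genome, opponent.genome, rng)
            ind.fitness += score
            opponent.fitness += 1.0 - score   # assumption: white is credited too


def toy_game(black: int, white: int, rng: random.Random) -> float:
    """Stand-in for a real game: the larger number wins, equal numbers draw."""
    return 1.0 if black > white else 0.5 if black == white else 0.0
```

With the toy game, a population of genomes 1, 2, 3 playing one guide of strength 2 for 2 rounds of 3 games ends with fitness 0, 3, and 6.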
Selection.
  Repeat until the number of parents selected equals the original population size:
    Randomly choose two different individuals from the population: I1 and I2.
    if I1.Fitness > I2.Fitness then select a copy of I1 for the parent population,
    else select a copy of I2 for the parent population.
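The selection step is a tournament between two individuals, repeated until the parent pool is full. A minimal sketch, assuming individuals are objects carrying a fitness attribute (the Individual class here is hypothetical):

```python
import copy
import random
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Individual:
    genome: Any
    fitness: float = 0.0


def selection(population: List[Individual], rng: random.Random) -> List[Individual]:
    """Fill a parent pool of the original size by repeated two-way tournaments."""
    parents: List[Individual] = []
    while len(parents) < len(population):
        i1, i2 = rng.sample(population, 2)            # two different individuals
        winner = i1 if i1.fitness > i2.fitness else i2
        parents.append(copy.deepcopy(winner))         # a copy, so later edits leave the original alone
    return parents


population = [Individual("a", 1.0), Individual("b", 2.0),
              Individual("c", 3.0), Individual("d", 4.0)]
parents = selection(population, random.Random(0))
```

One consequence of this scheme is that an individual with the uniquely lowest fitness can never be selected, because it loses every tournament it enters.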
Genetic Operators: Local Mutation. Every tree node N returning a floating point value was assigned a number. This number was initialized to 1.0 and acted as a factor for the return value. Local mutation is a slight change in the node’s factor. Example: a + node with children A and B returns f1*(A+B) before the mutation and f2*(A+B) after it.
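The per-node factor can be sketched like this. The FloatNode class is hypothetical, and the size and distribution of the "slight change" are not given in the slides; Gaussian noise with sigma 0.1 is an assumption chosen only for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class FloatNode:
    op: str                                   # "+" or "const" in this sketch
    children: List["FloatNode"] = field(default_factory=list)
    value: float = 0.0                        # used only when op == "const"
    factor: float = 1.0                       # multiplies the node's return value


def evaluate(node: FloatNode) -> float:
    """Evaluate the subtree, scaling each node's raw result by its factor."""
    if node.op == "const":
        raw = node.value
    elif node.op == "+":
        raw = evaluate(node.children[0]) + evaluate(node.children[1])
    else:
        raise ValueError(f"unknown node type: {node.op}")
    return node.factor * raw


def local_mutation(node: FloatNode, rng: random.Random, sigma: float = 0.1) -> None:
    """Nudge the node's factor slightly; the tree structure is left unchanged."""
    node.factor += rng.gauss(0.0, sigma)      # assumption: small Gaussian perturbation


a = FloatNode("const", value=2.0)
b = FloatNode("const", value=3.0)
plus = FloatNode("+", [a, b])                 # returns f * (A + B) with f = 1.0 initially
```

Calling local_mutation(plus, rng) changes only plus.factor, so the node goes from returning f1*(A+B) to f2*(A+B).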
Genetic Operators: One-Way Crossover. [Figure: a subtree from one tree copied into another tree.]
Procreation(XOprob, mutProb).
  While there remain at least 2 unselected individuals:
    Find two unselected individuals I1 and I2 at random.
    With probability XOprob:
      If I1.Fitness > I2.Fitness, use one-way XO to transfer genes from I1 to I2.
      Else use two-way XO between I1 and I2.
  For each individual I1 in the population:
    With probability mutProb, choose a node in I1's tree at random and mutate it by either rebuild or local mutation.
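The control flow of the procreation step can be sketched with the operators passed in as callables. The Individual class and the signatures of one_way_xo(donor, receiver, rng), two_way_xo(a, b, rng), and mutate(genome, rng) are assumptions; in particular, the choice between rebuild and local mutation is left to the mutate callable.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Individual:
    genome: Any
    fitness: float = 0.0


def procreation(
    parents: List[Individual],
    xo_prob: float,
    mut_prob: float,
    one_way_xo: Callable[[Any, Any, random.Random], Any],
    two_way_xo: Callable[[Any, Any, random.Random], Tuple[Any, Any]],
    mutate: Callable[[Any, random.Random], Any],
    rng: random.Random,
) -> List[Individual]:
    """Pair up individuals at random for crossover, then mutate each with mut_prob."""
    unselected = list(parents)
    rng.shuffle(unselected)
    while len(unselected) >= 2:
        i1, i2 = unselected.pop(), unselected.pop()
        if rng.random() < xo_prob:
            if i1.fitness > i2.fitness:
                # One-way: genes flow from the fitter I1 into I2; I1 is unchanged.
                i2.genome = one_way_xo(i1.genome, i2.genome, rng)
            else:
                i1.genome, i2.genome = two_way_xo(i1.genome, i2.genome, rng)
    for ind in parents:
        if rng.random() < mut_prob:
            ind.genome = mutate(ind.genome, rng)
    return parents
```

With an odd number of parents, one individual is left unpaired and can change only through mutation.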
Opponents. There is no known simple evaluation function for Lose Checkers. All hand-crafted players used a random function to evaluate non-trivial board states. Two types of opponents were written in code: the random player, and an α-β player of depth d with a random evaluation function.
Quality of α-β Players. To ensure that α-β players using a random evaluation function are indeed proficient players, their performance was tested. Each test tournament consists of games. [Table: 1st player, 2nd player, 1st player win ratio; pairings among Random, αβ2, αβ3, αβ5, and αβ8.]
Results with Search Against α-β Players. Using lookahead 3, playing 1000 games against αβ3. [Table: Run ID, Fitness Eval, vs. αβ3.]
Results with Search Against α-β Players (cont’d). Using lookahead 3, playing against various opponents. [Table: Run ID, vs. αβ3, vs. αβ4, vs. αβ6, vs. αβ8.]
Results with Search Against α-β Players: Parameters. Run parameters: Population 150, 120 generations. No guide play, 50 co-play games as black, search depth 3. Maximum tree depth: 12 in runs 44A-49A, 14 in runs 56A-61A. XO_Prob 0.8, mutProb 0.2, local_muteProb
Evolving Players using Deeper Search. Results with players using lookahead. [Table: Run ID, vs. αβ5, vs. αβ6, vs. αβ8.]
Results with Search Against α-β Players: Parameters. Run parameters: Population 50, 70 generations. Guide play: 20 games (in 2 rounds of 10) against αβ5. 20 co-play games as black. Search depth 4. Maximum tree depth of 10. XO_Prob 0.8, mutProb 0.2, local_muteProb
The Role of Mobility. Initial runs with search produced tepid results. The introduction of the mobility terminal greatly improved those results. Mobility is a general principle which applies to many board games, and is often associated with a high level of play.
Synopsis. Tree-based GP in a nutshell. Applying tree-based GP to Lose Checkers. Expanding work to other games. New results in Lose Checkers. 10x10 Checkers. Reversi. Dodgem. Conclusions and future work.
New Results in Lose Checkers. Results with players using lookahead 4. [Table: Run ID, Fitness Eval, vs. αβ5.]
New Results in Lose Checkers (cont’d). Run parameters: Population: Generations: Guide play: 10 games against αβ2 in two of the runs. co-play games as black. Search depth 4. Maximum tree depth of 14. XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.
10x10 Checkers. 10x10 board. Objective: To eliminate all opponent pieces or render all opponent pieces immobile. Rules: As in the 8x8 version.
Quality of α-β Players. Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that α-β players are indeed proficient players, their performance was tested. Each test tournament consists of games. [Table: 1st player, 2nd player, 1st player win ratio; pairings among Random, αβ2, αβ3, and αβ5.]
10x10 Checkers Results. [Table: Run ID, Fitness Eval, Search Depth, vs. αβ3.]
10x10 Checkers Results (cont’d). Run parameters: Population: Generations: 100. No guide play. co-play games as black. Search depth 4. Maximum tree depth XO_Prob 0.8, mutProb 0.2, local_muteProb
8x8 Reversi. Popular board game, also known as Othello. 8x8 board. Each piece has a black side and a white side. Each player places a piece on her turn, flipping trapped opponent pieces. Objective: Maximize the number of friendly pieces on the board.
Reversi-Specific Terminals. Node name, return type, return value:
  EnemyCornerCount, F: number of corners occupied by opponent.
  FriendlyCornerCount, F: number of corners occupied by player.
  CornerCount, F: FriendlyCornerCount − EnemyCornerCount.
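The three corner terminals can be sketched on a simple board encoding. The encoding (an 8x8 list of lists holding "B", "W", or ".") and the function signatures are assumptions for illustration; each function returns a float to match the F return type.

```python
from typing import List

Board = List[List[str]]                       # hypothetical encoding: "B", "W", or "." per square
CORNERS = [(0, 0), (0, 7), (7, 0), (7, 7)]


def friendly_corner_count(board: Board, player: str) -> float:
    """Number of corners occupied by the player."""
    return float(sum(board[r][c] == player for r, c in CORNERS))


def enemy_corner_count(board: Board, player: str) -> float:
    """Number of corners occupied by the player's opponent."""
    enemy = "W" if player == "B" else "B"
    return float(sum(board[r][c] == enemy for r, c in CORNERS))


def corner_count(board: Board, player: str) -> float:
    """FriendlyCornerCount minus EnemyCornerCount."""
    return friendly_corner_count(board, player) - enemy_corner_count(board, player)


board = [["."] * 8 for _ in range(8)]
board[0][0] = "B"
board[0][7] = "B"
board[7][7] = "W"
```

On this board black holds two corners and white one, so corner_count is positive for black and negative for white.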
Quality of α-β Players. Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that α-β players are indeed proficient players, their performance was tested. Each test tournament consists of games. [Table: 1st player, 2nd player, 1st player win ratio; pairings among Random, αβ2, αβ3, αβ5, and αβ7.]
Reversi Results. [Table: Run ID, Fitness Eval, Search Depth, vs. αβ5, vs. αβ7.]
Reversi Results (cont’d). Run parameters: Population: 120. Generations: 100. No guide play. co-play games as black. Search depth 4. Maximum tree depth of 14. XO_Prob 0.8, mutProb 0.2, local_muteProb
Dodgem
Synopsis. Tree-based GP in a nutshell. Applying tree-based GP to Lose Checkers. Expanding work to other games. Available projects.
Your mission (should you decide to accept it): 1. Choose a game. 2. Write the game program in C and interface it with the Java system. 3. Write game-specific terminal nodes and adjustments if necessary. 4. Run it, document results, produce a report.
Games
My Current Areas of Interest. Games with a high branching factor. Games with a random element. Multiplayer games. Games with partial information.
Another project. I want to check my selective crossover operator. Adapt the system to a toy problem. Execute runs with selective XO and with typical XO using several parameter sets. Compare and analyze results. Write a report.