Evolving Board Game Players Without Using Expert Knowledge
A presentation of research by Amit Benbassat. Advisor: Moshe Sipper.
Includes results from:
- A. Benbassat and M. Sipper, “Evolving Lose-Checkers Players using Genetic Programming”, IEEE Conference on Computational Intelligence and Games (CIG'10), 2010.
- A. Benbassat and M. Sipper, “Evolving Board-Game Players with Genetic Programming”, in Proceedings of the 13th annual conference companion on Genetic and evolutionary computation, ser. GECCO ’11 (2011).
- A. Benbassat and M. Sipper, “Evolving Search and Strategy for Reversi Players using Genetic Programming”, IEEE Conference on Computational Intelligence and Games (CIG'12), 2012.

Synopsis
- Tree-based GP in a nutshell.
- Applying tree-based GP to Lose Checkers.
- Expanding work to other games.
- Evolving search.
- Available projects.

A Bit About Tree-Based GP
- A method of solving problems by evolving solver programs.
- The programs are represented in memory in tree form (i.e. the genomes are trees).
- Initially promoted mostly through the efforts of John Koza.

Tree-Based GP
Turning expressions into a tree-shaped data structure (trees written here in prefix form):
- (X + 1) − (√X) becomes − (+ X 1) (SQRT X).
- IF (X ≤ 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X) becomes IFT (≤ X 3) (+ (+ X Y) 3) (* (* X Y) X).
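
A minimal sketch of the two example expressions as trees, assuming a nested-tuple encoding and an `evaluate` helper that are purely illustrative (the talk does not show the in-memory representation used by the actual system):

```python
import math

# Trees as nested tuples: (operator, child, ...). Leaves are variable names or constants.
EXPR_1 = ("-", ("+", "X", 1), ("SQRT", "X"))
EXPR_2 = ("IFT", ("<=", "X", 3),
          ("+", ("+", "X", "Y"), 3),
          ("*", ("*", "X", "Y"), "X"))

def evaluate(node, env):
    """Recursively evaluate a tree; env maps variable names to values."""
    if isinstance(node, str):          # variable leaf
        return env[node]
    if not isinstance(node, tuple):    # constant leaf
        return node
    op, *children = node
    if op == "IFT":                    # evaluate only the branch that is taken
        cond, then_branch, else_branch = children
        return evaluate(then_branch if evaluate(cond, env) else else_branch, env)
    args = [evaluate(child, env) for child in children]
    if op == "+":
        return args[0] + args[1]
    if op == "-":
        return args[0] - args[1]
    if op == "*":
        return args[0] * args[1]
    if op == "<=":
        return args[0] <= args[1]
    if op == "SQRT":
        return math.sqrt(args[0])
    raise ValueError(f"unknown operator: {op}")
```

For example, `evaluate(EXPR_1, {"X": 4})` computes (4 + 1) − √4.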

Generic Genetic Operators: Self-Replication
[Figure: the IFT example tree and an identical copy of it.]

Generic Genetic Operators: Rebuild Mutation
[Figure: the IFT example tree and a newly built subtree, (− Y 4).]

Generic Genetic Operators: Two-Way Crossover
[Figure: two parent trees, rooted at IFT and at −, exchanging subtrees.]
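
A rough sketch of two-way (subtree-swapping) crossover on trees stored as nested tuples. The tuple encoding and all helper names are assumptions made for illustration. This version is also untyped, whereas the system described in the talk is strongly typed, so a faithful version would only swap subtrees with matching return types.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_subtree(tree, path):
    """Follow child indices from the root down to the addressed node."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def two_way_crossover(a, b, rng):
    """Swap one randomly chosen subtree of a with one randomly chosen subtree of b."""
    path_a = rng.choice(list(paths(a)))
    path_b = rng.choice(list(paths(b)))
    child_a = replace_subtree(a, path_a, get_subtree(b, path_b))
    child_b = replace_subtree(b, path_b, get_subtree(a, path_a))
    return child_a, child_b
```

Because the parents are immutable tuples, both children are fresh trees and the parents are left untouched.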

Synopsis
- Tree-based GP in a nutshell.
- Applying tree-based GP to Lose Checkers.
  - Design.
  - Algorithm and operators.
  - Results.
- Expanding work to other games.
- Evolving search.
- Available projects.

Applying GP to Lose Checkers: From Genotype to Phenotype
- Used strongly typed tree-based GP.
- Trees are seen as board-state evaluators.
- The individual players are built around the evaluator, using it (integrated with alpha-beta search) to decide which move to take.
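
A hedged sketch of how a board-state evaluator can be plugged into alpha-beta search to pick a move. The negamax formulation, the function signatures and the toy game are all assumptions for illustration; in the talk the `evaluate` role is played by an evolved GP tree and the game is Lose Checkers.

```python
import math

def alphabeta(state, depth, alpha, beta, moves, play, evaluate):
    """Negamax alpha-beta: value of state for the player about to move."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)
    best = -math.inf
    for move in legal:
        value = -alphabeta(play(state, move), depth - 1, -beta, -alpha,
                           moves, play, evaluate)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:              # cutoff: the opponent will avoid this line
            break
    return best

def choose_move(state, depth, moves, play, evaluate):
    """Pick the move whose resulting state is worst for the opponent."""
    return max(moves(state),
               key=lambda m: -alphabeta(play(state, m), depth - 1,
                                        -math.inf, math.inf, moves, play, evaluate))

# Toy game (not from the talk): a pile of stones, take 1 or 2, taking the last stone wins.
def nim_moves(pile):
    return [k for k in (1, 2) if k <= pile]

def nim_play(pile, k):
    return pile - k

def nim_evaluate(pile):
    # An empty pile means the player to move has already lost; otherwise no knowledge.
    return -1.0 if pile == 0 else 0.0
```

With a pile of 4 and lookahead 4, `choose_move(4, 4, nim_moves, nim_play, nim_evaluate)` takes one stone, leaving the opponent a lost position.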

Terminal Nodes

Terminal Nodes (cont’d)

Function Nodes

Applying GP to Lose Checkers
Algorithm:
1. Generate a random population for generation 0, consisting of individuals of tree height 5.
2. Repeat for each generation i:
   - Evaluate fitness.
   - Selection().
   - Procreation(XOprob, mutProb).
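
The loop above can be sketched as a small driver function. Everything here is an illustrative assumption: the four callables stand in for the system's population initializer, fitness evaluation, Selection() and Procreation(), and passing (individual, fitness) pairs between them is just one convenient convention.

```python
def evolve(make_population, evaluate_fitness, selection, procreation, generations):
    """Generic generational loop; every argument except generations is a callable."""
    population = make_population()                     # generation 0
    for _ in range(generations):
        fitness = evaluate_fitness(population)         # one score per individual
        parents = selection(population, fitness)       # list of (individual, fitness)
        population = procreation(parents)              # next generation
    return population
```

Keeping the stages as parameters makes it easy to swap, say, guide play for co-play in the fitness step without touching the loop.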

Fitness Calculations
- The system supports a sequence of guides (fixed opponents that individuals play against during fitness evaluation).
- The system also supports play between individuals in the population (referred to in the EA literature as coevolution), with a parameter coPlayNum for the number of games.
- The system also supports play against an archive of the best players from previous generations.
- Players get 1 fitness point for winning a game and 0.5 points for a draw.
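
A minimal sketch of co-play scoring under the point scheme above (1 for a win, 0.5 for a draw). The function name, the `play_game` callback and the choice to credit only the player being evaluated are assumptions; the talk does not say how opponents are drawn or whether both sides score.

```python
import random

POINTS = {"win": 1.0, "draw": 0.5, "loss": 0.0}

def coplay_fitness(population, play_game, co_play_num, rng):
    """Score each individual over co_play_num games against random rivals.

    play_game(a, b) must return "win", "draw" or "loss" from a's point of view.
    """
    fitness = [0.0] * len(population)
    for i, player in enumerate(population):
        rivals = [j for j in range(len(population)) if j != i]
        for _ in range(co_play_num):
            opponent = population[rng.choice(rivals)]
            fitness[i] += POINTS[play_game(player, opponent)]
    return fitness
```

Guide play and archive play could be scored the same way by drawing the opponent from a fixed list instead of from the population.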

Tournament Selection
Repeat until the number of parents selected equals the original population size:
    Randomly choose k different individuals from the population: S = {I_1 … I_k}.
    Select copies of the k’ fittest individuals in S for the parent population.
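
The selection step can be sketched as follows. Parameter names (`k`, `k_best` for k’) and the handling of a final, partially needed tournament are assumptions made for this example.

```python
import random

def tournament_selection(population, fitness, k, k_best, rng):
    """Fill a parent pool of the original size from repeated k-way tournaments."""
    parents = []
    while len(parents) < len(population):
        # Draw k different individuals (by index) and rank them by fitness.
        contestants = rng.sample(range(len(population)), k)
        contestants.sort(key=lambda i: fitness[i], reverse=True)
        for i in contestants[:k_best]:
            if len(parents) < len(population):   # do not overshoot the pool size
                parents.append(population[i])
    return parents
```

Larger k relative to k_best means stronger selection pressure; with k equal to the population size and k_best = 1 only the single fittest individual is ever selected.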

Genetic Operators: Local Mutation
- Every tree node N returning a floating point value was assigned a number.
- This number was initialized to 1.0 and acted as a factor for the return value.
- Local mutation is a slight change in the node’s factor.
- Example: a + node with children A and B returns f1*(A+B) before the mutation and f2*(A+B) after it.
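
A small sketch of a per-node factor and its local mutation. The `Node` class, the Gaussian perturbation and the `sigma` parameter are assumptions; the talk only says the change to the factor is slight.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                        # "+" or "const" in this sketch
    children: list = field(default_factory=list)
    value: float = 0.0                             # payload of a "const" leaf
    factor: float = 1.0                            # multiplies the node's return value

def evaluate(node):
    """Evaluate the subtree and scale the result by the node's factor."""
    if node.op == "const":
        raw = node.value
    elif node.op == "+":
        raw = sum(evaluate(child) for child in node.children)
    else:
        raise ValueError(f"unknown operator: {node.op}")
    return node.factor * raw

def local_mutation(node, rng, sigma=0.1):
    """Nudge the node's factor by a small random amount (assumed Gaussian)."""
    node.factor += rng.gauss(0.0, sigma)
```

Unlike rebuild mutation, this leaves the tree's shape intact and only rescales one node's contribution.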

Genetic Operators: One-Way Crossover
[Figure: two example trees; a subtree from one is copied into the other.]

Procreation(XOprob, mutProb)
While there remain at least 2 unselected individuals:
    Find two unselected individuals I1, I2 at random.
    With probability XOprob:
        If I1.Fitness > I2.Fitness, use one-way XO to transfer genes from I1 to I2.
        Else use two-way XO between I1 and I2.
For each individual I1 in the population:
    With probability mutProb, choose a node in I1’s tree at random and mutate it by either rebuild or local mutation.
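
The pseudocode above can be sketched in Python. The operator callables (`one_way_xo`, `two_way_xo`, `mutate`) are placeholders to be supplied by the caller, and pairing individuals via a shuffled index list is an assumed way of picking unselected pairs at random.

```python
import random

def procreation(population, fitness, xo_prob, mut_prob,
                one_way_xo, two_way_xo, mutate, rng):
    """Pair individuals for crossover, then mutate each with probability mut_prob."""
    pop = list(population)
    unselected = list(range(len(pop)))
    rng.shuffle(unselected)                         # random pairing order
    while len(unselected) >= 2:
        i1, i2 = unselected.pop(), unselected.pop()
        if rng.random() < xo_prob:
            if fitness[i1] > fitness[i2]:
                # One-way: genes flow from the fitter I1 into I2; I1 is unchanged.
                pop[i2] = one_way_xo(pop[i1], pop[i2])
            else:
                pop[i1], pop[i2] = two_way_xo(pop[i1], pop[i2])
    for i in range(len(pop)):
        if rng.random() < mut_prob:
            # mutate() is expected to pick a random node and apply rebuild or local mutation.
            pop[i] = mutate(pop[i], rng)
    return pop
```

With the run parameters quoted in the talk, this would be called with xo_prob = 0.8 and mut_prob = 0.2.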

Opponents
- There is no known simple evaluation function for Lose Checkers.
- All hand-crafted players used the random function to evaluate non-trivial board-states.
- Two types of opponents were written in code:
  - The random player.
  - An α-β player of depth d with a random evaluation function.

Quality of α-β Players
To ensure that α-β players using a random evaluation function are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
1st player   2nd player   1st player win ratio
αβ2          Random       0.9665
αβ3          αβ2          0.8502
αβ8          αβ3          0.5873
αβ5          αβ3          0.82535
αβ5          αβ8          0.5562

Results with Search Against α-β Players
Using lookahead 3, playing 1000 games against αβ3.
Run ID   Fitness Eval   vs. αβ3
r00044   50Co           744.0
r00046   50Co           698.5
r00047   50Co           765.5
r00048   50Co           696.5
r00049   50Co           781.5
r00056   50Co           721.0
r00057   50Co           786.5
r00058   50Co           697.0
r00060   50Co           737.0
r00061   50Co           737.0

Results with Search Against α-β Players (cont’d)
Using lookahead 3, playing against various opponents.
Run ID   vs. αβ3   vs. αβ4   vs. αβ6   vs. αβ8
r00044   744.0     944.5     816.0     758.0
r00047   765.5     899.0     722.5     476.0
r00049   781.5     915.0     809.0     735.5
r00057   786.5     909.0     745.5     399.5
r00060   737.0     897.0     627.0     408.5
r00061   737.0     947.0     781.5     715.5

Results with Search Against α-β Players: Parameters
Run parameters:
- Population 150, 120 generations.
- No guide play, 50 co-play games as black, search depth 3.
- Maximum tree depth: 12 in runs 44A-49A, 14 in runs 56A-61A.
- XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

Evolving Players using Deeper Search
Results with players using lookahead 4.
Run ID   vs. αβ5   vs. αβ6   vs. αβ8
r00064   582.0     603.5     395.0
r00065   537.0     782.5     561.5
r00066   567.0     757.5     483.5
r00067   598.5     723.0     385.5
r00068   548.0     787.0     524.0
r00069   573.5     715.5     523.0
r00070   577.0     691.5     476.0
r00071   551.5     582.5     401.5

Results with Search Against α-β Players: Parameters
Run parameters:
- Population 50, 70 generations.
- Guide play: 20 games (in 2 rounds of 10) against αβ5.
- 20 co-play games as black.
- Search depth 4.
- Maximum tree depth of 10.
- XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

The Role of Mobility
- Initial runs with search produced tepid results.
- The introduction of the mobility terminal greatly improved those results.
- Mobility is a general principle which applies to many board games and is often associated with a high level of play.

Synopsis
- Tree-based GP in a nutshell.
- Applying tree-based GP to Lose Checkers.
- Expanding work to other games.
  - New results in Lose Checkers.
  - 10x10 Checkers.
  - Reversi, Dodgem, 9 Men’s Morris.
- Evolving search.
- Available projects.

New Results in Lose Checkers
Results with players using lookahead 4.
Run ID   Fitness Eval   vs. αβ5
r00090   10αβ2_20Co     632.0
r00091   10αβ2_20Co     645.0
r00096   25Co           608.0
r00097   25Co           575.0
r00098   40Co           575.5
r00099   40Co           633.5

New Results in Lose Checkers (cont’d)
Run parameters:
- Population: 120-150.
- Generations: 90-100.
- Guide play: 10 games against αβ2 in two of the runs.
- 20-40 co-play games as black.
- Search depth 4.
- Maximum tree depth of 14.
- XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

10x10 Checkers
- 10x10 board.
- Objective: to eliminate all opponent pieces or render all opponent pieces immobile.
- Rules: as in the 8x8 version.

Quality of α-β Players
Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that α-β players are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
1st player   2nd player   1st player win ratio
αβ2          Random       0.99885
αβ3          αβ2          0.5229
αβ5          αβ3          0.876

10x10 Checkers Results
Run ID   Fitness Eval   Search Depth   vs. αβ3
r00084   50Co           3              889.0
r00085   50Co           3              927.0
r00092   25Co           2              732.0
r00093   25Co           2              615.5
r00094   25Co           2              554.0
r00095   25Co           2              631.0

10x10 Checkers Results (cont’d)
Run parameters:
- Population: 100-150.
- Generations: 100.
- No guide play.
- 25-50 co-play games as black.
- Search depth 4.
- Maximum tree depth 13-14.
- XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

8x8 Reversi
- Popular board game, also known as Othello.
- 8x8 board.
- Each piece has a black side and a white side.
- Each player places a piece on her turn, flipping trapped opponent pieces.
- Objective: maximize the number of friendly pieces on the board.

Reversi Specific Terminals
Node Name             Return Type   Return Value
EnemyCornerCount      F             Number of corners occupied by opponent
FriendlyCornerCount   F             Number of corners occupied by player
CornerCount           F             FriendlyCornerCount − EnemyCornerCount
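
A minimal sketch of the three corner terminals, assuming a board stored as a square list of lists with "B", "W" and "." cells. The board encoding and the snake_case function names are illustrative only; the float return values mirror the F return type in the table.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"

def corners(board):
    """Return the contents of the four corner squares of a square board."""
    last = len(board) - 1
    return [board[0][0], board[0][last], board[last][0], board[last][last]]

def friendly_corner_count(board, player):
    """Number of corners occupied by the given player."""
    return float(sum(1 for cell in corners(board) if cell == player))

def enemy_corner_count(board, player):
    """Number of corners occupied by the given player's opponent."""
    return float(sum(1 for cell in corners(board) if cell not in (player, EMPTY)))

def corner_count(board, player):
    """Corner advantage: friendly corners minus enemy corners."""
    return friendly_corner_count(board, player) - enemy_corner_count(board, player)
```

The same functions work unchanged for the 10x10 variant, since the corner positions are derived from the board size.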

Quality of α-β Players
Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that α-β players are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
1st player   2nd player   1st player win ratio
αβ2          Random       0.8471
αβ3          αβ2          0.6004
αβ5          αβ3          0.7509
αβ7          αβ5          0.7662

Reversi Results
Run ID   Fitness Eval   Search Depth   vs. αβ5   vs. αβ7
r00100   25Co           4              875.0     758.5
r00101   25Co           4              957.5     803.0
r00102   40Co           4              942.5     640.5
r00103   40Co           4              905.5     711.5
r00108   40Co           4              956.0     760.0
r00109   40Co           4              912.5     826.0
r00110   40Co           4              953.5     730.5
r00111   40Co           4              961.0     815.5

Reversi Results (cont’d)
Run parameters:
- Population: 120.
- Generations: 100.
- No guide play.
- 25-40 co-play games as black.
- Search depth 4.
- Maximum tree depth of 14.
- XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

10x10 Reversi Quality of α-β Players
Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that α-β players are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
1st player   2nd player   1st player win ratio
αβ2          Random       0.82835
αβ3          αβ2          0.61985
αβ5          αβ3          0.76195

10x10 Reversi Results
Run ID   Fitness Eval   Search Depth   vs. αβ5
r00114   10αβ2+15Co     4              876.5
r00115   10αβ2+15Co     4              917.5
r00116   10αβ2+15Co     4              793.0
r00117   10αβ2+15Co     4              932.5

10x10 Reversi Results (cont’d)
Run parameters:
- Population: 80.
- Generations: 50.
- 10 games vs. αβ2.
- 15 co-play games as black.
- Search depth 4.
- Maximum tree depth of 14.
- XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

Dodgem
- Relatively new game with unusual symmetry.
- Directions of movement are orthogonal.
- Players move pieces one square at a time and cannot go backwards.
- Objective: move all friendly pieces forward off the board.

Dodgem Specific Terminals
Node Name          Return Type   Return Value
EnemyPosCount      F             Distance measure from victory for enemy player
FriendlyPosCount   F             Distance measure from victory for friendly player
PosCount           F             FriendlyPosCount − EnemyPosCount

5x5 Dodgem Quality of α-β Players
Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that α-β players are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
1st player   2nd player   1st player win ratio
αβ2          Random       1.0
αβ3          Random       0.9923
αβ2          αβ3          0.5946
αβ5          αβ2          0.66695
αβ7          αβ5          0.6777

5x5 Dodgem Results
Run ID   Fitness Eval   Search Depth   vs. αβ7
r00120   25Co           5              632.0
r00121   25Co           5              656.0
r00122   25Co           5              523.5
r00123   25Co           5              694.5
r00124   25Co           5              819.5
r00125   25Co           5              532.5

6x6 Dodgem Quality of α-β Players
Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that α-β players are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
1st player   2nd player   1st player win ratio
αβ2          Random       1.0
αβ3          Random       0.9995
αβ2          αβ3          0.57255
αβ5          αβ2          0.7056
αβ7          αβ5          0.67675

6x6 Dodgem Results
Run ID   Fitness Eval   Search Depth   vs. αβ5
r00126   25Co           4              723.0
r00127   25Co           4              538.0
r00128   25Co           4              710.5
r00129   25Co           4              492.0
r00130   25Co           4              666.0
r00131   25Co           4              644.5

Nine Men’s Morris
- Ancient game solved in 1996.
- Two-phased 3x8 game: a piece placing phase and a piece moving phase.
- Players can create mills to remove enemy pieces from the board.
- Objective: remove all but 2 enemy pieces from the board.

Nine Men’s Morris Specific Terminals
The tree building method was manipulated to make a choice based on a FirstPhaseCheck() terminal appear at the top of every GP-tree built.

Nine Men’s Morris Quality of α-β Players
Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that α-β players are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
1st player   2nd player   1st player win ratio
αβ1          Random       0.9490
αβ2          αβ1          0.66245
αβ2          αβ3          0.53075
αβ4          αβ2          0.5283
αβ4          αβ5          0.5473

Nine Men’s Morris Results
Run ID   Fitness Eval   Search Depth   vs. αβ4
r00144   20Co           3              953.0
r00145   20Co           3              978.5
r00146   20Co           3              947.0
r00147   20Co           3              923.0
r00148   20Co           3              919.0
r00149   20Co           3              932.0

Synopsis
- Tree-based GP in a nutshell.
- Applying tree-based GP to Lose Checkers.
- Expanding work to other games.
- Evolving search.
- Available projects.

New Direction: Evolving Search with Forward Pruning
- A high game-tree branching factor impedes using the system with higher search depths.
- Forward pruning can be used to get deeper-searching individuals that are fast enough to be evolved.
- Runs:
  - Runtime parameters are used to limit the branching factor.
  - A second, pruning evaluation function is evolved alongside the regular evaluation function.
  - The branching factor can be scaled up after a run is finished to achieve a slower but stronger player.
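
One way forward pruning with a second evaluator might look, as a sketch only: at each node the pruning evaluator ranks the successor states and only the top `branching` moves are searched. The ranking rule, the negamax form, every name below and the toy game are assumptions; the talk does not specify how the evolved pruning function is applied.

```python
import math

def pruned_alphabeta(state, depth, alpha, beta, moves, play,
                     evaluate, prune_eval, branching):
    """Negamax alpha-beta that searches at most `branching` moves per node."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)
    # prune_eval scores a state for the player to move in it, i.e. for the opponent
    # after our move, so the lowest-scoring successors are the most promising for us.
    ranked = sorted(legal, key=lambda m: prune_eval(play(state, m)))
    best = -math.inf
    for move in ranked[:branching]:
        value = -pruned_alphabeta(play(state, move), depth - 1, -beta, -alpha,
                                  moves, play, evaluate, prune_eval, branching)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break
    return best

# Toy game (not from the talk): take 1 to 3 stones, taking the last stone wins.
def toy_moves(pile):
    return [k for k in (1, 2, 3) if k <= pile]

def toy_play(pile, k):
    return pile - k

def toy_evaluate(pile):
    # An empty pile means the player to move has already lost; otherwise no knowledge.
    return -1.0 if pile == 0 else 0.0
```

Raising `branching` after evolution reuses the same two evaluators in a wider, slower search, which matches the scaling-up idea described above.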

Search with Forward Pruning

Reversi: Evolving Search and Scaling Results
Run ID   Fitness Eval   Search Depth   p Branching Factor   vs. αβ7
r00164   25Co           6              3                    458.0
         −              6              5                    799.0
r00165   25Co           6              3                    607.0
         −              6              5                    817.0
r00166   40αβ3+20Co     6              3                    408.0
         −              6              5                    811.0
r00167   40αβ3+20Co     6              3                    443.0
         −              6              5                    862.0

Synopsis
- Tree-based GP in a nutshell.
- Applying tree-based GP to Lose Checkers.
- Expanding work to other games.
- Evolving search.
- Available projects.

Games

1st Project(s)
I want to check the level of play of some evolved Reversi players.
- Find an existing computer Reversi/Othello player that exhibits a variety of different levels of play.
- Interface the player with my C code.
- Test some of my hand-written players against the external computer player.
- Test evolved players against the external computer program.

2nd Project(s)
I want to check my selective crossover operator.
- Adapt the system to a game problem.
- Execute runs with selective XO and with typical XO using several parameter sets.
- Compare and analyze results using code that you create for that purpose.
- Write a report.
- Leave your code with me for further analysis.

My Current Areas of Interest
- Games with high branching factor (Go, Hex).
  - Project: Implement a Monte-Carlo search algorithm (+ variant control) on such a game so it works with my existing system.
- Games with a random element (Backgammon).
  - Project: Implement a Monte-Carlo search algorithm (+ variant control) on an existing backgammon implementation.
- Others:
  - Multiplayer games (Chinese checkers).
  - Games with partial information (card games).
  - Single agent domains (puzzles).

Very Important to Note
- The purpose of your mini-project work is to help me do my research.
- One of the project requirements is that I be able to run your code with my system and repeat your results.
- By the end of the semester we’ll set up a meeting to make sure everything runs on my computer.
- If it does not run on my machine to my satisfaction, the project is considered not submitted, and there is no grade!
