Evolving Board Game Players Without Using Expert Knowledge. A presentation of research by Amit Benbassat. Advisor: Moshe Sipper.


1 Evolving Board Game Players Without Using Expert Knowledge. A presentation of research by Amit Benbassat. Advisor: Moshe Sipper. Includes results from: A. Benbassat and M. Sipper, “Evolving Lose-Checkers Players using Genetic Programming,” IEEE Conference on Computational Intelligence and Games (CIG'10), 2010; and new, as yet unpublished results.

2 Synopsis: Tree-based GP in a nutshell. Applying tree-based GP to Lose Checkers. Expanding work to other games. Available projects.

3 A Bit About Tree-Based GP: A method of solving problems by evolving solver programs. The programs are represented in memory in tree form (i.e. the genomes are trees). Initially promoted mostly through the efforts of John Koza.

4 Tree-Based GP: Turning expressions into a tree-shaped data structure. Example expressions: (X + 1) − (√X) and IF (X ≤ 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X). [Figure: the two expressions drawn as trees, with node labels −, +, SQRT, IFT, ≤, *, X, Y, 1 and 3.]
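
The two example expressions on this slide can be written down as trees and evaluated recursively. The following is only a minimal sketch: the `Node` class, the `evaluate` function and the choice to encode ≤ as returning 1.0 or 0.0 are assumptions made for illustration, not the representation used in the presented system.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    op: str  # function name, variable name, or numeric literal
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node, env: Dict[str, float]) -> float:
    """Recursively evaluate an expression tree given variable bindings."""
    kids = node.children
    if node.op == "+":
        return evaluate(kids[0], env) + evaluate(kids[1], env)
    if node.op == "-":
        return evaluate(kids[0], env) - evaluate(kids[1], env)
    if node.op == "*":
        return evaluate(kids[0], env) * evaluate(kids[1], env)
    if node.op == "SQRT":
        return math.sqrt(evaluate(kids[0], env))
    if node.op == "<=":
        # Assumption: truth values are encoded as 1.0 / 0.0.
        return 1.0 if evaluate(kids[0], env) <= evaluate(kids[1], env) else 0.0
    if node.op == "IFT":
        # IFT(condition, then-branch, else-branch)
        if evaluate(kids[0], env):
            return evaluate(kids[1], env)
        return evaluate(kids[2], env)
    if node.op in env:
        return env[node.op]      # variable leaf
    return float(node.op)        # numeric literal leaf


def leaf(value) -> Node:
    return Node(str(value))


# (X + 1) - sqrt(X)
expr1 = Node("-", [Node("+", [leaf("X"), leaf(1)]), Node("SQRT", [leaf("X")])])

# IF (X <= 3) THEN ((X + Y) + 3) ELSE ((X * Y) * X)
expr2 = Node("IFT", [
    Node("<=", [leaf("X"), leaf(3)]),
    Node("+", [Node("+", [leaf("X"), leaf("Y")]), leaf(3)]),
    Node("*", [Node("*", [leaf("X"), leaf("Y")]), leaf("X")]),
])
```

Calling `evaluate(expr1, {"X": 4.0})` walks the first tree bottom-up; `expr2` takes its THEN or ELSE branch depending on the value bound to X.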

5 Generic Genetic Operators: Self-Replication. [Figure: the example IFT tree and its copy.]

6 Generic Genetic Operators: Rebuild Mutation. [Figure: rebuild mutation applied to the example IFT tree; the node labels include a subtree built from −, Y and 4.]

7 Generic Genetic Operators: Two-Way Crossover. [Figure: two-way crossover between the example trees.]
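
The generic operators named on the last three slides can be sketched on a very small tree encoding. Everything below is an illustrative assumption: trees are nested Python lists, `random_tree` draws from a made-up function and terminal set, and node types are ignored (the presented system uses strongly typed GP, which would restrict which subtrees may be exchanged).

```python
import random

# Assumed encoding: a tree is a leaf (str) or a list [op, child, child, ...].


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) of every node in the tree."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of tree in which the subtree at path is replaced by new."""
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = replace(tree[path[0]], path[1:], new)
    return copy


def size(tree):
    return sum(1 for _ in paths(tree))


def random_tree(rng, depth):
    """Grow a small random arithmetic tree (hypothetical primitive set)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["X", "Y", "1", "3", "4"])
    op = rng.choice(["+", "-", "*"])
    return [op, random_tree(rng, depth - 1), random_tree(rng, depth - 1)]


def rebuild_mutation(tree, rng, max_depth=2):
    """Replace a randomly chosen subtree with a freshly generated one."""
    path = rng.choice(list(paths(tree)))
    return replace(tree, path, random_tree(rng, max_depth))


def two_way_crossover(a, b, rng):
    """Swap one randomly chosen subtree of a with one of b."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))
```

Self-replication is simply copying the tree unchanged. Because `replace` builds new lists along the edited path, the parents are left intact and can be reused.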

8 Synopsis: Previous results in games using GP and GAs. Applying tree-based GP to Lose Checkers: design; algorithm and operators; results. Expanding work to other games. Conclusions and future work.

9 Applying GP to Lose Checkers: From Genotype to Phenotype. Used strongly typed tree-based GP. Trees are seen as board-state evaluators. The individual players are built around the evaluator, using it (integrated with alpha-beta search) to decide which move to take.
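
To illustrate how a board-state evaluator plugs into alpha-beta search, here is a minimal negamax-style sketch. It is not the presented system: the function names, the callback-based game interface and the toy game (a single pile from which players take 1 or 2 stones, last stone wins) are all assumptions chosen to keep the example short. In the presented work the `evaluate` callback would be the evolved tree and the game would be Lose Checkers.

```python
import math


def alphabeta(state, depth, alpha, beta, moves, apply_move, evaluate):
    """Negamax alpha-beta: value of state from the side to move's viewpoint."""
    legal = list(moves(state))
    if depth == 0 or not legal:
        return evaluate(state)  # the evaluator is consulted at the leaves
    best = -math.inf
    for m in legal:
        value = -alphabeta(apply_move(state, m), depth - 1,
                           -beta, -alpha, moves, apply_move, evaluate)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # cutoff: the opponent will avoid this line
    return best


def choose_move(state, depth, moves, apply_move, evaluate):
    """Pick the move whose resulting position searches best."""
    best_move, best_value = None, -math.inf
    for m in moves(state):
        value = -alphabeta(apply_move(state, m), depth - 1,
                           -math.inf, math.inf, moves, apply_move, evaluate)
        if value > best_value:
            best_move, best_value = m, value
    return best_move


# Toy game (assumption): state is the number of stones left in one pile.
def nim_moves(stones):
    return [m for m in (1, 2) if m <= stones]


def nim_apply(stones, m):
    return stones - m


def nim_eval(stones):
    # With no stones left, the side to move has already lost.
    return -1.0 if stones == 0 else 0.0
```

For example, `choose_move(4, 4, nim_moves, nim_apply, nim_eval)` searches four plies ahead from a pile of four stones. Swapping `nim_eval` for another callable changes the player without touching the search code, which is the point of building players around an evaluator.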

10 Terminal Nodes

11 Terminal Nodes (cont’d)

12 Function Nodes

13 Applying GP to Lose Checkers: Algorithm
  Generate a random population consisting of individuals of tree height 5 for generation 0.
  Repeat for each generation i:
    Evaluate fitness.
    Selection().
    Procreation(XOprob, mutProb).

14 Fitness Calculations: The system supports a sequence of guides. Each guide is assigned a number of rounds and a number of games per round. The system also supports play between individuals in the population (referred to in the EA literature as coevolution), with a parameter coPlayNum for the number of such games. Players get 1 fitness point for winning a game and 0.5 points for a draw.

15 Fitness Calculations (cont’d)
  for each guide i do
    for j ← 1 to guide i’s number of rounds do
      Have every individual in the population deemed fit enough play a number of games equal to guide i’s round size against guide i.
  Have every individual in the population play coPlayNum games as black against coPlayNum random opponents in the population.
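
The fitness pseudocode above could be sketched roughly as follows. Several details are assumptions, since the slides do not specify them: the `fit_enough` predicate (what "deemed fit enough" means), drawing co-play opponents with replacement, crediting only the individual playing black, and the `play_game` callback, which here stands in for a full game and returns the first player's points (1 for a win, 0.5 for a draw, 0 for a loss).

```python
import random


def compute_fitness(population, guides, co_play_num, play_game, rng,
                    fit_enough=lambda individual, score: True):
    """Accumulate fitness points from guide play and from co-play.

    guides      -- list of (guide, rounds, games_per_round) tuples
    play_game   -- play_game(a, b) returns a's points: 1.0, 0.5 or 0.0
    fit_enough  -- hypothetical filter applied before each guide round
    """
    fitness = [0.0] * len(population)

    # Guide play: every sufficiently fit individual plays each round.
    for guide, rounds, games_per_round in guides:
        for _ in range(rounds):
            for i, individual in enumerate(population):
                if not fit_enough(individual, fitness[i]):
                    continue
                for _ in range(games_per_round):
                    fitness[i] += play_game(individual, guide)

    # Co-play: each individual plays co_play_num games as black against
    # randomly drawn members of the population (assumed: with replacement,
    # never against itself, and only the black player is credited).
    for i, individual in enumerate(population):
        others = [j for j in range(len(population)) if j != i]
        for j in rng.choices(others, k=co_play_num):
            fitness[i] += play_game(individual, population[j])

    return fitness
```

A usage example with a stand-in game, where individuals are plain numbers and the larger number wins: `compute_fitness([1, 2, 3], [(2, 2, 3)], 0, lambda a, b: 1.0 if a > b else 0.5 if a == b else 0.0, random.Random(0))`.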

16 Selection
  Repeat until the number of parents selected equals the original population size:
    Randomly choose two different individuals from the population: I1 and I2.
    if I1.Fitness > I2.Fitness then
      Select a copy of I1 for the parent population.
    else
      Select a copy of I2 for the parent population.
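
The selection step above is a tournament between two randomly chosen individuals. A minimal sketch, assuming the population is a Python list with a parallel list of fitness values (the function name `select_parents` is made up for this example):

```python
import copy
import random


def select_parents(population, fitness, rng):
    """Fill a parent pool of the original size via pairwise tournaments."""
    parents = []
    while len(parents) < len(population):
        # Two different individuals, chosen uniformly at random.
        i, j = rng.sample(range(len(population)), 2)
        # As in the pseudocode, I2 is selected when fitness is tied.
        winner = i if fitness[i] > fitness[j] else j
        # A copy is selected, so later operators do not alter the original.
        parents.append(copy.deepcopy(population[winner]))
    return parents
```

With distinct fitness values, the least fit individual can never win a tournament, while every other individual still has some chance of being selected.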

17 Genetic Operators: Local Mutation. Every tree node N returning a floating-point value was assigned a number. This number was initialized to 1.0 and acted as a factor for the return value. Local mutation is a slight change in the node’s factor. [Figure: a + node with children A and B, shown twice, returning f1*(A+B) and f2*(A+B).]
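
The per-node factor can be sketched as an extra field on each floating-point node. The slide only says that local mutation is "a slight change" to the factor; the Gaussian step with standard deviation `sigma` used below is an assumption, as are the `FloatNode` class and the two-operation evaluator.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FloatNode:
    op: str                                    # "+" or a variable name
    children: list = field(default_factory=list)
    factor: float = 1.0                        # initialized to 1.0, as on the slide


def evaluate(node, env):
    """Evaluate the node and scale the result by the node's factor."""
    if node.op == "+":
        raw = sum(evaluate(child, env) for child in node.children)
    else:
        raw = env[node.op]                     # variable leaf
    return node.factor * raw


def local_mutation(node, rng, sigma=0.1):
    """Nudge the node's factor by a small random amount (assumed Gaussian)."""
    node.factor += rng.gauss(0.0, sigma)
```

For a node `FloatNode("+", [FloatNode("A"), FloatNode("B")])`, the value returned is `factor * (A + B)`, so local mutation changes the weight of a subtree without changing its structure.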

18 Genetic Operators: One-Way Crossover. [Figure: one-way crossover between the two example trees.]

19 Procreation(XOprob, mutProb)
  While there remain at least 2 unselected individuals:
    Find two unselected individuals I1 and I2 at random.
    With probability XOprob:
      If I1.Fitness > I2.Fitness, use one-way XO to transfer genes from I1 to I2.
      Else use two-way XO between I1 and I2.
  For each individual I1 in the population:
    With probability mutProb, choose a node in I1’s tree at random and mutate it by either rebuild or local mutation.
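
The control flow of Procreation can be sketched independently of the tree representation by passing the operators in as callables. The names `one_way_xo`, `two_way_xo` and `mutate` are placeholders for this example; in particular, `mutate` is assumed to pick a node and choose between rebuild and local mutation internally.

```python
import random


def procreate(population, fitness, xo_prob, mut_prob,
              one_way_xo, two_way_xo, mutate, rng):
    """Apply crossover to random pairs, then mutation to individuals.

    one_way_xo(donor, receiver, rng) -> new receiver
    two_way_xo(a, b, rng)            -> (new a, new b)
    mutate(individual, rng)          -> new individual
    """
    # Shuffling and popping pairs draws unselected individuals at random.
    order = list(range(len(population)))
    rng.shuffle(order)
    while len(order) >= 2:
        i, j = order.pop(), order.pop()
        if rng.random() < xo_prob:
            if fitness[i] > fitness[j]:
                # The fitter I1 donates genes; I1 itself is unchanged.
                population[j] = one_way_xo(population[i], population[j], rng)
            else:
                population[i], population[j] = two_way_xo(
                    population[i], population[j], rng)

    for k in range(len(population)):
        if rng.random() < mut_prob:
            population[k] = mutate(population[k], rng)
    return population
```

As in the pseudocode, the test is asymmetric: one-way crossover is used only when the first individual drawn is strictly fitter than the second, and every other case falls through to two-way crossover.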

20 Opponents: There is no known simple evaluation function for Lose Checkers. All hand-crafted players used a random function to evaluate non-trivial board states. Two types of opponents were written in code: the random player, and an α-β player of depth d with a random evaluation function.

21 Quality of α-β Players: To ensure that α-β players using a random evaluation function are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
  1st player   2nd player   1st player win ratio
  αβ2          Random       0.9665
  αβ3          αβ2          0.8502
  αβ8          αβ3          0.5873
  αβ5          αβ3          0.82535
  αβ5          αβ8          0.5562

22 Results with Search Against α-β Players: Using lookahead 3, playing 1000 games against αβ3.
  Run ID   Fitness Eval   vs. αβ3
  r00044   50Co           744.0
  r00046   50Co           698.5
  r00047   50Co           765.5
  r00048   50Co           696.5
  r00049   50Co           781.5
  r00056   50Co           721.0
  r00057   50Co           786.5
  r00058   50Co           697.0
  r00060   50Co           737.0
  r00061   50Co           737.0

23 Results with Search Against α-β Players (cont’d): Using lookahead 3, playing against various opponents.
  Run ID   vs. αβ3   vs. αβ4   vs. αβ6   vs. αβ8
  r00044   744.0     944.5     816.0     758.0
  r00047   765.5     899.0     722.5     476.0
  r00049   781.5     915.0     809.0     735.5
  r00057   786.5     909.0     745.5     399.5
  r00060   737.0     897.0     627.0     408.5
  r00061   737.0     947.0     781.5     715.5

24 Results with Search Against α-β Players: Parameters. Run parameters: population 150, 120 generations. No guide play, 50 co-play games as black, search depth 3. Maximum tree depth: 12 in runs 44A-49A, 14 in runs 56A-61A. XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

25 Evolving Players using Deeper Search: Results with players using lookahead 4.
  Run ID   vs. αβ5   vs. αβ6   vs. αβ8
  r00064   582.0     603.5     395.0
  r00065   537.0     782.5     561.5
  r00066   567.0     757.5     483.5
  r00067   598.5     723.0     385.5
  r00068   548.0     787.0     524.0
  r00069   573.5     715.5     523.0
  r00070   577.0     691.5     476.0
  r00071   551.5     582.5     401.5

26 Results with Search Against α-β Players: Parameters. Run parameters: population 50, 70 generations. Guide play: 20 games (in 2 rounds of 10) against αβ5. 20 co-play games as black. Search depth 4. Maximum tree depth of 10. XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

27 The Role of Mobility: Initial runs with search produced tepid results. The introduction of the mobility terminal greatly improved those results. Mobility is a general principle that applies to many board games and is often associated with a high level of play.

28 Synopsis: Tree-based GP in a nutshell. Applying tree-based GP to Lose Checkers. Expanding work to other games: new results in Lose Checkers; 10x10 Checkers; Reversi; Dodgem. Conclusions and future work.

29 New Results in Lose Checkers: Results with players using lookahead 4.
  Run ID   Fitness Eval   vs. αβ5
  r00090   10αβ2_20Co     632.0
  r00091   10αβ2_20Co     645.0
  r00096   25Co           608.0
  r00097   25Co           575.0
  r00098   40Co           575.5
  r00099   40Co           633.5

30 New Results in Lose Checkers (cont’d): Run parameters: population 120-150, generations 90-100. Guide play: 10 games against αβ2 in two of the runs. 20-40 co-play games as black. Search depth 4. Maximum tree depth of 14. XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

31 10x10 Checkers: 10x10 board. Objective: to eliminate all opponent pieces or render all opponent pieces immobile. Rules: as in the 8x8 version.

32 Quality of α-β Players: Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that these α-β players are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
  1st player   2nd player   1st player win ratio
  αβ2          Random       0.99885
  αβ3          αβ2          0.5229
  αβ5          αβ3          0.876

33 10x10 Checkers Results
  Run ID   Fitness Eval   Search Depth   vs. αβ3
  r00084   50Co           3              889.0
  r00085   50Co           3              927.0
  r00092   25Co           2              732.0
  r00093   25Co           2              615.5
  r00094   25Co           2              554.0
  r00095   25Co           2              631.0

34 10x10 Checkers Results (cont’d): Run parameters: population 100-150, generations 100. No guide play. 25-50 co-play games as black. Search depth 4. Maximum tree depth 13-14. XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

35 8x8 Reversi: Popular board game, also known as Othello. 8x8 board. Each piece has a black side and a white side. On her turn, each player places a piece, flipping trapped opponent pieces. Objective: maximize the number of friendly pieces on the board.

36 Reversi Specific Terminals
  Node Name             Return Type   Return Value
  EnemyCornerCount      F             Number of corners occupied by opponent
  FriendlyCornerCount   F             Number of corners occupied by player
  CornerCount           F             FriendlyCornerCount − EnemyCornerCount

37 Quality of α-β Players: Evolved players were tested against α-β players that chose a material evaluation function at random for each turn. To ensure that these α-β players are indeed proficient players, their performance was tested. Each test tournament consists of 10000 games.
  1st player   2nd player   1st player win ratio
  αβ2          Random       0.8471
  αβ3          αβ2          0.6004
  αβ5          αβ3          0.7509
  αβ7          αβ5          0.7662

38 Reversi Results
  Run ID   Fitness Eval   Search Depth   vs. αβ5   vs. αβ7
  r00100   25Co           4              875.0     758.5
  r00101   25Co           4              957.5     803.0
  r00102   40Co           4              942.5     640.5
  r00103   40Co           4              905.5     711.5
  r00108   40Co           4              956.0     760.0
  r00109   40Co           4              912.5     826.0
  r00110   40Co           4              953.5     730.5
  r00111   40Co           4              961.0     815.5

39 Reversi Results (cont’d): Run parameters: population 120, generations 100. No guide play. 25-40 co-play games as black. Search depth 4. Maximum tree depth of 14. XO_Prob 0.8, mutProb 0.2, local_muteProb 0.5.

40 Dodgem

41 Synopsis: Tree-based GP in a nutshell. Applying tree-based GP to Lose Checkers. Expanding work to other games. Available projects.

42 Your mission (should you decide to accept it)
  1. Choose a game.
  2. Write the game program in C and interface it with the Java system.
  3. Write game-specific terminal nodes and adjustments if necessary.
  4. Run it, document results, produce a report.

43 Games

44 My Current Areas of Interest: Games with high branching factor. Games with a random element. Multiplayer games. Games with partial information.

45 Another project: I want to check my selective crossover operator. Adapt the system to a toy problem. Execute runs with selective XO and with typical XO using several parameter sets. Compare and analyze results. Write a report.

