Backgammon project Oren Salzman Guy Levit Instructors:


2 Backgammon project
Oren Salzman, Guy Levit
Instructors: Part a: Ishai Menashe; Part b: Yaki Engel

3 Agenda
Project’s Objectives
The Learning Algorithm
TDGammon Problematic points
The Race Problem
Experimental Results
Future Development

4 Objectives
Developing an agent that learns to play backgammon by playing against itself, using reinforcement learning techniques
Inspired by Tesauro’s TDGammon version 0.0

5 Learning Algorithm - general
Positions are evaluated using a neural network
Moves are chosen with a greedy policy
When the game ends, the agent gets a reward according to the result (+2, +1, -1, -2)
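As a rough illustration of evaluating positions and then acting greedily, the sketch below picks the successor position with the highest estimated value. The names `greedy_move` and `final_reward`, the `evaluate` callback, and the reading of +2/-2 as gammon results are assumptions made for this example; they are not taken from the project.

```python
from typing import Callable, Sequence, TypeVar

Position = TypeVar("Position")


def greedy_move(successors: Sequence[Position],
                evaluate: Callable[[Position], float]) -> Position:
    """Return the successor position with the highest estimated value.

    `successors` holds the positions reachable with the current dice roll;
    `evaluate` stands in for the network (hypothetical signature) and is
    assumed to score a position from the mover's point of view.
    """
    if not successors:
        raise ValueError("no legal moves to choose from")
    return max(successors, key=evaluate)


def final_reward(won: bool, gammon: bool) -> int:
    """End-of-game reward in {+2, +1, -1, -2}.

    Assumption: the magnitude 2 corresponds to a gammon, 1 to a plain result.
    """
    magnitude = 2 if gammon else 1
    return magnitude if won else -magnitude
```

A greedy policy of this kind does no explicit exploration; in backgammon the dice already randomise which positions are visited.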

6 TDGammon Problematic points
Non-linear neural network
The policy changes during training
The environment changes during training
Solutions:
  Linear network
  Learning in alternations
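One possible reading of "learning in alternations" is that only one of two agents updates its weights in any given phase while the other stays frozen, so the learner faces a fixed opponent. The sketch below follows that reading. The function name, the `play_and_update` callback, and the phase structure are all hypothetical and may differ from what the project actually did.

```python
from typing import Callable, List, Tuple

Weights = List[float]


def train_in_alternations(
    weights_a: Weights,
    weights_b: Weights,
    play_and_update: Callable[[Weights, Weights], Weights],
    phases: int,
    games_per_phase: int,
) -> Tuple[Weights, Weights]:
    """Alternate which agent learns; the other is frozen during the phase.

    `play_and_update(learner, frozen)` is a hypothetical callback that plays
    one game and returns the learner's updated weights.
    """
    for phase in range(phases):
        a_learns = phase % 2 == 0  # even phases: A learns, odd phases: B learns
        for _ in range(games_per_phase):
            if a_learns:
                weights_a = play_and_update(weights_a, weights_b)
            else:
                weights_b = play_and_update(weights_b, weights_a)
    return weights_a, weights_b
```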

7 The Race Problem
In a race, a more algorithmic approach is required for choosing a move
Three solutions were considered:
  Designing a manual algorithm
  Using a different network for races
  Using the same network, with each feature dedicated either to race or to non-race positions
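The third option can be pictured as doubling the input vector and writing the base features into one half or the other, depending on whether the position is a race. This is a minimal sketch under that assumption; `split_features` and the `is_race` flag are illustrative names, and how the project detected races is not stated in the slides.

```python
from typing import List, Sequence


def split_features(base: Sequence[float], is_race: bool) -> List[float]:
    """Place the base features in a race half or a non-race half.

    Layout (an assumption): [non-race features..., race features...].
    The unused half is filled with zeros.
    """
    zeros = [0.0] * len(base)
    if is_race:
        return zeros + list(base)
    return list(base) + zeros
```

With a linear network, this layout behaves like two separate weight sets stored in one vector, since the zeroed half contributes nothing to the output.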

8 Experiments
Various parameter settings were checked:
  Learning step (0.1, 0.3, 0.8)
  Lambda (0.1, 0.3, 0.5, 0.7, 0.9)
  Discount factor (0.95, 0.97, 0.98, 0.999)
For each setting the agent played between half a million and five million games
All versions were compared to one golden version
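For a sense of the search space, the sketch below enumerates every combination of the listed values. The slides do not say whether all combinations were actually run, so the full cross product is an assumption, as are the names `parameter_grid`, `alpha`, `lam`, and `gamma`.

```python
from itertools import product
from typing import Dict, List

LEARNING_STEPS = (0.1, 0.3, 0.8)
LAMBDAS = (0.1, 0.3, 0.5, 0.7, 0.9)
DISCOUNT_FACTORS = (0.95, 0.97, 0.98, 0.999)


def parameter_grid() -> List[Dict[str, float]]:
    """Return one dict per (learning step, lambda, discount factor) combination."""
    return [
        {"alpha": alpha, "lam": lam, "gamma": gamma}
        for alpha, lam, gamma in product(LEARNING_STEPS, LAMBDAS, DISCOUNT_FACTORS)
    ]
```

The full grid has 3 x 5 x 4 = 60 settings, which at half a million to five million games each would be a substantial amount of self-play.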

9 Experiments’ results

10 Experiments’ results

11 Conclusions
A learning step of 0.1 yielded the best results
High discount factors (0.98, 0.999) performed better than lower ones
Lambda values of 0.1 and 0.9 were inferior to the others; among 0.3, 0.5, and 0.7, 0.5 seemed the best
None of the versions outperformed the golden version

12 Future development
More than 1-ply search
Adding features
Going back to a non-linear network
Letting both agents learn simultaneously
Connecting the player to the internet
Graphical User Interface

13 END

14 Learning Algorithm - general
The agent plays against itself and gets a reward (-2, -1, +1, +2) when the game ends.
The network weights are updated using the following formulas:
The eligibility trace is updated by:
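The formulas themselves did not survive in this transcript. As a hedged stand-in, the sketch below uses the textbook TD(lambda) update for a linear evaluator V(s) = w . x(s): the trace becomes gamma * lambda * e + x(s), and the weights move by alpha * delta * e, where delta = r + gamma * V(s') - V(s). The function name and the list-based representation are assumptions, and the project's actual formulas may differ.

```python
from typing import List, Sequence, Tuple


def td_lambda_step(
    weights: Sequence[float],
    trace: Sequence[float],
    x: Sequence[float],
    x_next: Sequence[float],
    reward: float,
    alpha: float,   # learning step
    gamma: float,   # discount factor
    lam: float,     # lambda
    terminal: bool = False,
) -> Tuple[List[float], List[float]]:
    """One textbook TD(lambda) step for a linear value function."""
    value = sum(w * xi for w, xi in zip(weights, x))
    # The value after the final position is taken to be zero.
    next_value = 0.0 if terminal else sum(w * xi for w, xi in zip(weights, x_next))
    # Eligibility trace: decay the old trace, then add the gradient of V(s),
    # which for a linear evaluator is just the feature vector x.
    new_trace = [gamma * lam * e + xi for e, xi in zip(trace, x)]
    # Temporal-difference error.
    delta = reward + gamma * next_value - value
    new_weights = [w + alpha * delta * e for w, e in zip(weights, new_trace)]
    return new_weights, new_trace
```

In a self-play setting, `reward` would be zero on every step except the last, where it takes one of the end-of-game values, and the trace would be reset to zeros at the start of each game.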

15 The Features

16 Backgammon Board Definitions

