Backgammon Project. Oren Salzman, Guy Levit. Instructors: Part a: Ishai Menashe; Part b: Yaki Engel
Agenda: Project’s Objectives; The Learning Algorithm; TDGammon Problematic Points; The Race Problem; Experimental Results; Future Development
Objectives: Develop an agent that learns to play backgammon by playing against itself, using reinforcement learning techniques. Inspired by Tesauro’s TDGammon version 0.0.
Learning Algorithm - general: Positions are evaluated using a neural network. Moves are chosen by a greedy policy. When the game ends, the agent receives a reward according to the result (+2, +1, -1, -2).
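The greedy policy over evaluated positions can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a linear evaluator, and the names `evaluate` and `greedy_move` as well as the feature-vector encoding of positions are hypothetical.

```python
def evaluate(weights, features):
    """Value of a position under a linear evaluator: the dot product w . x."""
    return sum(w * x for w, x in zip(weights, features))


def greedy_move(weights, candidates):
    """Index of the candidate position with the highest estimated value.

    candidates holds one feature vector per legal successor position
    (hypothetical encoding). Raises ValueError if there are no candidates.
    """
    return max(range(len(candidates)),
               key=lambda i: evaluate(weights, candidates[i]))


# Toy data, not taken from the project.
weights = [0.5, -1.0, 0.25]
candidates = [
    [1.0, 0.0, 0.0],  # value 0.5
    [0.0, 1.0, 0.0],  # value -1.0
    [1.0, 0.0, 4.0],  # value 1.5
]
best = greedy_move(weights, candidates)  # picks index 2
```

The sketch looks only one move ahead: every legal successor position is scored once and the best one is played.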
TDGammon Problematic Points: Non-linear neural network; policy is changing during training; environment is changing during training. Solutions: linear network; learning in alternations.
The Race Problem: In a race, a more algorithmic approach is required for choosing a move. Three solutions were considered: designing a manual algorithm; using a different network for races; using the same network, but with each feature dedicated either to race positions or to non-race positions.
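One possible reading of the third option is sketched below, under stated assumptions. The board encoding (two 24-entry lists of checker counts sharing one orientation), the helper names `is_race` and `gated_features`, and the exact gating layout are hypothetical and are not taken from the project. Checkers on the bar are ignored for brevity.

```python
def is_race(white, black):
    """True when no further contact between the two sides is possible.

    white[i] and black[i] are checker counts on point i (0..23).
    Assumption: white moves toward index 0, black toward index 23.
    """
    white_points = [i for i, n in enumerate(white) if n > 0]
    black_points = [i for i, n in enumerate(black) if n > 0]
    if not white_points or not black_points:
        return True  # one side has no checkers left on the board
    # Contact is over once white's rearmost checker has passed
    # black's rearmost checker.
    return max(white_points) < min(black_points)


def gated_features(base_features, race):
    """Copy the features into a 'contact' half or a 'race' half.

    The unused half is zero, so every weight of a linear network is
    trained on only one kind of position.
    """
    zeros = [0.0] * len(base_features)
    if race:
        return zeros + list(base_features)
    return list(base_features) + zeros


# Toy positions, not taken from the project.
white = [0] * 24
black = [0] * 24
white[3] = 15
black[20] = 15
race_now = is_race(white, black)            # True: the sides have passed
features = gated_features([1.0, 2.0], race_now)
```

With this layout a single network serves both phases, at the cost of doubling the number of weights.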
Experiments: Various parameter settings were checked: learning step (0.1, 0.3, 0.8); lambda (0.1, 0.3, 0.5, 0.7, 0.9); discount factor (0.95, 0.97, 0.98, 0.999). For each setting the agent played between half a million and five million games. All versions were compared to one golden version.
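The listed values can be enumerated as a parameter grid, as in the sketch below. The slides do not say whether every combination was actually run, so the full Cartesian product is an assumption, and the key names `alpha`, `lam`, and `gamma` are illustrative.

```python
from itertools import product

LEARNING_STEPS = (0.1, 0.3, 0.8)
LAMBDAS = (0.1, 0.3, 0.5, 0.7, 0.9)
DISCOUNT_FACTORS = (0.95, 0.97, 0.98, 0.999)


def parameter_grid():
    """All combinations of the listed values, one dict per setting."""
    return [
        {"alpha": alpha, "lam": lam, "gamma": gamma}
        for alpha, lam, gamma in product(LEARNING_STEPS, LAMBDAS, DISCOUNT_FACTORS)
    ]


grid = parameter_grid()  # 3 * 5 * 4 = 60 settings
```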
Experiments’ results
Conclusions: A learning step of 0.1 yielded the best results. High discount factors (0.98, 0.999) performed better than lower ones. Lambda values of 0.1 and 0.9 were inferior to the others; among 0.3, 0.5, and 0.7, 0.5 seemed the best. None of the versions outperformed the golden version.
Future Development: More than 1-ply search; adding features; going back to a non-linear network; letting both agents learn simultaneously; connecting the player to the internet; graphical user interface.
END
Learning Algorithm - general: The agent plays against itself and gets a reward (-2, -1, +1, +2) when the game ends. The network weights are updated using the following formulas: The eligibility trace is updated by:
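For orientation, the textbook TD(lambda) update for a linear evaluator is sketched below. It is assumed here to be close to what the project used, but the slides' own formulas may differ in detail. In this form the trace is updated by e <- gamma * lambda * e + x, the TD error is delta = r + gamma * V(s') - V(s), and the weights by w <- w + alpha * delta * e. The function name `td_lambda_step` and its argument names are hypothetical.

```python
def td_lambda_step(weights, trace, features, value_next, reward,
                   alpha, gamma, lam):
    """One TD(lambda) update for a linear evaluator V(s) = w . x(s).

    features:   x(s), features of the position being left
    value_next: V(s'), or 0.0 when s' is terminal
    reward:     0.0 during play, the game result at the end
    Returns the new (weights, trace) without modifying the inputs.
    """
    value = sum(w * x for w, x in zip(weights, features))
    # For a linear evaluator the gradient of V with respect to w is x.
    trace = [gamma * lam * e + x for e, x in zip(trace, features)]
    delta = reward + gamma * value_next - value
    weights = [w + alpha * delta * e for w, e in zip(weights, trace)]
    return weights, trace


# Toy step: the final move of a game won with reward +2.
new_weights, new_trace = td_lambda_step(
    weights=[0.0, 0.0], trace=[0.0, 0.0], features=[1.0, 0.0],
    value_next=0.0, reward=2.0, alpha=0.1, gamma=1.0, lam=0.5,
)
```

In a self-play loop the trace would be reset to zeros at the start of each game, and this step would be applied after every move.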
The Features
Backgammon Board Definitions