Backgammon Project
Oren Salzman, Guy Levit
Instructors: Part a: Ishai Menashe, Part b: Yaki Engel

Agenda
Project's Objectives
The Learning Algorithm
TDGammon Problematic Points
The Race Problem
Experimental Results
Future Development

Objectives
Developing an agent that learns to play backgammon by playing against itself, using reinforcement learning techniques. Inspired by Tesauro's TDGammon version 0.0.

Learning Algorithm - general
Positions are evaluated using a neural network.
Moves are chosen with a greedy policy.
When the game ends the agent gets a reward according to the result (+2, +1, -1, -2).
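
A minimal sketch of this evaluate-then-pick-greedily scheme; encode_features, legal_moves, and apply_move are hypothetical stand-ins for the project's game engine, and the network is shown as linear for simplicity:

```python
import numpy as np

def evaluate(position, weights):
    """Estimated value of a board position under a linear value network.
    encode_features (hypothetical) maps a board to a feature vector."""
    return float(np.dot(weights, encode_features(position)))

def greedy_move(position, dice, weights):
    """Greedy policy: play the legal move whose successor position the
    network scores highest (legal_moves/apply_move are hypothetical)."""
    return max(legal_moves(position, dice),
               key=lambda m: evaluate(apply_move(position, m), weights))
```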

TDGammon Problematic Points
Non-linear neural network
Policy is changing during training
Environment is changing during training
Solutions:
Linear network
Learning in alternations (one possible reading is sketched below)
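
One possible reading of "learning in alternations", sketched here as self-play against a frozen copy of the network so that the learner faces a stationary policy and environment within each round; play_and_update is a hypothetical self-play helper:

```python
import numpy as np

def train_in_alternations(n_rounds, games_per_round, n_features):
    """Freeze the opponent's weights while the learner updates, then
    promote the learner's weights and repeat."""
    learner = np.zeros(n_features)
    opponent = learner.copy()
    for _ in range(n_rounds):
        for _ in range(games_per_round):
            play_and_update(learner, opponent)  # TD updates touch learner only
        opponent = learner.copy()               # swap in the improved weights
    return learner
```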

The Race Problem
In a race, a more algorithmic approach is required for choosing a move. Three solutions were considered:
Designing a manual algorithm
Using a different network for races
Using the same network, but with each feature dedicated either to race or to non-race positions (sketched below)
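
A hypothetical sketch of the third option: one weight vector, but the feature vector is split into a contact half and a race half, and only the half matching the current position type is populated (encode_features and is_race are assumed helpers):

```python
import numpy as np

def encode_with_race_split(position):
    """Split feature encoding: race positions fill one half of the vector,
    contact positions the other, so the shared network learns separate
    weights for each phase of the game."""
    base = encode_features(position)     # hypothetical base encoder
    zeros = np.zeros_like(base)
    if is_race(position):                # hypothetical: no contact remains
        return np.concatenate([zeros, base])
    return np.concatenate([base, zeros])
```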

Experiments
Various parameter settings were checked:
Learning step (0.1, 0.3, 0.8)
Lambda (0.1, 0.3, 0.5, 0.7, 0.9)
Discount factor (0.95, 0.97, 0.98, 0.999)
For each setting the agent played between half a million and five million games. All versions were compared to one golden version (the sweep is sketched below).
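
The sweep above amounts to a simple grid search over the three parameters; a hypothetical reconstruction, where train_self_play, benchmark_against, and golden_version stand in for the project's tooling:

```python
from itertools import product

ALPHAS  = (0.1, 0.3, 0.8)            # learning step
LAMBDAS = (0.1, 0.3, 0.5, 0.7, 0.9)  # eligibility-trace decay
GAMMAS  = (0.95, 0.97, 0.98, 0.999)  # discount factor

for alpha, lam, gamma in product(ALPHAS, LAMBDAS, GAMMAS):
    # 0.5M games per setting here; the project ran up to 5M for some settings
    agent = train_self_play(alpha, lam, gamma, n_games=500_000)
    print(alpha, lam, gamma, benchmark_against(agent, golden_version))
```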

Experiments' Results (results charts not included in the transcript)

Conclusions
A learning step of 0.1 yielded the best results.
High discount factors (0.98, 0.999) were better than lower ones.
Lambda values of 0.1 and 0.9 were inferior to the others; among 0.3, 0.5, and 0.7, 0.5 seemed the best.
None of the versions outperformed the golden version.

Future Development
More than 1-ply search
Adding features
Going back to a non-linear network
Letting both agents learn simultaneously
Connecting the player to the internet
Graphical user interface

END

Learning Algorithm - general
The agent plays against itself and receives a reward (-2, -1, +1, +2) when the game ends. The network weights are updated by the TD($\lambda$) rule
$w \leftarrow w + \alpha\,\delta_t\,e_t$, with TD error $\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$.
The eligibility trace is updated by
$e_t = \gamma\lambda\,e_{t-1} + \nabla_w V(s_t)$.
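
A minimal implementation of these updates for the linear case, where the gradient of V with respect to w is just the feature vector:

```python
import numpy as np

def td_lambda_step(w, e, x, v, v_next, reward, alpha, gamma, lam):
    """One TD(lambda) step for a linear value function V(s) = w . x(s)."""
    delta = reward + gamma * v_next - v   # TD error
    e = gamma * lam * e + x               # eligibility-trace update
    w = w + alpha * delta * e             # weight update
    return w, e
```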

The Features (feature diagram not included in the transcript)

Backgammon Board Definitions (board diagram not included in the transcript)