
Reinforcement Learning in Games
Colin Cherry
October 29, 2001

Outline
- Reinforcement Learning & TD Learning
- TD-Gammon
- TDLeaf(λ)
- Chinook
- Conclusion

The ideas behind Reinforcement Learning
Two broad categories of learning:
- Supervised
- Unsupervised (our concern)
Problem with unsupervised learning:
- Delayed rewards (the temporal credit assignment problem)
Goal:
- Create a good control policy based on delayed rewards

Evaluation Function: Developing a Control Policy
- Evaluation function: a function that estimates the total reward the agent will receive if it follows the function's recommendations from this point onward
- We will assume the function evaluates states (a good fit for deterministic games)
- The evaluation function could be a look-up table, a linear function, a neural network, or any other function approximator
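
As a minimal sketch of the linear case (not from the slides; the particular features are hypothetical), an evaluation function can be a weighted sum of hand-picked state features:

    import numpy as np

    def features(state):
        # Map a game state to a fixed-length feature vector.
        # These features are hypothetical, for illustration only.
        return np.array([
            state["material_balance"],  # e.g. piece advantage
            state["mobility"],          # e.g. number of legal moves
            1.0,                        # bias term
        ])

    def evaluate(state, w):
        # Estimate the total future reward obtainable from this state.
        return float(np.dot(w, features(state)))

Learning then reduces to tuning the weight vector w.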

Temporal Difference Learning: TD(λ)
- Set the initial weights to 0 or to random values
- Assume our evaluation function, with weight vector w, evaluates the state at time t to the value Y_t
- Update the weights at the end of each game, applying the following for each successive pair of predictions Y_t, Y_{t+1}:

    w ← w + α (Y_{t+1} − Y_t) Σ_{k=1}^{t} λ^{t−k} ∇_w Y_k
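
A short sketch of this update for a linear evaluator, where ∇_w Y_k is simply the feature vector of the state at time k, so the sum can be carried incrementally as an eligibility trace (the function and parameter names are assumptions, not the slide's notation):

    import numpy as np

    def td_lambda_update(w, feature_seq, final_reward, alpha=0.1, lam=0.7):
        # Apply the TD(lambda) weight update over one finished game.
        # feature_seq: feature vectors of the positions visited, in order.
        # final_reward: the game's outcome, used as the last target Y.
        trace = np.zeros_like(w)              # accumulates sum of lam^(t-k) * grad Y_k
        for t, x_t in enumerate(feature_seq):
            y_t = np.dot(w, x_t)              # Y_t
            if t + 1 < len(feature_seq):
                y_next = np.dot(w, feature_seq[t + 1])  # Y_{t+1}
            else:
                y_next = final_reward         # final prediction target: the outcome
            trace = lam * trace + x_t         # grad_w Y_t = x_t for a linear evaluator
            w = w + alpha * (y_next - y_t) * trace
        return w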

A quick example: Printer Robot
Objective:
- Dock to the printer, collect a document
Assume 3 states:
- C: next to the coffee machine, no documents
- P: next to the printer, no documents
- D: next to the printer, carrying documents
Assume 2 actions:
- a: dock to the printer (available only from P or D)
- b: go to the printer (available only from C)
(Diagram from the slide: C reaches P via b with no reward; some time later, P reaches D via a with a reward, ending the episode.)
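
A tabular sketch of this example using TD(0) (the λ = 0 special case), with the look-up-table evaluation function mentioned earlier; the learning rate and the reward of 1 for collecting the document are assumptions:

    # State values stored in a look-up table, as on the earlier slide.
    values = {"C": 0.0, "P": 0.0, "D": 0.0}
    alpha = 0.5   # assumed learning rate

    # One episode: b moves C -> P with no reward, then (some time later)
    # a moves P -> D with a reward of 1, ending the episode.
    episode = [("C", "P", 0.0), ("P", "D", 1.0)]

    for _ in range(20):   # replay the episode so the reward propagates backwards
        for s, s_next, r in episode:
            values[s] += alpha * (r + values[s_next] - values[s])

    print(values)   # P's value approaches 1; C's follows by bootstrapping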

TD-Gammon
- A self-taught backgammon player
- Good enough to make the best human players sweat
- A huge success for reinforcement learning
- Far surpassed its supervised-learning cousin, Neurogammon

How does it work?
- Used an artificial neural network as its evaluation function approximator
- Excellent neural network design
- Used expert features developed for Neurogammon along with a basic board representation
- Hundreds of thousands of training games against itself
- A hard-coded doubling algorithm
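
The self-play loop pairs the TD(λ) update above with greedy move selection: at each turn, score every legal successor position with the current evaluation function and move to the best one. A minimal sketch, assuming a generic game interface (these callables are not TD-Gammon's actual code):

    def choose_move(state, w, legal_successors, evaluate):
        # Greedy 1-ply move selection during self-play training:
        # evaluate each successor position with the current weights and
        # move to the highest-scoring one. `legal_successors` and
        # `evaluate` are assumed game-interface callables.
        return max(legal_successors(state), key=lambda s: evaluate(s, w))

After each finished game, the TD(λ) update is applied to the sequence of positions the agent visited.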

Why did it work so well?
- The stochastic domain (dice rolls) forces exploration
- Linear (basic) concepts are learned first
- Shallow search is "good enough" against humans

Backgammon vs. other games: shallow search
- TD-Gammon followed a greedy approach: 1-ply look-ahead (later increased to 3-ply)
- It's hard to predict your opponent's move without knowing his or her dice roll, let alone your own move after that
- This doesn't work so well for other games: what features will tell me which move to take by looking only at the immediate results of the moves available to me?

TDLeaf(λ)
- TD learning applied to the minimax algorithm
- For each state, search to a constant depth
- Evaluate a state according to the heuristic evaluation of the leaf of its principal variation
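
A compact sketch of the idea: a fixed-depth minimax search that returns both the backed-up value and the feature vector of the principal-variation leaf it came from, so the TD(λ) update above can be applied to the leaf evaluations rather than the raw state evaluations. The game-interface callables are assumptions:

    import numpy as np

    def minimax_leaf(state, depth, w, successors, features, maximizing=True):
        # Fixed-depth minimax returning (value, leaf_feature_vector).
        # `successors` and `features` are assumed game-interface callables.
        succs = successors(state)
        if depth == 0 or not succs:
            x = features(state)
            return np.dot(w, x), x          # heuristic evaluation at the leaf
        results = [minimax_leaf(s, depth - 1, w, successors, features,
                                not maximizing) for s in succs]
        pick = max if maximizing else min
        return pick(results, key=lambda r: r[0])

Collecting the leaf feature vectors from successive searches and feeding them to the earlier td_lambda_update sketch gives TDLeaf(λ).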

Chinook
- This program, at this school, in this class, should need no introduction
- 84 features (4 sets of 21) have tunable weights
- Each feature consists of many hand-picked parameters
- Question: can we learn the 84 weights as well as a human can set them?

The Test
- Trained using TDLeaf(λ)
- All weight values initially set to 0
- Variation introduced by using a book of opening moves (144 3-ply openings)
- Played no more than 10,000 games against itself before hitting a plateau
- Both programs use the same search depth

The results were very positive
- Chinook with all weights set to 1 vs. Tournament Chinook:
- Chinook after self-play training vs. Tournament Chinook: Even Steven
Some lessons learned:
- You have to train at the same depth you plan to play at
- You have to play against real people too

Conclusions
- TD(λ) can be a powerful tool in the creation of game-playing evaluation functions
- The training must be of a kind that introduces variation
- Features need to be hand-picked (for now)
- TD and TDLeaf allow quick weight tuning:
  - Takes a lot of the tedium out of player design
  - Lets designers experiment more with features