Reinforcement Learning of Local Shape in the Game of Atari-Go
David Silver

Local shape
- Local shape describes a pattern of stones
- Corresponds to expert Go knowledge:
  - Joseki (corner patterns)
  - Tesuji (tactical patterns)
- Used extensively in the current strongest programs, via pattern databases
- Expert Go knowledge is difficult to extract and encode in pattern databases by hand
- Focus of this work:
  - Explicitly learn local shapes through experience
  - Learn a value for the goodness of each shape (see the representation sketch below)
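As a rough illustration of what "local shape" means computationally, the sketch below treats a shape as the stone configuration inside a small window on the board. The representation and helper names are assumptions for illustration, not taken from the slides.

```python
# A minimal sketch (assumed representation): a local shape is the
# configuration of stones inside a small window of the board.
EMPTY, BLACK, WHITE = 0, 1, 2

def local_shape(board, top, left, height, width):
    """Extract the stone configuration in a height x width window as a tuple,
    so it can serve as a dictionary key for a learned shape value."""
    return tuple(board[top + r][left + c]
                 for r in range(height) for c in range(width))
```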

Prior work
- Supervised learning of local shapes
  - Local move prediction [Stoutamire, Werf]
  - Mimics strong play rather than learning to evaluate and understand positions
- Reinforcement learning of neural networks
  - TD(0) [Schraudolph, Enzenberger]
  - Shape is represented implicitly and is difficult to interpret
  - Limited in scope by the network architecture

System architecture

Feature types
- Each local shape feature has a type, which specifies:
  - A window size: 1x1, 2x1, 2x2, 3x2, or 3x3
  - A weight sharing method: location invariant or location dependent
- All possible configurations are enumerated for each feature type (see the enumeration sketch below)
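Since each point in a window is empty, black, or white, a window of area k has 3^k possible configurations. A minimal sketch of the enumeration, with assumed names:

```python
from itertools import product

EMPTY, BLACK, WHITE = 0, 1, 2
WINDOW_SIZES = [(1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]

def enumerate_shapes(height, width):
    """All 3**(height*width) stone configurations for one window size."""
    return list(product((EMPTY, BLACK, WHITE), repeat=height * width))

# e.g. the 3x3 feature type alone has 3**9 = 19683 configurations
assert len(enumerate_shapes(3, 3)) == 3 ** 9
```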

Local shape features

Weight sharing: location dependent

Weight sharing: location invariant

Partial ordering of feature types
- There is a partial ordering > over the generality of feature types:
  - Small windows > large windows
  - Location invariant > location dependent
- (A sketch of this ordering as a predicate follows below)
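A small sketch of the ordering as a predicate, under the assumption that a feature type is a (height, width, sharing) triple and that window generality is containment of dimensions:

```python
def as_general(f1, f2):
    """True if feature type f1 is as or more general than f2. A feature type
    is assumed to be a (height, width, sharing) triple, with sharing either
    'invariant' or 'dependent'; smaller windows and invariant sharing are
    the more general choices."""
    (h1, w1, s1), (h2, w2, s2) = f1, f2
    window_ok = h1 <= h2 and w1 <= w2           # smaller window: more general
    sharing_ok = s1 == s2 or s1 == "invariant"  # invariant: more general
    return window_ok and sharing_ok
```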

Value function
- Reward of +1 for winning, 0 for losing
- The value function gives the probability of winning
- Moves are selected by 1-ply greedy search over the value function
- The value function is approximated by a weighted sum of local shape features (see the sketch below)
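The slides say only "weighted sum of local shape features" interpreted as a winning probability; the logistic squashing below is an assumption to keep the value in [0, 1], and `features_after` is a hypothetical helper:

```python
import math

def value(weights, active_features):
    """Win probability estimate from binary local shape features. The
    logistic squashing is assumed, not stated on the slides."""
    total = sum(weights.get(f, 0.0) for f in active_features)
    return 1.0 / (1.0 + math.exp(-total))

def greedy_move(position, legal_moves, weights, features_after):
    """1-ply greedy search: play the move whose successor position has the
    highest value. features_after(position, move) is a hypothetical helper
    returning the active features of the successor position."""
    return max(legal_moves,
               key=lambda m: value(weights, features_after(position, m)))
```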

Learning algorithm
- Weights initialised to zero
- Weights updated by TD(0)
- No explicit exploration
- Step size set to 0.1/n (see the update sketch below)
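A minimal TD(0) update on the value sketch above, assuming the n in the slides' step size 0.1/n is the number of active features (the slides do not define n):

```python
def td0_update(weights, features_s, features_s2, reward, done, alpha0=0.1):
    """One TD(0) step on the logistic-of-sum value() sketch above.
    Assumes n = number of active features in the 0.1/n step size."""
    v_s = value(weights, features_s)
    v_s2 = 0.0 if done else value(weights, features_s2)
    delta = (reward if done else 0.0) + v_s2 - v_s  # no discounting shown
    alpha = alpha0 / max(1, len(features_s))
    for f in features_s:
        weights[f] = weights.get(f, 0.0) + alpha * delta  # zero-initialised
    return weights
```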

Minimum liberty opponent
- To evaluate a position s:
  - Find the block of either colour with the fewest liberties
  - Set col_min to the colour of the minimum liberty block
  - Set lib_min to its number of liberties
  - If both players have a block with lib_min liberties, col_min is set to the minimum liberty player
  - Evaluate the position according to a fixed function of col_min and lib_min (formula given on the original slide)
- Select moves with 1-ply greedy search over this evaluation (see the hypothetical sketch below)
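The slide's evaluation formula is not recoverable from the transcript, so the sketch below is only a plausible stand-in: positions score well for the player whose opponent owns the weakest block.

```python
def min_liberty_eval(blocks, player):
    """Hypothetical stand-in for the slide's evaluation formula (the real
    formula is lost). blocks is an assumed list of (colour, liberties)
    pairs for every block on the board; returns a score for player."""
    col_min, lib_min = min(blocks, key=lambda b: b[1])
    score = 1.0 / lib_min  # fewer liberties => more urgent
    return score if col_min != player else -score
```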

Training procedure
- A random policy rarely beats the minimum liberty player, so the agent trains against an improving opponent
- The opponent plays some random moves: enough to make games roughly even (50% wins)
- Random moves are reduced as the agent improves, until eventually there are none
- Testing is always performed against the full opponent, with no random moves (a schedule sketch follows below)
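A sketch of the improving-opponent schedule; `play_game` is a hypothetical callable, and the adjustment step of 0.01 is an arbitrary assumed constant:

```python
def train(agent, opponent, play_game, games=25000):
    """Anneal the opponent's random-move fraction so games stay ~50/50.
    play_game(agent, opponent, random_fraction) -> True if the agent won;
    it is a hypothetical helper, not from the slides."""
    random_fraction = 1.0
    for _ in range(games):
        if play_game(agent, opponent, random_fraction):
            # Agent won: strengthen the opponent by removing random moves
            random_fraction = max(0.0, random_fraction - 0.01)
        else:
            # Agent lost: weaken the opponent with more random moves
            random_fraction = min(1.0, random_fraction + 0.01)
    return random_fraction
```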

Results on 5x5 board
- Different combinations of feature types were tried:
  - Just one feature type F
  - All feature types as or more general than F
  - All feature types as or less general than F
- Percentage of wins during testing, after 25,000 training games

Results on 5x5 board: single specified feature set, location invariant

Results on 5x5 board: all feature sets as or more general than the specified set

Board growing
- Local shape features have a direct interpretation, and the same interpretation applies across board sizes
- So knowledge can be transferred from one board size to the next
- Key concepts are learned rapidly, then extended to more difficult contexts (see the transfer sketch below)
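The slides say only "weights transferred from previous size"; the keying scheme below is an assumption used to show why the transfer is direct:

```python
def grow_board(weights):
    """Sketch of weight transfer when the board grows. Weights are assumed
    to be keyed ('inv', shape) for location invariant features, which carry
    over unchanged, or ('dep', row, col, shape) for location dependent
    features, whose coordinates all remain valid on the larger board."""
    return dict(weights)  # every key keeps its meaning on the bigger board
```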

Cascading errors
- A separate TD-error is calculated for each feature type; this helps preserve meaning between contexts
- The TD-error for feature type F is calculated from all features with type F' ≥ F (a sketch follows below)
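One plausible reading of the per-type error, sketched with the `value()` and `as_general()` helpers from the earlier blocks (the exact formulation is not given on the slides):

```python
def td_error_for_type(weights, feature_type, features_s, features_s2,
                      reward, done, type_of, as_general):
    """Per-type TD-error sketch: the error for feature type F uses a value
    estimate restricted to features whose type F' is as or more general
    than F. type_of(feature) -> feature type is an assumed helper."""
    def restricted_value(features):
        subset = [f for f in features if as_general(type_of(f), feature_type)]
        return value(weights, subset)  # value() from the earlier sketch
    v_s = restricted_value(features_s)
    v_s2 = 0.0 if done else restricted_value(features_s2)
    return (reward if done else 0.0) + v_s2 - v_s
```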

Board growing results
- Board grown from 5x5 to 9x9
- Board size is increased whenever the agent is winning 90% of games
- Weights are transferred from the previous size
- Percentage of wins shown during training

Shapes learned

Example game
- 7x7 board: the agent plays black, the minimum liberty opponent plays white
- The agent has learned strategic concepts:
  - Keeping stones connected
  - Building territory
  - Controlling corners

Conclusions
- Local shape knowledge can be learnt explicitly, directly from experience
- A multi-scale representation helps to learn quickly while still providing fine differentiation
- The learned knowledge is easily interpretable and can be transferred to different board sizes
- The combined knowledge of local shape is sufficient to express global strategic concepts

Future work
- Stronger opponents; real Go, not Atari-Go
- Learn shapes selectively rather than enumerating all possible shapes
- Learn shapes that answer specific questions:
  - Can black B4 be captured?
  - Can white connect A2 to D5?
- Learn non-local shape:
  - Use connectivity relationships
  - Build hierarchies of shapes