Learning in Computer Go
David Silver

The Problem
- Large state space
  - Approximately 10^172 states
  - Game tree of about 10^360 nodes
  - Branching factor of about 200
- Evaluating a position is hard
  - No good heuristics known
  - Volatile
  - Highly non-linear

Four ways to evaluate a position
- Don't even try
- Hand-crafted heuristic
- Monte Carlo simulation
- Learned heuristic

Four choices about learning
- What to learn
- How to learn
- State representation
- Knowledge representation

What to learn
- Global evaluation function
- Shape
- Life and death
- Connectivity
- Eyes

Global evaluation function
- Several related concepts
  - Evaluation function
  - Heuristic
  - Value function
- What to evaluate
  - Probability of winning
  - Expected score
- How to evaluate
  - Sum of point territory estimates (sketched below)
  - Other approaches?
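
A minimal sketch of the two evaluation targets above, in Python. The helper territory_estimate (returning the probability that a point ends up as Black territory) and the sigmoid scale are illustrative assumptions, not anything specified in the talk.

    import math

    BOARD_SIZE = 19

    def expected_score(position, territory_estimate):
        """Expected score for Black: sum of per-point territory estimates,
        each mapped from a [0, 1] probability to a [-1, +1] point value."""
        return sum(2.0 * territory_estimate(position, (x, y)) - 1.0
                   for x in range(BOARD_SIZE)
                   for y in range(BOARD_SIZE))

    def win_probability(position, territory_estimate, scale=10.0):
        """Squash the expected score through a sigmoid to get a crude
        probability-of-winning estimate; `scale` is an arbitrary temperature."""
        score = expected_score(position, territory_estimate)
        return 1.0 / (1.0 + math.exp(-score / scale))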

Shape
- Local pattern information
- Move recommendations
- Learning shape from expert games
  - Stoutamire, Enderton, Van der Werf, Dahl
- Learning shape by RL
  - NeuroGo v3

Life and Death
- Two problems:
  - Will a group live or die?
  - Can a group live or die?
- Solving the 'can' question
  - Alpha-beta search with learned heuristic [Wolf] (sketched below)
- Solving the 'will' question
  - Supervised learning using rich feature set [Werf]
  - Reinforcement learning, averaged over group [Dahl]
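
Below is a generic negamax alpha-beta sketch with a learned evaluator at the leaves, in the spirit of the 'can' search; the helpers passed in (heuristic, legal_moves, play, is_terminal) are hypothetical placeholders, and this shows only the standard algorithm, not Wolf's actual solver.

    def alpha_beta(pos, depth, alpha, beta, heuristic, legal_moves, play, is_terminal):
        """Negamax alpha-beta: `heuristic(pos)` must score `pos` from the
        perspective of the side to move."""
        if depth == 0 or is_terminal(pos):
            return heuristic(pos)              # learned evaluation at the leaves
        value = float('-inf')
        for move in legal_moves(pos):
            # Flip the sign and swap/negate the (alpha, beta) window.
            value = max(value, -alpha_beta(play(pos, move), depth - 1,
                                           -beta, -alpha, heuristic,
                                           legal_moves, play, is_terminal))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                          # beta cutoff
        return value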

Connectivity
- Correlation between two points
  - Estimate potential groups of stones
  - Estimate potential regions of empty points
- 'Will connect' (NeuroGo v3)
  - Reinforcement learning of local connectivity
  - Pathfinding module for global connectivity
  - Connectivity map used for learning global evaluation function (see the sketch below)
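
For context, here is the 'hard' (already realised) part of connectivity: flood-filling the board into blocks of like-coloured stones. The learned 'will connect' probabilities described above would soften these 0/1 links; the board encoding is an assumption for illustration.

    def blocks(board):
        """board: dict mapping (x, y) -> 'B', 'W' or None (empty).
        Returns a list of sets of points, one per connected block."""
        seen, result = set(), []
        for start, colour in board.items():
            if colour is None or start in seen:
                continue
            block, frontier = set(), [start]
            while frontier:
                p = frontier.pop()
                if p in block:
                    continue
                block.add(p)
                x, y = p
                for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if board.get(q) == colour:
                        frontier.append(q)
            seen |= block
            result.append(block)
        return result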

What else can we learn?
- Eyes
- Heuristics for endgame
- Many other features…

How to learn
- Reinforcement Learning
- Supervised Learning
- Combined Approaches
- Evolutionary Methods

Reinforcement Learning
- Temporal Difference Learning
  - Schraudolph, Dayan, Sejnowski
  - Enzenberger (NeuroGo)
  - Dahl (Honte)
- Variants of TD(λ) (update rule sketched below)
  - TD(0)
  - TD(λ)
  - TD-leaf(λ)
- Training methodology
  - Self-play
  - Expert games (Q-learning)
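
The TD(λ) variants above share the following core update. A minimal sketch with a linear value function and eligibility traces, assuming one feature vector per position and a reward that is zero until the final win/loss signal (as in self-play training); it does not reproduce any of the cited programs exactly.

    import numpy as np

    def td_lambda_episode(states, rewards, w, alpha=0.01, gamma=1.0, lam=0.7):
        """One episode of TD(lambda). states: feature vectors phi(s_t);
        rewards[t]: reward received on the transition t -> t+1.
        Updates the weight vector w in place and returns it."""
        e = np.zeros_like(w)                         # eligibility trace
        for t in range(len(states) - 1):
            delta = (rewards[t]
                     + gamma * (w @ states[t + 1])   # bootstrapped target
                     - (w @ states[t]))              # TD error
            e = gamma * lam * e + states[t]          # decay and accumulate traces
            w += alpha * delta * e
        return w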

Supervised Learning
- Learn to mimic expert play
  - Expert move as +ve training example
  - Random move as -ve training example
  - Need a ranking metric and error function (sketched below)
  - e.g. Stoutamire, Enderton, Van der Werf, Dahl
- Learn from labelled final game positions
  - e.g. final score, life and death
  - Data is either noisy or sparse
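
One way to read 'ranking metric and error function' is a pairwise hinge loss that pushes the score of the expert move above the score of a sampled random move. A minimal sketch with a linear move evaluator over hypothetical move features; the margin and learning rate are illustrative.

    import numpy as np

    def pairwise_hinge_loss(w, phi_expert, phi_random, margin=1.0):
        """Zero once the expert move outscores the random move by `margin`;
        otherwise linear in the violation."""
        return max(0.0, margin - (w @ phi_expert - w @ phi_random))

    def sgd_step(w, phi_expert, phi_random, lr=0.01, margin=1.0):
        """One subgradient step on a single (expert, random) move pair."""
        if pairwise_hinge_loss(w, phi_expert, phi_random, margin) > 0.0:
            w += lr * (phi_expert - phi_random)
        return w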

Combined approaches
Can combine elements of both reinforcement and supervised learning, e.g. Dahl's Honte:
- Search
  - Local searches for eyes, connections, life and death
  - Global search using learned territory evaluation
- Supervised learning
  - Local move prediction (shape)
- Reinforcement learning
  - Life and death
  - Territory

Evolutionary Methods
- Evolve a neural network to evaluate game positions (see the sketch below)
  - Donnelly, Lubberts, Richards, Rutquist
- Evolve rules to match positions [Kojima]
  - 'Feed' rules according to matches
  - Split successful rules
  - Weight rules according to success in predicting response
  - Different kinds of rule:
    - Flexible (production rules)
    - Fixed (within radius from move)
    - Semi-fixed (within radius of move, empty points only)
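
A minimal (1+1) evolution strategy over a flat weight vector, in the spirit of the neuro-evolution work cited above; in practice `fitness` would come from games played (e.g. wins in self-play), and all constants here are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def evolve(fitness, n_weights, generations=1000, sigma=0.05):
        """Hill-climb in weight space: mutate, keep the child if it is
        at least as fit as the parent."""
        best = rng.normal(0.0, 0.1, n_weights)
        best_fit = fitness(best)
        for _ in range(generations):
            child = best + rng.normal(0.0, sigma, n_weights)  # Gaussian mutation
            child_fit = fitness(child)
            if child_fit >= best_fit:
                best, best_fit = child, child_fit
        return best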

State Representation
- Invariances
- Graph representations
- Feature selection
- Dimensionality reduction

Invariances
- Go board has many symmetries (augmentation sketched below)
  - Rotational
  - Reflectional
  - Colour inversion
- Invariant under translation
  - Edges must be dealt with
- Schraudolph, Dayan, Sejnowski
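
These symmetries are commonly exploited as data augmentation. A minimal numpy sketch over a board encoded as +1 (Black), -1 (White), 0 (empty); the encoding is an assumption for illustration.

    import numpy as np

    def symmetries(board):
        """Yield the 8 dihedral transforms (4 rotations x reflection) of a
        square board array."""
        b = board
        for _ in range(4):
            yield b
            yield np.fliplr(b)
            b = np.rot90(b)

    def augmented_examples(board, value):
        """All 16 equivalent (board, value) training pairs: 8 symmetries,
        each also with colours swapped and the evaluation negated."""
        for sym in symmetries(board):
            yield sym, value
            yield -sym, -value    # colour inversion flips the evaluation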

Graph Representations
- Connected blocks are also (approximately) invariant
  - Graepel's 'Common Fate Property'
  - Used previously by Baum, Stoutamire, Enzenberger
- Generate a graph between units (see the sketch below)
  - Turn connected blocks and empty intersections into nodes
  - Turn adjacencies between units into edges
- Learn on graph representation
  - Learn relationships between units (NeuroGo v2)
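
A minimal sketch of that graph construction, reusing the blocks() flood-fill from the connectivity sketch above: every block of stones and every empty intersection becomes a node ('unit'), and adjacency between units becomes an edge.

    def unit_graph(board):
        """board: dict (x, y) -> 'B', 'W' or None. Returns (units, edges):
        units maps a unit id to its set of points; edges is a set of
        unordered id pairs."""
        units, unit_of = {}, {}
        for block in blocks(board):               # one unit per stone block
            uid = len(units)
            units[uid] = block
            for p in block:
                unit_of[p] = uid
        for p, colour in board.items():           # one unit per empty point
            if colour is None:
                uid = len(units)
                units[uid] = {p}
                unit_of[p] = uid
        edges = set()
        for (x, y), uid in unit_of.items():
            for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if q in unit_of and unit_of[q] != uid:
                    edges.add(frozenset((uid, unit_of[q])))
        return units, edges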

Learning Relations in NeuroGo (v2)

Feature selection
- Raw board representation can be enhanced by any number of features
- Comparison of important features (Werf)
  - Most significant: Stones, Liberties, Last Move
  - Also significant: Edge, Captures, Nearby stones
- Trade-off between feature complexity and training time

Feature comparison in NeuroGo (v3)

Dimensionality Reduction
- Can use feature extraction techniques (PCA sketched below)
- Werf compares a variety of algorithms
  - PCA performs well all round
  - Modified Eigenspace Separation Transform does even better
  - A combination may be best overall
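
A minimal PCA sketch via the singular value decomposition, of the kind compared by Werf: project raw feature vectors onto the top-k principal components before feeding them to the learner.

    import numpy as np

    def pca_fit(X, k):
        """X: (n_samples, n_features). Returns (mean, components), where
        components has shape (k, n_features)."""
        mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
        return mean, vt[:k]

    def pca_transform(X, mean, components):
        """Project centred data onto the retained components."""
        return (X - mean) @ components.T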

Knowledge Representation
- Pattern Databases
- Neural Networks
- Rules
- Decision Trees
- Others

Pattern Databases
- Successful in commercial games
- Can be learned in similar format
- Go++ combines a handcrafted pattern database and a professional shape database (trade secret!)

Neural Networks
- Can learn and represent pattern information
- Successfully used in practice
  - Multilayer perceptrons + backpropagation (sketched below)
  - e.g. Schraudolph, Enzenberger, Werf, Dahl
- Variants
  - Resilient backpropagation (Werf)
  - Linear architecture (e.g. Werf)
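
A tiny one-hidden-layer perceptron with plain backpropagation, the kind of learner the programs above train; layer sizes, tanh activations and the learning rate are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)

    class MLP:
        """One hidden layer, tanh activations, scalar output in (-1, 1)."""

        def __init__(self, n_in, n_hidden):
            self.w1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
            self.w2 = rng.normal(0.0, 0.1, n_hidden)

        def forward(self, x):
            self.x = x
            self.h = np.tanh(self.w1 @ x)
            self.y = np.tanh(self.w2 @ self.h)
            return self.y

        def backward(self, grad_out, lr=0.01):
            """One SGD step given dLoss/dOutput for the last forward pass."""
            d_out = grad_out * (1.0 - self.y ** 2)          # through output tanh
            d_h = d_out * self.w2 * (1.0 - self.h ** 2)     # through hidden tanh
            self.w2 -= lr * d_out * self.h
            self.w1 -= lr * np.outer(d_h, self.x)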

Rules
- Horn clauses
  - Deductive inferencing (Kojima)
- Production rules
  - Evolutionary approach (Kojima)

Decision Trees
- Encodes patterns in concise, flexible form
- Tilde (Ramon, Blockeel)
  - Relational representation language
  - Inductive logic programming
  - Successfully learns nakade shapes
  - Learned heuristic compares favourably to GoTools at life and death

Other representations
- Support Vector Machines (Graepel)
- Boltzmann Machines (Stern, MacKay)

Conclusions
- Common successful ideas
- General approach
- My approach

Common successful ideas
- Global evaluation function
  - Reinforcement learning
  - Exploiting invariances
  - Carefully selected features
  - Neural network
- Local move prediction
  - Supervised learning
  - +ve expert move, -ve random move
  - Neural network
  - But hasn't led to a strong Go program

General Approach
There are many different approaches to learning in Go.
Focus on what to learn, and why it will help to play stronger Go:
- What do we want to evaluate?
- What knowledge do we need?
- Which features will help?
Then select appropriate learning algorithms:
- How should we train?
- How should knowledge be represented?

My Approach
- What to learn
  - Win/lose value function
- How to learn
  - Reinforcement learning
  - Options
- State representation
  - Predictive state representation
  - Can/will features
- Knowledge representation
  - Kanerva code (high-dimensional patterns, sketched below)
  - Linear architecture
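
For the knowledge representation, a minimal Kanerva-coding sketch: scatter random binary prototypes in feature space, activate every prototype within a Hamming radius of the current state's binary features, and make the value function linear in that sparse code. The prototype count and radius here are illustrative, not values from the talk.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_prototypes(n_prototypes, n_bits):
        """Random binary prototypes scattered through feature space."""
        return rng.integers(0, 2, size=(n_prototypes, n_bits))

    def kanerva_code(state_bits, prototypes, radius):
        """Sparse binary activation: 1 for each prototype within `radius`
        Hamming distance of the state's binary feature vector."""
        dists = np.abs(prototypes - state_bits).sum(axis=1)
        return (dists <= radius).astype(float)

    def value(state_bits, prototypes, w, radius=12):
        """Linear architecture over the Kanerva code."""
        return float(w @ kanerva_code(state_bits, prototypes, radius))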