# Learning in Computer Go (David Silver)

## Presentation transcript

Learning in Computer Go David Silver

The Problem

- Large state space
  - Approximately 10^172 states
  - Game tree of about 10^360 nodes
  - Branching factor of about 200
- Evaluating a position is hard
  - No good heuristics known
  - Volatile
  - Highly non-linear
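
The slide does not say how these figures were derived. One common back-of-envelope reading, offered here as an assumption, is that each of the 361 points on a 19x19 board is empty, black, or white, which gives an upper bound of 3^361 configurations (including illegal ones). The sketch below checks that this is close to 10^172, and works out what game length a uniform branching factor of 200 would imply for a tree of 10^360 nodes.

```python
import math

BOARD_POINTS = 19 * 19  # 361 intersections on a standard board

# Assumption: each point is empty, black, or white, so 3^361 is an
# upper bound on the number of board configurations.
log10_states = BOARD_POINTS * math.log10(3)

# Assumption: a uniform branching factor of 200 (the slide's figure).
# Solving 200^d = 10^360 for d gives the implied game length in plies.
BRANCHING_FACTOR = 200
TREE_SIZE_LOG10 = 360
implied_depth = TREE_SIZE_LOG10 / math.log10(BRANCHING_FACTOR)

print(f"3^361 is about 10^{log10_states:.1f}")
print(f"implied depth is about {implied_depth:.0f} plies")
```

Under these assumptions the numbers on the slide are mutually consistent: the state count matches 3^361, and the tree size corresponds to games of roughly 150 to 160 moves.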

Four ways to evaluate a position

- Don’t even try
- Hand-crafted heuristic
- Monte Carlo simulation
- Learned heuristic

Four choices about learning

- What to learn
- How to learn
- State representation
- Knowledge representation

What to learn

- Global evaluation function
- Shape
- Life and death
- Connectivity
- Eyes

Global evaluation function

- Several related concepts
  - Evaluation function
  - Heuristic
  - Value function
- What to evaluate
  - Probability of winning
  - Expected score
- How to evaluate
  - Sum of point territory estimates
  - Other approaches?

Shape

- Local pattern information
- Move recommendations
- Learning shape from expert games
  - Stoutamire, Enderton, Van der Werf, Dahl
- Learning shape by RL
  - NeuroGo v3

Life and Death

- Two problems:
  - Will a group live or die?
  - Can a group live or die?
- Solving the ‘can’ question
  - Alpha-beta search with learned heuristic [Wolf]
- Solving the ‘will’ question
  - Supervised learning using rich feature set [Werf]
  - Reinforcement learning, averaged over group [Dahl]

Connectivity

- Correlation between two points
  - Estimate potential groups of stones
  - Estimate potential regions of empty points
- ‘Will connect’ (NeuroGo v3)
  - Reinforcement learning of local connectivity
  - Pathfinding module for global connectivity
  - Connectivity map used for learning global evaluation function

What else can we learn?

- Eyes
- Heuristics for endgame
- Many other features…

How to learn

- Reinforcement Learning
- Supervised Learning
- Combined Approaches
- Evolutionary Methods

Reinforcement Learning

- Temporal Difference Learning
  - Schraudolph, Dayan, Sejnowski
  - Enzenberger (NeuroGo)
  - Dahl (Honte)
- Variants of TD(λ)
  - TD(0)
  - TD(λ)
  - TD-leaf(λ)
- Training methodology
  - Self-play
  - Expert games (Q-learning)
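
As a rough illustration of the temporal difference idea, here is a minimal sketch of TD(λ) with accumulating eligibility traces and a linear value function, applied to one finished game. It is not taken from any of the programs cited on the slide. The function name, the feature-vector encoding, and the parameters `alpha`, `lam`, and `gamma` are all assumptions made for the example. Setting `lam=0.0` gives TD(0).

```python
def td_lambda_update(weights, episode, outcome, alpha=0.1, lam=0.0, gamma=1.0):
    """Run TD(lambda) over one finished game with a linear value function.

    episode: list of feature vectors, one per position (hypothetical encoding).
    outcome: reward received at the end of the game, e.g. 1.0 for a win.
    Returns the updated weight vector.
    """
    def value(features):
        # Linear value estimate using the current weights.
        return sum(w * x for w, x in zip(weights, features))

    trace = [0.0] * len(weights)
    for t, features in enumerate(episode):
        is_last = t == len(episode) - 1
        # Bootstrap from the next position, or use the game result at the end.
        target = outcome if is_last else gamma * value(episode[t + 1])
        delta = target - value(features)
        # Accumulating eligibility trace; with lam == 0 it is just the features.
        trace = [gamma * lam * e + x for e, x in zip(trace, features)]
        weights = [w + alpha * delta * e for w, e in zip(weights, trace)]
    return weights


# Two positions with one-hot features, ending in a win.
game = [[1.0, 0.0], [0.0, 1.0]]
after_td0 = td_lambda_update([0.0, 0.0], game, outcome=1.0, alpha=0.5, lam=0.0)
after_td1 = td_lambda_update([0.0, 0.0], game, outcome=1.0, alpha=0.5, lam=1.0)
```

With `lam=0.0` only the final position's weight moves on the first pass, whereas `lam=1.0` propagates the result back to the earlier position immediately. In self-play training the `episode` would come from games the learner plays against itself.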

Supervised Learning

- Learn to mimic expert play
  - Expert move as +ve training example
  - Random move as -ve training example
  - Need a ranking metric and error function
  - e.g. Stoutamire, Enderton, Van der Werf, Dahl
- Learn from labelled final game positions
  - e.g. final score, life and death
  - Data is either noisy or sparse
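
One way to make "expert move as positive, random move as negative" concrete is a pairwise ranking objective. The sketch below assumes a linear move scorer and a logistic error on the score difference. Both are illustrative choices, not the method of any author named on the slide, and the function names and feature vectors are hypothetical.

```python
import math


def score(weights, features):
    """Linear score for a candidate move described by a feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def ranking_step(weights, expert_features, random_features, lr=0.1):
    """One gradient step pushing the expert move's score above the random move's.

    Error function (assumed): log(1 + exp(-margin)), where
    margin = score(expert) - score(random).
    """
    margin = score(weights, expert_features) - score(weights, random_features)
    # Magnitude of the error gradient with respect to the margin: sigmoid(-margin).
    grad_scale = 1.0 / (1.0 + math.exp(margin))
    return [w + lr * grad_scale * (e - r)
            for w, e, r in zip(weights, expert_features, random_features)]


# Hypothetical two-feature moves: the expert move has feature 0, the random one feature 1.
updated = ranking_step([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], lr=1.0)
```

After the step the expert move scores higher than the random one. A ranking metric such as "how often the expert move is ranked first" can then be measured on held-out games.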

Combined approaches

- Can combine elements of both reinforcement and supervised learning.
- e.g. Dahl’s Honte
  - Search
    - Local searches for eyes, connections, life and death
    - Global search using learned territory evaluation
  - Supervised learning
    - Local move prediction (shape)
  - Reinforcement learning
    - Life and death
    - Territory

Evolutionary Methods

- Evolve a neural network to evaluate game positions
  - Donnelly, Lubberts, Richards, Rutquist
- Evolve rules to match positions [Kojima]
  - ‘Feed’ rules according to matches
  - Split successful rules
  - Weight rules according to success in predicting response
  - Different kinds of rule
    - Flexible (production rules)
    - Fixed (within radius from move)
    - Semi-fixed (within radius of move, empty points only)

State Representation

- Invariances
- Graph representations
- Feature selection
- Dimensionality reduction

Invariances

- Go board has many symmetries
  - Rotational
  - Reflectional
  - Colour inversion
- Invariant under translation
  - Edges must be dealt with
- Schraudolph, Dayan, Sejnowski
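
The rotational and reflectional symmetries can be enumerated directly. The sketch below generates the eight rotations and reflections of a square board, plus a colour inversion helper. The encoding (+1 black, -1 white, 0 empty) and the function names are assumptions for illustration.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def reflect(board):
    """Mirror the board left to right."""
    return [row[::-1] for row in board]


def invert_colours(board):
    """Swap black and white (assumed encoding: +1 black, -1 white, 0 empty)."""
    return [[-point for point in row] for row in board]


def symmetries(board):
    """Return the 8 rotations and reflections of a square board."""
    result = []
    current = [list(row) for row in board]
    for _ in range(4):
        result.append(current)
        result.append(reflect(current))
        current = rotate(current)
    return result


# A small asymmetric position: black in the corner, white next to it.
position = [[1, -1, 0],
            [0, 0, 0],
            [0, 0, 0]]
variants = symmetries(position)
distinct = {tuple(map(tuple, b)) for b in variants}
```

A learner can exploit these either by training on all variants of each position or by sharing weights across them. Colour inversion additionally requires evaluating from the other player's point of view.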

Graph Representations

- Connected blocks are also (approximately) invariant.
  - Graepel’s ‘Common Fate Property’
  - Used previously by Baum, Stoutamire, Enzenberger.
- Generate a graph between units
  - Turn connected blocks and empty intersections into nodes
  - Turn adjacencies between units into edges
- Learn on graph representation
  - Learn relationships between units (NeuroGo v2)
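
The graph construction described above can be sketched as follows. This is a minimal illustration rather than code from any of the cited programs. The board encoding (`'B'`, `'W'`, `'.'`) and the function name `build_unit_graph` are assumptions.

```python
def build_unit_graph(board):
    """Collapse connected blocks into single nodes and link adjacent units.

    board: list of equal-length strings using 'B', 'W' and '.' (assumed encoding).
    Returns (node_of, nodes, edges) where node_of maps each point to a node id,
    nodes[i] is the colour of node i, and edges is a set of (low_id, high_id) pairs.
    """
    size = len(board)

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                yield nr, nc

    node_of, nodes = {}, []
    for r in range(size):
        for c in range(size):
            if (r, c) in node_of:
                continue
            colour = board[r][c]
            node_id = len(nodes)
            nodes.append(colour)
            node_of[(r, c)] = node_id
            if colour == '.':
                continue  # each empty intersection is its own node
            # Flood fill the block of same-coloured stones into one node.
            stack = [(r, c)]
            while stack:
                pr, pc = stack.pop()
                for n in neighbours(pr, pc):
                    if n not in node_of and board[n[0]][n[1]] == colour:
                        node_of[n] = node_id
                        stack.append(n)

    # Adjacent points that belong to different units become edges.
    edges = set()
    for (r, c), a in node_of.items():
        for n in neighbours(r, c):
            b = node_of[n]
            if a != b:
                edges.add((min(a, b), max(a, b)))
    return node_of, nodes, edges


node_of, nodes, edges = build_unit_graph(["BB.",
                                          ".W.",
                                          "..."])
```

In this 3x3 example the two black stones merge into one node, so the nine points give eight nodes, and the twelve grid adjacencies give eleven edges.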

Learning Relations in NeuroGo (v2)

Feature selection

- Raw board representation can be enhanced by any number of features
- Comparison of important features (Werf)
  - Most significant: Stones, Liberties, Last Move
  - Also significant: Edge, Captures, Nearby stones
- Trade-off between feature complexity and training time
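
As an example of one of the features listed, the sketch below counts the liberties of the block containing a given stone, using a flood fill. The board encoding (`'B'`, `'W'`, `'.'`) and the function name are assumptions. The slide does not specify how any feature was computed.

```python
def count_liberties(board, r, c):
    """Count the distinct empty points adjacent to the block containing (r, c).

    board: list of equal-length strings using 'B', 'W' and '.' (assumed encoding).
    """
    colour = board[r][c]
    if colour == '.':
        raise ValueError("no stone at this point")
    size = len(board)
    seen = {(r, c)}
    stack = [(r, c)]
    liberties = set()
    while stack:
        pr, pc = stack.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = pr + dr, pc + dc
            if not (0 <= nr < size and 0 <= nc < size):
                continue
            if board[nr][nc] == '.':
                liberties.add((nr, nc))  # a set, so shared liberties count once
            elif board[nr][nc] == colour and (nr, nc) not in seen:
                seen.add((nr, nc))
                stack.append((nr, nc))
    return len(liberties)


example = ["BB.",
           ".W.",
           "..."]
black_liberties = count_liberties(example, 0, 0)
white_liberties = count_liberties(example, 1, 1)
```

A feature like this costs more to compute than the raw stone colours, which is one concrete form of the trade-off between feature complexity and training time.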

Feature comparison in NeuroGo (v3)

Dimensionality Reduction

- Can use feature extraction techniques
- Werf compares a variety of algorithms
  - PCA performs well all round
  - Modified Eigenspace Separation Transform does even better
  - A combination may be best overall
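
To show what PCA does to a feature vector, here is a minimal sketch that finds only the first principal component, using power iteration on the covariance matrix. Power iteration is a simplification chosen to keep the example dependency-free. It is not the procedure used in the comparison cited on the slide, and all names and parameters are assumptions.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Estimate the direction of greatest variance by power iteration.

    rows: list of equal-length feature vectors (needs at least two rows).
    Returns (means, component) where component is a unit vector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(x[i] * x[j] for x in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Random start, so the initial vector is unlikely to be orthogonal
    # to the principal direction.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(row, means, component):
    """Reduce a feature vector to its coordinate along the component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))


# Two perfectly correlated features collapse onto a single direction.
data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
means, component = first_principal_component(data)
```

For this data the component points along the diagonal, so one number per position carries all the variance of the original two features.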

Knowledge Representation

- Pattern Databases
- Neural Networks
- Rules
- Decision Trees
- Others

Pattern Databases

- Successful in commercial games
- Can be learned in similar format
- Go++ combines handcrafted pattern database and professional shape database (trade secret!)

Neural Networks

- Can learn and represent pattern information
- Successfully used in practice
  - Multilayer perceptrons + backpropagation
  - e.g. Schraudolph, Enzenberger, Werf, Dahl
- Variants
  - Resilient backpropagation (Werf)
  - Linear architecture (e.g. Werf)

Rules

- Horn clauses
  - Deductive inferencing (Kojima)
- Production rules
  - Evolutionary approach (Kojima)

Decision Trees

- Encodes patterns in concise, flexible form
- Tilde (Ramon, Blockeel)
  - Relational representation language
  - Inductive logic programming
  - Successfully learns nakade shapes
  - Learned heuristic compares favourably to GoTools at life and death.

Other representations

- Support Vector Machines (Graepel)
- Boltzmann Machines (Stern, MacKay)

Conclusions

- Common successful ideas
- General approach
- My approach

Common successful ideas

- Global evaluation function
  - Reinforcement learning
  - Exploiting invariances
  - Carefully selected features
  - Neural network
- Local move prediction
  - Supervised learning
  - +ve expert move, -ve random move
  - Neural network
  - But hasn’t led to a strong Go program

General Approach

- There are many different approaches to learning in Go.
- Focus on what to learn, and why it will help to play stronger Go.
  - What do we want to evaluate?
  - What knowledge do we need?
  - Which features will help?
- Then select appropriate learning algorithms.
  - How should we train?
  - How should knowledge be represented?

My Approach

- What to learn
  - Win/lose value function
- How to learn
  - Reinforcement learning
  - Options
- State representation
  - Predictive state representation
  - Can/will features
- Knowledge representation
  - Kanerva code (high dimensional patterns)
  - Linear architecture
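
The slide only names Kanerva coding and a linear architecture, without details. As a rough sketch of how the two usually fit together: a state is described by a binary vector, each feature is a prototype pattern that switches on when the state lies within a Hamming radius of it, and the value is a weighted sum of the active features. The prototype count, radius, and all function names below are assumptions for illustration.

```python
import random


def make_prototypes(num_prototypes, num_bits, seed=0):
    """Draw random binary prototype patterns (hypothetical choice of prototypes)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(num_bits)]
            for _ in range(num_prototypes)]


def kanerva_features(state_bits, prototypes, radius):
    """One binary feature per prototype: 1 if within the Hamming radius, else 0."""
    return [1 if sum(a != b for a, b in zip(state_bits, proto)) <= radius else 0
            for proto in prototypes]


def linear_value(weights, features):
    """Linear architecture: the value is a weighted sum of active features."""
    return sum(w * f for w, f in zip(weights, features))


# Hand-picked prototypes keep the example easy to follow.
prototypes = [[0, 0, 0, 0], [1, 1, 1, 1]]
features = kanerva_features([0, 0, 0, 1], prototypes, radius=1)
value = linear_value([0.3, -0.2], features)
random_prototypes = make_prototypes(num_prototypes=5, num_bits=8)
```

Because the value is linear in the features, the weights could be trained with a temporal difference rule. Whether the author's system does exactly this is not stated on the slide.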
