Published by Katerina Hunger. Modified over 10 years ago.
1
Learning in Computer Go
David Silver
2
The Problem
- Large state space
  - Approximately 10^172 states
  - Game tree of about 10^360 nodes
  - Branching factor of about 200
- Evaluating a position is hard
  - No good heuristics known
  - Volatile
  - Highly non-linear
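One plausible origin of the "approximately 10^172 states" figure is the simple upper bound 3^361: each of the 361 intersections of a 19x19 board is empty, black or white. The sketch below just does that arithmetic. The bound counts illegal configurations too, so the number of legal positions is somewhat smaller.

```python
import math

BOARD_POINTS = 19 * 19  # 361 intersections on a full-size board

# Each intersection is empty, black or white, so 3**361 bounds the number
# of board configurations (legal or not).
upper_bound = 3 ** BOARD_POINTS
exponent = BOARD_POINTS * math.log10(3)

print(f"3^{BOARD_POINTS} has {len(str(upper_bound))} digits")  # 173 digits
print(f"log10(3^{BOARD_POINTS}) = {exponent:.2f}")             # 172.24
```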
3
Four ways to evaluate a position
- Don’t even try
- Hand-crafted heuristic
- Monte Carlo simulation
- Learned heuristic
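Monte Carlo evaluation scores a position by the fraction of random playouts from it that end in a win. The sketch below assumes a hypothetical `playout` callable standing in for "play random legal moves to the end and report whether Black won". A real Go playout needs a full rules engine, so a biased coin stands in for it here.

```python
import random
from typing import Callable


def monte_carlo_value(playout: Callable[[random.Random], bool],
                      n_playouts: int = 1000,
                      seed: int = 0) -> float:
    """Estimate P(Black wins) as the fraction of random playouts won.

    `playout` is a hypothetical stand-in for a random game played to the end.
    """
    rng = random.Random(seed)
    wins = sum(playout(rng) for _ in range(n_playouts))
    return wins / n_playouts


def toy_playout(rng: random.Random) -> bool:
    # Toy stand-in for a real Go playout: Black wins 70% of the time.
    return rng.random() < 0.7


print(monte_carlo_value(toy_playout, n_playouts=10_000))  # close to 0.7
```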
4
Four choices about learning
- What to learn
- How to learn
- State representation
- Knowledge representation
5
What to learn
- Global evaluation function
- Shape
- Life and death
- Connectivity
- Eyes
6
Global evaluation function
- Several related concepts
  - Evaluation function
  - Heuristic
  - Value function
- What to evaluate
  - Probability of winning
  - Expected score
- How to evaluate
  - Sum of point territory estimates
  - Other approaches?
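"Sum of point territory estimates" can be read as follows: if each intersection carries an ownership estimate in [-1, 1] (+1 surely Black's, -1 surely White's), their sum minus komi is an expected score for Black. The logistic mapping from score to win probability below is a hypothetical illustration, and its `scale` parameter is not from the slides.

```python
import math


def expected_score(ownership: list[list[float]], komi: float = 6.5) -> float:
    """Sum of per-point territory estimates, from Black's point of view.

    ownership[r][c] is in [-1, 1]: +1 = surely Black's, -1 = surely White's,
    0 = undecided.
    """
    return sum(sum(row) for row in ownership) - komi


def win_probability(score: float, scale: float = 5.0) -> float:
    # Hypothetical squashing of expected score into P(Black wins);
    # `scale` is an arbitrary illustrative parameter.
    return 1.0 / (1.0 + math.exp(-score / scale))


board = [
    [1.0, 1.0, 0.5],
    [1.0, 0.0, -0.5],
    [-1.0, -1.0, -1.0],
]
s = expected_score(board, komi=0.5)
print(s, round(win_probability(s), 3))  # -0.5 0.475
```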
7
Shape
- Local pattern information
- Move recommendations
- Learning shape from expert games
  - Stoutamire, Enderton, Van der Werf, Dahl
- Learning shape by RL
  - NeuroGo v3
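The simplest way to learn shape from expert games is to count how often each local pattern surrounds an expert move. The cited systems train richer models than this, so the sketch below is only a frequency-count stand-in. It assumes boards are lists of strings with `X`, `O` and `.`, and that moves are `(row, col)` pairs; the function names are hypothetical.

```python
from collections import Counter


def local_pattern(board: list[str], row: int, col: int, radius: int = 1) -> str:
    """Return the (2r+1)x(2r+1) window around (row, col) as a string key.

    Cells off the board are marked '#', so edge and corner shapes get
    their own keys.
    """
    size = len(board)
    cells = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < size and 0 <= c < size
            cells.append(board[r][c] if inside else "#")
    return "".join(cells)


def learn_shape(expert_moves) -> Counter:
    """Count how often each local pattern surrounds an expert move.

    expert_moves: iterable of (board, (row, col)) pairs.
    """
    counts = Counter()
    for board, (row, col) in expert_moves:
        counts[local_pattern(board, row, col)] += 1
    return counts


board = ["X..", ".O.", "..."]
print(local_pattern(board, 0, 1))  # ###X...O.
```

A real pattern learner would also map each key to a canonical form under the board's symmetries, so that rotated copies of a shape share statistics.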
8
Life and Death
- Two problems:
  - Will a group live or die?
  - Can a group live or die?
- Solving the ‘can’ question
  - Alpha-beta search with learned heuristic [Wolf]
- Solving the ‘will’ question
  - Supervised learning using rich feature set [Werf]
  - Reinforcement learning, averaged over group [Dahl]
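The ‘can’ question is a search problem: alpha-beta explores the moves and calls a heuristic where the search stops. Below is a generic alpha-beta sketch with a pluggable `heuristic`. The toy tree and scores are hypothetical (+1 meaning the group lives, -1 that it dies), and the slide does not say how Wolf's learned heuristic is built.

```python
import math
from typing import Callable, Iterable, TypeVar

State = TypeVar("State")


def alphabeta(state: State, depth: int, alpha: float, beta: float,
              maximizing: bool,
              children: Callable[[State], Iterable[State]],
              heuristic: Callable[[State], float]) -> float:
    """Alpha-beta search that falls back on a (learned) heuristic at leaves."""
    kids = list(children(state))
    if depth == 0 or not kids:
        return heuristic(state)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         children, heuristic))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing side will avoid this branch
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                     children, heuristic))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing side will avoid this branch
    return value


# Toy tree: inner lists are positions, numbers are heuristic scores.
tree = [[0.3, 0.5], [0.6, 0.9], [-1.0, 0.2]]


def toy_children(s):
    return s if isinstance(s, list) else []


def toy_heuristic(s):
    return float(s)


print(alphabeta(tree, 2, -math.inf, math.inf, True, toy_children, toy_heuristic))
# 0.6
```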
9
Connectivity
- Correlation between two points
- Estimate potential groups of stones
- Estimate potential regions of empty points
- ‘Will connect’ (NeuroGo v3)
  - Reinforcement learning of local connectivity
  - Pathfinding module for global connectivity
  - Connectivity map used for learning global evaluation function
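One way a pathfinding module could turn local connection estimates into global ones is to find the most probable chain of local links. If links are treated as independent, maximizing the product of probabilities is the same as minimizing the sum of `-log p`, which Dijkstra's algorithm solves. This is an assumed, common technique, not necessarily what NeuroGo does. The input format (a dict from point pairs to probabilities) is hypothetical.

```python
import heapq
import math


def connection_strength(local_prob, start, goal):
    """Probability of the most likely chain of local connections.

    local_prob maps (a, b) pairs to an estimated probability that a and b
    will connect (e.g. learned by RL).  Links are treated as independent.
    """
    graph = {}
    for (a, b), p in local_prob.items():
        if p > 0:
            w = -math.log(p)  # high probability = short edge
            graph.setdefault(a, []).append((b, w))
            graph.setdefault(b, []).append((a, w))

    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return math.exp(-d)  # back from -log space to a probability
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return 0.0  # goal unreachable


local = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5}
print(round(connection_strength(local, "A", "C"), 3))  # 0.72, via B
```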
10
What else can we learn?
- Eyes
- Heuristics for endgame
- Many other features…
11
How to learn
- Reinforcement Learning
- Supervised Learning
- Combined Approaches
- Evolutionary Methods
12
Reinforcement Learning
- Temporal Difference Learning
  - Schraudolph, Dayan, Sejnowski
  - Enzenberger (NeuroGo)
  - Dahl (Honte)
- Variants of TD(λ)
  - TD(0)
  - TD(λ)
  - TD-leaf(λ)
- Training methodology
  - Self-play
  - Expert games (Q-learning)
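A minimal sketch of online TD(λ) for a linear value function V(s) = w · φ(s) over one game, with accumulating eligibility traces. It assumes no discounting and a single reward at the end of the game (e.g. 1 for a win, 0 for a loss). Setting `lam=0` gives TD(0). TD-leaf(λ) applies the same update to the leaf positions of a search, which is not shown here.

```python
def td_lambda_episode(features, outcome, weights, alpha=0.1, lam=0.7):
    """Online TD(lambda) over one game for a linear value V(s) = w . phi(s).

    features: feature vectors phi(s_0), ..., phi(s_T) of the positions seen.
    outcome:  final reward, e.g. 1.0 for a win and 0.0 for a loss.
    Updates `weights` in place and also returns it.
    """
    def value(phi):
        return sum(w * x for w, x in zip(weights, phi))

    trace = [0.0] * len(weights)
    for t, phi in enumerate(features):
        if t + 1 < len(features):
            target = value(features[t + 1])  # bootstrap from the next position
        else:
            target = outcome                 # terminal: the actual result
        delta = target - value(phi)          # TD error
        for i, x in enumerate(phi):
            trace[i] = lam * trace[i] + x    # accumulating eligibility trace
            weights[i] += alpha * delta * trace[i]
    return weights


w = [0.0, 0.0]
for _ in range(200):  # repeatedly replay the same two-position "game", a win
    td_lambda_episode([[1.0, 0.0], [0.0, 1.0]], 1.0, w)
print([round(x, 3) for x in w])  # both weights approach 1.0
```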
13
Supervised Learning
- Learn to mimic expert play
  - Expert move as +ve training example
  - Random move as -ve training example
  - Need a ranking metric and error function
  - e.g. Stoutamire, Enderton, Van der Werf, Dahl
- Learn from labelled final game positions
  - e.g. final score, life and death
  - Data is either noisy or sparse
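One common choice of ranking error function is the pairwise logistic loss log(1 + exp(-(s_e - s_r))). It pushes the score of the expert move s_e above that of a random move s_r. The cited authors may use different losses. The sketch assumes a linear scorer over hypothetical move features and takes one SGD step per (expert, random) pair.

```python
import math


def score(weights, phi):
    """Linear move score w . phi."""
    return sum(w * x for w, x in zip(weights, phi))


def ranking_update(weights, phi_expert, phi_random, lr=0.1):
    """One SGD step on the pairwise logistic loss log(1 + exp(-margin)).

    margin = score(expert move) - score(random move).  The loss gradient with
    respect to the margin is -sigmoid(-margin), so we step along +(phi_e - phi_r).
    """
    margin = score(weights, phi_expert) - score(weights, phi_random)
    g = 1.0 / (1.0 + math.exp(margin))  # sigmoid(-margin)
    for i in range(len(weights)):
        weights[i] += lr * g * (phi_expert[i] - phi_random[i])
    return weights


# Hypothetical features: [is_contact_move, is_first_line_move]
w = [0.0, 0.0]
for _ in range(100):
    ranking_update(w, phi_expert=[1.0, 0.0], phi_random=[0.0, 1.0])
print(w[0] > 0 > w[1])  # True: expert-like feature weighted up
```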
14
Combined approaches
- Can combine elements of both reinforcement and supervised learning
- e.g. Dahl’s Honte
  - Search
    - Local searches for eyes, connections, life and death
    - Global search using learned territory evaluation
  - Supervised learning
    - Local move prediction (shape)
  - Reinforcement learning
    - Life and death
    - Territory
15
Evolutionary Methods
- Evolve a neural network to evaluate game positions
  - Donnelly, Lubberts, Richards, Rutquist
- Evolve rules to match positions [Kojima]
  - ‘Feed’ rules according to matches
  - Split successful rules
  - Weight rules according to success in predicting response
- Different kinds of rule
  - Flexible (production rules)
  - Fixed (within radius from move)
  - Semi-fixed (within radius of move, empty points only)
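As a minimal illustration of evolving an evaluator, the sketch below runs a plain (1 + λ) evolution strategy on a weight vector. This is a swapped-in, simple technique, not the specific method of the cited authors. The fitness function is a hypothetical toy; in Go it would be something like the fraction of games won by the evaluator.

```python
import random


def evolve(fitness, n_weights, generations=200, offspring=10, sigma=0.1, seed=0):
    """(1 + lambda) evolution strategy over an evaluator's weight vector.

    Each generation mutates the parent `offspring` times with Gaussian noise
    and keeps the best child if it is at least as fit as the parent.
    """
    rng = random.Random(seed)
    parent = [0.0] * n_weights
    best = fitness(parent)
    for _ in range(generations):
        children = [[w + rng.gauss(0.0, sigma) for w in parent]
                    for _ in range(offspring)]
        child_fitness, child = max(((fitness(c), c) for c in children),
                                   key=lambda pair: pair[0])
        if child_fitness >= best:
            parent, best = child, child_fitness
    return parent, best


# Toy fitness (hypothetical): closeness to a "good" weight vector.
TARGET = [0.5, -0.3]


def toy_fitness(w):
    return -sum((a - b) ** 2 for a, b in zip(w, TARGET))


weights, best_fitness = evolve(toy_fitness, n_weights=2)
print([round(x, 2) for x in weights], round(best_fitness, 4))
```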
16
State Representation
- Invariances
- Graph representations
- Feature selection
- Dimensionality reduction
17
Invariances
- Go board has many symmetries
  - Rotational
  - Reflectional
  - Colour inversion
- Approximately invariant under translation
  - Edges must be dealt with
- Schraudolph, Dayan, Sejnowski
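The rotational and reflectional symmetries of a square board form the 8-element dihedral group. Colour inversion swaps Black and White, and the evaluation must then be negated. The sketch assumes boards are lists of strings with `X`, `O` and `.`. It uses the smallest of the 8 variants as a symmetry-invariant key, so one learned pattern covers all orientations.

```python
def rotate(board):
    """Rotate a square board 90 degrees clockwise."""
    return ["".join(row) for row in zip(*board[::-1])]


def reflect(board):
    """Mirror a board left-to-right."""
    return [row[::-1] for row in board]


def invert_colours(board):
    """Swap Black and White stones (the value must then be negated)."""
    swap = {"X": "O", "O": "X"}
    return ["".join(swap.get(ch, ch) for ch in row) for row in board]


def symmetries(board):
    """All 8 rotations and reflections of the board."""
    variants = []
    b = board
    for _ in range(4):
        variants.append(b)
        variants.append(reflect(b))
        b = rotate(b)
    return variants


def canonical(board):
    """Symmetry-invariant key: the lexicographically smallest variant."""
    return min("\n".join(b) for b in symmetries(board))


corner = ["X..", "...", "..."]
print(canonical(corner) == canonical(["...", "...", "..X"]))  # True
```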
18
Graph Representations
- Connected blocks are also (approximately) invariant
  - Graepel’s ‘Common Fate Property’
  - Used previously by Baum, Stoutamire, Enzenberger
- Generate a graph between units
  - Turn connected blocks and empty intersections into nodes
  - Turn adjacencies between units into edges
- Learn on graph representation
  - Learn relationships between units (NeuroGo v2)
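The graph construction described above can be sketched directly. Flood-fill merges each block of same-coloured connected stones into one node, each empty intersection becomes its own node, and an edge joins every pair of adjacent units. The board format (list of strings with `X`, `O`, `.`) and the return types are assumptions for illustration.

```python
def common_fate_graph(board):
    """Nodes: stone blocks and single empty points.  Edges: adjacent units.

    Returns (unit_of, edges): unit_of maps each (row, col) to a unit id,
    and edges is a set of frozenset({u, v}) pairs of adjacent units.
    """
    size = len(board)
    unit_of = {}
    next_id = 0

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                yield nr, nc

    for r in range(size):
        for c in range(size):
            if (r, c) in unit_of:
                continue
            colour = board[r][c]
            unit_of[(r, c)] = next_id
            if colour != ".":
                # Flood-fill the block of same-coloured connected stones.
                stack = [(r, c)]
                while stack:
                    pr, pc = stack.pop()
                    for nr, nc in neighbours(pr, pc):
                        if (nr, nc) not in unit_of and board[nr][nc] == colour:
                            unit_of[(nr, nc)] = next_id
                            stack.append((nr, nc))
            next_id += 1  # empty points stay single-point units

    edges = set()
    for (r, c), u in unit_of.items():
        for nr, nc in neighbours(r, c):
            v = unit_of[(nr, nc)]
            if u != v:
                edges.add(frozenset((u, v)))
    return unit_of, edges


units, edges = common_fate_graph(["XX.", "...", "..O"])
print(len(set(units.values())), len(edges))  # 8 units, 11 edges
```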
19
Learning Relations in NeuroGo (v2)
20
Feature selection
- Raw board representation can be enhanced by any number of features
- Comparison of important features (Werf)
  - Most significant: Stones, Liberties, Last Move
  - Also significant: Edge, Captures, Nearby stones
- Trade-off between feature complexity and training time
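Liberties are among the most significant features in the comparison above. A minimal sketch of computing them follows: each stone is mapped to the number of distinct empty points adjacent to its block. The board format and function name are assumptions. A learner would typically feed these counts in as extra inputs, for example bucketed as 1, 2, 3 and 4+ liberties.

```python
def liberty_counts(board):
    """Map every stone (row, col) to the liberty count of its block."""
    size = len(board)
    result = {}

    def neighbours(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size:
                yield nr, nc

    for r in range(size):
        for c in range(size):
            if board[r][c] == "." or (r, c) in result:
                continue
            colour = board[r][c]
            block, liberties, stack = {(r, c)}, set(), [(r, c)]
            while stack:
                pr, pc = stack.pop()
                for nr, nc in neighbours(pr, pc):
                    if board[nr][nc] == ".":
                        liberties.add((nr, nc))  # distinct empty neighbours
                    elif board[nr][nc] == colour and (nr, nc) not in block:
                        block.add((nr, nc))
                        stack.append((nr, nc))
            for point in block:
                result[point] = len(liberties)
    return result


print(liberty_counts(["XX.", "O..", "..."]))
# {(0, 0): 2, (0, 1): 2, (1, 0): 2}
```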
21
Feature comparison in NeuroGo (v3)
22
Dimensionality Reduction
- Can use feature extraction techniques
- Werf compares a variety of algorithms
  - PCA performs well all round
  - Modified Eigenspace Separation Transform does even better
  - A combination may be best overall
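A dependency-free sketch of PCA as feature extraction follows. It finds the leading principal component of the feature vectors by power iteration on their covariance matrix and projects inputs onto it. A library SVD would normally be used instead, and the Modified Eigenspace Separation Transform is not shown. The sketch assumes the data has non-zero variance.

```python
import math
import random


def first_principal_component(data, iterations=100, seed=0):
    """Mean and leading principal component of `data` (equal-length vectors)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centred data.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(d)]
    for _ in range(iterations):  # power iteration: v <- Cv / |Cv|
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(x, mean, component):
    """Reduced 1-D feature: coordinate of x along the component."""
    return sum((xi - mi) * ci for xi, mi, ci in zip(x, mean, component))


data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, pc = first_principal_component(data)
print([round(c, 4) for c in pc])  # [0.7071, 0.7071]
```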
23
Knowledge Representation
- Pattern Databases
- Neural Networks
- Rules
- Decision Trees
- Others
24
Pattern Databases
- Successful in commercial games
- Can be learned in a similar format
- Go++ combines a handcrafted pattern database and a professional shape database (trade secret!)
25
Neural Networks
- Can learn and represent pattern information
- Successfully used in practice
  - Multilayer perceptrons + backpropagation
  - e.g. Schraudolph, Enzenberger, Werf, Dahl
- Variants
  - Resilient backpropagation (Werf)
  - Linear architecture (e.g. Werf)
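Resilient backpropagation (Rprop) adapts a separate step size for each weight from the sign of its gradient alone, ignoring the gradient's magnitude. The sketch below shows the iRprop- variant, chosen here for illustration; the slide does not say which Rprop variant Werf used. It minimizes a toy one-dimensional quadratic instead of a network's error.

```python
def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One iRprop- update; modifies weights, prev_grads and steps in place."""
    for i, g in enumerate(grads):
        product = g * prev_grads[i]
        if product > 0:    # same sign as last time: grow the step
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif product < 0:  # sign flipped (overshoot): shrink the step
            steps[i] = max(steps[i] * eta_minus, step_min)
            g = 0.0        # iRprop-: skip this weight's update once
        if g > 0:
            weights[i] -= steps[i]
        elif g < 0:
            weights[i] += steps[i]
        prev_grads[i] = g
    return weights


# Toy objective (w - 3)^2 with gradient 2(w - 3).
w, prev, steps = [0.0], [0.0], [0.1]
for _ in range(100):
    rprop_step(w, [2.0 * (w[0] - 3.0)], prev, steps)
print(round(w[0], 4))  # close to 3.0
```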
26
Rules
- Horn clauses
  - Deductive inference (Kojima)
- Production rules
  - Evolutionary approach (Kojima)
27
Decision Trees
- Encode patterns in a concise, flexible form
- Tilde (Ramon, Blockeel)
  - Relational representation language
  - Inductive logic programming
  - Successfully learns nakade shapes
  - Learned heuristic compares favourably to GoTools at life and death
28
Other representations
- Support Vector Machines (Graepel)
- Boltzmann Machines (Stern, MacKay)
29
Conclusions
- Common successful ideas
- General approach
- My approach
30
Common successful ideas
- Global evaluation function
  - Reinforcement learning
  - Exploiting invariances
  - Carefully selected features
  - Neural network
- Local move prediction
  - Supervised learning
  - +ve expert move, -ve random move
  - Neural network
- But hasn’t led to a strong Go program
31
General Approach
- There are many different approaches to learning in Go
- Focus on what to learn, and why it will help to play stronger Go
  - What do we want to evaluate?
  - What knowledge do we need?
  - Which features will help?
- Then select appropriate learning algorithms
  - How should we train?
  - How should knowledge be represented?
32
My Approach
- What to learn
  - Win/lose value function
- How to learn
  - Reinforcement learning
  - Options
- State representation
  - Predictive state representation
  - Can/will features
- Knowledge representation
  - Kanerva code (high-dimensional patterns)
  - Linear architecture
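A Kanerva code combined with a linear architecture can be sketched as follows. Random binary prototypes act as features, a prototype is active when its Hamming distance to the input is within a radius, and the value is the sum of the active prototypes' weights. The number of prototypes, the radius, the binary encoding of positions and the normalized update rule are all illustrative assumptions, not details from the slides. The `target` could be a win/lose outcome or a TD target.

```python
import random


def hamming(a, b):
    """Number of positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


class KanervaValue:
    """Linear value function over Kanerva-coded binary features."""

    def __init__(self, n_bits, n_prototypes, radius, seed=0):
        rng = random.Random(seed)
        self.prototypes = [[rng.randint(0, 1) for _ in range(n_bits)]
                           for _ in range(n_prototypes)]
        self.radius = radius
        self.weights = [0.0] * n_prototypes

    def active(self, x):
        """Indices of prototypes within `radius` of the input pattern x."""
        return [i for i, p in enumerate(self.prototypes)
                if hamming(p, x) <= self.radius]

    def value(self, x):
        return sum(self.weights[i] for i in self.active(x))

    def update(self, x, target, alpha=0.1):
        """Move V(x) a fraction alpha of the way towards target."""
        idx = self.active(x)
        if not idx:
            return  # no active prototypes: nothing to update
        error = target - self.value(x)
        for i in idx:
            self.weights[i] += alpha * error / len(idx)


kv = KanervaValue(n_bits=16, n_prototypes=50, radius=8)
x = list(kv.prototypes[0])  # guarantees at least one active prototype
kv.update(x, target=1.0, alpha=1.0)
print(round(kv.value(x), 6))  # 1.0
```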