Published by Juliana Beasley. Modified over 8 years ago.
1
A Scalable Machine Learning Approach to Go
Pierre Baldi and Lin Wu
UC Irvine
2
Contents
Introduction to Go
Existing approaches
Our approach
Results
Conclusion & Future work
3
What is Go?
4
Black and White play alternately
Stones with zero liberties are removed from the board
The player with more territory wins
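The capture rule on this slide can be illustrated with a short sketch. It assumes a board stored as a list of lists of characters and takes a liberty to be an empty point orthogonally adjacent to a group of connected stones. The names `group_and_liberties` and `remove_captured` are hypothetical and are not taken from the presentation.

```python
EMPTY, BLACK, WHITE = ".", "X", "O"

def neighbors(r, c, size):
    """Yield the orthogonally adjacent points that lie on the board."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield nr, nc

def group_and_liberties(board, r, c):
    """Flood-fill the group containing the stone at (r, c); return (stones, liberties)."""
    color = board[r][c]
    size = len(board)
    group, liberties = {(r, c)}, set()
    stack = [(r, c)]
    while stack:
        cr, cc = stack.pop()
        for nr, nc in neighbors(cr, cc, size):
            if board[nr][nc] == EMPTY:
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in group:
                group.add((nr, nc))
                stack.append((nr, nc))
    return group, liberties

def remove_captured(board, color):
    """Remove every group of `color` with zero liberties; return the number of stones removed."""
    size = len(board)
    captured = set()
    for r in range(size):
        for c in range(size):
            if board[r][c] == color and (r, c) not in captured:
                group, libs = group_and_liberties(board, r, c)
                if not libs:
                    captured |= group
    for r, c in captured:
        board[r][c] = EMPTY
    return len(captured)

# A single white stone surrounded on all four sides by black stones.
board = [list(row) for row in (".X.", "XOX", ".X.")]
removed = remove_captured(board, WHITE)  # removes the one white stone at the centre
```

The sketch covers only capture detection; move legality (suicide, ko) and scoring are left out.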
5
Why is Go interesting?
Go is a hard game for computers.
–The best Go computer programs are easily defeated by an average human amateur
Other board games have expert-level programs
–Chess: Deep Blue (1997) & Fritz (2002)
–Checkers: Chinook (1994)
–Othello (Reversi): Logistello (2002)
–Backgammon: TD-Gammon (1992)
6
Why is Go interesting for AI?
Poses unique opportunities and challenges for AI and machine learning
–Hard to build a high-quality evaluation function
–Large branching factor: 200-300, compared with 35-40 for chess
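To give a feel for what the branching factors mean for search, the sketch below counts the move sequences a full-width search would visit at a few depths. The value 250 for Go is simply the midpoint of the 200-300 range on the slide and 35 is the low end of the chess range; both are illustrative choices, and `leaf_count` is a hypothetical helper.

```python
def leaf_count(branching_factor, depth):
    """Number of move sequences of the given length in a full-width search."""
    return branching_factor ** depth

for depth in (2, 4, 6):
    chess = leaf_count(35, depth)
    go = leaf_count(250, depth)
    print(f"depth {depth}: chess ~{chess:.2e}, Go ~{go:.2e}, ratio ~{go / chess:.0f}")
```

The ratio grows exponentially with depth, which is one reason the brute-force search that works for chess is much less practical for Go.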
7
Existing approaches
Hard-coded programs
Evaluate the next move by playing a large number of random games
Use machine learning to learn the evaluation function
8
Existing approaches ── Hard-coded programs
Hand-tailored pattern libraries
Hard-coded rules to choose among multiple pattern hits
Tactical search (or reading)
E.g. “Many Faces of Go”, “GnuGo”
9
Existing approaches ── Hard-coded programs
Pros:
–Good performance
Cons:
–Intensive manual work
–Pattern library is never complete
–Hard to manage and improve
10
Existing approaches ── Random games
Play a huge number of random games from a given position
Use the results of these games to evaluate all the legal moves
Choose the legal move with the best evaluation
E.g. Gobble, Go81
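The three steps above can be sketched on a toy game. A full Go playout engine would not fit here, so this example swaps in a simple take-away game (remove 1 to 3 stones from a pile; whoever takes the last stone wins) purely as a stand-in. All function names, the playout count, and the seed are assumptions for illustration and say nothing about how Gobble or Go81 are implemented.

```python
import random

def legal_moves(stones):
    """A move removes 1, 2, or 3 stones, never more than remain."""
    return [m for m in (1, 2, 3) if m <= stones]

def random_playout(stones, to_move, rng):
    """Play uniformly random moves until the pile is empty; return the winner (0 or 1)."""
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return to_move  # the player who took the last stone wins
        to_move = 1 - to_move

def evaluate_moves(stones, player, n_games=200, seed=0):
    """Estimate each legal move's win rate for `player` from random playouts."""
    rng = random.Random(seed)
    rates = {}
    for move in legal_moves(stones):
        remaining = stones - move
        if remaining == 0:
            rates[move] = 1.0  # taking the last stone wins immediately
            continue
        wins = sum(
            random_playout(remaining, 1 - player, rng) == player
            for _ in range(n_games)
        )
        rates[move] = wins / n_games
    return rates

def choose_move(stones, player, **kwargs):
    """Pick the legal move with the highest estimated win rate."""
    rates = evaluate_moves(stones, player, **kwargs)
    return max(rates, key=rates.get)

best = choose_move(3, player=0)  # with 3 stones left, taking all 3 wins outright
```

The cost is one batch of playouts per legal move, so it grows with both the number of legal moves and the length of a game.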
11
Existing approaches ── Random games
Pros:
–Easy to implement
–Reasonable performance
Cons:
–Small boards only; does not scale to the normal 19x19 board
12
Existing approaches ── Machine learning
Schraudolph et al., 1994
–TD(0)
–Neural network
Graepel et al., 2001
–Graph condensed by the common fate property
–SVM
Stern, Graepel, and MacKay, 2005
–Conditional Markov random field
13
Existing approaches ── Machine learning
Pros:
–Learn automatically
Cons:
–Poor performance
14
Our approach
Use scalable algorithms to learn high-quality evaluation functions automatically
Imitate the human evaluation process
15
Our approach ── Human evaluation process
Three key components
–The understanding of patterns
–The ability to combine patterns
–The ability to relate strategic rewards to tactical ones
16
Our approach ── System components
3x3 pattern library
–Learn tactical patterns automatically
A structure-rich Recursive Neural Network
–Propagate interactions between patterns
–Learn the correlation between strategic rewards (targets) and tactical rewards (inputs)
17
Our approach ── RNN architecture
Six planes
–One input plane
–One output plane
–Four hidden planes
18
Our approach ── Update sequence
19
Our approach ── Provide relevant inputs
For intersections
–Intersection type: black, white, or empty
–Influence: influence from stones of the same and of the opposite color
–Pattern stability: a statistical value calculated from 3x3 patterns
For groups
–Number of eyes
–Number of 1st, 2nd, 3rd, and 4th order liberties
–Number of liberties of the 1st and 2nd weakest opponents
20
Our approach ── Pattern stability (I)
The 9x9 board is split into 10 unique locations for 3x3 patterns, once mirror and rotation symmetries are taken into account
Stability is measured for each intersection of each pattern within each unique location
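The count of 10 unique locations can be checked with a short sketch. It assumes that a 3x3 pattern must lie fully on the board, which gives a 7x7 grid of possible positions on a 9x9 board, and that two positions are equivalent when one of the 8 rotations and mirrors of the square maps one onto the other. The function names are hypothetical.

```python
def symmetric_images(r, c, n):
    """All 8 images of cell (r, c) on an n x n grid under rotations and mirrors."""
    m = n - 1
    return [
        (r, c), (c, m - r), (m - r, m - c), (m - c, r),   # the four rotations
        (r, m - c), (m - c, m - r), (m - r, c), (c, r),   # the four mirror images
    ]

def unique_pattern_locations(board_size=9, pattern_size=3):
    """One canonical representative per symmetry class of pattern positions."""
    n = board_size - pattern_size + 1  # positions where the pattern fits on the board
    canonical = {
        min(symmetric_images(r, c, n)) for r in range(n) for c in range(n)
    }
    return sorted(canonical)

locations = unique_pattern_locations()
print(len(locations))  # 10 for a 9x9 board and 3x3 patterns
```

Under the same assumptions, the same function can be called with a different `board_size` to count locations on larger boards.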
21
Our approach ── Pattern stability (II) Ten unique pattern locations
22
Our approach ── Pattern stability (III)
23
Our approach ── Pattern stability results (I)
24
Our approach ── Pattern stability results (II)
25
Results ── Validation error
26
Results ── Results on move predictions
27
Results ── Matched move (I)
28
Results ── Matched move (II)
29
Conclusion & Future work