AI techniques for the game of Go Erik van der Werf Universiteit Maastricht / ReSound Algorithm R&D.


1 AI techniques for the game of Go Erik van der Werf Universiteit Maastricht / ReSound Algorithm R&D

2 Contents Introduction Searching techniques The Capture Game Solving Go on Small Boards Learning techniques Move Prediction Learning to Score Predicting Life & Death Estimating Potential Territory Summary of results Conclusions

3 The game of Go Deceptively simple rules: Black and White move in turns; a move places a stone on the board; surrounded stones are captured; direct repetition is forbidden (ko rule); the game is over when both players pass; the player controlling the most intersections wins.

4 Some basic terminology Block: connected stones of one colour (no diagonal connections). Liberty: adjacent empty intersection. Eye: surrounded region providing a safe liberty. Group: stones of one colour controlling a local region. Alive: group that cannot be captured. Dead: group that can eventually be captured.
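The definitions of block and liberty can be made concrete with a small flood fill. This is a minimal sketch, not code from the presentation: the board encoding (a list of strings with 'X' for Black, 'O' for White and '.' for empty) and the function name `block_and_liberties` are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def block_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Return (block, liberties) for the stone at `start`.

    A block is the set of same-coloured stones connected horizontally or
    vertically (no diagonal connections); its liberties are the empty
    intersections adjacent to any stone of the block.
    """
    rows, cols = len(board), len(board[0])
    colour = board[start[0]][start[1]]
    if colour == '.':
        raise ValueError("start must point at a stone")
    block = {start}
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # off the board
            if board[nr][nc] == '.':
                liberties.add((nr, nc))
            elif board[nr][nc] == colour and (nr, nc) not in block:
                block.add((nr, nc))
                stack.append((nr, nc))
    return block, liberties


example = [
    ".XO.",
    ".XO.",
    "..X.",
    "....",
]
black_block, black_libs = block_and_liberties(example, (0, 1))
white_block, white_libs = block_and_liberties(example, (0, 2))
```

In the example position the black stone at (2, 2) touches the two-stone black block only diagonally, so it is not part of that block.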

5 Computer Go Even the best Go programs have no chance against strong amateurs. Human players are superior in areas such as pattern recognition, spatial reasoning and learning.

6 Playing strength 29 stones handicap

7 Problem statement How can Artificial Intelligence techniques be used to improve the strength of Go programs? We focused on Searching techniques & Learning techniques

8 Searching techniques Very successful for other board games Evaluate positions by ‘thinking ahead’ Research Recognizing positions ‘that are irrelevant’ Fast heuristic evaluations Provably correct knowledge Move ordering (the best moves first) Re-use of partial results from the search process

9 The Capture Game Simplified version of Go. First to capture a stone wins the game. Passing not allowed → detecting final positions is trivial (unlike normal Go). Search method: Iterative Deepening Principal Variation Search; enhanced transposition table; move ordering using shared tables for both colours for the killer and history heuristics.
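The search method named on this slide can be illustrated on a much smaller game. The sketch below is not the author's program: it runs iterative-deepening Principal Variation Search with a transposition table on a toy subtraction game (take 1 to 3 stones, whoever takes the last stone wins), chosen only because its solution is known. The names `pvs` and `solve` are hypothetical, and the killer and history heuristics are left out; move ordering uses only the best move stored in the table.

```python
EXACT, LOWER, UPPER = 0, 1, 2


def pvs(n, depth, alpha, beta, table):
    """Negamax PVS; returns +1 (win), -1 (loss) or 0 (unknown) for the side to move."""
    if n == 0:
        return -1  # the previous player took the last stone
    if depth == 0:
        return 0   # search horizon reached: outcome unknown
    alpha_orig = alpha
    best_first = None
    entry = table.get(n)
    if entry is not None:
        e_depth, flag, value, best_first = entry
        if e_depth >= depth:  # stored result is at least as deep as needed
            if flag == EXACT:
                return value
            if flag == LOWER:
                alpha = max(alpha, value)
            else:
                beta = min(beta, value)
            if alpha >= beta:
                return value
    candidates = [k for k in (1, 2, 3) if k <= n]
    if best_first in candidates:  # try the stored best move first
        candidates.remove(best_first)
        candidates.insert(0, best_first)
    best_value, best_move = -2, candidates[0]
    for i, k in enumerate(candidates):
        if i == 0:
            value = -pvs(n - k, depth - 1, -beta, -alpha, table)
        else:
            # null-window search: only test whether move k beats alpha
            value = -pvs(n - k, depth - 1, -alpha - 1, -alpha, table)
            if alpha < value < beta:
                value = -pvs(n - k, depth - 1, -beta, -value, table)  # re-search
        if value > best_value:
            best_value, best_move = value, k
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # beta cutoff
    if best_value <= alpha_orig:
        flag = UPPER
    elif best_value >= beta:
        flag = LOWER
    else:
        flag = EXACT
    table[n] = (depth, flag, best_value, best_move)
    return best_value


def solve(n, max_depth=64):
    """Deepen one ply at a time until the root value is decided."""
    table = {}
    for depth in range(1, max_depth + 1):
        value = pvs(n, depth, -1, 1, table)
        if value != 0:
            return value, depth, table[n][3]
    return 0, max_depth, None
```

Each iteration reuses the table filled by the previous one, so the best move found at depth d is searched first at depth d + 1, which is where most of the benefit of iterative deepening comes from.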

10 Heuristic evaluation for the capture game Based on four principles: 1. Maximize liberties 2. Maximize territory 3. Connect stones 4. Make eyes. Low-order liberties (max. distance 3). Euler number (objects minus holes). Fast computation using a bit-board representation.
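The Euler number (objects minus holes) can be sketched with plain flood fills. This is an illustration only and does not reproduce the bit-board computation mentioned on the slide. Several choices are assumptions: stones of one colour form objects under 4-connectivity, everything else (including the area just outside the board) is background under 8-connectivity, and a hole is a background region that does not reach the outside. The names `count_components` and `euler_number` are hypothetical.

```python
def count_components(cells, steps):
    """Count connected components of `cells` (a set of (row, col)) under `steps`."""
    seen, count = set(), 0
    for cell in cells:
        if cell in seen:
            continue
        count += 1
        seen.add(cell)
        stack = [cell]
        while stack:
            r, c = stack.pop()
            for dr, dc in steps:
                nxt = (r + dr, c + dc)
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return count


def euler_number(board, colour='X'):
    """Objects minus holes for the stones of `colour` on a list-of-strings board."""
    rows, cols = len(board), len(board[0])
    stones = {(r, c) for r in range(rows) for c in range(cols) if board[r][c] == colour}
    # Pad the board by one cell so that all outside background is one region.
    padded = {(r, c) for r in range(-1, rows + 1) for c in range(-1, cols + 1)}
    background = padded - stones
    four = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    eight = four + [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    objects = count_components(stones, four)
    holes = count_components(background, eight) - 1  # subtract the outside region
    return objects - holes
```

Under these assumptions a solid ring of stones scores 0 (one object, one hole), while connecting stones lowers the number and enclosing regions lowers it further, which matches the stated principles of connecting stones and making eyes.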

11 Solutions for the Capture Game All boards up to 5x5 were solved. Winner decided by board-size parity. Will initiative take over at 6x6?

Board   Winner   Depth   Time (s)   Nodes (log10)
2×2     W        4       0          1.8
3×3     B        7       0          3.2
4×4     W        14      1          5.7
5×5     B        19      395        8.4
6×6     ?        >23     >10^6      >12

Figures: solution for 5×5 (Black wins); solution for 4×4 (White wins)

12 Solutions for the Capture Game on 6x6

Starting position   Stable                  Crosscut
Winner              Black                   Black
Depth               26 (+5)                 15 (+4)
Nodes (log10)       11                      8.0
Time (s)            8.3 × 10^5 (10 days)    185

Initiative takes over at 6×6

13 Solving Go on Small Boards Iterative Deepening Principal Variation Search Enhanced transposition table Exploit board symmetry Internal unconditional bounds Effective move ordering Evaluation function Heuristic component Similar to the capture game Provably correct component Benson’s algorithm for recognizing unconditional life extended with detection of unconditional territory
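One common way to exploit board symmetry is to map every position to a canonical representative before looking it up in the transposition table, so that rotated and mirrored positions share one entry. The sketch below shows that idea only; whether the original program did it this way is not stated on the slide, and the tuple-of-strings board encoding and the function names are assumptions.

```python
def symmetries(board):
    """Yield the 8 rotations and reflections of a square board (tuple of strings)."""
    b = tuple(board)
    for _ in range(4):
        yield b
        yield tuple(row[::-1] for row in b)               # mirror left-right
        b = tuple(''.join(col) for col in zip(*b[::-1]))  # rotate 90 degrees clockwise


def canonical(board):
    """Pick the lexicographically smallest symmetric variant as the table key."""
    return min(symmetries(board))


corner_a = ("X..", "...", "...")
corner_b = ("...", "...", "..X")
same_key = canonical(corner_a) == canonical(corner_b)
```

A position with no symmetry of its own has 8 distinct variants, so canonical keys can shrink the table by up to a factor of 8 on an empty-ish board.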

14 Recognizing Unconditional Territory 1. Find regions surrounded by unconditionally alive stones of one colour 2. Find interior of the regions (eyespace) 3. Remove false eyes 4. Contract eyespace around defender stones 5. Count maximum sure liberties (MSL). MSL < 2 → unconditional territory. Otherwise → play it out.

15 Solutions for Small Boards

Board   Result   Depth   Time    Nodes (log10)
2×2     draw     5       n.a.    2.1
3×3     B+9      11      n.a.    3.5
4×4     B+2      21      3.3 s   5.8
5×5     B+25     23      2.7 h   9.2

Figure: value of opening moves on 5x5: (3,2), (2,2), (3,3)

16 Learning techniques Successful in several related domains Heuristic knowledge can be ‘learned’ from analysis of human games Research Representation & Generalization Learn maximally from limited number of examples Pros and cons of different architectures Clever use of available domain knowledge

17 Move prediction Many moves in Go conform to local patterns which can be played almost reflexively Train a MLP network to rank moves Use move-pairs {expert, random} extracted from human game records Training attempts to rank expert moves first
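Training on {expert, random} move pairs amounts to a pairwise ranking objective: the scorer is penalised whenever the random move scores at least as high as the expert move. The sketch below illustrates that objective with a linear scorer and a logistic pairwise loss in place of the MLP from the slide; the loss, the learning rate, the feature vectors and the name `train_pairwise` are all assumptions chosen to keep the example small.

```python
import math


def score(w, x):
    """Linear move score; the slide uses an MLP, this stand-in keeps the sketch short."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs, dim, epochs=200, lr=0.1):
    """Fit w so that score(expert) > score(other) for each (expert, other) pair.

    Minimises log(1 + exp(-(score(expert) - score(other)))) by stochastic
    gradient descent.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for expert, other in pairs:
            diff = [e - o for e, o in zip(expert, other)]
            margin = score(w, diff)
            # d/dw of the loss is -sigmoid(-margin) * diff
            step = 1.0 / (1.0 + math.exp(margin))
            w = [wi + lr * step * di for wi, di in zip(w, diff)]
    return w


# Hypothetical two-feature move descriptions: (expert move, random move).
pairs = [
    ((1.0, 0.2), (0.0, 0.3)),
    ((0.9, 0.8), (0.1, 0.9)),
    ((0.8, 0.1), (0.2, 0.0)),
]
weights = train_pairwise(pairs, dim=2)
ranked_correctly = [score(weights, e) > score(weights, o) for e, o in pairs]
```

Because only score differences within a pair matter, the scorer never needs an absolute target value for a move, which suits game records where only the expert's choice is known.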

18 Move Prediction - Representation Selection of raw features: Edge Liberties Captures Last move Stones Ko Liberties after Nearest stones Remove symmetry by canonical ordering & colour reversal High-dimensional representation suffers from curse of dimensionality => Apply linear feature extraction to reduce dimensionality

19 Move Prediction - Feature Extraction Principal Component Analysis (PCA) Linear Discriminant Analysis (LDA) Move-Pair Analysis (MPA) Linear projection maximizing the expected quadratic distance between pairs Weakness: ignores global features Modified Eigenspace Separation Transform (MEST) Linear projection on eigenvectors with largest absolute eigenvalues of the correlation difference matrix Good results using combination of MEST & MPA Standard techniques, sub-optimal for ranking
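The one-line description of Move-Pair Analysis can be worked out as an eigenvalue problem. The derivation below is a reading of that description, not the author's published formulation; the symbols d, C and w are introduced here for illustration, and any normalisation used in the actual method is not covered.

```latex
% d = x_expert - x_random : difference of the feature vectors of a move pair
% w : projection direction with unit length
\begin{align}
  \mathbb{E}\big[(w^\top x_{\mathrm{expert}} - w^\top x_{\mathrm{random}})^2\big]
    &= \mathbb{E}\big[(w^\top d)^2\big]
     = w^\top \, \mathbb{E}\big[d\,d^\top\big] \, w
     = w^\top C\, w, \\
  \max_{\|w\|=1} w^\top C\, w
    &\;\Longrightarrow\; \nabla_w \big( w^\top C\, w - \lambda (w^\top w - 1) \big) = 0
     \;\Longrightarrow\; C\, w = \lambda\, w .
\end{align}
```

Under this reading the best projection directions are the eigenvectors of C with the largest eigenvalues, since the attained value of the objective for an eigenvector is its eigenvalue.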

20 Human & Computer Performance Compared

Player    Game 1   Game 2   Game 3   Average
3 dan     96.7     91.5     89.5     92.4
2 dan     95.8     95.0     97.0     95.9
2 kyu     95.0     91.5     92.5     92.9
MP*       90.0     89.4     89.5     89.6
2 kyu     87.5     90.8     n.a.     89.3
5 kyu     87.5     84.4     85.0     85.5
8 kyu     87.5     85.1     86.5     86.3
13 kyu    83.3     75.2     82.7     80.2
14 kyu    76.7     83.0     80.5     80.2
15 kyu    80.0     73.8     82.0     78.4

Figure: Black must choose between two red intersections

21 Performance on professional 19×19 games

Ranking   Performance
First     25 %
Top-3     45 %
Top-20    80 %

Figure: cumulative performance (%) against number of moves
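Cumulative performance figures of this kind are computed from the rank that the predictor assigns to the move actually played. A minimal sketch, assuming ranks start at 1 for the top-ranked move; the function name and the sample ranks are hypothetical and are not data from the presentation.

```python
def cumulative_performance(expert_ranks, ks=(1, 3, 20)):
    """Percentage of positions where the expert move is ranked within the top k."""
    total = len(expert_ranks)
    return {k: 100.0 * sum(1 for r in expert_ranks if r <= k) / total for k in ks}


# Hypothetical ranks of the expert move in four positions.
sample = cumulative_performance([1, 2, 5, 30])
```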

22 Learning to Score Using archives of (online) Go servers, such as NNGS, for ML is non-trivial because of: 1. Missing information: only a single numeric result is given; the status of individual board points is not available. 2. Unfinished games: humans resign early or do not finish the game at all. 3. Bad moves. To overcome 1 & 2, we need reliable final scores. Large dataset created: 18k labeled final 9x9 positions. Several tricks were used to identify dubious scores. A few thousand positions were scored or verified manually.

23 The scoring method 1. Classify life & death for all blocks 2. Remove dead blocks 3. Mark empty intersections using flood-fills or distance to nearest remaining colour 4. (Optional) recursively update representation to take adjacent block status into account; return to 1
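Steps 2 and 3 of the scoring method can be sketched as follows, assuming step 1 has already produced the set of dead stones. Only the flood-fill variant of step 3 is shown, the recursive step 4 is omitted, and the board encoding and the name `area_score` are assumptions for illustration.

```python
def area_score(board, dead):
    """Area score (Black minus White) after removing the stones listed in `dead`.

    `board` is a list of strings with 'X' (Black), 'O' (White), '.' (empty);
    `dead` is a set of (row, col) points classified as dead in step 1.
    """
    rows, cols = len(board), len(board[0])
    grid = [list(row) for row in board]
    for r, c in dead:
        grid[r][c] = '.'  # step 2: remove dead blocks
    score = {'X': 0, 'O': 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] in score:
                score[grid[r][c]] += 1  # living stones count for their owner
            elif (r, c) not in seen:
                # step 3: flood-fill the empty region and note bordering colours
                size, borders, stack = 0, set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    cr, cc = stack.pop()
                    size += 1
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                        if 0 <= nr < rows and 0 <= nc < cols:
                            if grid[nr][nc] == '.':
                                if (nr, nc) not in seen:
                                    seen.add((nr, nc))
                                    stack.append((nr, nc))
                            else:
                                borders.add(grid[nr][nc])
                if len(borders) == 1:  # region touches one colour only
                    score[borders.pop()] += size
    return score['X'] - score['O']


final_position = [
    ".X.O.",
    ".XOO.",
    ".XO..",
    ".XOX.",
    ".XO..",
]
result = area_score(final_position, dead={(3, 3)})
```

In the example the black stone at (3, 3) is dead, so the whole right side counts for White, and the single empty point at (0, 2) touches both colours and stays neutral.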

24 Blocks to Classify For final positions there are 3 types of blocks: 1. Alive (O): at the border of own territory 2. Dead (X): inside the opponent's territory 3. Irrelevant (?): removal does not change the area score → We only train on blocks of type 1 and 2!

25 Representation of the blocks
Direct features of the block: size; perimeter; adjacent opponent stones; 1st, 2nd, 3rd-order liberties; protected liberties; auto-atari liberties; adjacent opponent blocks; local majority (MD < 3); centre of mass; bounding box size
Adjacent fully accessible CERs: number of regions; size; perimeter; split points
Adjacent partially accessible CERs: number of partially accessible regions; accessible size; accessible perimeter; inaccessible size; inaccessible perimeter; inaccessible split points
Disputed territory: direct liberties of the block in disputed territory; liberties of all friendly blocks in disputed territory; liberties of all enemy blocks in disputed territory
Directly adjacent eyespace: size; perimeter
Optimistic chain: number of blocks; size; perimeter; split points; adjacent CERs; adjacent CERs with eyespace; adjacent CERs, fully accessible from at least 1 block; size of adjacent eyespace; perimeter of adjacent eyespace; external opponent liberties
Opponent blocks (3x): (1) weakest directly adjacent opponent block (weakest = block with the fewest liberties); (2) 2nd weakest directly adjacent opponent block; (3) weakest opponent block adjacent or sharing liberties with the block's optimistic chain. Features: perimeter; liberties; shared liberties; split points; perimeter of adjacent eyespace
Recursive features: predicted value of strongest adjacent friendly block; predicted value of weakest adjacent opponent block; predicted value of second weakest adjacent opponent block; average predicted value of weakest opponent block's optimistic chain; adjacent eyespace size of the weakest opponent block's optimistic chain; adjacent eyespace perimeter of the weakest opponent block's optimistic chain

26 Scoring Performance Blocks (direct/recursive classification)

Training size (blocks)   Direct error (%)   2-step error (%)   3-step error (%)   4-step error (%)
1,000                    1.93               1.60               1.52               1.48
10,000                   1.09               0.76               0.74               0.72
100,000                  0.68               0.43               0.38               0.37

Full board (4-step recursive classification): Incorrect score: 1.1% = better than the average rated NNGS player (~7 kyu). Incorrect winner: 0.5% = comparable to the average NNGS player. Average absolute score difference: 0.15 points

27 Life & Death during the game Predict whether blocks of stones can be captured. Perfect predictions are not possible in non-final positions → approximate the a posteriori probability that a block will be alive at the end of the game. 4 block types: the first 3 types are identified from the final position (as before); the 4th type, blocks captured during the game, is labelled dead. Irrelevant blocks are not used during training. Representation extended with 5 additional features: player to move, ko, distance to ko, number of black/white stones on the board. Figure label: Black blocks 50% alive

28 Performance over the game MLP, 25 hidden units, 175,000 training examples Average prediction error: 11.7%

29 Estimating Potential Territory Why estimate territory? 1. For predicting the score (potential territory) Main purpose: to build an evaluation function May also be used to adjust strategy (e.g., play safe when ahead) 2. To detect safe regions (secure territory) Main purpose: forward pruning (risky unless provably correct) Our main focus is on (1) potential territory We investigate: Direct methods, known or derived from literature ML methods, trained on game records Enhancements with (heuristic) knowledge of L&D

30 Direct methods 1. Explicit control 2. Direct control 3. Distance-based control 4. Influence based control (~ numerical dilations) 5. Bouzy’s method (numerical dilations + erosions) 6. Combinations 5+3 or 5+4 Enhancements use knowledge of Life & Death to remove dead stones (or reverse their colour)
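Of the direct methods listed, distance-based control is the simplest to illustrate: each empty intersection goes to the colour of the nearest stone and stays neutral on a tie. The sketch below assumes Manhattan distance and ignores life and death; the slide does not specify the metric, and the function name and board encoding are hypothetical.

```python
def distance_control(board):
    """Return a grid of +1 (Black), -1 (White) or 0 (neutral) per intersection.

    `board` is a list of strings with 'X' (Black), 'O' (White), '.' (empty).
    Manhattan distance to the nearest stone of each colour is an assumption.
    """
    rows, cols = len(board), len(board[0])
    stones = [(r, c, board[r][c]) for r in range(rows) for c in range(cols)
              if board[r][c] != '.']
    control = []
    for r in range(rows):
        row = []
        for c in range(cols):
            if board[r][c] == 'X':
                row.append(1)
            elif board[r][c] == 'O':
                row.append(-1)
            else:
                nearest = {'X': float('inf'), 'O': float('inf')}
                for sr, sc, colour in stones:
                    dist = abs(r - sr) + abs(c - sc)
                    nearest[colour] = min(nearest[colour], dist)
                if nearest['X'] < nearest['O']:
                    row.append(1)
                elif nearest['X'] > nearest['O']:
                    row.append(-1)
                else:
                    row.append(0)  # equidistant, or no stones at all
            control.append(row) if c == cols - 1 else None
    return control


territory = distance_control(["X...O"])
estimated_score = sum(sum(row) for row in territory)
```

Summing the grid gives a crude score estimate; the enhancement described on the slide would first remove dead stones (or reverse their colour) before measuring distances.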

31 ML methods Simple representation: intersections in ROI, colour {+1 black, -1 white, 0 empty}. Enhanced representation: intersections in ROI, colour x Prob.(Alive); edge; colour of nearest stone; colour of nearest living stone. Prob.(Alive) obtained from pre-trained MLP. Figure: features → predicted colour (+1 sure black, 0 neutral, -1 sure white)

32 Performance at various levels of confidence

33 Predicting the winner (percentage correct)

34 Predicting the score (absolute error)

35 Summary: Searching Techniques The capture game Simplified Go rules (who captures the first stone wins) boards up to 6x6 solved Go on small boards Normal Go rules First program in the world to have solved 5x5 Go Perfect solutions up to ~30 intersections Heuristic knowledge required for larger boards

36 Summary: Learning Techniques 1 Move prediction Very good results (strong kyu level) Strong play is possible with limited selection of moves Scoring final positions Excellent classification Reliable training data

37 Summary: Learning Techniques 2 Predicting life and death Good results Most important ingredient for accurate evaluation of positions during the game Estimating potential territory Comparison of non-learning and learning methods Best results with learning methods

38 Conclusions Knowledge is the most important ingredient to improve Go programs Searching techniques Provably correct knowledge sufficient for solving small problems up to ~30 intersections Heuristic knowledge essential for larger problems Learning techniques Heuristic knowledge learned quite well from games Learned heuristic knowledge at least at the level of reasonably strong kyu players

39 Questions? More information at: http://erikvanderwerf.tengen.nl/

