Reinforcement Learning of Local Shape in the Game of Atari-Go
David Silver

1 Reinforcement Learning of Local Shape in the Game of Atari-Go (David Silver)

2 Local shape
   A local shape describes a pattern of stones
   Corresponds to expert Go knowledge
      Joseki (corner patterns)
      Tesuji (tactical patterns)
   Used extensively in the current strongest programs
      Pattern databases
   Expert Go knowledge is difficult to extract and to enter into pattern databases
   Focus of this work:
      Explicitly learn local shapes through experience
      Learn a value for the goodness of each shape

3 Prior work
   Supervised learning of local shapes
      Local move prediction [Stoutamire, Werf]
      Mimics strong play rather than learning to evaluate and understand positions
   Reinforcement learning of neural networks
      TD(0) [Schraudolph, Enzenberger]
      Shape represented implicitly, difficult to interpret
      Limited in scope by network architecture

4 System architecture

5 Feature types
   Each local shape feature has a type
   The type specifies a window size
      1x1, 2x1, 2x2, 3x2, 3x3
   The type specifies a weight sharing method
      Location invariant, location dependent
   All possible configurations are enumerated for each feature type
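A hypothetical sketch of how the features of one type might be enumerated, following the window sizes and weight-sharing methods listed above; the function and constant names are illustrative and not taken from the talk.

```python
from itertools import product

# Illustrative sketch (not from the talk): a local shape feature is a window size, an
# anchor position (only for location-dependent features), and one of 3^(w*h) stone
# configurations (empty / black / white per point in the window).

WINDOW_SIZES = [(1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
EMPTY, BLACK, WHITE = 0, 1, 2

def enumerate_features(board_size, width, height, location_invariant):
    """Enumerate every local shape feature of one feature type."""
    configs = product([EMPTY, BLACK, WHITE], repeat=width * height)
    features = []
    for config in configs:
        if location_invariant:
            # One shared feature (and weight) per configuration, regardless of position.
            features.append((None, config))
        else:
            # One feature per configuration and anchor position on the board.
            for x in range(board_size - width + 1):
                for y in range(board_size - height + 1):
                    features.append(((x, y), config))
    return features

# e.g. 2x2 shapes on a 5x5 board:
print(len(enumerate_features(5, 2, 2, location_invariant=True)))   # 3^4 = 81
print(len(enumerate_features(5, 2, 2, location_invariant=False)))  # 81 * 16 anchors = 1296
```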

6 Local shape features

7 Weight sharing: location dependent

8 Weight sharing: location invariant

9 Partial ordering of feature types
   There is a partial ordering > over the generality of feature types
      Small windows > large windows
      Location invariant > location dependent
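One possible encoding of this generality ordering, assuming window sizes are compared by area and location-invariant sharing counts as more general than location-dependent sharing; the exact definition used in the work is not given in this transcript.

```python
from typing import NamedTuple

class FeatureType(NamedTuple):
    width: int
    height: int
    location_invariant: bool

def at_least_as_general(f: FeatureType, g: FeatureType) -> bool:
    """True if feature type f is at least as general as feature type g (f >= g)."""
    smaller_window = f.width * f.height <= g.width * g.height
    more_shared = f.location_invariant or not g.location_invariant
    return smaller_window and more_shared

# 1x1 location-invariant is more general than 3x3 location-dependent:
print(at_least_as_general(FeatureType(1, 1, True), FeatureType(3, 3, False)))  # True
# 2x2 invariant and 1x1 dependent are incomparable (neither >= the other):
print(at_least_as_general(FeatureType(2, 2, True), FeatureType(1, 1, False)))  # False
print(at_least_as_general(FeatureType(1, 1, False), FeatureType(2, 2, True)))  # False
```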

10 Value function
   Reward of +1 for winning, 0 for losing
   The value function gives the probability of winning
   Move selection is done by 1-ply greedy search over the value function
   The value function is approximated by a weighted sum of local shape features
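A minimal sketch of the value function and the 1-ply greedy move selection. The slide describes a weighted sum of local shape features read as a winning probability; squashing the sum through a logistic function is an assumption, and active_features, legal_moves, and play are hypothetical helpers.

```python
import math

def value(position, weights, active_features):
    """V(s): probability of winning, from a weighted sum of the active local shape features."""
    total = sum(weights.get(f, 0.0) for f in active_features(position))
    return 1.0 / (1.0 + math.exp(-total))   # logistic squashing is an assumption

def greedy_move(position, legal_moves, play, weights, active_features):
    """1-ply greedy search: pick the move whose successor position has the highest value."""
    return max(legal_moves(position),
               key=lambda move: value(play(position, move), weights, active_features))
```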

11 Learning algorithm
   Weights initialised to zero
   Weights updated by TD(0)
   No explicit exploration
   Step size set to 0.1/n
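A sketch of the TD(0) weight update under the same assumptions as above: weights start at zero, the value is a logistic of the weighted feature sum, and the step size 0.1/n is read here as 0.1 divided by the number of active features (the transcript does not define n).

```python
import math

def td0_update(weights, features_s, features_s_next, reward, terminal):
    """One semi-gradient TD(0) update after the transition s -> s_next."""
    def v(features):
        total = sum(weights.get(f, 0.0) for f in features)
        return 1.0 / (1.0 + math.exp(-total))

    v_s = v(features_s)
    target = reward if terminal else v(features_s_next)   # reward: +1 win, 0 loss
    td_error = target - v_s
    step_size = 0.1 / max(1, len(features_s))   # slide: 0.1/n; n assumed = active features
    grad = v_s * (1.0 - v_s)                    # gradient of the logistic output (assumption)
    for f in features_s:
        weights[f] = weights.get(f, 0.0) + step_size * td_error * grad
```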

12 Minimum liberty opponent
   To evaluate a position s:
      Find the block of either colour with the fewest liberties
      Set col_min to the colour of the minimum liberty block
      Set lib_min to its number of liberties
      If both players have a block with lib_min liberties, col_min is set to the minimum liberty player
      Evaluate the position according to a fixed function of col_min and lib_min (formula given on the slide)
   Select moves with 1-ply greedy search
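The evaluation formula itself is not reproduced in this transcript, so the scoring function below is only a plausible stand-in for the minimum liberty heuristic: positions score higher the fewer liberties the opponent's weakest block has, and lower the fewer liberties one's own weakest block has. blocks_with_liberties is a hypothetical helper and the tie-breaking rule is a guess.

```python
def min_liberties(position, colour, blocks_with_liberties):
    """Fewest liberties of any block of the given colour; helper yields (block, libs) pairs."""
    return min(libs for _, libs in blocks_with_liberties(position, colour))

def evaluate(position, us, them, blocks_with_liberties):
    """Plausible stand-in for the minimum liberty evaluation (not the slide's exact formula)."""
    lib_us = min_liberties(position, us, blocks_with_liberties)
    lib_them = min_liberties(position, them, blocks_with_liberties)
    if lib_them <= lib_us:
        return 1.0 / lib_them    # opponent owns the minimum liberty block: good for us
    return -1.0 / lib_us         # we own the minimum liberty block: bad for us
```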

13 Training procedure
   A random policy rarely beats the minimum liberty player
   So train against an improving opponent:
      The opponent plays some random moves, enough that the agent wins about 50% of games
      Random moves are reduced as the agent improves
      Eventually there are no random moves
   Testing is always performed against the full opponent (no random moves)
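A sketch of the improving opponent, under the assumption that the probability of a random move is nudged after each game to hold the agent's win rate near 50% and shrinks to zero as the agent improves; the exact schedule is not given on the slide. min_liberty_move and random_legal_move are hypothetical helpers.

```python
import random

class AnnealedOpponent:
    """Minimum liberty opponent weakened by occasional random moves, annealed over training."""

    def __init__(self, random_prob=1.0, adjust=0.01):
        self.random_prob = random_prob  # chance of playing a random move
        self.adjust = adjust            # how quickly the probability is annealed

    def select_move(self, position, min_liberty_move, random_legal_move):
        if random.random() < self.random_prob:
            return random_legal_move(position)
        return min_liberty_move(position)

    def record_result(self, agent_won):
        # Keep the agent's win rate near 50%: harder opponent if it won, easier if it lost.
        delta = -self.adjust if agent_won else self.adjust
        self.random_prob = min(1.0, max(0.0, self.random_prob + delta))
```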

14 Results on 5x5 board
   Different combinations of feature types were tried:
      Just one feature type F
      All feature types as or more general than F
      All feature types as or less general than F
   Percentage of wins during testing after 25,000 training games

15 Results on 5x5 board
   Single specified feature set, location invariant

16 Results on 5x5 board
   All feature sets as or more general than the specified set

17 Board growing
   Local shape features have a direct interpretation
   The same interpretation applies to different board sizes
   So transfer knowledge from one board size to the next
   Learn key concepts rapidly and extend them to more difficult contexts
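A sketch of the weight transfer when the board is grown. Location-invariant features keep exactly the same meaning on the larger board, so their weights carry over directly; how location-dependent features were re-anchored in the original work is not stated here, so they are simply copied wherever the same feature key remains valid.

```python
def grow_board(old_weights, new_features):
    """Initialise the larger board's weights from those learned on the smaller board.

    old_weights  -- dict mapping feature keys to weights learned on the small board
    new_features -- iterable of feature keys enumerated for the larger board
    """
    # Features unseen on the small board (e.g. new anchor positions) start at zero.
    return {f: old_weights.get(f, 0.0) for f in new_features}
```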

18 Cascading errors
   A separate TD-error is calculated for each feature type
   Helps preserve meaning between contexts
   The TD-error for feature type F is calculated from all features with type F' ≥ F
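One reading of this slide, sketched below: the value estimate entering the TD-error for feature type F sums only the features whose type is at least as general as F, so more specific types learn corrections on top of the general ones. The original formulation may differ; type_of and at_least_as_general are hypothetical helpers.

```python
def value_up_to(weights, active_features, ftype, type_of, at_least_as_general):
    """Weighted sum restricted to features whose type is at least as general as ftype."""
    return sum(weights.get(f, 0.0) for f in active_features
               if at_least_as_general(type_of(f), ftype))

def cascaded_td_errors(weights, feats_s, feats_s_next, reward, terminal,
                       feature_types, type_of, at_least_as_general):
    """Return a separate TD-error for every feature type."""
    errors = {}
    for ftype in feature_types:
        v_s = value_up_to(weights, feats_s, ftype, type_of, at_least_as_general)
        target = reward if terminal else value_up_to(
            weights, feats_s_next, ftype, type_of, at_least_as_general)
        errors[ftype] = target - v_s
    return errors
```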

19 Board growing results
   Board grown from 5x5 up to 9x9
   Board size increased when winning 90% of games
   Weights transferred from the previous size
   Percentage of wins shown during training

20 Shapes learned

21 Example game
   7x7 board
   Agent plays black
   Minimum liberty opponent plays white
   The agent has learned strategic concepts:
      Keeping stones connected
      Building territory
      Controlling corners

22 Conclusions
   Local shape knowledge can be learnt explicitly and directly from experience
   A multi-scale representation helps to learn quickly and provides fine differentiation
   The knowledge is easily interpretable and can be transferred to different board sizes
   The combined knowledge of local shapes is sufficient to express global strategic concepts

23 Future work
   Stronger opponents; real Go rather than Atari-Go
   Learn shapes selectively rather than enumerating all possible shapes
   Learn shapes to answer specific questions:
      Can black B4 be captured?
      Can white connect A2 to D5?
   Learn non-local shape:
      Use connectivity relationships
      Build hierarchies of shapes

