Honte, a Go-Playing Program Using Neural Nets Frederik Dahl.


1 Honte, a Go-Playing Program Using Neural Nets Frederik Dahl

2 Combined approach
Supervised learning
  - Shape evaluation
Reinforcement learning
  - Group safety
  - Territory
Heuristic evaluation
  - Influence
Search
  - Capture
  - Connectivity
  - Life and death

3 Architecture

4 Shape evaluation: Multilayer perceptron
190 inputs
  - Receptive field of radius 3
  - Distance to edge
  - Liberties
  - Captured stones
50 hidden nodes
Single output
  - Will an expert play here?
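The layer sizes on this slide can be sketched as a small forward pass. This is only an illustration under assumptions: the slides give the sizes (190 inputs, 50 hidden nodes, 1 output) but not the activation functions or the weight initialisation, so the tanh hidden layer, the sigmoid output, and the names `init_mlp` and `forward` are hypothetical choices.

```python
import math
import random

N_INPUTS, N_HIDDEN = 190, 50  # layer sizes taken from the slide


def init_mlp(n_in=N_INPUTS, n_hidden=N_HIDDEN, seed=0):
    """Create small random weights (the initialisation scheme is an assumption)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_in)
    return {
        "w1": [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-scale, scale) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return (output, hidden activations) for one input vector x."""
    # Hidden layer: tanh units (assumed; not stated on the slide).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    # Single sigmoid output, read as "how likely is an expert to play here?"
    z = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return 1.0 / (1.0 + math.exp(-z)), hidden
```

A call such as `forward(init_mlp(), features)` expects `features` to be a list of 190 numbers encoding the receptive field, edge distance, liberties and captured stones; how Honte encodes those features is not described in the slides.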

5 Shape evaluation: Training and performance
Trained on 400 expert games
  - Expert move used as positive example (+1)
  - Random legal move as negative example (0)
Error backpropagation
  - error = target - eval
Performance measured by treating prediction as evaluation function
What percentage of legal moves are ranked below the expert move?
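The training targets and the ranking measure described on this slide could be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the slides: the negative example is drawn from legal moves other than the expert move, and the percentage is taken over the other legal moves only.

```python
import random


def make_training_pairs(legal_moves, expert_move, rng):
    """Return (move, target) pairs: expert move -> 1, one random legal move -> 0."""
    pairs = [(expert_move, 1.0)]
    # Assumption: the negative example excludes the expert move itself.
    others = [m for m in legal_moves if m != expert_move]
    if others:
        pairs.append((rng.choice(others), 0.0))
    return pairs


def percent_ranked_below(scores, expert_move):
    """Percentage of the other legal moves scored strictly below the expert move.

    scores maps every legal move to the network's evaluation of it.
    """
    expert_score = scores[expert_move]
    others = [s for m, s in scores.items() if m != expert_move]
    if not others:
        return 100.0
    below = sum(1 for s in others if s < expert_score)
    return 100.0 * below / len(others)
```

For instance, if the expert move scores 0.9 and the three other legal moves score 0.1, 0.5 and 0.95, two of the three are ranked below it.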

6 Shape evaluation: Results

7 Local search
Selective search for local goals
  - Capture
  - Connectivity
  - Life and death
Only considers moves suggested by shape evaluating network
  - Deep and narrow search
  - Captures common-sense knowledge
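The idea of a deep but narrow search can be illustrated with a generic goal search that only expands the few moves a scoring function ranks highest. This is a sketch, not Honte's search: the slides do not describe the algorithm, so the attacker/defender AND/OR formulation, the `width` parameter, and the toy `CountingGame` (a stand-in for a real capture or connection goal) are all assumptions made for illustration.

```python
class CountingGame:
    """Toy stand-in for a local goal: players alternately add 1 or 2 to a counter."""

    def __init__(self, goal, prefer_big=True):
        self.goal = goal
        self.prefer_big = prefer_big

    def goal_reached(self, state):
        return state >= self.goal

    def legal_moves(self, state):
        return [1, 2]

    def play(self, state, move):
        return state + move

    def shape_score(self, state, move):
        # Plays the role of the shape network's move ranking.
        return move if self.prefer_big else -move


def selective_search(game, state, depth, attacker_to_move=True, width=3):
    """Return True if the attacker can force the goal within `depth` plies,
    looking only at the `width` best-scored moves at each node."""
    if game.goal_reached(state):
        return True
    if depth == 0:
        return False
    ranked = sorted(game.legal_moves(state),
                    key=lambda m: game.shape_score(state, m), reverse=True)
    candidates = ranked[:width]
    if not candidates:
        return False
    outcomes = (
        selective_search(game, game.play(state, m), depth - 1,
                         not attacker_to_move, width)
        for m in candidates
    )
    # The attacker needs one winning move; the defender must fail with every reply.
    return any(outcomes) if attacker_to_move else all(outcomes)
```

With `width=1` the result depends entirely on the move ranking, which shows both the benefit (a very small tree) and the risk (a good move that is ranked low is never examined) of searching only suggested moves.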

8 Group safety evaluation: Multilayer perceptron
Groups defined by connectable blocks
13 inputs
  - Number of stones in group
  - Number of liberties in group
  - Number of proven eyes
  - Average opponent influence over liberties
20 hidden nodes
1 output
  - Probability of group survival

9 Group safety evaluation: Temporal difference learning
Trained by self-play
Reward signal for the group is the average final safety of stones
  - 0 = captured
  - 1 = survived
TD(0) is used, replaying games backwards
Very simple idea:
  - error = eval(next) - eval(now)
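The backward TD(0) replay can be sketched with a lookup table in place of the network. This is a simplification under assumptions: Honte updates a multilayer perceptron, whereas the table, the default estimate of 0.5, the learning rate `alpha`, and the choice to use the already-updated estimate of the next position as the target are all illustrative.

```python
def td0_backward(values, episode, reward, alpha=0.1):
    """Replay one game backwards and apply TD(0) updates.

    values  : dict mapping a position key to its current safety estimate
    episode : list of position keys in the order they were played
    reward  : final outcome for the group (0 = captured, 1 = survived)
    """
    target = reward  # the last position is trained towards the final reward
    for state in reversed(episode):
        current = values.get(state, 0.5)  # 0.5 = "unknown" (assumed default)
        error = target - current          # error = eval(next) - eval(now)
        values[state] = current + alpha * error
        # Assumption: the freshly updated estimate is the target one step earlier.
        target = values[state]
    return values
```

Replaying from the end means the final outcome reaches early positions in a single pass over the game instead of one step per game.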

10 Influence evaluation
Consider random walks from an intersection
How likely to end up at a black or white stone?
Can also take account of group safety estimates
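A Monte Carlo estimate is one way to picture the random-walk idea. The slides do not say how Honte computes influence (it may well be done analytically), so the sampling approach, the uniform choice among neighbouring intersections, the step limit, and the function name are assumptions; group safety weighting is left out.

```python
import random


def random_walk_influence(board, start, n_walks=1000, max_steps=100, seed=0):
    """Estimate how often a random walk from `start` first reaches a black or
    white stone. board is a list of equal-length strings using 'B', 'W', '.'.
    Returns (black_fraction, white_fraction); walks that hit the step limit
    count for neither colour."""
    rng = random.Random(seed)
    rows, cols = len(board), len(board[0])
    hits = {"B": 0, "W": 0}
    for _ in range(n_walks):
        r, c = start
        for _ in range(max_steps + 1):
            if board[r][c] in hits:
                hits[board[r][c]] += 1
                break
            # Step to a uniformly chosen on-board neighbour (assumed rule).
            neighbours = [
                (r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            if not neighbours:
                break
            r, c = rng.choice(neighbours)
    return hits["B"] / n_walks, hits["W"] / n_walks
```

Calling `random_walk_influence(["B.W"], (0, 1))` starts between one stone of each colour, so every walk ends after a single step and the two fractions sum to 1.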

11 Territory evaluation
Another multilayer perceptron
4 inputs
  - Revised influence (for both sides)
  - Distance from edge
10 hidden nodes
1 output
  - Predicted territory value
Trained by TD(0) using eventual territory value as reward signal

12 Playing strength
Playing 19x19 Go
  - Approximately even against Handtalk 97-06e
  - Wins more than 50% against Ego 1.0
Weaknesses
  - Confuses group safety with group strength
  - Has no concept of the aji of a group

13 Recent work
New version of WinHonte 1.03
  - Neural net to evaluate sente/gote
Trial version available online!

14 Conclusions
Go knowledge can be learned
Combining different forms of knowledge can be a good idea
Multilayer perceptrons provide a flexible representation
Local search can be used effectively as input features for learning

