Honte, a Go-Playing Program Using Neural Nets
Frederik Dahl

Combined approach
  Supervised learning
    Shape evaluation
  Reinforcement learning
    Group safety
    Territory
  Heuristic evaluation
    Influence
  Search
    Capture
    Connectivity
    Life and death

Architecture

Shape evaluation: Multilayer perceptron
  190 inputs
    Receptive field of radius 3
    Distance to edge
    Liberties
    Captured stones
  50 hidden nodes
  Single output
    Will an expert play here?

Shape evaluation: Training and performance
  Trained on 400 expert games
    Expert move used as positive example (+1)
    Random legal move as negative example (0)
  Error backpropagation
    error = target - eval
  Performance measured by treating prediction as evaluation function
  What percentage of legal moves are ranked below the expert move?
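The training rule and the ranking measure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Honte's code: the tanh hidden units, the sigmoid output, the learning rate, the weight initialisation and the names `ShapeNet` and `percent_ranked_below` are all hypothetical. Only the layer sizes (190 inputs, 50 hidden nodes, one output), the +1/0 targets and `error = target - eval` come from the slides.

```python
import math
import random


class ShapeNet:
    """Small multilayer perceptron: n_in inputs -> n_hidden tanh units -> 1 sigmoid output.

    The activation functions are assumptions; the slides give only the layer sizes.
    """

    def __init__(self, n_in=190, n_hidden=50, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, 1.0 / (1.0 + math.exp(-z))

    def evaluate(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.1):
        """One backpropagation step on a single example (squared-error loss)."""
        hidden, out = self.forward(x)
        error = target - out                    # error = target - eval
        delta_out = error * out * (1.0 - out)   # back through the sigmoid output
        for j, h in enumerate(hidden):
            # Hidden delta uses the output weight as it was before this update.
            delta_h = delta_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] += lr * delta_out * h
            self.b1[j] += lr * delta_h
            row = self.w1[j]
            for i, xi in enumerate(x):
                row[i] += lr * delta_h * xi
        self.b2 += lr * delta_out
        return error


def percent_ranked_below(expert_score, other_scores):
    """Percentage of the other legal moves scored strictly below the expert move."""
    if not other_scores:
        return 100.0
    below = sum(1 for s in other_scores if s < expert_score)
    return 100.0 * below / len(other_scores)
```

In use, each training position would contribute one call with the expert move's features and target 1, and one call with a random legal move's features and target 0. How the 190 input features are encoded is not given in the slides, so it is left out here.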

Shape evaluation: Results

Local search
  Selective search for local goals
    Capture
    Connectivity
    Life and death
  Only considers moves suggested by shape evaluating network
    Deep and narrow search
    Captures common-sense knowledge
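One way to picture a deep and narrow goal-directed search is the sketch below. It is a generic illustration rather than Honte's algorithm: the function `selective_search`, its parameters, the `width` cut-off and the toy liberty-counting game are all hypothetical. The only idea taken from the slides is that candidate moves come from a move-ranking function (standing in for the shape network) and that everything else is ignored.

```python
def selective_search(state, depth, attacker_to_move, suggest, play, goal_reached, width=2):
    """Return True if the attacker can force the local goal within `depth` plies.

    Only the `width` best-ranked moves from `suggest` are tried at each node,
    so the tree stays narrow and can be searched deeply. Because the defender's
    replies are pruned too, a True result is a heuristic, not a proof.
    """
    if goal_reached(state):
        return True
    if depth == 0:
        return False
    moves = suggest(state, attacker_to_move)[:width]
    if not moves:
        return False
    results = (
        selective_search(play(state, move, attacker_to_move), depth - 1,
                         not attacker_to_move, suggest, play, goal_reached, width)
        for move in moves
    )
    # Attacker needs one working move; every defender reply must fail.
    return any(results) if attacker_to_move else all(results)


# Toy stand-in for a capture problem: the state is a block's liberty count.
def toy_suggest(state, attacker_to_move):
    return [-1] if attacker_to_move else [0]   # attacker fills a liberty, defender passes


def toy_play(state, move, attacker_to_move):
    return state + move


def toy_goal(state):
    return state == 0                           # no liberties left: captured


can_capture = selective_search(2, 3, True, toy_suggest, toy_play, toy_goal)
```

With two liberties and a passing defender, the attacker needs three plies (fill, pass, fill), so `can_capture` is True at depth 3 and the same call with depth 2 returns False.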

Group safety evaluation: Multilayer perceptron
  Groups defined by connectable blocks
  13 inputs
    Number of stones in group
    Number of liberties in group
    Number of proven eyes
    Average opponent influence over liberties
  20 hidden nodes
  1 output
    Probability of group survival

Group safety evaluation: Temporal difference learning
  Trained by self-play
  Reward signal for the group is the average final safety of stones
    0 = captured
    1 = survived
  TD(0) is used, replaying games backwards
  Very simple idea:
    error = eval(next) - eval(now)
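The backward replay can be sketched as follows. To keep the example short, a logistic-linear evaluator is swapped in for the 13-20-1 perceptron; that substitution, the learning rate and the names `evaluate` and `td0_replay_backwards` are assumptions made for illustration. What is taken from the slides is the TD(0) error `eval(next) - eval(now)`, the final safety (0 to 1) as the terminal target, and the replay order from the last position to the first.

```python
import math


def evaluate(weights, features):
    """Estimated survival probability of a group (logistic-linear stand-in)."""
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def td0_replay_backwards(weights, positions, final_safety, lr=0.1):
    """Update `weights` in place from one game's positions for a single group.

    positions    -- feature vectors for the group, in the order they occurred
    final_safety -- reward at the end of the game: 0 = captured, 1 = survived
    """
    target = final_safety                       # the last position learns from the reward
    for features in reversed(positions):
        now = evaluate(weights, features)
        error = target - now                    # eval(next) - eval(now) for earlier positions
        slope = now * (1.0 - now)               # derivative of the logistic output
        for i, f in enumerate(features):
            weights[i] += lr * error * slope * f
        # The freshly updated estimate becomes the target for the preceding position.
        target = evaluate(weights, features)
    return weights
```

Replaying backwards means each position is trained towards a successor estimate that has already absorbed the game's outcome, so the reward reaches early positions within a single pass over the game.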

Influence evaluation
  Consider random walks from an intersection
  How likely to end up at a black or white stone?
  Can also take account of group safety estimates
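A Monte Carlo sketch of the random-walk idea is given below. The slides do not say how the walks are defined or whether the probabilities are sampled or computed exactly, so the uniform choice among neighbouring intersections, the step limit, the board encoding (`"B"`, `"W"`, `"."`) and the name `random_walk_influence` are all assumptions. Weighting by group safety estimates is not attempted.

```python
import random


def random_walk_influence(board, start, walks=1000, max_steps=50, seed=0):
    """Estimate how often a random walk from `start` first reaches a black or white stone.

    board -- list of equal-length strings, one per row, using "B", "W" and "."
    start -- (row, column) of the intersection being evaluated
    Returns (black_fraction, white_fraction); walks that reach no stone
    within `max_steps` moves count for neither side.
    """
    rng = random.Random(seed)
    size = len(board)
    hits = {"B": 0, "W": 0}
    for _ in range(walks):
        r, c = start
        steps = 0
        while board[r][c] not in hits and steps < max_steps:
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < size and 0 <= c + dc < size]
            r, c = rng.choice(neighbours)
            steps += 1
        if board[r][c] in hits:
            hits[board[r][c]] += 1
    return hits["B"] / walks, hits["W"] / walks


example_board = [
    "B....",
    ".....",
    ".....",
    ".....",
    "....W",
]
black, white = random_walk_influence(example_board, (0, 1))
```

On this board the point next to the black stone should come out with a much larger black fraction than white fraction, and an intersection that already holds a stone returns 1.0 for that colour.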

Territory evaluation
  Another multilayer perceptron
  4 inputs
    Revised influence (for both sides)
    Distance from edge
  10 hidden nodes
  1 output
    Predicted territory value
  Trained by TD(0) using eventual territory value as reward signal

Playing strength
  Playing 19x19 Go
    Approximately even against Handtalk 97-06e
    Wins more than 50% against Ego 1.0
  Weaknesses
    Confuses group safety with group strength
    Has no concept of the aji of a group

Recent work
  New version of WinHonte 1.03
    Neural net to evaluate sente/gote
  Trial version available online!

Conclusions
  Go knowledge can be learned
  Combining different forms of knowledge can be a good idea
  Multilayer perceptrons provide a flexible representation
  Local search can be used effectively as input features for learning
