Learning Shape in Computer Go
David Silver

A brief introduction to Go
Black and white take turns to place stones.
Once played, a stone cannot move.
The aim is to surround the most territory.
Usually played on a 19x19 board.

Capturing
The lines radiating from a stone are called liberties.
If a connected group of stones has all of its liberties removed, then it is captured.
Captured stones are removed from the board.
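
The capture rule above can be sketched as a flood fill that collects a connected group and the empty points around it. This is an illustrative sketch, not code from the talk; the board encoding (0 empty, 1 black, 2 white) is an assumption.

```python
def liberties(board, row, col):
    """Return the set of empty points adjacent to the group at (row, col)."""
    size = len(board)
    colour = board[row][col]
    assert colour != 0, "no stone at this point"
    group, libs, stack = {(row, col)}, set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == 0:
                    libs.add((nr, nc))          # empty neighbour = liberty
                elif board[nr][nc] == colour and (nr, nc) not in group:
                    group.add((nr, nc))         # same-colour stone joins the group
                    stack.append((nr, nc))
    return libs
```

A group whose set of liberties is empty is captured and removed.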

Atari Go (Capture Go)
Atari Go is a simplified version of Go: the winner is the first player to capture.
Often used to teach Go to beginners, it circumvents several tricky issues:
 The game only finishing by agreement
 Ko (local repetitions of position)
 Seki (local stalemates)

Computer Go
Computer Go programs are very weak:
 The search space is too large for brute-force techniques
 There are no good evaluation functions
Human intuition (shape knowledge) has proven difficult to capture. Why not learn shape knowledge, and use it to learn an evaluation function?

Local shape
Local shape describes a pattern of stones.
It is used extensively by current Computer Go programs (pattern databases).
Inputting local shape by hand takes many years of hard labour. We would like to:
 Learn local shapes by trial and error
 Assign a value for the goodness of a shape: just how good is a particular shape?

Enumerating local shapes
In these experiments all possible local shapes, up to a small maximum size (e.g. 2x2), are used as features.
A local shape is defined to be:
 A particular configuration of stones
 At a canonical position on the board
Local shapes are used as binary features by the learning algorithm.
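
One way to picture this enumeration: every 2x2 window at every board position maps to one binary feature, indexed by its configuration (3^4 = 81 configurations per window position). The encoding below (0 empty, 1 black, 2 white) and the function names are illustrative assumptions, not from the slides.

```python
def shape_index(window):
    """Map a tuple of cell values (0 empty, 1 black, 2 white) to a base-3 index."""
    idx = 0
    for cell in window:
        idx = idx * 3 + cell
    return idx

def active_features(board, size=2):
    """Yield (position, shape_index) for every size x size window on the board."""
    n = len(board)
    for r in range(n - size + 1):
        for c in range(n - size + 1):
            window = tuple(board[r + dr][c + dc]
                           for dr in range(size) for dc in range(size))
            yield (r, c), shape_index(window)
```

Exactly one feature is active per window position, which is what makes the features binary.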

Invariances
Each canonical local shape can be:
 Rotated
 Reflected
 Inverted
So each position may cause updates to multiple instances of each feature.
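
The three invariances above can be sketched by generating the 8 rotations/reflections of a square pattern, each also colour-inverted, giving up to 16 equivalent instances of one canonical shape. A sketch under the same assumed encoding (0 empty, 1 black, 2 white):

```python
def symmetries(pattern):
    """Return all rotated/reflected/colour-inverted variants of a square pattern
    (a tuple of row-tuples)."""
    def rot(p):                      # rotate 90 degrees clockwise
        return tuple(zip(*p[::-1]))
    def flip(p):                     # horizontal reflection
        return tuple(row[::-1] for row in p)
    def invert(p):                   # swap black and white stones
        swap = {0: 0, 1: 2, 2: 1}
        return tuple(tuple(swap[v] for v in row) for row in p)
    variants, p = set(), pattern
    for _ in range(4):               # 4 rotations, each with its reflection
        p = rot(p)
        variants.add(p)
        variants.add(flip(p))
    variants |= {invert(v) for v in variants}
    return variants
```

Symmetric patterns produce fewer than 16 distinct variants, which is why a position may update a different number of instances for different features.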

Algorithm
A value function is learnt for afterstates.
Move selection is by 1-ply greedy search (ε = 0) over the value function:
 Active local shapes are identified
 A linear combination is taken
 A sigmoid squashing function is applied
Backups are performed using TD(0), with a reward of +1 for winning and 0 for losing.
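
A minimal sketch of this learning rule: a linear combination of the weights of the active binary shape features, squashed by a sigmoid, updated by TD(0). With a 0/1 terminal reward and no discounting, the TD target at a terminal transition is just the game outcome. The learning rate and the list-of-indices feature representation are illustrative assumptions.

```python
import math

def value(weights, features):
    """V(s) = sigmoid of the summed weights of the active binary features."""
    return 1.0 / (1.0 + math.exp(-sum(weights[f] for f in features)))

def td0_update(weights, features, v, v_next, alpha=0.1):
    """TD(0) step: w_f += alpha * delta * dV/dw_f, where dV/dw_f = v*(1-v)
    for each active feature f. At terminal states v_next is the reward (1 or 0)."""
    delta = v_next - v
    for f in features:
        weights[f] += alpha * delta * v * (1.0 - v)
```

With all weights at zero the value is 0.5, so early play is essentially arbitrary until wins and losses push the shape weights apart.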

Value function approximation

Training procedure
The challenge: learn to beat the average liberty player. So the learning algorithm was trained specifically against the average liberty player.
The problem: learning is very slow, since the agent almost never wins any games by chance.
The solution: mix in a proportion of random moves until the agent wins 50% of all games, then reduce the proportion of randomness as the agent learns to win more games.
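
The randomness schedule above can be sketched as a simple feedback rule: lower the opponent's random-move proportion when the agent wins more than half of its recent games, and raise it otherwise. The step size and the feedback form are assumptions for illustration, not the talk's exact schedule.

```python
def adjust_randomness(epsilon, recent_win_rate, target=0.5, step=0.01):
    """Keep the agent's win rate near the target by tuning how often the
    opponent plays a random move instead of its normal move."""
    if recent_win_rate > target:
        return max(0.0, epsilon - step)   # agent winning too often: opponent plays better
    return min(1.0, epsilon + step)       # agent losing too often: more random moves
```

As learning progresses, epsilon drifts toward 0 and the agent ends up facing the unweakened average liberty player.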

Results for different shape sizes

Results for different board sizes

Shapes learned (1x1)

Shapes learned (2x2)

Shapes learned (3x3)

Conclusions
 Local shape information is sufficient to beat a naïve rule-based player
 Significant shapes can be learned
 The ‘goodness’ of shapes can be learned
 A linear threshold unit can provide a reasonable evaluation function
 Enumerating all local shapes reaches a natural limit at 3x3
 Training methodology is crucial

Future work
Learn shapes selectively rather than enumerating all possible shapes.
Learn shapes to answer specific questions:
 Can black B4 be captured?
 Can white connect A2 to D5?
Learn non-local shape:
 Use connectivity relationships
 Build hierarchies of shapes