Monte Carlo Tree Search: Insights and Applications
BCS Real AI Event
Simon Lucas, Game Intelligence Group, University of Essex

Outline
General machine intelligence: the ingredients
Monte Carlo Tree Search – a quick overview and tutorial
Example application: Mapello
– Note: Game AI is Real AI!
Example test problem: Physical TSP
Results of open competitions
Challenges and future directions

General Machine Intelligence: the Ingredients
Evolution
Reinforcement learning
Function approximation – neural nets, N-tuples, etc.
Selective search / sample-based planning / Monte Carlo Tree Search

Conventional Game Tree Search
Minimax with alpha-beta pruning, transposition tables
Works well when:
– A good heuristic value function is known
– The branching factor is modest
E.g. Chess: Deep Blue, Rybka – super-human on a smartphone!
The tree grows exponentially with search depth
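
As a concrete reference point, here is a minimal negamax sketch with alpha-beta pruning; the GameState interface and its method names are illustrative assumptions for this sketch only, not code from the talk.

    import java.util.List;

    // Hypothetical state interface, assumed for this sketch.
    interface GameState {
        boolean isTerminal();
        double evaluate();             // heuristic value for the player to move
        List<GameState> successors();  // states reachable in one move
    }

    final class AlphaBeta {
        // Negamax with alpha-beta pruning to a fixed depth.
        static double search(GameState s, int depth, double alpha, double beta) {
            if (depth == 0 || s.isTerminal()) return s.evaluate();
            for (GameState child : s.successors()) {
                // Negate and swap the window: the opponent minimises our value.
                double v = -search(child, depth - 1, -beta, -alpha);
                if (v > alpha) alpha = v;
                if (alpha >= beta) break;  // cutoff: this line is already refuted
            }
            return alpha;
        }
    }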

Go
Much tougher for computers:
– High branching factor
– No good heuristic value function
MCTS to the rescue!
“Although progress has been steady, it will take many decades of research and development before world-championship-calibre Go programs exist.” – Jonathan Schaeffer, 2001

Monte Carlo Tree Search (MCTS)
Upper Confidence bounds for Trees (UCT)
Further reading:

Attractive Features
Anytime
Scalable
– Tackles complex games and planning problems better than before
– May be logarithmically better with increased CPU
No need for a heuristic function
– Though usually better with one
Next we'll look at:
– General MCTS
– UCT in particular

MCTS: the Main Idea
Tree policy: choose which node to expand (not necessarily a leaf of the tree)
Default (simulation) policy: random playout until the end of the game

MCTS Algorithm
Decomposed into six parts:
– MCTS main algorithm
– Tree policy
– Expand
– Best child (UCT formula)
– Default policy
– Back-propagate
We'll run through these, then show demos

MCTS Main Algorithm
BestChild simply picks the best child node of the root according to some criterion, e.g. best mean value
In our pseudo-code BestChild is called from TreePolicy and from MctsSearch, but different versions can be used
– E.g. the final selection can be the max-value child or the most frequently visited one
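
A minimal sketch of this main loop in Java, under assumed Node and GameState types; all names below are illustrative, not the talk's actual pseudo-code. TreePolicy, BestChild, DefaultPolicy and Backup are sketched under the following slides.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Assumed game-state interface for these MCTS sketches.
    interface GameState {
        boolean isTerminal();
        List<Integer> legalActions();
        GameState apply(int action);   // successor state after a move
        double reward();               // terminal outcome, e.g. +1 / 0 / -1
    }

    // One search-tree node.
    class Node {
        Node parent;
        int action;                    // move that led to this node
        GameState state;
        List<Node> children = new ArrayList<>();
        List<Integer> untried;         // actions not yet expanded here
        int visits = 0;
        double totalReward = 0.0;

        Node(Node parent, int action, GameState state) {
            this.parent = parent;
            this.action = action;
            this.state = state;
            this.untried = new ArrayList<>(state.legalActions());
        }
    }

    class Mcts {
        static final Random rng = new Random();
        static final double C = Math.sqrt(2);   // exploration constant

        // Anytime main loop: select/expand, roll out, back up, repeat.
        static Node mctsSearch(GameState rootState, long budgetMillis) {
            Node root = new Node(null, -1, rootState);
            long deadline = System.currentTimeMillis() + budgetMillis;
            while (System.currentTimeMillis() < deadline) {
                Node v = treePolicy(root);              // select + expand
                double reward = defaultPolicy(v.state); // random rollout
                backup(v, reward);                      // propagate to root
            }
            return bestChild(root, 0);  // c = 0: exploit only for the final choice
        }
        // treePolicy, expand, bestChild, defaultPolicy, backup: next slides.
    }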

TreePolicy
Note that the node selected for expansion does not need to be a leaf of the tree
But it must have at least one untried action (see the combined sketch after the Expand slide)

Expand
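
Continuing the sketch, TreePolicy and Expand might look like this, as further static methods of the hypothetical Mcts class above:

    // Descend while fully expanded, stop at the first node with an untried
    // action; as the slide notes, that node need not be a leaf of the tree.
    static Node treePolicy(Node v) {
        while (!v.state.isTerminal()) {
            if (!v.untried.isEmpty()) return expand(v);
            v = bestChild(v, C);     // UCT selection (next slide)
        }
        return v;                    // terminal node: nothing to expand
    }

    // Expand: pick one untried action at random, create and attach the child.
    static Node expand(Node v) {
        int a = v.untried.remove(rng.nextInt(v.untried.size()));
        Node child = new Node(v, a, v.state.apply(a));
        v.children.add(child);
        return child;
    }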

Best Child (UCT)
This is the standard UCT equation, used in the tree; it is commonly written as
UCT(j) = X̄_j + c √( ln n / n_j )
where X̄_j is the mean reward of child j, n_j its visit count, and n the parent's visit count
Higher values of c lead to more exploration
Other terms can be added, and usually are
– More on this later
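
The corresponding BestChild sketch; passing c = 0 recovers pure exploitation for the final move choice:

    // Standard UCT: mean reward plus an exploration bonus that shrinks as a
    // child is visited more; higher c means more exploration.
    static Node bestChild(Node v, double c) {
        Node best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Node child : v.children) {
            double mean = child.totalReward / child.visits;
            double bonus = c * Math.sqrt(Math.log(v.visits) / child.visits);
            if (mean + bonus > bestScore) {
                bestScore = mean + bonus;
                best = child;
            }
        }
        return best;
    }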

DefaultPolicy
Each time a new node is added to the tree, the default policy randomly rolls out from that node's state until a terminal state of the game is reached
The standard is to do this uniformly at random
– But better performance may be obtained by biasing the rollout with domain knowledge
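
A matching DefaultPolicy sketch, uniformly random as described; a knowledge-biased rollout would replace the random choice below:

    // Default policy: uniformly random playout from the given state to the
    // end of the game, returning the terminal outcome.
    static double defaultPolicy(GameState s) {
        while (!s.isTerminal()) {
            List<Integer> actions = s.legalActions();
            s = s.apply(actions.get(rng.nextInt(actions.size())));
        }
        return s.reward();
    }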

Backup
Note that v is the new node added to the tree by the tree policy
Back up the values from the added node up the tree to the root
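
A Backup sketch; the sign flip is one common convention for two-player zero-sum games (each node stores values from the perspective of the player who moved into it) and is an assumption here, not necessarily the talk's formulation:

    // Walk from the newly added node v up to the root, updating visit counts
    // and accumulated reward at every node on the path.
    static void backup(Node v, double reward) {
        while (v != null) {
            v.visits++;
            v.totalReward += reward;
            reward = -reward;   // alternate perspective each ply (two-player)
            v = v.parent;
        }
    }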

MCTS Builds Asymmetric Trees (demo)

All Moves As First (AMAF), Rapid Action Value Estimation (RAVE)
Additional term in the UCT equation:
– Treat actions/moves the same independently of where they occur in the move sequence
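
One common concrete form of this is Gelly and Silver's RAVE blend: keep separate AMAF statistics per node and mix them with the UCT mean using a weight that decays as real visits accumulate. The extra Node fields and the beta schedule below are illustrative assumptions, not necessarily the talk's formula.

    // Assumes two extra fields on Node: amafReward and amafVisits, updated
    // during backup for every action that appeared anywhere in the playout.
    static double raveValue(Node child, double k) {
        double mean = child.totalReward / Math.max(1, child.visits);
        double amaf = child.amafReward / Math.max(1, child.amafVisits);
        // One published schedule: trust AMAF early, real statistics later.
        double beta = Math.sqrt(k / (3.0 * child.visits + k));
        return (1 - beta) * mean + beta * amaf;
    }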

Using MCTS for a new problem: implement the State interface
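
A hypothetical shape for such a State interface; the method names below are guesses at what a game-independent MCTS needs, not the actual API of the talk's code.

    import java.util.List;

    interface State {
        int playerToMove();             // whose turn it is
        List<Integer> legalActions();   // moves available in this state
        State next(int action);         // successor after playing a move
        boolean isTerminal();           // has the game ended?
        double reward(int player);      // terminal outcome for a given player
    }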

Example Application: Mapello

Othello
Each move you must pincer one or more opponent counters between the counter you place and an existing counter of your own colour
Pincered counters are flipped to your colour
The winner is the player with the most pieces at the end

Basics of Good Game Design
Simple rules
Balance
A sense of drama
The outcome should not be obvious

Othello example (board position) – white leads: -58

Black wins with a score of 16

Mapello
Take the counter-flipping drama of Othello
Apply it to novel situations:
– Obstacles
– Power-ups (e.g. triple square score)
– Large maps with power-plays, e.g. line fill
Novel games:
– Allow users to design maps that they are expert in
– The map design is part of the game
Research bonus: a large set of games to experiment with

Example Initial Maps

Or how about this?

Need Rapidly Smart AI
Give players a challenging game
– Even when the game map can be new each time
Obvious, easy-to-apply approaches (a TD-style update is sketched below):
– TD learning
– Monte Carlo Tree Search (MCTS)
– Combinations of these, e.g. Silver et al., ICML 2008; Robles et al., CIG 2011
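
For concreteness, a minimal TD(0)-style weight update for a linear position evaluator, as one plausible reading of "TD learning" here; every name below is illustrative.

    // Illustrative TD(0) update: after a transition from features x to
    // successor features xNext with reward r, move the weights toward the
    // bootstrapped target r + gamma * V(xNext).
    class TdLearner {
        static void tdUpdate(double[] w, double[] x, double[] xNext,
                             double r, double alpha, double gamma) {
            double delta = r + gamma * dot(w, xNext) - dot(w, x);  // TD error
            for (int i = 0; i < w.length; i++)
                w[i] += alpha * delta * x[i];   // gradient step for linear V
        }

        static double dot(double[] a, double[] b) {
            double s = 0;
            for (int i = 0; i < a.length; i++) s += a[i] * b[i];
            return s;
        }
    }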

MCTS (see Browne et al., TCIAIG 2012)
Simple algorithm
Anytime
No need for a heuristic value function
Balances exploration and exploitation
Works well across a range of problems

Demo
TDL learns reasonable weights rapidly
How well will this play at 1 ply versus limited-rollout MCTS?

For Strong Play…
Combine MCTS, TDL and N-tuples (an N-tuple value function is sketched below)
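
A sketch of an N-tuple value function of the kind combined with MCTS and TDL: each tuple samples a few board squares, the pattern of contents at those squares indexes a lookup table, and the position value is the sum over all tuples. Encodings and sizes below are assumptions.

    // Illustrative N-tuple evaluator. Assumes board[i] in {0 empty, 1 black,
    // 2 white} and luts[t].length == 3^(tuples[t].length).
    class NTupleValue {
        static double value(int[] board, int[][] tuples, double[][] luts) {
            double v = 0;
            for (int t = 0; t < tuples.length; t++) {
                int index = 0;
                for (int square : tuples[t])
                    index = index * 3 + board[square];  // base-3 pattern code
                v += luts[t][index];   // table entries learned e.g. by TDL
            }
            return v;
        }
    }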

Where to Play / Buy
Coming to Android (November 2012)
Nestorgames (

MCTS in Real-Time Games: PTSP
Hard to get long-term planning without good heuristics

Optimal TSP order != PTSP order

MCTS: Challenges and Future Directions
Better handling of problems with continuous action spaces
– Some work has already been done on this
Better understanding of how to handle real-time problems
– Use of approximations and macro-actions
Stochastic and partially observable problems / games of incomplete and imperfect information
Hybridisation:
– with evolution
– with other tree search algorithms

Conclusions
MCTS: a major new approach to AI
Works well across a range of problems
– Good performance even with vanilla UCT
– Best performance requires tuning and heuristics
– Sometimes the UCT formula is modified or discarded
Can be used in conjunction with RL
– Self-tuning
And with evolution
– E.g. evolving macro-actions

Further reading and links