Upper Confidence Trees for Game AI
Chahine Koleejan

Background on Game AI
For many years, computer chess was considered an ideal sandbox for testing AI algorithms: simple rules, and clear benchmarks of performance against human intelligence. The domination of alpha-beta search programs over human players changed this.

The Game of Go
Researchers moved on to Go as their new challenge. The game of Go is much harder to crack:
1. Massive search space: the 19x19 board allows up to 361 possible moves per turn, and more than 10^170 possible states.
2. The game itself is very complex, and it is hard to find good heuristics.

Example of a Game of Go
Honinbo Shusaku (Black) vs Gennan Inseki (White), 1846

The Multi-armed Bandit Setting
A hypothetical probability setting: a gambler stands at a row of k "bandits" (slot machines). When a bandit's arm is pulled, the gambler receives some amount of money. Each bandit has a different payout probability distribution. The gambler must decide which bandits to pull to maximise his reward.

Exploitation and Exploration
We need to balance the exploitation of the action currently believed to be optimal with the exploration of other actions that may be better in the long run.
Upper Confidence Bound: we want to maximise this value for an arm j:

$\mathrm{UCB1}_j = \bar{x}_j + \sqrt{\dfrac{2 \ln n}{n_j}}$

where $\bar{x}_j$ is the mean reward observed for arm j, $n_j$ is the number of times arm j has been pulled, and $n$ is the total number of pulls across all arms.
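As a concrete illustration, here is a minimal Python sketch of this bound (the function name and the constant c are my own; with c = √2 the bonus reduces to the √(2 ln n / n_j) term above):

```python
import math

def ucb1(mean_reward, arm_pulls, total_pulls, c=math.sqrt(2)):
    """UCB1 score for one arm: mean payout plus an exploration bonus.

    mean_reward -- x̄_j, the average reward observed for this arm
    arm_pulls   -- n_j, how many times this arm has been pulled
    total_pulls -- n, the total number of pulls across all arms
    """
    if arm_pulls == 0:
        return math.inf  # never-pulled arms are always tried first
    return mean_reward + c * math.sqrt(math.log(total_pulls) / arm_pulls)
```

The bonus grows slowly with the total pull count but shrinks as this particular arm is pulled, so arms that looked bad early still get reconsidered occasionally.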

Why do we care?
Sequential decision-making games are basically a multi-armed bandit problem! …But worse. …But it's close enough that we can use the math.

Monte Carlo Tree Search (MCTS)
A tree search method which has revolutionised computer Go. It works by simulating thousands of random games, and needs no prior knowledge of the game: no heuristics or evaluation functions, just the observed outcomes of the simulations.

UCT Algorithm
We have a tree where each node has a value given by the UCB1 bound. Steps of the algorithm, each with a minimal code sketch below:
1. Selection
2. Expansion
3. Simulation
4. Backpropagation
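To make the steps concrete, the sketches under each step below build up one minimal Python implementation. They assume a hypothetical game interface that is not defined here: legal_moves(state), apply_move(state, move), player_to_move(state), and winner(state), the last returning None while the game is in progress and otherwise the winning player's id or "draw". First, a tree node holding exactly the statistics the UCB1 bound needs:

```python
class Node:
    """One node of the UCT search tree."""

    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move                 # move that led here from the parent
        # Player who made that move; results are scored from their perspective,
        # so both sides of the game end up picking their own best replies.
        self.player = player_to_move(parent.state) if parent else None
        self.children = []
        self.untried_moves = list(legal_moves(state))
        self.visits = 0                  # n_j in the UCB1 formula
        self.total_reward = 0.0          # sum of backed-up results
```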

Selection and Expansion
Starting at the root node, recursively choose the child with the highest value until we reach an expandable node. A node is expandable if it is non-terminal and has unvisited children. One child node is then added to our tree.
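A sketch of these two steps, reusing the ucb1 and Node definitions above:

```python
def select(node):
    """Step 1: descend, always taking the child with the highest UCB1 value,
    until we reach a node with untried moves, or a terminal node."""
    while not node.untried_moves and node.children:
        node = max(node.children,
                   key=lambda c: ucb1(c.total_reward / c.visits,
                                      c.visits, node.visits))
    return node

def expand(node):
    """Step 2: add one child for a move not yet tried from this node."""
    move = node.untried_moves.pop()
    child = Node(apply_move(node.state, move), parent=node, move=move)
    node.children.append(child)
    return child
```

The division by c.visits is safe: a node is only descended through once all its moves have been tried, and every child is visited during the iteration that created it.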

Simulation
A simulation is run from the new node to the end of the game according to our defined default policy. At the most basic level, the default policy is just random legal play.
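A sketch of this basic default policy, again on the assumed game interface:

```python
import random

def rollout(state):
    """Step 3: play uniformly random legal moves until the game ends.
    Returns the winning player's id, or "draw"."""
    while winner(state) is None:
        state = apply_move(state, random.choice(legal_moves(state)))
    return winner(state)
```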

Backpropagation
The simulation result is "backed up" (i.e. backpropagated) through the selected nodes to update their values: for example, +1 if we won and -1 if we lost.
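Finally, a sketch of backpropagation plus a driver loop tying the four steps together, using the +1/-1 scoring just described, credited at each node to the player who made the move into it:

```python
def backpropagate(node, result):
    """Step 4: walk back to the root, updating the statistics UCB1 reads."""
    while node is not None:
        node.visits += 1
        if result != "draw":
            # +1 if the player who moved into this node won, -1 if they lost.
            node.total_reward += 1.0 if result == node.player else -1.0
        node = node.parent

def uct_search(root_state, iterations=10_000):
    """Run the four steps for a fixed budget, then commit to a move."""
    root = Node(root_state)
    for _ in range(iterations):
        node = select(root)               # 1. Selection
        if node.untried_moves:
            node = expand(node)           # 2. Expansion
        result = rollout(node.state)      # 3. Simulation
        backpropagate(node, result)       # 4. Backpropagation
    # A common robust choice: play the most-visited move at the root.
    return max(root.children, key=lambda c: c.visits).move
```

Calling uct_search(current_position) with more iterations simply buys a stronger move; nothing game-specific is needed beyond the four interface functions, which is exactly the appeal described above.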

Example

References
C. B. Browne et al., "A Survey of Monte Carlo Tree Search Methods", IEEE Transactions on Computational Intelligence and AI in Games, 2012.
S. Gelly and D. Silver, "Monte-Carlo tree search and rapid action value estimation in computer Go", Artificial Intelligence 175, 2011.

If you’re interested in Go, talk to me! It’s really cool!

Othello Demo