1 Upper Confidence Trees for Game AI Chahine Koleejan

2 Background on Game AI For many years, computer chess was considered an ideal sandbox for testing AI algorithms: it has simple rules and clear benchmarks of performance against human intelligence. The domination of alpha-beta search programs over human players changed this.

3 The Game of Go Researchers moved on to Go as their new challenge. The game of Go is much harder to crack: 1. Massive search space: the 19x19 board allows up to 361 possible moves per turn and more than 10^170 possible states. 2. The game itself is very complex, and it is hard to find good heuristics.

4 Example of a Game of Go Honinbo Shusaku (Black) vs Gennan Inseki (White), 1846

5 The Multi-armed Bandit Setting A hypothetical probability setting in which a gambler stands at a row of k "bandits" (slot machines). When a bandit is pulled the gambler receives some amount of money, and each bandit has a different reward probability distribution. The gambler must decide which bandits to pull to maximise his total reward.

6 Exploitation and Exploration We need to balance exploitation of the action currently believed to be optimal with exploration of other actions that may be better in the long run. Upper Confidence Bound: we pull the arm j that maximises UCB1_j = x̄_j + √(2 ln n / n_j), where x̄_j is the average reward observed from arm j, n_j is the number of times arm j has been pulled, and n is the total number of pulls so far.
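A minimal Python sketch of the UCB1 rule, using a made-up three-armed bandit with hidden payout probabilities (the numbers and names are illustrative, not from the slides):

import math
import random

true_arm_probs = [0.2, 0.5, 0.7]          # hidden payout probability per arm (invented for illustration)
pulls = [0] * len(true_arm_probs)          # n_j: times each arm has been pulled
rewards = [0.0] * len(true_arm_probs)      # running sum of rewards per arm

def ucb1(j, total_pulls):
    """UCB1 score for arm j: mean reward plus an exploration bonus."""
    if pulls[j] == 0:
        return float("inf")                # force every arm to be tried at least once
    mean = rewards[j] / pulls[j]           # x̄_j, the exploitation term
    bonus = math.sqrt(2 * math.log(total_pulls) / pulls[j])  # exploration term
    return mean + bonus

for t in range(1, 1001):
    arm = max(range(len(true_arm_probs)), key=lambda j: ucb1(j, t))
    reward = 1.0 if random.random() < true_arm_probs[arm] else 0.0
    pulls[arm] += 1
    rewards[arm] += reward

print("pull counts:", pulls)               # the best arm should dominate over time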

7 Why do we care?

8 Sequential decision-making games are basically a multi-armed bandit problem!

9 Why do we care? Sequential decision-making games are basically a multi-armed bandit problem! …But worse.

10 Why do we care? Sequential decision-making games are basically a multi-armed bandit problem! …But worse. …But it's close enough that we can use the math.

11 Monte Carlo Tree Search (MCTS) A tree search method that has revolutionised computer Go. It works by simulating thousands of random games and does not need any prior knowledge of the game: no heuristics or evaluation functions, just the observed outcomes of the simulations.

12 UCT Algorithm We have a tree in which each node has a value given by the UCB1 bound. Steps of the algorithm: 1. Selection, 2. Expansion, 3. Simulation, 4. Backpropagation.

13 Selection and Expansion Starting at the root node, recursively choose the child with the highest UCB1 value until we reach an expandable node. A node is expandable if it is non-terminal and has unvisited children. One such child node is then added to our tree.
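A minimal sketch of the selection and expansion phases, assuming a hypothetical Node class (the names Node, select and expand are illustrative, not from the slides); visit counts and values are assumed to be filled in later by backpropagation (slide 15):

import math

class Node:
    """One node of the UCT search tree (illustrative sketch)."""
    def __init__(self, move=None, parent=None, untried_moves=()):
        self.move = move                          # move that led to this node
        self.parent = parent
        self.children = []
        self.untried_moves = list(untried_moves)  # legal moves not yet expanded
        self.visits = 0                           # n_j, updated by backpropagation
        self.value = 0.0                          # total reward backed up through this node

    def ucb1(self, exploration=math.sqrt(2)):
        """UCB1 score of this node from its parent's point of view."""
        if self.visits == 0:
            return float("inf")                   # unvisited children are tried first
        return (self.value / self.visits
                + exploration * math.sqrt(math.log(self.parent.visits) / self.visits))

def select(node):
    """Selection: descend from the root, always taking the child with the
    highest UCB1 value, until we reach an expandable (or terminal) node."""
    while not node.untried_moves and node.children:
        node = max(node.children, key=Node.ucb1)
    return node

def expand(node):
    """Expansion: add exactly one child for an untried move and return it."""
    move = node.untried_moves.pop()
    child = Node(move=move, parent=node)          # the game supplies the child's untried moves
    node.children.append(child)
    return child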

14 Simulation A simulation is run from the new node to the end of the game according to our defined default policy. At its most basic, the default policy is just random legal play.
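A minimal sketch of the random default policy, using a made-up "take 1-3 stones" toy game rather than Go or Othello; random legal moves are played until the game ends:

import random

def legal_moves(stones):
    """Legal moves in the toy game: take 1, 2 or 3 stones (never more than remain)."""
    return [take for take in (1, 2, 3) if take <= stones]

def rollout(stones, player_to_move):
    """Default policy: play uniformly random legal moves until the game ends.
    The player who takes the last stone wins; returns +1 if player 0 wins, -1 otherwise."""
    player = player_to_move
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        if stones == 0:
            return 1 if player == 0 else -1
        player = 1 - player

print(rollout(stones=10, player_to_move=0))   # prints +1 or -1 depending on the random playout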

15 Backpropagation The simulation result is "backed up" (i.e. backpropagated) through the selected nodes to update their values: for example, +1 if we won and -1 if we lost.
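A minimal sketch of backpropagation and of a main loop tying the four phases together; it assumes the Node, select and expand helpers from the slide-13 sketch and a simulate function such as the rollout from the slide-14 sketch, adapted to start from a node's game state (all names are illustrative):

def backpropagate(node, result):
    """Backpropagation: walk back up the selected path, updating every node.
    result is e.g. +1 for a win and -1 for a loss (in a real two-player
    implementation the sign alternates between levels; omitted here)."""
    while node is not None:
        node.visits += 1
        node.value += result
        node = node.parent

def uct_search(root, simulate, iterations=1000):
    """Run the four phases repeatedly, then pick the most-visited move."""
    for _ in range(iterations):
        leaf = select(root)               # 1. selection
        if leaf.untried_moves:
            leaf = expand(leaf)           # 2. expansion
        result = simulate(leaf)           # 3. simulation with the default policy
        backpropagate(leaf, result)       # 4. backpropagation
    best = max(root.children, key=lambda child: child.visits)
    return best.move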

16 Example

17 References C. B. Browne et al., "A Survey of Monte Carlo Tree Search Methods," IEEE Transactions on Computational Intelligence and AI in Games, 2012. S. Gelly and D. Silver, "Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go," Artificial Intelligence 175, 2011.

18 If you're interested in Go, talk to me! It's really cool!

19 Othello Demo

