Intelligent Search Techniques
Mark Winands & Cameron Browne
Contents
Adversarial Search
Single-Agent Search
Monte Carlo Tree Search
Applications
Computer Game-playing
Can computers beat humans in board games like Chess, Checkers, and Go? This was one of the first tasks posed for AI (Shannon, 1950).
Adversarial Search
Two (or more) opponents, each trying to maximize their own expectations.
Player 1 is called MAX: tries to obtain the maximum result and minimize that of the opponent.
Player 2 is called MIN: tries to obtain the minimum result and maximize that of the opponent.
Definitions - Nodes
Root node: the state (position) from which the search starts.
Terminal node: a node with a fixed, application-dependent value (e.g., win, loss, draw).
Leaf node (non-terminal): a node that has been assigned a heuristic value. A heuristic is an "educated guess" approximating the terminal value.
Internal nodes: nodes whose value is a function of their successors.
Definitions - Tree
Search depth d: the number of state transitions (moves) from the root of the search to the current position (measured in ply).
Branching factor b: the average number of successor nodes (moves).
Tree vs. directed acyclic graph (DAG): most game trees are really DAGs. A node has one parent in a tree, but possibly more than one in a DAG; a position reachable via multiple move orders is called a transposition.
Tree Traversal
Depth-first search, left to right. Other traversal orders are possible, but for the remainder we use this one!
MiniMax Search (Von Neumann, 1928)
[Figure: minimax tree with leaf values 7, 3, 4, 2; the MIN level returns 3 and 2, and the MAX root takes value 3]
Principal Variation
The path from the root to a leaf node under optimal play by both sides; also called the optimal path or main line.
[Figure: minimax tree with leaf values 3, 4, 2, 7 and the principal variation highlighted]
MiniMax Analysis
Complete? Yes (if the tree is finite).
Optimal? Yes (against an optimal opponent).
Time complexity? O(b^d).
Space complexity? O(bd) (depth-first exploration).
For chess, b ≈ 35 and d ≈ 100 for "reasonable" games, so an exact solution is completely infeasible. Can we do better?
Observation
Some nodes in the search can be proven to be irrelevant to the outcome of the search.
α-β Algorithm
[Figure: the minimax tree with leaf values 7, 3, 4, 2; after the first MIN subtree returns 3, a child of value 2 bounds the second MIN subtree to ≤ 2, so its remaining children are cut off (β-pruning)]
The Strength of α-β
[Figure: the same tree searched with α-β; in larger trees this mechanism yields more than a thousand prunings]
The Importance of the α-β Algorithm
[Figure: β-pruning example; the subtree containing values 4 and 2 is cut off once its bound drops to ≤ 2]
Example: Alpha-Beta Algorithm
[Figure: alpha-beta search of a depth-3 tree with leaf values -6, 5, 6, -6, -4, -2, 5, 4, 6; bounds such as ≥ 5, ≤ 5, ≤ -4, ≥ 6 are propagated, and the principal variation has value 5]
Example: Alpha-Beta Algorithm
[Figure: tree with leaf values 7, 6, 3, 2, 8, 9, 4, 5, 10, 1, 11, 12, 13, 14, 15, before pruning]
Example: Alpha-Beta Algorithm
[Figure: the same tree searched with alpha-beta; bounds propagate as 6, ≥ 6, ≤ 2, ≥ 8, ≤ 3, ≤ 1, illustrating both shallow pruning and deep pruning]
Alpha-Beta Algorithm
Why is it called alpha-beta? It maintains two bounds:
Alpha (α): a lower bound on the best value that the player can achieve.
Beta (β): an upper bound on what the opponent can achieve.
Search while maintaining α and β. Whenever α ≥ β, further search at this node is irrelevant.
NegaMax Formulation
The MiniMax formulation is awkward because the search alternates between MIN and MAX nodes. The NegaMax formulation allows only MAX nodes to be used (Knuth & Moore, 1975): always maximize, but negate the values first.
NegaMax
[Figure: the minimax tree rewritten in negamax form; minimax values at MIN leaf nodes are discarded and replaced by their negation, and every level negates the child values, then maximizes]
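The negate-then-maximize idea can be sketched in three lines, assuming leaf values are stored from the viewpoint of the player to move there (for this hypothetical depth-2 tree with MAX at the root, those coincide with the minimax leaf values).

```python
def negamax(node):
    """NegaMax: negate the child values, then maximize."""
    if not isinstance(node, list):
        return node   # leaf value, from the viewpoint of the player to move
    return max(-negamax(child) for child in node)

tree = [[7, 3], [4, 2]]   # depth-2 tree, MAX to move at the root
print(negamax(tree))      # -> 3, the same minimax value
```

The MIN level disappears: at a MIN node, max(-7, -3) = -3 is simply the negation of min(7, 3).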
Analysis
What is the best case for alpha-beta? Consider two cases in this minimax search:
Successor Ordering
Better known as move ordering. Alpha-beta's performance depends on getting cut-offs as soon as possible! At a node where a cut-off is possible, we ideally want to search (one of) the best move(s) first and cut off immediately.
Alpha-Beta Node Types
Define two node types:
ALL - all successors (moves) of a node must be considered.
CUT - a cut-off can occur; one or more successors (moves) of a node must be considered.
Minimal α-β Tree
[Figure: the minimal tree searched under perfect move ordering.] In reality you don't know the ordering in advance!
Alpha-Beta Analysis
Assume a fixed branching factor and a fixed depth.
Best case: approximately b^(d/2) nodes (more precisely b^⌈d/2⌉ + b^⌊d/2⌋ - 1).
Impact? For b = 10, d = 9:
Minimax: 10^9 = 1,000,000,000 nodes.
Alpha-beta: about 110,000 nodes.
Alpha-Beta Analysis
But... the best-case analysis depends on choosing the best move first at CUT nodes, which is not always possible.
The worst case? No cut-offs, and alpha-beta degrades to MiniMax.
Heuristic Search
Truncate the game tree (limited search depth), use a static heuristic evaluation function at the leaves to replace pay-offs, and run minimax (with alpha-beta) on the reduced game tree. Playing is then solving a sequence of these game trees. This approach works very well in Chess, Checkers, and Backgammon.
Quiescence Search (Selective Search)
A quiescent position is unlikely to show wild swings in value in the near future. Apply the evaluation function only to quiescent positions: expand until a quiescent position is found. Instead of using the evaluation function directly at the leaves, a special function is called that evaluates special moves (e.g., captures) only, down to unbounded depth.
Quiescence Search
[Figure: at depth 0 the static evaluation returns 50, but quiescence search reveals a value of 1000]
Isn't this good enough?
No! Thompson (1982) showed that search depth is strongly correlated with performance in chess: searching one move (one ply) deeper made a huge difference in playing strength. This holds for other games too!
Performance! Performance!
Improve alpha-beta to guarantee near best-case results: move ordering, windowing, iterative deepening, transposition tables.
Improve the heuristic evaluation.
Use parallelism to increase the search depth.
Why Alpha-Beta Search First?
Many search enhancements developed for alpha-beta translate to single-agent search. Most originated with alpha-beta and were later adopted by other classes of search algorithms.
Maxn Algorithm (Luckhardt and Irani, 1986)
A generalization of minimax to n players. Assumptions:
The players alternate moves.
Each player tries to maximize his/her own return and is indifferent to the returns of the others.
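The maxn backup rule can be sketched as follows, assuming each leaf holds a payoff vector with one entry per player and that players move in turn order; the tree below is a hypothetical fragment using payoff vectors from the slide's example.

```python
def maxn(node, player, num_players):
    """Return the payoff vector chosen at this node by `player`."""
    if isinstance(node, tuple):          # leaf: payoff vector
        return node
    nxt = (player + 1) % num_players
    children = [maxn(child, nxt, num_players) for child in node]
    # Each player maximizes only its own component of the vector.
    return max(children, key=lambda v: v[player])

# Hypothetical 3-player tree, player 0 to move at the root:
tree = [[(5, 3, 2), (7, 1, 8)], [(8, 5, 4), (4, 5, 4)]]
print(maxn(tree, 0, 3))  # -> (8, 5, 4)
```

Because each player looks only at its own component, alpha-beta-style bounds largely break down, which is one reason maxn prunes far less than two-player alpha-beta.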
Maxn Algorithm
[Figure: three-player maxn tree; leaves hold payoff vectors (5,3,2), (7,1,8), (8,5,4), (4,5,4), (1,8,3), (6,6,3), (3,6,3), (9,9,5), each player maximizes its own component, and the root value is (9,9,5)]
Paranoid Algorithm (Sturtevant and Korf, 2000)
Here we view all the other players as one big opponent. There is my own player, the MAX player; all the others are MIN players. The paranoid algorithm evaluates the tree as follows: when it is my turn to play, take the maximum of my utility; when it is not my turn (it is one of them), take the minimum of my utility.
Paranoid Algorithm
[Figure: paranoid search tree over leaf utilities 5, 7, 8, 4, 1, 6, 3, 9, with bounds ≥ 4 and ≤ 1 propagated; the root value is 4]
Expectimax Search Trees
Chance nodes are used when the outcome is uncertain. Search-based approaches must take into account all possibilities at a chance node, which increases the branching factor and makes deep search unlikely.
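A chance node averages its children, weighted by probability, instead of taking a max or min. The sketch below is a minimal expectimax; the ('max' | 'chance', children) node encoding and the coin-flip example are hypothetical.

```python
def expectimax(node):
    if not isinstance(node, tuple):
        return node                      # leaf payoff
    kind, children = node
    if kind == 'max':
        return max(expectimax(c) for c in children)
    # chance node: children are (probability, subtree) pairs
    return sum(p * expectimax(c) for p, c in children)

# Two moves, each followed by a fair coin flip over outcomes:
tree = ('max', [('chance', [(0.5, 10), (0.5, 0)]),
                ('chance', [(0.5, 4), (0.5, 4)])])
print(expectimax(tree))  # -> 5.0 (0.5*10 + 0.5*0 beats a certain 4)
```

Note that every chance outcome must be visited, which is the branching-factor blow-up the slide warns about.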
Expectimax Example
[Figure: expectimax tree example]
What to do in the endgame?
Alpha-beta with enhancements (move ordering, transposition tables).
Knowledge (domain dependent): endgame databases.
Special search algorithms (endgame solvers): proof-number search, lambda-search.
Which node has to be expanded?
[Figure: small AND/OR tree with MAX node a, MIN nodes b and c, and MAX leaves d-h; some leaves are known wins, others unknown]
Which node has to be expanded?
[Figure: a deeper tree with MAX nodes b-c, MIN leaves d-h at one level and i-m below, whose values are still unknown]
PN Search (Allis et al., 1994)
A best-first search method. Criterion: develop the leaf node that is most promising for proving the goal. Goal: (dis)prove the root node, e.g., that it is a win for the player to move.
Proof Number and Disproof Number
Proof number (pn): the minimum number of leaf nodes that have to be proved in order to prove the node.
Disproof number (dpn): the minimum number of leaf nodes that have to be disproved in order to disprove the node.
A proof number and a disproof number are maintained for each node; assume one expansion suffices for each unexpanded node.
PN Search (2)
Three types of leaf nodes:
Proved (goal is true): pn = 0, dpn = ∞.
Disproved (goal is false): pn = ∞, dpn = 0.
Unknown: pn = 1, dpn = 1.
PN Search: AND/OR Tree
Internal nodes are OR and AND nodes. In a minimax tree, OR is equivalent to MAX and AND to MIN.
To prove an OR node it suffices to prove one child; to disprove an OR node all the children have to be disproved.
To prove an AND node all the children have to be proved; to disprove an AND node it suffices to disprove one child.
PN Search: Backpropagation
Two types of internal nodes (tree):
OR node: pn = minimum of the children's pn; dpn = sum of the children's dpn.
AND node: pn = sum of the children's pn; dpn = minimum of the children's dpn.
PN Search: Node Selection
Best-first search: most-promising = most-proving. Path to the most-proving node: at an OR node choose the child with minimum pn; at an AND node choose the child with minimum dpn.
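The backpropagation and selection rules above can be sketched compactly over an explicit AND/OR tree. The Node class and the 'win' / 'loss' / '?' leaf labels are hypothetical encodings for illustration (goal: prove a win).

```python
import math

class Node:
    def __init__(self, kind, children=None):
        self.kind, self.children = kind, children or []
        if kind == 'win':    self.pn, self.dpn = 0, math.inf
        elif kind == 'loss': self.pn, self.dpn = math.inf, 0
        else:                self.pn, self.dpn = 1, 1   # unknown / internal

def update(node):
    """Backpropagation: recompute pn/dpn bottom-up."""
    if not node.children:
        return
    for c in node.children:
        update(c)
    pns  = [c.pn  for c in node.children]
    dpns = [c.dpn for c in node.children]
    if node.kind == 'OR':    # prove one child, disprove all children
        node.pn, node.dpn = min(pns), sum(dpns)
    else:                    # AND: prove all children, disprove one
        node.pn, node.dpn = sum(pns), min(dpns)

def most_proving(node):
    """Selection: min pn at OR nodes, min dpn at AND nodes."""
    while node.children:
        key = (lambda c: c.pn) if node.kind == 'OR' else (lambda c: c.dpn)
        node = min(node.children, key=key)
    return node

root = Node('OR', [Node('AND', [Node('win'), Node('?')]), Node('loss')])
update(root)
print(root.pn, root.dpn)          # -> 1 1
print(most_proving(root).kind)    # -> ? (the unknown leaf is expanded next)
```

A full implementation would expand the most-proving node, re-evaluate it, and update the numbers back up the tree until the root's pn or dpn becomes 0.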
PN Example
[Figure: PN search tree with nodes a-l; leaves are marked loss, win, draw, or unknown, with proof/disproof numbers 1 and 2 propagated through the tree]
PN Example
[Figure: the same tree after expanding the most-proving node; new children m and n are added and the proof/disproof numbers are updated]
PN Search: Strength and Weakness
[Figure: a narrow line of forced wins (strength) versus a wide tree of unknown nodes (weakness); the solution is 11 ply deep]
PN Search
Works well on endgame positions with forced moves and on other AND/OR structures (e.g., retrosynthesis). It mostly (dis)proves such positions faster than alpha-beta, finding deep forced wins along the line of weakest resistance. It does not always find the shortest path; extreme cases have been reported: Breuker (1998) found that PN search produced a mate in 114 while a mate in 4 existed.
PN Search
Does not need a domain-dependent heuristic evaluation function to determine the most-promising node to expand next; domain knowledge is not required. The shape of the tree and the values of the leaf nodes are used to select the most-proving node. You may still use heuristics at leaf nodes.
PN Search: Heuristic Knowledge
Use heuristics at unknown leaf nodes to set their proof and disproof numbers.
Domain-independent: the number of moves. At an OR node: dpn = number of moves; at an AND node: pn = number of moves.
Domain-dependent (knowledge): the heuristic should be admissible (never overestimate).
Single-Agent Search
A* algorithms
Deductive Search
A* Search (Hart et al., 1968)
Among the wide variety of pathfinding algorithms, A* is one of the best: a best-first search algorithm that will find the shortest path, if one exists, and will do so relatively quickly.
Application: Pathfinding
Consider a sample application: a tile-based graph. Find a minimal-cost path from a start node to a goal node. You can move one square horizontally or vertically, each with a cost of one. This can be generalized to include diagonals and variable costs.
A* Search
Idea: avoid expanding paths that are already expensive.
Evaluation function f(n) = g(n) + h(n), where
g(n) = cost so far to reach n,
h(n) = estimated cost from n to the goal (heuristic),
f(n) = estimated total cost of the path through n to the goal.
Admissible Heuristics
A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n. An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic.
Theorem: if h(n) is admissible, A* using TREE-SEARCH is optimal.
Heuristic: Manhattan Distance
A* Data Structures
OpenList: nodes in the graph/tree that are not yet fully considered, ordered from best to worst f value.
ClosedList: nodes that have been fully expanded.
A* Algorithm (1)
Take the best (first) node from the OpenList. Check whether it is a solution. Expand all its children. Move the node to the ClosedList: as far as we know, we are done with this node.
A* Algorithm (2)
Expanding a child: check if it has been seen before (Open/ClosedList). If the node has been seen before with the same or better g value, reject it; otherwise add it to the OpenList for consideration. In effect, the lists act as a cache of previously seen results. Note: the algorithm requires all nodes to be kept in these lists.
A* Example (grid with columns A-F and rows 1-6; start C1, goal at f = 7)
Step 1: Initialize. Open: (C1, 0+5 = 5, null); Closed: empty.
Step 2: Expand C1. Open: (C2, 1+4 = 5, C1), (D1, 1+4 = 5, C1), (B1, 1+6 = 7, C1).
Step 3: Expand C2. Open: (C3, 2+3 = 5, C2), (D2, 2+3 = 5, C2), (D1, 5, C1), (B1, 7, C1). Why isn't C1 re-added to the OpenList? C1 is found in the ClosedList with a lower g value.
Step 4: Expand C3. The open list now leads with (D3, 3+2 = 5, C3).
Steps 5-9: Expand D3, D2, D1, E3, and E4 in turn; all remaining f = 5 nodes are closed before the f = 7 nodes (B1, B3, E1, E3, E4, E5).
Step 10: Expand E5. (Goal, 7+0 = 7, E5) enters the open list ahead of the f = 9 nodes (E6, F3, F4, F5).
Step 11: Close Goal. Done; backtrack through the parent pointers (E5, E4, E3, D3, C3, C2, C1) to recover the path.
Sorting the Open List
Sort by increasing f value, but what about ties? Break ties based on g value: larger g values mean more accurate information and less heuristic approximation.
A* Pros and Cons
Pros: a clean algorithm, guaranteed to find the shortest path, and flexible.
Cons: can be computationally intensive, and doesn't handle unexplored areas well.
IDA*: Iterative Deepening A* (Korf, 1985)
The cost of a node is (in A* terms) f = g + h, where g is the cost incurred to reach the node and h is the heuristic estimate of reaching the goal. Iterative deepening iterates on a threshold T: search a node as long as f ≤ T. Either a solution is found (done), or the search fails, in which case the threshold is increased and a new search starts.
IDA* Tree
Depth-first search with the root's f value as the initial threshold T: search nodes with f ≤ T, then f ≤ T+1, and so on, repeating until a solution is found.
IDA* Comments
Automatically builds a variable-depth search: provably bad lines are cut off as soon as possible, and when the cut-off occurs depends on the quality of the evaluation function. Storage requirements are trivial: just the recursion stack, which can also be used to prevent repetitions along the investigated path. Iteration i+1 repeats all the work of iteration i! For some domains one can do better than incrementing by 1: use the minimum f value seen at a leaf node during an iteration as the next threshold.
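The threshold loop described above can be sketched for the same unit-cost grid; the helper names are hypothetical, and the path set implements the stack-based repetition check the slide mentions.

```python
import math

def ida_star(start, goal, blocked, width, height):
    def h(p):   # Manhattan distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    def search(node, g, threshold, path):
        f = g + h(node)
        if f > threshold:
            return f                       # fail: report the f that exceeded T
        if node == goal:
            return 'FOUND'
        minimum = math.inf
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height \
                    and nxt not in blocked and nxt not in path:
                path.add(nxt)              # recursion stack prevents repetitions
                t = search(nxt, g + 1, threshold, path)
                path.remove(nxt)
                if t == 'FOUND':
                    return 'FOUND'
                minimum = min(minimum, t)
        return minimum

    threshold = h(start)                   # root's f value is the first T
    while True:
        t = search(start, 0, threshold, {start})
        if t == 'FOUND':
            return threshold               # solution cost
        if t == math.inf:
            return None                    # no solution
        threshold = t                      # min exceeding f becomes the next T

print(ida_star((0, 0), (3, 3), {(1, 1), (1, 2)}, 4, 4))  # -> 6
```

Note the only storage is the recursion stack and the path set, in contrast with A*'s open and closed lists.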
Example: IDA*, T = 5
[Figure: the grid example searched depth-first with threshold 5; nodes such as C1(5), C2(5), D1(5), C3(5), D2(5), D3(5) are searched, while B1(7), E1(7), E3(7), C3(7) exceed the threshold. Moves are generated in the order left, up, right, down.]
Example: IDA*, next iteration T = 7
[Figure: with threshold 7 the search reaches Goal(7) and exits once f ≤ T and the goal is found; nodes with f = 9 such as A1, A3, A4, A5, and B6 are still cut off.]
Eliminating Redundant Nodes
We need to eliminate duplicate nodes. A trivial optimization for many domains is to disallow move reversals. For more sophisticated detection of redundant nodes, we can use a hash / transposition table.
IDA* vs. A*
A* does not have the iterative overhead of IDA*, but it needs to maintain a history of all nodes previously searched. In practice A* is faster than IDA*, but it runs out of memory very quickly!
IDA* vs. A*
For many types of problems, IDA* flounders in the cost of the re-searches, causing many to prefer A* over IDA*. IDA* is handicapped by having no storage, whereas A* uses a closed list, in effect a perfect cache of previously seen states. On the other hand, IDA* does not need much storage, and IDA* with a transposition table can be competitive with A*.
Which to Choose?
IDA* is guaranteed to work, albeit possibly more slowly. A* is more efficient, but can run out of memory, and can also run slower because of cache effects. The right choice depends on the properties of your application.
A* and IDA* Applications
Not only pathfinding, but single-agent search in general: optimization/scheduling, puzzles (e.g., the sliding puzzle).
A* Real-time Example (Baumgarten, 2009)
Deductive Search
Monte Carlo Tree Search
(Computer) Go
Vague concepts: life and death, territory, influence, patterns. It is hard to make implicit human knowledge explicit: with no good evaluation function, αβ search fails. Alternatives: neural networks, decision trees, rule-based approaches, theorem provers, pattern recognition. There has to be something better than this...
Monte-Carlo Evaluation
[Figure: a position evaluated by random play-outs returning sample results -2, 7, 17, 4, 12, -5]
Monte-Carlo Evaluation
Monte-Carlo sampling: an idea from physics and simulation (the Manhattan Project). Abramson (1990) applied it to Chess, Othello, and Tic-Tac-Toe; Brügmann (1993) to Go; later also Phantom Go, Bridge, and Scrabble. It boomed in the 2000s (Bouzy & Helmstetter, 2003).
Limits of MC Evaluation
The time cost per sample must be low, and statistics on the samples must predict reliably. Pure sampling can choose a move with suboptimal game value, so we integrate it with tree search. Problems: slow node evaluation and limited look-ahead.
[Figure: two moves with sample means μ = 0.33 and μ = 1; their minimum outcomes (0.33 vs. 0.1) show that the higher-mean move can hide a worse worst case]
Monte-Carlo Tree Search (Coulom, 2007; Kocsis & Szepesvári, 2006)
Overcomes the limits of MC evaluation by combining it with search: gradually build a search tree, best-first. Very popular: a revolution in Go, applied in other abstract games (Amazons, Hex, LOA, Scotland Yard, Chinese Checkers, Lord of the Rings), real-time games (e.g., Ms Pac-Man), and real-life domains such as optimization, scheduling, and security.
MCTS Scheme (repeated X times)
Selection: the selection strategy is applied recursively until an unknown position is reached.
Play-out: one simulated game is played.
Expansion: one node is added to the tree.
Backpropagation: the result of the game is backpropagated in the tree.
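The four phases above can be sketched end-to-end on a toy game; everything below is a hypothetical minimal implementation, using a subtraction game (remove 1 or 2 from a pile; whoever takes the last object wins) so that play-outs are trivial.

```python
import math, random

class TreeNode:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children, self.visits, self.wins = [], 0, 0.0
        self.untried = [m for m in (1, 2) if m <= pile]

def uct_child(node, c=1.4):
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def playout(pile):
    """Random moves to the end; +1 if the player to move at `pile` wins."""
    sign = 1
    while pile > 0:
        pile -= random.choice([m for m in (1, 2) if m <= pile])
        sign = -sign
    return -sign    # the player who just took the last object won

def mcts(root_pile, iterations=3000):
    root = TreeNode(root_pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one child for an untried move
        if node.untried:
            m = node.untried.pop()
            child = TreeNode(node.pile - m, node, m)
            node.children.append(child)
            node = child
        # 3. Play-out from the new position
        result = playout(node.pile)     # +1: player to move at `node` wins
        # 4. Backpropagation, flipping the perspective at each level
        while node is not None:
            node.visits += 1
            node.wins += (1 - result) / 2   # reward for whoever moved into node
            result = -result
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

random.seed(1)
print(mcts(4))   # best first move from a pile of 4 (leave a multiple of 3)
```

The final move is chosen by visit count, a common robust choice; real implementations differ mainly in the selection formula and the play-out policy.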
MCTS Example
[Figure: a small MCTS tree with root R and nodes A-F and Z; a play-out from the newly expanded node ends in a win for Player 1, which is backpropagated]
Selection: Multi-Armed Bandit Problem
Selection Step: UCT
Upper Confidence Bounds (Auer et al., 2002) balance exploitation and exploration. Upper Confidence bounds applied to Trees, UCT (Kocsis & Szepesvári, 2006), selects the child k of node p according to
k = argmax_i ( v_i + C × sqrt( ln(n_p) / n_i ) ),
where v_i is the value of node i (average reward), n_i is the visit count of i, n_p is the visit count of p, and C is an exploration coefficient.
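The UCT rule can be written as a standalone function; the child statistics and the C value below are hypothetical illustrations.

```python
import math

def uct_select(parent_visits, children, c=1.4):
    """children: list of (value v_i, visit count n_i) pairs.
    Returns the index maximizing v_i + C * sqrt(ln(n_p) / n_i)."""
    def ucb(stats):
        v, n = stats
        return v + c * math.sqrt(math.log(parent_visits) / n)
    return max(range(len(children)), key=lambda i: ucb(children[i]))

# A well-explored strong child vs. a barely explored weak one:
children = [(0.6, 100), (0.2, 2)]
print(uct_select(1000, children))  # -> 1: the rarely visited child is tried
```

This shows the exploration term at work: the weaker child wins the comparison because its visit count is tiny, exactly the exploitation/exploration trade-off the formula encodes.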
Possible Enhancements
Play-out
Plays nearly-random moves (by default); moves are played until the end of the game ("battle of the apes"). Better: use heuristics, i.e., a simulation strategy.
Play-out: Game-Independent Strategies
Based on move statistics: MAST (Finnsson and Björnsson, 2009), Last-Good-Reply (Drake & Baier, 2010), N-grams (Stankiewicz et al., 2011; Tak et al., 2012), NAST (Powley et al., 2013). All better than random.
Play-out: Basic Knowledge
Simple: roulette-wheel selection (moves have a certain weight based on their category); rules to exclude bad moves and rules to enforce playing good moves.
Sophisticated: MCTS does not require an evaluation function, but if you have one, make use of it.
Play-out: Heuristic Evaluation Function
Selecting moves ε-greedily: with probability ε select a random move, otherwise use the evaluation function. Alternatively, use an n-ply (αβ) search to determine which move to select; this is time-consuming and needs aggressive pruning. Used in Chinese Checkers, LOA, Focus, Breakthrough.
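The ε-greedy simulation step can be sketched in a few lines; the move names and scores below are a made-up evaluation function for illustration.

```python
import random

def epsilon_greedy_move(moves, evaluate, epsilon=0.1, rng=random):
    """With probability epsilon play a random move; otherwise play the
    move the evaluation function likes best."""
    if rng.random() < epsilon:
        return rng.choice(moves)          # explore
    return max(moves, key=evaluate)       # exploit the evaluation

# Toy example: moves scored by a hypothetical evaluation function.
scores = {'a': 0.1, 'b': 0.9, 'c': 0.4}
random.seed(0)
picks = [epsilon_greedy_move(list(scores), scores.get, 0.1) for _ in range(100)]
print(picks.count('b'))  # mostly 'b', with occasional random exploration
```

The residual randomness is important: the next slide warns that a fully deterministic simulation strategy can hurt by removing exploration.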
Play-out: Heuristic Evaluation Function
Early cut-off: stop a play-out before a terminal state is reached.
Dynamic: check the evaluation function v0(s) periodically; return 1 if v0(s) ≥ x or -1 if v0(s) ≤ -x (Winands et al., 2008, 2009).
Fixed-depth: play i moves in the play-out and then return v0(s), possibly scaled to [-1, 1] (Lorentz, 2008; Lorentz & Horey, 2013).
Play-out: Black Magic
Warning: a smart simulation strategy can perform worse than a simple one: too much determinism means less exploration. Conversely, adding knowledge may increase the number of play-outs. Why? Play-outs may become shorter.
Selection: Challenges for UCT
When few simulations have been played, UCT performs poorly. When the branching factor is too high, only a few games can be played per promising move.
Selection: Progressive Bias (Chaslot et al., 2008)
Adds domain knowledge Hi to UCT. For few simulations Hi is dominant; for many simulations, it behaves as standard UCT. The value of Hi can be stored at the node, and this knowledge can be more elaborate (and time-consuming) than that used in the play-out.
Selection: PUCT
This search-control strategy initially prefers actions with high prior probability and low visit count, but asymptotically prefers actions with high action value.
Selection: Progressive Widening (Chaslot et al., 2007; Coulom, 2007)
A form of forward pruning. While the visit count np < T, most of the moves are pruned and moves are chosen by the simulation strategy; when np = T, the domain knowledge is applied; when np > T, moves are selected by the selection strategy among the unpruned moves, which are progressively unpruned, until eventually (np ≫ T) every move can be chosen.
MCTS-Solver (Winands, Björnsson & Saito, 2008)
Deals with perfect knowledge: propagates proven wins and losses (in negamax fashion). Nodes that are proven are not selected anymore. Can detect hard traps (wins, losses), but not soft ones (e.g., losing a piece).
Implicit Minimax Backups (Lanctot et al., 2014)
Uses a heuristic evaluation function to handle soft traps, kept as a separate source of information v. Minimax-style backups are done implicitly during the standard update: an approximation of MCTS-Solver for subtrees that have not reached terminal states.
Implicit Minimax Backups in MCTS
Parallelization: Leaf, Root, and Tree Parallelization
Enzenberger & Müller (2009): tree parallelization without mutexes/locks.
MCTS Strengths / Weaknesses
MCTS applications
MCTS Application: Other Two-Player Turn-Based Games
Havannah, Hex, LOA, Amazons
MCTS: Puzzles
SameGame, Crossword Puzzle Construction, Sudoku
Real-Life Applications
MCTS is used in the following (real-life) applications:
Hospital planning: controlling several virtual characters in a 3D multi-player learning game (the 3D Virtual Operating Room project, 3DVOR, a learning game that aims at improving collaboration and communication among operating-room staff: surgeon, anesthesiologist, nurse anesthesiologist, and nurse instrumentalist).
Smart grids: balancing electricity supply and consumption is critical for the stable performance of an electricity grid. Demand-side management (DSM) refers to shifting consumers' energy usage to off-peak times as much as possible, to avoid demand exceeding the available supply during peaks.
Healthcare: determining good combinations of services that minimize the risk of emergency re-hospitalization as much as possible (Philips).
Chemical retrosynthesis and space exploration.
Healthcare: Service Selection
MCTS Application: Multi-Player Games
Rolit, Chinese Checkers, Blokus, Focus
Stochastic / Imperfect-Information Games
Scotland Yard, Magic: The Gathering, Chinese Dark Chess
General AI Game Players
General Video Game Playing, General Game Playing
Real-Time / Simultaneous-Move Games
Ms Pac-Man, StarCraft, Total War: Rome II, Tron
Ms Pac-Man - Demo
MCTS: Computer Go
Crazy Stone vs. Norimoto Yoda.
2008: MoGo Titan (INRIA France and UM) defeats a grand master with a 9-stone handicap.
2015: Crazy Stone defeats a grand master with a stone handicap.
Google DeepMind & Facebook enter the race.
AlphaGo (2016)
Deep convolutional neural networks, predicting 57% of the expert moves, integrated in MCTS. Ran on 1,202 CPUs and 176 GPUs.
AlphaGo (2016)
2016: Google DeepMind's AlphaGo defeats Lee Sedol 4-1, learning from human Go records, on massive hardware: 1,920 CPUs and 280 GPUs.
2017: Defeats the world No. 1 ranked player, Ke Jie.
AlphaGo Zero (2017)
Plays against itself using MCTS, repeatedly updating the CNN by analyzing the games played in batch: 29 million games on a $25 million machine. Beat AlphaGo 89-11; a 200-point Elo gap corresponds to a 75% probability of winning.
AlphaZero (2017)
A generalized version that can learn more games to play; it outperforms the strongest engines in Chess and Shogi (Japanese chess).
MCTS: In the End
Highly selective search; not much domain knowledge needed; easy to parallelize; can be used in real time.
MCTS Myths
"Evaluation functions become obsolete": no, they can be used in the play-out, for selection, and to assess the play-out.
"Classic search algorithms become obsolete": no, they can be used in the play-out and integrated into the search part of MCTS.
MCTS: Challenges
Forward models; abstraction / continuous domains; separate subgames / tasks (Go).
IEEE Conference on Computational Intelligence and Games, Maastricht, NL, August 14-17, 2018