Intelligent Search Techniques
Mark Winands & Cameron Browne
Contents
Adversarial Search
Single-Agent Search
Monte Carlo Tree Search
Applications
Computer Game-playing
Can computers beat humans in board games like Chess, Checkers, and Go? This was one of the first tasks posed for AI (Shannon, 1950).
Adversarial Search
Two (or more) opponents, each trying to maximize their own expectations.
Player 1 is called MAX: tries to obtain the maximum result and minimize that of the opponent.
Player 2 is called MIN: tries to obtain the minimum result and maximize that of the opponent.
Definitions - Nodes
Root node: the state (position) from which the search starts.
Terminal node: a node with a fixed, application-dependent value (e.g., win, loss, draw).
Leaf node (non-terminal): a node that has been assigned a heuristic value. A heuristic is an "educated guess" approximating the terminal value.
Internal nodes: nodes whose value is a function of their successors.
Definitions - Tree
Search depth d: the number of state transitions (moves) from the root of the search to the current position (measured in ply).
Branching factor b: the average number of successor nodes (moves).
Tree vs. directed acyclic graph (DAG): most game trees are really DAGs. A node has one parent in a tree, but possibly more than one in a DAG; a position reachable via multiple move orders is called a transposition.
Tree Traversal
Depth-first search, left to right. Other traversal orders are possible, but for the remainder we use this one!
MiniMax Search (Von Neumann, 1928)
[Figure: minimax tree with leaf values 7, 3, 4, 2; the MIN level returns 3 and 2, and the MAX root takes value 3]
Principal Variation
The path from the root to a leaf node under optimal play by both sides; also called the optimal path or main line.
[Figure: minimax tree with leaf values 3, 4, 2, 7 and the principal variation highlighted]
MiniMax Analysis
Complete? Yes (if the tree is finite).
Optimal? Yes (against an optimal opponent).
Time complexity? O(b^d).
Space complexity? O(bd) (depth-first exploration).
For chess, b ≈ 35 and d ≈ 100 for "reasonable" games, so an exact solution is completely infeasible. Can we do better?
Observation
Some nodes in the search can be proven to be irrelevant to the outcome of the search.
α-β Algorithm
[Figure: the minimax tree with leaf values 7, 3, 4, 2; after the first MIN subtree returns 3, a child of value 2 bounds the second MIN subtree to ≤ 2, so its remaining children are cut off (β-pruning)]
The Strength of α-β
[Figure: the same tree searched with α-β; in larger trees this mechanism yields more than a thousand prunings]
The Importance of the α-β Algorithm
[Figure: β-pruning example; the subtree containing values 4 and 2 is cut off once its bound drops to ≤ 2]
Example: Alpha-Beta Algorithm
[Figure: alpha-beta search of a depth-3 tree with leaf values -6, 5, 6, -6, -4, -2, 5, 4, 6; bounds such as ≥ 5, ≤ 5, ≤ -4, ≥ 6 are propagated, and the principal variation has value 5]
Example: Alpha-Beta Algorithm
[Figure: tree with leaf values 7, 6, 3, 2, 8, 9, 4, 5, 10, 1, 11, 12, 13, 14, 15, before pruning]
Example: Alpha-Beta Algorithm
[Figure: the same tree searched with alpha-beta; bounds propagate as 6, ≥ 6, ≤ 2, ≥ 8, ≤ 3, ≤ 1, illustrating both shallow pruning and deep pruning]
Alpha-Beta Algorithm
Why is it called alpha-beta? It maintains two bounds:
Alpha (α): a lower bound on the best value that the player can achieve.
Beta (β): an upper bound on what the opponent can achieve.
Search while maintaining α and β. Whenever α ≥ β, further search at this node is irrelevant.
NegaMax Formulation
The MiniMax formulation is awkward because the search alternates between MIN and MAX nodes. The NegaMax formulation allows only MAX nodes to be used (Knuth & Moore, 1975): always maximize, but negate the values first.
NegaMax
[Figure: the minimax tree rewritten in negamax form; minimax values at MIN leaf nodes are discarded and replaced by their negation, and every level negates the child values, then maximizes]
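The negate-then-maximize idea can be sketched in three lines, assuming leaf values are stored from the viewpoint of the player to move there (for this hypothetical depth-2 tree with MAX at the root, those coincide with the minimax leaf values).

```python
def negamax(node):
    """NegaMax: negate the child values, then maximize."""
    if not isinstance(node, list):
        return node   # leaf value, from the viewpoint of the player to move
    return max(-negamax(child) for child in node)

tree = [[7, 3], [4, 2]]   # depth-2 tree, MAX to move at the root
print(negamax(tree))      # -> 3, the same minimax value
```

The MIN level disappears: at a MIN node, max(-7, -3) = -3 is simply the negation of min(7, 3).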
Analysis
What is the best case for alpha-beta? Consider two cases in this minimax search:
Successor Ordering
Better known as move ordering. Alpha-beta's performance depends on getting cut-offs as soon as possible! At a node where a cut-off is possible, we ideally want to search (one of) the best move(s) first and cut off immediately.
Alpha-Beta Node Types
Define two node types:
ALL - all successors (moves) of a node must be considered.
CUT - a cut-off can occur; one or more successors (moves) of a node must be considered.
Minimal α-β Tree
[Figure: the minimal tree searched under perfect move ordering.] In reality you don't know the ordering in advance!
Alpha-Beta Analysis
Assume a fixed branching factor and a fixed depth.
Best case: approximately b^(d/2) nodes (more precisely b^⌈d/2⌉ + b^⌊d/2⌋ - 1).
Impact? For b = 10, d = 9:
Minimax: 10^9 = 1,000,000,000 nodes.
Alpha-beta: about 110,000 nodes.
Alpha-Beta Analysis
But... the best-case analysis depends on choosing the best move first at CUT nodes, which is not always possible.
The worst case? No cut-offs, and alpha-beta degrades to MiniMax.
Heuristic Search
Truncate the game tree (limited search depth), use a static heuristic evaluation function at the leaves to replace pay-offs, and run minimax (with alpha-beta) on the reduced game tree. Playing is then solving a sequence of these game trees. This approach works very well in Chess, Checkers, and Backgammon.
Quiescence Search (Selective Search)
A quiescent position is unlikely to show wild swings in value in the near future. Apply the evaluation function only to quiescent positions: expand until a quiescent position is found. Instead of using the evaluation function directly at the leaves, a special function is called that evaluates special moves (e.g., captures) only, down to unbounded depth.
Quiescence Search
[Figure: at depth 0 the static evaluation returns 50, but quiescence search reveals a value of 1000]
Isn't this good enough?
No! Thompson (1982) showed that search depth is strongly correlated with performance in chess: searching one move (one ply) deeper made a huge difference in playing strength. This holds for other games too!
Performance! Performance!
Improve alpha-beta to guarantee near best-case results: move ordering, windowing, iterative deepening, transposition tables.
Improve the heuristic evaluation.
Use parallelism to increase the search depth.
Why Alpha-Beta Search First?
Many search enhancements developed for alpha-beta translate to single-agent search. Most originated with alpha-beta and were later adopted by other classes of search algorithms.
Maxn Algorithm (Luckhardt and Irani, 1986)
A generalization of minimax to n players. Assumptions:
The players alternate moves.
Each player tries to maximize his/her own return and is indifferent to the returns of the others.
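The maxn backup rule can be sketched as follows, assuming each leaf holds a payoff vector with one entry per player and that players move in turn order; the tree below is a hypothetical fragment using payoff vectors from the slide's example.

```python
def maxn(node, player, num_players):
    """Return the payoff vector chosen at this node by `player`."""
    if isinstance(node, tuple):          # leaf: payoff vector
        return node
    nxt = (player + 1) % num_players
    children = [maxn(child, nxt, num_players) for child in node]
    # Each player maximizes only its own component of the vector.
    return max(children, key=lambda v: v[player])

# Hypothetical 3-player tree, player 0 to move at the root:
tree = [[(5, 3, 2), (7, 1, 8)], [(8, 5, 4), (4, 5, 4)]]
print(maxn(tree, 0, 3))  # -> (8, 5, 4)
```

Because each player looks only at its own component, alpha-beta-style bounds largely break down, which is one reason maxn prunes far less than two-player alpha-beta.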
Maxn Algorithm
[Figure: three-player maxn tree; leaves hold payoff vectors (5,3,2), (7,1,8), (8,5,4), (4,5,4), (1,8,3), (6,6,3), (3,6,3), (9,9,5), each player maximizes its own component, and the root value is (9,9,5)]
Paranoid Algorithm (Sturtevant and Korf, 2000)
Here we view all the other players as one big opponent. There is my own player, the MAX player; all the others are MIN players. The paranoid algorithm evaluates the tree as follows: when it is my turn to play, take the maximum of my utility; when it is not my turn (it is one of them), take the minimum of my utility.
Paranoid Algorithm
[Figure: paranoid search tree over leaf utilities 5, 7, 8, 4, 1, 6, 3, 9, with bounds ≥ 4 and ≤ 1 propagated; the root value is 4]
Expectimax Search Trees
Chance nodes are used when the outcome is uncertain. Search-based approaches must take into account all possibilities at a chance node, which increases the branching factor and makes deep search unlikely.
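A chance node averages its children, weighted by probability, instead of taking a max or min. The sketch below is a minimal expectimax; the ('max' | 'chance', children) node encoding and the coin-flip example are hypothetical.

```python
def expectimax(node):
    if not isinstance(node, tuple):
        return node                      # leaf payoff
    kind, children = node
    if kind == 'max':
        return max(expectimax(c) for c in children)
    # chance node: children are (probability, subtree) pairs
    return sum(p * expectimax(c) for p, c in children)

# Two moves, each followed by a fair coin flip over outcomes:
tree = ('max', [('chance', [(0.5, 10), (0.5, 0)]),
                ('chance', [(0.5, 4), (0.5, 4)])])
print(expectimax(tree))  # -> 5.0 (0.5*10 + 0.5*0 beats a certain 4)
```

Note that every chance outcome must be visited, which is the branching-factor blow-up the slide warns about.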
Expectimax Example
[Figure: expectimax tree example]
What to do in the endgame?
Alpha-beta with enhancements (move ordering, transposition tables).
Knowledge (domain dependent): endgame databases.
Special search algorithms (endgame solvers): proof-number search, lambda-search.
Which node has to be expanded?
[Figure: small AND/OR tree with MAX node a, MIN nodes b and c, and MAX leaves d-h; some leaves are known wins, others unknown]
Which node has to be expanded?
[Figure: a deeper tree with MAX nodes b-c, MIN leaves d-h at one level and i-m below, whose values are still unknown]
PN Search (Allis et al., 1994)
A best-first search method. Criterion: develop the leaf node that is most promising for proving the goal. Goal: (dis)prove the root node, e.g., that it is a win for the player to move.
Proof Number and Disproof Number
Proof number (pn): the minimum number of leaf nodes that have to be proved in order to prove the node.
Disproof number (dpn): the minimum number of leaf nodes that have to be disproved in order to disprove the node.
A proof number and a disproof number are maintained for each node; assume one expansion suffices for each unexpanded node.
PN Search (2)
Three types of leaf nodes:
Proved (goal is true): pn = 0, dpn = ∞.
Disproved (goal is false): pn = ∞, dpn = 0.
Unknown: pn = 1, dpn = 1.
PN Search: AND/OR Tree
Internal nodes are OR and AND nodes. In a minimax tree, OR is equivalent to MAX and AND to MIN.
To prove an OR node it suffices to prove one child; to disprove an OR node all the children have to be disproved.
To prove an AND node all the children have to be proved; to disprove an AND node it suffices to disprove one child.
PN Search: Backpropagation
Two types of internal nodes (tree):
OR node: pn = minimum of the children's pn; dpn = sum of the children's dpn.
AND node: pn = sum of the children's pn; dpn = minimum of the children's dpn.
PN Search: Node Selection
Best-first search: most-promising = most-proving. Path to the most-proving node: at an OR node choose the child with minimum pn; at an AND node choose the child with minimum dpn.
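The backpropagation and selection rules above can be sketched compactly over an explicit AND/OR tree. The Node class and the 'win' / 'loss' / '?' leaf labels are hypothetical encodings for illustration (goal: prove a win).

```python
import math

class Node:
    def __init__(self, kind, children=None):
        self.kind, self.children = kind, children or []
        if kind == 'win':    self.pn, self.dpn = 0, math.inf
        elif kind == 'loss': self.pn, self.dpn = math.inf, 0
        else:                self.pn, self.dpn = 1, 1   # unknown / internal

def update(node):
    """Backpropagation: recompute pn/dpn bottom-up."""
    if not node.children:
        return
    for c in node.children:
        update(c)
    pns  = [c.pn  for c in node.children]
    dpns = [c.dpn for c in node.children]
    if node.kind == 'OR':    # prove one child, disprove all children
        node.pn, node.dpn = min(pns), sum(dpns)
    else:                    # AND: prove all children, disprove one
        node.pn, node.dpn = sum(pns), min(dpns)

def most_proving(node):
    """Selection: min pn at OR nodes, min dpn at AND nodes."""
    while node.children:
        key = (lambda c: c.pn) if node.kind == 'OR' else (lambda c: c.dpn)
        node = min(node.children, key=key)
    return node

root = Node('OR', [Node('AND', [Node('win'), Node('?')]), Node('loss')])
update(root)
print(root.pn, root.dpn)          # -> 1 1
print(most_proving(root).kind)    # -> ? (the unknown leaf is expanded next)
```

A full implementation would expand the most-proving node, re-evaluate it, and update the numbers back up the tree until the root's pn or dpn becomes 0.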
PN Example
[Figure: PN search tree with nodes a-l; leaves are marked loss, win, draw, or unknown, with proof/disproof numbers 1 and 2 propagated through the tree]
PN Example
[Figure: the same tree after expanding the most-proving node; new children m and n are added and the proof/disproof numbers are updated]
PN Search: Strength and Weakness
[Figure: a narrow line of forced wins (strength) versus a wide tree of unknown nodes (weakness); the solution is 11 ply deep]
PN Search
Works well on endgame positions with forced moves and on other AND/OR structures (e.g., retrosynthesis). It mostly (dis)proves such positions faster than alpha-beta, finding deep forced wins along the line of weakest resistance. It does not always find the shortest path; extreme cases have been reported: Breuker (1998) found that PN search produced a mate in 114 while a mate in 4 existed.
PN Search
Does not need a domain-dependent heuristic evaluation function to determine the most-promising node to expand next; domain knowledge is not required. The shape of the tree and the values of the leaf nodes are used to select the most-proving node. You may still use heuristics at leaf nodes.
PN Search: Heuristic Knowledge
Use heuristics at unknown leaf nodes to set their proof and disproof numbers.
Domain-independent: the number of moves. At an OR node: dpn = number of moves; at an AND node: pn = number of moves.
Domain-dependent (knowledge): the heuristic should be admissible (never overestimate).
Single-Agent Search
A* algorithms
Deductive Search
A* Search (Hart et al., 1968)
Among the wide variety of pathfinding algorithms, A* is one of the best: a best-first search algorithm that will find the shortest path, if one exists, and will do so relatively quickly.
Application: Pathfinding
Consider a sample application: a tile-based graph. Find a minimal-cost path from a start node to a goal node. You can move one square horizontally or vertically, each with a cost of one. This can be generalized to include diagonals and variable costs.
A* Search
Idea: avoid expanding paths that are already expensive.
Evaluation function f(n) = g(n) + h(n), where
g(n) = cost so far to reach n,
h(n) = estimated cost from n to the goal (heuristic),
f(n) = estimated total cost of the path through n to the goal.
Admissible Heuristics
A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n. An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic.
Theorem: if h(n) is admissible, A* using TREE-SEARCH is optimal.
Heuristic: Manhattan Distance
A* Data Structures
OpenList: nodes in the graph/tree that are not yet fully considered, ordered from best to worst f value.
ClosedList: nodes that have been fully expanded.
A* Algorithm (1)
Take the best (first) node from the OpenList. Check whether it is a solution. Expand all its children. Move the node to the ClosedList: as far as we know, we are done with this node.
A* Algorithm (2)
Expanding a child: check if it has been seen before (Open/ClosedList). If the node has been seen before with the same or better g value, reject it; otherwise add it to the OpenList for consideration. In effect, the lists act as a cache of previously seen results. Note: the algorithm requires all nodes to be kept in these lists.
A* Example (grid with columns A-F and rows 1-6; start C1, goal at f = 7)
Step 1: Initialize. Open: (C1, 0+5 = 5, null); Closed: empty.
Step 2: Expand C1. Open: (C2, 1+4 = 5, C1), (D1, 1+4 = 5, C1), (B1, 1+6 = 7, C1).
Step 3: Expand C2. Open: (C3, 2+3 = 5, C2), (D2, 2+3 = 5, C2), (D1, 5, C1), (B1, 7, C1). Why isn't C1 re-added to the OpenList? C1 is found in the ClosedList with a lower g value.
Step 4: Expand C3. The open list now leads with (D3, 3+2 = 5, C3).
Steps 5-9: Expand D3, D2, D1, E3, and E4 in turn; all remaining f = 5 nodes are closed before the f = 7 nodes (B1, B3, E1, E3, E4, E5).
Step 10: Expand E5. (Goal, 7+0 = 7, E5) enters the open list ahead of the f = 9 nodes (E6, F3, F4, F5).
Step 11: Close Goal. Done; backtrack through the parent pointers (E5, E4, E3, D3, C3, C2, C1) to recover the path.
Sorting the Open List
Sort by increasing f value, but what about ties? Break ties based on g value: larger g values mean more accurate information and less heuristic approximation.
A* Pros and Cons
Pros: a clean algorithm, guaranteed to find the shortest path, and flexible.
Cons: can be computationally intensive, and doesn't handle unexplored areas well.
IDA*: Iterative Deepening A* (Korf, 1985)
The cost of a node is (in A* terms) f = g + h, where g is the cost incurred to reach the node and h is the heuristic estimate of reaching the goal. Iterative deepening iterates on a threshold T: search a node as long as f ≤ T. Either a solution is found (done), or the search fails, in which case the threshold is increased and a new search starts.
IDA* Tree
Depth-first search with the root's f value as the initial threshold T: search nodes with f ≤ T, then f ≤ T+1, and so on, repeating until a solution is found.
IDA* Comments
Automatically builds a variable-depth search: provably bad lines are cut off as soon as possible, and when the cut-off occurs depends on the quality of the evaluation function. Storage requirements are trivial: just the recursion stack, which can also be used to prevent repetitions along the investigated path. Iteration i+1 repeats all the work of iteration i! For some domains one can do better than incrementing by 1: use the minimum f value seen at a leaf node during an iteration as the next threshold.
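The threshold loop described above can be sketched for the same unit-cost grid; the helper names are hypothetical, and the path set implements the stack-based repetition check the slide mentions.

```python
import math

def ida_star(start, goal, blocked, width, height):
    def h(p):   # Manhattan distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    def search(node, g, threshold, path):
        f = g + h(node)
        if f > threshold:
            return f                       # fail: report the f that exceeded T
        if node == goal:
            return 'FOUND'
        minimum = math.inf
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height \
                    and nxt not in blocked and nxt not in path:
                path.add(nxt)              # recursion stack prevents repetitions
                t = search(nxt, g + 1, threshold, path)
                path.remove(nxt)
                if t == 'FOUND':
                    return 'FOUND'
                minimum = min(minimum, t)
        return minimum

    threshold = h(start)                   # root's f value is the first T
    while True:
        t = search(start, 0, threshold, {start})
        if t == 'FOUND':
            return threshold               # solution cost
        if t == math.inf:
            return None                    # no solution
        threshold = t                      # min exceeding f becomes the next T

print(ida_star((0, 0), (3, 3), {(1, 1), (1, 2)}, 4, 4))  # -> 6
```

Note the only storage is the recursion stack and the path set, in contrast with A*'s open and closed lists.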
Example: IDA*, T = 5
[Figure: the grid example searched depth-first with threshold 5; nodes such as C1(5), C2(5), D1(5), C3(5), D2(5), D3(5) are searched, while B1(7), E1(7), E3(7), C3(7) exceed the threshold. Moves are generated in the order left, up, right, down.]
Example: IDA*, next iteration T = 7
[Figure: with threshold 7 the search reaches Goal(7) and exits once f ≤ T and the goal is found; nodes with f = 9 such as A1, A3, A4, A5, and B6 are still cut off.]
Eliminating Redundant Nodes
We need to eliminate duplicate nodes. A trivial optimization for many domains is to disallow move reversals. For more sophisticated detection of redundant nodes, we can use a hash / transposition table.
IDA* vs. A*
A* does not have the iterative overhead of IDA*, but it needs to maintain a history of all nodes previously searched. In practice A* is faster than IDA*, but it runs out of memory very quickly!
IDA* vs. A*
For many types of problems, IDA* flounders in the cost of the re-searches, causing many to prefer A* over IDA*. IDA* is handicapped by having no storage, whereas A* uses a closed list, in effect a perfect cache of previously seen states. On the other hand, IDA* does not need much storage, and IDA* with a transposition table can be competitive with A*.
Which to Choose?
IDA* is guaranteed to work, albeit possibly more slowly. A* is more efficient, but can run out of memory, and can also run slower because of cache effects. The right choice depends on the properties of your application.
A* and IDA* Applications
Not only pathfinding, but single-agent search in general: optimization/scheduling, puzzles (e.g., the sliding puzzle).
A* Real-time Example (Baumgarten, 2009)
Deductive Search
Monte Carlo Tree Search
(Computer) Go
Vague concepts: life and death, territory, influence, patterns. It is hard to make implicit human knowledge explicit: with no good evaluation function, αβ search fails. Alternatives: neural networks, decision trees, rule-based approaches, theorem provers, pattern recognition. There has to be something better than this...
Monte-Carlo Evaluation
[Figure: a position evaluated by random play-outs returning sample results -2, 7, 17, 4, 12, -5]
Monte-Carlo Evaluation
Monte-Carlo sampling: an idea from physics and simulation (the Manhattan Project). Abramson (1990) applied it to Chess, Othello, and Tic-Tac-Toe; Brügmann (1993) to Go; later also Phantom Go, Bridge, and Scrabble. It boomed in the 2000s (Bouzy & Helmstetter, 2003).
Limits of MC Evaluation
The time cost per sample must be low, and statistics on the samples must predict reliably. Pure sampling can choose a move with suboptimal game value, so we integrate it with tree search. Problems: slow node evaluation and limited look-ahead.
[Figure: two moves with sample means μ = 0.33 and μ = 1; their minimum outcomes (0.33 vs. 0.1) show that the higher-mean move can hide a worse worst case]
Monte-Carlo Tree Search (Coulom, 2007; Kocsis & Szepesvári, 2006)
Overcomes the limits of MC evaluation by combining it with search: gradually build a search tree, best-first. Very popular: a revolution in Go, applied in other abstract games (Amazons, Hex, LOA, Scotland Yard, Chinese Checkers, Lord of the Rings), real-time games (e.g., Ms Pac-Man), and real-life domains such as optimization, scheduling, and security.
MCTS Scheme (repeated X times)
Selection: the selection strategy is applied recursively until an unknown position is reached.
Play-out: one simulated game is played.
Expansion: one node is added to the tree.
Backpropagation: the result of the game is backpropagated in the tree.
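The four phases above can be sketched end-to-end on a toy game; everything below is a hypothetical minimal implementation, using a subtraction game (remove 1 or 2 from a pile; whoever takes the last object wins) so that play-outs are trivial.

```python
import math, random

class TreeNode:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children, self.visits, self.wins = [], 0, 0.0
        self.untried = [m for m in (1, 2) if m <= pile]

def uct_child(node, c=1.4):
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def playout(pile):
    """Random moves to the end; +1 if the player to move at `pile` wins."""
    sign = 1
    while pile > 0:
        pile -= random.choice([m for m in (1, 2) if m <= pile])
        sign = -sign
    return -sign    # the player who just took the last object won

def mcts(root_pile, iterations=3000):
    root = TreeNode(root_pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one child for an untried move
        if node.untried:
            m = node.untried.pop()
            child = TreeNode(node.pile - m, node, m)
            node.children.append(child)
            node = child
        # 3. Play-out from the new position
        result = playout(node.pile)     # +1: player to move at `node` wins
        # 4. Backpropagation, flipping the perspective at each level
        while node is not None:
            node.visits += 1
            node.wins += (1 - result) / 2   # reward for whoever moved into node
            result = -result
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

random.seed(1)
print(mcts(4))   # best first move from a pile of 4 (leave a multiple of 3)
```

The final move is chosen by visit count, a common robust choice; real implementations differ mainly in the selection formula and the play-out policy.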
MCTS Example
[Figure: a small MCTS tree with root R and nodes A-F and Z; a play-out from the newly expanded node ends in a win for Player 1, which is backpropagated]
Selection: Multi-Armed Bandit Problem
Selection Step: UCT
Upper Confidence Bounds (Auer et al., 2002) balance exploitation and exploration. Upper Confidence bounds applied to Trees, UCT (Kocsis & Szepesvári, 2006), selects the child k of node p according to
k = argmax_i ( v_i + C × sqrt( ln(n_p) / n_i ) ),
where v_i is the value of node i (average reward), n_i is the visit count of i, n_p is the visit count of p, and C is an exploration coefficient.
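The UCT rule can be written as a standalone function; the child statistics and the C value below are hypothetical illustrations.

```python
import math

def uct_select(parent_visits, children, c=1.4):
    """children: list of (value v_i, visit count n_i) pairs.
    Returns the index maximizing v_i + C * sqrt(ln(n_p) / n_i)."""
    def ucb(stats):
        v, n = stats
        return v + c * math.sqrt(math.log(parent_visits) / n)
    return max(range(len(children)), key=lambda i: ucb(children[i]))

# A well-explored strong child vs. a barely explored weak one:
children = [(0.6, 100), (0.2, 2)]
print(uct_select(1000, children))  # -> 1: the rarely visited child is tried
```

This shows the exploration term at work: the weaker child wins the comparison because its visit count is tiny, exactly the exploitation/exploration trade-off the formula encodes.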
Possible Enhancements
Play-out
Plays nearly-random moves (by default); moves are played until the end of the game ("battle of the apes"). Better: use heuristics, i.e., a simulation strategy.
Play-out: Game-Independent Strategies
Based on move statistics: MAST (Finnsson and Björnsson, 2009), Last-Good-Reply (Drake & Baier, 2010), N-grams (Stankiewicz et al., 2011; Tak et al., 2012), NAST (Powley et al., 2013). All better than random.
Play-out: Basic Knowledge
Simple: roulette-wheel selection (moves have a certain weight based on their category); rules to exclude bad moves and rules to enforce playing good moves.
Sophisticated: MCTS does not require an evaluation function, but if you have one, make use of it.
Play-out: Heuristic Evaluation Function
Selecting moves ε-greedily: with probability ε select a random move, otherwise use the evaluation function. Alternatively, use an n-ply (αβ) search to determine which move to select; this is time-consuming and needs aggressive pruning. Used in Chinese Checkers, LOA, Focus, Breakthrough.
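The ε-greedy simulation step can be sketched in a few lines; the move names and scores below are a made-up evaluation function for illustration.

```python
import random

def epsilon_greedy_move(moves, evaluate, epsilon=0.1, rng=random):
    """With probability epsilon play a random move; otherwise play the
    move the evaluation function likes best."""
    if rng.random() < epsilon:
        return rng.choice(moves)          # explore
    return max(moves, key=evaluate)       # exploit the evaluation

# Toy example: moves scored by a hypothetical evaluation function.
scores = {'a': 0.1, 'b': 0.9, 'c': 0.4}
random.seed(0)
picks = [epsilon_greedy_move(list(scores), scores.get, 0.1) for _ in range(100)]
print(picks.count('b'))  # mostly 'b', with occasional random exploration
```

The residual randomness is important: the next slide warns that a fully deterministic simulation strategy can hurt by removing exploration.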
Play-out: Heuristic Evaluation Function
Early cut-off: stop a play-out before a terminal state is reached.
Dynamic: check the evaluation function v0(s) periodically; return 1 if v0(s) ≥ x or -1 if v0(s) ≤ -x (Winands et al., 2008, 2009).
Fixed-depth: play i moves in the play-out and then return v0(s), possibly scaled to [-1, 1] (Lorentz, 2008; Lorentz & Horey, 2013).
Play-out: Black Magic
Warning: a smart simulation strategy can perform worse than a simple one: too much determinism means less exploration. Conversely, adding knowledge may increase the number of play-outs. Why? Play-outs may become shorter.
Selection: Challenges for UCT
When few simulations have been played, UCT performs poorly. When the branching factor is too high, only a few games can be played per promising move.
Selection: Progressive Bias (Chaslot et al., 2008)
Adds domain knowledge Hi to UCT. For few simulations Hi is dominant; for many simulations, it behaves as standard UCT. The value of Hi can be stored at the node, and this knowledge can be more elaborate (and time-consuming) than that used in the play-out.
Selection: PUCT
This search-control strategy initially prefers actions with high prior probability and low visit count, but asymptotically prefers actions with high action value.
Selection: Progressive Widening (Chaslot et al., 2007; Coulom, 2007)
A form of forward pruning. While the visit count np < T, most of the moves are pruned and moves are chosen by the simulation strategy; when np = T, the domain knowledge is applied; when np > T, moves are selected by the selection strategy among the unpruned moves, which are progressively unpruned, until eventually (np ≫ T) every move can be chosen.
MCTS-Solver (Winands, Björnsson & Saito, 2008)
Deals with perfect knowledge: propagates proven wins and losses (in negamax fashion). Nodes that are proven are not selected anymore. Can detect hard traps (wins, losses), but not soft ones (e.g., losing a piece).
Implicit Minimax Backups (Lanctot et al., 2014)
Uses a heuristic evaluation function to handle soft traps, kept as a separate source of information v. Minimax-style backups are done implicitly during the standard update: an approximation of MCTS-Solver for subtrees that have not reached terminal states.
Implicit Minimax Backups in MCTS
Parallelization: Leaf, Root, and Tree Parallelization
Enzenberger & Müller (2009): tree parallelization without mutexes/locks.
MCTS Strengths / Weaknesses
MCTS applications
MCTS Application: Other Two-Player Turn-Based Games
Havannah, Hex, LOA, Amazons
MCTS: Puzzles
SameGame, Crossword Puzzle Construction, Sudoku
Real-Life Applications
MCTS is used in the following (real-life) applications:
Hospital planning: controlling several virtual characters in a 3D multi-player learning game (the 3D Virtual Operating Room project, 3DVOR, a learning game that aims at improving collaboration and communication among operating-room staff: surgeon, anesthesiologist, nurse anesthesiologist, and nurse instrumentalist).
Smart grids: balancing electricity supply and consumption is critical for the stable performance of an electricity grid. Demand-side management (DSM) refers to shifting consumers' energy usage to off-peak times as much as possible, to avoid demand exceeding the available supply during peaks.
Healthcare: determining good combinations of services that minimize the risk of emergency re-hospitalization as much as possible (Philips).
Chemical retrosynthesis and space exploration.
Healthcare: Service Selection
MCTS Application: Multi-Player Games
Rolit, Chinese Checkers, Blokus, Focus
Stochastic / Imperfect-Information Games
Scotland Yard, Magic: The Gathering, Chinese Dark Chess
General AI Game Players
General Video Game Playing, General Game Playing
Real-Time / Simultaneous-Move Games
Ms Pac-Man, StarCraft, Total War: Rome II, Tron
Ms Pac-Man - Demo
MCTS: Computer Go
Crazy Stone vs. Norimoto Yoda.
2008: MoGo Titan (INRIA France and UM) defeats a grand master with a 9-stone handicap.
2015: Crazy Stone defeats a grand master with a stone handicap.
Google DeepMind & Facebook enter the race.
AlphaGo (2016)
Deep convolutional neural networks, predicting 57% of the expert moves, integrated in MCTS. Ran on 1,202 CPUs and 176 GPUs.
AlphaGo (2016)
2016: Google DeepMind's AlphaGo defeats Lee Sedol 4-1, learning from human Go records, on massive hardware: 1,920 CPUs and 280 GPUs.
2017: Defeats the world No. 1 ranked player, Ke Jie.
AlphaGo Zero (2017)
Plays against itself using MCTS, repeatedly updating the CNN by analyzing the games played in batch: 29 million games on a $25 million machine. Beat AlphaGo 89-11; a 200-point Elo gap corresponds to a 75% probability of winning.
AlphaZero (2017)
A generalized version that can learn more games to play; it outperforms the strongest engines in Chess and Shogi (Japanese chess).
MCTS: In the End
Highly selective search; not much domain knowledge needed; easy to parallelize; can be used in real time.
MCTS Myths
"Evaluation functions become obsolete": no, they can be used in the play-out, for selection, and to assess the play-out.
"Classic search algorithms become obsolete": no, they can be used in the play-out and integrated into the search part of MCTS.
MCTS: Challenges
Forward models; abstraction / continuous domains; separate subgames / tasks (Go).
IEEE Conference on Computational Intelligence and Games, Maastricht, NL, August 14-17, 2018