Intelligent Search Techniques


1 Intelligent Search Techniques
Mark Winands & Cameron Browne

2 Contents
Adversarial Search, Single-Agent Search, Monte Carlo Tree Search, Applications

3 Computer Game-playing
Can computers beat humans in board games like Chess, Checkers, or Go? This was one of the first tasks posed to AI (Shannon, 1950).

4 Adversarial Search
Two (or more) opponents, each trying to maximize their expectations
Player 1 is called MAX: obtain the maximum result, minimize that of the opponent
Player 2 is called MIN: obtain the minimum result, maximize that of the opponent

5 Definitions - Nodes
Root node: the state (position) which is to be searched
Terminal node: a node which has a fixed, application-dependent value (e.g., win, loss, draw)
Leaf node (non-terminal): a node which has been assigned a heuristic value; a heuristic is an "educated guess" approximating the terminal value
Internal nodes: nodes whose value is a function of their successors

6 Definitions - Tree
Search depth d: the number of state transitions (moves) from the root of the search to the current state (measured in ply)
Branching factor b: the average number of successor nodes (moves)
Tree vs. directed acyclic graph (DAG): most game trees are really DAGs; a node has one parent in a tree, possibly more than one in a DAG. Reaching the same node along different paths is called a transposition.

7 Tree Traversal
Depth-first search, left to right. Other traversal orders are possible, but for the remainder we use this one!

8 MiniMax Search (Von Neumann, 1928)
(Example tree: leaves 7, 3, 4, 2; the MIN nodes back up 3 and 2, and MAX takes the root value 3.)

9 Principal Variation
The path from the root to a leaf node under optimal play by each side; also called the optimal path or main line.

10 MiniMax Analysis
Complete? Yes (if the tree is finite)
Optimal? Yes (against an optimal opponent)
Time complexity? O(b^d)
Space complexity? O(bd) (depth-first exploration)
For chess, b ≈ 35, d ≈ 100 for "reasonable" games → an exact solution is completely infeasible
Can we do better?

11 Observation Some nodes in the search can be proven to be irrelevant to the outcome of the search

12 α-β Algorithm
(Example: after the left MIN subtree returns 3, the right MIN node is abandoned as soon as a child of value 2 shows its value is ≤ 2: a β-pruning.)

13 The Strength of α-β
(The same example tree; in larger trees, α-β can yield more than a thousand prunings.)

14 The Importance of the α-β Algorithm
(Diagram: the same β-pruning occurs with a different ordering of the leaves.)

15 Example: Alpha-Beta Algorithm
(Worked example: bounds such as ≥ 5 and ≤ −4 propagate through the tree; the principal variation is marked.)

16 Example: Alpha-Beta Algorithm
(Example tree with leaf values 7, 6, 3, 2, 8, 9, 4, 5, 10, 1, 11, 12, 13, 14, 15.)

17 Example: Alpha-Beta Algorithm
(The same tree searched with alpha-beta; both shallow pruning and deep pruning occur.)

18 Alpha-Beta Algorithm
Why is it called alpha-beta? Maintain two bounds:
Alpha (α): a lower bound on the best value that the player can achieve
Beta (β): an upper bound on what the opponent can achieve
Search, maintaining α and β; whenever α ≥ β, further search at this node is irrelevant
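The α ≥ β cut-off can be sketched directly on the same nested-list tree representation (numeric leaves, lists as internal nodes). A minimal illustration, not the lecture's own code:

```python
# Alpha-beta search: prune a node as soon as alpha >= beta, because the
# opponent already has a better alternative elsewhere in the tree.
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):
        return node                       # leaf value
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:             # beta cut-off: MIN avoids this node
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:                 # alpha cut-off: MAX avoids this node
            break
    return value
```

On the earlier example `[[7, 3], [4, 2]]` it returns the same root value 3 as plain minimax, but stops searching the right MIN node once its value is known to be ≤ 2.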

19 NegaMax Formulation
The MiniMax formulation is awkward because the search alternates between MIN and MAX nodes
The NegaMax formulation uses only MAX nodes (Knuth & Moore, 1975): always maximize, but negate the values first

20 NegaMax
(Diagram: at every level, negate the children's values and then maximize; the minimax values at MIN leaf nodes are replaced by their negations.)
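The "negate, then maximize" rule collapses minimax into one case. A minimal sketch on the same nested-list trees, assuming leaves are scored from the viewpoint of the player to move at the leaf:

```python
# NegaMax (Knuth & Moore, 1975): every node maximizes the negated values of
# its children, so no separate MIN case is needed.
def negamax(node):
    if not isinstance(node, list):
        return node                       # leaf value, from the mover's view
    return max(-negamax(child) for child in node)
```

On `[[7, 3], [4, 2]]` (leaves two plies deep, so scored from the root player's view) this yields the same root value 3 as minimax.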

21 Analysis What is the best case for Alpha-Beta?
Consider two cases in this MiniMax Search:

22 Successor Ordering
Better known as move ordering
Alpha-beta's performance depends on getting cut-offs as soon as possible!
At a node where a cut-off is possible, we ideally want to search (one of) the best move(s) first, and cut off immediately

23 Alpha-Beta Node Types
Define two node types:
ALL – all successors (moves) of a node must be considered
CUT – a cut-off can occur; only one or more successors (moves) of a node must be considered

24 Minimal α-β Tree In reality you don’t know this!

25 Alpha-Beta Analysis
Assume a fixed branching factor and a fixed depth
Best case: approximately b^(d/2) nodes
Impact? b = 10, d = 9
Minimax: 10^9 = 1,000,000,000
Alpha-beta: about 110,000
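The impact of the best case is easy to verify numerically. A small sketch using the exact best-case leaf count b^⌈d/2⌉ + b^⌊d/2⌋ − 1 from Knuth & Moore (1975):

```python
# Compare plain minimax leaf counts with alpha-beta's best case.
import math

def minimax_leaves(b, d):
    return b ** d

def alphabeta_best_case(b, d):
    # Knuth & Moore (1975): b^ceil(d/2) + b^floor(d/2) - 1 leaves.
    return b ** math.ceil(d / 2) + b ** math.floor(d / 2) - 1

print(minimax_leaves(10, 9))       # 1000000000
print(alphabeta_best_case(10, 9))  # 109999
```

So with b = 10 and d = 9, perfect move ordering reduces the work from a billion leaves to roughly 110,000.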

26 Alpha-Beta Analysis
But… the best-case analysis depends on choosing the best move first at CUT nodes (not always possible)
The worst case? No cut-offs, and alpha-beta degrades to MiniMax

27 Heuristic Search
Truncate the game tree (limited search depth)
Use a (static heuristic) evaluation function at the leaves to replace pay-offs
Run minimax (with alpha-beta) on the reduced game tree
Playing is solving a sequence of these game trees
This approach works very well in Chess, Checkers, and Backgammon

28 Quiescence Search
A quiescent position is unlikely to show wild swings of value in the near future
Apply the evaluation function only to quiescent positions: expand until a quiescent position is found
Instead of using the evaluation function directly at the leaves, a special function is called that evaluates special moves (e.g., captures) only, down to (in principle unlimited) depth
This is a form of selective search

29 Quiescence Search
(Example: at the depth-0 leaf the static evaluation is 50, but quiescence search returns 1000.)
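The special leaf function can be sketched in negamax form. A minimal illustration with an assumed game interface: `evaluate(state)` gives a static value from the mover's viewpoint, `capture_moves(state)` yields only the "special" moves (e.g., captures), and `apply_move(state, move)` returns the resulting state; all three are hypothetical names passed in as functions:

```python
# Quiescence search (negamax form): stand pat on the static evaluation,
# then follow only tactical moves until the position is quiet.
def quiescence(state, alpha, beta, evaluate, capture_moves, apply_move):
    stand_pat = evaluate(state)            # static value as a lower bound
    if stand_pat >= beta:
        return beta                        # fail-hard beta cut-off
    alpha = max(alpha, stand_pat)
    for move in capture_moves(state):
        score = -quiescence(apply_move(state, move), -beta, -alpha,
                            evaluate, capture_moves, apply_move)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha
```

In a quiet position (no captures available) this simply returns the static evaluation, which is exactly the "expand until quiescent" behaviour described above.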

30 Isn't this good enough?
No! Thompson (1982): search depth is strongly correlated with performance in chess
Searching one move (one ply) deeper made a (huge) difference in performance
This holds for other games too!

31 Performance! Performance!
Improve alpha-beta to guarantee near best-case results: move ordering, windowing, iterative deepening, transposition tables
Improve the heuristic evaluation
Use parallelism to increase the search depth

32 Why Alpha-Beta search first?
Many search enhancements developed for alpha-beta translate to single-agent search Most originated with alpha-beta, and were adopted by other classes of search algorithms

33 Maxn Algorithm
A generalization of minimax to n players (Luckhardt and Irani, 1986)
Assumptions: the players alternate moves; each player tries to maximize his/her return and is indifferent to the returns of the others

34 Maxn Algorithm
(Example three-player tree with pay-off tuples at the leaves, such as (5,3,2), (7,1,8), (8,5,4), (4,5,4), (1,8,3), (6,6,3), (3,6,3), and (9,9,5); each player maximizes its own component, giving the root value (9,9,5).)
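The max^n backup is one line once the tree representation is fixed. A minimal sketch, assuming a hypothetical tree of `("leaf", payoffs)` and `("node", player, children)` tuples:

```python
# max^n backup: at each internal node, the player to move picks the child
# whose pay-off tuple maximizes that player's own component.
def maxn(node):
    if node[0] == "leaf":
        return node[1]                    # tuple of pay-offs, one per player
    _, player, children = node
    return max((maxn(c) for c in children), key=lambda payoffs: payoffs[player])
```

Note that ties are broken arbitrarily here; in practice the tie-breaking rule can change the max^n value of the tree.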

35 Paranoid Algorithm
Here we see the other players as one big opponent (Sturtevant and Korf, 2000)
There is my own player, the MAX player, and there are all the others, the MIN players
The paranoid algorithm evaluates the tree as follows: when it is my turn to play, take the maximum of my utility; when it is not my turn (it is one of them), take the minimum of my utility

36 Paranoid Algorithm
(Worked example: with the opponents merged into a single MIN player, alpha-beta-style bounds such as ≥ 4 and ≤ 1 apply again.)

37 Expectimax Search Trees
Chance nodes are used when the outcome is uncertain
Search-based approaches must take into account all possibilities at a chance node
This increases the branching factor, making deep search unlikely

38 Expectimax Example
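Averaging over all possibilities at a chance node can be sketched as follows. A minimal illustration over a hypothetical tree of tagged tuples: `("leaf", value)`, `("max", children)`, `("min", children)`, and `("chance", [(probability, child), ...])`:

```python
# Expectimax: MAX and MIN nodes back up as in minimax, while chance nodes
# back up the probability-weighted average of their children.
def expectimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "max":
        return max(expectimax(c) for c in node[1])
    if kind == "min":
        return min(expectimax(c) for c in node[1])
    # chance node: expected value over the outcome distribution
    return sum(p * expectimax(c) for p, c in node[1])
```

For example, a chance node with two equally likely children of value 2 and 4 backs up 3.0.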

39 What to do in the endgame?
Alpha-beta with enhancements (move ordering, transposition tables)
Knowledge (domain dependent): endgame databases
Special search algorithms (endgame solvers): proof-number search, lambda-search

40 Which node has to be expanded?
(Diagram: a MAX/MIN tree in which some leaves are already known wins and others are still unknown.)

41 Which node has to be expanded?
(Diagram: a deeper MAX/MIN tree with several unknown leaves below the MIN level.)

42 PN Search
Allis et al. (1994): a best-first search method
Criterion: develop the leaf node that is most promising for proving the goal
Goal: (dis)prove the root node, e.g., to be a win for the player to move

43 Proof Number and Disproof Number
Proof number (pn): the minimum number of leaf nodes which have to be proved in order to prove the node
Disproof number (dpn): the minimum number of leaf nodes which have to be disproved in order to disprove the node
A proof number and a disproof number are maintained for each node; for each unexpanded node, one expansion is assumed to suffice

44 PN Search (2)
Three types of leaf nodes:
Proved (goal is true): pn = 0, dpn = ∞
Disproved (goal is false): pn = ∞, dpn = 0
Unknown: pn = 1, dpn = 1

45 PN Search: AND/OR Trees
Internal nodes are OR and AND nodes; in a minimax tree, OR is equivalent to MAX and AND to MIN
To prove an OR node it suffices to prove one child; to disprove an OR node, all children have to be disproved
To prove an AND node, all children have to be proved; to disprove an AND node it suffices to disprove one child

46 PN Search: Backpropagation
Two types of internal nodes (tree):
OR node: pn = minimum of the children's pn, dpn = sum of the children's dpn
AND node: pn = sum of the children's pn, dpn = minimum of the children's dpn

47 PN Search: Node Selection
Best-first search: most-promising = most-proving
Path to the most-promising node: at an OR node choose the child with minimum pn; at an AND node choose the child with minimum dpn
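The backup and selection rules together can be sketched compactly. A minimal illustration, assuming a hypothetical `Node` with fields `kind` ("OR"/"AND"), `children`, `pn`, and `dpn`:

```python
# Proof-number backup and most-proving-node selection.
INF = float("inf")

class Node:
    def __init__(self, kind, children=None, pn=1, dpn=1):
        self.kind, self.children = kind, children or []
        self.pn, self.dpn = pn, dpn       # unknown leaves start at 1/1

def update(node):
    # OR: pn = min, dpn = sum; AND: pn = sum, dpn = min.
    if node.children:
        pns = [c.pn for c in node.children]
        dpns = [c.dpn for c in node.children]
        if node.kind == "OR":
            node.pn, node.dpn = min(pns), sum(dpns)
        else:
            node.pn, node.dpn = sum(pns), min(dpns)

def most_proving(node):
    # Descend to a leaf: min-pn child at OR nodes, min-dpn child at AND nodes.
    while node.children:
        key = (lambda c: c.pn) if node.kind == "OR" else (lambda c: c.dpn)
        node = min(node.children, key=key)
    return node
```

Proving one leaf on this path lowers the root's proof number, which is why expanding the most-proving node is the best-first choice.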

48 PN Example
(Diagram: a tree with win, loss, and draw leaves; each node is annotated with its pn/dpn pair.)

49 PN Example
(Diagram: after expanding the most-proving node, the pn/dpn values are updated along the path back to the root.)

50 PN Search: Strength and Weakness
(Diagrams: a strength case, where an 11-ply-deep solution lies along narrow forced-win lines, and a weakness case with many equally promising children.)

51 PN Search
Endgame positions with forced moves; other AND/OR structures (retrosynthesis)
Mostly (dis)proves this kind of position faster than alpha-beta: deep forced wins, via the line of weakest resistance
Does not always find the shortest path: extreme cases have been reported; Breuker (1998) found that PN search gave a mate in 114 while there existed a mate in 4

52 PN Search
Does not need a domain-dependent heuristic evaluation function to determine the most-promising node to be expanded next: domain knowledge is not required
The shape of the tree and the values of the leaf nodes are used to select the most-proving node
You may still use heuristics at leaf nodes

53 PN Search: Heuristic Knowledge
Use heuristics at unknown leaf nodes to set their proof and disproof numbers
Domain-independent: the number of moves; at an OR node dpn = # moves, at an AND node pn = # moves
Domain-dependent (knowledge): the heuristic should be admissible (never overestimate)

54 Single-Agent Search

55 Single Agent Search A* algorithms Deductive Search

56 A* Search Hart et al. (1968) Among the wide variety of pathfinding algorithms, A* is one of the best. A* is a best-first search algorithm which will find the shortest path, if it exists, and will do so relatively quickly

57 Application: Pathfinding
Consider a sample application: a tile-based graph
Find a minimal-cost path from a start node to a goal node
Can move one square horizontally or vertically, each with a cost of one
Can be generalized to include diagonals and variable costs

58 A* Search
Idea: avoid expanding paths that are already expensive
Evaluation function f(n) = g(n) + h(n)
g(n) = the cost so far to reach n
h(n) = the estimated cost from n to the goal (heuristic)
f(n) = the estimated total cost of the path through n to the goal
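The f = g + h ordering can be sketched on the tile-based grid from the example. A minimal illustration (not the lecture's own code), assuming cells are `(row, col)` tuples, `walls` is a set of blocked cells, and moves are 4-connected with unit cost:

```python
# A* on a grid: a priority queue ordered by f = g + h plays the OpenList,
# and the parents dict plays the ClosedList.
import heapq, itertools

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar(start, goal, walls, rows, cols):
    tie = itertools.count()                    # tie-breaker for the heap
    open_list = [(manhattan(start, goal), 0, next(tie), start, None)]
    parents, g_best = {}, {start: 0}
    while open_list:
        f, g, _, node, parent = heapq.heappop(open_list)
        if node in parents:
            continue                           # already closed
        parents[node] = parent                 # close the node
        if node == goal:
            path = []                          # backtrack through parents
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nb[0] < rows and 0 <= nb[1] < cols and nb not in walls:
                ng = g + 1                     # unit step cost
                if ng < g_best.get(nb, float("inf")):
                    g_best[nb] = ng            # best g seen so far for nb
                    heapq.heappush(open_list,
                                   (ng + manhattan(nb, goal), ng, next(tie), nb, node))
    return None                                # no path exists
```

For example, on a 3×3 grid with a single wall between start and goal, the search routes around the wall and returns the 5-cell optimal path.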

59 Admissible Heuristics
A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n. An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic Theorem: If h(n) is admissible, A* using TREE-SEARCH is optimal

60 Heuristic: Manhattan Distance

61 A* Data Structures
OpenList: nodes in the graph/tree that have not been fully considered, ordered from best to worst f value
ClosedList: nodes that have been fully expanded

62 A* Algorithm (1)
Take the best (first) node from the OpenList
Check for a solution
Expand all the children
Move the node to the ClosedList: as far as we know, we are done with this node

63 A* Algorithm (2)
Expanding a child: check if it has been seen before (OpenList/ClosedList); if the node has been seen before with the same or a better g value, reject it, otherwise add it to the OpenList for consideration
In effect the lists act as a cache of previously seen results
NOTE: the algorithm requires all nodes to be kept in these lists

64 Example
Step 1: Initialize. Open: (C1, 0+5 = 5, null); Closed: ( ).
Step 2: Expand C1. Open: (C2, 1+4 = 5, C1), (D1, 1+4 = 5, C1), (B1, 1+6 = 7, C1); Closed: (C1, 0+5 = 5, null).
(Grid with columns A–F and rows 1–6; each visited cell is annotated with g+h = f.)

65 Example
Step 3: Expand C2. Open: (C3, 2+3 = 5, C2), (D2, 2+3 = 5, C2), (D1, 1+4 = 5, C1), (B1, 1+6 = 7, C1); Closed: (C1, 0+5 = 5, null), (C2, 1+4 = 5, C1).
Why isn't C1 added to the OpenList? C1 is found in the ClosedList with a lower value.

66 Example
Step 4: Expand C3. Open: (D3, 3+2 = 5, C3), (D2, 2+3 = 5, C2), (D1, 1+4 = 5, C1), (B3, 3+4 = 7, C3), (B1, 1+6 = 7, C1); Closed: (C1, 0+5 = 5, null), (C2, 1+4 = 5, C1), (C3, 2+3 = 5, C2).

67 Example
Step 5: Expand D3. Open: (D2, 2+3 = 5, C2), (D1, 1+4 = 5, C1), (E3, 4+3 = 7, D3), (B3, 3+4 = 7, C3), (B1, 1+6 = 7, C1); Closed: (C1, 0+5 = 5, null), (C2, 1+4 = 5, C1), (C3, 2+3 = 5, C2), (D3, 3+2 = 5, C3).

68 Example
Step 6: Expand D2. Open: (D1, 1+4 = 5, C1), (E3, 4+3 = 7, D3), (B3, 3+4 = 7, C3), (B1, 1+6 = 7, C1); Closed: (C1, 0+5 = 5, null), (C2, 1+4 = 5, C1), (C3, 2+3 = 5, C2), (D3, 3+2 = 5, C3), (D2, 2+3 = 5, C2).

69 Example
Step 7: Expand D1. Open: (E3, 4+3 = 7, D3), (B3, 3+4 = 7, C3), (E1, 2+5 = 7, D1), (B1, 1+6 = 7, C1); Closed: (C1, 0+5 = 5, null), (C2, 1+4 = 5, C1), (C3, 2+3 = 5, C2), (D3, 3+2 = 5, C3), (D2, 2+3 = 5, C2), (D1, 1+4 = 5, C1).

70 Example
Step 8: Expand E3. Open: (E4, 5+2 = 7, E3), (B3, 3+4 = 7, C3), (E1, 2+5 = 7, D1), (B1, 1+6 = 7, C1), (F3, 5+4 = 9, E3); Closed: (C1, 0+5 = 5, null), (C2, 1+4 = 5, C1), (C3, 2+3 = 5, C2), (D3, 3+2 = 5, C3), (D2, 2+3 = 5, C2), (D1, 1+4 = 5, C1), (E3, 4+3 = 7, D3).

71 Example
Step 9: Expand E4. Open: (E5, 6+1 = 7, E4), (B3, 3+4 = 7, C3), (E1, 2+5 = 7, D1), (B1, 1+6 = 7, C1), (F4, 6+3 = 9, E4), (F3, 5+4 = 9, E3); Closed: (C1, 0+5 = 5, null), (C2, 1+4 = 5, C1), (C3, 2+3 = 5, C2), (D3, 3+2 = 5, C3), (D2, 2+3 = 5, C2), (D1, 1+4 = 5, C1), (E3, 4+3 = 7, D3), (E4, 5+2 = 7, E3).

72 Example
Step 10: Expand E5. Open: (Goal, 7+0 = 7, E5), (B3, 3+4 = 7, C3), (E1, 2+5 = 7, D1), (B1, 1+6 = 7, C1), (E6, 7+2 = 9, E5), (F5, 7+2 = 9, E5), (F4, 6+3 = 9, E4), (F3, 5+4 = 9, E3); Closed: (C1, 0+5 = 5, null), (C2, 1+4 = 5, C1), (C3, 2+3 = 5, C2), (D3, 3+2 = 5, C3), (D2, 2+3 = 5, C2), (D1, 1+4 = 5, C1), (E3, 4+3 = 7, D3), (E4, 5+2 = 7, E3), (E5, 6+1 = 7, E4).

73 Example
Step 11: Close the Goal. Done! Backtrack through the parent pointers to find the path: C1 → C2 → C3 → D3 → E3 → E4 → E5 → Goal.
Closed: (C1, 0+5 = 5, null), (C2, 1+4 = 5, C1), (C3, 2+3 = 5, C2), (D3, 3+2 = 5, C3), (D2, 2+3 = 5, C2), (D1, 1+4 = 5, C1), (E3, 4+3 = 7, D3), (E4, 5+2 = 7, E3), (E5, 6+1 = 7, E4), (Goal, 7+0 = 7, E5).

74 Sorting the Open List
Sort by increasing f value, but what about ties?
Break ties based on the g value: larger g values mean more accurate information and less heuristic approximation

75 A* Pros and Cons
Pros: a clean algorithm; guaranteed to find the shortest path; flexible
Cons: can be computationally intensive; doesn't handle unexplored areas well

76 IDA*
Iterative deepening A* (Korf, 1985)
The cost of a node is (using A* terms) f = g + h, where g is the cost incurred to get to this node and h is the heuristic estimate of getting to the goal
Iterative deepening iterates on a threshold T: search a node as long as f ≤ T; either find a solution (done), or fail, in which case the threshold is increased and a new search starts

77 IDA* Tree
Depth-first search; the root's f value gives the initial threshold T
First search only nodes with f ≤ T, then with f ≤ T+1 (the next threshold), and so on; repeat until a solution is found

78 IDA* Comments
Automatically builds a variable-depth search: provably bad lines are cut off as soon as possible; when the cutoff occurs depends on the quality of the evaluation function
Storage requirements are trivial: just the recursion stack; the stack can be used to prevent repetitions along the investigated path
Iteration i+1 repeats all the work of iteration i!
For some domains you can do better than iterating by 1: use the minimum f value seen at a leaf node during an iteration as the next threshold
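The threshold loop, including the minimum-f update just described, can be sketched as follows. A minimal illustration with an assumed interface: `successors(node)` yields `(child, step_cost)` pairs and `h(node)` is an admissible heuristic; both names are hypothetical and passed in as functions:

```python
# IDA*: repeated depth-first searches bounded by a growing f-threshold.
def ida_star(root, is_goal, successors, h):
    threshold = h(root)
    while True:
        found, value = _search(root, 0, threshold, is_goal, successors, h)
        if found:
            return value                     # g-cost of the goal reached
        if value == float("inf"):
            return None                      # search space exhausted
        threshold = value                    # smallest f that exceeded T

def _search(node, g, threshold, is_goal, successors, h):
    f = g + h(node)
    if f > threshold:
        return False, f                      # candidate for the next threshold
    if is_goal(node):
        return True, g
    minimum = float("inf")
    for child, cost in successors(node):
        found, value = _search(child, g + cost, threshold, is_goal, successors, h)
        if found:
            return True, value
        minimum = min(minimum, value)
    return False, minimum
```

Only the recursion stack is stored, which is exactly the "trivial storage" point above; duplicate detection (e.g., disallowing move reversals) would be added inside `successors`.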

79 Example: IDA*, T = 5
(Diagram: the depth-first iteration with threshold 5 on the same grid expands C1 (5), C2 (5), C3 (5), D3 (5), and so on, cutting off nodes with f = 7 such as B1, E3, and E1. Moves are generated in the order left, up, right, down.)

80 Example: IDA*
Next iteration, T = 7
(Diagram: the search now also expands nodes with f = 7, such as B3 (7), E3 (7), E4 (7), and E5 (7), and exits when f ≤ T and the Goal (7) is reached.)

81 Eliminating Redundant Nodes
Need to eliminate duplicate nodes Trivial optimization for many domains is to disallow move reversals For more sophisticated detection of redundant nodes, we can use a hash / transposition table

82 IDA* vs. A*
A* does not have the iterative overhead of IDA*, but it needs to maintain a history of all nodes previously searched
In practice A* is faster than IDA*, but A* runs out of memory very quickly!

83 IDA* vs. A*
For many types of problems, IDA* flounders in the cost of the re-searches, causing many to prefer A* over IDA*
IDA* is handicapped by having no storage! A* uses a closed list, in effect a perfect cache of previously seen states
IDA* does not need much storage, and IDA* with a transposition table can be competitive with A*

84 Which to Choose?
IDA* is guaranteed to work, albeit possibly more slowly
A* is more efficient, but can run out of memory; it can also run slower because of cache effects
The right choice depends on the properties of your application

85 A* and IDA*
Not only for pathfinding: single-agent search applications include optimization/scheduling and puzzles (e.g., the sliding puzzle)

86 A* Real-time Example Baumgarten (2009)

87 Deductive Search

88 Monte Carlo Tree Search

89 (Computer) Go
Vague concepts: life and death, territory, influence, patterns
Hard to make implicit human knowledge explicit: no evaluation function, so αβ search fails
Alternatives: neural networks, decision trees, rule-based approaches, theorem provers, pattern recognition
There has to be something better than this…

90 Monte-Carlo Evaluation
(Diagram: random play-outs from a position yield sample outcomes such as −2, 7, 17, 4, 12, and −5, which are averaged into an evaluation.)

91 Monte-Carlo Evaluation
Monte-Carlo sampling: the idea comes from physics and simulation (the Manhattan Project)
Abramson (1990) in Chess, Othello, Tic-Tac-Toe; Brügmann (1993) in Go; also Phantom Go, Bridge, Scrabble
Boomed in the 2000s: Bouzy & Helmstetter (2003)

92 Limits of MC Evaluation
The time cost per sample must be low
Statistics on samples must predict reliably: the method can choose a move with a suboptimal game value
Integrate with tree search; problems: slow node evaluation and limited look-ahead
(Diagram: a move with mean μ = 1 but minimum 0.1 may be preferred over a move with a guaranteed value of 0.33.)

93 Monte-Carlo Tree Search
Coulom (2007) and Kocsis & Szepesvári (2006): overcomes the limits of MC evaluation + search
Gradually builds a search tree; best-first search
Very popular: a revolution in Go
Applied in other abstract games: Amazons, Hex, LOA, Scotland Yard, Chinese Checkers, Lord of the Rings
Also real-time games (e.g., Ms Pac-Man) and real-life domains: optimization, scheduling, and security

94 MCTS Scheme
Repeated X times:
Selection – the selection strategy is applied recursively until an unknown position is reached
Play-out – one simulated game is played
Expansion – one node is added to the tree
Backpropagation – the result of this game is backpropagated in the tree
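The four-step scheme can be sketched as a compact UCT-based loop. This is a minimal illustration, not any particular engine's implementation; the game interface (`moves`, `play`, `terminal`, `winner`, `mover`) is assumed and passed in as functions:

```python
# Monte-Carlo Tree Search: selection, expansion, play-out, backpropagation.
import math, random

class TreeNode:
    def __init__(self, state, move=None, parent=None):
        self.state, self.move, self.parent = state, move, parent
        self.children, self.untried = [], None
        self.visits, self.wins = 0, 0.0

def mcts(root_state, moves, play, terminal, winner, mover, iters=1000, c=1.4):
    root = TreeNode(root_state)
    root.untried = list(moves(root_state))
    for _ in range(iters):
        node = root
        # 1. Selection: UCT descent while fully expanded and non-terminal.
        while node.untried == [] and node.children:
            node = max(node.children, key=lambda ch:
                       ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one child for an untried move.
        if node.untried:
            m = node.untried.pop()
            child = TreeNode(play(node.state, m), m, node)
            child.untried = list(moves(child.state))
            node.children.append(child)
            node = child
        # 3. Play-out: random moves until the game ends.
        state = node.state
        while not terminal(state):
            state = play(state, random.choice(moves(state)))
        w = winner(state)
        # 4. Backpropagation: credit wins to the player who moved into each node.
        while node is not None:
            node.visits += 1
            if node.parent is not None and mover(node.state) == w:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

Here `mover(state)` is assumed to return the player who made the move leading to `state`, so each node's win count is from the viewpoint of the player who chose it; the final move is picked by visit count, a common robust choice.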

95 MCTS
(Diagram: one MCTS iteration on a small tree; the play-out from the newly added node ends in a win for Player 1, which is backpropagated to the root.)

96 Selection: Multi-Armed Bandit Problem

97 Selection Step: UCT
Upper Confidence Bound (Auer et al., 2002): balance exploitation and exploration
Upper Confidence bounds applied to Trees, UCT (Kocsis & Szepesvári, 2006), selects the child i of node p that maximizes v_i + C × √(ln n_p / n_i), where v_i is the value of node i (its average reward), n_i is the visit count of i, n_p is the visit count of p, and C is an exploration coefficient
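The UCT rule in isolation is a one-liner per child. A minimal sketch, representing each child as an assumed `(average_reward, visit_count)` pair:

```python
# UCT child selection: maximize v_i + C * sqrt(ln(n_p) / n_i).
import math

def uct_select(children, parent_visits, c=1.4):
    # children: list of (average_reward, visit_count); returns the index of
    # the child with the highest upper confidence bound.
    def score(child):
        v, n = child
        return v + c * math.sqrt(math.log(parent_visits) / n)
    return max(range(len(children)), key=lambda i: score(children[i]))
```

With C = 0 the rule is purely greedy; with a positive C, a rarely visited child can win the comparison even against a higher-valued sibling, which is the exploration half of the trade-off.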

98 Possible Enhancements

99 Play-out
Plays nearly random moves by default; moves are played until the end of the game ("Battle of the Apes")
Uses heuristics (a simulation strategy)

100 Play-out: Game-Independent
Based on move statistics: MAST (Finnsson and Björnsson, 2009), Last-Good-Reply (Drake & Baier, 2010), N-grams (Stankiewicz et al., 2011; Tak et al., 2012), NAST (Powley et al., 2013)
Better than random

101 Play-out: Basic Knowledge
Simple: roulette-wheel selection (based on its category, each move has a certain weight); rules to exclude bad moves and rules to enforce playing good moves
Sophisticated: MCTS does not require an evaluation function, but if you have one, make use of it

102 Play-out: Heuristic Evaluation Function
Selecting moves ε-greedily: with probability ε select a random move, otherwise use the evaluation function
Alternatively, use an n-ply search to determine which move to select (αβ search): time-consuming, needs aggressive pruning
Used in Chinese Checkers, LOA, Focus, Breakthrough
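The ε-greedy rule for play-outs is a few lines. A minimal sketch, assuming a hypothetical `evaluate(state, move)` function that scores the position after a move:

```python
# Epsilon-greedy move selection for MCTS play-outs: random with
# probability epsilon, otherwise the heuristically best move.
import random

def epsilon_greedy_move(state, legal_moves, evaluate, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(legal_moves)          # explore: random move
    # exploit: move with the best heuristic evaluation
    return max(legal_moves, key=lambda m: evaluate(state, m))
```

Keeping ε above zero preserves some randomness in the play-outs, which matters for the "black magic" warning below: a fully deterministic simulation strategy can hurt.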

103 Play-out: Heuristic Evaluation Function
Early cutoff: stop a play-out before a terminal state is reached
Dynamic: periodically check the evaluation function v0(s); return 1 if v0(s) ≥ x or −1 if v0(s) ≤ −x (Winands et al., 2008, 2009)
Fixed depth: play i moves in the play-out and then return v0(s) (or scaled to [−1, 1]) (Lorentz, 2008; Lorentz & Horey, 2013)

104 Play-out: Black Magic
Warning: a smart simulation strategy can perform worse than a simple one (too much determinism, less exploration)
Adding knowledge may increase the number of play-outs. Why? Play-outs may be shorter.

105 Selection: Challenges for UCT
When few simulations have been played, UCT performs poorly
When the branching factor is too high, only a few games can be played for each promising move

106 Selection: Progressive Bias
Chaslot et al. (2008): adds domain knowledge Hi to UCT
For few simulations Hi is dominant; for many simulations, the strategy behaves like standard UCT
The value of Hi can be stored at the node; this knowledge can be more elaborate (and time-consuming) than what is used in the play-out

107 Selection: PUCT
This search control strategy initially prefers actions with high prior probability and low visit count, but asymptotically prefers actions with high action value

108 Selection: Progressive Widening
Chaslot et al. (2007) and Coulom (2007): a form of forward pruning based on the visit count n_p
When n_p < T, most of the moves are pruned and the simulation strategy is used
When n_p = T, the domain knowledge is applied
When n_p > T, moves are selected according to the selection strategy amongst the unpruned moves, and moves are progressively unpruned
When n_p ≫ T, every move can be chosen

109 MCTS-Solver
Winands, Björnsson & Saito (2008)
Deals with perfect knowledge: propagates proven wins and losses (in the negamax way); proven nodes are not selected anymore
Can detect hard traps (wins, losses), but not soft ones (e.g., losing a piece)

110 Implicit Minimax Backups
Lanctot et al. (2014): use a heuristic evaluation function for soft traps
A separate source of information v; minimax-style backups are done implicitly during the standard update
An approximation of MCTS-Solver for subtrees that didn't reach terminal states

111 Implicit Minimax Backups in MCTS

112 Parallelization
Leaf parallelization, root parallelization, tree parallelization
Enzenberger & Müller (2009): tree parallelization without mutexes/locks

113 MCTS Strengths / Weaknesses

114 MCTS applications

115 MCTS Application: Other two-player turn-based games
Havannah Hex LOA Amazons

116 MCTS: Puzzles
SameGame, Crossword Puzzle Construction, Sudoku

117 Real-Life Applications
MCTS is used in the following (real-life) applications:
Hospital planning: controlling several virtual characters in a 3D multi-player learning game (the 3D Virtual Operating Room project (3DVOR), a learning game that aims at improving collaboration and communication amongst operating room staff: surgeon, anesthesiologist, nurse anesthesiologist, and nurse instrumentalist)
Smart grids: balancing electricity supply and consumption is critical for the stable performance of an electricity grid; Demand-Side Management (DSM) refers to shifting consumers' energy usage to off-peak times as much as possible, to avoid more electricity demand than available supply during peak times
Healthcare: determining good combinations of services that minimize the risk of emergency re-hospitalization as much as possible (Philips)
Chemical retrosynthesis and space exploration

118 Healthcare: Service Selection

119 MCTS Application: Multi-player Games
Rolit Chinese Checkers Blokus Focus

120 Stochastic/Imperfect-Information Games
Scotland Yard Magic: The Gathering Chinese Dark Chess

121 General Game Playing
General AI game players: General Video Game Playing and General Game Playing

122 Real-Time / Simultaneous-Move Games
Ms Pac-Man, StarCraft, Total War: Rome II, Tron

123 Ms Pac-Man - Demo

124 MCTS: Computer Go
2008: MoGo Titan (INRIA France and UM) defeats a grand master with a 9-stone handicap
2015: Crazy Stone defeats a grand master (Norimoto Yoda) in a handicap game
Google DeepMind & Facebook enter the race

125 AlphaGo (2016)
Deep convolutional neural networks: predicts 57% of the expert moves
Integrated in MCTS
1,202 CPUs and 176 GPUs

126 AlphaGo (2016)
2016: Google DeepMind's AlphaGo defeats Lee Sedol 4–1; learned from human Go records; massive hardware: 1,920 CPUs and 280 GPUs
2017: defeats the world No. 1 ranked player, Ke Jie

127 AlphaGo Zero (2017)
Plays against itself using MCTS; repeatedly updates the CNN by analyzing the games played in batches
29 million games; a $25 million machine
A 200-point (Elo) gap corresponds to a 75% probability of winning; defeated AlphaGo 89–11

128 AlphaZero (2017)
A generalized version that can learn more games to play
Outperforms the strongest engines in Chess and Shogi (Japanese chess)

129 MCTS: In the End
Highly selective search
Not much domain knowledge needed
Easy to parallelize
Can be used in real time

130 MCTS Myths
Evaluation functions become obsolete: no! They can be used in the play-out, for selection, and to assess the play-out
Classic search algorithms become obsolete: no! They can be used in the play-out and integrated in the search part of MCTS

131 MCTS: Challenges
Forward model
Abstraction / continuous domains
Separate subgames / tasks (Go)

132 IEEE Conference on Computational Intelligence and Games Maastricht, NL, August 14-17, 2018

