Adversarial Search Reference: "Artificial Intelligence: A Modern Approach, 3rd ed" (Russell and Norvig)
Goal Find the best move to make in a two-agent, zero-sum game.
o win = +1, lose = -1
o player1's score + player2's score = 0
Ideally, do this as quickly as possible.
Terms:
o MAX = us; we're trying to maximize our score.
o MIN = opponent; they're trying to minimize our score.
Brute-Force (minimax) Given: B (current board state), create a search tree.
[Tree diagram: root B (MAX's move) branches to B1, B2, …, Bn (MIN's move); each Bi branches to Bi,1 … Bi,m, and the levels alternate between MAX's and MIN's moves down to terminal boards such as Bq,r, scored +1.]
Problems A lot of states to calculate / evaluate!
– For tic-tac-toe, at most 9! = 362,880 states.
– For chess, vastly more (zillions of years to calculate).
We may need to limit the ply (the number of times both MIN and MAX move).
– Cuts down on search tree size.
– But… we're not always seeing the game to its end.
– Often necessitates a heuristic score of the board (from MAX's point of view).
Also, there are many win / lose cases – which is best?
– If we get to the win through nodes where MIN picks their best move, we stand a better chance of winning.
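To make the "heuristic score of the board" idea concrete, here is a minimal sketch for tic-tac-toe (not from the slides; the scoring rule and names are illustrative assumptions): score the board as the number of lines still winnable for MAX ('X') minus the number still winnable for MIN ('O').

```python
# All 8 winning lines of a 3x3 board, as index triples into a flat list.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def heuristic(board):
    """Hypothetical evaluation: open lines for 'X' minus open lines for 'O'.

    board is a list of 9 cells, each 'X', 'O', or ' ' (empty).
    A line is "open" for a player if the opponent has no mark in it.
    """
    def open_for(player):
        other = 'O' if player == 'X' else 'X'
        return sum(1 for line in LINES
                   if all(board[i] != other for i in line))
    return open_for('X') - open_for('O')

empty = [' '] * 9
center = [' '] * 9
center[4] = 'X'                 # taking the center opens 4 lines for X
print(heuristic(empty))         # -> 0 (symmetric position)
print(heuristic(center))        # -> 4
```

Any function of this shape works as the leaf score when the search is cut off before a real win/loss is reached; better heuristics prune the same tree but rank moves more accurately.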
Minimax algorithm Let's say the heuristics (shown beside the boxes) look like this, from MAX's point of view (a 1-ply look-ahead):
[Tree diagram: root B (MAX's move) with children B1, B2, B3 (MIN's move); each Bi has leaf children Bi,1, Bi,2, Bi,3 with heuristic values beside them.]
MIN wants to minimize the score, so they would choose the lowest value on their turn(s).
MAX wants to maximize the score, so they would choose the highest value on their turn(s).
So… against an optimal opponent, MAX will get a score of 3 if they make move #1. The values are "backed up" the tree.
Analysis Always picks the optimal solution (assuming the heuristic is good). But… it does a complete depth-first traversal of states (up to the max ply).
Another way of looking at minimax minimax(B) = max(min(3, 12, 8), min(2, 4, 6), min(14, 5, 2)) = max(3, 2, 2) = 3
But… notice if we hadn't evaluated the 4 or 6:
minimax(B) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2)) = max(3, min(2, x, y), 2) = max(3, z, 2), where z ≤ 2 (why??)
A: because min(2, x, y) can never exceed 2, and the first branch already guarantees 3, so MAX would never choose the second branch.
= 3
The trick is: how can we determine this algorithmically?
[Tree diagram: root B (MAX's move) with MIN children B1, B2, B3 and leaves B1,1 … B3,3 holding the values above.]
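The nested max(min(…)) computation above can be sketched directly as recursive code. This is a minimal illustration (the nested-list tree encoding is an assumption, not from the slides): internal nodes are lists of children, leaves are heuristic values.

```python
def minimax(node, maximizing=True):
    """Plain minimax over a nested-list game tree.

    Leaves are numeric heuristic scores (from MAX's point of view);
    internal nodes are lists of child subtrees. Levels alternate
    between MAX (take the max) and MIN (take the min).
    """
    if not isinstance(node, list):          # leaf: return its heuristic value
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# The slide's example: a MAX root over three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # -> 3, i.e. max(min(3,12,8), min(2,4,6), min(14,5,2))
```

Note that this version evaluates every leaf; the pruning trick described above is what alpha-beta adds.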
alpha-beta search Track these two values (each recursive call has its own copy):
– α: the best (highest) value found so far for MAX along the path through this node.
– β: the best (lowest) value found so far for MIN along the path through this node.
Together these bound the range of values MAX can expect if the game goes through this node.
alpha-beta search, cont. If looking at a MAX node:
– Possibly update α (if a child branch is higher).
– Terminate early if we see a child branch ≥ β.
– Return the maximal child value that we looked at [and the action].
If looking at a MIN node:
– Possibly update β (if a child branch is lower).
– Terminate early if we see a child branch ≤ α.
– Return the minimal child value that we looked at [and the action].
alpha-beta algorithm

def alpha_beta(state):
    v = max_value(state, -∞, +∞)
    return move with value v

def max_value(state, α, β):
    if ending_state(state): return value(state)
    v = -∞
    for each move in actions(state):
        r = result(state, move)
        v = max(v, min_value(r, α, β))
        if v ≥ β: return v
        α = max(α, v)
    return v

def min_value(state, α, β):
    if ending_state(state): return value(state)
    v = +∞
    for each move in actions(state):
        r = result(state, move)
        v = min(v, max_value(r, α, β))
        if v ≤ α: return v
        β = min(β, v)
    return v
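The pseudocode above can be made runnable with only small changes. This sketch reuses the nested-list tree encoding (an assumption for illustration; leaves are heuristic values, internal nodes are lists of children), so `ending_state` becomes "is this a leaf?" and `actions`/`result` become iterating over children:

```python
import math

def max_value(node, alpha, beta):
    if not isinstance(node, list):        # ending state: return the leaf value
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                     # beta cutoff: MIN above will never
            return v                      # let the game reach this node
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if not isinstance(node, list):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:                    # alpha cutoff: MAX above already has
            return v                      # a branch at least this good
        beta = min(beta, v)
    return v

def alpha_beta(tree):
    return max_value(tree, -math.inf, math.inf)

# Same example tree as before; the 4 and 6 are never evaluated.
print(alpha_beta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # -> 3
```

A real game implementation would also remember which move produced `v` at the root; that bookkeeping is omitted here to keep the control flow identical to the pseudocode.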
Trace (indentation shows recursion depth):
max_value(A) [α = -∞, β = ∞, v = -∞]
  min_value(B) [α = -∞, β = ∞, v = -∞]
    max_value(C) [α = -∞, β = ∞, v = -∞]
      min_value(D) [α = -∞, β = ∞]
      min_value(E) [α = 5, β = ∞]
      min_value(F) [α = 5, β = ∞]
    max_value(G) [α = -∞, β = 8, v = -∞]
      min_value(H) [α = -∞, β = 8] break out of loop early b/c v (12) ≥ β (8)
  min_value(K) [α = 8, β = ∞, v = -∞]
    max_value(L) [α = 8, β = ∞, v = -∞]
      min_value(M) [α = 8, β = ∞]
      min_value(N) [α = 8, β = ∞] break out of loop b/c v (4) ≤ α (8)
[Tree diagram: root A with internal nodes B, C, G, K, L and leaf nodes D, E, F, H, I, J, M, N, P, Q, S, T; backed-up values are shown along the branches, and the final answer is the move A -> B.]
Analysis Alpha-beta pruning can shave off some state checks.
Move ordering:
– It does best when moves are examined best-first: highest => lowest at MAX nodes, lowest => highest at MIN nodes.
– Sometimes it's possible to order moves: e.g., chess: captures first, then threats, then forward moves, then backward moves.
– Sometimes you can't, though.
Worst case: alpha-beta pruning prunes nothing, and you have plain minimax.
Cutoff depth (or time) constraints.
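The effect of move ordering can be demonstrated by counting leaf evaluations. This small experiment (an illustrative sketch, not from the slides; the nested-list tree encoding is an assumption) runs the same alpha-beta search on the example tree with its best branch examined first versus last:

```python
import math

def alpha_beta(node, alpha, beta, maximizing, counter):
    """Alpha-beta over a nested-list tree, counting leaf evaluations."""
    if not isinstance(node, list):
        counter[0] += 1                    # count each leaf we actually evaluate
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alpha_beta(child, alpha, beta, False, counter))
            if v >= beta:
                return v                   # beta cutoff
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alpha_beta(child, alpha, beta, True, counter))
            if v <= alpha:
                return v                   # alpha cutoff
            beta = min(beta, v)
        return v

def count_leaves(tree):
    counter = [0]
    alpha_beta(tree, -math.inf, math.inf, True, counter)
    return counter[0]

good = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # best MIN branch examined first
bad  = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]   # best branch examined last
print(count_leaves(good), count_leaves(bad))  # -> 7 9
```

Even on this tiny 9-leaf tree, good ordering skips two leaves that bad ordering must evaluate; on deep trees, ideal ordering roughly doubles the reachable search depth for the same effort.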
Games of Chance (stochastic games)
"Modern" Applications Deep Blue (IBM c.1996) Beat Gary Kasparov Algorithms (Chess 4.0): – a playbook of common opening and closing moves – alpha-beta – quiescence search (searching those branches that look "promising" (heurisitic) a bit deeper) Helps avoid the horizon problem. – a few more optimizations