
G51IAI Introduction to AI: Minimax and Alpha-Beta Pruning. Garry Kasparov and Deep Blue. © 1997, GM Gabriel Schwartzman's Chess Camera, courtesy IBM.

Game Playing – Minimax Game Playing An opponent tries to thwart your every move. In 1944 John von Neumann outlined a search method (Minimax) that maximised your position whilst minimising your opponent's.

Example Game: Tic-Tac-Toe

Game Playing – Example Nim (a simple game) Start with a single pile of tokens. At each move the player must select a pile and divide its tokens into two non-empty, non-equal piles.
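The move rule can be sketched directly. A minimal Python helper (the function name and the sorted-tuple state representation are my own choices, not from the slides) that lists all legal successor states:

```python
def nim_moves(piles):
    """All states reachable in one move: choose a pile and split it
    into two non-empty, non-equal piles (states are sorted tuples)."""
    moves = set()
    for i, pile in enumerate(piles):
        for first in range(1, (pile + 1) // 2):   # first < pile - first
            rest = piles[:i] + piles[i + 1:]
            moves.add(tuple(sorted(rest + (first, pile - first))))
    return sorted(moves)
```

For example, `nim_moves((7,))` gives `[(1, 6), (2, 5), (3, 4)]`, while a state of only 1s and 2s has no legal moves, since a pile of 2 can only be split into two equal piles.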

Game Playing – Minimax Starting with 7 tokens, the game is small enough that we can draw the entire game tree. The “game tree” describing all possible games follows:

The game tree for Nim starting from 7 tokens (each row is one level; each state lists its pile sizes):
7
6-1   5-2   4-3
5-1-1   4-2-1   3-2-2   3-3-1
4-1-1-1   3-2-1-1   2-2-2-1
3-1-1-1-1   2-2-1-1-1
2-1-1-1-1-1

Game Playing – Nim Game Tree NOTE: We converted the tree of possible games to a graph by merging nodes that have the same “game state” – this just saves repetition of work. But what do we do with the “game tree”? How can we use it to help decide how to play? Use the “Minimax Method”.
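The merging of identical states can be demonstrated with a short sketch (assuming the split rule above; since a state is a sorted tuple of pile sizes, permutations of the same piles collapse into one graph node):

```python
def successors(state):
    """One-move successors of a Nim state (a sorted tuple of pile sizes)."""
    succ = set()
    for i, pile in enumerate(state):
        for a in range(1, (pile + 1) // 2):          # a < pile - a
            rest = state[:i] + state[i + 1:]
            succ.add(tuple(sorted(rest + (a, pile - a))))
    return succ

def reachable(start):
    """Every distinct game state reachable from `start` (a graph, not a tree)."""
    seen, frontier = {start}, [start]
    while frontier:
        for nxt in successors(frontier.pop()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen
```

`len(reachable((7,)))` is 14, matching the 14 distinct states in the merged graph above, far fewer than the nodes of the unmerged game tree.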

Game Playing - Minimax In order to implement minimax we need a method of measuring how good a position is. Often called a utility function –a.k.a. score, evaluation function, utility value, … Initially this will be a value that describes our position exactly

Game Playing – Minimax Conventionally, in discussions of minimax, there are two players, “MAX” and “MIN”. The utility function is taken to be the utility for MAX. Larger values are better for MAX.

Game Playing – Nim Remember that larger values are taken to be better for MAX. Assume that we use a utility function of: – 1 = a win for MAX – 0 = a win for MIN We only compare values (“larger or smaller”), so the actual sizes do not matter – in other games we might use {+1, 0, −1} for {win, draw, lose}.

Game Playing – Minimax Basic idea of minimax: Player MAX is going to take the best move available – MAX will select the next state to be the one with the highest utility. Hence, the value of a MAX node is the MAXIMUM of the values of the next possible states – i.e. the maximum of its children in the search tree.

Game Playing – Minimax Player MIN is going to take the best move available for MIN – i.e. the worst available for MAX – and will select the next state to be the one with the lowest utility – recall, higher utility values are better for MAX and so worse for MIN. Hence, the value of a MIN node is the MINIMUM of the values of the next possible states – i.e. the minimum of its children in the search tree.

Game Playing – Minimax Summary A “MAX” move takes the best move for MAX – so takes the MAX utility of the children. A “MIN” move takes the best move for MIN – hence the worst for MAX – so takes the MIN utility of the children. Play alternates between MAX and MIN.

Game Playing – Minimax for Nim Assuming MIN plays first, complete the MIN/MAX tree. Assume that we use a utility function of: – 1 = a win for MAX – 0 = a win for MIN

Figure: the Nim game tree (7 at the root, levels alternating MIN, MAX, MIN, … since MIN moves first) with backed-up utility values at every node. The bottom state 2-1-1-1-1-1 is labelled 0 (a loss for MAX, who is to move there and cannot); backing the 0s and 1s up the tree gives the root the value 1.

Game Playing – Use of Minimax The root MIN node has value +1: all moves by MIN lead to a state of value +1 for MAX, so MIN cannot avoid losing. From the values on the tree one can read off the best moves for each player – make sure you know how to extract these best moves (“perfect lines of play”).
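That conclusion can be checked mechanically. A sketch (the move generator repeats the split rule; utilities follow the slides, 1 = win for MAX and 0 = win for MIN, with the player who cannot move losing):

```python
def successors(state):
    """One-move successors of a Nim state (a sorted tuple of pile sizes)."""
    succ = set()
    for i, pile in enumerate(state):
        for a in range(1, (pile + 1) // 2):   # split pile into a, pile - a
            rest = state[:i] + state[i + 1:]
            succ.add(tuple(sorted(rest + (a, pile - a))))
    return succ

def nim_value(state, to_move):
    """Minimax utility of a Nim state; to_move is 'MAX' or 'MIN'."""
    moves = successors(state)
    if not moves:                       # player to move cannot split: loses
        return 0 if to_move == 'MAX' else 1
    child_values = [nim_value(s, 'MIN' if to_move == 'MAX' else 'MAX')
                    for s in moves]
    return max(child_values) if to_move == 'MAX' else min(child_values)
```

`nim_value((7,), 'MIN')` evaluates to 1: every child of the root already has value 1 for MAX, so MIN cannot avoid losing, exactly as the slide states.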

Game Playing – Bounded Minimax For real games, search trees are much bigger and deeper than Nim Cannot possibly evaluate the entire tree Have to put a bound on the depth of the search

Game Playing – Bounded Minimax The terminal states are no longer a definite win/loss –actually they are really a definite win/draw/loss but with reasonable computer resources we cannot determine which Have to heuristically/approximately evaluate the quality of the positions of the states Evaluation of the utility function is expensive if it is not a clear win or loss

Game Playing – Bounded Minimax Next slide: an artificial example of bounded minimax. Evaluate the “terminal positions” after all possible moves by MAX. (The numbers are invented, just to illustrate the working of minimax.)

Figure: a one-ply tree. MAX node A (the agent) has terminal children B and C with utility values 1 and −3, obtained by an evaluation function; A backs up the value 1.

Game Playing – Bounded Minimax Example of minimax with bounded depth. Evaluate the “terminal positions” after all possible moves in the order: 1. MAX (aka “agent”) 2. MIN (aka “opponent”) 3. MAX (The numbers are invented, just to illustrate the working of minimax.) Assuming MAX plays first, complete the MIN/MAX tree.

Figure: a three-ply tree. MAX root A; MIN nodes B and C; MAX nodes D, E (under B) and F, G (under C); terminal values 4, −5 (D), 1, −7 (E), 2 (F), −3, −8 (G). Backed-up values: D = 4, E = 1, F = 2, G = −3; B = 1, C = −3; A = 1.

Game Playing – Bounded Minimax If both players play their best moves, then which “line” does the play follow?

Figure: the same three-ply tree repeated, with its backed-up values: A = 1; B = 1, C = −3; D = 4, E = 1, F = 2, G = −3; terminals 4, −5, 1, −7, 2, −3, −8.

Game Playing – Perfect Play Note that the line of perfect play leads to a terminal node with the same value as the root node; all intermediate nodes also have that same value. Essentially, this is the meaning of the value at the root node. Caveat: this only applies if the tree is not expanded further after a move, because then the terminals change and so the values can change.
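The line of perfect play can be extracted mechanically. A sketch using the example tree's values as transcribed from the figure (the nesting and leaf numbers are my reading of it, so treat them as illustrative):

```python
# MAX root A -> MIN nodes B, C -> MAX nodes D..G -> terminal values.
tree = {'B': {'D': [4, -5], 'E': [1, -7]},
        'C': {'F': [2],     'G': [-3, -8]}}

def best_line(node, maximizing):
    """Return (minimax value, line of play) for a nested dict/list tree."""
    if isinstance(node, (int, float)):
        return node, []
    items = node.items() if isinstance(node, dict) else enumerate(node)
    results = [(v, [name] + line)
               for name, child in items
               for v, line in [best_line(child, not maximizing)]]
    return (max if maximizing else min)(results, key=lambda r: r[0])
```

`best_line(tree, True)` gives value 1 with line `['B', 'E', 0]`: play follows A → B → E to the terminal of value 1, the same value as the root, as the slide explains.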

Game Playing – Summary So Far Game tree – describes the possible sequences of play – might be drawn as a graph if we merge identical states. Minimax – utility values are assigned to the leaves and “backed up” the tree: a MAX node takes the max value of its children, a MIN node takes the min value of its children – the best lines of play and the result can then be read off. Depth Bound – the utility of terminal states is estimated using an “evaluation function”.

Minimax algorithm
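The algorithm slide itself is missing from this capture; a minimal sketch of the standard recursion, on trees given as nested lists (leaves are terminal utilities for MAX; this is my own compact version, not the lecture's exact code):

```python
def minimax(node, maximizing):
    """Back up leaf utilities: MAX levels take the max of their
    children, MIN levels take the min."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)
```

For example, `minimax([[3, 12, 8], [2, 4, 6], [14, 5, 2]], True)` backs up the MIN values 3, 2 and 2, and returns 3 at the MAX root.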

Minimax Figure: a game tree with levels alternating max, min, max, min (values not yet filled in).

Minimax Figure: the same max/min tree with invented leaf values (transcribed as 10, 9, 14, 13, 2, 1, 3, 2, 4) backed up level by level, alternating min and max, until the root receives its minimax value.

A MINIMAX GAME

Properties of minimax Complete? Yes (if the tree is finite). Optimal? Yes (against an optimal opponent). Time complexity? O(b^m). Space complexity? O(bm) (depth-first exploration). For chess, b ≈ 35 and m ≈ 100 for “reasonable” games, so an exact solution is completely infeasible.

α-β pruning example

Alpha and beta The ALPHA value of a MAX node is set equal to the current LARGEST final backed-up value of its successors. The BETA value of a MIN node is set equal to the current SMALLEST final backed-up value of its successors.

ALPHA-BETA PRUNING

Properties of α-β Pruning does not affect the final result. Good move ordering improves the effectiveness of pruning. With “perfect ordering,” time complexity = O(b^(m/2)), which effectively doubles the depth of search. A simple example of the value of reasoning about which computations are relevant (a form of metareasoning).

Why is it called α-β? α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX. If v is worse than α, MAX will avoid it – prune that branch. Define β similarly for MIN.

The α-β algorithm
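The algorithm figure is missing here too; a standard sketch of α-β on the same nested-list tree representation (again my own minimal version, not the lecture's exact code):

```python
def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
    """Minimax with alpha-beta pruning; prunes once alpha >= beta,
    because the player above would never let play reach this node."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:        # MIN above already has a better option
                break
        return value
    value = float('inf')
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:            # MAX above already has a better option
            break
    return value
```

It returns exactly the minimax value, e.g. 3 on `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]`, but once the second MIN node sees the leaf 2 (so its value is ≤ 2 < α = 3), the leaves 4 and 6 are never examined.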

Resource limits Suppose we have 100 secs and explore 10^4 nodes/sec → 10^6 nodes per move. Standard approach: a cutoff test, e.g. a depth limit (perhaps adding quiescence search), and an evaluation function = estimated desirability of a position.

Evaluation functions For chess, typically a linear weighted sum of features: Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s) e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens), etc.
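A material-only instance of that weighted sum (the weights are the conventional chess piece values; the function and its piece-count inputs are illustrative, not from the lecture):

```python
# w_i: conventional material weights; f_i(s): white count minus black count.
WEIGHTS = {'queen': 9, 'rook': 5, 'bishop': 3, 'knight': 3, 'pawn': 1}

def evaluate(white, black):
    """Eval(s) = sum of w_i * f_i(s) over the piece-count features."""
    return sum(w * (white.get(p, 0) - black.get(p, 0))
               for p, w in WEIGHTS.items())
```

For instance, with equal material except one extra white pawn, `evaluate` returns +1; a queen against two rooks scores 9 − 10 = −1.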

Cutting off search MinimaxCutoff is identical to MinimaxValue except: 1. Terminal? is replaced by Cutoff? 2. Utility is replaced by Eval Does it work in practice? With b = 35, b^m ≈ 10^6 gives m ≈ 4, and 4-ply lookahead is a hopeless chess player! – 4-ply ≈ human novice – 8-ply ≈ typical PC, human master – 12-ply ≈ Deep Blue, Kasparov
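The two substitutions can be sketched directly (a generic depth-bounded minimax; `moves` and `eval_fn` are caller-supplied, illustrative names):

```python
def minimax_cutoff(state, depth, maximizing, moves, eval_fn):
    """Minimax with Terminal? replaced by Cutoff? (depth exhausted or
    no moves) and Utility replaced by Eval."""
    children = moves(state)
    if depth == 0 or not children:
        return eval_fn(state)
    values = [minimax_cutoff(c, depth - 1, not maximizing, moves, eval_fn)
              for c in children]
    return max(values) if maximizing else min(values)
```

On a toy infinite game where state n has moves 2n and 2n+1 and Eval(n) = n, a 2-ply search from 1 compares min(4, 5) = 4 against min(6, 7) = 6 and returns 6, even though the full game tree is unbounded.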

Deterministic games in practice Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions. Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searched 200 million positions per second, used a very sophisticated evaluation function, and used undisclosed methods for extending some lines of search up to 40 ply. Othello: human champions refuse to compete against computers, which are too good. Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Summary Games are fun to work on! They illustrate several important points about AI: perfection is unattainable, so we must approximate; it is a good idea to think about what to think about.
