Games 1 Alpha-Beta Example [-∞, +∞]: range of possible values. Do a depth-first search until the first leaf.
Games 2 Alpha-Beta Example (continued) [-∞,3] [-∞,+∞]
Games 3 Alpha-Beta Example (continued) [-∞,3] [-∞,+∞]
Games 4 Alpha-Beta Example (continued) [3,+∞] [3,3]
Games 5 Alpha-Beta Example (continued) [-∞,2] [3,+∞] [3,3] This node is worse for MAX
Games 6 Alpha-Beta Example (continued) [-∞,2] [3,14] [3,3] [-∞,14]
Games 7 Alpha-Beta Example (continued) [-∞,2] [3,5] [3,3] [-∞,5]
Games 8 Alpha-Beta Example (continued) [2,2] [-∞,2] [3,3]
Games 9 Alpha-Beta Example (continued) [2,2] [-∞,2] [3,3]
Games 10 Comments about Alpha-Beta Pruning: Pruning does not affect the final result. Entire subtrees can be pruned. With good move ordering, alpha-beta pruning can look roughly twice as far ahead as plain minimax in the same amount of time.
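The walkthrough above can be sketched in a few lines of Python. This is a minimal sketch, not the slides' own code. The slides only show the bounds, so the leaf values 12, 8, 4 and 6 are assumptions chosen to reproduce the intervals [3,3], [-∞,2] and [2,2]. The `visited` list is a hypothetical helper that records which leaves were actually examined.

```python
import math

def alphabeta(node, alpha, beta, maximizing, visited):
    """Return the minimax value of node, skipping branches that cannot matter."""
    if not isinstance(node, list):        # a leaf holds a utility value
        visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:             # MIN above will never allow this line
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if beta <= alpha:                 # MAX above already has something better
            break
    return value

# Assumed tree: a MAX root over three MIN nodes with three leaves each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
best = alphabeta(tree, -math.inf, math.inf, True, visited)
print(best, visited)
```

In this sketch the middle MIN node is cut off after its first leaf (2), so the assumed leaves 4 and 6 are never examined, and the root value is the same as plain minimax would give.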
Games 11 Heuristic Evaluation Function (EVAL): Idea: produce an estimate of the expected utility of the game from a given position. Performance depends on the quality of EVAL. It must be able to differentiate between good and bad board states. Exact values are not important.
Games 12 Heuristic Evaluation Function (EVAL): EVAL must be consistent with the utility function: values for terminal nodes (or at least their order) must be the same. It should reflect the actual chances of winning. Frequently, weighted linear functions are used: E = w_1 f_1 + w_2 f_2 + … + w_n f_n, a combination of features f_i weighted by their relevance w_i. Example in chess, weights: pawn = 1, knight = bishop = 3, rook = 5, queen = 9.
Games 13 Example Chess Score: Black has 5 pawns, 1 bishop, 2 rooks: score = 1*(5) + 3*(1) + 5*(2) = 5 + 3 + 10 = 18. White has 5 pawns, 1 rook: score = 1*(5) + 5*(1) = 5 + 5 = 10. Overall scores for this board state: black = 18 - 10 = 8, white = 10 - 18 = -8.
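The chess score above is one instance of the weighted linear form E = w_1 f_1 + … + w_n f_n, with piece counts as the features. A minimal sketch follows. The function names `material` and `evaluate` are illustrative, not from the slides.

```python
# Piece weights from the chess example on the slides.
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material(pieces):
    """Weighted linear sum: each feature f_i is the count of one piece type."""
    return sum(WEIGHTS[name] * count for name, count in pieces.items())

def evaluate(own, opponent):
    """Score of a board state from the point of view of `own`."""
    return material(own) - material(opponent)

black = {"pawn": 5, "bishop": 1, "rook": 2}
white = {"pawn": 5, "rook": 1}
print(material(black), material(white))                  # 18 10
print(evaluate(black, white), evaluate(white, black))    # 8 -8
```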
Games 14 Example: Tic-Tac-Toe. Simple evaluation function: E(s) = (rx + cx + dx) - (ro + co + do), where r, c, d are the numbers of row, column, and diagonal lines still available, and x and o are the pieces of the two players. 1-ply lookahead: start at the top of the tree, evaluate all 9 choices for player 1, and pick the maximum E-value. 2-ply lookahead: also look at the opponent's possible moves, assuming that the opponent picks the minimum E-value.
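A minimal sketch of this evaluation function and the 1-ply lookahead. It assumes that a line is "still available" to a player when it contains no piece of the opponent. The board encoding (a 9-character string, cells numbered row by row) and the function names are illustrative choices.

```python
# The eight lines of a 3x3 board, cells numbered 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),    # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),    # columns
         (0, 4, 8), (2, 4, 6)]               # diagonals

def open_lines(board, player):
    """Count lines that contain no piece of the other player (assumed meaning)."""
    other = "O" if player == "X" else "X"
    return sum(1 for line in LINES if all(board[i] != other for i in line))

def evaluate(board):
    """E(s) = lines open for X minus lines open for O."""
    return open_lines(board, "X") - open_lines(board, "O")

def one_ply(board):
    """Return (best E-value, cell) over all moves available to X."""
    scores = []
    for cell in range(9):
        if board[cell] == " ":
            child = board[:cell] + "X" + board[cell + 1:]
            scores.append((evaluate(child), cell))
    return max(scores)

empty = " " * 9
print(one_ply(empty))   # (4, 4): the centre square, with E = 4
```

Under this reading a corner opening scores 3, an edge opening 2, and the centre 4.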
Games 15 Tic-Tac-Toe 1-Ply: E(s11) = 3, E(s12) = 2, E(s13) = 3, E(s14) = 2, E(s15) = 4, E(s16) = 2, E(s17) = 3, E(s18) = 2, E(s19) = 3. E(s0) = Max{E(s11), …, E(s1n)} = Max{2,3,4} = 4 [figure: boards after X's first move]
Games 16 Tic-Tac-Toe 2-Ply: First ply: E(s1:1) = 3, E(s1:2) = 2, E(s1:3) = 3, E(s1:4) = 2, E(s1:5) = 4, E(s1:6) = 2, E(s1:7) = 3, E(s1:8) = 2, E(s1:9) = 3. E(s0) = Max{E(s1:1), …, E(s1:n)} = Max{2,3,4} = 4. Second ply: E(s2:1) = 1, E(s2:2) = 0, E(s2:3) = 1, E(s2:4) = -1, E(s2:5) = 1, E(s2:6) = 0, E(s2:7) = 1, E(s2:8) = 0, E(s2:9) = -1, E(s2:10) = 5 - 6 = -1, E(s2:11) = -1, E(s2:12) = -2, E(s2:13) = 0, E(s2:14) = -1, E(s2:15) = 6 - 6 = 0, E(s2:16) = -1, E(s2:41) = 1, E(s2:42) = 2, E(s2:43) = 1, E(s2:44) = 2, E(s2:45) = 2, E(s2:46) = 1, E(s2:47) = 2, E(s2:48) = 1 [figure: boards after X's first move and O's replies]
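The 2-ply values can be reproduced with a short sketch: for each move of X, take the minimum E-value over O's replies, then pick the move with the largest minimum. It again assumes that a line is available to a player when it holds no opposing piece. It does not check for finished games, which cannot occur within two plies of the empty board.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def evaluate(board):
    """E(s): lines with no O minus lines with no X (assumed reading of the slide)."""
    open_x = sum(1 for line in LINES if all(board[i] != "O" for i in line))
    open_o = sum(1 for line in LINES if all(board[i] != "X" for i in line))
    return open_x - open_o

def place(board, cell, player):
    return board[:cell] + player + board[cell + 1:]

def free_cells(board):
    return [i for i in range(9) if board[i] == " "]

def two_ply(board):
    """Return (value, cell): X maximises the minimum E-value O can force."""
    best = None
    for cell in free_cells(board):
        after_x = place(board, cell, "X")
        worst = min(evaluate(place(after_x, reply, "O"))
                    for reply in free_cells(after_x))
        if best is None or worst > best[0]:
            best = (worst, cell)
    return best

print(two_ply(" " * 9))   # (1, 4): the centre, backed-up value 1
```

With this sketch the backed-up values are -1 for a corner opening, -2 for an edge opening, and 1 for the centre, in line with the second-ply values listed above.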
Games Checkers Case Study: Initial board configuration: Black: single on 20, single on 21, king on 31. Red: single on 23, king on 22. Evaluation function: E(s) = (5 x_1 + x_2) - (5 r_1 + r_2), where x_1 = black king advantage, x_2 = black single advantage, r_1 = red king advantage, r_2 = red single advantage.
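As a quick check of this evaluation function, the sketch below applies it to the initial configuration. It assumes that the "advantage" terms are simply the piece counts (x_1 and r_1 count kings, x_2 and r_2 count singles). The slides do not spell this out, so treat it as one possible reading.

```python
def checkers_eval(black_kings, black_singles, red_kings, red_singles):
    """E(s) = (5*x_1 + x_2) - (5*r_1 + r_2), with piece counts as the terms (assumption)."""
    return (5 * black_kings + black_singles) - (5 * red_kings + red_singles)

# Initial configuration: Black has one king and two singles,
# Red has one king and one single.
initial = checkers_eval(black_kings=1, black_singles=2, red_kings=1, red_singles=1)
print(initial)   # (5 + 2) - (5 + 1) = 1
```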
Games Checkers MiniMax Example [figure: game tree for the checkers position with alternating MAX and MIN levels]
Games Checkers Alpha-Beta Example [figure sequence: alpha-beta search on the same tree with MAX and MIN levels; node labels shown across the frames: 1 and 6, then 1 and 1, then 1 and 0, then 1 and -4, then 1 and -8; three frames are marked "cutoff: no need to examine further branches"]
Games 29 Horizon Problem: Moves may have disastrous consequences in the future, but those consequences are not visible within the search depth. The agent cannot see far enough into the search space.
Games 30 Games with Chance: In many games there is a degree of unpredictability through random elements (throwing dice, card distribution, roulette wheel, …). This requires chance nodes in addition to the Max and Min nodes. Branches indicate possible variations; each branch indicates the outcome and its likelihood (probability).
Games 31 Games with Chance [figure: game tree with chance nodes]
Games 32 Decisions with Chance: The utility value of a position depends on the random element, so the definite minimax value must be replaced by an expected value. Calculation of expected values: use the utility function for terminal nodes; for all other nodes, calculate the utility for each chance event, weigh it by the chance that the event occurs, and add up the individual utilities.
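The expected-value calculation can be sketched as a recursive function over a small tree. The tuple encoding of nodes and the example tree are made up for illustration. The probabilities are chosen so that the arithmetic is exact.

```python
def expectiminimax(node):
    """Value of a node in a tree with MAX, MIN, and chance nodes."""
    kind, payload = node
    if kind == "leaf":
        return payload                                  # utility of a terminal node
    if kind == "max":
        return max(expectiminimax(child) for child in payload)
    if kind == "min":
        return min(expectiminimax(child) for child in payload)
    if kind == "chance":
        # Weigh each outcome by its probability and add up the results.
        return sum(p * expectiminimax(child) for p, child in payload)
    raise ValueError(f"unknown node kind: {kind}")

def leaf(value):
    return ("leaf", value)

# Hypothetical tree: MAX chooses between two chance nodes.
left = ("chance", [(0.5, ("min", [leaf(2), leaf(4)])),
                   (0.5, ("min", [leaf(6), leaf(8)]))])
right = ("chance", [(0.25, ("min", [leaf(0), leaf(10)])),
                    (0.75, ("min", [leaf(4), leaf(5)]))])
root = ("max", [left, right])

print(expectiminimax(left), expectiminimax(right), expectiminimax(root))
```

Here the left chance node is worth 0.5*2 + 0.5*6 = 4 and the right one 0.25*0 + 0.75*4 = 3, so MAX prefers the left branch.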
Games 33 More interesting (but still trivial) game: Deal four cards face up. Player 1 chooses a card. Player 2 throws a die. If it’s a six, player 2 chooses a card, swaps it with player 1’s, and keeps player 1’s card. If it’s not a six, player 2 just chooses a card. Player 1 chooses the next card. Player 2 takes the last card.
Games 34 Expectiminimax Diagram
Games 35 Expectiminimax Calculations
Games 36 Games and Computers: State of the art for some game programs: chess, checkers, Othello, backgammon, Go.
Games 37 Chess: Deep Blue, a special-purpose parallel computer, defeated the world champion Garry Kasparov in 1997. The human player didn’t show his best game, and some claim that the circumstances were questionable. Deep Blue used a massive database with games from the literature. Fritz, a program running on an ordinary PC, drew an eight-game match against the world champion Vladimir Kramnik in 2002. Top programs and top human players are roughly equal.
Games 38 Checkers: Arthur Samuel develops a checkers program in the 1950s that learns its own evaluation function; it reaches an expert level in the 1960s. Chinook becomes world champion in 1994 when its human opponent, Dr. Marion Tinsley, withdraws for health reasons. Tinsley had been the world champion for 40 years. Chinook uses off-the-shelf hardware, alpha-beta search, and an endgame database for six-piece positions.
Games 39 Othello: Logistello defeated the human world champion in 1997. Many programs play far better than humans: the search space is smaller than in chess, and little evaluation expertise is available.
Games 40 Backgammon: TD-Gammon, a neural-network-based program, ranks among the best players in the world. It improves its own evaluation function through learning techniques. Search-based methods are practically hopeless because of the chance elements and the branching factor.
Games 41 Go: Humans play far better. The branching factor is large (around 360), so search-based methods are hopeless. Rule-based systems play at amateur level. The use of pattern-matching techniques can improve the capabilities of programs, but they are difficult to integrate. There is a $2,000,000 prize for the first program to defeat a top-level player.
Games 42 Chapter Summary: Many game techniques are derived from search methods. The minimax algorithm determines the best move for a player by calculating the complete game tree. Alpha-beta pruning dismisses parts of the search tree that are provably irrelevant. An evaluation function gives an estimate of the utility of a state when a complete search is impractical. Chance events can be incorporated into the minimax algorithm by considering the weighted probabilities of chance events.