
1 Game tree search As ever, slides drawn from Andrew Moore's Machine Learning Tutorials: http://www.autonlab.org/tutorials/ as well as from Fahiem Bacchus.

2 Game tree search In this lecture we will cover some basics of two-player, zero-sum, discrete, finite, deterministic games of perfect information.

3-4 [image-only slides; no transcript text]

5 More on the meaning of "zero sum" We will focus on "zero-sum" games. Zero-sum games are fully competitive: if one player wins, the other player loses. More specifically, the amount of money I win (lose) at poker is the amount of money you lose (win). More general games can be cooperative: some outcomes are preferred by both of us, or at least our values aren't diametrically opposed.

6 Is tic-tac-toe "zero sum"? Consider rock-paper-scissors: scissors cut paper, paper covers rock, rock smashes scissors. Represented as a matrix, Player I chooses a row and Player II chooses a column; the payoff to each player is shown in each cell (Pl. I / Pl. II), with 1 = win, 0 = tie, -1 = loss, so it's zero-sum. (Tic-tac-toe, scored with the same 1/0/-1 payoffs, is zero-sum for the same reason.)

                 Player II
                 R       P       S
   Player I R   0/0    -1/1     1/-1
            P   1/-1    0/0    -1/1
            S  -1/1     1/-1    0/0

7 Is the "prisoner's dilemma" zero sum? Two prisoners are held in separate cells; the DA doesn't have enough evidence to convict them. If one confesses and the other doesn't, the confessor goes free and the other is sentenced to 4 years. If both confess (both defect), both are sentenced to 3 years. If neither confesses (both cooperate), both are sentenced to 1 year on a minor charge. Payoff: 4 minus the sentence.

                 Coop    Def
        Coop     3/3     0/4
        Def      4/0     1/1
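
To make the distinction concrete, here is a minimal Python sketch (not from the slides) that stores each payoff matrix as (Player I, Player II) pairs and checks the zero-sum property:

```python
# A minimal sketch: represent each game as a matrix of
# (row player payoff, column player payoff) pairs and check the zero-sum property.

rps = [  # rock-paper-scissors, rows/cols in order R, P, S
    [(0, 0), (-1, 1), (1, -1)],
    [(1, -1), (0, 0), (-1, 1)],
    [(-1, 1), (1, -1), (0, 0)],
]

prisoners_dilemma = [  # rows/cols in order Coop, Def; payoff = 4 - sentence
    [(3, 3), (0, 4)],
    [(4, 0), (1, 1)],
]

def is_zero_sum(matrix):
    """True if the two payoffs in every cell sum to zero."""
    return all(a + b == 0 for row in matrix for (a, b) in row)

print(is_zero_sum(rps))                # True
print(is_zero_sum(prisoners_dilemma))  # False
```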

8-11 [image-only slides; no transcript text]

12 With a search space defined for II-Nim, we can define a game tree. A game tree looks like a search tree, but its layers reflect alternating moves between A and B. Player A doesn't decide where to go alone: after Player A moves to a state, B decides which of that state's children to move to. Thus, A must have a strategy: A must know what to do for each possible move of B, and "what to do" will depend on how B plays.
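
As a concrete illustration, here is a minimal Python sketch of a II-Nim state and its successor function. It assumes one common formulation (two piles of two matches; a move removes one or more matches from a single pile; whoever takes the last match loses), so treat the rules as an assumption rather than a restatement of the slides:

```python
# Hedged sketch: a II-Nim state is (pile1, pile2, player to move), under the
# assumed rules above (two piles of 2; remove >= 1 from one pile; last match loses).

START = (2, 2, 'A')

def successors(state):
    """All states reachable in one move by the player to move."""
    p1, p2, player = state
    nxt = 'B' if player == 'A' else 'A'
    states = []
    for take in range(1, p1 + 1):          # take from pile 1
        states.append((p1 - take, p2, nxt))
    for take in range(1, p2 + 1):          # take from pile 2
        states.append((p1, p2 - take, nxt))
    return states

def is_terminal(state):
    return state[0] == 0 and state[1] == 0

def winner(state):
    """The player who took the last match loses, so the player now to move wins."""
    return state[2]

print(successors(START))   # [(1, 2, 'B'), (0, 2, 'B'), (2, 1, 'B'), (2, 0, 'B')]
```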

13-21 [image-only slides; no transcript text]

22 Questions: What happens if there are loops in the tree? How would looping influence your determination of the minimax value for a node?

23 Let's practice by computing the game-theoretic values for all nodes in this tree. [figure: a game tree whose levels alternate between A and B nodes]

24 [figure: the same tree with its leaf values filled in: 0, -3, 3, -3, -2, 2, -5, 0, 1, -3, -5, -3, 2]

25 Let's practice by computing the game-theoretic values for all nodes in this tree. [figure: the same tree with the layer above the leaves now labelled 0, 3, 2, 0, 1, -5, -3, 2]

26 Question: if both players play rationally, what path will be followed through this tree? [figure: the remaining layers filled in; recovered values include 0, 2, 0, 1, -5, 2, 0, 2, 1, 2, 0, 1, 1]

27 [figure: the fully labelled tree from the previous slide, shown again; no additional transcript text]
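
The values in that exercise come from the minimax recursion: each leaf carries its terminal payoff to A, each A node takes the maximum of its children's values, and each B node takes the minimum. Below is a minimal Python sketch of that recursion on a small illustrative tree (not the tree in the figure):

```python
# Minimal minimax sketch. A tree is either a number (a leaf's payoff to A)
# or a list of subtrees. 'maximizing' is True at A's nodes, False at B's.

def minimax(tree, maximizing=True):
    if isinstance(tree, (int, float)):      # leaf: terminal payoff
        return tree
    child_values = [minimax(child, not maximizing) for child in tree]
    return max(child_values) if maximizing else min(child_values)

# A small illustrative tree, with A to move at the root.
example = [[3, -2], [0, [5, -4]], [1]]
print(minimax(example))   # prints 1, the game-theoretic value of the root for A
```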

28 A note on complexity Imagine you have a game with N states, that each state has b successors, and that the length of the game is usually D moves. Minimax will expand O(b^D) states, and this is both a best-case and a worst-case scenario. This is different from regular DFS! Tricks exist, however, to reduce this complexity to O(N), using a tool called dynamic programming.
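
A sketch of the dynamic-programming idea: memoize the value of each state so it is computed only once, which bounds the work by the number of distinct states N rather than by b^D. The toy game below (a handful of numbered states with made-up terminal values) is purely illustrative:

```python
from functools import lru_cache

# Hedged sketch of minimax with memoization ("dynamic programming"):
# each state's value is computed once, even if many paths reach it.

# A tiny made-up game: states are integers, and terminal states carry fixed values.
SUCCESSORS = {0: [1, 2], 1: [3, 4], 2: [4, 5], 3: [], 4: [6], 5: [], 6: []}
TERMINAL_VALUE = {3: -1, 5: 2, 6: 0}

@lru_cache(maxsize=None)
def value(state, maximizing):
    kids = SUCCESSORS[state]
    if not kids:                                  # terminal state
        return TERMINAL_VALUE[state]
    child_values = [value(s, not maximizing) for s in kids]
    return max(child_values) if maximizing else min(child_values)

print(value(0, True))   # root value; note that state 4 is expanded only once
```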

29-33 [image-only slides; no transcript text]

34 Alpha-Beta Pruning, from Russell & Norvig

Function max-value(state, game, alpha, beta) returns the minimax value of state
  inputs: state (current game state)
          game  (game description)
          alpha (best score for MAX on path -- i.e. highest)
          beta  (best score for MIN on path -- i.e. lowest)
  if state is a terminal, return eval(state)
  else for every successor s:
      alpha = max(alpha, min-value(s, game, alpha, beta))
      if alpha >= beta, return beta   % we're done here: MIN will never go this way, so cut!
  end
  return alpha

Function min-value(state, game, alpha, beta) returns the minimax value of state
  if state is a terminal, return eval(state)
  else for every successor s:
      beta = min(beta, max-value(s, game, alpha, beta))
      if beta <= alpha, return alpha   % we're done here: MAX will never go this way, so cut!
  end
  return beta
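
For readers who want to run it, here is a hedged Python translation of that pseudocode (the fail-hard variant, exactly as written above). The tree representation, with nested lists and numeric leaves standing in for eval(state) at terminals, is an assumption made for illustration:

```python
import math

# Hedged Python translation of the max-value / min-value pseudocode above.
# A "state" here is either a number (terminal value) or a list of successor states.

def max_value(state, alpha=-math.inf, beta=math.inf):
    if isinstance(state, (int, float)):         # terminal: eval(state)
        return state
    for s in state:
        alpha = max(alpha, min_value(s, alpha, beta))
        if alpha >= beta:                       # MIN will never let play reach here
            return beta
    return alpha

def min_value(state, alpha=-math.inf, beta=math.inf):
    if isinstance(state, (int, float)):
        return state
    for s in state:
        beta = min(beta, max_value(s, alpha, beta))
        if beta <= alpha:                       # MAX will never let play reach here
            return alpha
    return beta

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]      # a small example tree; minimax value is 3
print(max_value(tree))                          # 3
```

On this tree the cut in the middle subtree fires after seeing the leaf 2, so the leaves 4 and 6 are never examined.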

35 Example 1: We are currently expanding possible moves for player A, from left to right. Which of the node expansions above could we prune, and why? [figure: player A to move at the root, with successors s1-s5 evaluated by player B; leaf values below them; annotations β = 8 and α = 2, then 4, then ...]

36 Once we discover a node with value 9, there is no need to expand the nodes to the right! [figure: the same tree, now annotated with α = 9]

37 Example 2: We are currently expanding possible moves for player B, from left to right. Which of the node expansions above could we prune, and why? [figure: player B to move at the root, with successors s1-s5 evaluated by player A; leaf values below them; annotations α = 7 and β = 9, then 3, then ...]

38 Once we discover a node with value 3, there is no need to expand the nodes to the right! [figure: the same tree, now annotated with β = 3]

39 Which computations could we have avoided here, assuming we expand nodes left to right? [figure: the fully labelled minimax tree from slide 27]

40 [figure only: the game tree again, with just the A and B node labels recovered; no transcript text]

41 Effectiveness of alpha-beta pruning With no pruning, you have to explore O(b^D) nodes; in the worst case, the run time of a search with pruning is the same as plain minimax. If, however, the move ordering for the search is optimal (meaning the best moves are searched first), the number of nodes we need to search using alpha-beta pruning is O(b^(D/2)). That means you can, in theory, search twice as deep in the same amount of time! In Deep Blue, they found that alpha-beta pruning meant the average branching factor at each node was about 6 instead of 35.
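
A quick back-of-the-envelope check of the "twice as deep" claim, using b = 35 as a chess-like branching factor (the numbers are illustrative only):

```python
# Compare plain minimax cost b^d with best-case alpha-beta cost b^(d/2).
b = 35
for depth in (4, 8, 12):
    plain  = b ** depth           # nodes for plain minimax to this depth
    pruned = b ** (depth // 2)    # best-case alpha-beta to the same depth
    print(depth, plain, pruned)   # pruning at depth d costs about what plain costs at d/2
```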

42 This is an example of the best-case scenario for alpha-beta pruning. The effective branching factor of the first layer is b, the effective branching factor of the second is 1, the effective branching factor of the third is b, and so on.... [figure: a tree with alternating A and B levels illustrating this pattern]

43 Rational Opponents This, however, all assumes that your opponent is rational, e.g., will choose moves that minimize your score. Storing your strategy is a potential issue: you must store "decisions" for each node you can reach by playing optimally. If your opponent has unique rational choices, this is a single branch through the game tree; if there are "ties", the opponent could choose any one of the "tied" moves, which means you must store a strategy for each subtree. What if your opponent doesn't play rationally? Will it affect the quality of the outcome? Will your stored strategies work?

44 [image-only slide; no transcript text]

45 Heuristic evaluation functions in games Some intuitions: Visibility: the evaluation function will be more accurate nearer the end of the game, so it is worth using heuristic estimates from there. Filtering: each heuristic estimate may be noisy. By searching we are combining thousands of these estimates and, we hope, eliminating the noise.

46 Some examples of heuristic evaluation functions Example for tic-tac-toe: h(n) = [# of 3-lengths that are left open for player A] - [# of 3-lengths that are left open for player B]. Alan Turing's function for chess: h(n) = A(n)/B(n), where A(n) is the sum of the point values of player A's pieces and B(n) is the sum for player B. Most evaluation functions are specified as a weighted sum of features: h(n) = w1*feat1(n) + w2*feat2(n) + ... + wk*featk(n). Deep Blue used about 6000 features in its evaluation function.
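
A minimal sketch of the weighted-sum form h(n) = w1*feat1(n) + ... + wk*featk(n); the features, weights, and board attributes below are invented for illustration and are not from the slides:

```python
# Hedged sketch of a weighted-sum evaluation function.
from typing import Callable, List, Tuple

Feature = Callable[[object], float]

def make_evaluator(weighted_features: List[Tuple[float, Feature]]) -> Callable[[object], float]:
    """Return h(n) = sum of w_i * feat_i(n) over the given (weight, feature) pairs."""
    def h(state) -> float:
        return sum(w * feat(state) for w, feat in weighted_features)
    return h

# Two toy features of a hypothetical board object (attribute names are made up).
def material_balance(state) -> float:
    return state.my_piece_count - state.opponent_piece_count

def center_control(state) -> float:
    return state.my_center_squares - state.opponent_center_squares

evaluate = make_evaluator([(1.0, material_balance), (0.3, center_control)])
# evaluate(board) would then be used at the search horizon in place of true terminal values.
```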

47 Heuristic evaluation functions in games Think of a few games and suggest some heuristics for estimating the “goodness” of a position chess? checkers? connect-4? your favorite video game?

48 Other issues with real games How do you decide how far to search if you only have a fixed time to make a decision? What if you stop the search at a state where subsequent moves dramatically change the evaluation? The horizon problem: what if s is a state that is clearly bad because your opponent will inevitably be able to do something bad to you, but you have some delaying tactics? The search algorithm won't recognize the state's badness if the number of delaying moves exceeds the search horizon. Endgames are easy to play well. How? Openings are fairly easy to play well. How?
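
One common answer to the fixed-time question is iterative deepening: run a depth-limited search with the heuristic evaluation applied at the cutoff, increase the depth limit until the time budget runs out, and keep the result of the deepest completed search. A hedged sketch, in which successors, is_terminal, and evaluate are placeholder hooks supplied by the game rather than a real API:

```python
import time

# Hedged sketch of iterative deepening under a fixed time budget.
def depth_limited_value(state, depth, maximizing, successors, is_terminal, evaluate):
    if is_terminal(state) or depth == 0:
        return evaluate(state)                    # heuristic estimate at the horizon
    values = [depth_limited_value(s, depth - 1, not maximizing,
                                  successors, is_terminal, evaluate)
              for s in successors(state)]
    return max(values) if maximizing else min(values)

def iterative_deepening(state, time_budget, successors, is_terminal, evaluate):
    deadline = time.monotonic() + time_budget
    best, depth = None, 1
    while time.monotonic() < deadline:
        best = depth_limited_value(state, depth, True,
                                   successors, is_terminal, evaluate)
        depth += 1
    return best                                   # value from the deepest completed search
```

A real implementation would also check the clock inside the recursion (and would normally use alpha-beta rather than plain depth-limited minimax) so the final iteration cannot overrun the budget.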

49 Question: is there an alpha-beta version you can use to search this tree?

