Lecture 6: Game Playing. Heshaam Faili, University of Tehran. Two-player games, minimax search algorithm, alpha-beta pruning, games with chance.


1 Lecture 6: Game Playing
Heshaam Faili, hfaili@ece.ut.ac.ir, University of Tehran
Two-player games; minimax search algorithm; alpha-beta pruning; games with chance; state of the art

2 Two-player games: motivation
- Multi-agent environments can be cooperative or competitive: zero-sum games, two-player or multi-player games, adversarial games.
- Previous heuristics and search procedures are only useful for single-player settings: there is no notion of turns (one or more cooperative agents), and adversarial moves are not taken into account.
- Games are ideal for exploring adversarial strategies: they have well-defined, abstract rules, most can be formulated as search problems, and they are really hard combinatorial problems (chess!).

3 Two-player games
- The search tree is the same for each player.
- Even levels i are moves of player A; odd levels i+1 are moves of player B.
- Each player searches for its own goal at its level and evaluates states with its own heuristic function.
- A's best move brings B to its worst state: A searches for its best move assuming B will also search for B's best move.

4 Structure of a game
- Initial state
- Successor function
- Terminal test
- Utility function

5 Game tree and minimax search strategy
- Search for A's best next move, so that no matter what B does (in particular, choosing its own best move) A will be better off.
- At each step, evaluate the values of all descendants: take the maximum if it is A's turn, the minimum if it is B's turn.
- We need the estimated values d moves ahead: generate all nodes to level d (BFS) and propagate minimax values up from the leaves.


7 Illustration of the minimax principle


9 Minimax algorithm
while the current node is not a winning state do
  1. Generate the game tree to level m from the current node.
  2. Apply the utility function to the leaves.
  3. Propagate the values up layer by layer, according to the minimax principle, to the root.
  4. Choose the action with the maximum value at the root (the minimax decision) and move to the next state (actual := Apply(op, game)).
end
Complexity: O(b^m)

10 Minimax search on Tic-Tac-Toe
Evaluation function Eval(n) for A:
- +infinity if n is a win state for A (Max)
- -infinity if n is a win state for B (Min)
- otherwise (# of 3-moves for A) - (# of 3-moves for B), where a 3-move is an open row, column, or diagonal
A plays X. In the example position, Eval(s) = 6 - 4 = 2.
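A minimal sketch of this evaluation function, assuming the board is a 9-element list holding 'X' (player A), 'O' (player B), or None:

```python
# The eight 3-moves (lines) of Tic-Tac-Toe, as index triples.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def open_lines(board, player):
    """Count lines still open for `player`, i.e. not blocked by the opponent."""
    other = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES
               if all(board[i] != other for i in line))

def evaluate(board):
    """Eval(n) = (# of 3-moves for X) - (# of 3-moves for O)."""
    return open_lines(board, 'X') - open_lines(board, 'O')

# X alone in the center: all 8 lines open for X, only the 4 lines
# avoiding the center remain open for O.
board = [None] * 9
board[4] = 'X'
print(evaluate(board))  # → 4
```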

11 Tic-Tac-Toe minimax search, d = 2

12 Tic-Tac-Toe minimax search, d = 4

13 Tic-Tac-Toe minimax search, d = 6

14 Minimax search
- m is the look-ahead factor: how many moves ahead of the current one we compute.
- Minimax is optimal if d <= m, where d is the depth of the solution.
- IDS can be adapted to save memory.
- Horizon problem: node n can be the best at level m but disastrous at level m+1!
- Quiescent states: states with only small variations in evaluation-function value. Explore non-quiescent states further.

15 Multi-player games
Concepts: alliances, cooperative games

16 Alpha-beta pruning
- A disadvantage of minimax search is that it has to expand every node in the subtree down to depth m.
- Can we make a correct minimax decision without looking at every node?
- We want to prune the tree: stop exploring subtrees whose values cannot influence the final minimax decision at the root.
- Observation: all nodes whose values are greater (smaller) than the current minimum (maximum) need not be explored!

17 Alpha-beta pruning

18 Alpha-beta search cutoff rules
Keep track of and update two values:
- alpha: the best (highest) value found so far along the MAX path
- beta: the best (lowest) value found so far along the MIN path
Rule: do not expand node n when alpha >= beta; a MIN node then returns alpha, a MAX node returns beta.
(alpha, beta) are propagated from parent to child. A MIN node reports beta to its parent; a MAX node reports alpha to its parent. At a MIN node, beta = min(beta, value of child); at a MAX node, alpha = max(alpha, value of child).

19 Alpha-beta pruning

function Max-Value(state, game, alpha, beta) returns the minimax value of state
    if Cutoff-Test(state) then return Eval(state)
    for each s in Successors(state) do
        alpha := Max(alpha, Min-Value(s, game, alpha, beta))
        if alpha >= beta then return beta
    end
    return alpha

function Min-Value(state, game, alpha, beta) returns the minimax value of state
    if Cutoff-Test(state) then return Eval(state)
    for each s in Successors(state) do
        beta := Min(beta, Max-Value(s, game, alpha, beta))
        if beta <= alpha then return alpha
    end
    return beta


21 Alpha-beta pruning in Tic-Tac-Toe

22 Imperfect, real-time decisions
- Searching to the full depth is not practical under time limitations, so nodes should be cut off earlier: a cutoff test decides when to apply EVAL.
- The evaluation function returns an estimate of the expected utility from a given node.
- It should rank terminal nodes the same way as the utility function, its computation should not take long, and for non-terminal nodes it should be strongly correlated with the actual chances of winning.

23 Evaluation functions
- Calculate various features, e.g. the number of pawns.
- Features define categories of states that lead to a win, a loss, or a draw.
  Example: if states in a category win 72% of the time (+1), lose 20% (-1), and draw 8% (0), the expected value is 0.72*1 + 0.20*(-1) + 0.08*0 = 0.52.
- This requires too many categories and too much experience.
- Alternative: compute a numerical contribution from each feature and combine the numbers.
  In chess: each pawn 1, knight and bishop 3, rook 5, queen 9; good pawn structure 1, king safety 1.
- Weighted linear function: Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
- Non-linear functions are needed when features are not independent of each other: a bishop at the end of a chess game is more useful than at the beginning, and two bishops together are worth more than twice a single bishop.
- These functions are approximated from ages of human experience. For other problems, where no human experience exists, machine learning techniques should be used.
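The weighted linear form can be sketched for the material feature alone. The piece values come from the slide; the piece-count dictionaries and the example position are illustrative assumptions:

```python
# Conventional chess piece values (pawn, knight, bishop, rook, queen).
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def material_eval(white_pieces, black_pieces):
    """Eval(s) = sum_i w_i * f_i(s): each feature f_i counts the pieces of
    one kind, and the weight w_i is that piece's conventional value."""
    def score(pieces):
        return sum(PIECE_VALUES[p] * n for p, n in pieces.items())
    return score(white_pieces) - score(black_pieces)

# Hypothetical position: White is up a knight, Black is up a pawn.
white = {'P': 7, 'N': 2, 'B': 2, 'R': 2, 'Q': 1}
black = {'P': 8, 'N': 1, 'B': 2, 'R': 2, 'Q': 1}
print(material_eval(white, black))  # → 2
```

A fuller evaluation would add further weighted features (pawn structure, king safety) to the same sum; the non-linear interactions mentioned on the slide are exactly what this linear form cannot express.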

24 Cutoff search
- if Cutoff-Test(state, depth) then return EVAL(state)
- Fixed-depth cutoff: cut off when depth > d.
- Better: use IDS until time runs out and return the move from the deepest completed search (see next slide).
- The cutoff test should single out quiescent states.
- Forward pruning: some moves at a given node are pruned immediately without further consideration. This can be very dangerous and should be reserved for symmetric moves.
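The time-limited IDS loop can be sketched as follows. The `evaluate_at_depth` callback is an assumed stand-in for a full depth-d minimax search; a production version would also check the clock inside the search rather than only between depths:

```python
import time

def ids_best_move(root, evaluate_at_depth, time_limit=1.0, max_depth=64):
    """Deepen one ply at a time under a time budget; return the move from
    the deepest search that completed before the deadline."""
    deadline = time.monotonic() + time_limit
    best = None
    for d in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break                            # keep the last completed result
        best = evaluate_at_depth(root, d)    # full depth-d search finished
    return best
```

Because each completed iteration overwrites `best`, stopping at any point still yields a legal decision, which is exactly why IDS suits hard real-time limits.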

25 IDS error …

26 Practical extensions
- Even combining all these techniques, a problem remains: in chess, a standard play clock (3 minutes per move) allows processing about 200 million nodes, roughly five plies; with alpha-beta pruning this becomes about 10 plies.
- The evaluation function measures the expected utility of a move, like the heuristic function seen before.
- Cut off by "goodness" rather than by depth: selective deepening.
- Quiescent-state expansion; beware the horizon problem.
- Use precompiled evaluations for the beginning and end of the game.

27 Games with chance: backgammon board

28 Search tree with probabilities

29 Games with chance
- Chance nodes: nodes where chance events happen (rolling dice, flipping a coin, etc.).
- Evaluate a chance node C by averaging over the outcome probabilities:
  expectimax(C) = sum_i P(d_i) * max over s in S(C, d_i) of eval(s)
  where P(d_i) is the probability of rolling d_i (1, 2, …, 12) and S(C, d_i) is the set of positions generated by applying all legal moves for roll d_i to C.
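The expected-value backup at a chance node can be sketched as below. The `positions` accessor and the coin-flip example are illustrative assumptions, not backgammon specifics:

```python
def chance_value(rolls, positions, maximizing=True):
    """expectimax(C) = sum_d P(d) * best value over S(C, d).
    `rolls` maps each outcome d to its probability P(d);
    `positions(d)` yields the evaluated positions reachable under roll d."""
    pick = max if maximizing else min
    return sum(p * pick(positions(d)) for d, p in rolls.items())

# Toy example: a fair coin flip, after which MAX moves.
rolls = {'heads': 0.5, 'tails': 0.5}
values = {'heads': [3, 7], 'tails': [1, 5]}
print(chance_value(rolls, values.__getitem__))  # → 6.0
```

MAX takes 7 under heads and 5 under tails, so the chance node backs up 0.5*7 + 0.5*5 = 6.0; with `maximizing=False` the same node would average the minima instead.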

30 Position evaluation in games with chance

31 State-of-the-art game programs
- Championship programs exist for checkers, chess, backgammon, Othello, Go, and more.
- They combine brute-force search, heuristics, and game databases.
- Some programs attempt to improve or learn the heuristic function by comparing its estimates with actual outcomes.
- Some programs attempt to discover rules and heuristic functions.

32 ?

