

Two-player games overview
- Computer programs which play 2-player games
  - game-playing as search
  - with the complication of an opponent
- General principles of game-playing and search
  - evaluation functions
  - minimax principle
  - alpha-beta pruning
  - heuristic techniques

Status of Game-Playing Systems
- In chess, checkers, backgammon, Othello, etc., computers routinely defeat leading world players.
- Applications?
  - think of “nature” as an opponent
  - economics, war-gaming, medical drug treatment

Games of strategy
- Deterministic rules (or deterministic rules plus probabilistic rules; these are games that combine strategy and luck, e.g. bridge, backgammon, blackjack).
- Moves are made alternately by two players, A and B.
- Rules define how configurations change.
- A subset F of configurations is identified as final. Typically F is partitioned into three sets: T, A and B.
  - T is a tie; A (B) is a win for player A (B)
- The goal is to develop a strategy for one player to win (the computer plays for that player).

Chess Rating Scale

Two-Player Games with Complete Trees
We can use search algorithms to write “intelligent” programs that play games against a human opponent. Just consider this extremely simple (and not very exciting) game: At the beginning of the game, there are seven coins on a table. Player 1 makes the first move, then player 2, then player 1 again, and so on. One move consists of removing 1, 2, or 3 coins. The player who makes the last move wins.

Two-Player Games with Complete Trees
Let us assume that the computer has the first move. Then, the game can be described as a series of decisions, where the first decision is made by the computer, the second one by the human, the third one by the computer, and so on, until all coins are gone. The computer wants to make decisions that guarantee its victory against every possible opponent. The underlying assumption is that the opponent always finds the optimal move.
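The seven-coin game is small enough to search its complete tree. The following is a minimal sketch, assuming a position is fully described by the number of coins left; the names `can_win` and `winning_move` are illustrative and do not come from the slides.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # one move removes 1, 2, or 3 coins


@lru_cache(maxsize=None)
def can_win(coins: int) -> bool:
    """True if the player about to move can force a win with `coins` left."""
    # With no coins left, the previous player made the last move and has won.
    if coins == 0:
        return False
    # The mover wins if some move leaves the opponent in a losing position.
    return any(not can_win(coins - m) for m in MOVES if m <= coins)


def winning_move(coins: int):
    """Return a move that forces a win, or None if every move loses."""
    for m in MOVES:
        if m <= coins and not can_win(coins - m):
            return m
    return None


print(can_win(7), winning_move(7))  # prints: True 3
```

With seven coins the first player can force a win by removing three coins, which leaves four: whatever the opponent then removes, the first player can take the rest.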

Game Tree Representation
- New aspect to the search problem
  - there’s an opponent we cannot control
  - how can we handle this?
[Figure: game tree rooted at the start state S, with alternating levels labeled Computer Moves, Opponent Moves, Computer Moves; G marks a possible goal state lower in the tree (a winning situation for the computer).]

Game Trees

An optimal procedure: The Min-Max method
Designed to find the optimal strategy for Max and find the best move:
1. Generate the whole game tree, down to the leaves.
2. Apply the utility (payoff) function to the leaves.
3. Back up values from the leaves toward the root:
   - a Max node computes the max of its child values
   - a Min node computes the min of its child values
4. When the values reach the root: choose the max value and the corresponding move.
However: for most games it is infeasible to develop the whole search tree; instead, develop part of the tree and evaluate the promise of its leaves using a static evaluation function.
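For a tree small enough to generate completely, the four steps above can be sketched as follows. The tree encoding (nested lists with numeric leaf utilities) and the example values are assumptions chosen for illustration.

```python
def minimax(node, maximizing: bool):
    """Back up leaf utilities: numbers are leaves, lists are internal nodes."""
    if not isinstance(node, list):
        return node  # step 2: a leaf already carries its utility
    # Step 3: a Max node takes the max of its children, a Min node the min.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(root):
    """Step 4: index of the root child with the highest backed-up value."""
    values = [minimax(child, maximizing=False) for child in root]
    return max(range(len(values)), key=values.__getitem__)


# Max moves at the root, Min replies, then the game ends at the leaves.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True), best_move(tree))  # prints: 3 0
```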

Complexity of Game Playing
Suppose the entire tree is explored (depth d, branching factor b). What is the search time in this case?
- worst case, it will be $O(b^d)$
- Chess:
  - b ~ 35 (average branching factor)
  - d ~ 100 (depth of game tree for a typical game)
  - $b^d \approx 35^{100} \approx 10^{154}$ nodes!!
- Tic-Tac-Toe:
  - ~5 legal moves per turn, total of 9 moves
  - $5^9$ = 1,953,125
  - 9! = 362,880 (computer goes first)
  - 8! = 40,320 (computer goes second)
Well-known games can produce enormous search trees.
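The figures on this slide are plain arithmetic and can be checked directly. The variable names below are illustrative only.

```python
import math

# Chess: branching factor b ~ 35, game depth d ~ 100.
b, d = 35, 100
chess_nodes = b ** d
chess_exponent = round(math.log10(chess_nodes))
print(f"chess: about 10^{chess_exponent} nodes")  # prints: chess: about 10^154 nodes

# Tic-tac-toe: a crude bound of ~5 moves over 9 turns, and the exact
# number of move sequences when every game is played to a full board.
tictactoe_bound = 5 ** 9
computer_first = math.factorial(9)
computer_second = math.factorial(8)
print(tictactoe_bound, computer_first, computer_second)
```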

Static (Heuristic) Evaluation Functions
An evaluation function:
- estimates how good the current board configuration is for a player.
- Typically, one figures how good it is for the player and how good it is for the opponent, and subtracts the opponent’s score from the player’s.
- Othello: number of white pieces - number of black pieces
- Chess: value of all white pieces - value of all black pieces
Typical values range from -infinity (loss) to +infinity (win), or lie in [-1, +1]. If the board evaluation is X for a player, it’s -X for the opponent.

Two-Player Games
We need to define a static evaluation function e(p) that tells the computer how favorable the current game position p is from its perspective. In other words, e(p) will assume large values if a position is likely to result in a win for the computer, and low values if it predicts its defeat. In any given situation, the computer will make a move that guarantees a maximum value for e(p) after a certain number of moves. For this purpose, we can use the Minimax procedure with a specific maximum search depth (ply-depth k for k moves of each player).

e(p) for tic-tac-toe
[Figure: example boards with their evaluations: e(p) = 8 - 8 = 0; e(p) = 6 - 2 = 4; e(p) = 2 - 2 = 0; e(p) = +∞; e(p) = -∞.]
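The boards from this slide did not survive, but the values shown (8 - 8 = 0, presumably for the empty board) match a common tic-tac-toe heuristic: the number of rows, columns and diagonals still open for the computer minus the number still open for the opponent, with ±∞ for a completed line. The sketch below assumes that definition and assumes the computer plays X; the 9-character board encoding and the name `evaluate` are illustrative.

```python
import math

# The 8 winning lines of a 3x3 board, cells numbered 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def evaluate(board: str) -> float:
    """e(p) from X's point of view; board is 9 characters of 'X', 'O' or ' '."""
    open_for_x = open_for_o = 0
    for line in LINES:
        cells = [board[i] for i in line]
        if cells == ["X"] * 3:
            return math.inf   # X has completed a line
        if cells == ["O"] * 3:
            return -math.inf  # O has completed a line
        if "O" not in cells:
            open_for_x += 1   # X could still complete this line
        if "X" not in cells:
            open_for_o += 1   # O could still complete this line
    return open_for_x - open_for_o


print(evaluate(" " * 9))      # empty board: 8 - 8
print(evaluate("    X    "))  # X in the center: 8 - 4
```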

General Minimax Procedure on a Game Tree
For each move:
1. Expand the game tree as far as possible.
2. Assign state evaluations at each open node.
3. Propagate the minimax choices upwards:
   - if the parent is a Min node (opponent), propagate up the minimum value of the children
   - if the parent is a Max node (computer), propagate up the maximum value of the children
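A depth-limited version of this procedure might look like the sketch below, applied to the seven-coin game from earlier in the deck. The state encoding `(coins, computer_to_move)`, the deliberately crude evaluation (0 for any unfinished position), and the function names are assumptions made for illustration.

```python
MOVES = (1, 2, 3)


def successors(state):
    """All positions reachable in one move; the turn passes to the other side."""
    coins, computer_to_move = state
    return [(coins - m, not computer_to_move) for m in MOVES if m <= coins]


def evaluate(state):
    """Static evaluation from the computer's perspective."""
    coins, computer_to_move = state
    if coins > 0:
        return 0  # unfinished position: this crude evaluator has no opinion
    # No coins left: whoever is NOT to move took the last coin and won.
    return 1 if not computer_to_move else -1


def minimax_value(state, depth, maximizing):
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)  # step 2: evaluate the open node
    values = [minimax_value(c, depth - 1, not maximizing) for c in children]
    return max(values) if maximizing else min(values)  # step 3


def choose_move(state, depth):
    """Pick the successor with the best backed-up value for the computer."""
    return max(successors(state),
               key=lambda c: minimax_value(c, depth - 1, maximizing=False))


print(choose_move((7, True), depth=3))  # prints: (4, False)
```

At depth 1 every successor evaluates to 0 and the search simply returns the first one; at depth 3 it sees far enough to find that leaving four coins wins.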

Minimax Principle
“Assume the worst”
- say each configuration has an evaluation number
- high numbers favor the player (the computer), so we want to choose moves which maximize the evaluation
- low numbers favor the opponent, so they will choose moves which minimize the evaluation
Minimax Principle
- you (the computer) assume that the opponent will choose the minimizing move next (after your move)
- so you now choose the best move under this assumption, i.e., the maximum (highest-value) option considering both your move and the opponent’s optimal move
- we can extend this argument more than 2 moves ahead: we can search ahead as far as we can afford

Backup Values

Games of chance
Backgammon is a two-player game with uncertainty. Players roll dice to determine what moves they can make. White has just rolled 5 and 6 and has four legal moves: 5-10, ... Such games are good for exploring decision making in adversarial problems involving skill and luck.

Backgammon
[Figure: backgammon board showing the start position and the direction of movement.]

Backgammon

Game trees with chance nodes
- Chance nodes (shown as circles) represent the dice rolls.
- Each chance node has 21 distinct children, with a probability associated with each.
- We can use minimax to compute the values for the MAX and MIN nodes.
- Use expected values for chance nodes.
- For chance nodes over a max node, as in C, we compute: $\mathrm{expectimax}(C) = \sum_i P(d_i) \cdot \mathrm{maxvalue}(i)$
- For chance nodes over a min node, we compute: $\mathrm{expectimin}(C) = \sum_i P(d_i) \cdot \mathrm{minvalue}(i)$
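The backup rules for MAX, MIN and chance nodes can be combined into one recursive function. This is a sketch under assumed conventions: a node is either a numeric leaf or a tuple `(kind, children)`, and the children of a chance node are `(probability, child)` pairs. The example tree is made up; the dice table only confirms the count of 21 distinct rolls.

```python
from fractions import Fraction

# The 21 distinct rolls of two dice: doubles have probability 1/36,
# every other unordered pair has probability 2/36 = 1/18.
DICE_ROLLS = {(a, b): Fraction(1, 36) if a == b else Fraction(1, 18)
              for a in range(1, 7) for b in range(a, 7)}


def expectiminimax(node):
    """Back up values through max, min and chance nodes."""
    if not isinstance(node, tuple):
        return node  # leaf value from the static evaluator
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # Expected value: sum over outcomes of P(d_i) * value(child_i).
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")


# A chance node over two max nodes, each outcome with probability 1/2.
C = ("chance", [(Fraction(1, 2), ("max", [2, 4])),
                (Fraction(1, 2), ("max", [6, 0]))])
print(len(DICE_ROLLS), sum(DICE_ROLLS.values()), expectiminimax(C))  # prints: 21 1 5
```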

Meaning of the evaluation function
Dealing with probabilities and expected values means we have to be careful about the “meaning” of the values returned by the static evaluator. Note that a “relative-order preserving” change of the values would not change the decision of minimax, but could change the decision with chance nodes. Positive linear transformations are OK.
[Figure: the same tree under two value scales, 2 outcomes with probabilities {.9, .1}; A1 is the best move under one scale, A2 under the other.]
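A small numeric experiment makes the point concrete. The leaf values in the slide's figure were lost, so the numbers below are made up for illustration; only the probabilities {.9, .1} come from the slide.

```python
def expected(outcomes, probs):
    """Expected value of a chance node with the given outcome values."""
    return sum(p * v for p, v in zip(probs, outcomes))


def best(moves, probs, transform=lambda v: v):
    """Name of the move whose chance node has the highest expected value."""
    return max(moves,
               key=lambda name: expected([transform(v) for v in moves[name]], probs))


probs = (0.9, 0.1)
moves = {"A1": (2, 3), "A2": (1, 4)}  # hypothetical leaf values

# An order-preserving but non-linear rescaling of the same four values.
monotone = {1: 1, 2: 20, 3: 30, 4: 400}

print(best(moves, probs))                       # A1 (2.1 vs 1.3)
print(best(moves, probs, monotone.get))         # A2 (21 vs 40.9)
print(best(moves, probs, lambda v: 3 * v + 5))  # A1 again
```

The rescaling keeps every leaf in the same relative order, yet the preferred move flips, while the positive linear map 3v + 5 leaves the decision unchanged.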

Pruning with Alpha/Beta Backup Values

Alpha Beta Procedure
Idea:
- do depth-first search to generate a partial game tree,
- apply the static evaluation function to the leaves,
- compute bounds on the internal nodes.
Alpha, Beta bounds:
- an alpha value for a Max node means that Max’s real value is at least alpha.
- a beta value for a Min node means that Min can guarantee a value of at most beta.
Computation:
- alpha of a Max node is the maximum value of its children seen so far.
- beta of a Min node is the minimum value of its children seen so far.

When to Prune
Pruning
- below a Min node whose beta value is lower than or equal to the alpha value of any of its Max-node ancestors.
- below a Max node whose alpha value is greater than or equal to the beta value of any of its Min-node ancestors.

The Alpha-Beta Procedure
Now let us specify how to prune the Minimax tree in the case of a static evaluation function. Use two variables alpha (associated with MAX nodes) and beta (associated with MIN nodes). These variables contain the best (highest or lowest, respectively) e(p) value at a node p that has been found so far. Notice that alpha can never decrease, and beta can never increase.

The Alpha-Beta Procedure
There are two rules for terminating search:
- Search can be stopped below any MIN node having a beta value less than or equal to the alpha value of any of its MAX ancestors.
- Search can be stopped below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN ancestors.
Alpha-beta pruning thus expresses a relation between nodes at level n and level n+2 under which entire subtrees rooted at level n+1 can be eliminated from consideration.
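The two termination rules translate into a short recursive function. This is one common way to write alpha-beta, offered as a sketch rather than as the slides' own formulation: the best alpha and beta values of all ancestors are passed down as parameters, so "any ancestor" reduces to a single comparison. The nested-list tree and the `visited` list are assumptions added to make the pruning visible.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node`, skipping subtrees that cannot matter."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record each leaf that is evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)  # alpha can never decrease
            if alpha >= beta:          # rule for MAX nodes: stop searching
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)        # beta can never increase
        if beta <= alpha:              # rule for MIN nodes: stop searching
            break
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, True, visited=seen), seen)
# prints: 3 [3, 12, 8, 2, 14, 5, 2]
```

In this example the leaves 4 and 6 are never evaluated: once the second MIN node has seen the value 2, its beta is already below the root's alpha of 3.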

Alpha-beta procedure [Adapted from J.Pearl]

The Alpha-Beta Procedure
Example:
[Figure sequence: step-by-step alpha-beta trace on a four-level tree with levels max, min, max, min. The node values and the alpha/beta labels did not survive extraction.]
Note from the trace: once the bound of 3 has been propagated down from the grandparent, no values below 3 can influence MAX’s decision any more.

The Alpha-Beta Procedure
Can we estimate the benefit of the alpha-beta method? Suppose that there is a game that always allows a player to choose among b different moves, and we want to look d moves ahead. Then our search tree has $b^d$ leaves. Therefore, if we do not use alpha-beta pruning, we would have to apply the static evaluation function $N_d = b^d$ times.

The Alpha-Beta Procedure
Of course, the efficiency gain by the alpha-beta method always depends on the rules and the current configuration of the game. However, if we assume that new children of a node are explored in a particular order (those nodes p are explored first that will yield maximum values e(p) at depth d for MAX and minimum values for MIN), the number of nodes to be evaluated is:
$N_d = 2b^{d/2} - 1$ for even d, and $N_d = b^{(d+1)/2} + b^{(d-1)/2} - 1$ for odd d.

The Alpha-Beta Procedure
Therefore, the actual number $N_d$ can range from about $2b^{d/2}$ (best case) to $b^d$ (worst case). This means that in the best case the alpha-beta technique enables us to look ahead almost twice as far as without it in the same amount of time. In order to get close to the best case, we can compute e(p) immediately for every new node that we expand and use this value as an estimate for the Minimax value that the node will receive after expanding its successors until depth d. We can then use these estimates to expand the most likely candidates first (greatest e(p) for MAX, smallest for MIN).
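To see what "almost twice as far" means in numbers, the sketch below compares the worst-case count $b^d$ with the standard best-case count for a perfectly ordered uniform tree, $b^{\lceil d/2 \rceil} + b^{\lfloor d/2 \rfloor} - 1$ (which equals $2b^{d/2} - 1$ for even d). The function names are illustrative, and b = 35 is the chess branching factor quoted earlier in the deck.

```python
def worst_case_leaves(b: int, d: int) -> int:
    """Static evaluations without any pruning: every leaf is visited."""
    return b ** d


def best_case_leaves(b: int, d: int) -> int:
    """Static evaluations with alpha-beta and perfect move ordering."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1


b = 35
print(worst_case_leaves(b, 4))  # prints: 1500625
print(best_case_leaves(b, 4))   # prints: 2449
print(best_case_leaves(b, 8))   # prints: 3001249
```

With perfect ordering, a depth-8 search costs about twice as many evaluations as an unpruned depth-4 search, which is where the doubling of the look-ahead comes from.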

The Alpha-Beta Procedure
Of course, this pre-sorting of nodes requires us to compute the static evaluation function e(p) not only for the leaves of our search tree, but also for all of its inner nodes that we create. However, in most cases, pre-sorting will substantially increase the algorithm’s efficiency. The better our function e(p) captures the actual standing of the game in configuration p, the greater will be the efficiency gain achieved by the pre-sorting method.
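The effect of pre-sorting can be measured by counting leaf evaluations. In the sketch below the `estimate` used for sorting is the exact minimax value, an idealized stand-in for a perfect e(p), so it shows the upper limit of the benefit rather than what a real evaluator would achieve. The tree and all names are assumptions for illustration.

```python
import math


def minimax(node, maximizing):
    """Exact backed-up value; used here as a perfect ordering estimate."""
    if not isinstance(node, list):
        return node
    values = [minimax(c, not maximizing) for c in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, alpha, beta, estimate, counter):
    if not isinstance(node, list):
        counter[0] += 1  # one static evaluation of a leaf
        return node
    children = node
    if estimate is not None:
        # MAX tries the highest estimates first, MIN the lowest.
        children = sorted(node, key=lambda c: estimate(c, not maximizing),
                          reverse=maximizing)
    value = -math.inf if maximizing else math.inf
    for child in children:
        v = alphabeta(child, not maximizing, alpha, beta, estimate, counter)
        if maximizing:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:
            break  # remaining children cannot change the result
    return value


def leaves_evaluated(tree, estimate=None):
    counter = [0]
    alphabeta(tree, True, -math.inf, math.inf, estimate, counter)
    return counter[0]


tree = [[2, 5, 14], [6, 4, 2], [8, 12, 3]]  # b = 3, d = 2, badly ordered
print(leaves_evaluated(tree), leaves_evaluated(tree, minimax))  # prints: 9 5
```

For this tree the unsorted search evaluates all 9 leaves, while the perfectly sorted search evaluates 5, which matches the best case for b = 3 and d = 2.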

Timing Issues
It is very difficult to predict for a given game situation how many operations a depth-d look-ahead will require. Since we want the computer to respond within a certain amount of time, it is a good idea to apply the idea of iterative deepening. First, the computer finds the best move according to a one-move look-ahead search. Then, the computer determines the best move for a two-move look-ahead, and remembers it as the new best move. This is continued until the time runs out. Then the currently remembered best move is executed.
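The control loop for iterative deepening is independent of the game. The following is a minimal sketch with assumed names: `search_at_depth` stands for any depth-limited search (for example minimax with alpha-beta) that returns its best move. As a simplification, the clock is only checked between iterations; a real program would also abort a search that is still running when time is up.

```python
import time


def iterative_deepening(search_at_depth, time_limit_s, max_depth=64):
    """Search depth 1, 2, 3, ... and return the last completed result."""
    deadline = time.monotonic() + time_limit_s
    best = None
    for depth in range(1, max_depth + 1):
        # Always finish depth 1 so that there is some move to play.
        if best is not None and time.monotonic() >= deadline:
            break
        best = search_at_depth(depth)  # remember it as the new best move
    return best


completed = []


def demo_search(depth):
    """Placeholder search that only records which depths were run."""
    completed.append(depth)
    return f"best move found at depth {depth}"


result = iterative_deepening(demo_search, time_limit_s=60.0, max_depth=3)
print(result)  # prints: best move found at depth 3
```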

How to Find Static Evaluation Functions
Often, a static evaluation function e(p) first computes an appropriate feature vector f(p) that contains information about features of the current game configuration that are important for its evaluation. There is also a weight vector w(p) that indicates the weight (= importance) of each feature for the assessment of the current situation. Then e(p) is simply computed as the scalar product of f(p) and w(p). Both the identification of the most relevant features and the correct estimation of their relative importance are crucial for the strength of a game-playing program.

For example, in the case of chess, some features are:
- Material strength
- Rook, bishop in open files
- Castling
- Adjacent pawns
- Doubled pawns
- etc.
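The scalar product of features and weights is a one-line computation. In the sketch below the feature names, their values and the weights are all hypothetical, and the weights are fixed constants rather than depending on the position p as the w(p) notation allows.

```python
def evaluate(features: dict, weights: dict) -> float:
    """e(p) as the scalar product of feature values and their weights."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical weights: material counts fully, positional features less.
weights = {"material": 1.0, "rooks_on_open_files": 0.5, "doubled_pawns": -0.5}

# Hypothetical feature vector f(p) for one position, each feature measured
# as (computer's count) - (opponent's count).
features = {"material": 2, "rooks_on_open_files": 1, "doubled_pawns": 1}

print(evaluate(features, weights))  # prints: 2.0
```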

How to Find Static Evaluation Functions
Once we have found suitable features, the weights can be adapted algorithmically. This can be achieved, for example, with a neural network. So the greatest problem consists in extracting the most informative features from a game configuration.

Heuristics and Game Tree Search
The Horizon Effect
- sometimes there’s a major “effect” (such as a piece being captured) which is just “below” the depth to which the tree has been expanded (see example in Chapter 6)
- the computer cannot see that this major event could happen
- it has a “limited horizon”
- there are heuristics to try to follow certain branches more deeply to detect such important events (need to determine active vs. quiescent boards)
- this helps to avoid catastrophic losses due to “short-sightedness”

Heuristics and Game Tree Search
Heuristics for Tree Exploration
- it may be better to explore some branches more deeply in the allotted time
- various heuristics exist to identify “promising” branches

Computers can play GrandMaster Chess
“Deep Blue” (IBM)
- parallel processor, 32 nodes
- each node has 8 dedicated VLSI “chess chips”
- the system as a whole can search about 200 million configurations/second
- uses minimax, alpha-beta, heuristics: can search to depth 14
- memorizes starts, end-games
- power based on speed and memory: no common sense
Kasparov v. Deep Blue, May 1997
- 6-game full-regulation chess match (sponsored by ACM)
- Kasparov lost the match (2.5 to 3.5)
- a historic achievement for computer chess: the first time a computer defeated the reigning world champion in a full-regulation match
Note that Deep Blue plays by “brute force”: there is relatively little which is similar to human intuition and cleverness.

Rybka is free to download and has a rating of 3000, above any human player.

Status of Computers in Other Games
- Checkers/Draughts
  - current world champion is Chinook, can beat any human
  - uses alpha-beta search
- Othello
  - computers can easily beat the world experts
- Backgammon
  - system which learns is ranked in the top 3 in the world
  - uses neural networks to learn from playing many, many games against itself
- Go
  - branching factor b ~ 360: very large!
  - $2 million prize for any system which can beat a world expert

Summary
- Game playing is best modeled as a search problem.
- Game trees represent alternate computer/opponent moves.
- Evaluation functions estimate the quality of a given board configuration for the Max player.
- Minimax is a procedure which chooses moves by assuming that the opponent will always choose the move which is best for them.

Summary
- Alpha-Beta is a procedure which can prune large parts of the search tree and allow search to go deeper.
- For many well-known games, computer algorithms based on heuristic search match or out-perform human world experts.
Reading: Chapter 6 of the text.