Chapter 12 Adversarial Search

Two-Agent Games (1)
Idealized setting: the actions of the agents are interleaved.
Example: Grid-Space World
- Two robots: "Black" and "White"
- Goal of the robots:
  - White: to be in the same cell as Black
  - Black: to prevent this from happening
After settling on a first move, the agent makes the move, senses what the other agent does, and then repeats the planning process in sense/plan/act fashion.


Two-Agent Games (2)
Two-agent, perfect-information, zero-sum games:
- Two agents move in turn until either one of them wins or the result is a draw.
- Each player has a complete model of the environment and of its own and the other's possible actions and their effects.

Minimax Procedure (1)
Two players: MAX and MIN. Task: find a "best" move for MAX.
Assume that MAX moves first, and that the two players move alternately.
- MAX node: a node at an even-numbered depth, corresponding to a position in which it is MAX's move next.
- MIN node: a node at an odd-numbered depth, corresponding to a position in which it is MIN's move next.

Minimax Procedure (2)
Complete search of most game graphs is impossible.
- For chess there are about 10^120 nodes, so it would take about 10^101 centuries to generate the complete search graph, even assuming that a successor could be generated in 1/3 of a nanosecond. The universe is estimated to be only on the order of 10^8 centuries old.
- Heuristic search techniques do not reduce the effective branching factor sufficiently to be of much help.
We can use either breadth-first, depth-first, or heuristic methods, except that the termination conditions must be modified.
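Taking those figures at face value (the exact exponents were lost in extraction and are reconstructed here), the arithmetic checks out:

$$
\frac{10^{120}\ \text{nodes}\times\tfrac{1}{3}\,\text{ns/node}}{3.15\times 10^{9}\ \text{s/century}}
\;\approx\; \frac{3.3\times 10^{110}\ \text{s}}{3.15\times 10^{9}\ \text{s/century}}
\;\approx\; 10^{101}\ \text{centuries}.
$$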

Minimax Procedure (3)
Estimating the best first move:
- Apply a static evaluation function to the leaf nodes to measure the "worth" of each leaf. The measurement is based on various features thought to influence this worth.
- It is customary in analyzing game trees to adopt the convention that:
  - game positions favorable to MAX cause the evaluation function to have a positive value;
  - positions favorable to MIN cause the evaluation function to have a negative value;
  - values near zero correspond to game positions not particularly favorable to either MAX or MIN.

Minimax Procedure (4)
Extracting a good first move:
- If MAX were to choose among the tip nodes of a search tree, he would prefer the node having the largest evaluation.
- The backed-up value of a MAX node that is the parent of MIN tip nodes is therefore equal to the maximum of the static evaluations of the tip nodes.
- MIN, in turn, would choose the node having the smallest evaluation.

Minimax Procedure (5)
After the parents of all tip nodes have been assigned backed-up values, we back up values another level:
- MAX would choose the successor MIN node with the largest backed-up value;
- MIN would choose the successor MAX node with the smallest backed-up value.
We continue to back up values, level by level from the leaves, until the successors of the start node are assigned backed-up values.
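This level-by-level backup is naturally expressed as a recursion. Here is a minimal Python sketch, assuming a hypothetical game interface — successors(state), is_terminal(state), and a static evaluate(state) — whose names are invented for illustration, not taken from the original slides:

```python
def minimax(state, depth, maximizing):
    """Return the backed-up value of `state`, searching `depth` plies.

    `maximizing` is True at MAX nodes (even depths) and False at MIN
    nodes. `successors`, `is_terminal`, and `evaluate` are assumed to
    be supplied by the game being searched.
    """
    if depth == 0 or is_terminal(state):
        return evaluate(state)          # static evaluation at the tips
    values = [minimax(s, depth - 1, not maximizing)
              for s in successors(state)]
    return max(values) if maximizing else min(values)
```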

Example: Tic-Tac-Toe (1)
MAX marks crosses, MIN marks circles, and it is MAX's turn to play first.
- With a depth bound of 2, conduct a breadth-first search.
- Evaluation function e(p) of a position p:
  - If p is not a winning position for either player,
    e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) − (number of complete rows, columns, or diagonals that are still open for MIN).
  - If p is a win for MAX, e(p) = ∞.
  - If p is a win for MIN, e(p) = −∞.
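This evaluation function is easy to compute directly. A small Python sketch, with the board encoded as a 9-element list holding 'X' for MAX, 'O' for MIN, and None for empty (an encoding chosen here for illustration):

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def evaluate(board):
    """e(p) = open lines for MAX ('X') minus open lines for MIN ('O')."""
    # A completed line is a win and dominates everything else.
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return float('inf') if board[a] == 'X' else float('-inf')
    # A line is "open" for a player if it contains no opponent marks.
    open_for = lambda player: sum(
        all(board[i] in (player, None) for i in line) for line in LINES)
    return open_for('X') - open_for('O')
```

For example, on the empty board evaluate returns 8 − 8 = 0; after MAX plays the center it returns 8 − 4 = 4, matching the textbook's first-move analysis.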

Example: Tic-Tac-Toe (2)
First move. [Figure omitted: search tree for the first move.]

Example: Tic-Tac-Toe (3)
[Figure omitted.]

Example: Tic-Tac-Toe (4)
[Figure omitted.]

The Alpha-Beta Procedure (1)
If position evaluation begins only after tree generation is completed, the procedure is inefficient. Remarkable reductions in the amount of search needed are possible if we perform tip-node evaluations and calculate backed-up values simultaneously with tree generation.
- In the example tree, after the node marked A is generated and evaluated, there is no point in generating nodes B, C, and D: MIN already has A available, and could prefer nothing to A, so the additional successors cannot affect MAX's choice.

The Alpha-Beta Procedure (2)
- Alpha value: depending on the backed-up values of the other successors of the start node, the final backed-up value of the start node may be greater than its current alpha value (−1 in the example), but it cannot be less.
- Beta value: depending on the static values of the rest of the node's successors, the final backed-up value of the node may be less than its current beta value (−1 in the example), but it cannot be greater.
Note:
- The alpha values of MAX nodes can never decrease.
- The beta values of MIN nodes can never increase.

The Alpha-Beta Procedure (3)
Rules for discontinuing the search:
1. Search can be discontinued below any MIN node having a beta value less than or equal to the alpha value of any of its MAX node ancestors. The final backed-up value of this MIN node can be set to its beta value.
2. Search can be discontinued below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN node ancestors. The final backed-up value of this MAX node can be set to its alpha value.

The Alpha-Beta Procedure (4)
How to compute alpha and beta values:
- The alpha value of a MAX node is set equal to the current largest final backed-up value of its successors.
- The beta value of a MIN node is set equal to the current smallest final backed-up value of its successors.
Cut-offs:
- Alpha cut-off: search is discontinued under rule 1.
- Beta cut-off: search is discontinued under rule 2.
The alpha-beta procedure is the whole process of keeping track of alpha and beta values and making cut-offs when possible.

The Alpha-Beta Procedure (5)
Pseudocode. [Figure omitted; a sketch follows.]
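Since the slides' pseudocode figure did not survive extraction, here is a minimal Python sketch of depth-limited alpha-beta in the same spirit, reusing the assumed game interface (successors, is_terminal, evaluate) from the earlier minimax sketch — an illustration, not the original slides' code:

```python
def alphabeta(state, depth, alpha, beta, maximizing):
    """Backed-up value of `state` with alpha-beta cut-offs.

    `alpha` is the best value MAX is guaranteed so far on this path;
    `beta` is the best value MIN is guaranteed. Call initially as
    alphabeta(start, d, float('-inf'), float('inf'), True).
    """
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if maximizing:
        value = float('-inf')
        for s in successors(state):
            value = max(value, alphabeta(s, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)   # alpha of a MAX node never decreases
            if alpha >= beta:           # rule 2: beta cut-off
                break
        return value
    else:
        value = float('inf')
        for s in successors(state):
            value = min(value, alphabeta(s, depth - 1, alpha, beta, True))
            beta = min(beta, value)     # beta of a MIN node never increases
            if beta <= alpha:           # rule 1: alpha cut-off
                break
        return value
```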


Search Efficiency (1)
Notation:
- b : number of successors of every node (except a tip node)
- d : depth of the tree
- b^d : number of tip nodes
Suppose that an alpha-beta procedure generated successors in the order of their true backed-up values. This order maximizes the number of cut-offs and thus minimizes the number of tip nodes generated.
- N_d : this minimal number of tip nodes (the standard result is N_d = b^⌈d/2⌉ + b^⌊d/2⌋ − 1).

Search Efficiency (2)
[Slagle & Dixon 1969, Knuth & Moore 1975]
- The number of tip nodes of depth d that would be generated by optimal alpha-beta search is about the same as the number of tip nodes that would have been generated at depth d/2 without alpha-beta.
- Alpha-beta, with perfect ordering, thus reduces the effective branching factor from b to approximately √b.
[Pearl 1982]
- With random ordering of successors, the average branching factor is reduced to roughly b^(3/4).
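To make the d/2 result concrete, a rough worked example (the chess branching factor b ≈ 35 is a conventional estimate, not a figure from these slides):

$$
b^{d}\ \xrightarrow{\ \text{perfect ordering}\ }\ b^{d/2}=\left(\sqrt{b}\right)^{d},
\qquad \sqrt{35}\approx 5.9,
$$

so for the same number of tip-node evaluations, alpha-beta with perfect ordering can search roughly twice as deep as plain minimax.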

Search Efficiency (3)
The most straightforward method for ordering successor nodes is to use the static evaluation function (a sketch follows below).
A useful side effect of using a version of iterative deepening: depending on the time resources available, search to deeper plies can be aborted at any time, and the move judged best by the search last completed can be made.
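A sketch of that ordering idea, reusing the hypothetical evaluate and successors interface from the earlier sketches:

```python
def ordered_successors(state, maximizing):
    """Expand the most promising children first so cut-offs come early.

    Children are ranked by the static evaluation function: best-for-MAX
    first at MAX nodes, best-for-MIN first at MIN nodes.
    """
    return sorted(successors(state), key=evaluate, reverse=maximizing)
```

Plugging ordered_successors into the alpha-beta sketch in place of successors moves the search closer to the perfectly ordered case discussed above.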

Other Important Matters (1)
Various difficulties:
- Search might end at a position in which MAX (or MIN) is able to make a great move. Make sure that a position is quiescent before ending the search at that position.
- Quiescent position: one whose static value is not much different from what its backed-up value would be by looking a move or two ahead.
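One common way to realize this is a small quiescence extension at the search frontier. A minimal sketch, assuming a hypothetical is_quiescent(state) test (e.g. "no capture is available") — a generic illustration, not the slides' specific method:

```python
def evaluate_quiescent(state, extra_depth, maximizing):
    """Static evaluation, extended a ply or two at non-quiescent tips."""
    if extra_depth == 0 or is_terminal(state) or is_quiescent(state):
        return evaluate(state)
    values = [evaluate_quiescent(s, extra_depth - 1, not maximizing)
              for s in successors(state)]
    return max(values) if maximizing else min(values)
```

Using evaluate_quiescent(state, 2, maximizing) in place of evaluate at the depth bound keeps the search from cutting off in the middle of an exchange.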

Other Important Matters (2)
Horizon effect:
- There can be situations in which disaster or success lurks just beyond the search horizon.
Both minimax and its alpha-beta extension assume that the opposing player will always make its best move. There are occasions in which this assumption is inappropriate; for example, minimax would be inappropriate if one player had some kind of model of the other player's strategy.

Games of Chance (1)
Backgammon:
- MAX's and MIN's turns now each involve a throw of the dice.
- Imagine that at each dice throw, a fictitious third player, DICE, makes a move.

Games of Chance (2)
Expectimaxing:
- Back up the expected (average) value of the successors' values instead of a maximum or minimum.
- That is: back up the minimum of the successors' values at nodes where it is MIN's move, the maximum at nodes where it is MAX's move, and the expected value at nodes where it is DICE's move.
Introducing a chance move often makes the game tree branch too much for effective searching, so it is important to have a good static evaluation function.
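A minimal Python sketch of this three-way backup. The chance_outcomes(state) generator, yielding (probability, successor) pairs for a dice throw, is an invented interface for illustration; the other names follow the earlier sketches:

```python
def expectimax(state, depth, to_move):
    """Backed-up value with MAX, MIN, and DICE node types.

    `to_move` is 'MAX', 'MIN', or ('DICE', next_mover): a dice throw
    followed by next_mover's choice of move, as in backgammon.
    """
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if to_move == 'MAX':                    # MAX chooses the maximum
        return max(expectimax(s, depth - 1, ('DICE', 'MIN'))
                   for s in successors(state))
    if to_move == 'MIN':                    # MIN chooses the minimum
        return min(expectimax(s, depth - 1, ('DICE', 'MAX'))
                   for s in successors(state))
    _, mover = to_move                      # DICE: probability-weighted average
    return sum(p * expectimax(s, depth, mover)
               for p, s in chance_outcomes(state))
```

Here only player moves consume search depth; counting dice nodes as plies too is an equally valid design choice.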

Learning Evaluation Functions (1)
TD-GAMMON:
- Plays backgammon by training a layered, feedforward neural network.
- Overall value of a board position:
  v = p1 + 2*p2 − p3 − 2*p4
  (p1, ..., p4 are the network's outputs, estimating the probabilities of a regular win and a gammon for each side).

Learning Evaluation Functions (2)
Temporal-difference training of the network:
- Accomplished during actual play.
- With v_t the network's estimate at time t and W_t the vector of all weights at time t, the weights are adjusted toward the next estimate, in the spirit of W_{t+1} = W_t + c (v_{t+1} − v_t) ∂v_t/∂W_t (the exact update on the original slide did not survive extraction).
- Training has been performed by having the network play many hundreds of thousands of games against itself.
- The performance of a well-trained network is at or near championship level.
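A minimal sketch of that style of temporal-difference step for a differentiable value function. Here value_and_grad is an invented stand-in for the network's forward pass and gradient, and c is the learning rate; none of these names come from the original slides:

```python
import numpy as np

def td_update(weights, state_t, state_t1, value_and_grad, c=0.1):
    """One TD step: move v(state_t) toward the newer estimate v(state_t1).

    `value_and_grad(w, s)` is assumed to return (v, dv/dw) for the
    network with weight vector `w` evaluated on position `s`.
    """
    v_t, grad_t = value_and_grad(weights, state_t)
    v_t1, _ = value_and_grad(weights, state_t1)
    # W <- W + c (v_{t+1} - v_t) dv_t/dW
    return weights + c * (v_t1 - v_t) * grad_t
```

In self-play training this update is applied at every move, with the final game outcome standing in for v_{t+1} at the last position.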