Von Neumann (Min-Max theorem) Claude Shannon (finite look-ahead) Chaturanga, India (~550 AD) (Proto-Chess) John McCarthy (α-β pruning) Donald Knuth (α-β analysis)

Wilmer McLean: "The war began in my front yard and ended in my front parlor."

Deep Thought: Chess would be easy but for the pesky opponent. Search: If I do A, then I will be in S; then if I do B, I will get to S'. Game Search: If I do A, then I will be in S; then my opponent gets to do B, and I will be forced to S'; then I get to do C, ...

Snakes-and-ladders is perfect information with chance (think of the utter boringness of deterministic snakes and ladders). Not that normal snakes-and-ladders has any real scope for showing your thinking power: your only action is dictated by the dice, so the dice could play it as solitaire; at most they need your hand. Kriegspiel (blind-fold chess)? Snakes & Ladders?

Searching Tic-Tac-Toe using Min-Max. A game is considered solved if it can be shown that the MAX player has a winning (or at least non-losing) strategy. This means that the backed-up value at the root of the full min-max tree is positive (or at least non-negative for a non-losing strategy).
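A minimal sketch of the backup this describes; the helper names (is_terminal, utility, successors) are assumptions for illustration, not anything from the slides:

```python
# Minimal minimax sketch. utility returns +inf / -inf / 0 for a MAX win / loss / draw.
def minimax(state, max_to_move, is_terminal, utility, successors):
    if is_terminal(state):
        return utility(state)
    values = [minimax(s, not max_to_move, is_terminal, utility, successors)
              for s in successors(state)]   # assumes non-terminal states have moves
    return max(values) if max_to_move else min(values)
```

If minimax(initial_state, True, ...) comes back positive, MAX has a winning strategy; if it is zero, MAX can at least force a draw.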

2 <= 2 Cut 14 <= 14 5 <= 5 2 <= 2 Whenever a node gets its “true” value, its parent’s bound gets updated When all children of a node have been evaluated (or a cut off occurs below that node), the current bound of that node is its true value Two types of cutoffs: If a min node n has bound =j, then cutoff occurs as long as j >=k If a max node n has bound >=k, and a min ancestor of n, say m, has a bound <=j, then cutoff occurs as long as j<=k

Another alpha-beta example

Click for an animation of alpha-beta search in action on Tic-Tac-Toe (order nodes in terms of their static eval values).

Evaluation Functions: Tic-Tac-Toe. If win for Max: +infinity. If loss for Max: -infinity. If draw: 0. Else: # rows/cols/diags open for Max - # rows/cols/diags open for Min.
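A sketch of that evaluation for a 3x3 board; the board encoding ('X' for Max, 'O' for Min, None for empty) is an assumption for illustration:

```python
import math

def lines(board):
    """All 8 rows/cols/diags of a 3x3 board given as a list of 3 lists."""
    rows = [list(r) for r in board]
    cols = [list(c) for c in zip(*board)]
    diags = [[board[i][i] for i in range(3)], [board[i][2 - i] for i in range(3)]]
    return rows + cols + diags

def evaluate(board):
    for line in lines(board):
        if line == ['X'] * 3:
            return math.inf            # win for Max
        if line == ['O'] * 3:
            return -math.inf           # loss for Max
    open_for_max = sum(1 for l in lines(board) if 'O' not in l)
    open_for_min = sum(1 for l in lines(board) if 'X' not in l)
    return open_for_max - open_for_min # 0 on a full (drawn) board
```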

What depth should we go to? --The deeper the better (but why?) Should we go to uniform depth? --Go deeper in branches where the game is in flux (backed-up values are changing fast) [called "quiescence" search]. Can we avoid the horizon effect?

Depth Cutoff and Online Search. Until now we considered mostly "all or nothing" computations –The computation takes the time it takes, and only at the end will it give any answer. When the agent has to make decisions online, it needs flexibility in the time it can devote to "thinking" ("deliberation scheduling") –Can't do that with all-or-nothing computations; we need flexible or anytime computations. Depth-limited min-max is an example of an anytime computation. –Pick a small depth limit. Do the analysis w.r.t. that tree. Decide the best move. Keep it as a backup. If you have more time, go deeper and get a better move. Online search is not guaranteed to be optimal --The agent may not even survive unless the world is ergodic (non-zero probability of reaching any state from any other state).
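One way to realize the anytime behaviour is iterative deepening with the best move kept as a backup; this sketch assumes hypothetical helpers root_moves(state), yielding (move, successor) pairs, and minimax_value(state, depth):

```python
import time

def anytime_decision(state, deadline, root_moves, minimax_value):
    """Deepen the lookahead while time remains; always keep the latest best move."""
    best_move = None
    depth = 1
    while time.time() < deadline:          # may overrun by one iteration at most
        scored = [(minimax_value(s, depth), m) for m, s in root_moves(state)]
        best_move = max(scored, key=lambda vm: vm[0])[1]
        depth += 1
    return best_move                       # whatever was ready when time ran out
```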

Why is "deeper" better? Possible reasons –Taking mins/maxes of the evaluation values of the leaf nodes improves their collective accuracy –Going deeper makes the agent notice "traps", thus significantly improving the evaluation accuracy. (All evaluation functions first check for termination states before computing the non-terminal evaluation.) If this is indeed the case, then we should remember the backed-up values for game positions, since they are better than straight evaluations.

(just as human weight lifters refuse to compete against cranes)

Uncertain Actions & Games Against Nature

Repeat [the value-iteration / Bellman update]. [Can generalize to have action costs C(a,s).] If the transition matrix M_ij is not known a priori, then we have a reinforcement learning scenario.
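A value-iteration sketch of that "Repeat" update, assuming a small MDP given as states, actions(s), a reward function R(s), and a transition matrix M[a][s][s2] = P(s2 | s, a); all of these names are assumptions for illustration:

```python
def value_iteration(states, actions, R, M, gamma=0.9, iterations=100):
    V = {s: 0.0 for s in states}
    for _ in range(iterations):                        # the "Repeat" of the slide
        V = {s: R(s) + gamma * max((sum(M[a][s][s2] * V[s2] for s2 in states)
                                    for a in actions(s)), default=0.0)
             for s in states}
    return V
```

If M is not known a priori, the same backup has to be estimated from experience, which is the reinforcement-learning setting the slide mentions.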

[Gridworld figure: the possible outcomes of each action are cells such as (3,2), (4,2), (3,3), (3,1).] This is a game against nature, and nature decides which outcome of each action will occur. How do you think it will decide? --I am the chosen one: so nature will pick the course that is most beneficial to me [Max-Max] --I am the loser: so nature will pick the course that is least beneficial to me [Min-Max] --I am a rationalist: nature is oblivious of me and does what it does, so I do "expectation analysis". The leaf node values have been set to their immediate rewards. We can do better if we set them to an estimate of their expected value.
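The three attitudes toward nature, written as backup rules over the outcomes of a single action; `outcomes` is a hypothetical list of (probability, value) pairs:

```python
def chosen_one_value(outcomes):    # "Max-Max": nature picks what is best for me
    return max(v for p, v in outcomes)

def loser_value(outcomes):         # "Min-Max": nature picks what is worst for me
    return min(v for p, v in outcomes)

def rationalist_value(outcomes):   # expectation analysis: nature is oblivious of me
    return sum(p * v for p, v in outcomes)
```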

Real Time Dynamic Programming. Interleave "search" and "execution" (Real Time Dynamic Programming). Do limited-depth analysis based on reachability to find the value of a state (and thereby the best action to do, namely the action that backs up the best value). The values of the leaf nodes are set to their immediate rewards –Alternatively, some admissible estimate of the value function (h*). If all the leaf nodes are terminal nodes, then the backed-up value will be the true optimal value; otherwise, it is an approximation. RTDP: for leaf nodes, can use R(s) or some heuristic value h(s).
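A depth-limited greedy backup in the RTDP spirit; transition(s, a) returning (probability, next_state) pairs, R(s) and h(s) are assumed helpers, and every state is assumed to have at least one applicable action:

```python
def rtdp_step(state, actions, transition, R, h, depth):
    """Back up values to depth `depth` below `state` and return the best action."""
    def value(s, d):
        if d == 0:
            return h(s)                    # leaf: R(s) or a heuristic estimate
        return R(s) + max(sum(p * value(s2, d - 1) for p, s2 in transition(s, a))
                          for a in actions(s))
    return max(actions(state),
               key=lambda a: sum(p * value(s2, depth - 1)
                                 for p, s2 in transition(state, a)))
```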

The expected value computation is fine if you are maximizing "expected" return. But what if you are risk-averse (and think "nature" is out to get you)? Then V2 = min(V3, V4). If you are a perpetual optimist, then V2 = max(V3, V4). If you have deterministic actions, then RTDP becomes RTA* (if you use h(.) to evaluate leaves).

RTA* (RTDP with deterministic actions and leaves evaluated by f(.)). [Example lookahead tree with root S, successors n, m, k and goal G; the figure shows g, h and f values such as g=1, h=2, f=3 and g=2, h=3, f=5 at the nodes.] --Grow the tree to depth d --Apply f-evaluation to the leaf nodes --Propagate f-values up to the parent nodes: f(parent) = min(f(children)). RTA* is a special case of RTDP --It is useful for acting in deterministic, dynamic worlds --While RTDP is useful for acting in stochastic, dynamic worlds. LRTA*: can store backed-up values for states (and they will be better heuristics).
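An RTA*-flavoured sketch of that backup rule; successors(s), yielding (step_cost, child) pairs, h(s) and is_goal(s) are assumed names for illustration:

```python
def rta_value(state, depth, g, successors, h, is_goal):
    """f-value backed up from a depth-limited lookahead below `state`."""
    if is_goal(state):
        return g
    if depth == 0:
        return g + h(state)                            # f-evaluation at the frontier
    return min(rta_value(s2, depth - 1, g + cost, successors, h, is_goal)
               for cost, s2 in successors(state))      # f(parent) = min(f(children))

def rta_move(state, depth, successors, h, is_goal):
    """Pick the successor whose subtree backs up the smallest f-value."""
    return min(successors(state),
               key=lambda cs: rta_value(cs[1], depth - 1, cs[0],
                                        successors, h, is_goal))[1]
```

LRTA* would additionally cache these backed-up values as improved h(s) for states it revisits.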

End of Game Trees

Game Playing (Adversarial Search). Perfect play –Do min-max on the complete game tree. Alpha-beta pruning (a neat idea that is the bane of many a CSE471 student). Resource limits –Do limited-depth lookahead –Apply evaluation functions at the leaf nodes –Do min-max. Miscellaneous –Games of chance –Status of computer games.

(so is MDP policy)

Multi-player Games. Everyone maximizes their own utility. --How does this compare to 2-player games? (Max's utility is the negative of Min's.)
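A max^n sketch of "everyone maximizes their own utility"; utility_vector(s), returning one number per player, and the other helper names are assumptions for illustration:

```python
def maxn(state, player, n_players, is_terminal, utility_vector, successors):
    """Back up full utility vectors; the player to move picks what is best for them."""
    if is_terminal(state):
        return utility_vector(state)
    children = [maxn(s, (player + 1) % n_players, n_players,
                     is_terminal, utility_vector, successors)
                for s in successors(state)]
    return max(children, key=lambda vec: vec[player])
```

With two players and zero-sum utilities (Max's utility is the negative of Min's), this reduces to ordinary min-max.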

Expectimax