4. Games and adversarial search

The problems that arise when we try to plan ahead in a world where other agents are planning against us

Why study games?
- Games are fun!
- Clear criteria for success
- Games often define very large search spaces
- Games can be a good model of many competitive activities: military confrontations, negotiation, auctions, …
- Games are a traditional hallmark of intelligence

Search versus games

Search (no adversary):
- Solution is a (heuristic) method for finding a goal
- Heuristics and search techniques can find an optimal solution
- Evaluation function: estimate of cost from start to goal through a given node
- Examples: path planning, scheduling activities

Games (adversary):
- Solution is a strategy (a strategy specifies a move for every possible opponent reply)
- Time limits force an approximate solution
- Evaluation function: evaluates the "goodness" of a game position
- Examples: chess, checkers, Othello, backgammon

Types of game environments

                                               Deterministic           Stochastic
Perfect information (fully observable)         Chess, checkers, Go     Backgammon, Monopoly
Imperfect information (partially observable)   Battleships             Scrabble, poker, bridge

Alternating two-player zero-sum games
- Players take turns
- Each game outcome or terminal state has a utility for each player (e.g., 1 for a win, 0 for a loss)
- The sum of both players' utilities is a constant

Games vs. single-agent search
- We don't know how the opponent will act
- The solution is not a fixed sequence of actions from start state to goal state, but a strategy or policy (a mapping from state to best move in that state)
- Efficiency is critical to playing well
  - The time to make a move is limited
  - The branching factor, search depth, and number of terminal configurations are huge: in chess, branching factor ≈ 35 and depth ≈ 100, giving a search tree of roughly 35^100 ≈ 10^154 nodes
  - This rules out searching all the way to the end of the game

Game tree

A game of tic-tac-toe between two players, "max" and "min" (figure omitted)

A more abstract game tree

A two-ply game with terminal utilities (for MAX); figure omitted

A more abstract game tree

- Minimax value of a node: the utility (for MAX) of being in the corresponding state, assuming perfect play on both sides
- Minimax strategy: choose the move that gives the best worst-case payoff

Computing the minimax value of a state

Minimax(state) =
  Utility(state)                          if state is terminal
  max over successors s of Minimax(s)     if player = MAX
  min over successors s of Minimax(s)     if player = MIN
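
A minimal sketch of this recursion in Python (the game interface, with is_terminal, utility, successors, and to_move, is a hypothetical placeholder, not from the slides):

    def minimax(state, game):
        """Return the minimax value of state: the utility for MAX under perfect play."""
        if game.is_terminal(state):
            return game.utility(state)
        values = [minimax(s, game) for s in game.successors(state)]
        if game.to_move(state) == 'MAX':
            return max(values)
        return min(values)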

Computing the minimax value of a state

- The minimax strategy is optimal against an optimal opponent
- If the opponent is sub-optimal, the utility can only be higher
- A different strategy may work better against a sub-optimal opponent, but it will necessarily be worse against an optimal opponent

More general games

- More than two players, non-zero-sum
- Utilities are now tuples, one entry per player (e.g., (4,3,2), (1,5,2), (7,4,1), (7,7,1))
- Each player maximizes their own utility at each node
- Utilities get propagated (backed up) from children to parents, as in the sketch below
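
A sketch of that backup rule, assuming utility(state) now returns a tuple with one entry per player and to_move(state) returns that player's index (hypothetical interface, as above):

    def multiplayer_value(state, game):
        """Back up utility tuples: the player to move picks the child whose
        tuple is best in their own component."""
        if game.is_terminal(state):
            return game.utility(state)            # tuple: one utility per player
        player = game.to_move(state)              # index into the utility tuple
        children = [multiplayer_value(s, game) for s in game.successors(state)]
        return max(children, key=lambda u: u[player])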

Alpha-beta pruning

It is possible to compute the exact minimax decision without expanding every node in the game tree. (A sequence of build-up slides steps through the example two-ply tree, pruning branches as the leaf values 3, 12, 8, 2, 14, 5, 2 are revealed; figures omitted.) Could we have done better with the right branch?

Alpha-beta pruning

- α is the value of the best choice for the MAX player found so far at any choice point above node n
- We want to compute the MIN-value at n
- As we loop over n's children, the MIN-value decreases
- If it drops below α, MAX will never take this branch, so we can ignore n's remaining children
- Analogously, β is the value of the lowest-utility choice found so far for the MIN player
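
A sketch of alpha-beta layered on the minimax recursion above (same hypothetical game interface):

    def alphabeta(state, game, alpha=float('-inf'), beta=float('inf')):
        """Minimax value of state, skipping branches that cannot affect the decision."""
        if game.is_terminal(state):
            return game.utility(state)
        if game.to_move(state) == 'MAX':
            value = float('-inf')
            for s in game.successors(state):
                value = max(value, alphabeta(s, game, alpha, beta))
                alpha = max(alpha, value)
                if alpha >= beta:    # MIN above would never allow this branch
                    break
            return value
        value = float('inf')
        for s in game.successors(state):
            value = min(value, alphabeta(s, game, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:        # MAX above would never choose this branch
                break
        return value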

Alpha-beta pruning

- Pruning does not affect the final result
- The amount of pruning depends on move ordering
  - Should start with the "best" moves (highest-value for MAX, lowest-value for MIN)
  - For chess: try captures first, then threats, then forward moves, then backward moves
  - Can also try to remember "killer moves" from other branches of the tree
- With perfect ordering, time complexity drops from O(b^m) to O(b^(m/2)): the effective branching factor becomes √b, so the search depth can effectively be doubled

Evaluation function

- Cut off search at a certain depth and compute the value of an evaluation function for a state instead of its minimax value
- The evaluation function may be thought of as the probability of winning from a given state, or the expected value of that state
- A common evaluation function is a weighted sum of features:
  Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
- For chess, wk may be the material value of a piece (pawn = 1, knight = 3, rook = 5, queen = 9) and fk(s) may be the advantage in terms of that piece
- Evaluation functions may be learned from game databases or by having the program play many games against itself
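
As a concrete illustration, a material-only chess evaluation written as a weighted feature sum (piece_count is an assumed helper, not part of the slides):

    # Material weights: pawn = 1, knight = 3, bishop = 3, rook = 5, queen = 9
    WEIGHTS = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

    def evaluate(state, piece_count):
        """Eval(s) = sum_k wk * fk(s), where fk(s) is MAX's advantage in
        pieces of kind k; piece_count(state, player, kind) is assumed."""
        return sum(w * (piece_count(state, 'MAX', k) - piece_count(state, 'MIN', k))
                   for k, w in WEIGHTS.items())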

Cutting off search

- Horizon effect: you may incorrectly estimate the value of a state by overlooking an event that is just beyond the depth limit
  - For example, a damaging move by the opponent that can be delayed but not avoided
- Possible remedies:
  - Quiescence search: do not cut off search at positions that are unstable (for example, are you about to lose an important piece?)
  - Singular extension: a strong move that should be tried when the normal depth limit is reached
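
One way these pieces fit together: a depth-limited minimax that falls back to the evaluation function at the cutoff, with a quiescence check (is_quiescent and evaluate are assumed helpers on the hypothetical game interface):

    def h_minimax(state, game, depth, limit):
        """Depth-limited minimax: heuristic evaluation replaces exact utility."""
        if game.is_terminal(state):
            return game.utility(state)
        if depth >= limit and game.is_quiescent(state):   # don't cut off unstable positions
            return game.evaluate(state)
        values = [h_minimax(s, game, depth + 1, limit)
                  for s in game.successors(state)]
        return max(values) if game.to_move(state) == 'MAX' else min(values)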

Additional techniques

- Transposition table to store previously expanded states
- Forward pruning to avoid considering all possible moves
- Lookup tables for opening moves and endgames
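
A sketch of the first idea, assuming states are hashable so positions reached by different move orders (transpositions) can be cached:

    def minimax_tt(state, game, table=None):
        """Minimax with a transposition table: each distinct position is
        evaluated only once, however many paths lead to it."""
        if table is None:
            table = {}
        if state in table:
            return table[state]
        if game.is_terminal(state):
            value = game.utility(state)
        else:
            values = [minimax_tt(s, game, table) for s in game.successors(state)]
            value = max(values) if game.to_move(state) == 'MAX' else min(values)
        table[state] = value
        return value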

Chess playing systems

- Baseline system: 200 million node evaluations per move (3 min), minimax with a decent evaluation function and quiescence search. 5-ply ≈ human novice
- Add alpha-beta pruning: 10-ply ≈ typical PC, experienced player
- Deep Blue: 30 billion evaluations per move, singular extensions, evaluation function with 8000 features, large databases of opening and endgame moves. 14-ply ≈ Garry Kasparov
- Recent state of the art (Hydra): 36 billion evaluations per second, advanced pruning techniques. 18-ply ≈ better than any human alive?

Games of chance

How do we incorporate dice throwing into the game tree?

Games of chance

- Expectiminimax: for chance nodes, average the values of the outcomes, weighted by the probability of each outcome
- Nasty branching factor; defining evaluation functions and pruning algorithms is more difficult
- Monte Carlo simulation: when you get to a chance node, simulate a large number of games with random dice rolls and use the win percentage as the evaluation function
  - Can work well for games like backgammon
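
A sketch of expectiminimax, assuming chance nodes expose an outcomes(state) method yielding (probability, successor) pairs (hypothetical interface):

    def expectiminimax(state, game):
        """Like minimax, but chance nodes return a probability-weighted average."""
        if game.is_terminal(state):
            return game.utility(state)
        player = game.to_move(state)
        if player == 'CHANCE':
            return sum(p * expectiminimax(s, game)
                       for p, s in game.outcomes(state))
        values = [expectiminimax(s, game) for s in game.successors(state)]
        return max(values) if player == 'MAX' else min(values)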

Partially observable games

- Card games like bridge and poker
- Monte Carlo simulation: deal all the cards randomly in the beginning and pretend the game is fully observable
  - "Averaging over clairvoyance"
- Problem: this strategy does not account for bluffing, information gathering, etc.
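
A sketch of "averaging over clairvoyance": sample complete deals, solve each as if fully observable, and pick the move with the best average value (deal_random_world, legal_moves, and result are all hypothetical helpers):

    def clairvoyant_move(state, game, num_samples=100):
        """Monte Carlo over hidden information: average minimax values per move."""
        totals = {m: 0.0 for m in game.legal_moves(state)}
        for _ in range(num_samples):
            world = game.deal_random_world(state)   # assumed: fill in hidden cards
            for move in totals:
                totals[move] += minimax(game.result(world, move), game)
        return max(totals, key=totals.get)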

Game playing algorithms today

Computers are better than humans:
- Checkers: solved in 2007
- Chess: IBM Deep Blue defeated Kasparov in 1997

Computers are competitive with top human players:
- Backgammon: the TD-Gammon system used reinforcement learning to learn a good evaluation function
- Bridge: top systems use Monte Carlo simulation and alpha-beta search

Computers are not competitive:
- Go: branching factor 361; existing systems use Monte Carlo simulation and pattern databases