CS 188: Artificial Intelligence Spring 2006 Lecture 23: Games 4/18/2006 Dan Klein – UC Berkeley

Today
 Reminder: P3 due at midnight
 Finish reinforcement learning
   Function approximation
 Start game playing
   Minimax search

Project 2 Contest Results
 Naïve Bayes
   Runners-up: Chris Crutchfield and Wei Tu (83%)
     Number of curves in the image
     Ratio of height to width
   Runners-up: Danny Guan and Daniel Low (83%)
     Percentage of active pixels
     Maximum contiguous active pixels per row
   Winners: Taylor Berg-Kirkpatrick and Fenna Krienen (84%)
     Color changes across rows and columns

Project 2 Contest Results
 Perceptron
   Runner-up: Victor Feldman (86% on 1K training)
     Center of mass of all active pixels
   Runner-up: Jocelyn Cozzo (91%)
     Percentage of active pixels
     Randomized prediction on ties
   Winners: Taylor Berg-Kirkpatrick and Fenna Krienen (92%)
     Color changes across rows and columns
     25 training iterations

Project 2 Contest Results
 Other approaches
   Dan Gillick (94%)
     Nearest neighbor classifier
     Euclidean distance function over overlapping pixels
     Only considers a pruned set of training instances that are sufficiently distant from each other
   The GSIs (XX%)
     Only 10 minutes of work
     How did they do it?

Game Playing in Practice
 Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Exact solution imminent.
 Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply.
 Othello: human champions refuse to compete against computers, who are too good.
 Go: human champions refuse to compete against computers, who are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Game Playing
 Axes:
   Deterministic or not
   Number of players
   Perfect information or not
 Want algorithms for calculating a strategy (policy) which recommends a move in each state

Deterministic Single Player?
 Deterministic, single player, perfect information:
   Know the rules
   Know what moves will do
   Have some utility function over outcomes
   E.g. Freecell, 8-Puzzle, Rubik's cube
 … it's (basically) just search!
 Slight reinterpretation:
   Calculate best utility from each node
   Each node is a max over children
   Note that goal values are on the goal, not path sums as before
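A minimal sketch of this "max over children" view (hypothetical helper functions, not from the slides):

    # Best achievable utility in a deterministic, single-player game tree.
    # Terminal values live on the goals; every internal node is a max
    # over its children.
    def max_value(state, successors, utility, is_terminal):
        if is_terminal(state):
            return utility(state)
        return max(max_value(s, successors, utility, is_terminal)
                   for s in successors(state))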

Stochastic Single Player
 What if we don't know what the result of an action will be?
   E.g. solitaire, minesweeper, trying to drive home
 … just an MDP!
 Can also do expectimax search
   Chance nodes, like actions except the environment controls the action chosen
   Calculate utility for each node
     Max nodes as in search
     Chance nodes take expectations of children
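A minimal expectimax sketch (assumed game interface; not project code):

    # Max nodes pick the best action; chance nodes average over outcomes.
    def expectimax(state, game):
        if game.is_terminal(state):
            return game.utility(state)
        if game.is_max_node(state):
            return max(expectimax(s, game) for s in game.successors(state))
        # Chance node: expectation over outcomes, weighted by probability.
        return sum(p * expectimax(s, game) for s, p in game.outcomes(state))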

Deterministic Two Player (Turns)
 E.g. tic-tac-toe
 Minimax search
   Basically, a state-space search tree
   Each layer, or ply, alternates players
   Choose move to position with highest minimax value = best achievable utility against best play
 Zero-sum games
   One player maximizes result
   The other minimizes result

Minimax Example

Minimax Search
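The algorithm figure from this slide is not in the transcript; a minimal minimax sketch under an assumed game interface:

    # Returns the minimax value of a state; whose turn it is alternates by ply.
    def minimax(state, game, maximizing):
        if game.is_terminal(state):
            return game.utility(state)
        values = [minimax(s, game, not maximizing)
                  for s in game.successors(state)]
        return max(values) if maximizing else min(values)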

Minimax Properties
 Optimal against a perfect player. Otherwise?
 Time complexity?
   O(b^m)
 Space complexity?
   O(bm)
 For chess, b ≈ 35, m ≈ 100
   Exact solution is completely infeasible
   But, do we need to explore the whole tree?

Multi-Player Games
 Similar to minimax:
   Utilities are now tuples
   Each player maximizes their own entry at each node
   Propagate (or back up) nodes from children
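A hypothetical sketch of backing up utility tuples in an n-player game (names and interfaces are assumptions):

    # Player i (0-indexed) chooses the child tuple maximizing entry i.
    def backup(node, player, num_players, children, utility_tuple):
        kids = children(node)
        if not kids:
            return utility_tuple(node)   # e.g. (1, 2, 6) for three players
        nxt = (player + 1) % num_players
        values = [backup(k, nxt, num_players, children, utility_tuple)
                  for k in kids]
        return max(values, key=lambda t: t[player])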

Games with Chance
 E.g. backgammon
 Expectiminimax search!
   Environment is an extra player that moves after each agent
   Chance nodes take expectations, otherwise like minimax
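A minimal expectiminimax sketch (assumed interface; the chance "player" sits between the agents):

    def expectiminimax(state, game):
        if game.is_terminal(state):
            return game.utility(state)
        kind = game.node_type(state)   # 'max', 'min', or 'chance'
        if kind == 'max':
            return max(expectiminimax(s, game) for s in game.successors(state))
        if kind == 'min':
            return min(expectiminimax(s, game) for s in game.successors(state))
        # Chance node: probability-weighted expectation over dice outcomes.
        return sum(p * expectiminimax(s, game) for s, p in game.outcomes(state))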

Games with Chance
 Dice rolls increase b: 21 possible rolls with 2 dice
 Backgammon ≈ 20 legal moves
 Depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9
 As depth increases, probability of reaching a given node shrinks
   So value of lookahead is diminished
   So limiting depth is less damaging
   But pruning is less possible…
 TDGammon uses depth-2 search + very good eval function + reinforcement learning: world-champion level play

Games with Hidden Information
 Imperfect information:
   E.g. card games, where opponent's initial cards are unknown
   Typically we can calculate a probability for each possible deal
   Seems just like having one big dice roll at the beginning of the game
 Idea: compute the minimax value of each action in each deal, then choose the action with highest expected value over all deals
   Special case: if an action is optimal for all deals, it's optimal.
 GIB, the current best bridge program, approximates this idea (see the sketch below) by
   1) generating 100 deals consistent with bidding information
   2) picking the action that wins most tricks on average
 Drawback to this approach?
   It's broken!
   (Though useful in practice)
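A hedged sketch of the deal-averaging idea (GIB-style, heavily simplified; sample_deal, minimax_value, and actions are hypothetical helpers):

    def choose_action(state, actions, sample_deal, minimax_value, n_deals=100):
        # Sample deals consistent with what we know, then pick the action
        # with the best average minimax value across those deals.
        deals = [sample_deal(state) for _ in range(n_deals)]
        def avg_value(a):
            return sum(minimax_value(d, a) for d in deals) / len(deals)
        return max(actions(state), key=avg_value)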

Averaging over Deals is Broken
 Road A leads to a small heap of gold pieces
 Road B leads to a fork:
   take the left fork and you'll find a mound of jewels;
   take the right fork and you'll be run over by a bus.
 Road A leads to a small heap of gold pieces
 Road B leads to a fork:
   take the left fork and you'll be run over by a bus;
   take the right fork and you'll find a mound of jewels.
 Road A leads to a small heap of gold pieces
 Road B leads to a fork:
   guess correctly and you'll find a mound of jewels;
   guess incorrectly and you'll be run over by a bus.

Efficient Search
 Several options:
   Pruning: avoid regions of the search tree which will never enter into (optimal) play
   Limited depth: don't search very far into the future, approximate utility with a value function (familiar?)

Next Class
 More game playing
   Pruning
   Limited depth search
   Connection to reinforcement learning!

 -  Pruning Example

Q-Learning
 Model-free, TD learning with Q-functions:
   Q(s,a) ← (1−α) Q(s,a) + α [r + γ max_a′ Q(s′,a′)]
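A minimal sketch of that update as code (hypothetical names; assumes an actions(s) helper):

    from collections import defaultdict

    Q = defaultdict(float)   # maps (state, action) pairs to value estimates

    def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        # Sample-based update toward r + gamma * max_a' Q(s', a').
        best_next = max((Q[(s_next, a2)] for a2 in actions(s_next)), default=0.0)
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)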

Function Approximation
 Problem: too slow to learn each state's utility one by one
 Solution: what we learn about one state should generalize to similar states
   Very much like supervised learning
 If states are treated entirely independently, we can only learn on very small state spaces

Discretization
 Can put states into buckets of various sizes
   E.g. can have all angles between 0 and 5 degrees share the same Q estimate
 Buckets too fine → takes a long time to learn
 Buckets too coarse → learn suboptimal, often jerky control
 Real systems that use discretization usually require clever bucketing schemes
   Adaptive sizes
   Tile coding
 [DEMOS]
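A toy sketch of the angle-bucketing example (hypothetical, not from the demos):

    # All angles in the same 5-degree bucket index the same Q entry.
    def angle_bucket(angle_degrees, bucket_size=5.0):
        return int(angle_degrees // bucket_size)

    # angle_bucket(3.7) == angle_bucket(4.9) == 0, so both angles share
    # one Q estimate; angle_bucket(6.0) == 1 starts the next bucket.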

Linear Value Functions
 Another option: values are linear functions of features of states (or state-action pairs)
 Good if you can describe states well using a few features (e.g. for game-playing board evaluations)
 Now we only have to learn a few weights rather than a value for each state
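A minimal sketch of the linear form Q(s,a) = Σ_i w_i f_i(s,a) (hypothetical feature function):

    # features(s, a) returns a list of numbers, one per feature.
    def linear_q(weights, features, s, a):
        return sum(w * f for w, f in zip(weights, features(s, a)))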

TD Updates for Linear Values
 Can use TD learning with linear values
   (Actually it's just like the perceptron!)
 Old Q-learning update:
   Q(s,a) ← (1−α) Q(s,a) + α [r + γ max_a′ Q(s′,a′)]
 Simply update weights of features in Q_w(a,s)
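A hedged sketch of that weight update (hypothetical names; q_value can be the linear_q above):

    # difference = r + gamma * max_a' Q(s', a') - Q(s, a)
    # w_i <- w_i + alpha * difference * f_i(s, a)
    def td_update(weights, features, s, a, r, s_next, actions,
                  q_value, alpha=0.05, gamma=0.9):
        best_next = max((q_value(weights, features, s_next, a2)
                         for a2 in actions(s_next)), default=0.0)
        diff = r + gamma * best_next - q_value(weights, features, s, a)
        f = features(s, a)
        for i in range(len(weights)):
            weights[i] += alpha * diff * f[i]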

Example: TD for Linear Qs