Probability CSE 473 – Autumn 2003 Henry Kautz. ExpectiMax.

Presentation transcript:

Probability CSE 473 – Autumn 2003 Henry Kautz

ExpectiMax

Hungry Monkey: 2-Ply Game Tree [figure: two-ply game tree alternating max nodes (actions jump and shake) with chance nodes whose outcome probabilities are 2/3, 1/3 and 1/6, 5/6]

ExpectiMax 1 – Chance Nodes [figure: step 1 of the backup; expected values computed at the lowest chance nodes of the Hungry Monkey tree]

ExpectiMax 2 – Max Nodes [figure: step 2; each interior max node takes the maximum of its children's expected values]

ExpectiMax 3 – Chance Nodes [figure: step 3; expected values computed at the top-level chance nodes]

ExpectiMax 4 – Max Node [figure: step 4; the root max node selects the action with the highest expected value]
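The backup computed across these four slides can be written as a short recursion: chance nodes return the probability-weighted average of their children, max nodes return the best child. Below is a minimal sketch in Python. The tuple-based tree representation and the leaf rewards (1 and 0) are assumptions made purely for illustration; only the action names and the probabilities come from the slides.

# Minimal expectimax sketch (illustrative; not the slides' exact tree).
# A node is one of:
#   ("leaf", value)
#   ("chance", [(prob, child), ...])   # probabilities should sum to 1
#   ("max", [(action, child), ...])

def expectimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "chance":
        # Chance node: probability-weighted average of child values.
        return sum(p * expectimax(child) for p, child in node[1])
    # Max node: value of the best available action.
    return max(expectimax(child) for _, child in node[1])

# Hypothetical one-ply example reusing the probabilities shown on the slides;
# the leaf rewards are assumed for illustration only.
tree = ("max", [
    ("jump",  ("chance", [(2/3, ("leaf", 1)), (1/3, ("leaf", 0))])),
    ("shake", ("chance", [(1/6, ("leaf", 1)), (5/6, ("leaf", 0))])),
])
print(expectimax(tree))  # 2/3 for jump vs. 1/6 for shake -> prints 0.666...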

Policies
The result of the ExpectiMax analysis is a conditional plan (also called a policy):
– Optimal plan for 2 steps: jump; shake
– Optimal plan for 3 steps: jump; if (ontable) {shake; shake} else {jump; shake}
Probabilistic planning can be generalized in many ways, including:
– Action costs
– Hidden state
The general problem is that of solving a Markov Decision Process (MDP).
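One way to see how a conditional plan falls out of the analysis is to have each max node record its best action and each chance node keep a sub-plan for every possible outcome. A minimal sketch, reusing the hypothetical node representation from the earlier example (the plan encoding is an assumption made for illustration):

# Extract a conditional plan along with the expectimax value.
def expectimax_plan(node):
    kind = node[0]
    if kind == "leaf":
        return node[1], None
    if kind == "chance":
        value, subplans = 0.0, []
        for p, child in node[1]:
            v, plan = expectimax_plan(child)
            value += p * v
            subplans.append(plan)   # one sub-plan per chance outcome
        return value, subplans
    # Max node: choose the action whose subtree has the highest expected value.
    scored = [(expectimax_plan(child), action) for action, child in node[1]]
    (value, subplan), action = max(scored, key=lambda s: s[0][0])
    return value, (action, subplan)

On the one-ply example above this returns a value of 2/3 together with the plan ("jump", [None, None]); on a deeper tree the nested sub-plans spell out what to do after each chance outcome, which is exactly the if/else structure in the 3-step plan shown on this slide.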

Two-Player Games of Chance

Backgammon
Branching factor:
– Chance node: 21
– Max node: about 20 on average
– Size of tree: O(c^k · m^k), where c and m are the chance- and max-node branching factors and k is the number of plies
– In practice: can search about 3 plies
Neurogammon & TD-Gammon (Tesauro 1995)
– Learned weights on a static evaluation function by playing against itself
– Used the results of games to optimize the weights: "punish" features that were on in losing games, "reward" features that were on in winning games
– A kind of reinforcement learning
– Became the world's best backgammon player!
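Plugging in the numbers from this slide: with c ≈ 21 and m ≈ 20, a 3-ply tree already has on the order of (21 · 20)^3 ≈ 7.4 × 10^7 leaves, which is why deeper search is impractical for backgammon.

The "reward/punish features" idea can be sketched as follows. This is a deliberate simplification for exposition, not TD-Gammon's actual algorithm (TD-Gammon used TD(λ) updates on a neural-network evaluation function); the feature encoding, learning rate, and game-outcome signal here are assumed.

# Illustrative sketch: adjust evaluation-function weights after each game.
def evaluate(weights, features):
    # Static evaluation: weighted sum of board features.
    return sum(w * f for w, f in zip(weights, features))

def update_after_game(weights, feature_history, won, lr=0.01):
    # Nudge weights toward features that were "on" during won games,
    # and away from features that were "on" during lost games.
    # feature_history: list of feature vectors, one per position in the game.
    sign = 1.0 if won else -1.0
    return [
        w + lr * sign * sum(step[i] for step in feature_history)
        for i, w in enumerate(weights)
    ]

Repeated over many self-play games, updates of this flavor shift the static evaluation toward positions that tend to lead to wins, which is the reinforcement-learning idea the slide is describing.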