
UNCERTAINTY IN SENSING (AND ACTION)

AGENDA
Planning with belief states
Nondeterministic sensing uncertainty
Probabilistic sensing uncertainty

BELIEF STATE
A belief state is the set of all states that an agent thinks are possible at any given time or at any stage of planning a course of actions.
To plan a course of actions, the agent searches a space of belief states, instead of a space of states.

SENSOR MODEL (DEFINITION #1)
State space S. The sensor model is a function SENSE: S → 2^S that maps each state s ∈ S to a belief state (the set of all states that the agent would think possible if it were actually observing state s).
Example: Assume our vacuum robot can perfectly sense the room it is in and whether there is dust in it, but it can't sense if there is dust in the other room. (The slide illustrates SENSE applied to a particular state pictorially.)

SENSOR MODEL (DEFINITION #2)
State space S, percept space P. The sensor model is a function SENSE: S → P that maps each state s ∈ S to a percept (the percept that the agent would obtain if actually observing state s).
We can then define the set of states consistent with an observation p: CONSISTENT(p) = { s | SENSE(s) = p }. (See the sketch below for how the two definitions relate.)
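The two definitions are linked: the belief state of definition #1 is just CONSISTENT applied to the percept of definition #2. A minimal Python sketch, assuming a hypothetical two-room vacuum world where states are frozensets of propositions (this representation is an illustration, not something fixed by the slides):

```python
from itertools import product

# States are frozensets of propositions such as "In(R1)" and "Clean(R2)".
ROOMS = ("R1", "R2")

def all_states():
    """Enumerate every vacuum-world state (robot in exactly one room)."""
    states = []
    for room, c1, c2 in product(ROOMS, (True, False), (True, False)):
        props = {f"In({room})"}
        if c1:
            props.add("Clean(R1)")
        if c2:
            props.add("Clean(R2)")
        states.append(frozenset(props))
    return states

def sense(s):
    """Definition #2: the percept reveals only the robot's room and that room's dust status."""
    room = next(r for r in ROOMS if f"In({r})" in s)
    return (room, f"Clean({room})" in s)

def consistent(percept):
    """CONSISTENT(p): all states that would produce percept p."""
    return {s for s in all_states() if sense(s) == percept}

def sense1(s):
    """Definition #1 recovered from definition #2: the belief state for state s."""
    return consistent(sense(s))
```

For instance, sense1 applied to the state {In(R1), Clean(R1), Clean(R2)} returns the two states that differ only in whether Clean(R2) holds.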

VACUUM ROBOT ACTION AND SENSOR MODEL
State s: any logical conjunction of In(R1), In(R2), Clean(R1), Clean(R2) (notation: + adds an attribute, - removes an attribute).
Right — applicable if s contains In(R1); outcomes {s1 = s - In(R1) + In(R2), s2 = s} [Right either does the right thing, or nothing]
Left — applicable if s contains In(R2); outcomes {s1 = s - In(R2) + In(R1), s2 = s - In(R2) + In(R1) - Clean(R2)} [Left always moves the robot to R1, but it may occasionally deposit dust in R2]
Suck(r) — applicable if s contains In(r); outcome {s1 = s + Clean(r)} [Suck always does the right thing]
The robot perfectly senses the room it is in and whether there is dust in it, but it can't sense if there is dust in the other room. A nondeterministic successor function for these actions is sketched below.
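A hedged Python sketch of this nondeterministic action model, using the same frozenset-of-propositions states as above (the names applicable and succ are illustrative, not from the slides):

```python
def applicable(action, s):
    """Precondition check for Right, Left, and Suck(r)."""
    name = action[0]
    if name == "Right":
        return "In(R1)" in s
    if name == "Left":
        return "In(R2)" in s
    if name == "Suck":
        return f"In({action[1]})" in s
    return False

def succ(s, action):
    """Nondeterministic successor set SUCC(s, a): every state the action might produce."""
    name = action[0]
    if name == "Right":                        # moves to R2, or does nothing
        return {s - {"In(R1)"} | {"In(R2)"}, s}
    if name == "Left":                         # always moves to R1, may dirty R2
        moved = s - {"In(R2)"} | {"In(R1)"}
        return {moved, moved - {"Clean(R2)"}}
    if name == "Suck":                         # always cleans the current room
        return {s | {f"Clean({action[1]})"}}
    raise ValueError(action)

# Example: from R2 with both rooms clean, Left yields two possible outcomes.
s0 = frozenset({"In(R2)", "Clean(R1)", "Clean(R2)"})
print(succ(s0, ("Left",)))
```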

TRANSITION BETWEEN BELIEF STATES
Suppose the robot is initially in a particular state (illustrated on the slide). After sensing this state, its belief state is the set of states consistent with the percept. Just after executing Left, its belief state is the set of possible successor states. After sensing the new state, its belief state will be one set if there is no dust in R1, or another if there is dust in R1.


TRANSITION BETWEEN BELIEF STATES
How do you propagate the action/sensing operation to obtain the successors of a belief state? (The slide shows the Left action followed by the two sensing outcomes, Clean(R1) and ¬Clean(R1).)

COMPUTING THE TRANSITION BETWEEN BELIEF STATES
Given an action A and a belief state S = {s1, …, sn}:
Result of applying the action, without sensing: take the union of all SUCC(si, A) for i = 1, …, n. This gives us a pre-sensing belief state S'.
Possible percepts resulting from sensing: {SENSE(si') for si' in S'} (using SENSE definition #2). This gives us a percept set P.
Possible states both in S' and consistent with each possible percept pj in P: Sj = {si' in S' | SENSE(si') = pj}, i.e., Sj = CONSISTENT(pj) ∩ S'. A sketch of this computation appears below.
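Putting the pieces together, a hedged Python sketch of this belief-state transition, reusing the illustrative succ and sense functions from the earlier sketches:

```python
from collections import defaultdict

def belief_successors(belief, action, succ, sense):
    """Map each possible percept to the resulting successor belief state.

    belief     : set of states
    succ(s, a) : nondeterministic successor set SUCC(s, a)
    sense(s)   : percept produced in state s (SENSE definition #2)
    """
    # 1. Apply the action to every state and take the union: pre-sensing belief S'.
    pre_sensing = set()
    for s in belief:
        pre_sensing |= succ(s, action)

    # 2. Partition S' by percept: S_j = CONSISTENT(p_j) ∩ S'.
    by_percept = defaultdict(set)
    for s in pre_sensing:
        by_percept[sense(s)].add(s)
    return dict(by_percept)
```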

AND/OR TREE OF BELIEF STATES
A goal belief state is one in which all states are goal states. An action is applicable to a belief state B if its precondition is achieved in all states in B. (The slide shows an AND/OR tree whose branches, labeled Left, Suck, and Right, lead to goal and loop nodes.) A compact search sketch follows.
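To make the tree concrete, here is a hedged sketch of AND/OR search over belief states, in the spirit of AIMA's AND-OR-GRAPH-SEARCH; the function names and the plan representation are illustrative assumptions, and it reuses the applicable and belief_successors helpers sketched earlier. An OR node picks one action; an AND node must handle every percept that sensing might return.

```python
def and_or_search(belief, actions, succ, sense, is_goal, path=()):
    """Return a conditional plan [action, {percept: subplan}], or None if none exists."""
    if all(is_goal(s) for s in belief):                # goal belief state: empty plan
        return []
    if belief in path:                                 # repeated belief: prune the loop
        return None
    for a in actions:
        if not all(applicable(a, s) for s in belief):  # must be applicable in every state
            continue
        subplans = {}
        for percept, next_belief in belief_successors(belief, a, succ, sense).items():
            sub = and_or_search(frozenset(next_belief), actions, succ, sense,
                                is_goal, path + (belief,))
            if sub is None:
                break                                  # some contingency has no plan
            subplans[percept] = sub
        else:
            return [a, subplans]                       # every percept handled
    return None
```

For the vacuum world one might call it with actions [("Right",), ("Left",), ("Suck", "R1"), ("Suck", "R2")] and an is_goal predicate that checks, say, that both rooms are clean.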

PLANNING WITH PROBABILISTIC UNCERTAINTY IN SENSING
(The slide illustrates a robot example with two motion hypotheses: no motion and perpendicular motion.)

PARTIALLY OBSERVABLE MDPS
Consider the MDP model with states s ∈ S, actions a ∈ A, reward R(s), transition model P(s'|s,a), and discount factor γ.
With sensing uncertainty, the initial belief state is a probability distribution over states: b(s), with b(si) ≥ 0 for all si ∈ S and Σi b(si) = 1.
Observations are generated according to a sensor model: observation space o ∈ O, sensor model P(o|s).
The resulting problem is a Partially Observable Markov Decision Process (POMDP). A small container for these ingredients is sketched below.
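For the numerical sketches that follow, it helps to fix a small container for these ingredients. This layout is an assumption of this write-up, not notation from the slides:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    """Finite POMDP: |S| states, |A| actions, |O| observations."""
    R: np.ndarray     # reward vector, shape (|S|,)
    T: np.ndarray     # transition model, shape (|A|, |S|, |S|); T[a][s_next, s] = P(s_next|s,a)
    Z: np.ndarray     # sensor model, shape (|O|, |S|); Z[o][s] = P(o|s)
    gamma: float      # discount factor
```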

POMDP UTILITY FUNCTION
A policy π(b) is defined as a map from belief states to actions. Expected discounted reward under policy π: U^π(b) = E[Σ_t γ^t R(S_t)], where S_t is the random variable indicating the state at time t.
P(S_0 = s) = b_0(s)
P(S_1 = s) = P(s | π(b_0), b_0) = Σ_{s'} P(s | s', π(b_0)) P(S_0 = s') = Σ_{s'} P(s | s', π(b_0)) b_0(s')
Similarly, P(S_2 = s) depends on the belief after the first observation: what belief states could the robot take on after 1 step?

From an initial belief b_0: choose action π(b_0), then predict b_1(s) = Σ_{s'} P(s | s', π(b_0)) b_0(s'). The agent then receives an observation; in the slide's diagram one of o_A, o_B, o_C, o_D arrives, with probabilities P(o_A | b_1), P(o_B | b_1), P(o_C | b_1), P(o_D | b_1). For the observation actually received, update the belief: b_1,A(s) = P(s | b_1, o_A), b_1,B(s) = P(s | b_1, o_B), b_1,C(s) = P(s | b_1, o_C), b_1,D(s) = P(s | b_1, o_D). The required quantities are:
P(o | b) = Σ_s P(o | s) b(s)
P(s | b, o) = P(o | s) P(s | b) / P(o | b) = (1/Z) P(o | s) b(s)
A small sketch of this predict/update cycle follows.
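A hedged sketch of one predict/update step written as explicit sums; beliefs are dicts from state to probability, and trans(s_next, s_prev, a) = P(s_next | s_prev, a) and sensor(o, s) = P(o | s) are assumed callables (a matrix form follows the next slide):

```python
def predict_belief(b, a, trans, states):
    """Prediction: b1(s) = Σ_{s'} P(s | s', a) b(s')."""
    return {s: sum(trans(s, s_prev, a) * p for s_prev, p in b.items()) for s in states}

def observation_prob(o, b, sensor):
    """P(o | b) = Σ_s P(o | s) b(s)."""
    return sum(sensor(o, s) * p for s, p in b.items())

def update_belief(b, o, sensor):
    """Bayes update: b'(s) = P(o | s) b(s) / P(o | b)."""
    z = observation_prob(o, b, sensor)
    return {s: sensor(o, s) * p / z for s, p in b.items()}
```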

BELIEF-SPACE SEARCH TREE
Each belief node has |A| action node successors; each action node has |O| belief successors. Each (action, observation) pair (a, o) requires a predict/update step similar to HMMs.
Matrix/vector formulation:
b(s): a vector b of length |S|
P(s'|s,a): a set of |S|×|S| matrices T_a
P(o_k|s): a vector o_k of length |S|
b_a = T_a b (predict)
P(o_k | b_a) = o_k^T b_a (probability of observation)
b_{a,k} = diag(o_k) b_a / (o_k^T b_a) (update)
Denote this combined operation as b_{a,o}. A NumPy rendering of these three lines appears below.
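A minimal NumPy sketch of the matrix/vector formulation, assuming the illustrative POMDP container above (T[a] plays the role of T_a, Z[k] the role of o_k):

```python
import numpy as np

def predict(b, a, pomdp):
    """b_a = T_a b  (push the belief through the transition model)."""
    return pomdp.T[a] @ b

def obs_prob(k, b_a, pomdp):
    """P(o_k | b_a) = o_k^T b_a."""
    return pomdp.Z[k] @ b_a

def update(k, b_a, pomdp):
    """b_{a,k} = diag(o_k) b_a / (o_k^T b_a)."""
    unnormalized = pomdp.Z[k] * b_a    # elementwise product = diag(o_k) b_a
    return unnormalized / unnormalized.sum()
```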

RECEDING HORIZON SEARCH
Expand the belief-space search tree to some depth h. Use an evaluation function on leaf beliefs to estimate utilities. For internal nodes, back up estimated utilities:
U(b) = E[R(s)|b] + γ max_{a∈A} Σ_{o∈O} P(o|b_a) U(b_{a,o})
A recursive sketch of this backup appears below.
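A hedged recursive sketch of the backup, reusing the NumPy predict/update helpers above; eval_fn is a placeholder leaf evaluation (the QMDP function on the next slide is one possible choice):

```python
def backed_up_value(b, depth, pomdp, eval_fn):
    """Depth-limited expectimax over (action, observation) pairs."""
    if depth == 0:
        return eval_fn(b)                          # leaf: heuristic estimate
    immediate = pomdp.R @ b                        # E[R(s) | b]
    best = -float("inf")
    n_actions, n_obs = pomdp.T.shape[0], pomdp.Z.shape[0]
    for a in range(n_actions):
        b_a = predict(b, a, pomdp)
        value = 0.0
        for k in range(n_obs):
            p_o = obs_prob(k, b_a, pomdp)
            if p_o > 0:
                value += p_o * backed_up_value(update(k, b_a, pomdp),
                                               depth - 1, pomdp, eval_fn)
        best = max(best, value)
    return immediate + pomdp.gamma * best
```

The receding-horizon agent would take the argmax action of this lookahead at its current belief, act, observe, update its belief, and replan.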

QMDP EVALUATION FUNCTION
One possible evaluation function is to compute the expectation of the underlying MDP value function over the leaf belief states:
f(b) = Σ_s U_MDP(s) b(s)
"Averaging over clairvoyance": assumes the problem becomes instantly fully observable after 1 action.
It is optimistic: U(b) ≤ f(b).
It approaches the POMDP value function as state and sensing uncertainty decrease.
In the extreme h = 1 case, this is called the QMDP policy. A sketch of this evaluation appears below.
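A hedged sketch of this evaluation function, assuming U_MDP is first obtained by value iteration on the underlying fully observable MDP (value_iteration is an illustrative helper, not something from the slides):

```python
import numpy as np

def value_iteration(pomdp, iters=1000):
    """Solve the underlying fully observable MDP for U_MDP(s)."""
    U = np.zeros(pomdp.R.shape[0])
    for _ in range(iters):
        # Q[a, s] = R(s) + γ Σ_{s'} P(s'|s,a) U(s');  T[a][s', s] = P(s'|s,a)
        Q = pomdp.R + pomdp.gamma * np.einsum("aps,p->as", pomdp.T, U)
        U = Q.max(axis=0)
    return U

def qmdp_eval(U_mdp):
    """f(b) = Σ_s U_MDP(s) b(s): the 'averaging over clairvoyance' leaf evaluation."""
    return lambda b: U_mdp @ b
```

Using qmdp_eval(value_iteration(pomdp)) as the leaf evaluation in backed_up_value, and taking the argmax over the first action at depth h = 1, corresponds to the QMDP policy.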

QMDP POLICY (Littman, Cassandra, Kaelbling 1995)

WORST-CASE COMPLEXITY
Infinite-horizon undiscounted POMDPs are undecidable (reduction from the halting problem).
Exact solution of infinite-horizon discounted POMDPs is intractable even for low |S|.
Finite horizon: O(|S|^2 |A|^h |O|^h).
Receding horizon approximation: one-step regret is O(γ^h).
Approximate solutions are becoming tractable for |S| in the millions: α-vector point-based techniques, Monte Carlo tree search, … (beyond the scope of this course).

SCHEDULE
11/29: Robotics
12/1: Guest lecture: Mike Gasser, Natural Language Processing
12/6: Review
12/8: Final project presentations, review