
UNCERTAINTY IN SENSING (AND ACTION)

AGENDA
- Planning with belief states
- Nondeterministic sensing uncertainty
- Probabilistic sensing uncertainty

ACTION UNCERTAINTY
Each action is represented in the form a(s) -> {s1, …, sr}, where each si, i = 1, ..., r, describes one possible effect of the action in state s.
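
A minimal sketch (not from the slides) of this representation in Python: a nondeterministic action maps a state to the set of its possible successor states. The state encoding (a frozenset of atoms) and the example action are illustrative assumptions.

    # Hypothetical sketch: a state is a frozenset of atoms such as "In(R1)", "Clean(R1)".
    # A nondeterministic action maps a state s to the set {s1, ..., sr} of possible effects.
    from typing import FrozenSet, Set

    State = FrozenSet[str]

    def right(s: State) -> Set[State]:
        """Right either moves the robot from R1 to R2 or does nothing (nondeterministic)."""
        if "In(R1)" not in s:
            return set()                      # not applicable in s
        moved = (s - {"In(R1)"}) | {"In(R2)"}
        return {moved, s}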

AND/OR TREE (figure): action nodes are world "decision" nodes; state nodes are agent decision nodes. The example tree expands Right and Suck(R1); one branch is a loop.

AND/OR TREE (figure, expanded): further expansion with Left, Right, Suck(R1), and Suck(R2); some leaves are goal states and others are loops.

OR SUB-TREE
An OR sub-tree is to an AND/OR tree what a path is to a classical search tree:
- For each state node, only one child is included
- For each action node, all children are included
It forms part of a potential solution if none of its nodes is closed. A solution is an OR sub-tree in which all leaves are goal states.
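
The following sketch (adapted from the standard AND-OR graph search formulation, not taken from these slides) searches such a tree: OR nodes commit to one action, AND nodes must handle every possible outcome, and branches that revisit a state are treated as closed. `actions`, `results`, and `is_goal` are assumed problem-specific callbacks; `results(s, a)` returns the set of possible successor states.

    # Hypothetical AND-OR search sketch: returns a conditional plan (action + nested dict) or None.
    def and_or_search(state, actions, results, is_goal):
        return or_search(state, actions, results, is_goal, path=[])

    def or_search(state, actions, results, is_goal, path):
        if is_goal(state):
            return {}                       # empty plan: already at a goal
        if state in path:
            return None                     # loop: this branch is closed
        for a in actions(state):
            plan = and_search(results(state, a), actions, results, is_goal, [state] + path)
            if plan is not None:
                return (a, plan)            # OR node: commit to one action
        return None

    def and_search(states, actions, results, is_goal, path):
        subplans = {}
        for s in states:                    # AND node: every outcome must be solved
            plan = or_search(s, actions, results, is_goal, path)
            if plan is None:
                return None
            subplans[s] = plan
        return subplans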

BELIEF STATE
A belief state is the set of all states that an agent thinks are possible at a given time, or at a given stage of planning a course of actions (example shown as a figure on the slide). To plan a course of actions, the agent searches a space of belief states instead of a space of states.

SENSOR MODEL (DEFINITION #1)
State space S. The sensor model is a function SENSE: S → 2^S that maps each state s ∈ S to a belief state (the set of all states that the agent would think possible if it were actually observing state s).
Example: assume our vacuum robot can perfectly sense the room it is in and whether there is dust in it, but it cannot sense whether there is dust in the other room. (The slide illustrates SENSE(s) for a particular state as a figure.)

SENSOR MODEL (DEFINITION #2)
State space S, percept space P. The sensor model is a function SENSE: S → P that maps each state s ∈ S to a percept (the percept that the agent would obtain if actually observing state s). We can then define the set of states consistent with an observation p: CONSISTENT(p) = { s | SENSE(s) = p }. (The slide asks, as a figure-based exercise, what SENSE and CONSISTENT are for a particular vacuum-world state.)
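
A small sketch (illustrative, not from the slides) of definition #2 and the induced CONSISTENT set, using the vacuum example: the percept is the robot's room plus whether that room is clean. The state encoding follows the earlier sketch.

    # Hypothetical sketch: SENSE maps a state to a percept; CONSISTENT inverts it.
    def sense(s):
        """Vacuum example: the robot perceives its room and whether that room is clean."""
        room = "R1" if "In(R1)" in s else "R2"
        return (room, f"Clean({room})" in s)

    def consistent(percept, states):
        """CONSISTENT(p) = { s in states | SENSE(s) = p }."""
        return {s for s in states if sense(s) == percept}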

VACUUM ROBOT ACTION AND SENSOR MODEL
State s: any logical conjunction of In(R1), In(R2), Clean(R1), Clean(R2). (Notation: + adds an attribute, - removes an attribute.)
- Right: applicable if In(R1) holds in s; effects {s1 = s - In(R1) + In(R2), s2 = s}. [Right either does the right thing or nothing.]
- Left: applicable if In(R2) holds in s; effects {s1 = s - In(R2) + In(R1), s2 = s - In(R2) + In(R1) - Clean(R2)}. [Left always moves the robot to R1, but it may occasionally deposit dust in R2.]
- Suck(r): applicable if In(r) holds in s; effects {s1 = s + Clean(r)}. [Suck always does the right thing.]
The robot perfectly senses the room it is in and whether there is dust in it, but it cannot sense whether there is dust in the other room.
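
Continuing the earlier sketches (Right and the sensor were defined above), a possible encoding of the remaining two actions; this is an illustrative reading of the slide, with states as frozensets of atoms.

    # Hypothetical encoding of the slide's action model; states are frozensets of atoms.
    def left(s):
        """Applicable if In(R2); always moves to R1, may also deposit dust in R2."""
        if "In(R2)" not in s:
            return set()
        s1 = (s - {"In(R2)"}) | {"In(R1)"}
        s2 = s1 - {"Clean(R2)"}
        return {s1, s2}

    def suck(s, r):
        """Applicable if In(r); always cleans room r."""
        if f"In({r})" not in s:
            return set()
        return {s | {f"Clean({r})"}}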

TRANSITION BETWEEN BELIEF STATES (figure-based example)
Suppose the robot is initially in a given state. After sensing this state, its belief state is shown on the slide. Just after executing Left, its belief state is shown. After sensing the new state, its belief state is one of two possibilities: one if there is no dust in R1 (the Clean(R1) branch), the other if there is dust in R1 (the ¬Clean(R1) branch).

TRANSITION BETWEEN BELIEF STATES
How do you propagate the action/sensing operation to obtain the successors of a belief state? (Figure: the Left example, branching on Clean(R1) vs. ¬Clean(R1).)

COMPUTING THE TRANSITION BETWEEN BELIEF STATES
Given an action A and a belief state S = {s1, …, sn}:
- Result of applying the action, without sensing: take the union of all SUCC(si, A) for i = 1, …, n. This gives us a pre-sensing belief state S'.
- Possible percepts resulting from sensing: {SENSE(si') for si' in S'} (using SENSE definition #2). This gives us a percept set P.
- Possible states both in S' and consistent with each possible percept pj in P: Sj = {si' in S' | SENSE(si') = pj}, i.e., Sj = CONSISTENT(pj) ∩ S'.
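
A sketch of this computation (illustrative, not the slides' code): `succ(s, a)` is an assumed nondeterministic successor function (e.g., wrapping the actions sketched earlier) and `sense` is the definition-#2 sensor model from the earlier sketch.

    # Hypothetical sketch: successors of a belief state under action a, then sensing.
    def belief_successors(belief, a, succ, sense):
        # 1. Predict: union of all possible effects of a on each state in the belief.
        predicted = set()
        for s in belief:
            predicted |= succ(s, a)
        # 2. Branch on each possible percept, keeping only the consistent states.
        successors = {}
        for sp in predicted:
            successors.setdefault(sense(sp), set()).add(sp)
        return successors          # maps each percept p to CONSISTENT(p) ∩ predicted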

AND/OR TREE OF BELIEF STATES (figures)
A goal belief state is one in which all states are goal states. An action is applicable to a belief state B if its precondition is achieved in all states in B. The slides expand the tree with Left, Right, and Suck; some belief-state leaves are goals and others are loops.

BELIEF STATE REPRESENTATION
Solution #1: represent the set of states explicitly. Under the closed-world assumption, if states are described with n propositions, there are O(2^n) states, so the number of belief states is 2^(2^n), and a single belief state may contain O(2^n) states. This can be hugely expensive.

BELIEF STATE REPRESENTATION
Solution #2: represent only what is known. For example, if the vacuum robot knows that it is in R1 (so, not in R2) and that R2 is clean, then the representation is K(In(R1)) ∧ K(¬In(R2)) ∧ K(Clean(R2)), where K stands for "knows that ...". How many belief states can be represented? Only 3^n, instead of 2^(2^n).
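
One way to picture solution #2 (an encoding I am assuming, not specified on the slide) is a dictionary mapping each proposition to known-true or known-false, with absent propositions unknown; each of the n propositions then has three possibilities, hence 3^n representable belief states.

    # Hypothetical 3-valued representation: known-true, known-false, or absent (unknown).
    knowledge = {"In(R1)": True, "In(R2)": False, "Clean(R2)": True}   # Clean(R1) unknown

    def entailed_states(knowledge, all_states):
        """The belief state represented by `knowledge`: concrete states consistent with it."""
        return {s for s in all_states
                if all((atom in s) == value for atom, value in knowledge.items())}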

SUCCESSOR OF A BELIEF STATE THROUGH AN ACTION
Left: applicable if In(R2) holds in s; effects {s1 = s - In(R2) + In(R1), s2 = s - In(R2) + In(R1) - Clean(R2)}.
Starting from the belief state K(In(R2)) ∧ K(¬In(R1)) ∧ K(Clean(R2)):
- s1 → K(¬In(R2)) ∧ K(In(R1)) ∧ K(Clean(R2))
- s2 → K(¬In(R2)) ∧ K(In(R1)) ∧ K(¬Clean(R2))
The resulting belief state is K(¬In(R2)) ∧ K(In(R1)).
An action does not depend on the agent's belief state, so K does not appear in the action description (different from R&N, p. 440).

SENSORY ACTIONS
So far, we have assumed a unique sensory operation automatically performed after the execution of each action of a plan. But an agent may have several sensors, each with some cost (e.g., time) to use. In certain situations, the agent may prefer to avoid the cost of using a sensor, even if using the sensor could reduce uncertainty. This leads to introducing specific sensory actions, each with its own representation (active sensing). As with other actions, the agent chooses which sensory actions it wants to execute and when.

EXAMPLE: Check-Dust(r)
Applicable if In(r) holds in s.
- When Clean(r): b' = b - K(¬Clean(r)) + K(Clean(r))
- When ¬Clean(r): b' = b - K(Clean(r)) + K(¬Clean(r))
For example, applying Check-Dust(R1) to the belief state K(In(R1)) ∧ K(¬In(R2)) ∧ K(¬Clean(R2)) yields either K(In(R1)) ∧ K(¬In(R2)) ∧ K(¬Clean(R2)) ∧ K(Clean(R1)) or K(In(R1)) ∧ K(¬In(R2)) ∧ K(¬Clean(R2)) ∧ K(¬Clean(R1)).
A sensory action maps a state into a belief state: its precondition is about the state; its effects are on the belief state.
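
A rough sketch of how such a sensory action could update a knowledge-style belief (illustrative only; it reuses the dictionary encoding assumed earlier and is not the slides' formulation).

    # Hypothetical Check-Dust(r): precondition on the true state, effect on the belief.
    def check_dust(r, true_state, knowledge):
        if f"In({r})" not in true_state:
            return knowledge                           # not applicable: belief unchanged
        updated = dict(knowledge)
        updated[f"Clean({r})"] = f"Clean({r})" in true_state   # now known either way
        return updated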

INTRUDER FINDING PROBLEM
A moving intruder is hiding in a 2-D workspace. The robot must "sweep" the workspace to find the intruder. Both the robot and the intruder are points. (Figure: the robot's visibility region, a hiding region, and the cleared region.)

DOES A SOLUTION ALWAYS EXIST?
No! Easy-to-test case: there is a "hole" in the workspace. Hard-to-test case: there is no "hole" in the workspace.

INFORMATION STATE
Example of an information state: (x, y, a=1, b=1, c=0), where (x, y) is the robot's position and each of a, b, c is 0 or 1 (0 = cleared region, 1 = hiding region). An initial state is of the form (x, y, 1, 1, ..., 1); a goal state is any state of the form (x, y, 0, 0, ..., 0).

CRITICAL LINE (figure)
The information state is unchanged while the robot moves within a region (e.g., it stays at a=0, b=1); it changes only when the robot crosses a critical line (e.g., to a=0, b=0).

CRITICALITY-BASED DISCRETIZATION (figure: regions A, B, C, D, E)
Each of the regions A, B, C, D, and E consists of "equivalent" positions of the robot, so it is sufficient to consider a single position per region.

CRITICALITY-BASED DISCRETIZATION (figures: successive expansions of the search tree)
Starting from information state (C, 1, 1), the tree expands to states such as (D, 1), (B, 1), (E, 1), (C, 1, 0), and (B, 0). This yields a much smaller search tree than grid-based discretization!

SENSORLESS PLANNING

PLANNING WITH PROBABILISTIC UNCERTAINTY IN SENSING
(Figure, with labels "no motion" and "perpendicular motion".)

PARTIALLY OBSERVABLE MDPs
Consider the MDP model with states s ∈ S and actions a ∈ A:
- Reward R(s)
- Transition model P(s'|s,a)
- Discount factor γ
With sensing uncertainty, the initial belief state is a probability distribution over states: b(s), with b(si) ≥ 0 for all si ∈ S and Σi b(si) = 1.
Observations are generated according to a sensor model:
- Observation space o ∈ O
- Sensor model P(o|s)
The resulting problem is a Partially Observable Markov Decision Process (POMDP).
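
The slide's ingredients, bundled into one container for the later sketches. The names and layout are my own illustrative choices, not a standard library API.

    # Hypothetical POMDP container matching the slide's ingredients.
    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    @dataclass
    class POMDP:
        states: List[str]
        actions: List[str]
        observations: List[str]
        R: Dict[str, float]                      # reward R(s)
        T: Dict[Tuple[str, str, str], float]     # transition P(s'|s,a), keyed by (s, a, s')
        Z: Dict[Tuple[str, str], float]          # sensor model P(o|s), keyed by (o, s)
        gamma: float                             # discount factor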

POMDP UTILITY FUNCTION
A policy π(b) is defined as a map from belief states to actions. The expected discounted reward with policy π is
U^π(b) = E[Σ_t γ^t R(S_t)],
where S_t is the random variable indicating the state at time t.
P(S_0 = s) = b_0(s)
P(S_1 = s) = P(s | π(b_0), b_0) = Σ_s' P(s | s', π(b_0)) P(S_0 = s') = Σ_s' P(s | s', π(b_0)) b_0(s')
P(S_2 = s) = ?
What belief states could the robot be in after one step?

BELIEF UPDATE (figures: evolution of the belief state b_0 → b_1 → {b_2, b_3, b_4, b_5})
- Choose action π(b_0), then predict: b_1(s) = Σ_s' P(s | s', π(b_0)) b_0(s')
- Receive an observation o ∈ {o_A, o_B, o_C, o_D}, each occurring with probability P(o | b_1)
- Update the belief: b_2(s) = P(s | b_1, o_A), b_3(s) = P(s | b_1, o_B), b_4(s) = P(s | b_1, o_C), b_5(s) = P(s | b_1, o_D)
where P(o | b) = Σ_s P(o | s) b(s) and P(s | b, o) = P(o | s) P(s | b) / P(o | b) = (1/Z) P(o | s) b(s).

BELIEF-SPACE SEARCH TREE
- Each belief node has |A| action node successors
- Each action node has |O| belief successors
- Each (action, observation) pair (a, o) requires a predict/update step similar to HMMs
Matrix/vector formulation:
- b(s): a vector b of length |S|
- P(s'|s,a): a set of |S| x |S| matrices T_a
- P(o_k|s): a vector o_k of length |S|
- b' = T_a b (predict)
- P(o_k|b') = o_k^T b' (probability of observation)
- b_k = diag(o_k) b' / (o_k^T b') (update)
Denote this operation as b_{a,o}.
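
A numpy rendering of this predict/update step (a sketch under stated assumptions: `T_a[i, j] = P(s_i | s_j, a)` so that beliefs are column vectors, and `o_k[i] = P(o_k | s_i)`).

    import numpy as np

    def predict(T_a, b):
        """b' = T_a b, assuming T_a[i, j] = P(s_i | s_j, a)."""
        return T_a @ b

    def observation_prob(o_k, b_pred):
        """P(o_k | b') = o_k^T b', where o_k[i] = P(o_k | s_i)."""
        return float(o_k @ b_pred)

    def update(o_k, b_pred):
        """b_k = diag(o_k) b' / (o_k^T b')."""
        return (o_k * b_pred) / (o_k @ b_pred)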

RECEDING HORIZON SEARCH
Expand the belief-space search tree to some depth h. Use an evaluation function on leaf beliefs to estimate utilities. For internal nodes, back up the estimated utilities:
U(b) = E[R(s) | b] + γ max_{a ∈ A} Σ_{o ∈ O} P(o | b_a) U(b_{a,o})
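
A recursive sketch of this backup, reusing the matrix formulation above. All names are my own: `R` is the reward vector, `T` a dict of per-action transition matrices, `O` a dict of per-observation P(o|s) vectors, and `evaluate` the leaf evaluation function.

    import numpy as np

    # Hypothetical receding-horizon backup over the belief-space search tree.
    def rh_value(b, h, R, T, O, gamma, evaluate):
        """b, R: vectors over states; T[a][i, j] = P(s_i | s_j, a); O[o][i] = P(o | s_i)."""
        if h == 0:
            return evaluate(b)                         # leaf estimate
        best = -np.inf
        for a in T:
            b_a = T[a] @ b                             # predict: b_a = T_a b
            value = 0.0
            for o in O:
                p_o = float(O[o] @ b_a)                # P(o | b_a)
                if p_o > 0.0:
                    b_ao = (O[o] * b_a) / p_o          # update: b_{a,o}
                    value += p_o * rh_value(b_ao, h - 1, R, T, O, gamma, evaluate)
            best = max(best, value)
        return float(R @ b) + gamma * best             # E[R(s)|b] + γ max_a Σ_o P(o|b_a) U(b_{a,o})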

QMDP EVALUATION FUNCTION
One possible evaluation function is the expectation of the underlying MDP value function over the leaf belief states:
f(b) = Σ_s U_MDP(s) b(s)
- "Averaging over clairvoyance": assumes the problem becomes instantly fully observable
- Is optimistic: U(b) ≤ f(b)
- Approaches the POMDP value function as state and sensing uncertainty decrease
- In the extreme h = 1 case, this is called the QMDP policy
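
The corresponding leaf evaluation, assuming `U_mdp` is a vector (as in the sketches above) holding the value of each state in the underlying fully observable MDP.

    # Hypothetical QMDP-style leaf evaluation: f(b) = Σ_s U_MDP(s) b(s).
    def qmdp_value(b, U_mdp):
        """b and U_mdp are numpy vectors over states."""
        return float(U_mdp @ b)

With the receding-horizon sketch above, this could be plugged in as evaluate=lambda b: qmdp_value(b, U_mdp).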

QMDP POLICY (Littman, Cassandra, Kaelbling 1995)

WORST-CASE COMPLEXITY
- Infinite-horizon undiscounted POMDPs are undecidable (reduction to the halting problem)
- Exact solutions to infinite-horizon discounted POMDPs are intractable even for small |S|
- Finite horizon: O(|S|^2 |A|^h |O|^h)
- Receding horizon approximation: one-step regret is O(γ^h)
- Approximate solutions are becoming tractable for |S| in the millions: α-vector point-based techniques, Monte Carlo tree search, ... (beyond the scope of this course)

NEXT TIME
Is it possible to learn how to make good decisions just by interacting with the environment? Reinforcement learning (R&N).

DUE TODAY
HW6; midterm project report. HW7 available.