
1 UNCERTAINTY IN SENSING (AND ACTION)

2 AGENDA
Planning with belief states
Nondeterministic sensing uncertainty
Probabilistic sensing uncertainty

3 ACTION UNCERTAINTY
Each action representation is of the form a(s) -> {s1, ..., sr}, where each si, i = 1, ..., r, describes one possible effect of the action in state s.
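As a concrete illustration, here is a minimal Python sketch of such a nondeterministic action for the vacuum world used on the following slides; the state encoding (sets of propositions) and the function name are assumptions, not part of the slides.

```python
# Sketch of a nondeterministic action a(s) -> {s1, ..., sr} for the vacuum
# world. States are frozensets of propositions; this encoding is assumed.

def right_action(s):
    """Right either moves the robot from R1 to R2, or does nothing."""
    assert "In(R1)" in s, "precondition: the robot must be in R1"
    moved = (s - {"In(R1)"}) | {"In(R2)"}
    return {moved, s}        # the set of possible effects {s1, s2}

s0 = frozenset({"In(R1)", "Clean(R2)"})
print(right_action(s0))      # two possible successor states
```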

4 AND/OR TREE
(Figure: an AND/OR tree for the vacuum world, expanding the actions Right and Suck(R1). Action nodes are world "decision" nodes; state nodes are agent decision nodes; one branch loops back to an earlier state.)

5 AND/OR TREE (continued)
(Figure: the same tree expanded further with Right, Left, Suck(R1), and Suck(R2); some branches loop, others reach goal states.)

6 OR SUB-TREE
An OR sub-tree is to an AND/OR tree what a path is to a classical search tree:
For each state node, only one child is included.
For each action node, all children are included.
It forms part of a potential solution if none of its nodes is closed.
A solution is an OR sub-tree in which all leaves are goal states.
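A minimal recursive sketch of AND/OR search over such a tree, in the spirit of R&N's AND-OR-GRAPH-SEARCH; the function and parameter names (successors, is_goal) are assumptions.

```python
# Sketch of AND/OR search. successors(state) yields (action, set_of_outcomes);
# a branch that revisits a state on the current path is treated as closed.

def or_search(state, successors, is_goal, path):
    """Return a conditional plan from an OR (state) node, or None on failure."""
    if is_goal(state):
        return []                          # empty plan: already at a goal
    if state in path:
        return None                        # loop: this branch is closed
    for action, outcomes in successors(state):
        plan = and_search(outcomes, successors, is_goal, [state] + path)
        if plan is not None:
            return [action, plan]          # keep only ONE child of a state node
    return None

def and_search(states, successors, is_goal, path):
    """Every outcome of an action node must be solved (ALL children kept)."""
    plans = {}
    for s in states:
        plan = or_search(s, successors, is_goal, path)
        if plan is None:
            return None
        plans[s] = plan
    return plans
```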

7 BELIEF STATE
A belief state is the set of all states that the agent thinks are possible at a given time, or at a given stage of planning a course of actions.
To plan a course of actions, the agent searches a space of belief states instead of a space of states.

8 SENSOR MODEL (DEFINITION #1)
State space S. The sensor model is a function SENSE: S → 2^S that maps each state s ∈ S to a belief state (the set of all states that the agent would think possible if it were actually observing state s).
Example: assume our vacuum robot can perfectly sense the room it is in and whether there is dust in it, but it cannot sense whether there is dust in the other room. (The slide's figure shows SENSE applied to a particular state.)

9 SENSOR MODEL (DEFINITION #2)
State space S, percept space P. The sensor model is a function SENSE: S → P that maps each state s ∈ S to a percept (the percept that the agent would obtain if actually observing state s).
We can then define the set of states consistent with an observation p: CONSISTENT(p) = { s | SENSE(s) = p }.
(The slide's figure asks for SENSE and CONSISTENT applied to particular states and percepts.)
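A minimal sketch of definition #2 for the vacuum world; the state encoding and the percept format (room, dust-in-that-room) are illustrative assumptions.

```python
# SENSE (definition #2) and CONSISTENT for a tiny vacuum-world state space.

STATES = [frozenset(s) for s in (
    {"In(R1)", "Clean(R1)", "Clean(R2)"},
    {"In(R1)", "Clean(R1)"},
    {"In(R1)", "Clean(R2)"},
    {"In(R1)"},
)]

def sense(s):
    """SENSE: S -> P. The robot perceives only its room and that room's dust."""
    room = "R1" if "In(R1)" in s else "R2"
    return (room, f"Clean({room})" in s)

def consistent(p, states=STATES):
    """CONSISTENT(p) = { s | SENSE(s) = p }."""
    return [s for s in states if sense(s) == p]

print(consistent(("R1", True)))   # all states in which the robot sees a clean R1
```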

10 VACUUM ROBOT ACTION AND SENSOR MODEL
Right: applicable if In(R1) holds in s; effects {s1 = s - In(R1) + In(R2), s2 = s} [Right does either the right thing, or nothing]
Left: applicable if In(R2) holds in s; effects {s1 = s - In(R2) + In(R1), s2 = s - In(R2) + In(R1) - Clean(R2)} [Left always moves the robot to R1, but it may occasionally deposit dust in R2]
Suck(r): applicable if In(r) holds in s; effects {s1 = s + Clean(r)} [Suck always does the right thing]
The robot perfectly senses the room it is in and whether there is dust in it, but it cannot sense whether there is dust in the other room.
State s: any logical conjunction of In(R1), In(R2), Clean(R1), Clean(R2) (notation: + adds an attribute, - removes an attribute).

11 TRANSITION BETWEEN BELIEF STATES
Suppose the robot is initially in a given state (shown as a figure on the slide). After sensing this state, its belief state is the set of states consistent with that percept. Just after executing Left, its belief state will be the set of possible outcomes. After sensing the new state, its belief state will be one of two sets: one if there is no dust in R1, the other if there is dust in R1.

12 TRANSITION BETWEEN BELIEF STATES
(Figure: the same example drawn as a tree. The Left action maps the initial belief state to a new belief state, and the two percepts Clean(R1) and ¬Clean(R1) split it into the two possible successor belief states.)

13 TRANSITION BETWEEN BELIEF STATES
How do you propagate the action/sensing operation to obtain the successors of a belief state? (Figure: the Left action followed by the percepts Clean(R1) and ¬Clean(R1).)

14 COMPUTING THE TRANSITION BETWEEN BELIEF STATES
Given an action A and a belief state S = {s1, ..., sn}:
Result of applying the action, without sensing: take the union of all SUCC(si, A) for i = 1, ..., n. This gives us a pre-sensing belief state S'.
Possible percepts resulting from sensing: {SENSE(si') for si' in S'} (using SENSE definition #2). This gives us a percept set P.
Possible states both in S' and consistent with each possible percept pj in P: Sj = {si' | SENSE(si') = pj, si' in S'}, i.e., Sj = CONSISTENT(pj) ∩ S'. (A code sketch follows below.)
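A minimal sketch of this transition, assuming helpers succ(s, a) (returning the set of possible successor states) and sense(s) as in the earlier sketches.

```python
# Belief-state transition: apply the action to every state, then partition
# the resulting states by percept.

def belief_successors(S, a, succ, sense):
    """Return {percept: successor belief state} for action a applied to belief S."""
    # 1. Apply the action to every state: pre-sensing belief state S'.
    S_prime = set()
    for s in S:
        S_prime |= succ(s, a)            # union of all SUCC(s_i, a)
    # 2. Collect the possible percepts over S'.
    percepts = {sense(sp) for sp in S_prime}
    # 3. For each percept p_j, keep the states of S' consistent with it.
    return {p: {sp for sp in S_prime if sense(sp) == p} for p in percepts}
```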

15 AND/OR TREE OF BELIEF STATES
A goal belief state is one in which all states are goal states.
An action is applicable to a belief state B if its precondition is achieved in all states in B.
(Figure: an AND/OR tree over belief states with Left, Right, and Suck branches; some branches loop, others reach goal belief states.)

16 AND/OR TREE OF BELIEF STATES
(Figure: the same tree expanded further; Left, Right, and Suck branches, with loops and goal belief states.)

17 AND/OR TREE OF BELIEF STATES
(Figure: a solution sub-tree using Left, Right, and Suck, in which all leaves are goal belief states.)

18 BELIEF STATE REPRESENTATION
Solution #1: represent the set of states explicitly.
Under the closed-world assumption, if states are described with n propositions, there are O(2^n) states.
The number of belief states (subsets of states) is then on the order of 2^(2^n).
A belief state may contain O(2^n) states. This can be hugely expensive.

19 BELIEF STATE REPRESENTATION
Solution #2: represent only what is known.
For example, if the vacuum robot knows that it is in R1 (so, not in R2) and that R2 is clean, then the representation is K(In(R1)) ∧ K(¬In(R2)) ∧ K(Clean(R2)), where K stands for "Knows that...".
How many belief states can be represented this way? Only 3^n (each proposition is known true, known false, or unknown), instead of 2^(2^n).

20 SUCCESSOR OF A BELIEF STATE THROUGH AN ACTION
Left: applicable if In(R2) holds in s; effects {s1 = s - In(R2) + In(R1), s2 = s - In(R2) + In(R1) - Clean(R2)}
Starting belief state: K(In(R2)) ∧ K(¬In(R1)) ∧ K(Clean(R2))
s1 → K(¬In(R2)) ∧ K(In(R1)) ∧ K(Clean(R2))
s2 → K(¬In(R2)) ∧ K(In(R1)) ∧ K(¬Clean(R2))
Resulting belief state (what is known in both cases): K(¬In(R2)) ∧ K(In(R1))
An action does not depend on the agent's belief state, so K does not appear in the action description (different from R&N, p. 440).

21 SENSORY ACTIONS
So far, we have assumed a single sensory operation automatically performed after executing each action of a plan.
But an agent may have several sensors, each having some cost (e.g., time) to use.
In certain situations, the agent may prefer to avoid the cost of using a sensor, even if using the sensor could reduce uncertainty.
This leads to introducing specific sensory actions, each with its own representation → active sensing.
As with other actions, the agent chooses which sensory actions it wants to execute and when.

22 EXAMPLE
Check-Dust(r): applicable if In(r) holds in s
{when Clean(r): b' = b - K(¬Clean(r)) + K(Clean(r))}
{when ¬Clean(r): b' = b - K(Clean(r)) + K(¬Clean(r))}
Check-Dust(R1) applied to the belief state K(In(R1)) ∧ K(¬In(R2)) ∧ K(¬Clean(R2)) yields either
K(In(R1)) ∧ K(¬In(R2)) ∧ K(¬Clean(R2)) ∧ K(Clean(R1)) or
K(In(R1)) ∧ K(¬In(R2)) ∧ K(¬Clean(R2)) ∧ K(¬Clean(R1)).
A sensory action maps a state into a belief state: its precondition is about the state, while its effects are on the belief state.
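A minimal sketch of Check-Dust(r) on a "knows-that" belief representation; encoding beliefs as sets of K-literal strings is an illustrative assumption.

```python
# Sensory action Check-Dust(r): precondition on the state, effects on the belief.

def check_dust(belief, state, r):
    """Return the updated belief after checking for dust in room r."""
    assert f"In({r})" in state, "precondition: the robot must be in room r"
    if f"Clean({r})" in state:                     # when Clean(r)
        return (belief - {f"K(~Clean({r}))"}) | {f"K(Clean({r}))"}
    else:                                          # when ~Clean(r)
        return (belief - {f"K(Clean({r}))"}) | {f"K(~Clean({r}))"}

b = {"K(In(R1))", "K(~In(R2))", "K(~Clean(R2))"}
s = {"In(R1)", "Clean(R1)"}                        # the true, hidden state
print(check_dust(b, s, "R1"))                      # now also contains K(Clean(R1))
```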

23 INTRUDER FINDING PROBLEM
A moving intruder is hiding in a 2-D workspace.
The robot must "sweep" the workspace to find the intruder.
Both the robot and the intruder are points.
(Figure: a sequence of six snapshots showing the robot, its visibility region, the hiding region, and the cleared region.)

24 DOES A SOLUTION ALWAYS EXIST? No!
Easy to test: a "hole" in the workspace.
Hard to test: no "hole" in the workspace.

25 INFORMATION STATE
Example of an information state: (x, y, a=1, b=1, c=0), where (x, y) is the robot's position and each of a, b, c is 0 or 1 (0 = cleared region, 1 = hiding region).
An initial state is of the form (x, y, 1, 1, ..., 1).
A goal state is any state of the form (x, y, 0, 0, ..., 0).

26 CRITICAL LINE
(Figure: as long as the robot stays on one side of a critical line, the information state is unchanged (e.g., a=0, b=1); crossing the critical line changes it (e.g., to a=0, b=0).)

27 CRITICALITY-BASED DISCRETIZATION
(Figure: the workspace divided into regions A, B, C, D, and E.)
Each of the regions A, B, C, D, and E consists of "equivalent" positions of the robot, so it is sufficient to consider a single position per region.

28 CRITICALITY-BASED DISCRETIZATION
(Figure: the search tree rooted at information state (C, 1, 1), with successors (D, 1) and (B, 1).)

29 CRITICALITY-BASED DISCRETIZATION
(Figure: the tree expanded further with nodes (E, 1) and (C, 1, 0).)

30 CRITICALITY-BASED DISCRETIZATION
(Figure: the tree expanded further with nodes (B, 0) and (D, 1).)

31 CRITICALITY-BASED DISCRETIZATION
(Figure: the completed search tree over information states.)
Much smaller search tree than with grid-based discretization!

32 SENSORLESS PLANNING

33 PLANNING WITH PROBABILISTIC UNCERTAINTY IN SENSING
(Figure: a motion example contrasting "no motion" with "perpendicular motion".)

34 PARTIALLY OBSERVABLE MDPS
Consider the MDP model with states s ∈ S, actions a ∈ A:
Reward R(s)
Transition model P(s'|s,a)
Discount factor γ
With sensing uncertainty, the initial belief state is a probability distribution over states: b(s), with b(si) ≥ 0 for all si ∈ S and Σi b(si) = 1.
Observations are generated according to a sensor model:
Observation space o ∈ O
Sensor model P(o|s)
The resulting problem is a Partially Observable Markov Decision Process (POMDP). (A minimal data-structure sketch follows below.)
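A minimal container for these components, used by the later sketches; the field names and the dense-array encoding are assumptions.

```python
# POMDP components as dense numpy arrays.
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    R: np.ndarray        # R[s]: reward, shape (|S|,)
    T: np.ndarray        # T[a, s, s']: transition model P(s'|s,a), shape (|A|,|S|,|S|)
    Z: np.ndarray        # Z[o, s]: sensor model P(o|s), shape (|O|,|S|)
    gamma: float         # discount factor

# A belief is simply a vector b of length |S| with b >= 0 and b.sum() == 1.
b0 = np.array([0.5, 0.5])   # e.g., uniform over two states
```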

35 POMDP UTILITY FUNCTION
A policy π(b) is defined as a map from belief states to actions.
Expected discounted reward with policy π: U^π(b) = E[Σt γ^t R(S_t)], where S_t is the random variable indicating the state at time t.
P(S0 = s) = b0(s)
P(S1 = s) = ?

36 POMDP UTILITY FUNCTION
A policy π(b) is defined as a map from belief states to actions.
Expected discounted reward with policy π: U^π(b) = E[Σt γ^t R(S_t)], where S_t is the random variable indicating the state at time t.
P(S0 = s) = b0(s)
P(S1 = s) = P(s | π(b0), b0) = Σs' P(s | s', π(b0)) P(S0 = s') = Σs' P(s | s', π(b0)) b0(s')

37 POMDP UTILITY FUNCTION
A policy π(b) is defined as a map from belief states to actions.
Expected discounted reward with policy π: U^π(b) = E[Σt γ^t R(S_t)], where S_t is the random variable indicating the state at time t.
P(S0 = s) = b0(s)
P(S1 = s) = Σs' P(s | s', π(b0)) b0(s')
P(S2 = s) = ?

38 POMDP UTILITY FUNCTION
A policy π(b) is defined as a map from belief states to actions.
Expected discounted reward with policy π: U^π(b) = E[Σt γ^t R(S_t)], where S_t is the random variable indicating the state at time t.
P(S0 = s) = b0(s)
P(S1 = s) = Σs' P(s | s', π(b0)) b0(s')
What belief states could the robot be in after 1 step?

39 (Figure: one step of belief propagation from b0 to b1.)
Choose action π(b0).
Predict: b1(s) = Σs' P(s | s', π(b0)) b0(s').

40 (Figure: the same step, now with observation branches oA, oB, oC, oD.)
Choose action π(b0); predict b1(s) = Σs' P(s | s', π(b0)) b0(s'); receive an observation.

41 (Figure: each observation branch occurs with probability P(oA|b1), P(oB|b1), P(oC|b1), or P(oD|b1), leading to successor beliefs b2, ..., b5.)
Choose action π(b0); predict b1(s) = Σs' P(s | s', π(b0)) b0(s'); receive an observation.

42 (Figure: completing the step with a belief update.)
Choose action π(b0); predict b1(s) = Σs' P(s | s', π(b0)) b0(s'); receive an observation; update the belief:
b2(s) = P(s | b1, oA), b3(s) = P(s | b1, oB), b4(s) = P(s | b1, oC), b5(s) = P(s | b1, oD).

43 (Figure: the same step, annotated with the update formulas.)
P(o|b) = Σs P(o|s) b(s)
P(s|b,o) = P(o|s) P(s|b) / P(o|b) = (1/Z) P(o|s) b(s)
(See the predict/update sketch below.)
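A minimal sketch of this predict/update step using the dense arrays from the POMDP sketch above (T[a, s, s'] = P(s'|s,a), Z[o, s] = P(o|s)); the names are assumptions.

```python
# Predict and update a belief vector b of length |S|.
import numpy as np

def predict(b, a, T):
    """b1(s') = sum_s P(s'|s,a) b(s)."""
    return b @ T[a]

def update(b_pred, o, Z):
    """P(s|b,o) = P(o|s) b(s) / P(o|b); returns (posterior, P(o|b))."""
    unnorm = Z[o] * b_pred          # P(o|s) b(s) for every s
    p_o = unnorm.sum()              # P(o|b) = sum_s P(o|s) b(s)
    return unnorm / p_o, p_o
```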

44 BELIEF-SPACE SEARCH TREE
Each belief node has |A| action node successors.
Each action node has |O| belief successors.
Each (action, observation) pair (a, o) requires a predict/update step similar to HMMs.
Matrix/vector formulation:
b(s): a vector b of length |S|
P(s'|s,a): a set of |S|x|S| matrices T_a
P(o_k|s): a vector o_k of length |S|
b' = T_a b (predict)
P(o_k|b') = o_k^T b' (probability of observation)
b_k = diag(o_k) b' / (o_k^T b') (update)
Denote this operation as b^{a,o}. (A matrix/vector sketch follows below.)
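A minimal sketch of the matrix/vector formulation; here T_a is taken with T_a[s', s] = P(s'|s,a) so that b' = T_a b, and the numbers in the example are made up.

```python
# One (action, observation) step b -> b^{a,o} in matrix/vector form.
import numpy as np

def belief_step(b, T_a, o_k):
    """Return (b^{a,o}, P(o_k|b')) for one (action, observation) pair."""
    b_pred = T_a @ b                      # b' = T_a b                 (predict)
    p_obs = o_k @ b_pred                  # P(o_k|b') = o_k^T b'
    b_post = (o_k * b_pred) / p_obs       # diag(o_k) b' / (o_k^T b')  (update)
    return b_post, p_obs

T_a = np.array([[0.9, 0.2],
                [0.1, 0.8]])              # columns sum to 1: P(s'|s,a)
o_k = np.array([0.7, 0.1])                # P(o_k|s) for each s
print(belief_step(np.array([0.5, 0.5]), T_a, o_k))
```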

45 RECEDING HORIZON SEARCH
Expand the belief-space search tree to some depth h.
Use an evaluation function on leaf beliefs to estimate utilities.
For internal nodes, back up estimated utilities:
U(b) = E[R(s)|b] + γ max_{a∈A} Σ_{o∈O} P(o|b_a) U(b^{a,o})
(A recursive sketch follows below.)
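A minimal recursive sketch of the depth-h backup, reusing the POMDP container above; leaf_eval is the leaf evaluation function, and all names are assumptions.

```python
# Depth-h receding-horizon value backup over the belief-space tree.
import numpy as np

def lookahead_value(pomdp, b, h, leaf_eval):
    """U(b) = E[R(s)|b] + gamma * max_a sum_o P(o|b_a) U(b^{a,o})."""
    if h == 0:
        return leaf_eval(b)                       # evaluation function on leaf beliefs
    expected_reward = pomdp.R @ b                 # E[R(s)|b]
    best = -np.inf
    for a in range(pomdp.T.shape[0]):             # each belief node: |A| actions
        b_a = b @ pomdp.T[a]                      # predicted belief b_a
        total = 0.0
        for o in range(pomdp.Z.shape[0]):         # each action node: |O| observations
            p_o = pomdp.Z[o] @ b_a                # P(o|b_a)
            if p_o > 0:
                b_ao = (pomdp.Z[o] * b_a) / p_o   # updated belief b^{a,o}
                total += p_o * lookahead_value(pomdp, b_ao, h - 1, leaf_eval)
        best = max(best, total)
    return expected_reward + pomdp.gamma * best
```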

46 QMDP EVALUATION FUNCTION
One possible evaluation function is to compute the expectation of the underlying MDP value function over the leaf belief states: f(b) = Σ_s U_MDP(s) b(s).
"Averaging over clairvoyance": assumes the problem becomes instantly fully observable.
It is optimistic: U(b) ≤ f(b).
It approaches the POMDP value function as state and sensing uncertainty decrease.
In the extreme h=1 case, this is called the QMDP policy. (A sketch follows below.)
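A minimal sketch of this evaluation function, assuming U_mdp (the value function of the underlying fully observable MDP) has already been computed elsewhere.

```python
# QMDP-style leaf evaluation: f(b) = sum_s U_MDP(s) b(s).
import numpy as np

def qmdp_eval(U_mdp):
    """Return f(b) = U_MDP . b as a leaf-evaluation function."""
    return lambda b: U_mdp @ b

# Used as the leaf evaluator in the receding-horizon sketch above; with h=1
# this corresponds to the QMDP policy described on the slide:
# value = lookahead_value(pomdp, b0, h=1, leaf_eval=qmdp_eval(U_mdp))
```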

47 QMDP POLICY (Littman, Cassandra, Kaelbling 1995)

48 WORST-CASE COMPLEXITY
Infinite-horizon undiscounted POMDPs are undecidable (reduction to the halting problem).
Exact solutions to infinite-horizon discounted POMDPs are intractable even for low |S|.
Finite horizon: O(|S|^2 |A|^h |O|^h).
Receding horizon approximation: one-step regret is O(γ^h).
Approximate solutions are becoming tractable for |S| in the millions: α-vector point-based techniques, Monte Carlo tree search, ... (beyond the scope of this course).

49 NEXT TIME
Is it possible to learn how to make good decisions just by interacting with the environment?
Reinforcement learning, R&N 21.1-2

50 DUE TODAY
HW6, midterm project report
HW7 available

