Tractable Planning for Real-World Robotics: The Promises and Challenges of Dealing with Uncertainty. Joelle Pineau, Robotics Institute, Carnegie Mellon University.


Slide 1: Tractable Planning for Real-World Robotics: The promises and challenges of dealing with uncertainty. Joelle Pineau, Robotics Institute, Carnegie Mellon University. Stanford University, May 3, 2004.

Slide 2: Robots in unstructured environments

Slide 3: A vision for robotic-assisted health-care
– Moving things around
– Supporting inter-personal communication
– Calling for help in emergencies
– Monitoring Rx adherence & safety
– Providing information
– Reminding to eat, drink, & take meds
– Providing physical assistance

Slide 4: What is hard about planning for real-world robotics?*
– Generating a plan that has high expected utility.
– Switching between different representations of the world.
– Resolving conflicts between interfering jobs.
– Accomplishing jobs in changing, partly unknown environments.
– Handling percepts that are incomplete, ambiguous, outdated, or incorrect.
* Highlights from a proposed robot grand challenge

Slide 5: Talk outline
– Uncertainty in plan-based robotics
– Partially Observable Markov Decision Processes (POMDPs)
– POMDP solver #1: Point-based value iteration (PBVI)
– POMDP solver #2: Policy-contingent abstraction (PolCA+)

Slide 6: Why use a POMDP? POMDPs provide a rich framework for sequential decision-making, which can model:
– Effect uncertainty
– State uncertainty
– Varying rewards across actions and goals

Slide 7: POMDP applications in the last decade
Robotics:
– Robust mobile robot navigation [Simmons & Koenig, 1995; + many more]
– Autonomous helicopter control [Bagnell & Schneider, 2001; Ng et al., 2003]
– Machine vision [Bandera et al., 1996; Darrell & Pentland, 1996]
– High-level robot control [Pineau et al., 2003]
– Robust dialogue management [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000]
Others:
– Machine maintenance [Puterman, 1994]
– Network troubleshooting [Thiebeaux et al., 1996]
– Circuit testing [correspondence, 2004]
– Preference elicitation [Boutilier, 2002]
– Medical diagnosis [Hauskrecht, 1997]

Slide 8: POMDP model. A POMDP is an n-tuple {S, A, Z, T, O, R}:
– S = state set; A = action set; Z = observation set
What goes on: states s_{t-1}, s_t and actions a_{t-1}, a_t. What we see: observations z_{t-1}, z_t.

Slide 9: POMDP model (adds transitions). T: Pr(s'|s,a) = state-to-state transition probabilities.

Slide 10: POMDP model (adds observations). O: Pr(z|s,a) = observation generation probabilities.

Slide 11: POMDP model (adds rewards). R(s,a) = reward function; the agent receives rewards r_{t-1}, r_t.

Slide 12: POMDP model (adds beliefs). What we infer: beliefs b_{t-1}, b_t, summarizing the history of actions and observations.
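The belief on slide 12 is maintained by the standard Bayes-filter update, b'(s') ∝ O(z|s',a) Σ_s T(s'|s,a) b(s). A minimal sketch, with a hypothetical two-state model (the "look" action, its transition and observation tables, and the state names are illustrative, not from the talk):

```python
def belief_update(b, a, z, T, O):
    """Bayes-filter belief update: b'(s') ∝ O[a][s'][z] * sum_s T[a][s][s'] * b[s]."""
    n = len(b)
    bp = [O[a][s2][z] * sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    norm = sum(bp)  # equals Pr(z | b, a); zero would mean z is impossible under (b, a)
    return [x / norm for x in bp]

# Hypothetical 2-state example (state 0: patient-here, state 1: patient-gone).
T = {"look": [[0.9, 0.1], [0.2, 0.8]]}   # T[a][s][s']
O = {"look": [[0.8, 0.2], [0.3, 0.7]]}   # O[a][s'][z]
b1 = belief_update([0.5, 0.5], "look", 0, T, O)  # belief after seeing observation 0
```

The returned list is again a point on the belief simplex (it sums to 1), which is what makes the belief a sufficient statistic for planning.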

Slide 13: Examples of robot beliefs. Figure: a robot localized with particles, showing a uniform belief and a bi-modal belief.

Slide 14: Understanding the belief state. A belief is a probability distribution over states, where Dim(B) = |S|-1. E.g., let S = {s_1, s_2}: beliefs lie on the segment P(s_1) ∈ [0, 1].

Slide 15: Understanding the belief state (continued). E.g., let S = {s_1, s_2, s_3}: beliefs lie in the triangle (2-simplex) with axes P(s_1), P(s_2) ∈ [0, 1].

Slide 16: Understanding the belief state (continued). E.g., let S = {s_1, s_2, s_3, s_4}: beliefs lie in the tetrahedron (3-simplex) with axes P(s_1), P(s_2), P(s_3).

Slide 17: POMDP solving. Objective: find the sequence of actions that maximizes the expected sum of rewards. The value function combines immediate reward and future reward: V(b) = max_a [ R(b,a) + γ Σ_z Pr(z|b,a) V(b^{a,z}) ].

Slide 18: POMDP value function. Represent V(b) as the upper surface of a set of vectors (illustrated for 2 states over P(s_1)):
– Each vector is a piece of the control policy (= an action sequence).
– Dim(vector) = number of states.
Modify / add vectors to update the value function (i.e., refine the policy).
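The "upper surface" on slide 18 is just a max of dot products: each α-vector scores a belief by α · b, and the policy executes the action attached to the winning vector. A small sketch (the two vectors and their action labels are hypothetical, loosely echoing the break-in example that follows):

```python
def value(b, vectors):
    """V(b) = max over α-vectors of α · b (the upper surface of the vector set)."""
    return max(sum(a_i * b_i for a_i, b_i in zip(alpha, b)) for _, alpha in vectors)

def greedy_action(b, vectors):
    """The action attached to the maximizing vector is the policy's choice at b."""
    return max(vectors, key=lambda av: sum(x * y for x, y in zip(av[1], b)))[0]

# Hypothetical 2-state example over P(break-in); each vector is tagged with an action.
vecs = [("go-to-bed", [1.0, -5.0]),   # good when break-in is unlikely (state 0)
        ("call-911",  [-2.0, 4.0])]   # good when break-in is likely (state 1)
```

Because each vector is linear in b, V(b) is piecewise-linear and convex, which is exactly why it can be stored as a finite vector set.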

Slide 19: Optimal POMDP solving. Simple problem: 2 states, 3 actions, 3 observations; the belief axis is P(break-in) ∈ [0, 1]. Plan length 0: 1 vector (V_0(b)).

Slide 20: Optimal POMDP solving (continued). Plan length 1: 3 vectors (V_1(b)), one per action: Call-911, Investigate, Go-to-bed.

Slide 21: Optimal POMDP solving (continued). Plan length 2: 27 vectors (V_2(b)).

Slide 22: Optimal POMDP solving (continued). Plan length 3: 2187 vectors (V_3(b)).

Slide 23: Optimal POMDP solving (continued). Plan length 4: 14,348,907 vectors (V_4(b)).

Slide 24: The curse of history. Policy size grows exponentially with the planning horizon: in the worst case |Γ_n| = |A|^((|Z|^n - 1)/(|Z| - 1)), where Γ = policy size, n = planning horizon, A = action set, Z = observation set.
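The blow-up can be reproduced with the standard exact-backup recurrence |Γ_{t+1}| = |A| · |Γ_t|^|Z|. Note this is a hedged sketch of worst-case growth, not the talk's exact derivation: the counts on slides 19–23 (1, 3, 27, 2187, 14,348,907) come out of this recurrence with 3 actions and an effective observation branching of 2.

```python
def exact_vector_counts(n_actions, n_obs, horizon):
    """Worst-case α-vector counts under exact backups: |Γ_{t+1}| = |A| * |Γ_t|^|Z|."""
    counts, g = [1], 1
    for _ in range(horizon):
        g = n_actions * g ** n_obs   # every |Z|-tuple of old vectors, per action
        counts.append(g)
    return counts

counts = exact_vector_counts(3, 2, 4)  # 3 actions, branching 2, horizon 4
```

Doubly-exponential growth like this is what motivates the point-based approximation in the next section.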

Slide 25: How many vectors for this problem? 10^4 (navigation) × 10^3 (dialogue) states, 1000+ observations, 100+ actions.

Slide 26: Talk outline (revisited)
– Uncertainty in plan-based robotics
– Partially Observable Markov Decision Processes (POMDPs)
– POMDP solver #1: Point-based value iteration (PBVI)
– POMDP solver #2: Policy-contingent abstraction (PolCA+)

Slide 27: Exact solving assumes all beliefs are equally likely. Figure: robot particle beliefs that are uniform, bi-modal, and N-modal. INSIGHT: no sequence of actions and observations can produce this N-modal belief.

Slide 28: A new algorithm: point-based value iteration. Approach: select a small set of belief points {b_0, b_1, b_2, ...}; plan for those belief points only.

Slide 29: Point-based value iteration (continued). Select a small set of belief points, using well-separated beliefs reachable through action/observation pairs (a, z); plan for those belief points only.

Slide 30: Point-based value iteration (continued). For each selected belief point, learn the value and its gradient (an α-vector).

Slide 31: Point-based value iteration (continued). At execution time, pick the action that maximizes value at the current belief b.

Slide 32: The anytime PBVI algorithm. Alternate between: 1. growing the set of belief points; 2. planning for those belief points. Terminate when you run out of time or have a good policy.

Slide 33: Complexity of value update, where S = # states, A = # actions, Z = # observations, Γ_n = # vectors at iteration n, B = # belief points:

                    Exact update     PBVI
  Time: projection  S² A Z Γ_n       S² A Z Γ_n
  Time: sum         S A Γ_n^Z        S A Z Γ_n B
  Size (# vectors)  A Γ_n^Z          B
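The PBVI column of this table corresponds to backing up exactly one α-vector per belief point: the projection step is shared, but the combinatorial "sum" step is replaced by a per-point, per-(a, z) maximization. A sketch of that point-based backup, using the same nested-list model layout as the earlier belief-update example (the layout is an assumption, not the talk's implementation):

```python
def point_based_backup(B, Gamma, T, O, R, actions, observations, gamma=0.95):
    """One PBVI value update: one backed-up α-vector per belief point in B.

    Gamma: list of (action, alpha) pairs from the previous iteration.
    T[a][s][s2], O[a][s2][z], R[a][s]: nested-list model (hypothetical layout).
    """
    nS = len(next(iter(T.values())))
    new_Gamma = []
    for b in B:
        best = None
        for a in actions:
            alpha_a = [R[a][s] for s in range(nS)]       # immediate reward term
            for z in observations:
                # Project every old vector back through (a, z) ...
                proj = [[sum(T[a][s][s2] * O[a][s2][z] * alpha[s2] for s2 in range(nS))
                         for s in range(nS)] for _, alpha in Gamma]
                # ... and keep only the one that is best at this belief point.
                best_proj = max(proj, key=lambda v: sum(v[s] * b[s] for s in range(nS)))
                alpha_a = [alpha_a[s] + gamma * best_proj[s] for s in range(nS)]
            val = sum(alpha_a[s] * b[s] for s in range(nS))
            if best is None or val > best[0]:
                best = (val, a, alpha_a)
        new_Gamma.append((best[1], best[2]))             # |new_Gamma| = |B|, by design
    return new_Gamma
```

Because the inner max is taken per belief point rather than over all |Z|-tuples of old vectors, the output set never grows beyond |B|, matching the "Size: B" entry in the table.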

Slide 34: Theoretical properties of PBVI. Theorem: for any set of belief points B and planning horizon n, the error of the PBVI algorithm is bounded in terms of how densely B covers Δ, the set of reachable beliefs (in [Pineau, Gordon & Thrun, 2003] the bound is Err ≤ (R_max - R_min) ε_B / (1 - γ)², where ε_B is the maximum distance from any reachable belief to its closest point in B).

Slide 35: The anytime PBVI algorithm (revisited). Alternate between: 1. growing the set of belief points; 2. planning for those belief points. Terminate when you run out of time or have a good policy.

Slide 36: PBVI's belief selection heuristic. 1. Leverage insight from policy search methods: focus on reachable beliefs, expanding b to its successors b^{a1,z1}, b^{a1,z2}, b^{a2,z1}, b^{a2,z2}.

Slide 37: PBVI's belief selection heuristic (continued). 2. Focus on high-probability beliefs: consider all actions, but make a stochastic observation choice (e.g., keeping only b^{a1,z2} and b^{a2,z1}).

Slide 38: PBVI's belief selection heuristic (continued). 3. Use the error bound on point-based value updates: select well-separated beliefs rather than nearby beliefs.
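The three heuristics above can be sketched as one expansion step: for each current belief, simulate every action with a sampled observation (reachable, high-probability successors), then keep the candidate farthest from the current set (well-separated). The `simulate_successor` callback, which should sample z ~ Pr(z|b,a) and return the updated belief, is an assumed stand-in for the model machinery:

```python
def expand_beliefs(B, actions, simulate_successor):
    """One PBVI belief-expansion step: for each b in B, try every action with a
    sampled observation and keep the successor farthest (L1 norm) from B."""
    def l1_to_set(b):
        return min(sum(abs(x - y) for x, y in zip(b, b2)) for b2 in B)
    new_points = []
    for b in B:
        candidates = [simulate_successor(b, a) for a in actions]  # stochastic z choice
        far = max(candidates, key=l1_to_set)
        if l1_to_set(far) > 0:            # skip beliefs we already have
            new_points.append(far)
    return B + new_points

# Hypothetical demo: a deterministic successor sampler over 2 states.
B0 = [[1.0, 0.0]]
step = lambda b, a: [0.5, 0.5] if a == "explore" else b
B1 = expand_beliefs(B0, ["stay", "explore"], step)
```

At most one point is added per existing point, so the belief set at most doubles per expansion, which is what keeps the anytime loop controllable.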

Slide 39: Classes of value function approximations:
1. No belief [Littman et al., 1995]
2. Grid over belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001]
3. Compressed belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002]
4. Sampled belief points [Poon, 2001; Pineau et al., 2003]

Slide 40: Performance on well-known POMDPs (Maze1: 36 states; Maze2: 92 states; Maze3: 60 states). Methods: No belief [Littman et al., 1995], Grid [Brafman, 1997], Compressed [Poupart et al., 2003], Sample [Poon, 2001], PBVI [Pineau et al., 2003].

                   No belief  Grid   Compressed  Sample  PBVI
  REWARD  Maze1    0.20       0.94   0.00        2.30    2.25
          Maze2    0.11       -      0.07        0.35    0.34
          Maze3    0.26       -      0.11        (?)     0.53
  TIME    Maze1    0.19       -      24hrs       12166   3448
          Maze2    1.44       -      24hrs       27898   360
          Maze3    0.51       -      24hrs       450     288
  # Belief points
          Maze1    -          174    -           660     470
          Maze2    -          337    -           1840    95
          Maze3    -          (?)    -           300     86

(Cells marked (?) could not be recovered from the transcript.)

Slide 41: PBVI in the Nursebot domain. Objective: find the patient! State space = RobotPosition × PatientPosition. Observation space = RobotPosition + PatientFound. Action space = {North, South, East, West, Declare}.

Slide 42: PBVI performance on the find-the-patient domain. No Belief: patient found in 17% of trials. PBVI: patient found in 90% of trials.

Slide 43: Validation of PBVI's belief expansion heuristic on the find-the-patient domain (870 states, 5 actions, 30 observations), comparing Greedy PBVI, No Belief, and Random expansion.

Slide 44: Policy assuming full observability. You find some (25% of trials); you lose some (75%).

Slide 45: PBVI policy with 3141 belief points. You find some (81%); you lose some (19%).

Slide 46: PBVI policy with 643 belief points. You find some (22%); you lose some (78%).

Slide 47: Highlights of the PBVI algorithm.
Algorithmic:
– New belief sampling algorithm.
– Efficient heuristic for belief point selection.
– Anytime performance.
Experimental:
– Outperforms previous value approximation algorithms on known problems.
– Solves a new, larger problem (one order of magnitude increase in problem size).
Theoretical:
– Bounded approximation error.
[Pineau, Gordon & Thrun, IJCAI 2003; Pineau, Gordon & Thrun, NIPS 2003]

Slide 48: Back to the grand challenge. How can we go from 10^3 states to real-world problems?

Slide 49: Talk outline (revisited)
– Uncertainty in plan-based robotics
– Partially Observable Markov Decision Processes (POMDPs)
– POMDP solver #1: Point-based value iteration (PBVI)
– POMDP solver #2: Policy-contingent abstraction (PolCA+)

Slide 50: Structured POMDPs. Many real-world decision-making problems exhibit structure inherent to the problem domain (e.g., navigation, cognitive support, social interaction). Example: a high-level controller selects among abstract actions such as Move and AskWhere, where Move decomposes into Left, Right, Forward, Backward.

Slide 51: Structured POMDP approaches.
Factored models [Boutilier & Poole, 1996; Hansen & Feng, 2000; Guestrin et al., 2001]
– Idea: represent the state space with multi-valued state features.
– Insight: independencies between state features can be leveraged to overcome the curse of dimensionality.
Hierarchical POMDPs [Wiering & Schmidhuber, 1997; Theocharous et al., 2000; Hernandez-Gardiol & Mahadevan, 2000; Pineau & Thrun, 2000]
– Idea: exploit domain knowledge to divide one POMDP into many smaller ones.
– Insight: smaller action sets further help overcome the curse of history.

Slide 52: A hierarchy of POMDPs. The root subtask Act decomposes into the subtasks ExamineHealth (with abstract actions VerifyFluids and VerifyMeds) and Navigate (with ClarifyGoal and Move); Move decomposes into the primitive actions North, South, East, West.

Slide 53: PolCA+: planning with a hierarchy of POMDPs. Step 1: select the action set, e.g., A_Move = {N, S, E, W} out of the full list (North, South, East, West, ClarifyGoal, VerifyFluids, VerifyMeds).

Slide 54: PolCA+ (continued). Step 2: minimize the state set, e.g., reducing the full state features (X-position, Y-position, X-goal, Y-goal, HealthStatus) to S_Move = {s_1, s_2}.

Slide 55: PolCA+ (continued). Step 3: choose the parameters {b_h, T_h, O_h, R_h} for subtask h.

Slide 56: PolCA+ (continued). Step 4: plan task h, producing the plan π_h.
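The four steps on slides 53–56 repeat for every subtask in the slide-52 hierarchy, children first (a child's policy must exist before its parent treats it as one abstract action). A toy structural sketch, assuming a dict encoding of the hierarchy and stubbing Steps 2–4 behind a `solve_subtask` callback (both the encoding and the stub are illustrative, not the PolCA+ implementation):

```python
# Hypothetical encoding of the slide-52 hierarchy: each subtask maps to its
# children; names absent from the dict are primitive actions.
HIERARCHY = {
    "Act":           ["ExamineHealth", "Navigate"],
    "ExamineHealth": ["VerifyFluids", "VerifyMeds"],
    "Navigate":      ["ClarifyGoal", "Move"],
    "Move":          ["North", "South", "East", "West"],
}

def plan_bottom_up(hierarchy, solve_subtask):
    """Visit subtasks children-first; for each, select its action set (Step 1) and
    hand it to a solver stub standing in for Steps 2-4 (state minimization,
    parameter choice, planning)."""
    plans = {}
    def visit(task):
        actions = hierarchy[task]              # Step 1: this subtask's action set
        for a in actions:
            if a in hierarchy and a not in plans:
                visit(a)                       # solve child subtasks first
        plans[task] = solve_subtask(task, actions)   # Steps 2-4 (stubbed)
    visit("Act")
    return plans

plans = plan_bottom_up(HIERARCHY, lambda task, acts: f"pi_{task} over {len(acts)} actions")
```

The children-first traversal is what makes the abstraction "policy-contingent": each parent's model is built on top of its children's already-computed policies.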

Slide 57: PolCA+ in the Nursebot domain. Goal: a robot is deployed in a nursing home, where it provides reminders to elderly users and accompanies them to appointments.

Slide 58: Performance measure: execution steps, comparing Hierarchy + Belief against PolCA+.

Slide 59: Comparing user performance: POMDP policy (0.1) vs. no-belief policy (0.18).

Slide 60: Visit to the nursing home (video).

Slide 61: Highlights of the PolCA+ algorithm.
Algorithmic:
– New hierarchical approach for the POMDP framework.
– POMDP-specific state and observation abstraction methods.
Experimental:
– First instance of a POMDP-based high-level robot controller.
– Novel application of POMDPs to robust dialogue management.
Theoretical:
– For the special (fully observable) case, guarantees recursive optimality.
[Pineau, Gordon & Thrun, UAI 2003; Pineau et al., RAS 2003; Roy, Pineau & Thrun, ACL 2001]

Slide 62: Future work.
PBVI:
– How can we handle domains with multi-valued state features?
– Can we leverage dimensionality reduction?
– Can we find better ways to pick belief points?
PolCA+:
– Can we automatically learn hierarchies?
– How can we learn (or do without) pseudo-reward functions?

Slide 63: Questions? Project information: www.cs.cmu.edu/~nursebot. Navigation software: www.cs.cmu.edu/~carmen. Papers and more: www.cs.cmu.edu/~jpineau. Collaborators: Geoffrey Gordon, Judith Matthews, Michael Montemerlo, Martha Pollack, Nicholas Roy, Sebastian Thrun.

Slide 64: Two types of uncertainty. Effect → stochastic action effects. State → partial and noisy sensor information.

Slide 65: Example: effect uncertainty. Figure: a start position and, after a motion action, the distribution over possible next-step positions.

Slide 66: Two types of uncertainty (revisited). Effect → stochastic action effects. State → partial and noisy sensor information.

Slide 67: Example: state uncertainty. Four types of uncertainty: Effect → stochastic action effects; State → partial and noisy sensor information; Model → inaccurate parameterization of the environment; Agent → unknown behaviour of other agents. Figure: a start position and the distribution over possible next-step positions.

Slide 68: Validation of PBVI's belief expansion heuristic on the Hallway domain (60 states, 5 actions, 20 observations).

Slide 69: State space reduction: no hierarchy vs. PolCA+.

Slide 70: Future directions. Improving POMDP planning: sparser belief space sampling, ordered value updating, dimensionality reduction, continuous / hybrid domains. Addressing two more types of uncertainty beyond effect (1) and state (2): model (3) and agent (4). Exploring new applications.

