
1 Hierarchical Methods for Planning under Uncertainty
Thesis Proposal
Joelle Pineau
Thesis Committee: Sebastian Thrun (Chair), Matthew Mason, Andrew Moore, Craig Boutilier (U. of Toronto)

2 Integrating robots in living environments
The robot's role:
- Social interaction
- Mobile manipulation
- Intelligent reminding
- Remote operation
- Data collection / monitoring

3 A broad perspective
GOAL = selecting appropriate actions
[Diagram: the robot, user, and world exchange actions and observations; the controller maintains a belief state over the true (hidden) state.]

4 Why is this a difficult problem? Uncertainty.
Cause #1: Non-deterministic effects of actions
Cause #2: Partial and noisy sensor information
Cause #3: Inaccurate model of the world and the user

5 Why is this a difficult problem? Uncertainty.
Cause #1: Non-deterministic effects of actions
Cause #2: Partial and noisy sensor information
Cause #3: Inaccurate model of the world and the user
A solution: Partially Observable Markov Decision Processes (POMDPs)
[Diagram: a three-state POMDP with states S1, S2, S3, actions a1, a2, and observations o1, o2.]

6 The truth about POMDPs
Bad news:
- Finding an optimal POMDP action-selection policy is computationally intractable for complex problems.

7 The truth about POMDPs
Bad news:
- Finding an optimal POMDP action-selection policy is computationally intractable for complex problems.
Good news:
- Many real-world decision-making problems exhibit structure inherent to the problem domain.
- By leveraging structure in the problem domain, I propose an algorithm that makes POMDPs tractable, even for large domains.

8 How is it done?
Use a "divide-and-conquer" approach:
- Decompose a large monolithic problem into a collection of loosely related smaller problems.
[Diagram: the overall task split among a dialogue manager, health manager, social manager, and reminding manager.]

9 Thesis statement
Decision-making under uncertainty can be made tractable for complex problems by exploiting hierarchical structure in the problem domain.

10 Outline
- Problem motivation
-> Partially observable Markov decision processes
- The hierarchical POMDP algorithm
- Proposed research

11 POMDPs within the family of Markov models

                                Control problem?
                                no                            yes
Uncertainty in     no           Markov Chain                  Markov Decision Process (MDP)
sensor input?      yes          Hidden Markov Model (HMM)     Partially Observable MDP (POMDP)

12 What are POMDPs?
Components:
- Set of states: s ∈ S
- Set of actions: a ∈ A
- Set of observations: o ∈ O
POMDP parameters:
- Initial belief: b0(s) = Pr(s0 = s)
- Transition probabilities: T(s,a,s') = Pr(s'|s,a)
- Observation probabilities: O(s,a,o) = Pr(o|s,a)
- Rewards: R(s,a)
[Figure: a three-state example (S1: Pr(o1)=0.5, Pr(o2)=0.5; S2: Pr(o1)=0.9, Pr(o2)=0.1; S3: Pr(o1)=0.2, Pr(o2)=0.8) with actions a1, a2; the HMM and MDP labels indicate the sub-models a POMDP combines.]

13 A POMDP example: The tiger problem
S1 "tiger-left": Pr(o=growl-left)=0.85, Pr(o=growl-right)=0.15
S2 "tiger-right": Pr(o=growl-left)=0.15, Pr(o=growl-right)=0.85
Actions = {listen, open-left, open-right}
Reward function:
R(a=listen) = -1
R(a=open-right, s=tiger-left) = +10
R(a=open-left, s=tiger-left) = -100
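
For concreteness, the slide's parameters can be written down directly. The Python sketch below (using NumPy) takes the observation probabilities and rewards from the slide; everything else, i.e. listening leaving the tiger in place, door-opening resetting the problem to a uniformly random state, and the symmetric rewards for the tiger-right column, is the standard tiger-problem convention rather than something stated on the slide.

```python
import numpy as np

# States, actions, observations (indices into the arrays below).
STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]
OBS = ["growl-left", "growl-right"]

# T[a][s, s'] = Pr(s' | s, a).
# Assumption: listening does not move the tiger; opening a door resets
# the problem, so the next state is uniform.
T = {
    "listen":     np.array([[1.0, 0.0], [0.0, 1.0]]),
    "open-left":  np.full((2, 2), 0.5),
    "open-right": np.full((2, 2), 0.5),
}

# O[a][s, o] = Pr(o | s, a), as on slide 12 (evaluated at the successor
# state in the belief update). Growls are only informative after listening;
# observations after opening a door are assumed uninformative.
O = {
    "listen":     np.array([[0.85, 0.15], [0.15, 0.85]]),
    "open-left":  np.full((2, 2), 0.5),
    "open-right": np.full((2, 2), 0.5),
}

# R[a][s] = immediate reward for taking action a in state s.
# The -1, +10, -100 entries come from the slide; the tiger-right column
# is filled in by symmetry (an assumption).
R = {
    "listen":     np.array([-1.0, -1.0]),
    "open-left":  np.array([-100.0, 10.0]),
    "open-right": np.array([10.0, -100.0]),
}

b0 = np.array([0.5, 0.5])  # initial belief: tiger equally likely on either side
```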

14 What can we do with POMDPs?
1) State tracking: after an action, what is the state of the world, s_t? (Not so hard.)
2) Computing a policy: which action, a_j, should the controller apply next? (Very hard!)
[Figure: the world evolves from state s_{t-1} to s_t; the robot receives observation o_t after action a_{t-1} and updates its belief b_{t-1}; the control layer must choose the next action.]

15 The tiger problem: State tracking
[Figure: the initial belief vector b0 over S1 "tiger-left" and S2 "tiger-right".]

16 The tiger problem: State tracking
action = listen, observation = growl-left
[Figure: the belief vector b0 over S1 "tiger-left" and S2 "tiger-right", about to be updated.]

17 The tiger problem: State tracking
action = listen, observation = growl-left
[Figure: the updated belief b1 over S1 "tiger-left" and S2 "tiger-right", shifted toward tiger-left after hearing growl-left.]
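
Slides 15-17 are the standard Bayes-filter belief update. A short sketch of that update, reusing T, O, OBS, and b0 from the tiger-problem snippet above (the helper name belief_update is mine, not from the proposal):

```python
def belief_update(b, action, obs, T, O, obs_names):
    """b'(s') is proportional to O(s', action, obs) * sum_s T(s, action, s') * b(s)."""
    o_idx = obs_names.index(obs)
    predicted = T[action].T @ b              # sum_s T(s, a, s') * b(s)
    unnormalized = O[action][:, o_idx] * predicted
    return unnormalized / unnormalized.sum()

b1 = belief_update(b0, "listen", "growl-left", T, O, OBS)
# b1 == [0.85, 0.15]: a single growl-left shifts the belief toward tiger-left,
# matching the update illustrated on slide 17.
```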

18 Policy optimization
Which action, a_j, should the controller apply next?
- In MDPs: the policy is a mapping from state to action, π: s_i -> a_j
- In POMDPs: the policy is a mapping from belief to action, π: b -> a_j
Recursively calculate the expected long-term reward for each state/belief, then find the action that maximizes that expected reward (see the equations below).
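
The two equations on this slide were images and did not survive the transcript. A standard finite-horizon form consistent with the definitions above (written for the belief-state MDP; the discount factor gamma is my addition and may not match the original slide) is:

```latex
V_t(b) = \max_{a \in A}\Big[\, \sum_{s \in S} b(s)\,R(s,a)
       \;+\; \gamma \sum_{o \in O} \Pr(o \mid b,a)\, V_{t-1}\big(b^{a,o}\big) \Big]

\pi_t(b) = \arg\max_{a \in A}\Big[\, \sum_{s \in S} b(s)\,R(s,a)
       \;+\; \gamma \sum_{o \in O} \Pr(o \mid b,a)\, V_{t-1}\big(b^{a,o}\big) \Big]
```

Here b^{a,o} denotes the belief obtained from b after taking action a and observing o, i.e. the update of slides 15-17.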

19 The tiger problem: Optimal policy
[Figure: the optimal policy over the belief vector from S1 "tiger-left" to S2 "tiger-right": open-right when confident the tiger is on the left, listen in the uncertain middle region, open-left when confident the tiger is on the right.]

20 Complexity of policy optimization
Finite-horizon POMDPs are in the worst case doubly exponential.
Infinite-horizon undiscounted stochastic POMDPs are EXPTIME-hard, and may not be decidable.

21 The essence of the problem
How can we find good policies for complex POMDPs?
Is there a principled way to provide near-optimal policies in reasonable time?

22 Outline
- Problem motivation
- Partially observable Markov decision processes
-> The hierarchical POMDP algorithm
- Proposed research

23 A hierarchical approach to POMDP planning
Key idea: Exploit hierarchical structure in the problem domain to break a problem into many "related" POMDPs.
What type of structure? Action set partitioning.
[Action hierarchy diagram: root Act; subtasks (abstract actions) such as InvestigateHealth, Move, and Navigate; primitive actions such as CheckPulse, CheckMeds, AskWhere, Left, Right, Forward, Backward.]
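
A hierarchy like this is easy to represent explicitly. The sketch below is one plausible reading of the flattened figure (the exact nesting on the original slide may differ); internal nodes are subtasks that get their own POMDP controllers, and leaves are primitive actions.

```python
# Internal nodes (subtasks / abstract actions) map to their children;
# primitive actions map to None. One plausible reading of the flattened
# figure; the exact nesting on the original slide may differ.
ACTION_HIERARCHY = {
    "Act": {
        "InvestigateHealth": {"CheckPulse": None, "CheckMeds": None},
        "Move": {
            "AskWhere": None,
            "Navigate": {"Left": None, "Right": None,
                         "Forward": None, "Backward": None},
        },
    },
}

def primitive_actions(node):
    """Collect the primitive actions reachable from a subtask's children."""
    leaves = []
    for name, children in node.items():
        if children is None:
            leaves.append(name)
        else:
            leaves.extend(primitive_actions(children))
    return leaves

# Each subtask gets its own POMDP controller; its action set is its children,
# with abstract actions standing in for the lower-level controllers.
# primitive_actions(ACTION_HIERARCHY["Act"]) ->
#   ['CheckPulse', 'CheckMeds', 'AskWhere', 'Left', 'Right', 'Forward', 'Backward']
```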

24 Assumptions
- Each POMDP controller has a subset of the action set A_0.
- Each POMDP controller has the full state set S_0 and observation set O_0.
- Each controller includes discriminative reward information.
- We are given the action set partitioning graph.
- We are given a full POMDP model of the problem: {S_0, A_0, O_0, M_0}.

25 The tiger problem: An action hierarchy
[Hierarchy: act -> {open-left, investigate}; investigate -> {listen, open-right}]
P_investigate = {S_0, A_investigate, O_0, M_investigate}
A_investigate = {listen, open-right}

26 Optimizing the "investigate" controller
[Figure: the locally optimal policy over the belief vector from S1 "tiger-left" to S2 "tiger-right": open-right when confident the tiger is on the left, listen otherwise.]

27 The tiger problem: An action hierarchy
[Hierarchy: act -> {open-left, investigate}; investigate -> {listen, open-right}]
P_act = {S_0, A_act, O_0, M_act}
A_act = {open-left, investigate}
But... R(s, a=investigate) is not defined!

28 Modeling abstract actions
Insight: use the local policy of the corresponding low-level controller.
General form: R(s_i, a_k) = R(s_i, Policy(controller_k, s_i))
Example: Policy(investigate, s=tiger-left) = open-right,
so R(s=tiger-left, a=investigate) = R(s=tiger-left, a=open-right).
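
In code, the slide's rule amounts to looking up, for each state, the primitive action the subcontroller would choose there and reusing that primitive action's reward. The sketch reuses STATES and R from the tiger snippet above; the investigate policy's tiger-right entry is an illustrative assumption, since the slide only gives the tiger-left case.

```python
def abstract_action_reward(R, subcontroller_policy):
    """R(s, a_k) = R(s, Policy(controller_k, s)): the reward of an abstract
    action in state s is the reward of the primitive action that its
    subcontroller's policy would choose in s."""
    return {state: R[subcontroller_policy[state]][i]
            for i, state in enumerate(STATES)}

# From the slide: Policy(investigate, s=tiger-left) = open-right.
# The tiger-right entry ("listen") is an illustrative assumption.
investigate_policy = {"tiger-left": "open-right", "tiger-right": "listen"}
R_investigate = abstract_action_reward(R, investigate_policy)
# R_investigate == {"tiger-left": 10.0, "tiger-right": -1.0}
```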

29 Optimizing the "act" controller
[Figure: the locally optimal policy over the belief vector from S1 "tiger-left" to S2 "tiger-right": investigate over most of the belief space, open-left when confident the tiger is on the right.]

30 The complete hierarchical policy
[Figure: the hierarchical policy over the belief vector from S1 "tiger-left" to S2 "tiger-right", with open-right, listen, and open-left regions.]

31 The complete hierarchical policy
[Figure: the hierarchical policy (open-right / listen / open-left regions over the belief vector) shown alongside the optimal policy for comparison.]
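
At execution time the hierarchical policy is just the composition of the two controllers: the act controller chooses between open-left and the abstract action investigate, and whenever it chooses investigate, the investigate controller's own policy supplies the primitive action. The belief thresholds below are illustrative placeholders, not the values produced by the optimized controllers on the slides.

```python
def act_controller(b):
    """Top-level 'act' controller over {open-left, investigate}.
    b[0] = Pr(tiger-left); the 0.1 threshold is an illustrative placeholder."""
    return "open-left" if b[0] < 0.1 else "investigate"

def investigate_controller(b):
    """Subcontroller over {listen, open-right}; the 0.9 threshold is illustrative."""
    return "open-right" if b[0] > 0.9 else "listen"

def hierarchical_policy(b):
    action = act_controller(b)
    if action == "investigate":      # abstract action: defer to the subcontroller
        action = investigate_controller(b)
    return action

# hierarchical_policy([0.97, 0.03]) -> "open-right"
# hierarchical_policy([0.85, 0.15]) -> "listen"
# hierarchical_policy([0.05, 0.95]) -> "open-left"
```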

32 Results for larger simulation domains

33 Related work on hierarchical methods
Hierarchical HMMs: Fine et al., 1998.
Hierarchical MDPs: Dayan & Hinton, 1993; Dietterich, 1998; McGovern et al., 1998; Parr & Russell, 1998; Singh.
Loosely-coupled MDPs: Boutilier et al., 1997; Dean & Lin, 1995; Meuleau et al., 1998; Singh & Cohn, 1998; Wang & Mahadevan.
Factored-state POMDPs: Boutilier et al., 1999; Boutilier & Poole, 1996; Hansen & Feng.
Hierarchical POMDPs: Castanon, 1997; Hernandez-Gardiol & Mahadevan, 2001; Theocharous et al., 2001; Wiering & Schmidhuber, 1997.

34 Outline
- Problem motivation
- Partially observable Markov decision processes
- The hierarchical POMDP algorithm
-> Proposed research

35 Proposed research
1) Algorithmic design
2) Algorithmic analysis
3) Model learning
4) System development and application

36 Research block #1: Algorithmic design
Goal 1.1: Developing/implementing the hierarchical POMDP algorithm.
Goal 1.2: Extending H-POMDP for factorized state representations.
Goal 1.3: Using state/observation abstraction.
Goal 1.4: Planning for controllers with no local reward information.

37 Goal 1.3: State/observation abstraction
Assumption #2: "Each POMDP controller has the full state set S_0 and observation set O_0."
Can we reduce the number of states/observations, |S| and |O|?

38 Goal 1.3: State/observation abstraction
Assumption #2: "Each POMDP controller has the full state set S_0 and observation set O_0."
Can we reduce the number of states/observations, |S| and |O|?
Yes! Each controller only needs a subset of the state/observation features.
What is the computational speed-up?
[Figure: example subtasks (Navigate with Left/Right/Forward/Backward; InvestigateHealth with CheckPulse/CheckMeds) and a time-complexity comparison between the flat POMDP and the recursive upper bound.]

39 Goal 1.4: Local controller reward information
Assumption #3: "Each controller includes some amount of discriminative reward information."
Can we relax this assumption?

40 Goal 1.4: Local controller reward information
Assumption #3: "Each controller includes some amount of discriminative reward information."
Can we relax this assumption?
Possibly. Use reward shaping to select a policy-invariant reward function.
What is the benefit?
- H-POMDP could solve problems with sparse reward functions.

41 Research block #2: Algorithmic analysis
Goal 2.1: Evaluating performance of the H-POMDP algorithm.
Goal 2.2: Quantifying the loss due to the hierarchy.
Goal 2.3: Comparing different possible decompositions of a problem.

42 Goal 2.1: Performance evaluation
How does the hierarchical POMDP algorithm compare to:
- Exact value function methods (Sondik, 1971; Monahan, 1982; Littman, 1996; Cassandra et al.)
- Policy search methods (Hansen, 1998; Kearns et al., 1999; Ng & Jordan, 2000; Baxter & Bartlett)
- Value approximation methods (Parr & Russell, 1995; Thrun)
- Belief approximation methods (Nourbakhsh, 1995; Koenig & Simmons, 1996; Hauskrecht, 2000; Roy & Thrun)
- Memory-based methods (McCallum)
Consider problems from the POMDP literature and the dialogue management domain.

43 Goal 2.2: Quantifying the loss
The hierarchical POMDP planning algorithm provides an approximately-optimal policy. How "near-optimal" is the policy?
Subject to some (very restrictive) conditions: "The value function of the top-level controller is an upper bound on the value of the approximation," i.e. V_top(b) >= V_actual(b).
Can we loosen the restrictions? Tighten the bound? Find a lower bound?

44 Goal 2.3: Comparing different decompositions
Assumption #4: "We are given an action set partitioning graph."
What makes a good hierarchical action decomposition?
Comparing decompositions is the first step towards automatic decomposition.
[Figure: two alternative action hierarchies over the actions Manufacture, Examine, Inspect, and Replace, with different abstract-action groupings (a1, a2 vs. a1, a2, a3).]

45 Research block #3: Model learning
Goal 3.1: Automatically generating good action hierarchies.
- Assumption #4: "We are given an action set partitioning graph."
- Can we automatically generate a good hierarchical decomposition?
- Maybe. It is being done for hierarchical MDPs.
Goal 3.2: Including parameter learning.
- Assumption #5: "We are given a full POMDP model of the problem."
- Can we introduce parameter learning?
- Yes! Maximum-likelihood parameter optimization (Baum-Welch) can be used for POMDPs.

46 Research block #4: System development and application
Goal 4.1: Building an extensive dialogue manager.
[Architecture diagram: the Dialogue Manager connects the user (touchscreen input and messages, speech utterances) with a reminding module (reminder messages), a robot module (sensor readings, motion commands, status information), a teleoperation module (remote-control commands), and Fac operations.]

47 An implemented scenario
[Map: physiotherapy, patient room, robot home.]
Problem size: |S|=288, |A|=14, |O|=15
State features: {RobotLocation, UserLocation, UserStatus, ReminderGoal, UserMotionGoal, UserSpeechGoal}
Test subjects: 3 elderly residents in an assisted-living facility

48 Contributions
Algorithmic contribution: a novel POMDP algorithm based on hierarchical structure.
-> Enables the use of POMDPs for much larger problems.
Application contribution: the application of POMDPs to dialogue management is novel.
-> Allows the design of robust robot behavioural managers.

49 Research schedule
1) Algorithmic design/implementation: fall 01
2) Algorithmic analysis: spring/summer 02
3) Model learning: spring/summer/fall 02
4) System development and application: ongoing
5) Thesis writing: fall 02 / spring 03

50 Questions?

51 A simulated robot navigation example
Domain size: |S|=11, |A|=6, |O|=6
[Action hierarchy diagram: root Act, with subtasks GetReward(t) ($$) and Navigate(t), and primitive actions including ReadMap, OpenDoor, GoLeft, GoRight, GoBack, GoForward.]

52 A dialogue management example
Domain size: |S|=20, |A|=30, |O|=27
[Action hierarchy diagram: root Act with subtasks Move, Greet, CheckWeather, DoMeds, Phone, and CheckHealth; primitive actions: AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow, GreetGeneral, GreetMorning, GreetNight, RespondThanks, AskWeatherTime, SayCurrent, SayToday, SayTomorrow, SayTime, StartMeds, NextMeds, ForceMeds, QuitMeds, AskCallWho, Call911, CallNurse, CallRelative, Verify911, VerifyNurse, VerifyRelative, AskHealth, OfferHelp.]

53 Action hierarchy for the implemented scenario
[Action hierarchy diagram: root Act with subtasks Remind, Assist, Rest, Move, Contact, and Inform; primitive actions include BringtoPhysio, CheckUserPresent, DeliverUser, SayWeather, VerifyRequest, SayTime, RemindPhysio, PublishStatus, RingBell, GotoRoom, VerifyBring, VerifyRelease, Recharge, GotoHome.]

54 Sondik's parts manufacturing problem
[Figure: two example decompositions of the action set {Manufacture, Examine, Inspect, Replace}: Decomposition 1 with abstract actions a1, a2, a3; Decomposition 2 with abstract actions a1, a2.]
+ 5 more decompositions

55 Manufacturing task results

56 Using state/observation abstraction
Action set (the Phone controller, one of the subtasks alongside CheckHealth and DoMeds): AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative.
State set: full features include ReminderGoal={none, medsX}, CommunicationGoal={none, personX}, UserHealth={good, poor, emergency}; the Phone controller uses CommunicationGoal={none, nurse, 911, relative}.

57 Related work on robot planning and control
Manually-scripted dialogue strategies: Denecke & Waibel, 1997; Walker et al.
Markov decision processes (MDPs) for dialogue management: Levin et al., 1997; Fromer, 1998; Walker et al., 1998; Goddeau & Pineau, 2000; Singh et al., 2000; Walker.
Robot interfaces: Torrance, 1996; Asoh et al.
Classical planning: Fikes & Nilsson, 1971; Simmons, 1987; McAllester & Rosenblitt, 1991; Penberthy & Weld, 1992; Kushmerick, 1995; Veloso et al., 1995; Smith & Weld.
Execution architectures: Firby, 1987; Musliner, 1993; Simmons, 1994; Bonasso & Kortenkamp, 1996.

58 Decision-theoretic planning models

59 The tiger problem: Value function solution
[Figure: the value function V over the belief space, from S=tiger-left to S=tiger-right, with open-right, listen, and open-left segments.]

60 Optimizing the "investigate" controller
[Figure: the value function V over the belief space, from S=tiger-left to S=tiger-right, with open-right and listen segments.]

61 Optimizing the "act" controller
[Figure: the value function V over the belief space, from S=tiger-left to S=tiger-right, with open-left and investigate segments.]

