
1 A Hybridized Planner for Stochastic Domains. Mausam and Daniel S. Weld, University of Washington, Seattle; Piergiorgio Bertoli, ITC-IRST, Trento

2 Planning under Uncertainty (ICAPS'03 Workshop) • Qualitative (disjunctive) uncertainty: which real problem can you solve? • Quantitative (probabilistic) uncertainty: which real problem can you model?

3 The Quantitative View • Markov Decision Process • models uncertainty with probabilistic outcomes • general decision-theoretic framework • algorithms are slow • do we need the full power of decision theory? • is an unconverged partial policy any good?

4 The Qualitative View • Conditional Planning • models uncertainty as a logical disjunction of outcomes • exploits classical planning techniques → FAST • ignores probabilities → poor solutions • how bad are pure qualitative solutions? • can we improve the qualitative policies?

5 HybPlan: A Hybridized Planner • combines probabilistic + disjunctive planners • produces good solutions in intermediate times • anytime: makes effective use of resources • bounds termination with a quality guarantee • Quantitative view: completes the partial probabilistic policy by using qualitative policies in some states • Qualitative view: improves the qualitative policies in more important regions

6 Outline • Motivation • Planning with Probabilistic Uncertainty (RTDP) • Planning with Disjunctive Uncertainty (MBP) • Hybridizing RTDP and MBP (HybPlan) • Experiments • Conclusions and Future Work

7 Markov Decision Process • S: a set of states • A: a set of actions • Pr: probabilistic transition model • C: cost model • s0: the start state • G: a set of goals. Find a policy (S → A) that • minimizes expected cost to reach a goal • for an indefinite horizon • in a fully observable Markov decision process. Optimal cost function J* → optimal policy.
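For reference, this objective is the standard stochastic shortest-path formulation; the optimal cost function and policy satisfy the Bellman optimality equations below (textbook MDP material, not stated explicitly on the slide):

```latex
J^*(s) = 0 \quad \text{for } s \in G, \qquad
J^*(s) = \min_{a \in A} \Big[\, C(s,a) + \sum_{s' \in S} \Pr(s' \mid s,a)\, J^*(s') \Big] \quad \text{for } s \notin G,
\qquad
\pi^*(s) = \operatorname*{arg\,min}_{a \in A} \Big[\, C(s,a) + \sum_{s' \in S} \Pr(s' \mid s,a)\, J^*(s') \Big].
```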

8 Example [grid-world figure: start state s0 and Goal; annotations mark a longer path, a wrong direction from which the goal is still reachable, and a region where all states are dead-ends]

9 Optimal State Costs [grid figure: each cell labeled with its optimal expected cost to reach the goal; the Goal cell has cost 0]

10 Optimal Policy [grid figure: arrows show the optimal action in each state leading to the Goal]

11 Bellman Backup: create a better approximation to the cost function at state s

12 Bellman Backup: create a better approximation to the cost function at state s. Trial = simulate the greedy policy and update the visited states.

13 Bellman Backup: create a better approximation to the cost function at state s. Real-Time Dynamic Programming (Barto et al. '95; Bonet & Geffner '03): repeat trials until the cost function converges. Trial = simulate the greedy policy and update the visited states (sketched in Python below).
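A minimal Python sketch of an RTDP trial may make the backup-and-simulate loop concrete; the `mdp` interface (states, s0, goals, actions, cost, successors) is a hypothetical stand-in for illustration, not code from the paper:

```python
import random

def rtdp_trial(mdp, J, max_depth=1000):
    """One RTDP trial: follow the greedy policy from the start state,
    doing a Bellman backup at each visited state.
    `mdp` is assumed to expose: states, s0, goals, actions(s),
    cost(s, a), and successors(s, a) -> list of (s', prob) pairs."""
    s = mdp.s0
    for _ in range(max_depth):
        if s in mdp.goals:
            break
        # Bellman backup at s: Q(s,a) = C(s,a) + sum_s' Pr(s'|s,a) * J[s']
        q = {a: mdp.cost(s, a) + sum(p * J[s2] for s2, p in mdp.successors(s, a))
             for a in mdp.actions(s)}
        a_greedy = min(q, key=q.get)
        J[s] = q[a_greedy]
        # Simulate the greedy action to choose the next state.
        succs, probs = zip(*mdp.successors(s, a_greedy))
        s = random.choices(succs, weights=probs, k=1)[0]
    return J

def rtdp(mdp, num_trials=1000):
    """Repeat trials (here: a fixed budget stands in for a convergence test)."""
    J = {s: 0.0 for s in mdp.states}   # admissible (lower-bound) initialization
    for _ in range(num_trials):
        rtdp_trial(mdp, J)
    return J
```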

14 Planning with Disjunctive Uncertainty • S: a set of states • A: a set of actions • T: disjunctive transition model • s0: the start state • G: a set of goals. Find a strong-cyclic policy (S → A) that • guarantees reaching a goal • for an indefinite horizon • in a fully observable planning problem.

15 Model Based Planner (Bertoli et al.) • States, transitions, etc. are represented logically • Uncertainty → multiple possible successor states • Planning algorithm: iteratively removes "bad" states, i.e. states from which the goal cannot be reached or which only reach other bad states (see the sketch below).
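A rough explicit-state sketch of this "iteratively remove bad states" idea is below, in the spirit of strong-cyclic planning (Cimatti et al.); MBP itself works symbolically with BDDs, so this is only an illustration. `T(s, a)` is assumed to return the set of possible successors, and `states`, `actions`, `goals` are hypothetical interfaces:

```python
def strong_cyclic_policy(states, actions, T, goals):
    """Explicit-state sketch of strong-cyclic policy construction by
    iteratively pruning bad state-action pairs. Not MBP's implementation."""
    # Candidate pairs: every applicable action in every non-goal state.
    sa = {(s, a) for s in states if s not in goals for a in actions(s)}
    while True:
        old_sa = sa
        covered = {s for s, _ in sa} | goals
        # Drop pairs with a possible outcome outside the covered region
        # (such an outcome could land in a dead-end, i.e. a bad state).
        sa = {(s, a) for s, a in sa if T(s, a) <= covered}
        # Keep only pairs from whose state the goal is still reachable via the
        # remaining pairs; record one "progressing" action per state on the way.
        reached, policy, changed = set(goals), {}, True
        while changed:
            changed = False
            for s, a in sa:
                if s not in reached and T(s, a) & reached:
                    reached.add(s)
                    policy[s] = a   # some outcome of a moves strictly closer to the goal
                    changed = True
        sa = {(s, a) for s, a in sa if s in reached}
        if sa == old_sa:
            return policy           # fixpoint: the remaining states are the "good" ones
```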

16 MBP Policy [grid figure: arrows show the policy MBP returns; it reaches the Goal but is a sub-optimal solution]

17 Outline • Motivation • Planning with Probabilistic Uncertainty (RTDP) • Planning with Disjunctive Uncertainty (MBP) • Hybridizing RTDP and MBP (HybPlan) • Experiments • Conclusions and Future Work

18 HybPlan Top Level Code
0. run MBP to find a solution to the goal
1. run RTDP for some time
2. compute the partial greedy policy (π_rtdp)
3. compute the hybridized policy (π_hyb): π_hyb(s) = π_rtdp(s) if visited(s) > threshold, else π_hyb(s) = π_mbp(s)
4. clean π_hyb by removing (a) dead-ends and (b) probability-1 cycles
5. evaluate π_hyb
6. save the best policy obtained so far
Repeat steps 1-6 until (1) resources are exhausted or (2) a satisfactory policy is found. (A Python sketch of this loop follows.)
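Rendered as Python, the top-level loop might look like the sketch below. The helpers (rtdp_trials, greedy_partial_policy, clean_policy, evaluate_policy, out_of_resources) are hypothetical stand-ins for the components the slide names, not real library calls; pi_mbp is the strong-cyclic policy from step 0:

```python
def hybplan(mdp, pi_mbp, threshold, satisfactory_error, out_of_resources):
    """Sketch of the HybPlan top-level loop from this slide."""
    J = {s: 0.0 for s in mdp.states}        # admissible lower bound for RTDP
    visited = {}                            # visit counts gathered during trials
    best_pi, best_cost = None, float("inf")
    while True:
        rtdp_trials(mdp, J, visited, duration=1.0)        # 1. run RTDP for some time
        pi_rtdp = greedy_partial_policy(mdp, J, visited)  # 2. partial greedy policy
        pi_hyb = {}                                       # 3. hybridize the two policies
        for s in set(pi_rtdp) | set(pi_mbp):
            if visited.get(s, 0) > threshold and s in pi_rtdp:
                pi_hyb[s] = pi_rtdp[s]      # trust RTDP where it has enough experience
            elif s in pi_mbp:
                pi_hyb[s] = pi_mbp[s]       # otherwise fall back to MBP's action
        pi_hyb = clean_policy(mdp, pi_hyb, pi_mbp)        # 4. drop dead-ends / prob-1 cycles
        cost = evaluate_policy(mdp, pi_hyb)               # 5. J(pi_hyb) at the start state
        if cost < best_cost:                              # 6. keep the best policy so far
            best_pi, best_cost = pi_hyb, cost
        error = best_cost - J[mdp.s0]       # upper bound minus RTDP's lower bound
        if error <= satisfactory_error or out_of_resources():
            return best_pi, best_cost
```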

19 First RTDP Trial (step 1: run RTDP for some time) [grid figure: all state costs initialized to 0]

20 Bellman Backup (step 1: run RTDP for some time): Q^1(s,N) = 1 + 0.5×0 + 0.5×0 = 1; similarly Q^1(s,S) = Q^1(s,W) = Q^1(s,E) = 1, so J^1(s) = 1. Let the greedy action be North. [grid figure: all other state costs still 0]

21 Simulation of Greedy Action (step 1) [grid figure: the backed-up state now has cost 1; the greedy action is simulated to choose the next state]

22 Continuing First Trial (step 1) [grid figure: further backups along the simulated trajectory]

23 Continuing First Trial (step 1) [grid figure: further backups along the simulated trajectory]

24 Finishing First Trial (step 1) [grid figure: the trial reaches the Goal]

25 Cost Function after First Trial (step 1) [grid figure: the states visited during the trial now have non-zero costs (1, 1, 1, 2); all other states remain 0]

26 Partial Greedy Policy (step 2: compute the partial greedy policy π_rtdp) [grid figure: greedy actions at the states updated in the first trial]

27 Construct Hybridized Policy with MBP (step 3: compute the hybridized policy π_hyb, threshold = 0) [grid figure: RTDP's greedy actions at visited states, MBP's actions elsewhere]

28 Evaluate Hybridized Policy (step 5: evaluate π_hyb; step 6: store π_hyb) [grid figure: after the first trial, J(π_hyb) = 5 at the start state; costs 5, 4, 3, 2 decrease to 0 along the policy toward the Goal]
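Step 5 amounts to ordinary policy evaluation for a proper policy; a minimal iterative sketch, reusing the hypothetical `mdp` interface from the RTDP sketch above:

```python
def evaluate_policy(mdp, policy, iters=1000, tol=1e-6):
    """Iteratively evaluate J_pi(s) = C(s, pi(s)) + sum_s' Pr(s'|s, pi(s)) J_pi(s'),
    with J_pi = 0 at goal states. Converges when the policy is proper (it reaches
    the goal with probability 1). Assumes s0 is covered by the policy."""
    J = {s: 0.0 for s in policy}
    for _ in range(iters):
        delta = 0.0
        for s, a in policy.items():
            if s in mdp.goals:
                continue
            new = mdp.cost(s, a) + sum(p * J.get(s2, 0.0)
                                       for s2, p in mdp.successors(s, a))
            delta = max(delta, abs(new - J[s]))
            J[s] = new
        if delta < tol:
            break
    return J[mdp.s0]
```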

29 Second Trial [grid figure: cost updates along the second simulated trajectory]

30 Partial Greedy Policy [grid figure: greedy actions after the second trial]

31 Absence of MBP Policy: an MBP policy doesn't exist for this state, since it has no path to the goal. [grid figure: the dead-end state is marked ×]

32 Third Trial [grid figure: further cost updates along the third trajectory]

33 Partial Greedy Policy [grid figure: greedy actions after the third trial]

34-38 Probability-1 Cycles (step 4, shown over five animation frames): the hybridized policy may contain a cycle that never reaches the goal. Cleanup: repeat { find a state s in the cycle; set π_hyb(s) = π_mbp(s) } until the cycle is broken. [grid figures: the cycle is broken one state at a time until the policy leads to the Goal]
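A sketch of this cycle cleanup is below; it treats a "probability-1 cycle" as a set of policy states from which the goal is unreachable under π_hyb and falls back to MBP's action one state at a time. The interface is the same hypothetical one as in the earlier sketches, not the authors' implementation:

```python
def break_prob1_cycles(mdp, pi_hyb, pi_mbp):
    """Re-assign MBP's action at states stuck in goal-unreachable cycles
    until no such cycle remains (step 4 of HybPlan, sketched)."""
    while True:
        # States from which the goal is reachable by following pi_hyb:
        # backward fixpoint over the policy graph.
        can_reach = set(mdp.goals)
        changed = True
        while changed:
            changed = False
            for s, a in pi_hyb.items():
                if s not in can_reach and any(s2 in can_reach
                                              for s2, _ in mdp.successors(s, a)):
                    can_reach.add(s)
                    changed = True
        stuck = [s for s in pi_hyb if s not in can_reach]
        if not stuck:
            return pi_hyb              # no probability-1 cycle left
        s = stuck[0]                   # pick some state caught in a cycle
        pi_hyb[s] = pi_mbp[s]          # and fall back to MBP's action there
```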

39 Error Bound: after the 1st trial, evaluating the hybridized policy gives J(π_hyb) = 5, so J*(s0) ≤ 5; RTDP's admissibly-initialized cost function gives J*(s0) ≥ 1. Hence Error(π_hyb) = 5 - 1 = 4 (the evaluated policy's cost upper-bounds the optimal cost, while RTDP's cost function lower-bounds it). [grid figure: the evaluated policy with costs 5, 4, 3, 2 down to 0 at the Goal]

40 Termination • when a policy with the required error bound is found • when planning time is exhausted • when available memory is exhausted. Properties • outputs a proper policy • anytime algorithm (once MBP terminates) • HybPlan = RTDP if infinite resources are available • HybPlan = MBP if resources are extremely limited • HybPlan is better than both otherwise.

41 Outline • Motivation • Planning with Probabilistic Uncertainty (RTDP) • Planning with Disjunctive Uncertainty (MBP) • Hybridizing RTDP and MBP (HybPlan) • Experiments (Anytime Properties, Scalability) • Conclusions and Future Work

42 Domains: NASA Rover Domain, Factory Domain, Elevator Domain

43 Anytime Properties [plot of anytime behaviour; the RTDP curve is labeled]

44 Anytime Properties [second plot of anytime behaviour; the RTDP curve is labeled]

45 Scalability

Problem | Time before memory exhausts | J(π_rtdp) | J(π_mbp) | J(π_hyb)
Rov5    | ~1100 sec                   | 55.36     | 67.04    | 48.16
Rov2    | ~800 sec                    | ∞         | 65.22    | 49.91
Mach9   | ~1500 sec                   | 143.95    | 66.50    | 48.49
Mach6   | ~300 sec                    | ∞         |          | 71.56
Elev14  | ~10000 sec                  | ∞         | 46.49    | 44.48
Elev15  | ~10000 sec                  | ∞         | 233.07   | 87.46

46 Conclusions • First algorithm that integrates disjunctive and probabilistic planners • Experiments show that HybPlan • is anytime • scales better than RTDP • produces better-quality solutions than MBP • can interleave planning and execution

47 Hybridized Planning: A General Notion • Hybridize other pairs of planners • an optimal or close-to-optimal planner • a sub-optimal but fast planner • to yield a planner that produces a good-quality solution in intermediate running times • Examples • POMDP: RTDP/PBVI with POND/MBP/BBSP • Oversubscription Planning: A* with greedy solutions • Concurrent MDP: Sampled RTDP with single-action RTDP

