
1 Concurrent Probabilistic Temporal Planning (CPTP) Mausam Joint work with Daniel S. Weld University of Washington Seattle

2 Motivation
Three features of real-world planning domains:
Durative actions: all actions (navigation between sites, placing instruments, etc.) take time.
Concurrency: some instruments may warm up while others perform their tasks and others shut down to save power.
Uncertainty: all actions (pick up the rock, send data, etc.) have a probability of failure.

3 Motivation (contd.)
Concurrent temporal planning (widely studied with deterministic effects): extends classical planning, but doesn't easily extend to probabilistic outcomes.
Concurrent planning with uncertainty (Concurrent MDPs, AAAI'04): handles combinations of actions over an MDP, but actions take unit time.
Few planners handle all three in concert!

4 Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning: a concurrent MDP in an augmented state space
Solution Methods for CPTP: two heuristics to guide the search; hybridisation
Experiments & Conclusions
Related & Future Work

5 Markov Decision Process
S: a set of states, factored into Boolean variables
A: a set of actions, each of unit duration
Pr (S × A × S → [0,1]): the transition model
C (A → R): the cost model
s0: the start state
G: a set of absorbing goals
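
A minimal sketch of this tuple as code may help fix ideas; the following Python structure is an illustrative assumption, not the authors' implementation (states and actions are plain strings here).

```python
# Illustrative sketch of the MDP tuple on this slide; all names are
# assumptions for the example, not from the paper.
from dataclasses import dataclass
from typing import Dict, Set, Tuple

@dataclass
class MDP:
    states: Set[str]                                # S
    actions: Set[str]                               # A
    trans: Dict[Tuple[str, str], Dict[str, float]]  # (s, a) -> {s': Pr(s'|s,a)}
    cost: Dict[str, float]                          # C(a); unit-duration actions
    start: str                                      # s0
    goals: Set[str]                                 # G, absorbing

    def applicable(self, s: str) -> Set[str]:
        # In this sketch an action is applicable wherever its transition is defined.
        return {a for (st, a) in self.trans if st == s}
```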

6 GOAL of an MDP
Find a policy (π: S → A) which minimises the expected cost of reaching a goal, for a fully observable Markov decision process, if the agent executes for an indefinite horizon.

7 Equations: optimal policy
Define J*(s) (the optimal cost) as the minimum expected cost to reach a goal from s. J* should satisfy:
J*(s) = 0 if s ∈ G
J*(s) = min over a ∈ Ap(s) of [ C(a) + Σ_s' Pr(s'|s,a) · J*(s') ] otherwise

8 Bellman Backup
Given an estimate of the J* function (say Jn), back up Jn at state s to calculate a new estimate Jn+1 as follows:
Qn+1(s,a) = C(a) + Σ_s' Pr(s'|s,a) · Jn(s')
Jn+1(s) = min over a ∈ Ap(s) of Qn+1(s,a)
Value Iteration: perform Bellman updates at all states in each iteration; stop when costs have converged at all states.
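
As a concrete sketch of the backup and of value iteration, using the hypothetical MDP structure above (an illustration, not the authors' code):

```python
# Bellman backup and value iteration over the MDP sketch above.
def bellman_backup(mdp, J, s):
    # J_{n+1}(s) = min_a [ C(a) + sum_{s'} Pr(s'|s,a) * J_n(s') ]
    return min(mdp.cost[a] +
               sum(p * J[s2] for s2, p in mdp.trans[(s, a)].items())
               for a in mdp.applicable(s))

def value_iteration(mdp, eps=1e-6):
    J = {s: 0.0 for s in mdp.states}
    while True:
        delta = 0.0
        for s in mdp.states:
            if s in mdp.goals:        # absorbing goals stay at cost 0
                continue
            new = bellman_backup(mdp, J, s)
            delta = max(delta, abs(new - J[s]))
            J[s] = new
        if delta < eps:               # costs converged at all states
            return J
```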

9 Min Bellman Backup
(Figure: the backup at state s computes Qn+1(s,a) for each applicable action a1, a2, a3 from the Jn values of its successors, and sets Jn+1(s) = min over a ∈ Ap(s) of Qn+1(s,a).)

10 Min RTDP Trial
(Figure: an RTDP trial at state s computes Qn+1(s,a) for each applicable action, picks the greedy action amin = a2, samples a successor, and continues until the goal is reached.)

11 Real Time Dynamic Programming (Barto, Bradtke and Singh'95)
Trial: simulate the greedy policy, performing Bellman backups on the visited states. Repeat RTDP trials until the cost function converges.
Anytime behaviour; only expands the reachable state space; but complete convergence is slow.
Labeled RTDP (Bonet & Geffner'03): admissible if started with an admissible (optimistic, lower-bound) cost function; monotonic; converges quickly.
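
A sketch of one trial, reusing bellman_backup from the value-iteration sketch above (again illustrative, not the planners' code):

```python
import random

# One RTDP trial: follow the greedy policy from s0, backing up each
# visited state, until an (absorbing) goal is reached.
def greedy_action(mdp, J, s):
    return min(mdp.applicable(s),
               key=lambda a: mdp.cost[a] +
               sum(p * J[s2] for s2, p in mdp.trans[(s, a)].items()))

def rtdp_trial(mdp, J):
    s = mdp.start
    while s not in mdp.goals:
        J[s] = bellman_backup(mdp, J, s)                  # backup on visit
        succ = mdp.trans[(s, greedy_action(mdp, J, s))]
        s = random.choices(list(succ), list(succ.values()))[0]  # simulate
```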

12 Concurrent MDP (CoMDP) (Mausam & Weld'04)
Allows concurrent combinations of actions.
Safe execution: inherit mutex definitions from classical planning: conflicting preconditions, conflicting effects, interfering preconditions and effects.
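
A sketch of the inherited mutex test, assuming actions carry sets of (variable, value) literals for preconditions and effects; this representation is chosen for illustration, not taken from the paper.

```python
# Two literal sets conflict if they assign different values to some variable.
def conflicting(lits1, lits2):
    return any((var, not val) in lits2 for (var, val) in lits1)

# Actions a1, a2 are assumed to expose .precond and .effects literal sets.
def mutex(a1, a2):
    return (conflicting(a1.precond, a2.precond)     # conflicting preconditions
            or conflicting(a1.effects, a2.effects)  # conflicting effects
            or conflicting(a1.precond, a2.effects)  # interference
            or conflicting(a1.effects, a2.precond))
```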

13 Bellman Backup (CoMDP)
(Figure: the backup at state s must now consider every applicable combination of actions: {a1}, {a2}, {a3}, {a1,a2}, {a1,a3}, {a2,a3}, {a1,a2,a3}, each backed up from the Jn values of its successors.)
Exponential blowup to calculate a Bellman backup!

14 Sampled RTDP
RTDP with stochastic (partial) backups, hence approximate:
Always try the last best combination.
Randomly sample a few other combinations.
In practice: close-to-optimal solutions; converges very fast.
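
A sketch of the sampled backup; combo_q, the Q-value of an action combination, is an assumed helper here, and the sampling scheme is a simplification of the planner's.

```python
import random

# Stochastic (partial) backup: instead of minimising over all 2^|Ap(s)|
# combinations, evaluate the previous best combination plus a few samples.
def sampled_backup(applicable, best_combo, combo_q, num_samples=10):
    candidates = [best_combo] if best_combo else []
    pool = sorted(applicable)
    for _ in range(num_samples):
        k = random.randint(1, len(pool))
        candidates.append(frozenset(random.sample(pool, k)))
    return min(candidates, key=combo_q)   # approximately best combination
```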

15 Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning: a concurrent MDP in an augmented state space
Solution Methods for CPTP: two heuristics to guide the search; hybridisation
Experiments & Conclusions
Related & Future Work

16 Modelling CPTP as CoMDP
CoMDP → CPTP: model explicit action durations; minimise expected make-span.
If we initialise C(a) as its duration Δ(a), two decision-epoch models arise: aligned epochs and interwoven epochs.

17 Augmented state space
(Figure: a timeline from 0 to 9 with actions a, b, c, d, e, f, g, h executing concurrently over state X.)
X1: application of b on X.
X2: application of a, b, c, d and e over X.

18 Simplifying assumptions
All actions have deterministic durations. All action durations are integers.
Action model:
Effects of an action are realised at some unknown point during action execution, and thus can be used only once the action has completed.
Preconditions must hold at the beginning of an action.
Preconditions, and the features on which the action's transition function is conditioned, must remain unchanged while the action is being executed, unless the action itself is modifying them.

19 Simplifying assumptions
All actions have deterministic durations. All action durations are integers.
Action model: preconditions must hold until the end of the action; effects are usable only at the end of the action.
Properties: mutex rules are still required; it is sufficient to consider only the epochs at which some action ends.

20 Completing the CoMDP
Redefine: the applicability set, the transition function, and the start and goal states.
Example: the transition function is redefined so that the agent moves forward in time to an epoch where some action completes.
Start state: the world start state s0 with no actions executing; etc.
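
A sketch of the interwoven-epoch state and of the redefined transition's time advance; the field names and representation are illustrative assumptions, and the probabilistic effect sampling is elided.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# An interwoven state pairs the world state with the currently executing
# actions and their remaining durations.
@dataclass(frozen=True)
class IwState:
    world: str
    executing: FrozenSet[Tuple[str, int]]   # (action, time remaining)

def advance(state: IwState) -> Tuple[int, IwState]:
    # Jump forward to the earliest epoch at which some action completes.
    dt = min(rem for _, rem in state.executing)
    running = frozenset((a, rem - dt)
                        for a, rem in state.executing if rem > dt)
    # The completed actions would now apply their probabilistic effects
    # to state.world; that sampling step is omitted in this sketch.
    return dt, IwState(state.world, running)
```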

21 Solution
CPTP = a CoMDP in the interwoven state space. Thus one may use our sampled RTDP, etc.
PROBLEM: exponential blowup in the size of the state space.

22 Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning: a concurrent MDP in an augmented state space
Solution Methods for CPTP
Solution 1: two heuristics to guide the search
Solution 2: hybridisation
Experiments & Conclusions
Related & Future Work

23 Max Concurrency Heuristic (MC)
Define c: the maximum number of actions executable concurrently in the domain.
Serialisation: any concurrent plan can be serialised, stretching its make-span by at most a factor of c, so Jser*(X) ≤ c × J*(X), i.e. Jser*(X)/c ≤ J*(X).
(Figure, with c = 2: a plan over a, b, c from X to G whose concurrent make-span is 10 serialises to a make-span of at most 20.)
Hence Jser*(X)/c is an admissible heuristic.
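
As code, the heuristic is a one-line wrapper around the serial MDP's value function, where J_serial could be computed by, e.g., the value-iteration sketch above with action costs set to durations:

```python
# Maximum-concurrency heuristic: an admissible lower bound on expected
# make-span, obtained by dividing the serial MDP's optimal cost by c.
def mc_heuristic(J_serial, c):
    return lambda s: J_serial[s] / c
```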

24 Eager Effects Heuristic: solving a relaxed problem
Augment the state space to S × Z. Let (X, δ) be a state where X is the world state and δ is the time remaining for all actions (started anytime in the history) to complete execution.
Start state: (s0, 0)
Goal states: { (X, 0) | X ∈ G }

25 Eager Effects Heuristic (contd.)
(Figure: actions a, b and c, with durations 8, 2 and 4, are started in X; after 2 units the relaxed state is (V, 6).)
Allow all actions, even when mutex with a or c! Allowing inapplicable actions to execute is optimistic.
Information about action effects is assumed ahead of time, which is also optimistic. Hence the name: Eager Effects!
Admissible heuristic.
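
A sketch of one relaxed transition under this heuristic; duration and apply_effects are assumed helpers, and the step logic is my reading of the slide's example, not the paper's code.

```python
# Eager-effects relaxation: starting a combination applies all its effects
# immediately, the clock jumps to the next completion epoch, and delta
# records how long until every started action has finished.
def eager_step(world, delta, combo, duration, apply_effects):
    world = apply_effects(world, combo)                # effects used early
    finish = max([delta] + [duration(a) for a in combo])
    dt = min(duration(a) for a in combo)               # next completion epoch
    return world, finish - dt, dt

# E.g. starting a, b, c (durations 8, 2, 4) in X: after dt = 2 units the
# relaxed state is (V, 6), matching the figure.
```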

26 Solution 2: Hybridisation
Observations: the aligned-epoch policy is sub-optimal but fast to compute; the interwoven-epoch policy is optimal but slow to compute.
Solution: produce a hybrid policy, i.e. output the interwoven policy for probable states and the aligned policy for improbable states.

27 Path to goals
(Figure: branches from the start state s to the goals G; many of the branches carry low probability.)

28 Hybrid algorithm (contd.)
Observation: RTDP explores probable branches much more than others.
Algorithm(m, k, r): Loop:
Do m RTDP trials; let the current value of the start state be J(s0).
Output a hybrid policy π: the interwoven policy for states visited > k times, the aligned policy for other states.
Evaluate policy π: Jπ(s0).
Stop if Jπ(s0) − J(s0) < r·J(s0). (J(s0) is less than optimal, a lower bound; Jπ(s0) is greater than optimal, an upper bound.)
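
A sketch of this loop; every helper passed in (the trial, visit counts, the two policies, the evaluator) is an assumed stand-in rather than the authors' interface.

```python
# Hybrid algorithm(m, k, r): run interwoven RTDP trials, then stitch an
# interwoven policy on frequently visited states with an aligned-epoch
# policy elsewhere; stop when the gap between the policy's value (an
# upper bound) and J(s0) (a lower bound) is within the ratio r.
def hybrid(m, k, r, s0, J, trial, visits, iw_policy, ae_policy,
           reachable, evaluate):
    while True:
        for _ in range(m):
            trial(J)                                   # m RTDP trials
        policy = {s: iw_policy(s, J) if visits[s] > k else ae_policy(s)
                  for s in reachable()}
        if evaluate(policy, s0) - J[s0] < r * J[s0]:   # Jpi(s0) - J(s0) < r*J(s0)
            return policy
```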

29 Hybridisation
Outputs a proper policy: the policy is defined at all states reachable under it, and is guaranteed to take the agent to a goal.
Has an optimality-ratio parameter (r) that controls the balance between optimality and running time.
Can be used as an anytime algorithm.
Is general: we can hybridise two algorithms in other cases too, e.g. in solving the original concurrent MDP.

30 Outline of the talk
MDP and CoMDP
Concurrent Probabilistic Temporal Planning: a concurrent MDP in an augmented state space
Solution Methods for CPTP: two heuristics to guide the search; hybridisation
Experiments & Conclusions
Related & Future Work

31 Experiments
Domains: Rover, MachineShop, Artificial
State variables: 14-26
Durations: 1-20

32 Speedups in Rover domain (chart)

34 Qualities of solution (chart)

36 Experiments: Summary
Max Concurrency heuristic: fast to compute; speeds up the search.
Eager Effects heuristic: high quality; can be expensive in some domains.
Hybrid algorithm: very fast; produces good-quality solutions.
Aligned-epoch model: super-fast; outputs poor-quality solutions at times.

37 Related Work
Prottle (Little, Aberdeen, Thiébaux'05)
Generate, test and debug paradigm (Younes & Simmons'04)
Concurrent options (Rohanimanesh & Mahadevan'04)

38 Future Work
Other applications of hybridisation: CoMDP, MDP, over-subscription planning.
Relaxing the assumptions: handling mixed costs, extending to PDDL2.1, stochastic action durations.
Extensions to metric resources.
State-space compression/aggregation.

