
Slide 1: Dan’s Multi-Option Talk
© 2007 SRI International
Option 1: HUMIDRIDE: Dan’s Trip to the East Coast
– Whining: High
– Duration: Med
– Viruses: Low
Option 2: T-Cell: Attacking Dan’s Cold Virus
– Whining: Med
– Duration: Low
– Viruses: High
Option 3: Model-Lite Planning: Diverse Multi-Option Plans and Dynamic Objectives
– Whining: Low
– Duration: High
– Viruses: Low

Slide 2: Model-Lite Planning: Diverse Multi-Option Plans and Dynamic Objectives
Daniel Bryce, William Cushing, Subbarao Kambhampati

Slide 3: Questions
When must the plan executor decide on their planning objective?
– Before synthesis? The traditional model.
– Before execution? Similar to the IR model: select a plan from a set of diverse but relevant plans.
– During execution? Multi-Option Plans (subsumes the previous).
– At all? “Keep your options open.”
Can the executor change their planning objective without replanning?
Can the executor start acting without committing to an objective?

Slide 4: Overview
Diverse Multi-Option Plans
– Diversity
– Representation
– Connection to Conditional Plans
– Execution
Synthesizing Multi-Option Plans
– Example
– Speed-ups
Analysis
– Synthesis
– Execution
Conclusion

Slide 5: Diverse Multi-Option Plans
Each plan step presents several diverse choices:
– Option 1: Train(MP, SFO), Fly(SFO, BOS), Car(BOS, Prov.)
– Option 1a: Train(MP, SFO), Fly(SFO, BOS), Fly(BOS, PVD), Cab(PVD, Prov.)
– Option 2: Shuttle(MP, SFO), Fly(SFO, BOS), Car(BOS, Prov.)
– Option 2a: Shuttle(MP, SFO), Fly(SFO, BOS), Fly(BOS, PVD), Cab(PVD, Prov.)
Diversity relies on Pareto optimality:
– Each option is non-dominated
– Diversity comes through a Pareto front with high spread
[Figure: Pareto front over (Duration, Cost) for O1, O1a, O2, O2a, and the corresponding action graph]
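The non-domination requirement above can be sketched in a few lines of Python. The (duration, cost) numbers below are hypothetical stand-ins for the four travel options, chosen so that each trades one objective against the other; they are not values from the talk.

```python
def dominates(a, b):
    """a dominates b when a is no worse in every objective and strictly
    better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(options):
    """Keep only the non-dominated options."""
    return {name: obj for name, obj in options.items()
            if not any(dominates(other, obj)
                       for other in options.values() if other != obj)}

# Hypothetical (duration, cost) values for the four options above:
# each trades duration against cost, so all four are non-dominated.
options = {"O1": (9, 5), "O1a": (7, 8), "O2": (10, 3), "O2a": (8, 6)}
front = pareto_front(options)
```

A front with high spread, in this view, is simply a non-dominated set whose points are far apart in objective space.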

Slide 6: Dynamic Objectives
Multi-Option Plans are a type of Conditional Plan:
– Conditional on the user’s objective function
– Allow the objective function to change
– Ensure that, irrespective of their objective function, the executor will have non-dominated options
[Figure: action graph with branches for options O1, O1a, O2, O2a]

Slide 7: Executing Multi-Option Plans
– A local action choice corresponds to multiple options
– Option values change at each step
[Figure: execution trace showing the (Duration, Cost) Pareto set shrinking from {O1, O1a, O2, O2a} to {O1, O1a} and then {O1} as actions are executed]

Slide 8: Multi-Option Conditional Probabilistic Planning
(PO)MDP setting: (belief) state space search
– Stochastic actions, observations, uncertain initial state, loops
– Two objectives: expected plan cost and probability of plan success
– Traditional reward functions are a linear combination of the above and assume an objective function
Extend LAO* to multiple objectives (Multi-Option LAO*)
– Each generated (belief) state has an associated Pareto set of “best” sub-plans
– Dynamic programming (state backup) combines successor-state Pareto sets
– Yes, it is exponential time per backup per state
♦ There are approximations
– Basic algorithm: while we do not have a good plan
♦ ExpandPlan
♦ RevisePlan
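The basic-algorithm bullet can be sketched as a plain loop. In this sketch, `expand`, `revise`, and `converged` are hypothetical caller-supplied callbacks standing in for ExpandPlan, RevisePlan, and the "have a good plan" test; the demo state is a toy, not a real search graph.

```python
def multi_option_lao_star(root, expand, revise, converged, max_iters=100):
    """Sketch of the Multi-Option LAO* outer loop: while we do not yet
    have a good plan, grow the explicit search graph (ExpandPlan), then
    redo Pareto-set backups on ancestors of expanded nodes (RevisePlan)."""
    iters = 0
    while not converged(root) and iters < max_iters:
        expand(root)   # ExpandPlan: add successors of fringe nodes
        revise(root)   # RevisePlan: recompute ancestor Pareto sets
        iters += 1
    return iters

# Toy demo: pretend convergence is reached after three expansions.
state = {"expansions": 0}
iterations = multi_option_lao_star(
    root=state,
    expand=lambda s: s.update(expansions=s["expansions"] + 1),
    revise=lambda s: None,
    converged=lambda s: s["expansions"] >= 3)
```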

Slide 9: Example of State Backup
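Before the search example, a state backup itself can be sketched. A point here is an (expected cost, Pr(G)) pair; a backed-up point fixes one sub-plan point per successor, which is why the slide notes the backup is exponential per state. The action cost, transition probabilities, and point values below are made up for illustration.

```python
from itertools import product

def dominates(a, b):
    """Point = (cost, pr_goal): lower cost and higher Pr(G) are better."""
    return (a[0] <= b[0] and a[1] >= b[1]) and (a[0] < b[0] or a[1] > b[1])

def prune(points):
    """Drop dominated points, keeping the Pareto set."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

def backup(action_cost, successors):
    """Back up one action.  `successors` is a list of
    (transition probability, successor Pareto set) pairs; each backed-up
    point picks one sub-plan point per successor, so the number of
    combinations is exponential in the number of successors."""
    combined = []
    for choice in product(*[pareto for _, pareto in successors]):
        cost = action_cost + sum(p * pt[0]
                                 for (p, _), pt in zip(successors, choice))
        pr_g = sum(p * pt[1] for (p, _), pt in zip(successors, choice))
        combined.append((cost, pr_g))
    return prune(combined)

# Hypothetical backup: the action costs 1 and has two equally likely
# successors; successor A offers a cheap risky sub-plan and a costly
# safe one, successor B only the null plan.
succ_a = [(0.0, 0.66), (3.0, 1.0)]
succ_b = [(0.0, 0.0)]
result = backup(1.0, [(0.5, succ_a), (0.5, succ_b)])
```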

Slide 10: Search Example – Initially
Initialize the root Pareto set with the null plan and a heuristic estimate.
[Figure: root node with a single point at cost 0.0 in (C, Pr(G)) space]

Slide 11: Search Example – 1st Expansion
Expand the root node and initialize the Pareto sets of its children with the null plan and a heuristic estimate.
[Figure: root expanded via actions a1 and a2; each child carries a (C, Pr(G)) Pareto set]

Slide 12: Search Example – 1st Revision
Recompute the Pareto set for the root; the best heuristic point is through a1.
[Figure: root Pareto set now labeled with a1]

Slide 13: Search Example – 2nd Expansion
Expand the children of a1 and initialize their Pareto sets with the null plan and a heuristic estimate. Both children satisfy the goal with non-zero probability.
[Figure: children reached through a3 and a4 added below a1]

Slide 14: Search Example – 2nd Revision
Recompute the Pareto sets of both expanded nodes and the root node. There is a feasible plan a1, [a4|a3] that satisfies the goal with probability 0.66 and cost 2. The heuristic estimate indicates that extending a1, [a4|a3] will lead to a plan that satisfies the goal with probability 1.0.
[Figure: root Pareto set with the feasible point for a1, [a4|a3] and its heuristic extension]

Slide 15: Search Example – 3rd Expansion
Expand the plan to include a7. There is no applicable action after a3.
[Figure: a7 added after a4; the branch after a3 has no successors]

Slide 16: Search Example – 3rd Revision
Recompute all Pareto sets that are ancestors of the expanded nodes. The heuristic for plans extended through a3 is higher because there is no applicable action. The heuristic at the root node changes to plans extended through a2.
[Figure: root Pareto set with points for a1, [a4, a7|a3] and a1, [a4|a3]]

Slide 17: Search Example – 4th Expansion
Expand the plan through a2; one expanded child satisfies the goal with probability 0.1.
[Figure: children reached through a5 and a6 added below a2]

Slide 18: Search Example – 4th Revision
Recompute the Pareto sets of the expanded nodes’ ancestors. Plan a2, a5 is dominated at the root.
[Figure: root Pareto set after the revision, with a2, a5 dominated]

Slide 19: Search Example – 5th Expansion
Expand the plan through a6.
[Figure: child reached through a8 added below a6]

Slide 20: Search Example – 5th Revision
Recompute the Pareto sets. Plans a2, a6, a8 and a2, a5 are dominated at the root.
[Figure: root Pareto set after the revision, with a2, a6, a8 and a2, a5 dominated]

Slide 21: Search Example – Final
[Figure: final search graph; the non-dominated options at the root are a1, [a4, a7|a3] and a1, [a4|a3]]

Slide 22: Speed-ups
– ε-domination [Papadimitriou & Yannakakis, 2003]
– Randomized node expansions: simulate a partial plan to expand a single node
– Reachability heuristics: use the McLUG (CSSAG)

Slide 23: ε-domination
– x′/x = 1 + ε
– Multiply each objective by (1 + ε), then check domination
– Objectives: Cost and 1 − Pr(G)
– Each hyper-rectangle has a single point
[Figure: dominated vs. non-dominated regions under ε-domination]
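The check on this slide can be sketched directly. Objectives are (Cost, 1 − Pr(G)), both minimized, as on the slide; the point values below are invented for illustration.

```python
def eps_dominates(a, b, eps):
    """a ε-dominates b when a still weakly dominates b even after b's
    objectives are inflated by (1 + eps); objectives are minimized."""
    return all(x <= (1.0 + eps) * y for x, y in zip(a, b))

def eps_prune(points, eps):
    """Keep a subset of points such that every discarded point is
    ε-dominated by a kept one, so each ε-hyper-rectangle contributes at
    most one representative."""
    kept = []
    for p in sorted(points):
        if any(eps_dominates(q, p, eps) for q in kept):
            continue
        kept = [q for q in kept if not eps_dominates(p, q, eps)] + [p]
    return kept

# Points in (cost, 1 - Pr(G)) space; (1.04, 0.5) is within 10% of
# (1.0, 0.5) in every objective, so it is pruned for eps = 0.1.
points = [(1.0, 0.5), (1.04, 0.5), (3.0, 0.1)]
reduced = eps_prune(points, eps=0.1)
```

This is what buys the speed-up: the exact Pareto set can be large, while the ε-cover keeps one point per hyper-rectangle.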

Slide 24: Synthesis Results

Slide 25: Execution Results
– Random Option: sample an option, then execute its action
– Keep Options Open
♦ Most Options: execute the action that is in the most options
♦ Diverse Options: execute the action in the most diverse set of options
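The "Most Options" policy can be sketched as a one-line choice rule. The action-to-options mapping below is a hypothetical first step of the travel example (including a made-up single-option "Cab" action), not data from the paper.

```python
def most_options_action(action_to_options):
    """'Keep options open, Most Options' rule: execute the action that
    keeps the largest number of plan options reachable."""
    return max(action_to_options, key=lambda a: len(action_to_options[a]))

# Hypothetical mapping for the first step of the travel example.
choices = {
    "Train(MP, SFO)":   {"O1", "O1a"},   # keeps both train options open
    "Shuttle(MP, SFO)": {"O2", "O2a"},   # keeps both shuttle options open
    "Cab(MP, SFO)":     {"O3"},          # hypothetical single-option action
}
best = most_options_action(choices)
```

The "Diverse Options" variant would instead score each action by the spread of its option set in objective space rather than by its size.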

Slide 26: Summary & Future Work
Summary
– Multi-Option Plans let the executor delay or change commitments to objective functions
– Multi-Option Plans help the executor understand alternatives
– Multi-Option Plans passively enforce diversity through Pareto set approximation
Future Work
– Synthesis
♦ Proactive diversity: guide search to broaden the Pareto set
♦ Speedups: alternative Pareto set representations, standard MDP tricks
– Execution
♦ Option lookahead: how will the set of options change?
♦ Meta-objectives: diversity, decision delay
– Model-Lite Planning
♦ Unspecified objectives (not just an unspecified objective function)
♦ Objective function preference elicitation

Slide 27: Final Options
Option 1: Questions
Option 2: Criticisms
Option 3: Next Talk!

Slide 28: Overview
Traditional planning assumes the objective is given a priori:
– Con: users must know exactly what they want
– Pro: can synthesize on the fly
Information Retrieval (IR) assumes the user’s keywords constrain the objective:
– Con: relies on an existing term-frequency index, and more…
– Pro: deals with human imprecision
We want planners to:
– Handle underspecified (model-lite) problems, like IR
– Generate diverse but relevant plans
– Treat the objective as one of many possibly underspecified aspects (actions, state, etc.); we only consider the objective here
It’s not so easy: users can change or refine the objective continually. The same happens in IR:
♦ “decision making”
♦ “multi criteria decision making”
♦ “multi criteria decision making tutorial”
♦ “Providence weather”
Solution at a glance:
– If the objective is undefined, get a Pareto set
– If execution must start, keep options open
– If the objective is known, follow the best-fitting plan in the Pareto set
– If the objective changes, follow the best-fitting plan in the Pareto set
The trick: capture structure in the Pareto set.
The case study: conditional probabilistic planning, where Objective = f(Plan Cost, Plan Success).
