Planning Chapter 11 Yet another popular formulation for AI – Logic-based language – One of the most structured formulations Can be translated into less structured formulations such as state-space, CSP, SAT, etc.
What is Planning Generate sequences of actions to perform tasks and achieve objectives. – States, actions and goals Objective: John wants to go to NYC Plan: taxi(John,home,STL) board(John,plane) fly(plane,STL,JFK) exit(John,plane) taxi(John,JFK,5th Ave) Classical planning environment: fully observable, deterministic, finite, and discrete.
Difficulty of real world problems What’s wrong with state-space search, e.g. A*? – Which actions are relevant? Too many irrelevant actions: board(Mary,plane), board(Tim,plane), taxi(John, home, Six Flags), fly(plane, LAX, SFO), … – What is a good heuristic function? It is difficult to come up with domain-independent heuristics if only a state-space description is given – How to decompose the problem? Most real-world problems are nearly decomposable. E.g. John, Mary, and Tim want to go to different cities
Planning language What is a good language? – Expressive enough to describe a wide variety of problems. – Restrictive enough to allow efficient algorithms to operate on it. – Planning algorithm should be able to take advantage of the logical structure of the problem. STRIPS
General STRIPS features Representation of states – Represent the state of the world as a conjunction of positive literals. Propositional literals: Poor ∧ Unknown First order predicates: At(Plane1, Melbourne) ∧ At(Plane2, Sydney) – Closed world assumption (anything not explicitly claimed true is assumed false) Representation of goals – Partially specified state, represented as a conjunction of positive ground literals: At(John, NYC) ∧ At(Mary, LA) – A goal is satisfied if the state contains all literals in the goal
General STRIPS features Representations of actions – Action = PRECOND + EFFECT Action(Fly(p, from, to), PRECOND: At(p,from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to) EFFECT: ¬At(p,from) ∧ At(p,to)) (p, from, to) need to be instantiated, e.g. Fly(plane, STL, JFK) – Add-list vs delete-list in Effect
Language semantics? How do actions affect states? – An action is applicable in any state that satisfies the preconditions Suppose current state: At(P1,JFK) ∧ At(P2,SFO) ∧ Plane(P1) ∧ Plane(P2) ∧ Airport(JFK) ∧ Airport(SFO) Satisfies PRECOND: At(p,from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to) Thus the action Fly(P1,JFK,SFO) is applicable.
Language semantics? The result of executing action a in state s is the state s’ – s’ is the same as s except Any positive literal P in the effect of a is added to s’ Any negative literal ¬P is removed from s’ EFFECT: ¬At(p,from) ∧ At(p,to): At(P1,SFO) ∧ At(P2,SFO) ∧ Plane(P1) ∧ Plane(P2) ∧ Airport(JFK) ∧ Airport(SFO) – STRIPS assumption: every literal NOT in the effect remains unchanged
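The applicability test and the result function described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the names `Action`, `applicable` and `result` are assumptions, states are modelled as sets of ground positive literals (closed world assumption), and the effect is split into an add-list and a delete-list.

```python
from typing import FrozenSet, NamedTuple

State = FrozenSet[str]  # a state is a set of ground positive literals


class Action(NamedTuple):
    """A ground STRIPS action (hypothetical helper type for this sketch)."""
    name: str
    precond: FrozenSet[str]  # positive literals that must hold
    add: FrozenSet[str]      # add-list: positive effects
    delete: FrozenSet[str]   # delete-list: negated effects


def applicable(state: State, action: Action) -> bool:
    # Applicable iff every precondition literal is contained in the state.
    return action.precond <= state


def result(state: State, action: Action) -> State:
    # STRIPS assumption: literals not mentioned in the effect are unchanged.
    if not applicable(state, action):
        raise ValueError(f"{action.name} is not applicable")
    return (state - action.delete) | action.add


fly = Action(
    name="Fly(P1,JFK,SFO)",
    precond=frozenset({"At(P1,JFK)", "Plane(P1)", "Airport(JFK)", "Airport(SFO)"}),
    add=frozenset({"At(P1,SFO)"}),
    delete=frozenset({"At(P1,JFK)"}),
)
s = frozenset({"At(P1,JFK)", "At(P2,SFO)", "Plane(P1)", "Plane(P2)",
               "Airport(JFK)", "Airport(SFO)"})
s_next = result(s, fly)
print(sorted(s_next))
```

Using immutable `frozenset` values keeps states hashable, which is convenient when they are later stored in a visited set during search.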
Expressiveness and extensions STRIPS is simplified – Important limit: function-free literals Allows for propositional representation Function symbols lead to infinitely many states and actions Extension: Action Description Language (ADL) Action(Fly(p: Plane, from: Airport, to: Airport), PRECOND: At(p,from) ∧ (from ≠ to) EFFECT: ¬At(p,from) ∧ At(p,to)) Standardization: Planning Domain Definition Language (PDDL)
State-Space vs. Planning Formalisms
Aspect | State-Space Formalism | STRIPS Planning
Atomic object | State, action | propositional object
State | Atomic | Conjunction of predicates
Action | Implicitly defined by (state, state) transitions | Explicitly defined by preconditions & effects
Initial state | Atomic | Conjunction of predicates (full)
Goal state | Goal test | Conjunction of predicates (partial)
Expressiveness | Higher | Lower
Heuristic | Domain dependent | Domain independent
Solving STRIPS planning First of all, look at the goals – Blocksworld: Goals are on(A,B) and on(B,C) Can we do linear planning? – Achieve goal1 – Achieve goal2 while keeping goal1 – Achieve goal3 while keeping goal1, goal2 … – Achieve goal-n while keeping goals 1 to n-1 – If so, a large planning task can be decomposed
The Sussman Anomaly
The Rocket Problem (Veloso) Objects: n boxes, Positions (Earth, Moon), one Rocket Actions: load/unload a box, move the Rocket (one-way: only from Earth to Moon, no way back!) Linear planning: to reach the goal that Box1 is on the Moon, load it, shoot the Rocket, unload it; now no other Box can be transported!
Interleaving of Goals Linear planning is generally not viable Non-linear planning allows that a sequence of planning steps dealing with one goal is interrupted to deal with another goal. For the Sussman Anomaly, that means that after block C is put on the table pursuing goal on(A, B), the planner switches to the goal on(B, C). Non-linear planning corresponds to dealing with goals organized in a set.
Planning by state-space search Both forward and backward search possible – Using a state-space formulation for the planning problem Progression planners – forward state-space search – Consider the effect of all possible actions in a given state Regression planners – backward state-space search – To achieve a goal, what must have been true in the previous state.
Progression and regression
Progression algorithm Formulation as state-space search problem: – Initial state = initial state of the planning problem Literals not appearing are false – Actions = those whose preconditions are satisfied Add positive effects, delete negative – Goal test = does the state satisfy the goal – Step cost = each action costs 1 Any graph search that is complete is a complete planning algorithm. – E.g. A* Inefficient: – (1) irrelevant action problem – (2) good heuristic required for efficient search
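The progression formulation above can be sketched as a breadth-first forward search. This is an illustrative sketch only: the helper names `ground_fly_actions` and `progression_plan` are assumptions, actions are plain tuples (name, preconditions, add-list, delete-list), and breadth-first search stands in for "any complete graph search" (A* with a heuristic would plug into the same structure).

```python
from collections import deque
from itertools import permutations


def ground_fly_actions(planes, airports):
    """Ground every Fly(p, a, b) as (name, precond, add, delete)."""
    actions = []
    for p in planes:
        for a, b in permutations(airports, 2):
            actions.append((f"Fly({p},{a},{b})",
                            frozenset({f"At({p},{a})"}),
                            frozenset({f"At({p},{b})"}),
                            frozenset({f"At({p},{a})"})))
    return actions


def progression_plan(init, goal, actions):
    """Breadth-first forward search; every action costs 1."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                      # goal test
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:                   # action is applicable
                nxt = (state - delete) | add   # delete negative, add positive
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None                                # no plan exists


actions = ground_fly_actions(["P1", "P2"], ["SFO", "JFK"])
init = {"At(P1,SFO)", "At(P2,JFK)"}
goal = frozenset({"At(P1,JFK)", "At(P2,SFO)"})
plan = progression_plan(init, goal, actions)
print(plan)
```

Note that the loop tries every applicable action in every state, which is exactly the irrelevant-action problem the slide mentions.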
Regression algorithm How to determine predecessors? – What are the states from which applying a given action leads to the goal? Goal state = At(C1, B) ∧ At(C2, B) ∧ … ∧ At(C20, B) Relevant action for first conjunct: Unload(C1,p,B) Works only if pre-conditions are satisfied. Previous state = In(C1, p) ∧ At(p, B) ∧ At(C2, B) ∧ … ∧ At(C20, B) Subgoal At(C1,B) should not be present in this state. Actions must not undo desired literals (consistent) Load(C2,p,B) is not consistent Main advantage: only relevant actions are considered. – Often much lower branching factor than forward search.
Regression algorithm General process for predecessor construction – Given a goal description G – Let A be an action that is relevant and consistent – The predecessor is constructed as follows: Any positive effects of A that appear in G are deleted. Each precondition literal of A is added, unless it already appears. Any standard search algorithm can be used to perform the search. Termination when the predecessor is satisfied by the initial state.
Heuristics for state-space search Neither progression nor regression is very efficient without a good heuristic. – How many actions are needed to achieve the goal? – Finding the exact answer is NP-hard, so find a good estimate – Could use the optimal solution to a relaxed problem: Remove all preconditions from actions, or Remove all delete effects from actions
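The predecessor construction can be sketched as a single function. This is a minimal sketch under stated assumptions: the function name `regress` is hypothetical, the goal is a set of ground positive literals, and the schema variable p from the slide is replaced by a ground plane `P`. The example uses two cargo items instead of twenty.

```python
def regress(goal, precond, add, delete):
    """Return the predecessor goal of `goal` through one ground action,
    or None when the action is not relevant or not consistent."""
    relevant = bool(add & goal)          # achieves at least one goal literal
    consistent = not (delete & goal)     # undoes no desired literal
    if not (relevant and consistent):
        return None
    # Delete achieved positive effects, then add the preconditions.
    return (goal - add) | precond


goal = frozenset({"At(C1,B)", "At(C2,B)"})

# Unload(C1,P,B): relevant for At(C1,B) and consistent with the goal.
pred = regress(goal,
               precond=frozenset({"In(C1,P)", "At(P,B)"}),
               add=frozenset({"At(C1,B)"}),
               delete=frozenset({"In(C1,P)"}))
print(sorted(pred))

# Load(C2,P,B): deletes At(C2,B), a desired literal, so it is rejected.
rejected = regress(goal,
                   precond=frozenset({"At(C2,B)", "At(P,B)"}),
                   add=frozenset({"In(C2,P)"}),
                   delete=frozenset({"At(C2,B)"}))
print(rejected)
```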
Planning graphs Blum & Furst (1996) – A significant breakthrough in planning research A structure that gives rich information of the planning domain Can be used to 1) directly extract solutions, or 2) achieve better heuristic estimates.
Planning graphs Consists of a sequence of levels that correspond to time steps in the plan. – Level 0 is the initial state. – Each level consists of a set of literals and a set of actions. Literals = all those that could be true at that time step, depending upon the actions executed at the preceding time step Actions = all those actions that could have their preconditions satisfied at that time step, depending on which of the literals actually hold
Planning graphs “Could”? – Records only a restricted subset of possible negative interactions among actions. Example: Init(Have(Cake)) Goal(Have(Cake) ∧ Eaten(Cake)) Action(Eat(Cake), PRECOND: Have(Cake) EFFECT: ¬Have(Cake) ∧ Eaten(Cake)) Action(Bake(Cake), PRECOND: ¬Have(Cake) EFFECT: Have(Cake))
Cake example Start at level S0 and determine action level A0 and next level S1. – A0: all actions whose preconditions are satisfied in the previous level. – Connect precond and effect of actions S0 --> S1 – Inaction is represented by persistence actions. Level A0 contains the actions that could occur – Conflicts between actions are represented by mutex links
Cake example Level S1 contains all literals that could result from picking any subset of actions in A0 – Conflicts between literals that cannot occur together (as a consequence of the selected actions) are represented by mutex links. – S1 defines multiple states and the mutex links are the constraints that define this set of states. Continue until two consecutive levels are identical: the graph has leveled off
Cake example A mutex relation holds between two actions when: – Inconsistent effects: one action negates the effect of another. – Interference: one of the effects of one action is the negation of a precondition of the other. – Competing needs: one of the preconditions of one action is mutually exclusive with the precondition of the other. A mutex relation holds between two literals when (inconsistent support): – If one is the negation of the other OR – if each possible action pair that could achieve the literals is mutex.
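The three action-mutex conditions can be checked mechanically. The sketch below is illustrative only: the names `neg` and `mutex_reasons` are assumptions, literals are strings with a leading "¬" for negation, actions are dictionaries with precondition and effect sets, and `literal_mutex` holds the literal pairs already marked mutex at the previous level.

```python
def neg(lit):
    """Negate a literal written as 'P' or '¬P'."""
    return lit[1:] if lit.startswith("¬") else "¬" + lit


def mutex_reasons(a, b, literal_mutex=frozenset()):
    """Return the set of reasons why actions a and b are mutex (empty if not)."""
    reasons = set()
    # Inconsistent effects: one action negates an effect of the other.
    if any(neg(e) in b["eff"] for e in a["eff"]):
        reasons.add("inconsistent effects")
    # Interference: an effect of one negates a precondition of the other.
    if (any(neg(e) in b["pre"] for e in a["eff"])
            or any(neg(e) in a["pre"] for e in b["eff"])):
        reasons.add("interference")
    # Competing needs: some pair of preconditions is mutually exclusive.
    if any(neg(p) == q or frozenset((p, q)) in literal_mutex
           for p in a["pre"] for q in b["pre"]):
        reasons.add("competing needs")
    return reasons


eat = {"pre": {"Have(Cake)"}, "eff": {"¬Have(Cake)", "Eaten(Cake)"}}
bake = {"pre": {"¬Have(Cake)"}, "eff": {"Have(Cake)"}}
keep_have = {"pre": {"Have(Cake)"}, "eff": {"Have(Cake)"}}            # persistence
keep_not_eaten = {"pre": {"¬Eaten(Cake)"}, "eff": {"¬Eaten(Cake)"}}   # persistence

print(mutex_reasons(eat, keep_have))
print(mutex_reasons(bake, eat))
print(mutex_reasons(keep_have, keep_not_eaten))
```

Persistence actions are modelled as ordinary actions whose single precondition equals their single effect, so the same test covers them.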
Mutex Mutex in Graphplan is not complete (two actions that are not mutex may still be incompatible) – But useful Original mutex is level-dependent – But expensive to compute – Used by GRAPHPLAN Level-independent mutex (called persistent mutex) is a subset of mutex but efficient to compute – Used by some other planners such as LPG, SATPLAN
Another GRAPHPLAN example PG are monotonically increasing or decreasing: – Literals increase monotonically – Actions increase monotonically – Mutexes decrease monotonically Because of these properties and because there is a finite number of actions and literals, every PG will eventually level off !
The GRAPHPLAN Algorithm How to extract a solution directly from the PG
function GRAPHPLAN(problem) returns solution or failure
  graph ← INITIAL-PLANNING-GRAPH(problem)
  goals ← GOALS[problem]
  loop do
    if goals all non-mutex in last level of graph then do
      solution ← EXTRACT-SOLUTION(graph, goals, LENGTH(graph))
      if solution ≠ failure then return solution
      else if NO-SOLUTION-POSSIBLE(graph) then return failure
    graph ← EXPAND-GRAPH(graph, problem)
GRAPHPLAN example Initially the plan consists of 5 literals from the initial state and the CWA literals (S0). Add actions whose preconditions are satisfied by EXPAND-GRAPH (A0) Also add persistence actions and mutex relations. Add the effects at level S1 Repeat until goal is in level Si
GRAPHPLAN example EXPAND-GRAPH also looks for mutex relations – Inconsistent effects E.g. Remove(Spare, Trunk) and LeaveOverNight due to At(Spare,Ground) and not At(Spare,Ground) – Interference E.g. Remove(Flat, Axle) and LeaveOverNight: At(Flat,Axle) as PRECOND and not At(Flat,Axle) as EFFECT – Competing needs E.g. PutOn(Spare,Axle) and Remove(Flat, Axle) due to At(Flat,Axle) and not At(Flat,Axle) – Inconsistent support E.g. in S2, At(Spare,Axle) and At(Flat,Axle)
GRAPHPLAN example In S2, the goal literals exist and are not mutex with any other – Solution might exist and EXTRACT-SOLUTION will try to find it EXTRACT-SOLUTION: – Initial state = last level of PG and the goals of the planning problem – Actions = select any set of non-conflicting actions that cover the goals in the state – Goal = reach level S0 such that all goals are satisfied – Cost = 1 for each action.
GRAPHPLAN example Termination? YES PG are monotonically increasing or decreasing: – Literals increase monotonically – Actions increase monotonically – Mutexes decrease monotonically Because of these properties and because there is a finite number of actions and literals, every PG will eventually level off !
PG and heuristic estimation PG’s provide information about the problem – A literal that does not appear in the final level of the graph cannot be achieved by any plan. Useful for backward search (cost = inf). – Level of appearance can be used as cost estimate of achieving any goal literals = level cost. – Small problem: several actions can occur Restrict to one action using serial PG (add mutex links between every pair of actions, except persistence actions). – Cost of a conjunction of goals? Max-level, sum-level, set-level heuristics – The Fast Forward heuristic (Hoffmann’00) Extract a plan from a relaxed planning graph (RPG) Not admissible but performs well
June 7th, 2006 ICAPS'06 Tutorial T6 46 Rover Domain
Level Based Heuristics The distance of a proposition is the index of the first proposition layer in which it appears – Proposition distance changes when we propagate cost functions – described later What is the distance of a Set of propositions? – Set-Level: Index of first proposition layer where all goal propositions appear Admissible Gets better with mutexes, otherwise same as max – Max: Maximum distance proposition – Sum: Summation of proposition distances
Example of Level Based Heuristics set-level(s_I, G) = 3 max(s_I, G) = max(2, 3, 3) = 3 sum(s_I, G) = 2 + 3 + 3 = 8
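The max and sum heuristics can be computed from the first layer at which each proposition appears. The sketch below is an assumption-laden illustration, not the tutorial's code: it builds a relaxed planning graph (no delete effects, no mutexes, so set-level coincides with max here), the function names are hypothetical, and the small rover-like problem is invented for the example rather than taken from the tutorial's Rover domain.

```python
import math


def proposition_levels(init, actions):
    """First layer index at which each proposition appears, ignoring
    delete effects and mutexes. `actions` is a list of (precond, add)."""
    level = {p: 0 for p in init}
    k = 0
    while True:
        k += 1
        new = {}
        for pre, add in actions:
            if all(p in level for p in pre):   # applicable at layer k-1
                for q in add:
                    if q not in level and q not in new:
                        new[q] = k
        if not new:                            # graph has leveled off
            return level
        level.update(new)


def h_max(level, goal):
    return max((level.get(g, math.inf) for g in goal), default=0)


def h_sum(level, goal):
    return sum(level.get(g, math.inf) for g in goal)


# Hypothetical rover-like problem, invented for this sketch.
actions = [
    ({"at(alpha)"}, {"at(beta)"}),         # drive(alpha, beta)
    ({"at(beta)"}, {"at(gamma)"}),         # drive(beta, gamma)
    ({"at(alpha)"}, {"have(soil)"}),       # sample(soil, alpha)
    ({"at(gamma)"}, {"have(image)"}),      # image(gamma)
    ({"have(soil)"}, {"comm(soil)"}),      # communicate(soil)
    ({"have(image)"}, {"comm(image)"}),    # communicate(image)
]
level = proposition_levels({"at(alpha)"}, actions)
goal = {"comm(soil)", "comm(image)"}
print(h_max(level, goal), h_sum(level, goal))
```

A goal proposition that never appears gets distance infinity, matching the observation that a literal absent from the final level cannot be achieved by any plan.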
How do Level-Based Heuristics Break? [Figure: planning graph with layers P0, A0, P1, A1, P2. From q, actions B1, …, B100 in A0 support p1, …, p100 in P1; action A in A1 then supports g in P2.] The goal g is reached at level 2, but requires 101 actions to support it.
Relaxed Plan Heuristics When level does not reflect distance well, we can find a relaxed plan. A relaxed plan is a subgraph of the planning graph, where: – Every goal proposition is in the relaxed plan at the level where it first appears – Every proposition in the relaxed plan has a supporting action in the relaxed plan – Every action in the relaxed plan has its preconditions supported. Relaxed plans are not admissible, but are generally effective. Finding the optimal relaxed plan is NP-hard, but finding a greedy one is easy.
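A greedy relaxed plan can be extracted by recording, for each proposition, the first action that supports it and then walking back from the goals. The sketch below is one possible greedy scheme under stated assumptions, not the tutorial's implementation: the function name `relaxed_plan` is hypothetical, delete effects and mutexes are ignored, and the test problem mirrors the 101-action example with propositions q, p1, …, p100 and g.

```python
def relaxed_plan(init, goal, actions):
    """Greedy relaxed plan. `actions` maps name -> (precond, add).
    Returns (set of action names, level at which all goals appear),
    or None if some goal is unreachable even in the relaxation."""
    level = {p: 0 for p in init}
    achiever = {}                       # proposition -> first supporting action
    k = 0
    while not all(g in level for g in goal):
        k += 1
        new = {}
        for name, (pre, add) in actions.items():
            if all(p in level for p in pre):
                for q in add:
                    if q not in level and q not in new:
                        new[q] = name
        if not new:
            return None
        for q, name in new.items():
            level[q] = k
            achiever[q] = name
    # Backward pass: support each open proposition with its first achiever.
    plan, stack = set(), list(goal)
    while stack:
        p = stack.pop()
        if level[p] == 0:               # already true in the initial state
            continue
        a = achiever[p]
        if a not in plan:
            plan.add(a)
            stack.extend(actions[a][0])  # its preconditions need support too
    return plan, max((level[g] for g in goal), default=0)


# q supports B1..B100, which yield p1..p100; A needs all of them to give g.
actions = {f"B{i}": ({"q"}, {f"p{i}"}) for i in range(1, 101)}
actions["A"] = ({f"p{i}" for i in range(1, 101)}, {"g"})
plan, goal_level = relaxed_plan({"q"}, {"g"}, actions)
print(goal_level, len(plan))
```

Here the level-based estimate is 2 while the relaxed plan counts every supporting action, which is the gap the previous slide points out.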
Example of Relaxed Plan Heuristic – Identify goal propositions – Support goal propositions individually – Count actions: RP(s_I, G) = 8
Results Relaxed Plan Level-Based
Results (cont’d) Relaxed Plan Level-Based
Rover Cost Model
Terminating Cost Propagation Stop when: – goals are reached (no-lookahead) – costs stop changing (∞-lookahead) – k levels after goals are reached (k-lookahead)
Guiding Relaxed Plans with Costs Start extraction at the first level where the goal proposition is cheapest
Planning with propositional logic Convert STRIPS encoding to SAT Here: test the satisfiability of a logical sentence. The sentence contains propositions for every action occurrence. – If the planning problem is unsolvable, the sentence will be unsatisfiable. Benefit: take advantage of SAT solvers, e.g. DPLL algorithms, which can be pretty fast Limitation: expressiveness
SATPLAN algorithm
function SATPLAN(problem, T_max) returns solution or failure
  inputs: problem, a planning problem
          T_max, an upper limit to the plan length
  for T = 0 to T_max do
    cnf, mapping ← TRANSLATE-TO_SAT(problem, T)
    assignment ← SAT-SOLVER(cnf)
    if assignment is not null then
      return EXTRACT-SOLUTION(assignment, mapping)
  return failure
Overall idea: try T = 0, 1, 2, 3, … until the first T that gives a solution
cnf, mapping ← TRANSLATE-TO_SAT(problem, T) Distinct propositions for assertions about each time step. – Superscripts (written ^t here) denote the time step: At(P1,SFO)^0 ∧ At(P2,JFK)^0 – Specify which propositions are not true: ¬At(P1,JFK)^0 ∧ ¬At(P2,SFO)^0 – Unknown propositions are left unspecified. The goal is associated with a particular time step – But which one?
cnf, mapping ← TRANSLATE-TO_SAT(problem, T) How to determine the time step where the goal will be reached? – Start at T=0 Assert At(P1,JFK)^0 ∧ At(P2,SFO)^0 – Failure. Try T=1 Assert At(P1,JFK)^1 ∧ At(P2,SFO)^1 – … – Repeat this until some minimal path length is reached. – Termination is ensured by T_max
cnf, mapping ← TRANSLATE-TO_SAT(problem, T) How to encode actions into PL? – Propositional versions of successor-state axioms: At(P1,JFK)^1 ⇔ (At(P1,JFK)^0 ∧ ¬Fly(P1,JFK,SFO)^0) ∨ (Fly(P1,SFO,JFK)^0 ∧ At(P1,SFO)^0) – Such an axiom is required for each plane, airport and time step Once all these axioms are in place, the satisfiability algorithm can start to find a plan.
assignment ← SAT-SOLVER(cnf) Multiple models can be found. Are they all good?
assignment ← SAT-SOLVER(cnf) Multiple models can be found They are NOT all satisfactory: (for T=1) Fly(P1,SFO,JFK)^0 ∧ Fly(P1,JFK,SFO)^0 ∧ Fly(P2,JFK,SFO)^0 The second action is infeasible Yet the plan IS a model of the sentence Avoiding illegal actions: pre-condition axioms Fly(P1,JFK,SFO)^0 ⇒ At(P1,JFK)^0 Exactly one model now satisfies all the axioms where the goal is achieved at T=1.
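The effect of the precondition axioms can be checked on the two-plane, two-airport problem by enumerating every truth assignment for T=1. The sketch below is illustrative only: it uses brute-force enumeration rather than a real SAT solver such as DPLL, evaluates the axioms directly instead of converting them to CNF, and the names `holds` and `models` are assumptions.

```python
from itertools import product

PLANES = ["P1", "P2"]
AIRPORTS = ["SFO", "JFK"]


def other(a):
    return "JFK" if a == "SFO" else "SFO"


# One proposition per fluent and time step, one per action at step 0.
VARS = ([f"At({p},{a})^{t}" for p in PLANES for a in AIRPORTS for t in (0, 1)]
        + [f"Fly({p},{a},{other(a)})^0" for p in PLANES for a in AIRPORTS])


def holds(m, with_preconditions):
    """True iff assignment m satisfies initial state, goal and axioms."""
    # Initial state, including the propositions that are not true.
    if not (m["At(P1,SFO)^0"] and m["At(P2,JFK)^0"]):
        return False
    if m["At(P1,JFK)^0"] or m["At(P2,SFO)^0"]:
        return False
    # Goal asserted at T = 1.
    if not (m["At(P1,JFK)^1"] and m["At(P2,SFO)^1"]):
        return False
    for p in PLANES:
        for b in AIRPORTS:
            a = other(b)
            # Successor-state axiom for At(p,b)^1.
            stays = m[f"At({p},{b})^0"] and not m[f"Fly({p},{b},{a})^0"]
            arrives = m[f"Fly({p},{a},{b})^0"] and m[f"At({p},{a})^0"]
            if m[f"At({p},{b})^1"] != (stays or arrives):
                return False
            # Precondition axiom: Fly(p,a,b)^0 implies At(p,a)^0.
            if (with_preconditions and m[f"Fly({p},{a},{b})^0"]
                    and not m[f"At({p},{a})^0"]):
                return False
    return True


def models(with_preconditions):
    """Return the set of true Fly propositions for every model."""
    found = []
    for values in product([False, True], repeat=len(VARS)):
        m = dict(zip(VARS, values))
        if holds(m, with_preconditions):
            found.append({v for v in VARS if v.startswith("Fly") and m[v]})
    return found


print(len(models(False)), len(models(True)))
```

Without the precondition axioms the spurious flights are unconstrained, so several models exist; adding them leaves only the intended plan.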
assignment ← SAT-SOLVER(cnf) What if there is an LAX in the world? A plane can fly to two destinations at once Some models are NOT satisfactory: (for T=1) Fly(P1,SFO,JFK)^0 ∧ Fly(P2,JFK,SFO)^0 ∧ Fly(P2,JFK,LAX)^0 P2 cannot fly to both SFO and LAX in the same step Avoid spurious solutions: action mutual exclusion axioms ¬(Fly(P2,JFK,SFO)^0 ∧ Fly(P2,JFK,LAX)^0)
SATPLAN Convert STRIPS to planning graph Convert planning graph to SAT Hybrid encoding – Literals for both actions and facts Action-based encoding: – Literals for actions only – Preconditions are implied by action relations, e.g. a^t ⇒ f^t and f^t ⇒ b1^(t-1) ∨ b2^(t-1) ∨ b3^(t-1) convert to: a^t ⇒ b1^(t-1) ∨ b2^(t-1) ∨ b3^(t-1)
Optimality of SATPLAN Optimal in terms of – The makespan (the length of a parallel plan) – An example solution plan Step 1: move(a,b,g), move(b,c,h) Step 2: move(a, g,table), move(c,s,d), move(e,f,m) Step 3: move(s,e,g) …. Step N: move(b,c,n), move(g,e,table) – SATPLAN finds the shortest parallel plan
Optimality of SATPLAN However, SATPLAN is not optimal for other preference metrics, such as – Total number of actions – Total action costs – Functional preferences f(a_1, a_2, …, a_n) A solution – Completely solve the SAT instance to optimize the given metric – Use bounding and pruning techniques to speed up search
Extend SATPLAN to Temporally Expressive Problems What is “Temporally Expressive”? – Two actions need to be executed in parallel in any valid plan (required concurrency). [Figure: timeline with action1 and action2; one action adds condition f and later deletes it, and the other requires condition f, so the two must overlap in time.]
Facts about Temporally Expressive Planning Many existing temporal planners are not able to handle temporally expressive problems Most planning benchmarks (e.g. in IPC3-IPC5) are not temporally expressive [W. Cushing et al.: ICAPS-2007] Required concurrency is important in the real world – PDDL2.1 actually supports temporally expressive problems
Some Existing Temporally Expressive Planners State space search planners – Crikey2 and Crikey3, and variants Partial order planner – VHPOP Others – TLP-GP
SAT-Based Temporally Expressive Planning Discrete time Optimal in terms of overall timespan Transform the original temporal planning problem: – Model a durative action as a linked pair of simple actions – For each action, a new fact is used to indicate that it is executing [Long & Fox 2003] Formulate the transformed temporal planning problem into a SAT encoding Solve a sequence of SAT instances from time= 1 to time = N, with an increment of δ, until a solution is found [Ray & Ginsberg 2008]
SAT Encoding for Temporal Planning Variables
SAT Encoding for Temporal Planning Clauses
SAT Encoding for Temporal Planning Clauses
Modeling P2P Network Optimization – Large room to improve the overall network throughput under topology constraints, file availability constraints, and upload/download speed constraints
Represent the problem using PDDL2.1 Optimize the overall time to get all the peers’ data delivered – Optimize overall network throughput Main constraints – To download a file, a computer must be routed to a serving computer that has a copy of the file and is serving the file – Each computer can serve at most one file at any time
Properties Different topologies of the networks Objects: – Files, with different sizes (which take different times to transfer) – Computers/Peers Initial State: – Each computer may have its own collection of files Goal: – Each computer wants to download a set of files
Required Concurrency in P2P network Required Concurrency – The serving computers – The downloading computers [Figure: timeline; a serving computer starts to serve file A, serves it, then serving ends; the peers download A while it is being served.]
Results – (Original Match-lift Domain) Experimental results on the Match-lift domain. The numbers in Columns L, M, R, U represent the numbers of floors, matches, rooms and fuses. Low concurrency as M=U.
Results (Match-lift with more concurrency) Experimental results on the Match-lift variant, which has more required concurrency (fewer matches, longer durations). Look at #12.
Results – P2P Network Optimization Domain Experimental results on the P2P domain. Column ‘P’ is the instance ID. Columns ‘C’ and ‘F’ are the numbers of peers and files, respectively, in the network. Columns ‘Time’ and ‘QAL’ are the running time and time span of solutions, respectively are much harder (topology, files)
Driverslogshift Domain Long durative actions Crikey2 sometimes generates flawed plans
Results – Search-based planners can be very efficient when problems have a low degree of required concurrency, e.g. the original Match-lift domain – SAT-based planners are suitable for problems requiring a high degree of concurrency, e.g. P2P and the Match-lift domain with more required concurrency [Figure: two timelines contrasting few overlapping actions with many overlapping actions.]