
1 CISC453 Winter 2010 Planning & Acting in the Real World AIMA3e Ch 11: Time & Resources, Hierarchical Techniques, Relaxing Environmental Assumptions

2 Overview  extending planning language & algorithms  1. allow actions that have durations & resource constraints  yields a new "scheduling problem" paradigm  incorporating action durations & timing, required resources  2. hierarchical planning techniques  control the complexity of large scale plans by hierarchical structuring of actions  3. uncertain environments  non-deterministic domains  4. multiagent environments 2 Planning & Acting in the Real World

3 Scheduling versus Planning  recall from classical planning (Ch 10)  PDDL representations only allowed us to decide the relative ordering among planning actions  up till now we've concentrated on what actions to do, given their PRECONDs & EFFECTs  in the real world, other properties must be considered  actions occur at particular moments in time, have a beginning and an end, occupy or require a certain amount of time  for a new category of Scheduling Problems we need to consider the absolute times when an event or action will occur & the durations of the events or actions  typically these are solved in 2 phases: planning then scheduling  a planning phase selects actions, respecting ordering constraints  this might be done by a human expert, and automated planners are suitable if they yield minimal ordering constraints  then a scheduling phase incorporates temporal information so that the result meets resource & deadline constraints 3

4 Time, Schedules & Resources  the Job-Shop Scheduling (JSS) paradigm includes  the requirement to complete a set of jobs  each job consists of a sequence of actions with ordering constraints  each action  has a given duration and may also require some resources  resource constraints indicate the type of resource, the number of it that are required, and whether the resource is consumed in the action or is reusable  the goal is to determine a schedule  one that minimizes the total time required to complete all jobs, (the makespan)  while respecting resource requirements & constraints 4 Planning & Acting in the Real World

5 Job-Shop Scheduling Problem (JSSP)  JSSP involves a list of jobs to do  where a job is a fixed sequence of actions  actions have quantitative time durations & ordering constraints  actions use resources (which may be shared among jobs)  to solve the JSSP: find a schedule that  determines a start time for each action  1. that obeys all hard constraints  e.g. no temporal overlap between mutex actions (those using the same one-action-at-a-time resource)  2. for our purposes, we'll operationalize cost as the total time to perform all actions and jobs  note that the cost function could be more complex (it could include the resources used, time delays incurred,...)  our example: automobile assembly scheduling  the jobs: assemble two cars  each job has 3 actions: add the engine, add the wheels, inspect the whole car  a resource constraint is that we do the engine & wheel actions at a special one-car-only work station 5

6 Ex: Car Construction Scheduling  the job shop scheduling problem of assembling 2 cars  includes required times & resource constraints  notation: A < B indicates action A must precede action B

Jobs({AddEngine1 < AddWheels1 < Inspect1},
     {AddEngine2 < AddWheels2 < Inspect2})
Resources(EngineHoists(1), WheelStations(1), Inspectors(2), LugNuts(500))
Action(AddEngine1, DURATION: 30, USE: EngineHoists(1))
Action(AddEngine2, DURATION: 60, USE: EngineHoists(1))
Action(AddWheels1, DURATION: 30, CONSUME: LugNuts(20), USE: WheelStations(1))
Action(AddWheels2, DURATION: 15, CONSUME: LugNuts(20), USE: WheelStations(1))
Action(Inspect_i, DURATION: 10, USE: Inspectors(1))

6 Planning & Acting in the Real World
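one way to make the example concrete: a minimal Python encoding of the car-assembly problem above, used by the scheduling sketches that follow; the dictionary layout is an assumption chosen for illustration, not AIMA's code (consumable LugNuts are omitted since only reusable resources constrain this schedule)

DURATION = {
    "AddEngine1": 30, "AddEngine2": 60,
    "AddWheels1": 30, "AddWheels2": 15,
    "Inspect1": 10, "Inspect2": 10,
}
# one (A, B) pair per precedence link A < B
ORDER = [
    ("AddEngine1", "AddWheels1"), ("AddWheels1", "Inspect1"),
    ("AddEngine2", "AddWheels2"), ("AddWheels2", "Inspect2"),
]
# reusable resources required while each action runs
USES = {
    "AddEngine1": {"EngineHoists": 1}, "AddEngine2": {"EngineHoists": 1},
    "AddWheels1": {"WheelStations": 1}, "AddWheels2": {"WheelStations": 1},
    "Inspect1": {"Inspectors": 1}, "Inspect2": {"Inspectors": 1},
}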

7 Car Construction Scheduling  note that the action schemas  list resources as numerical quantities, not named entities  so Inspectors(2), rather than Inspector(I 1 ) & Inspector(I 2 )  this process of aggregation is a general one  it groups objects that are indistinguishable with respect to the current purpose  this can help reduce complexity of the solution  for example, a candidate schedule that requires (concurrently) more than the number of aggregated resources can be rejected without having to exhaustively try assignments of individuals to actions 7 Planning & Acting in the Real World

8 Planning + Scheduling for JSSP  Planning + Scheduling for Job-Shop Problems  scheduling differs from the standard planning problem  it considers when an action starts and when it ends  so in addition to ordering (planning), duration is also considered  we begin by ignoring the resource constraints and solving the temporal problem of minimizing the makespan  this requires finding the earliest start times for all actions consistent with the problem's ordering constraints  we create a partially-ordered plan, representing ordering constraints in a directed graph of actions  then we apply the critical path method to determine the start and end times for each action 8 Planning & Acting in the Real World

9 Graph of POP + Critical Path  the critical path is the path with longest total duration  it is "critical" in that it sets the duration for the whole plan and delaying the start of any action on it extends the whole plan  it is the sequence of actions, each of which has no slack  each must begin at a particular time, otherwise the whole plan is delayed  actions off the critical path have a window of time given by the earliest possible start time ES & the latest possible start time LS  the illustrated solution assumes no resource constraints  note that the 2 engines are being added simultaneously  the figure shows [ES, LS] for each action, & slack is LS - ES  the time required is indicated below the action name & bold links mark the critical path 9

10 JSSP: (1) Temporal Constraints  the schedule for the problem  is given by ES & LS times for all actions  note the 15 minutes of slack for each action in the top job, versus 0 (by definition) in the critical path job  the formulas for ES & LS also outline a dynamic-programming algorithm for computing them  A, B are actions; A < B indicates A must come before B

ES(Start) = 0
ES(B) = max over {A : A < B} of [ES(A) + Duration(A)]
LS(Finish) = ES(Finish)
LS(A) = min over {B : A < B} of LS(B) − Duration(A)

 complexity is O(Nb) where N is the number of actions and b is the maximum branching factor into or out of an action  so without resource constraints, given a partial ordering of actions, finding the minimum-duration schedule is (a pleasant surprise!) computationally easy 10
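a minimal sketch of this dynamic program in Python, reusing the DURATION and ORDER structures assumed in the earlier snippet; it ignores resources entirely, as the slide does

def critical_path(durations, order):
    """Earliest/latest start times for a partially ordered set of actions.
    durations: {action: minutes}; order: (A, B) pairs meaning A < B."""
    es = {a: 0 for a in durations}
    for _ in durations:                  # repeated relaxation; N passes suffice
        for a, b in order:
            es[b] = max(es[b], es[a] + durations[a])
    makespan = max(es[a] + durations[a] for a in durations)
    ls = {a: makespan - durations[a] for a in durations}
    for _ in durations:                  # backward pass from the Finish time
        for a, b in order:
            ls[a] = min(ls[a], ls[b] - durations[a])
    return es, ls, makespan

es, ls, makespan = critical_path(DURATION, ORDER)
print(makespan)                              # 85: the second job is critical
print({a: ls[a] - es[a] for a in DURATION})  # slack; 0 marks the critical path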

11 JSSP: (1)Temporal Constraints  timeline for the solution  grey rectangles give intervals for actions  empty portions show slack 11 Planning & Acting in the Real World

12 Solution from POP + Critical Path  1. the partially-ordered plan (above)  2. the schedule from the critical-path method (below)  notice that this solution still omits resource constraints  for example, the 2 engines are being added simultaneously 12

13 Scheduling with Resources  including resource constraints  critical path calculations involve conjunctions of linear inequalities over action start & end times  they become more complicated when resource constraints are included (for example, each AddEngine action requires the 1 EngineHoist, so they cannot overlap)  they introduce disjunctions of linear inequalities for possible orderings & as a result, complexity becomes NP-hard!!  here's a solution accounting for resource constraints  reusable resources are in the left column, actions align with resources  this shortest solution schedule requires 115 minutes 13

14 Scheduling with Resources  including resource constraints  notice  that the shortest solution is 30 minutes longer than the critical path without resource constraints  that multiple inspector resource units are not needed for this job, indicating the possibility for reallocation of this resource  that the "critical path" now is: AddEngine1, AddEngine2, AddWheels2, Inspect2.  the remaining actions have considerable slack time, they can begin much later without affecting the total plan time 14

15 Scheduling with Resources  for including resource constraints  a variety of solution techniques have been tested  one simple approach uses the minimum-slack heuristic  at each step, schedule next the unscheduled action that has all its predecessors scheduled & has the least slack  update ES & LS for impacted actions & repeat (a sketch follows below)  note the similarity to the minimum-remaining-values (MRV) heuristic of CSPs  applied to this example, it yields a 130-minute solution  15 minutes longer than the optimal 115-minute solution  difficult scheduling problems may require a different approach  they may involve reconsidering actions & constraints, integrating the planning & scheduling phases by including durations & overlaps in constructing the POP  this approach is a focus of current research interest 15 Planning & Acting in the Real World
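a sketch of that greedy scheduler, continuing the Python encoding from the earlier snippets (critical_path, DURATION, ORDER, USES assumed from above); for brevity it keeps the initial slack values rather than recomputing ES & LS after each commitment, and its resource check is deliberately conservative, but on this example it still produces the 130-minute schedule

def min_slack_schedule(durations, order, uses, capacity):
    """Greedy: repeatedly start the ready action with the least slack."""
    es, ls, _ = critical_path(durations, order)
    preds = {a: [p for p, q in order if q == a] for a in durations}
    start = {}

    def overloaded(res, t0, t1, act):
        # conservative: counts every scheduled action overlapping [t0, t1)
        load = uses[act].get(res, 0)
        for a, s in start.items():
            if s < t1 and t0 < s + durations[a]:
                load += uses[a].get(res, 0)
        return load > capacity[res]

    while len(start) < len(durations):
        ready = [a for a in durations
                 if a not in start and all(p in start for p in preds[a])]
        act = min(ready, key=lambda a: ls[a] - es[a])   # least slack first
        t = max((start[p] + durations[p] for p in preds[act]), default=0)
        while any(overloaded(r, t, t + durations[act], act) for r in uses[act]):
            t += 5                # durations here are all multiples of 5 min
        start[act] = t
    return start

CAPACITY = {"EngineHoists": 1, "WheelStations": 1, "Inspectors": 2}
schedule = min_slack_schedule(DURATION, ORDER, USES, CAPACITY)
print(max(schedule[a] + DURATION[a] for a in schedule))  # 130, as on the slide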

16 Time & Resource Constraints  summary  alternative approaches to planning with time & resource constraints  1. serial: plan, then schedule  use a partial or full-order planner  then schedule to determine actual start times  2. interleaved: mix planning and scheduling  for example, include resource constraints during partial planning  these can determine conflicts between actions  notes:  remember that so far we are still working in classical planning environments  so, fully observable, deterministic, static and discrete 16 Planning & Acting in the Real World

17 Hierarchical Planning  next  we add techniques to handle the plan-complexity issue  HTN: hierarchical task network planning  this works in a top-down fashion  similar to the stepwise-refinement approach to programming  plans built from a fixed set of small atomic actions become unwieldy as the planning problem grows large  we need to plan at a higher level of abstraction  reduce complexity by hierarchical decomposition of plan steps  at each level of the hierarchy a planning task is reduced to a small number of activities at the next lower level  the low number of activities  means the computational cost of arranging these activities can be kept low 17 Planning & Acting in the Real World

18 Hierarchical Planning  an example: the Hawaiian vacation plan  recall: the AIMA authors live/work in San Francisco Bay area  go to SFO airport  take flight to Honolulu  do vacation stuff for 2 weeks  take flight back to SFO  go Home  each action in this plan actually embodies another planning task  for example: the go to SFO airport action might be expanded  drive to long term parking at SFO  park  take shuttle to passenger terminal  & each action can be decomposed until the level consists of actions that can be executed without deliberation  note: some component actions might not be refined until plan execution time (interleaving: a somewhat different topic) 18 Planning & Acting in the Real World

19 Hierarchical Planning  basic approach  at each level, each component is reduced to a small number of activities at the next lower level  this keeps the computational cost of arranging them low  otherwise, there are too many individual atomic actions for non-trivial problems (yielding high branching factor & depth)  the formalism is HTN planning  Hierarchical Task Network planning  notes  we retain the basic environmental assumptions as for classical planning  what we previously simply called actions are now "primitive actions"  we add HLAs: High Level Actions (like go to SFO airport)  each has 1 or more possible refinements  refinements are sequences of actions, either HLAs or primitive actions 19

20 Hierarchical Task Network  alternative refinements: notation  for the HLA Go(Home, SFO):

Refinement(Go(Home, SFO),
  STEPS: [Drive(Home, SFOLongTermParking),
          Shuttle(SFOLongTermParking, SFO)])
Refinement(Go(Home, SFO),
  STEPS: [Taxi(Home, SFO)])

 the HLAs and their refinements  capture knowledge about how to do things  terminology: a refinement that contains only primitive actions  is called an implementation of the HLA  the implementation of a high-level plan (sequence of HLAs)  concatenates the implementations for each HLA  the preconditions/effects representation of primitive action schemas allows a decision about whether an implementation of a high-level plan achieves the goal 20

21 Hierarchical Task Network  HLAs & refinements & plan goals  in the HTN approach, the goal is achieved if any implementation achieves it  this is the case since an agent may choose the implementation to execute (unlike non-deterministic environments where "nature" chooses)  in the simplest case there's a single implementation of an HLA  we get preconds/effects from the implementation, and then treat the HLA as a primitive action  where there are multiple implementations, either  1. search over implementations for 1 that solves the problem  OR  2. reason over HLAs directly  derive provably correct abstract plans independent of the specific implementations 21 Planning & Acting in the Real World

22 Search Over Implementations  1. the search approach  this involves generating refinements by replacing an HLA in the current plan with a candidate refinement until the plan achieves the goal  the algorithm on the next slide shows a version using breadth-first tree search, considering plans in order of the depth of nesting of refinements  note that other search versions (graph search) and strategies (depth-first, iterative deepening) may be formulated by redesigning the algorithm  it explores the space of sequences derived from knowledge in the HLA library about how things should be done  the action sequences of refinements & their preconditions encode knowledge about the planning domain  HTN planners can generate very large plans with little search 22 Planning & Acting in the Real World

23 Search Over Implementations  the search algorithm for refinements of HLAs

function HIERARCHICAL-SEARCH(problem, hierarchy) returns a solution or failure
  frontier ← a FIFO queue with [Act] as the only element
  loop do
    if EMPTY?(frontier) then return failure
    plan ← POP(frontier)   /* chooses the shallowest plan in frontier */
    hla ← the first HLA in plan, or null if none
    prefix, suffix ← the action subsequences before and after hla in plan
    outcome ← RESULT(problem.INITIAL-STATE, prefix)
    if hla is null then   /* so plan is primitive & outcome is its result */
      if outcome satisfies problem.GOAL then return plan
    else   /* insert all refinements of the current hla into the queue */
      for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
        frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)

Planning & Acting in the Real World 23

24 HTN Examples  O-PLAN: an example of a real-world system  the O-PLAN system does both planning & scheduling, commercially for the Hitachi company  one specific sample problem concerns a product line of 350 items involving 35 machines and different operations  for this problem, the planner produces a 30-day schedule of 3x8-hour shifts, with 10s of millions of steps  a major benefit of the hierarchical structure with the HTN approach is the results are often easily understood by humans  abstracting away from excessive detail  (1) makes large scale planning/scheduling feasible  (2) enhances comprehensibility 24 Planning & Acting in the Real World

25 HTN Efficiency  computational comparisons for a hypothetical domain  assumption 1: a non-hierarchical progression planner with d primitive actions, b possibilities at each state: O(b^d)  assumption 2: an HTN planner with r refinements of each non-primitive, each with k actions at each level  how many different refinement trees does this yield?  depth: number of levels below the root = log_k d  then the number of internal refinement nodes = 1 + k + k^2 + … + k^(log_k d − 1) = (d − 1)/(k − 1)  each internal node has r possible refinements, so there are r^((d − 1)/(k − 1)) possible regular decomposition trees  the message: keeping r small & k large yields big savings (roughly the k-th root of the non-hierarchical cost if b & r are comparable)  nice as a goal, but long action sequences that are useful over a range of problems are rare 25 Planning & Acting in the Real World
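a quick numeric sanity check of these counts; the values of d, k, r, b below are illustrative assumptions, not from the text

# d primitive actions, k actions per refinement, r refinements per
# non-primitive, b action choices per state for the flat planner
d, k, r, b = 16, 4, 3, 3
internal  = (d - 1) // (k - 1)   # 1 + k + ... + k^(log_k d - 1) = 5 nodes
htn_trees = r ** internal        # 3^5  = 243 decomposition trees to consider
flat      = b ** d               # 3^16 = 43,046,721 action sequences
print(internal, htn_trees, flat)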

26 HTN Efficiency  HTN computational efficiency  building the plan library is critically important to achieving efficiency gains in HTN planning  so, might the refinements be learned?  as one example, an agent could build plans conventionally, then save them as a refinement of an HLA defined as the current task/problem  one goal is "generalizing" the methods that are built, eliminating problem-instance-specific detail, keeping only key plan components 26 Planning & Acting in the Real World

27 Hierarchical Planning  we've just looked at the approach of searching over fully refined plans  that is, full implementations  the algorithm refines plans to primitive actions in order to check whether they achieve the problem goal  now we move on to searching for abstract solutions  the checking occurs at the level of HLAs  possibly with preconditions/effects descriptions for HLAs  the result is that search is in the much smaller HLA space, after which we refine the resulting plan 27 Planning & Acting in the Real World

28 Hierarchical Planning  searching for abstract solutions  this approach requires that HLA descriptions have the downward refinement property  every high-level plan that apparently solves the problem (according to the descriptions of its steps) has at least 1 implementation that achieves the goal  since search is not at the level of sequences of primitive actions, a core issue is describing the effects of actions (HLAs) that have multiple implementations  assuming a problem description with only +ve preconds & goals, we might describe an HLA's +ve effects as those achieved by every implementation, and its -ve effects as those resulting from any implementation  this would satisfy the downward refinement property  however, requiring an effect to be true for every implementation is too restrictive; it assumes that an adversary chooses the implementation (an underlying demonic nondeterministic model) 28

29 Plan Search in HLA Space  plan search in HLA space  there are alternative models for which implementation is chosen, either  (1) demonic non-determinism where some adversary makes the choice  (2) angelic non-determinism, where the agent chooses  if we adopt angelic semantics for HLA descriptions  the resulting notation uses simple set operations/notation  the key concept is that of the reachable set for some HLA h & state s, notation: Reach(s, h)  this is the set of states reachable by any implementation of h (since under angelic semantics, the agent gets to choose)  for a sequence of HLAs [h 1, h 2 ] the reachable set is the union of all reachable sets from applying h 2 in each state in the reachable set of h 1 (for notation details see p 411)  a sequence of HLAs forming a high level plan is a solution if its reachable set intersects the set of goal states 29 Planning & Acting in the Real World

30 Plan Search in HLA Space  illustration of reachable sets, sequences of HLAs  dots are states, shaded areas = goal states  darker arrows: possible implementations of h 1  lighter arrows: possible implementations of h 2  (a) reachable set for HLA h 1  (b) reachable set for the sequence [h 1, h 2 ]  circled dots show the sequence achieving the goal Planning & Acting in the Real World 30

31 Planning in HLA Space  using this model  planning consists of searching in HLA space for a sequence with a reachable set that intersects the goal, then refining that abstract plan  note: we haven't yet considered the issue of representing reachable sets as the effects of HLAs  our basic planning model has states as conjunctions of fluents  if we treat the fluents of a planning problem as state variables, then under angelic semantics an HLA controls the values of these variables, depending on which implementation is actually selected  an HLA may have 9 different effects on a given variable  if the variable starts true, the HLA can always keep it true, always make it false, or leave the agent a choice; similarly for a variable that is initially false  any combination of the 3 choices for each case is possible, yielding 3^2 = 9 effects 31 Planning & Acting in the Real World

32 Planning in HLA Space  using this model  so there are 9 possible combinations of choices for the effects on variables  we introduce some additional notation to capture this idea  note some slight formatting differences between the details of the notation used here versus in the textbook  ~ indicates possibility, the dependence on the agent's choice of implementation  ~+A indicates the possibility of adding A  ~-A represents the possible deleting of A  ~±A stands for possibly adding or deleting A 32 Planning & Acting in the Real World

33 Planning in HLA Space  possible effects of HLAs  a simple example uses the HLA for going to the airport, Go(Home, SFO):

Refinement(Go(Home, SFO),
  STEPS: [Drive(Home, SFOLongTermParking),
          Shuttle(SFOLongTermParking, SFO)])
Refinement(Go(Home, SFO),
  STEPS: [Taxi(Home, SFO)])

 this HLA has ~-Cash as a possible effect, since the agent may choose the refinement of going by taxi & have to pay  we can use this notation & angelic reachable-state semantics to illustrate how an HLA sequence [h1, h2] reaches a goal  it's often the case that an HLA's effects can only be approximated (since it may have infinitely many implementations & produce arbitrarily "wiggly" reachable sets)  we use approximate descriptions of the result states of HLAs that are  optimistic: REACH⁺(s, h) or pessimistic: REACH⁻(s, h)  one may overestimate, the other underestimate  here's the definition of the relationship

REACH⁻(s, h) ⊆ REACH(s, h) ⊆ REACH⁺(s, h)

33

34 Planning in HLA Space  possible effects of HLAs using approximate descriptions of result states  with approximate descriptions, we need to reconsider how to apply/interpret the goal test  (1) if the optimistic reachable set for a plan does not intersect the goal, then the plan is not a solution  (2) if the pessimistic reachable set for a plan intersects the goal, then the plan is a solution  (3) if the optimistic set intersects but the pessimistic set does not, the goal test is not decided & we need to refine the plan to resolve residual ambiguity 34
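the three cases translate directly into set operations; a minimal sketch, assuming reachable sets are materialized as plain Python sets of hashable states

def goal_test(reach_plus, reach_minus, goals):
    """Classify a high-level plan from optimistic/pessimistic reachable sets."""
    if not (reach_plus & goals):
        return "not a solution"   # (1) even optimistically misses the goal
    if reach_minus & goals:
        return "solution"         # (2) guaranteed to reach a goal state
    return "refine further"       # (3) undecided: refine to resolve ambiguity

print(goal_test({"s1", "s2", "g"}, {"s1"}, {"g"}))   # -> refine further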

35 Planning in HLA Space  illustration  shading shows the set of goal states  reachable sets: R+ (optimistic) shown by dashed boundary, R- (pessimistic) by solid boundary  in (a) the plan shown by a dark arrow achieves the goal & the plan shown by the lighter arrow does not  in (b), the plan needs further refinement since the R+ (optimistic) set intersects the goal but the R- (pessimistic) does not 35

36 Planning in HLA Space  the algorithm  hierarchical planning with approximate angelic descriptions

function ANGELIC-SEARCH(problem, hierarchy, initialPlan) returns solution or fail
  frontier ← a FIFO queue with initialPlan as the only element
  loop do
    if EMPTY?(frontier) then return fail
    plan ← POP(frontier)   /* chooses the shallowest node in frontier */
    if REACH⁺(problem.INITIAL-STATE, plan) intersects problem.GOAL then  /* optimistic */
      if plan is primitive then return plan   /* REACH⁺ is exact for primitive plans */
      guaranteed ← REACH⁻(problem.INITIAL-STATE, plan) ∩ problem.GOAL   /* pessimistic */
      /* the pessimistic set includes a goal state & we're not in an infinite regress of refinements */
      if guaranteed ≠ {} and MAKING-PROGRESS(plan, initialPlan) then
        finalState ← any element of guaranteed
        return DECOMPOSE(hierarchy, problem.INITIAL-STATE, plan, finalState)
      hla ← some HLA in plan
      prefix, suffix ← the action subsequences before & after hla in plan
      for each sequence in REFINEMENTS(hla, outcome, hierarchy) do
        frontier ← INSERT(APPEND(prefix, sequence, suffix), frontier)

36

37 Planning in HLA Space  the decompose function  mutually recursive with ANGELIC-SEARCH  regresses from the goal to generate a successful plan at the next level of refinement

function DECOMPOSE(hierarchy, s0, plan, sf) returns a solution
  solution ← an empty plan
  while plan is not empty do
    action ← REMOVE-LAST(plan)
    si ← a state in REACH⁻(s0, plan) such that sf ∈ REACH⁻(si, action)
    problem ← a problem with INITIAL-STATE = si and GOAL = sf
    solution ← APPEND(ANGELIC-SEARCH(problem, hierarchy, action), solution)
    sf ← si
  return solution

37

38 Planning in HLA Space  notes  ANGELIC-SEARCH has the same basic structure as the previous algorithm (BFS in space of refinements)  the algorithm detects plans that are or aren't solutions by checking intersections of optimistic & pessimistic reachable sets with the goal  when it finds a workable abstract plan, it decomposes the original problem into subproblems, one for each step of the plan  the initial state & goal for each subproblem are derived by regressing the guaranteed reachable goal state through the action schemas for each step of the plan  ANGELIC-SEARCH has a computational advantage over the previous hierarchical search algorithm, which in turn may have a large advantage over plain old exhaustive search Planning & Acting in the Real World 38

39 Least Cost & Angelic Search  the same approach can be adapted to find a least-cost solution  this generalizes the reachable-set concept so that a state, instead of being reachable or not, has a cost for the most efficient way of getting to it (∞ for unreachable states)  then optimistic & pessimistic descriptions bound the costs  the holy grail of hierarchical planning  this revision may allow finding a provably optimal abstract plan without checking all implementations  extensions: the approach can also be applied to online search in the form of hierarchical lookahead algorithms (recall LRTA*)  the resulting algorithm resembles the human approach to problems like the vacation plan  initially consider alternatives at the abstract level, over long time scales  leave parts of the plan abstract until execution time, though other parts are expanded into detail (flights, lodging) to guarantee feasibility of the plan 39

40 Nondeterministic Domains  finally, we'll relax some of the environment assumptions of the classical planning model  in part, these parallel the extensions of our earlier (CISC352) discussions of search  we'll consider the issues in 3 sub-categories  (1) sensorless planning (conformant planning)  completely drop the observability property for the environment  (2) contingency planning  for partially observable & nondeterministic environments  (3) online planning & replanning  for unknown environments  however, we begin with some background Planning & Acting in the Real World 40

41 BKGD: Nondeterministic Domains  note some distinct differences from the search paradigms  the factored representation of states allows an alternative belief state representation  plus, we have the availability of the domain-independent heuristics developed for classical planning  as usual, we explore issues using a prototype problem  this time it's the task of painting a chair & table so that their colors match  in the initial state, the agent has 2 cans of paint, colors unknown, likewise the chair & table colors are unknown, & only the table is visible  plus there are actions to remove the lid of a can, & to paint from an open can (see the next slide) 41

42 The Furniture Painting Problem  the furniture painting problem

Init(Object(Table) ∧ Object(Chair) ∧ Can(C1) ∧ Can(C2) ∧ InView(Table))
Goal(Color(Chair, c) ∧ Color(Table, c))
Action(RemoveLid(can),
  PRECOND: Can(can)
  EFFECT: Open(can))
Action(Paint(x, can),
  PRECOND: Object(x) ∧ Can(can) ∧ Color(can, c) ∧ Open(can)
  EFFECT: Color(x, c))

Planning & Acting in the Real World 42

43 BKGD: Nondeterministic Domains  the environment  since it may not be fully observable, we'll allow action schemas to have variables in preconditions & effects that aren't in the action's variable list  Paint(x, can) omits the variable c representing the color of the paint in can  the agent may not know what color is in a can  in some variants, the agent will have to use percepts it gets while executing the plan, so planning needs to model sensors  the mechanism: Percept Schemas

Percept(Color(x, c),
  PRECOND: Object(x) ∧ InView(x))
Percept(Color(can, c),
  PRECOND: Can(can) ∧ InView(can) ∧ Open(can))

 when an object is in view, the agent will perceive its color  if an open can is in view, the agent will perceive the paint color Planning & Acting in the Real World 43

44 BKGD: Nondeterministic Domains  we still need an Action Schema for inspecting objects

Action(LookAt(x),
  PRECOND: InView(y) ∧ (x ≠ y)
  EFFECT: InView(x) ∧ ¬InView(y))

 in a fully observable environment, we include a percept axiom with no preconds for each fluent  of course, a sensorless agent has no percept axioms  note: it can still coerce the table & chair to the same color to solve the problem (though it won't know what color that is)  a contingent planning agent with sensors can do better  inspect the objects, & if they're the same color, done  otherwise check the paint cans & if one is the same color as an object, paint the other object with it  otherwise paint both objects any color  an online agent produces contingent plans with few branches  handling problems as they occur by replanning Planning & Acting in the Real World 44

45 BKGD: Nondeterministic Domains  a contingent planner assumes that the effects of an action are successful  a replanning agent checks results, generating new plans to fix any detected flaws  in the real world we find combinations of approaches Planning & Acting in the Real World 45

46 Sensorless Planning Belief States  unobservable environment = Sensorless Planning  these problems are belief-state planning problems with physical transitions represented by action schemas  we assume a deterministic environment  we represent belief states as logical formulas rather than the explicit sets of atomic states we saw for sensorless search  for the prototype planning problem: furniture painting  1. we omit the InView fluents  2. some fluents hold in all belief states, so we can omit them for brevity: Object(Table), Object(Chair), Can(C1), Can(C2)  3. the agent knows things have a color (∀x ∃c Color(x, c)), but doesn't know the color of anything or the open vs closed state of cans  4. Skolemizing yields an initial belief state b0 = Color(x, C(x)), where C(x) is a Skolem function replacing the existentially quantified variable  5. we drop the closed-world assumption of classical planning, so states may contain +ve & -ve fluents & if a fluent does not appear, its value is unknown 46

47 Sensorless Planning Belief States  belief states  specify how the world could be  they are represented as logical formulas  each is the set of possible worlds that satisfy the formula  in a belief state b, the actions available to the agent are those with their preconds satisfied in b  given the initial belief state b0 = Color(x, C(x)), a simple solution plan for the painting problem is: [RemoveLid(Can1), Paint(Chair, Can1), Paint(Table, Can1)]  we'll update belief states as actions are taken, using the rule

b' = RESULT(b, a) = {s' : s' = RESULT_P(s, a) and s ∈ b}

 where RESULT_P defines the physical transition model Planning & Acting in the Real World 47

48 Sensorless Planning Belief States  updating belief states  we assume that the initial belief state is in 1-CNF form, that is, a conjunction of literals  b' is derived from what happens to each literal l in the physical states s ∈ b when a is applied  if the truth value of l is known in b, then its value in b' is computed from the current value updated by the add list & delete list of a  if a literal's truth value is unknown, 1 of 3 cases applies  1. a adds l, so it must be true in b'  2. a deletes l, so it must be false in b'  3. a does not affect l, so it remains unknown (thus is not in b') Planning & Acting in the Real World 48
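a minimal sketch of this update rule, representing a 1-CNF belief state as a dict from ground fluent to known truth value (fluents absent from the dict are unknown; schema instantiation is abstracted away)

def update_belief(belief, add_list, del_list):
    """1-CNF update: b' = (b - DEL(a)) plus ADD(a) over ground fluents."""
    b = dict(belief)
    for fluent in del_list:
        b[fluent] = False      # deleted fluents become known-false
    for fluent in add_list:
        b[fluent] = True       # added fluents become known-true
    return b                   # untouched fluents keep their old status

b0 = {"AtR": True}                                   # only location known
b1 = update_belief(b0, add_list=["CleanR"], del_list=[])
print(b1)                                            # {'AtR': True, 'CleanR': True}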

49 Sensorless Planning Belief States  updating belief states: the example plan  recall the sensorless agent's solution plan for the furniture painting problem [RemoveLid(Can1), Paint(Chair, Can1), Paint(Table, Can1)]  apply RemoveLid(Can1) to b0 = Color(x, C(x))

(1) b1 = Color(x, C(x)) ∧ Open(Can1)

 apply Paint(Chair, Can1) to b1  the precondition Color(Can1, c) is satisfied by Color(x, C(x)) with the binding {x/Can1, c/C(Can1)}

(2) b2 = Color(x, C(x)) ∧ Open(Can1) ∧ Color(Chair, C(Can1))

 now apply the last action to get the next belief state, b3

(3) b3 = Color(x, C(x)) ∧ Open(Can1) ∧ Color(Chair, C(Can1)) ∧ Color(Table, C(Can1))

 note that this satisfies the plan goal Goal(Color(Chair, c) ∧ Color(Table, c)) with c bound to C(Can1) 49

50 Sensorless Planning Belief States  the painting problem solution  this illustrates that the family of belief states expressed as conjunctions of literals is closed under updates defined by PDDL action schemas  so given n total fluents, any belief state is represented as a conjunction of size O(n) (despite the 2^n states in the world)  however, this is only the case when action schemas have the same effects for all states in which their preconds are satisfied  if an action's effects depend on the state, dependencies among fluents are introduced & the 1-CNF property no longer applies  illustrated by an example from the simple vacuum world on the next slides Planning & Acting in the Real World 50

51 Recall Vacuum World  the simple vacuum world state space Planning & Acting in the Real World 51

52 Sensorless Planning Belief States  if an action's effects depend on the state  dependencies among fluents are introduced & the 1-CNF property does not apply  the effect of the Suck action depends on where it is done (CleanL if the agent is AtL, but CleanR if it is AtR)  this requires conditional effects for action schemas:  when condition: effect, or for the vacuum world

Action(Suck, EFFECT: when AtL: CleanL ∧ when AtR: CleanR)

 considering conditional effects & belief states  applying the conditional action to the initial belief state yields the result belief state

(AtL ∧ CleanL) ∨ (AtR ∧ CleanR)

 so the belief-state formula is no longer 1-CNF, and in the worst case may be exponential in size Planning & Acting in the Real World 52

53 Sensorless Planning Belief States  to a degree, the available options are  (1) use conditional effects for actions & accept the loss of the belief state's representational simplicity  (2) use a conventional action representation, where an action whose precondition is unsatisfied is inapplicable & leaves the resulting state undefined  for sensorless planning, conditional effects are preferable  they yield "wiggly" belief states (& maybe that's inevitable anyway for non-trivial problems)  an alternative is a conservative approximation of the belief state (all literals whose truth values can be determined, with the others treated as unknown)  this yields planning that is sound but incomplete (if the problem requires interactions among literals) Planning & Acting in the Real World 53

54 Sensorless Planning Belief States  another alternative  the agent (algorithm) could attempt to use action sequences that keep the belief state simple (1-CNF), as in this vacuum world example  the target is a plan consisting of actions that preserve the simple belief-state representation, for example: [Right, Suck, Left, Suck]

b0 = True
b1 = AtR
b2 = AtR ∧ CleanR
b3 = AtL ∧ CleanR
b4 = AtL ∧ CleanR ∧ CleanL

 note that some alternative sequences (e.g. those beginning with the Suck action) would break the 1-CNF representation  simple belief states are attractive, as even human behaviour shows - the evidence is our carrying out of frequent small actions to reduce uncertainty (keeping the belief state manageable) 54

55 Sensorless Planning Belief States  yet another alternative for representing belief states under the relaxed observability  we might represent belief states in terms of an initial belief state + a sequence of actions, yielding an O(n + m) bound on belief-state size  a world of n literals, with a maximum of m actions in a sequence  if so, the issues relate to the difficulty of calculating when an action is applicable or a goal is satisfied  we might use an entailment test: b0 ∧ A^m ⊨ G^m, where  b0 is the initial belief state  A^m is the collection of successor-state axioms for the actions in the sequence, and G^m states that the goal is achieved after m actions  so we want to show that b0 ∧ A^m ∧ ¬G^m is unsatisfiable  a good SAT solver may be able to determine this quite efficiently Planning & Acting in the Real World 55

56 Sensorless Planning Heuristics  as a last consideration  we return to the question of using heuristics to prune the search space  notice that for belief states, solving a subset of a belief state can be no harder than solving the whole belief state:

if b1 ⊆ b2 then h*(b1) ≤ h*(b2)

 thus an admissible heuristic for a subset of states in the belief state is an admissible heuristic for the belief state  candidate subsets include singletons, the individual states  assuming we adopt 1 of the admissible heuristics we saw for classical planning, and that s1, ..., sN is a random selection of states in belief state b, a reasonably accurate admissible heuristic is

H(b) = max{h(s1), ..., h(sN)}

 still other alternatives involve converting to planning-graph form, where the initial state layer is derived from b  just its literals if b is 1-CNF, or potentially derived from a non-CNF representation 56
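a sketch of that heuristic, assuming the belief state is an explicit collection of states and h is any admissible single-state heuristic

import random

def belief_heuristic(belief_state, h, sample_size=5):
    """Admissible belief-state heuristic: max of h over sampled states.
    The max over a subset of b never overestimates h*(b)."""
    states = list(belief_state)
    sample = random.sample(states, min(sample_size, len(states)))
    return max(h(s) for s in sample)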

57 Contingent Planning  we relax some of the environmental assumptions of classical planning to deal with environments that are partially observable and/or non-deterministic  for such environments, a plan includes branching based on percepts (recall the percept schemas from the introduction)

Percept(Color(x, c),
  PRECOND: Object(x) ∧ InView(x))
Percept(Color(can, c),
  PRECOND: Can(can) ∧ InView(can) ∧ Open(can))

 at plan execution, we represent a belief state as a logical formula  the plan includes contingent/conditional branches  check branch conditions: does the current belief state entail the condition or its negation?  the conditions may include first-order properties (existential quantification), so they may have multiple satisfying substitutions  the agent gets to choose one, applying it to the remainder of the plan 57

58 Contingent Planning  a contingent plan solution for the painting problem

[LookAt(Table), LookAt(Chair),
 if Color(Table, c) ∧ Color(Chair, c) then NoOp
 else [RemoveLid(Can1), LookAt(Can1), RemoveLid(Can2), LookAt(Can2),
       if Color(Table, c) ∧ Color(can, c) then Paint(Chair, can)
       else if Color(Chair, c) ∧ Color(can, c) then Paint(Table, can)
       else [Paint(Chair, Can1), Paint(Table, Can1)]]]

 note: Color(Table, c) ∧ Color(can, c)  this might be satisfied under both {can/Can1} and {can/Can2} if both cans are the same color as the table  the previous-to-new belief state calculation occurs in 2 stages  (1) after an action a, as with the sensorless agent  b̂ = (b − DEL(a)) ∪ ADD(a), where b̂ is the predicted belief state, represented as a conjunction of literals  (2) then in the percept stage, determine which percept axioms hold in the now partially updated belief state, and add their percepts + preconditions 58

59 Contingent Planning  (2) updating the belief state from the percept axioms  Percept(p, PRECOND: c), where c is a conjunction of literals  suppose percept literals p1, ..., pk are received  for a given percept p, there may be a single matching percept axiom or more than 1  if just 1, add its percept literal & preconditions to the belief state  if > 1, then we have to deal with multiple candidate preconditions  add p & the disjunction of those preconditions that may hold in the predicted belief state b̂  in this case, we've given up the 1-CNF form for the belief-state representation, and issues arise similar to those for conditional effects in the sensorless planner  given a way to generate exact or approximate belief states  (1) the algorithm for contingent search may generate contingent plans  (2) actions with nondeterministic effects (disjunctive EFFECTs) can be handled with minor changes to belief-state updating  (3) heuristics, including those suggested for sensorless planning, are available 59

60 Contingent Planning  the AND-OR-GRAPH-SEARCH algorithm  AND nodes indicate non-determinism, whose outcomes must all be handled, while OR nodes indicate choices of actions from states  the algorithm  is depth-first, mutually recursive, & returns a conditional plan  notation: [x | l] is the list formed by prepending x to the list l

function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
  return OR-SEARCH(problem.INITIAL-STATE, problem, [])

function OR-SEARCH(state, problem, path) returns a conditional plan or failure
  if problem.GOAL-TEST(state) then return the empty plan
  if state is on path then return failure   /* repeated state on this path */
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
    if plan ≠ failure then return [action | plan]
  return failure

function AND-SEARCH(states, problem, path) returns a conditional plan or failure
  for each si in states do
    plani ← OR-SEARCH(si, problem, path)
    if plani = failure then return failure
  return [if s1 then plan1 else if s2 then plan2 else … if sn−1 then plann−1 else plann]

60

61 Online Replanning  replanning  this approach uses/captures knowledge about what the agent is trying to do  some form of execution monitoring triggers replanning  it interleaves execution & planning, dealing with some contingencies by including Replan branches in the plan  if the agent encounters a Replan during plan execution, it returns to planning mode  why Replan?  there may be an error or omission in the world model used to build the plan  e.g. no state variable to represent the quantity of paint in a can (so it could even be empty), or exogenous events (a can wasn't properly sealed & the paint dried up), or a goal may be changed  environment monitoring by the online agent (a loop sketch follows below)  (1) action monitoring: check preconds before executing an action  (2) plan monitoring: check that the remaining plan will still work  (3) goal monitoring: before executing, ask: "Is a better set of goals available?" 61
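the action-monitoring case can be pictured as a small loop; everything named here (preconditions_hold, apply_action, observe, replan) is an assumed stand-in for the corresponding planner component

def execute_with_monitoring(plan, state, preconditions_hold, apply_action,
                            observe, replan):
    """Action monitoring: replan whenever the next action's preconds fail."""
    while plan:
        state = observe(state)             # update belief from percepts
        action = plan[0]
        if not preconditions_hold(action, state):
            plan = replan(state)           # back to planning mode
            continue
        state = apply_action(action, state)
        plan = plan[1:]                    # commit to the executed step
    return state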

62 Online Replanning  a replanning example  action monitoring indicates the agent's state is not as planned, so it should try to get back to a state in the original plan, minimizing total cost  when the agent finds it is not in the expected state E, but observes that it is instead in O, it Replans Planning & Acting in the Real World 62

63 Online Replanning  replanning in the furniture painting problem

[LookAt(Table), LookAt(Chair),
 if Color(Table, c) ∧ Color(Chair, c) then NoOp
 else [RemoveLid(Can1), LookAt(Can1),
       if Color(Table, c) ∧ Color(Can1, c) then Paint(Chair, Can1)
       else REPLAN]]

 the online planning agent, having painted the Chair, checks the preconds for the remaining empty plan: that the table & chair are the same colour  suppose the new paint didn't cover well & the old colour still shows  the agent needs to determine where in the whole plan to return to, & what repair action sequence to use to get there  given that the current state matches the one before Paint(Chair, Can1), an empty repair sequence & a new plan of the same [Paint] sequence is OK  the agent resumes execution monitoring, retries the Paint action & loops like this until the colours match  note that the loop is online: plan-execute-replan, not explicit in the plan 63

64 Online Replanning  replan  the original plan doesn't handle all contingencies, the REPLAN step could generate an entirely new plan  a plan monitoring agent may detect faults earlier, before the corresponding actions are executed: when the current state means that the remaining plan won't work  so it checks preconditions for success of the remaining plan  for each of its steps, except those contributed by some other step in the remaining plan  the goal is to detect future failure as early as possible, & replan  note: in (rare) cases it might even detect serendipitous success  action monitoring by checking preconditions is relatively easy to include but plan monitoring is more difficult  partial order & planning graph structures include information that may support the plan monitoring approach Planning & Acting in the Real World 64

65 Online Replanning  with replanning, plans will always succeed, right?  still there can be "dead ends", states from which no repair is possible  a flawed model can lead the plan into dead ends  an example of a flawed model: the general assumption of unlimited resources (for example, bottomless paint cans)  however, if we assume there are no dead ends, then there is a plan to reach the goal from any state  and if we further assume that the environment is truly nondeterministic (so there's always a non-zero chance of success), then a replanning agent will eventually achieve the goal Planning & Acting in the Real World 65

66 Online Replanning  when replanning fails  another problem is that actions may not really be nondeterministic - instead, they may depend on preconditions the agent does not know about  for example, painting from an empty paint can has no effect & will never lead to the goal  there are alternative approaches to cope with such failures  (1) the agent might randomly select a candidate repair plan (open another can?)  (2) the agent might also learn a better model  modifying the world model to match percepts when predictions fail Planning & Acting in the Real World 66

67 Multiagent Planning  the next relaxation of environmental assumptions  there may be multiple agents whose actions need to be taken into account in formulating our plans  background: distinguish several slightly different paradigms  (1) multieffector planning  this is what we might call multitasking, really a single central agent but with multiple ways of interacting with the environment, simultaneously (or, like a multiarmed robot)  (2) multibody planning  here we consider multiple detached units moving separately, but sharing percepts to generate a common representation of the world state that is the basis of the plan  one version of the multibody scenario has central plan formulation but somewhat decoupled execution  for example, a fleet/squadron of reconnaissance robots that are sometimes out of communications range  multibody subplans for each individual body include communication actions 67

68 Multiagent Planning  variations on the theme  with a central planning agent, there's a shared goal  it's also possible for distinct agents, each generating plans, to have a shared goal  the latter paradigm suggests the new prototypical problem: planning for a tennis doubles team  so shared goal situations can be either multibody (1 central plan) or multiagent (each developing a plan, but with a requirement for coordination mechanisms)  a system could even be some hybrid of centralized & multiagent planning  as an example, the package delivery company develops centralized routing plans but each truck driver may respond to unforeseen weather, traffic issues with independent planning Planning & Acting in the Real World 68

69 Multiagent Planning  our first model involves multiple simultaneous actions  the terminology is multiactor settings  we merge aspects of the multieffector, multibody, & multiagent paradigms, then consider issues related to transition models, correctness of plans, & efficiency/complexity of planning algorithms  correctness: a correct plan is one that, if carried out by the actors, achieves the goal  note that in a true multiagent situation, the agents might not agree to carry out any particular plan  synchronization: a simplifying assumption we apply is that all actions take the same length of time & the actions at each step in the joint plan are simultaneous  under a deterministic environment assumption, the transition model is given by the function Result(s, a)  the number of action choices for a single agent is b, & b may be quite large  in the multiactor model with n actors, an action is a joint action ⟨a1, ..., an⟩, where ai is the action taken by the i-th actor 69

70 Multiactor Scenario  complexity implications of the transition model  now with b^n joint actions we have a b^n branching factor for planning  since planning-algorithm complexity was already an issue, a shared target for multiactor planning systems is to treat the actors as decoupled so that complexity is linear in n rather than exponential  loose coupling of the actors may allow an approximately linear improvement  this is analogous to issues we've encountered before: additive heuristics for independent subproblems in planning, reducing a CSP graph to a tree (or multiple trees) to apply efficient algorithms,...  in multiactor planning: for loosely coupled problems, we treat them as decoupled & then apply fixes as required to handle any interactions  so the action schemas of the transition model treat actors as independent 70

71 Multiactor Scenario  prototype problem: doubles tennis  the problem is formulated as returning a ball hit to the team, while retaining court coverage  there are 2 players on the team, each is either at the net or baseline, on the right side or left side of the court  actions are the moving of a player (actor) or the hitting of the ball by a player Planning & Acting in the Real World 71

72 Doubles Tennis Problem  here's the conventional (independence assumption) multiactor problem setup for doubles tennis

Actors(A, B)
Init(At(A, LeftBaseline) ∧ At(B, RightNet)
     ∧ Approaching(Ball, RightBaseline)
     ∧ Partner(A, B) ∧ Partner(B, A))
Goal(Returned(Ball) ∧ (At(a, RightNet) ∨ At(a, LeftNet)))
Action(Hit(actor, Ball),
  PRECOND: Approaching(Ball, loc) ∧ At(actor, loc)
  EFFECT: Returned(Ball))
Action(Go(actor, to),
  PRECOND: At(actor, loc) ∧ to ≠ loc
  EFFECT: At(actor, to) ∧ ¬At(actor, loc))

Planning & Acting in the Real World 72

73 Multiactor Tennis Doubles Scenario  for the multiactor tennis problem  here is a joint plan given the problem description Plan 1: A:[Go(A, RightBaseline), Hit(A, Ball)] B:[NoOp(B), NoOp(B)]  what are issues given the current problem representation?  a legal and apparently successful plan could still have both players hitting the ball at the same time (though that really won't work)  the preconditions don't include constraints to preclude interference of this type  a solution: revise the action schemas to include concurrent action lists that can explicitly state actions are or are not concurrent Planning & Acting in the Real World 73

74 Controlling Concurrent Actions  a revised Hit action requires that it be performed by exactly 1 actor  this is represented by including a concurrent action list

Action(Hit(a, Ball),
  CONCURRENT: ∀b  b ≠ a ⇒ ¬Hit(b, Ball)
  PRECOND: Approaching(Ball, loc) ∧ At(a, loc)
  EFFECT: Returned(Ball))

 some actions might require concurrency for success  apparently tennis players require large coolers full of refreshing drinks & 2 actors are required to carry the cooler

Action(Carry(a, cooler, here, there),
  CONCURRENT: ∃b  b ≠ a ∧ Carry(b, cooler, here, there)
  PRECOND: At(a, here) ∧ At(cooler, here) ∧ Cooler(cooler)
  EFFECT: At(a, there) ∧ At(cooler, there) ∧ ¬At(a, here) ∧ ¬At(cooler, here))

Planning & Acting in the Real World 74
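one way to picture how a planner might enforce such lists when filtering joint actions; the representation (forbidden and required co-occurrences per action name) is an assumption for illustration, not AIMA's data structure

def joint_action_ok(joint, forbidden_with, required_with):
    """Check a joint action {actor: action} against CONCURRENT lists."""
    for actor, act in joint.items():
        others = [a for other, a in joint.items() if other != actor]
        if any(bad in others for bad in forbidden_with.get(act, [])):
            return False           # e.g. two simultaneous Hit actions
        if any(req not in others for req in required_with.get(act, [])):
            return False           # e.g. Carry needs a second carrier
    return True

print(joint_action_ok({"A": "Hit", "B": "Hit"},
                      {"Hit": ["Hit"]}, {}))          # False: both hit the ball
print(joint_action_ok({"A": "Carry", "B": "Carry"},
                      {}, {"Carry": ["Carry"]}))      # True: carried jointly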

75 Multiactor Scenario  given appropriately revised action schemas  including concurrent action lists  it becomes relatively simple to adapt the classical planning algorithms for multiactor planning  it depends on there being loose coupling of subplans  so the plan search algorithm does not encounter concurrency constraints too frequently  further, the HTN approaches, techniques for partial observability, contingency & replanning techniques may also be adapted for the loosely coupled multiactor problems  next: full blown multiagent scenarios  each agent makes independent plans Planning & Acting in the Real World 75

76 Multiple Agents  cooperation & coordination  each agent formulates its own plan, but based on shared goals & a shared knowledge base  we continue with the doubles tennis example problem

Plan 1: A: [Go(A, RightBaseline), Hit(A, Ball)]
        B: [NoOp(B), NoOp(B)]
Plan 2: A: [Go(A, LeftNet), NoOp(A)]
        B: [Go(B, RightBaseline), Hit(B, Ball)]

 either of these plans works if both agents adopt it, but if A follows plan 1 & B follows plan 2 (or vice versa), then both or neither will return the ball  so there has to be some mechanism that results in the agents agreeing on a single plan Planning & Acting in the Real World 76

77 Multiple Agents  techniques for agreement on a single plan  (A) convention: adopt or agree upon some constraint on the selection of joint plans, for example in doubles tennis, "stay on your side of the court"  or a baseball center fielder takes fly balls hit "in the gap"  conventions are observable at more global levels among multiple agents, when, for example, drivers agree to drive on a particular side of the road  in higher order contexts, the conventions become "social laws"  (B) communication: between agents, as when 1 doubles player yells "mine" to a teammate  the signal indicates which is the preferred joint plan  see similar examples in other team sports as when a baseball fielder calls for the catch on a popup  note that the communication could be non-verbal  plan recognition applies when 1 agent begins execution & the initial actions unambiguously indicate which plan to follow 77

78 Multiple Agents  the AIMA authors discuss natural-world conventions  these may be the outcome of evolutionary processes  in harvester ant colonies there is no central control  yet they execute elaborate "plans" where each individual ant takes on 1 of multiple roles based on its current local conditions  convention or communication?  planning & "spontaneous" human social events (Aberdeen)?  another example from the natural world is the flocking behaviour of birds  this can be seen as a cooperative multiagent process  successful algorithmic simulations of flocking behaviour over a collection of agents ("boids") are possible if each observes its neighbours & maximizes a weighted sum of 3 elements (see the sketch below)  (1) cohesion: +ve for being closer to the average position of neighbours  (2) separation: -ve for being too close to a neighbour  (3) alignment: +ve for being closer to the average heading of neighbours 78 Planning & Acting in the Real World
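a minimal sketch of one boid's update from those 3 weighted terms; the weights and the bare-bones 2-D vector arithmetic are illustrative assumptions

def boid_step(pos, vel, neighbours, w_coh=0.01, w_sep=0.05, w_ali=0.1,
              too_close=1.0):
    """One boid update: steer by weighted cohesion, separation, alignment.
    pos, vel: [x, y]; neighbours: list of (position, velocity) pairs."""
    if not neighbours:
        return vel
    n = len(neighbours)
    avg_pos = [sum(p[i] for p, _ in neighbours) / n for i in (0, 1)]
    avg_vel = [sum(v[i] for _, v in neighbours) / n for i in (0, 1)]
    steer = [0.0, 0.0]
    for i in (0, 1):
        steer[i] += w_coh * (avg_pos[i] - pos[i])   # cohesion
        steer[i] += w_ali * (avg_vel[i] - vel[i])   # alignment
    for p, _ in neighbours:                         # separation
        d2 = (p[0] - pos[0]) ** 2 + (p[1] - pos[1]) ** 2
        if 0 < d2 < too_close ** 2:
            for i in (0, 1):
                steer[i] -= w_sep * (p[i] - pos[i])
    return [vel[i] + steer[i] for i in (0, 1)]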

79 Multiple Agents  convention & emergent behaviour  complex global behavior can arise from the interaction of simple local rules  in the boids example, the result is a pseudorigid "flock" that has approximately constant density, does not disperse over time, & makes occasional swooping motions  each agent operates without any joint plan that explicitly indicates the actions of other agents  some boids background & demos are available online  UMP! (ultimate multiagent problems)  these involve cooperation within a team & competition against another team, without central planning/control  robot soccer is an example, as are other similar dynamic team sports (hockey, basketball)  this may be less true of, say, baseball or football, where some central control is possible & a high degree of convention + communication 79

80 Summary  moving away from the limits of classical planning  (1) actions consume (& possibly produce) resources, which we treat as aggregates to control complexity  formulate partial plans, taking resource constraints into account, then refine them  (2) time is a resource that can be handled by dedicated scheduling algorithms or perhaps integrated with planning  (3) an HTN (Hierarchical Task Network) approach captures knowledge in HLAs (High Level Actions) that may have multiple implementations as sequences of lower-level actions  angelic semantics for interpreting the effects of HLAs allows planning in the space of HLAs without refinement into primitive actions  HTN systems can create large, real-world plans  (4) classical planning's environment assumptions are too rigid/optimistic for many problem domains  full observability, deterministic actions, a single agent 80

81 Summary  relaxing the assumptions of classical planning  (5) contingent & sensorless planning  contingent planning uses percepts during execution to conditionally branch to appropriate subplans  sensorless/conformant planning may succeed in coercing the world to a goal state without any percepts  for contingent & sensorless paradigms, plans are built by search in the belief space, for which the techniques must address representational & computational issues  (6) online planning agents interleave execution & planning  they monitor for problems & repair plans to recover from unplanned states, allowing them to deal with nondeterministic actions, exogenous events, & poor models of the environment  (7) multiple agents might be cooperative or competitive  the keys to success are in mechanisms for coordination  (8) future chapters will cover  probabilistic non-determinism, learning from experience to acquire strategies 81


Download ppt "CISC453 Winter 2010 Planning & Acting in the Real World AIMA3 e Ch 11 Time & Resources Hierarchical Techniques Relaxing Environmental Assumptions."

Similar presentations


Ads by Google