CSE 574: Planning & Learning
(which is actually more of the former and less of the latter)
Subbarao Kambhampati

http://rakaposhi.eas.asu.edu/cse574

Most everything will be on the homepage
Fill in and return the survey form

Personnel
• Instructor: Subbarao Kambhampati
• No “official TA”
    Menkes van den Briel (menkes@asu.edu)
    Will Cushing (wcushing@asu.edu)
    J. Benton (j.benton@asu.edu)
  – Will kindly provide unofficial Teaching Czar support
    Dr. Sungwook Yoon may also make cameo appearances

Grading Criteria
• Caveats
  – Graduate-level class. Participation required and essential
• Evaluation (subject to change)
  – Participation (~20%)
    » Do readings before classes. Attend classes. Take part in discussions.
    » Volunteer to be a scribe for at least two classes
  – Projects/Homeworks (~35%)
    » May involve using existing planners, writing new domains
  – Semester project (~20%)
    » Either a term paper or a code-based project
  – Mid-term and Final (~25%)

Wiki-scribing..
• As an experiment, I plan to have designated students be scribes for each class
• The scribe shall take careful notes during the class, and summarize the main points within 3 days of the class
  – Summary will be written on the class wiki, Planning.wiki.asu.edu
• Rest of the class/czars/Rao will modify the notes as needed (thus the wiki)

Pre-requisites (CSE471)
Search
  A* search (admissibility, informedness etc.)
  Local search
  Heuristics as distances in relaxed problems
CSP
  Definition of CSP problems and SAT problems
  Standard techniques for CSP and SAT
Planning
  State space search (progression, regression)
  Planning graph
    -- as a basis for search
    -- as a basis for heuristics
Logic
  -- Propositional logic
  -- Syntax/semantics of first-order logic
Probabilistic Reasoning
  Bayes networks
    -- as a compact way of representing joint distribution
    -- Some idea of standard approaches for reasoning with Bayes nets
Do fill in and return the survey form…

On the Text Book vs. Readings
• There is a recommended textbook. Its coverage overlaps about 50% with the expected coverage of this course
  – You may have to get the book from Amazon…
• Caveats:
  – In some cases, my presentation may be slightly different from that in the textbook
  – In many cases, we will go outside the textbook and read papers. This will happen in two ways:
    » 1. Every so often (a week?), I will assign a paper for “critical reading”. This paper will add to what has been discussed in the class. The intent is to get you to read research papers
        You will be expected to provide a short written review of the paper.
    » 2. In some cases, the best treatment of a topic may be an outside paper…

Teaching Methodology…
• I would like to try and run it as a “true” graduate course
  – I will expect you to have read the required readings before coming to the class
    » I will see my role as “adding” to the readings rather than explaining everything for the first time
    » If I find too many blank faces indicating missed readings, I will consider requiring reading summaries before class
  – I will assume that you are interested not just in figuring out what has been done but also in where most of the action currently is

Expectations game..
• What I expect…
  – Serious time commitment for the course..
  – Active participation
    » In reading
    » In attending
    » In wiki-scribing
    » In online discussions
• What you can expect
  – Background on the current state of the art in automated planning research
  – Ability to read, understand and critique the latest research papers in the area
  – Ability to formulate and attempt to solve research problems in the area

No reason for reduced expectations..
• 17 students registered currently
  – 10 are PhD students
  – 11 have taken CSE471 with Rao and got an A or more
  – 6 have got A+!
  – In other words, these are people who have no life outside of university
• The real question is: can the course be made challenging/interesting enough for this uber-super-student-crowd….

Planning: The big picture
• Synthesizing goal-directed behavior
• Planning involves
  – Action selection; handling causal dependencies
  – Action sequencing and handling resource allocation
    » typically called SCHEDULING
  – Depending on the problem, plans can be
    » action sequences
    » or “policies” (action trees, state-action mappings etc.)

Domain-Independent vs. Domain Specific vs. Domain Customized
• Domain independent planners only expect as input the description of the actions (in terms of their preconditions and effects), and the description of the goals to be achieved
• Domain dependent planners make use of additional knowledge beyond action and goal specification
  – Domain dependent planners may either be stand-alone programs written specifically for that domain OR domain independent planners customized to a specific domain
  – In the case of domain-customized planners, the additional knowledge they exploit can come in many varieties (declarative control rules or procedural directives on which search choices to try and in what order)
  – The additional knowledge can either be input manually or, in some cases, be learned automatically
Unless noted otherwise, we will be talking about domain-independent planning

The Many Complexities of Planning
[Figure: an agent with goals, perceiving and acting on an environment, deciding “What action next?” (the $$$$$$ question). Dimensions annotated on the figure:]
  Static vs. Dynamic
  Observable vs. Partially Observable
  Perfect vs. Imperfect
  Deterministic vs. Stochastic
  Instantaneous vs. Durative
  Full vs. Partial satisfaction
  Discrete vs. Continuous

Planning & (Classical Planning)
[Figure: the same agent-environment loop (environment, action, perception, goals, “What action next?”), restricted to the classical case: (Static) (Observable) (Perfect) (Deterministic)]
I = initial state, G = goal state
Operator Oi: (prec) (effects)
Plan: [ I ] Oi Oj Ok Om [ G ]

[Figure: how relaxing the classical assumptions leads to other planning problems]
Static, Deterministic, Observable, Instantaneous, Propositional: “Classical Planning”
Dynamic: Replanning / Situated Plans
Durative: Temporal Reasoning
Continuous: Numeric Constraint Reasoning (LP/ILP)
Stochastic: Contingent/Conformant Plans, Interleaved Execution; MDP Policies; POMDP Policies
Partially Observable: Contingent/Conformant Plans, Interleaved Execution
Also labeled on the figure: Semi-MDP Policies; Metric-Temporal Planning

State of the Field
• Automated Planning, as a subfield of AI, is as old as AI
• The area has been very “active” for the last several years
  – Papers appear in AAAI; IJCAI; AIJ; JAIR; as well as AIPS and ECP, which have merged to become ICAPS
  – Tremendous strides in deterministic plan synthesis
    » Biennial Intl. Planning Competitions
  – Current interest is in extending the insights from deterministic planning techniques to other planning scenarios.

Topics to be “covered”
• Plan Synthesis under a variety of assumptions
  – Classical, Metric Temporal, Non-deterministic, Stochastic, Partially Observable…
• Plan Management
  – Reasoning with actions and plans (even if you didn’t synthesize them)
  – Execution management (Re-planning)
• State estimation and Plan Recognition
  – Estimating the current state of an agent given a sequence of actions and observations
  – Recognizing the high-level goals that the agent is attempting to satisfy
• Connections to Workflows, Web services, UBICOMP etc.

List of topics to be covered
1. Introduction, representation & search (1 week)
2. State Space and Plan Space Planning, Lifting (1 week)
3. Reachability heuristics (1- week)
4. SAT/CSP/IP based planning graph search; Planning as model-finding (1- week)
5. Refinement Planning as a unifying framework (1 week?)
   RECITATION: Case studies of heuristic planners. Graphplan search
6. Partial satisfaction planning (1 class)
7. Knowledge-based planning with some emphasis on HTN planning (1 week)
8. Model-lite planning (1 class?)
9. Metric/Temporal Planning (1 week)
10. Scheduling (1 week)
11. Non-deterministic Planning: Conformant and Conditional planning (1 week)
12. Probabilistic planning: MDPs and POMDPs (2+ weeks)
13. Plan & Activity recognition (1 week)
14. Monitoring and Diagnosis (1 class)
15. Multi-agent planning (1 class)
16. Planning & Learning (1+ class)

Topics from the last offering (and how this offering will be different)
1. Introduction (Week 1; 8/23; 8/25)
2. State Space and Plan Space Planning (Week 2; 8/30; 9/1)
3. Refinement Planning as a unifying framework (Week 3; 9/8)
4. Lifting and Reachability Heuristics (Week 4; 9/13; 9/15)
5. Case studies of heuristic planners. Graphplan search (Week 5; 9/20; 9/23)
6. Cost-based planning; SAT/CSP based planning graph search; Planning as model-finding (Week 6; 9/27; 9/29)
7. Knowledge-based planning (Week 7)
8. Metric/Temporal Planning (Week 8) (Audio recording of 10/13 lecture)
9. Metric/Temporal Planning: Planners; Heuristics (Week 9)
10. Temporal Networks (Week 10)
11. Temporal Networks contd; Scheduling (Week 11)
12. Planning in Belief States... (Week 12)
13. Planning in Belief States contd. (Week 13)
14. Conditional Planning; Replanning; MDP start (Week 14)
15. More MDPs (Week 15)
How this offering will be different:
-- Would like to condense 1-7 into ~4 weeks
-- Keep 8-11 to about the same
-- Significantly expand 13-15
-- Add other topics

Applications
• Scheduling problems with action choices as well as resource handling requirements
  – Problems in supply chain management
  – HSTS (Hubble Space Telescope scheduler)
  – Workflow management
• Autonomous agents
  – RAX/PS (The NASA Deep Space planning agent)
• Software module integrators
  – VICAR (JPL image enhancing system); CELWARE (CELCorp)
  – Test case generation (Pittsburgh)
• Interactive decision support
  – Monitoring subgoal interactions
    » Optimum AIV system
• Plan-based interfaces
  – E.g. NLP to database interfaces
  – Plan recognition

Applications (contd)
• Web services
  – Composing web services and monitoring their execution have a lot of connections to planning
  – Many of the web standards have a lot of connections to plan representation languages
    » BPEL; BPEL-4WS allow workflow specifications
    » DAML-S allows process specifications
• Grid services/Scientific workflow management
• UBICOMP applications
  – State estimation; plan recognition to figure out what a user is up to (so she can be provided appropriate help)
  – Taking high-level goals and converting them to sequences of actions

Who hires planning folks?
• Rao’s former students are at
  – Xerox Palo Alto Labs
  – USC Information Sciences Inst
  – Stanford Research Inst
  – CMU Robotics Inst
  – IBM India Research Labs
• Other places include
  – NASA
  – Honeywell
  – Lockheed Martin
  – BBN
  – General Dynamics
  – MBARI
  – Google (who will then convert them into search hackers ;-)

Modeling (Deterministic) Planning Problems: Actions, States, Correctness

Transition Systems Perspective
• We can think of the agent-environment dynamics in terms of transition systems
  – A transition system is a 2-tuple <S, A> where
    » S is a set of states
    » A is a set of actions, with each action a being a subset of S x S
  – Transition systems can be seen as graphs, with states corresponding to nodes and actions corresponding to edges
    » If transitions are not deterministic, then the edges will be “hyper-edges”, i.e. they will connect sets of states to sets of states
  – The agent may know that its initial state is some subset S’ of S
    » If the environment is not fully observable, then |S’| > 1
  – It may consider some subset Sg of S as desirable states
  – Finding a plan is equivalent to finding (shortest) paths in the graph corresponding to the transition system
    » The search graph is the same as the transition graph for deterministic planning
    » For non-deterministic actions and/or partially observable environments, the search is in the space of sets of states (called belief states, 2^S)
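To make the "plan = path in the transition graph" view concrete, here is a minimal sketch for the deterministic, fully observable case. The four-state system, the action names "a" and "b", and the function name shortest_plan are all made up for illustration; breadth-first search is just one convenient way to get a shortest path.

```python
from collections import deque


def shortest_plan(actions, start, goals):
    """Breadth-first search for a shortest action sequence from start to a goal.

    actions: dict mapping an action name to a set of (state, next_state) pairs,
             i.e. each action is a subset of S x S.
    Returns a list of action names, or None if no goal state is reachable.
    """
    # Index outgoing edges per state: state -> [(action name, next state)].
    edges = {}
    for name, pairs in actions.items():
        for s, s_next in pairs:
            edges.setdefault(s, []).append((name, s_next))

    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state in goals:
            return plan
        for name, s_next in edges.get(state, []):
            if s_next not in visited:
                visited.add(s_next)
                frontier.append((s_next, plan + [name]))
    return None


# Hypothetical four-state system; both actions are deterministic here.
actions = {
    "a": {("s0", "s1"), ("s1", "s2")},
    "b": {("s0", "s2"), ("s2", "s3")},
}
plan = shortest_plan(actions, "s0", {"s3"})
print(plan)
```

For non-deterministic or partially observable problems the same search would have to run over sets of states (belief states) rather than single states.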

Transition System Models
A transition system is a two-tuple <S, A>, where
  S is a set of “states”
  A is a set of “transitions”; each transition a is a subset of S x S
    -- If a is a (partial) function, then it is a deterministic transition
    -- Otherwise, it is a “non-deterministic” transition
    -- It is a stochastic transition if there are probabilities associated with each state that a takes s to
    -- Finding plans is equivalent to finding “paths” in the transition system
Transition system models are called “Explicit state-space” models.
In general, we would like to represent the transition systems more compactly, e.g. with a state-variable representation of states. These latter are called “Factored” models.
Each action in this model can be represented by an incidence matrix.
The set of all possible transitions will then simply be the SUM of the individual incidence matrices.
Transitions entailed by a sequence of actions will be given by the (matrix) multiplication of the incidence matrices.

Manipulating Transition Systems
[Figure omitted]
Reachable states can be computed this way
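The incidence-matrix operations can be sketched as follows. This is only an illustration under stated assumptions: the matrices are Boolean (so "sum" is element-wise OR and "multiplication" is the Boolean matrix product), the four-state example is invented, and the helper names incidence, mat_sum, mat_mul and reachable are hypothetical.

```python
def incidence(n, pairs):
    """Boolean n x n matrix with m[i][j] True iff the action takes state i to j."""
    m = [[False] * n for _ in range(n)]
    for i, j in pairs:
        m[i][j] = True
    return m


def mat_sum(a, b):
    """Element-wise OR: all transitions allowed by either action."""
    return [[x or y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_mul(a, b):
    """Boolean product: transitions entailed by doing a and then b."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reachable(t, start):
    """States reachable from start under the combined matrix t (fixpoint)."""
    n = len(t)
    reach = {start}
    while True:
        grown = reach | {j for i in reach for j in range(n) if t[i][j]}
        if grown == reach:
            return reach
        reach = grown


# Hypothetical example over states 0..3.
A = incidence(4, [(0, 1), (1, 2)])
B = incidence(4, [(0, 2), (2, 3)])
T = mat_sum(A, B)          # every single-step transition
A_then_B = mat_mul(A, B)   # effect of the two-step sequence A, B
print(reachable(T, 1))
```

The fixpoint loop is equivalent to repeatedly multiplying a reachability vector by T until nothing new is added.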

MDPs as general cases of transition systems
• An MDP (Markov Decision Process) is a general (deterministic or non-deterministic) transition system where the states have “Rewards”
  – In the special case, only a certain set of “goal states” will have high rewards, and everything else will have no rewards
  – In the general case, all states can have varying amounts of reward
• Planning, in the context of MDPs, will be to find a “policy” (a mapping from states to actions) that has the maximal expected reward
• We will talk about MDPs later in the semester
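As a rough preview of what "a policy with maximal expected reward" means, here is a small sketch. It makes several assumptions that are not on the slide: a finite horizon, rewards collected for being in a state, a three-state example with invented actions "safe" and "risky", and finite-horizon dynamic programming as the solution method (chosen only because it is short).

```python
def plan_policy(states, actions, trans, reward, horizon):
    """Finite-horizon dynamic programming (an assumed method, for illustration).

    trans[(s, a)] is a list of (probability, next_state) pairs;
    a missing key means action a is not applicable in s.
    Returns (policy, value) for `horizon` steps to go.
    """
    value = {s: reward[s] for s in states}  # zero steps to go
    policy = {}
    for _ in range(horizon):
        new_value = {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                if (s, a) not in trans:
                    continue
                # Expected value of the successor states under action a.
                q = sum(p * value[s2] for p, s2 in trans[(s, a)])
                if q > best_q:
                    best_a, best_q = a, q
            if best_a is None:          # no applicable action: terminal state
                new_value[s] = reward[s]
            else:
                new_value[s] = reward[s] + best_q
                policy[s] = best_a
        value = new_value
    return policy, value


# Hypothetical MDP: only the goal state carries reward.
states = ["s0", "s1", "goal"]
actions = ["safe", "risky"]
reward = {"s0": 0.0, "s1": 0.0, "goal": 1.0}
trans = {
    ("s0", "safe"): [(1.0, "s1")],
    ("s1", "safe"): [(1.0, "goal")],
    ("s0", "risky"): [(0.5, "goal"), (0.5, "s0")],
}
policy1, value1 = plan_policy(states, actions, trans, reward, horizon=1)
policy2, value2 = plan_policy(states, actions, trans, reward, horizon=2)
print(policy1["s0"], policy2["s0"])
```

With one step to go the gamble is the better choice in s0; with two steps the deterministic two-step route wins, which is why the answer is a state-to-action mapping rather than a fixed action sequence.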

Problems with transition systems
• Transition systems are a great conceptual tool to understand the differences between the various planning problems
• …However, direct manipulation of transition systems tends to be too cumbersome
  – The size of the explicit graph corresponding to a transition system is often very large (see Homework 1 problem 1)
  – The remedy is to provide “compact” representations for transition systems
    » Start by explicating the structure of the “states”
        e.g. states specified in terms of state variables
    » Represent actions not as incidence matrices but rather as functions specified directly in terms of the state variables
        An action will work in any state where some state variables have certain values. When it works, it will change the values of certain (other) state variables

State-Variable Models
• States are modeled in terms of (binary) state-variables
  -- Complete initial state, partial goal state
• Actions are modeled as state transformation functions
  -- Syntax: ADL language (Pednault)
  -- Apply(A,S) = (S \ eff⁻(A)) + eff⁺(A) (if Precond(A) holds in S), where eff⁺(A) are the literals A makes true and eff⁻(A) those it makes false

Example (Apollo 13):
Load(o1) Prec: At(o1,l1), At(R,l1) eff: In(o1)
Fly() Prec: At(R,E) eff: At(R,M), ¬At(R,E), ∀x In(x) ⇒ At(x,M) & ¬At(x,E)
Unload(o1) Prec: In(o1) eff: ¬In(o1)
Initial state (Earth): At(A,E), At(B,E), At(R,E)
Goal: At(A,M), At(B,M), ¬In(A), ¬In(B)
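A minimal sketch of Apply(A,S) for ground actions without conditional effects, with states as sets of true propositions. The Action tuple, its field names, and the grounding of Load/Unload to object A at location E are illustrative assumptions, not part of the slide.

```python
from typing import FrozenSet, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    prec: FrozenSet[str]     # propositions that must hold
    add: FrozenSet[str]      # propositions made true
    delete: FrozenSet[str]   # propositions made false


def apply(action: Action, state: FrozenSet[str]) -> Optional[FrozenSet[str]]:
    """Return the successor state, or None if the preconditions do not hold."""
    if not action.prec <= state:
        return None
    return (state - action.delete) | action.add


# Ground instances of Load and Unload for object A at Earth (hypothetical grounding).
load_a = Action("Load(A)", frozenset({"At(A,E)", "At(R,E)"}),
                frozenset({"In(A)"}), frozenset())
unload_a = Action("Unload(A)", frozenset({"In(A)"}),
                  frozenset(), frozenset({"In(A)"}))

init = frozenset({"At(A,E)", "At(B,E)", "At(R,E)"})
after_load = apply(load_a, init)
print(sorted(after_load))
```

Fly() is left out on purpose: its quantified conditional effect needs either a richer apply or compilation into several plain actions.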

Blocks world
State variables: Ontable(x), On(x,y), Clear(x), hand-empty, holding(x)

Stack(x,y) Prec: holding(x), clear(y) eff: on(x,y), clear(x), ~clear(y), ~holding(x), hand-empty
Unstack(x,y) Prec: on(x,y), hand-empty, clear(x) eff: holding(x), ~on(x,y), ~clear(x), clear(y), ~hand-empty
Pickup(x) Prec: hand-empty, clear(x), ontable(x) eff: holding(x), ~ontable(x), ~hand-empty, ~clear(x)
Putdown(x) Prec: holding(x) eff: ontable(x), hand-empty, clear(x), ~holding(x)

Initial state: Complete specification of T/F values to state variables
  -- By convention, variables with F values are omitted
Goal state: A partial specification of the desired state variable/value combinations

Init: Ontable(A), Ontable(B), Clear(A), Clear(B), hand-empty
Goal: ~clear(B), hand-empty
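The blocks-world model is small enough to solve by brute-force progression search. The sketch below grounds the four operators over blocks A and B and runs breadth-first search from Init to Goal. It follows the standard four-operator formulation (Stack re-asserts clear(x); Unstack deletes on(x,y)); the function names ground_actions and plan, and the split of the goal into "must be true" and "must be false" sets, are assumptions made for illustration.

```python
from collections import deque
from itertools import permutations


def ground_actions(blocks):
    """Return ground actions as (name, prec, add, delete) tuples of sets."""
    acts = []
    for x in blocks:
        acts.append((f"Pickup({x})",
                     {"hand-empty", f"clear({x})", f"ontable({x})"},
                     {f"holding({x})"},
                     {f"ontable({x})", "hand-empty", f"clear({x})"}))
        acts.append((f"Putdown({x})",
                     {f"holding({x})"},
                     {f"ontable({x})", "hand-empty", f"clear({x})"},
                     {f"holding({x})"}))
    for x, y in permutations(blocks, 2):
        acts.append((f"Stack({x},{y})",
                     {f"holding({x})", f"clear({y})"},
                     {f"on({x},{y})", f"clear({x})", "hand-empty"},
                     {f"clear({y})", f"holding({x})"}))
        acts.append((f"Unstack({x},{y})",
                     {f"on({x},{y})", "hand-empty", f"clear({x})"},
                     {f"holding({x})", f"clear({y})"},
                     {f"on({x},{y})", f"clear({x})", "hand-empty"}))
    return acts


def plan(init, goal_true, goal_false, actions):
    """Breadth-first progression search; returns a shortest list of action names."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        # A partial goal: listed variables must match, the rest are "don't care".
        if goal_true <= state and not (goal_false & state):
            return steps
        for name, prec, add, delete in actions:
            if prec <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None


init = {"ontable(A)", "ontable(B)", "clear(A)", "clear(B)", "hand-empty"}
result = plan(init, {"hand-empty"}, {"clear(B)"}, ground_actions(["A", "B"]))
print(result)
```

Note how the negative goal ~clear(B) is checked against the closed-world state: a proposition absent from the state set counts as false.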

Why is this more compact? (than explicit transition systems)
• In explicit transition systems, actions are represented as state-to-state transitions, wherein each action will be represented by an incidence matrix of size |S|x|S|
• In the state-variable model, actions are represented only in terms of the state variables whose values they care about, and whose values they affect.
• Consider a state space of 1024 states. It can be represented by log2(1024) = 10 state variables. If an action needs variable v1 to be true and makes v7 false, it can be represented by just 2 bits (instead of a 1024x1024 matrix)
  – Of course, if the action has a complicated mapping from states to states, in the worst case the action representation will be just as large
  – The assumption being made here is that the actions will have effects on a small number of state variables.
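The size gap can be checked directly by expanding the compact action into its explicit transitions. In this sketch the function name expand and the 0-based variable indexing (v1 is index 0, v7 is index 6) are illustrative choices.

```python
from itertools import product

N_VARS = 10  # log2(1024) boolean state variables


def expand(prec, effects, n_vars=N_VARS):
    """Enumerate the explicit (state, next_state) pairs of a compact action.

    prec and effects map a variable index to the required / resulting value.
    """
    pairs = set()
    for bits in product([False, True], repeat=n_vars):
        if all(bits[i] == v for i, v in prec.items()):
            nxt = list(bits)
            for i, v in effects.items():
                nxt[i] = v
            pairs.add((bits, tuple(nxt)))
    return pairs


# "Needs v1 to be true and makes v7 false", with 0-based indices.
pairs = expand(prec={0: True}, effects={6: False})
n_states = 2 ** N_VARS
matrix_cells = n_states * n_states
print(len(pairs), "transitions in a matrix of", matrix_cells, "cells")
```

The compact description is two variable/value entries, while the equivalent incidence matrix has over a million cells, only 512 of which are non-zero.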

Some notes on action representation
• STRIPS Assumption: Actions must specify all the state variables whose values they change...
• No disjunction allowed in effects
  – Conditional effects are NOT disjunctive
    » (antecedent refers to the previous state & consequent refers to the next state)
• Quantification is over finite universes
  – essentially syntactic sugaring
• All actions can be compiled down to a canonical representation where preconditions and effects are propositional
  – Exponential blow-up may occur (e.g. removing conditional effects)
    » We will assume the canonical representation

Pros & Cons of Compiling to Canonical Action Representation (Added)
• As mentioned, it is possible to compile down ADL actions into STRIPS actions
  – Quantification is written as conjunctions/disjunctions over finite universes
  – Actions with conditional effects are compiled into multiple (exponentially more) actions without conditional effects
  – Actions with disjunctive effects are compiled into multiple actions, each of which takes one of the disjuncts as its precondition
  – (Domain axioms can be compiled down into the individual effects of the actions; so all actions satisfy the STRIPS assumption)
• Compilation is not always a win-win.
  – By compiling down to canonical form, we can concentrate on highly efficient planning for canonical actions
    » However, often compilation leads to an exponential blowup and makes it harder to exploit the structure of the domain
  – By leaving actions in non-canonical form, we can often do more compact encoding of the domains as well as more efficient search
    » However, we will have to continually extend planning algorithms to handle these representations
The basic tradeoff here is akin to the RISC vs. CISC tradeoff.. And we will revisit it when we consider compiling planning problems themselves down into other combinatorial substrates such as CSP, ILP, SAT etc..
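The exponential blow-up from removing conditional effects is easy to see in a sketch. Assumptions here: each conditional effect has a single-literal condition, literals are strings with "~" marking negation, the compiled actions may carry negated preconditions, and the function names are hypothetical. The example grounds the quantified effect of Fly() over objects A and B.

```python
from itertools import product


def negate(lit):
    """Flip a string literal: 'p' <-> '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def compile_conditional(name, prec, effects, cond_effects):
    """Compile k conditional effects into 2**k unconditional actions.

    cond_effects: list of (condition_literal, [effect_literals]).
    Returns a list of (name, preconditions, effects) with frozenset fields.
    """
    compiled = []
    choices = product([True, False], repeat=len(cond_effects))
    for index, choice in enumerate(choices):
        new_prec, new_eff = set(prec), set(effects)
        for fires, (cond, eff) in zip(choice, cond_effects):
            if fires:
                # This variant only applies where the condition holds.
                new_prec.add(cond)
                new_eff.update(eff)
            else:
                # And this one only where it does not.
                new_prec.add(negate(cond))
        compiled.append((f"{name}#{index}", frozenset(new_prec), frozenset(new_eff)))
    return compiled


compiled = compile_conditional(
    "Fly",
    prec={"At(R,E)"},
    effects={"At(R,M)", "~At(R,E)"},
    cond_effects=[
        ("In(A)", ["At(A,M)", "~At(A,E)"]),
        ("In(B)", ["At(B,M)", "~At(B,E)"]),
    ],
)
print(len(compiled))
```

Two cargo objects give 4 variants of Fly; ten would give 1024, which is the structure-hiding blow-up the slide warns about.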

Boolean vs. Multi-valued fluents
• The state variables (“fluents”) in the “factored” representations can be either boolean or multi-valued
  – Most planners have conventionally used boolean fluents
• Many domains are more compactly and naturally represented in terms of multi-valued variables.
• Given a multi-valued state-variable representation, it is easy to compile it down to a boolean state-variable representation.
  – Each D-domain multi-valued fluent gets translated to D boolean variables of the form “fluent-has-the-value-v”
  – Complete conversion should also put in a domain axiom to the effect that only one of those D boolean variables can be true in any state
    » Unfortunately, since the ordinary STRIPS representation doesn’t allow domain axioms, this piece of information is omitted during conversion (forcing planners to figure this out through costly search failures)
• Conversion from boolean to multi-valued representation is trickier.
  – Need to find “cliques” of boolean variables where no more than one variable in the clique can be true at the same time, and convert each such clique into a multi-valued state variable.
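A small sketch of the easy direction, multi-valued to boolean. The fluent names loc(R) and loc(A), their domains, and the helper names are invented for illustration; the mutex groups stand in for the domain axiom that plain STRIPS cannot express.

```python
def to_boolean(fluents):
    """Translate multi-valued fluents into boolean variables.

    fluents: dict mapping a fluent name to its list of domain values.
    Returns (boolean variable names, mutex groups); each group lists the
    D variables of one fluent, exactly one of which should be true.
    """
    variables, mutex_groups = [], []
    for name, domain in fluents.items():
        group = [f"{name}={value}" for value in domain]
        variables.extend(group)
        mutex_groups.append(group)
    return variables, mutex_groups


def encode_state(assignment):
    """Boolean state (set of true variables) for a multi-valued assignment."""
    return {f"{name}={value}" for name, value in assignment.items()}


def is_consistent(state, mutex_groups):
    """Check the omitted domain axiom: one true variable per fluent."""
    return all(sum(v in state for v in group) == 1 for group in mutex_groups)


# Hypothetical fluents: rocket location, and cargo location (may be in the rocket).
fluents = {"loc(R)": ["E", "M"], "loc(A)": ["E", "M", "R"]}
variables, mutex_groups = to_boolean(fluents)
state = encode_state({"loc(R)": "E", "loc(A)": "R"})
print(variables)
print(is_consistent(state, mutex_groups))
```

A planner that is handed only the five boolean variables, without the mutex groups, has to discover by search that states such as {loc(R)=E, loc(R)=M} can never occur.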
