Pat Langley School of Computing and Informatics Arizona State University Tempe, Arizona USA Institute for the Study of Learning and Expertise Palo Alto, California USA Challenges in Learning Plan Knowledge Thanks to D. Choi, T. Konik, U. Kutur, N. Li, D. Nau, N. Nejati, and D. Shapiro for their many contributions. This talk reports research funded by grants from DARPA IPTO, which is not responsible for its contents.
Outline of the Talk 1. Brief review of learning plan knowledge 2. Learning from different sources 3. Learning for new performance tasks 4. Learning in different scenarios 5. Learning with novel representations 6. Some responses to these challenges 7. Concluding remarks
The Problem: Learning Plan Knowledge Given: Basic knowledge about some action-oriented domain. (e.g., state/goal representation, operators) Given: A set of training problems (e.g., initial states, goals, and possibly more) Given: Some performance task that the system must carry out. Given: A performance mechanism that can use knowledge to carry out that task. Learn: Knowledge that will let the system improve its ability to perform new tasks from the same or similar domain.
Topics Not Covered This talk will range widely, but I will not cover issues related to: Learning with impoverished representations Interested in human-like, intelligent behavior Most work on reinforcement learning is irrelevant Acquiring basic knowledge about domain Interested in building on such knowledge Most work on learning action models is too basic Nonincremental learning from large data sets Interested in human-like incremental learning This rules out most data-mining approaches
Historical Topics There has been a long history of work on learning plan knowledge: Forming macro-operators Fikes et al. (1972), Iba (1988), Mooney (1989), Botea et al. (2005) Inducing forward-chaining control rules Anzai & Simon (1978), Mitchell et al. (1981), Langley (1982) Learning control rules analytically Laird et al. (1986), Mitchell et al. (1986), Minton (1988) Problem solving by analogy Veloso (1994), Jones & Langley (1995), VanLehn & Jones (1994) Inducing control rules for partial-order plans Katukam & Kambhampati (1994), Estlin & Mooney (1997)
Historical Trends Work on learning plan knowledge has seen many shifts in fashion: Early hope for improving problem solvers/planners (1978-1985) Excitement/confusion introduced by EBL movement (1986-1992) Some doubts raised by the utility problem (1988-1993) Mass migration to reinforcement learning paradigm (1993-2003) Resurgence of interest in learning plan knowledge (2004-present) Throughout these changes, the problems and potential of learning plan knowledge have remained.
Traditional Sources of Information Most research on learning for planning has assumed the system uses search to generate: Successful paths that achieve the goals (positive instances) Failed paths that do not achieve the goals (negative instances) Alternative paths of different desirability (preferred instances) But humans learn from other sources of information and our AI systems should as well.
Challenge: Learn from Many Sources There has been relatively little research on plan learning from: Demonstrations of solved problems (Nejati et al., 2006) Explicit instruction from teacher (Blythe et al., 2007) Advice or hints from teacher (Mostow, 1983) Mental simulations or daydreaming (Mueller, 1985) Undesirable side effects during execution Humans learn from all of these sources, and our learning systems should support the same capabilities. Moreover, we should develop single systems that integrate plan knowledge learned from all of them (Oblinger, 2006).
Traditional Performance Tasks Most research on learning for planning has assumed the system aims to improve: The efficiency of plan generation (nodes expanded, time) The quality of generated plans (path length, utility) The coverage of plan knowledge (problems solved) But humans learn and use plan knowledge for other purposes that are just as valid.
Challenge: Learn for Plan Execution Many important domains require executing plan knowledge in some environment that includes: operators with likely but nonguaranteed effects; external events not directly under the agent's control; other agents that are pursuing their own goals. Urban driving is one setting that raises all three of these issues. Complex board games like chess, although deterministic, still require interleaving of planning and execution. We need more research on plan learning in contexts of this sort (e.g., Benson, 1995; Fern et al., 2004).
Challenge: Learn for Plan Understanding Another understudied problem is learning for plan understanding. Given: A partially observed sequence of states influenced by another agent's actions. Given: Learned knowledge about how to achieve goals. Find: The other agent's goals and the plans it is pursuing to achieve them. Plan understanding is important not only in complex games, but in military planning, politics, and other settings. This performance task suggests new learning problems, methods, and evaluation criteria.
Traditional Learning Scenarios Most research on learning for planning has assumed the system: Trains on problems from a given distribution / domain Tests on problems from the same distribution / domain Success depends on the extent to which the learner generalizes well to new problems from the same domain. But humans also use their learned plan knowledge in other, more flexible ways to improve performance.
Challenge: Cumulative Learning In complex domains, humans learn plan knowledge gradually: Starting with small, relatively easy problems Moving to complex problems after mastering simpler ones Later acquisitions build naturally on earlier experience, leading to cumulative learning. Our education system depends heavily on such vertical transfer of learned knowledge. We need more learning systems that demonstrate this form of cumulative improvement (e.g., Reddy & Tadepalli, 1997).
Challenge: Cross-Domain Transfer In other cases, humans exhibit a form of transfer that involves: Learning to solve problems in one domain Reusing this knowledge to solve problems in another domain that is superficially quite different Such cross-domain transfer is related to within-domain analogical reasoning, but it is far more challenging. In its extreme form, the two domains support similar solutions but have no shared symbols or predicates. We need more learning systems that demonstrate this radical form of knowledge reuse.
Traditional Learned Representations Most research on learning for planning has focused on learning: Control rules that reduce effective branching factor Macro-operators that reduce effective solution depth These grew naturally from representations used to create hand-crafted expert problem solvers. But now we have other representations of plan knowledge that suggest new learning tasks and methods. Nor does this refer to POMDPs, workflows, or other highly constrained formalisms.
Challenge: Learn HTNs Hierarchical task networks (HTNs) offer the most effective planning available, but they are expensive to build manually. HTNs provide an ideal target for learning because they have: the modularity and flexibility of search-control rules; the large-scale structure of macro-operators. Machine learning has automated the creation of expert classifiers. We should do the same for HTNs, which are effectively expert planning systems.
Challenge: Learn HTNs We can define the task of learning hierarchical task networks as: Given: Basic knowledge about some action-oriented domain. Given: A set of training problems (initial states and goals). Given: Some performance task the system must carry out. Given: Some module that uses HTNs to perform this task. Learn: An HTN that lets the system improve its performance on new tasks from the same or similar domain. We need more research on this important topic (e.g., Reddy & Tadepalli, 1997; Ilghami et al., 2005).
Some Responses Our recent research attempts to respond to these challenges by developing methods that: acquire a constrained but important class of HTNs; that one can use for both planning and reactive control; that learn from both successful problem solving and expert traces; that extend naturally to support cross-domain transfer. Moreover, these ideas are embedded in ICARUS (Langley, 2006), an integrated architecture that supports many capabilities.
Conceptual Knowledge in ICARUS Conceptual knowledge is cast as Horn clauses that specify relevant relations in the environment. Memory is organized hierarchically and divided into primitive and nonprimitive predicates. Primitive concept: (assigned-mission ?patient ?mission). Nonprimitive concept: (patient-form-filled ?patient).
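As a rough illustration of how Horn-clause concepts could be layered, the sketch below forward-chains from primitive concept instances to nonprimitive ones. It is not the ICARUS matcher. The predicate names come from the slides, but the definition given here for patient-form-filled (its body literals) is invented for the example, as are the helper names match, satisfy, and infer.

```python
def match(pattern, fact, bindings):
    """Match one literal pattern against one ground fact.

    Returns the extended bindings, or None if they do not unify.
    Terms starting with '?' are variables; all others are constants.
    """
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def satisfy(body, facts, bindings=None):
    """Yield every set of bindings that makes all body literals true."""
    bindings = {} if bindings is None else bindings
    if not body:
        yield bindings
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from satisfy(rest, facts, extended)


def infer(percepts, rules):
    """Forward-chain over (head, body) rules until no new beliefs appear."""
    beliefs = set(percepts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # list(...) exhausts the generator before beliefs is modified
            for b in list(satisfy(body, beliefs)):
                instance = tuple(b.get(term, term) for term in head)
                if instance not in beliefs:
                    beliefs.add(instance)
                    changed = True
    return beliefs


# Hypothetical definition: the slides name this concept but not its body.
rules = [
    (("patient-form-filled", "?p"),
     [("assigned-mission", "?p", "?m"), ("arrival-time", "?p")]),
]
percepts = {("assigned-mission", "P1", "M1"), ("arrival-time", "P1")}
beliefs = infer(percepts, rules)
```

Here the nonprimitive instance (patient-form-filled P1) is derived only because both lower-level instances are present, which is the sense in which the concept memory is hierarchical.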
HTN Methods in ICARUS Similar to SHOP2, but methods are indexed by the goals they achieve. Each method decomposes a goal into subgoals. If a method's goal is active and its precondition is satisfied, then try to achieve its subgoals or apply its operators. [Diagram labels: HTN goal concept; HTN method; precondition concept; subgoal; operator]
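The goal-indexed retrieval described on this slide can be sketched as a dictionary from goal concepts to methods, filtered by precondition. This is a minimal ground (variable-free) sketch, not the ICARUS or SHOP2 data structures; the Method class, the helper names, and the example goal and precondition names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    goal: str                # goal concept the method achieves (its index key)
    precondition: frozenset  # ground concept instances that must hold
    subgoals: tuple          # ordered subgoals or operator names


def index_by_goal(methods):
    """Group methods under the goal each one achieves."""
    index = {}
    for m in methods:
        index.setdefault(m.goal, []).append(m)
    return index


def applicable(index, active_goal, beliefs):
    """Methods for the active goal whose precondition holds in beliefs."""
    return [m for m in index.get(active_goal, [])
            if m.precondition <= beliefs]


# Hypothetical methods; the names are invented for the example.
methods = [
    Method("arrival-time-known", frozenset({"patient-registered"}),
           ("get-arrival-time",)),
    Method("all-patients-arranged", frozenset({"arrival-time-known"}),
           ("assign-mission",)),
]
index = index_by_goal(methods)
beliefs = frozenset({"patient-registered"})
chosen = applicable(index, "arrival-time-known", beliefs)
```

Indexing by goal means that only methods relevant to the currently active goal are ever tested against the belief state.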
Operators in ICARUS Operators describe low-level actions that agents can execute directly in the environment. Preconditions: legal conditions for action execution. Effects: expected changes when the action is executed. Example: Action (get-arrival-time ?patient ?from ?to); Precondition concept (patient ?p) and (travel-from ?p ?from) and (travel-to ?p ?to); Effects concept (arrival-time ?patient).
Training Input: Expert Traces and Goals Expert demonstration traces: operators the expert uses and the resulting belief state. State: set of concept instances. Goal is a concept instance in the final state. ICARUS learns generalized skills that achieve similar goals. Examples: operator instance (get-arrival-time P2); concept instance (assigned-flight P1 M1); goal concept (all-patients-arranged).
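One way to read the operator description is as a precondition test followed by a state update. The sketch below uses a ground instance of the get-arrival-time operator from the slide. It assumes that effects only add concept instances (the slide does not say whether effects can also remove them), and the constants CityA and CityB, the Operator class, and apply_operator are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    action: tuple             # ground action, e.g. (name, arg1, ...)
    preconditions: frozenset  # concept instances required before execution
    effects: frozenset        # concept instances expected afterwards


def apply_operator(op, state):
    """Return the expected next state, or None if the operator is illegal.

    Assumption: effects are purely additive.
    """
    if not op.preconditions <= state:
        return None
    return frozenset(state | op.effects)


# Ground instance for patient P2; the city constants are hypothetical.
get_arrival_time = Operator(
    action=("get-arrival-time", "P2", "CityA", "CityB"),
    preconditions=frozenset({
        ("patient", "P2"),
        ("travel-from", "P2", "CityA"),
        ("travel-to", "P2", "CityB"),
    }),
    effects=frozenset({("arrival-time", "P2")}),
)
state = frozenset(get_arrival_time.preconditions)
next_state = apply_operator(get_arrival_time, state)
```

Because the effects are only expected changes, an executing agent would still need to check the observed state after acting rather than trust next_state.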
Learning Plan Knowledge from Demonstration [Diagram labels: Expert; Demonstration Traces (states and actions); Background knowledge (concept definitions, operators); LIGHT; Learned plan knowledge (HTNs); Problem (initial state, goal); Plan Knowledge; Reactive Executor; If Impasse]
Learning HTNs by Trace Analysis [Figure labels: concepts; actions]
Learning HTNs by Trace Analysis: Operator Chaining
Learning HTNs by Trace Analysis: Concept Chaining [Figure labels: concepts; actions]
Transfer by Representation Mapping [Figure labels: Source domain; Target domain; Predicate mappings; concepts; actions]
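The slide gives only the labels of the figure, so the following is a guess at the simplest form a predicate mapping could take: renaming the predicates in a learned method so that it refers to the target domain. The dictionary layout for a method, the function names, and both sets of predicate names are hypothetical. How the mapping itself is found is not covered here.

```python
def map_literal(literal, predicate_map):
    """Rename the predicate of one literal; arguments are left unchanged."""
    predicate, *args = literal
    return (predicate_map.get(predicate, predicate), *args)


def map_method(method, predicate_map):
    """Rewrite every literal in a method from source to target predicates."""
    return {
        "goal": map_literal(method["goal"], predicate_map),
        "precondition": [map_literal(lit, predicate_map)
                         for lit in method["precondition"]],
        "subgoals": [map_literal(lit, predicate_map)
                     for lit in method["subgoals"]],
    }


# Hypothetical source-domain method and mapping, invented for the example.
source_method = {
    "goal": ("on", "?x", "?y"),
    "precondition": [("holding", "?x"), ("clear", "?y")],
    "subgoals": [("stack", "?x", "?y")],
}
predicate_map = {"on": "loaded-in", "holding": "carrying", "stack": "load"}
target_method = map_method(source_method, predicate_map)
```

Predicates without an entry in the mapping (clear in this example) pass through unchanged, which is one possible policy among several.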
Challenge: Learn with Richer Goals HTNs are more expressive than classical plans (Erol et al., 1994). Our approach loses this advantage because it assumes the head of each method is a goal it achieves, but we can: Extend goal concepts to describe temporal behavior; Revise the execution module to handle these structures; Augment trace analysis to reason about temporal goals; Learn new methods with temporal goals in their heads. This scheme should acquire the full class of HTNs while still retaining the tractability of goal-directed learning.
Challenge: Extend Conceptual Vocabularies Our approach to learning HTNs relies on the concept hierarchy used to explain solution traces. The method would be less dependent on the given hierarchy if it could extend it: Given: A set of concepts used in goals, states, and methods. Given: New methods acquired from sample solution traces. Find: New concepts that produce improved performance as the result of future method learning. This would support a bootstrapped learner that invents predicates to describe states, goals, and methods.
Challenge: Extend Conceptual Vocabularies Our approach to utilizing predicate invention has three steps: Define a new concept for the precondition of each method learned by chaining off a concept definition. Check traces for states in which this concept becomes true and learn methods to achieve it. During performance, treat each method's precondition as its first subgoal, which it can achieve if submethods are known. This technique would make an HTN more complete by growing it downward, introducing nonterminal symbols as necessary. We have partially implemented this scheme and hope to report results at the next meeting.
Concluding Remarks: Research Style Clearly, there remain many open problems to address in learning plan knowledge. These involve new abilities, not improvements on existing ones, which suggests that we: Look at human behavior for ideas on how to proceed; Develop integrated systems rather than component algorithms; Demonstrate their behavior on challenging domains. These strategies will help us extend the reach of our learning systems, not just strengthen their grasp.
Concluding Remarks: Evaluation We must evaluate our new plan learners, but this does not mean: Measuring their speed in generating plans; Showing they run faster than existing systems; Entering them in planning competitions. More appropriate experiments would revolve around: Demonstrating entirely new functionalities; Running lesion studies to show new features are required; Using performance measures appropriate to the task. These steps will produce conceptual advances and scientific understanding far more than will mindless bake-offs.
Concluding Remarks: Summary Learning plan knowledge is a key area with many open problems: Learning from traces, advice, and other sources; Transferring knowledge within and across domains; Learning and extending rich structures like HTNs. These challenges will benefit from earlier work on plan learning, but they also require new ideas. Together, they should lead us toward learning systems that rival humans in their flexibility and power.
ICARUS Interleaves Execution and Problem Solving This organization reflects the psychological distinction between automatized and controlled behavior. [Diagram labels: Problem; Skill Hierarchy; Primitive Skills; Reactive Execution; impasse? (yes/no); Problem Solving; Executed plan]
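The control flow in this diagram can be approximated by a loop that first tries reactive execution of a stored skill and falls back to problem solving only on an impasse. The sketch below is a loose illustration under strong assumptions: states are sets of atomic symbols, effects are additive, and problem_solve is a one-step stand-in rather than the architecture's actual problem solver. All function and variable names are hypothetical.

```python
def reactive_step(goal, state, skills):
    """Return the action of the first stored skill that applies, else None."""
    for precondition, action in skills.get(goal, []):
        if precondition <= state:
            return action
    return None  # impasse: no stored skill applies


def problem_solve(goal, state, operators):
    """Stand-in solver: any executable operator whose effects add the goal."""
    for name, (precondition, adds) in operators.items():
        if goal in adds and precondition <= state:
            return name
    return None


def run(goal, state, skills, operators, max_steps=10):
    """Interleave reactive execution with problem solving on impasses."""
    log = []
    for _ in range(max_steps):
        if goal in state:
            break
        mode = "execute"
        action = reactive_step(goal, state, skills)
        if action is None:
            mode = "solve"
            action = problem_solve(goal, state, operators)
            if action is None:
                break  # neither route can make progress
        _, adds = operators[action]
        state = state | adds  # assumption: effects only add symbols
        log.append((mode, action))
    return state, log


# Hypothetical single-operator domain.
operators = {
    "get-arrival-time": (frozenset({"patient"}), frozenset({"arrival-time"})),
}
start = frozenset({"patient"})
no_skills = {}
learned_skills = {"arrival-time": [(frozenset({"patient"}), "get-arrival-time")]}

_, novice_log = run("arrival-time", start, no_skills, operators)
_, expert_log = run("arrival-time", start, learned_skills, operators)
```

With an empty skill memory the goal is reached through the solve branch, and with a stored skill the same goal is reached through the execute branch, which mirrors the controlled versus automatized distinction on the slide.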