
1 A Cognitive Architecture for Physical Agents
Pat Langley
Computational Learning Laboratory
Center for the Study of Language and Information
Stanford University, Stanford, California
Thanks to Dongkyu Choi, Kirstin Cummings, Negin Nejati, Seth Rogers, Stephanie Sage, and Daniel Shapiro for their contributions to this research.

2 Assumptions about Cognitive Architectures
1. A cognitive architecture specifies the infrastructure that holds constant over domains, as opposed to knowledge, which varies.
2. A cognitive architecture focuses on functional structures and processes, not on the knowledge or implementation levels.
3. A cognitive architecture commits to representations and organizations of knowledge and to the processes that operate on them.
4. A cognitive architecture comes with a programming language for encoding knowledge and constructing intelligent systems.
5. A cognitive architecture should demonstrate generality and flexibility rather than success on a single application domain.

3 Examples of Cognitive Architectures
Some cognitive architectures produced over 30 years include:
- ACTE through ACT-R (Anderson, 1976; Anderson, 1993)
- Soar (Laird, Rosenbloom, & Newell, 1984; Newell, 1990)
- PRODIGY (Minton & Carbonell, 1986; Veloso et al., 1995)
- PRS (Georgeff & Lansky, 1987)
- 3T (Gat, 1991; Bonasso et al., 1997)
- EPIC (Kieras & Meyer, 1997)
- APEX (Freed et al., 1998)
However, these systems cover only a small region of the space of possible architectures.

4 Goals of the ICARUS Project
We are developing ICARUS, a new cognitive architecture that:
- focuses on physical and embodied agents;
- integrates perception and action with cognition;
- unifies reactive execution with problem solving;
- combines symbolic structures with numeric utilities;
- learns structures and utilities in a cumulative manner.
In this talk, I report on our recent progress toward these goals.

5 Some Target Phenomena
We intend for ICARUS to model high-level phenomena rather than detailed performance effects. For instance, we know that, in complex domains, humans:
- typically continue tasks to completion but can shift to other tasks if advantageous;
- prefer to engage in routine behavior but can solve novel problems when required;
- learn new concepts and skills in a cumulative fashion that builds on the results of previous learning.
Such issues have received much less attention than they deserve.

6 Theoretical Commitments of ICARUS
Our designs for ICARUS have been guided by six principles:
1. Cognitive reality of physical objects
2. Cognitive separation of categories and skills
3. Primacy of categorization and skill execution
4. Hierarchical organization of long-term memory
5. Correspondence of long-term/short-term structures
6. Modulation of symbolic structures with utility functions
These ideas distinguish ICARUS from most other architectures.

7 Overview of the ICARUS Architecture (without learning)
[Diagram: Long-Term Conceptual Memory, Long-Term Skill Memory, Short-Term Conceptual Memory, and Short-Term Skill Memory, linked by Categorization and Inference, Skill Retrieval, Means-Ends Analysis, and Skill Execution; Perception fills a Perceptual Buffer from the Environment, and a Motor Buffer acts on the Environment.]

8 Some Concepts from the Blocks World

(on (?block1 ?block2)
  :percepts ((block ?block1 xpos ?x1 ypos ?y1)
             (block ?block2 xpos ?x2 ypos ?y2 height ?h2))
  :tests ((equal ?x1 ?x2)
          (>= ?y1 ?y2)
          (<= ?y1 (+ ?y2 ?h2))))

(clear (?block)
  :percepts ((block ?block))
  :negatives ((on ?other ?block)))

(unstackable (?block ?from)
  :percepts ((block ?block) (block ?from))
  :positives ((on ?block ?from)
              (clear ?block)
              (hand-empty)))

9 Primitive Skills from the Blocks World

(pickup (?block ?from)
  :percepts ((block ?block xpos ?x)
             (table ?from height ?h))
  :start ((pickupable ?block ?from))
  :requires ()
  :actions ((*move ?block ?x (+ ?h 10)))
  :effects ((holding ?block))
  :value 1.0)

(stack (?block ?to)
  :percepts ((block ?block)
             (block ?to xpos ?x ypos ?y height ?h))
  :start ((stackable ?block ?to))
  :requires ()
  :actions ((*move ?block ?x (+ ?y ?h)))
  :effects ((on ?block ?to)
            (hand-empty))
  :value 1.0)

10 A Nonprimitive Skill from the Blocks World

(puton (?block ?from ?to)
  :percepts ((block ?block) (block ?from) (table ?to))
  :start ((ontable ?block ?from) (clear ?block)
          (hand-empty) (clear ?to))
  :requires ()
  :ordered ((pickup ?block ?from) (stack ?block ?to))
  :effects ((on ?block ?to))
  :value 1.0)

(puton (?block ?from ?to)
  :percepts ((block ?block) (block ?from) (block ?to))
  :start ((on ?block ?from) (clear ?block)
          (hand-empty) (clear ?to))
  :requires ()
  :ordered ((unstack ?block ?from) (stack ?block ?to))
  :effects ((on ?block ?to))
  :value 1.0)

11 Hierarchical Organization of Memory
ICARUS's long-term memories are organized into hierarchies:
- concepts can refer to percepts and to other concepts;
- skills refer to percepts, to concepts, and to other skills.
Conceptual memory is similar to a Rete network, but each node represents a meaningful category. The alternative expansions of skills and concepts also make them similar to Horn clause programs. These hierarchies are encoded by direct reference, rather than through working-memory elements as in ACT and Soar.

12 ICARUS Short-Term Memories

Perceptual buffer:
(self g001 speed 32 wheel-angle -0.2 fuel-level 0.4)
(corner g008 r 15.3 theta 0.25 street-dist 12.7)
(corner g011 r 18.4 theta street-dist 12.7)
(corner g017 r 7.9 theta 1.08 street-dist 5.2)
(lane-line g019 dist 1.63 angle -0.07)
(street g025 name campus address 1423)
(package g029 street panama cross campus address 2134)

Short-term concept memory:
(ahead-right-corner g008)
(ahead-left-corner g011)
(behind-right-corner g017)
(approaching g001 g023)
(opposite-direction g001 g023)
(parallel-to-line g001 g019)
(on-cross-street g001 g029)

Short-term skill memory:
(deliver-package g029)
(avoid-collisions g001)

13 Categorization and Inference
On each cycle, perception deposits object descriptions into the perceptual buffer, and ICARUS matches its concepts against the contents of this buffer. Categorization proceeds in an automatic, bottom-up manner, much as in a Rete matcher. This process can be viewed as a form of monotonic inference that adds concept instances to short-term memory.
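The bottom-up cycle described above can be sketched as a fixpoint computation over a set of beliefs. The concept rules below are illustrative stand-ins written for this sketch, not ICARUS's actual matcher:

```python
def infer(percepts, concepts):
    """Monotonic, bottom-up inference: repeatedly apply each concept's
    rule to the current beliefs, adding derived instances until no
    rule produces anything new (a fixpoint)."""
    beliefs = set(percepts)
    changed = True
    while changed:
        changed = False
        for rule in concepts.values():
            new = rule(beliefs) - beliefs
            if new:
                beliefs |= new
                changed = True
    return beliefs

def clear_rule(beliefs):
    # (clear ?b): a block with no other block on top of it
    blocks = {t[1] for t in beliefs if t[0] == "block"}
    covered = {t[2] for t in beliefs if t[0] == "on"}
    return {("clear", b) for b in blocks - covered}

def unstackable_rule(beliefs):
    # (unstackable ?b ?f): defined over a lower concept, (clear ?b),
    # which shows how concepts can refer to other concepts
    if ("hand-empty",) not in beliefs:
        return set()
    return {("unstackable", b, f)
            for tag, b, f in [t for t in beliefs if t[0] == "on"]
            if ("clear", b) in beliefs}

concepts = {"clear": clear_rule, "unstackable": unstackable_rule}
```

Because inference only ever adds instances, the order in which rules fire does not affect the final belief set.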

14 Retrieving and Matching Skill Paths
On each cycle, ICARUS finds all paths through its skill hierarchy that:
- begin with an instance in skill short-term memory;
- have start and requires fields that match;
- have effects fields that do not match.
Each instantiated path produced in this way terminates in an executable action. ICARUS adds these candidate actions to its motor buffer for possible execution.
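A minimal sketch of this retrieval step, under the assumption that each skill is a dictionary with start, effects, and subskills fields; the blocks-world names in the test are illustrative:

```python
def skill_paths(intentions, skills, beliefs):
    """Enumerate paths from an intention down to a primitive skill
    whose start conditions hold and whose effects are not yet all
    satisfied in the current beliefs."""
    paths = []

    def walk(name, prefix):
        skill = skills[name]
        if not all(c in beliefs for c in skill.get("start", [])):
            return                              # start field does not match
        effects = skill.get("effects", [])
        if effects and all(e in beliefs for e in effects):
            return                              # effects already achieved
        path = prefix + [name]
        subskills = skill.get("subskills", [])
        if not subskills:
            paths.append(path)                  # terminates in a primitive
        for sub in subskills:
            walk(sub, path)

    for name in intentions:
        walk(name, [])
    return paths
```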

15 Retrieving and Matching Skill Paths
[Diagram: skills and skill expansions.] Each path through the skill hierarchy starts at an intention and ends at a primitive skill instance.

16 Evaluating and Executing Skills
For each selected path, ICARUS computes a utility by summing the values of the skills along that path. Then, for each path in order of decreasing utility:
- if the required resources are available, it executes the associated actions;
- if it executes them, it commits those resources for this cycle.
These actions alter the environment, which in turn affects the perceptual buffer and thus conceptual memory.
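The selection loop above can be sketched as follows; the utilities, action names, and resource tags in the test are invented for illustration:

```python
def execute_cycle(candidates, resources):
    """Execute candidate actions in order of decreasing utility,
    committing each action's resources for the rest of the cycle so
    that lower-utility paths cannot claim them."""
    free = set(resources)
    executed = []
    for utility, action, needed in sorted(candidates, key=lambda c: -c[0]):
        if set(needed) <= free:       # required resources still available?
            free -= set(needed)       # commit them for this cycle
            executed.append(action)
    return executed
```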

17 Modulating Utilities with Persistence
Fully reactive behavior is not always desirable when one is completing a task that requires extended activity. In response, ICARUS retains the most recently executed path for each skill instance in short-term memory. For each matched path, it modulates the path's utility to be

U' = U (1 + p (Σ_{i=1}^{s} k^i) / (Σ_{j=1}^{d} k^j)),

where U is the original utility, d is the depth of the candidate path, s is the number of steps it shares with the previous path, p is a persistence factor, and 0 < k < 1 is a decay term. The greater the persistence factor, the greater the agent's bias toward continuing to execute skills it has already initiated.
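In code, the modulation rule reads as below. The default values of p and k are arbitrary choices for this sketch, not ICARUS's settings, and the path names are made up:

```python
def modulated_utility(u, path, previous_path, p=0.5, k=0.9):
    """Boost a path's utility by its overlap with the previously
    executed path: U' = U * (1 + p * sum_{i=1..s} k^i / sum_{j=1..d} k^j),
    where s is the number of initial steps shared with the previous
    path and d is the depth of the candidate path."""
    d = len(path)
    s = 0
    for a, b in zip(path, previous_path):
        if a != b:
            break
        s += 1
    shared = sum(k ** i for i in range(1, s + 1))
    total = sum(k ** j for j in range(1, d + 1))
    return u * (1 + p * shared / total)
```

When the paths share no steps the utility is unchanged, and when they coincide completely it is scaled by the full factor 1 + p.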

18 An Experimental Study of Persistence
To evaluate the influence of persistence on agent behavior, we:
- created skills for in-city driving and package delivery;
- gave the agent top-level intentions to deliver three packages;
- generated five distinct sets of package delivery tasks;
- varied the persistence level used in pursuing these tasks.
We ran the system on each task and, for each level, averaged the number of cycles ICARUS needed to deliver all the packages. We predicted that an intermediate persistence level would give the best results.

19 Effects of Persistence on Delivery Time
We also found that delivery time scaled linearly with the size of the city.

20 Means-Ends Problem Solving
When a selected skill cannot be executed, ICARUS invokes a variant of means-ends analysis that:
- finds concepts which, if true, would let execution continue;
- adds one of the unsatisfied concept instances to a goal stack;
- chains off a relevant skill or off the concept's definition;
- backtracks when no further chaining is feasible.
Each step takes one cycle and, unlike in most AI planning systems, execution occurs whenever possible.
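A stripped-down backward-chaining sketch of this process, with two caveats: it neither interleaves execution with chaining nor updates beliefs as subplans run, both of which the cycle-by-cycle process above does. The skill names are illustrative:

```python
def means_ends(goal, beliefs, skills, depth=10):
    """Chain backward from an unsatisfied goal through skills indexed
    by their effects; return a flat plan, or None if chaining fails.
    `depth` bounds recursion in place of full backtracking bookkeeping."""
    if goal in beliefs:
        return []
    if depth == 0:
        return None
    for name, skill in skills.items():
        if goal in skill["effects"]:
            plan = []
            for cond in skill["start"]:
                subplan = means_ends(cond, beliefs, skills, depth - 1)
                if subplan is None:
                    break                    # backtrack: try another skill
                plan += subplan
            else:
                return plan + [name]
    return None
```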

21 An Abstract Means-Ends Trace

22 Learning Skills from Means-Ends Traces
ICARUS learns hierarchical skills from traces of problem solutions:
- When it attempts to execute S2 to achieve concept O, but must first execute S1 to achieve the start conditions of S2, it creates a skill with S1 and S2 as subskills and O as the effect.
- When it attempts to achieve concept O by achieving subconcepts {O1, ..., On}, which it does by executing {S1, ..., Sn}, it creates a skill with S1, ..., Sn as subskills and O as the effect.
In both cases, ICARUS also defines new concepts to serve as start conditions for the learned skills, ensuring they have the desired effect. This method is akin to macro learning and production compilation, but it constructs a reactive skill hierarchy rather than flat rules.
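A stripped-down version of the first composition rule. Taking the first subskill's start condition is a simplification standing in for ICARUS's learned start concepts, and the naming scheme is invented for this sketch:

```python
def compose_skill(effect, subskills, skills):
    """Create a nonprimitive skill from a solved means-ends trace:
    the chained subskills become an ordered expansion whose effect is
    the concept the trace achieved."""
    name = "learned-" + effect
    skills[name] = {
        "start": list(skills[subskills[0]]["start"]),  # simplified start
        "ordered": list(subskills),
        "effects": [effect],
    }
    return name
```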

23-27 Learning Skills from Means-Ends Traces
[Diagram sequence: five animation frames in which skills A, B, C, D, and E are learned incrementally from successive means-ends traces.]

28 An Experimental Study of Skill Learning
To evaluate ICARUS's ability to learn hierarchical skills, we:
- created primitive concepts and skills for the blocks world;
- gave the agent problems in order of increasing complexity;
- sampled randomly from 200 different training orders;
- ran the architecture with learning turned on and off.
For each condition and experience level, we counted the number of cycles needed to solve a problem and averaged the results.

29 Effects of Skill Learning in Blocks World

30 Utility Functions in ICARUS
Skill expansions in ICARUS have associated utility functions that:
- describe expected utility in terms of perceived object attributes;
- are summed along each path through the skill hierarchy;
- are used to decide which paths to execute on each cycle.
Taking entire paths into account produces context effects, in that an action has a different utility depending on its calling skills. An earlier version of ICARUS acquired these value functions from delayed reward, using a hierarchical variant of Q-learning.

31 Extended Utility Functions
We are implementing an extended version of ICARUS that:
- associates reward functions with individual concepts;
- uses skill effects and durations to compute expected values;
- updates the probability of success on completion or abandonment;
- updates the expected duration of each skill upon completion.
This model-based approach to learning from delayed reward should be much faster than standard methods. However, it views reward as internal to the agent, rather than as coming from the environment.

32 Domains Studied to Date
To demonstrate generality, we have developed initial ICARUS programs for a number of domains, including:
- in-city driving
- pole balancing
- Tower of Hanoi
- multi-column subtraction
- peg solitaire
Each of these connects to a simulated environment that contains objects the system can perceive and affect.

33 Intellectual Precursors
ICARUS's design has been influenced by many previous efforts:
- earlier research on integrated cognitive architectures, especially ACT, Soar, and PRODIGY;
- earlier work on architectures for reactive control, especially universal plans and teleoreactive programs;
- research on learning macro-operators and search-control rules;
- decision theory and decision analysis;
- previous versions of ICARUS (going back to 1988).
However, the framework combines and extends ideas from its various predecessors in novel ways.

34 Directions for Future Research
Future work on ICARUS should introduce additional methods for:
- forward chaining and mental simulation of skills;
- learning durations and success rates to support lookahead;
- allocation of scarce resources like perceptual attention;
- probabilistic encoding and matching of Boolean concepts;
- flexible recognition of skills executed by other agents;
- extension of short-term memory to store episodic traces.
Taken together, these features should make ICARUS a more general and powerful cognitive architecture.

35 Concluding Remarks
ICARUS is an integrated architecture for intelligent agents that:
- includes separate memories for concepts and skills;
- organizes both memories in a hierarchical fashion;
- modulates reactive execution with persistence;
- augments routine behavior with problem solving; and
- learns new skills and concepts in a cumulative manner.
This constellation of concerns distinguishes ICARUS from other research on integrated architectures.

36 End of Presentation

37 In-City Driving: A Cognitive Task for Embodied Agents

38 Overview of the ICARUS Architecture (without learning)
[Diagram: Long-Term Conceptual Memory and Long-Term Skill Memory with their short-term counterparts, linked by Nomination of Skills, Categorization and Inference, Means-Ends Problem Solving, Skill Selection/Execution, and Abandonment of Skills; Perception of the Environment fills a Perceptual Buffer.]

39 Examples of Long-Term Concepts

(corner-ahead-left (?corner)
  :percepts ((corner ?corner r ?r theta ?theta))
  :tests ((< ?theta 0)
          (>= ?theta ))
  :value (+ (* 5.6 ?r) (* 3.1 ?theta)))

(in-intersection (?self)
  :percepts ((self ?self)
             (corner ?ncorner street-dist ?sd))
  :positives ((near-block-corner ?ncorner)
              (corner-straight-ahead ?scorner))
  :negatives ((far-block-corner ?fcorner))
  :tests ((< ?sd 0.0))
  :value -10.0)

40 Examples of Long-Term Skills

(make-right-turn (?self ?corner)
  :objective ((behind-right-corner ?corner))
  :start ((in-rightmost-lane ?self)
          (ahead-right-corner ?corner)
          (at-turning-distance ?corner))
  :requires ((near-block-corner ?corner)
             (at-turning-speed ?self))
  :ordered ((begin-right-turn ?self ?corner)
            (end-right-turn ?self ?corner))
  :value 30.0)

(slow-for-intersection (?self)
  :percepts ((self ?self speed ?speed)
             (corner ?corner street-dist ?d))
  :objective ((slow-enough-intersection ?self))
  :requires ((near-block-corner ?corner))
  :actions ((*slow-down))
  :value (+ (* -5.2 ?d) (* 20.3 ?speed)))

41 Retrieving and Matching Skill Paths

42 Skill Nomination and Abandonment
ICARUS adds skill instances to short-term skill memory that:
- refer to concept instances or goals in short-term memory;
- have expected utility greater than the agent's discounted past utility.
ICARUS removes a skill when its expected utility falls far below its past utility.
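The nomination and abandonment tests might be sketched as below; the discount factor, the drop threshold, and the skill names are all invented for illustration, not ICARUS's actual parameters:

```python
def update_intentions(active, candidates, past_utility, gamma=0.9, drop=0.5):
    """Nominate candidate skills whose expected utility beats the
    agent's discounted past utility, and abandon active skills whose
    expected utility has fallen far below that level."""
    threshold = gamma * past_utility
    for name, eu in candidates.items():
        if eu > threshold:
            active[name] = eu                  # nominate
    for name in [n for n, eu in active.items() if eu < drop * threshold]:
        del active[name]                       # abandon
    return active
```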

43 Learning from Skill Subgoaling
[Diagram: skills S1 through S10 arranged over time, each annotated with a success probability P and a duration D, e.g. P = 0.74, D = 50 at the top level.]

44 Intelligent Assistance for Office Planning
For a recent DARPA project, we are developing systems that:
- assist users in planning and scheduling trips;
- assist users in planning and scheduling meetings;
- accept advice about how to accomplish these tasks;
- learn new skills from such interactions;
- infer the preferences of individual users.
We hope to extend the ICARUS framework to support these new performance and learning tasks.

45 Some Necessary Extensions
Directions in which we need to extend ICARUS include:
- Representation
  - time needed to execute skills
  - resources required by each skill
  - plans (specific instances of expanded skills)
  - episodes (past instances of executed skills)
- Performance
  - plan generation
  - plan revision
  - advice taking
- Learning
  - new skills from advice
  - preferences from interactions
These additions will make ICARUS a more complete architecture for intelligent agents.

