A Value-Driven Architecture for Intelligent Behavior


1 A Value-Driven Architecture for Intelligent Behavior
Dan Shapiro and Pat Langley
Computational Learning Laboratory
Center for the Study of Language and Information
Stanford University, Stanford, California
This research was supported in part by Grant NCC from NASA Ames Research Center. Thanks to Meg Aycinena, Michael Siliski, Stephanie Sage, and David Nicholas.

2 Assumptions about Cognitive Architectures
We should move beyond isolated phenomena and capabilities to develop complete intelligent agents.
AI and cognitive psychology are close allies with distinct but related goals.
A cognitive architecture specifies the infrastructure that holds constant over domains, as opposed to knowledge, which varies.
We should model behavior at the level of functional structures and processes, not the knowledge or implementation levels.
A cognitive architecture should commit to representations and organizations of knowledge and processes that operate on them.
An architecture should come with a programming language for encoding knowledge and constructing intelligent systems.
An architecture should demonstrate generality and flexibility rather than success on a single application domain.

3 Examples of Cognitive Architectures
Some of the cognitive architectures produced over 30 years include:
ACTE through ACT-R (Anderson, 1976; Anderson, 1993)
Soar (Laird, Rosenbloom, & Newell, 1984; Newell, 1990)
Prodigy (Minton & Carbonell, 1986; Veloso et al., 1995)
PRS (Georgeff & Lansky, 1987)
3T (Gat, 1991; Bonasso et al., 1997)
EPIC (Kieras & Meyer, 1997)
APEX (Freed et al., 1998)
However, these systems cover only a small region of the space of possible architectures.

4 Goals of the ICARUS Project
We are developing the ICARUS architecture to support effective construction of intelligent autonomous agents that:
integrate perception and action with cognition;
combine symbolic structures with affective values;
unify reactive behavior with deliberative problem solving; and
learn from experience but benefit from domain knowledge.
In this talk, we report on our recent progress toward these goals.

5 Design Principles for ICARUS
Our designs for ICARUS have been guided by five principles:
1. Affective values pervade intelligent behavior;
2. Categorization has primacy over execution;
3. Execution has primacy over problem solving;
4. Tasks and intentions have internal origins; and
5. The agent’s reward is determined internally.
These ideas distinguish ICARUS from most agent architectures.

6 A Cognitive Task for Physical Agents
(car self)       (#back self 225)       (#front self 235)       (#speed self 60)
(car brown-1)    (#back brown-1 208)    (#front brown-1 218)    (#speed brown-1 65)
(car green-2)    (#back green-2 251)    (#front green-2 261)    (#speed green-2 72)
(car orange-3)   (#back orange-3 239)   (#front orange-3 249)   (#speed orange-3 56)
(in-lane self B)  (in-lane brown-1 A)  (in-lane green-2 A)  (in-lane orange-3 C)

7 Overview of the ICARUS Architecture*
[Architecture diagram: Perceive Environment fills a Perceptual Buffer from the Environment; Categorize / Compute Reward links Long-Term Conceptual Memory to Short-Term Conceptual Memory; Nominate, Repair, and Abandon Skill Instance link Long-Term Skill Memory to Short-Term Skill Memory; Select / Execute Skill Instance acts on the Environment.]
* without learning

8 Some Motivational Terminology
ICARUS relies on three quantitative measures related to motivation:
Reward: the affective value produced on the current cycle.
Past reward: the discounted sum of previous agent reward.
Expected reward: the predicted discounted future reward.
These let an ICARUS agent make decisions that take into account its past, present, and future affective responses.
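The three measures above can be sketched in a few lines of Python. This is an illustrative sketch, not ICARUS code; the discount factor `GAMMA` and the helper names are assumptions for the example.

```python
# Illustrative sketch of the three motivational quantities.
# GAMMA is an assumed discount factor; the talk does not give a value.
GAMMA = 0.9

def update_past_reward(past_reward, current_reward, gamma=GAMMA):
    """Fold the current cycle's reward into the discounted sum of past reward."""
    return gamma * past_reward + current_reward

def expected_reward(predicted_rewards, gamma=GAMMA):
    """Discounted sum of predicted future rewards, nearest cycle first."""
    return sum(r * gamma ** t for t, r in enumerate(predicted_rewards))

past = 0.0
for r in [1.0, 2.0, 4.0]:          # rewards from three successive cycles
    past = update_past_reward(past, r)
# past is 0.9 * (0.9 * 1.0 + 2.0) + 4.0 = 6.61
```

The recurrence keeps past reward bounded while weighting recent cycles most heavily, which matches its role as a comparison point for nomination and abandonment later in the talk.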

9 Long-Term Conceptual Memory
ICARUS includes a long-term conceptual memory that contains:
Boolean concepts that are either True or False;
numeric concepts that have quantitative measures.
These concepts may be either:
primitive (corresponding to the results of sensory actions);
defined as a conjunction of other concepts and predicates.
Each Boolean concept includes an associated reward function.
* ICARUS’ concept memory is distinct from, and more basic than, skill memory, and provides the ultimate source of motivation.

10 Examples of Long-Term Concepts
(ahead-of (?car1 ?car2)
  :defn    (car ?car1) (car ?car2)
           (#back ?car1 ?back1)
           (#front ?car2 ?front2)
           (> ?back1 ?front2)
  :reward  (#dist-ahead ?car1 ?car2 ?d)
           (#speed ?car2 ?s)
  :weights ( ) )

(coming-from-behind (?car1 ?car2)
  :defn    (car ?car1) (car ?car2)
           (in-lane ?car1 ?lane1)
           (in-lane ?car2 ?lane2)
           (adjacent ?lane1 ?lane2)
           (faster-than ?car1 ?car2)
           (ahead-of ?car2 ?car1)
  :reward  (#dist-behind ?car1 ?car2 ?d)
           (#speed ?car2 ?s)
  :weights (4.8 2.7) )

(clear-for (?lane ?car)
  :defn     (lane ?lane ?left-line ?right-lane)
            (not (overlaps-and-adjacent ?car ?other))
            (not (coming-from-behind ?car ?other))
            (not (coming-from-behind ?other ?car))
  :constant 10.0 )

11 A Sample Conceptual Hierarchy
[Figure: a conceptual hierarchy over the nodes lane, adjacent, faster-than, #speed, clear-for, coming-from-behind, ahead-of, #yback, car, overlaps-and-adjacent, overlaps, #yfront, and in-lane.]

12 Long-Term Skill Memory
ICARUS includes a long-term skill memory in which skills contain:
an :objective field that encodes the skill’s desired situation;
a :start field that must hold for the skill to be initiated;
a :requires field that must hold throughout the skill’s execution;
an :ordered or :unordered field referring to subskills or actions;
a :values field with numeric concepts to predict expected value;
a :weights field indicating the weight on each numeric concept.
These fields refer to terms stored in conceptual long-term memory.
* ICARUS’ skill memory encodes knowledge about how and why to act in the world, not about how to solve problems.

13 Examples of Long-Term Skills
(pass (?car1 ?car2 ?lane)
  :start     (ahead-of ?car2 ?car1)
             (in-same-lane ?car1 ?car2)
  :objective (ahead-of ?car1 ?car2)
             (in-same-lane ?car1 ?car2)
  :requires  (in-lane ?car2 ?lane)
             (adjacent ?lane ?to)
  :ordered   (speed&change ?car1 ?car2 ?lane ?to)
             (overtake ?car1 ?car2 ?lane)
             (change-lanes ?car1 ?to ?lane)
  :values    (#distance-ahead ?car1 ?car2 ?d)
             (#speed ?car2 ?s)
  :weights   ( ) )

(change-lanes (?car ?from ?to)
  :start     (in-lane ?car ?from)
  :objective (in-lane ?car ?to)
  :requires  (lane ?from ?shared ?right)
             (lane ?to ?left ?shared)
             (clear-for ?to ?car)
  :ordered   (*shift-left)
  :constant  0.0 )

14 ICARUS’ Short-Term Memories
Besides long-term memories, ICARUS stores dynamic structures in:
a perceptual buffer with primitive Boolean and numeric concepts:
  (car car-06), (in-lane car-06 lane-a), (#speed car-06 37)
a short-term conceptual memory with matched concept instances:
  (ahead-of car-06 self), (faster-than car-06 self), (clear-for lane-a self)
a short-term skill memory with instances of skills that the agent intends to execute:
  (speed-up-faster-than self car-06), (change-lanes lane-a lane-b)
These encode temporary beliefs, intended actions, and their values.
* ICARUS’ short-term memories store specific, value-laden instances of long-term concepts and skills.

15 Categorization and Reward in ICARUS
[Diagram: Perceive Environment fills the Perceptual Buffer; Categorize / Compute Reward maps Long-Term Conceptual Memory onto Short-Term Conceptual Memory.]
Categorization occurs in an automatic, bottom-up manner.
A reward is calculated for every matched Boolean concept.
This reward is a linear function of associated numeric concepts.
Total reward is the sum of rewards for all matched concepts.
* Categorization and reward calculation are inextricably linked.

16 Skill Nomination and Abandonment
ICARUS adds skill instances to short-term skill memory that:
refer to concept instances in short-term conceptual memory;
have expected reward greater than the agent’s discounted past reward.
ICARUS removes a skill when its expected reward falls well below past reward.
[Diagram: Nominate Skill Instance and Abandon Skill Instance link Long-Term Skill Memory, Short-Term Skill Memory, and Short-Term Conceptual Memory.]
* Nomination and abandonment create highly autonomous behavior that is motivated by the agent’s internal reward.

17 Skill Selection and Execution
On each cycle, ICARUS executes the skill with the highest expected reward.
Selection invokes deep evaluation to find the action with the highest expected reward.
Execution causes action, including sensing, which alters memory.
[Diagram: Perceive Environment fills the Perceptual Buffer and Short-Term Conceptual Memory; Select / Execute Skill Instance draws on Short-Term Skill Memory and acts on the Environment.]
* ICARUS makes value-based choices among skills, and among the alternative subskills and actions in each skill.

18 ICARUS’ Interpreter for Skill Execution
(speed&change (?car1 ?car2 ?from ?to)
  :start     (ahead-of ?car1 ?car2)
             (same-lane ?car1 ?car2)
  :objective (faster-than ?car1 ?car2)
             (different-lane ?car1 ?car2)
  :requires  (in-lane ?car2 ?from)
             (adjacent ?from ?to)
  :unordered (*accelerate)
             (change-lanes ?car1 ?from ?to) )
Given Start: if not (Objectives) and Requires, then
- choose among unordered Subskills
- consider ordered Subskills
* ICARUS skills have hierarchical structure, and the interpreter uses a reactive control loop to identify the most valuable action.
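The interpretation rule above can be sketched as a small Python function. This is a hedged sketch with hypothetical data structures, not the ICARUS implementation: a skill is a dict of concept names, and beliefs are a set of matched concepts; real ICARUS chooses among subskills by expected value rather than taking the first.

```python
# Sketch of the reactive control rule: act only while objectives are
# unmet and the :requires field still holds; otherwise signal repair.
def step(skill, beliefs):
    """Return the subskill to pursue on this cycle, None, or 'repair'."""
    if all(obj in beliefs for obj in skill["objective"]):
        return None                  # objectives satisfied: nothing to do
    if not all(req in beliefs for req in skill["requires"]):
        return "repair"              # requirements violated: trigger repair
    for sub in skill["unordered"]:   # ICARUS would pick the most valuable
        return sub

speed_and_change = {
    "objective": ["faster-than", "different-lane"],
    "requires": ["in-lane", "adjacent"],
    "unordered": ["*accelerate", "change-lanes"],
}
```

For example, with beliefs `{"in-lane", "adjacent"}` the skill is executable and a subskill is returned; once both objectives hold, the function returns None.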

19 Cognitive Repair of Skills
ICARUS seeks to repair skills whose requirements do not hold by:
finding concepts that, if true, would let execution continue;
selecting the concept that is most important to repair; and
nominating a skill with objectives that include the concept.
Repair takes one cycle and adds at most one skill instance to memory.
* This backward chaining is similar to means-ends analysis, but it supports execution rather than planning.
[Diagram: Repair Skill Instance links Long-Term Skill Memory and Short-Term Skill Memory.]

20 Revising Expected Reward Functions
ICARUS uses a hierarchical variant of Q learning to revise estimated reward functions based on internally computed rewards:
[Figure: the reward stream R(t) drives updates over the skill hierarchy (pass over change-lanes and speed&change, down to *shift-left and *accelerate); each Q(s) is updated from R(t) and Q(s′).]
* This method learns 100 times faster than nonhierarchical ones.
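A single update of the kind the slide refers to can be sketched as a standard temporal-difference step: Q(s) moves toward R(t) + γ·Q(s′). This is a generic TD/Q-learning sketch, not the ICARUS learning code; the learning rate `ALPHA` and discount `GAMMA` are assumed values.

```python
# Generic temporal-difference update toward R(t) + gamma * Q(s').
# ALPHA and GAMMA are assumptions for the example, not values from the talk.
ALPHA, GAMMA = 0.1, 0.9

def td_update(q, s, s_next, reward):
    """Move q[s] a step toward the bootstrapped target."""
    old = q.get(s, 0.0)
    target = reward + GAMMA * q.get(s_next, 0.0)
    q[s] = old + ALPHA * (target - old)

q = {}
td_update(q, "change-lanes", "*shift-left", 1.0)
# q["change-lanes"] is now 0.1 (one step of size ALPHA toward target 1.0)
```

In the hierarchical variant, each skill in the calling tree maintains its own value estimate and is updated from the same internally generated reward stream, which is what the slide credits for the large speedup.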

21 Intellectual Precursors
Our work on ICARUS has been influenced by many previous efforts:
earlier research on integrated cognitive architectures, especially ACT, Soar, and Prodigy;
earlier work on architectures for reactive control, especially universal plans and teleoreactive programs;
research on learning value functions from delayed reward, especially hierarchical approaches to Q learning;
decision theory and decision analysis; and
previous versions of ICARUS (going back to 1988).
However, ICARUS combines and extends ideas from its various predecessors in novel ways.

22 Directions for Future Research
Future work on ICARUS should introduce additional methods for:
forward chaining and mental simulation of skills;
allocation of scarce resources and selective attention;
probabilistic encoding and matching of Boolean concepts;
flexible recognition of skills executed by other agents;
caching of repairs to extend the skill hierarchy;
revision of internal reward functions for concepts; and
extension of short-term memory to store episodic traces.
Taken together, these features should make ICARUS a more general and powerful architecture for constructing intelligent agents.

23 Concluding Remarks
ICARUS is a novel integrated architecture for intelligent agents that:
includes separate memories for concepts and skills;
organizes concepts and skills in a hierarchical manner;
associates affective values with all cognitive structures;
calculates these affective values internally;
combines reactive execution with cognitive repair; and
uses expected values to nominate tasks and abandon them.
This constellation of concerns distinguishes ICARUS from other research on integrated architectures.

25 Estimation of Expected Reward
Estimates are linear functions of numeric features in the :values field of a skill. These features are inherited down the calling tree from skills to subskills.
[Figure: the calling tree pass → speed&change → *accelerate, with the features #speed, #dist-ahead, #room, and #own-speed inherited down the tree.]
* The expected reward for an ICARUS skill depends on the agent’s intent and responds to its current beliefs.

26 A Cognitive Task for Physical Agents

27 Examples of Long-Term Concepts
(ahead-of (?car1 ?car2)
  :defn    (car ?car1) (car ?car2)
           (#yback ?car1 ?back1)
           (#yfront ?car2 ?front2)
           (> ?back1 ?front2)
  :reward  (#distance-ahead ?car1 ?car2 ?d)
           (#speed ?car2 ?s)
  :weights ( ) )

(overlaps-and-adjacent (?car1 ?car2)
  :defn    (car ?car1) (car ?car2)
           (in-lane ?car1 ?lane1)
           (in-lane ?car2 ?lane2)
           (adjacent ?lane1 ?lane2)
           (overlaps ?car1 ?car2) )

(overlaps (?car1 ?car2)
           (#yfront ?car1 ?front1)
           (#yback ?car1 ?back1)
           (#yfront ?car2 ?front2)
           (> ?front1 ?front2)
           (> ?front2 ?back1) )

(coming-from-behind (?car1 ?car2)
           (in-lane ?car1 ?lane1)
           (in-lane ?car2 ?lane2)
           (adjacent ?lane1 ?lane2)
           (faster-than ?car1 ?car2)
           (ahead-of ?car2 ?car1) )

(faster-than (?car1 ?car2)
  :defn    (#speed ?car1 ?s1)
           (#speed ?car2 ?s2)
           (> ?s1 (+ ?s2 5)) )

(clear-for (?lane ?car)
  :defn    (lane ?lane ?left-line ?right-lane)
           (not (overlaps-and-adjacent ?car ?other))
           (not (coming-from-behind ?car ?other))
           (not (coming-from-behind ?other ?car)) )

28 Examples of Long-Term Skills
(pass (?car1 ?car2 ?lane)
  :start     (ahead-of ?car2 ?car1)
             (in-same-lane ?car1 ?car2)
  :objective (ahead-of ?car1 ?car2)
             (in-same-lane ?car1 ?car2)
  :requires  (in-lane ?car2 ?lane)
             (adjacent ?lane ?to)
  :ordered   (speed&change ?car1 ?car2 ?lane ?to)
             (overtake ?car1 ?car2 ?lane)
             (change-lanes ?car1 ?to ?lane)
  :values    (#distance-ahead ?car1 ?car2 ?d)
             (#speed ?car2 ?s)
  :weights   ( ) )

(speed&change (?car1 ?car2 ?from ?to)
  :start     (ahead-of ?car1 ?car2)
             (same-lane ?car1 ?car2)
  :objective (faster-than ?car1 ?car2)
             (different-lane ?car1 ?car2)
  :requires  (in-lane ?car2 ?from)
             (adjacent ?from ?to)
  :unordered (speed-up-faster ?car1 ?car2)
             (change-lanes ?car1 ?from ?to) )

(change-lanes (?car ?from ?to)
  :start     (in-lane ?car ?from)
  :objective (in-lane ?car ?to)
  :requires  (lane ?from ?shared-line ?right-line)
             (lane ?to ?left-line ?shared-line)
             (clear-for ?to ?car)
  :ordered   (*shift-left) )

(speed-up-faster (?car1 ?car2)
  :start     (faster-than ?car2 ?car1)
  :objective (faster-than ?car1 ?car2)
  :requires  ( )
  :ordered   (*accelerate) )

29 A Long-Term Skill Hierarchy
[Figure: a skill hierarchy over pass, speed&change, overtake, change-lanes, and speed-up-faster, with the primitive actions *accelerate, *wait, *shift-right, and *shift-left at the leaves.]

30 Categorization in ICARUS
On each cycle, ICARUS’ conceptual recognition module:
Applies preattentive sensors to refresh the contents of the perceptual buffer
Ships added, removed, and changed primitive concepts to the hierarchy
Propagates literals to add, remove, and update instances of defined concepts
Adds, removes, and updates elements in short-term conceptual memory
This bottom-up process is akin to matching in Rete networks and to inference in truth maintenance systems.

31 Calculation of Internal Reward
For each instance I of a matched concept C in long-term memory:
Access each numeric concept N mentioned in C’s :reward field
Find the corresponding concept instance J in short-term memory
Extract the quantity Q associated with the concept instance J
Multiply Q by the weight W associated with N in the :weights field
Add the products for each numeric concept to compute the reward for I
ICARUS sums the contributions from each matched Boolean concept to obtain the overall reward for the current cycle.
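The steps above amount to a weighted sum per matched concept, summed across concepts. A minimal sketch, with a hypothetical data layout (pairs of weight and quantity tuples, not ICARUS's actual memory structures):

```python
# Sketch of internal reward: each matched Boolean concept contributes
# a linear combination of the quantities of its :reward numeric concepts.
def concept_reward(weights, quantities):
    """Weighted sum for one matched Boolean concept instance."""
    return sum(w * q for w, q in zip(weights, quantities))

def total_reward(matched):
    """Sum the contributions of all matched concepts on this cycle."""
    return sum(concept_reward(w, q) for w, q in matched)

# e.g. coming-from-behind with :weights (4.8 2.7) over
# (#dist-behind #speed) quantities of 10.0 and 60.0
matched = [((4.8, 2.7), (10.0, 60.0))]
# total_reward(matched) is 4.8 * 10.0 + 2.7 * 60.0 = 210.0
```

The same linear form reappears in the :values/:weights fields of skills, which is what lets concept rewards and skill value estimates share one calculation scheme.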

32 Nomination and Selection of Skills
On every cycle, ICARUS nominates a new task that it will pursue by:
Accessing all skills that refer to concepts in short-term memory
Generating instantiated candidates from variables bound in concepts
Estimating the reward that each skill instance would produce if executed
Adding the best candidate to skill STM only if its value exceeds the discounted past reward
ICARUS selects the skill instance with the highest value for execution.

33 Calculation of Expected Reward
For an instance I of skill S, ICARUS computes the expected affective value of executing I by either:
using the :values and :weights fields associated with that skill, in the same way it computes reward for a concept instance; or
accessing the :values and :weights fields associated with each executable subskill and returning the highest result.
ICARUS uses the first process when selecting among skills and the second when determining how to execute a selected skill.
Thus, skill selection relies on skills’ summary statistics, whereas execution has access to their detailed structures.
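The two evaluation modes can be contrasted in a short sketch. The structures here are hypothetical stand-ins, not ICARUS code: selection uses a skill's own linear estimate, while execution takes the maximum over its executable subskills.

```python
# Selection mode: the skill's own :values/:weights linear estimate.
def summary_value(skill, quantities):
    return sum(w * q for w, q in zip(skill["weights"], quantities))

# Execution mode: the best estimate among executable subskills.
def execution_value(subskill_values):
    return max(subskill_values)

pass_skill = {"weights": [2.0, 1.0]}       # hypothetical weights
# summary_value(pass_skill, [3.0, 4.0]) -> 2*3 + 1*4 = 10.0
# execution_value([1.0, 5.0, 2.0])      -> 5.0
```

Keeping a cheap summary statistic for selection and a deeper max over subskills for execution is what the slide means by summary statistics versus detailed structures.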

34 Repair of Selected Skills
When ICARUS cannot execute a selected skill, the repair process:
Finds unmatched concepts that, if matched, would let execution continue
Computes the expected reward for each and selects the concept C with the highest value
Accesses every skill that includes concept C in its :objective field
Computes the expected reward for instances of these relevant skills
Selects the skill instance with the highest value and adds it to skill STM
This backward-chaining process is similar to means-ends analysis, but it supports execution rather than planning.

35 Abandoning Nominated Skills
Besides nominating skills, ICARUS can decide to abandon them by:
Updating the discounted average R of the past overall rewards
Computing the expected reward V(I) for each skill instance I in short-term memory
Retaining the skill instance I only if V(I) is greater than 0.2 × R
Thus, the agent gives up on pursuing a task when its expected value drops too low compared to recent experience.
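The retention test above is a one-line comparison; a minimal sketch, using the 0.2 factor from the slide (the skill names and values are hypothetical):

```python
# Keep a nominated skill instance only while its expected value stays
# above 0.2 times the discounted average of past overall rewards.
def retain(expected_value, past_average, factor=0.2):
    return expected_value > factor * past_average

# With a past average reward of 10.0, the threshold is 2.0:
kept = [name for name, value in [("pass", 3.0), ("overtake", 0.5)]
        if retain(value, past_average=10.0)]
# kept == ["pass"]
```

Tying the threshold to the agent's own reward history makes abandonment adaptive: a skill that looked worthwhile in lean times is dropped once richer opportunities raise the running average.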

36 Learning Hierarchical Control Policies
[Figure: hierarchical skills plus reward streams feed an Induction process that produces learned value functions for the same skill hierarchy, shown below.]
(pass (?x)
  :start      (behind ?x) (same-lane ?x)
  :objective  (ahead ?x) (same-lane ?x)
  :requires   (lane ?x ?l)
  :components ((speed-up-faster-than ?x) (change-lanes ?l ?k)
               (overtake ?x) (change-lanes ?k ?l)))
(speed-up-faster-than (?x)
  :start      (slower-than ?x)
  :objective  (faster-than ?x)
  :requires   ( )
  :components ((accelerate)))
(change-lanes (?l ?k)
  :start      (lane self ?l)
  :objective  (lane self ?k)
  :requires   (left-of ?k ?l)
  :components ((shift-left)))
(overtake (?x)
  :start      (behind ?x) (different-lane ?x)
  :objective  (ahead ?x)
  :requires   (different-lane ?x) (faster-than ?x)

