Learning Teleoreactive Logic Programs by Observation


1 Learning Teleoreactive Logic Programs by Observation
Brian Salomaki, Dongkyu Choi, Negin Nejati, and Pat Langley Computational Learning Laboratory Stanford University, USA

2 Outline
- Motivation
- Overview
- ICARUS: a reactive agent architecture
- Learning via Problem Solving
- Learning by Observation
- Preliminary Results
- Related Work
- Future Work

3 Motivation
An intelligent agent operating in the real world will encounter many scenarios and pursue different goals. There are two main approaches to meeting this need:
- Build the knowledge in manually for all possible situations.
- Let the agent extend its knowledge to new scenarios itself, by:
  - Problem solving (lots of search), which can be too expensive, too slow, or even infeasible
  - Learning by watching an expert

4 Learning by Observation (overview)
[Flow diagram: a problem (initial state and goal) plus a skill hierarchy drive reactive execution; on a goal impasse, the expert's primitive skill sequence and the effects of primitive skills feed the learning-by-observation module, which produces new skills.]

5 Our Agent Architecture: ICARUS
[Architecture diagram: perception of the environment fills a perceptual buffer; inference over long-term conceptual memory updates short-term conceptual memory; skill retrieval from long-term skill memory fills short-term skill memory, whose selected actions pass through the motor buffer back to the environment.]

6 Teleoreactive Logic Programs
ICARUS encodes long-term knowledge of three general types:
- Conceptual clauses: relational inference rules that refer to percepts (primitive) or to other concepts (nonprimitive)
- Primitive skill clauses: stated as durative STRIPS operators
- Nonprimitive skill clauses: relational rules that specify:
  - a head that indicates the goal the method achieves
  - a set of (possibly defined) preconditions
  - one or more ordered subskills for achieving the goal
Teleoreactive logic programs can be executed reactively but in a goal-directed manner (Nilsson, 1994).
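The three knowledge types above can be mirrored as plain records. The following is a minimal Python sketch, not the actual ICARUS implementation; the field names are assumptions chosen to follow the slides' terminology.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptClause:
    head: str                    # concept name, e.g. "clear"
    args: tuple                  # argument variables
    percepts: list = field(default_factory=list)    # primitive: refers to percepts
    subconcepts: list = field(default_factory=list) # nonprimitive: other concepts

@dataclass
class PrimitiveSkill:            # a durative STRIPS-style operator
    head: str
    start: list                  # preconditions
    effects: list                # literals made true on completion
    actions: list                # executable motor actions

@dataclass
class NonprimitiveSkill:
    head: str                    # goal the method achieves
    start: list                  # (possibly defined) preconditions
    ordered: list                # ordered subskills achieving the goal

# Example: the unstack operator from the blocks-world domain.
unstack = PrimitiveSkill("unstack",
                         start=["unstackable ?b ?from"],
                         effects=["clear ?from", "holding ?b"],
                         actions=["*grasp ?b", "*vertical-move ?b"])
```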

7 Knowledge Representation I - Concept Hierarchy
(clear (?block)
  :percepts ((block ?block))
  :negatives ((on ?other ?block)))

(unstackable (?block ?from)
  :percepts ((block ?block) (block ?from))
  :positives ((on ?block ?from) (clear ?block) (hand-empty)))

(on (?blk1 ?blk2)
  :percepts ((block ?blk1 x ?x1 y ?y1) (block ?blk2 x ?x2 y ?y2 h ?h))
  :tests ((equal ?x1 ?x2) (>= ?y1 ?y2) (<= ?y1 (+ ?y2 ?h))))

8 Knowledge Representation II – Skill Hierarchy
An example of a higher-level skill in the hierarchy:

(hand-empty ()
  :percepts ((block ?c) (table ?t1))
  :start ((putdownable ?c ?t1))
  :ordered ((putdown ?c ?t1)))

An example of a first-level skill (an operator):

(putdown (?block ?t0)
  :percepts ((block ?block) (table ?t0 xpos ?xpos ypos ?ypos height ?height))
  :start ((putdownable ?block ?t0))
  :effects ((ontable ?block ?t0) (hand-empty))
  :actions ((*horizontal-move ?block (+ ?xpos 1 (random100)))
            (*vertical-move ?block (+ ?ypos ?height))
            (*ungrasp ?block)))

9 Inference and Execution
Concepts are matched bottom up, starting from percepts. Skill paths are matched top down, starting from intentions.
[Diagram: the concept instances' hierarchy is built upward from percepts, while the skill instances' hierarchy is traversed downward to primitive skills.]
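The bottom-up direction of concept matching can be illustrated with a toy blocks-world example. This is a hand-rolled sketch of the idea only, not the ICARUS inference engine; the input format and function name are assumptions.

```python
def infer(blocks, on_relations):
    """Derive concept instances bottom-up from percept-level input."""
    beliefs = set()
    # Primitive concept: (on ?x ?y) comes straight from the percepts.
    for x, y in on_relations:
        beliefs.add(("on", x, y))
    # (clear ?b) holds when no other block is on ?b (the :negatives test).
    for b in blocks:
        if not any(y == b for _, y in on_relations):
            beliefs.add(("clear", b))
    # Nonprimitive concept: (unstackable ?b ?from) builds on on/clear.
    for x, y in on_relations:
        if ("clear", x) in beliefs:
            beliefs.add(("unstackable", x, y))
    return beliefs

# Tower C-on-B-on-A: only C is clear, so only C is unstackable.
beliefs = infer({"A", "B", "C"}, [("C", "B"), ("B", "A")])
```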

10 Learning Using Problem Solving
[Flow diagram: given a problem and a skill hierarchy, reactive execution proceeds from the initial state; on a goal impasse, problem solving uses the effects of primitive skills to produce an executed plan, from which new skills are extracted.]
Problem solving involves means-ends analysis, except that chaining occurs over both skills and concepts, and skills are executed whenever they are applicable.
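The backward-chaining core of means-ends analysis can be sketched compactly. The following is an illustration only, assuming operators with a single precondition and a single effect literal; the real problem solver chains over both skills and concepts and updates state as skills execute.

```python
def solve(goal, state, operators, depth=10):
    """Return a list of operator names achieving `goal`, or None on failure."""
    if goal in state:
        return []                     # goal already holds: empty plan
    if depth == 0:
        return None                   # give up: search bound reached
    for name, pre, effect in operators:
        if effect != goal:
            continue
        # Chain backward: first achieve the operator's precondition.
        plan = solve(pre, state, operators, depth - 1)
        if plan is not None:
            return plan + [name]
    return None

# Hypothetical single-literal operators for a fragment of blocks world.
ops = [("unstack-B-A", "unstackable-B-A", "clear-A"),
       ("make-unstackable", "clear-B", "unstackable-B-A")]
plan = solve("clear-A", {"clear-B"}, ops)
# plan == ["make-unstackable", "unstack-B-A"]
```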

11 Learning by Observation (LBO)
[Flow diagram: starting from the initial state S0, state projection applies the expert's primitive skill sequence, using the effects of primitive skills and the concept hierarchy, to produce a state sequence; the state sequence and the goal are passed to the learning-by-observation module, which outputs new skills.]
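State projection itself is straightforward to sketch: starting from S0, apply each observed skill's effects in turn. The add/delete split below is an assumption standing in for the skills' :effects fields; this is an illustration, not the system's code.

```python
def project(s0, trace, effects):
    """effects maps a skill name to (adds, deletes); returns [S0, S1, ...]."""
    states = [frozenset(s0)]
    for skill in trace:
        adds, deletes = effects[skill]
        # Each successor state: previous state minus deletes, plus adds.
        states.append(frozenset((states[-1] - set(deletes)) | set(adds)))
    return states

# Hypothetical effects for the first two steps of the expert's trace.
effects = {"unstack C B": ({"holding C", "clear B"}, {"on C B", "hand-empty"}),
           "putdown C T": ({"ontable C", "hand-empty"}, {"holding C"})}
seq = project({"on C B", "on B A", "clear C", "hand-empty"},
              ["unstack C B", "putdown C T"], effects)
```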

12 Observational Inputs to Learning Module
[Diagram: the learning-by-observation procedure receives the projected state sequence S0, S1, S2, ..., Sn, the primitive skills' definitions (with their :effects), and the goal concept. If the goal was made true directly by a primitive skill/operator's :effects, skill chaining applies; if it corresponds to a concept instance, concept chaining applies.]

13 Skill Chaining
If the primitive skill that made the goal true had its precondition already satisfied, then learn:
(g () :start (the skill's precondition) :subskills (the skill))
Otherwise:
- Set the skill's precondition as the new goal
- Remove the skill from the expert's trace
- Call LBO recursively
and learn:
(g () :start (start of the recursively learned skill) :subskills (the recursively learned skill, then the skill))
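The skill-chaining recursion above can be sketched as follows. Each trace element is assumed to be a (skill, precondition) pair, and learned clauses are returned as (head, start, subskills) triples; this is an illustrative rendering, not the system's learner.

```python
def chain_skills(goal, trace, initially_true, learned=None):
    """Learn clauses by chaining backward through the expert's trace."""
    learned = [] if learned is None else learned
    skill, precondition = trace[-1]      # the skill whose effects match the goal
    if precondition in initially_true:
        # Base case: precondition already held, learn a one-step clause.
        learned.append((goal, precondition, [skill]))
    else:
        # Recursive case: make the precondition the new goal, drop the
        # skill from the trace, and learn how to achieve it first.
        chain_skills(precondition, trace[:-1], initially_true, learned)
        learned.append((goal, learned[-1][1], [precondition, skill]))
    return learned

# Hypothetical two-step trace: putdown-C achieves unstackable-B-A's needs,
# then unstack-B-A achieves clear-A.
clauses = chain_skills("clear-A",
                       [("putdown-C", "holding-C"),
                        ("unstack-B-A", "unstackable-B-A")],
                       {"holding-C"})
```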

14 Concept Chaining
[Diagram: when the goal was not achieved directly by a primitive skill, the goal concept's definition is unfolded into its subconcepts. The expert's trace is parsed into the segments that achieve each subconcept; subconcepts already satisfied in the state are skipped, and the LBO module is called recursively on each remaining segment. The new skill is then learned as (goal () :start (already-satisfied subconcepts) :subskills (skills learned for the remaining subconcepts)).]
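The concept-chaining case can be sketched in the same style: unfold the goal into its subconcepts, keep the satisfied ones as :start conditions, and recurse on the rest. The names and the `learn_subgoal` callback (standing in for the recursive LBO call) are illustrative assumptions.

```python
def chain_concepts(goal, definitions, initially_true, learn_subgoal):
    """Learn a clause for `goal` by chaining over its subconcepts."""
    start, subskills = [], []
    for sub in definitions[goal]:          # ordered subconcepts of the goal
        if sub in initially_true:
            start.append(sub)              # already satisfied: a :start condition
        else:
            subskills.append(learn_subgoal(sub))  # must be achieved in order
    return (goal, start, subskills)

# Hypothetical definition: unstackable unfolds into on, clear, hand-empty.
defs = {"unstackable-B-A": ["on-B-A", "clear-B", "hand-empty"]}
clause = chain_concepts("unstackable-B-A", defs, {"on-B-A"},
                        lambda sub: "skill-for-" + sub)
```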

15 An Example from Blocks World
Goal: (clear A)
Expert's trace: (unstack C B) (putdown C T) (unstack B A)
Initial state: C on B, B on A, A on table T
Known concepts: (on) (on-table) (clear) (holding) (hand-empty) (unstackable) (stackable) (pickupable) (putdownable)
Known skills: (stack) (unstack) (pickup) (putdown)

16 Blocks World Example Cont’d
Skills learned from the trace:

(clear (?A)
  :start ((on ?B ?A))
  :ordered ((unstackable ?B ?A) (unstack ?B ?A)))

(unstackable (?B ?A)
  :start ((on ?B ?A))
  :ordered ((clear ?B) (hand-empty)))

(clear (?B)
  :start ((unstackable ?C ?B))
  :ordered ((unstack ?C ?B)))

(hand-empty ()
  :start ((putdownable ?C ?T1))
  :ordered ((putdown ?C ?T1)))

[Diagram: the goal (clear A) decomposes via (unstackable B A) into (clear B), achieved by (unstack C B), and (hand-empty), achieved by (putdown C), before the final (unstack B A).]

17 Preliminary Results
Promising results in domains such as:
- Blocks World
- Depots
Learning by observation produces results similar to learning by problem solving when search is feasible. After learning by observation, the agent can solve problems that were not feasible to solve by problem solving alone.

18 Related Work
- Explanation-Based Learning
- Behavioral Cloning
- Programming by Demonstration
- Mixed-Initiative Learning

19 Future Work
More experiments and evaluation (in progress). Handling:
- Irrelevant actions
- Interleaved goal achievement
- Using higher-level skills
- Missing skills
- Multiple goals
- Unknown goals

