Learning Procedural Planning Knowledge in Complex Environments Douglas Pearson March 2004
Characterizing the Learner
[Diagram: learners placed along two axes, Method (Deliberate vs. Implicit) and KR (Declarative vs. Procedural); labeled regions: Symbolic Learners, Reinforcement Learning, IMPROV]
Simpler Agents: weak, slower learning
Complex Agents: strong, faster learning
Simple Environments
Complex Environments
–Actions: duration & conditional
–Sensing: limited, noisy, delayed
–Task: timely response
–Domain: change over time, large state space
Why Limit Knowledge Access?
Procedural – Can only be accessed by executing it
Declarative – Can answer when it will execute and what it will do
Declarative Problems
Availability
–If (x^5 + 3x^3 – 5x^2 + 2) > 7 then Action
–Chains of rules A->B->C->Action
Efficiency
–O(size of knowledge base) or worse
–Agent slows down as it learns more
IMPROV Representation
–Sets of production rules for operator preconditions and actions
–Assume learner can only execute rules
–But allow declarative knowledge to be added when it's efficient to do so
Focusing on Part of the Problem
[Diagram: Task Performance axis from 0% to 100%; labels: Knowledge Representation, Initial Rule Base, Domain Knowledge ("Learn this")]
The Problem
Cast the learning problem as
–Error detection (incomplete/incorrect knowledge)
–Error correction (fixing or adding knowledge)
But with only limited, procedural access
Aim is to support learning in complex, scalable agents/environments
Error Detection Problem
[Diagram: PLAN built from existing (possibly incorrect) knowledge: S1 Speed-30, S2 Speed-10, S3 Speed-0, S4 Speed-30]
How to monitor the plan during execution without direct knowledge access?
Error Detection Solution
Direct monitoring – not possible
Instead detect lack of progress to the goal
–No rules matching or conflicting rules
[Diagram: S1 Speed-30, S2 Speed-10, S3 Speed-0, S4 Engine stalls: no proposal]
Not predicting behavior of the world (useful in stochastic environments)
But no implicit notion of quality of solution
Can add domain-specific error conditions – but not required
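The lack-of-progress check on this slide can be sketched in a few lines. This is a minimal illustration, not IMPROV's actual implementation: the rule encoding (a function from state to an operator name or None), the error labels, and the example rules are all assumptions made for the sketch. The detector only executes rules and never inspects their contents, in keeping with the procedural-access assumption.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
# Hypothetical rule encoding: a rule proposes an operator name, or None if it does not match.
Rule = Callable[[State], Optional[str]]


def detect_error(state: State, rules: List[Rule]) -> Optional[str]:
    """Return an error label if the agent cannot make progress, else None."""
    # Execute every rule; collect the distinct operators that were proposed.
    proposals = {op for op in (rule(state) for rule in rules) if op is not None}
    if not proposals:
        return "no-proposal"   # no rule matched the current state
    if len(proposals) > 1:
        return "conflict"      # rules disagree about what to do next
    return None                # exactly one operator proposed: keep executing


# Hypothetical rules for the driving example.
rules: List[Rule] = [
    lambda s: "speed-10" if s.get("speed") == 30 else None,
    lambda s: "speed-0" if s.get("speed") == 10 else None,
    lambda s: "speed-30" if s.get("speed") == 0 and s.get("engine") == "running" else None,
]

running = {"speed": 0, "engine": "running"}
stalled = {"speed": 0, "engine": "stalled"}

print(detect_error(running, rules))  # None
print(detect_error(stalled, rules))  # no-proposal
```

Nothing here predicts what the world will do next; an error is flagged only when execution itself gets stuck.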
Finding the Incorrect Operator(s)
[Diagram: two operator sequences: Speed-30, Speed-10, Speed-0, Speed-30 and Speed-10, Speed-0, Speed-30, Change-Gear]
Change-Gear is over-specific
Speed-0 is over-general
By waiting, the learner can do better credit assignment
Learning to Correct the Operator
Collected a set of training instances
–[State, Operator -> Result]
–Can identify differences between states
[Example states: (Speed = 40, Light = green, Self = car, Other = car) vs. (Speed = 40, Light = green, Self = car, Other = ambulance)]
Used as a default bias in training inductive learner
Learn preconditions as classification problem (predict operator from state)
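Identifying the differences between two recorded states can be sketched as a simple attribute-by-attribute comparison. The dictionary encoding of states and the function name `state_differences` are assumptions for illustration; the slide does not say how IMPROV stores instances or how the differences are weighted by the inductive learner.

```python
from typing import Dict, Optional, Tuple

State = Dict[str, str]


def state_differences(a: State, b: State) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each attribute whose value differs to its (value in a, value in b) pair."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# The two states shown on the slide.
state_a = {"speed": "40", "light": "green", "self": "car", "other": "car"}
state_b = {"speed": "40", "light": "green", "self": "car", "other": "ambulance"}

print(state_differences(state_a, state_b))  # {'other': ('car', 'ambulance')}
```

Here the only differing attribute is `other`, which makes it a natural first candidate feature when learning a new precondition.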
K-Incremental Learning
Collect a set of k instances
Then train inductive learner
[Diagram: instance set size axis with ticks 1, k1, k2, n; labels in order: Reinforcement Learners, Till Correction (IMPROV), Till Unique Cause (EXPO), Non-Incremental Learners]
K-Incremental Learner
–k does not grow over time => incremental behavior
–Better decisions about what to discard when generalizing
–When doing active learning, bad early learning can really hurt
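The collect-then-train loop can be sketched as a bounded buffer. This is a simplified illustration under stated assumptions: k is a fixed constant here (in IMPROV the set is collected until a correction is found), the whole buffer is discarded after training, and the `train` callback stands in for whatever inductive learner is used. The class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (state, operator, result), following the slide's [State, Operator -> Result] instances.
Instance = Tuple[Dict[str, str], str, str]


class KIncrementalLearner:
    """Buffers up to k instances, then hands the batch to an inductive learner."""

    def __init__(self, k: int, train: Callable[[List[Instance]], None]) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.train = train                  # placeholder for the inductive learner
        self.buffer: List[Instance] = []

    def observe(self, instance: Instance) -> bool:
        """Record one instance; return True if this call triggered training."""
        self.buffer.append(instance)
        if len(self.buffer) < self.k:
            return False
        self.train(list(self.buffer))       # train on the k collected instances
        self.buffer.clear()                 # assumption: discard everything afterwards
        return True


batches: List[List[Instance]] = []
learner = KIncrementalLearner(k=3, train=batches.append)
for i in range(7):
    learner.observe(({"speed": str(i)}, "brake", "success"))

print(len(batches), len(learner.buffer))  # 2 1
```

Because the buffer never holds more than k instances, memory and per-step cost stay bounded as the agent runs, which is the incremental behavior the slide refers to.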
Extending to Operator Actions
[Diagram: operator sequence Speed 30, Speed 0, Speed 20, Speed 30; hierarchy levels: Speed 0, Speed 20; Brake, Release; Slow -5, Slow -10, Slow 0]
Decompose into operator hierarchy
Terminates with operators that modify a single symbol
Correcting Actions
Expected effects of braking: Slow -5, Slow -10
Observed effects of braking on ice: Slow -2, Slow -4, Slow -6 => Failure
Use the correction method to change the preconditions of these sub-operators
IMPROV Summary
[Diagram: Method (Deliberate vs. Implicit) against KR (Declarative, Non-Incremental vs. Procedural, Incremental); labeled regions: Symbolic Learners, Reinforcement Learning, IMPROV]
IMPROV support for:
Powerful agents
-- Multiple goals
-- Faster, deliberate learning
Complex environments
-- Noise
-- Complex actions
-- Dynamic environments
k-Incremental Learning
-- Improved credit assignment
-- Which operator
-- Which feature
General weak deliberate learner with only procedural access assumed
-- General purpose error detection
-- General correction method applied to preconditions and actions
-- Nice re-use of precondition learner to learn actions
-- Easy to add domain-specific knowledge to make method stronger
Redux: Diagram-based Example-driven Knowledge Acquisition Douglas Pearson March 2004
1. User specifies desired behavior
2. User selects features – define rules
Later we'll use ML to guess this initial feature set
3. Compare desired with rules
Desired: Move-through(door1), Turn-to-face(threat1), Shoot(threat1)
Actual: Move-through(door1), Turn-to-face(neutral1), Shoot(neutral1)
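Comparing the desired behavior with what the rules actually produce amounts to finding the first step where the two traces diverge. The sketch below assumes each trace is a plain list of action strings; the function name `first_divergence` is hypothetical and not taken from the tool.

```python
from typing import List, Optional


def first_divergence(desired: List[str], actual: List[str]) -> Optional[int]:
    """Return the index of the first step where the traces differ, or None if identical."""
    for i, (d, a) in enumerate(zip(desired, actual)):
        if d != a:
            return i
    if len(desired) != len(actual):
        # One trace is a strict prefix of the other: they diverge where the shorter ends.
        return min(len(desired), len(actual))
    return None


desired = ["Move-through(door1)", "Turn-to-face(threat1)", "Shoot(threat1)"]
actual = ["Move-through(door1)", "Turn-to-face(neutral1)", "Shoot(neutral1)"]

print(first_divergence(desired, actual))  # 1
```

In the slide's example the traces agree on the first step and diverge at the second, where the rules turn toward the neutral instead of the threat.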
4. Identify and correct problems
Detect differences between desired behavior and rules
–Detect overgeneral preconditions
–Detect conflicts within the scenario
–Detect conflicts between scenarios
–Detect choice points where there's no guidance
–etc.
All of these errors are detected automatically when a rule is created
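One of the checks listed above, detecting overgeneral preconditions, can be sketched as follows. This is an assumption about how such a check might work, not the tool's actual algorithm: a rule is treated as a (preconditions, action) pair, an example as a (state, desired action) pair, and a rule is flagged as overgeneral when it fires in a state where the expert wanted a different action. The attribute names and the `hold-fire` action are hypothetical.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]
Rule = Tuple[State, str]      # (preconditions, action)
Example = Tuple[State, str]   # (state, desired action)


def matches(preconditions: State, state: State) -> bool:
    """A rule fires when every precondition attribute has the required value."""
    return all(state.get(k) == v for k, v in preconditions.items())


def overgeneral_rules(rules: List[Rule], examples: List[Example]) -> List[Tuple[int, int]]:
    """Return (rule index, example index) pairs where a rule fires with the wrong action."""
    return [
        (r, e)
        for r, (pre, action) in enumerate(rules)
        for e, (state, desired) in enumerate(examples)
        if matches(pre, state) and action != desired
    ]


# Hypothetical rule and examples loosely based on the threat/neutral scenario.
rules = [({"target-visible": "yes"}, "shoot")]
examples = [
    ({"target-visible": "yes", "target-type": "threat"}, "shoot"),
    ({"target-visible": "yes", "target-type": "neutral"}, "hold-fire"),
]

print(overgeneral_rules(rules, examples))  # [(0, 1)]
```

Running the check each time a rule is created, against the library of stored examples, is one way the errors could be surfaced immediately rather than at simulation time.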
5. Fast rule creation by expert
[Diagram: Expert and Engineer define behavior with diagram-based examples; Library of validated behavior examples; Analysis & generation tools (detect inconsistency, generalize, generate rules, simulate execution); Executable Code (A -> B; C -> D; E, J -> F; G, A, C -> H; E, G -> I; J, K -> L); Simulation Environment]