Learning Procedural Knowledge through Observation - Michael van Lent, John E. Laird - Internet Technology Major, 022ITI02, 성유진.


1 Learning Procedural Knowledge through Observation - Michael van Lent, John E. Laird - Internet Technology Major, 022ITI02, 성유진

2 - CONTENTS - INTRODUCTION, RELATED WORK, KNOMIC, EXPERIMENTS AND RESULTS, FUTURE WORK, CONCLUSION

3 1. INTRODUCTION (1/2)
Key words: machine learning, knowledge acquisition, rule learning, user modeling.
Figure 1: Expert effort vs. research effort for a variety of knowledge acquisition approaches.

4 1. INTRODUCTION (2/2)
Hypothesis: "Learning procedural knowledge from observations of an expert is more efficient than the standard knowledge-acquisition approach and is a more tractable research problem than unsupervised learning approach"
Primary goal: to explore and evaluate observation as a knowledge source for learning procedural knowledge.

5 2. RELATED WORK (1/3)
Behavioral Cloning (Bain and Sammut)
- Learning by observation, applied to learning the knowledge necessary to fly a simulated airplane along a specific flight plan in the Silicon Graphics flight simulator.
- Situation/action examples are taken from observation of an expert.
- The examples are used to build decision trees that decide which action to take based on the current sensor input.
- Advantage: effective in a complex, non-deterministic, dynamic environment.
- Drawback: does not learn knowledge that allows the agent to dynamically select goals and procedures.

6 2. RELATED WORK (2/3)
The OBSERVER system
- STRIPS-style operators
  - pre-conditions
  - dynamically selected based on the current sensor input
  - assumption: an operator's action is always performed in a single time step and without error
- Advantage: more expressive knowledge format.
- Drawback: problematic for more complex tasks; its domain contains no noise or dynamic change.
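To make the STRIPS-style operator format concrete, here is a minimal Python sketch. It assumes the textbook STRIPS reading (a state is a set of true facts, and an operator has preconditions, facts it adds, and facts it deletes); it is not OBSERVER's actual representation, and the operator and fact names are hypothetical. Applying the operator in one call mirrors the single-time-step, error-free assumption listed on the slide.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class StripsOperator:
    """Textbook STRIPS operator: preconditions plus add/delete effects."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    delete_effects: FrozenSet[str]

    def applicable(self, state: FrozenSet[str]) -> bool:
        # The operator can be selected only if every precondition holds.
        return self.preconditions <= state

    def apply(self, state: FrozenSet[str]) -> FrozenSet[str]:
        # One deterministic step: the effects always happen, with no error.
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable in this state")
        return (state - self.delete_effects) | self.add_effects


# Hypothetical operator and facts, for illustration only.
takeoff = StripsOperator(
    name="takeoff",
    preconditions=frozenset({"on_runway", "engine_on"}),
    add_effects=frozenset({"airborne"}),
    delete_effects=frozenset({"on_runway"}),
)

start = frozenset({"on_runway", "engine_on"})
after = takeoff.apply(start)
```

Because `apply` always produces exactly the declared effects, this format has no way to express an action whose outcome is decided by a noisy or changing environment, which is the limitation the slide points to.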

7 2. RELATED WORK (3/3)
The TRAIL system
- Combines a learning algorithm and a planning algorithm to create teleo-operators (TOPs).
- Uses inductive logic programming to learn TOPs based on positive and negative examples from the traces.
OBSERVER and TRAIL's TOPs
- Difficulty in complex domains with uncertain actions.

8 3. KNOMIC (1/5)
KnoMic
- A learning-by-observation system based on a general framework for learning procedural knowledge from observation of an expert.
Knowledge Representation
- Selecting appropriate goals.
- Performing actions to achieve and maintain those goals.
- KnoMic operator
  - non-deterministic actions: implemented in the environment, so the outcome is determined by the environment and may or may not be the expected one
  - recovery when the actions and environment don't behave as expected: the agent has multiple procedures for achieving its goals
- Classification of each operator as homeostatic, one-time, or repeatable.

9 3. KNOMIC (2/5)
Learning-by-Observation Framework
- Major advantage: modularity.
Figure 2: The learning-by-observation framework. Components: Expert, Environmental Interface, Environment, Observation Generation, Condition Learning, Operator Classification, Knowledge Formatting, Execution Architecture. Data passed along Steps 1 to 5: Parameters & Sensors, Output Commands, Annotations, Observation Traces, Operator Conditions, Learned Task Knowledge, Formatted Knowledge.

10 3. KNOMIC (3/5)
KnoMic is one instantiation of the learning-by-observation framework.
Figure: the framework as instantiated by KnoMic. Components: Expert, Environmental Interface, ModSAF (environment), Observation Generation, Specific-to-General Induction (condition learning), Operator Classification, Knowledge Formatting, Soar Architecture (execution architecture). Data: Parameters & Sensors, Output Commands, Annotations, Observation Traces, Operator Conditions, Learned Task Knowledge, Soar Productions (formatted knowledge).

11 3. KNOMIC (4/5)
Observation Trace Generation
- Sensor inputs the expert receives.
- Actions the expert performs each cycle.
- Annotations: added whenever the goal the expert is seeking to achieve changes, either while the task is being performed or during a review of his/her behavior to avoid interruption.
Specific-to-General Condition Learning
- Find-S specific-to-general induction algorithm.
Figure: Find-S search from instances X (x1, x2) to hypotheses H (h0, h1, h2), ordered from specific to general.
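The Find-S idea named on this slide can be sketched in a few lines of Python. This is the textbook attribute-value version of Find-S, not KnoMic's actual implementation: the hypothesis starts as the first positive example and is generalized just enough to cover each later one. The sensor names and values are hypothetical, and `"?"` is used here to mean "any value is acceptable".

```python
from typing import Dict, Iterable, Optional

Instance = Dict[str, str]
ANY = "?"  # wildcard: the attribute is no longer part of the condition


def find_s(positives: Iterable[Instance]) -> Optional[Dict[str, str]]:
    """Return the most specific conjunctive hypothesis covering all positives."""
    hypothesis: Optional[Dict[str, str]] = None
    for instance in positives:
        if hypothesis is None:
            # Start with the most specific hypothesis: the first example itself.
            hypothesis = dict(instance)
            continue
        for attribute, value in instance.items():
            # Generalize only the attributes that disagree with this example.
            if hypothesis.get(attribute) != value:
                hypothesis[attribute] = ANY
    return hypothesis


def matches(hypothesis: Dict[str, str], instance: Instance) -> bool:
    """True if the instance satisfies every non-wildcard condition."""
    return all(v == ANY or instance.get(k) == v for k, v in hypothesis.items())


# Hypothetical sensor snapshots taken when the expert selected one operator.
x1 = {"radar_contact": "yes", "altitude": "high", "fuel": "full"}
x2 = {"radar_contact": "yes", "altitude": "low", "fuel": "full"}

learned = find_s([x1, x2])
```

Find-S uses only positive examples and never specializes a hypothesis, so a condition that happens to be constant across the observed traces stays in the learned conjunction. That is one way extraneous conditions can survive when there are few traces.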

12 3. KNOMIC (5/5)
Operator Classification
- Examines if and when the expert reselects an operator as its goal conditions change from achieved to unachieved.
- Homeostatic operator: maintains its goal conditions as true once they are achieved; if they become untrue, it is immediately re-activated to re-achieve them.
- One-time operator: achieves its goal conditions only once and is never re-activated.
- Repeatable operator: is not immediately re-activated but can be reselected if triggered by another operator.
Soar Production Generation
- Hierarchical, symbolic, propositional representation allowing some numerical relations.
- Three classes of Soar productions: operator pre-condition productions, goal-achieved productions, and action application productions.
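The three operator classes above can be illustrated with a small Python sketch. The event-log format, the `immediate_window` threshold, and the decision rule are assumptions made for illustration; the slides only say that classification looks at if and when the expert reselects an operator after its goal conditions lapse.

```python
from typing import List, Tuple

# (decision cycle, kind), where kind is "selected", "achieved" or "unachieved".
Event = Tuple[int, str]


def classify_operator(events: List[Event], immediate_window: int = 1) -> str:
    """Classify one operator from its own event log (illustrative rule only)."""
    selections = sorted(t for t, kind in events if kind == "selected")
    lapses = sorted(t for t, kind in events if kind == "unachieved")
    if not selections:
        raise ValueError("operator was never selected")

    reselections = selections[1:]
    if not reselections:
        # Selected once and never again, even if its goal later lapsed.
        return "one-time"

    def reselected_promptly(lapse: int) -> bool:
        # Assumed meaning of "immediately": within `immediate_window` cycles.
        return any(lapse <= s <= lapse + immediate_window for s in reselections)

    if lapses and all(reselected_promptly(lapse) for lapse in lapses):
        return "homeostatic"
    return "repeatable"


# Hypothetical logs for three operators.
maintain_log = [(0, "selected"), (5, "achieved"), (9, "unachieved"),
                (9, "selected"), (12, "achieved"), (20, "unachieved"),
                (21, "selected")]
once_log = [(0, "selected"), (3, "achieved"), (8, "unachieved")]
repeat_log = [(0, "selected"), (3, "achieved"), (8, "unachieved"),
              (40, "selected")]

labels = [classify_operator(log) for log in (maintain_log, once_log, repeat_log)]
```

The window parameter is the main design choice in this sketch: a larger value treats slower reactions by the expert as still "immediate", which shifts operators from repeatable toward homeostatic.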

13 4. EXPERIMENTS AND RESULTS (1/3)
- Experiment 1: accuracy of the KnoMic system when provided with error-free observations.
- Experiment 2: KnoMic's accuracy with more realistic observation traces generated from observations of a human expert.
Domain: air combat
- 31 operators in a four-level hierarchy.
- Operator conditions: conjunctive.
- Operators in the domain are triggered by external events sensed by the expert.
Evaluation Criteria
- The agent performs the task using the learned knowledge.
- The learned knowledge is compared to rules for the same task created by a human programmer.
- Ratings: fully correct, functionally correct, incorrect.

14 4. EXPERIMENTS AND RESULTS (2/3)
Experiment 1
- Four observation traces.
- 30 minutes.
Result
- 1900 decision cycles.
- 25000 sensor inputs.
- 40 goal annotations.
- 140 output commands.
- 101 fully correct, 29 functionally correct, 10 incorrect.
- The 10 incorrect productions:
  - 6 of 10: extraneous conditions
  - 3 of 10: due to missing sensors in the environmental interface
  - the final one: requires a negated test

15 4. EXPERIMENTS AND RESULTS (3/3)
Experiment 2
- Two observation traces by a human expert.
- Initialization, takeoff, racetrack.
- 45 productions: 29 correct, 13 functionally correct, 3 incorrect.
- This experiment shows that KnoMic can successfully learn from observations of human experts, but additional research is needed to explore a more robust algorithm.
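To compare the two experiments on the same scale, the reported counts can be turned into percentages. The short Python sketch below uses only the counts quoted on the slides; the helper name `proportions` and the one-decimal rounding are choices made here, not something from the presentation.

```python
from typing import Dict


def proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert category counts to percentages of their own total."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts as reported on the slides.
experiment_1 = {"fully correct": 101, "functionally correct": 29, "incorrect": 10}
experiment_2 = {"correct": 29, "functionally correct": 13, "incorrect": 3}

rates_1 = proportions(experiment_1)
rates_2 = proportions(experiment_2)
```

On these counts, the share rated at least functionally correct is about 93% in both experiments (130 of 140 and 42 of 45), while the fully correct share is lower with human-generated traces (about 64% versus about 72%).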

16 5. FUTURE WORK
Better learning algorithm
- More powerful biases.
- Aid in removing the additional extraneous conditions.
Learning knowledge representation
- Include structured sensors, operator parameters.
Improve annotation
- Automatically segmenting the observation traces.
  - one possibility: based on these preliminary operators, the rest of the traces could be automatically annotated
  - another possibility: detect shifts in behavior through statistical analysis of the behavior traces

17 6. CONCLUSION
Focuses on the application of a relatively simple learning algorithm to a real-world problem.
"As demonstrated here, even a simple learning algorithm can be effective when embedded in a carefully designed framework. Hopefully, future research will find that more sophisticated learning algorithms are even more effective."

