
1 Modelling Motivation for Experience-Based Attention Focus in Reinforcement Learning
Candidate: Kathryn Merrick, School of Information Technologies, University of Sydney
PhD Thesis Defence, July 2007
Supervisor: Prof. Mary Lou Maher, Key Centre for Design Computing and Cognition, University of Sydney
Outline: Objectives | Contributions | Results | Conclusions

2 Introduction
- Learning environments may be complex, with many states and possible actions.
- The tasks to be learned may change over time.
- It may be difficult to predict tasks in advance.
- Doing 'everything' may be infeasible.
- How can artificial agents focus attention to develop behaviours in complex, dynamic environments?
- This thesis considers this question in conjunction with reinforcement learning.

3 Objectives
1. Develop models of motivation that focus attention based on experiences.
2. Model complex, dynamic environments using a representation that enables adaptive behaviour.
3. Develop learning agents with three aspects of attention focus:
   - Behavioural cycles
   - Adaptive behaviour
   - Multi-task learning
4. Develop metrics for comparing the adaptability and multi-task learning behaviour of motivated reinforcement learning (MRL) agents.
5. Evaluate the performance and scalability of MRL agents using different models of motivation and different RL approaches.
[Figure: state-action diagram over states S1-S4 and actions A1-A4]

4 Modelling Motivation as Experience-Based Reward
Interest and competence motivation, R_m(t) = max(I(t), C(t)):
- Compute observations and events O_S(t), E_S(t)
- Task selection using a self-organising map
- Compute experience-based reward using:
  - Policy error
  - Deci and Ryan's model of optimal challenges
- Arbitrate by taking the maximum of interest and competence motivation
Interest motivation alone, R_m(t) = I(t):
- Compute observations and events O_S(t), E_S(t)
- Task selection using a self-organising map
- Compute experience-based reward using:
  - Stanley's model of habituation
  - Wundt curve
- No arbitration required
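As a hedged illustration of the two variants above, the sketch below tracks novelty with a habituation update in the spirit of Stanley's model and arbitrates via R_m(t) = max(I(t), C(t)). The function names and constants are illustrative assumptions, not the thesis's definitions.

```python
def habituate(novelty, stimulus_present, alpha=0.05, n0=1.0, sigma=0.3):
    """Habituation in the spirit of Stanley's model: novelty decays
    while a stimulus is present and recovers toward the baseline n0
    in its absence. Constants are illustrative assumptions."""
    drive = alpha * (n0 - novelty)   # recovery toward baseline
    if stimulus_present:
        drive -= sigma               # habituation to a present stimulus
    return novelty + drive

def motivation_reward(interest, competence):
    """Arbitration for the interest-plus-competence variant:
    R_m(t) = max(I(t), C(t))."""
    return max(interest, competence)
```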

5 Representing Complex, Dynamic Environments
Sensed states are variable-length, defined by a grammar rather than a fixed-length vector:
S → … | ε | 1 | 2 | 3 | …
P = {P_1, P_2, P_3, …, P_i, …}
A → … | ε | …
S(1) = ( )  A(1) = {A(pick-up, pick), A(pick-up, forge), A(pick-up, smithy)}
S(2) = ( )  A(2) = {A(pick-up, axe), A(pick-up, lathe)}
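A minimal sketch of the idea the two examples show: both the sensed state and the available action set vary with what the agent currently perceives. The object names (pick, forge, smithy, axe, lathe) come from the slide; everything else here is an assumption for illustration.

```python
def sense(world_objects):
    """A sensed state is a variable-length tuple of sensations,
    one per object currently perceived -- no fixed-length vector."""
    return tuple(sorted(world_objects))

def available_actions(sensed_state):
    """The action set is generated from the sensed state:
    here, one pick-up action per perceived object."""
    return {("pick-up", obj) for obj in sensed_state}

# Two world states from the slide: the action set changes with the state.
s1 = sense({"pick", "forge", "smithy"})
s2 = sense({"axe", "lathe"})
print(available_actions(s1))  # pick-up actions for pick, forge, smithy
print(available_actions(s2))  # pick-up actions for axe, lathe
```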

6 Metrics and Evaluation
- A classification of different types of MRL and the role played by motivation in these approaches.
- Metrics for comparing learned behavioural cycles in terms of adaptability and multi-task learning.
- Evaluation of the performance and scalability of MRL agents using different:
  - Models of motivation
  - RL approaches
  - Types of environment
- New approaches to the design of non-player characters for games, which can adapt in open-ended virtual worlds.

7 Experiment 1
[Charts: Behavioural Variety and Behavioural Complexity]
- Task-oriented learning emerges using a task-independent motivation signal to direct learning.
- The greatest behavioural variety in simple environments is achieved by MFRL agents.
- The greatest behavioural complexity is achieved by MFRL and MHRL agents, which can interleave solutions to multiple tasks.

8 Experiments 2, 3 and 4
[Charts comparing MFRL, MMORL and MHRL agents across Experiments 2, 3 and 4]
- MFRL agents are the most adaptable, and the most scalable as the number of tasks in the environment increases.
- MMORL agents are the most scalable as the complexity of tasks increases.
- Agents motivated by interest and competence achieve greater adaptability, and show increased behavioural variety and complexity.

9 Conclusions
- MRL agents can learn task-oriented behavioural cycles using a task-independent motivation signal.
- The greatest behavioural variety and complexity in simple environments is achieved by MFRL agents.
- The greatest adaptability is displayed by MRL agents motivated by interest and competence.
- The most scalable approach when recall is required uses MMORL.

10 Limitations and Future Work
- Scalability of MRL in other types of environments
- Additional approaches to motivation:
  - Biological models
  - Cognitive models
  - Social models
  - Combined models
- Motivation in other machine learning settings:
  - Motivated supervised learning
  - Motivated unsupervised learning
- Additional metrics for MRL:
  - Usefulness
  - Intelligence
  - Rationality
(Linden, 2007)

11 [Image-only slide]

12 Tasks
- Maintenance tasks are defined by observations; achievement tasks are defined by events.
- An event is the change between the current sensed state and an earlier one:
  E(t) = S(t) − S(t′) = (Δ(s_1(t), s_1(t′)), Δ(s_2(t), s_2(t′)), …, Δ(s_L(t), s_L(t′)), …)
[Diagram: world state → sensors → sensed state → observation]
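A hedged sketch of the event computation above: an event records, per sensation, the change between the current sensed state and a previous one. Representing sensed states as dicts and Δ as a plain difference are assumptions for illustration.

```python
def event(s_now, s_prev):
    """E(t) = S(t) - S(t'): per-sensation change between the current
    sensed state and a previous one. Sensations absent from one state
    are treated as 0 here -- an assumption for this sketch."""
    keys = set(s_now) | set(s_prev)
    delta = {k: s_now.get(k, 0) - s_prev.get(k, 0) for k in keys}
    # Keep only the sensations that actually changed.
    return {k: d for k, d in delta.items() if d != 0}

# Example: the agent picked up an axe since the last sensed state.
print(event({"axe": 1, "health": 10}, {"axe": 0, "health": 10}))  # {'axe': 1}
```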

13 Behavioural Cycles
States: S_1 = ( ), S_2 = ( ), S_3 = ( ), …
Actions: A_1 = use(Food Machine), A_2 = move to(Food), A_3 = use(Food), A_4 = move to(Food Machine), …
[Diagram: behavioural cycles of increasing length, from (S_1; A_1) through (S_1, S_2; A_1, A_2) to (S_1, S_2, …, S_n; A_1, A_2, …, A_(n−1), A_n)]
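A hedged sketch of recognising a behavioural cycle in a trace of state-action steps, using the shortest back-to-back repeating suffix as a simplified stand-in for the thesis's cycle definition; the food-gathering steps echo the slide's example.

```python
def find_cycle(trace):
    """Detect a behavioural cycle in a trace of (state, action) steps:
    the shortest suffix that repeats back-to-back. A simplified stand-in
    for the thesis's definition of a behavioural cycle."""
    for length in range(1, len(trace) // 2 + 1):
        if trace[-length:] == trace[-2 * length:-length]:
            return trace[-length:]
    return None

# A food-gathering loop repeated twice is recognised as a cycle.
loop = [("S1", "use(Food Machine)"), ("S2", "move to(Food)"), ("S3", "use(Food)")]
print(find_cycle(loop * 2))  # the three-step cycle
```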

14 Agent Models
[Diagrams: MFRL and MMORL agent architectures. In both, sensors produce the sensed state S(t); a motivation process M uses observations and events O(t−1), E(t−1) and O(t), E(t) to compute the reward R_m(t); reinforcement learning updates the policy π(t); effectors execute an action A(t) (MFRL) or a behaviour B(t) selected via a reflex module over option-like structures B(t−1).π, B(t−1).Ω(S(t−1)) (MMORL).]
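A hedged sketch of the sense-motivate-learn-act loop the MFRL diagram depicts, with tabular Q-learning standing in for the RL component; the class and method names are assumptions, not the thesis's API.

```python
import random
from collections import defaultdict

class MFRLAgent:
    """Motivated flat RL agent: the reward fed to the learner is the
    internally computed motivation signal R_m(t), not an external
    task-specific reward."""
    def __init__(self, motivation_fn, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)          # Q(s, a) table
        self.motivation_fn = motivation_fn   # experience -> R_m(t)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state, actions):
        if random.random() < self.epsilon:   # explore
            return random.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])  # exploit

    def learn(self, s, a, s_next, next_actions, experience):
        r_m = self.motivation_fn(experience)  # motivation as reward
        best_next = max((self.q[(s_next, a2)] for a2 in next_actions), default=0.0)
        self.q[(s, a)] += self.alpha * (r_m + self.gamma * best_next - self.q[(s, a)])
```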

15 Sensitivity
Change in interest with (a) ρ+ = ρ− = 5, F+min = 0.5, F−min = 1.5 and (b) ρ+ = ρ− = 30, F+min = 0.5, F−min = 1.5.
Change in interest with (a) ρ+ = ρ− = 10, F+min = 0.1, F−min = 1.9 and (b) ρ+ = ρ− = 10, F+min = 0.9, F−min = 1.1.
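These captions vary the slopes (ρ+, ρ−) and turning points (F+min, F−min) of the two sigmoids that make up the Wundt curve. The sketch below uses a common two-sigmoid formulation so the effect of each parameter can be reproduced; the thesis's exact equation may include maximum-value terms omitted here.

```python
import math

def wundt_curve(novelty, rho_pos, rho_neg, f_pos_min, f_neg_min):
    """Interest = reward sigmoid - penalty sigmoid. rho_* set the
    slopes, f_*_min the turning points; moderate novelty yields
    the highest interest."""
    reward = 1.0 / (1.0 + math.exp(-rho_pos * (novelty - f_pos_min)))
    penalty = 1.0 / (1.0 + math.exp(-rho_neg * (novelty - f_neg_min)))
    return reward - penalty

# Steeper slopes (rho 5 -> 30) sharpen the interest peak around
# moderately novel experiences, as in the first pair of captions.
for n in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(n, wundt_curve(n, 5, 5, 0.5, 1.5), wundt_curve(n, 30, 30, 0.5, 1.5))
```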

16 Metrics
- A task is complete when its defining observation or event is achieved.
- A task is learned when the standard deviation of the number of actions in the behavioural cycles completing the task is less than some error threshold.
- Behavioural variety measures the number of tasks learned.
- Behavioural complexity measures the number of actions in a behavioural cycle.
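A hedged sketch of these metrics over logged cycle lengths per task; grouping cycles by task in a dict and the particular threshold value are bookkeeping assumptions, not the thesis's exact procedure.

```python
import statistics

def is_learned(cycle_lengths, threshold=1.0):
    """A task counts as learned when the standard deviation of the
    number of actions across its completing cycles is below a threshold."""
    return len(cycle_lengths) > 1 and statistics.stdev(cycle_lengths) < threshold

def behavioural_variety(cycles_by_task, threshold=1.0):
    """Number of tasks learned."""
    return sum(is_learned(ls, threshold) for ls in cycles_by_task.values())

def behavioural_complexity(cycle):
    """Number of actions in a behavioural cycle."""
    return len(cycle)

# Stable cycle lengths for 'eat' mark it learned; 'forge' is still unstable.
logs = {"eat": [3, 3, 3, 4], "forge": [2, 9, 5, 12]}
print(behavioural_variety(logs))  # 1
```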

