1
Advice Taking and Transfer Learning: Naturally-Inspired Extensions to Reinforcement Learning Lisa Torrey, Trevor Walker, Richard Maclin*, Jude Shavlik University of Wisconsin - Madison University of Minnesota - Duluth*
2
Reinforcement Learning
The agent and environment interact in a loop: the agent observes a state, takes an action, and receives a reward (which may be delayed).
3
Q-Learning
Update the Q-function incrementally
Follow the current Q-function to choose actions
Converges to an accurate Q-function
Q-function: state, action → value
policy(state) = argmax over actions of Q(state, action)
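The incremental update and greedy policy above can be sketched as follows. This is a minimal tabular Q-learning sketch, not the paper's function-approximation setup; the state/action names, learning rate, and discount factor are illustrative assumptions.

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One incremental Q-learning update: move Q(s,a) toward the TD target
    reward + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def policy(Q, state, actions):
    """Greedy policy: argmax over actions of Q(state, action)."""
    return max(actions, key=lambda a: Q[(state, a)])

# Illustrative usage with hypothetical states and RoboCup-style actions.
Q = defaultdict(float)
actions = ["pass", "shoot", "move"]
q_update(Q, "s0", "pass", 1.0, "s1", actions)
```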
4
Limitations
Agents begin without any information
Random exploration is required in the early stages of learning
Long training times can result
5
Naturally-Inspired Extensions
Advice Taking: a human teacher gives knowledge to the RL agent
Transfer Learning: a source-task agent gives knowledge to the target-task agent
6
Potential Benefits
On a performance-vs-training curve, an agent learning with knowledge can show a higher start, a higher slope, and a higher asymptote than one learning without knowledge.
7
Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
Skill Transfer
Macro Transfer
MLN Transfer
8
The RoboCup Domain
KeepAway: +1 per time step
MoveDownfield: +1 per meter
BreakAway: +1 upon goal
9
The RoboCup Domain
State features:
distBetween(a0, Player)
distBetween(a0, GoalPart)
distBetween(Attacker, goalCenter)
distBetween(Attacker, ClosestDefender)
distBetween(Attacker, goalie)
angleDefinedBy(topRight, goalCenter, a0)
angleDefinedBy(GoalPart, a0, goalie)
angleDefinedBy(Attacker, a0, ClosestDefender)
angleDefinedBy(Attacker, a0, goalie)
timeLeft
Actions:
move(ahead), move(away), move(right), move(left)
shoot(GoalPart)
pass(Teammate)
10
Q-Learning
Q-function: state, action → value
policy(state) = argmax over actions of Q(state, action)
Example Q-table:
State  Action  Q
1      1       0.5
1      2      -0.5
1      3       0
2      …       0.3
…      …       …
Function approximation replaces the explicit table in large state spaces.
11
Approximating the Q-function
Linear support-vector regression: Q-value = w^T x, the inner product of a weight vector w and a feature vector x
Feature vector x: distBetween(a0, a1), distBetween(a0, a2), distBetween(a0, goalie), …
Weight vector w: 0.2, -0.1, 0.9, …
Set weights to minimize: ModelSize + C × DataMisfit
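A minimal sketch of this kind of regularized linear fit, using a ridge-style closed form as a stand-in for the linear support-vector regression on the slide (the feature matrix, targets, and trade-off constant C are illustrative assumptions):

```python
import numpy as np

def fit_linear_q(X, y, C=1.0):
    """Fit weights minimizing ModelSize + C * DataMisfit, here taken as
    ||w||^2 + C * ||Xw - y||^2 (a ridge-regression stand-in for linear SVR).
    Setting the gradient to zero gives (X^T X + (1/C) I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + (1.0 / C) * np.eye(n_features), X.T @ y)

def q_value(w, features):
    """Q-value is the inner product of the weight and feature vectors."""
    return float(w @ features)

# Tiny illustrative fit: with a large C, weights track the targets closely.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([2.0, 3.0])
w = fit_linear_q(X, y, C=1e6)
```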
12
RL in 3-on-2 BreakAway
13
Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
Skill Transfer
Macro Transfer
MLN Transfer
14
Extension #1: Advice Taking IF an opponent is near AND a teammate is open THEN pass is the best action
15
Advice in RL
Advice sets constraints on Q-values under specified conditions
IF an opponent is near me AND a teammate is open THEN pass has a high Q-value
Apply as soft constraints in optimization:
ModelSize + C × DataMisfit + μ × AdviceMisfit
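The penalized objective can be sketched as below. The hinge-style advice penalty, the threshold for "high Q-value", and the tiny data arrays are illustrative assumptions, not the paper's exact formulation:

```python
def objective(w, X, y, advice_mask, advice_threshold=0.5, C=1.0, mu=1.0):
    """ModelSize + C * DataMisfit + mu * AdviceMisfit.
    AdviceMisfit is a soft constraint: on examples where the advice
    condition holds (advice_mask True), the predicted Q-value should exceed
    advice_threshold; any shortfall is penalized linearly."""
    q = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
    model_size = sum(wi ** 2 for wi in w)
    data_misfit = sum((qi - yi) ** 2 for qi, yi in zip(q, y))
    advice_misfit = sum(max(0.0, advice_threshold - qi)
                        for qi, flag in zip(q, advice_mask) if flag)
    return model_size + C * data_misfit + mu * advice_misfit

# Illustrative evaluation: second example violates the advice by 0.5.
val = objective([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [False, True])
```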
16
Advice Performance
17
Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
Skill Transfer
Macro Transfer
MLN Transfer
18
Extension #2: Transfer 3-on-2 BreakAway 3-on-2 KeepAway 3-on-2 MoveDownfield
19
Relational Transfer
First-order logic describes relationships between objects:
distBetween(a0, Teammate) > 10
distBetween(Teammate, goalCenter) < 15
We want to transfer relational knowledge:
Human-level reasoning
General representation
20
Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
Skill Transfer
Macro Transfer
MLN Transfer
21
Skill Transfer
Learn advice about good actions from the source task:
good_action(pass(Teammate)) :-
    distBetween(a0, Teammate) > 10,
    distBetween(Teammate, goalCenter) < 15.
Example 1:
distBetween(a0, a1) = 15
distBetween(a0, a2) = 5
distBetween(a0, goalie) = 20
...
action = pass(a1)
outcome = caught(a1)
Select positive and negative examples of good actions and apply inductive logic programming to learn rules
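The learned rule above can be read as a simple predicate over state features. A sketch, with the goal-center distances added as hypothetical feature values (the slide's Example 1 only lists passer distances):

```python
def good_action_pass(features, teammate):
    """The slide's skill-transfer rule as a predicate: pass(Teammate) is a
    good action when the teammate is far from the passer a0 (> 10) and
    close to the goal center (< 15)."""
    return (features[("distBetween", "a0", teammate)] > 10
            and features[("distBetween", teammate, "goalCenter")] < 15)

# Feature values: a0-a1 and a0-a2 distances are from the slide's Example 1;
# the goalCenter distances are hypothetical, added for illustration.
example = {
    ("distBetween", "a0", "a1"): 15,
    ("distBetween", "a0", "a2"): 5,
    ("distBetween", "a1", "goalCenter"): 12,
    ("distBetween", "a2", "goalCenter"): 12,
}
```

Here pass(a1) satisfies the rule (15 > 10 and 12 < 15) while pass(a2) does not (5 is not > 10), matching the slide's positive example pass(a1).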
22
User Advice in Skill Transfer
There may be new skills in the target task that cannot be learned from the source, e.g., shooting in BreakAway
We allow users to add their own advice about these new skills
User advice simply adds to the transfer advice
23
Skill Transfer to 3-on-2 BreakAway
24
Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
Skill Transfer
Macro Transfer
MLN Transfer
25
Macro Transfer
Learn a strategy from the source task
Find an action sequence that separates good games from bad games:
move(ahead) → pass(Teammate) → shoot(GoalPart)
Learn first-order rules to control transitions along the sequence
26
Transfer via Demonstration
Games 0-100 in the target task: execute the macro strategy; the agent learns an initial Q-function
After game 100: perform standard RL; the agent adapts to the target task
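The two-phase schedule can be sketched as a simple switch on the game count. The policy callables and the 100-game cutoff variable are placeholders for this sketch (the cutoff value itself is from the slide):

```python
def choose_action(game_number, state, macro_policy, q_policy, demo_games=100):
    """Transfer via demonstration: for the first demo_games games the agent
    executes the transferred macro strategy (gathering experience for an
    initial Q-function); afterwards it follows standard RL."""
    if game_number < demo_games:
        return macro_policy(state)
    return q_policy(state)
```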
27
Macro Transfer to 3-on-2 BreakAway
28
Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
Skill Transfer
Macro Transfer
MLN Transfer
29
MLN Transfer
Learn a Markov Logic Network to represent the source-task policy relationally
Apply the policy via demonstration in the target task
MLN Q-function: state, action → value
30
Markov Logic Networks
A Markov network models a joint distribution
A Markov Logic Network combines probability with logic
Template: a set of first-order formulas with weights
Each grounded predicate in a formula becomes a node
Predicates in a grounded formula are connected by arcs
Probability of a world: (1/Z) exp(Σ_i W_i N_i)
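The world-probability formula can be sketched directly, enumerating a toy set of worlds to compute the partition function Z. The two-world example and the single formula weight are illustrative assumptions:

```python
import math

def world_prob(weights, counts_fn, worlds, world):
    """Probability of a world under an MLN: (1/Z) * exp(sum_i W_i * N_i),
    where N_i counts the true groundings of formula i in the world and
    Z normalizes over all worlds. counts_fn(world) returns the N_i vector."""
    def score(w):
        return math.exp(sum(wi * ni for wi, ni in zip(weights, counts_fn(w))))
    z = sum(score(w) for w in worlds)
    return score(world) / z

# Toy example: one formula with weight ln(3); it has one true grounding in
# world "A" and none in world "B", so world "A" is three times as likely.
weights = [math.log(3.0)]
worlds = ["A", "B"]
counts = lambda w: [1] if w == "A" else [0]
p_a = world_prob(weights, counts, worlds, "A")
```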
31
MLN Q-function
Formula 1 (W1 = 0.75, N1 = 1 teammate):
IF distance(me, Teammate) < 15 AND angle(me, goalie, Teammate) > 45 THEN Q ∈ (0.8, 1.0)
Formula 2 (W2 = 1.33, N2 = 3 goal parts):
IF distance(me, GoalPart) < 10 AND angle(me, goalie, GoalPart) > 45 THEN Q ∈ (0.8, 1.0)
Probability that Q ∈ (0.8, 1.0): exp(W1 N1 + W2 N2) / (1 + exp(W1 N1 + W2 N2))
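The bin probability is a logistic function of the weighted grounding counts, which can be sketched with the slide's numbers:

```python
import math

def bin_probability(weights, counts):
    """Probability that Q falls in the bin, per the slide:
    exp(sum_i W_i N_i) / (1 + exp(sum_i W_i N_i))."""
    s = sum(w * n for w, n in zip(weights, counts))
    return math.exp(s) / (1.0 + math.exp(s))

# Using the slide's values: W1 = 0.75 with N1 = 1, W2 = 1.33 with N2 = 3.
p = bin_probability([0.75, 1.33], [1, 3])
```

With these weights the exponent is 0.75 + 3.99 = 4.74, so the bin Q ∈ (0.8, 1.0) gets probability close to 1.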
32
Using an MLN Q-function
Q ∈ (0.8, 1.0): P1 = 0.75
Q ∈ (0.5, 0.8): P2 = 0.15
Q ∈ (0, 0.5): P3 = 0.10
Q = P1 · E[Q | bin1] + P2 · E[Q | bin2] + P3 · E[Q | bin3]
E[Q | bin] is the Q-value of the most similar training example in that bin
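The expected-Q computation is a probability-weighted sum over bins. A sketch using the slide's bin probabilities; the per-bin representative Q-values (0.9, 0.6, 0.3) are hypothetical stand-ins for the most-similar-training-example values:

```python
def expected_q(bin_probs, bin_values):
    """Q-value from an MLN Q-function: sum over bins of
    P(Q in bin_i) * E[Q | bin_i]."""
    return sum(p * v for p, v in zip(bin_probs, bin_values))

# Bin probabilities from the slide; representative bin values are assumed.
q = expected_q([0.75, 0.15, 0.10], [0.9, 0.6, 0.3])
```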
33
MLN Transfer to 3-on-2 BreakAway
34
Conclusions
Advice and transfer can provide RL agents with knowledge that improves early performance
Relational knowledge is desirable because it is general and involves human-level reasoning
More detailed knowledge produces larger initial benefits but is less widely transferable
35
Acknowledgements
DARPA grant HR0011-04-1-0007
DARPA grant HR0011-07-C-0060
DARPA grant FA8650-06-C-7606
NRL grant N00173-06-1-G002
Thank You