
1 Advice Taking and Transfer Learning: Naturally-Inspired Extensions to Reinforcement Learning
Lisa Torrey, Trevor Walker, Richard Maclin*, Jude Shavlik
University of Wisconsin-Madison; University of Minnesota-Duluth*

2 Reinforcement Learning
Agent-environment loop: the agent observes the state, chooses an action, and receives a reward from the environment (the reward may be delayed).

3 Q-Learning
Update the Q-function incrementally
Follow the current Q-function to choose actions
Converges to an accurate Q-function
Q-function: (state, action) -> value
policy(state) = argmax over actions of Q(state, action)
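A minimal tabular Q-learning sketch of the update and action-selection rules described on this slide; the learning rate, discount factor, and epsilon-greedy exploration are illustrative assumptions, not the settings used in this work.

```python
import random
from collections import defaultdict

# Tabular sketch: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
Q = defaultdict(float)                   # maps (state, action) -> value
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # illustrative parameters

def choose_action(state, actions):
    """Follow the current Q-function, with epsilon-greedy exploration."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])   # policy(state) = argmax_a Q(state, a)

def q_update(state, action, reward, next_state, actions):
    """Incrementally update the Q-function after one environment step."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```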

4 Limitations
Agents begin without any information
Random exploration is required in the early stages of learning
Long training times can result

5 Naturally-Inspired Extensions
Advice Taking: a human teacher provides knowledge to the RL agent
Transfer Learning: a source-task agent provides knowledge to the target-task agent

6 Potential Benefits
Performance vs. training curve: knowledge can give a higher start, a higher slope, and a higher asymptote compared to learning without knowledge.

7 Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
  Skill Transfer
  Macro Transfer
  MLN Transfer

8 The RoboCup Domain
KeepAway: +1 per time step
MoveDownfield: +1 per meter
BreakAway: +1 upon goal

9 The RoboCup Domain
State features:
  distBetween(a0, Player)
  distBetween(a0, GoalPart)
  distBetween(Attacker, goalCenter)
  distBetween(Attacker, ClosestDefender)
  distBetween(Attacker, goalie)
  angleDefinedBy(topRight, goalCenter, a0)
  angleDefinedBy(GoalPart, a0, goalie)
  angleDefinedBy(Attacker, a0, ClosestDefender)
  angleDefinedBy(Attacker, a0, goalie)
  timeLeft
Actions:
  move(ahead), move(away), move(left), move(right)
  pass(Teammate)
  shoot(GoalPart)

10 Q-Learning
Q-function: (state, action) -> value
policy(state) = argmax over actions of Q(state, action)

State  Action  Q
1      1       0.5
1      2       -0.5
1      3       0
2      1       0.3
...    ...     ...

An explicit table does not scale, so we use function approximation.

11 Approximating the Q-function
Linear support-vector regression: Q-value = (weight vector)^T (feature vector)
Feature vector: [distBetween(a0, a1), distBetween(a0, a2), distBetween(a0, goalie), ...]
Weight vector: [0.2, -0.1, 0.9, ...]
Set weights to minimize: ModelSize + C × DataMisfit
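A hedged sketch of the linear Q-function approximation above: one regression model per action, trained to trade off model size against data misfit. The feature values, Q-value targets, and the use of scikit-learn's LinearSVR (whose regularized objective is analogous to ModelSize + C × DataMisfit) are illustrative assumptions, not the exact solver or data from this work.

```python
import numpy as np
from sklearn.svm import LinearSVR

# Each row is a feature vector for one state:
# [distBetween(a0, a1), distBetween(a0, a2), distBetween(a0, goalie)]
features = np.array([
    [15.0, 5.0, 20.0],
    [12.0, 8.0, 18.0],
    [20.0, 3.0, 25.0],
])
q_targets = np.array([0.4, 0.1, 0.7])    # illustrative Q-value targets for one action

# LinearSVR trades off the size of the weight vector against the misfit
# to the training Q-values, then Q(state) = coef_ . features(state) + intercept_.
model = LinearSVR(C=1.0, epsilon=0.05).fit(features, q_targets)
print(model.coef_, model.intercept_)     # learned weight vector and bias
```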

12 RL in 3-on-2 BreakAway

13 Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
  Skill Transfer
  Macro Transfer
  MLN Transfer

14 Extension #1: Advice Taking IF an opponent is near AND a teammate is open THEN pass is the best action

15 Advice in RL
Advice sets constraints on Q-values under specified conditions
  IF an opponent is near me AND a teammate is open THEN pass has a high Q-value
Advice is applied as soft constraints in optimization:
  ModelSize + C × DataMisfit + μ × AdviceMisfit
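A conceptual sketch of how an advice penalty could be added to the regression objective above. The actual system incorporates advice as soft constraints inside the support-vector optimization; the hinge-style penalty, the μ value, and the encoding of the advice condition here are simplifying assumptions for illustration only.

```python
import numpy as np

def advised_objective(w, X, y, advice_mask, advice_min_q, C=1.0, mu=0.5):
    """Sketch of ModelSize + C * DataMisfit + mu * AdviceMisfit.
    advice_mask marks states where the advice condition holds (opponent near,
    teammate open); the advice asks that the predicted Q-value for pass be
    at least advice_min_q in those states."""
    q_pred = X @ w
    model_size = np.sum(np.abs(w))                                   # size of the model
    data_misfit = np.sum(np.abs(y - q_pred))                         # fit to training Q-values
    advice_misfit = np.sum(np.maximum(0.0, advice_min_q - q_pred[advice_mask]))
    return model_size + C * data_misfit + mu * advice_misfit
```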

16 Advice Performance

17 Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
  Skill Transfer
  Macro Transfer
  MLN Transfer

18 Extension #2: Transfer
Transfer between related tasks: 3-on-2 KeepAway, 3-on-2 MoveDownfield, 3-on-2 BreakAway

19 Relational Transfer
First-order logic describes relationships between objects
  distBetween(a0, Teammate) > 10
  distBetween(Teammate, goalCenter) < 15
We want to transfer relational knowledge
  Human-level reasoning
  General representation

20 Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
  Skill Transfer
  Macro Transfer
  MLN Transfer

21 Skill Transfer
Learn advice about good actions from the source task
  good_action(pass(Teammate)) :-
    distBetween(a0, Teammate) > 10,
    distBetween(Teammate, goalCenter) < 15.
Example 1:
  distBetween(a0, a1) = 15
  distBetween(a0, a2) = 5
  distBetween(a0, goalie) = 20
  ...
  action = pass(a1), outcome = caught(a1)
Select positive and negative examples of good actions, then apply inductive logic programming to learn rules
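A hypothetical sketch of the example-selection step before running inductive logic programming: label source-task pass decisions as positive or negative examples of good_action(pass(Teammate)). The record format and the labeling rule (a caught pass in a game the team eventually scored in counts as positive) are illustrative assumptions, not the exact criteria used in this work.

```python
def label_pass_examples(games):
    """Split source-task game records into positive and negative examples
    of good_action(pass(Teammate)) for an ILP learner.
    Each game is assumed to be a dict with a boolean 'scored' flag and a
    list of steps, each holding a state, an action, and an outcome."""
    positives, negatives = [], []
    for game in games:
        for step in game["steps"]:
            if step["action"].startswith("pass"):
                if step["outcome"] == "caught" and game["scored"]:
                    positives.append(step["state"])
                else:
                    negatives.append(step["state"])
    return positives, negatives
```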

22 User Advice in Skill Transfer
There may be new skills in the target task that cannot be learned from the source
  E.g., shooting in BreakAway
We allow users to add their own advice about these new skills
User advice simply adds to the transfer advice

23 Skill Transfer to 3-on-2 BreakAway

24 Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
  Skill Transfer
  Macro Transfer
  MLN Transfer

25 Macro Transfer
Learn a strategy from the source task
Find an action sequence that separates good games from bad games
Learn first-order rules to control transitions along the sequence
  Example sequence: move(ahead) -> pass(Teammate) -> shoot(GoalPart)

26 Transfer via Demonstration
Games 0-100 in the target task: execute the macro strategy; the agent learns an initial Q-function
Games 100 onward: perform standard RL; the agent adapts to the target task
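A sketch of the demonstration schedule this slide describes: follow the transferred macro for roughly the first 100 target-task games while learning an initial Q-function from that experience, then switch to standard RL. The agent and policy interfaces (play_game, update_q_function) are hypothetical names, not the system's actual API.

```python
DEMONSTRATION_GAMES = 100    # from the slide: macro strategy for the first ~100 games

def run_target_task(total_games, macro_policy, rl_agent):
    """Demonstration-based transfer: macro first, standard RL afterwards."""
    for game in range(total_games):
        if game < DEMONSTRATION_GAMES:
            experience = macro_policy.play_game()    # execute the transferred macro strategy
        else:
            experience = rl_agent.play_game()        # standard RL action selection
        rl_agent.update_q_function(experience)       # learn from either kind of experience
```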

27 Macro Transfer to 3-on-2 BreakAway

28 Outline
RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
  Skill Transfer
  Macro Transfer
  MLN Transfer

29 MLN Transfer
Learn a Markov Logic Network to represent the source-task policy relationally
  MLN Q-function: (state, action) -> value
Apply the policy via demonstration in the target task

30 Markov Logic Networks
A Markov network models a joint distribution
A Markov Logic Network combines probability with logic
  Template: a set of first-order formulas with weights
  Each grounded predicate in a formula becomes a node
  Predicates in a grounded formula are connected by arcs
Probability of a world: (1/Z) exp(Σ W_i N_i), where W_i is the weight of formula i and N_i is its number of true groundings
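A small worked sketch of the world-probability formula, P(world) = (1/Z) exp(Σ W_i N_i). The two formula weights match the example on the next slide; the candidate worlds and their grounding counts N_i are made up purely for illustration.

```python
import math

weights = [0.75, 1.33]            # one weight W_i per first-order formula
worlds = {                        # world label -> [N_1, N_2] true-grounding counts
    "world_a": [1, 3],
    "world_b": [0, 1],
    "world_c": [2, 0],
}

scores = {name: math.exp(sum(w * n for w, n in zip(weights, counts)))
          for name, counts in worlds.items()}
Z = sum(scores.values())          # partition function: normalize over all worlds
probabilities = {name: s / Z for name, s in scores.items()}
print(probabilities)              # probabilities sum to 1
```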

31 MLN Q-function
Formula 1 (W_1 = 0.75, N_1 = 1 teammate):
  IF distance(me, Teammate) < 15 AND angle(me, goalie, Teammate) > 45 THEN Q ∈ (0.8, 1.0)
Formula 2 (W_2 = 1.33, N_2 = 3 goal parts):
  IF distance(me, GoalPart) < 10 AND angle(me, goalie, GoalPart) > 45 THEN Q ∈ (0.8, 1.0)
Probability that Q ∈ (0.8, 1.0): exp(W_1 N_1 + W_2 N_2) / (1 + exp(W_1 N_1 + W_2 N_2))

32 Using an MLN Q-function
  P_1 = P(Q ∈ (0.8, 1.0)) = 0.75
  P_2 = P(Q ∈ (0.5, 0.8)) = 0.15
  P_3 = P(Q ∈ (0, 0.5)) = 0.10
Q = P_1 · E[Q | bin1] + P_2 · E[Q | bin2] + P_3 · E[Q | bin3]
E[Q | bin] is the Q-value of the most similar training example in that bin
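A sketch that puts the last two slides together: compute the probability of the top Q-value bin from the formula weights and grounding counts, then combine the bin probabilities with per-bin representative Q-values to get a single estimate. The weights, counts, and bin probabilities come from the slides; the representative values E[Q | bin] are illustrative assumptions.

```python
import math

def bin_probability(weighted_counts):
    """P(Q in bin) = exp(sum W_i * N_i) / (1 + exp(sum W_i * N_i))."""
    s = sum(w * n for w, n in weighted_counts)
    return math.exp(s) / (1.0 + math.exp(s))

p_top_bin = bin_probability([(0.75, 1), (1.33, 3)])   # Formula 1 and Formula 2 from slide 31

# Expected Q-value: sum over bins of P(bin) * E[Q | bin], where E[Q | bin] is
# taken from the most similar training example that fell into that bin.
bin_probs = [0.75, 0.15, 0.10]    # P_1, P_2, P_3 from slide 32
bin_values = [0.9, 0.65, 0.25]    # illustrative representative Q-values per bin
q_estimate = sum(p * v for p, v in zip(bin_probs, bin_values))
print(p_top_bin, q_estimate)
```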

33 MLN Transfer to 3-on-2 BreakAway

34 Conclusions
Advice and transfer can provide RL agents with knowledge that improves early performance
Relational knowledge is desirable because it is general and involves human-level reasoning
More detailed knowledge produces larger initial benefits, but is less widely transferable

35 Acknowledgements
DARPA grant HR0011-04-1-0007
DARPA grant HR0011-07-C-0060
DARPA grant FA8650-06-C-7606
NRL grant N00173-06-1-G002
Thank You

