
1 Lisa Torrey and Jude Shavlik, University of Wisconsin, Madison, WI, USA

2 Outline: Background · Approaches for transfer in reinforcement learning · Relational transfer with Markov Logic Networks · Two new algorithms for MLN transfer

3 Outline: Background · Approaches for transfer in reinforcement learning · Relational transfer with Markov Logic Networks · Two new algorithms for MLN transfer

4 Transfer learning: given a source task S, learn a target task T.

5 Reinforcement learning: the agent interacts with the environment. Starting with Q(s1, a) = 0, the policy chooses π(s1) = a1; the environment returns the next state δ(s1, a1) = s2 and reward r(s1, a1) = r2; the agent updates Q(s1, a1) ← Q(s1, a1) + Δ; then π(s2) = a2, δ(s2, a2) = s3, r(s2, a2) = r3, and so on. The agent balances exploration and exploitation to maximize reward.
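The slide above sketches the standard Q-learning interaction loop. Below is a minimal tabular Q-learning sketch for illustration; the environment interface (reset/step), the learning rate, discount factor, and epsilon-greedy exploration scheme are assumptions made for the example, not details from the talk.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration/exploitation."""
    Q = defaultdict(float)                      # Q(s, a) starts at 0

    for _ in range(episodes):
        s = env.reset()                         # initial state s1
        done = False
        while not done:
            # Exploration vs. exploitation
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])

            # Environment transition: delta(s, a) = next state, r(s, a) = reward
            s_next, r, done = env.step(a)

            # Q(s, a) <- Q(s, a) + Delta
            best_next = max(Q[(s_next, act)] for act in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```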

6 [Plot: performance vs. training. Transfer can give a higher start, a higher slope, and a higher asymptote.]

7 Tasks: 2-on-1 BreakAway and 3-on-2 BreakAway (RoboCup soccer), with hand-coded defenders and a single learning agent.

8 Outline: Background · Approaches for transfer in reinforcement learning · Relational transfer with Markov Logic Networks · Two new algorithms for MLN transfer

9 Madden & Howley 2004: learn a set of rules, used during exploration steps.
Croonenborghs et al. 2007: learn a relational decision tree, used as an additional action.
Our prior work, 2007: learn a relational macro, used as a demonstration.

10 Outline: Background · Approaches for transfer in reinforcement learning · Relational transfer with Markov Logic Networks · Two new algorithms for MLN transfer

11 A relational rule generalizes over objects through its variables:
IF distance(GoalPart) > 10
AND angle(ball, Teammate, Opponent) > 30
THEN pass(Teammate)
One such rule covers the grounded actions pass(t1), pass(t2), … and the goal parts goalLeft, goalRight.
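To make the "variables generalize over objects" point concrete, here is a small illustrative sketch of grounding that rule over candidate teammates, goal parts, and opponents; the feature functions and object lists are hypothetical, not part of the talk.

```python
def rule_applies(distance, angle, goal_part, teammate, opponent):
    """IF distance(GoalPart) > 10 AND angle(ball, Teammate, Opponent) > 30 THEN pass(Teammate)."""
    return distance(goal_part) > 10 and angle("ball", teammate, opponent) > 30

def grounded_passes(distance, angle, goal_parts, teammates, opponents):
    """One first-order rule covers pass(t1), pass(t2), ... through variable bindings."""
    return {("pass", t)
            for t in teammates
            for g in goal_parts
            for o in opponents
            if rule_applies(distance, angle, g, t, o)}
```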

12 Markov Logic Networks (Richardson and Domingos, Machine Learning 2006):
Formulas (F): evidence1(X) AND query(X); evidence2(X) AND query(X)
Weights (W): w0 = 1.1, w1 = 0.9
P(world) ∝ exp( Σi wi ni(world) ), where ni(world) = number of true groundings of the i-th formula in the world.
[Diagram: ground network connecting query(x1) and query(x2) to evidence nodes e1, e2, …]
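A minimal sketch of the MLN semantics summarized on this slide: a world's probability is proportional to the exponentiated, weighted sum of true-grounding counts. The list-of-counts encoding of worlds is an assumption made for the example.

```python
import math

def log_potential(weights, counts):
    """Sum_i w_i * n_i(world), where counts[i] = n_i(world)."""
    return sum(w * n for w, n in zip(weights, counts))

def world_probabilities(weights, counts_per_world):
    """P(world) proportional to exp(sum_i w_i * n_i(world)), normalized over the given worlds."""
    potentials = [math.exp(log_potential(weights, c)) for c in counts_per_world]
    z = sum(potentials)                      # partition function over these worlds
    return [p / z for p in potentials]

# Two formulas with weights w0 = 1.1 and w1 = 0.9 (as on the slide),
# and two candidate worlds with true-grounding counts n_i(world):
print(world_probabilities([1.1, 0.9], [[2, 1], [0, 3]]))
```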

13 Algorithm 1: transfer the source-task Q-function as an MLN (Task S → MLN Q-function → Task T).
Algorithm 2: transfer the source-task policy as an MLN (Task S → MLN Policy → Task T).

14 In the target task, first use the MLN (demonstration period), then use regular target-task training.
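A sketch of that schedule, assuming the switch happens after a fixed number of target-task episodes (the threshold here is illustrative; the slide does not give one):

```python
def choose_action(episode, state, mln_policy, target_policy, demo_episodes=100):
    """Demonstration period first, regular target-task training afterwards."""
    if episode < demo_episodes:
        return mln_policy(state)       # actions chosen by the transferred MLN
    return target_policy(state)       # actions chosen by the regular target-task learner
```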

15 Outline: Background · Approaches for transfer in reinforcement learning · Relational transfer with Markov Logic Networks · Two new algorithms for MLN transfer

16 Algorithm 1 (MLN Q-function transfer): learn the MLN Q-function from source-task data with Aleph and Alchemy, then use it in the target task as a demonstration. The MLN Q-function contains one MLN per action, each mapping a state to a Q-value (MLN for action 1: state → Q-value; MLN for action 2: state → Q-value; …).

17 Each action's Q-value is discretized into bins (0 ≤ Qa < 0.2, 0.2 ≤ Qa < 0.4, 0.4 ≤ Qa < 0.6, …), and the MLN assigns each bin a probability. [Plots: probability vs. bin number.]
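One plausible way to collapse such a bin distribution back into a scalar Q-value (an assumption for illustration, not a formula stated on the slide) is a probability-weighted average over bins:

```python
def expected_q(bin_probs, bin_edges):
    """Probability-weighted average over Q-value bins such as [0, 0.2), [0.2, 0.4), ..."""
    midpoints = [(lo + hi) / 2 for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    return sum(p * m for p, m in zip(bin_probs, midpoints))

# Three bins with an MLN-estimated distribution over them:
print(expected_q([0.7, 0.2, 0.1], [0.0, 0.2, 0.4, 0.6]))   # 0.7*0.1 + 0.2*0.3 + 0.1*0.5 = 0.18
```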

18 Bins: hierarchical clustering.
Formulas for each bin (IF … THEN 0 ≤ Q < 0.2, …): Aleph (Srinivasan).
Weights (w0 = 1.1, w1 = 0.9, …): Alchemy (U. Washington).
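A sketch of finding Q-value bins by hierarchical clustering, using SciPy; the linkage method, number of clusters, and bin-edge convention are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def q_value_bins(q_values, n_bins=5):
    """Hierarchically cluster source-task Q-values and return the edges between clusters."""
    data = np.asarray(q_values, dtype=float).reshape(-1, 1)
    tree = linkage(data, method="ward")
    labels = fcluster(tree, t=n_bins, criterion="maxclust")
    clusters = sorted((data[labels == k].ravel() for k in np.unique(labels)),
                      key=lambda c: c.mean())
    return [float(c.max()) for c in clusters[:-1]]   # boundaries between adjacent clusters

print(q_value_bins([0.05, 0.07, 0.10, 0.31, 0.33, 0.55, 0.60], n_bins=3))
```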

19 Aleph rules: Rule 1 (precision = 1.0), Rule 2 (precision = 0.99), Rule 3 (precision = 0.96), … For each rule: does it increase the F-score of the ruleset? If yes, add it to the ruleset. F = (2 × Precision × Recall) / (Precision + Recall).
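A sketch of that greedy selection loop, representing each rule by the set of examples it covers (an encoding assumed here for illustration):

```python
def f_score(ruleset, positives, negatives):
    """F = 2 * Precision * Recall / (Precision + Recall) for the examples the ruleset covers."""
    covered = set().union(*ruleset) if ruleset else set()
    tp, fp = len(covered & positives), len(covered & negatives)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / len(positives)
    return 2 * precision * recall / (precision + recall)

def select_rules(candidate_rules, positives, negatives):
    """Consider rules in order of precision; keep a rule only if it raises the ruleset's F-score."""
    by_precision = sorted(candidate_rules,
                          key=lambda r: len(r & positives) / max(len(r), 1),
                          reverse=True)
    ruleset = []
    for rule in by_precision:
        if f_score(ruleset + [rule], positives, negatives) > f_score(ruleset, positives, negatives):
            ruleset.append(rule)
    return ruleset
```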

20 Examples for transfer from 2-on-1 BreakAway to 3-on-2 BreakAway:

IF distance(me, GoalPart) ≥ 42
   distance(me, Teammate) ≥ 39
THEN pass(Teammate) falls into [0, 0.11]

IF angle(topRight, goalCenter, me) ≤ 42
   angle(topRight, goalCenter, me) ≥ 55
   angle(goalLeft, me, goalie) ≥ 20
   angle(goalCenter, me, goalie) ≤ 30
THEN pass(Teammate) falls into [0.11, 0.27]

IF distance(Teammate, goalCenter) ≤ 9
   angle(topRight, goalCenter, me) ≤ 85
THEN pass(Teammate) falls into [0.27, 0.43]

21 Transfer from 2-on-1 BreakAway to 3-on-2 BreakAway

22 Outline: Background · Approaches for transfer in reinforcement learning · Relational transfer with Markov Logic Networks · Two new algorithms for MLN transfer

23 Algorithm 2 (MLN policy transfer): learn the MLN policy (formulas F, weights W) from source-task data with Aleph and Alchemy, then use it in the target task as a demonstration. The MLN policy maps a state to a probability for each action.

24 The MLN assigns a probability to each action (move(ahead), pass(Teammate), shoot(goalLeft), …). Policy: choose the highest-probability action.
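A sketch of that action choice; `action_probability` stands in for MLN inference (e.g. as performed by Alchemy) and is an assumption of the example:

```python
def mln_policy_action(state, grounded_actions, action_probability):
    """Pick the grounded action the MLN considers most probable in this state."""
    return max(grounded_actions, key=lambda a: action_probability(state, a))

# e.g. grounded_actions = ["move(ahead)", "pass(t1)", "pass(t2)", "shoot(goalLeft)"]
```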

25 Formulas for each action (IF … THEN pass(Teammate), …): Aleph (Srinivasan).
Weights (w0 = 1.1, w1 = 0.9, …): Alchemy (U. Washington).

26 Examples for transfer from 2-on-1 BreakAway to 3-on-2 BreakAway:

IF angle(topRight, goalCenter, me) ≤ 70
   timeLeft ≥ 98
   distance(me, Teammate) ≥ 3
THEN pass(Teammate)

IF distance(me, GoalPart) ≥ 36
   distance(me, Teammate) ≥ 12
   timeLeft ≥ 91
   angle(topRight, goalCenter, me) ≤ 80
THEN pass(Teammate)

IF distance(me, GoalPart) ≥ 27
   angle(topRight, goalCenter, me) ≤ 75
   distance(me, Teammate) ≥ 9
   angle(Teammate, me, goalie) ≥ 25
THEN pass(Teammate)

27 MLN policy transfer from 2-on-1 BreakAway to 3-on-2 BreakAway

28 ILP rulesets can represent a policy by themselves. Does the MLN provide extra benefit? Yes, MLN policies perform as well or better.
MLN policies can include action-sequence knowledge. Does this improve transfer? No, the Markov assumption appears to hold in RoboCup.

29 MLN transfer can improve reinforcement learning: higher initial performance.
Policies transfer better than Q-functions: simpler and more general.
Policies can transfer better than macros, but not always: more detailed knowledge, but a risk of overspecialization.
MLNs transfer better than rulesets: statistical-relational beats purely relational.
Action-sequence information is redundant: the Markov assumption holds in our domain.

30 Refinement of transferred knowledge: revising weights and relearning rules, so that too-specific clauses and too-general clauses become better clauses (Mihalkova et al. 2007).

31 Relational reinforcement learning: Q-learning with an MLN Q-function, or policy search with MLN policies or a macro. MLN Q-functions lose too much information. [Plot: probability vs. bin number.]

32 Co-author: Jude Shavlik. Grants: DARPA HR0011-04-1-0007 and DARPA FA8650-06-C-7606.

