
1 Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another Lisa Torrey, Trevor Walker, Jude Shavlik University of Wisconsin-Madison, USA Richard Maclin University of Minnesota-Duluth, USA

2 Our Goal Transfer knowledge… … between reinforcement learning tasks … employing SVM function approximators … using advice

3 Transfer Learn a first task, then apply the knowledge acquired to learn a related task: exploit previously learned models to improve learning of new tasks. [Figure: performance vs. experience, with and without transfer.]

4 Reinforcement Learning The agent loop: state … action … reward … new state. Q-function: Q_action(state) gives the value of taking an action from a state. Policy: take the action with the maximum Q_action(state).
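As a concrete illustration of the Q-function and policy on this slide, here is a minimal Python sketch; the ε-greedy exploration is my addition, since the slide only says "take action with max Q":

```python
import random

def choose_action(q, state, actions, epsilon=0.1):
    """Policy from this slide: take the action with the maximum
    Q_action(state). With probability epsilon we explore randomly
    instead (a standard addition, not stated on the slide)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q(state, a))
```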

5 Advice for Transfer Task A Solution: "Based on what worked in Task A, I suggest…" Task B Learner: "I'll try it, but if it doesn't work I'll do something else." Advice improves RL performance, and advice can be refined or even discarded.

6 Transfer Process Task A experience → Task A Q-functions → Transfer Advice → Task B Q-functions ← Task B experience. Advice from the user may optionally be added in both tasks. Mapping from user: Task A → Task B.

7 RoboCup Soccer Tasks KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001]. BreakAway: score a goal [Maclin et al., AAAI 2005].

8 RL in RoboCup Tasks

          KeepAway            BreakAway
Actions   Pass, Hold          Pass, Move, Shoot
Rewards   +1 each time step   +2, 0, or -1 at end
Features  similar in both tasks (BreakAway adds time left)

9 Transfer Process Task A experience → Task A Q-functions → Transfer Advice → Task B Q-functions ← Task B experience. Mapping from user: Task A → Task B.

10 Approximating Q-Functions Given examples of state features S_i and estimated values y ≈ Q_action(S_i), learn linear coefficients: y = w_1 f_1 + … + w_n f_n + b. Non-linearity comes from Boolean tile features: tile_(i,lower,upper) = 1 if lower ≤ f_i < upper.
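A minimal sketch of these tile features and the linear Q-estimate; the bin boundaries are an illustrative choice of mine, since the slide does not specify them:

```python
import numpy as np

def tile_features(f, bins):
    """Boolean tiles from this slide: tile_(i,lower,upper) = 1
    if lower <= f[i] < upper. `bins` maps a feature index to its
    list of (lower, upper) intervals."""
    return np.array([1.0 if lo <= f[i] < hi else 0.0
                     for i, intervals in sorted(bins.items())
                     for lo, hi in intervals])

def q_estimate(w, b, tiles):
    """Linear model from this slide, y = w_1 f_1 + ... + w_n f_n + b,
    applied to the tiled features."""
    return float(np.dot(w, tiles) + b)
```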

11 Support Vector Regression Given states S and Q-estimates y, solve the linear program: minimize ||w||_1 + |b| + C ||k||_1 such that y − k ≤ Sw + b ≤ y + k.
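A sketch of this linear program using the cvxpy modeling library (my choice of interface, not the authors'); S is a matrix with one state's features per row:

```python
import cvxpy as cp
import numpy as np

def fit_q_model(S, y, C=1.0):
    """The slide's linear program:
      minimize ||w||_1 + |b| + C ||k||_1
      s.t.     y - k <= S w + b <= y + k
    with nonnegative slacks k (implicit on the slide)."""
    m, n = S.shape
    w = cp.Variable(n)
    b = cp.Variable()
    k = cp.Variable(m, nonneg=True)
    objective = cp.Minimize(cp.norm1(w) + cp.abs(b) + C * cp.norm1(k))
    constraints = [S @ w + b <= y + k,
                   S @ w + b >= y - k]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value
```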

12 Transfer Process Task A experience → Task A Q-functions → Transfer Advice → Task B Q-functions ← Task B experience. Mapping from user: Task A → Task B.

13 Advice Example The learner need only follow advice approximately: advice is added as soft constraints to the linear program. Example: if distance_to_goal ≤ 10 and shot_angle ≥ 30 then prefer shoot over all other actions.

14 Incorporating Advice [Maclin et al., AAAI 2005] if v_11 f_1 + … + v_1n f_n ≤ d_1 … and v_m1 f_1 + … + v_mn f_n ≤ d_m then Q_shoot > Q_other for all other actions. Advice and Q-functions share the same language: linear expressions of features.
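Since advice and Q-functions share this linear language, one rule can be held in a small structure like the following sketch (the field names are mine):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AdviceRule:
    """One advice rule in the linear language of this slide:
    if V f <= d (row i encodes v_i1 f_1 + ... + v_in f_n <= d_i)
    then prefer `preferred` over all other actions."""
    V: np.ndarray      # (m, n) condition coefficients
    d: np.ndarray      # (m,) condition thresholds
    preferred: str     # action whose Q-value should dominate

    def applies(self, f):
        """True when every precondition holds for feature vector f."""
        return bool(np.all(self.V @ f <= self.d))
```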

15 Transfer Process Task A experience → Task A Q-functions → Transfer Advice → Task B Q-functions ← Task B experience. Mapping from user: Task A → Task B.

16 Expressing Policy with Advice Old Q-functions: Q_hold_ball(s), Q_pass_near(s), Q_pass_far(s). Advice expressing the policy: if Q_hold_ball(s) > Q_pass_near(s) and Q_hold_ball(s) > Q_pass_far(s) then prefer hold_ball over all other actions.
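The construction on this slide, one advice rule per action, can be sketched symbolically as follows (an illustration of the idea, not the authors' code):

```python
def policy_advice(actions):
    """Express an old policy as advice, one rule per action:
    if Q_a(s) beats the Q-value of every other action, prefer a."""
    rules = []
    for a in actions:
        conditions = [f"Q_{a}(s) > Q_{o}(s)" for o in actions if o != a]
        rules.append((conditions, a))   # (preconditions, preferred action)
    return rules

# The slide's example:
# policy_advice(["hold_ball", "pass_near", "pass_far"])[0]
# -> (["Q_hold_ball(s) > Q_pass_near(s)",
#      "Q_hold_ball(s) > Q_pass_far(s)"], "hold_ball")
```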

17 Mapping Actions Mapping from user: hold_ball → move, pass_near → pass_near, pass_far → (no analog). Old Q-functions: Q_hold_ball(s), Q_pass_near(s), Q_pass_far(s). Mapped policy: if Q_hold_ball(s) > Q_pass_near(s) and Q_hold_ball(s) > Q_pass_far(s) then prefer move over all other actions.
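Applying the user's action mapping to those rules might look like this sketch; rules whose preferred action has no analog in the new task (here pass_far) are simply dropped:

```python
def map_actions(rules, action_map):
    """Rewrite policy advice for the new task's actions. The
    preconditions still compare the old Q-functions, as on this
    slide; only the preferred action is renamed. Rules mapped to
    None (no analog in the new task) are discarded."""
    return [(conditions, action_map[preferred])
            for conditions, preferred in rules
            if action_map.get(preferred) is not None]

# The slide's mapping:
action_map = {"hold_ball": "move", "pass_near": "pass_near", "pass_far": None}
```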

18 Mapping Features Mapping from the user yields a Q-function mapping: Q_hold_ball(s) = w_1(dist_keeper1) + w_2(dist_taker2) + … becomes Q′_hold_ball(s) = w_1(dist_attacker1) + w_2(MAX_DIST) + …
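A sketch of that feature remapping: old features with no analog in the new task are replaced by a constant such as MAX_DIST, whose contribution can be folded into the bias. The constant's value and all weights below are made up for illustration:

```python
def map_q_function(weights, feature_map, constants):
    """Rewrite a linear Q-function over old-task features into one
    over new-task features, keeping the learned weights w_i. An old
    feature mapped to a named constant (e.g. MAX_DIST) contributes
    w_i * constant to the bias instead of a new term."""
    new_weights, bias_shift = {}, 0.0
    for old_feature, w in weights.items():
        new_feature = feature_map[old_feature]
        if new_feature in constants:
            bias_shift += w * constants[new_feature]
        else:
            new_weights[new_feature] = w
    return new_weights, bias_shift

# The slide's example:
q_hold = {"dist_keeper1": 0.7, "dist_taker2": -0.3}
fmap = {"dist_keeper1": "dist_attacker1", "dist_taker2": "MAX_DIST"}
print(map_q_function(q_hold, fmap, {"MAX_DIST": 100.0}))
```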

19 Transfer Example
Old model: Q_x = w_x1 f_1 + w_x2 f_2 + b_x;  Q_y = w_y1 f_1 + b_y;  Q_z = w_z2 f_2 + b_z.
Mapped model: Q′_x = w_x1 f′_1 + w_x2 f′_2 + b_x;  Q′_y = w_y1 f′_1 + b_y;  Q′_z = w_z2 f′_2 + b_z.
Advice: if Q′_x > Q′_y and Q′_x > Q′_z then prefer x′.
Advice (expanded): if w_x1 f′_1 + w_x2 f′_2 + b_x > w_y1 f′_1 + b_y and w_x1 f′_1 + w_x2 f′_2 + b_x > w_z2 f′_2 + b_z then prefer x′ over all other actions.
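The expansion step can be sketched as follows: substituting the mapped linear models into the preference turns each comparison into an explicit linear inequality over the new features (function and variable names are mine):

```python
def expand_advice(q_weights, q_bias, preferred):
    """Expand 'prefer x over all other actions' by substituting the
    mapped linear Q-models, as on this slide: for each other action o,
    (w_x - w_o) . f' + (b_x - b_o) > 0.
    q_weights maps action -> {feature: weight}; q_bias maps action -> b."""
    inequalities = []
    for other in q_weights:
        if other == preferred:
            continue
        features = set(q_weights[preferred]) | set(q_weights[other])
        coeffs = {f: q_weights[preferred].get(f, 0.0)
                     - q_weights[other].get(f, 0.0)
                  for f in features}
        inequalities.append((coeffs, q_bias[preferred] - q_bias[other]))
    return inequalities  # each pair encodes: coeffs . f' + constant > 0
```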

20 Transfer Experiment Between RoboCup subtasks: from 3-on-2 KeepAway to 2-on-1 BreakAway. Two simultaneous mappings: transfer passing skills, and map passing skills to shooting.

21 Experiment Mappings Mapping 1: play a moving KeepAway game (Pass → Pass, Hold → Move). Mapping 2: pretend a teammate is standing in the goal (Pass → Shoot at the imaginary teammate).

22 Experimental Methodology Averaged over 10 BreakAway runs Transfer: advice from one KeepAway model Control: runs without advice

23 Results [Figure: BreakAway learning curves for transfer and control runs.]

24 Analysis Transfer advice helps BreakAway learners: after learning, they are 7% more likely to score a goal. The improvement is delayed: the advantage begins after 2500 games. Some advice rules apply rarely: the preconditions for the shoot advice are not often met.

25 Related Work: Transfer Remember action subsequences [Singh, ML 1992] Restrict action choices [Sherstov & Stone, AAAI 2005] Transfer Q-values directly in KeepAway [Taylor & Stone, AAMAS 2005]

26 Related Work: Advice “Take action A now” [Clouse & Utgoff, ICML 1992] “In situations S, action A has value X ” [Maclin & Shavlik, ML 1996] “In situations S, prefer action A over B ” [Maclin et al., AAAI 2005]

27 Future Work Increase speed of linear-program solving Decrease sensitivity to imperfect advice Extract advice from kernel-based models Help user map actions and features

28 Conclusions Transfer exploits previously learned models to improve learning of new tasks Advice is an appealing way to transfer Linear regression approach incorporates advice straightforwardly Transferring a policy accommodates different reward structures

29 Acknowledgements DARPA grant HR0011-04-1-0007 United States Naval Research Laboratory grant N00173-04-1-G026 Michael Ferris Olvi Mangasarian Ted Wild

