Skill Acquisition via Transfer Learning and Advice Taking
  Lisa Torrey, Jude Shavlik, Trevor Walker (University of Wisconsin-Madison, USA)
  Richard Maclin (University of Minnesota-Duluth, USA)

Transfer Learning
  Agent learns Task A
  Agent encounters related Task B
  Agent discovers how tasks are related (so far, the user provides this information to the agent)
  Agent uses knowledge from Task A to learn Task B faster
  Task A is the source. Task B is the target.

Transfer Learning
  The goal for the target task
  [Figure: performance versus training, with one curve for learning with transfer and one for learning without transfer]

Reinforcement Learning Overview
  Observe world state (described by a set of features)
  Take an action
  Receive a reward
  Policy: choose the action with the highest Q-value in the current state
  Use the rewards to estimate the Q-values of actions in states
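The loop above can be sketched in a few lines of Python. This is only an illustration: the slide does not say which update rule estimates the Q-values, so the one-step Q-learning update, the tabular Q-table, the constants `ALPHA` and `GAMMA`, and the state and action names below are all assumptions.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value, not from the slides)
GAMMA = 0.9  # discount factor (assumed value, not from the slides)

def greedy_action(q, state, actions):
    """Policy from the slide: choose the action with the highest Q-value."""
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, actions):
    """One-step Q-learning update, used here as a stand-in estimator."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])

# Hypothetical two-action task with a tabular Q-function.
actions = ["pass", "hold"]
q = defaultdict(float)
q_update(q, "s0", "pass", 1.0, "s1", actions)
chosen = greedy_action(q, "s0", actions)
```

After a single rewarded pass from `s0`, the greedy policy prefers `pass` in that state.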

Transfer in Reinforcement Learning
  What knowledge will we transfer from the source?
    Q-functions (Taylor & Stone 2005)
    Policies (Torrey et al. 2005)
    Skills (this work)
  How will we extract that knowledge from the source?
    From Q-functions (Torrey et al. 2005)
    From observed behavior (this work)
  How will we apply that knowledge in the target?
    Model reuse (Taylor & Stone 2005)
    Advice taking (Torrey et al. 2005, this work)

Advice Taking
  Advice: instructions for the learner
    IF: condition
    THEN: prefer action
    (In these states, Q_action1 > Q_action2)
  Apply advice as soft constraints (KBKR, 2005)
    For each action, find the Q-function that minimizes:
    Complexity of Q-function + Error on Training Data + Disagreement with Advice
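To make the three-term objective concrete, here is a rough Python sketch. It is not KBKR itself: it assumes linear Q-functions, an L1 weight norm as the complexity term, absolute error on training data, and a hinge penalty on sampled states where the advice applies. The function names, the trade-off weights `c_error` and `c_advice`, and the example numbers are all hypothetical.

```python
def q_value(weights, bias, features):
    """Linear Q-function: dot product of weights and features, plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def objective(models, data, advice_states, preferred, other,
              c_error=1.0, c_advice=1.0):
    """Complexity + error on training data + disagreement with advice."""
    # Complexity of the Q-functions: L1 norm of all weights (assumption).
    complexity = sum(abs(w) for ws, _ in models.values() for w in ws)
    # Error on training data: absolute error against the target Q-values.
    error = sum(abs(q_value(*models[a], x) - target) for a, x, target in data)
    # Disagreement with advice: how far Q(other) exceeds Q(preferred)
    # in states where the advice says the preferred action should win.
    disagreement = sum(
        max(0.0, q_value(*models[other], x) - q_value(*models[preferred], x))
        for x in advice_states
    )
    return complexity + c_error * error + c_advice * disagreement

# Hypothetical models: action -> (weights, bias).
models = {"pass": ([1.0, 0.0], 0.0), "hold": ([0.0, 2.0], 0.0)}
data = [("pass", [1.0, 1.0], 1.5)]   # (action, features, target Q-value)
advice_states = [[1.0, 1.0]]         # states where advice prefers "pass"
total = objective(models, data, advice_states, preferred="pass", other="hold")
```

Because the advice enters as a penalty rather than a hard constraint, a learner minimizing this quantity can still override advice that the training data contradicts.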

Experimental Domain: RoboCup
  KeepAway (KA/MKA): keep the ball (Stone & Sutton 2001)
  BreakAway (BA): score a goal (Maclin et al. 2005)
  MoveDownfield (MD): cross the line (Torrey et al. 2006)
  Different objectives, but a transferable skill: passing to teammates

A Challenge for Skill Transfer
  Shared skills are not exactly the same
  Skills have general and specific aspects
  Aspects of the pass skill in RoboCup
    General: teammate must be open
    Game-specific: where teammate should be located
    Player-specific: whether teammate is nearest or furthest
  [Figure captions: "I'm open and near the goal. Pass to me!" and "I'm open and far from you. Pass to me!"]

Addressing the Challenge
  We focus on learning general skill aspects
    These should transfer better
  We learn skills that apply to multiple players
    This generalizes over player-specific aspects
  We allow humans to provide information
    They can point out game-specific aspects

Human-Provided Information
  User provides a mapping to show task similarities
  May also provide user advice about task differences
  [Mapping diagram; labels: Pass, Ø, Pass towards goal, Move towards goal, Shoot at goal]

Our Transfer Algorithm
  Observe source task games to learn skills
  Create advice for the target task
    Translate learned skills into transfer advice
    If there is user advice, add it in
  Learn target task with KBKR

Learning Skills By Observation
  Source-task games are sequences: (state, action)
  Learning skills is like learning to classify states by their correct actions
  We use Inductive Logic Programming to learn classifiers
  State 1:
    distBetween(me,teammate2) = 15
    distBetween(me,teammate1) = 10
    distBetween(me,opponent1) = 5
    ...
    action = pass(teammate2)
    outcome = caught(teammate2)

Advantages of ILP
  Can produce first-order rules for skills: pass(Teammate) vs. pass(teammate1), ..., pass(teammateN)
    Capture only the essential aspects of the skill
    We expect these aspects to transfer better
  Can incorporate background knowledge

Preparing Datasets for ILP
  [Flowchart for sorting each source-task example]
  action = pass(Teammate)? outcome = caught(Teammate)? Q(pass) is high? Q(pass) is highest?
    yes: positive example for pass(Teammate)
  Otherwise: Q(other) is high? Q(pass) is lower?
    yes: negative example for pass(Teammate)
    no: reject example
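One possible reading of this flowchart, written as a Python function. The branch order is reconstructed from the slide labels and may not match the original figure exactly. The `high` threshold, the dictionary layout of `q`, and the reading of "Q(other)" as the Q-value of a different action that was actually taken are assumptions.

```python
def label_for_pass(action, outcome, q, teammate, high=0.5):
    """Sort one (state, action) example for the pass(Teammate) skill.

    q maps action names to Q-values in the example's state.
    `high` is a hypothetical threshold for "Q is high".
    """
    pass_action = f"pass({teammate})"
    q_pass = q[pass_action]
    if (action == pass_action
            and outcome == f"caught({teammate})"
            and q_pass >= high
            and q_pass == max(q.values())):
        return "positive"
    # Assumption: "Q(other)" refers to a different action that was taken.
    if action != pass_action and q[action] >= high and q_pass < q[action]:
        return "negative"
    return "reject"

good_pass = label_for_pass(
    "pass(teammate1)", "caught(teammate1)",
    {"pass(teammate1)": 0.9, "hold": 0.4}, "teammate1")
better_hold = label_for_pass(
    "hold", "kept", {"pass(teammate1)": 0.2, "hold": 0.8}, "teammate1")
failed_pass = label_for_pass(
    "pass(teammate1)", "intercepted",
    {"pass(teammate1)": 0.9, "hold": 0.4}, "teammate1")
```

Rejecting ambiguous examples keeps only clear-cut cases in the training set that ILP sees.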

Example of a Skill Learned
  pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
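To show how such a first-order rule reads, the sketch below evaluates it in Python against one state. The dictionary layout and the `passAngle` values are hypothetical; the distances reuse the State 1 example from the earlier slide. The unbound variable `Opponent` is treated as "there exists some opponent", which is the usual Prolog reading.

```python
def pass_rule(state, teammate):
    """True if the learned pass(Teammate) rule fires for this teammate."""
    dist = state["distBetween"][("me", teammate)]
    angle = state["passAngle"][teammate]
    # Opponent is a free variable in the rule body: any opponent will do.
    some_opponent_close = any(
        state["distBetween"][("me", opp)] < 7 for opp in state["opponents"]
    )
    return dist > 14 and 30 < angle < 150 and some_opponent_close

state = {
    "opponents": ["opponent1"],
    "distBetween": {
        ("me", "teammate1"): 10,
        ("me", "teammate2"): 15,
        ("me", "opponent1"): 5,
    },
    # Hypothetical angles; the slides give no passAngle values.
    "passAngle": {"teammate1": 90, "teammate2": 90},
}
fires_for_2 = pass_rule(state, "teammate2")
fires_for_1 = pass_rule(state, "teammate1")
```

The rule fires for `teammate2` but not for `teammate1`, who is closer than 14 units.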

Technical Challenges
  KBKR requires propositional advice
    We instantiate each rule head
  Variables in rule bodies create disjunctions
    We use tile features to translate them
  Variables can appear multiple times
    We create new features to translate them

Two Experimental Scenarios
  [Mapping diagrams; tasks: 4-on-3 MKA, 3-on-2 BA, 3-on-2 MD]
  [Labels, in order: Pass, Ø, Pass towards goal, Move towards goal, Shoot at goal; Pass, MoveAhead, Ø, Pass, MoveAhead, Shoot at goal]

Skill Transfer Results
  [Plot; curves: Without transfer, From MKA, From MD]

Breakdown of MKA Results

What if User Advice is Bad?

Related Work
  Q-function transfer in RoboCup
    Taylor & Stone (AAMAS 2005, AAAI 2005)
  Transfer via policy reuse
    Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
    Madden & Howley (AI Review 2004)
  Transfer via relational RL
    Driessens et al. (ICML workshop 2006)

Summary of Contributions
  Transfer of shared skills in high-level logic
    Despite differences in shared skills
  Demonstration of the value of user guidance
    Easy to give and beneficial
  Effective transfer in the RoboCup domain
    Challenging and dissimilar tasks

Future Work
  Learn more general skills by combining multiple source tasks
  Compare several transfer methods on RoboCup scenarios of varying difficulty
  Reach similar levels of transfer with less user input

Acknowledgements
  DARPA Grant HR0011-04-1-0007
  US Naval Research Laboratory Grant N00173-06-1-G002
  Thank You

User Advice
  IF: distBetween(me,goal) < 10 AND angle(goal, me, goalie) > 40
  THEN: prefer shoot
  IF: distBetween(me,goal) > 10
  THEN: prefer move_ahead
  IF: [transferred conditions] AND distBetween(Teammate,goal) < distBetween(me,goal)
  THEN: prefer pass(Teammate)
  ([transferred conditions] is the part that came from transfer)

Feature Tiling
  [Figure: an original feature, spanning min value to max value, covered by Tiling #1, Tiling #2, …, Tiling #8, Tiling #9, Tiling #10, Tiling #11, …; annotations: (16 tiles), (8 tiles)]
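As a rough illustration of tiling, the Python sketch below turns one numeric feature into Boolean tile features. It assumes equal-width tiles and two tilings of 8 and 16 tiles over the whole range; the layout in the original figure (which tilings have which tile counts, and where their boundaries fall) is not recoverable from the transcript, so the function name and these choices are assumptions.

```python
def tile_features(value, lo, hi, tile_counts=(8, 16)):
    """Map a value in [lo, hi] to 0/1 features, one per tile per tiling.

    Keys are (number_of_tiles, tile_index). Tiles are half-open
    intervals, except the last tile, which also includes hi.
    """
    features = {}
    for n in tile_counts:
        width = (hi - lo) / n
        for i in range(n):
            start = lo + i * width
            end = start + width
            last = i == n - 1
            inside = start <= value and (value < end or last)
            features[(n, i)] = int(inside)
    return features

# A hypothetical feature ranging over [0, 16], with current value 5.
tiles = tile_features(5.0, 0.0, 16.0)
active = sorted(key for key, on in tiles.items() if on)
```

Exactly one tile is active per tiling, so a linear learner can express threshold tests on the original feature.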

Propositionalizing Rules
  Step 1: rule head
  pass(Teammate) :- distBetween(me, Teammate) > 14, …
  becomes
  pass(teammate1) :- distBetween(me, teammate1) > 14, …
  …
  pass(teammateN) :- distBetween(me, teammateN) > 14, …
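Step 1 amounts to substituting each concrete teammate for the head variable. A minimal sketch in Python, treating the rule as text; the function name and the shortened rule string are illustrative, not part of the original system.

```python
import re

def instantiate_head(rule, variable, constants):
    """Return one copy of the rule per constant, replacing the variable."""
    # \b keeps the match to whole identifiers; matching is case-sensitive,
    # so the variable Teammate never matches the constant teammate1.
    pattern = re.compile(rf"\b{re.escape(variable)}\b")
    return [pattern.sub(constant, rule) for constant in constants]

rule = "pass(Teammate) :- distBetween(me, Teammate) > 14, passAngle(Teammate) > 30."
ground_rules = instantiate_head(rule, "Teammate", ["teammate1", "teammate2"])
```

Each resulting rule mentions only named players in its head, which is the form propositional advice requires.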

Propositionalizing Rules
  Step 2: single-variable disjunctions
  distBetween(me, Opponent) < 7
  means
  distBetween(me,opponent1) < 7 OR … OR distBetween(me,opponentN) < 7
  which is expressed with tile features as
  distBetween(me,opponent1) [0,7] + … + distBetween(me,opponentN) [0,7] ≥ 1
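The sum-of-tiles trick can be checked with a few lines of Python. This sketch assumes a tile is a 0/1 feature that is on when the value lies in a closed interval; whether the boundary value 7 itself counts is not clear from the slide, and the function names are hypothetical.

```python
def tile(value, lo, hi):
    """0/1 tile feature: 1 when the value lies in [lo, hi]."""
    return 1 if lo <= value <= hi else 0

def some_opponent_within(distances, lo=0, hi=7):
    """Disjunction over opponents, written as a linear constraint:
    the tile features sum to at least 1 exactly when some tile is on."""
    return sum(tile(d, lo, hi) for d in distances) >= 1

near = some_opponent_within([5, 12])   # opponent1 is within range
far = some_opponent_within([9, 12])    # no opponent is within range
```

Because each tile is 0 or 1, "sum ≥ 1" is a linear inequality equivalent to the OR over opponents.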

Propositionalizing Rules
  Step 3: linked-variable disjunctions
  distBetween(me, Player) > 14, distBetween(Player, goal) < 10
  Add to target task feature space:
  newFeature(Player) :-
    Dist1 is distBetween(me, Player),
    Dist2 is distBetween(Player, goal),
    Dist1 > 14, Dist2 < 10.
  newFeature(player1) + … + newFeature(playerN) ≥ 1
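The reason a new feature is needed is that both conditions must hold for the same player. A small Python sketch of that idea; the state layout and the distance values are hypothetical.

```python
def new_feature(state, player):
    """1 when a single player satisfies both linked conditions."""
    dist1 = state[("me", player)]
    dist2 = state[(player, "goal")]
    return int(dist1 > 14 and dist2 < 10)

def linked_disjunction(state, players):
    """True when at least one player makes the new feature fire."""
    return sum(new_feature(state, p) for p in players) >= 1

players = ["player1", "player2"]
# Each condition holds for some player, but never for the same one.
split_state = {
    ("me", "player1"): 20, ("player1", "goal"): 15,
    ("me", "player2"): 10, ("player2", "goal"): 5,
}
# player1 satisfies both conditions at once.
joint_state = {
    ("me", "player1"): 20, ("player1", "goal"): 5,
    ("me", "player2"): 10, ("player2", "goal"): 15,
}
split_result = linked_disjunction(split_state, players)
joint_result = linked_disjunction(joint_state, players)
```

Translating the two conditions as separate per-condition disjunctions would wrongly accept `split_state`; the combined feature does not.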
