Skill Acquisition via Transfer Learning and Advice Taking
Lisa Torrey, Jude Shavlik, Trevor Walker (University of Wisconsin-Madison, USA)
Richard Maclin (University of Minnesota-Duluth, USA)

Transfer Learning
- Agent learns Task A (the source)
- Agent encounters related Task B (the target)
- Agent discovers how the tasks are related (so far, the user provides this information to the agent)
- Agent uses knowledge from Task A to learn Task B faster

Transfer Learning
The goal for the target task:
[Plot: performance vs. training, with the "with transfer" curve above the "without transfer" curve.]

Reinforcement Learning Overview
- Agent cycle: observe the world state (described by a set of features), take an action, receive a reward
- Use the rewards to estimate the Q-values of actions in states
- Policy: choose the action with the highest Q-value in the current state
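Below is a minimal sketch of this loop as tabular Q-learning. The environment interface (reset, step, actions) and the hyperparameters are illustrative assumptions; the system in this talk actually approximates Q-values with a learned function over features rather than a table.

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Estimate Q-values from rewards; act mostly greedily on them."""
    Q = defaultdict(float)  # (state, action) -> estimated Q-value
    for _ in range(episodes):
        state = env.reset()  # observe the initial world state
        done = False
        while not done:
            # Policy: choose the highest-Q action, with epsilon exploration
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)  # act, get reward
            # Use the reward to update the Q-value estimate
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q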

Transfer in Reinforcement Learning
- What knowledge will we transfer from the source?
  - Q-functions (Taylor & Stone 2005)
  - Policies (Torrey et al. 2005)
  - Skills (this work)
- How will we extract that knowledge from the source?
  - From Q-functions (Torrey et al. 2005)
  - From observed behavior (this work)
- How will we apply that knowledge in the target?
  - Model reuse (Taylor & Stone 2005)
  - Advice taking (Torrey et al. 2005, this work)

Advice Taking
- Advice: instructions for the learner
  IF: condition THEN: prefer action
  (i.e., in these states, Q_action1 > Q_action2)
- Apply advice as soft constraints (KBKR, 2005)
- For each action, find the Q-function that minimizes:
  Error on Training Data + Disagreement with Advice + Complexity of Q-function
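KBKR formulates this as a linear program; the sketch below conveys the same idea with a simpler gradient method, assuming a linear Q-model for one action. The advice encoding here (a condition function plus a margin the Q-value should exceed) is an illustrative stand-in for "IF condition THEN prefer this action", not the actual KBKR constraint format.

import numpy as np

def fit_q_with_advice(X, y, advice, lam=0.01, mu=1.0, lr=0.01, epochs=500):
    """Fit Q(s) = w.x + b by minimizing
    training error + disagreement with advice + model complexity."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        pred = X @ w + b
        # Gradient of the squared error on training data
        gw = 2 * X.T @ (pred - y) / n
        gb = 2 * np.mean(pred - y)
        # Hinge-style penalty for disagreeing with advice: in states where
        # the condition holds, Q should be at least `margin`
        for cond, margin in advice:
            applies = np.array([cond(x) for x in X])
            violated = applies & (pred < margin)
            if violated.any():
                gw -= mu * X[violated].sum(axis=0) / n
                gb -= mu * violated.sum() / n
        gw += 2 * lam * w  # complexity (ridge) penalty
        w -= lr * gw
        b -= lr * gb
    return w, b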

Experimental Domain: RoboCup
- KeepAway (KA/MKA): keep the ball (Stone & Sutton 2001)
- BreakAway (BA): score a goal (Maclin et al. 2005)
- MoveDownfield (MD): cross the line (Torrey et al. 2005)
Different objectives, but a transferable skill: passing to teammates.

A Challenge for Skill Transfer
- Shared skills are not exactly the same: skills have general and specific aspects
- Aspects of the pass skill in RoboCup:
  - General: teammate must be open
  - Game-specific: where the teammate should be located
  - Player-specific: whether the teammate is nearest or furthest
("I'm open and near the goal. Pass to me!" vs. "I'm open and far from you. Pass to me!")

Addressing the Challenge
- We focus on learning general skill aspects, since these should transfer better
- We learn skills that apply to multiple players, which generalizes over player-specific aspects
- We allow humans to provide information; they can point out game-specific aspects

Human-Provided Information
- User provides a mapping to show task similarities
- May also provide user advice about task differences
[Diagram: source actions {Pass, Ø} mapped to target actions {Pass towards goal, Move towards goal, Shoot at goal}.]

Our Transfer Algorithm
1. Observe source-task games to learn skills
2. Create advice for the target task:
   - Translate learned skills into transfer advice
   - If there is user advice, add it in
3. Learn the target task with KBKR

Learning Skills by Observation
- Source-task games are sequences of (state, action) pairs
- Learning skills is like learning to classify states by their correct actions
- We use Inductive Logic Programming (ILP) to learn the classifiers

Example step:
    State 1: distBetween(me, teammate2) = 15
             distBetween(me, teammate1) = 10
             distBetween(me, opponent1) = 5
             ...
    action = pass(teammate2)
    outcome = caught(teammate2)
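For concreteness, one observed step might be recorded as below. This is a hypothetical Python encoding for illustration; the actual system represents such steps as relational facts for the ILP learner.

# One (state, action, outcome) record from an observed source-task game,
# mirroring the slide's example state (the encoding is an assumption)
step = {
    'state': {
        ('distBetween', 'me', 'teammate2'): 15,
        ('distBetween', 'me', 'teammate1'): 10,
        ('distBetween', 'me', 'opponent1'): 5,
    },
    'action': ('pass', 'teammate2'),
    'outcome': ('caught', 'teammate2'),
}
game = [step]  # a game is a sequence of such steps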

Advantages of ILP
- Can produce first-order rules for skills: pass(Teammate) vs. pass(teammate1), ..., pass(teammateN)
- First-order rules capture only the essential aspects of the skill, and we expect these aspects to transfer better
- Can incorporate background knowledge

Preparing Datasets for ILP
- Positive example for pass(Teammate): action = pass(Teammate), outcome = caught(Teammate), Q(pass) is high, and Q(pass) is highest
- Negative example for pass(Teammate): some Q(other) is high and Q(pass) is lower
- Otherwise: reject the example
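A sketch of this decision flow, reusing the step encoding above; reducing the notion of a "high" Q-value to a fixed threshold is purely an illustrative assumption.

def label_example(action, outcome, q_values, teammate, high=0.5):
    """Classify one observed step as an ILP example for pass(Teammate).
    q_values maps each action tuple to its estimated Q-value; the
    threshold defining a 'high' Q is an illustrative assumption."""
    q_pass = q_values[('pass', teammate)]
    best = max(q_values.values())
    # Positive: the pass was taken and caught, with a high and highest Q
    if (action == ('pass', teammate) and outcome == ('caught', teammate)
            and q_pass >= high and q_pass == best):
        return 'positive'
    # Negative: some other action has a high Q while the pass's Q is lower
    others = [v for a, v in q_values.items() if a != ('pass', teammate)]
    if any(v >= high for v in others) and q_pass < best:
        return 'negative'
    return 'reject'  # ambiguous steps are excluded from the dataset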

Example of a Skill Learned

pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
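Read procedurally, the rule fires when the teammate is far enough away, the pass window is open, and some opponent is pressuring me. A direct transcription as an executable predicate, with an assumed encoding of the state as distance and angle lookups:

def pass_skill(dist_to, pass_angle, opponents, teammate):
    """The learned pass(Teammate) rule; Opponent is existentially
    quantified in the rule body, hence the any() over opponents."""
    return (dist_to[teammate] > 14
            and 30 < pass_angle[teammate] < 150
            and any(dist_to[o] < 7 for o in opponents))

# Hypothetical state: teammate1 is far and open, opponent1 is close
dist_to = {'teammate1': 15, 'opponent1': 5, 'opponent2': 12}
pass_angle = {'teammate1': 45}
print(pass_skill(dist_to, pass_angle, ['opponent1', 'opponent2'], 'teammate1'))  # True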

Technical Challenges
- KBKR requires propositional advice, so we instantiate each rule head
- Variables in rule bodies create disjunctions; we use tile features to translate them
- Variables can appear multiple times; we create new features to translate them

Two Experimental Scenarios
[Diagrams: 4-on-3 MKA to 3-on-2 BA, mapping {Pass, Ø} to {Pass towards goal, Move towards goal, Shoot at goal}; 3-on-2 MD to 3-on-2 BA, mapping {Pass, MoveAhead, Ø} to {Pass, MoveAhead, Shoot at goal}.]

Skill Transfer Results
[Plot: target-task performance curves without transfer, with transfer from MKA, and with transfer from MD.]

Breakdown of MKA Results

What if User Advice is Bad?

Related Work
- Q-function transfer in RoboCup: Taylor & Stone (AAMAS 2005, AAAI 2005)
- Transfer via policy reuse: Fernandez & Veloso (AAMAS 2006, ICML workshop 2006); Madden & Howley (AI Review 2004)
- Transfer via relational RL: Driessens et al. (ICML workshop 2006)

Summary of Contributions
- Transfer of shared skills in high-level logic, despite differences in the shared skills
- Demonstration of the value of user guidance: easy to give and beneficial
- Effective transfer in the RoboCup domain, on challenging and dissimilar tasks

Future Work
- Learn more general skills by combining multiple source tasks
- Compare several transfer methods on RoboCup scenarios of varying difficulty
- Reach similar levels of transfer with less user input

Acknowledgements
- DARPA Grant HR
- US Naval Research Laboratory Grant N G002

Thank You

User Advice

IF: distBetween(me, goal) < 10
    AND angle(goal, me, goalie) > 40
THEN: prefer shoot

IF: distBetween(me, goal) > 10
THEN: prefer move_ahead

IF: [transferred conditions]
    AND distBetween(Teammate, goal) < distBetween(me, goal)
THEN: prefer pass(Teammate)

([transferred conditions] is the part that came from transfer.)

Feature Tiling
[Diagram: the original feature's range, from min value to max value, is discretized by multiple overlapping tilings; Tilings #1 through #8 use 16 tiles each, and Tilings #9, #10, #11, ... use 8 tiles each.]
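A minimal sketch of a single tiling: one continuous feature becomes a vector of Boolean interval indicators, which lets a linear model express threshold conditions. The range and tile counts here are illustrative.

def tile_feature(value, lo, hi, n_tiles):
    """Discretize one continuous feature into n_tiles Boolean interval
    features (a single tiling); returns a list of 0/1 indicators."""
    width = (hi - lo) / n_tiles
    index = min(max(int((value - lo) / width), 0), n_tiles - 1)
    return [1 if i == index else 0 for i in range(n_tiles)]

# distBetween(me, opponent1) = 5 over [0, 16] with 16 tiles activates the
# tile covering [5, 6); summing the tile weights over [0, 7) can then
# encode the condition distBetween(me, opponent1) < 7 in a linear model
print(tile_feature(5.0, 0.0, 16.0, 16))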

Propositionalizing Rules

Step 1: instantiate the rule head

pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    ...

becomes

pass(teammate1) :-
    distBetween(me, teammate1) > 14,
    ...
...
pass(teammateN) :-
    distBetween(me, teammateN) > 14,
    ...

Propositionalizing Rules distBetween(me, Opponent) < 7 distBetween(me,opponent1) [0,7] + … + distBetween(me,opponentN ) [0,7] ≥ 1 Step 2: single-variable disjunctions Step 2: single-variable disjunctions distBetween(me,opponent1) < 7 OR … OR distBetween(me,opponentN) < 7

Propositionalizing Rules

Step 3: linked-variable disjunctions

distBetween(me, Player) > 14,
distBetween(Player, goal) < 10

Add to the target-task feature space:

newFeature(Player) :-
    Dist1 is distBetween(me, Player),
    Dist2 is distBetween(Player, goal),
    Dist1 > 14,
    Dist2 < 10.

and express the advice as

newFeature(player1) + ... + newFeature(playerN) ≥ 1
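The derived feature in executable form, with assumed distance lookups:

def new_feature(dist_me_to, dist_to_goal, player):
    """True when the player is both far from me (> 14) and near
    the goal (< 10), the linked-variable pair from the rule body."""
    return dist_me_to[player] > 14 and dist_to_goal[player] < 10

players = ['player1', 'player2']
dist_me_to = {'player1': 16, 'player2': 8}
dist_to_goal = {'player1': 6, 'player2': 3}
# The propositional advice condition: at least one player passes both tests
print(sum(new_feature(dist_me_to, dist_to_goal, p) for p in players) >= 1)  # True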