1 Lisa Torrey University of Wisconsin – Madison CS 540

2 Education: hierarchical curriculum (learning tasks share common stimulus-response elements); abstract problem-solving (learning tasks share general underlying principles); multilingualism (knowing one language affects learning in another). Transfer can be both positive and negative.

3 Given Task S, learn Task T.

4 Performance vs. training: transfer can provide a higher start, a higher slope, and a higher asymptote.

5 Learning as search: within the space of all hypotheses, search is confined to the allowed hypotheses.

6 Transfer can reshape this search over the allowed hypotheses. Thrun and Mitchell 1995: Transfer slopes for gradient descent.

7 Bayesian methods. Bayesian learning vs. Bayesian transfer: prior distribution + data = posterior distribution; in Bayesian transfer, the prior comes from the source task. Raina et al. 2006: Transfer a Gaussian prior.
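A rough sketch of the Bayesian-transfer idea in Python (not Raina et al.'s exact formulation): a Gaussian prior over linear-model weights is estimated from source-task solutions and combined with target-task data in a conjugate Bayesian linear-regression update. The function names and data shapes are assumptions for illustration.

import numpy as np

def fit_source_prior(W_source):
    # Gaussian prior (mean, covariance) over weights, estimated from source-task weight vectors
    mu = W_source.mean(axis=0)
    Sigma = np.cov(W_source, rowvar=False) + 1e-6 * np.eye(W_source.shape[1])
    return mu, Sigma

def target_posterior(X, y, mu0, Sigma0, noise_var=1.0):
    # Conjugate Bayesian linear regression: prior + data -> posterior
    Sigma0_inv = np.linalg.inv(Sigma0)
    Sigma_n = np.linalg.inv(Sigma0_inv + X.T @ X / noise_var)
    mu_n = Sigma_n @ (Sigma0_inv @ mu0 + X.T @ y / noise_var)
    return mu_n, Sigma_n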

8 Hierarchical methods: simpler concepts (Line, Curve) support more complex ones (Surface, Circle, Pipe). Stracuzzi 2006: Learn Boolean concepts that can depend on each other.

9 Dealing with missing data or labels (Task S → Task T). Shi et al. 2008: Transfer via active learning.

10 The reinforcement learning loop between agent and environment: initially Q(s1, a) = 0; the policy chooses π(s1) = a1; the environment returns δ(s1, a1) = s2 and reward r(s1, a1) = r2; the agent updates Q(s1, a1) ← Q(s1, a1) + Δ; then π(s2) = a2, δ(s2, a2) = s3, r(s2, a2) = r3, and so on.
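For concreteness, a minimal tabular Q-learning loop matching the notation on this slide. The env object with reset(), step(), and actions, and the epsilon-greedy policy, are assumptions for illustration, not part of the slides.

import random
from collections import defaultdict

def q_learning(env, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                              # Q(s, a) = 0 initially
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # pi(s): epsilon-greedy action choice from the current Q-values
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)               # delta(s, a) = s', r(s, a) = r
            # Q(s, a) <- Q(s, a) + Delta
            delta = alpha * (r + gamma * max(Q[(s_next, act2)] for act2 in env.actions) - Q[(s, a)])
            Q[(s, a)] += delta
            s = s_next
    return Q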

11 Methods for transfer in RL: starting-point methods, hierarchical methods, alteration methods, imitation methods, new RL algorithms.

12 Starting-point methods: transfer the source task's learned values as the target task's initial Q-table, so target-task training starts from those values rather than from an all-zero table (no transfer). Taylor et al. 2005: Value-function transfer.
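A minimal sketch of a starting-point method in this spirit: copy source-task Q-values into the target task's initial Q-table through a hand-specified inter-task mapping. The mapping argument and table format are illustrative assumptions, not Taylor et al.'s exact representation.

from collections import defaultdict

def transfer_q_table(source_Q, mapping):
    # mapping: target (state, action) -> corresponding source (state, action)
    target_Q = defaultdict(float)        # unmapped entries stay at 0, as without transfer
    for target_sa, source_sa in mapping.items():
        target_Q[target_sa] = source_Q.get(source_sa, 0.0)
    return target_Q

# Usage sketch: build target_Q once, then continue target-task training from it
# instead of from an all-zero table.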

13 Hierarchical methods: low-level skills (Run, Kick, Pass, Shoot) compose into the high-level task (Soccer). Mehta et al. 2008: Transfer a learned hierarchy.

14 Alteration methods: alter Task S's original states, actions, or rewards into new states, actions, or rewards. Walsh et al. 2006: Transfer aggregate states.

15 New RL algorithms: incorporate the transferred knowledge into the agent–environment learning loop itself (same loop as slide 10). Torrey et al. 2006: Transfer advice about skills.

16 Imitation methods: during target-task training, the policy learned in the source task is periodically used to generate behavior. Torrey et al. 2007: Demonstrate a strategy.

17 Starting-point methods, imitation methods, hierarchical methods, alteration methods, new RL algorithms. Presented here: Skill Transfer and Macro Transfer.

18 3-on-2 BreakAway, 3-on-2 KeepAway, 3-on-2 MoveDownfield, 2-on-1 BreakAway.

19 Candidate skill rules for pass(Teammate):
IF [ ] THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 THEN pass(Teammate)
IF distance(Teammate) ≤ 10 THEN pass(Teammate)
…
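To make the rule semantics concrete, a tiny Python sketch of how one of these candidate clauses could be tested against a game state; the state dictionary layout is an assumption for illustration (the actual system learns and evaluates such clauses with ILP over relational features).

def pass_rule_fires(state, teammate, opponent, max_dist=5.0, min_angle=30.0):
    # Checks the candidate rule:
    # IF distance(Teammate) <= 5 AND angle(Teammate, Opponent) >= 30 THEN pass(Teammate)
    return (state["distance"][teammate] <= max_dist
            and state["angle"][(teammate, opponent)] >= min_angle)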

20 Batch Reinforcement Learning via Support Vector Regression (RL-SVR): find Q-functions that minimize ModelSize + C × DataMisfit. The agent interacts with the environment in batches (Batch 1, Batch 2, …) and computes new Q-functions after each batch.

21 Batch Reinforcement Learning with Advice (KBKR): find Q-functions that minimize ModelSize + C × DataMisfit + µ × AdviceMisfit. As before, the agent interacts with the environment in batches (Batch 1, Batch 2, …) and computes new Q-functions after each batch, now also respecting the given advice.
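A rough sketch of the objective on this slide for a linear Q-model: ModelSize + C × DataMisfit + µ × AdviceMisfit. The hinge-style advice penalty and the specific norms here are assumptions for illustration; the actual KBKR formulation solves a constrained support-vector program rather than evaluating a single function.

import numpy as np

def advice_objective(w, X, y, advice_X, advice_target, C=1.0, mu=0.5):
    model_size = np.sum(np.abs(w))                        # ModelSize
    data_misfit = np.sum(np.abs(X @ w - y))               # DataMisfit on the batch data
    # AdviceMisfit: penalty when advised states score below the advised Q-value
    advice_misfit = np.sum(np.maximum(0.0, advice_target - advice_X @ w))
    return model_size + C * data_misfit + mu * advice_misfit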

22 Skill transfer pipeline: ILP learns rules in the source task (e.g. IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)); a mapping translates them to the target task, where they are given to the learner through advice taking, alongside optional human advice.

23 Skill transfer to 3-on-2 BreakAway from several tasks

24 A macro is a sequence of action nodes — pass(Teammate), move(Direction), shoot(goalRight), shoot(goalLeft) — with learned rules of the form IF [ … ] THEN action attached to its nodes and arcs (e.g. IF [ … ] THEN move(ahead), IF [ … ] THEN move(left), IF [ … ] THEN shoot(goalRight)).
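One way to picture this macro is as a small finite-state machine: a chain of action nodes with learned IF-THEN rules governing the arcs between them. The Python sketch below uses made-up names (MacroNode, env.execute, advance_when) purely as an illustration of that structure, not the paper's representation.

class MacroNode:
    def __init__(self, action, advance_when):
        self.action = action              # e.g. "pass(Teammate)" or "shoot(goalRight)"
        self.advance_when = advance_when  # state -> bool: the rule learned for the outgoing arc

def run_macro(nodes, env, state, max_steps=50):
    i = 0
    for _ in range(max_steps):
        if i >= len(nodes):
            break                                       # macro finished
        state = env.execute(state, nodes[i].action)     # take the current node's action
        if nodes[i].advance_when(state):                # arc rule satisfied?
            i += 1                                      # move on to the next node
    return state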

25 An imitation method: during target-task training, the policy learned in the source task is used to demonstrate behavior.

26 Macro transfer pipeline: ILP learns a macro from the source task, which is then used as a demonstration in the target task.

27 Learning structures. Positive examples: BreakAway games that score; negative examples: BreakAway games that do not score. ILP learns a structure such as:
IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)

28 Learning rules for arcs. Positive examples: states in good games that took the arc; negative examples: states in good games that could have taken the arc but did not. ILP learns rules such as IF [ … ] THEN enter(State) and IF [ … ] THEN loop(State, Teammate) for arcs like pass(Teammate) and shoot(goalRight).
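A small sketch of how the positive and negative examples described here might be collected before handing them to ILP; the game.states, state.available_actions, and state.action_taken fields are assumed, illustrative data structures rather than the system's actual representation.

def arc_examples(good_games, arc_action):
    # Positives: states in good games where the arc's action was actually taken.
    # Negatives: states where that action was available but a different one was taken.
    positives, negatives = [], []
    for game in good_games:
        for state in game.states:
            if arc_action not in state.available_actions:
                continue
            if state.action_taken == arc_action:
                positives.append(state)
            else:
                negatives.append(state)
    return positives, negatives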

29 Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway

30 Machine learning is typically designed for standalone tasks. Transfer is a natural learning ability that we would like to incorporate into machine learners. There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks.

