1 Lisa Torrey University of Wisconsin – Madison CS 540

2 Education: hierarchical curriculum (learning tasks share common stimulus-response elements); abstract problem-solving (learning tasks share general underlying principles); multilingualism (knowing one language affects learning in another). Transfer can be both positive and negative.

3 Given Task S, learn Task T.

4 Performance vs. training: transfer can provide a higher start, a higher slope, and a higher asymptote.

5 Learning as search: within the space of all hypotheses, search is confined to the allowed hypotheses.

6 Transfer can reshape this search over the allowed hypotheses. Thrun and Mitchell 1995: Transfer slopes for gradient descent.

7 Bayesian methods. Bayesian learning vs. Bayesian transfer: prior distribution + data = posterior distribution; in Bayesian transfer, the prior comes from the source task. Raina et al. 2006: Transfer a Gaussian prior.
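A rough sketch of the Bayesian-transfer idea in Python (not Raina et al.'s exact formulation): a Gaussian prior over linear-model weights is estimated from source-task solutions and combined with target-task data in a conjugate Bayesian linear-regression update. The function names and data shapes are assumptions for illustration.

import numpy as np

def fit_source_prior(W_source):
    # Gaussian prior (mean, covariance) over weights, estimated from source-task weight vectors
    mu = W_source.mean(axis=0)
    Sigma = np.cov(W_source, rowvar=False) + 1e-6 * np.eye(W_source.shape[1])
    return mu, Sigma

def target_posterior(X, y, mu0, Sigma0, noise_var=1.0):
    # Conjugate Bayesian linear regression: prior + data -> posterior
    Sigma0_inv = np.linalg.inv(Sigma0)
    Sigma_n = np.linalg.inv(Sigma0_inv + X.T @ X / noise_var)
    mu_n = Sigma_n @ (Sigma0_inv @ mu0 + X.T @ y / noise_var)
    return mu_n, Sigma_n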

8 Hierarchical methods: simpler concepts (Line, Curve) support more complex ones (Surface, Circle, Pipe). Stracuzzi 2006: Learn Boolean concepts that can depend on each other.

9 Dealing with missing data or labels (Task S → Task T). Shi et al. 2008: Transfer via active learning.

10 The reinforcement learning loop between agent and environment: initially Q(s1, a) = 0; the policy chooses π(s1) = a1; the environment returns δ(s1, a1) = s2 and reward r(s1, a1) = r2; the agent updates Q(s1, a1) ← Q(s1, a1) + Δ; then π(s2) = a2, δ(s2, a2) = s3, r(s2, a2) = r3, and so on.
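For concreteness, a minimal tabular Q-learning loop matching the notation on this slide. The env object with reset(), step(), and actions, and the epsilon-greedy policy, are assumptions for illustration, not part of the slides.

import random
from collections import defaultdict

def q_learning(env, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                              # Q(s, a) = 0 initially
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # pi(s): epsilon-greedy action choice from the current Q-values
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)               # delta(s, a) = s', r(s, a) = r
            # Q(s, a) <- Q(s, a) + Delta
            delta = alpha * (r + gamma * max(Q[(s_next, act2)] for act2 in env.actions) - Q[(s, a)])
            Q[(s, a)] += delta
            s = s_next
    return Q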

11 Methods for transfer in RL: starting-point methods, hierarchical methods, alteration methods, imitation methods, new RL algorithms.

12 Starting-point methods: transfer the source task's learned values as the target task's initial Q-table, so target-task training starts from those values rather than from an all-zero table (no transfer). Taylor et al. 2005: Value-function transfer.
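A minimal sketch of a starting-point method in this spirit: copy source-task Q-values into the target task's initial Q-table through a hand-specified inter-task mapping. The mapping argument and table format are illustrative assumptions, not Taylor et al.'s exact representation.

from collections import defaultdict

def transfer_q_table(source_Q, mapping):
    # mapping: target (state, action) -> corresponding source (state, action)
    target_Q = defaultdict(float)        # unmapped entries stay at 0, as without transfer
    for target_sa, source_sa in mapping.items():
        target_Q[target_sa] = source_Q.get(source_sa, 0.0)
    return target_Q

# Usage sketch: build target_Q once, then continue target-task training from it
# instead of from an all-zero table.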

13 Hierarchical methods: low-level skills (Run, Kick, Pass, Shoot) compose into the high-level task (Soccer). Mehta et al. 2008: Transfer a learned hierarchy.

14 Alteration methods: alter Task S's original states, actions, or rewards into new states, actions, or rewards. Walsh et al. 2006: Transfer aggregate states.

15 New RL algorithms: incorporate the transferred knowledge into the agent–environment learning loop itself (same loop as slide 10). Torrey et al. 2006: Transfer advice about skills.

16 Imitation methods: during target-task training, the policy learned in the source task is periodically used to generate behavior. Torrey et al. 2007: Demonstrate a strategy.

17 Starting-point methods, imitation methods, hierarchical methods, alteration methods, new RL algorithms. Presented here: Skill Transfer and Macro Transfer.

18 3-on-2 BreakAway, 3-on-2 KeepAway, 3-on-2 MoveDownfield, 2-on-1 BreakAway.

19 Candidate skill rules for pass(Teammate):
IF [ ] THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 THEN pass(Teammate)
IF distance(Teammate) ≤ 10 THEN pass(Teammate)
…
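To make the rule semantics concrete, a tiny Python sketch of how one of these candidate clauses could be tested against a game state; the state dictionary layout is an assumption for illustration (the actual system learns and evaluates such clauses with ILP over relational features).

def pass_rule_fires(state, teammate, opponent, max_dist=5.0, min_angle=30.0):
    # Checks the candidate rule:
    # IF distance(Teammate) <= 5 AND angle(Teammate, Opponent) >= 30 THEN pass(Teammate)
    return (state["distance"][teammate] <= max_dist
            and state["angle"][(teammate, opponent)] >= min_angle)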

20 Batch Reinforcement Learning via Support Vector Regression (RL-SVR): find Q-functions that minimize ModelSize + C × DataMisfit. The agent interacts with the environment in batches (Batch 1, Batch 2, …) and computes new Q-functions after each batch.

21 Batch Reinforcement Learning with Advice (KBKR): find Q-functions that minimize ModelSize + C × DataMisfit + µ × AdviceMisfit. As before, the agent interacts with the environment in batches (Batch 1, Batch 2, …) and computes new Q-functions after each batch, now also respecting the given advice.
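A rough sketch of the objective on this slide for a linear Q-model: ModelSize + C × DataMisfit + µ × AdviceMisfit. The hinge-style advice penalty and the specific norms here are assumptions for illustration; the actual KBKR formulation solves a constrained support-vector program rather than evaluating a single function.

import numpy as np

def advice_objective(w, X, y, advice_X, advice_target, C=1.0, mu=0.5):
    model_size = np.sum(np.abs(w))                        # ModelSize
    data_misfit = np.sum(np.abs(X @ w - y))               # DataMisfit on the batch data
    # AdviceMisfit: penalty when advised states score below the advised Q-value
    advice_misfit = np.sum(np.maximum(0.0, advice_target - advice_X @ w))
    return model_size + C * data_misfit + mu * advice_misfit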

22 Skill transfer pipeline: ILP learns rules in the source task (e.g. IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)); a mapping translates them to the target task, where they are given to the learner through advice taking, alongside optional human advice.

23 Skill transfer to 3-on-2 BreakAway from several tasks

24 A macro is a sequence of action nodes — pass(Teammate), move(Direction), shoot(goalRight), shoot(goalLeft) — with learned rules of the form IF [ … ] THEN action attached to its nodes and arcs (e.g. IF [ … ] THEN move(ahead), IF [ … ] THEN move(left), IF [ … ] THEN shoot(goalRight)).
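One way to picture this macro is as a small finite-state machine: a chain of action nodes with learned IF-THEN rules governing the arcs between them. The Python sketch below uses made-up names (MacroNode, env.execute, advance_when) purely as an illustration of that structure, not the paper's representation.

class MacroNode:
    def __init__(self, action, advance_when):
        self.action = action              # e.g. "pass(Teammate)" or "shoot(goalRight)"
        self.advance_when = advance_when  # state -> bool: the rule learned for the outgoing arc

def run_macro(nodes, env, state, max_steps=50):
    i = 0
    for _ in range(max_steps):
        if i >= len(nodes):
            break                                       # macro finished
        state = env.execute(state, nodes[i].action)     # take the current node's action
        if nodes[i].advance_when(state):                # arc rule satisfied?
            i += 1                                      # move on to the next node
    return state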

25 An imitation method: during target-task training, the policy learned in the source task is used to demonstrate behavior.

26 Macro transfer pipeline: ILP learns a macro from the source task, which is then used as a demonstration in the target task.

27 Learning structures. Positive examples: BreakAway games that score; negative examples: BreakAway games that do not score. ILP learns a structure such as:
IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)

28 Learning rules for arcs. Positive examples: states in good games that took the arc; negative examples: states in good games that could have taken the arc but did not. ILP learns rules such as IF [ … ] THEN enter(State) and IF [ … ] THEN loop(State, Teammate) for arcs like pass(Teammate) and shoot(goalRight).
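A small sketch of how the positive and negative examples described here might be collected before handing them to ILP; the game.states, state.available_actions, and state.action_taken fields are assumed, illustrative data structures rather than the system's actual representation.

def arc_examples(good_games, arc_action):
    # Positives: states in good games where the arc's action was actually taken.
    # Negatives: states where that action was available but a different one was taken.
    positives, negatives = [], []
    for game in good_games:
        for state in game.states:
            if arc_action not in state.available_actions:
                continue
            if state.action_taken == arc_action:
                positives.append(state)
            else:
                negatives.append(state)
    return positives, negatives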

29 Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway

30 Machine learning is typically designed for standalone tasks. Transfer is a natural learning ability that we would like to incorporate into machine learners. There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks.

