
1 Machine Learning via Advice Taking Jude Shavlik

2 Thanks To... Rich Maclin, Lisa Torrey, Trevor Walker, Prof. Olvi Mangasarian, Glenn Fung, Ted Wild, and DARPA

3 Quote (2002) from DARPA: "Sometimes an assistant will merely watch you and draw conclusions. Sometimes you have to tell a new person, 'Please don't do it this way' or 'From now on when I say X, you do Y.' It's a combination of learning by example and by being guided."

4 Widening the "Communication Pipeline" between Humans and Machine Learners
[Diagram: a human teacher advising a pupil, the machine learner]

5 Our Approach to Building Better Machine Learners
- Human partner expresses advice "naturally" and without knowledge of the ML agent's internals
- Agent incorporates advice directly into the function it is learning
- Additional feedback (rewards, I/O pairs, inferred labels, more advice) is used to refine the learner continually

6 "Standard" Machine Learning vs. Theory Refinement
Positive examples ("should see doctor"): temp = 102.1, age = 21, sex = F, …; temp = 101.7, age = 37, sex = M, …
Negative examples ("take two aspirins"): temp = 99.1, age = 43, sex = M, …; temp = 99.6, age = 24, sex = F, …
Approximate domain knowledge: if temp = high and age = young … then negative example
Related work by the labs of Mooney, Pazzani, Cohen, Giles, etc.

7 Rich Maclin's PhD (1995)
IF    a Bee is (Near and West) AND
      an Ice is (Near and North)
THEN BEGIN
      Move East
      Move North
END

8 Sample Results
[Learning curves: with advice vs. without advice]

9 Our Motto Give advice rather than commands to your computer

10 Outline
- Prior Knowledge and Support Vector Machines: intro to SVMs, linear separation, non-linear separation, function fitting ("regression")
- Advice-Taking Reinforcement Learning
- Transfer Learning via Advice Taking

11 Support Vector Machines: Maximizing the Margin between Bounding Planes
[Diagram: A+ and A- points separated by two bounding planes, with the support vectors on the planes and the margin between them]

12 Linear Algebra for SVMs
Given p points in n-dimensional space, represent them by the p-by-n matrix A of reals.
Each point A_i is in class +1 or -1; collect these labels on the diagonal of a matrix D with D_ii = ±1.
Separate the classes by two bounding planes, x'w = γ + 1 and x'w = γ - 1; more succinctly, D(Aw - eγ) ≥ e, where e is a vector of ones.

13 "Slack" Variables: Dealing with Data that is not Linearly Separable
[Diagram: A+ and A- points with slack variables y for points on the wrong side of their bounding plane; support vectors marked]

14 Support Vector Machines: Quadratic Programming Formulation
Solve this quadratic program:
    min (over w, γ, y)   (1/2) ||w||_2^2 + C e'y
    s.t.                 D(Aw - eγ) + y ≥ e,   y ≥ 0
Maximize the margin by minimizing ||w||_2^2; minimize the sum of the slack variables y, weighted by C.

15 Support Vector Machines: Linear Programming Formulation
Use the 1-norm instead of the 2-norm (typically runs faster; better feature selection; might generalize better, NIPS '03):
    min (over w, γ, y)   ||w||_1 + C e'y
    s.t.                 D(Aw - eγ) + y ≥ e,   y ≥ 0
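To make the formulation concrete, here is a minimal sketch of the 1-norm SVM as a linear program, assuming NumPy and SciPy are available; the lp_svm helper, its variable layout, and the toy data are illustrative choices, not the original implementation.

```python
import numpy as np
from scipy.optimize import linprog

def lp_svm(A, d, C=1.0):
    """1-norm SVM as a linear program (illustrative sketch).

    Solves  min ||w||_1 + C * e'y
            s.t. D(Aw - e*gamma) + y >= e,  y >= 0
    by introducing s with -s <= w <= s, so ||w||_1 = e's at the optimum.
    A is p-by-n (one example per row); d holds the +1/-1 class labels.
    """
    p, n = A.shape
    D = np.diag(d)

    # Decision vector: [w (n), s (n), gamma (1), y (p)]
    c = np.concatenate([np.zeros(n), np.ones(n), [0.0], C * np.ones(p)])

    # Margin constraints, rewritten for linprog's "A_ub x <= b_ub" form:
    # D(Aw - e*gamma) + y >= e   <=>   -D A w + d*gamma - y <= -e
    G_margin = np.hstack([-D @ A, np.zeros((p, n)), d.reshape(-1, 1), -np.eye(p)])
    h_margin = -np.ones(p)

    # |w| <= s:   w - s <= 0   and   -w - s <= 0
    G_abs1 = np.hstack([np.eye(n), -np.eye(n), np.zeros((n, 1)), np.zeros((n, p))])
    G_abs2 = np.hstack([-np.eye(n), -np.eye(n), np.zeros((n, 1)), np.zeros((n, p))])

    G = np.vstack([G_margin, G_abs1, G_abs2])
    h = np.concatenate([h_margin, np.zeros(2 * n)])

    # w and gamma are free; s and the slacks y are nonnegative
    bounds = [(None, None)] * n + [(0, None)] * n + [(None, None)] + [(0, None)] * p
    res = linprog(c, A_ub=G, b_ub=h, bounds=bounds, method="highs")
    w, gamma = res.x[:n], res.x[2 * n]
    return w, gamma

# Toy usage: two Gaussian clusters, classified by sign(x'w - gamma)
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
d = np.concatenate([np.ones(20), -np.ones(20)])
w, gamma = lp_svm(A, d)
print(np.mean(np.sign(A @ w - gamma) == d))  # training accuracy
```

The auxiliary variables s that bound |w| elementwise are what keep the whole problem linear, which is why the 1-norm version can be handed to an ordinary LP solver.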

16 Knowledge-Based SVMs: Generalizing an "Example" from a POINT to a REGION
[Diagram: A+ and A- points plus polyhedral knowledge regions labeled for each class]

17 Incorporating "Knowledge Sets" Into the SVM Linear Program
Suppose that the knowledge set {x : Bx ≤ d} belongs to class A+. Hence it must lie in the half-space {x : x'w ≥ γ + 1}. We therefore have the implication Bx ≤ d ⟹ x'w ≥ γ + 1. This implication is equivalent to a set of linear constraints (proof in the NIPS '02 paper).
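A brief sketch of the LP-duality argument behind that equivalence (assuming the knowledge set is nonempty; the notation here may differ from the NIPS '02 paper):

```latex
Bx \le d \;\Rightarrow\; x'w \ge \gamma + 1
\;\Longleftrightarrow\;
\min\{\, x'w : Bx \le d \,\} \ge \gamma + 1
\;\Longleftrightarrow\;
\exists\, u \ge 0 :\; B'u + w = 0,\;\; d'u + \gamma + 1 \le 0
```

The last step uses LP duality: min{ x'w : Bx ≤ d } = max{ -d'u : B'u + w = 0, u ≥ 0 }. The resulting conditions are linear in (w, γ, u), so they can be added directly to the SVM linear program.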

18 Resulting LP for KBSVMs
We get this linear program (LP); i ranges over the number of knowledge regions {x : B_i x ≤ d_i}, shown here for regions belonging to class A+:
    min (over w, γ, y, u_i)   ||w||_1 + C e'y
    s.t.                      D(Aw - eγ) + y ≥ e,   y ≥ 0
                              B_i'u_i + w = 0,   d_i'u_i + γ + 1 ≤ 0,   u_i ≥ 0

19 KBSVM with Slack Variables
Relax the advice constraints of the previous LP by adding slack variables to their right-hand sides (which were 0) and penalizing those slacks in the objective, so the advice need only be satisfied approximately.

20 SVMs and Non-Linear Separating Surfaces
[Diagram: + and - points in the original feature space (f1, f2), mapped to a new space (h(f1, f2), g(f1, f2)) where they become linearly separable]
- Non-linearly map to a new space
- Linearly separate in the new space (using kernels)
- The result is a non-linear separator in the original space
Fung et al. (2003) presents knowledge-based non-linear SVMs.

21 Support Vector Regression (aka Kernel Regression)
Linearly approximating a function, given an array A of inputs and a vector y of (numeric) outputs: f(x) ≈ x'w + b
Find weights such that Aw + be ≈ y
In dual space, w = A'α, so we get (AA')α + be ≈ y
Kernel'izing (to get a non-linear approximation): K(A, A')α + be ≈ y

22 What to Optimize?
Linear program to optimize:
    min (over w, b, s)   ||w||_1 + C ||s||_1
    s.t.                 y - s ≤ Aw + be ≤ y + s
The 1st term (||w||_1) is a "regularizer" that minimizes model complexity.
The 2nd term is the approximation error, weighted by the parameter C.
This becomes the classical "least squares" fit if the quadratic version is used and the first term is ignored.

23 Predicting Y for New X
y = K(x', A')α + b
Use the kernel to compute a "distance" to each training point (i.e., each row in A), weight by α_i (hopefully many α_i are zero), sum, and add b (a scalar).
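A minimal sketch of this prediction step, assuming a Gaussian (RBF) kernel and NumPy; the rbf_kernel and predict helpers and the kernel width sigma are illustrative assumptions, with alpha and b coming from the linear program on the previous slide.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian kernel matrix: entry (i, j) = exp(-||X_i - Z_j||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def predict(x_new, A, alpha, b, sigma=1.0):
    """y = K(x', A') alpha + b: compute a kernel "distance" to each training
    point (each row of A), weight it by alpha_i (hopefully many are zero),
    sum, and add the scalar b."""
    k = rbf_kernel(np.atleast_2d(x_new), A, sigma)  # 1-by-p row of kernel values
    return (k @ alpha + b).item()
```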

24 Knowledge-Based SVR (Mangasarian, Shavlik, & Wild, JMLR '04)
Add soft constraints to the linear program, so the learner need only follow the advice approximately (a "slacked" match to the advice):
    minimize   ||w||_1 + C ||s||_1 + a penalty for violating the advice
    such that  y - s ≤ Aw + b ≤ y + s
Example advice: "In this region, y should exceed 4." [Plot: advice region on the x-axis with the constraint y ≥ 4]

25 Testbeds: Subtasks of RoboCup
Mobile KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001]
BreakAway: score a goal [Maclin et al., AAAI 2005]

26 Reinforcement Learning Overview
The agent repeatedly receives a state (described by a set of features), takes an action, and receives a reward.
It uses the rewards to estimate the Q-values of actions in states.
Policy: choose the action with the highest Q-value in the current state.
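For concreteness, a minimal tabular Q-learning sketch of this loop; it assumes a hypothetical env object with reset() and step() methods and a discrete state space. (The agents in this work instead approximate Q over continuous features with support vector regression.)

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: receive a state, take an action, receive a reward,
    and use the rewards to estimate the Q-values of actions in states.
    `env` is assumed to offer reset() -> state and
    step(action) -> (next_state, reward, done)."""
    Q = defaultdict(float)  # maps (state, action) -> estimated Q-value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Policy: usually take the action with the highest Q-value in the current state
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```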

27 Incorporating Advice in KBKR
Advice format: Bx ≤ d ⟹ f(x) ≥ hx + β
Example: If distanceToGoal ≤ 10 and shotAngle ≥ 30, then Q(shoot) ≥ 0.9
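As an illustration of how such a rule fits the Bx ≤ d ⟹ f(x) ≥ hx + β format, here is the example written out with a hypothetical feature ordering x = [distanceToGoal, shotAngle]:

```python
import numpy as np

# Hypothetical feature ordering: x = [distanceToGoal, shotAngle]
# Advice: if distanceToGoal <= 10 and shotAngle >= 30, then Q(shoot) >= 0.9,
# written as  Bx <= d  ==>  f(x) >= hx + beta :
B = np.array([[ 1.0,  0.0],     # distanceToGoal <= 10
              [ 0.0, -1.0]])    # -shotAngle     <= -30   (i.e., shotAngle >= 30)
d = np.array([10.0, -30.0])
h = np.zeros(2)                 # the right-hand side does not depend on x here...
beta = 0.9                      # ...so the advice says Q_shoot(x) >= 0.9 whenever Bx <= d
```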

28 Giving Advice About the Relative Values of Multiple Functions (Maclin et al., AAAI '05)
When the input satisfies preconditions(input), then f_1(input) > f_2(input)

29 Sample Advice-Taking Results
Advice: if distanceToGoal ≤ 10 and shotAngle ≥ 30, then prefer shoot over all other actions, i.e., Q(shoot) > Q(pass) and Q(shoot) > Q(move)
[Learning curves: advice vs. standard RL on 2-vs-1 BreakAway, rewards +1 / -1]

30 Transfer Learning
The agent learns Task A (the source), then encounters a related Task B (the target) and uses knowledge from Task A to learn Task B faster. Ideally the agent discovers how the tasks are related; here we use a user-provided mapping to tell the agent this.

31 Transfer Learning: The Goal for the Target Task
[Plot: performance vs. training, with transfer vs. without transfer]
Goals: a better start, a faster rise, and a better asymptote.

32 Our Transfer Algorithm
Observe source-task games and use ILP to learn skills. Translate the learned skills into transfer advice for the target task; if there is user advice, add it in. Then learn the target task with KBKR.

33 Learning Skills By Observation
Source-task games are sequences of (state, action) pairs. Learning skills is like learning to classify states by their correct actions (ILP = Inductive Logic Programming).
Example state:
    distBetween(me, teammate2) = 15
    distBetween(me, teammate1) = 10
    distBetween(me, opponent1) = 5
    ...
    action = pass(teammate2)
    outcome = caught(teammate2)
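A small sketch of how that classification view might be set up, assuming source-task traces stored as (features, action, outcome) tuples; the trace format and the to_ilp_examples helper are illustrative, not the actual pipeline.

```python
def to_ilp_examples(traces, skill="pass"):
    """Split source-task traces into positive and negative examples for one skill.
    Each trace element is assumed to be (features, action, outcome); a state counts
    as a positive example of the skill if that action was taken and succeeded."""
    positives, negatives = [], []
    for features, action, outcome in traces:
        if action.startswith(skill) and outcome.startswith("caught"):
            positives.append(features)
        else:
            negatives.append(features)
    return positives, negatives

# One state from the slide, written as a dict of ground facts:
state1 = {"distBetween(me,teammate2)": 15,
          "distBetween(me,teammate1)": 10,
          "distBetween(me,opponent1)": 5}
pos, neg = to_ilp_examples([(state1, "pass(teammate2)", "caught(teammate2)")])
```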

34 ILP: Searching for First-Order Rules
[Search tree over candidate clauses, refined from general to specific:]
    P :- true
    P :- Q      P :- R      P :- S
    P :- R, Q   P :- R, S
    ...
    P :- R, S, V, W, X
We also use a random-sampling approach.

35 Advantages of ILP
- Can produce first-order rules for skills, e.g. pass(Teammate) vs. pass(teammate1), ..., pass(teammateN)
- Captures only the essential aspects of the skill; we expect these aspects to transfer better
- Can incorporate background knowledge

36 Example of a Skill Learned by ILP from KeepAway
    pass(Teammate) :-
        distBetween(me, Teammate) > 14,
        passAngle(Teammate) > 30,
        passAngle(Teammate) < 150,
        distBetween(me, Opponent) < 7.
We also gave "human" advice about shooting, since that is a new skill in BreakAway.

37 TL Level 7: KA to BA Raw Curves

38 TL Level 7: KA to BA Averaged Curves

39 TL Level 7: Statistics
TL metrics, measured on average reward:

Type  Metric                                  KA to BA           MD to BA
                                              Score    P value   Score    P value
I     Jump start                              0.05     0.0312    0.08     0.0086
      Jump start, smoothed                    0.08     0.0002    0.06     0.0014
II    Transfer ratio                          1.82     0.0034    1.86     0.0004
      Transfer ratio (truncated)              1.82     0.0032    1.86     0.0004
      Average relative reduction (narrow)     0.58     0.0042    0.54     0.0004
      Average relative reduction (wide)       0.70     0.0018    0.71     0.0008
      Ratio (of area under the curves)        1.37     0.0056    1.41     0.0012
      Transfer difference                     503.57   0.0046    561.27   0.0008
      Transfer difference (scaled)            1017.00  0.0040    1091.2   0.0016
III   Asymptotic advantage                    0.09     0.0086    0.11     0.0040
      Asymptotic advantage, smoothed          0.08     0.0116    0.10     0.0030

Boldface indicates a significant difference was found.
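As a rough illustration, a few of these metrics could be computed from two learning curves (average reward per episode, with and without transfer) along these lines; the definitions below are simplified paraphrases, not the exact metrics used in the table.

```python
import numpy as np

def transfer_metrics(with_transfer, without_transfer, head=25, tail=25):
    """Simplified sketches of three transfer metrics, given two equal-length
    learning curves of average reward per training episode."""
    t = np.asarray(with_transfer, dtype=float)
    b = np.asarray(without_transfer, dtype=float)
    jump_start = t[:head].mean() - b[:head].mean()              # advantage at the start
    transfer_ratio = t.sum() / b.sum()                          # ratio of areas under the curves
    asymptotic_advantage = t[-tail:].mean() - b[-tail:].mean()  # advantage at the end
    return jump_start, transfer_ratio, asymptotic_advantage
```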

40 Conclusion
We can use much more than I/O pairs in machine learning. Give advice to computers; they can automatically refine it based on feedback from the user or the environment. Advice is also an appealing mechanism for transferring learned knowledge from computer to computer.

41 Some Papers (on-line, use Google :-)
Creating Advice-Taking Reinforcement Learners, Maclin & Shavlik, Machine Learning 1996
Knowledge-Based Support Vector Machine Classifiers, Fung, Mangasarian, & Shavlik, NIPS 2002
Knowledge-Based Nonlinear Kernel Classifiers, Fung, Mangasarian, & Shavlik, COLT 2003
Knowledge-Based Kernel Approximation, Mangasarian, Shavlik, & Wild, JMLR 2004
Giving Advice about Preferred Actions to Reinforcement Learners Via Knowledge-Based Kernel Regression, Maclin, Shavlik, Torrey, Walker, & Wild, AAAI 2005
Skill Acquisition via Transfer Learning and Advice Taking, Torrey, Shavlik, Walker, & Maclin, ECML 2006

42 Backups

43 Breakdown of Results

44 What if User Advice is Bad?

45 Related Work on Transfer
Q-function transfer in RoboCup:
    Taylor & Stone (AAMAS 2005, AAAI 2005)
Transfer via policy reuse:
    Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
    Madden & Howley (AI Review 2004)
    Torrey et al. (ECML 2005)
Transfer via relational RL:
    Driessens et al. (ICML workshop 2006)

