
Lisa Torrey and Jude Shavlik, University of Wisconsin, Madison, WI, USA

Outline: Background; Approaches for transfer in reinforcement learning; Relational transfer with Markov Logic Networks; Two new algorithms for MLN transfer.

Outline: Background; Approaches for transfer in reinforcement learning; Relational transfer with Markov Logic Networks; Two new algorithms for MLN transfer.

Given: source task S. Learn: target task T.

Environment–agent interaction: in state s1, the agent starts with Q(s1, a) = 0 and its policy gives π(s1) = a1; the environment returns δ(s1, a1) = s2 and reward r(s1, a1) = r2, and the agent updates Q(s1, a1) ← Q(s1, a1) + Δ. Then π(s2) = a2, δ(s2, a2) = s3, r(s2, a2) = r3, and so on. Throughout, the agent balances exploration against exploitation to maximize reward.
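A minimal tabular Q-learning sketch of this loop, assuming a hypothetical env interface with reset(), step(), and an actions list; the learning rate, discount, and epsilon-greedy values are illustrative, not from the talk (the talk's RoboCup agents use a more elaborate learner):

```python
import random
from collections import defaultdict

def q_learning(env, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: balance exploration and exploitation, update Q from rewards."""
    Q = defaultdict(float)                       # Q[(state, action)] starts at 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy: explore with probability epsilon, otherwise exploit
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)        # environment gives delta(s, a) and r(s, a)
            best_next = max(Q[(s_next, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])   # Q(s,a) <- Q(s,a) + Δ
            s = s_next
    return Q
```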

[Figure: learning curves of performance vs. training, illustrating the possible benefits of transfer: a higher start, a higher slope, and a higher asymptote.]

RoboCup BreakAway tasks: 2-on-1 BreakAway and 3-on-2 BreakAway, with hand-coded defenders and a single learning agent.

Outline: Background; Approaches for transfer in reinforcement learning; Relational transfer with Markov Logic Networks; Two new algorithms for MLN transfer.

Madden & Howley 2004: learn a set of rules, used during exploration steps. Croonenborghs et al.: learn a relational decision tree, used as an additional action. Our prior work, 2007: learn a relational macro, used as a demonstration.

Outline: Background; Approaches for transfer in reinforcement learning; Relational transfer with Markov Logic Networks; Two new algorithms for MLN transfer.

Example: the formula IF distance(GoalPart) > 10 AND angle(ball, Teammate, Opponent) > 30 THEN pass(Teammate), with groundings pass(t1) and pass(t2) for Teammate and goalLeft and goalRight for GoalPart.

Markov Logic Network (Richardson and Domingos, Machine Learning 2006): a set of formulas (F), e.g. evidence1(X) AND query(X), evidence2(X) AND query(X), each with a weight (W), e.g. w0 = 1.1, w1 = 0.9. Here n_i(world) = number of true groundings of the i-th formula in the world; the ground network links each query atom query(x1), query(x2), … to its evidence atoms e1, e2, …
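The corresponding probability of a world, from Richardson and Domingos (2006), written out in LaTeX:

```latex
P(X = x) \;=\; \frac{1}{Z}\,\exp\!\Big(\sum_i w_i\, n_i(x)\Big),
\qquad
Z \;=\; \sum_{x'} \exp\!\Big(\sum_i w_i\, n_i(x')\Big)
```

where w_i is the weight of formula i and n_i(x) is its number of true groundings in world x.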

Algorithm 1: transfer the source-task Q-function as an MLN (Task S → MLN Q-function → Task T). Algorithm 2: transfer the source-task policy as an MLN (Task S → MLN Policy → Task T).

Demonstration approach: use the transferred MLN during an initial demonstration period in the target task, then switch to regular target-task training.
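A sketch of that schedule, assuming hypothetical mln_policy and rl_agent objects and an illustrative 100-episode demonstration period (not a figure from the talk):

```python
def run_target_task(env, mln_policy, rl_agent, episodes=1000, demo_episodes=100):
    """Follow the transferred MLN for an initial demonstration period,
    then continue with regular target-task reinforcement learning."""
    for episode in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if episode < demo_episodes:
                a = mln_policy.choose_action(s)   # transferred knowledge drives behavior
            else:
                a = rl_agent.choose_action(s)     # regular target-task training
            s_next, r, done = env.step(a)
            rl_agent.update(s, a, r, s_next)      # the agent learns from all experience
            s = s_next
```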

Outline: Background; Approaches for transfer in reinforcement learning; Relational transfer with Markov Logic Networks; Two new algorithms for MLN transfer.

MLN Q-function transfer: from source-task data, learn one MLN per action (rules with Aleph, weights with Alchemy) that maps a state to a Q-value; the collection of per-action MLNs forms the MLN Q-function, which is used as a demonstration in the target task.

Q-values for each action Qa are discretized into bins (0 ≤ Qa < …, … ≤ Qa < …, … ≤ Qa < 0.6, …); for each state, the MLN yields a probability distribution over the bins (bin number vs. probability) for each action.
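One plausible way to collapse the per-bin probabilities into a scalar Q estimate is a probability-weighted average of bin midpoints; a sketch, using as hypothetical bin boundaries the intervals that appear in the example rules later in the talk:

```python
def expected_q(bin_probs, bin_bounds):
    """Estimate Q as the probability-weighted average of bin midpoints.

    bin_probs:  [P(bin 0), P(bin 1), ...] from MLN inference for one action
    bin_bounds: [(lo, hi), ...] Q-value range of each bin
    """
    return sum(p * (lo + hi) / 2.0 for p, (lo, hi) in zip(bin_probs, bin_bounds))

# Hypothetical usage with three non-uniform bins from hierarchical clustering:
bounds = [(0.0, 0.11), (0.11, 0.27), (0.27, 0.43)]
probs = [0.6, 0.3, 0.1]
q_estimate = expected_q(probs, bounds)   # ≈ 0.125
```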

Learning the MLN Q-function: bin boundaries come from hierarchical clustering; formulas for each bin (e.g. IF … THEN 0 < Q < 0.2) are learned with Aleph (Srinivasan); formula weights (w0 = 1.1, w1 = 0.9, …) are learned with Alchemy (U. Washington).

Aleph rules are ranked by precision (Rule 1: precision = 1.0, Rule 2: precision = 0.99, Rule 3: precision = 0.96, …). Each rule is added to the ruleset only if it increases the ruleset's F-score, where F = (2 × Precision × Recall) / (Precision + Recall).
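A sketch of this greedy selection loop; the evaluate helper, which scores a candidate ruleset's precision and recall on training examples, is an assumption:

```python
def f_score(precision, recall):
    """F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def select_rules(candidate_rules, evaluate):
    """Greedily keep a rule only if it improves the ruleset's F-score.

    candidate_rules: Aleph rules, assumed sorted by descending precision
    evaluate: hypothetical helper, evaluate(ruleset) -> (precision, recall)
    """
    ruleset = []
    best_f = 0.0
    for rule in candidate_rules:
        p, r = evaluate(ruleset + [rule])
        if f_score(p, r) > best_f:
            ruleset.append(rule)
            best_f = f_score(p, r)
    return ruleset
```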

Examples for transfer from 2-on-1 BreakAway to 3-on-2 BreakAway:

IF  distance(me, GoalPart) ≥ 42
    distance(me, Teammate) ≥ 39
THEN pass(Teammate) falls into [0, 0.11]

IF  angle(topRight, goalCenter, me) ≤ 42
    angle(topRight, goalCenter, me) ≥ 55
    angle(goalLeft, me, goalie) ≥ 20
    angle(goalCenter, me, goalie) ≤ 30
THEN pass(Teammate) falls into [0.11, 0.27]

IF  distance(Teammate, goalCenter) ≤ 9
    angle(topRight, goalCenter, me) ≤ 85
THEN pass(Teammate) falls into [0.27, 0.43]

Transfer from 2-on-1 BreakAway to 3-on-2 BreakAway

Outline: Background; Approaches for transfer in reinforcement learning; Relational transfer with Markov Logic Networks; Two new algorithms for MLN transfer.

MLN policy transfer: from source-task data, learn a single MLN (F, W) (rules with Aleph, weights with Alchemy) that maps a state to a probability for each action; the resulting MLN policy is used as a demonstration in the target task.

For each state, the MLN assigns a probability to each action, e.g. move(ahead), pass(Teammate), shoot(goalLeft), …; the policy chooses the highest-probability action.
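A sketch of that selection step; mln_action_probabilities, standing in for MLN inference over the action predicates, is a hypothetical helper:

```python
def choose_action(state, actions, mln_action_probabilities):
    """Pick the action the MLN policy considers most probable in this state."""
    probs = {a: mln_action_probabilities(state, a) for a in actions}   # MLN inference per action
    return max(probs, key=probs.get)

# Hypothetical usage:
# a = choose_action(state, ["move(ahead)", "pass(Teammate)", "shoot(goalLeft)"],
#                   mln_action_probabilities)
```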

Learning the MLN policy: formulas for each action (e.g. IF … THEN pass(Teammate)) are learned with Aleph (Srinivasan); formula weights (w0 = 1.1, w1 = 0.9, …) are learned with Alchemy (U. Washington).

Examples for transfer from 2-on-1 BreakAway to 3-on-2 BreakAway:

IF  angle(topRight, goalCenter, me) ≤ 70
    timeLeft ≥ 98
    distance(me, Teammate) ≥ 3
THEN pass(Teammate)

IF  distance(me, GoalPart) ≥ 36
    distance(me, Teammate) ≥ 12
    timeLeft ≥ 91
    angle(topRight, goalCenter, me) ≤ 80
THEN pass(Teammate)

IF  distance(me, GoalPart) ≥ 27
    angle(topRight, goalCenter, me) ≤ 75
    distance(me, Teammate) ≥ 9
    angle(Teammate, me, goalie) ≥ 25
THEN pass(Teammate)

MLN policy transfer from 2-on-1 BreakAway to 3-on-2 BreakAway

ILP rulesets can represent a policy by themselves, so does the MLN provide extra benefit? Yes: MLN policies perform as well or better. MLN policies can also include action-sequence knowledge, so does this improve transfer? No: the Markov assumption appears to hold in RoboCup.

MLN transfer can improve reinforcement learning by raising initial performance. Policies transfer better than Q-functions, being simpler and more general. Policies can transfer better than macros, but not always: macros carry more detailed knowledge, with a risk of overspecialization. MLNs transfer better than rulesets: statistical-relational beats purely relational. Action-sequence information is redundant, since the Markov assumption holds in our domain.

Future work: refine transferred knowledge by revising weights and relearning rules, turning a too-specific clause or a too-general clause into a better clause (Mihalkova et al. 2007).

Future work: relational reinforcement learning, either Q-learning with an MLN Q-function or policy search with MLN policies or a macro. MLN Q-functions lose too much information: a state's Q-value is reduced to a distribution over bins (bin number vs. probability).

Co-author: Jude Shavlik. Grants: DARPA HR… and DARPA FA…C-7606.