Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach. Aaron Wilson, Alan Fern, Prasad Tadepalli, School of EECS, Oregon State University.

Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach Aaron Wilson, Alan Fern, Prasad Tadepalli School of EECS Oregon State University

Markov Decision Processes An MDP M is defined by its states, actions, transition model T, and reward function R. A policy maps states to actions; the agent seeks the optimal policy. [Diagram: agent-environment interaction loop]
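The slide's formulas did not survive extraction; for reference, the standard objective this slide refers to, in conventional MDP notation (not copied from the slides):

    \[
    M = (S, A, T, R), \qquad \pi : S \to A, \qquad
    \pi^{*} = \arg\max_{\pi}\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\Big|\, \pi \Big].
    \]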

Multi-Task Reinforcement Learning (MTRL) Given: a sequence of Markov Decision Processes drawn from an unknown distribution D. Goal: leverage past experience to improve performance on new MDPs drawn from D. [Diagram: environments M1, M2, ..., Mn drawn from D]

MTRL Problem Tasks have hierarchical relationships: they fall into a set of classes unknown to the agent. Discovering these classes is a natural means of transfer.

Hierarchical Bayesian Modeling Foundation: Dirichlet process models, which allow an unknown number of classes, discovery of hierarchical structure, and an explicit formulation of uncertainty. We adapt this machinery to the RL setting to obtain well-justified transfer for RL problems.
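A minimal sketch of the Chinese restaurant process view of a Dirichlet process, which is what makes the number of task classes open-ended; the function name, concentration parameter alpha, and seed are illustrative, not from the talk:

    import random
    from collections import Counter

    def crp_assignments(num_tasks, alpha=1.0, seed=0):
        # Sample class labels for a sequence of tasks from a Chinese restaurant
        # process: each new task joins an existing class with probability
        # proportional to that class's size, or opens a new class with
        # probability proportional to alpha, so the number of classes grows
        # with the data rather than being fixed in advance.
        rng = random.Random(seed)
        labels, counts = [], Counter()
        for _ in range(num_tasks):
            classes = list(counts.keys()) + [len(counts)]   # existing ids plus one fresh id
            weights = list(counts.values()) + [alpha]
            label = rng.choices(classes, weights=weights)[0]
            labels.append(label)
            counts[label] += 1
        return labels

    print(crp_assignments(10))   # e.g. [0, 0, 0, 1, 0, 2, ...]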

Basic Hierarchical Transfer Process [Flowchart: process inference -> select actions (Bayesian RL) -> compute posterior -> select best hierarchy]
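A sketch of the loop this flowchart describes, with the inference and action-selection steps passed in as callables since their implementations are not specified here:

    def hierarchical_transfer(tasks, prior, run_bayesian_rl, infer_hierarchy):
        # Act in each task with the current hierarchy as the prior, then
        # re-infer the hierarchy from all experience gathered so far and
        # carry it forward to the next task.
        hierarchy, experience = prior, []
        for task in tasks:
            trajectories = run_bayesian_rl(task, hierarchy)   # select actions (Bayesian RL)
            experience.append(trajectories)
            hierarchy = infer_hierarchy(experience, prior)    # compute posterior, select best hierarchy
        return hierarchy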

Hierarchical Bayesian Transfer for RL Model-based multi-task RL: a prior over domain models; action selection by Thompson sampling plus planning. Policy-based multi-task RL: a prior over policy parameters; action selection by a Bayesian Policy Search algorithm.

Model-Based MTRL Explicitly model the generative process D: the hierarchy represents classes of MDPs. [Diagram: class prior and estimate of D]
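One way to write the generative process this hierarchy encodes, as a sketch consistent with the Dirichlet process foundation above (the talk does not spell out this notation): each task i gets a latent class label, and the class parameters generate that task's MDP.

    \[
    c_i \sim \mathrm{CRP}(\alpha), \qquad
    \theta_k \sim G_0, \qquad
    M_i \sim P(\,\cdot \mid \theta_{c_i}\,).
    \]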

Action Selection: Exploit the Estimate of D Exploit the refined prior (class information): sample an MDP using Thompson sampling, then plan with the sampled model using value iteration. [Diagram: compute posterior, then plan]
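A minimal sketch of this action-selection step: draw one MDP from the class-informed posterior, solve it with value iteration, and act greedily in it. The posterior object, its sample() method, and the array shapes are assumptions for illustration:

    import numpy as np

    def thompson_step(posterior, n_states, n_actions, gamma=0.95, sweeps=200):
        # Thompson sampling with planning: sample a single MDP from the
        # current posterior, run value iteration on it, return the greedy policy.
        T, R = posterior.sample()      # assumed shapes: T is [A, S, S], R is [S, A]
        V = np.zeros(n_states)
        Q = np.zeros((n_states, n_actions))
        for _ in range(sweeps):        # value iteration on the sampled model
            Q = R + gamma * np.einsum('asz,z->sa', T, V)
            V = Q.max(axis=1)
        return Q.argmax(axis=1)        # greedy policy for the sampled MDP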

Domain 1 The state is a bit vector; evaluation uses a set of 20 test maps. [Figure: state representation and true reward function]

Domain 1 Results [Plot: learning curves with no transfer vs. transfer from 16 previous tasks]

Policy-Based MTRL A policy prior is used to infer policy components; the hierarchy represents reusable policy components. [Diagram: class prior and estimate of H]

Consider the Wargus RTS Multiple unit types; units fulfill tactical roles, and roles are useful across multiple maps (from simple to hard instances). A hierarchical policy prior facilitates reuse of roles.

Role-Based Policies A set of roles: each role is a vector of policy parameters (e.g., whom to attack). A set of role assignments: a strategy for assigning agents to roles, where the assignment depends on state features. Executing a role-based policy: (1) make the assignment; (2) each agent selects its action.
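A sketch of how executing such a policy could look: an assignment function maps each unit's state features to a role, and each role's parameter vector defines that unit's action distribution. All names, shapes, and the softmax parameterization are illustrative assumptions:

    import numpy as np

    def softmax(scores):
        z = np.exp(scores - scores.max())
        return z / z.sum()

    def act_role_based(unit_features, assign_params, role_params, rng):
        # Step 1: assign each unit to a role from its state features.
        # Step 2: each unit draws an action from its assigned role's softmax policy.
        actions = {}
        for unit_id, x in unit_features.items():        # x: feature vector for this unit
            role = int(np.argmax(assign_params @ x))    # assign_params: [n_roles, d]
            probs = softmax(role_params[role] @ x)      # role_params[role]: [n_actions, d]
            actions[unit_id] = int(rng.choice(len(probs), p=probs))
        return actions

Here rng would be, e.g., numpy.random.default_rng(); assign_params and role_params are the quantities inferred by Bayesian Policy Search (next slide).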

Transfer of Role-Based Policies Bayesian Policy Search learns the individual role parameters, the role assignment function, and the assignments of agents to roles. It samples role-based policies by constructing an artificial distribution [Hoffman et al., NIPS 2007; Muller, Bayesian Statistics 1999] and searching via stochastic simulation; the approach is model-free.
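The artificial-distribution trick cited here (Muller 1999; Hoffman et al. 2007) treats return as a likelihood, so sampling policy parameters from the artificial distribution concentrates on high-return regions of the prior. Schematically, with p(theta) the hierarchical policy prior and U(theta) the expected return (assumed non-negative):

    \[
    q(\theta) \;\propto\; U(\theta)\, p(\theta), \qquad
    U(\theta) = \mathbb{E}\big[\, R(\tau) \mid \theta \,\big],
    \]

and the search draws samples from q by stochastic simulation over policies and simulated trajectories, which is why the method needs no model of the domain.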

Experiments Tactical battles in Wargus: transfer given expert examples, and learning without expert examples.

Transfer from expert play.

Transfer from self play Run BPS on training map 1, then transfer to a new map.

Conclusion Hierarchical Bayesian modeling for RL transfer. Model-based MTRL: learn classes of domain models; transfer via improved priors for model-based Bayesian RL. Policy-based MTRL: learn reusable policies; transfer by recombining learned policy components in new tasks. Solved tactical games in Wargus.

Thank You

Outline Multi-task reinforcement learning (RL): Markov Decision Processes; the multi-task RL setting. Policy-based multi-task RL: discover classes of policy components; the Bayesian Policy Search algorithm. Conclusion.

Policy-Based MTRL Observed property: bags of trajectories. Transfer: classes of policy components. Means of exploiting transferred information: recombine existing components in new tasks. Consequence: components are reused to learn hard tasks.

Outline Markov Decision Processes. Bayesian model-based reinforcement learning. Multi-task reinforcement learning (MTRL). Modeling the MTRL problem. The MTRL transfer algorithm: estimating parameters of the generative process; action selection. Results. Conclusion.

Bayesian Model-Based RL Given a prior over models, plan using the updated (posterior) model. Limitations of common practice: (1) most work uses uninformed priors; (2) the selection of the prior is not supported by data; (3) such priors do not facilitate transfer. [Diagram: agent-environment loop]
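For reference, the posterior update behind "plan using the updated model", stated generically (the slide's own formula was lost in extraction): with h_t the history of observed transitions and rewards up to time t,

    \[
    P(M \mid h_t) \;\propto\; P(h_t \mid M)\, P(M).
    \]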