
Bayesian Reinforcement Learning with Gaussian Processes Huanren Zhang Electrical and Computer Engineering Purdue University

Outline Introduction to Reinforcement Learning (RL) Markov Decision Processes (MDPs) Traditional RL Solution Methods Gaussian Processes (GPs) Gaussian Process Temporal Difference (GPTD) Experiment Conclusion

Reinforcement Learning (RL) An agent interacts with the environment and learns how to map situations to actions in order to maximize reward. RL involves sequences of decisions, and almost all Artificial Intelligence (AI) problems can be formulated as RL problems.

Reinforcement Learning (RL) Evaluative feedback (reward or reinforcement) indicates how good the action taken was, but not whether it was correct or wrong. Balance between exploration and exploitation: exploitation makes the most of the information already gathered, while exploration visits unknown states that may yield a higher return in the long run. Learning happens online.

Markov Decision Processes (MDPs) RL problems can be formulated as Markov Decision Processes (MDPs). An MDP is a tuple (S, A, R, P): a state space S, an action space A, a reward function R(s, a), and a state-transition function P(s'|s, a), which gives the probability of making a transition from state s to state s' when taking action a. By a model, we mean the reward function and the state-transition function.
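
To make the tuple concrete, here is a minimal sketch of an MDP container in Python; the field names and the toy two-state chain are illustrative assumptions, not from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class MDP:
    """Minimal MDP container for the tuple (S, A, R, P)."""
    states: List[str]                                    # state space S
    actions: List[str]                                   # action space A
    reward: Callable[[str, str], float]                  # R(s, a)
    transition: Dict[Tuple[str, str], Dict[str, float]]  # P(s' | s, a)

# Example: a two-state chain. "go" from "start" reaches "goal"
# with probability 0.9 and stays put otherwise.
mdp = MDP(
    states=["start", "goal"],
    actions=["go", "stay"],
    reward=lambda s, a: 1.0 if s == "goal" else 0.0,
    transition={
        ("start", "go"):   {"goal": 0.9, "start": 0.1},
        ("start", "stay"): {"start": 1.0},
        ("goal", "go"):    {"goal": 1.0},
        ("goal", "stay"):  {"goal": 1.0},
    },
)
```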

Maze world problem

Traditional RL Solution Methods Dynamic Programming (DP), Monte Carlo (MC) methods, and Temporal Difference (TD) methods. All of these methods are based on estimating the value function under a given policy: the value of a state is the total amount of reward an agent can expect to accumulate starting from that state.
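
In symbols (a standard definition, not spelled out on the slide), the value of state s under a policy π is the expected discounted return:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_0 = s \right]
```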

Maze world problem The values of states near the goal should be greater than those of states far from the goal.

Temporal Difference (TD) Methods Learn directly from experience and bootstrap: update each estimate based on other learned estimates, with no need for a model. Updating rule: V(s_t) ← V(s_t) + α_t δ_t, where δ_t = r_{t+1} + γ V(s_{t+1}) − V(s_t) is the temporal difference, α_t is a time-dependent learning rate, and γ is the discounting rate.
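
A minimal tabular TD(0) sketch of this update, assuming a dictionary-backed value table (function and variable names are illustrative):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular TD(0) step: V(s) <- V(s) + alpha * delta."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)  # temporal difference
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

# Usage: update the value of "start" after observing reward 0.0
# and a transition into "goal".
V = {}
td0_update(V, s="start", r=0.0, s_next="goal")
```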

TD Method (With Optimistic Policy Iteration)

Policy Learned by TD method (After 100 trials)

Gaussian Processes (GPs) A Bayesian approach: provides a full posterior over values, not just point estimates, and forces us to make our assumptions explicit. Non-parametric: priors are placed, and inference is performed, directly in function space (via kernels). Domain knowledge can be intuitively encoded in the priors.

Gaussian Processes (GPs) "An indexed set of jointly Gaussian random variables." The index set X may be just about any set. A Gaussian process F is specified by its mean m(x) = E[F(x)] and its kernel (covariance) function k(x, x') = Cov[F(x), F(x')], which is symmetric and positive definite (a Mercer kernel).

Conditioning Theorem
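
For reference, this is the standard conditioning identity for jointly Gaussian random vectors: if X and Y are jointly Gaussian, then X given Y = y is again Gaussian with

```latex
\mathbb{E}[X \mid Y = y] = \mu_X + \Sigma_{XY}\Sigma_{YY}^{-1}(y - \mu_Y),
\qquad
\operatorname{Cov}[X \mid Y = y] = \Sigma_{XX} - \Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}.
```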

GP regression
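
A minimal GP-regression sketch with a Gaussian (RBF) kernel, computing the posterior mean and variance at test inputs; the noise level, length scale, and the sin(x) toy data are illustrative choices, not from the slides.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Gaussian (RBF) kernel matrix: k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * length_scale**2))

def gp_posterior(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)      # conditional mean
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)    # conditional covariance
    return mean, np.diag(cov)

# Usage: fit noisy samples of sin(x), then query two new inputs.
X = np.linspace(0.0, 5.0, 8)[:, None]
y = np.sin(X).ravel() + 0.05 * np.random.randn(8)
mu, var = gp_posterior(X, y, np.array([[2.5], [6.0]]))
```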

GPTD Methods Generative model for the rewards along the trajectory s_1, s_2, …, s_t: r(s_i) = V(s_i) − γ V(s_{i+1}) + N_i, so each observed reward is a noisy difference of successive state values. In compact form, R_{t−1} = H_t V_t + N_t, where H_t is the (t−1)×t bidiagonal matrix with 1 on the diagonal and −γ on the superdiagonal.

GPTD Methods Applying the conditioning theorem to this generative model yields a Gaussian posterior over the value function, with mean V̂_t(s) = k_t(s)^T H_t^T Q_t^{-1} r_{t−1} and variance p_t(s) = k(s, s) − k_t(s)^T H_t^T Q_t^{-1} H_t k_t(s), where Q_t = H_t K_t H_t^T + Σ_t, K_t is the kernel matrix over the visited states, and k_t(s) = (k(s_1, s), …, k(s_t, s))^T.
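
A batch sketch of this posterior computation, assuming independent reward noise of variance sigma2 on each transition; this is an illustrative reimplementation of the formulas above, not the author's code.

```python
import numpy as np

def gptd_posterior(kernel, states, rewards, query, gamma=0.95, sigma2=0.1):
    """Posterior mean/variance of V(query) given a trajectory s_1, ..., s_t.

    Model: r_i = V(s_i) - gamma * V(s_{i+1}) + noise, i.e. R = H V + N.
    """
    t = len(states)
    H = np.zeros((t - 1, t))
    for i in range(t - 1):                     # bidiagonal H: 1 and -gamma
        H[i, i], H[i, i + 1] = 1.0, -gamma
    K = np.array([[kernel(a, b) for b in states] for a in states])
    k_q = np.array([kernel(s, query) for s in states])
    Q = H @ K @ H.T + sigma2 * np.eye(t - 1)   # Q_t = H K H^T + Sigma
    mean = k_q @ H.T @ np.linalg.solve(Q, np.asarray(rewards, dtype=float))
    var = kernel(query, query) - k_q @ H.T @ np.linalg.solve(Q, H @ k_q)
    return mean, var

# Usage on 1-D states with a Gaussian kernel (illustrative parameters).
k = lambda a, b: np.exp(-(a - b) ** 2 / 2.0)
m, v = gptd_posterior(k, states=[0.0, 1.0, 2.0], rewards=[0.0, 1.0], query=1.5)
```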

Can we use the uncertainty information to help balance exploitation and exploration? New value function (improved GPTD): Ṽ_t(s) = V̂_t(s) + c σ_t(s), where σ_t(s) is the posterior standard deviation of the value estimate. The parameter c balances the importance of exploitation against exploration. From information theory, higher uncertainty means more information: visiting states with higher uncertainty yields a higher information gain, another kind of value for a state.
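
A sketch of greedy (OPI-style) action selection under this uncertainty-augmented value; the deterministic one-step lookahead successor(state, action) and all names are assumptions for illustration.

```python
def select_action(state, actions, successor, v_mean, v_std, c=1.0):
    """Pick the action maximizing mean value plus c times its standard deviation."""
    def score(a):
        s_next = successor(state, a)     # assumed deterministic lookahead
        return v_mean(s_next) + c * v_std(s_next)
    return max(actions, key=score)
```

With c = 0 this reduces to the purely greedy (exploitation-only) rule; a larger c pushes the agent toward states whose values are still uncertain.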

Experiment A Gaussian kernel is used: k(s, s') = exp(−||s − s'||² / (2σ²)), where ||s − s'|| is the Euclidean distance between states s and s'; adjacent states will therefore have similar values, as desired in the maze problem. Optimistic Policy Iteration (OPI) is used to determine the policy: take the action that leads to the highest expected return based on the current value estimate.

GPTD Method

Policy Learned by GPTD

GPTD for Multi-goal Maze

Policy Learned by GPTD

Improved GPTD

Policy Learned by Improved GPTD

Conclusion A Gaussian process provides, along with each estimate, a measure of how certain that estimate is. In these experiments, GPTD gives much better results than the traditional RL methods. The main contribution of the project is the proposal of one way to utilize the uncertainty to balance exploration and exploitation in RL; the experiments show its effectiveness.

References
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Secaucus, NJ, USA: Springer-Verlag New York, Inc.
Engel, Y.; Mannor, S.; and Meir, R. 2003. Bayes meets Bellman: The Gaussian process approach to temporal difference learning. International Conference on Machine Learning.
Engel, Y.; Mannor, S.; and Meir, R. 2005. Reinforcement learning with Gaussian processes. International Conference on Machine Learning.
Engel, Y. 2005. Algorithms and Representations for Reinforcement Learning. Ph.D. Dissertation, The Hebrew University of Jerusalem, Israel.
Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, New York, NY.
Russell, S. J., and Norvig, P. 2003. Artificial Intelligence: A Modern Approach (2nd Edition). Prentice Hall.
Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA.

Questions? Bayesian Reinforcement Learning with Gaussian Processes