Computational Modeling Lab: Reinforcement Learning, an introduction, part 4. Ann Nowé. Wednesday 18 June 2003. Based on Sutton and Barto.

Computational Modeling Lab: Backup diagrams in DP. [Figure: backup diagrams. The state-value function for policy π, V(s), backs up from a state over the available actions and their successor states s1, s2, ..., combining values such as V(s1), V(s2). The action-value function for policy π, Q(s,a), backs up from a state-action pair over successor states and their actions, combining values such as Q(s1,a1), Q(s1,a2), Q(s2,a1), Q(s2,a2).]
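These diagrams depict the Bellman expectation equations. In the notation of Sutton and Barto (1998), a standard statement (added here for reference, not verbatim from the slide) is:

    V^\pi(s)   = \sum_a \pi(s,a) \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V^\pi(s') \right]
    Q^\pi(s,a) = \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma \sum_{a'} \pi(s',a') Q^\pi(s',a') \right]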

Computational Modeling Lab: Dynamic Programming, model based. [Figure: full-width backup tree expanding every action and every possible successor state, down to terminal states T. A DP backup averages over all possible one-step transitions, which requires a model of the environment.]

Computational Modeling Lab: Recall Value Iteration in DP. In Q-form, the value iteration backup is

    Q_{k+1}(s,a) = \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma \max_{a'} Q_k(s',a') \right]
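A minimal tabular sketch of this Q-form value iteration, assuming the model is given as NumPy arrays P[s,a,s'] and R[s,a,s'] (the array layout and function name are illustrative assumptions, not from the slides):

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-6):
        # P[s, a, s2]: transition probabilities; R[s, a, s2]: expected rewards
        n_states, n_actions, _ = P.shape
        Q = np.zeros((n_states, n_actions))
        while True:
            # Full backup: expectation over all successors, max over next actions
            Q_new = np.einsum('ijk,ijk->ij', P, R + gamma * Q.max(axis=1))
            if np.abs(Q_new - Q).max() < tol:
                return Q_new
            Q = Q_new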

Computational Modeling Lab: RL, model free. [Figure: backup along a single sampled trajectory ending in a terminal state T. A sample backup uses only experienced transitions, so no model of the environment is needed.]

Computational Modeling Lab: Q-Learning, a value iteration approach. Q-learning is the sample-based counterpart of value iteration: after each observed transition (s_t, a_t, r_{t+1}, s_{t+1}) it updates

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]

Q-learning is off-policy: the max in the target makes the learned values independent of the policy that generates the behavior.
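A minimal tabular Q-learning loop as a sketch; the env.reset()/env.step() interface and all names are illustrative assumptions:

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy behavior policy
                a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
                s2, r, done = env.step(a)
                # off-policy target: best next action, not necessarily the one taken next
                target = r if done else r + gamma * Q[s2].max()
                Q[s, a] += alpha * (target - Q[s, a])
                s = s2
        return Q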

Computational Modeling Lab: Example. [Figure: a small example MDP with nodes labeled a, b, c, d and transition rewards including R=4, R=5, R=2, R=1, R=10.] Observed episodes: Epoch 1: 1,2,4; Epoch 2: 1,6; Epoch 3: 1,3; Epoch 4: 1,2,5; Epoch 6: 2,5.

Computational Modeling Lab: Some convergence issues. Q-learning is guaranteed to converge in a Markovian setting. Tsitsiklis, J.N., Asynchronous Stochastic Approximation and Q-learning. Machine Learning, Vol. 16, 1994.
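The guarantee relies on the standard stochastic-approximation conditions (stated here for completeness): every state-action pair is updated infinitely often, and the learning rates satisfy the Robbins-Monro conditions

    \sum_{t=0}^{\infty} \alpha_t(s,a) = \infty, \qquad \sum_{t=0}^{\infty} \alpha_t(s,a)^2 < \infty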

Computational Modeling Lab: Proof by Tsitsiklis, cont. On the convergence of Q-learning.

Computational Modeling Lab: Proof by Tsitsiklis. On the convergence of Q-learning: the update is cast as an asynchronous stochastic approximation of the form

    q_{t+1}(i) = (1 - \alpha_t(i))\, q_t(i) + \alpha_t(i) \left[ F_i(q^{\tau}_t) + w_t(i) \right]

where α_t(i) is the learning factor, F is a contraction mapping, w_t(i) is a noise term, and q^τ_t is the q vector, but with possibly outdated components; each component q(i) corresponds to one value Q(s,a).

Computational Modeling Lab: Proof by Tsitsiklis, cont. [Figure: stochastic approximation viewed as a vector process over time t. Each component q_i, q_j is pulled toward its target F_i, while the actual update follows F_i + noise.]

Computational Modeling Lab: Proof by Tsitsiklis, cont. Relating Q-learning to stochastic approximation: the i-th component corresponds to one state-action pair (s,a); the contraction mapping is the Bellman operator; the noise term is the deviation of the sampled reward and next-state value from their expectation; and the learning factor can vary in time.
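Concretely (a standard identification, added for reference), the contraction mapping is the Bellman optimality operator

    (FQ)(s,a) = \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma \max_{a'} Q(s',a') \right]

which is a γ-contraction in the max norm, so the conditions of the stochastic-approximation theorem apply.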

Computational Modeling Lab: Sarsa: On-Policy TD Control. Sarsa updates toward the value of the action actually selected next:

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]

When is Sarsa = Q-learning? When the behavior policy is greedy with respect to Q, since then a_{t+1} attains the max in the Q-learning target.

Computational Modeling Lab: Q-Learning versus Sarsa. Q-learning is off-policy: its target uses max_a Q(s_{t+1}, a), independently of the action the agent actually takes next. Sarsa is on-policy: its target uses Q(s_{t+1}, a_{t+1}) for the action actually selected, so the learned values reflect the exploring behavior policy.
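The difference is only in the target; a side-by-side sketch (names illustrative):

    def q_learning_target(Q, r, s_next, gamma):
        # off-policy: best next action, whatever the behavior policy does
        return r + gamma * Q[s_next].max()

    def sarsa_target(Q, r, s_next, a_next, gamma):
        # on-policy: the next action actually chosen; equals the Q-learning
        # target whenever a_next is greedy with respect to Q
        return r + gamma * Q[s_next, a_next]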

Computational Modeling Lab: Cliff Walking example. Actions: up, down, left, right. Reward: -100 for stepping into the cliff, 0 at the goal, -1 by default. Action selection: ε-greedy with ε = 0.1. Sarsa takes the exploration into account and therefore learns the longer but safer path away from the cliff edge.
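The ε-greedy selector used here, as a minimal sketch (names illustrative):

    import numpy as np

    def epsilon_greedy(Q, s, epsilon=0.1, rng=np.random.default_rng()):
        # with probability epsilon explore uniformly, otherwise exploit argmax Q
        if rng.random() < epsilon:
            return int(rng.integers(Q.shape[1]))
        return int(Q[s].argmax())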

Computational Modeling Lab: Q-learning for CAC (Call Admission Control). [Figure: states are occupancy vectors over two call classes, e.g. s1 = (2,4), s2 = (3,4), s3 = (3,3); in each state the Q-values Q(s1,A1), Q(s1,R1), Q(s3,A2), Q(s3,R2) score Accepting or Rejecting an arriving call of Class 1 or Class 2.] Acceptance criterion: maximize network revenue.
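A hypothetical sketch of how such an admission decision could be taken from a learned table; the capacity bound, state encoding, and names below are assumptions for illustration, not from the slides:

    import numpy as np

    CAP = 5  # assumed maximum number of simultaneous calls per class
    # Q[n_class1, n_class2, arriving_class, action], action 0 = reject, 1 = accept
    Q = np.zeros((CAP + 1, CAP + 1, 2, 2))

    def admit(state, arriving_class, epsilon=0.1, rng=np.random.default_rng()):
        # epsilon-greedy accept/reject decision for an arriving call
        n1, n2 = state
        if rng.random() < epsilon:
            return int(rng.integers(2))
        return int(Q[n1, n2, arriving_class].argmax())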

Computational Modeling Lab: Continuous Time Q-learning for CAC. [Figure: event timeline. A call arrival at t0 = 0 with system state x; call departures and further arrivals at t1, t2, ..., tn, with the system state changing, e.g. from x to y, at event times.] [Bratke]
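In continuous time the update must discount over the random sojourn time τ between events. A standard semi-Markov Q-learning update, following Bradtke and Duff (1995), presumably the slide's [Bratke] reference, is

    Q(x,a) \leftarrow Q(x,a) + \alpha \left[ \frac{1 - e^{-\beta\tau}}{\beta}\, r + e^{-\beta\tau} \max_{a'} Q(y,a') - Q(x,a) \right]

where β > 0 is the continuous-time discount rate, r the reward rate accrued during the sojourn, and y the state observed at the next event.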