Computational Modeling Lab
Wednesday 18 June 2003
Reinforcement Learning: an introduction, part 5
Ann Nowé
Based on Sutton and Barto

Planning and Learning

Model: anything the agent can use to predict how the environment will respond to its actions.
- Distribution model: a description of all possible outcomes and their probabilities, e.g., the transition probabilities and expected rewards for every state-action pair.
- Sample model: produces sample experiences, e.g., a simulation model.
Both types of models can be used to produce simulated experience. Often sample models are much easier to come by.
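A minimal Python sketch of the distinction (the toy MDP, its state names, and the function names are illustrative assumptions, not from the slides): a distribution model enumerates every outcome with its probability, while a sample model only draws one outcome per query.

```python
import random

# Hypothetical toy MDP: a distribution model lists every possible
# outcome of (state, action) as (probability, next_state, reward).
distribution_model = {
    ("s0", "go"): [(0.9, "s1", 0.0), (0.1, "s0", 0.0)],
    ("s1", "go"): [(1.0, "goal", 1.0)],
}

def sample_model(state, action):
    """A sample model: produces one sample experience per query."""
    outcomes = distribution_model[(state, action)]
    probs = [p for p, _, _ in outcomes]
    _, next_state, reward = random.choices(outcomes, weights=probs)[0]
    return next_state, reward
```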

Planning

Planning: any computational process that uses a model to create or improve a policy.
Planning in AI: state-space planning (e.g., heuristic search methods).
We take the following (unusual) view:
- all state-space planning methods involve computing value functions, either explicitly or implicitly;
- they all apply backups to simulated experience.
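Sutton & Barto illustrate this view with random-sample one-step tabular Q-planning: draw simulated experience from a sample model and apply an ordinary Q-learning backup to it. A minimal sketch, reusing the sample_model interface above (the parameter defaults are illustrative assumptions):

```python
import random
from collections import defaultdict

def q_planning_step(Q, sample_model, states, actions, alpha=0.1, gamma=0.95):
    """One step of random-sample one-step tabular Q-planning:
    a Q-learning backup applied to simulated, not real, experience."""
    s = random.choice(states)
    a = random.choice(actions)
    next_s, r = sample_model(s, a)                        # simulated experience
    target = r + gamma * max(Q[(next_s, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])             # value-function backup

# Usage: Q = defaultdict(float); call q_planning_step repeatedly.
```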

Planning

Two uses of real experience:
- model learning: to improve the model;
- direct RL: to directly improve the value function and policy.
Improving the value function and/or policy via a model is sometimes called indirect RL, model-based RL, or planning.

Direct vs. Indirect RL

Indirect methods:
- make fuller use of experience: get a better policy with fewer environment interactions.
Direct methods:
- simpler;
- not affected by bad models.
But they are very closely related and can be usefully combined: planning, acting, model learning, and direct RL can occur simultaneously and in parallel.

The Dyna-Q Algorithm

[Figure: the tabular Dyna-Q algorithm, with its steps labeled model learning, planning, and direct RL.]
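The figure is Sutton & Barto's tabular Dyna-Q. Below is a minimal Python rendering of it; the environment interface (env.reset(), env.step(), env.actions) is an assumption for illustration, and the model stores a single outcome per state-action pair, assuming a deterministic environment as in the book's tabular version.

```python
import random
from collections import defaultdict

def dyna_q(env, episodes, n_planning, alpha, gamma, epsilon):
    """Tabular Dyna-Q: direct RL + model learning + planning."""
    Q = defaultdict(float)          # Q[(s, a)]: action-value estimates
    model = {}                      # model[(s, a)] = (r, s'): learned model
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda a2: Q[(s, a2)])
            s2, r, done = env.step(s, a)
            # (a) direct RL: one-step Q-learning backup on real experience
            target = r + gamma * max(Q[(s2, a2)] for a2 in env.actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # (b) model learning: record the observed outcome
            model[(s, a)] = (r, s2)
            # (c) planning: n backups on simulated experience from the model
            for _ in range(n_planning):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                ptarget = pr + gamma * max(Q[(ps2, a2)] for a2 in env.actions)
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
            s = s2
    return Q
```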

Dyna-Q on a Simple Maze

Rewards are 0 until the goal is reached, when the reward is 1.
Action selection: ε-greedy, with ε = 0.1.
Step size α = 0.1, initial action values = 0, discount γ = 0.95.
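Plugged into the dyna_q sketch above, these settings would look as follows; the maze object and the episode/planning-step counts are illustrative assumptions (the experiment varies the number of planning steps per real step).

```python
# `maze` is a hypothetical environment exposing reset()/step()/actions,
# with reward 0 everywhere except 1 at the goal.
Q = dyna_q(maze, episodes=50,
           n_planning=50,          # varied in the experiment
           alpha=0.1, gamma=0.95, epsilon=0.1)
```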

Dyna-Q Snapshots: Midway in 2nd Episode

When the Model is Wrong: Blocking Maze

The changed environment is harder.

When the Model is Wrong: Shortcut Maze

The changed environment is easier.

What is Dyna-Q+?

Uses an "exploration bonus":
- Keeps track of time since each state-action pair was tried for real.
- An extra reward is added in planning for transitions caused by state-action pairs, related to how long ago they were tried: the longer unvisited, the more reward for visiting. The modeled reward r is replaced by r + κ√n, where r = modeled reward, n = number of time steps since the last real trial, and κ = a small weight factor.
- The agent therefore "plans" how to visit long-unvisited states.
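A sketch of how the bonus enters the planning backup, building on the dyna_q sketch above; the last_tried bookkeeping and the default value of kappa are assumptions for illustration.

```python
import math
import random

def dyna_q_plus_planning_step(Q, model, last_tried, t, actions,
                              alpha=0.1, gamma=0.95, kappa=1e-3):
    """Dyna-Q+ planning backup: the modeled reward r is replaced by
    r + kappa * sqrt(n), where n = time steps since (s, a) was last
    tried in the real environment (tracked in last_tried)."""
    (s, a), (r, next_s) = random.choice(list(model.items()))
    n = t - last_tried[(s, a)]
    bonus_reward = r + kappa * math.sqrt(n)   # long-unvisited pairs look better
    target = bonus_reward + gamma * max(Q[(next_s, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```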