R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Chapter 1: Introduction.


Slide 1: Introduction
(Figure: Reinforcement Learning (RL) at the intersection of Psychology, Artificial Intelligence, Control Theory and Operations Research, Artificial Neural Networks, and Neuroscience.)

Slide 2: What is Reinforcement Learning?
• Learning from interaction
• Goal-oriented learning
• Learning about, from, and while interacting with an external environment
• Learning what to do (how to map situations to actions) so as to maximize a numerical reward signal

Slide 3: Supervised Learning
(Diagram: Inputs -> Supervised Learning System -> Outputs)
Training info = desired (target) outputs
Error = (target output - actual output)

Slide 4: Reinforcement Learning
(Diagram: Inputs -> RL System -> Outputs ("actions"))
Training info = evaluations ("rewards" / "penalties")
Objective: get as much reward as possible
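
The contrast between these two diagrams can be stated in a few lines of code. The sketch below is illustrative only (the update rules and function names are my assumptions, not from the slides): the supervised learner is told what the output should have been and can correct toward it, while the RL learner is only told how good its chosen action turned out to be.

```python
# Supervised learning: training info is the desired (target) output,
# so the learner can form an explicit error and correct toward the target.
def supervised_update(weight, x, output, target, lr=0.1):
    error = target - output          # error = (target output - actual output)
    return weight + lr * error * x

# Reinforcement learning: training info is only an evaluation ("reward")
# of the action actually taken; nothing says what the right action was.
def evaluative_update(action_value, reward, lr=0.1):
    return action_value + lr * (reward - action_value)
```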

Slide 5: Key Features of RL
• Learner is not told which actions to take
• Trial-and-error search
• Possibility of delayed reward: sacrifice short-term gains for greater long-term gains (see the toy example below)
• The need to explore and exploit
• Considers the whole problem of a goal-directed agent interacting with an uncertain environment
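
A toy illustration of the delayed-reward point (the numbers are invented purely for illustration): an agent that grabs the immediate reward can end up with less total reward than one that accepts a short-term loss.

```python
# Two hypothetical four-step reward sequences.
greedy_now   = [1, 0, 0, 0]    # take +1 immediately, nothing afterwards
patient_plan = [-1, 0, 0, 5]   # pay a small cost now, collect +5 later

print(sum(greedy_now))     # 1
print(sum(patient_plan))   # 4 -> sacrificing short-term gain paid off
```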

Slide 6: Complete Agent
• Temporally situated
• Continual learning and planning
• Object is to affect the environment
• Environment is stochastic and uncertain
(Diagram: Agent -> action -> Environment -> state, reward -> Agent)
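
The diagram maps directly onto a simple program loop. Below is a minimal runnable sketch with throwaway stand-ins for the agent and environment; every class and method name here is an illustrative assumption, not an API from the book.

```python
import random

class Environment:
    """Toy stand-in: the state is just a step counter, rewards are random."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        reward = random.choice([0.0, 1.0])   # stochastic and uncertain
        done = self.t >= 10
        return self.t, reward, done          # next state, reward, end of episode

class Agent:
    """Toy stand-in: acts at random; a real agent would also learn and plan."""
    def act(self, state):
        return random.choice([0, 1])

env, agent = Environment(), Agent()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = agent.act(state)                # Agent --action--> Environment
    state, reward, done = env.step(action)   # Environment --state, reward--> Agent
    total_reward += reward
print("return:", total_reward)
```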

Slide 7: Elements of RL
• Policy: what to do
• Reward: what is good
• Value: what is good because it predicts reward
• Model: what follows what
(Diagram: Policy, with Reward, Value, and Model of environment beneath it.)
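
One way to make the four elements concrete is as four pieces of a program. This is only a schematic sketch; every name, signature, and number below is an assumption for illustration.

```python
GOAL = 5

def policy(state):
    """Policy: what to do -- maps each state to an action."""
    return +1 if state < GOAL else -1

def reward(state):
    """Reward: what is good -- an immediate scalar signal."""
    return 1.0 if state == GOAL else 0.0

V = {}   # Value: what is good because it predicts reward
         # (estimated long-term reward obtainable from each state)

def model(state, action):
    """Model: what follows what -- predicts the next state, enabling planning."""
    return state + action
```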

Slide 8: An Extended Example: Tic-Tac-Toe
(Figure: a game tree of tic-tac-toe positions, alternating x's moves and o's moves down the tree.)
Assume an imperfect opponent: he/she sometimes makes mistakes.

Slide 9: An RL Approach to Tic-Tac-Toe
1. Make a table with one entry per state, V(s) = estimated probability of winning: 1 for a win, 0 for a loss, 0 for a draw, and 0.5 for every other state.
2. Now play lots of games. To pick our moves, look ahead one step from the current state to the various possible next states:
   • Just pick the next state with the highest estimated probability of winning, the largest V(s): a greedy move.
   • But 10% of the time pick a move at random: an exploratory move.
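
A compact sketch of steps 1 and 2. The board encoding and helper names are my assumptions; the initial values and the 10% exploration rate are from the slide. States are strings of nine characters drawn from 'x', 'o', and ' ', and we play x.

```python
import random

V = {}  # state -> estimated probability of winning

LINES = [(0,1,2), (3,4,5), (6,7,8),      # rows
         (0,3,6), (1,4,7), (2,5,8),      # columns
         (0,4,8), (2,4,6)]               # diagonals

def wins(state, player):
    return any(all(state[i] == player for i in line) for line in LINES)

def value(state):
    """Step 1: 1 if x has won, 0 for a loss or a draw, 0.5 otherwise."""
    if state not in V:
        if wins(state, 'x'):
            V[state] = 1.0
        elif wins(state, 'o') or ' ' not in state:
            V[state] = 0.0
        else:
            V[state] = 0.5
    return V[state]

def next_states(state):
    """All positions reachable by one x move."""
    return [state[:i] + 'x' + state[i+1:] for i, c in enumerate(state) if c == ' ']

def pick_move(state, explore=0.1):
    """Step 2: greedy move (largest V(s)), but explore 10% of the time."""
    options = next_states(state)
    if random.random() < explore:
        return random.choice(options)    # exploratory move
    return max(options, key=value)       # greedy move
```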

Slide 10: RL Learning Rule for Tic-Tac-Toe
(Figure: a backup diagram of a sequence of moves; after each greedy move, the value of the earlier state is moved toward the value of the later state, while "exploratory" moves are not backed up.)
The update, as given in Chapter 1 of the book: V(s) <- V(s) + alpha * [V(s') - V(s)], where s is the state before a greedy move, s' the state after it, and alpha a small positive step-size parameter.
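
In code, the backup after a greedy move is one line; exploratory moves are simply not backed up. This continues the sketch above, where value() has already seeded V with entries for both states; the particular value of the step size is my arbitrary choice.

```python
ALPHA = 0.1   # step-size parameter (value chosen for illustration)

def backup(V, s, s_next):
    """V(s) <- V(s) + ALPHA * [V(s') - V(s)]: move the value of the earlier
    state part of the way toward the value of the later state."""
    V[s] += ALPHA * (V[s_next] - V[s])
```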

Slide 11: How can we improve this T.T.T. player?
• Take advantage of symmetries: representation/generalization. How might this backfire?
• Do we need "random" moves? Why? Do we always need a full 10%?
• Can we learn from "random" moves?
• Can we learn offline? Pre-training from self-play? Using learned models of the opponent?
• ...

Slide 12: e.g., Generalization
(Figure: a table mapping each state s_1, ..., s_N to a value V, contrasted with a generalizing function approximator that is trained on only part of the state space ("train here") yet produces estimates for all of it.)
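
The figure's point in code: replace the table with a parametrized function, so that training on some states also changes the estimates for states never visited. A minimal linear sketch, where the feature encoding is a deliberately crude assumption of mine:

```python
def features(state):
    """A crude encoding of a tic-tac-toe board string (illustrative only)."""
    return [1.0, state.count('x') / 9.0, state.count('o') / 9.0]

w = [0.0, 0.0, 0.0]   # one weight vector shared by ALL states

def v_hat(state):
    """Approximate value: a linear function of the features."""
    return sum(wi * xi for wi, xi in zip(w, features(state)))

def train(state, target, lr=0.1):
    """Gradient-style update: nudging w changes v_hat everywhere, not just
    at the trained state -- that is the generalization."""
    error = target - v_hat(state)
    for i, xi in enumerate(features(state)):
        w[i] += lr * error * xi
```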


Slide 14: How is Tic-Tac-Toe Too Easy?
• Finite, small number of states
• One-step look-ahead is always possible
• State completely observable
• ...

Slide 15: Some Notable RL Applications
• TD-Gammon (Tesauro): world's best backgammon program
• Elevator control (Crites & Barto): high-performance down-peak elevator controller
• Inventory management (Van Roy, Bertsekas, Lee & Tsitsiklis): 10-15% improvement over industry-standard methods
• Dynamic channel assignment (Singh & Bertsekas; Nie & Haykin): high-performance assignment of radio channels to mobile telephone calls

Slide 16: TD-Gammon (Tesauro, 1992-1995)
• Start with a random network
• Play very many games against self
• Learn a value function from this simulated experience
• Action selection by 2-3 ply search
• This produces arguably the best player in the world
(Figure: a neural network mapping a backgammon position to a value, trained on the TD error.)

Slide 17: Elevator Dispatching (Crites and Barto, 1996)
• 10 floors, 4 elevator cars
• States: button states; positions, directions, and motion states of cars; passengers in cars and in halls (conservatively, about 10^22 states)
• Actions: stop at, or go by, the next floor
• Rewards: roughly, -1 per time step for each person waiting

Slide 18: Performance Comparison
(Figure only.)

Slide 19: Some RL History
(Timeline figure with three threads:)
• Trial-and-error learning: Thorndike (1911), Minsky, Klopf, Barto et al., Holland
• Temporal-difference learning: secondary reinforcement, Samuel, Witten, Sutton
• Optimal control, value functions: Hamilton (physics, 1800s), Shannon, Bellman/Howard (OR), Werbos, Watkins

Slide 20: MENACE (Michie, 1961)
"Matchbox Educable Noughts and Crosses Engine"

Slide 21: The Book
• Part I: The Problem
  - Introduction
  - Evaluative Feedback
  - The Reinforcement Learning Problem
• Part II: Elementary Solution Methods
  - Dynamic Programming
  - Monte Carlo Methods
  - Temporal-Difference Learning
• Part III: A Unified View
  - Eligibility Traces
  - Generalization and Function Approximation
  - Planning and Learning
  - Dimensions of Reinforcement Learning
  - Case Studies

Slide 22: The Course
• One chapter per week (with some exceptions)
• Read the chapter for the first class devoted to that chapter
• Written homeworks: basically all the non-programming assignments in each chapter; due the second class on that chapter
• Programming exercises (not projects!): each student will do approximately 3 of these, including one of their own devising (in consultation with the instructor and/or TA)
• Closed-book, in-class midterm; closed-book 2-hour final
• Grading: 40% written homeworks, 30% programming homeworks, 15% final, 10% midterm, 5% class participation
• See the web for more details

Slide 23: Next Class
• Introduction continued, and some case studies
• Read Chapter 1
• Hand in exercises 1.1 through 1.5