Chapter 1: Introduction


Chapter 1: Introduction

[Diagram: Reinforcement Learning (RL) at the intersection of Artificial Intelligence, Control Theory and Operations Research, Psychology, Neuroscience, and Artificial Neural Networks]

R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction

What is Reinforcement Learning?
- Learning from interaction
- Goal-oriented learning
- Learning about, from, and while interacting with an external environment
- Learning what to do (how to map situations to actions) so as to maximize a numerical reward signal

Supervised Learning

[Diagram: Inputs → Supervised Learning System → Outputs]
Training info = desired (target) outputs
Error = (target output – actual output)

Reinforcement Learning

[Diagram: Inputs → RL System → Outputs ("actions")]
Training info = evaluations ("rewards" / "penalties")
Objective: get as much reward as possible

Key Features of RL
- Learner is not told which actions to take
- Trial-and-error search
- Possibility of delayed reward: sacrifice short-term gains for greater long-term gains
- The need to explore and exploit
- Considers the whole problem of a goal-directed agent interacting with an uncertain environment

Complete Agent
- Temporally situated
- Continual learning and planning
- Object is to affect the environment
- Environment is stochastic and uncertain

[Diagram: Agent ⇄ Environment loop; the agent sends an action, the environment returns a state and a reward]
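The agent-environment loop on this slide can be sketched in a few lines of Python. The environment and agent below (a coin-guessing toy and a random policy) are invented stand-ins for illustration; the point is only the shape of the interaction: the agent acts, the environment returns a reward and the next state.

```python
import random

class CoinEnv:
    """Toy episodic environment: guess a coin flip for 10 steps; +1 reward per correct guess."""
    def reset(self):
        self.t = 0
        return 0  # a single dummy state

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == random.randint(0, 1) else 0.0
        done = self.t >= 10
        return 0, reward, done  # (next state, reward, episode over?)

class RandomAgent:
    def act(self, state):
        return random.randint(0, 1)

    def observe(self, reward, next_state):
        pass  # a learning agent would update its value estimates here

def run_episode(env, agent):
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = agent.act(state)               # agent acts on the environment...
        state, reward, done = env.step(action)  # ...environment returns reward and next state
        agent.observe(reward, state)
        total += reward
    return total

print(run_episode(CoinEnv(), RandomAgent()))
```

Because the environment is stochastic, the return varies from episode to episode; the objective is to accumulate as much reward as possible over time.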

Elements of RL
- Policy: what to do
- Reward: what is good
- Value: what is good because it predicts reward
- Model: what follows what

An Extended Example: Tic-Tac-Toe

[Diagram: game tree of board positions, alternating x's moves and o's moves down the levels]
Assume an imperfect opponent: he/she sometimes makes mistakes

An RL Approach to Tic-Tac-Toe

1. Make a table with one entry per state: V(s) = estimated probability of winning (e.g. .5 for states of unknown value; 1 for a win, 0 for a loss or a draw).
2. Now play lots of games. To pick our moves, look ahead one step from the current state to the various possible next states.

Just pick the next state with the highest estimated prob. of winning — the largest V(s); a greedy move. But 10% of the time pick a move at random; an exploratory move.
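The move-selection rule above is easy to sketch. Here `afterstates` is assumed to be the list of board states reachable by our legal moves and `V` a dict from state to estimated win probability; both names are illustrative, not from the slides.

```python
import random

def select_move(afterstates, V, epsilon=0.10, default=0.5):
    """Greedy on V(s) with probability 1 - epsilon; otherwise an exploratory random move."""
    if random.random() < epsilon:
        return random.choice(afterstates)                     # exploratory move
    return max(afterstates, key=lambda s: V.get(s, default))  # greedy move: largest V(s)

V = {"s1": 0.4, "s2": 0.9, "s3": 0.5}
print(select_move(["s1", "s2", "s3"], V, epsilon=0.0))  # → s2, the highest-valued afterstate
```

With `epsilon=0.10` the agent explores 10% of the time, matching the slide; unseen states fall back to the .5 default.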

RL Learning Rule for Tic-Tac-Toe

After each greedy move, back the value up: move the value of the earlier state a fraction of the way toward the value of the state after the move,

V(s) ← V(s) + α[V(s') − V(s)]

where α is a small step-size parameter. No backup is made after an "exploratory" move.
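This learning rule (Sutton & Barto's temporal-difference update for the tic-tac-toe player) is a one-liner in code. A minimal sketch, where `V` is the table of win-probability estimates and the state names are made up:

```python
def td_update(V, s, s_next, alpha=0.1, default=0.5):
    """Move V(s) a fraction alpha of the way toward V(s_next)."""
    v, v_next = V.get(s, default), V.get(s_next, default)
    V[s] = v + alpha * (v_next - v)

V = {"before": 0.5, "after": 1.0}   # "after" turned out to be a winning position
td_update(V, "before", "after", alpha=0.5)
print(V["before"])  # → 0.75: the earlier state now looks better
```

Repeated over many games, these small corrections propagate the value of wins and losses back through the states that led to them.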

How can we improve this T.T.T. player?
- Take advantage of symmetries: representation/generalization. How might this backfire?
- Do we need "random" moves? Why? Do we always need a full 10%?
- Can we learn from "random" moves?
- Can we learn offline? Pre-training from self-play? Using learned models of the opponent?
- ...

e.g. Generalization

[Diagram: a table with one value per state (states 1, 2, 3, ..., N) versus a generalizing function approximator; training on a few states ("train here") also changes the values assigned to similar, untrained states]
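The contrast on this slide can be sketched with a least-squares line standing in for the function approximator; the states and values below are made up for illustration.

```python
# Table: one independent value per state; training on states 1-3 says nothing about state 10.
table = {1: 0.2, 2: 0.4, 3: 0.6}
print(table.get(10))  # → None: an untrained state has no value at all

# Generalizing function approximator: a least-squares line fit to the same training data.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

V_hat = fit_line([1, 2, 3], [0.2, 0.4, 0.6])
print(round(V_hat(10), 2))  # → 2.0: training "here" also assigns values elsewhere
```

Note the extrapolated value 2.0 is not even a legal probability: generalization lets limited experience cover a large state set, but, as the previous slide asks, it can also backfire.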

How is Tic-Tac-Toe Too Easy?
- Finite, small number of states
- One-step look-ahead is always possible
- State completely observable
- ...

Some Notable RL Applications
- TD-Gammon (Tesauro): world's best backgammon program
- Elevator control (Crites & Barto): high-performance down-peak elevator controller
- Inventory management (Van Roy, Bertsekas, Lee & Tsitsiklis): 10–15% improvement over industry-standard methods
- Dynamic channel assignment (Singh & Bertsekas; Nie & Haykin): high-performance assignment of radio channels to mobile telephone calls

TD-Gammon (Tesauro, 1992–1995)
- Start with a random network
- Play very many games against self
- Learn a value function from this simulated experience (the TD error drives learning)
- Action selection by 2–3 ply search

This produces arguably the best player in the world.

Elevator Dispatching (Crites and Barto, 1996)
- 10 floors, 4 elevator cars
- States: button states; positions, directions, and motion states of cars; passengers in cars and in halls (conservatively about 10^22 states)
- Actions: stop at, or go by, next floor
- Rewards: roughly, –1 per time step for each person waiting

Performance Comparison

[Figure: performance comparison chart]

Some RL History

[Timeline diagram tracing three converging threads:]
- Trial-and-error learning: Thorndike (1911), secondary reinforcement, Minsky, Holland, Klopf, Barto et al.
- Optimal control, value functions: Hamilton (physics, 1800s), Bellman/Howard (OR), Werbos
- Temporal-difference learning: Shannon, Samuel, Witten, Sutton, Watkins

MENACE (Michie 1961): "Matchbox Educable Noughts and Crosses Engine"

[Diagram: matchboxes representing tic-tac-toe positions]

The Book

Part I: The Problem
- Introduction
- Evaluative Feedback
- The Reinforcement Learning Problem

Part II: Elementary Solution Methods
- Dynamic Programming
- Monte Carlo Methods
- Temporal Difference Learning

Part III: A Unified View
- Eligibility Traces
- Generalization and Function Approximation
- Planning and Learning
- Dimensions of Reinforcement Learning
- Case Studies