Machine Learning Chapter 13. Reinforcement Learning

Slides:

Advertisements

Similar presentations

University Paderborn 07 January 2009 RG Knowledge Based Systems Prof. Dr. Hans Kleine Büning Reinforcement Learning.

Advertisements

Reinforcement Learning

10/29/01Reinforcement Learning in Games 1 Colin Cherry Oct 29/01.

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN © The MIT Press, Lecture.

Reinforcement Learning

Minimax and Alpha-Beta Reduction Borrows from Spring 2006 CS 440 Lecture Slides.

Probability CSE 473 – Autumn 2003 Henry Kautz. ExpectiMax.

Lehrstuhl für Informatik 2 Gabriella Kókai: Maschine Learning Reinforcement Learning.

ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

Università di Milano-Bicocca Laurea Magistrale in Informatica Corso di APPRENDIMENTO E APPROSSIMAZIONE Lezione 6 - Reinforcement Learning Prof. Giancarlo.

Reinforcement Learning

Machine LearningRL1 Reinforcement Learning in Partially Observable Environments Michael L. Littman.

INSTITUTO DE SISTEMAS E ROBÓTICA Minimax Value Iteration Applied to Robotic Soccer Gonçalo Neto Institute for Systems and Robotics Instituto Superior Técnico.

CMPUT 551 Analyzing abstraction and approximation within MDP/POMDP environment Magdalena Jankowska (M.Sc. - Algorithms) Ilya Levner (M.Sc - AI/ML)

Outline MDP (brief) –Background –Learning MDP Q learning Game theory (brief) –Background Markov games (2-player) –Background –Learning Markov games Littman’s.

Reinforcement Learning

Reinforcement Learning Mitchell, Ch. 13 (see also Barto & Sutton book on-line)

1 Machine Learning: Symbol-based 9d 9.0Introduction 9.1A Framework for Symbol-based Learning 9.2Version Space Search 9.3The ID3 Decision Tree Induction.

Reinforcement Learning Game playing: So far, we have told the agent the value of a given board position. How can agent learn which positions are important?

Reinforcement Learning (1)

International Graduate School of Dynamic Intelligent Systems Machine Learning RG Knowledge Based Systems Hans Kleine Büning 15 July 2015.

INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

CS Reinforcement Learning1 Reinforcement Learning Variation on Supervised Learning Exact target outputs are not given Some variation of reward is.

Reinforcement Learning

REINFORCEMENT LEARNING LEARNING TO PERFORM BEST ACTIONS BY REWARDS Tayfun Gürel.

Introduction Many decision making problems in real life

Reinforcement Learning

Reinforcement Learning 主講人：虞台文 Content Introduction Main Elements Markov Decision Process (MDP) Value Functions.

CPSC 7373: Artificial Intelligence Lecture 10: Planning with Uncertainty Jiang Bian, Fall 2012 University of Arkansas at Little Rock.

1 CS 4701 – Project Proposal Jane Park (jp624) Ran Zhao (rz54)

Reinforcement Learning Ata Kaban School of Computer Science University of Birmingham.

CHECKERS: TD(Λ) LEARNING APPLIED FOR DETERMINISTIC GAME Presented By: Presented To: Amna Khan Mis Saleha Raza.

Reinforcement Learning

Kansas State University Department of Computing and Information Sciences CIS 732: Machine Learning and Pattern Recognition Thursday 29 October 2002 William.

Class 2 Please read chapter 2 for Tuesday’s class (Response due by 3pm on Monday) How was Piazza? Any Questions?

Neural Networks Chapter 7

INTRODUCTION TO Machine Learning

CHAPTER 16: Reinforcement Learning. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 Introduction Game-playing:

Reinforcement Learning

CS 484 – Artificial Intelligence1 Announcements Homework 5 due Tuesday, October 30 Book Review due Tuesday, October 30 Lab 3 due Thursday, November 1.

1 Introduction to Machine Learning Chapter 1. cont.

Reinforcement Learning Dynamic Programming I Subramanian Ramamoorthy School of Informatics 31 January, 2012.

Reinforcement Learning AI – Week 22 Sub-symbolic AI Two: An Introduction to Reinforcement Learning Lee McCluskey, room 3/10

Markov Decision Process (MDP)

Possible actions: up, down, right, left Rewards: – 0.04 if non-terminal state Environment is observable (i.e., agent knows where it is) MDP = “Markov Decision.

Introduction to Reinforcement Learning Hiren Adesara Prof: Dr. Gittens.

Reinforcement Learning Guest Lecturer: Chengxiang Zhai Machine Learning December 6, 2001.

Reinforcement Learning. Overview Supervised Learning: Immediate feedback (labels provided for every input). Unsupervised Learning: No feedback (no labels.

REINFORCEMENT LEARNING Unsupervised learning 1. 2 So far ….  Supervised machine learning: given a set of annotated istances and a set of categories,

Chapter 5 Adversarial Search. 5.1 Games Why Study Game Playing? Games allow us to experiment with easier versions of real-world situations Hostile agents.

1 Markov Decision Processes Finite Horizon Problems Alan Fern * * Based in part on slides by Craig Boutilier and Daniel Weld.

Università di Milano-Bicocca Laurea Magistrale in Informatica Corso di APPRENDIMENTO AUTOMATICO Lezione 12 - Reinforcement Learning Prof. Giancarlo Mauri.

CS 5751 Machine Learning Chapter 13 Reinforcement Learning1 Reinforcement Learning Control learning Control polices that choose optimal actions Q learning.

Reinforcement Learning

Reinforcement learning

Reinforcement Learning

Markov Decision Processes

Planning to Maximize Reward: Markov Decision Processes

Markov Decision Processes

Reinforcement learning

Chapter 3: The Reinforcement Learning Problem

Dr. Unnikrishnan P.C. Professor, EEE

Learning to Maximize Reward: Reinforcement Learning

یادگیری تقویتی Reinforcement Learning

Chapter 3: The Reinforcement Learning Problem

Reinforcement Learning

Chapter 3: The Reinforcement Learning Problem

Reinforcement learning

Presentation transcript:

Machine Learning Chapter 13. Reinforcement Learning Tom M. Mitchell

Control Learning Consider learning to choose actions, e.g., Robot learning to dock on battery charger Learning to choose actions to optimize factory output Learning to play Backgammon Note several problem characteristics: Delayed reward Opportunity for active exploration Possibility that state only partially observable Possible need to learn multiple tasks with same sensors/effectors

One Example: TD-Gammon Learn to play Backgammon Immediate reward +100 if win -100 if lose 0 for all other states Trained by playing 1.5 million games against itself Now approximately equal to best human player

Reinforcement Learning Problem

Markov Decision Processes Assume finite set of states S set of actions A at each discrete time agent observes state st  S and chooses action at  A then receives immediate reward rt and state changes to st+1 Markov assumption : st+1 = (st, at ) and rt = r(st, at ) i.e., rt and st+1 depend only on current state and action functions  and r may be nondeterministic functions  and r not necessarily known to agent

Agent's Learning Task

Value Function

What to Learn

Q Function

Training Rule to Learn Q

Q Learning for Deterministic Worlds

Nondeterministic Case

Nondeterministic Case(Cont’)

Temporal Difference Learning

Temporal Difference Learning(Cont’)

Subtleties and Ongoing Research