Staffan Järn

Reinforcement Learning
- An intelligent learning algorithm
- Doesn't require the presence of a teacher
- The algorithm is given a reward (a reinforcement) for good actions
- It tries to figure out the best action to take in a given state, without knowing the final optimal solution
- Actions are chosen on the basis of rewards and penalties

Applications
- Robot control
- Elevator scheduling (searching for patterns)
- Telecommunications (e.g. network routing)
- Games (chess, backgammon)
- Financial trading

The Cliff Walking problem
- Gridworld (4 x 12)
- The walker (the agent) is supposed to find the shortest or safest way to the finish, without falling into the cliff (the blue area)
- Falling into the cliff gives 100 penalty points, and the walker has to start over again
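A minimal sketch of this gridworld in Python, following the slide's description; the constant names and the exact cliff cells are my own assumptions, and the deck itself used Jonas Waller's Cliffwalker program rather than this code:

```python
# A 4 x 12 gridworld with the cliff along the bottom edge between the
# start and the goal, as described on the slide.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)                 # bottom-left and bottom-right corners
CLIFF = {(3, c) for c in range(1, 11)}        # the "blue area" between them
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action and return (next_state, cost). Walking into a wall
    leaves the walker in place; falling into the cliff costs 100 penalty
    points and sends the walker back to the start."""
    r = min(max(state[0] + action[0], 0), ROWS - 1)
    c = min(max(state[1] + action[1], 0), COLS - 1)
    if (r, c) in CLIFF:
        return START, 100.0
    return (r, c), 1.0  # every ordinary step costs 1
```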

Q-learning algorithm
- Uses a matrix, called the Q-matrix
- Size 48 x 4: one row for each of the 48 cells of the 12 x 4 gridworld, one column for each of the four directions
- The Q-matrix contains a "price" (a cost) for taking a certain action in a certain state
- It is initialized randomly in the beginning
- At every step the walker has two options (sketched in code below):
  - take the optimal action, i.e. the one with the smallest Q-value, or
  - explore the gridworld by taking a random step (it cannot walk into the wall)
- The Q-value is updated according to the update equation every time the walker takes an action
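The two options amount to what is commonly called an epsilon-greedy policy. A sketch continuing the gridworld code above, with the exploration rate eps as an assumed value:

```python
import random
import numpy as np

# 48 x 4 Q-matrix (one row per gridworld cell, one column per direction),
# initialized randomly as on the slide.
Q = np.random.rand(ROWS * COLS, len(ACTIONS))

def state_index(state):
    """Map a (row, col) cell to its row in the Q-matrix."""
    return state[0] * COLS + state[1]

def choose_action(state, eps=0.1):
    """With probability eps take a random, exploratory step; otherwise take
    the optimal action, i.e. the one with the smallest Q-value (the matrix
    stores costs, so smaller is better)."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    return int(np.argmin(Q[state_index(state)]))
```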

- The new value in the Q-matrix for the previous state and the previously taken action is a blend of what it was before, weighted by (1 − α), and the learning factor α times the sum of the cost of the step (usually 1, or 100 for the cliff) and the reward factor γ times the value of the best action the walker can take next:

Q(s, a) ← (1 − α) · Q(s, a) + α · (c + γ · min_a′ Q(s′, a′))

where Q(s, a) is the value for the previous state and action, c is the step cost, α (alpha) is the learning factor, γ (gamma) is the reward factor, and the minimum over a′ picks the best (cheapest) action in the new state s′.
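The same update in code, continuing the sketches above; the numeric values of ALPHA and GAMMA are illustrative choices, not taken from the deck:

```python
ALPHA, GAMMA = 0.1, 0.9  # learning factor and reward factor (illustrative values)

def q_learning_update(state, action, cost, next_state):
    """Blend the old value, weighted by (1 - alpha), with alpha times the
    step cost plus gamma times the value of the best (cheapest) action
    available in the next state."""
    s, ns = state_index(state), state_index(next_state)
    best_next = np.min(Q[ns])  # the optimal next action, not the one actually taken
    Q[s, action] = (1 - ALPHA) * Q[s, action] + ALPHA * (cost + GAMMA * best_next)
```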

SARSA algorithm
- Another way of updating the Q-matrix
- The update is based not on the next optimal move but on the next move actually taken
- This means it takes the risk of falling into the cliff into account, and it eventually arrives at a safer path
- Result: a longer, but safer path
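In code, the only change from the Q-learning sketch above is which next-state value the update bootstraps on:

```python
def sarsa_update(state, action, cost, next_state, next_action):
    """Identical to the Q-learning update except that it bootstraps on the
    next action actually taken (which may be an exploratory step near the
    cliff), rather than on the optimal next action."""
    s, ns = state_index(state), state_index(next_state)
    Q[s, action] = (1 - ALPHA) * Q[s, action] + ALPHA * (cost + GAMMA * Q[ns, next_action])
```

Because exploratory steps next to the cliff sometimes fall in, the cliff-adjacent cells accumulate high costs under SARSA, which is what pushes the learned path away from the edge.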

Fig 1) Q-learning, the 100th walk
Fig 2) Q-learning, optimal solution
Fig 3) SARSA, the 100th walk
Fig 4) SARSA, optimal solution
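For completeness, a minimal training loop of the kind that could produce walks like those in the figures, continuing the sketches above; the episode count and exploration rate are assumptions:

```python
def run_episode(use_sarsa=True, eps=0.1):
    """One complete walk from START to GOAL, learning as it goes."""
    state = START
    action = choose_action(state, eps)
    while state != GOAL:
        next_state, cost = step(state, ACTIONS[action])
        next_action = choose_action(next_state, eps)
        if use_sarsa:
            sarsa_update(state, action, cost, next_state, next_action)
        else:
            q_learning_update(state, action, cost, next_state)
        state, action = next_state, next_action

for walk in range(100):  # e.g. up to the "100th walk" shown in the figures
    run_episode(use_sarsa=True)
```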

[Figure: random steps over the cliff]

Sources
- Reinforcement Learning (PDF), Jonas Waller, 2005
- Cliffwalker program, Jonas Waller, 2005
- Sutton, R. S. and Barto, A. G., Reinforcement Learning: An Introduction