NIPS 2007 Workshop
Hierarchical organization of behavior

Welcome!
- Thank you for coming
- Apologies to the skiers…
- Why we will be strict about timing
- Why we want the workshop to be interactive

RL: Decision making
Goal: maximize reward (minimize punishment)
- Rewards/punishments may be delayed
- Outcomes may depend on a sequence of actions
  ⇒ the credit assignment problem

RL in a nutshell: formalization
Components of an RL task: states - actions - transitions - rewards - policy - long-term values
- Policy: π(S,a)
- State values: V(S)
- State-action values: Q(S,a)
[Figure: three-state example task (S1, S2, S3) with actions L and R]
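A minimal Python sketch of these components, with the state layout and reward values reconstructed from the deck's three-state example (the cached Q-values on a later slide); treat the exact numbers as assumptions:

```python
# States, actions, transitions, rewards and a policy for the slides'
# three-state example, encoded as plain dictionaries.

STATES = ["S1", "S2", "S3", "end"]
ACTIONS = ["L", "R"]

# T[state][action] -> next state (deterministic here for simplicity)
T = {
    "S1": {"L": "S2", "R": "S3"},
    "S2": {"L": "end", "R": "end"},
    "S3": {"L": "end", "R": "end"},
}

# R[state][action] -> immediate reward (values assumed from the slides)
R = {
    "S1": {"L": 0, "R": 0},
    "S2": {"L": 4, "R": 0},
    "S3": {"L": 2, "R": 2},
}

# A policy maps each state to action probabilities; start with a uniform one.
policy = {s: {a: 1.0 / len(ACTIONS) for a in ACTIONS} for s in T}
```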

RL in a nutshell: forward search
Model-based RL
- Model = T(ransitions) and R(ewards)
- Learn the model through experience (a 'cognitive map')
- Choosing actions is hard
- Goal-directed behavior; cortical
[Figure: forward-search tree over S1, S2, S3 with actions L/R and outcome values 4, 0, 2]
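A minimal sketch of forward search over such a model, assuming the `T` and `R` dictionaries from the sketch above:

```python
def forward_search_value(state, T, R):
    """Model-based evaluation: search forward through the learned model to
    find the best achievable total reward from 'state' (no discounting,
    finite horizon, as in the slide's small tree)."""
    if state not in T:            # terminal state: nothing more to collect
        return 0.0
    return max(R[state][a] + forward_search_value(T[state][a], T, R)
               for a in T[state])

def forward_search_action(state, T, R):
    """Choosing an action requires a full search below each candidate."""
    return max(T[state],
               key=lambda a: R[state][a] + forward_search_value(T[state][a], T, R))

# e.g. forward_search_value("S1", T, R) -> 4.0
#      forward_search_action("S1", T, R) -> "L"
```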

RL in a nutshell: cached values
Model-free RL: temporal difference learning
Trick #1: Long-term values are recursive
   Q(S,a) = r(S,a) + V(S_next) = r(S,a) + max_a' Q(S',a')
TD learning: start with an initial (wrong) Q(S,a)
   PE = r(S,a) + max_a' Q(S',a') - Q(S,a)
   Q_new(S,a) = Q_old(S,a) + η·PE   (η: learning rate)
[Figure: the three-state example task with actions L and R]
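A minimal sketch of this TD update (Q-learning flavour). The model dictionaries from the earlier sketch are used only to generate experience here; the learning rate and episode count are arbitrary choices:

```python
import random
from collections import defaultdict

def td_update(Q, s, a, r, s_next, lr=0.1):
    """One temporal-difference update, as on the slide:
    PE = r(S,a) + max_a' Q(S',a') - Q(S,a);  Q(S,a) <- Q(S,a) + lr * PE."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    pe = r + best_next - Q[s][a]          # prediction error
    Q[s][a] += lr * pe
    return pe

# Q starts out 'wrong' (all zeros) and improves from experienced transitions.
Q = defaultdict(lambda: defaultdict(float))

# Learn from repeated episodes (in true model-free learning the transitions
# would come from the world itself, not from a stored model).
for _ in range(200):
    s = "S1"
    while s in T:
        a = random.choice(list(T[s]))     # explore randomly
        s_next = T[s][a]
        td_update(Q, s, a, R[s][a], s_next)
        s = s_next

# Q now approaches the cached values on the next slide, e.g. Q["S1"]["L"] ≈ 4.
```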

RL in a nutshell: cached values
Model-free RL: temporal difference learning
Trick #2: Can learn values without a model
- Choosing actions is easy (but needs lots of practice to learn)
- Habitual behavior; basal ganglia
Cached values for the example task:
   Q(S1,L) = 4   Q(S1,R) = 2
   Q(S2,L) = 4   Q(S2,R) = 0
   Q(S3,L) = 2   Q(S3,R) = 2
[Figure: the three-state task with rewards 4, 0, 2, 2 and actions L/R]
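With cached values, action selection reduces to a table lookup; a short sketch using the Q table learned in the TD example above:

```python
def greedy_action(Q, s):
    """No forward search needed: just pick the action with the highest
    cached Q(S,a)."""
    return max(Q[s], key=Q[s].get)

# e.g. with the learned table above: greedy_action(Q, "S1") -> "L"
```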

RL in real-world tasks…
Model-based vs. model-free learning and control
[Figure: the forward-search tree and the cached Q-value table for the same example task, side by side]
Scaling problem! Both approaches become infeasible as the number of states and actions grows.

Hierarchical RL: What is it?
Real-world behavior is hierarchical
Example: taking a shower
   1. set water temp  2. get wet  3. shampoo  4. soap  5. turn off water  6. dry off
   'set water temp' is itself a routine: add hot / add cold / wait 5sec, depending on whether the water is too cold, too hot, or just right (success)
Example: making coffee
   1. pour coffee  2. add sugar  3. add milk  4. stir
Benefits: simplified control, disambiguation, encapsulation

Hierarchical RL: What is it?
HRL: (in)formal framework
Options - skills - macros - temporally abstract actions (Sutton, McGovern, Dietterich, Barto, Precup, Singh, Parr…)
An option is defined by an initiation set, an option policy, and termination conditions
- Termination condition = (sub)goal state
- Option policy learning: via pseudo-reward (model-based or model-free)
Example option: set water temperature
[Figure: the option's initiation set (S1, S2, …, S8), its stochastic policy, and per-state termination probabilities, e.g. S1 (0.1), S2 (0.1), S3 (0.9)]
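A minimal sketch of an option as a data structure with these three parts. The 'set water temperature' instance is illustrative only; its states, actions and probabilities are read off the partly garbled figure and should be treated as assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

@dataclass
class Option:
    """A temporally abstract action: where it may start, what it does
    while running, and when it stops."""
    name: str
    initiation_set: Set[str]              # states where the option can be invoked
    policy: Callable[[str], str]          # state -> primitive action while running
    termination_prob: Dict[str, float]    # state -> probability of terminating there

    def can_start(self, state: str) -> bool:
        return state in self.initiation_set

# Illustrative 'set water temperature' option (names and numbers assumed).
set_water_temp = Option(
    name="set water temperature",
    initiation_set={"S1", "S2", "S8"},
    policy=lambda s: "add_hot" if s == "S1" else "add_cold" if s == "S2" else "wait_5sec",
    termination_prob={"S1": 0.1, "S2": 0.1, "S3": 0.9},
)
```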

Hierarchical RL: What is it?
HRL: a toy example
[Figure: gridworld with rooms and doors; S: start, G: goal]
Options: going to the doors
Actions: primitive movements + the 2 door options
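As a sketch of how a door option would be used alongside primitive actions, here is one way to execute a temporally abstract action until its termination condition fires (the SMDP view). `env_step` is a hypothetical environment interface, and the Option structure is the one sketched above:

```python
import random

def run_option(env_step, state, option, gamma=0.9):
    """Run an option's internal policy until it terminates, accumulating
    discounted reward; returns (final_state, total_reward, steps_taken).
    env_step(state, action) -> (next_state, reward) is assumed."""
    total, discount, steps = 0.0, 1.0, 0
    while True:
        action = option.policy(state)
        state, reward = env_step(state, action)
        total += discount * reward
        discount *= gamma
        steps += 1
        if random.random() < option.termination_prob.get(state, 0.0):
            return state, total, steps
```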

Hierarchical RL: What is it?
Advantages of HRL
1. Faster learning (mitigates the scaling problem)
2. Transfer of knowledge from previous tasks (generalization, shaping)
   RL: no longer 'tabula rasa'

Hierarchical RL: What is it?
Disadvantages (or: the cost) of HRL
1. Need the 'right' options - how to learn them?
2. Suboptimal behavior ("negative transfer"; habits)
3. More complex learning/control structure
…no free lunches