Reinforcement Learning with Laser Cats!
Marshall Wang and Maria Jahja. DTR Group Meeting, October 5, 2015.

“You might find this one-page proof complicated, but if you read this book, then the proof becomes easy.” (Eric B. Laber)
“Why do you want to climb Mount Everest?” “Because it’s there.” (George Mallory)

Why do we care about Laser Cats at all? It is a means to an end. Life is a game.

Outline:
- Laser Cats! Demo
- Reinforcement Learning
- Dynamic Programming
- Q-learning
- ...


I. The Problem: What is Reinforcement Learning?
- Learning through interaction with an environment
- Goal-oriented learning: maximize a numerical reward signal
- Learn from evaluation, not instruction
- Exploitation vs. exploration tradeoff
- Trial and error, delayed reward
- Example: "the usual" or "surprise me!"

Elements of RL
- Agent: learner and decision maker
- Environment: what the agent interacts with
- Policy: the agent's behavior at a given state
- Reward: numeric representation of desirability; immediate and given directly by the environment
- Value: total amount of reward expected; long-term, and must be estimated and re-estimated
- Model: mimics the behavior of the environment, for use in planning

Agent-Environment Interface
- Interaction occurs over discrete time steps t = 0, 1, 2, ...
- At each step the agent observes the state S_t and performs an action A_t
- The environment returns a reward R_{t+1} and the next state S_{t+1}
- The loop repeats: state -> action -> reward -> next state (see the sketch below)
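A minimal sketch of this interaction loop in Python; the Environment and Agent classes here are illustrative stand-ins, not the actual Laser Cats! code:

```python
import random

class Environment:
    """Toy stand-in for the game environment (hypothetical dynamics)."""
    def reset(self):
        return 0  # initial state

    def step(self, state, action):
        # Returns (reward, next_state, done); the dynamics here are arbitrary.
        next_state = (state + action) % 10
        reward = 1.0 if next_state == 0 else 0.0
        done = next_state == 0
        return reward, next_state, done

class Agent:
    """Agent that picks actions at random; a real agent would use a learned policy."""
    def act(self, state):
        return random.choice([0, 1])

env, agent = Environment(), Agent()
state = env.reset()
done = False
while not done:
    action = agent.act(state)                       # agent performs action A_t
    reward, state, done = env.step(state, action)   # environment returns R_{t+1} and S_{t+1}
```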

Policies
- The agent implements a mapping from each state-action pair to the probability of selecting that action in that state
- Policies change with experience
- Question: how do we find the optimal policy, the one that maximizes total reward?
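The deck does not say how actions are sampled from these probabilities; an epsilon-greedy rule over estimated action values is one common choice (an assumption, not necessarily what the authors used). A minimal sketch, assuming a tabular Q stored as a dict keyed by (state, action):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one.

    Q is assumed to be a dict mapping (state, action) -> estimated value.
    """
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```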

Value Functions
- Estimate how good it is for the agent to be in a given state.
- Return: G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ... = sum_{k>=0} gamma^k R_{t+k+1}
- The value (expected return) of a state under a policy: v_pi(s) = E_pi[ G_t | S_t = s ]
- The value of taking an action in a state under a policy: q_pi(s, a) = E_pi[ G_t | S_t = s, A_t = a ]
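As a quick worked illustration of the return G_t, a small helper (not from the original slides) that folds a finite reward sequence into its discounted sum:

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G_t = sum_k gamma^k * R_{t+k+1} for a finite reward sequence."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

print(discounted_return([0.0, 0.0, 1.0]))  # 0.81 with gamma = 0.9
```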

Optimal Value Functions
- There is always at least one policy that is better than or equal to all others; these optimal policies are denoted pi_*.
- They share the same optimal state-value and action-value functions: v_*(s) = max_pi v_pi(s) and q_*(s, a) = max_pi q_pi(s, a).
- q_*(s, a) is the expected return for taking action a in state s and following an optimal policy thereafter: q_*(s, a) = E[ R_{t+1} + gamma * v_*(S_{t+1}) | S_t = s, A_t = a ]

Bellman Optimality Equations
- Under an optimal policy, the value of a state equals the expected return for the best action from that state:
  v_*(s) = max_a E[ R_{t+1} + gamma * v_*(S_{t+1}) | S_t = s, A_t = a ] = max_a sum_{s', r} p(s', r | s, a) [ r + gamma * v_*(s') ]
  q_*(s, a) = E[ R_{t+1} + gamma * max_{a'} q_*(S_{t+1}, a') | S_t = s, A_t = a ]

Solving the Optimality Equations
- Solving them exactly requires a model of the environment, enough space and time for the computation, and the Markov property
- In practice we settle for approximations
- Many RL methods can be understood as approximate ways of solving the Bellman optimality equation
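The outline lists dynamic programming; value iteration is one standard DP method for solving the Bellman optimality equation when the model p(s', r | s, a) is known. A minimal sketch on a tiny made-up MDP (the transition table below is purely illustrative, not the Laser Cats! model):

```python
# Value iteration on a tiny, made-up 3-state MDP.
# P[(s, a)] is a list of (probability, next_state, reward) tuples.
P = {
    (0, 'stay'): [(1.0, 0, 0.0)],
    (0, 'go'):   [(0.8, 1, 0.0), (0.2, 0, 0.0)],
    (1, 'stay'): [(1.0, 1, 0.0)],
    (1, 'go'):   [(0.9, 2, 1.0), (0.1, 1, 0.0)],
    (2, 'stay'): [(1.0, 2, 0.0)],
    (2, 'go'):   [(1.0, 2, 0.0)],
}
states, actions, gamma = [0, 1, 2], ['stay', 'go'], 0.9

V = {s: 0.0 for s in states}
for _ in range(100):  # sweep until (approximately) converged
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[(s, a)])
                for a in actions)
         for s in states}

print(V)  # optimal state values under the made-up dynamics
```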

Temporal Difference Learning
- Recall the Bellman equation: v_pi(s) = E_pi[ R_{t+1} + gamma * v_pi(S_{t+1}) | S_t = s ]
- Temporal difference (TD) prediction moves the estimate toward a sampled target:
  V(S_t) <- V(S_t) + alpha * [ R_{t+1} + gamma * V(S_{t+1}) - V(S_t) ]
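A self-contained sketch of tabular TD(0) prediction under a fixed random policy, on the standard 5-state random-walk toy problem (a stand-in environment, not the Laser Cats! game):

```python
import random
from collections import defaultdict

def td0_prediction(episodes=500, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a 5-state random walk.

    States 1..5; stepping left from 1 or right from 5 terminates, and only the
    right exit pays reward 1. The policy is fixed: move left or right at random.
    """
    V = defaultdict(float)
    for _ in range(episodes):
        s = 3  # start in the middle
        while True:
            s_next = s + random.choice([-1, 1])
            if s_next == 0:
                r, done = 0.0, True
            elif s_next == 6:
                r, done = 1.0, True
            else:
                r, done = 0.0, False
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])   # TD(0) update
            if done:
                break
            s = s_next
    return V

print(td0_prediction())  # true values are 1/6, 2/6, ..., 5/6
```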

Temporal Difference Learning
- Recall the action-value function: q_pi(s, a) = E_pi[ G_t | S_t = s, A_t = a ]
- SARSA: Q(S_t, A_t) <- Q(S_t, A_t) + alpha * [ R_{t+1} + gamma * Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) ]
- Q-learning: Q(S_t, A_t) <- Q(S_t, A_t) + alpha * [ R_{t+1} + gamma * max_a Q(S_{t+1}, a) - Q(S_t, A_t) ]
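A self-contained tabular Q-learning sketch on a toy chain environment (again an illustrative stand-in, not the authors' code):

```python
import random
from collections import defaultdict

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning on a 1-D chain: states 0..6, start at 3.

    Reaching state 6 pays +1, reaching state 0 pays -1; both end the episode.
    """
    actions = [-1, +1]
    Q = defaultdict(float)

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s = 3
        while s not in (0, 6):
            a = random.choice(actions) if random.random() < epsilon else greedy(s)
            s_next = s + a
            r = 1.0 if s_next == 6 else (-1.0 if s_next == 0 else 0.0)
            best_next = 0.0 if s_next in (0, 6) else Q[(s_next, greedy(s_next))]
            # Q-learning update: bootstrap on the best next action
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q

Q = q_learning()
print({s: max([-1, +1], key=lambda a: Q[(s, a)]) for s in range(1, 6)})  # greedy policy
```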

Value Prediction with Function Approximation
- Handles continuous state spaces
- Types of function: linear, neural networks, decision trees, ...
- Parameter updates: gradient descent, least-squares policy iteration, ...

Starting Point: Linear Function Approximation with Hand-Crafted Features
- Construct a set of features for any state s: x(s) = (x_1(s), ..., x_n(s))
- Estimate the state value with a linear combination (L.C.) of the features: v_hat(s, w) = w^T x(s) = sum_i w_i x_i(s)
- Use gradient descent to update the parameters: w <- w + alpha * [ U_t - v_hat(S_t, w) ] x(S_t), where the target U_t can be, e.g., the TD target R_{t+1} + gamma * v_hat(S_{t+1}, w)
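A sketch of one linear semi-gradient TD(0) step; the feature function here is a hypothetical placeholder, since the actual Laser Cats! features are only shown at a high level in the deck:

```python
import numpy as np

def features(state):
    """Hypothetical feature vector for a state; stand-in for the hand-crafted features."""
    x, y = state  # e.g. the position of an object
    return np.array([1.0, x, y, x * y])

def semi_gradient_td_update(w, state, reward, next_state, done, alpha=0.01, gamma=0.95):
    """One linear semi-gradient TD(0) step: w <- w + alpha * (target - v_hat(s)) * x(s)."""
    x = features(state)
    v = w @ x
    v_next = 0.0 if done else w @ features(next_state)
    td_error = reward + gamma * v_next - v
    return w + alpha * td_error * x

w = np.zeros(4)
w = semi_gradient_td_update(w, (0.2, 0.5), reward=1.0, next_state=(0.3, 0.5), done=False)
print(w)
```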

Feature Construction
- Raw state: the (x, y) coordinates of the upper-left corner of each object
- Features computed from the raw state: (the specific feature list is not captured in the transcript)
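Since the actual feature list is not recoverable here, the snippet below is a purely hypothetical example of turning raw (x, y) positions into features such as distances and offsets, in the spirit of the slide:

```python
import math

def make_features(cat_xy, mouse_xy, laser_xy):
    """Hypothetical features from raw (x, y) upper-left-corner positions.

    These particular features are an illustration only; the actual Laser Cats!
    feature set is not given in the transcript.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    return [
        1.0,                          # bias term
        dist(cat_xy, mouse_xy),       # how far the cat is from the mouse
        dist(laser_xy, mouse_xy),     # how far the laser is from the mouse
        mouse_xy[0] - cat_xy[0],      # horizontal offset
        mouse_xy[1] - cat_xy[1],      # vertical offset
    ]

print(make_features((0, 0), (30, 40), (10, 10)))
```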

Skynet 1.0 (Demo)

Skynet 1.0
- Problem 1: only learns a safe policy
- Problem 2: handles the 2nd through 4th mouse poorly
- Problem 3: estimates do not converge

Skynet 1.1
Fix 1: Reconsider the features through the lens of gradient descent (see the toy sketch below):
- If reducing value A leads to increasing positive reward, set the feature to (constant - value A).
- If reducing value B leads to reducing negative reward, set the feature to value B.
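A toy mirror of this rule with hypothetical names and values (value A could be, say, a distance the agent wants to shrink; the constant is an assumed upper bound on the raw value):

```python
MAX_VALUE = 1000.0  # assumed upper bound on the raw quantity, used as the constant

def orient_feature_a(value_a):
    """Value A: reducing it increases positive reward, so the slide's rule uses (constant - value A)."""
    return MAX_VALUE - value_a

def orient_feature_b(value_b):
    """Value B: reducing it reduces negative reward, so the slide's rule keeps value B as-is."""
    return value_b

# Hypothetical raw values, e.g. distances measured in pixels:
print(orient_feature_a(120.0), orient_feature_b(35.0))
```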

Fix 2: Ignore the negative reward from letting a mouse escape.

Fix 3: Set the step size to a harmonic sequence.
- Initial alpha: (value not captured in the transcript)
- After 10 minutes, shrink alpha at each step: alpha = 1/10001, 1/10002, ...
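A sketch of such a step-size schedule; the starting alpha and the switch-over step are placeholders, since the slide only states that the decay begins after 10 minutes and proceeds as 1/10001, 1/10002, ...:

```python
def make_alpha_schedule(initial_alpha=0.1, decay_start_step=10000):
    """Return a function step -> alpha that is constant at first, then harmonic.

    initial_alpha and decay_start_step are placeholder values, not from the slides.
    """
    def alpha(step):
        if step < decay_start_step:
            return initial_alpha
        return 1.0 / (step + 1)  # harmonic decay: 1/10001, 1/10002, ... once step >= 10000
    return alpha

alpha = make_alpha_schedule()
print(alpha(0), alpha(9999), alpha(10000), alpha(10001))  # 0.1, 0.1, 1/10001, 1/10002
```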

Skynet 1.1 (Demo): Did the fixes work? Can it handle INSANE DIFFICULTY?

Skynet 1.1 Limitations
1. Requires a perfect model of the environment (Solution 1: Q-learning)
2. Requires perfect knowledge of the observed state (Solution 2: convolutional neural networks)
3. Requires expert-level hand-crafted features (Solution 3: B-spline approximation, tile coding, deep learning, etc.)

Real World Application?

Thank you!

References
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press.