
1 Reinforcement Learning with Laser Cats! Marshall Wang, Maria Jahja. DTR Group Meeting, October 5, 2015

2 “You might find this one-page proof complicated, but if you read this book, then the proof becomes easy.” — Eric B. Laber “Why do you want to climb Mount Everest?” “Because it’s there.” — George Mallory

3 Why do we care about LaserCat at all?? Means to an end. Life is a game.

4 Laser Cats! Demo, Reinforcement Learning, Dynamic Programming, Q-learning, ...

5–7 [Image slides with no transcript text]

8 Laser Cats! Demo, Reinforcement Learning, Dynamic Programming, Q-learning, ...

9 I. The Problem. What is Reinforcement Learning?
- Learning through interaction with an environment
- Goal-oriented learning: maximize a numerical reward signal
- Learning from evaluation, not instruction
- Exploitation vs. exploration tradeoff
- Trial-and-error
- Delayed reward
- Example: “The usual” or “surprise me!”

10 Elements of RL
- Agent – learner + decision maker
- Environment – what the agent interacts with
- Policy – the agent’s behavior at a given state
- Reward – numeric representation of desirability; immediate and given directly by the environment
- Value – total amount of reward expected; long-term and must be estimated and re-estimated
- Model – mimics the behavior of the environment for use in future planning

11–15 Agent-Environment Interface
- Interaction occurs in discrete time steps t = 0, 1, 2, ...
- The agent performs an action A_t at time t
- The agent observes the state S_t at time t
- A reward R_{t+1} is given, along with the next state S_{t+1}
[Diagram: Agent and Environment linked by state, action, and reward arrows]
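A minimal sketch of this loop in Python; the env and agent objects are hypothetical stand-ins for illustration, not the Laser Cats! code:

    def run_episode(env, agent, max_steps=1000):
        # One episode of the agent-environment interface, in the S_t, A_t, R_{t+1} notation above.
        state = env.reset()                                   # S_0
        total_reward = 0.0
        for t in range(max_steps):
            action = agent.act(state)                         # A_t, drawn from the agent's policy
            next_state, reward, done = env.step(action)       # R_{t+1} and S_{t+1}
            agent.observe(state, action, reward, next_state)  # hook where learning happens
            total_reward += reward
            state = next_state
            if done:
                break
        return total_reward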

16 Policies
- The agent implements a mapping from each state-action pair to the probability of selecting that action in that state (one common representation is sketched below)
- Policies are changed as a result of experience
- How do we find the optimal policy that maximizes total reward?
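For instance, a common concrete representation is an epsilon-greedy policy over estimated action values; this is a generic sketch, not the exact policy used in the talk:

    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        """Choose an action index from a list of estimated action values.

        With probability epsilon, explore (random action); otherwise exploit the
        action with the highest estimate: the exploitation/exploration tradeoff.
        """
        if random.random() < epsilon:
            return random.randrange(len(q_values))                      # explore
        return max(range(len(q_values)), key=lambda a: q_values[a])     # exploit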

17 Value Functions
- Estimate how good it is for the agent to be in a given state.
- Return: G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ... = sum_{k>=0} gamma^k * R_{t+k+1}
- The value (expected return) of a state under a policy: v_pi(s) = E_pi[ G_t | S_t = s ]
- The value of taking an action in a state under a policy: q_pi(s, a) = E_pi[ G_t | S_t = s, A_t = a ]
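A tiny worked example of the return, with a made-up reward sequence and discount factor:

    def discounted_return(rewards, gamma=0.9):
        # G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...
        g = 0.0
        for k, r in enumerate(rewards):
            g += (gamma ** k) * r
        return g

    print(discounted_return([1.0, 0.0, 0.0, 10.0]))  # 1 + 0.9**3 * 10 = 8.29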

18 Optimal Value Functions
- There is always at least one policy that is better than or equal to all others; these are denoted pi*.
- They share the same optimal state-value and action-value functions:
  v*(s) = max_pi v_pi(s)
  q*(s, a) = max_pi q_pi(s, a)
- q*(s, a) is the expected return for taking action a in state s and following some pi* thereafter.

19 Bellman Optimality Equations
- Under an optimal policy, the value of a state is equal to the expected return for the best action from that state:
  v*(s) = max_a E[ R_{t+1} + gamma * v*(S_{t+1}) | S_t = s, A_t = a ]
  q*(s, a) = E[ R_{t+1} + gamma * max_{a'} q*(S_{t+1}, a') | S_t = s, A_t = a ]

20 Solving the Optimality Equations
- Requires knowledge of the environment, enough space and time to do the computations, and the Markov property
- In practice, we settle for approximations
- Many RL methods can be understood as approximations to solving the Bellman optimality equation (for a small, known model, value iteration solves it directly; see the sketch below)
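A generic value-iteration sketch on a tiny made-up model (not the Laser Cats! environment), sweeping the Bellman optimality update until the values stop changing:

    # P[s][a] is a list of (probability, next_state, reward) triples for a toy 3-state MDP.
    P = {
        0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
        1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 5.0)]},
        2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
    }
    gamma = 0.9
    V = {s: 0.0 for s in P}

    for _ in range(1000):                       # repeated sweeps of the Bellman optimality update
        delta = 0.0
        for s in P:
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < 1e-8:                        # converged
            break

    print(V)                                    # approximate v*(s) for each state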

21 Temporal Difference Learning
- Recall: v_pi(s) = E_pi[ G_t | S_t = s ]
- Bellman equation: v_pi(s) = E_pi[ R_{t+1} + gamma * v_pi(S_{t+1}) | S_t = s ]
- Temporal-difference prediction (TD(0)): V(S_t) <- V(S_t) + alpha * [ R_{t+1} + gamma * V(S_{t+1}) - V(S_t) ]

22 Temporal Difference Learning
- Recall the action-value function: q_pi(s, a) = E_pi[ G_t | S_t = s, A_t = a ]
- SARSA: Q(S_t, A_t) <- Q(S_t, A_t) + alpha * [ R_{t+1} + gamma * Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) ]
- Q-Learning: Q(S_t, A_t) <- Q(S_t, A_t) + alpha * [ R_{t+1} + gamma * max_a Q(S_{t+1}, a) - Q(S_t, A_t) ]
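A tabular Q-learning sketch using the same hypothetical env interface as the earlier loop; switching the bootstrap target from the max to the next action actually chosen would give SARSA:

    from collections import defaultdict
    import random

    def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(lambda: [0.0] * n_actions)       # Q[state][action]; assumes hashable states
        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                if random.random() < epsilon:
                    action = random.randrange(n_actions)                        # explore
                else:
                    action = max(range(n_actions), key=lambda a: Q[state][a])   # exploit
                next_state, reward, done = env.step(action)
                # Q-learning bootstraps off the best next action; SARSA would use the
                # next action actually selected by the policy instead of the max.
                target = reward + gamma * max(Q[next_state]) * (not done)
                Q[state][action] += alpha * (target - Q[state][action])
                state = next_state
        return Q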

23 Value Prediction with Function Approximation
- Handles continuous state spaces
- Types of functions: linear, neural networks, decision trees, ...
- Parameter updates: gradient descent, least-squares policy iteration, ...

24 Starting Point: Linear Function Approximation with Hand-Crafted Features
- Construct a set of features for any state s: phi(s) = (phi_1(s), ..., phi_n(s))
- Estimate the state value with a linear combination of the features: v_hat(s, w) = w^T phi(s)
- Use gradient descent to update the parameters: w <- w + alpha * [ R_{t+1} + gamma * v_hat(S_{t+1}, w) - v_hat(S_t, w) ] * phi(S_t)
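A sketch of this semi-gradient TD(0) update with linear features; the env interface and its sample_action behavior policy are assumptions for illustration, not the authors' code:

    import numpy as np

    def semi_gradient_td0(env, features, n_features, episodes=200, alpha=1e-4, gamma=0.9):
        """Learn w so that v_hat(s) = w . phi(s), using the TD(0) semi-gradient update.

        features(s) is expected to return a length-n_features numpy array.
        """
        w = np.zeros(n_features)
        for _ in range(episodes):
            state = env.reset()
            done = False
            while not done:
                action = env.sample_action()                 # assumed behavior policy on the env
                next_state, reward, done = env.step(action)
                phi, phi_next = features(state), features(next_state)
                v = w @ phi
                v_next = 0.0 if done else w @ phi_next
                td_error = reward + gamma * v_next - v
                w += alpha * td_error * phi                  # gradient of v_hat w.r.t. w is phi(s)
                state = next_state
        return w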

25 Feature Construction
- Raw state: the (x, y) coordinates of the upper-left corner of each object
- Features computed from the raw state: [feature list/diagram not included in the transcript]
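The actual feature list is not in the transcript; the sketch below only illustrates the kind of computation the slide describes, with hypothetical object names and features:

    import math

    # Hypothetical raw state: upper-left (x, y) corner of each object, as the slide describes.
    raw_state = {"cat": (40, 120), "mouse": (300, 80), "laser": (200, 200)}

    def distance(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def make_features(s):
        # Example features derived from the raw coordinates; the features actually
        # used in the talk are not listed in the transcript.
        return [
            distance(s["cat"], s["mouse"]),
            distance(s["cat"], s["laser"]),
            s["mouse"][0] - s["cat"][0],   # horizontal offset to the mouse
            1.0,                           # bias term
        ]

    print(make_features(raw_state))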

26 Skynet 1.0 (Demo)

27 Skynet 1.0
- Problem 1: Only learns a safe policy
- Problem 2: Handles the 2nd–4th mice poorly
- Problem 3: Estimates don’t converge

28–30 Skynet 1.1
Fix 1: Reconsider the features through the lens of gradient descent:
- If reducing value A leads to increasing positive reward, set the feature to (constant - value A).
- If reducing value B leads to reducing negative reward, set the feature to value B.
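A tiny illustration of Fix 1 with hypothetical feature names and an assumed constant: flipping a "smaller is better" quantity so that a larger feature value always lines up with more reward keeps the gradient-descent weight updates pushing in a consistent direction:

    MAX_DIST = 1000.0   # assumed upper bound on any on-screen distance (hypothetical)

    def closeness_to_mouse(dist_to_mouse):
        # "value A": reducing it increases positive reward, so use (constant - value A)
        return MAX_DIST - dist_to_mouse

    def hazard_distance(dist_to_hazard):
        # "value B": reducing it reduces negative reward, so keep value B as-is
        return dist_to_hazard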

31 Skynet 1.1 Fix 2: Ignore the negative reward from letting a mouse escape.

32 Skynet 1.1 Fix 3: Set the step size to a harmonic sequence.
- Initial alpha = 0.0001.
- After 10 minutes, shrink alpha each step: alpha = 1/10001, 1/10002, ...
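A small sketch of that schedule; the 10-minute switch is represented here by a flag, which is an assumed implementation detail:

    def make_step_size(initial_alpha=0.0001):
        """Harmonic step-size schedule: constant at first, then alpha = 1/(10000 + k)."""
        k = 0
        def alpha(shrinking):
            nonlocal k
            if not shrinking:           # e.g. during the first 10 minutes of training
                return initial_alpha
            k += 1
            return 1.0 / (10000 + k)    # 1/10001, 1/10002, ...
        return alpha

    step = make_step_size()
    print(step(False))   # 0.0001
    print(step(True))    # 1/10001
    print(step(True))    # 1/10002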

33 Skynet 1.1 (Demo) Did they work? Can it handle INSANE DIFFICULTY??

34 Skynet 1.1 Limitations
1. Requires a perfect model of the environment
2. Requires perfect knowledge of the observed state
3. Requires expert-level hand-crafted features
Solution 1: Q-Learning
Solution 2: Convolutional Neural Network
Solution 3: B-Spline Approximation, Tile Coding, Deep Learning, etc.

35 Real World Application?

36 Thank you!
References: Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction.

