
ECE-517: Reinforcement Learning in Artificial Intelligence
Lecture 12: Generalization and Function Approximation
Dr. Itamar Arel
College of Engineering, Department of Electrical Engineering and Computer Science
The University of Tennessee
Fall 2012 (October 23, 2012)

Slide 2: Outline
Introduction
Value prediction with function approximation
The gradient-descent framework
On-line gradient-descent TD(λ)
Linear methods
Control with function approximation

Slide 3: Introduction
We have so far assumed a tabular view of state-value or action-value functions, which inherently limits our problem space to small state/action sets:
Space requirements – storage of the values
Computational complexity – sweeping/updating the values
Communication constraints – getting the data where it needs to go
Reality is very different – high-dimensional state representations are common.
We will next look at generalization: an attempt by the agent to learn about a large state set while visiting/experiencing only a small subset of it.
People do it – how can machines achieve the same goal?

Slide 4: General Approach
Luckily, many approximation techniques have been developed, e.g. multivariate function approximation schemes.
We will utilize such techniques in an RL context.

Slide 5: Value Prediction with FA
As usual, let's start with prediction of $V^\pi$.
Instead of using a table for $V_t$, the value function is represented in a parameterized functional form with parameter vector $\vec\theta_t = (\theta_t(1), \theta_t(2), \ldots, \theta_t(n))^\top$.
We'll assume that $V_t$ is a sufficiently smooth differentiable function of $\vec\theta_t$ for all $s$. For example, a neural network can be trained to predict $V$, where $\vec\theta_t$ holds the connection weights.
We will require that the number of parameters $n$ be much smaller than the number of states.
When a single state is backed up, the change generalizes to affect the values of many other states.
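As an illustration of the parameterized form, here is a minimal sketch of a value predictor whose weights play the role of $\vec\theta_t$. The slides name no specific architecture; the one-hidden-layer network, its width, and the initialization below are my illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

class NeuralValueFunction:
    """V(s; theta): theta consists of all connection weights,
    and is kept much smaller than the number of states."""

    def __init__(self, n_features, n_hidden=16):
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def value(self, phi):
        """Predict V(s) from a feature vector phi(s)."""
        h = np.tanh(self.W1 @ phi + self.b1)
        return float(self.w2 @ h + self.b2)

vf = NeuralValueFunction(n_features=4)
print(vf.value(np.array([0.1, 0.2, 0.3, 0.4])))
```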

Slide 6: Adapt Supervised Learning Algorithms
A supervised learning system maps inputs to outputs, guided by training info in the form of desired (target) outputs:
Error = (target output – actual output)
Training example = {input, target output}

Slide 7: Performance Measures
Let us assume that training examples all take the form $\{s_t, V^\pi(s_t)\}$.
A common performance metric is the mean-squared error (MSE) over a distribution $P$:
$\mathrm{MSE}(\vec\theta_t) = \sum_{s \in S} P(s)\,[V^\pi(s) - V_t(s)]^2$
Q: Why use $P$? Is MSE the best metric?
Let us assume that $P$ is always the distribution of states at which backups are done.
On-policy distribution: the distribution created while following the policy being evaluated. Stronger convergence results are available for this distribution.
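To make the objective concrete, a small numeric sketch of the weighted MSE (the helper and the toy numbers are mine, not from the slides): errors at frequently visited states dominate the objective.

```python
import numpy as np

def weighted_mse(P, v_true, v_hat):
    """MSE of the approximation v_hat against v_true, weighted by
    the state distribution P (e.g. the on-policy distribution)."""
    P, v_true, v_hat = map(np.asarray, (P, v_true, v_hat))
    return float(np.sum(P * (v_true - v_hat) ** 2))

P = np.array([0.7, 0.2, 0.1])                     # on-policy state frequencies
print(weighted_mse(P, [1.0, 2.0, 3.0], [0.5, 2.0, 5.0]))  # 0.7*0.25 + 0.1*4
```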

Slide 8: Gradient Descent
We iteratively move down the gradient of the squared error:
$\vec\theta_{t+1} = \vec\theta_t - \tfrac{1}{2}\alpha\,\nabla_{\vec\theta_t}[V^\pi(s_t) - V_t(s_t)]^2 = \vec\theta_t + \alpha\,[V^\pi(s_t) - V_t(s_t)]\,\nabla_{\vec\theta_t} V_t(s_t)$
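A minimal sketch of this update rule, assuming the true value $V^\pi(s_t)$ and the gradient of $V_t$ are available to the caller:

```python
import numpy as np

def gd_update(theta, alpha, v_target, v_pred, grad_v):
    """theta <- theta + alpha * [v_target - V_t(s_t)] * grad V_t(s_t)."""
    return theta + alpha * (v_target - v_pred) * grad_v

theta = np.zeros(3)
phi = np.array([1.0, 0.5, 0.0])   # for linear FA, grad V_t(s) is just phi_s
theta = gd_update(theta, alpha=0.1, v_target=1.0, v_pred=0.0, grad_v=phi)
print(theta)                      # [0.1  0.05 0. ]
```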

Slide 9: Gradient Descent in RL
Let's now consider the case where the target output $v_t$ for sample $t$ is not the true value $V^\pi(s_t)$ (which is unavailable).
In such cases we perform an approximate update,
$\vec\theta_{t+1} = \vec\theta_t + \alpha\,[v_t - V_t(s_t)]\,\nabla_{\vec\theta_t} V_t(s_t)$,
where $v_t$ is an unbiased estimate of the target output. Examples of $v_t$:
Monte Carlo methods: $v_t = R_t$ (the actual return)
TD(λ): $v_t = R_t^\lambda$ (the λ-return)
The general gradient-descent scheme is then guaranteed to converge to a local minimum.

Slide 10: On-Line Gradient-Descent TD(λ)
(The original slide showed the algorithm as a figure; a code sketch follows.)
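A minimal sketch of on-line gradient-descent TD(λ) for policy evaluation with linear features, reconstructed from the standard Sutton & Barto formulation. The `env_step`, `features`, and `policy` callables are assumed interfaces, not library code.

```python
import numpy as np

def td_lambda(env_step, features, policy, s0, n_features,
              alpha=0.05, gamma=0.99, lam=0.8, n_steps=10_000):
    theta = np.zeros(n_features)   # parameter vector
    e = np.zeros(n_features)       # eligibility trace vector
    s = s0
    for _ in range(n_steps):
        a = policy(s)
        s_next, r, done = env_step(s, a)
        v = theta @ features(s)
        v_next = 0.0 if done else theta @ features(s_next)
        delta = r + gamma * v_next - v        # TD error
        e = gamma * lam * e + features(s)     # accumulate traces
        theta += alpha * delta * e            # gradient-descent update
        if done:
            e[:] = 0.0
            s = s0
        else:
            s = s_next
    return theta
```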

Slide 11: Residual Gradient Descent
The update above is not a true gradient-descent step when the target itself depends on $\vec\theta_t$. Writing the TD(0) target as $v_t = r_{t+1} + \gamma V_t(s_{t+1})$, the rule
$\vec\theta_{t+1} = \vec\theta_t + \alpha\,[v_t - V_t(s_t)]\,\nabla_{\vec\theta_t} V_t(s_t)$
suggests that $\nabla_{\vec\theta_t}[v_t - V_t(s_t)] = -\nabla_{\vec\theta_t} V_t(s_t)$, which is not true, e.g. $\nabla_{\vec\theta_t} v_t = \gamma\,\nabla_{\vec\theta_t} V_t(s_{t+1}) \neq 0$.
So, taking the full gradient of the squared error, we should be writing (residual GD):
$\vec\theta_{t+1} = \vec\theta_t - \alpha\,[v_t - V_t(s_t)]\,(\gamma\,\nabla_{\vec\theta_t} V_t(s_{t+1}) - \nabla_{\vec\theta_t} V_t(s_t))$
Comment: the whole scheme is no longer supervised-learning based!
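A short sketch contrasting the two updates for linear FA (my illustration, not slide code); `phi` and `phi_next` are the feature vectors of $s_t$ and $s_{t+1}$:

```python
import numpy as np

def td0_update(theta, alpha, r, gamma, phi, phi_next):
    delta = r + gamma * theta @ phi_next - theta @ phi
    return theta + alpha * delta * phi            # semi-gradient: target frozen

def residual_gd_update(theta, alpha, r, gamma, phi, phi_next):
    delta = r + gamma * theta @ phi_next - theta @ phi
    # True gradient of 0.5*delta^2 also differentiates through the target:
    return theta - alpha * delta * (gamma * phi_next - phi)

theta = np.zeros(3)
phi, phi_next = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
print(td0_update(theta, 0.1, 1.0, 0.9, phi, phi_next))          # [0.1 0.   0.]
print(residual_gd_update(theta, 0.1, 1.0, 0.9, phi, phi_next))  # [0.1 -0.09 0.]
```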

Slide 12: Linear Methods
One of the most important special cases of gradient-descent FA: $V_t$ becomes a linear function of the parameter vector $\vec\theta_t$.
For every state $s$, there is a real-valued column vector of features $\vec\phi_s = (\phi_s(1), \ldots, \phi_s(n))^\top$. The features can be constructed from the states in many ways.
The linear approximate state-value function is given by
$V_t(s) = \vec\theta_t^{\,\top} \vec\phi_s = \sum_{i=1}^{n} \theta_t(i)\,\phi_s(i)$
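A minimal sketch of the linear form; the gradient function anticipates the property on the next slide:

```python
import numpy as np

def linear_value(theta, phi_s):
    """V_t(s) = theta^T phi_s."""
    return float(theta @ phi_s)

def linear_value_gradient(theta, phi_s):
    """For linear FA, grad_theta V_t(s) is simply the feature vector."""
    return phi_s

theta = np.array([0.5, -1.0, 2.0])
phi_s = np.array([1.0, 0.0, 0.5])
print(linear_value(theta, phi_s))           # 0.5*1 + (-1)*0 + 2*0.5 = 1.5
print(linear_value_gradient(theta, phi_s))  # phi_s itself
```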

Slide 13: Nice Properties of Linear FA Methods
The gradient is very simple: $\nabla_{\vec\theta_t} V_t(s) = \vec\phi_s$
For MSE, the error surface is simple: a quadratic surface with a single (global) minimum.
Linear gradient-descent TD(λ) converges if:
the step size decreases appropriately, and
sampling is on-line (states sampled from the on-policy distribution).
It converges to a parameter vector $\vec\theta_\infty$ satisfying
$\mathrm{MSE}(\vec\theta_\infty) \le \frac{1 - \gamma\lambda}{1 - \gamma}\,\mathrm{MSE}(\vec\theta^*)$,
where $\vec\theta^*$ is the best parameter vector (Tsitsiklis & Van Roy, 1997).

Slide 14: Limitations of Pure Linear Methods
Many applications require a mixture (e.g. product) of the different feature components; the linear form prohibits direct representation of interactions between features.
Intuition: feature i is good only in the absence of feature j.
Example: the pole-balancing task. High angular velocity can be good or bad:
if the angle is high, the pole is in imminent danger of falling (a bad state);
if the angle is low, the pole is righting itself (a good state).
In such cases we need to introduce features that express a mixture of other features (see the sketch below).
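One illustrative fix (my example, not from the slides) is to augment the feature vector with pairwise products, so a linear approximator can represent interactions such as "high angular velocity is bad only when the angle is also high":

```python
import numpy as np

def with_interactions(phi):
    """Append all pairwise products phi_i * phi_j (i < j) to the features."""
    phi = np.asarray(phi)
    pairwise = np.outer(phi, phi)[np.triu_indices(len(phi), k=1)]
    return np.concatenate([phi, pairwise])

print(with_interactions([1.0, 2.0, 3.0]))  # [1. 2. 3. 2. 3. 6.]
```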

Slide 15: Coarse Coding – Feature Composition/Extraction
(Figure in the original slide: overlapping circular receptive fields covering the state space.)

Slide 16: Shaping Generalization in Coarse Coding
If we train at one point (state) X, the parameters of all circles intersecting X will be affected.
Consequence: the value function at all points within the union of those circles is affected, with a greater effect on points that have more circles "in common" with X.
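A minimal sketch of coarse coding in 2-D (my illustration): each feature is a binary indicator of whether the state falls inside one circular receptive field, so training at X moves the weights of every circle containing X.

```python
import numpy as np

def coarse_code(state, centers, radius):
    """Binary feature vector: 1 for each circle containing the state."""
    state = np.asarray(state)
    dists = np.linalg.norm(centers - state, axis=1)
    return (dists <= radius).astype(float)

centers = np.random.default_rng(1).uniform(0.0, 1.0, (50, 2))
print(coarse_code([0.5, 0.5], centers, radius=0.2).sum(), "active features")
```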

Slide 17: Learning and Coarse Coding
All three cases have the same number of features (50); the learning rate is 0.2/m, where m is the number of features present in each example.

Slide 18: Tile Coding
A binary feature for each tile.
The number of features present at any one time is constant.
Binary features make the weighted sum easy to compute.
It is easy to compute the indices of the features present (a sketch follows).
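A minimal sketch of tile coding over a 2-D state in [0,1]^2 (my illustration; the offsets and sizes are arbitrary choices): each of the tilings is shifted slightly, exactly one tile per tiling is active, and the weighted sum reduces to summing a few entries of theta.

```python
import numpy as np

def tile_indices(state, n_tilings=5, tiles_per_dim=9):
    """Return the index of the single active tile in each offset tiling."""
    x, y = state
    idx = []
    for t in range(n_tilings):
        off = t / (n_tilings * tiles_per_dim)   # per-tiling offset
        i = min(int((x + off) * tiles_per_dim), tiles_per_dim - 1)
        j = min(int((y + off) * tiles_per_dim), tiles_per_dim - 1)
        idx.append(t * tiles_per_dim**2 + i * tiles_per_dim + j)
    return idx

theta = np.zeros(5 * 9 * 9)
print(sum(theta[i] for i in tile_indices((0.3, 0.7))))  # V(s) from 5 lookups
```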

Slide 19: Tile Coding (cont.)
Irregular tilings
Hashing

Slide 20: Control with Function Approximation
Learning state-action values, with training examples of the form $\{(s_t, a_t), v_t\}$.
The general gradient-descent rule:
$\vec\theta_{t+1} = \vec\theta_t + \alpha\,[v_t - Q_t(s_t, a_t)]\,\nabla_{\vec\theta_t} Q_t(s_t, a_t)$
Gradient-descent Sarsa(λ) (backward view):
$\vec\theta_{t+1} = \vec\theta_t + \alpha\,\delta_t\,\vec e_t$, where
$\delta_t = r_{t+1} + \gamma\,Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t)$ and
$\vec e_t = \gamma\lambda\,\vec e_{t-1} + \nabla_{\vec\theta_t} Q_t(s_t, a_t)$

Slide 21: GPI with Linear Gradient-Descent Sarsa(λ)
(The original slide showed the full algorithm as a figure; a code sketch follows.)
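A minimal sketch of episodic linear gradient-descent Sarsa(λ) along the lines of the figure this slide showed. The `env` object (with `reset`/`step`), `features(s, a)`, and the action list are assumed interfaces, not library code.

```python
import numpy as np

def sarsa_lambda(env, features, actions, n_features, n_episodes=200,
                 alpha=0.1, gamma=1.0, lam=0.9, eps=0.1):
    rng = np.random.default_rng(0)
    theta = np.zeros(n_features)

    def q(s, a):
        return theta @ features(s, a)

    def eps_greedy(s):
        if rng.random() < eps:
            return actions[rng.integers(len(actions))]
        return max(actions, key=lambda a: q(s, a))

    for _ in range(n_episodes):
        e = np.zeros(n_features)           # eligibility traces
        s = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            delta = r - q(s, a)            # TD error (terminal target is r)
            e = gamma * lam * e + features(s, a)
            if not done:
                a2 = eps_greedy(s2)
                delta += gamma * q(s2, a2)
                s, a = s2, a2
            theta += alpha * delta * e
    return theta
```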

Slide 22: GPI with Linear Gradient-Descent Watkins' Q(λ)
(Algorithm figure in the original slide; it follows the Sarsa(λ) scheme, but the eligibility traces are cut to zero whenever an exploratory, non-greedy action is taken.)

Slide 23: Mountain-Car Task Example
Challenge: drive an underpowered car up a steep mountain road; gravity is stronger than its engine.
Solution approach: back away from the goal to build enough inertia on the opposite slope to carry the car up.
This is an example of a task where things must get worse in a sense (farther from the goal) before they get better, and it is hard to solve using classic control schemes.
Reward is -1 for all steps until the episode terminates.
Actions: full throttle forward (+1), full throttle reverse (-1), and zero throttle (0).
Overlapping 9x9 tilings were used to represent the two-dimensional continuous state space (a dynamics sketch follows).
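A sketch of the classic mountain-car dynamics; the constants below are from the standard Sutton & Barto formulation of the task, not from these slides.

```python
import math

def mc_step(position, velocity, action):
    """One step of mountain-car dynamics; action in {-1, 0, +1}."""
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))        # bound the velocity
    position = max(-1.2, min(0.5, position + velocity))
    if position == -1.2:
        velocity = 0.0                                # inelastic left wall
    done = position >= 0.5                            # goal at the hilltop
    return position, velocity, -1.0, done             # reward is -1 per step

pos, vel = -0.5, 0.0
pos, vel, r, done = mc_step(pos, vel, action=+1)
print(pos, vel, r, done)
```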

Slide 24: Mountain-Car Task

Slide 25: Mountain-Car Results (five 9x9 tilings were used)

Slide 26: Summary
Generalization is an important RL attribute.
We adapt supervised-learning function approximation methods by treating each backup as a training example.
Gradient-descent methods:
Linear gradient-descent methods – radial basis functions, tile coding
Nonlinear gradient-descent methods? NN backpropagation?
There are subtleties involving the combination of function approximation, bootstrapping, and the on-policy/off-policy distinction.