Reinforcement Learning Control with Robust Stability. Chuck Anderson, Matt Kretchmar, Department of Computer Science; Peter Young, Department of Electrical and Computer Engineering; Douglas Hittle, Department of Mechanical Engineering. Colorado State University, Fort Collins, CO.

Presentation transcript:

Reinforcement Learning Control with Robust Stability
Chuck Anderson, Matt Kretchmar, Department of Computer Science; Peter Young, Department of Electrical and Computer Engineering; Douglas Hittle, Department of Mechanical Engineering. Colorado State University, Fort Collins, CO.

Reinforcement Learning Agent in Parallel with Controller
The reinforcement learning algorithm guides the adjustment of the actor's weights. The IQC analysis places a bounding box in the weight space of the actor network, beyond which stability has not been verified.

Incorporating Time-Varying IQC in Reinforcement Learning
In the (high-dimensional) weight space of the actor network:
- Step 0: start from the initial weight vector.
- Step 1: compute an initial guaranteed-stable region around it.
- Step 2: follow the trajectory of the weights while learning, staying inside that region.
- Step 3: when the weights reach the edge of the region, a new stable region must be found.
- Step 4: compute the next guaranteed-stable region around the current weights.
- Step 5: learning continues until the edge of the new bounding box is encountered, and the process repeats.
A code sketch of this loop appears at the end of this transcript.

Trajectory of Weights and Bounds on Regions of Stability
[Figure: the high-dimensional weight space with successive stability regions A, B, C, D, and E. The weight trajectory with robust constraints runs from the initial weight vector to the final weight vector while remaining inside the verified regions; the weight trajectory without robust constraints enters the unstable region.]

Motivation
- Robust control theory guarantees stability, but results in less aggressive controllers.
- Reinforcement learning optimizes the performance of a controller, but provides no guarantee of stability while learning.

Experimental HVAC System
[Figure: the experimental HVAC system.]

Reinforcement Learning
The value function Q for the current policy satisfies
\[ Q(s_t, a_t) \;=\; E\big[\, r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) \,\big], \]
where s is the state, a is the action produced by the policy function, \(\gamma\) is the discount factor, and the reinforcement r is based on the magnitude of the tracking error, |error|. Subtracting the right side from the left gives the temporal-difference error; replacing the expectation with a single sample (a Monte Carlo approach) gives the algorithm for updating Q.

Robust Control based on IQCs
The closed loop is modeled as a feedback interconnection of the controller/plant block M and the uncertainty block \(\Delta\), connected through the signals v and w. An integral quadratic constraint (IQC) describes the relationship between these signals as
\[ \int_{-\infty}^{\infty} \begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix}^{*} \Pi(j\omega) \begin{bmatrix} \hat{v}(j\omega) \\ \hat{w}(j\omega) \end{bmatrix} d\omega \;\ge\; 0 . \]
Stability of the closed-loop system is guaranteed if
\[ \begin{bmatrix} M(j\omega) \\ I \end{bmatrix}^{*} \Pi(j\omega) \begin{bmatrix} M(j\omega) \\ I \end{bmatrix} \;\le\; -\epsilon I \]
for all \(\omega\) and for some \(\epsilon > 0\). Given specific IQCs for a particular system, this inequality problem becomes a linear matrix inequality (LMI) problem.

[Figure: reference and output responses for the nominal and perturbed plants, showing a good response in the nominal case and a terrible response in the perturbed case.]

Robust Reinforcement Learning
[Figure: responses for the perturbed case with no learning and for the perturbed case with learning.]
Through learning, the controller has been fine-tuned to the actual dynamics of the real plant without losing the guarantee of stability.
[Figure: sum squared error for the nominal controller, the robust controller, and the robust RL controller.]

Conclusions
- IQC bounds on the parameters of tanh and sigmoid networks exist for which the combination of a reinforcement learning agent and a feedback control system satisfies the requirements of robust stability theorems (static and dynamic stability).
- The robust reinforcement learning algorithm improves control performance while avoiding instability on several simulated problems.
- Because of the stability guarantees, reinforcement learning is now more acceptable in practical applications as an adaptive controller that modifies its behavior over time.
- The initial, conservative robust controller becomes more aggressive through adaptation to the actual physical system.

See Integral Quadratic Constraints.
[Block diagram: the IQC interconnection with M partitioned into M11, M12, M21, and M22.]

Neural Net and Robust Control with IQCs
[Block diagram: bounds on the neural net weight adjustment are shown in green, the neural net acting as the reinforcement learning actor in blue, and the robust controller and plant in red.]

Examples
- First Example.
- Second Example: without robust constraints, the system becomes unstable before learning the final stable solution.
- Third Example: distillation column.
- Fourth Example: 1st-order and 2nd-order cases; without robust constraints, the system becomes unstable before learning the final stable solution.
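The conclusions note that IQC bounds exist for the parameters of tanh and sigmoid networks, but the poster does not reproduce the multipliers themselves. As one standard illustration (an assumption here, not necessarily the authors' construction), the tanh nonlinearity lies in the sector [0, 1], since 0 <= tanh(v)/v <= 1, which gives a constant-multiplier IQC relating its input v and output w = tanh(v):

\[
  \int_{0}^{\infty} w(t)\,\bigl(v(t) - w(t)\bigr)\,dt \;\ge\; 0 ,
  \qquad\text{i.e.\ the IQC above holds with the constant multiplier }
  \Pi = \begin{bmatrix} 0 & 1 \\ 1 & -2 \end{bmatrix}.
\]

With a fixed multiplier of this kind, the frequency-domain stability test becomes the linear matrix inequality problem mentioned on the poster, which standard semidefinite solvers can check.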
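Below is a minimal Python sketch, assembled for illustration only, of the loop described in "Incorporating Time-Varying IQC in Reinforcement Learning" together with the temporal-difference update: the critic learns from the sampled TD error, the actor's weights are only allowed to move inside the currently verified box, and a new box is computed whenever the trajectory reaches its edge. The toy plant, network sizes, step sizes, and the stand-in stable_box function are assumptions; in the actual work the region comes from the IQC/LMI analysis, not from a fixed half-width.

```python
import numpy as np

class FirstOrderPlant:
    """Toy first-order plant y[k+1] = a*y[k] + b*u[k] (illustrative only)."""
    def __init__(self, a=0.9, b=0.1):
        self.a, self.b, self.y = a, b, 0.0

    def reset(self):
        self.y = 0.0
        return self.y

    def step(self, u):
        self.y = self.a * self.y + self.b * u
        return self.y

def stable_box(center, halfwidth=0.2):
    """Stand-in for the IQC/LMI computation: returns lower and upper bounds of a
    box around `center` inside which closed-loop stability is taken as verified."""
    return center - halfwidth, center + halfwidth

def actor(w, x):
    """Single tanh unit acting in parallel with the fixed controller."""
    return np.tanh(w @ x)

gamma, alpha_v, alpha_w = 0.95, 0.05, 0.01   # discount factor and step sizes
w = np.zeros(2)                              # actor weights on [tracking error, reference]
v = np.zeros(3)                              # linear critic weights on [error, reference, action]
lo, hi = stable_box(w)                       # Steps 0-1: initial weights and stable region

plant, ref = FirstOrderPlant(), 1.0
for episode in range(20):
    y = plant.reset()
    for _ in range(200):
        x = np.array([ref - y, ref])
        a = actor(w, x)
        u = 2.0 * (ref - y) + a              # fixed feedback controller plus learned correction
        y = plant.step(u)
        r = -abs(ref - y)                    # reinforcement based on |tracking error|

        x2 = np.array([ref - y, ref])
        a2 = actor(w, x2)
        q_now = v @ np.append(x, a)
        q_next = v @ np.append(x2, a2)
        delta = r + gamma * q_next - q_now   # sampled temporal-difference error

        v = v + alpha_v * delta * np.append(x, a)        # critic update
        dq_dw = v[2] * (1.0 - a ** 2) * x                # chain rule: dQ/da * da/dw
        w_new = w + alpha_w * dq_dw                      # actor step toward higher Q (Step 2)

        if np.any(w_new < lo) or np.any(w_new > hi):     # edge of bounding box reached
            lo, hi = stable_box(w)                       # Steps 3-4: compute a new stable region
        w = np.clip(w_new, lo, hi)                       # Step 5: keep learning inside the region

print("final actor weights:", w)
```

Clipping to the box is only a stand-in for however the weight trajectory is actually constrained; the essential point, as on the poster, is that the weights never leave a region in which stability has been verified.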