Presentation transcript:

1 L-BFGS and Delayed Dynamical Systems Approach for Unconstrained Optimization
Xiaohui XIE
Supervisor: Dr. Hon Wah TAM

2 Outline
Problem background and introduction
Analysis for dynamical systems with time delay
 Introduction of dynamical systems
 Delayed dynamical systems approach
 Uniqueness property of dynamical systems
Numerical testing
 Comparison between L-BFGS and the steepest descent method
 A new code: RADAR5
Main stages of this research
APPENDIX

3 1. Problem background and introduction
Optimization problems are classified into four categories; our research focuses on unconstrained optimization problems:

(UP)  $\min_{x \in \mathbb{R}^n} f(x)$

4 Steepest descent method
For (UP), $-\nabla f(x)$ is a descent direction at $x$; more generally, $d$ is a descent direction for $f$ at $x$ if $\nabla f(x)^T d < 0$.

5 Method of steepest descent
Find $\alpha_k$ that solves $\min_{\alpha > 0} f(x_k - \alpha \nabla f(x_k))$. Then set $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$.
Unfortunately, the steepest descent method converges only linearly, and the linear rate is sometimes very slow.
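A minimal Python sketch of the iteration (assuming NumPy and user-supplied f and grad_f; a backtracking Armijo search stands in for the exact one-dimensional minimization, which the slides leave unspecified):

    import numpy as np

    def steepest_descent(f, grad_f, x0, tol=1e-6, max_iter=10000):
        # Iterate x_{k+1} = x_k - alpha_k * grad f(x_k)
        x = x0.astype(float)
        for _ in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) < tol:
                break
            alpha = 1.0
            # Backtracking (Armijo) search in place of the exact minimizer
            while f(x - alpha * g) > f(x) - 1e-4 * alpha * (g @ g):
                alpha *= 0.5
            x = x - alpha * g
        return x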

6 Newton's method
Newton's direction: $d_k = -[\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$.
Newton's method: given $x_0$, compute $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$.
Although Newton's method converges very fast, the Hessian matrix is difficult to compute.
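A corresponding sketch (hess_f is a user-supplied Hessian; solving the linear system replaces the explicit inverse):

    import numpy as np

    def newton_method(grad_f, hess_f, x0, tol=1e-8, max_iter=100):
        # Iterate x_{k+1} = x_k - [grad^2 f(x_k)]^{-1} grad f(x_k)
        x = x0.astype(float)
        for _ in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) < tol:
                break
            # Solve the Newton system rather than forming the inverse
            x = x - np.linalg.solve(hess_f(x), g)
        return x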

7 Quasi-Newton method: BFGS
Instead of computing the Hessian matrix exactly, quasi-Newton methods approximate it. In quasi-Newton methods, the inverse of the Hessian matrix is approximated at each iteration by a positive definite (p.d.) matrix, say $H_k$. $H_k$ being symmetric and p.d. implies the descent property of $d_k = -H_k \nabla f(x_k)$.

8 BFGS
The most important quasi-Newton formula is BFGS:

(2)  $H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T$

where $s_k = x_{k+1} - x_k$, $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, and $\rho_k = 1/(y_k^T s_k)$.

THEOREM 1. If $H_k$ is a p.d. matrix and $y_k^T s_k > 0$, then $H_{k+1}$ in (2) is also positive definite. (Hint: for any $z \neq 0$ we can write $z^T H_{k+1} z = w^T H_k w + \rho_k (s_k^T z)^2$, and let $w = z - \rho_k (s_k^T z) y_k$; the two terms are nonnegative and cannot vanish simultaneously.)
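A direct NumPy transcription of (2), assuming vectors s and y that satisfy the curvature condition $y^T s > 0$:

    import numpy as np

    def bfgs_update(H, s, y):
        # One BFGS update of the inverse-Hessian approximation, formula (2):
        # H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, rho = 1/(y^T s)
        rho = 1.0 / (y @ s)   # requires y^T s > 0 (Theorem 1)
        I = np.eye(len(s))
        V = I - rho * np.outer(y, s)
        return V.T @ H @ V + rho * np.outer(s, s)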

9 Limited-memory quasi-Newton methods: L-BFGS
Limited-memory quasi-Newton methods are useful for solving large problems whose Hessian matrices cannot be computed at a reasonable cost or are not sparse. Various limited-memory methods have been proposed; we focus mainly on an algorithm known as L-BFGS. Writing $V_k = I - \rho_k y_k s_k^T$, the BFGS update (2) becomes

(3)  $H_{k+1} = V_k^T H_k V_k + \rho_k s_k s_k^T$

10 The L-BFGS approximation satisfies the following formulas. For $k+1 \le m$, the full expansion from $H_0$ is kept:

(6)  $H_{k+1} = (V_k^T \cdots V_0^T) H_0 (V_0 \cdots V_k) + (V_k^T \cdots V_1^T) \rho_0 s_0 s_0^T (V_1 \cdots V_k) + \cdots + \rho_k s_k s_k^T$

For $k+1 > m$, only the $m$ most recent pairs are used:

(7)  $H_{k+1} = (V_k^T \cdots V_{k-m+1}^T) H_0 (V_{k-m+1} \cdots V_k) + (V_k^T \cdots V_{k-m+2}^T) \rho_{k-m+1} s_{k-m+1} s_{k-m+1}^T (V_{k-m+2} \cdots V_k) + \cdots + \rho_k s_k s_k^T$
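In implementations $H_{k+1}$ is never formed explicitly; the product $H_k \nabla f(x_k)$ is obtained by the standard two-loop recursion. A minimal Python sketch (assuming NumPy, with s_list and y_list holding the most recent m pairs, oldest first):

    import numpy as np

    def lbfgs_direction(g, s_list, y_list):
        # Two-loop recursion: computes H_k g without forming H_k explicitly
        q = g.astype(float)
        rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
        alphas = []
        for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
            a = rho * (s @ q)
            alphas.append(a)             # stored newest pair first
            q -= a * y
        s, y = s_list[-1], y_list[-1]
        r = ((s @ y) / (y @ y)) * q      # initial scaling H_k^0 = gamma * I
        for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
            b = rho * (y @ r)
            r += (a - b) * s
        return r                          # the search direction is -r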

11 2. Analysis for dynamical systems with time delay
The unconstrained problem (UP) is reproduced:

(8)  $\min_{x \in \mathbb{R}^n} f(x)$

It is very important that the optimization problem is posed in continuous form, i.e. $x$ can change continuously. The conventional methods are addressed in discrete form.

12 Dynamical system approach
The essence of this approach is to convert (UP) into a dynamical system, or an ordinary differential equation (o.d.e.), so that the solution of this problem corresponds to a stable equilibrium point of the dynamical system. Consider the following simple dynamical system or o.d.e.:

$\dot{x}(t) = -\nabla f(x(t))$

Neural network approach: the mathematical representation of a neural network is an ordinary differential equation which is asymptotically stable at any isolated solution point.
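As a concrete illustration (not from the slides), integrating this gradient flow with forward Euler and step h recovers steepest descent with a fixed step size:

    import numpy as np

    def integrate_gradient_flow(grad_f, x0, h=0.01, t_end=10.0):
        # Forward Euler on dx/dt = -grad f(x): each step is a gradient step
        x = x0.astype(float)
        for _ in range(int(t_end / h)):
            x = x - h * grad_f(x)
        return x

A more accurate o.d.e. solver follows the continuous trajectory to the stable equilibrium more faithfully, which is the point of posing the problem in continuous form.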

13 Some dynamical system versions
Based on the steepest descent direction: $\dot{x} = -\nabla f(x)$
Based on Newton's direction: $\dot{x} = -[\nabla^2 f(x)]^{-1} \nabla f(x)$
Other dynamical systems

14 Delayed dynamical systems approach
The dynamical system approach can solve very large problems. How do we find a "good" $H$?
Steepest descent direction: slow convergence.
Newton's direction: fast convergence, but difficult to compute.
Goal: fast convergence and easy to calculate.

15 The delayed dynamical systems approach solves the delayed o.d.e.

(13)  $\dot{x}(t) = -H(t)\, \nabla f(x(t))$

where $H(t)$ is a continuous analog of the L-BFGS matrix, built from the delayed values $x(t - \tau), x(t - 2\tau), \ldots$ For $t \le m\tau$, we use (13A), in which all delayed values back to $x(0)$ enter the computation of $H$ at $t$.

16 Beyond this point we save only the $m$ previous values of $x$. The definition of $H$ is now, for $t > m\tau$, given by (13B), in which only the pairs formed from $x(t - \tau), \ldots, x(t - m\tau)$ enter the update.
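The slides do not reproduce (13A)/(13B) in full, so the following Python sketch is only illustrative of the mechanism: integrate (13) by forward Euler, keep a history of x, form continuous analogs of the pairs (s_i, y_i) from states spaced tau apart, and rebuild a BFGS-style H(t) from at most m of them. All names and the update order here are assumptions, not the thesis's actual scheme.

    import numpy as np

    def delayed_flow(grad_f, x0, tau=0.1, h=0.01, m=5, t_end=10.0):
        # Illustrative Euler scheme for dx/dt = -H(t) grad f(x(t)), where
        # H(t) is rebuilt from states delayed by tau, 2*tau, ..., m*tau.
        steps_per_delay = int(round(tau / h))
        x = x0.astype(float)
        history = [x.copy()]                  # x sampled every h
        for _ in range(int(round(t_end / h))):
            pairs = []
            for i in range(1, m + 1):
                j = len(history) - 1 - i * steps_per_delay
                if j < 0:
                    break                     # fewer than i delays available
                x_old = history[j]
                s = x - x_old                 # continuous analog of s_k
                y = grad_f(x) - grad_f(x_old) # continuous analog of y_k
                if y @ s > 1e-12:             # curvature condition keeps H p.d.
                    pairs.append((s, y))
            H = np.eye(len(x))
            for s, y in reversed(pairs):      # BFGS-style updates, oldest first
                rho = 1.0 / (y @ s)
                V = np.eye(len(x)) - rho * np.outer(y, s)
                H = V.T @ H @ V + rho * np.outer(s, s)
            x = x - h * (H @ grad_f(x))
            history.append(x.copy())
        return x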

17 Uniqueness property of dynamical systems: Lipschitz continuity
If the right-hand side of (13) is Lipschitz continuous in $x$, then the initial value problem has a unique solution (Picard-Lindelöf).

18 3. Numerical testing
Test problems:
1. Extended Rosenbrock function
2. Penalty function I
3. Variable dimensioned function
4. Linear function, rank 1

Result of the modified Rosenbrock problem (columns: t, value, step):
L-BFGS  20497
Steepest descent
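For reference, a NumPy sketch of the first test problem in its standard form (objective and gradient; assumes x has even dimension):

    import numpy as np

    def extended_rosenbrock(x):
        # f(x) = sum over pairs: 100*(x_even - x_odd^2)^2 + (1 - x_odd)^2
        xo, xe = x[0::2], x[1::2]
        return np.sum(100.0 * (xe - xo**2) ** 2 + (1.0 - xo) ** 2)

    def extended_rosenbrock_grad(x):
        g = np.zeros_like(x, dtype=float)
        xo, xe = x[0::2], x[1::2]
        g[0::2] = -400.0 * xo * (xe - xo**2) - 2.0 * (1.0 - xo)
        g[1::2] = 200.0 * (xe - xo**2)
        return g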

19 Comparison of function value (plots for m = 2, m = 4, m = 6)

20 Comparison of norm of gradient (plots for m = 2, m = 4, m = 6)

21 A new code: RADAR5
The code RADAR5 is for stiff problems, including differential-algebraic and neutral delay equations with constant or state-dependent (possibly vanishing) delays.

22 Breaking points
 Discontinuities occur in different orders of the derivatives of the solution.
 To detect a breaking point, find the value of $t$ that makes the deviation function zero: $t - \tau(t, u(t)) - \xi = 0$, where $\xi$ is a previous breaking point and $u$ is a suitable continuous approximation to the solution.
(Diagram: implicit Runge-Kutta method + delay differential equation -> RADAR5)
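Detecting a breaking point is then a scalar root-finding problem. A hedged sketch using SciPy (the deviation function below is a hypothetical stand-in built from the delay tau, the dense-output approximation u, and a previous breaking point xi):

    from scipy.optimize import brentq

    # Hypothetical deviation function: it crosses zero where the delayed
    # argument t - tau(t, u(t)) hits a previous breaking point xi.
    def make_deviation(tau, u, xi):
        return lambda t: t - tau(t, u(t)) - xi

    # Usage: bracket the sign change in [t_lo, t_hi], then
    #   t_break = brentq(make_deviation(tau, u, xi), t_lo, t_hi)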

23 Theorem 3.1
Consider the DDE

$y'(t) = f(t, y(t), y(t - \tau(t, y(t)))), \quad t \in [t_0, t_f], \qquad y(t) = \phi(t), \quad t \le t_0,$

where $f$ is $C^p$-continuous in its arguments, the initial function $\phi$ is $C^p$-continuous, and the delay $\tau$ is $C^p$-continuous in $[t_0, t_f]$. Moreover, assume that the mesh includes all the discontinuity points of order $\le p$ lying in $[t_0, t_f]$. If the underlying CRK method has discrete order $p$ and uniform order $q$, then the DDE method has discrete global order $p'$ and uniform global order $q'$; that is, $\max_n \|y(t_n) - y_n\| = O(h^{p'})$ and $\max_{t \in [t_0, t_f]} \|y(t) - \eta(t)\| = O(h^{q'})$, where $p' = q' = \min\{p, q+1\}$ and $\eta$ denotes the continuous numerical approximation.

24 4. Main stages of this research
Prove that the function $H$ in (13) is positive definite. (APPENDIX)
Prove that $H$ is Lipschitz continuous.
Show that the solution to (13) is asymptotically stable.
Show that (13) has a better rate of convergence than the dynamical system based on the steepest descent direction.
Perform numerical testing.
Apply this new optimization method to practical problems.

25 APPENDIX: To show that H in (13) is positive definite
Property 1. If $H(0)$ is positive definite, the matrix $H(t)$ defined by (13) is positive definite (provided $y^T s > 0$ for all $t$).
I proved this result by induction. Since the continuous analog of the L-BFGS formula has two cases, the proof needs to cater for each of them.

26 Case 1: $t \le m\tau$.
When $t \in [0, \tau]$, $H(t)$ is p.d. (Theorem 1). Assume that $H(t)$ is p.d. when $t \in [(k-1)\tau, k\tau]$. If $t \in [k\tau, (k+1)\tau]$, applying Theorem 1 to the update formula shows that $H(t)$ is again p.d.

27 Case 2: $t > m\tau$.
In this case the initial matrix $H(0)$ no longer appears in the formula. Since by the induction assumption the earlier matrices are p.d., it is obvious that $H(t)$ is also p.d.
