
1 L-BFGS and Delayed Dynamical Systems Approach for Unconstrained Optimization Xiaohui XIE Supervisor: Dr. Hon Wah TAM

2 Outline
● Problem background and introduction
● Analysis for dynamical systems with time delay
   - Introduction of dynamical systems
   - Delayed dynamical systems approach
   - Uniqueness property of dynamical systems
● Numerical testing
● Main stages of this research
● APPENDIX

3 1. Problem background and introduction Optimization problems are classified into four categories; our research focuses on the unconstrained optimization problem (UP): $\min_{x \in \mathbb{R}^n} f(x)$.

4 Descent direction A common theme behind all these methods is to find a direction $d_k$ so that there exists an $\bar\alpha > 0$ such that $f(x_k + \alpha d_k) < f(x_k)$ for all $\alpha \in (0, \bar\alpha]$.

5 Steepest descent method For (UP), $-\nabla f(x_k)$ is a descent direction at $x_k$, or $-\nabla f(x)$ is a descent direction for $f$.

6 Method of Steepest Descent Find $\alpha_k$ that solves $\min_{\alpha > 0} f(x_k - \alpha \nabla f(x_k))$. Then $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$. Unfortunately, the steepest descent method converges only linearly, and sometimes the linear rate is very slow.
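Not part of the original slides: a minimal sketch of the steepest descent method in Python, using a backtracking (Armijo) line search in place of the exact minimization over $\alpha$; the test function, tolerances, and constants are illustrative assumptions.

```python
import numpy as np

def steepest_descent(f, grad_f, x0, tol=1e-6, max_iter=10_000):
    """Minimize f by moving along -grad_f(x) with a backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:          # stop when the gradient is small
            break
        alpha = 1.0
        # Backtracking (Armijo) search: shrink alpha until sufficient decrease.
        while f(x - alpha * g) > f(x) - 1e-4 * alpha * (g @ g):
            alpha *= 0.5
        x = x - alpha * g
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10 * y^2 (a placeholder test function).
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * x[1] ** 2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * x[1]])
print(steepest_descent(f, grad_f, [5.0, 3.0]))   # approaches [1, 0]
```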

7 Newton's method Newton's direction: $d_k = -[\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$. Newton's method: given $x_0$, compute $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$. Although Newton's method converges very fast, the Hessian matrix is difficult to compute.
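For comparison, a minimal sketch of the Newton iteration described above; solving the Newton linear system instead of inverting the Hessian is a standard implementation choice, and `grad_f`, `hess_f`, and the tolerances are placeholders.

```python
import numpy as np

def newton_method(grad_f, hess_f, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration: x_{k+1} = x_k - [hess f(x_k)]^{-1} grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        # Newton direction: solve H d = -g (cheaper and more stable than inversion).
        d = np.linalg.solve(hess_f(x), -g)
        x = x + d
    return x
```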

8 Quasi-Newton method—BFGS Instead of using the Hessian matrix, quasi-Newton methods approximate it. In quasi-Newton methods, the inverse of the Hessian matrix is approximated at each iteration by a positive definite (p.d.) matrix, say $H_k$. $H_k$ being symmetric and p.d. implies the descent property of the direction $-H_k \nabla f(x_k)$.

9 BFGS The most important quasi-Newton formula is BFGS:
$H_{k+1} = (I - \rho_k s_k y_k^T)\, H_k \,(I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T$, (2)
where $s_k = x_{k+1} - x_k$, $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, and $\rho_k = 1/(y_k^T s_k)$.
THEOREM 1 If $H_k$ is a p.d. matrix and $y_k^T s_k > 0$, then $H_{k+1}$ in (2) is also positive definite. (Hint: for any $z \neq 0$ we can write $z^T H_{k+1} z = w^T H_k w + \rho_k (s_k^T z)^2$, and let $w = z - \rho_k (s_k^T z)\, y_k$.)
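A minimal sketch of one BFGS update of the inverse-Hessian approximation in the standard form above; that this matches equation (2) of the slides exactly is an assumption.

```python
import numpy as np

def bfgs_update(H, s, y):
    """One BFGS update of the inverse-Hessian approximation H.

    s = x_{k+1} - x_k, y = grad_{k+1} - grad_k.  Requires y.T @ s > 0, which
    (by Theorem 1) guarantees the updated matrix stays positive definite.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(y, s)
    return V.T @ H @ V + rho * np.outer(s, s)
```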

10 Limited-Memory Quasi-Newton Methods —L-BFGS Limited-memory quasi-Newton methods are useful for solving large problems whose Hessian matrices cannot be computed at a reasonable cost or are not sparse. Various limited-memory methods have been proposed; we focus mainly on an algorithm known as L-BFGS. (3)

11 The L-BFGS approximation satisfies the following formula: for $k \le m$,
$H_k = (V_{k-1}^T \cdots V_0^T)\, H_0 \,(V_0 \cdots V_{k-1}) + \sum_{i=0}^{k-1} \rho_i\, (V_{k-1}^T \cdots V_{i+1}^T)\, s_i s_i^T \,(V_{i+1} \cdots V_{k-1})$, (6)
and for $k > m$,
$H_k = (V_{k-1}^T \cdots V_{k-m}^T)\, H_k^0 \,(V_{k-m} \cdots V_{k-1}) + \sum_{i=k-m}^{k-1} \rho_i\, (V_{k-1}^T \cdots V_{i+1}^T)\, s_i s_i^T \,(V_{i+1} \cdots V_{k-1})$, (7)
where $V_i = I - \rho_i y_i s_i^T$ and $\rho_i = 1/(y_i^T s_i)$.
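In practice the matrix in (6)/(7) is never formed explicitly; the product of $H_k$ with the gradient is usually computed by the standard two-loop recursion. The sketch below assumes the common scaling $H_k^0 = \gamma_k I$ and stores only the last m pairs $(s_i, y_i)$.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: return -H_k * grad using only the stored (s, y) pairs."""
    q = grad.copy()
    alphas = []
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * y
    # Initial scaling H_k^0 = gamma * I (a common default choice).
    if s_list:
        gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest pair to newest.
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r = r + (a - b) * s
    return -r
```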

12 2. Analysis for dynamical systems with time delay The unconstrained problem (UP) is reproduced: $\min_{x \in \mathbb{R}^n} f(x)$. (8) It is very important that the optimization problem is posed in continuous form, i.e. x can be changed continuously. The conventional methods are addressed in discrete form.

13 Dynamical system approach The essence of this approach is to convert (UP) into a dynamical system or an ordinary differential equation (o.d.e.) so that the solution of this problem corresponds to a stable equilibrium point of this dynamical system. Neural network approach The mathematical representation of a neural network is an ordinary differential equation that is asymptotically stable at any isolated solution point.

14 Consider the following simple dynamical system or o.d.e.: $\frac{dx}{dt} = g(x(t))$, $x(t_0) = x_0$. (9) DEFINITION 1. (Equilibrium point). A point $x^*$ is called an equilibrium point of (9) if $g(x^*) = 0$. DEFINITION 3. (Convergence). Let $x(t)$ be the solution of (9). An isolated equilibrium point $x^*$ is convergent if there exists a $\delta > 0$ such that if $\|x(t_0) - x^*\| \le \delta$, then $x(t) \to x^*$ as $t \to \infty$.

15 Some dynamical system versions Based on the steepest descent direction: $\frac{dx}{dt} = -\nabla f(x(t))$. Based on Newton's direction: $\frac{dx}{dt} = -[\nabla^2 f(x(t))]^{-1} \nabla f(x(t))$. Other dynamical systems.

16 The dynamical system approach can solve very large problems. How do we find a "good" dynamical system? The dynamical system approach normally consists of the following three steps:
● to establish an o.d.e. system;
● to study the convergence of the solution of the o.d.e. as $t \to \infty$; and
● to solve the o.d.e. system numerically.
Even though the solutions of o.d.e. systems are continuous, the actual computation has to be done discretely.
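To make the last step concrete, here is a minimal sketch (not from the slides) of integrating the steepest-descent-based system $\frac{dx}{dt} = -\nabla f(x(t))$ with SciPy's solve_ivp; the test function and integration horizon are arbitrary choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Gradient flow dx/dt = -grad f(x) for the placeholder f(x, y) = (x - 1)^2 + 10 * y^2.
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * x[1]])

sol = solve_ivp(lambda t, x: -grad_f(x), t_span=(0.0, 50.0),
                y0=[5.0, 3.0], rtol=1e-8, atol=1e-10)
print(sol.y[:, -1])   # approaches the equilibrium point [1, 0] as t grows
```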

17 Delayed dynamical systems approach The steepest descent direction has slow convergence; Newton's direction converges fast but is difficult to compute. The goal is a direction with fast convergence that is also easy to calculate.

18 The delayed dynamical systems approach solves the delayed o.d.e. $\frac{dx}{dt} = -H(t)\,\nabla f(x(t))$, (13) where $H(t)$ is a continuous analogue of the L-BFGS matrix built from delayed values of $x$. In the early phase, before $m$ delayed values are available, we use formula (13A), which computes $H$ at time $t$ from all of the delayed values obtained so far.

19 Beyond this point we save only the $m$ most recent values of $x$. The definition of $H$ is now, for $k > m$, given by formula (13B), the analogue of (7), which uses only those $m$ delayed values.
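The following sketch shows one plausible reading of how delayed states could feed an L-BFGS-style matrix: (s, y) pairs are built from $x(t - m\tau), \ldots, x(t - \tau), x(t)$ and passed to the two-loop recursion above. The `history` callable, the pairing scheme, and the curvature filter are illustrative assumptions, not the exact formulas (13A)/(13B) from the slides.

```python
import numpy as np
# Reuses lbfgs_direction from the two-loop recursion sketch above.

def delayed_rhs(t, x, history, tau, m, grad_f):
    """Right-hand side of a delayed o.d.e. dx/dt = -H(t) grad f(x(t)), where H(t)
    acts through (s, y) pairs built from the m most recent delayed states
    x(t - tau), ..., x(t - m*tau).  This is only one plausible reading of
    (13)/(13B), not the formula from the slides."""
    # history(t') is assumed to return the (interpolated) solution at an earlier time t'.
    past = [history(t - i * tau) for i in range(m, 0, -1)] + [x]
    s_list = [past[i + 1] - past[i] for i in range(len(past) - 1)]
    y_list = [grad_f(past[i + 1]) - grad_f(past[i]) for i in range(len(past) - 1)]
    # Keep only pairs with positive curvature so the implied H stays positive definite.
    pairs = [(s, y) for s, y in zip(s_list, y_list) if y @ s > 1e-12]
    s_list, y_list = map(list, zip(*pairs)) if pairs else ([], [])
    return lbfgs_direction(grad_f(x), s_list, y_list)
```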

20 Uniqueness property of dynamical systems: Lipschitz continuity

Lemma 2.6 Let $F:\mathbb{R}^n \to \mathbb{R}^m$ be continuously differentiable in the open convex set $D \subseteq \mathbb{R}^n$, and let $F'$ be Lipschitz continuous at $x$ in the neighborhood $D$, using a vector norm and the induced matrix operator norm and the Lipschitz constant $\gamma$. Then, for any $x + p \in D$, $\|F(x+p) - F(x) - F'(x)\,p\| \le \frac{\gamma}{2}\,\|p\|^2$.

3. Numerical testing Test problems:
● Extended Rosenbrock function
● Penalty function Ⅰ
● Variable dimensioned function
● Linear function - rank 1
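For reference, a minimal sketch of the Extended Rosenbrock function in its usual form from the Moré–Garbow–Hillstrom test set; the standard pairing of consecutive variables is assumed to match the slides.

```python
import numpy as np

def extended_rosenbrock(x):
    """Extended Rosenbrock: sum over consecutive pairs (x_{2i-1}, x_{2i}).

    f(x) = sum_i [ 100 * (x_{2i} - x_{2i-1}^2)^2 + (1 - x_{2i-1})^2 ],
    with minimum f = 0 at x = (1, ..., 1).  The dimension n must be even.
    """
    x = np.asarray(x, dtype=float)
    odd, even = x[0::2], x[1::2]
    return np.sum(100.0 * (even - odd ** 2) ** 2 + (1.0 - odd) ** 2)

print(extended_rosenbrock(np.ones(10)))   # 0.0 at the minimizer
```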

Result of the modified Rosenbrock problem: table comparing t, function value, and number of steps for L-BFGS and steepest descent.

Comparison of function value for m = 2, m = 4, and m = 6.

Comparison of the norm of the gradient for m = 2, m = 4, and m = 6.

A new code — RADAR5 The code RADAR5 is for stiff problems, including differential-algebraic and neutral delay equations with constant or state-dependent (eventually vanishing) delays.

27 4. Main stages of this research
● Prove that the function H in (13) is positive definite. (APPENDIX)
● Prove that H is Lipschitz continuous.
● Show that the solution to (13) is asymptotically stable.
● Show that (13) has a better rate of convergence than the dynamical system based on the steepest descent direction.
● Perform numerical testing.
● Apply this new optimization method to practical problems.

28 APPENDIX To show that H in (13) is positive definite. Property 1. If the initial matrix is positive definite, the matrix $H$ defined by (13) is positive definite (provided $y_i^T s_i > 0$ for all $i$). I proved this result by induction. Since the continuous analogue of the L-BFGS formula has two cases, the proof needs to cater for each of them.

29 Case 1: for $k \le m$. When $k = 1$, $H$ is p.d. (Theorem 1). Assume that $H$ is p.d. when $k = j$. If $y_j^T s_j > 0$, Theorem 1 gives that $H$ is also p.d. at $k = j + 1$.

30 Case 2: for $k > m$. In this case no separate base case exists. By the assumption that $H$ is p.d. at the previous step, it is obvious that $H$ is also p.d. at the current step.

31