Survey of unconstrained optimization gradient based algorithms

Survey of unconstrained optimization gradient based algorithms
- Unconstrained minimization
- Steepest descent vs. conjugate gradients
- Newton and quasi-Newton methods
- Matlab fminunc

This lecture provides a brief survey of gradient based local unconstrained optimization algorithms. We first discuss algorithms that are based only on the gradient, comparing the intuitive steepest descent method to the much better conjugate gradient method. Then we discuss algorithms that calculate or approximate the Hessian matrix, in particular Newton's method and quasi-Newton methods. We finally discuss their use in Matlab's fminunc.

Unconstrained local minimization
- The necessity for one-dimensional searches
- The most intuitive choice of s_k is the direction of steepest descent
- This choice, however, is very poor
- Methods are based on the dictum that all functions of interest are locally quadratic

Gradient based optimization algorithms assume that gradient calculations are expensive. Therefore, once the gradient is calculated and a search direction selected, it pays to go in that direction for a while, rather than stop soon and update the direction based on another gradient calculation. The move is described by

    x_{k+1} = x_k + alpha_k s_k,

which indicates that the (k+1)-th position is found by moving a distance alpha_k in the direction s_k. The distance alpha_k is usually found by one-dimensional minimization of the function along that direction. The most intuitive direction is the direction of steepest descent (the negative of the gradient), the way water moves on a terrain. However, water can continually change direction, while we want to stick to a single direction as long as possible. This makes the steepest descent direction a poor choice. Instead, methods for choosing a better direction are based on the idea that close enough to a minimum the function behaves like a quadratic.
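
To make the move concrete, here is a minimal MATLAB sketch of steepest descent with a one-dimensional line search. The function name, the stopping tolerance, and the use of fminbnd with an assumed step bracket of [0, 1] are choices made for this illustration, not part of the lecture.

% Minimal sketch of steepest descent with a 1-D line search (illustrative only).
% f     - handle returning the objective value
% gradf - handle returning the gradient (column vector)
% x0    - starting point, niter - maximum number of iterations
function x = steepest_descent_sketch(f, gradf, x0, niter)
x = x0;
for k = 1:niter
    s = -gradf(x);                      % steepest descent direction
    if norm(s) < 1e-8, break; end       % stop when the gradient is nearly zero
    phi = @(alpha) f(x + alpha*s);      % the function restricted to the search line
    alpha = fminbnd(phi, 0, 1);         % 1-D minimization along s (assumed bracket)
    x = x + alpha*s;                    % the move x_{k+1} = x_k + alpha_k*s_k
end
end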

Conjugate gradients

Conjugate gradient methods select a direction which takes into account not only the gradient direction but also the search direction of the previous iteration. By doing that, we can develop algorithms that are guaranteed to converge to the minimum of an n-dimensional quadratic function in no more than n iterations. The equation at the top of the slide gives the recipe for the Fletcher-Reeves conjugate gradient method,

    s_k = -grad f(x_k) + beta_k s_{k-1},   beta_k = ||grad f(x_k)||^2 / ||grad f(x_{k-1})||^2,

and the figure compares it to the steepest descent method for a quadratic function. The two methods start the same, in the negative gradient direction. However, the steepest descent method zig-zags around, while the conjugate gradient method homes in on the minimum in its second move. Note that each move ends tangent to a contour of the function, because we minimize the function along the search direction. Most implementations approximate the function as a quadratic along the search direction, so that if the function is a quadratic the search will end at the minimum in that direction. However, if the function is not quadratic we are likely to stop at some distance from the point where the direction is tangent to the contour.
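
A minimal MATLAB sketch of the Fletcher-Reeves iteration, using the same hypothetical f, gradf, and fminbnd line search as in the steepest descent sketch above; again the tolerance and the step bracket are assumptions made for the illustration.

% Minimal sketch of the Fletcher-Reeves conjugate gradient method (illustrative only).
function x = fletcher_reeves_sketch(f, gradf, x0, niter)
x = x0;
g = gradf(x);
s = -g;                                  % the first move is in the steepest descent direction
for k = 1:niter
    phi = @(alpha) f(x + alpha*s);
    alpha = fminbnd(phi, 0, 1);          % 1-D minimization along s (assumed bracket)
    x = x + alpha*s;
    gnew = gradf(x);
    if norm(gnew) < 1e-8, break; end     % stop when the gradient is nearly zero
    beta = (gnew'*gnew) / (g'*g);        % Fletcher-Reeves beta_k
    s = -gnew + beta*s;                  % new direction mixes gradient and previous direction
    g = gnew;
end
end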

Newton and quasi-Newton methods
- Quasi-Newton methods use successive evaluations of gradients to obtain an approximation to the Hessian or its inverse
- Matlab's fminunc uses a variant of Newton's method if a gradient routine is provided, otherwise the BFGS quasi-Newton method
- The variant of Newton is called the trust-region approach and is based on using a quadratic approximation of the function inside a box

If we have the Hessian matrix we can use a first-order Taylor expansion for the gradient,

    grad f(x_k + delta_x) ≈ grad f(x_k) + Q_k delta_x,

where Q_k is the Hessian matrix at the k-th iteration. Then we select delta_x so that this gradient vanishes, delta_x = -Q_k^{-1} grad f(x_k), to satisfy the stationarity condition. Instead of going there, we just use it to select a direction, because at the present point we may be far from the minimum, and the equation for the gradient may have large errors for large delta_x. This is Newton's method. When it is too expensive to calculate the Hessian matrix we fall back on quasi-Newton methods. These are methods that build an approximation to the Hessian matrix and update it after each gradient calculation. For quadratic functions they are guaranteed to reach the minimum in no more than n iterations and to have an exact Hessian after n iterations. One of the most popular quasi-Newton methods is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. Matlab's fminunc will use a variant of Newton's method if given a routine to calculate the gradient, and BFGS if it has to calculate the gradient by finite differences. The variant of Newton's method used by fminunc is called a trust-region method. It constructs a quadratic approximation to the function and minimizes it in a box around the current point, with the size of the box adjusted depending on the success of the previous iteration.
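
For reference, a minimal MATLAB sketch of the standard BFGS rank-two update of the Hessian approximation after a step; this is the textbook update rather than anything specific to fminunc, and the function name and the curvature tolerance are assumptions made for the illustration.

% Minimal sketch of one BFGS update of the Hessian approximation B (illustrative only).
% step  = x_new - x_old   (the move just taken)
% ygrad = g_new - g_old   (the change in the gradient over that move)
function B = bfgs_update_sketch(B, step, ygrad)
if ygrad'*step > 1e-12                    % curvature condition; otherwise skip the update
    Bs = B*step;
    B  = B - (Bs*Bs')/(step'*Bs) + (ygrad*ygrad')/(ygrad'*step);
end
end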

Problems: unconstrained algorithms

1. Explain the differences and commonalities of steepest descent, conjugate gradients, Newton's method, and quasi-Newton methods for unconstrained minimization. Solution on the Notes page.
2. Use fminunc to minimize the Rosenbrock banana function and compare the trajectories of fminsearch and fminunc starting from (-1.2, 1), with and without the routine for calculating the gradient. Plot the three trajectories. A sketch of the fminunc setup is given after this list.

Solution to Problem 1: All the algorithms share the philosophy of calculating a direction and then doing a one-dimensional minimization along this direction. All the algorithms except for steepest descent are guaranteed to converge in a finite number of steps to the solution for a quadratic function: conjugate gradients and quasi-Newton in n steps, and Newton in one step. The difference between conjugate gradients and quasi-Newton methods is that the latter produce an approximation of the Hessian or its inverse, which accelerates convergence for non-quadratic functions.
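
A minimal MATLAB sketch of the fminunc comparison in Problem 2, assuming the current optimoptions option names; recording and plotting the three trajectories (for example through an OutputFcn) is left out.

% Minimal sketch comparing fminsearch and fminunc on the Rosenbrock banana function.
banana = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
x0 = [-1.2, 1];

% Nelder-Mead simplex: no gradient information at all
xs = fminsearch(banana, x0);

% fminunc without a gradient routine: BFGS quasi-Newton with finite-difference gradients
opts_fd = optimoptions('fminunc', 'Algorithm', 'quasi-newton');
xq = fminunc(banana, x0, opts_fd);

% fminunc with an analytic gradient: trust-region variant of Newton's method
opts_tr = optimoptions('fminunc', 'Algorithm', 'trust-region', ...
                       'SpecifyObjectiveGradient', true);
xn = fminunc(@banana_with_grad, x0, opts_tr);

function [f, g] = banana_with_grad(x)
% Objective value and analytic gradient of the Rosenbrock banana function
f = 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
g = [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1));
      200*(x(2) - x(1)^2)];
end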