Quasi-Newton Methods of Optimization Lecture 2

General Algorithm
• A Baseline Scenario: Algorithm U (model algorithm for n-dimensional unconstrained minimization). Let x_k be the current estimate of x*.
– U1. [Test for convergence] If the conditions for convergence are satisfied, the algorithm terminates with x_k as the solution.
– U2. [Compute a search direction] Compute a nonzero n-vector p_k, the direction of the search.

General Algorithm
– U3. [Compute a step length] Compute a scalar α_k, the step length, for which f(x_k + α_k p_k) < f(x_k).
– U4. [Update the estimate of the minimum] Set x_{k+1} = x_k + α_k p_k, set k = k + 1, and go back to step U1.
Given the steps of this prototype algorithm, I want to develop a sample problem against which we can compare the various algorithms.
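To make the prototype concrete, here is a minimal Python sketch of Algorithm U, assuming a simple backtracking rule for step U3 and a gradient-norm test for U1; the function names (f, grad_f, search_direction) are placeholders rather than anything from the lecture.

```python
import numpy as np

def algorithm_u(f, grad_f, search_direction, x0, tol=1e-8, max_iter=1000):
    """Model algorithm U for n-dimensional unconstrained minimization (sketch)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        # U1: test for convergence (here: gradient norm below a tolerance)
        if np.linalg.norm(g) < tol:
            return x, k
        # U2: compute a nonzero search direction p_k
        p = search_direction(x, g)
        # U3: compute a step length alpha_k with f(x + alpha*p) < f(x)
        alpha = 1.0
        while f(x + alpha * p) >= f(x) and alpha > 1e-12:
            alpha *= 0.5  # simple backtracking
        # U4: update the estimate of the minimum and return to U1
        x = x + alpha * p
    return x, max_iter

# For example, search_direction = lambda x, g: -g gives steepest descent.
```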

General Algorithm
– Using Newton-Raphson, the optimal point for this problem is found in 10 iterations, using 1.23 seconds on the DEC Alpha.

Derivation of the Quasi-Newton Algorithm
• An Overview of Newton and Quasi-Newton Algorithms: The Newton-Raphson methodology can be used in step U2 of the prototype algorithm. Specifically, the search direction can be determined by:
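The formula referred to here is presumably the standard Newton-Raphson search direction,

p_k = -\left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k),

the gradient premultiplied by the negative of the inverse Hessian.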

Derivation of the Quasi-Newton Algorithm
Quasi-Newton algorithms involve an approximation to the Hessian matrix. For example, we could replace the Hessian matrix with the negative of the identity matrix for the maximization problem. In this case the search direction would be:
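Substituting −I for the Hessian in the Newton formula gives (reconstructing the missing expression)

p_k = -(-I)^{-1} \nabla f(x_k) = \nabla f(x_k),

so the algorithm simply steps along the gradient.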

Derivation of the Quasi-Newton Algorithm
This replacement is referred to as the steepest descent method. In our sample problem, this methodology requires 990 iterations and 29.7 seconds on the DEC Alpha.
– The steepest descent method requires more overall iterations. In this example, the steepest descent method requires 99 times as many iterations as the Newton-Raphson method.

Derivation of the Quasi-Newton Algorithm
– Typically, however, the time spent on each iteration is reduced. In the current comparison, the steepest descent method requires .030 seconds per iteration while Newton-Raphson requires .123 seconds per iteration.
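These per-iteration figures are consistent with the totals reported above: 10 × .123 s ≈ 1.23 s for Newton-Raphson, versus 990 × .030 s ≈ 29.7 s for steepest descent.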

Derivation of the Quasi-Newton Algorithm
Obviously, substituting the identity matrix uses no real information from the Hessian matrix. An alternative to this drastic reduction would be to systematically derive a matrix H_k which uses curvature information akin to the Hessian matrix. The projection (search direction) could then be derived as:
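If H_k denotes the approximation to the Hessian, the missing formula is presumably the quasi-Newton analogue of the Newton step,

p_k = -H_k^{-1} \nabla f(x_k)

(or p_k = -H_k \nabla f(x_k) when H_k is maintained as an approximation to the inverse Hessian).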

Derivation of the Quasi-Newton Algorithm
• Conjugate Gradient Methods: One class of quasi-Newton methods comprises the conjugate gradient methods, which "build up" information on the Hessian matrix.
– From our standard starting point, we take a Taylor series expansion around the point x_k + s_k.
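The expansion referred to, written out (this reconstruction follows the standard secant derivation), is the first-order expansion of the gradient,

\nabla f(x_k + s_k) \approx \nabla f(x_k) + \nabla^2 f(x_k)\, s_k,

which motivates requiring the updated approximation B_{k+1} to satisfy the quasi-Newton (secant) condition

B_{k+1} s_k = y_k, \qquad y_k = \nabla f(x_k + s_k) - \nabla f(x_k).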

Derivation of the Quasi-Newton Algorithm

• One way to generate B_{k+1} is to start with the current B_k and add new information on the current solution:
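One standard way to write such an update (reconstructed; v is a yet-to-be-chosen n-vector) is the rank-one correction

B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, v^T}{v^T s_k},

which satisfies the secant condition B_{k+1} s_k = y_k for any v with v^T s_k \neq 0.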

Derivation of the Quasi-Newton Algorithm

The Rank-One update then involves choosing v to be y_k − B_k s_k. Among other things, this choice will yield a symmetric Hessian approximation:
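Filling in the missing equation, this is the symmetric rank-one (SR1) update:

B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}.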

Derivation of the Quasi-Newton Algorithm
Other than the Rank-One update, no simple choice of v will result in a symmetric Hessian approximation. An alternative is to symmetrize: replace the update with one-half the sum of the update and its transpose. This procedure yields the general update:
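The general symmetric update that results (reconstructed; repeatedly symmetrizing and re-imposing the secant condition leads to this standard form, with v still a free vector) is

B_{k+1} = B_k + \frac{(y_k - B_k s_k) v^T + v (y_k - B_k s_k)^T}{v^T s_k} - \frac{(y_k - B_k s_k)^T s_k}{(v^T s_k)^2}\, v v^T.

Choosing v = s_k in this family gives the Powell-Symmetric-Broyden (PSB) update that appears in the comparisons below; v = y_k gives DFP.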

DFP and BFGS
Two prominent conjugate gradient methods are the Davidon-Fletcher-Powell (DFP) update and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update.
– In the DFP update, v is set equal to y_k, yielding:
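Setting v = y_k in the general symmetric update gives (reconstructing the slide's formula)

B_{k+1}^{DFP} = B_k + \frac{(y_k - B_k s_k) y_k^T + y_k (y_k - B_k s_k)^T}{y_k^T s_k} - \frac{(y_k - B_k s_k)^T s_k}{(y_k^T s_k)^2}\, y_k y_k^T,

which is algebraically the same as the more familiar form

B_{k+1}^{DFP} = \left(I - \frac{y_k s_k^T}{y_k^T s_k}\right) B_k \left(I - \frac{s_k y_k^T}{y_k^T s_k}\right) + \frac{y_k y_k^T}{y_k^T s_k}.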

DFP and BFGS
– The BFGS update is then:
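The formula here is presumably the standard BFGS update of the Hessian approximation,

B_{k+1}^{BFGS} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}.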

DFP and BFGS
• A Numerical Example: Using the previously specified problem and starting with the identity matrix as the initial Hessian approximation, each algorithm was used to maximize the utility function.

DFP and BFGS
In discussing the differences in the steps, I will focus on two attributes.
– The first attribute is the relative length of the step (its 2-norm).
– The second attribute is the direction of the step. Dividing each step vector by its 2-norm yields a normalized direction of search.
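A minimal Python sketch of how these two attributes can be computed for any candidate step (the example vector is a placeholder, not a value from the lecture):

```python
import numpy as np

def step_attributes(step):
    """Return the 2-norm (length) and normalized direction of a step vector."""
    step = np.asarray(step, dtype=float)
    length = np.linalg.norm(step)   # relative length of the step
    direction = step / length       # unit-length direction of the search
    return length, direction

# Placeholder example
length, direction = step_attributes([0.6, -0.8])
print(length)     # 1.0
print(direction)  # [ 0.6 -0.8]
```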

DFP and BFGS

Relative Performance
– The Rank-One Approximation: Iteration 1

Relative Performance
Iteration 2

Relative Performance
– PSB (Powell-Symmetric-Broyden): Iteration 1

Relative Performance
Iteration 2

Relative Performance
– DFP: Iteration 1

Relative Performance
Iteration 2

Relative Performance
– BFGS: Iteration 1

Relative Performance
Iteration 2