Function Optimization: Newton's Method and the Conjugate Gradients Method (Math for CS, Tutorial 5-6)


Newton's Method

In Lecture 5 we have seen that the steepest descent method can suffer from slow convergence. Newton's method fixes this problem in cases where the function f(x) near x* can be approximated by a paraboloid:

    f(x) ≈ f(x_k) + g_k^T (x - x_k) + (1/2) (x - x_k)^T Q_k (x - x_k),    (1)

where g_k = ∇f(x_k) and Q_k = ∇²f(x_k).

Newton's Method 2

Here g_k is the gradient and Q_k is the Hessian of the function f, evaluated at x_k; they appear in the 2nd and 3rd terms of the Taylor expansion (1). The minimum of this quadratic model requires its gradient to vanish:

    g_k + Q_k (x - x_k) = 0,  i.e.  Δ = x - x_k = -Q_k^{-1} g_k.    (2)

The solution of this equation gives the step direction and the step size towards the minimum of the paraboloid (1), which is, presumably, close to the minimum of f(x). The minimization algorithm in which x_{k+1} = y(x_k) = x_k + Δ, with Δ defined by (2), is called Newton's method.
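As an illustration (not part of the original slides), here is a minimal Python sketch of this iteration; grad and hess are assumed to be callables returning the gradient and Hessian of f at a point:

import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
    # Newton iteration x_{k+1} = x_k + delta, with delta given by equation (2).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)                      # gradient g_k at x_k
        if np.linalg.norm(g) < tol:      # stop when the gradient (nearly) vanishes
            break
        Q = hess(x)                      # Hessian Q_k at x_k
        delta = np.linalg.solve(Q, -g)   # solve Q_k * delta = -g_k
        x = x + delta
    return x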

Newton's Method: Example

Consider the same elliptic function, f(x) = (x_1 - 1)^2 + 4 (x_2 - 2)^2, and find the first step of Newton's method.

[Figure: contour plot of f with the steepest descent direction -f'(0) drawn at the starting point.]
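A hedged check of this example, assuming the starting point x_0 = (0, 0) as the figure suggests: the gradient there is g_0 = (-2, -16), the Hessian is the constant matrix Q = diag(2, 8), and the Newton step is Δ = -Q^{-1} g_0 = (1, 2). Because f is itself a paraboloid, this single step lands exactly on the minimum x* = (1, 2). Using the sketch above:

import numpy as np

# f(x) = (x1 - 1)^2 + 4 * (x2 - 2)^2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 8.0 * (x[1] - 2.0)])
hess = lambda x: np.diag([2.0, 8.0])     # constant Hessian Q

x0 = np.zeros(2)
delta = np.linalg.solve(hess(x0), -grad(x0))
print(x0 + delta)                        # [1. 2.] -- the minimum, reached in one step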

Conjugate Gradients

Suppose that we want to minimize the quadratic function

    f(x) = c + a^T x + (1/2) x^T Q x,

where Q is a symmetric, positive definite matrix, and x has n components. As we saw in the explanation of steepest descent, the minimum x* is the solution to the linear system

    Q x = -a.

The explicit solution of this system requires about O(n^3) operations and O(n^2) memory, which is very expensive.
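A small numeric illustration (the matrix and vector below are made up for the example): the minimizer obtained by setting the gradient a + Q x to zero is exactly the solution of that linear system.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4.0 * np.eye(4)            # symmetric, positive definite
a = rng.standard_normal(4)
c = 1.0

f = lambda x: c + a @ x + 0.5 * x @ Q @ x
x_star = np.linalg.solve(Q, -a)          # minimum of f: solution of Q x = -a
print(np.allclose(Q @ x_star + a, 0.0))  # True: the gradient vanishes at x_star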

Conjugate Gradients 2

We now consider an alternative solution method that does not need Q, but only the gradient of f evaluated at n different points x_1, ..., x_n.

[Figure: search path of the gradient method (left) vs. the conjugate gradient method (right).]

Conjugate Gradients 3

Consider the case n = 3, in which the variable x in f(x) is a three-dimensional vector. Then the quadratic function f(x) is constant over ellipsoids, called isosurfaces, centered at the minimum x*. How can we start from a point x_0 on one of these ellipsoids and reach x* by a finite sequence of one-dimensional searches? In steepest descent, for poorly conditioned Hessians, orthogonal directions lead to many small steps, that is, to slow convergence.

Conjugate Gradients: Spherical Case

When the ellipsoids are spheres, on the other hand, convergence is much faster: the first step takes us from x_0 to x_1, and the line between x_0 and x_1 is tangent to an isosurface at x_1. The next step, in the direction of the gradient, takes us to x* right away. Suppose, however, that we cannot afford to compute this special direction p_1 orthogonal to p_0, but that we can only compute some direction p_1 orthogonal to p_0 (there is an (n-1)-dimensional space of such directions!) and reach the minimum of f(x) along this direction. In that case n steps will take us to the minimum x* of the sphere, since the coordinate of the minimum along each of the n directions is independent of the others.

Conjugate Gradients: Elliptical Case

Any set of orthogonal directions, with a line search in each direction, will lead to the minimum for spherical isosurfaces. Given an arbitrary set of ellipsoidal isosurfaces, there is a one-to-one mapping with a spherical system: if Q = U E U^T is the SVD of the symmetric, positive definite matrix Q, then we can write

    (1/2) x^T Q x = (1/2) y^T y,    (4)

where

    y = E^{1/2} U^T x.    (5)

Elliptical Case 2

Consequently, there must be a condition for the original problem (in terms of Q) that is equivalent to orthogonality for the spherical problem. If two directions q_i and q_j are orthogonal in the spherical context, that is, if

    q_i^T q_j = 0,

what does this translate into in terms of the directions p_i and p_j for the ellipsoidal problem? We have

    q_i = E^{1/2} U^T p_i,   q_j = E^{1/2} U^T p_j.    (6)

Elliptical Case 3

Consequently,

    q_i^T q_j = p_i^T U E^{1/2} E^{1/2} U^T p_j = p_i^T Q p_j,

so the orthogonality condition becomes

    p_i^T Q p_j = 0.    (7)

This condition is called Q-conjugacy, or Q-orthogonality: if equation (7) holds, then p_i and p_j are said to be Q-conjugate or Q-orthogonal to each other, or simply "conjugate".
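A brief numeric sketch of this equivalence (the matrix Q below is arbitrary, chosen only for illustration): two directions whose images under the mapping (5) are orthogonal turn out to be Q-conjugate in the original coordinates.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
Q = A @ A.T + 3.0 * np.eye(3)            # symmetric, positive definite

U, E, _ = np.linalg.svd(Q)               # Q = U diag(E) U^T
T = np.diag(np.sqrt(E)) @ U.T            # mapping (5): y = E^(1/2) U^T x

q_i = np.array([1.0, 0.0, 0.0])          # orthogonal directions in the spherical system
q_j = np.array([0.0, 1.0, 0.0])
p_i = np.linalg.solve(T, q_i)            # corresponding directions in the original system
p_j = np.linalg.solve(T, q_j)

print(np.isclose(p_i @ Q @ p_j, 0.0))    # True: p_i and p_j satisfy (7)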

Elliptical Case 4

In summary, if we can find n directions p_0, ..., p_{n-1} that are mutually conjugate, i.e. comply with (7), and if we do a line minimization along each direction p_k, we reach the minimum in at most n steps. Of course, we cannot use the transformation (5) in the algorithm, because E and especially U^T are too large. So we need to find a method for generating n conjugate directions without using either Q or its SVD.

Hestenes-Stiefel Procedure

Starting from an arbitrary x_0 with p_0 = -g_0, the directions are generated by

    x_{k+1} = x_k + α_k p_k,   α_k = arg min_α f(x_k + α p_k),
    p_{k+1} = -g_{k+1} + γ_k p_k,

where

    γ_k = (g_{k+1}^T Q p_k) / (p_k^T Q p_k),   g_k = ∇f(x_k).
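A minimal Python sketch of this procedure for the quadratic f(x) = c + a^T x + (1/2) x^T Q x (the function and variable names are illustrative, not from the slides); it still uses Q explicitly, exactly as in the formulas above:

import numpy as np

def conjugate_gradient_hs(Q, a, x0):
    # Minimize c + a^T x + 0.5 x^T Q x in at most n steps (Hestenes-Stiefel form).
    x = np.asarray(x0, dtype=float)
    g = a + Q @ x                        # gradient g_0
    p = -g                               # first search direction p_0
    for _ in range(len(x)):
        Qp = Q @ p
        alpha = -(g @ p) / (p @ Qp)      # exact line minimization along p_k
        x = x + alpha * p
        g = a + Q @ x                    # gradient g_{k+1}
        if np.linalg.norm(g) < 1e-12:    # already at the minimum
            break
        gamma = (g @ Qp) / (p @ Qp)      # gamma_k, still written with the Hessian Q
        p = -g + gamma * p               # next conjugate direction p_{k+1}
    return x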

Hestenes-Stiefel Procedure 2

It is simple to see that p_k and p_{k+1} are conjugate. In fact,

    p_k^T Q p_{k+1} = p_k^T Q (-g_{k+1} + γ_k p_k)
                    = -p_k^T Q g_{k+1} + (g_{k+1}^T Q p_k / p_k^T Q p_k) p_k^T Q p_k = 0.

The proof that p_i and p_{k+1} for i = 0, ..., k are also conjugate can be done by induction, based on the observation that the vectors p_k are found by a generalization of Gram-Schmidt that produces conjugate rather than orthogonal vectors.

Removing the Hessian

In the described algorithm the expression for γ_k contains the Hessian Q, which is too large. We now show that γ_k can be rewritten in terms of the gradient values g_k and g_{k+1} only. To this end, we notice that

    g_{k+1} = g_k + α_k Q p_k,

or

    Q p_k = (g_{k+1} - g_k) / α_k.

Proof:

    g_{k+1} = a + Q x_{k+1} = a + Q (x_k + α_k p_k) = (a + Q x_k) + α_k Q p_k = g_k + α_k Q p_k,

so that Q p_k = (g_{k+1} - g_k) / α_k.

Removing the Hessian 2

We can therefore write

    γ_k = (g_{k+1}^T (g_{k+1} - g_k)) / (p_k^T (g_{k+1} - g_k)),

and Q has disappeared. This expression for γ_k can be further simplified by noticing that

    p_k^T g_{k+1} = 0,

because the line along p_k is tangent to an isosurface at x_{k+1}, while the gradient g_{k+1} is orthogonal to the isosurface at x_{k+1}.

Polak-Ribiere Formula

Similarly, p_{k-1}^T g_k = 0. Then, using p_k = -g_k + γ_{k-1} p_{k-1}, the denominator of γ_k becomes

    p_k^T (g_{k+1} - g_k) = -p_k^T g_k = (g_k - γ_{k-1} p_{k-1})^T g_k = g_k^T g_k.

In conclusion, we obtain the Polak-Ribiere formula

    γ_k = (g_{k+1}^T (g_{k+1} - g_k)) / (g_k^T g_k).
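To close the loop, here is a hedged sketch of a gradient-only conjugate gradient routine that uses the Polak-Ribiere coefficient; the backtracking line search below is a crude illustrative stand-in, since the slides do not specify one:

import numpy as np

def line_search(f, x, p, g, t=1.0, beta=0.5, c=1e-4, max_halvings=50):
    # Backtracking line search (sufficient-decrease test), for illustration only.
    for _ in range(max_halvings):
        if f(x + t * p) <= f(x) + c * t * (g @ p):
            break
        t *= beta
    return t

def polak_ribiere_cg(f, grad, x0, tol=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    p = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:             # stop when the gradient (nearly) vanishes
            break
        t = line_search(f, x, p, g)
        x = x + t * p
        g_new = grad(x)
        gamma = g_new @ (g_new - g) / (g @ g)   # Polak-Ribiere formula
        p = -g_new + gamma * p                  # next search direction
        g = g_new
    return x

Note that only gradients of f appear: neither Q nor its SVD is needed, which was the goal of the derivation.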