# Why Function Optimization?


## Tutorial 11: Unconstrained Optimization

Steepest descent and Newton's method.

## Why Function Optimization?

There are three main reasons why most problems in robotics, vision, and arguably every other science or endeavor take the form of optimization problems:

1. The desired goal may not be achievable, so we try to get as close to it as possible.
2. There may be several ways to achieve the goal, so we can choose one by assigning a quality to all the solutions and selecting the best.
3. We may not know how to solve the system of equations f(x) = 0, so instead we minimize the norm ||f(x)||, which is a scalar function of the unknown vector x.

Tutorial 11, M4CS 2005

## Characteristics of Optimization Algorithms

For x* = arg min f(x), an algorithm is judged by:

1. Stability: under what conditions will the minimum be reached?
2. Convergence speed: N, the order of the algorithm (usually N = 1 or 2, rarely 3).
3. Complexity: how much time (CPU operations) each iteration takes.

## Line Search

A line search could run as follows. Let

$$\varphi(\alpha) = f(\mathbf{x}_k + \alpha \mathbf{p}_k)$$

be the scalar function of α representing the possible values of f(x) in the direction of p_k. Let (a, b, c) be three values of α, with a < b < c, bracketing a single (constrained) minimum α*: a < α* < c. Then the following algorithm approaches α* arbitrarily closely:

- If φ(a) ≥ φ(c), set u = (a + b)/2:
  - if φ(u) < φ(b), then (a, b, c) ← (a, u, b);
  - else (a, b, c) ← (u, b, c).
- If φ(a) < φ(c), set u = (b + c)/2:
  - if φ(u) < φ(b), then (a, b, c) ← (b, u, c);
  - else (a, b, c) ← (a, b, u).

(Figure: the bracket a < b < c with the probe point u.)
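The bracketing update above can be sketched in code. This is a minimal sketch, not from the slides: the function name `line_search`, the tolerance, and the test function φ(α) = (α − 0.3)² are my own choices.

```python
def line_search(phi, a, b, c, tol=1e-8):
    """Shrink a bracket (a, b, c), with a < b < c and phi(b) < phi(a), phi(b) < phi(c),
    until its width falls below tol; return the approximate minimizer of phi."""
    while c - a > tol:
        if phi(a) >= phi(c):          # probe inside the left interval [a, b]
            u = (a + b) / 2
            if phi(u) < phi(b):
                a, b, c = a, u, b     # minimum now bracketed by (a, b)
            else:
                a, b, c = u, b, c     # minimum now bracketed by (u, c)
        else:                          # probe inside the right interval [b, c]
            u = (b + c) / 2
            if phi(u) < phi(b):
                a, b, c = b, u, c
            else:
                a, b, c = a, b, u
    return b

# Example: minimize phi(alpha) = (alpha - 0.3)^2 from the bracket (0, 0.5, 1)
alpha = line_search(lambda t: (t - 0.3) ** 2, 0.0, 0.5, 1.0)
```

Each update keeps the invariant φ(b) < φ(a), φ(b) < φ(c), so for a unimodal φ the minimizer always stays inside the shrinking bracket.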

## Taylor Series

The Taylor series of a scalar function f(x) about a point x₀ is

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2 + \dots$$

The coefficients can be derived by successive differentiation of a polynomial representation of f(x): differentiating n times and evaluating at x = x₀ isolates the coefficient of (x − x₀)ⁿ. For a function of n variables, the expansion is

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T(\mathbf{x} - \mathbf{x}_0) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T H(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + \dots$$

where H is the Hessian matrix of second derivatives, H_ij = ∂²f/∂x_i ∂x_j.

## 2D Taylor Series: Example

Consider an elliptic function, f(x, y) = (x − 1)² + (2y − 2)², and find the first three terms of its Taylor expansion. The gradient is ∇f = (2(x − 1), 4(2y − 2)) and the Hessian is constant,

$$H = \begin{pmatrix} 2 & 0 \\ 0 & 8 \end{pmatrix},$$

so the first three terms (constant, linear, quadratic) about any point reproduce f exactly: all higher derivatives vanish.
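A quick numeric check of this claim (the expansion point `z0` and evaluation point `z` are arbitrary choices of mine):

```python
import numpy as np

# f(x, y) = (x - 1)^2 + (2y - 2)^2, as in the example above
def f(z):
    x, y = z
    return (x - 1) ** 2 + (2 * y - 2) ** 2

def grad(z):
    x, y = z
    return np.array([2 * (x - 1), 4 * (2 * y - 2)])

H = np.array([[2.0, 0.0],     # constant Hessian of a quadratic
              [0.0, 8.0]])

z0 = np.array([0.0, 0.0])     # expansion point
z  = np.array([1.7, -0.4])    # arbitrary evaluation point

d = z - z0
taylor3 = f(z0) + grad(z0) @ d + 0.5 * d @ H @ d
assert np.isclose(taylor3, f(z))   # exact: higher-order terms vanish
```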

## Steepest Descent: Example

Consider the same elliptic function, f(x, y) = (x − 1)² + (2y − 2)², and find the first step of steepest descent from (0, 0). The gradient there is ∇f(0, 0) = (−2, −8), so the descent direction is p₀ = −∇f(0, 0) = (2, 8). Now a line search along p₀ can be applied to choose the step size. After the step, check: is the new point a minimum (is the gradient zero)? If not, take the next step.

## Newton's Method

Steepest descent uses only the gradient term of the Taylor expansion to find the minimization direction, and therefore has a linear convergence rate. Newton's method also uses the second derivatives to find both the direction and the step size. It is applicable where the function f(x) near the minimum x* can be approximated by a paraboloid,

$$f(\mathbf{x}) \approx f(\mathbf{x}_k) + \mathbf{g}_k^T(\mathbf{x} - \mathbf{x}_k) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_k)^T H_k (\mathbf{x} - \mathbf{x}_k),$$

in other words, if the Hessian H is positive definite (PD). Setting the gradient of this quadratic model to zero gives the condition for its minimum:

$$H_k(\mathbf{x} - \mathbf{x}_k) = -\mathbf{g}_k \quad\Rightarrow\quad \mathbf{x}_{k+1} = \mathbf{x}_k - H_k^{-1}\mathbf{g}_k.$$

## Newton's Method: Example

Consider the elliptic function f(x) = (x₁ − 1)² + 4(x₂ − 2)² and find the first step of Newton's method from (0, 0). The gradient at the origin is g₀ = (−2, −16) and the Hessian is H = diag(2, 8), so the Newton step gives x₁ = x₀ − H⁻¹g₀ = (1, 2). In this simple case, the first three Taylor terms describe the function exactly, and the first iteration converges to the minimum.
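The one-step convergence can be verified directly. Solving the linear system H p = −g is preferred over forming H⁻¹ explicitly; that design choice is mine, the slide only states the update rule.

```python
import numpy as np

# f(x) = (x1 - 1)^2 + 4*(x2 - 2)^2: gradient and (constant) Hessian
def grad(x):
    return np.array([2 * (x[0] - 1), 8 * (x[1] - 2)])

H = np.array([[2.0, 0.0],
              [0.0, 8.0]])

x0 = np.array([0.0, 0.0])
p = np.linalg.solve(H, -grad(x0))   # Newton step: solve H p = -g
x1 = x0 + p                         # lands exactly on the minimizer (1, 2)
```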

## Convergence Rate (1/2)

Before analyzing the convergence rate of steepest descent and Newton's method, we write again the Taylor series of the function, about the minimum x*:

$$f(\mathbf{x}) = f(\mathbf{x}^*) + \nabla f(\mathbf{x}^*)^T(\mathbf{x} - \mathbf{x}^*) + \frac{1}{2}(\mathbf{x} - \mathbf{x}^*)^T H(\mathbf{x}^*)(\mathbf{x} - \mathbf{x}^*) + O(\|\mathbf{x} - \mathbf{x}^*\|^3)$$

To simplify the proof we consider the upper bound of this expansion, taking the norm of each term. Near the minimum the first derivative vanishes: ∇f(x*) = 0. The gradient of the function near the minimum therefore behaves as

$$\nabla f(\mathbf{x}) = H(\mathbf{x}^*)(\mathbf{x} - \mathbf{x}^*) + O(\|\mathbf{x} - \mathbf{x}^*\|^2).$$

## Convergence Rate (2/2)

Now, consider step k of Newton's method:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - H_k^{-1}\mathbf{g}_k.$$

Expanding the gradient at the new point,

$$\mathbf{g}_{k+1} = \mathbf{g}_k + H_k(\mathbf{x}_{k+1} - \mathbf{x}_k) + O(\|\mathbf{x}_{k+1} - \mathbf{x}_k\|^2),$$

the step is chosen to zero out the first two terms, and therefore

$$\|\mathbf{g}_{k+1}\| = O(\|\mathbf{x}_{k+1} - \mathbf{x}_k\|^2).$$

Thus the derivative converges to zero at second order. Since a point of zero derivative corresponds to the minimum of the function, Newton's method is a second-order method.
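The quadratic shrinking of the gradient can be observed numerically. The test function f(x) = x² + eˣ is my own choice (its Hessian 2 + eˣ is strictly positive, so Newton's method applies); it is not from the slides.

```python
import math

# Minimize f(x) = x^2 + exp(x);  f'(x) = 2x + exp(x),  f''(x) = 2 + exp(x) > 0
x = 0.0
grad_norms = []
for _ in range(6):
    g = 2 * x + math.exp(x)
    grad_norms.append(abs(g))
    x -= g / (2 + math.exp(x))      # Newton step: x_{k+1} = x_k - g_k / H_k

# |g| per iteration drops roughly as |g_{k+1}| ~ C * |g_k|^2:
# about 1, 5e-2, 1e-4, 1e-9, ... -- the number of correct digits doubles
```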

## Complexity (1/2)

For example, for a quadratic function

$$f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T Q \mathbf{x} - \mathbf{b}^T\mathbf{x},$$

steepest descent takes many iterations to converge in the general case Q ≠ I, while Newton's method requires only one step. However, this single Newton iteration is more expensive, because it requires both the gradient g_k and the Hessian H_k to be evaluated, for a total of n + n(n + 1)/2 = O(n²) derivatives. In addition, the Hessian must be inverted or, at least, the linear system H_k p_k = −g_k must be solved. The explicit solution of this system requires about O(n³) operations and O(n²) memory, which is very expensive.

## Complexity (2/2)

In contrast, steepest descent requires only the gradient g_k to select the step direction p_k, plus a line search along p_k to find the step size. These cheaper iterations can outweigh the faster convergence of Newton's method when the dimensionality of x is large, possibly many thousands. In the next tutorial we will discuss the method of conjugate gradients, which is motivated by the desire to accelerate convergence relative to steepest descent, but without paying the computation and storage cost of Newton's method.