Published by Domenic Fredricks. Modified about 1 year ago.

1
Optimization

2
Issues
What is optimization?
What real-life situations give rise to optimization problems?
When is it easy to optimize?
What are we trying to optimize?
What can cause problems when we try to optimize?
What methods can we use to optimize?

3
One-Dimensional Minimization Golden section search Brent’s method

4
One-Dimensional Minimization
Golden section search: successively narrowing the upper and lower bounds of a bracket.
Terminating condition: |x3 − x1| < ε (a chosen tolerance).
Start with x1, x2, x3 where f2 is smaller than f1 and f3.
Iteration: choose x4 somewhere in the larger interval.
Two cases for f4: case f4a keeps the bracket [x1, x2, x4]; case f4b keeps [x2, x4, x3].
Initial bracketing…

5
Upper bound a, lower bound b, initial estimate x f(a) > f(x) < f(b) This condition guarantees that a minimum is contained somewhere within the interval. On each iteration a new point x' is selected using one of the available algorithms. If the new point is a better estimate of the minimum, i.e. where f(x') < f(x), then the current estimate of the minimum x is updated. The new point also allows the size of the bounded interval to be reduced, by choosing the most compact set of points which satisfies the constraint f(a) > f(x) < f(b). The interval is reduced until it encloses the true minimum to a desired tolerance. This provides a best estimate of the location of the minimum and a rigorous error estimate. From GSL

6
Golden Section Search
Guaranteed linear convergence: the bracket shrinks by the golden ratio each step, [x1,x3]/[x1,x4] = 1.618.
[GSL] Choosing the golden section as the bisection ratio can be shown to provide the fastest convergence for this type of algorithm.
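The bracketing scheme above can be sketched as follows. This is a minimal illustration, not the GSL implementation; the function name and tolerance are ours:

```python
import math

def golden_section_search(f, x1, x3, tol=1e-8):
    """Minimize f on [x1, x3] by golden section search.

    Assumes a single minimum lies inside the initial bracket.
    """
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi ~ 0.618
    a, b = x1, x3
    # Interior points placed at the golden section of the interval.
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d          # minimum is bracketed by [a, d]
        else:
            a = c          # minimum is bracketed by [c, b]
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
    return (a + b) / 2.0
```

Because the interior points sit at the golden section of the interval, one of them can be reused on the next iteration, so only one new function evaluation is needed per step.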

7
Golden Section (reference)

8
Fibonacci Search (ref)
F_i: 0, 1, 1, 2, 3, 5, 8, 13, …
Related…
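The connection to golden section search is that consecutive Fibonacci ratios approach 1/φ, the golden-section reduction factor; Fibonacci search shrinks its bracket by exactly these ratios. A small sketch of the sequence and the limiting ratio:

```python
def fibonacci(n):
    """Return the Fibonacci numbers F_0..F_n: 0, 1, 1, 2, 3, 5, 8, 13, ..."""
    seq = [0, 1]
    for _ in range(n - 1):
        seq.append(seq[-1] + seq[-2])
    return seq

F = fibonacci(30)
# Consecutive ratios F_{k-1}/F_k approach 1/phi ~ 0.6180,
# the per-step reduction factor of golden section search.
ratio = F[29] / F[30]
```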

9
Parabolic Interpolation (Brent)

10
Brent Details (From GSL) The minimum of the parabola is taken as a guess for the minimum. If it lies within the bounds of the current interval then the interpolating point is accepted, and used to generate a smaller interval. If the interpolating point is not accepted then the algorithm falls back to an ordinary golden section step. The full details of Brent's method include some additional checks to improve convergence.

11
Brent (details)
The abscissa x that is the minimum of a parabola through three points (a, f(a)), (b, f(b)), (c, f(c)).
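That abscissa has a closed form; a minimal sketch (the function name is ours, and the small denominator check that a production Brent implementation needs is omitted):

```python
def parabola_min_abscissa(a, fa, b, fb, c, fc):
    """Abscissa of the vertex of the parabola through
    (a, fa), (b, fb), (c, fc)."""
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den
```

For a function that is nearly quadratic near its minimum, this single interpolation step lands very close to the true minimizer, which is why Brent's method converges faster than pure golden section once the bracket is small.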

12
Multi-Dimensional Minimization Gradient Descent Conjugate Gradient

13
Gradient and Hessian
Objective function f: R^n → R, where f(x) is of class C^2.
Gradient of f: ∇f.
Hessian of f: ∇²f (the matrix of second partial derivatives).
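For intuition, both objects can be approximated by central finite differences; a sketch with our own function names:

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H
```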

14
Optimality
At a minimum the gradient vanishes and the Hessian is positive semi-definite, as Taylor's expansion shows.
For one-dimensional f(x): f'(x*) = 0 and f''(x*) ≥ 0.

15
Multi-Dimensional Optimization
Higher-dimensional root finding is no easier than minimization; in fact it is more difficult.

16
Quasi-Newton Method
Taylor's series of f(x) around x_k, with B an approximation to the Hessian matrix.
The gradient of this approximation, set to zero, provides the Newton step.
The various quasi-Newton methods (DFP, BFGS, Broyden) differ in their choice of the update to B.
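Setting the gradient of the quadratic model to zero means solving B d = −∇f(x_k) for the step d. A minimal sketch (the function name is ours; real quasi-Newton code would also update B from gradient differences):

```python
import numpy as np

def newton_step(grad_f, B, x):
    """One Newton-type step: minimize the local quadratic model
    f(x) + g.T @ d + 0.5 * d.T @ B @ d  by solving  B d = -g."""
    g = grad_f(x)
    d = np.linalg.solve(B, -g)
    return x + d
```

When B is the exact Hessian of a quadratic objective, a single step lands on the minimizer; quasi-Newton methods approach this behavior as their approximation B improves.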

17
Gradient Descent
Are consecutive search directions always orthogonal? Yes — with exact line search, each new gradient is orthogonal to the previous search direction.
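This orthogonality is easy to verify numerically on a quadratic f(x) = ½ xᵀAx − bᵀx, where the exact line-search step length has the closed form α = gᵀg / gᵀAg (the matrix and starting point below are arbitrary illustrative choices):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite (illustrative)
b = np.array([1.0, 1.0])

x = np.array([4.0, -3.0])                # arbitrary starting point
g_prev = A @ x - b                       # gradient of 0.5 x^T A x - b^T x
alpha = (g_prev @ g_prev) / (g_prev @ (A @ g_prev))  # exact line search
x = x - alpha * g_prev
g = A @ x - b
# With exact line search, consecutive gradients are orthogonal:
dot = g @ g_prev
```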

18
Example: minimize the function shown on the slide and locate its minimum.

19
…

20
Gradient is perpendicular to level curves and surfaces (proof)

21
Weakness of Gradient Descent
Narrow valleys: the method zig-zags across the valley and makes slow progress along it.

22
Any function f(x) can be locally approximated by a quadratic function.
The conjugate gradient method works well on this kind of problem.

23
Conjugate Gradient
An iterative method for solving linear systems Ax = b, where A is symmetric and positive definite.
Guaranteed to converge in n steps, where n is the system size.
Symmetric A is positive definite if it satisfies any of these:
1. All n eigenvalues are positive
2. All n upper-left determinants are positive
3. All n pivots are positive
4. x^T A x is positive except at x = 0

24
Details (from Wikipedia)
Two nonzero vectors u and v are conjugate w.r.t. A if u^T A v = 0.
{p_k} are n mutually conjugate directions, so {p_k} form a basis of R^n.
x*, the solution to Ax = b, can be expressed in this basis.
Therefore: find the p_k's, then solve for the coefficients α_k.

25
The Iterative Method
Equivalent problem: find the minimum of the quadratic function f(x) = ½ x^T A x − b^T x.
Take the first basis vector p_1 to be the gradient of f at x = x_0; the other vectors in the basis will be conjugate to the gradient.
r_k: the residual at the k-th step, r_k = b − A x_k.
Note that r_k is the negative gradient of f at x = x_k.

26
The Algorithm
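The algorithm can be sketched in a few lines; this is a minimal textbook version (our own function name and tolerance), not tuned production code:

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve A x = b for symmetric positive definite A.
    Converges in at most n steps in exact arithmetic."""
    n = b.size
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x          # residual = negative gradient of the quadratic
    p = r.copy()           # first search direction: steepest descent
    rs = r @ r
    for _ in range(n):
        Ap = A @ p
        alpha = rs / (p @ Ap)      # exact step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # next direction, A-conjugate to the previous
        rs = rs_new
    return x
```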

27
Example Stationary point at [-1/26, -5/26]

28
Solving Linear Equations
The optimality condition suggests that CG can be used to solve linear equations.
CG is only applicable for symmetric positive definite A.
For arbitrary linear systems, solve the normal equations A^T A x = A^T b, since A^T A is symmetric and positive semidefinite for any A.
But κ(A^T A) = κ(A)²! Slower convergence, worse accuracy.
BiCG (biconjugate gradient) is the approach to use for general A.
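The squaring of the condition number is easy to check numerically (the matrix below is an arbitrary nonsymmetric example):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])    # arbitrary nonsymmetric matrix
k_A = np.linalg.cond(A)                   # 2-norm condition number of A
k_AtA = np.linalg.cond(A.T @ A)           # condition number of the normal equations
# kappa(A^T A) = kappa(A)^2, which is why the normal-equations
# approach converges more slowly and loses accuracy.
```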

29
Multidimensional Minimizer [GSL]
Conjugate gradient: Fletcher-Reeves, Polak-Ribiere
Quasi-Newton: Broyden-Fletcher-Goldfarb-Shanno (BFGS); utilizes a 2nd-order approximation
Steepest descent: inefficient (for demonstration purposes)
Simplex algorithm (Nelder and Mead): works without derivatives

30
GSL Example Objective function: paraboloid Starting from (5,7)

31
Conjugate gradient: converges in 12 iterations.
Steepest descent: converges in 158 iterations.
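To make the comparison concrete, here is steepest descent with exact line search on a paraboloid. The coefficients and the minimum at (1, 2) are our assumptions modeled on the GSL manual's example, not taken from the slides; only the starting point (5, 7) comes from the deck:

```python
import numpy as np

A = np.diag([20.0, 40.0])        # assumed Hessian of 10(x-1)^2 + 20(y-2)^2
center = np.array([1.0, 2.0])    # assumed location of the minimum

def gradient(p):
    return A @ (p - center)

x = np.array([5.0, 7.0])         # starting point from the slide
iters = 0
g = gradient(x)
while np.linalg.norm(g) > 1e-8:
    alpha = (g @ g) / (g @ (A @ g))   # exact line search on a quadratic
    x = x - alpha * g
    g = gradient(x)
    iters += 1
```

The iteration counts depend on the conditioning of the paraboloid, so they will not match the slide's numbers exactly; the qualitative gap between steepest descent and CG is what carries over.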

32
[Solutions in Numerical Recipes]
Sec. 2.7 linbcg (biconjugate gradient): general A; references A implicitly through atimes.
Sec. 10.6 frprmn (minimization).
Model test problem: spacetime, …


© 2017 SlidePlayer.com Inc.

All rights reserved.
