
1 Qualifier Exam in HPC, February 10th, 2010

2 Quasi-Newton methods Alexandru Cioaca

3 Quasi-Newton methods (nonlinear systems)
- Nonlinear systems: F(x) = 0, F : R^n -> R^n, F(x) = [f_i(x_1, ..., x_n)]^T
- Such systems appear in the simulation of processes (physical, chemical, etc.)
- They are solved by iterative algorithms
- Newton's method, in the root-finding sense (not to be confused with Newton-type methods for nonlinear least-squares)

4 Quasi-Newton methods (nonlinear systems)
Standard assumptions:
1. F is continuously differentiable in an open convex set D
2. F' is Lipschitz continuous on D
3. There is x* in D such that F(x*) = 0 and F'(x*) is nonsingular
Newton's method: starting from an initial iterate x_0, compute
  x_{k+1} = x_k - F'(x_k)^{-1} F(x_k),   {x_k} -> x*
until a termination criterion is satisfied.

5 Quasi-Newton methods (nonlinear systems)
- Linear model around x_n: M_n(x) = F(x_n) + F'(x_n)(x - x_n)
- Setting M_n(x) = 0 gives x_{n+1} = x_n - F'(x_n)^{-1} F(x_n)
- In practice the iterates are computed by solving a linear system (see the sketch below):
  F'(x_n) s_n = F(x_n),   x_{n+1} = x_n - s_n
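A minimal NumPy sketch of this iteration; the solver choice, tolerances, and the 2x2 test system are illustrative, not from the slides:

```python
import numpy as np

def newton(F, J, x0, tol=1e-10, max_iter=50):
    """Newton's method for F(x) = 0, where J(x) returns the Jacobian F'(x).
    Each step solves F'(x_n) s_n = F(x_n) and sets x_{n+1} = x_n - s_n."""
    x = x0.astype(float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:       # termination criterion
            break
        s = np.linalg.solve(J(x), Fx)      # direct solve of the linear system
        x = x - s
    return x

# Toy system: F(x, y) = [x^2 + y^2 - 1, x - y]
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1, v[0] - v[1]])
J = lambda v: np.array([[2*v[0], 2*v[1]], [1.0, -1.0]])
print(newton(F, J, np.array([1.0, 0.5])))  # converges to (1/sqrt(2), 1/sqrt(2))
```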

6 Quasi-Newton methods (nonlinear systems)
Evaluate F'(x_n):
- Symbolically
- Numerically, with finite differences (sketch below)
- Automatic differentiation
Solve the linear system F'(x_n) s_n = F(x_n):
- Direct solve: LU, Cholesky
- Iterative methods: GMRES, CG
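A sketch of the finite-difference option, assuming forward differences and a user-chosen step size eps:

```python
import numpy as np

def jacobian_fd(F, x, eps=1e-7):
    """Approximate F'(x) column by column with forward differences:
    J[:, j] ~ (F(x + eps*e_j) - F(x)) / eps. Costs n extra F-evaluations."""
    Fx = F(x)
    n = x.size
    J = np.empty((Fx.size, n))
    for j in range(n):
        xp = x.copy()
        xp[j] += eps                      # perturb one coordinate at a time
        J[:, j] = (F(xp) - Fx) / eps
    return J
```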

7 Quasi-Newton methods (nonlinear systems)
Computational cost per iteration:
- F(x_k): n scalar function evaluations
- F'(x_k): n^2 scalar function evaluations
- LU: O(2n^3/3) flops
- Cholesky: O(n^3/3) flops
- Krylov methods: depends on the condition number

8 Quasi-Newton methods (nonlinear systems)
- LU and Cholesky are useful when the factorization is reused (e.g., quasi-implicit schemes)
- Factorizations are difficult to parallelize and load-balance
- Cholesky is faster and more stable than LU, but requires an SPD matrix (!)
- For large n (n ~ 10^6), factorization becomes impractical
- Krylov methods are built from easily parallelized kernels (vector updates, inner products, matrix-vector products)
- CG is faster and more stable than GMRES, but requires an SPD matrix

9 Quasi-Newton methods (nonlinear systems)
Advantages:
- Under the standard assumptions, Newton's method converges locally and quadratically
- There exists a domain of attraction S which contains the solution
- Once the iterates enter S, they stay in S and eventually converge to x*
- The algorithm is memoryless (self-correcting)

10 Quasi-Newton methods (nonlinear systems)
Disadvantages:
- Convergence depends on the choice of x_0
- F'(x) has to be evaluated at each iterate x_k
- Each step can be expensive: F(x_k), F'(x_k), and the solve for s_k

11 Quasi-Newton methods (nonlinear systems)
- Implicit schemes for ODEs y' = f(t, y):
  Forward Euler: y_{n+1} = y_n + h f(t_n, y_n)  (explicit)
  Backward Euler: y_{n+1} = y_n + h f(t_{n+1}, y_{n+1})  (implicit)
- Implicit schemes require the solution of a nonlinear system at each step (the same holds for Crank-Nicolson, implicit Runge-Kutta, and linear multistep formulas); one step is sketched below
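A sketch of one backward Euler step, with the nonlinear system G(u) = u - y_n - h f(t_{n+1}, u) = 0 solved by Newton's method; the function names and the explicit-Euler predictor are illustrative choices:

```python
import numpy as np

def backward_euler_step(f, dfdy, t, y, h, tol=1e-12, max_iter=20):
    """One backward Euler step: solve y_{n+1} = y_n + h*f(t_{n+1}, y_{n+1})
    via Newton's method on G(u) = u - y_n - h*f(t_{n+1}, u)."""
    u = y + h * f(t, y)                   # explicit Euler predictor as initial iterate
    I = np.eye(y.size)
    for _ in range(max_iter):
        G = u - y - h * f(t + h, u)
        if np.linalg.norm(G) < tol:
            break
        Gp = I - h * dfdy(t + h, u)       # G'(u) = I - h * df/dy
        u = u - np.linalg.solve(Gp, G)
    return u
```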

12 Quasi-Newton methods (nonlinear systems)
- How can we avoid evaluating F'(x_k)?
- Broyden's method (sketched below):
  B_{k+1} = B_k + (y_k - B_k s_k) s_k^T / (s_k^T s_k)
  x_{k+1} = x_k - B_k^{-1} F(x_k)
- Inverse update (Sherman-Morrison formula):
  H_{k+1} = H_k + (s_k - H_k y_k) s_k^T H_k / (s_k^T H_k y_k)
  x_{k+1} = x_k - H_k F(x_k)
- where s_k = x_{k+1} - x_k and y_k = F(x_{k+1}) - F(x_k)
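A sketch of Broyden's method with the inverse update; the identity initialization H_0 = I is a common but illustrative choice, and the update assumes s^T H y != 0:

```python
import numpy as np

def broyden(F, x0, tol=1e-10, max_iter=100):
    """Broyden's method with the inverse (Sherman-Morrison) update:
    no Jacobian evaluations and no linear solves per iteration."""
    x = x0.astype(float)
    Fx = F(x)
    H = np.eye(x.size)                    # H_0: initial inverse-Jacobian approximation
    for _ in range(max_iter):
        if np.linalg.norm(Fx) < tol:
            break
        s = -H @ Fx                       # step: s_k = -H_k F(x_k)
        x = x + s
        Fnew = F(x)
        y = Fnew - Fx                     # y_k = F(x_{k+1}) - F(x_k)
        Hy = H @ y
        H = H + np.outer(s - Hy, s @ H) / (s @ Hy)  # Sherman-Morrison update
        Fx = Fnew
    return x
```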

13 Quasi-Newton methods (nonlinear systems)
Advantages:
- No need to compute F'(x_k)
- With the inverse update, no linear system to solve
Disadvantages:
- Convergence drops from quadratic to superlinear
- No longer memoryless

14 Quasi-Newton methods (unconstrained optimization)
- Problem: find the global minimizer of a cost function f : R^n -> R, x* = arg min f(x)
- If f is differentiable, the problem can be attacked by looking for zeros of the gradient

15 Quasi-Newton methods (unconstrained optimization)
- Descent methods: x_{k+1} = x_k - lambda_k P_k ∇f(x_k)
  P_k = I_n : steepest descent
  P_k = (∇²f(x_k))^{-1} : Newton's method
  P_k = B_k^{-1} : quasi-Newton
- The search direction -P_k ∇f(x_k) must make an angle of less than 90 degrees with -∇f(x_k) (descent condition)
- B_k has to mimic the behavior of the Hessian

16 Quasi-Newton methods (unconstrained optimization)
Global convergence strategies:
- Line search (sketch below)
  Step length: backtracking, interpolation
  Sufficient decrease: Wolfe conditions
- Trust regions
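A sketch of a backtracking line search enforcing sufficient decrease (the Armijo part of the Wolfe conditions); the constants rho and c are conventional defaults, not values from the slides:

```python
import numpy as np

def backtracking(f, grad, x, p, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the sufficient-decrease condition
    f(x + a*p) <= f(x) + c*a*grad(x)^T p holds."""
    fx = f(x)
    slope = grad(x) @ p                   # must be negative: p is a descent direction
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= rho                      # backtrack
    return alpha
```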

17 Quasi-Newton methods (unconstrained optimization)
For quasi-Newton, B_k has to resemble ∇²f(x_k); imposing successively stronger requirements on the update gives the classical formulas:
- Single-rank (SR1): B_{k+1} = B_k + (y_k - B_k s_k)(y_k - B_k s_k)^T / ((y_k - B_k s_k)^T s_k)
- Symmetry (PSB): B_{k+1} = B_k + ((y_k - B_k s_k) s_k^T + s_k (y_k - B_k s_k)^T) / (s_k^T s_k) - ((y_k - B_k s_k)^T s_k) s_k s_k^T / (s_k^T s_k)^2
- Positive definiteness (BFGS): B_{k+1} = B_k - B_k s_k s_k^T B_k / (s_k^T B_k s_k) + y_k y_k^T / (y_k^T s_k)
- Inverse update: apply the Sherman-Morrison formula to maintain H_k = B_k^{-1} directly
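A sketch of the BFGS update above; it assumes the curvature condition y^T s > 0, e.g., as enforced by a Wolfe line search:

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update of the Hessian approximation B_k. B stays symmetric
    positive definite as long as y^T s > 0."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```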

18 Quasi-Newton methods (unconstrained optimization)
Computation:
- Matrix updates, inner products
- DFP, PSB: 3 matrix-vector products per update
- BFGS: 2 matrix-matrix products per update
Storage:
- Limited-memory versions (L-BFGS), sketched below
- Store {s_k, y_k} for the last m iterations and reconstruct the action of H on the fly
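A sketch of the L-BFGS two-loop recursion, which applies the implicit inverse Hessian to a gradient using only the stored pairs; the H_0 scaling is the usual gamma = s^T y / y^T y heuristic, and the pair lists are assumed ordered oldest first:

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: returns -H_k grad using the last m {s, y} pairs."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * (s @ q)                 # first loop: newest pair to oldest
        alphas.append(a)
        q = q - a * y
    if s_list:                            # H_0 = gamma * I scaling
        s, y = s_list[-1], y_list[-1]
        q = q * (s @ y) / (y @ y)
    for s, y, rho, a in zip(s_list, y_list, rhos, reversed(alphas)):
        b = rho * (y @ q)                 # second loop: oldest pair to newest
        q = q + (a - b) * s
    return -q                             # quasi-Newton descent direction
```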

19 Further improvements
Preconditioning the linear system:
- For faster convergence one may solve K B_k p_k = K F(x_k), where K approximates B_k^{-1} (sketch below)
- If B_k is SPD (and sparse), sparse approximate inverses can be used to generate the preconditioner
- This preconditioner can be refined on a subspace of B_k using an algebraic multigrid technique
- This requires solving an eigenvalue problem
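A sketch of left preconditioning with SciPy's CG; the Jacobi (diagonal) operator here is only a stand-in for the sparse-approximate-inverse or multigrid preconditioners mentioned above, and the tridiagonal model matrix is a toy problem:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, cg

n = 1000
B = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")  # SPD model matrix
b = np.ones(n)

d = B.diagonal()
M = LinearOperator((n, n), matvec=lambda v: v / d)  # K ~ diag(B)^{-1} (Jacobi)
x, info = cg(B, b, M=M)
print(info)                               # 0 indicates convergence
```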

20 Further improvements
Model reduction:
- Sometimes the dimension of the system is very large
- Build a smaller model that captures the essence of the original
- An approximation of the model variability can be retrieved from an ensemble of forward simulations
- The covariance matrix of the ensemble gives the reduced subspace
- This again requires solving an eigenvalue problem
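A sketch of extracting a reduced subspace from an ensemble, assuming snapshots stored as columns and a plain eigendecomposition of the sample covariance; the data and mode count are placeholders:

```python
import numpy as np

X = np.random.rand(1000, 50)              # 50 snapshots of a 1000-dim state (toy data)
Xc = X - X.mean(axis=1, keepdims=True)    # center the ensemble
C = Xc @ Xc.T / (X.shape[1] - 1)          # sample covariance (symmetric PSD)
w, V = np.linalg.eigh(C)                  # symmetric eigenvalue problem
basis = V[:, -10:]                        # 10 dominant modes span the subspace
```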

21 QR/QL algorithms for symmetric matrices
- Solve the eigenvalue problem
- Iterative algorithm using a QR/QL factorization at each step (A = Q R, Q unitary, R upper triangular):
  for k = 1, 2, ...
    A_k = Q_k R_k
    A_{k+1} = R_k Q_k
  end
- The diagonal of A_k converges to the eigenvalues of A
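A sketch of the unshifted iteration; it omits the Hessenberg reduction, shifts, and deflation of a real implementation, so it is for illustration only:

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """Unshifted QR iteration: factor A_k = Q_k R_k, set A_{k+1} = R_k Q_k
    (a similarity transform Q_k^T A_k Q_k, so eigenvalues are preserved)."""
    Ak = A.copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return np.sort(np.diag(Ak))

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(qr_algorithm(A))                    # matches np.linalg.eigvalsh(A)
```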

22 QR/QL algorithms for symmetric matrices
- The matrix A is reduced to upper Hessenberg form before the iterations start
- Householder reflections (U = I - 2 v v^T, with v a unit vector)
- The reduction proceeds column by column
- If A is symmetric, it is reduced to tridiagonal form
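For illustration, SciPy's hessenberg routine performs this reduction; the band comparison at the end is just a sanity check of the tridiagonal claim:

```python
import numpy as np
from scipy.linalg import hessenberg

A = np.random.rand(5, 5)
A = A + A.T                               # symmetric test matrix
H, Q = hessenberg(A, calc_q=True)         # A = Q H Q^T, H upper Hessenberg
band = np.triu(np.tril(H, 1), -1)         # sub-, main, and superdiagonal of H
print(np.allclose(H, band, atol=1e-10))   # True: symmetric A reduces to tridiagonal
```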

23 QR/QL algorithms for symmetric matrices
- Convergence to a triangular form can be slow
- Origin shifts are used to accelerate it:
  for k = 1, 2, ...
    A_k - z_k I = Q_k R_k
    A_{k+1} = R_k Q_k + z_k I
  end
- Wilkinson shift: the eigenvalue of the trailing 2x2 block closest to the last diagonal entry
- QR makes heavy use of matrix-matrix products
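A sketch of the shifted iteration with the Wilkinson shift; deflation, which a production implementation would add, is omitted here:

```python
import numpy as np

def qr_wilkinson(T, iters=100):
    """Shifted QR for a symmetric matrix: A_k - z_k I = Q_k R_k,
    A_{k+1} = R_k Q_k + z_k I, with z_k the Wilkinson shift."""
    A = T.copy()
    n = A.shape[0]
    for _ in range(iters):
        a, b, c = A[n-2, n-2], A[n-2, n-1], A[n-1, n-1]
        d = (a - c) / 2.0                 # Wilkinson shift from trailing 2x2 block
        z = c - np.sign(d if d != 0 else 1.0) * b**2 / (abs(d) + np.hypot(d, b))
        Q, R = np.linalg.qr(A - z * np.eye(n))
        A = R @ Q + z * np.eye(n)
    return np.sort(np.diag(A))
```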

24 Alternatives to quasi-Newton
Inexact Newton methods:
- Inner iteration: determine a search direction by solving the linear system only to a certain tolerance
- Only Hessian-vector products are necessary (matrix-free; sketch below)
- Outer iteration: line search along the resulting direction
Nonlinear CG:
- The residual is replaced by the gradient of the cost function
- Combined with a line search
- Comes in several variants (Fletcher-Reeves, Polak-Ribiere, ...)
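A sketch of the matrix-free Hessian-vector product that makes inexact Newton practical, assuming forward differences of the gradient and a user-chosen step eps:

```python
import numpy as np

def hessvec(grad, x, v, eps=1e-6):
    """Matrix-free Hessian-vector product: H(x) v ~ (grad(x + eps*v) - grad(x)) / eps.
    An inexact Newton method feeds these products to an inner CG/Krylov
    solve of H p = -g, truncated at a loose tolerance."""
    return (grad(x + eps * v) - grad(x)) / eps
```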

25 Alternatives to quasi-Newton
Direct search:
- Does not involve derivatives of the cost function
- Uses a structure called a simplex to search for decrease in f
- Stops when further progress cannot be achieved
- Can get stuck in a local minimum
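For illustration, SciPy's Nelder-Mead method implements this simplex search; the Rosenbrock test function and starting point are arbitrary choices:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda v: (1 - v[0])**2 + 100 * (v[1] - v[0]**2)**2  # Rosenbrock function
res = minimize(f, x0=np.array([-1.2, 1.0]), method="Nelder-Mead")
print(res.x, res.fun)                     # near the minimizer (1, 1)
```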

26 More alternatives
Monte Carlo:
- Computational methods relying on repeated random sampling
- Can be used for optimization (e.g., MDO) and for inverse problems, via random walks
- With multiple correlated variables, the correlation matrix is SPD, so a Cholesky factorization can be used to generate correlated samples (sketch below)
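A sketch of the Cholesky trick for correlated sampling; the 2x2 correlation matrix is a toy example:

```python
import numpy as np

# If C = L L^T (Cholesky) and z ~ N(0, I), then x = L z has covariance C.
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])                # SPD correlation matrix
L = np.linalg.cholesky(C)
z = np.random.randn(2, 10000)             # independent standard normals
x = L @ z
print(np.corrcoef(x)[0, 1])               # ~ 0.8, as prescribed by C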

27 Conclusions
- Newton's method is a very powerful method with many applications (solving nonlinear systems, finding minima of cost functions), and it combines naturally with many other numerical algorithms (factorizations, linear solvers)
- Optimizing and parallelizing matrix-vector and matrix-matrix products, decompositions, and the other numerical kernels can have a significant impact on overall performance

28 Thank you for your time!

