Engineering Optimization Concepts and Applications Fred van Keulen Matthijs Langelaar CLA H21.1 A.vanKeulen@tudelft.nl
Contents Unconstrained Optimization: Methods for Multiple Variables 1st order methods: CG 2nd order methods Quasi-Newton Methods Constrained Optimization: Optimality Criteria
Multivariate Unconstrained Optimization Algorithms Zeroth order (direct search): Random methods: random jumping / walk / S.A. Cyclic coordinate search Powell’s Conjugate Directions method Nelder and Mead Simplex method Biologically inspired algorithms (GA, swarms, …) Conclusions: Nelder-Mead usually best Inefficient for N>10
Fletcher-Reeves conjugate gradient method Based on building a set of conjugate directions, combined with line searches. Quadratic function: f(x) = 1/2 x^T A x + b^T x + c. Conjugate directions: s_i^T A s_j = 0 for i ≠ j. Conjugate directions give guaranteed convergence in N steps for quadratic problems (recall Powell: N cycles of N line searches)
CG practical 1. Start with arbitrary x1. 2. Set first search direction: s_1 = -∇f(x_1). 3. Line search to find next point: x_{k+1} = x_k + α_k s_k. 4. Next search direction: s_{k+1} = -∇f(x_{k+1}) + β_k s_k, with β_k = ∇f(x_{k+1})^T ∇f(x_{k+1}) / (∇f(x_k)^T ∇f(x_k)). 5. Repeat from step 3. Restart every (n+1) steps, using step 2.
CG properties After N steps / bad convergence: restart procedure. Theoretically converges in N steps or less for quadratic functions. In practice: non-quadratic functions, finite line search accuracy and round-off errors give slower convergence; > N steps. In fact, Newton and quasi-Newton methods perform better.
Application to mechanics (FE) Structural mechanics: quadratic function! Potential energy: Π(u) = 1/2 u^T K u - u^T f. Equilibrium: K u = f. Note, the line search is not performed iteratively; the step length is computed in closed form based on K. CG needs only simple operations on element level. Attractive for large N!
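The closed-form step length noted above makes CG for quadratic functions especially simple. A minimal Python sketch of the Fletcher-Reeves iteration for this case (function name and test matrix are illustrative, not from the slides):

```python
import numpy as np

def cg_quadratic(A, b, x0, tol=1e-10):
    """Fletcher-Reeves CG minimizing f(x) = 1/2 x^T A x - b^T x,
    with A symmetric positive definite (e.g. a stiffness matrix K).
    For quadratics the exact line-search step alpha is available
    in closed form, so no iterative line search is needed."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                        # gradient of the quadratic
    s = -g                               # first direction: steepest descent
    for _ in range(len(b)):              # at most N steps for quadratics
        alpha = -(g @ s) / (s @ A @ s)   # exact minimizer along s
        x = x + alpha * s
        g_new = A @ x - b
        if np.linalg.norm(g_new) < tol:
            break
        beta = (g_new @ g_new) / (g @ g) # Fletcher-Reeves beta
        s = -g_new + beta * s
        g = g_new
    return x
```

For a 2x2 system this converges in exactly two steps, matching the N-step guarantee for quadratic problems.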
Multivariate Unconstrained Optimization Algorithms (2) First order methods (descent methods): Steepest descent method (with line search) Fletcher-Reeves Conjugate Gradient method Quasi-Newton methods Conclusions (for now): Scaling important for Steepest descent (zig-zag) For quadratic problem, CG converges in N steps y1 y2
Unconstrained optimization algorithms Single-variable methods Multiple variable methods 0th order 1st order 2nd order (Skipping quasi-Newton methods for now, those are discussed next lecture)
Newton’s method Concept: construct local quadratic approximation, minimize the approximation, repeat. Local approximation: 2nd order Taylor series: f(x) ≈ f(x_k) + ∇f(x_k)^T (x - x_k) + 1/2 (x - x_k)^T H(x_k) (x - x_k). Note, multidimensional version of the method discussed for a single variable. First order necessity condition applied to the approximation: ∇f(x_k) + H(x_k)(x - x_k) = 0.
Newton’s method (2) Step: s_k = -H(x_k)^-1 ∇f(x_k). Update: x_{k+1} = x_k + s_k. Note: H and ∇f evaluated at x_k. Finds the minimum of quadratic functions in 1 step! Step includes solving a (dense) linear system of equations. If H is not positive definite, divergence can occur. Potentially dense system of equations; if sparse, fast techniques are available. Also obtaining H itself can be rather costly.
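The bare Newton iteration above can be sketched in a few lines. This is a minimal illustration with no safeguards, so it inherits the divergence risk noted on the slide; all names are illustrative:

```python
import numpy as np

def newton_minimize(grad, hess, x0, max_iter=20, tol=1e-10):
    """Plain Newton iteration: solve H s = -g, then update x.
    No line search or positive-definiteness check, so this can
    diverge when H is not positive definite."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = np.linalg.solve(hess(x), -g)   # Newton step
        x = x + s
    return x
```

On a quadratic objective the minimum is found in a single step, as the slide states.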
Similarity to linear FE Linear mechanics problem: (quadratic in u) Newton: (Evaluated at u = 0) Nonlinear mechanics similar, with multiple steps.
Newton’s method (3) To avoid divergence: line search along the Newton direction. Search direction: s_k = -H(x_k)^-1 ∇f(x_k). Update: x_{k+1} = x_k + α_k s_k. Newton’s method has quadratic convergence (best!) close to the optimum: the error reduces quadratically per iteration, ||e_{k+1}|| ≤ c ||e_k||^2.
Levenberg-Marquardt method Problems in Newton method: bad convergence / divergence when far from the optimum; what to do when H is singular / not positive definite? Remedy: use modified H: H + b I, with b such that the modified H is positive definite. Levenberg-Marquardt method: start with large b_k, and decrease gradually: a blend between Newton and steepest descent. b large: steepest descent; b small: Newton.
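The modified-Hessian step can be sketched directly. The code below takes a fixed b supplied by the caller; a full Levenberg-Marquardt implementation would adapt b between iterations as described above:

```python
import numpy as np

def lm_step(g, H, b):
    """One Levenberg-Marquardt step: solve (H + b I) s = -g.
    b = 0 gives the pure Newton step; large b gives a short step
    along the steepest-descent direction."""
    return np.linalg.solve(H + b * np.eye(len(g)), -g)
```

Note that the shift b I also fixes the indefinite-H case: even when H itself has negative eigenvalues, a large enough b makes H + b I positive definite.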
Trust region methods Different approach to make Newton’s method more robust: the local approximation is only trusted in a region around the current point where it is locally valid: ||x - x_k|| ≤ Δ defines the trust region. The trust region is adjusted based on approximation quality: R = actual reduction / predicted reduction. R = 1 is a perfect approximation; R < 0.25 is very bad, so the trust region is reduced.
Trust region methods (2) Performance: robust, can also deal with neg. def. H Not sensitive to variable scaling Similar concept in computational mechanics: arc-length methods Trust region subproblem is quadratic constrained optimization problem: Rather expensive Approximate solution methods popular: dogleg methods, Steihaug method, … About dogleg methods: see here: http://www.numerical.rl.ac.uk/nimg/oupartc/
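The radius-update logic driven by the ratio R can be sketched as follows. The 0.75 threshold and the shrink/grow factors are common textbook defaults, not values prescribed by the slides:

```python
def update_trust_radius(R, radius, shrink=0.25, grow=2.0, max_radius=10.0):
    """Adjust the trust region radius by the quality ratio
    R = actual reduction / predicted reduction."""
    if R < 0.25:                  # poor model: shrink the region
        return shrink * radius
    if R > 0.75:                  # good model: allow a larger region
        return min(grow * radius, max_radius)
    return radius                 # otherwise keep the radius
```

In a full method this update is applied after each (approximate) solve of the trust region subproblem, e.g. by a dogleg or Steihaug step.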
Newton summary Second order method: Newton’s method Conclusions: Most efficient: solves quadratic problem in 1 step Not robust! Robustness improvements: Levenberg-Marquardt (blend Steepest descent / Newton) Trust region approaches
Quasi-Newton methods Drawbacks remain for Newton method: need to evaluate H (often impractical); storage of H, solving system of equations. Alternatives: quasi-Newton methods: based on approximating H (or H^-1), using only first-order derivative information. Quasi-Newton algorithms start with a simple pos. def. guess for H, for example I. Newton step: s_k = -H^-1 ∇f(x_k). Quasi-Newton step: s_k = -B^-1 ∇f(x_k), with B ≈ H. Update B (or B^-1) every iteration.
Quasi-Newton fundamentals First order Taylor approximation of the gradient: ∇f(x_{k+1}) ≈ ∇f(x_k) + H(x_k)(x_{k+1} - x_k). Define: Δx_k = x_{k+1} - x_k and Δg_k = ∇f(x_{k+1}) - ∇f(x_k). Quasi-Newton condition: B_{k+1} Δx_k = Δg_k. Operating on H^-1 avoids solving a linear system each step. The Hessian approximation is corrected in each step in order to satisfy the quasi-Newton condition. This leads to increasingly better approximations of H. General drawback: if H is sparse, this sparsity can be lost during the process. Update equations: rank one update, rank two update.
Update functions: rank 1 Notation: Δx_k = x_{k+1} - x_k, Δg_k = ∇f(x_{k+1}) - ∇f(x_k). Broyden’s (symmetric rank one) update: B_{k+1} = B_k + (Δg_k - B_k Δx_k)(Δg_k - B_k Δx_k)^T / ((Δg_k - B_k Δx_k)^T Δx_k). Drawback: updates not guaranteed to remain positive definite (possible divergence)
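The symmetric rank-one update can be sketched as below; the skip when the denominator vanishes is a standard practical safeguard, not part of the formula itself:

```python
import numpy as np

def sr1_update(B, dx, dg):
    """Symmetric rank-one update of the Hessian approximation B.
    Satisfies the quasi-Newton condition B_new dx = dg, but (as
    the slide warns) need not stay positive definite."""
    r = dg - B @ dx
    denom = r @ dx
    if abs(denom) < 1e-12:
        return B                      # skip a numerically unsafe update
    return B + np.outer(r, r) / denom
```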
Update functions: rank 2 Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). DFP: ~1960. BFGS: 1970 (independently by each of B/F/G/S). BFGS is in fact the DFP formula applied to the Hessian rather than to its inverse. DFP sometimes gives singular B-matrices. Both DFP and BFGS are conjugate gradient methods: the set of search directions contains conjugate vectors, so for quadratic problems the solution is found in at most N steps. Line searches are necessary to ensure a descent direction is used. To save computations, both algorithms are often used with inexact line searches, which have more relaxed stopping criteria compared to “exact” line searches [Haftka p.142]. Rank 2 quasi-Newton methods: best general-purpose unconstrained optimization (hill climbing) methods
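A minimal sketch of the standard BFGS rank-two update of the Hessian approximation B (variable names illustrative):

```python
import numpy as np

def bfgs_update(B, dx, dg):
    """BFGS rank-two update of the Hessian approximation B.
    Given dx = x_{k+1} - x_k and dg = grad(x_{k+1}) - grad(x_k),
    the update satisfies the quasi-Newton condition B_new dx = dg
    and preserves positive definiteness when dg^T dx > 0."""
    Bdx = B @ dx
    return (B + np.outer(dg, dg) / (dg @ dx)
              - np.outer(Bdx, Bdx) / (dx @ Bdx))
```

The curvature condition dg^T dx > 0 is what a (possibly inexact) line search must guarantee for positive definiteness to be preserved.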
BFGS example Minimization of quadratic function: Starting point: Line search:
BFGS example (2) BFGS: Line search: Solution found.
BFGS example (3) Check: BFGS:
Comparison on Banana function (number of iterations): Newton: 18. Levenberg-Marquardt: 29. DFP: 260. BFGS: 53. Nelder-Mead simplex: 210. Steepest descent: >300 (no convergence).
Summary multi-variable algorithms for unconstrained optimization Theoretically nothing beats Newton, but: Expensive computations and storage (Hessian) Not robust Quasi-Newton (BFGS) best if gradients available Zeroth-order methods robust and simple, but inefficient for n > 10 CG inferior to quasi-Newton methods, but calculations simple: more efficient for n > 1000 Exact line search important! Variable scaling can have large impact
Contents Unconstrained Optimization: Methods for Multiple Variables 1st order methods: CG 2nd order methods Quasi-Newton Methods Constrained Optimization: Optimality Criteria
Summary optimality conditions Conditions for a local minimum of an unconstrained problem: First Order Necessity Condition: ∇f = 0. Second Order Sufficiency Condition: H positive definite. For convex f in a convex feasible domain, condition for a global minimum: Sufficiency Condition: ∇f = 0.
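These two conditions are easy to check numerically at a candidate point. A minimal sketch (function name and tolerance are illustrative):

```python
import numpy as np

def is_local_min(g, H, tol=1e-8):
    """Check the unconstrained optimality conditions at a point:
    first-order necessity (gradient ~ 0) and second-order
    sufficiency (Hessian positive definite)."""
    stationary = np.linalg.norm(g) < tol
    pos_def = np.all(np.linalg.eigvalsh(np.asarray(H)) > 0)
    return bool(stationary and pos_def)
```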
Boundary optima Today’s topic: optimality conditions for constrained problems. Interior optimum vs. boundary optima (figure: objective f, constraints g1, g2). g1 could even be h1, an equality constraint; in that case the picture wouldn’t change. Also quickly repeat constraint activity.
Feasible perturbations / directions Consider feasible space X. Feasible perturbation: perturbation δx such that x + δx remains in X. Feasible direction s: line in direction s remains in X for some finite length.
Boundary optimum Necessary condition for boundary optimum: f cannot decrease further in any feasible direction (no feasible direction exists for which f decreases). Approach for numerical algorithms: move along feasible directions until this condition holds.
Equality constrained problem First, only equality constraints considered: Simplest case! Active inequality constraints can be treated as equality constraints Active inequality constraints can be identified by e.g. Monotonicity Analysis
Equality constrained problem (2) Each (functionally independent) equality constraint reduces the dimension of the problem: Problem dimension Solutions can only exist in the feasible subspace X of dimension n – m (hypersurface) Functionally independent means that it cannot be derived from (a combination of) other constraints. Examples: n = 3, m = 2: X = line (1-D) n = 3, m = 1: X = surface (2-D)
Description of constraint surface Why? Local characterization of the constraint surface leads to optimality conditions, and is the basis for numerical algorithms. Assumptions: constraints differentiable, functionally independent; all points are regular points: constraint gradients ∇h_i all linearly independent.
Some definitions … Normal hyperplane: spanned by all constraint gradients ∇h_i, normal to the constraint surface. Tangent hyperplane: orthogonal to the normal plane: {y : ∇h_i^T y = 0 for all i}.
Normal / tangent hyperplane (2) Example: Tangent hyperplane (line) Normal hyperplane
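A basis for the tangent hyperplane can be computed numerically as the nullspace of the constraint Jacobian (rows = constraint gradients ∇h_i). A sketch via the SVD, assuming a regular point (full row rank); the function name is illustrative:

```python
import numpy as np

def tangent_basis(J):
    """Orthonormal basis of the tangent hyperplane: the nullspace
    of the constraint Jacobian J (rows = constraint gradients)."""
    _, sing, Vt = np.linalg.svd(J)
    rank = int(np.sum(sing > 1e-12))
    return Vt[rank:].T            # columns y satisfy J y = 0
```

With n = 3 variables and m = 1 constraint this returns a basis of the 2-D tangent plane, matching the n - m dimension count of the feasible subspace.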
Optimality conditions Simplest approach: Eliminate variables using equality constraints Result: unconstrained problem of dimension n – m Apply unconstrained optimality conditions But often not possible / practical: Elimination fails when variables cannot be explicitly solved from equality constraints No closed form of objective function (e.g. simulations)
Local approximation Approach: build a local approximation for the constrained case, using (very small) feasible perturbations: δf = ∇f^T δx, δh_i = ∇h_i^T δx = 0. This gives m + 1 equations in n + 1 unknowns (δf, δx_i), where n > m: n - m = p degrees of freedom.
Local approximation (2) Divide design variables into two subsets: p decision/control variables d (independent), and m state/solution variables s (dependent).
Dependent/independent variables Example:
Local approximation (3) Eliminating the solution variable perturbation (dependent): δs = -(∂h/∂s)^-1 (∂h/∂d) δd. Note, regularity of the current point has been used: in that case ∂h/∂s can be inverted.
Reduced gradient Variation of f expressed in decision variables: δf = [∂f/∂d - ∂f/∂s (∂h/∂s)^-1 ∂h/∂d] δd. The bracketed term is the reduced / constrained gradient. Even when the dependent variables cannot be eliminated from the function f itself, they can easily be eliminated in the expression for the gradient / perturbation of f: f locally expressed in decision variables only.
Optimality condition z is an unconstrained function of the decision variables d, so the unconstrained optimality condition can be used. Optimality condition for the equality-constrained problem: reduced gradient zero: ∂f/∂d - ∂f/∂s (∂h/∂s)^-1 ∂h/∂d = 0.
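The reduced-gradient formula can be evaluated numerically from the four partial-derivative blocks. A minimal sketch on an illustrative problem (min f = x1^2 + x2^2 subject to h = x1 + x2 - 2 = 0, with d = x1 and s = x2; not an example from the slides):

```python
import numpy as np

def reduced_gradient(df_dd, df_ds, dh_dd, dh_ds):
    """Reduced (constrained) gradient:
    dz/dd = df/dd - df/ds (dh/ds)^-1 dh/dd.
    Requires dh/ds invertible, i.e. a regular point."""
    return df_dd - df_ds @ np.linalg.solve(dh_ds, dh_dd)
```

At the constrained optimum x1 = x2 = 1 of the illustrative problem the reduced gradient vanishes, as the optimality condition requires.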
Example (figure: mass m, length L, design variables x1, x2). Equilibrium:
Lagrange approach Alternative way to formulate optimality conditions: formulate the Lagrangian: L(x, λ) = f(x) + Σ λ_i h_i(x), with Lagrange multipliers λ_i. Consider stationary points of L: ∇L = 0. Note, adding zero is always allowed (h_i = 0 at feasible points). The Lagrangian has the same stationary points as f when the multipliers are chosen properly.
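For a quadratic objective with a linear equality constraint, the stationarity conditions of the Lagrangian form a linear system that can be solved directly. A sketch on an illustrative problem (min x1^2 + x2^2 subject to x1 + x2 = 2; not an example from the slides):

```python
import numpy as np

# Stationarity of L = f + lam * h for f = x1^2 + x2^2,
# h = x1 + x2 - 2 gives a linear system:
#   dL/dx1:  2 x1 + lam = 0
#   dL/dx2:  2 x2 + lam = 0
#   dL/dlam: x1 + x2 - 2 = 0
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 2.0])
x1, x2, lam = np.linalg.solve(K, rhs)
```

Note the system has n + m = 3 unknowns (two design variables plus one multiplier), in line with the comparison on the next slide.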
Example (figure: mass m, length L, design variables x1, x2). Equilibrium:
Comparison Elimination: n - m variables (often not possible). Reduced gradient: n - m decision variables, m solution variables (total n). Lagrange approach: n design variables, m multipliers (total m + n). In fact, the final equations that are solved are the same for the reduced gradient and Lagrange approaches. The Lagrange approach is often used in theoretical derivations, but not necessarily for implementations.
Application in mechanics Lagrangian method widely used to enforce kinematic constraints, e.g. multibody dynamics: Variational formulation of equations of motion (planar case): 3 equations of motion per body Generalized displacement Total kinetic energy Generalized forces
Multibody dynamics (2) Bodies connected by m joints: 2m constraint equations (1 DOF per joint). Constrained equations of motion (based on the Lagrangian): multipliers have the same units as force; interpretation: force required to satisfy the constraint. System simulation: solve for q_i (3n) and λ_j (2m) simultaneously.
Geometrical interpretation For a single equality constraint there is a simple geometrical interpretation of the Lagrange optimality condition: the gradients of f and h are parallel, their tangents are parallel, and h is tangent to the isolines of f. For multiple equality constraints this doesn’t work anymore, because the multipliers define a subspace. Since they can have any sign, there is no interpretation other than the fact that the gradient must lie in this subspace.