Inverse, Shifted Inverse, and Rayleigh Quotient Iteration as Newton’s Method Richard Tapia (Research joint with John Dennis) Rice University Department of Computational and Applied Mathematics Center for Excellence and Equity in Education Berkeley, California September 22, 2005

Outline: preliminaries on Newton's Method; the Babylonians' calculation of the square root of 2; development of algorithms for the eigenvalue problem; development of optimization algorithms and equivalences to eigenvalue algorithms; an alternative development of Rayleigh Quotient Iteration.

Newton's Method. To approximate x*, the solution of the square nonlinear system F(x) = 0, Newton's method computes x_{k+1} = x_k + Δx_k by solving the square linear system F'(x_k) Δx_k = −F(x_k).

Newton's Method (cont.) Equivalently: x_{k+1} = x_k − F'(x_k)^{-1} F(x_k). Hopefully: x_k → x*.
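
As a hedged illustration of the method just described, here is a minimal Newton solver for a small square system; the example system, starting point, and tolerance are illustrative choices, not from the talk.

```python
import numpy as np

def newton(F, J, x0, tol=1e-12, max_iter=50):
    """Newton's method: solve J(x_k) dx = -F(x_k), then set x_{k+1} = x_k + dx."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(J(x), -F(x))   # the square linear system at each step
        x = x + dx
        if np.linalg.norm(dx) < tol:        # hopefully x_k -> x*
            break
    return x

# Illustrative 2x2 system: x0^2 + x1^2 = 1 and x0 - x1 = 0 (a solution is (1,1)/sqrt(2)).
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0*x[0], 2.0*x[1]], [1.0, -1.0]])
print(newton(F, J, [1.0, 0.3]))
```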

Interpretation of Newton's Method. Geometric: the next iterate x_{k+1} is the point where the tangent (linear) model of F at x_k crosses zero.

Interpretation of Newton's Method (cont.) Algebraic: use the first-order Taylor approximation F(x) ≈ F(x_k) + F'(x_k)(x − x_k) to replace the nonlinear problem F(x) = 0 with the linear problem F(x_k) + F'(x_k)(x − x_k) = 0, and set x_{k+1} equal to its solution.

Cookbook Theory. Consider the fixed point problem x = G(x) and the obvious iterative method x_{k+1} = G(x_k). Def: Local convergence: x_k → x* whenever x_0 is sufficiently close to x*.

Cookbook Theory (cont.) Def: Convergence rate. Linear: ‖x_{k+1} − x*‖ ≤ c ‖x_k − x*‖ with c < 1. Quadratic: ‖x_{k+1} − x*‖ ≤ c ‖x_k − x*‖^2. Cubic: ‖x_{k+1} − x*‖ ≤ c ‖x_k − x*‖^3.

Theory in a Nutshell. Local convergence if ‖G'(x*)‖ < 1. Quadratic convergence if G'(x*) = 0. Cubic convergence if G'(x*) = 0 and G''(x*) = 0.

Newton's Method. For Newton's method, G(x) = x − F'(x)^{-1} F(x) and G'(x*) = 0. Therefore: Newton's method is locally and quadratically convergent, i.e., it is effective and fast.

The Babylonians' Computation of √2. Consider: x^2 = 2. Write: x = 2/x. Iterate: x_{k+1} = 2/x_k. Failure: the iterates simply oscillate between x_0 and 2/x_0.

The Babylonians' Computation of √2 (cont.) Prune and tune: add x to both sides to get 2x = x + 2/x. Simplify: x = (x + 2/x)/2. Iterate: x_{k+1} = (x_k + 2/x_k)/2. The resulting algorithm is effective and fast.
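
A small numerical sketch of the last two slides; the starting value 1.5 and the iteration counts are arbitrary illustrations. The naive map x ← 2/x oscillates, while the pruned-and-tuned map x ← (x + 2/x)/2 converges very quickly to √2.

```python
# Naive iteration x <- 2/x: the iterates just oscillate between 1.5 and 4/3.
x = 1.5
for _ in range(6):
    x = 2.0 / x
print("naive iterate after 6 steps:", x)

# Pruned-and-tuned (Babylonian) iteration x <- (x + 2/x)/2: rapid convergence to sqrt(2).
x = 1.5
for k in range(5):
    x = 0.5 * (x + 2.0 / x)
    print(k, x, abs(x - 2.0**0.5))
```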

Today’s Message When an algorithm has been pruned and tuned so that it is fast and effective, then it is Newton’s Method, perhaps in a very disguised form.

The Babylonians' Method is Newton's Method: Newton's method applied to F(x) = x^2 − 2 gives x_{k+1} = x_k − (x_k^2 − 2)/(2 x_k) = (x_k + 2/x_k)/2, which is exactly the Babylonian iteration.

The Symmetric Eigenvalue Problem. Given symmetric A ∈ R^{n×n}, find λ ∈ R and nonzero x ∈ R^n such that Ax = λx. Rayleigh quotient: ρ(x) = xᵀAx / xᵀx. Remark: if x is an eigenvector, then ρ(x) is the corresponding eigenvalue.

Chronology. 1870s – Lord Rayleigh. 1944 – H. Wielandt: suggests fractional iteration (shifted and normalized inverse iteration), started from the first unit coordinate vector. Note: σ = 0 gives inverse power iteration.
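
A minimal sketch of shifted and normalized inverse iteration as just described (σ = 0 reduces to inverse power iteration); the test matrix, shift, and starting vector are illustrative assumptions.

```python
import numpy as np

def shifted_inverse_iteration(A, sigma, x0, iters=20):
    """Solve (A - sigma I) y = x_k, then normalize; sigma = 0 gives inverse power iteration."""
    n = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        y = np.linalg.solve(A - sigma * np.eye(n), x)
        x = y / np.linalg.norm(y)
    return x, x @ A @ x          # approximate eigenvector and its Rayleigh quotient

A = np.diag([1.0, 3.0, 7.0])     # illustrative symmetric matrix
x, lam = shifted_inverse_iteration(A, sigma=2.5, x0=np.ones(3))
print(lam)                       # approaches the eigenvalue nearest the shift (3.0)
```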

Chronology (cont.) 1945 – J. Todd (lecture notes), “approximately solve”:

Chronology (cont.) 1949 – W. Kohn (in a letter to the editor): suggests the iteration started from any unit coordinate vector. Kohn argues (without a rigorous proof) a quadratic rate for the eigenvalue approximations.

Chronology (cont.) 1951 – S. Crandall (this is unnormalized RQI): tacitly assuming convergence, Crandall established a cubic rate for the eigenvalue approximations. Remark: the vector iterates will not converge; however, the eigenvalue approximations can converge to an eigenvalue.

Chronology (cont.) 1957 – A.M. Ostrowski: (a) rigorously established a quadratic rate for Kohn's iteration (viewed as an implementation of Todd); (b) observed that Wielandt's fractional iteration combined with (a) leads to unnormalized RQI; (c) rigorously established a cubic rate for the eigenvalue approximations. Ostrowski was aware of Kohn but unaware of Crandall; Forsythe points out (while the paper is in press, 10/57) Crandall's earlier work (the RQI algorithm and cubic rate). Ostrowski acknowledges and remains cool.

Chronology (cont.) 1958 – A.M. Ostrowski (Ostrowski fights back): points out that a cubic rate for the eigenvalue approximations does not imply a cubic rate for the vector iterates; points out that a normalization* can be applied to the vector iterates to guarantee convergence to an eigenvector. *Wielandt had a normalization.

Chronology (cont.) This “tuning and pruning” process has taken us, over the period from the 1870s to 1958, to Rayleigh Quotient Iteration (RQI): Solve (A − ρ(x_k) I) y_{k+1} = x_k; Normalize x_{k+1} = y_{k+1} / ‖y_{k+1}‖_2. Attributes: requires the solution of a linear system; fast convergence (cubic). Observation: at a solution the RQI linear system is singular.
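
A minimal sketch of RQI exactly as summarized above (solve with the Rayleigh-quotient shift, then normalize in the 2-norm); the test matrix, starting vector, and iteration count are illustrative, and the printed residuals are meant to exhibit the very fast (cubic) convergence.

```python
import numpy as np

def rqi(A, x0, iters=6):
    """Rayleigh Quotient Iteration: solve (A - rho(x_k) I) y = x_k, then normalize in the 2-norm."""
    n = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    for k in range(iters):
        rho = x @ A @ x                          # Rayleigh quotient (x has unit length)
        try:
            y = np.linalg.solve(A - rho * np.eye(n), x)
        except np.linalg.LinAlgError:            # at a solution the linear system is singular
            break
        x = y / np.linalg.norm(y)
        print(k, rho, np.linalg.norm(A @ x - (x @ A @ x) * x))
    return x, x @ A @ x

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2.0                              # illustrative symmetric test matrix
rqi(A, rng.standard_normal(5))
```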

Normalized Newton's Method (NNM). For the nonlinear equation problem F(x) = 0, when it is known that ‖x*‖_2 = 1: Solve F'(x_k) Δx_k = −F(x_k) and set y_{k+1} = x_k + Δx_k; Normalize x_{k+1} = y_{k+1} / ‖y_{k+1}‖_2. Attributes: requires the solution of a linear system; fast convergence (quadratic).

Some Points to Ponder. How much did Helmut Wielandt know or not know? Wielandt gave the eigenvalue approximation update formula. Theorem: Wielandt iteration is equivalent to Newton's method on the eigenvalue equation Ax = λx augmented with a linear normalization of x.

Some Points to Ponder (cont.) Wielandt stressed nonsymmetric A. Wielandt gave other eigenvalue formulas, but he never suggested the Rayleigh quotient; did he know it? If he had, he would have given normalized RQI in 1944. “RQI performs better than Wielandt iteration for both symmetric and nonsymmetric matrices with real eigenvalues, except for the example matrix given in Wielandt's paper.” – Rice graduate student

Some Points to Ponder (cont.) Peters and Wilkinson (1979) observe that the iterate obtained from inverse iteration and the iterate obtained from one step of Newton's method on the eigenvalue equation with a normalization are the same (Newton to first order).

Some Points to Ponder (cont.) They propose Newton on In this case the normalization is automatically satisfied and the quadratic convergence is assured, unlike the two-norm case (wait). They did not realize that they had reinvented Wielandt iteration.

“BELIEF IN FOLKLORE”: There exists an F such that RQI ≡ NNM on F.

Motivation (John Dennis Question): Is RQI ≡ Normalized Newton on F(x) = Ax − ρ(x) x ? (*) Theorem: Given any x that is not an eigenvector of A, the Newton iterate on (*) is ZERO. (Normalized Newton is not defined.) (Newton's Method goes directly to the unique singular point of F.)
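
The theorem can be checked numerically. Below is a hedged sketch assuming (*) is F(x) = Ax − ρ(x)x with ρ(x) = xᵀAx/xᵀx; then F'(x) = A − ρ(x)I − x ∇ρ(x)ᵀ with ∇ρ(x) = 2(Ax − ρ(x)x)/xᵀx, and for a generic (non-eigenvector) x the Newton iterate x − F'(x)^{-1}F(x) comes out zero to rounding error. The test matrix and vector are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2.0                    # illustrative symmetric matrix
x = rng.standard_normal(4)             # a generic vector, almost surely not an eigenvector

rho = (x @ A @ x) / (x @ x)            # Rayleigh quotient rho(x)
Fx = A @ x - rho * x                   # F(x) = A x - rho(x) x
grad_rho = 2.0 * (A @ x - rho * x) / (x @ x)
Jx = A - rho * np.eye(4) - np.outer(x, grad_rho)   # F'(x)

newton_iterate = x - np.linalg.solve(Jx, Fx)
print(np.linalg.norm(newton_iterate))  # ~ 0: Newton jumps straight to the singular point of F
```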

Nonlinear Programming and Our Equivalences

Equality Constrained Nonlinear Program: minimize f(x) subject to h(x) = 0. Lagrangian Function: ℓ(x, λ) = f(x) + λᵀ h(x).

Lagrange Multiplier Rule (First-Order Necessary Conditions): ∇f(x) + ∇h(x) λ = 0 and h(x) = 0. (*)

Constructing the Eigen-Nonlinear Program. Consider Ax = λx. This problem is not well posed for Newton's method: it has several continua of solutions. We can obtain isolated solutions for simple eigenvalues by adding the constraint xᵀx = 1.

The Eigen-Nonlinear Program. Necessary conditions: Ax = λx and xᵀx = 1. Observations: unit eigenvectors ↔ stationary points; eigenvalues ↔ Lagrange multipliers.
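
One standard way to write an eigen-nonlinear program consistent with these observations is sketched below in LaTeX; the particular objective and constraint scalings are an assumption, since the displayed formulas did not survive the transcript. With this scaling, the multiplier at a stationary point is exactly the corresponding eigenvalue.

```latex
\begin{align*}
&\text{minimize } \tfrac12\,x^{T}Ax
  \quad\text{subject to}\quad h(x) = \tfrac12\,(1 - x^{T}x) = 0,\\
&\text{Lagrangian: } \ell(x,\lambda) = \tfrac12\,x^{T}Ax + \lambda\,\tfrac12\,(1 - x^{T}x),\\
&\text{first-order conditions: } Ax - \lambda x = 0, \qquad x^{T}x = 1.
\end{align*}
```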

The Mathematician's Crutch: reduce the problem at hand to a solved problem, or a sequence of such solved problems. Example: differentiation and the partial derivative. Example: Newton's method reduces the solution of a square nonlinear system of equations to solving a sequence of square linear systems of equations. The present case: SUMT – sequential unconstrained minimization techniques. We reduce the solution of the constrained optimization problem to solving a sequence of unconstrained problems.

Penalty Function Method (Courant 1944). Observation: stationary points of our equality constrained optimization problem do not correspond to stationary points of the penalty function P(x; μ). Proof: since h vanishes at a constrained stationary point x*, so does the gradient of the penalty term; hence ∇P(x*; μ) = ∇f(x*) = −∇h(x*) λ*, which is nonzero whenever λ* ≠ 0.

Penalty Function Method (cont.) Observation: at a solution the penalty term must supply the multiplier term, but λ* ≠ 0 while h(x*) = 0; hence the penalty parameter must be driven to its limiting value. We therefore perform the minimization sequentially for a sequence of penalty parameters {μ_k} approaching that limit.

Equivalence Result: Normalized Inverse Iteration. Solve A y_{k+1} = x_k; Normalize x_{k+1} = y_{k+1} / ‖y_{k+1}‖_2. Theorem: Normalized inverse iteration is equivalent to normalized Newton's method on the penalty function for the eigen-nonlinear program.

Equivalence Result (cont.) But wait, we don't get quadratic convergence. Why? Answer: for an eigenpair (λ, x) with ‖x‖_2 = 1 and λ ≠ 0, we have ∇P(x; μ) ≠ 0. Therefore, the unit eigenvector is not a stationary point of the penalty function for any value of the penalty parameter, as we pointed out before.

Multiplier Method (Hestenes 1968). Augmented Lagrangian. Observation: as before, stationary points of our equality constrained optimization problem do not correspond to stationary points (in x) of the augmented Lagrangian when the multiplier estimate differs from the true multiplier.

Multiplier Method (cont.) Given the multiplier estimate λ_k, obtain x_{k+1} as the solution of the unconstrained minimization in x of the augmented Lagrangian. Remark: Hestenes suggested the multiplier method because it does not need the penalty parameter to be driven to its limit, as the penalty function method does. In the penalty function method, driving the penalty parameter to its limit causes arbitrarily bad conditioning.
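
For reference, the standard Hestenes augmented Lagrangian and multiplier update for the general equality constrained program are sketched below; this is one common parameterization, and the penalty-parameter convention on the slides may differ.

```latex
\[
L_{c}(x,\lambda) = f(x) + \lambda^{T}h(x) + \tfrac{c}{2}\,\|h(x)\|_{2}^{2},
\qquad
x_{k+1} \approx \arg\min_{x}\, L_{c}(x,\lambda_{k}),
\qquad
\lambda_{k+1} = \lambda_{k} + c\,h(x_{k+1}).
\]
```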

Equivalence Result: Normalized Shifted Inverse Iteration with shift σ. Solve (A − σ I) y_{k+1} = x_k; Normalize x_{k+1} = y_{k+1} / ‖y_{k+1}‖_2. Theorem: Normalized shifted inverse iteration with shift σ is equivalent to normalized Newton's method on the augmented Lagrangian for the eigen-nonlinear program with multiplier estimate σ.

Equivalence Result (cont’d) But wait, we don’t have quadratic convergence. Why? Answer: Same as for the penalty function case.

In the Interest of Efficiency We Ask. Question: can we move away from a sequence of unconstrained optimization problems to just one unconstrained optimization problem, i.e., a so-called exact penalty function?

The Fletcher Exact Penalty Function (1970), where λ(x) is the well-known Lagrange Multiplier Approximation Formula.
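
The multiplier approximation formula referred to above is presumably the least-squares multiplier; a sketch in the general notation, together with its specialization to the eigen-nonlinear program (where, with the scaling assumed earlier, it reduces to the Rayleigh quotient), follows.

```latex
\[
\lambda(x) = -\bigl[\nabla h(x)^{T}\nabla h(x)\bigr]^{-1}\nabla h(x)^{T}\nabla f(x),
\qquad\text{and for } f(x)=\tfrac12 x^{T}Ax,\ h(x)=\tfrac12(1-x^{T}x):\quad
\lambda(x) = \frac{x^{T}Ax}{x^{T}x} = \rho(x).
\]
```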

Theorem: For a sufficiently large penalty parameter, our constrained minimizers are minimizers of Fletcher's function, i.e., it is an exact penalty function. Criticism: how large is sufficiently large? The derivatives are out of phase, i.e., the gradient of Fletcher's function involves derivatives of λ(x), and hence second derivatives of the original problem data.

Correcting the Deficiencies of the Fletcher Exact Penalty Function: The Multiplier Substitution Equation (Tapia 1978). Instead of considering the unconstrained optimization problem (minimizing the Fletcher function), consider the nonlinear equation problem obtained by substituting the multiplier formula λ(x) into the first-order conditions (the multiplier substitution equation).

Remark: the multiplier substitution equation is a square nonlinear system of equations; hence, the use of Newton's method is appropriate.

Theorem: Stationary points of the equality constrained optimization problem correspond to zeros of the multiplier substitution equation. Proof (very pretty): substitute a zero x of the equation;

Theorem (cont.) multiply by ∇h(x)ᵀ to obtain, by the definition of the least-squares multiplier λ(x), that the Lagrangian-gradient term is annihilated; therefore h(x) = 0, and so ∇f(x) + ∇h(x) λ(x) = 0, and x is a stationary point with multiplier λ(x).

Equivalence Result: Normalized Rayleigh Quotient Iteration. Solve (A − ρ(x_k) I) y_{k+1} = x_k; Normalize x_{k+1} = y_{k+1} / ‖y_{k+1}‖_2. Theorem: Normalized Rayleigh Quotient Iteration is equivalent to normalized Newton's method on the multiplier substitution equation for the eigen-nonlinear program, for any nonzero value of the penalty parameter.

Equivalence Result (cont.) Remark: We finally have our equivalence.

But wait, we don’t have quadratic convergence (indeed we have cubic). Why?

Some Observations on the Cubic Convergence of RQI

Theorem: If the eigenvalue is simple, then the Jacobian of the multiplier substitution equation is nonsingular for all x in a neighborhood of each corresponding unit eigenvector and for every nonzero value of the penalty parameter. Moreover, the Normalized Newton Method is locally cubically convergent with convergence constant less than or equal to one.

Implication: Although asymptotically RQI becomes a singular algorithm, this singularity is removable.* *Jim Wilkinson in the 1960s and 1970s repeatedly stated that the singularity doesn't hurt – the angle is good, the magnitude is not important.

Cubic Convergence. Newton Operator; Normalized Newton Operator. Lemma: Let x be a unit eigenvector corresponding to a simple eigenvalue; then (a) the Newton iterate agrees with the eigenvector to second order (quadratic), and (b) its direction agrees with the eigenvector to third order.

Cubic Convergence (cont.) Surprise: the error in the Newton iterate is essentially a length error along the eigenvector, not an angle error. Conclusion: one must normalize in the 2-norm for good x-behavior.

Two Fascinating Reflections. The beauty of Euclidean geometry: if e is orthogonal to a unit vector u, then ‖u + e‖_2 = (1 + ‖e‖_2^2)^{1/2} = 1 + O(‖e‖_2^2), so the length is disturbed only to second order.

Two Fascinating Reflections (cont.) The Babylonian shift (a universal tool?). In “deriving” Newton's method for √2, when the iteration x_{k+1} = 2/x_k uniformly failed, they corrected by shifting by x, i.e., they considered x_{k+1} = (x_k + 2/x_k)/2.

Two Fascinating Reflections (cont.) In deriving Newton's method as Rayleigh Quotient Iteration for the symmetric eigenvalue problem, when the plain Newton iterate uniformly failed (it is identically zero), we considered adding x back to the iterate so that the iteration becomes well defined, i.e., we corrected by shifting by x.

An Alternative Approach to Deriving RQI Consider the eigen-nonlinear program

Newton's Method: apply Newton's method to the first-order system Ax − λx = 0, xᵀx = 1 in the unknowns (x, λ); the convergence of (x_k, λ_k) is quadratic.
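
A minimal numerical sketch of this step, assuming the first-order system is Ax − λx = 0 together with the normalization written here as (xᵀx − 1)/2 = 0; the test matrix, starting point, and iteration count are illustrative, and the printed residuals decay roughly quadratically.

```python
import numpy as np

def eig_newton(A, x0, lam0, iters=8):
    """Newton's method on G(x, lam) = [A x - lam x ; (x^T x - 1)/2] = 0 in the unknowns (x, lam)."""
    n = A.shape[0]
    x, lam = np.array(x0, dtype=float), float(lam0)
    for k in range(iters):
        G = np.concatenate([A @ x - lam * x, [(x @ x - 1.0) / 2.0]])
        J = np.block([[A - lam * np.eye(n), -x[:, None]],
                      [x[None, :],          np.zeros((1, 1))]])
        step = np.linalg.solve(J, -G)
        x, lam = x + step[:n], lam + step[n]
        print(k, lam, np.linalg.norm(A @ x - lam * x))
    return x, lam

A = np.diag([2.0, 5.0, 9.0])                    # illustrative symmetric matrix
eig_newton(A, x0=[0.1, 0.95, 0.1], lam0=4.5)    # converges to the eigenpair (5, e_2)
```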

Normalized Newton's Method (closing the norm constraint). Tapia-Whitley 1988: the convergence rate of the iterates is 1 + √2. Closing the constraint improved the convergence rate from 2 to 1 + √2.

Normalized Newton's Method (cont.) Idea: let's choose λ to “close” the equation Ax − λx = 0. Remark: we can't do this, because for fixed x this is an overdetermined system in λ. O.K., let's “close” it in the least-squares sense, i.e., let λ be the solution of min over λ of ‖Ax − λx‖_2^2.
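
The least-squares closing is a one-line calculation; assuming the equation being closed is Ax − λx = 0 with x held fixed, it reads:

```latex
\[
\min_{\lambda}\ \|Ax-\lambda x\|_{2}^{2}
\;\Longrightarrow\;
\frac{d}{d\lambda}\,\|Ax-\lambda x\|_{2}^{2} = -2\,x^{T}(Ax-\lambda x) = 0
\;\Longrightarrow\;
\lambda = \frac{x^{T}Ax}{x^{T}x} = \rho(x).
\]
```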

Normalized Newton's Method (cont.) This gives λ = xᵀAx / xᵀx = ρ(x). Our closed-in-x and least-squares-closed-in-λ Newton's method becomes: take the Newton step, normalize the x part in the 2-norm, and set λ to the Rayleigh quotient of the normalized x.

Normalized Newton's Method (cont.) Theorem: This algorithm is RQI. Therefore the convergence of the x iterates is cubic and the convergence of the λ iterates is cubic.

Summary. Newton's Method gives quadratic convergence. Newton's Method followed by a closing (normalization) of the norm constraint gives a convergence rate of 1 + √2. Newton's Method followed by a closing (normalization) of the norm constraint, followed by a least-squares closing of the “gradient of the Lagrangian equal to zero” equation, gives a convergence rate of 3 and also gives us Rayleigh Quotient Iteration.