# CSE 245: Computer Aided Circuit Simulation and Verification Matrix Computations: Iterative Methods (II) Chung-Kuan Cheng.


2 Outline
- Introduction
- Direct Methods
- Iterative Methods
  - Formulations
  - Projection Methods
  - Krylov Space Methods
  - Preconditioned Iterations
  - Multigrid Methods
  - Domain Decomposition Methods

3 Introduction
- Direct method: LU decomposition. General and robust, but can be complicated if N >= 1M.
- Iterative methods: Jacobi, Gauss-Seidel, Conjugate Gradient, GMRES, Multigrid, Domain Decomposition, Preconditioning. An excellent choice for SPD matrices; they remain an art for arbitrary matrices.

4 Formulation
- Error in A norm: matrix A is SPD (symmetric and positive definite).
- Minimal residue: matrix A can be arbitrary.

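5 Formulation: Error in A norm
- Minimize $E(x) = \tfrac12 x^T A x - b^T x$, where $A$ is SPD.
- Suppose that we knew $x^*$ with $Ax^* = b$. Then $E(x) = \tfrac12 (x - x^*)^T A (x - x^*) - \tfrac12 x^{*T} A x^*$, so up to a constant, minimizing $E$ is the same as minimizing the error $x - x^*$ in the A norm.
- For the Krylov space approach, the search space is $x = x_0 + Vy$, where $x_0$ is an initial solution, the matrix $V \in \mathbb{R}^{n \times m}$ is given and has full column rank, and the vector $y \in \mathbb{R}^m$ contains the $m$ variables.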

6 Solution and Error
Minimize $E(x) = \tfrac12 x^T A x - b^T x$ over the search space $x = x_0 + Vy$, with $r = b - Ax$.

Derivation:
1. $V^T A V$ is nonsingular ($A$ is SPD and $V$ has full column rank).
2. The variable: $y = (V^T A V)^{-1} V^T r_0$.
3. Thus, the solution: $x = x_0 + V (V^T A V)^{-1} V^T r_0$.

Properties of the solution:
1. Residue: $V^T r = 0$.
2. Error: $E(x) = E(x_0) - \tfrac12 r_0^T V (V^T A V)^{-1} V^T r_0$.

7 Solution and Error
Minimize $E(x) = \tfrac12 x^T A x - b^T x$ over the search space $x = x_0 + Vy$, with $r = b - Ax$.
- The variable: $y = (V^T A V)^{-1} V^T r_0$.
- Derivation of $y$: use the condition $dE/dy = 0$. We have $V^T A V y + V^T A x_0 - V^T b = 0$, thus $y = (V^T A V)^{-1} V^T (b - A x_0)$.

8 Solution and Error
Minimize $E(x) = \tfrac12 x^T A x - b^T x$ over the search space $x = x_0 + Vy$, with $r = b - Ax$.
- Property of the solution: the residue satisfies $V^T r = 0$.
- Proof: $V^T r = V^T (b - Ax) = V^T \left(b - A x_0 - A V (V^T A V)^{-1} V^T r_0\right) = V^T r_0 - (V^T A V)(V^T A V)^{-1} V^T r_0 = 0$.
- The residue of the solution is orthogonal to the previous bases.

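9 Transformation of the bases
For any $V' = VW$, where $V \in \mathbb{R}^{n \times m}$ is given and $W \in \mathbb{R}^{m \times m}$ is nonsingular, the solution $x$ remains the same for the search space $x = x_0 + V'y$.

Proof:
1. $V'^T A V'$ is nonsingular ($A$ is SPD and $V'$ has full column rank).
2. $x = x_0 + V' (V'^T A V')^{-1} V'^T r_0 = x_0 + V W (W^T V^T A V W)^{-1} W^T V^T r_0 = x_0 + V (V^T A V)^{-1} V^T r_0$.
3. $V'^T r = W^T V^T r = 0$.
4. $E(x) = E(x_0) - \tfrac12 r_0^T V (V^T A V)^{-1} V^T r_0$.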
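The projection formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated test data (the matrices `A`, `V`, `W` and the helper names `energy` and `project` are illustrative and not part of the slides). It verifies that the residue is orthogonal to the basis, that the energy drops by the predicted amount, and that a change of basis $V' = VW$ leaves the solution unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # SPD test matrix (assumed data)
b = rng.standard_normal(n)
x0 = rng.standard_normal(n)          # arbitrary initial solution
V = rng.standard_normal((n, m))      # full column rank with probability 1
W = rng.standard_normal((m, m)) + m * np.eye(m)  # change of basis (assumed nonsingular)

def energy(x):
    """E(x) = 1/2 x^T A x - b^T x."""
    return 0.5 * x @ A @ x - b @ x

def project(basis):
    """x = x0 + V (V^T A V)^{-1} V^T r0 for the given basis."""
    r0 = b - A @ x0
    y = np.linalg.solve(basis.T @ A @ basis, basis.T @ r0)
    return x0 + basis @ y

x = project(V)
x_w = project(V @ W)                 # same search space, different basis
r0 = b - A @ x0
r = b - A @ x
predicted_drop = 0.5 * r0 @ V @ np.linalg.solve(V.T @ A @ V, V.T @ r0)

print("max |V^T r|      :", np.abs(V.T @ r).max())
print("energy mismatch  :", abs(energy(x) - (energy(x0) - predicted_drop)))
print("basis invariance :", np.abs(x - x_w).max())
```

All three printed quantities should be at round-off level.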

10 Steepest Descent: Error in A norm
Minimize $E(x) = \tfrac12 x^T A x - b^T x$. The gradient is $-r = Ax - b$. Set $x = x_0 + y r_0$, where $y$ is a scalar.
1. $r_0^T A r_0 > 0$ for $r_0 \neq 0$ ($A$ is SPD).
2. $y = (r_0^T A r_0)^{-1} r_0^T r_0$.
3. $x = x_0 + r_0 (r_0^T A r_0)^{-1} r_0^T r_0$.
4. $r_0^T r = 0$.
5. $E(x) = E(x_0) - \tfrac12 (r_0^T r_0)^2 / (r_0^T A r_0)$.
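Repeating the single step above from each new iterate gives the steepest descent iteration. A minimal sketch, assuming NumPy and a small SPD example matrix chosen for illustration (the function name and tolerance are not from the slides):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Repeat x <- x + y r with y = (r^T r) / (r^T A r) until |r|_2 <= tol."""
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol:
            return x, k
        Ar = A @ r
        y = (r @ r) / (r @ Ar)       # exact line search along the residue
        x = x + y * r
        r = r - y * Ar               # same as b - A @ x, but one product cheaper
    return x, max_iter

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # SPD example (assumed data)
b = np.array([1.0, 2.0])
x, iters = steepest_descent(A, b, np.zeros(2))
print(x, iters)
```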

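11 Lanczos: Error in A norm
Minimize $E(x) = \tfrac12 x^T A x - b^T x$. Set $x = x_0 + Vy$.
1. $v_1 = r_0 / \|r_0\|_2$.
2. $v_i$ is in $K\{r_0, A, i\} = \mathrm{span}\{r_0, A r_0, \dots, A^{i-1} r_0\}$.
3. $V = [v_1, v_2, \dots, v_m]$ is orthogonal ($V^T V = I$).
4. $AV = V H_m + h_{m+1,m} v_{m+1} e_m^T$.
5. $V^T A V = H_m$.

Note: since $A$ is symmetric (it is SPD), $H_m$ is a symmetric tridiagonal matrix $T_m$.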

12 Lanczos: Derive from Arnoldi Process
Arnoldi process: $AV = VH + h_{m+1,m} v_{m+1} e_m^T$.
- Input: $A$, $v_1 = r_0 / \|r_0\|_2$.
- Output: $V = [v_1, v_2, \dots, v_m]$, $H$, and $h_{m+1,m}$, $v_{m+1}$.

    For k = 1, ..., m
        r_k = A v_k
        For i = 1, ..., k
            h_{i,k} = v_i^T r_k
            r_k = r_k - h_{i,k} v_i
        End
        h_{k+1,k} = |r_k|_2
        If h_{k+1,k} = 0, stop
        v_{k+1} = r_k / h_{k+1,k}
    End
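As a runnable counterpart to the Arnoldi process, here is a minimal NumPy sketch. The function name `arnoldi`, the breakdown tolerance, and the random test matrix are assumptions made for illustration. It returns $V_{m+1}$ and the $(m+1) \times m$ Hessenberg matrix, so that $A V_m = V_{m+1} \bar{H}_m$.

```python
import numpy as np

def arnoldi(A, r0, m, breakdown_tol=1e-12):
    """Arnoldi process with modified Gram-Schmidt.

    Returns V (n x (m+1)) with orthonormal columns and H ((m+1) x m) upper
    Hessenberg with A @ V[:, :m] == V @ H. On breakdown at step k, returns
    the square pair V (n x k), H (k x k) with A @ V == V @ H.
    """
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        w = A @ V[:, k]
        for i in range(k + 1):               # orthogonalize against v_1..v_{k+1}
            H[i, k] = V[:, i] @ w
            w = w - H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] <= breakdown_tol:     # Krylov space is A-invariant
            return V[:, :k + 1], H[:k + 1, :k + 1]
        V[:, k + 1] = w / H[k + 1, k]
    return V, H

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))              # arbitrary square matrix (assumed data)
r0 = rng.standard_normal(8)
m = 4
V, H = arnoldi(A, r0, m)
print(np.abs(A @ V[:, :m] - V @ H).max())    # Arnoldi relation, round-off level
```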

13 Lanczos: Create $V = [v_1, v_2, \dots, v_m]$ which is orthogonal
- Input: $A$, $r_0$.
- Output: $V$ and $T = V^T A V$.

    Initial: k = 0, β_0 = |r_0|_2, v_0 = 0
    While k < m and β_k != 0
        v_{k+1} = r_k / β_k
        k = k + 1
        a_k = v_k^T A v_k
        r_k = A v_k - a_k v_k - β_{k-1} v_{k-1}
        β_k = |r_k|_2
    End
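The three-term recurrence above translates almost line for line into code. This is a minimal sketch assuming NumPy, a symmetric `A`, and no breakdown within `m` steps (all $\beta_k \neq 0$); the function name and test data are illustrative, and no reorthogonalization is done, so orthogonality degrades for large `m` in floating point.

```python
import numpy as np

def lanczos(A, r0, m):
    """Return V (n x m, orthonormal columns) and tridiagonal T = V^T A V.

    Assumes A is symmetric and that no beta_k vanishes for k < m.
    """
    n = A.shape[0]
    V = np.zeros((n, m))
    a = np.zeros(m)                   # diagonal entries a_k
    beta = np.zeros(m)                # beta[k] couples v_{k+1} and v_{k+2} (1-based)
    v_prev = np.zeros(n)
    beta_prev = 0.0
    v = r0 / np.linalg.norm(r0)
    for k in range(m):
        V[:, k] = v
        w = A @ v
        a[k] = v @ w
        w = w - a[k] * v - beta_prev * v_prev
        beta[k] = np.linalg.norm(w)
        if k + 1 < m:
            v_prev, v, beta_prev = v, w / beta[k], beta[k]
    T = np.diag(a) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return V, T

rng = np.random.default_rng(2)
M = rng.standard_normal((10, 10))
A = M @ M.T + 10 * np.eye(10)         # SPD test matrix (assumed data)
r0 = rng.standard_normal(10)
V, T = lanczos(A, r0, 5)
print(np.abs(V.T @ A @ V - T).max())  # should be at round-off level
```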

14 Lanczos: Create $V = [v_1, v_2, \dots, v_m]$ which is orthogonal
- Input: $A$, $r_0$. Output: $V$ and $T = V^T A V$.
- $T_{ii} = a_i$, $T_{ij} = \beta_i$ for $j = i+1$, $T_{ij} = \beta_j$ for $j = i-1$, and $T_{ij} = 0$ otherwise.
- Proof: by induction, show that $v_i^T A v_j = 0$ if $i < j - 1$.

15 Conjugate Gradient
Minimize $E(x) = \tfrac12 x^T A x - b^T x$. Set $x = x_0 + Vy$.
1. $v_1 = r_0$.
2. $v_i$ is in $K\{r_0, A, i\}$.
3. Set $V = [v_1, v_2, \dots, v_m]$ orthogonal in the A norm, i.e. $V^T A V = \mathrm{diag}(v_i^T A v_i) = D$.
4. $x = x_0 + V D^{-1} V^T r_0$.
5. $x = x_0 + \sum_{i=1}^{m} d_i v_i v_i^T r_0$, where $d_i = (v_i^T A v_i)^{-1}$.

16 Conjugate Gradient Method
- Steepest descent repeats search directions.
- Why not take exactly one step for each direction?
- [Figure: search directions of the steepest descent method]

17 Orthogonal vs. A-orthogonal
- Instead of orthogonal search directions, we make the search directions A-orthogonal (conjugate).

18 Conjugate Search Direction
- How do we construct A-orthogonal search directions, given a set of n linearly independent vectors?
- Since successive residue vectors in the steepest descent method are orthogonal, the residues are a good candidate set to start with.

19 Conjugate Gradient
Minimize $E(x) = \tfrac12 x^T A x - b^T x$. Set $x = x_0 + Vy$.
- Use the Lanczos method to derive $V$ and $T$: $x = x_0 + V T^{-1} V^T r_0$.
- Decompose $T = LDU$ with $U = L^T$.
- Thus, we have $x = x_0 + V (L D L^T)^{-1} V^T r_0 = x_0 + V L^{-T} D^{-1} L^{-1} V^T r_0$.

20 Conjugate Gradient Algorithm
Given $x_0$, iterate until the residue is smaller than the error tolerance.
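The algorithm listing on this slide did not survive in the transcript. The following is a sketch of the standard textbook conjugate gradient iteration (not necessarily the exact listing from the slide), assuming NumPy, a relative residual stopping test, and a 1D Laplacian test matrix chosen for illustration.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Textbook CG for SPD A; stops when |r|_2 <= tol * |b|_2."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    max_iter = 10 * n if max_iter is None else max_iter
    r = b - A @ x
    d = r.copy()                       # first search direction is the residue
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            return x, k
        Ad = A @ d
        alpha = rs / (d @ Ad)          # exact line search along d
        x = x + alpha * d
        r = r - alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d      # new direction, A-orthogonal to the old ones
        rs = rs_new
    return x, max_iter

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD 1D Laplacian (assumed data)
b = np.ones(n)
x, iters = conjugate_gradient(A, b)
print(iters, np.linalg.norm(b - A @ x))
```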

21 Conjugate gradient: Convergence
- In exact arithmetic, CG converges in n steps (completely unrealistic!!).
- Accuracy after k steps of CG is related to the following question: consider polynomials of degree k that are equal to 1 at 0. How small can such a polynomial be at all the eigenvalues of A?
- Eigenvalues close together are good.
- Condition number: $\kappa(A) = \|A\|_2 \|A^{-1}\|_2 = \lambda_{\max}(A) / \lambda_{\min}(A)$ for SPD $A$.
- The residual is reduced by a constant factor in $O(\kappa^{1/2}(A))$ iterations of CG.
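The polynomial statement and the $O(\kappa^{1/2})$ claim can be written as the standard CG error bounds. Here $e_k = x_k - x^*$, $\|e\|_A = \sqrt{e^T A e}$, and $\lambda_i$ are the eigenvalues of $A$; this notation is introduced here for the sketch and does not appear on the slide.

$$
\|e_k\|_A \;\le\; \min_{\substack{\deg p \le k \\ p(0) = 1}} \; \max_i |p(\lambda_i)| \; \|e_0\|_A
\;\le\; 2 \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^{k} \|e_0\|_A .
$$

The second inequality follows by choosing a scaled Chebyshev polynomial on $[\lambda_{\min}, \lambda_{\max}]$. Since $\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \approx 1 - \frac{2}{\sqrt{\kappa}}$ for large $\kappa$, about $\sqrt{\kappa}/2$ iterations reduce the bound by a factor of $e$.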

22 Preconditioners
- Suppose you had a matrix $B$ such that:
  1. the condition number $\kappa(B^{-1}A)$ is small;
  2. $By = z$ is easy to solve.
- Then you could solve $(B^{-1}A)x = B^{-1}b$ instead of $Ax = b$.
- $B = A$ is great for (1), not for (2).
- $B = I$ is great for (2), not for (1).
- Domain-specific approximations sometimes work.
- $B$ = diagonal of $A$ sometimes works.
- Better: blend in some direct-methods ideas...

23 Preconditioned conjugate gradient iteration

    x_0 = 0, r_0 = b, d_0 = B^{-1} r_0, y_0 = B^{-1} r_0
    for k = 1, 2, 3, ...
        α_k = (y_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    step length
        x_k = x_{k-1} + α_k d_{k-1}                          approximate solution
        r_k = r_{k-1} - α_k A d_{k-1}                        residual
        y_k = B^{-1} r_k                                     preconditioning solve
        β_k = (y_k^T r_k) / (y_{k-1}^T r_{k-1})              improvement
        d_k = y_k + β_k d_{k-1}                              search direction

- One matrix-vector multiplication per iteration.
- One solve with the preconditioner per iteration.
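The iteration above can be sketched in NumPy as follows. The preconditioner is passed as a function `solve_B` that applies $B^{-1}$; the Jacobi choice $B = \mathrm{diag}(A)$, the stopping test, and the test matrix are assumptions made for this example.

```python
import numpy as np

def pcg(A, b, solve_B, tol=1e-10, max_iter=1000):
    """Preconditioned CG following the slide's notation; solve_B(r) returns B^{-1} r."""
    x = np.zeros_like(b, dtype=float)
    r = np.asarray(b, dtype=float).copy()    # r_0 = b because x_0 = 0
    y = solve_B(r)
    d = y.copy()
    yr = y @ r
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ad = A @ d                           # the one matrix-vector product
        alpha = yr / (d @ Ad)                # step length
        x = x + alpha * d                    # approximate solution
        r = r - alpha * Ad                   # residual
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        y = solve_B(r)                       # the one preconditioner solve
        yr_new = y @ r
        beta = yr_new / yr                   # improvement
        d = y + beta * d                     # search direction
        yr = yr_new
    return x, max_iter

n = 50
A = np.diag(np.arange(1.0, n + 1)) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))  # SPD (assumed data)
b = np.ones(n)
diag_A = np.diag(A)
x_jac, it_jac = pcg(A, b, lambda r: r / diag_A)   # B = diagonal of A
x_id, it_id = pcg(A, b, lambda r: r.copy())       # B = I, i.e. plain CG
print(it_jac, it_id)
```

On this strongly diagonal test matrix the diagonal preconditioner should need noticeably fewer iterations than $B = I$, in line with the slide's remark that the diagonal of $A$ sometimes works.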

24 Other Krylov subspace methods
- Nonsymmetric linear systems:
  - GMRES: for i = 1, 2, 3, ... find $x_i \in K_i(A, b)$ such that $r_i = b - A x_i \perp A K_i(A, b)$, which is equivalent to minimizing $\|r_i\|_2$. But there is no short recurrence, so old vectors must be saved, which takes a lot more space. (Usually "restarted" every k iterations to use less space.)
  - BiCGStab, QMR, etc.: two spaces $K_i(A, b)$ and $K_i(A^T, b)$ with mutually orthogonal bases. Short recurrences give O(n) space, but the methods are less robust.
  - Convergence and preconditioning are more delicate than for CG.
  - Active area of current research.
- Eigenvalues: Lanczos (symmetric), Arnoldi (nonsymmetric).

25 Formulation: Residual
- Minimize $\|r\|_2 = \|b - Ax\|_2$ for an arbitrary square matrix $A$.
- Minimize $R(x) = (b - Ax)^T (b - Ax)$.
- Search space: $x = x_0 + Vy$, where $x_0$ is an initial solution, the matrix $V \in \mathbb{R}^{n \times m}$ has $m$ bases of the subspace $K$, and the vector $y \in \mathbb{R}^m$ contains the $m$ variables.

26 Solution: Residual
Minimize $R(x) = (b - Ax)^T (b - Ax)$ over the search space $x = x_0 + Vy$.
1. $V^T A^T A V$ is nonsingular if $A$ is nonsingular and $V$ has full column rank.
2. $y = (V^T A^T A V)^{-1} V^T A^T r_0$.
3. $x = x_0 + V (V^T A^T A V)^{-1} V^T A^T r_0$.
4. $V^T A^T r = 0$.
5. $R(x) = R(x_0) - r_0^T A V (V^T A^T A V)^{-1} V^T A^T r_0$.

27 Steepest Descent: Residual
Minimize $R(x) = (b - Ax)^T (b - Ax)$. The gradient is $-2A^T (b - Ax) = -2A^T r$. Let $x = x_0 + y A^T r_0$, i.e. $V = A^T r_0$.
1. $V^T A^T A V$ is nonsingular if $A$ is nonsingular, where $V = A^T r_0$.
2. $y = (V^T A^T A V)^{-1} V^T A^T r_0$.
3. $x = x_0 + V (V^T A^T A V)^{-1} V^T A^T r_0$.
4. $V^T A^T r = 0$.
5. $R(x) = R(x_0) - r_0^T A V (V^T A^T A V)^{-1} V^T A^T r_0$.

28 GMRES: Residual
Minimize $R(x) = (b - Ax)^T (b - Ax)$. Set $x = x_0 + Vy$.
1. $v_1 = r_0 / \|r_0\|_2$.
2. $v_i$ is in $K\{r_0, A, i\}$.
3. $V = [v_1, v_2, \dots, v_m]$ is orthogonal.
4. $AV = V H_m + h_{m+1,m} v_{m+1} e_m^T = V_{m+1} \bar{H}_m$.
5. $x = x_0 + V (V^T A^T A V)^{-1} V^T A^T r_0$
6. $\;\; = x_0 + V (\bar{H}_m^T \bar{H}_m)^{-1} \bar{H}_m^T e_1 \|r_0\|_2$.
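Steps 1 to 6 can be sketched as one unrestarted GMRES cycle. This is a minimal NumPy illustration, assuming $r_0 \neq 0$; the function name, the breakdown tolerance, and the random test matrix are illustrative. Instead of forming $(\bar{H}_m^T \bar{H}_m)^{-1}$ explicitly, it solves the equivalent small least-squares problem $\min_y \| \|r_0\|_2 e_1 - \bar{H}_m y \|_2$.

```python
import numpy as np

def gmres_cycle(A, b, x0, m):
    """One GMRES cycle of m Arnoldi steps; returns (x, V_m). Assumes r0 != 0."""
    n = b.shape[0]
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / beta
    for k in range(m):
        w = A @ V[:, k]
        for i in range(k + 1):               # modified Gram-Schmidt
            H[i, k] = V[:, i] @ w
            w = w - H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] < 1e-12:              # breakdown: solution is in the current space
            m = k + 1
            break
        V[:, k + 1] = w / H[k + 1, k]
    H_bar = H[:m + 1, :m]
    rhs = np.zeros(m + 1)
    rhs[0] = beta                            # |r0|_2 * e_1
    y, *_ = np.linalg.lstsq(H_bar, rhs, rcond=None)
    return x0 + V[:, :m] @ y, V[:, :m]

rng = np.random.default_rng(3)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)   # nonsymmetric test matrix (assumed data)
b = rng.standard_normal(n)
x0 = np.zeros(n)
x, V = gmres_cycle(A, b, x0, 4)
print(np.linalg.norm(b - A @ x0), np.linalg.norm(b - A @ x))
```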

29 Conjugate Residual: Residual
Minimize $R(x) = (b - Ax)^T (b - Ax)$. Set $x = x_0 + Vy$.
1. $v_1 = r_0$.
2. $v_i$ is in $K\{r_0, A, i\}$.
3. $(AV)^T AV = D$, a diagonal matrix.
4. $x = x_0 + V (V^T A^T A V)^{-1} V^T A^T r_0$
5. $\;\; = x_0 + V D^{-1} V^T A^T r_0$.

30 Outline
- Iterative Method
  - Stationary Iterative Method (SOR, GS, Jacobi)
  - Krylov Method (CG, GMRES)
  - Multigrid Method

31 What is multigrid?
- A multilevel iterative method to solve $Ax = b$.
- Originated in PDEs on geometric grids.
- Extending the multigrid idea to unstructured problems gives algebraic multigrid (AMG).
- Geometric multigrid is used here to present the basic ideas of the multigrid method.

32 The model problem
- $Ax = b$
- [Figure: network with source $v_s$ and node voltages $v_1, \dots, v_8$]

33 Simple iterative method
- $x^{(0)} \to x^{(1)} \to \dots \to x^{(k)}$
- Jacobi iteration, matrix form: $x^{(k)} = R_J x^{(k-1)} + C_J$
- General form: $x^{(k)} = R x^{(k-1)} + C$ (1)
- Stationary: $x^* = R x^* + C$ (2)

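34 Error and Convergence
- Definition: error $e = x^* - x$ (3), residual $r = b - Ax$ (4).
- Relation between $e$ and $r$: $Ae = r$ (5), from (3) and (4).
- $e^{(1)} = x^* - x^{(1)} = R x^* + C - R x^{(0)} - C = R e^{(0)}$.
- Error equation: $e^{(k)} = R^k e^{(0)}$ (6), from (1), (2) and (3).
- Convergence: $e^{(k)} \to 0$ for every $e^{(0)}$ if and only if the spectral radius satisfies $\rho(R) < 1$.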
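The stationary iteration and the error equation (6) can be checked on a small example. A minimal sketch, assuming NumPy, the Jacobi splitting $R_J = I - D^{-1}A$, $C_J = D^{-1}b$ with $D = \mathrm{diag}(A)$, and a 3x3 diagonally dominant matrix chosen for illustration:

```python
import numpy as np

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])              # diagonally dominant example (assumed data)
b = np.array([1.0, 2.0, 3.0])

D = np.diag(np.diag(A))
R = np.eye(3) - np.linalg.solve(D, A)         # Jacobi iteration matrix R_J
C = np.linalg.solve(D, b)                     # C_J
rho = np.abs(np.linalg.eigvals(R)).max()      # spectral radius

x_star = np.linalg.solve(A, b)
x = np.zeros(3)
e0 = x_star - x
k = 25
for _ in range(k):
    x = R @ x + C                             # x^(k) = R x^(k-1) + C

print("spectral radius:", rho)
print("error after k steps:", np.linalg.norm(x_star - x))
print("|e_k - R^k e_0|:", np.linalg.norm((x_star - x) - np.linalg.matrix_power(R, k) @ e0))
```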

35 Error of different frequency
- Wavenumber $k$ and frequency $\theta = k\pi/n$.
- High-frequency error is more oscillatory between points.
- [Figure: error modes for k = 1, k = 2 and k = 4]

36 Iteration reduces high-frequency error efficiently
- Smoothing iteration reduces high-frequency error efficiently, but not low-frequency error.
- [Figure: error versus iterations for k = 1, k = 2 and k = 4]
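The frequency dependence of smoothing can be observed directly. The sketch below is an illustration under assumptions not stated on the slide: a 1D Poisson matrix with n = 64 intervals, weighted Jacobi with weight 2/3 as the smoother, and wavenumbers 1, 16 and 48. It applies 10 sweeps to pure sine error modes and reports how much of each mode remains.

```python
import numpy as np

n = 64                                         # number of grid intervals (assumed)
j = np.arange(1, n)                            # interior grid points
A = 2 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)  # 1D Poisson matrix
omega = 2.0 / 3.0                              # weighted-Jacobi weight (assumed)
R = np.eye(n - 1) - (omega / 2.0) * A          # error propagation: e <- R e

def remaining(k, sweeps=10):
    """Fraction of the error mode sin(j k pi / n) left after the given sweeps."""
    e = np.sin(j * k * np.pi / n)
    e_start = np.linalg.norm(e)
    for _ in range(sweeps):
        e = R @ e
    return np.linalg.norm(e) / e_start

ratios = {k: remaining(k) for k in (1, 16, 48)}
for k, ratio in ratios.items():
    print(f"k = {k:2d}: remaining error fraction {ratio:.2e}")
```

The smooth mode k = 1 is almost untouched after 10 sweeps, while the oscillatory mode k = 48 is essentially eliminated.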

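37 Multigrid: a first glance
- Two levels: coarse grid and fine grid.
- Fine grid $\Omega^h$ (points 1 to 8): $A^h x^h = b^h$, which is the original problem $Ax = b$.
- Coarse grid $\Omega^{2h}$ (points 1 to 4): $A^{2h} x^{2h} = b^{2h}$.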

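38 Idea 1: the V-cycle iteration
- Also called the nested iteration.
- Start on the coarse grid $\Omega^{2h}$: iterate on $A^{2h} x^{2h} = b^{2h}$.
- Prolongation: map from $\Omega^{2h}$ to $\Omega^h$ to obtain a starting guess on the fine grid.
- Restriction: map from $\Omega^h$ to $\Omega^{2h}$.
- Iterate on $A^h x^h = b^h$ on the fine grid $\Omega^h$ to get the solution.
- Question 1: Why do we need the coarse grid?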

39 Prolongation
- Prolongation (interpolation) operator: $x^h = I_{2h}^h x^{2h}$.
- [Figure: coarse-grid points 1 to 4 mapped to fine-grid points 1 to 8]

40 Restriction
- Restriction operator: $x^{2h} = I_h^{2h} x^h$.
- [Figure: fine-grid points 1 to 8 mapped to coarse-grid points 1 to 4]
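The operator formulas on these two slides were lost in the transcript, so the following is a sketch of one common choice rather than the slides' exact operators: linear interpolation for prolongation and full weighting, $I_h^{2h} = \tfrac12 (I_{2h}^h)^T$, for restriction, on a 1D grid with `nc` coarse interior points and `2*nc + 1` fine interior points (a layout assumed for this example).

```python
import numpy as np

def prolongation(nc):
    """Linear interpolation from nc coarse points to 2*nc + 1 fine points."""
    nf = 2 * nc + 1
    P = np.zeros((nf, nc))
    for j in range(nc):
        i = 2 * j + 1              # fine index that coincides with coarse point j
        P[i, j] = 1.0              # copy the coarse value
        P[i - 1, j] += 0.5         # left fine neighbour gets half
        P[i + 1, j] += 0.5         # right fine neighbour gets half
    return P

P = prolongation(3)                # I_{2h}^h, shape (7, 3)
R = 0.5 * P.T                      # I_h^{2h}, full weighting, shape (3, 7)

x2h = np.array([1.0, 2.0, 3.0])
xh = P @ x2h                       # prolongation: x^h = I_{2h}^h x^{2h}
print(xh)
print(R @ np.ones(7))              # restriction preserves constants
```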

41 Smoothing
- The basic iterations in each level: in $\Omega^{ph}$, $x^{ph}_{\text{old}} \to x^{ph}_{\text{new}}$.
- Iteration reduces the error and makes the error geometrically smooth, so the iteration is called smoothing.

42 Why multilevel?
- Coarse-level iteration is cheap.
- More than this: coarse-level smoothing reduces the error more efficiently than fine-level smoothing in some way. Why? (Question 2)

43 Error restriction
- Mapping the error to the coarse grid makes the error more oscillatory.
- [Figure: the mode k = 4 has $\theta = \pi/2$ on the fine grid and $\theta = \pi$ on the coarse grid]

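44 Idea 2: Residual correction
- Given the current solution $x$, solving $Ax = b$ is equivalent to solving the residual equation $Ae = r$ with $e = x^* - x$ and $r = b - Ax$.
- MG does NOT map $x$ directly between levels. It maps the residual equation to the coarse level:
  1. Calculate $r^h$.
  2. $b^{2h} = I_h^{2h} r^h$ (restriction).
  3. $e^h = I_{2h}^h x^{2h}$ (prolongation), where $x^{2h}$ approximately solves $A^{2h} x^{2h} = b^{2h}$.
  4. $x^h = x^h + e^h$.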

45 Why residual correction?
- The error is smooth at the fine level, but the actual solution may not be.
- Prolongation results in a smooth error at the fine level, which is supposed to be a good approximation of the fine-level error.
- If the solution is not smooth at the fine level, prolongation of the solution itself will introduce more high-frequency error.

46 Revised V-cycle with idea 2
1. Smoothing on $x^h$ (fine grid $\Omega^h$).
2. Calculate $r^h$.
3. Restriction: $b^{2h} = I_h^{2h} r^h$.
4. Smoothing on $x^{2h}$ (coarse grid $\Omega^{2h}$).
5. Prolongation: $e^h = I_{2h}^h x^{2h}$.
6. Correct: $x^h = x^h + e^h$.

47 What is $A^{2h}$?
- Galerkin condition: $A^{2h} = I_h^{2h} A^h I_{2h}^h$.
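Residual correction and the Galerkin coarse operator can be combined into a two-grid cycle. The sketch below is an illustration under several assumptions that are not from the slides: a 1D Poisson matrix, linear interpolation with full-weighting restriction, weighted Jacobi (weight 2/3) as the smoother, and an exact coarse solve in place of coarse-level smoothing.

```python
import numpy as np

def poisson_1d(n):
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolongation(nc):
    """Linear interpolation from nc coarse points to 2*nc + 1 fine points."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        i = 2 * j + 1
        P[i, j] = 1.0
        P[i - 1, j] += 0.5
        P[i + 1, j] += 0.5
    return P

def smooth(A, x, b, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps: x <- x + omega * D^{-1} (b - A x)."""
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / d
    return x

def two_grid(A, b, x, P, R, A2h, nu=2):
    x = smooth(A, x, b, nu)              # pre-smoothing on x^h
    r = b - A @ x                        # r^h
    b2h = R @ r                          # restriction
    x2h = np.linalg.solve(A2h, b2h)      # exact coarse solve (replaces coarse smoothing)
    x = x + P @ x2h                      # prolongation and correction
    return smooth(A, x, b, nu)           # post-smoothing

nc = 15
P = prolongation(nc)
R = 0.5 * P.T
A = poisson_1d(2 * nc + 1)
A2h = R @ A @ P                          # Galerkin condition
rng = np.random.default_rng(4)
b = rng.standard_normal(2 * nc + 1)
x = np.zeros_like(b)
history = [np.linalg.norm(b)]
for _ in range(10):
    x = two_grid(A, b, x, P, R, A2h)
    history.append(np.linalg.norm(b - A @ x))
print(["%.1e" % h for h in history])
```

The printed residual norms should drop by a roughly constant factor per cycle.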

48 Going to multilevels
- V-cycle and W-cycle.
- Full multigrid V-cycle.
- [Figure: cycle diagrams over grid levels h, 2h, 4h and 8h]

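49 Performance of Multigrid
Complexity comparison:

| Method | Complexity |
|---|---|
| Gaussian elimination | $O(N^2)$ |
| Jacobi iteration | $O(N^2 \log \epsilon)$ |
| Gauss-Seidel | $O(N^2 \log \epsilon)$ |
| SOR | $O(N^{3/2} \log \epsilon)$ |
| Conjugate gradient | $O(N^{3/2} \log \epsilon)$ |
| Multigrid (iterative) | $O(N \log \epsilon)$ |
| Multigrid (FMG) | $O(N)$ |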

50 Summary of MG ideas
Important ideas of MG:
1. Hierarchical iteration.
2. Residual correction.
3. Galerkin condition.
4. Smoothing the error: high frequency on the fine grid, low frequency on the coarse grid.

51 AMG: for unstructured grids
- $Ax = b$, with no regular grid structure.
- The fine grid is defined from $A$.
- [Figure: graph with nodes 1 to 6]

52 Three questions for AMG
- How to choose the coarse grid?
- How to define the smoothness of errors?
- How are restriction and prolongation done?

53 How to choose the coarse grid
- Idea: C/F splitting.
- Use as few coarse grid points as possible.
- For each F-node, at least one of its neighbors is a C-node.
- Choose nodes with strong coupling to other nodes as C-nodes.
- [Figure: graph with nodes 1 to 6]
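The three rules above can be illustrated with a simple greedy pass. This is a simplified sketch, not the full classical AMG coarsening: the function name, the dictionary representation of strong couplings, and the 6-node example graph are all hypothetical (the slide's actual graph is not recoverable from the transcript).

```python
def cf_splitting(strong):
    """Greedy C/F splitting.

    strong maps each node to the set of nodes it is strongly coupled with
    (assumed symmetric). Repeatedly pick the unassigned node with the most
    unassigned strong neighbours as a C-node and mark those neighbours F.
    """
    unassigned = set(strong)
    coarse, fine = set(), set()
    while unassigned:
        # Ties are broken by the smallest node label.
        node = max(sorted(unassigned), key=lambda v: len(strong[v] & unassigned))
        coarse.add(node)
        unassigned.discard(node)
        for nb in strong[node] & unassigned:   # new set, safe to mutate unassigned
            fine.add(nb)
            unassigned.discard(nb)
    return coarse, fine

# Hypothetical 2 x 3 grid graph with nodes 1..6 (not the slide's graph).
strong = {
    1: {2, 4}, 2: {1, 3, 5}, 3: {2, 6},
    4: {1, 5}, 5: {2, 4, 6}, 6: {3, 5},
}
coarse, fine = cf_splitting(strong)
print("C-nodes:", sorted(coarse), "F-nodes:", sorted(fine))
```

Every F-node produced this way has at least one C-node neighbour by construction, because a node only becomes F when one of its strong neighbours has just been selected as C.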

54 How to define the smoothness of error
- AMG fundamental concept: smooth error means small residuals, $\|r\| \ll \|e\|$.

55 How are Prolongation and Restriction done
- Prolongation is based on smooth error and strong connections.
- Common practice: I

56 AMG Prolongation (2)

57 AMG Prolongation (3)
- Restriction:

58 Summary
- Multigrid is a multilevel iterative method.
- Advantage: scalable.
- If no geometrical grid is available, try the algebraic multigrid method.

59 The landscape of Solvers
- Direct: $A = LU$ (more robust). Iterative: $y' = Ay$ (less storage, if sparse).
- Nonsymmetric (more general). Symmetric positive definite (more robust).

60 References
- G.H. Golub and C.F. Van Loan, Matrix Computations, 4th Edition, Johns Hopkins, 2013.
- Y. Saad, Iterative Methods for Sparse Linear Systems, Second Edition, SIAM, 2003.
