Symmetric-pattern multifrontal factorization

Presentation transcript:

1 Symmetric-pattern multifrontal factorization
[Figure: tree T(A) and graph G(A) on vertices 1 to 9]

2 Symmetric-pattern multifrontal factorization
For each node of T from leaves to root:
  Sum own row/col of A with children’s Update matrices into Frontal matrix
  Eliminate current variable from Frontal matrix, to get Update matrix
  Pass Update matrix to parent
[Figure: tree T(A) and graph G(A) on vertices 1 to 9]

3 Symmetric-pattern multifrontal factorization
F_1 = A_1 => U_1 (frontal matrix F_1 on rows/columns 1, 3, 7; update matrix U_1 on rows/columns 3, 7)
[Figure: tree T(A) and graph G(A) on vertices 1 to 9]

4 Symmetric-pattern multifrontal factorization
F_1 = A_1 => U_1 (frontal matrix F_1 on rows/columns 1, 3, 7; update matrix U_1 on rows/columns 3, 7)
F_2 = A_2 => U_2 (frontal matrix F_2 on rows/columns 2, 3, 9; update matrix U_2 on rows/columns 3, 9)
[Figure: tree T(A) and graph G(A) on vertices 1 to 9]

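5 Symmetric-pattern multifrontal factorization
F_1 = A_1 => U_1 (frontal matrix F_1 on rows/columns 1, 3, 7; update matrix U_1 on rows/columns 3, 7)
F_2 = A_2 => U_2 (frontal matrix F_2 on rows/columns 2, 3, 9; update matrix U_2 on rows/columns 3, 9)
F_3 = A_3 + U_1 + U_2 => U_3 (frontal matrix F_3 on rows/columns 3, 7, 8, 9; update matrix U_3 on rows/columns 7, 8, 9)
[Figure: tree T(A) and graph G(A) on vertices 1 to 9]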
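
The leaves-to-root procedure on the preceding slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any real solver: it assumes A is symmetric positive definite (so each elimination is a Cholesky step with no pivoting), works on single nodes rather than supernodes, stores A densely as a list of lists, and keeps each frontal and update matrix as a small dictionary. The function name `multifrontal_cholesky` is hypothetical.

```python
import math

def multifrontal_cholesky(A):
    """Sketch: multifrontal Cholesky of a symmetric positive definite A.

    A is a dense list of lists (an assumption made for brevity).
    Returns the lower-triangular factor L with A = L * L^T.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    # pending[p] holds the update matrices that children have passed to node p.
    pending = [[] for _ in range(n)]

    # Children always have smaller numbers than their parent, so the order
    # 0, 1, ..., n-1 visits the tree from leaves to root.
    for j in range(n):
        # Index set of the frontal matrix: j, the nonzeros below the diagonal
        # in column j of A, and every index carried by a child's update matrix.
        idx = {j} | {i for i in range(j + 1, n) if A[i][j] != 0.0}
        for U in pending[j]:
            idx |= {i for (i, _) in U}
        idx = sorted(idx)

        # Assemble: own row/col of A plus the children's update matrices.
        F = {(r, c): 0.0 for r in idx for c in idx}
        F[(j, j)] += A[j][j]
        for i in idx[1:]:
            F[(i, j)] += A[i][j]
            F[(j, i)] += A[j][i]
        for U in pending[j]:
            for key, val in U.items():
                F[key] += val

        # Eliminate variable j from the frontal matrix.
        d = math.sqrt(F[(j, j)])
        L[j][j] = d
        rest = idx[1:]
        for i in rest:
            L[i][j] = F[(i, j)] / d

        # The Schur complement on the remaining indices is the update matrix.
        U = {(r, c): F[(r, c)] - L[r][j] * L[c][j] for r in rest for c in rest}

        # Pass it to the parent: the smallest remaining index.
        if rest:
            pending[rest[0]].append(U)
    return L
```

A production code would free each update matrix as soon as the parent has absorbed it, which is where the stack of pending update matrices comes from.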

6 Symmetric-pattern multifrontal factorization
[Figure: tree T(A) and filled graph G+(A) on vertices 1 to 9]

7 Symmetric-pattern multifrontal factorization
Really uses supernodes, not nodes
All arithmetic happens on dense square matrices
Needs extra memory for a stack of pending update matrices
Potential parallelism:
1. between independent tree branches
2. parallel dense ops on frontal matrix
[Figure: tree T(A) and graph G(A) on vertices 1 to 9]

8 MUMPS: distributed-memory multifrontal [Amestoy, Duff, L’Excellent, Koster, Tuma]
Symmetric-pattern multifrontal factorization
Parallelism both from tree and by sharing dense ops
Dynamic scheduling of dense op sharing
Symmetric preordering
For nonsymmetric matrices:
  optional weighted matching for heavy diagonal
  expand nonzero pattern to be symmetric
  numerical pivoting only within supernodes if possible (doesn’t change pattern)
  failed pivots are passed up the tree in the update matrix

9 SuperLU-dist: GE with static pivoting [Li, Demmel]
Target: Distributed-memory multiprocessors
Goal: No pivoting during numeric factorization

10 SuperLU-dist: Distributed static data structure
Process(or) mesh
Block cyclic matrix layout
[Figure: process mesh with processes 0 to 5; blocks of L and U labeled with the number of the owning process]
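
As a rough illustration of a block cyclic layout, the sketch below maps a block index (I, J) to the process that owns it. The 2 x 3 mesh shape and the row-major numbering of processes 0 to 5 are assumptions read off the figure residue, and the function names `owner` and `layout` are hypothetical; this is not the SuperLU-dist API.

```python
def owner(block_row, block_col, mesh_rows=2, mesh_cols=3):
    """Process that owns block (block_row, block_col) in a 2-D block cyclic layout.

    Assumes processes are numbered row by row on a mesh_rows x mesh_cols mesh.
    """
    return (block_row % mesh_rows) * mesh_cols + (block_col % mesh_cols)

def layout(num_blocks, mesh_rows=2, mesh_cols=3):
    """Owner of every block of a num_blocks x num_blocks block matrix."""
    return [[owner(i, j, mesh_rows, mesh_cols) for j in range(num_blocks)]
            for i in range(num_blocks)]

if __name__ == "__main__":
    for row in layout(6):
        print(" ".join(str(p) for p in row))
```

Because the mapping depends only on the block indices, every process can compute the owner of any block without communication, which is what makes the data structure static.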

11 GESP: Gaussian elimination with static pivoting
PA = LU
Sparse, nonsymmetric A
P is chosen numerically in advance, not by partial pivoting!
After choosing P, can permute PA symmetrically for sparsity: Q(PA)Q^T = LU
[Figure: block picture of PA = L x U]

12 SuperLU-dist: GE with static pivoting [Li, Demmel]
Target: Distributed-memory multiprocessors
Goal: No pivoting during numeric factorization
1. Permute A unsymmetrically to have large elements on the diagonal (using weighted bipartite matching)
2. Scale rows and columns to equilibrate
3. Permute A symmetrically for sparsity
4. Factor A = LU with no pivoting, fixing up small pivots: if |a_ii| < ε · ||A|| then replace a_ii by ε^{1/2} · ||A||
5. Solve for x using the triangular factors: Ly = b, Ux = y
6. Improve solution by iterative refinement
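
Steps 4 and 5 above can be sketched on a small dense matrix. This is a minimal illustration, not SuperLU-dist itself: it assumes the permutations and scaling of steps 1 to 3 have already been applied, uses the infinity norm for ||A||, takes ε to be double-precision machine epsilon, and replaces a tiny pivot by a positive value. All of those choices, and the names `lu_static_pivot` and `lu_solve`, are assumptions made for the example.

```python
import math

def lu_static_pivot(A, eps=2.0 ** -52):
    """LU with no pivoting; tiny pivots are replaced by sqrt(eps) * ||A||.

    Returns (LU, fixed): LU holds L below the diagonal (unit diagonal implied)
    and U on and above it; fixed lists the pivot positions that were replaced.
    """
    n = len(A)
    LU = [row[:] for row in A]
    norm_a = max(sum(abs(v) for v in row) for row in A)  # infinity norm (assumed)
    fixed = []
    for k in range(n):
        if abs(LU[k][k]) < eps * norm_a:
            LU[k][k] = math.sqrt(eps) * norm_a  # positive sign is an assumption
            fixed.append(k)
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, fixed

def lu_solve(LU, b):
    """Solve Ly = b, then Ux = y, using the packed factors."""
    n = len(b)
    y = b[:]
    for i in range(n):
        for k in range(i):
            y[i] -= LU[i][k] * y[k]
    x = y[:]
    for i in reversed(range(n)):
        for k in range(i + 1, n):
            x[i] -= LU[i][k] * x[k]
        x[i] /= LU[i][i]
    return x
```

When `fixed` is non-empty the factors belong to a slightly perturbed matrix, which is why step 6 (iterative refinement) follows the solve.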

14 Row permutation for heavy diagonal [Duff, Koster]
Represent A as a weighted, undirected bipartite graph (one node for each row and one node for each column)
Find matching (set of independent edges) with maximum product of weights
Permute rows to place matching on diagonal
Matching algorithm also gives a row and column scaling to make all diag elts = 1 and all off-diag elts <= 1
[Figure: matrices A and PA with their bipartite graphs on rows and columns 1 to 5]
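
To make the objective concrete, the sketch below finds the row permutation that maximizes the product of the diagonal magnitudes by trying every permutation. This brute-force search is only for tiny examples and is not the Duff and Koster matching algorithm; it also leaves out the scaling. The function name `heavy_diagonal_row_perm` is hypothetical.

```python
import itertools
import math

def heavy_diagonal_row_perm(A):
    """Brute-force maximum-product matching on a small dense matrix.

    Returns (perm, PA) where perm[j] is the row of A placed in row j of PA,
    so that the entry A[perm[j]][j] lands on the diagonal.
    """
    n = len(A)
    best_weight, best_perm = -math.inf, None
    for perm in itertools.permutations(range(n)):
        # A zero on the diagonal means this permutation is not a matching.
        if any(A[perm[j]][j] == 0 for j in range(n)):
            continue
        # Maximizing the product equals maximizing the sum of logarithms.
        weight = sum(math.log(abs(A[perm[j]][j])) for j in range(n))
        if weight > best_weight:
            best_weight, best_perm = weight, perm
    if best_perm is None:
        raise ValueError("matrix is structurally singular: no perfect matching")
    PA = [A[best_perm[j]][:] for j in range(n)]
    return best_perm, PA
```

Working with the sum of logarithms is the usual way to turn a maximum-product problem into a weighted matching (assignment) problem that polynomial-time algorithms can solve.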

19 Iterative refinement to improve solution
Iterate:
  r = b - A*x
  backerr = max_i ( |r_i| / (|A|*|x| + |b|)_i )
  if backerr < ε or backerr > lasterr/2 then stop iterating
  solve L*U*dx = r
  x = x + dx
  lasterr = backerr
  repeat
Usually 0 – 3 steps are enough
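
The loop on this slide can be sketched as follows. The example is hedged in several ways: `solve` stands for any approximate solver (for instance, triangular solves with perturbed LU factors), the stopping test uses a tolerance ε together with the "error did not halve" check, `max_steps` is an added safety cap that is not on the slide, and each denominator (|A|*|x| + |b|)_i is assumed to be nonzero. The name `refine` is hypothetical.

```python
def refine(A, b, solve, eps=2.0 ** -52, max_steps=10):
    """Iterative refinement around an approximate solver.

    A is a dense list of lists, b a list, and solve(v) returns an approximate
    solution of A * z = v. Returns (x, steps), the refined solution and the
    number of correction steps taken.
    """
    n = len(b)
    x = solve(b)
    lasterr = float("inf")
    steps = 0
    while steps < max_steps:
        # Residual r = b - A*x.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Componentwise backward error (denominators assumed nonzero).
        backerr = max(
            abs(r[i]) / (sum(abs(A[i][j]) * abs(x[j]) for j in range(n)) + abs(b[i]))
            for i in range(n)
        )
        # Stop when converged or when the error stops halving.
        if backerr < eps or backerr > lasterr / 2:
            break
        dx = solve(r)
        x = [x[i] + dx[i] for i in range(n)]
        lasterr = backerr
        steps += 1
    return x, steps
```

The second stopping condition is what keeps the loop from running on when the approximate factors are too inaccurate for refinement to help.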

20 Convergence analysis of iterative refinement
Let C = I - A(LU)^{-1}   [ so A = (I - C)·(LU) ]
x_1 = (LU)^{-1} b
r_1 = b - A x_1 = (I - A(LU)^{-1}) b = C b
dx_1 = (LU)^{-1} r_1 = (LU)^{-1} C b
x_2 = x_1 + dx_1 = (LU)^{-1} (I + C) b
r_2 = b - A x_2 = (I - (I - C)·(I + C)) b = C^2 b
...
In general, r_k = b - A x_k = C^k b
Thus r_k -> 0 if |largest eigenvalue of C| < 1.
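
The identity r_k = C^k b can be checked numerically on a small example. In the sketch below, M is a hypothetical stand-in for (LU)^{-1}, that is, any approximate inverse of A; the helper names are likewise invented for the example.

```python
def matvec(M, v):
    """Matrix-vector product for dense lists of lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(X, Y):
    """Matrix-matrix product for dense square lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def refinement_residuals(A, M, b, steps):
    """Residuals r_1, ..., r_steps of refinement with approximate inverse M."""
    x = matvec(M, b)                                   # x_1 = M b
    out = []
    for _ in range(steps):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        out.append(r)
        dx = matvec(M, r)                              # dx_k = M r_k
        x = [xi + di for xi, di in zip(x, dx)]
    return out

def predicted_residuals(A, M, b, steps):
    """C b, C^2 b, ..., C^steps b with C = I - A M."""
    n = len(b)
    AM = matmul(A, M)
    C = [[(1.0 if i == j else 0.0) - AM[i][j] for j in range(n)] for i in range(n)]
    out, v = [], b
    for _ in range(steps):
        v = matvec(C, v)
        out.append(v)
    return out
```

For example, with A = [[4, 1], [2, 3]] and M the inverse of the diagonal of A, C has eigenvalues ±sqrt(1/6), so the residuals shrink even though M is a crude approximation.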

22 Directed graph
A is square, unsymmetric, nonzero diagonal
Edges from rows to columns
Symmetric permutations PAP^T
[Figure: matrix A and directed graph G(A) on vertices 1 to 7]

23 Undirected graph, ignoring edge directions
Overestimates the nonzero structure of A
Sparse GESP can use symmetric permutations (min degree, nested dissection) of this graph
[Figure: matrix A + A^T and undirected graph G(A + A^T) on vertices 1 to 7]

24 Symbolic factorization of undirected graph
Overestimates the nonzero structure of L+U
[Figure: chol(A + A^T) and filled graph G+(A + A^T) on vertices 1 to 7]

25 Symbolic factorization of directed graph
Add fill edge a -> b if there is a path from a to b through lower-numbered vertices.
Sparser than G+(A + A^T) in general.
But what’s a good ordering for G+(A)?
[Figure: matrix A, filled graph G+(A), and the structure of L+U on vertices 1 to 7]
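
The fill rule can be sketched as symbolic elimination on a set of directed edges. The example assumes vertices are numbered 0 to n-1 in elimination order and that an edge (i, j) stands for a nonzero a_ij (edges from rows to columns); the function name `symbolic_lu_fill` is hypothetical. Eliminating vertex k joins every remaining predecessor of k to every remaining successor of k, which is the same as the path rule applied one vertex at a time.

```python
def symbolic_lu_fill(edges, n):
    """Fill edges created by eliminating vertices 0..n-1 of a directed graph.

    edges is an iterable of (i, j) pairs, one per off-diagonal nonzero a_ij.
    Returns the set of fill edges (i, j) that are not in the original graph.
    """
    graph = {(i, j) for (i, j) in edges if i != j}
    fill = set()
    for k in range(n):
        # Only higher-numbered vertices remain once k is eliminated.
        preds = [i for i in range(k + 1, n) if (i, k) in graph]
        succs = [j for j in range(k + 1, n) if (k, j) in graph]
        for i in preds:
            for j in succs:
                if i != j and (i, j) not in graph:
                    graph.add((i, j))
                    fill.add((i, j))
    return fill

def symmetrize(edges):
    """Edges of G(A + A^T): every edge together with its reverse."""
    return {(i, j) for (i, j) in edges} | {(j, i) for (i, j) in edges}
```

Running the same routine on `symmetrize(edges)` gives the fill of the undirected model, which always contains the directed fill and is often strictly larger.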

26 Question: Preordering for GESP
Use directed graph model, less well understood than symmetric factorization
Symmetric: bottom-up, top-down, hybrids
Nonsymmetric: mostly bottom-up
Symmetric: best ordering is NP-complete, but approximation theory is based on graph partitioning (separators)
Nonsymmetric: no approximation theory is known; partitioning is not the whole story
Good approximations and efficient algorithms both remain to be discovered

