CS 290H Lecture 15: GESP concluded

CS 290H Lecture 15: GESP concluded. Final presentations for survey projects are next Tue and Thu: a 20-minute talk with at least 5 minutes for questions and discussion. Email me with your preferred day (first come, first served). Course evaluations at the end of class today.

SuperLU-dist: GE with static pivoting [Li, Demmel]. Target: distributed-memory multiprocessors. Goal: no pivoting during numeric factorization.

SuperLU-dist: Distributed static data structure. The factors L and U are stored in a block cyclic matrix layout across a 2-D process(or) mesh. [Figure: block cyclic layout of L and U on the process mesh.]
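A minimal sketch of the 2-D block cyclic map (the function name and indexing here are illustrative, not SuperLU-dist's API): block (I, J) of the matrix lives on process (I mod Pr, J mod Pc) of a Pr-by-Pc process mesh. The cyclic wrap is what keeps every process busy as elimination shrinks the trailing submatrix.

    # Hypothetical helper illustrating a 2-D block cyclic layout;
    # this is not SuperLU-dist's actual data structure.
    def owner(I, J, Pr, Pc):
        """Mesh coordinates of the process owning matrix block (I, J)."""
        return (I % Pr, J % Pc)

    # Example: a 6x6 grid of blocks on a 2x3 process mesh; each process
    # owns a scattered, load-balanced set of blocks.
    Pr, Pc = 2, 3
    for I in range(6):
        print([owner(I, J, Pr, Pc) for J in range(6)])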

GESP: Gaussian elimination with static pivoting. PA = LU for sparse, nonsymmetric A. P is chosen numerically in advance, not by partial pivoting! After choosing P, we can permute PA symmetrically for sparsity: Q(PA)Q^T = LU. [Figure: PA = LU schematic.]

SuperLU-dist: GE with static pivoting [Li, Demmel]. Target: distributed-memory multiprocessors. Goal: no pivoting during numeric factorization.
1. Permute A unsymmetrically to have large elements on the diagonal (using weighted bipartite matching)
2. Scale rows and columns to equilibrate
3. Permute A symmetrically for sparsity
4. Factor A = LU with no pivoting, fixing up small pivots: if |a_ii| < ε·||A||, replace a_ii by ±ε^(1/2)·||A||
5. Solve for x using the triangular factors: Ly = b, Ux = y
6. Improve the solution by iterative refinement
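As a rough serial illustration of steps 3-6 (not the distributed SuperLU-dist code), SciPy's SuperLU wrapper can be asked to prefer diagonal pivots. Steps 1-2 are omitted here (step 1 is sketched under the Duff-Koster slide below), and the small-pivot fix-up of step 4 is not exposed by this interface.

    # A serial sketch of the GESP pipeline via SciPy's SuperLU interface.
    # Assumes steps 1-2 (matching and equilibration) were already applied to A.
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def gesp_solve(A, b):
        lu = spla.splu(sp.csc_matrix(A),
                       permc_spec='COLAMD',     # step 3: fill-reducing column ordering
                       diag_pivot_thresh=0.0)   # step 4: prefer the diagonal pivot
        x = lu.solve(b)                         # step 5: Ly = b, Ux = y
        r = b - A @ x                           # step 6: one refinement step
        return x + lu.solve(r)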

Row permutation for heavy diagonal [Duff, Koster]. Represent A as a weighted, undirected bipartite graph (one node for each row and one node for each column). Find a matching (a set of independent edges) with maximum product of weights. Permute the rows to place the matching on the diagonal. The matching algorithm also gives a row and column scaling that makes all diagonal entries = 1 and all off-diagonal entries <= 1 in magnitude. [Figure: A and PA.]
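A small dense sketch of the idea (this is not Duff and Koster's MC64 code): maximizing the product of matched |a_ij| is the same as minimizing the sum of -log|a_ij|, which is an assignment problem.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def heavy_diagonal_perm(A):
        """Row permutation placing a max-product matching on the diagonal."""
        cost = -np.log(np.abs(A) + np.finfo(float).tiny)  # tiny avoids log(0)
        rows, cols = linear_sum_assignment(cost)          # min-cost perfect matching
        p = np.empty(A.shape[0], dtype=int)
        p[cols] = rows                                    # row p[j] of A becomes row j of PA
        return p

    A = np.array([[0.1, 3.0, 0.0],
                  [2.0, 0.2, 0.5],
                  [0.0, 1.0, 4.0]])
    p = heavy_diagonal_perm(A)
    print(A[p, :])   # PA has the heavy matching (2.0, 3.0, 4.0) on its diagonal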

Iterative refinement to improve the solution. Iterate:
  r = b - A·x
  backerr = max_i ( |r_i| / (|A|·|x| + |b|)_i )
  if backerr < ε or backerr > lasterr/2 then stop iterating
  solve L·U·dx = r
  x = x + dx
  lasterr = backerr
repeat. Usually 0-3 steps are enough.
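The same loop as runnable Python, assuming a factorization handle `lu` with a `solve` method (for instance the object returned by scipy's splu); the componentwise backward error and the stopping test follow the pseudocode above.

    import numpy as np

    def refine(A, b, x, lu, eps=np.finfo(float).eps, maxit=10):
        """Iterative refinement; stops when backerr is tiny or stalls."""
        lasterr = np.inf
        for _ in range(maxit):
            r = b - A @ x
            denom = np.abs(A) @ np.abs(x) + np.abs(b)     # componentwise scale
            denom[denom == 0] = 1.0                       # guard all-zero rows
            backerr = np.max(np.abs(r) / denom)
            if backerr < eps or backerr > lasterr / 2:    # converged or stalled
                break
            x = x + lu.solve(r)                           # dx = (LU)^(-1) r
            lasterr = backerr
        return x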

Convergence analysis of iterative refinement. Let C = I - A·(LU)^(-1), so that A = (I - C)·(LU).
  x_1 = (LU)^(-1)·b
  r_1 = b - A·x_1 = (I - A·(LU)^(-1))·b = C·b
  dx_1 = (LU)^(-1)·r_1 = (LU)^(-1)·C·b
  x_2 = x_1 + dx_1 = (LU)^(-1)·(I + C)·b
  r_2 = b - A·x_2 = (I - (I - C)·(I + C))·b = C^2·b
  ...
In general, r_k = b - A·x_k = C^k·b. Thus r_k -> 0 if the largest eigenvalue of C is less than 1 in magnitude.
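A quick numeric check of r_k = C^k·b on a made-up example: perturb an exact factorization into an inexact one and compare the refinement residuals against C^k·b.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
    b = rng.standard_normal(n)
    LU = A + 0.01 * rng.standard_normal((n, n))       # stand-in for an inexact A = LU
    C = np.eye(n) - A @ np.linalg.inv(LU)
    print("spectral radius of C:", max(abs(np.linalg.eigvals(C))))

    x = np.linalg.solve(LU, b)                        # x_1 = (LU)^(-1) b
    for k in range(1, 4):
        r = b - A @ x
        print(k, np.linalg.norm(r - np.linalg.matrix_power(C, k) @ b))  # ~ 0
        x = x + np.linalg.solve(LU, r)                # next refinement step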

Directed graph G(A). A is square and unsymmetric with nonzero diagonal. One vertex per row/column; an edge from i to j for each nonzero a_ij (edges from rows to columns). Symmetric permutations PAP^T just renumber the vertices. [Figure: A and its directed graph G(A).]
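A one-function sketch of that edge set (the function name is illustrative); it also feeds the directed fill sketch a few slides below.

    # Directed graph of a sparse matrix: edge i -> j per off-diagonal nonzero.
    import scipy.sparse as sp

    def digraph_edges(A):
        coo = sp.coo_matrix(A)
        return sorted((int(i), int(j))
                      for i, j in zip(coo.row, coo.col) if i != j)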

Undirected graph G(A+A^T): ignore the edge directions. This overestimates the nonzero structure of A. Sparse GESP can use symmetric permutations (minimum degree, nested dissection) of this graph. [Figure: A+A^T and G(A+A^T).]
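SciPy ships neither minimum degree nor nested dissection, so as a stand-in here is a symmetric ordering computed on the same symmetrized structure using reverse Cuthill-McKee; the point is only that the ordering comes from G(A+A^T) and is applied as PAP^T.

    import scipy.sparse as sp
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    def symmetric_order(A):
        """Permute A symmetrically by an ordering of G(A + A^T)."""
        A = sp.csr_matrix(A)
        pattern = abs(A) + abs(A.T)                   # structure of A + A^T
        p = reverse_cuthill_mckee(pattern, symmetric_mode=True)
        return A[p, :][:, p]                          # P A P^T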

Symbolic factorization of the undirected graph: chol(A+A^T) and the filled graph G+(A+A^T). This overestimates the nonzero structure of L+U. [Figure: chol(A+A^T) and G+(A+A^T).]
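The filled graph can be computed by the elimination game; a short sketch, with adjacency as a dict of neighbor sets and vertices eliminated in numeric order:

    # Elimination game on an undirected graph: eliminating v makes a clique
    # of its higher-numbered neighbors; the edges added are the fill.
    def filled_graph(adj, n):
        g = {v: set(adj.get(v, ())) for v in range(n)}
        for v in range(n):
            higher = sorted(u for u in g[v] if u > v)
            for i, a in enumerate(higher):
                for b in higher[i + 1:]:
                    g[a].add(b)
                    g[b].add(a)
        return g

    # Example: the 4-cycle 0-1-2-3-0; eliminating vertex 0 adds fill edge {1, 3}.
    print(filled_graph({0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}, 4))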

Symbolic factorization of the directed graph: add fill edge a -> b if there is a path from a to b through lower-numbered vertices. G+(A) is sparser than G+(A+A^T) in general. But what's a good ordering for G+(A)? [Figure: A, G+(A), and L+U.]
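A sketch of that fill rule in its elimination form: eliminating v routes every higher-numbered in-neighbor of v to every higher-numbered out-neighbor. The `digraph_edges` sketch from the directed-graph slide supplies the input.

    # Directed elimination game: computes the edges of G+(A), i.e. the
    # structure of L + U, from the edges of G(A) (nonzero diagonal assumed).
    def directed_fill(edges, n):
        succ = {v: set() for v in range(n)}
        pred = {v: set() for v in range(n)}
        for a, b in edges:
            succ[a].add(b)
            pred[b].add(a)
        for v in range(n):                        # eliminate vertices in order
            for a in [u for u in pred[v] if u > v]:
                for b in [w for w in succ[v] if w > v]:
                    if a != b and b not in succ[a]:
                        succ[a].add(b)            # fill edge a -> b
                        pred[b].add(a)
        return {(a, b) for a in succ for b in succ[a]}

    # Usage: directed_fill(digraph_edges(A), A.shape[0])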

Question: preordering for GESP. This uses the directed graph model, which is less well understood than symmetric factorization.
Ordering strategies: symmetric orderings come in bottom-up, top-down, and hybrid flavors; nonsymmetric orderings are mostly bottom-up.
Theory: in the symmetric case the best ordering is NP-complete to find, but an approximation theory exists based on graph partitioning (separators); in the nonsymmetric case no approximation theory is known, and partitioning is not the whole story.
Good approximations and efficient algorithms both remain to be discovered.

Remarks on nonsymmetric GE:
Multifrontal tends to be faster but uses more memory.
Unsymmetric-pattern multifrontal is much more complicated, with no simple elimination tree. Sequential and SMP versions exist in UMFPACK and WSMP (see web links); distributed-memory unsymmetric-pattern multifrontal is a research topic.
The combinatorial preliminaries are important: ordering, etree, symbolic factorization, matching, scheduling. They are not well understood in many ways, and mostly not done in parallel.
Not mentioned: symmetric indefinite problems.
Direct-methods technology is also used in preconditioners for iterative methods.