cs542g, term 1 2007: Sparse matrix data structures

Presentation transcript:

Slide 1: Sparse matrix data structure

• Typically either Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC); informally the "ia-ja" format
  - CSR is better for matrix-vector multiplies; CSC can be better for factorization
• CSR consists of three arrays (see the sketch below):
  - all values, row by row
  - the column index of each value, row by row
  - pointers to the start of each row
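A minimal CSR sketch in Python (illustrative only; the names `a`, `ja`, and `ia` follow the informal "ia-ja" naming and are not any particular library's API):

```python
# CSR representation of the 3x3 matrix
#   [ 4 0 1 ]
#   [ 0 2 0 ]
#   [ 3 0 5 ]
a  = [4.0, 1.0, 2.0, 3.0, 5.0]  # all values, row by row
ja = [0, 2, 1, 0, 2]            # column index of each value
ia = [0, 2, 3, 5]               # row i occupies positions ia[i]:ia[i+1]

def csr_matvec(n, ia, ja, a, x):
    """y = A x for an n-by-n CSR matrix: one sweep over the stored values."""
    y = [0.0] * n
    for i in range(n):
        for k in range(ia[i], ia[i + 1]):
            y[i] += a[k] * x[ja[k]]
    return y

print(csr_matvec(3, ia, ja, a, [1.0, 1.0, 1.0]))  # [5.0, 2.0, 8.0]
```

The row-by-row layout is what makes CSR convenient for matrix-vector products: each output entry is one contiguous sweep over a row's values and column indices.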

Slide 2: Direct Solvers

• We'll just peek at Cholesky factorization of SPD matrices: A = LL^T
  - In particular, pivoting is not required!
• Modern solvers break Cholesky into three phases:
  - Ordering: determine the order of rows/columns
  - Symbolic factorization: determine the sparsity structure of L in advance
  - Numerical factorization: compute the values in L
• This allows for much greater optimization…
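The payoff of the ordering phase is easy to see experimentally. A hedged sketch using SciPy's SuperLU wrapper (LU rather than Cholesky, but the same phase structure; the arrowhead matrix here is just a convenient worst case, not from the lecture):

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# "Arrowhead" matrix: dense first row/column plus a strong diagonal.
n = 200
A = sp.lil_matrix((n, n))
A[0, :] = 1.0
A[:, 0] = 1.0
A.setdiag(n + 1.0)
A = A.tocsc()

# permc_spec selects the ordering; symbolic and numerical factorization
# both happen inside splu().
for order in ("NATURAL", "COLAMD"):
    lu = spla.splu(A, permc_spec=order)
    print(order, "nnz(L)+nnz(U) =", lu.L.nnz + lu.U.nnz)
# NATURAL eliminates the dense node first and fills the factors almost
# completely; COLAMD defers it to the end and keeps them sparse.
```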

Slide 3: Graph model of elimination

• Take the graph whose adjacency matrix matches A
• Choosing node i to eliminate next in row reduction:
  - Subtract off multiples of row i from the rows of its neighbours
  - In graph terms: union the edge structure of i into that of each neighbour
  - Since A is symmetric, this connects all neighbours of i into a "clique" (simulated below)
• New edges are called "fill" (nonzeros in L that are zero in A)
• Choosing a different elimination sequence can result in different fill
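A small simulation of this model in Python (a sketch; `eliminate` is a hypothetical helper, not from the lecture): the graph is held as adjacency sets, and eliminating a node turns its remaining neighbours into a clique while counting the new edges.

```python
def eliminate(adj, order):
    """Count fill from symmetric elimination in the given order.

    adj: dict mapping node -> set of neighbours (undirected graph);
    node labels are assumed comparable (e.g. integers).
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    eliminated, fill = set(), 0
    for i in order:
        nbrs = adj[i] - eliminated
        for u in nbrs:                 # clique the remaining neighbours
            for w in nbrs:
                if u < w and w not in adj[u]:
                    adj[u].add(w)
                    adj[w].add(u)
                    fill += 1
        eliminated.add(i)
    return fill
```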

Slide 4: Extreme fill

• The star graph: one centre node connected to n-1 leaves
• If you order the centre last: zero fill, O(n) time and memory
• If you order the centre first: O(n^2) fill, O(n^3) time and O(n^2) memory
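Running the `eliminate` sketch from slide 3 on a star graph shows both extremes (a usage illustration):

```python
n = 100
centre, leaves = 0, list(range(1, n))
adj = {centre: set(leaves)}
adj.update({v: {centre} for v in leaves})

print(eliminate(adj, [centre] + leaves))  # centre first: 4851 fill edges
print(eliminate(adj, leaves + [centre]))  # centre last:  0 fill edges
```

Centre first turns the 99 leaves into a clique, adding 99·98/2 = 4851 edges; centre last never creates any.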

Slide 5: Fill-reducing orderings

• Finding the minimum-fill ordering is NP-hard
• Two main heuristics are in use:
  - Minimum Degree (greedy incremental): repeatedly eliminate a node of minimum degree (naive version sketched below)
    - Without many additional accelerations this is too slow, but modern implementations are very efficient: e.g. AMD
  - Nested Dissection (divide-and-conquer): partition the graph by a node separator, order the separator last, recurse on the components
    - Finding an optimal partition is also NP-hard, but very good and fast heuristics exist: e.g. Metis
    - Great for parallelism: e.g. ParMetis
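A naive minimum-degree ordering in the same graph model (a quadratic-time sketch for illustration; real codes like AMD rely on the accelerations mentioned above):

```python
def minimum_degree_order(adj):
    """Greedy minimum-degree ordering (naive, illustrative)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    order = []
    while remaining:
        # Pick a remaining node with the fewest remaining neighbours.
        i = min(remaining, key=lambda v: len(adj[v] & remaining))
        order.append(i)
        remaining.discard(i)
        nbrs = adj[i] & remaining
        for u in nbrs:                 # eliminate i: clique its neighbours
            adj[u] |= nbrs - {u}
    return order
```

On the star graph this immediately recovers the zero-fill order: every leaf has degree 1, so the centre is eliminated last.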

Slide 6: A peek at Minimum Degree

• See George & Liu, "The evolution of the minimum degree algorithm"
  - A little dated now, but most of the key concepts are explained there
• Biggest optimization: don't store the filled structure explicitly
  - Treat eliminated nodes as "quotient nodes"
  - An edge in L corresponds to a path in A through zero or more eliminated nodes
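A sketch of that last point (illustrative, not from the lecture): the filled-graph neighbours of a node can be recovered on demand by searching the original graph through eliminated nodes, so no fill edges ever need to be stored.

```python
def reach(adj, v, eliminated):
    """Neighbours of v in the filled graph: nodes reachable from v in the
    original graph along paths whose interior nodes are all eliminated."""
    seen, stack, result = {v}, [v], set()
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in seen:
                continue
            seen.add(w)
            if w in eliminated:
                stack.append(w)   # pass through a quotient (eliminated) node
            else:
                result.add(w)     # stop at the first uneliminated node
    return result
```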

Slide 7: A peek at Nested Dissection

• The core operation is graph partitioning
• Simplest strategy: breadth-first search (sketched below)
• The partition can be locally improved with Kernighan-Lin
• This can be made fast by going multilevel: coarsen the graph, partition it, then refine on the way back up
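A sketch of the BFS strategy (illustrative; assumes a connected graph): grow level sets from a source node and cut at the level containing the median vertex, which then serves as a crude separator.

```python
from collections import deque

def bfs_partition(adj, source):
    """Split a connected graph roughly in half by BFS level sets.
    Returns (half_a, separator, half_b) as sets of nodes."""
    level = {source: 0}
    queue = deque([source])
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in level:
                level[w] = level[u] + 1
                queue.append(w)
    cut = level[order[len(order) // 2]]   # level of the median vertex
    half_a = {v for v in order if level[v] < cut}
    sep    = {v for v in order if level[v] == cut}
    half_b = {v for v in order if level[v] > cut}
    return half_a, sep, half_b
```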

Slide 8: Theoretical Limits

• In 2D (planar or near-planar graphs), Nested Dissection is within a constant factor of optimal:
  - O(n log n) fill (n = number of nodes; think n = s^2 for an s×s grid)
  - O(n^(3/2)) time for factorization
  - Result due to Lipton & Tarjan…
• In 3D the asymptotics for well-shaped meshes are worse:
  - O(n^(5/3)) fill (n = number of nodes; think n = s^3 for an s×s×s grid)
  - O(n^2) time for factorization
• Direct solvers are very competitive in 2D, but don't scale nearly as well in 3D

Slide 9: Symbolic Factorization

• Given the ordering, determining the structure of L is also just a graph problem
• Various optimizations allow determination of the row or column counts of L in nearly O(nnz(A)) time
  - Much faster than the actual factorization!
• One of the most important observations: good orderings usually result in supernodes, i.e. columns of L with identical structure
• These columns can be treated as a single block column
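One workhorse of the symbolic phase is the elimination tree. A sketch of Liu's algorithm with path compression (illustrative; assumes a symmetric pattern over nodes 0..n-1 already numbered in elimination order):

```python
def elimination_tree(adj, n):
    """Parent array of the elimination tree of a symmetric pattern."""
    parent   = [-1] * n
    ancestor = [-1] * n               # path-compressed ancestor links
    for i in range(n):
        for k in adj[i]:
            if k >= i:
                continue
            # Climb from k to its current root, pointing the path at i.
            while ancestor[k] not in (-1, i):
                nxt = ancestor[k]
                ancestor[k] = i       # path compression
                k = nxt
            if ancestor[k] == -1:
                ancestor[k] = i
                parent[k] = i
    return parent
```

The tree underlies the fast row/column-count algorithms mentioned above, since column structures of L nest along it.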

Slide 10: Numerical Factorization

• L can be computed column by column with a left-looking factorization
• In particular, compute a supernode (block column) at a time
  - Most of the numerics can then use level-3 BLAS
  - This gives a huge performance boost, near "optimal"
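A dense left-looking Cholesky in NumPy, showing the column-by-column access pattern (illustration only; a sparse code would restrict the "look left" update to the nonzero pattern computed by the symbolic phase, and a supernodal code would process block columns with BLAS-3 calls):

```python
import numpy as np

def left_looking_cholesky(A):
    """Dense left-looking Cholesky: A = L @ L.T, computed column by column."""
    n = len(A)
    L = np.zeros((n, n))
    for j in range(n):
        # "Look left": subtract the contributions of all previous columns.
        col = A[j:, j] - L[j:, :j] @ L[j, :j]
        L[j, j] = np.sqrt(col[0])
        L[j + 1:, j] = col[1:] / L[j, j]
    return L

# Quick check on a random SPD matrix:
M = np.random.rand(5, 5)
A = M @ M.T + 5 * np.eye(5)
L = left_looking_cholesky(A)
print(np.allclose(L @ L.T, A))  # True
```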

Slide 11: Software

• See Tim Davis's list at
• Ordering: AMD and Metis are becoming standard
• Cholesky: PARDISO, CHOLMOD, …
• General: PARDISO, UMFPACK, …