Model Problem: Solving Poisson’s equation for temperature


Model Problem: Solving Poisson’s equation for temperature
  k = n^{1/3}
  For each i from 1 to n, except on the boundaries:
    – x(i-k^2) – x(i-k) – x(i-1) + 6*x(i) – x(i+1) – x(i+k) – x(i+k^2) = 0
  n equations in n unknowns: A*x = b
  Each row of A has at most 7 nonzeros
  In two dimensions, k = n^{1/2} and each row has at most 5 nonzeros
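
A concrete way to see this matrix is to assemble it as a Kronecker sum of 1-D second-difference matrices. A minimal Python/SciPy sketch (not part of the slides; the helper name model_problem_3d and the grid size are illustrative):

```python
import scipy.sparse as sp

def model_problem_3d(k):
    """Assemble the n x n (n = k^3) matrix of the 3D model problem:
    the 7-point Poisson stencil on a k x k x k grid, Dirichlet boundaries."""
    I = sp.identity(k, format="csr")
    # 1-D second-difference matrix: 2 on the diagonal, -1 off the diagonal
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(k, k), format="csr")
    # Kronecker-sum construction: 6 on the diagonal, six -1's in interior rows
    return (sp.kron(sp.kron(I, I), T)
            + sp.kron(sp.kron(I, T), I)
            + sp.kron(sp.kron(T, I), I))

A = model_problem_3d(10)     # n = 1000 unknowns
print(A.shape, A.nnz)        # at most 7 nonzeros per row
```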

The Landscape of Ax=b Solvers
  [Figure: a 2 x 2 chart of solvers. Columns: Direct (A = LU) vs. Iterative (y’ = Ay). Rows: Nonsymmetric vs. Symmetric positive definite. Axis labels: “More Robust”, “More General” toward direct / nonsymmetric; “More Robust”, “Less Storage (if sparse)” toward iterative / symmetric positive definite.]

Complexity of linear solvers
  Time to solve the model problem (Poisson’s equation) on a regular mesh (n^{1/2} points on a side in 2D):

                             2D                           3D
    Sparse Cholesky:         O(n^{1.5})                   O(n^{2})
    CG, exact arithmetic:    O(n^{2})                     O(n^{2})
    CG, no precond:          O(n^{1.5})                   O(n^{1.33})
    CG, modified IC:         O(n^{1.25})                  O(n^{1.17})
    CG, support trees:       O(n^{1.20}) -> O(n^{1+})     O(n^{1.75}) -> O(n^{1+})
    Multigrid:               O(n)                         O(n)

CS 240A: Solving Ax = b in parallel
  Dense A: Gaussian elimination with partial pivoting (LU)
    See Jim Demmel’s slides
    Same flavor as matrix * matrix, but more complicated
  Sparse A: Iterative methods – Conjugate gradient, etc.
    Sparse matrix times dense vector
  Sparse A: Gaussian elimination – Cholesky, LU, etc.
    Graph algorithms
  Sparse A: Preconditioned iterative methods and multigrid
    Mixture of lots of things

Conjugate gradient iteration for Ax = b
  x_0 = 0            approx solution
  r_0 = b            residual = b - Ax
  d_0 = r_0          search direction
  for k = 1, 2, 3, . . .
    x_k = x_{k-1} + …     new approx solution
    r_k = …               new residual
    d_k = …               new search direction

Conjugate gradient iteration for Ax = b
  x_0 = 0            approx solution
  r_0 = b            residual = b - Ax
  d_0 = r_0          search direction
  for k = 1, 2, 3, . . .
    α_k = …                        step length
    x_k = x_{k-1} + α_k d_{k-1}    new approx solution
    r_k = …                        new residual
    d_k = …                        new search direction

Conjugate gradient iteration for Ax = b
  x_0 = 0            approx solution
  r_0 = b            residual = b - Ax
  d_0 = r_0          search direction
  for k = 1, 2, 3, . . .
    α_k = (r_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    step length
    x_k = x_{k-1} + α_k d_{k-1}                           new approx solution
    r_k = …                                               new residual
    d_k = …                                               new search direction

Conjugate gradient iteration for Ax = b
  x_0 = 0            approx solution
  r_0 = b            residual = b - Ax
  d_0 = r_0          search direction
  for k = 1, 2, 3, . . .
    α_k = (r_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    step length
    x_k = x_{k-1} + α_k d_{k-1}                           new approx solution
    r_k = …                                               new residual
    β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})
    d_k = r_k + β_k d_{k-1}                               new search direction

Conjugate gradient iteration for Ax = b
  x_0 = 0            approx solution
  r_0 = b            residual = b - Ax
  d_0 = r_0          search direction
  for k = 1, 2, 3, . . .
    α_k = (r_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    step length
    x_k = x_{k-1} + α_k d_{k-1}                           new approx solution
    r_k = r_{k-1} – α_k A d_{k-1}                         new residual
    β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})
    d_k = r_k + β_k d_{k-1}                               new search direction

Conjugate gradient iteration
  x_0 = 0,  r_0 = b,  d_0 = r_0
  for k = 1, 2, 3, . . .
    α_k = (r_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    step length
    x_k = x_{k-1} + α_k d_{k-1}                           approx solution
    r_k = r_{k-1} – α_k A d_{k-1}                         residual
    β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})               improvement
    d_k = r_k + β_k d_{k-1}                               search direction
  One matrix-vector multiplication per iteration
  Two vector dot products per iteration
  Four n-vectors of working storage
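
In NumPy this iteration transcribes almost line for line. A minimal sketch (the function name, fixed step count, and absence of a convergence test are simplifications, not from the slides):

```python
import numpy as np

def conjugate_gradient(A, b, iters=50):
    """CG as on the slide: a fixed number of steps, no stopping test.
    A must be symmetric positive definite (anything supporting A @ d works)."""
    x = np.zeros_like(b)           # x_0 = 0
    r = b.copy()                   # r_0 = b - A x_0 = b
    d = r.copy()                   # d_0 = r_0
    rr = r @ r
    for _ in range(iters):
        Ad = A @ d                 # the one matrix-vector product per iteration
        alpha = rr / (d @ Ad)      # step length
        x = x + alpha * d          # new approximate solution
        r = r - alpha * Ad         # new residual
        rr_new = r @ r             # second dot product of the iteration
        beta = rr_new / rr         # improvement factor
        d = r + beta * d           # new search direction
        rr = rr_new
    return x
```

The working vectors are exactly x, r, d, and Ad: the four n-vectors of storage noted above.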

Conjugate gradient: Krylov subspaces
  Eigenvalues: Av = λv,  {λ_1, λ_2, . . ., λ_n}
  Cayley-Hamilton theorem:
    (A – λ_1 I)·(A – λ_2 I) · · · (A – λ_n I) = 0
  Therefore Σ_{0 ≤ i ≤ n} c_i A^i = 0 for some c_i,
  so A^{-1} = Σ_{1 ≤ i ≤ n} (–c_i / c_0) A^{i-1}
  Krylov subspace:
    Therefore if Ax = b, then x = A^{-1} b and
    x ∈ span (b, Ab, A^2 b, . . ., A^{n-1} b) = K_n (A, b)
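
A small NumPy check of that last claim (sizes and names are illustrative): build the Krylov basis explicitly for a random SPD matrix and verify that the exact solution lies in its span.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # a small symmetric positive definite matrix
b = rng.standard_normal(n)

# Krylov basis K_n(A, b) = [b, Ab, A^2 b, ..., A^(n-1) b], one vector per column
K = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(n)])

x = np.linalg.solve(A, b)            # the exact solution A^{-1} b
c, *_ = np.linalg.lstsq(K, x, rcond=None)
print(np.allclose(K @ c, x))         # True: x lies in span(K_n(A, b))
```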

Conjugate gradient: Orthogonal sequences
  Krylov subspace:  K_i (A, b) = span (b, Ab, A^2 b, . . ., A^{i-1} b)
  Conjugate gradient algorithm:
    for i = 1, 2, 3, . . .
      find x_i ∈ K_i (A, b) such that r_i = (b – A x_i) ⊥ K_i (A, b)
  Notice r_i ∈ K_{i+1} (A, b), so r_i ⊥ r_j for all j < i
  Similarly, the “directions” are A-orthogonal:  (x_i – x_{i-1})^T · A · (x_j – x_{j-1}) = 0
  The magic: Short recurrences . . .
    A is symmetric => can get next residual and direction from the previous one, without saving them all.

Conjugate gradient: Convergence
  In exact arithmetic, CG converges in n steps (completely unrealistic!!)
  Accuracy after k steps of CG is related to:
    consider polynomials of degree k that are equal to 1 at 0.
    how small can such a polynomial be at all the eigenvalues of A?
  Thus, eigenvalues close together are good.
  Condition number:  κ(A) = ||A||_2 ||A^{-1}||_2 = λ_max(A) / λ_min(A)
  Residual is reduced by a constant factor by O(κ^{1/2}(A)) iterations of CG.
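
To see the κ^{1/2} dependence numerically, one can run the CG loop from the earlier sketch to a fixed relative residual on SPD matrices that differ only in condition number (the sizes, tolerance, and helper name below are illustrative):

```python
import numpy as np

def cg_iterations(A, b, rtol=1e-8):
    """Count CG steps (same loop as the earlier sketch) until ||r|| <= rtol * ||b||."""
    x, r, d = np.zeros_like(b), b.copy(), b.copy()
    rr, nb, k = r @ r, np.linalg.norm(b), 0
    while np.sqrt(rr) > rtol * nb:
        Ad = A @ d
        alpha = rr / (d @ Ad)
        x, r = x + alpha * d, r - alpha * Ad
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
        k += 1
    return k

n = 200
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((n, n)))
b = np.ones(n)
for kappa in (10.0, 1000.0):                              # condition numbers to compare
    A = Q @ np.diag(np.linspace(1.0, kappa, n)) @ Q.T     # eigenvalues spread over [1, kappa]
    print(kappa, cg_iterations(A, b))                     # count grows roughly like sqrt(kappa)
```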

Other Krylov subspace methods
  Nonsymmetric linear systems:
    GMRES:
      for i = 1, 2, 3, . . .  find x_i ∈ K_i (A, b) such that r_i = (A x_i – b) ⊥ K_i (A, b)
      But, no short recurrence => save old vectors => lots more space
      (Usually “restarted” every k iterations to use less space.)
    BiCGStab, QMR, etc.:
      Two spaces K_i (A, b) and K_i (A^T, b) w/ mutually orthogonal bases
      Short recurrences => O(n) space, but less robust
    Convergence and preconditioning more delicate than CG
    Active area of current research
  Eigenvalues: Lanczos (symmetric), Arnoldi (nonsymmetric)
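
SciPy exposes several of these methods directly; a minimal sketch for a nonsymmetric system (the tridiagonal test matrix and the restart length are illustrative choices, not from the slides):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres, bicgstab

# A small nonsymmetric (convection-diffusion-like) tridiagonal system.
n = 100
A = sp.diags([-1.3, 2.0, -0.7], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x_g, info_g = gmres(A, b, restart=20)   # restarted GMRES: restart bounds the stored basis
x_b, info_b = bicgstab(A, b)            # short-recurrence alternative
print(info_g, info_b)                   # 0 means the solver reports convergence
print(np.linalg.norm(A @ x_g - b), np.linalg.norm(A @ x_b - b))
```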

to complexity slides…

Conjugate gradient iteration
  x_0 = 0,  r_0 = b,  d_0 = r_0
  for k = 1, 2, 3, . . .
    α_k = (r_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    step length
    x_k = x_{k-1} + α_k d_{k-1}                           approx solution
    r_k = r_{k-1} – α_k A d_{k-1}                         residual
    β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})               improvement
    d_k = r_k + β_k d_{k-1}                               search direction
  One matrix-vector multiplication per iteration
  Two vector dot products per iteration
  Four n-vectors of working storage

Conjugate gradient primitives
  DAXPY:  v = v + a*w   (vectors v, w; scalar a)
    Almost embarrassingly parallel
  DDOT:  a = v^T * w   (vectors v, w; scalar a)
    Global sum reduction; span = log n
  Matvec:  v = A*w   (matrix A, vectors v, w)
    The hard part
    But all you need is a subroutine to compute v from w
    Sometimes (e.g. the model problem) you don’t even need to store A (see the sketch after the next slide)

Model Problem: Solving Poisson’s equation
  k = n^{1/3}
  For each i from 1 to n, except on the boundaries:
    – x(i-k^2) – x(i-k) – x(i-1) + 6*x(i) – x(i+1) – x(i+k) – x(i+k^2) = 0
  n equations in n unknowns: A*x = b
  Each row of A has at most 7 nonzeros
  In two dimensions, k = n^{1/2} and each row has at most 5 nonzeros
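
As noted on the conjugate gradient primitives slide, this matvec needs no stored matrix at all. A minimal NumPy sketch for the 2D case (5-point stencil, Dirichlet boundaries; the function name and grid size are illustrative):

```python
import numpy as np

def matvec_2d(w, k):
    """Apply the 2D model-problem matrix (5-point stencil, Dirichlet boundaries)
    to a vector w of length n = k*k without ever forming A."""
    W = w.reshape(k, k)
    V = 4.0 * W                   # diagonal entry (4 in 2D, 6 in 3D)
    V[1:, :]  -= W[:-1, :]        # neighbor above
    V[:-1, :] -= W[1:, :]         # neighbor below
    V[:, 1:]  -= W[:, :-1]        # neighbor to the left
    V[:, :-1] -= W[:, 1:]         # neighbor to the right
    return V.reshape(-1)

k = 4
w = np.arange(float(k * k))
print(matvec_2d(w, k))            # matches A @ w for an explicitly assembled 2D matrix
```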

Stencil computations
  Data lives at the vertices of a regular mesh
  Each step, new values are computed from neighbors
  Examples:
    Game of Life (9-point stencil)
    Matvec in 2D model problem (5-point stencil)
    Matvec in 3D model problem (7-point stencil)

Parallelism in Stencil Computations
  Parallelism is straightforward
    The mesh is a regular data structure
    Even decomposition across processors gives load balance
  Locality is achieved by using large patches of the mesh
    Boundary values from neighboring patches are needed

Where’s the data? 1-D vs 2-D layout: Look at total communication volume
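
A back-of-the-envelope comparison (illustrative numbers; boundary patches and per-message overheads ignored): per processor, a 1-D strip of an N x N mesh exchanges two full rows per stencil sweep, while a square patch in a 2-D layout exchanges four edges of length N/√p.

```python
import math

N, p = 1024, 64                              # mesh side and processor count (illustrative)
words_1d = 2 * N                             # strip layout: top and bottom rows
words_2d = 4 * (N // int(math.isqrt(p)))     # block layout: four edges of length N/sqrt(p)
print(words_1d, words_2d)                    # 2048 vs 512 words per processor per sweep
```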

Complexity measures for parallel computation
  Problem parameters:
    n    index of problem size
    p    number of processors
  Algorithm parameters:
    t_p  running time on p processors
    t_1  time on 1 processor = sequential time = “work”
    t_∞  time on unlimited procs = critical path length = “span”
    v    total communication volume
  Performance measures:
    speedup                  s = t_1 / t_p
    efficiency               e = t_1 / (p*t_p) = s / p
    (potential) parallelism  pp = t_1 / t_∞
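
A quick worked example of these measures (all numbers made up for illustration):

```python
t1, t_inf, p, tp = 1000.0, 10.0, 16, 70.0    # work, span, processors, measured time
speedup     = t1 / tp                        # about 14.3
efficiency  = t1 / (p * tp)                  # about 0.89, i.e. speedup / p
parallelism = t1 / t_inf                     # 100: an upper bound on useful speedup
print(speedup, efficiency, parallelism)
```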

Ghost Nodes in Stencil Computations
  Size of ghost region (and redundant computation) depends on network/memory speed vs. computation
  Can be used on unstructured meshes
  [Figure: to compute green, copy yellow, compute blue.]
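
The “copy yellow, compute blue” step, in a serial NumPy sketch with a one-cell ghost border (the patch size and the Jacobi-style averaging are illustrative; in a parallel code the border would be filled from neighboring processes):

```python
import numpy as np

k = 10
local = np.random.default_rng(2).random((k, k))   # interior cells owned by this process
padded = np.zeros((k + 2, k + 2))                 # interior plus a one-cell ghost border
padded[1:-1, 1:-1] = local
# Here the ghost rows/columns padded[0, :], padded[-1, :], padded[:, 0], padded[:, -1]
# would be copied in from the neighboring patches ("copy yellow").
local = 0.25 * (padded[:-2, 1:-1] + padded[2:, 1:-1] +       # "compute blue":
                padded[1:-1, :-2] + padded[1:-1, 2:])        # average of the 4 neighbors
print(local.shape)                                           # still (k, k)
```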

Detailed complexity measures for data movement I
  Moving data between processors by message-passing
  Machine parameters:
    α or t_startup  latency (message startup time in seconds)
    β or t_data     inverse bandwidth (in seconds per word)
  Time to send & recv or bcast a message of w words:  α + w*β
  t_comm  total communication time
  t_comp  total computation time
  Total parallel time:  t_p = t_comp + t_comm
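
In code form this model is just a one-liner; the machine parameters below are hypothetical placeholders (real values come from benchmarking the network):

```python
def message_time(w, alpha=1.0e-6, beta=2.0e-9):
    """Time to send & recv a w-word message: latency plus w words at inverse bandwidth."""
    return alpha + w * beta

t_comm = 4 * message_time(512)       # e.g. four 512-word ghost exchanges per sweep
t_comp = 5 * 512 * 512 * 1.0e-9      # e.g. 5 flops per mesh point at 1 ns per flop
t_p = t_comp + t_comm                # total parallel time per sweep
print(t_comm, t_comp, t_p)
```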