Parallelizing stencil computations Based on slides from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267.

Slides:



Advertisements
Similar presentations
Efficient Parallelization for AMR MHD Multiphysics Calculations Implementation in AstroBEAR Collaborators: Adam Frank Brandon Shroyer Chen Ding Shule Li.
Advertisements

Exploring Communication Options with Adaptive Mesh Refinement Courtenay T. Vaughan, and Richard F. Barrett Sandia National Laboratories SIAM Computational.
CS 584. Review n Systems of equations and finite element methods are related.
ECE669 L4: Parallel Applications February 10, 2004 ECE 669 Parallel Computer Architecture Lecture 4 Parallel Applications.
Languages and Compilers for High Performance Computing Kathy Yelick EECS Department U.C. Berkeley.
1 Synthesis of Distributed ArraysAmir Kamil Synthesis of Distributed Arrays in Titanium Amir Kamil U.C. Berkeley May 9, 2006.
CSE351/ IT351 Modeling And Simulation Choosing a Mesh Model Dr. Jim Holten.
CS267 L12 Sources of Parallelism(3).1 Demmel Sp 1999 CS 267 Applications of Parallel Computers Lecture 12: Sources of Parallelism and Locality (Part 3)
CS267 L24 Solving PDEs.1 Demmel Sp 1999 CS 267 Applications of Parallel Computers Lecture 24: Solving Linear Systems arising from PDEs - I James Demmel.
Filling Arbitrary Holes in Finite Element Models 17 th International Meshing Roundtable 2008 Schilling, Bidmon, Sommer, and Ertl.
02/09/2005CS267 Lecture 71 CS 267 Sources of Parallelism and Locality in Simulation – Part 2 James Demmel
ECE669 L5: Grid Computations February 12, 2004 ECE 669 Parallel Computer Architecture Lecture 5 Grid Computations.
CS267 L12 Sources of Parallelism(3).1 Demmel Sp 1999 CS 267 Applications of Parallel Computers Lecture 12: Sources of Parallelism and Locality (Part 3)
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
CS 240A: Sources of Parallelism in Physical Simulation Based on slides from David Culler, Jim Demmel, Kathy Yelick, et al., UCB CS267.
High Performance Computing 1 Parallelization Strategies and Load Balancing Some material borrowed from lectures of J. Demmel, UC Berkeley.
Efficient Parallelization for AMR MHD Multiphysics Calculations Implementation in AstroBEAR.
CS267 L11 Sources of Parallelism(2).1 Demmel Sp 1999 CS 267 Applications of Parallel Computers Lecture 11: Sources of Parallelism and Locality (Part 2)
Support for Adaptive Computations Applied to Simulation of Fluids in Biological Systems Immersed Boundary Method Simulation in Titanium Siu Man Yau, Katherine.
CS240A: Conjugate Gradients and the Model Problem.
CS267 L24 Solving PDEs.1 Demmel Sp 1999 CS 267 Applications of Parallel Computers Lecture 24: Solving Linear Systems arising from PDEs - I James Demmel.
Monica Garika Chandana Guduru. METHODS TO SOLVE LINEAR SYSTEMS Direct methods Gaussian elimination method LU method for factorization Simplex method of.
1 Sources of Parallelism and Locality in Simulation.
Chapter 13 Finite Difference Methods: Outline Solving ordinary and partial differential equations Finite difference methods (FDM) vs Finite Element Methods.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Programming Massively Parallel Processors.
Conjugate gradients, sparse matrix-vector multiplication, graphs, and meshes Thanks to Aydin Buluc, Umit Catalyurek, Alan Edelman, and Kathy Yelick for.
Parallel Adaptive Mesh Refinement Combined With Multigrid for a Poisson Equation CRTI RD Project Review Meeting Canadian Meteorological Centre August.
Global Address Space Applications Kathy Yelick NERSC/LBNL and U.C. Berkeley.
CS 240A: Parallelism in Physical Simulation
02/14/2007CS267 Lecture 91 CS 267 Sources of Parallelism and Locality in Simulation – Part 2 Kathy Yelick
1 Data Structures for Scientific Computing Orion Sky Lawlor charm.cs.uiuc.edu 2003/12/17.
Lecture 8 – Stencil Pattern Stencil Pattern Parallel Computing CIS 410/510 Department of Computer and Information Science.
Parallel Computing for Urban Cellular Automata Qingfeng. Gene. Guan 2004-Nov-18 Geography Department Colloquium Univ. of California, Santa Barbara.
Molecular Dynamics Sathish Vadhiyar Courtesy: Dr. David Walker, Cardiff University.
Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.
Computation on meshes, sparse matrices, and graphs Some slides are from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267.
Discontinuous Galerkin Methods and Strand Mesh Generation
Adaptive Mesh Modification in Parallel Framework Application of parFUM Sandhya Mangala (MIE) Prof. Philippe H. Geubelle (AE) University of Illinois, Urbana-Champaign.
Parallel Simulation of Continuous Systems: A Brief Introduction
Application Paradigms: Unstructured Grids CS433 Spring 2001 Laxmikant Kale.
Supercomputing ‘99 Parallelization of a Dynamic Unstructured Application using Three Leading Paradigms Leonid Oliker NERSC Lawrence Berkeley National Laboratory.
02/07/2006CS267 Lecture 71 CS 267 Sources of Parallelism and Locality in Simulation – Part 2 James Demmel
1 CS240A: Parallelism in CSE Applications Tao Yang Slides revised from James Demmel and Kathy Yelick
CS 267 Sources of Parallelism and Locality in Simulation – Part 2
Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.
CS240A: Conjugate Gradients and the Model Problem.
I/O for Structured-Grid AMR Phil Colella Lawrence Berkeley National Laboratory Coordinating PI, APDEC CET.
CS 484 Load Balancing. Goal: All processors working all the time Efficiency of 1 Distribute the load (work) to meet the goal Two types of load balancing.
An Evaluation of Partitioners for Parallel SAMR Applications Sumir Chandra & Manish Parashar ECE Dept., Rutgers University Submitted to: Euro-Par 2001.
Data Structures and Algorithms in Parallel Computing Lecture 7.
CDP Tutorial 3 Basics of Parallel Algorithm Design uses some of the slides for chapters 3 and 5 accompanying “Introduction to Parallel Computing”, Addison.
1 Data Structures for Scientific Computing Orion Sky Lawlor /04/14.
1 Rocket Science using Charm++ at CSAR Orion Sky Lawlor 2003/10/21.
C OMPUTATIONAL R ESEARCH D IVISION 1 Defining Software Requirements for Scientific Computing Phillip Colella Applied Numerical Algorithms Group Lawrence.
1 Titanium Review: Immersed Boundary Armando Solar-Lezama Biological Simulations Using the Immersed Boundary Method in Titanium Ed Givelberg, Armando Solar-Lezama,
High Performance Computing Seminar II Parallel mesh partitioning with ParMETIS Parallel iterative solvers with Hypre M.Sc. Caroline Mendonça Costa.
Auburn University
Model Problem: Solving Poisson’s equation for temperature
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Mapping Techniques Dr. Xiao Qin Auburn University.
Programming Models for SimMillennium
CS 267 Sources of Parallelism and Locality in Simulation – Part 2
Computation on meshes, sparse matrices, and graphs
Computational meshes, matrices, conjugate gradients, and mesh partitioning Some slides are from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy.
CS 267 Sources of Parallelism and Locality in Simulation – Part 2
Sathish Vadhiyar Courtesy: Dr. David Walker, Cardiff University
CS6068 Applications: Numerical Methods
Redundant Ghost Nodes in Jacobi
CS 267 Sources of Parallelism and Locality in Simulation – Part 2
James Demmel CS 267 Applications of Parallel Computers Lecture 12: Sources of Parallelism and Locality.
Presentation transcript:

Parallelizing stencil computations Based on slides from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267

2 Parallelism in the Game of Life The activities in this system are discrete events The simulation is synchronous use two copies of the grid (old and new) the value of each new grid cell in new depends only on the 9 cells (itself plus neighbors) in old grid (“stencil computation”) Each grid cell update is independent: reordering or parallelism OK simulation proceeds in timesteps, where (logically) each cell is evaluated at every timestep old worldnew world

3 Parallelism in Life Parallelism is straightforward ocean is regular data structure even decomposition across processors gives load balance Locality is achieved by using large patches of the world boundary values from neighboring patches are needed Optimization: visit only occupied cells (and neighbors)

4 Two-dimensional block decomposition If each processor owns n 2 /p elements to update, … … amount of data communicated, n/p per neighbor, is relatively small if n>>p This is less than n per neighbor for block column decomposition

5 Redundant “Ghost” Nodes in Stencil Computations Size of ghost region (and redundant computation) depends on network/memory speed vs. computation Can be used on unstructured meshes To compute green Copy yellow Compute blue

6 Comments on practical meshes Regular 1D, 2D, 3D meshes Important as building blocks for more complicated meshes Practical meshes are often irregular Composite meshes, consisting of multiple “bent” regular meshes joined at edges Unstructured meshes, with arbitrary mesh points and connectivities Adaptive meshes, which change resolution during solution process to put computational effort where needed

7 Parallelism in Regular meshes Computing a Stencil on a regular mesh need to communicate mesh points near boundary to neighboring processors. Often done with ghost regions Surface-to-volume ratio keeps communication down, but Still may be problematic in practice Implemented using “ghost” regions. Adds memory overhead

8 Irregular mesh: NASA Airfoil in 2D

9 Composite Mesh from a Mechanical Structure

10 Converting the Mesh to a Matrix

11 Adaptive Mesh Refinement (AMR) Adaptive mesh around an explosion Refinement done by calculating errors Parallelism Mostly between “patches,” dealt to processors for load balance May exploit some within a patch (SMP)

12 Adaptive Mesh Shock waves in a gas dynamics using AMR (Adaptive Mesh Refinement) See: fluid density

13 Irregular mesh: Tapered Tube (Multigrid)

14 Challenges of Irregular Meshes for PDE’s How to generate them in the first place E.g. Triangle, a 2D mesh generator by Jonathan Shewchuk 3D harder! E.g. QMD by Stephen Vavasis How to partition them ParMetis, a parallel graph partitioner How to design iterative solvers PETSc, a Portable Extensible Toolkit for Scientific Computing Prometheus, a multigrid solver for finite element problems on irregular meshes How to design direct solvers SuperLU, parallel sparse Gaussian elimination These are challenges to do sequentially, more so in parallel