Parallelizing stencil computations
Based on slides from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267.

1 Parallelizing stencil computations
Based on slides from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267

2 Parallelism in the Game of Life
- The activities in this system are discrete events
- The simulation is synchronous
  - use two copies of the grid (old and new)
  - the value of each grid cell in new depends only on the 9 cells (itself plus neighbors) in the old grid (“stencil computation”)
  - each grid cell update is independent: reordering or parallelism OK
  - simulation proceeds in timesteps, where (logically) each cell is evaluated at every timestep
- [Figure: old world, new world]
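The two-copy scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the original lecture: the function name `life_step`, the list-of-lists grid, and the choice to treat cells outside the grid as dead are all assumptions made for the example.

```python
def life_step(old):
    """Return the next generation; `old` is only read, never written."""
    rows, cols = len(old), len(old[0])
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count live cells among the 8 neighbors in the 3x3 stencil.
            # Assumption: cells outside the grid count as dead.
            live = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols:
                        live += old[i + di][j + dj]
            # Conway's rules: birth on 3 neighbors, survival on 2 or 3.
            new[i][j] = 1 if live == 3 or (old[i][j] and live == 2) else 0
    return new


# A "blinker" alternates between a horizontal and a vertical bar.
blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Because every write goes to `new` and every read comes from `old`, the two inner loops can run in any order, which is what makes the update safe to parallelize.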

3 Parallelism in Life
- Parallelism is straightforward
  - the ocean (the grid) is a regular data structure
  - an even decomposition across processors gives load balance
- Locality is achieved by using large patches of the world
  - boundary values from neighboring patches are needed
- Optimization: visit only occupied cells (and their neighbors)
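The last optimization, visiting only occupied cells and their neighbors, might look like the following sketch. It assumes an unbounded grid stored as a set of live `(row, col)` pairs; the name `sparse_life_step` and this representation are illustrative choices, not something the slides specify.

```python
from collections import Counter


def sparse_life_step(live):
    """Advance one generation, touching only live cells and their neighbors."""
    # Each live cell contributes +1 to the neighbor count of its 8 neighbors.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if dr or dc
    )
    # Only cells that appear in `counts` can be alive in the next generation.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}


# A glider should reappear shifted one cell diagonally after 4 generations.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
state = glider
for _ in range(4):
    state = sparse_life_step(state)
```

The work per step is proportional to the number of live cells rather than to the grid area. The trade-off is that the work is no longer evenly spread over a regular decomposition, so load balance needs more care.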

4 Two-dimensional block decomposition
- If each processor owns n^2/p elements to update, the amount of data communicated, n/sqrt(p) per neighbor, is relatively small if n >> p
- This is less than the n per neighbor needed for a block column decomposition
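A small calculation makes the comparison concrete. The sketch below assumes an n x n grid, a square number of processors p, and n divisible by sqrt(p); the function name `halo_per_neighbor` and the layout labels are made up for this example.

```python
import math


def halo_per_neighbor(n, p, layout):
    """Grid points sent to one neighbor per step for an n x n grid on p processors."""
    if layout == "block2d":
        side = math.isqrt(p)
        if side * side != p:
            raise ValueError("this sketch assumes p is a perfect square")
        # Each processor owns an (n/sqrt(p)) x (n/sqrt(p)) block,
        # so one edge of the block holds n/sqrt(p) points.
        return n // side
    if layout == "column":
        # Each processor owns n/p full columns; a shared edge is a whole column.
        return n
    raise ValueError("unknown layout: " + layout)


n, p = 1024, 16
owned = n * n // p                             # 65536 points updated per processor
block = halo_per_neighbor(n, p, "block2d")     # 256 points per neighbor
column = halo_per_neighbor(n, p, "column")     # 1024 points per neighbor
```

In both layouts each processor updates the same n^2/p points. Only the length of the shared edge differs, which is why the 2D block layout has the better ratio of communication to computation.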

5 Redundant “Ghost” Nodes in Stencil Computations
- Size of the ghost region (and the amount of redundant computation) depends on network/memory speed vs. computation speed
- Can be used on unstructured meshes
- [Figure legend: to compute green, copy yellow and compute blue]
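The ghost-node idea can be illustrated on a 1D array with a 3-point averaging stencil. This is a simplified sketch under stated assumptions: the "processors" are simulated by looping over patches in one process, the ghost region is one cell wide, and the names `smooth` and `smooth_with_ghosts` are hypothetical. It also assumes `len(u) >= 2` and `parts <= len(u)`.

```python
def smooth(u):
    """One 3-point averaging sweep; the two end values are held fixed."""
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def smooth_with_ghosts(u, parts):
    """Same sweep, computed patch by patch using one ghost cell per side."""
    n = len(u)
    bounds = [n * k // parts for k in range(parts + 1)]
    result = []
    for k in range(parts):
        lo, hi = bounds[k], bounds[k + 1]      # owned cells are u[lo:hi]
        # "Copy": fetch one ghost value from each neighboring patch, if any.
        g_lo, g_hi = max(lo - 1, 0), min(hi + 1, n)
        local = u[g_lo:g_hi]
        # "Compute": update the local array; ghost cells sit at the ends
        # of `local`, so the sweep reads them but leaves them unchanged.
        updated = smooth(local)
        # Keep only the owned cells and discard the ghosts.
        result.extend(updated[lo - g_lo:hi - g_lo])
    return result


u = [float(i * i) for i in range(10)]
serial = smooth(u)
patched = smooth_with_ghosts(u, 3)
```

With a ghost region w cells wide, a patch could take w steps between exchanges at the cost of recomputing some of its neighbors' cells. That is the trade-off between redundant computation and communication that the slide refers to.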

6 Comments on practical meshes
- Regular 1D, 2D, 3D meshes
  - Important as building blocks for more complicated meshes
- Practical meshes are often irregular
  - Composite meshes, consisting of multiple “bent” regular meshes joined at edges
  - Unstructured meshes, with arbitrary mesh points and connectivities
  - Adaptive meshes, which change resolution during the solution process to put computational effort where needed

7 Parallelism in Regular meshes
- Computing a stencil on a regular mesh
  - Need to communicate mesh points near the boundary to neighboring processors
    - Often implemented with “ghost” regions, which add memory overhead
  - Surface-to-volume ratio keeps communication down, but it may still be problematic in practice

8 Irregular mesh: NASA Airfoil in 2D

9 Composite Mesh from a Mechanical Structure

10 Converting the Mesh to a Matrix

11 Adaptive Mesh Refinement (AMR)
- Adaptive mesh around an explosion
- Refinement done by calculating errors
- Parallelism
  - Mostly between “patches,” dealt to processors for load balance
  - May exploit some within a patch (SMP)

12 Adaptive Mesh
- Shock waves in gas dynamics using AMR (Adaptive Mesh Refinement)
- See: http://www.llnl.gov/CASC/SAMRAI/
- [Figure: fluid density]

13 Irregular mesh: Tapered Tube (Multigrid)

14 Challenges of Irregular Meshes for PDEs
- How to generate them in the first place
  - E.g. Triangle, a 2D mesh generator by Jonathan Shewchuk
  - 3D is harder! E.g. QMG by Stephen Vavasis
- How to partition them
  - ParMetis, a parallel graph partitioner
- How to design iterative solvers
  - PETSc, a Portable, Extensible Toolkit for Scientific Computation
  - Prometheus, a multigrid solver for finite element problems on irregular meshes
- How to design direct solvers
  - SuperLU, parallel sparse Gaussian elimination
- These are challenges to do sequentially, more so in parallel

