Parallelization Of The Spacetime Discontinuous Galerkin Method Using The Charm++ FEM Framework (ParFUM) Mark Hills, Hari Govind, Sayantan Chakravorty,

Presentation transcript:

Parallelization Of The Spacetime Discontinuous Galerkin Method Using The Charm++ FEM Framework (ParFUM). Mark Hills, Hari Govind, Sayantan Chakravorty, Terry Wilmarth, L.V. Kale, Robert Haber. Presented by: Isaac Dooley, University of Illinois at Urbana-Champaign.

USNCCM8, July 25. Overview: My background in parallel programming. How the Spacetime Discontinuous Galerkin (SDG) method uses unstructured meshes. Requirements for parallelizing SDG. Existing functionality in ParFUM that satisfies some of the SDG requirements. New functionality added to ParFUM to support the remaining SDG requirements.

Parallel Programming Lab: Our focus is parallel programming, especially frameworks and dynamic or adaptive applications. We are not computational geometers or mathematicians. We try to build general-purpose, reusable, high-performance frameworks: Charm++ and AMPI. Focus on migratable objects and virtualization. Multiple platforms (clusters, SMPs, BlueGene/L).

Spacetime Discontinuous Galerkin: Collaboration with Bob Haber, Jeff Erickson, Mike Garland, … NSF-funded center. SDG motivation: spatial adaptivity is needed in structural dynamics applications. Why shouldn’t we also adapt in the time dimension?

Spacetime Discontinuous Galerkin: Mesh generation uses an advancing-front algorithm called Tent Pitcher. It adds sets of new elements, called patches, to the mesh and then solves them, thus advancing the front. Each patch depends only on its inflow elements.
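The advancing-front idea can be illustrated with a minimal 1-d sketch. This is not the Tent Pitcher implementation from the talk: the function name `pitch_tents_1d`, the uniform `wave_speed`, and the rule of always pitching at the lowest node on the front are assumptions made for illustration. Each tent-pole top is limited by a simple causality bound relative to its neighbors, so a new patch only depends on parts of the front that already exist.

```python
def pitch_tents_1d(xs, wave_speed, t_final):
    """Advance a 1-d front from t=0 to t_final by pitching tents.

    xs: sorted node coordinates of the space mesh (hypothetical input).
    Returns (patches, times); each patch is (node_index, t_old, t_new),
    listed in the order in which it could be solved.
    """
    times = [0.0] * len(xs)
    patches = []
    while min(times) < t_final:
        # The lowest node on the front is always a local minimum.
        i = min(range(len(xs)), key=lambda k: times[k])
        # Causality bound: the new pole top may not rise faster than
        # 1/wave_speed relative to each neighbor on the front.
        limit = t_final
        for j in (i - 1, i + 1):
            if 0 <= j < len(xs):
                limit = min(limit, times[j] + abs(xs[i] - xs[j]) / wave_speed)
        patches.append((i, times[i], limit))
        times[i] = limit
    return patches, times
```

Calling `pitch_tents_1d([0.0, 1.0, 2.0, 3.0], 1.0, 2.0)` returns the patches in a valid solve order; patches pitched at non-adjacent nodes do not depend on each other, which is the property the parallelizations below exploit.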

1-d Mesh Generation (three figure slides; axes: Space, Time). Figure labels: Tent Pole, Unsolved Patches, Solved Patches, Refinement.

Adaptive SDG: Method described in Abedi, Zhou, et al., "Spacetime meshing with adaptive refinement and coarsening," 2004. Tent poles are not pitched only above existing space nodes. The entire space-time mesh or frontier is built as a mesh. Non-adaptive SDG can store patches as attributes of nodes in the original mesh.

Adaptive Mesh Generation (figure sequence over five slides).

Figures (two slides) courtesy of Shuo-Heng Chung and Michael Garland.

SDG Is Time Consuming: Some simulations would take days on a single processor, so we want to parallelize SDG to speed them up. There are multiple ways to parallelize it. One goal of the parallelization is to use existing frameworks where possible.

Master/Slave Parallelization of SDG: The first parallelization of the SDG method was based on the observation that each patch can be solved independently. The space mesh is therefore not partitioned but kept on one master processor, and workers request patches to solve from the master. This approach created a bottleneck at the processor holding the entire space mesh.
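A toy sketch of this request-driven scheme, using Python threads in place of processors, is shown here. It is not the original implementation: `run_master_worker` and `solve_patch` are hypothetical names, and the patch list is fixed up front, whereas the real master generates new patches as the front advances.

```python
import queue
import threading


def run_master_worker(patch_ids, solve_patch, n_workers=4):
    """Workers pull independent patches from one central queue."""
    work = queue.Queue()  # stands in for the master holding the mesh
    for pid in patch_ids:
        work.put(pid)
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                pid = work.get_nowait()  # "request a patch from the master"
            except queue.Empty:
                return
            value = solve_patch(pid)
            with results_lock:
                results[pid] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every request goes through the single queue, which mirrors the bottleneck described on the slide: adding workers does not help once the master cannot hand out patches fast enough.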

Can We Parallelize the Geometry? We do not want a single-processor bottleneck. Starting from an initial mesh, we partition the geometric space mesh. We need a consistent ghost element layer. We need a locking mechanism for updating ghost values at appropriate times to ensure a consistent mesh. We need the ability to incrementally add elements to and remove elements from the mesh while maintaining consistency across all processors.
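To make the ghost-layer requirement concrete, the following sketch computes shared nodes and a one-deep, node-based ghost layer for a partitioned triangle mesh. It is a serial illustration only, not ParFUM code; the function name and the choice of node-based (rather than edge-based) ghosts are assumptions.

```python
from collections import defaultdict


def ghosts_and_shared_nodes(elements, part_of):
    """elements: list of node-ID triples; part_of[e]: partition of element e.

    Returns (shared, ghosts) where shared is the set of nodes used by more
    than one partition and ghosts[p] is the set of remote elements that
    partition p must mirror because they touch one of its nodes.
    """
    node_parts = defaultdict(set)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_parts[n].add(part_of[e])
    shared = {n for n, parts in node_parts.items() if len(parts) > 1}

    ghosts = defaultdict(set)
    for e, nodes in enumerate(elements):
        for n in nodes:
            for p in node_parts[n]:
                if p != part_of[e]:
                    ghosts[p].add(e)
    return shared, dict(ghosts)
```

For a strip of four triangles split into two partitions, the two nodes on the cut become shared and each partition receives the elements that touch them as ghosts.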

Parallel Frameworks for Unstructured Meshes: ParFUM (Parallel Programming Lab, UIUC); Sierra (Sandia National Labs); Simmetrix; Roccom (Center for Simulation of Advanced Rockets, UIUC); SUMAA3d (Argonne National Laboratory); UG.

ParFUM Existing Features (Parallel Framework for Unstructured Meshes): Originally designed for standard structural dynamics codes. Extended to support fluid dynamics codes (finite volume). Local element/node ID numbering. Efficient ID translation for communication. Partitioning. Ghost layer generation. Field registration and updating for shared nodes and ghosts. Figures: Rocket Burn Simulation, CSAR; 3-D Fractography in FEM.

ParFUM Existing Features (continued): ParFUM programs look similar to serial codes, operating on local elements/nodes and ghost layers. Programs can be written in Fortran, C, or C++. Visualization tools. Collision detection library. Tet data transfer library.

ParFUM Existing Features (continued): Virtualization. Load balancing (explicit or asynchronous). Fault tolerance. Checkpointing. Performance analysis.

Virtualization: Charm++ runtime system. Applications are built from migratable objects. Virtualization = multiple migratable objects per processor. Load balancing; principle of persistence. High (90-100%) processor utilization and scaling. Automatic checkpointing. Each ParFUM mesh chunk is mapped to a migratable object. Figure labels: System Implementation, User View.

Benefit of Virtualization to Structural Dynamics Application (figure): ParFUM application on eight physical processors.

ParFUM - New Features (required for non-adaptive space-time meshing): Incremental updates to ghost layers of adjacent processors. Locking of individual elements or nodes, with no global synchronization. New adjacency data structures: element to element, node to node, node to element.
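The three adjacency relations can be derived from element connectivity alone. The sketch below does this for a serial triangle mesh; it is an illustration under stated assumptions (the function name is hypothetical, elements are triangles, and two elements count as adjacent when they share an edge), not the ParFUM data structures.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacencies(elements):
    """elements: list of node-ID triples. Returns (n2e, n2n, e2e) dicts."""
    n2e = defaultdict(set)            # node -> elements containing it
    n2n = defaultdict(set)            # node -> nodes joined to it by an edge
    edge_to_elems = defaultdict(set)  # helper: edge -> elements on that edge
    for e, nodes in enumerate(elements):
        for n in nodes:
            n2e[n].add(e)
        # For a triangle, every pair of its nodes is an edge.
        for a, b in combinations(nodes, 2):
            n2n[a].add(b)
            n2n[b].add(a)
            edge_to_elems[frozenset((a, b))].add(e)

    e2e = defaultdict(set)            # element -> edge-adjacent elements
    for elems in edge_to_elems.values():
        for a, b in combinations(elems, 2):
            e2e[a].add(b)
            e2e[b].add(a)
    return dict(n2e), dict(n2n), dict(e2e)
```

In a parallel setting these tables would also have to be kept consistent across partitions whenever ghost layers change, which is what the incremental updates on the slide refer to.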

Non-adaptive SDG Program Initial Results

ParFUM Support For Adaptivity: Access to general-purpose mesh modification primitives. Mesh refinement. Mesh coarsening.

ParFUM - New Features (useful for adaptive space-time meshing): Current work. Non-trivial challenges for a framework which doesn’t allow unstructured meshes to be modified asynchronously during execution. The mesh must remain consistent across all processors, with correct ghost layers, shared nodes, ghost nodes, and adjacencies at all times.

ParFUM Support For Adaptivity: Load balancing is required in any efficient framework for adaptive SDG, since mesh partitions can differ in size by orders of magnitude. We have already extended ParFUM to provide parallel incremental mesh modification primitives. The primitives allow simple coding of incremental mesh modification, refinement, coarsening, and repair routines.
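One simple way to rebalance partitions of very different sizes is a greedy assignment: place the heaviest mesh chunk first, always on the currently least-loaded processor. The sketch below shows that generic strategy; it is not the Charm++ load balancer, and `greedy_assign` and the load values are hypothetical.

```python
import heapq


def greedy_assign(chunk_loads, n_procs):
    """chunk_loads: dict chunk -> measured load. Returns dict chunk -> proc."""
    heap = [(0.0, p) for p in range(n_procs)]  # (total load, processor)
    heapq.heapify(heap)
    assignment = {}
    # Heaviest chunks first, each onto the least-loaded processor so far.
    for chunk, load in sorted(chunk_loads.items(), key=lambda kv: -kv[1]):
        total, p = heapq.heappop(heap)
        assignment[chunk] = p
        heapq.heappush(heap, (total + load, p))
    return assignment
```

With several chunks per processor (virtualization), one heavily refined chunk can be isolated on its own processor while many small chunks are packed together elsewhere.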

ParFUM Structure (diagram). SDG Application: API (serial or parallel). Mesh Adaptivity: Edge Flip, Edge Bisect, Edge Contract, … Mesh Modification: Lock(), Unlock(), Add/Remove Node(), Add/Remove Element(). Mesh Adjacency: e2e, e2n, n2e, n2n; Generate, Modify … ParFUM: Nodes (Local, Shared, Ghost); Elements (Local, Ghost).

Mesh Modification Examples, Edge Flip: Remove element e1. Remove element e2. Add element (n1,n2,n4). Add element (n2,n3,n4).
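The edge flip can be written directly in terms of add/remove primitives. The sketch below is a serial toy, not the ParFUM API: `TriMesh`, `add_element`, `remove_element`, and `edge_flip` are hypothetical names, and it assumes e1 = (n1,n2,n3) and e2 share the edge n1-n3 with consistent orientation.

```python
class TriMesh:
    """Toy serial triangle mesh: element IDs map to node-ID triples."""

    def __init__(self):
        self.elements = {}
        self._next_id = 0

    def add_element(self, nodes):
        eid = self._next_id
        self._next_id += 1
        self.elements[eid] = tuple(nodes)
        return eid

    def remove_element(self, eid):
        return self.elements.pop(eid)


def edge_flip(mesh, e1, e2):
    """Replace the edge shared by e1 and e2 with the opposite diagonal."""
    a, b = mesh.elements[e1], mesh.elements[e2]
    shared = set(a) & set(b)
    if len(shared) != 2:
        raise ValueError("elements must share exactly one edge")
    # Rotate e1 so that it reads (n1, n2, n3) with n2 off the shared edge.
    k = next(i for i, n in enumerate(a) if n not in shared)
    n1, n2, n3 = a[k - 1], a[k], a[(k + 1) % 3]
    (n4,) = set(b) - shared
    mesh.remove_element(e1)
    mesh.remove_element(e2)
    return mesh.add_element((n1, n2, n4)), mesh.add_element((n2, n3, n4))
```

Starting from elements (1,2,3) and (1,3,4), the flip leaves (1,2,4) and (2,3,4), matching the four steps listed on the slide.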

Mesh Modification Examples, Edge Bisect: Remove element e1. Remove element e2. Add node n5. Add element (n1,n2,n5). Add element (n3,n5,n2). Add element (n4,n5,n3). Add element (n4,n1,n5).
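The bisection follows the same pattern with one extra node. This serial sketch assumes that the bisected edge is n1-n3 (the edge shared by e1 and e2), that the new node n5 sits at its midpoint, and that node IDs are integers; the function name and the dict-based mesh are hypothetical, not ParFUM calls.

```python
def edge_bisect(nodes, elements, e1, e2):
    """Split the edge shared by e1 and e2 at its midpoint.

    nodes: dict node_id -> (x, y); elements: dict elem_id -> node triple.
    Both dicts are modified in place. Returns the new node ID.
    """
    a, b = elements[e1], elements[e2]
    shared = set(a) & set(b)
    if len(shared) != 2:
        raise ValueError("elements must share exactly one edge")
    # Rotate e1 so that it reads (n1, n2, n3) with n2 off the shared edge.
    k = next(i for i, n in enumerate(a) if n not in shared)
    n1, n2, n3 = a[k - 1], a[k], a[(k + 1) % 3]
    (n4,) = set(b) - shared

    # Add node: new node n5 at the midpoint of edge n1-n3.
    n5 = max(nodes) + 1
    (x1, y1), (x3, y3) = nodes[n1], nodes[n3]
    nodes[n5] = ((x1 + x3) / 2, (y1 + y3) / 2)

    # Remove e1 and e2, then add the four replacement elements.
    next_id = max(elements) + 1
    del elements[e1], elements[e2]
    for tri in ((n1, n2, n5), (n3, n5, n2), (n4, n5, n3), (n4, n1, n5)):
        elements[next_id] = tri
        next_id += 1
    return n5
```

In the parallel version each of these removals and additions would also have to update ghost layers and shared-node lists on neighboring processors.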

Mesh Modification in Parallel (figures): mesh on Processor 1 before edge flip; mesh on Processor 2 before edge flip; mesh on Processor 2 after edge flip.

Mesh Modification in ParFUM: Primitive operations must do the following. Perform the operation on the local processor and on all applicable remote processors. Convert local nodes to shared nodes when they become part of the new boundary. Update ghost layers (nodes and elements) on all applicable processors; the ghost layers can grow or shrink.

Thank you for listening to my talk! Questions or comments?