Scalable Algorithms for Structured Adaptive Mesh Refinement Akhil Langer, Jonathan Lifflander, Phil Miller, Laxmikant Kale Parallel Programming Laboratory.


Scalable Algorithms for Structured Adaptive Mesh Refinement
Akhil Langer, Jonathan Lifflander, Phil Miller, Laxmikant Kale
Parallel Programming Laboratory, University of Illinois at Urbana-Champaign
Kuo-Chuan Pan, Paul Ricker
Department of Astronomy, University of Illinois at Urbana-Champaign

Outline (16th April, 2013)
Introduction to Adaptive Mesh Refinement (AMR)
Traditional Approach
– Existing Frameworks
– Algorithm
– Disadvantages
New Scalable Approach
– Doing it the Charm++ way
– The Mesh Restructuring Algorithm
– Dynamic Distributed Load Balancing
Comparison with Traditional Approach
Experimental Results / Performance


Adaptive Mesh Refinement: Introduction
Solving Partial Differential Equations (PDEs)
– PDEs are solved on a discretized domain
– Algebraic equations estimate the values of the unknowns at the mesh points
– The resolution (spacing) of the mesh points determines the error
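The link between mesh spacing and error can be made concrete with a small sketch (mine, not from the talk): a central-difference estimate of a second derivative is second-order accurate, so halving the spacing cuts the error by roughly four.

```python
import math

def second_derivative(u, x, h):
    """Central-difference estimate of u''(x) on a mesh with spacing h."""
    return (u(x - h) - 2.0 * u(x) + u(x + h)) / (h * h)

# For u(x) = sin(x), the exact second derivative at x is -sin(x).
x = 1.0
exact = -math.sin(x)
err_h = abs(second_derivative(math.sin, x, 0.1) - exact)
err_h2 = abs(second_derivative(math.sin, x, 0.05) - exact)
# Halving the spacing cuts the error by roughly 4x (second-order accuracy).
print(err_h / err_h2)
```

AMR exploits exactly this: spend the small spacing (and the cost that comes with it) only where the solution demands it.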

Uniform meshes
– High resolution is required to handle difficult regions (discontinuities, steep gradients, shocks, etc.)
– Computationally extremely costly

Adaptive Mesh Refinement
– Start with a coarse grid
– Identify regions that need finer resolution
– Superimpose finer subgrids only on those regions
– AMR makes it feasible to solve problems that are intractable on a uniform grid

[Figure: a uniform mesh vs. an adaptively refined mesh]

Adaptive Mesh Refinement: Applications
– CFD
– Astrophysics
– Climate modeling
– Turbulence
– Mantle convection modeling
– Combustion
– Biophysics
– and many more

[Figure: impact of a Type Ia supernova explosion on a main-sequence binary companion, rendered using FLASH]


Traditional Approach: Existing Frameworks
– PARAMESH
– SAMRAI
– FLASH
– p4est
– Enzo
– Chombo
– deal.II
– and many more

Our current work focuses on oct-tree based simulations

A little more background on AMR
– The refinement structure can be represented using a quad-tree (2D) or an oct-tree (3D)
– An important condition in AMR: the refinement levels of neighboring blocks may differ by at most ±1
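The ±1 neighbor-level condition can be checked mechanically. A minimal 2D sketch (mine, not code from the talk), representing each block as (level, ix, iy), where level L tiles the domain with a 2**L x 2**L grid of blocks:

```python
def extent(block, L):
    """Block's integer bounding box on the grid of the finer level L."""
    l, ix, iy = block
    s = 1 << (L - l)
    return ix * s, (ix + 1) * s, iy * s, (iy + 1) * s

def are_neighbors(a, b):
    """True if the two blocks share an edge (compare at the finer level)."""
    L = max(a[0], b[0])
    ax0, ax1, ay0, ay1 = extent(a, L)
    bx0, bx1, by0, by1 = extent(b, L)
    touch_x = ax1 == bx0 or bx1 == ax0
    touch_y = ay1 == by0 or by1 == ay0
    overlap_x = ax0 < bx1 and bx0 < ax1
    overlap_y = ay0 < by1 and by0 < ay1
    return (touch_x and overlap_y) or (touch_y and overlap_x)

def is_balanced(blocks):
    """Refinement levels of edge neighbors differ by at most 1."""
    return all(abs(a[0] - b[0]) <= 1
               for a in blocks for b in blocks if are_neighbors(a, b))

mesh = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]  # four level-1 blocks
print(is_balanced(mesh))  # True
```

Refining one block two levels deeper than an edge neighbor makes `is_balanced` return False, which is exactly the situation the remeshing algorithm later in the talk must repair.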

Traditional Approach: Typical Implementations
– A set of blocks is assigned to each process
– Space-filling curves are used for load balancing
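The space-filling-curve idea can be sketched with Morton (Z-order) keys, a common choice for oct-tree AMR (an illustrative sketch, not the talk's code): sort blocks along the curve, then cut the curve into equal chunks, one per process, so spatially nearby blocks tend to land on the same process.

```python
def morton(ix, iy, bits):
    """Interleave the bits of (ix, iy) to get the block's Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b + 1)
        key |= ((iy >> b) & 1) << (2 * b)
    return key

def partition(blocks, nprocs, bits=4):
    """Sort blocks by Morton key, then cut the curve into equal chunks."""
    ordered = sorted(blocks, key=lambda b: morton(b[0], b[1], bits))
    chunk = -(-len(ordered) // nprocs)  # ceiling division
    return [ordered[i:i + chunk] for i in range(0, len(ordered), chunk)]

blocks = [(x, y) for x in range(4) for y in range(4)]
parts = partition(blocks, 4)
print(parts[0])  # the first process gets one spatially compact quadrant
```

The O(#blocks) sort over the global curve is one reason this centralized scheme becomes a bottleneck at scale, which motivates the distributed load balancing described later.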

Traditional Approach: Disadvantages
– O(#blocks) memory per process: memory bottlenecks!
– O(d) reductions: synchronization overheads!


Scalable Approach: Doing it the Charm++ Way
Instead of processes, promote individual blocks to first-class entities
– A chare array with custom indices
– A chare corresponds to a block
– A block is now the endpoint of communication

Scalable Approach: Doing it the Charm++ Way
Block naming
– A bitvector describes the path from the root to the block's node
– One bit per dimension at each level
– Easy to compute parent, children, and siblings using bit manipulation
A block acts as a virtual processor
– The runtime handles communication between arbitrary blocks
– Allows overlap of computation with the communication of other blocks on the same physical process
Dynamic placement of blocks (chares) on processes
– Facilitates dynamic load balancing
A block is the unit of algorithm expression
– Reduces implementation complexity

[Figure: timeline showing p0 computing vs. waiting for boundary layers]
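The bitvector naming scheme can be sketched for the 2D case (my sketch of the idea described above, not the talk's implementation): two path bits per level select a quadrant, and a leading 1 bit marks the root so that blocks at different depths get distinct keys.

```python
# ROOT is just the sentinel bit: the root block itself.
ROOT = 1

def child(block, quadrant):
    """Append one level to the path (quadrant is 0..3: one bit per dimension)."""
    return (block << 2) | quadrant

def parent(block):
    """Drop the last two path bits."""
    return block >> 2

def level(block):
    """Path length in levels (path bits come in pairs after the sentinel)."""
    return (block.bit_length() - 1) // 2

def siblings(block):
    """The four blocks (including this one) sharing the same parent."""
    return [child(parent(block), q) for q in range(4)]

b = child(child(ROOT, 3), 0)  # root -> quadrant 3 -> quadrant 0
print(level(b), parent(b) == child(ROOT, 3))
```

Because parent, children, and siblings are pure bit arithmetic on the index, a block can address any relative directly; the Charm++ runtime then resolves that index to whatever process the chare currently lives on.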

Scalable Approach: The Mesh Restructuring (Remeshing) Algorithm
– Based on a local error estimate, each block makes one of the following decisions: refine, coarsen, or stay
– refine and stay decisions are communicated to neighbors
– A coarsen decision is implied if neither a refine nor a stay message is received

Scalable Approach: The Mesh Restructuring Algorithm
– refine and stay decisions propagate to neighboring blocks
– Decisions are updated based on the DFA, and changes in decisions are communicated again
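The talk drives this with a per-block state machine and asynchronous messages; the rule it enforces can be illustrated with a minimal synchronous 1D sketch (mine, under the assumption that decisions only ever move up, coarsen -> stay -> refine, which is what makes the process reach a fixed point):

```python
COARSEN, STAY, REFINE = -1, 0, 1

def remesh(levels, decisions):
    """Upgrade decisions until neighboring target levels differ by at most 1."""
    target = lambda i: levels[i] + decisions[i]
    changed = True
    while changed:
        changed = False
        for i in range(len(levels)):
            for j in (i - 1, i + 1):  # 1D neighbors
                if 0 <= j < len(levels):
                    # If the neighbor would end up more than one level finer,
                    # upgrade this block's decision by one step.
                    if target(j) - target(i) > 1 and decisions[i] < REFINE:
                        decisions[i] += 1
                        changed = True
    return decisions

levels = [2, 2, 2, 2]
decisions = remesh(levels, [COARSEN, COARSEN, COARSEN, REFINE])
print(decisions)  # [-1, -1, 0, 1]: the refine forced its neighbor to stay
```

Monotonicity is the key property: each block's decision can change at most twice, so the propagation terminates without any global coordination beyond detecting that no more messages are in flight.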

Scalable Approach: The Mesh Restructuring Algorithm
When to stop?

Dynamic Load Balancing


Comparison with the Traditional Approach

                     Traditional Approach          Scalable Approach
Memory
Mesh Restructuring   Synchronized                  Highly asynchronous
Load Balancing       Centralized and Distributed   Distributed
Neighbor Lookup      Hash table
Implementation       Complex implementation        Simple implementation
                                                   (SLOC: 1300 for 2D, 1600 for 3D Advection)

Conclusion: the new approach promises high performance for much more deeply refined computations than are currently practiced.


Performance: Benchmark
– Advection benchmark
– First-order upwind method in 2D space
– Advection of a tracer along with the fluid
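The benchmark's numerical core can be sketched in 1D (assuming the standard first-order upwind scheme; this is not the talk's actual code): each step blends a cell with its upwind neighbor, which is stable for CFL numbers up to 1 and conserves the tracer exactly.

```python
def upwind_step(q, u, dx, dt):
    """One explicit step of q_t + u q_x = 0 (upwind for u > 0, periodic)."""
    c = u * dt / dx  # CFL number, must satisfy c <= 1
    # q[i - 1] with i == 0 wraps to q[-1], giving periodic boundaries.
    return [q[i] - c * (q[i] - q[i - 1]) for i in range(len(q))]

# Advect a square pulse one full period around the domain.
n, u, dx = 20, 1.0, 1.0 / 20
dt = 0.5 * dx / u  # CFL = 0.5
q = [1.0 if 5 <= i < 10 else 0.0 for i in range(n)]
for _ in range(2 * n):  # 2n steps at c = 0.5 -> one full period
    q = upwind_step(q, u, dx, dt)
# First-order upwind is diffusive: the pulse returns smeared, but the
# total tracer mass is conserved exactly.
```

The smearing at the pulse edges is precisely where an AMR code would flag blocks for refinement, which is why this benchmark exercises the remeshing machinery.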

Performance: Mesh Restructuring Latency (2D Advection on BG/Q)
Two components:
– Remeshing decisions and communication: scales well
– Termination detection time: logarithmic in #processes

Performance: Strong Scaling
2D Advection on BG/Q with a max depth of 15

Scalable Algorithms for Structured Adaptive Mesh Refinement
Akhil Langer, Jonathan Lifflander, Phil Miller
Parallel Programming Laboratory, University of Illinois at Urbana-Champaign
Kuo-Chuan Pan, Paul Ricker
Department of Astronomy, University of Illinois at Urbana-Champaign

Thank you! Questions?