I/O for Structured-Grid AMR
Phil Colella
Lawrence Berkeley National Laboratory
Coordinating PI, APDEC CET

Block-Structured Local Refinement (Berger and Oliger, 1984)
Refined regions are organized into rectangular patches.
Refinement is performed in time as well as in space.
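
A minimal sketch of how such a hierarchy can be represented. The types below (PatchBox, RefinementLevel, AMRHierarchy) are simplified stand-ins for illustration, not Chombo's actual classes:

    // Simplified representation of a block-structured AMR hierarchy.
    // These types are illustrative stand-ins, not Chombo's classes.
    #include <vector>

    struct PatchBox {                // an axis-aligned rectangular patch, in index space
      int lo[3];                     // lower corner (cell indices)
      int hi[3];                     // upper corner (cell indices)
    };

    struct RefinementLevel {
      std::vector<PatchBox> patches; // union of rectangles covering the refined region
      int refRatio;                  // refinement ratio relative to the next coarser level
      double dx;                     // mesh spacing on this level
      double dt;                     // time step; refined in time as well as space,
                                     // so typically dt_fine = dt_coarse / refRatio
    };

    struct AMRHierarchy {
      std::vector<RefinementLevel> levels;  // level 0 = coarsest, covering the whole domain
    };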

Stakeholders
SciDAC projects: combustion and astrophysics (cf. John Bell's talk); MHD for tokamaks (R. Samtaney); wakefield accelerators (W. Mori, E. Esarey); AMR visualization and analytics collaboration (VACET); AMR elliptic solver benchmarking / performance collaboration (PERI, TOPS).
Other projects: ESL edge plasma project - 5D gridded data (LLNL, LBNL); cosmology - AMR fluids + PIC (F. Miniati, ETH); systems biology - PDE in complex geometry (A. Arkin, LBNL).
Larger structured-grid AMR community: Norman (UCSD), Abel (SLAC), Flash (Chicago), SAMRAI (LLNL), ...
We all talk to each other and have common requirements.

Chombo: a Software Framework for Block-Structured AMR
Requirement: support a wide variety of block-structured AMR applications within a common software framework.
Mixed-language model: C++ for higher-level data structures, Fortran for regular single-grid calculations (see the sketch below).
Reusable components: component design based on mapping mathematical abstractions to classes.
Built on public-domain standards: MPI, HDF5, VTK.
Previous work: BoxLib (LBNL/CCSE), KeLP (Baden et al., UCSD), FIDIL (Hilfinger and Colella).
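
A generic sketch of the mixed-language pattern, showing only the C++ side of a call into a Fortran single-grid kernel. The routine name, argument list, and trailing-underscore name mangling are illustrative assumptions; in Chombo this glue is generated by its ChomboFortran layer rather than written by hand:

    // C++ driver calling a Fortran stencil kernel on one rectangular grid.
    // "laplacian_kernel_" and its argument list are hypothetical.
    extern "C" {
      void laplacian_kernel_(double* phi, double* lphi,
                             const int* ilo, const int* ihi,   // index bounds of the box
                             const int* jlo, const int* jhi,
                             const double* dx);
    }

    void applyLaplacianOnBox(double* phi, double* lphi,
                             int ilo, int ihi, int jlo, int jhi, double dx)
    {
      // C++ owns the data layout and metadata; Fortran does the regular,
      // performance-critical loop nest over the single grid.
      laplacian_kernel_(phi, lphi, &ilo, &ihi, &jlo, &jhi, &dx);
    }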

Layered Design
Layer 1: data and operations on unions of boxes - set calculus, rectangular array library (with interface to Fortran), data on unions of rectangles, with SPMD parallelism implemented by distributing boxes over processors.
Layer 2: tools for managing interactions between different levels of refinement in an AMR calculation - interpolation, averaging operators, coarse-fine boundary conditions.
Layer 3: solver libraries - AMR multigrid solvers, Berger-Oliger time-stepping (sketched below).
Layer 4: complete parallel applications.
Utility layer: support and interoperability libraries - API for HDF5 I/O, visualization package implemented on top of VTK, C APIs.
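
To make the layering concrete, here is a schematic of Berger-Oliger subcycled time-stepping as it might sit at Layer 3. The functions advanceSingleLevel, interpolateCoarseFineBoundary, averageDown, reflux, and refRatio are descriptive placeholders for the corresponding Layer 1/2 operations, not Chombo's actual signatures:

    // Placeholder declarations for the Layer 1/2 operations this sketch assumes.
    void advanceSingleLevel(int level, double dt);
    void interpolateCoarseFineBoundary(int fineLevel, int substep, int nref);
    void averageDown(int fineLevel, int crseLevel);
    void reflux(int fineLevel, int crseLevel);
    int  refRatio(int level);

    // Schematic Berger-Oliger recursion (Layer 3).
    void advanceAMR(int level, int finestLevel, double dt)
    {
      // Advance this level by one step of size dt.
      advanceSingleLevel(level, dt);

      if (level < finestLevel) {
        int nref = refRatio(level);            // refinement ratio to the next finer level
        for (int k = 0; k < nref; ++k) {
          // Coarse-fine boundary conditions come from space-time interpolation
          // of the coarse solution (Layer 2).
          interpolateCoarseFineBoundary(level + 1, k, nref);
          advanceAMR(level + 1, finestLevel, dt / nref);  // refined in time as well as space
        }
        averageDown(level + 1, level);         // replace coarse data under fine patches
        reflux(level + 1, level);              // correct coarse fluxes at the coarse-fine interface
      }
    }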

Distributed Data on Unions of Rectangles
Provides a general mechanism for distributing data defined on unions of rectangles onto processors, and for communication between processors.
Metadata of which all processors have a copy: a BoxLayout is a collection of boxes and processor assignments.
LevelData<T> and other container classes hold data distributed over multiple processors: for each k = 1, ..., nGrids, an "array" of type T corresponding to the box B_k is located on processor p_k.
Straightforward APIs for copying, exchanging ghost-cell data, and iterating over the arrays on your processor in an SPMD manner (see the sketch below).
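
A minimal usage sketch of these Layer 1 containers, assuming the Chombo names LevelData, DisjointBoxLayout, FArrayBox, DataIterator, and exchange() as described in the Chombo design documentation; treat it as illustrative rather than exact:

    #include "DisjointBoxLayout.H"
    #include "LevelData.H"
    #include "FArrayBox.H"
    #include "IntVect.H"

    void setAndExchange(const DisjointBoxLayout& grids)
    {
      // One component of cell-centered data, with one ghost cell in each direction.
      LevelData<FArrayBox> phi(grids, 1, IntVect::Unit);

      // SPMD iteration: each rank visits only the boxes it owns.
      for (DataIterator dit = phi.dataIterator(); dit.ok(); ++dit) {
        FArrayBox& fab = phi[dit()];
        fab.setVal(0.0);             // fill this rank's patch
      }

      // Fill ghost cells from neighboring patches, including off-processor ones.
      phi.exchange();
    }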

Typical I/O requirements
Loads are balanced to fill the available memory on all processors.
Typical output data size for a single time slice: 10% - 100% of the total memory image.
Current problems scale to processors.
Combustion and astrophysics simulations write one file per processor; other applications use the Chombo API for HDF5.

HDF5 I/O
An HDF5 file is organized like a disk file system: the root group "/" and its groups act like subdirectories, while attributes and datasets act like the files within them.
Attribute: small metadata that multiple processes in an SPMD program can write out redundantly.
Dataset: large data; each processor writes out only the part it owns.
Chombo API for HDF5
Parallel neutral: the processor layout can be changed when the output data are read back in.
Dataset creation is expensive, so only one dataset is created for each LevelData; the data for each patch are written at offsets from the origin of that dataset (see the sketch below).
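
The "one dataset per LevelData, patches written at offsets" strategy can be illustrated with the plain parallel HDF5 C API. This is a generic sketch, not the Chombo HDF5 API itself; the function name, file name, dataset name, and sizes are made up for illustration:

    #include <mpi.h>
    #include <hdf5.h>

    // Each rank writes its own patch of 'count' doubles at 'offset' elements
    // into a single shared 1-D dataset.
    void writePatch(MPI_Comm comm, const double* patch,
                    hsize_t offset, hsize_t count, hsize_t totalSize)
    {
      // Open one file collectively with the MPI-IO driver.
      hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
      H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
      hid_t file = H5Fcreate("plotfile.hdf5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

      // One dataset for the whole level; created once, collectively.
      hsize_t dims[1] = { totalSize };
      hid_t filespace = H5Screate_simple(1, dims, NULL);
      hid_t dset = H5Dcreate2(file, "level_0_data", H5T_NATIVE_DOUBLE, filespace,
                              H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

      // Select this rank's patch as a hyperslab at its offset from the dataset origin.
      hsize_t start[1] = { offset }, cnt[1] = { count };
      H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, cnt, NULL);
      hid_t memspace = H5Screate_simple(1, cnt, NULL);

      hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
      H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);   // collective write
      H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, patch);

      H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
      H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    }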

Performance Analysis (Shan and Shalf, 2006)
Observed performance of HDF5 applications in Chombo: no (weak) scaling.
More detailed measurements indicate two causes: misalignment with disk block boundaries, and lack of aggregation.
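
Two knobs commonly used to attack exactly these two problems are shown below as a hedged sketch: HDF5 object alignment and MPI-IO collective buffering. The threshold, alignment, and hint values are illustrative examples, not measured recommendations from the study:

    #include <mpi.h>
    #include <hdf5.h>

    // Illustrative mitigations for the two measured problems:
    //  - alignment: align sufficiently large HDF5 objects to the file-system block size;
    //  - aggregation: enable MPI-IO collective buffering so a few aggregator ranks
    //    issue large, aligned writes on behalf of everyone else.
    hid_t makeTunedAccessList(MPI_Comm comm)
    {
      MPI_Info info;
      MPI_Info_create(&info);
      MPI_Info_set(info, "romio_cb_write", "enable");   // ROMIO collective-buffering hint

      hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
      // Align objects of 1 MB or more to 4 MB boundaries (example values only;
      // they should match the underlying file system's stripe/block size).
      H5Pset_alignment(fapl, 1 << 20, 4 << 20);
      H5Pset_fapl_mpio(fapl, comm, info);

      MPI_Info_free(&info);
      return fapl;   // use with H5Fcreate/H5Fopen, then H5Pclose when done
    }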

Future Requirements
Weak scaling to 10^4 processors.
The need for finer time resolution will add another 10x in data.
Other data types: sparse data, particles.
One file per processor doesn't scale.
Interfaces to VACET, FastBit.

Potential for Collaboration with SDM
Common AMR data API developed under SciDAC I.
APDEC weak-scaling benchmark for solvers could be extended to I/O.
Minimum buy-in: high-level API, portability, sustained support.