
1 Data Management Needs for Nuclear-Astrophysical Simulation at the Ultrascale
F. Douglas Swesty, DOE Office of Science Data Management Workshop, SLAC, March 16-18

2 Characteristics of Nuclear Astrophysical Simulation Data
Origin:
– Usually from hydrodynamic, MHD, or radiation-transport components of a simulation
– Supernova models, neutron-star mergers, etc.
Disk Access Patterns:
– Data written & read primarily from structured or block-structured AMR grids
– Unstructured grid or particle data is possible
– Writes & reads done via parallelized I/O, usually MPI-I/O (see the sketch below)
– Large number of processes (>= 1024)
– Post-run analysis & viz requires accessing lengthy sequences of large files
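A minimal sketch of the parallelized I/O pattern referred to above, assuming a simple 1-D decomposition; the file name and block size are illustrative, not taken from the project's actual code. It shows a collective MPI-I/O write from many ranks into one shared file.

/* Sketch: each MPI rank writes its own block of grid data into a
 * shared file with collective MPI-I/O.  Illustrative only. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nlocal = 1 << 20;                    /* points per rank  */
    double *u = calloc(nlocal, sizeof(double));    /* local patch data */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "viz_dump.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at its own byte offset; the _all variant is
     * collective, letting the MPI-I/O layer aggregate the requests,
     * which matters at >= 1024 processes. */
    MPI_Offset offset = (MPI_Offset)rank * nlocal * sizeof(double);
    MPI_File_write_at_all(fh, offset, u, nlocal, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(u);
    MPI_Finalize();
    return 0;
}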

3 Characteristics of Nuclear Astrophysical Simulation Data
File Sizes:
– Large for both checkpointing and viz dumps
– Currently ~1 Gbyte; in the near future, ~10's of Gbytes
File Abundances:
– Many (typically 1000's) of files from a single batch job
– Especially true for long-timescale problems
File Access Frequency:
– Low; perhaps only once per run
– Probably will be post-processed/analyzed on a non-ultrascale platform

4 Problem Sizes
A typical 2-D multi-group Boltzmann transport simulation:
(# spatial points) x (# energy bins) x (# angular points) x (# neutrino species + # scratch vectors) x (bytes/variable)
= 256x256 x 50 x 16x16 x (6 + 8) x 8 ≈ 100 Gbytes
A 3-D multi-group flux-limited diffusion model or 3-D hydro model checkpoint file at 1024x1024x1024 resolution will be comparable in size.
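The estimate above can be checked by multiplying out the factors quoted on the slide; the sketch below does exactly that (it prints about 94 Gbytes, which the slide rounds to ~100 Gbytes). The variable names are mine; the counts are the slide's example values.

/* Sketch: reproduce the slide's ~100 Gbyte estimate for a 2-D
 * multi-group Boltzmann transport checkpoint. */
#include <stdio.h>

int main(void)
{
    const double spatial_points   = 256.0 * 256.0;  /* 2-D spatial grid   */
    const double energy_bins      = 50.0;           /* neutrino energies  */
    const double angular_points   = 16.0 * 16.0;    /* angular quadrature */
    const double species_plus_tmp = 6.0 + 8.0;      /* species + scratch  */
    const double bytes_per_var    = 8.0;            /* double precision   */

    double bytes = spatial_points * energy_bins * angular_points
                 * species_plus_tmp * bytes_per_var;

    printf("checkpoint size: %.1f Gbytes\n", bytes / 1.0e9);
    return 0;
}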

5 Distributed Set of Computing and Analysis Sites
Sites: NERSC, ORNL, Indiana U., Stony Brook, NC State, UC San Diego
Long round-trip times between sites:
– Approx. 75 msec for NERSC to Stony Brook
– Bad for interactive visualization & analysis of data (see the estimate below)
– Must use store & forward capabilities of logistical networking
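A back-of-the-envelope sketch of why a 75 msec round trip hurts interactive work: a single TCP stream is limited to roughly window/RTT. The 75 msec figure is from the slide; the 64 KiB window is an assumed, un-tuned default, not a figure from the slides.

/* Sketch: single-stream TCP throughput ceiling = window / RTT. */
#include <stdio.h>

int main(void)
{
    const double rtt_sec      = 0.075;          /* 75 msec round trip */
    const double window_bytes = 64.0 * 1024.0;  /* assumed TCP window */

    double bytes_per_sec = window_bytes / rtt_sec;
    printf("throughput ceiling: %.1f Mbit/s\n", bytes_per_sec * 8.0 / 1.0e6);
    /* ~7 Mbit/s -- far too slow for interactive access to remote Tbyte
     * data sets, hence the store-and-forward approach. */
    return 0;
}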

6 Networking Challenges
Movement of large data sets between compute & user sites:
– Needed for post-run analysis, reconstruction, and visualization
A typical 2-D multi-group Boltzmann transport simulation:
(# spatial points) x (# energy bins) x (# angular points) x (# neutrino species) x (# checkpoint files) x (bytes/variable)
= 256x256 x 50 x 16x16 x 6 x 100 x 8 ≈ 4 Tbytes
A 3-D multi-group flux-limited diffusion model may produce a Terabyte of data.
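To give a feel for the scale of the transfer problem, the sketch below computes the wall-clock time to move the ~4 Tbyte checkpoint series at a few assumed effective bandwidths; only the 4 Tbyte figure comes from the slide, the bandwidths are illustrative assumptions.

/* Sketch: wall-clock time to move ~4 Tbytes at assumed effective rates. */
#include <stdio.h>

int main(void)
{
    const double data_bytes = 4.0e12;                    /* ~4 Tbytes       */
    const double mbit_s[]   = { 100.0, 1000.0, 10000.0 }; /* assumed rates  */

    for (int i = 0; i < 3; i++) {
        double seconds = data_bytes * 8.0 / (mbit_s[i] * 1.0e6);
        printf("%6.0f Mbit/s effective -> %.1f hours\n",
               mbit_s[i], seconds / 3600.0);
    }
    return 0;
}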

7 Data Management Needs of Ultrascale Nuclear Astrophysical Simulation Projects
Parallel I/O:
– Support for portable data formats (HDF5 & netCDF) in data storage and data management products & tools
– Vendors must actively help the developers of these portable data formats
– Support parallel HDF5 & netCDF interfaces for vendor-specific MPI-I/O implementations (see the sketch below)
– Support parallel HDF5 & netCDF on vendor-specific parallel filesystems
– I/O must scale to large (> 2048) process counts
– Asynchronous I/O support via MPI-2 is highly desirable
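A minimal sketch of the kind of collective parallel HDF5 write the slide asks vendors to support on top of their MPI-I/O implementations. The file name, dataset name, and 1-D decomposition are illustrative assumptions, not the project's actual code.

/* Sketch: collective parallel HDF5 checkpoint write over MPI-I/O. */
#include <mpi.h>
#include <hdf5.h>
#include <stdlib.h>

#define NGLOBAL 1048576   /* global number of grid points (example);
                             assumed to divide evenly among ranks */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    hsize_t nlocal = NGLOBAL / nprocs;             /* even 1-D split     */
    hsize_t offset = (hsize_t)rank * nlocal;

    double *rho = malloc(nlocal * sizeof(double)); /* local density slab */
    for (hsize_t i = 0; i < nlocal; i++) rho[i] = 0.0;

    /* Open the file collectively through the MPI-I/O driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("checkpoint.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One global dataset; each rank writes its own hyperslab. */
    hsize_t gdim = NGLOBAL, ldim = nlocal;
    hid_t filespace = H5Screate_simple(1, &gdim, NULL);
    hid_t dset = H5Dcreate2(file, "density", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    hid_t memspace = H5Screate_simple(1, &ldim, NULL);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &ldim, NULL);

    /* Collective transfer so the MPI-I/O layer can aggregate writes. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, rho);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
    free(rho);
    MPI_Finalize();
    return 0;
}

Requesting H5FD_MPIO_COLLECTIVE on the transfer property list is what allows the underlying MPI-I/O layer to aggregate the per-rank hyperslab writes into large, well-aligned requests on the parallel filesystem.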

8 Data Management Needs of Ultrascale Nuclear Astrophysical Simulation Projects
Seamless file access across scratch & tertiary file storage:
– Tertiary storage files accessible via Unix paths directly from the OS
Enable easy Tbyte data transfer between select sites:
– Vital for post-run analysis & visualization
– Integration of storage with Logistical Networking (LBONE) depots
– Increase transfer throughput by using dedicated non-TCP networks for scheduled transfers?
– Automated data migration between sites
Ability to handle large numbers of large files from a single batch job:
– Need lots of scratch space on Ultrascale platforms
– Automate migration of files from scratch to tertiary storage
– Viz & analysis tools need to handle long time-sequences of files from a simulation

