
1 SCEC Capability Simulations on TeraGrid
Yifeng Cui, San Diego Supercomputer Center

2 SCEC Computational Pathways

3 SCEC Capability Simulations on Kraken and Ranger
ShakeOut-D: 600 x 300 x 80 km domain, 100 m resolution, 14.4 billion grid points, upper frequency limit of 1 Hz, 3 minutes of simulated time, 50k time steps, minimum surface velocity 500 m/s, dynamic source (SGSN), SCEC CVM4.0 velocity properties, 1 terabyte of inputs, 5 terabytes of outputs.
ShakeOut-K: same 600 x 300 x 80 km domain, 100 m resolution, 14.4 billion grid points, upper frequency limit of 1 Hz, 3 minutes of simulated time, 50k time steps, minimum surface velocity 500 m/s, kinematic source, SCEC CVM4.0 velocity properties, 1 terabyte of inputs, 5 terabytes of outputs.
Chino Hills: 180 x 125 x 60 km domain, 50 m resolution, 10.8 billion grid points, 80k time steps, upper frequency limit of 2 Hz, using both the SCEC CVM4 and CVM-H velocity models.
The latest ShakeOut-D run completed within 1.8 hours on 64k Kraken XT5 cores; the ShakeOut-D 2 Hz benchmark achieved a sustained 49 teraflop/s. Source: Yifeng Cui, UCSD
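As a quick consistency check on the quoted sizes (simple arithmetic, not on the slide): 600 km / 100 m = 6000, 300 km / 100 m = 3000, and 80 km / 100 m = 800, so the ShakeOut domains hold 6000 x 3000 x 800 = 14.4 billion grid points, matching the stated figure; the same arithmetic for the Chino Hills domain at 50 m spacing gives 3600 x 2500 x 1200 = 10.8 billion.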

4 Validation of Chino Hills Simulations
Goodness-of-fit at Hz for synthetics relative to data from the M5.4 Chino Hills earthquake. Seismogram comparisons of recorded data (black traces), CVM-S synthetics (red traces), and CVM-H synthetics (blue traces).

5 SCEC Capability Simulations Workflow
Inputs are terabyte-scale, with spatial and temporal locality. Input partitions are transferred between TeraGrid sites; simulation outputs are backed up on TACC Ranch and NICS HPSS. Visualization is done on Ranger.

6 Adapting SCEC Applications to Different TeraGrid Architectures
[Flowchart: runtime configuration of the solver. Media input modes 0-3 and source input modes 0-4 select serial or parallel mesh partitioning and source partitioning/split options; partitions can be saved or read in to exploit spatial and temporal locality; I/O modes, restart settings (0-max checkpoints), surface (sfc) or surface-plus-volume (sfc+vlm) output, optional MD5 checking, output accumulation, and performance measurement are all switchable; storage runs over a SAN switch infrastructure to SAM-QFS and HPSS.]
Source: Cui et al., Toward Petascale Earthquake Simulations, Acta Geotechnica, June 2008

7 Mesh Partitioning
Mesh inputs: Mesh 0, Mesh 1, Mesh 2, …, Mesh N
Four read strategies: serial (part-serial), serial (part-parallel), MPI-IO scattered read, MPI-IO contiguous read

8 Mesh Serial Read
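A minimal sketch of the serial-read strategy, assuming a 1-D slab decomposition where rank 0 reads the global mesh file slab by slab and ships each rank its piece; the dimensions and the file name mesh.bin are illustrative, not from the SCEC code:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int nx = 600, ny = 300, nz = 80;   /* toy global dimensions */
        int nzt = nz / nprocs;                   /* assumes nz divisible by nprocs */
        int count = nx * ny * nzt;               /* elements per rank */
        float *slab = malloc(sizeof(float) * count);

        if (rank == 0) {
            FILE *f = fopen("mesh.bin", "rb");   /* hypothetical input file */
            float *buf = malloc(sizeof(float) * count);
            for (int r = 0; r < nprocs; r++) {   /* one slab per rank, in file order */
                fread(buf, sizeof(float), count, f);
                if (r == 0) memcpy(slab, buf, sizeof(float) * count);
                else MPI_Send(buf, count, MPI_FLOAT, r, 0, MPI_COMM_WORLD);
            }
            free(buf);
            fclose(f);
        } else {
            MPI_Recv(slab, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        /* ... solver would use slab here ... */
        free(slab);
        MPI_Finalize();
        return 0;
    }

The single reader is simple and portable, but all I/O and all sends funnel through one core, which is what limits this approach at scale.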

9 Mesh Partitioned in Advance
Data locality: each rank reads its own pre-partitioned local file.
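A sketch of the pre-partitioned variant, assuming the mesh was split offline into one file per rank (the mesh.%d naming and the block size are hypothetical): every rank opens only its own file, so the read is embarrassingly parallel.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int nxt = 100, nyt = 100, nzt = 80;        /* toy per-rank block */
        float *block = malloc(sizeof(float) * nxt * nyt * nzt);

        char fname[64];
        snprintf(fname, sizeof fname, "mesh.%d", rank);  /* one file per rank */
        FILE *f = fopen(fname, "rb");                    /* local read, no contention */
        fread(block, sizeof(float), (size_t)nxt * nyt * nzt, f);
        fclose(f);

        /* ... solver would use block here ... */
        free(block);
        MPI_Finalize();
        return 0;
    }

The cost moves to the offline partitioning step and to managing npx*npy*npz files, which is the trade-off the comparison slide below captures.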

10 Mesh MPI-IO Scattered Read
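A minimal sketch of a scattered MPI-IO read, where every rank describes its 3-D sub-block of the global mesh with a subarray filetype and all ranks read collectively from one shared file; the dimensions and the 3-D decomposition are illustrative, not from the SCEC code:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int gsizes[3] = {80, 300, 600};          /* toy global nz, ny, nx */
        int dims[3] = {0, 0, 0};
        MPI_Dims_create(nprocs, 3, dims);        /* factor ranks into a 3-D grid */
        /* illustrative row-major rank-to-coordinate mapping; assumes the
         * global sizes are divisible by the process-grid dimensions */
        int coords[3] = { rank / (dims[1] * dims[2]),
                          (rank / dims[2]) % dims[1],
                          rank % dims[2] };
        int lsizes[3] = { gsizes[0] / dims[0], gsizes[1] / dims[1],
                          gsizes[2] / dims[2] };
        int starts[3] = { coords[0] * lsizes[0], coords[1] * lsizes[1],
                          coords[2] * lsizes[2] };

        MPI_Datatype filetype;
        MPI_Type_create_subarray(3, gsizes, lsizes, starts,
                                 MPI_ORDER_C, MPI_FLOAT, &filetype);
        MPI_Type_commit(&filetype);

        int count = lsizes[0] * lsizes[1] * lsizes[2];
        float *block = malloc(sizeof(float) * count);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "mesh.bin", MPI_MODE_RDONLY,
                      MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, 0, MPI_FLOAT, filetype, "native", MPI_INFO_NULL);
        MPI_File_read_all(fh, block, count, MPI_FLOAT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Type_free(&filetype);
        free(block);
        MPI_Finalize();
        return 0;
    }

The view makes each rank's read noncontiguous in the file ("scattered"), and the collective MPI_File_read_all lets the MPI-IO layer aggregate those small pieces into larger requests.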

11 Mesh MPI-IO Contiguous Read
Read XY planes contiguously, then redistribute the data (data continuity).
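A sketch of the two-phase pattern this slide describes: each rank first reads an equal, contiguous run of XY planes (fast streaming I/O), then an all-to-all exchange moves the data toward each rank's final sub-block. The pack/unpack index math for a real decomposition is elided, and the dimensions are illustrative:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int nx = 600, ny = 300, nz = 80;   /* toy global dimensions */
        int planes = nz / nprocs;                /* contiguous planes per rank */
        int chunk = nx * ny * planes;            /* assumes divisibility */

        float *planes_buf = malloc(sizeof(float) * chunk);
        float *block      = malloc(sizeof(float) * chunk);

        /* Phase 1: contiguous read at a rank-dependent byte offset. */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "mesh.bin", MPI_MODE_RDONLY,
                      MPI_INFO_NULL, &fh);
        MPI_Offset off = (MPI_Offset)rank * chunk * sizeof(float);
        MPI_File_read_at_all(fh, off, planes_buf, chunk, MPI_FLOAT,
                             MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        /* Phase 2: redistribute; a real code would pack planes_buf
         * according to the target 3-D decomposition before exchanging. */
        MPI_Alltoall(planes_buf, chunk / nprocs, MPI_FLOAT,
                     block,      chunk / nprocs, MPI_FLOAT, MPI_COMM_WORLD);

        free(planes_buf);
        free(block);
        MPI_Finalize();
        return 0;
    }

The file system sees only large contiguous reads, and the network, which is usually faster than the disks, absorbs the redistribution cost.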

12 Comparisons of Mesh Approaches
Approaches compared: serial I/O; serial I/O with partitioned local files; MPI-IO scattered read; MPI-IO contiguous read with data redistribution.
Performance: low (serial I/O), high (partitioned local files), medium (MPI-IO)
Scalability: poor (serial I/O), system-dependent, good
Number of files: 1, except npx*npy*npz for partitioned local files
Memory requirement (elements): nxt*nyt*nzt per core; for the contiguous read, nx*ny per core on the nz sender cores and nxt*nyt*nzt per core on all receiver cores
Communication overhead: none, apart from the redistribution step of the contiguous read
Collective I/O: no for the serial approaches, yes for MPI-IO
Stripe count (recommended): small for the serial approaches, large for MPI-IO
Stripe size (recommended): big; bigger (nx*ny) for the contiguous read

13 Source Partitioning
Source inputs: Source 1, Source 2, Source 3, …, each partitioned into time-step windows (time steps 1-600, …). Read strategies: serial (part-serial), serial (part-parallel), MPI-IO scattered read. A sketch of windowed source reading follows.
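A sketch of temporally partitioned source input, assuming the source time histories were split offline into files of 600 time steps each (the 600-step window follows the slide; the file naming, counts, and layout are hypothetical): the solver reads one window at a time, so only the current window has to sit in memory.

    #include <stdio.h>
    #include <stdlib.h>

    #define WIN 600                        /* time steps per source window */

    /* Read window w (time steps w*WIN+1 .. (w+1)*WIN) for nsrc point sources. */
    float *read_source_window(int w, int nsrc) {
        char fname[64];
        snprintf(fname, sizeof fname, "source_win%03d.bin", w);  /* hypothetical */
        FILE *f = fopen(fname, "rb");
        float *buf = malloc(sizeof(float) * (size_t)nsrc * WIN);
        fread(buf, sizeof(float), (size_t)nsrc * WIN, f);
        fclose(f);
        return buf;
    }

    int main(void) {
        int nsrc = 1000, nsteps = 50000;   /* toy values */
        for (int step = 0; step < nsteps; step += WIN) {
            float *win = read_source_window(step / WIN, nsrc);
            /* ... apply these WIN steps of source terms, advance solver ... */
            free(win);
        }
        return 0;
    }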

14

15

16 Synchronous Communication

17 Synchronous Communication
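These two slides animate the baseline exchange. A minimal sketch of a synchronous ghost-cell exchange in a toy 1-D decomposition, using blocking MPI_Sendrecv (array size and setup are illustrative): the solver cannot update any point until the exchange completes.

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int n = 1 << 20;                    /* toy local array size */
        float *u = calloc(n + 2, sizeof(float));  /* u[0], u[n+1] are ghosts */
        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Blocking exchange: send my edge values, receive the neighbors'. */
        MPI_Sendrecv(&u[1], 1, MPI_FLOAT, left,  0,
                     &u[n + 1], 1, MPI_FLOAT, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[n], 1, MPI_FLOAT, right, 1,
                     &u[0], 1, MPI_FLOAT, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... stencil update over u[1..n], only after both exchanges ... */
        free(u);
        MPI_Finalize();
        return 0;
    }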

18 Asynchronous Communication

19 Asynchronous Communication

20 Asynchronous Communication

21 Asynchronous Communication
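By contrast, the asynchronous version these slides animate posts nonblocking sends and receives, updates the interior points that need no ghost data while the messages are in flight, and only then waits and finishes the boundary. A minimal sketch under the same toy 1-D setup as above:

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int n = 1 << 20;
        float *u = calloc(n + 2, sizeof(float));  /* u[0], u[n+1] are ghosts */
        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        MPI_Request req[4];
        MPI_Irecv(&u[0],     1, MPI_FLOAT, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&u[n + 1], 1, MPI_FLOAT, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&u[n],     1, MPI_FLOAT, right, 0, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&u[1],     1, MPI_FLOAT, left,  1, MPI_COMM_WORLD, &req[3]);

        /* Overlap: interior points u[2..n-1] need no ghost data, so they
         * can be updated while the messages are still in flight. */
        /* ... stencil update over u[2..n-1] ... */

        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

        /* Boundary points u[1] and u[n] need the ghosts; finish them last. */
        /* ... stencil update for u[1] and u[n] ... */

        free(u);
        MPI_Finalize();
        return 0;
    }

Hiding the communication behind the interior update is what lets the exchange cost disappear whenever the interior work takes longer than the message flight time.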

22

23

24 SCEC Capability Simulations Performance on TeraGrid
[Performance table; entries marked * are benchmark runs.]

25 Other efforts in progress supporting SCEC larger-scale simulations
Single-CPU optimization: division is very expensive, and by reducing division work we have observed performance improvements of 25-45% on up to 8k cores (see the sketch after this list)
Workflow: an end-to-end approach to automate the procedures of capability simulations
Restructuring the code to prepare it as SCEC community code, emphasizing modularity, re-usability, and ease of integration
Developing a hybrid code with two-level MPI/OpenMP parallelism
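On the division point: a common pattern of this kind of optimization (sketched below with illustrative names; the actual SCEC kernels differ) is to hoist a division out of the inner loop by precomputing a reciprocal once and multiplying, since a floating-point divide typically costs many times more cycles than a multiply.

    #include <stddef.h>

    /* Naive: one divide per grid point, every time step. */
    void scale_div(float *out, const float *in, float h, size_t n) {
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] / h;
    }

    /* Optimized: one divide total, then only multiplies in the hot loop. */
    void scale_mul(float *out, const float *in, float h, size_t n) {
        const float inv_h = 1.0f / h;    /* hoisted reciprocal */
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] * inv_h;      /* multiply is far cheaper */
    }

The two versions can differ in the last bit of the result, which is why compilers will not make this transformation on their own under strict floating-point rules.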

26 Acknowledgements
This work has received technical support from several TeraGrid sites, in particular: Tommy Minyard and Karl Schultz of TACC; Kwai Lam Wong and Bruce Loftis of NICS; Amit Chourasia of SDSC; and SCEC collaborators.

