
1 A Centralized Filesystem on the TeraGrid: Using the Data Capacitor to Enhance a Workflow in Astrophysics
Scott Michael, Indiana University UITS/PTI
TeraGrid '10 Conference, August 3, 2010, Pittsburgh, PA

2 Many Thanks To
Indiana University: Stephen Simms, Matt Link, Robert Henschel, Joshua Walgenbach, Nathan Heald, Justin Miller, Thomas William, Thomas Johnson
Mississippi State University: Trey Breckenridge, Roger Smith, Joey Jones, Vince Sanders, Greg Grimes

3 Outlook of This Talk
For the purposes of this talk I will take the perspective of an astrophysicist with an interest in technology
As a domain scientist, my research is the most important research in the world…to me

4 Outline
The science of planet formation and protoplanetary disks
Using the Data Capacitor for this scientific workflow
How useful is a centralized filesystem for other types of workflows?

5 Planet Formation with Numerical Simulations
To date, 393 systems containing 464 planets have been discovered outside the Solar System
Although many gas giant planets have been discovered, their formation is not well understood
We use three-dimensional, radiative, self-gravitating hydrodynamic simulations to study planet formation
Image courtesy NASA/JPL

6 The CHYMERA Code
The IU hydrodynamics code is very mature and includes a variety of physical processes, including
Fluid dynamics on an Eulerian grid
Self-gravitating fluid
Stellar motion via the indirect potential method
Fully consistent radiative physics using a ray method
Inclusion of rocky bodies such as planetary embryos
The code is run on various SMP/ccNUMA HPC resources due to its OpenMP parallelization and scales well to 64 processors
Because the code exhibits weak scaling, and to satisfy our scientific purposes, we use fairly high-resolution grids (17-70 million cells)

7 Large Data Volumes
We use large fixed grids to accurately capture fine detail
Typical grid sizes are 512x64x512 (r, z, ϕ); maximum grid sizes are 1024x128x2048
A full simulation at the typical grid size produces 5.5 TB of data
My dissertation contains 9 such simulations
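As a rough sanity check on these figures, the sketch below recomputes the cell count of the typical grid and estimates a per-snapshot size; the six double-precision fields per cell are an illustrative assumption, not a value from the talk.

```python
# Rough sanity check on the grid sizes quoted above.  The number of fields
# per cell and the precision are illustrative assumptions, not figures
# given in the talk.
nr, nz, nphi = 512, 64, 512          # typical (r, z, phi) grid
cells = nr * nz * nphi
print(f"typical grid: {cells / 1e6:.1f} million cells")   # ~16.8 million

fields, bytes_per_value = 6, 8       # hypothetical: 6 double-precision fields
snapshot_gib = cells * fields * bytes_per_value / 2**30
print(f"one snapshot: ~{snapshot_gib:.2f} GiB")            # ~0.75 GiB
# Thousands of such snapshots over a full run add up to the multi-terabyte
# totals (5.5 TB per simulation) quoted on this slide.
```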

8 Analysis Procedures
Our simulations require shared memory resources to execute, but analysis can be done on distributed clusters, and visualization requires proprietary software and interactivity
In the past we have transferred our data from the HPC facility and stored the data locally to perform the analysis and visualization
Here each arrow represents a login; the files end up on the local file server, which can be accessed from multiple workstations

9 Enter the Data Capacitor
To alleviate these issues we use Indiana University's Data Capacitor (DC) to store and analyze our simulation data
The WAN portion of the DC is 340 TB of spinning disk running the Lustre file system, with a UID/GID mapping scheme developed at IU
The DC facilitates our science in three main ways: storage, file transfer, and workflow

10 The Story Thus Far
Simulations were performed at PSC and NCSA, with data written directly to DC-WAN
We measured WAN write performance to be equivalent to or better than write performance to local storage resources
Data have been analyzed and visualized using IU resources, both in the Astronomy department and on TeraGrid resources at IU
The DC-WAN eliminates the need for the researcher to oversee the transfer of files from resource to resource

11 Add Some Complexity
For this work we mounted DC-WAN at Mississippi State University on their clusters Raptor and Talon
In this instance the clusters at MSU represent an unforeseen surplus of compute cycles
We were able to achieve 50 MB/s reads with the DC-WAN, compared to 105 MB/s reads with a local Lustre setup
But to use the data locally you first have to transfer it; this two-step process would require 95 MB/s throughput in the transfer phase to compete with DC-WAN (see the sketch below)
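My reading of the 95 MB/s figure is that it is the break-even point for a copy-then-read workflow: each byte must first cross the WAN, then be read from local Lustre at 105 MB/s, and the combined rate has to match the 50 MB/s measured for direct DC-WAN reads. A minimal sketch of that arithmetic:

```python
# Break-even transfer rate for the two-step (copy, then read locally) workflow.
# The effective rate of the two-step process is 1 / (1/t + 1/local_read);
# setting it equal to the direct DC-WAN read rate and solving for t:
dcwan_read = 50.0    # MB/s, direct reads over DC-WAN (measured)
local_read = 105.0   # MB/s, reads from the local Lustre setup (measured)

break_even = 1.0 / (1.0 / dcwan_read - 1.0 / local_read)
print(f"required transfer rate: {break_even:.1f} MB/s")  # ~95 MB/s
```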

12 Performance Tools
We used VampirTrace to generate Open Trace Format (OTF) traces with I/O tracing turned on
We then used the otfdump tool to dump the I/O data and wrote programs to combine the data, compute statistics, and plot the results
Lustre has aggressive client-side caching, which can cause apparently inconsistent results
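The analysis programs themselves are not shown in the talk; below is a minimal sketch of the kind of post-processing described, assuming the I/O records extracted with otfdump have already been reduced to a plain-text file with one "<timestamp_seconds> <bytes>" record per operation. That record layout is an assumption for illustration, not otfdump's actual output format.

```python
# Minimal sketch of the post-processing step described on this slide.
# Assumes the I/O events pulled from the OTF trace have been reduced to a
# plain-text file with one operation per line: "<timestamp_seconds> <bytes>".
# That layout is an assumption, not otfdump's actual output format.
import sys
import statistics

def summarize(path):
    times, sizes = [], []
    with open(path) as f:
        for line in f:
            t, b = line.split()
            times.append(float(t))
            sizes.append(int(b))
    total = sum(sizes)
    duration = max(times) - min(times) if len(times) > 1 else 0.0
    print(f"operations:      {len(sizes)}")
    print(f"total data:      {total / 2**20:.1f} MiB")
    print(f"median op size:  {statistics.median(sizes) / 2**10:.1f} KiB")
    if duration > 0:
        print(f"mean throughput: {total / 2**20 / duration:.1f} MiB/s")

if __name__ == "__main__":
    summarize(sys.argv[1])
```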

13 Lessons Learned
The network is typically the bottleneck in a new mount – small amounts of packet loss can kill performance
The larger the network latency, the larger your I/O block size should be to get good performance (this is due to TCP, not Lustre)
Striping may or may not be helpful depending on the WAN
Lustre caching can cause interesting performance measurements
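The latency point follows from the bandwidth-delay product: a single TCP stream can only keep about one BDP of data in flight, so I/O requests much smaller than the BDP leave the link idle. A small illustration with hypothetical link numbers, not measurements from this work:

```python
# Bandwidth-delay product illustration for the latency / block-size lesson.
# The link speed and RTT below are hypothetical example values, not
# measurements from this work.
link_gbps = 10.0   # assumed WAN link speed, Gbit/s
rtt_ms = 30.0      # assumed round-trip time, ms

bdp_bytes = (link_gbps * 1e9 / 8) * (rtt_ms / 1e3)
print(f"bandwidth-delay product: {bdp_bytes / 2**20:.1f} MiB")  # ~35.8 MiB
# A single TCP stream keeps at most ~one BDP in flight, so I/O block sizes
# (and TCP windows) well below this leave the pipe partly idle; the higher
# the latency, the larger the blocks need to be.
```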

14 That’s Great But… Is a centralized filesystem really necessary?
Short answer: No.
Is a centralized filesystem useful to scientists?
Long answer: In some (many) cases, yes. If the tool is made available to researchers, they will discover ways to utilize it and shorten their time to scientific discovery.

15 Some Interesting Cases
Streaming instrument or sensor data: gene sequencing, electron microscopes, CCD imagers on telescopes
Heterogeneous workflow elements: shared memory vs. distributed memory vs. GPUs vs. etc.; batchable vs. interactive
Unforeseen supply or demand

16 Conclusions
The DC-WAN provides a platform for seamlessly bridging TG and non-TG sites
We have had success at a variety of TG sites and a growing number of non-TG campuses (e.g. Mississippi State)
There are many scientific use cases where the DC-WAN can accelerate a researcher's workflow
Get connected by mailing

17 Future Work
Full development and integration of the automated workflow
Additional Data Capacitor mounts at HPC facilities
This material is based upon work supported by the National Science Foundation under Grants No. ACI , OCI , OCI , OCI , and CNS . Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF).

