Presentation transcript:

1 From Innovation to Impact: Research and Development in Data Intensive High Performance Computing. Indiana University, 9/14/2011. Scott A. Klasky, klasky@ornl.gov. Special thanks to: H. Abbasi 2, Q. Liu 2, J. Logan 1, M. Parashar 6, N. Podhorszki 1, K. Schwan 4, A. Shoshani 3, M. Wolf 4, S. Ahern 1, I. Altintas 9, W. Bethel 3, V. Bhat 6, S. Canon 3, L. Chacon 1, C. S. Chang 10, J. Chen 5, H. Childs 3, T. Critchlow 16, J. Cummings 13, C. Docan 6, G. Eisenhauer 4, S. Ethier 10, B. Geveci 8, G. Gibson 17, R. Grout 7, R. Kendall 1, J. Kim 3, T. Kordenbrock 5, Z. Lin 10, J. Lofstead 5, B. Ludaescher 18, X. Ma 15, K. Moreland 5, R. Oldfield 5, V. Pascucci 12, M. Polte 17, J. Saltz 18, N. Samatova 15, W. Schroeder 8, R. Tchoua 1, Y. Tian 14, R. Vatsavai 1, M. Vouk 15, K. Wu 3, W. Yu 14, F. Zhang 6, F. Zheng 4. Affiliations: 1 ORNL, 2 U.T. Knoxville, 3 LBNL, 4 Georgia Tech, 5 Sandia Labs, 6 Rutgers, 7 NREL, 8 Kitware, 9 UCSD, 10 PPPL, 11 UC Irvine, 12 U. Utah, 13 Caltech, 14 Auburn University, 15 NCSU, 16 PNNL, 17 CMU, 18 Emory.

2 Outline: The ADIOS story; Step 1: Identification of the problem; Step 2: Understand the application use cases; Step 3: Investigate related research; Step 4: Solve the problem; Step 5: Innovate; Step 6: Impact; Step 7: Future. Work supported under ASCR: CPES, SDM, Runtime Staging, SAP, OLCF, Co-design; OFES: GPSC, GSEP; NSF: EAGER, RDAV; NASA: ROSES.

3 We want our system to be so easy, even a chimp can use it. ("Even I can use it!") Sustainable, fast, scalable, portable.

4 The ADIOS story: Initial partnership with GTC (PPPL). Study the code's I/O: POSIX, netCDF, and HDF5 used. Requirement: visual monitoring, so data streaming APIs were created. Catamount forces the use of MPI-IO for checkpoint-restarts (SDM team), but with 30% overhead. Assemble a team (ORNL, Georgia Tech, Rutgers) to create a new I/O abstraction. A new file format is introduced to reduce the I/O impact and increase resiliency. Introduce an SOA to answer new challenges from the CPES team. New questions arise: reduce I/O variability (Sandia); faster reads needed (Auburn); the S3D team wants the reads even faster! Staging techniques introduced for reading (NREL).

5 The ADIOS story, continued: Couple multi-scale, multi-physics codes as separate executables (NYU, Caltech). Data reduction becomes necessary (NCSU). Index the data with FastBit (LBNL). Optimize the I/O for visualization (Kitware, Sandia, LBNL). Introduce workflow automation for in situ processing (SDSC). IBM BG/P requirements (NCSU). Integrate U.Q. into ADIOS staging for co-design (U. Texas). Abstract data monitoring with the I/O abstraction (ORNL). Image processing and spatio-temporal queries (ORNL, Emory). Cloud analysis requirements (Wisconsin).

6 Step 1: Identify the problem

7 Extreme scale computing. Trends: more FLOPS; a limited number of users at the extreme scale. Problems: performance, resiliency, debugging, and getting science done, and these problems will get worse. We need a "revolutionary" way to store, access, and debug data to get the science done! ASCI Purple (49 TB, 140 GB/s) to JaguarPF (300 TB, 200 GB/s); most people get < 5 GB/s at scale. From J. Dongarra, "Impact of Architecture and Technology for Extreme Scale on Software and Algorithm Design," Cross-cutting Technologies for Computing at the Exascale, February 2-5, 2010.

8 I/O challenges for the extreme scale. Application complexity: as the complexity of the physics increases, data grows, across checkpoint-restart (write), analysis and visualization (read/write), and coupled application formulations (read/write). System/hardware constraints: our systems are growing by 2x FLOPS/year while disk bandwidth is growing ~20%/year; as the systems grow, the MTTF falls; shared file systems add further stress. We need new and innovative approaches to cope with these problems! (Diagram: JaguarPF 224K, Sith 1K, Lens 512, and the MDS, 1 node, sharing the SAN.)

9 Next generation I/O and file system challenges. At the architecture or node level: use increasingly deep memory hierarchies coupled with new memory properties. At the system level: cope with I/O rates and volumes that stress the interconnect, can severely limit application performance, and can consume unsustainable levels of power. At the exascale: immense aggregate I/O needs, with potentially uneven loads placed on underlying resources, can result in data hotspots, interconnect congestion, and similar issues.

10 Step 2: Understand the application use cases

11 Applications are the main drivers. Our team works directly with Fusion: XGC1, GTC, GTS, GTC-P, Pixie3D, M3D, GEM; Combustion: S3D; Relativity: PAMR; Nuclear physics; Astrophysics: Chimera, Flash; and many more application teams. Requirements (in order): fast I/O with low overhead; simplicity in using I/O; self-describing file formats; QoS. Extras, if possible: the ability to visualize data from the simulation; couple multiple codes; perform in situ workflows.

12 Step 3: Investigate related research

13 Parallel netCDF: a new file format to allow for large array support; new optimizations for non-blocking calls. The idea is to allow netCDF to work in parallel and to support large files and large arrays. 2011 paper: "A Case Study for Scientific I/O: Improving the FLASH Astrophysics Code," http://www.mcs.anl.gov/uploads/cels/papers/P1819.pdf. (Chart annotations: 5% of peak, 0.8% of peak.)

14 Parallel netCDF-4 / HDF5: uses HDF5 for the file format; keeps backward compatibility in tools to read netCDF-3 files; HDF5-optimized chunking; new journaling techniques to handle resiliency; many other optimizations. (Chart: ~5 GB/s; VORPAL, 40^3, 1.4 MB/proc.) http://www.hdfgroup.org/pubs/papers/howison_hdf5_lustre_iasds2010.pdf

15 Step 4: Solve the problem

16 The "early days" (2001): reduce I/O overhead for 1 TB of data from the GTC code and provide "real-time" monitoring. S. Klasky, S. Ethier, Z. Lin, K. Martins, D. McCune, R. Samtaney, "Grid-Based Parallel Data Streaming Implemented for the Gyrokinetic Toroidal Code," SC 2003. V. Bhat, S. Klasky, S. Atchley, M. Beck, D. McCune, M. Parashar, "High Performance Threaded Data Streaming for Large Scale Simulations," 5th IEEE/ACM International Workshop on Grid Computing (Grid 2004). Key ideas: focus on I/O and the WAN with an application-driven approach; buffer data and combine the I/O requests from all variables into one write call; thread the I/O; write data out on the receiving side; visualize the data in near-real-time; focus on the 5% rule.

17 Adaptable I/O System (ADIOS): provides portable, fast, scalable, easy-to-use, metadata-rich output with a simple API. Change the I/O method by changing the XML input file. Layered software architecture: allows plug-ins for different I/O implementations and abstracts the API from the method used for I/O. Open source: http://www.nccs.gov/user-support/center-projects/adios/. High writing performance: S3D, 32 GB/s with 96K cores at 1.9 MB/core (0.6% I/O overhead with ADIOS); XGC1, 40 GB/s; SCEC, 30 GB/s; GTC, 40 GB/s; GTS, 35 GB/s.
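
To make the "simple API plus XML" claim concrete, here is a minimal sketch of what an ADIOS 1.x-era write path can look like in C. It is illustrative only: the exact function signatures vary across ADIOS releases, and the group name ("restart"), the variable names, and the config.xml contents are assumptions for this example. The point is that switching from, say, MPI-IO to a staging transport is done in the XML file, not in this code.

```c
/* Sketch of an ADIOS 1.x-style write path.  Signatures and names are
 * illustrative and may differ from a given release; the I/O method
 * (POSIX, MPI-IO, staging, ...) is chosen in the XML file, not here. */
#include <mpi.h>
#include <stdint.h>
#include "adios.h"   /* assumed ADIOS public header */

void write_checkpoint(MPI_Comm comm, int rank, int nx, const double *temperature)
{
    int64_t  handle;                                   /* ADIOS output handle */
    uint64_t group_size = sizeof(int) + (uint64_t)nx * sizeof(double);
    uint64_t total_size;

    adios_init("config.xml", comm);                    /* parse the XML I/O description  */
    adios_open(&handle, "restart", "restart.bp", "w", comm);
    adios_group_size(handle, group_size, &total_size); /* let ADIOS size its buffers     */
    adios_write(handle, "NX", &nx);                    /* names must match the XML group */
    adios_write(handle, "temperature", (void *)temperature);
    adios_close(handle);                               /* may complete asynchronously    */
    adios_finalize(rank);
}
```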

18 Performance optimizations for write. New technologies are often held back by how difficult it is to extract performance from them, and next-generation I/O frameworks must address this concern by separating the task of optimization from the actual description of the I/O. Innovate to deal with high I/O variability and to increase the "average" I/O performance. (Chart: standard deviation of I/O time.)

19 Understand common read access patterns. Restarts: an arbitrary number of processors reading. All of a few variables on an arbitrary number of processors. All of one variable. Full planes. Sub-planes. Sub-volumes. Conclusion: file formats that chunk the data routinely achieved better read performance than logically contiguous file formats.

20 Understand why (examine reading a 2D plane from a 3D dataset): use a Hilbert curve to place chunks on the Lustre file system with an Elastic Data Organization. (Charts: theoretical concurrency; observed performance.)
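
The Hilbert-curve placement above can be pictured with a small, self-contained sketch: the standard 2-D Hilbert distance-to-coordinate conversion maps a position along the curve back to chunk coordinates, and a round-robin assignment along that order spreads spatially adjacent chunks across storage targets, so a plane read touches many servers at once. This is only an illustration of the general technique, not the Elastic Data Organization code; the 8x8 chunk grid and four storage targets are made-up parameters.

```c
#include <stdint.h>
#include <stdio.h>

/* Convert a distance d along a 2-D Hilbert curve over an n x n grid
 * (n a power of two) into (x, y) chunk coordinates.  Classic iterative form. */
static void hilbert_d2xy(uint64_t n, uint64_t d, uint64_t *x, uint64_t *y)
{
    uint64_t rx, ry, t = d;
    *x = *y = 0;
    for (uint64_t s = 1; s < n; s *= 2) {
        rx = 1 & (t / 2);
        ry = 1 & (t ^ rx);
        if (ry == 0) {                      /* rotate/flip the quadrant */
            if (rx == 1) {
                *x = s - 1 - *x;
                *y = s - 1 - *y;
            }
            uint64_t tmp = *x; *x = *y; *y = tmp;
        }
        *x += s * rx;
        *y += s * ry;
        t /= 4;
    }
}

int main(void)
{
    const uint64_t n = 8;                   /* hypothetical 8x8 grid of chunks      */
    const int num_servers = 4;              /* hypothetical number of storage targets */
    for (uint64_t d = 0; d < n * n; d++) {
        uint64_t x, y;
        hilbert_d2xy(n, d, &x, &y);
        /* Round-robin along the Hilbert order spreads spatially nearby chunks
         * across servers, so a 2-D plane query hits many servers concurrently. */
        printf("chunk (%llu,%llu) -> server %d\n",
               (unsigned long long)x, (unsigned long long)y,
               (int)(d % num_servers));
    }
    return 0;
}
```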

21 Step 5: Innovate

22 SOA philosophy. The overarching design philosophy of our framework is based on the Service-Oriented Architecture, used to deal with system/application complexity, rapidly changing requirements, evolving target platforms, and diverse teams. Applications are constructed by assembling services, based on a universal view of their functionality, using a well-defined API. Service implementations can be changed easily, and an integrated simulation can be assembled from these services. The goal is to manage complexity while maintaining performance and scalability: complexity from the problem (complex physics), complexity from the codes, complexity of the underlying disruptive infrastructure, and complexity from coordination across codes and research teams.

23 SOA scales (Yahoo data challenges): data diversity; a rich set of processing, not just database queries (SQL) but analytics (transformation, aggregation, ...); scalable solutions; leverage the file system's high bandwidth; multiple ways to represent data; deal with reliability; make it easy to use (self-management, self-tuning); make it easy to change (adaptability). They do this for $$$, at 24 PB/day and growing. If Yahoo and Google can do it, so can we!

24 Designing the system: why staging? It decouples file system performance variations and limitations from application run time, and it enables optimizations based on a dynamic number of writers. High-bandwidth data extraction from the application uses RDMA, and scalable data movement over shared resources requires us to manage the transfers.

25 Our discovery of data scheduling for asynchronous I/O: the scheduling technique matters. Naïve scheduling (continuous drain) can be slower than synchronous I/O, while smart scheduling can be scalable and much better than synchronous I/O.
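
The difference between a continuous drain and a scheduled drain can be seen in a toy example: a helper thread moves staged data to storage, but only while the application reports that it is in a compute phase, so the drain avoids competing with the application's own communication. This is a hedged sketch of the idea under invented names (stage_buf, in_compute_phase), not the actual DataStager/DART scheduler.

```c
/* Toy "scheduled drain" for asynchronous I/O: the drainer thread writes staged
 * data only while the application reports it is computing.  All names and
 * structure are illustrative only. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define STAGE_BYTES (1 << 20)

static char   stage_buf[STAGE_BYTES];
static size_t staged = 0;               /* bytes waiting to be drained */
static int    in_compute_phase = 0;     /* set by the application      */
static int    done = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *drainer(void *arg)
{
    FILE *out = fopen("staged_output.bin", "wb");   /* stands in for the PFS */
    for (;;) {
        pthread_mutex_lock(&lock);
        int finished  = done && staged == 0;
        int can_drain = in_compute_phase && staged > 0;
        if (can_drain) {
            fwrite(stage_buf, 1, staged, out);      /* drain while the app computes */
            staged = 0;
        }
        pthread_mutex_unlock(&lock);
        if (finished) break;
        if (!can_drain) usleep(1000);               /* back off during comm phases  */
    }
    fclose(out);
    return arg;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, drainer, NULL);
    for (int step = 0; step < 10; step++) {
        pthread_mutex_lock(&lock);
        in_compute_phase = 0;                       /* "communication" phase        */
        pthread_mutex_unlock(&lock);
        usleep(2000);

        pthread_mutex_lock(&lock);
        memset(stage_buf, step, STAGE_BYTES);       /* app deposits output locally  */
        staged = STAGE_BYTES;
        in_compute_phase = 1;                       /* compute phase: drain allowed */
        pthread_mutex_unlock(&lock);
        usleep(5000);                               /* pretend to compute           */
    }
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    return 0;
}
```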

26 Staging Revolution, Phase 1: monolithic staging applications. Multiple pipelines are separate staging areas; data movement is between each staging code; high level of fault tolerance. (Diagram: application, staging process 1, staging process 2, and storage, with the staging processes in the staging area.)

27 The approach allows for in-memory code coupling: a semantically-specialized virtual shared space, constructed on-the-fly on the cloud of staging nodes, indexes data for quick access and retrieval and complements existing interaction/coordination mechanisms. In-memory code coupling becomes part of the I/O pipeline.
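
As a way to picture the virtual shared space, the sketch below shows a toy put/get interface in which a producer publishes a named, versioned block of a global array and a consumer retrieves any overlapping sub-region by its bounds. The function names (space_put, space_get) and the single-node, in-memory backing store are invented for illustration; this is not the real DataSpaces API, which indexes and serves data across a set of staging nodes.

```c
/* Hypothetical sketch of a coupling-oriented shared space.  NOT the real
 * DataSpaces interface: a real staging service distributes and indexes data
 * across staging nodes instead of one local array. */
#include <stdio.h>

#define NX 8
#define NY 8

typedef struct { int lb[2], ub[2]; } region2d_t;     /* inclusive bounds */

static double global_store[NX][NY];                  /* stands in for staging memory */
static int    stored_version = -1;

static void space_put(const char *var, int version,
                      const region2d_t *r, const double *data)
{
    (void)var;                                       /* one variable in this toy */
    int k = 0;
    for (int i = r->lb[0]; i <= r->ub[0]; i++)
        for (int j = r->lb[1]; j <= r->ub[1]; j++)
            global_store[i][j] = data[k++];
    stored_version = version;
}

static int space_get(const char *var, int version,
                     const region2d_t *r, double *data)
{
    (void)var;
    if (version != stored_version) return -1;        /* not yet published */
    int k = 0;
    for (int i = r->lb[0]; i <= r->ub[0]; i++)
        for (int j = r->lb[1]; j <= r->ub[1]; j++)
            data[k++] = global_store[i][j];
    return 0;
}

int main(void)
{
    /* Producer (e.g., an edge code) publishes its 8x4 block of "field". */
    double block[NX * 4];
    for (int i = 0; i < NX * 4; i++) block[i] = (double)i;
    region2d_t mine = { {0, 0}, {NX - 1, 3} };
    space_put("field", 0, &mine, block);

    /* Consumer (e.g., a core code) asks for a different, overlapping region. */
    double ghost[2 * 4];
    region2d_t want = { {3, 0}, {4, 3} };
    if (space_get("field", 0, &want, ghost) == 0)
        printf("ghost[0]=%g ghost[7]=%g\n", ghost[0], ghost[7]);
    return 0;
}
```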

28 Comparison of performance against MCT. (Charts: M=512, N=2048 and M=1024, N=4096.)

29 Staging Revolution, Phase 2: introducing plugins, a disruptive step. Staging codes are broken down into multiple plugins; the staging area is launched as a framework that launches plugins; data movement occurs between all plugins; a scheduler decides resource allocations. (Diagram: application plus plugins A through E in the staging area.)
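
One way to picture the plugin framework is as a small table of callbacks that the staging framework chains over the data stream, so the output of one plugin feeds the next. The struct and callback names below are hypothetical, chosen to illustrate the pattern rather than to reproduce the actual ADIOS/PreDatA plugin interface.

```c
/* Hypothetical plugin interface for a staging framework.  All names are
 * invented for illustration; this is not the actual ADIOS/PreDatA API. */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    int  (*init)(void **state);
    int  (*process)(void *state, const void *in, size_t in_len,
                    void **out, size_t *out_len);
    void (*finalize)(void *state);
} staging_plugin_t;

/* Example plugin: computes a running byte sum (a stand-in for analytics). */
static int sum_init(void **state) { *state = calloc(1, sizeof(long)); return 0; }
static int sum_process(void *state, const void *in, size_t in_len,
                       void **out, size_t *out_len)
{
    long *total = state;
    const unsigned char *p = in;
    for (size_t i = 0; i < in_len; i++) *total += p[i];
    *out = (void *)in;                    /* pass the data through unchanged */
    *out_len = in_len;
    return 0;
}
static void sum_finalize(void *state)
{
    printf("byte sum: %ld\n", *(long *)state);
    free(state);
}

static staging_plugin_t plugins[] = {
    { "byte-sum", sum_init, sum_process, sum_finalize },
    /* a scheduler would append e.g. compression, indexing, vis extraction */
};

int main(void)
{
    unsigned char data[16] = {1, 2, 3, 4};
    void *buf = data; size_t len = sizeof data;
    size_t n = sizeof plugins / sizeof plugins[0];
    void *states[8];
    for (size_t i = 0; i < n; i++) plugins[i].init(&states[i]);
    for (size_t i = 0; i < n; i++)        /* chain the plugins over the buffer */
        plugins[i].process(states[i], buf, len, &buf, &len);
    for (size_t i = 0; i < n; i++) plugins[i].finalize(states[i]);
    return 0;
}
```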

30 Staging Revolution, Phase 3: managing the staging area. Resources are co-allocated to plugins; plugins are executed consecutively, with co-located plugins requiring no data movement; the scheduler attempts to minimize data movement. (Diagram: application plus co-located plugins in the managed staging area.)

31 Staging Revolution, Phase 4: hybrid staging. Plugins may be executed in the application space; the scheduler decides which plugin is migrated upstream to the application; the resiliency of the staging architecture is maintained. (Diagram: plugins executed within the application partition as well as in the managed staging area.)

32 Move work to data (I/II): runtime placement decisions, dynamic code generation, and filter specialization. Staging functions are implemented in C-on-Demand (CoD), and CoD functions can be transparently moved from staging nodes to application nodes or vice versa.

33 Move work to data (II/II): ActiveSpaces, dynamic binary code deployment and execution. Programming support for defining custom data kernels in a native programming language; kernels operate only on the data of interest; a runtime system handles binary code transfer and execution in the staging area.

34 Data management (data reduction / in situ). Objectives: develop a lossy compression methodology to reduce "incompressible" spatio-temporal double-precision scientific data; ensure high compression rates with ISABELA while adhering to user-controlled accuracy bounds; provide fast and accurate approximate query processing directly over ISABELA-compressed scientific data. Impact: up to 85% reduction in storage on data from petascale simulation applications (S3D, GTS, XGC, ...); guaranteed bounds on per-point error, with overall 0.99 correlation and 0.01 NRMSE against the original data; low storage requirement; a communication-free, minimal-overhead technique that is easily pluggable into any simulation application. (Chart: ISABELA compression accuracy at 75% reduction in size, analysis of XGC1 temperature data.)
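
The user-controlled, per-point accuracy bound can be illustrated with a small sketch: decimate the signal, reconstruct the skipped points by linear interpolation, and fall back to storing the exact value wherever the reconstruction would exceed the requested relative error. This is not the ISABELA algorithm itself; the synthetic signal, the decimation stride, and the 1% bound are invented here purely to illustrate the error-bound check.

```c
/* Illustration of a user-controlled per-point error bound.  NOT ISABELA:
 * it simply decimates, reconstructs by linear interpolation, and counts the
 * points that a real codec would have to store exactly to keep the bound. */
#include <math.h>
#include <stdio.h>

#define N      1024
#define STRIDE 8                          /* keep every 8th sample exactly */

int main(void)
{
    double x[N], rel_bound = 0.01;        /* user-controlled 1% per-point bound */
    for (int i = 0; i < N; i++)           /* smooth synthetic "simulation" data */
        x[i] = 10.0 + sin(0.05 * i) + 0.1 * sin(0.7 * i);

    int corrections = 0;
    for (int i = 0; i < N; i++) {
        int lo = (i / STRIDE) * STRIDE;
        int hi = lo + STRIDE < N ? lo + STRIDE : N - 1;
        double t = hi > lo ? (double)(i - lo) / (hi - lo) : 0.0;
        double approx = (1.0 - t) * x[lo] + t * x[hi];   /* reconstruction */
        double rel_err = fabs(approx - x[i]) / fmax(fabs(x[i]), 1e-12);
        if (rel_err > rel_bound)
            corrections++;                /* a real codec stores x[i] exactly here */
    }
    printf("stored exactly: %d anchors + %d corrections out of %d points\n",
           N / STRIDE + 1, corrections, N);
    printf("every reconstructed point stays within %.1f%% of the original\n",
           100.0 * rel_bound);
    return 0;
}
```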

35 Impact for the future. Data movement is a deterrent to effective use of the data: output costs increase the runtime and force the user to reduce the output data, and input costs can account for up to 90% of the total running time of scientific workflows. Our goal is to create an infrastructure that makes it easy for you to forget the technology and focus on your science!

36 eSiMon collaborative monitoring. It is critical for multiple scientists to monitor their simulation when they run at scale, and these scientists generally look for different pieces of the output and require different levels of simplicity in the technology. Provenance is used for post-processing. Saves money and time.

37 From research to innovation to impact: principles. Understand the important problems in the field. Partner with the "best" researchers in the field. Form teams based on each member's ability to deliver, and avoid overlap as much as possible. Understand the field and its gaps as we move forward, in terms of application involvement, technology changes, and software issues, and grow the software. Further develop the software so that it can be sustainable.

38 Step 6: Impact

39 Community highlights. Fusion: 2006 Battelle Annual Report, cover image, http://www.battelle.org/annualreports/ar2006/default.htm; 2/16/2007, Supercomputing Keys Fusion Research, http://www.hpcwire.com/hpcwire/2007-02-16/supercomputing_keys_fusion_research-1.html; 7/14/2008, Researchers Conduct Breakthrough Fusion Simulation, http://www.hpcwire.com/hpcwire/2008-07-14/researchers_conduct_breakthrough_fusion_simulation.html; 8/27/2009, Fusion Gets Faster, http://www.hpcwire.com/hpcwire/2009-07-27/fusion_gets_faster.html; 4/21/2011, Jaguar Supercomputer Harnesses Heat for Fusion Energy, http://www.hpcwire.com/hpcwire/2011-04-19/jaguar_supercomputer_harnesses_heat_for_fusion_energy.html. Combustion: 9/29/2009, ADIOS Ignites Combustion Simulations, http://www.hpcwire.com/hpcwire/2009-10-29/adios_ignites_combustion_simulations.html. Computer science: 2/12/2009, Breaking the Bottleneck in Computer Data Flow, http://ascr-discovery.science.doe.gov/bigiron/io3.shtml; 9/12/2010, Say Hello to the NEW ADIOS, http://www.hpcwire.com/hpcwire/2010-09-12/say_hello_to_the_new_adios.html; 2/28/2011, OLCF, Partners Release eSiMon Dashboard Simulation Tool, http://www.hpcwire.com/hpcwire/2011-02-28/olcf_partners_release_esimon_dashboard_simulation_tool.html; 7/21/2011, ORNL ADIOS Team Releases Version 1.3 of Adaptive Input/Output System, http://www.hpcwire.com/hpcwire/2011-07-21/ornl_adios_team_releases_version_1.3_of_adaptive_input_output_system.html.

40 2008–2011 I/O Pipeline Publications. SOA: 1. S. Klasky, et al., "In Situ Data Processing for Extreme-Scale Computing," to appear, SciDAC 2011. 2. C. Docan, J. Cummings, S. Klasky, M. Parashar, "Moving the Code to the Data – Dynamic Code Deployment Using ActiveSpaces," IPDPS 2011. 3. F. Zhang, C. Docan, M. Parashar, S. Klasky, "Enabling Multi-Physics Coupled Simulations within the PGAS Programming Framework," CCGrid 2011. 4. C. Docan, S. Klasky, M. Parashar, "DataSpaces: An Interaction and Coordination Framework for Coupled Simulation Workflows," HPDC 2010, ACM, Chicago, IL; journal version in Journal of Cluster Computing, 2011. 5. J. Cummings, S. Klasky, et al., "EFFIS: An End-to-End Framework for Fusion Integrated Simulation," PDP 2010, http://www.pdp2010.org/. 6. C. Docan, J. Cummings, S. Klasky, M. Parashar, N. Podhorszki, F. Zhang, "Experiments with Memory-to-Memory Coupling for End-to-End Fusion Simulation Workflows," CCGrid 2010, IEEE Computer Society Press, 2010. 7. N. Podhorszki, S. Klasky, et al., "Plasma Fusion Code Coupling Using Scalable I/O Services and Scientific Workflows," SC-WORKS 2009. Data Formats: 8. Y. Tian, S. Klasky, H. Abbasi, J. Lofstead, R. Grout, N. Podhorszki, Q. Liu, Y. Wang, W. Yu, "EDO: Improving Read Performance for Scientific Applications Through Elastic Data Organization," to appear, Cluster 2011. 9. Y. Tian, "SRC: Enabling Petascale Data Analysis for Scientific Applications Through Data Reorganization," ICS 2011, first place, Student Research Competition. 10. M. Polte, J. Lofstead, J. Bent, G. Gibson, S. Klasky, "...And Eat It Too: High Read Performance in Write-Optimized HPC I/O Middleware File Formats," in Proceedings of the Petascale Data Storage Workshop 2009 at Supercomputing 2009. 11. S. Klasky, et al., "Adaptive IO System (ADIOS)," Cray User Group Meeting 2008.

41 2008–2011 I/O Pipeline Publications (continued). Data Staging: 12. H. Abbasi, G. Eisenhauer, S. Klasky, K. Schwan, M. Wolf, "Just In Time: Adding Value to IO Pipelines of High Performance Applications with JITStaging," HPDC 2011. 13. H. Abbasi, M. Wolf, G. Eisenhauer, S. Klasky, K. Schwan, F. Zheng, "DataStager: Scalable Data Staging Services for Petascale Applications," Cluster Computing, Springer, ISSN 1386-7857, pp. 277-290, vol. 13, issue 3, 2010. 14. C. Docan, M. Parashar, S. Klasky, "Enabling High-Speed Asynchronous Data Extraction and Transfer Using DART," Concurrency and Computation: Practice and Experience 22(9): 1181-1204, 2010. 15. H. Abbasi, M. Wolf, G. Eisenhauer, S. Klasky, K. Schwan, F. Zheng, "DataStager: Scalable Data Staging Services for Petascale Applications," in Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing (HPDC 2009), Garching, Germany, June 11-13, 2009, ACM, New York, NY, 39-48. 16. H. Abbasi, J. Lofstead, F. Zheng, S. Klasky, K. Schwan, M. Wolf, "Extending I/O through High Performance Data Services," Cluster Computing 2009, New Orleans, LA, August 2009. 17. C. Docan, M. Parashar, S. Klasky, "Enabling High Speed Asynchronous Data Extraction and Transfer Using DART," Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC), Boston, MA, IEEE Computer Society Press, June 2008. 18. H. Abbasi, M. Wolf, K. Schwan, S. Klasky, "Managed Streams: Scalable I/O," HPDC 2008.

42 2008–2011 I/O Pipeline Publications. Usability of Optimizations: 19. J. Lofstead, F. Zheng, Q. Liu, S. Klasky, R. Oldfield, T. Kordenbrock, K. Schwan, M. Wolf, "Managing Variability in the IO Performance of Petascale Storage Systems," in Proceedings of SC10, New Orleans, LA, November 2010. 20. J. Lofstead, M. Polte, G. Gibson, S. Klasky, K. Schwan, R. Oldfield, M. Wolf, "Six Degrees of Scientific Data: Reading Patterns for Extreme Scale Science IO," HPDC 2011. 21. Y. Xiao, I. Holod, W. L. Zhang, S. Klasky, Z. H. Lin, "Fluctuation Characteristics and Transport Properties of Collisionless Trapped Electron Mode Turbulence," Physics of Plasmas, 17, 2010. 22. J. Lofstead, F. Zheng, S. Klasky, K. Schwan, "Adaptable Metadata Rich IO Methods for Portable High Performance IO," IPDPS 2009, IEEE Computer Society Press, 2009. 23. J. Lofstead, F. Zheng, S. Klasky, K. Schwan, "Input/Output APIs and Data Organization for High Performance Scientific Computing," PDSW at SC 2008. 24. J. Lofstead, S. Klasky, K. Schwan, N. Podhorszki, C. Jin, "Flexible IO and Integration for Scientific Codes Through the Adaptable IO System," Challenges of Large Applications in Distributed Environments (CLADE), June 2008.

43 2008–2011 I/O Pipeline Publications. Data Management: 25. J. Kim, H. Abbasi, C. Docan, S. Klasky, Q. Liu, N. Podhorszki, A. Shoshani, K. Wu, "Parallel In Situ Indexing for Data-Intensive Computing," accepted, LDAV 2011. 26. T. Critchlow, et al., "Working with Workflows: Highlights from 5 Years Building Scientific Workflows," SciDAC 2011. 27. S. Lakshminarasimhan, N. Shah, S. Ethier, S. Klasky, R. Latham, R. Ross, N. F. Samatova, "Compressing the Incompressible with ISABELA: In Situ Reduction of Spatio-Temporal Data," Euro-Par 2011. 28. S. Lakshminarasimhan, J. Jenkins, I. Arkatkar, Z. Gong, H. Kolla, S. H. Ku, S. Ethier, J. Chen, C. S. Chang, S. Klasky, R. Latham, R. Ross, N. F. Samatova, "ISABELA-QA: Query-Driven Analytics with ISABELA-Compressed Extreme-Scale Scientific Data," to appear, SC 2011. 29. K. Wu, R. Sinha, C. Jones, S. Ethier, S. Klasky, K. L. Ma, A. Shoshani, M. Winslett, "Finding Regions of Interest on Toroidal Meshes," to appear, Journal of Computational Science and Discovery, 2011. 30. J. Logan, S. Klasky, H. Abbasi, et al., "Skel: Generative Software for Producing I/O Skeletal Applications," submitted to the Workshop on D3Science, 2011. 31. E. Schendel, Y. Jin, N. Shah, J. Chen, C. S. Chang, S.-H. Ku, S. Ethier, S. Klasky, R. Latham, R. Ross, N. Samatova, "ISOBAR Preconditioner for Effective and High-Throughput Lossless Data Compression," submitted to ICDE 2012. 32. I. Arkatkar, J. Jenkins, S. Lakshminarasimhan, N. Shah, E. Schendel, S. Ethier, C. S. Chang, J. Chen, H. Kolla, S. Klasky, R. Ross, N. Samatova, "ALACRITY: Analytics-Driven Lossless Data Compression for Rapid In-Situ Indexing, Storing, and Querying," submitted to ICDE 2012. 33. Z. Gong, S. Lakshminarasimhan, J. Jenkins, H. Kolla, S. Ethier, J. Chen, R. Ross, S. Klasky, N. Samatova, "Multi-Level Layout Optimizations for Efficient Spatio-Temporal Queries of ISABELA-Compressed Data," submitted to ICDE 2012. 34. Y. Jin, S. Lakshminarasimhan, N. Shah, Z. Gong, C. S. Chang, J. Chen, S. Ethier, H. Kolla, S. Ku, S. Klasky, R. Latham, R. Ross, K. Schuchardt, N. Samatova, "S-Preconditioner for Multi-Fold Data Reduction with Guaranteed User-Controlled Accuracy," ICDM 2011.

44 2008–2011 I/O Pipeline Publications. Simulation Monitoring: 35. R. Tchoua, S. Klasky, N. Podhorszki, B. Grimm, A. Khan, E. Santos, C. Silva, P. Mouallem, M. Vouk, "Collaborative Monitoring and Analysis for Simulation Scientists," in Proceedings of CTS 2010. 36. P. Mouallem, R. Barreto, S. Klasky, N. Podhorszki, M. Vouk, "Tracking Files in the Kepler Provenance Framework," in Proceedings of the 21st International Conference on Scientific and Statistical Database Management (SSDBM 2009), M. Winslett (ed.), Springer-Verlag, 273-282, 2009. 37. E. Santos, J. Tierny, A. Khan, B. Grimm, L. Lins, J. Freire, V. Pascucci, C. Silva, S. Klasky, R. Barreto, N. Podhorszki, "Enabling Advanced Visualization Tools in a Web-Based Simulation Monitoring System," eScience 2009. 38. R. Barreto, S. Klasky, N. Podhorszki, P. Mouallem, M. Vouk, "Collaboration Portal for Petascale Simulations," International Symposium on Collaborative Technologies and Systems, pp. 384-393, Baltimore, Maryland, May 2009. 39. R. Barreto, S. Klasky, C. Jin, N. Podhorszki, "Framework for Integrating End-to-End SDM Technologies and Applications (FIESTA)," CHI 2009 workshop, The Changing Face of Digital Science, 2009. 40. S. Klasky, et al., "Collaborative Visualization Spaces for Petascale Simulations," in the 2008 International Symposium on Collaborative Technologies and Systems (CTS 2008).

45 2008–2011 I/O Pipeline Publications. In Situ Processing: 41. A. Shoshani, et al., "The Scientific Data Management Center: Available Technologies and Highlights," SciDAC 2011. 42. A. Shoshani, S. Klasky, R. Ross, "Scientific Data Management: Challenges and Approaches in the Extreme Scale Era," SciDAC 2010. 43. F. Zheng, H. Abbasi, C. Docan, J. Lofstead, Q. Liu, S. Klasky, M. Parashar, N. Podhorszki, K. Schwan, M. Wolf, "PreDatA – Preparatory Data Analytics on Peta-Scale Machines," IPDPS 2010, IEEE Computer Society Press, 2010. 44. S. Klasky, et al., "High Throughput Data Movement," in Scientific Data Management: Challenges, Existing Technologies, and Deployment, A. Shoshani and D. Rotem (eds.), Chapman and Hall, 2009. 45. G. Eisenhauer, M. Wolf, H. Abbasi, S. Klasky, K. Schwan, "A Type System for High Performance Communication and Computation," submitted to the Workshop on D3Science, 2011. Misc.: 46. C. S. Chang, S. Ku, et al., "Whole-Volume Integrated Gyrokinetic Simulation of Plasma Turbulence in Realistic Diverted-Tokamak Geometry," SciDAC 2009. 47. C. S. Chang, S. Klasky, et al., "Towards a First-Principles Integrated Simulation of Tokamak Edge Plasmas," SciDAC 2008. 48. Z. Lin, Y. Xiao, W. Deng, I. Holod, C. Kamath, S. Klasky, Z. Wang, H. Zhang, "Size Scaling and Nondiffusive Features of Electron Heat Transport in Multi-Scale Turbulence," IAEA 2010. 49. Z. Lin, Y. Xiao, I. Holod, W. Zhang, W. Deng, S. Klasky, J. Lofstead, C. Kamath, N. Wichmann, "Advanced Simulation of Electron Heat Transport in Fusion Plasmas," SciDAC 2009.

46 Step 7: Future

47 Integrating new technologies. New collaborations, blended with current collaborators, to identify the gaps and who can deliver solutions. New techniques for I/O pipeline optimization on HPC: utilizing new memory technologies more effectively; a hybrid staging area; PGAS-like programming models for in situ processing. Integrating with clouds: extend ADIOS to stream 'analysis' data from the staging area to 'cloud' resources; allow researchers to build components in the open framework; explore cloud file systems to match the programming models used for HPC analytics; allow for extra levels of fault tolerance; explore different programming models for HPC + clouds.

48 Possible integration of ADIOS + Twister. ADIOS componentizes data movement for I/O pipelines; replace Twister's communication mechanism with the ADIOS APIs, which allow for RDMA, socket, and file data movement. What changes in ADIOS will be necessary to support Twister's requirements? How well will this work with a cloud file system? (Diagram: disk/stream into Map, disk/stream into Reduce, with iterations in memory between Map and Reduce; ADIOS methods for low latency map to RDMA/memory references and for high latency/persistence to the file system; Twister (Pub/Sub/Collectives) service; Cloud/HPC file system.)

49 Conclusions and future directions. Strong leadership and a complete understanding of the domain allow a community to lead from innovation. ADIOS, developed by many institutions, is a proven DOE framework that blends together: self-describing file formats for high levels of I/O performance; staging methods with visualization and analysis "plugins"; advanced data management plugins for data reduction, multi-resolution data reduction, and advanced workflow scheduling; multiplexing using a novel SOA approach; and added levels of resiliency for next-generation knowledge discovery in HPC. Future work should build on these successes to integrate HPC and cloud technology. Our goal is to deliver breakthrough science and get the job done, no matter what it takes.

