Crystal Ball Panel ORNL Heterogeneous Distributed Computing Research Al Geist ORNL March 6, 2003 SOS 7.


1 Crystal Ball Panel
ORNL Heterogeneous Distributed Computing Research
Al Geist, ORNL
March 6, 2003, SOS 7

2 Look into the Future
Eight Ball: Reply Hazy Try Again
Topics: Petascale systems; Federated Tera-clusters; HPC Linux; Adaptable software; Fault Tolerance; High performance I/O

3 Scalable Systems Software for Terascale Centers
Part of the DOE SciDAC effort
Areas: Resource Management; Accounting & user mgmt; System Build & Configure; Job management; System Monitoring
Goal: Collectively (with labs, NSF centers, and industry) define standard interfaces between systems components for interoperability. Create scalable, standardized management tools for efficiently running our large computing centers.
www.scidac.org/ScalableSystems
Participants: IBM, Cray, Intel, Unlimited Scale, ORNL, ANL, LBNL, PNNL, NCSA, PSC, SDSC, SNL, LANL, Ames

4 Progress so far on Integrated Suite
Meta Services: Meta Scheduler; Meta Monitor; Meta Manager
Components: Grid Interfaces; Accounting; Event Manager; Service Directory; Scheduler; Node State Manager; Allocation Management; Process Manager; Usage Reports; System & Job Monitor; Job Queue Manager; Node Configuration & Build Manager; Checkpoint / Restart; Validation & Testing; Hardware Infrastructure Manager
Standard XML interfaces; authentication; communication
Components written in any mixture of C, C++, Java, Perl, and Python
Working Components and Interfaces (bold)
Important!
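The slide above says the components talk through standard XML interfaces and may be written in any mix of languages. The following is a minimal Python sketch of what one such exchange could look like, for example between a queue manager and a scheduler. The element names (`JobRequest`, `Nodes`, `WalltimeMinutes`) and both function names are hypothetical, chosen only for illustration; the slides do not give the actual message schema.

```python
import xml.etree.ElementTree as ET


def build_job_request(job_id, nodes, walltime_minutes):
    """Serialize a job request as XML text (hypothetical schema, not from the slides)."""
    root = ET.Element("JobRequest", attrib={"id": job_id})
    ET.SubElement(root, "Nodes").text = str(nodes)
    ET.SubElement(root, "WalltimeMinutes").text = str(walltime_minutes)
    return ET.tostring(root, encoding="unicode")


def parse_job_request(xml_text):
    """Parse the XML text back into a plain dict on the receiving component."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "nodes": int(root.findtext("Nodes")),
        "walltime_minutes": int(root.findtext("WalltimeMinutes")),
    }


if __name__ == "__main__":
    message = build_job_request("job-1", 64, 30)
    print(message)
    print(parse_job_request(message))
```

Because the message is plain XML text, the receiving component only needs an XML parser in its own language, which is presumably what lets the suite mix C, C++, Java, Perl, and Python.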

5 Scalable High Performance OS
Single System Img; Adaptive O/S; Asymmetric Kernels; A scalable file system
Underneath it all: What will it be?
Linux; Lightweight kernel (like Red, BG/L); Scyld approach; Other?
Rogue OS and/or daemons cited as problem by existing computer centers
Fast-OS effort

6 Need a Fault Tolerance Overhaul
Scale up and Fall Down
Fault Tolerance serious issue when scaling to 100 TF and beyond
RAS critical
Checkpointing eventually becomes ineffective
[Figure labels: Time; Scale; MTBF; Ckpt restart]
Needs: Adaptive runtime; MPI Fault Tolerance; New FT paradigms
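The claim that checkpointing eventually becomes ineffective can be illustrated with a rough model. The sketch below is not from the slides: it assumes independent node failures (so system MTBF is node MTBF divided by node count) and uses Young's first-order approximation for the checkpoint interval, sqrt(2 x checkpoint time x MTBF). The node MTBF and checkpoint time in the usage loop are made-up illustrative values.

```python
import math


def system_mtbf(node_mtbf_hours, nodes):
    """System MTBF, assuming independent node failures (an assumption)."""
    return node_mtbf_hours / nodes


def young_interval(ckpt_hours, mtbf_hours):
    """Young's first-order approximation of the best checkpoint interval."""
    return math.sqrt(2.0 * ckpt_hours * mtbf_hours)


def wasted_fraction(ckpt_hours, mtbf_hours):
    """Approximate fraction of run time lost to checkpoint writes plus rework.

    First-order estimate: checkpoint overhead (C / tau) plus expected
    recomputation after a failure (tau / (2 * MTBF)), capped at 1.0.
    """
    tau = young_interval(ckpt_hours, mtbf_hours)
    waste = ckpt_hours / tau + tau / (2.0 * mtbf_hours)
    return min(waste, 1.0)


if __name__ == "__main__":
    node_mtbf = 50000.0   # hours per node, illustrative assumption
    ckpt_time = 0.25      # hours per checkpoint, illustrative assumption
    for nodes in (1000, 10000, 100000):
        mtbf = system_mtbf(node_mtbf, nodes)
        print(nodes, mtbf, wasted_fraction(ckpt_time, mtbf))
```

In this model the wasted fraction grows as the square root of (checkpoint time / MTBF), so once system MTBF shrinks toward the time needed to write a checkpoint, little useful work gets done. That is one way to read the slide's call for new fault tolerance paradigms.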

7 General Purpose vs Simple and Custom
Hardware: Customized clusters for each group; Centralized general purpose machine; Internet in a box, or “out of the box”
Software: Minimum OS w/ high performance but limited app support; Full OS; Tuned to hardware, adapt on the fly; Autonomic algorithms
Petascale Paths

8 Big Science
Science will ultimately be driven by computation, simulation and modeling.
Science drivers are key to success in HPC and vice versa.
The final word - don’t lose track of why we justify petascale systems

