B5: Exascale Hardware. Capability Requirements Several different requirements –Exaflops/Exascale single application –Ensembles of Petaflop apps requiring.


1 B5: Exascale Hardware

2 Capability Requirements
Several different requirements
–Exaflops/Exascale single application
–Ensembles of Petaflop apps requiring Exaflops-years
–Streaming/Realtime
–I/O intensive (e.g., analysis, data mining)
Not considering capacity

3 Exaflops are Possible
Extrapolation of the Top500 suggests 1 EF in 2019
DOE (through ASCI and LCF) has contributed to staying on this trajectory
–May require investment to stay on this trajectory
–History shows Federal investment accelerated top systems
–May not get usable FLOPS (non-LINPACK) without investment
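To make the extrapolation concrete, here is a minimal sketch of the growth rate that a 1 EF system in 2019 implies. The starting point (roughly 1 PF on LINPACK in 2008) is an assumption added for illustration and does not come from the slides; the function names are hypothetical.

```python
import math


def implied_doubling_time(perf_start, year_start, perf_target, year_target):
    """Years per doubling needed to grow from perf_start to perf_target."""
    doublings = math.log2(perf_target / perf_start)
    return (year_target - year_start) / doublings


def extrapolate(perf_start, year_start, doubling_years, year):
    """Project performance forward assuming a constant doubling time."""
    return perf_start * 2 ** ((year - year_start) / doubling_years)


# Assumption (not from the slides): about 1 PF LINPACK in 2008.
START_FLOPS, START_YEAR = 1e15, 2008
# From the slide: 1 EF in 2019.
TARGET_FLOPS, TARGET_YEAR = 1e18, 2019

doubling = implied_doubling_time(START_FLOPS, START_YEAR, TARGET_FLOPS, TARGET_YEAR)
print(f"Implied doubling time: {doubling:.2f} years")
print(f"Projected 2019 performance: {extrapolate(START_FLOPS, START_YEAR, doubling, 2019):.3e} FLOPS")
```

Under these assumptions the top system has to double roughly every 1.1 years for eleven years in a row, which is why the slide ties staying on the trajectory to continued investment.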

4 Components of an Exascale System
It's not just FLOPS. Need
–Processors
–Interconnect
–Memory
–I/O (persistent storage)
–Connection to the outside world
–Balance of these
Constraints include
–Power
–Cooling
–Reliability
–Adoption by applications, particularly legacy, and including a familiar development environment
–Cost :)

5 Example Commodity Design

6 Notes on Commodity Design
Based on Jeff Vetter's extrapolation of current technology
–Details in ORNL presentation
Does not preserve the performance ratios (e.g., bytes/flop interconnect bandwidth) commonly expected
–This is not new; e.g., PC memory/disk size ratios have changed significantly
Most (all?) Exascale system designs will mandate some changes in those ratios
–R&D can either reduce the change in the ratio or reduce the impact of the change (e.g., new algorithms)
–E.g., more specialized systems may provide better cost/perf for specific application classes
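The notion of "performance ratios" can be illustrated with a short sketch that computes a few balance ratios for two systems and shows how they shift. Every number below is hypothetical and chosen only for illustration; none of it is taken from the commodity design or the ORNL extrapolation, and the names `SystemSpec`, `balance_ratios`, and `ratio_change` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class SystemSpec:
    name: str
    peak_flops: float                # floating-point operations per second
    memory_bytes: float              # total main memory
    interconnect_bytes_per_s: float  # aggregate interconnect bandwidth
    disk_bytes: float                # total persistent storage


def balance_ratios(s):
    """Return the system balance ratios discussed on the slide."""
    return {
        "memory_bytes_per_flop": s.memory_bytes / s.peak_flops,
        "interconnect_bytes_per_flop": s.interconnect_bytes_per_s / s.peak_flops,
        "disk_to_memory": s.disk_bytes / s.memory_bytes,
    }


def ratio_change(old, new):
    """Factor by which each ratio changes going from old to new."""
    r_old, r_new = balance_ratios(old), balance_ratios(new)
    return {key: r_new[key] / r_old[key] for key in r_old}


# Hypothetical systems; the values are placeholders, not real designs.
petascale = SystemSpec("hypothetical petascale", 1e15, 2e14, 1e14, 1e16)
exascale = SystemSpec("hypothetical exascale", 1e18, 1e16, 1e16, 1e18)

for key, factor in ratio_change(petascale, exascale).items():
    print(f"{key}: changes by a factor of {factor:.2f}")
```

A factor below 1 means the resource grows more slowly than peak FLOPS, which is the kind of shift that algorithms would need either to tolerate or to be redesigned around.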

7 Issues (concerns)
There are possible hazards:
–Interconnect performance: latency, bandwidth
–I/O: density, bandwidth, fault management
–Memory: cost, power (and latency and bandwidth)
–Power: 4M PS3s deliver 1 EF but use 1 GW
–Latency/bandwidth/faults/concurrency
–Software and algorithms: working around (or with) latency/bandwidth/faults/concurrency
Non-issue: getting the peak FLOPS
All of these can (must) benefit from research and development investment
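The power figure follows from simple arithmetic on the slide's own numbers (4 million PS3 units, 1 EF single precision, 1 GW). A minimal sketch of that calculation, with variable names chosen only for illustration:

```python
# Figures taken from the slides.
UNITS = 4e6          # 4M PS3 consoles
TOTAL_FLOPS = 1e18   # 1 EF (single precision)
TOTAL_WATTS = 1e9    # 1 GW

# Derived per-unit and system-level quantities.
flops_per_unit = TOTAL_FLOPS / UNITS       # FLOPS each console must deliver
watts_per_unit = TOTAL_WATTS / UNITS       # power budget per console
flops_per_watt = TOTAL_FLOPS / TOTAL_WATTS  # system-wide efficiency

print(f"Per unit: {flops_per_unit / 1e9:.0f} GFLOPS at {watts_per_unit:.0f} W")
print(f"Efficiency: {flops_per_watt / 1e9:.0f} GFLOPS/W")
```

That works out to 250 GFLOPS and 250 W per unit, or about 1 GFLOPS per watt. The raw FLOPS are reachable, but the power draw is what makes this configuration impractical.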

8 Alternate Directions
Commodity
–GPGPU and STI Cell offer very high compute density wrt commodity CPU
–Ex. 4M PS3 = 1EF (single precision)
–But: not all algorithms can effectively use these systems; programming complexity (currently) much greater
–Embedded processors (better FLOPS/Watt)
New Architectures
–PIM, FPGA-centric, …
Not in this time frame
–Quantum, molecular, DNA, …

9 Suggestions
Need multiple architectures (no one right answer)
Approaches
–Integrated solution (e.g., BG)
–Component solution (e.g., Cray)
–Not general purpose (e.g., GPGPU, FPGA, GRAPE)

10 Promising Tech
Tech that can improve balance (ratios) in system; cost, reliability, etc.
Optimizing the use of die space for CPU (manycore, multicore, stream, vector, heterogeneous, variable precision arithmetic, etc.)
Optical network (faster signaling, cheaper/denser connectors)
Optical into/out of the processor
3-D chips, integrated memory/processor
Faster development of customized processors
Hardware accelerated system verification (e.g., RAMP)
NAND Flash, MRAM, and other non-volatile memory (disk replacements)
Myriad approaches to power efficiency

11 Cross Cutting Issues
Better characterization of algorithm requirements wrt system ratios
New algorithms to match system ratios
–Disk I/O/main memory
–Interconnect bandwidth/flops
–Etc.
New algorithms/software to detect and handle faults
New approaches to algorithms/software for specialized/disruptive processor architectures
–E.g., good ways to move apps to GPGPUs, PIMs, or FPGAs
Need to accelerate applications and algorithms (esp. new ones) to PF now to prepare for EF
Programming Language and Environments
–PGAS, Domain-specific, auto-tuner, hierarchical programming models (built on current models)
–Interaction with hardware (e.g., user-managed caches, remote atomic updates, etc.)
–Performance modeling and debugging
–Productivity etc.
–System software, OS (e.g., memory management)

12 Sample Plan Components
Point studies for future
–Like the Petaflops point designs, with more application/algorithm designer involvement and including the OS. Evaluate time/cost to get apps running on system. Ongoing process; contrast with baseline
Early simulation and modeling of systems, algorithms, and applications (see open source below) incl. hardware (e.g., RAMP), particularly wrt promising technologies
Evaluate special purpose architectures and non-MPI programming models for application/algorithm classes (cheaper, faster, better)
Partnerships for disruptive technologies
–Need to understand timeline and costs
–Goal is to accelerate; not required for Exaflops
Directed vendor partnerships
–QCDOC is a good example
Support application involvement from the beginning
–WRT point designs, with performance understanding
–Must encourage new apps to increase community size
Some Principles
–Open source
–Support multiple prototypes (at suitable scale)
–Establish a framework to move from point studies to full systems through multiple stages

