
1 NCAR Science Day, April 17, 2015. Shawn Strande, High-end Services Section

2 Procurement Context

3 Yellowstone Supports a Huge Amount of Science*
– 10.17M user jobs (6.41M on Yellowstone; 3.76M on Geyser & Caldera)
– 1.28B core-hours consumed
– 99.16% availability
– 9.9 PB, 608M files, and 26M directories on GLADE
– 43.7 PB and 195M files in the archive
* Data from 12/20/12 to 3/31/15

4 Yellowstone is just hitting its stride, so why not keep running it? We will! Yellowstone will overlap with NWSC-2 to reduce technical risk, running through 2017 under a previously negotiated, extended maintenance agreement. But it is risky to run Yellowstone much beyond the end of 2017. The new system will deliver higher performance and better power efficiency, and will prepare us for some of the major changes we expect in the HPC landscape beyond NWSC-2.

5 NWSC-2 Context
– Yellowstone is an order of magnitude larger than its predecessor, and the most complex HPC system ever deployed at NCAR.
– HPC technology is undergoing unprecedented shifts: vendor consolidation, processor frequencies that have maxed out, the emergence of accelerators, and new storage architectures.
– NWSC-2 will be a “bridging” procurement: it needs to address production computing and future architecture changes simultaneously.
– NCAR’s community applications need refactoring to leverage current and future architectures.

6 Procurement Approach

7 Yellowstone’s Data-Centric Architecture
[Architecture diagram: supercomputers, data analysis and visualization systems, data transfer gateways, and science gateways share a central file system holding scratch, project space, and data collections; HPSS provides long-term tape storage, with connections to people and other centers.]

8 Procurement Approach
– Procure an HPC system and integrated high-performance storage
– The GLADE file system will be refreshed and cross-mounted to NWSC-2 to support complex workflows
– Significant background research in understanding emerging technologies, and open dialogue with peer HPC centers
– Design is driven by balancing Yellowstone usage metrics and future science requirements
– Begin optimizing applications for new architectures now via the Intel Parallel Computing Center, SPOC, and other research initiatives and partnerships
– Issue a Draft Technical Specification (RFI) to seek input on the approach
– Extend Yellowstone through 2017
– Maintain funding flexibility for future options

9 Procurement timeline
– July 2014: Develop and share with NSF the NCAR Next-Generation HPC Procurement Strategy
– October 20, 2014: Draft Technical Specification (DTS) and Yellowstone workload study released
– December 19, 2014: Feedback on DTS due
– February 2015: Benchmarking files and instructions released
– April 3, 2015: Formal Request for Proposals released
– May 19, 2015: Proposals due (45 days)
– 2H-2015: Vendor selection, negotiation, NSF approval
– 1H-2016: Complete NWSC Module A preparations
– 2H-2016: Equipment delivery, acceptance, friendly-user period
– January 1, 2017: Production
– December 31, 2017: Decommission Yellowstone

10 NWSC-2 is a “data-driven” procurement: metrics tell us how Yellowstone is being used. A shout-out to Sophie Yang, our SIParCS intern from Rutgers, who helped develop the custom queries for Yellowstone job statistics!

11 Science Requirements Advisory Panel (SRAP)
– Headed by Rich Loft, the SRAP is a series of NCAR and Wyoming stakeholder meetings to gain insight into future science drivers for NWSC-2.
– Reviewed current Yellowstone usage metrics, with the goal of extrapolating to 2017.
– Discussed future science plans and translated them into system requirements. Outcomes were integrated into the final design specification.
– Related work: the Yellowstone Workflow Working Group conducted a series of interviews to better understand issues of data workflow, storage, and DAV usage.

12 NWSC-2 Highlights
– A highly efficient, usable, and reliable HPC resource that achieves a minimum of 2x the performance of Yellowstone, based on a workload representative of NCAR applications. The majority of the system will be based on traditional processors.
– Deploy a high-performance storage and parallel file system that achieves 400 GB/s and a capacity of 20 PB.
– Integrate with NCAR’s existing high-performance, GPFS-based storage system. The current GLADE persists.
– Enable the transition to future architectures by providing a small fraction of the system based on many-core processors, if and when they are market-ready.
– Technical options: many-core nodes; GPGPU nodes; DAV nodes; storage expansion; innovative storage and memory technologies; future expansions of the HPC system and storage; maintenance.
– Deploy the system in 2H-2016, with production availability on January 1, 2017.
– Estimated funding of $33M, available through a “Best Value” procurement.
– Deploy the systems in Module A at NWSC.

13 The HPC Futures Lab @ ML gives us hands-on experience with next-generation hardware and software
– Next-generation Xeon x86 processors
– Many-core processors
– GPGPUs/GPUs
– Various MPI implementations
– Compilers, debuggers, and tools
– Interconnect technologies
– Storage and file systems
– Schedulers
– Systems and cluster management tools
– OS distributions

14 Preparing for Future Architectures
Strategic Parallel Optimization for Computing (SPOC)
– Application performance improvements achieved through optimization are likely to surpass in importance those obtained from architectural improvements alone.
– Focused on existing Xeon processors (see the sketch below).
– This will increase the amount of productive science that can be performed on Yellowstone today.
– Improvements made now will accrue to future systems.
Intel Parallel Computing Center for Weather and Climate Simulation (IPCC-WACS)
– An Intel-funded effort focused on future technologies, involving computer scientists, graduate students, Intel, and international partners.
– Its aim is to develop the tools and knowledge needed to improve the development and performance of CESM, WRF, and MPAS on Intel Xeon and Xeon Phi architectures.
– A partnership of NCAR, the University of Colorado Boulder, and IIS in Bangalore, India.
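To make the SPOC item above concrete, here is a minimal sketch of the kind of node-level change this optimization work typically involves on current Xeon processors: restructuring a loop for unit-stride access and giving the compiler a vectorization hint. The function, field names, and pragma usage are illustrative assumptions, not code taken from SPOC, CESM, WRF, or MPAS.

    /* Illustrative only: a unit-stride update loop with an OpenMP SIMD hint,
     * the kind of change that lets the compiler emit packed vector code on
     * Xeon hardware. Names (t, tend_t, dt) are hypothetical. */
    #include <stddef.h>

    void apply_tendencies(size_t n,
                          double *restrict t,            /* temperature field    */
                          const double *restrict tend_t, /* temperature tendency */
                          double dt)                     /* time step            */
    {
        /* Contiguous access plus 'restrict' rules out aliasing, so the
         * compiler can vectorize; the pragma makes the intent explicit. */
        #pragma omp simd
        for (size_t i = 0; i < n; ++i) {
            t[i] += dt * tend_t[i];
        }
    }

Built with, for example, icc -O3 -qopenmp-simd or gcc -O3 -fopenmp-simd, a loop of this shape vectorizes cleanly, and the same change pays off again on wider-vector processors such as Xeon Phi.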

15 A peek at HPC systems in 2016
– The Sandy Bridge processor (what’s in Yellowstone) will be 3 to 4 generations old. New processors will be higher performance (up to 4x theoretical), consume less power, and offer other architectural features that will help our codes and improve overall system reliability and usability.
– Moore’s Law is still giving us more transistors and cores per processor, but the free ride on performance has come to an end. Future improvements in application performance will need to come from improvements in how those cores are used (see the sketch below).
– Interconnect technology will be available that is roughly 2x faster, with lower latency, than the FDR InfiniBand in Yellowstone (FDR links run at 56 Gb/s; EDR-class links run at 100 Gb/s).
– DDR4 memory will replace DDR3, offering higher performance, higher capacities, and lower power consumption.
– We expect to see self-hosted many-core processors in systems.
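Since future gains hinge on using all of the cores a processor provides, here is a hedged sketch of the simplest form that takes: spreading a loop across cores with an OpenMP reduction rather than relying on a faster clock. The function and variable names are hypothetical and not tied to any NCAR code.

    /* Illustrative only: with clock rates flat, throughput comes from using
     * every core. Each thread sums a chunk of the array and OpenMP combines
     * the partial sums. */
    #include <stddef.h>

    double global_energy(size_t n, const double *field)
    {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum) schedule(static)
        for (size_t i = 0; i < n; ++i) {
            sum += field[i] * field[i];
        }
        return sum;
    }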

16 More information
https://www2.cisl.ucar.edu/NWSC-2
– Letter to Interested Parties
– Draft Technical Specification
– Yellowstone Workload Study
– NWSC Science Justification
– Formal RFP documents
– Benchmarking files and instructions
– Various other supporting documents

17 Thanks to the many folks throughout NCAR who are contributing to the procurement!

18 Nitty-gritty details

19 Base System Performance
Attribute | Target
Minimum sustained HPC system performance increase over Yellowstone, based on Yellowstone Sustained Equivalent Performance (YSEP) (see the sketch below) | 2x
Minimum memory on an HPC system compute node | 64 GB
Aggregate external bandwidth on/off the HPC system for accessing NCAR’s GLADE file system | 100 GB/s
Minimum usable disk capacity of the PFS system | 20 PB
Minimum bandwidth of the PFS system | 400 GB/s
Aggregate external bandwidth on/off the PFS system for general TCP/IP connectivity, including the NCAR HPSS archive and other data services | 25 GB/s
Aggregate external bandwidth on/off the PFS system for access by other NWSC client systems (for example, the existing Caldera, Geyser, and/or Yellowstone systems) | 100 GB/s
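The 2x figure is a sustained, workload-level target rather than a peak number. As a hedged illustration only (the official YSEP methodology is defined in the Yellowstone workload study and benchmarking documents; the benchmark list, weights, and weighted-geometric-mean formula here are assumptions), the sketch below shows one plausible way per-benchmark speedups could be rolled into a single multiple.

    /* Hedged sketch: combine hypothetical per-benchmark speedups over
     * Yellowstone into one sustained-performance multiple using a
     * workload-weighted geometric mean. Not the official YSEP formula. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double speedup[] = { 2.3, 1.8, 2.6, 2.1 };    /* hypothetical values */
        const double weight[]  = { 0.40, 0.30, 0.20, 0.10 }; /* sum to 1.0         */
        const int n = 4;

        double log_sum = 0.0;
        for (int i = 0; i < n; ++i)
            log_sum += weight[i] * log(speedup[i]);

        printf("Sustained multiple: %.2fx (target >= 2x)\n", exp(log_sum));
        return 0;
    }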

20 NWSC-2 Benchmarking Applications
Benchmark | Description | Used in
HOMME/HOMME_COMM kernel | HOMME and its communication kernel | RFP and acceptance
HPCG | High Performance Conjugate Gradient solver | RFP
LES | Large Eddy Simulation kernel | RFP and acceptance
MG2 | Morrison/Gettelman atmosphere microphysics kernel | RFP
MPAS-A | Model for Prediction Across Scales | RFP and acceptance
POPperf | Ocean circulation model | RFP
WRF | Weather Research and Forecasting model | RFP and acceptance
Community Earth System Model (CESM) | Coupled Earth system model | Acceptance
OSU MPI Benchmark | Interconnect performance | RFP and acceptance
STREAM | Memory bandwidth (see the sketch below) | RFP and acceptance
SHOC (GPU) | Scalable HeterOgeneous Computing benchmark | RFP and acceptance
IOR | I/O latency and bandwidth | RFP and acceptance
MDTEST | Metadata performance | RFP and acceptance
pyReshaper | Application I/O kernel | RFP and acceptance
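Most of these are full applications or application kernels; the micro-benchmarks each probe a single subsystem. As a hedged illustration, the snippet below shows the kind of memory-bandwidth kernel STREAM measures: a simplified "triad" loop, not the official STREAM source, with array sizing and timing details omitted.

    /* Simplified STREAM-style triad: this loop is limited by memory
     * bandwidth rather than arithmetic, which is what makes STREAM a
     * useful proxy for many NCAR codes. */
    #include <stddef.h>

    void triad(size_t n, double *a, const double *b, const double *c, double scalar)
    {
        #pragma omp parallel for schedule(static)
        for (size_t i = 0; i < n; ++i)
            a[i] = b[i] + scalar * c[i];
        /* Rough bandwidth estimate: 3 * n * sizeof(double) bytes moved per
         * call, divided by the measured wall-clock time. */
    }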

21 Technical Options give us the flexibility to consider additions to the base system if it is in our best interest
– §4.1 Many-core partition
– §4.2 General Purpose GPU
– §4.3 Data Analysis, Visualization and Post Processing
– §4.4 Innovative Storage and Memory Technologies
– §4.5 Early Access System
– §4.6 Software Tools and Programming Environment
– §4.7 Upgrades and Expansions to the NWSC-2 PFS System
– §4.8 Maintenance, Support, and Technical Services
– §4.9 Upgrades and Expansions to the NWSC-2 HPC System
– §4.10 AMPS System (Erebus, via augmented funding)

