
1 HPCMP Benchmarking Update
Cray Henry, April 2008
Department of Defense High Performance Computing Modernization Program

2 Outline
Context – HPCMP
Initial Motivation from 2003
Process Review
Results

3 DoD HPC Modernization Program

4

5 HPCMP Serves a Large, Diverse DoD User Community
519 projects and 4,086 users at approximately 130 sites
Requirements categorized in 10 Computational Technology Areas (CTA)
FY08 non-real-time requirements of 1,108 Habu-equivalents
156 users are self-characterized as “Other”
Computational Structural Mechanics – 437 Users
Electronics, Networking, and Systems/C4I – 114 Users
Computational Chemistry, Biology & Materials Science – 408 Users
Computational Electromagnetics & Acoustics – 337 Users
Computational Fluid Dynamics – 1,572 Users
Environmental Quality Modeling & Simulation – 147 Users
Signal/Image Processing – 353 Users
Integrated Modeling & Test Environments – 139 Users
Climate/Weather/Ocean Modeling & Simulation – 241 Users
Forces Modeling & Simulation – 182 Users

6 Benchmarks Have REAL Impact
In 2003 we started to describe our benchmarking approach
Today benchmarks are even more important

7 2003 Benchmark Focus
Focused on application benchmarks
Recognized that application benchmarks alone were not enough

8 2003 Challenge – Move to Synthetic Benchmarks
Five years later we have made progress, but not enough to fully transition to synthetics
Benchmarks have supported over $300M in purchases so far

9 Comparison of HPCMP System Capabilities, FY 2003 to FY 2008 (chart; vertical axis: Habu-equivalents per processor)

10 What Has Changed Since 2003
(TI-08) Introduction of performance modeling and predictions
– Primary emphasis still on application benchmarks
– Performance modeling now used to predict some application performance
– Performance predictions and measured benchmark results compared for HPCMP systems used in TI-08 to assess accuracy
(TI-08) Met one-on-one with vendors to review performance predictions for each vendor’s individual systems

11 Overview of TI-XX Acquisition Process
Determine requirements, usage, and allocations
Choose application benchmarks, test cases, and weights
Vendors provide measured and projected times on offered systems
Measure benchmark times on DoD standard system
Measure benchmark times on existing DoD systems
Determine performance for each offered system per application test case
Determine performance for each existing system per application test case
Determine performance for each offered system
Usability/past performance information on offered systems
Collective acquisition decision
Use optimizer to determine price/performance for each offered system and combination of systems (a rough sketch follows this list)
Center facility requirements
Vendor pricing
Life-cycle costs for offered systems
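The optimizer step in the list above can be pictured with a minimal sketch. Everything in it is hypothetical: the system names, performance scores, and life-cycle costs are invented, and it only ranks single offered systems by performance per life-cycle dollar. The actual HPCMP optimizer also evaluates combinations of systems and folds in usability, past performance, vendor pricing, and facility constraints.

```c
/* Hypothetical sketch of the price/performance ranking step.
 * Performance scores and life-cycle costs are invented examples,
 * not actual TI-XX data. */
#include <stdio.h>

struct offered_system {
    const char *name;
    double performance;     /* e.g., weighted score over all application test cases */
    double lifecycle_cost;  /* purchase plus operating cost, $M */
};

int main(void) {
    struct offered_system offers[] = {
        { "Vendor A cluster", 410.0, 24.5 },
        { "Vendor B SMP",     355.0, 19.8 },
        { "Vendor C cluster", 520.0, 31.2 },
    };
    int n = sizeof offers / sizeof offers[0];

    /* Score each offer by performance per life-cycle dollar (higher is better). */
    for (int i = 0; i < n; i++) {
        double score = offers[i].performance / offers[i].lifecycle_cost;
        printf("%-18s perf=%6.1f  cost=$%5.1fM  perf per $M=%5.2f\n",
               offers[i].name, offers[i].performance,
               offers[i].lifecycle_cost, score);
    }
    return 0;
}
```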

12 TI-09 Application Benchmarks
AMR – Gas dynamics code (C++/Fortran, MPI, 40,000 SLOC)
AVUS (Cobalt-60) – Turbulent flow CFD code (Fortran, MPI, 19,000 SLOC)
CTH – Shock physics code (~43% Fortran/~57% C, MPI, 436,000 SLOC)
GAMESS – Quantum chemistry code (Fortran, MPI, 330,000 SLOC)
HYCOM – Ocean circulation modeling code (Fortran, MPI, 31,000 SLOC)
ICEPIC – Particle-in-cell magnetohydrodynamics code (C, MPI, 60,000 SLOC)
LAMMPS – Molecular dynamics code (C++, MPI, 45,400 SLOC)
Red = predicted; black = benchmarked

13 Predicting Code Performance for TI-08 and TI-09
*The next 12 charts were provided by the Performance Modeling and Characterization Group at the San Diego Supercomputer Center.

14 Prediction Framework – Processor and Communications Models
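As a rough illustration of how a processor model and a communications model might be combined into a single predicted runtime, the sketch below charges memory traffic at the bandwidth of the cache level it hits and adds a simple latency-plus-bandwidth term for message passing. All counts, bandwidths, and latencies are invented placeholders; the actual prediction framework works from measured application signatures and machine profiles rather than hard-coded numbers.

```c
/* Rough sketch of combining a processor model with a communications model.
 * All numbers are invented placeholders for illustration only. */
#include <stdio.h>

int main(void) {
    /* Processor model: memory traffic serviced at the bandwidth of the
     * memory-hierarchy level it falls into (bytes and bytes/s). */
    double l1_bytes   = 4.0e11, l1_bw  = 4.0e10;  /* traffic satisfied by L1   */
    double main_bytes = 1.5e11, mem_bw = 5.0e9;   /* traffic satisfied by DRAM */
    double compute_time = l1_bytes / l1_bw + main_bytes / mem_bw;

    /* Communications model: latency plus size/bandwidth per message. */
    double n_msgs = 2.0e5, msg_size = 6.4e4;      /* message count, bytes */
    double latency = 5.0e-6, net_bw = 1.0e9;      /* seconds, bytes/s     */
    double comm_time = n_msgs * (latency + msg_size / net_bw);

    printf("predicted compute time: %6.1f s\n", compute_time);
    printf("predicted comm time:    %6.1f s\n", comm_time);
    printf("predicted total:        %6.1f s\n", compute_time + comm_time);
    return 0;
}
```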

15 Memory Subsystem Is Key in Predicting Performance

16 Red Shift – Memory Subsystem Bottleneck

17 Predicted Compute Time Per Core – HYCOM

18 MultiMAPS System Profile
One curve per stride pattern
– Plateaus correspond to data fitting in cache
– Drops correspond to data split between cache levels
MultiMAPS ported to C and will be included in HPC Challenge Benchmarks
(Sample MultiMAPS output chart: memory bandwidth in MB/s vs. working set size in 8-byte words)
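A minimal probe in the spirit of MultiMAPS is sketched below, under the assumption that a simple strided-read sweep is enough to expose the plateaus and drops described above: it walks working-set sizes from a few KB to over 100 MB at a fixed stride and reports effective bandwidth. It is an illustration only, not the MultiMAPS source; running it with other stride values would produce the additional curves mentioned on the slide.

```c
/* Minimal sketch in the spirit of MultiMAPS: sweep working-set sizes at a
 * fixed stride and report effective read bandwidth. Illustration only,
 * not the actual MultiMAPS code. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const size_t stride = 1;    /* stride in 8-byte words; vary to get more curves */
    const int repeats = 50;

    for (size_t words = 1 << 10; words <= (size_t)1 << 24; words <<= 1) {
        double *buf = malloc(words * sizeof *buf);
        if (!buf) return 1;
        for (size_t i = 0; i < words; i++) buf[i] = (double)i;

        volatile double sum = 0.0;   /* volatile keeps the loads from being optimized away */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int r = 0; r < repeats; r++)
            for (size_t i = 0; i < words; i += stride)
                sum += buf[i];       /* the load traffic being timed */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs  = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        double bytes = (double)repeats * (words / stride) * sizeof(double);
        printf("%10zu words  %10.1f MB/s\n", words, bytes / secs / 1e6);
        free(buf);
    }
    return 0;
}
```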

19 Modeling the Effects of Multicore
(Diagram: 4-core Woodcrest node with the L2 cache being shared between cores)
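A crude way to picture the shared-cache effect on a node like the one diagrammed above: when several cores run the same memory-bound work, each core sees a smaller share of the node's bandwidth, so its predicted compute time grows. The sketch below uses an invented traffic figure and an invented node bandwidth, and assumes bandwidth divides evenly among active cores; it is not the group's multicore model, just an illustration of the trend.

```c
/* Crude sketch of shared-cache/bus contention on a multicore node.
 * Bandwidth figures and the even-split contention model are invented. */
#include <stdio.h>

int main(void) {
    double bytes_per_core = 2.0e11;   /* memory traffic each core generates */
    double node_mem_bw    = 8.0e9;    /* shared node bandwidth, bytes/s     */

    for (int active_cores = 1; active_cores <= 4; active_cores++) {
        /* Assume each active core gets an equal share of node bandwidth. */
        double bw_per_core = node_mem_bw / active_cores;
        double time = bytes_per_core / bw_per_core;
        printf("%d core(s) active: predicted time per core %7.1f s\n",
               active_cores, time);
    }
    return 0;
}
```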

20 Performance Sensitivity of LAMMPS LRG to 2x Improvements

21 Performance Sensitivity of OVERFLOW2 STD to 2x Improvements

22 Performance Sensitivity of OVERFLOW2 LRG to 2x Improvements

23 Main Memory and L1 Cache Have Most Effect on Runtime
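The sensitivity charts on the preceding slides can be read as answers to the question: how much faster would the code run if one subsystem were made twice as fast? A toy version of that calculation is sketched below; the time breakdown by component is invented for illustration and is not taken from the LAMMPS or OVERFLOW2 results.

```c
/* Toy sensitivity calculation: speedup from a 2x improvement in one
 * subsystem at a time. The baseline time breakdown is invented. */
#include <stdio.h>

int main(void) {
    const char *component[] = { "L1 cache", "L2 cache", "main memory",
                                "network latency", "network bandwidth" };
    /* Seconds attributed to each component in the baseline run (invented). */
    double baseline[] = { 120.0, 40.0, 260.0, 30.0, 50.0 };
    int n = sizeof baseline / sizeof baseline[0];

    double total = 0.0;
    for (int i = 0; i < n; i++) total += baseline[i];

    for (int i = 0; i < n; i++) {
        /* Halve this component's contribution, keep the rest unchanged. */
        double improved = total - baseline[i] / 2.0;
        printf("2x faster %-17s -> runtime %6.1f s (speedup %.2fx)\n",
               component[i], improved, total / improved);
    }
    return 0;
}
```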

24 Differences Between Predicted and Measured Benchmark Times (Unsigned)

System                    AMR Std  AMR Lg  ICEPIC Std  ICEPIC Lg  LAMMPS Lg  OVERFLOW2 Std  OVERFLOW2 Lg  WRF Std  WRF Lg  Overall
ASC HP Opteron Cluster    16.6%    6.3%    -           -          2.9%       8.0%           43.0%         -        -       15.4%
ASC SGI Altix             14.1%    3.4%    22.1%       15.6%      7.5%       4.1%           10.0%         24.3%    16.5%   13.1%
MHPCC Dell Xeon Cluster   20.7%    14.7%   6.7%        4.2%       8.1%       23.3%          -             -        -       13.0%
NAVO IBM P5+              11.7%    -       -           -          9.6%       3.0%           1.8%          7.8%     16.4%   8.4%

Overall across all systems and test cases: 12.4%

Note: Average uncertainties of measured benchmark times on loaded HPCMP systems are approximately 5%.

25-33 (Chart and image slides; no transcript text)

34 What’s Next?
More focus on signature analysis
Continue to evolve application benchmarks to accurately represent the HPCMP computational workload
Increase profiling and performance modeling to better understand application performance
Use performance predictions to supplement application benchmark measurements and to guide vendors in designing more efficient systems

