1 Presentation Outline
A word or two about our program
Our HPC system acquisition process
Program benchmark suite
Evolution of benchmark-based performance metrics
Where do we go from here?

2 HPC Modernization Program

3 HPC Modernization Program Goals

4 DoD HPC Modernization Program

5 HPCMP Serves a Large, Diverse DoD User Community
519 projects and 4,086 users at approximately 130 sites
Requirements categorized in 10 Computational Technology Areas (CTAs)
FY08 non-real-time requirements of 1,108 Habu-equivalents
Computational Structural Mechanics – 437 users
Electronics, Networking, and Systems/C4I – 114 users
Computational Chemistry, Biology & Materials Science – 408 users
Computational Electromagnetics & Acoustics – 337 users
Computational Fluid Dynamics – 1,572 users
Environmental Quality Modeling & Simulation – 147 users
Signal/Image Processing – 353 users
Integrated Modeling & Test Environments – 139 users
Climate/Weather/Ocean Modeling & Simulation – 241 users
Forces Modeling & Simulation – 182 users
156 users self-characterized as "Other"

6 High Performance Computing Centers
4 Major Shared Resource Centers
4 Allocated Distributed Centers
Strategic consolidation of resources

7 2007 Total HPCMP End-of-Year Computational Capabilities
[Chart: HPCMP center resources, 1993 through 2007; legend: MSRCs, ADCs (DCs)]
Note: Computational capability reflects available GFLOPS during fiscal year

8 HPC Modernization Program (MSRCs)
[Charts: MSRC computational capabilities by fiscal year, FY03 through FY07; as of August 2007]

9 HPC Modernization Program (ADCs)
[Charts: ADC computational capabilities by fiscal year, FY03 through FY06; as of August 2007]

10 Overview of TI-XX Acquisition Process
Determination of requirements, usage, and allocations
Choose application benchmarks, test cases, and weights
Measure benchmark times on DoD standard system
Measure benchmark times on existing DoD systems
Vendors provide measured and projected times on offered systems
Determine performance for each existing system on each application test case
Determine performance for each offered system on each application test case
Determine performance for each offered system
Usability/past performance information on offered systems
Center facility requirements
Vendor pricing
Life-cycle costs for offered systems
Use optimizer to determine price/performance for each offered system and combination of systems (see the sketch after this list)
Collective acquisition decision
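A minimal sketch of the optimizer step referenced above, using entirely hypothetical system names, test-case weights, scores, and costs: each offered system's per-test-case performance scores are combined with the benchmark weights, then divided by life-cycle cost to rank systems by price/performance. The actual HPCMP optimizer also considers combinations of systems and other factors; this only illustrates the basic calculation.

    # Hypothetical data: per-test-case weights, per-system performance scores
    # (in DoD standard-system equivalents), and life-cycle costs in dollars.
    weights = {"avus_standard": 0.6, "hycom_large": 0.4}
    offered_systems = {
        "system_a": {"scores": {"avus_standard": 3.2, "hycom_large": 2.8}, "life_cycle_cost": 12.0e6},
        "system_b": {"scores": {"avus_standard": 4.1, "hycom_large": 3.5}, "life_cycle_cost": 16.5e6},
    }

    def weighted_performance(scores):
        # Weighted sum of per-test-case performance scores.
        return sum(weights[case] * score for case, score in scores.items())

    # Rank offered systems by performance per dollar of life-cycle cost.
    ranked = sorted(
        offered_systems.items(),
        key=lambda item: weighted_performance(item[1]["scores"]) / item[1]["life_cycle_cost"],
        reverse=True,
    )
    for name, info in ranked:
        perf = weighted_performance(info["scores"])
        print(name, perf, perf / info["life_cycle_cost"])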

11 TI-08 Synthetic Test Suite
CPUBench – Floating-point execution rate
ICBench – Interconnect bandwidth and latency
LANBench – External network interface and connection bandwidth
MEMBench – Memory bandwidth (MultiMAPS)
OSBench – Operating system noise (PSNAP from LANL)
SPIOBench – Streaming parallel I/O bandwidth

12 TI-08 Application Benchmark Codes
AMR – Gas dynamics code (C++/Fortran, MPI, 40,000 SLOC)
AVUS (Cobalt-60) – Turbulent flow CFD code (Fortran, MPI, 19,000 SLOC)
CTH – Shock physics code (~43% Fortran/~57% C, MPI, 436,000 SLOC)
GAMESS – Quantum chemistry code (Fortran, MPI, 330,000 SLOC)
HYCOM – Ocean circulation modeling code (Fortran, MPI, 31,000 SLOC)
ICEPIC – Particle-in-cell magnetohydrodynamics code (C, MPI, 60,000 SLOC)
LAMMPS – Molecular dynamics code (C++, MPI, 45,400 SLOC)
OOCore – Out-of-core solver mimicking electromagnetics code (Fortran, MPI, 39,000 SLOC)
Overflow2 – CFD code originally developed by NASA (Fortran, MPI, 83,600 SLOC)
WRF – Multi-agency mesoscale atmospheric modeling code (Fortran and C, MPI, 100,000 SLOC)

13 Application Benchmark History
Benchmark codes used in each Computational Technology Area, FY 2003 – FY 2008:
Computational Structural Mechanics: CTH, RFCTH
Computational Fluid Dynamics: Cobalt60, LESLIE3D, Aero, AVUS, Overflow2, AMR
Computational Chemistry, Biology, and Materials Science: GAMESS, NAMD, LAMMPS
Computational Electromagnetics and Acoustics: OOCore, ICEPIC
Climate/Weather/Ocean Modeling and Simulation: NLOM, HYCOM, WRF

14 Determination of Performance
Establish a DoD standard benchmark time for each application benchmark test case
– ERDC Cray dual-core XT3 (Sapphire) chosen as the standard DoD system
– Standard benchmark times on the DoD standard system measured at 128 processors for standard test cases and 512 processors for large test cases
– Split in weight between standard and large application test cases will be made at 256 processors
Benchmark timings (at least four on each test case) are requested for systems that meet or beat the DoD standard benchmark times by at least a factor of two (preferably four)
Benchmark timings may be extrapolated provided they are guaranteed, but at least two actual timings must be provided for each test case (see the sketch after this list)
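The qualification rules above can be expressed as a small check. This is a hypothetical helper, not the program's actual tooling; the field names are assumptions, and "meet or beat the standard by a factor of two" is read here as a reported time at most half the DoD standard time.

    def timings_qualify(timings, standard_time, factor=2.0):
        # timings: list of dicts like {"seconds": 41.7, "extrapolated": False}
        measured = [t for t in timings if not t["extrapolated"]]
        if len(timings) < 4 or len(measured) < 2:
            return False  # need at least four timings, at least two actually measured
        # every reported time must meet or beat the standard time by the required factor
        return all(t["seconds"] <= standard_time / factor for t in timings)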

15 Determination of Performance (cont.)
Curve fit: Time = A/N + B + C*N
– N = number of processing cores
– A/N = time for parallel portion of code (parallel base)
– B = time for serial portion of code
– C*N = parallel penalty (parallel overhead)
Constraints
– A/N ≥ 0: parallel base time is non-negative
– T_min ≥ B ≥ 0: serial time is non-negative and not greater than the minimum observed time
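A minimal sketch of the timing model above, with assumed coefficient values for illustration: A/N is the parallel work, B the serial time, and C*N the parallel overhead, subject to A ≥ 0 and 0 ≤ B ≤ T_min.

    def model_time(n_cores, A, B, C):
        # Predicted wall-clock time on n_cores under Time = A/N + B + C*N.
        return A / n_cores + B + C * n_cores

    # Illustration with assumed coefficients: doubling cores roughly halves the
    # parallel term while the overhead term grows linearly.
    for n in (128, 256, 512, 1024):
        print(n, model_time(n, A=1.0e5, B=20.0, C=0.05))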

16 Determination of Performance (cont.)
Curve fit approach
– For each value of B (T_min ≥ B ≥ 0)
Determine A: Time – B = A/N
Determine C: Time – (A/N + B) = C*N
Calculate fit quality, where (N_i, T_i) is the time T_i observed at N_i cores and M is the number of observed core counts
– Select the value of B with the largest fit quality
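A sketch of the sweep described above, run on hypothetical observed timings. Least squares is assumed for determining A and C at each candidate B, and because the slide's fit-quality formula is not reproduced in this transcript, an RMS relative-error based quality stands in for it.

    import numpy as np

    def fit_curve(n, t, n_candidates=200):
        # n: observed core counts, t: observed times; returns (quality, A, B, C).
        t_min = t.min()
        best = None
        for B in np.linspace(0.0, t_min, n_candidates):
            A = max(0.0, np.sum((t - B) / n) / np.sum(1.0 / n**2))  # fit Time - B = A/N
            C = np.sum((t - A / n - B) * n) / np.sum(n**2)          # fit residual = C*N
            rel_err = (A / n + B + C * n - t) / t
            quality = 1.0 - np.sqrt(np.mean(rel_err**2))            # stand-in fit quality
            if best is None or quality > best[0]:
                best = (quality, A, B, C)
        return best

    n = np.array([128.0, 256.0, 512.0, 1024.0])  # hypothetical core counts
    t = np.array([810.0, 430.0, 255.0, 180.0])   # hypothetical observed times
    print(fit_curve(n, t))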

17 Determination of Performance (cont.)
Calculate score (in DoD standard system equivalents)
– C = number of compute cores in target system
– C_base = number of compute cores in standard system
– S_base = number of compute cores in standard execution
– STM = size-to-match = number of compute cores of target system required to match performance of S_base cores of the standard system
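A sketch of the scoring step, continuing from the fitted model: STM is obtained by solving A/N + B + C*N = (DoD standard time) for the smallest N, and the conversion to standard-system equivalents shown here (C/STM scaled by S_base/C_base) is an assumed reading of the definitions above, not a formula quoted from the slide.

    import math

    def size_to_match(A, B, C, standard_time):
        # Smallest N with A/N + B + C*N <= standard_time, from the quadratic
        # C*N**2 + (B - standard_time)*N + A = 0 (smaller root); assumes C >= 0.
        if C <= 0.0:
            return math.ceil(A / (standard_time - B))
        disc = (standard_time - B) ** 2 - 4.0 * A * C
        if disc < 0.0:
            return None  # target system never reaches the standard time
        return math.ceil(((standard_time - B) - math.sqrt(disc)) / (2.0 * C))

    def score(target_cores, stm, standard_system_cores, standard_exec_cores):
        # Target system capability in DoD standard-system equivalents (assumed form).
        return (target_cores / stm) * (standard_exec_cores / standard_system_cores)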

18 AMR Large Test Case on HP Opteron Cluster

19 AMR Large Test Case on SGI Altix

20 AMR Large Test Case on Dell Xeon Cluster

21 Overflow-2 Standard Test Case on Dell Xeon Cluster

22 Overflow-2 Large Test Case on IBM P5+

23 ICEPIC Standard Test Case on SGI Altix

24 ICEPIC Large Test Case on SGI Altix

25 Comparison of HPCMP System Capabilities: FY 2003 - FY 2008 (Habu-equivalents per processor)

26 What's Next?
Continue to evolve the application benchmarks to accurately represent the HPCMP computational workload
Increase profiling and performance modeling to better understand application performance
Use performance predictions to supplement application benchmark measurements and to guide vendors in designing more efficient systems

