NPACI: National Partnership for Advanced Computational Infrastructure Supercomputing ‘98 Mannheim CRAY T90 vs. Tera MTA: The Old Champ Faces a New Challenger.

1 NPACI: National Partnership for Advanced Computational Infrastructure
Supercomputing ‘98 Mannheim
CRAY T90 vs. Tera MTA: The Old Champ Faces a New Challenger
Allan Snavely
San Diego Supercomputer Center
June 19, 1998

2 Background
CRAY vector computers have been the workhorses of scientific computing for over 2 decades.
CRAY PVPs (parallel vector processors) have been ‘effort/performance’ leaders due to vector processors, flat shared memory, and great tools.
Vector machines are still very popular in terms of number of users and available scientific applications software.
NPACI currently offers T916/14, J98/5, J916/16.
There is a lot of legacy vector code, much of which will never see an MPI_Send call.
T90s are the last in the line of CRAY PVP computers.

3 More Background
Tera has developed a revolutionary new architecture for parallel computing, the MTA (MultiThreaded Architecture), with a programming model as simple as the PVP model.
The MTA can exploit more levels of parallelism than the T90.
The first Tera machine was delivered to SDSC in November 1997 with a single 145 MHz processor (< 1/2 final speed).
Tera delivered a two-processor system to SDSC in early 1998 with two 255 MHz processors (still not final speed) and a network board (not final, either), but no UNIX.

4 Caveats, Disclaimers, and Excuses
MTA software is still being debugged.
Processors are not running at full speed:
  –theoretical peak is 765 Mflops/CPU at 255 MHz, but will rise to 0.9-1.0 Gflops
Interconnect is not up to specification:
  –memory-intensive codes cannot speed up by more than a factor of 1.75 until new network boards are installed
All of the above are improving daily and are production issues, not research issues.
We have had 2 processors running and a stable OS (but not UNIX yet) for only a few weeks.
Machine time is shared with Tera.
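The arithmetic behind these caveats can be checked with a short sketch. It assumes the 3 flops/cycle figure given for the MTA on the hardware comparison slide, and it assumes the 1.75 speedup cap refers to the two-processor system; the function names are illustrative only.

```python
def peak_mflops(clock_mhz, flops_per_cycle):
    """Theoretical peak in Mflops: clock rate (MHz) times flops per cycle."""
    return clock_mhz * flops_per_cycle


def parallel_efficiency(speedup, processors):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / processors


# Taken from the hardware comparison slide: the MTA can execute 3 flops/cycle.
MTA_FLOPS_PER_CYCLE = 3

# Peak at the current 255 MHz clock.
current_peak = peak_mflops(255, MTA_FLOPS_PER_CYCLE)

# Clock rates (MHz) implied by the projected 0.9 and 1.0 Gflops peaks.
clock_for_900 = 900 / MTA_FLOPS_PER_CYCLE
clock_for_1000 = 1000 / MTA_FLOPS_PER_CYCLE

# Assumption: the 1.75 cap is measured on the 2-processor system.
capped_efficiency = parallel_efficiency(1.75, 2)

print(f"peak at 255 MHz: {current_peak} Mflops")
print(f"clock for 0.9-1.0 Gflops: {clock_for_900:.0f}-{clock_for_1000:.0f} MHz")
print(f"efficiency at the 1.75 cap: {capped_efficiency:.1%}")
```

Under those assumptions, the interim network boards limit memory-intensive codes to 87.5% of linear speedup on two processors.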

6 T90/MTA Hardware Comparison
CRAY T90
  –440 MHz frequency
  –8 128-element vector registers/CPU
  –Dual vector pipes into FUs
  –Pipelined ADD and MULT units
  –Can execute 4 flops/cycle (commonly 2)
  –Flat shared memory
  –DRAM, high bandwidth, low latency
  –Can issue 2 loads + 1 store / cycle
  –Peak 1.76 Gflops / CPU
  –Practical peak of 1 Gflops
  –Currently observe 400-800 Mflops in 'good' user codes
Tera MTA-1
  –300+ MHz clock (255 MHz now)
  –128 Streams (HW for threads)/CPU
  –Effective depth of pipeline is 21
  –Additional FMA unit
  –Can execute 3 flops/cycle (commonly 2)
  –Flat shared memory
  –SRAM, moderate latency, moderate bandwidth
  –Can issue 1 memory ref / cycle
  –Peak 0.9+ Gflops / CPU
  –Practical peak of 600 Mflops
  –Tera expects sustained 30-60% of peak in 'good' user codes
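To put the two 'good' user code figures on a common footing, the sketch below expresses the T90's observed 400-800 Mflops as a percentage of its peak and converts Tera's expected 30-60% into Mflops. It assumes the planned 300 MHz clock for the MTA (the slide says "300+"); the `Cpu` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    clock_mhz: int
    flops_per_cycle: int

    @property
    def peak_mflops(self) -> int:
        # Peak = clock rate (MHz) x maximum flops per cycle.
        return self.clock_mhz * self.flops_per_cycle


t90 = Cpu("CRAY T90", clock_mhz=440, flops_per_cycle=4)
# Assumption: 300 MHz, the lower bound of the "300+ MHz" planned clock.
mta = Cpu("Tera MTA-1", clock_mhz=300, flops_per_cycle=3)


def percent_of_peak(mflops, cpu):
    """Sustained rate expressed as a percentage of theoretical peak."""
    return 100 * mflops / cpu.peak_mflops


# T90: observed 400-800 Mflops in 'good' user codes.
t90_low = percent_of_peak(400, t90)
t90_high = percent_of_peak(800, t90)

# MTA: Tera expects 30-60% of peak (integer arithmetic avoids rounding noise).
mta_low = mta.peak_mflops * 30 // 100
mta_high = mta.peak_mflops * 60 // 100

print(f"{t90.name}: peak {t90.peak_mflops} Mflops, "
      f"observed {t90_low:.1f}-{t90_high:.1f}% of peak")
print(f"{mta.name}: peak {mta.peak_mflops} Mflops, "
      f"expected {mta_low}-{mta_high} Mflops sustained")
```

The comparison is only as good as its inputs: the T90 numbers are observations, while the MTA range is a vendor expectation at a clock rate not yet delivered.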

7 NAS 2.3-Serial Benchmarks
NAS Parallel Benchmarks version 2.3
  –Level 2 benchmarks are not pencil-and-paper; they must be executed as is or with minimal tuning
  –Written using MPI for distributed-memory, RISC-based machines
NAS 2.3-Serial
  –‘Reverse-engineered’ from NPB 2.3; MPI versions were ‘serialized’
  –Not necessarily optimal for vector or multithreaded platforms ‘as is’

9 NAS 2.3-Serial Benchmarks Results

10 Applications Performance: Disclaimer
The MTA wasn’t available long enough to port and tune many applications.
2 processors weren’t available long enough to obtain many multiprocessor results.
Most of the tuning effort was performed by Tera staff.
Applications selected were not chosen for superior T90 performance:
  –LCPFCT performs very well on T90
  –AMBER performs fairly well on T90
  –LS-DYNA3D performs less well on T90 for many interesting cases

11 LCPFCT Performance Comparison

12 AMBER Performance Comparison

13 LS-DYNA3D Comparison

14 Conclusions
T90 multitasking doesn't allow the user fine control over load balancing.
Porting T90 codes to the MTA is easy.
Tuning on both platforms is facilitated by excellent compilers and simple programming models.
The MTA can exploit the same parallelism in a problem that the T90 can, and it can also exploit levels of parallelism that the T90 doesn’t.
The MTA is likely to give good performance & scalability on most T90 codes.
The T90 is still the world's fastest vector machine, but the MTA may outperform it across a wider spectrum of problems: those that use vectors but also have additional outer-loop and higher-level parallelism.
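The distinction between inner-loop (vector) and outer-loop parallelism can be illustrated with a small sketch. This is not MTA or T90 code and is not taken from the benchmarks above; it is a hypothetical Python loop nest in which the inner loop carries a dependence from one iteration to the next, which blocks straightforward vectorization, while the outer iterations are fully independent and can be handed to separate threads.

```python
from concurrent.futures import ThreadPoolExecutor


def running_sum(row):
    """Inner loop with a loop-carried dependence: each output needs the previous one."""
    out = []
    total = 0.0
    for x in row:
        total += x          # depends on the value from the previous iteration
        out.append(total)
    return out


def serial(rows):
    """Reference version: outer loop executed one row at a time."""
    return [running_sum(row) for row in rows]


def outer_parallel(rows, workers=4):
    """Outer-loop parallelism: rows are independent, so each can go to its own thread."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the serial version.
        return list(pool.map(running_sum, rows))


rows = [[float(i + j) for j in range(8)] for i in range(16)]
print(outer_parallel(rows) == serial(rows))
```

The sketch shows only the dependence structure. Standard Python threads will not speed up this pure-Python loop, whereas the slides' claim is that the MTA's hardware streams can exploit this kind of parallelism directly.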

15 Future MTA Hardware Plans
4-processor network to be delivered soon (July?)
2 more processors delivered shortly thereafter (August?)
[With each processor comes one or two 1 GB memory modules (not associated directly with the processor; that is just how the network is built)]
UNIX will be completed by end of summer (Aug-Sept?)
Pending results of evaluations, increase size to 8 processors (end of year?), then 16 (next year)
Fortran 90, OpenMP, other tools on the way...

16 Future Work
SC98:
  –updated NAS benchmarks (‘final’ processors, network)
  –multiprocessor benchmarks
  –applications as well as kernels
Applications Porting and Tuning:
  –More work on AMBER, LS-DYNA3D
  –Port GAMESS, MPIRE, OVERFLOW
  –Port other vendor and research codes
  –Suggestions? (allans@sdsc.edu)

