Supercomputers(t)
Gordon Bell, Bay Area Research Center, Microsoft Corp.
Photos courtesy of The Computer Museum History Center. Please only copy with credit!
Copyright G Bell & TCM History Center, 6/3/2015
Supercomputer: the largest computer at a given time, used for science and engineering calculations. Large government defense, weather, and aero laboratories are the first buyers. Price is no object. Market size: 3-5 machines.
Growth in Computational Resources Used for UK Weather Forecasting
[chart: log scale from 1K to 1T flops, across machines from Leo, Mercury, and KDF9 to the YMP; roughly a 10^9 increase over 50 years]
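Read off the chart's axes, the climb from roughly 1 Kflops to 1 Tflops in about 50 years implies a strikingly steady growth rate. A back-of-envelope check (the 10^9 factor and 50-year span are approximations taken from the slide):

```python
# 1 Kflops -> 1 Tflops is a factor of 10^9 over ~50 years
factor = 1e12 / 1e3
years = 50
annual = factor ** (1 / years)
print(round(annual, 2))  # ~1.51x per year, i.e. 10x roughly every 5.5 years
```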
What a difference 25 years and spending >10x more makes! LLNL's 150 Mflops machine room, c1978, versus an artist's view of the 40 Tflops ESRDC, c2002.

Harvard Mark I, aka IBM ASCC.

"I think there is a world market for maybe five computers."
Thomas Watson Senior, Chairman of IBM, 1943

The scientific market is still about that size… 3 computers. When scientific processing was 100% of the industry, this was a good predictor. Now a $3 billion market: 6 vendors, 7 architectures. DOE buys 3 very big ($100-$200M) machines every 3-4 years.
Supercomputer price (t)

Time   $M      Structure                                  Example
1950   1       mainframes; many instruction-level //sm    IBM / CDC mainframe
               SMP, pipelining                            7600 / Cray
               vectors; SCI "Crays"
               MIMDs: mC, SMP, DSM                        "Crays" / MPP
2000   1,000   ASCI, COTS MPP                             Grid, Legion
Supercomputing: speed at any price, using parallelism

Intra-processor:
- memory overlap & instruction lookahead
- functional parallelism (2-4)
- pipelining (10)
- SIMD a la ILLIAC: a 2-D array of 64 PEs, vs. vectors
- wide instruction word (2-4)
- MTA (10-20)

MIMD, processor replication:
- SMP (4-64)
- distributed shared memory SMPs (100)

MIMD, computer replication:
- multicomputers aka MPP aka clusters (10K)
- Grid: 100K
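The parallelism degrees quoted above (2-4 for functional units, 10K for clusters) only pay off if the serial fraction of the work is small. A minimal sketch of the standard Amdahl's-law bound (not from the slides; the fractions below are illustrative):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Upper bound on speedup when only `parallel_fraction` of the
    work can be spread across n_processors (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even a 99%-parallel code tops out near 100x, regardless of cluster size:
for n in (10, 100, 10_000):
    print(n, round(amdahl_speedup(0.99, n), 1))
```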
High performance architectures timeline
[chart: vacuum tubes > transistors > MSI (minis) > micros > RISC > nMicro; processor overlap and lookahead; the Cray era of vector SMPs (Cray 1, X, Y, C, T); mainframes > "multis" > DSM (KSR, SGI); clusters (Tandem, VAX, IBM, UNIX) > MPP if n>1000 (nCUBE, Intel, IBM); networks with n>10,000 (NOW, Grid); "killer micros"]

High performance architectures timeline
[chart: sequential programming > SIMD/vector > parallelization > parallel programming; multicomputers and the MPP era of "ultracomputers" at 10x the price; MPP "in situ" resources, 100x in parallelism; NOW, VLC, Grid]
Time line of HPCC contributions
Lehmer, UC Berkeley: pre-computer number sieves

ENIAC, c1946

Manchester, home of the first stored-program computer: Baby, Mark I, and Atlas

von Neumann computers: the RAND Johnniac

Gene Amdahl's dissertation and first computer
IBM

IBM Stretch c1961 & 360/91 c1965 consoles!

IBM Terabit Photodigital Store, c1967

STC: terabytes of storage, c1999

Amdahl (aka Fujitsu) version of the 360, c1975

IBM ASCI at LLNL

CDC, ETA, Cray Research, Cray Computer

Cray
Circuits and Packaging, Plumbing (bits and atoms) & Parallelism… plus Programming and Problems
- packaging, including heat removal
- high-level bit plumbing: getting the bits from I/O into memory, through a processor, and back to memory and I/O
- parallelism
- programming: OS and compiler
- problems being solved
Seymour Cray computers
1951: ERA 1103 control circuits
1957: Sperry Rand NTDS; then to CDC
1959: Little Character, to test transistor circuits
1960: CDC 1604 (3600, 3800) & 160/160A
1964: CDC 6600 (6xxx series)
1969: CDC 7600
Cray Research, Cray Computer Corp., and SRC Computer Corp.
1976: Cray 1… (1/M, 1/S, XMP, YMP, C90, T90)
1985: Cray 2 from Cray Research; Cray Computer Corp.: GaAs Cray 3 (1993), Cray 4
SRC Company: large-scale, shared-memory multiprocessor using x86 microprocessors
Cray contributions…
- creative and productive during his entire career
- creator and undisputed designer of supers through the Cray 1, 1S, 1M (c1977)… the basis for the vector SMPs: XMP, YMP, T90, C90, 2, 3
- circuits, packaging, and cooling
- "the mini" as a peripheral computer: use I/O computers, or the main processor interrupted for I/O, versus I/O processors aka IBM Channels
Cray contributions
- multi-threaded processor (the 6600 PPUs)
- CDC 6600 functional parallelism leading to RISC… software control
- pipelining in the 7600, leading to...
- use of vector registers: adopted by 10+ companies; mainstream for technical computing; established the template for vector supercomputer architecture
- SRC Company use of x86 micros in 1996 that could lead to the largest SMP?
"Cray" clock speed (MHz), number of processors, and peak performance (Mflops)
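The numbers behind such a chart combine multiplicatively: peak Mflops is just clock rate times results retired per clock times processor count. A sketch (the helper is hypothetical; the Cray 1 figures, 80 MHz with chained add and multiply pipes, are the well-known case):

```python
def peak_mflops(clock_mhz, results_per_clock, n_processors):
    """Peak (never-exceed) rate: every clock, each processor
    can retire this many floating-point results."""
    return clock_mhz * results_per_clock * n_processors

# Cray 1: 80 MHz, add + multiply pipelines chained, one processor
print(peak_mflops(80, 2, 1))  # 160 Mflops peak
```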
Time line of Cray designs
[chart tracing control, vector control, packaging/parallelism, pipelining, and circuit contributions, from the NTDS (Mil-spec, 1957) onward]
CDC 1604 & 6600

CDC 7600: pipelining

CDC 8600 prototype: SMP, scalar, discrete circuits; failed to achieve its clock speed

CDC STAR… ETA10

CDC 7600 & Cray 1 at Livermore

Cray 1 #6 from LLNL, located at The Computer Museum History Center, Moffett Field

Cray kW MG (motor-generator) set & heat exchanger

Cray XMP, 4 processors, c1984

Cray 2 from NERSC/LBL

Cray 3, c1995: 500 MHz processors, 32 modules, 1K GaAs ICs per module, 8 processors
c1970: beginning the search for parallelism
- SIMDs: Illiac IV
- CDC STAR
- Cray 1

Illiac IV: the first SIMD, c1970s
SCI (Strategic Computing Initiative), funded by DARPA and aimed at a teraflops! The era of state computers and many efforts to build high-speed computers… led to HPCC. Thinking Machines, Intel supers, the Cray T3 series.
Minisupercomputers: a market whose time never came. Alliant, Convex, Ardent + Stellar = Stardent = 0.
Cydrome and Multiflow: prelude to the wide-word parallelism in Merced. Minisupers with VLIW attack the market; like the minisupers, they are repelled. It's software, software, and software. Was it a basically good idea that will now work as Merced?
MasPar… a less costly CM-1/2, done in silicon chips. It too is repelled. S is the fatal flaw.
Thinking Machines

Thinking Machines: CM1 & CM5
"In December 1995, computers with 1,000 processors will do most of the scientific processing."
Danny Hillis, 1990 (1 paper or 1 company)
The Bell-Hillis Bet: Massive Parallelism in 1995
[table comparing TMC vs. world-wide supers on applications, revenue, and petaflops/month]
Bell-Hillis bet: wasn't paid off! My goal was not necessarily just to win the bet. Hennessy and Patterson were to evaluate what was really happening… I wanted to understand the degree of MPP progress and programmability.
KSR 1: first commercial DSM. NUMA (non-uniform memory access), aka COMA (cache-only memory architecture).
SCI (c1980s): the Strategic Computing Initiative funded ATT/Columbia (Non-Von), BBN Labs, Bell Labs/Columbia (DADO), CMU Warp (GE & Honeywell), CMU (Production Systems), Encore, ESL, GE (like the Connection Machine), Georgia Tech, Hughes (dataflow), IBM (RP3), MIT/Harris, MIT/Motorola (dataflow), MIT Lincoln Labs, Princeton (MMMP), Schlumberger (FAIM-1), SDC/Burroughs, SRI (Eazyflow), University of Texas, and Thinking Machines (Connection Machine).
Those who gave their lives in the search for parallelism: Alliant, American Supercomputer, Ametek, AMT, Astronautics, BBN Supercomputer, Biin, CDC, Chen Systems, CHOPP, Cogent, Convex (now HP), Culler, Cray Computers, Cydrome, Dennelcor, Elexsi, ETA, E&S Supercomputers, Flexible, Floating Point Systems, Gould/SEL, IPM, Key, KSR, MasPar, Multiflow, Myrias, Ncube, Pixar, Prisma, SAXPY, SCS, SDSA, Supertek (now Cray), Suprenum, Stardent (Ardent+Stellar), Supercomputer Systems Inc., Synapse, Thinking Machines, Vitec, Vitesse, Wavetracer.
NCSA cluster of 8 × 128-processor SGI Origins, c1999

Humble beginning: in 1981… would you have predicted this would be the basis of supers?

Intel's iPSC 1 & Touchstone Delta

Intel Sandia cluster: 9K PIIs, 1.8 Tflops

GB with NT, Compaq, HP cluster
The Alliance LES NT Supercluster: "Supercomputer performance at mail-order prices" (Jim Gray, Microsoft). HP 300 MHz and 64 Compaq 333 MHz nodes; Myrinet network, HPVM, Fast Messages; Microsoft NT OS, MPI API. Andrew Chien, CS UIUC (now UCSD); Rob Pennington, NCSA.
Our Tax Dollars At Work: ASCI for Stockpile Stewardship
- Intel/Sandia: 9000 × 1-node PPro
- LLNL/IBM: 512 × 8 PowerPC (SP2)
- LANL/Cray: 6144 CPUs
- Maui Supercomputer Center: 512 × 1 SP2
ASCI Blue Mountain: 3.1 Tflops SGI Origin 2000
- ,000 sq. ft. of floor space
- 1.6 MW of power
- 530 tons of cooling
- 384 cabinets housing 6144 CPUs with 1536 GB of memory (32 GB / 128 CPUs)
- 48 cabinets for metarouters
- 96 cabinets for 76 TB of RAID disks
- 36 × HiPPI-800 switch cluster interconnect
- 9 cabinets for 36 HiPPI switches
- about 348 miles of fiber cable
Half of the SGI ASCI computer at LASL, c1999

LASL ASCI cluster interconnect: groups of 8 computers each; 18 16×16 crossbar switches; 18 separate networks

LASL ASCI cluster interconnect
Typical MCNP BNCT simulation: 1 cm resolution (21×21×25), 1 million particles, 1 hour on a 200 MHz PC.
ASCI Blue Mountain MCNP simulation: 1 mm resolution (256×256×250), 100 million particles, 2 hours on 6144 CPUs.
3 TeraOps makes a difference!
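A back-of-envelope comparison of the two runs above, assuming runtime scales with particle count:

```python
# Particle throughput of each run (particles per hour)
pc = 1e6 / 1         # 1 M particles in 1 hour on the 200 MHz PC
asci = 100e6 / 2     # 100 M particles in 2 hours on 6144 CPUs
print(asci / pc)     # 50x the particle throughput

# ...while tracking each particle through ~1,500x more mesh cells
cells_ratio = (256 * 256 * 250) / (21 * 21 * 25)
print(round(cells_ratio))  # 1486
```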
LLNL architecture: three SP sectors (S, Y, K) connected by 24 HPGN (HiPPI) links; each SP sector has 488 Silver nodes.
System parameters: 3.89 TFLOP/s peak, 2.6 TB memory, 62.5 TB global disk.
Sector S: 2.5 GB/node memory, 24.5 TB global disk, 8.3 TB local disk.
Sectors Y and K: 1.5 GB/node memory, 20.5 TB global disk, 4.4 TB local disk each.
FDDI. SST achieved >1.2 TFLOP/s on sPPM, a problem >70x larger than ever solved before!
I/O hardware architecture: system data and control networks; each 488-node IBM SP sector has 56 GPFS servers and 432 Silver compute nodes.
Each SST sector: local and global I/O file systems; 2.2 GB/s global and 3.66 GB/s local I/O performance; separate SP first-level switches; independent command and control.
Full system mode: application launch over the full 1,464 Silver nodes; 1,048 MPI/us tasks, 2,048 MPI/IP tasks; high-speed, low-latency communication; single STDIO interface; GPFS; 24 SP links to the second-level switch.
Fujitsu VPP5000 multicomputer (not available in the U.S.)
- computing node speed: 9.6 Gflops vector, 1.2 Gflops scalar
- primary memory: 4-16 GB
- memory bandwidth: 76 GB/s (9.6 × 64 Gb/s)
- inter-processor communication: 1.6 GB/s, non-blocking, with global addressing among all nodes
- I/O: 3 GB/s to SCSI, HiPPI, gigabit Ethernet, etc.
- computers deliver 1.22 Tflops
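The 76 GB/s figure above is consistent with streaming one 64-bit operand per flop at the 9.6 Gflops vector rate. A quick check (reading the slide's "76" as a rounded 76.8):

```python
gflops = 9.6
bytes_per_result = 8              # one 64-bit word per flop
bandwidth_gb_s = gflops * bytes_per_result
print(bandwidth_gb_s)             # 76.8 GB/s, i.e. 9.6 x 64 Gbit/s
```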
NEC SX 5: clustered SMPv (not available in the U.S.)
- SMPv computing nodes
  - processors/computer
  - processor pap: 8 Gflops
  - memory
  - I/O speed
- cluster

NEC Supers
High Performance COTS
RACEway (RACE++) busses:
- ANSI standardized
- mapped memory, message passing, "planned direct" transfers
- circuit switched; the basic bus interface unit is a 6-port (8-port) bidirectional switch at 40 MB/s (66 MB/s) per port
- scales to 4000 processors
SKYchannel:
- ANSI standardized
- 320 MB/s; crossbar backplane supports up to 1.6 GB/s throughput, non-blocking
- heart of an Air Force $3M / 256 Gflops system
Mercury & Sky Computers: performance & $
- rugged system with 10 modules ~ $100K; ~$1K/lb
- scalable to several K processors; ~1-10 Gflops/ft³
- U boards × 4 PPC750s: 440 SPECfp95 in 1 ft³ (18.5 × 8 × 10.75")
- Sky 384-processor signal processor: #20 on the "Top 500", $3M
[photos: Mercury VME Platinum system; Sky PPC daughtercard]
Brookhaven/Columbia QCD machine, c1999 (1999 Bell Prize for performance/$)

Brookhaven/Columbia QCD board

HT-MT: What's ? c1999
HT-MT…
- mechanical: cooling and signals
- chips: design tools, fabrication
- chips: memory, PIM
- architecture: MTA on steroids
- storage material
HTMT challenges the heuristics for a successful computer:
- Mead's 11-year rule: the time between lab appearance and commercial use
- requires >2 breakthroughs
- the team's first computer or super
- it's government funded… albeit at a university