
1 Supercomputers(t). Gordon Bell, Bay Area Research Center, Microsoft Corp. http://research.microsoft.com/users/gbell. Photos courtesy of The Computer Museum History Center (http://www.computerhistory.org). Please only copy with credit!

2 Supercomputer: the largest computer at a given time, used for science and engineering calculations. Large government defense, weather, and aero laboratories are the first buyers. Price is no object. Market size is 3-5 machines.

3 Growth in computational resources used for UK weather forecasting, 1950-2000. [Chart, log scale from 10 to 10T operations: Leo, Mercury, KDF9, 195, 205, YMP. Growth of 10^10 in 50 years = 1.58^50, about 1.58x per year.]
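
The chart's growth arithmetic, checked in a short Python sketch (my addition, assuming the chart spans a factor of 10^10 over the 50 years shown):

```python
# Check the chart's claim: 10^10 growth over 50 years = 1.58^50.
total_growth = 1e10
years = 50

annual_factor = total_growth ** (1 / years)  # 50th root of 10^10
print(f"annual growth factor: {annual_factor:.3f}")  # ~1.585
print(f"check: {annual_factor ** years:.3e}")        # ~1.000e+10
```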

4 What a difference 25 years and spending >10x more makes! The LLNL 150 Mflops machine room, c1978, versus an artist's view of the 40 Tflops ESRDC, c2002.

5 Harvard Mark I aka IBM ASCC

6 "I think there is a world market for maybe five computers." (Thomas Watson Senior, Chairman of IBM, 1943)

7 The scientific market is still about that size… 3 computers. When scientific processing was 100% of the industry, this was a good predictor. $3 billion: 6 vendors, 7 architectures. DOE buys 3 very big ($100-$200 M) machines every 3-4 years.

8 Supercomputer price (t):
Time | $M    | Structure                 | Example
1950 | 1     | mainframes                | many...
1960 | 3     | instruction parallelism   | IBM / CDC mainframe SMP
1970 | 10    | pipelining                | 7600 / Cray 1
1980 | 30    | vectors; SCI              | "Crays"
1990 | 250   | MIMDs: mC, SMP, DSM       | "Crays"/MPP
2000 | 1,000 | ASCI, COTS MPP            | Grid, Legion

9 Supercomputing: speed at any price, using parallelism.
Intra-processor: memory overlap & instruction lookahead; functional parallelism (2-4); pipelining (10); SIMD a la ILLIAC, a 2-d array of 64 PEs, vs. vectors; wide instruction word (2-4); MTA (10-20).
MIMD, processor replication: SMP (4-64); Distributed Shared Memory SMPs (100).
MIMD, computer replication: multicomputers aka MPP aka clusters (10K); Grid (100K).
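
A present-day sketch of the two replication styles above, my illustration rather than anything from the deck: a SIMD/vector-style operation applies one instruction stream across many data elements, while MIMD-style replication runs independent instruction streams on separate pieces of the problem.

```python
# Illustrative sketch (not from the slides): SIMD-style data parallelism
# vs. MIMD-style replication, in present-day Python terms.
import numpy as np
from multiprocessing import Pool

# SIMD / vector style: one instruction stream, many data elements.
a = np.arange(64.0)
b = np.ones(64)
c = a * b + 2.0  # one vector expression updates all 64 elements

# MIMD / multicomputer style: independent processes, each running its
# own instruction stream on its own piece of the problem.
def partial_sum(chunk):
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    chunks = np.array_split(np.arange(1_000_000.0), 8)
    with Pool(8) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)
```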

10 High performance architectures timeline, 1950-2000 (vacuum tubes, transistors, MSI/minis, micros, RISC, nMicros):
Processor: overlap, lookahead; "killer micros"
Cray era: 6600, 7600, Cray 1, X, Y, C, T; vector/SMP
SMP: mainframes -> "multis"
DSM: KSR, SGI
Clusters: Tandem, VAX, IBM, UNIX
MPP (n>1000): Ncube, Intel, IBM
Networks (n>10,000): NOW, Grid

11 High performance architectures timeline, 1950-2000 (vacuum tubes, transistors, MSI/minis, micros, RISC, nMicros):
Sequential programming throughout the period
SIMD and vector: the parallelization era
Parallel programming: multicomputers, the MPP era, ultracomputers (10x in price), MPP "in situ" resources (100x in parallelism), NOW, VLC, Grid

12 Time line of HPCC contributions

13 Time line of HPCC contributions

14 Lehmer's UC/Berkeley pre-computer number sieves

15 ENIAC, c1946

16 Manchester: the first computer. Baby, Mark I, and Atlas

17 von Neumann computers: RAND Johnniac

18 Gene Amdahl's dissertation and first computer

19 IBM

20 IBM Stretch c1961 & 360/91 c1965 consoles!

21 IBM Terabit Photodigital Store c1967

22 STC terabytes of storage c1999

23 Amdahl aka Fujitsu version of the 360, c1975

24 IBM ASCI Blue Pacific @ LLNL

25 CDC, ETA, Cray Research, Cray Computer

26 Seymour Cray, 1925-1996

27 Circuits and Packaging, Plumbing (bits and atoms) & Parallelism… plus Programming and Problems:
Packaging, including heat removal.
High-level bit plumbing: getting the bits from I/O, into memory, through a processor, and back to memory and I/O.
Parallelism.
Programming: O/S and compiler.
Problems being solved.

28 Seymour Cray computers:
1951: ERA 1103 control circuits.
1957: Sperry Rand NTDS; to CDC.
1959: Little Character, to test transistor circuits.
1960: CDC 1604 (3600, 3800) & 160/160A.
1964: CDC 6600 (6xxx series).
1969: CDC 7600.

29 Cray Research, Cray Computer Corp., and SRC Computer Corp.:
1976: Cray 1... (1/M, 1/S, XMP, YMP, C90, T90).
1985: Cray 2 from Cray Research; Cray Computer Corp.: the GaAs Cray 3 (1993) and Cray 4.
1999: SRC Company: a large-scale, shared-memory multiprocessor using x86 microprocessors.

30 Cray contributions…
Creative and productive during his entire career, 1951-1996.
Creator and undisputed designer of supers from the c1960 1604 to the Cray 1, 1S, 1M c1977… the basis for SMP vector machines: XMP, YMP, T90, C90, 2, 3.
Circuits, packaging, and cooling… "the mini" as a peripheral computer.
Use of I/O computers versus I/O processors.
Use of the main processor, interrupted for I/O, versus I/O processors aka IBM Channels.

31 Cray contributions (continued):
Multi-threaded processor (6600 PPUs).
CDC 6600 functional parallelism leading to RISC… software control.
Pipelining in the 7600, leading to...
Use of vector registers: adopted by 10+ companies; mainstream for technical computing.
Established the template for vector supercomputer architecture.
SRC Company use of x86 micros (1996) that could lead to the largest SMP?
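
To make the vector-register point concrete, a hedged modern sketch (mine, not Bell's): the same loop written element-at-a-time and as operations on 64-element strips, the register length the Cray 1 used.

```python
# Illustrative sketch: scalar loop vs. vector-style operation.
# The vector form expresses the loop as one operation per 64-element
# strip -- the granularity of a Cray 1 vector register.
import numpy as np

n = 4096
a = np.random.rand(n)
b = np.random.rand(n)

# Scalar style: one element per "instruction".
c_scalar = np.empty(n)
for i in range(n):
    c_scalar[i] = 2.0 * a[i] + b[i]

# Vector style: operate on 64-element strips, as a vector register would.
c_vector = np.empty(n)
for start in range(0, n, 64):
    s = slice(start, start + 64)
    c_vector[s] = 2.0 * a[s] + b[s]  # one vector op per strip

assert np.allclose(c_scalar, c_vector)
```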

32 "Cray" clock speed (MHz), number of processors, and peak performance (Mflops). [Chart]

33 Time line of Cray designs. [Diagram: control, vector control, packaging/parallelism, pipelining, and circuit threads, back to the Mil-spec NTDS, 1957.]

34 CDC 1604 & 6600

35 CDC 7600: pipelining

36 CDC 8600 prototype: SMP, scalar, discrete circuits; failed to achieve clock speed

37 CDC STAR… ETA10

38 CDC 7600 & Cray 1 at Livermore. [Photo: Cray 1, CDC 7600, disks]

39 Cray 1 #6 from LLNL. Located at The Computer Museum History Center, Moffett Field

40 Cray 1: 150 kW MG set & heat exchanger

41 Cray XMP/4 proc. c1984

42 Cray 2 from NERSC/LBL

43 Cray 3, c1995: 500 MHz processor, 32 modules, 1K GaAs ICs per module, 8 processors

44 c1970: Beginning the search for parallelism: SIMDs, Illiac IV, CDC Star, Cray 1

45 Illiac IV: first SIMD, c1970s

46 SCI (Strategic Computing Initiative), funded by DARPA and aimed at a teraflops! The era of state computers and many efforts to build high-speed computers… led to HPCC. Thinking Machines, Intel supers, the Cray T3 series.

47 Minisupercomputers: a market whose time never came. Alliant, Convex, Ardent + Stellar = Stardent = 0.

48 Cydrome and Multiflow: prelude to wide-word parallelism in Merced. Minisupers with VLIW attack the market; like the minisupers, they are repelled. It's software, software, and software. Was it a basically good idea that will now work as Merced?

49 MasPar... a less costly CM 1/2 done in silicon chips. It is repelled; software is the fatal flaw.

50 Thinking Machines

51 Thinking Machines: CM1 & CM5, c1983-1993

52 "In Dec. 1995 computers with 1,000 processors will do most of the scientific processing." Danny Hillis, 1990 (1 paper or 1 company)

53 The Bell-Hillis bet: massive parallelism in 1995. [Table comparing TMC vs. world-wide supers on applications, revenue, and petaflops/mo.]

54 The Bell-Hillis bet wasn't paid off! My goal was not necessarily just to win the bet. Hennessy and Patterson were to evaluate what was really happening… I wanted to understand the degree of MPP progress and programmability.

55 KSR 1: first commercial DSM. NUMA (non-uniform memory access), aka COMA (cache-only memory architecture).

56 SCI (c1980s): the Strategic Computing Initiative funded ATT/Columbia (Non-Von), BBN Labs, Bell Labs/Columbia (DADO), CMU Warp (GE & Honeywell), CMU (Production Systems), Encore, ESL, GE (like the Connection Machine), Georgia Tech, Hughes (dataflow), IBM (RP3), MIT/Harris, MIT/Motorola (dataflow), MIT Lincoln Labs, Princeton (MMMP), Schlumberger (FAIM-1), SDC/Burroughs, SRI (Eazyflow), the University of Texas, and Thinking Machines (Connection Machine).

57 Those who gave their lives in the search for parallelism: Alliant, American Supercomputer, Ametek, AMT, Astronautics, BBN Supercomputer, Biin, CDC, Chen Systems, CHOPP, Cogent, Convex (now HP), Culler, Cray Computers, Cydrome, Dennelcor, Elexsi, ETA, E & S Supercomputers, Flexible, Floating Point Systems, Gould/SEL, IPM, Key, KSR, MasPar, Multiflow, Myrias, Ncube, Pixar, Prisma, SAXPY, SCS, SDSA, Supertek (now Cray), Suprenum, Stardent (Ardent + Stellar), Supercomputer Systems Inc., Synapse, Thinking Machines, Vitec, Vitesse, Wavetracer.

58 NCSA cluster of 8 x 128-processor SGI Origins, c1999

59 Humble beginning: in 1981… would you have predicted this would be the basis of supers?

60 Intel's iPSC 1 & Touchstone Delta

61 Intel Sandia cluster, 9K PII: 1.8 TF

62 GB with NT, Compaq, HP cluster

63 The Alliance LES NT Supercluster: 192 HP 300 MHz + 64 Compaq 333 MHz nodes. Andrew Chien, CS UIUC-->UCSD; Rob Pennington, NCSA. Myrinet network, HPVM, Fast Messages; Microsoft NT OS, MPI API. "Supercomputer performance at mail-order prices" -- Jim Gray, Microsoft.

64 Our tax dollars at work: ASCI for Stockpile Stewardship.
Intel/Sandia: 9000 x 1-node PPro.
LLNL/IBM: 512 x 8 PowerPC (SP2).
LANL/Cray: 6144 CPUs.
Maui Supercomputer Center: 512 x 1 SP2.

65 ASCI Blue Mountain: 3.1 Tflops SGI Origin 2000.
12,000 sq. ft. of floor space; 1.6 MW of power; 530 tons of cooling.
384 cabinets to house 6144 CPUs with 1536 GB (32 GB / 128 CPUs).
48 cabinets for metarouters; 96 cabinets for 76 TB of RAID disks.
36 x HIPPI-800 switch cluster interconnect; 9 cabinets for 36 HIPPI switches.
About 348 miles of fiber cable.
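
A quick sanity check of the configuration arithmetic above, as a sketch (my arithmetic, assuming one 32 GB memory block per 128-CPU Origin):

```python
# Sanity-check the ASCI Blue Mountain numbers quoted above.
cpus = 6144
gb_per_128_cpus = 32

total_memory_gb = cpus // 128 * gb_per_128_cpus
print(total_memory_gb)          # 1536 GB, matching the slide

compute_cabinets = 384
print(cpus / compute_cabinets)  # 16 CPUs per compute cabinet
```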

66 Half of the SGI ASCI computer at LASL, c1999

67 LASL ASCI cluster interconnect: 6 groups of 8 computers each; 18 16x16 crossbar switches; 18 separate networks. [Diagram]

68 LASL ASCI cluster interconnect

69 Typical MCNP BNCT simulation: 1 cm resolution (21x21x25), 1 million particles, 1 hour on a 200 MHz PC. ASCI Blue Mountain MCNP simulation: 1 mm resolution (256x256x250), 100 million particles, 2 hours on 6144 CPUs. 3 TeraOps makes a difference!
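
A hedged back-of-envelope on that comparison (my arithmetic, treating tracked particles as the unit of work): the Blue Mountain run covers roughly 1,500x the grid cells and 100x the particles of the PC run, in only twice the wall-clock time.

```python
# Back-of-envelope on the MCNP comparison above.
pc_cells = 21 * 21 * 25        # 1 cm resolution grid
asci_cells = 256 * 256 * 250   # 1 mm resolution grid

print(asci_cells / pc_cells)   # ~1,486x more cells
print(100e6 / 1e6)             # 100x more particles

# Work ratio vs. time ratio, counting particles tracked as the work:
work_ratio = 100e6 / 1e6       # 100x the particles
time_ratio = 2 / 1             # 2 hours vs. 1 hour
print(work_ratio / time_ratio) # ~50x effective throughput gain
```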

70 LLNL architecture: three SP sectors (S, Y, K) of 488 Silver nodes each, connected by 24 HPGN (HiPPI) links. System parameters: 3.89 TFLOP/s peak, 2.6 TB memory, 62.5 TB global disk. Sector S: 2.5 GB/node memory, 24.5 TB global disk, 8.3 TB local disk. Sectors Y and K: 1.5 GB/node memory, 20.5 TB global disk, 4.4 TB local disk each. FDDI. The SST achieved >1.2 TFLOP/s on sPPM, on a problem >70x larger than ever solved before!

71 I/O hardware architecture, system data and control networks. Each SST sector: a 488-node IBM SP with 56 GPFS servers and 432 Silver compute nodes; local and global I/O file systems; 2.2 GB/s global and 3.66 GB/s local I/O performance; separate SP first-level switches; independent command and control; 24 SP links to the second-level switch. Full-system mode: application launch over the full 1,464 Silver nodes; 1,048 MPI/us tasks, 2,048 MPI/IP tasks; high-speed, low-latency communication; single STDIO interface; GPFS.

72 Fujitsu VPP5000 multicomputer (not available in the U.S.):
Computing node speed: 9.6 Gflops vector, 1.2 Gflops scalar.
Primary memory: 4-16 GB.
Memory bandwidth: 76 GB/s (9.6 x 64 Gb/s).
Inter-processor communication: 1.6 GB/s, non-blocking, with global addressing among all nodes.
I/O: 3 GB/s to SCSI, HIPPI, gigabit Ethernet, etc.
1-128 computers deliver up to 1.22 Tflops.
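
One hedged reading of the bandwidth line above (my interpretation of "9.6 x 64 Gb/s"): 9.6 giga-transfers/s of 64-bit words gives 76.8 GB/s, one 8-byte word per peak vector flop.

```python
# Interpretation of the VPP5000 bandwidth line above (my reading):
# 9.6 G transfers/s x 64 bits = 76.8 GB/s, one 8-byte word per flop.
transfers_per_s = 9.6e9
bits_per_transfer = 64

bytes_per_s = transfers_per_s * bits_per_transfer / 8
print(bytes_per_s / 1e9)                # 76.8, the "76 GB/s" on the slide

# Balance: bytes of memory traffic per peak vector flop.
peak_vector_flops = 9.6e9
print(bytes_per_s / peak_vector_flops)  # 8.0 bytes/flop
```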

73 NEC SX 5: clustered SMPv (not available in the U.S.). SMPv computing nodes: 4-8 processors per computer; processor peak: 8 Gflops. [Diagram: memory, I/O speed, cluster.]

74 NEC supers

75 High performance COTS:
Raceway (and RACE++) busses: ANSI standardized; mapped memory, message passing, "planned direct" transfers; circuit switched. The basic bus interface unit is a 6 (8) port bidirectional switch at 40 MB/s (66 MB/s) per port; scales to ~4000 processors.
Skychannel: ANSI standardized; 320 MB/s; a crossbar backplane supports up to 1.6 GB/s non-blocking throughput. Heart of an Air Force $3M / 256 Gflops system.

76 Mercury & Sky Computers: # (lbs) & $.
Rugged system with 10 modules ~$100K, about $1K/lb; scalable to several K processors; ~1-10 Gflops/ft3.
10 9U boards x 4 PPC750s = 440 SPECfp95 in 1 ft3 (18.5 x 8 x 10.75").
Sky 384-processor signal system: #20 on the 'Top 500', $3M.
[Photos: Mercury VME Platinum system; Sky PPC daughtercard.]

77 Brookhaven/Columbia QCD c1999 (1999 Bell Prize for performance/$)

78 Brookhaven/Columbia QCD board

79 HT-MT: What's 0.5 5 ? c1999

80 HT-MT…
Mechanical: cooling and signals.
Chips: design tools, fabrication.
Chips: memory, PIM.
Architecture: MTA on steroids.
Storage material.

81 HT-MT challenges the heuristics for a successful computer:
Mead 11-year rule: time between lab appearance and commercial use.
Requires >2 breakthroughs.
The team's first computer or super.
It's government funded… albeit at a university.

