Kei Davis and Fabrizio Petrini Europar 2004, Pisa Italy 1 CCS-3 P AL STATE OF THE ART.


1 STATE OF THE ART

2 Section 2
- Overview
  - We briefly describe some state-of-the-art supercomputers
  - The goal is to evaluate the degree of integration of the three main components: processing nodes, interconnection network, and system software
  - The analysis is limited to six supercomputers (ASCI Q, ASCI Thunder, System X, BlueGene/L, Cray XD1, and ASCI Red Storm) due to space and time limitations

3 ASCI Q: Los Alamos National Laboratory

4 ASCI Q
- Total: [...] TF/s, #3 in the Top 500
- Systems: 2048 AlphaServer ES45s
  - 8,192 EV[...] GHz CPUs with 16 MB cache
- Memory: 22 terabytes
- System interconnect
  - Dual-rail Quadrics interconnect
  - 4096 QSW PCI adapters
  - Four 1024-way QSW federated switches
- Operational in 2002

5 Node: HP (Compaq) AlphaServer ES System Architecture
[Figure: node block diagram. Recoverable labels:]
- Memory: up to 32 GB on memory boards MMB 0 to MMB 3, each path 256 bits at 125 MHz (4.0 GB/s)
- Cache: 16 MB per CPU
- Two quad C-chip controllers driving the PCI chip buses
- PCI slots PCI0 to PCI9 (several hot-swap), on 64-bit 33 MHz (266 MB/s) and 64-bit 66 MHz (528 MB/s) buses, with 3.3 V and 5.0 V I/O
- 64 bits at 500 MHz (4.0 GB/s)
- Legacy I/O: serial, parallel, keyboard/mouse, floppy, PCI-USB
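The bandwidth labels in the node diagram follow from bus width times clock rate. A minimal sketch of that arithmetic is given here; the function name is ours and is not taken from the slides.

```python
def bus_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in MB/s for a bus that moves width_bits once per clock."""
    bytes_per_transfer = width_bits / 8
    # MHz means millions of transfers per second, so the product is in MB/s.
    return bytes_per_transfer * clock_mhz

# (label, width in bits, clock in MHz) as read from the diagram labels
buses = [
    ("memory path", 256, 125),
    ("PCI 66 MHz", 64, 66),
    ("PCI 33 MHz", 64, 33),
]
for label, width, clock in buses:
    print(f"{label}: {bus_bandwidth_mb_s(width, clock):.0f} MB/s")
```

With a nominal 33 MHz clock the last entry comes out at 264 MB/s. The 266 MB/s on the slide presumably reflects the exact PCI clock of 33.33 MHz.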

6 QsNET: Quaternary Fat Tree
- Hardware support for collective communication
- MPI latency 4 µs, bandwidth 300 MB/s
- Barrier latency less than 10 µs

7 Interconnection Network
[Figure: federated switch built from 64U64D units at the switch level, with mid-level and super-top-level switches above them; 1024 nodes (2x = 2048 nodes).]
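The size of a quaternary fat tree can be estimated with the standard k-ary n-tree counting rule: n levels connect k^n nodes using n * k^(n-1) switches with k down-links each. The sketch below applies that rule with k = 4. It is an assumption that the rule describes this installation exactly, since the physical switch chassis group many elementary switches together.

```python
def fat_tree_levels(nodes: int, arity: int = 4) -> int:
    """Smallest number of levels n such that arity**n >= nodes."""
    levels, capacity = 0, 1
    while capacity < nodes:
        capacity *= arity
        levels += 1
    return levels

def fat_tree_switches(levels: int, arity: int = 4) -> int:
    """Elementary switches in a k-ary n-tree: n * k**(n - 1)."""
    return levels * arity ** (levels - 1)

levels = fat_tree_levels(1024)
print(levels, fat_tree_switches(levels))
```

For one 1024-node network this gives 5 levels and 1280 elementary switches.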

8 System Software
- Operating system is Tru64
- Nodes organized in clusters of 32 for resource allocation and administration purposes (TruCluster)
- Resource management executed through Ethernet (RMS)

9 ASCI Q: Overview
- Node integration
  - Low (multiple boards per node, network interface on I/O bus)
- Network integration
  - High (HW support for atomic collective primitives)
- System software integration
  - Medium/Low (TruCluster)

10 ASCI Thunder, 1,024 Nodes, 23 TF/s peak

11 ASCI Thunder, Lawrence Livermore National Laboratory
1,024 nodes, 4096 processors, 23 TF/s, #2 in the Top 500

12 ASCI Thunder: Configuration
- 1,024 nodes, quad 1.4 GHz Itanium2, 8 GB DDR266 SDRAM (8 terabytes total)
- 2.5 µs MPI latency and 912 MB/s MPI bandwidth over Quadrics Elan4
- Barrier synchronization 6 µs, allreduce 15 µs
- 75 TB of local disk, 73 GB/node UltraSCSI320
- Lustre file system with 6.4 GB/s delivered parallel I/O performance
- Linux RH 3.0, SLURM, CHAOS
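The configuration figures can be cross-checked with a little arithmetic. The sketch below computes the peak rate and applies a simple linear cost model (latency plus size over bandwidth) to the quoted MPI numbers. The model and function names are ours. The value of 4 floating-point operations per cycle for Itanium2 is an assumption not stated on the slide.

```python
def peak_tflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in TF/s."""
    return nodes * cpus_per_node * clock_ghz * flops_per_cycle / 1000

def transfer_time_us(size_bytes, latency_us, bandwidth_mb_s):
    """Linear model: fixed latency plus serialization time, in microseconds."""
    # bytes divided by MB/s yields microseconds
    return latency_us + size_bytes / bandwidth_mb_s

def half_bandwidth_size(latency_us, bandwidth_mb_s):
    """Message size in bytes at which the model reaches half of peak bandwidth."""
    return latency_us * bandwidth_mb_s

# flops_per_cycle=4 is an assumption (two fused multiply-add units per CPU)
print(round(peak_tflops(1024, 4, 1.4, 4), 1))   # compare with the quoted 23 TF/s

for name, latency, bandwidth in [("QsNET (slide 6)", 4.0, 300.0),
                                 ("Elan4 (slide 12)", 2.5, 912.0)]:
    n_half = half_bandwidth_size(latency, bandwidth)
    print(name, n_half, round(transfer_time_us(1_000_000, latency, bandwidth), 1))
```

Under this model Elan4 needs messages of about 2.3 KB to reach half of its peak bandwidth, against about 1.2 KB for the older QsNET figures. The higher bandwidth raises the threshold even though latency is lower.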

13 CHAOS: Clustered High Availability Operating System
- Derived from Red Hat, but differs in the following areas
  - Modified kernel (Lustre and hardware specific)
  - New packages for cluster monitoring, system installation, power/console management
  - SLURM, an open-source resource manager

14 ASCI Thunder: Overview
- Node integration
  - Medium/Low (network interface on I/O bus)
- Network integration
  - Very High (HW support for atomic collective primitives)
- System software integration
  - Medium (CHAOS)

15 System X: Virginia Tech

16 System X, [...] TF/s
- 1100 dual Apple G5 2 GHz CPU based nodes
  - 8 billion operations/second/processor (8 GFlops) peak double-precision floating-point performance
- Each node has 4 GB of main memory and 160 GB of Serial ATA storage
  - 176 TB total secondary storage
- InfiniBand: 8 µs latency and 870 MB/s bandwidth, partial support for collective communication
- System-level fault tolerance (Déjà vu)
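The aggregate figures follow directly from the per-node numbers. A short sketch of the arithmetic, using only values listed on the slide (the constant names are ours):

```python
NODES = 1100
CPUS_PER_NODE = 2
PEAK_GFLOPS_PER_CPU = 8      # quoted per-processor peak
MEMORY_GB_PER_NODE = 4
DISK_GB_PER_NODE = 160

peak_tflops = NODES * CPUS_PER_NODE * PEAK_GFLOPS_PER_CPU / 1000
memory_tb = NODES * MEMORY_GB_PER_NODE / 1000
disk_tb = NODES * DISK_GB_PER_NODE / 1000   # decimal terabytes

print(f"theoretical peak: {peak_tflops} TF/s")
print(f"total memory:     {memory_tb} TB")
print(f"total disk:       {disk_tb} TB")
```

The disk total reproduces the 176 TB on the slide. The computed peak is a theoretical upper bound derived from the listed figures and is not necessarily the TF/s value that appeared in the slide title.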

17 System X: Overview
- Node integration
  - Medium/Low (network interface on I/O bus)
- Network integration
  - Medium (limited support for atomic collective primitives)
- System software integration
  - Medium (system-level fault tolerance)

18 BlueGene/L System
- Chip (2 processors): 2.8/5.6 GF/s, 4 MB
- Compute card (2 chips, 2x1x1): 5.6/11.2 GF/s, 0.5 GB DDR
- Node card (32 chips, 4x4x2; 16 compute cards): 90/180 GF/s, 8 GB DDR
- Cabinet (32 node boards, 8x8x16): 2.9/5.7 TF/s, 256 GB DDR
- System (64 cabinets, 64x32x32): 180/360 TF/s, 16 TB DDR
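The packaging hierarchy scales linearly from the chip upward, which makes the quoted totals easy to verify. The sketch below recomputes peak rate and DDR capacity at each level from the per-chip and per-card figures. The names are ours, and the chip counts per level are taken from the slide.

```python
CHIP_PEAK_GFLOPS = (2.8, 5.6)   # the two per-chip peak figures quoted on the slide
CARD_MEMORY_GB = 0.5            # DDR per compute card, which holds 2 chips

# (level name, chips at that level)
LEVELS = [
    ("compute card", 2),
    ("node card", 32),
    ("cabinet", 32 * 32),
    ("system", 64 * 32 * 32),
]

def level_totals(chips):
    """Return (low peak GF/s, high peak GF/s, DDR in GB) for a level."""
    low, high = (chips * g for g in CHIP_PEAK_GFLOPS)
    memory_gb = chips / 2 * CARD_MEMORY_GB
    return low, high, memory_gb

for name, chips in LEVELS:
    low, high, mem = level_totals(chips)
    print(f"{name:12s} {chips:6d} chips  {low:9.1f}/{high:9.1f} GF/s  {mem:8.1f} GB DDR")
```

The memory totals match the slide exactly (8 GB, 256 GB, 16 TB). The peak figures come out slightly above the slide's rounded values, for example 89.6/179.2 GF/s per node card against the quoted 90/180.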

19 BlueGene/L Compute ASIC
[Figure: block diagram of the compute ASIC. Recoverable labels:]
- Processor blocks: 440 CPU, I/O proc, two “Double FPU” units, L1, L2, PLB (4:1)
- Memory: multiported shared SRAM buffer; shared L3 directory for EDRAM (includes ECC); 4 MB EDRAM L3 cache or memory; DDR control with ECC, 144-bit-wide DDR, 256/512 MB
- Torus: 6 out and 6 in, each at 1.4 Gbit/s link
- Tree: 3 out and 3 in, each at 2.8 Gbit/s link
- Global interrupt: 4 global barriers or interrupts
- Gbit Ethernet; JTAG access
- Technology: IBM CU-11, 0.13 µm; 11 x 11 mm die size; 25 x 32 mm CBGA; 474 pins, 328 signal; 1.5/2.5 Volt


21 [Figure: node card with DC-DC converters (40 V → 1.5 V, 2.5 V), 2 I/O cards, and 16 compute cards]


23 BlueGene/L Interconnection Networks
- 3-Dimensional Torus
  - Interconnects all compute nodes (65,536)
  - Virtual cut-through hardware routing
  - 1.4 Gb/s on all 12 node links (2.1 GBytes/s per node)
  - 350/700 GBytes/s bisection bandwidth
  - Communications backbone for computations
- Global Tree
  - One-to-all broadcast functionality
  - Reduction operations functionality
  - 2.8 Gb/s of bandwidth per link
  - Latency of tree traversal on the order of 5 µs
  - Interconnects all compute and I/O nodes (1024)
- Ethernet
  - Incorporated into every node ASIC
  - Active in the I/O nodes (1:64)
  - All external communication (file I/O, control, user interaction, etc.)
- Low-Latency Global Barrier
  - 8 single wires crossing the whole system, touching all nodes
- Control Network (JTAG)
  - For booting, checkpointing, error logging
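The torus figures can be reproduced from the 64x32x32 dimensions and the 1.4 Gb/s link rate. The sketch below models neighbor lookup and hop distance with wraparound, then derives per-node and bisection bandwidth. The function names are ours. Cutting the torus across its longest dimension is our assumed reading of how the bisection figure arises.

```python
from math import prod

DIMS = (64, 32, 32)      # torus dimensions from the BlueGene/L System slide
LINK_GBIT_S = 1.4        # per-link rate

def neighbors(coord, dims=DIMS):
    """The six torus neighbors of a node: +/-1 per dimension, with wraparound."""
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            result.append(tuple(n))
    return result

def hops(a, b, dims=DIMS):
    """Minimal hop count, taking the shorter way around each ring."""
    return sum(min(abs(x - y), size - abs(x - y))
               for x, y, size in zip(a, b, dims))

def node_bandwidth_gbyte_s(links=12, link_gbit_s=LINK_GBIT_S):
    """Aggregate bandwidth over all links of one node (6 in plus 6 out)."""
    return links * link_gbit_s / 8

def bisection_gbyte_s(dims=DIMS, link_gbit_s=LINK_GBIT_S):
    """One-directional bandwidth across a cut halving the longest dimension."""
    cross_section = prod(dims) // max(dims)
    links_cut = 2 * cross_section      # wraparound means the cut crosses twice
    return links_cut * link_gbit_s / 8

print(prod(DIMS), round(node_bandwidth_gbyte_s(), 2), round(bisection_gbyte_s(), 1))
print(hops((0, 0, 0), (32, 16, 16)))   # farthest node from the origin
```

Under these assumptions the cut crosses 2048 links, giving about 358 GBytes/s in one direction and twice that in both, close to the quoted 350/700.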

24 BlueGene/L System Software Organization
- Compute nodes are dedicated to running the user application, and almost nothing else: a simple compute node kernel (CNK)
- I/O nodes run Linux and provide O/S services
  - File access
  - Process launch/termination
  - Debugging
- Service nodes perform system management services (e.g., system boot, heartbeat, error monitoring), largely transparent to application/system software

25 Operating Systems
- Compute nodes: CNK
  - Specialized simple O/S
    - 5000 lines of code
    - 40 KBytes in core
  - No thread support, no virtual memory
  - Protection
    - Protect kernel from application
    - Some net devices in userspace
  - File I/O offloaded (“function shipped”) to I/O nodes
    - Through kernel system calls
  - “Boot, start app and then stay out of the way”
- I/O nodes: Linux
  - [...] kernel (2.6 underway) w/ ramdisk
  - NFS/GPFS client
  - CIO daemon to
    - Start/stop jobs
    - Execute file I/O
- Global O/S (CMCS, service node)
  - Invisible to user programs
  - Global and collective decisions
  - Interfaces with external policy modules (e.g., job scheduler)
  - Commercial database technology (DB2) stores static and dynamic state
    - Partition selection
    - Partition boot
    - Running of jobs
    - System error logs
    - Checkpoint/restart mechanism
  - Scalability, robustness, security
- Execution mechanisms in the core
- Policy decisions in the service node
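The "function shipping" idea can be illustrated with a toy model in which a compute node kernel performs no file I/O itself and forwards each request to the daemon on its I/O node. This is only a conceptual sketch. The class names, the request format, and the in-memory file store are hypothetical and do not reflect the actual CNK or CIO interfaces. The 1:64 ratio is taken from the interconnect slide.

```python
class CIODaemon:
    """Hypothetical stand-in for the I/O node daemon that executes file I/O."""

    def __init__(self):
        self.files = {}   # path -> bytearray; in-memory substitute for NFS/GPFS

    def handle(self, request):
        op, path, payload = request
        if op == "write":
            self.files.setdefault(path, bytearray()).extend(payload)
            return len(payload)
        if op == "read":
            return bytes(self.files.get(path, b""))
        raise ValueError(f"unsupported operation: {op}")


class ComputeNodeKernel:
    """Hypothetical minimal kernel: file calls are shipped, not executed locally."""

    def __init__(self, node_id, io_daemons, ratio=64):
        self.node_id = node_id
        # Each group of `ratio` compute nodes shares one I/O node.
        self.daemon = io_daemons[node_id // ratio]

    def write(self, path, data):
        return self.daemon.handle(("write", path, data))

    def read(self, path):
        return self.daemon.handle(("read", path, b""))


daemons = [CIODaemon() for _ in range(2)]
node0 = ComputeNodeKernel(0, daemons)
node70 = ComputeNodeKernel(70, daemons)
node0.write("/out/log", b"step 1\n")
print(node0.read("/out/log"), node70.daemon is daemons[1])
```

A real system would carry these requests over the network rather than through a direct method call, but the division of labor is the same: the compute node keeps its kernel tiny, and the Linux I/O node does the work.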

26 BlueGene/L: Overview
- Node integration
  - High (processing node integrates processors and network interfaces; network interfaces directly connected to the processors)
- Network integration
  - High (separate tree network)
- System software integration
  - Medium/High (compute kernels are not globally coordinated)
- #2 and #4 in the Top 500

27 Cray XD1

28 Cray XD1 System Architecture
- Compute
  - 12 AMD Opteron 32/64-bit x86 processors
  - High Performance Linux
- RapidArray Interconnect
  - 12 communications processors
  - 1 Tb/s switch fabric
- Active Management
  - Dedicated processor
- Application Acceleration
  - 6 co-processors
- Processors directly connected to the interconnect

29 Cray XD1 Processing Node
[Figure: chassis front and rear views. Recoverable labels:]
- Six 2-way SMP blades
- 4 fans
- Six SATA hard drives
- Four independent PCI-X slots
- 500 Gb/s crossbar switch and 12-port inter-chassis connector
- Connector to 2nd 500 Gb/s crossbar switch and 12-port inter-chassis connector

30 Cray XD1 Compute Blade
[Figure: compute blade. Recoverable labels:]
- Two AMD Opteron 2XX processors
- 4 DIMM sockets for DDR 400 registered ECC memory (one set per processor)
- RapidArray communications processor
- Connector to main board

31 Fast Access to the Interconnect
[Figure: comparison of memory, processor, I/O, and interconnect rates (gigabytes, GFLOPS, gigabytes per second) for the Cray XD1 and a Xeon server. Recoverable labels: 6.4 GB/s DDR, 5.3 GB/s DDR, 1 GB/s PCI-X, GigE.]

32 Communications Optimizations
- RapidArray Communications Processor
  - HT/RA tunnelling with bonding
  - Routing with route redundancy
  - Reliable transport
  - Short message latency optimization
  - DMA operations
  - System-wide clock synchronization
[Figure: AMD Opteron 2XX processor attached to the RapidArray communications processor (RA); link labels 2 GB/s, 3.2 GB/s, 2 GB/s.]

33 Active Management Software
- Usability
  - Single system command and control
- Resiliency
  - Dedicated management processors, real-time OS, and communications fabric
  - Proactive background diagnostics with self-healing
  - Synchronized Linux kernels
[Figure label: Active Manager System]

34 Cray XD1: Overview
- Node integration
  - High (direct access from HyperTransport to RapidArray)
- Network integration
  - Medium/High (HW support for collective communication)
- System software integration
  - High (compute kernels are globally coordinated)
- Early stage

35 ASCI Red Storm

36 Red Storm Architecture
- Distributed-memory MIMD parallel supercomputer
- Fully connected 3D mesh interconnect. Each compute node processor has a bidirectional connection to the primary communication network
- 108 compute node cabinets and 10,368 compute node processors (AMD, 2.0 GHz)
- ~10 TB of DDR memory at 333 MHz
- Red/Black switching: ~1/4, ~1/2, ~1/4
- 8 service and I/O cabinets on each end (256 processors for each color)
- 240 TB of disk storage (120 TB per color)

37 Red Storm Architecture
- Functional hardware partitioning: service and I/O nodes, compute nodes, and RAS nodes
- Partitioned operating system (OS): Linux on service and I/O nodes, LWK (Catamount) on compute nodes, stripped-down Linux on RAS nodes
- Separate RAS and system management network (Ethernet)
- Router table-based routing in the interconnect

38 Red Storm Architecture
[Figure: system partitions labeled Net I/O, Service, Users, File I/O, Compute, and /home.]

39 System Layout (27 x 16 x 24 mesh)
[Figure: cabinet layout with a normally unclassified section, switchable nodes, and a normally classified section, separated by disconnect cabinets.]
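The mesh dimensions are consistent with the processor and cabinet counts given on the architecture slide. The sketch below checks that and estimates the red/black section sizes. The function name is ours. Treating the middle "~1/2" as the switchable section is an assumption, and the fractions are approximate on the slide, so the section sizes are rough.

```python
from math import prod

MESH = (27, 16, 24)   # mesh dimensions from the layout slide
CABINETS = 108        # compute node cabinets from the architecture slide

compute_nodes = prod(MESH)
nodes_per_cabinet = compute_nodes // CABINETS

def red_black_sections(total, fractions=(0.25, 0.5, 0.25)):
    """Approximate sizes of the two fixed sections and the switchable middle."""
    return tuple(round(total * f) for f in fractions)

fixed_a, switchable, fixed_b = red_black_sections(compute_nodes)
print(compute_nodes, nodes_per_cabinet)
print(fixed_a, switchable, fixed_b)
# Largest partition one color can reach when all switchable nodes join it
print(fixed_a + switchable)
```

The product 27 x 16 x 24 gives 10,368, matching the quoted compute processor count, at 96 processors per cabinet.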

40 Red Storm System Software
- Run-time system
  - Logarithmic loader
  - Fast, efficient node allocator
  - Batch system: PBS
  - Libraries: MPI, I/O, math
- File systems being considered include
  - PVFS: interim file system
  - Lustre: PathForward support
  - Panasas…
- Operating systems
  - Linux on service and I/O nodes
  - Sandia’s LWK (Catamount) on compute nodes
  - Linux on RAS nodes

41 ASCI Red Storm: Overview
- Node integration
  - High (direct access from HyperTransport to network through custom network interface chip)
- Network integration
  - Medium (no support for collective communication)
- System software integration
  - Medium/High (scalable resource manager, no global coordination between nodes)
- Expected to become the most powerful machine in the world (competition permitting)

42 Overview

System         Node Integration   Network Integration   Software Integration
ASCI Q         Low                High                  Medium/Low
ASCI Thunder   Medium/Low         Very High             Medium
System X       Medium/Low         Medium                Medium
BlueGene/L     High               High                  Medium/High
Cray XD1       High               Medium/High           High
Red Storm      High               Medium                Medium/High

