BlueGene/L Power, Packaging and Cooling
Todd Takken, IBM Research
February 6, 2004 (edited 2/11/04 version of viewgraphs)




1 BlueGene/L Power, Packaging and Cooling. Todd Takken, IBM Research, February 6, 2004 (edited 2/11/04 version of viewgraphs)

2 BlueGene/L Design Fundamentals
Low-power core
System-on-a-chip ASIC technology
Dense packaging
Ducted, air-cooled, 25 kW rack
Redundancy, fault detection and fault tolerance
Standard, proven components for reliability and cost
Custom advanced components where needed for increased application performance

3 ASIC Cost/Performance Advantage
Embedded processor has a power/performance advantage
System-on-a-chip allows less complexity and denser packaging

4 BlueGene/L System

5 The BlueGene/L Networks

6 BlueGene/L Compute ASIC
IBM CU-11, 0.13 µm technology
11 x 11 mm die size
25 x 32 mm CBGA, 474 pins (328 signal)
1.5/2.5 Volt

7 Dual-Node Compute Card
9 x 512 Mb DRAM; 16B interface
Heatsinks designed for 15 W
Card: 54 mm (2.125") x 206 mm (8.125") wide, 14 layers
Metral 4000 connector (180 pins)

8 32-Way (4x4x2) Node Card
16 compute cards
2 optional I/O cards
dc-dc converters
I/O Gb Ethernet connectors through tailstock
Latching and retention
Midplane connector (450 pins): torus, tree, barrier, clock, Ethernet service port
Ethernet-to-JTAG FPGA

9 512-Way BG/L Prototype

10 64-Rack Floor Layout (compute racks only)

11 This artist's concept for BlueGene/L illustrates its remarkably compact footprint: the 2,500 ft² footprint includes 400 TB of disk storage.

12 BlueGene/L Link Chip
IBM CU-11, 0.13 µm technology
6.6 mm die size
25 x 32 mm CBGA, 474 pins (312 signal)
1.5 Volt

13 BG/L Link Card
Link ASIC, ~4 W
Ethernet-to-JTAG FPGA
Redundant dc-dc converters
22 differential-pair cables, max 8.5 meters
Midplane connector (540 pins)

14 BG/L Rack, Cabled (X, Y and Z cables)

15 BlueGene/L Link “Eye” Measurements at 1.6 Gb/s
Card path: signal path includes module, card wire (86 cm), and card-edge connectors
Cable path: signal path includes module, card wire (2 x 10 cm), cable connectors, and 8 m cable

16 Link Performance Exceeds Design Target (700 MHz)
Early measurements of raw link BER: 36 hours on the 32-way system with a single error, i.e. ~3.5 x 10^-17 at 1.7 Gb/s
All observed errors were “corrected” through packet retransmission
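
As a rough consistency check on the quoted figure, the sketch below back-computes the number of bits implied by one error at a BER of 3.5 x 10^-17 and the aggregate traffic that implies over the 36-hour run. This is only an illustration of the BER arithmetic, assuming the quoted BER corresponds to the single error over the full run; the 1.7 Gb/s per-link rate is used only to gauge roughly how many link directions were under test.

```python
# Rough sanity check on the raw-link BER figure quoted above.
# Assumption: the 3.5e-17 BER corresponds to the single error seen
# over the full 36-hour run on the 32-way prototype.

errors = 1
ber = 3.5e-17            # quoted bit error rate
hours = 36
link_rate_bps = 1.7e9    # per-link signalling rate, 1.7 Gb/s

total_bits = errors / ber                      # bits implied by 1 error at this BER
aggregate_bps = total_bits / (hours * 3600.0)  # average traffic across all tested links
links_under_test = aggregate_bps / link_rate_bps

print(f"implied total bits: {total_bits:.2e}")               # ~2.9e16 bits
print(f"implied aggregate rate: {aggregate_bps/1e9:.0f} Gb/s")  # ~220 Gb/s
print(f"implied link directions: {links_under_test:.0f}")    # on the order of 100
```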

17 Bit Error Rate Measurements (BER test status: 6/9/03)
Average data rate for the experiment exceeds 260 Gb/s, with 24% of bits transmitted through 8-10 m cables.
In over 4900 total hours of operation, over 4.6 x 10^18 bits have been transferred with only 8 errors observed (one error through 8-10 m cables).
All errors were single-bit (detectable by CRC).
Aggregate midplane bandwidth is 8.4 Tb/s; at a BER of 10^-18 we expect a single bit error about every 33 hours per midplane.
Based on these results, packet resends due to CRC-detected link errors will not significantly degrade BG/L performance.

  Data Rate (Gb/s)   Time (hours)   Total bits     Errors   BER
  1.4                  335          2.3 x 10^17      0      4.4 x 10^-18
  1.5                  184          1.3 x 10^17      0      7.5 x 10^-18
  1.6                  893          9.3 x 10^17      0      1.1 x 10^-18
  1.7                 2139          2.0 x 10^18      1      4.9 x 10^-19
  1.8                  607          6.3 x 10^17      6      9.6 x 10^-18
  1.9                  512          5.0 x 10^17      0      2.0 x 10^-18
  2.0                  289          2.2 x 10^17      1      4.5 x 10^-18
  1.4-1.7             3551          3.3 x 10^18      1      3.0 x 10^-19
  1.8-2.0             1408          1.4 x 10^18      7      5.1 x 10^-18
  Total               4959          4.7 x 10^18      8      8.9 x 10^-19
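
The 33-hour figure on this slide follows directly from the aggregate midplane bandwidth and the assumed BER. A minimal sketch of that arithmetic (nothing here beyond the slide's own numbers):

```python
# Expected time between single-bit link errors per midplane,
# given the aggregate midplane bandwidth and a target BER.

aggregate_bw_bps = 8.4e12   # 8.4 Tb/s aggregate midplane bandwidth
ber = 1e-18                 # bit error rate assumed on the slide

errors_per_second = aggregate_bw_bps * ber
hours_per_error = 1.0 / errors_per_second / 3600.0

print(f"expected time between errors: {hours_per_error:.0f} hours")  # ~33 hours
```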

18 BlueGene/L 512-Way Prototype Power

  Maximum Power (W)                        500 MHz                 700 MHz
  Unit                      Num      Unit Pwr  Total Pwr     Unit Pwr  Total Pwr
  Node Cards                 16         390      6240           519      8304
  Link Cards                  4          21        84            26       104
  Service Card                1          17        17            17        17
  dc-dc Conversion Loss     ---         ---       791           ---      1051
  Fans                       30          26       780            26       780
  ac-dc Conversion Loss     ---         ---       950           ---      1231
  Midplane Total Power      ---         ---      8862           ---     11487
  64k System Power (kW)     128       8.862      1146        11.487      1470
  MF/W (Peak)               ---         ---       231           ---       250
  MF/W (Sustained)          ---         ---       160           ---       172
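
The peak MF/W rows can be approximately reproduced from the node count, clock frequency, and 64k-system power in the table. The sketch below assumes the usual BG/L peak-rate accounting of 4 floating-point operations per processor cycle (dual FPU with fused multiply-add) and 2 processors per node; that per-cycle figure is an assumption for illustration, not something stated on the slide.

```python
# Reproduce the peak MF/W rows of the table from node count, clock
# frequency, and total 64k-system power.
# Assumption: 4 flops per cycle per processor (dual FPU, fused multiply-add),
# 2 processors per node -- standard peak-rate accounting, not stated on the slide.

FLOPS_PER_CYCLE = 4
PROCS_PER_NODE = 2
NODES = 65536

def peak_mflops(clock_hz):
    return NODES * PROCS_PER_NODE * FLOPS_PER_CYCLE * clock_hz / 1e6

for clock_hz, system_power_w in [(500e6, 1_146_000), (700e6, 1_470_000)]:
    mf_per_w = peak_mflops(clock_hz) / system_power_w
    print(f"{clock_hz/1e6:.0f} MHz: {mf_per_w:.0f} MF/W (peak)")
    # -> ~229 MF/W at 500 MHz and ~250 MF/W at 700 MHz,
    #    closely matching the 231 / 250 entries in the table
```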

19 Fan module and ac-dc converter (photo)

20 Rack Alternative Ducting Scheme (Shawn Hall, 4-3-02 / 02-04-03)
Angled plenums: ducts are larger where flow is greater (Tj ~10 C lower)
Hot and cold ducts are separated by a thermal-insulating baffle
Flow rate in the cold duct is largest at the bottom; flow rate in the hot duct is largest at the top
The baseline scheme has the same duct area, top to bottom, regardless of flow rate
BG/L airflow comes directly from the raised floor

21 BG/L Reliability & Serviceability
Redundant bulk supplies, power converters, fans, DRAM bits and cable bits
ECC or parity/retry with sparing on most buses
Extensive data logging (voltage, temperature, recoverable errors, ...) and failure forecasting
Uncorrectable errors cause a restart from checkpoint after repartitioning
Only fails early in the global clock tree, or certain failures of link cards, require immediate service

22 BG/L Reliability Estimates

  Component                      FIT per component*   Components per 64k partition   FITs per system   Failure rate per week
  Ethernet->JTAG FPGA                  160                     2806                       450k
  DRAM                                   5                  599,040                     2,995k
  Compute + I/O ASIC                    20                   66,560                     1,331k
  Link ASIC                             10                     3072                    10k, 20k**
  Clock chip                             5                  ~11,000                    50k, 5k**
  Non-redundant power supply           500                      768                       384k
  Total (65,536 compute nodes)                                                            5247k              0.88***

* After burn-in and applied redundancy.
** Will result in at most 1/4 of the system being unavailable.
*** 1.4% of fails (about 2 fails in 3 years) are multi-midplane; the remainder are restricted to a single midplane.
Conditions: T = 60 C, V = nominal, 40K power-on hours.
FIT = failures in parts per million per thousand power-on hours; 1 FIT = 0.168 x 10^-6 fails/week if the machine runs 24 hrs/day.
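
The 0.88 fails/week figure follows from the FIT definition given in the footnote. A minimal sketch of that conversion, using only the numbers from the table:

```python
# Convert the system FIT total into an expected failure rate per week.
# 1 FIT = 1 failure per 10^9 power-on hours
#       = 0.168e-6 fails/week for a machine running 24 hours/day.

HOURS_PER_WEEK = 7 * 24
FAILS_PER_WEEK_PER_FIT = HOURS_PER_WEEK / 1e9   # = 0.168e-6

system_fits = 5_247_000          # "5247k" total from the table
fails_per_week = system_fits * FAILS_PER_WEEK_PER_FIT

print(f"{fails_per_week:.2f} expected fails per week")   # ~0.88
```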

23 BlueGene/L Facts

  Platform Characteristics   512-node prototype      64-rack BlueGene/L machine
  Peak Performance           1.0 / 2.0 TFlops/s      180 / 360 TFlops/s
  Total Memory Size          128 GByte               16 / 32 TByte
  Footprint                  9 sq feet               2500 sq feet
  Total Power                9 kW                    1.5 MW
  Compute Nodes              512 dual-proc           65,536 dual-proc
  Clock Frequency            500 MHz                 700 MHz
  Networks                   Torus, Tree, Barrier    Torus, Tree, Barrier
  Torus Bandwidth            3 B/cycle               3 B/cycle

24 BlueGene/L Comparison
Price/performance: ~$100 million for 180/360 TFlops/s; performance scales much better with machine size than standard Linux clusters
Space efficiency: 2500 sq ft for the 64-rack system, including disk; 2x-4x compute density improvement vs. blades, with superior connectivity
Power efficiency: 1.5 MW for the 64-rack system
Scalability: from 1/2 rack to 100s of racks
Reliability: less than 1 fail per week expected for the 64-rack system

25 Summary
Exploiting low-power embedded processors, system-on-a-chip ASICs, and dense packaging enables large improvements in peak performance, cost/performance, floor space, and total power consumed over previous supercomputers.
The 512-way, 500 MHz prototype is complete and all major functional subsystems are operational: compute and I/O nodes with Gb Ethernet; tree, torus and global interrupts; control system.
Power and performance of the half-rack, 512-way prototype meet the design goals required to build a 64k-node BG/L system.
700 MHz production-level system bringup has begun.
The success of BlueGene/L depends on the number and variety of applications that can be ported to run efficiently on the hardware.




