BlueGene/L Supercomputer: A System Overview. Pietro Cicotti, October 10, 2005, University of California, San Diego.


1 BlueGene/L Supercomputer: A System Overview
Pietro Cicotti, October 10, 2005
University of California, San Diego

2 Outline
- Introduction
- Hardware Overview
  - Packaging
  - Nodes
  - Networks
- Software Overview
  - Use Double Floating Point Unit
  - Computation Modes
  - Examples
- Conclusions

3 Introduction (1)
- Goals
  - Price/Performance
  - Power/Performance
  - Performance
- Massively Parallel System
  - Largest System scheduled at LLNL
  - 2^16 compute nodes
  - 64x32x32 three-dimensional torus
  - 360 TFlops
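
The system-size figures on this slide can be checked with a little arithmetic. The sketch below is only an estimate: the torus dimensions come from the slide, the 700 MHz clock and two cores per node come from the Nodes slides, and the value of 4 floating-point operations per cycle per core (two fused multiply-adds on the double FPU) is an assumption not stated in the deck. The function names are hypothetical.

```python
# Torus dimensions as given on the slide.
TORUS_DIMS = (64, 32, 32)


def node_count(dims):
    """Multiply the torus dimensions to get the number of compute nodes."""
    n = 1
    for d in dims:
        n *= d
    return n


def peak_tflops(nodes, cores_per_node=2, clock_hz=700e6, flops_per_cycle=4):
    """Estimate peak performance in TFlops.

    flops_per_cycle=4 is an assumption (two fused multiply-adds
    per cycle per core); the slides do not state it.
    """
    return nodes * cores_per_node * clock_hz * flops_per_cycle / 1e12


nodes = node_count(TORUS_DIMS)
print(nodes, nodes == 2**16)      # node count and whether it equals 2^16
print(round(peak_tflops(nodes)))  # estimated peak in TFlops
```

Under these assumptions the estimate lands slightly above the 360 TFlops quoted on the slide, which is consistent with the slide giving a rounded figure.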

4 Introduction (2)
- How to meet these goals?
  - Modest Clock Rate
  - Single ASIC
  - Dense Packaging
- Results
  - ~1 MW, ~300 tons, <2500 sq ft
- Earth Simulator (40 TFlops)
  - 2x37125 sq ft

5 Packaging
Chips to Racks: (figure not reproduced in the transcript)

6 Packaging (2)
- Air cooled system
  - Standard raised floor
  - 220V rack feed and failover air
  - Adaptive speed
- MTBF
  - Dominated by memory failure
  - 6.16 days

7 Nodes (1)
- ASIC
  - Dual PowerPC 440 FP2, 700 MHz
  - 32KB L1 non-coherent
  - 2KB prefetch buffer (L2)
  - 4MB L3 shared EDRAM
  - Network controllers
- Memory
  - 1GB DDR memory

8 Nodes (2)

9 Nodes (3)
- PowerPC 440 FP2
  - Two floating-point units
  - SIMOMD instructions
  - Quadword datapath
  - Superscalar architecture
- ALU + load/store

10 Networks (1)
- Torus
  - 3D point-to-point links
  - Routers embedded
  - 6 connections per node
  - 175 MB/s bandwidth per link
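
The "6 connections per node" figure follows directly from the 3D torus topology: each node has one neighbor in each direction along each axis, with wraparound at the edges. A minimal sketch, assuming the 64x32x32 dimensions given in the introduction and a hypothetical helper name `torus_neighbors`:

```python
def torus_neighbors(x, y, z, dims=(64, 32, 32)):
    """Return the six nearest neighbors of node (x, y, z) on a 3D torus.

    The modulo operation provides the wraparound links that
    distinguish a torus from a plain mesh.
    """
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z),
        ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z),
        (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz),
        (x, y, (z - 1) % dz),
    ]


# A corner node still has six distinct neighbors thanks to wraparound.
print(torus_neighbors(0, 0, 0))
```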

11 Networks (2)
- Tree
  - 350 MB/s bandwidth, latency 1.5 msec
  - Per module integer ALU
- Interrupt/Barrier
  - Barriers
  - AND/OR
- JTAG
  - Control network
- Gigabit Ethernet for I/O
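
The integer ALU in the tree network and the AND/OR logic of the barrier network both combine one value per node into a single global result. The sketch below illustrates that idea in software only: it assumes a binary tree and a hypothetical function `tree_reduce`, and it says nothing about the actual arity or routing of the hardware tree.

```python
from operator import add, and_, or_


def tree_reduce(values, op):
    """Combine values pairwise, level by level, as a tree would.

    A binary tree is assumed purely for illustration.
    """
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    while len(level) > 1:
        # Combine adjacent pairs; an unpaired last element moves up unchanged.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


print(tree_reduce([1, 2, 3, 4, 5], add))  # integer sum across nodes
print(tree_reduce([1, 1, 0, 1], and_))    # global AND: 1 only if every node has arrived
print(tree_reduce([0, 0, 1], or_))        # global OR: 1 if any node raises a flag
```

A global AND over per-node "arrived" flags behaves like a barrier, and a global OR behaves like an interrupt.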

12 Software Support
- Compute nodes
  - Compute Node Kernel
  - Asymmetric view of the cores
  - Run only one job at a time
  - Different operation modes
- I/O nodes
  - Run Linux
  - Manage compute nodes
  - Perform file I/O

13 Using the Double FP unit
- Compiler Optimization
  - Generate SIMOMD operations
- Requires consecutive data, 16-byte aligned
- C/C++ explicit aliasing disambiguation
- Use of primitive functions

14 Operation Modes
- Default
  - Computation/Communication
- Virtual Node Mode
  - Resources split
- Coprocessor Mode
  - L1 cache not hardware coherent
  - Software-based coherence

15 Coprocessor Mode
- Fork-join model of execution
  - No communication
  - Single-shot work
  - Permanent work
- Avoid false sharing
  - 32-byte alignment
  - Data partitioning
  - Use of shadow variables
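
Data partitioning avoids false sharing when the boundary between the two cores' portions of an array falls on a cache-line boundary, so that no line is written by both cores. The sketch below assumes a 32-byte L1 cache line, 8-byte double-precision elements, and an array whose base address is itself line-aligned; the constant and function names are hypothetical.

```python
CACHE_LINE_BYTES = 32  # assumed L1 cache line size
DOUBLE_BYTES = 8       # size of one double-precision element


def split_index(n, line_bytes=CACHE_LINE_BYTES, elem_bytes=DOUBLE_BYTES):
    """Return the index where core 1's portion of an n-element array begins.

    The midpoint is rounded down to a multiple of the number of elements
    per cache line, so the two halves never share a line (assuming the
    array base is line-aligned).
    """
    per_line = line_bytes // elem_bytes
    return (n // 2) // per_line * per_line


n = 10
s = split_index(n)
print(f"core 0: [0, {s})   core 1: [{s}, {n})")
```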

16 Reciprocal Computation
- T = 8.4282n + 510.67 (pclks)
- T_co = 5.4335n + 1807.3 (pclks)
- Crossover point at 430
- 78% parallel efficiency
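
The crossover point and the efficiency figure can be recovered from the two fitted timing lines. The sketch below sets the single-core time equal to the coprocessor-mode time to find the crossover, and takes the ratio of the slopes as the asymptotic speedup. Reading the 78% figure as that speedup divided by two cores is an interpretation, not something the slide states; the function names are hypothetical.

```python
def t_single(n):
    """Single-core time in processor clocks, from the slide's fit."""
    return 8.4282 * n + 510.67


def t_coproc(n):
    """Coprocessor-mode time in processor clocks, from the slide's fit."""
    return 5.4335 * n + 1807.3


# Solve 8.4282 n + 510.67 = 5.4335 n + 1807.3 for n.
crossover = (1807.3 - 510.67) / (8.4282 - 5.4335)

# For large n the constant terms vanish and the speedup tends to the slope ratio.
asymptotic_speedup = 8.4282 / 5.4335
efficiency = asymptotic_speedup / 2  # two cores

print(f"crossover n = {crossover:.1f}")
print(f"asymptotic speedup = {asymptotic_speedup:.3f}")
print(f"parallel efficiency = {efficiency:.1%}")
```

The computed crossover is close to 433, in line with the rounded value of 430 on the slide, and the efficiency rounds to 78%.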

17 Daxpy routine
- BLAS routine
- y(i) = a*x(i) + y(i)
- Theoretical limit of 8 flops in 3 cycles
- Obtained 6 flops in 3 cycles (75%)
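
For reference, the daxpy update is a single multiply-add per element. The sketch below is a plain scalar version meant only to show the operation and the 75% arithmetic; it does not model the double FPU or the tuned routine measured on the slide.

```python
def daxpy(a, x, y):
    """Compute y(i) = a*x(i) + y(i) in place and return y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]  # one multiply and one add: 2 flops per element
    return y


print(daxpy(2.0, [1.0, 2.0], [10.0, 20.0]))

# Fraction of the theoretical limit reached, using the slide's figures.
achieved_flops, limit_flops = 6, 8  # both per 3 cycles
print(f"{achieved_flops / limit_flops:.0%}")
```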

18 Linpack
- Performance as percentage of peak
- Weak scaling (70% memory)
- Coprocessor vs Virtual Node
  - 512 nodes
  - 70% - 65%

19 NAS BT
- Navier-Stokes equations
- 3D space decomposition
- 2D square process mesh
- 30-50 Mflops/task

20 Conclusions
- System overview
- Peak performance
  - Use both FP units
  - Run in coprocessor
- Cache coherence problem
  - Run in virtual node
- Half resources, scalability
  - Map processes to nodes
- Questions?

21 References
- An overview of BlueGene/L Supercomputer
  http://www.llnl.gov/asci/platforms/bluegenel/sc2002-pap207.pdf
- Enabling Dual-Core Mode in BlueGene/L: Challenges and Solutions
  http://ieeexplore.ieee.org/iel5/8848/27982/01250317.pdf?tp=&arnumber=1250317&isnumber=27982
- Unlocking Performance of the BlueGene/L Supercomputer
  http://www-unix.mcs.anl.gov/%7Egropp/projects/parallel/BGL/docs/unlockingperf.pdf
- BG System overview
  http://www.sdsc.edu/user_services/bluegene/guide/docs/SDSC_BG_overview050707.ppt

