1 Parallel Computers Today: LANL / IBM Roadrunner > 1 PFLOPS; two Nvidia 8800 GPUs > 1 TFLOPS; Intel 80-core chip > 1 TFLOPS. TFLOPS = 10^12 floating point ops/sec; PFLOPS = 10^15 floating point ops/sec (1,000,000,000,000,000 / sec).
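To make the gap between the two units concrete, here is a minimal sketch that converts an operation count into run time at each rate. The workload size (10^18 operations) and the helper name `seconds_for` are illustrative assumptions, not figures from the slides, and the rates are treated as sustained, which real machines rarely achieve.

```python
# Unit definitions taken from the slide.
TFLOPS = 10**12  # floating point operations per second
PFLOPS = 10**15  # floating point operations per second


def seconds_for(flop_count, rate):
    """Time in seconds to execute flop_count operations at a sustained rate."""
    return flop_count / rate


# Hypothetical workload of 10**18 floating point operations.
work = 10**18
t_tflops = seconds_for(work, TFLOPS)  # about 11.6 days
t_pflops = seconds_for(work, PFLOPS)  # under 17 minutes
print(t_tflops, t_pflops)
```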

2 Columbia (10240-processor SGI Altix, 50 Teraflops, NASA Ames Research Center)

3 Beowulf (18-processor cluster, lab machine)

4 AMD Opteron quad-core die

5 The nVidia G80 GPU: 128 streaming floating point processors @ 1.5 GHz; 1.5 GB shared RAM with 86 GB/s bandwidth; 500 Gflop/s on one chip (single precision).
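The two headline numbers on this slide can be combined into a rough machine-balance estimate: how many floating point operations the chip must perform per byte fetched from memory to stay busy. The sketch below assumes the bandwidth figure means 86 gigabytes per second and the peak means 500 Gflop per second; the variable names are illustrative.

```python
# Figures read off the slide (units are an assumption: GB/s and Gflop/s).
peak_flops = 500e9    # single-precision peak, flop/s
bandwidth = 86e9      # memory bandwidth, bytes/s
bytes_per_float = 4   # size of one single-precision value

# Operations needed per byte, and per single-precision value, moved from RAM
# for the arithmetic units rather than the memory system to be the limit.
flops_per_byte = peak_flops / bandwidth
flops_per_float = flops_per_byte * bytes_per_float

print(f"{flops_per_byte:.1f} flop per byte, {flops_per_float:.1f} flop per float")
```

Under these assumptions, a kernel that does fewer than about 23 operations per value loaded is limited by memory bandwidth rather than by arithmetic.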

6 The Computer Architecture Challenge: Most high-performance computer designs allocate resources to optimize Gaussian elimination on large, dense matrices (the factorization PA = LU). Originally, because linear algebra is the middleware of scientific computing. Nowadays, mostly for bragging rights.
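For reference, Gaussian elimination with partial pivoting computes a row permutation P, a unit lower-triangular L, and an upper-triangular U with PA = LU, at a cost of roughly (2/3)n^3 floating point operations for an n-by-n matrix. The following is a minimal, unoptimized sketch in plain Python; the function name `lu_partial_pivot` and the list-of-lists representation are choices made for this illustration, and it is not how tuned benchmark codes implement the factorization.

```python
def lu_partial_pivot(A):
    """Return (perm, L, U) such that row i of L*U equals row perm[i] of A."""
    n = len(A)
    U = [list(map(float, row)) for row in A]  # working copy, reduced to U
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Choose the largest-magnitude entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            # Swap rows of U, the multipliers already stored in L, and perm.
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]
            perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = 1.0
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]  # multiplier that eliminates U[i][k]
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return perm, L, U


A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
perm, L, U = lu_partial_pivot(A)
print(perm)  # row order of A that forms PA
```

The triple loop is where the cubic operation count comes from, which is why this kernel rewards machines with high floating point throughput.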

7 Top 500 List http://www.top500.org/list/2008/11/100

8 Generic Parallel Machine Architecture. Key architecture question: Where is the interconnect, and how fast? Key algorithm question: Where is the data? [Figure: three nodes, each with the storage hierarchy Proc, Cache, L2 Cache, L3 Cache, Memory, with potential interconnects between them.]

9 Multicore SMP Systems [Figure: block diagrams of three systems.]
Intel Clovertown: pairs of Core2 cores, each pair sharing a 4MB L2; FSBs at 10.6 GB/s each into a chipset (4x64b controllers); fully buffered DRAM at 21.3 GB/s (read), 10.6 GB/s (write).
Sun Niagara2: eight multithreaded (MT) UltraSparc cores, each with an FPU and 8K D$; crossbar switch at 179 GB/s (fill), 90 GB/s (writethru); 4MB shared L2 (16 way); 4x128b FBDIMM memory controllers; fully buffered DRAM at 42.7 GB/s (read), 21.3 GB/s (write).
AMD Opteron: Opteron cores with 1MB victim caches; memory controller / HT; DDR2 DRAM at 10.6 GB/s; HT at 4 GB/s (each direction).

10 More Detail on GPU Architecture

11 Michael Perrone (IBM): Proper Care and Feeding of Multicore Beasts http://www.csm.ornl.gov/workshops/HPA/documents/1-arch/feeding_the_beast_perrone.pdf

12 Cray XMT (highly multithreaded shared memory)

