HPCC Mid-Morning Break: High Performance Computing on a GPU Cluster
Dirk Colbry, Ph.D.
Research Specialist, Institute for Cyber Enabled Discovery
What is a GPU?
GPU stands for Graphics Processing Unit. GPUs were originally designed for video games: they use many processing cores to parallelize the math required for real-time game play.
Early researchers disguised general-purpose programs as graphics operations so that they could run on the GPU.
In 2006 NVIDIA introduced the CUDA programming interface, which lets users write scalable general-purpose programs that run on the GPU (GPGPU).
Running on the GPU
1. The program starts on the CPU.
2. Copy data to the GPU (slow-ish).
3. Run kernel threads on the GPU (very fast).
4. Copy results back to the CPU (slow-ish).
There are a lot of clever ways to keep both the GPU and the CPU fully utilized.
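Because the two "slow-ish" copies bracket every kernel launch, a fast kernel alone does not guarantee a speedup. The back-of-envelope model below is a minimal sketch assuming the copy-in, kernel, and copy-out run back to back with no overlap (the clever overlapping tricks mentioned above would shrink the total). The function names, the 4 GB/s link rate, the 400 MB payloads, and the kernel and CPU times are all hypothetical illustrative values, not measurements from our hardware.

```python
def offload_seconds(bytes_in, bytes_out, kernel_s, link_bytes_per_s):
    """Time for one copy-in / kernel / copy-out round trip, with no overlap.

    link_bytes_per_s is the effective CPU<->GPU transfer rate; it is a
    hypothetical parameter you would need to measure on real hardware.
    """
    transfer_s = (bytes_in + bytes_out) / link_bytes_per_s
    return transfer_s + kernel_s


def gpu_pays_off(cpu_s, bytes_in, bytes_out, kernel_s, link_bytes_per_s):
    """True when the full GPU round trip is faster than the CPU-only run."""
    return offload_seconds(bytes_in, bytes_out, kernel_s, link_bytes_per_s) < cpu_s


if __name__ == "__main__":
    # Illustrative numbers only: 400 MB each way over an assumed 4 GB/s link.
    payload = 400 * 10**6
    link = 4 * 10**9
    t = offload_seconds(payload, payload, kernel_s=0.01, link_bytes_per_s=link)
    print(f"GPU round trip: {t:.2f} s")  # the copies, not the kernel, dominate
    print(gpu_pays_off(0.50, payload, payload, 0.01, link))
    print(gpu_pays_off(0.15, payload, payload, 0.01, link))
```

With these made-up numbers the kernel takes 0.01 s but the round trip takes about 0.21 s, so offloading only wins if the CPU version takes longer than that.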
Pros and Cons
Benefits:
- Lots of processing cores.
- Works alongside the CPU as a co-processor.
- Very high local memory bandwidth.
- Large online community of developers.
Drawbacks:
- Can be difficult to program.
- Memory transfers between the GPU and CPU are costly in time.
- Cores typically run the same code.
gfx-000 Test Hardware
- Single quad-core 2.4 GHz Intel processor
- 8 GB of CPU RAM
- Three NVIDIA GTX 280 video cards, each with:
  - 1 GB of RAM
  - 240 CUDA processing cores
  - 1.3 GHz processor clock speed
- Total of 724 cores on a single machine (3 × 240 GPU cores + 4 CPU cores)
Installed Software on gfx-000
- CUDA Toolkit 2.2 and 2.3, for programming in C/C++ and Fortran
- cublas: CUDA version of the BLAS libraries
- cufft: CUDA version of the FFT libraries
- pycuda: Python CUDA interface
- Zephyr: molecular dynamics program optimized for GPUs
Other Available Software
- OpenCL: C/C++ interface
- Jacket: MATLAB GPU wrapper
- Lattice Boltzmann PDE solver
- OpenVIDIA: machine vision
- Many, many others
CUDA Zone
- ~90 thousand CUDA developers
- Lots of software examples
- Developer forums
- Tutorials
- http://www.nvidia.com/object/cuda_home.html
New GPU Cluster Buy-In
- Rack units: 1U
- CPU: 2x Intel Xeon E5530 quad-core 2.40 GHz
- Memory: 18 GB of RAM
- Hard drive: 250 GB disk for OS and local scratch
- Network: Ethernet only (no InfiniBand support)
- GPU: two NVIDIA Tesla M1060 GPUs
- Support: four-year, next-business-day hardware support
- Cost: $5,224
Each NVIDIA Tesla M1060
- Number of streaming processor cores: 240
- Frequency of processor cores: 1.3 GHz
- Single-precision peak floating-point performance: 933 gigaflops
- Double-precision peak floating-point performance: 78 gigaflops
- Dedicated memory: 4 GB GDDR3
- Memory speed: 800 MHz
- Memory interface: 512-bit
- Memory bandwidth: 102 GB/sec
- System interface: PCIe
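The headline M1060 figures (933 and 78 gigaflops, 102 GB/sec) can be roughly reproduced from the core count, clock, and memory interface. The sketch below is an illustration under assumptions that are not on the slide: an exact shader clock of 1.296 GHz (which the slide rounds to 1.3 GHz), the GT200 layout of 30 multiprocessors with 8 cores each, 3 single-precision flops per core per clock (a multiply-add plus a dual-issued multiply), one double-precision unit per multiprocessor doing a 2-flop fused multiply-add per clock, and GDDR3 moving data twice per memory clock.

```python
CORES = 240               # streaming processor cores (from the slide)
MULTIPROCESSORS = 30      # assumed: 30 multiprocessors x 8 cores = 240
SHADER_HZ = 1.296e9       # assumed exact clock; the slide rounds to 1.3 GHz
MEM_CLOCK_HZ = 800e6      # memory speed (from the slide)
BUS_BITS = 512            # memory interface width (from the slide)


def peak_sp_gflops():
    # Assumed 3 flops per core per clock: multiply-add (2) + multiply (1).
    return CORES * SHADER_HZ * 3 / 1e9


def peak_dp_gflops():
    # Assumed one double-precision unit per multiprocessor, each finishing
    # one fused multiply-add (2 flops) per clock.
    return MULTIPROCESSORS * SHADER_HZ * 2 / 1e9


def bandwidth_gb_per_s():
    # Double data rate: two transfers per memory clock, each moving the
    # full bus width (512 bits = 64 bytes).
    bytes_per_transfer = BUS_BITS // 8
    transfers_per_s = MEM_CLOCK_HZ * 2
    return bytes_per_transfer * transfers_per_s / 1e9


print(f"single precision: {peak_sp_gflops():.1f} GFLOPS")
print(f"double precision: {peak_dp_gflops():.1f} GFLOPS")
print(f"memory bandwidth: {bandwidth_gb_per_s():.1f} GB/s")
```

Under these assumptions the script prints about 933.1 GFLOPS, 77.8 GFLOPS, and 102.4 GB/s, which round to the slide's figures. The roughly 12x gap between single and double precision is worth keeping in mind when choosing a numeric type for GPU code on this card.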
What Are We Buying?
- 240 cores × 2 GPUs + 4 cores × 2 CPUs = 488 cores per node
- 31 nodes (minimum) × 488 cores per node = 15,128 cores in our new cluster
- However, 20 of these nodes are dedicated buy-in nodes, so only the remaining 11 nodes (11 × 488 = 5,368 cores) will be available in the general cluster
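The core counts above can be checked with a few lines of arithmetic. This sketch follows the slide in counting each GPU streaming processor and each CPU core as one "core"; the constant and function names are hypothetical, chosen for illustration.

```python
GPU_CORES = 240      # streaming processors per Tesla M1060
GPUS_PER_NODE = 2
CPU_CORES = 4        # cores per quad-core Xeon E5530
CPUS_PER_NODE = 2


def cores_per_node():
    # GPU cores and CPU cores are simply added, as on the slide.
    return GPU_CORES * GPUS_PER_NODE + CPU_CORES * CPUS_PER_NODE


def cluster_cores(nodes):
    return nodes * cores_per_node()


total_nodes = 31     # minimum purchase
buy_in_nodes = 20    # dedicated to buy-in owners
print(cores_per_node())
print(cluster_cores(total_nodes))
print(cluster_cores(total_nodes - buy_in_nodes))  # general-cluster share
```

This prints 488, 15128, and 5368, matching the slide.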