HPCC Mid-Morning Break Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery Introduction to the new GPU (GFX) cluster.



Naming
dev - developer node
gfx - graphics node
GPU - Graphics Processing Unit (used interchangeably with gfx)
nvx - nVidia graphics node (internal prefix used on the new cluster)

HPCC System Diagram

System Diagram of GFX
dev-gfx08 - developer node (formerly gfx-000)
dev-gfx10 - new developer node
gfx10 - new 32-node GPU cluster

dev-gfx08 (formerly gfx-000)
Single quad-core 2.4 GHz Intel processor
8 GB of CPU RAM
Three nVidia GTX 280 video cards:
 1 GB of RAM per card
 240 CUDA processing cores per card
 1.3 GHz processor clock speed
Total of 724 cores on a single machine (3 x 240 GPU cores + 4 CPU cores)

New gfx10 hardware
32-node cluster + 1 developer node
 CPU: 2x Intel Xeon E5530 quad-core, 2.40 GHz
 Memory: 18 GB of RAM (~2 GB per core)
 Hard drive: 200 GB disk for local scratch
 Network: Ethernet only (no InfiniBand)
 GPU: two nVidia Tesla M1060 GPUs
Note: 21 of these nodes are buy-in nodes and are set aside for primary use by their owners.

Each nVidia Tesla M1060
Streaming processor cores: 240
Processor core frequency: 1.3 GHz
Single-precision peak floating-point performance: 933 gigaflops
Double-precision peak floating-point performance: 78 gigaflops
Dedicated memory: 4 GB GDDR3
Memory speed: 800 MHz
Memory interface: 512-bit
Memory bandwidth: 102 GB/sec
System interface: PCIe

Installed Software on HPCC
CUDA toolkit 2.2 and 2.3 - for programming in C/C++ and Fortran
PGI compilers with graphics-accelerator support
cublas - CUDA version of the BLAS libraries
cufft - CUDA version of the FFT libraries
pycuda - Python CUDA interface
OpenMM/GROMACS - molecular dynamics programs optimized for GPUs

Other Available Software
OpenCL - C/C++ interface
Jacket - MATLAB GPU wrapper
Lattice Boltzmann - PDE solver
OpenVIDIA - machine vision
Many, many others
CUDA Zone:
 ~90 thousand CUDA developers
 Lots of software examples
 Developer forums
 Tutorials

Running on the new cluster
We are still researching the best way to run on the GPU cluster.
Looking for beta testers:
 You will probably add the gfx10 attribute to your submission script.
 Please sign up and we will add you to a mailing list and give you access to the gfx10 queue.

Example script

#!/bin/bash --login
#PBS -l nodes=1:ppn=1:gfx10,walltime=01:00:00

module load cuda
myprogram myarguments

What is a GPU? Graphics Processing Unit
Originally designed for video games.
Uses many processing cores to parallelize the math required for real-time game play.
Early researchers wrote general-purpose programs disguised as graphics computations so they could run on the GPU.
In 2006 nVidia released the CUDA programming interface to let users easily write scalable general-purpose programs that run on the GPU (GPGPU).

GPU vs CPU

CPU and GPU working together

Running on the GPU
The program starts on the CPU, then:
 Copy data to the GPU (slow-ish)
 Run kernel threads on the GPU (very fast)
 Copy results back to the CPU (slow-ish)
There are a lot of clever ways to fully utilize both the GPU and CPU.
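The copy-compute-copy pattern above looks roughly like this in CUDA C. This is a minimal sketch, not from the slides: the scale kernel, array size, and launch configuration are all illustrative.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative kernel: each thread doubles one element of the array. */
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) host[i] = 1.0f;

    float *dev;
    cudaMalloc(&dev, bytes);

    /* 1. Copy data to the GPU (slow-ish). */
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    /* 2. Run kernel threads on the GPU (very fast): 256 threads per block. */
    scale<<<(n + 255) / 256, 256>>>(dev, n);

    /* 3. Copy results back to the CPU (slow-ish). */
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    printf("host[0] = %f\n", host[0]);

    cudaFree(dev);
    free(host);
    return 0;
}
```

On the gfx10 nodes this would be compiled with nvcc after loading the cuda module. Note that the two cudaMemcpy calls cross the PCIe bus, which is why minimizing transfers matters so much for performance.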

Pros and Cons
Benefits:
 Lots of processing cores
 Works with the CPU as a co-processor
 Very fast local memory bandwidth
 Large online community of developers
Drawbacks:
 Can be difficult to program
 Memory transfers between GPU and CPU are costly (in time)
 Cores typically run the same code