Training Program on GPU Programming with CUDA
31st July, 7th Aug, 14th Aug 2011
CUDA Teaching Center @ UoM

Training Program on GPU Programming with CUDA
Sanath Jayasena
CUDA Teaching Center @ UoM
Day 1, Session 1: Introduction

Outline
Training Program Description
CUDA Teaching Center at UoM
Subject Matter
  – Introduction to GPU Computing
  – GPU Computing with CUDA
  – CUDA Programming Basics
July-Aug 2011 CUDA Training Program

Overview of Training Program
3 Sundays, starting 31st July
Schedule and program outline
Main resource persons
  – Sanath Jayasena, Jayathu Samarawickrama, Kishan Wimalawarna, Lochandaka Ranathunga
Dept of Computer Science & Eng, Dept of Electronic & Telecom. Engineering (of Faculty of Engineering) and Faculty of IT

CUDA Teaching Center
UoM was selected as a CUDA Teaching Center (CTC)
  – A group of people from multiple Depts
  – http://research.nvidia.com/content/cuda-teaching-centers
Benefits
  – Donation of hardware by NVIDIA (GeForce GTX480s and Tesla C2070)
  – Access to other resources
Expectations
  – Use of the resources for teaching/research, industry collaboration

GPU Computing: Introduction
Graphics Processing Units (GPUs)
  – High-performance many-core processors that can be used to accelerate a wide range of applications
GPGPU: General-Purpose computation on Graphics Processing Units
GPUs have led the race for floating-point performance since the start of the 21st century
GPUs are being used as parallel processors

GPU Computing: Introduction
General computing, until the end of the 20th century
  – Relied on advances in hardware to increase the speed of software/apps
Slowed down since then due to
  – Power consumption issues
  – Limited productivity within a single processor
Switch to multi-core and many-core models
  – Multiple processing units (processor cores) used in each chip to increase the processing power
  – Impact on software developers?

GPU Computing: Introduction
A sequential program will only run on one of the cores, which will not become any faster
With each new generation of processors
  – Software that continues to enjoy performance improvements will be parallel programs
  – In such programs, multiple threads of execution cooperate to achieve the functionality faster

CPU-GPU Performance Gap
[Figure. Source: CUDA Prog. Guide 4.0]

CPU-GPU Performance Gap
[Figure. Source: CUDA Prog. Guide 4.0]

GPGPU & CUDA
GPU designed as a numeric computing engine
  – Will not perform as well as CPUs on some tasks
  – Most applications will use both CPUs and GPUs
CUDA
  – NVIDIA's parallel computing architecture aimed at increasing computing performance by harnessing the power of the GPU
  – A programming model

More Details on GPUs
A GPU is typically a computer card, installed into a PCI Express x16 slot
Market leaders: NVIDIA, Intel, AMD (ATI)
  – Example NVIDIA GPUs (donated to UoM): GeForce GTX 480, Tesla C2070

Example Specifications
Specification                                     GTX 480           Tesla C2070
Peak double precision floating point performance  650 Gigaflops     515 Gigaflops
Peak single precision floating point performance  1300 Gigaflops    1030 Gigaflops
CUDA cores                                        480               448
Frequency of CUDA cores                           1.40 GHz          1.15 GHz
Memory size (GDDR5)                               1536 MB           6 GigaBytes
Memory bandwidth                                  177.4 GBytes/sec  150 GBytes/sec
ECC memory                                        NO                YES
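The single-precision peaks in the table can be cross-checked with a rough estimate. The assumption here, which the slides do not state, is that each CUDA core completes one fused multiply-add (counted as 2 floating-point operations) per clock cycle:

```latex
\[
P_{\text{peak}} \approx 2 \times N_{\text{cores}} \times f_{\text{core}}
\]
\[
\text{Tesla: } 2 \times 448 \times 1.15\ \text{GHz} = 1030.4\ \text{Gigaflops}
\qquad
\text{GTX 480: } 2 \times 480 \times 1.40\ \text{GHz} = 1344\ \text{Gigaflops}
\]
```

Under that assumption the estimate matches the 1030 Gigaflops listed for the Tesla and is close to the rounded 1300 Gigaflops listed for the GTX 480.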

CPU vs. GPU Architecture
The GPU devotes more transistors to computation

CPU-GPU Communication

CUDA Architecture
CUDA is NVIDIA's solution to access the GPU
Can be seen as an extension to C/C++
CUDA Software Stack

CUDA Architecture
There are two main parts
1. Host (CPU part)
   - Single Program, Single Data
2. Device (GPU part)
   - Single Program, Multiple Data

CUDA Architecture
GRID Architecture
The Grid
1. A group of threads all running the same kernel
2. Can run multiple grids at once
The Block
1. Grids are composed of blocks
2. Each block is a logical unit containing a number of coordinating threads and some amount of shared memory
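How the threads of one block can coordinate through shared memory can be sketched as follows. This is a minimal illustration and not code from the slides: the kernel name reverse_tiles, the helper reverse_tiles_on_device and the sizes BLOCK_SIZE and NUM_BLOCKS are all hypothetical choices. Each block loads its own tile into shared memory, waits at a barrier, and then writes the tile back in reverse order.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 8                  // threads per block (illustrative choice)
#define NUM_BLOCKS 2                  // blocks in the grid (illustrative choice)
#define N (BLOCK_SIZE * NUM_BLOCKS)   // total number of elements

// Each block reverses its own BLOCK_SIZE-element tile of the array.
__global__ void reverse_tiles(int *data)
{
    __shared__ int tile[BLOCK_SIZE];               // visible only to the threads of this block

    int local  = threadIdx.x;                      // position within the block
    int global = blockIdx.x * blockDim.x + local;  // position within the whole grid

    tile[local] = data[global];                    // every thread loads one element
    __syncthreads();                               // wait until the whole tile is loaded

    data[global] = tile[BLOCK_SIZE - 1 - local];   // read an element loaded by another thread
}

// Hypothetical host helper: reverses each tile of buf (N ints) on the GPU.
bool reverse_tiles_on_device(int *buf)
{
    int *dev = NULL;
    if (cudaMalloc((void **)&dev, N * sizeof(int)) != cudaSuccess)
        return false;
    cudaMemcpy(dev, buf, N * sizeof(int), cudaMemcpyHostToDevice);
    reverse_tiles<<<NUM_BLOCKS, BLOCK_SIZE>>>(dev);
    cudaError_t err = cudaMemcpy(buf, dev, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}

int main(void)
{
    int buf[N];
    for (int i = 0; i < N; ++i)
        buf[i] = i;
    if (!reverse_tiles_on_device(buf)) {
        printf("CUDA error\n");
        return 1;
    }
    for (int i = 0; i < N; ++i)
        printf("%d ", buf[i]);
    printf("\n");
    return 0;
}
```

The barrier matters because a thread reads an element that a different thread of the same block wrote. Threads in different blocks never see each other's tile.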

Some Applications of GPGPU
Computational Structural Mechanics
Bio-Informatics and Life Sciences
Computational Electromagnetics and Electrodynamics
Computational Finance

Some Applications…
Computational Fluid Dynamics
Data Mining, Analytics, and Databases
Imaging and Computer Vision
Medical Imaging

Some Applications…
Molecular Dynamics
Numerical Analytics
Weather, Atmospheric, Ocean Modeling and Space Sciences

CUDA Programming Basics

Accessing/Using the CUDA-GPUs
You have been given access to our cluster
  – User accounts on 192.248.8.13x
  – It is a Linux system
CUDA Toolkit and SDK for development
  – Includes CUDA C/C++ compiler for GPUs ("nvcc")
  – Will need C/C++ compiler for CPU code
NVIDIA device drivers needed to run programs
  – For programs to communicate with hardware
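A quick way to confirm that the toolkit and driver are working on a given machine is to list the GPUs the CUDA runtime can see. The sketch below is illustrative and not part of the slides. It uses the runtime calls cudaGetDeviceCount and cudaGetDeviceProperties, and the helper name count_cuda_devices is hypothetical. Assuming the source is saved as device_query.cu (also a hypothetical name), it could be built with something like "nvcc device_query.cu -o device_query".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of CUDA devices visible to the runtime, or -1 on error.
int count_cuda_devices(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return count;
}

int main(void)
{
    int count = count_cuda_devices();
    if (count < 0)
        return 1;

    printf("Found %d CUDA device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
            continue;                          // skip devices we cannot query
        printf("Device %d: %s\n", i, prop.name);
        printf("  compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  multiprocessors:    %d\n", prop.multiProcessorCount);
        printf("  global memory:      %zu MB\n", prop.totalGlobalMem >> 20);
    }
    return 0;
}
```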

Example Program 1
"__global__" says the function is to be compiled to run on a "device" (GPU), not "host" (CPU)
Angle brackets "<<< >>>" for passing params/args to runtime
A function executed on the GPU (device) is usually called a "kernel"

    #include <stdio.h>

    // Empty kernel: compiled for, and executed on, the device (GPU)
    __global__ void kernel (void) { }

    int main (void) {
        kernel <<<1,1>>> ();        // launch the kernel with 1 block of 1 thread
        printf("Hello World!\n");   // printed by the host (CPU)
        return 0;
    }

Example Program 2 – Part 1
As can be seen in the next slide:
  – We can pass parameters to a kernel as we would with any C function
  – We need to allocate memory to do anything useful on a device, such as return values to the host

Example Program 2 – Part 2

    int main (void) {
        int c, *dev_c;
        // Allocate device memory for the result
        cudaMalloc ((void **) &dev_c, sizeof (int));
        // Launch the kernel, passing parameters as for any C function
        add <<<1,1>>> (2, 7, dev_c);
        // Copy the result from device memory back to the host
        cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
        printf("2 + 7 = %d\n", c);
        cudaFree(dev_c);
        return 0;
    }
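The definition of the kernel add does not appear in the code above. A minimal complete sketch follows, assuming the kernel takes the two operands by value and a pointer to device memory for the result. The kernel body, the helper add_on_device and the error checks are illustrative assumptions and not taken from the slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed kernel: stores a + b in the device memory that c points to.
__global__ void add(int a, int b, int *c)
{
    *c = a + b;
}

// Hypothetical host helper: computes a + b on the GPU.
// Returns true on success and stores the sum in *result.
bool add_on_device(int a, int b, int *result)
{
    int *dev_c = NULL;

    // Device memory must be allocated before the kernel can return a value.
    if (cudaMalloc((void **)&dev_c, sizeof(int)) != cudaSuccess)
        return false;

    add<<<1, 1>>>(a, b, dev_c);                 // 1 block of 1 thread

    // Copying back waits for the kernel and also reports launch errors.
    cudaError_t err = cudaMemcpy(result, dev_c, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_c);
    return err == cudaSuccess;
}

int main(void)
{
    int c = 0;
    if (!add_on_device(2, 7, &c)) {
        printf("CUDA error\n");
        return 1;
    }
    printf("2 + 7 = %d\n", c);
    return 0;
}
```

Note that the host cannot dereference dev_c directly. The pointer refers to device memory, which is why the result is brought back with cudaMemcpy.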

Example Program 3
Within host (CPU) code, call the kernel by using <<< >>>, specifying the grid size (number of blocks) and/or the block size (number of threads) (more details later)

Example Program 3 …contd
Note: Details on threads and thread IDs will come later
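The code for this example is not included in the text. As a sketch of the kind of launch described, the program below adds two vectors using a grid of N blocks with one thread each, so every block handles one element. It is an assumption about what such a program could look like, not the program from the slides. The names vec_add and vec_add_on_device and the size N are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10   // number of elements and number of blocks (illustrative choice)

// One block per element: blockIdx.x selects the element to add.
__global__ void vec_add(const int *a, const int *b, int *c)
{
    int i = blockIdx.x;            // index of this block within the grid
    if (i < N)
        c[i] = a[i] + b[i];
}

// Hypothetical host helper: computes c = a + b for N-element arrays on the GPU.
bool vec_add_on_device(const int *a, const int *b, int *c)
{
    int *dev_a = NULL, *dev_b = NULL, *dev_c = NULL;
    size_t bytes = N * sizeof(int);

    bool ok = cudaMalloc((void **)&dev_a, bytes) == cudaSuccess
           && cudaMalloc((void **)&dev_b, bytes) == cudaSuccess
           && cudaMalloc((void **)&dev_c, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

        vec_add<<<N, 1>>>(dev_a, dev_b, dev_c);   // grid of N blocks, 1 thread per block

        ok = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_a);   // cudaFree accepts a null pointer
    cudaFree(dev_b);
    cudaFree(dev_c);
    return ok;
}

int main(void)
{
    int a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = i * i;
    }
    if (!vec_add_on_device(a, b, c)) {
        printf("CUDA error\n");
        return 1;
    }
    for (int i = 0; i < N; ++i)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);
    return 0;
}
```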

Example Program 4

Grids, Blocks and Threads
A grid of size 6 (3x2 blocks)
Each block has 12 threads (4x3)
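A launch with this shape can be sketched using dim3, assuming the 3x2 and 4x3 figures are given as (x, y) extents. This is a reading of the slide and not something it states. The kernel record_blocks and the helper run_grid_demo are hypothetical names. Each of the 72 threads computes its global (x, y) position and records the number of the block it belongs to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define GRID_X  3    // blocks per grid in x (assumed reading of "3x2")
#define GRID_Y  2    // blocks per grid in y
#define BLOCK_X 4    // threads per block in x (assumed reading of "4x3")
#define BLOCK_Y 3    // threads per block in y
#define WIDTH  (GRID_X * BLOCK_X)   // 12 threads across
#define HEIGHT (GRID_Y * BLOCK_Y)   // 6 threads down

// Every thread writes the number of its block at its own global position.
__global__ void record_blocks(int *out)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // global column, 0..WIDTH-1
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // global row, 0..HEIGHT-1
    int block_number = blockIdx.y * gridDim.x + blockIdx.x;   // 0..5
    out[y * WIDTH + x] = block_number;
}

// Hypothetical host helper: fills out (WIDTH * HEIGHT ints) with block numbers.
bool run_grid_demo(int *out)
{
    int *dev = NULL;
    size_t bytes = WIDTH * HEIGHT * sizeof(int);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess)
        return false;

    dim3 grid(GRID_X, GRID_Y);      // 6 blocks
    dim3 block(BLOCK_X, BLOCK_Y);   // 12 threads per block
    record_blocks<<<grid, block>>>(dev);

    cudaError_t err = cudaMemcpy(out, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}

int main(void)
{
    int out[WIDTH * HEIGHT];
    if (!run_grid_demo(out)) {
        printf("CUDA error\n");
        return 1;
    }
    // Print the map of block numbers, one row of threads per line.
    for (int y = 0; y < HEIGHT; ++y) {
        for (int x = 0; x < WIDTH; ++x)
            printf("%d ", out[y * WIDTH + x]);
        printf("\n");
    }
    return 0;
}
```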

Conclusion
In this session we discussed
  – Introduction to GPU Computing
  – GPU Computing with CUDA
  – CUDA Programming Basics
Next session
  – Data Parallelism
  – CUDA Programming Model
  – CUDA Threads

References for this Session
Chapters 1 and 2 of: D. Kirk and W. Hwu, Programming Massively Parallel Processors, Morgan Kaufmann, 2010
Chapters 1-4 of: E. Kandrot and J. Sanders, CUDA by Example, Addison-Wesley, 2010
Chapters 1-2 of: NVIDIA CUDA C Programming Guide, NVIDIA Corporation, 2006-2011 (Versions 3.2 and 4.0)
