1
GPU Computing CIS-543 Lecture 01: GPU Computing
Dr. Muhammad Abid, DCIS, PIEAS
2
What's GPU Computing? GPU Computing means using Graphics Processing Units (GPUs) to perform calculations traditionally handled by CPUs, such as the math required for data processing and scientific simulation. GPU Computing is also known as General-Purpose computing on GPUs (GPGPU). Early efforts to use GPUs as general-purpose processors required reformulating computational problems in terms of graphics primitives, as supported by the two major graphics APIs, OpenGL and DirectX. This cumbersome translation was obviated by the advent of general-purpose programming languages and APIs such as Sh/RapidMind, Brook, and Accelerator.[7][8] These were followed by Nvidia's CUDA, which allowed programmers to ignore the underlying graphics concepts in favor of more common high-performance computing concepts.[6] Newer, hardware-vendor-independent offerings include Microsoft's DirectCompute and Apple/Khronos Group's OpenCL.[6] As a result, modern GPGPU pipelines can leverage the speed of a GPU without requiring a full, explicit conversion of the data to a graphical form.
3
Parallel Programming
- Sequential programs: performance is no longer improving, because processor cores are not getting any faster; without performance gains, developers cannot add new features and capabilities to their software, which reduces the growth opportunities of the entire computer industry.
- Parallel programs are the future: their performance improves with each new generation of many-core processors.
- Parallel programming has been around for decades, but it ran on large-scale, expensive computers, which limited its practice.
Notes:
1. A sequential program will only run on one of the processor cores, which will not become significantly faster than those in use today. Without performance improvement, application developers will no longer be able to introduce new features and capabilities into their software as new microprocessors are introduced, thus reducing the growth opportunities of the entire computer industry.
2. The practice of parallel programming is by no means new. The high-performance computing community has been developing parallel programs for decades. These programs run on large-scale, expensive computers, and only a few elite applications can justify their use, limiting the practice of parallel programming to a small number of application developers. Now that all new microprocessors are parallel computers, the number of applications that must be developed as parallel programs has increased dramatically. There is now a great need for software developers to learn about parallel programming.
4
Parallel Programming
Parallel programming for the masses:
- All new computer systems are parallel computers, so the number of parallel programs has increased dramatically.
- Parallel programming targets multicore processors, Graphics Processing Units (GPUs), or heterogeneous systems (CPU plus GPU).
- Our focus in this course is heterogeneous programming using CUDA.
5
Can We Avoid or Drop This Course?
It depends.
- End user: no need to take this course if you only use GPU-accelerated software, e.g., MATLAB, Mathematica, or Gnumpy/Theano in Python.
- Programmer: must take it, to write parallel programs that run on GPUs and/or CPUs.
- Skill: take this course to acquire a skill; many CUDA programmers are engineers from other fields, e.g., mechanical engineering.
Notes:
- You can use GPUs with MATLAB through the Parallel Computing Toolbox and perform fft, filter, and several linear algebra operations.
- Mathematica simplifies GPU programming with CUDALink and OpenCLLink.
- Want both the compute power of GPUs and the programming convenience of Python/NumPy? Gnumpy + Cudamat provide that. NumPy is the fundamental package for scientific computing with Python.
- Deep learning in Python: Theano is a Python library for fast numerical computation that can run on the CPU or GPU.
6
GPU Computation Power
7
GPU Bandwidth
8
What Kinds of Computation Map Well to GPUs?
GPUs are well suited to data-parallel applications, where the same program is executed on many data elements in parallel, i.e., Single Program Multiple Data (SPMD).
Two key attributes:
- Compute intensity: GPUs suit compute-intensive jobs.
- Data parallelism and data independence (the ideal case).
Data parallelism refers to scenarios in which the same operation is performed concurrently (that is, in parallel) on the elements of a source collection or array. In data-parallel operations, the source collection is partitioned so that multiple threads can operate on different segments concurrently. A minimal CUDA sketch of this pattern follows.
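To make the SPMD pattern concrete, here is a minimal CUDA vector-addition sketch; the kernel and variable names (vecAdd, d_a, d_b, d_c, n) are hypothetical, chosen only for illustration. Every thread executes the same kernel code, but on a different array element:

// Minimal SPMD example: the same kernel body runs on many data elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global index per thread
    if (i < n)                                      // guard threads past the array end
        c[i] = a[i] + b[i];                         // each thread handles one element
}

// Launch enough 256-thread blocks to cover all n elements:
// vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

Because the elements are independent (data independence), the threads never need to coordinate, which is the ideal case described above.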
9
GPU Applications
Molecular Dynamics:
- molecular mechanics force fields
- molecular dynamics on biomolecules
- protein folding, misfolding, aggregation, etc.
Quantum Chemistry:
- Quantum Monte Carlo (QMC)
- electronic structure calculations for finite and periodic systems
Materials Science:
- investigating the effects of temperature on magnetism
- solving the many-body Schrödinger equation for electronic structures
10
GPU Applications
Differential Equations: used in many disciplines of science and engineering
- particle tracing
- Navier-Stokes equations for incompressible fluid flow
Linear Algebra: sparse and dense linear algebra; building blocks of numeric algorithms
Real-time visual effects
Signal and Image Processing
Databases and Data Mining
Physics simulation for games
11
GPU Applications
Bioinformatics:
- sequence mapping software
- open-source software for Smith-Waterman protein database searches on GPUs
Numerical Analytics
Physics
Defense and Intelligence
Video Analytics
Computational Finance
Computational Fluid Dynamics
12
GPU Applications
Oil and Gas:
- seismic processing
Weather and Climate Forecasting:
- global climate models
- regional atmospheric models
More: GPU-accelerated Applications
13
GPU-based Systems
- GPUs in desktops and workstations
- GPUs in mobiles and tablets
- GPUs in laptops
- GPUs in servers and supercomputers
More: NVIDIA GPU-based Products/Technologies
14
GPU Advantages
- Fast: measured in FLOPS
- Cheap: measured in performance per dollar
- Energy-efficient: measured in performance per watt
Example: In finance, HPC systems handle the complex transactions behind global markets. To reduce costs and save energy, Bloomberg shifted one bond pricing application running on 2,000 CPUs to a 48-GPU rack of NVIDIA Tesla GPUs. The CPU system cost $4 million plus $1.2 million in annual energy bills; the GPU system cost under $150,000, with about $30,000 yearly in energy. Similarly, the French bank BNP Paribas swapped out a 64-CPU system for a pair of NVIDIA Tesla S1070 systems, just eight GPUs, and cut energy use from 44 kilowatts to 2.9 kilowatts. (Bloomberg delivers business and markets news, data, analysis, and video to the world, featuring stories from Businessweek and Bloomberg News.)
15
Prerequisites
- C/C++ programming; most programming is in C.
- This course is concerned with programming computers, servers, and clusters that have GPU support, on both Windows and Linux, although most work is likely to be done on Linux systems.
16
Course Text
Primary book: Professional CUDA C Programming by John Cheng
Others:
1. Programming Massively Parallel Processors
2. CUDA by Example
3. The CUDA Handbook
17
Course Contents
- Introduction to the GPU programming model and CUDA; device memory
- Basic CUDA program structure: kernel calls, threads, blocks, grid, thread addressing, predefined variables; example code: matrix addition and multiplication (see the sketch below)
- More program demonstrations illustrating various features: CUDA API, timing, synchronization, atomics, ...
- Monte Carlo programs; illustration of the CUDA random number generator and __device__ routines
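As a preview of the kernel-call and thread-addressing topics above, here is a minimal matrix-addition sketch, assuming row-major nx-by-ny matrices already copied to device memory; all names (matAdd, d_A, d_B, d_C) are hypothetical:

__global__ void matAdd(const float *A, const float *B, float *C, int nx, int ny)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;  // column (x) index
    int iy = blockIdx.y * blockDim.y + threadIdx.y;  // row (y) index
    if (ix < nx && iy < ny) {
        int idx = iy * nx + ix;    // flatten 2D coordinates into a row-major offset
        C[idx] = A[idx] + B[idx];
    }
}

// A 2D grid of 2D blocks covers the whole matrix:
// dim3 block(16, 16);
// dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
// matAdd<<<grid, block>>>(d_A, d_B, d_C, nx, ny);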
18
Course Contents
- Global barrier synchronization
- Critical sections and atomics (a minimal atomics sketch follows)
- Parallel sorting
- Reduction
- Convolution
- Scan
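As a first taste of the atomics and reduction topics, here is a minimal sketch that sums an array with atomicAdd. A lecture-quality reduction would use shared memory and a tree-shaped combine for performance, so treat this purely as an illustration of atomics; the names are hypothetical, and *result must be zeroed before launch:

__global__ void sumReduce(const float *in, float *result, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(result, in[i]);  // hardware serializes conflicting updates safely
}

// Note: atomicAdd on float requires compute capability 2.0 or higher.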
19
Course Contents
- More advanced features of CUDA: streams, multiple GPUs, using shared memory, constant memory, coalesced global memory access (see the shared-memory sketch below)
- Optimizing performance using knowledge of warps and other GPU characteristics; overlapping computations; effects of control flow, ...
- Building complex applications; debugging tools, ...
- Hybrid programming incorporating OpenMP and/or MPI with CUDA, ...
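To hint at the shared-memory topic, here is a minimal sketch that reverses each 256-element chunk of an array in place by staging it through fast on-chip shared memory. It assumes the array length n is a multiple of the block size (256), and all names are hypothetical:

__global__ void blockReverse(float *d)
{
    __shared__ float tile[256];        // on-chip memory shared by the thread block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = d[i];          // stage global data into shared memory
    __syncthreads();                   // wait until the whole tile is loaded
    d[i] = tile[blockDim.x - 1 - threadIdx.x];  // read back in reversed order
}

// Launch with 256-thread blocks: blockReverse<<<n / 256, 256>>>(d_data);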
20
Assessment
- Class quizzes/tests: 05%
- Programming assignments: 10%
- Course project: 20%
- Sessional 2: 15%
- Final exam: %
The assessment and percentages may be modified.
21
Lab
Donation from NVIDIA for GPU Education Center
22
Instructor: Dr. Muhammad Abid
B-Block, Room B-210
Office Hours: any time, or send me an email for a mutually convenient time.
23
Backup Slides
24
CPU vs GPU
CPU: optimized for sequential code
- Sophisticated control logic to execute instructions in parallel or even out of order
- Branch prediction for reduced branch latency
- Data forwarding for reduced data latency
GPU: optimized for compute-intensive, highly data-parallel applications like graphics rendering
- Simple control logic shared by a number of processor cores
- No branch prediction
- No data forwarding
25
CPU vs GPU
CPU: small memory bandwidth
- Large caches convert long-latency memory accesses into short-latency cache accesses
- Instructions in the same thread tolerate latencies
GPU: high memory bandwidth
- Small caches, used to boost memory throughput
- Requires a massive number of threads to tolerate latencies
26
CPU vs GPU
CPU: out-of-order processor cores supporting few threads
GPU: heavily multithreaded, in-order processor cores
27
CPU vs GPU
Most applications use both CPUs and GPUs.
41
More History: Read this presentation