CUDA-3 (Computer Engg, IIT(BHU), 3/12/2013)

GPGPU
● General-purpose computation using the GPU in applications other than 3D graphics
  – The GPU accelerates the critical path of the application
● Data-parallel algorithms leverage GPU attributes
  – Large data arrays, streaming throughput
  – Fine-grain SIMD parallelism
  – Low-latency floating point (FP) computation

GPGPU Constraints
● Dealing with the graphics API
  – Working with the corner cases of the graphics API
● Addressing modes
  – Limited texture size/dimension
● Shader capabilities
  – Limited outputs
● Instruction sets
  – Lack of integer and bit operations
● Communication limited
  – Between pixels
  – Scatter: a[i] = p

CUDA
● General-purpose programming model
  – User kicks off batches of threads on the GPU
  – GPU = dedicated super-threaded, massively data-parallel co-processor
● Targeted software stack
  – Compute-oriented drivers, language, and tools

CUDA
● Driver for loading computation programs into the GPU
  – Standalone driver, optimized for computation
  – Interface designed for compute: graphics-free API
  – Data sharing with OpenGL buffer objects
  – Guaranteed maximum download and readback speeds
  – Explicit GPU memory management
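The two ideas above, explicit GPU memory management and a batch of threads started by the user, can be sketched with the CUDA runtime API. This is a minimal illustration, not code from the slides: the kernel `scale_add`, the wrapper `scale_add_on_gpu`, and the helper `check` are hypothetical names, and the block size of 256 is an arbitrary choice.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Minimal error check (illustrative helper, not part of the CUDA API).
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Every thread runs this same function on its own element:
// y[i] = a * x[i] + y[i].
__global__ void scale_add(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end
        y[i] = a * x[i] + y[i];
    }
}

// Host side: allocate device memory, copy in, launch, copy out, free.
void scale_add_on_gpu(float a, const std::vector<float> &x, std::vector<float> &y)
{
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    if (n == 0) return;
    const size_t bytes = n * sizeof(float);

    float *d_x = nullptr;
    float *d_y = nullptr;
    check(cudaMalloc(&d_x, bytes));
    check(cudaMalloc(&d_y, bytes));
    check(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;                         // threads per block
    const int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    scale_add<<<blocks, threads>>>(n, a, d_x, d_y);
    check(cudaGetLastError());

    // A blocking cudaMemcpy on the default stream waits for the kernel.
    check(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_x));
    check(cudaFree(d_y));
}
```

Compiled with `nvcc`, a caller passes two host vectors of equal length and reads the result back from `y`. Nothing moves between host and device unless the host code asks for it.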

Parallel Computing on a GPU
● NVIDIA GPU Computing Architecture
  – Via a separate HW interface
  – In laptops, desktops, workstations, servers
● 8-series GPUs deliver 50 to 200 GFLOPS on compiled parallel C applications

Parallel Computing on a GPU
● GPU parallelism is doubling every year
● Programming model scales transparently
● Programmable in C with CUDA tools
● Multithreaded SPMD model uses application data parallelism and thread parallelism

CPU vs GPU

● GPU baseline speedup is approximately 60x
● For 500,000 particles, that is a reduction in calculation time from 33 minutes to 33 seconds!
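As a quick consistency check, the two figures on this slide agree with each other. Here S is the speedup and t_CPU, t_GPU are the run times; these symbols are introduced for this check and do not appear in the slides.

```latex
\[
  t_{\mathrm{GPU}} = \frac{t_{\mathrm{CPU}}}{S}
                   = \frac{33 \times 60\ \mathrm{s}}{60}
                   = \frac{1980\ \mathrm{s}}{60}
                   = 33\ \mathrm{s}
\]
```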

Conclusion
● Without optimization we already got an amazing speedup on CUDA
● The N² algorithm is "made" for CUDA
● Optimizations are hard to predict in advance (tradeoffs)
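Why an N² algorithm maps well onto CUDA can be seen in an all-pairs loop: one thread per particle, each scanning all N particles, with no communication between threads. The slides do not show the actual particle computation, so the neighbour count below is a hypothetical stand-in, and `count_neighbours`, `count_neighbours_on_gpu`, `check`, and `radius` are illustrative names. It is deliberately unoptimized, in the spirit of the baseline mentioned above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Minimal error check (illustrative helper, not part of the CUDA API).
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Thread i compares particle i against every particle j:
// n threads, each doing n steps, gives the O(n^2) work.
__global__ void count_neighbours(int n, const float2 *pos, float radius, int *count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const float2 pi = pos[i];
    const float r2 = radius * radius;
    int c = 0;
    for (int j = 0; j < n; ++j) {
        float dx = pos[j].x - pi.x;
        float dy = pos[j].y - pi.y;
        if (j != i && dx * dx + dy * dy <= r2) {
            ++c;
        }
    }
    count[i] = c;  // each thread writes only its own slot
}

// Host wrapper: returns, for each particle, how many others lie within radius.
std::vector<int> count_neighbours_on_gpu(const std::vector<float2> &pos, float radius)
{
    const int n = static_cast<int>(pos.size());
    std::vector<int> count(n, 0);
    if (n == 0) return count;

    float2 *d_pos = nullptr;
    int *d_count = nullptr;
    check(cudaMalloc(&d_pos, n * sizeof(float2)));
    check(cudaMalloc(&d_count, n * sizeof(int)));
    check(cudaMemcpy(d_pos, pos.data(), n * sizeof(float2), cudaMemcpyHostToDevice));

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    count_neighbours<<<blocks, threads>>>(n, d_pos, radius, d_count);
    check(cudaGetLastError());

    check(cudaMemcpy(count.data(), d_count, n * sizeof(int), cudaMemcpyDeviceToHost));
    check(cudaFree(d_pos));
    check(cudaFree(d_count));
    return count;
}
```

Every thread reads the same `pos` array in the same order. That access pattern is what shared-memory tiling would target in an optimized version, which is outside this sketch.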

Conclusion
● There are ways to dynamically distribute workloads across a fixed number of blocks
● Biggest problem: how to handle dynamic results in global memory
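One common way to approach both points is sketched below. A fixed-size grid walks an input of any length with a grid-stride loop, and each thread that produces a result reserves an output slot with an atomic counter in global memory. This is a general technique and not necessarily what the authors used. The grid-stride loop is a simple static scheme, so the slide may well refer to something more dynamic. The names `keep_positive`, `keep_positive_on_gpu`, and `check`, and the launch shape of 4 blocks of 128 threads, are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Minimal error check (illustrative helper, not part of the CUDA API).
static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Copies the positive elements of in[] to out[]; the number of results
// is not known in advance, so slots are handed out by an atomic counter.
__global__ void keep_positive(int n, const int *in, int *out, int *out_count)
{
    const int stride = gridDim.x * blockDim.x;  // total threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        if (in[i] > 0) {
            int slot = atomicAdd(out_count, 1);  // returns the previous value
            out[slot] = in[i];
        }
    }
}

// Host wrapper: returns the kept elements in unspecified order.
std::vector<int> keep_positive_on_gpu(const std::vector<int> &in)
{
    const int n = static_cast<int>(in.size());
    if (n == 0) return std::vector<int>();

    int *d_in = nullptr;
    int *d_out = nullptr;
    int *d_count = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(int)));
    check(cudaMalloc(&d_out, n * sizeof(int)));  // worst case: all elements kept
    check(cudaMalloc(&d_count, sizeof(int)));
    check(cudaMemcpy(d_in, in.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    check(cudaMemset(d_count, 0, sizeof(int)));

    keep_positive<<<4, 128>>>(n, d_in, d_out, d_count);  // fixed grid for any n
    check(cudaGetLastError());

    int count = 0;
    check(cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost));
    std::vector<int> out(count);
    if (count > 0) {
        check(cudaMemcpy(out.data(), d_out, count * sizeof(int), cudaMemcpyDeviceToHost));
    }
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    check(cudaFree(d_count));
    return out;
}
```

The output order depends on thread scheduling, so callers that need a stable order have to sort afterwards. A single global counter can also become a bottleneck when most elements are kept, which is one face of the tradeoff the slide points at.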

Uses
– CUDA has provided a benefit for many applications. Some examples:
● Seismic database: 66x to 100x speedup (http://www.headwave.com)
● Molecular dynamics: 21x to 100x speedup (http://www.ks.uiuc.edu/Research/vmd)
● MRI processing: 245x to 415x speedup (http://bic-test.beckman.uiuc.edu)
● Atmospheric cloud simulation: 50x speedup (http://www.cs.clemson.edu/~jesteel/clouds.html)

References
– CUDA, Supercomputing for the Masses, by Rob Farber.
  ● http://www.ddj.com/architect/207200659
– CUDA, Wikipedia.
  ● http://en.wikipedia.org/wiki/CUDA
– CUDA for developers, NVIDIA.
  ● http://www.nvidia.com/object/cuda_home.html#
– Download CUDA manual and binaries.
  ● http://www.nvidia.com/object/cuda_get.html
