Presentation on theme: "CUDA 5.0 By Peter Holvenstot CS6260. CUDA 5.0 Latest iteration of CUDA toolkit Requires Compute Capability 3.0 Compatible Kepler cards being installed."— Presentation transcript:

1 CUDA 5.0 By Peter Holvenstot CS6260

2 CUDA 5.0
- Latest iteration of the CUDA toolkit
- Requires Compute Capability 3.0
- Compatible Kepler cards are being installed @WMU

3 Major New Features
- GPUDirect: allows direct memory access
- GPU object linking: libraries for GPU code
- Dynamic parallelism: kernels inside kernels

4 GPUDirect
- Allows direct memory access to the PCIe bus
- Third-party device access now supported
- Requires use of pinned memory
- DMAs can be chained across a network

5 GPUDirect

6 Pinned Memory
- malloc(): unpinned, can be paged out
- cudaHostAlloc(): pinned, cannot be paged out
- Pinned memory takes longer to allocate, but enables features requiring DMA and increases copy performance
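A minimal sketch of the two allocation paths described above (error handling omitted; assumes a single CUDA-capable device):

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 * 1024 * 1024;

    // Pageable: the OS may swap this out; host-to-device copies
    // go through an internal staging buffer.
    float *pageable = (float *)malloc(bytes);

    // Pinned (page-locked): cannot be paged out, so the GPU can DMA
    // directly from it; slower to allocate, faster to copy.
    float *pinned = nullptr;
    cudaHostAlloc(&pinned, bytes, cudaHostAllocDefault);

    float *device = nullptr;
    cudaMalloc(&device, bytes);

    // Same copy API either way; the pinned source enables true DMA.
    cudaMemcpy(device, pageable, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(device, pinned, bytes, cudaMemcpyHostToDevice);

    cudaFree(device);
    cudaFreeHost(pinned);   // pinned memory has its own free call
    free(pageable);
    return 0;
}
```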

7 Kernel Linking
- Kernels now support compilation to .obj files
- Allows compiling into/against static libraries
- Allows closed-source distribution of libraries
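The separate-compilation workflow might look like the following sketch. The file and library names are hypothetical; the nvcc flags shown in the comments are the CUDA 5-era ones (`-dc` produces relocatable device code):

```cuda
// lib.cu -- compiled once, could be shipped as a closed-source static library:
//   nvcc -arch=sm_30 -dc lib.cu -o lib.o
//   nvcc -arch=sm_30 -lib lib.o -o libscale.a
__device__ float scale(float x) { return 2.0f * x; }

// main.cu -- links against the library without ever seeing its source:
//   nvcc -arch=sm_30 -dc main.cu -o main.o
//   nvcc -arch=sm_30 main.o libscale.a -o app
extern __device__ float scale(float x);   // declaration only

__global__ void kernel(float *data) {
    data[threadIdx.x] = scale(data[threadIdx.x]);
}
```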

8 Dynamic Parallelism
- CUDA 4.1: __device__ functions may make inlineable recursive calls
- However, __global__ functions (kernels) cannot
- CUDA 5: kernels running on the GPU may launch additional kernels
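A minimal sketch of a device-side launch. Note that dynamic parallelism requires Compute Capability 3.5 and linking against the device runtime; kernel names here are illustrative:

```cuda
#include <cstdio>

// Child: an ordinary __global__ function.
__global__ void child(int parent_block) {
    printf("child of block %d, thread %d\n", parent_block, threadIdx.x);
}

// Parent: launches the child from device code.
__global__ void parent() {
    if (threadIdx.x == 0) {
        child<<<1, 4>>>(blockIdx.x);   // device-side kernel launch
    }
}

int main() {
    // Build sketch: nvcc -arch=sm_35 -rdc=true app.cu -lcudadevrt
    parent<<<2, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```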

9 Dynamic Parallelism
- Most important feature of this release
- Reduces the need for CPU/GPU synchronization
- Allows program flow to be controlled by the GPU
- Allows recursion and subdivision of problems

10 Dynamic Parallelism
- CPU code can now become a kernel
- Kernel calls can be used as tasks
- GPU controls kernel launch/flow/scheduling
- Increases practical thread count to thousands

11 Dynamic Parallelism
- Interesting data is not uniformly distributed
- Dynamic parallelism can launch additional threads in interesting areas
- Allows higher resolution in critical areas without slowing down others
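As an illustration of this idea (all names here are hypothetical), a coarse pass might spawn finer child grids only over cells flagged as interesting, with no CPU round trip:

```cuda
// Placeholder fine-grained work over one coarse cell.
__global__ void refine(float *field, int cell) {
    // ... higher-resolution processing of `cell` would go here ...
    if (threadIdx.x == 0) field[cell] += 0.0f;
}

__global__ void coarse_pass(float *field, const int *interesting, int n) {
    int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell < n && interesting[cell]) {
        // Extra resolution only where it is needed; quiet regions
        // launch nothing and cost nothing.
        refine<<<16, 64>>>(field, cell);
    }
}
```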

12 Source: NVIDIA

13 Dynamic Parallelism Nested Dependencies Source: NVIDIA

14 Dynamic Parallelism
- Scheduling can be controlled by streams
- No new concurrency guarantees
- Launches within the same stream execute in order
- Launches into different named streams may run concurrently, but concurrency is never guaranteed
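A sketch of device-side streams under the CUDA 5 device runtime (device-created streams must use the cudaStreamNonBlocking flag; the `work` kernel is illustrative):

```cuda
__global__ void work(int id) {
    // ... some independent unit of work ...
}

__global__ void parent() {
    // Children launched into the same stream run in order; giving each
    // its own named stream allows (but does not guarantee) overlap.
    cudaStream_t s1, s2;
    cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&s2, cudaStreamNonBlocking);

    work<<<1, 32, 0, s1>>>(1);   // may run concurrently with...
    work<<<1, 32, 0, s2>>>(2);   // ...this one, but need not

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}
```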

15 Dynamic Parallelism: Nested Dependencies
- cudaDeviceSynchronize() can be used inside a kernel
- Synchronizes all launches made by any thread in the block
- Does NOT imply __syncthreads()!
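The two synchronization scopes can be combined as in this sketch (kernel names illustrative):

```cuda
__global__ void child(float *buf);   // fills buf; body elided

__global__ void parent(float *buf) {
    if (threadIdx.x == 0) {
        child<<<1, 64>>>(buf);
    }
    // Waits for all children launched so far by this thread block...
    cudaDeviceSynchronize();
    // ...but is NOT a barrier for the parent's own threads, so a
    // separate __syncthreads() is still required before all threads
    // can safely read the child's results.
    __syncthreads();
    float v = buf[threadIdx.x];
    (void)v;
}
```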

16 Dynamic Parallelism
- Kernel launch implies a memory sync operation
- Child sees global memory as it was at launch time
- Parent sees the child's writes only after synchronizing
- Local and shared memory are private to the parent and cannot be passed to children
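A sketch of these visibility rules (illustrative names; only global memory crosses the parent/child boundary):

```cuda
__global__ void child(int *global_buf) {
    // The child sees global memory as it was when it was launched.
    global_buf[threadIdx.x] = threadIdx.x;
}

__global__ void parent(int *global_buf) {
    __shared__ int tile[32];
    tile[threadIdx.x] = 0;   // shared memory: private to this block,
                             // must NOT be passed to a child kernel

    if (threadIdx.x == 0) {
        child<<<1, 32>>>(global_buf);   // global pointer: OK to pass
        cudaDeviceSynchronize();        // parent now sees child's writes
    }
    __syncthreads();
    int v = global_buf[threadIdx.x];    // safe to read after both syncs
    (void)v; (void)tile;
}
```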

17 Questions?

18 Sources
http://docs.nvidia.com/cuda/pdf/CUDA_Dynamic_Parallelism_Programming_Guide.pdf
http://docs.nvidia.com/cuda/gpudirect-rdma/index.html
http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0338-GTC2012-CUDA-Programming-Model.pdf
https://developer.nvidia.com/content/trenches-gtc-cuda-5-and-beyond

