
1 CUDA and the Memory Model (Part II)

2 Code executed on GPU
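The body of this slide is an image; as a minimal sketch of what "code executed on GPU" means in CUDA (the kernel name and launch sizes here are illustrative, not from the slide):

    #include <cuda_runtime.h>

    // A __global__ function (a kernel) runs on the GPU and is launched from the host.
    __global__ void kernel(int *out)
    {
        // Each thread computes its own global index and writes one element.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        out[i] = i;
    }

    int main()
    {
        int *d_out;
        cudaMalloc(&d_out, 8 * 32 * sizeof(int));
        kernel<<<8, 32>>>(d_out);    // 8 blocks of 32 threads, executed on the GPU
        cudaDeviceSynchronize();     // wait for the GPU to finish
        cudaFree(d_out);
        return 0;
    }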

3 Variable Qualifiers (GPU Code)
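The qualifier table on this slide is an image; a hedged summary of the standard CUDA variable qualifiers for GPU code (the kernel and variable names are illustrative):

    // __device__: global memory; application lifetime; visible to all threads.
    __device__ float g_value;

    // __constant__: constant memory; read-only in kernels; cached; set from the host.
    __constant__ float c_coeffs[16];

    __global__ void qualifiers_demo(float *out)   // assumes a 64-thread block
    {
        // __shared__: on-chip shared memory; one copy per thread block;
        // block lifetime; visible to all threads in the block.
        __shared__ float tile[64];

        // Unqualified locals live in registers (or local memory if spilled).
        int i = threadIdx.x;
        tile[i] = c_coeffs[i % 16] * g_value;
        __syncthreads();
        out[blockIdx.x * blockDim.x + i] = tile[i];
    }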

4 CUDA: Features available to kernels
Standard mathematical functions: sinf, powf, atanf, ceilf, etc.
Built-in vector types: float4, int4, uint4, etc., with 1…4 components
Texture accesses in kernels:
    texture<float4, 2> my_texture;  // declare texture reference
    float4 texel = texfetch(my_texture, u, v);
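A hedged sketch of the math functions and vector types in an actual kernel (the kernel name is illustrative; texfetch above is the early-CUDA texture call, replaced by tex1D/tex2D and texture objects in later releases):

    // Illustrative kernel combining the features listed above.
    __global__ void my_scale_kernel(float4 *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float4 v = data[i];      // built-in vector type
            v.x = sinf(v.x);         // standard math functions
            v.y = powf(v.y, 2.0f);
            v.z = atanf(v.z);
            v.w = ceilf(v.w);
            data[i] = v;
        }
    }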

5 Thread Synchronization function
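The function this slide refers to is __syncthreads(), a barrier across all threads in a block; a sketch where it guards a shared-memory exchange (the kernel name is illustrative):

    // Reverse the elements within each block's tile, using shared memory.
    __global__ void reverse_in_block(int *data)
    {
        __shared__ int tile[256];            // assumes 256 threads per block
        int t = threadIdx.x;
        int i = blockIdx.x * blockDim.x + t;

        tile[t] = data[i];
        __syncthreads();                     // all writes to tile complete here

        data[i] = tile[blockDim.x - 1 - t];  // safe: no thread reads a stale slot
    }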

6 Host Synchronization (for Kalin…)
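A sketch of the usual host-side synchronization points: kernel launches are asynchronous, while blocking copies and an explicit synchronize wait for the GPU (cudaDeviceSynchronize() today; cudaThreadSynchronize() in the CUDA version these slides date from):

    #include <cuda_runtime.h>

    __global__ void fill(float *d, float v) { d[threadIdx.x] = v; }

    int main()
    {
        float h[64], *d;
        cudaMalloc(&d, sizeof(h));

        fill<<<1, 64>>>(d, 1.0f);    // launch is asynchronous: returns immediately
        cudaDeviceSynchronize();     // block the host until the kernel finishes

        // A blocking memcpy also synchronizes with prior work on the device.
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
        cudaFree(d);
        return 0;
    }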

7 Thread Blocks must be independent

8 (image-only slide)

9 Example: Increment Array Elements
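The example code is on image slides; this is the standard NVIDIA tutorial increment example these decks use, reconstructed rather than copied from the images:

    // CPU version: one loop over all elements.
    void increment_cpu(float *a, float b, int N)
    {
        for (int idx = 0; idx < N; idx++)
            a[idx] = a[idx] + b;
    }

    // GPU version: each thread handles one element.
    __global__ void increment_gpu(float *a, float b, int N)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < N)                 // guard: the last block may run past N
            a[idx] = a[idx] + b;
    }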

10 (image-only slide)

11 Example: Host Code
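A hedged reconstruction of the matching host code (assumes the increment_gpu kernel sketched above; sizes are illustrative):

    #include <cstring>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void increment_gpu(float *a, float b, int N);  // defined above

    int main()
    {
        int N = 1024;
        size_t size = N * sizeof(float);

        // Allocate host and device memory.
        float *h_A = (float *)malloc(size);
        memset(h_A, 0, size);
        float *d_A;
        cudaMalloc(&d_A, size);

        // Copy input to device, launch enough blocks to cover N, copy back.
        cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
        int blockSize = 256;
        int nBlocks = (N + blockSize - 1) / blockSize;   // round up
        increment_gpu<<<nBlocks, blockSize>>>(d_A, 5.0f, N);
        cudaMemcpy(h_A, d_A, size, cudaMemcpyDeviceToHost);

        cudaFree(d_A);
        free(h_A);
        return 0;
    }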

12 CUDA Error Reporting to CPU
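Kernel launches return no status, so the host queries the error state explicitly; a sketch with a hypothetical helper (checkCuda is not a CUDA API, just an illustrative wrapper):

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Hypothetical helper: report and bail on any pending CUDA error.
    void checkCuda(const char *where)
    {
        cudaError_t err = cudaGetLastError();   // also clears the error state
        if (err != cudaSuccess) {
            fprintf(stderr, "%s: %s\n", where, cudaGetErrorString(err));
            exit(1);
        }
    }

    // Usage: launch errors show up immediately via cudaGetLastError();
    // errors raised while the kernel runs surface after a synchronize.
    //   increment_gpu<<<nBlocks, blockSize>>>(d_A, 5.0f, N);
    //   checkCuda("launch");
    //   cudaDeviceSynchronize();
    //   checkCuda("execution");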

13 CUDA Event API (someone asked about this…)
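The usual use of the event API is timing GPU work; a sketch that times the increment_gpu kernel from earlier (the function wrapper is illustrative):

    #include <cuda_runtime.h>

    __global__ void increment_gpu(float *a, float b, int N);  // defined above

    // Return the kernel's execution time in milliseconds.
    float time_kernel(float *d_A, float b, int N, int nBlocks, int blockSize)
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);                  // enqueue in stream 0
        increment_gpu<<<nBlocks, blockSize>>>(d_A, b, N);
        cudaEventRecord(stop, 0);

        cudaEventSynchronize(stop);                 // wait until stop has occurred
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);     // elapsed time between events

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }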

14 Shared Memory
On-chip:
- 2 orders of magnitude lower latency than global memory
- Order of magnitude higher bandwidth than global memory
- 16 KB per multiprocessor (NVIDIA GPUs contain up to ~30 multiprocessors)
Allocated per thread block:
- Accessible by any thread in the thread block
- Not accessible to other thread blocks
Several uses (a sketch follows):
- Sharing data among threads in a thread block
- User-managed cache (reducing global memory accesses)
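A minimal sketch of "sharing data among threads": each block stages its tile in shared memory, sized dynamically at launch, and reduces it to one partial sum (the kernel name is illustrative):

    // Each block sums its tile in shared memory (tree reduction).
    __global__ void block_sum(const float *in, float *out)
    {
        extern __shared__ float buf[];          // sized at launch time
        int t = threadIdx.x;
        buf[t] = in[blockIdx.x * blockDim.x + t];
        __syncthreads();

        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (t < s)
                buf[t] += buf[t + s];
            __syncthreads();                    // finish each round before the next
        }
        if (t == 0)
            out[blockIdx.x] = buf[0];           // one partial sum per block
    }

    // Launch (blockDim.x must be a power of two here); the third <<<>>>
    // argument is the dynamic shared-memory size per block:
    //   block_sum<<<nBlocks, 256, 256 * sizeof(float)>>>(d_in, d_out);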

15 Using Shared Memory (slide courtesy of NVIDIA: Timo Stich)

16 Using Shared Memory
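The worked example on these slides is an image; a sketch of the standard pattern they illustrate: tile global data, plus a halo, into shared memory so each global element is loaded once, then filter from the tile (the radius, names, and block size are illustrative):

    #define RADIUS 3

    __global__ void stencil_1d(const float *in, float *out, int n)
    {
        __shared__ float tile[256 + 2 * RADIUS];   // assumes 256 threads per block
        int g = blockIdx.x * blockDim.x + threadIdx.x;
        int t = threadIdx.x + RADIUS;

        // Each thread loads one element; the first RADIUS threads also load the halo.
        tile[t] = (g < n) ? in[g] : 0.0f;
        if (threadIdx.x < RADIUS) {
            int left = g - RADIUS, right = g + blockDim.x;
            tile[t - RADIUS]     = (left  >= 0) ? in[left]  : 0.0f;
            tile[t + blockDim.x] = (right <  n) ? in[right] : 0.0f;
        }
        __syncthreads();                           // tile fully populated here

        if (g < n) {
            float sum = 0.0f;
            for (int k = -RADIUS; k <= RADIUS; k++)
                sum += tile[t + k];                // reads hit shared memory, not global
            out[g] = sum / (2 * RADIUS + 1);       // simple box filter
        }
    }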

17 (image-only slide)

18 (image-only slide)

19 Thread counts
More threads per block are better for time slicing
- Minimum: 64; ideal: 192–256
More threads per block means fewer registers per thread
- Kernel invocation may fail if the kernel compiles to more registers than are available
Threads within a block can be synchronized
- Important for SIMD efficiency
Grid dimensions are capped at 64K (65,535) blocks per dimension (the limits can be queried at runtime; see the sketch below)
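The limits above need not be hard-coded; a sketch that queries them with cudaGetDeviceProperties (the fields are the real API; the printout is illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);          // query device 0

        printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("Registers per block:   %d\n", prop.regsPerBlock);
        printf("Shared mem per block:  %zu bytes\n", prop.sharedMemPerBlock);
        printf("Max grid dimensions:   %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
        return 0;
    }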

20 Block counts
There should be at least as many blocks as multiprocessors
- The number of blocks should be at least 100 to scale to future generations
Blocks within a grid cannot be synchronized
Blocks can only share a multiprocessor by partitioning its registers and shared memory among them (a sketch of checking the block count follows)
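A sketch of checking a launch configuration against this guidance (the problem size and the 100-block threshold from the slide are illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int N = 1 << 20;                             // illustrative problem size
        int threads = 256;
        int blocks = (N + threads - 1) / threads;    // blocks needed to cover N

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        // Slide's guidance: at least one block per multiprocessor, and
        // at least 100 blocks so the launch still fills larger future GPUs.
        if (blocks < prop.multiProcessorCount || blocks < 100)
            printf("Warning: only %d blocks for %d multiprocessors\n",
                   blocks, prop.multiProcessorCount);
        return 0;
    }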

