
1 Programming with CUDA WS 08/09 Lecture 3 Thu, 30 Oct, 2008

2 Previously
CUDA programming model
– GPU as co-processor
– Kernel definition and invocation
– Thread blocks: 1D, 2D, 3D
– Thread ID and threadIdx
– Global/shared memory for threads
– Compute capability

3 Today
Theory/practical course?
CUDA programming model
– Limitations on number of threads
– Grids of thread blocks

4 Today
Theory/practical course?
– The course is meant to be practical
– Programming with CUDA
– Is that a problem for some of you?
– Should we change something?

5 The CUDA Programming Model (cont'd)

6 Number of threads
A kernel is executed on the device simultaneously by many threads:
dim3 blockSize(Dx,Dy,Dz); // for a 1D block, Dy = Dz = 1; for a 2D block, Dz = 1
kernel<<<1, blockSize>>>(...);
– # threads per block = Dx*Dy*Dz
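The launch on this slide can be fleshed out into a minimal sketch; the kernel body, the `d_out` buffer, and the 4×4×2 block shape are illustrative assumptions, not from the slides:

```cuda
__global__ void kernel(int *out)
{
    // Flatten the 3D thread index into one number per thread.
    int tid = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;
    out[tid] = tid;
}

int main()
{
    dim3 blockSize(4, 4, 2);              // Dx*Dy*Dz = 32 threads
    int n = 4 * 4 * 2;
    int *d_out;
    cudaMalloc((void**)&d_out, n * sizeof(int));
    kernel<<<1, blockSize>>>(d_out);      // one block of 32 threads
    cudaDeviceSynchronize();              // wait for the kernel to finish
    cudaFree(d_out);
    return 0;
}
```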

7 A bit about the hardware
The GPU consists of several multiprocessors.
Each multiprocessor consists of several processors.
Each processor in a multiprocessor has its local memory in the form of registers.
All processors in a multiprocessor have access to a shared memory.

8 (figure-only slide)

9 Threads and processors
All threads in a block run on the same multiprocessor.
– They might not all run at the same time
– Therefore, threads should be independent of each other
– __syncthreads() causes all threads to reach the same execution point before carrying on
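A small sketch of why __syncthreads() matters: reversing an array in shared memory only works if every thread has finished writing before any thread reads. The kernel name and the 256-element bound are assumptions for illustration:

```cuda
__global__ void reverseBlock(float *data)
{
    __shared__ float tmp[256];        // assumes blockDim.x <= 256
    int i = threadIdx.x;

    tmp[i] = data[i];
    __syncthreads();                  // every thread has written tmp before any reads

    data[i] = tmp[blockDim.x - 1 - i];
}
```

Without the barrier, a thread could read `tmp[blockDim.x - 1 - i]` before the thread responsible for that slot has written it.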

10 Threads and processors
How many threads can run on a multiprocessor? That depends on:
– how much memory the multiprocessor has
– how much memory each thread requires

11 Threads and processors
How many threads can a block have? That depends on:
– how much memory the multiprocessor has
– how much memory each thread requires

12 Grids of Blocks
What if I want to run more threads?
– Call multiple blocks of threads
– These form a grid of blocks
A grid can be 1D or 2D.

13 Grids of Blocks
Example of a 1D grid, invoked in main:
int N; // assign some value to N
dim3 blockDimension(N,N);
kernel<<<numBlocks, blockDimension>>>(...); // numBlocks blocks in a 1D grid
Example of a 2D grid, invoked in main:
int N; // assign some value to N
dim3 blockDimension(N,N);
dim3 gridDimension(N,N);
kernel<<<gridDimension, blockDimension>>>(...);

14 Grids of Blocks
Invoking a grid:
kernel<<<gridDimension, blockDimension>>>(...);
– # threads = (blocks per grid in gridDimension) * (threads per block in blockDimension)
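The grid invocation above can be sketched end to end on the host side; the kernel name, `d_a` buffer, and N = 16 are illustrative assumptions:

```cuda
__global__ void kernel(float *a);    // assumed to be defined elsewhere

int main()
{
    int N = 16;
    dim3 blockDimension(N, N);       // N*N = 256 threads per block
    dim3 gridDimension(N, N);        // N*N = 256 blocks in the grid

    float *d_a;
    cudaMalloc((void**)&d_a, 256 * 256 * sizeof(float));

    // total threads = 256 blocks * 256 threads/block = 65536
    kernel<<<gridDimension, blockDimension>>>(d_a);

    cudaDeviceSynchronize();
    cudaFree(d_a);
    return 0;
}
```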

15 (figure-only slide)

16 (figure-only slide)

17 Accessing block information
Grids can be 1D or 2D.
The index of a block in a grid is available through the blockIdx variable.
The dimensions of a block are available through the blockDim variable.
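Putting blockIdx, blockDim, and threadIdx together, a kernel typically computes a unique global index per thread; a sketch (the kernel name and bounds check are illustrative):

```cuda
__global__ void addOne(float *a, int n)
{
    // Which block we are in, times the block width,
    // plus our position inside the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n)          // guard: the grid may contain more threads than elements
        a[i] += 1.0f;
}
```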

18 Arranging blocks
Threads in a block should be independent of other threads in the block.
Blocks in a grid should be independent of other blocks in the grid.

19 Memory available to threads
Each thread has a local memory.
Threads in a block share a shared memory.
All threads can access the global memory.
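The three memory spaces can be seen side by side in one kernel; a sketch assuming blocks of at most 256 threads (the kernel and pointer names are illustrative):

```cuda
__global__ void sumBlock(const float *g_in, float *g_out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // g_in, g_out: global memory

    float x = g_in[i];          // local variable: lives in a per-thread register

    __shared__ float s[256];    // shared memory: visible to the whole block
    s[threadIdx.x] = x;
    __syncthreads();            // all writes to s are done before any reads

    if (threadIdx.x == 0) {     // one thread per block sums its block's values
        float sum = 0.0f;
        for (int j = 0; j < blockDim.x; ++j)
            sum += s[j];
        g_out[blockIdx.x] = sum;
    }
}
```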

20 (figure-only slide)

21 Memory available to threads
All threads have read-only access to constant and texture memories.

22 (figure-only slide)

23 Memory available to threads
An application is expected to manage:
– global, constant and texture memory spaces
– data transfer between host and device memories
– (de)allocating host and device memory
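The host-side management the slide lists follows a standard allocate/copy/launch/copy-back/free pattern; a sketch (the buffer names and size are illustrative, and kernel launches are elided):

```cuda
#include <cstdlib>

int main()
{
    const int n = 1024;
    size_t bytes = n * sizeof(float);

    float *h_a = (float*)malloc(bytes);   // host memory
    float *d_a;
    cudaMalloc((void**)&d_a, bytes);      // device (global) memory

    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  // host -> device

    // ... launch kernels that read and write d_a ...

    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);  // device -> host

    cudaFree(d_a);                        // release device memory
    free(h_a);                            // release host memory
    return 0;
}
```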

24 Have a nice weekend. See you next time.

