GPU Computing: Step by Step
1. Set up inputs on the host (CPU-accessible memory)
2. Allocate memory for inputs on the GPU
3. Copy inputs from host to GPU
4. Allocate memory for outputs on the host
5. Allocate memory for outputs on the GPU
6. Start GPU kernel
7. Copy output from GPU to host
(Copying can be asynchronous)
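The steps above can be sketched in CUDA roughly as follows. This is a minimal illustration, not the lab's code: the kernel `doubleKernel` and the host helper `doubleOnGpu` are hypothetical names, the block size of 256 is an arbitrary choice, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel for illustration only: doubles each input element.
__global__ void doubleKernel(const float* in, float* out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        out[idx] = 2.0f * in[idx];
    }
}

// Hypothetical host helper that follows the steps in order.
// Real code should check the cudaError_t returned by every runtime call.
std::vector<float> doubleOnGpu(const std::vector<float>& hostIn) {
    // Step 1: inputs are already set up on the host (hostIn).
    int n = static_cast<int>(hostIn.size());
    size_t bytes = n * sizeof(float);
    if (n == 0) {
        return std::vector<float>();
    }

    // Step 2: allocate memory for inputs on the GPU.
    float* devIn = nullptr;
    cudaMalloc(&devIn, bytes);

    // Step 3: copy inputs from host to GPU.
    cudaMemcpy(devIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);

    // Step 4: allocate memory for outputs on the host.
    std::vector<float> hostOut(n);

    // Step 5: allocate memory for outputs on the GPU.
    float* devOut = nullptr;
    cudaMalloc(&devOut, bytes);

    // Step 6: start the GPU kernel with enough blocks to cover n elements.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    doubleKernel<<<blocks, threadsPerBlock>>>(devIn, devOut, n);

    // Step 7: copy output from GPU to host. This blocking copy runs after
    // the kernel has finished, because both use the default stream.
    cudaMemcpy(hostOut.data(), devOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devIn);
    cudaFree(devOut);
    return hostOut;
}
```

The sketch uses the blocking `cudaMemcpy`. The asynchronous variant, `cudaMemcpyAsync`, takes an additional stream argument and generally needs page-locked host memory to overlap with other work.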
The Kernel
Determine a thread index from the block ID and the thread ID within a block. For a one-dimensional launch:
index = blockIdx.x * blockDim.x + threadIdx.x
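As a small sketch of how this index is typically used inside a kernel, assuming a one-dimensional grid: the kernel name `writeGlobalIndex` and its parameters are hypothetical. The bounds check matters because the launch usually rounds the thread count up past `n`.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread records its own global index.
__global__ void writeGlobalIndex(int* out, int n) {
    // blockIdx.x  : ID of this block within the grid
    // blockDim.x  : number of threads per block
    // threadIdx.x : ID of this thread within its block
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads past the end of the array do nothing.
    if (idx < n) {
        out[idx] = idx;
    }
}
```

Launched as `writeGlobalIndex<<<2, 4>>>(devOut, 6)`, eight threads run in total; the threads with index 6 and 7 fail the bounds check and write nothing.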
Lab 1!
Sum of polynomials – Fun, parallelizable example!
Suppose we have a polynomial $P(r)$ with coefficients $c_0, \ldots, c_{n-1}$, given by:
$$P(r) = \sum_{i=0}^{n-1} c_i r^i$$
We want, for $r_0, \ldots, r_{N-1}$, the sum:
$$\sum_{j=0}^{N-1} P(r_j)$$
Output condenses to one number!
Calculating P(r) once
Pseudocode (one possible method):
    Given r, coefficients
    result <- 0.0
    power <- 1.0
    for all coefficient indices i from 0 to n-1:
        result += (coefficients[i] * power)
        power *= r
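The pseudocode above translates fairly directly into a CUDA function. The following is one possible sketch: the name `evalPolynomial`, the use of `float`, and the `__host__ __device__` qualifiers (so it can be called from both CPU and GPU code) are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: evaluates P(r) = c_0 + c_1 * r + ... + c_{n-1} * r^(n-1).
// Callable from both host and device code.
__host__ __device__ float evalPolynomial(const float* coefficients, int n, float r) {
    float result = 0.0f;
    float power = 1.0f;  // holds r^i at the start of iteration i
    for (int i = 0; i < n; ++i) {
        result += coefficients[i] * power;
        power *= r;
    }
    return result;
}
```

For example, with coefficients {1, 2, 3} and r = 2, the function computes 1 + 2·2 + 3·4 = 17.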
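Accumulation
atomicAdd() function
Important for safe operations! When many threads add to the same output location, atomicAdd() performs the read, add, and write as one indivisible step, so no thread's contribution is lost.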
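A minimal sketch of how `atomicAdd()` might be used to accumulate the polynomial sum, assuming one thread per value of r and a single `float` output that the host has zeroed before the launch. The kernel name `polynomialSumKernel` and its parameter names are hypothetical; this is one straightforward approach, not necessarily the one the lab intends.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: thread j evaluates P(rValues[j]) and adds it to *output.
// *output must be set to 0 before the kernel is launched.
__global__ void polynomialSumKernel(const float* coefficients, int n,
                                    const float* rValues, int N,
                                    float* output) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= N) {
        return;
    }

    // Evaluate the polynomial at this thread's r value.
    float r = rValues[idx];
    float result = 0.0f;
    float power = 1.0f;
    for (int i = 0; i < n; ++i) {
        result += coefficients[i] * power;
        power *= r;
    }

    // A plain "*output += result" would be a data race: two threads could
    // read the same old value and one update would be lost.
    atomicAdd(output, result);
}
```

Because the order in which threads reach `atomicAdd()` is not fixed, floating-point rounding can make the last few bits of the result vary from run to run. Having every thread hit the same address also serializes the additions, so reducing within a block first is a common refinement.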