# CS 179: Lecture 2 Lab Review



The Problem  Add two arrays  A[] + B[] -> C[]

## GPU Computing: Step by Step

- Setup inputs on the host (CPU-accessible memory)
- Allocate memory for inputs on the GPU
- Copy inputs from host to GPU
- Allocate memory for outputs on the host
- Allocate memory for outputs on the GPU
- Start the GPU kernel
- Copy output from GPU to host
- (Copying can be asynchronous)
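The steps above can be sketched in host code roughly as follows (a minimal sketch; error checking is omitted, and `addKernel` is a hypothetical kernel name declared elsewhere):

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, defined elsewhere.
__global__ void addKernel(const float *a, const float *b, float *c, int n);

void addArrays(const float *hostA, const float *hostB, float *hostC, int n) {
    float *devA, *devB, *devC;
    size_t bytes = n * sizeof(float);

    // Allocate memory for inputs and output on the GPU.
    cudaMalloc(&devA, bytes);
    cudaMalloc(&devB, bytes);
    cudaMalloc(&devC, bytes);

    // Copy inputs from host to GPU.
    cudaMemcpy(devA, hostA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(devB, hostB, bytes, cudaMemcpyHostToDevice);

    // Start the GPU kernel.
    addKernel<<<(n + 255) / 256, 256>>>(devA, devB, devC, n);

    // Copy the output from GPU back to host.
    cudaMemcpy(hostC, devC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devA);
    cudaFree(devB);
    cudaFree(devC);
}
```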

The Kernel  Determine a thread index from block ID and thread ID within a block:

## Calling the Kernel …
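A launch might look like this (a sketch; the block and grid sizes are illustrative, and `devA`/`devB`/`devC` are the device pointers allocated earlier):

```cuda
int threadsPerBlock = 256;
int blocks = 128;  // fixed grid size for now
addKernel<<<blocks, threadsPerBlock>>>(devA, devB, devC);
```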

## CUDA Implementation (2)

Fixing the Kernel  For large arrays, our kernel doesn’t work!  Bounds-checking – be on the lookout!  Also, need a way for kernel to handle a few more elements…

## Fixing the Kernel – Part 1

## Fixing the Kernel – Part 2
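One common way to fix both issues (a sketch, not necessarily the slide's exact code): a bounds check guards out-of-range indices, and a grid-stride loop lets each thread handle multiple elements when the array is larger than the grid.

```cuda
__global__ void addKernel(const float *a, const float *b, float *c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;  // total number of threads in the grid

    // Grid-stride loop: each thread handles elements idx, idx + stride, ...
    for (int i = idx; i < n; i += stride) {  // the i < n test is the bounds check
        c[i] = a[i] + b[i];
    }
}
```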

## Fixing our Call
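Correspondingly, the call can round the block count up so every element is covered (a sketch with illustrative sizes; with a grid-stride loop, a capped grid size also works):

```cuda
int threadsPerBlock = 256;
int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
addKernel<<<blocks, threadsPerBlock>>>(devA, devB, devC, n);
```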

Lab 1!  Sum of polynomials – Fun, parallelizable example!  Suppose we have a polynomial P(r) with coefficients c 0, …, c n-1, given by:  We want, for r 0, …, r N-1, the sum:  Output condenses to one number!

Calculating P(r) once  Pseudocode (one possible method): Given r, coefficients[] result <- 0.0 power <- 1.0 for all coefficient indecies i from 0 to n-1: result += (coefficients[i] * power) power *= r

Accumulation  atomicAdd() function  Important for safe operations!


Shared Memory  Faster than global memory  Per-block  One block

Linear Accumulation  atomicAdd() has a choke point!  What if we reduced our results in parallel?


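One standard way to reduce in parallel (a sketch of a per-block shared-memory tree reduction, not necessarily the slide's exact code; it assumes blockDim.x is a power of two, and names are illustrative):

```cuda
__global__ void polySumReduce(const float *rs, const float *coeffs, int n,
                              int N, float *out) {
    extern __shared__ float partial[];  // one slot per thread in the block

    // Each thread evaluates P(r) for its own input (0 if out of range).
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float val = 0.0f;
    if (idx < N) {
        float r = rs[idx], power = 1.0f;
        for (int i = 0; i < n; i++) {
            val += coeffs[i] * power;
            power *= r;
        }
    }
    partial[threadIdx.x] = val;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }

    // One atomicAdd per block instead of one per thread.
    if (threadIdx.x == 0)
        atomicAdd(out, partial[0]);
}
```

Launch with the dynamic shared-memory size set to `threadsPerBlock * sizeof(float)` as the third launch parameter.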

## Can we do better?

Last notes  minuteman.cms.caltech.edu – the easiest option  CMS accounts!  Office hours  Kevin: Monday, 8-10 PM  Connor: Tuesday, 8-10 PM

