Published by Liliana Curvey, modified over 2 years ago

1
CS 179: Lecture 2 Lab Review 1

2
The Problem
Add two arrays: A[] + B[] -> C[]

3
GPU Computing: Step by Step
1. Set up inputs on the host (CPU-accessible memory).
2. Allocate memory for inputs on the GPU.
3. Copy inputs from host to GPU.
4. Allocate memory for outputs on the host.
5. Allocate memory for outputs on the GPU.
6. Start the GPU kernel.
7. Copy output from GPU to host.
(Copying can be asynchronous.)
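The host-side workflow above can be sketched in CUDA as follows. This is a minimal sketch, not the lab's reference code: the names `addKernel`, `gpuAdd`, and `CHECK_CUDA`, and the block size of 256 threads, are assumptions chosen for illustration. The caller sets up the host inputs and allocates the host output; the function does the rest using synchronous `cudaMemcpy`, so asynchronous copies are not shown. The kernel here already includes a bounds check.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical error-check helper: abort with a message if a CUDA call fails.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Element-wise C[i] = A[i] + B[i]; the bounds check skips extra threads.
__global__ void addKernel(const float* a, const float* b, float* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}

// h_a, h_b: host inputs; h_c: caller-allocated host output of n floats.
void gpuAdd(const float* h_a, const float* h_b, float* h_c, int n) {
    if (n <= 0) return;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;

    // Allocate memory for inputs on the GPU and copy them over.
    CHECK_CUDA(cudaMalloc(&d_a, bytes));
    CHECK_CUDA(cudaMalloc(&d_b, bytes));
    CHECK_CUDA(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CHECK_CUDA(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));

    // Allocate memory for the output on the GPU.
    CHECK_CUDA(cudaMalloc(&d_c, bytes));

    // Start the kernel: one thread per element, 256 threads per block (assumed).
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addKernel<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
    CHECK_CUDA(cudaGetLastError());

    // Copy the output back; this blocking cudaMemcpy also waits for the kernel.
    CHECK_CUDA(cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost));

    CHECK_CUDA(cudaFree(d_a));
    CHECK_CUDA(cudaFree(d_b));
    CHECK_CUDA(cudaFree(d_c));
}
```

Called as `gpuAdd(a, b, c, n)` with three host arrays of `n` floats, it leaves `a[i] + b[i]` in `c[i]`.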

4
The Kernel
Determine a thread index from the block ID and the thread ID within a block (for a 1-D launch: index = blockIdx.x * blockDim.x + threadIdx.x).
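A sketch of that index computation, assuming a 1-D grid of 1-D blocks: `blockIdx.x` is the block ID, `blockDim.x` the number of threads per block, and `threadIdx.x` the thread ID within the block. The kernel `writeGlobalIndex` and the host helper `collectIndices` are hypothetical names used only to make the indices observable; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Each thread computes its global index from its block ID and its thread ID
// within the block, then records it so the host can inspect it.
__global__ void writeGlobalIndex(int* out) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = idx;
}

// Hypothetical helper: launch blocks x threads, copy the indices back.
// h_out must hold blocks * threads ints.
void collectIndices(int* h_out, int blocks, int threads) {
    const int total = blocks * threads;
    int* d_out = nullptr;
    cudaMalloc(&d_out, total * sizeof(int));
    writeGlobalIndex<<<blocks, threads>>>(d_out);
    cudaMemcpy(h_out, d_out, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```

For example, with 32 threads per block, thread 5 of block 2 gets index 2 * 32 + 5 = 69.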

5
Calling the Kernel …
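A kernel is launched with the `<<<blocks, threadsPerBlock>>>` execution-configuration syntax. The sketch below shows one common way to size the grid: round the block count up so that blocks * threadsPerBlock covers all n elements. The names `numBlocksFor` and `launchAdd` and the block size of 512 are assumptions for illustration; `d_a`, `d_b`, and `d_c` are assumed to be device pointers already allocated and filled.

```cuda
#include <cuda_runtime.h>

// Bounds-checked element-wise add.
__global__ void addKernel(const float* a, const float* b, float* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) c[idx] = a[idx] + b[idx];
}

// Ceiling division: the smallest block count with blocks * threadsPerBlock >= n.
int numBlocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical launch wrapper; d_a, d_b, d_c are device pointers.
cudaError_t launchAdd(const float* d_a, const float* d_b, float* d_c, int n) {
    const int threadsPerBlock = 512;
    addKernel<<<numBlocksFor(n, threadsPerBlock), threadsPerBlock>>>(d_a, d_b, d_c, n);
    return cudaGetLastError();  // reports launch-configuration errors
}
```

Rounding up means the last block may contain threads past the end of the array, which is why the kernel needs its bounds check.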

6
CUDA implementation (2)

7
Fixing the Kernel
For large arrays, our kernel doesn't work!
Bounds-checking: be on the lookout!
Also, we need a way for the kernel to handle a few more elements…
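One common way to let the kernel handle more elements than there are threads is a grid-stride loop: each thread starts at its global index and jumps ahead by the total thread count (`blockDim.x * gridDim.x`) until it passes the end, and the loop condition doubles as the bounds check. This is a sketch under assumed names (`addKernelStrided`, `stridedAdd`), with error checking omitted; the deck's own Part 1 and Part 2 code may differ.

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: thread idx handles idx, idx + stride, idx + 2*stride, ...
// The while condition is also the bounds check, so any n is safe.
__global__ void addKernelStrided(const float* a, const float* b, float* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    const int stride = blockDim.x * gridDim.x;  // total threads in the grid
    while (idx < n) {
        c[idx] = a[idx] + b[idx];
        idx += stride;
    }
}

// Hypothetical host wrapper that deliberately launches a small grid.
void stridedAdd(const float* h_a, const float* h_b, float* h_c, int n,
                int blocks, int threads) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    addKernelStrided<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
}
```

With 2 blocks of 64 threads and n = 10000, each thread handles about 78 elements. The `int` index is assumed not to overflow, which holds for array lengths well below `INT_MAX`.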

8
Fixing the Kernel – Part 1

9
Fixing the Kernel – Part 2

10
Fixing our Call
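On the launch side, a grid-stride kernel lets the host cap the number of blocks instead of always requesting one thread per element. This sketch is one plausible reading of the slide title, not its actual content: it assumes a caller-supplied limit `maxBlocks` (for example 65535, the grid x-dimension limit on GPUs older than compute capability 3.0), and `chooseBlocks` and `launchStridedAdd` are hypothetical names. Any elements beyond blocks * threadsPerBlock are picked up by the stride loop.

```cuda
#include <cuda_runtime.h>
#include <algorithm>

// Grid-stride kernel: correct for any n and any grid size.
__global__ void addKernelStrided(const float* a, const float* b, float* c, int n) {
    const int stride = blockDim.x * gridDim.x;
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < n; idx += stride) {
        c[idx] = a[idx] + b[idx];
    }
}

// Enough blocks for one element per thread, but at most maxBlocks,
// and at least 1 (a launch with zero blocks is an error).
int chooseBlocks(int n, int threadsPerBlock, int maxBlocks) {
    const int needed = (n + threadsPerBlock - 1) / threadsPerBlock;
    return std::max(1, std::min(needed, maxBlocks));
}

// Hypothetical launch wrapper; d_a, d_b, d_c are device pointers.
cudaError_t launchStridedAdd(const float* d_a, const float* d_b, float* d_c,
                             int n, int threadsPerBlock, int maxBlocks) {
    const int blocks = chooseBlocks(n, threadsPerBlock, maxBlocks);
    addKernelStrided<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
    return cudaGetLastError();
}
```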

11
Lab 1!
Sum of polynomials: a fun, parallelizable example!
Suppose we have a polynomial P(r) with coefficients $c_0, \dots, c_{n-1}$, given by:
$$P(r) = \sum_{i=0}^{n-1} c_i r^i$$
We want, for inputs $r_0, \dots, r_{N-1}$, the sum:
$$\sum_{j=0}^{N-1} P(r_j)$$
The output condenses to one number!
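A small worked example may make the target quantity concrete. The numbers are chosen purely for illustration: coefficients $c_0 = 1$, $c_1 = 2$ (so $n = 2$) and inputs $r_0 = 0$, $r_1 = 1$, $r_2 = 2$ (so $N = 3$).

$$P(r) = 1 + 2r, \qquad \sum_{j=0}^{2} P(r_j) = P(0) + P(1) + P(2) = 1 + 3 + 5 = 9$$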

12
Calculating P(r) once
Pseudocode (one possible method):
    Given r, coefficients[]
    result <- 0.0
    power <- 1.0
    for all coefficient indices i from 0 to n-1:
        result += (coefficients[i] * power)
        power *= r
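The pseudocode above translates almost line for line into a CUDA function. Marking it `__host__ __device__` is an assumption of this sketch, not something the slide specifies: it lets the same code run inside a kernel or on the CPU, which is handy for checking GPU results. `evalPoly` is a hypothetical name.

```cuda
#include <cuda_runtime.h>

// Evaluates P(r) = c_0 + c_1 r + ... + c_{n-1} r^{n-1} by keeping a running
// power of r, as in the pseudocode. Callable from host or device code.
__host__ __device__ float evalPoly(const float* coeffs, int n, float r) {
    float result = 0.0f;
    float power = 1.0f;  // r^0
    for (int i = 0; i < n; ++i) {
        result += coeffs[i] * power;
        power *= r;      // advance to r^(i+1)
    }
    return result;
}
```

For example, with coefficients {1, 2, 3} and r = 2, this computes 1 + 2*2 + 3*4 = 17.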

13
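Accumulation
atomicAdd() function
Important for safe concurrent updates: many threads can add to the same memory location without losing any of their contributions.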
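A minimal sketch of atomic accumulation for Lab 1, assuming float data and a single global accumulator `total` that starts at zero. Each thread evaluates P at its inputs (via a grid-stride loop) and adds each value with `atomicAdd`, which performs the read-modify-write as one indivisible operation so no concurrent update is lost. The names `polySumAtomic` and `polySumOnGpu`, the fixed launch of 32 blocks of 256 threads, and the omission of error checks are all assumptions of this sketch.

```cuda
#include <cuda_runtime.h>

// P(r) via a running power of r, as in the "Calculating P(r) once" pseudocode.
__device__ float evalPoly(const float* coeffs, int numCoeffs, float r) {
    float result = 0.0f, power = 1.0f;
    for (int i = 0; i < numCoeffs; ++i) {
        result += coeffs[i] * power;
        power *= r;
    }
    return result;
}

// Each thread handles inputs idx, idx + stride, ... and adds every P(r)
// straight into *total; atomicAdd makes each update indivisible.
__global__ void polySumAtomic(const float* inputs, int numInputs,
                              const float* coeffs, int numCoeffs, float* total) {
    const int stride = blockDim.x * gridDim.x;
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < numInputs; idx += stride) {
        atomicAdd(total, evalPoly(coeffs, numCoeffs, inputs[idx]));
    }
}

// Hypothetical host wrapper (error checks omitted for brevity).
float polySumOnGpu(const float* h_inputs, int numInputs,
                   const float* h_coeffs, int numCoeffs) {
    float *d_inputs = nullptr, *d_coeffs = nullptr, *d_total = nullptr;
    cudaMalloc(&d_inputs, numInputs * sizeof(float));
    cudaMalloc(&d_coeffs, numCoeffs * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_inputs, h_inputs, numInputs * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_coeffs, h_coeffs, numCoeffs * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(float));  // all-zero bytes == 0.0f

    polySumAtomic<<<32, 256>>>(d_inputs, numInputs, d_coeffs, numCoeffs, d_total);

    float h_total = 0.0f;
    cudaMemcpy(&h_total, d_total, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_inputs);
    cudaFree(d_coeffs);
    cudaFree(d_total);
    return h_total;
}
```

Because the order of the atomic additions is not fixed, a float total can differ in its last bits from run to run unless, as with small integer-valued inputs, every partial sum is exactly representable.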

14
Accumulation

15
Shared Memory
Faster than global memory
Per-block: each block gets its own copy, shared only among that block's threads

16
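Linear Accumulation
atomicAdd() has a choke point: atomic updates to the same address are serialized, so every thread ends up waiting on one accumulator!
What if we reduced our results in parallel?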
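One plausible version of linear accumulation, sketched under the assumption of a fixed block size `THREADS_PER_BLOCK` (a hypothetical constant; the kernel must be launched with at most that many threads per block). Each thread sums its own inputs privately, writes that partial sum to a shared-memory slot, and after `__syncthreads()` thread 0 adds the block's slots one at a time and issues a single `atomicAdd`. This cuts the atomics on `total` from one per input to one per block. All names are hypothetical, error checks are omitted, and the deck's version may differ.

```cuda
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  // hypothetical fixed block size

__device__ float evalPoly(const float* coeffs, int numCoeffs, float r) {
    float result = 0.0f, power = 1.0f;
    for (int i = 0; i < numCoeffs; ++i) {
        result += coeffs[i] * power;
        power *= r;
    }
    return result;
}

__global__ void polySumLinear(const float* inputs, int numInputs,
                              const float* coeffs, int numCoeffs, float* total) {
    __shared__ float partial[THREADS_PER_BLOCK];  // one slot per thread in this block

    // Each thread accumulates its own inputs privately (no atomics needed).
    float mySum = 0.0f;
    const int stride = blockDim.x * gridDim.x;
    for (int idx = blockIdx.x * blockDim.x + threadIdx.x; idx < numInputs; idx += stride) {
        mySum += evalPoly(coeffs, numCoeffs, inputs[idx]);
    }
    partial[threadIdx.x] = mySum;
    __syncthreads();  // every slot must be written before thread 0 reads them

    // Thread 0 sums the block's slots linearly, then does one atomicAdd per block.
    if (threadIdx.x == 0) {
        float blockSum = 0.0f;
        for (int i = 0; i < (int)blockDim.x; ++i) blockSum += partial[i];
        atomicAdd(total, blockSum);
    }
}

// Hypothetical host wrapper: 8 blocks of THREADS_PER_BLOCK threads.
float polySumLinearOnGpu(const float* h_inputs, int numInputs,
                         const float* h_coeffs, int numCoeffs) {
    float *d_inputs = nullptr, *d_coeffs = nullptr, *d_total = nullptr;
    cudaMalloc(&d_inputs, numInputs * sizeof(float));
    cudaMalloc(&d_coeffs, numCoeffs * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_inputs, h_inputs, numInputs * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_coeffs, h_coeffs, numCoeffs * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(float));

    polySumLinear<<<8, THREADS_PER_BLOCK>>>(d_inputs, numInputs, d_coeffs, numCoeffs, d_total);

    float h_total = 0.0f;
    cudaMemcpy(&h_total, d_total, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_inputs);
    cudaFree(d_coeffs);
    cudaFree(d_total);
    return h_total;
}
```

The linear loop in thread 0 is itself serial work of length blockDim.x, which is the cost the "Can we do better?" question points at.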

17
Linear Accumulation …

18
Linear Accumulation (2)

19
Can we do better?

20
Last notes
minuteman.cms.caltech.edu – the easiest option
CMS accounts!
Office hours
Kevin: Monday, 8-10 PM
Connor: Tuesday, 8-10 PM
