Adding GPU Computing to Computer Organization Courses Karen L. Karavanic Portland State University with David Bunde, Knox College and Jens Mache, Lewis & Clark College

Our Backgrounds in CUDA Education
Karavanic (PSU)
– New course “Multicore Computing” in 2008
– “General Purpose GPU Computing” in 2010
– Mixed graduate/undergraduate
Mache (Lewis & Clark)
– Special topics course in CUDA
– Project with students: “Game of Life” module
Bunde (Knox)
– Modules for teaching CUDA within existing courses
SC12 HPC Educators [Full-Day] Session:
– An Educators Toolbox for CUDA
Adding GPU Computing to Computing Organization Courses 2

Why Teach Parallel Computing with GPUs?
It is here
– Students have GPUs (on desk / on lap / in pocket)
– Inexpensive (no need to pay $$$ or to build)
We see the future
– Massively parallel: 100s of cores
– Ahead of the curve (how many cores in your CPU?)
We see pay-off
– Performance improvements
– Knowledge of computer architecture helps

Example CUDA program
Adding two vectors, A and B
N elements in A and B, and N threads (without code to load arrays with data)

    #include <stdlib.h>

    #define N 256

    // Kernel: thread i adds element i of A and B.
    __global__ void vecAdd(int *A, int *B, int *C) {
        int i = threadIdx.x;
        C[i] = A[i] + B[i];
    }

    int main(int argc, char **argv) {
        int size = N * sizeof(int);
        int *a, *b, *c, *devA, *devB, *devC;

        // Host (CPU) arrays
        a = (int *)malloc(size);
        b = (int *)malloc(size);
        c = (int *)malloc(size);

        // Device (GPU) arrays
        cudaMalloc((void **)&devA, size);
        cudaMalloc((void **)&devB, size);
        cudaMalloc((void **)&devC, size);

        // Copy the inputs to the GPU
        cudaMemcpy(devA, a, size, cudaMemcpyHostToDevice);
        cudaMemcpy(devB, b, size, cudaMemcpyHostToDevice);

        // Launch one block of N threads
        vecAdd<<<1, N>>>(devA, devB, devC);

        // Copy the result back to the CPU
        cudaMemcpy(c, devC, size, cudaMemcpyDeviceToHost);

        cudaFree(devA);
        cudaFree(devB);
        cudaFree(devC);
        free(a);
        free(b);
        free(c);
        return 0;
    }

Why teach GPUs in Computer Organization?
“Feed me”
– Thread “execution” configuration (threads, blocks)
– Transfer CPU – GPU
– Explicit cache management
“Conflict”
– Architecture leads to large penalties for naïve code
– Synchronization
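The “Feed me” items can be made concrete with a multi-block version of vector addition. This is a minimal sketch, not code from the talk: the names vecAddBlocks and runVecAdd, the block size of 256, and the test size of 1000 are all illustrative assumptions. It shows the three steps a CUDA host program has to manage explicitly: copy the inputs to the GPU, choose an execution configuration (blocks and threads per block), and copy the result back.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One thread per element. The guard is needed because the last block
// may contain threads whose index falls past the end of the arrays.
__global__ void vecAddBlocks(const int *A, const int *B, int *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

// Hypothetical helper (not from the slides): adds two n-element vectors on
// the GPU and returns the number of wrong results, or -1 on a CUDA error.
int runVecAdd(int n) {
    size_t size = (size_t)n * sizeof(int);
    int *a = (int *)malloc(size);
    int *b = (int *)malloc(size);
    int *c = (int *)malloc(size);
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

    int *devA = NULL, *devB = NULL, *devC = NULL;
    bool ok = cudaMalloc((void **)&devA, size) == cudaSuccess &&
              cudaMalloc((void **)&devB, size) == cudaSuccess &&
              cudaMalloc((void **)&devC, size) == cudaSuccess;

    // Step 1: transfer the inputs from CPU (host) to GPU (device).
    ok = ok && cudaMemcpy(devA, a, size, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(devB, b, size, cudaMemcpyHostToDevice) == cudaSuccess;

    int errors = -1;
    if (ok) {
        // Step 2: execution configuration. Round up so every element gets a thread.
        int threadsPerBlock = 256;  // assumed block size
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        vecAddBlocks<<<blocks, threadsPerBlock>>>(devA, devB, devC, n);
        ok = cudaGetLastError() == cudaSuccess;

        // Step 3: transfer the result back (this copy waits for the kernel).
        ok = ok && cudaMemcpy(c, devC, size, cudaMemcpyDeviceToHost) == cudaSuccess;
        if (ok) {
            errors = 0;
            for (int i = 0; i < n; i++)
                if (c[i] != 3 * i) errors++;
        }
    }

    cudaFree(devA); cudaFree(devB); cudaFree(devC);
    free(a); free(b); free(c);
    return errors;
}

int main(void) {
    int n = 1000;  // deliberately not a multiple of the block size
    int errors = runVecAdd(n);
    if (errors < 0) printf("CUDA error\n");
    else printf("%d elements, %d errors\n", n, errors);
    return errors == 0 ? 0 : 1;
}
```

Rounding the block count up and guarding with `i < n` is the usual way to handle sizes that are not a multiple of the block size; without the guard, the extra threads in the last block would write past the end of C.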

Mache - Unit goals
– Idea of parallelism
– Benefits and costs of system heterogeneity
– Data movement and NUMA
– Generally, the effect of architecture on program performance

Bunde – Module Design
Brief time: course has lots of other goals
– One 70-minute lab and parts of 2 lectures
Relatively inexperienced students
– Some just out of CS 2
– Many didn’t know C or Unix programming

Bunde: Approach taken
Introductory lecture
– GPUs: massively parallel, outside CPU, kernels, SIMD
Lab illustrating features of CUDA architecture
– Data transfer time
– Thread divergence
– Memory types (next time)
“Lessons learned” lecture
– Reiterate architecture
– Demonstrate speedup with Game of Life
– Talk about use in Top 500 systems

Bunde: Survey results: Good news
Asked to describe CPU/GPU interaction:
– 9 of 11 mention both data movement and invoking kernel
– Another just mentions invoking the kernel
Asked to explain experiment illustrating data movement cost:
– 9 of 12 say comparing computation and communication cost
– 2 more talk about comparing different operations
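An experiment comparing computation and communication cost could look roughly like the sketch below. It is not the lab code from the talk: the kernel addOne, the helper timeTransferVsCompute, and the problem size are assumptions chosen for illustration. CUDA events time each of the three phases separately, so the copy times can be set against the kernel time.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Deliberately tiny kernel: one addition per element.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: fills ms[0..2] with the milliseconds spent on the
// host-to-device copy, the kernel, and the device-to-host copy.
// Returns 0 on success, 1 on failure.
int timeTransferVsCompute(int n, float ms[3]) {
    size_t size = (size_t)n * sizeof(float);
    float *host = (float *)malloc(size);
    float *dev = NULL;
    for (int i = 0; i < n; i++) host[i] = 1.0f;
    if (cudaMalloc((void **)&dev, size) != cudaSuccess) { free(host); return 1; }

    cudaEvent_t t[4];
    for (int k = 0; k < 4; k++) cudaEventCreate(&t[k]);

    cudaEventRecord(t[0], 0);
    cudaMemcpy(dev, host, size, cudaMemcpyHostToDevice);   // communication
    cudaEventRecord(t[1], 0);
    addOne<<<(n + 255) / 256, 256>>>(dev, n);              // computation
    cudaEventRecord(t[2], 0);
    cudaMemcpy(host, dev, size, cudaMemcpyDeviceToHost);   // communication
    cudaEventRecord(t[3], 0);
    cudaEventSynchronize(t[3]);

    for (int k = 0; k < 3; k++) cudaEventElapsedTime(&ms[k], t[k], t[k + 1]);

    // Every element started at 1.0 and should now be 2.0.
    int ok = cudaGetLastError() == cudaSuccess &&
             host[0] == 2.0f && host[n - 1] == 2.0f;

    for (int k = 0; k < 4; k++) cudaEventDestroy(t[k]);
    cudaFree(dev);
    free(host);
    return ok ? 0 : 1;
}

int main(void) {
    float ms[3];
    int n = 1 << 22;  // assumed problem size (about 4 million floats)
    if (timeTransferVsCompute(n, ms) != 0) { printf("CUDA error\n"); return 1; }
    printf("copy to GPU:   %.3f ms\n", ms[0]);
    printf("kernel:        %.3f ms\n", ms[1]);
    printf("copy from GPU: %.3f ms\n", ms[2]);
    return 0;
}
```

The actual numbers depend on the GPU and the bus, so none are quoted here. For a kernel this small, the two copies would typically be expected to take longer than the computation itself.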

Bunde: Survey results: Not so good news
Asked to explain experiment illustrating thread divergence:
– 2 of 9 were correct
– 2 more seemed to understand, but misused terminology
– 3 more remembered performance effect, but said nothing about the cause
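Thread divergence may be easier to grasp with two kernels that do the same total work but split it differently. The sketch below is an illustration under stated assumptions, not the lab from the talk: the functions pathA and pathB, the kernel names, ITERS, and the launch sizes are all hypothetical. In the first kernel, even and odd threads of the same warp take different branches, so the hardware runs the two branches one after the other for that warp. In the second, the branch is chosen per warp, so each warp follows a single path.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ITERS 10000  // assumed amount of work per thread

// Two different chunks of work (unsigned arithmetic wraps around safely).
__host__ __device__ unsigned pathA(unsigned x) {
    for (int k = 0; k < ITERS; k++) x = x * 3u + 1u;
    return x;
}
__host__ __device__ unsigned pathB(unsigned x) {
    for (int k = 0; k < ITERS; k++) x = x * 5u + 7u;
    return x;
}

// Divergent: neighbouring threads in one warp take different branches.
__global__ void divergent(unsigned *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 2 == 0) out[i] = pathA(i);
    else                      out[i] = pathB(i);
}

// Not divergent: all threads of a warp take the same branch.
__global__ void uniformPerWarp(unsigned *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ((threadIdx.x / warpSize) % 2 == 0) out[i] = pathA(i);
    else                                   out[i] = pathB(i);
}

// Hypothetical helper: times one launch of the chosen kernel in milliseconds.
static float timeLaunch(bool useDivergent, unsigned *devOut, int blocks, int threads) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    if (useDivergent) divergent<<<blocks, threads>>>(devOut);
    else              uniformPerWarp<<<blocks, threads>>>(devOut);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Runs both kernels, checks the first two outputs, returns 0 on success.
int runDivergenceDemo(float *msDivergent, float *msUniform) {
    const int threads = 256, blocks = 1024;
    unsigned *devOut = NULL, first[2];
    size_t size = (size_t)blocks * threads * sizeof(unsigned);
    if (cudaMalloc((void **)&devOut, size) != cudaSuccess) return 1;

    timeLaunch(true, devOut, blocks, threads);  // warm-up launch, time ignored

    *msDivergent = timeLaunch(true, devOut, blocks, threads);
    cudaMemcpy(first, devOut, sizeof(first), cudaMemcpyDeviceToHost);
    bool ok = first[0] == pathA(0u) && first[1] == pathB(1u);

    *msUniform = timeLaunch(false, devOut, blocks, threads);
    cudaMemcpy(first, devOut, sizeof(first), cudaMemcpyDeviceToHost);
    ok = ok && first[0] == pathA(0u) && first[1] == pathA(1u);

    ok = ok && cudaGetLastError() == cudaSuccess;
    cudaFree(devOut);
    return ok ? 0 : 1;
}

int main(void) {
    float d = 0.0f, u = 0.0f;
    if (runDivergenceDemo(&d, &u) != 0) { printf("CUDA error\n"); return 1; }
    printf("divergent within warps: %.3f ms\n", d);
    printf("uniform within warps:   %.3f ms\n", u);
    return 0;
}
```

In both kernels half of the threads run pathA and half run pathB, so any timing gap comes from how the branches are distributed across warps rather than from the amount of work. How large the gap is depends on the GPU and the compiler.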

Conway’s Game of Life
– Rules
– Visual Demo
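The standard rules of Conway’s Game of Life are: a live cell with two or three live neighbours survives, a dead cell with exactly three live neighbours becomes live, and every other cell is dead in the next generation. A minimal CUDA sketch of one generation follows. It is not the code of the Game of Life exercise mentioned in the talk: the names lifeStep and lifeStepOnGpu, the one-byte-per-cell layout, the 16x16 block shape, and the wrap-around edges are all assumptions.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// One thread per cell. Edges wrap around (toroidal grid, an assumption).
__global__ void lifeStep(const unsigned char *in, unsigned char *out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // Count the eight neighbours of cell (x, y).
    int neighbors = 0;
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0) continue;
            int nx = (x + dx + w) % w;
            int ny = (y + dy + h) % h;
            neighbors += in[ny * w + nx];
        }
    }

    // Birth on exactly 3 neighbours; survival on 2 or 3.
    unsigned char alive = in[y * w + x];
    out[y * w + x] = (neighbors == 3 || (alive && neighbors == 2)) ? 1 : 0;
}

// Hypothetical host wrapper: computes one generation. Returns 0 on success.
int lifeStepOnGpu(const unsigned char *in, unsigned char *out, int w, int h) {
    size_t size = (size_t)w * h;
    unsigned char *devIn = NULL, *devOut = NULL;
    bool ok = cudaMalloc((void **)&devIn, size) == cudaSuccess &&
              cudaMalloc((void **)&devOut, size) == cudaSuccess;
    ok = ok && cudaMemcpy(devIn, in, size, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);  // assumed block shape
        dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
        lifeStep<<<grid, block>>>(devIn, devOut, w, h);
        ok = cudaGetLastError() == cudaSuccess;
        ok = ok && cudaMemcpy(out, devOut, size, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devIn);
    cudaFree(devOut);
    return ok ? 0 : 1;
}

int main(void) {
    const int w = 5, h = 5;
    unsigned char in[w * h], out[w * h];
    memset(in, 0, sizeof(in));
    in[2 * w + 1] = in[2 * w + 2] = in[2 * w + 3] = 1;  // horizontal "blinker"

    if (lifeStepOnGpu(in, out, w, h) != 0) { printf("CUDA error\n"); return 1; }

    // Print the next generation: '#' for live cells, '.' for dead ones.
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) putchar(out[y * w + x] ? '#' : '.');
        putchar('\n');
    }
    return 0;
}
```

The kernel reads from one buffer and writes to another because every cell must see the previous generation of its neighbours. To run several generations, the two device buffers would be swapped between launches instead of copying the grid back to the CPU each time.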

Game of Life Module - Results
[Three slides of survey-result charts; scale: 1 = strongly disagree, 7 = strongly agree]

Conclusions
Bunde:
– Unit was mostly successful, but thread divergence is a harder concept
– Students interested in CUDA, and about half the class requested more of it
Mache:
– What students say: it’s not easy, it’s worthwhile, more please
– What instructors think: we’ll do it again, focus, use new resources
Bottom line: a brief introduction is possible even to students with limited background

Future Work
Bunde
– Will add constant memory and a small assignment to next offering
Mache and Karavanic
– Continuing collaboration for summer 2013 course at PSU
Versions of CUDA & Hardware

Thank You
We thank Barry Wilkinson for helpful input throughout our collaboration, and Julian Dale for his help in creating the GoL exercise and website.
This material is based upon work supported by the National Science Foundation under grants , and ; by Intel; and by a PSU Miller Foundation Sustainability Grant.
More information
– Game of Life Exercise: lclark.edu/~jmache/parallel
– Authors:
Karen L. Karavanic, karavan at cs.pdx.edu
David Bunde, dbunde at knox.edu
Jens Mache, jmache at lclark.edu