Workshop 9: General purpose computing using GPUs: Developing a hands-on undergraduate course on CUDA programming. SIGCSE 2011, The 42nd ACM Technical Symposium on Computer Science Education.

Presentation transcript:

1 Workshop 9: General purpose computing using GPUs: Developing a hands-on undergraduate course on CUDA programming
SIGCSE 2011, The 42nd ACM Technical Symposium on Computer Science Education
Wednesday March 9, 2011, 7:00 pm - 10:00 pm
Dr. Barry Wilkinson, University of North Carolina Charlotte
Dr. Yaohang Li, Old Dominion University
SIGCSE 2011 Workshop 9 intro.ppt © 2010 B. Wilkinson. Modification date: Feb 22, 2011

2 Agenda
7:00 pm - 7:15 pm: Welcome and opening remarks: GPUs and CUDA, remote server configurations, guest accounts, sample programs with graphics.
7:15 pm - 8:25 pm: Session 1: Basic CUDA programming. Presentation, then hands-on experience using remote GPU server.
8:25 pm - 8:35 pm: Break, with demos.
8:35 pm - 9:35 pm: Session 2: Further features and performance of CUDA programs. Presentation, then guided hands-on experience.
9:35 pm - 10:00 pm: Discussion of general-purpose GPU programming at undergraduate level.

3 GPU performance gains over CPUs
[Figure: chart comparing GPU and CPU performance. GPU points: NV30, NV40, G70, G80, GT200, T12. CPU points: 3GHz Dual Core P4, 3GHz Core2 Duo, 3GHz Xeon Quad, Westmere. Source: © David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL Spring 2010, University of Illinois, Urbana-Champaign]
Emergence of GPU systems for General Purpose High Performance Computing
GPUs have developed from graphics cards into a platform for HPC.
GPUs are being designed with that application in mind.
Very significant performance improvements on scientific code.


5 A hot topic to teach
Taught at Illinois, Stanford, MIT, Harvard, Duke, Chapel Hill, UNC-C, …
Taught at graduate level and now moving into undergraduate level.

6 GPU Course for High Performance Computing
Concerned with using Graphics Processing Units (GPUs) for high performance computing, not graphics.
A programming course.
Uses CUDA (Compute Unified Device Architecture), an architecture and programming model introduced by NVIDIA in 2007.
C-based. Easy to learn.

7 NVIDIA products
NVIDIA Corp. is the leader in GPUs for high performance computing. Established by Jen-Hsun Huang, Chris Malachowsky, Curtis Priem.
[Timeline figure: NV1, GeForce 1, GeForce 2 series, GeForce FX series, GeForce 8 series, GeForce 200 series, GeForce 400 series]
GeForce 8800 (GT 80): NVIDIA's first GPU with general purpose processors.
GeForce 200 series: GTX260/275/280/285/295.
GeForce 400 series: GTX460/465/470/475/480/485.
Tesla: C870, S870, C1060, S1070, C2050, … Tesla 2050 GPU has 448 thread processors.
Quadro.
Fermi, Kepler (2011), Maxwell (2013).
CUDA.

8 Programming Model
GPUs were historically designed for creating image data for displays. This involves manipulating image picture elements (pixels), often applying the same operation to each pixel.
SIMD (Single Instruction Multiple Data) model: an efficient mode of operation in which the same operation is done on each data element at the same time.
GPUs use a thread version of SIMD called Single Instruction Multiple Thread (SIMT).

9 GPU’s SIMT Programming Model
GPUs use very lightweight threads to achieve high parallel performance and to hide memory latency.
Multiple threads each execute the same instruction sequence.
A very large number of threads (10,000’s) is possible on GPUs.
Threads are mapped onto the available processors on the GPU (100’s of processors), all executing the same program sequence.
More on the programming model shortly.
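The SIMT idea above can be sketched on a CPU. The following is a minimal illustration in plain Python, not CUDA code: every emulated "thread" runs the same function body and differs only in its index. The names `vector_add_kernel` and `launch` are hypothetical and chosen for this sketch; a real GPU would run the threads in parallel rather than in a loop.

```python
# CPU-side emulation of the SIMT idea: one kernel function, many thread indices.
# This is ordinary sequential Python, not CUDA; all names are hypothetical.

def vector_add_kernel(thread_id, a, b, c):
    """Body that every emulated thread runs; only thread_id differs."""
    if thread_id < len(c):  # guard: more threads may be launched than elements
        c[thread_id] = a[thread_id] + b[thread_id]

def launch(kernel, num_threads, *args):
    """Run the kernel once per thread index (a GPU would run these in parallel)."""
    for thread_id in range(num_threads):
        kernel(thread_id, *args)

n = 10
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]
c = [0.0] * n

# Launch 16 "threads" for 10 elements; the extra 6 fail the guard and do nothing.
launch(vector_add_kernel, 16, a, b, c)
```

The guard on `thread_id` reflects a common situation: threads are usually launched in fixed-size groups, so the thread count often exceeds the data size.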

10 Programming applications using SIMT model
Matrix operations: very amenable to SIMT, since the same operations are done on different elements of the matrices.
Some “embarrassingly” parallel computations such as Monte Carlo calculations. Monte Carlo calculations use random selections that are independent of each other.
Data manipulations. Some sorting can be done quite efficiently.
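To illustrate why Monte Carlo calculations count as embarrassingly parallel, here is a minimal sketch in plain Python (not CUDA) that estimates pi. Each task draws its own random points from a private generator, so tasks share no state and could run in any order or all at once. The function names, the seeding scheme, and the sample counts are assumptions made for this sketch.

```python
import random

def count_hits(seed, samples):
    """One independent task: count random points inside the unit quarter circle."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(num_tasks, samples_per_task):
    """Combine per-task counts; on a GPU each task could map to one thread."""
    total_hits = sum(count_hits(seed, samples_per_task) for seed in range(num_tasks))
    return 4.0 * total_hits / (num_tasks * samples_per_task)

pi_estimate = estimate_pi(num_tasks=64, samples_per_task=2000)
```

The only communication is the final sum of hit counts, which is why the work divides so cleanly across many threads.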

11 Computer system used for workshop at UNC-Charlotte
coit-grid01.uncc.edu – coit-grid06.uncc.edu, connected through a switch.
coit-grid01-4: each with dual Xeon processors (3.4GHz), 8GB main memory.
coit-grid05: four quad-core Xeon processors (2.93GHz), 64GB main memory, 1.2 TB disk. All users’ home directories are on coit-grid05 (NFS).
coit-grid06: NVIDIA Tesla GPU (448 core Fermi).
Only available directly from on campus.
System to log onto first: coit-grid01.

12 Guest accounts on computer systems
Account details consist of an account name and an ssh password.
Log on first to coit-grid01.uncc.edu and then to coit-grid06.
Files needed for hands-on sessions are provided in each account.
More details in hands-on session write-ups.
Use PuTTY or WinSCP if on Windows.

13 User interface for forwarding X11 graphics (not needed for workshop)
[Screenshot labels: Xclock running on client PC; Xclock running on coit-grid01.uncc.edu; Xclock running on coit-grid06.uncc.edu; Xterm running on client PC, logged onto coit-grid06.uncc.edu; WinSCP running on client PC connected to grid01.uncc.edu]
The Xclock windows are there to make sure all X servers are running.

14 Heat distribution problem (solving Laplace’s equation)
Fireplace.
Simple implementation: 800 x 800 points, iterations: …, speed-up = …
Graphics forwarded to client computer (PC).
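A heat distribution problem of this kind is commonly solved by Jacobi iteration, where each interior point is repeatedly replaced by the average of its four neighbours. The sketch below is plain Python on a small grid, not the workshop's program: the grid size, temperatures, iteration count, and the placement of the hot "fireplace" segment are all assumptions chosen for illustration.

```python
def jacobi_step(grid):
    """Return a new grid where each interior point is the mean of its 4 neighbours."""
    n = len(grid)
    new = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            # Each (i, j) update reads only the old grid, so all points are
            # independent within a step; a GPU could give each point a thread.
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new

def solve(n, iterations, hot=100.0, cold=20.0):
    """Hypothetical setup: cold walls with a hot segment on the top edge."""
    grid = [[cold] * n for _ in range(n)]
    for j in range(n // 4, 3 * n // 4):
        grid[0][j] = hot
    for _ in range(iterations):
        grid = jacobi_step(grid)
    return grid

result = solve(n=16, iterations=200)
```

Because every point in a step depends only on the previous step's values, the inner double loop is the part that maps naturally onto many GPU threads.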

15 N Body problem

16 Video

Questions

Next: Basic CUDA programming. Intro to 1st hands-on session.