CS 179: GPU Programming Lecture 1: Introduction

Images: http://en.wikipedia.org http://www.pcper.com http://northdallasradiationoncology.com/ GPU Gems (Nvidia)

Administration Covered topics: (GP)GPU computing/parallelization C++ CUDA (parallel computing platform) TAs: Andrew Zhao (azhao@dmail.caltech.edu) Parker Won (jwon@caltech.edu) Nailen Matchstick (nailen@caltech.edu) Jordan Bonilla (jbonilla@caltech.edu) Website: http://courses.cms.caltech.edu/cs179/ Overseeing Instructor: Al Barr (barr@cs.caltech.edu) Class time: ANB 107, MWF 3:00 PM

Course Requirements Homework: 6 weekly assignments, each worth 10% of the grade. Final project: 4-week project, 40% of the grade total.

Homework Due on Wednesdays before class (3PM) Collaboration policy: Discuss ideas and strategies freely, but all code must be your own Office Hours: Located in ANB 104 Times: TBA (will be announced before first set is out) Extensions Ask a TA for one if you have a valid reason

Projects Topic of your choice (we will also provide many options). Teams of up to 2 people; 2-person teams will be held to higher expectations. Requirements: project proposal, progress report(s), and final presentation. More info later…

Machines Primary machine (multi-GPU, remote access): haru.caltech.edu Secondary machines: mx.cms.caltech.edu, minuteman.cms.caltech.edu Use your CMS login, and change your password with the passwd command. NOTE: Not all assignments work on these machines.

Machines Alternative: use your own machine. It must have an NVIDIA CUDA-capable GPU. Virtual machines won't work (exception: machines with I/O MMU virtualization and certain GPUs). Special requirements apply to hybrid/Optimus systems and Mac OS X. Setup guides are posted on the course website.

Machines OS/Server Access Survey: PLEASE take this survey by 12 PM Wednesday (03/30/2016) https://www.surveymonkey.com/r/DTKX2HX (link will be sent out via email after class)

The CPU The “Central Processing Unit” Traditionally, applications use the CPU for their primary calculations General-purpose capabilities Established technology Usually equipped with 8 or fewer powerful cores Optimal for concurrent processes, but not for large-scale parallel computations Wikimedia commons: Intel_CPU_Pentium_4_640_Prescott_bottom.jpg

The GPU The "Graphics Processing Unit" Relatively new technology designed for parallelizable problems Initially created specifically for graphics; became increasingly capable of general computations

GPUs – The Motivation Raytracing:

for all pixels (i, j):
    calculate ray point and direction in 3D space
    if ray intersects object:
        calculate lighting at closest object
        store color of (i, j)

Superquadric Cylinders, exponent 0.1, yellow glass balls, Barr, 1981

EXAMPLE Add two arrays: A[] + B[] -> C[] On the CPU:

float *C = (float *) malloc(N * sizeof(float));
for (int i = 0; i < N; i++)
    C[i] = A[i] + B[i];

Operates sequentially… can we do better?

A simple problem… On the CPU (multi-threaded, pseudocode):

(allocate memory for C)
create a number of threads equal to the number of cores on the processor (around 2, 4, perhaps 8)
(indicate the portions of A, B, C assigned to each thread...)
in each thread:
    for (i in this thread's region):
        C[i] <- A[i] + B[i]
        // lots of waiting involved for memory reads, writes, ...
wait for threads to synchronize...

Slightly faster – 2-8x (slightly more with other tricks)
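As a concrete illustration (not from the slides), the multi-threaded CPU version above might be sketched in C++ as follows; the function names and the chunking strategy are arbitrary choices for this sketch:

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Each thread adds its contiguous slice of A and B into C.
void add_range(const float* A, const float* B, float* C,
               size_t begin, size_t end) {
    for (size_t i = begin; i < end; i++)
        C[i] = A[i] + B[i];
}

// Split [0, N) into one contiguous chunk per thread, then join.
void parallel_add(const float* A, const float* B, float* C,
                  size_t N, unsigned num_threads) {
    std::vector<std::thread> threads;
    size_t chunk = (N + num_threads - 1) / num_threads;  // round up
    for (unsigned t = 0; t < num_threads; t++) {
        size_t begin = t * chunk;
        size_t end = std::min(N, begin + chunk);
        if (begin < end)
            threads.emplace_back(add_range, A, B, C, begin, end);
    }
    for (auto& th : threads)  // wait for threads to synchronize
        th.join();
}
```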

A simple problem… How many threads? How does performance scale? Context switching: High penalty on the CPU Low penalty on the GPU

A simple problem… On the GPU:

(allocate memory for A, B, C on the GPU)
create the “kernel” – each thread will perform one (or a few) additions
specify the following kernel operation:
    for (all i's assigned to this thread):
        C[i] <- A[i] + B[i]
start ~20000 (!) threads
wait for threads to synchronize...

GPU: Strengths Revealed Parallelism / lots of cores Low context switch penalty! We can “cover up” performance loss by creating more threads!

GPU Computing: Step by Step
Setup inputs on the host (CPU-accessible memory)
Allocate memory for inputs on the GPU
Allocate memory for outputs on the host
Allocate memory for outputs on the GPU
Copy inputs from host to GPU
Start GPU kernel
Copy output from GPU to host
(Copying can be asynchronous)
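The host-side code for these steps is not preserved in this transcript; a minimal sketch for the vector-add example, using standard CUDA runtime calls (cudaMalloc, cudaMemcpy, cudaFree), might look like this. The kernel name vecAdd and the block size of 512 are illustrative choices, not from the slides:

```cuda
#include <cuda_runtime.h>

// Assumed kernel, defined elsewhere (each thread adds one element).
__global__ void vecAdd(const float *A, const float *B, float *C, int N);

void gpuAdd(const float *hostA, const float *hostB, float *hostC, int N) {
    float *devA, *devB, *devC;
    size_t bytes = N * sizeof(float);

    cudaMalloc(&devA, bytes);   // allocate inputs on the GPU
    cudaMalloc(&devB, bytes);
    cudaMalloc(&devC, bytes);   // allocate output on the GPU

    // Copy inputs from host to GPU
    cudaMemcpy(devA, hostA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(devB, hostB, bytes, cudaMemcpyHostToDevice);

    // Start the GPU kernel (512 threads per block is an arbitrary choice)
    int threadsPerBlock = 512;
    int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(devA, devB, devC, N);

    // Copy output from GPU to host (this call waits for the kernel)
    cudaMemcpy(hostC, devC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devA);
    cudaFree(devB);
    cudaFree(devC);
}
```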

The Kernel Our “parallel” function Simple implementation
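The slide's implementation is not included in this transcript; a minimal CUDA vector-add kernel for the example above might look like the following (the name vecAdd is an illustrative choice):

```cuda
// Each thread computes one element of C. __global__ marks a function
// that runs on the GPU but is launched from host code.
__global__ void vecAdd(const float *A, const float *B, float *C, int N) {
    // Unique global index for this thread (see the indexing slide)
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // More threads may be launched than there are elements; guard the bounds.
    if (i < N)
        C[i] = A[i] + B[i];
}
```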

Indexing Can get a block ID and thread ID within the block: Unique thread ID!
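To see that the standard mapping blockIdx.x * blockDim.x + threadIdx.x gives every thread a unique ID, the arithmetic can be checked with ordinary host code; this is a plain C++ simulation of the index computation, not CUDA:

```cpp
#include <set>

// Simulate the CUDA 1-D global-index computation: every
// (block, thread) pair should map to a distinct global ID.
bool indices_are_unique(int numBlocks, int blockDim) {
    std::set<int> seen;
    for (int blockIdx = 0; blockIdx < numBlocks; blockIdx++) {
        for (int threadIdx = 0; threadIdx < blockDim; threadIdx++) {
            int globalId = blockIdx * blockDim + threadIdx;
            if (!seen.insert(globalId).second)
                return false;  // collision: two threads share an ID
        }
    }
    return (int)seen.size() == numBlocks * blockDim;
}
```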

Calling the Kernel

Calling the Kernel (2)
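The launch code from these two slides is not preserved in this transcript; a typical call uses CUDA's triple-angle-bracket syntax plus basic error checking (cudaGetLastError, cudaGetErrorString, and cudaDeviceSynchronize are standard CUDA runtime calls; vecAdd and the sizes are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed kernel, defined elsewhere.
__global__ void vecAdd(const float *A, const float *B, float *C, int N);

void launch(const float *devA, const float *devB, float *devC, int N) {
    // <<<gridDim, blockDim>>> chooses how many blocks and threads to start.
    int threadsPerBlock = 512;
    int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;  // round up
    vecAdd<<<blocks, threadsPerBlock>>>(devA, devB, devC, N);

    // Launches are asynchronous: check for launch errors, then wait.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
    cudaDeviceSynchronize();
}
```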

Questions?

GPUs – Brief History Fixed-function pipelines Pre-set functions, limited options http://gamedevelopment.tutsplus.com/articles/the-end-of-fixed-function-rendering-pipelines-and-how-to-move-on--cms-21469 Source: Super Mario 64, by Nintendo

GPUs – Brief History Shaders Could implement one’s own functions! GLSL (C-like language) Could “sneak in” general-purpose programming! http://minecraftsix.com/glsl-shaders-mod/

GPUs – Brief History CUDA (Compute Unified Device Architecture): general-purpose parallel computing platform for NVIDIA GPUs. OpenCL (Open Computing Language): general heterogeneous computing framework. … Accessible as extensions to C! (and other languages…)

GPUs Today “General-purpose computing on GPUs” (GPGPU)