Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

History of GPUs – Slide 2: Make great images: intricate shapes, complex optical effects, seamless motion. Make them fast: invent clever techniques, use every trick imaginable, build monster hardware. (Eugene d’Eon, David Luebke, Eric Enderton, in Proc. EGSR 2007 and GPU Gems 3.)

History of GPUs – Slide 3: Vertex Transform & Lighting → Triangle Setup & Rasterization → Texturing & Pixel Shading → Depth Test & Blending → Framebuffer

History of GPUs – Slide 5: Vertex Transform & Lighting → Triangle Setup & Rasterization → Texturing & Pixel Shading → Depth Test & Blending → Framebuffer. Vertex Transform & Lighting: transform from “world space” to “image space”; compute per-vertex lighting.
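
As a rough illustration of the vertex stage on this slide, the following Python sketch applies a 4x4 matrix to vertices in homogeneous coordinates and computes a per-vertex diffuse lighting term. The function names, the matrix values, and the choice of a Lambertian lighting model are assumptions made for illustration; the slides do not specify any of them, and a real pipeline would use a full model-view-projection matrix.

```python
import math

def transform_vertex(matrix, vertex):
    """Multiply a 4x4 row-major matrix by a homogeneous vertex (x, y, z, w)."""
    x, y, z, w = (
        sum(matrix[row][col] * vertex[col] for col in range(4))
        for row in range(4)
    )
    # Perspective divide: map homogeneous coordinates back to 3-D.
    return (x / w, y / w, z / w)

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def diffuse_intensity(normal, light_dir):
    """Lambertian term (assumed model): cosine between normal and light, clamped at 0."""
    n = normalize(normal)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

# Hypothetical transform: uniform scale by 2, then translate by (1, 0, 0).
SCALE_AND_TRANSLATE = [
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# The same matrix is applied to every vertex, which is what makes this stage parallel.
vertices = [(0.0, 0.0, 0.0, 1.0), (1.0, 1.0, 1.0, 1.0)]
transformed = [transform_vertex(SCALE_AND_TRANSLATE, v) for v in vertices]
```

Each vertex is processed independently of the others, so the list comprehension at the end could in principle run once per vertex in parallel.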

History of GPUs – Slide 6: Vertex Transform & Lighting → Triangle Setup & Rasterization → Texturing & Pixel Shading → Depth Test & Blending → Framebuffer. Triangle Setup & Rasterization: convert geometric representation (vertex) to image representation (fragment); interpolate per-vertex quantities across pixels.
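
One common way to interpolate per-vertex quantities across pixels is with barycentric weights. The sketch below is a minimal software version, not a description of how any particular GPU implements rasterization. The names `edge`, `interpolate`, and `rasterize`, the sample triangle, and the convention of sampling at pixel centers are all assumptions for illustration.

```python
def edge(a, b, p):
    """Twice the signed area of the 2-D triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def interpolate(tri, values, p):
    """Interpolated per-vertex value at point p, or None if p is outside tri."""
    v0, v1, v2 = tri
    area = edge(v0, v1, v2)
    if area == 0:
        return None  # degenerate triangle covers no pixels
    # Barycentric weights: each one is the share of the area opposite a vertex.
    w0 = edge(v1, v2, p) / area
    w1 = edge(v2, v0, p) / area
    w2 = edge(v0, v1, p) / area
    if w0 < 0 or w1 < 0 or w2 < 0:
        return None
    return w0 * values[0] + w1 * values[1] + w2 * values[2]

def rasterize(tri, values, width, height):
    """Return {(x, y): value} fragments for pixel centers covered by tri."""
    fragments = {}
    for y in range(height):
        for x in range(width):
            value = interpolate(tri, values, (x + 0.5, y + 0.5))
            if value is not None:
                fragments[(x, y)] = value
    return fragments

# Hypothetical triangle in pixel coordinates with one scalar attribute per vertex.
TRIANGLE = ((0, 0), (4, 0), (0, 4))
VALUES = (0.0, 1.0, 2.0)
fragments = rasterize(TRIANGLE, VALUES, 4, 4)
```

Every pixel's value depends only on the triangle and that pixel's position, so the two nested loops are independent iterations that hardware can evaluate in parallel.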

History of GPUs – Slide 8: Key abstraction of real-time graphics. Hardware used to look like this: one chip/board per stage, fixed data flow through the pipeline. Pipeline: Vertex → Rasterize → Pixel → Test & Blend → Framebuffer

History of GPUs – Slide 9: Everything was fixed function, with a certain number of modes. The number of modes for each stage grew over time. Hard to optimize hardware. Developers always wanted more flexibility. Pipeline: Vertex → Rasterize → Pixel → Test & Blend → Framebuffer

History of GPUs – Slide 10: Remains a key abstraction. Hardware used to look like this. Vertex and pixel processing became programmable; new stages were added. GPU architecture increasingly centers around shader execution. Pipeline: Vertex → Rasterize → Pixel → Test & Blend → Framebuffer

History of GPUs – Slide 11: Exposing an (at first limited) instruction set for some stages. Limited instructions and instruction types, and no control flow at first. Expanded to a full ISA. Pipeline: Vertex → Rasterize → Pixel → Test & Blend → Framebuffer

History of GPUs – Slide 12: Workload and programming model provide lots of parallelism. Applications provide large groups of vertices at once: vertices can be processed in parallel; apply the same transform to all vertices. Triangles contain many pixels: pixels from a triangle can be processed in parallel; apply the same shader to all pixels. Very efficient hardware to hide serialization bottlenecks.

History of GPUs – Slide 13: [Figure: the pipeline stages Vertex, Raster, Pixel, Blend, shown once with a single unit per stage and once with parallel units: Vrtx 0, Vrtx 1, Vrtx 2 and Pixel 0, Pixel 1, Pixel 2, Pixel 3.]

History of GPUs – Slide 14: Note that we do the same thing for lots of pixels/vertices. A warp = 32 threads launched together; they usually execute together as well. [Figure: ALU and Control blocks.]
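
The warp size of 32 stated on this slide leads to some simple arithmetic, sketched below in Python. The helper names `warps_needed` and `warp_and_lane` are hypothetical, and the sketch assumes that consecutive linear thread indices are grouped into the same warp.

```python
WARP_SIZE = 32  # threads per warp, as stated on the slide

def warps_needed(num_threads, warp_size=WARP_SIZE):
    """Number of warps required to cover num_threads (ceiling division)."""
    return (num_threads + warp_size - 1) // warp_size

def warp_and_lane(thread_index, warp_size=WARP_SIZE):
    """Split a linear thread index into (warp number, position within the warp)."""
    return divmod(thread_index, warp_size)
```

For example, 33 threads occupy two warps, and the second warp has only one active thread, so thread counts that are multiples of 32 waste no slots.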

History of GPUs – Slide 15: All this performance attracted developers. To use GPUs, they re-expressed their algorithms as general-purpose computations on the GPU (GPGPU), using the GPU and graphics API in applications other than 3-D graphics. Pretend to be graphics: disguise data as textures or geometry, and disguise the algorithm as render passes. Fool the graphics pipeline into doing computation, to take advantage of the massive parallelism of the GPU. The GPU accelerates the critical path of the application.

History of GPUs – Slide 16: Data-parallel algorithms leverage GPU attributes: large data arrays, streaming throughput; fine-grain SIMD parallelism; low-latency floating point (FP) computation. Applications (see http://GPGPU.org): game effects (FX) physics, image processing; physical modeling, computational engineering, matrix algebra, convolution, correlation, sorting.

History of GPUs – Slide 17: Dealing with the graphics API: working with the corner cases of the graphics API. Addressing modes: limited texture size/dimension. Shader capabilities: limited outputs. Instruction sets: lack of integer & bit ops. Communication limited: between pixels; scatter a[i] = p. [Figure: Fragment Program with Input Registers, Output Registers, Constants, Texture, Temp Registers, and FB Memory; legend: per thread, per Shader, per Context.]
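
The scatter limitation (`a[i] = p`) is easier to see next to its counterpart, gather. The Python sketch below computes the same histogram both ways. The histogram is an example chosen here, not one from the slides, and the function names are hypothetical. The gather form reflects the restriction that a fragment program can read from many locations but writes only to its own output location.

```python
def histogram_scatter(data, num_bins):
    """Scatter form: each input element writes to a computed address, a[i] = p."""
    bins = [0] * num_bins
    for value in data:
        bins[value] += 1  # the write location depends on the data
    return bins

def histogram_gather(data, num_bins):
    """Gather form: each output element reads whatever inputs it needs.

    Every output bin is computed independently and written exactly once,
    at a location known in advance.
    """
    return [sum(1 for value in data if value == b) for b in range(num_bins)]

# Hypothetical input: five samples that fall into three bins.
samples = [0, 2, 2, 1, 2]
```

Both functions return the same result, but the gather form rereads the whole input once per bin, which illustrates the extra work that the missing scatter operation could force on early GPGPU code.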

History of GPUs – Slide 18: To use GPUs, developers re-expressed algorithms as graphics computations. Very tedious, limited usability. Still had some very nice results. This was the lead-up to CUDA.

History of GPUs – Slide 19: General purpose programming model: the user kicks off batches of threads on the GPU. GPU = dedicated super-threaded, massively data parallel co-processor. Targeted software stack: compute-oriented drivers, language, and tools.

History of GPUs – Slide 20: Driver for loading computation programs into the GPU. Standalone driver, optimized for computation. Interface designed for compute: graphics-free API. Data sharing with OpenGL buffer objects. Guaranteed maximum download & readback speeds. Explicit GPU memory management.

History of GPUs – Slide 21: [Figure: CPU (host) and GPU w/ local DRAM (device).]

History of GPUs – Slide 22: 8-series GPUs deliver 25 to 200+ GFLOPS on compiled parallel C applications. Available in laptops, desktops, and clusters. GPU parallelism is doubling every year. Programming model scales transparently. [Figure: GeForce 8800, Tesla D870.]

History of GPUs – Slide 23: Programmable in C with CUDA tools. Multithreaded SPMD model uses application data parallelism and thread parallelism. [Figure: Tesla S870.]
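
The SPMD (single program, multiple data) idea can be sketched without a GPU: every logical thread runs the same function and differs only in its index. The Python below is a sequential stand-in, not CUDA code. The names `vector_add_kernel`, `launch`, `block_idx`, `thread_idx`, `grid_dim`, and `block_dim` are chosen here to resemble CUDA's terminology, and the block size of 4 is an arbitrary assumption.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body executed by every logical thread; only the indices differ."""
    i = block_idx * block_dim + thread_idx
    if i < len(out):  # guard: threads past the end of the data do nothing
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for a parallel launch of grid_dim * block_dim threads."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)
block_dim = 4
# Ceiling division: enough blocks to cover every element.
grid_dim = (len(a) + block_dim - 1) // block_dim
launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
```

Because no thread reads another thread's output, the loop iterations in `launch` could run in any order or all at once, which is the property a GPU exploits.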

History of GPUs – Slide 24: GPUs evolve as hardware and software evolve. Five-stage graphics pipelining. An example of GPGPU. Intro to CUDA.

History of GPUs – Slide 25: Reading: Chapter 2, “Programming Massively Parallel Processors” by Kirk and Hwu. Based on original material from The University of Illinois at Urbana-Champaign: David Kirk, Wen-mei W. Hwu; The University of Minnesota: Weijun Xiao; Stanford University: Jared Hoberock, David Tarjan. Revision history: last updated 5/24/2011.
