Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary.

Slides:

Advertisements

Similar presentations

Understanding the graphics pipeline Lecture 2 Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider.

Advertisements

Graphics Pipeline.

Status – Week 257 Victor Moya. Summary GPU interface. GPU interface. GPU state. GPU state. API/Driver State. API/Driver State. Driver/CPU Proxy. Driver/CPU.

The Programmable Graphics Hardware Pipeline Doug James Asst. Professor CS & Robotics.

Control Flow Virtualization for General-Purpose Computation on Graphics Hardware Ghulam Lashari Ondrej Lhotak University of Waterloo.

Morphing and Animation GPU Graphics Gary J. Katz University of Pennsylvania CIS 665 Adapted from articles taken from ShaderX 3, 4 and 5 And GPU Gems 1.

A Crash Course on Programmable Graphics Hardware Li-Yi Wei 2005 at Tsinghua University, Beijing.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Chapter.

Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Joseph.

Status – Week 277 Victor Moya.

Evolution of the Programmable Graphics Pipeline Patrick Cozzi University of Pennsylvania CIS Spring 2011.

The Graphics Pipeline CS2150 Anthony Jones. Introduction What is this lecture about? – The graphics pipeline as a whole – With examples from the video.

The programmable pipeline Lecture 10 Slide Courtesy to Dr. Suresh Venkatasubramanian.

Some Things Jeremy Sugerman 22 February Jeremy Sugerman, FLASHG 22 February 2005 Topics Quick GPU Topics Conditional Execution GPU Ray Tracing.

Compilation, Architectural Support, and Evaluation of SIMD Graphics Pipeline Programs on a General-Purpose CPU Mauricio Breternitz Jr, Herbert Hum, Sanjeev.

Mapping Computational Concepts to GPU’s Jesper Mosegaard Based primarily on SIGGRAPH 2004 GPGPU COURSE and Visualization 2004 Course.

Status – Week 260 Victor Moya. Summary shSim. shSim. GPU design. GPU design. Future Work. Future Work. Rumors and News. Rumors and News. Imagine. Imagine.

Evolution of the Programmable Graphics Pipeline Lecture 2 Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider.

GPU Graphics Processing Unit. Graphics Pipeline Scene Transformations Lighting & Shading ViewingTransformations Rasterization GPUs evolved as hardware.

Dr A Sahu Dept of Comp Sc & Engg. IIT Guwahati 1.

REAL-TIME VOLUME GRAPHICS Christof Rezk Salama Computer Graphics and Multimedia Group, University of Siegen, Germany Eurographics 2006 Real-Time Volume.

Aaron Lefohn University of California, Davis GPU Memory Model Overview.

Enhancing GPU for Scientific Computing Some thoughts.

Programmable Pipelines. Objectives Introduce programmable pipelines Vertex shaders Fragment shaders Introduce shading languages Needed to describe.

May 8, 2007Farid Harhad and Alaa Shams CS7080 Over View of the GPU Architecture CS7080 Class Project Supervised by: Dr. Elias Khalaf By: Farid Harhad &

Mapping Computational Concepts to GPUs Mark Harris NVIDIA Developer Technology.

Mapping Computational Concepts to GPUs Mark Harris NVIDIA.

GPU Computation Strategies & Tricks Ian Buck Stanford University.

GPU Shading and Rendering Shading Technology 8:30 Introduction (:30–Olano) 9:00 Direct3D 10 (:45–Blythe) Languages, Systems and Demos 10:30 RapidMind.

Programmable Pipelines. 2 Objectives Introduce programmable pipelines Vertex shaders Fragment shaders Introduce shading languages Needed to describe.

Chris Kerkhoff Matthew Sullivan 10/16/2009.  Shaders are simple programs that describe the traits of either a vertex or a pixel.  Shaders replace a.

Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware Nolan GoodnightGreg HumphreysCliff WoolleyRui Wang University of Virginia.

Cg Programming Mapping Computational Concepts to GPUs.

1 Introduction to Computer Graphics with WebGL Ed Angel Professor Emeritus of Computer Science Founding Director, Arts, Research, Technology and Science.

The programmable pipeline Lecture 3.

CSE 690: GPGPU Lecture 6: Cg Tutorial Klaus Mueller Computer Science, Stony Brook University.

Computer Graphics The Rendering Pipeline - Review CO2409 Computer Graphics Week 15.

Shadow Mapping Chun-Fa Chang National Taiwan Normal University.

Tone Mapping on GPUs Cliff Woolley University of Virginia Slides courtesy Nolan Goodnight.

CS662 Computer Graphics Game Technologies Jim X. Chen, Ph.D. Computer Science Department George Mason University.

GPU Data Formatting and Addressing

Advanced Computer Graphics Spring 2014 K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology.

May 8, 2007Farid Harhad and Alaa Shams CS7080 Overview of the GPU Architecture CS7080 Final Class Project Supervised by: Dr. Elias Khalaf By: Farid Harhad.

Review on Graphics Basics. Outline Polygon rendering pipeline Affine transformations Projective transformations Lighting and shading From vertices to.

Computing & Information Sciences Kansas State University Lecture 12 of 42CIS 636/736: (Introduction to) Computer Graphics CIS 636/736 Computer Graphics.

Fateme Hajikarami Spring  What is GPGPU ? ◦ General-Purpose computing on a Graphics Processing Unit ◦ Using graphic hardware for non-graphic computations.

David Luebke 1 1/25/2016 Programmable Graphics Hardware.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.

Geometry processing on GPUs Jens Krüger Technische Universität München.

Ray Tracing using Programmable Graphics Hardware

What are shaders? In the field of computer graphics, a shader is a computer program that runs on the graphics processing unit(GPU) and is used to do shading.

Mapping Computational Concepts to GPUs Mark Harris NVIDIA.

Dynamic Geometry Displacement Jens Krüger Technische Universität München.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 GPU.

GLSL I.  Fixed vs. Programmable  HW fixed function pipeline ▪ Faster ▪ Limited  New programmable hardware ▪ Many effects become possible. ▪ Global.

An Introduction to the Cg Shading Language Marco Leon Brandeis University Computer Science Department.

GLSL Review Monday, Nov OpenGL pipeline Command Stream Vertex Processing Geometry processing Rasterization Fragment processing Fragment Ops/Blending.

COMP 175 | COMPUTER GRAPHICS Remco Chang1/XX13 – GLSL Lecture 13: OpenGL Shading Language (GLSL) COMP 175: Computer Graphics April 12, 2016.

COMPUTER GRAPHICS CHAPTER 38 CS 482 – Fall 2017 GRAPHICS HARDWARE

Programmable Pipelines

Graphics Processing Unit

Deferred Lighting.

Chapter 6 GPU, Shaders, and Shading Languages

The Graphics Rendering Pipeline

CS451Real-time Rendering Pipeline

GRAPHICS PROCESSING UNIT

Introduction to Computer Graphics with WebGL

Graphics Processing Unit

Computer Graphics Introduction to Shaders

CIS 6930: Chip Multiprocessor: GPU Architecture and Programming

Presentation transcript:

Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary J. Katz, University of Pennsylvania GPU Memory Model Overview

Review Vertex Index Stream 3D API Commands Assembled Primitives Pixel Updates Pixel Location Stream Programmable Fragment Processor Programmable Fragment Processor Transformed Vertices Programmable Vertex Processor Programmable Vertex Processor GPU Front End GPU Front End Primitive Assembly Primitive Assembly Frame Buffer Frame Buffer Raster Operations Rasterization and Interpolation 3D API: OpenGL or Direct3D 3D API: OpenGL or Direct3D 3D Application Or Game 3D Application Or Game Pre-transformed Vertices Pre-transformed Fragments Transformed Fragments GPU Command & Data Stream CPU-GPU Boundary (AGP/PCIe) Fixed-function pipeline

Overview Color Buffers –Front-left –Front-right –Back-left –Back-right Depth Buffer (z-buffer) Stencil Buffer Accumulation Buffer

Overview GPU Memory Model GPU Data Structure Basics Introduction to Framebuffer Objects Fragment Pipeline Vertex Pipeline

5 Memory Hierarchy CPU and GPU Memory Hierarchy Disk CPU Main Memory GPU Video Memory CPU Caches CPU Registers GPU Caches GPU Temporary Registers GPU Constant Registers

6 CPU Memory Model At any program point –Allocate/free local or global memory –Random memory access Registers –Read/write Local memory –Read/write to stack Global memory –Read/write to heap Disk –Read/write to disk

7 GPU Memory Model Much more restricted memory access –Allocate/free memory only before computation –Limited memory access during computation (kernel) Registers –Read/write Local memory –Does not exist Global memory –Read-only during computation –Write-only at end of computation (pre-computed address) Disk access –Does not exist

GPU Memory Model Where is GPU Data Stored? –Vertex buffer –Frame buffer –Texture Vertex Buffer Vertex Processor Rasterizer Fragment Processor Texture Frame Buffer(s) VS 3.0 GPUs

9 GPU Memory API Each GPU memory type supports subset of the following operations –CPU interface –GPU interface

10 GPU Memory API CPU interface –Allocate –Free –Copy CPU  GPU –Copy GPU  CPU –Copy GPU  GPU –Bind for read-only vertex stream access –Bind for read-only random access –Bind for write-only framebuffer access

11 GPU Memory API GPU (shader/kernel) interface –Random-access read –Stream read

Vertex Buffers GPU memory for vertex data Vertex data required to initiate render pass Vertex Buffer Vertex Processor Rasterizer Fragment Processor Texture Frame Buffer(s) VS 3.0 GPUs

13 Vertex Buffers Supported Operations –CPU interface Allocate Free Copy CPU  GPU Copy GPU  GPU(Render-to-vertex-array) Bind for read-only vertex stream access –GPU interface Stream read (vertex program only)

14 Vertex Buffers Limitations –CPU No copy GPU  CPU No bind for read-only random access No bind for write-only framebuffer access –ATI supported this in uberbuffers. Perhaps we’ll see this as an OpenGL extension? –GPU No random-access reads No access from fragment programs

Textures Random-access GPU memory Vertex Buffer Vertex Processor Rasterizer Fragment Processor Texture Frame Buffer(s) VS 3.0 GPUs

16 Textures Supported Operations –CPU interface Allocate Free Copy CPU  GPU Copy GPU  CPU Copy GPU  GPU(Render-to-texture) Bind for read-only random access (vertex or fragment) Bind for write-only framebuffer access –GPU interface Random read

17 Textures Limitations –No bind for vertex stream access

Framebuffer Memory written by fragment processor Write-only GPU memory Vertex Buffer Vertex Processor Rasterizer Fragment Processor Texture Frame Buffer(s) VS 3.0 GPUs

19 OpenGL Framebuffer Objects General idea –Framebuffer object is lightweight struct of pointers –Bind GPU memory to framebuffer as write-only –Memory cannot be read while bound to framebuffer Which memory? –Texture –Renderbuffer –Vertex buffer?? Texture (RGBA) Renderbuffer (Depth) Framebuffer Object

20 Framebuffer Object New OpenGL extension –Enables true render-to-texture in OpenGL –Mix-and-match depth/stencil buffers –Replaces pbuffers! –More details coming later in talk…

21 What is a Renderbuffer? “Traditional” framebuffer memory –Write-only GPU memory Color buffer Depth buffer Stencil buffer New OpenGL memory object –Part of Framebuffer Object extension

22 Renderbuffer Supported Operations –CPU interface Allocate Free Copy GPU  CPU Bind for write-only framebuffer access

23 Pixel Buffer Objects Mechanism to efficiently transfer pixel data –API nearly identical to vertex buffer objects Vertex Buffer Vertex Processor Rasterizer Fragment Processor Texture Frame Buffer(s) VS 3.0 GPUs

24 Pixel Buffer Objects Uses –Render-to-vertex-array glReadPixels into GPU-based pixel buffer Use pixel buffer as vertex buffer –Fast streaming textures Map PBO into CPU memory space Write directly to PBO Reduces one or more copies

25 Pixel Buffer Objects Uses (continued) –Asynchronous readback Non-blocking GPU  CPU data copy glReadPixels into PBO does not block Blocks when PBO is mapped into CPU memory

Summary : Render-to-Texture Basic operation in GPGPU apps OpenGL Support –Save up to 16, 32-bit floating values per pixel Multiple Render Targets (MRTs) on ATI and NVIDIA 1.Copy-to-texture glCopyTexSubImage 2.Render-to-texture GL_EXT_framebuffer_object

27 Summary : Render-To-Vertex-Array Enable top-of-pipe feedback loop OpenGL Support –Copy-to-vertex-array GL_ARB_pixel_buffer_object NVIDIA and ATI –Render-to-vertex-array Maybe future extension to framebuffer objects

28 Multiple Render to Texture (MRT) [nv40] Fragment program Fragment program MRT allows us to compress multiple passes into a single one. This does not fundamentally change the model though, since read/write access is still not allowed.

29 Overview GPU Memory Model GPU Data Structure Basics Introduction to Framebuffer Objects Fragment Pipeline Vertex Pipeline

30 GPU Data Structure Basics Summary of “Implementing Efficient Parallel Data Structures on GPUs” –Chapter 33, GPU Gems II Low-level details –See the “Glift” talk for high-level view of GPU data structures Now for the gory details…

31 GPU Arrays Large 1D Arrays –Current GPUs limit 1D array sizes to 2048 or 4096 –Pack into 2D memory –1D-to-2D address translation

32 GPU Arrays 3D Arrays –Problem GPUs do not have 3D frame buffers No render-to-slice-of-3D-texture yet (coming soon?) –Solutions 1.Stack of 2D slices 2.Multiple slices per 2D buffer

33 GPU Arrays Problems With 3D Arrays for GPGPU –Cannot read stack of 2D slices as 3D texture –Must know which slices are needed in advance –Visualization of 3D data difficult Solutions –Flat 3D textures –Need render-to-slice-of-3D-texture –Maybe with GL_EXT_framebuffer_object –Volume rendering of flattened 3D data –“Deferred Filtering: Rendering from Difficult Data Formats,” GPU Gems 2, Ch. 41, p. 667

34 GPU Arrays Higher Dimensional Arrays –Pack into 2D buffers –N-D to 2D address translation –Same problems as 3D arrays if data does not fit in a single 2D texture

35 Sparse/Adaptive Data Structures Why? –Reduce memory pressure –Reduce computational workload Examples –Sparse matrices Krueger et al., Siggraph 2003 Bolz et al., Siggraph 2003 –Deformable implicit surfaces (sparse volumes/PDEs) Lefohn et al., IEEE Visualization 2003 / TVCG 2004 –Adaptive radiosity solution (Coombe et al.) Premoze et al. Eurographics 2003

36 Sparse/Adaptive Data Structures Basic Idea –Pack “active” data elements into GPU memory

37 GPU Data Structures Conclusions –Fundamental GPU memory primitive is a fixed-size 2D array –GPGPU needs more general memory model –Building and modifying complex GPU-based data structures is an open research topic…

38 Overview GPU Memory Model GPU-Based Data Structures Introduction to Framebuffer Objects Fragment Pipeline Vertex Pipeline

39 Introduction to Framebuffer Objects Where is the “Pbuffer Survival Guide”? –Gone!!! –Framebuffer objects replace pbuffers –Simple, intuitive, fast render-to-texture in OpenGL

40 Framebuffer Objects What is an FBO? –A struct that holds pointers to memory objects –Each bound memory object can be a framebuffer rendering surface –Platform-independent Texture (RGBA) Renderbuffer (Depth) Framebuffer Object

41 Framebuffer Objects Which memory can be bound to an FBO? –Textures –Renderbuffers Depth, stencil, color Traditional write-only framebuffer surfaces

42 Framebuffer Objects Usage models –Keep N textures bound to one FBO (up to 16) Change render targets with glDrawBuffers –Keep one FBO for each size/format Change render targets with attach/unattach textures –Keep several FBOs with textures attached Change render targets by binding FBO

43 Framebuffer Objects Performance –Render-to-texture glDrawBuffers is fastest on NVIDIA/ATI –As-fast or faster than pbuffers Attach/unattach textures same as changing FBOs –Slightly slower than glDrawBuffers but faster than wglMakeCurrent Keep format/size identical for all attached memory –Current driver limitation, not part of spec –Readback Same as pbuffers for NVIDIA and ATI

44 Framebuffer Objects Driver support still evolving –GPUBench FBO tests coming soon… “fbocheck” evalulates completeness Other tests…

45 Framebuffer Object Code examples –Simple C++ FBO and Renderbuffer classes HelloWorld example –OpenGL Spec

Overview GPU Memory Model GPU Data Structure Basics Introduction to Framebuffer Objects Fragment Pipeline Vertex Pipeline

47 The fragment pipeline ColorRGBA PositionXYZW Texture coordinates XY[Z]- Texture coordinates XY[Z]- … Input: Fragment Attributes Interpolated from vertex information Input: Texture Image XYZW Each element of texture is 4D vector Textures can be “square” or rectangular (power-of-two or not) 32 bits = float 16 bits = half

48 The fragment pipeline Input: Uniform parameters Can be passed to a fragment program like normal parameters set in advance before the fragment program executes Example: A counter that tracks which pass the algorithm is in. Input: Constant parameters Fixed inside program E.g. float4 v = (1.0, 1.0, 1.0, 1.0) Examples: Size of compute window

49 The fragment pipeline Math ops: USE THEM ! cos(x)/log2(x)/pow(x,y) dot(a,b) mul(v, M) sqrt(x) cross(u, v) Using built-in ops is more efficient than writing your own Swizzling/masking: an easy way to move data around. v1 = (4,-2,5,3); // Initialize v2 = v1.yx; // v2 = (-2,4) s = v1.w; // s = 3 v3 = s.rrr; // v3 = (3,3,3) Write masking : v4 = (1,5,3,2); v4.ar = v2; // v4=(4,5,4,-2)

50 The fragment pipeline x y float4 v = tex2D(IMG, float2(x,y)) Texture access is like an array lookup. The value in v can be used to perform another lookup! This is called a dependent read Texture reads (and dependent reads) are expensive resources, and are limited in different GPUs. Use them wisely !

51 The fragment pipeline Control flow: ( )?a:b operator. if-then-else conditional –[nv3x] Both branches are executed, and the condition code is used to decide which value is used to write the output register. –[nv40] True conditionals for-loops and do-while –[nv3x] limited to what can be unrolled (i.e no variable loop limits) –[nv40] True looping. WARNING: Even though nv40 has true flow control, performance will suffer if there is no coherence (more on this later)

52 The fragment pipeline Fragment programs use call-by-result Notes: Only output color can be modified Textures cannot be written Setting different values in different channels of result can be useful for debugging out float4 result : COLOR // Do computation result =

Overview GPU Memory Model GPU Data Structure Basics Introduction to Framebuffer Objects Fragment Pipeline Vertex Pipeline

54 The Vertex Pipeline Input: vertices position, color, texture coords. Input: uniform and constant parameters. Matrices can be passed to a vertex program. Lighting/material parameters can also be passed.

55 The Vertex Pipeline Operations: Math/swizzle ops Matrix operators Flow control (as before) [nv3x] No access to textures. Output: Modified vertices (position, color) Vertex data transmitted to primitive assembly.

56 Vertex programs are useful We can replace the entire geometry transformation portion of the fixed-function pipeline. Vertex programs used to change vertex coordinates (move objects around) There are many fewer vertices than fragments: shifting operations to vertex programs improves overall pipeline performance. Much of shader processing happens at vertex level. We have access to original scene geometry.

57 Vertex programs are not useful Fragment programs allow us to exploit full parallelism of GPU pipeline (“a processor at every pixel”). Vertex programs can’t read input ! [nv3x] Current Cards can read vertex textures but can not read FBOs Rule of thumb: If computation requires intensive calculation, it should probably be in the fragment processor. If it requires more geometric/graphic computing, it should be in the vertex processor.

58 Conclusions GPU Memory Model Evolving –Writable GPU memory forms loop-back in an otherwise feed-forward pipeline –Memory model will continue to evolve as GPUs become more general data-parallel processors GPGPU Data Structures –Basic memory primitive is limited-size, 2D texture –Use address translation to fit all array dimensions into 2D –See “Glift” talk for higher-level GPU data structures

59 Acknowledgements Adam Moerschell, Shubho Sengupta UCDavis Mike Houston Stanford University John Owens, Ph.D. advisorUC Davis National Science Foundation Graduate Fellowship Extra slides were added by Gary Katz from Suresh Venkatasubramanian, lecture 3 found at Alteration to this slide package were made without the authorization by the original authors and should be used for educational purposes only.