
1 GPGPU. CS 446: Real-Time Rendering & Game Technology. David Luebke, University of Virginia

2 Demo today: Matthew Rodgers. That’s it!

3 News
– New job: NVIDIA Research
– New e-mail: dave@luebke.us

4 Final Topic: GPGPU
– GPGPU: “General Purpose GPU Computing”
– Active, exciting area of research and development
– A personal interest of mine
– Following slides taken from a recent talk I co-presented in Dublin, accompanying this paper

5 A Survey of General-Purpose Computation on Graphics Hardware
John Owens, University of California, Davis
David Luebke, University of Virginia
with Naga Govindaraju, Mark Harris, Jens Krüger, Aaron Lefohn, Tim Purcell

6 Introduction
– The graphics processing unit (GPU) on commodity video cards has evolved into an extremely flexible and powerful processor: programmability, precision, power
– GPGPU: an emerging field seeking to harness GPUs for general-purpose computation

7 Motivation: Computational Power
– GPUs are fast…
  3.0 GHz dual-core Pentium 4: 24.6 GFLOPS
  NVIDIA GeForce 7800 GTX: 165 GFLOPS
  1066 MHz FSB Pentium Extreme Edition: 8.5 GB/s
  ATI Radeon X850 XT Platinum Edition: 37.8 GB/s
– GPUs are getting faster, faster
  CPUs: 1.4 × annual growth
  GPUs: 1.7 × (pixels) to 2.3 × (vertices) annual growth

8 Motivation: Computational Power (chart). Thanks to Ian Buck

9 Motivation: Computational Power (chart). Thanks to John Owens

10 Motivation: Flexible and Precise
– Modern GPUs are deeply programmable: programmable pixel, vertex, and video engines; solidifying high-level language support
– Modern GPUs support high precision: 32-bit floating point throughout the pipeline; high enough for many (not all) applications

11 Motivation: The Potential of GPGPU
– In short: the power and flexibility of GPUs makes them an attractive platform for general-purpose computation
– Example applications range from in-game physics simulation to conventional computational science
– Goal: make the inexpensive power of the GPU available to developers as a sort of computational coprocessor

12 Problems: Difficult To Use
– GPUs designed for and driven by video games: programming model unusual, programming idioms tied to computer graphics, programming environment tightly constrained
– Underlying architectures are inherently parallel, rapidly evolving (even in basic feature set!), and largely secret
– Can’t simply “port” CPU code!

13 Problems: Not A Panacea
– GPUs are fast because they are specialized
– Poorly suited to sequential, “pointer-chasing” code
– Missing support for some basic functionality, e.g. integers, bitwise operations, indexed write
– More on limitations and difficulties later

14 STAR Goals
– Detailed and useful survey of general-purpose computing on graphics hardware
– Hardware and software developments behind GPGPU
– Building blocks: techniques for mapping general-purpose computation to the GPU
– Applications: important applications of GPGPU
– A comprehensive GPGPU bibliography (current through Summer 2005…)

15 Architectural Considerations

16 The past, 1987: a 20 MIPS CPU (die photo) [courtesy Anant Agarwal]

17 The future, 2007: 1 billion transistors [courtesy Anant Agarwal]

18 Today’s VLSI Capability: Keys to High Performance [courtesy Pat Hanrahan]

19 For High Performance, We Must… 1. Exploit Ample Computation! [courtesy Pat Hanrahan]

20 For High Performance, We Must… 1. Exploit Ample Computation! 2. Require Efficient Communication! [courtesy Pat Hanrahan]

21 Stream Programming Abstraction (diagram: stream → kernel → stream)
– Let’s think about our problem in a new way
– Goal: a software programming model that matches today’s VLSI
– Streams: collections of data records; all data is expressed in streams
– Kernels: inputs and outputs are streams; perform computation on streams; can be chained together

22 Why Streams?
– Ample computation by exposing parallelism
  Streams expose data parallelism: multiple stream elements can be processed in parallel
  Pipeline (task) parallelism: multiple tasks can be processed in parallel
  Kernels yield high arithmetic intensity
– Efficient communication
  Producer-consumer locality
  Predictable memory access pattern
  Optimize for throughput of all elements, not latency of one
  Processing many elements at once allows latency hiding

23 GPU: Special-Purpose Graphics Hardware [die photo: ATI Flipper, 51M transistors]
– Task-parallel organization: assign each task to a processing unit
– Hardwire each unit to its specific task: huge performance advantage!
– Provides ample computation resources
– Efficient communication patterns
– Dominant graphics architecture

24 The Rendering Pipeline
– Application: compute 3D geometry; make calls to graphics API
– Geometry (GPU): transform geometry from 3D to 2D (in parallel)
– Rasterization (GPU): generate fragments from 2D geometry (in parallel)
– Composite (GPU): combine fragments into image

25 The Programmable Rendering Pipeline
– Application: compute 3D geometry; make calls to graphics API
– Geometry (Vertex): transform geometry from 3D to 2D; vertex programs
– Rasterization (Fragment): generate fragments from 2D geometry; fragment programs
– Composite: combine fragments into image

26 NVIDIA GeForce 6800 3D Pipeline (block diagram: vertex, fragment, and composite stages; triangle setup, Z-cull, shader instruction dispatch, L2 texture cache, fragment crossbar, memory partitions). Courtesy Nick Triantos, NVIDIA

27 Characteristics of Graphics Apps (pipeline: Application → Command → Geometry → Rasterization → Fragment → Composite → Display)
– Lots of arithmetic
– Lots of parallelism
– Simple control
– Multiple stages; feed-forward pipelines
– Latency-tolerant / deep pipelines
– What other applications have these characteristics?

28 Today’s Graphics Pipeline (Application → Command → Geometry → Rasterization → Fragment → Composite → Display)
– Graphics is well suited to the stream programming model and to stream hardware organizations
– The GPU is a commodity stream processor!
– What if we could apply these techniques to more general-purpose problems?
– GPUs should excel at tasks that require ample computation, regular computation, and efficient communication

29 Programming a GPU for Graphics
– Application specifies geometry → rasterized
– Each fragment is shaded with a SIMD program
– Shading can use values from texture memory
– Image can be used as texture on future passes

30 Programming a GPU for GP Programs
– Draw a screen-sized quad → stream
– Run a SIMD kernel over each fragment
– “Gather” is permitted from texture memory
– Resulting buffer can be treated as texture on next pass
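To make the mapping concrete, here is a minimal, hypothetical GLSL-style fragment “kernel” of the kind this slide describes: the input streams live in textures, drawing a screen-sized quad invokes the kernel once per output element, gathers are texture reads, and the render target becomes the output stream. The sampler names are illustrative, not from the talk.

    // Hypothetical GPGPU "map" kernel as a GLSL fragment program.
    // Input streams are textures; the output stream is the render target.
    uniform sampler2D inputA;   // input stream A (illustrative name)
    uniform sampler2D inputB;   // second stream, read via "gather"

    void main()
    {
        vec2 elem = gl_TexCoord[0].xy;          // this element's address
        vec4 a = texture2D(inputA, elem);       // read-only texture gather
        vec4 b = texture2D(inputB, elem);
        gl_FragColor = a * b + 1.0;             // arbitrary per-element math
        // Bind the resulting buffer as a texture to chain into the next pass.
    }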

31 Feedback
– Each algorithm step depends on the results of previous steps
– Each time step depends on the results of the previous time step

32 CPU-GPU Analogies
– CPU: array write, e.g. “… Grid[i][j] = x; …”
– GPU: render to texture
– Array write = render to texture

33 CPU-GPU Analogies
– Stream / data array = texture
– Memory read = texture sample

34 Kernels
– Kernel / loop body / algorithm step (CPU) = fragment program (GPU)

35 Scatter vs. Gather
– Grid communication: grid cells share information
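As a sketch of how such grid communication looks in practice (assuming a GLSL-style fragment program and an illustrative texture layout, not code from the paper), each cell gathers its neighbors from the previous grid texture; it cannot scatter to them:

    // Hypothetical gather-based grid kernel (one relaxation/diffusion step).
    uniform sampler2D grid;      // previous time step, one cell per texel
    uniform vec2 texel;          // 1.0 / grid resolution

    void main()
    {
        vec2 c = gl_TexCoord[0].xy;
        vec4 center = texture2D(grid, c);
        vec4 neighbors = texture2D(grid, c + vec2( texel.x, 0.0))
                       + texture2D(grid, c + vec2(-texel.x, 0.0))
                       + texture2D(grid, c + vec2(0.0,  texel.y))
                       + texture2D(grid, c + vec2(0.0, -texel.y));
        // Each fragment writes only its own cell: gather, never scatter.
        gl_FragColor = mix(center, 0.25 * neighbors, 0.5);
    }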

36 Computational Resources Inventory
– Programmable parallel processors: vertex and fragment pipelines
– Rasterizer: mostly useful for interpolating addresses (texture coordinates) and per-vertex constants
– Texture unit: read-only memory interface
– Render to texture: write-only memory interface

37 Vertex Processor
– Fully programmable (SIMD / MIMD)
– Processes 4-vectors (RGBA / XYZW)
– Capable of scatter but not gather: can change the location of the current vertex, but cannot read info from other vertices; can only read a small constant memory
– Latest GPUs: vertex texture fetch: random-access memory for vertices → gather (but not from the vertex stream itself)
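A hedged sketch of vertex texture fetch on Shader Model 3 class hardware (GLSL-style; the height-map displacement use case and names are illustrative, not from the survey):

    // Hypothetical vertex program using vertex texture fetch: each vertex
    // gathers a displacement from a texture (it still cannot read other
    // vertices in its own stream).
    uniform sampler2D heightMap;     // illustrative data texture

    void main()
    {
        float h = texture2DLod(heightMap, gl_MultiTexCoord0.xy, 0.0).r;
        vec4 displaced = gl_Vertex + vec4(0.0, h, 0.0, 0.0);  // "scatter": move this vertex
        gl_Position = gl_ModelViewProjectionMatrix * displaced;
    }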

38 Fragment Processor
– Fully programmable (SIMD)
– Processes 4-component vectors (RGBA / XYZW)
– Random-access memory read (textures)
– Capable of gather but not scatter: RAM read (texture fetch), but no RAM write; output address fixed to a specific pixel
– Paper details ways to synthesize scatter
– Typically more useful than the vertex processor: more fragment pipelines than vertex pipelines; direct output (fragment processor is at the end of the pipeline)

39 Building Blocks & Applications

40 GPGPU Building Blocks
The STAR covers the following fundamental techniques and computational building blocks:
– Flow control (a very fundamental building block)
– Stream operations
– Data structures
– Differential equations and linear algebra
– Data queries
I’ll discuss each in a bit more detail

41 Flow Control
– Surprising number of issues on GPUs
– Main themes: avoid branching when possible; move branching earlier in the pipeline when possible
– Largely SIMD → coherent branching most efficient
– Mechanisms: rasterized geometry, Z-cull, occlusion query

42 Domain Decomposition
– Avoid branches where the outcome is fixed: one region is always true, another always false
– Separate fragment programs for each region, no branches (see the sketch below)
– Example: boundaries
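For example, a hypothetical boundary-region kernel under domain decomposition might look like the sketch below (GLSL-style, illustrative names): it is drawn only over the one-cell-wide border geometry, so it needs no “am I on the edge?” branch, while a different branch-free program is bound for the inset quad covering the interior cells.

    // Hypothetical boundary kernel: drawn only over boundary-cell geometry,
    // so no per-fragment branching on region membership is needed.
    uniform sampler2D boundaryValues;   // illustrative: precomputed boundary data

    void main()
    {
        // Simply impose the stored boundary condition at this cell.
        gl_FragColor = texture2D(boundaryValues, gl_TexCoord[0].xy);
    }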

43 Z-Cull
– In an early pass, modify the depth buffer: write depth = 0 for pixels that should not be modified by later passes, depth = 1 for the rest (see the sketch below)
– Subsequent passes: enable depth test (GL_LESS), draw a full-screen quad at z = 0.5; only pixels with previous depth = 1 will be processed
– Can also use early stencil test
– Note: depth replace disables Z-cull
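A minimal sketch of the first (depth-writing) pass, assuming a GLSL-style fragment program and an illustrative mask texture that marks where later work is needed:

    // Hypothetical Z-cull setup pass: write depth = 1 where later passes
    // should run, depth = 0 where they should be culled. Later passes draw
    // a full-screen quad at z = 0.5 with GL_LESS, so only depth = 1 pixels
    // survive the (early) depth test.
    uniform sampler2D mask;     // illustrative: nonzero where work is needed

    void main()
    {
        float active = texture2D(mask, gl_TexCoord[0].xy).r;
        gl_FragDepth = (active > 0.0) ? 1.0 : 0.0;
        gl_FragColor = vec4(0.0);   // color output is irrelevant in this pass
    }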

44 Pre-computation
– Pre-compute anything that will not change every iteration!
– Example: arbitrary boundaries: when the user draws boundaries, compute a texture containing boundary info for the cells; reuse that texture until the boundaries are modified
– Combine with Z-cull for higher performance!

45 Stream Operations
Several stream operations in the GPGPU toolkit:
– Map: apply a function to every element in a stream
– Reduce: use a function to reduce a stream to a smaller stream (often 1 element)
– Scatter/gather: indirect write and read
– Filter: select a subset of elements in a stream
– Sort: order elements in a stream
– Search: find a given element, nearest neighbors, etc.

46 Data Structures
– GPU memory model, iteration, virtualization
– Dense arrays (== textures)
– Sparse arrays: static sparsity: pack rows or bands into textures or vertex arrays; dynamic sparsity: multidimensional page tables

47 Data Structures
– Adaptive data structures: static: packed uniform grid, stackless k-d tree; dynamic: mipmap of page tables, tree-based address translators
– Non-indexable structures: open and important problem: stacks, priority queues, sets, linked lists, hash tables, k-d tree construction…

48 Differential Equations
– Early and common application of GPGPU
– Ordinary differential equations (ODEs): commonly used in particle systems
– Partial differential equations (PDEs): well suited to the GPU, especially when solved on dense grids
– Closely related to…

49 Linear Algebra
– Major application of GPGPU; ubiquitous in science, engineering, and visual simulation
– Several approaches explored in early GPGPU work
– Memory layout a key consideration: pack vectors into 2D textures; split matrices, e.g. by columns (dense), bands (banded), or into vertex arrays (sparse)
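To illustrate the “pack a vector into a 2D texture” idea, here is a small, hypothetical address-translation helper (GLSL-style; the row-major packing and names are assumptions, not the survey’s code):

    // Hypothetical 1D-to-2D address translation for a vector packed row by
    // row into a texWidth x texHeight texture.
    uniform sampler2D packedVector;
    uniform float texWidth;     // e.g. 256.0
    uniform float texHeight;

    vec4 fetch1D(float index)
    {
        float row = floor(index / texWidth);
        float col = index - row * texWidth;
        // Sample the texel center corresponding to element "index".
        return texture2D(packedVector,
                         vec2((col + 0.5) / texWidth, (row + 0.5) / texHeight));
    }

    void main()
    {
        gl_FragColor = fetch1D(1234.0);   // example: read element 1234
    }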

50 Data Queries
– Relational database operations on the GPU: predicates, Boolean combinations, aggregations
– Join operations accelerated by sorting on the join key
– Uses special-purpose depth and stencil hardware extensively
– Attributes of a data record stored in (multiple) color channels of (multiple) textures

51 Applications
The STAR discusses the following broad GPGPU application areas:
– Physically-based simulation
– Signal and image processing
– Global illumination
– Geometric computing
– Databases and data mining

52 Physically-Based Simulation
– Early work (pre-fully-programmable GPUs): cellular automata, texture and blending modes
– Now: finite difference and finite element methods; particle system simulations
– Most popular topic: fluids! Incompressible flow, clouds, boiling, etc.
– Also-ran: mass-spring dynamics for cloth

53 Signal & Image Processing
– Segmentation: identify features in 2D or 3D
  Common example: identify the surface of a 3D feature (tumor, blood vessel, etc.) in a medical scan
  Level-set deformation methods evolve an isosurface
  Need sparse solution methods for efficient solution
  GPGPU bonus: easy to integrate in a volume renderer
– Computer vision: image projection, compositing, rectification; fast stereo depth extraction

54 Signal & Image Processing cont.
– Image processing: image registration, motion reconstruction, computed tomography (CT), tone mapping; Core Image, Core Video
– Signal processing: FFT, an interesting case! Memory-bandwidth limited; lack of a writable cache harms performance

55 Global Illumination
– Ray tracing: seminal GPGPU papers; close to the heart of graphics; earliest complex data structures and control flow; analysis to inform future hardware design; comparison to current efficient CPU implementations
– Key insights: ray-triangle intersection maps well to pixel hardware; the rest of the ray tracing pipeline can also be expressed as a stream computation

56 Global Illumination cont.
– Photon mapping: even more involved data structures; introduced stencil routing (scatter) and k-nearest-neighbor search
– Radiosity: iterative linear algebra solvers
– Subsurface scattering: lots of work on using GPUs to accelerate/approximate scattering algorithms

57 Geometric Computing
– Lots of image-space geometric computations: CSG operations, distance fields, collision detection, sorting for transparency, shadow generation
– Heavy use of stencil and depth hardware

58 Databases & Data Mining
– GPU strengths are useful: memory bandwidth, parallel processing
– Accelerating SQL queries: 10x improvement
– Also well suited to stream mining: continuous queries on streaming data instead of one-time queries on a static database

59 Close-Up: Linear Algebra

60 Linear Algebra Data Structures
– Per-pixel vs. per-vertex operations: 6 gigapixels/second vs. 0.7 gigavertices/second; efficient texture memory cache; texture read-write access
– Vector representation: textures are the best we can do; 2D textures are even better; 2D RGBA textures really rock
(diagram: a length-N vector packed into a 2D texture)
For details: http://wwwcg.in.tum.de

61 Representation (cont.)
– Dense matrix representation: treat a dense matrix as a set of N column vectors; again, store these vectors as 2D textures
(diagram: columns 1…i…N stored as N 2D textures)
For details: http://wwwcg.in.tum.de

62 Representation (cont.)
– Banded sparse matrix representation: treat a banded matrix as a set of diagonal vectors; combine opposing vectors to save space
(diagram: an N × N banded matrix stored as 2 vectors, packed into 2 2D textures of lengths N and N−i)
For details: http://wwwcg.in.tum.de

63 Operations 1
– Vector-vector operations reduced to 2D texture operations, coded in pixel shaders
– Example: Vector1 + Vector2 → Vector3
(diagram: Vector 1 and Vector 2 bound to TexUnit 0 and TexUnit 1; the vertex shader passes a static quad through; the pixel shader computes “return tex0 + tex1”; the result is rendered to the Vector 3 texture)
For details: http://wwwcg.in.tum.de
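A GLSL-style version of this pixel shader might look as follows (a sketch of the same idea; the slide’s own code is the Cg-like “return tex0 + tex1”):

    // Hypothetical vector-add kernel: Vector3 = Vector1 + Vector2.
    // Vector1/Vector2 are packed into textures on units 0 and 1; drawing the
    // static quad into a render texture produces Vector3.
    uniform sampler2D vector1;
    uniform sampler2D vector2;

    void main()
    {
        vec2 c = gl_TexCoord[0].xy;
        gl_FragColor = texture2D(vector1, c) + texture2D(vector2, c);
    }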

64 Operations 2 (reduce)
– Reduce operation for scalar products
– Reduce an m × n region in the fragment shader
(diagram: original texture → 1st pass → 2nd pass → …)
For details: http://wwwcg.in.tum.de
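A hedged sketch of one 2 × 2 reduction pass (GLSL-style; it assumes the quad’s texture coordinates place each output fragment at the center of its 2 × 2 source block, which is an illustrative convention rather than the paper’s exact setup):

    // Hypothetical reduction pass: each output fragment sums a 2x2 block of
    // the previous texture; repeating log2(N) passes leaves a single texel
    // holding the scalar result (e.g. a dot product's partial sums).
    uniform sampler2D src;      // values from the previous pass
    uniform vec2 srcTexel;      // 1.0 / source resolution

    void main()
    {
        vec2 c = gl_TexCoord[0].xy;   // assumed: center of the 2x2 source block
        gl_FragColor = texture2D(src, c + vec2(-0.5, -0.5) * srcTexel)
                     + texture2D(src, c + vec2( 0.5, -0.5) * srcTexel)
                     + texture2D(src, c + vec2(-0.5,  0.5) * srcTexel)
                     + texture2D(src, c + vec2( 0.5,  0.5) * srcTexel);
    }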

65 Operations
– In-depth example: vector / banded-matrix multiplication, A × b (diagram)
For details: http://wwwcg.in.tum.de

66 Example (cont.)
– Vector / banded-matrix multiplication: A × b (diagram of A and b)
For details: http://wwwcg.in.tum.de

67 Example (cont.)
– Compute the result in 2 passes (diagram: A × b with intermediate vector b′, Pass 1 and Pass 2)
For details: http://wwwcg.in.tum.de
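As a rough sketch of what one such pass could look like (GLSL-style; two illustrative diagonal textures and a texture-coordinate shift stand in for the slide’s layout, this is not the original code):

    // Hypothetical pass of a banded matrix-vector product y = A*b, with the
    // banded matrix stored as one packed texture per diagonal (slide 62).
    uniform sampler2D diag0;    // main diagonal, packed like a vector
    uniform sampler2D diag1;    // one off-diagonal band
    uniform sampler2D b;        // input vector
    uniform vec2 shift1;        // texcoord offset equivalent to "i + 1"

    void main()
    {
        vec2 c = gl_TexCoord[0].xy;
        // Accumulate this pass's bands; a second pass adds the rest.
        gl_FragColor = texture2D(diag0, c) * texture2D(b, c)
                     + texture2D(diag1, c) * texture2D(b, c + shift1);
    }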

68 Conclusions

69 Moving Forward…
– What works well now?
– What doesn’t work well now?
– What will improve in the future?
– What will continue to be difficult?

70 What Runs Well on GPUs?
GPUs win when…
– Limited data reuse
– High arithmetic intensity, defined as math operations per memory op: attacks the memory wall (are all mem ops necessary?)
– Common error: not comparing against an optimized CPU implementation

                Memory BW   Cache BW
  P4 3 GHz       6 GB/s     44 GB/s
  NV GF 6800    36 GB/s     --

71 Arithmetic Intensity
(chart: GFLOPS vs. GFloats/sec of memory bandwidth for R300, R360, R420, showing a 7x gap)
Historical growth rates (per year):
– Compute: 71%
– DRAM bandwidth: 25%
– DRAM latency: 5%
[courtesy Ian Buck]

72 Arithmetic Intensity
GPU wins when arithmetic intensity is high
– Example: Segment, 3.7 ops per word, 11 GFLOPS (chart: GeForce 7800 GTX vs. Pentium 4 3.0 GHz)
[courtesy Ian Buck]

73 Memory Bandwidth
GPU wins when streaming memory bandwidth dominates
– Examples (charts): SAXPY; FFT (GeForce 7800 GTX vs. Pentium 4 3.0 GHz)
[courtesy Ian Buck]
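For reference, SAXPY (y = a·x + y) is about as bandwidth-bound as a kernel gets: roughly one multiply-add per two stream reads. A hypothetical GLSL-style version (illustrative names):

    // Hypothetical SAXPY kernel: two texture reads feed a single MAD, so the
    // GPU's streaming memory bandwidth, not its arithmetic, sets performance.
    uniform sampler2D xVec;
    uniform sampler2D yVec;
    uniform float a;

    void main()
    {
        vec2 c = gl_TexCoord[0].xy;
        gl_FragColor = a * texture2D(xVec, c) + texture2D(yVec, c);
    }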

74 Memory Bandwidth
– Streaming memory system: optimized for sequential performance
– GPU cache is limited: optimized for texture filtering; read-only; small
– Local storage: CPU >> GPU (chart: GeForce 7800 GTX vs. Pentium 4)
[courtesy Ian Buck]

75 What Will (Hopefully) Improve?
– Orthogonality
– Instruction sets
– Features
– Tools
– Stability
– Interfaces, APIs, libraries, abstractions: necessary as graphics and GPGPU converge!

76 What Won’t Change?
– Rate of progress
– Precision (64-bit floating point?)
– Parallelism: won’t sacrifice performance
– Difficulty of programming parallel hardware… but APIs and libraries may help
– Concentration on entertainment apps

77 Top Ten

78 GPGPU Top Ten
1. The killer app
2. Programming models and tools
3. GPU in tomorrow’s computer?
4. Data conditionals
5. Relationship to other parallel hw/sw
6. Managing rapid change in hw/sw (roadmaps)
7. Performance evaluation and cliffs
8. Philosophy of faults and lack of precision
9. Broader toolbox for computation / data structures
10. Wedding graphics and GPGPU techniques

