Published by Kathlyn Alisha Boyd. Modified over 9 years ago.
1
David Luebke 1 1/25/2016 Programmable Graphics Hardware
2
Admin
● Handout: Cg in two pages
3
Acknowledgement & Aside
● The bulk of this lecture comes from slides from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology
● For this reason, and because the lab is outfitted with GeForce 3 cards, we will focus on NVIDIA tech
4
Outline
● Programmable graphics
■ NVIDIA’s next-generation technology: GeForceFX (code name NV30)
● Programming programmable graphics
■ NVIDIA’s Cg language
5
GPU Programming Model
[Diagram: Application (CPU) → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer (GPU), with Textures]
6
32-bit IEEE floating-point throughout pipeline
● Framebuffer
● Textures
● Fragment processor
● Vertex processor
● Interpolants
7
Hardware supports several other data types
● Fragment processor also supports:
■ 16-bit “half” floating point
■ 12-bit fixed point
■ These may be faster than 32-bit on some HW
● Framebuffer/textures also support:
■ Large variety of fixed-point formats
■ E.g., classical 8-bit per component
■ These formats use less memory bandwidth than FP32
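To make the precision trade-off between these formats concrete, here is a small CPU-side sketch in Python. It is not GPU code. The `to_fp32` and `to_half` helpers round a value through IEEE 754 single and half precision using the standard `struct` module. The `to_fixed12` helper is hypothetical: it assumes the 12-bit fixed-point format is signed with 10 fractional bits and a range of [-2, 2), which the slide does not state.

```python
import struct


def to_fp32(x: float) -> float:
    """Round-trip x through IEEE 754 single precision (FP32)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_half(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (FP16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fixed12(x: float, frac_bits: int = 10) -> float:
    """Quantize x to a signed 12-bit fixed-point value.

    ASSUMPTION: 10 fractional bits and a range of [-2, 2); the slide
    only says "12-bit fixed point".
    """
    scale = 1 << frac_bits
    q = round(x * scale)
    q = max(-(1 << 11), min((1 << 11) - 1, q))  # clamp to 12 signed bits
    return q / scale


if __name__ == "__main__":
    for value in (0.1, 0.5, 3.0):
        print(value, to_fp32(value), to_half(value), to_fixed12(value))
```

Under these assumptions, values such as 0.5 survive every format exactly, 0.1 loses progressively more precision, and 3.0 is clamped by the fixed-point format.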
8
Vertex processor capabilities
● 4-vector FP32 operations, as in GeForce3/4
● True data-dependent control flow
■ Conditional branch instruction
■ Subroutine calls, up to 4 deep
■ Jump table (for switch statements)
● Condition codes
● New arithmetic instructions (e.g. COS)
● User clip-plane support
9
Vertex processor has high resource limits
● 256 instructions per program (effectively much higher w/branching)
● 16 temporary 4-vector registers
● 256 “uniform” parameter registers
● 2 address registers (4-vector)
● 6 clip-distance outputs
10
Fragment processor has clean instruction set
● General and orthogonal instructions
● Much better than previous generation
● Same syntax as vertex processor: MUL R0, R1.xyz, R2.yxw;
● Full set of arithmetic instructions: RCP, RSQ, COS, EXP, …
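The swizzle suffixes in `MUL R0, R1.xyz, R2.yxw;` reorder register components before the component-wise multiply. The Python sketch below emulates that idea on the CPU. The `swizzle` and `mul` helpers and the register contents are illustrative assumptions, and the sketch does not model the hardware's exact rules for write masks or swizzle lengths.

```python
def swizzle(reg, pattern):
    """Select and reorder components of a 4-vector, e.g. pattern "yxw"."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(reg[index[c]] for c in pattern)


def mul(a, b):
    """Component-wise multiply of two equally sized vectors."""
    return tuple(p * q for p, q in zip(a, b))


# Hypothetical register contents, chosen only for illustration.
R1 = (1.0, 2.0, 3.0, 4.0)
R2 = (10.0, 20.0, 30.0, 40.0)

# Emulates: MUL R0, R1.xyz, R2.yxw;
R0 = mul(swizzle(R1, "xyz"), swizzle(R2, "yxw"))

if __name__ == "__main__":
    print(R0)
```

Here `R1.xyz` selects (1, 2, 3) and `R2.yxw` selects (20, 10, 40), so each output lane is the product of the matching pair.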
11
Fragment processor has flexible texture mapping
● Texture reads are just another instruction (TEX, TXP, or TXD)
● Allows computed texture coordinates, nested to arbitrary depth
● Allows multiple uses of a single texture unit
● Optional LOD control – specify filter extent
● Think of it as a memory-read instruction with optional user-controlled filtering
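A computed (dependent) texture read uses the result of one lookup as the coordinates of the next. The Python sketch below emulates this on the CPU with nearest-texel sampling. The `tex2d` helper, the two tiny textures, and the `fragment` function are all hypothetical stand-ins, and no filtering or LOD control is modelled.

```python
def tex2d(texture, s, t):
    """Nearest-texel lookup with coordinates in [0, 1], clamped at the edges."""
    height = len(texture)
    width = len(texture[0])
    x = min(width - 1, max(0, int(s * width)))
    y = min(height - 1, max(0, int(t * height)))
    return texture[y][x]


# Hypothetical 2x2 textures: the first stores coordinates, the second colors.
offset_map = [[(0.75, 0.25), (0.25, 0.75)],
              [(0.25, 0.25), (0.75, 0.75)]]
color_map = [["red", "green"],
             ["blue", "white"]]


def fragment(s, t):
    """First read yields new coordinates; second read depends on them."""
    s2, t2 = tex2d(offset_map, s, t)
    return tex2d(color_map, s2, t2)


if __name__ == "__main__":
    print(fragment(0.1, 0.1), fragment(0.9, 0.9), fragment(0.1, 0.9))
```

Because the texture read is treated like an ordinary memory-read instruction, the chain could be extended with further lookups in the same way.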
12
Additional fragment processor capabilities
● Read access to window-space position
● Read/write access to fragment Z
● Built-in derivative instructions
■ Partial derivatives w.r.t. screen-space x or y
■ Useful for anti-aliasing
● Conditional fragment-kill instruction
● FP32, FP16, and fixed-point data
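One way to picture the derivative instructions is as finite differences between neighbouring pixels. The Python sketch below assumes the differences are taken inside 2x2 pixel quads, which is a common hardware strategy but is not stated on the slide. The `shade` function is a hypothetical per-pixel quantity, and pixel coordinates are assumed to be integers.

```python
def shade(x, y):
    """Hypothetical per-pixel quantity whose rate of change we want."""
    return 0.5 * x * x + 3.0 * y


def ddx(f, x, y):
    """Finite-difference estimate of df/dx within the pixel's 2x2 quad."""
    x0 = x - (x % 2)  # left column of the quad containing x
    return f(x0 + 1, y) - f(x0, y)


def ddy(f, x, y):
    """Finite-difference estimate of df/dy within the pixel's 2x2 quad."""
    y0 = y - (y % 2)  # top row of the quad containing y
    return f(x, y0 + 1) - f(x, y0)


if __name__ == "__main__":
    print(ddx(shade, 4, 0), ddx(shade, 5, 0), ddy(shade, 4, 7))
```

A large derivative means the quantity changes quickly from pixel to pixel, which is the cue a shader can use to filter more aggressively for anti-aliasing.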
13
Fragment processor limitations
● No branching
■ But, can do a lot with condition codes
● No indexed reads from registers
■ Use texture reads instead
● No memory writes
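Without branching, a conditional is typically expressed by computing both outcomes and then selecting per component based on a condition. The Python sketch below emulates that pattern on the CPU. The `select` helper and the toy lighting rule are assumptions for illustration and do not reproduce the hardware's actual condition-code instructions.

```python
def select(cond, if_true, if_false):
    """Per-component select: pick if_true[i] where cond[i] holds."""
    return tuple(t if c else f for c, t, f in zip(cond, if_true, if_false))


def shade(n_dot_l):
    """Branch-free 'if n_dot_l > 0 then lit else unlit', per component."""
    lit = tuple(0.5 * n for n in n_dot_l)   # computed unconditionally
    unlit = (0.0,) * len(n_dot_l)           # computed unconditionally
    cond = tuple(n > 0.0 for n in n_dot_l)  # plays the role of condition codes
    return select(cond, lit, unlit)


if __name__ == "__main__":
    print(shade((0.5, -0.25, 1.0, 0.0)))
```

Both sides are always evaluated, so the cost is the sum of the two paths; that is the price of avoiding a branch.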
14
Fragment processor has high resource limits
● 1024 instructions
● 512 constants or uniform parameters
■ Each constant counts as one instruction
● 16 texture units
■ Reuse as many times as desired
● 8 FP32 x 4 perspective-correct inputs
● 128-bit framebuffer “color” output (use as 4 x FP32, 8 x FP16, etc…)
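The 128-bit output can be read as a fixed-size bit budget that is split differently depending on the component format. The Python sketch below checks that arithmetic with the standard `struct` module: four FP32 values and eight FP16 values both occupy 16 bytes. The sample values are arbitrary and the byte order is an assumption made only so the example is deterministic.

```python
import struct

# Four 32-bit floats: 4 x 32 = 128 bits.
as_fp32 = struct.pack("<4f", 1.0, 0.5, 0.25, 1.0)

# Eight 16-bit half floats: 8 x 16 = 128 bits.
half_values = tuple(i / 8 for i in range(8))
as_fp16 = struct.pack("<8e", *half_values)

# The same 16 bytes of storage can be unpacked under either interpretation.
unpacked_halves = struct.unpack("<8e", as_fp16)

if __name__ == "__main__":
    print(len(as_fp32) * 8, len(as_fp16) * 8)
    print(unpacked_halves)
```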
15
NV30 CineFX Technology Summary
[Diagram: Application → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer, with Textures]
● FP32 throughout pipeline
● Clean instruction sets
● True branching in vertex processor
● Dependent texture in fragment processor
● High resource limits
16
Programming in assembly is painful
Assembly
    …
    FRC R2.y, C11.w;
    ADD R3.x, C11.w, -R2.y;
    MOV H4.y, R2.y;
    ADD H4.x, -H4.y, C4.w;
    MUL R3.xy, R3.xyww, C11.xyww;
    ADD R3.xy, R3.xyww, C11.z;
    TEX H5, R3, TEX2, 2D;
    ADD R3.x, R3.x, C11.x;
    TEX H6, R3, TEX2, 2D;
    …
    …
    L2weight = timeval - floor(timeval);
    L1weight = 1.0 - L2weight;
    ocoord1 = floor(timeval)/64.0 + 1.0/128.0;
    ocoord2 = ocoord1 + 1.0/64.0;
    L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));
    L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));
    …
● Easier to read and modify
● Cross-platform
● Combine pieces
● etc.
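The arithmetic in the high-level version can be traced by hand. The Python sketch below is a direct CPU-side port of its first four statements, leaving out the two texture reads. The function name `lookup_coords` is hypothetical. The constants 1/64 and 1/128 are read here as a texel width and half a texel width for a 64-texel-wide texture, which is an interpretation and not something the slide states.

```python
import math


def lookup_coords(timeval):
    """Port of the slide's weight and coordinate arithmetic (no texture reads)."""
    l2_weight = timeval - math.floor(timeval)  # fractional part of the time
    l1_weight = 1.0 - l2_weight                # complementary blend weight
    ocoord1 = math.floor(timeval) / 64.0 + 1.0 / 128.0
    ocoord2 = ocoord1 + 1.0 / 64.0             # the next texel along
    return l1_weight, l2_weight, ocoord1, ocoord2


if __name__ == "__main__":
    print(lookup_coords(3.25))
```

For `timeval = 3.25` the two weights are 0.75 and 0.25, and the two coordinates are one texel apart, which is consistent with blending between two adjacent table entries.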
17
Quick Demo
18
Cg – C for Graphics
● Cg is a GPU programming language
● Designed by NVIDIA and Microsoft
● Compilers available in beta versions from both companies
19
Design goals for Cg
● Enable algorithms to be expressed…
■ Clearly, and
■ Efficiently
● Provide interface continuity
■ Focus on DX9-generation HW and beyond
■ But provide support for DX8-class HW too
■ Support both OpenGL and Direct3D
● Allow easy, incremental adoption
20
Easy adoption for applications
● Avoid owning the application’s data
■ No scene graph
■ No buffering of vertex data
● Compiler sits on top of existing APIs
■ User can examine assembly-code output
■ Can compile either at run time or at application-development time
● Allow partial adoption
■ E.g., use a Cg vertex program with an assembly fragment program
● Support current hardware
21
Some points in the design space
● CPU languages
■ C – close to the hardware; general purpose
■ C++, Java, Lisp – require memory management
■ RenderMan – specialized for shading
● Real-time shading languages
■ Stanford shading language
■ Creative Labs shading language
22
Design strategy
● Start with C (and a bit of C++)
■ Minimizes number of decisions
■ Gives you known mistakes instead of unknown ones
● Allow subsetting of the language
● Add features desired for GPUs
■ To support GPU programming model
■ To enable high performance
● Tweak to make it fit together well
23
How are current GPUs different from CPUs?
1. GPU is a stream processor
■ Multiple programmable processing units
■ Connected by data flows
[Diagram: Application → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer, with Textures]
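The stream-processor view can be sketched as a chain of stages, each consuming the previous stage's output stream. The Python sketch below uses generators for the data flows. Every stage body is a toy assumption: vertices are treated as point primitives, the "rasterizer" emits one fragment per vertex, and the fragment color is a checkerboard pattern. Only the stage names come from the slide.

```python
def vertex_processor(vertices, offset):
    """Programmable stage: transform each vertex independently."""
    for x, y in vertices:
        yield (x + offset[0], y + offset[1])


def assemble_and_rasterize(positions):
    """Fixed-function stage: here, one point primitive yields one fragment."""
    for x, y in positions:
        yield (int(x), int(y))


def fragment_processor(fragments):
    """Programmable stage: compute a color for each fragment independently."""
    for px, py in fragments:
        yield (px, py, (px + py) % 2)  # toy checkerboard color


def framebuffer_operations(fragments, framebuffer):
    """Final stage: write each fragment's color into the framebuffer."""
    for px, py, color in fragments:
        framebuffer[(px, py)] = color
    return framebuffer


def render(vertices, offset=(1.0, 2.0)):
    """Connect the stages; data flows through them as streams."""
    stream = vertex_processor(vertices, offset)
    stream = assemble_and_rasterize(stream)
    stream = fragment_processor(stream)
    return framebuffer_operations(stream, {})


if __name__ == "__main__":
    print(render([(0.0, 0.0), (1.5, 0.5)]))
```

Each stage sees only one stream element at a time and keeps no state between elements, which is the property that lets hardware run many elements in parallel.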