A User-Programmable Vertex Engine Erik Lindholm Mark Kilgard Henry Moreton NVIDIA Corporation Presented by Han-Wei Shen.


1 A User-Programmable Vertex Engine Erik Lindholm Mark Kilgard Henry Moreton NVIDIA Corporation Presented by Han-Wei Shen

2 Where does the Vertex Engine fit?
Traditional Graphics Pipeline: Transform & Lighting → setup → rasterizer → texture blending → anti-aliasing → frame-buffer

3 GeForce 3 Vertex Engine
Transform & Lighting or Vertex Program → setup → rasterizer → texture blending → anti-aliasing → frame-buffer

4 API Support
- Designed to fit into the OpenGL and D3D APIs
- Program mode vs. fixed-function mode
- Load and bind program
- Simple to add to old D3D and OpenGL programs

5 Programming Model
- Enable vertex program: glEnable(GL_VERTEX_PROGRAM_NV);
- Create vertex program object
- Bind vertex program object
- Execute vertex program object

6 Create Vertex Program
Programs (assembly) are defined inline as character strings:
    static const GLubyte vpgm[] =
        "!!VP1.0\n"
        "DP4 o[HPOS].x, c[0], v[0];\n"
        "DP4 o[HPOS].y, c[1], v[0];\n"
        "DP4 o[HPOS].z, c[2], v[0];\n"
        "DP4 o[HPOS].w, c[3], v[0];\n"
        "MOV o[COL0], v[3];\n"
        "END";

7 Create Vertex Program (2)
Load and bind vertex programs much like texture objects:
    glLoadProgramNV(GL_VERTEX_PROGRAM_NV, 7, strlen((const char *)programString), programString);
    …
    glBindProgramNV(GL_VERTEX_PROGRAM_NV, 7);

8 Invoke Vertex Program
The vertex program is initiated when a vertex is given, i.e., on each glVertex call:
    glBegin(…);
    glVertex3f(x, y, z);
    …
    glEnd();

9 Let's look at the sample program
    static const GLubyte vpgm[] =
        "!!VP1.0\n"
        "DP4 o[HPOS].x, c[0], v[0];\n"
        "DP4 o[HPOS].y, c[1], v[0];\n"
        "DP4 o[HPOS].z, c[2], v[0];\n"
        "DP4 o[HPOS].w, c[3], v[0];\n"
        "MOV o[COL0], v[3];\n"
        "END";
- o[HPOS] = M(c[0], c[1], c[2], c[3]) * v[0]. HPOS? The homogeneous clip-space position output.
- o[COL0] = v[3]. COL0? The primary (diffuse) color output.
The program calculates the clip-space vertex position and assigns v[3] as the vertex's diffuse color.
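
To make the DP4 sequence concrete, here is a minimal CPU sketch of what the sample program computes for one vertex. It assumes c[0] … c[3] hold the rows of the 4x4 transform M. The names `vec4`, `dp4`, and `run_sample_program` are hypothetical and exist only for this illustration. This is not how the hardware or driver is implemented.

```c
/* Hypothetical quad-float register, as on the slides. */
typedef struct { float x, y, z, w; } vec4;

/* DP4: four-component dot product. */
static float dp4(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* CPU emulation of the sample program. c[0..3] are assumed to be the rows
   of the transform M; v[0] is the position and v[3] the color. */
static void run_sample_program(const vec4 c[4], const vec4 v[16],
                               vec4 *o_hpos, vec4 *o_col0) {
    o_hpos->x = dp4(c[0], v[0]);  /* DP4 o[HPOS].x, c[0], v[0]; */
    o_hpos->y = dp4(c[1], v[0]);  /* DP4 o[HPOS].y, c[1], v[0]; */
    o_hpos->z = dp4(c[2], v[0]);  /* DP4 o[HPOS].z, c[2], v[0]; */
    o_hpos->w = dp4(c[3], v[0]);  /* DP4 o[HPOS].w, c[3], v[0]; */
    *o_col0 = v[3];               /* MOV o[COL0], v[3];         */
}
```

Each DP4 produces one component of M * v[0], which is why four DP4s with the four constant rows amount to one matrix-vector product.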

10 Programming Model (all registers are quad floats)
- Vertex Source: v[0] … v[15] (16x4 registers)
- Program Constants: c[0] … c[95] (96x4 registers)
- Temporary Registers: R0 … R11 (12x4 registers)
- Vertex Program: 128 instructions
- Vertex Output: o[HPOS], o[COL0], o[COL1], o[FOGP], o[PSIZ], o[TEX0] … o[TEX7] (15x4 registers)
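
The register budget above can be written down as a C data structure. This is only a sketch of the sizes listed on the slide. The type and field names (`quad`, `vertex_engine_regs`, and the `NUM_*` constants) are hypothetical. The read-only and write-only comments reflect how the program sees these registers, not how the hardware stores them.

```c
/* A four-component float register ("quad float"). */
typedef float quad[4];

/* Hypothetical constants mirroring the slide's register counts. */
enum {
    NUM_INPUT_REGS    = 16,  /* v[0] ... v[15]: per-vertex inputs, read-only in the program */
    NUM_CONST_REGS    = 96,  /* c[0] ... c[95]: program constants, read-only in the program */
    NUM_TEMP_REGS     = 12,  /* R0 ... R11: temporaries, read/write                         */
    NUM_OUTPUT_REGS   = 15,  /* o[HPOS], o[COL0], ...: outputs, write-only                  */
    MAX_PROGRAM_INSNS = 128  /* program length limit (instruction storage not modeled)      */
};

/* Hypothetical model of the state one program invocation can touch. */
typedef struct {
    quad v[NUM_INPUT_REGS];
    quad c[NUM_CONST_REGS];
    quad r[NUM_TEMP_REGS];
    quad o[NUM_OUTPUT_REGS];
} vertex_engine_regs;
```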

11 Input Vertex Attributes v[0] – v[15]
- Aliased (tracked) with conventional per-vertex attributes (Table 3)
- Use glVertexAttribNV() to explicitly assign values
- Can also specify a scalar value for the vertex attribute array: glVertexAttributesNV()
- Can change values inside or outside a glBegin()/glEnd() pair
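
The aliasing mentioned above maps each conventional attribute onto a fixed attribute register. The sketch below records that mapping as a C enum. It uses the register names of the NV_vertex_program specification (OPOS, WGHT, NRML, COL0, COL1, FOGC, TEX0 … TEX7). The `VP_*` identifiers and `vp_texcoord_attrib` are made up for this illustration and are not part of any GL header.

```c
/* Register indices of v[0] ... v[15] and the conventional per-vertex
   attributes they alias, following NV_vertex_program register names.
   The VP_* identifiers are hypothetical. */
enum vp_attrib {
    VP_OPOS = 0,  /* object-space position (glVertex) */
    VP_WGHT = 1,  /* vertex weight                    */
    VP_NRML = 2,  /* normal (glNormal)                */
    VP_COL0 = 3,  /* primary color (glColor)          */
    VP_COL1 = 4,  /* secondary color                  */
    VP_FOGC = 5,  /* fog coordinate                   */
    /* v[6] and v[7] have no conventional alias */
    VP_TEX0 = 8   /* texture coordinate sets 0..7 -> v[8] ... v[15] */
};

/* Register index aliased by texture coordinate set `unit` (0..7),
   or -1 if the unit is out of range. */
static int vp_texcoord_attrib(int unit) {
    return (unit >= 0 && unit < 8) ? VP_TEX0 + unit : -1;
}
```

This is why the sample program can read the color from v[3]. A glColor call updates the same register.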

12 Program Constants
- Can only change values outside a glBegin()/glEnd() pair
- No automatic aliasing
- Can be used to track OpenGL matrices (modelview, projection, texture, etc.)
- Example: glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 0, GL_MODELVIEW_PROJECTION_NV, GL_IDENTITY_NV);
  tracks the concatenated modelview-projection matrix in the 4 contiguous program constants starting with c[0]
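
One way to picture the glTrackMatrixNV example is to do by hand what tracking keeps up to date. Form P * MV and store row i of the product in c[base + i]. The sketch assumes that row layout, which is what makes a DP4 against c[0] … c[3] produce the clip-space position. Matrices are row-major (m[row][col]). The function names are hypothetical, and this is a model rather than driver code.

```c
/* Hypothetical CPU model of matrix tracking: row i of (P * MV) is
   stored in c[base + i]. Matrices are row-major: m[row][col]. */
static void track_modelview_projection(float c[][4], int base,
                                       float P[4][4], float MV[4][4]) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float s = 0.0f;
            for (int k = 0; k < 4; k++)
                s += P[i][k] * MV[k][j];
            c[base + i][j] = s;
        }
    }
}

/* DP4 of a constant row with a vertex position. */
static float dp4(const float a[4], const float b[4]) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

With a translation of (1, 2, 3) as the modelview and a uniform scale of 2 as the projection, the vertex (1, 1, 1, 1) ends up at (4, 6, 8, 1). Four DP4s against the tracked rows give the same result.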

13 Program Constants (cont'd)
    DP4 o[HPOS].x, c[0], v[OPOS];
    DP4 o[HPOS].y, c[1], v[OPOS];
    DP4 o[HPOS].z, c[2], v[OPOS];
    DP4 o[HPOS].w, c[3], v[OPOS];
What does it do?

14 Program Constants (cont'd)
    glTrackMatrixNV(GL_VERTEX_PROGRAM_NV, 4, GL_MODELVIEW, GL_INVERSE_TRANSPOSE_NV);

    DP3 R0.x, c[4], v[NRML];
    DP3 R0.y, c[5], v[NRML];
    DP3 R0.z, c[6], v[NRML];
What does it do?
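
A small numeric sketch helps answer the question. It assumes c[4] … c[6] hold the rows of the inverse transpose of the modelview, so the three DP3s transform the normal into eye space. The example uses a non-uniform scale, where the distinction matters. A normal transformed by the inverse transpose stays perpendicular to a transformed tangent, while one transformed by the modelview itself would not. The helper names are hypothetical.

```c
/* DP3 ignores the w components. */
static float dp3(const float a[4], const float b[4]) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Emulates:
     DP3 R0.x, c[4], v[NRML];
     DP3 R0.y, c[5], v[NRML];
     DP3 R0.z, c[6], v[NRML];
   c[4..6] are assumed to hold the rows of the inverse transpose of the
   modelview. R0.w is left alone, matching the .x/.y/.z write masks. */
static void transform_normal(float c[][4], const float nrml[4], float r0[4]) {
    r0[0] = dp3(c[4], nrml);
    r0[1] = dp3(c[5], nrml);
    r0[2] = dp3(c[6], nrml);
}
```

With a modelview of scale(2, 1, 1), the normal (1, 1, 0) becomes (0.5, 1, 0). The tangent (1, -1, 0) becomes (2, -1, 0), and the dot product of the two is 0. Using the modelview on the normal would give (2, 1, 0) instead, which is no longer perpendicular to that tangent.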

15 Hardware Block Diagram
Vertex In → Vertex Attribute Buffer (VAB) → Vector FP Core → Vertex Out

16 Vertex Attribute Buffer (VAB)
[Diagram: VAB entries 0, 1, …, 14, 15, each 128 bits (32 x 4); 128 dirty bits; IB]

17 HW Block Diagram

18 Data Path
Three source operands, each passing through Negate and Swizzle (XYZW), feed the FPU Core; the result is written through a Write Mask (XYZW).
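
The data path can be modeled in a few lines. Each source operand is swizzled (any source component can feed any lane) and optionally negated. The result then updates only the destination components enabled in the write mask. The type `reg4`, the `WRITE_*` bits, and both functions are hypothetical names for this sketch.

```c
/* Hypothetical four-component register. */
typedef struct { float v[4]; } reg4;

/* Write-mask bits: one per destination component. */
enum { WRITE_X = 1, WRITE_Y = 2, WRITE_Z = 4, WRITE_W = 8 };

/* Source path: swz[i] selects which source component (0=x ... 3=w)
   feeds lane i; negate flips the sign of every lane. */
static reg4 swizzle_negate(reg4 src, const int swz[4], int negate) {
    reg4 out;
    for (int i = 0; i < 4; i++) {
        float s = src.v[swz[i]];
        out.v[i] = negate ? -s : s;
    }
    return out;
}

/* Destination path: only components enabled in mask are written. */
static void write_masked(reg4 *dst, reg4 result, unsigned mask) {
    for (int i = 0; i < 4; i++)
        if (mask & (1u << i))
            dst->v[i] = result.v[i];
}
```

For example, `-R.yzxw` applied to (1, 2, 3, 4) yields (-2, -3, -1, -4). Writing that through an `.xz` mask changes only x and z of the destination.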

19 Instruction Set: The ops
17 instructions total:
- MOV, MUL, ADD, MAD, DST
- DP3, DP4
- MIN, MAX, SLT, SGE
- RCP, RSQ, LOG, EXP, LIT
- ARL
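
The instruction set has no branches. SLT and SGE write 1.0 or 0.0 per component, and that mask can then drive arithmetic. The sketch emulates SLT/SGE on the CPU and uses SGE, ADD, and MAD to build a branch-free per-component select. The result here matches MAX, but the same pattern can blend any two values. All names are hypothetical, and the trick assumes finite inputs, since 0 * infinity is NaN.

```c
/* Hypothetical quad-float register. */
typedef struct { float x, y, z, w; } vec4;

static float lt1(float a, float b) { return a <  b ? 1.0f : 0.0f; }
static float ge1(float a, float b) { return a >= b ? 1.0f : 0.0f; }

/* SLT/SGE: per component, 1.0 if the comparison holds, else 0.0. */
static vec4 slt(vec4 a, vec4 b) {
    return (vec4){lt1(a.x, b.x), lt1(a.y, b.y), lt1(a.z, b.z), lt1(a.w, b.w)};
}
static vec4 sge(vec4 a, vec4 b) {
    return (vec4){ge1(a.x, b.x), ge1(a.y, b.y), ge1(a.z, b.z), ge1(a.w, b.w)};
}

static vec4 neg(vec4 a) { return (vec4){-a.x, -a.y, -a.z, -a.w}; }
static vec4 add(vec4 a, vec4 b) {
    return (vec4){a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}
static vec4 mad(vec4 a, vec4 b, vec4 c) {
    return (vec4){a.x * b.x + c.x, a.y * b.y + c.y, a.z * b.z + c.z, a.w * b.w + c.w};
}

/* Branch-free select (finite inputs assumed):
     SGE R2, R0, R1;      R2 = (R0 >= R1) ? 1 : 0, per component
     ADD R3, R0, -R1;     R3 = R0 - R1
     MAD R4, R2, R3, R1;  R4 = R2 * (R0 - R1) + R1
   R4 takes R0 where R0 >= R1 and R1 elsewhere. */
static vec4 select_ge(vec4 r0, vec4 r1) {
    vec4 r2 = sge(r0, r1);
    vec4 r3 = add(r0, neg(r1));
    return mad(r2, r3, r1);
}
```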

20 Instruction Set: The Core Features
- Immediate access to sources
- Swizzle/negate on all sources
- Write mask on all destinations
- DP3 and DP4 are the most common graphics ops
- Cross product is MUL + MAD with swizzling
- LIT instruction implements Phong lighting

21 Dot Product Instruction
    DP3 R0.x, R1, R2
R0.x = R1.x * R2.x + R1.y * R2.y + R1.z * R2.z
    DP4 R0.x, R1, R2
4-component dot product: R0.x = R1.x * R2.x + R1.y * R2.y + R1.z * R2.z + R1.w * R2.w

22 MUL instruction
    MUL R1, R0, R2    (component-wise multiplication)
R1.x = R0.x * R2.x
R1.y = R0.y * R2.y
R1.z = R0.z * R2.z
R1.w = R0.w * R2.w

23 MAD instruction
    MAD R1, R2, R3, R4
R1 = R2 * R3 + R4, where * is component-wise multiplication
Example: MAD R1, R0.yzxw, R2.zxyw, -R1
What does it do?

24 Cross Product Coding Example
    # Cross product: R2 = R0 x R1
    MUL R2, R0.zxyw, R1.yzxw;
    MAD R2, R0.yzxw, R1.zxyw, -R2;
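
To check that the two-instruction sequence really produces a cross product, the sketch below replays it on the CPU with explicit swizzle helpers. The MUL computes the subtracted terms, and the MAD adds the positive terms while negating the MUL result. The w component comes out as R0.w * R1.w - R0.w * R1.w, which is 0 for finite inputs. Function names are hypothetical.

```c
/* Hypothetical quad-float register. */
typedef struct { float x, y, z, w; } vec4;

static vec4 zxyw(vec4 a) { return (vec4){a.z, a.x, a.y, a.w}; }
static vec4 yzxw(vec4 a) { return (vec4){a.y, a.z, a.x, a.w}; }
static vec4 neg(vec4 a)  { return (vec4){-a.x, -a.y, -a.z, -a.w}; }

static vec4 mul(vec4 a, vec4 b) {
    return (vec4){a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w};
}
static vec4 mad(vec4 a, vec4 b, vec4 c) {
    return (vec4){a.x * b.x + c.x, a.y * b.y + c.y, a.z * b.z + c.z, a.w * b.w + c.w};
}

/* R2 = R0 x R1 using the slide's two instructions. */
static vec4 cross(vec4 r0, vec4 r1) {
    vec4 r2 = mul(zxyw(r0), yzxw(r1));      /* MUL R2, R0.zxyw, R1.yzxw;       */
    r2 = mad(yzxw(r0), zxyw(r1), neg(r2));  /* MAD R2, R0.yzxw, R1.zxyw, -R2;  */
    return r2;
}
```

For example, (1, 2, 3) x (4, 5, 6) gives (-3, 6, -3), matching the textbook formula.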

25 Lighting instruction
    LIT R1, R0    (Phong light model)
Input: R0 = (diffuse, specular, ??, shininess)
Output: R1 = (1, diffuse, specular^shininess, 1); negative diffuse and specular inputs are clamped to 0, and the specular term is 0 unless diffuse > 0
Usually followed by
    DP3 o[COL0], c[21], R1;
(assuming the coefficients are in c[21]), where c[21] = (ka, kd, ks, ??)

26 Ready to trace some programs?

27 Previous Work: Geometry Engine
- High bandwidth + lots of flops
- Low clock rate
- No architectural continuity
- VERY hard to program
- Some high-level language support (maybe)
- A compromise solution (vtx, prim, pix, …)

28 Alternative: The CPU
- Low bandwidth + reasonable flops
- High clock rate
- Excellent architectural continuity
- VERY hard to use efficiently
- Excellent high-level language support
- Flexible, but often too slow

29 New Design: The Vertex Engine
- Simple hardware for a commodity GPU
- Allows the user to manipulate the vertex transform
- Simple-to-use programming model
- Superset of fixed-function mode

30 Why Vertex Processing?
- Very parallel
- Uses a single-vertex programming model
- Hardware can batch or interleave
- KISS

31 Why Not Primitive Processing?
- Face culling and clipping break parallelism
- Complicates memory accesses
- Inefficient (control takes time)
- Let hardware designers optimize

32 Programming Model: Vertex I/O
- Streaming vertex architecture
- Source data converted to floats
- Source data loaded
- Run program
- Destination data drained
- Destination data re-formatted for hw
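
The streaming steps can be sketched as a per-vertex loop: convert and load the source, run the program, then drain and re-format the outputs. Everything in the sketch is an assumption for illustration. It uses an application vertex of three floats plus four color bytes (`src_vertex`) and an output layout of a float position plus four color bytes (`dst_vertex`). The "program" is the slide 6 sample, and the real hardware formats are not described on these slides.

```c
#include <stddef.h>

/* Hypothetical application and hardware vertex formats. */
typedef struct { float x, y, z; unsigned char rgba[4]; } src_vertex;
typedef struct { float hpos[4]; unsigned char rgba[4]; } dst_vertex;
typedef float reg[4];

/* Convert to floats and load: position into v[0] (w = 1), color into v[3]. */
static void load_source(const src_vertex *in, reg v[16]) {
    v[0][0] = in->x; v[0][1] = in->y; v[0][2] = in->z; v[0][3] = 1.0f;
    for (int i = 0; i < 4; i++)
        v[3][i] = in->rgba[i] / 255.0f;  /* bytes -> [0, 1] floats */
}

/* Run program: the sample DP4 x4 + MOV, with c[0..3] as matrix rows. */
static void run_program(float c[][4], reg v[16], reg o_hpos, reg o_col0) {
    for (int i = 0; i < 4; i++) {
        o_hpos[i] = c[i][0] * v[0][0] + c[i][1] * v[0][1]
                  + c[i][2] * v[0][2] + c[i][3] * v[0][3];  /* DP4 */
        o_col0[i] = v[3][i];                                /* MOV */
    }
}

/* Clamp to [0, 1] and round to the nearest byte. */
static unsigned char to_byte(float f) {
    if (f < 0.0f) f = 0.0f;
    if (f > 1.0f) f = 1.0f;
    return (unsigned char)(f * 255.0f + 0.5f);
}

/* Drain outputs and re-format them for the (hypothetical) next stage. */
static void drain(const reg o_hpos, const reg o_col0, dst_vertex *out) {
    for (int i = 0; i < 4; i++) {
        out->hpos[i] = o_hpos[i];
        out->rgba[i] = to_byte(o_col0[i]);
    }
}

/* One pass over a stream of n vertices; each vertex is independent. */
static void process_stream(float c[][4], const src_vertex *in,
                           dst_vertex *out, size_t n) {
    for (size_t k = 0; k < n; k++) {
        reg v[16] = {{0}};
        reg hpos, col0;
        load_source(&in[k], v);
        run_program(c, v, hpos, col0);
        drain(hpos, col0, &out[k]);
    }
}
```

Because no iteration depends on another, the loop body could be batched or interleaved across vertices. That independence is the property the earlier "Why Vertex Processing?" slide relies on.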

33 Hardware Implementation
- Vector SIMD unit + special function unit
- Multithreaded and pipelined to hide latency
- Any one instruction per cycle
- All instructions have equal latency
- Free swizzle/negate/write mask support

34 Conclusion
- Very simple, efficient implementation
- Allows vertex programming continuity
- Stanford Imagine Architecture
- A work in progress, lots more to come…
- We welcome your feedback
