1 Status – Week 281 Victor Moya

2 Objectives
- Research in future GPUs for 3D graphics.
- Simulate current and future 3D graphics hardware.
- Finish (someday) the PhD ;).

3 Problems
- Information.
- Choice of the simulation target:
  - Current GPUs.
  - Near-future GPUs.
  - Absolutely new GPU designs.
- The future is hard to predict.
- But GPUs change very fast.
- Fierce competition between ATI and NVidia. Matrox and 3DLabs follow (3DLabs can rule the workstation market). SIS and VIA as OEM.

4 Status
- Designing a hardware 3D graphics pipeline:
  - Command processors.
  - Vertex Shader.
  - Divide by w, Clip, Culling and Triangle Setup.
  - Rasterization.
  - Pixel shaders.
  - Antialiasing.
- Designing the simulator.

5 3D Graphics Pipeline

6 Geometry
Vertex operations:
- (1) Transform coordinates and normal:
  - Model => World.
  - World => Eye.
- (2) Normalize the length of the normal.
- (3) Compute vertex lighting.
- (4) Transform texture coordinates.
- (5) Transform coordinates to clip coordinates (projection).
- (8) Divide coordinates by w.
- (9) Apply affine viewport transform (x, y, z).
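The position-related steps above, (1), (5), (8) and (9), can be sketched in a few lines of plain Python. This is only an illustration: the helper names (`translate`, `perspective`, `transform_vertex`), the OpenGL-style projection matrix with a fixed 90-degree field of view, and the combined model-view matrix are assumptions made for the example, not something taken from the slides. Normals, lighting and texture coordinates are left out.

```python
def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translate(tx, ty, tz):
    """Hypothetical helper: a translation used here as the model-view matrix."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def perspective(near, far):
    """Assumed OpenGL-style projection, 90-degree field of view, aspect 1."""
    a = -(far + near) / (far - near)
    b = -2.0 * far * near / (far - near)
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, a, b],
            [0, 0, -1, 0]]


def transform_vertex(pos, model_view, projection, width, height):
    """Return (clip coordinates, window coordinates) for an [x, y, z] position."""
    eye = mat_vec(model_view, list(pos) + [1.0])   # (1) model => world => eye
    clip = mat_vec(projection, eye)                # (5) eye => clip
    w = clip[3]
    ndc = [c / w for c in clip[:3]]                # (8) divide by w
    window = ((ndc[0] * 0.5 + 0.5) * width,        # (9) affine viewport transform
              (ndc[1] * 0.5 + 0.5) * height,
              ndc[2] * 0.5 + 0.5)
    return clip, window


clip, window = transform_vertex([0.0, 0.0, 0.0], translate(0, 0, -2),
                                perspective(1.0, 3.0), 640, 480)
print(clip, window)
```

Clipping and primitive assembly, steps (6) and (7), sit between the projection and the divide by w, which is why the sketch returns the clip coordinates as well as the window coordinates.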

7 Geometry
Primitive operations:
- (6) Primitive assembly.
- (7) Clipping.
- (10) Backface cull: eliminate back-facing triangles.
- Primitive generation: new pipeline stage (ATI TruForm).
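One common way to implement the backface cull in step (10) is to look at the sign of the triangle's area in window coordinates. The sketch below assumes counter-clockwise front faces with the y axis pointing up; the function names are made up for the example, and real hardware may use a different convention.

```python
def signed_area(v0, v1, v2):
    """Signed area of a 2D triangle; positive when v0, v1, v2 are counter-clockwise."""
    return 0.5 * ((v1[0] - v0[0]) * (v2[1] - v0[1])
                  - (v2[0] - v0[0]) * (v1[1] - v0[1]))


def is_back_facing(v0, v1, v2, front_is_ccw=True):
    """True when the triangle should be culled (degenerate triangles included)."""
    area = signed_area(v0, v1, v2)
    return area <= 0 if front_is_ccw else area >= 0


print(is_back_facing((0, 0), (1, 0), (0, 1)))  # counter-clockwise triangle
print(is_back_facing((0, 0), (0, 1), (1, 0)))  # same triangle, clockwise order
```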

8 Vertex Shader
- VS 1.0, 1.1 and 1.2 (current technology) for Direct3D 8 and 8.1. OpenGL extensions: ARB_vertex_program (finally in OpenGL v1.4), NV_vertex_program1_1 (NVidia), EXT_vertex_shader (ATI).
- No branching.
- Single cycle execution latency (?).
- Single issue instruction each cycle.
- Simple in order pipeline (?).

9 Vertex Shader
- 16 input registers (read only).
- 15 output registers (write only).
- 12 temporary registers (read/write).
- 96 constant registers (read only or read/write?).
- 128 instructions max.

10 Vertex Shader
        Inputs              Output (vector or
Opcode  (scalar or vector)  replicated scalar)  Operation
------  ------------------  ------------------  --------------------------
ARL     s                   address register    address register load
MOV     v                   v                   move
MUL     v,v                 v                   multiply
ADD     v,v                 v                   add
MAD     v,v,v               v                   multiply and add
RCP     s                   ssss                reciprocal
RSQ     s                   ssss                reciprocal square root
DP3     v,v                 ssss                3-component dot product
DP4     v,v                 ssss                4-component dot product
DST     v,v                 v                   distance vector
MIN     v,v                 v                   minimum
MAX     v,v                 v                   maximum
SLT     v,v                 v                   set on less than
SGE     v,v                 v                   set on greater equal than
EXP     s                   v                   exponential base 2
LOG     s                   v                   logarithm base 2
LIT     v                   v                   light coefficients
DPH     v,v                 ssss                homogeneous dot product
RCC     s                   ssss                reciprocal clamped
SUB     v,v                 v                   subtract
ABS     v                   v                   absolute value
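To make the "vector" versus "replicated scalar" (ssss) output column concrete, here is a small Python emulation of a handful of the opcodes. Registers are modeled as 4-element lists; the `op_*` function names are invented for this sketch, and the DST and DPH semantics follow my reading of the NV_vertex_program definitions, so treat them as assumptions to check against the specification.

```python
def replicate(s):
    """A scalar result is written to all four components (the 'ssss' column)."""
    return [s, s, s, s]


def op_mad(a, b, c):
    """MAD: component-wise a * b + c."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def op_dp3(a, b):
    """DP3: 3-component dot product, replicated."""
    return replicate(a[0] * b[0] + a[1] * b[1] + a[2] * b[2])


def op_dp4(a, b):
    """DP4: 4-component dot product, replicated."""
    return replicate(sum(x * y for x, y in zip(a, b)))


def op_dph(a, b):
    """DPH: dot product that treats a.w as 1 (assumed semantics)."""
    return replicate(a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + b[3])


def op_slt(a, b):
    """SLT: 1.0 where a < b, else 0.0."""
    return [1.0 if x < y else 0.0 for x, y in zip(a, b)]


def op_sge(a, b):
    """SGE: 1.0 where a >= b, else 0.0."""
    return [1.0 if x >= y else 0.0 for x, y in zip(a, b)]


def op_dst(a, b):
    """DST: (1, a.y * b.y, a.z, b.w), used for distance attenuation (assumed semantics)."""
    return [1.0, a[1] * b[1], a[2], b[3]]


# With a = (-, d*d, d*d, -) and b = (-, 1/d, -, 1/d), DST yields (1, d, d*d, 1/d).
print(op_dst([0.0, 4.0, 4.0, 0.0], [0.0, 0.5, 0.0, 0.5]))
```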

11 Clipping
- Clip geometry primitives with the view frustum (6 planes).
- Clip geometry primitives with the user clip planes.
- Techniques used:
  - Guard-Band Clipping.
  - Homogeneous rasterization avoids clipping in the geometry stage.
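The idea behind guard-band clipping can be sketched with outcodes in clip space: a triangle is rejected if it lies entirely outside one frustum plane, rasterized unclipped if it stays inside an enlarged x/y region (the guard band), and geometrically clipped only otherwise. The code below is an illustrative sketch, assuming OpenGL-style clip coordinates (-w <= x, y, z <= w) and a hypothetical guard-band factor of 8; actual hardware limits differ.

```python
def outcode(v, gx=1.0, gy=1.0):
    """Bit mask of the planes a clip-space vertex (x, y, z, w) lies outside of.

    gx and gy scale the left/right and bottom/top planes; 1.0 is the view frustum.
    """
    x, y, z, w = v
    code = 0
    if x < -gx * w: code |= 1
    if x > gx * w:  code |= 2
    if y < -gy * w: code |= 4
    if y > gy * w:  code |= 8
    if z < -w:      code |= 16   # near plane (no guard band in z)
    if z > w:       code |= 32   # far plane
    return code


def classify(tri, guard=8.0):
    """Return 'reject', 'rasterize' (no clipping needed) or 'clip'."""
    view = [outcode(v) for v in tri]
    if view[0] & view[1] & view[2]:
        return "reject"          # all vertices outside the same frustum plane
    band = [outcode(v, guard, guard) for v in tri]
    if not (band[0] | band[1] | band[2]):
        return "rasterize"       # inside the guard band; scissoring trims the rest
    return "clip"


print(classify([(0, 0, 0, 1), (3, 0, 0, 1), (0, 0.5, 0, 1)]))
```

The second vertex in the usage line is outside the view frustum but inside the assumed guard band, so the triangle is passed to the rasterizer without being clipped.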

12 Guard-Band Clipping

13 Homogeneous coordinates
- “Triangle Scan Conversion using 2D Homogeneous Coordinates”, Olano and Greer.

14 Rasterization
- Setup (per-triangle).
- Sampling (triangle = {fragments}).
- Interpolation (interpolate colors and coordinates).

15 Rasterization
- Converts primitives to fragments.
  - Primitive: point, line, polygon, …
  - Fragment: transient data structure
      short x, y;
      long depth;
      short r, g, b, a;
- Fragment selection.
- Parameter assignment (color, depth...).
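The three rasterization phases (setup, sampling, interpolation) can be illustrated with a minimal edge-function rasterizer. This is a sketch under several assumptions, not the design from the slides: vertices are given in window coordinates in counter-clockwise order, samples are taken at pixel centers, no top-left fill rule is applied, interpolation is linear without perspective correction, and the `Fragment` class uses floats rather than the short/long fields listed above.

```python
import math
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    depth: float
    color: tuple  # (r, g, b, a)


def edge(a, b, p):
    """Edge function: positive when p is to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2):
    """Each vertex is (x, y, depth, (r, g, b, a)). Returns a list of Fragments."""
    area = edge(v0, v1, v2)                      # setup: twice the signed area
    if area <= 0:
        return []                                # back-facing or degenerate
    verts = (v0, v1, v2)
    min_x = math.floor(min(v[0] for v in verts))
    max_x = math.ceil(max(v[0] for v in verts))
    min_y = math.floor(min(v[1] for v in verts))
    max_y = math.ceil(max(v[1] for v in verts))
    fragments = []
    for y in range(min_y, max_y):
        for x in range(min_x, max_x):
            p = (x + 0.5, y + 0.5)               # sampling: pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue                         # sample is outside the triangle
            b0, b1, b2 = w0 / area, w1 / area, w2 / area
            depth = b0 * v0[2] + b1 * v1[2] + b2 * v2[2]   # interpolation
            color = tuple(b0 * c0 + b1 * c1 + b2 * c2
                          for c0, c1, c2 in zip(v0[3], v1[3], v2[3]))
            fragments.append(Fragment(x, y, depth, color))
    return fragments


frags = rasterize((0, 0, 0.5, (1, 0, 0, 1)),
                  (4, 0, 0.5, (0, 1, 0, 1)),
                  (0, 4, 0.5, (0, 0, 1, 1)))
print(len(frags), frags[0])
```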

16 Programmable Pipeline

17 Vertex Program

18

19 NV_vertex_program2
- ARL (new support for four-component A0 and A1 instead of just A0.x)
- ARR (similar to ARL, but rounds instead of truncating before storing the integer result in an address register)
- BRA, CAL, RET (branching instructions)
- COS, SIN (high-precision trigonometric functions)
- FLR, FRC (floor and fraction of floating-point values)
- EX2, LG2 (high-precision exponentiation and logarithm functions)
- ARA (adds pairs of components of an address register; useful for looping and other operations)
- SEQ, SFL, SGT, SLE, SNE, STR (“set on” instructions similar to SLT, SGE)
- SSG (“set sign” operation; generates a vector holding –1.0 for negative operand components, 0 for zero-value components, and +1.0 for positive components)
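A few of the new arithmetic instructions are simple enough to emulate directly. The sketch below covers FLR, FRC, SSG and two of the "set on" instructions on 4-element lists; the `op_*` names are invented for the example, and NaN behavior is not modeled, so it should be read as an approximation of the semantics described above rather than a reference.

```python
import math


def op_flr(v):
    """FLR: component-wise floor."""
    return [float(math.floor(c)) for c in v]


def op_frc(v):
    """FRC: component-wise fraction, c - floor(c), always in [0, 1)."""
    return [c - math.floor(c) for c in v]


def op_ssg(v):
    """SSG: -1.0 for negative, 0.0 for zero, +1.0 for positive components."""
    return [-1.0 if c < 0 else 1.0 if c > 0 else 0.0 for c in v]


def op_seq(a, b):
    """SEQ: 1.0 where a == b, else 0.0."""
    return [1.0 if x == y else 0.0 for x, y in zip(a, b)]


def op_sgt(a, b):
    """SGT: 1.0 where a > b, else 0.0."""
    return [1.0 if x > y else 0.0 for x, y in zip(a, b)]


r = [1.5, -1.5, 2.0, -0.25]
print(op_flr(r), op_frc(r), op_ssg(r))
```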

20 NV_vertex_program2 Overview
1. Condition codes
2. Branching & subroutines
3. Even faster performance
4. Nineteen new instructions
5. New source modifiers
6. Clip plane support
7. More registers & instructions

21 NV_vertex_program2 Resource Limits
- 256 vertex program parameters (up from 96).
- 16 temporary registers (up from 12).
- Two 4-component address registers (up from one single-component address register).
- 256 static instructions per program (up from 128).
- Given branching, 65536 dynamic instructions can execute before termination to avoid infinite loops.

22 NV_vertex_program2 Source Modifiers
- Source operand absolute value.
  - Example: MOV R0, |R1|;
- In addition to source negation & swizzling.
  - Example: MAD R0, -|R1|.yzwy, |R2|, -R3.w;
- Swizzle, negate, & absolute value operations are “free” source modifiers.
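The effect of the source modifiers can be emulated by a small operand-read function that applies swizzle, absolute value and negation before the instruction itself runs. The sketch below evaluates the MAD example from this slide; `read_source` and `op_mad` are hypothetical names, and the register contents are made-up test values.

```python
COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}


def read_source(reg, swizzle="xyzw", negate=False, absolute=False):
    """Read a 4-component register through swizzle, |abs| and negate modifiers."""
    if len(swizzle) == 1:
        swizzle = swizzle * 4            # scalar select such as .w replicates
    values = [reg[COMPONENT[c]] for c in swizzle]
    if absolute:
        values = [abs(v) for v in values]
    if negate:                           # negation applies after the absolute value
        values = [-v for v in values]
    return values


def op_mad(a, b, c):
    """MAD: component-wise a * b + c."""
    return [x * y + z for x, y, z in zip(a, b, c)]


R1 = [1.0, -2.0, 3.0, -4.0]
R2 = [-1.0, 1.0, -1.0, 1.0]
R3 = [0.0, 0.0, 0.0, 5.0]

# MAD R0, -|R1|.yzwy, |R2|, -R3.w;
R0 = op_mad(read_source(R1, "yzwy", negate=True, absolute=True),
            read_source(R2, absolute=True),
            read_source(R3, "w", negate=True))
print(R0)
```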

23 NV_vertex_program2 Condition Codes (1)
- Condition code state:
  - 4-component register stores condition code values.
  - Four possible values:
    - LT – less than zero
    - EQ – equal to zero
    - GT – greater than zero
    - UN – unordered, for comparisons involving NaN
- Most instructions optionally update condition code state.
  - Indicated with “C” suffix: DP4C, MOVC, etc.
- “CC” pseudo-register used to just update condition codes.

24 NV_vertex_program2 Condition Codes (2)
- Optional condition code based destination masking.
  - Example: MOV R1.xy(NE.z), R0;
  - Copy R0 components to R1’s X & Y components except when condition code’s Z component is EQ.
- Condition code rules: EQ, equal; GE, greater or equal; GT, greater than; LE, less or equal; LT, less than; NE, not equal; FL, false; and TR, true.
- Note that condition code masking rule can swizzle condition code components.
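The condition code mechanism from the two slides above can be sketched as follows: a "C"-suffixed instruction classifies each result component as LT, EQ, GT or UN, and a later instruction uses a rule plus an optional swizzle to decide which destination components are written. The function names and the `RULES` table are assumptions made for this illustration; in particular, treating UN as satisfying NE is my assumption and should be checked against the NV_vertex_program2 specification.

```python
import math

COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}

# Which stored codes satisfy each rule. UN counting as "not equal" is an assumption.
RULES = {
    "EQ": {"EQ"}, "GE": {"GT", "EQ"}, "GT": {"GT"},
    "LE": {"LT", "EQ"}, "LT": {"LT"}, "NE": {"LT", "GT", "UN"},
    "FL": set(), "TR": {"LT", "EQ", "GT", "UN"},
}


def update_condition_codes(result):
    """What a 'C'-suffixed instruction stores: one code per result component."""
    def code(v):
        if math.isnan(v):
            return "UN"
        return "LT" if v < 0 else "EQ" if v == 0 else "GT"
    return [code(v) for v in result]


def masked_write(dst, result, cc, write_mask="xyzw", rule="TR", cc_swizzle="xyzw"):
    """Write result into dst only where the write mask and the condition rule allow."""
    if len(cc_swizzle) == 1:
        cc_swizzle = cc_swizzle * 4      # (NE.z) tests CC.z for every component
    out = list(dst)
    for i, name in enumerate("xyzw"):
        if name in write_mask and cc[COMPONENT[cc_swizzle[i]]] in RULES[rule]:
            out[i] = result[i]
    return out


cc = update_condition_codes([1.0, -1.0, 0.0, float("nan")])
R0 = [1.0, 2.0, 3.0, 4.0]
R1 = [0.0, 0.0, 0.0, 0.0]
# MOV R1.xy(NE.z), R0;  CC.z is EQ here, so nothing is written.
print(cc, masked_write(R1, R0, cc, write_mask="xy", rule="NE", cc_swizzle="z"))
```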

25 ATI R300. Vertex Shader.

26 3DLabs P10. Pipeline.

27 Matrox Parhelia. Pipeline.

