Stencil Routed A-Buffer

1 Stencil Routed A-Buffer
Kevin Myers and Louis Bavoil NVIDIA

2 Our Cool Thing

3 What is it?
A-Buffer: simply a list of fragments per-pixel. “The A-buffer, an antialiased hidden surface method” [Carpenter 84]
Related Work: Depth Peeling [Mammen 89] [Everitt 01]; k-Buffer [Bavoil et al. 07]

4 Why do I need this?
Often want more than the nearest layer: alpha blending, volume rendering, collision detection, refraction and caustics, global illumination.

5 Why is it hard?
GPUs are optimized to capture the nearest layer: Z buffering and early z test.
Fine for most real-time lighting models. Wasteful if not rendering front to back.

6 Things that don’t work
Blending: can’t just turn off z-buffering. Most operations are non-commutative.
MRT: can’t direct output.
Reading what you’re writing: hazardous. “Multi-Layer Depth Peeling via Fragment Sort” [Liu et al. 06]; k-Buffer [Bavoil et al. 07]

7 A-Buffer
“A list of fragments per-pixel”. Anything on the GPU that resembles this?
MSAA: “A list of samples per-pixel”. Samples store coverage.

8 MSAA in review
Multisampled Antialiasing: fragments are rasterized at a higher resolution (8xMSAA == 8 x aliased resolution).
Pixel shader is run once per pixel. Frame buffer storage is at sample resolution.

9 Say What?
MSAA samples == A-Buffer pixels??
MSAA sample patterns don’t help. Need all MSAA samples at pixel center.

10 Line up your Sub-samples
Turn off multisampling: BOOL D3D10_RASTERIZER_DESC::MultisampleEnable
Still render to an MSAA buffer. Pixel shader output bloats to all sub-samples.
Now writing 8 samples per pixel. All have the same value!!

11 Bloating Your Pixel
Applause? Meets the definition: “List of fragments per-pixel”.
Not exactly what we want: each item contains the same value, and the next fragment will clobber the entire list.
Need to update one entry in the list, once and only once.
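The clobbering problem on this slide can be mimicked on the CPU. What follows is a minimal Python sketch, not D3D10 code: a pixel is modeled as a plain list of 8 samples, and `write_bloated` is a hypothetical helper standing in for a draw with multisampling disabled and no stencil routing.

```python
NUM_SAMPLES = 8  # models an 8xMSAA render target; the value is illustrative

def write_bloated(pixel, value):
    """Model MultisampleEnable = FALSE without stencil routing:
    the single pixel-shader output lands in every sample."""
    for s in range(len(pixel)):
        pixel[s] = value

pixel = [None] * NUM_SAMPLES
write_bloated(pixel, "fragment A")
after_first = list(pixel)           # eight copies of fragment A
write_bloated(pixel, "fragment B")  # fragment A is gone from every slot
```

After the second write the list holds eight copies of fragment B, which is why each entry has to be updated once and only once.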

12 Stencil Routing
Stencil always increments. Stencil passes when 4.

13 Stencil Routing
First introduced by Purcell et al. 2003. Did not work for general rasterization: tile-aligned points.
A fat point is spread across four pixels. Four pixels get the same value. Stencil allows one pixel to update.

14 Stencil Routing and MSAA
Stencil always operates at sample resolution, regardless of MultisampleEnable state (DX10 spec).
Use sub-samples to route: allows any pixel shader output to be routed, for arbitrary primitives.

15 Stencil Routing and MSAA

16 A Stencil Test That Works
StencilFunc: D3D10_COMPARISON_EQUAL
StencilRef: 2 (more on this later)
StencilPassOp and StencilFailOp: D3D10_STENCIL_OP_DECR_SAT
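The effect of this stencil state can be traced with a small CPU-side model. This is a hedged Python sketch, not D3D10 code: `route_fragment` is a hypothetical helper, and the starting stencil values (sample k holds 2 + k) are an assumption chosen so that exactly one sample matches the reference per fragment.

```python
STENCIL_REF = 2  # StencilRef from the slide

def route_fragment(samples, stencil, value):
    """Apply one fragment to every sample of a pixel.
    EQUAL test against STENCIL_REF; DECR_SAT on both pass and fail."""
    for s in range(len(samples)):
        if stencil[s] == STENCIL_REF:          # D3D10_COMPARISON_EQUAL
            samples[s] = value                 # only the matching sample is written
        stencil[s] = max(stencil[s] - 1, 0)    # D3D10_STENCIL_OP_DECR_SAT

num_samples = 8
samples = [None] * num_samples
# Assumed initial state: sample k starts at STENCIL_REF + k.
stencil = [STENCIL_REF + k for k in range(num_samples)]

for frag in ["f0", "f1", "f2"]:
    route_fragment(samples, stencil, frag)
```

In this model each incoming fragment lands in the next free sample, because every decrement moves exactly one new sample onto the reference value.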

17 Initializing Stencil
Clear stencil buffer to the pass value (2). This initializes sample 0 to 2.
Use SampleMask to selectively update the other samples, with stencil set to replace with the reference value.
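The initialization can be sketched the same way. In this minimal Python model, `clear_stencil` and `replace_masked` are hypothetical stand-ins for a stencil clear and a masked full-screen pass. The slide does not list the per-sample values; the choice of 2 + k for sample k is an assumption inferred from the routing behavior the deck describes.

```python
def clear_stencil(stencil, value):
    """Model a stencil clear: every sample gets the same value."""
    for s in range(len(stencil)):
        stencil[s] = value

def replace_masked(stencil, sample_mask, ref):
    """Model a full-screen pass with the stencil op set to REPLACE:
    only samples whose bit is set in sample_mask receive ref."""
    for s in range(len(stencil)):
        if sample_mask & (1 << s):
            stencil[s] = ref

num_samples = 8
stencil = [0] * num_samples
clear_stencil(stencil, 2)                 # sample 0 is now correct
for k in range(1, num_samples):
    # Assumed values: sample k should pass on the (k + 1)-th fragment.
    replace_masked(stencil, sample_mask=1 << k, ref=2 + k)
```

Under these assumptions an 8-sample target needs one clear plus seven masked passes to set up.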

18 Why start at 2?
When all sub-samples are written: most stencil values will be 0, except the last one written. Last sample written: stencil == 1.
When overflow occurs: all stencil values will be 0.
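One way to read this slide is to compare the final stencil state for a full pixel against an overflowed one. The Python sketch below assumes sample k starts at ref + k and is decremented with saturation once per fragment; `final_stencil` is a hypothetical helper, not part of any API.

```python
def final_stencil(num_samples, num_fragments, ref=2):
    """Stencil values of one pixel after num_fragments fragments,
    assuming sample k starts at ref + k and DECR_SAT runs per fragment."""
    stencil = [ref + k for k in range(num_samples)]
    for _ in range(num_fragments):
        stencil = [max(v - 1, 0) for v in stencil]
    return stencil

exactly_full = final_stencil(8, 8)   # last sample keeps a 1
overflowed = final_stencil(8, 9)     # every sample is 0

# Starting at 1 instead would leave both cases all zero,
# so a full pixel could not be told apart from an overflowed one.
full_from_1 = final_stencil(8, 8, ref=1)
over_from_1 = final_stencil(8, 9, ref=1)
```

The extra count above 1 is what leaves a marker in the last sample when the list is full but has not overflowed.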

19 Occlusion Query Test
Pixel did not overflow. Pixel overflowed.

20 Handling Overflow
Set sample mask to the last sample updated.
Draw a full screen quad: issue an occlusion query; set stencil to pass if stencil == 0.
Check the occlusion query: sample pass count == overflow count.
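The counting step can be illustrated with a CPU stand-in for the occlusion query. This Python sketch assumes the last sample starts at 2 + (num_samples - 1) and is decremented with saturation per fragment; the function names and the list of per-pixel fragment counts are made up for illustration.

```python
def last_sample_stencil(num_samples, num_fragments, ref=2):
    """Stencil value left in the last sample of one pixel (assumed start: ref + N - 1)."""
    start = ref + (num_samples - 1)
    return max(start - num_fragments, 0)

def count_overflowed(fragments_per_pixel, num_samples=8):
    """Stand-in for the occlusion query: the sample mask restricts the
    full-screen quad to the last sample, and the stencil test passes
    only where that sample's stencil is 0."""
    return sum(
        1 for n in fragments_per_pixel
        if last_sample_stencil(num_samples, n) == 0
    )

# Hypothetical depth complexity of seven pixels.
depth_complexity = [0, 3, 8, 9, 12, 8, 20]
overflow_count = count_overflowed(depth_complexity)
```

Pixels that received exactly 8 fragments keep a 1 in the last sample and are not counted; only those with 9 or more pass the test.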

21 Handling Overflow
Occlusion query
Good: very fast; allows for dynamic A-Buffer sizing.
Bad: requires some CPU intervention; ideally A-Buffer size is fixed.

22 Demo
Demo Time!

23 Secrets of the Dragon
Single A-Buffer: RG32F. R is packed color, G is depth. Saves on texture loads.
Post process sort: 8 fragments per pixel, bitonic sort. Additional fragments, insertion sort.
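The post-process sort can be sketched on the CPU. This is a generic bitonic sorting network plus an insertion step in Python, not the authors' shader. Fragments are modeled as (packed_color, depth) pairs, and the integer colors are placeholders because the slide does not say how color is packed into R.

```python
def bitonic_sort_by_depth(frags):
    """Sort a power-of-two number of (color, depth) pairs by depth
    using a bitonic network (fixed compare-exchange pattern)."""
    a = list(frags)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "needs a power-of-two length"
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    greater = a[i][1] > a[partner][1]
                    if greater == ascending:   # pair violates its direction
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

def insert_by_depth(sorted_frags, frag):
    """Insertion step for fragments beyond the first eight."""
    out = list(sorted_frags)
    pos = len(out)
    while pos > 0 and out[pos - 1][1] > frag[1]:
        pos -= 1
    out.insert(pos, frag)
    return out

# Placeholder fragments: (packed color, depth), depths all distinct.
first_eight = [(0xA0, 0.7), (0xA1, 0.2), (0xA2, 0.9), (0xA3, 0.4),
               (0xA4, 0.1), (0xA5, 0.8), (0xA6, 0.3), (0xA7, 0.6)]
sorted_eight = bitonic_sort_by_depth(first_eight)
sorted_nine = insert_by_depth(sorted_eight, (0xA8, 0.5))
```

A fixed network suits the first eight fragments because its compare-exchange sequence does not depend on the data; the insertion step handles any extra fragments one at a time.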

24 8800 GTX Performance
Alpha Blended Stanford Dragon
8 Layers
Resolution  Depth Peeling  8xABuffer  ABuffer Speedup
640x480     30.9           164        5.3
800x600     30.4           139        4.6
1024x768    29.5           110        3.7
1280x960    28.1           81.4       2.9
1600x1200   26.2           54.9       2.1
16 Layers
Resolution  Depth Peeling  8xABuffer  ABuffer Speedup
640x480     15.5           76.7       4.9
800x600     15.3           63.0       4.1
1024x768    14.7           48.0       3.3
1280x960    14.1           34.6       2.5
1600x1200   13.3           23.1       1.7

25 Limits…DOH!
254 layers of depth max: 8-bit stencil (255 – 1 for overflow bit). If you do this, call us, ’cause that’s crazy.
Fragments at same depth: must be handled in post-process.
MSAA

26 Summary
Stencil Routed A-Buffer: ideally suited for complex geometries; much faster than depth peeling.
A-buffer can be dynamically resized: use an occlusion query. Best to pre-determine size.

27 Future Work
Render target arrays: each target has its own stencil buffer. Target replaces sub-sample, or augments sub-sample.
#arrays * MSAA level in one “CPU pass”. With DX10, saturates 254 layers.
Use instancing for additional “GPU passes”.

28 Thanks for all the fish
Claudio Silva, Steven Callahan, Joao Comba, Aaron Lefohn, Cass Everitt, Peach Myers

29 The last slide… ?

