Presentation on theme: "Optimized Stencil Shadow Volumes"— Presentation transcript:
1Optimized Stencil Shadow Volumes Cass Everitt & Mark J. Kilgard
2Outline: Complete Lecture Review of Stenciled Shadow VolumesStenciled Shadow Volumes OptimizationsOther Shadow Volume IssuesMore Information
3Outline: Review of Stenciled Shadow Volumes Shadow Volume BasicsZpass vs. ZfailInfinite Rendering Techniques
4Stenciled Shadow Volumes in Practice Notice the proper self-shadowing!
5Shadow Volume Basics A shadow volume is simply the half-space defined Shadowing objectLight sourceShadow volume(infinite extent)A shadow volume issimply the half-space definedby a light source and a shadowing object.
6Shadow Volume Basics (2) Surface outside shadow volume(illuminated)Simple rule:samples within ashadow volumeare in shadow.Surface inside shadow volume(shadowed)Partiallyshadowedobject
7Why Eye-to-Object Stencil Counting Approach Works Light sourceShadowing objectzero+1zero+2+2Eye position+1+3
14Illuminated, In Front of Shadow Volumes (Zfail) Light sourceShadowing objectzero+1zero---++++2+2Shadowed objectEye position+1+3Shadow Volume Count = = 0
15Problem Created by Far Clip Plane (Zfail) Missed shadow volume intersection due to far clip plane clipping; leads to mistaken countzero+1zero+1+2+2+3Near clip plane
16Problem Solved by Eliminating Far Clip zero+1+1+2+2+3Near clip plane
17Nested Shadow Volumes Stencil Counts Beyond One Shadowed sceneStencil buffer contentsgreen = stencil value of 0red = stencil value of 1darker reds = stencil value > 1
18Animation of Stencil Buffer Updates for a Single Light’s Shadow Volumes Fully shaded sceneFinal stencil stateEvery frame is 5 additional stencil shadow volume polygon updates. Note how various intermediate stencil values do not reflect the final state.
19Shadows in a Real Game Scene Abducted game images courtesyJoe Riedel at Contraband Entertainment
21Blow-up of Shadow Detail Notice cable shadows on player modelNotice player’s own shadow on floor
22Scene’s Shadow Volume Geometric Complexity Wireframe shows geometric complexity of shadow volume geometryShadow volume geometry projects away from the light source
23Visible Geometry vs. Shadow Volume Geometry <<Visible geometryShadow volume geometryTypically, shadow volumes generate considerably more pixel updates than visible geometry
24Other Example Scenes (1 of 2) Visible geometryDramatic chase scene with shadowsShadow volume geometryAbducted game images courtesyJoe Riedel at Contraband Entertainment
25Other Example Scenes (2 of 2) Visible geometryScene with multiple light sourcesShadow volume geometryAbducted game images courtesyJoe Riedel at Contraband Entertainment
26Situations When Shadow Volumes Are Too Expensive Chain-link fence is shadow volume nightmare!Chain-link fence’s shadow appears on truck & ground with shadow mapsFuel game image courtesy Nathan d’Obrenan at Firetoad Software
27Shadow Volumes vs. Shadow Maps Shadow mapping via projective texturingThe other prominent hardware-accelerated shadow techniqueStandard part of OpenGL 1.4Shadow mapping advantagesRequires no explicit knowledge of object geometryNo 2-manifold requirements, etc.View independentShadow mapping disadvantagesSampling artifactsNot omni-directional
28Outline: Stenciled Shadow Volumes Optimizations Fill Rate OptimizationsDynamic Zpass vs. Zfail determinationBoundsCulling OptimizationsPortal-based cullingOccluder and cap cullingSilhouette Determination OptimizationsEfficient data structures for static occluder & dynamic lightCache SVs, update only when necessarySimplified occluder geometryShadow Volume Rendering OptimizationsVertex programs for shadow volume renderingTwo-sided stencil testingDirectional lights
29Fill Rate Optimizations Dynamic Zpass vs. Zfail determinationBoundsCSG-style intersectionsWindow-space boundsScissoringDepth bounds test (sort of a z-based scissor)
30Zpass vs. Zfail (1) Zpass counting requires no caps Zpass is often more efficient (due to occlusion culling hw)zfaileyezpass
31Zpass vs Zfail (2) When to not use zpass clipping throws the shadow count off for this whole region
32Zpass vs Zfail (3) zpass (no caps) zfail (with caps) We can decide zpass vs zfail on a per-occluder basisdo zfail when we musthow do we decide?zpass (no caps)zfail (with caps)
33Zpass vs Zfail (4) a c b How do we decide? This diagram shows one way: near plane – light pyramidThis diagram showsthree planes in the“near plane - lightpyramid”.Any object that iscompletely outsideone of the planescan be renderedwith zpass.Use object boundingvolume for speed.acbobject is at leastpartially inside ofeach plane (zfail)object is completely outside plane “c” (zpass is ok)
34Zpass vs Zfail (5) closer near plane for more zpass a b c bad for depth resolutionNV_depth_clamp can helpabcobjects completely outside plane “c” (zpass is ok)
35Exploit BoundsInfinite shadow volumes can cover a lot of pixelsWith some knowledge of bounds, we can reduce fillcompute window space 3D axis-aligned bounding box of “possible shadow”(x,y) bounds are scissor rectWhat about z? New depth bounds test.Another strategy is to truncate the shadow volume using boundsExample boundslightsattenuationenvironment (walls/floor)shadow volumesdepth range of shadowredundant shadows
36Attenuated Light Bounds For attenuated lights, we can use the scissor test to constrain the shadow volumeworks great for occluders beside the lightThis is the window-space bounding box (x,y) of “possible shadow”fill savingsscissorrect
37Attenuated Light Bounds (2) Per-light scissor rect does not always very helpfulCap the shadow volume at light bounds?expensive/complicatedreasonable for static shadow volumesPer-occluder scissor?wastedfillscissorrect
38Attenuated Light Bounds (3) Per-occluder scissor rect is nicesimpleallows effective truncation of SV without altering SV geometrygood for dynamic SVsper-occluder scissor rectreclaimedfillscissorrect
39Environmental Bounds scissor fill savings rect Some environments offer big opportunities for bounding optimizationsscissorrectfill savings
40Environmental Bounds (2) Further savings from per-occluder infointersecting scissor rects is easyvery small scissor rect!fill savingseven more fill savings
41Depth Bounds depth bounds fill savings no shadow possible depth range Some depth values cannot be in shadow, so why bother to INCR and DECR their stencil index?This is just the Z component of the window-space bounding box for the region of “possible shadow” !depth boundsfill savingszminzmaxno shadowpossibledepth rangethat can bein shadow
42Depth Bounds Discard fragment if pixel depth is out of bounds not a clip plane!sort of a scissor for zSee NV_depth_bounds_test extensionSimple APIglDepthBoundsNV( GLclampd zmin, GLclampd zmax );glEnable( GL_DEPTH_BOUNDS_TEST_NV );glDisable( GL_DEPTH_BOUNDS_TEST_NV );Depths in window spacesame [0,1] range as glDepthRange()
43Computing Window Space Bounds Shadow volumes are infinite, but the depth bounds for possible shadows are constrainedSV constrained by frustumdepth boundszminzmax
44Computing Window Space Bounds depth boundsSV constrained by light boundszminzmaxlightboundsscissorLight bounds (scissor and depth)can be used as conservative bounds.Usually too conservative.
45Computing Window Space Bounds depth boundsSV constrained by environmentzminzmaxenvironment(floor)scissor
46Computing Window Space Bounds scissor rect and depth bounds really help constrain fill – use them agressivelyuse occluder bounding volumeconservativeinexpensive
47Window Space Bounds Scissor + depth bounds Beware points behind eye! really just a window-space bounding boxBeware points behind eye!clip space w < 0depth bounds: zmin == 0scissor rect trickySee “Jim Blinn’s Corner: Calculating Screen Coverage”In Notation, Notation, Notation, andalso May ’96 issue of IEEE CG&A
48Culling Optimizations Shadow volume cullingCulling the shadow volume extrusionSpecialized shadow volume culling for ZfailCulling the near capsCulling the far capsPortal-based culling
49Shadow Volume Culling Conventional view frustum culling Simple idea: If object is not within view frustum, don’t draw objectUse conservative bounding geometry to efficiently determine when object is outside the view frustumOften done hierarchicallyShadow volume cullingJust because an object is not within the view frustum doesn’t mean it can’t cast a shadow that is visible within the view frustum!
50Shadow Volume Culling Example Light and occluder both outside the view frustum.But occluder still casts shadow into the view frustum.Must consider shadow volume of the occluder even though the occluder could be itself view frustum culled!
51Shadow Volume Culling Easy Case If the light source is within the view frustum and the occluder is outside the view frustum, there’s no occluder’s shadow is within the view frustum. Culling (not rendering) the shadow volume is correct in this case.Example: Shadow volume never intersects the view frustum!
52Generalized Shadow Volume Culling Rule Form the convex hull of the light position and the view frustumFor an infinite light, this could be an infinite convex hullIf your far plane is at infinity due to using a Pinf projection matrix, the view frustum is infiniteThe convex hull including the light (even if local) is also infiniteIf an occluder is within this hull, you must draw the shadow volume for an occluder
53Shadow Volume Culling Example Reconsidered (1) Light and occluder both outside the view frustum.But occluders still within convex hull formed by the light and view frustumMust consider shadow volume of the purple occluder even though the purple occluder could be itself view frustum culled!
54Shadow Volume Culling Example Reconsidered (2) Light and occluder both outside the view frustum.But purple occluders still also outside convex hull formed by the light and view frustumNo need to render the purple occluders or their shadow volumes!Green occluder must be considered however!
55A note about “frustum” Effectively the view frustum is confined by scissor rectdepth boundsBe sure to use this smaller frustum for culling
56Infinite Cap CullingWhen the infinite cap falls outside the scissor rect, don’t render itrendering a lot of trivially culled geometry -> pipeline bubblescan be significant optimization for per-occluder scissor
57Portal-based Shadow Volume Optimizations Portal systems are convenient forquickly identifying visible thingsquickly eliminating hidden thingsFor bounded lights, we can treat the light bounds as the “visible object” we’re testing forIf light bounds are visible, we need to process the lightIf light bounds are invisible, we can safely skip the light
58Silhouette Determination Optimizations Caching shadow volumes that are staticEfficient data structures for static occluder & dynamic lightSimplified occluder geometry
59Cache Shadow Volumes, Update As Necessary Shadows are view-independentIf occluder & light are static with respect to each other, shadow volumes can be re-usedExample: Headlights on a car have a static shadow volume w.r.t. the car itself as an occluderPrecomputation of shadow volumes can include bounding shadow volumes to the light bounds and other optimizations
60Efficient Data Structures for Static Occluder and Dynamic Light Interactions Brute force algorithm for possible silhouette detection inspects every edge to see if triangles joining form a possible silhouette edgeAlways works, but expensive for large modelsPerhaps the best practical approach for dynamic modelsBut static occluders could exploit precomputationSee SIGGRAPH 2000 paper “Silhouette Clipping” by Sander, Gui, Gortler, Hoppe, and SnyderCheck out the “Fast Silhouette Extraction” sectionData structure is useful for fast shadow volume possible silhouette edge determination
61Simplified Occluder Geometry More geometrically complex models can generate lots more possible silhouette edgesMore triangles in model meansMore time spent searching for possible silhouette edgesMore possible silhouette edges meansMore fill rate consumed by shadow volumesConsider substituting simplified model for shadow volume constructionSimplified substitute must “fit within” the original modelor you must give up self-shadowing
62Shadow Volume Rendering Optimizations Compute Silhouette Loops on CPUOnly render silhouette for zpassVertex transform sharingExtrude Triangles instead of quadsFor all directional lightsFor zpass SVs of local lightsWrapping stencil to avoid overflowsTwo-sided stencil testingVertex programs for shadow volume rendering
63Avoid Transforming Shadow Volume Vertices Redundantly Use GL_QUAD_STRIP rather than GL_QUADS to render extruded shadow volume quadsRequires determining possible silhouette loop connectivityOnce you find an edge, look for connected possible silhouette edge loops until you form a loop
64Shadow Volume Extrusion using Triangles or Triangle Fans Extrusion can be rendered with a GL_TRIANGLE_FANDirectional light’s shadow volume always projects to a single point at infinitySame is true for local lights – except the point is the “external” version of the light’s position (caveats)Scene with directional light.Clip-space view of shadow volume
65Shadow Volume Extrusion using Triangle Fans For Local Lights, triangle fans can be used to render zpass (uncapped) shadow volumesif the light position is (Lx, Ly, Lz, 1), the center of the triangle fan is the “external” point (-Lx, -Ly, -Lz, -1)this is simpler in terms of the geometrycan offer some fill efficiencies as wellWhat is an external triangle?A triangle where one or two vertices has w<0red vertex is internalred vertex is external
66Hardware Enhancements: Wrapping Stencil Operations Conventional OpenGL 1.0 stencil operationsGL_INCR increments and clamps to 2N-1GL_DECR decrements and clamps to zeroDirectX 6 introduced “wrapping” stencil operationsExposed by OpenGL’s EXT_stencil_wrap extensionGL_INCR_WRAP_EXT increments modulo 2NGL_DECR_WRAP_EXT decrements modulo 2NAvoids saturation throwing off the shadow volume depth countStill possible, though very rare, that 2N, 22N, 32N, etc. can alias to zerobackground
67Hardware Enhancements: Two-sided Stencil Testing (1) Current stenciled shadow volumes required rendering shadow volume geometry twiceFirst, rasterizing front-facing geometrySecond, rasterizing back-facing geometryTwo-sided stencil testing requires only one passTwo sets of stencil state: front- and back-facingBoolean enable for two-sided stencil testingWhen enabled, back-facing stencil state is used for stencil testing back-facing polygonsOtherwise, front-facing stencil state is usedRasterizes just as many fragments, but more efficient for CPU & GPUbackground
68Vertex Programs for Shadow Volumes Three techniques to considerFully automatic shadow volume extrusionEverything off-loaded to GPUQuite inefficient, not recommendedVertex normal-based extrusionProne to tessellation anomaliesSemi-automatic shadow volume extrusionCPU performs possible silhouette edge detection for each lightGPU projects out quads from single set of vertex data based on light position parameterDoom3’s approach
69Fully Automatic Shadow Volume Extrusion (1) 3 unique vertexes for each triangle of the original triangle meshInsert degenerate “edge quads” at each & every edge of the original triangle meshThis needs a LOT of extra verticesPrimary reason this technique is impracticalToo much transform & triangle setup overheadNo way to do zpass cap optimizationsNo way to do triangle extrusion optimizations
70Bloating the original triangle mesh 6Formula for geometry:vbloat = 3 * torigtbloat = torig + 2 * eorigBloated geometry basedonly on number of trianglesand edges of originalgeometry.6DD3d4d343b4bB3aB4cAC12A52bC12a2c5Original triangle mesh6 vertexes4 trianglesBloated triangle mesh12 vertexes10 trianglesA lot of extra geometry!
71Vertex Normal-based Extrusion (1) Use per-vertex normalsIf NL is greater or equal to zero, leave vertex aloneIf NL is less than zero, project vertex away from lightAdvantagesSimple, requires only per-vertex computationsMaps well to vertex programDisadvantagesToo simple – only per-vertexBest when normals approximate facet normals wellWorst on faceted, jaggy, or low-polygon count geometryNo way to do zpass cap optimizationsNo way to do triangle extrusion optimizations
74Vertex Normal-based Extrusion (4) This shadow volume would collapse with a directional light source...Facet-based SV Extrusion (correct)Vertex Normal-based SV Extrusion (incorrect)Incorrectly unshadowed region (incorrect)
75Vertex Normal-based Extrusion (5) Per-vertex normals are typically intended for lighting computations, not shadowingVertex normals convey curvature well, but not orientationShadowing really should be based on facet orientationsThat is, the normal for the triangle, not a vertexHack compromiseInsert duplicate vertices at the same position, but with different normals
76Semi-automatic Shadow Volume Extrusion (1) Possible silhouette edge detection done by CPUFor each lightWant to only have a single set of vertices for each modelAs opposed to a unique set of possible silhouette edge of vertices per light per modelSeparate index list per shadow volumeVertex program projects a single set of vertices appropriate for each possible silhouette edge for any particular light
77Semi-automatic Shadow Volume Extrusion (2) Make two copies of every vertex, each with 4 components (x,y,z,w)v = (x,y,z,1)v’ = (x,y,z,0)For a possible silhouette edge with vertices p & qAssume light position LRender a quad using vertices p, q, p’, q’Vertex program computes pos = pospos.w + (posL.w - Lpos.w)(1-pos.w)pos.w selects between projected and local vertex positionStraightforward to handle caps too
78Semi-automatic Shadow Volume Extrusion (3) Vertex array memory required = 8 floats / vertexIndependent of number of lightsVertex program required is very shortMUCH less vertex array memory than fully automatic approach
79Special Case: zpass Zpass requires no projection 2 local points on silhouette1 point for extrusionat infinity for infinite lightsexternal point for local lightsperfect for TRIANGLE_FANVertex memory = 4 floats / vertexno projection per-vertex required!incompatible with automatic projection
80Shadow Volume Polygon Rendering Order (1) Naïve approachSimply render cap & projection shadow volume polygons in “memory order” of the edges and polygons in CPU memoryDisadvantagesPotentially poor rasterization locality for stencil buffer updatesTypically sub-par vertex re-useAdvantagesFriendly to memory pre-fetchingObvious, easy to implement approach
81Shadow Volume Polygon Rendering Order (2) GPU Optimized ApproachPossible silhouette edges form loopsRender the projected edge shadow volume quads in “loop order”Once you find a possible silhouette edge, search for another edge sharing a vertex – continue until you get back to your original edgeWhen you must render finite and infinite capsGreedily search for adjacent cap polygonsContinue until the cap polygons bump into possible silhouette edge loops – then look for more un-rendered capping polygons
82Shadow Volume Polygon Rendering Order (3) Why use the GPU Optimized Approach?Tends to maximize vertex re-useAvoids retransforming vertices multiple times due to poor locality of referenceMaximizes stencil update efficiencyAdjacent polygons make better use of memory b/wConvenient for optimizationszpass cap eliminationzfail infinite cap eliminationtriangle instead of quad extrusionEasy to implementOnce you locate a possible silhouette edge, it’s easy to follow the loopEasy to greedily search for adjacent finite & infinite cap polygons
83Other Shadow Volume Issues Primarily an optimization talk, but...Some other important issues that need coveragequality issue with smooth shading + shadow volumesclosed surface / 2-manifold issues and the art pipelineportability to consoles
84Shadow Volumes vs Smooth Shading interpolate vertex quantitiescreates the illusion of denser geometrycoarse circle approximationfine circle approximationvertex normalsmooth normal (interpolated)
86Geometric silhouette Geometric silhouette abruptly cuts off intensity polygons pop into shadow!Incompatible lighting models...
87How to resolve? Sometimes stencil should darken Sometimes shading should darkenOld Rule:Shade when unshadowedNew Rules:Shade when trivially self-shadowed
88Trivial Self-Shadowing Pixel is trivially self shadowing whenits polygon faces away from lightit is in exactly one shadow volume (its own)Trivially Self-Shadowed
89Implementation Alternatives No alternative is fully robustlive with poppingsometimes this is ok, but not usuallyshade trivially self-shadowed polygonsback facing light and stencil == 1some problematic shapessee clever optimization insight from “ToolTech” atdon’t self shadowinconsistent shadowinginset shadow volumeproblems knowing how much to inset...You should evaluate which has right quality/performance tradeoff
902-Manifold issues Shadow volumes must be closed convenient if occluder geometry closednot always easymany existing models are not closedadds additional constraints on modelerscan just use front faces for most geometryoften geometry not strictly closed, but looks fine with back face culling enabled“open” edges always on silhouettecredit: Madoc Evans
91Portability to Consoles Stencil is supported directly on XBox and GameCubePS2 does not support stencil directly, butuse the color buffer with additive/subtractive blendcopy as texturecan’t do zfail updatesmust invert depth buffermust invert depth of SV geometrymakes dynamically switching zpass/zfail impracticalThanks to Pål-Kristian Engstad of Naughty Dogfor PS2 info.
92Shadow Volume History (1) Invented by Frank Crow [’77]Software rendering scan-line approachBrotman and Badler [’84]Software-based depth-buffered approachUsed lots of point lights to simulate soft shadowsPixel-Planes [Fuchs, et. al. ’85] hardwareFirst hardware approachPoint within a volume, rather than ray intersectionBergeron [’96] generalizationsExplains how to handle open modelsAnd non-planar polygons
93Shadow Volume History (2) Fournier & Fussell [’88] theoryProvides theory for shadow volume counting approach within a frame bufferAkeley & Foran invent the stencil bufferIRIS GL functionality, later made part of OpenGL 1.0Patent filed in ’92Heidmann [IRIS Universe article, ’91]IRIS GL stencil buffer-based approachDeifenbach’s thesis [’96]Used stenciled volumes in multi-pass framework
94Shadow Volume History (3) Dietrich slides [March ’99] at GDCProposes Zfail based stenciled shadow volumesKilgard whitepaper [March ’99] at GDCInvert approach for planar cut-outsBilodeau slides [May ’99] at Creative seminarProposes way around near plane clipping problemsReverses depth test function to reverse stencil volume ray intersection senseCarmack [unpublished, early 2000]First detailed discussion of the equivalence of Zpass and Zfail stenciled shadow volume methods
95Shadow Volume History (4) Kilgard  at GDC and CEDEC JapanProposes Zpass capping schemeProject back-facing (w.r.t. light) geometry to the near clip plane for cappingEstablishes near plane ledge for crack-free near plane cappingApplies homogeneous coordinates (w=0) for rendering infinite shadow volume geometryRequires much CPU effort for cappingNot totally robust because CPU and GPU computations will not match exactly, resulting in cracks
96Shadow Volume History (5) Everitt & Kilgard  Integrate Multiple Ideas into a robust solutionDietrich, Bilodeau, and Carmack’s Zfail approachKilgard’s homogeneous coordinates (w=0) for rendering infinite shadow volume geometrySomewhat-obscure [Blinn ’93] infinite far plane projection matrix formulationDirectX 6’s wrapping stencil increment & decrementOpenGL’s EXT_stencil_wrap extensionNVIDIA Hardware EnhancementsDepth clamping : better depth precisionTwo-sided stencil testing : performanceDepth bounds test : performance