Presentation on theme: "Optimized Stencil Shadow Volumes"— Presentation transcript:
1Optimized Stencil Shadow Volumes Cass Everitt & Mark J. Kilgard
2Outline: Complete Lecture Review of Stenciled Shadow VolumesStenciled Shadow Volumes OptimizationsMore Information
3Outline: Review of Stenciled Shadow Volumes Shadow Volume BasicsZpass vs. ZfailInfinite Rendering Techniques
4Stenciled Shadow Volumes in Practice Notice the proper self-shadowing!
5Shadow Volume Basics A shadow volume is simply the half-space defined Shadowing objectLight sourceShadow volume(infinite extent)A shadow volume issimply the half-space definedby a light source and a shadowing object.
6Shadow Volume Basics (2) Surface outside shadow volume(illuminated)Simple rule:samples within ashadow volumeare in shadow.Surface inside shadow volume(shadowed)Partiallyshadowedobject
7Why Eye-to-Object Stencil Counting Approach Works Light sourceShadowing objectzero+1zero+2+2Eye position+1+3
14Illuminated, In Front of Shadow Volumes (Zfail) Light sourceShadowing objectzero+1zero---++++2+2Shadowed objectEye position+1+3Shadow Volume Count = = 0
15Nested Shadow Volumes Stencil Counts Beyond One Shadowed sceneStencil buffer contentsgreen = stencil value of 0red = stencil value of 1darker reds = stencil value > 1
16Animation of Stencil Buffer Updates for a Single Light’s Shadow Volumes Fully shaded sceneFinal stencil stateEvery frame is 5 additional stencil shadow volume polygon updates. Note how various intermediate stencil values do not reflect the final state.
17Shadows in a Real Game Scene Abducted game images courtesyJoe Riedel at Contraband Entertainment
19Blow-up of Shadow Detail Notice cable shadows on player modelNotice player’s own shadow on floor
20Scene’s Shadow Volume Geometric Complexity Wireframe shows geometric complexity of shadow volume geometryShadow volume geometry projects away from the light source
21Visible Geometry vs. Shadow Volume Geometry <<Visible geometryShadow volume geometryTypically, shadow volumes generate considerably more pixel updates than visible geometry
22Other Example Scenes (1 of 2) Visible geometryDramatic chase scene with shadowsShadow volume geometryAbducted game images courtesyJoe Riedel at Contraband Entertainment
23Other Example Scenes (2 of 2) Visible geometryScene with multiple light sourcesShadow volume geometryAbducted game images courtesyJoe Riedel at Contraband Entertainment
24Situations When Shadow Volumes Are Too Expensive Chain-link fence is shadow volume nightmare!Chain-link fence’s shadow appears on truck & ground with shadow mapsFuel game image courtesy Nathan d’Obrenan at Firetoad Software
25Shadow Volumes vs. Shadow Maps Shadow mapping via projective texturingThe other prominent hardware-accelerated shadow techniqueStandard part of OpenGL 1.4Shadow mapping advantagesRequires no explicit knowledge of object geometryNo 2-manifold requirements, etc.View independentShadow mapping disadvantagesSampling artifactsNot omni-directional
26Outline: Stenciled Shadow Volumes Optimizations Fill Rate OptimizationsDynamic Zpass vs. Zfail determinationBoundsScissoringDepth bounds testEfficient stencil clearsCulling OptimizationsPortal-based cullingOccluder and cap cullingSilhouette Determination OptimizationsEfficient data structures for static occluder & dynamic lightPre-computing shadow volumes for static occluders & static lightsSimplified occluder geometryShadow Volume Rendering OptimizationsVertex programs for shadow volume renderingTwo-sided stencil testingDirectional lights
27Zpass vs. Zfail (1) Zpass counting requires no caps Zpass is often more efficient (due to occlusion culling hw)zfaileyezpass
28Zpass vs Zfail (2)Zpass is hard to get right if shadow volume intersects near planezpass requirescomplex near planecapping to avoidthrowing the shadowcount off for thiswhole region
29Zpass vs Zfail (3) We can decide zpass vs zfail on a per-object basis do zfail when we have tohow do we decide?zpass (no caps)zfail (with caps)
30Zpass vs Zfail (4) How do we decide? a c b one way: near plane – light pyramidThis diagram showsthree planes in the“near plane - lightpyramid”.Any object that iscompletely outsideone of the planescan be renderedwith zpass.Use object boundingvolume for speed.acbobject is at leastpartially inside ofeach plane (zfail)object is completely outside plane “c” (zpass is ok)
31Exploit Bounds Infinite shadow volumes can cover a lot of pixels With some knowledge of bounds, we can reduce filllightsattenuationenvironment (walls/floor)shadow volumesdepth range of shadowredundant shadows
32Attenuated Light Bounds For attenuated lights, we can use the scissor test to constrain the shadow volumeworks great for occluders beside the lightfill savingsscissorrect
33Attenuated Light Bounds (2) The scissor test does not help as much with occluders in front of or behind the lightCap the shadow volume at light bounds?expensive/complicatedwastedfillscissorrect
34Light Bounds due to Environment Some environments offer big opportunities for bounding optimizationsscissorrectfill savings
35Depth Bounds TestDiscards fragments when the depth of the pixel in the depth buffer (not the depth of the fragment) is outside the specified boundsExposed in upcoming NV_depth_bounds_test extensionSimple APIglDepthBoundsNV( GLclampd zmin, GLclampd zmax );glEnable( GL_DEPTH_BOUNDS_TEST_NV );glDisable( GL_DEPTH_BOUNDS_TEST_NV );The depths given are in the same [0,1] range as arguments given to glDepthRange()
36Depth Bounds Test (2)Some depth values cannot be in shadow, so why bother to INCR and DECR their stencil index?depth boundsfill savingszminzmaxno shadowpossibledepth rangethat can bein shadow
37Computing Depth Bounds Shadow volumes are infinite, but the depth bounds for possible shadows are constraineddepth boundsconstrained byfrustumdepth boundszminzmax
38Computing Depth Bounds constrained bylight boundszminzmaxlightboundsNote that zmin and zmaxof light bounds are worstcase constraints for bounded lights.
40Culling Optimizations Shadow volume cullingCulling the shadow volume extrusionSpecialized shadow volume culling for ZfailCulling the near capsCulling the far capsPortal-based culling
41Shadow Volume Culling Conventional view frustum culling Simple idea: If object is not within view frustum, don’t draw objectUse conservative bounding geometry to efficiently determine when object is outside the view frustumOften done hierarchicallyShadow volume cullingJust because an object is not within the view frustum doesn’t mean it can’t cast a shadow that is visible within the view frustum!
42Shadow Volume Culling Example Light and occluder both outside the view frustum.But occluder still casts shadow into the view frustum.Must consider shadow volume of the occluder even though the occluder could be itself view frustum culled!
43Shadow Volume Culling Easy Case If the light source is within the view frustum and the occluder is outside the view frustum, there’s no occluder’s shadow is within the view frustum. Culling (not rendering) the shadow volume is correct in this case.Example: Shadow volume never intersects the view frustum!
44Generalized Shadow Volume Culling Rule Form the convex hull of the light position and the view frustumFor an infinite light, this could be an infinite convex hullIf your far plane is at infinity due to using a Pinf projection matrix, the view frustum is infiniteThe convex hull including the light (even if local) is also infiniteIf an occluder is within this hull, you must draw the shadow volume for an occluder
45Shadow Volume Culling Example Reconsidered (1) Light and occluder both outside the view frustum.But occluders still within convex hull formed by the light and view frustumMust consider shadow volume of the purple occluder even though the purple occluder could be itself view frustum culled!
46Shadow Volume Culling Example Reconsidered (2) Light and occluder both outside the view frustum.But purple occluders still also outside convex hull formed by the light and view frustumNo need to render shadow volume of the purple occluders or the purple occluders!Green occluder must be considered however!
47Portal-based Shadow Volume Optimizations Portal systems are convenient forquickly identifying visible thingsquickly eliminating hidden thingsFor bounded lights, we can treat the light bounds as the “visible object” we’re testing forIf the light bounds are visible, we need to process the lightIf the light bounds are invisible, we can safely skip the light
48Silhouette Determination Optimizations Pre-computing shadow volumes for static occluders & static lightsEfficient data structures for static occluder & dynamic lightSimplified occluder geometry
49Precompute Shadow Volumes for Static Occluders and Static Lights Shadows are view-independentIf occluder & light are static with respect to each other, shadow volumes can be precomputedExample: Headlights on a car have a static shadow volume w.r.t. the car itself as an occluderPrecomputation of shadow volumes can include bounding shadow volumes to the light bounds
50Efficient Data Structures for Static Occluder and Dynamic Light Interactions Brute force algorithm for possible silhouette detection inspects every edge to see if triangles joining form a possible silhouette edgeAlways works, but expensive for large modelsPerhaps the best practical approach for dynamic modelsBut static occluders could exploit precomputationSee SIGGRAPH 2000 paper “Silhouette Clipping” by Sander, Gui, Gortler, Hoppe, and SnyderCheck out the “Fast Silhouette Extraction” sectionData structure is useful for fast shadow volume possible silhouette edge determination
51Simplified Occluder Geometry More geometrically complex models can generate lots more possible silhouette edgesMore triangles in model meansMore time spent searching for possible silhouette edgesMore possible silhouette edges meansMore fill rate consumed by shadow volumesConsider substituting simplified model for shadow volume constructionSimplified substitute must “fit within” the original model
52Shadow Volume Rendering Optimizations Compute Silhouette Loops on CPUOnly render silhouette for zpassVertex transform sharingExtrude Triangles instead of quadsFor all directional dightsFor zpass SVs of local lightsWrapping stencil to avoid overflowsTwo-sided stencil testingVertex programs for shadow volume rendering
53Avoid Transforming Shadow Volume Vertices Redundantly Use GL_QUAD_STRIP rather than GL_QUADS to render extruded shadow volume quadsRequires determining possible silhouette loop connectivityOnce you find an edge, look for connected possible silhouette edge loops until you form a loop
54Shadow Volume Extrusion using Triangles or Triangle Fans Extrusion can be rendered with a GL_TRIANGLE_FANDirectional light’s shadow volume always projects to a single point at infinitySame is true for local lights – except the point is the “external” version of the light’s position (caveats)Scene with directional light.Clip-space view of shadow volume
55Shadow Volume Extrusion using Triangle Fans For Local Lights, triangle fans can be used to render zpass (uncapped) shadow volumesif the light position is (Lx, Ly, Lz, 1), the center of the triangle fan is the “external” point (-Lx, -Ly, -Lz, -1)this is simpler in terms of the geometrycan offer some fill efficiencies as wellWhat is an external triangle?A triangle where one or two vertices has w<0red vertex is internalred vertex is external
56Hardware Enhancements: Wrapping Stencil Operations Conventional OpenGL 1.0 stencil operationsGL_INCR increments and clamps to 2N-1GL_DECR decrements and clamps to zeroDirectX 6 introduced “wrapping” stencil operationsExposed by OpenGL’s EXT_stencil_wrap extensionGL_INCR_WRAP_EXT increments modulo 2NGL_DECR_WRAP_EXT decrements modulo 2NAvoids saturation throwing off the shadow volume depth countStill possible, though very rare, that 2N, 22N, 32N, etc. can alias to zerobackground
57Hardware Enhancements: Two-sided Stencil Testing (1) Current stenciled shadow volumes required rendering shadow volume geometry twiceFirst, rasterizing front-facing geometrySecond, rasterizing back-facing geometryTwo-sided stencil testing requires only one passTwo sets of stencil state: front- and back-facingBoolean enable for two-sided stencil testingWhen enabled, back-facing stencil state is used for stencil testing back-facing polygonsOtherwise, front-facing stencil state is usedRasterizes just as many fragments, but more efficient for CPU & GPUbackground
58Vertex Programs for Shadow Volumes Three techniques to considerFully automatic shadow volume extrusionEverything off-loaded to GPUQuite inefficient, not recommendedVertex normal-based extrusionProne to tessellation anomaliesSemi-automatic shadow volume extrusionCPU performs possible silhouette edge detection for each lightGPU projects out quads from single set of vertex data based on light position parameterDoom3’s approach
59Fully Automatic Shadow Volume Extrusion (1) 3 unique vertexes for each triangle of the original triangle meshInsert degenerate “edge quads” at each & every edge of the original triangle meshThis needs a LOT of extra verticesPrimary reason this technique is impracticalToo much transform & triangle setup overheadNo way to do zpass cap optimizationsNo way to do triangle extrusion optimizations
60Bloating the original triangle mesh 6Formula for geometry:vbloat = 3 * torigtbloat = torig + 2 * eorigBloated geometry basedonly on number of trianglesand edges of originalgeometry.6DD3d4d343b4bB3aB4cAC12A52bC12a2c5Original triangle mesh6 vertexes4 trianglesBloated triangle mesh12 vertexes10 trianglesA lot of extra geometry!
61Vertex Normal-based Extrusion (1) Use per-vertex normalsIf NL is greater or equal to zero, leave vertex aloneIf NL is less than zero, project vertex away from lightAdvantagesSimple, requires only per-vertex computationsMaps well to vertex programDisadvantagesToo simple – only per-vertexBest when normals approximate facet normals wellWorst on faceted, jaggy, or low-polygon count geometryNo way to do zpass cap optimizationsNo way to do triangle extrusion optimizations
64Vertex Normal-based Extrusion (4) This shadow volume would collapse with a directional light source...Facet-based SV Extrusion (correct)Vertex Normal-based SV Extrusion (incorrect)Incorrectly unshadowed region (incorrect)
65Vertex Normal-based Extrusion (3) Per-vertex normals are typically intended for lighting computations, not shadowingVertex normals convey curvature well, but not orientationShadowing really should be based on facet orientationsThat is, the normal for the triangle, not a vertexHack compromiseInsert duplicate vertices at the same position, but with different normals
66Semi-automatic Shadow Volume Extrusion (1) Possible silhouette edge detection done by CPUFor each lightWant to only have a single set of vertices for each modelAs opposed to a unique set of possible silhouette edge of vertices per light per modelVertex program projects a single set of vertices appropriate for each possible silhouette edge for any particular light
67Semi-automatic Shadow Volume Extrusion (2) Make two copies of every vertex, each with 4 components (x,y,z,w)v = (x,y,z,1)v’ = (x,y,z,0)For a possible silhouette edge with vertices p & qAssume light position LRender a quad using vertices p, q, p’, q’Vertex program computes pos = pospos.w + (posL.w - Lpos.w)(1-pos.w)Straightforward to handle caps too
68Semi-automatic Shadow Volume Extrusion (3) Vertex array memory required = 8 floats / vertexIndependent of number of lightsVertex program required is very shortMUCH less vertex array memory than fully automatic approach
69Shadow Volume Polygon Rendering Order (1) Naïve approachSimply render cap & projection shadow volume polygons in “memory order” of the edges and polygons in CPU memoryDisadvantagesPotentially poor rasterization locality for stencil buffer updatesTypically sub-par vertex re-useAdvantagesFriendly to memory pre-fetchingObvious, easy to implement approach
70Shadow Volume Polygon Rendering Order (2) GPU Optimized ApproachPossible silhouette edges form loopsRender the projected edge shadow volume quads in “loop order”Once you find a possible silhouette edge, search for another edge sharing a vertex – continue until you get back to your original edgeWhen you must render finite and infinite capsGreedily search for adjacent cap polygonsContinue until the cap polygons bump into possible silhouette edge loops – then look for more un-rendered capping polygons
71Shadow Volume Polygon Rendering Order (3) Why use the GPU Optimized Approach?Tends to maximize vertex re-useAvoids retransforming vertices multiple times due to poor locality of referenceMaximizes stencil update efficiencyAdjacent polygons make better use of memory b/wConvenient for optimizationszpass cap eliminationtriangle instead of quad extrusionEasy to implementOnce you locate a possible silhouette edge, it’s easy to follow the loopEasy to greedily search for adjacent finite & infinite cap polygons
72Shadow Volume History (1) Invented by Frank Crow [’77]Software rendering scan-line approachBrotman and Badler [’84]Software-based depth-buffered approachUsed lots of point lights to simulate soft shadowsPixel-Planes [Fuchs, et. al. ’85] hardwareFirst hardware approachPoint within a volume, rather than ray intersectionBergeron [’96] generalizationsExplains how to handle open modelsAnd non-planar polygons
73Shadow Volume History (2) Fournier & Fussell [’88] theoryProvides theory for shadow volume counting approach within a frame bufferAkeley & Foran invent the stencil bufferIRIS GL functionality, later made part of OpenGL 1.0Patent filed in ’92Heidmann [IRIS Universe article, ’91]IRIS GL stencil buffer-based approachDeifenbach’s thesis [’96]Used stenciled volumes in multi-pass framework
74Shadow Volume History (3) Dietrich slides [March ’99] at GDCProposes Zfail based stenciled shadow volumesKilgard whitepaper [March ’99] at GDCInvert approach for planar cut-outsBilodeau slides [May ’99] at Creative seminarProposes way around near plane clipping problemsReverses depth test function to reverse stencil volume ray intersection senseCarmack [unpublished, early 2000]First detailed discussion of the equivalence of Zpass and Zfail stenciled shadow volume methods
75Shadow Volume History (4) Kilgard  at GDC and CEDEC JapanProposes Zpass capping schemeProject back-facing (w.r.t. light) geometry to the near clip plane for cappingEstablishes near plane ledge for crack-free near plane cappingApplies homogeneous coordinates (w=0) for rendering infinite shadow volume geometryRequires much CPU effort for cappingNot totally robust because CPU and GPU computations will not match exactly, resulting in cracks
76Shadow Volume History (5) Everitt & Kilgard  Integrate Multiple Ideas into a robust solutionDietrich, Bilodeau, and Carmack’s Zfail approachKilgard’s homogeneous coordinates (w=0) for rendering infinite shadow volume geometrySomewhat-obscure [Blinn ’93] infinite far plane projection matrix formulationDirectX 6’s wrapping stencil increment & decrementOpenGL’s EXT_stencil_wrap extensionNVIDIA Hardware EnhancementsDepth clamping : better depth precisionTwo-sided stencil testing : performanceDepth bounds test : performance