
1 Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful
Jiri Bittner (1), Michael Wimmer (1), Harald Piringer (2), Werner Purgathofer (1)
(1) Vienna University of Technology, (2) VRVis Vienna

2 Motivation
Presenter: Michael Wimmer, Vienna University of Technology
Typical hardware occlusion culling scenario
[Figure: CPU and GPU timelines over time, with the waiting time marked. Legend: R = render, Q = occlusion query, C = cull.]

3 Occlusion Culling: Offline vs. Online
Offline
  * Global information about visibility (from region)
  - Difficult to implement
  - Accuracy and maintenance problems
  + No runtime overhead
Online
  * Local information about visibility (from point)
  + Easier to implement
  + Greater accuracy, easy maintenance
  - Runtime overhead

4 Online Occlusion Culling
Object space methods
  - Need complex geometric calculations (hard to handle detailed scenes)
  + Do not require rasterization
Image space methods
  + No geometric calculations (easier to handle detailed scenes)
  - Require rasterization

5 Hardware Occlusion Culling
Hardware is good at rasterization!
Hardware counts rasterized fragments
  * But need not update frame buffer
NV/ARB_occlusion_query
  * Asynchronous
  * Allows multiple simultaneous occlusion queries
General algorithm idea:
  * Render simple approximation first (bbox)
      invisible: cull object
      visible: render object
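The general idea on this slide can be sketched in a few lines. The following is only a toy model, not the hardware path: a one-dimensional software depth buffer stands in for the GPU, `query` plays the role of an occlusion query (it counts fragments that would pass the depth test without writing them), and each object's span doubles as its own bounding box. All names (`query`, `render`, `draw_scene`) are hypothetical.

```python
INF = float("inf")

def query(zbuf, span):
    """Count pixels of the span that would pass the depth test (no writes)."""
    x0, x1, z = span
    return sum(1 for x in range(x0, x1) if z < zbuf[x])

def render(zbuf, span):
    """Write the span into the depth buffer."""
    x0, x1, z = span
    for x in range(x0, x1):
        zbuf[x] = min(zbuf[x], z)

def draw_scene(objects, width):
    """Test each object front to back; render it only if the query finds pixels."""
    zbuf = [INF] * width
    drawn = []
    # Sort by depth so that near objects can occlude far ones.
    for name, span in sorted(objects.items(), key=lambda kv: kv[1][2]):
        if query(zbuf, span) > 0:      # visible: render object
            render(zbuf, span)
            drawn.append(name)
        # invisible: cull object (nothing to do)
    return drawn

scene = {
    "wall":   (0, 8, 1.0),    # (x0, x1, depth)
    "hidden": (2, 5, 2.0),    # entirely behind the wall
    "side":   (6, 10, 3.0),   # partly sticks out past the wall
}
print(draw_scene(scene, width=10))
```

In this toy scene the object behind the wall yields zero visible pixels and is skipped, while the partly covered one is still drawn.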

6 Hardware Occlusion Culling
Advantages
  * Pixel-exact
  * No explicit occluder rendering
  * Exploit rasterization power of GPU
  * Easy to use (API calls)
Problems
  * Delay in availability of the results
  * Time to execute queries
  * If fill-bound: only useful if several objects culled

7 Hierarchical Stop&Wait (S&W)
Front-to-back hierarchy traversal
  1. Issue visibility query for node
  2. Stop and wait for result
     * Invisible: cull the subtree
     * Visible: render, or continue with step 1 recursively
Advantage:
  * Hierarchy can cull huge subtrees
Problems:
  * Waiting causes CPU stalls and GPU starvation
  * Huge rasterization costs (especially for large interior nodes)
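A minimal sketch of the Stop&Wait traversal, under simplifying assumptions: the `visible` flag on each node is a stand-in for the result the hardware query would return, children are assumed to be stored in front-to-back order, and the blocking wait is represented only by a comment. The `Node` class and `stop_and_wait` function are hypothetical names.

```python
class Node:
    def __init__(self, name, visible, children=()):
        self.name = name
        self.visible = visible          # stand-in for the hardware query result
        self.children = list(children)  # assumed to be in front-to-back order

def stop_and_wait(root):
    """Return (rendered leaf names, number of queries issued)."""
    rendered, queries = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        queries += 1                    # 1. issue visibility query for node
        result = node.visible           # 2. stop and wait for the result
        if not result:
            continue                    # invisible: cull the whole subtree
        if node.children:
            # visible interior node: continue with step 1 for the children
            stack.extend(reversed(node.children))
        else:
            rendered.append(node.name)  # visible leaf: render
    return rendered, queries

tree = Node("root", True, [
    Node("A", True, [Node("a1", True), Node("a2", False)]),
    Node("B", False, [Node("b1", False), Node("b2", False)]),
])
print(stop_and_wait(tree))
```

The leaves under `B` are never queried because the query on `B` culls the subtree; every other node costs one query and one wait.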

8 CPU Stalls and GPU Starvation
[Figure: CPU and GPU timelines for hierarchical Stop&Wait, with the waiting time marked. Legend: Rx = render object x, Qx = query object x, Cx = cull object x.]

9 Solution: Coherent Hierarchical Culling
Scheduling based on temporal coherence
  * Skipping certain visibility tests
  * Immediate rendering of certain geometry
Clever interleaving of queries and rendering
  * Maintaining a queue of running occlusion queries
Design goal: easy implementation

10 Coherent Hierarchical Culling (CHC)
[Figure: CPU and GPU timelines under CHC. Legend: Rx = render object x, Qx = query object x, Cx = cull object x. Annotations: visible in previous frame; assume independent occlusion.]

11 CHC Algorithm Outline
Front-to-back hierarchy traversal
  1. Node handling
     * Interior node
         Previously invisible: issue visibility query
         Previously visible: continue with step 1 recursively
     * Leaf
         Issue visibility query
         Previously visible: render immediately
  2. Check availability of query results
       Invisible: propagate visibility change
       Visible: render, or continue with step 1 recursively
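The outline above can be simulated without a GPU. The sketch below is a simplified reading of the two steps, not the authors' implementation: `truly_visible` is a stand-in for the hardware query (an interior node counts as visible if any leaf below it is), a query result becomes "available" a fixed number of loop iterations after it was issued (the hypothetical `latency` parameter), and view frustum culling is omitted.

```python
from collections import deque

class Node:
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)  # children front to back
        self.parent, self.visible, self.last_visited = None, False, -1
        for c in self.children:
            c.parent = self

def truly_visible(node, visible_leaves):
    """Stand-in for the hardware query result."""
    if not node.children:
        return node.name in visible_leaves
    return any(truly_visible(c, visible_leaves) for c in node.children)

def chc_frame(root, frame_id, visible_leaves, latency=2):
    rendered, queried = [], []
    stack, queue, step = [root], deque(), 0

    def traverse(node):
        if node.children:
            stack.extend(reversed(node.children))
        elif node.name not in rendered:      # never render a leaf twice
            rendered.append(node.name)

    while stack or queue:
        step += 1
        # Step 2: consume finished queries; block only if nothing else to do.
        while queue and (step - queue[0][1] >= latency or not stack):
            node, _ = queue.popleft()
            if truly_visible(node, visible_leaves):
                n = node
                while n is not None and not n.visible:   # pull up visibility
                    n.visible, n = True, n.parent
                traverse(node)
        # Step 1: node handling.
        if stack:
            node = stack.pop()
            was_visible = node.visible and node.last_visited == frame_id - 1
            node.visible, node.last_visited = False, frame_id
            if not was_visible or not node.children:
                queried.append(node.name)    # issue query, do not wait
                queue.append((node, step))
            if was_visible:
                traverse(node)               # render or descend immediately
    return rendered, queried

root = Node("root", [Node("A", [Node("a1"), Node("a2")]),
                     Node("B", [Node("b1"), Node("b2")])])
frame0 = chc_frame(root, 0, {"a1"})
frame1 = chc_frame(root, 1, {"a1"})
print(frame0, frame1)
```

In the first frame nothing is known, so the traversal queries its way down from the root. In the second frame the previously visible interior nodes `root` and `A` are descended without queries, and the previously visible leaf `a1` is rendered right away while its query is still pending.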

12 Why Interleaving Works…
Processing a node only depends on…
  1. Front-to-back order
  2. Results of queries for processed nodes where:

Previous frame: processed node -> current node        S&W   CHC
visible -> visible                                    yes   no
visible -> invisible                                  yes   no
invisible -> visible                                  yes   no
invisible -> invisible (different subtrees)           yes   no
invisible -> invisible (parent -> child,              yes   yes
  refinement of visibility)

13 CHC: Hierarchy Traversal
[Figure: example hierarchy with nodes marked as previously visible or previously invisible and numbered in front-to-back order. Annotations: no queries for previously visible interior nodes; assume no query dependencies; hidden regions: queries depend on parents.]

14 CHC Features
Reduction of CPU stalls and GPU starvation
  * Interleaving queries with rendering previously visible geometry
Reduction of the number of queries
  * Avoids expensive redundant queries for interior nodes
  * Size of tested regions adapts to visibility
      pull-up: occluded region growing
      pull-down: visible region growing

15 Implementation Issues
Front-to-back traversal
  * Priority queue: allows various hierarchical data structures
Checking query results
  * glGetOcclusionQueryivNV
  * GL_PIXEL_COUNT_AVAILABLE_NV
  * Very cheap operation
Queries for previously visible nodes
  * Use actual geometry as occludee (instead of bounding box)
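The priority-queue traversal mentioned on this slide might look as follows. This is a sketch under assumptions the slide does not state: nodes are plain dictionaries with a 2D axis-aligned box `(x0, y0, x1, y1)`, and the priority is the distance from the viewpoint to the nearest point of the box. The helper names `box_distance` and `front_to_back` are hypothetical. Because only a box distance is needed, the same loop would work for a kD-tree, a BVH, or an octree.

```python
import heapq
from itertools import count

def box_distance(box, eye):
    """Distance from the eye to the nearest point of an axis-aligned box."""
    x0, y0, x1, y1 = box
    dx = max(x0 - eye[0], 0.0, eye[0] - x1)
    dy = max(y0 - eye[1], 0.0, eye[1] - y1)
    return (dx * dx + dy * dy) ** 0.5

def front_to_back(root, eye):
    """Visit hierarchy nodes in order of increasing distance from the eye."""
    tie = count()  # tie-breaker so the heap never has to compare dict nodes
    heap = [(box_distance(root["box"], eye), next(tie), root)]
    order = []
    while heap:
        _, _, node = heapq.heappop(heap)
        order.append(node["name"])
        for child in node.get("children", ()):
            heapq.heappush(heap, (box_distance(child["box"], eye), next(tie), child))
    return order

hierarchy = {
    "name": "root", "box": (0, 0, 10, 10),
    "children": [
        {"name": "left",  "box": (0, 0, 4, 10)},
        {"name": "right", "box": (6, 0, 10, 10)},
    ],
}
print(front_to_back(hierarchy, eye=(12, 5)))
print(front_to_back(hierarchy, eye=(-1, 5)))
```

Moving the viewpoint from one side of the scene to the other flips the order in which the two children are visited, without any change to the hierarchy itself.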

16 Further Optimizations
Conservative visibility testing
  * Assume visible node remains visible n frames
  + Saves additional occlusion queries
Approximate visibility
  * #visible pixels < threshold -> node invisible
  + Saves rendered geometry
  - Produces image errors
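Both optimizations reduce to small predicates. The sketch below is one possible reading: `should_query` skips the test for n frames after a node was last found visible, and `is_visible` applies the pixel threshold from the slide (so a threshold of 1 corresponds to exact visibility). The function names and the bookkeeping via `last_visible_frame` are assumptions for illustration.

```python
def should_query(frame_id, last_visible_frame, assume_frames):
    """Conservative testing: skip queries for n frames after a visible result."""
    if last_visible_frame is None:      # never found visible so far
        return True
    return frame_id - last_visible_frame > assume_frames

def is_visible(visible_pixels, threshold=1):
    """Approximate visibility: fewer pixels than the threshold counts as invisible."""
    return visible_pixels >= threshold

def frames_with_query(num_frames, assume_frames):
    """Frames in which a permanently visible node would still be queried."""
    last_visible, frames = None, []
    for frame_id in range(num_frames):
        if should_query(frame_id, last_visible, assume_frames):
            frames.append(frame_id)
            last_visible = frame_id     # the query reports the node as visible
    return frames

print(frames_with_query(10, assume_frames=2))
print(is_visible(24, threshold=25), is_visible(25, threshold=25))
```

With n = 2, a node that stays visible is queried only in every third frame in this model. The threshold variant trades correctness for speed: a node showing only a handful of pixels is dropped, which is where the image errors noted on the slide come from.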

17 Results – Test Scenes
Scene         Triangles   kD-tree nodes
Teapots       11.5M       21k
City          1M          33k
Power plant   12.7M       18.7k

18 Results – Speedup
Ideal: zero overhead – render only visible geometry

19 Results – Summary
Comparison to hierarchical S&W
  * #queries reduced by a factor of almost 2
  * Times for stalls reduced by 20-60x (to 0.18–1.31 ms)
Close to ideal algorithm!
  * Only 2–9 ms slower
  * Overhead due to query time

20 Results – Teapot

21 Results – City

22 Results – Powerplant

23 Optimization Results
Conservative culling, 2 frames assumed visible
  * Good for deep hierarchies with simple leaf geometry
  * Further speedup up to 21%
Approximate culling, 25 pixels threshold
  * Good for scenes with complex visible geometry
  * Further speedup up to 33%

24 Conclusion
Efficient scheduling of hardware occlusion queries
  * Greatly reduces CPU stalls and GPU starvation
  * Reduces number of required queries
Simple to implement
Arbitrary hierarchical data structure
Speedup of ~4 over view frustum culling (VFC)
Close to ideal solution for tested scenes
Watch out for GPU Gems II

25 Thanks for Your Attention

26 CHC: Example
[Animated figure: step-by-step CHC traversal of an example hierarchy, showing the query queue, the issued queries, and the GPU command stream. Step annotations: previously visible: continue with step 1 recursively; previously visible: render; previously visible: issue query + render; previously invisible: query; query result available: continue with step 1 recursively; query result available: render; query result available: cull; query result available: mark visible; pull-up invisibility; final classification.]

