1 Parallel Occlusion Culling for Interactive Walkthrough using Multiple GPUs Naga K Govindaraju, Avneesh Sud, Sun-Eui Yoon, Dinesh Manocha University of North Carolina-Chapel Hill http://gamma.cs.unc.edu/switch

2 UNC Chapel Hill Avneesh Sud Goal Interactive Walkthrough of complex 3D environments at high fidelity –Models from CAD, VR –High primitive count –Heterogeneous geometry –Irregular distribution –No large occluders

3 DoubleEagle Tanker Model 82 million triangles 127,000 objects

4 SWITCH A parallel algorithm and system for interactive rendering of large complex environments Integrates Hierarchical LODs and conservative Occlusion Culling Generic models –No assumptions on model, distribution Computation done on GPUs

5 Previous Work Geometric Simplification Occlusion Culling Parallel Approaches Hybrid Approaches

6 Previous Work UNC MMR System [Aliaga99] –Used image based imposters, occlusion culling, LODs UNC GigaWalk [Baxter02] –Uses 2 graphics pipelines and multiple CPUs

7 Outline SWITCH Overview Scene Representation Parallel Algorithm and Architecture Implementation & Results Conclusions & Future Work

8 Overview A parallel algorithm and system for interactive rendering of large complex environments Integrates Hierarchical LODs and conservative Occlusion Culling Parallel Occlusion Culling on separate GPUs Graphics hardware optimizations Low network bandwidth requirements General and automatic preprocessing algorithm

9 Overview: Parallel Occlusion Culling Two pass version of Hierarchical Z-Buffer [Greene93] Exploits temporal coherence Works on generic models, conservative to image precision Avoid readback by ‘switching’ between 2 GPUs

10 Outline SWITCH Overview Scene Representation Parallel Algorithm and Architecture Implementation & Results Conclusions & Future Work

11 Scene Representation Computing appropriate spatial representation from a functional representation is non-trivial An object varies from a small bolt to a large pipe structure Redefine objects by partitioning and clustering

12 Criteria for Hierarchy Good spatial localization Object size Balanced trees Minimal bounding box overlap of sibling nodes SWITCH: A hybrid approach combining top-down partitioning with bottom-up clustering is used

13 Partitioning and Clustering Partitioning splits large objects into multiple objects –Do not split polygons Clustering groups objects with low polygon counts based on spatial proximity The combination redistributes geometry with good localization and object size

14 Partitioning & Clustering: Results Powerplant: Original Objects Powerplant: Clustered Objects

15 Partitioning & Clustering: Results DoubleEagle: Original Objects DoubleEagle: Clustered Objects

16 Unified Hierarchy Objects are organized into a scene graph hierarchy Single unified hierarchy used for occlusion culling and LOD-based rendering –Low storage overhead –Simple conservative occlusion culling algorithm SWITCH: A top-down AABB bounding volume hierarchy is constructed from redefined objects
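
A minimal sketch of building a top-down AABB tree over redefined objects, in Python. The median split along the longest axis, the `leaf_size` parameter, and every name below are assumptions made for illustration; the slides do not state which split rule SWITCH uses.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# An axis-aligned box is (min_corner, max_corner).
Box = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def union(boxes: List[Box]) -> Box:
    """Smallest axis-aligned box enclosing all the given boxes."""
    mins = tuple(min(b[0][k] for b in boxes) for k in range(3))
    maxs = tuple(max(b[1][k] for b in boxes) for k in range(3))
    return (mins, maxs)


@dataclass
class Node:
    box: Box
    objects: List[int]  # indices of the objects contained in this subtree
    children: List["Node"] = field(default_factory=list)


def build_aabb_tree(boxes: List[Box], ids=None, leaf_size: int = 1) -> Node:
    """Top-down build: bound the objects, then split them into two halves."""
    if ids is None:
        ids = list(range(len(boxes)))
    node = Node(union([boxes[i] for i in ids]), list(ids))
    if len(ids) <= max(leaf_size, 1):
        return node
    lo, hi = node.box
    # Assumed heuristic: split along the longest axis at the median centroid.
    axis = max(range(3), key=lambda k: hi[k] - lo[k])
    ordered = sorted(ids, key=lambda i: boxes[i][0][axis] + boxes[i][1][axis])
    mid = len(ordered) // 2
    node.children = [
        build_aabb_tree(boxes, ordered[:mid], leaf_size),
        build_aabb_tree(boxes, ordered[mid:], leaf_size),
    ]
    return node
```

A median split keeps the tree balanced, which is one of the hierarchy criteria listed earlier; it does nothing by itself to minimize sibling overlap.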

17 HLOD Generation Construct Hierarchical LODs of the AABB scene graph as in [Erikson01] Use GAPS simplification algorithm [Erikson99] HLOD generation is done out-of-core –Store only the LODs of current node and immediate children in main memory

18 Hierarchical Occluders A hierarchical occluder associated with a node is an approximation of the group of occluders in its subtree HLODs provide a lower polygon count approximation of a group of occluders and so serve as hierarchical occluders Perform object space occluder fusion Conservative occlusion culling

19 Outline SWITCH Overview Scene Representation Parallel Algorithm and Architecture Implementation & Results Conclusions & Future Work

20 Parallel Algorithm and Architecture Three processes in parallel 1. Occluder Rendering (OR): Renders occluder set to depth buffer on GPU1 2. Hardware Culling (HC): Computes visible geometry using hardware occlusion query on GPU2 3. Render Visible Geometry (RVG): Renders visible geometry on GPU3

21 System Timing/Data Flow [Diagram: GPU 1, GPU 2 and GPU 3 running in parallel over Frame i, Frame i+1 and Frame i+2. Render Occluders (OR) for Frames i+1, i+2, i+3. Hardware Cull (HC) for Frames i, i+1, i+2. Display Geometry (RVG) for Frames i, i+1, i+2. Diagram labels: Z-Buffer, SWITCH]
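
A minimal sketch of the role switching suggested by the timing diagram, in Python. It assumes that the two culling GPUs swap the OR and HC roles every time slot, so that the GPU which rendered the occluders for a frame already holds that frame's Z-buffer when it culls, and that RVG for a frame shares a slot with HC for that frame, as the diagram labels suggest. The function name and the slot model are hypothetical.

```python
def switch_schedule(first_frame: int, num_slots: int):
    """Return, for each time slot, the (stage, frame) assigned to each GPU."""
    slots = []
    for s in range(num_slots):
        frame = first_frame + s
        # GPU1 and GPU2 swap roles every slot. The GPU that did OR for `frame`
        # in the previous slot performs HC for `frame` now, so the depth
        # buffer never has to be read back or sent over the network.
        if s % 2 == 0:
            culler, occluder = "GPU1", "GPU2"
        else:
            culler, occluder = "GPU2", "GPU1"
        slots.append({
            culler: ("HC", frame),         # cull against its own Z-buffer
            occluder: ("OR", frame + 1),   # prepare occluders for next frame
            "GPU3": ("RVG", frame),        # render the visible geometry
        })
    return slots


if __name__ == "__main__":
    for slot in switch_schedule(0, 3):
        print(slot)
```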

22 Conservative Occlusion Culling Underlying HZB algorithm used for occlusion culling is conservative to image precision Exactly the same set of LODs is used for both OR and STC stages –Z buffer used for culling is consistent with the geometry

23 Hardware Culling Use GL_NV_OCCLUSION_QUERY to determine visible pixels Traverse scene hierarchy rendering bounding boxes of nodes

24 LOD Selection Pixel Error Metric: Max normal deviation of silhouette in image Traverse down scene graph till error satisfied Upper Bound: Highly conservative [Figures: DE Engine Room, 1K x 1K, 20 PE; Error Image]
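
A minimal sketch of the "traverse down the scene graph till the error is satisfied" rule, in Python. The `pixel_error` field is a hypothetical stand-in for the screen-space error of a node's HLOD under the current view; how SWITCH projects the error to pixels is not shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HlodNode:
    name: str
    pixel_error: float  # hypothetical: projected error of this node's HLOD
    children: List["HlodNode"] = field(default_factory=list)


def select_lods(node: HlodNode, max_pixel_error: float) -> List[str]:
    """Return the names of the nodes whose HLODs would be rendered."""
    # Stop as soon as the node's HLOD is accurate enough, or when there is
    # nothing finer to descend into (a leaf is rendered as-is).
    if node.pixel_error <= max_pixel_error or not node.children:
        return [node.name]
    selected: List[str] = []
    for child in node.children:
        selected.extend(select_lods(child, max_pixel_error))
    return selected
```

Raising `max_pixel_error` stops the traversal higher in the tree, which trades fidelity for a lower polygon count.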

25 GPU Optimizations Multiple Occlusion Tests –Occlusion Query ‘counter’ for each node –Traverse scene graph breadth first –Batch queries for all nodes at a level –40% faster than testing one node with GL_HP_OCCLUSION_TEST
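
A minimal CPU-side sketch of the breadth-first, per-level batching described above, in Python. No OpenGL is involved: `batch_query` is a hypothetical callback standing in for "issue one occlusion query per bounding box in the level, then read all the counters back", and `visible_pixels` is mock data standing in for what the GPU would return.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    name: str
    visible_pixels: int  # mock result of the query on this node's bounding box
    children: List["Node"] = field(default_factory=list)


def cull_breadth_first(
    root: Node, batch_query: Callable[[List[Node]], List[int]]
) -> Tuple[List[str], int]:
    """Return (names of visible leaves, number of query batches issued)."""
    visible_leaves: List[str] = []
    level = [root]
    num_batches = 0
    while level:
        counts = batch_query(level)  # one batch for the whole level
        num_batches += 1
        next_level: List[Node] = []
        for node, pixels in zip(level, counts):
            if pixels == 0:
                continue  # bounding box fully occluded: cull the subtree
            if node.children:
                next_level.extend(node.children)
            else:
                visible_leaves.append(node.name)
        level = next_level
    return visible_leaves, num_batches


def mock_gpu(nodes: List[Node]) -> List[int]:
    """Pretend GPU: returns the stored pixel count for each queried box."""
    return [n.visible_pixels for n in nodes]
```

Batching per level means the results are collected once per tree level rather than once per node, which is the property the slide credits for the speedup.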

26 GPU Optimizations Visibility for LOD selection –Visible pixels of bounding box > visible pixels of geometry –Number of visible pixels less than error metric => early termination condition –Provides looser bounds, which reduces polygon count
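
A minimal sketch of how the visible-pixel count could feed LOD selection, in Python. It assumes refinement stops when a node's bounding box shows fewer visible pixels than the pixel-error threshold, in addition to the usual error test. Both per-node fields are hypothetical mock values, and the exact rule SWITCH applies may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    box_visible_pixels: int  # mock occlusion query result for the bounding box
    pixel_error: float       # hypothetical projected error of the node's HLOD
    children: List["Node"] = field(default_factory=list)


def select_visible_lods(node: Node, max_pixel_error: float) -> List[str]:
    """Cull occluded nodes and stop refining barely visible ones."""
    if node.box_visible_pixels == 0:
        return []  # fully occluded
    # The box covers at least as many visible pixels as the geometry inside,
    # so a small box count bounds how much finer detail could be seen.
    barely_visible = node.box_visible_pixels < max_pixel_error
    accurate_enough = node.pixel_error <= max_pixel_error
    if barely_visible or accurate_enough or not node.children:
        return [node.name]
    selected: List[str] = []
    for child in node.children:
        selected.extend(select_visible_lods(child, max_pixel_error))
    return selected
```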

27 Bandwidth Requirements

28 Load Balancing Trade off between cluster size and culling efficiency Smaller clusters lead to deeper scene graph but improve culling performance Balances load between culling and rendering

29 Outline SWITCH Overview Scene Representation Parallel Algorithm and Architecture Implementation & Results Conclusions & Future Work

30 Implementation 3 Dell Precision Workstations with dual 2GHz Pentium4 CPUs, GeForce4 GPU, and 2GB main memory Network: –Implementation 1: TCP/IP over 100Mbps Fast Ethernet –Implementation 2: TCP/IP over Myrinet

31 Test Model: Powerplant Original: 0.5 Gigabyte dataset, 13 Million Polygons, 1200 objects. Preprocessing: 7 hours. After preprocessing: 1.2 Gigabytes, 13 Million Polygons, 38,000 objects

32 Test Model: DoubleEagle Tanker Original: 4 Gigabyte dataset, 82 Million polygons, 127,000 objects. Preprocessing: 34 hours. After preprocessing: 8 Gigabytes, 82 Million polygons, 61,000 objects

33 Video

34 Video

35 Results: Frame Rate Powerplant Model 1024 x 1024 with 10 pixels of error using Ethernet

36 Results: Frame Rate DoubleEagle Model 1024 x 1024 with 20 pixels of error using Ethernet

37 Results: Culling Performance DoubleEagle Tanker Model: Object Count

38 Results: Culling Performance DoubleEagle Tanker Model: Polygon Count

39 Conclusions Able to interactively render large complex environments with good fidelity Integrates LODs and Occlusion Culling in a general, automatic parallel rendering algorithm A parallel architecture to balance load between 3 GPUs Efficient use of graphics hardware to solve geometric queries A unified scene hierarchy and automatic preprocessing for a generic model Introduces an end-to-end latency of 1 frame

40 Lessons Learned Parallelism –2 pipelines provide a speedup greater than a factor of 2 for complex scenes Load Times –Asynchronous on-demand loading of geometry vastly improves system development and testing

41 Limitations and Future Work Static LODs lead to popping. Extend to a view-dependent framework An out-of-core algorithm to reduce main memory overhead as in [Varadhan02] Improve performance by reducing network latencies Make more novel uses of graphics hardware Target frame-rate rendering mode Drive large immersive displays

42 Wish List Multiple graphics cards on one motherboard NV_OCCLUSION_QUERY to also return completely visible / partially visible / completely occluded

43 Association with NVIDIA Obtained pre-release versions of drivers with NV_OCCLUSION_QUERY An NV_OCCLUSION_QUERY bug in the Linux drivers was addressed quickly

44 Acknowledgements US ONR US ARO US DOE US NSF NVIDIA Corporation Intel Corporation NNS for the DoubleEagle model UNC Walkthrough group

45 The End

46 Hierarchy Generation [Figure panels: (a) Original, (b) Partitioned-I, (c) Clustered, (d) Partitioned-II] (e) Compute a top-down AABB tree hierarchy on redefined objects

47 Performance Tuning Using visible geometry from 2 frames previous avoids bubbles in pipeline Tradeoff between fidelity and frame rate by adjusting pixels of error Asynchronous rendering pipeline Nth farthest Z buffer values Lower HZB resolution for occluder rendering

48 Video

49 Results: Frame Rate Powerplant Model 640 x 480 with 10 pixels of error on SGI

50 Results: Culling Performance Powerplant Model: Object Count

51 Results: Culling Performance Powerplant Model: Polygon Count

52 Outline Previous Work SWITCH Overview Scene Representation Parallel Algorithm and Architecture Implementation & Results Conclusions & Future Work

53 Previous Work: Geometric Simplification Surveyed in [Luebke01] Static vs. View-Dependent Trouble with high depth complexity

54 Previous Work: Occlusion Culling Surveyed in [Cohen-Or01] Specific Environments General Algorithms

55 Previous Work: Occlusion Culling Surveyed in [Cohen-Or01] Specific Environments –Cells and Portals [Airey90] –Urban Datasets [Wonka00, Coorg97] –Large Occluders [Schaufler00] General Algorithms

56 Previous Work: Occlusion Culling Surveyed in [Cohen-Or01] Specific Environments General Algorithms –HZB [Greene93], HOM [Zhang97]

57 Previous Work: Occlusion Culling Surveyed in [Cohen-Or01] Specific Environments General Algorithms Performing exact visibility on large general datasets in real time is difficult Trouble with highly tessellated scenes

58 Previous Work: Parallel Approaches Object-Parallel, Screen-Parallel, Frame-Parallel Interactive ray tracing [Wald01] Perform culling in parallel with rendering –VFC in [Garlick90] –Occlusion Culling by occluder shrinking in [Wonka01] Scalable clusters, WireGL [Humphreys01]

59 Previous Work: Hybrid Approaches Combine LOD and Occlusion Culling techniques –UC Berkeley Walkthrough [Funkhouser96] –Synthetic convex occluders [Andujar01] –Approximate visibility using prioritized layer projections with view dependent rendering [ElSana01] –UNC MMR system [Aliaga99] Not demonstrated in high fidelity on complex CAD models

60 Criteria for Hierarchy Good spatial localization

62 Criteria for Hierarchy Object size –Too large: loose bounding boxes, poor culling performance –Too small: very deep trees

63 Criteria for Hierarchy Balanced trees

65 Criteria for Hierarchy Minimal bounding box overlap of sibling nodes

67 Clustering Clustering algorithm adapted from an image segmentation technique [FH98] MSTs to represent clusters Similar to Kruskal’s algorithm –Euclidean distance between clusters denotes edge weights –Edge weights represent variation in a cluster –2 clusters combined based on Hausdorff metric
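
A minimal sketch of Kruskal-style clustering with a union-find structure, in Python. It uses the merge test from the graph-based image segmentation method that [FH98] appears to refer to: merge two clusters when the connecting edge is no heavier than the internal variation of either cluster plus a size-dependent slack. The constant `k`, the use of object centroids as points, and all names are assumptions; the Hausdorff-based merge test mentioned on the slide is not reproduced here.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cluster(points: Sequence[Sequence[float]], k: float) -> List[List[int]]:
    """Group point indices into clusters; larger k favors larger clusters."""
    n = len(points)
    parent = list(range(n))
    size = [1] * n
    internal = [0.0] * n  # heaviest MST edge inside each cluster so far

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # As in Kruskal's algorithm, visit edges in order of increasing weight.
    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(range(n), 2)
    )
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        threshold = min(internal[ra] + k / size[ra],
                        internal[rb] + k / size[rb])
        if w <= threshold:
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = w  # edges arrive sorted, so w is the new maximum

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

This sketch considers every pair of points, so it is quadratic in the number of objects; a real preprocessing pass would likely restrict edges to spatial neighbors.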

