1
GH05 KD-Tree Acceleration Structures for a GPU Raytracer Tim Foley, Jeremy Sugerman Stanford University
2
Motivation
Accelerated raytracing
  – On commodity HW
  – Production rendering
  – Real-time applications?
Performance trend
  – 9800 XT: 170M ray-triangle intersects/s
  – X800 XT PE: 350M ray-triangle intersects/s
3
GPU Raytracing
Promising early results
  – Simple scenes
Uniform grid
  – Problems with complex scenes
Hierarchical accelerator (kd-tree)
  – Improve scalability
4
Outline
Background
  – GPU Raytracing
  – KD-Tree Algorithm
KD-Restart, KD-Backtrack
Results
Future Work
5
Background
RayEngine [Carr et al. 2002]
  – Parallel ray-triangle intersection
  – Host controls culling
[Purcell et al. 2002]
  – Entire raytracing pipeline
  – Many rays required for efficiency
  – Uniform Grid
6
Why not KD-Tree?
Uniform grid acceleration structure
  – Regular structure = efficient traversal
  – Regular structure = poor partitioning
KD-Trees
  – Adapt to scene complexity
  – Compact storage, efficient traversal
  – “Best” for CPU raytracing [Havran 2000]
7
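KD-Tree
[Figure: scene regions A, B, C, D with split planes X, Y, Z, and the corresponding tree with inner nodes X, Y, Z and leaves A, B, C, D; the ray interval is labeled tmin and tmax.]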
8
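KD-Tree Traversal
[Figure: scene regions A, B, C, D with split planes X, Y, Z, and the corresponding tree with inner nodes X, Y, Z and leaves A, B, C, D.]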
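For reference, a minimal sketch of conventional stack-based kd-tree traversal, the baseline the stackless variants modify. The `Node` class, the 2-D tree layout, and the function name `kd_traverse` are assumptions made for illustration; they are not taken from the slides or from the authors' Brook code, and the tree only loosely mirrors the X/Y/Z, A-D labels in the figure.

```python
import math

class Node:
    """Leaf when axis < 0; otherwise an inner node splitting `axis` at `split`."""
    def __init__(self, name, axis=-1, split=0.0, left=None, right=None):
        self.name, self.axis, self.split = name, axis, split
        self.left, self.right = left, right   # left covers coordinates below split

def kd_traverse(root, origin, direction, t_min, t_max):
    """Return leaf names in the order the ray visits them.

    Assumes 0 <= t_min < t_max and that [t_min, t_max] is the ray clipped
    to the scene bounds.
    """
    leaves = []
    stack = [(root, t_min, t_max)]
    while stack:
        node, t_lo, t_hi = stack.pop()
        while node.axis >= 0:
            o, d = origin[node.axis], direction[node.axis]
            below = o < node.split or (o == node.split and d <= 0.0)
            near, far = (node.left, node.right) if below else (node.right, node.left)
            t_split = (node.split - o) / d if d != 0.0 else math.inf
            if t_split >= t_hi or t_split <= 0.0:
                node = near                          # only the near child is crossed
            elif t_split <= t_lo:
                node = far                           # only the far child is crossed
            else:
                stack.append((far, t_split, t_hi))   # the per-ray push
                node, t_hi = near, t_split
        leaves.append(node.name)
        # A real tracer would test the leaf's triangles here and stop on a hit.
    return leaves

# Hypothetical tree: X splits x at 0.5; Y and Z split y at 0.5.
root = Node("X", 0, 0.5,
            Node("Y", 1, 0.5, Node("A"), Node("B")),
            Node("Z", 1, 0.5, Node("C"), Node("D")))

print(kd_traverse(root, (0.0, 0.125), (1.0, 0.5), 0.0, 1.0))  # ['A', 'C', 'D']
```

The `stack.append` call is the only write whose location depends on per-ray state, which is why it is the part that is hard to express in a fragment program.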
9
Per-Fragment Stacks
Parallel (per-ray) push
  – No indexed write in fragment program
Per-ray stack storage [Ernst et al. 2004]
  – Emulate push with extra passes
  – Impractical, slow
10
Our Contribution
Stackless kd-tree traversal algorithms
  – KD-Restart
  – KD-Backtrack
11
Observation
[Figure: scene regions A, B, C, D with split planes X, Y, Z, and the corresponding tree.]
Current leaf’s tmax = next leaf’s tmin
12
KD-Restart
[Figure: scene regions A, B, C, D with split planes X, Y, Z.]
Standard traversal
  – Omit stack operations
  – Proceed to 1st leaf
If no intersection
  – Advance (tmin, tmax)
  – Restart from root
Proceed to next leaf
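The steps on this slide can be sketched as follows. This is an illustrative CPU version under stated assumptions, not the authors' GPU kernels: the `Node` class, the 2-D example tree, and the name `kd_restart` are hypothetical, and the leaf intersection test is left as a comment.

```python
import math

class Node:
    """Leaf when axis < 0; otherwise an inner node splitting `axis` at `split`."""
    def __init__(self, name, axis=-1, split=0.0, left=None, right=None):
        self.name, self.axis, self.split = name, axis, split
        self.left, self.right = left, right   # left covers coordinates below split

def kd_restart(root, origin, direction, t_min, t_max):
    """Return (leaf name, t_enter, t_exit) for each leaf the ray passes through.

    Assumes 0 <= t_min < t_max and that [t_min, t_max] is the ray clipped
    to the scene bounds.
    """
    segments = []
    t_lo = t_min
    while t_lo < t_max:
        node, t_hi = root, t_max              # restart from the root
        while node.axis >= 0:
            o, d = origin[node.axis], direction[node.axis]
            below = o < node.split or (o == node.split and d <= 0.0)
            near, far = (node.left, node.right) if below else (node.right, node.left)
            t_split = (node.split - o) / d if d != 0.0 else math.inf
            if t_split >= t_hi or t_split <= 0.0:
                node = near                   # segment stays on the near side
            elif t_split <= t_lo:
                node = far                    # segment is already past the plane
            else:
                node, t_hi = near, t_split    # no push of `far`; only shrink t_hi
        segments.append((node.name, t_lo, t_hi))
        # A real tracer would test the leaf's triangles here and return on a hit.
        t_lo = t_hi                           # next leaf's tmin = this leaf's tmax
    return segments

# Hypothetical tree: X splits x at 0.5; Y and Z split y at 0.5.
root = Node("X", 0, 0.5,
            Node("Y", 1, 0.5, Node("A"), Node("B")),
            Node("Z", 1, 0.5, Node("C"), Node("D")))

print(kd_restart(root, (0.0, 0.125), (1.0, 0.5), 0.0, 1.0))
# [('A', 0.0, 0.5), ('C', 0.5, 0.75), ('D', 0.75, 1.0)]
```

The comparison `t_split <= t_lo` is deliberately non-strict: after a restart, `t_lo` sits exactly on the plane the ray just crossed, and a strict test would descend into the same leaf again with an empty interval.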
13
KD-Restart
Restart traversal after each leaf
  – m leaves
  – Average depth d
  – Cost O(m*d)
Balanced tree of n nodes
  – Upper bound: O(n log(n))
Standard algorithm: O(n)
  – Expected: O(log(n))
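The upper bound can be read as a short derivation. The symbols $C_{\text{restart}}$ (inner-node visits per ray) and $d_i$ (depth of the $i$-th visited leaf) are notation introduced here, not taken from the slides; the leaf count uses the fact that a binary tree whose inner nodes all have two children has $(n+1)/2$ leaves.

```latex
C_{\text{restart}} \;=\; \sum_{i=1}^{m} d_i \;=\; m\,d,
\qquad
m \le \frac{n+1}{2},\quad d_i = O(\log n)
\;\;\Longrightarrow\;\;
C_{\text{restart}} = O(n \log n).
```

If a typical ray crosses a number of leaves that does not grow with $n$, then $m = O(1)$ and the cost reduces to $O(\log n)$.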
14
Observation
[Figure: scene regions A, B, C, D with split planes X, Y, Z, and the corresponding tree.]
Ancestor of A is parent of Z
15
KD-Backtrack
[Figure: scene regions A, B, C, D with split planes X, Y, Z.]
If no intersection
  – Advance (tmin, tmax)
  – Start backtracking
If node intersects (tmin, tmax)
  – Resume traversal
Proceed to next leaf
16
KD-Backtrack
Backtrack after leaf
  – Revisits previous nodes
  – At most twice: from left, right
Within constant factor of standard traversal
  – Upper bound: O(n)
  – Expected: O(log(n))
Requires additional storage
  – Parent pointers
  – Bounding boxes for internal nodes
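A minimal sketch of the backtracking idea, using the two pieces of extra storage listed above (parent pointers and bounding boxes on internal nodes). The `Node` class, the helpers `clip` and `descend`, the 2-D example tree, and the rule "resume at the nearest ancestor whose box still overlaps the remaining interval" are one reading of the slides, written for illustration; they are not the authors' implementation.

```python
import math

class Node:
    """Leaf when axis < 0. Inner nodes store a box as (min corner, max corner)."""
    def __init__(self, name, axis=-1, split=0.0, box=None, left=None, right=None):
        self.name, self.axis, self.split, self.box = name, axis, split, box
        self.left, self.right, self.parent = left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self           # parent pointer used when backtracking

def clip(box, origin, direction):
    """Slab test: return the (t_enter, t_exit) range where the ray is inside box."""
    t_enter, t_exit = -math.inf, math.inf
    for lo, hi, o, d in zip(box[0], box[1], origin, direction):
        if d == 0.0:
            if o < lo or o > hi:
                return math.inf, -math.inf    # parallel to the slab and outside it
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        t_enter, t_exit = max(t_enter, min(t0, t1)), min(t_exit, max(t0, t1))
    return t_enter, t_exit

def descend(node, origin, direction, t_lo, t_hi):
    """Stackless downward traversal to the first leaf overlapping [t_lo, t_hi]."""
    while node.axis >= 0:
        o, d = origin[node.axis], direction[node.axis]
        below = o < node.split or (o == node.split and d <= 0.0)
        near, far = (node.left, node.right) if below else (node.right, node.left)
        t_split = (node.split - o) / d if d != 0.0 else math.inf
        if t_split >= t_hi or t_split <= 0.0:
            node = near
        elif t_split <= t_lo:
            node = far
        else:
            node, t_hi = near, t_split
    return node, t_hi

def kd_backtrack(root, origin, direction, t_min, t_max):
    """Return visited leaf names; assumes 0 <= t_min < t_max within the scene box."""
    leaves, node, t_lo, t_hi = [], root, t_min, t_max
    while True:
        leaf, t_hi = descend(node, origin, direction, t_lo, t_hi)
        leaves.append(leaf.name)
        # A real tracer would test the leaf's triangles here and return on a hit.
        t_lo = t_hi                           # advance (tmin, tmax)
        if t_lo >= t_max:
            return leaves
        node = leaf.parent                    # start backtracking
        while node is not None:
            _, t_exit = clip(node.box, origin, direction)
            if t_exit > t_lo:                 # node intersects the remaining interval
                break
            node = node.parent
        if node is None:
            return leaves
        t_hi = min(t_exit, t_max)             # resume traversal inside this node

# Hypothetical tree: X splits x at 0.5; Y and Z split y at 0.5.
root = Node("X", 0, 0.5, ((0.0, 0.0), (1.0, 1.0)),
            Node("Y", 1, 0.5, ((0.0, 0.0), (0.5, 1.0)), Node("A"), Node("B")),
            Node("Z", 1, 0.5, ((0.5, 0.0), (1.0, 1.0)), Node("C"), Node("D")))

print(kd_backtrack(root, (0.0, 0.125), (1.0, 0.5), 0.0, 1.0))  # ['A', 'C', 'D']
```

In this example the step from C to D resumes at Z rather than at the root, which is where the saving over a full restart comes from.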
17
Implementation
Built GPU raytracer in Brook [Buck et al.]
4 intersection schemes:
  – Brute Force
  – Uniform Grid
  – KD-Restart
  – KD-Backtrack
18
Scenes
Cornell Box: 32 triangles
BART Robots: 71708 triangles
BART Kitchen: 110561 triangles
Stanford Bunny: 69451 triangles
19
Results
Relative speedup over brute-force intersection.
[Figure: speedup chart for the Box, Bunny, Robots, and Kitchen scenes; the only surviving data label is 12.9.]
20
Results
Rays in each state throughout traversal.
            Ideal     Restart   Backtrack
Traverse    10.86M    21.80M    10.86M
Backtrack   0         0         7.78M
Intersect   5.91M
21
Discussion
Absolute performance
  – Trails best CPU implementations by 5-6x
Sources of inefficiency
  – Load balancing
  – Data reuse
22
Load Balancing
Subset of rays intersecting, traversing
  – Occlusion queries to select kernel
  – Early-Z to cull inactive rays
Approximately 5x overhead
  – Query, kernel switch overhead
  – Worse with fewer rays
23
Data Reuse
Every kernel
  – Loads ray origin/direction
  – Loads/stores traversal state
Consumes streaming bandwidth
  – We are bandwidth-limited
  – CPU implementation stores these in registers
24
Branching
Merge multiple passes into larger kernel
  – Fragment branches for load balancing
  – Avoid load/store of reused data
Current branching has high overhead
Shifts efficiency burden to HW
25
Conclusion
Stackless Traversal
  – Allows efficient GPU kd-tree
  – Scales to larger, more complex scenes
Future Work
  – Changes in HW
  – Alternative acceleration structures
  – “Out-of-core” scenes
  – Dynamic scenes
26
Acknowledgements
Tim Purcell (NVIDIA)
  – Streaming raytracer
Mark Segal (ATI)
  – Demo machine
NVIDIA, ATI: HW
DARPA, Rambus: Funding
27
Questions