GH05
KD-Tree Acceleration Structures for a GPU Raytracer
Tim Foley, Jeremy Sugerman
Stanford University

Motivation
• Accelerated raytracing
  – On commodity HW
  – Production rendering
  – Real-time applications?
• Performance trend
  – 9800 XT: 170M ray-triangle intersections/s
  – X800 XT PE: 350M ray-triangle intersections/s

GPU Raytracing
• Promising early results
  – Simple scenes
• Uniform grid
  – Problems with complex scenes
• Hierarchical accelerator (kd-tree)
  – Improve scalability

Outline
• Background
  – GPU raytracing
  – KD-tree algorithm
• KD-Restart, KD-Backtrack
• Results
• Future work

Background
• RayEngine [Carr et al. 2002]
  – Parallel ray-triangle intersection
  – Host controls culling
• [Purcell et al. 2002]
  – Entire raytracing pipeline
  – Many rays required for efficiency
  – Uniform grid

Why Not a KD-Tree?
• Uniform grid acceleration structure
  – Regular structure = efficient traversal
  – Regular structure = poor partitioning
• KD-trees
  – Adapt to scene complexity
  – Compact storage, efficient traversal
  – "Best" for CPU raytracing [Havran 2000]

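KD-Tree
[Figure: scene partitioned by split planes X, Y, Z into leaf cells A, B, C, D; the corresponding tree (nodes X, Y, Z; leaves A, B, C, D); a ray interval (tmin, tmax)]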

KD-Tree Traversal
[Figure: a ray traversing the same kd-tree (nodes X, Y, Z; leaves A, B, C, D)]
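For reference, a minimal Python sketch of the standard stack-based traversal. It assumes a hypothetical `Node` class, a ray given by origin `o` and direction `d`, and a hand-built 2D tree that reuses the figure's labels X, Y, Z and A to D purely for illustration (the split positions are invented). The `stack.append` of the far child is the per-ray push that a fragment program cannot perform.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical kd-tree node: interior if axis >= 0, leaf otherwise."""
    axis: int = -1                   # split axis (0 = x, 1 = y); -1 marks a leaf
    split: float = 0.0               # position of the split plane along `axis`
    left: Optional["Node"] = None    # child on the low side of the plane
    right: Optional["Node"] = None   # child on the high side of the plane
    name: str = ""                   # leaf label, for illustration only

    def is_leaf(self) -> bool:
        return self.axis < 0


def traverse(root, o, d, tmin, tmax) -> List[Tuple[str, float, float]]:
    """Standard traversal: report leaves front to back with their (t0, t1)."""
    visited = []
    stack = [(root, tmin, tmax)]
    while stack:
        node, t0, t1 = stack.pop()
        while not node.is_leaf():
            a = node.axis
            # The near child lies on the same side of the plane as the origin.
            if o[a] < node.split or (o[a] == node.split and d[a] <= 0):
                near, far = node.left, node.right
            else:
                near, far = node.right, node.left
            if d[a] == 0:              # ray parallel to the plane never crosses it
                node = near
                continue
            t_split = (node.split - o[a]) / d[a]
            if t_split >= t1 or t_split <= 0:
                node = near            # plane beyond the segment or behind the ray
            elif t_split <= t0:
                node = far             # segment starts past the plane
            else:
                stack.append((far, t_split, t1))  # the push fragments cannot do
                node, t1 = near, t_split
        visited.append((node.name, t0, t1))
    return visited


# Hypothetical tree: X splits x at 0.5; Y and Z split y at 0.5.
tree = Node(0, 0.5,
            Node(1, 0.5, Node(name="A"), Node(name="B")),
            Node(1, 0.5, Node(name="C"), Node(name="D")))
print(traverse(tree, (0.0, 0.125), (1.0, 1.0), 0.0, 0.875))
```

For this ray the leaves come back as A, B, D with intervals (0, 0.375), (0.375, 0.5), (0.5, 0.875): each leaf's exit parameter is exactly the next leaf's entry parameter.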

Per-Fragment Stacks
• Parallel (per-ray) push
  – No indexed writes in fragment programs
• Per-ray stack storage [Ernst et al. 2004]
  – Emulate push with extra passes
  – Impractical, slow

Our Contribution
• Stackless kd-tree traversal algorithms
  – KD-Restart
  – KD-Backtrack

Observation
[Figure: kd-tree with nodes X, Y, Z and leaves A, B, C, D]
• Current leaf's tmax = next leaf's tmin

KD-Restart
[Figure: kd-tree with nodes X, Y, Z and leaves A, B, C, D]
• Standard traversal
  – Omit stack operations
  – Proceed to 1st leaf
• If no intersection
  – Advance (tmin, tmax): new tmin = current leaf's tmax
  – Restart from root
• Proceed to next leaf
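The restart loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Brook kernels: `Node`, `descend`, `kd_restart`, and the `hits` callback (a stand-in for the ray-triangle tests in a leaf) are hypothetical names, and the tree reuses the figure's labels on invented split positions. `descend` is standard traversal with the stack push removed; after a miss, `t_start` jumps to the leaf's exit parameter and the descent starts over at the root.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical kd-tree node: interior if axis >= 0, leaf otherwise."""
    axis: int = -1
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    name: str = ""

    def is_leaf(self) -> bool:
        return self.axis < 0


def descend(node, o, d, t0, t1):
    """Stackless down-traversal: take the near child, never remember the far one."""
    while not node.is_leaf():
        a = node.axis
        if o[a] < node.split or (o[a] == node.split and d[a] <= 0):
            near, far = node.left, node.right
        else:
            near, far = node.right, node.left
        if d[a] == 0:
            node = near
            continue
        t_split = (node.split - o[a]) / d[a]
        if t_split >= t1 or t_split <= 0:
            node = near
        elif t_split <= t0:
            node = far
        else:
            node, t1 = near, t_split   # stack push omitted: far child is dropped
    return node, t0, t1


def kd_restart(root, o, d, tmin, tmax, hits: Callable[[str, float, float], bool]):
    """Return (hit leaf or None, leaves visited), restarting from the root each time."""
    visited: List[str] = []
    t_start = tmin
    while t_start < tmax:
        leaf, t0, t1 = descend(root, o, d, t_start, tmax)
        visited.append(leaf.name)
        if hits(leaf.name, t0, t1):    # stand-in for ray-triangle tests in the leaf
            return leaf.name, visited
        t_start = t1                   # advance: next leaf's tmin is this leaf's tmax
    return None, visited


tree = Node(0, 0.5,
            Node(1, 0.5, Node(name="A"), Node(name="B")),
            Node(1, 0.5, Node(name="C"), Node(name="D")))
print(kd_restart(tree, (0.0, 0.125), (1.0, 1.0), 0.0, 0.875,
                 lambda name, t0, t1: name == "B"))
```

With the callback reporting a hit only in leaf B, the sketch visits A, then B, and stops; with no hits it visits A, B, D, re-descending from the root for each leaf.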

KD-Restart
• Restart traversal after each leaf
  – m leaves visited
  – Average depth d
  – Cost O(m·d)
• Balanced tree of n nodes
  – Upper bound: O(n log n) (standard algorithm: O(n))
  – Expected: O(log n)
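The cost gap can be checked by counting node visits on a toy case. Assumptions (not from the slides): a hypothetical balanced 1D kd-tree, with all splits along x and 2^k leaves, and a ray crossing every leaf, which is the worst case for both methods. Standard traversal touches each of the n = 2^(k+1) - 1 nodes once, while KD-Restart pays one root-to-leaf descent of d = k + 1 nodes for each of the m = 2^k leaves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical 1D kd-tree node (all splits along x); a leaf has no children."""
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def build(lo: float, hi: float, depth: int) -> Node:
    """Balanced tree over [lo, hi] with 2**depth leaves and 2**(depth+1) - 1 nodes."""
    if depth == 0:
        return Node()
    mid = (lo + hi) / 2
    return Node(mid, build(lo, mid, depth - 1), build(mid, hi, depth - 1))


# The ray is x(t) = t on [0, 1], so a split at x = s is crossed at t = s.

def standard_visits(root: Node) -> int:
    """Node visits made by stack-based traversal."""
    visits, stack = 0, [(root, 0.0, 1.0)]
    while stack:
        node, t0, t1 = stack.pop()
        while True:
            visits += 1
            if node.is_leaf():
                break
            if node.split >= t1:
                node = node.left
            elif node.split <= t0:
                node = node.right
            else:
                stack.append((node.right, node.split, t1))
                node, t1 = node.left, node.split
    return visits


def restart_visits(root: Node) -> int:
    """Node visits made by KD-Restart: one root-to-leaf descent per leaf."""
    visits, t_start = 0, 0.0
    while t_start < 1.0:
        node, t0, t1 = root, t_start, 1.0
        while True:
            visits += 1
            if node.is_leaf():
                break
            if node.split >= t1:
                node = node.left
            elif node.split <= t0:
                node = node.right
            else:
                node, t1 = node.left, node.split
        t_start = t1                   # advance to the next leaf and restart
    return visits


for depth in (3, 10):
    tree = build(0.0, 1.0, depth)
    print(depth, standard_visits(tree), restart_visits(tree))
```

This prints 15 versus 32 visits for k = 3 and 2047 versus 11264 for k = 10, the O(n) versus O(n log n) gap between the two upper bounds.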

Observation
[Figure: kd-tree with nodes X, Y, Z and leaves A, B, C, D]
• An ancestor of A is the parent of Z

KD-Backtrack
[Figure: kd-tree with nodes X, Y, Z and leaves A, B, C, D]
• If no intersection
  – Advance (tmin, tmax)
  – Start backtracking
• If the node's bounding box intersects (tmin, tmax)
  – Resume traversal
• Proceed to next leaf
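A minimal Python sketch of KD-Backtrack under similar assumptions: hypothetical `Node`, `annotate`, `box_exit`, `descend`, and `kd_backtrack` names, a `hits` callback standing in for per-leaf triangle tests, a ray segment that starts inside the scene box, and an invented 2D tree. The parent pointers and bounding boxes filled in by `annotate` are the extra storage the method needs. After a miss, the loop climbs parent pointers until an ancestor's box still overlaps (t_start, tmax), then resumes ordinary descent from that ancestor instead of from the root.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Vec = Tuple[float, float]


@dataclass
class Node:
    """Hypothetical kd-tree node carrying the extra storage KD-Backtrack needs."""
    axis: int = -1                      # -1 marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    name: str = ""
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    lo: Vec = (0.0, 0.0)                # bounding box, min corner
    hi: Vec = (1.0, 1.0)                # bounding box, max corner

    def is_leaf(self) -> bool:
        return self.axis < 0


def annotate(node: Node, lo: Vec, hi: Vec, parent: Optional[Node] = None) -> None:
    """Fill in parent pointers and bounding boxes top-down (a build-time step)."""
    node.parent, node.lo, node.hi = parent, lo, hi
    if not node.is_leaf():
        a = node.axis
        left_hi = tuple(node.split if i == a else hi[i] for i in range(2))
        right_lo = tuple(node.split if i == a else lo[i] for i in range(2))
        annotate(node.left, lo, left_hi, node)
        annotate(node.right, right_lo, hi, node)


def box_exit(node: Node, o: Vec, d: Vec) -> float:
    """Ray parameter at which the ray leaves node's bounding box (slab test)."""
    t_exit = math.inf
    for a in range(2):
        if d[a] != 0:
            t_exit = min(t_exit, max((node.lo[a] - o[a]) / d[a],
                                     (node.hi[a] - o[a]) / d[a]))
        elif not node.lo[a] <= o[a] <= node.hi[a]:
            return -math.inf            # parallel to this slab and outside it
    return t_exit


def descend(node, o, d, t0, t1):
    """Stackless down-traversal: take the near child, never remember the far one."""
    while not node.is_leaf():
        a = node.axis
        if o[a] < node.split or (o[a] == node.split and d[a] <= 0):
            near, far = node.left, node.right
        else:
            near, far = node.right, node.left
        if d[a] == 0:
            node = near
            continue
        t_split = (node.split - o[a]) / d[a]
        if t_split >= t1 or t_split <= 0:
            node = near
        elif t_split <= t0:
            node = far
        else:
            node, t1 = near, t_split
    return node, t0, t1


def kd_backtrack(root, o, d, tmin, tmax, hits: Callable[[str, float, float], bool]):
    """Return (hit leaf or None, leaves visited), backtracking via parent pointers."""
    visited: List[str] = []
    leaf, t0, t1 = descend(root, o, d, tmin, tmax)
    while True:
        visited.append(leaf.name)
        if hits(leaf.name, t0, t1):
            return leaf.name, visited
        t_start = t1                    # advance (tmin, tmax)
        if t_start >= tmax:
            return None, visited
        # Climb until an ancestor's box still overlaps (t_start, tmax).
        node = leaf.parent
        while node is not None and box_exit(node, o, d) <= t_start:
            node = node.parent
        if node is None:
            return None, visited        # the ray has left the scene's box
        # Resume ordinary descent from that ancestor, clipped to its box.
        leaf, t0, t1 = descend(node, o, d, t_start, min(tmax, box_exit(node, o, d)))


tree = Node(0, 0.5,
            Node(1, 0.5, Node(name="A"), Node(name="B")),
            Node(1, 0.5, Node(name="C"), Node(name="D")))
annotate(tree, (0.0, 0.0), (1.0, 1.0))
print(kd_backtrack(tree, (0.0, 0.125), (1.0, 1.0), 0.0, 0.875,
                   lambda name, t0, t1: False))
```

For the example ray this visits A, B, D, and the first resumed descent starts at Y rather than at the root.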

KD-Backtrack
• Backtrack after each leaf
  – Revisits previous nodes
  – Each node at most twice: once from its left child, once from its right
• Within a constant factor of standard traversal
  – Upper bound: O(n)
  – Expected: O(log n)
• Requires additional storage
  – Parent pointers
  – Bounding boxes for internal nodes

Implementation
• Built GPU raytracer in Brook [Buck et al.]
• 4 intersection schemes:
  – Brute force
  – Uniform grid
  – KD-Restart
  – KD-Backtrack

Scenes
• Cornell Box: 32 triangles
• BART Robots: 71,708 triangles
• BART Kitchen: 110,561 triangles
• Stanford Bunny: 69,451 triangles

Results
[Chart: relative speedup over brute-force intersection for the Box, Bunny, Robots, and Kitchen scenes]

Results
State        Ideal     Restart   Backtrack
Traverse     10.86M    21.80M    10.86M
Backtrack    0         0         7.78M
Intersect    5.91M (all three)
Rays in each state throughout traversal.
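As a rough reading of the table, assuming (the slide does not say so) that a traversal step and a backtracking step cost about the same, the traversal-side work of each stackless method relative to the Ideal column is:

```latex
\begin{aligned}
\text{KD-Restart:}\quad & \frac{21.80\,\mathrm{M}}{10.86\,\mathrm{M}} \approx 2.01 \\
\text{KD-Backtrack:}\quad & \frac{10.86\,\mathrm{M} + 7.78\,\mathrm{M}}{10.86\,\mathrm{M}} = \frac{18.64}{10.86} \approx 1.72
\end{aligned}
```

Under that assumption, KD-Restart roughly doubles the traversal work in this measurement, while KD-Backtrack matches the ideal traversal count and pays its overhead in the 7.78M backtracking steps.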

Discussion
• Absolute performance
  – Trails the best CPU implementations by 5-6x
• Sources of inefficiency
  – Load balancing
  – Data reuse

Load Balancing
• Only a subset of rays is intersecting or traversing at any time
  – Occlusion queries to select the kernel
  – Early-Z to cull inactive rays
• Approximately 5x overhead
  – Query and kernel-switch overhead
  – Worse with fewer rays
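A toy model of the kernel-selection loop, to show where per-pass overhead comes from. Assumptions, none of them from the slides: each ray carries a fixed, hypothetical script of states; per-state counts stand in for occlusion-query results; rays not in the chosen state are skipped, standing in for early-Z culling; and the policy of running whichever kernel has the most active rays is invented for illustration.

```python
from collections import Counter
from typing import List


def schedule(rays: List[List[str]]) -> List[str]:
    """Pick one kernel per pass: whichever state currently holds the most rays.

    The per-state counts stand in for occlusion queries; skipping rays in other
    states stands in for early-Z culling of inactive rays.
    """
    rays = [list(r) for r in rays]      # copy so the caller's lists are untouched
    passes = []
    while any(rays):
        counts = Counter(r[0] for r in rays if r)     # "occlusion query" per state
        # Ties go to "traverse", the first candidate listed.
        kernel = max(("traverse", "intersect"), key=lambda s: counts[s])
        passes.append(kernel)
        for r in rays:
            if r and r[0] == kernel:                   # only matching rays run
                r.pop(0)
    return passes


# Hypothetical per-ray scripts: the states each ray must pass through, in order.
rays = [["traverse", "intersect"],
        ["traverse", "traverse", "intersect"],
        ["intersect"]]
print(schedule(rays))
```

For these three rays the loop runs traverse, intersect, traverse, intersect: four passes for six units of ray work, and each pass would additionally pay the query and kernel-switch costs listed on the slide.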

Data Reuse
• Every kernel
  – Loads ray origin/direction
  – Loads/stores traversal state
• Consumes streaming bandwidth
  – We are bandwidth-limited
  – CPU implementations keep these in registers

Branching
• Merge multiple passes into a larger kernel
  – Fragment branches for load balancing
  – Avoid load/store of reused data
• Current branching has high overhead
• Shifts efficiency burden to HW

Conclusion
• Stackless traversal
  – Allows efficient GPU kd-tree
  – Scales to larger, more complex scenes
• Future work
  – Changes in HW
  – Alternative acceleration structures
  – "Out-of-core" scenes
  – Dynamic scenes

Acknowledgements
• Tim Purcell (NVIDIA)
  – Streaming raytracer
• Mark Segal (ATI)
  – Demo machine
• NVIDIA, ATI: HW
• DARPA, Rambus: Funding

Questions
