Fast Parallel Construction of High-Quality Bounding Volume Hierarchies

Slides:



Advertisements
Similar presentations
EcoTherm Plus WGB-K 20 E 4,5 – 20 kW.
Advertisements

Números.
1 A B C
Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.
Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.
AGVISE Laboratories %Zone or Grid Samples – Northwood laboratory
Trend for Precision Soil Testing % Zone or Grid Samples Tested compared to Total Samples.
PDAs Accept Context-Free Languages
ALAK ROY. Assistant Professor Dept. of CSE NIT Agartala
EuroCondens SGB E.
Worksheets.
Slide 1Fig 26-CO, p.795. Slide 2Fig 26-1, p.796 Slide 3Fig 26-2, p.797.
Slide 1Fig 25-CO, p.762. Slide 2Fig 25-1, p.765 Slide 3Fig 25-2, p.765.
Sequential Logic Design
Copyright © 2013 Elsevier Inc. All rights reserved.
Addition and Subtraction Equations
David Burdett May 11, 2004 Package Binding for WS CDL.
1 When you see… Find the zeros You think…. 2 To find the zeros...
Create an Application Title 1Y - Youth Chapter 5.
Add Governors Discretionary (1G) Grants Chapter 6.
CALENDAR.
CHAPTER 18 The Ankle and Lower Leg
The 5S numbers game..
突破信息检索壁垒 -SciFinder Scholar 介绍
A Fractional Order (Proportional and Derivative) Motion Controller Design for A Class of Second-order Systems Center for Self-Organizing Intelligent.
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
Break Time Remaining 10:00.
The basics for simulations
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
Factoring Quadratics — ax² + bx + c Topic
EE, NCKU Tien-Hao Chang (Darby Chang)
A sample problem. The cash in bank account for J. B. Lindsay Co. at May 31 of the current year indicated a balance of $14, after both the cash receipts.
PP Test Review Sections 6-1 to 6-6
MM4A6c: Apply the law of sines and the law of cosines.
2013 Fox Park Adopt-A-Hydrant Fund Raising & Beautification Campaign Now is your chance to take part in an effort to beautify our neighborhood by painting.
Regression with Panel Data
Operating Systems Operating Systems - Winter 2012 Chapter 2 - Processes Vrije Universiteit Amsterdam.
Numerical Analysis 1 EE, NCKU Tien-Hao Chang (Darby Chang)
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
Progressive Aerobic Cardiovascular Endurance Run
Biology 2 Plant Kingdom Identification Test Review.
MaK_Full ahead loaded 1 Alarm Page Directory (F11)
Facebook Pages 101: Your Organization’s Foothold on the Social Web A Volunteer Leader Webinar Sponsored by CACO December 1, 2010 Andrew Gossen, Senior.
Artificial Intelligence
When you see… Find the zeros You think….
Midterm Review Part II Midterm Review Part II 40.
2011 WINNISQUAM COMMUNITY SURVEY YOUTH RISK BEHAVIOR GRADES 9-12 STUDENTS=1021.
Before Between After.
2011 FRANKLIN COMMUNITY SURVEY YOUTH RISK BEHAVIOR GRADES 9-12 STUDENTS=332.
Slide R - 1 Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Prentice Hall Active Learning Lecture Slides For use with Classroom Response.
Subtraction: Adding UP
Numeracy Resources for KS2
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
1 hi at no doifpi me be go we of at be do go hi if me no of pi we Inorder Traversal Inorder traversal. n Visit the left subtree. n Visit the node. n Visit.
Static Equilibrium; Elasticity and Fracture
Converting a Fraction to %
Resistência dos Materiais, 5ª ed.
Clock will move after 1 minute
Select a time to count down from the clock above
Copyright Tim Morris/St Stephen's School
1.step PMIT start + initial project data input Concept Concept.
WARNING This CD is protected by Copyright Laws. FOR HOME USE ONLY. Unauthorised copying, adaptation, rental, lending, distribution, extraction, charging.
A Data Warehouse Mining Tool Stephen Turner Chris Frala
1 Dr. Scott Schaefer Least Squares Curves, Rational Representations, Splines and Continuity.
1 Non Deterministic Automata. 2 Alphabet = Nondeterministic Finite Accepter (NFA)
Introduction Embedded Universal Tools and Online Features 2.
Schutzvermerk nach DIN 34 beachten 05/04/15 Seite 1 Training EPAM and CANopen Basic Solution: Password * * Level 1 Level 2 * Level 3 Password2 IP-Adr.
Presentation transcript:

Fast Parallel Construction of High-Quality Bounding Volume Hierarchies Tero Karras Timo Aila

Ray tracing comes in many flavors Courtesy of Delta Tracing Lucasfilm Ltd.™, Digital work by ILM NVIDIA © Activision 2009, Game trailer by Blur Studio Courtesy of Columbia Pictures Courtesy of Dassault Systemes Interactive apps Architecture & design Movie production 1M–100M rays/frame 100M–10G rays/frame 10G–1T rays/frame

Effective performance 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒= 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒

Effective performance 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒= 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒=𝑡𝑖𝑚𝑒 𝑡𝑜 𝑏𝑢𝑖𝑙𝑑 𝐵𝑉𝐻+ 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑟𝑎𝑦 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡 “speed” “quality”

Effective performance 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑟𝑎𝑦 𝑡𝑟𝑎𝑐𝑖𝑛𝑔 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒= 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑟𝑒𝑛𝑑𝑒𝑟𝑖𝑛𝑔 𝑡𝑖𝑚𝑒=𝑡𝑖𝑚𝑒 𝑡𝑜 𝑏𝑢𝑖𝑙𝑑 𝐵𝑉𝐻+ 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑟𝑎𝑦 𝑡ℎ𝑟𝑜𝑢𝑔ℎ𝑝𝑢𝑡 Mrays/s Speed matters Both matter Quality matters Interactive apps Architecture & design Movie production 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance SODA (2.2M tris) NVIDIA GTX Titan Diffuse rays Mrays/s 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance SBVH [Stich et al. 2009] (CPU, 4 cores) Mrays/s 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 Build time dominates 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance HLBVH [Garanzha et al. 2011] (GPU) SBVH [Stich et al. 2009] (CPU, 4 cores) Mrays/s ??? 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance HLBVH [Garanzha et al. 2011] (GPU) Our method (GPU) SBVH [Stich et al. 2009] (CPU, 4 cores) Mrays/s 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance Best quality–speed tradeoff for wide range of applications 30M–500G rays/frame Mrays/s 97% of SBVH 𝑒𝑓𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑝𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Treelet restructuring Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once

Treelet restructuring Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants

Treelet restructuring Treelet root Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants R Treelet leaf

Treelet restructuring Treelet root Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants Grow by turning leaves into internal nodes Grow R Treelet leaf

Treelet restructuring internal node Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants Grow by turning leaves into internal nodes R Grow

Treelet restructuring Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants Grow by turning leaves into internal nodes R

Treelet restructuring Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants Grow by turning leaves into internal nodes R

Treelet restructuring Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants Grow by turning leaves into internal nodes R

Treelet restructuring Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants Grow by turning leaves into internal nodes Largest leaves → best results R

Treelet restructuring 𝑛−1=6 treelet internal nodes Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants Grow by turning leaves into internal nodes Largest leaves → best results Valid binary tree in itself R C D A B G E F 𝑛=7 treelet leaves

Treelet restructuring Idea Build a low-quality BVH Optimize its node topology Look at multiple nodes at once Treelet Subset of a node’s descendants Grow by turning leaves into internal nodes Largest leaves → best results Valid binary tree in itself Leaves can represent arbitrary subtrees of the BVH R C D A B G E F Arbitrary subtree Actual BVH leaf

Treelet restructuring Construct optimal binary tree for the same set of leaves Replace old treelet R C D A B G E F

Treelet restructuring Construct optimal binary tree for the same set of leaves Replace old treelet Reuse the same nodes Update connectivity and AABBs New AABBs should be smaller R C D A B G E F

Treelet restructuring Construct optimal binary tree for the same set of leaves Replace old treelet Reuse the same nodes Update connectivity and AABBs New AABBs should be smaller R C D A B G E F

Treelet restructuring Construct optimal binary tree for the same set of leaves Replace old treelet Reuse the same nodes Update connectivity and AABBs New AABBs should be smaller R D G E A F B C

Treelet restructuring Construct optimal binary tree for the same set of leaves Replace old treelet Reuse the same nodes Update connectivity and AABBs New AABBs should be smaller Perfectly localized operation Leaves and their subtrees are kept intact No need to look at subtree contents R D G E A F B C

Initial BVH construction Processing stages Input triangles Initial BVH construction One triangle per leaf Optimization Post-processing

Initial BVH construction Processing stages Parallel LBVH [Karras 2012] Initial BVH construction Optimization Post-processing 60-bit Morton codes for accurate spatial partitioning

Processing stages Parallel bottom-up traversal [Karras 2012] Initial BVH construction Optimization Post-processing Restructure multiple treelets in parallel

Processing stages Parallel bottom-up traversal [Karras 2012] Initial BVH construction Optimization Post-processing

Processing stages Parallel bottom-up traversal [Karras 2012] Initial BVH construction Optimization Post-processing

Processing stages Parallel bottom-up traversal [Karras 2012] Initial BVH construction Optimization Post-processing

Processing stages Parallel bottom-up traversal [Karras 2012] Initial BVH construction Optimization Post-processing

Processing stages Parallel bottom-up traversal [Karras 2012] Initial BVH construction Optimization Post-processing

Processing stages Parallel bottom-up traversal [Karras 2012] Initial BVH construction Optimization Post-processing Strict bottom-up order → no overlap between treelets

Initial BVH construction Processing stages Initial BVH construction Optimization Post-processing Rinse and repeat (3 times is plenty)

Initial BVH construction Processing stages Initial BVH construction Optimization Post-processing Collapse subtrees into leaf nodes

Processing stages Initial BVH construction Optimization Post-processing Collect triangles into linear lists Prepare them for Woop’s intersection test [Woop 2004]

Initial BVH construction Processing stages Initial BVH construction Optimization Post-processing Fast GPU ray traversal [Aila et al. 2012]

Initial BVH construction Processing stages Triangle splitting Initial BVH construction Optimization Post-processing Fast GPU ray traversal [Aila et al. 2012]

Initial BVH construction Processing stages Triangle splitting 0.4 ms 5.4 ms Initial BVH construction No splits 6.6 ms 17.0 ms Optimization 21.4 ms 1.2 ms Post-processing 30% splits 1.6 ms Fast GPU ray traversal [Aila et al. 2012] DRAGON (870K tris) NVIDIA GTX Titan 23.6 ms / 30.0 ms

Cost model Surface area cost model [Goldsmith and Salmon 1987], [MacDonald and Booth 1990] 𝑆𝐴𝐻 ≔ 𝐶 𝑖 𝑛∈𝐼 𝐴(𝑛) 𝐴(root) + 𝐶 𝑡 𝑙∈𝐿 𝐴 𝑙 𝐴 root 𝑁(𝑙) Track cost and triangle count of each subtree Minimize SAH cost of the final BVH Make collapsing decisions already during optimization → Unified processing of leaves and internal nodes

Optimal restructuring Finding the optimal node topology is NP-hard Naive algorithm → 𝒪 𝑛! Our approach → 𝒪 3 𝑛 But it becomes very powerful as 𝑛 grows 𝑛=7 treelet leaves is enough for high-quality results Use fixed-size treelets Constant cost per treelet → Linear with respect to scene size

Optimal restructuring * SODA (2.2M tris) Treelet size Layouts Quality vs. SBVH * 4 15 78% 5 105 85% 6 945 88% 7 10,395 97% 8 135,135 98% Number of unique ways for restructuring a given treelet Ray tracing performance after 3 rounds of optimization

Optimal restructuring * SODA (2.2M tris) Almost the same thing as tree rotations [Kensler 2008] Treelet size Layouts Quality vs. SBVH * 4 15 78% 5 105 85% 6 945 88% 7 10,395 97% 8 135,135 98% Varies a lot between scenes Limited options during optimization → easy to get stuck in a local optimum

Optimal restructuring * SODA (2.2M tris) Treelet size Layouts Quality vs. SBVH * 4 15 78% 5 105 85% 6 945 88% 7 10,395 97% 8 135,135 98% Can still be implemented efficiently Consistent across scenes Surely one of these will take us forward  Further improvement is marginal

Algorithm Dynamic programming Subproblem: Solve small subproblems first Tabulate their solutions Build on them to solve larger subproblems Subproblem: What’s the best node topology for a subset of the leaves?

Algorithm input: set of 𝑛 treelet leaves for 𝑘=2 to 𝑛 do for each subset of size 𝑘 do for each way of partitioning the leaves do look up subtree costs calculate SAH cost end for record the best solution reconstruct optimal topology Process subsets from smallest to largest Record the optimal SAH cost for each

Algorithm Exhaustive search: assign each leaf to left/right subtree input: set of 𝑛 treelet leaves for 𝑘=2 to 𝑛 do for each subset of size 𝑘 do for each way of partitioning the leaves do look up subtree costs calculate SAH cost end for record the best solution reconstruct optimal topology We already know how much the subtrees will cost Backtrack the partitioning choices

Scalar vs. SIMD Scalar processing SIMD processing Each thread processes one treelet Need many treelets in flight SIMD processing 32 threads collaborate on the same treelet Need few treelets in flight ✗ Spills to off-chip memory ✗ Doesn’t scale to small scenes ✓ Trivial to implement ✓ Data fits in on-chip memory ✓ Easy to fill the entire GPU ✗ Need to keep all threads busy Parallelize over subproblems using a pre-optimized processing schedule (details in the paper)

Scalar vs. SIMD Scalar processing SIMD processing Each thread processes one treelet Need many treelets in flight SIMD processing 32 threads collaborate on the same treelet Need few treelets in flight ✗ Spills to off-chip memory ✗ Doesn’t scale to small scenes ✓ Trivial to implement ✓ Data fits in on-chip memory ✓ Easy to fill the entire GPU ✓ Possible to keep threads busy

Quality vs. speed Spend less effort on bottom-most nodes Low contribution to SAH cost Quick convergence Additional parameter 𝛾 Only process subtrees that are large enough Trade quality for speed Double 𝛾 after each round Significant speedup Negligible effect on quality

Triangle splitting Early Split Clipping [Ernst and Greiner 2007] Split triangle bounding boxes as a pre-process Large triangle Bounding box is not a good approximation Split it!

Triangle splitting Early Split Clipping [Ernst and Greiner 2007] Split triangle bounding boxes as a pre-process Resulting boxes provide a tighter bound Keep going until they are small enough

Triangle splitting Early Split Clipping [Ernst and Greiner 2007] Split triangle bounding boxes as a pre-process Keep going until they are small enough

Triangle splitting Early Split Clipping [Ernst and Greiner 2007] Split triangle bounding boxes as a pre-process Treat each box as a separate primitive Triangle itself remains the same

Triangle splitting Shortcomings of pre-process splitting Can hurt ray tracing performance Unpredictable memory usage Requires manual tuning Improve with better heuristics Select good split planes Concentrate splits where they matter Use a fixed split budget

Split plane selection Reduce node overlap in the initial BVH Root node partitions the scene at its spatial median

Split plane selection Reduce node overlap in the initial BVH Left child Right child

...the bounding boxes will overlap Split plane selection Reduce node overlap in the initial BVH If a triangle crosses the plane... Use the same spatial median as a split plane ...the bounding boxes will overlap

Split plane selection Reduce node overlap in the initial BVH Use the same spatial median as a split plane No overlap

Split plane selection Reduce node overlap in the initial BVH Splitting one triangle does not help much Need to split them all to get the benefits

Split plane selection Reduce node overlap in the initial BVH

Same reasoning holds on multiple levels Split plane selection Reduce node overlap in the initial BVH Level 3 Level 1 Level 3 Level 2 Level 0 Level 2 Same reasoning holds on multiple levels

Split plane selection Reduce node overlap in the initial BVH Look at all spatial median planes that intersect a triangle Reduce node overlap in the initial BVH Level 3 Level 1 Split it with the dominant one Level 3 Level 2 Level 0 Level 2

Algorithm Allocate memory for a fixed split budget

Algorithm Allocate memory for a fixed split budget Calculate a priority value for each triangle

Algorithm Allocate memory for a fixed split budget Calculate a priority value for each triangle Distribute the split budget among triangles Proportional to their priority values

Algorithm Allocate memory for a fixed split budget Calculate a priority value for each triangle Distribute the split budget among triangles Proportional to their priority values Split each triangle recursively Distribute remaining splits according to the size of the resulting AABBs

Split priority 𝑝𝑟𝑖𝑜𝑟𝑖𝑡𝑦= 2 (−𝑙𝑒𝑣𝑒𝑙) ∙ 𝐴 𝑎𝑎𝑏𝑏 − 𝐴 𝑖𝑑𝑒𝑎𝑙 1 3 𝑝𝑟𝑖𝑜𝑟𝑖𝑡𝑦= 2 (−𝑙𝑒𝑣𝑒𝑙) ∙ 𝐴 𝑎𝑎𝑏𝑏 − 𝐴 𝑖𝑑𝑒𝑎𝑙 1 3 Crosses an important spatial median plane? Has large potential for reducing surface area? …but leave something for the rest, too Concentrate on triangles where both apply

Results Compare against 4 CPU and 3 GPU builders 4-core i7 930, NVIDIA GTX Titan Average of 20 test scenes, multiple viewpoints

Ray tracing performance SBVH = 131% SweepSAH = 100% High-quality CPU builders SweepSAH [MacDonald] SBVH [Stich] Tree rotations [Kensler] Iterative reinsertion [Bittner]

Ray tracing performance SBVH = 131% Almost 2× 67% – 69% Fast GPU builders SweepSAH [MacDonald] SBVH [Stich] Tree rotations [Kensler] Iterative reinsertion [Bittner] LBVH [Karras] HLBVH [Garanzha] GridSAH [Garanzha]

Ray tracing performance Our method 30% splits No splits SweepSAH [MacDonald] SBVH [Stich] Tree rotations [Kensler] Iterative reinsertion [Bittner] TRBVH TRBVH +30% split LBVH [Karras] HLBVH [Garanzha] GridSAH [Garanzha]

Ray tracing performance 91% of SBVH 96% of SweepSAH SweepSAH [MacDonald] SBVH [Stich] Tree rotations [Kensler] Iterative reinsertion [Bittner] TRBVH TRBVH +30% split LBVH [Karras] HLBVH [Garanzha] GridSAH [Garanzha]

Effective performance SBVH [Stich] Iterative reinsertion [Bittner] SweepSAH [MacDonald] Tree rotations [Kensler] Not Pareto-optimal 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance SBVH [Stich] SweepSAH [MacDonald] 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance SBVH [Stich] Not Pareto-optimal LBVH [Karras] HLBVH [Garanzha] GridSAH [Garanzha] SweepSAH [MacDonald] 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance SBVH [Stich] LBVH [Karras] SweepSAH [MacDonald] 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance SBVH [Stich] Our method (30% splits) Our method (no splits) LBVH [Karras] SweepSAH [MacDonald] 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Effective performance Below 7M → LBVH 7M–60G rays/frame → our method is the best choice Above 60G → SBVH 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑎𝑦𝑠 𝑝𝑒𝑟 𝑓𝑟𝑎𝑚𝑒

Conclusion General framework for optimizing trees Inherently parallel Approximate restructuring → larger treelets? Practical GPU-based BVH builder Best choice in a large class of applications Adjustable quality–speed tradeoff Will be integrated into NVIDIA OptiX

Thank you Acknowledgements Samuli Laine Jaakko Lehtinen Sami Liedes David McAllister Anonymous reviewers Anat Grynberg and Greg Ward for CONFERENCE University of Utah for FAIRY Marko Dabrovic for SIBENIK Ryan Vance for BUBS Samuli Laine for HAIRBALL and VEGETATION Guillermo Leal Laguno for SANMIGUEL Jonathan Good for ARABIC, BABYLONIAN and ITALIAN Stanford Computer Graphics Laboratory for ARMADILLO, BUDDHA and DRAGON Cornell University for BAR Georgia Institute of Technology for BLADE