Photon Mapping on Programmable Graphics Hardware
  Timothy J. Purcell, Mike Cammarano, Pat Hanrahan (Stanford University)
  Craig Donner, Henrik Wann Jensen (University of California, San Diego)
Motivation
  Interactive global illumination on the GPU
  Nearly have sufficient compute power and flexibility
  Explore GPU-based computation algorithms
Related Work
  CPU-based interactive global illumination
    Supercomputers [Parker et al.]
    Clusters [Tole et al., Wald et al.]
  Global illumination on programmable GPUs
    Ray tracing [Carr et al., Purcell et al.]
    Photon mapping [Ma et al.]
    Radiosity [Carr et al., Coombe et al.]
    Translucency [Carr et al., Stamminger et al.]
Photon Mapping Algorithm Review
  Photon tracing
    Emission, scattering, storing into kd-tree
    Similar to ray tracing
  Rendering
    Ray tracing for direct illumination
    Photon map visualization
    Indirect bounce
Computational Challenge for GPUs #1
  Constructing an irregular or sparse data structure
Computational Challenge for GPUs #2
  Adaptive nearest neighbor search
  Noise vs. blur
Photon Mapping on the CPU
  Balanced kd-tree
    Compact storage of photons
    Efficient O(log n) search
  Priority queue
    Nearest neighbor search
    Incremental insertion and removal of photons
Algorithmic Changes for the GPU
  Direct visualization of photon map
    Keeps rendering costs low
  Use grid instead of kd-tree
    Tried kd-tree…
    Kd-tree construction is difficult
    Radiance estimate
      – Fixed radius search works fine
      – Adaptive search needs priority queue
  No priority queue
    Can’t build on GPU
    Too much state
Contributions
  Mapped complete grid-based photon mapping algorithm onto the GPU
    Including photon tracing, ray tracing, etc.
  Implemented an adaptive k-nearest neighbor search
    kNN-grid
  Show how to construct a sparse data structure on the GPU
    Bitonic merge sort with binary search
    Stencil routing
Configuring the GPU for Computing
  GPU as data parallel compute engine
    Fragment programs execute compute kernels
    Screen sized quad initializes computation
    SIMD execution
  Floating point texture memory
    Render-to-texture for intermediate results
    Data structure storage
    Pointer dereferencing via dependent fetches
Computational Challenge #1: Building a Sparse Data Structure
  Requires scatter
    Dependent texture write
  Why don’t we have fragment scatter?
    Fragment processing has highly coherent blocked memory writes
    Extra hardware support would be needed
    Write hazards
    Memory latencies
Scatter on the GPU
  Sort photons into grid cells
    Grid cell is sort key
  Simulate scatter with fragment programs
    Bitonic merge sort followed by binary search
    Compact grid
    O(log² n) rendering passes
Bitonic Merge Sort
  O(log² n) rendering passes
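The pass structure can be sketched on the CPU. The Python below is a minimal illustration, not the authors' fragment program: each call to the hypothetical `bitonic_pass` stands in for one rendering pass, in which every output element reads itself and one partner (a gather, with no scatter) and keeps either the smaller or the larger value. The function names and the power-of-two length requirement are assumptions of this sketch.

```python
def bitonic_pass(data, k, j):
    """One 'rendering pass': every output element gathers from itself and a partner."""
    out = list(data)
    for i in range(len(data)):
        partner = i ^ j                 # element this position is compared against
        ascending = (i & k) == 0        # sort direction of the block containing i
        keep_min = (i < partner) == ascending
        a, b = data[i], data[partner]
        # Whole values are compared, so ties are harmless: equal values are identical.
        out[i] = min(a, b) if keep_min else max(a, b)
    return out


def bitonic_sort(data):
    """Sort a list whose length is a power of two; also return the pass count."""
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    passes = 0
    k = 2
    while k <= n:                       # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:                   # compare distance within the current merge
            data = bitonic_pass(data, k, j)
            passes += 1
            j //= 2
        k *= 2
    return data, passes


cells = [5, 2, 0, 2, 5, 0, 0, 2]        # hypothetical grid-cell keys of 8 photons
sorted_cells, passes = bitonic_sort(cells)
print(sorted_cells, passes)
```

For n elements the loop runs log n · (log n + 1) / 2 passes, which is where the O(log² n) figure comes from; for the 8 keys above that is 6 passes.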
Binary Search
  Grid cell searches for self in photon list
    If none, find first element in next cell
    Empty grid cells waste compute
  log(n) + 1 steps
  [Figure: sorted photon list (cells v0, v2, v5) with a search for the first v5 photon, shown at initialization and after steps 1 to 4]
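The per-cell search can be sketched as a lower-bound binary search with a fixed step count, so that every grid cell does the same amount of work, as a SIMD fragment program would. This is a CPU illustration under assumptions: the function name `first_photon_index`, the integer cell keys, and the example list are hypothetical and are not taken from the slide's figure.

```python
def first_photon_index(sorted_cells, cell, steps):
    """Index of the first photon whose cell key is >= `cell`.

    Runs exactly `steps` iterations; once the interval is empty the
    remaining iterations do nothing, mimicking a fixed-length kernel.
    """
    lo, hi = 0, len(sorted_cells)
    for _ in range(steps):
        if lo < hi:
            mid = (lo + hi) // 2
            if sorted_cells[mid] < cell:
                lo = mid + 1
            else:
                hi = mid
    return lo


sorted_cells = [0, 0, 2, 2, 2, 5, 5, 5]     # hypothetical sorted photon keys
steps = len(sorted_cells).bit_length()      # floor(log2(n)) + 1 = 4 for n = 8
grid_start = [first_photon_index(sorted_cells, c, steps) for c in range(6)]
print(grid_start)                           # [0, 2, 2, 5, 5, 5]
```

Empty cells (1, 3, and 4 here) end up pointing at the first photon of the next occupied cell, so a cell's photon count is the difference between consecutive entries.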
Scatter on the GPU
  Vertex programs can scatter
    Draw point to buffer
    Collisions?
  Stencil routing
    Limit photon count per grid cell
      – Pre-allocate grid cell space
    Draw photons as points
      – Vertex program computes grid cell
    Stencil buffer controls location within cell
    Single rendering pass
Stencil Routing
  Fix each grid cell size to n² pixels
  Draw fat points to cover each fat cell
    glPointSize(n)
  [Figure: Vertex(photon_pos) passes through the vertex program into a flattened grid; each cell is 4 pixels]
Stencil Routing
  Control location written to with stencil
    Pass when stencil is n² − 1
    Stencil always increments
  Location written depends on draw order
  [Figure: Vertex(photon_pos), vertex program, and flattened grid; labels: 1 pixel, 4 pixels, Stencil, Stencil Values]
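A small CPU simulation may help show why the two stencil rules route successive photons to different pixels. It is a sketch under one assumption that the slides do not state: each cell's stencil values start as the distinct pattern 0 .. n² − 1. The function `stencil_route` and its arguments are hypothetical names, and stencil saturation or wrap-around is ignored.

```python
def stencil_route(photon_cells, num_cells, n):
    """Simulate routing photons into pre-allocated cells of n*n pixels each.

    photon_cells[i] is the grid cell that photon i falls into, in draw order.
    Returns, per cell, the photon id stored in each pixel (None if empty).
    """
    slots = n * n
    # Assumption: every cell's stencil starts as the pattern 0 .. n*n - 1.
    stencil = [list(range(slots)) for _ in range(num_cells)]
    color = [[None] * slots for _ in range(num_cells)]
    for photon_id, cell in enumerate(photon_cells):
        # The fat point covers all n*n pixels of the target cell.
        for pixel in range(slots):
            if stencil[cell][pixel] == slots - 1:   # stencil test passes
                color[cell][pixel] = photon_id      # exactly one pixel is written
            stencil[cell][pixel] += 1               # stencil always increments
    return color


# Four photons in draw order: three land in cell 0, one in cell 1.
print(stencil_route([0, 0, 1, 0], num_cells=2, n=2))
# [[None, 3, 1, 0], [None, None, None, 2]]
```

Each photon drawn into a cell shifts which pixel holds the value n² − 1, so the next photon lands in a different pixel. Once a cell has received n² photons no pixel can pass the test, which is how the per-cell photon limit shows up in this model.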
Computational Challenge #2: Adaptive Nearest Neighbor Search
  Iterative algorithm
  Accept or reject photons in cell visit order
kNN-grid Algorithm
  [Figure: animation frames showing the sample point, photons in estimate, and candidate photon; the example wants a 4 photon estimate, and the counter shown in the frames runs from 1 to 6]
  Candidate photons must be within max search radius
  Visit voxels in order of distance to sample point
  If current number of photons in estimate is less than number requested, grow search radius
  Don’t add photons outside maximum search radius
  Don’t grow search radius when photon is outside maximum radius
  Add photons within search radius
  Don’t expand search radius if enough photons already found
  Visit all other voxels accessible within determined search radius
  Finds all photons within a sphere centered about sample point
  May locate more than requested k-nearest neighbors
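The accept/reject rules above can be sketched as a short CPU routine. This is an illustration, not the authors' GPU kernel: the helper names (`build_grid`, `voxel_distance`, `knn_grid`), the uniform cubic voxels, and the choice to order voxels by the distance from the sample point to the nearest point of each voxel are all assumptions of the sketch.

```python
import math
from collections import defaultdict


def build_grid(photons, cell_size):
    """Bucket photon positions into a uniform grid keyed by integer cell coordinates."""
    grid = defaultdict(list)
    for p in photons:
        grid[tuple(int(math.floor(c / cell_size)) for c in p)].append(p)
    return grid


def voxel_distance(cell, point, cell_size):
    """Distance from `point` to the nearest point of the voxel's box (0 if inside)."""
    d2 = 0.0
    for i, x in zip(cell, point):
        lo, hi = i * cell_size, (i + 1) * cell_size
        d = max(lo - x, 0.0, x - hi)
        d2 += d * d
    return math.sqrt(d2)


def knn_grid(grid, cell_size, sample, k, r_max):
    """Return (photons in estimate, final search radius)."""
    voxels = sorted(grid, key=lambda c: voxel_distance(c, sample, cell_size))
    estimate, radius = [], 0.0
    for cell in voxels:
        # Until k photons are found, any voxel within r_max may still matter;
        # afterwards only voxels that intersect the determined radius do.
        limit = radius if len(estimate) >= k else r_max
        if voxel_distance(cell, sample, cell_size) > limit:
            break
        for p in grid[cell]:
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, sample)))
            if d > r_max:
                continue                     # never add, never grow
            if len(estimate) < k:
                estimate.append(p)
                radius = max(radius, d)      # grow the radius to include this photon
            elif d <= radius:
                estimate.append(p)           # inside the radius: add without growing
    return estimate, radius


photons = [(0.6, 0.5), (0.5, 0.9), (1.2, 0.5), (0.2, 0.5), (2.5, 0.5)]
grid = build_grid(photons, cell_size=1.0)
estimate, radius = knn_grid(grid, 1.0, sample=(0.5, 0.5), k=2, r_max=3.0)
print(len(estimate), radius)
```

In this 2D example two photons were requested but three are returned: the third lies inside the radius that was fixed once the second photon was accepted. That is the "may locate more than requested" behavior noted above.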
System Implementation
  NVIDIA GeForce FX 5900 Ultra (NV35)
  Cg compiler 1.1
  [Figure: system flow diagram with labels Trace Photons, Build Photon Map, Ray Trace Scene, Compute Radiance Estimate, Compute Lighting, Render Image]
Demos
Glass Ball – Bitonic Sort 512x384, 5K photons
Glass Ball – Stencil Routing 512x384, 5K photons
Ring – Bitonic Sort 512x384, 16K photons
Ring – Stencil Routing 512x384, 16K photons
Cornell Box – Bitonic Sort 512x512, 65K photons
Cornell Box – Stencil Routing 512x512, 65K photons
Cornell Box – Increased Search Radius
Open Issues (1)
  How to prevent program execution over a subset of pixels?
    Non-uniform pixel computation distribution
    Radiance estimate
    KILL is only a write mask
    Early-z occlusion culling
    No pixel level control
  Compute mask, branching, or stream buffer?
    Improve radiance estimate speed by 30-70% over tiling
Open Issues (2)
  Scatter
    Makes (a programmer’s) life easier
    Is it worth implementing?
    Gain factor of log² n avoiding sort
Future Work
  Kd-trees
  Photon power redistribution
  Adaptive sampling
  Progressive refinement
Conclusions
  The GPU can compute an entire global illumination solution
    Nearly interactive
  Implemented an adaptive k-nearest neighbor query for the GPU
    kNN-grid
  Shown how to construct sparse data structures on the GPU
    Bitonic merge sort and binary search
    Stencil routing
  Sorting and searching algorithms applicable to other computations
Acknowledgments
  Stanford FlashG
    Ian Buck, Mike Houston, Kekoa Proudfoot
  Stencil routing
    Kurt Akeley, Matt Papakipos
  Hardware and drivers
    David Kirk, Nick Triantos
  Funding
    NVIDIA, DARPA, NSF, 3Com