
1 Fu-Chung Huang Ravi Ramamoorthi University of California, Berkeley
Sparsely Precomputing the Light Transport Matrix for Real Time Rendering. Fu-Chung Huang, Ravi Ramamoorthi, University of California, Berkeley. (Hi, good morning everyone, thanks for coming to my talk this early.) My paper is "Sparsely Precomputing the Light Transport Matrix for Real Time Rendering." I'm Fu-Chung from UC Berkeley.

2 Precomputed Radiance Transfer (PRT)
Precomputed Radiance Transfer (PRT) is a recent development in real-time rendering: natural environment lighting, intricate shadows, and glossy materials, all through precomputation. The method makes rapid prototyping of lighting design possible, for games and movies. Although we can render at interactive rates, the time spent in precomputation is very long. (click) In this case, the precomputation takes 13 hours, and that time is the bottleneck that prevents people from using the method. Our goal in this paper is to accelerate the precomputation, so let us first define the problem. [Sloan et al. 02] [Ng et al. 03, 04] [Liu et al. 04] [Wang et al. 04] Precomputation Time: 13 hours 22 mins

3 The Problem Reflectance equation Separable Dynamic lighting
Ng et al. 04. Reflectance equation. Separable dynamic lighting. Static geometry. Lighting, Visibility, BRDF. To render this scene, we consider the reflectance equation: the outgoing radiance B(x) at the red dot is given by the integral, over all directions, of the surrounding lighting L, the visibility V, and the BRDF. (click) The key insight is that the lighting changes dynamically and is separable from the rest of the equation. (click) So we can combine the information that depends only on the static scene by precomputing V and the BRDF into (click) the transport function T. Then, in real time, we only need to multiply the lighting with the transport function. (click) Notice that, by the nature of precomputation, we are limited to static scenes. Beyond that, what is the problem with precomputing the transport function T? Limited to static scene
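The core PRT trick on this slide, folding visibility and BRDF into a transport function and relighting with a single dot product, can be sketched as follows. This is a minimal illustration with made-up toy numbers, not the paper's implementation:

```python
# Minimal sketch of the PRT idea: precompute a transport vector T per
# vertex (folding visibility and BRDF together), then relight at runtime
# with one dot product against the dynamic lighting vector L.
# All values below are toy numbers for illustration.

def precompute_transport(visibility, brdf):
    """Offline: fold static visibility and BRDF into one transport vector."""
    return [v * f for v, f in zip(visibility, brdf)]

def shade(transport, lighting):
    """Runtime: exitant radiance B(x) = sum_i T_i * L_i."""
    return sum(t * l for t, l in zip(transport, lighting))

# Toy example with 4 lighting directions.
visibility = [1.0, 0.0, 1.0, 1.0]   # direction 2 is occluded
brdf       = [0.5, 0.5, 0.25, 0.25]
T = precompute_transport(visibility, brdf)

lighting = [2.0, 4.0, 1.0, 0.0]     # dynamic environment lighting
print(shade(T, lighting))           # -> 1.25
```

Because the lighting is the only dynamic factor, only the final dot product runs per frame; everything else is baked into T.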

4 Storage and Time Matrix Compression 50K vert. (rows)
24K directions (cols). 1.2B rays. Compression: 10x to 50x. Precomputation time is not reduced! Clustered PCA. Assume a typical scene with 50K vertices, each with 24K angular directions to sample. That gives 1.2 billion rays to trace and 1.2 billion cells to store. (click) The immediate problem is how to compress this data. (click) For example, a wavelet transformation reduces the coefficients from several thousand to a few hundred, and they can be further quantized to 6-8 bits. (click) In the other direction, exploiting *spatial coherence*, we can cluster vertices into small groups and compress each with very few PCA bases. (click) Combined, these give a 10-50x compression rate, but they require additional time to compute. (click) And notice: even though we reduce the storage, the precomputation time is not reduced at all! So the question is: can we find coherence at the sampling stage to reduce the sampling time? Wavelet + Quantization
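To see where the 1.2-billion figure comes from, a quick back-of-the-envelope calculation (the 4-bytes-per-cell assumption is mine; the 10x-50x range is from the slide):

```python
# Back-of-the-envelope storage for the full transport matrix:
# one row per vertex, one column per sampled direction.
vertices   = 50_000
directions = 24_000
cells = vertices * directions
print(cells)                      # 1_200_000_000 rays / matrix cells

# Assuming 4 bytes per cell, that is ~4.8 GB uncompressed; a 10x-50x
# compression (wavelets + quantization, or CPCA) brings it down to:
bytes_raw = cells * 4
print(bytes_raw / 50, bytes_raw / 10)   # roughly 96 MB to 480 MB
```

The point of the slide stands out in the numbers: compression shrinks the stored matrix, but every one of the 1.2 billion cells still has to be ray traced first.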

5 Precomputation Time Buddha Scene 5x speed up in precomputation
The answer is yes: we can render with the same quality, with reduced precomputation time. In this example, we sample the scene in 2 hours 36 minutes, achieving a 5x speedup in precomputation. Sparse Sampling: 2 hours 36 mins. Full Sampling: 13 hours 22 mins.

6 Precomputation Time Bunny Scene 4x speed up precomputation
Here is another example with the bunny model; again we achieve around a 4x speedup in precomputation. Before presenting our algorithm, I first want to review some important literature. Sparse Sampling: 3 hours 25 mins. Full Sampling: 13 hours 8 mins.

7 Outline Motivation / Introduction Related Work Algorithm Results
Conclusion / Future Work

8 Related Work Precomputation based rendering
[Nimeroff et al. 94] [Dorsey et al. 95] Focus only on added functionality. [Sloan et al. 02,03] [Ng et al. 03,04] [Wang et al. 04,06] The idea of PRT can be traced back 15 years, and it was formally introduced by Sloan et al., who used spherical harmonics to achieve real-time low-frequency shadows and later proposed CPCA to compress the data. (click) Ng et al. used non-linear wavelet approximation to achieve all-frequency shadows, and later introduced the notion of the triple product integral. (click) Wang et al. showed that glossy materials are also possible with all-frequency-shadow PRT. Note that these are just *representative seminal papers*; much more work has appeared at SIGGRAPH and EGSR in recent years. So if these methods work well, what is the problem? (click) They keep adding new features for real-time rendering, but the sampling time is still not reduced. Low-frequency shadows, CPCA compression. High-frequency shadows, triple product integral. Glossy materials.

9 Related Work Compressive sensing Not applicable to PRT
[Candes et al. 06] [Candes and Tao 06] Sampling rate = k log N for a k-sparse signal. Not applicable to PRT: no random-pattern sampling in a virtual scene; we must sample one ray at a time. [Peers et al. 09] [Sen and Darabi 09] Compressive sensing is an interesting line of research on sparsely sampling data. Candes et al. showed that a signal of length N can be *recovered* from k log N samples if it has a k-sparse representation. (click) This result was applied to appearance acquisition by Peers et al. and by Sen and Darabi, who also have a new paper appearing later in this session. (click) However, the method is not applicable to PRT, because there is no random sampling pattern in a virtual scene, where ray tracing traces one direction at a time. So we need something that applies to point samples.

10 Related Work Row-column sampling
Column selection is fixed across all rows. R, C; B = C A+ R. Row-column sampling is a method we can use: it samples only certain row and column cells, then recovers or approximates the entire matrix. (click) Here is the work by Hasan et al. (click) They sparsely sample the matrix of many-light contributions. (click) They first sample some rows, and (click) by finding clusters, they use the cluster centers to represent the matrix. (click) Wang et al. use the Nystrom method to sparsely sample the matrix. (click) With certain rows and columns sampled, (click) they can fully recover the matrix. These methods are the closest references to our work, (click) but they use a fixed column selection across all rows. [Hasan et al. 07] [Wang et al. 09]
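The Nystrom-style recovery mentioned on this slide can be sketched in a few lines of numpy. This is an illustrative reimplementation, not the code of Wang et al.: sample rows R and columns C of a low-rank matrix and recover it from the intersection block A via B = C A+ R (A+ is the pseudoinverse):

```python
import numpy as np

# Row-column (Nystrom-style) reconstruction sketch. Given a low-rank
# matrix B, sample row indices I and column indices J; with C = B[:, J],
# R = B[I, :], and A = B[I, J] (their intersection), the approximation
# is B ~= C @ pinv(A) @ R. Exact when rank(A) equals rank(B).
rng = np.random.default_rng(0)

# Build a rank-2 test matrix so exact recovery is possible.
U = rng.standard_normal((60, 2))
V = rng.standard_normal((2, 40))
B = U @ V

I = [3, 17, 30]          # sampled rows
J = [5, 12, 25]          # sampled columns
C = B[:, J]
R = B[I, :]
A = B[np.ix_(I, J)]

B_hat = C @ np.linalg.pinv(A) @ R
print(np.max(np.abs(B - B_hat)))   # tiny for a rank-2 matrix
```

The catch the slide points out: the column set J is the same for every row, which is exactly the restriction the paper's per-vertex angular selection removes.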

11 Related Work Hierarchical and Sparse Sampling Adaptive Methods
[Kontkanen et al. 06] [Hasan et al. 07] [Lehtinen et al. 08] [Krivanek and Gautron 09] Adaptive Methods [Guo 98] [Krivanek et al. 04] [Krivanek and Gautron 09] Finally, I want to mention some other work in offline rendering. In the sense of sparse sampling plus interpolation, our method is also related to irradiance caching and similar work. However, we focus on drastically changing high-frequency shadows, while these methods focus on smooth indirect illumination. (click) The adaptive remeshing work can be seen as complementary to our method. These methods exploit spatial coherence, but angular coherence is not utilized.

12 Outline Motivation / Introduction Related Work Algorithm Results
Conclusion / Future Work. After reviewing previous work, let's talk about our method.

13 Algorithm Outline Overview Dense Vertex Sampling
Sparse Vertex Sampling. Integrating Clustered PCA. I will start with a brief overview.

14 Overview - 2 Phase Sampling
Dense vertices: Spatial 20%~25%, Angular ~30%. Sparse vertices: Spatial 75%~80%, Angular 5%~7%. Our method uses a two-phase sampling strategy that divides the scene into two kinds of vertices. (click) The first kind, called dense vertices, have their angular directions sampled fully or densely. (click) The second kind, called sparse vertices, (click) have their angular directions sampled sparsely; (click) using information from neighboring dense vertices, we can reconstruct them later. (click) Around 20-25% of the vertices are dense, with an angular sampling rate of about 30%. (click) The remaining 75-80% are sparse, with an angular sampling rate of only 5-7%. Now let us view the problem as row-column sampling.

15 Overview - 2 Phase Sampling
Row-column sense. Dense vertices: Spatial 20%~25% x Angular ~30% = 6%~7%. Sparse vertices: Spatial 75%~80% x Angular 5%~7% = 4%~5%. Given the matrix, (click) we first sample 20-25% of the rows as dense vertices, (click) with a column sampling rate of about 30%, (click) which accounts for 6-7% of the total cost. (click) The remaining 75-80% of the rows are sparse vertices; (click) for them we sample 5-7% of the columns, (click) for a total cost of 4-5%. In the next few slides I will explain the sampling for dense vertices.
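The total sampling budget works out as follows. The midpoints of the slide's ranges are my own choice for illustration:

```python
# Total fraction of the full matrix actually sampled, combining both
# phases. Midpoints of the slide's ranges are used for illustration.
dense_spatial,  dense_angular  = 0.225, 0.30   # 20-25% of rows, ~30% of cols
sparse_spatial, sparse_angular = 0.775, 0.06   # 75-80% of rows, 5-7% of cols

dense_cost  = dense_spatial  * dense_angular    # ~6.75% of all cells
sparse_cost = sparse_spatial * sparse_angular   # ~4.65% of all cells
total = dense_cost + sparse_cost
print(round(total, 3))   # ~0.114, i.e. roughly an 11% overall sampling rate
```

This is consistent with the ~11% sampling rates reported in the performance tables later in the talk.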

16 Algorithm Outline Dense Vertex Sampling Overview
Where? How? What? Sparse Vertex Sampling. Integrating Clustered PCA. In this section I will describe, for dense vertices: where to place them, how to choose them, and which angular directions to trace.

17 Dense Vertex Distribution
Observation from CPCA: non-uniform cluster sizes. Large clusters: low rank. Small clusters: high rank. Non-uniform sampling, but how? We begin with an observation from CPCA: the clusters are not uniform in size. (click) On closer inspection, we find that large clusters have low rank and small clusters have high rank. (click) This hints that we should place more samples in high-rank areas. But how do we sample? Without prior knowledge, we do not have the rank information in advance.

18 Dense Vertex Spatial Sampling
Sampling by exploration. 1st iteration: uniform. Local rank -> probability. 2nd iteration, and so on. Zoom-up: high-rank area. We gather this information during the sampling iterations. (click) In the first iteration we sample uniformly and use those samples to estimate a local rank, which assigns the probabilities for the next iteration's sampling. (click) *Notice the red box*, which marks the high-rank area: it is now more likely to be sampled. (click) In the second iteration, we again estimate ranks to set the next iteration's probabilities, and so on. (click) Here is the zoom-up: samples concentrate in the high-rank regions. Having described where to sample spatially, I will next describe which angular directions to sample. [1st iteration] [2nd iteration] [3rd iteration] [4th iteration]
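One iteration of the exploration loop can be sketched like this. This is an illustrative reimplementation, not the paper's code, and the per-region rank numbers are made up: estimated local ranks are normalized into a probability distribution, and the next round of samples is drawn from it.

```python
import random

# Illustrative sketch of rank-driven sample placement: regions with
# higher estimated local rank get proportionally more new samples.
# The rank values below are invented for the example.

def rank_to_probability(local_ranks):
    """Normalize per-region rank estimates into sampling probabilities."""
    total = sum(local_ranks)
    return [r / total for r in local_ranks]

def draw_samples(probs, n, seed=0):
    """Draw n region indices according to the probabilities."""
    rng = random.Random(seed)
    regions = range(len(probs))
    return [rng.choices(regions, weights=probs)[0] for _ in range(n)]

# Four regions; region 2 has high local rank, so it should dominate.
ranks = [1, 1, 8, 2]
probs = rank_to_probability(ranks)
print(probs)                  # region 2 gets ~2/3 of the probability mass

picks = draw_samples(probs, 1000)
print(picks.count(2) / 1000)  # high-rank area is sampled most often
```

Each real iteration would then re-estimate the local ranks from the newly drawn samples before the next draw.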

19 Dense Adaptive Angular Sampling
1st pass: regularly. 2nd pass: adaptively, if values are inconsistent. 50%-70% savings. We use adaptive angular sampling to find features. (click) We first sample regularly at a lower resolution. (click) For every other direction, (click) we check the already-sampled values in a surrounding window; (click) if those values are inconsistent, (click) we sample that direction as well. (click) This way we concentrate samples around features, (click) giving around 50-70% savings.
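The two-pass idea can be sketched in 1D. This is a toy illustration under my own assumptions (the real method works on 2D angular domains, and the step-function `trace`, the stride, and the threshold are all invented): sample a coarse grid, then trace the skipped directions only where the surrounding coarse samples disagree.

```python
# Sketch of two-pass adaptive angular sampling in 1D.
def trace(d):
    """Stand-in for tracing one ray: a step function with a 'shadow edge'."""
    return 0.0 if d < 10 else 1.0

def adaptive_sample(n_dirs=32, stride=4, threshold=0.5):
    values = {}
    # Pass 1: sample regularly at low resolution.
    for d in range(0, n_dirs, stride):
        values[d] = trace(d)
    # Pass 2: trace a skipped direction only when the coarse samples
    # bracketing it are inconsistent (i.e. near a feature).
    traced_extra = 0
    for d in range(n_dirs):
        if d in values:
            continue
        left = (d // stride) * stride
        right = min(left + stride, n_dirs - stride)
        if abs(values[left] - values[right]) > threshold:
            values[d] = trace(d)        # inconsistent window: trace it
            traced_extra += 1
        else:
            values[d] = values[left]    # consistent window: reuse/interpolate
    return values, traced_extra

vals, extra = adaptive_sample()
print(8 + extra, "of 32 rays traced")   # only the shadow edge is refined
```

In this toy run only the directions straddling the edge at d = 10 are traced in the second pass, which is the same kind of saving the slide reports.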

20 Sparse Vertex Sampling
Algorithm Outline: Overview, Dense Vertex Sampling, Sparse Vertex Sampling (Angular Sampling, Local Reconstruction), Integrating Clustered PCA. Having sampled the dense vertices in the spatial and angular domains, I now want to show how we handle the sparse vertices.

21 Sparse Vertex Angular Sampling
Remember from the overview: the sparse set of important angular features. In the overview we saw that for a sparse vertex (click) we sample the angular directions sparsely (click) and use neighboring dense vertices for reconstruction; this section shows how. (click) The first problem is determining which directions to sample. Where can we start? (click) We start from the neighboring dense vertices: (click) from the earlier feature-adaptive sampling, we know where their features are. (click) We select from the union of all these feature directions, but there are too many of them, (click) so we use the variance of the neighboring angular features to cut off the unimportant ones. (click) Here is the final result, a weighted selection of 5%~7% of the directions. Once the sparse set of angular features is selected, I will describe how to reconstruct the transport function.

22 Sparse Vertex Reconstruction
Reconstructed light transport? Sampled directions as constraints. We use a very simple linear system for reconstruction. The assumption is that the transport function is some linear combination of the neighboring dense vertices' transport, such that (click) the sampled directions (click) equal the linear combination of the *same directions from the dense neighbors*. (click) By solving for alpha, we recover the full light transport. The final question is how many neighbors to use.
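A minimal numpy sketch of this linear system, with toy data of my own (2 dense neighbors, 6 directions, 3 of them sampled): each row of D holds the dense neighbors' transport restricted to the sampled directions; solve D alpha = b for the combination weights, then apply alpha to the neighbors' full transport.

```python
import numpy as np

# Sketch of sparse-vertex reconstruction: assume the sparse vertex's
# transport is a linear combination of its dense neighbors' transports.
# Toy data: 2 dense neighbors, 6 angular directions, 3 of them sampled.
rng = np.random.default_rng(1)
T_dense = rng.random((2, 6))            # full transport of dense neighbors

alpha_true = np.array([0.3, 0.7])       # ground-truth combination (toy)
T_sparse_full = alpha_true @ T_dense    # what we want to recover

sampled = [0, 2, 5]                     # the few directions actually traced
b = T_sparse_full[sampled]              # constraints from the sparse samples
D = T_dense[:, sampled].T               # neighbors restricted to those dirs

alpha, *_ = np.linalg.lstsq(D, b, rcond=None)
T_reconstructed = alpha @ T_dense       # recover all 6 directions
print(np.max(np.abs(T_reconstructed - T_sparse_full)))  # ~0
```

Here plain least squares suffices because the toy vertex really is a linear combination of its neighbors; the next slide explains why an L1 solver is used in practice.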

23 Sparse Vertex Reconstruction
How many neighbors? In reality: L1 sparse solver [Kim et al. 07]. No exact radius needed. (click) We experimented with an increasing radius to include more neighbors. (click) Ideally, with full information on how to interpolate the sparse vertex, the green curve is what we would expect. (click) In reality, least squares gives the blue curve, with increasing error, because it overfits the sparse constraints. (click) Inspired by recent developments in sparse reconstruction, we use an L1 solver and obtain the lower-error red curve. And because of the sparse nature of the solution, we do not need to specify an exact radius.
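Why the L1 penalty helps can be illustrated with plain ISTA (iterative soft thresholding). Note this is my own illustration, not the interior-point solver of [Kim et al. 07], and the problem data are invented: even with many candidate neighbors in the radius, the L1 fit assigns nonzero weights to only the few that matter.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, b, lam=0.01, iters=2000):
    """Minimize 0.5*||D a - b||^2 + lam*||a||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1/L with L the Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        a = soft_threshold(a - step * D.T @ (D @ a - b), step * lam)
    return a

# Toy problem: 10 candidate neighbors, but only 2 actually contribute.
rng = np.random.default_rng(2)
D = rng.standard_normal((30, 10))
a_true = np.zeros(10)
a_true[[1, 7]] = [0.8, 0.5]
b = D @ a_true

a = ista(D, b)
print(np.sum(np.abs(a) > 0.1))   # few nonzero weights: L1 picks a sparse alpha
```

Because irrelevant candidates are driven exactly to zero, enlarging the neighbor radius mostly adds zero-weight columns, which is why no exact radius needs to be tuned.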

24 Integrating Clustered PCA
Algorithm Outline: Overview, Dense Vertex Sampling, Sparse Vertex Sampling, Integrating Clustered PCA. After sampling the dense and sparse vertices, we found that the algorithm also fits nicely with Clustered PCA.

25 Clustered PCA Incrementally adding bases
[Sloan et al. 03] Incrementally adding bases to avoid local minima. Original CPCA: for each PCA basis, for each LBG iteration, run clustering on *all* vertices. Ours: 1. run the nested clustering loop on *dense* vertices only (linear combination assumption!); 2. assign each sparse vertex to its nearest cluster; 3. run the inner loop once for all vertices. The original CPCA algorithm was proposed by Sloan et al. 03. To avoid local minima, they add bases incrementally. (click) Their method runs a nested loop over *all* vertices and becomes very expensive. (click) Our observation is that sparse vertices are linear combinations of dense vertices, (click) so we run the nested loop only for the dense vertices; you can see the clustering of the dense vertices on the left. We then assign the remaining sparse vertices to their nearest clusters and run the clustering just once. The result gives a large speedup, which we will show later.
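The core of CPCA, per-cluster PCA of the transport rows, can be sketched as follows. This is a plain illustrative version with toy rank-1 clusters, not the incremental, accelerated variant described on the slide:

```python
import numpy as np

# Illustrative sketch of per-cluster PCA (the core of CPCA): group the
# transport rows into clusters, then approximate each cluster with its
# mean plus a few principal components.
rng = np.random.default_rng(3)

def pca_approx(rows, k):
    """Approximate a cluster of rows with its mean + k PCA bases."""
    mean = rows.mean(axis=0)
    centered = rows - mean
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    coeffs = centered @ Vt[:k].T            # per-vertex coefficients
    return mean + coeffs @ Vt[:k]           # reconstructed rows

# Toy transport matrix: two clusters of vertices, each exactly rank 1,
# so a single PCA basis per cluster reconstructs them perfectly.
base_a, base_b = rng.random(8), rng.random(8)
cluster_a = np.outer(rng.random(5), base_a)     # 5 vertices, rank 1
cluster_b = np.outer(rng.random(5), base_b)

for rows in (cluster_a, cluster_b):
    approx = pca_approx(rows, k=1)
    print(np.max(np.abs(approx - rows)))        # tiny: one basis suffices
```

Storage per cluster drops from rows x directions to one mean, k bases, and k coefficients per vertex, which is where the compression comes from; the slide's acceleration additionally restricts the expensive clustering loop to the dense vertices.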

26 Outline Motivation / Introduction Related Work Algorithm Results
Error analysis. Performance. Conclusion / Future Work. Here I will show the error comparison and the performance.

27 Analysis: Angular Sampling
Adaptive sampling vs. non-adaptive angular sampling. We first compare our angular sampling strategy with related previous work: here is Wang et al. (click) and here is Hasan et al. (click) Both methods are non-adaptive, meaning their angular selection is fixed, (click) so many samples are wasted. (click) Our method uses adaptive sampling, (click) so samples go to the truly important regions. [Ours] [Wang et al. 09] [Hasan et al. 07] Sampling important directions. A lot of directional samples are wasted.

28 Analysis: L2 Error for Bunny
35K + 29K vertices. Scanned. High error at seams. For the error analysis: the bunny is a scanned model, so its vertices are distributed evenly. Notice the high error at the seams; in general, though, the error is low everywhere and unnoticeable.

29 Analysis: L2 Error for Horse
8.5K + 29K vertices. Low resolution, man-made. Here, on the other hand, is a heavily simplified mesh. This is a low-resolution, man-made model, so the vertex distribution is not uniform, and missing an important region can cause large error. However, the error is still low in general and not noticeable in the rendering.

30 Performance Precomputation time only
Rendering is real-time with the same quality.

Model    Full Sampling   Sparse Sampling   Rate    Speed Up    CPCA    New CPCA   Speed Up
Horse    4h 2m           54m               ~11%    4.47x       52m     4m         12.6x
Dancer   5h 40m          1h 15m                    4.77x       45m     3m
Bunny    13h 8m          3h 26m                    3.83x       81m     6m         12.8x

Now the table of precomputation times; rendering is real-time with the same quality. (click) The fully sampled versions took 4-13 hours. (click) Our sparse method samples only about 11%, reducing the time to 1-3 hours, (click) a 3-4x speedup. (click) Here are the CPCA times: we achieve about a 12x speedup.

31 Performance for Glossy Objects
In-Out Factorization for glossy BRDFs [Liu et al. 04] [Wang et al. 04].

Model       Size   Full Sampling   Sparse Sampling   Rate     Speed Up
Armadillo   55K    10h 7m          2h 17m            11.43%   4.75x
Buddha             13h 22m         2h 36m            11.11%   5.12x
Dragon             10h 31m         2h 1m             11.16%   5.22x
Bench       50K    7h 7m           1h 56m            18.54%   3.67x

Here is another performance table, for glossy objects; we use the in-out factorization for glossy BRDFs. (click) The scenes range from 50-55K vertices, (click) with full precomputation times of 7-13 hours. (click) With our method, we can sample each in about 2 hours. (click) Note that the bench scene has many high-frequency features, so we send more samples to that particular scene. (click) We are still 3-5x faster.

32 Precomputation Time Bench Scene 3.6x speed up in precomputation
This is the bench scene I just mentioned, the most extreme scene in our stress tests. Notice the many high-frequency features: view-dependent highlights and intricate shadows. Although this is a very difficult scene for sparse sampling, we can still precompute it within two hours, a 3.6x speedup. Sparse Sampling: 1 hour 56 mins. Full Sampling: 7 hours 7 mins.

33 Detailed Timing
Here is the system time breakdown. (click) Most of the cost is spent on ray tracing, and you can see that our overhead is minimal. In addition, although in our case 10% sampling yields a 5x rather than 10x speedup, the extra rays are sent to important regions that genuinely need more attention, exactly as the algorithm is designed to do.

34 Conclusion PRT research on real-time functionality
Precomputation is often the bottleneck. Adaptive and sparse sampling: exploit both spatial and angular coherence. Accelerated Clustered PCA compression. Sparse precomputation is possible: 5x speedup in sampling, 12x speedup computing CPCA. To conclude: current PRT research still focuses on new functionality, but precomputation remains the bottleneck that prevents people from using it. (click) In this paper we propose an adaptive and sparse sampling scheme that exploits both spatial and angular coherence, and we show that CPCA can be integrated with the new scheme. (click) Finally, we demonstrate a 5x speedup in sampling and a 12x speedup in compression.

35 Future Work GPU acceleration New capabilities
Interactive precomputation. New capabilities: rapid prototyping, lighting design, dynamic scenes. General theory of sparse sampling: avoid heuristic parameter tuning. Broader context: appearance acquisition, offline rendering. For future work: since most of our system time is still ray tracing, GPUs can accelerate it, and we can expect interactive precomputation. (click) As for new capabilities, fast precomputation enables rapid prototyping of lighting design, and temporal coherence could be exploited for dynamic scenes. (click) We would also like to see a general theory of sparse sampling in the rendering community, since most parameters are currently tuned heuristically. (click) In a broader context, sparse sampling also matters for appearance acquisition and offline rendering, and we would like to see more of it there.

36 The End Acknowledgements Ryan S. Overbeck Anonymous reviewers
NSF CAREER Grant IIS ONR PECASE N Intel NVIDIA Adobe Pixar

