Published by Kimberly Starling. Modified over 9 years ago.
2
Zen of multi core rendering »Corrinne Yu »Halo team Principal engine programmer »Corrinne.Yu@microsoft.com
3
Zen of multi core rendering »Take away »Compilation and survey of effective rendering techniques for current generation multi core console hardware
4
Rendering equation
5
»Radiance leaving a point »Integral of radiance in all directions
6
Rendering equation »Radiance leaving a point »Integral of radiance in all directions »Reflectance distribution function
7
Rendering equation »Radiance leaving a point »Integral of radiance in all directions »Reflectance distribution function »Light coming inward to surface position
8
Rendering equation »Radiance leaving a point »Integral of radiance in all directions »Reflectance distribution function »Light coming inward to surface position »Visibility of light to surface position
9
Rendering equation »Integral of radiance in all directions »Reflectance distribution function »Light coming inward to surface position »Visibility of light to surface position »Attenuation of inward light due to incident angle with surface normal
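Written out, the terms built up on the preceding slides combine roughly as follows. This is a sketch under assumed notation: the symbols $L_o$, $f_r$, $L_i$, $V$, $n$ and $\omega$ do not appear in the slides, and no emission term is included because the slides do not list one.

```latex
L_o(x, \omega_o) =
  \int_{\Omega}
    f_r(x, \omega_i, \omega_o)\,   % reflectance distribution function
    L_i(x, \omega_i)\,             % light coming inward to the surface position
    V(x, \omega_i)\,               % visibility of the light to the surface position
    \max(0,\, n \cdot \omega_i)\,  % attenuation due to incident angle with the normal
  \mathrm{d}\omega_i
```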
10
Compromise and cheats »This is computed per surface element »This is infeasibly expensive »In the past, we made quality compromises throughout to make run time rendering possible
11
First generation »1 to 4 dynamic lights »Simple point lights »Lambertian »Blinn-Phong approximation »Pre-computed diffuse radiosity »Shadow map optional
12
Hardware »117 million triangles per second »0.933 gigapixels per second »1.86 giga texels per second »6.4 gigabytes of bandwidth per second »64 megabytes of video memory
14
Second generation »500 million triangles per second »4 gigapixels per second »8 giga texels per second »256 gigabytes of bandwidth per second »512 megabytes of video memory
15
Second generation »4.27x triangle throughput »4.29x pixel fill rate »4.29x texel rate »40x bandwidth »8x video memory
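The multipliers on this slide can be re-derived from the two hardware slides. The snippet below is only an arithmetic check; the dictionary keys and the helper name `ratios` are made up for illustration.

```python
# Figures copied from the first-generation "Hardware" slide and the
# "Second generation" slide. Key names are illustrative, not from the talk.
first_gen = {
    "Mtris/s": 117, "Gpixels/s": 0.933, "Gtexels/s": 1.86,
    "GB/s": 6.4, "MB video memory": 64,
}
second_gen = {
    "Mtris/s": 500, "Gpixels/s": 4, "Gtexels/s": 8,
    "GB/s": 256, "MB video memory": 512,
}

def ratios(old, new):
    """Return new/old for every metric present in `old`."""
    return {name: new[name] / old[name] for name in old}

speedup = ratios(first_gen, second_gen)
for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
```

The texel ratio comes out nearer 4.30 than the slide's 4.29, which may simply reflect rounding in the quoted rates.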
17
Second generation »Large number of lights with precomputed radiance transfer »Environment and area lights »Realistic reflectance models »Cook Torrance, Ward »Shadow map
18
Large lights integral »Large number of lights integral »Static geometry »Precomputed visibility »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited
19
Multi core generation »70x triangle throughput »450x pixel fill rate »390x texel rate »110x bandwidth »16x video memory
21
Amdahl’s law
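The slide shows only the title, so as a reminder, here is the standard form of Amdahl's law, speedup = 1 / ((1 - p) + p / n). The function name and the 95% parallel fraction are illustrative choices, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when `parallel_fraction` of the work scales across `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Assumed workload: 95% of the frame is parallelizable.
for cores in (1, 4, 16, 64):
    print(cores, "cores ->", round(amdahl_speedup(0.95, cores), 2), "x")
```

The serial fraction caps the achievable speedup no matter how many cores are added, which is why the later slides stress keeping every core busy with local data.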
22
Multi core insight »Fill rate is achieved by completely asynchronous out of order VPU (Vector Processing Unit) computation »My experience with CUDA is that there are intentionally no synchronization primitives
23
Multi core insight »On Larrabee, each core has 4 hardware threads »Each thread is out of order »But for one thread’s execution, the vertices and pixels are synchronized
24
Multi core insight »So there are essentially 256 out of order processes »Each consisting of a batch of about 16 synchronized pixels or vertices in flight at any one time
25
Multi core insight »Expectation is shader flops will grow the most »Speed not from higher clock rate »Speed from larger number of low power cores »Memory is not expected to catch up to shader flops
26
Multi core insight »ALU's or VPU's to increase by 300x »Future is tfetch bound, not ALU bound »Homogeneous computing »Keep ALU's or VPU's very busy with cache coherent local data
27
Multi core generation »Occlusion from static geometry »Precomputed visibility »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited
28
Multi core generation »Occlusion from dynamic geometry »Precomputed visibility »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited
29
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited
30
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »Low-frequency illumination »Image-space resolution limited
31
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »Image-space resolution limited
32
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution
33
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution »Remove remaining compromises
34
Practical techniques »Directional light map basis »Zonal harmonics »Screen space ambient occlusion »Shadow map
35
Directional light map »Proposed by Valve's G McTaggart for Half-Life 2 »Used in many games, such as Half-Life 2 and Unreal
36
Directional light map »Spatial axial basis »(- 1 / sqrt(6), - 1 / sqrt(2), 1 / sqrt(3) ) »( - 1 / sqrt(6), 1 / sqrt(2), 1 / sqrt(3) ) »( sqrt(2 / 3), 0, 1 / sqrt(3) )
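A minimal sketch of how such a basis can be used at run time, assuming one scalar lightmap sample per basis direction and a simple clamped, normalized dot-product weighting. The weighting scheme and the function names are assumptions; shipped engines differ in how they weight and normalize.

```python
import math

# Tangent-space basis from the slide (z is the unperturbed surface normal).
BASIS = (
    (-1 / math.sqrt(6), -1 / math.sqrt(2), 1 / math.sqrt(3)),
    (-1 / math.sqrt(6),  1 / math.sqrt(2), 1 / math.sqrt(3)),
    (math.sqrt(2 / 3),   0.0,              1 / math.sqrt(3)),
)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade(normal, lightmaps):
    """Blend three lightmap samples by clamped n.b weights (assumed weighting)."""
    weights = [max(0.0, dot(normal, b)) for b in BASIS]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # normal faces away from all three basis directions
    return sum(w * l for w, l in zip(weights, lightmaps)) / total

# A flat (unperturbed) normal weights the three samples equally.
print(shade((0.0, 0.0, 1.0), (0.2, 0.5, 0.8)))
```

The three vectors are mutually orthogonal and unit length, which is what lets a per-pixel normal map pick between the three stored radiance samples.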
37
Analysis »Static radiance can interact with directional changes of reflectance surface »Per pixel normal reflectance of radiosity »Per pixel normal specularity
38
Analysis »Basis and precision are not uniformly distributed »Radiance is correct at exactly 3 clamped directions »Radiance undersampling occurs for wide ranges of directions »Only for hemisphere
39
Pre-computed radiance transfer »Zonal harmonics »R Ramamoorthi and P Hanrahan came up with an efficient representation for irradiance environment
40
Irradiance environment map »Only the first 2 orders of zonal harmonics »Only 9 terms are used »Average errors only 1% against ray tracing »Much less error prone than directional light maps
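For reference, the 9-term evaluation can be sketched as below. The constants and the quadratic form follow Ramamoorthi and Hanrahan's published irradiance formula (bands l = 0, 1, 2); the dictionary layout keyed by (l, m) and the single-channel coefficients are assumptions made for this example.

```python
# Constants from the Ramamoorthi-Hanrahan irradiance environment map formula.
C1, C2, C3, C4, C5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708

def irradiance(L, n):
    """Irradiance for unit normal n = (x, y, z).

    L maps (l, m) to one lighting coefficient per term (9 entries in total).
    """
    x, y, z = n
    return (C1 * L[(2, 2)] * (x * x - y * y)
            + C3 * L[(2, 0)] * z * z
            + C4 * L[(0, 0)]
            - C5 * L[(2, 0)]
            + 2 * C1 * (L[(2, -2)] * x * y + L[(2, 1)] * x * z + L[(2, -1)] * y * z)
            + 2 * C2 * (L[(1, 1)] * x + L[(1, -1)] * y + L[(1, 0)] * z))

# Hypothetical lighting: constant ambient term only.
ambient = {(l, m): 0.0 for l in range(3) for m in range(-l, l + 1)}
ambient[(0, 0)] = 1.0
print(irradiance(ambient, (0.0, 0.0, 1.0)))
```

In a shader the same expression is usually folded into one small matrix per color channel, so the per-pixel cost is a handful of multiply-adds.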
41
Analysis »Completely feasible in current hardware »Better than directional light maps
42
Analysis »Completely feasible in current hardware »Better than directional light maps »Only the lowest of frequencies »Incapable of representing dynamic local lights
43
Screen space ambient occlusion »Developed by V Kajalin »Used first in Crysis »Used by games like Crysis and Unreal »Sample depth difference between screen space neighbors as occlusion factor
44
Optimization »Too many samples in reality »In practice read small number of samples from a randomly rotated kernel »Results are filtered to reduce noise
45
Analysis »Too many samples in reality »In practice read small number of samples from a randomly rotated kernel »Results are filtered to reduce noise »Low number samples lead to low impact visual effect
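The sampling step described above can be sketched on the CPU as follows. This is a toy version under stated assumptions: a 2D depth buffer where larger values are farther away, a hypothetical four-tap 2D kernel, and no blur pass. It is not the Crysis implementation.

```python
import math
import random

# Hypothetical unit-length 2D kernel; production kernels are larger and 3D.
KERNEL = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

def ssao(depth, x, y, kernel=KERNEL, radius=2, bias=0.01, rng=None):
    """Fraction of rotated kernel taps that are closer to the camera than (x, y)."""
    rng = rng or random.Random(0)
    height, width = len(depth), len(depth[0])
    angle = rng.uniform(0.0, 2.0 * math.pi)  # per-pixel random rotation
    c, s = math.cos(angle), math.sin(angle)
    center = depth[y][x]
    occluded = 0
    for kx, ky in kernel:
        rx, ry = kx * c - ky * s, kx * s + ky * c
        sx = min(width - 1, max(0, x + round(rx * radius)))
        sy = min(height - 1, max(0, y + round(ry * radius)))
        if center - depth[sy][sx] > bias:  # neighbor is in front of us
            occluded += 1
    return occluded / len(kernel)

flat = [[0.5] * 5 for _ in range(5)]
pit = [row[:] for row in flat]
pit[2][2] = 1.0  # center pixel sits at the bottom of a hole
print(ssao(flat, 2, 2), ssao(pit, 2, 2))
```

With so few taps the raw result is noisy from pixel to pixel, which is why the slide calls for filtering afterwards.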
46
Shadow map »Xbox 360 has several hardware bilinear weight fetch instructions »Performance boosters »Use them for hardware accelerated percentage closer filtering »getWeights1D, getWeights2D, getWeights3D, getWeightsCube
47
Shadow map »Poisson filter with rotating kernel is shipped in many games, including Fable 2, Brothers in Arms, and so on
48
Poisson distribution
49
Poisson filter »Generate random numbers with this distribution »Rotate them »Offset source sample by the jitters »Render weighted accumulation
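The four steps can be sketched as a CPU-side percentage closer filter. Everything here is illustrative: the offsets are hand-picked stand-ins for a real Poisson-distributed set, all taps are weighted equally, and `pcf_shadow` is a made-up name rather than an API from any shipped title.

```python
import math

# Hypothetical offsets inside the unit disk, standing in for a Poisson set.
OFFSETS = [(-0.61, 0.32), (0.17, 0.75), (0.68, -0.11), (-0.21, -0.58),
           (0.05, 0.09), (-0.85, -0.35), (0.48, 0.55), (0.33, -0.72)]

def pcf_shadow(shadow_map, u, v, receiver_depth, angle, radius=1.5, bias=0.005):
    """Return the lit fraction in [0, 1] at texel coordinates (u, v)."""
    height, width = len(shadow_map), len(shadow_map[0])
    c, s = math.cos(angle), math.sin(angle)
    lit = 0
    for ox, oy in OFFSETS:
        # Rotate the kernel, then offset the source sample by the jitter.
        jx = (ox * c - oy * s) * radius
        jy = (ox * s + oy * c) * radius
        sx = min(width - 1, max(0, int(round(u + jx))))
        sy = min(height - 1, max(0, int(round(v + jy))))
        # Depth comparison per tap, accumulated with equal weights.
        if receiver_depth - bias <= shadow_map[sy][sx]:
            lit += 1
    return lit / len(OFFSETS)

# Occluder at depth 0.2 covers the left half of an 8x8 shadow map.
edge_map = [[0.2] * 4 + [1.0] * 4 for _ in range(8)]
print(pcf_shadow(edge_map, 3.5, 4.0, receiver_depth=0.5, angle=0.0))
```

At the occluder edge the result falls between 0 and 1, which is the soft penumbra; varying `angle` per pixel trades banding for noise.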
50
Analysis »Shadow map itself has no soft edge »Soft shadow map is created from jitters and filters »Shadow map is an image based technique of finite resolution
51
Analysis »Still a fast technique for high frequency local lighting »10000 spherical harmonics terms will not give you the occlusion a shadow map gives you »Still useful for a very long time
52
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution »Remove remaining compromises
53
Dynamic radiance »Haar wavelet radiance caches »Radiance transfer factorization »Dimensionality reduction »Linear discriminant analysis »BRDF factorization
54
Dynamic radiance »[Diagram labels: linear discriminant analysis, BRDF factorization, wavelet caches, distance cube (or hemi cube), radiance factorization, wavelet radiance caches, rasterization, factorized radiance caches, factorized BRDF, dynamic radiance]
55
Radiance caches »[Diagram labels: wavelet caches, distance cube (or hemi cube), radiance factorization, wavelet radiance caches]
56
Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization
58
Haar wavelet basis »Spherical harmonics is not the only basis available for radiance transfer »Radiance and sum of area lights can also be represented by Haar wavelets
59
Haar wavelet radiance »What is exciting about the Haar wavelet basis is that its radiance visibility triple integral is fast enough to run on the GPU in real time
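One common way to write such a triple integral, assuming lighting $L$, visibility $V$ and reflectance $\rho$ are each expanded in the same orthonormal basis $\Psi$. The symbols and the coefficient name $C_{ijk}$ are assumptions and do not come from the slides.

```latex
B = \int_{\Omega} L(\omega)\, V(\omega)\, \rho(\omega)\, \mathrm{d}\omega
  = \sum_i \sum_j \sum_k C_{ijk}\, L_i\, V_j\, \rho_k,
\qquad
C_{ijk} = \int_{\Omega} \Psi_i(\omega)\, \Psi_j(\omega)\, \Psi_k(\omega)\, \mathrm{d}\omega
```

For Haar wavelets most of the $C_{ijk}$ vanish because basis functions with disjoint support multiply to zero, which is what makes a sparse evaluation plausible at run time.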
60
Haar wavelet
61
2D Haar wavelet and visibility »The visibility function V(x, theta) is also a binary function »Multiplying wavelet radiance by visibility spatially and physically turns parts of the wavelet equation on and off
62
Wavelet and integrals »The integral of the product of wavelet radiance and visibility also simplifies the run-time equation
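A small 1D illustration of that simplification, assuming an orthonormal Haar transform: the integral of radiance times visibility equals the dot product of their wavelet coefficients. The sample signals are invented, and a real implementation would work on 2D cube-map faces and keep only the largest coefficients.

```python
import math

def haar(signal):
    """Orthonormal 1D Haar transform; the length must be a power of two."""
    out = list(signal)
    n = len(out)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + diff  # coarse averages first, details after
        n = half
    return out

# Invented samples of radiance and binary visibility over 8 directions.
radiance = [0.9, 0.8, 0.1, 0.0, 0.0, 0.2, 0.7, 1.0]
visibility = [1, 1, 1, 0, 0, 0, 1, 1]

direct = sum(r * v for r, v in zip(radiance, visibility))
in_wavelets = sum(a * b for a, b in zip(haar(radiance), haar(visibility)))
print(direct, in_wavelets)
```

Because the basis is orthonormal the two sums agree, so the run-time cost depends on how many nonzero coefficients survive rather than on the number of directions sampled.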
63
Wavelet visibility insights »In some ways, spherical harmonics is the frequency corrected distribution of the basis in directional light map »Zonal harmonics correctly samples and stores radiance contribution without a preference to a direction
64
Wavelet visibility insights »“Simulating soft shadows with graphics hardware” Heckbert, Herf, 1997 »Heckbert rendered soft shadows by rendering shadows from 100 lights to create shadow penumbra
65
Analysis »No BRDF and inter-reflection »No radiance transfer »No specular reflectance
66
Analysis »No BRDF and inter-reflection »No radiance transfer »No specular reflectance »It was GPU accelerated for its time!
67
Multi core rendering »What is the modern multi core shader / homogeneous function pipeline version of this technique?
68
Multi core rendering »Not just shadows, the full radiance illumination model
69
Multi core rendering »Not just shadows, the full radiance illumination model »Not one light per pass, sample sparse wavelet data efficiently in tfetchCube
70
Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization
71
Dynamic radiance »For dynamic geometry, convolution of the visibility changes with the radiance wavelet coefficients must be performed before the radiance is applied »Still challenging to perform at run time
72
Ray tracing or radiosity »Capture only occlusion »Capture the full transport and full reflectance distribution »GPU occlusion through rasterization »GPU kd-tree line trace
73
Capture only occlusion »Feasible with current hardware »Fast »GPU side, hardware occlusion »CPU side, line trace into kd-tree »Visually unsophisticated
74
Capture full reflectance transport »Visually much more complex than GPU occlusion »More expensive »Fill out wavelet probes on different threads across multiple frames »Unfinished wavelet probes still useful for radiance
75
Radiosity »The hemi-cube: a radiosity solution for complex environments. Cohen and Greenberg 1985 »Use GPU to rasterize radiance
76
Radiosity »Great for low frequency spherical harmonics »First pass has direct lighting only »For high frequency wavelets, needs excessively high resolution »No caustics, subsurface scattering
77
Radiosity »Low resolution first pass with GPU hemi-cube »Higher frequency passes with direction cube kd-tree line tracing
78
Raytracing »Direction cube techniques and ray tracer caches can take up too much memory »Reyes ray tracing may be more parallelizable, but be careful of bucket load balancing
79
Bounding volume hierarchy »Kd-tree can be 15x faster than BSP for ray tracing »SAH (surface area heuristic) only necessary in deeper nodes »For nodes close to the root, dividing by the number of objects in the boxes is good enough
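For reference, the surface area heuristic scores a candidate split roughly as below. The cost constants `c_trav` and `c_isect` and the box representation are assumptions for this sketch; builders tune them per platform.

```python
def surface_area(box):
    """Surface area of an axis-aligned box given as ((x0, y0, z0), (x1, y1, z1))."""
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(parent, left, right, n_left, n_right, c_trav=1.0, c_isect=1.0):
    """Expected cost of a split: traversal plus area-weighted intersection work."""
    parent_area = surface_area(parent)
    p_left = surface_area(left) / parent_area    # chance a ray enters the left child
    p_right = surface_area(right) / parent_area  # chance a ray enters the right child
    return c_trav + c_isect * (p_left * n_left + p_right * n_right)

parent = ((0, 0, 0), (2, 1, 1))
left = ((0, 0, 0), (1, 1, 1))
right = ((1, 0, 0), (2, 1, 1))
print(sah_cost(parent, left, right, n_left=3, n_right=5))
```

Near the root, where every candidate child is hit by most rays anyway, a plain object-count median avoids evaluating this cost for many candidates, in line with the slide's advice.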
80
Wavelet radiance analysis »It takes about 18 to 20 terms to represent all frequencies well »This is twice the number of terms for SH irradiance maps (9 terms)
81
Wavelet radiance analysis »Memory is much less because the probes are not pre-computed across the level »Fetching the terms to synthesize the radiance is twice or more the pixel ALU cost
82
Wavelet radiance analysis »18 wavelet terms, on the other hand, capture high frequency quality not captured by 10000 term spherical harmonics »Not exactly a 1:1 trade-off for high frequency or all frequency solution
83
Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization
84
Radiance factorization »Radiance factorization is important to dynamic radiance transfer »Decompose radiance transfer
85
Radiance factorization »Spatial contribution
86
Radiance factorization »Spatial contribution »Angular contribution
87
Radiance factorization »Spatial contribution »Angular contribution »Temporal contribution
88
Radiance factorization »Spatial contribution »Angular contribution »Temporal contribution »Visibility contribution
89
Dimensionality reduction »Exponential growth with dimensionality and contribution factors »Dimensionality reduction to factorize the radiance triple integral
90
Dimensionality reduction »In reality, the top factors impact the output more than the less relevant factors
91
Dimensionality reduction »Principal components analysis »Linear discriminant analysis
92
Principal components »Principal »Orthogonal linear combinations with the largest variance »Secondary »Linear combination with the second largest variance and orthogonal to principal
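As a concrete illustration, the principal direction of a small 2D data set can be found by power iteration on its covariance matrix. This is a toy sketch with invented data and a fixed start vector; it is not the factorization pipeline from the talk.

```python
import math

def principal_component(points, iterations=100):
    """Unit vector along the largest-variance direction of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Power iteration; the start vector must not be orthogonal to the answer.
    vx, vy = 1.0, 1.0
    for _ in range(iterations):
        nx = cxx * vx + cxy * vy
        ny = cxy * vx + cyy * vy
        norm = math.hypot(nx, ny)
        if norm == 0.0:
            break
        vx, vy = nx / norm, ny / norm
    return vx, vy

# Invented data lying exactly on the line y = 2x.
samples = [(float(t), 2.0 * t) for t in (-2, -1, 0, 1, 2)]
print(principal_component(samples))
```

The secondary component in 2D is simply the perpendicular direction; in higher dimensions it is found by removing the principal component's contribution and repeating.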
93
Principal components »Use principal components to select important factors in the original radiance equation »Keep separating until factors are separated into components »Equation factored out into dynamic factors
94
Principal components »We can see how factoring principal components can factor out the primary impact of dynamic variables in the radiance equation
95
Principal component »PCA remaps an apparently complex function into a feature or factor separable distribution
96
Principal components
97
Dimensionality reduction »PCA works best with purely orthogonal data »Unfortunately, radiance transfer is not very orthogonal at all »For better results, a dimensionality reduction algorithm should find separation even when there is none
98
Linear discriminant analysis »Works best for Gaussian distribution clusters »Finds separation even when there is (almost) none »LDA has potential to out-perform PCA in factorization of the rendering triple integral
99
Linear discriminant analysis »Same idea as PCA »Maximize separation by classification »Minimize variance within the classification after projection »Principal, secondary, …
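For two classes in 2D, the idea reduces to Fisher's linear discriminant: project onto the direction given by the inverse within-class scatter applied to the difference of the class means. The data and function names below are invented for illustration, and the talk's use on radiance data would involve many more dimensions and classes.

```python
def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Within-class scatter entries (sxx, sxy, syy) about the mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_direction(class_a, class_b):
    """Direction w = Sw^-1 (mean_b - mean_a) for two 2D classes."""
    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("within-class scatter is singular")
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Apply the closed-form inverse of the 2x2 scatter matrix.
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)

# Two elongated clusters: large spread along x, small gap along y.
class_a = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0)]
class_b = [(0.0, 1.0), (1.0, 1.1), (2.0, 0.9), (3.0, 1.0)]
w = fisher_direction(class_a, class_b)
proj_a = [w[0] * p[0] + w[1] * p[1] for p in class_a]
proj_b = [w[0] * p[0] + w[1] * p[1] for p in class_b]
print(w, max(proj_a), min(proj_b))
```

On this data the largest-variance axis is x, which does not separate the classes at all, while the discriminant direction does; that is the kind of case where the slides expect LDA to outperform PCA.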
100
D* for rendering? »B Guenter at MSR »Developed a compiler and declarative meta language D* »Creates optimized source code »Solve for dynamics of an equivalent system and no constraints
101
D* for rendering? »With fewer degrees of freedom »Uses analytic / symbolic approaches based on Lagrangian dynamics »Coordinate reduction and projection
102
D* for rendering? »Derive optional equations to solve for forward dynamics of the system »Necessary derivatives to linearize the system’s equations of motion at any given configuration
103
D* for analytical models »Is there potential for D* to reduce dimension symbolically for the render equation?
104
Factorization technology »LDA and D* can be applied to factorize the triple integral »Factorization is essential to dynamic radiance
105
Dynamic radiance »Haar wavelet radiance caches »Radiance transfer factorization »Dimensionality reduction »Linear discriminant analysis »BRDF factorization
106
Dynamic radiance »[Diagram labels: linear discriminant analysis, BRDF factorization]
107
Dynamic scenes »Before light reaches the eye, light undergoes a huge number of physical interactions with many objects »When these objects deform, animate, move, change, or get destroyed, the reflectance distribution should update accordingly
108
Dynamic radiance »Factored dynamic radiance requires BRDF cooperation »Factored spatial radiance transfer and factored specular radiance transfer need to be evaluated with only the BRDF lobes that are affected
109
BRDF factorization »Efficiency and compression »Specular lobes require higher order basis for fidelity »Factorization keeps the basis cost down
110
BRDFs »Cook Torrance »Oren Nayar »Ward »Linear combination of measured BRDFs
111
BRDF factorization prior work »BRDF factorization »“Interactive relighting with dynamic BRDFs”, MSRA: Sun, Zhou, Chen, Lin, Shi, Guo, 2007 »They used PCA, not LDA. »I learned good BRDF factorization practices from this paper.
112
Factorization »The challenge of a dynamic scene is that, given a static world, the radiance inter-reflectance is determined by the configuration of the objects »We need factorization that takes deformation into account
113
Haar and factorization »Another reason I became interested in Haar wavelet representation of radiance is that it adapts very well to factorized tensors generated by LDA
114
Summary »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution »Remove remaining compromises
115
Long tail Xbox 360
116
»Use LDA at build time to reduce dimensionality »Combine classifications »Reduce number of run time variables to principal components »Speed optimization
117
Future work
118
»Spherical wavelet instead of 2D haar wavelet?
119
Future work »Spherical wavelet instead of 2D haar wavelet? »Nonlinear and kernel dimensionality reduction instead of LDA?
120
Future work »Spherical wavelet instead of 2D haar wavelet? »Nonlinear and kernel dimensionality reduction instead of LDA? »Dimensionality reduction on a symbolic level?
121
Summary »Rally effort to develop symbolic kernels for dynamic radiance transfer »Rally effort to factorize the rendering equation triple integral with mathematical techniques or human manual optimization
122
Thank you »Corrinne.Yu@microsoft.com »Continue our discussion and future work to implement dynamic radiance at corrinnesdotplan.blogspot.com »Please fill in the survey.