Published by Kimberly Starling. Modified about 1 year ago.

2
Zen of multi core rendering »Corrinne Yu »Halo team Principal engine programmer »Corrinne.Yu@microsoft.com

3
Zen of multi core rendering »Take away »Compilation and survey of effective rendering techniques for current generation multi core console hardware

4
Rendering equation

5
»Radiance leaving a point »Integral of radiance in all directions

6
Rendering equation »Radiance leaving a point »Integral of radiance in all directions »Reflectance distribution function

7
Rendering equation »Radiance leaving a point »Integral of radiance in all directions »Reflectance distribution function »Light coming inward to surface position

8
Rendering equation »Radiance leaving a point »Integral of radiance in all directions »Reflectance distribution function »Light coming inward to surface position »Visibility of light to surface position

9
Rendering equation »Integral of radiance in all directions »Reflectance distribution function »Light coming inward to surface position »Visibility of light to surface position »Attenuation of inward light due to incident angle with surface normal
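
The equation itself did not survive in this transcript. A common way to write the terms listed on the slides above is sketched here; the symbols (L_o, f_r, L_i, V, n, and the hemisphere Ω) are assumed notation rather than the speaker's, and the emission term is left out because the slides do not list it.

```latex
L_o(x, \omega_o) =
  \int_{\Omega}
    f_r(x, \omega_i, \omega_o)\,   % reflectance distribution function
    L_i(x, \omega_i)\,             % light coming inward to the surface position
    V(x, \omega_i)\,               % visibility of the light to the surface position
    \max(0,\, n \cdot \omega_i)    % attenuation due to incident angle with the normal
  \, d\omega_i
```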

10
Compromise and cheats »This is computed per surface element »This is infeasibly expensive »In the past, we made quality compromises throughout to make run time rendering possible

11
First generation »1 to 4 dynamic lights »Simple point lights »Lambertian »Blinn-Phong approximation »Pre-computed diffuse radiosity »Shadow map optional

12
Hardware »117 million triangles per second »0.933 gigapixels per second »1.86 giga texels per second »6.4 gigabytes of bandwidth per second »64 megabytes of video memory

14
Second generation »500 million triangles per second »4 gigapixels per second »8 giga texels per second »256 gigabytes of bandwidth per second »512 megabytes of video memory

15
Second generation »4.27x triangle throughput »4.29x pixel fill rate »4.29x texel rate »40x bandwidth »8x video memory
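
The multipliers above can be reproduced from the two hardware slides. This is a small arithmetic sketch; the dictionary keys and the function name are made up for illustration, and the inputs are the rounded figures as printed on the slides.

```python
# Figures copied from the first generation and second generation hardware slides.
FIRST_GEN = {
    "triangles_per_s": 117e6,
    "pixels_per_s": 0.933e9,
    "texels_per_s": 1.86e9,
    "bandwidth_bytes_per_s": 6.4e9,
    "video_memory_bytes": 64e6,
}
SECOND_GEN = {
    "triangles_per_s": 500e6,
    "pixels_per_s": 4e9,
    "texels_per_s": 8e9,
    "bandwidth_bytes_per_s": 256e9,
    "video_memory_bytes": 512e6,
}


def generation_ratios(old, new):
    """Return new/old for every metric in `old`."""
    return {key: new[key] / old[key] for key in old}


if __name__ == "__main__":
    for name, ratio in generation_ratios(FIRST_GEN, SECOND_GEN).items():
        print(f"{name}: {ratio:.2f}x")
```

With the rounded 1.86 gigatexel input the texel ratio comes out near 4.30 rather than the slide's 4.29, which suggests the slide was computed from less rounded figures.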

17
Second generation »Large number of lights of precomputed radiance transfer »Environment and area lights »Realistic reflectance models »Cook Torrance, Ward »Shadow map

18
Large lights integral »Large number of lights integral »Static geometry »Precomputed visibility »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited

19
Multi core generation »70x triangle throughput »450x pixel fill rate »390x texel rate »110x bandwidth »16x video memory

21
Amdahl’s law
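
Only the title of this slide survives. As a reminder of what the law says, here is a minimal sketch of its usual form, speedup = 1 / ((1 - p) + p / N); the function and parameter names are assumptions for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when `parallel_fraction` of the work scales across `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

For example, a frame that is 95% parallel cannot exceed a 20x speedup no matter how many cores are added, which is why the serial part of the renderer matters as core counts grow.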

22
Multi core insight »Fill rate is achieved by completely asynchronous out of order VPU (Vector Processing Unit) computation »My experience with CUDA is that there are intentionally no synchronization primitives

23
Multi core insight »On Larrabee, each core has 4 hardware threads »Each thread is out of order »But for one thread’s execution, the vertices and pixels are synchronized

24
Multi core insight »So there are essentially 256 out of order processes »Each consisting of a batch of about 16 synchronized pixels or vertices in flight at any one time

25
Multi core insight »Expectation is shader flops will grow the most »Speed not from higher clock rate »Speed from larger number of low power cores »Memory is not expected to catch up to shader flops

26
Multi core insight »ALU's or VPU's to increase by 300x »Future is tfetch bound, not ALU bound »Homogeneous computing »Keep ALU's or VPU's very busy with cache coherent local data

27
Multi core generation »Occlusion from static geometry »Precomputed visibility »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited

28
Multi core generation »Occlusion from dynamic geometry »Precomputed visibility »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited

29
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited

30
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »Low-frequency illumination »Image-space resolution limited

31
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »Image-space resolution limited

32
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution

33
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution »Remove remaining compromises

34
Practical techniques »Directional light map basis »Zonal harmonics »Screen space ambient occlusion »Shadow map

35
Directional light map »Proposed by Valve's G McTaggart for Half Life »Used in many games like Half Life and Unreal

36
Directional light map »Spatial axial basis »(- 1 / sqrt(6), - 1 / sqrt(2), 1 / sqrt(3) ) »( - 1 / sqrt(6), 1 / sqrt(2), 1 / sqrt(3) ) »( sqrt(2 / 3), 0, 1 / sqrt(3) )
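
The three vectors above form an orthonormal basis in tangent space, with z along the unperturbed surface normal. The sketch below checks that numerically and blends three light map colors by the clamped dot product of the per-pixel normal with each basis vector. That weighting is one simple variant and an assumption here; shipped engines may square or normalize the weights.

```python
import math

# The three basis directions listed on the slide (tangent space, z along the normal).
BASIS = (
    (-1 / math.sqrt(6), -1 / math.sqrt(2), 1 / math.sqrt(3)),
    (-1 / math.sqrt(6), 1 / math.sqrt(2), 1 / math.sqrt(3)),
    (math.sqrt(2 / 3), 0.0, 1 / math.sqrt(3)),
)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shade(normal, lightmap_rgb):
    """Blend three per-direction light map colors by clamped n . b_i (assumed weighting)."""
    weights = [max(0.0, dot(normal, b)) for b in BASIS]
    return tuple(
        sum(w * color[c] for w, color in zip(weights, lightmap_rgb))
        for c in range(3)
    )
```

A normal that points exactly along one basis vector reproduces that vector's light map color and ignores the other two. The weights are not normalized in this sketch, so a flat normal sums to sqrt(3) times a uniform color.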

37
Analysis »Static radiance can interact with directional changes of reflectance surface »Per pixel normal reflectance of radiosity »Per pixel normal specularity

38
Analysis »Basis and precision are not uniformly distributed »Radiance is correct at exactly 3 clamped directions »Radiance undersampling occurs for wide ranges of directions »Only for hemisphere

39
Pre-computed radiance transfer »Zonal harmonics »R Ramamoorthi and P Hanrahan came up with an efficient representation for irradiance environment

40
Irradiance environment map »Only the first 2 orders of zonal harmonics »Only use 9 terms »Average errors only 1% against raytracing »Much less error prone than directional light maps
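
To make the 9 terms concrete, here is a sketch of the closed-form irradiance evaluation from the Ramamoorthi and Hanrahan paper. The slide speaks of zonal harmonics; this sketch uses the 9 spherical harmonic lighting coefficients (l = 0..2) as in the paper. The dictionary layout and the helper `uniform_lighting` are assumptions for illustration.

```python
import math

# Constants from Ramamoorthi and Hanrahan's irradiance formula.
C1, C2, C3, C4, C5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708


def irradiance(L, normal):
    """Evaluate irradiance for a unit normal from 9 lighting coefficients.

    L maps (l, m) -> coefficient of one color channel, for l = 0..2.
    """
    x, y, z = normal
    return (
        C1 * L[(2, 2)] * (x * x - y * y)
        + C3 * L[(2, 0)] * z * z
        + C4 * L[(0, 0)]
        - C5 * L[(2, 0)]
        + 2 * C1 * (L[(2, -2)] * x * y + L[(2, 1)] * x * z + L[(2, -1)] * y * z)
        + 2 * C2 * (L[(1, 1)] * x + L[(1, -1)] * y + L[(1, 0)] * z)
    )


def uniform_lighting(radiance):
    """Coefficients for constant radiance: only L00 = radiance * sqrt(4 * pi) is non-zero."""
    L = {(l, m): 0.0 for l in range(3) for m in range(-l, l + 1)}
    L[(0, 0)] = radiance * math.sqrt(4 * math.pi)
    return L
```

As a sanity check, constant unit radiance from every direction gives an irradiance of pi for any normal, which is the expected cosine-weighted hemisphere integral.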

41
Analysis »Completely feasible in current hardware »Better than directional light maps

42
Analysis »Completely feasible in current hardware »Better than directional light maps »Only the lowest of frequencies »Incapable of representing dynamic local lights

43
Screen space ambient occlusion »Developed by V Kajalin »Used first in Crysis »Used by games like Crysis and Unreal »Sample depth difference between screen space neighbors as occlusion factor

44
Optimization »Too many samples in reality »In practice read small number of samples from a randomly rotated kernel »Results are filtered to reduce noise

45
Analysis »Too many samples in reality »In practice read small number of samples from a randomly rotated kernel »Results are filtered to reduce noise »Low number samples lead to low impact visual effect
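
The idea on the last three slides can be sketched on a plain depth buffer: take a few taps from a kernel rotated per pixel and count the neighbors that are closer to the camera. This is an illustrative CPU sketch, not the shipped shader. The function name, the `depth_range` cutoff, and the convention that larger depth means farther away are all assumptions, and the noise filtering pass is omitted.

```python
import math


def ssao(depth, x, y, kernel, radius, angle, depth_range=0.5):
    """Occlusion in [0, 1] at pixel (x, y) from depth differences to rotated kernel taps."""
    height, width = len(depth), len(depth[0])
    c, s = math.cos(angle), math.sin(angle)
    center = depth[y][x]
    occlusion = 0.0
    for kx, ky in kernel:
        # Rotate the shared kernel by a per-pixel angle to trade banding for noise.
        sx = min(width - 1, max(0, int(round(x + (c * kx - s * ky) * radius))))
        sy = min(height - 1, max(0, int(round(y + (s * kx + c * ky) * radius))))
        diff = center - depth[sy][sx]  # positive when the neighbor is closer to the camera
        # Ignore neighbors far in front so distant foreground objects do not darken the pixel.
        if 0.0 < diff < depth_range:
            occlusion += 1.0
    return occlusion / len(kernel)
```

A pixel on a flat wall gets zero occlusion, while a pixel at the bottom of a small pit is fully occluded by its four neighbors.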

46
Shadow map »Xbox 360 has several hardware bilinear weight fetch instructions »Performance boosters »Use it for hardware accelerated percentage closer filtering »getWeights1D, getWeights2D, getWeights3D, getWeightsCube

47
Shadow map »Poisson filter with rotating kernel is shipped in many games, including Fable 2, Brothers in Arms, and so on

48
Poisson distribution

49
Poisson filter »Generate random numbers with this distribution »Rotate them »Offset source sample by the jitters »Render weighted accumulation
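
The four steps above can be sketched as follows, assuming a shadow map stored as a 2D list of light-space depths. The dart-throwing sampler, the equal tap weights, and all names and parameters are assumptions for illustration, not the filter used in any of the games mentioned.

```python
import math
import random


def poisson_disk(count, min_dist, rng, max_tries=10000):
    """Dart-throw `count` points in the unit disk, each at least `min_dist` apart."""
    points = []
    tries = 0
    while len(points) < count and tries < max_tries:
        tries += 1
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y > 1.0:
            continue
        if all(math.hypot(x - px, y - py) >= min_dist for px, py in points):
            points.append((x, y))
    return points


def rotate(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pcf_shadow(shadow_map, u, v, receiver_depth, kernel, radius, angle, bias=1e-3):
    """Fraction of jittered shadow-map taps that see the receiver as lit (1 = fully lit)."""
    height, width = len(shadow_map), len(shadow_map[0])
    taps = rotate(kernel, angle)
    lit = 0
    for dx, dy in taps:
        # Offset the source sample by the jitter and clamp to the map edges.
        tx = min(width - 1, max(0, int(round(u + dx * radius))))
        ty = min(height - 1, max(0, int(round(v + dy * radius))))
        if receiver_depth <= shadow_map[ty][tx] + bias:
            lit += 1
    return lit / len(taps)
```

Varying `angle` per pixel turns the banding of a fixed kernel into noise, which a later blur can hide. Near a shadow edge the returned fraction falls between 0 and 1, which is where the soft penumbra comes from.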

50
Analysis »Shadow map itself has no soft edge »Soft shadow map is created from jitters and filters »Shadow map is an image based technique of finite resolution

51
Analysis »Still a fast technique for high frequency local lighting »Even 10000 spherical harmonics terms will not give you the occlusion a shadow map will give you »Still useful for a very long time

52
Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution »Remove remaining compromises

53
Dynamic radiance »Haar wavelet radiance caches »Radiance transfer factorization »Dimensionality reduction »Linear discriminant analysis »BRDF factorization

54
Dynamic radiance »[Diagram; labels: linear discriminant analysis, BRDF factorization, wavelet caches, distance cube (or hemi cube), radiance factorization, wavelet radiance caches, rasterization, factorized radiance caches, factorized BRDF, dynamic radiance]

55
Radiance caches »[Diagram; labels: wavelet caches, distance cube (or hemi cube), radiance factorization, wavelet radiance caches]

56
Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization

58
Haar wavelet basis »Spherical harmonics is not the only basis available for radiance transfer »Radiance and sum of area lights can also be represented by Haar wavelets

59
Haar wavelet radiance »What is exciting about Haar wavelet is that its radiance visibility triple integral is fast enough to run on GPU in real time

60
Haar wavelet

61
2D Haar wavelet and visibility »The visibility function V(x, theta) is also a binary function »Multiplying visibility to wavelet radiance is spatially and physically turning parts of the wavelet equation on and off

62
Wavelet and integrals »The integral of the product of wavelet radiance and visibility also simplifies the run-time equation
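
One way to see the simplification: in an orthonormal Haar basis, the integral of a product of two functions becomes a dot product of their coefficient vectors, and zero coefficients can be skipped. The 1D sketch below shows this for a radiance signal and a binary visibility signal. It is a simplified stand-in for the 2D case on the slides, the function names are assumptions, and it covers only the two-function product, not the triple product with the BRDF.

```python
import math


def haar_transform(signal):
    """Orthonormal 1D Haar transform; len(signal) must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(signal)
    length = n
    while length > 1:
        half = length // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:length] = averages + details
        length = half
    return out


def product_integral(radiance, visibility):
    """Sum of radiance * visibility over all samples, computed from Haar coefficients."""
    a = haar_transform(radiance)
    b = haar_transform(visibility)
    # Skipping zero coefficients is where sparse wavelet data saves work.
    return sum(x * y for x, y in zip(a, b) if x != 0.0 and y != 0.0)
```

The result matches the direct sample-by-sample sum, which approximates the integral up to the sample spacing.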

63
Wavelet visibility insights »In some ways, spherical harmonics is the frequency corrected distribution of the basis in directional light map »Zonal harmonics correctly samples and stores radiance contribution without a preference to a direction

64
Wavelet visibility insights »“Simulating soft shadows with graphics hardware” Heckbert, Herf, 1997 »Heckbert rendered soft shadows by rendering shadows from 100 lights to create shadow penumbra

65
Analysis »No BRDF and inter-reflection »No radiance transfer »No specular reflectance

66
Analysis »No BRDF and inter-reflection »No radiance transfer »No specular reflectance »It was GPU accelerated for its time!

67
Multi core rendering »What is the modern multi core shader / homogeneous function pipeline version of this technique?

68
Multi core rendering »Not just shadows, the full radiance illumination model

69
Multi core rendering »Not just shadows, the full radiance illumination model »Not one light per pass, sample sparse wavelet data efficiently in tfetchCube

70
Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization

71
Dynamic radiance »For dynamic geometry, convolution of the visibility changes with the radiance wavelet coefficients must be performed before the radiance is applied »Still challenging to perform at run time

72
Ray tracing or radiosity »Capture only occlusion »Capture the full transport and full reflectance distribution »GPU occlusion through rasterization »GPU kd-tree line trace

73
Capture only occlusion »Feasible with current hardware »Fast »GPU side, hardware occlusion »CPU side, line trace into kd-tree »Visually unsophisticated

74
Capture full reflectance transport »Visually much more complex than GPU occlusion »More expensive »Fill out wavelet probes on different threads across multiple frames »Unfinished wavelet probes still useful for radiance

75
Radiosity »The hemi-cube: a radiosity solution for complex environments. Cohen and Greenberg 1985 »Use GPU to rasterize radiance

76
Radiosity »Great for low frequency spherical harmonics »First pass has direct lighting only »For high frequency wavelets, needs excessively high resolution »No caustics, subsurface scattering

77
Radiosity »Low resolution first pass with GPU hemi-cube »Higher frequency passes with direction cube kd-tree line tracing

78
Raytracing »Direction cube techniques and ray tracer caches can take up too much memory »Reyes ray tracing may be more parallelizable, but be careful of bucket load balancing

79
Bounding volume hierarchy »Kd-tree can be 15x faster than BSP for ray tracing »SAH (surface area heuristic) only necessary in deeper nodes »For nodes close to root, dividing by the number of objects in the boxes is good enough

80
Wavelet radiance analysis »It takes about 18 to 20 terms to represent all frequencies well »This is twice the number of terms for SH irradiance maps (9 terms)

81
Wavelet radiance analysis »Memory is much less because the probes are not pre-computed across the level »Fetching the terms to synthesize the radiance is twice or more the pixel ALU cost

82
Wavelet radiance analysis »18 wavelet terms, on the other hand, capture high frequency quality not captured by 10000 term spherical harmonics »Not exactly a 1:1 trade-off for high frequency or all frequency solution

83
Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization

84
Radiance factorization »Radiance factorization is important to dynamic radiance transfer »Decompose radiance transfer

85
Radiance factorization »Spatial contribution

86
Radiance factorization »Spatial contribution »Angular contribution

87
Radiance factorization »Spatial contribution »Angular contribution »Temporal contribution

88
Radiance factorization »Spatial contribution »Angular contribution »Temporal contribution »Visibility contribution

89
Dimensionality reduction »Exponential growth with dimensionality and contribution factors »Dimensionality reduction to factorize the radiance triple integral

90
Dimensionality reduction »In reality, the top factors impact output more than less relevant factors

91
Dimensionality reduction »Principal components analysis »Linear discriminant analysis

92
Principal components »Principal »Orthogonal linear combinations with the largest variance »Secondary »Linear combination with the second largest variance and orthogonal to principal
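
As a toy illustration of the principal and secondary components, the sketch below runs PCA on 2D points using the closed-form eigen-decomposition of the 2x2 covariance matrix. The real radiance data is far higher dimensional; the function name and the 2D restriction are assumptions made to keep the example self-contained.

```python
import math


def pca_2d(points):
    """Return ((principal_axis, variance), (secondary_axis, variance)) for 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form eigen-decomposition of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    principal = (math.cos(theta), math.sin(theta))
    secondary = (-math.sin(theta), math.cos(theta))
    mean = 0.5 * (sxx + syy)
    spread = math.hypot(0.5 * (sxx - syy), sxy)
    return (principal, mean + spread), (secondary, mean - spread)
```

For points scattered along the line y = x, the principal axis comes out close to the diagonal and carries almost all of the variance, so the secondary axis could be dropped with little loss.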

93
Principal components »Use principal components to select important factors in the original radiance equation »Keep separating until factors are separated into components »Equation factored out into dynamic factors

94
Principal components »We can see how factoring principal components can factor out the primary impact of dynamic variables in the radiance equation

95
Principal component »PCA remaps an apparently complex function into feature or factor separable distribution

96
Principal components

97
Dimensionality reduction »PCA works best with purely orthogonal data »Unfortunately, radiance transfer is not very orthogonal at all »For better results, a dimensionality reduction algorithm should find separation even when there is none

98
Linear discriminant analysis »Works best for Gaussian distribution clusters »Finds separation even when there is (almost) none »LDA has potential to out-perform PCA in factorization of the rendering triple integral

99
Linear discriminant analysis »Same idea as PCA »Maximize separation by classification »Minimize variance within the classification after projection »Principal, secondary, …
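
A minimal two-class sketch of the idea: Fisher's linear discriminant picks the direction w = Sw^-1 (mean_b - mean_a), which maximizes separation between the class means relative to the variance within each class after projection. The 2D restriction and the function names are assumptions for illustration.

```python
def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix of `points` about mean `m`."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_direction(class_a, class_b):
    """Unit direction w = Sw^-1 (mean_b - mean_a) separating two 2D classes."""
    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter is singular")
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Multiply the mean difference by the inverse of the 2x2 scatter matrix.
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = (wx * wx + wy * wy) ** 0.5
    return (wx / norm, wy / norm)
```

For two clusters that are stretched along x but offset in y, the largest-variance axis of the pooled data is x, which does not separate them, while the Fisher direction is almost exactly y.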

100
D* for rendering? »B Guenter at MSR »Developed a compiler and declarative meta language D* »Creates optimized source code »Solve for dynamics of an equivalent system and no constraints

101
D* for rendering? »With fewer degrees of freedom »Uses analytic / symbolic approaches based on Lagrangian dynamics »Coordinate reduction and projection

102
D* for rendering? »Derive optional equations to solve for forward dynamics of the system »Necessary derivatives to linearize the system’s equations of motion at any given configuration

103
D* for analytical models »Is there potential for D* to reduce dimension symbolically for the render equation?

104
Factorization technology »LDA and D* can be applied to factorize the triple integral »Factorization is essential to dynamic radiance

105
Dynamic radiance »Haar wavelet radiance caches »Radiance transfer factorization »Dimensionality reduction »Linear discriminant analysis »BRDF factorization

106
Dynamic radiance »[Diagram; labels: linear discriminant analysis, BRDF factorization]

107
Dynamic scenes »Before light reaches the eye, light undergoes a huge number of physical interactions with many objects »When these objects deform, animate, move, change, or get destroyed, the reflectance distribution should update accordingly

108
Dynamic radiance »Factored dynamic radiance requires BRDF cooperation »Factored spatial radiance transfer, factored specular radiance transfer, needs to be evaluated with only the BRDF lobes that are affected

109
BRDF factorization »Efficiency and compression »Specular lobes require higher order basis for fidelity »Factorization keeps the basis cost down

110
BRDFs »Cook Torrance »Oren Nayar »Ward »Linear combination of measured BRDFs

111
BRDF factorization prior work »BRDF factorization »“Interactive relighting with dynamic BRDFs” MSRA: Sun Zhou Chen Lin Shi Guo 2007 »They used PCA, not LDA. »I learned good BRDF factorization practices from this paper.

112
Factorization »The challenge of dynamic scenes is that, given a static world, the radiance inter-reflectance is determined by the configuration of the objects »We need factorization that takes deformation into account

113
Haar and factorization »Another reason I became interested in Haar wavelet representation of radiance is that it adapts very well to factorized tensors generated by LDA

114
Summary »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution »Remove remaining compromises

115
Long tail Xbox 360

116
»Use LDA at build time to reduce dimensionality »Combine classifications »Reduce number of run time variables to principal components »Speed optimization

117
Future work

118
»Spherical wavelet instead of 2D haar wavelet?

119
Future work »Spherical wavelet instead of 2D haar wavelet? »Nonlinear and kernel dimensionality reduction instead of LDA?

120
Future work »Spherical wavelet instead of 2D haar wavelet? »Nonlinear and kernel dimensionality reduction instead of LDA? »Dimensionality reduction on a symbolic level?

121
Summary »Rally effort to develop symbolic kernels for dynamic radiance transfer »Rally effort to factorize the rendering equation triple integral with mathematical techniques or human manual optimization

122
Thank you »Corrinne.Yu@microsoft.com »Continue our discussion and future work to implement dynamic radiance at corrinnesdotplan.blogspot.com »Please fill in the survey.
