Zen of multi core rendering »Corrinne Yu »Halo team Principal engine programmer

Presentation on theme: "Zen of multi core rendering »Corrinne Yu »Halo team Principal engine programmer"— Presentation transcript:

1

2 Zen of multi core rendering »Corrinne Yu »Halo team Principal engine programmer »Corrinne.Yu@microsoft.com

3 Zen of multi core rendering »Take away »Compilation and survey of effective rendering techniques for current generation multi core console hardware

4 Rendering equation

5 »Radiance leaving a point »Integral of radiance in all directions

6 Rendering equation »Radiance leaving a point »Integral of radiance in all directions »Reflectance distribution function

7 Rendering equation »Radiance leaving a point »Integral of radiance in all directions »Reflectance distribution function »Light coming inward to surface position

8 Rendering equation »Radiance leaving a point »Integral of radiance in all directions »Reflectance distribution function »Light coming inward to surface position »Visibility of light to surface position

9 Rendering equation »Integral of radiance in all directions »Reflectance distribution function »Light coming inward to surface position »Visibility of light to surface position »Attenuation of inward light due to incident angle with surface normal
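
The terms listed on slides 5 to 9 can be assembled into a single expression. This is a sketch in common textbook notation, not copied from the slides: the symbols $L_o$, $L_i$, $f_r$, $V$, $\mathbf{n}$, $\omega_i$, $\omega_o$ and $\Omega$ are assumed names, and the emission term is left out because the slides do not list it.

```latex
L_o(x, \omega_o) = \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, V(x, \omega_i)\, (\mathbf{n} \cdot \omega_i)\, \mathrm{d}\omega_i
```

Reading left to right: $L_o$ is the radiance leaving the point $x$, the integral runs over all incoming directions $\omega_i$, $f_r$ is the reflectance distribution function, $L_i$ is the light coming inward to the surface position, $V$ is the visibility of that light, and $(\mathbf{n} \cdot \omega_i)$ is the attenuation due to the incident angle with the surface normal.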

10 Compromise and cheats »This is computed per surface element »This is infeasibly expensive »In the past, we made quality compromises throughout to make run time rendering possible

11 First generation »1 to 4 dynamic lights »Simple point lights »Lambertian »Blinn-Phong approximation »Pre-computed diffuse radiosity »Shadow map optional

12 Hardware »117 million triangles per second »0.933 gigapixels per second »1.86 giga texels per second »6.4 gigabytes of bandwidth per second »64 megabytes of video memory

13 Hardware »117 million triangles per second »0.933 gigapixels per second »1.86 giga texels per second »6.4 gigabytes of bandwidth per second »64 megabytes of video memory

14 Second generation »500 million triangles per second »4 gigapixels per second »8 giga texels per second »256 gigabytes of bandwidth per second »512 megabytes of video memory

15 Second generation »4.27x triangle throughput »4.29x pixel fill rate »4.29x texel rate »40x bandwidth »8x video memory

16 Second generation »4.27x triangle throughput »4.29x pixel fill rate »4.29x texel rate »40x bandwidth »8x video memory
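
The multipliers on this slide can be reproduced from the raw figures on the "Hardware" and "Second generation" slides. The sketch below is only that arithmetic; the dictionary keys and the function name are illustrative choices, not from the talk.

```python
# Throughput figures copied from the "Hardware" and "Second generation" slides.
FIRST_GEN = {
    "triangles_per_second": 117e6,
    "pixels_per_second": 0.933e9,
    "texels_per_second": 1.86e9,
    "bandwidth_bytes_per_second": 6.4e9,
    "video_memory_megabytes": 64,
}
SECOND_GEN = {
    "triangles_per_second": 500e6,
    "pixels_per_second": 4e9,
    "texels_per_second": 8e9,
    "bandwidth_bytes_per_second": 256e9,
    "video_memory_megabytes": 512,
}


def generation_ratios(older, newer):
    """Return newer/older for every metric the two generations share."""
    return {name: newer[name] / older[name] for name in older}


ratios = generation_ratios(FIRST_GEN, SECOND_GEN)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

With these inputs the texel ratio comes out near 4.30x, marginally above the 4.29x printed on the slide; the other four multipliers match.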

17 Second generation »Large number of lights with precomputed radiance transfer »Environment and area lights »Realistic reflectance models »Cook Torrance, Ward »Shadow map

18 Large lights integral »Large number of lights integral »Static geometry »Precomputed visibility »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited

19 Multi core generation »70x triangle throughput »450x pixel fill rate »390x texel rate »110x bandwidth »16x video memory

20 Multi core generation »70x triangle throughput »450x pixel fill rate »390x texel rate »110x bandwidth »16x video memory

21 Amdahl’s law
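
This slide carries only its title in the transcript. As a reminder, Amdahl's law bounds the speedup of a workload whose parallel fraction is p when run on n cores: speedup = 1 / ((1 - p) + p / n). A minimal sketch follows; the function and parameter names are illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction and core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Even with a very large core count, a 10% serial portion caps the speedup below 10x.
for cores in (4, 16, 64, 256):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

The practical consequence for a renderer is that any serialized stage limits what additional cores can deliver, however many are added.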

22 Multi core insight »Fill rate is achieved by completely asynchronous out of order VPU (Vector Processing Unit) computation »My experience with CUDA is that there are intentionally no synchronization primitives

23 Multi core insight »On Larrabee, each core has 4 hardware threads »Each thread is out of order »But for one thread’s execution, the vertices and pixels are synchronized

24 Multi core insight »So there are essentially 256 out of order processes »Each consisting of a batch of about 16 synchronized pixels or vertices in flight at any one time

25 Multi core insight »Expectation is shader flops will grow the most »Speed not from higher clock rate »Speed from larger number of low power cores »Memory is not expected to catch up to shader flops

26 Multi core insight »ALU's or VPU's to increase by 300x »Future is tfetch bound, not ALU bound »Homogeneous computing »Keep ALU's or VPU's very busy with cache coherent local data

27 Multi core generation »Occlusion from static geometry »Precomputed visibility »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited

28 Multi core generation »Occlusion from dynamic geometry »Precomputed visibility »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited

29 Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially non-varying BRDF's »Low-frequency illumination »Image-space resolution limited

30 Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »Low-frequency illumination »Image-space resolution limited

31 Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »Image-space resolution limited

32 Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution

33 Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution »Remove remaining compromises

34 Practical techniques »Directional light map basis »Zonal harmonics »Screen space ambient occlusion »Shadow map

35 Directional light map »Proposed by Valve's G McTaggart for Half Life »Used in many games like Half Life and Unreal

36 Directional light map »Spatial axial basis »(- 1 / sqrt(6), - 1 / sqrt(2), 1 / sqrt(3) ) »( - 1 / sqrt(6), 1 / sqrt(2), 1 / sqrt(3) ) »( sqrt(2 / 3), 0, 1 / sqrt(3) )
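
The three basis vectors above are mutually orthogonal unit vectors in tangent space. The sketch below checks that and blends three per-basis light map values for a per-pixel normal. The clamped, normalized dot-product weighting is an assumption made for illustration; the slides do not state the exact blend, and `shade` and `lightmaps` are hypothetical names.

```python
import math

# Basis vectors as listed on the slide (tangent space, z along the surface normal).
BASIS = (
    (-1.0 / math.sqrt(6.0), -1.0 / math.sqrt(2.0), 1.0 / math.sqrt(3.0)),
    (-1.0 / math.sqrt(6.0), 1.0 / math.sqrt(2.0), 1.0 / math.sqrt(3.0)),
    (math.sqrt(2.0 / 3.0), 0.0, 1.0 / math.sqrt(3.0)),
)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shade(normal, lightmaps):
    """Blend three light map samples by clamped dot products with the basis.

    Assumed weighting: weights are clamped to zero and normalized to sum to one.
    """
    weights = [max(0.0, dot(normal, axis)) for axis in BASIS]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * value for w, value in zip(weights, lightmaps)) / total


# A normal aligned with one basis vector returns that basis' stored value.
print(shade(BASIS[0], (0.2, 0.5, 0.9)))
# The unperturbed normal (0, 0, 1) weights all three samples equally.
print(shade((0.0, 0.0, 1.0), (0.2, 0.5, 0.9)))
```

Because the basis is orthogonal, a normal aligned with one basis vector gets zero weight from the other two, so the stored value is reproduced only along those three directions and interpolated everywhere else.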

37 Analysis »Static radiance can interact with directional changes of reflectance surface »Per pixel normal reflectance of radiosity »Per pixel normal specularity

38 Analysis »Basis and precision are not uniformly distributed »Radiance is correct at exactly 3 clamped directions »Radiance undersampling occurs for wide ranges of directions »Only for hemisphere

39 Pre-computed radiance transfer »Zonal harmonics »R Ramamoorthi and P Hanrahan came up with an efficient representation for irradiance environment maps

40 Irradiance environment map »Only 1st 2 orders of zonal harmonics »Only use 9 terms »Average errors only 1% against raytracing »Much less error prone than directional light maps
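
The 9 terms correspond to the spherical harmonic lighting coefficients for bands l = 0, 1, 2. As a sketch, the irradiance for a unit normal can be evaluated with the quadratic form and constants published by Ramamoorthi and Hanrahan. The dictionary layout keyed by (l, m) and the function name are assumptions for illustration, and the example coefficients are made up.

```python
# Constants from Ramamoorthi and Hanrahan's irradiance environment map formula.
C1, C2, C3, C4, C5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708


def irradiance(normal, L):
    """Evaluate irradiance for a unit normal from 9 lighting coefficients.

    L maps (l, m) to a scalar coefficient for l = 0, 1, 2 (one colour channel).
    """
    x, y, z = normal
    return (
        C1 * L[(2, 2)] * (x * x - y * y)
        + C3 * L[(2, 0)] * z * z
        + C4 * L[(0, 0)]
        - C5 * L[(2, 0)]
        + 2.0 * C1 * (L[(2, -2)] * x * y + L[(2, 1)] * x * z + L[(2, -1)] * y * z)
        + 2.0 * C2 * (L[(1, 1)] * x + L[(1, -1)] * y + L[(1, 0)] * z)
    )


# Made-up lighting: constant ambient term plus a light biased toward +z.
coefficients = {(l, m): 0.0 for l in range(3) for m in range(-l, l + 1)}
coefficients[(0, 0)] = 1.0
coefficients[(1, 0)] = 0.5

print(irradiance((0.0, 0.0, 1.0), coefficients))   # facing the light
print(irradiance((0.0, 0.0, -1.0), coefficients))  # facing away
```

Evaluation costs a handful of multiply-adds per pixel, which is why slide 41 can call the technique completely feasible on current hardware.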

41 Analysis »Completely feasible in current hardware »Better than directional light maps

42 Analysis »Completely feasible in current hardware »Better than directional light maps »Only the lowest of frequencies »Incapable of representing dynamic local lights

43 Screen space ambient occlusion »Developed by V Kajalin »Used first in Crysis »Used by games like Crysis and Unreal »Sample depth difference between screen space neighbors as occlusion factor

44 Optimization »Too many samples in reality »In practice read small number of samples from a randomly rotated kernel »Results are filtered to reduce noise

45 Analysis »Too many samples in reality »In practice read small number of samples from a randomly rotated kernel »Results are filtered to reduce noise »Low number samples lead to low impact visual effect
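
The three steps (depth differences against neighbours, a small randomly rotated kernel, a filter pass) can be sketched on a CPU-side depth buffer. This is a simplified illustration under stated assumptions, not the Crysis implementation: the depth buffer is a list of lists, larger depth means farther away, the `max_range` cutoff is an assumed way to ignore distant occluders, and a 3x3 box blur stands in for whatever filter a shipping game uses.

```python
import math
import random


def ssao(depth, kernel, radius, rng, max_range=1.0):
    """Return an occlusion factor in [0, 1] for every pixel of a 2D depth buffer."""
    height, width = len(depth), len(depth[0])
    occlusion = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            angle = rng.uniform(0.0, 2.0 * math.pi)  # per-pixel kernel rotation
            c, s = math.cos(angle), math.sin(angle)
            hits = 0
            for kx, ky in kernel:
                # Rotate the tap, scale it, and clamp it to the buffer edges.
                sx = min(width - 1, max(0, x + round((c * kx - s * ky) * radius)))
                sy = min(height - 1, max(0, y + round((s * kx + c * ky) * radius)))
                closer_by = depth[y][x] - depth[sy][sx]
                if 0.0 < closer_by < max_range:  # neighbour is nearer to the camera
                    hits += 1
            occlusion[y][x] = hits / len(kernel)
    return occlusion


def box_blur(image):
    """3x3 box filter with clamped borders, used to hide sampling noise."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(height - 1, max(0, y + dy))
                    xx = min(width - 1, max(0, x + dx))
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out


# A single pixel recessed behind its neighbours is reported as fully occluded.
pit = [[1.0, 1.0, 1.0], [1.0, 1.5, 1.0], [1.0, 1.0, 1.0]]
taps = [(1, 0), (0, 1), (-1, 0), (0, -1)]
raw = ssao(pit, taps, 1.0, random.Random(0))
smooth = box_blur(raw)
```

With only four taps the raw result is binary and noisy, which illustrates the slide's point that a low sample count yields a low impact visual effect once filtered.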

46 Shadow map »Xbox 360 has several hardware bilinear weight fetch instructions »Performance boosters »Use them for hardware accelerated percentage closer filtering »getWeights1D, getWeights2D, getWeights3D, getWeightsCube

47 Shadow map »Poisson filter with rotating kernel is shipped in many games, including Fable 2, Brothers in Arms, and so on

48 Poisson distribution

49 Poisson filter »Generate random numbers with this distribution »Rotate them »Offset source sample by the jitters »Render weighted accumulation
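
The four steps on this slide can be sketched as a CPU-side percentage closer filter. Several things here are assumptions for illustration: the "Poisson distribution" is read as a Poisson disk sample pattern generated by dart throwing, all taps get equal weight, the shadow map is a list of lists of depths, and `poisson_disk`, `rotate` and `pcf_shadow` are hypothetical helper names.

```python
import math
import random


def poisson_disk(count, min_dist, rng, max_tries=10000):
    """Dart throwing: keep random points in the unit disk that stay min_dist apart."""
    points = []
    tries = 0
    while len(points) < count and tries < max_tries:
        tries += 1
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y > 1.0:
            continue
        if all(math.hypot(x - px, y - py) >= min_dist for px, py in points):
            points.append((x, y))
    return points


def rotate(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def pcf_shadow(shadow_map, u, v, receiver_depth, kernel, radius, angle, bias=1e-3):
    """Fraction of jittered shadow map taps that see the light (1.0 = fully lit)."""
    height, width = len(shadow_map), len(shadow_map[0])
    taps = rotate(kernel, angle)
    lit = 0
    for dx, dy in taps:
        # Offset the lookup by the rotated jitter and clamp to the map edges.
        sx = min(width - 1, max(0, int(round(u + dx * radius))))
        sy = min(height - 1, max(0, int(round(v + dy * radius))))
        if receiver_depth <= shadow_map[sy][sx] + bias:
            lit += 1
    return lit / len(taps)


kernel = poisson_disk(12, 0.3, random.Random(1))
# Left half of the map holds a nearby occluder, right half is open.
edge_map = [[0.2] * 4 + [1.0] * 4 for _ in range(8)]
print(pcf_shadow(edge_map, 4, 4, 0.5, kernel, 2.0, angle=0.7))
```

In a shader the rotation angle would typically vary per pixel, which trades banding along the shadow edge for noise that a later filter can remove.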

50 Analysis »Shadow map itself has no soft edge »Soft shadow map is created from jitters and filters »Shadow map is an image based technique of finite resolution

51 Analysis »Still a fast technique for high frequency local lighting »Even 10000 spherical harmonics terms will not give you the occlusion a shadow map will give you »Still useful for a very long time

52 Multi core generation »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution »Remove remaining compromises

53 Dynamic radiance »Haar wavelet radiance caches »Radiance transfer factorization »Dimensionality reduction »Linear discriminant analysis »BRDF factorization

54 Dynamic radiance »[Diagram; labels: linear discriminant analysis, BRDF factorization, wavelet caches, distance cube (or hemi cube), radiance factorization, wavelet radiance caches, rasterization, factorized radiance caches, factorized BRDF, dynamic radiance]

55 Radiance caches »[Diagram; labels: wavelet caches, distance cube (or hemi cube), radiance factorization, wavelet radiance caches]

56 Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization

57 Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization

58 Haar wavelet basis »Spherical harmonics is not the only basis available for radiance transfer »Radiance and sum of area lights can also be represented by Haar wavelets

59 Haar wavelet radiance »What is exciting about Haar wavelet is that its radiance visibility triple integral is fast enough to run on GPU in real time

60 Haar wavelet

61 2D Haar wavelet and visibility »The visibility function V(x, theta) is also a binary function »Multiplying visibility with wavelet radiance is spatially and physically turning parts of the wavelet equation on and off

62 Wavelet and integrals »The integral of the product of wavelet radiance and visibility also simplifies the run-time equation
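
One way to see the simplification: in an orthonormal basis, the integral of the product of two functions equals the dot product of their coefficient vectors, and Haar coefficients of piecewise-constant signals are mostly zero. The sketch below shows this in 1D for a two-function product only; the slides discuss 2D wavelets and a triple integral, so this is a reduced illustration, and the sample radiance and visibility values are made up.

```python
import math


def haar_1d(signal):
    """Orthonormal 1D Haar transform; len(signal) must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(value) for value in signal]
    scale = math.sqrt(0.5)
    length = n
    while length > 1:
        half = length // 2
        averages = [(out[2 * i] + out[2 * i + 1]) * scale for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) * scale for i in range(half)]
        out[:length] = averages + details
        length = half
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


radiance = [4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 1.0, 1.0]    # made-up incoming light
visibility = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # binary: half the directions blocked

radiance_w = haar_1d(radiance)
visibility_w = haar_1d(visibility)

direct = dot(radiance, visibility)         # sum over every sample
sparse = dot(radiance_w, visibility_w)     # same value from wavelet coefficients
nonzero = sum(1 for value in radiance_w if abs(value) > 1e-12)
print(direct, sparse, nonzero)
```

Here the eight radiance samples collapse to two nonzero coefficients, so the run-time sum only has to visit those two terms.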

63 Wavelet visibility insights »In some ways, spherical harmonics is the frequency corrected distribution of the basis in directional light map »Zonal harmonics correctly samples and stores radiance contribution without a preference to a direction

64 Wavelet visibility insights »“Simulating soft shadows with graphics hardware” Heckbert, Herf, 1997 »Heckbert rendered soft shadows by rendering shadows from 100 lights to create shadow penumbra

65 Analysis »No BRDF and inter-reflection »No radiance transfer »No specular reflectance

66 Analysis »No BRDF and inter-reflection »No radiance transfer »No specular reflectance »It was GPU accelerated for its time!

67 Multi core rendering »What is the modern multi core shader / homogeneous function pipeline version of this technique?

68 Multi core rendering »Not just shadows, the full radiance illumination model

69 Multi core rendering »Not just shadows, the full radiance illumination model »Not one light per pass, sample sparse wavelet data efficiently in tfetchCube

70 Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization

71 Dynamic radiance »For dynamic geometry, convolution of the visibility changes with the radiance wavelet coefficients must be performed before the radiance is applied »Still challenging to perform at run time

72 Ray tracing or radiosity »Capture only occlusion »Capture the full transport and full reflectance distribution »GPU occlusion through rasterization »GPU kd-tree line trace

73 Capture only occlusion »Feasible with current hardware »Fast »GPU side, hardware occlusion »CPU side, line trace into kd-tree »Visually unsophisticated

74 Capture full reflectance transport »Visually much more complex than GPU occlusion »More expensive »Fill out wavelet probes on different threads across multiple frames »Unfinished wavelet probes still useful for radiance

75 Radiosity »The hemi-cube: a radiosity solution for complex environments. Cohen and Greenberg 1985 »Use GPU to rasterize radiance

76 Radiosity »Great for low frequency spherical harmonics »First pass has direct lighting only »For high frequency wavelets, needs excessively high resolution »No caustics, subsurface scattering

77 Radiosity »Low resolution first pass with GPU hemi-cube »Higher frequency passes with direction cube kd-tree line tracing

78 Raytracing »Direction cube techniques and ray tracer caches can take up too much memory »Reyes ray tracing may be more parallelizable, but be careful of bucket load balancing

79 Bounding volume hierarchy »Kd-tree can be 15x faster than BSP for ray tracing »SAH (surface area heuristic) only necessary in deeper nodes »For nodes close to root, dividing by the number of objects in boxes is good enough
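
A sketch of that split policy for an object-partitioning hierarchy: near the root the primitives are divided evenly by count, and deeper down every cut position is scored with the surface area heuristic. The depth threshold `sah_min_depth`, the unit traversal and intersection costs, and the box representation are all assumptions chosen for illustration; reading "divide by number of objects" as an equal-count split is also an interpretation.

```python
def surface_area(box):
    (x0, y0, z0), (x1, y1, z1) = box
    dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def union(boxes):
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return (lo, hi)


def centroid(box, axis):
    return 0.5 * (box[0][axis] + box[1][axis])


def sah_cost(left, right, parent_area, traversal_cost=1.0, intersect_cost=1.0):
    """Expected cost of a split: child hit probability scales with surface area."""
    weighted = surface_area(union(left)) * len(left) + surface_area(union(right)) * len(right)
    return traversal_cost + intersect_cost * weighted / parent_area


def split(boxes, axis, depth, sah_min_depth=3):
    """Partition at least two boxes along an axis; SAH is used only at deeper levels."""
    ordered = sorted(boxes, key=lambda b: centroid(b, axis))
    if depth < sah_min_depth:
        cut = len(ordered) // 2  # cheap equal-count split near the root
    else:
        parent_area = surface_area(union(ordered))
        cut = min(
            range(1, len(ordered)),
            key=lambda i: sah_cost(ordered[:i], ordered[i:], parent_area),
        )
    return ordered[:cut], ordered[cut:]


def unit_cube(x):
    return ((x, 0.0, 0.0), (x + 1.0, 1.0, 1.0))


# Three clustered cubes and one far outlier.
scene = [unit_cube(0.0), unit_cube(1.0), unit_cube(2.0), unit_cube(100.0)]
print([len(part) for part in split(scene, 0, depth=0)])
print([len(part) for part in split(scene, 0, depth=5)])
```

On this scene the equal-count split pairs one clustered cube with the outlier, while the heuristic isolates the outlier, which is the kind of case where the extra scoring work pays off.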

80 Wavelet radiance analysis »It takes about 18 to 20 terms to represent all frequencies well »This is twice the number of terms for SH irradiance maps (9 terms)

81 Wavelet radiance analysis »Memory is much less because the probes are not pre-computed across the level »Fetching the terms to synthesize the radiance is twice or more the pixel ALU cost

82 Wavelet radiance analysis »18 wavelet terms, on the other hand, capture high frequency quality not captured by 10000 term spherical harmonics »Not exactly a 1:1 trade-off for high frequency or all frequency solution

83 Wavelet radiance caches »Haar wavelet basis »Visibility »Radiance factorization

84 Radiance factorization »Radiance factorization is important to dynamic radiance transfer »Decompose radiance transfer

85 Radiance factorization »Spatial contribution

86 Radiance factorization »Spatial contribution »Angular contribution

87 Radiance factorization »Spatial contribution »Angular contribution »Temporal contribution

88 Radiance factorization »Spatial contribution »Angular contribution »Temporal contribution »Visibility contribution

89 Dimensionality reduction »Exponential growth with dimensionality and contribution factors »Dimensionality reduction to factorize the radiance triple integral

90 Dimensionality reduction »In reality, the top factors impact output more than less relevant factors

91 Dimensionality reduction »Principal components analysis »Linear discriminant analysis

92 Principal components »Principal »Orthogonal linear combinations with the largest variance »Secondary »Linear combination with the second largest variance and orthogonal to principal
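
For two-dimensional data the principal and secondary directions have a closed form, which makes the definition above easy to check. This sketch is a toy illustration, not a radiance-transfer factorization: the function name and sample points are made up, and it relies on the standard result that the largest-variance axis of a 2x2 covariance matrix lies at half the angle atan2(2*cxy, cxx - cyy).

```python
import math


def principal_axes(points):
    """Return (principal, secondary) unit directions of a 2D point set."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    cyy = sum((p[1] - mean_y) ** 2 for p in points) / n
    cxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Orientation of the largest-variance eigenvector of [[cxx, cxy], [cxy, cyy]].
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    principal = (math.cos(theta), math.sin(theta))
    secondary = (-math.sin(theta), math.cos(theta))  # orthogonal to principal
    return principal, secondary


# Points lying on the line y = 2x: all variance is along (1, 2).
samples = [(float(t), 2.0 * t) for t in range(-2, 3)]
principal, secondary = principal_axes(samples)
print(principal, secondary)
```

Projecting the samples onto `principal` keeps all of their variance in one coordinate, which is the sense in which a few components can stand in for the original variables.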

93 Principal components »Use principal components to select important factors in the original radiance equation »Keep separating until factors are separated into components »Equation factored out into dynamic factors

94 Principal components »We can see how factoring principal components can factor out the primary impact of dynamic variables in the radiance equation

95 Principal component »PCA remaps an apparently complex function into feature or factor separable distribution

96 Principal components

97 Dimensionality reduction »PCA works best with purely orthogonal data »Unfortunately, radiance transfer is not very orthogonal at all »For better results, a dimensionality reduction algorithm should find separation even when there is none

98 Linear discriminant analysis »Works best for Gaussian distribution clusters »Finds separation even when there is (almost) none »LDA has potential to out-perform PCA in factorization of the rendering triple integral

99 Linear discriminant analysis »Same idea as PCA »Maximize separation by classification »Minimize variance within the classification after projection »Principal, secondary, …
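
For two classes in two dimensions, the projection that maximizes between-class separation relative to within-class variance is Fisher's discriminant, w = Sw^-1 (mean_b - mean_a), where Sw is the within-class scatter matrix. The sketch below is a toy illustration with made-up data, not the factorization of the rendering integral; the function names are hypothetical.

```python
import math


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fisher_direction(class_a, class_b):
    """Unit vector maximizing class separation over within-class variance."""
    mean_a, mean_b = mean(class_a), mean(class_b)
    sxx = syy = sxy = 0.0
    for points, centre in ((class_a, mean_a), (class_b, mean_b)):
        for x, y in points:
            dx, dy = x - centre[0], y - centre[1]
            sxx += dx * dx
            syy += dy * dy
            sxy += dx * dy
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("within-class scatter is singular")
    gap_x, gap_y = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
    # Multiply the inverse 2x2 scatter matrix by the difference of class means.
    wx = (syy * gap_x - sxy * gap_y) / det
    wy = (-sxy * gap_x + sxx * gap_y) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)


# Both classes spread widely along x but differ only in y.
class_a = [(-2.0, 0.0), (2.0, 0.0), (0.0, 0.2), (0.0, -0.2)]
class_b = [(-2.0, 1.0), (2.0, 1.0), (0.0, 1.2), (0.0, 0.8)]
direction = fisher_direction(class_a, class_b)
print(direction)
```

In this data the largest overall variance lies along x, yet the discriminant points along y, the only axis that separates the two classes; that is the behaviour the slides hope to exploit when the data is not cleanly orthogonal.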

100 D* for rendering? »B Guenter at MSR »Developed a compiler and declarative meta language D* »Creates optimized source code »Solve for dynamics of an equivalent system and no constraints

101 D* for rendering? »With fewer degrees of freedom »Uses analytic / symbolic approaches based on Lagrangian dynamics »Coordinate reduction and projection

102 D* for rendering? »Derive optional equations to solve for forward dynamics of the system »Necessary derivatives to linearize the system’s equations of motion at any given configuration

103 D* for analytical models »Is there potential for D* to reduce dimension symbolically for the render equation?

104 Factorization technology »LDA and D* can be applied to factorize the triple integral »Factorization is essential to dynamic radiance

105 Dynamic radiance »Haar wavelet radiance caches »Radiance transfer factorization »Dimensionality reduction »Linear discriminant analysis »BRDF factorization

106 Dynamic radiance »[Diagram; labels: linear discriminant analysis, BRDF factorization]

107 Dynamic scenes »Before light reaches the eye, light undergoes a huge number of physical interactions with many objects »When these objects deform, animate, move, change, or get destroyed, the reflectance distribution should update accordingly

108 Dynamic radiance »Factored dynamic radiance requires BRDF cooperation »Factored spatial radiance transfer and factored specular radiance transfer need to be evaluated with only the BRDF lobes that are affected

109 BRDF factorization »Efficiency and compression »Specular lobes require higher order basis for fidelity »Factorization keeps the basis cost down

110 BRDFs »Cook Torrance »Oren Nayar »Ward »Linear combination of measured BRDFs

111 BRDF factorization prior work »BRDF factorization »“Interactive relighting with dynamic BRDFs” MSRA: Sun, Zhou, Chen, Lin, Shi, Guo 2007 »They used PCA, not LDA. »I learned good BRDF factorization practices from this paper.

112 Factorization »The challenge of dynamic scenes is that, given a static world, the radiance inter-reflectance is determined by the configuration of the objects »We need factorization that takes deformation into account

113 Haar and factorization »Another reason I became interested in Haar wavelet representation of radiance is that it adapts very well to factorized tensors generated by LDA

114 Summary »Occlusion from dynamic geometry »Dynamic visibility computation »Spatially varying BRDF's »High-frequency illumination »High quality resolution »Remove remaining compromises

115 Long tail Xbox 360

116 »Use LDA at build time to reduce dimensionality »Combine classifications »Reduce number of run time variables to principal components »Speed optimization

117 Future work

118 »Spherical wavelet instead of 2D Haar wavelet?

119 Future work »Spherical wavelet instead of 2D Haar wavelet? »Nonlinear and kernel dimensionality reduction instead of LDA?

120 Future work »Spherical wavelet instead of 2D Haar wavelet? »Nonlinear and kernel dimensionality reduction instead of LDA? »Dimensionality reduction on a symbolic level?

121 Summary »Rally effort to develop symbolic kernels for dynamic radiance transfer »Rally effort to factorize the rendering equation triple integral with mathematical techniques or human manual optimization

122 Thank you »Corrinne.Yu@microsoft.com »Continue our discussion and future work to implement dynamic radiance at corrinnesdotplan.blogspot.com »Please fill in the survey.

