1
**Efficient High-Level Shader Development**

Natalya Tatarchuk 3D Application Research Group ATI Technologies, Inc. August 2003

2
**Overview**

- Writing optimal HLSL code
  - Compiling issues
  - Optimization strategies
  - Code structure pointers
- HLSL shader examples
  - Multi-layer car paint effect
  - Translucent iridescent shader
  - Überlight shader

3
**Why use HLSL?**

- Faster, easier effect development
  - Instant readability of your shader code
  - Better code re-use and maintainability
- Optimization
  - Added benefit of HLSL compiler optimizations
  - Still helps to know what’s under the hood
- Industry standard which will run on cards from any vendor
- Current and future industry direction
- Increase your ability to iterate on a given shader design, resulting in better looking games
- Conveniently manage shader permutations

4
**Compile Targets**

- Legal HLSL is still independent of the compile target chosen
- But having an HLSL shader doesn’t mean it will always run on any hardware!
- Currently supported compile targets:
  - vs_1_1, vs_2_0, vs_2_sw
  - ps_1_1, ps_1_2, ps_1_3, ps_1_4, ps_2_0, ps_2_sw
- Compilation is vendor-independent and is done by a D3DX component that Microsoft can update independently of the runtime release schedule

5
**Compilation Failure**

- The obvious: program errors (bad syntax, etc.)
- Compile-target-specific reasons: your shader is too complex for the selected target
  - Not enough resources in the selected target
    - Uses too many registers (temporaries, for example)
    - Too many resulting asm instructions for the compile target
  - Lack of capability in the target
    - Such as trying to sample a texture in vs_1_1
    - Using dynamic branching when unsupported in the target
    - Sampling a texture too many times for the target (example: more than 6 for ps_1_4)
- Compiler provides useful messages

6
**Use Disassembly for Hints**

- Very helpful for understanding the relationship between compile targets and code generation
- Disassembly output provides valuable hints when “compiling down” to an older compile target
- If the shader compiles successfully for a more recent target (e.g. ps_2_0), look at the disassembly output for hints when it fails to compile for an older target (e.g. ps_1_4)
  - Check the instruction count for ALU and tex ops
  - Figure out how HLSL instructions get mapped to assembly

Although the HLSL compiler reports the reasons for a compilation failure, examining the disassembled code gives you a better understanding of why compilation failed when you are pushing the limits of a particular compile target.

7
**Getting Disassembly Output for Your Shaders**

- Directly use FXC
  - Compile for any target desired
  - Compile both individual shader files and full effects
- Various input arguments
  - Turn shader optimizations on / off
  - Specify different entry points
  - Enable / disable generating debug information

8
**Easier Path to Disassembly**

- Use RenderMonkey while developing shaders
  - See your changes in real time
  - Disassembly output is updated every time a shader is compiled
  - Displays the count of ALU and texture ops, as well as the limits for the selected target
  - Can save the resulting assembly code into a text file

Instead of making you jump through the hoops of compiling your shaders from HLSL to binary asm with FXC, RenderMonkey integrates that functionality. You also have the option to save the resulting assembly code into corresponding vsh and psh files if you wish to ship the asm code rather than your HLSL shader (some developers prefer to keep their HLSL shaders hidden away).

9
**Optimizing HLSL Shaders**

- Don’t forget you are running on a vector processor
- Do your computations at the most efficient frequency
  - Don’t do something per-pixel that you can do per-vertex
  - Don’t perform computation in a shader that you can precompute in the app
- Use HLSL intrinsic functions
  - Helps hardware to optimize your shaders
  - Know your intrinsics and how they map to asm, especially asm modifiers

An important objective for high performance shaders: if you are hitting the limits of your pixel shader, or simply want to improve speed, and you can get away with doing a computation per-vertex rather than per-pixel, then do so. These types of operations are where the biggest wins often come from. A useful exercise is to compare the disassembly of a shader that calls pow with a constant exponent of 8 against one that passes a generic parameter, and to look at how HLSL translates the normalize() intrinsic using rsq.
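The normalize() remark can be checked on the CPU. The following is a minimal Python sketch, not compiler output: it mirrors the dp3 / rsq / mul sequence that appears in the hand-written car paint assembly later in this deck. The helper names `dp3`, `rsq` and `normalize_via_rsq` are hypothetical stand-ins for the asm instructions.

```python
import math


def dp3(a, b):
    # Three-component dot product, like the asm dp3 instruction.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rsq(x):
    # Reciprocal square root, like the asm rsq instruction.
    return 1.0 / math.sqrt(x)


def normalize_via_rsq(v):
    # normalize(v) expressed as v * rsq(dot(v, v)).
    inv_len = rsq(dp3(v, v))
    return tuple(c * inv_len for c in v)
```

The point of the sketch is that a normalize costs a dot product, a reciprocal square root and a multiply, which is why it pays to avoid redundant normalizations per pixel.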

10
**HLSL Syntax Not Limited**

- The HLSL code you write is not limited by the compile target you choose
- You can always use loops, subroutines, if-else statements etc.
- If not natively supported in the selected compile target, the compiler will still try to generate code:
  - Loops will be unrolled
  - Subroutines will be inlined
  - If-else statements will execute both branches, selecting the appropriate output as the result
- Code generation is dependent upon the compile target
- Use appropriate data types to improve instruction count
  - Store your data in a vector when needed
  - However, using appropriate data types helps the compiler do a better job at optimizing your code

The choice of compile target doesn’t mean that you cannot use certain language constructs in your shaders. The HLSL compiler always tries to find a way to compile all possible constructs for the desired compile target. This may not be possible directly in some cases, but the compiler will try to find alternate approaches for generating the resulting assembly code. For example, if a shader writer uses “for” loops, subroutines, or “if-else” statements with compile targets that do not natively support them, the compiler will unroll the loops, inline the subroutine calls, and execute both branches of each if-else statement, selecting the appropriate result.

11
**Using If Statement in HLSL**

- Can have large performance implications
  - Lack of branching support in most asm models
  - Both sides of an ‘if’ statement will be executed
  - The output is chosen based on which side of the ‘if’ would have been taken
- Optimization is different than in the CPU programming world

12
**Example of Using If in Vs_1_1**

    if ( Threshold > 0.0 )
        Out.Position = Value1;
    else
        Out.Position = Value2;

generates the following assembly output:

    // calculate lerp value based on Threshold > 0
    mov r1.w, c2.x
    slt r0.w, c3.x, r1.w
    // lerp between Value1 and Value2
    mov r7, -c1
    add r2, r7, c0
    mad oPos, r0.w, r2, c1
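To see why both sides of the branch cost instructions, here is a small Python sketch of the same select-by-lerp arithmetic. It is an illustration only: `slt` and `select_position` are hypothetical helper names, and the mapping of Value1 and Value2 to constant registers c0 and c1 is read off the listing above rather than stated by the slide.

```python
def slt(a, b):
    # Like the asm slt instruction: 1.0 if a < b, else 0.0.
    return 1.0 if a < b else 0.0


def select_position(threshold, value1, value2):
    # slt r0.w, c3.x, r1.w  ->  t = (0 < Threshold) ? 1 : 0
    t = slt(0.0, threshold)
    # add r2, -c1, c0       ->  diff = Value1 - Value2
    diff = [a - b for a, b in zip(value1, value2)]
    # mad oPos, t, diff, c1 ->  Value2 + t * (Value1 - Value2)
    return [t * d + b for d, b in zip(diff, value2)]
```

Both values are always computed, and the comparison result only selects between them, which is the behavior the previous slide describes.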

13
**Example of Function Inlining**

    // Bias and double a value to take it from 0..1 range to -1..1 range
    float4 bx2( float4 x )
    {
        return 2.0f * x - 1.0f;
    }

    float4 main( float4 tc0 : TEXCOORD0,
                 float4 tc1 : TEXCOORD1,
                 float4 tc2 : TEXCOORD2,
                 float4 tc3 : TEXCOORD3 ) : COLOR
    {
        // Sample noise map three times with different
        // texture coordinates
        float4 noise0 = tex2D( fire_distortion, tc1 );
        float4 noise1 = tex2D( fire_distortion, tc2 );
        float4 noise2 = tex2D( fire_distortion, tc3 );

        // Weighted sum of signed noise
        float4 noiseSum = bx2( noise0 ) * distortion_amount0 +
                          bx2( noise1 ) * distortion_amount1 +
                          bx2( noise2 ) * distortion_amount2;

        // Perturb base coordinates in direction of noiseSum
        // as function of height (y)
        float4 perturbedBaseCoords = tc0 + noiseSum *
            ( tc0.y * height_attenuation.x + height_attenuation.y );

        // Sample base and opacity maps with perturbed coordinates
        float4 base    = tex2D( fire_base, perturbedBaseCoords );
        float4 opacity = tex2D( fire_opacity, perturbedBaseCoords );

        return base * opacity;
    }

14
**Code Permutations Via Compilation**

    static const bool bAnimate = false;

    VS_OUTPUT vs_main( float4 Pos: POSITION, float2 Tex: TEXCOORD0 )
    {
        VS_OUTPUT Out = (VS_OUTPUT) 0;
        Out.Pos = mul( view_proj_matrix, Pos );
        if ( bAnimate )
        {
            Out.Tex.x = Tex.x + time / 2;
            Out.Tex.y = Tex.y - time / 2;
        }
        else
            Out.Tex = Tex;
        return Out;
    }

Disassembly with `static const bool bAnimate = false;` (5 instructions):

    vs_1_1
    dcl_position v0
    dcl_texcoord v1
    mul r0, v0.y, c1
    mad r0, c0, v0.x, r0
    mad r0, c2, v0.z, r0
    mad oPos, c3, v0.w, r0
    mov oT0.xy, v1

Disassembly of the same shader when bAnimate is declared without static, as `bool bAnimate = false;` or `const bool bAnimate = false;` (8 instructions):

    vs_1_1
    def c6, 0.5, 0, 0, 0
    dcl_position v0
    dcl_texcoord v1
    mul r0, v0.y, c1
    mad r0, c0, v0.x, r0
    mov r1.w, c4.x
    mul r1.x, r1.w, c6.x
    mad r0, c2, v0.z, r0
    mov r1.y, -r1.x
    mad oPos, c3, v0.w, r0
    mad oT0.xy, c5.x, r1, v1

15
**Scalar and Vector Data Types**

- Scalar data types are not all natively supported in hardware
  - i.e. integers are emulated on float hardware
- Not all targets have native half and none currently have double
- Can apply swizzles to vector types
  - float2 vec = pos.xy
  - But! Not all targets have fully flexible swizzles
  - Acquaint yourself with the swizzles native to the relevant compile targets (particularly ps_2_0 and lower)

An important point to note is that the ps_2_0 and lower pixel shader models do not have native support for arbitrary swizzles. Hence, concise high-level code which uses swizzles can result in fairly nasty binary asm when compiling to these targets. You should familiarize yourself with the native swizzles available in these assembly models.

16
**Integer Data Type**

- Added to make relative addressing more efficient
- Using floats for addressing purposes without defined truncation rules can result in incorrect access to arrays
- All inputs used as ints should be defined as ints in your shader

It is very easy to generate extra instructions by using the int datatype in places where it should not be used. The int datatype was added to HLSL to make relative addressing familiar as well as efficient. Using float datatypes for addressing without truncation rules can cause incorrect array accesses; the int datatype avoids unwanted rounding or truncation errors during addressing.

17
**Example of Integer Data Type Usage**

- Matrix palette indices for skinning
  - Declaring the variable as an int is a ‘free’ operation: no truncation occurs
  - Using a float and casting it to an int, or using it directly: truncation will happen

HLSL source:

    Out.Position = mul( inPos, World[Index] );

Code generated with float index vs integer index:

    // Index declared as float
    frc r0.w, r1.w
    add r2.w, -r0.w, r1.w
    mul r9.w, r2.w, c61.x
    mova a0.x, r9.w
    m4x4 oPos, v0, c0[a0.x]

    // Index declared as int
    mul r0.w, c60.x, r1.w
    mova a0.x, r0.w
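The extra frc and add in the float path implement truncation as index minus its fractional part. A rough Python sketch of the two paths follows. The function names are hypothetical, and reading c61.x / c60.x as a per-matrix register stride is an assumption; the slide does not say what those constants hold.

```python
import math


def frc(x):
    # Fractional part, like the vertex shader asm frc instruction.
    return x - math.floor(x)


def float_index_to_address(index, stride):
    # frc + add: strip the fractional part (index - frc(index)),
    # then mul by the assumed register stride before mova.
    whole = index - frc(index)
    return whole * stride


def int_index_to_address(index, stride):
    # An int index needs only the multiply before mova.
    return index * stride
```

For example, `float_index_to_address(2.75, 4.0)` and `int_index_to_address(2, 4)` address the same palette entry, but the float version pays for two more instructions.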

18
**Real-World Shader Examples**

- Will present several case studies of developing shaders used in ATI’s demos
  - Multi-tone car paint effect
  - Translucent iridescent effect
  - Classic überlight example
- Examples are presented as RenderMonkey™ workspaces
  - Distributed publicly with the version 1.0 release
  - RenderMonkey allows you to concentrate on writing shaders without getting bogged down in app code

19
**Multi-Tone Car Paint**

20
**Multi-Tone Car Paint Effect**

- Multi-tone base color layer
- Microflake layer simulation
- Clear gloss coat
- Dynamically blurred reflections

The application of paint to a car’s body can be a complicated process. Expensive auto body paint is usually applied in layered stages and often includes dye layers, clear coat layers, and metallic flakes suspended in enamel. The result of these successive paint layers is a surface that exhibits complex light interactions, giving the car a smooth, glossy and sparkly finish.

We started working on this demo before HLSL was available, so we developed our shaders in assembly. The shaders were designed from the very start to push the limits of performance, and they were fast. Later we decided to rewrite the shaders in HLSL, and this is how we approached it.

21
**Car Paint Layers Build Up**

- Multi-tone base color
- Microflake layer
- Clear gloss coat
- Final color composite

22
**Multi-Tone Base Paint Layer**

- View-dependent lerping between three paint colors
- Normal from appearance preserving simplification process, N
- Uses subtractive tone to control overall color accumulation

The car model shown here uses a relatively low number of polygons but employs a high precision normal map generated by an appearance preserving simplification algorithm.

23
**Normal Decompression**

- Sample from two-channel 16-16 normal map
- Derive z from $z = +\sqrt{1 - x^2 - y^2}$
- Gives higher precision than a typically used normal map

Due to the pixel shader operations performed across smoothly changing surfaces (such as the hood of the car), a 16-bit per channel normal map is necessary. Since the normals are stored in surface local coordinates (a.k.a. tangent space), we can assume that the z component of the normals will be positive. Thus, we can store x and y in two channels of a texture map and derive z in the pixel shader from $+\sqrt{1 - x^2 - y^2}$.
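The reconstruction is easy to sanity-check outside the shader. A minimal Python sketch, assuming x and y have already been scaled and biased into [-1, 1]; the function name `decompress_normal` and the clamp to zero (a guard against quantization pushing x² + y² slightly above 1) are additions of this sketch, not part of the slide.

```python
import math


def decompress_normal(x, y):
    # Tangent-space normals point away from the surface, so z >= 0
    # and can be derived from the unit-length constraint.
    # The max() clamp is an assumption of this sketch.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```

Storing only two 16-bit channels spends the available bits on x and y, and z comes for free from the unit-length constraint.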

24
**Multi-Tone Base Coat Vertex Shader**

    VS_OUTPUT main( float4 Pos     : POSITION,
                    float3 Normal  : NORMAL,
                    float2 Tex     : TEXCOORD0,
                    float3 Tangent : TANGENT,
                    float3 Binormal: BINORMAL )
    {
        VS_OUTPUT Out = (VS_OUTPUT) 0;

        // Propagate transformed position out:
        Out.Pos = mul( view_proj_matrix, Pos );

        // Compute view vector:
        Out.View = normalize( mul( inv_view_matrix, float4( 0, 0, 0, 1 )) - Pos );

        // Propagate texture coordinates:
        Out.Tex = Tex;

        // Propagate tangent, binormal, and normal vectors to pixel shader:
        Out.Normal   = Normal;
        Out.Tangent  = Tangent;
        Out.Binormal = Binormal;

        return Out;
    }

25
**Multi-Tone Base Coat Pixel Shader**

    float4 main( float4 Diff:     COLOR0,
                 float2 Tex:      TEXCOORD0,
                 float3 Tangent:  TEXCOORD1,
                 float3 Binormal: TEXCOORD2,
                 float3 Normal:   TEXCOORD3,
                 float3 View:     TEXCOORD4 ) : COLOR
    {
        // Fetch normal from a normal map and scale and bias it
        // to move into [-1; 1]:
        float3 vNormal = tex2D( normalMap, Tex );
        vNormal = 2 * vNormal - 1.0;

        // Normalize the view vector to ensure higher quality results:
        float3 vView = normalize( View );

        float3x3 mTangentToWorld = transpose( float3x3( Tangent, Binormal, Normal ));
        float3 vNormalWorld = normalize( mul( mTangentToWorld, vNormal ));

        // Compute Nw . V using world-space normal vector:
        float fNdotV   = saturate( dot( vNormalWorld, vView ));
        float fNdotVSq = fNdotV * fNdotV;

        // Compute the result color by lerping three input tones
        // using the computed Fresnel term:
        float4 paintColor = fNdotV * paintColor0 +
                            fNdotVSq * paintColorMid +
                            fNdotVSq * fNdotVSq * paintColor2;

        return float4( paintColor.rgb, 1.0 );
    }

26
**Microflake Layer**

In this portion of the shader we simulate the appearance of metallic flakes suspended in enamel.

27
**Microflake Deposit Layer**

- Simulating light interaction resulting from metallic flakes suspended in the enamel coat of the paint
- Uses a high frequency normalized vector noise map ($N_n$) which is repeated across the surface of the car

28
**Computing Microflake Layer Normals**

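- Start out by using the normal vector fetched from the normal map, $N$
- Using the high frequency noise map, compute the perturbed normal $N_p$
- Simulate two layers of microflake deposits by computing perturbed normals $N_{p1}$ and $N_{p2}$:

$$N_{p1} = \frac{a N_n + b N}{\lVert a N_n + b N \rVert}, \quad \text{where } a \ll b$$

$$N_{p2} = \frac{c N_n + d N}{\lVert c N_n + d N \rVert}, \quad \text{where } c = d$$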

29
**Microflake Layer Vertex Shader**

Compute texture coordinates for accessing the noise map using the input texture coordinates and a tiling factor:

    VS_OUTPUT main( float4 Pos:      POSITION,
                    float3 Normal:   NORMAL,
                    float2 Tex:      TEXCOORD0,
                    float3 Tangent:  TANGENT,
                    float3 Binormal: BINORMAL )
    {
        VS_OUTPUT Out = (VS_OUTPUT) 0;

        // Propagate transformed position out:
        Out.Pos = mul( view_proj_matrix, Pos );

        // Compute view vector:
        Out.View = normalize( mul( inv_view_matrix, float4( 0, 0, 0, 1 )) - Pos );

        // Propagate texture coordinates:
        Out.Tex = Tex;

        // Propagate tangent, binormal, and normal vectors to pixel shader:
        Out.Normal   = Normal;
        Out.Tangent  = Tangent;
        Out.Binormal = Binormal;

        // Compute microflake tiling factor:
        Out.SparkleTex = float4( Tex * fFlakeTilingFactor, 0, 1 );

        return Out;
    }

30
**Microflake Layer Pixel Shader**

    float4 main( float4 Diff:       COLOR0,
                 float2 Tex:        TEXCOORD0,
                 float3 Tangent:    TEXCOORD1,
                 float3 Binormal:   TEXCOORD2,
                 float3 Normal:     TEXCOORD3,
                 float3 View:       TEXCOORD4,
                 float3 SparkleTex: TEXCOORD5 ) : COLOR
    {
        // ... fetch and signed-scale vNormal from the normal map

        // Fetch initial perturbed normal vector from the noise map:
        float3 vFlakesNormal = 2 * tex2D( microflakeNMap, SparkleTex ) - 1;

        // Compute normal vectors for both microflake layers:
        float3 vNp1 = microflakePerturbationA * vFlakesNormal +
                      normalPerturbation * vNormal;
        float3 vNp2 = microflakePerturbation * ( vFlakesNormal + vNormal );

        float3 vView = normalize( View );
        float3x3 mTangentToWorld = transpose( float3x3( Tangent, Binormal, Normal ));

        // Compute dot products of the normalized view vector with
        // the two microflake layer normals:
        float3 vNp1World = normalize( mul( mTangentToWorld, vNp1 ));
        float  fFresnel1 = saturate( dot( vNp1World, vView ));
        float3 vNp2World = normalize( mul( mTangentToWorld, vNp2 ));
        float  fFresnel2 = saturate( dot( vNp2World, vView ));

        // Compose the microflake layer color:
        float fFresnel1Sq = fFresnel1 * fFresnel1;
        float4 paintColor = fFresnel1 * flakeColor +
                            fFresnel1Sq * flakeColor +
                            fFresnel1Sq * fFresnel1Sq * flakeColor +
                            pow( fFresnel2, 16 ) * flakeColor;

        return float4( paintColor.rgb, 1.0 );
    }

The microflake normal map is a high frequency normalized vector noise map which is repeated across the whole surface. Fetching a value from it for each pixel allows us to compute a perturbed normal for the surface, simulating the appearance of microflakes suspended in the coat of paint. This shader simulates two layers of microflakes. The surface normal for the first layer is computed as the weighted sum microflakePerturbationA * vFlakesNormal + normalPerturbation * vNormal, which is normalized after being transformed to world space.

31
**Clear Gloss Coat**

32
**RGBScale HDR Environment Map**

- Alpha channel contains 1/16 of the true HDR scale of the pixel value
- RGB contains the normalized color of the pixel
- Pixel shader reconstructs the HDR value from scale * 8 * color to get half of the true HDR value
- Obvious quantization issues, but reasonable for some applications
- Similar to Ward’s RGBE “Real Pixels” but simpler to reconstruct in the pixel shader

One interesting aspect of the clear coat term is the decision to store the environment map in an RGBScale form to simulate high dynamic range in a low memory footprint. The alpha channel of the texture, shown on the right in figure 4, represents 1/16th of the true range of the data, while the RGB, shown on the left, represents the normalized color. In the pixel shader, the alpha channel and RGB channels are multiplied together and multiplied by eight to reconstruct a cheap form of HDR reflectance. This is multiplied by a subtle Fresnel term before being added to the lighting terms described above.
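The round trip can be sketched in Python. The decode follows the slide (color times alpha times 8, giving half the true value). The encoder is an assumption of this sketch: the slide does not say how the scale is chosen, so `encode_rgbscale` simply normalizes by the largest channel. Quantization to 8-bit channels is also left out.

```python
def encode_rgbscale(hdr):
    # Assumed encoder: normalize by the largest channel (must be > 0)
    # and store 1/16 of that scale in alpha.
    scale = max(hdr)
    rgb = tuple(c / scale for c in hdr)
    return rgb, scale / 16.0


def decode_half_hdr(rgb, alpha):
    # As described for the pixel shader: color * alpha * 8
    # reconstructs half of the true HDR value.
    return tuple(c * alpha * 8.0 for c in rgb)
```

With alpha limited to [0, 1], the largest scale this layout can represent is 16, which is the range limit behind the quantization caveat above.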

33
**Top Face Scale in Alpha Channel**

Figure: environment map of the ceiling of the car showroom. Panels: top cube map face RGB; top face scale in alpha channel.

34
**Dynamically Blurred Reflections**


35
**Dynamic Blurring of Environment Map Reflections**

- A gloss map can be supplied to specify the regions where reflections can be blurred
- Use a bias when sampling the environment map to vary the blurriness of the resulting reflections
- Use texCUBEbias to access the cubic environment map
- For rough specular, the bias is high, causing a blurring effect
- Can also convert the color fetched from the environment map to luminance in rough trim areas

36
**Clear Gloss Coat Pixel Shader**

- Compute the reflection vector to fetch from the environment map
- A shader parameter is used to dynamically blur the reflections by biasing the texture fetch from the environment map
- Premultiply by the alpha channel of the environment map to avoid clamping highlights, and brighten the reflections

    float4 ps_main( ... /* same inputs as in the previous shader */ )
    {
        // ... use normal in world space (see Multi-tone pixel shader)

        // Compute reflection vector:
        float  fFresnel    = saturate( dot( vNormalWorld, vView ));
        float3 vReflection = 2 * vNormalWorld * fFresnel - vView;

        // Here we just use a constant gloss value to bias reading from the
        // environment map, however, in the real demo we use a gloss map which
        // specifies which regions will have reflection slightly blurred.
        float fEnvBias = glossLevel;

        // Sample environment map using this reflection vector and bias:
        float4 envMap = texCUBEbias( showroomMap, float4( vReflection, fEnvBias ));

        // Premultiply by alpha:
        envMap.rgb = envMap.rgb * envMap.a;

        // Brighten the environment map sampling result:
        envMap.rgb *= brightnessFactor;

        // Combine result of environment map reflection with the paint color:
        float fEnvContribution = 1.0 - 0.5 * fFresnel;

        return float4( envMap.rgb * fEnvContribution, 1.0 );
    }
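The reflection vector in this shader is the usual R = 2 N (N·V) − V, with N·V saturated first. A small Python sketch for checking the arithmetic on the CPU; the helper names are hypothetical and the inputs are assumed to be unit-length vectors.

```python
def dot3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def saturate(x):
    # Clamp to [0, 1], like the HLSL saturate() intrinsic.
    return min(1.0, max(0.0, x))


def reflection_vector(n, v):
    # R = 2 * N * saturate(N . V) - V, as in the pixel shader above.
    n_dot_v = saturate(dot3(n, v))
    return tuple(2.0 * nc * n_dot_v - vc for nc, vc in zip(n, v))
```

When the view vector faces the surface, this mirrors V about N; for back-facing views the saturate forces N·V to zero and the result collapses to −V.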

37
**Compositing Multi-Tone Base Layer and Microflake Layer**

Base color and flake effect are derived from $N_{p1}$ and $N_{p2}$ using the following polynomial:

$$\underbrace{color_0 (N_{p1} \cdot V) + color_1 (N_{p1} \cdot V)^2 + color_2 (N_{p1} \cdot V)^4}_{\text{base color}} + \underbrace{color_3 (N_{p2} \cdot V)^{16}}_{\text{flake}}$$
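As a quick check of how the terms weigh against each other, the polynomial can be evaluated in Python. This is only a sketch of the formula above; `composite_paint` and its argument names are hypothetical, and the dot products are assumed to be already saturated to [0, 1].

```python
def composite_paint(color0, color1, color2, color3, np1_dot_v, np2_dot_v):
    # Weights for the three base tones and the flake term.
    t1 = np1_dot_v
    weights = (t1, t1 ** 2, t1 ** 4, np2_dot_v ** 16)
    colors = (color0, color1, color2, color3)
    # Weighted sum per RGB channel.
    return tuple(
        sum(w * c[i] for w, c in zip(weights, colors)) for i in range(3)
    )
```

The 16th power makes the flake term negligible except where the second perturbed normal points almost straight at the viewer, which is what produces isolated sparkles.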

38
**Compositing Final Look**

    {
        ...
        // Compute final paint color: combines all layers of paint as well
        // as two layers of microflakes:
        float fFresnel1Sq = fFresnel1 * fFresnel1;
        float4 paintColor = fFresnel1 * paintColor0 +
                            fFresnel1Sq * paintColorMid +
                            fFresnel1Sq * fFresnel1Sq * paintColor2 +
                            pow( fFresnel2, 16 ) * flakeLayerColor;

        // Combine result of environment map reflection with the paint color:
        float fEnvContribution = 1.0 - 0.5 * fNdotV;

        // Assemble the final look:
        float4 finalColor;
        finalColor.a   = 1.0;
        finalColor.rgb = envMap.rgb * fEnvContribution + paintColor.rgb;
        return finalColor;
    }

39
**Original Hand-Tuned Assembly**

    ps.2.0
    def c0, 0.0, 0.5, 1.0, 2.0
    def c1, 0.0, 0.0, 1.0, 0.0
    dcl_2d s0
    dcl_2d s1
    dcl_cube s2
    dcl_2d s3
    dcl t0
    dcl t1
    dcl t2
    dcl t3
    dcl t4
    dcl t5
    texld r0, t0, s1
    texld r8, t5, s3
    mad r3, r8, c0.w, -c0.z
    mad r6, r3, c4.r, r0
    mad r7, r3, c4.g, r0
    dp3 r4.a, t4, t4
    rsq r4.a, r4.a
    mul r4, t4, r4.a
    mul r2.rgb, r0.x, t1
    mad r2.rgb, r0.y, t2, r2
    mad r2.rgb, r0.z, t3, r2
    dp3 r2.a, r2, r2
    rsq r2.a, r2.a
    mul r2.rgb, r2, r2.a
    dp3_sat r2.a, r2, r4
    mul r3, r2, c0.w
    . . .
    mad r1.rgb, r2.a, r3, -r4
    mov r1.a, c10.a
    texldb r0, r1, s2
    mul r10.rgb, r6.x, t1
    mad r10.rgb, r6.y, t2, r10
    mad r10.rgb, r6.z, t3, r10
    dp3 r10.a, r10, r10
    rsq r10.a, r10.a
    mul r10.rgb, r10, r10.a
    dp3_sat r6.a, r10, r4
    mul r10.rgb, r7.x, t1
    mad r10.rgb, r7.y, t2, r2
    mad r10.rgb, r7.z, t3, r2
    dp3_sat r7.a, r10, r4
    mul r0.rgb, r0, r0.a
    mul r0.rgb, r0, c2.r
    mov r4.a, r6.a
    mul r4.rgb, r4.a, c5
    mul r4.a, r4.a, r4.a
    mad r4.rgb, r4.a, c6, r4
    mad r4.rgb, r4.a, c7, r4
    pow r4.a, r7.a, c4.b
    mad r4.rgb, r4.a, c8, r4
    mad r1.a, r2.a, c2.z, c2.w
    mad r6.rgb, r0, r1.a, r4
    mov oC0, r6

40 ALU ops, 3 texture fetches, 43 total

40
**Car Paint Shader HLSL Compiler Disassembly Output**

    ps_2_0
    def c9, 0.5, 1, 0, 0
    def c10, 2, -1, 16, 1
    dcl t0.xy
    dcl t1.xyz
    dcl t2.xyz
    dcl t3.xyz
    dcl t4.xyz
    dcl t5.xy
    dcl_2d s0
    dcl_2d s1
    dcl_cube s2
    texld r0, t0, s1
    mad r5.xyz, c10.x, r0, c10.y
    mul r0.xyz, r5.y, t2
    dp3 r1.x, t4, t4
    mad r0.xyz, t1, r5.x, r0
    rsq r0.w, r1.x
    mad r1.xyz, t3, r5.z, r0
    mul r3.xyz, r0.w, t4
    nrm r0.xyz, r1
    dp3_sat r6.x, r0, r3
    mul r0.xyz, r0, r6.x
    add r0.xyz, r0, r0
    mad r0.xyz, t4, -r0.w, r0
    mov r0.w, c8.x
    texld r1, t5, s0
    texldb r0, r0, s2
    mad r2.xyz, c10.x, r1, c10.y
    mul r1.xyz, r5, c2.x
    mad r1.xyz, c3.x, r2, r1
    mul r4.xyz, r1.y, t2
    mad r4.xyz, t1, r1.x, r4
    add r2.xyz, r5, r2
    mad r4.xyz, t3, r1.z, r4
    nrm r1.xyz, r4
    mul r2.xyz, r2, c7.x
    dp3_sat r5.x, r1, r3
    mul r1.xyz, r2.y, t2
    mul r1.w, r5.x, r5.x
    mad r4.xyz, t1, r2.x, r1
    mul r1.xyz, r1.w, c6
    mad r4.xyz, t3, r2.z, r4
    mul r1.w, r1.w, r1.w
    nrm r2.xyz, r4
    mad r1.xyz, r5.x, c4, r1
    dp3_sat r2.x, r2, r3
    mad r1.xyz, r1.w, c5, r1
    pow r1.w, r2.x, c10.z
    mad r1.xyz, r1.w, c1, r1
    mul r0.xyz, r0.w, r0
    mad r0.w, r6.x, -c9.x, c9.y
    mul r0.xyz, r0, c0.x
    mad r0.xyz, r0, r0.w, r1
    mov r0.w, c10.w
    mov oC0, r0

38 ALU ops, 3 texture fetches, 41 total!

41
**Full Result of Multi-Layer Paint**


42
**Translucent Iridescent Shader: Butterfly Wings**


43
**Translucent Iridescent Shader: Butterfly Wings**

- Simulates translucency of delicate butterfly wings
  - Wings glow from scattered reflected light
  - Similar to the effect of softly backlit rice paper
- Displays subtle iridescent lighting
  - Similar to the rainbow pattern on the surface of soap bubbles
  - Caused by the interference of light waves resulting from multiple reflections of light off of surfaces of varying thickness
- Combines gloss, opacity and normal maps for a multi-layered final look
  - Gloss map contributes to satiny highlights
  - Opacity map allows portions of the wings to be transparent
  - Normal map is used to give the wings a bump-mapped look

A translucent material allows light to pass through it, yet it isn’t transparent. It receives light and can be luminous only from an outside source. If you hold a sheet of paper in front of a light source, you can see that the light makes it glow, yet you cannot see the light source through the paper because the paper scatters the light.

Iridescence, which can be seen as a rainbow pattern on the surface of soap bubbles and gasoline spills, is the effect caused by the interference of light waves resulting from multiple reflections of light off of surfaces of varying thickness. Mother-of-pearl and compact discs share this quality with the wings of some butterflies. Morpho butterfly wings, for example, emit a brilliant blue color while other colors are absorbed.

44
**RenderMonkey Butterfly Wings Shader Example**

- Parameters that contribute to the translucency and iridescence look:
  - Light position and scene ambient color
  - Translucency coefficient
  - Gloss scale and bias
  - Scale and bias for speed of iridescence change
- Workspace: Iridescent Butterfly.rfx

45
**Translucent Iridescent Shader: Vertex Shader**

    ...
    // Propagate input texture coordinates:
    Out.Tex = Tex;

    // Define tangent space matrix:
    float3x3 mTangentSpace;
    mTangentSpace[0] = Tangent;
    mTangentSpace[1] = Binormal;
    mTangentSpace[2] = Normal;

    // Compute the light vector (object space):
    float3 vLight = normalize( mul( inv_view_matrix, lightPos ) - Pos );

    // Output light vector in tangent space:
    Out.Light = mul( mTangentSpace, vLight );

    // Compute the view vector (object space):
    float3 vView = normalize( mul( inv_view_matrix, float4( 0, 0, 0, 1 )) - Pos );

    // Output view vector in tangent space:
    Out.View = mul( mTangentSpace, vView );

    // Compute the half angle vector H = V + L (in tangent space):
    Out.Half = mul( mTangentSpace, normalize( vView + vLight ));

    return Out;

46
**Translucent Iridescent Shader: Loading Information**

- Load the normal from a normal map and the gloss value from a gloss map (combined in one texture map)
- Load the base texture color and alpha value from a combined base and opacity texture map

    float3 vNormal, baseColor;
    float  fGloss, fTranslucency;

    // Load normal and gloss map:
    float4 bumpGloss = tex2D( bump_glossMap, Tex );
    vNormal = bumpGloss.xyz;
    fGloss  = bumpGloss.w;

    // Load base and opacity map:
    float4 baseOpacity = tex2D( base_opacityMap, Tex );
    baseColor     = baseOpacity.rgb;
    fTranslucency = baseOpacity.a;

47
**Diffuse Illumination For Translucency**

- Light scattered on the butterfly wings is computed from the negative normal (for scattering off the surface), the light vector, the translucency coefficient, and the translucency value for the given pixel
- Compute diffusely reflected light using the bump-mapped normal and the ambient contribution
- Combine diffuse and scattered light with the base texture

    float3 scatteredIllumination = saturate( dot( -vNormal, Light )) *
                                   fTranslucency * translucencyCoeff;

    float3 diffuseContribution = saturate( dot( vNormal, Light )) + ambient;

    baseColor *= scatteredIllumination + diffuseContribution;

Figure: base texture * (scattered illumination + diffuse contribution) = lit wings.
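A CPU-side Python sketch of the same lighting terms makes the front-lit and back-lit cases easy to compare. The function and parameter names are hypothetical, and `ambient` is treated as a scalar here for brevity, whereas the shader most likely uses a color.

```python
def dot3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def saturate(x):
    return min(1.0, max(0.0, x))


def lit_wing_color(base, n, light, translucency, translucency_coeff, ambient):
    # Light arriving from behind the surface: uses the negated normal.
    neg_n = tuple(-c for c in n)
    scattered = saturate(dot3(neg_n, light)) * translucency * translucency_coeff
    # Ordinary diffuse term plus ambient (scalar in this sketch).
    diffuse = saturate(dot3(n, light)) + ambient
    return tuple(c * (scattered + diffuse) for c in base)
```

Because the two dot products use opposite normals, at most one of the scattered and diffuse terms is nonzero for a given light direction; a back-lit wing glows only through the scattered term.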

48
**Adding Opacity to Butterfly Wings**

The resulting color is modulated by the opacity value to add transparency to the wings:

    // Premultiply alpha blend to avoid clamping the highlights:
    baseColor *= fOpacity;

Normally, when you want to blend something that’s transparent, you would just do it in your alpha blending stage. But if the surface is specular, you don’t want the blend to attenuate the highlights, so the opacity is applied before the specular highlights are added. One way to do this properly would be to multipass, with one diffuse pass and one additive specular pass, but this approach does it in a single pass.

Figure: lit wings * opacity map = wings with transparency.

49
**Making Butterfly Wings Iridescent**

- Iridescence is a view-dependent effect
- Scale and bias the gradient map index to make iridescence change quicker across the wings
- Sample the gradient map based on the computed index

    // Compute index into the iridescence gradient map, which
    // consists of N*V coefficient
    float fGradientIndex = dot( vNormal, View ) * iridescence_speed_scale +
                           iridescence_speed_bias;

    // Load the iridescence value from the gradient map:
    float4 iridescence = tex1D( gradientMap, fGradientIndex );

50
**Assembling Final Color**

- Compute the gloss value based on the original gloss map input and the N · H dot product
- Assemble the final wings color

    // Compute glossy highlights using values from gloss map:
    float fGlossValue = fGloss * ( saturate( dot( vNormal, Half )) *
                                   gloss_scale + gloss_bias );

    // Assemble the final color for the wings
    baseColor += fGlossValue * iridescence;

51
**HLSL Disassembly Comparison**

12 ALU, 3 texture, 15 total

Hand-tuned assembly code:

    ps.2.0
    def c0, 0, .5, 1, 2
    def c1, 4, 0, 0, 0
    ...
    texld r1, t0, s1
    mad r1.xyz, r1, c0.w, -c0.z
    dp3_sat r4.y, r1, t2
    dp3_sat r4.w, r1, -t2
    texld r0, t0, s0
    mul r4.w, r4.w, r0.a
    mad r5.w, r4.w, c1.x, r4.y
    add r5.rgb, r5.w, c3
    mul r0.rgb, r0, r5
    sub_sat r0.a, c0.z, r0.a
    dp3 r6.xy, r1, t1
    dp3_sat r6.y, r1, t3
    mad r6.y, r6.y, c4.x, c4.y
    mul r6.z, r6.y, r1.w
    mad r6.x, r6.x, c4.z, c4.w
    texld r2, r6, s2
    mul r0.rgb, r0, r0.a
    mad r0.rgb, r6.z, r2, r0
    mov oC0, r0

HLSL compiler-generated disassembly code:

    ps_2_0
    def c6, 2, -1, 1, 0
    texld r0, t0, s1
    mad r2.xyz, c6.x, r0, c6.y
    dp3_sat r0.x, r2, t3
    mov r1.w, c5.x
    mad r1.w, r0.x, r1.w, c3.x
    dp3 r0.x, r2, t1
    mul r2.w, r0.w, r1.w
    mov r0.w, c2.x
    mad r0.xy, r0.x, r0.w, c0.x
    texld r1, r0, s2
    texld r0, t0, s0
    dp3_sat r4.x, r2, t2
    dp3_sat r3.x, -r2, t2
    add r2.xyz, r4.x, c4
    mul r1.w, r0.w, r3.x
    mul r1.xyz, r2.w, r1
    mad r2.xyz, r1.w, c1.x, r2
    mul r0.xyz, r0, r2
    add r0.w, -r0.w, c6.z
    mad r0.xyz, r0, r0.w, r1

15 ALU, 3 texture, 18 total

52
**Example of Translucent Iridescent Shader**


53
**Optimization Study: Überlight**

- Flexible light described in the JGT (Journal of Graphics Tools) article “Lighting Controls for Computer Cinematography” by Ronen Barzel of Pixar
- Überlight is procedural and has many controls:
  - light type, intensity, light color, cuton, cutoff, near edge, far edge, falloff, falloff distance, max intensity, parallel rays, shearx, sheary, width, height, width edge, height edge, roundness and beam distribution
- Code here is based upon the public domain RenderMan® implementation by Larry Gritz

54
**Überlight Spotlight Mode**

- Spotlight mode defines a procedural volume with smooth boundaries
- Shape of the spotlight is made up of two nested superellipses which are swept along the direction of the light
- Also has smooth cuton and cutoff planes
- Can tune parameters to get all sorts of looks

55
**Überlight Spotlight Volume**

- Roundness = ½
- Cuton and cutoff planes are left out for this diagram

56
**Überlight Spotlight Volume**

Figure: spotlight volume with roundness = 1, showing the inner swept superellipse (dimensions a, b) and the outer swept superellipse (dimensions A, B). Cuton and cutoff planes are left out for this diagram.

57
**Original clipSuperellipse() routine**

Computes attenuation as a function of a point’s position in the swept superellipse. Directly ported from the original RenderMan source. Compiles to 42 cycles in ps_2_0, 40 cycles on R3x0.

    float clipSuperellipse (
        float3 Q,          // Test point on the x-y plane
        float a,           // Inner superellipse
        float b,
        float A,           // Outer superellipse
        float B,
        float roundness)   // Same roundness for both ellipses
    {
        float x = abs(Q.x), y = abs(Q.y);
        float re = 2/roundness;   // roundness exponent
        float q = a * b * pow (pow(b*x, re) + pow(a*y, re), -1/re);
        float r = A * B * pow (pow(B*x, re) + pow(A*y, re), -1/re);
        return smoothstep (q, r, 1);
    }

Callouts on the code: the ellipse roundness exponent is computed for every point, and the two absolute values are calculated separately.

This is a key subroutine in the überlight shader. The attenuation it feeds is 1 inside the inner ellipse, 0 outside the outer ellipse, and a smoothstep in between. As listed, smoothstep (q, r, 1) itself evaluates to 0 inside the inner ellipse (where q > 1) and 1 outside the outer ellipse (where r <= 1), so the attenuation is the complement of the returned value.

58
**Vectorized Version**

Precompute functions of roundness in the app. Vectorize abs() and all of the multiplications. Compiles to 33 cycles in ps_2_0, 28 cycles on R3x0.

    float clipSuperellipse (
        float2 Q,      // Test point on the x-y plane
        float4 aABb,   // Dimensions of superellipses
        float2 r)      // Two precomputed functions of roundness
    {
        float2 qr, Qabs = abs(Q);
        float2 bx_Bx = Qabs.x * aABb.wzyx;   // Swizzle to unpack bB
        float2 ay_Ay = Qabs.y * aABb;
        qr.x = pow (pow(bx_Bx.x, r.x) + pow(ay_Ay.x, r.x), r.y);
        qr.y = pow (pow(bx_Bx.y, r.x) + pow(ay_Ay.y, r.x), r.y);
        qr *= aABb * aABb.wzyx;
        return smoothstep (qr.x, qr.y, 1);
    }

Callouts on the code:

Qabs = abs(Q) is a vectorized computation of the absolute value.

bx_Bx and ay_Ay compute b * x and B * x in a single instruction, and a * y and A * y in another instruction.

r contains the precomputed 2/roundness and -roundness/2 parameters.

qr *= aABb * aABb.wzyx is the final result computation that feeds into the smoothstep() function.

The R3x0 cycle count is lower because of its ability to co-issue instructions, as well as some other secret sauce we aren’t telling about.

59
**smoothstep() function**

Standard function in procedural shading. Intrinsics built into RenderMan and DirectX HLSL.

[Graph: smoothstep curve rising from 0 at edge0 to 1 at edge1]

60
**C implementation**

    float smoothstep (float edge0, float edge1, float x)
    {
        if (x < edge0)
            return 0;
        if (x >= edge1)
            return 1;
        // Scale/bias into [0..1] range
        x = (x - edge0) / (edge1 - edge0);
        return x * x * (3 - 2 * x);
    }

61
**HLSL implementation**

The free saturate handles x outside of the [edge0..edge1] range.

    float smoothstep (float edge0, float edge1, float x)
    {
        // Scale, bias and saturate x to 0..1 range
        x = saturate((x - edge0) / (edge1 - edge0));
        // Evaluate polynomial
        return x * x * (3 - 2 * x);
    }

Know how to use saturate to do this kind of thresholding for you.
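The equivalence between the branching and saturating forms can be checked on the CPU with a small C sketch. The names `saturatef`, `smoothstep_branch` and `smoothstep_sat` are introduced here for illustration; `saturatef` is a plain clamp to [0, 1] standing in for the HLSL saturate.

```c
/* Clamp to [0, 1]; stands in for the HLSL saturate intrinsic. */
static float saturatef(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Branching form, as in the C implementation slide. */
static float smoothstep_branch(float edge0, float edge1, float x)
{
    if (x < edge0)  return 0.0f;
    if (x >= edge1) return 1.0f;
    x = (x - edge0) / (edge1 - edge0);
    return x * x * (3.0f - 2.0f * x);
}

/* Branch-free form: the clamp replaces both early-outs, because the
   polynomial evaluates to exactly 0 at 0 and exactly 1 at 1. */
static float smoothstep_sat(float edge0, float edge1, float x)
{
    x = saturatef((x - edge0) / (edge1 - edge0));
    return x * x * (3.0f - 2.0f * x);
}
```

The two forms agree whenever edge0 < edge1. They differ when the edges coincide: the branching form never reaches the division, while the saturating form divides by zero.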

62
**Vectorized HLSL Implementation**

Precompute 1/(edge1 - edge0). This is done in the app for edge widths at the cuton and cutoff planes. The operation is performed on float3s to compute three different smoothstep operations in parallel. With these optimizations, the entire spotlight volume computation of überlight compiles to 47 cycles in ps_2_0, 41 cycles on R3x0.

    float3 smoothstep3 (float3 edge, float3 OneOverWidth, float3 x)
    {
        // Scale, bias and saturate x to [0..1] range
        x = saturate( (x - edge) * OneOverWidth );
        // Evaluate polynomial
        return x * x * (3 - 2 * x);
    }

The multiplication by OneOverWidth can be done as a vector operation, whereas rcp is defined to be a scalar operation and hence would have broken the vector nature of this routine. OneOverWidth is computed outside of the shader for two of the three smoothsteps in überlight, so this optimization is a win.
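A minimal C sketch of the same idea, assuming the reciprocal widths have already been computed once outside the per-point code. The array-based signature is an illustration only; the three-element arrays stand in for HLSL float3 values.

```c
/* Three smoothsteps at once. one_over_width[i] is assumed to hold
   1 / (edge1[i] - edge0[i]), computed ahead of time. */
static void smoothstep3(const float edge[3],
                        const float one_over_width[3],
                        const float x[3],
                        float out[3])
{
    for (int i = 0; i < 3; ++i) {
        /* Scale and bias with a multiply instead of a divide. */
        float t = (x[i] - edge[i]) * one_over_width[i];
        /* Saturate to [0, 1]. */
        if (t < 0.0f) t = 0.0f;
        if (t > 1.0f) t = 1.0f;
        /* Evaluate the polynomial. */
        out[i] = t * t * (3.0f - 2.0f * t);
    }
}
```

For example, with edges {0, 1, 2}, widths {1, 2, 4} (reciprocals {1, 0.5, 0.25}) and inputs {0.5, 2, 1}, the three results are 0.5, 0.5 and 0. The last input lies below its edge and is clamped.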

63
**Summary**

Writing optimal HLSL code: compiling issues, optimization strategies, code structure pointers.

Shader Examples: shipped with RenderMonkey version 1.0; see MultiTone Car Paint.rfx and Iridescent Butterfly.rfx.
