Designing a Modern GPU Interface

Designing a Modern GPU Interface Brooke Hodgman ( @BrookeHodgman) http://tiny.cc/gpuinterface

How to make a wrapper for D3D9/11/12, GL2/3/4, GL|ES2/3, Metal, Mantle, Vulkan, GNM & GCM without going (completely) insane

Agenda: GPU Interface (a “wrapper” around the native GPU APIs for every platform); Pipeline State management; Resource management; Shader program management (e.g. Microsoft .fx or Nvidia .cgfx); Q&A.

Where does this fit in? The layers of a game engine, from top to bottom: Shading Pipeline (deferred/forward shading, post-processing, order of passes, high-level techniques); Scene Manager (spatial partitioning, camera management, object culling); Specific Drawable Types (generic models, particle systems, animated meshes, instanced meshes); GPU Interface (lowest-level portable rendering API; just a GPU abstraction). Today's topic is the GPU Interface.

Goals. Flexibility: can do anything that the native APIs let us do, with no cutting out features. Productivity: much simpler to use than the native APIs, with less code and less mental tax. Performance: similar CPU frame-time to hand-written native code. Simplicity: keep the interface as small as possible.

Dog food test: ’22’ (PC, PS4, Xbox One), Don Bradman Cricket port (PS4, Xbox One), Rugby League Live 3 (Steam, PS4, PS3, Xbox One, Xbox 360). 22 is my ‘indie’ project. We started off on D3D9, and have since added support for D3D11 & Mantle, with plans for OpenGL, D3D12 and Vulkan. Big Ant Studios contracted me to port Bradman to PS4 and Xbox One. They hated their current rendering abstraction, so I convinced them to move to this model. All of the high-level code (game-level materials, meshes, models, UIs) was mostly unaffected; we just shifted the rug out from under them. I helped Big Ant a lot on their next project to optimize the architecture and port it to the previous-gen consoles, allowing a lot of the rendering systems to work across five SKUs.

The GPU pipeline 2005 - 2015, SM2. Draw: Input Assembler, Vertex Shader, Rasterizer, Pixel Shader, Output Merger. Resources / Memory: Textures, Buffers, Samplers.

The GPU pipeline 2005 - 2015, SM3: adds vertex texture fetch. Draw: Input Assembler, Vertex Shader, Rasterizer, Pixel Shader, Output Merger. Resources / Memory: Textures, Buffers, Samplers.

The GPU pipeline 2005 - 2015, SM4: adds the Geometry Shader and Stream Out stages, plus compute shaders. Draw: Input Assembler, Vertex Shader, Geometry Shader, Stream Out, Rasterizer, Pixel Shader, Output Merger. Dispatch: Compute Shader. Resources / Memory: Textures, Buffers, Samplers.

The GPU pipeline 2005 - 2015, SM5: adds read-write resources (UAVs) at the pixel shader, plus the tessellation stages. Draw: Input Assembler, Vertex Shader, Hull Shader, Tessellator, Domain Shader, Geometry Shader, Stream Out, Rasterizer, Pixel Shader, Output Merger. Dispatch: Compute Shader. Resources / Memory: Textures, Buffers, Samplers.

The GPU pipeline 2005 - 2015, SM5+: adds read-write resources at every stage. Draw: Input Assembler, Vertex Shader, Hull Shader, Tessellator, Domain Shader, Geometry Shader, Stream Out, Rasterizer, Pixel Shader, Output Merger. Dispatch: Compute Shader. Resources / Memory: Textures, Buffers, Samplers.

The GPU pipeline 2005 - 2015, most common features. Draw: Input Assembler, Vertex Shader, Stream Out, Rasterizer, Pixel Shader, Output Merger. Dispatch: Compute Shader. Resources / Memory: Textures, Buffers, Samplers.

The GPU pipeline 2005 - 2015, most common features, API view (draw). A Draw Command drives: Input Assembler, Vertex Shader, Stream Out, Rasterizer, Pixel Shader, Output Merger. API states: Input Layout, Programs, Raster, Depth / Stencil, Blend. Resource bindings: Buffer, Buffer / Texture / Sampler, Depth Texture, Colour Texture.

The GPU pipeline 2005 - 2015, most common features, API view (compute). A Dispatch Command drives the Compute Shader. API states: Programs. Resource bindings: Buffer / Texture.

Stateless Rendering

Native APIs are state machines: Draw(3, TRIANGLES): behaviour depends on the current state. Before anything has been set, every state and binding of the pipeline (Input Assembler, Vertex Shader, Stream Out, Rasterizer, Pixel Shader, Output Merger) is unknown (???).

Native APIs are state machines: BindTexture( t ): plug some resources in. BindVertexBuffer( v ). BindRenderTarget( r ). The resources v, t and r are now bound; every other state is still unknown (???).

Native APIs are state machines: SetBlend( OPAQUE ): configure some fixed-function bits. SetShaderProgram( s ): plug some procedures in. The program s and the OPAQUE blend are now set alongside v, t and r; the remaining states are still unknown (???).

Native APIs are state machines: SetInputLayout( l ). SetRaster( SOLID ). SetDepthTest( DISABLED ). With layout l, program s, depth test DISABLED, blend OPAQUE, raster SOLID and the resources v, r, t all set, Draw(3, TRIANGLES) is finally fully specified.

State machine issues (and features): Objects can specify that they “don’t care” about a state (by not setting it). “Don’t care” states can be inherited from the calling logic: SetBlend( Translucent ); House.Draw(); Tree.Draw().

State machine issues and features: But… this system of inherited state can be very fragile to code modifications. void House::Draw(){ SetBlend( Opaque ); Draw( 3, TRIANGLES ); } Now the sequence SetBlend( Translucent ); House.Draw(); Tree.Draw() goes wrong (uh oh!): Tree inherits the Opaque blend that House left behind instead of the Translucent blend the caller set. House also doesn’t configure the fixed-function depth test, which means it inherits its configuration from the previous draw.
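The leak described above can be reproduced with a tiny mock. This is only a sketch: FakeDevice, DrawHouse and DrawTree are hypothetical stand-ins for a native state-machine API and are not part of the talk.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a native state-machine API with one global blend state.
enum class Blend { Opaque, Translucent };

struct FakeDevice {
    Blend blend = Blend::Opaque;
    std::vector<std::string> log;  // records the blend mode each draw actually used

    void SetBlend(Blend b) { blend = b; }
    void Draw(const std::string& name) {
        log.push_back(name + (blend == Blend::Opaque ? ":Opaque" : ":Translucent"));
    }
};

// House was modified to set its own blend state and never restores the old one.
void DrawHouse(FakeDevice& dev) {
    dev.SetBlend(Blend::Opaque);
    dev.Draw("House");
}

// Tree "doesn't care" about blending, so it inherits whatever is current.
void DrawTree(FakeDevice& dev) {
    dev.Draw("Tree");
}

// The caller intends both objects to be translucent.
std::vector<std::string> DrawSceneTranslucent() {
    FakeDevice dev;
    dev.SetBlend(Blend::Translucent);
    DrawHouse(dev);
    DrawTree(dev);  // silently picks up House's Opaque state
    return dev.log;
}
```

Calling DrawSceneTranslucent() records the tree as drawn with the opaque blend, even though neither the caller nor the tree ever asked for it.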

State machine issues and features: It can also lead to inefficiencies as your graphics programmers become pessimistic. void Tree::Draw(){ SetBlend( Opaque ) ... } void House::Draw(){ ... }

Stateless Alternative. Simplify the API: remove the entire state machine concept! Less mental tax: no worrying about leaky states. Retain the flexibility of “don’t care” states, but remove the fragility that it has in state-machine APIs.

Draw Items: Bundle all native API state and all resource bindings together into a “Draw Item”. Missing / “don’t care” states are always filled in by some form of default value. A draw item holds pipeline state (Input Layout, Depth / Stencil, Raster, Blend, Programs), resources (Buffer, Texture, Sampler) and primitives (Draw Command).

Draw Items: draw_item = CreateDrawItem( ... ); Submit( draw_item ): behaviour depends only on the contents of the draw item. The example draw_item in the diagram carries everything the pipeline needs: a layout and vertex buffer (VB) for the IA, shader code and a texture for the VS and PS, a Solid raster state, an Opaque blend and a Less depth test with colour and depth targets for the OM, and a draw of 3 triangles.

Leaky states: Now impossible to get state “leakage”. Every draw is completely independent and immune to code modifications in other drawing systems. Submit( House.GetDrawItem() ); Submit( Tree.GetDrawItem() ).

State Groups: Container for Pipeline States and Resource Bindings. Plain-old-data, generated by a ‘writer’ object.
    StateGroupWriter sgw
    sgw.Begin()
    sgw.BindTexture( t )
    sgw.BindVertexBuffer( v )
    sgw.SetBlend( Opaque )
    sgw.SetShaderProgram( s )
    StateGroup* sg = sgw.End()
The resulting state group contains Blend, Programs, Buffer and Texture.
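A minimal sketch of how such a writer could be structured is shown here. It assumes a bitmask records which members were written and it returns the group by value for brevity; the member layout, the StateBit names and the 16-bit ResourceId are illustrative assumptions, not the talk's actual implementation.

```cpp
#include <cstdint>

using ResourceId = std::uint16_t;  // small integer handle (width is an assumption)
enum class BlendMode : std::uint8_t { Opaque, Translucent };

// One bit per member; a clear bit means the group "doesn't care" about that state.
enum StateBit : std::uint32_t {
    kBlendBit        = 1u << 0,
    kProgramBit      = 1u << 1,
    kTextureBit      = 1u << 2,
    kVertexBufferBit = 1u << 3,
};

// Plain data: which members are set, plus their values.
struct StateGroup {
    std::uint32_t setMask = 0;
    BlendMode blend = BlendMode::Opaque;
    ResourceId program = 0;
    ResourceId texture = 0;
    ResourceId vertexBuffer = 0;
};

class StateGroupWriter {
public:
    void Begin() { group_ = StateGroup{}; }
    void BindTexture(ResourceId t)      { group_.texture = t;      group_.setMask |= kTextureBit; }
    void BindVertexBuffer(ResourceId v) { group_.vertexBuffer = v; group_.setMask |= kVertexBufferBit; }
    void SetBlend(BlendMode b)          { group_.blend = b;        group_.setMask |= kBlendBit; }
    void SetShaderProgram(ResourceId s) { group_.program = s;      group_.setMask |= kProgramBit; }
    StateGroup End() const { return group_; }  // by value here; the slides return a pointer
private:
    StateGroup group_;
};
```

Keeping the "is set" information in one mask means later conflict resolution between groups can be done with integer operations rather than per-member checks.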

State Group Stacks: Allow different systems to contribute pipeline states and resource bindings. StateGroup* mesh = ...; StateGroup* material = ...; StateGroup* stack[] = {material, mesh}. In the diagram, mesh supplies Input Layout and Buffer, while material supplies Blend, Programs, Raster and Texture.

State Overrides: Stack ordering dictates priority for overrides. Placing a state group at the front of the array causes its values to be chosen in any state conflicts. StateGroup* override = ...; StateGroup* stack[] = {override, material, mesh}. In the diagram, override supplies only Blend, which wins over the material's Blend.

State Defaults: Stack ordering dictates priority for overrides. Placing a state group at the back of the array causes its values to be chosen only as a fall-back. StateGroup* defaults = ...; StateGroup* stack[] = {material, mesh, defaults}. In the diagram, defaults supplies Input Layout, Blend, Raster, Depth / Stencil and Programs; only the states that material and mesh leave unset (here Depth / Stencil) are taken from it.

Compiling a Draw Item: Given a stack and a draw command, they can be pre-compiled into a draw item.
    StateGroup* stack[] = {override, material, mesh, defaults}
    DrawCommand command = { 3, TRIANGLES }
    DrawItem* draw = Compile( stack, command )
Compilation works in three steps: resolve conflicts between the groups in stack order, keep the remaining states and resources, and copy them into the draw item template together with the draw command. With the groups from the previous slides, the compiled draw item holds Blend from override; Programs, Raster and Texture from material; Input Layout and Buffer from mesh; and Depth / Stencil from defaults.
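The conflict-resolution step can be sketched as a front-to-back walk over the stack in which the first group to set a slot wins. The StateSlot enumeration, the one-ID-per-slot layout and the field names are assumptions made for this illustration; a real draw item would hold more state.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical slot list; one state/resource ID per slot keeps the sketch small.
enum StateSlot { kInputLayout, kDepthStencil, kRaster, kBlend, kPrograms, kBuffer, kTexture, kSlotCount };

struct StateGroup {
    std::uint32_t setMask = 0;           // bit i set => slot i has a value
    std::uint16_t ids[kSlotCount] = {};  // state / resource ID per slot
    void Set(StateSlot s, std::uint16_t id) { setMask |= 1u << s; ids[s] = id; }
};

struct DrawCommand { std::uint32_t primitiveCount; std::uint32_t primitiveType; };

struct DrawItem {
    std::uint16_t ids[kSlotCount] = {};
    DrawCommand command{};
};

// Earlier groups in the stack have priority; later groups only fill unset slots.
DrawItem Compile(const std::vector<const StateGroup*>& stack, DrawCommand command) {
    DrawItem item;
    item.command = command;
    std::uint32_t resolved = 0;  // slots already claimed by a higher-priority group
    for (const StateGroup* group : stack) {
        const std::uint32_t fresh = group->setMask & ~resolved;
        for (int s = 0; s < kSlotCount; ++s)
            if (fresh & (1u << s)) item.ids[s] = group->ids[s];
        resolved |= fresh;
    }
    return item;
}
```

Because the result depends only on the stack and the command, this work can be done once up front rather than on every frame.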

Render Passes: Draw Items define all of the pipeline state except for the Depth/Stencil Target and Render Targets. Render Passes define these destination resources, plus the default and override state groups.
    RenderPass* pass = CreatePass( depth, color, defaults, override )
    StateGroup* stack[] = { material, mesh }
    DrawCommand command = { 3, TRIANGLES }
    DrawItem* draw = Compile( stack, command, pass )
    DrawItem* draws[] = { draw }
    Submit( pass, draws )
Compared with the previous slide, the stack no longer lists override and defaults explicitly; they come from the pass.

Resource Bindings

Resource IDs (and state IDs): Similar to GL, we use small integer types to refer to resource allocations & views. No reference counting: a higher level of the engine can ‘wrap’ reference counting around this simple ‘integer handle’ scheme if necessary (a la std::shared_ptr). Helps decouple platform-specific types from the client code. This can be a significant memory saving per compiled Draw Item: pointers are 64 bits! Most resource IDs should fit in <16 bits. Some kinds of state IDs might fit in <8 bits! (How many blend modes do you really use?)
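To make the memory argument concrete, the following sketch compares a hypothetical set of seven bindings stored as pointers with the same bindings stored as 16-bit and 8-bit IDs. Both structs and their members are invented for this comparison.

```cpp
#include <cstdint>

// Seven bindings held as native pointers (8 bytes each on a 64-bit target).
struct PointerBindings {
    void* program;
    void* vertexBuffer;
    void* texture;
    void* constantBuffer;
    void* blendState;
    void* rasterState;
    void* depthStencilState;
};

// The same seven bindings held as small integer IDs.
struct IdBindings {
    std::uint16_t program, vertexBuffer, texture, constantBuffer;  // resource IDs
    std::uint8_t  blend, raster, depthStencil;                     // state IDs
};

static_assert(sizeof(IdBindings) <= 12, "seven IDs fit in 12 bytes");
static_assert(sizeof(IdBindings) < sizeof(PointerBindings), "IDs are smaller than pointers");
```

On a typical 64-bit target the pointer version takes 56 bytes, so the ID version is several times smaller.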

Resource slots: Most resource binding points are arrays. Conflicts are resolved per individual array element. Example: StateGroup* stack[] = {override, material}, where override binds Sampler 1 and Sampler 2, and material sets Blend, Programs, Sampler 0 and Sampler 1. Only Sampler 1 conflicts, and the override's binding wins.

Resource slots: Resource slots aren’t named, only numbered? Sampler 0, Sampler 1, Sampler 2… Constant Buffer 0, Constant Buffer 1, Constant Buffer 2… Using this assumption at this level of the engine greatly simplifies development. Our ‘shader programs’ struct can use a sampler bitmask of 0x05 to indicate that it uses sampler slot #0 and slot #2 (i.e. ((1<<0)|(1<<2)) == 0x5). The State Group conflict / merging system is built on super simple integer comparisons.
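The slot bitmask idea can be written out as a few integer operations. ShaderPrograms here is a hypothetical one-field struct, and UnusedBindings is an illustrative helper rather than something named in the talk.

```cpp
#include <cstdint>

// Bit for a numbered slot, e.g. SlotBit(2) == 0x4.
constexpr std::uint32_t SlotBit(unsigned slot) { return 1u << slot; }

struct ShaderPrograms {
    std::uint32_t samplerMask;  // bit i set => the program reads sampler slot i
};

bool UsesSampler(const ShaderPrograms& p, unsigned slot) {
    return (p.samplerMask & SlotBit(slot)) != 0;
}

// Slots that are bound but never read by the program.
std::uint32_t UnusedBindings(const ShaderPrograms& p, std::uint32_t boundMask) {
    return boundMask & ~p.samplerMask;
}
```

A program that uses slots #0 and #2 gets the mask SlotBit(0) | SlotBit(2), which is the 0x05 from the slide.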

Resource slots: Using numbered slots requires defining conventions. Constant Buffer 0 is always used for the per-camera matrix data. Constant Buffer 1 is always used for lighting data, etc. This is actually quite useful for “magic” engine-generated data, which always conforms to a known (hard-coded) structure, such as camera matrices, which you want to automatically plug into every object. These are also a good use for the defaults/overrides state groups!

Resource slots: Using named resources requires reflection. To bind data to a named slot, simply use the shader reflection system. Check with the object’s shader to discover the number that’s associated with that name. This is useful for less rigidly defined structures, such as materials, which may change often during development and vary from object to object.
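A name-to-slot lookup of this kind might look like the sketch below. ShaderReflection, its map and the example buffer names are assumptions for illustration; real reflection data would be produced by the shader compiler.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical per-shader reflection data: constant buffer name -> slot number.
struct ShaderReflection {
    std::unordered_map<std::string, int> constantBufferSlots;

    // Returns the numbered slot for a name, or -1 if the shader doesn't declare it.
    int FindConstantBufferSlot(const std::string& name) const {
        auto it = constantBufferSlots.find(name);
        return it == constantBufferSlots.end() ? -1 : it->second;
    }
};

// Example data following the convention from the slides: slots 0 and 1 are
// engine-defined, later slots are free for per-material data.
ShaderReflection MakeExampleReflection() {
    ShaderReflection r;
    r.constantBufferSlots["Camera"] = 0;
    r.constantBufferSlots["Lighting"] = 1;
    r.constantBufferSlots["Material"] = 2;
    return r;
}
```

The lookup only needs to happen when a material is set up; after that the binding is just a slot number like any other.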

Input Assembler (D3D11): Binding slots: Input Layout (formats, strides for each element; AKA Vertex Attribute Object, Vertex Declaration, etc…), Index buffer (Buffer + offset), Vertex buffer(s). Input layouts are problematic, as they’re a link between mesh data layout and the vertex shader. This forces your mesh management code to be strongly coupled with your material code. Some of the vertex buffers being bound will be used for per-vertex attributes, and some will be used for per-instance attributes.

Input Layouts and Vertex Shaders: Input layouts tell the VS where to find the vertex attributes, each at a given offset and stride within its stream. Stream #0 data: Position 1, Position 2, Position 3. Stream #1 data: TexCoord 1, Normal 1, TexCoord 2, Normal 2, TexCoord 3, Normal 3. VS_Input_Full reads every attribute that’s defined by this stream format. VS_Input_Thin does not.
    struct VS_Input_Full { float3 p : Position; float2 t : TexCoord; float3 n : Normal; };
    struct VS_Input_Thin { float3 p : Position; };

Input Layouts and Vertex Shaders: Lua config files define stream formats (memory layouts for buffers) and vertex formats (VS input structures).
    StreamFormat("example_stream", {
      [VertexStream(0)] = { { Float32, 3, Position }, },
      [VertexStream(1)] = { { Float32, 3, Normal }, { Float32, 2, TexCoord, 0 }, },
    })
    VertexFormat("VS_Input_Full", {
      { "p", float3, Position },
      { "t", float2, TexCoord, 0 },
      { "n", float3, Normal },
    })
    InputLayout( "example_stream", "VS_Input_Full" )
    InputLayout( "example_stream", "VS_Input_Thin" )
    InputLayout( "simple_stream", "VS_Input_Thin" )

Input Assembler (Simplified): Binding slots: Vertex Data (index buffer as Buffer ID + offset, vertex buffer(s), Stream Format) and Instance Data. The Input Layout (Vertex Attribute Object, Vertex Declaration, etc…) is now hidden from the user.

Shader Resources (D3D11): Binding slots for the Pixel Shader: Constant Buffer View(s) (Buffer), Shader Resource View(s) (Buffer / Texture), Sampler(s), Unordered Access view(s) …repeat for other shader stages. GL notes: UAV==SSBO, CBV+Buffer==UBO, SRV+Buffer==‘Buffer Texture’, SRV+Texture+Sampler==Texture.

Resource Lists: D3D11 allows for 128 texture slots per shader stage. Can we still allow the user to access a hundred textures without the overhead of managing a hundred binding points? How did APIs already solve this for constants / uniforms? Resource lists are constant buffers (UBOs) for texture bindings. Similar to “bindless” resources. Ports well to Mantle/Vulkan/D3D12 descriptor lists! Only a small number of resource list binding points are required. Example resource list: Diffuse Map ID, Normal Map ID, Specular Map ID.

Shader Resources (simplified): Binding slots for all shader stages: Constant Buffer ID(s), Resource List ID(s) (each holding Buffer IDs / Texture IDs), Sampler ID(s), Unordered Access view(s) (Buffer ID / Texture ID). We’re using IDs instead of pointers to view objects. We’ve merged all the shader stages together: the ability to bind 128 textures to the pixel shader and a different 128 textures to the vertex shader did not seem worth it, when compared to the simplicity gained by removing the need to manually specify which stage you’re binding to. Often, constant buffers are used by several stages, and it’s a pain in the arse to keep track of this and remember to bind them to every stage… On the modern APIs (D3D12 etc.), this “all-stage” binding system also lets us share the same descriptor tables across stages if we desire.

Draw Item Resources: Final size of each draw item is usually <1 cache line. A Draw Item holds: Raster ID, Blend ID, Depth / Stencil, Program ID, Constant Buffer ID(s), Resource List ID(s), Sampler ID(s), Unordered Access view(s) (Buffer ID / Texture ID), an Input Assembler Config ID and a Draw Command. A Resource List holds Buffer IDs / Texture IDs. An Input Assembler Config holds Vertex Data and Instance Data. Draw commands not only specify the number of primitives to draw (and the type), they also specify a vertex offset, which allows many draw items to share the same IA config. The diagram annotates these structures with sizes of 2 to 256 bytes, 20 to 128 bytes and 32 to 80 bytes.

State Group Resources: Final look at actual State Group members (all optional): Constant Buffer ID(s), Raster ID, Depth / Stencil, Blend ID, Resource List ID(s), Vertex Data, Instance Data, Sampler ID(s), Unordered Access view(s) (Buffer ID / Texture ID), Technique ID, Shader Options. In the compiled Draw Item, the Technique ID and Shader Options are replaced by a Program ID.

Shaders

Program management: Out of the box, shaders are hard to manage. One program = Pixel Shader + Vertex Shader (+ Geometry + Tessellation…). Most objects/materials require more than one program. Deferred rendering: write GBuffer attributes. Forward rendering: compute all shading and lighting. Shadow mapping: write depth only. Material LOD: enable/disable features (e.g. normal mapping at a distance). Loop unrolling: compile the shader once for each value of N. All of these programs grouped together form a single Technique.

Techniques, Passes, Options, Permutations… A technique is a single shader file (Effect in MS lingo). Each technique contains several passes: GBuffer, Forward, Depth-Only, etc… Each pass can contain several options: Normal Mapping (y/n), Number of lights [0..8), etc… For each technique, for each pass, for each permutation of options, pre-compile the shader source file into a program. Careful: each 1-bit option doubles the number of programs!

[FX] syntax: All the APIs we use (except mobile/Mac/Linux) use a shader language that is close enough to HLSL that we can just write all our shader code in HLSL! A header file full of #defines is enough to smooth over the small differences in syntax. However, resource declaration syntax varies widely. Not all platforms support constant buffers (we support prev-gen / D3D9 / GL2 era). Not all platforms support Resource Lists. Not all platforms support separate Textures and Samplers.

[FX] syntax: A small amount of code generation is used to smooth over these issues. We search for comment blocks of the pattern /*[FX] … */ and execute their contents as Lua code. The Lua VM has been pre-registered with functions such as those below, to create a domain-specific language for declaring shader resources and techniques/passes/options:
    CBuffer( slot, stages, name, values )
    TextureList( slot, stages, name, values )
    Option( name, range )
    Pass( slot, name, parameters )

[FX] Examples:
    CBuffer( 0, Pixel, 'Material', {
      { g_emissive = float },
    })
    TextureList( 0, Pixel, 'Material', {
      { Tex2D, 's_Diffuse', 'Linear' },
    })
    Sampler( 0, {Pixel,Vertex}, 'Linear', {
      MinFilter = Linear, MagFilter = Linear, MipFilter = Linear,
      AddressU = Wrap, AddressV = Wrap, AddressW = Wrap,
    })

[FX] Examples:
    Pass( 0, 'Opaque', {
      vertexShader = 'vs_main';
      pixelShader = 'ps_main';
      vertexLayout = { 'VS_Input_Full' };
      pixelOptions = 'LightCount';
    })

Shader Options: Shader options are all packed together into a bitmask.
    Option( 'NormalMapped' ) -- pick a bit for me (use reflection!)
    Option( 'NormalMapped', {id=3} ) -- mask == 0x8 (i.e. 1<<3)
    Option( 'LightCount', {id=4, min=1, max=4} )
    bits:  7654 3210
    0x00 / 0000 0000 == LightCount: 1
    0x10 / 0001 0000 == LightCount: 2
    0x20 / 0010 0000 == LightCount: 3
    0x30 / 0011 0000 == LightCount: 4
1 to 4 inclusive is 4 values. 4 values require 2 bits. The start bit is #4, so bit #5 is also used. To encode, subtract min, then shift left 4 places. To decode, shift right 4 places, mask off two bits, then add min.
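The encode and decode rules above translate directly into shifts and masks. In this sketch OptionDesc, BitsNeeded, Encode and Decode are illustrative names; firstBit corresponds to the id field in the Lua declaration.

```cpp
#include <cstdint>

// Hypothetical description of one ranged option, e.g. LightCount {id=4, min=1, max=4}.
struct OptionDesc {
    unsigned firstBit;  // lowest bit used in the options bitmask
    unsigned min;       // smallest legal value
    unsigned max;       // largest legal value
};

// Bits needed to store (max - min + 1) distinct values; at least one bit.
unsigned BitsNeeded(const OptionDesc& o) {
    const unsigned largest = o.max - o.min;  // largest value after subtracting min
    unsigned bits = 1;
    while (bits < 32 && (largest >> bits) != 0) ++bits;
    return bits;
}

// Encode: subtract min, then shift into place.
std::uint32_t Encode(const OptionDesc& o, unsigned value) {
    return static_cast<std::uint32_t>(value - o.min) << o.firstBit;
}

// Decode: shift down, mask off the option's bits, then add min back.
unsigned Decode(const OptionDesc& o, std::uint32_t options) {
    const unsigned bits = BitsNeeded(o);
    const std::uint32_t fieldMask = bits >= 32 ? 0xFFFFFFFFu : (1u << bits) - 1u;
    return static_cast<unsigned>((options >> o.firstBit) & fieldMask) + o.min;
}
```

For LightCount, BitsNeeded gives 2, and a value of 4 encodes to 0x30, matching the table on the slide.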

Shader Options: Given a pass with:
    Option( 'NormalMapped', {id=0} )
    Option( 'LightCount', {id=4, min=1, max=4} )
The permutations would be:
    bits:  7654 3210
    0x00 / 0000 0000 == NormalMapped: 0, LightCount: 1
    0x01 / 0000 0001 == NormalMapped: 1, LightCount: 1
    0x10 / 0001 0000 == NormalMapped: 0, LightCount: 2
    0x11 / 0001 0001 == NormalMapped: 1, LightCount: 2
    0x20 / 0010 0000 == NormalMapped: 0, LightCount: 3
    0x21 / 0010 0001 == NormalMapped: 1, LightCount: 3
    0x30 / 0011 0000 == NormalMapped: 0, LightCount: 4
    0x31 / 0011 0001 == NormalMapped: 1, LightCount: 4
We would compile the shader code 8 times, with different #define values for ‘NormalMapped’ and ‘LightCount’.
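Enumerating the bitmasks to compile is a small cartesian product over the options. This sketch assumes each option is described by its first bit and its number of values; OptionRange and EnumeratePermutations are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct OptionRange {
    unsigned firstBit;    // lowest bit used by the option
    unsigned valueCount;  // number of distinct values (max - min + 1)
};

// Returns every options bitmask that needs its own compiled program, in ascending order.
std::vector<std::uint32_t> EnumeratePermutations(const std::vector<OptionRange>& options) {
    std::vector<std::uint32_t> masks(1, 0u);  // start with the "nothing enabled" mask
    for (const OptionRange& opt : options) {
        std::vector<std::uint32_t> next;
        for (std::uint32_t base : masks)
            for (unsigned v = 0; v < opt.valueCount; ++v)
                next.push_back(base | (static_cast<std::uint32_t>(v) << opt.firstBit));
        masks.swap(next);
    }
    std::sort(masks.begin(), masks.end());
    return masks;
}
```

For NormalMapped (bit 0, two values) and LightCount (bit 4, four values) this yields the eight masks listed on the slide, and every extra 1-bit option doubles the result.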

Program selection: I lied earlier. I said that a Render Pass has just a depth texture, render target(s), a defaults state group and an overrides state group. A Render Pass also specifies a “shader pass” integer. Look up the technique, then look up the right pass within the technique… and then you’ve got a potentially long list of permutations. Step 1: the State Group supplies the Technique ID and Shader Options, and the Render Pass supplies the Pass ID. Step 2: together these select the Program ID stored in the Draw Item. Profit!

Shader Options - runtime: Conflict/merging of shader options ‘state’ is implemented a little differently. Instead of a simple all-or-nothing conflict resolution, each ‘Shader Options’ State Group member contains a bitfield value and a mask (U32 value, U32 mask). The mask allows us to resolve conflicts on each individual bit. Example: one State Group has value = 0x04, mask = 0x0F; another has value = 0x80, mask = 0xF0; the Merged Options are 0x84. The merged options are then combined with the State Group's Technique ID and the Render Pass's Pass ID to select a program.
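Per-bit merging with a value and a mask can be sketched as follows, assuming (as for the other states) that groups earlier in the stack take priority. ShaderOptions and MergeOptions are illustrative names.

```cpp
#include <cstdint>
#include <vector>

struct ShaderOptions {
    std::uint32_t value;  // requested option bits
    std::uint32_t mask;   // which bits this state group has an opinion on
};

// Earlier entries win on any bit they claim; later entries fill the remaining bits.
std::uint32_t MergeOptions(const std::vector<ShaderOptions>& stack) {
    std::uint32_t merged = 0;
    std::uint32_t decided = 0;  // bits already claimed by a higher-priority group
    for (const ShaderOptions& o : stack) {
        const std::uint32_t fresh = o.mask & ~decided;
        merged |= o.value & fresh;
        decided |= fresh;
    }
    return merged;
}
```

With the two groups from the slide (0x04 under mask 0x0F, and 0x80 under mask 0xF0) the masks do not overlap, so the result is simply 0x84.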

Permutation selection: When compiling your permutations, sort them by CountBitsSet(options_bitmask) such that permutations with more option bits set appear earlier in the array. The sort order of ties doesn’t matter. At runtime, the user creates their own bitmask of requested features, which may or may not actually match any particular program that exists. Linearly search through the permutations list and stop when (requested_options & permutation_options) == permutation_options, i.e. stop as soon as you’re not delivering options that weren’t asked for. You won’t necessarily be able to satisfy the user’s request exactly, but this algorithm will give them the program that enables as many of their requests as possible. The final permutation in the list will always have a bitmask of zero, which always satisfies the condition and terminates the loop.

Permutation Selection (code), for those of you downloading the slides…
    int SelectProgramsIndex( u32 techniqueId, u32 passId, u32 featuresRequested )
    {
        Technique& technique = techniques[techniqueId];
        List<Pass>& passes = technique.passes;
        Pass& pass = passes[passId];
        List<Permutation>& permutations = pass.permutations;
        for( int i = 0, end = permutations.count; i != end; ++i )
        {
            Permutation& permutation = permutations[i];
            // Accept the first permutation that enables nothing beyond what was requested.
            if( (featuresRequested & permutation.features) == permutation.features )
                return permutation.bindingIdx;
        }
        return -1; // not reached when the list ends with the zero-feature permutation
    }
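The sort step that the selection loop relies on can be sketched together with a standalone version of the search. This version uses std::vector instead of the engine's List type, and SortForSelection and SelectPermutation are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Permutation {
    std::uint32_t features;  // option bits this program was compiled with
    int bindingIdx;          // index of the compiled program
};

// Counts set bits by clearing the lowest set bit until none remain.
int CountBitsSet(std::uint32_t v) {
    int n = 0;
    for (; v != 0; v &= v - 1) ++n;
    return n;
}

// Permutations with more option bits set come first; ties keep their relative order.
void SortForSelection(std::vector<Permutation>& perms) {
    std::stable_sort(perms.begin(), perms.end(),
        [](const Permutation& a, const Permutation& b) {
            return CountBitsSet(a.features) > CountBitsSet(b.features);
        });
}

// First permutation that enables nothing beyond what was requested.
int SelectPermutation(const std::vector<Permutation>& perms, std::uint32_t requested) {
    for (const Permutation& p : perms)
        if ((requested & p.features) == p.features) return p.bindingIdx;
    return -1;
}
```

Sorting richer permutations first is what makes the first match also the one that honours the most requested bits; a request containing bits that no program supports still falls through to the zero-feature permutation.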

Q&A? @BrookeHodgman http://tiny.cc/gpuinterface

Thanks! @BrookeHodgman http://tiny.cc/gpuinterface

Bonus slides That I was going to write but then I didn’t

GLSL notes: GL + GLSL are just specifications; vendors create implementations (which are all broken). Validate your shaders using the Khronos reference compiler*. Don’t ship your source files: implement a pre-processor for #include, etc., and obfuscate your shipping code if you feel the need. There are no guarantees that every vendor will optimize (or compile) your code properly! Implement a GLSL->AST->GLSL optimizing compiler. Or better: an HLSL->AST->GLSL optimizing compiler! Automate this! *http://tiny.cc/khronos

Draw sorting: Write a function that hashes a compiled Draw Item. More expensive state changes should be associated with more significant bits in the output. The Draw Item's fields feed the hash in three groups: IA config (Input Assembler Config ID), shader & pipeline state (Program ID, Depth / Stencil, Raster ID, Blend ID) and textures (Constant Buffer ID(s), Resource List ID(s), Sampler ID(s), Unordered Access view(s)). The resulting hash (e.g. 0x12345678) is the sorting key.
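One way to build such a key is to pack state IDs into a single integer, most expensive change first. The field widths and the cost ordering (program, then pipeline state, then resource list, then IA config) are assumptions made for this sketch; the slide does not specify them.

```cpp
#include <cstdint>

// Hypothetical subset of a compiled draw item's state IDs, 8 bits each.
struct DrawItemIds {
    std::uint8_t program;        // assumed most expensive to change
    std::uint8_t pipelineState;  // blend / raster / depth-stencil, combined
    std::uint8_t resourceList;   // textures
    std::uint8_t iaConfig;       // assumed cheapest to change
};

// More expensive state lands in more significant bits, so sorting by key
// groups draws that share the costliest state.
std::uint32_t SortKey(const DrawItemIds& d) {
    return (static_cast<std::uint32_t>(d.program) << 24) |
           (static_cast<std::uint32_t>(d.pipelineState) << 16) |
           (static_cast<std::uint32_t>(d.resourceList) << 8) |
            static_cast<std::uint32_t>(d.iaConfig);
}
```

Sorting draw items by this key keeps all draws with the same program adjacent, then orders by pipeline state within each program, and so on.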

Transparent Draw sorting: Alpha-blended geometry must be rendered from back to front. Don’t use the draw item’s hash; use its distance from the camera. Reinterpret the depth distance as an integer and invert it, ~*(u32*)&distance, to get the sorting key (e.g. 0xABCDEF12).
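The bit trick works because the bit patterns of non-negative IEEE 754 floats sort in the same order as their values. The sketch below assumes a 32-bit IEEE 754 float and a non-negative distance, and uses std::memcpy instead of the pointer cast to avoid aliasing problems; BackToFrontKey is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Key for an ascending sort that draws the farthest geometry first.
// Assumes distance >= 0 and IEEE 754 floats.
std::uint32_t BackToFrontKey(float distance) {
    std::uint32_t bits;
    std::memcpy(&bits, &distance, sizeof bits);  // reinterpret the float's bits
    return ~bits;                                // invert so larger distances sort earlier
}
```

A larger distance produces a smaller key, so an ordinary ascending sort yields back-to-front order.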

Hybrid Draw sorting: For opaque geometry to make use of Hi-Z, you want to render front-to-back. However, you also want to sort by state to reduce CPU costs. Compromise by using a hybrid: merge a coarse depth taken from the distance with bits of the original hash. In the example, the original sorting key 0x12345678 becomes the new sorting key 0xABCD1357, whose upper half (0xABCD) is the coarse depth.
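A hybrid key can be sketched by taking the upper 16 bits of the distance's bit pattern as the coarse depth and filling the lower 16 bits from the state hash. The 16/16 split and the choice of the hash's upper half are assumptions; the slide's example compresses the hash differently. The sketch also assumes a 32-bit IEEE 754 float and a non-negative distance.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Upper 16 bits: coarse depth (front-to-back in an ascending sort).
// Lower 16 bits: the most significant half of the state hash.
std::uint32_t HybridKey(float distance, std::uint32_t stateHash) {
    std::uint32_t depthBits;
    std::memcpy(&depthBits, &distance, sizeof depthBits);
    return (depthBits & 0xFFFF0000u) | (stateHash >> 16);
}
```

Draws whose distances fall into the same coarse depth bucket end up ordered by state, while draws in different buckets stay roughly front-to-back.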

Redundant state filtering Each draw item is a very compact structure, containing state IDs. XOR’ing two draw items creates a bitmask that highlights any changes. Masking out sections of that bitmask and comparing them to zero lets you quickly check if a state has changed since the previous draw item.
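The XOR check can be illustrated with state IDs packed into one 64-bit word. The packing layout and mask constants below are assumptions for this sketch; a real draw item may span several words.

```cpp
#include <cstdint>

// Assumed layout: [63:48] program, [47:40] blend, [39:32] raster,
// [31:16] resource list, [15:0] IA config.
constexpr std::uint64_t kProgramMask      = 0xFFFF000000000000ull;
constexpr std::uint64_t kBlendMask        = 0x0000FF0000000000ull;
constexpr std::uint64_t kRasterMask       = 0x000000FF00000000ull;
constexpr std::uint64_t kResourceListMask = 0x00000000FFFF0000ull;
constexpr std::uint64_t kIaConfigMask     = 0x000000000000FFFFull;

std::uint64_t PackDrawState(std::uint16_t program, std::uint8_t blend, std::uint8_t raster,
                            std::uint16_t resourceList, std::uint16_t iaConfig) {
    return (static_cast<std::uint64_t>(program) << 48) |
           (static_cast<std::uint64_t>(blend) << 40) |
           (static_cast<std::uint64_t>(raster) << 32) |
           (static_cast<std::uint64_t>(resourceList) << 16) |
            static_cast<std::uint64_t>(iaConfig);
}

// XOR highlights every differing bit; masking picks out one field (or several).
bool StateChanged(std::uint64_t previous, std::uint64_t current, std::uint64_t fieldMask) {
    return ((previous ^ current) & fieldMask) != 0;
}
```

When submitting a sorted list, the backend would only issue the native calls for fields where StateChanged reports a difference from the previous draw item.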

Resource Management

Data conditioning / compilation

Shader compilation fun

Devices, contexts & command lists

Devices

Contexts

Multithreading on old APIs

Higher level layer examples

Scene manager

Materials

Lighting