Getting The Best Out Of D3D12 Evan Hart, Principal Engineer, NVIDIA Dave Oldcorn, D3D12 Technical Lead, AMD.

1 Getting The Best Out Of D3D12 Evan Hart, Principal Engineer, NVIDIA Dave Oldcorn, D3D12 Technical Lead, AMD

2 Prerequisites ● An interest in D3D12  Ideally, already looked at D3D12 ● Experienced Graphics Programmer ● Console programming experience  Beneficial, not required

3 Brief D3D12 Overview

4 The ‘What’ of D3D12 ● Broad rethinking of the API ● Much closer to HW realities ● Model is more explicit  Less driver magic

5 “With great power comes great responsibility.” ● D3D12 answers many developer requests ● Be ready to use it wisely and it can reward you

6 Console Vs PC ● D3D12 offers a great porting story  More of the explicit control console devs crave  Much less driver interference ● Still a heterogeneous environment  Need to test carefully  Heed API and tool warnings (exposed corners)  Game will run on HW you never tested

7 Central Objects to D3D12 ● Command Lists ● Bundles ● Pipeline State Objects ● Root Signature and Descriptor Tables ● Resource Heaps

8 Using Bundles And Lists [Diagram labels: Draw, Dispatch, Bundle, Command List, Frame]

9 Command Lists & Bundles ● Bundle  Small object recording a few commands  Great for reuse, but a subset of commands  Like drawing 3 meshes in an object ● Command List  Useful for recording/submitting commands  Used to execute bundles and other commands

10 Pipeline State Object ● Collates most render state  Shaders, raster, blend ● All packaged and swapped together

11 Pipeline State Object [Diagram: a Pipeline State block containing Pixel Shader, Vertex Shader, Rasterizer State, Depth State, Blend State, Input Layout, Topology, RT Format, Geometry Shader, Hull Shader, Domain Shader, Compute Shader]

12 Root Signature & Descriptor Tables ● New method for resource setting ● Flexible interface  Methods for changing large blocks  Methods for small bits quickly  Indexing and open-ended tables enable “bindless”-like behaviour

13 Resource Heaps ● New memory management primitive ● Tie multiple related resources into one heap ● App controls residency on the heap  Somewhat coarse ● Enables console-like memory aliasing

14 New HW Features ● Conservative Rasterization ● Raster Ordered Views ● Typed UAV ● PS write of stencil reference ● Volume tiled resources

15 Advice for the D3D12 Dev

16 Practical Developer Advice ● Small nuggets on key issues ● Advice is from experience  Multiple engines have done trial ports  Many months of experimentation at driver, API, and app level

17 Efficient Submission ● Record commands in parallel ● Reuse fragments via bundles ● You are taking over some driver/runtime work  Make sure your code is efficient (and parallel) ● Submit in batches with ExecuteCommandLists  Submit throughout the frame

18 Engine organisation ● Consider task oriented engines  Divide rendering into tasks  Run CPU tasks to build command lists  Use dependencies to order GPU submission  Also helps with resource barriers
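
The "use dependencies to order GPU submission" point can be sketched as a topological sort over render tasks. This is a minimal model in standard C++, not D3D12 code: `submission_order` and the task indices are hypothetical names, and each task is assumed to correspond to one recorded command list.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// deps[i] lists the tasks whose command lists must be submitted before task i.
// Returns one valid submission order, or an empty vector if the graph has a cycle.
inline std::vector<std::size_t> submission_order(
    const std::vector<std::vector<std::size_t>>& deps) {
    const std::size_t n = deps.size();
    std::vector<std::size_t> remaining(n);                // unmet dependencies per task
    std::vector<std::vector<std::size_t>> dependents(n);  // reverse edges
    for (std::size_t i = 0; i < n; ++i) {
        remaining[i] = deps[i].size();
        for (std::size_t d : deps[i]) {
            assert(d < n);  // dependency indices must name existing tasks
            dependents[d].push_back(i);
        }
    }
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (remaining[i] == 0) ready.push(i);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t t = ready.front();
        ready.pop();
        order.push_back(t);
        for (std::size_t u : dependents[t])
            if (--remaining[u] == 0) ready.push(u);
    }
    if (order.size() != n) order.clear();  // cycle: no valid order exists
    return order;
}
```

For example, with shadow (0) and G-buffer (1) passes feeding lighting (2), which feeds post-processing (3), the dependency lists would be `{{}, {}, {0, 1}, {2}}`. The same edges are a natural place to hang resource barriers.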

19 Threading: Done Badly ● App render code, runtime, driver all on one! [Diagram: lanes for Render Thread, Game Thread, and Aux Thread; work items shown are Command List 0, Command List 1, Submit, Create Resource, and Present]

20 Threading: Done Well ● Many solutions, key is parallelism! [Diagram: lanes for Game Thread, Master Render Thread, Worker Thread, and Async Thread; work items shown are Command List 0 to 3, Submit CL0 to CL3, Create Resource, Compile PSO, and Present]
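
A minimal sketch of the "record in parallel, submit as a batch" pattern, using plain standard C++ threads. `RecordedList`, `record_task`, and `record_and_submit` are hypothetical stand-ins. In a real engine each worker would own its own command list and allocator, and the final loop would be a single batched submit call.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command list; not a D3D12 type.
struct RecordedList {
    std::vector<std::string> commands;
};

// Each worker records into its own list, so recording needs no locking.
inline void record_task(RecordedList& out, std::size_t task, std::size_t draws) {
    for (std::size_t d = 0; d < draws; ++d)
        out.commands.push_back("task" + std::to_string(task) +
                               ":draw" + std::to_string(d));
}

// Records all tasks in parallel, then concatenates them in task order,
// modelling one batched submission of several command lists.
inline std::vector<std::string> record_and_submit(std::size_t tasks,
                                                  std::size_t draws) {
    std::vector<RecordedList> lists(tasks);
    std::vector<std::thread> workers;
    workers.reserve(tasks);
    for (std::size_t t = 0; t < tasks; ++t)
        workers.emplace_back(record_task, std::ref(lists[t]), t, draws);
    for (std::thread& w : workers) w.join();

    std::vector<std::string> gpu_stream;
    for (const RecordedList& l : lists)
        gpu_stream.insert(gpu_stream.end(), l.commands.begin(), l.commands.end());
    return gpu_stream;
}
```

Recording order across threads is nondeterministic, but the submitted stream is not, because submission order is fixed by the list index rather than by which worker finishes first.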

21 PSO Practicalities ● Merged state removes driver validation costs ● Don’t needlessly thrash state  Just because it is a PSO doesn’t mean every state needs to flip in HW: avoid toggling compute/graphics, avoid toggling tessellation  Use sensible defaults for don’t-care fields

22 Creating PSOs ● PSO creation can be costly  Probably means a compile ● Streaming threads should handle PSO creation  Gather state and create on async threads  Prevents stalls  Can handle specializations too

23 Deferred PSO Update ● “Quick first compile; better answer later”  Simple / generic / free initial shader  Start the compile of the better result  Substitute PSO when it’s ready ● Generic / specialized especially useful  Precompile the generic case  More optimal path for special cases, compiled on low priority thread
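
The "quick first compile; better answer later" idea can be modelled with a future. This is a sketch in standard C++ only: `Pso` and `DeferredPso` are hypothetical names, and the generic PSO is assumed to be available up front.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled pipeline state; not a D3D12 type.
struct Pso {
    std::string name;
};

class DeferredPso {
public:
    // The generic PSO is assumed precompiled; the specialized one arrives later.
    DeferredPso(Pso generic, std::future<Pso> specialized)
        : current_(std::move(generic)), pending_(std::move(specialized)) {}

    // Called at draw time. Never blocks: it polls the future with a zero
    // timeout and swaps in the specialized PSO once the compile has finished.
    const Pso& get() {
        if (pending_.valid() &&
            pending_.wait_for(std::chrono::seconds(0)) ==
                std::future_status::ready) {
            current_ = pending_.get();  // after this, pending_ is no longer valid
        }
        return current_;
    }

private:
    Pso current_;
    std::future<Pso> pending_;
};
```

If the future comes from `std::async`, it should be launched with `std::launch::async`. A deferred future never reports `ready` to a zero-timeout poll, so the substitution would never happen.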

24 Using Bundles And Lists [Diagram labels: Draw, Dispatch, Bundle, Command List, Frame]

25 Bundle Advice ● Aim for a moderate size (~12 draws)  Some potential overhead with setup ● Limit resource binding inheritance when possible  Enables more complete cooking of bundle

26 Lists Advice ● Aim for a decent size  Typically hundreds of draw calls ● Submit together when feasible ● Don’t expect lots of list reuse  Per-frame changes + overlap limitation  Post-processing might be an exception, but you still need 2-3 copies of that list

27 Using Command Allocators

28 Allocators and Lists ● Invisible consumers of GPU memory ● Hold on to memory until Destroy ● Reuse on similar data  Warm list == no allocation during list creation ● Destroy on different data  Reuse on disparate cases grows all lists to size of worst case over time [Diagram: List / Allocator memory usage across the steps Initial 100 draws, Reset, Same 100 draws (guaranteed no new allocations), Different 100 draws, 200 draws, 5 draws]
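
The growth behaviour described on this slide can be illustrated with a toy model. `ToyAllocator` is a hypothetical type and the byte counts are arbitrary. The only assumption carried over from the slide is that a reset keeps the memory and only destruction returns it.

```cpp
#include <algorithm>
#include <cstddef>

// Toy model of an allocator that keeps its high-water mark across resets.
struct ToyAllocator {
    std::size_t held = 0;  // memory the allocator is holding on to
    std::size_t used = 0;  // memory consumed since the last reset

    // Records `bytes` of commands; returns how many new bytes had to be allocated.
    std::size_t record(std::size_t bytes) {
        used += bytes;
        const std::size_t grown = used > held ? used - held : 0;
        held = std::max(held, used);
        return grown;
    }

    // Reset makes the memory reusable but does not give it back.
    void reset() { used = 0; }
};
```

Recording the same workload after a reset allocates nothing (the "warm" case), while one oversized frame permanently raises `held`, which is why reusing one allocator for disparate workloads drifts towards the worst case.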

29 Allocator Advice ● Allocators are fastest when warm  Keep reusing allocator with lists of equal size ● Need 2T + N allocators minimum  T -> threads creating command lists  N -> extra pool for bundles  All lists/bundles on an allocator freed together  Need to double/triple buffer for reusing the allocators
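
A sketch of the double/triple buffering rule: an allocator may only be reset once the GPU has finished the work recorded from it. `AllocatorRing` and `min_allocators` are hypothetical names, and the fence values stand in for whatever GPU completion signal the engine tracks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-thread ring of allocator slots, one per frame in flight.
class AllocatorRing {
public:
    explicit AllocatorRing(std::size_t frames_in_flight) : slots_(frames_in_flight) {
        assert(frames_in_flight > 0);
    }

    // Returns the slot to record into, or -1 if the GPU has not yet finished
    // the work last submitted from that slot (the caller must wait).
    int acquire(std::uint64_t completed_fence) {
        Slot& s = slots_[next_];
        if (s.in_use && s.fence > completed_fence) return -1;
        s.in_use = false;  // GPU is done with it: safe to reset and reuse
        return static_cast<int>(next_);
    }

    // Marks the slot as submitted; its work completes when `fence` is reached.
    void submit(int slot, std::uint64_t fence) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < slots_.size());
        slots_[static_cast<std::size_t>(slot)].fence = fence;
        slots_[static_cast<std::size_t>(slot)].in_use = true;
        next_ = (next_ + 1) % slots_.size();
    }

private:
    struct Slot {
        std::uint64_t fence = 0;
        bool in_use = false;
    };
    std::vector<Slot> slots_;
    std::size_t next_ = 0;
};

// Minimum allocator count from the slide: 2T + N.
constexpr std::size_t min_allocators(std::size_t threads, std::size_t bundle_pool) {
    return 2 * threads + bundle_pool;
}
```

With two slots per recording thread this gives the 2T term. A third slot per thread (triple buffering) trades memory for fewer waits in `acquire`.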

30 Root Signature ● Carefully lay out the root signature  Group tables by frequency of change  Most frequent changes early in signature ● Standardize slots  Signature change costs [Diagram: root signature holding Per-draw constants, a Constant Buffer pointer (Modelview matrix, skinning), a Per-Draw Table Pointer, a Per-Material Table Pointer, and a Per-Frame Table Pointer; the tables reference Tex entries, Const Buf (shader params), and Const Buf (camera, eye...)]

31 Root Signature Cont’d ● Place single items which change per-draw in the root arguments ● Costs of setting new table vary across HW  Cost varies from nearly 0 to O(N) work where N is items in table ● Avoid changes to individual items in tables  Requires app to instance table if in flight  Try to update whole table atomically
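
The layout advice (group by frequency of change, most frequent first) can be sketched as a stable sort over parameter descriptions. `RootParam`, `Frequency`, and `layout_root_signature` are hypothetical names for illustration. A real root signature would be built from the API's own parameter structures.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Higher value == changes more often.
enum class Frequency { PerFrame = 0, PerMaterial = 1, PerDraw = 2 };

struct RootParam {
    std::string name;
    Frequency changes;
};

// Orders parameters so the most frequently changed come first.
// stable_sort keeps the authored order within each frequency group.
inline std::vector<RootParam> layout_root_signature(std::vector<RootParam> params) {
    std::stable_sort(params.begin(), params.end(),
                     [](const RootParam& a, const RootParam& b) {
                         return static_cast<int>(a.changes) >
                                static_cast<int>(b.changes);
                     });
    return params;
}
```

Running every material's parameter list through the same function is one way to get the "standardize slots" effect, since equal inputs always produce the same slot order.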

32 Managing Resources with Heaps ● Committed  Monolithic, D3D11-style ● Placed  Offset in existing heap ● Reserved  Mapped to heaps like tiled resources [Diagram labels: Resource [VA], Heap, G-buffer, Postprocess buffer]

33 Choosing a resource type ● Committed  Need per-resource residency  Don’t need aliasing ● Placed  Cheaper create / destroy  Can group in heaps of similar residency  Want to alias over others  Small resources ● Tiled / Reserved  Need flexibility of memory management  Can tolerate CPU and GPU overheads of ResourceMap
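
Placing resources at offsets in a heap, and aliasing one over another, can be sketched with a simple aligned cursor. `HeapCursor` is a hypothetical helper, and the 64 KB alignment is an assumption (it matches D3D12's default placement alignment for ordinary resources). A real engine would query the required size and alignment per resource.

```cpp
#include <cassert>
#include <cstdint>

// Assumed placement alignment: 64 KB.
constexpr std::uint64_t kPlacementAlignment = 64 * 1024;
constexpr std::uint64_t kNoSpace = ~0ull;

constexpr std::uint64_t align_up(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Hands out aligned offsets inside one heap of fixed size.
class HeapCursor {
public:
    explicit HeapCursor(std::uint64_t size) : size_(size) {}

    // Returns the offset for a placed resource, or kNoSpace if it does not fit.
    std::uint64_t place(std::uint64_t bytes) {
        const std::uint64_t offset = align_up(cursor_, kPlacementAlignment);
        if (offset + bytes > size_) return kNoSpace;
        cursor_ = offset + bytes;
        return offset;
    }

    // Rewinds so that later placements alias memory handed out earlier.
    // The app is responsible for making sure the aliased lifetimes do not overlap.
    void rewind(std::uint64_t offset) {
        assert(offset <= size_);
        cursor_ = offset;
    }

private:
    std::uint64_t size_;
    std::uint64_t cursor_ = 0;
};
```

Rewinding to the G-buffer's offset once lighting is finished lets a post-process buffer reuse the same memory, which is the console-style aliasing the heap model enables.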

34 Resource tips ● Committed gives driver more knowledge ● Tiled resources have separate caps  Need to prepare for HW without it ● Memory might be segmented  Cannot allocate entire space in a single heap

35 Residency tips ● MakeResident:  Batch these up  Expect CPU and GPU cost for page table updates ● MakeUnresident (Evict in the released API)  Cost of move may be deferred; may be seen on future MakeResident

36 Working Set Management ● Application has much more control in D3D12 ● Directly tells the video memory manager which resources are required ● App can be sharper on memory than before  On D3D11, the working set per frame is typically much smaller than the registered resources  Less likely to end up with object in slow memory

37 Working to a budget ● “Budget” is the memory you can use ● Get under the budget using residency  MakeUnresident (Evict in the released API) makes an object a candidate to swap to system memory  It is much cheaper to make a resource unresident and later resident again than to destroy and recreate it ● Tiled resources can drop mip levels dynamically
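
One possible way to get under a budget is to make the least recently used resources unresident first. The sketch below is a model only: `TrackedResource` and `trim_to_budget` are hypothetical, and LRU is an assumed policy, not one prescribed by the slides.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct TrackedResource {
    std::string name;
    std::uint64_t bytes;
    std::uint64_t last_used_frame;
    bool resident;
};

// Marks least-recently-used resources unresident until usage fits the budget.
// Returns their names in eviction order so the caller can batch them into a
// single residency call instead of issuing one call per resource.
inline std::vector<std::string> trim_to_budget(std::vector<TrackedResource>& resources,
                                               std::uint64_t budget) {
    std::uint64_t in_use = 0;
    std::vector<TrackedResource*> resident;
    for (TrackedResource& r : resources) {
        if (r.resident) {
            in_use += r.bytes;
            resident.push_back(&r);
        }
    }
    std::sort(resident.begin(), resident.end(),
              [](const TrackedResource* a, const TrackedResource* b) {
                  return a->last_used_frame < b->last_used_frame;
              });

    std::vector<std::string> evicted;
    for (TrackedResource* r : resident) {
        if (in_use <= budget) break;
        r->resident = false;
        in_use -= r->bytes;
        evicted.push_back(r->name);
    }
    return evicted;
}
```

The resources stay alive throughout. Only their residency flag changes, which reflects the slide's point that toggling residency is much cheaper than destroy and create.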

38 Barriers & Hazards ● Most objects stay in one state from creation  Don’t insert redundant barriers ● Always specify the right set of target units  Allows for minimal barrier ● Group barriers into same Barrier call  Will take the worst case of all, rather than potentially incurring multiple sequential barriers
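
Dropping redundant barriers and grouping the rest into one call can be sketched with a small batching helper. `BarrierBatch`, `Transition`, and the `State` values are hypothetical. The `submit` callable stands in for the single API call that takes an array of barriers.

```cpp
#include <cstddef>
#include <vector>

enum class State { RenderTarget, ShaderResource, UnorderedAccess, CopyDest };

struct Transition {
    int resource;  // hypothetical resource handle
    State before;
    State after;
};

class BarrierBatch {
public:
    // Queues a transition; a transition to the current state is dropped.
    void transition(int resource, State before, State after) {
        if (before == after) return;  // redundant barrier
        pending_.push_back({resource, before, after});
    }

    // Issues everything queued so far as one grouped call.
    // Returns the number of barrier calls made (0 or 1).
    template <class Submit>
    std::size_t flush(Submit&& submit) {
        if (pending_.empty()) return 0;
        submit(pending_);
        pending_.clear();
        return 1;
    }

private:
    std::vector<Transition> pending_;
};
```

Flushing once, just before the first draw that needs the new states, lets the driver pay for the worst case in the group rather than for several sequential barriers.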

39 Barriers enhance concurrency ● In D3D11, resources both read and written in a given draw created a dependency between draws  Most common case was a UAV used in adjacent dispatches [Diagram: logical view of Draws 0 to 3 beside the GPU timeline of the same draws; D3D11 Dispatches 0 to 2 separated by a Barrier]

40 Barrier enables overlap ● Explicit barrier eliminates issue  App tells API when a true dependency exists, rather than it being assumed [Diagram: logical view of Dispatches 0 to 2 beside the same dispatches with explicit barrier control]
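
A sketch of how an app might decide where a true dependency exists. `Dispatch` and `barrier_points` are hypothetical, and resources are plain integer ids. The model assumes that dispatches which only write the same UAV touch disjoint regions, which is exactly the kind of app knowledge an implicit model cannot use. If that assumption does not hold, write-after-write must be treated as a hazard too.

```cpp
#include <cstddef>
#include <set>
#include <vector>

struct Dispatch {
    std::set<int> reads;   // resource ids read by this dispatch
    std::set<int> writes;  // resource ids written by this dispatch
};

inline bool overlaps(const std::set<int>& a, const std::set<int>& b) {
    for (int x : a)
        if (b.count(x) != 0) return true;
    return false;
}

// Returns the indices of dispatches that need a barrier placed before them.
inline std::vector<std::size_t> barrier_points(const std::vector<Dispatch>& dispatches) {
    std::vector<std::size_t> points;
    std::set<int> read_since;     // resources read since the last barrier
    std::set<int> written_since;  // resources written since the last barrier
    for (std::size_t i = 0; i < dispatches.size(); ++i) {
        const Dispatch& d = dispatches[i];
        const bool hazard = overlaps(d.reads, written_since) ||  // read after write
                            overlaps(d.writes, read_since);      // write after read
        if (hazard) {
            points.push_back(i);
            read_since.clear();
            written_since.clear();
        }
        read_since.insert(d.reads.begin(), d.reads.end());
        written_since.insert(d.writes.begin(), d.writes.end());
    }
    return points;
}
```

Two dispatches that both write resource 1 can overlap, and only the later dispatch that reads resource 1 gets a barrier in front of it.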

41 CPU side ● D3D12 simplifies the picture  Easier to associate driver effort with application actions  Less likely that driver itself is the bottleneck ● Be aware of your system buses

42 GPU side ● Environment is new  Less familiar without console experience  Interesting new hardware limits are now accessible ● Use the tools

43 Wrap up

44 Get Ready ● D3D12 done right isn’t just an API port  More so when referring to consoles ● Good engine design offers a lot of opportunity ● The power you’ve been asking for is here

45 Questions
