DirectX 12: Improving Performance in Your Game
Bennett Sorbo, Program Manager, Direct3D, Windows Graphics

Agenda
- Overview
- Improving GPU efficiency
- Reducing CPU overhead
- Summary / Next Steps

Overview
- DirectX 12 provides a single API for low-level access to a variety of GPU hardware.
- It enables games to leverage higher-level knowledge to achieve significant performance gains.
- Today, we’ll discuss best practices for specific DirectX 12 features to achieve these gains in your game.

Increasing GPU efficiency

GPU Efficiency
Three key areas for GPU-side gains:
- Explicit resource transitions
- Parallel GPU execution
- GPU-generated workloads

GPU Efficiency: Explicit resource transitions
- Modern GPUs require resources to be in different ‘states’ for different use cases, and need to know when transitions between these states occur.
- In DirectX 12, the app is responsible for identifying when these transitions need to occur.
- Making these transitions explicit makes it clear when operations are expensive...

GPU Efficiency: Explicit resource transitions (cont’d)
...but it also gives games the opportunity to eliminate unnecessary transitions. Two key opportunities:
- First, UAV synchronization is now exposed as an explicit resource barrier. Previously, the driver ensured that all writes to a UAV happened in dispatch order by inserting ‘WaitForIdle’ commands after each dispatch.
(Diagram: Dispatch -> WaitForIdle -> Dispatch)

GPU Efficiency: Explicit resource transitions (cont’d)
- If the app has high-level knowledge that dispatches can run out of order (none depends on another’s UAV writes), the WaitForIdle commands can be removed.
- More importantly, the dispatches can then run in parallel to achieve higher GPU occupancy.
- This is particularly beneficial for large numbers of dispatches with low thread counts.
(Diagram: Dispatch -> WaitForIdle -> Dispatch -> WaitForIdle -> Dispatch)
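
To make the difference concrete, here is a minimal C++ model (not D3D12 code) that counts serialization points for a sequence of dispatches. `DispatchInfo`, `implicitWaitsD3D11`, and `explicitUavBarriersD3D12` are hypothetical names, and the model assumes a dispatch can depend only on the one immediately before it. In real D3D12 code, the app-inserted barrier would be a `D3D12_RESOURCE_BARRIER` of type `D3D12_RESOURCE_BARRIER_TYPE_UAV` passed to `ResourceBarrier`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, not the D3D12 API: one entry per dispatch, in submission order.
struct DispatchInfo {
    bool dependsOnPrevious;  // true if it reads UAV data written by the previous dispatch
};

// D3D11-style: the driver cannot see dependencies, so it inserts a
// WaitForIdle between every pair of consecutive dispatches.
std::size_t implicitWaitsD3D11(const std::vector<DispatchInfo>& dispatches) {
    return dispatches.empty() ? 0 : dispatches.size() - 1;
}

// D3D12-style: the app issues a UAV barrier only where a real dependency
// exists; dispatches between barriers are free to overlap on the GPU.
std::size_t explicitUavBarriersD3D12(const std::vector<DispatchInfo>& dispatches) {
    std::size_t barriers = 0;
    for (std::size_t i = 1; i < dispatches.size(); ++i) {
        if (dispatches[i].dependsOnPrevious) {
            ++barriers;
        }
    }
    return barriers;
}
```

For four dispatches where only the third consumes the second’s output, the D3D11-style model serializes three times, while the explicit model needs a single barrier.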

GPU Efficiency: Explicit resource transitions (cont’d)
Second, the ResourceBarrier API allows the application to perform a transition over a period of time (a ‘split barrier’):
- The app specifies the starting and destination states in ‘begin’ and ‘end’ ResourceBarrier calls.
- The app promises not to use the resource while it is in transition.
- The driver can use this information to eliminate redundant pipeline stalls and cache flushes.

GPU Efficiency: Explicit resource transitions (cont’d)
Each line lists an API call, followed by the hardware command the driver emits for it (if any).

Example rendering scenario (before):
- Draw call that renders to Tex1
- Resource Barrier (Tex1) Render Target -> SRV: driver emits ‘WaitForIdle’ command
- SetDescriptorHeap: driver emits ‘WaitForIdle’ command
- …
- Bind Tex1 as SRV, sample in Draw call

Example rendering scenario (after):
- Draw call that renders to Tex1
- Resource Barrier (Tex1) Render Target -> SRV BEGIN
- SetDescriptorHeap: driver emits ‘WaitForIdle’ command
- …
- Resource Barrier (Tex1) Render Target -> SRV END
- Bind Tex1 as SRV, sample in Draw call

With the split barrier, the stall that SetDescriptorHeap already requires also completes the Tex1 transition, so the barrier no longer needs a ‘WaitForIdle’ of its own.
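
The split-barrier contract can be modeled in a few lines of C++. This is a hypothetical sketch, not D3D12 code: `State`, `SplitBarrierTracker`, and its methods are illustrative names, and in the real API the two halves are transition barriers flagged `D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY` and `D3D12_RESOURCE_BARRIER_FLAG_END_ONLY`. The tracker checks only the app’s side of the promise: the resource is unusable between BEGIN and END, and END must match BEGIN.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical states; loosely mirrors RENDER_TARGET / PIXEL_SHADER_RESOURCE.
enum class State { RenderTarget, ShaderResource };

class SplitBarrierTracker {
public:
    void create(const std::string& name, State initial) {
        entries_[name] = Entry{initial, initial, false};
    }
    // BEGIN half: the resource leaves `before` and is now in transition.
    void begin(const std::string& name, State before, State after) {
        Entry& e = entries_.at(name);
        if (e.inTransition || e.state != before) throw std::logic_error("invalid BEGIN");
        e.pending = after;
        e.inTransition = true;
    }
    // END half: must name the same before/after pair as the matching BEGIN.
    void end(const std::string& name, State before, State after) {
        Entry& e = entries_.at(name);
        if (!e.inTransition || e.state != before || e.pending != after)
            throw std::logic_error("END does not match BEGIN");
        e.state = after;
        e.inTransition = false;
    }
    // The app's promise: no use of the resource while it is in transition.
    bool canUse(const std::string& name, State wanted) const {
        const Entry& e = entries_.at(name);
        return !e.inTransition && e.state == wanted;
    }

private:
    struct Entry {
        State state;
        State pending;
        bool inTransition;
    };
    std::map<std::string, Entry> entries_;
};
```

Everything the app records between `begin` and `end` (such as the SetDescriptorHeap call in the scenario above) is work the driver can overlap with the transition.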

GPU Efficiency: Parallel GPU execution
- Modern hardware can run multiple workloads in parallel on multiple ‘engines’.
- DirectX 12 allows games to target engines explicitly: the developer knows best which operations can happen in parallel and what the dependencies are.
- Three engine types are exposed in DirectX 12: 3D, Compute, and Copy.
- It is up to the app to know and manage dependencies between queues.
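
Dependencies between queues are expressed with fences. The sketch below is a CPU-only stand-in, assuming a fence behaves like a monotonically increasing 64-bit counter; `SimFence` is hypothetical, and real code would call `ID3D12CommandQueue::Signal` on the producing queue and `ID3D12CommandQueue::Wait` on the consuming queue with an `ID3D12Fence`. A `std::thread` plays the role of a second queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a GPU fence: one side signals increasing values,
// the other waits until the completed value reaches what it needs.
class SimFence {
public:
    void signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (value > completed_) completed_ = value;  // this model ignores lower values
        }
        cv_.notify_all();
    }
    void wait(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }
    std::uint64_t completedValue() {
        std::lock_guard<std::mutex> lock(mutex_);
        return completed_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::uint64_t completed_ = 0;
};
```

In use, the "compute queue" thread writes its output and then signals value 1; the "3D queue" waits for 1 before consuming that output, so the dependency is explicit and managed by the app.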

GPU Efficiency: Parallel GPU execution (cont’d)
The copy engine is great for moving data around without blocking or interrupting the main 3D engine. Two notable use cases:
- Texture streaming
- ‘Lazy’ CPU readback
This is especially valuable when transfers go across PCI-E.
Demo

GPU Efficiency: Parallel GPU execution (cont’d)
We’re really excited about compute engine scenarios as well. Two notable use cases:
- Long-running, low-priority compute work
- Tightly interleaved 3D/compute work within a frame
The gain comes from running different types of workloads that stress different parts of the GPU at the same time. Canonical example: compute-heavy dispatches running during shadow map generation.

GPU Efficiency: GPU-generated workloads
ID3D11Asynchronous -> ID3D12QueryHeap
Query heaps generalize query functionality: output can be stored into any buffer on the GPU or in system memory.

```cpp
void ID3D12GraphicsCommandList::ResolveQueryData(
    ID3D12QueryHeap  *pQueryHeap,
    D3D12_QUERY_TYPE  Type,
    UINT              StartElement,
    UINT              ElementCount,
    ID3D12Resource   *pDestinationBuffer,
    UINT64            AlignedDestinationBufferOffset);
```

Two key performance opportunities:
- Binary occlusion
- Batched query ‘resolve’ operations
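
Batched resolves can be planned with a small CPU-side step. Assuming each query’s 64-bit result lands in the destination buffer at `index * 8` bytes, the hypothetical `batchResolves` helper merges runs of consecutive query indices so that each run costs one `ResolveQueryData` call (supplying `StartElement`, `ElementCount`, and `AlignedDestinationBufferOffset`) instead of one call per query. It assumes the input indices are sorted and unique.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Arguments for one hypothetical ResolveQueryData call.
struct ResolveRange {
    std::uint32_t startElement;
    std::uint32_t elementCount;
    std::uint64_t destinationOffset;  // bytes, always a multiple of 8
};

// Occlusion results (binary or not) are 64-bit values.
constexpr std::uint64_t kQueryResultSize = sizeof(std::uint64_t);

// usedSorted: indices of the queries issued this frame, sorted and unique.
std::vector<ResolveRange> batchResolves(const std::vector<std::uint32_t>& usedSorted) {
    std::vector<ResolveRange> ranges;
    for (std::uint32_t index : usedSorted) {
        if (!ranges.empty() &&
            ranges.back().startElement + ranges.back().elementCount == index) {
            ++ranges.back().elementCount;  // extends the current contiguous run
        } else {
            ranges.push_back({index, 1, index * kQueryResultSize});  // starts a new run
        }
    }
    return ranges;
}
```

Keeping the queries for a pass in one contiguous block of the query heap makes this degenerate to a single resolve.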

GPU Efficiency: GPU-generated workloads (cont’d)
Predication has also been generalized:

```cpp
void ID3D12GraphicsCommandList::SetPredication(
    ID3D12Resource       *pBuffer,
    UINT64                AlignedBufferOffset,
    D3D12_PREDICATION_OP  Operation);
```

Predicate on a general buffer (query-derived, CPU-populated, or GPU-populated); this enables new rendering scenarios.
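
Predication boils down to a test on a 64-bit value at the predicate address. In this hypothetical model, `PredicationOp` mirrors `D3D12_PREDICATION_OP_EQUAL_ZERO` and `D3D12_PREDICATION_OP_NOT_EQUAL_ZERO` but is redeclared so the sketch compiles without the Windows SDK. Paired with a resolved binary occlusion result, `EqualZero` skips an object’s draws when its query reported it fully occluded, with no CPU readback.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of D3D12_PREDICATION_OP (not the real enum).
enum class PredicationOp { EqualZero, NotEqualZero };

// Returns true when predication is enabled, i.e. subsequent predicated
// draws/dispatches are skipped, for the given 64-bit predicate value.
bool commandsSkipped(std::uint64_t predicateValue, PredicationOp op) {
    return op == PredicationOp::EqualZero ? predicateValue == 0
                                          : predicateValue != 0;
}
```

The predicate value can come from a query resolve, a CPU write, or a GPU shader; the comparison is the same in every case.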

GPU Efficiency: GPU-generated workloads (cont’d)
ExecuteIndirect: a powerful new API for executing GPU-generated Draw/Dispatch workloads.
- Broad hardware compatibility
- Can vary the following between invocations:
  - Vertex/index buffers
  - Root constants, inline SRV/UAV/CBV descriptors
- Enables new scenarios and dramatic efficiency improvements

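GPU Efficiency: GPU-generated workloads (cont’d)
Demo
ExecuteIndirect is always going to be very efficient, but there are two ways to get the most out of it:
- Set a proper ‘max count’, or just use a CPU-side count.
- Group ExecuteIndirect calls together, and ideally leave space between the generation of the arguments and their consumption, so the GPU does not have to wait for arguments that were written immediately before.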
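
The data flow behind ExecuteIndirect can be approximated on the CPU. In this sketch, `DrawIndexedArgs` redeclares the 20-byte layout of `D3D12_DRAW_INDEXED_ARGUMENTS`, while `Object`, `generateArgs`, and `commandsExecuted` are hypothetical: a culling pass appends one argument record per visible object and produces a count, and when a count buffer is supplied, ExecuteIndirect processes the smaller of that count and `MaxCommandCount`. This is why a tight max count bounds the work.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order and size as D3D12_DRAW_INDEXED_ARGUMENTS.
struct DrawIndexedArgs {
    std::uint32_t indexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startIndexLocation;
    std::int32_t  baseVertexLocation;
    std::uint32_t startInstanceLocation;
};
static_assert(sizeof(DrawIndexedArgs) == 20, "must match the 20-byte argument layout");

// Hypothetical per-object data a GPU culling pass would read.
struct Object {
    bool visible;
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t baseVertex;
};

// CPU stand-in for the culling pass: writes the argument buffer and returns
// the value it would store in the count buffer.
std::uint32_t generateArgs(const std::vector<Object>& objects,
                           std::vector<DrawIndexedArgs>& args) {
    args.clear();
    for (const Object& o : objects) {
        if (o.visible) {
            args.push_back({o.indexCount, 1, o.firstIndex, o.baseVertex, 0});
        }
    }
    return static_cast<std::uint32_t>(args.size());
}

// Number of commands processed when both a max count and a count buffer are given.
std::uint32_t commandsExecuted(std::uint32_t maxCommandCount, std::uint32_t countBufferValue) {
    return countBufferValue < maxCommandCount ? countBufferValue : maxCommandCount;
}
```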

Reducing CPU Overhead

CPU Overhead
Many improvements come just for showing up:
- No high-frequency reference counting
- No hazard tracking
- No state shadowing
Three other opportunities to take advantage of:
- Resource binding
- Multi-threading
- Memory allocation

CPU Overhead: Resource Binding
What’s new:
- Descriptor heap access
- Root signatures
Descriptor heap: actual GPU memory that contains resource access metadata (descriptors).
Root signature: the binding parameters that can be passed to a shader invocation. It can contain:
- Locations in the descriptor heap
- ‘Inline’ descriptors
- Actual constant data

CPU Overhead: Resource Binding (cont’d)
Descriptor heap best practices:
- Do: keep your descriptor heap as static as possible.
- Avoid: frequently changing descriptor heaps.
Root signature best practices:
- Do: keep your root signature small.
- Do: take advantage of inline descriptors/data.
- Avoid: binding unnecessary pipeline stages.
This is an area where you can move the needle on CPU performance, so take advantage of the new flexibility here.
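
‘Keep your root signature small’ has a concrete budget: a root signature holds at most 64 DWORDs; a descriptor table costs 1 DWORD, each 32-bit root constant costs 1 DWORD, and an inline (root) descriptor costs 2 DWORDs, while static samplers do not count. The hypothetical calculator below applies those rules. It also shows the trade-off behind inline descriptors: they use more root space than a table, but skip a descriptor-heap indirection.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a root parameter (not the D3D12 structs).
enum class RootParamKind { DescriptorTable, RootConstants, RootDescriptor };

struct RootParam {
    RootParamKind kind;
    std::uint32_t num32BitValues = 0;  // only meaningful for RootConstants
};

constexpr std::uint32_t kMaxRootSignatureDwords = 64;

std::uint32_t costInDwords(const RootParam& p) {
    switch (p.kind) {
        case RootParamKind::DescriptorTable: return 1;                 // offset into the heap
        case RootParamKind::RootConstants:   return p.num32BitValues;  // data stored inline
        case RootParamKind::RootDescriptor:  return 2;                 // 64-bit GPU address
    }
    return 0;
}

std::uint32_t rootSignatureSize(const std::vector<RootParam>& params) {
    std::uint32_t total = 0;
    for (const RootParam& p : params) total += costInDwords(p);
    return total;
}

bool fitsInRootSignature(const std::vector<RootParam>& params) {
    return rootSignatureSize(params) <= kMaxRootSignatureDwords;
}
```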

CPU Overhead: Multi-threading
- In DirectX 11, the driver created a background thread outside the app’s control.
- In DirectX 12, multi-threading is app-controlled and a first-class citizen via ID3D12CommandList.
- It’s not just command lists: you can also create PSOs and buffers/textures on background threads.
- Recommendation: if your workload is serial, create your own background submission thread.
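
A background submission thread for a serial workload can be sketched with standard C++ threading. `SubmissionThread` and `RecordedWork` are hypothetical: each closure stands in for ‘submit these recorded command lists’, which real code would implement with a call to `ID3D12CommandQueue::ExecuteCommandLists` on the worker thread. Work runs in FIFO order, and the destructor drains the queue before joining.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a batch of recorded command lists.
using RecordedWork = std::function<void()>;

class SubmissionThread {
public:
    SubmissionThread() : worker_([this] { run(); }) {}
    ~SubmissionThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // run() drains remaining work before returning
    }
    // Called by the game thread; never blocks on the submission itself.
    void submit(RecordedWork work) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(work));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stopping and fully drained
            RecordedWork work = std::move(pending_.front());
            pending_.pop();
            lock.unlock();
            work();  // "submit" outside the lock, in FIFO order
            lock.lock();
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<RecordedWork> pending_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```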

CPU Overhead: Resource allocation
- In DirectX 11, the driver managed versioning and sub-allocation behind the app’s back.
- DirectX 12 provides tools such as fences, resource placement, and persistently mapped resources to put apps in charge.
Recommendations:
- Use an appropriate number of fences.
- Expire resources based on engine knowledge.
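
Expiring resources with fences typically means tagging each retired resource with the fence value signaled after its last use, then releasing it once the GPU’s completed value reaches that tag. The hypothetical `DeferredReleaseQueue` below sketches this; strings stand in for resources, and the completed value would come from `ID3D12Fence::GetCompletedValue`. A single fence signaled once per frame is enough for this scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

class DeferredReleaseQueue {
public:
    // lastUseFence: fence value signaled after the GPU's last use of the resource.
    void retire(std::string resource, std::uint64_t lastUseFence) {
        assert(retired_.empty() || retired_.back().fence <= lastUseFence);  // monotonic tags
        retired_.push_back({std::move(resource), lastUseFence});
    }
    // Releases every resource whose tag the GPU has reached; call once per frame.
    std::vector<std::string> collect(std::uint64_t completedFence) {
        std::vector<std::string> released;
        while (!retired_.empty() && retired_.front().fence <= completedFence) {
            released.push_back(std::move(retired_.front().resource));
            retired_.pop_front();
        }
        return released;
    }
    std::size_t pending() const { return retired_.size(); }

private:
    struct Entry {
        std::string resource;
        std::uint64_t fence;
    };
    std::deque<Entry> retired_;
};
```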

Ashes of the Singularity case study
Dan Baker, Graphics Architect, Oxide Games

Resource Binding in Nitrous
Nitrous was designed from the start to map to hardware binding models. Three key engine design points:
- Textures are pre-grouped in the descriptor heap
- Bindings are shared across shader stages, so there are fewer bind calls
- Built around static samplers
Findings:
- It is easy to stay within one descriptor heap per frame
- It is important to avoid redundant state sets
- Optional use of root CBVs can provide a win
Result: resource binding overhead is a fraction of what it is on D3D11.
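
Avoiding redundant state sets can be as simple as a per-slot cache in front of the binding calls. `RootBindingCache` is hypothetical: it counts forwarded calls instead of calling a real method such as `SetGraphicsRootDescriptorTable`, and the 16-slot size is an arbitrary assumption. `invalidate` exists because changing the root signature on a command list makes previously set root arguments stale.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

class RootBindingCache {
public:
    static constexpr std::size_t kSlots = 16;  // assumed slot count for the sketch

    // Returns true if the set had to be issued, false if it was redundant.
    bool set(std::size_t slot, std::uint64_t handle) {
        assert(slot < kSlots);
        if (valid_[slot] && current_[slot] == handle) return false;
        current_[slot] = handle;
        valid_[slot] = true;
        ++issued_;
        return true;
    }
    // Call after anything that makes bindings stale, e.g. a root signature change.
    void invalidate() { valid_.fill(false); }
    std::uint32_t issuedCount() const { return issued_; }

private:
    std::array<std::uint64_t, kSlots> current_{};
    std::array<bool, kSlots> valid_{};
    std::uint32_t issued_ = 0;
};
```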

Resource Management in Nitrous
Nitrous also benefits from more explicit resource management. Two classes of resources:
- Formally tracked, persistent resources
- Temporary, frame-specific resources
Frame-specific resources are linearly allocated out of a heap with no resource tracking, for minimal overhead.
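
Frame-specific linear allocation can be sketched as a bump allocator. `FrameLinearAllocator` is hypothetical and returns byte offsets into one heap rather than real `ID3D12Resource` objects. Alignment is a parameter because D3D12 constant buffer data must be placed on 256-byte boundaries. Resetting the whole allocator at once replaces per-resource tracking, but is only safe after a fence confirms the GPU has finished the frame.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

class FrameLinearAllocator {
public:
    explicit FrameLinearAllocator(std::uint64_t capacity) : capacity_(capacity) {}

    // alignment must be a power of two (e.g. 256 for constant buffer data).
    std::optional<std::uint64_t> allocate(std::uint64_t size, std::uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        std::uint64_t aligned = (offset_ + alignment - 1) & ~(alignment - 1);
        if (aligned + size > capacity_) return std::nullopt;  // out of space this frame
        offset_ = aligned + size;
        return aligned;  // byte offset inside the heap
    }
    // Frees everything at once; only safe once the GPU is done with this frame.
    void reset() { offset_ = 0; }
    std::uint64_t used() const { return offset_; }

private:
    std::uint64_t capacity_;
    std::uint64_t offset_ = 0;
};
```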

Demo

Conclusion
- Many opportunities with DirectX 12 to achieve dramatic performance improvements in your game
- Get started today! Enroll in the Early Access program to receive the latest SDK, DirectX 12 drivers, documentation, …
- Check out Channel 9 for previous DirectX 12 talks
Q&A

© 2015 Microsoft Corporation. All rights reserved. Microsoft, Xbox, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Backup: WDDM 2.0 residency management provides flexibility and performance.


Download ppt "Improving Performance in Your Game DirectX 12: Bennett Sorbo Program Manager Direct3D, Windows Graphics."

Similar presentations


Ads by Google