Windows Display Driver Model (WDDM) v2 And Beyond Steve Pronovost, Microsoft Henry Moreton, NVIDIA Tim Kelley, ATI.

1 Windows Display Driver Model (WDDM) v2 And Beyond Steve Pronovost, Microsoft Henry Moreton, NVIDIA Tim Kelley, ATI

2 Outline Introduction Trends in use of GPU(s) WDDM v1.0 overview WDDM v2.x overview Scenarios that benefit

3 Trends In Use Of GPU Windows XP: Single client at a time GDI desktop Video decoding Full screen game CAD/Workstation applications GPUs getting more flexible Direct3D pushing increased programmability, precision and performance Massive processing power, not fully utilized today

4 Trends In Use Of GPU Windows Vista: Multiple clients together Desktop window manager WinFX APIs based on Direct3D 9 Picture, video playback, capture, encode, transcode, edit leverage GPUs In-box games Emerging General-Purpose GPU (GPGPU) trend Physics, image processing, etc.

5 WDDM v1.0 Designed to work on existing GPUs Increase stability, robustness and security GPU scheduling Virtualized video memory Resource virtualization seamless across legacy API Ddraw, dx3, dx5, dx6, dx7, dx8, dx9, OGL Use new API to take full advantage of resource virtualization Direct3D 9Ex, Direct3D 10

6 WDDM v2.0 New generation of GPUs designed for multi-tasking Mid command buffer preemption Demand faulting of resources Surface fault (preferred mode for v2.0) Page fault (stall the GPU) Per process page tables Better multi-tasking than WDDM v1.0, still some client cooperation required

7 WDDM v2.1 Everything WDDM v2.0 GPU can do Fine grained context switching Can preempt mid pixel Doesn’t stall GPU on page fault True preemptive multi-tasking Ultimate flexibility for the GPU GPU can be used for any scenarios without impact on the desktop

8 WDDM Cheat Sheet
                    WDDM v1.0             WDDM v2.0               WDDM v2.1
Scheduling          Packet                RunList                 RunList
Preemption          Packet                Mid Packet              Mid Pixel
Demand faulting     Not supported         Surface/Page (STALL)    Page
Memory Management   Physical/Contiguous   Virtual/Page table      Virtual/
Multi-tasking       Cooperative           Mostly Preemptive       Truly Preemptive

9 WDDM 2.x Scheduling, Performance And Multi-GPU Support Henry Moreton NVIDIA

10 GPUs On The Desktop The power of the GPU is finally tapped Graphics Video Bandwidth and floating point (GPGPU) Applications are vying for this powerful resource The Vista Desktop Window Manager (DWM) Photo editing Video feeds Personal Video Recorder

11 GPU Management Is Crucial Applications naturally see the processor as their own Great GPU tasks really exploit the power But... Some GPU operations are so massive they take non-trivial time Some GPU operations are time sensitive Management of the GPU is crucial to success (a happy user)

12 A Typical Situation (For Me) Watching The Daily Show © Doodling with photos I find a great program for creating panoramas... Today I set it up with twelve 6-megapixel images Press go and wait... a long time (minutes) Soon, with GPU acceleration, I press go and wait a second or two

13 But A Second Or Two Is A Long Time Managed as a shared resource, the GPU: renders my video unaffected; builds my panorama in no time... Unmanaged: The Daily Show risks being a slide show...

14 So Scheduling Is Important How does scheduling vary across WDDM v1.0 WDDM v2.0 WDDM v2.1 What are the mechanics? What is the context switch behavior? What is expected performance? With varying numbers of active contexts...

15 WDDM v2.x – The Care And Feeding Of The GPU User Mode Driver (UMD) Creates DMA buffer of commands Kernel Mode Driver (KMD) Appends DMA buffer to GPU context’s queue The GPU Scheduler schedules contexts A Run List of contexts each with its own ring buffer of DMA buffers

16 Run Lists List of contexts (box) GPU processes a context until Context is completed (get new run list) Scheduler pre-empts Page fault – WDDM v2.1 Protection fault Synchronization event Multiple contexts per Run List Hide latency
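
The run list behavior on slides 15 and 16 can be sketched as a toy simulation. This is only an illustration: the `Context` class, the `run` function, and the fixed time quantum are hypothetical names and policies, not part of WDDM; the slides say that the scheduler pre-empts but do not state how it decides when.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical GPU context: a name plus a ring of pending DMA buffers."""
    name: str
    ring: deque = field(default_factory=deque)  # entries are (label, cost in time units)


def run(run_list, quantum):
    """Process contexts in turn; preempt a context once it has used `quantum` units.

    The fixed quantum is an assumption made for this sketch.
    """
    trace = []
    pending = deque(run_list)
    while pending:
        ctx = pending.popleft()
        budget = quantum
        while ctx.ring and budget > 0:
            label, cost = ctx.ring.popleft()
            used = min(cost, budget)
            budget -= used
            trace.append((ctx.name, label, used))
            if used < cost:
                # Mid-buffer preemption: the unfinished remainder stays at the head.
                ctx.ring.appendleft((label, cost - used))
        if ctx.ring:
            pending.append(ctx)  # preempted with work left, goes back on the list
    return trace


run_list = [
    Context("panorama", deque([("stitch", 5)])),
    Context("video", deque([("frame0", 1), ("frame1", 1)])),
]
trace = run(run_list, quantum=2)
for step in trace:
    print(step)
```

In this sketch the long panorama buffer is interrupted after two time units, so the video context gets its frames out before the panorama finishes.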

17 How Nimble Is Context Switching? XP All Q’d DP2 buffers must complete (very coarse) WDDM v1.0 – Basic scheduling Current DMA buffer must complete (coarse) WDDM v2.0 Switch on command/triangle (fine) WDDM v2.1 Switch “immediately” (very fine)

18 Context Switch Guarantees Pre WDDM v2.1 (XP, v1.0, v2.0): No guarantee; a VERY long shader or VERY large triangle is slow to switch. Expected performance: relatively coarse switching for XP and v1.0; v2.0: good average/typical switch time. WDDM v2.1: Guaranteed to context switch; same average/typical switch time as v2.0; much better switch time on applications with long shaders

19 Context Switch Challenge Because GPUs are heavily threaded there is much more state than on a CPU Consider rendering @ 60 fps 17 millisecond frame time With a context switch time of 100µs Three concurrent applications see a ~2% context switch overhead Fast GPU context switching is important and challenging!
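
The ~2% figure on slide 19 can be checked with a few lines of arithmetic. The function name and the assumption of one context switch per application per frame are illustrative choices for this sketch; the slide gives only the frame rate, the switch time, and the number of applications.

```python
def context_switch_overhead(fps, switch_us, contexts):
    """Fraction of each frame spent switching, assuming one switch per context per frame."""
    frame_us = 1_000_000 / fps  # 60 fps gives roughly 16,667 microseconds per frame
    return contexts * switch_us / frame_us


overhead = context_switch_overhead(fps=60, switch_us=100, contexts=3)
print(f"{overhead:.1%}")
```

Three switches of 100 µs each take 300 µs out of a frame of about 16.7 ms, which is 1.8% and rounds to the slide's ~2%.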

20 WDDM v2.x Efficiencies WDDM v1.0 User Mode Driver (UMD) creates GPU-specific command buffer KMD patches addresses Copies to GPU visible DMA buffer WDDM v2.0 and 2.1 UMD creates DMA buffer directly in GPU memory No copy, no patch, fast and efficient
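
The difference between the two submission paths on slide 20 can be sketched as follows. This is a simplified model, not driver code: `patch_v1`, `translate_v2`, the command tuple layout, and the 4 KiB page size are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, not stated in the slides


def patch_v1(commands, physical_base):
    """v1.0 style: copy the command buffer and patch each allocation
    reference with a physical address before the GPU sees it."""
    return [(op, physical_base[handle] + offset) for op, handle, offset in commands]


def translate_v2(virtual_address, page_table):
    """v2.x style: commands keep virtual addresses; translation through the
    per-process page table happens when the GPU accesses memory."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


commands = [("draw", "texture", 16)]
patched = patch_v1(commands, {"texture": 0x8000})
physical = translate_v2(0x2010, {2: 9})
print(patched, hex(physical))
```

In the first path the buffer must be rewritten whenever an allocation moves; in the second only the page table changes, which is why the slide can say "no copy, no patch".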

21 Performance – Memory Footprint WDDM v1.0: No demand fault (page or surface); entire surfaces resident – coarse grained; OS must guarantee residence – CPU overhead. WDDM v2.0: Surface fault – supports load on bind (GPU switches to new context, no stalling); fault and stall – permits partial eviction (GPU stalls waiting for missing page). WDDM v2.1: Page fault – permits partial eviction/residence (GPU switches to new context, no stalling)

22 Multi-Engine, Multi-GPU Support GPUs are composed of nodes of engines Homogeneous nodes 3D nodes Video nodes Copy, etc. RunList per engine GPU Device-common address space Multiple GPU Contexts (per engine) Synchronization Fence, Trap, Wait, Signal [Figure labels: GPU, 3D, video]

23 Multi-GPU Linked Adapter Single logical adapter Multiple physical adapters Memory Mirrored or instanced Broadcast – multiple DMA buffer references Split Frame Rendering
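
As a rough illustration of Split Frame Rendering across a linked adapter, the sketch below divides a frame's scanlines into one horizontal band per physical adapter. The even split and the `split_frame` name are assumptions; the slide names the technique but does not describe how the frame is divided.

```python
def split_frame(height, adapters):
    """Return one (first_row, end_row) band per adapter, end_row exclusive.

    Rows are divided as evenly as possible; this policy is an assumption.
    """
    base, extra = divmod(height, adapters)
    bands = []
    start = 0
    for index in range(adapters):
        rows = base + (1 if index < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


print(split_frame(1080, 2))
```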

24 WDDM v2.x Memory Management And Robustness Tim Kelley ATI

25 WDDM v1.0 Surface Mgmt All allocations (surfaces) referenced in DMA buffer must be resident at GPU submit Driver tracks every allocation reference in the DMA buffer Contiguous memory for each allocation DMA buffers patched with physical addresses once surfaces are resident Driver defines DMA split points to identify minimal working set Significant risk of graphics memory thrashing

26 WDDM v2.0 Surface Faulting A step in the right direction GPU supports per process virtual memory Two faulting behaviors Surface fault and context switch Page fault and stall In surface faulting, GPU probes first page of surface On probe of non-resident surface GPU faults GPU context switches to next run list entry Context switch is coarse grained; graphics pipeline drains OS VidMm issues paging requests

27 WDDM v2.0 Page Fault And Stall Even if surface probe succeeds, entire surface may not be resident GPU must still support page faulting On access to a non-resident page GPU faults and stalls Driver informs OS of missing pages OS VidMm issues paging requests Driver restarts GPU once pages are resident Entire working set doesn’t have to be resident simultaneously

28 WDDM v2.1 Page Faulting Finally, full fledged page faulting with context switching! GPUs support general page faulting and virtual memory per process On a page fault, GPU context switches to next run list entry Context switch is “immediate” OS can partially populate allocations to reduce an app’s working set GPU faults on non-resident page access GPU context switches to next run list entry
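
The fault behaviors described on slides 26 to 28 can be summarized in a small decision sketch. The function names and the returned strings are hypothetical and only restate what the slides say; real hardware and VidMm interaction are far more involved.

```python
def probe_surface_v2_0(surface_pages, resident):
    """v2.0 surface faulting: the GPU probes only the first page of the surface."""
    if surface_pages[0] in resident:
        return "ok"
    return "surface fault: switch to next run list entry"


def access_page(model, page, resident):
    """What the GPU does when a context touches `page` under each model."""
    if page in resident:
        return "ok"
    if model == "v2.0":
        return "page fault: stall until resident"
    if model == "v2.1":
        return "page fault: switch to next run list entry"
    raise ValueError("v1.0 has no demand faulting; surfaces must be resident at submit")


resident = {10, 11}
surface = [10, 11, 12]
print(probe_surface_v2_0(surface, resident))  # probe passes: first page is resident
print(access_page("v2.0", 12, resident))      # but a later page can still fault
print(access_page("v2.1", 12, resident))
```

The example shows the case slide 27 warns about: the probe succeeds because the first page is resident, yet a later page is missing, so a v2.0 GPU stalls where a v2.1 GPU moves on to other work.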

29 Dedicated Paging Engine Addition of high bandwidth copy engine for paging Operates in parallel to 3D engine GPU can perform paging operations for one context in parallel with 3D rendering for another context

30 Paging Determination GPU reports faulting address GPU/Driver determine set of pages needed to make further progress GPU maintains a set of page access bits OS VidMm uses the above to determine appropriate paging operations (including evictions) Additionally, OS uses heuristics to preload pages
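
One way the inputs listed on slide 30 could feed a paging decision is sketched below. The policy (evict pages whose access bit is clear before pages that were recently touched) and the `plan_paging` name are assumptions for illustration; the slides do not describe the actual VidMm heuristics.

```python
def plan_paging(needed, resident, accessed, capacity):
    """Return (pages_to_evict, pages_to_load) for a faulting context.

    needed:   pages the GPU/driver report as required to make progress
    resident: set of pages currently in graphics memory
    accessed: set of resident pages whose access bit is set
    capacity: total number of pages that fit in graphics memory
    """
    to_load = [page for page in needed if page not in resident]
    free = capacity - len(resident)
    shortfall = max(0, len(to_load) - free)
    # Assumed policy: pages with a clear access bit are evicted first.
    cold = sorted(p for p in resident if p not in needed and p not in accessed)
    warm = sorted(p for p in resident if p not in needed and p in accessed)
    candidates = cold + warm
    if shortfall > len(candidates):
        raise MemoryError("working set does not fit in graphics memory")
    return candidates[:shortfall], to_load


to_evict, to_load = plan_paging(needed=[5, 6], resident={1, 2, 3}, accessed={1}, capacity=4)
print(to_evict, to_load)
```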

31 Efficient Memory Management Steady state residency of surface data for applications No texture thrashing for apps whose working set fits into graphics memory No need for entire surface to be resident Apps with large surfaces run fast in smaller local memory if working set fits Page access info guides VidMm eviction and promotion Reduced minimum physical memory requirements

32 WDDM v2.x Robustness WDDM v2.x increases OS robustness GPU uses virtual addressing instead of physical Kernel mode driver (KMD) no longer patches DMA buffers with physical addresses User Mode Driver (UMD) builds DMA buffer KMD no longer validates command buffer KMD no longer copies cmd buffer to DMA buffer No DMA buffer splitting UMD no longer identifies split points OS no longer splits DMA buffers to fit resources

33 WDDM v2.1 Robustness Guaranteed sub-triangle context switching Driver processing on fault essentially eliminated No application can hog GPU Better application responsiveness Applications with arbitrarily complex GPU processing do not hinder other applications E.g., Complex GPGPU number crunching alongside glitch free video

34 Security Per-process virtual memory Protection moved to GPU Patching eliminated from driver Privileged Operations Privileged memory More secure platform for future premium content protection

35 Privileged Operations DMA buffers created in user mode cannot compromise the system Can’t access memory belonging to other processes Can’t interfere with correct and robust operation Certain GPU operations are privileged and only available to KMD-built DMA buffers; Examples include Display settings GPU configuration Context switching controls UMD-created DMA buffers cannot perform privileged operations

36 Privileged Memory Provides secure location for page tables, ring buffers, and other allocations that should be protected Malicious apps cannot compromise system security GPU maintains per-page privilege setting (in page table) Fault occurs on GPU access to privileged memory from limited DMA buffers constructed by UMD GPU access allowed for privileged DMA buffers constructed by KMD [Figure labels: Page Table, Bad DMA Buffer, V2.1 GPU, Process, Ring Buffer]
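
The per-page privilege check on slide 36 can be modeled in a few lines. `PageEntry`, `ProtectionFault`, and `gpu_access` are hypothetical names for this sketch; the slide states only that the page table carries a per-page privilege setting and that access from UMD-built DMA buffers to privileged memory faults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageEntry:
    """Hypothetical page table entry with a per-page privilege setting."""
    frame: int
    privileged: bool = False


class ProtectionFault(Exception):
    """Raised when a limited (UMD-built) DMA buffer touches privileged memory."""


def gpu_access(page_table, page, buffer_is_privileged):
    """Return the physical frame for `page`, or fault if the access is not allowed."""
    entry = page_table[page]
    if entry.privileged and not buffer_is_privileged:
        raise ProtectionFault(f"limited DMA buffer accessed privileged page {page}")
    return entry.frame


page_table = {
    0: PageEntry(frame=40),                   # ordinary surface data
    1: PageEntry(frame=41, privileged=True),  # e.g. a ring buffer or page table page
}

print(gpu_access(page_table, 0, buffer_is_privileged=False))  # UMD buffer, normal page
print(gpu_access(page_table, 1, buffer_is_privileged=True))   # KMD buffer, privileged page
try:
    gpu_access(page_table, 1, buffer_is_privileged=False)     # UMD buffer, privileged page
except ProtectionFault as fault:
    print("fault:", fault)
```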

37 WDDM Future And Conclusion Steve Pronovost Microsoft

38 Future: WDDM 3.x All the features of WDDM v2.1 Better support for content streaming Virtual machine support

39 Call To Action Invest in WDDM v2.x GPU Find new interesting ways to use the GPU

40 Questions Or Feedback? Send e-mail to DirectX @

41 © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

