
1 DirectX® And Streaming Video Drivers Jeff Noyle, Development Lead Gary Sullivan, Software Design Engineer William Messmer, Software Design Engineer Eric Rudolph, Software Design Engineer Microsoft Corporation

2

3 Speakers “DirectX Graphics Drivers,” Jeff Noyle, Lead Developer, DirectDraw®/Direct3D®, Microsoft Corporation “DirectX VA Video Acceleration Drivers,” Gary Sullivan, Software Design Engineer, DMD Video Services Group, Microsoft Corporation “Writing AVStream Minidrivers for Windows® XP,” William Messmer, Software Design Engineer, Digital Audio-Video, Microsoft Corporation “Testing Your WDM Driver with DirectShow®,” Eric Rudolph, SDE, DirectShow Editing Services, Microsoft

4 DirectX Graphics Drivers Jeff Noyle Development Lead DirectDraw/Direct3D Microsoft Corporation

5 Prerequisites
I’m assuming basic familiarity with DirectDraw and Direct3D concepts: system architecture, surfaces, page flipping. The DDK can be hard to read.

6 Agenda Single-source issues Windows 9x issues OS-independent issues
DirectX 7.0 implementation details Changes in DirectX 8.0 What can you do next?

7 Single-Source Issues Stuff you should know if you want one code-base to support Windows 9x OS versions and Windows NT® OS versions

8 Allocating System Memory Per-Surface
(Do NOT use this process to allocate surface memory itself...See later) Normally system memory is charged against a particular process Can’t free it in some other process (as in ctrl-alt-del mechanism) Use EngAllocPrivateUserMem and EngFreePrivateUserMem Uses DirectDraw object to locate proper process context

9 YUV/FOURCC Surfaces System memory YUV/FOURCC surfaces on NT systems
DirectDraw Kernel-mode “pretends” that these surfaces are 8bpp RGB for the purposes of allocating memory DXTn: Height: height in 4x4 blocks Width: width in blocks * sizeof(block) You must undo these transformations at CreateSurface time
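The DXTn transformation above is easy to get backwards, so here is a minimal sketch of it and its inverse. It assumes the usual block sizes (8 bytes per 4x4 block for DXT1, 16 bytes for DXT2 through DXT5) and rounds partial blocks up. The type and function names are hypothetical helpers, not DDK definitions.

```c
#include <stdint.h>

/* Hypothetical helper type; not a DDK structure. */
typedef struct {
    uint32_t width;
    uint32_t height;
} SketchSize;

/* Bytes per 4x4 block: 8 for DXT1, 16 for DXT2 through DXT5. */
static uint32_t sketch_dxt_block_bytes(int dxt_n)
{
    return (dxt_n == 1) ? 8u : 16u;
}

/* Forward transform: the "8bpp" dimensions used for allocation.
   Partial blocks are rounded up (an assumption of this sketch). */
static SketchSize sketch_dxt_to_pretend(SketchSize texels, int dxt_n)
{
    SketchSize p;
    p.width  = ((texels.width + 3u) / 4u) * sketch_dxt_block_bytes(dxt_n);
    p.height = (texels.height + 3u) / 4u;
    return p;
}

/* Inverse transform, as a driver would apply at CreateSurface time.
   The result is in texels, rounded up to whole blocks. */
static SketchSize sketch_pretend_to_dxt(SketchSize pretend, int dxt_n)
{
    SketchSize t;
    t.width  = (pretend.width / sketch_dxt_block_bytes(dxt_n)) * 4u;
    t.height = pretend.height * 4u;
    return t;
}
```

For a 256x128 DXT1 texture this gives a pretend surface of 512x32, and the inverse recovers 256x128.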

10 YUV/FOURCC Surfaces NT kernel mode doesn’t understand any FOURCC formats, so: The driver must handle video memory allocation for these types The driver must handle Lock for these types

11 Windows 2000 Issue (Fixed In Windows XP)
During allocation of an AGP surface... If the driver fails to allocate and: returns DDHAL_DRIVER_HANDLED AND sets an error code in ddRVal AND sets the surface’s lpVidMemHeap to non-zero Then the system will ignore the error So NULL the lpVidMemHeap on error!
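A minimal sketch of the failure path described above. The field names ddRVal and lpVidMemHeap follow the slide; the structures, constants and function are hypothetical stand-ins so the example compiles without the DDK, and a real driver would use the DDK's own definitions.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles on its own; values are hypothetical. */
#define SKETCH_DRIVER_HANDLED  1UL
#define SKETCH_OK              0L
#define SKETCH_ERR_OUT_OF_AGP  (-1L)

typedef struct {
    void *lpVidMemHeap;   /* heap the surface was allocated from */
} SketchSurface;

typedef struct {
    SketchSurface *surface;
    long           ddRVal;   /* result code reported to DirectDraw */
} SketchCreateSurfaceData;

/* alloc_ok simulates whether the driver's own AGP allocation succeeded. */
static unsigned long sketch_create_agp_surface(SketchCreateSurfaceData *data,
                                               void *agp_heap, int alloc_ok)
{
    data->surface->lpVidMemHeap = agp_heap;   /* heap chosen up front */
    if (!alloc_ok) {
        data->ddRVal = SKETCH_ERR_OUT_OF_AGP;
        /* Without this line, Windows 2000 ignores the error in ddRVal. */
        data->surface->lpVidMemHeap = NULL;
        return SKETCH_DRIVER_HANDLED;
    }
    data->ddRVal = SKETCH_OK;
    return SKETCH_DRIVER_HANDLED;
}
```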

12 Atomic Surface Creation
On Windows 9x, drivers are given a list of surfaces. On Windows NT, drivers are given surfaces one at a time, unless the driver reports GUID_NTPrivateDriverCaps and sets DDHAL_PRIVATECAP_ATOMICSURFACECREATION

13 Windows NT Extra You can use the GUID_NTPrivateDriverCaps to request notification of primary surface creation: Set DDHAL_PRIVATECAP_NOTIFYPRIMARYCREATION

14 Windows 9x Issues

15 System-To-Video Blts To speed up some titles, implement system-to-video blts All you need to implement is SRCCOPY, no stretch But you should implement sub-rects DirectDraw assumes your driver requires system memory to be pagelocked during Blt If this is not true, set DDCAPS2_NOPAGELOCKREQUIRED

16 OS-Independent Issues

17 HeapVidmemAllocAligned
It’s an “Eng” function in Windows NT versions It’s a ddraw.dll export in Windows 9x You can use this to allocate surface memory You must have passed the heap to DirectDraw previously You must fill in the fpHeapOffset, fpVidmem and lpVidmemHeap of the surface

18 Heap Offsets Explained
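Return values from HeapVidmemAllocAligned are offsets into the heap. [Diagram: the heap drawn from “0” and fpStart up to fpEnd (which points TO the last byte), with a surface inside it. The start of the surface is labeled “Return value from HVMAA and fpHeapOffset”. Note: fpStart is set to 0x1000 by DirectDraw for AGP heaps.]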

19 DDSCAPS_VIDEOMEMORY Remember that this includes AGP unless combined with DDSCAPS_LOCALVIDMEM At GetAvailDriverMem time, a request that specifies DDSCAPS_VIDEOMEMORY (and not any explicit type: local or non-local) should include both types in the total
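The totalling rule above can be sketched as a small function. The caps bit values are written from memory of ddraw.h and should be treated as stand-ins; the function name and its parameters (free byte counts for each memory type) are hypothetical.

```c
#include <stdint.h>

/* Stand-in values; take the real definitions from ddraw.h. */
#define DDSCAPS_VIDEOMEMORY    0x00004000u
#define DDSCAPS_LOCALVIDMEM    0x10000000u
#define DDSCAPS_NONLOCALVIDMEM 0x20000000u

/* Returns the free-memory total a driver would report for the given caps. */
static uint32_t sketch_avail_vidmem(uint32_t caps,
                                    uint32_t local_free,
                                    uint32_t nonlocal_free)
{
    int want_local    = (caps & DDSCAPS_LOCALVIDMEM) != 0;
    int want_nonlocal = (caps & DDSCAPS_NONLOCALVIDMEM) != 0;

    if (!(caps & DDSCAPS_VIDEOMEMORY))
        return 0;                        /* sketch covers video memory requests only */
    if (!want_local && !want_nonlocal)   /* no explicit type: count both */
        want_local = want_nonlocal = 1;

    return (want_local ? local_free : 0u) + (want_nonlocal ? nonlocal_free : 0u);
}
```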

20 GetScanLine Implement this, if you can!
DirectX 8.0 uses it a lot for presentation-Blt timing Set DDCAPS_READSCANLINE, so DirectX 8.0 knows

21 CreateSurfaceEx More on this later
NEVER fail CreateSurfaceEx for system memory surfaces, even if you don’t understand the pixel format Just return DDHAL_DRIVER_HANDLED and DD_OK (Otherwise new system-memory formats used by the reference rasterizer can’t be created)

22 Alpha-In-The-Primary
If your driver can do this in 32bpp: Create an A8R8G8B8 render target. Blt that to the primary surface IGNORING the alpha channel (and stretch/shrink, please). Then you should set: DDHALINFO.vmiData.ddpfDisplay.dwFlags |= DDPF_ALPHAPIXELS and DDHALINFO.vmiData.ddpfDisplay.dwRGBAlphaBitMask = 0xFF000000

23 Windowed Applications And Blt Queuing
Don’t allow “many” presentation-blts in your queue That is, don’t allow a large latency between scheduling and retiring a presentation-blt WHQL enforces low latency for DirectX 8.0 drivers Check DDBLT_PRESENTATION, and don’t allow more than three More info in ddraw.h
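One way to keep presentation-blt latency bounded is a simple counter of blts that have been scheduled but not yet retired. The sketch below assumes that approach; the queue structure and function names are hypothetical, the flag value is a stand-in for the one in ddraw.h, and how a real driver reports "try again later" to the runtime is described in ddraw.h rather than here.

```c
/* Stand-in for the ddraw.h flag; use the header's definition in a real driver. */
#define DDBLT_PRESENTATION 0x10000000UL

#define SKETCH_MAX_PENDING_PRESENTS 3u

typedef struct {
    unsigned pending_presents;   /* presentation blts scheduled but not yet retired */
} SketchBltQueue;

/* Returns 1 if the blt was queued, 0 if the caller should be told to retry later. */
static int sketch_queue_blt(SketchBltQueue *q, unsigned long flags)
{
    if (flags & DDBLT_PRESENTATION) {
        if (q->pending_presents >= SKETCH_MAX_PENDING_PRESENTS)
            return 0;
        q->pending_presents++;
    }
    return 1;
}

/* Called when the hardware retires a presentation blt. */
static void sketch_retire_present(SketchBltQueue *q)
{
    if (q->pending_presents > 0)
        q->pending_presents--;
}
```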

24 DDBLT_WAIT And DDBLT_DONOTWAIT
Drivers should never look at these. They are set by the application/DirectDraw runtime. They are handled by the DirectDraw runtime. Sometimes DirectDraw spins, and wants to do that in user-mode. Applies to DDFLIP_WAIT as well

25 DDBLT_ASYNC Ignore this flag
Always perform your blts asynchronously, if possible

26 What Are DDROPS? We don’t know either
An idea of the original designer of DirectDraw, but never implemented or specified In short: ignore!

27 Blt And YUV Surfaces DirectShow can gain performance benefits if it knows it can use Blt to copy Overlay surfaces Check to see if you can support DDCAPS2_COPYFOURCC This means you can SRCCOPY, no sub-rects, no stretch, no overlap between two FOURCC surfaces of the same type

28 Update Overlay, Etc. If multiple overlays are created, but you have hardware for only one: Succeed all CreateSurface calls Fail the UpdateOverlay call

29 Flip Flags DDFLIP_NOVSYNC
This means: flip immediately; do not wait for vertical blank The hardware must be capable of re-latching the new primary surface address immediately, or at least on the next scanline In other words, don’t allow the remaining raster scans to read from the old back buffer

30 Flip Flags DDFLIP_INTERVALn
Please don’t implement by busy-waiting in the driver But please do implement if your hardware can defer flips for n frames

31 Gamma Ramps DirectDraw and Direct3D’s gamma ramps are passed through the GDI DDI call SetDeviceGammaRamp. This call is poorly prototyped. This is the struct you will be passed:
    struct {
        WORD red[256];   // WORDs not BYTEs
        WORD green[256];
        WORD blue[256];
    };
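To illustrate why the WORD-versus-BYTE distinction matters, here is a small sketch that fills an identity ramp in the 16-bit layout above and reduces an entry to 8 bits for a hypothetical 8-bit DAC. The struct name, the helper names and the "take the high byte" reduction are assumptions of this example, not something the slide specifies.

```c
#include <stdint.h>

typedef uint16_t WORD;

/* Same layout as the struct on the slide; the type name is hypothetical. */
typedef struct {
    WORD red[256];
    WORD green[256];
    WORD blue[256];
} SketchGammaRamp;

/* Identity ramp: entry i maps 0..255 onto the full 0..65535 range. */
static void sketch_fill_identity_ramp(SketchGammaRamp *ramp)
{
    for (int i = 0; i < 256; i++) {
        WORD v = (WORD)(i * 257);
        ramp->red[i]   = v;
        ramp->green[i] = v;
        ramp->blue[i]  = v;
    }
}

/* Reduce a 16-bit ramp entry to 8 bits by keeping the high byte. */
static uint8_t sketch_ramp_entry_to_dac8(WORD entry)
{
    return (uint8_t)(entry >> 8);
}
```

A driver that read this buffer as 768 BYTEs would see only the first half of the red channel, with alternating low and high bytes.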

32 DirectX 7.0 Implementation Details

33 Overview Of DirectX 7.0 Model
Direct3D refers to surfaces via “handles” Driver keeps a look-up table indexed by handle Driver keeps everything it needs to know about a surface in this table
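A minimal sketch of such a handle-indexed table, assuming a growable array. All names are hypothetical, and the per-surface fields are only examples of "everything the driver needs to know". Note that the table stores copies of the data, not pointers into DirectDraw's structures.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-surface record kept by the driver. */
typedef struct {
    uint32_t  width, height, pitch;
    uintptr_t vidmem_offset;
    int       in_use;
} SketchSurfaceInfo;

typedef struct {
    SketchSurfaceInfo *entries;
    uint32_t           capacity;
} SketchHandleTable;

/* Store a copy of *info under handle, growing the table as needed.
   Returns 1 on success, 0 on failure. */
static int sketch_table_set(SketchHandleTable *t, uint32_t handle,
                            const SketchSurfaceInfo *info)
{
    if (handle > 0xFFFFu)
        return 0;                            /* arbitrary limit for this sketch */
    if (handle >= t->capacity) {
        uint32_t new_cap = t->capacity ? t->capacity : 16u;
        SketchSurfaceInfo *p;
        while (new_cap <= handle)
            new_cap *= 2u;
        p = realloc(t->entries, new_cap * sizeof *p);
        if (p == NULL)
            return 0;
        memset(p + t->capacity, 0, (new_cap - t->capacity) * sizeof *p);
        t->entries  = p;
        t->capacity = new_cap;
    }
    t->entries[handle] = *info;
    t->entries[handle].in_use = 1;
    return 1;
}

/* Look up a handle; returns NULL if nothing was stored for it. */
static const SketchSurfaceInfo *sketch_table_get(const SketchHandleTable *t,
                                                 uint32_t handle)
{
    if (handle >= t->capacity || !t->entries[handle].in_use)
        return NULL;
    return &t->entries[handle];
}
```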

34 CreateSurfaceEx Called after CreateSurface
Assigns a Direct3D-allocated handle to the surface(s) Driver runs attachment lists, creates internal structures for each surface in list

35 CreateSurfaceEx Is Hard
Driver has to run surface attachment list Z buffer might be attached, or separate surface Cubic Environment Maps are the hardest...

36 Cubemap Attachments (Abstract View)
[Diagram: the cube faces (Positive X, Negative X, Positive Y, ...), each followed by its chain of mip sub-levels.]

37 Cubemaps (Struct View)
[Diagram: the surface structures for the faces (Positive X, Negative X, Positive Y) and their mip levels (+X Mip, -X Mip), each with an lpAttachList whose entries (lpLink, lpAttached) chain the faces and mip levels together.]

38 Drivers Cannot Keep pointers to DirectDraw’s surface structures in their own structures Flip confusion (explained later) Overhead Under DirectX 8.0, we don’t keep the DirectDraw structure ...So DirectX 8.0 drivers CAN’T store pointers – they will crash

39 Flip Confusion Explained
Before Flip: User Mode Front Buffer holds Handle A (Driver Surface A); User Mode Back Buffer holds Handle B (Driver Surface B)

40 After Flip
User Mode Front Buffer holds Handle B (Driver Surface B); User Mode Back Buffer holds Handle A (Driver Surface A). The user-mode structures now refer to different pieces of memory. => You cannot store pointers to the user-mode structs in the driver structs.

41 Aliasing: What It Is Video memory is a shared resource
On mode switch, all must be given up But the application may be writing directly to video memory We re-map the application’s view of video memory to a dummy page, then allow the mode switch to proceed Only done at app’s request: DDLOCK_NOSYSLOCK

42 Aliasing: How It’s Done
When the driver returns a pointer to video memory at CreateSurface time: The offset into the frame buffer is calculated, and then an equivalent aliased pointer is returned to the application If the pointer lies outside of video memory, no aliasing is done (we don’t know enough to do so)

43 Aliasing: How To Break It
On Windows NT systems, the driver must NOT return a pointer outside of video memory at Lock time This pointer will not be aliased The application will crash if a mode switch happens Drivers should allocate system memory at CreateSurface time (PLEASE_ALLOC_USERMEM)

44 Changes For DirectX 8.0

45 Driver Capabilities Are Constant Across Modes
This means everything in D3DCAPS8 The caps are allowed to be “nothing” in some modes, e.g., 24bpp You are allowed to support different back buffer formats That is, the one that matches the front buffer

46 Pixel Formats In DirectX 8.0
Goodbye DDPIXELFORMAT. Hello D3DFORMAT. All FOURCCs are D3DFORMATs. D3DFMT has this form: [Diagram: a 32-bit value split into Byte 3, Byte 2, Byte 1, Byte 0, with the labels “Vendor ID (0=Microsoft)”, “Use your PCI Vendor ID”, “Format Number” and “Nonzero => FOURCC”.]

47 D3DFORMAT Examples
D3DFMT_A1R5G5B5. IHV-defined format: 0xACAT0001 (PCI ID 0xACAT, not FOURCC, format 1). FOURCC “UYVY”: 0x (Byte 2 is non-zero)
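The FOURCC case can be checked with a few lines of C. The macro below packs four characters in the conventional little-endian FOURCC order (first character in byte 0); the macro and helper names are hypothetical, chosen so the example does not depend on any Windows header.

```c
#include <stdint.h>

/* Pack four characters into a 32-bit FOURCC, first character in byte 0. */
#define SKETCH_MAKEFOURCC(a, b, c, d)            \
    ((uint32_t)(uint8_t)(a)         |            \
     ((uint32_t)(uint8_t)(b) << 8)  |            \
     ((uint32_t)(uint8_t)(c) << 16) |            \
     ((uint32_t)(uint8_t)(d) << 24))

/* Extract byte n (0 = least significant) of a format value. */
static uint8_t sketch_format_byte(uint32_t fmt, unsigned n)
{
    return (uint8_t)(fmt >> (8u * n));
}
```

With this packing, “UYVY” becomes 0x59565955: byte 0 is 'U' (0x55) and byte 2 is 'V' (0x56), which is non-zero as the slide notes.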

48 IHV-Def’d Texture Formats
Since Direct3D doesn’t understand these formats, they cannot be “managed”. Applications can lock these surfaces directly (in fact this is the only way to fill such surfaces with data)

49 DirectX 8.0 Format Op-list
The format op-list tells DirectX 8.0 everything about capabilities that vary with surface format For each format, the driver sets bits that indicate: Can Texture from this format Render to this format Switch display mode to this format Has caps in modes of this format

50 Format Op-List Tricks The runtime searches for the first entry that has all required capabilities. Example: Application wishes to render to 565 texture. Runtime will search for an Op-List entry with: D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET

51 Format Op-List Tricks Driver A can render to 565 texture
Sets this entry: Format = D3DFMT_R5G6B5 Ops = D3DFORMAT_OP_TEXTURE | D3DFORMAT_OP_OFFSCREEN_RENDERTARGET

52 Format Op-List Tricks Driver B can NOT render and texture from the same surface, but can do both operations individually. Sets TWO entries: Format1 = D3DFMT_R5G6B5 Ops1 = D3DFORMAT_OP_TEXTURE Format2 = D3DFMT_R5G6B5 Ops2 = D3DFORMAT_OP_OFFSCREEN_RENDERTARGET
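The first-match search described on these three slides can be sketched as follows. The entry structure and function name are hypothetical, and the numeric values given to the format and op constants are stand-ins for the DDK header definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in values; take the real definitions from the DirectX 8.0 DDK headers. */
#define D3DFORMAT_OP_TEXTURE                0x00000001u
#define D3DFORMAT_OP_OFFSCREEN_RENDERTARGET 0x00000008u
#define D3DFMT_R5G6B5                       23u

/* Hypothetical op-list entry: one format plus the operations it supports. */
typedef struct {
    uint32_t format;
    uint32_t ops;
} SketchFormatOp;

/* Return the first entry for `format` that has ALL of the `required` bits,
   or NULL if no single entry qualifies. */
static const SketchFormatOp *sketch_find_format_op(const SketchFormatOp *list,
                                                   size_t count,
                                                   uint32_t format,
                                                   uint32_t required)
{
    for (size_t i = 0; i < count; i++) {
        if (list[i].format == format && (list[i].ops & required) == required)
            return &list[i];
    }
    return NULL;
}
```

With Driver A's single entry, a request for texture plus render target succeeds. With Driver B's two entries, each operation is found on its own but the combined request finds nothing, which is exactly what Driver B wants to communicate.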

53 What Can You Do Next? If you develop DX Graphics Drivers:
You need a relationship with Microsoft’s DirectX team, and should contact IHV Program Manager: Michele Boland Install and run against DEBUG runtimes Available in the DirectX SDK Will output debug messages for common errors

54 DirectX VA Video Acceleration Drivers Gary Sullivan (GarySull@microsoft) Software Design Engineer DMD Video Services Group Microsoft Corporation

55 Agenda DirectX VA design and status
Current and future requirements and tests Future plans and potential extensions What can you do next?

56 DirectX VA Prime Directive
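Decouple software decoder operation from hardware accelerator design to achieve full interoperability. [Diagram: decoders (MPEG-2, MPEG-4, H.263++, MPEG-1, H.261, any other) on one side and accelerator functions (Motion Comp, Inverse DCT, VLD) on the other, joined through DirectX VA.]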

57 What Is DXVA? What Can It Achieve?
Interoperable interface between video decoding software and advanced-capability graphics accelerators Increases video capability for the consumer’s PC Increases the demand for advanced graphics accelerators and video applications Decreases implementation effort for software decoder writers Decreases support burden for graphics accelerator companies Decreases testing burden for OEMs

58 DirectX VA General Status
Spec went 1.0 with DirectX 8.0 Beta 2 (October ’00) See OEMs love it – it enables separate WHQL qualification of decoders and drivers Software decoder companies are developing with it (Mediamatics, Intervideo, Ravisent, Cyberlink, MGI/Zoran, MbyN, …) Hardware accelerator companies are supporting it in drivers (ATI, Nvidia, Intel, SiS, S3, SiliconMotion, …)

59 DirectX VA Capabilities
Emphasis on MPEG-2 and DVD “sub-picture” Support of all important video coding standards (H.261, H.263, MPEG-1, MPEG-2, MPEG-4) And some non-standard variations on the standards Alpha graphic blending (e.g., DVD subpicture) Three basic degrees of decoding configuration capability: Motion compensation on accelerator with host residual difference decoding Motion compensation and IDCT on accelerator Full raw bitstream decoding Externally-defined encryption support

60 How Does DXVA Operate? Operation with Windows 2000 Overlay Mixer (OVM) or new Windows XP Video Mixing Renderer (VMR) Requires DirectX 8.0 or Windows XP Decoders use it through existing Windows 2000 “IAMVideoAccelerator” API Drivers use it through corresponding Windows 2000 “MoComp” DDI DirectX VA specifies payload content of data buffers that previously had accelerator-specific formats

61 Host Versus Accelerator Functional Split
Bitstream processing either on host or accelerator Accelerator handles the primary data flow and performs the intensive signal processing PCI/AGP is the bridge between the two Reconstruction loop maintained in graphics Accelerator memory Host processing converts standard-specific streams into generic Accelerator work units

62 Today’s DirectX VA
[Block diagram. Video path blocks: Compressed Video Source, Variable-Length Decoding, Residual Difference Decoding (IDCT), Motion Compensation, Sum & Clip, Frame Storage, OVM/VMR/3D. Graphics path blocks: Graphic Source, Graphic Decoder, Graphic Blending. Caption: Content Protection Supported Outside of Scope.]

63 Constrained Parameter Profiles
Strategy is to define a general interface and a number of constrained-parameter profiles, with decoder data structure configuration settings Profiles defined: MPEG-2 Main Profile with and without DVD Subpicture Several H.263/MPEG-4 profiles MPEG-1 H.261 with and without deblocking post-processing

64 Defined Buffer Types
Picture-level decoding parameter buffers
Buffers for bitstream decoding: Bitstream data buffers; Bitstream slice control buffers; Inverse quantization matrix buffers
Buffers for macroblock-level decoding: Macroblock control buffers; Residual difference data buffers
Buffers for graphic blending: Alpha+YUV graphic buffers; AI44 graphic buffers; DVD DPXD graphic buffers; DVD highlight definition buffers; DVD display control command buffers; Alpha blend combination buffers
Deblocking filter control buffers
Picture resampling buffers
Read-back data buffers

65 DXVA Requirement Plans Primary Goals
Clear specification for MPEG-2 interoperability (and front-end DVD subpicture) is the primary goal Driver and decoder that claim video acceleration must support DXVA Specific “minimal interoperability set” for each defined profile

66 July ’01 Stated Requirements
MPEG2_A and MPEG2_C required MPEG1_A required H263_A required (?!) Arithmetic accuracy required IDCT accuracy required Picture resolutions up to 720x576 Uncompressed surface types must include NV12 in supported list Must have “front end” capability to convert to YUY2 from format in use

67 July ’01 Actual Tests StRowe test decoder developed
Test driver also developed Released DCT400 driver tests cover MPEG2_A, _B, _C, _D profiles Pass/Fail based on MPEG2_A and _B Tests are currently of functional operation and visual performance Contact us (?!) if any test problems Don’t ship untested features (?!)

68 Structure Of Motion Comp Data
All standards send only luma motion vectors, deriving chroma vectors from luma vectors Each standard derives chroma vectors in its own way Switches for configuring the motion comp are provided to minimize host “translation” requirements MPEG-2 Dual-Prime motion vectors derived on host

69 DXVA Macroblock Control Example
/* Basic form for P and B pictures */
typedef struct _DXVA_MBctrl_P_OffHostIDCT_1 {
    WORD         wMBaddress;
    WORD         wMBtype;
    DWORD        dwMB_SNL;
    WORD         wPatternCode;
    UINT8        NumCoef[6];
    DXVA_MVvalue MVector[4];
} DXVA_MBctrl_P_OffHostIDCT_1;

70 Structure Of Residual Data Background (1 of 2)
Things that vary within and across standards: Coefficient scan schemes Intra Coefficient prediction schemes VLC schemes Inverse quantization schemes Mismatch-control schemes These things need lots of logic – not always justified for accelerator implementation

71 Structure Of Residual Data Background (2 of 2)
Things that do not vary within and across standards IDCT definition Conformance rules may slightly differ – but multi-standard conformance not a big problem Many zero-valued coefficients Predicted-versus-Intra operation Only a few currently-specified inverse scans

72 Structure Of Residual Data The Chosen Method
Keep standard-specific issues on the host to the extent possible Support host-based or accelerator-based IDCT Send only non-zero coefficients Send index or run-length for coefficients

73 Residual Difference Example (Off-Host IDCT 16b TCOEFF)
typedef struct _DXVA_TCoefSingle {
    WORD  wIndexWithEOB;
    SHORT TCoefValue;
} DXVA_TCoefSingle, *LPDXVA_TCoefSingle;

/* Macros for Reading EOB and Index Values */
#define readDXVA_TCoefSingleIDX(ptr) ((ptr)->wIndexWithEOB >> 1)
#define readDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB & 1)

/* Macros for Writing EOB and Index Values */
#define writeDXVA_TCoefSingleIndexWithEOB(ptr, idx, eob) ((ptr)->wIndexWithEOB = ((idx) << 1) | (eob))
#define setDXVA_TCoefSingleIDX(ptr, idx) ((ptr)->wIndexWithEOB |= ((idx) << 1))
#define setDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB |= 1)
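As a usage sketch for the structure and macros above, the helper below walks a 64-coefficient block, emits only the non-zero coefficients with their index, and sets the EOB bit on the last one. The helper sketch_pack_block is hypothetical and not part of DXVA; WORD and SHORT are redefined with fixed-width types so the example compiles without Windows headers. Whether the index is in scan order or raster order depends on the negotiated configuration, and how an all-zero block is signaled is outside this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t WORD;
typedef int16_t  SHORT;

typedef struct _DXVA_TCoefSingle {
    WORD  wIndexWithEOB;   /* bits 15..1: coefficient index, bit 0: end of block */
    SHORT TCoefValue;
} DXVA_TCoefSingle;

#define readDXVA_TCoefSingleIDX(ptr) ((ptr)->wIndexWithEOB >> 1)
#define readDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB & 1)
#define writeDXVA_TCoefSingleIndexWithEOB(ptr, idx, eob) \
    ((ptr)->wIndexWithEOB = ((idx) << 1) | (eob))
#define setDXVA_TCoefSingleEOB(ptr) ((ptr)->wIndexWithEOB |= 1)

/* Hypothetical helper: pack the non-zero coefficients of one block.
   Returns the number of entries written to out (0 for an all-zero block). */
static size_t sketch_pack_block(const SHORT coef[64], DXVA_TCoefSingle out[64])
{
    size_t n = 0;
    for (unsigned i = 0; i < 64; i++) {
        if (coef[i] != 0) {
            writeDXVA_TCoefSingleIndexWithEOB(&out[n], i, 0);
            out[n].TCoefValue = coef[i];
            n++;
        }
    }
    if (n > 0)
        setDXVA_TCoefSingleEOB(&out[n - 1]);   /* mark the last coefficient */
    return n;
}
```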

74 Decoding Configurations (Part 1 of 2)
Bitstream decoding vs. Host VLD Encryption: Bitstream data if bitstream decoding Macroblock control commands and/or residual difference data if Host VLD Type of encryption protocol supported For Host VLD: Host-based residual difference decoding versus Accelerator-based residual difference decoding versus both Macroblock control commands in raster-scan order versus arbitrary order

75 Decoding Configurations (Part 2 of 2)
For host-based residual difference decoding 8b vs. 16b differences If 8b differences, overflow supported, or not If 8b differences, subtract second pass, or not Interleaved chroma or not Host clips range of data, or not Intra residuals unsigned, or not For accelerator-based difference decoding Specific IDCT support Inverse scan on host or accelerator Coefficients sent in groups of four, or singly

76 Alpha Blending Configurations
AYUV alpha blend graphic loading AI44 or IA44 +palette or DPXD+Highlight or AYUV Alpha blend combination operation: Front-end versus back-end Picture resizing or not Only use picture destination area or not Graphic resizing or not Whole plane alpha or not

77 Longer Term Requirements
Include H263_A, _B, _C in tested requirements Include mathematical motion comp and IDCT accuracy in tests Add speed performance testing Picture resolutions up to 1920x1088 Six or more uncompressed surfaces Specific FOURCC surface types for uncompressed surfaces

78 Kill Superfluous Configs
bConfigRasterOrder = 0 bConfigResidDiffHost = 1 (bConfigResid8Subtraction = 1 with bConfigSpatialResid8 = 1) or (bConfigResidDiffHost = 1 with (bConfigSpatialResid8 = 0 and bConfigSpatialHost8or9Clipping = 0)) bConfigIntraResidUnsigned = 0 bConfigSpatialResidInterleaved = 0 bConfigHostInverseScan = 0 bConfig4GroupedCoefs = 0

79 Enhance Blending Configs
Eliminate duplication of AI44 & IA44 (bConfigDataType = 0 & 1) Require both AYUV and AI44/IA44 (bConfigDataType = 3 and 0/1) Require front-end blend (bConfigBlendType = 0) bConfigPictureResizing = 1 bConfigOnlyUsePicDestRectArea = 0 bConfigGraphicResizing = 1 bConfigWholePlaneAlpha = 1

80 Hot Issue: WMV/H.263/MPEG-4
Codecs beyond MPEG-2 need support H263_A profile needs: Different derivation of chroma motion H263_B profile needs: Rounding control Motion vectors over picture boundaries 8x8 motion vectors Alternative inverse scan (or host inverse scan) H263_C profile needs: Deblocking filter support (also in H263_B?!)

81 Desirable Future Extensions
De-interlacing Interoperable encryption / DRM Compressed-video encoding (including ME, DCT, and so on) Inverse-telecine Hue/contrast/brightness/gamma/color corrections Future decoding methods (MPEG-4v2, WMV, H.26L) Frame rate conversion Precise separable re-sampling Gen lock/frame rate synchronization TV out control

82 New GUIDs Reducing Memory Use
Add three new GUIDs to parallel MPEG2_A, MPEG2_B, and MPEG2_D New GUID adds raw bitstream decoding to the “minimal interoperability set” of the corresponding existing GUID Driver with raw bitstream support then need not allocate buffers for macroblock-level processing with these GUIDs Drivers could also not expose bitstream processing with existing GUIDs to save memory

83 Interoperable Encryption
Define an interoperable encryption scheme Much like the old draft DXVA scheme Certificates for establishing trust (perhaps X.509 or something else rather than old draft scheme) RSA key exchange AES (RIJNDAEL) content encryption

84 Other In-Scope Additions
Add new features for other codecs – WMV, H.26L, MPEG-4v2, etc. 1/4-sample motion comp Added motion comp sizes and shapes New inverse transforms (e.g., 4x4) Fine granularity scalability Global motion comp Studio profile features More possible GUIDs for precise codec/configuration needs

85 New Video Building Blocks
Deinterlacing Inverse Telecine Frame rate conversion Contrast/Brightness/Gamma/Color Precisely-specified resampling Video compression encoding

86 Deinterlace/ Inverse Telecine
Deinterlace is crucial Becoming a standard feature of high-end consumer TVs 1080i in weave can look awful 1080i in bob can look wrong too Deinterlace can be useful for either decoding or encoding

87 Hypothetical DXVA Structure
[Diagram: Today’s Scope of DirectX VA and OVM/VMR/3D, together with candidate additions: Interoperable DRM/Conditional Access/Content Protection/Encryption; De-interlace / Inverse Telecine; Frame Rate Conversion; Color Conversions & Adjustments; Scaling. Several of the candidate blocks carry question marks.]

88 Video Encoding?
[Block diagram of a possible encoder, with question marks on some blocks: Uncompressed Video Source, Inverse Telecine / De-interlace, Color Conversions And Adjustments, Motion Estimation, Mode & Motion Vector Decision, Motion Compensation, Residual Difference Transform (DCT), Quantization, Variable Length Encoding, Residual Difference Decoding (IDCT), Sum and Clip, Frame Storage.]

89 What Can You Do Next? (To All) Give Us Your Proposals
About any difficulties/problems in design About encryption design About new in-scope feature needs About how to support new features Deinterlace/inverse telecine Encoding Frame rate conversion Contrast/Brightness/Gamma/Color Resampling

90 What Can You Do Next? (For Graphic Accelerator Designers)
Make your MPEG-2 and DVD subpicture DXVA solution rock-solid, fully-tested with every available decoder, and frighteningly fast Fully support YUV surfaces as textures for input to 3-D Conversion to RGB, and so on Design maximal WMV/H.263/MPEG-4 feature support into your next generation But don’t expose them unless fully tested Move to the preferred configurations and uncompressed surface types Support new memory-conserving GUIDs

91 Writing AVStream Minidrivers For Windows XP William Messmer, SDE Digital Audio-Video Microsoft Corporation

92 Agenda
AVStream minidriver architecture; When and why to use AVStream; Exposing minidriver functionality; Data processing; Writing a minidriver: key issues and pitfalls; Walk through sample code; Common problems and mistakes; DirectX 8.0 versus Windows XP; What can you do next?

93 Why AVStream THE next generation class driver
More efficient streaming Reduces the amount of minidriver code Simplifies development; faster to market One minidriver, one model – no more confusion over stream class versus port class New features, new technologies will only be supported in AVStream; stream and port class, however, are still supported!

94 When To Use AVStream
BDA drivers. New device types which are not already written to stream class or port class. Combined A/V devices. Kernel software transforms. Audio Global Effects (GFX) filters. No necessity to port existing stream or port class drivers

95 Minidriver Architecture
Functionality is exposed as a tree hierarchy described through static descriptors Device – described by Device Descriptor Filter Factory – creates a type of Filter Filter – described by Filter Descriptor Pin Factory – creates a type of Pin Pin – described by Pin Descriptor Functionality provided through static dispatch and automation tables

96 Minidriver Architecture
[Diagram: the object tree Device, Filter Factory (>= 1), Filter, Pin Factory (>= 1), Pin (>= 1). The Device has a Device Descriptor and Device Dispatch; the Filter has a Filter Descriptor, Filter Dispatch and Filter Automation; the Pin has a Pin Descriptor, Pin Dispatch and Pin Automation. Add Device (Device Dispatch), Filter Create (Filter Dispatch) and Pin Create (Pin Dispatch) mark where objects are created. Key: Minidriver Provided Table, Public AVStream Construct, Private AVStream Construct, Minidriver Dispatch Routine.]

97 Exposing Minidrivers Expose your driver to AVStream
Call KsInitializeDriver in DriverEntry passing your Device Descriptor Return the status from KsInitializeDriver AVStream handles PnP to get your driver set up; minidriver gets calls through device dispatch Filter Factories set up by AVStream during Add Device and Start Device

98 Exposing Minidrivers AVStream creates filters/pins based on descriptors Minidriver receives creation dispatch Creation dispatch associates minidriver specific context with object Object bags available as containers for dynamic memory like contexts AVStream handles cleanup of objects based on bags No forgetting to free dynamic memory

99 Minidriver Architecture
Sample Code (Exposing Functionality)

100 Data Processing AVStream queues data/buffers
Minidriver queues not necessary Cancellation handled in the queue Data exposed through two abstractions: stream pointers and process pins Stream pointers are robust and allow versatile queue management; typically used in hardware drivers Process pins work purely at a single buffer level making for very simple software transforms

101 Design Issues Two distinct ways to handle data processing
Filter-Centric processing Specify filter process dispatch Pin-Centric processing Specify pin process dispatches The choice of which to use will influence design greatly

102 Filter-Centric Processing
Filter is called to process data in a context where data is available on all required pins Typically used for software transforms Stream pointer use not required Processing based on an index of process pins Index/pins stable during processing Minidriver does transform, specifies how many bytes of each buffer used

103 Process Pins One per pin – points back to the pin
Contains a stream pointer if needed Contains a buffer virtual address and size for data manipulation Informs the process routine of the pin’s relationships with other pins InPlaceCounterpart – other pin in an in-place transform pair CopySource – pin data is copied from DelegateBranch – pin that delegates frames (in the same pipe)

104 Transform Example
[Animated diagram: an INPUT queue and an OUTPUT queue on a filter, with frame sizes in bytes shown as frames arrive and leave.]
1. Frame(s) arrive
2. Filter is called to process. Filter sees two process pins: IN (1920 bytes data) and OUT (2880 bytes buffer)
3. Process pins point to buffers
4. Filter performs transform; sets 1920 bytes used on input and output
5. Filter is called back; more data to transform: IN (1100 bytes data) and OUT (960 bytes buffer)
6. Process repeats similarly
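The byte accounting in the transform example can be modeled in a few lines. This is a toy model only: the structure is loosely patterned on the idea of a process pin (a buffer pointer, bytes available, bytes used), its names are hypothetical, and the "transform" is a plain copy.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a process pin: where the data is, how much there is,
   and how much the filter reports having consumed or produced. */
typedef struct {
    unsigned char *data;
    size_t         bytes_available;
    size_t         bytes_used;
} SketchProcessPin;

/* Copy "transform": move as much as fits and report bytes used on both pins. */
static void sketch_filter_process(SketchProcessPin *in, SketchProcessPin *out)
{
    size_t n = in->bytes_available < out->bytes_available
                   ? in->bytes_available
                   : out->bytes_available;
    memcpy(out->data, in->data, n);
    in->bytes_used  = n;
    out->bytes_used = n;
}
```

With the numbers from the example, the first call uses 1920 bytes on both pins and leaves 960 bytes of output buffer. The second call, with 1100 bytes of input, can use only 960, leaving 140 input bytes for a later call.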

105 Pin-Centric Processing
Each pin called to process data in a context independent of other pins Typically used for hardware drivers Data accessed through stream pointer abstraction

106 Stream Pointers Reference a single frame in a queue
Hold that frame in the queue Can be in multiple states Locked – referenced data is safe to access; Irp cannot be cancelled Unlocked – not guaranteed to even reference data; Irp can be cancelled Can be cloned to create new pointers into the data stream Can schedule time-outs

107 Stream Pointers Contain two offsets into the data stream for ease of in-place use Address data at one of two granularities: Byte – access via virtual address Mapping – access via logical DMA address KSPIN_FLAG_GENERATE_MAPPINGS Minidriver usable context available per stream pointer

108 Stream Pointers And Queues
[Diagram: a queue of frames running from Oldest Frames to Newest Frames, with a reference count shown on each frame, the Leading Edge and Trailing Edge stream pointers, and Clones pointing at frames in between.]

109 Direct DMA Example
[Animated diagram: the QUEUE of frames with reference counts changing at each step.]
1. Frame(s) arrive; minidriver called to process
2. Processing routine acquires leading edge (KsPinGetLeadingEdgeStreamPointer)
3. Leading edge is cloned (KsStreamPointerClone)
4. DMA hardware is programmed
5. Leading edge is advanced
6. Process may repeat for more frames
7. Hardware interrupts for DMA completion
8. ISR schedules a DPC
9. DPC releases the associated frames (KsStreamPointerDelete)
10. May need to continue processing (KsPinAttemptProcessing)
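The reference-counting idea behind clone, advance and delete can be shown with a toy queue. This is not the AVStream API: every name below is hypothetical, the queue has a fixed length, and "completed" simply means no stream pointer references the frame any more.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_QUEUE_LEN 4

typedef struct {
    int    refs[SKETCH_QUEUE_LEN];        /* stream pointers referencing each frame */
    int    completed[SKETCH_QUEUE_LEN];   /* 1 once a frame has left the queue */
    size_t leading;                       /* frame the leading edge points at */
} SketchQueue;

static void sketch_queue_init(SketchQueue *q)
{
    memset(q, 0, sizeof *q);
    q->refs[0] = 1;                       /* the leading edge holds the oldest frame */
}

static void sketch_release(SketchQueue *q, size_t frame)
{
    if (--q->refs[frame] == 0)
        q->completed[frame] = 1;
}

/* Step 3: clone the leading edge; the clone keeps the frame in the queue. */
static size_t sketch_clone_leading(SketchQueue *q)
{
    if (q->leading >= SKETCH_QUEUE_LEN)
        return SKETCH_QUEUE_LEN;          /* no frame to clone */
    q->refs[q->leading]++;
    return q->leading;
}

/* Step 5: advance the leading edge to the next frame. */
static void sketch_advance_leading(SketchQueue *q)
{
    size_t old = q->leading;
    if (old >= SKETCH_QUEUE_LEN)
        return;                           /* already past the newest frame */
    q->leading = old + 1;
    if (q->leading < SKETCH_QUEUE_LEN)
        q->refs[q->leading]++;
    sketch_release(q, old);
}

/* Step 9: the DPC deletes the clone once DMA has finished. */
static void sketch_delete_clone(SketchQueue *q, size_t frame)
{
    if (frame < SKETCH_QUEUE_LEN)
        sketch_release(q, frame);
}
```

A cloned frame survives the leading edge moving past it and completes only when the clone is deleted; a frame that was never cloned completes as soon as the leading edge advances.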

110 Data Frame Control Held non-cancelable for a period
Use locked stream pointers Consider stream pointer timeouts Can relinquish claim with callback Use unlocked stream pointers with a cancel callback Periodic access where frame can disappear between accesses Use unlocked stream pointers and lock periodically

111 Processing Decisions
Filter-Centric: All pins are involved in the decision. Each pin type can have separate requirements. One pin not fulfilling requirements will veto processing for the entire filter.
Pin-Centric: Only one pin is involved in the decision. Each pin type can have separate requirements which do not influence other pins.

112 When Processing Happens
Default case (no pin flags) Attempt made when frame arrives and leading edge points to no frame Attempt will succeed if Involved pin(s) are >= KSSTATE_PAUSE Involved pin(s) all have data Continuing processing STATUS_SUCCESS returned from dispatch and conditions still met

113 Adjusting Processing
Pin flags, each prefixed with KSPIN_FLAG_:
_INITIATE_PROCESSING_ON_EVERY…: every frame arrival initiates processing
_DO_NOT_INITIATE_PROCESSING: no frame arrival initiates processing
PROCESS_IN_RUN_STATE_ONLY: pin must be in KSSTATE_RUN
FRAMES_NOT_REQUIRED…: data is not required on this pin
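The default conditions and two of the adjustments above can be expressed as a predicate. This is a toy model with hypothetical names: the boolean fields stand in for the "frames not required" and "process in run state only" flags, and the state enum mirrors the stop, acquire, pause, run ordering. A filter-centric filter would pass all of its pins; a pin-centric one would pass just the pin concerned.

```c
/* Toy stand-ins; ordering mirrors stop < acquire < pause < run. */
typedef enum {
    SKETCH_STATE_STOP,
    SKETCH_STATE_ACQUIRE,
    SKETCH_STATE_PAUSE,
    SKETCH_STATE_RUN
} SketchState;

typedef struct {
    SketchState state;
    int has_data;              /* a frame is available on this pin */
    int frames_not_required;   /* models the "frames not required" flag */
    int run_state_only;        /* models the "process in run state only" flag */
} SketchPin;

/* Returns 1 if a processing attempt would succeed for the involved pins. */
static int sketch_can_process(const SketchPin *pins, int count)
{
    for (int i = 0; i < count; i++) {
        SketchState min_state =
            pins[i].run_state_only ? SKETCH_STATE_RUN : SKETCH_STATE_PAUSE;
        if (pins[i].state < min_state)
            return 0;          /* one pin below its required state vetoes */
        if (!pins[i].has_data && !pins[i].frames_not_required)
            return 0;          /* one pin without required data vetoes */
    }
    return 1;
}
```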

114 Adjusting Processing Some mentioned flags useful for pin-centric
Most flags useful for filter-centric where all pins are involved in the decision as to when to process data See the DDK for a complete description of flags Understand when processing happens based on your flags!

115 Adjusting Processing Processing can happen in a DPC!
KSFILTER_FLAG_DISPATCH_LEVEL_PROCESSING KSPIN_FLAG_DISPATCH_LEVEL_PROCESSING Dispatch level processing still synchronized Processing mutex still held during dispatch level processing Can still be used to synchronize with processing Data manipulation (stream pointer) API fully dispatch level ready!

116 Walkthrough Sample Code
Pin-centric sample code

117 Common Problems
Internal mutexes are exposed
Three mutex types in a hierarchy
  Device Mutex
  Filter Control Mutex
  Processing Mutex
Some calls require mutexes held
Sometimes AVStream holds the mutex for you; sometimes you must hold the mutex!
See the DDK for this!

118 Common Problems
Mutex Rules
Do NOT take mutexes out of order: device, then control, then processing
Do NOT take a mutex and call out: not for properties, not for anything!
Walking the object hierarchy requires mutexes held:
  Device Mutex: device down to filter
  Filter Control Mutex: filter down to pins
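
The ordering rule is easy to violate by accident, so it can help to see it enforced mechanically. The sketch below is a user-mode Python illustration of a lock hierarchy, not kernel code; `OrderedMutexes` and the names "device", "control", and "processing" are hypothetical stand-ins for the three AVStream mutexes.

```python
import threading

# Hierarchy from the slide: device, then filter control, then processing.
ORDER = {"device": 0, "control": 1, "processing": 2}


class OrderedMutexes:
    """Hands out the three mutexes and rejects out-of-order acquisition."""

    def __init__(self):
        self._locks = {name: threading.Lock() for name in ORDER}
        self._held = threading.local()   # per-thread list of held mutex names

    def acquire(self, name):
        held = getattr(self._held, "names", [])
        # Taking a mutex at or above the level of one already held
        # would break the device -> control -> processing order.
        if held and ORDER[name] <= max(ORDER[h] for h in held):
            raise RuntimeError(f"{name} mutex taken out of order")
        self._locks[name].acquire()
        self._held.names = held + [name]

    def release(self, name):
        self._held.names.remove(name)
        self._locks[name].release()
```

A thread that already holds "processing" and then asks for "device" gets a `RuntimeError` here instead of a potential deadlock; a real driver has no such safety net, which is why the slide states the rule so forcefully.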

119 Common Problems
Do not traverse the object tree (filters and pins) during processing!
  KsFilterGetFirstChildPin
  KsPinGetNextSiblingPin
Pin-centric filters should not need to do this; filter-centric filters have the process pins index

120 DirectX 8.0 Versus Windows XP
Mutexes in DirectX 8.0 are fast mutexes
  Certain APIs require mutexes held
  Client must be careful about when to acquire mutexes!
Mutexes in Windows XP are full mutexes
  Completely backwards compatible with DirectX 8.0 drivers
  Fewer APIs require mutex acquisition
  Mutex acquisition more lenient

121 DirectX 8.0 Versus Windows XP
New flags in Windows XP
  _SOME_FRAMES_REQUIRED…
    One or more pin instances of this type require frames
    Can be programmatically done in DirectX 8.0
  _PROCESS_IF_ANY_IN_RUN_STATE
    One or more pin instances of this type must be >= KSSTATE_RUN; others must be >= KSSTATE_PAUSE
    Processing routine must check in DirectX 8.0
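
On DirectX 8.0 the processing routine has to make these checks itself. A rough Python model of the two conditions follows; the function names are hypothetical, and a real minidriver would perform the equivalent tests in C at the top of its process dispatch.

```python
from enum import IntEnum


class State(IntEnum):
    STOP = 0
    ACQUIRE = 1
    PAUSE = 2
    RUN = 3


def any_in_run_state(states):
    """At least one instance is running and none is below pause."""
    return all(s >= State.PAUSE for s in states) and any(
        s >= State.RUN for s in states
    )


def some_frames_available(queued_counts):
    """At least one instance of the pin type has a frame queued."""
    return any(count > 0 for count in queued_counts)


def should_process(states, queued_counts):
    # The early-out a DirectX 8.0 processing routine would perform by hand.
    return any_in_run_state(states) and some_frames_available(queued_counts)
```

Here `states` and `queued_counts` are per-instance lists for one pin type, for example `should_process([State.RUN, State.PAUSE], [0, 1])`.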

122 What Can You Do Next?
Install the DirectX 8.0 or Windows XP DDK
Try out the samples in the DDK
Write AVStream minidrivers for new hardware!

123 Testing Your WDM Driver with DirectShow
Eric Rudolph
Software Design Engineer
DirectShow Editing Services
Microsoft Corporation

124 Agenda
DirectShow supports capture from 1394, USB, analog video/audio, TV tuner, and custom devices
Demonstrate the use of the DirectShow-based generic graph editor, GraphEdt, as a WDM driver test tool
Walk through sample code that uses the GraphBuilder COM object

125 What tools exist to test your driver?
Included in DX8: GraphEdt.exe, a generic graph editor
Also in DX8: AmCap.exe, a simple capture application
New for Windows XP: Still Image devices show up in the shell (Explorer)
New for Windows XP: Movie Maker (on Start Menu)

126 GraphEdt Overview
Ships with DX8
Provides UI to build dataflow graphs and then uses DirectShow to run, pause, and stop the data
Views different filter categories
  Capture, compressor, crossbar, DMO, and so on
Connects different filters together
Accesses property pages
Writes out files
Controls 1394 devices

127 GraphEdt Filter Categories
Categories enable you to easily find a particular type of DirectShow filter
Many categories predefined in ksuuids.h & uuids.h
WDM drivers have many of their own categories
Capture devices can show up in both non-WDM and WDM categories
As you add/remove WDM devices, if they send device notifications, they are automatically shown in or hidden from the category lists

128 GraphEdt Property Pages
The filter itself can expose multiple property pages
Each pin can expose 1 or more property pages
When you query an output pin’s property pages, you will see 1 extra page per pin which lists available output media connection types
Capture property pages are often exposed by capture applications (using standard DirectShow methods), so make them look nice!
[Figure: example property page]

129 GraphEdt Property Pages And Media Types
Output pins provide one or more media types
Input pins normally do not provide a list of types, but instead accept types
When you render a pin, DirectShow will try to find appropriate filters to render
When you try to connect two pins, DirectShow will try to find intermediate filters
The media types must agree between any output pin and its connected input pin
Buffers are also negotiated
[Figure: the different media types the Indeo 5.11 decompressor provides]
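
The offer/accept relationship between pins can be illustrated with a deliberately simplified model. This is not how DirectShow implements connection (the real mechanism considers much more, and buffer negotiation is left out entirely); `negotiate`, `connect`, and the media type labels used with them are assumptions made for this sketch.

```python
def negotiate(offered_types, accepts):
    """Return the first type the output pin offers that the input pin accepts."""
    for media_type in offered_types:
        if accepts(media_type):
            return media_type
    return None


def connect(offered_types, accepts, intermediates=()):
    """Try a direct connection, then a single intermediate filter.

    Each intermediate is (name, accepts_input, output_types).
    Returns (inserted filter names, agreed type per connection) or None.
    """
    direct = negotiate(offered_types, accepts)
    if direct is not None:
        return [], [direct]
    for name, accepts_input, output_types in intermediates:
        upstream = negotiate(offered_types, accepts_input)
        downstream = negotiate(output_types, accepts)
        # Both connections must agree on a type for the chain to work.
        if upstream is not None and downstream is not None:
            return [name], [upstream, downstream]
    return None
```

For example, an output pin offering only a compressed type cannot connect directly to a renderer that accepts only uncompressed types, but `connect` succeeds once a decoder that accepts the compressed type and offers an uncompressed one is listed in `intermediates`.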

130 Common Problems
Hot unplug while streaming
Device add/remove while streaming
Enter hibernation while streaming
Multiple camera enumeration
Multiple camera streaming (one driver, multiple devices)
Video shows up black or wrong
Changing display props while streaming
Overlay and DDraw issues

131 GraphEdt Demos Part 1
Capture from USB, both with 1 pin and with 2 pins (capture & preview)
DV capture and device control
Device insertion/removal and how the graph refreshes

132 GraphEdt Demos Part 2
How to write AVI, WAV, and WM files
New Video Mixing Renderer has slightly different connection model than old Video Renderer
How to force a filter to produce a media type with a Type Enforcer
Timestamps are important!
Using .GRF files

133 Sample Code Using the GraphBuilder COM Object
CaptureGraphBuilder makes connecting capture devices easy
See the AmCap sample code in the DX8/DirectShow SDK directory
Sample code walkthrough

134 What Can You Do Next?
Test your WDM drivers! Under many different conditions!
Read up on the DX8 docs, they’re great!
DirectShow contact:
Get on the DirectX A/V list
