Winsock Kernel Best Practices Osman N. Ertugay Software Design Engineer Windows Network Developer Platform Microsoft Corporation.

1 Winsock Kernel Best Practices Osman N. Ertugay Software Design Engineer Windows Network Developer Platform Microsoft Corporation

2 Session Outline
  - Brief Winsock Kernel (WSK) refresher
    - Familiarity with WSK documentation and WSK sample in WDK ensures the most benefit from this session
  - WSK programming guidelines and best practices
    - WSK registration and deregistration
    - I/O Request Packet (IRP) handling
    - Buffer ownership and manipulation
    - Using socket callbacks versus socket functions
    - Memory/throughput tradeoff in stream data transfer
    - Transport Address security
    - Dual-family sockets

3 WSK Refresher
  - Kernel-mode Network Programming Interface
  - WSK replaces the Transport Driver Interface (TDI) for “consumers” of TDI (i.e., TDI clients)
  - WSK is not a “provider” interface for new transport development
  - WSK goals/benefits
    - Easier to use, consistent API
    - Higher performance, better scalability
    - Better fit for the Next Generation TCP/IP Stack
    - Similar to Winsock2, but not the same
    - Easy to port to for existing TDI clients

4 WSK Refresher
  [Architecture diagram. Labels: User-mode / Kernel-mode boundary; WSK Client Driver; WSK Registration Library; I/O Manager; Network Module Registrar (NMR); WSK Registration; WSK Subsystem; WSK_CLIENT (Client Functions, Client Callbacks); WSK_SOCKET (Socket Functions, Socket Callbacks); TCP (IPv6/IPv4), UDP (IPv6/IPv4), Raw (IPv6/IPv4), ...]

5 WSK Programming Guidelines And Best Practices

6 WSK Registration And Deregistration
  - Use the new WSK registration library: WskRegister, WskDeregister, WskCaptureProviderNPI, WskReleaseProviderNPI
  - Network Module Registrar APIs still available

    const WSK_CLIENT_DISPATCH WskSampleClientDispatch = {
        MAKE_WSK_VERSION(1, 0), // WSK version 1.0
        0,                      // Reserved
        NULL                    // No WskClientEvent callback in WSK version 1.0
    };

    WSK_REGISTRATION WskSampleRegistration;

    NTSTATUS DriverEntry(...)
    {
        NTSTATUS status;
        WSK_CLIENT_NPI wskClientNpi;
        ...
        wskClientNpi.ClientContext = NULL;
        wskClientNpi.Dispatch = &WskSampleClientDispatch;
        status = WskRegister(&wskClientNpi, &WskSampleRegistration);
        ...
    }

7 WSK Registration And Deregistration
  - Capture the WSK_PROVIDER_NPI, use it, release it
  - Do NOT use the WSK_PROVIDER_NPI after releasing it
  - WaitTimeOut usage in WskCaptureProviderNPI
    - WSK_NO_WAIT
    - WSK_INFINITE_WAIT: Do NOT use if calling from DriverEntry!

    NTSTATUS SomeWorkerRoutine(...)
    {
        NTSTATUS status;
        WSK_PROVIDER_NPI wskProviderNpi;
        ...
        status = WskCaptureProviderNPI(&WskSampleRegistration, WSK_INFINITE_WAIT, &wskProviderNpi);
        if (NT_SUCCESS(status)) {
            status = wskProviderNpi.Dispatch->WskSocket(wskProviderNpi.Client, AF_INET6, ...);
            WskReleaseProviderNPI(&WskSampleRegistration);
        }
        ...
    }

8 WSK Registration And Deregistration
  - WskDeregister
    - Must be called exactly once for each successful WskRegister when the WSK client stops using WSK
    - Will block until
      - All captured provider NPI instances are returned
      - All outstanding calls to provider NPI functions have completed
      - All sockets are closed
    - Must close all sockets and release all captured provider NPI instances for WskDeregister to return
    - Will cause WskCaptureProviderNPI calls waiting in other threads (with WSK_INFINITE_WAIT or some timeout) to return

    VOID DriverUnload(...)
    {
        ...
        WskDeregister(&WskSampleRegistration);
        ...
    }

9 IRP Handling
  [Sequence diagram between WSK Client, IO Manager, and WSK Subsystem. Calls in order:]
  1. IoAllocateIrp(1, …)
  2. IoSetCompletionRoutine(Irp, CompletionRoutine, Context, TRUE, TRUE, TRUE)
  3. WskSend(Socket, …, Irp)
  4. IoCompleteRequest(Irp, …)
  5. CompletionRoutine(…, Irp, Context), which returns STATUS_MORE_PROCESSING_REQUIRED
  6. IoFreeIrp(Irp) or IoReuseIrp(Irp, …)

10 IRP Handling
  Simple example that waits for IRP completion synchronously (also demonstrating how to distinguish and optimize for “inline” IRP completion)

    NTSTATUS SyncIrpCompRtn(PDEVICE_OBJECT Reserved, PIRP Irp, PVOID Context)
    {
        PKEVENT compEvent = (PKEVENT)Context;

        // Signal the event only if the IRP was pended; on inline completion nobody waits
        if (Irp->PendingReturned)
            KeSetEvent(compEvent, 2, FALSE);

        // Stop completion processing so the caller can still read and free the IRP
        return STATUS_MORE_PROCESSING_REQUIRED;
    }

    NTSTATUS SetSocketOption(PWSK_SOCKET Socket, ...)
    {
        NTSTATUS status;
        CONST WSK_PROVIDER_BASIC_DISPATCH *dispatch = Socket->Dispatch;
        KEVENT compEvent;
        PIRP irp;

        KeInitializeEvent(&compEvent, SynchronizationEvent, FALSE);

        irp = IoAllocateIrp(1, FALSE);
        if (irp == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        IoSetCompletionRoutine(irp, SyncIrpCompRtn, &compEvent, TRUE, TRUE, TRUE);

        status = dispatch->WskControlSocket(Socket, ..., irp);

        // Wait only when the request was pended
        if (status == STATUS_PENDING)
            KeWaitForSingleObject(&compEvent, Executive, KernelMode, FALSE, NULL);

        status = irp->IoStatus.Status;
        IoFreeIrp(irp);
        return status;
    }

11 Buffer Ownership And Manipulation
  - Setting up a WSK_BUF
    - WSK_BUF.Mdl
      - IoAllocateMdl(BufferAddress, BufferLength, ...)
      - MmProbeAndLockPages vs MmBuildMdlForNonPagedPool
    - WSK_BUF.Length
      - Must be <= (BufferLength - WSK_BUF.Offset)
    - WSK_BUF.Offset
      - Must lie within the first MDL if WSK_BUF.Mdl points to a chain of MDLs
  [Diagram labels: BufferAddress, BufferLength, WSK_BUF.Mdl, WSK_BUF.Offset, WSK_BUF.Length, Page Boundary, MDL ByteOffset]
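
The two WSK_BUF constraints on this slide can be checked mechanically. The following is a minimal user-mode sketch, not WDK code: `mock_mdl` and `wsk_buf_is_valid` are hypothetical stand-ins, and `byte_count` plays the role of the value a driver would get from MmGetMdlByteCount. Two assumptions of the sketch: "within the first MDL" is read strictly (offset smaller than the first MDL's byte count), and for a chain the length bound is taken against the total bytes described by the whole chain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode stand-in for an MDL: only what this check needs. */
struct mock_mdl {
    size_t byte_count;            /* bytes described by this MDL */
    const struct mock_mdl *next;  /* next MDL in the chain, or NULL */
};

/* Returns true when offset lies within the first MDL and length fits in
 * the bytes that remain in the chain after offset.
 * Assumption of this sketch: a NULL chain is rejected. */
bool wsk_buf_is_valid(const struct mock_mdl *mdl, size_t offset, size_t length)
{
    size_t total = 0;

    /* Offset applies only to the first MDL, so it must fall inside it. */
    if (mdl == NULL || offset >= mdl->byte_count)
        return false;

    /* Sum the bytes described by the whole chain. */
    for (const struct mock_mdl *m = mdl; m != NULL; m = m->next)
        total += m->byte_count;

    /* total > offset is guaranteed here, so the subtraction cannot wrap. */
    return length <= total - offset;
}
```

With a chain of 50 and 100 bytes, an offset of 10 leaves room for a length of up to 140; an offset of 50 is rejected because it no longer falls inside the first MDL.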

12 Buffer Ownership And Manipulation
  Example: Copy data from a WSK_DATA_INDICATION list to a buffer

    NTSTATUS CopyDataIndicationListToBuffer(__in PWSK_DATA_INDICATION DataIndication,
                                            __in SIZE_T BufSize,
                                            __out_bcount(BufSize) PUCHAR Buf)
    {
        SIZE_T bytesCopied = 0;

        while (DataIndication != NULL) {
            PMDL mdl = DataIndication->Buffer.Mdl;
            ULONG offset = DataIndication->Buffer.Offset;
            SIZE_T length = DataIndication->Buffer.Length;

            while (length > 0 && mdl != NULL) {
                SIZE_T copyLength = min(length, MmGetMdlByteCount(mdl) - offset);
                PUCHAR sysAddr = (PUCHAR)MmGetSystemAddressForMdlSafe(mdl, LowPagePriority);

                if (sysAddr == NULL)
                    return STATUS_INSUFFICIENT_RESOURCES;
                else if ((BufSize - bytesCopied) < copyLength)
                    return STATUS_BUFFER_TOO_SMALL;

                RtlCopyMemory(Buf + bytesCopied, sysAddr + offset, copyLength);
                offset = 0; // WSK_BUF.Offset applies only to the first MDL
                bytesCopied += copyLength;
                length -= copyLength;
                mdl = mdl->Next;
            }
            DataIndication = DataIndication->Next;
        }
        return STATUS_SUCCESS;
    }

13 Buffer Ownership And Manipulation
  - May “retain” (take temporary ownership of) a WSK data indication by returning STATUS_PENDING from WskReceiveEvent or WskReceiveFromEvent callbacks
    - Any status other than STATUS_PENDING means the data indication was NOT retained, hence no need to call WskRelease
  - Must release retained data indications via WskRelease
  - Do not retain data indications with the WSK_FLAG_RELEASE_ASAP flag if possible. If you do have to retain such indications, release them within a short, bounded amount of time (on the order of a few seconds)

14 Socket Callbacks Versus Functions
  - Accepting incoming connections
    - WskAccept
      - Client keeps one or more accept IRPs pended in WSK
      - Connections rejected by WSK when no pending IRP exists
    - WskAcceptEvent
      - WSK hands over “sockets” to client for arriving connections
      - Client accepts or rejects
  - Guidance
    - Use WskAcceptEvent to accept as many connections as the system can handle at any given time
    - Use WskAccept to accept only a small, fixed number of connections at any given time
  - WSK does not have an equivalent of the listen backlog in Winsock2

15 Socket Callbacks Versus Functions
  - Receiving datagrams
    - WskReceiveFrom
      - Data buffer owned by client, must allocate before data arrives
      - Client keeps one or more receive IRPs pended in WSK
      - Datagrams dropped by WSK when no pending IRP exists
    - WskReceiveFromEvent
      - Data buffer owned by WSK, allocated when data arrives
      - Each arriving datagram handed over to client by WSK
  - Guidance
    - Always use WskReceiveFromEvent as long as you do not retain datagram indications too long
    - Use WskReceiveFrom only if you must always copy datagrams into your own buffers anyway
  - WSK does not buffer datagrams

16 Socket Callbacks Versus Functions
  - Receiving stream data
    - WskReceive
      - Data buffer owned by client, must allocate before data arrives
      - 0-copy into client buffer possible
      - Data buffered by transport if no pending receive IRP exists
    - WskReceiveEvent
      - Data buffer owned by WSK
      - 0-copy into client buffer not possible
      - Data handed over to client until client rejects indication
      - Client needs to use WskReceive to retrieve rejected data
  - Guidance
    - Use WskReceive for large block transfers
    - Combined usage: Get initial data via WskReceiveEvent, then get rest of the data via WskReceive
    - WskReceiveEvent: Amount of retained data and the time retained must be bounded and small

17 Socket Callbacks Versus Functions
  - Both socket callbacks and the IRP completions for socket functions mostly occur in Deferred Procedure Call (DPC) context
  - Must limit amount of processing in callback and IRP completion routines
  - Consider using
    - System worker threads for tasks that won’t last too long
    - Dedicated system thread for long lasting tasks

18 Memory/Throughput Tradeoff
  - Stream sockets are subject to transport flow control
    - Send requests may remain pended until acknowledged by peer
    - Too much pended send data: Poor memory usage
    - Too little pended send data: Suboptimal throughput
  - So, how much data to keep pended (Ideal send backlog: “ISB”)?
    - As much as the network can sustain
    - As much as the receiver can sustain
  - Use the SIO_WSK_QUERY_IDEAL_SEND_BACKLOG IOCTL and the WskSendBacklogEvent callback
    - Initial ISB to use: SIO_WSK_QUERY_IDEAL_SEND_BACKLOG
    - Get ISB change notifications: WskSendBacklogEvent
  - Always have two or more WskSend requests pended with ISB worth of data in total. Example: ISB = 64 K means 2 WskSend requests, each with 32 K data
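
The 64 K example generalizes to dividing the current ISB across however many send requests are kept pended. A minimal user-mode sketch of that arithmetic follows. `split_send_backlog` is a hypothetical helper, not a WSK function, and the even split (with leftover bytes going to the first requests) is an assumption of this sketch rather than something the session prescribes.

```c
#include <stddef.h>

/* Hypothetical helper: divide isb_bytes across request_count pended sends.
 * Writes one size per request into sizes[] and returns 0. Returns -1 when
 * fewer than two requests are asked for (the guidance is two or more)
 * or when sizes is NULL. */
int split_send_backlog(size_t isb_bytes, size_t request_count, size_t *sizes)
{
    if (request_count < 2 || sizes == NULL)
        return -1;

    size_t base = isb_bytes / request_count;
    size_t extra = isb_bytes % request_count;  /* leftover bytes */

    /* The first `extra` requests carry one additional byte each,
     * so the sizes always add up to isb_bytes. */
    for (size_t i = 0; i < request_count; i++)
        sizes[i] = base + (i < extra ? 1 : 0);

    return 0;
}
```

A driver would presumably recompute the split whenever WskSendBacklogEvent reports a new ISB value, and apply the new sizes to the send requests it issues next.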

19 Transport Address Security
  - Secure by default: Creating a socket with a NULL SecurityDescriptor and binding it to an address results in SO_EXCLUSIVEADDRUSE behavior
  - Refrain from designing applications based on address sharing
  - If you must allow address sharing
    - May set SO_REUSEADDR to TRUE: Anybody else can reuse the address (not good from a security perspective)
    - May use a SecurityDescriptor: Sharing is allowed/denied based on an access check performed by the system
      - WSK (transport) uses the SecurityDescriptor specified by the first socket and the SECURITY_SUBJECT_CONTEXT captured from the OwningProcess and OwningThread specified by the second socket to perform the access check

20 Dual Family Sockets
  - Use a single IPv6 socket to handle both IPv6 and IPv4 traffic
    - Set the IPV6_V6ONLY option to FALSE (default is TRUE)
    - Bind to wildcard address
  - IPv4 addresses represented in V4MAPPED IPv6 address format
    - Can use the INETADDR_ISV4MAPPED macro from mstcpip.h to check if a given SOCKADDR represents a V4MAPPED address

    // Example dual family listening socket
    ULONG optVal = 0;
    ...
    status = dispatch->WskControlSocket(IPv6ListeningSocket, WskSetOption,
                                        IPV6_V6ONLY, IPPROTO_IPV6,
                                        sizeof(optVal), &optVal,
                                        0, NULL, NULL, irp);
    ...
    status = dispatch->WskBind(IPv6ListeningSocket, (PSOCKADDR)Ipv6WildcardAddress, 0, irp);
    ...
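
A V4MAPPED address has the form ::ffff:a.b.c.d: the first 80 bits are zero, the next 16 bits are all ones, and the last 32 bits carry the IPv4 address. The sketch below checks that layout on a raw 16-byte address in portable user-mode C. `is_v4mapped` and `v4mapped_to_ipv4` are hypothetical helpers for illustration only; a driver would use the INETADDR_ISV4MAPPED macro on the SOCKADDR instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* addr holds the 16 bytes of an IPv6 address in network byte order.
 * Returns true when the first 12 bytes match the ::ffff:0:0/96 prefix. */
bool is_v4mapped(const uint8_t addr[16])
{
    static const uint8_t prefix[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };
    return memcmp(addr, prefix, sizeof(prefix)) == 0;
}

/* Copies the embedded IPv4 address (the last 4 bytes) into out.
 * Returns false and leaves out untouched when addr is not V4MAPPED. */
bool v4mapped_to_ipv4(const uint8_t addr[16], uint8_t out[4])
{
    if (!is_v4mapped(addr))
        return false;
    memcpy(out, addr + 12, 4);
    return true;
}
```

On a dual-family socket this kind of check is what tells an IPv4 peer (for example ::ffff:192.0.2.1) apart from a native IPv6 peer.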

21 Call To Action
  - Port your existing kernel-mode TDI applications to WSK and use WSK for new development
  - Move from using TDI filter drivers to WFP for network traffic interception
  - Follow the practices outlined in this session to achieve optimal performance and stability from WSK

22 Additional Resources
  - Web Resources
    - Windows Network Developer Platform (WNDP) Team Blog
    - WNDP Team Connect Site
    - Join the ‘WNDP’ program at
  - Related Sessions
    - How to Use the Windows Filtering Platform to Integrate with Windows Networking
    - Using NDIS 6.0, TCP Chimney Offload, and RSS to Achieve High Performance Networking

23 © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

