Developing TCP Chimney Drivers for Windows 7 Joe Nievelt Vivek Bhanu Software Design Engineer TCP/IP - Networking

2 Developing TCP Chimney Drivers for Windows 7
Joe Nievelt (joeniev@microsoft.com)
Vivek Bhanu (vbhanu@microsoft.com)
Software Design Engineer, TCP/IP - Networking

3 Agenda
- Overview
- Architecture
- Chimney Offload Overview
- Requirements for Chimney Targets
- Windows Implementation Specifics
- High Performance Considerations
- Contacts / References

4 Overview
- Reduce server’s CPU utilization due to TCP for applications with long-lived connections
  - Fewer interrupts because TCP ACK packets are processed in the offload target
  - Zero copy receives for apps that pre-post receive buffers
- Host stack implements all network management protocols
- Administrator can control connections eligible/ineligible for offload
  - netsh [add | delete] [chimneyports | chimneyapplications]
- Administrator can view chimney statistics such as number of offloaded connections
  - netsh interface tcp show [chimneystats | chimneyports | chimneyapplications]
- Windows Server 2008 R2 focus
  - 10 Gb Ethernet
  - Stability
  - Logo clarification
  - Characterizing file and web server workload performance

5 Architecture
- Application: Existing binaries run over either software stack or hardware TCP
- Switch: Controls whether data transfer is through the host stack or the offload target stack
[Diagram labels: Application TCP Switch TCP Network Layer Framing Layer Offload Target]

6 Implementing Chimney Offload
- Register optional miniport handlers with NDIS
- Generic Chimney miniport handlers manage 3 types of state variables
  - Const: provided by stack and never change
  - Cached: provided by stack and updated through MiniportUpdateOffload
  - Delegated: initialized by stack and queried through MiniportQueryOffload
- TCP Chimney miniport handlers manage send and receive
  - Initiation of send and receive is serialized per connection
  - Sending/receiving are not serialized across separate connections

7 Offload Block List – Depth First Traversal

8 Application Interaction
- Most major network applications fall into three categories
- Pre-post
  - Application keeps receive requests outstanding
  - NIC may DMA directly to the posted buffer, avoiding a copy operation
  - Examples: Backup applications
  - Benefits the most from Chimney
- Indicate and post
  - Application waits for receive indication, may partially consume, then posts a receive for the remainder
  - Requires copy from indication buffer to posted receive buffer
  - Examples: SMB, iSCSI
- Indicate and consume
  - Application waits for receive indication, consumes entirely
  - Examples: legacy TDI applications
  - Benefits the least from Chimney

9 Requirements for Chimney Targets
- List of RFCs to implement
  - TCP: 793, 813, 1122, 1323, 2018, 2581, 2582, 2923, 2988, 3042, 3465, 3517
  - IP: 791, 894, 1042, 1191, 1122, 2461
  - Consult the Chimney WDK and Logo Requirements for the rest
- RFCs often provide multiple approaches to the same problem
  - E.g. Should a TCP zero window probe contain data? 793 & 1122 are ambiguous
- Use the Windows stack behavior as a guideline
  - Provide the same level of security and performance as Windows
  - Avoid interoperability problems with applications

10 Chimney & Receive-Side Scaling (RSS)
- RSS distributes receive indications across processors
  - Uses 4-tuple hash and indirection table to determine processor for an indication
  - Prevents a single processor from becoming a bottleneck
  - Processor load is not necessarily even across the system
- With Chimney, processing may still be bottlenecked on a single CPU
  - Applications may process traffic in the receive context
  - Applications that do their own load distribution incur context switching costs
- Chimney with RSS allows applications to scale
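The 4-tuple hash and indirection table lookup can be sketched as follows. This is a minimal illustration, assuming the Toeplitz hash commonly used for RSS, a TCP/IPv4 input ordered as source address, destination address, source port, destination port, and a power-of-two indirection table indexed by the low-order hash bits. The function names and the key are hypothetical; check the exact input ordering and key handling against the WDK.

```python
import socket
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash: XOR together the 32-bit key window at every set input bit."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than the input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                # Window of 32 key bits starting at this input bit position.
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_input_tcp_ipv4(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build the 12-byte hash input (assumed field order, network byte order)."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def select_cpu(indirection_table: list, hash_value: int) -> int:
    """Index the table with the low-order hash bits (table length is a power of two)."""
    size = len(indirection_table)
    if size == 0 or size & (size - 1):
        raise ValueError("indirection table length must be a power of two")
    return indirection_table[hash_value & (size - 1)]
```

Because the processor is chosen through the table rather than directly from the hash, the host can rebalance load by rewriting table entries without changing the hash function.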

11 Handling Indirection Table Updates with Chimney
- Each connection receives all indications on one processor at a time
- Indications may be in progress on one CPU as the connection is redirected to another CPU
  - Indicating to multiple processors at once creates timing conditions where reordering may occur
- Non-offloaded connections can tolerate out-of-order packet indications at a performance cost
- Offloaded connections cannot tolerate out-of-order receive indications or completions
  - Offload indications on the original processor must complete before beginning indications on the new processor
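The drain-before-switch rule can be sketched with a small per-connection state object. This is an illustration only: the class and method names are hypothetical, and a real offload target would also need to hold back or queue new indications so that the original processor actually drains under continuous traffic.

```python
class OffloadedConnectionSteering:
    """Tracks which CPU may receive indications for one offloaded connection."""

    def __init__(self, cpu: int):
        self.cpu = cpu            # CPU currently used for indications
        self.pending_cpu = None   # target requested by an indirection table update
        self.in_flight = 0        # indications started but not yet completed

    def retarget(self, new_cpu: int) -> None:
        """Called when the indirection table maps this connection to a new CPU."""
        if self.in_flight == 0:
            self.cpu = new_cpu
            self.pending_cpu = None
        else:
            # Defer the switch until outstanding indications complete.
            self.pending_cpu = new_cpu

    def begin_indication(self) -> int:
        """Returns the CPU on which the next indication must be made."""
        self.in_flight += 1
        return self.cpu

    def complete_indication(self) -> None:
        self.in_flight -= 1
        if self.in_flight == 0 and self.pending_cpu is not None:
            self.cpu = self.pending_cpu
            self.pending_cpu = None
```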

12 Chimney & Virtualization
- Chimney capabilities are exposed to child VMs in Windows 7
  - Existing drivers work without modification
  - Source MAC address will vary
  - Live migration is supported
- Collecting TCP and IP statistics per source MAC address
  - Improves manageability and diagnostics
  - Not a Logo requirement for 6.20 drivers
- Coexistence with virtual machine queue (VMQ)
  - Windows 7 will use only VMQ if both chimney and VMQ are available

13 Receive Window Auto-Tuning
- Default receive window size may limit connection throughput in high Bandwidth Delay Product situations
- Windows tries to make its RcvWnd at least as large as the peer’s CWnd so that it isn’t a bottleneck
- Vendors may implement any algorithm of their choice

14 Receive Window Auto-Tuning (contd.)
- Indicate the RcvWnd reported to the peer as part of upload
- Exclude the bytes buffered from the maximum window advertised
- Make sure a fine grained RTT estimate is reported in SRTT
- Avoid feedback loops in which RcvWnd restricts the sender unnecessarily
- Don’t shrink the RcvWnd right edge
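Two of these rules, excluding buffered bytes and never shrinking the right edge, can be combined in one small calculation. The sketch below is an assumption-laden illustration: the function and parameter names are hypothetical, and sequence-number wraparound is ignored for clarity.

```python
def advertised_window(max_window: int, bytes_buffered: int,
                      rcv_nxt: int, prev_right_edge: int) -> int:
    """Window to advertise so that rcv_nxt + window never moves left.

    max_window      -- current auto-tuned maximum receive window
    bytes_buffered  -- received bytes not yet consumed by the application
    rcv_nxt         -- next sequence number expected from the peer
    prev_right_edge -- rcv_nxt + window from the last advertisement
    """
    # Exclude buffered bytes from the maximum window advertised.
    desired = max(max_window - bytes_buffered, 0)
    # Space already promised to the peer must stay available.
    floor = max(prev_right_edge - rcv_nxt, 0)
    return max(desired, floor)
```

If the auto-tuning algorithm lowers `max_window` while data is buffered, the floor keeps the previously advertised right edge in place, and the window closes only as the peer sends into it.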

15 Windows Implementation Specifics
- Zero Window Probing
  - RFCs 793 / 1122 allow zero window probes to contain no data or one (fake) byte of new data which must be ignored by receivers
  - RFCs don’t mention a FIN being sent as part of a zero window probe
  - Windows generates zero window probes with one byte of new data and may generate one with the FIN flag set
  - Logo requires window probes with one byte of new data
- Retransmission Timeout
  - Windows offloads connections with scaled SRTT & RTTVAR values
    - SRTT is represented as 8xSRTT
    - RTTVAR is sent as 4xRTTVAR
  - Logo requires minimum value of 300ms for RTO
  - Logo requires maximum of 30s for RTT sample
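The scaled representation works well with integer arithmetic. The sketch below assumes millisecond integer units, the RFC 2988 smoothing gains (1/8 for SRTT, 1/4 for RTTVAR) and RTO = SRTT + 4xRTTVAR, and it omits the clock-granularity term. The function names are hypothetical; the 300ms floor and 30s sample cap are the Logo values listed above.

```python
MIN_RTO_MS = 300            # Logo minimum RTO
MAX_RTT_SAMPLE_MS = 30_000  # Logo maximum RTT sample


def rto_ms(srtt_x8: int, rttvar_x4: int) -> int:
    """RTO = SRTT + 4*RTTVAR, computed directly from the scaled state."""
    return max(MIN_RTO_MS, (srtt_x8 >> 3) + rttvar_x4)


def update_rtt_state(srtt_x8: int, rttvar_x4: int, sample_ms: int):
    """Fold one RTT sample into the scaled SRTT and RTTVAR values."""
    sample_ms = min(sample_ms, MAX_RTT_SAMPLE_MS)
    delta = sample_ms - (srtt_x8 >> 3)       # error against the old SRTT
    srtt_x8 += delta                         # SRTT = 7/8 SRTT + 1/8 sample
    rttvar_x4 += abs(delta) - (rttvar_x4 >> 2)  # RTTVAR = 3/4 RTTVAR + 1/4 |error|
    return srtt_x8, rttvar_x4
```

Keeping 8xSRTT and 4xRTTVAR means the smoothing updates need only additions and shifts, and 4xRTTVAR can be added to SRTT without a multiply.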

16 Windows Implementation Specifics (contd.)
- Silly Window Syndrome (SWS)
  - Chimney NICs must store the value of the largest window received
  - For performance reasons, Windows ignores SWS suppression if a sub-MSS segment would reach a push boundary
  - Logo requires that SWS be ignored if it can reach push boundary
- Black Hole Detection
  - Many black hole routers still out there
  - Logo requirement to support RFC 2923
- TCP ACK Frequency
  - RFC 1122 suggests sending ACK for every 2 segments, with a sub 500ms timeout
  - Windows allows the frequency to be configured, sending an ACK for every N packets, default N=2
  - Windows allows the timeout to be configured with 10ms granularity, default 200ms
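The ACK frequency and timeout settings interact as a small state machine: acknowledge every Nth segment, or when the delayed-ACK timer fires, whichever comes first. The following is a minimal sketch with hypothetical class and method names, using the defaults given above (N=2, 200ms, 10ms granularity).

```python
class DelayedAck:
    """Decides when a receiver should emit an ACK."""

    def __init__(self, ack_every: int = 2, timeout_ms: int = 200):
        if ack_every < 1:
            raise ValueError("ack_every must be at least 1")
        if timeout_ms <= 0 or timeout_ms % 10:
            raise ValueError("timeout must be a positive multiple of 10ms")
        self.ack_every = ack_every
        self.timeout_ms = timeout_ms
        self.pending = 0          # segments received since the last ACK
        self.deadline_ms = None   # when the delayed-ACK timer fires

    def _reset(self) -> None:
        self.pending = 0
        self.deadline_ms = None

    def on_segment(self, now_ms: int) -> bool:
        """Returns True if an ACK should be sent immediately."""
        self.pending += 1
        if self.pending >= self.ack_every:
            self._reset()
            return True
        if self.deadline_ms is None:
            self.deadline_ms = now_ms + self.timeout_ms
        return False

    def on_tick(self, now_ms: int) -> bool:
        """Returns True if the delayed-ACK timer has expired."""
        if self.deadline_ms is not None and now_ms >= self.deadline_ms:
            self._reset()
            return True
        return False
```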

17 Windows Implementation Specifics (contd.)
- Keep Alive (KA) Timer
  - Logo requires that duplicate data segments reset the KA timer
- Receive Window Updates
  - If delivery and ACK frequency overlap to generate ACK segments, consolidate them to reduce network traffic
- Appropriate Byte Counting
  - RFC 3465 specifies: CWnd += (BytesAcked >= CWnd) ? MSS : 0
  - Windows uses: CWnd += max((MSS * min(MSS * L, BytesAcked)) / CWnd, 1)
  - Logo accepts Windows CWnd calculation or the simpler RFC 3465 calculation
- Loss Recovery
  - RFC 2581 specifies: SsThresh = max(FlightSize / 2, 2*SMSS)
  - Windows uses: SsThresh = max(2*SMSS, min(CWnd, RcvWnd) / 2)
  - Logo requires the latter calculation
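The four formulas above translate directly into code. This sketch assumes byte units and integer (truncating) division, which the slide does not specify; the function names are hypothetical, and `limit_l` stands for the RFC 3465 limit L.

```python
def cwnd_increment_rfc3465(cwnd: int, bytes_acked: int, mss: int) -> int:
    """CWnd += (BytesAcked >= CWnd) ? MSS : 0, as written on the slide."""
    return mss if bytes_acked >= cwnd else 0


def cwnd_increment_windows(cwnd: int, bytes_acked: int, mss: int, limit_l: int) -> int:
    """CWnd += max((MSS * min(MSS * L, BytesAcked)) / CWnd, 1)."""
    return max((mss * min(mss * limit_l, bytes_acked)) // cwnd, 1)


def ssthresh_rfc2581(flight_size: int, smss: int) -> int:
    """SsThresh = max(FlightSize / 2, 2*SMSS)."""
    return max(flight_size // 2, 2 * smss)


def ssthresh_windows(cwnd: int, rcv_wnd: int, smss: int) -> int:
    """SsThresh = max(2*SMSS, min(CWnd, RcvWnd) / 2)."""
    return max(2 * smss, min(cwnd, rcv_wnd) // 2)
```

For example, with MSS = 1460, CWnd = 14600, L = 2 and an ACK covering 2920 bytes, the Windows formula grows CWnd by 292 bytes, while the RFC 3465 form on the slide adds nothing until a full CWnd of data has been acknowledged.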

18 Windows Implementation Specifics (contd.)
- TCP Reassembly
  - Many caveats around reassembly
    - Conflicts with out of order data
    - Conflicts with FIN
  - Possible resource constraints around reassembly holes
    - Must support at least 2 reassembly holes
    - Must be prepared to extend holes in either direction and coalesce
- Generation of RESET segment
  - RST is generated by Windows for:
    - In window SYN
    - Expiration of FIN_WAIT_2 timer
    - RexmitCount expiry
    - Others
  - Chimneys must generate RST only if the application aborted the connection
  - Upload the connection in the other cases
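Extending and coalescing can be illustrated by tracking the out-of-order byte ranges that have arrived; the holes are the gaps before each range. This is a simplified sketch with hypothetical function names, using half-open `(start, end)` ranges and ignoring sequence-number wraparound and FIN handling.

```python
def add_segment(ranges: list, start: int, end: int) -> list:
    """Insert [start, end) and merge it with any overlapping or adjacent range."""
    merged = []
    for s, e in ranges:
        if e < start or s > end:
            merged.append((s, e))   # disjoint and not touching: keep as is
        else:
            # Overlaps or touches: grow the new range in either direction.
            start, end = min(start, s), max(end, e)
    merged.append((start, end))
    merged.sort()
    return merged


def count_holes(rcv_nxt: int, ranges: list) -> int:
    """Each stored range beyond rcv_nxt is preceded by one hole."""
    return sum(1 for s, _ in ranges if s > rcv_nxt)
```

A target that supports only a fixed number of holes can check `count_holes` before accepting a segment that would create a new one, and fall back to dropping the segment or uploading the connection.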

19 Timer Implementation
- Accurate timers are needed to minimize the difference between uploaded and offloaded connections
- Accurate timers also improve round trip time and bandwidth calculations
- Logo requirement
  - Resolution must be 10ms or better
  - Must not drift significantly compared to typical CPU timers

20 Call to Action
- Implement per-source MAC address statistics for virtualization
- Support receive-side scaling with TCP Chimney
- Implement receive window auto-tuning
- Design and develop with Logo requirements in mind

21 Resources
- Windows Server 2008 Chimney WDK and Logo Kit: http://connect.microsoft.com
- Windows 7 WDK will be available as of WinHEC
- Windows Logo Program Web Site: http://www.microsoft.com/whdc/winlogo/default.mspx
- NDIS 6 Feedback alias: ndis6fb@microsoft.com
- Test and Logo questions: offloadt@microsoft.com

22 Related Sessions
Session | Day / Time
Implementing Efficient RSS Capable Hardware and Drivers for Windows 7 | Tues. 1:30-2:30
Windows Logo Program Tests for NDIS | Mon. 11-12 and Wed. 9:45-10:45
Driver Scalability | Mon. 11-12 and Tues. 11-12
Virtual Machine Queue Architecture Review | Tues. 2:45-3:45
Virtual Machine Queue Driver Development | Tues. 4-5
Ask the Experts | Tues. evening

