1 Data Transmissions in TCP Dr. Rocky K. C. Chang 17 October 2006.

1 Data Transmissions in TCP
Dr. Rocky K. C. Chang
17 October 2006

2 TCP sliding window protocol
• Classical TCP employs a sliding window protocol with positive acknowledgment and without selective repeat.
  - It recovers lost data and performs congestion and flow control.
• Failure to receive ACKs within a timeout period may be due to
  - Data/ACKs dropped by intermediate routers or end hosts due to errors, or
  - Data/ACKs dropped by intermediate routers due to congestion, or

3 TCP sliding window protocol
• Possible causes of missing ACKs (continued):
  - Data/ACKs dropped by end hosts due to a lack of buffer space (overflow)
  - Packet reordering
• The size of the sender's sliding window
  - determines the rate of sending segments, and
  - is determined jointly by the sender and receiver.
• Max. throughput = min{(SND_WND * 8)/RTT, B}
  - SND_WND is the sender window's size in bytes.
  - B is the network bandwidth in bits/second.
  - RTT is the round-trip time in seconds.
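
The throughput bound above can be checked with a few lines of Python. This is only a worked example of the slide's formula; the function name and the sample numbers (a 65535-byte window, a 100 ms RTT, a 100 Mbit/s link) are assumptions chosen for illustration.

```python
def max_throughput_bps(snd_wnd_bytes: int, rtt_seconds: float, bandwidth_bps: float) -> float:
    """Upper bound on throughput: min{(SND_WND * 8) / RTT, B}, in bits/second."""
    # At most one window of data can be in flight per round trip.
    window_limited_bps = snd_wnd_bytes * 8 / rtt_seconds
    return min(window_limited_bps, bandwidth_bps)


# Hypothetical numbers: the window, not the link, is the bottleneck here.
limit = max_throughput_bps(snd_wnd_bytes=65535, rtt_seconds=0.1, bandwidth_bps=100e6)
print(f"about {limit / 1e6:.2f} Mbit/s")
```

With these numbers the window term is roughly 5.24 Mbit/s, far below the link rate, which is why a larger window is needed to fill a path with a large bandwidth-delay product.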

4 TCP sliding window protocol
[Figure: sender and receiver; labels: ACK, 1st byte of data]

5 TCP buffering
[Figure: sending side: application data, application buffer, application segmentation, socket send buffer (kernel), TCP segmentation (segments not larger than MSS); receiving side: socket receive buffer (kernel), application buffer, application data]

6 Send sequence space
• Each segment written to the socket send buffer can be in any of the following states:
  - Sent and acknowledged (removed from buffers)
  - Sent and unacknowledged
  - Can be sent immediately
  - Cannot be sent until the window moves
• Use three variables:
  - SND_WND: size of the send window
  - SND_UNA: oldest unacknowledged SN
  - SND_NXT: SN of the next segment to be sent

7 Send sequence space
• Assume here that the sender's window is determined only by the receiver's offered window size.
• An acceptable ACK is one for which SND_UNA ≤ AN ≤ SND_NXT.
  - AN = SND_UNA is a duplicate ACK.
• When a segment is retransmitted, SND_NXT is set to an older value.
• What is the condition for "all segments have been acknowledged"?
  - The condition is SND_NXT = SND_UNA.
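
The ACK test above can be sketched in Python. This is a minimal illustration, not taken from any TCP implementation: the function names are hypothetical, and the modulo arithmetic is added on the assumption that sequence numbers are 32-bit values that wrap around.

```python
MOD = 2 ** 32  # assumption: 32-bit sequence numbers that wrap around


def classify_ack(an: int, snd_una: int, snd_nxt: int) -> str:
    """Classify acknowledgment number AN against SND_UNA and SND_NXT."""
    outstanding = (snd_nxt - snd_una) % MOD  # bytes sent but not yet acknowledged
    offset = (an - snd_una) % MOD            # how far AN lies beyond SND_UNA
    if offset > outstanding:
        return "unacceptable"  # AN is outside [SND_UNA, SND_NXT]
    if offset == 0:
        return "duplicate"     # AN = SND_UNA acknowledges nothing new
    return "new"               # SND_UNA < AN <= SND_NXT


def all_acked(snd_una: int, snd_nxt: int) -> bool:
    """Everything sent has been acknowledged when SND_NXT = SND_UNA."""
    return snd_nxt == snd_una


print(classify_ack(1500, 1000, 2000))  # new
print(classify_ack(1000, 1000, 2000))  # duplicate
print(classify_ack(2500, 1000, 2000))  # unacceptable
```

Comparing offsets from SND_UNA rather than raw values keeps the check correct when the window straddles the wraparound point.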

8 Send sequence space
[Figure: send sequence space for segments 1 to 9, with markers SND_UNA and SND_NXT and the extent of SND_WND (advertised by the receiver); regions: sent and acked, sent and unacked, can be sent ASAP, wait for the window]

9 Receive sequence space
• Use two variables:
  - RCV_WND: size of the receive window
  - RCV_NXT: SN of the next segment to be received
• The receiver considers a received segment valid if all the data in the segment fits in the receive window:
  - RCV_NXT ≤ beginning SN of segment < RCV_NXT + RCV_WND, and
  - RCV_NXT ≤ ending SN of segment < RCV_NXT + RCV_WND.
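
The validity test above can be sketched as follows. The helper names are hypothetical, the segment is assumed to carry at least one byte, and the modulo arithmetic assumes 32-bit sequence numbers that wrap around.

```python
MOD = 2 ** 32  # assumption: 32-bit sequence numbers that wrap around


def in_receive_window(sn: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if RCV_NXT <= sn < RCV_NXT + RCV_WND, allowing for wraparound."""
    return (sn - rcv_nxt) % MOD < rcv_wnd


def segment_valid(seg_start: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """A segment is valid if both its first and last byte fall in the window."""
    seg_end = (seg_start + seg_len - 1) % MOD  # SN of the last byte (seg_len >= 1)
    return (in_receive_window(seg_start, rcv_nxt, rcv_wnd)
            and in_receive_window(seg_end, rcv_nxt, rcv_wnd))


print(segment_valid(1000, 500, rcv_nxt=1000, rcv_wnd=4096))  # True: fits entirely
print(segment_valid(900, 500, rcv_nxt=1000, rcv_wnd=4096))   # False: starts before RCV_NXT
print(segment_valid(5000, 500, rcv_nxt=1000, rcv_wnd=4096))  # False: ends past the window
```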

10 Receive sequence space
• An ACK may be sent when RCV_NXT = beginning SN of a received segment.
[Figure: receive sequence space for segments 1 to 9, with marker RCV_NXT and the extent of RCV_WND (advertised to the sender); regions: acked, future SNs]

11 A processing sequence
• When a TCP receiver is in the ESTABLISHED state, it processes a segment in the following order:
  1. Check the SN.
  2. Check the RST bit.
  3. Check the security and precedence.
  4. Check the SYN bit.
  5. Check the AN.
  6. Check the URG bit.
  7. Process the segment text.
  8. Check the FIN bit.

12 Sequence number and max window size
• Given an SN space, what is the maximum window size? Given a maximum window size, what is the smallest SN space?
• The SN wraparound problem
• Take the simplest case: let the maximum window size be 1.

13 Acknowledgment strategies
• Send an ACK for every segment received (RFC 793).
  - Cumulative acknowledgments
  - When an out-of-order segment is received, send an ACK with AN = RCV_NXT (a duplicate ACK).
• Delayed acknowledgment (RFC 1122)
  - Give the application an opportunity to update the window and perhaps to send a response.
  - In remote login, a delayed ACK can reduce the number of segments by a factor of 3 (ACK, window update, and echo character combined into one segment).

14 Delayed acknowledgements
• However, excessive delays on ACKs can disturb the round-trip timing and packet "clocking" algorithms.
• Guidelines in RFC 1122:
  - In a stream of MSS-sized segments, there should be an ACK for at least every second segment.
  - An acknowledgment should not be delayed for more than 500 ms (delayed ACK timer).
• Newer systems use 200 ms instead (the actual delay is anywhere between 0 and 200 ms).

15 Selective acknowledgements (SACKs)
• When multiple segments are lost, the sender must either wait a round-trip time to find out about each lost segment, or unnecessarily retransmit segments that have been correctly received.
• SACK allows a receiver to acknowledge noncontiguous blocks of segments to the sender.
  - The SACK option does not change the meaning of AN in the TCP header.

16 Selective acknowledgements (SACKs)
• SACKs are implemented in two TCP options.
  - SACK-Permitted option, sent in a SYN segment.
  - SACK option, sent in data segments.
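
To make "noncontiguous blocks" concrete, the sketch below merges the out-of-order byte ranges a receiver holds into the blocks it could report. It is an illustration only: the function name and the half-open (start, end) representation are assumptions, sequence-number wraparound is ignored, and the encoding of the actual SACK option is not modeled.

```python
def sack_blocks(received: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge out-of-order byte ranges (start, end), end exclusive, into blocks.

    Assumption: every range lies above RCV_NXT, so none of it is covered
    by the cumulative AN.
    """
    blocks: list[tuple[int, int]] = []
    for start, end in sorted(received):
        if blocks and start <= blocks[-1][1]:
            # Touches or overlaps the previous block: extend it.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks


# Bytes 1000-1999 and 4000-4999 are missing; AN stays at 1000.
print(sack_blocks([(3000, 4000), (2000, 3000), (5000, 6000)]))
# [(2000, 4000), (5000, 6000)]
```

The cumulative AN still points at the first missing byte, so the blocks only add information and do not change what AN means.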

17 Retransmissions and repacketization
• A sender may retransmit the segment starting with SN = SND_UNA:
  - upon retransmission timeout, or
  - upon receiving the third duplicate ACK (fast retransmission).
• When a retransmission takes place, the retransmitted segment may also include other segments.
  - Linux 2.2-12 does not repacketize old segments with new segments, but it repacketizes old segments with old segments.
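
The fast-retransmission trigger can be sketched as a small counter. The class and method names are hypothetical, sequence-number wraparound is ignored for brevity, and the timeout path is not modeled.

```python
class DupAckCounter:
    """Counts duplicate ACKs and signals when fast retransmission should fire."""

    def __init__(self, snd_una: int) -> None:
        self.snd_una = snd_una
        self.dup_acks = 0

    def on_ack(self, an: int) -> bool:
        """Return True exactly when the third duplicate ACK arrives."""
        if an == self.snd_una:
            self.dup_acks += 1
            return self.dup_acks == 3
        if an > self.snd_una:
            # New data acknowledged: advance SND_UNA and start counting afresh.
            self.snd_una = an
            self.dup_acks = 0
        return False


counter = DupAckCounter(snd_una=1000)
print([counter.on_ack(1000) for _ in range(3)])  # [False, False, True]
```

When `on_ack` returns True, the sender would retransmit the segment starting at SND_UNA without waiting for the retransmission timer.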

18 Retransmissions and timeouts
• BSD uses a coarse-grain timer for TCP's six timers.
  - The coarse-grain timer ticks every 500 ms.
  - TCP timers: connection-establishment, retransmission, persist, keepalive, FIN_WAIT, TIME_WAIT
• The retransmission timer is bounded between 1 and 64 seconds and is a function of the round-trip time estimate.
  - The actual timeout also depends on when the timer is started relative to the ticks of the coarse-grain timer.

19 Estimating the RTT
• Problem: How does a TCP sender determine its timeout value?
  - Overestimating the timeout value delays the retransmission.
  - Underestimating the timeout value injects duplicate packets into the network.
• TCP uses an adaptive retransmission algorithm to accommodate varying delays in the Internet:
  - A TCP sender monitors the RTT, using either coarse-grain or fine-grain measurements.
  - Exponential backoff (will be discussed later)

20 RTT measurements and timeout
• Given a new RTT measurement M, TCP updates an estimate of the average RTT by R ← αR + (1 − α)M.
  - α is a filter gain constant (0 < α < 1) that sets how much weight the old estimate keeps; the new measurement contributes with weight 1 − α.
  - α is usually set to 0.9.
• The timeout value RTO is set to βR.
  - β accounts for the variation in the RTT.
  - β is usually set to 2.
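
A small numeric sketch of the update rule, using the slide's usual values α = 0.9 and β = 2. The function names and the sample measurements are assumptions for illustration.

```python
def update_rtt_estimate(r: float, m: float, alpha: float = 0.9) -> float:
    """R <- alpha * R + (1 - alpha) * M."""
    return alpha * r + (1 - alpha) * m


def retransmission_timeout(r: float, beta: float = 2.0) -> float:
    """RTO = beta * R."""
    return beta * r


# Hypothetical measurements in seconds, starting from an estimate of 1.0 s.
r = 1.0
for m in [1.0, 2.0, 2.0, 2.0]:
    r = update_rtt_estimate(r, m)
    print(f"M={m:.1f}  R={r:.3f}  RTO={retransmission_timeout(r):.3f}")
```

Because each new measurement carries only weight 0.1, R moves slowly toward 2.0 after the RTT doubles, which shows how sluggishly this estimator reacts to a sudden change.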

21 RTT measurements and timeout
[Figure from [1]]

22 A better estimator
• Estimate the variation in the RTT by D ← αD + (1 − α)|R − M|.
  - A mean deviation is used instead of the standard deviation to avoid integer overflow due to multiplication.
  - The mean deviation is also more conservative than the standard deviation.
• The timeout value is now given by RTO = R + 2D or R + 4D.
• How does the initialization of the parameters affect the estimator?
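
The estimator with a deviation term can be sketched as below. Several details are assumptions not stated on the slide: the same gain α = 0.9 is used for both R and D, the deviation is computed against the old R before R is updated, and the initial values are supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class RttEstimator:
    r: float             # smoothed RTT estimate R
    d: float             # smoothed mean deviation D
    alpha: float = 0.9   # filter gain (assumed equal for R and D)
    k: int = 4           # RTO = R + k * D, with k = 2 or 4

    def update(self, m: float) -> float:
        """Fold in a new measurement M and return the new RTO."""
        # Assumption: the deviation uses the old R, then R itself is updated.
        self.d = self.alpha * self.d + (1 - self.alpha) * abs(self.r - m)
        self.r = self.alpha * self.r + (1 - self.alpha) * m
        return self.r + self.k * self.d


est = RttEstimator(r=1.0, d=0.0)
for m in [1.0, 2.0, 1.0, 2.0]:
    print(f"M={m:.1f}  RTO={est.update(m):.3f}")
```

Starting with D = 0 makes the first RTO equal to R, which is one way to see the slide's question about initialization: a poor initial D leaves no safety margin until a few measurements have arrived.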

23 A better estimator
[Figure from [1]]

24 Silly window syndrome (RFC 813)
• SWS problem: "a stable pattern of small incremental window movements."
  - The sender window moves by a very small amount.
  - The sender is forced to send small segments (smaller than MSS).
  - SWS can only occur during the transmission of a large amount of data.

25 Sender-side SWS and Nagle algorithm
• For example, the sender window size = 4*MSS.
  - After sending 3 MSS-sized segments, the sender has only 0.5*MSS of data to send.
  - Shortly after, the sender sends another 0.5*MSS of data.
  - When the ACK for the first 0.5*MSS of data returns, the sender can only send 0.5*MSS, instead of an MSS-sized segment.

26 Sender-side SWS and Nagle algorithm
• Nagle algorithm (RFC 896)
  - If a TCP sender has less than an MSS-sized segment to transmit, and if any previous segment has not yet been acknowledged, do not transmit the segment.
  - Open-loop congestion avoidance mechanism
• Nagle's algorithm needs to be turned off for some applications, e.g., the X Window System and transaction-based applications.
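
The Nagle rule reduces to a short predicate, sketched below with hypothetical names and on the assumption that the send window already permits the transmission. The second function shows the usual way an application turns the algorithm off, through the standard `TCP_NODELAY` socket option.

```python
import socket


def nagle_allows_send(pending_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Nagle rule: hold back a small segment while earlier data is unacknowledged."""
    if pending_bytes >= mss:
        return True            # a full-sized segment may always be sent
    return unacked_bytes == 0  # a small segment only when nothing is outstanding


def disable_nagle(sock: socket.socket) -> None:
    """Turn off the Nagle algorithm on a TCP socket."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


print(nagle_allows_send(pending_bytes=100, mss=1460, unacked_bytes=1460))   # False
print(nagle_allows_send(pending_bytes=100, mss=1460, unacked_bytes=0))      # True
print(nagle_allows_send(pending_bytes=1460, mss=1460, unacked_bytes=1460))  # True
```

An interactive or request/response application would call `disable_nagle` on its connected socket so that small writes leave immediately.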

27 Receiver-side SWS and delayed ACK
• The sender window can also be advanced incrementally when the receiver sends ACKs too frequently and/or increases the offered window size by small amounts.
• Receiver-side SWS solutions:
  - Delayed acknowledgment (probably with a new window update).
  - Send a window update only if the window could advance by a "significant amount."
• E.g., 35% of the receive buffer size or 2*MSS.
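
The window-update rule can be sketched as a threshold check. The function and parameter names are hypothetical, and reading the slide's "35% of the receive buffer size or 2*MSS" as "whichever threshold is reached first" is an assumption.

```python
def should_send_window_update(new_window: int, advertised_window: int,
                              rcv_buf_size: int, mss: int,
                              fraction: float = 0.35) -> bool:
    """Advertise a larger window only if it grows by a significant amount."""
    increase = new_window - advertised_window
    # Assumption: either threshold from the slide is enough on its own.
    return increase >= fraction * rcv_buf_size or increase >= 2 * mss


# Hypothetical receiver with an 8192-byte buffer and a 1460-byte MSS.
print(should_send_window_update(500, 0, rcv_buf_size=8192, mss=1460))   # False
print(should_send_window_update(2900, 0, rcv_buf_size=8192, mss=1460))  # True
```

Suppressing the small update in the first call is what keeps the sender from being offered, and then filling, a tiny window.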

28 Temporary deadlocks
• Temporary deadlocks can result from an interaction between the Nagle algorithm and the receiver-side SWS algorithms.
  - The Nagle algorithm prevents the sender from sending more data.
  - The delayed ACK algorithm and window update algorithm prevent the receiver from sending ACKs and window updates.
• For example, the send window = 2*MSS and the data passed to the TCP socket buffer is slightly less than 4*MSS.

29 Temporary deadlocks
• S-->R: 2 MSS-sized segments, then stop (the window is full).
• R-->S: 1 ACK for the 2 segments (based on an ACK for every other MSS-sized segment).
• S-->R: 1 MSS-sized segment, then stop (due to the Nagle algorithm).
• R-->S: No ACK or window update is sent immediately after receiving the 3rd MSS-sized segment (due to the receiver-side SWS algorithms).
• R-->S: An ACK is sent after 200 ms, when the delayed ACK timer fires.

30 Temporary deadlocks
• S-->R: After receiving the ACK, send the last non-MSS-sized segment.
• The total time required is 3*RTT + 200 ms, instead of 2*RTT.
• Similar temporary deadlocks can occur when there is application buffer tearing, the socket send buffer is not large enough, and the MTU is too large.

31 Zero advertised window
• Problem: A deadlock occurs when segment 9 is lost or corrupted.
  - ACKs are not reliable.
[Figure: sender/receiver time line; labels: segments 4 to 9, win 0, win 4096, 1024]

32 Persist timer
• Solution: A sender uses a persist timer to periodically send a window probe when the receive window closes.
  - The probe interval backs off exponentially until it reaches a limit, say 2 minutes.
  - A window probe is then sent every 2 minutes until the window opens or the application on either side closes the connection.
• The window probe contains 1 byte of data.
• TCP is always allowed to send 1 byte of data beyond the end of a closed window.
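
The probe schedule can be sketched as a capped doubling sequence. The 2-minute limit comes from the slide; the 5-second initial interval and the function name are assumptions for illustration.

```python
def persist_intervals(initial_s: float = 5.0, limit_s: float = 120.0,
                      count: int = 8) -> list[float]:
    """Intervals between successive window probes: doubling, capped at limit_s."""
    intervals = []
    interval = initial_s
    for _ in range(count):
        intervals.append(min(interval, limit_s))
        interval = min(interval * 2, limit_s)  # exponential backoff with a ceiling
    return intervals


print(persist_intervals())
# [5.0, 10.0, 20.0, 40.0, 80.0, 120.0, 120.0, 120.0]
```

Unlike the retransmission timer, this schedule never gives up: once the cap is reached, probes continue at that interval for as long as the window stays closed.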

33 An idle TCP connection
• If neither process at the ends of a TCP connection is sending data, nothing is exchanged between the two processes.
  - Assume that the application protocol that uses TCP does not detect inactivity.
  - If a router or a link between them goes down and is restored later on, can the two ends still use the connection?
• A keepalive timer is (normally) used by a server to find out whether a client has crashed and is still down, or has crashed and rebooted.

34 Keepalive timer
• If there is no activity on a TCP connection for 2 hours, the server sends a probe segment to the client.
  - If the client is up, it responds to the probe.
  - If the client has crashed and is still down, the server times out (after 75 sec) and resends the probe (every 75 sec) a number of times (10).
  - If the client has crashed and rebooted, the client responds by sending a RESET segment.
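
With the slide's numbers, the time a server needs to declare a crashed client dead can be worked out directly. The function name is hypothetical, and the calculation assumes the server gives up when the tenth probe, sent at 75-second intervals, also times out.

```python
def keepalive_give_up_time(idle_s: int = 2 * 60 * 60,
                           probe_interval_s: int = 75,
                           max_probes: int = 10) -> int:
    """Seconds from the last activity until the server abandons the connection.

    Assumption: the first probe goes out after idle_s, each probe is given
    probe_interval_s to be answered, and max_probes probes are sent in total.
    """
    return idle_s + max_probes * probe_interval_s


total = keepalive_give_up_time()
print(total, "seconds, i.e.", total / 60, "minutes")  # 7950 seconds, i.e. 132.5 minutes
```

The idle period dominates: a dead client is detected about 12.5 minutes after the first probe, but only after two hours of silence.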

35 Summary
• Once in the ESTABLISHED state, TCP uses a sliding window protocol to control the transmission rate and recover lost segments.
• TCP employs a cumulative ACK strategy with an optional SACK scheme.
• Retransmissions take place upon timeouts, which are functions of the RTT estimates.
• Special care is taken to ensure that the sender window does not advance in small increments.

36 Summary
• Temporary deadlocks can occur when the Nagle algorithm interacts with the delayed ACK and window update algorithms.
• Special care is also taken for special circumstances, such as a zero window advertisement and a client crash before the connection is terminated properly.

37 References
1. Requirements for Internet Hosts -- Communication Layers (RFC 1122).
2. Van Jacobson, "Congestion avoidance and control," Proc. SIGCOMM, vol. 18, no. 4, Aug. 1988.
3. J. Mogul and G. Minshall, "Rethinking the TCP Nagle Algorithm," ACM Computer and Commun. Review, Jan. 2001.

