Presentation on theme: "TCP Variations: Tahoe, Reno, New Reno, Vegas, Sack Paul D. Amer, Professor Computer & Information Sciences University of Delaware."— Presentation transcript:
TCP Variations: Tahoe, Reno, New Reno, Vegas, Sack Paul D. Amer, Professor Computer & Information Sciences University of Delaware
What are TCP Variations? Implementations of TCP that use different algorithms to achieve end-to-end congestion control. – Tahoe – Reno – NewReno – Vegas
Evolution of TCP TCP & IP RFC 793 & TCP described by Vint Cerf, Bob Kahn In IEEE Trans Comm 1983 BSD Unix 4.2 supports TCP/IP 1984 Nagel’s algorithm to reduce overhead of small packets; predicts congestion collapse 1987 Karn’s algorithm to better estimate round-trip time 1986 Congestion collapse 1 st observed BSD Reno fast recovery delayed ACK’s 1975 Three-way handshake Ray Tomlinson In SIGCOMM Van Jacobson’s algorithms slow start, congestion avoidance, fast retransmit (all implemented in 4.3BSD Tahoe) SIGCOMM 88
Evolution of TCP ECN Explicit Congestion Notification (Floyd) 1993 TCP Vegas(not implemented) real congestion avoidance (Brakmo et al) 1994 T/TCP Transaction TCP (Braden) 1996 NewReno modified fast recovery SACK TCP Selective Ack (Floyd et al) 1996 Improving TCP startup (Hoe) 1996 FACK TCP Forward Ack extension to SACK (Mathis et al)
How did TCP cause Congestion? (original recipe TCP) Poor Efficiency In telnet-like applications, TCP sends 1 byte of data with 4000% overhead. Sending too much, too soon Unnecessary retransmits Send window too big Very little change in behavior due to congestion
Self-clocking or ACK Clock PrPr PbPb ArAr AbAb Receiver Sender AsAs Self-clocking systems tend to be very stable under a wide range of bandwidths and delays. The principal issue with self-clocking systems is getting them started.
TCP Algorithms: Four intertwined algorithms used commonly in TCP implementations Slow Start - Every ack increases the sender’s window (cwnd) size by 1 Congestion Avoidance - Reducing sender’s window size by half at experience of loss, and increase the sender’s window at the rate of about one packet per RTT (NOTE: not per ack) Fast Retransmit - Don’t wait for retransmit timer to go off, retransmit packet if 3 duplicate acks received Fast Recovery - Since duplicate ack came through, one packet has left the wire. Perform congestion avoidance, don’t jump down to slow start
t=xr cwnd=4 t=(x+1)r cwnd=5 t=(x+2)r cwnd=6 t=(x+3)r cwnd=7 Congestion Avoidance Packet Ack TCP Receiver TCP Sender new cwnd = old cwnd + # acks/cwnd After cwnd reaches a certain threshold
assume ssthresh = 8*MSS Example: Slow Start/Congestion Avoidance cwnd = 10 cwnd = 4 Eight ACKs cwnd = 2 cnwd = 8 cwnd = 1 cwnd = 9 Eight TCP-PDUs nineACKs nine TCP-PDUs ten ACKs ten TCP-PDUs cwnd = 11 ssthresh S R
Slow Start & Congestion Avoidance ssthresh Initally: - cwnd = 1*MSS - ssthresh = very high If a new ACK comes: -if cwnd < ssthresh update cwnd according to slow start -if cwnd > ssthresh update cwnd according to congestion avoidance -If cwnd = ssthresh either If timeout (i.e. loss) : - ssthresh = flight size/2; - cwnd = 1*MSS time cwnd Loss, e.g. timeout slow start – in green congestion avoidance – in blue (initial) ssthresh
t=xr t=(x+1)r Fast Retransmit Packet Ack At a random point in the transfer pkt 15 pkt 16 pkt 17 pkt 18 pkt 19 pkt 20 ack 15 ack 16 pkt 17 (retransmit) ack 16 3 dup acks TCP Receiver TCP Sender Fast Retransmit
TCP Tahoe’s Fast Retransmit 1.Sender receives 3 dupACKS. 2.Sender infers that the segment is lost. 3.Sender re- sends the segment immediately! 4.Sender returns to slow-start. ACK 1 segment 1 cwnd = 1 cwnd = 2 segment 2 segment 3 ACK 3 cwnd = 4 segment 4 segment 5 segment 6 segment 7 ACK 2 3 duplicate ACKs ACK 3 segment 4 fast-retransmit of segment 4 SR
TCP Variants : TCP-Tahoe: – implements the slow start, congestion avoidance, and fast retransmit algorithms TCP-Reno: – implements the slow start, congestion avoidance, fast retransmit, and fast recovery algorithms Among other implementations are Vegas, NewReno (the most commonly implemented on webservers today, according to a survey) and SACK TCP.
TCP Tahoe Trace (with one dropped segment) Lost segment Fast Retransmit Begin slow-start Begin congestion avoidance
TCP Tahoe Trace (with one dropped segment)
Could Tahoe be Improved? Receipt of dupACKs tells the sender that the receiver is still getting new segments, i.e. there is still data flowing between sender and receiver Why does sender go back to slow start after fast retransmit? Why does sender let Ack clock die?
TCP Variation: TCP Reno 2nd Improvement was TCP Reno (1990) – Nagle’s algorithm – Improved RTO calculation and back-off – AIMD congestion avoidance with slow-start – Fast retransmit & fast recovery
Fast Recovery cwnd Slow StartCongestion Avoidance Time “inflating” cwnd with dupACKs “deflating” cwnd with a new ACK (initial) ssthresh new ACK fast-retransmit new ACK timeout Concept: After fast retransmit, reduce cwnd by half, and continue sending segments at this reduced level. Problems: Sender has too many outstanding segments. How does sender transmit packets on a dupACK? Need to use a “trick” - inflate cwnd.
After receiving 3 dupACKS: 1.Retransmit the lost segment. 2.Set ssthresh = flight size/2. 3.Set cwnd = ssthresh, and ndupacks = 3. N.B. In Reno: send_win = min ( rwnd, cwnd + ndupacks ). If dupACK arrives: – ++ ndupacks – Transmit new segment, if allowed. If new ACK arrives: – ndupacks = 0 – Exit fast recovery. If RTO expires: – ndupacks = 0 – Perform slow-start - ( ssthresh = flight size/2, cwnd = 1 ) Fast Retransmit & Fast Recovery
TCP Reno Trace (with one dropped segment)
TCP Tahoe & Reno Trace (with two dropped segments)
What if There are Multiple Losses in a Window? With two losses in a window, Reno will occasionally timeout. With three losses in a window, Reno will usually timeout. With four losses in a window, Reno is guaranteed to timeout! With three or more losses in a window, Tahoe typically out performs Reno!
TCP Reno Trace (with two dropped segments)
TCP Variation: TCP NewReno 3rd Improvement was TCP NewReno (1995) – Nagle’s algorithm – Improved RTO calculation and back-off – AIMD congestion avoidance with slow-start – Fast retransmit & modified fast recovery
Modifications to Fast Recovery – Partial ACKs: An ACK that acknowledges some but not all the segments that were outstanding at the start of fast recovery. NewReno interprets this as an indication of multiple loss. – If partial ACK received, re-transmit the next lost segment immediately and set ndupacks = 0 (deflate send_win). – Sender remains in fast recovery until all data outstanding when fast recovery was initiated is ACK’ed. Additional dupACK’s increase ndupacks.
TCP NewReno Trace (with two dropped segments)
Tahoe, Reno & NewReno Trace (with two dropped segments)
Is There a Better Way? The only way Tahoe, Reno and NewReno can detect congestion is by creating congestion! – They carefully probe for congestion by slowly increasing their sending rate. – When they find (create), congestion, they cut sending rate at least in half! This slow advance and rapid retreat approach results in a saw-toothed sending rate, and highly erratic throughput. What if TCP could detect congestion without causing congestion?
TCP Variation: TVP Vegas (True Congestion Avoidance) Introduced by Brakmo and Peterson (1994) Three changes to TCP Reno – Modified congestion avoidance Don’t wait for a timeout, if actual throughput < expected throughput decrease the congestion window. (AIAD!) – New retransmission mechanism motivation: what if sender never receives 3-dupACKs (due to lost segments or window size is too small.) mechanism: sender does retransmission after a dupACK received, if RTT estimate > timeout. – Modified slow start motivation: sender tries finding correct window size without causing a loss.
Vegas vs. NewReno TCP NewReno throughput with simulated background traffic TCP Vegas throughput with simulated background traffic Source: Brakmo and Peterson, TCP Vegas: End to End Congestion Avoidance on a Global Internet, IEEE JSAC, Vol 13, No. 8, Oct. 1995, pp – 1480
What Variations are being used? Experimental results obtained by testing 3728 web servers: – NewReno42% – Tahoe27%(w/o Fast Retransmit) – Reno18% – Other 8% – Tahoe 5% Source: Padhye and Floyd, SIGCOMM ‘01, August 27-31, San Diego, CA
Summary of TCP Behavior When entering slow start, if connection is new, ssthresh = arbitrarily large value cwnd = 1. else, ssthresh = max(flight size/2, 2*MSS) cwnd = 1. In slow start ++cwnd on new ACK TCP Variatio n Response to 3 dupACK’s Response to Partial ACK of Fast Retransmission Response to “full” ACK of Fast Retransmission Tahoe Do fast retransmit, enter slow start ++cwnd Reno Do fast retransmit, enter fast recovery Exit fast recovery, deflate window, enter congestion avoidance NewRen o Do fast retransmit, enter modified fast recovery Fast retransmit and deflate window – remain in modified fast recovery Exit modified fast recovery, deflate window, enter congestion avoidance When entering either fast recovery or modified fast recovery, ssthresh = max(flight size/2, 2*MSS) cwnd = ssthresh. In congestion avoidance cwnd += 1*MSS per RTT
34 TCP without SACK TCP uses cumulative ACKs Receiver identifies the last byte of data successfully received Out of order segments are not ACKed Receiver sends duplicate ACKs TCP without SACK forces the TCP sender Either to wait an RTT to find out a segment was lost Or, unnecessarily retransmit data that has been correctly received Can result in reduced overall throughput
35 TCP with Selective Ack (SACK) SACK + Selective Repeat Retransmission Policy allows receiver informs sender about all segments that are successfully received. sender fast retransmits only the missing data segments SACK is implemented using two TCP Options SACK-Permitted Option SACK Option
37 SACK Rules With SACKs, the ACK field is still a cum ACK A SACK cannot be sent unless the SACK-Permitted option has been received (in the SYN) The 1 st SACK block MUST specify the contiguous block of data containing the segment which triggered this acknowledgment If SACKs are sent, SACK option should be included in all ACK’s which do not ACK the highest sequence number in the data receiver’s queue
38 Generating SACKs – data receiver behavior If the data receiver has not received a SACK-Permitted Option for a given connection, the receiver must not send SACK options on that connection The receiver should send an ACK for every valid segment that arrives containing new data The data receiver should include as many distinct SACK blocks as possible in the SACK option SACK option should be filled out by repeating the most recently reported SACK blocks The data receiver provides the sender with the most up-to-date info about the state of the network and the receiver’s queue
39 Interpreting SACKs - Data Sender behavior The sender records the SACK for future reference Maintains a retransmission queue containing unacknowledged segments One possible implementation Turns on SACK bit for the segment in retransmission queue when it receives a SACK Skips SACKed data during any later fast retransmission On fast retransmit, retransmits data not SACKed so far and less than the highest SACKed data Turns off SACK bit after retransmission time out
40 Another SACK Example sender receiver ACK Receiver Buffer ACK 300, SACK ACK 300, SACK ,
41 sender receiver ACK 700, SACK ACK Another SACK Example (cont’d)
42 Without SACK vs. With SACK sender receiver ACK ACK 200 fast retransmit ACK 600 sender receiver ACK ACK 200, SACK ACK 200, SACK fast retransmit ACK 600 ACK 200, SACK TCP without SACK TCP with SACK
43 SACK Observations SACK TCP follows standard TCP congestion control; Adding SACK to TCP does not change the basic underlying congestion control algorithms SACK TCP has major advantages when compared TCP Tahoe, Reno, Vegas and New Reno, as PDUs have been provided with additional information due to the SACK Difference in behavior when multiple packets are dropped from one window of data SACK information allows the sender to better decide what to retransmit and what not to
Any questions ? …
45 Data Receiver Reneging Reneging – fail to fulfill a promise or obligation Data receiver is permitted to discard data in its queue that has not been acknowledged to the data sender, even if the data has already been SACKed Such discarding of SACKed segments is discouraged, but may occur if the receiver must give buffer space back to the OS If reneging occurs first SACK should reflect the newest segment even if its going to be discarded Except for the newest segment, all SACK blocks MUST NOT report any old data which is no longer actually held by the receiver
47 Consequences of Reneging Sender must maintain normal TCP timeouts Data cannot be considered “communicated” until a cum ACK is sent Sender must retransmit the data at the left window edge after a retransmit timeout, even if that data has been SACKed by the receiver Sender MUST NOT discard data before being acked by the Cum Ack
48 SACK-Permitted Option Sack–Permitted option is allowed only in a SYN Segment. indicates sender handles SACKs, and receiver should send SACKs if possible. SACK option can be used once connection is established Cumulative Ack No. 6 TCP header length Sequence Number Source port addressDestination port address NOP options kind=1 SACK- permitted kind=4 length=2 Window size Urgent pointerChecksum 1 SYN bit TCP Header
49 SENDER SACK-Permitted Option and SACK RECEIVER SYN “SACK-permitted” SYN/ACK “SACK-permitted” ACK TCP connection establishment phase data transfer phase cum ack and optional SACKs Cumulative Ack No. Sequence Number Source port addressDestination port address SACK- permitted kind=4 length=2 Window size Urgent pointerChecksum NOP options kind=1 1 SYN bit Cumulative Ack No. Sequence Number Source port addressDestination port address SACK- permitted kind=4 length=2 Window size Urgent pointerChecksum NOP options kind=1 1 SYN bit 1 ACK bit Cumulative Ack No. Sequence Number Source port addressDestination port address Window size Urgent pointerChecksum kind=1 1 ACK bit
50 SACK Option Cumulative Ack No. Sequence Number Source port address Window size Urgent pointer Checksum HLEN Destination port address Right edge of nth block Left edge of nth block Right edge of 1st block Left edge of 1st block Kind=1 Length=??Kind=1Kind=5 = (2 + 8 * n) bytes Max number of SACK blocks possible? = 4 SACK blocks (barring no other TCP Options) Max number bytes available for TCP Options? = 40 bytes Length of SACK with n blocks?