1. TCP Variations: Tahoe, Reno, New Reno, Vegas, SACK
Paul D. Amer, Professor, Computer & Information Sciences, University of Delaware
2. What are TCP Variations?
Implementations of TCP that use different algorithms to achieve end-to-end congestion control:
- Tahoe
- Reno
- NewReno
- Vegas
3. Evolution of TCP
- 1974: TCP described by Vint Cerf and Bob Kahn in IEEE Trans Comm
- 1975: Three-way handshake, Ray Tomlinson, in SIGCOMM 75
- 1981: TCP & IP, RFC 793 & 791
- 1983: BSD Unix 4.2 supports TCP/IP
- 1984: Nagle's algorithm to reduce overhead of small packets; predicts congestion collapse
- 1986: Congestion collapse first observed
- 1987: Karn's algorithm to better estimate round-trip time
- 1988: Van Jacobson's algorithms: slow start, congestion avoidance, fast retransmit (all implemented in 4.3BSD Tahoe), SIGCOMM 88
- 1990: 4.3BSD Reno: fast recovery, delayed ACKs
4. Evolution of TCP
- 1993: TCP Vegas (not implemented): real congestion avoidance (Brakmo et al)
- 1994: T/TCP, Transaction TCP (Braden)
- 1994: ECN, Explicit Congestion Notification (Floyd)
- 1996: NewReno (modified fast recovery); SACK TCP, Selective Ack (Floyd et al)
- 1996: FACK TCP, Forward Ack extension to SACK (Mathis et al)
- 1996: Improving TCP startup (Hoe)
5. How did TCP cause Congestion? (original recipe TCP)
- Poor efficiency
  - In telnet-like applications, TCP sends 1 byte of data with 4000% overhead.
- Sending too much, too soon
  - Unnecessary retransmits
  - Send window too big
- Very little change in behavior due to congestion
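The 4000% figure is consistent with a 1-byte payload carried behind a 20-byte TCP header and a 20-byte IP header. Those header sizes (no options) are an assumption here, since the slide does not state them, and the helper name is hypothetical:

```python
def header_overhead_percent(payload_bytes, tcp_header=20, ip_header=20):
    """Header bytes as a percentage of payload bytes.

    The 20-byte defaults assume TCP and IPv4 headers without options.
    """
    return 100 * (tcp_header + ip_header) / payload_bytes


# One keystroke in a telnet-like session: 40 header bytes per data byte.
print(header_overhead_percent(1))
```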
6. Self-clocking or ACK Clock
[Figure: Sender and Receiver, with labels Pb, Pr, Ar, Ab, As]
Implications of ack-clocking:
- More batching of acks => bursty traffic
- Less batching leads to a large fraction of Internet traffic being just acks (overhead)
- Self-clocking systems tend to be very stable under a wide range of bandwidths and delays.
- The principal issue with self-clocking systems is getting them started.
7. TCP Algorithms
Four intertwined algorithms used commonly in TCP implementations:
- Slow Start: every ack increases the sender's window (cwnd) size by 1.
- Congestion Avoidance: halve the sender's window when a loss is experienced, and increase it at the rate of about one packet per RTT (NOTE: not per ack).
- Fast Retransmit: don't wait for the retransmit timer to go off; retransmit the packet if 3 duplicate acks are received.
- Fast Recovery: since a duplicate ack came through, one packet has left the wire. Perform congestion avoidance; don't jump down to slow start.
9. Congestion Avoidance (after cwnd reaches a certain threshold)
[Figure: packets and acks exchanged between Sender and TCP Receiver]
- t = xr: cwnd = 4
- t = (x+1)r: cwnd = 5
- t = (x+2)r: cwnd = 6
- t = (x+3)r: cwnd = 7
new cwnd = old cwnd + (# acks)/cwnd
11. Slow Start & Congestion Avoidance
Initially:
- cwnd = 1*MSS
- ssthresh = very high
If a new ACK comes:
- if cwnd < ssthresh, update cwnd according to slow start
- if cwnd > ssthresh, update cwnd according to congestion avoidance
- if cwnd = ssthresh, either
If timeout (i.e. loss):
- ssthresh = flight size/2
[Figure: cwnd versus time, marking the initial ssthresh, a loss (e.g. timeout), and the new ssthresh; slow start in green, congestion avoidance in blue]
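The rules on this slide can be sketched as a small state machine. This is a minimal illustration, not an implementation from the lecture: the window is counted in segments (so one MSS is 1), the names are hypothetical, and resetting cwnd to 1 on timeout is taken from the RTO rule on slide 20 rather than from this slide.

```python
from dataclasses import dataclass

MSS = 1  # assumption: count the window in segments, so one MSS equals 1


@dataclass
class CongestionState:
    cwnd: float = 1 * MSS            # initial window: one segment
    ssthresh: float = float("inf")   # "very high" initial threshold


def on_new_ack(state):
    """Grow cwnd by slow start below ssthresh, else by congestion avoidance."""
    if state.cwnd < state.ssthresh:
        state.cwnd += MSS                     # slow start: +1 per ACK
    else:
        state.cwnd += MSS * MSS / state.cwnd  # about +1 per RTT (1/cwnd per ACK)


def on_timeout(state, flight_size):
    """Loss detected by timeout: halve the threshold and restart slow start."""
    state.ssthresh = flight_size / 2
    state.cwnd = 1 * MSS                      # assumption taken from slide 20


s = CongestionState()
for _ in range(3):
    on_new_ack(s)        # cwnd grows 1 -> 4 in slow start
on_timeout(s, flight_size=4)
on_new_ack(s)            # slow start again, up to ssthresh
on_new_ack(s)            # at ssthresh: congestion avoidance step
print(s.cwnd, s.ssthresh)
```

When cwnd equals ssthresh the slide allows either rule; the sketch picks congestion avoidance.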
12. Fast Retransmit (at a random point in the transfer)
[Figure: starting at t = xr, the Sender transmits pkt 15 through pkt 20; the TCP Receiver returns ack 15, ack 16, ack 16; at t = (x+1)r, 3 dup acks trigger the fast retransmit of pkt 17]
13. TCP Tahoe's Fast Retransmit
- Sender receives 3 dupACKs.
- Sender infers that the segment is lost.
- Sender re-sends the segment immediately!
- Sender returns to slow-start.
[Figure: segment 1 is sent with cwnd = 1; ACK 1 raises cwnd to 2; segments 2 and 3 are ACKed and cwnd = 4; segments 4 through 7 are sent, and 3 duplicate ACK 3s trigger the fast retransmit of segment 4]
Recall: a duplicate ACK means that an out-of-sequence segment was received.
Notes:
- Duplicate ACKs can also be due to packet reordering!
- If the window is small, the sender doesn't get duplicate ACKs!
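A sketch of the dupACK counting behind Tahoe's fast retransmit, using the slide's numbering (ACK n means segments up to n arrived, so the missing segment is n + 1). The class and attribute names are hypothetical, and the ssthresh update on fast retransmit is borrowed from the summary on slide 33, not from this slide.

```python
DUP_ACK_THRESHOLD = 3


class TahoeSender:
    """Hypothetical Tahoe sender; the window is counted in segments."""

    def __init__(self):
        self.cwnd = 1
        self.ssthresh = float("inf")
        self.last_ack = None
        self.dup_acks = 0
        self.retransmitted = []   # segment numbers that were fast-retransmitted

    def on_ack(self, ack_no, flight_size):
        if ack_no != self.last_ack:
            # New ACK: reset the duplicate counter and grow the window.
            self.last_ack = ack_no
            self.dup_acks = 0
            if self.cwnd < self.ssthresh:
                self.cwnd += 1               # slow start
            else:
                self.cwnd += 1 / self.cwnd   # congestion avoidance
            return
        # Duplicate ACK: an out-of-sequence segment reached the receiver.
        self.dup_acks += 1
        if self.dup_acks == DUP_ACK_THRESHOLD:
            self.retransmitted.append(ack_no + 1)     # resend without waiting for RTO
            self.ssthresh = max(flight_size / 2, 2)   # per slide 33
            self.cwnd = 1                             # Tahoe returns to slow start


sender = TahoeSender()
for ack in (1, 2, 3, 3, 3, 3):    # the ACK sequence from the slide's figure
    sender.on_ack(ack, flight_size=4)
print(sender.retransmitted, sender.cwnd)
```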
14. TCP Variants
- TCP-Tahoe: implements the slow start, congestion avoidance, and fast retransmit algorithms.
- TCP-Reno: implements the slow start, congestion avoidance, fast retransmit, and fast recovery algorithms.
- Among other implementations are Vegas, NewReno (the most commonly implemented on web servers, according to the 2001 survey cited on slide 32) and SACK TCP.
15. TCP Tahoe Trace (with one dropped segment)
[Figure: trace annotated with "Lost segment", "Fast Retransmit", "Begin slow-start", and "Begin congestion avoidance"]
17. Could Tahoe be Improved?
- Receipt of dupACKs tells the sender that the receiver is still getting new segments, i.e. there is still data flowing between sender and receiver.
- Why does the sender go back to slow start after fast retransmit?
- Why does the sender let the Ack clock die?
18. TCP Variation: TCP Reno
2nd improvement was TCP Reno (1990):
- Nagle's algorithm
- Improved RTO calculation and back-off
- AIMD congestion avoidance with slow-start
- Fast retransmit & fast recovery
19. Fast Recovery
Concept: After fast retransmit, reduce cwnd by half, and continue sending segments at this reduced level.
Problems:
- Sender has too many outstanding segments.
- How does the sender transmit packets on a dupACK? Need to use a "trick": inflate cwnd.
[Figure: cwnd versus time with the initial ssthresh marked, showing slow start, congestion avoidance, two fast retransmits, a timeout, "inflating" cwnd with dupACKs, and "deflating" cwnd with a new ACK]
20. Fast Retransmit & Fast Recovery
After receiving 3 dupACKs:
- Retransmit the lost segment.
- Set ssthresh = flight size/2.
- Set cwnd = ssthresh, and ndupacks = 3.
N.B. In Reno: send_win = min(rwnd, cwnd + ndupacks).
If a dupACK arrives:
- ++ndupacks
- Transmit new segment, if allowed.
If a new ACK arrives:
- ndupacks = 0
- Exit fast recovery.
If RTO expires:
- Perform slow-start (ssthresh = flight size/2, cwnd = 1)
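The window "inflation" trick can be sketched as follows. This is an illustrative model under stated assumptions: the class, method names, and returned action strings are hypothetical, the window is counted in segments, and window growth on ordinary new ACKs is left out so the recovery logic stays visible.

```python
class RenoSender:
    """Hypothetical model of Reno fast retransmit and fast recovery."""

    def __init__(self, cwnd, rwnd):
        self.cwnd = cwnd
        self.rwnd = rwnd
        self.ssthresh = float("inf")
        self.ndupacks = 0
        self.in_fast_recovery = False

    def send_win(self):
        # Each dupACK means one segment left the network, so the usable
        # window is inflated by ndupacks while in fast recovery.
        inflation = self.ndupacks if self.in_fast_recovery else 0
        return min(self.rwnd, self.cwnd + inflation)

    def on_dup_ack(self, flight_size):
        self.ndupacks += 1
        if not self.in_fast_recovery and self.ndupacks == 3:
            self.ssthresh = flight_size / 2
            self.cwnd = self.ssthresh
            self.in_fast_recovery = True
            return "retransmit_lost_segment"
        if self.in_fast_recovery:
            return "send_new_segment_if_allowed"
        return "wait"

    def on_new_ack(self):
        # Deflate the window and leave fast recovery.
        self.ndupacks = 0
        self.in_fast_recovery = False

    def on_rto(self, flight_size):
        # Timeout: fall back to slow start.
        self.ssthresh = flight_size / 2
        self.cwnd = 1
        self.ndupacks = 0
        self.in_fast_recovery = False


reno = RenoSender(cwnd=8, rwnd=64)
actions = [reno.on_dup_ack(flight_size=8) for _ in range(4)]
print(actions, reno.send_win())
```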
22. TCP Tahoe & Reno Trace (with two dropped segments)
23. What if There are Multiple Losses in a Window?
- With two losses in a window, Reno will occasionally time out.
- With three losses in a window, Reno will usually time out.
- With four losses in a window, Reno is guaranteed to time out!
- With three or more losses in a window, Tahoe typically outperforms Reno!
25. TCP Variation: TCP NewReno
3rd improvement was TCP NewReno (1995):
- Nagle's algorithm
- Improved RTO calculation and back-off
- AIMD congestion avoidance with slow-start
- Fast retransmit & modified fast recovery
26. Modifications to Fast Recovery
- Partial ACK: an ACK that acknowledges some but not all of the segments that were outstanding at the start of fast recovery. NewReno interprets this as an indication of multiple losses.
- If a partial ACK is received, retransmit the next lost segment immediately and set ndupacks = 0 (deflate send_win).
- Sender remains in fast recovery until all data outstanding when fast recovery was initiated is ACKed. Additional dupACKs increase ndupacks.
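The partial-ACK rule can be sketched as below. The numbering follows slide 13 (ACK n means segments up to n arrived), and the class name, the `recover` attribute, and the action tuples are hypothetical names chosen for this illustration.

```python
class NewRenoRecovery:
    """Hypothetical model of NewReno's modified fast recovery."""

    def __init__(self, recover):
        self.recover = recover   # highest segment outstanding when recovery began
        self.ndupacks = 3        # assumption: recovery starts after 3 dupACKs
        self.active = True

    def on_dup_ack(self):
        self.ndupacks += 1       # additional dupACKs keep inflating send_win
        return ("send_new_if_allowed", None)

    def on_new_ack(self, ack_no):
        self.ndupacks = 0        # deflate send_win
        if ack_no < self.recover:
            # Partial ACK: another segment from the same window was lost.
            return ("retransmit", ack_no + 1)
        # Everything outstanding at the start of recovery is now ACKed.
        self.active = False
        return ("exit_fast_recovery", None)


rec = NewRenoRecovery(recover=10)
print(rec.on_new_ack(6))    # partial ACK: retransmit segment 7, stay in recovery
print(rec.on_new_ack(10))   # full ACK: exit fast recovery
```

Reno would leave fast recovery on the first of these ACKs, which is why a second loss in the same window often forces it to wait for a timeout.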
28. Tahoe, Reno & NewReno Trace (with two dropped segments)
29. Is There a Better Way?
- The only way Tahoe, Reno and NewReno can detect congestion is by creating congestion!
- They carefully probe for congestion by slowly increasing their sending rate.
- When they find (create) congestion, they cut their sending rate at least in half!
- This slow-advance, rapid-retreat approach results in a saw-toothed sending rate and highly erratic throughput.
- What if TCP could detect congestion without causing congestion?
30. TCP Variation: TCP Vegas (True Congestion Avoidance)
Introduced by Brakmo and Peterson (1994). Three changes to TCP Reno:
- Modified congestion avoidance: don't wait for a timeout; if actual throughput < expected throughput, decrease the congestion window. (AIAD!)
- New retransmission mechanism
  - motivation: what if the sender never receives 3 dupACKs (due to lost segments, or because the window size is too small)?
  - mechanism: the sender retransmits after a dupACK is received, if RTT estimate > timeout.
- Modified slow start
  - motivation: the sender tries to find the correct window size without causing a loss.
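One way to picture the modified congestion avoidance is to compare the throughput expected at the minimum observed RTT with the throughput actually achieved. The sketch below is an assumption-laden illustration, not the slide's own algorithm: the function name is hypothetical, and the two thresholds `alpha` and `beta` with their default values are illustrative parameters that the slide does not mention.

```python
def vegas_adjust(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """Return the next cwnd (in segments) after one RTT sample.

    base_rtt is the smallest RTT seen so far; rtt is the current sample.
    alpha and beta are illustrative thresholds, in segments.
    """
    expected = cwnd / base_rtt              # throughput with empty queues
    actual = cwnd / rtt                     # throughput actually achieved
    diff = (expected - actual) * base_rtt   # estimated segments queued in the network
    if diff < alpha:
        return cwnd + 1                     # additive increase: path is underused
    if diff > beta:
        return cwnd - 1                     # additive decrease: queues are building
    return cwnd                             # within the target band: hold steady


print(vegas_adjust(10, base_rtt=100, rtt=100))   # no queueing delay
print(vegas_adjust(10, base_rtt=100, rtt=200))   # RTT doubled
```

Both the increase and the decrease are by one segment, which matches the slide's "AIAD" remark, in contrast to the halving used by Tahoe, Reno and NewReno.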
31. Vegas vs. NewReno
[Figures: TCP NewReno throughput with simulated background traffic; TCP Vegas throughput with simulated background traffic]
Source: Brakmo and Peterson, TCP Vegas: End to End Congestion Avoidance on a Global Internet, IEEE JSAC, Vol 13, No. 8, Oct. 1995, pp – 1480
32. What Variations are being used?
Experimental results obtained by testing 3728 web servers:
- NewReno 42%
- Tahoe 27% (w/o Fast Retransmit)
- Reno 18%
- Other 8%
- Tahoe 5%
Source: Padhye and Floyd, SIGCOMM '01, August 27-31, San Diego, CA
33. Summary of TCP Behavior
Tahoe:
- Response to 3 dupACKs: do fast retransmit, enter slow start
- Response to partial or "full" ACK of fast retransmission: ++cwnd
Reno:
- Response to 3 dupACKs: enter fast recovery
- Response to partial ACK of fast retransmission: exit fast recovery, deflate window, enter congestion avoidance
- Response to "full" ACK of fast retransmission: exit fast recovery, deflate window, enter congestion avoidance
NewReno:
- Response to 3 dupACKs: enter modified fast recovery
- Response to partial ACK of fast retransmission: fast retransmit and deflate window; remain in modified fast recovery
- Response to "full" ACK of fast retransmission: exit modified fast recovery, deflate window, enter congestion avoidance
When entering slow start:
- if connection is new, ssthresh = arbitrarily large value, cwnd = 1
- else, ssthresh = max(flight size/2, 2*MSS)
In slow start: ++cwnd on new ACK
When entering either fast recovery or modified fast recovery:
- ssthresh = max(flight size/2, 2*MSS)
- cwnd = ssthresh
In congestion avoidance: cwnd += 1*MSS per RTT
34. TCP without SACK
- TCP uses cumulative ACKs
  - Receiver identifies the last byte of data successfully received
  - Out-of-order segments are not ACKed
  - Receiver sends duplicate ACKs
- TCP without SACK forces the TCP sender
  - either to wait an RTT to find out a segment was lost
  - or to unnecessarily retransmit data that has been correctly received
- Can result in reduced overall throughput
35. TCP with Selective Ack (SACK)
- SACK + Selective Repeat Retransmission Policy allows
  - the receiver to inform the sender about all segments that are successfully received
  - the sender to fast retransmit only the missing data segments
- SACK is implemented using two TCP Options
  - SACK-Permitted Option
  - SACK Option
37. SACK Rules
- With SACKs, the ACK field is still a cumulative ACK
- A SACK cannot be sent unless the SACK-Permitted option has been received (in the SYN)
- The 1st SACK block MUST specify the contiguous block of data containing the segment which triggered this acknowledgment
- If SACKs are sent, the SACK option should be included in all ACKs which do not ACK the highest sequence number in the data receiver's queue
38. Generating SACKs: Data Receiver Behavior
- If the data receiver has not received a SACK-Permitted Option for a given connection, the receiver must not send SACK options on that connection
- The receiver should send an ACK for every valid segment that arrives containing new data
- The data receiver should include as many distinct SACK blocks as possible in the SACK option
- The SACK option should be filled out by repeating the most recently reported SACK blocks
- The data receiver provides the sender with the most up-to-date info about the state of the network and the receiver's queue
39. Interpreting SACKs: Data Sender Behavior
- The sender records the SACK for future reference
- Maintains a retransmission queue containing unacknowledged segments
- One possible implementation
  - Turns on the SACK bit for a segment in the retransmission queue when it receives a SACK for it
  - Skips SACKed data during any later fast retransmission
  - On fast retransmit, retransmits data not SACKed so far and less than the highest SACKed data
  - Turns off the SACK bit after a retransmission timeout
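The "one possible implementation" above can be sketched as a retransmission queue with a per-segment SACK bit. All names here are hypothetical, and byte ranges are treated as half-open intervals (the right edge is the first byte not covered), which is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int            # first byte of the segment
    end: int              # one past the last byte (half-open range)
    sacked: bool = False  # the per-segment "SACK bit"


class Scoreboard:
    """Hypothetical retransmission queue for a SACK-capable sender."""

    def __init__(self, segments):
        self.queue = list(segments)

    def on_ack(self, cum_ack, sack_blocks):
        # The cumulative ACK removes data from the queue for good.
        self.queue = [s for s in self.queue if s.end > cum_ack]
        # SACK blocks only mark segments; the data stays queued.
        for s in self.queue:
            if any(left <= s.start and s.end <= right for left, right in sack_blocks):
                s.sacked = True

    def fast_retransmit_candidates(self):
        # Un-SACKed data lying below the highest SACKed data.
        sacked_ends = [s.end for s in self.queue if s.sacked]
        if not sacked_ends:
            return []
        highest = max(sacked_ends)
        return [s for s in self.queue if not s.sacked and s.end <= highest]

    def on_rto(self):
        # After a timeout the SACK bits are cleared, so nothing is skipped.
        for s in self.queue:
            s.sacked = False


board = Scoreboard(Segment(b, b + 100) for b in range(100, 600, 100))
board.on_ack(cum_ack=200, sack_blocks=[(300, 400), (500, 600)])
print([(s.start, s.end) for s in board.fast_retransmit_candidates()])
```

Clearing the bits in `on_rto` reflects the last bullet above: SACKed data is not removed from the queue until a cumulative ACK covers it.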
41. Another SACK Example (cont'd)
[Figure: sender and receiver byte-sequence diagrams with marks at 300, 500, 699/700, 900, and 1099/1100; the receiver returns "ACK 700, SACK" and then "ACK 1100"]
42. Without SACK vs. With SACK
[Figure: two sender/receiver exchanges. TCP with SACK: ACK 200, then three "ACK 200, SACK" segments, fast retransmit, ACK 600. TCP without SACK: ACK 200, fast retransmit, ACK 600.]
43. SACK Observations
- SACK TCP follows standard TCP congestion control; adding SACK to TCP does not change the basic underlying congestion control algorithms
- SACK TCP has major advantages compared with TCP Tahoe, Reno, Vegas and NewReno, because its PDUs carry additional information in the SACK option
- The difference in behavior appears when multiple packets are dropped from one window of data
- SACK information allows the sender to better decide what to retransmit and what not to
45. Data Receiver Reneging
- Reneging: failing to fulfill a promise or obligation
- The data receiver is permitted to discard data in its queue that has not been acknowledged to the data sender, even if the data has already been SACKed
- Such discarding of SACKed segments is discouraged, but may occur if the receiver must give buffer space back to the OS
- If reneging occurs
  - the first SACK should reflect the newest segment even if it is going to be discarded
  - except for the newest segment, all SACK blocks MUST NOT report any old data which is no longer actually held by the receiver
47. Consequences of Reneging
- Sender must maintain normal TCP timeouts
  - Data cannot be considered "communicated" until a cumulative ACK is sent
  - Sender must retransmit the data at the left window edge after a retransmit timeout, even if that data has been SACKed by the receiver
- Sender MUST NOT discard data before it has been acked by the cumulative ACK
48. SACK-Permitted Option
- Is allowed only in a SYN segment.
- Indicates that the sender handles SACKs, and that the receiver should send SACKs if possible.
- The SACK option can be used once the connection is established.
[Figure: TCP header with source port address, destination port address, sequence number, cumulative ack no., TCP header length = 6, SYN bit = 1, window size, checksum, urgent pointer, and options: NOP (kind=1), SACK-permitted (kind=4, length=2)]
49. SACK-Permitted Option and SACK
TCP connection establishment phase, between SENDER and RECEIVER:
- SYN carrying "SACK-permitted" (SYN bit = 1; options: NOP kind=1, SACK-permitted kind=4, length=2)
- SYN/ACK carrying "SACK-permitted" (SYN bit = 1, ACK bit = 1; options: NOP kind=1, SACK-permitted kind=4, length=2)
- ACK (ACK bit = 1)
Data transfer phase: cum ack and optional SACKs
[Figure: the TCP header of each segment, with source port address, destination port address, sequence number, cumulative ack no., window size, checksum, and urgent pointer]
50. SACK Option
[Figure: TCP header with source port address, destination port address, sequence number, cumulative ack no., HLEN, window size, checksum, urgent pointer, and options: Kind=1, Kind=1, Kind=5, Length=??, left edge of 1st block, right edge of 1st block, ..., left edge of nth block, right edge of nth block]
- Length of SACK option with n blocks? = (2 + 8 * n) bytes
- Max number of bytes available for TCP Options? = 40 bytes
- Max number of SACK blocks possible? = 4 SACK blocks (assuming no other TCP Options)
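The three answers on this slide follow from simple arithmetic, sketched below. The function names are hypothetical, and the 12-byte figure in the usage line is only an illustrative amount of option space taken by some other option.

```python
MAX_TCP_OPTION_BYTES = 40   # option space available in a TCP header (from the slide)


def sack_option_length(n_blocks):
    """Kind (1 byte) + length (1 byte) + two 4-byte edges per block."""
    return 2 + 8 * n_blocks


def max_sack_blocks(other_option_bytes=0):
    """Largest n such that the SACK option fits in the remaining option space."""
    available = MAX_TCP_OPTION_BYTES - other_option_bytes
    return max(0, (available - 2) // 8)


print(sack_option_length(4))   # 34 bytes, which fits within 40
print(max_sack_blocks())       # 4 blocks when no other options are present
print(max_sack_blocks(12))     # fewer blocks when 12 bytes go to other options
```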