TCP End-To-End Congestion Control Wanida Putthividhya Dept. of Computer Science Iowa State University Jan, 27 th 2002 (May, 25 th 2001)

Contents : - TCP Congestion Control Concepts - TCP Flavors

TCP Congestion Control - Obey a ‘packet conservation’ principle : “ In equilibrium, a new packet is not put into the network until an old packet leaves ” - Avoid ‘congestion collapses’ : “ The severe drop of the network throughput caused by the congestion ”

- A collection of collaborating mechanisms : Slow-Start Accurate Retransmission Timeout Estimation Congestion Avoidance Fast Retransmit Fast Recovery Selective Acknowledgement

TCP Basics - Congestion Window (cwnd) : “ A TCP state variable that limits the amount of data a TCP can send” “ The window at the sender site controlled by congestion control and avoidance algorithms ” - Advertised Window (Receiver Window) : “ The available buffer size at the receiver site ”

- Sender’s maximum window (maxwin) : “ min(cwnd, advertised window) ” - Sender’s usable window : “ maxwin - unacknowledged segments ” - TCP maintains a Retransmission Timer for each packet, say x, which has been sent and not yet acknowledged. If the ACK for the packet x does not reach the sender before its timer is expired, the packet x is assumed to be lost and the sender will retransmit the packet x.
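The window arithmetic above can be sketched in a few lines. This is only an illustration: the function name `usable_window` and the clamp to zero are assumptions, and windows are counted in segments rather than bytes.

```python
def usable_window(cwnd: int, advertised_window: int, unacked: int) -> int:
    """Number of segments the sender may still inject right now."""
    maxwin = min(cwnd, advertised_window)  # sender's maximum window
    # Assumption: a negative result is clamped to zero (nothing may be sent).
    return max(0, maxwin - unacked)


print(usable_window(cwnd=8, advertised_window=16, unacked=5))  # 3
print(usable_window(cwnd=8, advertised_window=4, unacked=5))   # 0
```

In the first call the congestion window is the limit; in the second the receiver's advertised window is smaller than the data already in flight, so the sender must wait for ACKs.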

Self-Clocking - The ‘packet conservation’ property can be expressed in the sense that: “ The sender will be able to inject a new data packet into the network only if it receives an ‘ACK’ from the receiver “ So, the protocol is self-clocking ! “ The sender uses ACKs as a ‘clock’ to strobe new packets into the network ”

- However, how is the clock started ? The problem is : “ An ACK is generated when the receiver receives a data packet correctly “ and “ To make the system robust, the data packet will be injected into the network only when there is an ACK triggering the sender to do so ” - Answer: “ A new algorithm called ‘Slow-Start’ has been introduced to gradually increase the amount of data in transit”

[Figure: self-clocking between sender and receiver, with the spacings P_b, P_r, A_r, A_b, and A_s marked] P_b : the minimum packet spacing (the inter-packet interval) on the bottleneck link - P_r : the receiver's network packet spacing [P_b = P_r] - A_r : the spacing between ACKs on the receiver's network [if the processing time is the same for all packets, P_b = P_r = A_r] - A_b : the ACK spacing on the bottleneck link - A_s : the ACK spacing on the sender's network [A_s = P_b]

Getting to Equilibrium: Slow-Start Algorithm - When starting, or when restarting after a loss, set the congestion window to 1: cwnd = 1 - Every time the sender sends data packets, it may send: min(cwnd, advertised window) - # unacked packets - Upon receiving an ACK for new data, increase the congestion window by one: cwnd = cwnd + 1

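[Figure: slow-start timeline. Time is marked in units of one RTT (0R, 1R, 2R, 3R) and one packet time. Packet 1 is sent in round 0R, packets 2-3 in round 1R, packets 4-7 in round 2R, and packets 8-15 in round 3R; the ACKs for packets 1, 2-3, and 4-7 clock out the packets of the following round.]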

- However, slow-start is not that slow at opening the sender's congestion window: “ Let W be the window size (packets) and let RTT be the round-trip time; it takes time RTT * log2(W) to open the congestion window from 1 to W ” - Therefore, the window opens fast enough to have a negligible effect on performance
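The RTT * log2(W) claim can be checked with a small simulation. This is a sketch under simplifying assumptions (one ACK per segment, no loss, no delayed ACKs); `rtts_to_open` is a hypothetical helper name.

```python
import math


def rtts_to_open(target_window: int) -> int:
    """Count round trips for slow-start to grow cwnd from 1 to target_window."""
    cwnd, rounds = 1, 0
    while cwnd < target_window:
        acks = cwnd    # every segment sent in this round is ACKed
        cwnd += acks   # cwnd grows by 1 per ACK, so it doubles per RTT
        rounds += 1
    return rounds


for w in (8, 64, 1000):
    print(w, rtts_to_open(w), math.ceil(math.log2(w)))
```

For each window size the simulated round count equals the ceiling of log2(W), which is why opening even a large window costs only a handful of round trips.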

Conservation at equilibrium: round-trip timing - Once data is flowing reliably, a sender that injects a new packet before an old packet has exited indicates a failure of the sender's retransmission timer - TCP estimates the retransmission timer for each packet in terms of the RTT ( wait at least one RTT before retransmitting ! ) - RTT estimate too short => unnecessary retransmissions; RTT estimate too long => low throughput - What model should be used to estimate the RTT ? “ The estimated RTT must adapt to the condition of the network, but not too fast and not too slow ”

- Initial RTO estimator: New RTT = α * old RTT + (1 - α) * M where M : a round trip time measurement from the most recently acked data packet (Round Trip Sample) α : a filter gain constant with suggested value of 0.9 RTO = β * New RTT where β : accounts for RTT variation with suggested value of 2
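A minimal sketch of this estimator, using the suggested gain of 0.9 and variation factor of 2 as defaults. The function name `update_rto` and the sample values in milliseconds are illustrative assumptions.

```python
def update_rto(old_rtt: float, sample: float,
               alpha: float = 0.9, beta: float = 2.0) -> tuple[float, float]:
    """Return (new smoothed RTT, RTO) after one round trip sample M."""
    new_rtt = alpha * old_rtt + (1 - alpha) * sample  # low-pass filter
    return new_rtt, beta * new_rtt                    # fixed multiple of RTT


rtt = 100.0  # hypothetical starting estimate, in ms
for m in (100.0, 200.0, 200.0):
    rtt, rto = update_rto(rtt, m)
    print(round(rtt, 1), round(rto, 1))
```

With a gain of 0.9 each new sample moves the estimate by only a tenth of the difference, so a sudden doubling of the measured RTT takes many samples to be reflected in the timeout.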

- How to measure Round Trip Samples accurately? The Acknowledgement Ambiguity phenomenon: a complication arises because TCP's acknowledgement refers to data received, not to the instance of a specific datagram that carried the data [Figure: two timelines between hosts A and B, each showing an original transmission, a retransmission, an ACK, and the resulting Sample RTT]

- Karn's RTO estimator Accounts for the Acknowledgement Ambiguity phenomenon Combination of the initial RTO estimator and a timer back-off strategy. As usual, to compute an initial timeout value, use the formulas : New RTT = α * old RTT + (1 - α) * M and RTO = β * New RTT

If the timer expires and causes a retransmission, TCP does not take an RTT sample for that segment but keeps backing off the timeout on each retransmission, until it can successfully transfer a segment, by the formula : New RTO = γ * old RTO The suggested value for γ is 2
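Karn's rule (ignore samples from retransmitted segments, back off the timer on each timeout) can be sketched as a small state machine. The class name `KarnTimer`, its method names, and the parameter names `alpha`, `beta`, `gamma` are assumptions made for this illustration.

```python
class KarnTimer:
    """Toy retransmission timer combining the initial estimator with back-off."""

    def __init__(self, rtt: float, alpha: float = 0.9,
                 beta: float = 2.0, gamma: float = 2.0):
        self.rtt = rtt
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.rto = beta * rtt

    def on_timeout(self) -> None:
        # Retransmission: back off the timer and take no RTT sample.
        self.rto *= self.gamma

    def on_ack(self, sample: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            return  # ambiguous sample: keep the backed-off RTO
        self.rtt = self.alpha * self.rtt + (1 - self.alpha) * sample
        self.rto = self.beta * self.rtt


timer = KarnTimer(rtt=100.0)
timer.on_timeout()                             # RTO doubles
timer.on_ack(50.0, was_retransmitted=True)     # ignored
timer.on_ack(100.0, was_retransmitted=False)   # estimator resumes
print(timer.rtt, timer.rto)
```

The backed-off timeout stays in force until a segment is acknowledged without having been retransmitted, at which point the normal estimator takes over again.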

- Jacobson's RTO estimator Key Observations: At high load, there is a wide range of variation in delay Queuing theory suggests that by using the formula RTO = β * New RTT and limiting β to the suggested value of 2, the RTO estimation can adapt to loads of at most 30 %

Solution: Estimate both the average round trip time and the variance, and use the estimated variance in place of the constant β DIFF = SAMPLE - old RTT Smoothed RTT = old RTT + δ * DIFF DEV = old DEV + ρ * ( |DIFF| - old DEV ) Timeout = Smoothed RTT + η * DEV where DEV : the estimated mean deviation δ : a fraction between 0 and 1 that controls how quickly the new sample affects the weighted average (Smoothed RTT) ρ : a fraction between 0 and 1 that controls how quickly the new sample affects the mean deviation η : a factor that controls how much the deviation affects the RTO (suggested value of η is 4)
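The four update equations translate directly into code. In this sketch the two gain fractions default to 1/8 and 1/4; those defaults are assumptions (the slide only requires fractions between 0 and 1), while the deviation factor of 4 is the suggested value. The function name `jacobson_update` is hypothetical.

```python
def jacobson_update(old_rtt: float, old_dev: float, sample: float,
                    rtt_gain: float = 0.125, dev_gain: float = 0.25,
                    dev_factor: float = 4.0) -> tuple[float, float, float]:
    """Return (smoothed RTT, mean deviation, timeout) after one sample."""
    diff = sample - old_rtt
    smoothed = old_rtt + rtt_gain * diff
    dev = old_dev + dev_gain * (abs(diff) - old_dev)
    timeout = smoothed + dev_factor * dev
    return smoothed, dev, timeout


print(jacobson_update(old_rtt=100.0, old_dev=10.0, sample=140.0))
# (105.0, 17.5, 175.0)
```

A single late sample raises the timeout mostly through the deviation term, which is what lets the estimator tolerate the wide delay variation seen at high load.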

Adapting to the path: Congestion Avoidance - Use coarse grained timeout to indicate congestion in the network - If loss occurs (timeout) when cwnd = W The network can absorb up to W segments Set cwnd to 0.5 * W (multiplicative decrease) - Upon receiving an ACK, Increase cwnd by 1/cwnd (additive increase)

Review: Congestion control algorithms must obey the “ Packet Conservation Principle ”. * to get to the equilibrium state and reach high utilization of the network BW without hitting the network with a big burst, USE the ‘SLOW-START’ algorithm * to maintain the equilibrium state (not inject a new packet into the network until an old packet has been taken out), USE only unambiguous samples to measure RTT (Karn's algorithm) & USE an accurate model to calculate RTO (Jacobson's model) * to adapt to the network condition, USE a mechanism to detect the occurrence of loss (coarse-grained timeout) and USE congestion avoidance to avoid exceeding the available BW

The combined slow-start with congestion avoidance algorithm - Use 2 state variables : cwnd : the congestion window at the sender site ssthresh : the threshold used to switch between the two algorithms - The sender always sends min(cwnd, advertised window) - # unacked packets - If a packet is dropped, we lose self-clocking - We need to implement both algorithms together to avoid losing packets as much as we can.

- The algorithm starts with slow-start; on a timeout, ssthresh = cwnd/2 cwnd = 1 - Now, upon receiving an ACK if (cwnd < ssthresh) cwnd += 1 ; /* implement slow-start */ else cwnd += 1/cwnd ; /* implement congestion avoidance */
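The combined algorithm can be written as two small functions. This is a sketch with windows counted in segments; the function names are assumptions, and rounding ssthresh down follows the traces in this deck (15/2 = 7).

```python
import math


def on_ack(cwnd: float, ssthresh: float) -> float:
    """Window growth on each ACK for new data."""
    if cwnd < ssthresh:
        return cwnd + 1       # slow-start: +1 per ACK
    return cwnd + 1 / cwnd    # congestion avoidance: about +1 per RTT


def on_timeout(cwnd: float) -> tuple[float, float]:
    """Return (new cwnd, new ssthresh) after a retransmission timeout."""
    return 1.0, float(math.floor(cwnd / 2))


cwnd, ssthresh = on_timeout(15.0)
trace = [cwnd]
for _ in range(8):
    cwnd = on_ack(cwnd, ssthresh)
    trace.append(round(cwnd, 3))
print(ssthresh, trace)
```

The trace climbs by one per ACK until it reaches ssthresh and then grows by only a fraction of a segment per ACK.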

Slow-Start and Congestion Avoidance SENDER RECEIVER PKT#0 ACK #0, wait for #1 #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 (1) (2) (4) (8) (14) ACK #12 dup ACK #12... SENDER RECEIVER #15 #26 #16... Timeout ssthresh = 15/2 = 7 ( cwnd = 1 ) “start slow-start again” Retx #13 ACK #26, wait for #27 #27 #28 (2) (4)

SENDER RECEIVER (7) #29 #30 #31 #32 “enter congestion avoidance” #33 #34 #35 #36 #37 #38 #39 (8) #40 #41 #42 #43 #44 #45 #46 (8.125) #47... Timeout ssthresh = 8/2 = 4 ( cwnd = 1 ) “start slow-start again” Retx #41 ACK #47, wait for #48 #48 #49 (2) (4) “enter congestion avoidance”...

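[Figure: the congestion window for the slow-start/congestion avoidance algorithm, plotted as congestion window versus time. The window starts at 1 and grows to W1, where a timeout occurs; it restarts from 1 with the threshold at 0.5 W1 and grows to W2, where the next timeout sets the threshold to 0.5 W2.]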

Impacts of timeout - Timeout can cause sender to: Slow-start Retransmit a portion of window (possibly large) - Employ duplicate ACKs to signal the sender Fast Retransmit : use a number of duplicate ACKs to signal the sender about the packet loss (shorten the idle time for waiting for the timeout) Fast Recovery : advance congestion window more aggressively to reach high utilization faster

Fast Retransmit - Duplicate ACKs can be caused by: Segment Dropped Segment Re-ordering - TCP receiver should send an immediate duplicate ACK when an out-of-order segment arrives - TCP receiver should send an immediate ACK when an incoming segment fills in all or part of a gap in the sequence space.

- Assuming that segment re-ordering is infrequent, the TCP sender uses receipt of 3 duplicate ACKs as an indication that a segment has been lost “3 duplicate ACKs” means 4 identical ACKs without the arrival of any other intervening ACK packets Set ssthresh = 0.5 * current cwnd, cwnd = 1, and retransmit the dropped segment before timeout Wait for a non-duplicate ACK and continue with slow-start
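The duplicate-ACK rule can be sketched as a counter on the sender. The class name `TahoeSender` and its fields are hypothetical, window growth on new ACKs is left out, and the sketch follows this deck's numbering in which ACK #n means that segment #n+1 is expected next.

```python
class TahoeSender:
    """Toy sender that only implements the fast retransmit trigger."""

    DUP_THRESHOLD = 3

    def __init__(self, cwnd: float, ssthresh: float):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.last_ack = None
        self.dup_count = 0
        self.retransmitted = []  # segment numbers resent so far

    def on_ack(self, ack_no: int) -> None:
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.DUP_THRESHOLD:
                self.ssthresh = self.cwnd / 2
                self.cwnd = 1                           # back to slow-start
                self.retransmitted.append(ack_no + 1)   # resend the hole
        else:
            self.last_ack = ack_no
            self.dup_count = 0


sender = TahoeSender(cwnd=8, ssthresh=64)
for ack in (5, 6, 6, 6, 6):  # one new ACK #6, then three duplicates
    sender.on_ack(ack)
print(sender.retransmitted, sender.cwnd, sender.ssthresh)
```

The retransmission fires on the fourth identical ACK, well before a coarse-grained timer would expire.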

- Fast Retransmit removes the idle time the sender spends waiting for the coarse-grained timeout, since the sender can retransmit the dropped segment upon receiving the third duplicate ACK - However, the throughput of the system still suffers because the sender has to enter slow-start every time a retransmission occurs - Moreover, Fast Retransmit causes unnecessary retransmissions when multiple drops occur in a single window

Fast Recovery - Key Observation: A duplicate ACK is caused by the receipt of a segment at the receiver site In other words, each duplicate ACK corresponds to taking one segment out of the network So, it is possible to use the duplicate ACKs to clock the sending of segments - Solution: If n duplicate ACKs arrive at the sender, advance cwnd by n

Fast Retransmit & Fast Recovery - Upon receiving the third duplicate ACK, retransmit the missing segment (Fast Retransmit), set ssthresh = 0.5 * current cwnd, and set cwnd = ssthresh + 3 (Fast Recovery) - After that, upon receiving each further duplicate ACK, inflate the congestion window by one - If the sender's usable window allows, send a new data segment - Upon receiving a non-duplicate ACK, exit Fast Recovery: set cwnd = ssthresh (the value set in the first step) and continue with congestion avoidance
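These steps can be sketched as a small state machine. The class name `RenoSender` and its fields are hypothetical; windows are whole segments, and halving rounds down to match the traces in this deck.

```python
class RenoSender:
    """Toy Reno sender tracking only cwnd, ssthresh, and duplicate ACKs."""

    def __init__(self, cwnd: int):
        self.cwnd = cwnd
        self.ssthresh = cwnd
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmits = 0

    def on_dup_ack(self) -> None:
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.retransmits += 1            # fast retransmit
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3    # enter fast recovery
            self.in_recovery = True
        elif self.in_recovery:
            self.cwnd += 1                   # inflate per extra dup ACK

    def on_new_ack(self) -> None:
        if self.in_recovery:
            self.cwnd = self.ssthresh        # deflate and exit
            self.in_recovery = False
        self.dup_acks = 0


sender = RenoSender(cwnd=15)
for _ in range(4):
    sender.on_dup_ack()
print(sender.ssthresh, sender.cwnd)  # 7 11
sender.on_new_ack()
print(sender.cwnd)                   # 7
```

Starting from a window of 15, the third duplicate ACK gives ssthresh = 7 and cwnd = 10, the fourth inflates cwnd to 11, and the first new ACK deflates it to 7.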

- Fast Recovery improves the throughput of the system noticeably, since duplicate ACKs are used to clock the sending of new segments - However, it suffers badly if multiple drops occur in a single window. The throughput drops dramatically, especially when there are 3 non-consecutive drops in a window

Modified Fast Recovery (Conservative version) - Key Observation: Fast Recovery suffers from multiple drops since it has to enter Fast Recovery several times - Solution: Change the sender's behavior during Fast Recovery when a partial ACK is received A partial ACK is one that acknowledges some but not all of the segments that were outstanding at the start of the Fast Recovery period

In the original Fast Recovery, partial ACKs cause TCP sender to exit Fast Recovery by deflating the congestion window back to the size of ssthresh In the modified Fast Recovery, partial ACKs do not take TCP sender out of Fast Recovery Instead, partial ACKs received during Fast Recovery trigger the sender to retransmit the segment immediately following the acknowledged segment TCP sender remains in Fast Recovery until all of the data outstanding when Fast Recovery was initiated has been acknowledged
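The difference between a partial and a full ACK can be sketched as one decision function. The name `recover` (the highest segment outstanding when Fast Recovery began) and the function name are assumptions, the window is left unchanged on a partial ACK as a simplification, and ACK #n is taken to mean that segment #n+1 is expected next, as in this deck's traces.

```python
from typing import Optional


def on_new_ack_in_recovery(ack_no: int, recover: int, ssthresh: int,
                           cwnd: int) -> tuple[bool, int, Optional[int]]:
    """Return (still in recovery, cwnd, segment to retransmit or None)."""
    if ack_no >= recover:
        # Full ACK: everything outstanding at entry is acknowledged.
        return False, ssthresh, None
    # Partial ACK: stay in recovery and resend the next missing segment.
    return True, cwnd, ack_no + 1


# Drops of #14 and #28 with segments up to #28 outstanding at entry:
print(on_new_ack_in_recovery(27, recover=28, ssthresh=7, cwnd=20))
print(on_new_ack_in_recovery(33, recover=28, ssthresh=7, cwnd=12))
```

The first call is the partial ACK produced by retransmitting #14 and triggers an immediate retransmission of #28; the second acknowledges past the recovery point and ends Fast Recovery with cwnd = ssthresh.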

Selective Acknowledgement (SACK) - TCP receiver provides more information about hole(s) in the sequence buffer to the sender - The SACK option field contains a number of SACK blocks, where each SACK block reports a non-contiguous set of data that has been received and queued. The 1st block is required to report the most recently received segment The additional SACK blocks repeat the most recently reported SACK blocks

- The minimum number of SACK blocks in the SACK option field is two. It can have more than two blocks depending on the other option fields implemented in TCP. - The simulation referenced by this presentation is assumed to have three blocks in the SACK option field

- The SACK TCP sender enters Fast Recovery upon receiving the 3rd duplicate ACK of a certain segment. As in regular Fast Recovery, the sender cuts cwnd in half and retransmits the dropped segment - During Fast Recovery, SACK maintains a variable, named ‘pipe’, representing the estimated number of segments outstanding in the path - The sender also maintains a data structure, called ‘scoreboard’, which remembers acknowledgements from previous SACK options

- The sender only sends new or retransmitted data when “pipe < cwnd” - ‘pipe’ is incremented by one when the sender either sends a new segment or retransmits an old packet - ‘pipe’ is decremented by one when the sender receives a dup ACK packet with a SACK option reporting that new data has been received at the receiver - Upon receiving a partial ACK, ‘pipe’ is decremented by two - The sender exits Fast Recovery when it receives a recovery acknowledgement acknowledging all data that was outstanding when it enters Fast Recovery

- When the sender is allowed to send a segment, It retransmits the next segment inferred to be missing If no such segments and the advertised window is sufficiently large, the sender sends a new packet - When the retransmitted packet is itself dropped, the TCP sender detects drop with RTO, retransmits the dropped segment and then slow-starts.
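The ‘pipe’ bookkeeping can be sketched as a small class. The class and method names are hypothetical, and the scoreboard (which decides which segment to send) is left out; only the counting rules listed above are modeled.

```python
class SackRecovery:
    """Toy model of the 'pipe' counter during SACK Fast Recovery."""

    def __init__(self, cwnd: int, pipe: int):
        self.cwnd = cwnd
        self.pipe = pipe  # estimated segments outstanding in the path

    def can_send(self) -> bool:
        return self.pipe < self.cwnd

    def on_send(self) -> None:
        self.pipe += 1  # new segment or retransmission

    def on_dup_ack_with_new_sack(self) -> None:
        self.pipe -= 1  # one segment has left the network

    def on_partial_ack(self) -> None:
        self.pipe -= 2  # original and retransmitted copy have both left


# Values from the "SACK TCP : 2 drops" trace: cwnd = 7, pipe = 15 - 3 = 12.
rec = SackRecovery(cwnd=7, pipe=12)
for _ in range(10):  # 4th through 13th duplicate ACK
    rec.on_dup_ack_with_new_sack()
sent = 0
while rec.can_send():
    rec.on_send()
    sent += 1
print(rec.pipe, sent)  # 7 5
```

After ten further duplicate ACKs pipe has fallen to 2, so the sender may transmit five segments before pipe reaches cwnd again, matching the "can send five more segments" note in the trace.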

TCP Flavors - Tahoe, Reno, New-Reno, Vegas - TCP Tahoe (distributed with 4.3 BSD Unix) includes: Slow-start (exponential increase congestion window) Congestion Avoidance (additive increase) Fast Retransmit (use 3 dup ACKs)

- TCP Reno (1990) includes : All mechanisms in Tahoe Fast Recovery ( governing the transmission after retransmit the lost segment ) Delayed Acknowledgement ( to avoid silly window syndrome ) - TCP New Reno : Makes a small change in responding to partial ACKs during Fast Recovery

Tahoe: 1 drop SENDER RECEIVER #0 (1) (2) #1 #2 ACK #1 - #2 (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 ACK #7 - #13 (15) SENDER RECEIVER #15 #16 #17 #18 #26 #27 #28... 3 dup ACKs #13 “enter fast retransmit” ssthresh = 15/2 = 7 (cwnd = 1) “continue with slow-start”... 14th dup ACK #13 Retx #13 ACK #28 (2) #29 #30 ACK #29 - #30 (4)

#31 #32 #33 #34 ACK #31 - #34 (7) SENDER RECEIVER “enter congestion avoidance” #35 #36 #37 #38 #39 #40 #41 ACK #35 - #41 (8)...

Reno : 1 drop SENDER RECEIVER #0 #1 #2 ACK #1 - #2 (1) (2) (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 ACK #7 - #13 (15) SENDER RECEIVER #15 #16 #17 #18 #26 #27 #28... ACK #28 3 dup ACKs #13 “enter fast recovery” ssthresh = 15/2 = 7 (cwnd = 7) 4th dup ACK #13 (11) 5th dup ACK #13 6th dup ACK #13 7th dup ACK #13 8th dup ACK #13 9th dup ACK #13 (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) 10th dup ACK #13 11th dup ACK #13 12th dup ACK #13 13th dup ACK #13 14th dup ACK #13 #29 #30 #31 #32 #33 #34 “exit fast recovery” ssthresh = 7 (cwnd = 7) continue with congestion avoidance ! #35

SENDER RECEIVER ACK #29 - #35 (8) #36 #37 #38 #39 #40 #41 #42 #43 ACK #36 - #43 (9)...

Tahoe: 2 drops SENDER RECEIVER #0 (1) (2) #1 #2 ACK #1 - #2 (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 SENDER RECEIVER “enter fast retransmit” ssthresh = 8/2 = 4 (cwnd = 1) continue with slow-start 3 dup ACKs #6 6th dup ACK #13... Retx #7 ACK #8 (2) #9 (retx) ACK #14 #15 #16... #10 1st dup ACK #14 (3) #17 ACK #15 - #17 (4.67) “enter congestion avoidance”

Reno : 2 drops (causing “retransmission timeout”) SENDER RECEIVER #0 (1) (2) #1 #2 ACK #1 - #2 (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 SENDER RECEIVER “enter fast recovery” ssthresh = 8/2 = 4 (cwnd = 4) 3 dup ACKs #6 6th dup ACK #6 Retx #7 ACK #8 #15 #16 1st dup ACK #8 #17 #18 ACK #17 - #18 (4) (10) “exit fast recovery” ssthresh = 4 (cwnd = 4) cannot send more data since the outstanding no. of segments is 8 2nd dup ACK #8... Timeout Retx #9 “enter slow-start” (cwnd = 1) ACK #16 (2) 4th dup ACK #6 (8) 5th dup ACK #6 (9)

Reno : 2 drops (causing “two successive Fast Recovery”) SENDER RECEIVER #0 #1 #2 ACK #1 - #2 (1) (2) (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 ACK #7 - #13 (15) SENDER RECEIVER #15 #16 #17 #18 #26 #27 #28... 3 dup ACKs #13 “enter fast recovery” ssthresh = 15/2 = 7 (cwnd = 7) 4th dup ACK #13 (11) 5th dup ACK #13 6th dup ACK #13 7th dup ACK #13 8th dup ACK #13 9th dup ACK #13 (12) (13) (14) (15) (16) (17) (18) (19) (20) 10th dup ACK #13 11th dup ACK #13 12th dup ACK #13 13th dup ACK #13 Retx#14 #29 #30 #31 #32 #33 ACK#27 “exit fast recovery” ssthresh = 7 (cwnd = 7) #34

SENDER RECEIVER 3 dup ACKs #27 4th dup ACK #27 5th dup ACK #27 “enter fast recovery” ssthresh = 7/2 = 3 (cwnd = 3) (7) (8) 6th dup ACK #27 (9) #35 #36 Retx#28 ACK#34 “exit fast recovery” ssthresh = 3 (cwnd = 3) continue with congestion avoidance ACK#35 #37 #38 ACK#36 #39 ACK#37 (4) #40 #41 #42 ACK#38 ACK#39 #43 ACK#40 #44 ACK#41 #45 (5)

New Reno : 2 drops SENDER RECEIVER #0 #1 #2 ACK #1 - #2 (1) (2) (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 ACK #7 - #13 (15) SENDER RECEIVER #15 #16 #17 #18 #26 #27 #28... 3 dup ACKs #13 “enter fast recovery” ssthresh = 15/2 = 7 (cwnd = 7) 4th dup ACK #13 (11) 5th dup ACK #13 6th dup ACK #13 7th dup ACK #13 8th dup ACK #13 9th dup ACK #13 (12) (13) (14) (15) (16) (17) (18) (19) (20) 10th dup ACK #13 11th dup ACK #13 12th dup ACK #13 13th dup ACK #13 Retx#14 #29 #30 #31 #32 #33 ACK#27 “receive a partial ACK; retransmit segment#28 immediately”

SENDER RECEIVER Retx#28 (7) (8) (9) (10) (11) (12) 5 dup ACKs #27 #34 #35 #36 #37 #38 #39 ACK#33 “exit fast recovery” ssthresh = 7 (cwnd = 7) continue with congestion avoidance

SACK TCP : 2 drops SENDER RECEIVER #0 #1 #2 ACK #1 - #2 (1) (2) (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 ACK #7 - #13 (15) SENDER RECEIVER #15 #16 #17 #18 #26 #27 #28... 3 dup ACKs #13 “enter Fast Recovery” pipe = cwnd - ndup = 15 - 3 = 12 ssthresh = 15/2 = 7 cwnd = 7 4th dup ACK #13 5th dup ACK #13 6th dup ACK #13 7th dup ACK #13 8th dup ACK #13 9th dup ACK #13 10th dup ACK #13 11th dup ACK #13 12th dup ACK #13 13th dup ACK #13 (7, 11) (7, 10) (7, 9) (7, 8) (7, 7) (7, 6) (7, 5) (7, 4) (7, 3) (7, 2) Retx#14 Can send five more segments

SENDER RECEIVER ACK#27 #29 #30 #31 #32 #33 (7, 3) (7, 4) (7, 5) (7, 6) (7, 7) #34 #35 (7, 5) (7, 6) (7, 7) 5 dup ACKs #27 (7, 6) (7, 5) (7, 4) (7, 3) (7, 2) Retx#28 #36 #37 #38 #39 (7, 7) 2 dup ACKs #27 (7, 6) (7, 5) #40 #41 (7, 7) ACK#35 “exit fast recovery” ssthresh = 7 (cwnd = 7) continue with congestion avoidance SENDER RECEIVER #42 ACK#36 ACK#37 ACK#38 ACK#39 #43 #44 #45 #46 ACK#40 ACK#41 #47 #48 ACK#42 #49 (8) #50 #51 #52 #53 #54 ACK#43 ACK#44 ACK#45 ACK#46

Example: SACK 2 drops (#14 and #28) At sender: Receive ACK# 7 No Gap 70-6 ACK# 8 No Gap 80-7 ACK# 9 No Gap 90-8 ACK#10 No Gap 100-9 ACK#11 No Gap 110-10 ACK#12 No Gap 120-11 ACK#13 No Gap 130-12 1st dup a hole at #14 ACK#13 150-13 2nd dup a hole at #14 ACK#13 160-13 15 3rd dup a hole at #14 ACK#13 170-13 15-16 *** Enter Fast Recovery ! ssthresh = cwnd = 15/2 = 7 outstanding segment = #14 - #28 Retransmit #14 13th dup a hole at #14 ACK#13 270-13 15-26...

ACK#27 No Gap *** The first partial ACK is caused by retransmitted segment #14 ‘pipe’ is decremented by two 270-27 1st dup a hole at #28 ACK#27 290-27 2nd dup a hole at #28 ACK#27 300-27 29 3rd dup a hole at #28 ACK#13 310-27 29-30 4th dup a hole at #28 ACK#13 320-27 29-31 7th dup a hole at #28 ACK#13 350-27 29-34... Retransmit #28 ACK#35 No Gap *** The recovery ACK is caused by retransmitted segment #28 It brings TCP sender out of Fast Recovery *** Exit Fast Recovery ! ssthresh = cwnd = 15/2 = 7 continue with congestion avoidance 350-34

Tahoe: 3 drops SENDER RECEIVER #0 #1 #2 ACK #1 - #2 (1) (2) (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 ACK #7 - #13 (15) #15 #16 #17 #18 #26 #27 #28... SENDER RECEIVER “enter fast retransmit” ssthresh = 15/2 = 7 (cwnd = 1) continue with slow-start 12th dup ACK #13 Retx#14 ACK #25 #26 (retx) #27 (2) (3) ACK #27 1st dup ACK #27 #28 (retx) #29 #30 ACK #28 ACK #29 ACK #30 (4) (5) (6)

SENDER RECEIVER #31 #32 #33 #34 #35 #36 ACK #31 - #36 (7) #37 #38 #39 #40 #41 #42 “enter congestion avoidance”

Reno : 3 drops SENDER RECEIVER #0 #1 #2 ACK #1 - #2 (1) (2) (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 ACK #7 - #13 (15) SENDER RECEIVER #15 #16 #17 #18 #26 #27 #28... 3 dup ACKs #13 “enter fast recovery” ssthresh = 15/2 = 7 (cwnd = 7) 4th dup ACK #13 (11) 5th dup ACK #13 6th dup ACK #13 7th dup ACK #13 8th dup ACK #13 9th dup ACK #13 (12) (13) (14) (15) (16) (17) (18) (19) 10th dup ACK #13 11th dup ACK #13 12th dup ACK #13 Retx#14 #29 #30 #31 #32 ACK#25 “exit fast recovery” ssthresh = 7 (cwnd = 7) continue with congestion avoidance

SENDER RECEIVER 3 dup ACKs #25 “enter fast recovery” ssthresh = 7/2 = 3 (cwnd = 3) 4th dup ACK #25 (7) Retx#26 ACK#27 “exit fast recovery” ssthresh = 3 (cwnd = 3) continue with congestion avoidance... Timeout “enter slow-start” (cwnd = 1) Retx#28 ACK#32 (2) #33 #34 ACK#33 - #34 (3) continue with congestion avoidance

New Reno : 3 drops SENDER RECEIVER #0 #1 #2 ACK #1 - #2 (1) (2) (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 ACK #7 - #13 (15) SENDER RECEIVER #15 #16 #17 #18 #26 #27 #28... 3 dup ACKs #13 “enter fast recovery” ssthresh = 15/2 = 7 (cwnd = 7) 4th dup ACK #13 (11) 5th dup ACK #13 6th dup ACK #13 7th dup ACK #13 8th dup ACK #13 9th dup ACK #13 (12) (13) (14) (15) (16) (17) (18) (19) 10th dup ACK #13 11th dup ACK #13 12th dup ACK #13 Retx#14 #29 #30 #31 #32 ACK#25 “receive a partial Acknowledgement” retransmit #26 immediately

SENDER RECEIVER Retx#26 (7) (8) (9) (10) (11) #33 #34 #35 #36 4 dup ACKs #25 “receive a partial Acknowledgement” retransmit #28 immediately (7) 4 dup ACKs #27 (8) (9) (10) (11) #37 #38 Retx#28 ACK#27 ACK#36 “exit fast recovery” ssthresh = 7 (cwnd = 7) #39 #40 #41 #42 #43

SACK TCP : 3 drops SENDER RECEIVER #0 #1 #2 ACK #1 - #2 (1) (2) (4) #3 #4 #5 #6 ACK #3 - #6 (8) #7 #8 #9 #10 #11 #12 #13 #14 ACK #7 - #13 (15) SENDER RECEIVER #15 #16 #17 #18 #26 #27 #28... “enter Fast Recovery” pipe = cwnd - ndup = 15 - 3 = 12 ssthresh = 15/2 = 7 cwnd = 7 4th dup ACK #13 5th dup ACK #13 3 dup ACKs #13 6th dup ACK #13 7th dup ACK #13 8th dup ACK #13 9th dup ACK #13 10th dup ACK #13 11th dup ACK #13 12th dup ACK #13 (7, 11) (7, 10) (7, 9) (7, 8) (7, 7) (7, 6) (7, 5) (7, 4) (7, 3) Retx#14 Realize that #26 has been lost, and right now we can send 4 segments

SENDER RECEIVER #29 #30 #31 Retx #26 (7, 4) (7, 5) (7, 6) (7, 7) ACK #25 The first partial ACK (7, 5) #32 #33 (7, 7) 3 dup ACKs #25 (7, 6) (7, 5) (7, 4) These 3 dup ACKs contain information indicating holes at segment #26 and #28. Retx #28 #34 #35 ACK #27 The second partial ACK (7, 5) (7, 7) #36 #37 (7, 7) (7, 5) 2 dup ACKs #27 #38 #39 ACK #33 “exit Fast Recovery” ssthresh = 7 cwnd = 7 continue with congestion avoidance

- TCP Vegas (1995) implements 3 new techniques to increase throughput and decrease losses : New retransmission mechanism : results in a more timely decision to retransmit a dropped segment Congestion avoidance mechanism : gives TCP the ability to anticipate congestion and adjust its transmission rate accordingly Modified Slow-Start mechanism : avoids packet losses while trying to find the available bandwidth during the initial use of slow-start

TCP Vegas New Retransmission Mechanism - Vegas reads and records the system clock each time a segment is sent - When an ACK arrives, Vegas reads the clock again and does the RTT calculation: RTT = ACK arriving time - Segment sending time Goals: 1. To be able to detect lost segments even though there may be no second or third duplicate ACK 2. To reduce the time to detect lost segments ( can retransmit before receiving the third duplicate ACK )

- Vegas then uses this more accurate RTT estimate to decide to retransmit in the following two situations : When a duplicate ACK #n is received, Vegas checks the difference between the current time and the sending time of segment #n+1. If it is greater than the timeout value, Vegas retransmits segment #n+1 without having to wait for 3 duplicate ACKs When a non-duplicate ACK #n is received and it is the first or second one after a retransmission, Vegas checks the difference between the current time and the sending time of segment #n+1. If it is greater than the timeout value, Vegas retransmits segment #n+1 without having to wait for 3 duplicate ACKs
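The timestamp check is the same in both situations and can be sketched as one function. The name `should_retransmit`, the `send_times` dictionary of recorded per-segment send times, and the numbers used are assumptions for illustration.

```python
def should_retransmit(ack_no: int, now: float,
                      send_times: dict[int, float], timeout: float) -> bool:
    """Vegas check on ACK #n: has segment #n+1 been out longer than timeout?"""
    sent_at = send_times.get(ack_no + 1)
    if sent_at is None:
        return False  # segment #n+1 has not been sent yet
    return (now - sent_at) > timeout


send_times = {13: 1.0, 14: 1.5}  # hypothetical clock readings, in seconds
print(should_retransmit(12, now=2.0, send_times=send_times, timeout=0.75))
print(should_retransmit(13, now=2.0, send_times=send_times, timeout=0.75))
```

On a duplicate ACK #12 the sender finds that segment #13 has been outstanding for 1.0 s, longer than the timeout, and retransmits at once; segment #14 has been outstanding for only 0.5 s, so no action is taken for it.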

- In addition to being able to detect lost segment sooner than the original TCP Reno, the congestion window in TCP Vegas is decreased due to only losses that happened at the current sending rate, and not due to losses that happened at an earlier, higher rate - This concept is also implemented in TCP New-Reno where any partial ACKs do not bring TCP sender out of Fast Recovery

Vegas : retransmit mechanism (diagram) SENDER RECEIVER #0 #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 ACK#7 ACK#8 ACK#9 ACK#10 ACK#11 ACK#12 #15 1 RTT 1st dup ACK#12 : ACK#13 is expected : This is the 1st dup ACK #12 (due to #14) Vegas checks the sending time of the segment #13 and decides to retransmit it. The congestion window is also reduced by half 14/2 = 7 Retx #13 ACK#14 : This is the 1st ACK after retransmission Vegas checks timestamp of #15 and decides to retransmit it The congestion window is not reduced by half since the loss happens before the last window decreases ( we know because it is a partial ACK ). Such a loss does not imply that the network is congested for the current congestion window size, and therefore, does not imply that it should be decreased again 1 RTT : ACK#15 is expected

TCP Vegas Congestion Avoidance Mechanism Review of TCP Reno's congestion detection and control mechanism : - It uses the loss of segments as a signal of network congestion - It is reactive, rather than proactive, since it cannot detect the incipient stage of congestion and prevent it (before losses occur) - As a result, Reno needs to create losses to find the available bandwidth of the connection

Several proactive algorithms : - Based on the fact that as the network approaches congestion, the queue size in intermediate node is increased, resulting in increasing of the RTT for each successive segment : Wang and Crowcroft’s DUAL algorithm Jain’s CARD (Congestion Avoidance using Round-Trip Delay) - Based on the fact that as the network approaches congestion, the sending rate is flattening : Wang and Crowcroft’s Tri-S scheme

Vegas’s congestion avoidance actions : - Generally, Vegas measures and controls the right amount of extra data the connection has in transit - Extra data mean data that would not have been sent if the bandwidth used by the connection exactly matched the available bandwidth of the network - Too much extra data : congestion Too little extra data : cannot respond rapidly enough to transient increases in the available network bandwidth - Based on changes in the estimated amount of extra data in the network, not only dropped segments

- BaseRTT means the RTT of a segment when the connection is not congested In practice, Vegas sets BaseRTT to the minimum of all measured round trip times - Assuming that the connection is not overflowing, the Expected throughput is given by : Expected = WindowSize / BaseRTT where WindowSize is the size of the current congestion window (assumed to be the number of bytes in transit)

- Once per round-trip time, Actual sending rate is calculated : Computing the RTT for the distinguished segment when its acknowledgement arrives, and dividing the number of bytes transmitted by the sample RTT - Compare Actual to Expected : Diff = Expected - Actual Since, Expected >= Actual (from the definition), Diff is positive or zero

- Define two thresholds α and β, with α < β ( in terms of KB/s ) - The two thresholds represent the lower bound and the upper bound of extra data for a connection - In practice, during congestion avoidance, we express the two thresholds in terms of buffers rather than extra bytes in transit: α is set to 1 and β is set to 3 These values can be interpreted as : the TCP sender should try to use at least one extra buffer at the bottleneck router, but no more than three extra buffers

- α < Diff < β : leaves the congestion window unchanged - Diff > β : decreases the congestion window linearly during the next RTT The farther away the actual throughput gets from the expected throughput, the more congestion there is in the network - Diff < α : increases the congestion window linearly during the next RTT The closer the actual throughput is to the expected throughput, the more the connection is in danger of not utilizing the available bandwidth
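The Vegas window adjustment can be sketched as a once-per-RTT function: the window grows when the extra data falls below the lower threshold and shrinks when it exceeds the upper one. Several assumptions are made here: the window is counted in segments, the actual rate is taken as one full window per sample RTT, Diff is converted to buffers by multiplying by BaseRTT, and the linear step is one segment. The function name `vegas_adjust` is hypothetical.

```python
def vegas_adjust(cwnd: float, base_rtt: float, sample_rtt: float,
                 lower: float = 1.0, upper: float = 3.0) -> float:
    """Return the congestion window to use during the next RTT."""
    expected = cwnd / base_rtt               # rate with no queueing
    actual = cwnd / sample_rtt               # measured rate (assumption)
    extra = (expected - actual) * base_rtt   # extra segments queued
    if extra < lower:
        return cwnd + 1   # too little extra data: increase linearly
    if extra > upper:
        return cwnd - 1   # too much extra data: decrease linearly
    return cwnd           # between the thresholds: leave unchanged


print(vegas_adjust(10, base_rtt=0.1, sample_rtt=0.1))    # 11
print(vegas_adjust(10, base_rtt=0.1, sample_rtt=0.125))  # 10
print(vegas_adjust(10, base_rtt=0.1, sample_rtt=0.2))    # 9
```

With thresholds of 1 and 3 buffers, a sample RTT equal to BaseRTT means no queueing and the window grows, a sample of 0.125 s corresponds to two queued segments and the window holds, and a sample of 0.2 s corresponds to five queued segments and the window shrinks.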

TCP Vegas Modified Slow-Start Mechanism - Slow-Start in TCP Reno : TCP is a “self-clocking” protocol It uses ACKs as a “clock” to strobe new segments into the network At the beginning of a connection or after a retransmit timeout, Slow-Start is used to gradually increase the amount of data in transit (the size of congestion window -- cwnd) The Slow-Start period ends when the exponentially increasing congestion window reaches the threshold window -- ssthresh

Once a retransmit timeout occurs, ssthresh is set to one half of the current cwnd However, when the connection starts, there is no idea how much ssthresh should be initialized to Too small initial ssthresh : throughput suffers Too large initial ssthresh : losses occur For Reno, the initial ssthresh is set to a very high value. TCP sender is blindly in the slow-start phase until a retransmit timeout occurs (timeout means segment losses) At that time, TCP sender has some idea about the available bandwidth of the connection

- Modified Slow-Start in TCP Vegas : Find a connection's available bandwidth without allowing losses during the initial slow-start, by incorporating the congestion detection mechanism into slow-start Exponential growth is allowed only every other RTT In between, the congestion window stays fixed and the comparison of the expected and actual rates is made When “Expected - Actual == 1”, Vegas switches from Slow-Start to linear increase/decrease mode

SENDER RECEIVER #0 (1) (2) Vegas : modified slow-start mechanism (diagram) Comparison is made #1 #2 #3 #4 (4) Exponential growth #5 #6 #7 #8 Comparison is made...
