Presentation is loading. Please wait.

Presentation is loading. Please wait.

University of British Columbia CICS 515 (Part 1) Internet Computing Lectures 4 – Transport Layer (TCP) Flow and Congestion Control Dr. Son T. Vuong.

Similar presentations


Presentation on theme: "University of British Columbia CICS 515 (Part 1) Internet Computing Lectures 4 – Transport Layer (TCP) Flow and Congestion Control Dr. Son T. Vuong."— Presentation transcript:

1 University of British Columbia CICS 515 (Part 1) Internet Computing Lectures 4 – Transport Layer (TCP) Flow and Congestion Control Dr. Son T. Vuong May 24, 2012 The World Connected

2 CICS 515 – Summer 2012 © Dr. Son Vuong
Outline (Ch. 3) 3.1 Transport-layer services 3.2 Multiplexing and demultiplexing 3.3 Connectionless transport: UDP 3.4 Principles of reliable data transfer (Sliding Window Protocol) 3.5 Connection-oriented transport: TCP segment structure connection management reliable data transfer flow control RTT estimate 3.6 Principles of congestion control 3.7 TCP congestion control CICS 515 – Summer © Dr. Son Vuong

3 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 Point-to-point: one sender, one receiver Reliable, in-order byte steam: no “message boundaries” Pipelined data: TCP congestion and flow control set window size Send & receive buffers Full duplex data: bi-directional data flow in same connection MSS: maximum segment size Connection-oriented: handshaking (exchange of control msgs) init’s sender, receiver state before data exchange Flow controlled: sender will not overwhelm receiver Congestion control CICS 515 – Summer © Dr. Son Vuong

4 TCP Flow/Congestion Control
CICS 515 – Summer © Dr. Son Vuong

5 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP segment structure source port # dest port # 32 bits application data (variable length) sequence number acknowledgement number Receive window Urg data pointer checksum F S R P A U head len not used Options (variable length) URG: urgent data (generally not used) counting by bytes of data (not segments!) ACK: ACK # valid PSH: push data now (generally not used) # bytes rcvr willing to accept RST, SYN, FIN: connection estab (setup, teardown commands) Internet checksum (as in UDP) CICS 515 – Summer © Dr. Son Vuong

6 CICS 515 – Summer 2012 © Dr. Son Vuong
UDP Header Format Checksum computed over: Pseudoheader + UDP header + Data Pseudoheader = src, dest IP addresses + Protocol no. (17) + UDP length CICS 515 – Summer © Dr. Son Vuong

7 4.3 TCP Seq/ACK # Peer Instr - Question
Seq. #’s: byte stream “number” of first byte in segment’s data ACKs: seq # of next byte expected from other side cumulative ACK Q: how receiver handles out-of-order segments? A: TCP spec doesn’t say, - up to implementor Host A Host B User types ‘C’ Seq=42, ACK=79, data = ‘C’ host ACKs receipt of ‘C’, echoes back ‘C’ Seq= 79, ACK= 43, data = ‘C’ host ACKs receipt of echoed ‘C’ Seq= ?, ACK= ? time Simple Telnet scenario CICS 515 – Summer © Dr. Son Vuong

8 4.3 TCP Seq/ACK # Peer Instr - Answer
Seq. #’s: byte stream “number” of first byte in segment’s data ACKs: seq # of next byte expected from other side cumulative ACK Host A Host B User types ‘Computer’ Seq=42, ACK=79, data = ‘Computer’ host ACKs receipt of ‘Computer’, echoes back ‘Computer’ Seq= 79, ACK= 50, data = ‘Computer’ host ACKs receipt of echoed ‘Computer’ A. <Seq=43, Ack=80> B. <Seq=50, Ack=80> C. <Seq=50, Ack=87> D. <Seq=50, Ack=90> E. None of the above Seq= ?, ACK= ? time Simple Telnet scenario CICS 515 – Summer © Dr. Son Vuong

9 Q 4.2 TCP Peer Instruction Question
Refer to Fig 1 on TCP Seq, Ack, the correct values of Seq and Ack are:  A. <Seq=43, Ack=80> B. <Seq=50, Ack=80> C. <Seq=50, Ack=87> D. <Seq=50, Ack=90> E. None of the above CICS 515 – Summer © Dr. Son Vuong

10 CICS 515 – Summer 2012 © Dr. Son Vuong
Outline (Ch. 3) 3.1 Transport-layer services 3.2 Multiplexing and demultiplexing 3.3 Connectionless transport: UDP 3.4 Principles of reliable data transfer (Sliding Window Protocol) 3.5 Connection-oriented transport: TCP segment structure connection management reliable data transfer flow control RTT estimate 3.6 Principles of congestion control 3.7 TCP congestion control CICS 515 – Summer © Dr. Son Vuong

11 TCP Connection Management - Setup
Three way handshake: Step 1: client host sends TCP SYN segment to server specifies initial seq # no data Step 2: server host receives SYN, replies with SYNACK segment server allocates buffers specifies server initial seq. # Step 3: client receives SYNACK, replies with ACK segment, which may contain data Recall: TCP sender, receiver establish “connection” before exchanging data segments initialize TCP variables: seq. #s buffers, flow control info (e.g. RcvWindow) client: connection initiator Socket clientSocket = new Socket("hostname","port number"); server: contacted by client Socket connectionSocket = welcomeSocket.accept(); CICS 515 – Summer © Dr. Son Vuong

12 Connection Establishment
Three-Way Handshake Active participant Passive participant (client) (server) SYN, SequenceNum = x SYN + ACK, SequenceNum = y , ACK, Acknowledgment = + 1 Acknowledgment = When responding with an ACK, both parties will set Acknowledment = x + 1 (or y + 1), since they are announcing which sequence number they are expecting next. The sequence numbers don’t start at 0. They start at a random value, so that two consecutive uses of the same port won’t conflict with each other? Question: Why is this likely? CICS 515 – Summer © Dr. Son Vuong

13 Two-Way Handshake: Obsolete Data Segment
Thus there is a need to introduce an initial sequence no when starting a new connection, that would be far removed from the last seq no of the last connection CICS 515 – Summer © Dr. Son Vuong

14 Two-Way Handshake: Obsolete SYN Segment
CICS 515 – Summer © Dr. Son Vuong

15 Three-Way Handshake: Examples
+1 Three-Way Handshake: Examples CICS 515 – Summer © Dr. Son Vuong

16 CICS 515 – Summer 2012 © Dr. Son Vuong
The Two-Army Problem Attack at 12:00 Blue Army # 1 Blue Army # 2 Ack White Army CICS 515 – Summer © Dr. Son Vuong

17 TCP State Transition Diagram
CLOSED Active open / SYN Passive open Close Close LISTEN SYN/SYN + ACK Send/ SYN SYN/SYN + ACK SYN_RCVD SYN_SENT ACK SYN + ACK/ACK This is only for the connection and termination parts of TCP. Close /FIN ESTABLISHED Close / FIN FIN/ACK FIN_WAIT_1 CLOSE_WAIT FIN/ACK ACK Close / FIN ACK + FIN/ACK FIN_WAIT_2 CLOSING LAST_ACK ACK Timeout after two ACK FIN/ACK segment lifetimes TIME_WAIT CLOSED CICS 515 – Summer © Dr. Son Vuong

18 TCP Connection Management (cont.)
client server Closing a connection: client closes socket: clientSocket.close(); Step 1: client end system sends TCP FIN control segment to server Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN. close FIN ACK close FIN ACK timed wait closed CICS 515 – Summer © Dr. Son Vuong

19 TCP Connection Management (cont.)
client server Step 3: client receives FIN, replies with ACK. Enters “timed wait” - will respond with ACK to received FINs Step 4: server, receives ACK. Connection closed. Note: with small modification, can handle simultaneous FINs. closing FIN ACK closing FIN ACK closed timed wait closed CICS 515 – Summer © Dr. Son Vuong

20 TCP Connection Management (cont)
TCP server TCP client CICS 515 – Summer © Dr. Son Vuong

21 4.1 TCP 3-way Hanshake PeerInstr Question
Why does TCP use 3-way handshake instead of 2-way ? Answer: (A): To ensure the presence of the remote host (B): To ensure the remote host is willing to accept the connection request. (C): To ensure common agreement on the initial sequence number at both sides. (D): To prevent a SYN segment of an old connection from causing confusion on the new connection establishment. (E): C and D CICS 515 – Summer © Dr. Son Vuong

22 CICS 515 – Summer 2012 © Dr. Son Vuong
Outline (Ch. 3) 3.1 Transport-layer services 3.2 Multiplexing and demultiplexing 3.3 Connectionless transport: UDP 3.4 Principles of reliable data transfer (Sliding Window Protocol) 3.5 Connection-oriented transport: TCP segment structure connection management reliable data transfer flow control RTT estimate 3.6 Principles of congestion control 3.7 TCP congestion control CICS 515 – Summer © Dr. Son Vuong

23 TCP reliable data transfer
TCP creates rdt service on top of IP’s unreliable service Pipelined segments Cumulative acks TCP uses single retransmission timer Retransmissions are triggered by: timeout events duplicate acks Initially consider simplified TCP sender: ignore duplicate acks ignore flow control, congestion control p. 239 of Text Kurose (4th para), conceptually would need 1 timer for each segment transmitted, but multiple timers lead to much overhead. So, RFC2988 proposes the use of a single retx timer (even if there are multiple outstanding segments) – See Tanenbaum 4th ed. Pages : 1 timer and a linked list of difference in time for consecutive outstanding packets. CICS 515 – Summer © Dr. Son Vuong

24 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP sender events: Data rcvd from app: Create segment with seq # seq # is byte-stream number of first data byte in segment start timer if not already running (think of timer as for oldest unacked segment) expiration interval: TimeOutInterval Timeout: retransmit segment that caused timeout restart timer Ack rcvd: If acknowledges outstanding (previously unacked) segments update what is known to be acked (update SendBase) start timer if there are still outstanding segments So, retransmitted segment would have a later timeout set, i.e. it would become a later event than some subsequent segment(s). EX: Timeout1 Timeout2 Timeout3 | | | [realtime, -] [5, 1, -]->[8, 2,-]-> [6, 3,X] Frame1 Frame2 Frame3 After first timeout, linked list becomes (effectively starting timer for Frame2): Timeout2 Timeout3 | | [realtime, -] [8, 2,-]-> [6, 3,X] Frame2 Frame3 “Start timer if there are outstanding segments” simply means update timer linked list, e.g. remove the current entry in the timer linked list effectively initiating a new timer (for the next outstanding segment – to be scheduled next). CICS 515 – Summer © Dr. Son Vuong

25 TCP sender (simplified)
NextSeqNum = InitialSeqNum SendBase = InitialSeqNum loop (forever) { switch(event) event: data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data) event: timer timeout retransmit not-yet-acknowledged segment with smallest sequence number event: ACK received, with ACK field value of y if (y > SendBase) { SendBase = y if (there are currently not-yet-acknowledged segments) } } /* end of loop forever */ TCP sender (simplified) Comment: SendBase-1: last cumulatively ack’ed byte Example: SendBase-1 = 71; y= 73, so the rcvr wants 73+ ; y > SendBase, so that new data (B 72) is acked Note retransmit only the smallest un-acked segment (e.g. 1 segment) instead of all outstanding segments as in GoBackN. The start timer event will be inserted into (the end of) the timer linked list. CICS 515 – Summer © Dr. Son Vuong

26 TCP vs GobackN vs Selective Repeat
RWS =1, W =< M-1 Discard out-of-sequence segments Cumulative Ack e.g. ACK n Retransmit All outstanding segments e.g. segments n, n+1, …, N Selective Repeat RWS = SWS >1, W =< M/2 Store out-of-sequence segments received Individual Ack (Selective Ack is an option) e.g. ACK n Retransmit only 1 (oldest un-acked) segment, e.g. segment n only TCP is a hybrid compbined GobackN and Selective Repeat (as shown in red) CICS 515 – Summer © Dr. Son Vuong

27 TCP: retransmission scenarios
Host A Host B Host A Host B Seq=92, 8 bytes data Seq=92, 8 bytes data Seq=92 timeout Seq=100, 20 bytes data timeout ACK=100 X ACK=100 loss ACK=120 Seq=92, 8 bytes data Sendbase = 100 Seq=92, 8 bytes data SendBase = 120 Seq=92 timeout ACK=100 ACK=120 SendBase = 100 SendBase = 120 time time Premature timeout Lost ACK scenario CICS 515 – Summer © Dr. Son Vuong

28 TCP retransmission scenarios (more)
Host A Host B Host A Host B Seq=92, 8 bytes data Seq=92, 8 bytes data Seq=100, 20 bytes data ACK=100 timeout timeout Seq=100, 20 bytes data X ACK=120 loss SendBase = 120 SendBase = 120 ACK=120 time time Fig 1.(a) CumulativeACK scenario (b) CumulativeACK scenario CICS 515 – Summer © Dr. Son Vuong

29 TCP retransmission scenarios (more)
Host A Host B Host A Host B Seq=92, 8 bytes data Seq=92, 8 bytes data X Seq=100, 20 bytes data loss Seq=100, 20 bytes data timeout timeout ACK=120 ACK1= ? SendBase = 120 SendBase = 120 Seq=92, 8 bytes data ACK2= ? time time Fig 2 (a) CumulativeACK scenario (b) CumulativeACK scenario CICS 515 – Summer © Dr. Son Vuong

30 Q 4.2 TCP Peer Instruction Question
Refer to Fig 2b on cumulative Ack, the correct values of Ack1 and Ack 2 are:  A. <Ack1=93, Ack2=101> B. <Ack1=92, Ack2=100> C. <Ack1=92, Ack2=120> D. <Ack1=100, Ack2=120> E. <Ack1=120, Ack2=120> CICS 515 – Summer © Dr. Son Vuong

31 TCP retransmission scenarios (more)
Host A Host B Host A Host B Seq=92, 8 bytes data Seq=92, 8 bytes data X Seq=100, 20 bytes data loss Seq=100, 20 bytes data timeout timeout ACK=120 ACK=92 SendBase = 120 SendBase = 120 Seq=92, 8 bytes data ACK=120 time time (a’) Cumulative ACK scenario (b’) Cumulative ACK scenario CICS 515 – Summer © Dr. Son Vuong

32 4.5 TCP cummulative ACK PeerInstr Question
Host A Host B Host A Host B Seq=92, 8 bytes data Seq=92, 8 bytes data X Seq=100, 20 bytes data loss Seq=100, 20 bytes data Seq=120, 20 bytes data timeout timeout ACK=120 ACK1= ? SendBase = 120 SendBase = 120 Seq=92, 8 bytes data ACK2= ? time time Fig 3. (a) CumulativeACK scenario (b) CumulativeACK scenario CICS 515 – Summer © Dr. Son Vuong

33 Q 4.2 TCP Peer Instruction Question
Refer to Fig 3b on cumulative Ack, the correct values of Ack1 and Ack 2 are:  A. <Ack1=93, Ack2=101> B. <Ack1=92, Ack2=100> C. <Ack1=120, Ack2=140> D. <Ack1=92, Ack2=140> E. <Ack1=140, Ack2=140> CICS 515 – Summer © Dr. Son Vuong

34 TCP retransmission scenarios (more)
Host A Host B Host A Seq=92, 8 bytes data loss timeout Host B X Seq=100, 20 bytes data ACK1= ? time ACK4= ? Seq=120, 20 bytes data Seq=140, 20 bytes data ACK2= ? ACK3= ? Seq=92, 8 bytes data Seq=100, 20 bytes data timeout ACK=120 SendBase = 120 SendBase = 120 time (a’) Cumulative ACK scenario (b’) Cumulative ACK scenario CICS 515 – Summer © Dr. Son Vuong

35 TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed expected seq #. But one other segment has ACK pending Arrival of out-of-order segment higher-than-expect seq # . Gap detected Arrival of segment that partially or completely fills gap TCP Receiver action Delayed ACK (Start Cumm_Ack Timer). Wait up to 500ms for next segment. If no next segment, send ACK Immediately send single cumulative ACK, ACKing both in-order segments. Stop Cumm_Ack Timer. Store out-of-order segment Immediately send duplicate ACK, indicating seq # of next expected byte Accept segment. Immediate send cumulative ACK, expecting segments starting at lower end of gap CICS 515 – Summer © Dr. Son Vuong

36 CICS 515 – Summer 2012 © Dr. Son Vuong
Outline (Ch. 3) 3.1 Transport-layer services 3.2 Multiplexing and demultiplexing 3.3 Connectionless transport: UDP 3.4 Principles of reliable data transfer (Sliding Window Protocol) 3.5 Connection-oriented transport: TCP segment structure connection management reliable data transfer flow control RTT estimate 3.6 Principles of congestion control 3.7 TCP congestion control CICS 515 – Summer © Dr. Son Vuong

37 TCP Flow/Congestion Control
CICS 515 – Summer © Dr. Son Vuong

38 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP Flow Control sender won’t overflow receiver’s buffer by transmitting too much, too fast flow control receive side of TCP connection has a receive buffer: speed-matching service: matching the send rate to the receiving app’s drain rate app process may be slow at reading from buffer CICS 515 – Summer © Dr. Son Vuong

39 TCP Flow control: how it works
Rcvr advertises spare room by including value of RcvWindow in segments Sender limits unACKed data to RcvWindow guarantees receive buffer doesn’t overflow (Suppose TCP receiver discards out-of-order segments) spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead] CICS 515 – Summer © Dr. Son Vuong

40 TCP Dynamic Window Protocol
Window management in TCP. CICS 515 – Summer © Dr. Son Vuong

41 Q 4.3 TCP Dynamic Window Protocol – Peer Instruction – Question
ACK= ? , WIN = ? (A) 4097, 1 (B) 5096, (C) 5120, (D) 6144, (E) None Window management in TCP CICS 515 – Summer © Dr. Son Vuong

42 CICS 515 – Summer 2012 © Dr. Son Vuong
Fast Retransmit Time-out period often relatively long: long delay before resending lost packet Detect lost segments via duplicate ACKs. Sender often sends many segments back-to-back If segment is lost, there will likely be many duplicate ACKs. If sender receives 3 duplicate ACKs for the same data (it supposes that segment after ACKed data was lost): fast retransmit: resend segment before timer expires CICS 515 – Summer © Dr. Son Vuong

43 Fast retransmit algorithm:
event: ACK received, with ACK field value of y if (y > SendBase) { SendBase = y if (there are not-yet-acknowledged segments) start timer } else { increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) { resend segment with sequence number y Init: count = 0 not-yet-acknowledged segments = outstanding segments, start timer for next outstanding segment (easier to understand to look at single timer approach for multiple outstanding segments) a duplicate ACK for already ACKed segment fast retransmit CICS 515 – Summer © Dr. Son Vuong

44 Example TCP/IP internet
H4 H5 H3 H2 H1 Network 2 (Ethernet) Network 1 (Ethernet) H6 Network 3 (FDDI) Network 4 (point-to-point) H7 R3 H8 H1 H8 TCP TCP R1 R2 R3 IP IP IP IP IP ETH ETH FDDI FDDI PPP PPP ETH ETH CICS 515 – Summer © Dr. Son Vuong

45 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP segment structure source port # dest port # 32 bits application data (variable length) sequence number acknowledgement number Receive window Urg data pointer checksum F S R P A U head len not used Options (variable length) URG: urgent data (generally not used) counting by bytes of data (not segments!) ACK: ACK # valid PSH: push data now (generally not used) # bytes rcvr willing to accept RST, SYN, FIN: connection estab (setup, teardown commands) Internet checksum (as in UDP) CICS 515 – Summer © Dr. Son Vuong

46 CICS 515 – Summer 2012 © Dr. Son Vuong
Limitations of TCP? As with IP, TCP also has limited sized fields (SequenceNum is 32 bits, AdvertisedWindow is 16 bits). Will reuse of these values be a problem if packets hang around on the network for too long? (say, 60 seconds) Can sequence numbers be reused (i.e. 232 = 4GBytes = 32 Gbits transmitted) within 60 seconds? Bandwidth T1 (1.5Mbps) Ethernet (10Mbps) T3 (45Mbps) FDDI (100Mbps) STS-3 (155Mbps) STS-12 (622Mbps) STS-24 (1.2Gbps) Time Until Wrap Around 6.4 hours 57 minutes 13 minutes 6 minutes 4 minutes 55 seconds 28 seconds CICS 515 – Summer © Dr. Son Vuong

47 Q4.4 TCP Send # Limit PeerInstr Question
Assume Max Segment Lifetime (MSL) is 30 s, what Internet access data rate would cause a problem with the TCP segment sequence number (32 bits) wrapping around ? Answer: (A): Mbps (T1) (B): 155Mbps (T3) (C): 622Mbps (T12) (D): 1.2Gbps (T24) (E): None of those CICS 515 – Summer © Dr. Son Vuong

48 CICS 515 – Summer 2012 © Dr. Son Vuong
Limitations Cont. How about AdvertisedWindow? (16-bits) ? Worst! Assuming we have large buffers, large RTT, and a high speed link, will 64Kbytes of data be enough to keep the pipe full? For 100ms RTT (typical for cross-country): Bandwidth T1 (1.5Mbps) Ethernet (10Mbps) T3 (45Mbps) FDDI (100Mbps) STS-3 (155Mbps) STS-12 (622Mbps) STS-24 (1.2Gbps) Delay x Bandwidth Product 18KB 122KB 549KB 1.2MB 1.8MB 7.4MB 14.8MB CICS 515 – Summer © Dr. Son Vuong

49 Q 4.5 TCP AdvertiseWin Limit PeerInstr Question
Assume the RTT between A and B is 1s, what Internet access data rate would cause a problem with the size of the TCP advertiseWindow (16 bits) that limits the amount of outstanding data ? Answer: (A): Mbps (T1) (B): 10 Mbps (Ethernet) (C): 45 Mbps (T3) (D): 155 Mbps (OC 3) (E): None of those CICS 515 – Summer © Dr. Son Vuong

50 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP Extensions We’re almost at the limits of TCP. We’ll still be able to send data correctly, but not as efficiently. Solutions? TCP Extensions (specified as options in header): 32-bit timestamp (to calculate RTT) “Extend” SequenceNum field: <32-bit timestamp> <32-bit SequenceNum> Window Scale Factor (1 byte) to extend AdvertisedWindow x 2ScaleFactor Another TCP option: MSS ( bytes) (default = 536 bytes) – requested by destination, not the source. Option Padding (1B): End-of-Option; No-operation. CICS 515 – Summer © Dr. Son Vuong

51 Silly Window Syndrome (SWS)
SWS arises when either sending application creates data slowly or receiving application consumes data slowly, or both (1byte at at time). Sender: may create lots of 21-byte segments (41-byte IP packets) to send 1 byte data. Receiver: may send AdvertisedWindow of 1 byte forcing sender to send segments of 1 byte data CICS 515 – Summer © Dr. Son Vuong

52 Silly Window Syndrome - Solutions
Sender (Nagle’s algorithm): After sending first segment, wait until ACK (or MSS buffer filling) before sending next segment. (Taking into account speed of application and speed of network) Receiver - two solutions: ACK with window 0 until enough space for MSS or ½ buffer size (Clark) Delayed ACK (max ½ sec) CICS 515 – Summer © Dr. Son Vuong

53 CICS 515 – Summer 2012 © Dr. Son Vuong
Outline (Ch. 3) 3.1 Transport-layer services 3.2 Multiplexing and demultiplexing 3.3 Connectionless transport: UDP 3.4 Principles of reliable data transfer (Sliding Window Protocol) 3.5 Connection-oriented transport: TCP segment structure connection management reliable data transfer flow control RTT estimate 3.6 Principles of congestion control 3.7 TCP congestion control CICS 515 – Summer © Dr. Son Vuong

54 TCP Round Trip Time and Timeout
Q: how to set TCP timeout value? longer than RTT but RTT varies too short: premature timeout unnecessary retransmissions too long: slow reaction to segment loss Q: how to estimate RTT? SampleRTT: measured time from segment transmission until ACK receipt ignore retransmissions SampleRTT will vary, want estimated RTT “smoother” average several recent measurements, not just current SampleRTT CICS 515 – Summer © Dr. Son Vuong

55 Adaptive Retransmission
In any reliable protocol, the sender will retransmit a message if it doesn’t receive an acknowledgement within a timeout period. How long should the timeout be? On the Internet, it varies quite dramatically. We must adaptively calculate a timeout value, based on our previous/current experience of how long an acknowledgement takes. Three methods: Original Method Karn/Partridge Jacobson/Karels CICS 515 – Summer © Dr. Son Vuong

56 Adaptive Retransmission Original Algorithm
Measure SampleRTT for each Packet/ACK pair Compute the weighted (smoothed) average of RTT EstimatedRTT = a x SampleRTT + (1-a) x EstimatedRTT where a is between 0.1 and 0.2, e.g Set timeout based on EstimatedRTT TimeOut = 2 x EstimatedRTT CICS 515 – Summer © Dr. Son Vuong

57 Example RTT estimation:
CICS 515 – Summer © Dr. Son Vuong

58 Karn/Partridge Algorithm
Problem with original method: The RTT time would also include those packets that were retransmitted. Hence, the RTT was effectively doubled, biasing the smoothed value. Solution: Don’t sample RTT for the packets that were retransmitted. Another modification: Double the timeout after each retransmission, since you are expecting that it might take longer. (Like Ethernet’s Binary Exponential Backoff) CICS 515 – Summer © Dr. Son Vuong

59 SampleRTT - Associating ACK with (re)transmission
Sender Receiver Original transmission ACK SampleR TT Retransmission (a) (b) CICS 515 – Summer © Dr. Son Vuong

60 Jacobson/Karels Algorithm
The variance of RTT indicates how likely we are to retransmit Low variance means consistent RTT, a quiet network, unlikely to lose packets. High variance, unpredictable delay, many packets lost. Solution: Keep track of the variance in the RTT, and add it onto our RTT estimate. CICS 515 – Summer © Dr. Son Vuong

61 CICS 515 – Summer 2012 © Dr. Son Vuong
Jacobson/Karels Cont. New calculation for average RTT Diff = SampleRTT - EstimatedRTT EstimatedRTT = EstimatedRTT + (d x Diff) Deviation = Deviation + d(|Diff|- Deviation) where d is a fraction between 0 and 1 To determine the timeout value: TimeOut = m x EstimatedRTT + f x Deviation where m = 1 and f = 4 TimeOut = EstimatedRTT + 4x Deviation CICS 515 – Summer © Dr. Son Vuong

62 CICS 515 – Summer 2012 © Dr. Son Vuong
Outline (Ch. 3) 3.1 Transport-layer services 3.2 Multiplexing and demultiplexing 3.3 Connectionless transport: UDP 3.4 Principles of reliable data transfer (Sliding Window Protocol) 3.5 Connection-oriented transport: TCP segment structure connection management reliable data transfer flow control RTT estimate 3.6 Principles of congestion control 3.7 TCP congestion control CICS 515 – Summer © Dr. Son Vuong

63 Principles of Congestion Control
informally: “too many sources sending too much data too fast for network to handle” different from flow control! manifestations: lost packets (buffer overflow at routers) long delays (queueing in router buffers) a top-10 problem! CICS 515 – Summer © Dr. Son Vuong

64 Causes/costs of congestion: scenario 1
Host A lout two senders, two receivers one router, infinite buffers no retransmission lin : original data Host B unlimited shared output link buffers C large delays when congested maximum achievable throughput CICS 515 – Summer © Dr. Son Vuong

65 Causes/costs of congestion: scenario 2
one router, finite buffers sender retransmission of lost packet Host A lout lin : original data l'in : original data, plus retransmitted data Host B finite shared output link buffers CICS 515 – Summer © Dr. Son Vuong

66 Causes/costs of congestion: scenario 2
l in out = always: (goodput) “perfect” retransmission only when loss: retransmission of delayed (not lost) packet makes larger (than perfect case) for same l in out > l in l out “costs” of congestion: more work (retrans) for given “goodput” unneeded retransmissions: link carries multiple copies of pkt CICS 515 – Summer © Dr. Son Vuong

67 Causes/costs of congestion: scenario 3
four senders multihop paths timeout/retransmit l in Q: what happens as and increase ? l in Host A lout lin : original data l'in : original data, plus retransmitted data finite shared output link buffers Host B CICS 515 – Summer © Dr. Son Vuong

68 Causes/costs of congestion: scenario 3
Host A Host B lout Another “cost” of congestion: when packet dropped, any “upstream transmission capacity used for that packet was wasted! CICS 515 – Summer © Dr. Son Vuong

69 Approaches towards congestion control
Two broad approaches towards congestion control: End-end congestion control: (host-based) no explicit feedback from network congestion inferred from end-system observed loss, delay approach taken by TCP Network-assisted congestion control: routers provide feedback to end systems single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM) explicit rate sender should send at CICS 515 – Summer © Dr. Son Vuong

70 CICS 515 – Summer 2012 © Dr. Son Vuong
Outline (Ch. 3) 3.1 Transport-layer services 3.2 Multiplexing and demultiplexing 3.3 Connectionless transport: UDP 3.4 Principles of reliable data transfer (Sliding Window Protocol) 3.5 Connection-oriented transport: TCP segment structure connection management reliable data transfer flow control RTT estimate 3.6 Principles of congestion control 3.7 TCP congestion control CICS 515 – Summer © Dr. Son Vuong

71 TCP Congestion Control
end-end control (no network assistance) sender limits transmission: LastByteSent-LastByteAcked  CongWin Roughly, CongWin is dynamic, function of perceived network congestion How does sender perceive congestion? loss event = timeout or 3 duplicate acks TCP sender reduces rate (CongWin) after loss event three mechanisms: AIMD slow start conservative after timeout events rate = CongWin RTT Bytes/sec CICS 515 – Summer © Dr. Son Vuong

72 TCP Congestion Control
Objective: adjust to changes in the available capacity New state variable per connection: CongestionWindow limits how much data source has in transit MaxWin = MIN(CongestionWindow, AdvertisedWindow) EffWin = MaxWin - (LastByteSent - LastByteAcked) Idea: increase CongestionWindow when congestion goes down decrease CongestionWindow when congestion goes up Question: how does the source determine whether or not the network is congested? Answer: a timeout occurs timeout signals that a packet was lost packets are seldom lost due to transmission error lost packet implies congestion CICS 515 – Summer © Dr. Son Vuong

73 Additive Increase/Multiplicative Decrease
Algorithm: increment CongestionWindow by one packet per RTT (per CongestionWindow’s worth of data) (linear increase) divide CongestionWindow by two whenever a timeout occurs (multiplicative decrease) In practice: increment a little for each ACK Increment = MSS * MSS/CongestionWindow CongestionWindow += Increment CICS 515 – Summer © Dr. Son Vuong

74 CongestionWindow - linear increase
Source Destination CICS 515 – Summer © Dr. Son Vuong

75 CongestionWindow Trace CICS 515 – Summer 2012 © Dr. Son Vuong
Example trace: sawtooth behavior 70 60 50 40 KB 30 20 10 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 Time (seconds) CICS 515 – Summer © Dr. Son Vuong

76 CICS 515 – Summer 2012 © Dr. Son Vuong
Slow Start Objective: To ramp up to the available capacity Idea: begin with CongestionWindow = 1 packet double CongestionWindow each RTT (increment by 1 packet for each ACK) Exponential growth, but slower than all in one blast Need a additional parameter: ssthreh ssthresh = MSS when starting a connection Used... when first starting connection when connection goes dead waiting for a timeout CICS 515 – Summer © Dr. Son Vuong

77 Slow Start - exponential increase of CongestionWindow
Source Destination Problem: lose up to half a CongestionWindow‘s worth of data CICS 515 – Summer © Dr. Son Vuong

78 Slow Start and Congestion Avoidance
Increasing Strategy: Slow start Start connection with cwnd=1 (MSS), ssthresh=1/2 AdvWin Increment cwnd at each ACK, to some max (ssthresh) (i.e. exponential increase) Congestion avoidance For cwnd >=ssthresh, increase cwnd by 1 for each RTT (i.e. linear increase) Decreasing Strategy: When a timeout occurs Set slow start threshold to half current congestion window ssthresh=cwnd/2 and cwnd = 1; and Repeat Increasing Strategy: slow start (until cwnd=ssthresh) and congestion avoidance (when cwnd >=ssthresh) CICS 515 – Summer © Dr. Son Vuong

79 Slow Start and Congestion Avoidance
CICS 515 – Summer © Dr. Son Vuong

80 Fast Retransmit and Fast Recovery
Problem: idle periods by coarse-grain TCP timeouts Fast retransmit: use 3 duplicate ACKs to trigger retx. Fast recovery: remove the slow start phase; go directly to half the last successful CongestionWindow Sender Receiver Packet 1 Packet 2 ACK 1 Packet 3 Packet 4 ACK 2 Packet 5 ACK 2 Packet 6 ACK 2 ACK 2 Retransmit packet 3 ACK 6 CICS 515 – Summer © Dr. Son Vuong

81 Summary: TCP Congestion Control (Reno)
When CongWin is below Threshold, sender in slow-start phase, window grows exponentially. When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly. When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold. When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS. CICS 515 – Summer © Dr. Son Vuong

82 Q 4.6 TCP Congestion Control – Peer Instruction – Question
Consider a TCP design and implementation with a window extension that allows window sizes much larger than 64KB.  Suppose this extended TCP is used to transfer a 20-MB file over a 1-Gbps link with a  roundtrip latency (RTT) of 500ms.  Assume the TCP advertise window is larger than 16MB and TCP sends 2-KB packets.  Furthermore, assume no congestion and no lost packets. (1)  How many RTTs does it take until slow start opens the send window to 4MB? Answer: A:  10        B: 11       C: 12        D: 13       E: None CICS 515 – Summer © Dr. Son Vuong

83 Q 4.7 TCP Congestion Control PI Question
Identify the following: 1. Timeouts 2. 3 Duplicate ACKs 3. (start of) Congestion Avoidance (CA) 4. Start of Slow Start C B E G D I L F H K A J CICS 515 – Summer © Dr. Son Vuong

84 Q 4.8 TCP Congestion Control PI Quest
How long would it take to send a file of 200,000 B assuming the segment size is 1000B and each round (RTT) is 10 ms ? Use the congestion window scenario in the figure below; and assume the data in the previous round (RTT) get lost when timeout or 3 duplicate Acks occur. Answer: (A): 200 s (B): 170 s (C): 60 s (D): 30 s (E): None of those CICS 515 – Summer © Dr. Son Vuong

85 CICS 515 – Summer 2012 © Dr. Son Vuong
Q 4.9 TCP Throughput and Congestion Control – Peer Instruction – Question To achieve a throughput of 1Gbps over the Internet (using TCP), what would be the constraint (upper bound) on the segment loss rate ? Answer: A:  10**(-6)       B: 10**(-8)     C: 10**(-9)     D: 10**(-10)         E: None CICS 515 – Summer © Dr. Son Vuong

86 TCP sender congestion control
Event State TCP Sender Action Comments ACK receipt for previously unacked data Slow Start (SS) CongWin = CongWin + MSS, If (CongWin > Threshold) set state to “Congestion Avoidance” Resulting in a doubling of CongWin every RTT Congestion Avoidance (CA) CongWin = CongWin+MSS * (MSS/CongWin) Additive increase, resulting in increase of CongWin by 1 MSS every RTT Loss event detected by triple duplicate ACK SS or CA Threshold = CongWin/2, CongWin = Threshold, Set state to “Congestion Avoidance” Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS. Timeout CongWin = 1 MSS, Set state to “Slow Start” Enter slow start Duplicate ACK Increment duplicate ACK count for segment being acked CongWin and Threshold not changed CICS 515 – Summer © Dr. Son Vuong

87 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP throughput What’s the average throughout of TCP as a function of window size and RTT? Ignore slow start Let W be the window size when loss occurs. When window is W, throughput is W/RTT Just after loss, window drops to W/2, throughput to W/2RTT. Average throughout: .75 W/RTT CICS 515 – Summer © Dr. Son Vuong

88 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP Futures Example: 2000 byte segments, 100ms RTT, want 10 Gbps throughput Requires window size W = 83,333 in-flight segments Throughput in terms of loss rate: ➜ Loss Rate =< L = 2· hard to achieve ! New versions of TCP for high-speed needed! TP = >= 10 Gbps TP = .75 * W/RTT => W = RTT * TP / .75 = 4/3 * 100 ms * 10 Gbps = 4/3 * 10**9 bits = (4/3)/8 * 10**9 Bytes = 1/6 GB = GB = 1/6 * 10**9 Bytes /1500 B/Segment = 1,000,000/(6*1.5) = 1M/9 = segments Some error in given data: Segment size should be 2000B instead of 1500B = > GB/2000 B = 83,333 segments L = (1.22 * 2000B / (100ms * 10 **10) )2 = (1.22 * 2 * 10**-5) 2 = 1.48 * 4 * 10**-10 = 5.92 * 10**-10 (NOT 2 * 10**-10) Presumably K&R assume MSS = 1500. L = (1.22 * 1500 B / (100ms * 10 **10) )2 = (1.22 * 1.5 * 10**-5) 2 = 1.48 * 2.25 * 10**-10 = 3.33 * 10**-10 (Still not quite 2 * 10**-10) == Proof that TP = 1.22*MSS/ (RTT* sqrt(L)) where L=loss rate assuming of 1 segment for each cycle starting from W/2: Problem 38 The loss rate, , is the ratio of the number of packets lost over the number of packets sent. In a cycle, 1 packet is lost. The number of packets sent in a cycle is: W/2 + (W/2 + 1) +… + W = Sum[from n=0 to n=W/2] (W/2 + n) = W/2(W/2 + 1) + (W/2)(W/2 +1)/2 = W**2/4 + W/2 + W**2/8 + W/4 = (3/8) W**2 + (3/4)W Thus the loss rate is L = 1 / ((3/8)W**2 + (3/4) W)) b) For large W, W**2 >> W, Thus L = 8/(3W**2) or W = sqrt((8/3)/L) = 1.63/sqrt(L) messages. From the text, we therefore have : Average throughout: .75 W/RTT = .75*1.63 MSS/(RTT*sqrt(L)) = 1.22*MSS/(RTT*sqrt(L)) CICS 515 – Summer © Dr. Son Vuong

89 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP Fairness Fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K TCP connection 1 TCP connection 2 bottleneck router capacity R CICS 515 – Summer © Dr. Son Vuong

90 CICS 515 – Summer 2012 © Dr. Son Vuong
Why is TCP fair? Two competing sessions: Additive increase gives slope of 1, as throughout increases multiplicative decrease decreases throughput proportionally R equal bandwidth share loss: decrease window by factor of 2 congestion avoidance: additive increase Connection 2 throughput loss: decrease window by factor of 2 congestion avoidance: additive increase Connection 1 throughput R CICS 515 – Summer © Dr. Son Vuong

91 CICS 515 – Summer 2012 © Dr. Son Vuong
Fairness (more) Fairness and UDP Multimedia apps often do not use TCP do not want rate throttled by congestion control Instead use UDP: pump audio/video at constant rate, tolerate packet loss UDP traffic can bring Internet TP to a halt! Research area: TCP friendly Congestion control against UDP traffic Fairness and parallel TCP connections nothing prevents app from opening parallel cnctions between 2 hosts. Web browsers do this Example: link of rate R supporting 9 cnctions; If new app asks for 1 TCP, gets rate R/10 If new app asks for 11 TCPs, gets > R/2 : unfair ! CICS 515 – Summer © Dr. Son Vuong

92 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP/UDP Summary TCP provides reliable full-duplex connections. TCP Streams, credit flow control, 3-way handshake TCP extension: large packet, scaling Slow-start, Fast retransmit/recovery UDP is end-to-end, connectionless and simple. No flow/error control Next: leaving the network “edge” (application, transport layers) and going into the network “core” CICS 515 – Summer © Dr. Son Vuong

93 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP Releases Original: Slowstart at CongWin = 1 MSS Tahoe: (4.3 BSD): Slowstart + Congestion Avoidance Reno: BSD Network Rel (BNR1): Fast Retx after 3 DupAck + Go to CongAvoid (thresh = 1/2 CongWin) (BNR2) Vegas: Rate-based Compare ExpRate = CW/BaseRTT with ActualRate = SegSize/RTT If Diff<a : (no congestion) Increase rate Diff > b: (congestion) Reduce rate CICS 515 – Summer © Dr. Son Vuong

94 CICS 515 – Summer 2012 © Dr. Son Vuong
Timers in TCP Retx Timer (RT): start when sending a segment (Issue: how to estimate Retx timeout interval) Persistence Timer (PT): start when receiving Ack,Win=0, Timeout -> Send Probe(1B data) Value(PT) = Value(RT); Timeout -> Probe at binary exponent interval till 60s; repeat Probe every 60s till Win is reopened. Keepalive Timer (KT): Check if client is alive (not muet). Val(KT) = 2 hrs; -> 20 Probes at 75s; close conn. Time-waited Timer: wait 2 segment lifetime (2x60s) before starting a new connection CICS 515 – Summer © Dr. Son Vuong

95 CICS 515 – Summer 2012 © Dr. Son Vuong

96 Delay modeling (bonus slides)
Notation, assumptions: Assume one link between client and server of rate R S: MSS (bits) O: object size (bits) no retransmissions (no loss, no corruption) Window size: First assume: fixed congestion window, W segments Then dynamic window, modeling slow start Q: How long does it take to receive an object from a Web server after sending a request? Ignoring congestion, delay is influenced by: TCP connection establishment data transmission delay slow start CICS 515 – Summer © Dr. Son Vuong

97 Fixed congestion window (1)
First case: WS/R > RTT + S/R: ACK for first segment in window returns before window’s worth of data sent delay = 2RTT + O/R CICS 515 – Summer © Dr. Son Vuong

98 Fixed congestion window (2)
Second case: WS/R < RTT + S/R: wait for ACK after sending window’s worth of data sent delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R] CICS 515 – Summer © Dr. Son Vuong

99 TCP Delay Modeling: Slow Start (1)
Now suppose window grows according to slow start Will show that the delay for one object is: R S RTT P O Latency ) 1 2 ( - ú û ù ê ë é + = where P is the number of times TCP idles at server: - where Q is the number of times the server idles if the object were of infinite size. - and K is the number of windows that cover the object. CICS 515 – Summer © Dr. Son Vuong

100 TCP Delay Modeling: Slow Start (2)
Delay components: 2 RTT for connection estab and request O/R to transmit object time server idles due to slow start Server idles: P = min{K-1,Q} times Example: O/S = 15 segments K = 4 windows Q = 2 P = min{K-1,Q} = 2 Server idles P=2 times CICS 515 – Summer © Dr. Son Vuong

101 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP Delay Modeling (3) R S RTT P O idleTime k p ) 1 2 ( ] [ delay - + = å CICS 515 – Summer © Dr. Son Vuong

102 CICS 515 – Summer 2012 © Dr. Son Vuong
TCP Delay Modeling (4) Recall K = number of windows that cover object How do we calculate K ? ú ù ê é + = - ) 1 ( log )} : { min } 2 / S O k K L Calculation of Q, number of idles for infinite-size object, is similar (see HW). CICS 515 – Summer © Dr. Son Vuong

103 CICS 515 – Summer 2012 © Dr. Son Vuong
HTTP Modeling Assume Web page consists of: 1 base HTML page (of size O bits) M images (each of size O bits) Non-persistent HTTP: M+1 TCP connections in series Response time = (M+1)O/R + (M+1)2RTT + sum of idle times Persistent HTTP: 2 RTT to request and receive base HTML file 1 RTT to request and receive M images Response time = (M+1)O/R + 3RTT + sum of idle times Non-persistent HTTP with X parallel connections Suppose M/X integer. 1 TCP connection for base file M/X sets of parallel connections for images. Response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle times CICS 515 – Summer © Dr. Son Vuong

104 CICS 515 – Summer 2012 © Dr. Son Vuong
HTTP Response time (in seconds) RTT = 100 msec, O = 5 Kbytes, M=10 and X=5 2 4 6 8 10 12 14 16 18 20 28 Kbps 100 1 Mbps Mbps non-persistent persistent parallel non- For low bandwidth, connection & response time dominated by transmission time. Persistent connections only give minor improvement over parallel connections. CICS 515 – Summer © Dr. Son Vuong

105 CICS 515 – Summer 2012 © Dr. Son Vuong
HTTP Response time (in seconds) RTT =1 sec, O = 5 Kbytes, M=10 and X=5 For larger RTT, response time dominated by TCP establishment & slow start delays. Persistent connections now give important improvement: particularly in high delaybandwidth networks. CICS 515 – Summer © Dr. Son Vuong


Download ppt "University of British Columbia CICS 515 (Part 1) Internet Computing Lectures 4 – Transport Layer (TCP) Flow and Congestion Control Dr. Son T. Vuong."

Similar presentations


Ads by Google