
2 CICS 515 (Part 1) Internet Computing, University of British Columbia
Lecture 4 – Transport Layer (TCP): Flow and Congestion Control
Dr. Son T. Vuong, May 24, 2012
The World Connected

3 CICS 515 – Summer 2012 © Dr. Son Vuong
Outline (Ch. 3)
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer (Sliding Window Protocol)
3.5 Connection-oriented transport: TCP
- segment structure
- connection management
- reliable data transfer
- flow control
- RTT estimate
3.6 Principles of congestion control
3.7 TCP congestion control

4 TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
- Point-to-point: one sender, one receiver
- Reliable, in-order byte stream: no “message boundaries”
- Pipelined data: TCP congestion and flow control set the window size
- Send and receive buffers
- Full-duplex data: bi-directional data flow in the same connection; MSS: maximum segment size
- Connection-oriented: handshaking (exchange of control messages) initializes sender and receiver state before data exchange
- Flow controlled: sender will not overwhelm receiver
- Congestion controlled

5 TCP Flow/Congestion Control

6 TCP segment structure
Header layout (each row is 32 bits wide):
- source port #, dest port #
- sequence number
- acknowledgement number
- head len, not used, flag bits U A P R S F, receive window
- checksum, urgent data pointer
- options (variable length)
- application data (variable length)
Notes:
- Sequence and acknowledgement numbers count bytes of data (not segments!)
- Receive window: # bytes the receiver is willing to accept
- Checksum: Internet checksum (as in UDP)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)

7 UDP Header Format
Checksum computed over: pseudoheader + UDP header + data
Pseudoheader = source and destination IP addresses + protocol number (17) + UDP length
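The checksum coverage described on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming IPv4 addresses given as 4-byte strings and a UDP segment whose checksum field is zero while the checksum is being computed. The function names `internet_checksum` and `udp_checksum` are made up for this example.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    """Checksum over pseudoheader + UDP header + data (IPv4 assumed)."""
    # Pseudoheader: src IP, dst IP, zero byte, protocol number 17, UDP length
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    return internet_checksum(pseudo + udp_segment)
```

A receiver can run the same computation over the segment with the checksum field filled in; a result of 0 means no error was detected.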

8 TCP Seq/ACK #: Peer Instruction Question
Seq. #’s: byte-stream “number” of the first byte in the segment’s data
ACKs: seq # of the next byte expected from the other side; cumulative ACK
Q: How does the receiver handle out-of-order segments?
A: The TCP spec doesn’t say; it is up to the implementor.
Simple Telnet scenario (Host A, Host B):
1. User types ‘C’. A to B: Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’ and echoes back ‘C’. B to A: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of the echoed ‘C’. A to B: Seq=?, ACK=?

9 TCP Seq/ACK #: Peer Instruction Answer
Seq. #’s: byte-stream “number” of the first byte in the segment’s data
ACKs: seq # of the next byte expected from the other side; cumulative ACK
Simple Telnet scenario (Host A, Host B):
1. User types ‘Computer’. A to B: Seq=42, ACK=79, data = ‘Computer’
2. Host B ACKs receipt of ‘Computer’ and echoes back ‘Computer’. B to A: Seq=79, ACK=50, data = ‘Computer’
3. Host A ACKs receipt of the echoed ‘Computer’. A to B: Seq=?, ACK=?
A. B. C. D. E. None of the above
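The rule behind the ACK values in the Telnet scenarios (ACK = sequence number of the next byte expected) can be checked with a small sketch. The helper name `next_expected` is made up for this example. The modulo reflects the 32-bit sequence number field.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around

def next_expected(seq: int, payload: bytes) -> int:
    """ACK number a receiver returns after getting `payload` starting at `seq`."""
    return (seq + len(payload)) % SEQ_SPACE

print(next_expected(42, b"C"))         # 1 byte at Seq=42, so ACK=43
print(next_expected(42, b"Computer"))  # 8 bytes at Seq=42, so ACK=50
```

The same rule applies in both directions of the full-duplex connection, so each side advances its own sequence number by the number of data bytes it sends.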

10 Q 4.2 TCP Peer Instruction Question
Refer to Fig 1 on TCP Seq, Ack. The correct values of Seq and Ack are:
A. B. C. D. E. None of the above


12 TCP Connection Management - Setup
Recall: TCP sender and receiver establish a “connection” before exchanging data segments.
Initialize TCP variables:
- seq. #s
- buffers, flow control info (e.g. RcvWindow)
Client: connection initiator
  Socket clientSocket = new Socket("hostname", portNumber);
Server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();
Three-way handshake:
Step 1: client host sends TCP SYN segment to server
- specifies initial seq #
- no data
Step 2: server host receives SYN, replies with SYNACK segment
- server allocates buffers
- specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

13 Connection Establishment: Three-Way Handshake
Active participant (client), passive participant (server):
1. Client to server: SYN, SequenceNum = x
2. Server to client: SYN + ACK, SequenceNum = y, Acknowledgment = x + 1
3. Client to server: ACK, Acknowledgment = y + 1
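As a rough illustration of the exchange above, the following sketch builds the three segments from the two initial sequence numbers x and y. The `Segment` record and the function name are hypothetical. The sequence number x + 1 on the final ACK is not shown on the slide and is added here on the understanding that a SYN consumes one sequence number.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Segment:
    sender: str
    flags: str
    seq: Optional[int]
    ack: Optional[int]

def three_way_handshake(x: int, y: int) -> List[Segment]:
    """Segments exchanged when the client picks ISN x and the server picks ISN y."""
    return [
        Segment("client", "SYN", seq=x, ack=None),
        Segment("server", "SYN+ACK", seq=y, ack=x + 1),
        # seq = x + 1 is an addition to the slide: the SYN used up sequence number x
        Segment("client", "ACK", seq=x + 1, ack=y + 1),
    ]

for s in three_way_handshake(x=100, y=300):
    print(s)
```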

14 Two-Way Handshake: Obsolete Data Segment

15 Two-Way Handshake: Obsolete SYN Segment

16 Three-Way Handshake: Examples

17 The Two-Army Problem
[Figure: Blue Army #1 and Blue Army #2 on either side of the White Army; messages “Attack at 12:00” and “Ack”]

18 TCP State Transition Diagram
This covers only the connection setup and termination parts of TCP.
[Diagram: states CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Transition labels include Passive open, Active open / SYN, Send / SYN, SYN / SYN + ACK, SYN + ACK / ACK, ACK, Close, Close / FIN, FIN / ACK, ACK + FIN / ACK, and “Timeout after two segment lifetimes”.]

19 TCP Connection Management (cont.)
Closing a connection: client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Diagram: client sends FIN (close); server returns ACK, then FIN (close); client enters timed wait; closed]

20 TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
- Enters “timed wait”: will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with a small modification, this can handle simultaneous FINs.
[Diagram: client FIN (closing); server ACK, FIN (closing); client timed wait; both closed]

21 TCP Connection Management (cont.)
[Diagram: TCP client and TCP server state sequences]

22 TCP 3-way Handshake: Peer Instruction Question
Why does TCP use a 3-way handshake instead of a 2-way handshake?
Answer:
(A): To ensure the presence of the remote host.
(B): To ensure the remote host is willing to accept the connection request.
(C): To ensure common agreement on the initial sequence number at both sides.
(D): To prevent a SYN segment of an old connection from causing confusion during the new connection establishment.
(E): C and D


24 TCP reliable data transfer
- TCP creates rdt service on top of IP’s unreliable service
- Pipelined segments
- Cumulative ACKs
- TCP uses a single retransmission timer
- Retransmissions are triggered by timeout events and duplicate ACKs
- Initially consider a simplified TCP sender: ignore duplicate ACKs; ignore flow control and congestion control

25 TCP sender events
Data received from app:
- Create segment with seq #; seq # is the byte-stream number of the first data byte in the segment
- Start timer if not already running (think of the timer as being for the oldest unACKed segment)
- Expiration interval: TimeOutInterval
Timeout:
- Retransmit the segment that caused the timeout
- Restart timer
ACK received, if it acknowledges outstanding (previously unACKed) segments:
- Update what is known to be ACKed (update SendBase)
- Start timer if there are still outstanding segments

26 TCP sender (simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
  switch (event)
    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running) start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments) start timer
      }
} /* end of loop forever */
Comment: SendBase-1 is the last cumulatively ACKed byte.
Example: SendBase-1 = 71 and y = 73, so the receiver wants byte 73 onward; y > SendBase, so new data (byte 72) is ACKed.
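The simplified sender pseudocode can be turned into a small runnable model. This is only a sketch under simplifying assumptions: there is no real timer or network (a boolean stands in for the timer and a list stands in for IP), and sequence-number wraparound is ignored. The class and attribute names are chosen for this example.

```python
class SimplifiedTcpSender:
    """Event-driven model of the simplified sender (no dup ACKs, no flow/congestion control)."""

    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq     # NextSeqNum
        self.send_base = initial_seq    # SendBase
        self.unacked = {}               # seq -> payload for not-yet-ACKed segments
        self.timer_running = False      # stand-in for the single retransmission timer
        self.wire = []                  # stand-in for "pass segment to IP"

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.wire.append((seq, data))
        self.next_seq += len(data)

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)     # smallest not-yet-ACKed sequence number
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True   # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: every segment that ends at or before y is covered
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
```

For example, sending 8 bytes at Seq=92 and 20 bytes at Seq=100 and then receiving ACK=120 leaves `send_base == 120` with nothing outstanding, matching the cumulative ACK behaviour described in the slides.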

27 TCP vs Go-Back-N vs Selective Repeat
Go-Back-N:
- RWS = 1, W <= M-1
- Discard out-of-sequence segments
- Cumulative ACK, e.g. ACK n
- Retransmit all outstanding segments, e.g. segments n, n+1, …, N
Selective Repeat:
- RWS = SWS > 1, W <= M/2
- Store out-of-sequence segments received
- Individual ACK (Selective ACK is an option), e.g. ACK n
- Retransmit only 1 (oldest unACKed) segment, e.g. segment n only
TCP is a hybrid that combines Go-Back-N and Selective Repeat (the TCP features are shown in red on the slide).

28 TCP: retransmission scenarios
Lost ACK scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data
2. B to A: ACK=100, lost (X)
3. Seq=92 timeout; A to B: Seq=92, 8 bytes data (retransmission)
4. B to A: ACK=100; SendBase = 100
Premature timeout scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B to A: ACK=100, then ACK=120
3. Seq=92 timeout expires before the ACKs arrive; A to B: Seq=92, 8 bytes data (retransmission)
4. SendBase = 100, then SendBase = 120 as the ACKs arrive
5. B to A: ACK=120 for the retransmission; SendBase = 120

29 TCP retransmission scenarios (more)
Fig 1 (a) Cumulative ACK scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B to A: ACK=120; SendBase = 120
Fig 1 (b) Cumulative ACK scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B to A: ACK=100, lost (X)
3. B to A: ACK=120 arrives before the timeout; SendBase = 120

30 TCP retransmission scenarios (more)
Fig 2 (a) Cumulative ACK scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B to A: ACK=120; SendBase = 120
Fig 2 (b) Cumulative ACK scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data, lost (X)
2. A to B: Seq=100, 20 bytes data
3. B to A: ACK1=?
4. Timeout; A to B: Seq=92, 8 bytes data (retransmission)
5. B to A: ACK2=?; SendBase = 120

31 Q 4.2 TCP Peer Instruction Question
Refer to Fig 2b on cumulative Ack. The correct values of Ack1 and Ack2 are:
A. B. C. D. E.

32 TCP retransmission scenarios (more)
(a’) Cumulative ACK scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B to A: ACK=120; SendBase = 120
(b’) Cumulative ACK scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data, lost (X)
2. A to B: Seq=100, 20 bytes data
3. B to A: ACK=92
4. Timeout; A to B: Seq=92, 8 bytes data (retransmission)
5. B to A: ACK=120; SendBase = 120
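The ACK values in scenario (b’) follow from the cumulative ACK rule. The sketch below models a receiver that buffers out-of-order segments; that buffering is an assumption, since the slides note that the TCP spec leaves this choice to the implementor. The class name is made up for this example.

```python
class CumulativeAckReceiver:
    """Receiver that always ACKs the next in-order byte it expects."""

    def __init__(self, expected: int):
        self.expected = expected   # seq # of next byte expected
        self.buffered = {}         # seq -> length of out-of-order segments held

    def receive(self, seq: int, length: int) -> int:
        if seq == self.expected:
            self.expected += length
            # A filled gap may release segments that were buffered earlier
            while self.expected in self.buffered:
                self.expected += self.buffered.pop(self.expected)
        elif seq > self.expected:
            self.buffered[seq] = length      # gap detected: hold the segment
        return self.expected                 # ACK number sent back

rcvr = CumulativeAckReceiver(expected=92)
print(rcvr.receive(100, 20))  # Seq=92 was lost, so the ACK stays at 92
print(rcvr.receive(92, 8))    # retransmission fills the gap, ACK jumps to 120
```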

33 TCP cumulative ACK: Peer Instruction Question
Fig 3 (a) Cumulative ACK scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B to A: ACK=120; SendBase = 120
Fig 3 (b) Cumulative ACK scenario (Host A, Host B):
[Diagram labels: Seq=92, 8 bytes data; Seq=100, 20 bytes data; Seq=120, 20 bytes data; loss (X); ACK1=?; timeout; Seq=92, 8 bytes data; ACK2=?; SendBase = 120]

34 Q 4.2 TCP Peer Instruction Question
Refer to Fig 3b on cumulative Ack. The correct values of Ack1 and Ack2 are:
A. B. C. D. E.

35 TCP retransmission scenarios (more)
(a’) Cumulative ACK scenario (Host A, Host B):
1. A to B: Seq=92, 8 bytes data, then Seq=100, 20 bytes data
2. B to A: ACK=120; SendBase = 120
(b’) Cumulative ACK scenario (Host A, Host B):
[Diagram labels: Seq=92, 8 bytes data, loss (X); Seq=100, 20 bytes data; Seq=120, 20 bytes data; Seq=140, 20 bytes data; ACK1=?, ACK2=?, ACK3=?; timeout; Seq=92, 8 bytes data; ACK4=?; SendBase = 120]

36 TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver, followed by TCP receiver action:
1. Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed.
   Action: delayed ACK (start Cumm_Ack timer). Wait up to 500 ms for the next segment. If no next segment, send ACK.
2. Arrival of in-order segment with expected seq #, but one other segment has an ACK pending.
   Action: immediately send a single cumulative ACK, ACKing both in-order segments. Stop Cumm_Ack timer.
3. Arrival of out-of-order segment with higher-than-expected seq #; gap detected.
   Action: store the out-of-order segment. Immediately send a duplicate ACK indicating the seq # of the next expected byte.
4. Arrival of segment that partially or completely fills the gap.
   Action: accept the segment. Immediately send a cumulative ACK, expecting segments starting at the lower end of the gap.


39 TCP Flow Control
- The receive side of a TCP connection has a receive buffer.
- The app process may be slow at reading from the buffer.
- Flow control: the sender won’t overflow the receiver’s buffer by transmitting too much, too fast.
- Speed-matching service: matching the send rate to the receiving app’s drain rate.

40 TCP Flow control: how it works
(Suppose the TCP receiver discards out-of-order segments.)
- Spare room in buffer = RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
- Receiver advertises the spare room by including the value of RcvWindow in segments
- Sender limits unACKed data to RcvWindow, which guarantees the receive buffer doesn’t overflow
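The RcvWindow formula and the sender-side limit can be written out directly. The sketch below uses made-up byte counts purely for illustration; the function names are not from the slides.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_allowance(last_byte_sent: int, last_byte_acked: int, window: int) -> int:
    """How many more bytes the sender may put in flight without exceeding the window."""
    return max(0, window - (last_byte_sent - last_byte_acked))

# Illustrative numbers: 4096-byte buffer, 3000 bytes received, app has read 1000
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)                                   # 2096 bytes of spare room
print(sender_allowance(5000, 4000, window))     # 1000 bytes unACKed, 1096 more allowed
```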

41 TCP Dynamic Window Protocol
[Figure: window management in TCP]

42 Q 4.3 TCP Dynamic Window Protocol: Peer Instruction Question
Window management in TCP: ACK = ?, WIN = ?
(A) 4097, 1
(B) 5096, 1000
(C) 5120, 1024
(D) 6144, 2048
(E) None

43 Fast Retransmit
- The timeout period is often relatively long: long delay before resending a lost packet.
- Detect lost segments via duplicate ACKs.
  - Sender often sends many segments back-to-back.
  - If a segment is lost, there will likely be many duplicate ACKs.
- If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost.
  - Fast retransmit: resend the segment before the timer expires.

44 Fast retransmit algorithm
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are not-yet-acknowledged segments) start timer
  } else {   /* a duplicate ACK for an already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y = 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
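A runnable version of the duplicate-ACK counting might look like the sketch below. It is a simplification: it keeps one counter for the current SendBase rather than a count per value of y, ignores stale ACKs below SendBase, and records retransmissions in a list instead of sending anything. The class name is made up for this example.

```python
class FastRetransmitSender:
    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []      # sequence numbers resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y       # new data ACKed
            self.dup_acks = 0
        elif y == self.send_base:
            self.dup_acks += 1       # duplicate ACK for already ACKed data
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.retransmitted.append(y)   # resend segment starting at y

sender = FastRetransmitSender(send_base=100)
for _ in range(3):
    sender.on_ack(100)
print(sender.retransmitted)   # [100]
```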

45 Example TCP/IP internet
[Figure: hosts H1 to H8 and routers R1 to R3 connected by Network 1 (Ethernet), Network 2 (Ethernet), Network 3 (FDDI), and Network 4 (point-to-point). Protocol stacks along the path from H1 to H8: H1 (TCP, IP, ETH); R1 (IP; ETH, FDDI); R2 (IP; FDDI, PPP); R3 (IP; PPP, ETH); H8 (TCP, IP, ETH).]


47 Limitations of TCP?
As with IP, TCP has fields of limited size (SequenceNum is 32 bits, AdvertisedWindow is 16 bits). Will reuse of these values be a problem if packets hang around on the network for too long (say, 60 seconds)? Can the sequence numbers wrap around, i.e. can 2^32 bytes = 4 GBytes = 32 Gbits be transmitted, within 60 seconds?
Bandwidth: time until wrap around
- T1 (1.5 Mbps): 6.4 hours
- Ethernet (10 Mbps): 57 minutes
- T3 (45 Mbps): 13 minutes
- FDDI (100 Mbps): 6 minutes
- STS-3 (155 Mbps): 4 minutes
- STS-12 (622 Mbps): 55 seconds
- STS-24 (1.2 Gbps): 28 seconds
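The wrap-around times in the table follow from dividing the size of the sequence space in bits by the link rate. The short sketch below recomputes them, assuming the nominal link rates listed on the slide and a sender that keeps the link fully busy; the function name is made up for this example.

```python
def wraparound_seconds(bits_per_second: float, seq_bits: int = 32) -> float:
    """Time to send 2**seq_bits bytes at the given rate."""
    return (2 ** seq_bits) * 8 / bits_per_second

links = {
    "T1": 1.5e6, "Ethernet": 10e6, "T3": 45e6, "FDDI": 100e6,
    "STS-3": 155e6, "STS-12": 622e6, "STS-24": 1.2e9,
}
for name, rate in links.items():
    print(f"{name}: {wraparound_seconds(rate):.0f} s")
```

At T1 speed this gives roughly 22,900 seconds (about 6.4 hours), while at STS-12 and above the wrap-around time drops below the 60-second lifetime mentioned on the slide.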

48 Q 4.4 TCP Seq # Limit: Peer Instruction Question
Assume the Max Segment Lifetime (MSL) is 30 s. What Internet access data rate would cause a problem with the TCP segment sequence number (32 bits) wrapping around?
Answer:
(A): Mbps (T1)
(B): 155Mbps (T3)
(C): 622Mbps (T12)
(D): 1.2Gbps (T24)
(E): None of those

49 Limitations (cont.)
How about AdvertisedWindow (16 bits)? Worse! Assuming we have large buffers, a large RTT, and a high-speed link, will 64 KBytes of data be enough to keep the pipe full?
For 100 ms RTT (typical for cross-country):
Bandwidth: delay x bandwidth product
- T1 (1.5 Mbps): 18 KB
- Ethernet (10 Mbps): 122 KB
- T3 (45 Mbps): 549 KB
- FDDI (100 Mbps): 1.2 MB
- STS-3 (155 Mbps): 1.8 MB
- STS-12 (622 Mbps): 7.4 MB
- STS-24 (1.2 Gbps): 14.8 MB
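The delay x bandwidth products can be recomputed with the sketch below, which also checks whether a 16-bit window can fill the pipe. It assumes the nominal rates from the slide and 1 KB = 1024 bytes; small differences from the table (for example for STS-24) come from rounding the link rate. The function names are made up for this example.

```python
MAX_16BIT_WINDOW = 2 ** 16 - 1   # largest value a 16-bit AdvertisedWindow can carry

def bdp_bytes(bits_per_second: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the pipe full."""
    return bits_per_second * rtt_seconds / 8

def window_fills_pipe(bits_per_second: float, rtt_seconds: float,
                      window_bytes: int = MAX_16BIT_WINDOW) -> bool:
    return window_bytes >= bdp_bytes(bits_per_second, rtt_seconds)

for name, rate in [("T1", 1.5e6), ("Ethernet", 10e6), ("T3", 45e6)]:
    kb = bdp_bytes(rate, 0.1) / 1024
    print(f"{name}: {kb:.0f} KB, 64 KB window enough: {window_fills_pipe(rate, 0.1)}")
```

With a 100 ms RTT, only the T1 link stays within what a 16-bit window can cover, which is the motivation for a window scaling option.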

50 Q 4.5 TCP AdvertisedWindow Limit: Peer Instruction Question
Assume the RTT between A and B is 1 s. What Internet access data rate would cause a problem with the size of the TCP AdvertisedWindow (16 bits), which limits the amount of outstanding data?
Answer:
(A): Mbps (T1)
(B): 10 Mbps (Ethernet)
(C): 45 Mbps (T3)
(D): 155 Mbps (OC 3)
(E): None of those

51 TCP Extensions
We’re almost at the limits of TCP. We’ll still be able to send data correctly, but not as efficiently. Solutions?
TCP extensions (specified as options in the header):
- 32-bit timestamp (to calculate RTT)
- “Extend” SequenceNum field:
- Window Scale Factor (1 byte) to extend AdvertisedWindow: AdvertisedWindow x 2^ScaleFactor
Another TCP option: MSS ( bytes) (default = 536 bytes), requested by the destination, not the source.
Option padding (1 B): End-of-Option; No-operation.

52 Silly Window Syndrome (SWS)
SWS arises when the sending application creates data slowly, the receiving application consumes data slowly, or both (1 byte at a time).
- Sender: may create lots of 21-byte segments (41-byte IP packets) to send 1 byte of data each.
- Receiver: may send an AdvertisedWindow of 1 byte, forcing the sender to send segments with 1 byte of data.

53 Silly Window Syndrome - Solutions
Sender (Nagle’s algorithm): after sending the first segment, wait until an ACK arrives (or the buffer fills to MSS) before sending the next segment. This takes into account the speed of the application and the speed of the network.
Receiver, two solutions:
- ACK with window 0 until there is enough space for MSS or ½ buffer size (Clark)
- Delayed ACK (max ½ sec)


55 TCP Round Trip Time and Timeout
Q: How to set the TCP timeout value?
- Longer than RTT, but RTT varies
- Too short: premature timeout, unnecessary retransmissions
- Too long: slow reaction to segment loss
Q: How to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions
- SampleRTT will vary; we want the estimated RTT to be “smoother”: average several recent measurements, not just the current SampleRTT

56 Adaptive Retransmission
In any reliable protocol, the sender will retransmit a message if it doesn’t receive an acknowledgement within a timeout period. How long should the timeout be? On the Internet, the RTT varies quite dramatically. We must adaptively calculate a timeout value, based on our previous and current experience of how long an acknowledgement takes.
Three methods:
- Original method
- Karn/Partridge
- Jacobson/Karels

57 Adaptive Retransmission: Original Algorithm
- Measure SampleRTT for each packet/ACK pair
- Compute the weighted (smoothed) average of RTT:
  EstimatedRTT = α x SampleRTT + (1 - α) x EstimatedRTT
  where α is between 0.1 and 0.2
- Set timeout based on EstimatedRTT:
  TimeOut = 2 x EstimatedRTT

58 Example RTT estimation
[Figure: RTT estimation example]

59 Karn/Partridge Algorithm
Problem with the original method: the RTT samples would also include packets that were retransmitted. Hence, the RTT was effectively doubled, biasing the smoothed value.
Solution: don’t sample RTT for packets that were retransmitted.
Another modification: double the timeout after each retransmission, since you are expecting that it might take longer (like Ethernet’s Binary Exponential Backoff).

60 SampleRTT - Associating ACK with (re)transmission
[Figure: two sender/receiver timelines, (a) and (b), each showing an original transmission, a retransmission, an ACK, and the resulting SampleRTT]

61 Jacobson/Karels Algorithm
The variance of RTT indicates how likely we are to retransmit:
- Low variance means consistent RTT, a quiet network, unlikely to lose packets.
- High variance means unpredictable delay, many packets lost.
Solution: keep track of the variance in the RTT, and add it onto our RTT estimate.

62 Jacobson/Karels (cont.)
New calculation for average RTT:
  Diff = SampleRTT - EstimatedRTT
  EstimatedRTT = EstimatedRTT + (δ x Diff)
  Deviation = Deviation + δ x (|Diff| - Deviation)
  where δ is a fraction between 0 and 1
To determine the timeout value:
  TimeOut = μ x EstimatedRTT + φ x Deviation
  where μ = 1 and φ = 4, so
  TimeOut = EstimatedRTT + 4 x Deviation
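The Jacobson/Karels update (Diff, EstimatedRTT, Deviation, TimeOut) can be sketched as a small estimator class, combined with the Karn/Partridge rule of skipping samples from retransmitted segments. The gain of 0.125, the starting deviation of 0, and the names `delta`, `mu`, and `phi` are assumptions chosen for this example; the slides only say the gain is a fraction between 0 and 1 and that the timeout weights are 1 and 4.

```python
class RttEstimator:
    def __init__(self, first_sample: float, delta: float = 0.125):
        self.delta = delta                  # assumed gain, a fraction between 0 and 1
        self.estimated_rtt = first_sample
        self.deviation = 0.0                # assumed starting value

    def update(self, sample_rtt: float, retransmitted: bool = False) -> None:
        if retransmitted:
            return                          # Karn/Partridge: ambiguous sample, skip it
        diff = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.delta * diff
        self.deviation += self.delta * (abs(diff) - self.deviation)

    def timeout(self, mu: float = 1.0, phi: float = 4.0) -> float:
        return mu * self.estimated_rtt + phi * self.deviation

est = RttEstimator(first_sample=100.0)      # milliseconds, illustrative
est.update(180.0)
print(est.estimated_rtt, est.deviation, est.timeout())   # 110.0 10.0 150.0
```

A single late sample raises the timeout much more than it raises the average, which is the intended effect of adding the deviation term.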


64 Principles of Congestion Control
Congestion:
- informally: “too many sources sending too much data too fast for the network to handle”
- different from flow control!
- manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)
- a top-10 problem!

65 Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission
- large delays when congested
- maximum achievable throughput
[Figure: Host A sends original data at rate λ_in; Host B; output rate λ_out; unlimited shared output link buffers; C]

66 Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of lost packet
[Figure: Host A, λ_in: original data; λ'_in: original data, plus retransmitted data; Host B, λ_out; finite shared output link buffers]

67 Causes/costs of congestion: scenario 2
- always: λ_in = λ_out (goodput)
- “perfect” retransmission only when loss: λ'_in > λ_out
- retransmission of delayed (not lost) packet makes λ'_in larger (than the perfect case) for the same λ_out
“Costs” of congestion:
- more work (retransmissions) for a given “goodput”
- unneeded retransmissions: link carries multiple copies of a packet

68 Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit
Q: What happens as λ_in and λ'_in increase?
[Figure: Host A, λ_in: original data; λ'_in: original data, plus retransmitted data; Host B, λ_out; finite shared output link buffers]

69 Causes/costs of congestion: scenario 3
Another “cost” of congestion: when a packet is dropped, any “upstream” transmission capacity used for that packet was wasted!
[Figure: Host A, Host B, λ_out]

70 Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control (host-based):
- no explicit feedback from the network
- congestion inferred from end-system observed loss and delay
- approach taken by TCP
Network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate the sender should send at


72 TCP Congestion Control
- end-end control (no network assistance)
- sender limits transmission: LastByteSent - LastByteAcked <= CongWin
- roughly, rate = CongWin / RTT bytes/sec
- CongWin is dynamic, a function of perceived network congestion
How does the sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- TCP sender reduces rate (CongWin) after a loss event
Three mechanisms:
- AIMD
- slow start
- conservative after timeout events

73 TCP Congestion Control
Objective: adjust to changes in the available capacity
New state variable per connection: CongestionWindow
- limits how much data the source has in transit
MaxWin = MIN(CongestionWindow, AdvertisedWindow)
EffWin = MaxWin - (LastByteSent - LastByteAcked)

74 Additive Increase/Multiplicative Decrease
Algorithm:
- increment CongestionWindow by one packet per RTT, i.e. per CongestionWindow’s worth of data (linear increase)
- divide CongestionWindow by two whenever a timeout occurs (multiplicative decrease)
In practice: increment a little for each ACK
- Increment = MSS x (MSS / CongestionWindow)
- CongestionWindow += Increment
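The per-ACK increment can be checked numerically: over one window’s worth of ACKs it adds up to about one MSS. The sketch below uses illustrative values (MSS of 1000 bytes, window of 10 segments). The floor of one MSS in the decrease function is an assumption added for this example.

```python
def on_ack_additive_increase(cwnd: float, mss: float) -> float:
    """Per-ACK increase: Increment = MSS * (MSS / CongestionWindow)."""
    return cwnd + mss * (mss / cwnd)

def on_timeout_multiplicative_decrease(cwnd: float, mss: float) -> float:
    """Halve the window on timeout (assumed never to go below one MSS)."""
    return max(mss, cwnd / 2)

mss = 1000.0
cwnd = 10 * mss
for _ in range(10):                 # one RTT: one ACK per segment in the window
    cwnd = on_ack_additive_increase(cwnd, mss)
print(round(cwnd))                  # close to 11 * MSS after one RTT
print(on_timeout_multiplicative_decrease(cwnd, mss))
```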

75 CongestionWindow - linear increase
[Figure: source and destination timeline showing the window growing by one packet per RTT]

76 Example trace: sawtooth behavior
[Figure: CongestionWindow trace; y-axis: KB; x-axis: time (seconds)]

77 Slow Start
Objective: ramp up to the available capacity
Idea:
- begin with CongestionWindow = 1 packet
- double CongestionWindow each RTT (increment by 1 packet for each ACK)
Exponential growth, but slower than sending everything in one blast
Need an additional parameter: ssthresh
- ssthresh = MSS when starting a connection

78 Slow Start - exponential increase of CongestionWindow
[Figure: source and destination timeline showing the window doubling each RTT]
Problem: lose up to half a CongestionWindow’s worth of data

79 Slow Start and Congestion Avoidance
Increasing strategy:
- Slow start
  - Start connection with cwnd = 1 (MSS), ssthresh = 1/2 AdvWin
  - Increment cwnd at each ACK, up to some max (ssthresh), i.e. exponential increase
- Congestion avoidance
  - For cwnd >= ssthresh, increase cwnd by 1 for each RTT, i.e. linear increase
Decreasing strategy:
- When a timeout occurs, set the slow start threshold to half the current congestion window: ssthresh = cwnd/2, and set cwnd = 1
- Repeat the increasing strategy: slow start (until cwnd = ssthresh), then congestion avoidance (when cwnd >= ssthresh)

80 Slow Start and Congestion Avoidance

81 Fast Retransmit and Fast Recovery
Problem: idle periods caused by coarse-grain TCP timeouts
Fast retransmit: use 3 duplicate ACKs to trigger a retransmission
Fast recovery: remove the slow start phase; go directly to half the last successful CongestionWindow
[Figure: sender transmits Packets 1 to 6; receiver returns ACK 1, ACK 2, then further ACK 2s; sender retransmits Packet 3; receiver returns ACK 6]

82 Summary: TCP Congestion Control (Reno)
- When CongWin is below Threshold, the sender is in the slow-start phase and the window grows exponentially.
- When CongWin is above Threshold, the sender is in the congestion-avoidance phase and the window grows linearly.
- When a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold.
- When a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS.

83 Q 4.6 TCP Congestion Control: Peer Instruction Question
Consider a TCP design and implementation with a window extension that allows window sizes much larger than 64 KB. Suppose this extended TCP is used to transfer a 20-MB file over a 1-Gbps link with a round-trip latency (RTT) of 500 ms. Assume the TCP advertised window is larger than 16 MB and TCP sends 2-KB packets. Furthermore, assume no congestion and no lost packets.
(1) How many RTTs does it take until slow start opens the send window to 4 MB?
Answer: A: 10 B: 11 C: 12 D: 13 E: None

84 Q 4.7 TCP Congestion Control: Peer Instruction Question
Identify the following:
1. Timeouts
2. 3 duplicate ACKs
3. (Start of) Congestion Avoidance (CA)
4. Start of Slow Start
[Figure: congestion window trace with points labeled A through L]

85 Q 4.8 TCP Congestion Control: Peer Instruction Question
How long would it take to send a file of 200,000 B, assuming the segment size is 1000 B and each round (RTT) is 10 ms? Use the congestion window scenario in the figure below, and assume the data in the previous round (RTT) get lost when a timeout or 3 duplicate ACKs occur.
Answer: (A): 200 s (B): 170 s (C): 60 s (D): 30 s (E): None of those

86 Q 4.9 TCP Throughput and Congestion Control: Peer Instruction Question
To achieve a throughput of 1 Gbps over the Internet (using TCP), what would be the constraint (upper bound) on the segment loss rate?
Answer: A: 10**(-6) B: 10**(-8) C: 10**(-9) D: 10**(-10) E: None

87 TCP sender congestion control
Each row lists: Event / State / TCP sender action / Commentary
1. ACK receipt for previously unACKed data / Slow Start (SS) / CongWin = CongWin + MSS; if (CongWin > Threshold) set state to “Congestion Avoidance” / Results in a doubling of CongWin every RTT
2. ACK receipt for previously unACKed data / Congestion Avoidance (CA) / CongWin = CongWin + MSS x (MSS/CongWin) / Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
3. Loss event detected by triple duplicate ACK / SS or CA / Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance” / Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
4. Timeout / SS or CA / Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” / Enter slow start
5. Duplicate ACK / SS or CA / Increment duplicate ACK count for the segment being ACKed / CongWin and Threshold not changed
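The event table above maps naturally onto a small state machine. The sketch below is one possible reading of it, not a complete Reno implementation: windows are in bytes, there is no timer or retransmission logic, and the class and method names are chosen for this example.

```python
class RenoSender:
    def __init__(self, mss: float, threshold: float):
        self.mss = mss
        self.cong_win = mss            # start with one MSS
        self.threshold = threshold
        self.state = "SS"              # "SS" = slow start, "CA" = congestion avoidance
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        """ACK receipt for previously unACKed data."""
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += self.mss
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += self.mss * (self.mss / self.cong_win)

    def on_dup_ack(self) -> None:
        """Duplicate ACK; the third one is treated as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, self.mss)  # never below 1 MSS
            self.state = "CA"

    def on_timeout(self) -> None:
        self.threshold = self.cong_win / 2
        self.cong_win = self.mss
        self.state = "SS"
        self.dup_acks = 0
```

Feeding it four new ACKs with a threshold of 4 MSS moves it from slow start into congestion avoidance; three duplicate ACKs then halve the window, and a timeout sends it back to slow start at 1 MSS.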

88 TCP throughput
What’s the average throughput of TCP as a function of window size and RTT?
- Ignore slow start
- Let W be the window size when loss occurs.
- When the window is W, throughput is W/RTT.
- Just after a loss, the window drops to W/2 and throughput to W/(2 x RTT).
- Average throughput: 0.75 W/RTT

89 CICS 515 – Summer 2012 © Dr. Son Vuong 88 TCP Futures Example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput. Requires window size W = 83,333 in-flight segments. Throughput in terms of loss rate: $TP = \frac{1.22 \cdot MSS}{RTT \sqrt{L}} \geq 10$ Gbps ➜ loss rate $L \leq 2 \cdot 10^{-10}$: hard to achieve! New versions of TCP for high-speed needed!
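The window and loss-rate figures on this slide can be checked numerically. The sketch below assumes the commonly used approximation TP = 1.22·MSS/(RTT·√L) and takes the segment size as a parameter; the function names are made up for illustration. Note that a window of 83,333 segments corresponds to 1500-byte segments.

```python
def window_segments(target_bps, rtt_s, mss_bytes):
    """Segments that must be in flight to sustain target_bps (W = TP * RTT / MSS)."""
    return target_bps * rtt_s / (mss_bytes * 8)


def max_loss_rate(target_bps, rtt_s, mss_bytes):
    """Largest loss rate L for which 1.22 * MSS / (RTT * sqrt(L)) >= target_bps."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


if __name__ == "__main__":
    for mss in (1500, 2000):
        w = window_segments(10e9, 0.1, mss)
        loss = max_loss_rate(10e9, 0.1, mss)
        print(f"MSS={mss} B: W = {w:,.0f} segments, L <= {loss:.2e}")
```

Because L scales with the square of MSS/(RTT·TP), a tenfold increase in target throughput tightens the tolerable loss rate by a factor of one hundred.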

90 CICS 515 – Summer 2012 © Dr. Son Vuong 89 TCP Fairness Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K. [Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R]

91 CICS 515 – Summer 2012 © Dr. Son Vuong 90 Why is TCP fair? Two competing sessions:
- Additive increase gives a slope of 1 as throughput increases
- Multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput plotted against Connection 1 throughput, both axes running up to R, with the equal-bandwidth-share line; the trajectory alternates between "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]
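The convergence argument can be illustrated with a toy simulation. This sketch assumes two flows with the same RTT that both see a loss in the same round whenever their combined rate exceeds the link capacity; the function name `aimd` and the unit step size are illustrative choices, not part of the slides.

```python
def aimd(x1, x2, capacity, rounds):
    """Simulate two synchronized AIMD flows sharing one bottleneck link."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: both flows halve their rate (multiplicative decrease),
            # which also halves the gap between them.
            x1, x2 = x1 / 2, x2 / 2
        else:
            # No loss: both flows add one unit (additive increase),
            # which leaves the gap between them unchanged.
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    a, b = aimd(1, 80, capacity=100, rounds=1000)
    print(f"flow 1: {a:.3f}, flow 2: {b:.3f}, gap: {abs(a - b):.2e}")
```

The gap between the two rates is untouched by additive increase and halved by every loss event, so the pair drifts toward the equal-share line regardless of the starting point.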

92 CICS 515 – Summer 2012 © Dr. Son Vuong 91 Fairness (more)
Fairness and UDP
- Multimedia apps often do not use TCP: they do not want their rate throttled by congestion control
- Instead they use UDP: pump audio/video at a constant rate, tolerate packet loss
- UDP traffic can bring Internet throughput (TP) to a halt!
- Research areas: TCP-friendly protocols; congestion control against UDP traffic
Fairness and parallel TCP connections
- Nothing prevents an app from opening parallel connections between 2 hosts; Web browsers do this
- Example: link of rate R supporting 9 connections
  - If a new app asks for 1 TCP connection, it gets rate R/10
  - If a new app asks for 11 TCP connections, it gets > R/2: unfair!
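The arithmetic behind the parallel-connection example is sketched below, under the assumption that the bottleneck ends up dividing its rate equally among all open TCP connections. The helper name `app_share` is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app that opens new_connections,
    assuming every connection on the link gets an equal share."""
    total = existing_connections + new_connections
    return Fraction(new_connections, total)


if __name__ == "__main__":
    print(app_share(9, 1))   # one connection among ten
    print(app_share(9, 11))  # eleven connections among twenty
```

With 9 existing connections, one new connection yields R/10, while eleven new connections yield 11R/20, which is already more than half the link.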

93 CICS 515 – Summer 2012 © Dr. Son Vuong 92 TCP/UDP Summary TCP provides reliable full-duplex connections. TCP Streams, credit flow control, 3-way handshake TCP extension: large packet, scaling Slow-start, Fast retransmit/recovery UDP is end-to-end, connectionless and simple. No flow/error control Next: leaving the network “edge” (application, transport layers) and going into the network “core”

94 CICS 515 – Summer 2012 © Dr. Son Vuong 93 TCP Releases
- Original: Slowstart at CongWin = 1 MSS
- Tahoe (4.3 BSD): Slowstart + Congestion Avoidance
- Reno: BSD Network Rel (BNR1): Fast Retx after 3 DupAck + Go to CongAvoid (thresh = 1/2 CongWin) (BNR2)
- Vegas: Rate-based. Compare ExpRate = CW/BaseRTT with ActualRate = SegSize/RTT, where Diff = ExpRate - ActualRate. If Diff < a: increase rate; if Diff > b: (congestion) reduce rate

95 CICS 515 – Summer 2012 © Dr. Son Vuong 94 Timers in TCP
- Retx Timer (RT): started when sending a segment (issue: how to estimate the Retx timeout interval)
- Persistence Timer (PT): started when receiving an ACK with Win=0; on timeout, send a Probe (1 B of data). Value(PT) = Value(RT); on each timeout, probe at binary-exponential intervals up to 60 s, then repeat the Probe every 60 s until Win is reopened
- Keepalive Timer (KT): checks if the client is alive (not mute). Val(KT) = 2 hrs; then 20 Probes at 75 s intervals; then close the connection
- Time-wait Timer: wait 2 segment lifetimes (2 × 60 s) before starting a new connection

97 CICS 515 – Summer 2012 © Dr. Son Vuong 96 Delay modeling (bonus slides)
Q: How long does it take to receive an object from a Web server after sending a request?
Ignoring congestion, delay is influenced by:
- TCP connection establishment
- data transmission delay
- slow start
Notation, assumptions:
- Assume one link between client and server of rate R
- S: MSS (bits)
- O: object size (bits)
- no retransmissions (no loss, no corruption)
Window size:
- First assume: fixed congestion window, W segments
- Then dynamic window, modeling slow start

98 CICS 515 – Summer 2012 © Dr. Son Vuong 97 Fixed congestion window (1) First case: WS/R > RTT + S/R: the ACK for the first segment in the window returns before a window's worth of data has been sent, so the server never stalls. delay = 2RTT + O/R

99 CICS 515 – Summer 2012 © Dr. Son Vuong 98 Fixed congestion window (2) Second case: WS/R < RTT + S/R: the server must wait for an ACK after sending each window's worth of data. delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R], where K is the number of windows that cover the object
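The two fixed-window cases can be combined into one function. This is a minimal sketch that takes K to be the number of windows covering the object, ⌈O/(WS)⌉, which is an assumption consistent with how K is used on the later slides; the function name and argument order are made up for illustration.

```python
import math


def fixed_window_delay(O, S, R, rtt, W):
    """Delay to fetch an object of O bits with a fixed window of W segments.

    O: object size (bits), S: MSS (bits), R: link rate (bits/s), rtt: seconds.
    """
    base = 2 * rtt + O / R               # connection setup + request, then transmission
    stall = S / R + rtt - W * S / R      # idle time after each window, if positive
    if stall <= 0:
        # First case: ACKs return before the window is exhausted, no stalls
        return base
    # Second case: the server stalls after each of the first K-1 windows
    K = math.ceil(O / (W * S))
    return base + (K - 1) * stall


if __name__ == "__main__":
    print(fixed_window_delay(16000, 1000, 1000, 2, W=4))  # window large enough
    print(fixed_window_delay(16000, 1000, 1000, 2, W=2))  # window too small
```

With S/R = 1 s and RTT = 2 s, a window of 4 segments keeps the pipe full, while a window of 2 segments adds a 1 s stall after each of the first seven windows.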

100 CICS 515 – Summer 2012 © Dr. Son Vuong 99 TCP Delay Modeling: Slow Start (1) Now suppose the window grows according to slow start. Will show that the delay for one object is:
$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$
where P is the number of times TCP idles at the server: $P = \min\{Q, K-1\}$
- where Q is the number of times the server would idle if the object were of infinite size
- and K is the number of windows that cover the object

101 CICS 515 – Summer 2012 © Dr. Son Vuong 100 TCP Delay Modeling: Slow Start (2)
Delay components:
- 2 RTT for connection establishment and request
- O/R to transmit the object
- time the server idles due to slow start
Server idles: P = min{K-1,Q} times
Example: O/S = 15 segments, K = 4 windows, Q = 2, so P = min{K-1,Q} = 2 and the server idles P=2 times

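102 CICS 515 – Summer 2012 © Dr. Son Vuong 101 TCP Delay Modeling (3)
$$\text{delay} = \frac{O}{R} + 2\,RTT + \sum_{p=1}^{P} \text{idleTime}_p$$
$$= \frac{O}{R} + 2\,RTT + \sum_{k=1}^{P}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right]$$
$$= \frac{O}{R} + 2\,RTT + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$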

103 CICS 515 – Summer 2012 © Dr. Son Vuong 102 TCP Delay Modeling (4) Calculation of Q, the number of idles for an infinite-size object, is similar (see HW). Recall K = number of windows that cover the object. How do we calculate K?
$$K = \min\{k : 2^0 S + 2^1 S + \cdots + 2^{k-1} S \geq O\}$$
$$= \min\{k : 2^0 + 2^1 + \cdots + 2^{k-1} \geq O/S\}$$
$$= \min\{k : 2^k - 1 \geq O/S\}$$
$$= \min\{k : k \geq \log_2(O/S + 1)\}$$
$$= \left\lceil \log_2\left(\frac{O}{S} + 1\right) \right\rceil$$
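The slow-start delay model can be checked numerically with a short sketch. Rather than using a closed form for Q, the code below counts the idle periods directly: the server idles after window k whenever S/R + RTT exceeds the time 2^(k-1)·S/R needed to transmit that window. The function names are hypothetical, and K is found by summing window sizes instead of evaluating the logarithm, to avoid floating-point rounding.

```python
def windows_to_cover(O, S):
    """K: smallest k such that windows of 1, 2, 4, ... segments cover O bits."""
    k, sent = 0, 0
    while sent < O:
        sent += (2 ** k) * S
        k += 1
    return k


def slow_start_latency(O, S, R, rtt):
    """Return (latency, K, P) for one object under the slow-start model.

    O: object size (bits), S: MSS (bits), R: link rate (bits/s), rtt: seconds.
    """
    K = windows_to_cover(O, S)
    # Idle time after each of the first K-1 windows (zero once the pipe is full)
    idle = [max(0.0, S / R + rtt - 2 ** (k - 1) * S / R) for k in range(1, K)]
    P = sum(1 for t in idle if t > 0)  # number of times the server actually idles
    latency = 2 * rtt + O / R + sum(idle)
    return latency, K, P


if __name__ == "__main__":
    latency, K, P = slow_start_latency(O=15000, S=1000, R=1000, rtt=2)
    closed_form = 2 * 2 + 15000 / 1000 + P * (2 + 1) - (2 ** P - 1) * 1
    print(latency, K, P, closed_form)
```

With O/S = 15 segments and RTT equal to twice S/R, this gives K = 4 and P = 2, and the summed idle times agree with the closed-form latency expression.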

104 CICS 515 – Summer 2012 © Dr. Son Vuong 103 HTTP Modeling
Assume a Web page consists of:
- 1 base HTML page (of size O bits)
- M images (each of size O bits)
Non-persistent HTTP:
- M+1 TCP connections in series
- Response time = (M+1)O/R + (M+1)2RTT + sum of idle times
Persistent HTTP:
- 2 RTT to request and receive the base HTML file
- 1 RTT to request and receive M images
- Response time = (M+1)O/R + 3RTT + sum of idle times
Non-persistent HTTP with X parallel connections:
- Suppose M/X is an integer
- 1 TCP connection for the base file
- M/X sets of parallel connections for images
- Response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle times
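The three response-time formulas can be compared side by side. The sketch below deliberately omits the "sum of idle times" term (it depends on slow start and differs between the three cases), so the numbers are lower bounds; the function name and the sample values are illustrative assumptions, with 5 Kbytes taken as 40,000 bits.

```python
def http_response_times(O, R, rtt, M, X):
    """Response times (seconds) without the slow-start idle-time term.

    O: object size (bits), R: link rate (bits/s), rtt: seconds,
    M: number of images, X: number of parallel connections (must divide M).
    """
    if M % X != 0:
        raise ValueError("the model assumes M/X is an integer")
    transmit = (M + 1) * O / R  # base page plus M images, each of O bits
    return {
        "non_persistent": transmit + (M + 1) * 2 * rtt,
        "persistent": transmit + 3 * rtt,
        "parallel_non_persistent": transmit + (M // X + 1) * 2 * rtt,
    }


if __name__ == "__main__":
    for rate in (100e3, 1e6, 10e6):
        times = http_response_times(O=40_000, R=rate, rtt=0.1, M=10, X=5)
        print(rate, {name: round(t, 3) for name, t in times.items()})
```

At low link rates the shared transmission term (M+1)O/R dominates all three results, while at high rates the RTT terms separate them.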

105 CICS 515 – Summer 2012 © Dr. Son Vuong 104 HTTP Response time (in seconds) RTT = 100 msec, O = 5 Kbytes, M=10 and X=5. For low bandwidth, connection & response time are dominated by transmission time. Persistent connections only give minor improvement over parallel connections. [Figure: response time of non-persistent, persistent, and parallel non-persistent HTTP at link rates up to 10 Mbps (axis ticks include 100 Kbps, 1 Mbps, 10 Mbps)]

106 CICS 515 – Summer 2012 © Dr. Son Vuong 105 HTTP Response time (in seconds) RTT = 1 sec, O = 5 Kbytes, M=10 and X=5. For larger RTT, response time is dominated by TCP establishment & slow start delays. Persistent connections now give important improvement, particularly in high delay × bandwidth networks.

