3 Roadmap UDP –Unreliable, connectionless datagram service TCP –Reliable, in order, connection-oriented, byte stream service Principles –Multiplexing/demultiplexing –How to build reliable service on top of unreliable service
4 Orientation We move one layer up and look at the transport layer.
5 Orientation Transport layer protocols are end-to-end protocols. They are implemented only at the hosts.
6 Transport Protocols in the Internet
UDP - User Datagram Protocol: datagram oriented; unreliable, connectionless; simple; unicast and multicast; useful for only a few applications, e.g., multimedia applications; used a lot for services –network management (SNMP), routing (RIP), naming (DNS), etc.
TCP - Transmission Control Protocol: byte stream oriented; reliable, connection-oriented; complex; unicast only; used for most Internet applications: –web (HTTP), email (SMTP), file transfer (FTP), terminal (Telnet), etc.
The most commonly used transport protocols are UDP and TCP.
7 UDP - User Datagram Protocol UDP supports unreliable transmissions of datagrams –Each output operation by a process produces exactly one UDP datagram The only thing that UDP adds is multiplexing and demultiplexing Protocol number: 17
8 UDP Format Port numbers identify sending and receiving applications (processes). Maximum port number is 2^16 - 1 = 65,535. Message Length is at least 8 bytes (i.e., the Data field can be empty) and at most 65,535 bytes. Checksum covers the UDP header and data.
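The header layout described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a full UDP implementation: the function names are made up for this example, and the checksum is simply left at 0 (which, for UDP over IPv4, signals that no checksum was computed).

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, message length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend the 8-byte UDP header to the payload (hypothetical helper)."""
    length = UDP_HEADER.size + len(payload)  # header + data, at least 8
    if length > 0xFFFF:
        raise ValueError("UDP message length exceeds 65,535 bytes")
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram):
    """Split a datagram into its header fields and the data field."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    data = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, data
```

An empty data field gives a message length of exactly 8, which is the minimum stated above.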
9 Port Numbers UDP (and TCP) use port numbers to identify applications. A globally unique address at the transport layer (for both UDP and TCP) is a tuple <IP address, port number>. Port numbers are 16 bits long, so there are 65,536 possible UDP port numbers (0 to 65,535) per host.
11 Overview TCP = Transmission Control Protocol Connection-oriented protocol Provides a reliable unicast end-to-end byte stream over an unreliable internetwork.
12 Connection-Oriented Before any data transfer, TCP establishes a connection: Analogy: making a phone call One TCP entity is waiting for a connection (“server”) The other TCP entity (“client”) contacts the server Each connection is full duplex
13 Reliable Byte stream is broken up into chunks, which are called segments. Receiver sends acknowledgements (ACKs) for segments. TCP maintains a timer. If an ACK is not received in time, the segment is retransmitted. Detecting errors and packet losses: TCP has checksums for header and data. Segments with invalid checksums are discarded. Each byte that is transmitted has a sequence number.
14 Byte Stream Service To the lower layers, TCP handles data in blocks, the segments. To the higher layers, TCP handles data as a sequence of bytes and does not mark any boundaries within the byte stream. So: higher layers do not know about the beginning and end of segments!
15 TCP Format TCP segments have a 20 byte header with >= 0 bytes of data.
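16 TCP header fields Port Number: A port number identifies the endpoint of a connection. A pair <IP address, port number> identifies one endpoint of a connection. Two pairs <client IP address, client port number> and <server IP address, server port number> identify a TCP connection.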
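Demultiplexing on the two endpoint pairs can be illustrated with a lookup table keyed by all four values. The sketch below is only a model of the idea: `ConnectionId`, `connections`, and `demultiplex` are names invented for this example, and the addresses are placeholders.

```python
from typing import Dict, NamedTuple, Optional


class ConnectionId(NamedTuple):
    """The two <IP address, port number> pairs that identify a connection."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


# Hypothetical connection table: connection identifier -> owning application.
connections: Dict[ConnectionId, str] = {}


def demultiplex(dst_ip, dst_port, src_ip, src_port) -> Optional[str]:
    """Find the application for an arriving segment, or None if no match."""
    return connections.get(ConnectionId(dst_ip, dst_port, src_ip, src_port))


# Two clients talking to the same server port are still distinct connections.
connections[ConnectionId("10.0.0.1", 80, "10.0.0.7", 1121)] = "session A"
connections[ConnectionId("10.0.0.1", 80, "10.0.0.8", 1121)] = "session B"
```

Because the key contains both pairs, a server can hold many connections on one port at the same time.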
17 TCP header fields Sequence Number (SeqNo): –Sequence number is 32 bits long. –So the range of SeqNo is 0 <= SeqNo <= 2^32 - 1 (about 4.3 Gbyte) –The sequence number in a segment identifies the first byte in the segment –Initial Sequence Number (ISN) of a connection is set during connection establishment
18 TCP header fields Acknowledgement Number (AckNo): –Acknowledgements are piggybacked, i.e., a segment from A -> B can contain an acknowledgement for data sent in the B -> A direction –A host uses the AckNo field to send acknowledgements. (If a host sends an AckNo in a segment, it sets the “ACK flag”) –The AckNo contains the next SeqNo that a host is expecting. Example: The acknowledgement for a segment with sequence numbers 0-1460 is AckNo=1461 –ACK is cumulative
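The relation between SeqNo, segment length, and AckNo is simple arithmetic modulo 2^32. A minimal sketch, with a function name chosen for this example only:

```python
SEQ_MODULUS = 2 ** 32  # sequence numbers are 32 bits and wrap around


def next_ack_no(seq_no, payload_len):
    """AckNo a receiver returns after getting payload_len bytes at seq_no."""
    return (seq_no + payload_len) % SEQ_MODULUS


# The slide's example: bytes 0..1460 are 1461 bytes, so the receiver
# expects byte 1461 next.
ack = next_ack_no(0, 1461)
```

The modulo only matters near the top of the range: a one-byte segment with SeqNo 2^32 - 1 is acknowledged with AckNo 0.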
19 TCP header fields Header Length (4 bits): –Length of header in 32-bit words –Note that TCP header has variable length (with minimum 20 bytes)
20 TCP header fields Flag bits: –URG: Urgent pointer is valid –If the bit is set, the following bytes contain an urgent message in the range: SeqNo <= urgent message <= SeqNo+urgent pointer –ACK: Acknowledgement Number is valid –PSH: PUSH Flag –Notification from sender to the receiver that the receiver should pass all data that it has to the application. –Normally set by sender when the sender’s buffer is empty
21 TCP header fields Flag bits: –RST: Reset the connection –The flag causes the receiver to reset the connection –Receiver of a RST terminates the connection and notifies the higher-layer application of the reset –SYN: Synchronize sequence numbers –Sent in the first packet when initiating a connection –FIN: Sender is finished with sending –Used for closing a connection –Both sides of a connection must send a FIN
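To make the fixed 20-byte header and its flag bits concrete, the following sketch unpacks a raw header with Python's `struct` module. The function and dictionary names are illustrative; the field order and flag bit positions follow the standard TCP header layout, and options (if any) are ignored.

```python
import struct

# Ports (2 x 16 bit), SeqNo, AckNo (2 x 32 bit), data offset byte,
# flag byte, window, checksum, urgent pointer (3 x 16 bit) = 20 bytes.
TCP_HEADER = struct.Struct("!HHIIBBHHH")

FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08,
             "RST": 0x04, "SYN": 0x02, "FIN": 0x01}


def parse_tcp_header(segment):
    """Decode the fixed part of a TCP header into a dictionary."""
    (src_port, dst_port, seq_no, ack_no, offset_byte,
     flag_byte, window, checksum, urgent) = TCP_HEADER.unpack_from(segment)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq_no": seq_no,
        "ack_no": ack_no,
        # The upper 4 bits hold the header length in 32-bit words.
        "header_len": (offset_byte >> 4) * 4,
        "flags": {name for name, bit in FLAG_BITS.items() if flag_byte & bit},
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```

A header length value of 5 words corresponds to the minimum 20-byte header; larger values mean that options follow.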
22 TCP header fields Window Size: –Each side of the connection advertises the window size –Window size is the maximum number of bytes that a receiver can accept. –Maximum window size is 2^16 - 1 = 65,535 bytes TCP Checksum: –TCP checksum covers both the TCP header and the TCP data (and also some parts of the IP header) Urgent Pointer: –Only valid if URG flag is set
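The checksum is the 16-bit ones' complement Internet checksum. The sketch below assumes IPv4, where the "parts of the IP header" are a 12-byte pseudo-header (source address, destination address, a zero byte, protocol number 6, and the TCP length). The function names are chosen for this example, and the segment passed to `tcp_checksum` is assumed to have its checksum field set to zero when the checksum is being computed.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header, TCP header, and TCP data."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,             # reserved zero byte
        6,             # IP protocol number of TCP
        len(segment),  # TCP header + data length in bytes
    )
    return internet_checksum(pseudo_header + segment)
```

A receiver can run the same computation over a segment that already carries its checksum: the result is 0 when the segment is intact.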
24 TCP header fields Options: –NOP is used to pad TCP header to multiples of 4 bytes –Maximum Segment Size –Window Scale Option »Extends the TCP window beyond 16 bits, i.e., the 16-bit window field is interpreted as a scaled value (it is shifted left by a negotiated shift count) »This option can only be used in the SYN segment (first segment) during connection establishment –Timestamp Option »Can be used for roundtrip measurements
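How the window scale option changes the interpretation of the 16-bit window field can be shown with one shift operation. This is a sketch with invented names; the upper limit of 14 for the shift count is taken from the window scale specification (RFC 7323), not from these slides.

```python
MAX_SHIFT = 14  # largest shift count allowed by the window scale option


def effective_window(window_field, shift_count):
    """Window in bytes, given the 16-bit header field and the shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift_count <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift_count


unscaled = effective_window(65535, 0)   # no scaling: at most 65,535 bytes
scaled = effective_window(65535, 14)    # largest window with scaling
```

With a shift count of 0 the field keeps its plain meaning, so a connection without the option behaves as described on the Window Size slide.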
25 Connection Management in TCP Opening a TCP Connection Closing a TCP Connection State Diagram
26 TCP Connection Establishment TCP uses a three-way handshake to open a connection:
27 A Closer Look with tcpdump
1 aida.poly.edu.1121 > mng.poly.edu.telnet: S 1031880193:1031880193(0) win 16384
2 mng.poly.edu.telnet > aida.poly.edu.1121: S 172488586:172488586(0) ack 1031880194 win 8760
3 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488587 win 17520
4 aida.poly.edu.1121 > mng.poly.edu.telnet: P 1031880194:1031880218(24) ack 172488587 win 17520
5 mng.poly.edu.telnet > aida.poly.edu.1121: P 172488587:172488590(3) ack 1031880218 win 8736
6 aida.poly.edu.1121 > mng.poly.edu.telnet: P 1031880218:1031880221(3) ack 172488590 win 17520
29 TCP Connection Termination Each end of the data flow must be shut down independently (“half-close”). If one end is done, it sends a FIN segment. The other end sends an ACK. Four messages are needed to shut down a connection completely. [Figure: A sends FIN and B replies with ACK; B can still send to A until B sends its own FIN]
34 2MSL Wait State 2MSL Wait State = TIME_WAIT. When TCP does an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the maximum segment lifetime. 2MSL = 2 * Maximum Segment Lifetime. Why? TCP is given a chance to resend the final ACK. (Server will time out after sending the FIN segment and resend the FIN.) Depending on the implementation, the MSL is set to 2 minutes, 1 minute, or 30 seconds. [Figure: A and B exchange FIN, ACK, FIN; the final ACK is lost (X)]
35 Resetting Connections Resetting connections is done by setting the RST flag When is the RST flag set? –Connection request arrives and no server process is waiting on the destination port –Abort (Terminate) a connection Causes the receiver to throw away buffered data. Receiver does not acknowledge the RST segment
37 Interactive and bulk data transfer TCP applications can be put into the following categories: bulk data transfer (FTP, mail, HTTP) and interactive data transfer (Telnet, rlogin). TCP has heuristics to deal with these application types. For interactive data transfer: Try to reduce the number of packets. For bulk data transfer: High throughput.
38 Telnet session on a local network This is the output of typing 3 (three) characters:
Time 44.062449: Argon -> Neon: Push, SeqNo 0:1(1), AckNo 1
Time 44.063317: Neon -> Argon: Push, SeqNo 1:2(1), AckNo 1
Time 44.182705: Argon -> Neon: No Data, AckNo 2
Time 48.946471: Argon -> Neon: Push, SeqNo 1:2(1), AckNo 2
Time 48.947326: Neon -> Argon: Push, SeqNo 2:3(1), AckNo 2
Time 48.982786: Argon -> Neon: No Data, AckNo 3
Time 55.116581: Argon -> Neon: Push, SeqNo 2:3(1), AckNo 3
Time 55.117497: Neon -> Argon: Push, SeqNo 3:4(1), AckNo 3
Time 55.183694: Argon -> Neon: No Data, AckNo 4
39 Interactive applications: Telnet Remote terminal applications (e.g., Telnet) send characters to a server. The server interprets the character and sends the resulting output to the client. For each character typed, you see three packets: 1. Client -> Server: Send typed character 2. Server -> Client: Echo of character (or user output) and acknowledgement for first packet 3. Client -> Server: Acknowledgement for second packet
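40 Why 3 packets per character? We would expect four packets per character: the typed character, its acknowledgement, the echo, and the acknowledgement of the echo. However, tcpdump shows only three packets per character. What has happened? TCP has delayed the transmission of an ACK.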
41 Delayed Acknowledgement TCP delays transmission of ACKs for up to 200ms The hope is to have data ready in that time frame. Then, the ACK can be piggybacked with a data segment. Delayed ACKs explain why the ACK and the “echo of character” are sent in the same segment.
42 Telnet session to a distant host This is the output of typing nine characters:
Time 16.401963: Argon -> Tenet: Push, SeqNo 1:2(1), AckNo 2
Time 16.481929: Tenet -> Argon: Push, SeqNo 2:3(1), AckNo 2
Time 16.482154: Argon -> Tenet: Push, SeqNo 2:3(1), AckNo 3
Time 16.559447: Tenet -> Argon: Push, SeqNo 3:4(1), AckNo 3
Time 16.559684: Argon -> Tenet: Push, SeqNo 3:4(1), AckNo 4
Time 16.640508: Tenet -> Argon: Push, SeqNo 4:5(1), AckNo 4
Time 16.640761: Argon -> Tenet: Push, SeqNo 4:8(4), AckNo 5
Time 16.728402: Tenet -> Argon: Push, SeqNo 5:9(4), AckNo 8
43 Delayed Acks do not kick in if there are data to send Observation: Transmission of segments follows a different pattern, i.e., there are only two packets per character typed. The delayed acknowledgment does not kick in. The reason is that there is always data at Argon ready to send when the ACK arrives.
44 Nagle’s Algorithm Observation: –Argon never has multiple unacknowledged segments outstanding –There are fewer transmissions than there are characters. Sending one byte per packet is inefficient. Solution: Nagle’s Algorithm. Small segments cannot be sent until outstanding data is acked. The algorithm can be disabled, because it can be a problem for interactive applications such as the X Window System.
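The rule "small segments cannot be sent until outstanding data is acked" can be modeled with a small sender class. This is a simplified sketch, not how any particular TCP stack implements it: the class and attribute names are invented, and a single ACK is assumed to acknowledge all outstanding data.

```python
class NagleSender:
    """Toy sender that applies Nagle's rule to application writes."""

    def __init__(self, mss):
        self.mss = mss
        self.buffer = b""   # data written by the application, not yet sent
        self.unacked = 0    # bytes sent but not yet acknowledged
        self.sent = []      # segments handed to the network, in order

    def write(self, data):
        self.buffer += data
        self._try_send()

    def on_ack(self):
        # Simplification: one ACK covers everything that is outstanding.
        self.unacked = 0
        self._try_send()

    def _try_send(self):
        while self.buffer:
            if len(self.buffer) >= self.mss:
                chunk = self.buffer[:self.mss]  # full segments always go out
            elif self.unacked == 0:
                chunk = self.buffer             # small segment, nothing outstanding
            else:
                break                           # small segment must wait for the ACK
            self.buffer = self.buffer[len(chunk):]
            self.unacked += len(chunk)
            self.sent.append(chunk)


# Five keystrokes with a slow ACK: the first goes out alone, the next
# four are collected and sent together once the ACK arrives.
sender = NagleSender(mss=1460)
for key in (b"a", b"b", b"c", b"d", b"e"):
    sender.write(key)
sender.on_ack()
```

This reproduces the pattern in the distant-host trace, where one segment carries four characters. On the sockets API the algorithm is usually disabled per connection with the `TCP_NODELAY` option, e.g. `sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)`.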
46 What is Flow/Congestion Control? Flow Control: Algorithms that prevent the sender from overrunning the receiver buffer. Congestion Control: Algorithms that prevent the sender from overloading the network. Sliding window implements both control mechanisms.
48 TCP Flow Control TCP implements sliding window flow control Sending acknowledgements is separated from setting the window size at sender. Acknowledgements do not automatically increase the window size Acknowledgements are cumulative
49 Sliding Window Flow Control Sliding Window Protocol is performed at the byte level: Here: Sender can transmit sequence numbers 6,7,8.
50 Sliding Window: “Window Opens” Acknowledgement is received that enlarges the window to the right (AckNo = 5, Win=6): A receiver opens a window when TCP buffer empties (meaning that data is delivered to the application). [Figure: byte sequence 1 to 11, shown before and after AckNo = 5, Win = 6 is received]
51 Window Management in TCP The receiver returns two parameters to the sender: AckNo and the window size (Win). The interpretation is: I am ready to receive new data with SeqNo = AckNo, AckNo+1, ..., AckNo+Win-1. Receiver can acknowledge data without opening the window. Receiver can change the window size without acknowledging data.
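The interpretation of AckNo and Win translates directly into a check on what the sender may still transmit. A minimal sketch, assuming sequence numbers do not wrap around and using a made-up function name:

```python
def usable_window(ack_no, win, next_seq):
    """Number of bytes the sender may still send.

    The receiver accepts SeqNo = ack_no .. ack_no + win - 1, and next_seq
    is the sequence number of the next byte the sender would transmit.
    """
    right_edge = ack_no + win  # first sequence number outside the window
    return max(0, right_edge - next_seq)


# With AckNo = 5 and Win = 6, the window covers sequence numbers 5..10.
remaining = usable_window(ack_no=5, win=6, next_seq=9)
```

The same function covers both special cases on the slide: an ACK that advances `ack_no` while reducing `win` by the same amount leaves the right edge unchanged, and a larger `win` with an unchanged `ack_no` opens the window without acknowledging data.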
54 TCP Congestion Control Keep a sender from congesting the network. The sender has two internal parameters: –Congestion Window (cwnd) –Slow-start threshold value (ssthresh) Sliding window size is set to the minimum of (cwnd, receiver advertised win). Congestion control works in two modes: –slow start (cwnd < ssthresh): Probe the available bandwidth –congestion avoidance (cwnd >= ssthresh): Try not to overload the network.
55 Slow Start Initial value: Set cwnd = 1. Note: The unit is a segment size. TCP actually is based on bytes and increments by 1 MSS (maximum segment size). Modern TCP implementations may set the initial cwnd to 2. Each time an ACK is received by the sender, the congestion window is increased by 1 segment: cwnd = cwnd + 1. If an ACK acknowledges two segments, cwnd is still increased by only 1 segment. Even if an ACK acknowledges a segment that is smaller than MSS bytes long, cwnd is increased by 1. Question: how can you accelerate your TCP download?
56 Slow Start Example The congestion window size grows very rapidly –For every ACK, we increase cwnd by 1 irrespective of the number of segments ACK’ed. TCP slows down the increase of cwnd when cwnd >= ssthresh.
57 Congestion Avoidance Congestion avoidance phase is started if cwnd has reached the slow-start threshold value If cwnd >= ssthresh then each time an ACK is received, increment cwnd as follows: cwnd = cwnd + 1/ cwnd So cwnd is increased by one only if all cwnd segments have been acknowledged.
58 Example of Slow Start/Congestion Avoidance Assume that ssthresh = 8. [Figure: cwnd (in segments) plotted against roundtrip times, with ssthresh marked]
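The shape of the curve for ssthresh = 8 can be reproduced with a per-roundtrip approximation. This sketch assumes no losses and one ACK per segment, so that cwnd doubles every roundtrip during slow start and grows by one segment per roundtrip during congestion avoidance; the function name is chosen for this example.

```python
def cwnd_per_rtt(ssthresh, rounds, initial_cwnd=1):
    """Approximate cwnd (in segments) at the start of each roundtrip."""
    cwnd = initial_cwnd
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd < ssthresh:
            # Slow start: one increment per ACK doubles cwnd per roundtrip
            # (capped at ssthresh in this approximation).
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: about one segment per roundtrip.
            cwnd += 1
    return history


growth = cwnd_per_rtt(ssthresh=8, rounds=8)
```

Here `growth` is `[1, 2, 4, 8, 9, 10, 11, 12]`: exponential growth up to the threshold, then linear growth.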
59 Responses to Congestion TCP uses packet loss as congestion signal A TCP sender can detect lost packets via: Receipt of a duplicate ACK Timeout of a retransmission timer
60 Response to Timeout TCP interprets a timeout as a severe congestion signal. When a timeout occurs, the sender performs: –ssthresh is set to half of the current size of the congestion window: ssthresh = cwnd / 2 –cwnd is then reset to one: cwnd = 1 –and slow start is entered
61 Reaction to Duplicate ACKs Fast retransmit –Three duplicate ACKs indicate a packet loss –Retransmit without waiting for a timeout. Fast recovery –Avoid slow start –Retransmit the “lost packet” –ssthresh = cwnd/2 –cwnd = ssthresh+3 –Increment cwnd by one for each additional duplicate ACK. When an ACK arrives that acknowledges “new data”, set cwnd = ssthresh and enter congestion avoidance.
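The reactions to a timeout and to duplicate ACKs can be collected into one small state object. This is a sketch in units of segments, loosely following RFC 2581: the class name and the initial values are assumptions for this example, ssthresh is computed from the old cwnd before cwnd is changed, cwnd is inflated to ssthresh + 3 on the third duplicate ACK, and ssthresh is never set below 2 segments.

```python
class CongestionState:
    """Toy model of cwnd/ssthresh updates, in units of segments."""

    def __init__(self, cwnd=1.0, ssthresh=64.0):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.in_fast_recovery = False

    def on_new_ack(self):
        if self.in_fast_recovery:
            # New data acknowledged: deflate and go to congestion avoidance.
            self.cwnd = self.ssthresh
            self.in_fast_recovery = False
        elif self.cwnd < self.ssthresh:
            self.cwnd += 1              # slow start
        else:
            self.cwnd += 1 / self.cwnd  # congestion avoidance

    def on_duplicate_ack(self, dup_count):
        if dup_count == 3 and not self.in_fast_recovery:
            # Fast retransmit: the caller resends the missing segment now.
            self.ssthresh = max(self.cwnd / 2, 2)
            self.cwnd = self.ssthresh + 3
            self.in_fast_recovery = True
        elif dup_count > 3 and self.in_fast_recovery:
            self.cwnd += 1              # each further duplicate ACK

    def on_timeout(self):
        self.ssthresh = max(self.cwnd / 2, 2)
        self.cwnd = 1                   # restart with slow start
        self.in_fast_recovery = False
```

For example, starting from cwnd = 16, three duplicate ACKs give ssthresh = 8 and cwnd = 11, and the next ACK for new data leaves cwnd = 8, so the sender continues in congestion avoidance instead of restarting from cwnd = 1.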
67 Retransmissions in TCP A TCP sender retransmits a segment when it assumes that the segment has been lost: 1.No ACK has been received and a timeout occurs 2.Multiple ACKs have been received for the same segment
68 Retransmission Timer TCP sender maintains one retransmission timer for each connection. When the timer reaches the retransmission timeout (RTO) value, the sender retransmits the first segment that has not been acknowledged. The timer is started when: 1. A packet with payload is transmitted and the timer is not running 2. An ACK arrives that acknowledges new data 3. A segment is retransmitted. The timer is stopped when: –All segments are acknowledged
69 How to set the timer Retransmission Timer: –The setting of the retransmission timer is crucial for good performance of TCP –A timeout value that is too small results in unnecessary retransmissions –A timeout value that is too large results in a long waiting time before a retransmission can be issued –A problem is that the delays in the network are not fixed –Therefore, the retransmission timers must be adaptive
70 Setting the value of RTO: The RTO value is set based on round-trip time (RTT) measurements that each TCP performs Each TCP connection measures the time difference between the transmission of a segment and the receipt of the corresponding ACK There is only one measurement ongoing at any time (i.e., measurements do not overlap) Figure on the right shows three RTT measurements
71 Setting the RTO value RTO is calculated based on the RTT measurements –Uses an exponential moving average to estimate the RTT (srtt) and the variance of the RTT (rttvar) from the measurements –The influence of past samples decreases exponentially. The RTT measurements are smoothed by the following estimators srtt and rttvar:
$srtt_{n+1} = \alpha \cdot RTT + (1 - \alpha) \cdot srtt_n$
$rttvar_{n+1} = \beta \cdot |RTT - srtt_n| + (1 - \beta) \cdot rttvar_n$
$RTO_{n+1} = srtt_{n+1} + 4 \cdot rttvar_{n+1}$
–The gains are set to $\beta = 1/4$ and $\alpha = 1/8$
72 Setting the RTO value (cont’d) Initial value for RTO: –Sender should set the initial value of RTO to $RTO_0 = 3$ seconds. RTO calculation after the first RTT measurement has arrived: $srtt_1 = RTT$, $rttvar_1 = RTT / 2$, $RTO_1 = srtt_1 + 4 \cdot rttvar_1$. When a timeout occurs, the RTO value is doubled, up to a maximum of 64 seconds: $RTO_{n+1} = \min(2 \cdot RTO_n, 64)$ seconds. This is called an exponential backoff.
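The RTO computation can be sketched as a small estimator class. The sketch assumes the gains from RFC 2988 (1/8 for srtt, 1/4 for rttvar), an initial RTO of 3 seconds, and a backoff that is capped at 64 seconds; the class and method names are invented for this example, and the lower bound that RFC 2988 places on the RTO is left out for brevity.

```python
class RtoEstimator:
    """Toy retransmission timeout estimator; all times are in seconds."""

    ALPHA = 1 / 8    # gain for the smoothed RTT (srtt)
    BETA = 1 / 4     # gain for the RTT variation (rttvar)
    MAX_RTO = 64.0   # upper bound used for the exponential backoff

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0  # initial value before any measurement

    def on_rtt_sample(self, rtt):
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # rttvar is updated first, using the previous srtt.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(rtt - self.srtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self.srtt + 4 * self.rttvar

    def on_timeout(self):
        # Exponential backoff: double the RTO, but never beyond the cap.
        self.rto = min(2 * self.rto, self.MAX_RTO)


estimator = RtoEstimator()
estimator.on_rtt_sample(1.0)  # srtt = 1.0, rttvar = 0.5, RTO = 3.0
estimator.on_rtt_sample(1.0)  # rttvar shrinks to 0.375, RTO = 2.5
```

With stable measurements rttvar decays toward zero and the RTO approaches the RTT itself, while a burst of timeouts drives the RTO up to the 64-second cap within a few doublings.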
73 Karn’s Algorithm Karn’s Algorithm: Don’t update the RTT estimate based on any segments that have been retransmitted. If an ACK for a retransmitted segment is received, the sender cannot tell if the ACK belongs to the original or the retransmission. The RTT measurement is ambiguous in this case.
74 Summary UDP: connectionless, unreliable, datagram service TCP: reliable, connection-oriented, byte stream service –TCP header –Connection management –Delayed ACKs and Nagle’s algorithm –TCP flow control –TCP congestion control –TCP retransmission and timeout
References –TCP/IP Illustrated, Vol. 1, chapters 11, 17-24 –RFC 793 (Transmission Control Protocol) –RFC 768 (User Datagram Protocol) –RFC 2581 (TCP Congestion Control) –RFC 2988 (Computing TCP’s Retransmission Timer) –RFC 3390 (Increasing TCP’s Initial Window)