CSci232: Transport Layer & TCP


1 CSci232: Transport Layer & TCP
Transport Layer Services: connection-oriented vs. connectionless; multiplexing and demultiplexing.
UDP: Connectionless Unreliable Service.
TCP: Connection-Oriented Reliable Service: connection management (set-up and tear-down); reliable data transfer protocols; flow and congestion control.
Readings: Chapter 5
Fall 2007 CSci232: Transport Layer & TCP

2 Transport Protocols
Lowest-level end-to-end protocol. The header generated by the sender is interpreted only by the destination; routers view the transport header as part of the payload.
[Figure: protocol stacks of two end hosts (layers 1 to 7, with Transport above IP, Datalink, and Physical) connected through a router that implements only IP, Datalink, and Physical.]

3 Transport Services and Protocols
Provide logical communication between app processes running on different hosts. Transport protocols run in end systems: the send side breaks app messages into segments and passes them to the network layer; the rcv side reassembles segments into messages and passes them to the app layer. More than one transport protocol is available to apps; in the Internet: TCP and UDP.
[Figure: logical end-end transport between two end systems running the full stack (application, transport, network, data link, physical); intermediate nodes implement only network, data link, and physical.]

4 Transport Layer Services
Underlying best-effort network: drops messages; re-orders messages; delivers duplicate copies of a given message; delivers messages after an arbitrarily long delay.
Common end-to-end services: guarantee message delivery; deliver messages in the same order they are sent; deliver at most one copy of each message; allow the receiver to flow control the sender; support multiple application processes on each host.

5 Transport vs. Application and Network Layer
application layer: application processes and message exchange. network layer: logical communication between hosts. transport layer: logical communication support for app processes; relies on, and enhances, network layer services.
Household analogy: 12 kids sending letters to 12 kids. processes = kids; app messages = letters in envelopes; hosts = houses; transport protocol = Ann and Bill; network-layer protocol = postal service.

6 End to End Issues
Transport services are built on top of a (potentially) unreliable network service: packets can be corrupted or lost; packets can be delayed or arrive “out of order”.
Do we detect and/or recover from errors for apps? Error Control & Reliable Data Transfer.
Do we provide “in-order” delivery of packets? Connection Management & Reliable Data Transfer.
Potentially different capacity at the destination, and potentially different network capacity: Flow and Congestion Control.

7 Internet Transport Protocols
TCP service: connection-oriented (setup required between client and server); reliable transport between sender and receiver; flow control (sender won’t overwhelm receiver); congestion control (throttle sender when network overloaded).
UDP service: unreliable data transfer between sender and receiver; does not provide connection setup, reliability, flow control, or congestion control.
Both provide logical communication between app processes running on different hosts!
The Internet protocol stack provides two types of end-to-end transport service, corresponding to two protocols: TCP, the Transmission Control Protocol, and UDP, the User Datagram Protocol. TCP provides connection-oriented, reliable transport between sender and receiver. It also does flow control, so that the sender won’t transmit faster than the receiver can consume, and it reduces the sending rate when the network is loaded. UDP, on the other hand, provides an unreliable, connectionless datagram service. Beyond what IP provides, it adds demultiplexing and error checking: IP knows how to deliver a packet to a host, but not to a specific application on that host. TCP is the more commonly used service, since many applications want reliable transport. Why UDP, then? There are cases with only one request and a short response, where setting up and tearing down a connection is too much overhead, so some applications use UDP.

8 Multiplexing/Demultiplexing
Multiplexing at send host: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing).
Demultiplexing at rcv host: delivering received segments to the correct application process.
[Figure: hosts 1, 2, and 3, each with the stack application, transport, network, link, physical; processes P1 to P4 attach to the transport layer through sockets (the API).]

9 How Demultiplexing Works
The host receives IP datagrams: each datagram has a source IP address and a destination IP address; each datagram carries 1 transport-layer segment; each segment has a source and a destination port number (recall: well-known port numbers for specific applications). The host uses IP addresses & port numbers to direct the segment to the appropriate app process (identified by “socket”).
TCP/UDP segment format (32 bits wide): source port #, dest port #; other header fields; application data (message).
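The lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the socket table, the `demultiplex` function, and the addresses and process names are hypothetical, and a real TCP stack keys connected sockets on the full 4-tuple rather than on the destination pair alone.

```python
from typing import Dict, Optional, Tuple

# Hypothetical socket table: (destination IP, destination port) -> process name.
# Assumption: one listening process per (IP, port) pair.
SocketTable = Dict[Tuple[str, int], str]


def demultiplex(table: SocketTable, dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the process bound to (dst_ip, dst_port), or None if no socket matches."""
    return table.get((dst_ip, dst_port))


table: SocketTable = {
    ("10.0.0.5", 53): "dns-server",
    ("10.0.0.5", 80): "web-server",
}

print(demultiplex(table, "10.0.0.5", 80))    # web-server
print(demultiplex(table, "10.0.0.5", 9999))  # None
```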

10 UDP: User Datagram Protocol [RFC 768]
“No frills,” “bare bones” Internet transport protocol. “Best effort” service; UDP segments may be: lost; delivered out of order to app. Connectionless: no handshaking between UDP sender and receiver; each UDP segment is handled independently of others.
Why is there a UDP? No connection establishment (which can add delay); simple: no connection state at sender or receiver; small segment header; no congestion control: UDP can blast away as fast as desired.

11 UDP (cont’d)
Often used for streaming multimedia apps: loss tolerant; rate sensitive. Other UDP uses: DNS; SNMP. Reliable transfer over UDP: add reliability at the application layer; application-specific error recovery!
UDP segment format (32 bits wide): source port #, dest port #; length, checksum; application data (message). The length field gives the length, in bytes, of the UDP segment, including the header.

12 UDP Checksum
Goal: detect “errors” (e.g., flipped bits) in the transmitted segment.
Sender: treat the segment contents as a sequence of 16-bit integers; compute the 1’s complement sum of the segment contents; put the checksum value (the 1’s complement of that sum) into the UDP checksum field.
Receiver: compute the checksum of the received segment; check whether the computed checksum equals the checksum field value: NO - error detected; YES - no error detected. But maybe errors nonetheless? More later….

13 Checksum: Example
Arrange the data segment as a sequence of 16-bit words; add the words to get the sum; the checksum is the 1’s complement of the sum; verify by adding.
[Figure: worked binary addition; the numeric values are not preserved in this transcript.]
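Since the slide's numbers did not survive, here is a small Python sketch of the same procedure. The function names and the two 16-bit words are illustrative assumptions, not values from the slide, and the sketch omits the pseudo header that a real UDP checksum also covers.

```python
from typing import Iterable, List


def ones_complement_sum16(words: Iterable[int]) -> int:
    """Add 16-bit words, wrapping any carry out of bit 15 back into bit 0."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(words: Iterable[int]) -> int:
    """Checksum = 1's complement of the 1's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


def verify(words: Iterable[int], checksum: int) -> bool:
    """Receiver check: data words plus checksum must sum to all ones."""
    return ones_complement_sum16(list(words) + [checksum]) == 0xFFFF


data: List[int] = [0xE666, 0xD555]   # illustrative words, not from the slide
csum = internet_checksum(data)
print(hex(csum))                     # 0x4443
print(verify(data, csum))            # True
print(verify([0xE666, 0xD554], csum))  # False: a flipped bit is detected
```

A passing check only means no error was detected; two errors that cancel in the sum still slip through, which is the "maybe errors nonetheless" caveat.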

14 TCP Overview
Connection-oriented. Byte-stream: app writes bytes; TCP sends segments; app reads bytes. Full duplex. Flow control: keep sender from overrunning receiver. Congestion control: keep sender from overrunning network.
[Figure: the sending application process writes bytes into the TCP send buffer; TCP transmits segments; the receiving TCP places them in its receive buffer, from which the application process reads bytes.]

15 Functionality Split
Network provides best-effort delivery. End-systems implement many functions: reliability; in-order delivery; demultiplexing; message boundaries; connection abstraction; flow control; congestion control.
IP service model = best-effort delivery. End-points do everything else; over time many functions have been implemented. Congestion control is unique: every endpoint must do this! It is necessary for the overall efficiency of the IP network.

16 High-Level TCP Characteristics
Protocol implemented entirely at the ends: fate sharing. Protocol has evolved over time and will continue to do so: nearly impossible to change the header; use options to add information to the header; change processing at endpoints. Backward compatibility is what makes it TCP.

17 Evolution of TCP
1974: TCP described by Vint Cerf and Bob Kahn in IEEE Trans Comm
1975: Three-way handshake, Raymond Tomlinson, in SIGCOMM 75
1982: TCP & IP, RFC 793 & 791
1983: BSD Unix 4.2 supports TCP/IP
1984: Nagle’s algorithm to reduce overhead of small packets; predicts congestion collapse
1986: Congestion collapse observed
1987: Karn’s algorithm to better estimate round-trip time
1988: Van Jacobson’s algorithms: congestion avoidance and congestion control (most implemented in 4.3BSD Tahoe)
1990: 4.3BSD Reno: fast retransmit, delayed ACKs

18 TCP Through the 1990s
1993: TCP Vegas (Brakmo et al), real congestion avoidance
1994: T/TCP (Braden), Transaction TCP
1994: ECN (Floyd), Explicit Congestion Notification
1996: Hoe, improving TCP startup
1996: FACK TCP (Mathis et al), extension to SACK
SACK TCP (Floyd et al), Selective Acknowledgement

19 TCP Segment Header Structure
TCP segment format (32 bits wide): source port #, dest port #; sequence number; acknowledgement number; head len, not used, flags U A P R S F, rcvr window size; checksum, ptr urgent data; options (variable length); application data (variable length).
Field notes: sequence and acknowledgement numbers count bytes of data (not segments!). URG: urgent data (generally not used). ACK: ACK # valid. PSH: push data now (generally not used). RST, SYN, FIN: connection establishment (setup, teardown commands). Rcvr window size: # bytes rcvr willing to accept. Checksum: Internet checksum (as in UDP).
A TCP packet is referred to as a TCP segment; the above is its structure. The port numbers identify the end points of the connection. The sequence number is the byte number of the segment's first data byte, counted from the initial sequence number of the connection. The acknowledgement number corresponds to the data flowing in the other direction: it is the next byte this side expects from its peer. There are several flags. The ACK flag validates the acknowledgement number field. The SYN and FIN flags are used to establish and terminate a connection. The RST flag signals that the host sending it has hit an unexpected condition and wants to abort the connection. The receiver window size also applies to data in the other direction: it specifies the amount of buffer available for reception on this side. The checksum covers both the header and the data.
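To make the field layout concrete, the following Python sketch unpacks the fixed 20-byte part of a TCP header with the standard `struct` module. The function name `parse_tcp_header` and the sample values are assumptions made for illustration; options and payload are ignored.

```python
import struct
from typing import Any, Dict

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 .. bit 5


def parse_tcp_header(raw: bytes) -> Dict[str, Any]:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes for a TCP header")
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # head len is counted in 32-bit words
        "flags": [n for i, n in enumerate(FLAG_NAMES) if off_flags & (1 << i)],
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }


# Build a sample SYN+ACK header (illustrative values) and parse it back.
sample = struct.pack("!HHIIHHHH", 22, 1414, 300, 101, (5 << 12) | 0x12, 25200, 0, 0)
print(parse_tcp_header(sample))
```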

20 TCP Segment Format (cont)
Each connection is identified by a 4-tuple: (SrcPort, SrcIPAddr, DstPort, DstIPAddr). Sliding window + flow control: Acknowledgment, SequenceNum, AdvertisedWindow. Flags: SYN, FIN, ACK, RESET, PUSH, URG. Checksum: pseudo header (src & dst IP addresses) + TCP header + data.
[Figure: the sender sends Data (SequenceNum) to the receiver; the receiver returns Acknowledgment + AdvertisedWindow.]

21 TCP Connection Set Up
TCP sender and receiver establish a “connection” before exchanging data segments: initialize TCP variables (seq. #s; buffers, flow control info). client: end host that initiates the connection. server: end host contacted by the client.
Three-way handshake:
Step 1: client sends TCP SYN control segment to server; specifies initial seq #.
Step 2: server receives SYN, replies with SYN+ACK control segment; ACKs received SYN; specifies the server’s initial seq. #.
Step 3: client receives SYN+ACK, replies with ACK segment (which may contain the 1st data segment).
TCP is a connection-oriented protocol: the connection has to be established before any data is exchanged. During connection setup, the peers exchange sequence numbers, buffer sizes, etc. In the rest of the discussion we use the convention that the client is the connection initiator and the server is the responder. TCP uses the three-way handshake to establish a connection. The client initiates the connection with a SYN segment (SYN flag set) that includes the initial sequence number x chosen by the client. The server responds with a SYN-ACK segment (both SYN and ACK flags set), which carries the acknowledgement number x+1 and the server’s initial sequence number y. The client responds with an ACK segment that acknowledges the server’s sequence number by setting the acknowledgement number to y+1. After exchanging these three segments (SYN, SYN-ACK, ACK), both client and server have established the connection.

22 TCP 3-Way Hand-Shake
client -> server: SYN, seq=x (client initiates connection)
server -> client: SYN+ACK, seq=y, ack=x+1 (SYN received)
client -> server: ACK, seq=x+1, ack=y+1 (connection established; may carry 1st data segment)
Question: a. What kind of “state” do client and server need to maintain? b. What initial sequence # should client (and server) use?
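The sequence and acknowledgement arithmetic of the handshake can be written out as a short Python sketch. The `Segment` class and `three_way_handshake` function are hypothetical names used only for this illustration; the one rule taken from the slides is that each side acknowledges the peer's initial sequence number plus one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit values and wrap around


@dataclass(frozen=True)
class Segment:
    flags: FrozenSet[str]
    seq: int
    ack: Optional[int] = None


def three_way_handshake(x: int, y: int) -> List[Segment]:
    """Return the SYN, SYN+ACK, ACK segments for client ISN x and server ISN y."""
    syn = Segment(frozenset({"SYN"}), seq=x % SEQ_MOD)
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=y % SEQ_MOD,
                      ack=(syn.seq + 1) % SEQ_MOD)
    ack = Segment(frozenset({"ACK"}), seq=(syn.seq + 1) % SEQ_MOD,
                  ack=(syn_ack.seq + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


for seg in three_way_handshake(x=100, y=300):
    print(sorted(seg.flags), seg.seq, seg.ack)
```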

23 TCP Connection Setup Example
No. Time Source > Destination Proto SrcPort>DstPort [Flags]
TCP > 22 [SYN] Seq= Len=0 MSS=1260
TCP > 1414 [SYN, ACK] Seq= Ack= Win=25200 Len=0 MSS=1460
TCP > 22 [ACK] Seq= Ack= Win=16384 Len=0

24 TCP Connection Setup Example
No. Time Source > Destination Proto SrcPort>DstPort [Flags]
TCP > 80 [SYN] Seq= Len=0 MSS=1260
TCP > 1567 [SYN, ACK] Seq= Ack= Win=25200 Len=0 MSS=1460
TCP > 80 [ACK] Seq= Ack= Win=17640 Len=0
TCP > 80 [PSH,ACK] Seq= Ack= Win=17640 Len=564
TCP > 1567 [ACK] Seq= Ack= Win=25200 Len=0 MSS=1460

25 3-Way Handshake: Finite State Machine
Client FSM? Server FSM? What info (“state”) is maintained at the client?
Client: closed -> (upper layer: initiate connection; send SYN w/ initial seq = x) -> SYN sent -> (SYN+ACK received; send ACK) -> conn estab’ed.
Server: ? ? ? ?

26 Connection Setup Error Scenarios
Lost (control) packets: What happens if the SYN is lost (client vs. server actions)? What happens if the SYN+ACK is lost (client vs. server actions)? What happens if the ACK is lost (client vs. server actions)?
Duplicate (control) packets: What does the server do if a duplicate SYN is received? What does the client do if a duplicate SYN+ACK is received? What does the server do if a duplicate ACK is received?

27 Connection Setup Error Scenarios (cont’d)
Importance of (unique) initial seq. no.? When receiving a SYN, how does the server know it’s a new connection request? When receiving a SYN+ACK, how does the client know it’s legitimate, i.e., a response to its SYN request? Dealing with old duplicate packets from old connections (or from malicious users); if not careful: “TCP Hijacking”.
How to choose a unique initial seq. no.? Randomly choose a number (and add it to the last syn# used).
Other security concern: “SYN Flood”, a denial-of-service attack.
The source/destination addresses and port numbers alone are not sufficient to uniquely identify a connection and the packets belonging to it. A connection may be opened and closed, and then a new connection opened with the same four fields. Since the network can delay packets, packets from the old connection may still linger in the network. When such old packets arrive at the receiver, it may not be able to tell that they don’t belong to the current connection, causing it to malfunction. So we need a way to distinguish between packets of old and new connections.

28 Detecting Half-Open Connections
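TCP A has crashed and restarted; TCP B still holds the old connection (send 300, receive 100).
TCP A: (CRASH); TCP B: (send 300, receive 100)
TCP A: CLOSED; TCP B: ESTABLISHED
TCP A: SYN-SENT --> <SEQ=400><CTL=SYN> --> TCP B: (??)
TCP A: (!!) <-- <SEQ=300><ACK=100><CTL=ACK> <-- TCP B
TCP A: SYN-SENT --> <SEQ=100><CTL=RST> --> TCP B: (Abort!!)
TCP A: SYN-SENT; TCP B: CLOSED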

29 TCP State Diagram: Connection Setup
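State transitions (event / action), client path via active OPEN, server path via passive OPEN:
CLOSED -> SYN SENT: active OPEN / create TCB, snd SYN
CLOSED -> LISTEN: passive OPEN / create TCB
LISTEN -> CLOSED: CLOSE / delete TCB
LISTEN -> SYN RCVD: rcv SYN / snd SYN, ACK
LISTEN -> SYN SENT: SEND / snd SYN
SYN SENT -> CLOSED: CLOSE / delete TCB
SYN SENT -> SYN RCVD: rcv SYN / snd ACK
SYN SENT -> ESTAB: rcv SYN, ACK / snd ACK
SYN RCVD -> ESTAB: rcv ACK of SYN
SYN RCVD -> (tear-down): CLOSE / snd FIN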

30 TCP: Closing Connection
Remember: a TCP connection is duplex! Client wants to close connection:
Step 1: client end system sends TCP FIN control segment to server (client closing).
Step 2: server receives FIN, replies with ACK (half closed).
Step 3: client receives ACK; half closed, waits for server to close.
Server finishes sending data, also ready to close:
Step 4: server sends FIN (server closing).
Now let’s look at the connection termination procedure. TCP connections are full duplex and can be thought of as two simplex connections, each of which can be terminated independently. When the client wants to close the connection, it sends a FIN segment (FIN flag set) and the server responds with an ACK. When the server is ready to close the connection, it also sends a FIN segment. The client responds with an ACK, and the server can then delete its state.

31 TCP: Closing Connection (cont’d)
Step 5: client receives FIN, replies with ACK; connection fully closed.
Step 6: server receives ACK; connection fully closed.
Well done! Problem solved?

32 TCP: Closing Connection (revised)
Two Army Problem!
Step 5: client receives FIN, replies with ACK. Enters “timed wait”: will respond with ACK to received FINs.
Step 6: server receives ACK; connection fully closed.
Step 7: client timer expires; connection fully closed.
[Figure: the client’s ACK is lost (X); the server times out and resends its FIN; the client, still in timed wait, ACKs again.]
While the server can delete its state soon after receiving the ACK for its FIN, the client has to wait for a while before clearing the connection state. Why? Although the client has sent an ACK in response to the server’s FIN segment, it does not know that the ACK was successfully delivered. As a consequence, the other side might retransmit its FIN segment, and this second FIN segment might be delayed in the network. If the connection were allowed to move directly to the CLOSED state, another pair of application processes might open the same connection (i.e., use the same pair of port numbers and sequence numbers), and the delayed FIN segment from the earlier incarnation of the connection would immediately initiate the termination of the later incarnation.

33 TCP Connection Tear-Down Example
No. Time Source > Destination Proto SrcPort>DstPort [Flags]
TCP > 22 [PSH,ACK] Seq= Ack= Win=15920 Len=32
TCP > 22 [FIN, ACK] Seq= Ack= Win=15920 Len=0
TCP > 1414 [ACK] Seq= Ack= Win=25200 Len=
TCP > 1414 [ACK] Seq= Ack= Win=25200 Len=
TCP > 1414 [FIN,ACK] Seq= Ack= Win=25200 Len=
TCP > 22 [ACK] Seq= Ack= Win=15920 Len=0

34 State Diagram: Connection Tear-down
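State transitions (event / action). Active close:
ESTAB -> FIN WAIT-1: CLOSE / send FIN
FIN WAIT-1 -> FIN WAIT-2: rcv ACK of FIN
FIN WAIT-1 -> CLOSING: rcv FIN / snd ACK
FIN WAIT-1 -> TIME WAIT: rcv FIN+ACK / snd ACK
FIN WAIT-2 -> TIME WAIT: rcv FIN / snd ACK
CLOSING -> TIME WAIT: rcv ACK of FIN
TIME WAIT -> CLOSED: Timeout=2min / delete TCB
Passive close:
ESTAB -> CLOSE WAIT: rcv FIN / send ACK
CLOSE WAIT -> LAST-ACK: CLOSE / snd FIN
LAST-ACK -> CLOSED: rcv ACK of FIN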

35 TCP Connection Management FSM
TCP client lifecycle. This shows the life cycle of a connection from the client side. You can see how the connection is established after sending a SYN and receiving a SYN-ACK, and how it enters the CLOSED state after FINs and ACKs have been sent and received. We have already explained why the TIME_WAIT state is needed. The waiting period depends on the implementation, and it could be 120 seconds instead of 30 seconds.

36 TCP Connection Management FSM
TCP server lifecycle. On the server side, the connection is ESTABLISHED after receiving a SYN and then an ACK for the SYN-ACK segment. The connection moves to the CLOSED state after the server receives a FIN and an ACK for the FIN segment it sent. Why is there no TIME_WAIT state on the server side? Once the server receives the last ACK, it can safely close the connection, since it is not expecting any more segments from the client. Even if delayed duplicate ACKs remain in the network, they would not do any harm to a later incarnation of the connection.

37 Reliability and Error Recovery
ARQ vs. FEC: automatic repeat request (retransmission) vs. forward error correction.
General ARQ algorithms: Stop & Wait (performance issue: low utilization when the delay-bandwidth product is large); Sliding Window Protocols (Go-Back-N, Selective Repeat).
Key design issue: window size vs. size of seq. no. space.

38 Error Recovery: Stop and Wait
ARQ: receiver sends an acknowledgement (ACK) when it receives a packet; sender waits for the ACK and times out if it does not arrive within some time period.
Simplest ARQ protocol: send a packet, then stop and wait until the ACK arrives.
[Figure: timeline between sender and receiver showing a packet, its ACK, and the sender’s timeout interval.]

39 Recovering from Error
[Figure: three stop-and-wait timelines: packet lost, ACK lost, and early timeout. In each, the sender’s timeout triggers a retransmission.]
The application may get duplicates in the case of early timeouts: DUPLICATE PACKETS!!!

40 Problems with Stop and Wait
How to recognize a duplicate? Performance: can only send one packet per round trip.

41 How to Recognize Resends?
Use sequence numbers in both packets and acks. The sequence # in a packet is finite: how big should it be? For stop and wait, one bit: the sender won’t send seq #1 until it has received the ACK for seq #0.
[Figure: Pkt 0, ACK 0; Pkt 0 resent, ACK 0 again; Pkt 1, ACK 1.]
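A one-bit sequence number is enough for the receiver to tell a resend from a new packet. The Python sketch below illustrates that idea; the class name and method are hypothetical, and the convention that ACK n acknowledges packet n follows the figure.

```python
from typing import List


class StopAndWaitReceiver:
    """Receiver side of stop-and-wait with a 1-bit sequence number."""

    def __init__(self) -> None:
        self.expected = 0            # sequence bit of the next new packet
        self.delivered: List[str] = []

    def receive(self, seq: int, data: str) -> int:
        """Deliver new data once; re-ACK duplicates without delivering again."""
        if seq == self.expected:
            self.delivered.append(data)
            self.expected ^= 1       # flip 0 <-> 1
        return seq                   # ACK carries the seq of the packet received


rx = StopAndWaitReceiver()
print(rx.receive(0, "a"))  # 0  (new packet, delivered)
print(rx.receive(0, "a"))  # 0  (resend after lost ACK or early timeout, not delivered)
print(rx.receive(1, "b"))  # 1
print(rx.delivered)        # ['a', 'b']
```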

42 Problem with Stop & Wait Protocol
[Figure: sender/receiver timeline for a data packet of L bytes. First packet bit transmitted at t = 0; first packet bit arrives; receiver sends ACK; ACK arrives and the sender sends the next packet at t = RTT + L / R.]
Can’t keep the pipe full: utilization is low when the bandwidth-delay product (R x RTT) is large!

43 Stop & Wait: Performance Analysis
Example: 1 Gbps connection, 15 ms end-end prop. delay, data segment size 1 KB = 8 Kb.
Transmission time: L / R = 8 Kb / 1 Gbps = 8 microseconds.
U sender (utilization, i.e., fraction of time the sender is busy sending) = (L / R) / (RTT + L / R) = 0.008 ms / 30.008 ms ≈ 0.00027.
One 1 KB data segment every 30 msec (round-trip time) --> 0.027% x 1 Gbps = 33 kB/sec throughput over a 1 Gbps link.
Moral of story: network protocol limits use of physical resources!
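The arithmetic of this example can be checked with a few lines of Python. The function name is made up for this sketch; the formula used is sender utilization = (L/R) / (RTT + L/R), with the slide's values of L = 8000 bits, R = 1 Gbps, and RTT = 30 ms.

```python
def stop_and_wait_utilization(segment_bits: float, rate_bps: float, rtt_s: float) -> float:
    """Fraction of time a stop-and-wait sender spends transmitting."""
    transmit_time = segment_bits / rate_bps
    return transmit_time / (rtt_s + transmit_time)


u = stop_and_wait_utilization(segment_bits=8000, rate_bps=1e9, rtt_s=0.030)
throughput_bytes_per_s = u * 1e9 / 8

print(f"utilization = {u:.5f}")                                # about 0.00027
print(f"throughput  = {throughput_bytes_per_s / 1000:.1f} kB/s")  # about 33 kB/s
```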

44 How to Keep the Pipe Full?
Send multiple packets without waiting for the first to be acked: number of pkts in flight = window.
Reliable, unordered delivery: several parallel stop & waits; send a new packet after each ack; sender keeps a list of unack’ed packets and resends after timeout; receiver same as stop & wait.
How large a window is needed? Suppose a 10 Mbps link, 4 ms delay, 500-byte pkts: 1? 10? 20? Round-trip delay * bandwidth = capacity of pipe.
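The window question can be worked through numerically. In the Python sketch below the function name is an assumption, and so is the reading of "4 ms delay": the slide does not say whether it is one-way or round-trip, so both cases are computed.

```python
def window_in_packets(bandwidth_bps: int, rtt_ms: int, packet_bytes: int) -> int:
    """Packets needed in flight to fill the pipe (round-trip delay * bandwidth)."""
    pipe_bits = bandwidth_bps * rtt_ms // 1000
    packet_bits = packet_bytes * 8
    return -(-pipe_bits // packet_bits)  # ceiling division on integers


# If 4 ms is the one-way delay, the round trip is 8 ms.
print(window_in_packets(10_000_000, rtt_ms=8, packet_bytes=500))  # 20
# If 4 ms is already the round-trip delay.
print(window_in_packets(10_000_000, rtt_ms=4, packet_bytes=500))  # 10
```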

45 Pipelined (Sliding Window) Protocols
Pipelining: sender allows multiple “in-flight”, yet-to-be-acknowledged data segments; the range of sequence numbers must be increased; buffering at sender and/or receiver.
Two generic forms of pipelined protocols: Go-Back-N and Selective Repeat.

46 Pipelining: Increased Utilization
[Figure: sender/receiver timeline with three pipelined packets. First packet bit transmitted at t = 0; last bit transmitted at t = L / R; the receiver sends an ACK as the last bit of the 1st, 2nd, and 3rd packet arrives; the first ACK arrives and the sender sends the next packet at t = RTT + L / R.]
Increase utilization by a factor of 3!

47 Sliding Window
Reliable, ordered delivery: the receiver has to hold onto a packet until all prior packets have arrived. Why might this be difficult for just parallel stop & wait? The sender must prevent buffer overflow at the receiver.
Circular buffer at sender and receiver: packets in transit ≤ buffer size; advance when sender and receiver agree that the packets at the beginning have been received.

48 Sender/Receiver State
Sender window: runs from Max ACK received to Next seqnum. Regions of the sender’s sequence space: Sent & Acked; Sent, Not Acked; OK to Send; Not Usable.
Receiver window: runs from Next expected to Max acceptable. Regions of the receiver’s sequence space: Received & Acked; Acceptable Packet; Not Usable.

49 Window Sliding – Common Case
On reception of a new ACK (i.e., an ACK for something that was not acked earlier): increase the sequence of max ACK received; send the next packet.
On reception of a new in-order data packet (next expected): hand the packet to the application; send a cumulative ACK, which acknowledges reception of all packets up to that sequence number; increase the sequence of max acceptable packet.

50 Loss Recovery
On reception of an out-of-order packet, either: send nothing (wait for the source to time out); or send a cumulative ACK (helps the source identify the loss).
Timeout (Go-Back-N recovery): set a timer upon transmission of a packet; retransmit all unacknowledged packets.
Performance during loss recovery: no longer have an entire window in transit; can have much more clever loss recovery.

51 Go-Back-N in Action
[Figure not preserved in this transcript.]
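With the figure missing, a small Python sketch can stand in for it. The class and method names are hypothetical, sequence numbers are unbounded (no wrap-around), and the receiver follows the cumulative-ACK option from the Loss Recovery slide: it discards out-of-order packets and re-ACKs the last in-order one.

```python
from typing import List


class GoBackNSender:
    def __init__(self, window: int) -> None:
        self.window = window
        self.base = 0       # oldest unacknowledged sequence number
        self.next_seq = 0   # next sequence number to send

    def can_send(self) -> bool:
        return self.next_seq < self.base + self.window

    def send(self) -> int:
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, ack: int) -> None:
        """Cumulative ACK: everything up to and including `ack` is received."""
        if ack >= self.base:
            self.base = ack + 1

    def on_timeout(self) -> List[int]:
        """Go back N: retransmit every unacknowledged packet."""
        return list(range(self.base, self.next_seq))


class GoBackNReceiver:
    def __init__(self) -> None:
        self.expected = 0
        self.delivered: List[str] = []

    def on_packet(self, seq: int, data: str) -> int:
        if seq == self.expected:          # in order: deliver
            self.delivered.append(data)
            self.expected += 1
        return self.expected - 1          # last in-order seq (-1 if none yet)


sender, receiver = GoBackNSender(window=4), GoBackNReceiver()
first_window = [sender.send() for _ in range(4)]   # packets 0, 1, 2, 3
for seq in first_window:
    if seq != 2:                                   # assume packet 2 is lost
        sender.on_ack(receiver.on_packet(seq, f"pkt{seq}"))
resent = sender.on_timeout()
print(resent)                                      # [2, 3]
for seq in resent:
    sender.on_ack(receiver.on_packet(seq, f"pkt{seq}"))
print(receiver.delivered)
```

Packet 3 arrived intact the first time but is sent again anyway, which is the cost of keeping the receiver free of buffering.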

52 Selective Repeat
Receiver individually acknowledges all correctly received pkts; buffers packets, as needed, for eventual in-order delivery to the upper layer.
Sender only resends packets for which an ACK was not received: a sender timer for each unACKed packet.
Sender window: N consecutive seq #’s; again limits the seq #s of sent, unACKed packets.
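The receiver behaviour described above (individual ACKs plus buffering for in-order delivery) might look like the following Python sketch. The class is hypothetical and, as a simplifying assumption, uses unbounded sequence numbers rather than a wrapping sequence space.

```python
from typing import Dict, List, Optional


class SelectiveRepeatReceiver:
    def __init__(self, window: int) -> None:
        self.window = window
        self.base = 0                      # lowest sequence number not yet delivered
        self.buffer: Dict[int, str] = {}   # out-of-order packets awaiting delivery
        self.delivered: List[str] = []

    def on_packet(self, seq: int, data: str) -> Optional[int]:
        """Return the sequence number to ACK, or None if the packet is ignored."""
        if self.base <= seq < self.base + self.window:
            self.buffer[seq] = data
            # Deliver any run of consecutive packets starting at base.
            while self.base in self.buffer:
                self.delivered.append(self.buffer.pop(self.base))
                self.base += 1
            return seq
        if self.base - self.window <= seq < self.base:
            return seq                     # already delivered: re-ACK so the sender advances
        return None                        # outside both windows


rx = SelectiveRepeatReceiver(window=4)
print(rx.on_packet(1, "b"), rx.delivered)  # 1 []          (buffered, gap at 0)
print(rx.on_packet(0, "a"), rx.delivered)  # 0 ['a', 'b']  (gap filled, both delivered)
```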

53 Selective Repeat: Sender, Receiver Windows

54 Sequence Numbers
How large does the sequence number space need to be? Must be able to detect wrap-around; depends on sender/receiver window size.
E.g., size of seq. no. space = 8, send win = recv win = 7. If pkts 0..6 are sent successfully and all acks are lost, the receiver expects 7, 0..5 while the sender retransmits old 0..6!!!
Size of sequence no. space must be ≥ send window + recv window.
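The wrap-around example can be reproduced with a short Python sketch. The function name is hypothetical; it models exactly the scenario on the slide, where the first window is delivered, every ACK is lost, and the sender retransmits the old packets into the receiver's new window.

```python
from typing import List


def ambiguous_seqs(space: int, send_win: int, recv_win: int) -> List[int]:
    """Sequence numbers of retransmitted old packets that the receiver would accept as new."""
    old_packets = {s % space for s in range(send_win)}
    # After receiving the whole first window, the receiver expects the next recv_win numbers.
    new_window = {(send_win + i) % space for i in range(recv_win)}
    return sorted(old_packets & new_window)


print(ambiguous_seqs(space=8, send_win=7, recv_win=7))   # [0, 1, 2, 3, 4, 5]
print(ambiguous_seqs(space=13, send_win=7, recv_win=7))  # [0]
print(ambiguous_seqs(space=14, send_win=7, recv_win=7))  # []
```

The ambiguity disappears exactly when the space reaches send window + recv window, in line with the rule stated above.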

55 Sequence Numbers in TCP
TCP regards data as a “byte-stream”: each byte in the byte stream is numbered; 32-bit value, wraps around; initial values selected at start-up time.
TCP breaks up the byte stream into packets: packet size is limited to the Maximum Segment Size (MSS).
Each packet has a sequence number: the seq. no. of its 1st byte indicates where it fits in the byte stream.
A TCP connection is duplex: data in each direction has its own sequence numbers.
[Figure: packets 8, 9, and 10 laid along the byte stream, with boundaries at sequence numbers 13450, 14950, 16050, and 17550.]
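As an illustration of byte-based numbering, the Python sketch below cuts a byte stream into segments of at most MSS bytes and labels each with the sequence number of its first byte. The function name and the numbers are assumptions made for the example; they are not the values in the figure.

```python
from typing import List, Tuple

SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def segment_stream(first_seq: int, data_len: int, mss: int) -> List[Tuple[int, int]]:
    """Return (sequence number of first byte, length) for each segment."""
    segments = []
    offset = 0
    while offset < data_len:
        length = min(mss, data_len - offset)
        segments.append(((first_seq + offset) % SEQ_MOD, length))
        offset += length
    return segments


print(segment_stream(first_seq=1000, data_len=4000, mss=1500))
# [(1000, 1500), (2500, 1500), (4000, 1000)]
```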

56 TCP Seq. #’s and ACKs
Seq. #’s: byte-stream “number” of the first byte in the segment’s data. ACKs: seq # of the next byte expected from the other side.
Simple telnet scenario:
Host A -> Host B: Seq=42, ACK=79, data = ‘C’ (user types ‘C’)
Host B -> Host A: Seq=79, ACK=43, data = ‘C’ (host ACKs receipt of ‘C’, echoes back ‘C’)
Host A -> Host B: Seq=43, ACK=80 (host ACKs receipt of echoed ‘C’)
There are several things to note about TCP sequence numbers and acknowledgements. First, they refer to bytes, not packets. Second, acknowledgements are cumulative and refer to the next byte expected. Let’s look at a simple telnet example. The first segment carries sequence number 42 and acknowledgement number 79. This means that host A is sending bytes starting at 42, that it has received all bytes up to and including 78, and that it expects bytes from 79 onwards from host B. Host B responds with bytes starting at seq 79 and with ACK 43, since 43 is the next byte it expects from host A.

57 TCP Reliable Data Transfer
TCP creates a reliable data transfer service on top of IP’s unreliable service: pipelined segments; cumulative ACKs; TCP uses a single retransmission timer.
Retransmissions are triggered by: timeout events; duplicate acks.
Initially consider a simplified TCP sender: ignore duplicate acks; ignore flow control and congestion control.

58 TCP = Go-Back-N Variant
Sliding window with cumulative acks: the receiver can only return a single “ack” sequence number to the sender; it acknowledges all bytes with a lower sequence number and is the starting point for retransmission. Duplicate acks are sent when an out-of-order packet is received.
But: the sender only retransmits a single packet. Reason??? It is the only one that it knows is lost; the network is congested, so it shouldn’t overload it.
Error control is based on byte sequences, not packets: a retransmitted packet can be different from the original lost packet. Why?

59 TCP Sender Events
data rcvd from app: create segment with seq # (seq # is the byte-stream number of the first data byte in the segment); start timer if not already running (think of the timer as being for the oldest unACKed segment); expiration interval: TimeOutInterval
timeout: retransmit segment that caused timeout; restart timer
ACK received: if it acknowledges previously unACKed segments, update what is known to be ACKed; start timer if there are outstanding segments
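The three sender events can be sketched as a small class. This is a simplified model under stated assumptions, not a real TCP implementation: the class name `SimpleTcpSender`, the `sent` log that stands in for the network, and the boolean timer are all hypothetical, and sequence-number wrap-around is ignored.

```python
class SimpleTcpSender:
    """Simplified sender: no duplicate-ACK handling, no flow or congestion control."""

    def __init__(self, initial_seq=0):
        self.send_base = initial_seq  # oldest unACKed byte
        self.next_seq = initial_seq   # seq # of the next new byte
        self.unacked = {}             # seq # -> payload still awaiting an ACK
        self.timer_running = False
        self.sent = []                # log of (seq, payload), stands in for the network

    def on_app_data(self, payload: bytes):
        self.unacked[self.next_seq] = payload
        self.sent.append((self.next_seq, payload))
        self.next_seq += len(payload)
        self.timer_running = True     # start timer if not already running

    def on_timeout(self):
        if not self.unacked:
            return
        oldest = min(self.unacked)    # segment that caused the timeout
        self.sent.append((oldest, self.unacked[oldest]))
        self.timer_running = True     # restart timer

    def on_ack(self, ack: int):
        if ack <= self.send_base:
            return                    # nothing new is acknowledged
        self.send_base = ack
        for seq in list(self.unacked):
            if seq + len(self.unacked[seq]) <= ack:
                del self.unacked[seq]
        # keep the timer only while segments are outstanding
        self.timer_running = bool(self.unacked)


sender = SimpleTcpSender(initial_seq=100)
sender.on_app_data(b"hello")   # bytes 100..104
sender.on_app_data(b"world")   # bytes 105..109
sender.on_timeout()            # retransmits the segment starting at 100
sender.on_ack(105)             # "hello" is now ACKed
```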

60 TCP ACK generation [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver Action
1. Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK
2. Arrival of in-order segment with expected seq #. One other segment has ACK pending -> Immediately send single cumulative ACK, ACKing both in-order segments
3. Arrival of out-of-order segment with higher-than-expected seq #. Gap detected -> Immediately send duplicate ACK, indicating seq # of next expected byte
4. Arrival of segment that partially or completely fills gap -> Immediately send ACK, provided that segment starts at lower end of gap
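The receiver rules on this slide amount to a small decision function. The sketch below is an assumption-laden illustration: the function name, its parameters and the returned strings are hypothetical, and the final branch (a segment below the expected seq #) is not covered by the slide.

```python
def receiver_action(seg_seq, expected_seq, ack_pending, gap_exists):
    """Pick the receiver's ACK action for one arriving segment.

    seg_seq      -- seq # of the arriving segment
    expected_seq -- next in-order byte the receiver is waiting for
    ack_pending  -- True if an earlier in-order segment has a delayed ACK pending
    gap_exists   -- True if out-of-order data is already buffered above a gap
    """
    if seg_seq > expected_seq:
        return "send duplicate ACK now"      # out-of-order arrival, gap detected
    if seg_seq == expected_seq and gap_exists:
        return "send ACK now"                # segment starts at lower end of gap
    if seg_seq == expected_seq and ack_pending:
        return "send cumulative ACK now"     # ACKs both in-order segments
    if seg_seq == expected_seq:
        return "delay ACK up to 500 ms"
    return "send duplicate ACK now"          # assumption: old data, re-ACK it


print(receiver_action(100, 100, ack_pending=False, gap_exists=False))
print(receiver_action(200, 100, ack_pending=False, gap_exists=False))
```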

61 TCP Flow Control
Receive side of TCP connection has a receive buffer; app process may be slow at reading from buffer.
Flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast.
Speed-matching service: matching the send rate to the receiving app’s drain rate.

62 TCP Flow Control: How It Works
(Suppose TCP receiver discards out-of-order segments)
Spare room in buffer = RcvWindow (dynamically changes) = RcvBuffer - [LastByteRcvd - LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow: guarantees receive buffer doesn’t overflow
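A minimal sketch of the window computation, with both sides of the rule. The receiver formula is the one on the slide; the sender-side names `last_byte_sent` and `last_byte_acked` and the function `can_send` are assumptions added for illustration.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises: RcvBuffer - [LastByteRcvd - LastByteRead]."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(last_byte_sent, last_byte_acked, nbytes, window):
    """Sender side: unACKed data plus the new bytes must fit in the advertised window."""
    return (last_byte_sent - last_byte_acked) + nbytes <= window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(can_send(last_byte_sent=5000, last_byte_acked=4000, nbytes=1000, window=window))
print(can_send(last_byte_sent=5000, last_byte_acked=4000, nbytes=1200, window=window))
```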

63 TCP Segment Structure
Header fields, in rows of 32 bits:
source port #, dest port #
sequence number (counting by bytes of data, not segments!)
acknowledgement number
head len, not used, flags U A P R S F, rcvr window size (# bytes rcvr willing to accept)
checksum (Internet checksum, as in UDP), ptr urgent data
Options (variable length)
application data (variable length)
Flags: URG: urgent data (generally not used); ACK: ACK # valid; PSH: push data now (generally not used); RST, SYN, FIN: connection estab (setup, teardown commands)
A TCP packet is referred to as a TCP segment. This shows the structure of a TCP segment. The port numbers identify the end points of the connection. The sequence number refers to the byte number starting from the initial sequence number of the connection. The acknowledgement number corresponds to the data in the other direction: it is the next byte this side is expecting from its peer. There are several flags. The ACK flag validates the acknowledgement number field. The SYN and FIN flags are used to establish and terminate a connection. The RST flag signifies that the receiver has become confused by some unexpected condition and so wants to abort the connection. The receiver window size is meant for data in the other direction: it specifies the amount of buffer available for reception on this side. The checksum covers both the header and the data.
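As an illustration of the layout, the fixed 20-byte part of the header can be unpacked with Python's `struct` module. This sketch ignores options and does not verify the checksum; the function name `parse_tcp_header`, the dictionary keys and the sample values are assumptions.

```python
import struct

# Flag bits in the low 6 bits of the 16-bit "head len / flags" word.
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08, "RST": 0x04, "SYN": 0x02, "FIN": 0x01}


def parse_tcp_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte TCP header (network byte order)."""
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", raw[:20]
    )
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (off_flags >> 12) * 4,  # head len is counted in 32-bit words
        "flags": sorted(name for name, bit in FLAG_BITS.items() if off_flags & bit),
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }


# Hypothetical segment: port 1234 -> 23, Seq=42, ACK=79, flags ACK+PSH, window 4096.
sample = struct.pack("!HHIIHHHH", 1234, 23, 42, 79, (5 << 12) | 0x18, 4096, 0, 0)
header = parse_tcp_header(sample)
print(header)
```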

64 Triggering Transmission
How does TCP decide to transmit a segment?
MSS (Maximum Segment Size) bytes are available: MSS is set to the size of the largest segment TCP can send without local IP fragmentation (based on the MTU of the directly connected network)
Sending process explicitly asks it to (push to flush)
Timer fires
Silly Window Syndrome: flow control still needs to be maintained. Sender can transmit a full segment (MSS) when enough data has been ACKed by the receiver.

65 Silly Window Syndrome (cont’d)
Window currently closed by receiver; an ACK opens MSS/2 bytes. Should sender transmit MSS/2?
Original TCP specification was silent on this; early implementations of TCP decided to go ahead, since the sender cannot know when the window will open for a full MSS.
If the sender is aggressive and sends whatever window is available, the result is Silly Window Syndrome: the small segment size remains indefinitely.
Hence a problem arises when either the sender transmits a small segment or the receiver opens the window a small amount.

66 Triggering Transmission (cont’d)
Receiver may delay ACKs, but how long? Ultimate solution lies with sender: when does the TCP sender decide to transmit a segment?
Waiting too long hurts interactive applications (Telnet). Without waiting, risk of sending a bunch of tiny packets (silly window syndrome).
Nagle’s Algorithm: instead of waiting till a timer expires, use self-clocking. As long as TCP has any data in flight, the sender will receive an ACK, which can be used to trigger transmission. If no data is in flight, immediately send the segment.
Applications can turn Nagle’s algorithm off by setting the TCP_NODELAY option, so that data is sent as soon as possible.
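The sender's decision under Nagle's algorithm can be written as a single predicate. This is a sketch under assumptions: the function name and parameters are hypothetical, and flow-control limits on small sends are ignored to keep the rule visible.

```python
def nagle_should_send(pending_bytes, window, mss, bytes_in_flight, nodelay=False):
    """Decide whether the sender transmits now or keeps buffering.

    pending_bytes   -- application data waiting to be sent
    window          -- window currently advertised by the receiver
    bytes_in_flight -- sent but not yet ACKed data
    nodelay         -- True models an application that turned Nagle's algorithm off
    """
    if pending_bytes == 0:
        return False
    if pending_bytes >= mss and window >= mss:
        return True                  # a full segment can go out
    if nodelay:
        return True                  # Nagle disabled: send small data at once
    return bytes_in_flight == 0      # otherwise wait for an ACK to clock the send


print(nagle_should_send(10, window=4096, mss=1460, bytes_in_flight=1000))
print(nagle_should_send(10, window=4096, mss=1460, bytes_in_flight=0))
```

On a real socket in Python, the option itself is set with `sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)`.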

67 TCP Round Trip Time and Timeout
Q: how to set TCP timeout value? longer than RTT but RTT varies too short: premature timeout unnecessary retransmissions too long: slow reaction to segment loss Q: how to estimate RTT? SampleRTT: measured time from segment transmission until ACK receipt ignore retransmissions, why? SampleRTT will vary, want estimated RTT “smoother” average several recent measurements, not just current SampleRTT Fall 2007 CSci232: Transport Layer & TCP

68 Round-trip Time Estimation
Wait at least one RTT before retransmitting
Importance of accurate RTT estimators: low RTT estimate -> unneeded retransmissions; high RTT estimate -> poor throughput
RTT estimator must adapt to change in RTT, but not too fast, or too slow!
Spurious timeouts: “conservation of packets” principle, never more than a window’s worth of packets in flight

69 Adaptive Retransmission (Original Algorithm)
Measure SampleRTT for each segment/ACK pair
Compute weighted running average of RTT:
EstRTT = α x EstRTT + (1 - α) x SampleRTT
α between 0.8 and 0.9 (to smooth the estimated RTT). A small α tracks changes quickly but is heavily influenced by temporary fluctuations; a large α is more stable but may not be quick to adapt to real changes.
Set timeout based on EstRTT:
TimeOut = 2 x EstRTT
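A short worked example of the original algorithm. The smoothing factor 0.875 (called `alpha` in the code) is one value inside the 0.8 to 0.9 range the slide gives; the function name and the sample RTTs in milliseconds are assumptions.

```python
def update_est_rtt(est_rtt, sample_rtt, alpha=0.875):
    """Weighted running average: the old estimate gets weight alpha."""
    return alpha * est_rtt + (1 - alpha) * sample_rtt


est = 100.0                      # hypothetical starting estimate, in ms
for sample in (100.0, 180.0):    # hypothetical SampleRTT measurements
    est = update_est_rtt(est, sample)

timeout = 2 * est                # TimeOut = 2 x EstRTT
print(est, timeout)
```

One 180 ms sample moves the estimate only from 100 ms to 110 ms, which is the smoothing effect a large weight on the old estimate is meant to give.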

70 Retransmission Ambiguity
[Figure: two sender/receiver timelines, each showing an original transmission, a retransmission, an ACK and the resulting SampleRTT]
Case 1: sender assumes the ACK is for the original transmission, but it was for the retransmission => SampleRTT is too large
Case 2: sender assumes the ACK is for the retransmission, but it was for the original transmission => SampleRTT is too small

71 Karn/Partridge Algorithm
Solution: do not sample RTT when retransmitting; only measure SampleRTT for segments sent once
Double the timeout for each retransmission: the next timeout is twice the last timeout, rather than being based on the last Estimated RTT
The Karn and Partridge proposal is exponential backoff
Congestion is the most likely cause of lost segments, so TCP sources should not react too aggressively to a timeout
The more timeouts occur, the more cautious the source should become (congestion problem)
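The two Karn/Partridge rules can be sketched together. This is an illustration only: the class name, the `was_retransmitted` flag and the reuse of the original averaging rule with a weight of 0.875 are assumptions made for the example.

```python
class KarnPartridgeTimer:
    """Skip RTT samples for retransmitted segments; double the timeout on each timeout."""

    def __init__(self, est_rtt, alpha=0.875):
        self.est_rtt = est_rtt
        self.alpha = alpha
        self.timeout = 2 * est_rtt

    def on_timeout(self):
        self.timeout *= 2            # exponential backoff

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return                   # ambiguous sample: do not use it
        self.est_rtt = self.alpha * self.est_rtt + (1 - self.alpha) * sample_rtt
        self.timeout = 2 * self.est_rtt


timer = KarnPartridgeTimer(est_rtt=100.0)
timer.on_timeout()
timer.on_timeout()
backed_off = timer.timeout                          # doubled twice from 200.0
timer.on_ack(500.0, was_retransmitted=True)         # ignored
timer.on_ack(180.0, was_retransmitted=False)        # clean sample resets the timeout
print(backed_off, timer.est_rtt, timer.timeout)
```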

72 Jacobson/ Karels Algorithm
Original computation for RTT did not take the variance of sample RTTs into account
If variation among samples is small, the Estimated RTT can be trusted more, without doubling the estimate
A large variance in the samples means timeout values should not be too tightly coupled to the Estimated RTT
New calculations for average RTT:
Diff = SampleRTT - EstRTT
EstRTT = EstRTT + (δ x Diff)
Dev = Dev + δ (|Diff| - Dev)
where δ is a fraction between 0 and 1
Consider variance when setting timeout value:
TimeOut = μ x EstRTT + φ x Dev, where μ = 1 and φ = 4

73 TCP Round Trip Time Estimation
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
Exponential weighted moving average: influence of a past sample decreases exponentially fast. Typical value: α = 0.125
Setting the timeout interval: EstimatedRTT plus “safety margin”; large variation in EstimatedRTT -> larger safety margin
“Safety margin”: accommodate variations in EstimatedRTT
DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT| (typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4 * DevRTT
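One update step of this estimator, as a sketch. The weights 0.125 and 0.25 are the typical values from the slide (named `alpha` and `beta` in the code); the function name and the millisecond inputs are assumptions, and the deviation is computed against the estimate from before the update, which is one of two common orderings.

```python
def update_rtt(est_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """Return the new (EstimatedRTT, DevRTT, TimeoutInterval) after one sample."""
    diff = sample_rtt - est_rtt
    est_rtt = est_rtt + alpha * diff                  # same as (1-alpha)*est + alpha*sample
    dev_rtt = dev_rtt + beta * (abs(diff) - dev_rtt)  # same as (1-beta)*dev + beta*|diff|
    timeout = est_rtt + 4 * dev_rtt
    return est_rtt, dev_rtt, timeout


est, dev, timeout = update_rtt(est_rtt=100.0, dev_rtt=10.0, sample_rtt=140.0)
print(est, dev, timeout)
```

A single sample 40 ms above the estimate raises the estimate by 5 ms but raises the timeout by 35 ms, because the deviation term carries most of the safety margin.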

74 Example RTT Estimation:

75 Timestamp Extension
Used to improve timeout mechanism by more accurate measurement of RTT
When sending a packet, insert current time into option: 4 bytes for time, 4 bytes for echoing a received timestamp
Receiver echoes timestamp in ACK (actually will echo whatever is in timestamp)
Removes retransmission ambiguity: can get RTT sample on any packet
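With an echoed timestamp, the RTT sample is just the difference between the sender's clock and the echoed value. The sketch below assumes a millisecond clock and a hypothetical function name; the modulus reflects the 4-byte timestamp field, which wraps around.

```python
TS_MODULUS = 2 ** 32  # the timestamp field is 4 bytes, so values wrap around


def rtt_from_echo(now, echoed_timestamp):
    """RTT sample from an ACK that echoes the timestamp of the segment it acknowledges."""
    return (now - echoed_timestamp) % TS_MODULUS


print(rtt_from_echo(now=1250, echoed_timestamp=1000))
print(rtt_from_echo(now=50, echoed_timestamp=2 ** 32 - 100))  # clock wrapped in between
```

Because the echoed value identifies which transmission the ACK answers, the sample stays valid even when the segment was retransmitted.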

76 Timer Granularity
Many TCP implementations set RTO (Retransmission Timeout) in multiples of 200, 500 or 1000 ms. Why?
Avoid spurious timeouts: RTTs can vary quickly due to cross traffic
Make timer interrupts efficient
What happens for the first couple of packets? Pick a very conservative value (seconds)

77 Important Lessons
TCP state diagram -> setup/teardown
TCP timeout calculation -> how is RTT estimated
Modern TCP loss recovery: why are timeouts bad? How to avoid them? -> e.g. fast retransmit

78 Fast Retransmit
What are duplicate acks (dupacks)? Repeated acks for the same sequence
When can duplicate acks occur? Loss; packet re-ordering; window update (advertisement of new flow control window)
Assume re-ordering is infrequent and not of large magnitude
Use receipt of 3 or more duplicate acks as indication of loss
Don’t wait for timeout to retransmit packet
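The duplicate-ack rule can be sketched as a counter. The class name and the return convention are assumptions for illustration; the threshold of 3 is the one on the slide.

```python
class DupAckDetector:
    """Signal a fast retransmit on the third duplicate ack for the same sequence."""

    THRESHOLD = 3

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack):
        """Return the seq # to retransmit, or None if no loss is inferred yet."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.THRESHOLD:
                return ack           # retransmit without waiting for the timeout
        else:
            self.last_ack = ack      # new data acknowledged: reset the counter
            self.dup_count = 0
        return None


detector = DupAckDetector()
results = [detector.on_ack(a) for a in (100, 200, 200, 200, 200)]
print(results)
```

The first ack for 200 is new; only the third repeat after it triggers the retransmission of the segment starting at 200.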

79 Fast Retransmit
[Figure: sequence no. vs. time plot of packets and acks; a lost packet (X) produces duplicate acks, which trigger a retransmission]

80 TCP (Reno variant)
[Figure: sequence no. vs. time plot of packets and acks with several lost packets (X)]
Now what? Timeout

81 SACK
Basic problem is that cumulative acks provide little information
Selective acknowledgement (SACK) essentially adds a bitmask of packets received
Implemented as a TCP option
Encoded as a set of received byte ranges (max of 4 ranges, often max of 3)
When to retransmit? Still need to deal with reordering -> wait for out of order by 3 pkts
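A sketch of how a receiver could turn its buffered data into SACK byte ranges. The function name, the half-open `(start, end)` range convention and the simple lowest-first ordering are assumptions made for illustration; real implementations order the blocks differently.

```python
def sack_blocks(cumulative_ack, received, max_blocks=3):
    """Merge received byte ranges above the cumulative ack into SACK blocks.

    received -- list of (start, end) byte ranges, end exclusive, in any order
    """
    blocks = []
    for start, end in sorted(received):
        if end <= cumulative_ack:
            continue                                   # already covered by the cumulative ack
        if blocks and start <= blocks[-1][1]:
            last_start, last_end = blocks[-1]
            blocks[-1] = (last_start, max(last_end, end))  # extend the adjacent block
        else:
            blocks.append((start, end))
    return blocks[:max_blocks]


# Hypothetical receiver state: bytes below 1000 are ACKed, 1000..1999 and 4000..4999 are missing.
blocks = sack_blocks(1000, [(2000, 3000), (3000, 4000), (5000, 6000), (0, 1000)])
print(blocks)
```

With these ranges the sender can tell exactly which two holes to fill, which a cumulative ack of 1000 alone would not reveal.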

82 SACK
[Figure: sequence no. vs. time plot of packets and acks with several lost packets (X)]
Now what? Send retransmissions as soon as detected

