2 Content
- Transport layer services provided to upper layers
  - Transport service protocols
  - Berkeley sockets
  - An example of socket programming: Internet file server
- Elements of transport protocols
  - Addressing, establishing and releasing a connection
- Transport layer protocols
  - UDP
    - UDP segment header
    - Remote Procedure Call
  - TCP
    - Service model
    - TCP protocol
    - The TCP segment header
    - TCP connection establishment
    - TCP connection release
    - TCP connection management modeling
    - TCP transmission policy
    - TCP congestion control
    - TCP timer management
- Wireless TCP and UDP
- Additional slides
3 Services provided to the upper layer
- Provides a seamless interface between the application layer and the network layer
- The heart of the communications system
- Two types:
  - Connection-oriented transport service
  - Connectionless transport service
- Key function: isolating the upper layers from the technology, design and imperfections of the subnet (network layer)
- Allows applications to talk to each other without knowing about:
  - The underlying network
  - Physical links between nodes
4 Network, transport and application layers
- TPDU: Transport Protocol Data Unit
- The transport layer makes use of services provided by the network layer.
- The transport layer is usually implemented in software; the implementation is called the transport entity.
- The transport entity can be located in the operating system kernel, in a separate user process, in a library package linked into network applications, or sometimes in hardware on the network interface card.
5 TPDU
- TPDUs (Transport Protocol Data Units) are sent from transport entity to transport entity
- TPDUs are contained in packets (exchanged by the network layer)
- Packets are contained in frames (exchanged by the data link layer)
6 Transport service primitives
- The transport layer can provide:
  - Connection-oriented transport service: an error-free bit/byte stream, that is, a reliable service on top of an unreliable network
  - Connectionless, unreliable transport (datagram) service
- Example of basic transport service primitives:
  - LISTEN: in a client-server application, the server executes this primitive, which blocks the server until a client turns up
  - CONNECT: when a client wants to talk to the server, it executes this primitive; the transport entity carries it out by sending a connection request packet to the server and waiting for a connection accepted response, blocking the caller until the connection is established
  - SEND/RECEIVE: used to exchange data after the connection has been established; either party can do a (blocking) RECEIVE to wait for the other party to do a SEND; when the TPDU arrives, the receiver is unblocked, does the required processing and sends back a reply
  - DISCONNECT: when a connection is no longer needed, it must be released in order to free up table space in the transport entities; release can be asymmetric (either end sends a disconnection TPDU to the remote transport entity; upon arrival, the connection is released) or symmetric (each direction is closed separately)
- This is what happens from the user’s point of view; in reality, the transport entity has to acknowledge every data and control TPDU and implement time-outs and retransmission policies
- None of the implementation details of the transport entity are visible to the transport users; for them, the connection is just a reliable bit pipe
7 FSM modeling
- In a basic system, you can be in one of a finite number of conditions (states):
  - listening (waiting for something to happen)
  - connecting
  - connected
  - disconnected
- Whilst connected you can be either:
  - sending
  - receiving
- In this way the problem is defined as a limited number of states with a finite number of transitions between them
8 State diagram for a simple transport manager
This slide presents a simple state diagram for connection establishment and release using the simple transport primitives presented so far. Each transition is triggered by some event, either a primitive executed by the local transport user or an incoming packet. For simplicity, we assume that each TPDU is separately acknowledged.
- Transitions in italics are caused by packet arrivals
- Solid lines show the client’s state sequence
- Dashed lines show the server’s state sequence
9 Berkeley sockets
- The first four primitives (SOCKET, BIND, LISTEN, ACCEPT) are executed in that order by servers
- The client side:
  - SOCKET must be called first to create a socket (first primitive)
  - The socket doesn’t have to be bound, since the address of the client doesn’t matter to the server
  - CONNECT blocks the caller and actively establishes the connection with the server
  - SEND/RECEIVE exchange data over a full-duplex connection
  - CLOSE has to be called to close the connection; connection release with sockets is symmetric, so both sides have to call the CLOSE primitive
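The order of the socket primitives can be illustrated with a minimal sketch using Python's standard `socket` module. The function names `run_echo_server` and `echo_once`, the loopback address and the upper-casing "service" are assumptions made for illustration only; they are not part of the slides.

```python
import socket
import threading


def run_echo_server(ready: threading.Event, port_holder: list) -> None:
    # SOCKET: create a TCP endpoint.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        # BIND: attach a local address; port 0 lets the OS pick a free port.
        srv.bind(("127.0.0.1", 0))
        # LISTEN: announce willingness to accept connections.
        srv.listen(1)
        port_holder.append(srv.getsockname()[1])
        ready.set()
        # ACCEPT: block until a client turns up.
        conn, _addr = srv.accept()
        with conn:
            data = conn.recv(1024)        # RECEIVE
            conn.sendall(data.upper())    # SEND
        # CLOSE happens when the with-blocks exit.


def echo_once(message: bytes) -> bytes:
    ready, port_holder = threading.Event(), []
    server = threading.Thread(target=run_echo_server, args=(ready, port_holder))
    server.start()
    ready.wait()
    # Client side: SOCKET, then CONNECT; no BIND is needed.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect(("127.0.0.1", port_holder[0]))
        cli.sendall(message)              # SEND
        reply = cli.recv(1024)            # RECEIVE
    server.join()
    return reply
```

Calling `echo_once(b"hello")` runs one complete exchange: the server thread walks through SOCKET, BIND, LISTEN and ACCEPT, while the client only needs SOCKET and CONNECT.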
12 Transport protocol
- Transport protocols resemble data link protocols
  - Both have to deal with error control, sequencing and flow control
- Significant differences between the two:
  - At the data link layer, two routers communicate directly over a physical wire
  - At the transport layer, the physical channel is replaced by the subnet
- Explicit addressing of the destination is required at the transport layer
- Connection establishment and release are needed (in the data link case the connection is always there)
- Packets get lost, duplicated or delayed in the subnet; the transport layer has to deal with this
13 Addressing – fixed TSAP
- TSAP: Transport Service Access Point
- To whom should each message be sent?
  - Processes listen on a transport address for connection requests (or datagrams in connectionless transports)
  - Hosts and processes are identified by service access points (TSAP, NSAP)
  - In the Internet, the TSAP is the port and the NSAP is the IP address
14 Addressing – dynamic TSAP
How a user process in host 1 establishes a connection with a time-of-day server in host 2.
- Initial connection protocol
  - A process server acts as a proxy for less heavily used servers
  - It listens on a set of ports at the same time for incoming TCP connection requests
  - It spawns the requested server, allowing it to inherit the existing connection with the user
- Name server (directory server)
  - Handles situations where the service exists independently of the process server (e.g. a file server that needs to run on special hardware and cannot be created on the fly when someone wants to talk to it)
  - Listens on a well-known TSAP
  - Receives a message specifying the service name and sends back the TSAP address of that service
  - Services must register themselves with the name server, giving both their service name and their TSAP address
15 Establishing a connection
- When a communication link is made over a network (internet), problems can arise:
  - The network can lose, store and duplicate packets
- Use case scenario:
  - A user establishes a connection with a bank, instructs the bank to transfer a large amount of money, and closes the connection
  - The network then delivers a delayed duplicate set of the packets, in the same sequence, causing the bank to perform another transfer
- Deal with duplicated packets
- Three-way handshake to establish a connection
16 Deal with duplicate packets (1)
- Use throwaway TSAP addresses
  - Not good, because the client-server paradigm will no longer work
- Give each connection a unique identifier (chosen by the initiating party and attached to each TPDU)
  - After disconnection, each transport entity updates a table of used connections, stored as (source transport entity, connection identifier) pairs
  - Requires each transport entity to maintain history information
- Restrict packet lifetime to a known maximum:
  - Restricted subnet design
    - Any method that prevents packets from looping
  - Hop counter in each packet
    - The hop counter is incremented every time the packet is forwarded, and a packet whose count exceeds the limit is discarded
  - Time-stamping each packet
    - Each packet carries the time it was created, and routers agree to discard any packet older than a given time
    - Routers need synchronized clocks (not an easy task)
- In practice, not only the packet has to be dead, but all subsequent acknowledgements of it must be dead as well
17 Deal with duplicate packets (2)
- Tomlinson (1975) proposed that each host have a binary counter that increments itself at uniform intervals
  - The number of bits in the counter has to exceed the number of bits used in the sequence number
  - The counter keeps running even if the host goes down
- The basic idea is to ensure that two identically numbered TPDUs are never outstanding at the same time
- Each connection starts numbering its TPDUs with a different sequence number; the sequence space should be so large that by the time sequence numbers wrap around, old TPDUs with the same sequence numbers are long gone
18 Connection establishment
- Control TPDUs may also be delayed. Consider the following situation:
  - Host 1 sends a CONNECTION REQUEST (initial sequence number, destination port number) to a remote peer, host 2
  - Host 2 acknowledges the request by sending a CONNECTION ACCEPTED TPDU back
  - If the first request is lost but a delayed duplicate CONNECTION REQUEST suddenly shows up at host 2, the connection will be established incorrectly; the three-way handshake solves this
- Three-way handshake
  - Each packet is responded to in sequence
  - Duplicates must be rejected
19 Three-way handshake
Three protocol scenarios for establishing a connection using a three-way handshake. CR and ACK denote CONNECTION REQUEST and CONNECTION ACCEPTED, respectively.
(a) Normal operation.
(b) Old duplicate CONNECTION REQUEST appearing out of nowhere.
(c) Duplicate CONNECTION REQUEST and duplicate ACK.
- Host 1 initiates the connection
  - It chooses the initial sequence number x and sends a CONNECTION REQUEST TPDU
- Host 2 acknowledges host 1’s request with a CONNECTION ACCEPTED TPDU, acknowledging x and announcing its own initial sequence number y
- Host 1 acknowledges host 2’s choice of initial sequence number in the first data TPDU that it sends
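The exchange in scenarios (a) and (b) can be modeled with a small, purely in-memory sketch. The class names `TPDU`, `Host1`, `Host2`, the helper `run_handshake` and the REJECT message kind are illustrative assumptions; no real network traffic is involved, and scenario (c) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TPDU:
    kind: str                      # "CR", "ACK", "DATA" or "REJECT"
    seq: Optional[int] = None
    ack: Optional[int] = None


class Host1:
    def __init__(self) -> None:
        self.pending_seq: Optional[int] = None   # x of the request really sent

    def connect(self, x: int) -> TPDU:
        self.pending_seq = x
        return TPDU("CR", seq=x)

    def on_accept(self, tpdu: TPDU) -> TPDU:
        # Only an ACK that acknowledges our own outstanding x is valid.
        if self.pending_seq is not None and tpdu.ack == self.pending_seq:
            return TPDU("DATA", seq=self.pending_seq, ack=tpdu.seq)
        # Otherwise host 2 was fooled by an old duplicate: reject it.
        return TPDU("REJECT", ack=tpdu.seq)


class Host2:
    def __init__(self, y: int) -> None:
        self.y = y                 # host 2's own initial sequence number
        self.established = False

    def on_request(self, tpdu: TPDU) -> TPDU:
        return TPDU("ACK", seq=self.y, ack=tpdu.seq)

    def on_reply(self, tpdu: TPDU) -> bool:
        # The connection exists only once host 1 has acknowledged y.
        self.established = tpdu.kind == "DATA" and tpdu.ack == self.y
        return self.established


def run_handshake(host1: Host1, host2: Host2, request: TPDU) -> bool:
    accept = host2.on_request(request)
    reply = host1.on_accept(accept)
    return host2.on_reply(reply)
```

In the normal case the request comes from `host1.connect(x)` and the handshake succeeds. Feeding host 2 a stale `TPDU("CR", seq=55)` that host 1 never sent ends with a REJECT, so no connection is established by accident.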
20 Asymmetric connection release
After the connection is established, host 1 sends a data TPDU that arrives properly at host 2. Host 1 then sends another data TPDU, but in the meantime host 2 has decided to close the connection and issues a DISCONNECTION TPDU. The result is that the connection is released and data is lost.
- Asymmetric release is abrupt and may result in loss of data
21 Symmetric connection release (1)
Four protocol scenarios for releasing a connection.
(a) Normal case of a three-way handshake: host 1 sends a DR TPDU and starts a timer. When this DR reaches host 2, host 2 sends a DR back to host 1 and also starts a timer, in case its own DR is lost. Host 1 sends back an ACK TPDU and releases its connection. Finally the ACK TPDU arrives at host 2, which releases its connection too.
(b) Final ACK lost: the situation is saved by the timer started on host 2.
22 Symmetric connection release (2)
(c) Response lost: the second DR is lost; the user initiating the disconnection times out and sends the DR again.
(d) Response lost and subsequent DRs lost: same as (c), but now all attempts to resend the DR fail; after N retries the sender gives up and releases the connection. Meanwhile, the receiver times out and exits as well.
23 TCP/IP transport layer
- Two end-to-end protocols
- TCP (Transmission Control Protocol)
  - connection oriented (either real or virtual)
  - fragments a message for sending
  - combines the message on receipt
- UDP (User Datagram Protocol)
  - connectionless, unreliable
  - flow control etc. provided by the application
  - client-server applications
24 The big picture
- ICMP: Internet Control Message Protocol
- IGMP: Internet Group Management Protocol, used for multicast group management
- The Internet has two main protocols at the transport layer:
  - Connectionless protocol: UDP (User Datagram Protocol)
  - Connection-oriented protocol: TCP (Transmission Control Protocol)
25 User Datagram Protocol
- Simple transport layer protocol described in RFC 768; in essence just an IP datagram with a short header
- Provides a way to send encapsulated raw IP datagrams without having to establish a connection
- The application writes a datagram to a UDP socket; it is encapsulated in an IPv4 or IPv6 datagram and sent to the destination. There is no guarantee that a UDP datagram ever reaches its final destination
- Many client-server applications that have one request and one response use UDP
- Each UDP datagram has a length; if a UDP datagram reaches its final destination correctly, the length of the datagram is passed on to the receiving application
- UDP provides a connectionless service, as there is no need for a long-term relationship between a client and the server
  - For example, a UDP client can create a socket, send a datagram to a given server and then immediately send another datagram on the same socket to a different server; similarly, a UDP server can receive multiple datagrams from different sources on the same single UDP socket
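The connectionless behaviour described above can be sketched with Python's standard `socket` module: one client socket sends datagrams to two different "servers" without any connection setup. The function name `udp_demo`, the loopback addresses and the two-second timeouts are assumptions for illustration.

```python
import socket


def udp_demo():
    # Two "servers": UDP sockets bound to OS-chosen loopback ports.
    server_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server_a.bind(("127.0.0.1", 0))
        server_b.bind(("127.0.0.1", 0))
        # UDP gives no delivery guarantee, so never wait forever.
        server_a.settimeout(2)
        server_b.settimeout(2)
        # The same client socket talks to two different destinations.
        client.sendto(b"to A", server_a.getsockname())
        client.sendto(b"to B", server_b.getsockname())
        msg_a, addr_a = server_a.recvfrom(2048)
        msg_b, addr_b = server_b.recvfrom(2048)
        # Both datagrams carry the same source address: one client socket.
        return msg_a, msg_b, addr_a == addr_b
    finally:
        for s in (server_a, server_b, client):
            s.close()
```

Each `recvfrom` returns the whole datagram together with the sender's address, which is how a single UDP server socket can tell its many clients apart.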
26 UDP header
- A UDP segment consists of an 8-byte header followed by the data
- The source port and destination port identify the end points within the source and destination machines; without the port fields, the transport layer would not know what to do with the packet; with them, it delivers the packet to the application attached to the destination port
- The UDP length field includes the 8-byte header and the data
- The UDP checksum is the one’s complement sum over a pseudo-header, the UDP header and the UDP data. It is optional; if it is not computed, it should be stored as all “0” bits
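As a rough illustration of the 8-byte header layout, the sketch below packs and unpacks the four 16-bit fields with Python's `struct` module. The helper names `build_udp_segment` and `parse_udp_segment` are hypothetical, and the checksum is simply left at 0 ("not computed") rather than calculated.

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_segment(src_port: int, dst_port: int, payload: bytes,
                      checksum: int = 0) -> bytes:
    # The length field counts the 8-byte header plus the data.
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_segment(segment: bytes) -> dict:
    src, dst, length, checksum = UDP_HEADER.unpack_from(segment)
    return {
        "src_port": src,
        "dst_port": dst,
        "length": length,
        "checksum": checksum,
        "payload": segment[UDP_HEADER.size:length],
    }
```

For example, `parse_udp_segment(build_udp_segment(5353, 53, b"query"))` recovers the ports, a length of 13 (8 header bytes plus 5 data bytes) and the original payload.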
27 UDP
- Does not provide:
  - Flow control
  - Error control
  - Retransmission upon receipt of a bad segment
- Provides an interface to the IP protocol, with the added feature of demultiplexing multiple processes using the ports
- One area where UDP is useful is client-server situations where the client sends a short request to the server and expects a short reply
  - If the request or reply is lost, the client can time out and try again
- DNS (Domain Name System) is an application that uses UDP
  - In brief, DNS is used to look up the IP address of a host name: the client sends a UDP packet containing the host name to a DNS server, and the server replies with a UDP packet containing the host’s IP address. No setup is needed in advance and no connection release is required; just two messages go over the network
- Widely used in client-server RPC
- Widely used in real-time multimedia applications
28 Remote Procedure Call (1)
- Sending a message to a server and getting a reply back is like making a function call in a programming language
  - e.g. gethostbyname(char *host_name)
  - Works by sending a UDP packet to a DNS server and waiting for the reply, timing out and trying again if an answer does not come quickly enough
  - All the networking details are hidden from the programmer
- Birrell and Nelson (1984) proposed the RPC mechanism
  - Allows programs to call procedures located on remote hosts, and makes the remote procedure call look as much as possible like a local procedure call
  - When a process on machine 1 calls a procedure on machine 2, the calling process on machine 1 is suspended and execution of the called procedure takes place on machine 2
  - Information can be transported from the caller to the callee in the parameters and can come back in the procedure result
  - No message passing is visible to the programmer
- Stubs
  - The client program must be bound with a small library procedure, called the client stub, that represents the server procedure in the client’s address space
  - Similarly, the server is bound with a procedure called the server stub
  - These procedures hide the fact that the procedure call from the client to the server is not local
29 Remote Procedure Call (2)
- Step 1: the client calls the client stub. This is a local procedure call, with the parameters pushed onto the stack in the normal way.
- Step 2: the client stub packs the parameters into a message and makes a system call to send the message; packing the parameters is called marshaling, and it is a very widely used technique.
- Step 3: the kernel sends the message from the client machine to the server machine, over a transport and network protocol.
- Step 4: the kernel on the receiving machine passes the message from the transport stack to the server stub.
- Step 5: the server stub calls the server procedure with the unmarshaled parameters.
- The reply traces the same path in the other direction.
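The five steps can be mimicked in a single process with a toy sketch. JSON is used here as the marshaling format purely as an assumption for illustration, and the names `add`, `PROCEDURES`, `server_stub`, `transport` and `client_stub_add` are hypothetical; a real RPC system would send the message over the network in steps 3 and 4.

```python
import json


def add(a, b):
    """The server procedure (would normally live on the remote machine)."""
    return a + b


PROCEDURES = {"add": add}


def server_stub(message: bytes) -> bytes:
    # Step 5: unmarshal the parameters and call the server procedure.
    request = json.loads(message.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    # The reply is marshaled and travels back the same way.
    return json.dumps({"result": result}).encode("utf-8")


def transport(message: bytes) -> bytes:
    # Steps 3 and 4: stand-in for the kernels and the network.
    return server_stub(message)


def client_stub_add(a, b):
    # Step 2: marshal the parameters into a message.
    message = json.dumps({"proc": "add", "args": [a, b]}).encode("utf-8")
    reply = transport(message)
    return json.loads(reply.decode("utf-8"))["result"]
```

Step 1 is then just the ordinary local call `client_stub_add(2, 3)`; the caller sees a normal return value and never handles a message.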
30 Problems with RPC
- Use of pointer parameters
  - Passing pointers directly is impossible, because the client and the server are in different address spaces
  - Can be worked around by replacing the call-by-reference mechanism with a copy-restore one
- Problems with user-defined data types
- Problems with using global variables as a means of communication between the calling and called procedures
  - If the called procedure is moved to a remote machine, the variables are no longer shared
31 Transmission Control Protocol
- Provides a reliable end-to-end byte stream over an unreliable internetwork (different parts may have different topologies, bandwidths, delays, packet sizes, etc.)
- Defined in RFC 793, with some clarifications and bug fixes in RFC 1122 and extensions in RFC 1323
- Each machine supporting TCP has a TCP transport entity (a user process or part of the kernel) that manages TCP streams and interfaces to the IP layer
  - It accepts user data streams from local processes, breaks them into pieces (not larger than 64 KB) and sends each piece as a separate IP datagram
  - At the receiving end, the IP datagrams that contain TCP data are delivered to the TCP transport entity, which reconstructs the original byte stream
  - The IP layer gives no guarantee that datagrams will be delivered properly, so it is up to TCP to time out and retransmit them as the need arises
  - Datagrams that do arrive may be in the wrong order; it is up to TCP to reassemble them into messages in the proper sequence
32 TCP Service Model (1)
- Provides connections between clients and servers; both the client and the server create end points, called sockets
  - Each socket has a number (address) consisting of the IP address of the host and a 16-bit number, local to that host, called a port (the TCP name for a TSAP)
  - To obtain TCP service, a connection has to be explicitly established between the two end points
  - A socket may be used for multiple connections at the same time
  - Two or more connections can terminate in the same socket; connections are identified by the socket identifiers at both ends (socket1, socket2)
- Provides reliability
  - When TCP sends data to the other end it requires an acknowledgement in return; if the ACK is not received, the TCP entity automatically retransmits the data and waits a longer amount of time
33 TCP Service Model (2)
- TCP contains algorithms to estimate the round-trip time between a client and a server, so that it knows dynamically how long to wait for an acknowledgement
- TCP provides flow control: it tells its peer exactly how many bytes of data it is willing to accept
- Port numbers below 1024 are called well-known ports and are reserved for standard services
  - Port 21 used by FTP (File Transfer Protocol)
  - Port 23 used by Telnet (remote login)
  - Port 25 used by SMTP (e-mail)
  - Port 69 used by TFTP (Trivial File Transfer Protocol)
  - Port 79 used by Finger (lookup of information about a user)
  - Port 80 used by HTTP (World Wide Web)
  - Port 110 used by POP3 (remote e-mail access)
  - Port 119 used by NNTP (Usenet news)
34 TCP Service Model (3)
- All TCP connections are full duplex and point to point (TCP doesn’t support multicasting or broadcasting)
- A TCP connection is a byte stream, not a message stream (message boundaries are not preserved)
  - e.g. if a process does four writes of 512 bytes to a TCP stream, the data may be delivered at the other end as four 512-byte reads, two 1024-byte reads or one 2048-byte read
(a) Four 512-byte segments sent as separate IP datagrams.
(b) The 2048 bytes of data delivered to the application in a single READ call.
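The loss of message boundaries can be observed on a loopback connection with the sketch below. The function name `stream_demo` is hypothetical, and the sizes of the individual reads are deliberately not predicted, since they depend on timing and on the operating system; only the total byte count and the byte order are guaranteed.

```python
import socket


def stream_demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as sender:
            receiver, _addr = listener.accept()
            with receiver:
                # Four separate 512-byte writes.
                for i in range(4):
                    sender.sendall(bytes([i]) * 512)
                # No more data: lets the reader see end-of-file.
                sender.shutdown(socket.SHUT_WR)
                read_sizes, data = [], b""
                while True:
                    chunk = receiver.recv(2048)
                    if not chunk:
                        break
                    read_sizes.append(len(chunk))
                    data += chunk
    return read_sizes, data
```

Whatever the list of read sizes turns out to be, it sums to 2048 and the bytes arrive in the order written; an application that needs message boundaries has to add its own framing.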
35 TCP Service Model (4)
- When an application sends data to TCP, the TCP entity may send it immediately or buffer it (in order to collect a larger amount of data to send at once)
- Sometimes the application really wants the data to be sent immediately (e.g. after a command line has been finished); to force the data out, applications can use the PUSH flag, which tells TCP not to delay transmission
- Urgent data is a way of sending urgent data from one application to another over TCP
  - e.g. when an interactive user hits the DEL or CTRL-C key to break off a remote computation, the sending application puts some control information in the TCP stream along with the URGENT flag; this causes TCP to stop accumulating data and send everything it has for that connection immediately; when the urgent data is received at the destination, the receiving application is interrupted (by a break signal in Unix), so it can read the data stream to find the urgent data
36 TCP Protocol Overview (1)
- Each byte on a TCP connection has its own 32-bit sequence number, used for various purposes (rearranging out-of-sequence segments, identifying duplicate segments, etc.)
- Sending and receiving TCP entities exchange data in the form of segments. A segment consists of a fixed 20-byte header (plus an optional part) followed by zero or more data bytes
- The TCP software decides how big segments should be; it can accumulate data from several writes into one segment or split data from one write over multiple segments
- Two limits restrict the segment size:
  - Each segment, including the TCP header, must fit in the IP payload
  - Each network has a maximum transfer unit (MTU) and each segment must fit in the MTU (in practice the MTU is a few thousand bytes and therefore sets the upper bound on the segment size)
37 TCP Protocol Overview (2)
- The basic protocol is the sliding window protocol
  - When a sender transmits a segment, it also starts a timer
  - When the segment arrives at the destination, the receiving TCP sends back a segment (with data, if there is any to send) bearing an acknowledgement number equal to the next sequence number it expects to receive
  - If the sender’s timer goes off before an acknowledgement has been received, the segment is sent again
- Problems with the TCP protocol
  - Segments can be fragmented on the way; parts of a segment can arrive while others get lost
  - Segments can be delayed, and duplicates can arrive at the receiving end
  - Segments may hit a congested or broken network along their path
38 TCP segment header
- Source and destination ports identify the local end points of the connection; each host may decide for itself how to allocate its own ports starting at 1024; a port number plus its host’s IP address form a unique 48-bit TSAP
- A sequence number is associated with every byte that is sent. It is used for several purposes: to rearrange the data at the receiving end before passing it to the application, and to detect duplicate data. The Acknowledgement number field specifies the next byte expected
- TCP header length tells how many 32-bit words are contained in the TCP header; this is required because the Options field is of variable length; technically it indicates the start of the data within the segment, measured in 32-bit words
- URG flag is set to 1 if the urgent pointer is in use; the urgent pointer indicates a byte offset from the current sequence number at which urgent data is to be found; this facilitates interrupt messages without getting TCP itself involved in carrying such message types
- ACK flag is set to 1 to indicate that the acknowledgement number is valid; if it is set to 0, the segment doesn’t contain an acknowledgement, so the Acknowledgement number field is ignored
- PSH flag indicates pushed data: the receiver is requested to deliver the received data to the application upon arrival, without buffering it until a full buffer has been received
- RST flag is used to reset a connection that has become confused due to a host crash or any other reason; it is also used to reject an invalid segment or refuse an attempt to open a connection
- SYN flag is used to establish connections; a connection request has SYN = 1 and ACK = 0 to indicate that the piggyback acknowledgement field is not in use; the connection response does bear an acknowledgement, so it has SYN = 1 and ACK = 1. In essence the SYN bit denotes both CONNECTION REQUEST and CONNECTION ACCEPTED, with the ACK bit used to distinguish between the two
- FIN flag is used to release connections; it specifies that the sender has no more data to transmit. Both SYN and FIN segments have sequence numbers and are thus guaranteed to be processed in the correct order
- Flow control in TCP is done using a variable-size sliding window; the Window size field tells how many bytes may be sent starting at the byte acknowledged; a Window size of 0 is legal and means that bytes up to (Acknowledgement number - 1) have been received and no more will be accepted for now; to resume receiving data, the receiver sends another segment with a non-zero window size and the same acknowledgement number
- Checksum is provided for extra reliability. It checksums the header, the data and the conceptual pseudo-header shown on the next slide; when performing the computation, the data field is padded with an additional zero byte if its length is an odd number; the checksum is simply the sum in one’s complement; as a consequence, when the receiver performs the calculation on the entire segment, including the checksum field, the result should be 0
- Options field was designed to provide extra facilities not covered by the regular header; the most important option allows each host to specify the maximum TCP payload it is willing to accept; all Internet hosts are required to accept TCP segments of at least a minimum standard size
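The checksum rule described above (16-bit one's complement sum, zero padding for odd lengths, result 0 on verification) can be sketched as follows. The function names are hypothetical, and the sketch checksums an arbitrary byte string only; building the pseudo-header and a full TCP segment is left out.

```python
def ones_complement_sum(data: bytes) -> int:
    # Pad with one zero byte if the length is odd.
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]       # next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)    # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    # The value stored in the header is the complement of the sum.
    return ~ones_complement_sum(data) & 0xFFFF


def fill_checksum(segment: bytes, offset: int) -> bytes:
    # 'offset' is the (even) position of a 16-bit checksum field that
    # currently holds zero.
    value = internet_checksum(segment)
    return segment[:offset] + value.to_bytes(2, "big") + segment[offset + 2:]
```

A receiver that runs `internet_checksum` over a segment produced by `fill_checksum` gets 0 if nothing was corrupted, which matches the verification rule stated in the Checksum bullet.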
39 TCP pseudoheader
- The pseudoheader contains the 32-bit IP addresses of the source and destination machines, the protocol number (6 for TCP) and the byte count for the TCP segment (including the header)
- Including this pseudoheader in the TCP checksum calculation helps detect misdelivered packets, but doing so violates the protocol hierarchy (since IP addresses belong to the network layer, not to the TCP layer)
40 TCP extra options
- For lines with high bandwidth and high delay, the 64 KB window size is often a problem
  - e.g. on a T3 line it takes only about 12 ms to output a full 64 KB window. If the round-trip propagation delay is 50 ms (typical for a transcontinental fiber), the sender will be idle about ¾ of the time, waiting for acknowledgements
  - RFC 1323 proposed a window scale option, allowing the sender and receiver to negotiate a window scale factor. Both sides can shift the window size up to 14 bits to the left, thus allowing windows of up to 2^30 bytes; most TCP implementations support this option
- The use of “selective repeat” instead of the “go back n” protocol, described in RFC 1106
  - If the receiver gets a bad segment followed by a large number of good ones, the normal TCP protocol eventually times out and retransmits all the unacknowledged segments, including those that were received correctly
  - RFC 1106 introduces NAKs to allow the receiver to ask for a specific segment (or segments). After it gets those, it can acknowledge all the buffered data, thus reducing the amount of data retransmitted
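The figures in the window example can be checked with a short calculation. The T3 rate of 44.736 Mb/s is an assumption supplied here (it is the nominal T3 line rate; the slide's own number did not survive extraction), and the function names are hypothetical.

```python
WINDOW_BYTES = 64 * 1024          # the 64 KB window from the example
T3_BITS_PER_SECOND = 44.736e6     # assumed nominal T3 rate, not from the slide
ROUND_TRIP_SECONDS = 0.050        # 50 ms round-trip delay from the example


def time_to_send_window(window_bytes: int, rate_bps: float) -> float:
    """Seconds needed to push one full window onto the line."""
    return window_bytes * 8 / rate_bps


def idle_fraction(window_bytes: int, rate_bps: float, rtt: float) -> float:
    """Share of each round trip the sender spends waiting for ACKs."""
    return 1 - time_to_send_window(window_bytes, rate_bps) / rtt


def max_scaled_window(shift: int) -> int:
    """Largest window when the 16-bit field is shifted left by 'shift' bits."""
    return 0xFFFF << shift
```

Under these assumptions the window takes roughly 11.7 ms to transmit, the sender is idle for a little over three quarters of each 50 ms round trip, and a 14-bit shift yields a maximum window just under 2^30 bytes.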
41 TCP connection establishment
- TCP uses the three-way handshake protocol
  a) Normal case
  b) Call collision case, when two hosts try to establish a connection between the same two sockets
- The result of a collision is that just one connection is established, not two, because connections are identified by their end points
42 TCP connection establishment
The server must be prepared to accept an incoming connection; this is normally done by calling socket, bind and listen, and it is called a passive open.
The client, after creating a new socket, issues an active open by calling connect. This causes the client TCP to send a SYN (synchronize) segment that tells the server the client's initial sequence number for the data the client will send on the connection. Normally no data is sent with the SYN: it contains just an IP header, a TCP header and possibly TCP options.
The server must acknowledge the client's SYN and must also send its own SYN containing the initial sequence number for the data the server will send on the connection. The server sends its SYN and the ACK of the client's SYN in a single segment.
The client must acknowledge the server's SYN.
Note that a SYN consumes one byte of sequence space so that it can be acknowledged unambiguously.
The initial sequence number on a connection is not 0 (to avoid confusion when a host crashes). A clock-based scheme is used, with a clock tick every 4 µs. For additional safety, when a host crashes it may not reboot for the maximum packet lifetime (120 s), to make sure that no packets from previous connections are still roaming around the Internet.
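The sequence-number bookkeeping of the handshake can be illustrated with a small simulation. This is a minimal sketch, not a real TCP implementation: the `Segment` class and the `three_way_handshake` function are hypothetical names introduced here, and the initial sequence numbers are simply passed in rather than derived from a clock.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


@dataclass
class Segment:
    syn: bool
    ack: bool
    seq: int
    ack_no: Optional[int] = None  # only meaningful when ack is True


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments of a normal connection establishment."""
    # 1. Active open: the client sends SYN carrying its initial sequence number.
    syn = Segment(syn=True, ack=False, seq=client_isn)
    # 2. The server acknowledges the client's SYN and sends its own SYN in the
    #    same segment. The SYN consumed one sequence number, hence the +1.
    syn_ack = Segment(syn=True, ack=True, seq=server_isn,
                      ack_no=(client_isn + 1) % SEQ_SPACE)
    # 3. The client acknowledges the server's SYN.
    ack = Segment(syn=False, ack=True, seq=(client_isn + 1) % SEQ_SPACE,
                  ack_no=(server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for s in segments:
    print(s)
```

Because the SYN occupies one byte of sequence space, each acknowledgement number is the peer's initial sequence number plus one.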
43 TCP connection termination
The connection is full duplex; it is best viewed as a pair of simplex connections, each of which is released independently.
To release a connection, either party can send a TCP segment with the FIN bit set, which means that it has no more data to transmit.
When the FIN is acknowledged, that direction is shut down for new data; however, data may continue to flow indefinitely in the other direction.
When both directions have been shut down, the connection is released.
Normally, four TCP segments are used to shut down the connection (one FIN and one ACK for each direction).
To avoid complications when segments are lost, timers are used: if the ACK for a FIN does not arrive within two packet lifetimes, the sender of the FIN releases the connection; the other side will eventually notice that nobody seems to be listening to it anymore and will time out as well.
44 TCP connection termination
One application calls close first, and we say that this end performs the active close. This end's TCP sends a FIN segment, which means it is finished sending data.
The other end receives the FIN and performs a passive close. The received FIN is acknowledged by TCP; the receipt of the FIN is also passed on to the application as an end-of-file; after the FIN, the application will not receive any more data on the socket.
Some time later, the application that received the end-of-file closes its socket; this causes its TCP to send a FIN.
The TCP on the system that receives this final FIN (the end that did the active close) acknowledges the FIN.
45 TCP state transition diagram
The steps involved in establishing and releasing connections can be modeled using a finite state machine (FSM). TCP can be represented as an FSM with 11 states.
Each connection starts in the CLOSED state. It leaves that state when it does either a passive open (LISTEN) or an active open (CONNECT). If the other side does the opposite one, a connection is established and the state becomes ESTABLISHED.
Connection release can be initiated by either side. When it is complete, the state returns to CLOSED.
46 TCP connection management finite state machine
The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events.
Each transition is labeled by the event causing it and the action resulting from it, separated by a slash.
The event can be a user-initiated system call (CONNECT, LISTEN, SEND or CLOSE), a segment arrival (SYN, FIN, ACK or RST) or, in one case, a timeout. The action is the sending of a control segment (SYN, FIN or RST) or nothing, indicated by "-".
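The normal client and server paths through the state machine can be sketched as a lookup table keyed by (state, event). This is a minimal sketch covering only the heavy-line paths; the unusual transitions (RST, simultaneous open or close) are left out. The state names other than CLOSED, LISTEN, ESTABLISHED and TIMED WAIT do not appear in the slide text and follow the usual TCP naming; `TRANSITIONS` and `run` are hypothetical names.

```python
# (state, event) -> (action, next_state); "-" means that no segment is sent.
TRANSITIONS = {
    # Connection establishment
    ("CLOSED", "CONNECT"): ("SYN", "SYN_SENT"),         # active open (client)
    ("CLOSED", "LISTEN"): ("-", "LISTEN"),              # passive open (server)
    ("LISTEN", "SYN"): ("SYN+ACK", "SYN_RCVD"),
    ("SYN_SENT", "SYN+ACK"): ("ACK", "ESTABLISHED"),
    ("SYN_RCVD", "ACK"): ("-", "ESTABLISHED"),
    # Active close
    ("ESTABLISHED", "CLOSE"): ("FIN", "FIN_WAIT_1"),
    ("FIN_WAIT_1", "ACK"): ("-", "FIN_WAIT_2"),
    ("FIN_WAIT_2", "FIN"): ("ACK", "TIMED_WAIT"),
    ("TIMED_WAIT", "TIMEOUT"): ("-", "CLOSED"),
    # Passive close
    ("ESTABLISHED", "FIN"): ("ACK", "CLOSE_WAIT"),
    ("CLOSE_WAIT", "CLOSE"): ("FIN", "LAST_ACK"),
    ("LAST_ACK", "ACK"): ("-", "CLOSED"),
}


def run(events, state="CLOSED"):
    """Feed a sequence of events to the FSM; return (final_state, actions)."""
    actions = []
    for event in events:
        action, state = TRANSITIONS[(state, event)]
        actions.append(action)
    return state, actions


# Client that opens the connection and later closes it first.
client_path = ["CONNECT", "SYN+ACK", "CLOSE", "ACK", "FIN", "TIMEOUT"]
# Server that accepts the connection and closes after the client does.
server_path = ["LISTEN", "SYN", "ACK", "FIN", "CLOSE", "ACK"]

print(run(client_path))
print(run(server_path))
```

Both paths start and end in CLOSED, and an event that is not valid in the current state raises a `KeyError`, which makes illegal sequences easy to spot.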
47 TCP transmission policy
Window management in TCP, starting with the client having a 4096-byte buffer.
48 TCP transmission policy
When the window size is 0 the sender may not send segments, with two exceptions:
  Urgent data may be sent (e.g. to allow the user to kill the process running on the remote machine).
  The sender may send a 1-byte segment to make the receiver re-announce the next byte expected and the window size.
Senders are not required to send data as soon as they get it from the application layer. For example, when the first 2 KB of data comes in from the application, TCP may decide to buffer it until the next 2 KB arrives and then send a single 4 KB segment (knowing that the receiver can accept 4 KB).
This leaves room for improvements.
Receivers are not required to send acknowledgements as soon as possible.
49 TCP performance issues (1)
Consider a telnet session to an interactive editor that reacts to every keystroke; this is the worst-case scenario:
When a character arrives at the sending TCP entity, TCP creates a 21-byte segment, which is given to IP to be sent as a 41-byte datagram.
At the receiving side, TCP immediately sends a 40-byte acknowledgement (20 bytes of TCP header and 20 bytes of IP header).
Later, at the receiving side, when the editor (application) has read the character, TCP sends a window update, moving the window 1 byte to the right; this packet is also 40 bytes.
Finally, when the editor has interpreted the character, it echoes it as a 41-byte packet.
In all, 162 bytes of bandwidth are used and four segments are sent for each character typed.
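The 162-byte figure is just the sum of the four packets, each carrying 40 bytes of headers. The short calculation below reproduces it, assuming (as the slide does) 20-byte IP and TCP headers without options; the variable names are illustrative.

```python
IP_HEADER = 20   # bytes, no IP options
TCP_HEADER = 20  # bytes, no TCP options


def packet_size(payload_bytes: int) -> int:
    """Size of an IP datagram carrying one TCP segment."""
    return IP_HEADER + TCP_HEADER + payload_bytes


packets = {
    "keystroke": packet_size(1),      # 41 bytes
    "acknowledgement": packet_size(0),  # 40 bytes
    "window update": packet_size(0),    # 40 bytes
    "echo": packet_size(1),           # 41 bytes
}
total = sum(packets.values())
useful = 2  # one typed character plus one echoed character

print(total)                    # 162
print(round(useful / total, 4))  # fraction of the bytes that is user data
```

Only 2 of the 162 bytes are user data, which is what motivates the optimizations on the following slides.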
50 TCP optimizations: delayed ACK
One solution that many TCP implementations use to optimize this situation is to delay acknowledgements and window updates for 500 ms.
The hope is to acquire some data that can be bundled with the ACK or window update segment. For example, in the editor case, assuming that the editor sends the echo within 500 ms of reading the character, the acknowledgement, the window update and the actual byte of data are sent back in a single 41-byte packet.
This solution deals with the problem at the receiver end; it does not solve the inefficiency at the sending end.
51 TCP optimizations: Nagle's algorithm
Operation:
  When data comes into the sending TCP one byte at a time, send just the first byte as a single TCP segment and buffer all the subsequent ones until the first byte is acknowledged.
  Then send all the buffered characters in one TCP segment and start buffering again until they are all acknowledged.
  The algorithm additionally allows a new segment to be sent if enough data has accumulated to fill half the window or a maximum segment.
If the user is typing quickly and the network is slow, a substantial number of characters may go in each segment, greatly reducing the bandwidth used.
Nagle's algorithm is widely used in TCP implementations, but there are times when it is better to disable it. For example, when an X Window application is run over the Internet, mouse movements have to be sent to the remote computer; gathering them up and sending them in bursts makes the cursor move erratically at the other end.
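The buffering rule described above can be sketched as a toy sender. This is an illustration under simplifying assumptions, not how a real TCP stack is structured: the `NagleSender` class is hypothetical, acknowledgements are modelled as one call per segment, and at most one maximum-size segment is emitted per flush.

```python
class NagleSender:
    """Toy model of Nagle's algorithm; records the segments it would send."""

    def __init__(self, mss: int = 1460, window: int = 4096):
        self.mss = mss            # maximum segment size (assumed value)
        self.window = window      # receiver window (assumed value)
        self.buffer = bytearray() # data waiting to be sent
        self.unacked = 0          # segments in flight
        self.sent = []            # segments handed to the network

    def _flush(self) -> None:
        if self.buffer:
            segment = bytes(self.buffer[:self.mss])
            del self.buffer[:self.mss]
            self.sent.append(segment)
            self.unacked += 1

    def write(self, data: bytes) -> None:
        self.buffer += data
        # Send at once if nothing is outstanding, or if enough data has
        # accumulated to fill half the window or a maximum segment.
        enough = min(self.mss, self.window // 2)
        if self.unacked == 0 or len(self.buffer) >= enough:
            self._flush()

    def on_ack(self) -> None:
        self.unacked = max(0, self.unacked - 1)
        if self.unacked == 0:
            self._flush()  # everything acknowledged: send what was buffered


sender = NagleSender()
for ch in b"hello":
    sender.write(bytes([ch]))  # the application writes one byte at a time
sender.on_ack()                # the ACK for the first byte arrives
print(sender.sent)             # [b'h', b'ello']
```

Five one-byte writes become two segments instead of five. On real sockets the algorithm is usually disabled per connection with the `TCP_NODELAY` option, for example `sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)` in Python.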
52 TCP performance issues (2)
Silly window syndrome (Clark, 1982):
  Data is passed to the sending TCP entity in large blocks.
  Data is read at the receiving side in small chunks (1 byte at a time).
53 TCP optimizations: Clark's solution
Prevent the receiver from sending a window update for 1 byte.
Instead, have the receiver wait until it has a decent amount of space to advertise; specifically, the receiver should not send a window update unless it can handle the maximum segment size (advertised when the connection was established) or its receive buffer is half empty, whichever is smaller.
Furthermore, the sender can help by not sending small segments; instead it should wait until enough space has accumulated in the window to send a full segment, or at least one containing half of the receiver's buffer size (which can be estimated from the pattern of window updates it has received in the past).
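Both rules reduce to a comparison against the smaller of the maximum segment size and half the receiver's buffer. A minimal sketch, with hypothetical function and parameter names:

```python
def receiver_should_advertise(free_space: int, buffer_size: int, mss: int) -> bool:
    """Receiver side: announce a window only when it is worth filling."""
    return free_space >= min(mss, buffer_size // 2)


def sender_should_send(usable_window: int, mss: int, est_receiver_buffer: int) -> bool:
    """Sender side: wait for room for a full segment or half the receiver's buffer.

    est_receiver_buffer is the sender's estimate of the receiver's buffer
    size, assumed here to be already known.
    """
    return usable_window >= min(mss, est_receiver_buffer // 2)


# A receiver with a 4096-byte buffer and a 1024-byte maximum segment size:
print(receiver_should_advertise(1, 4096, 1024))     # False: a 1-byte update is silly
print(receiver_should_advertise(1024, 4096, 1024))  # True: room for a full segment
```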
54 Nagle's algorithm vs. Clark's solution
Clark's solution to the silly window syndrome and Nagle's algorithm are complementary.
Nagle was trying to solve the problem caused by the sending application delivering data to TCP one byte at a time.
Clark was trying to solve the problem caused by the receiving application reading data from TCP one byte at a time.
Both solutions are valid and can work together. The goal is for the sender not to send small segments and the receiver not to ask for them.
The receiving TCP can also improve performance by blocking a READ request from the application until it has a large chunk of data to provide:
  However, this can increase the response time.
  For non-interactive applications (e.g. file transfer), efficiency may outweigh the response time to individual requests.
55 TCP congestion control
TCP deals with congestion by dynamically manipulating the window size.
The first step in managing congestion is to detect it.
A timeout caused by a lost packet can be due to:
  Noise on the transmission line (not really an issue for modern infrastructure).
  Packet discard at a congested router.
Most transmission timeouts are due to congestion.
All the Internet TCP algorithms assume that timeouts are due to congestion and monitor timeouts to detect congestion.
56 TCP congestion control
Two types of problems can occur:
  Network capacity
  Receiver capacity
When the load offered to a network is more than it can handle, congestion builds up.
(a) A fast network feeding a low-capacity receiver.
(b) A slow network feeding a high-capacity receiver.
57 TCP congestion control
TCP deals with network capacity congestion and receiver capacity congestion separately; the sender maintains two windows:
  The window that the receiver has granted
  The congestion window
Each window reflects a number of bytes that the sender may transmit; the number that can actually be transmitted is the minimum of the two windows.
If the receiver says "send 8K" but the sender knows that more than 4K will congest the network, it sends 4K.
On the other hand, if the receiver says "send 8K" and the sender knows that the network can handle 32K, it sends the full 8K.
Therefore the effective window is the minimum of what the sender thinks is all right and what the receiver thinks is all right.
58 Slow start algorithm (Jacobson, 1988)
When a connection is established, the sender initializes the congestion window to the size of the maximum segment in use; it then sends one maximum segment.
If the segment is acknowledged in time, the sender doubles the size of the congestion window (making it twice the size of a segment) and sends two segments, which have to be acknowledged separately.
As each of those segments is acknowledged in time, the size of the congestion window is increased by one maximum segment size (in effect, each burst successfully acknowledged doubles the congestion window).
The congestion window keeps growing until either a timeout occurs or the receiver's window is reached.
The idea is that if bursts of, say, 1024, 2048 and 4096 bytes work fine but a burst of 8192 bytes times out, the congestion window is kept at 4096 to avoid congestion; as long as the congestion window remains at 4096, no larger bursts will be sent, no matter how much space the receiver grants.
59 Slow start algorithm (Jacobson, 1988)
Figure example: the maximum segment size is 1024 bytes. Initially the threshold was 64 KB, but a timeout occurred, so the threshold is set to 32 KB and the congestion window to 1024 bytes at transmission 0.
The Internet congestion algorithm uses a third parameter, the threshold, initially 64 KB, in addition to the receiver and congestion windows.
When a timeout occurs, the threshold is set to half of the current congestion window, and the congestion window is reset to one maximum segment size.
Each burst successfully acknowledged doubles the congestion window.
It grows exponentially until the threshold value is reached.
It then grows linearly until the receiver's window value is reached.
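The growth pattern in the example can be reproduced with a short simulation. This is a sketch under simplifying assumptions: one "round" stands for one burst that is either fully acknowledged or times out, and the function name, its parameters and the 64 KB receiver window are illustrative choices rather than values taken from the figure.

```python
def congestion_windows(rounds, mss=1024, threshold=32 * 1024,
                       receiver_window=64 * 1024, timeout_rounds=()):
    """Return the number of bytes the sender may transmit in each round."""
    cwnd = mss
    history = []
    for r in range(rounds):
        # The effective window is the minimum of the two windows.
        history.append(min(cwnd, receiver_window))
        if r in timeout_rounds:
            threshold = cwnd // 2   # half of the current congestion window
            cwnd = mss              # restart from one maximum segment
        elif cwnd < threshold:
            cwnd = min(cwnd * 2, threshold)  # exponential growth (slow start)
        else:
            cwnd += mss                      # linear growth above the threshold
        cwnd = min(cwnd, receiver_window)
    return history


# No losses: 1, 2, 4, 8, 16, 32 KB, then one extra KB per round.
print([w // 1024 for w in congestion_windows(8)])
# A timeout in round 7 halves the threshold and restarts from one segment.
print([w // 1024 for w in congestion_windows(14, timeout_rounds={7})])
```

With the defaults, the window doubles up to the 32 KB threshold and then grows by one segment per round; after the timeout at 34 KB the threshold becomes 17 KB and growth restarts from 1 KB.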
60 TCP timer management
TCP uses multiple timers to do its work.
The most important is the retransmission timer:
  When a segment is sent, a retransmission timer is started.
  If the segment is acknowledged before this timer expires, the timer is stopped.
  If the timer goes off before the segment is acknowledged, the segment is retransmitted (and the timer restarted).
  The big question is how long this timer interval should be.
The keepalive timer is designed to check connection integrity:
  It goes off after a long period of inactivity, causing one side to check whether the other side is still there.
61 TCP timer management
TCP uses multiple timers to do its work.
The persistence timer is designed to prevent deadlock:
  The receiver sends a packet with window size 0.
  Later, it sends another packet with a larger window size, letting the sender know that it can send data, but this segment gets lost.
  Both the receiver and the transmitter are now waiting for each other.
  Solution: a persistence timer at the sender that goes off and produces a probe packet, which makes the receiver advertise its window again.
The TIMED WAIT state timer is used when a connection is closed; it runs for twice the maximum packet lifetime to make sure that, when a connection is closed, all packets belonging to it have died off.
62 TCP Retransmission Timer
(a) Probability density of ACK arrival times in the data link layer.
  The expected delay is highly predictable, so the timer can be set to go off slightly after the acknowledgement is expected; since ACKs are rarely delayed in the data link layer, the absence of one means that it has been lost.
(b) Probability density of ACK arrival times for TCP.
  Determining the round-trip time to the destination is tricky; even if the RTT is known, choosing the timeout interval is difficult.
  Too small: unnecessary retransmissions will occur.
  Too large: performance will suffer.
63 TCP Retransmission Timer
TCP should use a highly dynamic algorithm that constantly adjusts the timeout interval, based on continuous measurement of network performance.
Jacobson timer management algorithm:
  For each connection, TCP maintains a variable RTT that is the best current estimate of the round-trip time to the destination.
  When a segment is sent, a timer is started, both to see how long the ACK takes and to trigger a retransmission.
  If the ACK gets back before the timer expires, TCP measures how long the ACK took, say M. It then updates RTT according to the formula:
  RTT = αRTT + (1 − α)M, where α is the smoothing factor, typically 7/8.
64 TCP Retransmission Timer
Jacobson timer management algorithm:
  Even given a good value for RTT, choosing the retransmission timeout is not a trivial matter.
  Normally TCP uses βRTT, but the trick is to choose β (in initial implementations β was always 2, but experience showed that a constant value was not flexible enough).
  Jacobson proposed making β roughly proportional to the standard deviation of the acknowledgement arrival time.
  His algorithm requires keeping track of another variable, D (the deviation).
  Whenever an ACK comes in, the difference between the expected and observed values, |RTT − M|, is computed; a smoothed value of this is maintained by the formula:
  D = αD + (1 − α)|RTT − M|
  Most TCP implementations use this algorithm and set the timeout to:
  Timeout = RTT + 4 × D
  The factor 4 is somewhat arbitrary, but multiplication by 4 can be done with a shift, and it minimizes unnecessary retransmissions because very few packets arrive more than four standard deviations late.
One problem: what to do when a segment times out and is sent again?
  It is unclear whether an incoming ACK refers to the first or the second transmission.
  Solution (Karn): do not update RTT for retransmitted segments.
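The two smoothing formulas and Karn's rule fit in a few lines. The sketch below is an illustration, not the code of any particular TCP stack: the `RttEstimator` class is a hypothetical name, the deviation is assumed to start at 0, and the same smoothing factor 7/8 is used for both RTT and D, as on the slide.

```python
class RttEstimator:
    """Smoothed round-trip time and deviation for one connection."""

    def __init__(self, initial_rtt: float, alpha: float = 7 / 8):
        self.alpha = alpha
        self.rtt = initial_rtt  # best current estimate of the round-trip time
        self.dev = 0.0          # smoothed deviation D (assumed to start at 0)

    def update(self, measured: float, retransmitted: bool = False) -> None:
        if retransmitted:
            # Karn: the ACK may belong to either transmission, so the
            # sample is ambiguous and the estimates are left unchanged.
            return
        a = self.alpha
        # D uses the RTT estimate from before this sample (expected vs observed).
        self.dev = a * self.dev + (1 - a) * abs(self.rtt - measured)
        self.rtt = a * self.rtt + (1 - a) * measured

    @property
    def timeout(self) -> float:
        return self.rtt + 4 * self.dev


est = RttEstimator(initial_rtt=100.0)  # milliseconds
est.update(180.0)                      # one slow acknowledgement
print(est.rtt, est.dev, est.timeout)   # 110.0 10.0 150.0
est.update(900.0, retransmitted=True)  # ignored because of Karn's rule
print(est.timeout)                     # still 150.0
```

A single late sample moves RTT only one eighth of the way toward the measurement, while the deviation term widens the timeout so that normal jitter does not trigger retransmissions.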
65 Wireless TCP
The principal problem is the congestion control algorithm, because most TCP implementations assume that timeouts are caused by congestion and not by lost packets.
Wireless networks are highly unreliable.
The best approach to lost packets there is to resend them as soon as possible; slowing down (Jacobson slow start) only makes things worse.
When a packet is lost:
  On wired networks, the sender should slow down.
  On wireless networks, the sender should try harder.
If the type of network is not known, it is difficult to make the correct decision.
66 Wireless TCP
The path from sender to receiver is often inhomogeneous, so making the right decision on a timeout is very difficult.
The solution proposed by Bakre and Badrinath (1995) is to split the TCP connection into two separate TCP connections (indirect TCP):
  The first connection goes from the sender to the base station.
  The second connection goes from the base station to the mobile receiver.
  The base station simply copies packets between the connections in both directions.
Both connections are homogeneous: timeouts on the first connection can slow down the sender, while timeouts on the second connection make its sender (the base station) try harder.
The problem with this approach is that it violates the semantics of TCP. Since each part of the connection is a full TCP connection, the base station acknowledges each TCP segment in the usual way, so the receipt of an acknowledgement by the sender does not mean that the receiver got the segment.
67 Wireless TCP: Balakrishnan et al. (1995)
Does not break the semantics of TCP; it works by making small modifications to the code at the base station.
An agent is added that caches all the TCP segments going out to the mobile host and observes the ACKs coming back from it.
When the agent sees a TCP segment going out but no ACK coming back, it issues a retransmission without letting the sender know what is going on.
It also generates a retransmission when it sees duplicate ACKs coming from the mobile host (which means the mobile host missed something).
Duplicate ACKs are discarded by the agent (to prevent the source from interpreting them as congestion).
One problem is that if the wireless network is very lossy, the sender may time out and invoke the congestion control algorithm; with indirect TCP the congestion algorithm would not be started unless there really is congestion in the wired part.
There is also a fix for the problem of lost segments originating at the mobile host: when the base station notices a gap in the inbound sequence numbers, it generates a request for a selective repeat of the missing bytes using a TCP option.
68 References
Andrew S. Tanenbaum, Computer Networks, ISBN: 0-13066102-3
Behrouz A. Forouzan, Data Communications and Networking, ISBN:
69 Additional Slides
Quality of service
RTP, RTCP and Transactional TCP
70 Supplement: Quality of service (1)
Connection establishment:
  Delay: the time between the request being issued and the confirmation being received; the shorter the delay, the better the service.
  Failures.
Throughput: the number of bytes of user data transferred per second; it is measured separately for each direction.
Transit delay: the time between a message being sent by the transport user on the source machine and its being received by the transport user on the destination machine.
Residual error ratio: the number of lost messages as a fraction of the total sent; in theory it should be 0.
Protection: provides a way for the user to have the transport layer protect the data against unauthorized third parties.
Priority: a way for the transport user to indicate that some of its connections are more important than others (in case of congestion).
Robustness: the probability of the transport layer spontaneously terminating a connection.
71 Supplement: Quality of service (2)
Achieving a good quality of service requires negotiation.
Option negotiation: QoS parameters are specified by the transport user when a connection is requested; desired and minimum values can be given.
  The transport layer may immediately realize that a requested value is not achievable and return an error to the caller without contacting the destination.
  It may request a lower value from the destination (e.g. it knows it cannot achieve 600 Mb/s but can achieve a lower, still acceptable rate of 150 Mb/s); it contacts the destination with the desired and minimum values; the destination can make a counteroffer, and so on.
Good quality lines cost more.
The negotiated parameters are maintained over the life of the connection.
72 Real Time Transport Protocol
Described in RFC 1889 and used in multimedia applications (Internet radio, Internet telephony, music on demand, video on demand, video conferencing, etc.).
The position of RTP in the protocol stack is somewhat strange: it was decided to put it in user space and normally run it over UDP.
Working model:
  The multimedia application generates multiple streams (audio, video, text, etc.) that are fed into the RTP library.
  The library multiplexes the streams and encodes them in RTP packets, which are fed to a UDP socket.
  The UDP entity generates UDP packets that are embedded in IP packets; the IP packets are embedded in Ethernet frames and sent over the network (if the network is Ethernet).
It is difficult to say which layer RTP is in (transport or application).
The basic function of RTP is to multiplex several real-time data streams onto a single stream of UDP packets; the UDP stream can be sent to a single destination (unicast) or to multiple destinations (multicast).
73 Real Time Transport Protocol
(a) The position of RTP in the protocol stack.
(b) Packet nesting.
74 Real Time Transport Protocol
Each packet has a number, one higher than its predecessor.
  It is used by the destination to detect missing packets.
  If a packet is missing, the best action is to try to interpolate it.
No flow control, no retransmission, no error control.
The RTP payload may contain multiple samples, coded any way the application wants.
Allows for time-stamping (time relative to the start of the stream, so only differences are significant).
  This is used to play certain streams synchronously.
75 RTP header
The P bit indicates that the packet has been padded to a multiple of 4 bytes; the last padding byte tells how many bytes were added.
The X bit indicates that an extension header is present; the format and meaning of the extension header are not defined; the only thing that is defined is the first word of the extension, which gives the length; this is an escape mechanism for unforeseen requirements.
The CC field tells how many contributing sources are present.
The M bit is an application-specific marker bit; it can be used to mark the start of a video frame.
The Payload type field tells which encoding algorithm has been used (MP3, PCM, etc.); since every packet carries this field, the encoding can change during transmission.
The Sequence number is just a counter that is incremented on each RTP packet sent; it is used to detect lost packets.
The Timestamp is produced by the stream's source to note when the first sample in the packet was made; this field can help reduce jitter by decoupling the playback from the packet arrival time.
The Synchronization source identifier tells which stream the packet belongs to; it is the method used to multiplex and demultiplex multiple data streams onto a single stream of UDP packets.
The Contributing source identifiers, if any, are used when mixers are present in the studio; in that case, the mixer is the synchronization source and the streams being mixed are listed here.
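The fields described above can be unpacked from the fixed 12-byte RTP header with Python's `struct` module. This is a minimal sketch: `parse_rtp_header` is a hypothetical helper, the bit positions follow the header layout in the RTP specification, the 2-bit version field at the start of the header is part of that layout although it is not listed on the slide, and extension headers and padding are not processed.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header and the contributing source list."""
    if len(packet) < 12:
        raise ValueError("an RTP header is at least 12 bytes long")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # number of contributing sources
    csrc_end = 12 + 4 * cc
    if len(packet) < csrc_end:
        raise ValueError("packet too short for its contributing source list")
    csrc = list(struct.unpack("!%dI" % cc, packet[12:csrc_end]))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),     # P bit
        "extension": bool(b0 & 0x10),   # X bit
        "cc": cc,
        "marker": bool(b1 & 0x80),      # M bit
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
        "payload": packet[csrc_end:],
    }


# Build a sample packet: version 2, one contributing source, marker bit set,
# payload type 14, followed by some payload bytes.
sample = (struct.pack("!BBHII", 0x81, 0x8E, 7, 1000, 0xDEADBEEF)
          + struct.pack("!I", 42) + b"audio")
header = parse_rtp_header(sample)
print(header["sequence_number"], header["payload_type"], header["csrc"])
```

A receiver would compare successive `sequence_number` values to detect lost packets and use `timestamp` differences to schedule playback.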
76 Real Time Transport Control Protocol
It handles feedback and synchronization for RTP but does not actually transport any data.
Provides feedback on delay, jitter, bandwidth, congestion and other network properties.
Handles inter-stream synchronization when different streams use different clocks with different granularities.
Provides a way of naming the different sources (e.g. in ASCII text); this information can be displayed at the receiver to show who is talking at the moment.
77 Transactional TCP
UDP is attractive as a transport for RPC only if the request and the reply each fit into a single packet.
If the reply is large, using TCP as the transport may be the solution, but some efficiency problems remain.
To do RPC over TCP, nine packets are required (best-case scenario):
  The client sends a SYN packet to establish a connection.
  The server sends an ACK plus its own SYN.
  The client completes the three-way handshake with an ACK.
  The client sends the actual request.
  The client sends a FIN packet to indicate that it is done sending.
  The server acknowledges the request and the FIN packet.
  The server sends the reply back to the client.
  The server sends a FIN packet to indicate that it is done.
  The client acknowledges the server's FIN.
A performance improvement to TCP is required; this is done in Transactional TCP, described in RFC 1379 and RFC 1644.
78 Transactional TCP
(a) RPC using normal TCP. (b) RPC using T/TCP.
The idea is to modify the standard connection setup to allow data transport