1 CSE 524: Lecture 12 Transport Layer (Part 2). 2 Administrative Exam –Still being graded –Will be returned on Wednesday guaranteed.

1 1 CSE 524: Lecture 12 Transport Layer (Part 2)

2 2 Administrative Exam –Still being graded –Will be returned on Wednesday guaranteed

3 3 Transport Layer Last class –Transport layer functions This class –Specific transport layers

4 4 Specific transport layers
UDP
  –unreliable (“best-effort”)
  –unordered
  –unicast or multicast delivery
TCP
  –reliable
  –in-order
  –unicast
SCTP (will not cover in class)
  –See http://www.ietf.org/rfc/rfc2960.txt
  –reliable
  –optional ordering
  –unicast

5 5 TL: UDP and Transport Layer Functions
Demux to upper layer
  –UDP port field
Quality of service
  –none
Security
  –none
Delivery semantics
  –Unordered
  –Unicast or multicast
Flow control
  –none
Congestion control
  –none
Reliable data transfer
  –none, but data integrity provided by checksum

6 6 TL: UDP: User Datagram Protocol
http://www.rfc-editor.org/rfc/rfc768.txt
“no frills,” “bare bones” Internet transport protocol
“best effort” service, UDP segments may be:
  –lost
  –delivered out of order to app
connectionless:
  –no handshaking between UDP sender, receiver
  –each UDP segment handled independently of others
Why is there a UDP?
  –no connection establishment (which can add delay)
  –simple: no connection state at sender, receiver
  –small segment header
  –no congestion control: UDP can blast away as fast as desired

7 7 TL: UDP: more
often used for streaming multimedia apps
  –loss tolerant
  –rate sensitive
other UDP uses (why?):
  –DNS
  –SNMP
reliable transfer over UDP: add reliability at application layer
  –application-specific error recovery!
  –many applications re-implement reliability over UDP to bypass TCP
  –new transport protocols?
UDP segment format (32 bits wide):
  source port # | dest port #
  length | checksum
  application data (message)
length: in bytes, of the UDP segment, including header

8 8 TL: UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
  –treat segment contents as sequence of 16-bit integers
  –checksum: 1’s complement of the 1’s complement sum of segment contents
  –sender puts checksum value into UDP checksum field
  –similar to IP’s header checksum
Receiver:
  –compute checksum of received segment
  –check if computed checksum equals checksum field value:
     NO - error detected
     YES - no error detected. But maybe errors nonetheless? More later ….
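
The sender and receiver steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a full UDP implementation: it checksums only the bytes it is given (no pseudo-header), and the function names are made up for this example.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Add the data as 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def internet_checksum(data: bytes) -> int:
    """Value the sender stores: the complement of the 1's complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def looks_intact(data_with_checksum: bytes) -> bool:
    """Receiver check: summing everything, checksum included, gives 0xFFFF."""
    return ones_complement_sum16(data_with_checksum) == 0xFFFF


segment = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(segment)
received = segment + checksum.to_bytes(2, "big")
# Swapping two 16-bit words changes the data but not the sum.
swapped = segment[2:4] + segment[0:2] + segment[4:] + checksum.to_bytes(2, "big")
```

The `swapped` case illustrates the slide's "maybe errors nonetheless": reordered 16-bit words leave the sum unchanged, so the check still passes.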

9 9 TL: TCP and Transport Layer Functions Demux to upper layer Quality of service Security Delivery semantics Flow control Congestion control Reliable data transfer

10 10 TL: TCP Overview
RFCs: 793, 1122, 1323, 2018, 2581
point-to-point:
  –one sender, one receiver
reliable, in-order byte stream:
  –no “message boundaries”
pipelined:
  –TCP congestion and flow control set window size
send & receive buffers
full duplex data:
  –bi-directional data flow in same connection
  –MSS: maximum segment size
connection-oriented:
  –handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
  –protocol implemented at ends (“fate-sharing”)
flow and congestion controlled:
  –sender will not overwhelm receiver or network

11 11 TL: TCP header
TCP segment format (32 bits wide):
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | rcvr window size
  checksum | ptr urgent data
  Options (variable length)
  application data (variable length)
Field notes:
  –URG: urgent data (generally not used)
  –ACK: ACK # valid
  –PSH: push data now (generally not used)
  –RST, SYN, FIN: connection estab (setup, teardown commands)
  –rcvr window size: # bytes rcvr willing to accept
  –sequence and acknowledgement numbers: counting by bytes of data (not segments!)
  –checksum: Internet checksum (as in UDP)

12 12 TL: TCP connections
TCP sender, receiver establish “connection” before exchanging data segments
  –initialize TCP variables:
     Initial sequence #s
     Buffers, flow control info (e.g. RcvWindow)
     Window scaling
client: connection initiator
  –Java API: Socket clientSocket = new Socket("hostname", port);
server: contacted by client
  –Java API: Socket connectionSocket = welcomeSocket.accept();

13 13 TL: TCP connections
Three way handshake:
  –Step 1: client end system sends TCP SYN control segment to server
     specifies initial seq #
     should be random to prevent spoofing (http://www.rfc-editor.org/rfc/rfc1948.txt)
  –Step 2: server end system receives SYN, replies with SYNACK control segment
     ACKs received SYN
     allocates buffers
     specifies server->client initial seq. #
  –Step 3: client receives SYNACK control segment, replies with ACK and potentially data
     ACKs received SYNACK
     goes to established state

14 14 TL: TCP Connection Establishment
A and B must agree on initial sequence number selection
3-way handshake:
  A -> B: SYN + Seq A
  B -> A: SYN + ACK-A + Seq B
  A -> B: ACK-B
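
The sequence and acknowledgement numbers exchanged in the 3-way handshake can be traced with a short Python sketch. The `Segment` class and function names are hypothetical, chosen only for this illustration; the one protocol fact relied on is that a SYN consumes one sequence number, so each side acknowledges the peer's ISN plus one.

```python
import secrets
from dataclasses import dataclass
from typing import List

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    syn: bool = False
    ack_flag: bool = False
    seq: int = 0
    ack: int = 0


def random_isn() -> int:
    """Pick an unpredictable 32-bit initial sequence number."""
    return secrets.randbits(32)


def three_way_handshake(isn_a: int, isn_b: int) -> List[Segment]:
    """Return the three segments A and B exchange, in order."""
    syn = Segment(syn=True, seq=isn_a)
    # B acknowledges A's SYN and announces its own ISN.
    synack = Segment(syn=True, ack_flag=True, seq=isn_b,
                     ack=(syn.seq + 1) % SEQ_MOD)
    # A acknowledges B's SYN; A's own SYN already used up isn_a.
    ack = Segment(ack_flag=True, seq=(isn_a + 1) % SEQ_MOD,
                  ack=(synack.seq + 1) % SEQ_MOD)
    return [syn, synack, ack]
```

For example, `three_way_handshake(100, 300)` yields a SYN with seq 100, a SYN+ACK with seq 300 and ack 101, and a final ACK with seq 101 and ack 301.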

15 15 TL: TCP Sequence Number Selection
Why not simply choose 0?
Must avoid overlap with earlier incarnation
Client machine seq #0, initiates connection to server with seq #0.
  –Client sends one byte and machine crashes
  –Client reboots and initiates connection again
  –Server thinks new incarnation is the same as old connection

16 16 TL: TCP Sequence Number Selection
Why is selecting a random ISN important?
Suppose machine X selects ISN based on predictable sequence
Fred has .rhosts to allow login to X from Y
Evil Ed attacks
  –Disables host Y – denial of service attack
  –Make a bunch of connections to host X
  –Determine ISN pattern and guess next ISN
  –Fake pkt1: [, guessed ISN]
  –Fake pkt2: desired command
Attack popularized by K. Mitnick

17 17 TL: TCP ISN selection and spoofing attacks
(Figure: hosts Ed, Y, and X; X's .rhosts lists Y)
1. Flood Y continuously
2. Spoof TCP SYN from Y, with spoofed Y ISN
3. X sends TCP SYNACK (ACK of spoofed Y ISN, X ISN) to Y: PACKET DROPPED!
4. Send ACK with guess of X’s ISN as if you received TCP SYNACK
5. Send pre-canned rlogin/rsh messages (rsh echo “Ed” >> .rhosts) and spoof acknowledgements
6. Real acks dropped so Y does not reset connection
7. Door now open, rlogin to X from Ed directly

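18 18 TL: TCP connection setup
(State diagram; transitions listed as: from -> to: event / action)
  CLOSED -> LISTEN: passive OPEN / create TCB
  CLOSED -> SYN SENT: active OPEN / create TCB, snd SYN
  LISTEN -> SYN RCVD: rcv SYN / snd SYN, ACK
  LISTEN -> SYN SENT: APP SEND / snd SYN
  LISTEN -> CLOSED: CLOSE / delete TCB
  SYN SENT -> CLOSED: CLOSE / delete TCB
  SYN SENT -> SYN RCVD: rcv SYN / snd ACK
  SYN SENT -> ESTAB: rcv SYN, ACK / snd ACK
  SYN RCVD -> ESTAB: rcv ACK of SYN
  SYN RCVD -> (tear-down): CLOSE / send FIN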

19 19 TL: TCP connections
Data transfer for established connections using sequence numbers and sliding windows with cumulative ACKs
Seq. #’s:
  –byte stream “number” of first byte in segment’s data
ACKs:
  –seq # of next byte expected from other side
  –cumulative ACK
  –duplicate acks sent when out-of-order packet received
See web trace
Java API: connectionSocket.receive(); clientSocket.send();
simple telnet scenario (Host A, Host B):
  User types ‘C’: A -> B: Seq=42, ACK=79, data = ‘C’
  host ACKs receipt of ‘C’, echoes back ‘C’: B -> A: Seq=79, ACK=43, data = ‘C’
  host ACKs receipt of echoed ‘C’: A -> B: Seq=43, ACK=80
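
The numbers in the telnet scenario follow directly from the two rules on this slide: the sequence number is the byte-stream number of the first data byte, and the ACK is the next byte expected from the other side. The sketch below replays the exchange; the `Endpoint` class is a hypothetical helper for illustration, and it ignores loss, reordering, and windows.

```python
class Endpoint:
    def __init__(self, next_seq: int, next_expected: int) -> None:
        self.next_seq = next_seq            # number of the next byte we send
        self.next_expected = next_expected  # next byte we expect from the peer

    def send(self, data: bytes = b"") -> dict:
        segment = {"seq": self.next_seq, "ack": self.next_expected, "data": data}
        self.next_seq += len(data)  # data bytes consume sequence numbers
        return segment

    def receive(self, segment: dict) -> None:
        # Only in-order data advances the cumulative ACK.
        if segment["seq"] == self.next_expected:
            self.next_expected += len(segment["data"])


def telnet_echo():
    """Replay the slide's exchange and return (seq, ack) for each segment."""
    host_a = Endpoint(next_seq=42, next_expected=79)
    host_b = Endpoint(next_seq=79, next_expected=42)
    trace = []

    typed = host_a.send(b"C")       # user types 'C'
    trace.append((typed["seq"], typed["ack"]))
    host_b.receive(typed)

    echo = host_b.send(b"C")        # B ACKs 'C' and echoes it back
    trace.append((echo["seq"], echo["ack"]))
    host_a.receive(echo)

    final = host_a.send()           # A ACKs the echo, no data
    trace.append((final["seq"], final["ack"]))
    return trace
```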

20 20 TL: TCP connections
Closing a connection: client-initiated close (reverse process for server-initiated close)
Java API: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
(Figure: client/server timeline with segments FIN, ACK, FIN and labels close, timed wait, closed)

21 21 TL: TCP connections
Step 3: client receives FIN, replies with ACK.
  –Enters “timed wait” - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
(Figure: client/server timeline with segments FIN, ACK, FIN and labels closing, timed wait, closed)

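22 22 TL: TCP Half-Close
(Figure: Sender/Receiver timeline)
  Sender -> Receiver: FIN
  Receiver -> Sender: FIN-ACK
  Receiver -> Sender: Data write
  Sender -> Receiver: Data ack
  Receiver -> Sender: FIN
  Sender -> Receiver: FIN-ACK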

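23 23 TL: TCP Connection Tear-down
(State diagram; transitions listed as: from -> to: event / action)
  ESTAB -> FIN WAIT-1: CLOSE / send FIN
  ESTAB -> CLOSE WAIT: rcv FIN / snd ACK
  FIN WAIT-1 -> FIN WAIT-2: rcv ACK of FIN
  FIN WAIT-1 -> CLOSING: rcv FIN / snd ACK
  FIN WAIT-1 -> TIME WAIT: rcv FIN+ACK / snd ACK
  FIN WAIT-2 -> TIME WAIT: rcv FIN / snd ACK
  CLOSING -> TIME WAIT: rcv ACK of FIN
  TIME WAIT -> CLOSED: Timeout=2msl / delete TCB
  CLOSE WAIT -> LAST-ACK: CLOSE / send FIN
  LAST-ACK -> CLOSED: rcv ACK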
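
The tear-down state diagram can be expressed as a lookup table and stepped through in code. The sketch below uses the standard RFC 793 transitions with the slide's state names; the event strings and the `run` helper are naming choices made for this illustration, not part of any real API.

```python
# (state, event) -> (next state, action or None)
TEARDOWN = {
    ("ESTAB", "CLOSE"): ("FIN WAIT-1", "snd FIN"),
    ("ESTAB", "rcv FIN"): ("CLOSE WAIT", "snd ACK"),
    ("FIN WAIT-1", "rcv ACK of FIN"): ("FIN WAIT-2", None),
    ("FIN WAIT-1", "rcv FIN"): ("CLOSING", "snd ACK"),
    ("FIN WAIT-1", "rcv FIN+ACK"): ("TIME WAIT", "snd ACK"),
    ("FIN WAIT-2", "rcv FIN"): ("TIME WAIT", "snd ACK"),
    ("CLOSING", "rcv ACK of FIN"): ("TIME WAIT", None),
    ("TIME WAIT", "timeout 2MSL"): ("CLOSED", "delete TCB"),
    ("CLOSE WAIT", "CLOSE"): ("LAST-ACK", "snd FIN"),
    ("LAST-ACK", "rcv ACK of FIN"): ("CLOSED", None),
}


def run(state, events):
    """Feed events to the table; return the final state and actions taken."""
    actions = []
    for event in events:
        state, action = TEARDOWN[(state, event)]  # KeyError = illegal event
        if action is not None:
            actions.append(action)
    return state, actions


# Side that closes first (active close) passes through TIME WAIT.
active = run("ESTAB", ["CLOSE", "rcv ACK of FIN", "rcv FIN", "timeout 2MSL"])
# Side that receives the first FIN (passive close) never enters TIME WAIT.
passive = run("ESTAB", ["rcv FIN", "CLOSE", "rcv ACK of FIN"])
# Simultaneous close: both FINs cross, so the path goes through CLOSING.
simultaneous = run("ESTAB", ["CLOSE", "rcv FIN", "rcv ACK of FIN", "timeout 2MSL"])
```

Only the side that closes first ends up in TIME WAIT, which is why it matters who initiates the close.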

24 24 TL: Time Wait Issues
Cannot close connection immediately after receiving FIN
  –What if a new connection restarts and uses same sequence number?
Web servers, not clients, close connection first
  –Established -> Fin-Waits -> Time-Wait -> Closed
  –Why would this be a problem?
Time-Wait state lasts for 2 * MSL
  –MSL should be 120 seconds (is often 60s)
  –Servers often have order of magnitude more connections in Time-Wait

25 25 TL: TCP connections TCP client lifecycle TCP server lifecycle

26 26 TL: TCP Demux to upper layer
Multiplexing: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)
multiplexing/demultiplexing: based on sender, receiver port numbers, IP addresses
  –source, dest port #s in each segment
  –recall: well-known port numbers for specific applications
  –Servers wait on well known ports (/etc/services)
TCP/UDP segment format (32 bits wide):
  source port # | dest port #
  other header fields
  application data (message)

27 27 TL: TCP Demux to upper layer
port use: simple telnet app
  host A -> server B: source port: x, dest. port: 23
  server B -> host A: source port: 23, dest. port: x
port use: Web server
  Web client host A -> Web server B: Source IP: A, Dest IP: B, source port: x, dest. port: 80
  Web client host C -> Web server B: Source IP: C, Dest IP: B, source port: x, dest. port: 80
  Web client host C -> Web server B: Source IP: C, Dest IP: B, source port: y, dest. port: 80
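
The Web server example works because TCP demultiplexes on the full (source IP, source port, dest IP, dest port) tuple, so three connections to port 80 stay distinct. A minimal sketch of that lookup follows; the `TcpDemux` class, the socket labels, and the concrete port numbers standing in for x and y are all hypothetical.

```python
from typing import Dict, Optional, Tuple

ConnKey = Tuple[str, int, str, int]  # (src IP, src port, dst IP, dst port)


class TcpDemux:
    def __init__(self) -> None:
        self.sockets: Dict[ConnKey, str] = {}

    def register(self, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int, socket_name: str) -> None:
        self.sockets[(src_ip, src_port, dst_ip, dst_port)] = socket_name

    def deliver(self, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int) -> Optional[str]:
        """Return the socket for an arriving segment, or None if no match."""
        return self.sockets.get((src_ip, src_port, dst_ip, dst_port))


X, Y = 1025, 1026  # stand-ins for the slide's source ports x and y
server_b = TcpDemux()
server_b.register("A", X, "B", 80, "socket for A")
server_b.register("C", X, "B", 80, "socket for C, first connection")
server_b.register("C", Y, "B", 80, "socket for C, second connection")
```

Hosts A and C may both pick source port x; the differing source IP keeps their segments apart.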

28 28 TL: TCP Flow control
TCP is a sliding window protocol
  –For window size n, can send up to n bytes without receiving an acknowledgement
  –When the data is acknowledged then the window slides forward
Each packet advertises a window size
  –Indicates number of bytes the receiver has space for
Original TCP always sent entire window
  –Congestion control now limits this

29 29 TL: TCP Flow control
flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast
receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
  –RcvWindow field in TCP segment
sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow
receiver buffering:
  –RcvBuffer = size of TCP Receive Buffer
  –RcvWindow = amount of spare room in Buffer
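
The receiver and sender rules above reduce to two small computations, sketched here. The byte-counter names (`last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, `last_byte_acked`) are not on the slide; they are assumed bookkeeping variables introduced for this example. The sender check is written as "does not exceed" the window.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer: total size minus unread data."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(nbytes: int, last_byte_sent: int, last_byte_acked: int,
             advertised_window: int) -> bool:
    """True if sending nbytes keeps unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


# Receiver holds 2000 unread bytes in a 4096-byte buffer.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
```

When the application reads from the buffer, `last_byte_read` grows and the advertised window opens up again.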

30 30 TL: TCP Flow control
What happens if window is 0?
  –Receiver updates window when application reads data
  –What if this update is lost? Deadlock
TCP Persist timer
  –Sender periodically sends window probe packets
  –Receiver responds with ACK and up-to-date window advertisement

31 31 TL: TCP flow control enhancements
Problem: (Clark, 1982)
  –If receiver advertises small increases in the receive window then the sender may waste time sending lots of small packets
What happens if window is small?
  –Small packet problem known as “Silly window syndrome”
     Receiver advertises one byte window
     Sender sends one byte packet (1 byte data, 40 byte header = 4000% overhead)

32 32 TL: TCP flow control enhancements
Solutions to silly window syndrome
Clark (1982)
  –receiver avoidance
  –prevent receiver from advertising small windows
  –advertise a larger receiver window only once it can grow by min(MSS, RecvBuffer/2)
Nagle’s algorithm (1984)
  –sender avoidance
  –prevent sender from unnecessarily sending small packets
  –http://www.rfc-editor.org/rfc/rfc896.txt
     “Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged”
  –Allow only one outstanding small (not full sized) segment that has not yet been acknowledged
  –Works for idle connections (no deadlock)
  –Works for telnet (send one-byte packets immediately)
  –Works for bulk data transfer (delay sending)
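
Nagle's rule, as summarized on this slide, comes down to one decision made whenever data is waiting: send a full-sized segment at any time, but send a small one only if nothing is outstanding. The sketch below shows just that decision. The function name and parameters are illustrative, and a real sender would also check the receiver and congestion windows, which are left out here.

```python
def nagle_should_send(queued_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Decide whether the sender may transmit a segment right now."""
    if queued_bytes <= 0:
        return False               # nothing to send
    if queued_bytes >= mss:
        return True                # full-sized segment: always allowed
    # Small segment: allowed only when no earlier data is still unACKed,
    # so at most one small segment is outstanding at a time.
    return unacked_bytes == 0


MSS = 1460  # a common Ethernet-derived value, assumed for the example
```

On an idle connection a single keystroke goes out immediately; further keystrokes are held back until the first is acknowledged, then sent together.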

33 33 TL: TCP reliable data transfer Segment integrity Acknowledgement generation Retransmission

34 34 TL: TCP RDT segment integrity
Checksum included in header
Is it sufficient to just checksum the packet contents?
No, need to ensure correct source/destination
  –Pseudoheader – portions of IP hdr that are critical
  –Checksum covers Pseudoheader, transport hdr, and packet body
  –Layer violation, redundant with parts of IP checksum

35 35 TL: TCP RDT acks and timeouts
TCP’s reliable data transfer approach
  –Cumulative acknowledgements
     Receiver sends back the byte number it expects to receive next
     Out of order packets generate duplicate acknowledgements
       –Receive 1, Ack 2
       –Receive 4, Ack 2
       –Receive 3, Ack 2
       –Receive 2, Ack 5
  –Retransmissions
     Sender sends segment and sets a timer
     Waits for an acknowledgement indicating segment was received
       –Send 1
       –Wait for Ack 2
       –No Ack 2 and timer expires
       –Send 1 again
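
The Receive/Ack sequence on this slide can be reproduced with a small receiver model. Like the slide, the sketch numbers whole segments (1, 2, 3, ...) rather than bytes, and it assumes the receiver buffers out-of-order segments; the function name is made up for the example.

```python
def cumulative_acks(arrivals, first_expected=1):
    """Return the ACK sent after each arriving segment number."""
    expected = first_expected   # next in-order segment we are waiting for
    buffered = set()            # out-of-order segments held for later
    acks = []
    for seq in arrivals:
        if seq >= expected:
            buffered.add(seq)
        # Advance past every segment that is now contiguous.
        while expected in buffered:
            buffered.remove(expected)
            expected += 1
        acks.append(expected)   # cumulative ACK: next segment expected
    return acks
```

Calling `cumulative_acks([1, 4, 3, 2])` gives the slide's ACKs 2, 2, 2, 5: the two middle ACKs are duplicates, and the last one jumps ahead once the gap is filled.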

36 36 TL: TCP RDT acks and timeouts
simplified sender, assuming
  –one way data transfer
  –no flow, congestion control
wait for event:
  –event: data received from application above -> create, send segment
  –event: timer timeout for segment with seq # y -> retransmit segment
  –event: ACK received, with ACK # y -> ACK processing

37 37 TL: TCP RDT acks and timeouts
Simplified TCP sender
sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number nextseqnum
    start timer for segment nextseqnum
    pass segment to IP
    nextseqnum = nextseqnum + length(data)
  event: timer timeout for segment with sequence number y
    retransmit segment with sequence number y
    compute new timeout interval for segment y
    restart timer for sequence number y
  event: ACK received, with ACK field value of y
    if (y > sendbase) { /* cumulative ACK of all data up to y */
      cancel all timers for segments with sequence numbers < y
      sendbase = y
    }
    else { /* a duplicate ACK for already ACKed segment */
      increment number of duplicate ACKs received for y
      if (number of duplicate ACKs received for y == 3) {
        /* TCP fast retransmit */
        resend segment with sequence number y
        restart timer for segment y
      }
    }
} /* end of loop forever */
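
A runnable approximation of the simplified sender is sketched below, with one method per event. It makes several simplifying assumptions that are not on the slide: timers are modeled implicitly (a segment "has a timer" while it is in `unacked`), timeout intervals are not computed, sequence numbers do not wrap, and `wire` is a hypothetical log standing in for "pass segment to IP".

```python
class SimplifiedSender:
    def __init__(self, initial_seq: int) -> None:
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}    # seq -> data for segments awaiting an ACK
        self.dup_acks = {}   # ACK value -> number of duplicates seen
        self.wire = []       # (seq, data) handed to IP, in order

    def on_app_data(self, data: bytes) -> None:
        seq = self.nextseqnum
        self.unacked[seq] = data          # stands in for starting a timer
        self.wire.append((seq, data))
        self.nextseqnum += len(data)

    def on_timeout(self, y: int) -> None:
        if y in self.unacked:             # ignore timers of ACKed segments
            self.wire.append((y, self.unacked[y]))

    def on_ack(self, y: int) -> None:
        if y > self.sendbase:
            # Cumulative ACK: everything below y has arrived.
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.sendbase = y
        else:
            # Duplicate ACK for data that was already acknowledged.
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3 and y in self.unacked:
                self.wire.append((y, self.unacked[y]))  # fast retransmit


sender = SimplifiedSender(initial_seq=92)
sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_ack(100)               # first segment acknowledged
for _ in range(3):
    sender.on_ack(100)           # three duplicates trigger fast retransmit
```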

38 38 TL: TCP delayed acknowledgements
Problem:
  –In request/response programs, you send separate ACK and Data packets for each transaction
Delay ACK in order to send ACK back along with data
Solution:
  –Don’t ACK data immediately
     Wait 200ms (must be less than 500ms – why?)
     Must ACK every other packet
     Must not delay duplicate ACKs
  –Without delayed ACK: 40 byte ack + data packet
  –With delayed ACK: data packet includes ACK
  –See web trace example
  –Extensions for asymmetric links
     See later part of lecture

39 39 TL: TCP ACK generation [RFC 1122, RFC 2581]
Event: in-order segment arrival, no gaps, everything else already ACKed
  TCP Receiver action: delayed ACK. Wait up to 200ms for next segment. If no next segment, send ACK
Event: in-order segment arrival, no gaps, one delayed ACK pending
  TCP Receiver action: immediately send single cumulative ACK
Event: out-of-order segment arrival, higher-than-expect seq. #, gap detected
  TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte
Event: arrival of segment that partially or completely fills gap
  TCP Receiver action: immediate ACK if segment starts at lower end of gap

40 40 TL: TCP retransmission
Wait at least one RTT before retransmitting packet
Importance of accurate RTT estimators:
  –Estimator too low -> unneeded retransmissions
  –Estimator too high -> poor throughput, slow reaction to segment loss
RTT estimator must adapt to change in RTT
  –But not too fast, or too slow!
Backing off the retransmission timeout
  –Exponential backoff
  –Double retransmission timer interval after every loss until successful retransmission

41 41 TL: TCP retransmission scenarios
lost ACK scenario (Host A, Host B):
  A -> B: Seq=92, 8 bytes data
  B -> A: ACK=100 (loss, X)
  A: timeout
  A -> B: Seq=92, 8 bytes data
  B -> A: ACK=100
premature timeout, cumulative ACKs (Host A, Host B):
  A -> B: Seq=92, 8 bytes data
  A -> B: Seq=100, 20 bytes data
  B -> A: ACK=100
  B -> A: ACK=120
  A: Seq=92 timeout
  A -> B: Seq=92, 8 bytes data
  A: Seq=100 timeout
  B -> A: ACK=120

42 42 TL: Initial Round-trip Estimator
Round trip times exponentially averaged:
  EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
  –Recommended value for x: 0.1-0.2
     0.125 for most TCP’s
  –Influence of given sample decreases exponentially fast
Retransmit timer set to β*RTT, where β = 2
  –Every time timer expires, RTO exponentially backed-off
  –Like Ethernet
Not good at preventing spurious timeouts

43 43 TL: Jacobson’s Retransmission Timeout
Key observation:
  –At high loads round trip variance is high
  –Need larger safety margin with larger variations in RTT
Solution:
  –Base RTO value on RTT and its standard deviation

44 44 TL: Jacobson’s Retransmission Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
Setting the timeout
  –EstimatedRTT plus “safety margin”
  –large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
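
The three formulas on this slide can be combined into a small estimator, sketched below. Two choices are assumptions rather than something the slide states: the deviation starts at 0, and each update measures the error against the estimate from before the sample is folded in (as in Jacobson's original formulation). The class name is illustrative.

```python
class RttEstimator:
    def __init__(self, first_sample: float, x: float = 0.125) -> None:
        self.x = x
        self.estimated_rtt = first_sample
        self.deviation = 0.0   # assumed starting value

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new Timeout."""
        error = abs(sample_rtt - self.estimated_rtt)  # uses the old estimate
        self.deviation = (1 - self.x) * self.deviation + self.x * error
        self.estimated_rtt = (1 - self.x) * self.estimated_rtt + self.x * sample_rtt
        return self.timeout()

    def timeout(self) -> float:
        return self.estimated_rtt + 4 * self.deviation


estimator = RttEstimator(first_sample=100.0)   # milliseconds
first_timeout = estimator.update(200.0)        # one unusually slow sample
```

With a first sample of 100 ms and one outlier of 200 ms, EstimatedRTT moves only to 112.5 ms while Deviation rises to 12.5 ms, giving a Timeout of 162.5 ms: the safety margin widens faster than the average.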

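45 45 TL: Retransmission Ambiguity
(Figure: two A-B timelines. In each, an original transmission is followed, after the RTO, by a retransmission and then an ACK; in one timeline a transmission is lost (X). The Sample RTT is ambiguous because the sender cannot tell whether the ACK answers the original transmission or the retransmission.)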

46 46 TL: Karn’s algorithm
Accounts for retransmission ambiguity
If a segment has been retransmitted:
  –Don’t count RTT sample on ACKs for this segment
  –Keep backed off time-out for next packet
  –Reuse RTT estimate only after one successful transmission
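
The three rules of Karn's algorithm can be sketched as a small timer object. This is an illustration under stated assumptions: the class and method names are invented, and recomputing the RTO as twice the clean sample is a simple stand-in for whatever RTT estimator is actually in use.

```python
class KarnTimer:
    def __init__(self, rto: float) -> None:
        self.rto = rto
        self.samples = []   # RTT samples accepted for estimation

    def on_timeout(self) -> None:
        """Retransmission timer expired: back off exponentially."""
        self.rto *= 2

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            # Ambiguous sample: discard it and keep the backed-off RTO.
            return
        # Clean sample from a segment sent exactly once: use it again.
        self.samples.append(sample_rtt)
        self.rto = 2 * sample_rtt   # assumed stand-in for the estimator


timer = KarnTimer(rto=1.0)
timer.on_timeout()                               # RTO backs off to 2.0
timer.on_ack(0.1, was_retransmitted=True)        # ignored, RTO stays 2.0
rto_after_ambiguous_ack = timer.rto
timer.on_ack(0.3, was_retransmitted=False)       # accepted, RTO recomputed
```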

47 47 TL: Timer Granularity
Many TCP implementations set RTO in multiples of 200, 500, or 1000 ms
Why?
  –Avoid spurious timeouts – RTTs can vary quickly due to cross traffic
  –Make timer interrupts efficient

