Published by Micah Goldwyn. Modified over 10 years ago.
1
How to use it
Press Space to advance the slide animation. Don't hurry to press Space again: wait for the animation to end. If you want to go back, use the PgUp key.
Topics still to be added: Proxy ARP; incoming connection request queue; interactive data flow; Push flag; urgent mode; the IP header in detail, broken down by bits, with its maximum length in bytes; repacketization (p. 320); ICMP errors (p. 317); exponential backoff (p. 299); Path MTU discovery for UDP (118); TCP flags.
Version: 08 June 1999. Come back later: the presentation is under construction now.
2
Encapsulation data into Ethernet packet
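Each layer adds its own header to the data handed down from the layer above:
User data
Application data = application header + user data
TCP segment = TCP header + application data
IP datagram = IP header + TCP header + application data
Ethernet frame = Ethernet header + IP header + TCP header + application data + Ethernet trailer
The part of the Ethernet frame between the Ethernet header and the Ethernet trailer is 46 to 1500 bytes long.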
3
IEEE 802.2/802.3 Encapsulation (RFC 1042)
Frame layout (field sizes in bytes):
802.3 MAC: destination address (6), source address (6), length (2)
802.2 LLC: DSAP 0xAA (1), SSAP 0xAA (1), cntl 0x03 (1)
802.2 SNAP: org code 00 (3), type (2)
DATA, then CRC (4)

Contents of DATA by type:
Type 0800 (2): IP datagram
Type 0806 (2): ARP request/reply (28) + PAD (10)
Type 8035 (2): RARP request/reply (28) + PAD (10)

LENGTH contains the length of the packet from the next byte up to the CRC (the CRC isn't included).
DSAP (Destination Service Access Point) and SSAP (Source Service Access Point) are both set to 0xAA.
CNTL (Control field) is set to 3.
ORG CODE is always 0 in all 3 bytes.
TYPE identifies the data that follows. For example, type 0x0800 (hex) identifies that an IP datagram follows.
4
Ethernet Encapsulation (RFC 894)
Frame layout (field sizes in bytes):
destination address (6), source address (6), type (2), DATA, CRC (4)

Contents of DATA by type:
Type 0800 (2): IP datagram
Type 0806 (2): ARP request/reply (28) + PAD (18)
Type 8035 (2): RARP request/reply (28) + PAD (18)
5
IP packet structure
Header layout (bit positions 0-31 per row):
4-bit version | 4-bit IHL | 8-bit TOS | 16-bit total packet length
16-bit identification | 3-bit flags | 13-bit fragment offset
8-bit TTL | 8-bit protocol | 16-bit header checksum
32-bit source address
32-bit destination address
Options (+ padding)
DATA

VERSION. The current protocol version is 4.
IHL (IP header length) is the number of 32-bit words in the IP header. This field is 4 bits long, so the maximum header length is 15 words = 60 bytes.
TOS (type of service) consists of 3 precedence bits (ignored), 4 TOS bits, and an unused bit which must be 0. The 4 TOS bits are: minimize delay, maximize throughput, maximize reliability, minimize monetary cost. Only 1 of these 4 bits can be turned on.
TPL (total packet length) is the total length of the IP packet in bytes. Since the field is 16 bits long, the maximum length of an IP packet is 65535 bytes.
IDENTIFICATION is used when IP needs to fragment datagrams. It identifies each datagram and is incremented each time a datagram is sent. We'll see the meaning of this field when we talk about fragmentation.
FLAGS and FRAGMENT OFFSET we'll also see when we talk about fragmentation.
Continue...
6
IP packet structure (continued)
TTL (time-to-live) sets an upper limit on the number of routers through which a datagram can pass. The field is decremented each time the datagram passes through a router. When the field reaches 0, the datagram is dropped by the router and an ICMP message is sent to the datagram's sender.
PROTOCOL identifies the DATA portion of the datagram (which protocol is encapsulated in the IP datagram).
HEADER CHECKSUM is calculated over the IP header only.
SOURCE and DESTINATION addresses are the sender's and receiver's IP addresses.
OPTIONS is a variable-length field which contains some options. We'll discuss some of them later. The options field always ends on a 32-bit boundary. PAD bytes (value 0) are added if necessary.
DATA is the data.
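The header fields described on these two slides can be packed and unpacked with a few lines of code. The following is a minimal sketch, not a reference implementation: the function names `ip_checksum`, `build_header` and `parse_header` are hypothetical, the header carries no options (IHL = 5), and the default TTL, protocol and identification values are arbitrary assumptions.

```python
import struct

# Fixed 20-byte IP header: ver/IHL, TOS, total length, identification,
# flags + fragment offset, TTL, protocol, checksum, source, destination.
HEADER_FORMAT = "!BBHHHBBH4s4s"


def ip_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit words of the IP header only."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_header(src: bytes, dst: bytes, payload_len: int,
                 ttl: int = 64, protocol: int = 17, ident: int = 1) -> bytes:
    """Build a header without options (IHL = 5 words = 20 bytes)."""
    ver_ihl = (4 << 4) | 5
    total_len = 20 + payload_len
    header = struct.pack(HEADER_FORMAT, ver_ihl, 0, total_len, ident,
                         0, ttl, protocol, 0, src, dst)
    checksum = ip_checksum(header)  # computed with the checksum field set to 0
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def parse_header(data: bytes) -> dict:
    """Split the first 20 bytes of a datagram into the fields shown on the slide."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(HEADER_FORMAT, data[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "tos": tos,
        "total_len": total_len,
        "identification": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "src": src,
        "dst": dst,
    }
```

A receiver can verify a header by running `ip_checksum` over all 20 bytes, checksum field included: a result of 0 means the header arrived intact.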
7
Special case IP addresses
IP address classes Class Range A to B to C to D to Multicast E to
8
ARP and RARP
32-bit IP address -> ARP -> 48-bit Ethernet address
48-bit Ethernet address -> RARP -> 32-bit IP address
ARP. Suppose we are working on an Ethernet network. The Ethernet driver and adapter use MAC addresses, while TCP/IP uses IP addresses. When a host wants to send data to another host, it knows only the receiver's IP address and passes this information to the TCP/IP stack. The TCP/IP stack therefore needs a mechanism that maps between MAC and IP addresses. IP has two algorithms to solve this.
RARP. If a system has no hard or floppy drive and must boot from the network, it can't take its IP address from local resources. Such a system has only a MAC address. RARP is the algorithm that allows a system to obtain its IP address from the network.
9
ARP
On the sending host:
1. IP: send an IP datagram to an IP address.
2. ARP: resolve the IP address to a hardware address. Do I know the hardware address?
   Yes: hand the datagram to the Ethernet driver.
   No: send an ARP request through the Ethernet driver.
On every other host on the network:
3. The Ethernet driver passes the ARP request to ARP: is somebody looking for my address?
   No: ignore the request.
   Yes: send an ARP reply.
10
RARP
Diskless workstation: boot, read own hardware network address, send RARP request.
RARP server: somebody wants to have an IP address! Give that system an IP address from my table and send a RARP reply.
Diskless workstation: I have an IP address!
11
ARP packet
Ethernet header (field sizes in bytes): dest address (6), source address (6), type (2)
ARP request/reply: hard type (2), prot type (2), hard size (1), prot size (1), op (2), sender Ethernet address (6), sender IP address (4), target Ethernet address (6), target IP address (4)

dest address: broadcast (for an ARP request)
type: 0x806
hardware type: specifies the type of hardware address. 1 for Ethernet
protocol type: 0x800 for IP
hardware size: size of the hardware address. 6 for Ethernet
protocol size: size of the protocol address. 4 for IP
op: type of operation. ARP request = 1, ARP reply = 2, RARP request = 3, RARP reply = 4
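As an illustration of the field layout above, here is a minimal sketch that assembles an ARP request frame. The function name `build_arp_request` is hypothetical, the target Ethernet address is filled with zeros as a placeholder (an assumption, since the slide doesn't say what it holds in a request), and the trailing pad and CRC are left out.

```python
import struct

BROADCAST = b"\xff" * 6  # Ethernet broadcast destination


def build_arp_request(sender_mac: bytes, sender_ip: bytes, target_ip: bytes) -> bytes:
    """Ethernet header (14 bytes) followed by the 28-byte ARP request."""
    ethernet_header = BROADCAST + sender_mac + struct.pack("!H", 0x0806)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IP
        6,            # hardware address size
        4,            # protocol address size
        1,            # op: ARP request
        sender_mac,
        sender_ip,
        b"\x00" * 6,  # target Ethernet address: unknown, zero placeholder
        target_ip,
    )
    return ethernet_header + arp_body
```

For example, `build_arp_request(bytes.fromhex("0800200a0b0c"), bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))` yields 42 bytes: the 14-byte Ethernet header plus the 28-byte ARP request shown on the slide.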
12
ICMP - Internet Control Message Protocol RFC 792 packet structure
IP datagram = IP header (20 bytes) + ICMP message
ICMP message:
8-bit type | 8-bit code | 16-bit checksum (for the entire ICMP message). These fields are the same for all types of messages.
The rest of the message: contents depend on type and code.
13
ICMP address mask request and reply
Address mask request and reply (12 bytes):
type (17 = request, 18 = reply) | code (0) | 16-bit checksum (for the entire ICMP message)
identifier (anything) | sequence number (anything)
subnet mask

ICMP timestamp request and reply (20 bytes):
type (13 = request, 14 = reply) | code (0) | 16-bit checksum (for the entire ICMP message)
identifier (anything) | sequence number (anything)
32-bit originate timestamp
32-bit receive timestamp
32-bit transmit timestamp
14
ICMP port unreachable error
Frame layout (sizes in bytes):
Ethernet header (14) | IP header (20) | ICMP header (8) | IP header of the datagram that generated the error (20) | UDP header (8)
The IP datagram starts at the first IP header, the ICMP message at the ICMP header, and the data portion of the ICMP message after the ICMP header.
The data portion must include the IP header of the datagram that generated the error and at least the 8 bytes that followed this IP header. In this example those bytes are the UDP header.

General format of the ICMP unreachable message:
type (3) | code (0-15) | 16-bit checksum (for the entire ICMP message)
unused (must be 0)
IP header including options + first 8 bytes of the original IP datagram data
The first two rows take 8 bytes.
15
ICMP echo request and echo reply (PING)
Client: I want to know whether the server is alive. Send echo request.
Server: I received a "ping" to my address. Answer the client: send echo reply.
Client: the server is alive.

Packet format (the first two rows take 8 bytes):
type (0 = reply, 8 = request) | code (0) | 16-bit checksum (for the entire ICMP message)
identifier | sequence number
optional data

identifier: the process ID of the sending process
sequence number: starts at 0 and is incremented every time a new echo request is sent
The server must echo the identifier and sequence number fields in its reply.
Historically, ping has operated in a mode where it sends an echo request once a second.
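The echo packet format above can be sketched in code as follows. This is only an illustration: `inet_checksum`, `build_echo_request` and `build_echo_reply` are hypothetical names, and the checksum routine is the usual one's-complement sum of 16-bit words computed over the entire ICMP message.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """One's-complement checksum over the entire ICMP message."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length messages for the 16-bit sum
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def _build(msg_type: int, identifier: int, sequence: int, payload: bytes) -> bytes:
    # Compute the checksum with the checksum field set to 0, then fill it in.
    header = struct.pack("!BBHHH", msg_type, 0, 0, identifier, sequence)
    checksum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", msg_type, 0, checksum, identifier, sequence) + payload


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Type 8, code 0."""
    return _build(8, identifier, sequence, payload)


def build_echo_reply(request: bytes) -> bytes:
    """Type 0, code 0; identifier, sequence number and data are echoed back."""
    _type, _code, _checksum, identifier, sequence = struct.unpack("!BBHHH", request[:8])
    return _build(0, identifier, sequence, request[8:])
```

The reply is built only from the request, which mirrors the rule that the server must return the identifier and sequence number unchanged.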
16
IP record route option (ping -r option)
The client sends an echo request with the -r option through Router 1, Router 2 and Router 3 to the server, and the server sends an echo reply back. Routers put the IP addresses of their outgoing interfaces into the RR option of the packet. The server's address is recorded as well, and the recording continues on the way back.

IP option layout (sizes in bytes): code (1) | len (1) | ptr (1) | recorded IP addresses (4 each)
CODE is a 1-byte field specifying the type of IP option. For the RR option its value is 7.
LEN is the total number of bytes of the RR option. Ping always provides a 39-byte option, to record up to 9 IP addresses, which is the maximum.
PTR points to where the next IP address is stored. It starts at 4 and grows by 4 with each recorded address: 4, 8, 12, 16, 20, 24, 28, ...
There is limited room in the IP header for the list of IP addresses, because the entire IP header is limited to 15 32-bit words (60 bytes). Only up to 40 bytes are left for the options field in the IP header.
17
BROADCASTING
Four types of IP broadcast
Limited broadcast (255.255.255.255): never forwarded by a router.
Net-directed broadcast (netid with a host ID of all 1 bits): routers forward this kind of broadcast. The broadcast is addressed to the whole IP network netid.
Subnet-directed broadcast (host ID of all 1 bits): a broadcast for a specific subnet. Knowledge of the subnet mask is required.
All-subnets-directed broadcast (subnet ID of all 1 bits and host ID of all 1 bits): knowledge of the subnet mask is required. If the network is subnetted, this is an all-subnets-directed broadcast. If the network isn't subnetted, this is a net-directed broadcast.
18
MULTICASTING
Addressing
Do you remember? Class D addresses are multicast addresses.
Here is the format of a class D IP address: the first four bits are 1110, followed by a 28-bit multicast group ID. The first byte therefore ranges from 11100000 = 224 to 11101111 = 239.
Note: on an Ethernet, a multicast address has the low-order bit of the first byte set (01:00:00:00:00:00).
The set of hosts listening to a particular IP multicast address is called a host group. A host group can span multiple networks. Membership in a host group is dynamic: hosts may join and leave host groups at will. There is no restriction on the number of hosts in a host group, and a host does not have to belong to a group to send a message to that group.
19
MULTICASTING
Converting multicast group addresses to Ethernet addresses
The Ethernet addresses corresponding to IP multicasting are in the range 01:00:5e:00:00:00 through 01:00:5e:7f:ff:ff.
We have 23 bits in the Ethernet address to correspond to the IP multicast group ID. The mapping places the low-order 23 bits of the multicast group ID into these 23 bits of the 48-bit Ethernet address. The upper 5 bits of the multicast group ID are not used to form the Ethernet address.
Since the upper 5 bits of the multicast group ID are ignored in this mapping, it is not unique: 32 different multicast group IDs map to the same Ethernet address (binary 11111 = 31).
The device driver or the IP software must perform filtering, since the interface card may receive multicast frames in which the host is really not interested.
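The mapping described above is a single mask-and-OR operation. A minimal sketch, with the hypothetical function name `multicast_ip_to_mac`, could look like this:

```python
import ipaddress


def multicast_ip_to_mac(ip: str) -> str:
    """Copy the low-order 23 bits of a class D address into 01:00:5e:00:00:00."""
    address = ipaddress.IPv4Address(ip)
    if not address.is_multicast:
        raise ValueError(f"{ip} is not a class D (multicast) address")
    low23 = int(address) & 0x7FFFFF  # the upper 5 bits of the group ID are dropped
    mac = 0x01005E000000 | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
```

Because the upper 5 bits of the group ID are discarded, `224.0.0.1` and `224.128.0.1` both map to `01:00:5e:00:00:01`, which is why the driver or IP software still has to filter received frames.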
20
IGMP reports and queries
(Internet Group Management Protocol)
[Animated diagram: a host with Process 1 and Process 2 and its list of multicast groups on interface 1, and a multicast router that tracks which groups are alive on the interface.]
Events shown in the animation:
- A process joins group 1. The host sends an IGMP report (dest IP = group IP), waits for a random timer (for example, 2 seconds) and sends another IGMP report.
- A process joins group 2. The host reports group 2 in the same way and repeats the report after a random timer (for example, 3 seconds).
- The router's timer expires and it sends an IGMP query with group IP = 0.
- For each group it belongs to, the host waits for 0-10 seconds and then sends an IGMP report (dest IP = group IP). The router notes: group 1 alive, group 2 alive.
- If group 1 has already been reported, the host reports group 2 only.
- After a process leaves group 2, the host doesn't report group 2 next time.
21
IGMP packet
IP datagram = IP header (20 bytes) + IGMP message (8 bytes)
IGMP message (8 bytes):
4-bit IGMP version (1) | 4-bit IGMP type (1-2) | unused (8 bits) | 16-bit checksum
32-bit group address (class D IP address)

Version: 1
Type: 1 = query sent by a multicast router, 2 = response sent by a host
Group address: a class D IP address. For a query the address is set to 0.
22
UDP
23
UDP packet
Header layout (bit positions 0-31 per row):
16-bit source port | 16-bit destination port
16-bit UDP length | 16-bit UDP checksum
DATA (if any)
24
TFTP Trivial File Transfer Protocol
Packet types
IP datagram = IP header (20 bytes) + UDP datagram
UDP datagram = UDP header (8 bytes) + TFTP message

TFTP messages (field sizes in bytes):
Requests: opcode 1 = RRQ or 2 = WRQ (2) | filename (N) | 0 (1) | mode (N) | 0 (1)
Data packet: opcode 3 = data (2) | block number (2) | data (0-512)
ACK packet: opcode 4 = ACK (2) | block number (2)
Error packet: opcode 5 = error (2) | error number (2) | error message (N) | 0 (1)

Mode: netascii or octet
25
TFTP operations
The client process needs the file "File" from the server.
1. Client: read request for "File". Opcode 1, dest UDP port 69, source UDP port chosen by the application.
2. Server: can the file be read by the client? Yes.
3. Server: file transfer, opcode 3, block number 1, 512 bytes. Dest UDP port is the application's port, and the source UDP port is a new port number appointed by the TFTP server for this file transfer. These port numbers are used for the rest of the file transfer.
4. Client: block 1 received. ACK, opcode 4, block number 1.
5. Server: file transfer, opcode 3, block number 2, 356 bytes (the last block of "File").
6. Client: block 2 received. The data size is less than 512 bytes, so this is the last block of the file. ACK, opcode 4, block number 2.

To write a file, the client sends a WRQ. If all is OK, the server responds with an ACK with block number 0. And so on.
Error messages. The server responds with this type of packet if a read request or a write request can't be processed. A read or write error during file transmission can also cause this message to be sent, and the transmission is then terminated.
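The TFTP messages used in this exchange are simple enough to build by hand. The sketch below uses hypothetical helper names (`rrq`, `data_packet`, `ack`, `is_last_block`) and assumes octet mode by default; it only constructs the byte strings and does not open any sockets.

```python
import struct

BLOCK_SIZE = 512  # a full data block; a shorter block ends the transfer


def rrq(filename: str, mode: str = "octet") -> bytes:
    """Read request: opcode 1, filename, 0 byte, mode, 0 byte."""
    return struct.pack("!H", 1) + filename.encode("ascii") + b"\x00" + mode.encode("ascii") + b"\x00"


def data_packet(block: int, payload: bytes) -> bytes:
    """Data packet: opcode 3, block number, 0-512 bytes of data."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("a TFTP data block carries at most 512 bytes")
    return struct.pack("!HH", 3, block) + payload


def ack(block: int) -> bytes:
    """ACK packet: opcode 4, block number."""
    return struct.pack("!HH", 4, block)


def is_last_block(packet: bytes) -> bool:
    """A data packet with fewer than 512 data bytes is the last block of the file."""
    opcode = struct.unpack("!H", packet[:2])[0]
    return opcode == 3 and len(packet) - 4 < BLOCK_SIZE
```

With these helpers the transfer on the slide reads as: the client sends `rrq("File")`, the server answers with `data_packet(1, ...)` carrying 512 bytes, the client sends `ack(1)`, and the 356-byte block 2 makes `is_last_block` return true.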
26
BOOTP: Bootstrap Protocol
BOOTP packet format
IP datagram = IP header (20 bytes) + UDP datagram
UDP datagram = UDP header (8 bytes) + BOOTP request/reply (300 bytes)
27
BOOTP datagram (300 bytes, bit positions 0-31 per row)
opcode (1 byte) | hardware type (1 byte) | hardware address length (1 byte) | hop count (1 byte)
transaction ID
number of seconds | unused
client IP address
your IP address
server IP address
gateway IP address
client hardware address (16 bytes)
server hostname (64 bytes)
boot filename (128 bytes)
vendor-specific information (64 bytes)

Opcode: request or reply
H type: 1 for Ethernet
H addr length: 6 for Ethernet
Hop count: set to 0 by the client
Trans ID: set by the client and returned by the server
Number of seconds: set by the client
Client IP: set by the client. If the client doesn't have an address, it is set to 0
Your IP: filled in by the server with the client's IP address
Server IP: filled in by the server
Gateway IP: filled in by a proxy server, if there is one
Client H address: must be set by the client
Server hostname: a null-terminated string that is optionally filled in by the server
Boot filename: a fully qualified, null-terminated pathname of a file to bootstrap from
28
BOOTP
Port numbers: server 67, client 68.

Vendor-specific information
If information is provided in the vendor-specific field, the first 4 bytes of this area are set to a fixed IP address value. This is called the magic cookie.
The rest of the area is a list of items. Most items start with a 1-byte tag and a 1-byte length.
Pad: a 1-byte tag only.
End: a 1-byte tag with value 255. It marks the end of the items. Any bytes after this should be set to 255.
Examples:
Subnet mask: tag (1 byte) | length = 4 (1 byte) | subnet mask (4 bytes)
Gateway: tag (1 byte) | length = N (1 byte) | IP address of preferred gateway (4 bytes) | ... | IP address of another gateway (4 bytes)
29
BOOTP operations
[Animated diagram: boot process between a client (BOOTP process, UDP port 68) and a BOOTP server (UDP port 67).]
1. Client's request: sent from UDP port 68 to dest UDP port 67. The animation shows the request and the server's reply being exchanged several times.
2. Server's reply: your IP, server IP, gateway IP, and boot file name BFILE.
3. Client, receiving the information: is my IP address unique? It sends an ARP request to see if anyone else on the network has the same address. The client sends a second ARP request 0.5 second later, and a third ARP request 0.5 second after that. In the third ARP request the source IP address is the client's own address.
4. Nobody answers: my IP address is unique!
5. Client: ARP request "who is the server". The server's ARP reply carries the server's hardware address.
6. TFTP: the client reads the boot file BFILE from the server.
7. Client: I have an IP address, I have a loadable image. I can start!
30
TCP
31
TCP packet
Header layout (bit positions 0-31 per row):
16-bit source port | 16-bit destination port
32-bit sequence number
32-bit acknowledgment number
header length (4 bits) | reserved (6 bits) | flags (6 bits) | 16-bit window
16-bit TCP checksum | 16-bit urgent pointer
Options (+ padding)
DATA
The MSS option is used only in SYN packets.
32
TCP sequence and acknowledgement
1. Client: send 10 bytes. SEQ 10, no ACK.
2. Server: receiving SEQ 10 and 10 bytes. ACK = 10 (SEQ) + 10 bytes = 20. Send my own data with my own SEQ and ACK = 20: send 20 bytes, SEQ 30, ACK 20.
3. Client: receiving SEQ 30, DATA 20, ACK 20. The server received my data, its ACK = 20. My ACK = 30 + 20 = 50. My current SEQ = previous SEQ plus data sent = 10 + 10 = 20. Send 10 bytes, SEQ 20, ACK 50.
4. Server: receiving SEQ 20, DATA 10, ACK 50. The client received my data, its ACK = 50. My ACK = 20 + 10 = 30. My current SEQ = previous SEQ plus data sent = 30 + 20 = 50. Send 20 bytes, SEQ 50, ACK 30.
And so on.
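The bookkeeping in this exchange follows two rules: the ACK a side sends is the received SEQ plus the number of bytes received, and a side's next SEQ is its previous SEQ plus the number of bytes it sent. A minimal sketch of those two rules, with a hypothetical `Endpoint` class and `run_exchange` helper, reproduces the numbers on the slide:

```python
class Endpoint:
    """Tracks one side's next sequence number and the ACK it owes the peer."""

    def __init__(self, seq: int):
        self.seq = seq   # sequence number of the next byte to send
        self.ack = None  # next byte expected from the peer (None: nothing received yet)

    def send(self, nbytes: int) -> dict:
        segment = {"seq": self.seq, "ack": self.ack, "len": nbytes}
        self.seq += nbytes  # current SEQ = previous SEQ plus data sent
        return segment

    def receive(self, segment: dict) -> None:
        self.ack = segment["seq"] + segment["len"]  # ACK = received SEQ + bytes


def run_exchange() -> list:
    """Replay the four segments from the slide and return them in order."""
    client, server = Endpoint(seq=10), Endpoint(seq=30)
    trace = []
    for sender, receiver, nbytes in [(client, server, 10), (server, client, 20),
                                     (client, server, 10), (server, client, 20)]:
        segment = sender.send(nbytes)
        receiver.receive(segment)
        trace.append(segment)
    return trace
```

The sketch ignores flags, windows, loss and retransmission; it only shows how the two counters advance.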
33
TCP connection establishment
Client (active open), ISN = 145. Server (passive open), ISN = 348. ISN is the initial sequence number.
1. Client: send a packet with the S (SYN) flag (a SYN segment). The packet contains the port number of the server that the client wants to connect to. SEQ 145, ACK -, flags S.
2. Server: receiving the packet. Respond with its own SYN segment containing its own SN and an ACK for the client's SYN plus one (a SYN consumes one sequence number): ACK = 145 + 1 = 146. SEQ 348, ACK 146, flags SA.
3. Client: receiving the server's response. The server's response contains the correct ACK. Acknowledge the server's SYN with ACK = server's SN + 1 = 348 + 1 = 349. ACK 349, flags A.
The connection establishment is completed.
These three segments complete the connection establishment. This is often called the three-way handshake.
34
TCP connection termination
A TCP connection is full duplex, and each direction must be shut down independently.
Client: active close. Server: passive close.
1. Client: the user types "quit", for example. Send FIN, a packet with the FIN flag. The next ACK should be, for example, 426 and my own SN must be 658: SEQ 658, ACK 426, flags FA.
2. Server: receiving the FIN packet. Respond with the corresponding ACK: ACK 659, flags A.
   The connection is now half-closed. The server may still be sending data to the client, with corresponding ACKs. Then the server closes the other direction of the connection.
3. Server: I should close the second direction. SEQ 426, ACK 659, flags FA.
4. Client: receiving the FIN packet. Respond with the corresponding ACK: ACK 427, flags A.
The connection is closed.
35
TCP states for connection establishment and termination
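Connection establishment (client: active open, server: passive open):
1. Client sends SYN J and enters SYN_SENT.
2. Server enters SYN_RCVD and sends SYN K, ack J+1.
3. Client enters ESTABLISHED and sends ack K+1.
4. Server enters ESTABLISHED.

Connection termination (client: active close, server: passive close):
1. Client sends FIN M and enters FIN_WAIT_1.
2. Server enters CLOSE_WAIT and sends ack M+1.
3. Client enters FIN_WAIT_2.
4. Server sends FIN N and enters LAST_ACK.
5. Client enters TIME_WAIT and sends ack N+1. The client stays in this state for twice the MSL.
6. Server enters CLOSED.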
36
2MSL state, quiet time and reset segments
2MSL state
While a connection is in the 2MSL wait, all datagrams received for it are discarded. It is impossible to open another connection for the same socket pair (IP tuple).
Quiet time
Suppose a host in the 2MSL wait crashes, reboots within MSL seconds, and immediately establishes new connections using the same local and foreign IP addresses and port numbers. Delayed segments from the old connections could then be taken for segments of the new ones. To protect against this scenario, RFC 793 states that TCP should not create any connections for MSL seconds after rebooting. This is called the quiet time.
Reset segments
A reset segment is a segment with the "reset" bit in the TCP header set to 1. Any queued data is thrown away and the reset is sent immediately. The receiver of the RST can tell that the other end did an abort instead of a normal close.
FIN: orderly release. RST: abortive release.
Example
We are trying to connect to a server with a port number that's not in use on the destination. UDP sends a "port unreachable" message in this case. TCP sends a reset segment.
Client: SEQ 400, flags S, port 10000.
Server (it doesn't have a process on port 10000): SEQ 0, ACK 401, flags RA.
37
Half-Open
Packets flow in both directions. All is fine!
But sometimes something can crash. The live computer doesn't know that its peer has died, because the peer hasn't sent a FIN or RST segment. The connection is half-open.
38
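Simultaneous Open
Usual connection open (one active open, one passive open):
1. Active side sends SYN J and enters SYN_SENT.
2. Passive side enters SYN_RCVD and sends SYN K, ack J+1.
3. Active side enters ESTABLISHED and sends ack K+1.
4. Passive side enters ESTABLISHED.

Simultaneous open (active open on both sides):
1. One side sends SYN J, the other sends SYN K. Both enter SYN_SENT.
2. Each side receives the other's SYN, enters SYN_RCVD and answers: SYN J, ack K+1 from one side and SYN K, ack J+1 from the other.
3. Both sides enter ESTABLISHED.
Result: one connection, not two.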
39
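Simultaneous Close
Usual connection close (one active close, one passive close):
1. Active side sends FIN M and enters FIN_WAIT_1.
2. Passive side enters CLOSE_WAIT and sends ack M+1.
3. Active side enters FIN_WAIT_2.
4. Passive side sends FIN N and enters LAST_ACK.
5. Active side enters TIME_WAIT and sends ack N+1.
6. Passive side enters CLOSED.

Simultaneous close (active close on both sides):
1. One side sends FIN J, the other sends FIN K. Both enter FIN_WAIT_1.
2. Each side receives the other's FIN, enters CLOSING and answers with ack K+1 or ack J+1.
3. Both sides enter TIME_WAIT.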
40
TCP options (RFC 793 and 1323) (examples)
End of option list: kind=0 (1 byte)
No operation: kind=1 (1 byte)
Maximum segment size: kind=2 (1 byte) | len=4 (1 byte) | MSS (2 bytes)
Window scale factor: kind=3 (1 byte) | len=3 (1 byte) | shift count (1 byte)
Timestamp: kind=8 (1 byte) | len=10 (1 byte) | timestamp value (4 bytes) | timestamp echo reply (4 bytes)
The first two options (kind 0 and kind 1) don't have a length field. The others do. The length is the total length, including the kind and len bytes.
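Because only kinds 0 and 1 lack a length byte, an options field can be walked with a short loop. The following sketch uses a hypothetical function `parse_tcp_options` and handles only the option kinds listed on this slide; anything else is returned as an unknown option.

```python
import struct


def parse_tcp_options(raw: bytes) -> list:
    """Walk the options field and return (name, value) pairs."""
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:  # end of option list: single byte, stop parsing
            options.append(("EOL", None))
            break
        if kind == 1:  # no operation: single byte
            options.append(("NOP", None))
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("option is missing its length byte")
        length = raw[i + 1]  # total length, including the kind and len bytes
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        value = raw[i + 2:i + length]
        if kind == 2 and length == 4:
            options.append(("MSS", struct.unpack("!H", value)[0]))
        elif kind == 3 and length == 3:
            options.append(("WSCALE", value[0]))
        elif kind == 8 and length == 10:
            options.append(("TIMESTAMP", struct.unpack("!II", value)))
        else:
            options.append(("UNKNOWN", (kind, value)))
        i += length
    return options
```

For instance, the bytes `02 04 05 b4 01 03 03 07 00` decode to an MSS of 1460, a no-operation pad, a window scale shift count of 7, and the end-of-list marker.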
41
Delayed Acknowledgment (delayed ACK)
For example, the delayed ACK timer here runs in 200 ms intervals. Watch the client.
1. Server: PSH 2:6 (4) ack 11.
2. Client: it doesn't send an ACK immediately. It delays the ACK, hoping to have data to send in the same direction as the ACK. It can wait until the next "delayed ACK" boundary. At that boundary the client sends ack 6 and the delayed ACK flag is turned off.
3. At another instant, server: PSH 6:12 (6) ack 11.
4. Client: while the ACK is waiting, TCP decides to send a data packet, so the ACK is piggybacked on it: PSH 11:15 (4) ack 12.
42
Nagle algorithm
The application writes data into the TCP buffer 1 byte at a time. In this example the MSS is 20 bytes.
1. Client: PSH 2:3 (1) ack 2. The first byte is sent at once.
2. Client: TCP doesn't send a packet. We are waiting for the first packet's ACK. Bytes written in the meantime collect in the TCP buffer.
3. Server: ack 3. TCP has received the ACK. Now it can send the data from the buffer.
4. Client: PSH 3:5 (2) ack 2. Server: ack 5.
5. Client: TCP has enough data to send an entire packet (20 bytes), and it does: PSH 5:25 (20) ack 2. Server: ack 25.

Some time has passed...
6. Client: PSH 8:10 (2) ack 55.
7. Server: PSH 55:56 (1) ack 10. Client: ack 56.
8. Client: an ACK has been received and I have data, so I prepare and send a packet: PSH 10:12 (2) ack 56 (packet *).
9. Before packet * was pushed onto the physical medium, another packet from the server had been received: PSH 56:58 (2) ack 10.
10. Now I have data to send again, and I have a "free" ACK from the server (for packet *): PSH 56:58 (2) ack 12.
43
TCP timers
Retransmission timer: used when expecting an acknowledgment from the other end.
Persist timer: keeps window size information flowing even if the other end closes its receive window.
Keepalive timer: detects when the other end of an otherwise idle connection crashes or reboots.
2MSL timer: measures the time a connection has been in the TIME_WAIT state.
44
Round-Trip Time
Measured RTT (M): the time between sending bytes (PSH 2:3 (1) ack 2) and receiving the ACK for those bytes (ack 3).
The following formulas are used to calculate the retransmission timeout value (RTO):
Err = M - A
A <- A + g*Err
D <- D + h*(|Err| - D)
RTO = A + 4D
A: smoothed RTT (an estimator of the average)
D: smoothed mean deviation
g = 1/8
h = 1/4
Karn's algorithm. The algorithm specifies that when a retransmission occurs, we cannot update the RTT estimators when the acknowledgement for the retransmitted data finally arrives.
45
RTT example. Measurement.
Most implementations measure only one RTT value per connection at any time. If the timer for a given connection is already in use when a data segment is transmitted, that segment is not timed.
Segment trace (segment number, then contents):
1. 1:257 (256) ack 1 (start timer)
2. ack 257 (stop timer, RTT №1)
3. 257:513 (256) ack 1 (start timer)
4. 513:769 (256) ack 1
5. ack 513 (stop timer, RTT №2)
6. 769:1025 (256) ack 1 (start timer)
7. 1025:1281 (256) ack 1
8. ack 769
9. 1281:1537 (256) ack 1
10. ack 1025 (stop timer, RTT №3)
11. 1537:1793 (256) ack 1
12. ack 1281
...
46
RTT example. Measurement.
[Figure: the same segment trace as on the previous slide, drawn against the 500-ms clock ticks at 0.03, 0.53, 1.03, 1.53, 2.03, 2.53 and 3.03 seconds, with the start timer and stop timer points for each measurement.]
The timing is done by incrementing a counter every time the 500-ms TCP timer routine is invoked. The figure shows the relationship in our example between the actual RTT, which we can determine with a network analyzer, and the counted clock ticks.
RTT №1: 3 ticks
RTT №2: 1 tick
RTT №3: 2 ticks
47
RTT example. Calculation.
The segment trace is the same as on the measurement slides. Measured values: RTT №1 = 3 ticks, RTT №2 = 1 tick, RTT №3 = 2 ticks.
Formulas:
Err = M - A
A <- A + g*Err
D <- D + h*(|Err| - D)
RTO = A + 4D

A is initialized to 0 and D is initialized to 3.
Initial RTO = A + 2D = 0 + 2*3 = 6 seconds (the factor 2 is used only for the initial calculation).

When the ACK for the first data segment arrives (segment 2), the measured RTT is 3 ticks (M = 1.5 seconds) and our estimators are initialized as
A = M + 0.5 = 1.5 + 0.5 = 2
D = A/2 = 1
RTO = A + 4D = 2 + 4*1 = 6 seconds

When the ACK for the second data segment arrives (segment 5), the measured RTT is 1 tick (M = 0.5 seconds) and the update is
Err = M - A = 0.5 - 2 = -1.5
A = A + g*Err = 2 - 0.125*1.5 = 1.8125
D = D + h*(|Err| - D) = 1 + 0.25*(1.5 - 1) = 1.125
RTO = A + 4D = 1.8125 + 4*1.125 = 6.3125
But most implementations use an RTO that is a multiple of 500 ms. In our instance the RTO will be 6 seconds.
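The estimator updates in this calculation can be replayed with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, measurements are given in seconds (one tick = 0.5 second), h is taken as 1/4, and the first-measurement rule A = M + 0.5, D = A/2 is inferred from the slide's result A = 2 for a 3-tick measurement.

```python
G = 1 / 8  # gain for the smoothed RTT estimator A
H = 1 / 4  # gain for the smoothed mean deviation D (assumed value)


def first_measurement(m: float) -> tuple:
    """Initialize the estimators from the first measured RTT (in seconds)."""
    a = m + 0.5
    d = a / 2
    return a, d


def update(a: float, d: float, m: float) -> tuple:
    """Apply Err = M - A, A <- A + g*Err, D <- D + h*(|Err| - D)."""
    err = m - a
    a = a + G * err
    d = d + H * (abs(err) - d)
    return a, d


def rto(a: float, d: float) -> float:
    """RTO = A + 4D."""
    return a + 4 * d
```

Feeding in the first two measurements from the example (1.5 and 0.5 seconds) gives A = 2, D = 1, RTO = 6 after the first ACK and A = 1.8125, D = 1.125, RTO = 6.3125 after the second, before any rounding to a multiple of 500 ms.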
48
Congestion example
Normal data flow:
Sender: 6401:6657 (256) ack 1, then 6657:6913 (256) ack 1.
Receiver: ack 6657, then ack 6913. The data goes to the application.

Congestion (for example, a router loses a packet):
Sender: 6913:7169 (256) ack 1. This packet is lost.
Sender: 7169:7425 (256) ack 1. The receiving host knows that the previous packet is missing. It sends an ACK for the last packet received in order and saves the packet just received: ack 6913 (save 256). This is the first duplicate ACK.
Sender: 7425:7681 (256) ack 1. Receiver: ack 6913 (save 256), the second duplicate ACK.
Sender: 7681:7937 (256) ack 1. Receiver: ack 6913 (save 256), the third duplicate ACK.
Sender: 7937:8193 (256) ack 1. Receiver: ack 6913 (save 256).
Sender, after the third duplicate ACK: 6913:7169 (256) ack 1, the retransmission.
Receiver: the missing packet has been received, so this host now has all the data bytes. Everything saved goes to the application: ack 8193.
Sender: 8193:8449 (256) ack 1. Receiver: ack 8449, data to the application.

TCP counts the number of duplicate ACKs received, and when the third one is received it assumes that a segment has been lost. TCP retransmits only one segment, starting with that sequence number. We discuss the fast retransmit algorithm later.
49
Slow start
Slow start works with the congestion window, CWND. CWND is initialized to 1 (one) segment and is increased by one segment each time an ACK is received.
cwnd = 1: sender sends 1:513 (512) ack 1. Receiver: ack 513.
cwnd = 2: sender sends 513:1025 (512) ack 1 and 1025:1537 (512) ack 1. Receiver: ack 1025.
cwnd = 3: sender sends 1537:2049 (512) ack 1 and 2049:2561 (512) ack 1. The sender sends only two segments because the ACK for segment 1025:1537 hasn't been received yet. Result: we have CWND = 3 and 3 segments sent but not yet acknowledged. Receiver: ack 1537.
cwnd = 4: sender sends 2561:3073 (512) ack 1 and 3073:3585 (512) ack 1.
And so on.
The sender can transmit up to the minimum of the congestion window and the advertised window. CWND is flow control imposed by the sender. CWND is maintained in bytes.
At some point the capacity of the network can be reached and some packets can be discarded. This tells the sender that its CWND is too large. We'll see the mechanism for adjusting CWND later.
50
Congestion avoidance algorithm.
Congestion avoidance and slow start are different algorithms, but in practice they are implemented together. When congestion occurs, TCP slows down the transmission rate of packets into the network and then invokes slow start to get things going again.
Congestion avoidance and slow start require that two variables be maintained for each connection:
CWND
A slow start threshold size, ssthresh
There are two indications of packet loss:
a timeout occurs
duplicate ACKs are received
51
Congestion avoidance algorithm.
How the combined algorithm works:
1. Initialization: CWND = 1 segment, SSTHRESH = 65535 bytes.
2. Normal data flow, CWND is growing. TCP is doing slow start: CWND starts at one segment and is incremented by one segment every time an ACK is received. (Do you remember the slide before?)
3. Congestion occurs! Retransmission and so on. SSTHRESH = CWS/2, where CWS is the current window size.
4. Is congestion indicated by a timeout? Yes: CWND = 1 segment. No: CWND is left as it is.
5. At last an ACK is received. TCP increases CWND, but the way it increases depends on whether TCP is performing slow start or congestion avoidance. CWND <= SSTHRESH? Yes: slow start. No: congestion avoidance.
Slow start continues until we are halfway to where congestion occurred (since we recorded half of the window size that got us into trouble), and then congestion avoidance takes over.
CONGESTION AVOIDANCE dictates that CWND be incremented by 1/CWND each time an ACK is received. So we increase CWND by at most one segment each RTT, whereas slow start increments CWND by the number of ACKs received in an RTT.
52
Congestion avoidance algorithm. Illustration.
[Chart: CWND in segments (0 to 20) against round-trip times (0 to 7), with the SSTHRESH = 16 level and the congestion moment marked.]
Starting point: we assume that congestion has just occurred when CWND had a value of 32 segments. Congestion was indicated by a timeout.
SSTHRESH = 32 / 2 = 16, CWND = 1.
1 segment is sent at time 0.
At time 1 the ACK is returned and CWND is incremented to 2 segments.
At time 2 two ACKs are returned and CWND is incremented to 4 segments (CWND was 2 and two ACKs were received).
And so on.
CWND = SSTHRESH: slow start is stopped and congestion avoidance is started.
Now congestion avoidance is working. The increase of CWND is linear, with a maximum increase of one segment per round-trip time.
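The shape of the curve in this illustration can be reproduced with a simplified model that works in whole segments and whole round-trip times. This is a sketch, not TCP's per-ACK logic: the function name `cwnd_per_rtt` is hypothetical, every segment is assumed to be acknowledged within one round trip, and the switch to congestion avoidance is made as soon as CWND reaches SSTHRESH, as on the slide.

```python
def cwnd_per_rtt(ssthresh: int, rounds: int) -> list:
    """Return CWND (in segments) at time 0 and after each round-trip time."""
    cwnd = 1
    history = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            # Slow start: one segment per ACK received, so CWND doubles
            # each round trip (capped at SSTHRESH in this model).
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: at most one segment per round trip.
            cwnd += 1
        history.append(cwnd)
    return history
```

With `ssthresh=16` the model gives 1, 2, 4, 8, 16 during slow start and then 17, 18, 19, ... one segment per round-trip time.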
53
Fast retransmit and Fast recovery algorithms.
TCP host: I am able to send 3 packets. It sends 1:513 (512) ack 1, 513:1025 (512) ack 1, and so on into the network.
From the network come ack 513, then ack 513 again (1st duplicate ACK), ack 513 (2nd duplicate ACK) and ack 513 (3rd duplicate ACK).
A duplicate ACK may also be generated by reordering of segments, so the host waits for the third one. Then: I think a segment is lost.
The host doesn't wait for the retransmission timer to expire. It sends the lost segment. This is the FAST RETRANSMIT ALGORITHM.
Slow start isn't performed, but the congestion avoidance algorithm is working. This is the FAST RECOVERY ALGORITHM.
54
Fast retransmit and Fast recovery algorithms.
How the combined algorithm works:
1. The 3rd duplicate ACK is received: SSTHRESH = CWS/2. Retransmit the missing segment. CWND = SSTHRESH + 3 * segment size.
2. If another duplicate ACK arrives: INC(CWND; segment size) and transmit a packet (if CWND allows).
3. An ACK is received which acknowledges all data segments sent between the lost packet and the 1st duplicate ACK: CWND = SSTHRESH. Congestion avoidance is now working.
55
Slow start and congestion avoidance example
[Chart: CWND (bytes) and SEQ x 1000 plotted over time, annotated with the segment numbers from the table.]
Initialize: CWND = MSS = 256, SSTHRESH = 65535.
A timeout occurs: SSTHRESH = CWS/2, limited to its minimum value = 512. CWND = 1 segment = 256.
SYN, ACK: no change here, because new data is not being acknowledged.
Here is an ACK for data! CWND <= SSTHRESH, so we are in slow start: CWND = CWND + 1 segment = 256 + 256 = 512.
CWND <= SSTHRESH, slow start: CWND = CWND + 1 segment = 512 + 256 = 768.
CWND > SSTHRESH, congestion avoidance: CWND <- 768 + 256*256/768 + 256/8 = 885.
CWND > SSTHRESH, congestion avoidance: CWND <- 885 + 256*256/885 + 256/8 = 991.
CWND > SSTHRESH, congestion avoidance: CWND <- 991 + 256*256/991 + 256/8 = 1089.
We are using integer arithmetic.
The real formula for the 1/CWND increase is cwnd <- cwnd + (segsize*segsize)/cwnd + segsize/8
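The CWND values in this example follow directly from the per-ACK rules, so they can be checked with a short script. The sketch below uses hypothetical function names (`on_timeout`, `on_ack`), works in bytes with integer arithmetic, and uses CWND itself in place of the current window size (CWS), which is an assumption.

```python
def on_timeout(cwnd: int, segsize: int) -> tuple:
    """Timeout: SSTHRESH = window/2 (at least 2 segments), CWND = 1 segment."""
    ssthresh = max(cwnd // 2, 2 * segsize)
    return segsize, ssthresh


def on_ack(cwnd: int, ssthresh: int, segsize: int) -> int:
    """Grow CWND when an ACK for new data arrives."""
    if cwnd <= ssthresh:
        # Slow start: one segment per ACK.
        return cwnd + segsize
    # Congestion avoidance: cwnd <- cwnd + segsize*segsize/cwnd + segsize/8
    return cwnd + (segsize * segsize) // cwnd + segsize // 8
```

Starting from CWND = 256 with a 256-byte segment, a timeout gives SSTHRESH = 512, and five ACKs for new data then produce 512, 768, 885, 991 and 1089, the values shown on the slide.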
56
Slow start and congestion avoidance example
[Chart: CWND (bytes) and SEQ x 1000 plotted over time (about 8.7 to 9.7 seconds), annotated with segment numbers 58 to 72 from the table.]
The first two duplicate ACKs are received and counted, and CWND is left alone.
The third duplicate ACK arrives: SSTHRESH = CWND/2 = 2426/2 = 1024 (rounded down to a multiple of the segment size). CWND = SSTHRESH + number of duplicate ACKs * segment size = 1024 + 3 * 256 = 1792. The retransmission is sent.
A duplicate ACK is received: CWND = CWND + 1 segment = 1792 + 256 = 2048. But CWND is not big enough to send data.
A duplicate ACK is received: CWND = CWND + 1 segment = 2048 + 256 = 2304. But CWND is not big enough to send data.
A duplicate ACK is received: CWND = CWND + 1 segment = 2304 + 256 = 2560. We can send data, and data is sent. NOTE: here we have unacknowledged data from previous segments.
Several more segments follow in the same situation.
An ACK for new data is received: CWND <= SSTHRESH, slow start! CWND = SSTHRESH + segment size = 1024 + 256 = 1280.
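The window arithmetic in this example can be replayed with a small state machine. The class below is a sketch with hypothetical names (`FastRecovery`, `on_duplicate_ack`, `on_new_ack`); it keeps only CWND, SSTHRESH and the duplicate ACK counter, and it assumes the same per-ACK growth rule as the slow start and congestion avoidance formula shown earlier.

```python
class FastRecovery:
    """CWND/SSTHRESH bookkeeping for fast retransmit and fast recovery (bytes)."""

    def __init__(self, cwnd: int, ssthresh: int, segsize: int):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.segsize = segsize
        self.dup_acks = 0

    def on_duplicate_ack(self) -> bool:
        """Return True when the missing segment should be retransmitted."""
        self.dup_acks += 1
        if self.dup_acks < 3:
            return False  # counted, CWND left alone
        if self.dup_acks == 3:
            half = self.cwnd // 2
            # Round down to a multiple of the segment size, minimum 2 segments.
            self.ssthresh = max(half - half % self.segsize, 2 * self.segsize)
            self.cwnd = self.ssthresh + 3 * self.segsize
            return True
        self.cwnd += self.segsize  # each further duplicate ACK inflates CWND
        return False

    def on_new_ack(self) -> None:
        """An ACK for new data ends recovery, then CWND grows as usual."""
        if self.dup_acks >= 3:
            self.cwnd = self.ssthresh
        self.dup_acks = 0
        if self.cwnd <= self.ssthresh:
            self.cwnd += self.segsize  # slow start
        else:
            self.cwnd += (self.segsize * self.segsize) // self.cwnd + self.segsize // 8
```

Starting from CWND = 2426 with 256-byte segments, three duplicate ACKs give SSTHRESH = 1024 and CWND = 1792, three more inflate CWND to 2048, 2304 and 2560, and the ACK for new data brings it to 1280.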
57
TCP keepalive timer
A TCP implementation may use the keepalive option. It answers one question: is my peer alive? One example is a half-open connection: one peer has died, but the other end does not know about it and keeps the socket (IP address + port number) for the dead peer. The dead peer no longer needs anything, and the live one must find that out. Usually the keepalive timer is 2 hours. There are 4 scenarios when there is no activity on the connection and one peer sends a keepalive probe to the other.
58
TCP keepalive timer Scenario 1. Peer is alive and reachable.
The peers have no data to send to each other, but the connection is established. 2 (two) hours pass...
Client: "My keepalive timer has expired. Is my peer alive? But I forgot his MAC address..."
Packets: client ARP request, server ARP reply, client keepalive probe, server ACK.
The keepalive probe has a SEQ that is one less than it should be (for example, the receiver waits for SEQ = 14, but the keepalive probe has SEQ = 13). The receiver gets a packet with an incorrect SEQ and is forced to respond with an ACK that contains the next SEQ the server is expecting.
The client has received an answer from the server. It knows that the server is alive and resets its keepalive timer.
Scenario 2. Peer has crashed, or is in the process of rebooting.
The peers have no data to send to each other, but the connection is established. 2 hours have passed.
Client: "My keepalive timer has expired. Is my peer alive?"
TCP sends the probe. We do not look at the lower level (ARP) now; we only need to know whether the peer is alive or not. But the peer has crashed.
75 seconds... No answer. Another keepalive probe. 75 seconds... No answer.
The client sends 10 keepalive probes, 75 seconds apart. If it receives no response, it considers the peer's host down and terminates the connection.
59
TCP keepalive timer Scenario 3. Peer has crashed and rebooted.
The host has crashed and rebooted. It has a working TCP stack but no socket for that connection.
2 hours have passed.
Client: "Once again... My keepalive timer has expired. Is my peer alive?"
Server: "Are they crazy? I don't have such a socket! I'll be laconic..."
Packets: client keepalive probe, server reset. The connection is reset.
Scenario 4. Peer is running, but unreachable.
From the client's point of view this scenario looks the same as scenario 2: the probes get no answer. The situation may be caused by a failure of an intermediate router.
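The four scenarios reduce to three outcomes on the probing side: an ACK comes back, a reset comes back, or nothing comes back. A minimal simulation of that decision, using the 10 probes and 75 second interval from the slides; the Reply enum and the function name are hypothetical and exist only for this sketch.

```python
from enum import Enum


class Reply(Enum):
    ACK = "ack"      # scenario 1: peer is alive and reachable
    RST = "rst"      # scenario 3: peer rebooted and has no such socket
    NONE = "none"    # scenarios 2 and 4: no answer at all


def keepalive_outcome(replies, max_probes=10, probe_interval=75):
    """Decide what the prober concludes.

    replies[i] is the reply to probe i; missing entries count as no answer.
    Returns (verdict, seconds spent waiting before the verdict).
    """
    waited = 0
    for i in range(max_probes):
        reply = replies[i] if i < len(replies) else Reply.NONE
        if reply is Reply.ACK:
            return ("alive, restart keepalive timer", waited)
        if reply is Reply.RST:
            return ("connection reset", waited)
        waited += probe_interval  # wait, then send the next probe
    return ("peer considered down, connection terminated", waited)


print(keepalive_outcome([Reply.ACK]))
print(keepalive_outcome([Reply.NONE, Reply.NONE, Reply.RST]))
print(keepalive_outcome([]))
```

Scenarios 2 and 4 are indistinguishable here by design: the prober sees only silence in both cases.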
60
Path MTU Discovery
Connection established. Initial segment size = MIN (MSS derived from my interface MTU; MSS announced by the other end). If the other end doesn't specify an MSS, it defaults to 536. It is possible to save the path MTU on a per-route basis.
We send datagrams with the DF (don't fragment) bit set. MSS = MTU - IP header - TCP header.
But things change... For example, a router failed and the route was changed. Another router needs to fragment our datagram, but the datagram has the DF bit set. The router sends an ICMP error to our host.
We have received the ICMP error "can't fragment". Decrease the MTU: if the router generates the newer form of the ICMP error message, the message contains the next-hop MTU; if the router generates the older form, we take the next smaller MTU.
Things keep changing... After a timeout we can try a bigger MTU (depending on the implementation). RFC 1191 recommends 10 minutes.
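The discovery loop can be sketched as follows. This is a simulation under stated assumptions, not a network program: the path is given as a list of link MTUs, IP and TCP headers are taken as 20 bytes each (no options), and the list of "next smaller" MTU values follows the plateau table in RFC 1191 as best recalled, so treat those numbers as illustrative. All function names are invented for this example.

```python
IP_HEADER = 20   # assumption: no IP options
TCP_HEADER = 20  # assumption: no TCP options
MIN_MTU = 68

# Candidate MTUs to fall back to when the router sends the older ICMP form.
PLATEAUS = [65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68]


def mss_for(mtu):
    """MSS = MTU - IP header - TCP header."""
    return mtu - IP_HEADER - TCP_HEADER


def next_smaller_mtu(current):
    """Pick the largest plateau strictly below the current MTU."""
    for plateau in PLATEAUS:
        if plateau < current:
            return plateau
    return MIN_MTU


def discover_path_mtu(link_mtus, start_mtu, router_reports_mtu=True):
    """Shrink the MTU until a DF datagram fits every link on the path."""
    mtu = start_mtu
    while mtu > MIN_MTU:
        # First link that cannot carry the datagram without fragmenting.
        blocking = next((m for m in link_mtus if m < mtu), None)
        if blocking is None:
            return mtu  # datagram got through, no ICMP error
        if router_reports_mtu:
            mtu = blocking                # newer ICMP form carries the MTU
        else:
            mtu = next_smaller_mtu(mtu)   # older form: guess the next size
    return mtu


path_mtu = discover_path_mtu([1500, 552, 296], start_mtu=552)
print(path_mtu, mss_for(path_mtu))
```

With older routers the sender may settle on a value below the true path MTU (for example 508 on a 552-byte link), which is one reason to retry a bigger MTU after the timeout.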
61
TCP packet with MSS option
TCP header fields: Source port, Destination port, Sequence number, Acknowledgment number, Data offset, Reserved, Flags, Window, Checksum, Urgent pointer, Options (+padding), DATA.
Maximum segment size option: kind=2 (1 byte), len=4 (1 byte), MSS (2 bytes).
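The option is four bytes on the wire: kind, total length, then the 16-bit MSS in network byte order. A minimal sketch of encoding and decoding it with the standard struct module; the helper names are made up for this illustration.

```python
import struct

MSS_KIND = 2
MSS_LEN = 4  # total option length in bytes: kind + len + 2-byte MSS


def pack_mss_option(mss):
    """Encode the MSS option as it appears in a SYN segment."""
    if not 0 <= mss <= 0xFFFF:
        raise ValueError("MSS must fit in 16 bits")
    return struct.pack("!BBH", MSS_KIND, MSS_LEN, mss)


def unpack_mss_option(data):
    """Decode an MSS option and return the MSS value."""
    kind, length, mss = struct.unpack("!BBH", data[:4])
    if kind != MSS_KIND or length != MSS_LEN:
        raise ValueError("not an MSS option")
    return mss


option = pack_mss_option(1460)
print(option.hex(), unpack_mss_option(option))
```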
62
Path MTU Discovery. Example.
Hosts and links: Host 2, Router 1, Host 1. MTU = 552, MTU = 296, MTU = 1500, MTU = 1500.
SYN, mss = 1460
SYN, ACK, mss = 512
Sender: "The MTU is 552! I can send a datagram with 512 bytes of data."
1:513 (512) ACK
Router: "I can't send such a big datagram without fragmentation. But the DF bit is set => an error occurs!"
ICMP error message: Host 1 unreachable, need to frag, mtu = 296 (newer router implementation)
Sender: "My MSS is now 256 (MTU = 296)."
1:257 (256) ACK
63
Window Scale Option
Networks are growing and buffers are getting bigger, so the maximum window size allowed by the window field in the TCP header is no longer enough: the field has only 16 bits. Newer implementations use the WINDOW SCALE OPTION. Newer implementations can still work with older ones.
TCP header fields: Source port, Destination port, Sequence number, Acknowledgment number, Data offset, Reserved, Flags, Window, Checksum, Urgent pointer, Options (+padding), DATA.
The options field of the TCP header can contain the WINDOW SCALE OPTION: kind=3 (1 byte), len=3 (1 byte), shift count (1 byte).
The WINDOW SCALE OPTION can be advertised only in a SYN segment. The scale factor is fixed in each direction when the connection is established.
Shift count 0: no scaling performed.
64
Window Scale Option. Setting.
To enable window scaling, both ends must have this option in their SYN segments.
Active open: "I think my window scale should be 1."
Passive peer: "The active peer is going to use window scale! I understand it and choose my window scale = 0. I must set this option to 0."
Segments shown: SYN, wscale 1; SYN, ACK, wscale 3.
How scaling works. The window scale is used to shift the value from the window field to get the real window size. For example, the window scale was set to 1 and the window size in the received packet is 4 (it's only an example). Shifting the value left by 1 bit gives the real advertised window: 8.
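The shift itself is a single operation. The sketch below shows the option encoding, the "both SYNs or nothing" rule, and the left shift from the example. Function names are invented for this illustration, and the upper limit of 14 for the shift count comes from RFC 1323 rather than from these slides.

```python
import struct

WSCALE_KIND = 3
WSCALE_LEN = 3
MAX_SHIFT = 14  # limit from RFC 1323, not stated on the slides


def pack_window_scale_option(shift):
    """Encode the window scale option carried in a SYN segment."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count out of range")
    return struct.pack("!BBB", WSCALE_KIND, WSCALE_LEN, shift)


def negotiate(my_shift, peer_shift):
    """Return (shift for windows the peer advertises, shift for my windows).

    None means that side's SYN did not carry the option; scaling is then
    disabled in both directions.
    """
    if my_shift is None or peer_shift is None:
        return (0, 0)
    return (peer_shift, my_shift)


def real_window(window_field, shift):
    """Shift the 16-bit window field left to get the real window size."""
    return window_field << shift


print(real_window(4, 1))            # the example from the slide
print(negotiate(1, 3), negotiate(1, None))
```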
65
Timestamp option
kind=8 (1 byte), len=10 (1 byte), timestamp value (4 bytes), timestamp echo reply (4 bytes).
The timestamp option is used for better RTT calculation. The sender places a 32-bit value in the first field and the receiver echoes it back in the reply field. To use this option, both ends must be able to work with it. To establish it, the active peer must set the timestamp option in its SYN, and the other (passive) end must answer with the option too. Only one timestamp is kept per connection.
How does TCP do it? The receiver's TCP keeps: the ACK number from the last ACK that was sent (lastack), which is the next sequence number we are waiting for, and the timestamp value to echo back (tsrecent).
A segment arrives: if SEQ from the segment is lastack, tsrecent = timestamp option from the segment.
An ACK is sent: tsrecent is sent in the timestamp echo reply field, and lastack is set to the ACK value in the ACK being sent.
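The receiver-side bookkeeping can be sketched in a few lines. This follows the simplified rule on the slide (update only when the segment's SEQ equals the last ACK number sent); the class name is hypothetical, the variable names tsrecent and lastack are the usual spellings of the two values kept per connection, and the option is encoded with the standard struct module.

```python
import struct

TS_KIND = 8
TS_LEN = 10


def pack_timestamp_option(tsval, tsecr):
    """Encode kind, len, timestamp value, timestamp echo reply."""
    return struct.pack("!BBII", TS_KIND, TS_LEN, tsval, tsecr)


class TimestampReceiver:
    """Keeps the two values from the slide: tsrecent and lastack."""

    def __init__(self, initial_ack):
        self.lastack = initial_ack  # ACK number of the last ACK sent
        self.tsrecent = 0           # timestamp to echo in the next ACK

    def on_segment(self, seq, tsval):
        # Only a segment that starts at the expected SEQ updates tsrecent.
        if seq == self.lastack:
            self.tsrecent = tsval

    def build_ack(self, ack_number, now):
        # Echo tsrecent and remember the ACK number that was sent.
        self.lastack = ack_number
        return ack_number, pack_timestamp_option(now, self.tsrecent)


rx = TimestampReceiver(initial_ack=1)
rx.on_segment(seq=1, tsval=500)
rx.on_segment(seq=201, tsval=700)   # out of order: tsrecent is unchanged
ack, option = rx.build_ack(ack_number=101, now=9000)
print(ack, option.hex())
```

Ignoring the timestamp of out-of-order segments keeps the echoed value tied to the segment that actually advanced the window, which is what the RTT estimate should measure.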
66
PAWS: Protection Against Wrapped Sequence Numbers