ECE544: Communication Networks-II Spring 2015 Transport Protocols Includes teaching materials from, L. Peterson, David Wetherall and Tom Anderson (Univ. of Washington), Gregg, Brendan. Systems Performance: Enterprise and the Cloud
Today’s Lecture Introduction to transport protocols UDP TCP RTP
The Disconnect [Figure: Host1 and Host8 connected through routers R1, R2, and R3 over ETH, FDDI, and PPP links; IP on every node offers best-effort service, while applications need guaranteed service] Applications running on hosts need to communicate – Require some guarantees from the underlying layer Network Layer (IP) provides only best-effort communication services – Only between hosts (not applications)
Transport Protocol [Figure: the Host1 to Host8 path through R1, R2, and R3, with TCP/UDP running only on the two hosts, between IP and the application] Transport protocol – Provides services required by applications using the services provided by the network layer – The transport layer is the lowest layer in the network stack that is an end-to-end protocol
Transport Protocols Application requirements vs. IP layer limitations – Guarantee message delivery: the network may drop messages – Deliver messages in the same order they are sent: messages may be reordered in the network and may incur long delays – Deliver at most one copy of each message: messages may be duplicated in the network – Support arbitrarily large messages: the network may limit message size – Support synchronization between sender and receiver – Allow the receiver to apply flow control to the sender – Support multiple application processes on each host: the network only supports communication between hosts – Many more Design just a few transport protocols to meet most current and future application requirements – Each satisfies the requirements for a class of applications – Many applications => few transport protocols
Most Popular Transport Protocols User Datagram Protocol (UDP) – Supports multiple application processes on each host – Option to check messages for correctness with a checksum (a 16-bit Internet checksum, not a CRC) Transmission Control Protocol (TCP) – Ensures reliable delivery of packets between source and destination processes – Ensures in-order delivery of packets to destination process – Other services Real-time Transport Protocol (RTP) – Serves real-time multimedia applications – Moves decision making to the applications – Runs over UDP TCP, UDP and RTP satisfy needs of the most common applications – Applications requiring other functionality usually use UDP as the transport protocol, and implement additional features as part of the application
UDP Demultiplexing Service: Support for multiple processes on each host to communicate – Issue: IP only provides communication between hosts (IP addresses) Solution – Add a port number and associate a process with a port number – 4-tuple unique connection identifier: [SrcPort, SrcIPAddr, DestPort, DestIPAddr] UDP packet format: SrcPort | DestPort | Length | Checksum | Payload [Figure: several application processes on each host multiplexed over UDP and IP across the network]
UDP Error Detection Service: Ensure message correctness – Issue: Packet corruption in transit Solution – Use a checksum. Why isn’t the IP checksum enough? – Includes UDP header, payload, pseudo header – Pseudo header: protocol number, source IP address, destination IP address, and UDP length UDP header layout (bits 0–15 | bits 16–31): SrcPort | DestPort, then Length | Checksum, followed by Payload
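As a rough illustration of the checksum described above, the following Python sketch computes a UDP checksum for IPv4 over the pseudo header, UDP header, and payload. The function names, addresses, and ports are hypothetical and chosen only for this example; the 16-bit ones' complement sum and IP protocol number 17 are the standard values for UDP.

```python
import socket
import struct

UDP_PROTOCOL_NUMBER = 17  # IP protocol number for UDP

def ones_complement_sum(data: bytes) -> int:
    """Return the 16-bit ones' complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def covered_bytes(src_ip, dst_ip, src_port, dst_port, payload, checksum=0):
    """Pseudo header + UDP header + payload: everything the checksum covers."""
    length = 8 + len(payload)  # the UDP header is 8 bytes
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, UDP_PROTOCOL_NUMBER, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    return pseudo + header + payload

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    """Compute the checksum with the checksum field set to zero."""
    total = ones_complement_sum(covered_bytes(src_ip, dst_ip, src_port, dst_port, payload))
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF  # an all-zero field means "no checksum" in IPv4

def verify(src_ip, dst_ip, src_port, dst_port, payload, checksum):
    """A receiver sums everything including the checksum; 0xFFFF means no error detected."""
    data = covered_bytes(src_ip, dst_ip, src_port, dst_port, payload, checksum)
    return ones_complement_sum(data) == 0xFFFF
```

Because the pseudo header contains the IP addresses, a datagram delivered to the wrong host fails verification even if its UDP header and payload are intact, which is one reason the IP header checksum alone is not enough.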
UDP Properties and Applications – It is transaction-oriented, suitable for simple query-response protocols such as the Domain Name System (DNS) or Network Time Protocol (NTP). – It provides datagrams, suitable for modeling other protocols such as IP tunneling, Remote Procedure Call (RPC), and the Network File System (NFS). – It is simple, suitable for bootstrapping without a full protocol stack, as with DHCP and the Trivial File Transfer Protocol. – It is stateless, suitable for very large numbers of clients, such as in streaming media applications. – The lack of retransmission delays makes it suitable for real-time applications such as VoIP, online games, and many protocols built on top of the Real Time Streaming Protocol. – It works well in unidirectional communication, suitable for broadcast information such as in many kinds of service discovery, and for shared information such as broadcast time or the Routing Information Protocol (RIP).
Transmission Control Protocol (TCP) First proposed by Vinton Cerf and Robert Kahn, 1974 – TCP/IP enabled computers of all sizes, from different vendors, different OSs, to communicate with each other. – Used by 80% of all traffic on the Internet Reliable, in-order delivery, connection-oriented, byte-stream service
TCP: Connection-oriented Service: Connection-oriented – Application states the destination once – Issue: IP is connection-less Solution: TCP maintains the connection state – Connection Establishment – Connection Termination
Evolution of TCP (timeline) – TCP described by Vint Cerf and Bob Kahn in IEEE Trans Comm – 1975: Three-way handshake, Raymond Tomlinson, in SIGCOMM 75 – TCP & IP: RFC 793 – 1983: BSD Unix 4.2 supports TCP/IP – 1984: Nagle’s algorithm to reduce overhead of small packets; predicts congestion collapse – 1986: Congestion collapse observed – 1987: Karn’s algorithm to better estimate round-trip time – 1988: Van Jacobson’s algorithms for congestion avoidance and congestion control (most implemented in 4.3BSD Tahoe) – BSD Reno: fast retransmit, delayed ACKs
TCP Through the 1990s (timeline) – ECN (Floyd): Explicit Congestion Notification – 1993: TCP Vegas (Brakmo et al): real congestion avoidance – 1994: T/TCP (Braden): Transaction TCP – 1996: SACK TCP (Floyd et al): Selective Acknowledgement – 1996: Hoe: improving TCP startup – 1996: FACK TCP (Mathis et al): extension to SACK
TCP: Packet Format Flags – SYN, FIN, ACK, RESET, URG, PUSH Sequence number – Sequence number of the first byte of data in the segment It is an abstract number (more later) Acknowledgement – Next sequence number expected from the sender
Reliable Byte-stream Bidirectional data transfer – Control information (e.g., ACK) piggybacks on data segments in reverse direction
TCP Connection Management Setup – asymmetric 3-way handshake Transfer – sliding window; data and acks in both directions Teardown – symmetric 2-way handshake Client-server model – initiator (client) contacts server – listener (server) responds, provides service
Three-Way Handshake Opens both directions for transfer – Active participant (client) to passive participant (server): SYN, SequenceNum = x – Server to client: SYN + ACK, SequenceNum = y, Acknowledgment = x + 1 – Client to server: ACK, Acknowledgment = y + 1 (+data)
Do we need 3-way handshake? Allows both sides to – allocate state for buffer size, state variables, … – calculate estimated RTT, estimated MTU, etc. Helps prevent – Duplicates across incarnations – Intentional hijacking: random nonces => weak form of authentication Short-circuit? – Persistent connections in HTTP (keep connection open) – Transactional TCP (save seq #, reuse on reopen) – But congestion control effects dominate
TCP Transfer Connection is bi-directional – acks can carry response data (client)(server) Seq = x + MSS; Ack = y+1 Seq = y+MSS; Ack = x+2MSS+1 Seq = x + 2*MSS; Ack = y+1 Seq = x + 3*MSS; Ack = y+MSS+1
TCP Connection Teardown Symmetric: either side can close connection (or RST!) [Figure: web server and web browser exchange FIN, ACK, data + ACK, FIN, ACK] – After the first FIN is acknowledged the connection is half-closed; data can continue to be sent in the other direction – Can reclaim connection right away (must be at least 1 MSL after first FIN) – Can reclaim connection after 2 MSL
Connection Establishment Both sender and receiver must be ready before we start the transfer of data – Need to agree on a set of parameters – e.g., the Maximum Segment Size (MSS) This is (in-band) signaling – It sets up state at the endpoints
Connection Establishment Server – Informs TCP about the listening port Up-call registration Client – Performs three way handshake – SYN and ACK flags in the header are used – Initial Sequence numbers x and y selected at random Active participant (client) Passive participant (server) SYN, Seq#=x SYN+ACK, Seq#=y Ack#=x+1 ACK, Ack#=y+1 Data+ACK Connection Establishment Data transport
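Connection Termination Any side can terminate the connection Each side closes its half of the connection independently – A connection may be half-closed [Figure: one side sends FIN and receives FIN-ACK, after which it can only receive data; the other side keeps writing data, which is ACKed, until a second FIN and ACK close the remaining direction]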
What if packets can be delayed? Solutions? – Never reuse an ID? – Change IP layer to eliminate packet reordering? – Prevent very late delivery? IP routers keep hop count per pkt, discard if exceeded ID’s not reused within delay bound – TCP won’t work without some bound on how late packets can arrive! Accept! Reject!
TCP Connection Setup, with States – Active participant (client) to passive participant (server): SYN, SequenceNum = x – Server to client: SYN + ACK, SequenceNum = y, Acknowledgment = x + 1 – Client to server: ACK, Acknowledgment = y + 1 (+data) States – Server: LISTEN, SYN_RCVD, ESTABLISHED – Client: SYN_SENT, ESTABLISHED
TCP Connection Teardown Web serverWeb browser FIN ACK FIN FIN_WAIT_1 CLOSE_WAIT LAST_ACK FIN_WAIT_2 TIME_WAIT CLOSED …
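The TIME_WAIT State We wait 2MSL (two times the maximum segment lifetime) before completing the close; 60 seconds is a common implementation value for the MSL, while the recommended value is 120 seconds Why? – The ACK might have been lost, so the FIN will be resent – A resent FIN could interfere with a subsequent connection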
TCP Handshake in an Uncooperative Internet TCP Hijacking – if seq # is predictable, attacker can insert packets into TCP stream – many implementations of TCP simply bumped previous seq # by 1 – attacker can learn seq # by setting up a connection Solution: use random initial sequence #’s – weak form of authentication Malicious attacker Server SYN, SequenceNum = x SYN + ACK, y, x + 1 Client “HTTP get URL”, x + MSS web page, y + MSS ACK, y+1 fake web page, y+MSS
TCP Handshake in an Uncooperative Internet TCP SYN flood – server maintains state for every open connection – if attacker spoofs source addresses, can cause server to open lots of connections – eventually, server runs out of memory Malicious attacker Server SYN, SequenceNum = x SYN + ACK, y, x + 1 SYN, p SYN, q SYN, r SYN, s
How can TCP choose segment size? Pick LAN MTU as segment size? – LAN MTU can be larger than WAN MTU – E.g., Gigabit Ethernet jumbo frames Pick smallest MTU across all networks in Internet? – Most traffic is local! Local file server, web proxy, DNS cache,... – Increases packet processing overhead Discover MTU to each destination? ( IP DF bit) Guess?
How do we keep the pipe full? Unless the bandwidth*delay product is small, stop and wait can’t fill pipe Solution: Send multiple packets without waiting for first to be acked Reliable, unordered delivery: – Send new packet after each ack – Sender keeps list of unack’ed packets; resends after timeout – Receiver same as stop&wait How easy is it to write apps that handle out of order delivery? – How easy is it to test those apps?
Sliding Window: Reliable, ordered delivery Two constraints: – Receiver can’t deliver packet to application until all prior packets have arrived – Sender must prevent buffer overflow at receiver Solution: sliding window – circular buffer at sender and receiver packets in transit <= buffer size advance when sender and receiver agree packets at beginning have been received – How big should the window be? bandwidth * round trip delay
Sender/Receiver State sender – packets sent and acked (LAR = last ack recvd) – packets sent but not yet acked – packets not yet sent (LFS = last frame sent) receiver – packets received and acked (NFE = next frame expected) – packets received out of order – packets not yet received (LFA = last frame acceptable)
Sliding window Allows multiple packets, up to the size of the window, to be sent on the network before acknowledgments are received. Provides high throughput even on high-latency networks. The size of the window is advertised by the receiver to indicate how many packets it is willing to receive at that time.
Sliding Window [Figure: send window spanning LAR to LFS over sent and acked packets; receive window spanning NFE to LFA over received and acked packets]
TCP State-Transition Max segment lifetime (MSL): 120 sec (recommended)
What if we lose a packet? Go back N (original TCP) – receiver acks “got up through k” (“cumulative ack”) – ok for receiver to buffer out of order packets – on timeout, sender restarts from k+1 Selective retransmission (RFC 2018) – receiver sends ack for each pkt in window – on timeout, resend only missing packet
Can we shortcut timeout? If packets usually arrive in order, out of order delivery is (probably) a packet loss – Negative ack receiver requests missing packet – Fast retransmit (TCP) receiver acks with NFE-1 (or selective ack) if sender gets acks that don’t advance NFE, resends missing packet
Sender Algorithm
Send full window, set timeout
On receiving an ack:
    if it increases LAR (last ack received):
        send next packet(s) -- no more than window size outstanding at once
    else (already received this ack):
        if receive multiple acks for LAR, next packet may have been lost; retransmit LAR + 1 (and more if selective ack)
On timeout:
    resend LAR + 1 (first packet not yet acked)
Receiver Algorithm
On packet arrival:
    if packet is the NFE (next frame expected):
        send ack
        increase NFE
        hand any packet(s) below NFE to application
    else if < NFE (packet already seen and acked):
        send ack and discard // Q: why is ack needed?
    else (packet is > NFE, arrived out of order):
        buffer and send ack for NFE – 1
        -- signal sender that NFE might have been lost
        -- and with selective ack: which packets correctly arrived
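The receiver steps above can be sketched in Python roughly as follows. This is a minimal illustration, not TCP itself: packets are numbered per packet rather than per byte, the class and attribute names are hypothetical, and the returned value is the cumulative ack (NFE – 1).

```python
class SlidingWindowReceiver:
    """Toy receiver with cumulative acks; sequence numbers count packets, not bytes."""

    def __init__(self, window: int):
        self.nfe = 0          # next frame expected
        self.window = window  # how far ahead of NFE we are willing to buffer
        self.buffer = {}      # out-of-order packets, keyed by sequence number
        self.delivered = []   # data handed to the application, in order

    def on_packet(self, seq: int, data: str) -> int:
        """Process one arriving packet and return the cumulative ack (NFE - 1)."""
        if seq == self.nfe:
            # In-order arrival: deliver it, then drain any buffered successors.
            self.delivered.append(data)
            self.nfe += 1
            while self.nfe in self.buffer:
                self.delivered.append(self.buffer.pop(self.nfe))
                self.nfe += 1
        elif self.nfe < seq < self.nfe + self.window:
            # Out of order but inside the window: hold it until the gap fills.
            self.buffer[seq] = data
        # seq < NFE is a duplicate and seq beyond the window is dropped;
        # an ack is still returned so the sender learns the current NFE.
        return self.nfe - 1
```

Feeding it packets 0, 2, 3, 1 yields acks 0, 0, 0, 3: the repeated ack for 0 is the duplicate-ack signal that packet 1 may have been lost, and the ack jumps to 3 once the gap is filled.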
Sequence Number Selection Initial sequence number (ISN) selection: – Why not simply chose 0? – Must avoid overlap with earlier incarnation – Security issues Requirements for ISN selection – Must operate correctly Without synchronized clocks Despite node failures
Sequence Number Wrap Around Protect against SequenceNum wrap around – Sliding window: Seq # space >= 2 x WinSize; for TCP: 2^32 >> 2 x 2^16 – Seq # should not wrap around within an MSL (120 sec) period of time – For OC-48 (2.5 Gbps), time until wraparound: 14 sec TCP extension to the sequence # space for protecting against seq # wrapping around – Add 32-bit timestamp as optional header
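The wraparound time quoted above can be reproduced with a short calculation. The sketch below assumes the link is saturated by a single connection and that each sequence number covers one byte; the function name is hypothetical.

```python
def wraparound_seconds(bandwidth_bps: float, seq_bits: int = 32) -> float:
    """Seconds needed to send 2**seq_bits bytes at bandwidth_bps bits per second."""
    return (2 ** seq_bits) * 8 / bandwidth_bps

MSL_SECONDS = 120  # recommended maximum segment lifetime used on the slide

for name, bps in [("10 Mbps Ethernet", 10e6), ("OC-48 (2.5 Gbps)", 2.5e9)]:
    t = wraparound_seconds(bps)
    verdict = "safe" if t > MSL_SECONDS else "wraps within one MSL"
    print(f"{name}: {t:.1f} s ({verdict})")
```

For OC-48 this gives about 13.7 seconds, which is well under the 120-second MSL and is why the timestamp extension is needed on fast links.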
How do we determine timeouts? If timeout too small, useless retransmits – can lead to congestion collapse (and did in 86) – as load increases, longer delays, more timeouts, more retransmissions, more load, longer delays, more timeouts … – Dynamic instability! If timeout too big, inefficient – wait too long to send missing packet Timeout should be based on actual round trip time (RTT) – varies with destination subnet, routing changes, congestion, …
Estimating RTTs Idea: Adapt based on recent past measurements – For each packet, note time sent and time ack received – Compute RTT samples and average recent samples for timeout – EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT – This is an exponentially-weighted moving average (low pass filter) that smooths the samples. Typically, α = 0.8 to 0.9. – Set timeout to small multiple (2) of the estimate
Keep the Pipe Full AdvertisedWindow: 16-bit field, 2^16 bytes => 64 KB – Big enough to allow the sender to keep the pipe full (assume that the receiver has enough buffer to handle the data) – If RTT = 100 ms: Delay x Bandwidth = 122 KB for 10 Mbps link; Delay x Bandwidth = 1.2 MB for 100 Mbps link (AdvertisedWindow is not large enough) TCP Extension: – Scaling factor option for AdvertisedWindow, e.g., use 16-byte units of data
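The delay x bandwidth figures above can be checked with a small Python sketch. It also estimates how much the 16-bit window would have to be scaled to cover the pipe, assuming the scaling factor is a power of two expressed as a left shift (as in the TCP window scale option); the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: the data needed in flight to fill the pipe."""
    return bandwidth_bps * rtt_s / 8

def scale_shift_needed(bdp: float, max_window: int = 2**16 - 1) -> int:
    """Smallest left shift s such that (max_window << s) covers the BDP."""
    shift = 0
    while (max_window << shift) < bdp:
        shift += 1
    return shift

for mbps in (10, 100):
    bdp = bdp_bytes(mbps * 1e6, 0.100)  # 100 ms RTT, as on the slide
    print(f"{mbps} Mbps: BDP = {bdp / 1024:.0f} KB, shift = {scale_shift_needed(bdp)}")
```

A shift of 4 corresponds to the 16-byte units mentioned on the slide; under these assumptions the 100 Mbps, 100 ms path would need a shift of 5.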
TCP Error Control Cumulative ACK: ACK the highest contiguous bytes received – Same as studied before Extension: Selective ACK (SACK), ACK additional blocks of received data in TCP optional header Timeout Timer – If timeout too soon unnecessarily retransmit → adds load to network – If timeout too late Increases latency Limits the throughput.
TCP Timeout Issue: RTT in a wide area network varies substantially Solution: Adaptive Timeout Original Algorithm: – EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT – Timeout = β × EstimatedRTT (β = 2) Problem – Does not distinguish whether the ACK is for original transmission or retransmission (suggestions?) – Constant β is not good: it assumes constant variance
TCP Timeout Karn/Partridge Algorithm – Whenever TCP retransmits a segment, it stops taking samples of the RTT: only measure SampleRTT for segments that have been sent only once – Each time TCP retransmits, set the next timeout to be twice the last timeout: relieves congestion Jacobson/Karels Algorithm: adaptive variance (uses mean deviation) – Difference = SampleRTT − EstimatedRTT – EstimatedRTT = EstimatedRTT + (δ × Difference) (same as in original, with δ = 1 − α) – Deviation = Deviation + δ × (|Difference| − Deviation) – Timeout = μ × EstimatedRTT + φ × Deviation (default: set μ = 1 and φ = 4)
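A minimal Python sketch of the two algorithms above is shown below. The gain of 0.125 is an assumption made for this example (the slide does not give a value), the class and method names are hypothetical, and times are in milliseconds.

```python
class RttEstimator:
    """Jacobson/Karels timeout estimation with Karn/Partridge sampling and backoff."""

    def __init__(self, first_sample: float, delta: float = 0.125,
                 mu: float = 1.0, phi: float = 4.0):
        self.estimated_rtt = first_sample
        self.deviation = 0.0
        self.delta = delta  # gain; 0.125 is an assumed value, not from the slide
        self.mu = mu
        self.phi = phi
        self.timeout = self._compute_timeout()

    def _compute_timeout(self) -> float:
        return self.mu * self.estimated_rtt + self.phi * self.deviation

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> float:
        """Update the estimate from one ACK and return the new timeout."""
        if was_retransmitted:
            # Karn/Partridge: an ACK for a retransmitted segment is ambiguous, skip it.
            return self.timeout
        difference = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.delta * difference
        self.deviation += self.delta * (abs(difference) - self.deviation)
        self.timeout = self._compute_timeout()
        return self.timeout

    def on_retransmit(self) -> float:
        """Karn/Partridge: double the timeout each time a segment is retransmitted."""
        self.timeout *= 2
        return self.timeout
```

Starting from a 100 ms estimate, one 200 ms sample raises the timeout to 162.5 ms with these parameters: the estimate moves only to 112.5 ms, but the deviation term adds 4 × 12.5 ms, which is how the algorithm reacts to variance instead of relying on a fixed multiplier.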
Triggering Transmission When to transmit a segment: – Small segments are subject to large overhead – Transmit on reaching the max segment size (MSS): the size of the largest segment TCP can send without causing the local IP to fragment – MSS = local MTU − (IP header + TCP header) – Transmit when the sending process explicitly asks TCP to transmit (“push”)
TCP Silly Window Syndrome – Sender has MSS bytes of data to send, but window is closed – ACK arrives with a small window – Sender sends a small segment (high overhead) – Receiver advertises a small window – Sender sends another small segment – Repeat the above To solve: Nagle’s Algorithm – When the application has data to send: if both available data and the window >= MSS, send a full segment; else if there is unACKed data in flight, buffer the new data until an ACK arrives; else send all the new data now
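The decision rule of Nagle's algorithm described above can be written as a small Python function. This is only a sketch of the decision itself; the function name, argument names, and returned labels are hypothetical.

```python
def nagle_decision(available_bytes: int, window_bytes: int, mss: int,
                   unacked_in_flight: bool) -> str:
    """Decide what to do when the application hands TCP new data."""
    if available_bytes >= mss and window_bytes >= mss:
        return "send_full_segment"
    if unacked_in_flight:
        # Hold small writes back; the next ACK acts as the release signal.
        return "buffer_until_ack"
    # Nothing outstanding, so a small segment costs no extra waiting.
    return "send_now"

# A keystroke while earlier data is still unacknowledged gets buffered.
print(nagle_decision(available_bytes=1, window_bytes=8192, mss=1460,
                     unacked_in_flight=True))
```

The rule is self-clocking: on a responsive network ACKs return quickly, so small segments are released often; on a slow network more data accumulates per segment.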
TCP Deadlock – receiver advertises a window size of 0, the sender stops sending data – the window size update from the receiver is lost To solve it: – the sender starts the persist timer when AdvertisedWindow = 0 – When the persist timer expires, the sender sends a small packet
Example – Exchange of Packets SEQ=1 SEQ=2 SEQ=3 SEQ=4 ACK=2; WIN=3 ACK=3; WIN=2 ACK=4; WIN=1 ACK=5; WIN=0 Receiver has buffer of size 4 and application doesn’t read Stall due to flow control here T=1 T=2 T=3 T=4 T=5 T=6
Example – Buffer at Sender T=1 T=2 T=3 T=4 T=5 T=6 =acked =sent =advertised
How does sender know when to resume sending? If receive window = 0, sender stops – no data => no acks => no window updates Sender periodically pings receiver with one byte packet – receiver acks with current window size Why not have receiver ping sender?
Should sender be greedy? Should sender transmit as soon as any space opens in receive window? – Silly window syndrome receive window opens a few bytes sender transmits little packet receive window closes Solution (Clark, 1982): sender doesn’t resume sending until window is half open
Should sender be greedy? App writes a few bytes; send a packet? – Don’t want to send a packet for every keystroke – If buffered writes >= max segment size – if app says “push” (ex: telnet, on carriage return) – after timeout (ex: 0.5 sec) Nagle’s algorithm – Never send two partial segments; wait for first to be acked, before sending next – Self-adaptive: can send lots of tinygrams if network is being responsive But (!) poor interaction with delayed acks (later)
Congestion When the network cannot support the sender’s rate – Queues at the network elements overflow Source 1 Source 2 Source 3 Dest 2 Dest 1 Even with flow control packets might not reach the destination
Congestion Control vs. Flow Control Congestion Control – Mechanism to prevent sender from overrunning the capacity of the network When network is the bottleneck Flow Control – Mechanism to prevent sender from overrunning the capacity of the receiver When receiver is the bottleneck
Misbehaving TCP Receivers On server side, little incentive to cheat TCP – Mostly competing against other flows from same server On client side, high incentive to induce server to send faster – How?
Congestion Control: Design Approach Maintain another window at the sender called CongestionWindow (cwnd) – CongestionWindow is the max number of packets allowed in the network Number of unACKed packets at the sender. Key: How to calculate congestion window (cwnd) – Various approaches possible – TCP estimates it based on observed packet losses Assumes packet loss as indication of congestion Since we don’t know whether the network or the receiver is the bottleneck – MaxWindow = MIN(CongestionWindow, AdvertisedWindow) – EffectiveWin = MaxWindow – (LastByteSent – LastByteAcked)
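The two window formulas above translate directly into code. In the Python sketch below, the clamp at zero is an addition for this example (the slide's formula can go negative if the window shrinks below the data already in flight), and the function name is hypothetical.

```python
def effective_window(congestion_window: int, advertised_window: int,
                     last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still transmit right now."""
    max_window = min(congestion_window, advertised_window)
    in_flight = last_byte_sent - last_byte_acked
    return max(0, max_window - in_flight)  # clamp at zero: an assumption, see lead-in

# Network-limited case: cwnd (8000) is smaller than the advertised window (16000).
print(effective_window(8000, 16000, last_byte_sent=5000, last_byte_acked=1000))
```

Taking the minimum of the two windows means the sender never needs to know which of the network or the receiver is the bottleneck.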
Congestion Avoidance Prevent sending too much data and causing saturation, which can cause packet drops and worse performance. Slow-start: Part of TCP congestion control, this begins with a small congestion window and then increases it as acknowledgments (ACKs) are received within a certain time. When they are not, the congestion window is reduced.
Optimizations Selective acknowledgments (SACKs): allow TCP to acknowledge discontinuous packets, reducing the number of retransmits required. Fast retransmit: Instead of waiting on a timer, TCP can retransmit dropped packets based on the arrival of duplicate ACKs. These are a function of round-trip time and not the typically much slower timer.
Optimizations Fast recovery: This recovers TCP performance after detecting duplicate ACKs, by continuing in congestion avoidance with a reduced window rather than resetting the connection to perform slow-start. In some cases these are implemented by use of extended TCP options added to the protocol header. Important topics for TCP performance include the three-way handshake, duplicate ACK detection, congestion control algorithms, Nagle, delayed ACKs, SACK, and FACK.
Congestion Avoidance: (AIMD) If no congestion in the network (increase conservatively) – Increase the congestion window additively every RTT If congestion in the network (decrease aggressively) – Decrease the congestion window multiplicatively, immediately How is congestion detected? – Estimated (more later) Every RTT w = w + 1 w = cwnd in segments Every ACK reception w = w + 1/w w = cwnd in segments Every ACK reception cwnd = cwnd + MSS*(MSS/cwnd) cwnd in bytes cwnd = cwnd/2 cwnd in bytes
Congestion Avoidance: (AIMD) TCP’s saw tooth pattern [Figure: CongestionWindow size vs. time, showing the startup time followed by the saw tooth] Issues with additive increase – Takes too long to ramp up a connection from the beginning – The entire advertised window may be reopened when a lost packet is retransmitted and a single cumulative ACK is received by the sender
TCP “Slow Start”: To start quickly! Maintain another variable slow start threshold (ssthresh) – Last known stable rate – If (cwnd > ssthresh) State = congestion avoidance – Else State = slow start In Slow start – Increase the congestion window exponentially every RTT Key: How is ssthresh calculated? Every ACK reception w = w + 1 w = cwnd in segments Every ACK reception cwnd = cwnd + MSS cwnd in bytes
TCP: Congestion Detection and Retransmit Loss of packet indicates congestion – Timer Timeouts (No ACK) Set according to Jacobson/Karels algorithm – On timer timeout ssthresh = max(2*MSS, effwin/2); cwnd = MSS – Notice this will cause TCP to go into slow start Issue: takes a long time to detect a packet loss – Affects throughput – Any other quicker way of detecting a packet loss?
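The per-ACK rules for slow start and congestion avoidance, together with the timeout reaction above, can be combined into one small Python sketch. It works in bytes with integer arithmetic, assumes an MSS of 1460 bytes, and uses hypothetical class and method names.

```python
MSS = 1460  # assumed segment size in bytes

class CongestionState:
    """Slow start, additive increase, and the reaction to a timeout (cwnd in bytes)."""

    def __init__(self, ssthresh: int):
        self.cwnd = MSS
        self.ssthresh = ssthresh

    def state(self) -> str:
        return "congestion_avoidance" if self.cwnd > self.ssthresh else "slow_start"

    def on_ack(self) -> None:
        if self.state() == "slow_start":
            self.cwnd += MSS                     # doubles cwnd roughly every RTT
        else:
            self.cwnd += MSS * MSS // self.cwnd  # about one MSS per RTT

    def on_timeout(self, effective_window: int) -> None:
        self.ssthresh = max(2 * MSS, effective_window // 2)
        self.cwnd = MSS                          # forces a return to slow start

cc = CongestionState(ssthresh=4 * MSS)
for _ in range(6):
    cc.on_ack()
    print(cc.cwnd, cc.state())
```

After a timeout, cwnd falls to one MSS while ssthresh remembers half of the window that was in use, so slow start climbs quickly back to that level and then hands over to additive increase.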
Fast Retransmit Observation: A series of duplicate ACKs might mean a packet loss Solution – Every time the receiver receives an out-of-order packet, it sends a duplicate ACK – The sender retransmits the missing packet after it receives some number of duplicate ACKs (e.g., 3 duplicate ACKs) – Fast retransmit does not replace timeouts Issue: Reduces latency (early retransmit) but still incurs loss in throughput (slow start after packet loss) [Figure: sender transmits PKT 1 to PKT 6; PKT 3 is lost, so the receiver repeats ACK 2 for each later packet; the sender retransmits PKT 3 and then receives ACK 6]
Fast Recovery Transmit a packet for every ACK received until the retransmitted packet is ACK’d – ssthresh = max(2*MSS, cwnd/2); cwnd = ssthresh + 3*MSS – On every duplicate ACK until the ACK of the retransmitted packet: cwnd = cwnd + MSS On reception of ACK of retransmitted packet – Start congestion avoidance instead of slow start: cwnd = ssthresh
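The fast recovery window adjustments can be sketched in Python as below. The sketch assumes cwnd and ssthresh are kept in bytes, so the window inflation is expressed in units of MSS, and it assumes an MSS of 1460 bytes; the class and method names are hypothetical.

```python
MSS = 1460  # assumed segment size in bytes

class FastRecovery:
    """Window handling around a fast retransmit (Reno style, byte units)."""

    def __init__(self, cwnd: int, ssthresh: int):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.in_recovery = False

    def on_third_duplicate_ack(self) -> None:
        """Enter recovery: halve the threshold, inflate cwnd by the 3 dup ACKs seen."""
        self.ssthresh = max(2 * MSS, self.cwnd // 2)
        self.cwnd = self.ssthresh + 3 * MSS
        self.in_recovery = True

    def on_additional_duplicate_ack(self) -> None:
        """Each further duplicate ACK means one more packet has left the network."""
        if self.in_recovery:
            self.cwnd += MSS

    def on_ack_of_retransmission(self) -> None:
        """Deflate the window and continue in congestion avoidance, not slow start."""
        if self.in_recovery:
            self.cwnd = self.ssthresh
            self.in_recovery = False
```

With cwnd at 20 segments, the third duplicate ACK sets ssthresh to 10 segments and cwnd to 13; the ACK for the retransmitted packet then brings cwnd back to 10 segments, half the original window, without dropping to a single segment.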
TCP backlog queues
TCP send & receive buffers
Duplicate ACK detection Used by the fast retransmit and fast recovery algorithms. It is performed on the sender and works as follows: 1. The sender sends a packet with sequence number 10. 2. The receiver replies with an ACK for sequence number 11. 3. The sender sends 11, 12, and 13. 4. Packet 11 is dropped. 5. The receiver replies to both 12 and 13 by sending an ACK for 11, which it is still expecting. 6. The sender receives the duplicate 11 ACKs. Also used by TCP Reno and Tahoe congestion avoidance algorithms.
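The sender-side counting described above can be sketched in Python as follows. ACK numbers here name the next sequence number the receiver expects, matching the walkthrough; the threshold of 3 duplicates is the usual fast retransmit trigger, and the function name is hypothetical.

```python
def find_fast_retransmits(acks, threshold=3):
    """Return the sequence numbers a sender would fast-retransmit.

    acks is the stream of ACK numbers as seen by the sender, where each ACK
    names the next sequence number the receiver is still expecting.
    """
    retransmits = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:  # fire once, on exactly the third duplicate
                retransmits.append(ack)
        else:
            last_ack = ack
            duplicates = 0
    return retransmits

# The original ACK for 11 plus the two duplicates from the walkthrough are not
# yet enough; one more out-of-order arrival at the receiver triggers it.
print(find_fast_retransmits([11, 11, 11]))
print(find_fast_retransmits([11, 11, 11, 11]))
```

In the six-step walkthrough only two duplicate ACKs arrive, so with a threshold of 3 the sender would still be waiting; a further packet reaching the receiver would produce the third duplicate and trigger the retransmission of 11.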
TCP Performance (Steady State) Bandwidth as a function of – RTT? – Loss rate? – Packet size? – Receive window?
What if TCP connection is short? Slow start dominates performance – What if network is unloaded? – Burstiness causes extra drops Packet losses unreliable indicator for short flows – can lose connection setup packet – Can get loss when connection near done – Packet loss signal unrelated to sending rate In limit, have to signal congestion (with a loss) on every connection – 50% loss rate as increase # of connections
Example: 100KB transfer over 100 Mb/s Ethernet, 100 ms RTT, 1.5 KB MSS – Ethernet: ~100 Mb/s – 64KB window, 100 ms RTT: ~6 Mb/s – Slow start (delayed acks), no losses: ~500 Kb/s – Slow start, with 5% drop: ~200 Kb/s – Steady state, 5% drop rate: ~750 Kb/s
Improving Short Flow Performance Start with a larger initial window – RFC 3390: start with 3-4 packets Persistent connections – HTTP: reuse TCP connection for multiple objects on same page – Share congestion state between connections on same host or across host Skip slow start? Ignore congestion signals?
TCP Modeling Given the congestion behavior of TCP can we predict what type of performance we should get? What are the important factors – Loss rate: affects how often window is reduced – RTT: affects increase rate and relates BW to window – RTO: affects performance during loss recovery – MSS: affects increase rate
Overall TCP Behavior [Figure: window vs. time] Let’s concentrate on steady state behavior with no timeouts and perfect loss recovery
Simple TCP Model Some additional assumptions – Fixed RTT – No delayed ACKs In steady state, TCP loses a packet each time the window reaches W packets – Window drops to W/2 packets – Each RTT the window increases by 1 packet => W/2 * RTT before next loss – BW = MSS * avg window / RTT = MSS * (W + W/2) / (2 * RTT) = 0.75 * MSS * W / RTT
Simple Loss Model What was the loss rate? – Packets transferred between losses = avg BW * time = (0.75 W / RTT) * (W/2 * RTT) = 3W^2/8 – 1 packet lost => loss rate = p = 8 / (3W^2) – W = sqrt(8 / (3 * loss rate)) BW = 0.75 * MSS * W / RTT – BW = MSS / (RTT * sqrt(2p/3))
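The model above can be evaluated numerically with a short Python sketch. It only restates the slide's formulas; the function names and the example values (1460-byte MSS, 100 ms RTT, 1% loss) are assumptions for illustration.

```python
import math

def window_at_loss(loss_rate: float) -> float:
    """Peak window W (in packets) implied by the loss rate: W = sqrt(8 / (3p))."""
    return math.sqrt(8 / (3 * loss_rate))

def tcp_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Steady-state throughput BW = MSS / (RTT * sqrt(2p/3)), in bits per second."""
    return 8 * mss_bytes / (rtt_s * math.sqrt(2 * loss_rate / 3))

mss, rtt, p = 1460, 0.100, 0.01
w = window_at_loss(p)
via_window = 0.75 * mss * 8 * w / rtt         # BW = 0.75 * MSS * W / RTT
via_formula = tcp_throughput_bps(mss, rtt, p)
print(f"W = {w:.1f} packets, BW = {via_formula / 1e6:.2f} Mb/s")
```

Both routes give the same number, roughly 1.4 Mb/s for these inputs, and quadrupling the loss rate halves the throughput, which is the 1/sqrt(p) behavior used to judge TCP friendliness.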
TCP Friendliness What does it mean to be TCP friendly? – TCP is not going away – Any new congestion control must compete with TCP flows: it should not clobber TCP flows and grab the bulk of the link, and it should also be able to hold its own, i.e. grab its fair share, or it will never become popular How is this quantified/shown? – Has evolved into evaluating loss/throughput behavior – If it shows 1/sqrt(p) behavior it is ok – But is this really true?
TCP Performance Can TCP saturate a link? Congestion control – Increase utilization until… link becomes congested – React by decreasing window by 50% – Window is proportional to rate * RTT Doesn’t this mean that the network oscillates between 50 and 100% utilization? – Average utilization = 75%?? – No…this is *not* right!
TCP Congestion Control Only W packets may be outstanding Rule for adjusting W – If an ACK is received: W ← W + 1/W – If a packet is lost: W ← W/2 [Figure: source and destination exchanging packets over time t, with the resulting window size]
Congestion Control: Reno and Tahoe Reno: triple duplicate ACKs trigger: halving of the congestion window, halving of the slow-start threshold, fast retransmit, and fast recovery Tahoe: triple duplicate ACKs trigger: fast retransmit, halving the slow-start threshold, congestion window set to one maximum segment size (MSS), and slow-start state. Some operating systems (e.g., Linux and Oracle Solaris 11) allow the algorithm to be selected as part of system tuning. Newer algorithms that have been developed for TCP include Vegas, New Reno, and Hybla.
Nagle This algorithm [RFC 896] reduces the number of small packets on the network by delaying their transmission to allow more data to arrive and coalesce. This delays packets only if there is data in the pipeline and delays are already being encountered. The system may provide a tunable parameter to disable Nagle, which may be necessary if its operation conflicts with delayed ACKs.
Delayed ACKs This algorithm [RFC 1122] delays the sending of ACKs up to 500 msec., so that multiple ACKs may be combined. Other TCP control messages can also be combined, reducing the number of packets on the network.
Selective ACK (SACK) Allows the receiver to inform the sender that it received a noncontiguous block of data. Without this, a packet drop would eventually cause the entire send window to be retransmitted, to preserve a sequential acknowledgment scheme. This harms TCP performance and is avoided by most modern operating systems that support SACK.
Forward ACKs (FACK) SACK has been extended by forward acknowledgments (FACK), which are supported in Linux by default. FACKs track additional state and better regulate the amount of outstanding data in the network, improving overall performance.
Key Metrics for Network Monitoring
Network performance analysis tools
Linux sar Network Statistics
netstat –s (cont.)
traceroute The traceroute command sends a series of test packets to experimentally determine the current route to a host. This is performed by increasing the IP protocol time to live (TTL) by one for each packet, causing the sequence of gateways to the host to reveal themselves by sending ICMP time exceeded response messages (provided a firewall doesn’t block them). For example, testing the current route between a host in California and a target in Virginia.
tcpdump Network packets can be captured and inspected using the tcpdump utility. Each line of output shows the time of the packet (with microsecond resolution), its source and destination IP addresses, and TCP header values. By studying these, the operation of TCP can be understood in detail, including how advanced features are working for your workload. The -n option was used to not resolve IP addresses as host names. Various other options are available, including printing verbose details where available (-v), link-layer headers (-e), and hex-address dumps (-x or -X).