TCP transfers over high latency/bandwidth network & Grid TCP Sylvain Ravot
Tests configuration
[Network diagram: Pcgiga-gbe.cern.ch (Geneva), Lxusa-ge.cern.ch (Chicago) and Plato.cacr.caltech.edu (California); nodes Cernh 9 and Ar1-chicago; links labelled POS 155 Mbps, GbE, and Calren2 / Abilene]
- CERN (Geneva) <-> Caltech (California)
  - RTT: 175 ms
  - Bandwidth-delay product: 3.4 MBytes
- CERN <-> Chicago
  - RTT: 110 ms
  - Bandwidth-delay product: 1.9 MBytes
- TCP flows were generated by Iperf.
- Tcpdump was used to capture packet flows.
- Tcptrace and xplot were used to plot and summarize the tcpdump data set.
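
The bandwidth-delay product quoted for the CERN-Caltech path can be checked with a short calculation. This is a minimal sketch, assuming the bottleneck is the 155 Mbps link and that 1 MByte means 10^6 bytes; the function name `bdp_bytes` is illustrative and does not come from the slides.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: the data in flight needed to fill the pipe."""
    return bandwidth_bps * rtt_s / 8


# CERN (Geneva) <-> Caltech: 155 Mbps bottleneck (assumed), RTT = 175 ms
caltech_bdp = bdp_bytes(155e6, 0.175)
print(f"{caltech_bdp / 1e6:.2f} MBytes")  # prints "3.39 MBytes"
```

The TCP window (and therefore the socket buffers) must be at least this large for a single flow to use the whole link.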
TCP overview: Slow Start and Congestion Avoidance
Example: here is an estimation of the cwnd (output of tcptrace).
[Figure: cwnd as a function of time, showing the slow start phase, the congestion avoidance phase, and SSTHRESH. Curves: cwnd average of the last 10 samples; cwnd average over the life of the connection to that point.]
- Slow start: fast increase of the cwnd
- Congestion avoidance: slow increase of the window size
Influence of the initial SSTHRESH on TCP performance
- During congestion avoidance and without any loss, the cwnd increases by one segment each RTT. In our case, we have no loss, so the window increases by 1460 bytes every 175 ms. If the cwnd is equal to 730 kbyte, it takes more than 5 minutes to reach a cwnd larger than the bandwidth-delay product (3.4 MByte). In other words, we have to wait more than 5 minutes to use the whole capacity of the link (155 Mbps)!
[Figures: cwnd = f(time), showing slow start and congestion avoidance]
- SSTHRESH = 730 Kbyte: throughput = 33 Mbit/s
- SSTHRESH = 1460 Kbyte: throughput = 63 Mbit/s
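
The "more than 5 minutes" figure follows directly from the numbers on this slide. A minimal sketch of the arithmetic, assuming decimal units (1 kbyte = 1000 bytes), exactly one 1460-byte segment of growth per RTT, and no loss; the helper name `seconds_to_reach` is illustrative.

```python
import math

MSS_BYTES = 1460   # segment size from the slide
RTT_S = 0.175      # CERN <-> Caltech round-trip time


def seconds_to_reach(target_bytes, start_bytes, rtt_s=RTT_S, mss=MSS_BYTES):
    """Time for cwnd to grow linearly (one segment per RTT) from start to target."""
    missing = max(target_bytes - start_bytes, 0)
    rtts = math.ceil(missing / mss)
    return rtts * rtt_s


for ssthresh in (730e3, 1460e3):
    t = seconds_to_reach(3.4e6, ssthresh)
    print(f"ssthresh = {ssthresh / 1e3:.0f} kbyte: {t:.0f} s ({t / 60:.1f} min)")
```

Starting congestion avoidance from 730 kbyte gives about 320 s; doubling the initial SSTHRESH to 1460 kbyte shortens the ramp but still leaves several minutes of linear growth.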
Reactivity
- TCP reactivity
  - Time to recover a 200 Mbps throughput after a loss is larger than 50 seconds for a connection between Chicago and CERN.
- A single loss is disastrous
  - TCP is much more sensitive to packet loss in WANs than in LANs
[Figure: recovery after a loss, annotated "53 sec"]
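
The order of magnitude of this recovery time can be estimated from the standard behavior: a loss halves the cwnd, which then grows back by one segment per RTT. This is a rough sketch, assuming a 1460-byte segment, the 110 ms CERN-Chicago RTT, and a window exactly equal to throughput x RTT before the loss. It ignores fast recovery, delayed ACKs and buffering, so it is not expected to reproduce the measured 53 sec; it only illustrates why recovery takes many tens of seconds. The function name and the `increment` parameter are illustrative.

```python
def recovery_seconds(throughput_bps, rtt_s, mss_bytes=1460, increment=1):
    """Idealized time to regrow cwnd from W/2 back to W after a single loss.

    increment is the number of segments added to cwnd per RTT
    (1 in standard congestion avoidance).
    """
    window_segments = throughput_bps * rtt_s / (8 * mss_bytes)
    rtts_needed = (window_segments / 2) / increment
    return rtts_needed * rtt_s


t = recovery_seconds(200e6, 0.110)
print(f"about {t:.0f} s to recover 200 Mbps at RTT = 110 ms")
```

Because the window needed for a given throughput and the time per growth step both scale with the RTT, the estimate grows with the square of the RTT, which is why the same loss is far less costly on a LAN.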
Linux Patch GRID TCP
- Parameter tuning
  - New parameter to better start a TCP transfer
    - Set the value of the initial SSTHRESH
- Modifications of the TCP algorithms (RFC 2001)
  - Modification of the well-known congestion avoidance algorithm
    - During congestion avoidance, for every useful acknowledgement received, cwnd increases by M * (segment size) * (segment size) / cwnd. This is equivalent to increasing cwnd by M segments each RTT. M is called the congestion avoidance increment.
  - Modification of the slow start algorithm
    - During slow start, for every useful acknowledgement received, cwnd increases by N segments. N is called the slow start increment.
  - Note: N=1 and M=1 in common TCP implementations.
- Smaller backoff (not implemented yet)
  - Reduce the strong penalty imposed by a loss
  - Reproduce the behavior of a multi-stream TCP connection.
- Only the sender's TCP stack needs to be modified
- Alternative to multi-stream TCP transfers
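
The two modified update rules can be sketched as a per-acknowledgement function. This is an illustration of the rules as stated on the slide, not the actual Linux patch: the names `on_ack` and `grow_one_rtt` are hypothetical, cwnd is kept in bytes, and one acknowledgement per segment is assumed (no delayed ACKs).

```python
MSS = 1460  # segment size in bytes (value used on the slides)


def on_ack(cwnd, ssthresh, n=1, m=1, mss=MSS):
    """Return the new cwnd (bytes) after one useful acknowledgement.

    n: slow start increment, m: congestion avoidance increment.
    n = m = 1 gives the standard behavior.
    """
    if cwnd < ssthresh:
        # Slow start: N segments per acknowledgement
        return cwnd + n * mss
    # Congestion avoidance: M * mss * mss / cwnd per acknowledgement
    return cwnd + m * mss * mss / cwnd


def grow_one_rtt(cwnd, ssthresh, n=1, m=1, mss=MSS):
    """Apply on_ack once per segment in flight, i.e. roughly one RTT of ACKs."""
    acks = int(cwnd // mss)
    for _ in range(acks):
        cwnd = on_ack(cwnd, ssthresh, n, m, mss)
    return cwnd


start = 100 * MSS
after = grow_one_rtt(start, ssthresh=50 * MSS, m=10)
print(f"growth over one RTT with M=10: {(after - start) / MSS:.1f} segments")
```

The growth over one RTT comes out slightly below M segments because cwnd, the denominator of the per-ACK increase, grows while the acknowledgements arrive.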
TCP tuning by modifying the slow start increment
[Figures: congestion window (cwnd) as a function of time, one plot per slow start increment. Curves: cwnd of the last 10 samples; cwnd average over the life of the connection to that point. Slow start durations annotated on the plots: 0.8 s, 1.2 s, 2.0 s, 0.65 s.]
- Slow start increment = 1, throughput = 98 Mbit/s
- Slow start increment = 2, throughput = 113 Mbit/s
- Slow start increment = 3, throughput = 116 Mbit/s
- Slow start increment = 5, throughput = 119 Mbit/s
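
Why a larger slow start increment shortens the slow start phase can be seen by counting round trips. A minimal sketch, assuming one acknowledgement per segment so that cwnd is multiplied by (1 + N) every RTT; the target window of 1024 segments and the function name `slow_start_rtts` are illustrative and do not come from the measurements.

```python
def slow_start_rtts(target_segments, n=1, initial_segments=1):
    """Number of RTTs for cwnd to reach target_segments during slow start.

    Each RTT, every segment in flight is acknowledged and each ACK adds
    n segments, so cwnd grows from cwnd to cwnd * (1 + n).
    """
    cwnd = initial_segments
    rtts = 0
    while cwnd < target_segments:
        cwnd += n * cwnd
        rtts += 1
    return rtts


for n in (1, 2, 3, 5):
    print(f"N = {n}: {slow_start_rtts(1024, n)} RTTs to reach 1024 segments")
```

The number of round trips shrinks only logarithmically with N, so the gain from each further increase of the increment gets smaller.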
TCP tuning by modifying the congestion avoidance increment (1)
[Figures: congestion window (cwnd) as a function of time. SSTHRESH = Mbyte]
- Congestion avoidance increment = 1, throughput = 37.5 Mbit/s: cwnd is increased by 1200 bytes in 27 sec.
- Congestion avoidance increment = 10, throughput = 61.5 Mbit/s: cwnd is increased by 10*1200 bytes in 27 sec.
Benefit of a larger congestion avoidance increment when losses occur
- When a loss occurs, the cwnd is divided by two. The performance is determined by the speed at which the cwnd increases after the loss. So the higher the congestion avoidance increment, the better the performance.
- We simulate losses by using a program which drops packets according to a configured loss rate. For the next two plots, the program drops one packet every packets.
[Figures: congestion window (cwnd) as a function of time, annotated with: 1) a packet is lost; 2) Fast Recovery (temporary state until the loss is repaired); 3) cwnd := cwnd/2]
- Congestion avoidance increment = 1, throughput = 8 Mbit/s
- Congestion avoidance increment = 10, throughput = 20 Mbit/s
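
The effect of the increment under periodic loss can be illustrated with a toy sawtooth model. This is a sketch under strong assumptions, not a reproduction of the experiment: cwnd is counted in segments, one RTT is one step, the loss is detected at the end of the RTT in which it happens, and fast recovery is not modelled. The loss period of the real test did not survive in the transcript, so `loss_every_packets=10000` is a hypothetical value, and `average_cwnd` is an illustrative name.

```python
def average_cwnd(increment, loss_every_packets, rtts=20000, start_cwnd=10.0):
    """Average cwnd (segments) of an AIMD flow losing one packet every
    loss_every_packets packets sent."""
    cwnd = start_cwnd
    sent_since_loss = 0.0
    total = 0.0
    for _ in range(rtts):
        total += cwnd                 # cwnd segments are sent during this RTT
        sent_since_loss += cwnd
        if sent_since_loss >= loss_every_packets:
            cwnd = max(cwnd / 2.0, 1.0)   # loss: cwnd is divided by two
            sent_since_loss = 0.0
        else:
            cwnd += increment             # no loss: add `increment` segments
    return total / rtts


base = average_cwnd(1, loss_every_packets=10000)
tuned = average_cwnd(10, loss_every_packets=10000)
print(f"average cwnd ratio (M=10 vs M=1): {tuned / base:.2f}")
```

In this model the average window, and hence the throughput at a fixed RTT, grows roughly with the square root of the increment, so M = 10 gives a gain of about 3. The measured gain on the slide is 20/8 = 2.5, the same order of magnitude.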
TCP Performance Improvement
Memory to memory transfers:
- Without any tuning
- By tuning TCP buffers
- TCP Grid on 155 Mbps US-CERN link
- TCP Grid on 2 x 155 Mbps US-CERN link
- TCP Grid on 622 Mbps US-CERN link
New bottlenecks:
- Iperf is not able to perform long transfers
- Linux station with 32-bit 33 MHz PCI bus (will replace with modern server)
Conclusion
- To achieve high throughput over a high latency/bandwidth network, we need to:
  - Set the initial slow start threshold (ssthresh) to a value appropriate for the delay and bandwidth of the link.
  - Avoid loss
    - by limiting the max cwnd size.
  - Recover fast if a loss occurs:
    - Larger cwnd increment => the cwnd increases faster after a loss
    - Smaller window reduction after a loss
  - …