TCP transfers over high latency/bandwidth networks & Grid DT
Measurements session, PFLDnet, February 3-4, 2003, CERN, Geneva, Switzerland
Sylvain Ravot


Context
- High Energy Physics (HEP)
  - The LHC model shows that data at the experiment will be stored at a rate of 100-1500 Mbytes/sec throughout the year.
  - Many Petabytes per year of stored and processed binary data will be accessed and processed repeatedly by the worldwide collaborations.
- New backbone capacities advancing rapidly to the 10 Gbps range
- TCP limitation
  - Additive increase and multiplicative decrease policy
- Grid DT
  - Practical approach
  - Transatlantic testbed
    - DataTAG project: 2.5 Gb/s between CERN and Chicago
    - Level3 loan: 10 Gb/s between Chicago and Sunnyvale (SLAC & Caltech collaboration)
    - Powerful end-hosts
  - Single stream
  - Fairness
    - Different RTT
    - Different MTU

Time to recover from a single loss
- TCP reactivity
  - The time to increase the throughput by 120 Mbit/s is larger than 6 min for a connection between Chicago and CERN.
- A single loss is disastrous
  - A TCP connection reduces its bandwidth use by half after a loss is detected (multiplicative decrease).
  - A TCP connection increases its bandwidth use only slowly (additive increase).
  - TCP throughput is much more sensitive to packet loss in WANs than in LANs.
[Plot: congestion window recovery after a single loss; annotated recovery time: 6 min]

Responsiveness (I)
- The responsiveness measures how quickly we go back to using the network link at full capacity after experiencing a loss, assuming that the congestion window size is equal to the bandwidth-delay product when the packet is lost.
- Responsiveness = C * RTT^2 / (2 * MSS), where C is the capacity of the link.
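
A short derivation sketch (not on the original slide) of this expression, assuming the window equals the bandwidth-delay product when the loss occurs and that every segment is acknowledged:

    % Window (in packets) that fills the pipe, then the recovery time:
    % multiplicative decrease halves the window to W/2, and congestion
    % avoidance adds one MSS per RTT, so W/2 round trips are needed.
    \[
      W = \frac{C \cdot RTT}{MSS}
      \qquad\Longrightarrow\qquad
      \rho = \frac{W}{2}\cdot RTT = \frac{C \cdot RTT^{2}}{2 \cdot MSS}
    \]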

Responsiveness (II)

Case                             C                       RTT (ms)         MSS (byte)   Responsiveness
Typical LAN                      10 Mb/s                 [ 2 ; 20 ]       1460         [ 1.7 ms ; 171 ms ]
Typical LAN today                1 Gb/s                  2 (worst case)
Future LAN                       10 Gb/s                 2 (worst case)
WAN Geneva - Chicago             1 Gb/s
WAN Geneva - Sunnyvale           1 Gb/s
WAN Geneva - Tokyo               1 Gb/s                                                1 h 04 min
WAN Geneva - Sunnyvale           2.5 Gb/s
Future WAN CERN - Starlight      10 Gb/s                                               1 h 32 min
Future WAN CERN - Starlight      10 Gb/s (Jumbo Frame)                                 15 min

The Linux kernel 2.4.x implements delayed acknowledgments. Due to delayed acknowledgments, the responsiveness is multiplied by two. Therefore, the values above have to be multiplied by two.
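
A small calculation sketch (not part of the original slides) applying the formula from the previous slide to links of the kind listed above. The RTT values 0.117 s and 0.181 s are the CERN-Chicago and CERN-Sunnyvale figures quoted later in this talk; the printed times are approximations derived from the formula, not the slide's original table entries.

    # Responsiveness = C * RTT^2 / (2 * MSS): time needed to grow cwnd back
    # from W/2 to W = C*RTT/MSS packets at one MSS per RTT (no delayed ACKs).
    def responsiveness(capacity_bps, rtt_s, mss_bytes):
        return capacity_bps / 8.0 * rtt_s ** 2 / (2.0 * mss_bytes)

    # (capacity in bit/s, RTT in s, MSS in bytes); 8960 bytes is the MSS
    # corresponding to a 9000-byte jumbo frame (IPv4 + TCP headers removed).
    cases = {
        "LAN 1 Gb/s, 2 ms":                  (1e9,  0.002, 1460),
        "WAN CERN-Chicago 1 Gb/s, 117 ms":   (1e9,  0.117, 1460),
        "WAN CERN-Sunnyvale 1 Gb/s, 181 ms": (1e9,  0.181, 1460),
        "WAN CERN-Chicago 10 Gb/s, 117 ms":  (1e10, 0.117, 1460),
        "Same 10 Gb/s link, jumbo frames":   (1e10, 0.117, 8960),
    }

    for name, (c, rtt, mss) in cases.items():
        t = responsiveness(c, rtt, mss)
        # Linux 2.4.x delays ACKs, which roughly doubles these times.
        print(f"{name:36s} {t:8.1f} s  ({2 * t:8.1f} s with delayed ACKs)")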

Effect of the MTU on the responsiveness
[Plot: effect of the MTU on a transfer between CERN and Starlight (RTT = 117 ms, bandwidth = 1 Gb/s)]
- A larger MTU improves the TCP responsiveness because cwnd is increased by one MSS each RTT.
- We could not reach wire speed with the standard MTU.
  - A larger MTU also reduces the per-frame overhead (saves CPU cycles, reduces the number of packets).

MTU and Fairness
- Two TCP streams share a 1 Gb/s bottleneck
- RTT = 117 ms
- MTU = 3000 bytes: avg. throughput over a period of 7000 s = 243 Mb/s
- MTU = 9000 bytes: avg. throughput over a period of 7000 s = 464 Mb/s
- Link utilization: 70.7%
[Testbed diagram: two sender hosts at CERN (GVA), GbE switch and routers, 2.5 Gbps POS link to Starlight (Chi), two receiver hosts; the streams share a 1 GE bottleneck]

RTT and Fairness
- Two TCP streams share a 1 Gb/s bottleneck
- CERN - Sunnyvale: RTT = 181 ms; avg. throughput over a period of 7000 s = 202 Mb/s
- CERN - Starlight: RTT = 117 ms; avg. throughput over a period of 7000 s = 514 Mb/s
- MTU = 9000 bytes
- Link utilization = 71.6%
[Testbed diagram: sender hosts at CERN (GVA), GbE switch and routers, 2.5 Gb/s POS link to Starlight (Chi) plus 10 Gb/s POS and 10GE to Sunnyvale, receiver hosts; the streams share a 1 GE bottleneck]

Effect of buffering on end-hosts
- Setup
  - RTT = 117 ms
  - Jumbo frames
  - Transmit queue of the network device = 100 packets (i.e. 900 kBytes)
- Area #1
  - Cwnd < BDP => Throughput < Bandwidth
  - RTT constant
  - Throughput = Cwnd / RTT
- Area #2
  - Cwnd > BDP => Throughput = Bandwidth
  - RTT increases (proportionally to Cwnd)
- Link utilization larger than 75%
[Plot: throughput and RTT vs. congestion window, Areas #1 and #2; testbed: host at GVA, routers, 2.5 Gb/s POS, 1 GE, host at CHI]
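
A toy model (my sketch, not from the slides) of the two regimes described above, assuming the 1 GE access link is the bottleneck and the 117 ms RTT from the slide's setup: below the BDP the queue stays empty, the RTT stays at its propagation value and throughput is cwnd/RTT; above the BDP the link saturates and the excess window sits in the device transmit queue, inflating the RTT.

    # Toy model of Area #1 / Area #2 behaviour on a single bottleneck.
    RTT_MIN = 0.117            # propagation RTT in seconds
    CAPACITY = 1e9 / 8         # bottleneck capacity in bytes/s (1 GE)
    BDP = CAPACITY * RTT_MIN   # bandwidth-delay product in bytes (~14.6 MB)

    def steady_state(cwnd_bytes):
        """Return (throughput in bytes/s, RTT in s) for a given window."""
        if cwnd_bytes <= BDP:
            # Area #1: queue empty, RTT constant, throughput = cwnd / RTT
            return cwnd_bytes / RTT_MIN, RTT_MIN
        # Area #2: link saturated; the excess window queues in the device
        # transmit queue (900 kBytes on the slide) and inflates the RTT.
        queued = cwnd_bytes - BDP
        return CAPACITY, RTT_MIN + queued / CAPACITY

    for cwnd_mb in (5, 10, 14, 15, 15.5):
        thr, rtt = steady_state(cwnd_mb * 1e6)
        print(f"cwnd = {cwnd_mb:4} MB: throughput = {thr * 8 / 1e6:6.0f} Mb/s, RTT = {rtt * 1000:5.1f} ms")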

Buffering space on end-hosts
- Link utilization near 100% if:
  - No congestion in the network
  - No transmission error
  - Buffering space = bandwidth-delay product
  - TCP buffer size = 2 * bandwidth-delay product => the congestion window size is always larger than the bandwidth-delay product
- txqueuelen is the transmit queue of the network device
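
A small sizing sketch (not from the slides) applying these rules to the CERN-Chicago path used in this talk. The 1 Gb/s end-host rate, 117 ms RTT and 9000-byte frames are values quoted elsewhere in the presentation; reading "buffering space = BDP" as the device transmit queue is my interpretation.

    # Buffer sizing for a long fat pipe following the rules on this slide:
    # device transmit queue ~ one BDP, TCP socket buffers ~ 2 * BDP.
    RTT = 0.117            # CERN - Chicago RTT in seconds
    RATE = 1e9 / 8         # end-host line rate in bytes/s (1 GE)
    MTU = 9000             # jumbo frames

    bdp_bytes = RATE * RTT
    print(f"BDP                 : {bdp_bytes / 1e6:.1f} MB")
    print(f"TCP buffers (2*BDP) : {2 * bdp_bytes / 1e6:.1f} MB")
    # txqueuelen is expressed in packets, so convert the BDP to full-size frames.
    print(f"txqueuelen (~1 BDP) : {bdp_bytes / MTU:.0f} packets")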

Linux Patch "GRID DT"
- Parameter tuning
  - New parameter to better start a TCP transfer
    - Set the value of the initial SSTHRESH
- Modifications of the TCP algorithms (RFC 2001)
  - Modification of the well-known congestion avoidance algorithm
    - During congestion avoidance, for every acknowledgement received, cwnd increases by A * (segment size) * (segment size) / cwnd. This is equivalent to increasing cwnd by A segments each RTT. A is called the additive increment. (A sketch of the modified update rules follows this slide.)
  - Modification of the slow start algorithm
    - During slow start, for every acknowledgement received, cwnd increases by M segments. M is called the multiplicative increment.
  - Note: A = 1 and M = 1 in TCP Reno.
- Smaller backoff
  - Reduce the strong penalty imposed by a loss
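
A pseudocode-style sketch of these per-ACK updates (my reading of the slide, not the actual kernel patch). A and M are the parameters named above; the BACKOFF factor, the example values and the function names are assumptions for illustration only.

    # Per-ACK window updates in the style of the Grid DT tuning of RFC 2001 TCP.
    # All quantities are in bytes; A, M and BACKOFF are the tunable knobs.
    A = 7          # additive increment: cwnd grows by ~A segments per RTT (1 in Reno)
    M = 2          # multiplicative increment used during slow start (1 in Reno)
    BACKOFF = 0.5  # fraction of cwnd removed on loss; Grid DT can use a smaller value

    def on_ack(cwnd, ssthresh, mss):
        """Update cwnd for one acknowledgement."""
        if cwnd < ssthresh:
            # slow start: grow by M segments per ACK
            cwnd += M * mss
        else:
            # congestion avoidance: A * mss * mss / cwnd per ACK,
            # i.e. roughly A segments per RTT
            cwnd += A * mss * mss / cwnd
        return cwnd

    def on_loss(cwnd, mss):
        """Reduce the window after a loss is detected; return (ssthresh, cwnd)."""
        ssthresh = max(2 * mss, cwnd * (1 - BACKOFF))
        return ssthresh, ssthresh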

Grid DT
- Only the sender's TCP stack has to be modified
- Very simple modifications to the TCP/IP stack
- Alternative to multi-stream TCP transfers
  - Single stream vs. multiple streams:
    - it is simpler
    - startup/shutdown are faster
    - fewer keys to manage (if the transfer is secure)
- Virtual increase of the MTU
- Compensates the effect of delayed acknowledgements
- Can improve "fairness"
  - between flows with different RTT
  - between flows with different MTU

Effect of the RTT on the fairness
- Objective: improve fairness between two TCP streams with different RTT and the same MTU
- We can adapt the model proposed by Matt Mathis by taking into account a higher additive increment
- Assumptions:
  - Approximate a packet loss probability p by assuming that each flow delivers 1/p consecutive packets followed by one drop.
  - Under these assumptions, the congestion window of the flows oscillates with a period T0.
  - If the receiver acknowledges every packet, then the congestion window opens by x (the additive increment) packets each RTT.
[Figure: CWND evolution under periodic loss, oscillating between W/2 and W with period T0; the original slide also gave the number of packets delivered by each stream in one period, the relation between the two periods, and the condition for each flow to deliver the same number of packets in one period]
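
The equations themselves did not survive the transcript; the following is my hedged reconstruction of the Mathis-style algebra the slide describes, for two flows i = 1, 2 whose windows oscillate between W_i/2 and W_i with additive increment x_i packets per RTT_i:

    % Period of one sawtooth (time to grow back from W_i/2 to W_i):
    \[ T_i = \frac{W_i}{2\,x_i}\,RTT_i \]
    % Packets delivered per period (mean window 3W_i/4 times W_i/(2 x_i) round trips):
    \[ N_i = \frac{3\,W_i^{2}}{8\,x_i} \]
    % With synchronized losses (T_1 = T_2) and equal per-period delivery (N_1 = N_2):
    \[ \frac{x_1}{x_2} = \left(\frac{RTT_1}{RTT_2}\right)^{2} \]

This reconstruction is consistent with the RTT-squared scaling of the additive increment proposed on the "Next Work" slide.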

Effect of the RTT on the fairness
- TCP Reno performance (see slide #8):
  - First stream GVA - Sunnyvale: RTT = 181 ms; avg. throughput over a period of 7000 s = 202 Mb/s
  - Second stream GVA - CHI: RTT = 117 ms; avg. throughput over a period of 7000 s = 514 Mb/s
  - Link utilization 71.6%
- Grid DT tuning in order to improve fairness between two TCP streams with different RTT:
  - First stream GVA - Sunnyvale: RTT = 181 ms, additive increment A = 7; average throughput = 330 Mb/s
  - Second stream GVA - CHI: RTT = 117 ms, additive increment B = 3; average throughput = 388 Mb/s
  - Link utilization 71.8%
[Testbed diagram: sender hosts at CERN (GVA), GbE switch and routers, 2.5 Gb/s POS to Starlight (CHI) plus 10 Gb/s POS and 10GE to Sunnyvale, receiver hosts; 1 GE bottleneck]

Effect of the MTU
- Two TCP streams share a 1 Gb/s bottleneck
- RTT = 117 ms
- MTU = 3000 bytes; additive increment = 3; avg. throughput over a period of 6000 s = 310 Mb/s
- MTU = 9000 bytes; additive increment = 1; avg. throughput over a period of 6000 s = 325 Mb/s
- Link utilization: 61.5%
[Testbed diagram: sender hosts at CERN (GVA), GbE switch and routers, 2.5 Gb/s POS to Starlight (Chi), receiver hosts; 1 GE bottleneck]

Next Work
- Taking into account the value of the MTU in the evaluation of the additive increment:
  - Define a reference. For example:
    - Reference: MTU = 9000 bytes => additive increment = 1
    - MTU = 1500 bytes => additive increment = 6
    - MTU = 3000 bytes => additive increment = 3
- Taking into account the square of the RTT in the evaluation of the additive increment:
  - Define a reference. For example:
    - Reference: RTT = 10 ms => additive increment = 1
    - RTT = 100 ms => additive increment = 100
    - RTT = 200 ms => additive increment = 400
- Combining the two formulas above (a reconstruction of the implied scaling is sketched after this slide).
- Periodic evaluation of the RTT and the MTU.
- How to define the references?
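
The formulas themselves are not in the transcript, but the examples above pin down the scaling. A small sketch of the implied rule (my reconstruction, with MTU_ref = 9000 bytes and RTT_ref = 10 ms taken from the reference cases above):

    # Additive increment implied by the examples on the "Next Work" slide:
    # inversely proportional to the MTU and proportional to the square of the RTT.
    MTU_REF = 9000     # bytes   (reference MTU -> increment 1)
    RTT_REF = 0.010    # seconds (reference RTT -> increment 1)

    def additive_increment(mtu_bytes, rtt_s):
        return (MTU_REF / mtu_bytes) * (rtt_s / RTT_REF) ** 2

    # Reproduces the slide's examples:
    print(additive_increment(1500, 0.010))   # 6.0   (MTU rule alone)
    print(additive_increment(9000, 0.200))   # 400.0 (RTT rule alone)
    # Combined, illustrative case: CERN - Chicago path with standard Ethernet frames
    print(additive_increment(1500, 0.117))   # ~821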

Conclusion
- To achieve high throughput over high latency/bandwidth networks, we need to:
  - Set the initial slow start threshold (ssthresh) to an appropriate value for the delay and bandwidth of the link.
  - Avoid loss
    - by limiting the max cwnd size
  - Recover fast if a loss occurs:
    - larger cwnd increment
    - smaller window reduction after a loss
    - larger packet size (jumbo frames)
- Is the standard MTU the largest bottleneck?
- How to define fairness?
  - Taking into account the MTU
  - Taking into account the RTT