NORDUnet 2003, Reykjavik, Iceland, 26 August 2003 High-Performance Transport Protocols for Data-Intensive World-Wide Grids T. Kelly, University of Cambridge, UK S. Ravot, Caltech, USA J.P. Martin-Flatin, CERN, Switzerland

NORDUnet 2003, Reykjavik, Iceland, 26 August Outline: Overview of the DataTAG project; Problems with TCP in data-intensive Grids (problem statement, analysis and characterization); Solutions: Scalable TCP, GridDT; Future Work

NORDUnet 2003, Reykjavik, Iceland, 26 August Overview of DataTAG Project

NORDUnet 2003, Reykjavik, Iceland, 26 August Member Organizations

NORDUnet 2003, Reykjavik, Iceland, 26 August Project Objectives Build a testbed to experiment with massive file transfers (TBytes) across the Atlantic; provide high-performance protocols for gigabit networks underlying data-intensive Grids; guarantee interoperability between major HEP Grid projects in Europe and the USA.

NORDUnet 2003, Reykjavik, Iceland, 26 August DataTAG Testbed [Diagram: Geneva-Chicago testbed topology by Edoardo Martelli, showing Cisco 7606/7609, Juniper M10 and Alcatel 7770 routers, Extreme S1i/S5i switches, ONS 15454 and Alcatel 1670 optical equipment, STM-16 circuits (DTag, Colt backup+projects, FranceTelecom) and an STM-64 circuit (GC), workstation clusters at both ends, and connections to SURFNET, CESNET, VTHD/INRIA, CNAF and GEANT over 1000baseSX, 1000baseT, 10GbaseLX and SDH/SONET links]

NORDUnet 2003, Reykjavik, Iceland, 26 August Network Research Activities Enhance performance of network protocols for massive file transfers. Data-transport layer: TCP, UDP, SCTP (the rest of this talk). QoS: LBE (Scavenger), Equivalent DiffServ (EDS). Bandwidth reservation: AAA-based bandwidth on demand; lightpaths managed as Grid resources. Monitoring.

NORDUnet 2003, Reykjavik, Iceland, 26 August Problems with TCP in Data-Intensive Grids

NORDUnet 2003, Reykjavik, Iceland, 26 August Problem Statement End-user's perspective: using TCP as the data-transport protocol for Grids leads to poor bandwidth utilization in fast WANs. Network protocol designer's perspective: TCP is inefficient in high bandwidth*delay networks because few TCP implementations have been tuned for gigabit WANs, and TCP was not designed with gigabit WANs in mind.

NORDUnet 2003, Reykjavik, Iceland, 26 August Design Problems (1/2) TCP's congestion control algorithm (AIMD) is not suited to gigabit networks. Due to TCP's limited feedback mechanisms, line errors are interpreted as congestion, so bandwidth utilization is reduced when it shouldn't be. RFC 2581 (which gives the formula for increasing cwnd) "forgot" delayed ACKs, making loss recovery take twice as long as it should.
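A back-of-the-envelope sketch (not part of the original slides) of why delayed ACKs halve the congestion-avoidance growth rate, assuming the RFC 2581 increase of MSS*MSS/cwnd bytes per ACK:

MSS = 1460   # bytes

def growth_per_rtt(cwnd_bytes, acks_per_segment):
    """Approximate cwnd growth over one RTT with the given ACK pattern."""
    segments = cwnd_bytes // MSS
    acks = int(segments * acks_per_segment)
    growth = 0.0
    for _ in range(acks):
        growth += MSS * MSS / (cwnd_bytes + growth)   # RFC 2581 per-ACK increase
    return growth

cwnd = 100 * MSS
print(round(growth_per_rtt(cwnd, 1.0)))   # ~1450 bytes, about one MSS per RTT
print(round(growth_per_rtt(cwnd, 0.5)))   # ~730 bytes: delayed ACKs halve the growth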

NORDUnet 2003, Reykjavik, Iceland, 26 August Design Problems (2/2) TCP requires that ACKs be sent at most every second segment, which causes ACK bursts; such bursts are difficult for the kernel and the NIC to handle.

NORDUnet 2003, Reykjavik, Iceland, 26 August AIMD (1/2) Van Jacobson, SIGCOMM 1988. Congestion avoidance algorithm: for each ACK in an RTT without loss, increase: cwnd_i+1 = cwnd_i + 1/cwnd_i (cwnd in segments); for each window experiencing loss, decrease: cwnd_i+1 = cwnd_i / 2. Slow-start algorithm: increase cwnd by one MSS per ACK until ssthresh is reached.
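A minimal sketch of these rules (illustrative Python, not from the talk), with cwnd counted in MSS-sized segments:

def on_ack(cwnd, ssthresh):
    """One ACK received, no loss detected."""
    if cwnd < ssthresh:
        return cwnd + 1          # slow start: +1 MSS per ACK (exponential per RTT)
    return cwnd + 1.0 / cwnd     # congestion avoidance: ~+1 MSS per RTT

def on_loss(cwnd):
    """A window experienced loss."""
    return cwnd / 2              # multiplicative decrease

cwnd, ssthresh = 1.0, 64.0
for _ in range(200):             # feed in a few ACKs
    cwnd = on_ack(cwnd, ssthresh)
print(on_loss(cwnd))             # window is halved after a loss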

NORDUnet 2003, Reykjavik, Iceland, 26 August AIMD (2/2) Additive Increase: a TCP connection slowly increases its bandwidth utilization in the absence of loss, forever, unless it runs out of send/receive buffers or detects a packet loss; TCP is greedy and makes no attempt to reach a stationary state. Multiplicative Decrease: a TCP connection drastically reduces its bandwidth utilization whenever a packet loss is detected, on the assumption that line errors are negligible, hence packet loss means congestion.
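Additive increase only stops when the send/receive buffers are exhausted, so for fast WANs the buffers must cover the bandwidth*delay product. A small illustrative calculation (the 1 Gbit/s capacity and 120 ms RTT are assumed figures; the socket options used are the standard SO_SNDBUF/SO_RCVBUF):

import socket

capacity_bps = 1_000_000_000      # assumed: 1 Gbit/s transatlantic path
rtt_s = 0.120                     # assumed: 120 ms RTT

bdp_bytes = int(capacity_bps * rtt_s / 8)   # ~15 MBytes can be in flight
print(bdp_bytes)

# Request send/receive buffers of at least one BDP; the kernel may clamp
# these to its configured maxima (e.g. net.core.rmem_max on Linux).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)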

NORDUnet 2003, Reykjavik, Iceland, 26 August Congestion Window (cwnd) [Diagram: evolution of cwnd over time, showing slow start, congestion avoidance, loss events, and buffering beyond the bandwidth*delay product]

NORDUnet 2003, Reykjavik, Iceland, 26 August Disastrous Effect of Packet Loss on TCP in Fast WANs (1/2) [Plot: cwnd recovery after a single packet loss under AIMD, C=1 Gbit/s, MSS=1,460 Bytes]

NORDUnet 2003, Reykjavik, Iceland, 26 August Disastrous Effect of Packet Loss on TCP in Fast WANs (2/2) TCP is particularly sensitive to packet loss in fast WANs (i.e., when both cwnd and RTT are large) and takes a long time to recover from a single loss. TCP should react to congestion rather than packet loss, because line errors and transient faults in equipment are no longer negligible in fast WANs, and it should recover more quickly from a loss.

NORDUnet 2003, Reykjavik, Iceland, 26 August Characterization of the Problem (1/2) The responsiveness ρ measures how quickly we go back to using the network link at full capacity after experiencing a loss (i.e., the loss recovery time if the loss occurs when bandwidth utilization = network link capacity). After a loss the window is halved, so it must grow back by C·RTT/2 at one increment inc per RTT, which takes C·RTT/(2·inc) round trips: ρ = C · RTT² / (2 · inc)

NORDUnet 2003, Reykjavik, Iceland, 26 August Characterization of the Problem (2/2)
Capacity                          RTT          # inc      Responsiveness
9.6 kbit/s (typ. WAN in 1988)     max: 40 ms   1          0.6 ms
10 Mbit/s (typ. LAN in 1988)      max: 20 ms   8          ~150 ms
100 Mbit/s (typ. LAN in 2003)     max: 5 ms    20         ~100 ms
622 Mbit/s                        120 ms       ~2,900     ~6 min
2.5 Gbit/s                        120 ms       ~11,600    ~23 min
10 Gbit/s                         120 ms       ~46,200    ~1h 30min
inc size = MSS = 1,460 Bytes
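As a sanity check (a sketch, not from the slides), the responsiveness formula above reproduces the table's orders of magnitude:

MSS_BITS = 1460 * 8

def responsiveness(capacity_bps, rtt_s, inc_bits=MSS_BITS):
    """Time to return to full capacity after a loss: rho = C * RTT^2 / (2 * inc)."""
    return capacity_bps * rtt_s ** 2 / (2 * inc_bits)   # seconds

print(responsiveness(10e6, 0.020))          # ~0.17 s, same order as the table's ~150 ms
print(responsiveness(2.5e9, 0.120))         # ~1540 s, i.e. a few tens of minutes
print(responsiveness(10e9, 0.120) / 3600)   # ~1.7 h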

NORDUnet 2003, Reykjavik, Iceland, 26 August Congestion vs. Line Errors [Table: required bit loss rate and required packet loss rate to sustain throughputs from 10 Mbit/s up to 10 Gbit/s; RTT=120 ms, MTU=1,500 Bytes, AIMD] At gigabit speed, the loss rate required for packet loss to be ascribed only to congestion is unrealistic with AIMD.
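Figures of this kind are commonly derived from the Mathis et al. steady-state model, throughput ≈ (MSS/RTT) · 1.22/√p. The sketch below is an assumption on my part (it does not reproduce the slide's exact numbers); it shows how the tolerable packet loss rate falls with the square of the target throughput:

# Required packet loss rate p for a target AIMD throughput (Mathis et al. model).
MSS_BITS = 1460 * 8
RTT = 0.120   # seconds

def required_loss_rate(throughput_bps):
    return (1.22 * MSS_BITS / (RTT * throughput_bps)) ** 2

for bps in (10e6, 100e6, 2.5e9, 10e9):
    print(f"{bps/1e6:>7.0f} Mbit/s -> p <= {required_loss_rate(bps):.1e}")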

NORDUnet 2003, Reykjavik, Iceland, 26 August Single TCP Stream Performance under Periodic Losses Loss rate = 0.01%: LAN BW utilization = 99%; WAN BW utilization = 1.2%. MSS=1,460 Bytes.

NORDUnet 2003, Reykjavik, Iceland, 26 August Solutions

NORDUnet 2003, Reykjavik, Iceland, 26 August What Can We Do? To achieve higher throughputs over high bandwidth*delay networks, we can: change the AIMD algorithm, use larger MTUs, change the initial setting of ssthresh, and avoid losses in end hosts. Two proposals: Kelly: Scalable TCP; Ravot: GridDT.

NORDUnet 2003, Reykjavik, Iceland, 26 August Delayed ACKs with AIMD RFC 2581 (the spec defining TCP's AIMD congestion control algorithm) erred: it implicitly assumes one ACK per packet, whereas in reality delayed ACKs produce one ACK every second packet. Responsiveness is therefore multiplied by two, which makes a bad situation worse in fast WANs. The problem is fixed by Appropriate Byte Counting (ABC) in RFC 3465 (Feb 2003), not implemented in Linux.
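A minimal sketch of the ABC idea from RFC 3465 (variable names are my own): grow cwnd by the number of bytes actually acknowledged rather than by the number of ACKs, so delayed ACKs no longer halve the growth rate:

MSS = 1460

class CongestionAvoidance:
    def __init__(self, cwnd_bytes):
        self.cwnd = cwnd_bytes
        self.bytes_acked = 0

    def on_ack(self, acked_bytes):
        self.bytes_acked += acked_bytes
        if self.bytes_acked >= self.cwnd:   # one full window's worth acknowledged
            self.bytes_acked -= self.cwnd
            self.cwnd += MSS                # +1 MSS per RTT, regardless of ACK pattern

ca = CongestionAvoidance(100 * MSS)
for _ in range(50):                         # delayed ACKs: 50 ACKs cover 100 segments
    ca.on_ack(2 * MSS)
print(ca.cwnd // MSS)                       # 101: still one MSS of growth per window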

NORDUnet 2003, Reykjavik, Iceland, 26 August Delayed ACKs with AIMD and ABC

NORDUnet 2003, Reykjavik, Iceland, 26 August Scalable TCP: Algorithm For cwnd > lwnd, replace AIMD with a new algorithm: for each ACK in an RTT without loss: cwnd_i+1 = cwnd_i + a; for each window experiencing loss: cwnd_i+1 = cwnd_i - (b × cwnd_i). Kelly's proposal during internship at CERN: (lwnd, a, b) = (16, 0.01, 0.125), a trade-off between fairness, stability, variance and convergence.
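An illustrative sketch of the Scalable TCP update (Python, cwnd in segments), using the (lwnd, a, b) = (16, 0.01, 0.125) setting from the slide; it also shows the capacity-independent recovery in RTTs:

LWND, A, B = 16, 0.01, 0.125

def on_ack(cwnd):
    if cwnd <= LWND:
        return cwnd + 1.0 / cwnd        # legacy AIMD behaviour below lwnd
    return cwnd + A                     # scalable: +a per ACK, ~1% growth per RTT

def on_loss(cwnd):
    if cwnd <= LWND:
        return cwnd / 2                 # legacy multiplicative decrease
    return cwnd - B * cwnd              # scalable: shed only 12.5%

cwnd = 10000.0                          # e.g. a gigabit-WAN sized window
cwnd = on_loss(cwnd)                    # 8750 after a loss
rtts = 0
while cwnd < 10000.0:                   # exponential recovery, independent of capacity
    for _ in range(int(cwnd)):          # one RTT's worth of ACKs
        cwnd = on_ack(cwnd)
    rtts += 1
print(rtts)                             # ~14 RTTs to recover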

NORDUnet 2003, Reykjavik, Iceland, 26 August Scalable TCP: lwnd

NORDUnet 2003, Reykjavik, Iceland, 26 August Scalable TCP: Advantages Responsiveness is independent of capacity; responsiveness improves dramatically for gigabit networks.

NORDUnet 2003, Reykjavik, Iceland, 26 August Scalable TCP: Responsiveness Independent of Capacity

NORDUnet 2003, Reykjavik, Iceland, 26 August Scalable TCP: Improved Responsiveness Responsiveness for RTT=200 ms and MSS=1,460 Bytes: Scalable TCP: ~3 s (independent of capacity); AIMD: ~3 min at 100 Mbit/s, ~1h 10min at 2.5 Gbit/s, ~4h 45min at 10 Gbit/s. Patch available for Linux kernel:
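A rough reconstruction of this comparison (an approximation, not the slide's own calculation): AIMD needs C·RTT²/(2·MSS) seconds to recover, while Scalable TCP needs about ln(1/(1-b))/ln(1+a) RTTs regardless of capacity:

import math

RTT, MSS_BITS = 0.200, 1460 * 8
A, B = 0.01, 0.125                       # Scalable TCP parameters

def aimd_recovery(capacity_bps):
    return capacity_bps * RTT ** 2 / (2 * MSS_BITS)          # seconds

def scalable_recovery():
    # cwnd grows by a factor ~(1 + a) per RTT and must regain the factor 1/(1 - b)
    return RTT * math.log(1 / (1 - B)) / math.log(1 + A)     # seconds

print(scalable_recovery())               # ~2.7 s, independent of capacity
print(aimd_recovery(100e6) / 60)         # ~2.9 min at 100 Mbit/s
print(aimd_recovery(2.5e9) / 3600)       # ~1.2 h at 2.5 Gbit/s
print(aimd_recovery(10e9) / 3600)        # ~4.8 h at 10 Gbit/s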

NORDUnet 2003, Reykjavik, Iceland, 26 August Scalable TCP vs. AIMD: Benchmarking [Table: aggregate throughput vs. number of flows for standard TCP, TCP with a new device driver, and Scalable TCP] Bulk throughput tests with C=2.5 Gbit/s; flows transfer 2 GBytes and start again, for 20 min.

NORDUnet 2003, Reykjavik, Iceland, 26 August GridDT: Algorithm Congestion avoidance algorithm: for each ACK in an RTT without loss, increase: cwnd_i+1 = cwnd_i + A/cwnd_i. By modifying A dynamically according to RTT, GridDT guarantees fairness among TCP connections: A_1 / A_2 = (RTT_1 / RTT_2)²
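A minimal sketch of the idea (helper names are my own; this is not Ravot's implementation): pick each flow's additive increment A proportional to the square of its RTT, so flows with different RTTs grow their sending rates at the same pace:

def additive_increment(rtt, reference_rtt, reference_A=1.0):
    """A scales with RTT^2 relative to a reference flow."""
    return reference_A * (rtt / reference_rtt) ** 2

# Figures from the testbed slides: CERN-StarLight 117 ms, CERN-Sunnyvale 181 ms.
A_starlight = 3.0
A_sunnyvale = additive_increment(0.181, 0.117, reference_A=A_starlight)
print(round(A_sunnyvale, 1))   # ~7.2, close to the A1 = 7 used in the experiment

def on_ack(cwnd, A):
    return cwnd + A / cwnd     # per-ACK increase: ~A segments gained per RTT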

NORDUnet 2003, Reykjavik, Iceland, 26 August AIMD: RTT Bias [Diagram: two hosts at CERN sending to hosts at StarLight and Sunnyvale over GE access links, a shared 1 GE bottleneck, and POS 2.5 Gbit/s / 10 Gbit/s wide-area links] Two TCP streams share a 1 Gbit/s bottleneck. CERN-Sunnyvale: RTT=181 ms, avg. throughput over a period of 7,000 s = 202 Mbit/s. CERN-StarLight: RTT=117 ms, avg. throughput over a period of 7,000 s = 514 Mbit/s. MTU = 9,000 Bytes. Link utilization = 72%.

NORDUnet 2003, Reykjavik, Iceland, 26 August GridDT Fairer than AIMD [Diagram: same CERN-StarLight-Sunnyvale topology as before, with A2=3 on the RTT=117 ms path and A1=7 on the RTT=181 ms path] CERN-Sunnyvale: RTT = 181 ms, additive inc. A1 = 7, avg. throughput = 330 Mbit/s. CERN-StarLight: RTT = 117 ms, additive inc. A2 = 3, avg. throughput = 388 Mbit/s. MTU = 9,000 Bytes. Link utilization = 72%.

NORDUnet 2003, Reykjavik, Iceland, 26 August Measurements with Different MTUs (1/2) Mathis advocates the use of large MTUs: we tested the standard Ethernet MTU and Jumbo frames. Experimental environment: Linux, SysKonnect device driver 6.12; traffic generated by iperf, average throughput over the last 5 seconds; single TCP stream; RTT = 119 ms; duration of each test: 2 hours; transfers from Chicago to Geneva. MTUs: POS MTU set to 9180; max MTU on the NIC of a PC running Linux: 9000.

NORDUnet 2003, Reykjavik, Iceland, 26 August Measurements with Different MTUs (2/2) [Plot: single-stream TCP throughput over time] TCP max: 990 Mbit/s (MTU=9000); TCP max: 940 Mbit/s (MTU=1500).

NORDUnet 2003, Reykjavik, Iceland, 26 August Related Work Floyd: HighSpeed TCP; Low: FAST TCP; Katabi: XCP; Web100 and Net100 projects; PFLDnet 2003 workshop:

NORDUnet 2003, Reykjavik, Iceland, 26 August Research Directions Compare performance of TCP variants. Investigate the proposal by Shorten, Leith, Foy and Kilduff. More stringent definition of congestion: react only when more than 1 packet is lost per RTT (sketched below). ACK more than two packets in one go, to decrease ACK bursts. SCTP vs. TCP.
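A hypothetical sketch of the stricter congestion definition mentioned above (the names and policy details are assumptions, not an existing implementation): back off only when more than one packet is lost within a single RTT, and treat an isolated loss as a line error:

B = 0.5                                   # standard AIMD back-off factor

class LossFilter:
    def __init__(self):
        self.losses_this_rtt = 0

    def on_rtt_boundary(self):
        self.losses_this_rtt = 0          # reset the per-RTT loss count

    def on_loss(self, cwnd):
        self.losses_this_rtt += 1
        if self.losses_this_rtt > 1:      # several losses in one RTT: congestion
            return cwnd * (1 - B)
        return cwnd                       # isolated loss: assume line error, keep cwnd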