Slide 1: TCP/IP and Other Transports for High Bandwidth Applications
Real Applications on Real Networks
Richard Hughes-Jones, University of Manchester
(Slides: follow "Talks", then look for "Brasov".)

Slide 2: What we might cover!
This is what researchers find when they try to use high performance networks.
- Real Applications on Real Networks
  - Disk-2-disk applications on real networks
  - Memory-2-memory tests
  - Comparison of different data moving applications
  - The effect (improvement) of different TCP stacks
  - Transatlantic disk-2-disk at Gigabit speeds
- Remote Computing Farms
  - The effect of distance
  - Protocol vs implementation
- Radio Astronomy e-VLBI
  - Users with data that is random noise!
Thanks to Sylvain Ravot (CERN), Les Cottrell (SLAC), Brian Tierney (LBL), Robin Tasker (DL) and Ralph Spencer (Jodrell Bank) for allowing me to use their slides.

Slide 3: "Server Quality" Motherboards
- SuperMicro P4DP8-2G (P4DP6)
- Dual Xeon
- 400/522 MHz front side bus
- 6 PCI / PCI-X slots
- 4 independent PCI buses: 64 bit 66 MHz PCI, 100 MHz PCI-X, 133 MHz PCI-X
- Dual Gigabit Ethernet
- Adaptec AIC-7899W dual channel SCSI
- UDMA/100 bus master/EIDE channels: data transfer rates of 100 MB/sec burst

Slide 4: "Server Quality" Motherboards
- Boston/Supermicro H8DAR
- Two Dual Core Opterons
- 200 MHz DDR memory, theory BW: 6.4 Gbit
- HyperTransport
- 2 independent PCI buses: 133 MHz PCI-X
- 2 Gigabit Ethernet
- SATA
- (PCI-e)

Slide 5: UK Transfers: MB-NG and SuperJANET4. Throughput for real users.

Slide 6: Topology of the MB-NG Network
[Network diagram: Manchester, UCL and RAL domains, each with edge/boundary Cisco 7609 routers, joined across the UKERNA Development Network; hosts man01-man03, lon01-lon03, ral01-ral02 plus HW RAID; key: Gigabit Ethernet, 2.5 Gbit POS access, MPLS, admin domains.]

Slide 7: Topology of the Production Network
[Network diagram: Manchester domain (man01) to RAL domain (ral01, HW RAID) across the production network, 3 routers and 2 switches; key: Gigabit Ethernet, 2.5 Gbit POS access, 10 Gbit POS.]

Slide 8: iperf Throughput + Web100
- SuperMicro on MB-NG network
  - HighSpeed TCP
  - Line speed 940 Mbit/s
  - DupACK? <10 (expect ~400)
- BaBar on Production network
  - Standard TCP
  - 425 Mbit/s
  - DupACKs, re-transmits

Slide 9: Applications: Throughput Mbit/s
- HighSpeed TCP
- 2 GByte file, RAID5
- SuperMicro + SuperJANET
- bbcp
- bbftp
- Apache
- Gridftp
- Previous work used RAID0 (not disk limited)

Slide 10: bbftp: What else is going on?
- Scalable TCP
- BaBar + SuperJANET
- SuperMicro + SuperJANET
- Congestion window / duplicate ACK
- Variation not TCP related?
  - Disk speed / bus transfer
  - Application

Slide 11: bbftp: Host & Network Effects
- 2 Gbyte file, RAID5 disks: 1200 Mbit/s read, 600 Mbit/s write
- Scalable TCP
- BaBar + SuperJANET: instantaneous Mbit/s
- SuperMicro + SuperJANET: instantaneous Mbit/s for 6 sec, then Mbit/s
- SuperMicro + MB-NG: instantaneous Mbit/s for 1.3 sec, then Mbit/s

Slide 12: Average Transfer Rates (Mbit/s)
Table: for each application (Iperf, bbcp, bbftp, apache, Gridftp) and TCP stack (Standard, HighSpeed, Scalable), rates measured with SuperMicro on MB-NG, SuperMicro on SuperJANET4, BaBar on SuperJANET4 and SC2004 on UKLight (Gridftp: HighSpeed 320, Scalable 335).
- New stacks give more throughput
- Rate decreases

Slide 13: Transatlantic Disk to Disk Transfers with UKLight, SuperComputing 2004

Slide 14: SC2004 UKLight Overview
[Network diagram: Manchester (MB-NG 7600 OSR) and UCL HEP / UCL network to ULCC UKLight; UKLight 10G to Chicago Starlight and Amsterdam (SURFnet / EuroLink 10G); NLR lambda NLR-PITT-STAR-10GE-16 to the SC2004 Caltech Booth (UltraLight IP, Caltech 7600) and SLAC Booth (Cisco 6509); four 1GE channels on one leg, two 1GE channels on the other; K2 and Ci switches at both ends.]

Slide 15:
- SCINet Collaboration at SC2004
- Setting up the BW Bunker
- The BW Challenge at the SLAC Booth
- Working with S2io, Sun, Chelsio

Slide 16: Transatlantic Ethernet: TCP Throughput Tests
- Supermicro X5DPE-G2 PCs
- Dual 2.9 GHz Xeon CPU, FSB 533 MHz
- 1500 byte MTU
- 2.6.6 Linux kernel
- Memory-memory TCP throughput
- Standard TCP
- Wire rate throughput of 940 Mbit/s
- First 10 sec
- Work in progress to study: implementation detail, advanced stacks, effect of packet loss, sharing
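A minimal sketch of the kind of memory-to-memory TCP throughput measurement described above, written in Python for illustration only (it is not the tool used in the tests); the port, block size and duration are assumed values.

    # mem2mem_tcp.py - memory-to-memory TCP throughput sketch (illustrative only)
    import socket, sys, time

    PORT = 5001              # assumed port
    BLOCK = 64 * 1024        # 64 kByte application write size (assumed)
    DURATION = 10            # seconds, like the "first 10 sec" plots above

    def server():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("", PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        total, start = 0, time.time()
        while True:
            data = conn.recv(BLOCK)
            if not data:
                break
            total += len(data)
        dt = time.time() - start
        print(f"received {total * 8 / dt / 1e6:.1f} Mbit/s over {dt:.1f} s")

    def client(host):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((host, PORT))
        buf = b"\0" * BLOCK
        start = time.time()
        while time.time() - start < DURATION:
            sock.sendall(buf)        # keep the pipe full from memory, no disk involved
        sock.close()

    if __name__ == "__main__":
        client(sys.argv[2]) if sys.argv[1] == "client" else server()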

Slide 17: SC2004 Disk-Disk bbftp
- bbftp file transfer program uses TCP/IP
- UKLight path: London-Chicago-London; PCs: Supermicro + 3Ware RAID0
- MTU 1500 bytes; socket size 22 Mbytes; rtt 177 ms; SACK off
- Move a 2 Gbyte file
- Web100 plots:
  - Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s)
  - Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead)
- Disk-TCP-Disk at 1 Gbit/s

Slide 18: Network & Disk Interactions (work in progress)
- Hosts: Supermicro X5DPE-G2 motherboards
  - dual 2.8 GHz Xeon CPUs with 512 kbyte cache and 1 Mbyte memory
  - 3Ware controller on 133 MHz PCI-X bus configured as RAID0
  - six 74.3 GByte Western Digital Raptor WD740 SATA disks, 64 kbyte stripe size
- Measure memory to RAID0 transfer rates with & without UDP traffic
  - Disk write: 1735 Mbit/s
  - Disk write + MTU UDP: 1218 Mbit/s, drop of 30%
  - Disk write + MTU UDP: 1400 Mbit/s, drop of 19%
  - % CPU kernel mode

Slide 19: Remote Computing Farms in the ATLAS TDAQ Experiment

Slide 20: Remote Computing Concepts
[Diagram: ATLAS detectors and Level 1 trigger feed the ROBs; the Level 2 trigger (L2PUs) and event builders (SFIs) sit on the Data Collection Network; local event processing farms (PFs) at CERN B513 and remote event processing farms at Copenhagen, Edmonton, Krakow and Manchester are reached over lightpaths and GÉANT via the Back End Network; SFOs write to mass storage; rates marked: ~PByte/sec at the experimental area, 320 MByte/sec to storage.]

Slide 21: ATLAS Remote Farms: Network Connectivity

Slide 22: ATLAS Application Protocol
- Event Request
  - EFD requests an event from SFI
  - SFI replies with the event, ~2 Mbytes
- Processing of event
- Return of computation
  - EF asks SFO for buffer space
  - SFO sends OK
  - EF transfers the results of the computation
- tcpmon: an instrumented TCP request-response program that emulates the Event Filter Daemon (EFD) to SFI communication (a sketch of this pattern follows below)
[Timeline diagram: Request event / Send event data; Request buffer / Send OK / Send processed event; process event; request-response time recorded as a histogram; actors: EFD, SFI and SFO.]
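A rough sketch of the request-response pattern that tcpmon emulates (small request, ~2 Mbyte event reply, time each exchange). This is not the real tcpmon; the port, request size and event size are assumptions.

    # Request-response emulator in the spirit of tcpmon (illustrative sketch, not the real tool).
    import socket, time

    PORT, REQ_SIZE, EVENT_SIZE = 14000, 64, 2 * 1024 * 1024   # assumed values

    def recv_exact(sock, n):
        """Read exactly n bytes (TCP is a byte stream, not a message stream)."""
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed")
            buf += chunk
        return buf

    def sfi_server():
        """Plays the SFI role: waits for an event request, returns an 'event'."""
        srv = socket.socket(); srv.bind(("", PORT)); srv.listen(1)
        conn, _ = srv.accept()
        event = b"\0" * EVENT_SIZE
        while True:
            recv_exact(conn, REQ_SIZE)          # event request
            conn.sendall(event)                 # ~2 Mbyte event data

    def efd_client(host, n_events=100, wait=0.05):
        """Plays the EFD role: request events, print the per-event response time."""
        sock = socket.socket(); sock.connect((host, PORT))
        for i in range(n_events):
            t0 = time.time()
            sock.sendall(b"R" * REQ_SIZE)       # small request
            recv_exact(sock, EVENT_SIZE)        # wait for the full event
            print(f"event {i}: {1000 * (time.time() - t0):.1f} ms")
            time.sleep(wait)                    # emulate ~50 ms of processing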

Slide 23: Using Web100 TCP Stack Instrumentation to analyse the application protocol - tcpmon

Slide 24: tcpmon: TCP Activity Manc-CERN Req-Resp
- Round trip time 20 ms
- 64 byte request (green), 1 Mbyte response (blue)
- TCP in slow start: 1st event takes 19 rtt or ~380 ms
- TCP congestion window gets re-set on each request
  - TCP stack implementation detail to reduce Cwnd after inactivity
  - Even after 10 s, each response takes 13 rtt or ~260 ms
- Transfer achievable throughput 120 Mbit/s
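A back-of-envelope calculation of why a 1 Mbyte response costs many round trips while the sender is in slow start. The count depends strongly on the initial congestion window, delayed ACKs and any cap imposed by small buffers, which is why the measurement above sees ~19 rtt; the initial window and the 64-segment cap below are assumptions.

    # Round trips needed to deliver a response during TCP slow start (illustrative only).
    import math

    MSS = 1448          # bytes per segment (as in the tcpdump slides)
    RTT = 0.020         # 20 ms Manchester - CERN

    def slow_start_rtts(response_bytes, initial_cwnd=2, max_cwnd_segments=None):
        segments = math.ceil(response_bytes / MSS)
        cwnd, sent, rtts = initial_cwnd, 0, 0
        while sent < segments:
            sent += cwnd
            rtts += 1
            cwnd *= 2                               # exponential growth, one doubling per RTT
            if max_cwnd_segments:
                cwnd = min(cwnd, max_cwnd_segments) # cap set by socket buffers (assumed)
        return rtts

    for cap in (None, 64):
        n = slow_start_rtts(1_000_000, max_cwnd_segments=cap)
        print(f"cwnd cap {cap}: {n} rtt -> {n * RTT * 1000:.0f} ms per 1 Mbyte response")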

Slide 25: tcpmon: TCP Activity Manc-CERN Req-Resp, TCP stack tuned
- Round trip time 20 ms
- 64 byte request (green), 1 Mbyte response (blue)
- TCP starts in slow start: 1st event takes 19 rtt or ~380 ms
- TCP congestion window grows nicely
- Response takes 2 rtt after ~1.5 s
- Rate ~10/s (with 50 ms wait)
- Transfer achievable throughput grows to 800 Mbit/s
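A sketch of the kind of tuning meant by "TCP stack tuned": large socket buffers set before connecting, and, on Linux kernels that provide the knob, stopping the stack from collapsing the congestion window after idle periods between requests. The buffer size and endpoint are assumed values; this is not the configuration actually used in the tests.

    # Socket-level tuning sketch (illustrative): large buffers + keep cwnd across idle periods.
    import socket, subprocess

    BUF_BYTES = 16 * 1024 * 1024          # >= bandwidth * RTT product (assumed value)

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Buffers are set before connect() so the window scale option can be negotiated.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_BYTES)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_BYTES)
    sock.connect(("remote.example.org", 14000))   # hypothetical endpoint

    # System-wide (Linux, root required); on kernels that provide this sysctl it stops
    # the congestion window being reset after the application goes quiet between requests.
    subprocess.run(["sysctl", "-w", "net.ipv4.tcp_slow_start_after_idle=0"], check=False)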

Slide 26: tcpmon: TCP Activity Alberta-CERN Req-Resp, TCP stack tuned
- Round trip time 150 ms
- 64 byte request (green), 1 Mbyte response (blue)
- TCP starts in slow start: 1st event takes 11 rtt or ~1.67 s
- TCP congestion window in slow start to ~1.8 s, then congestion avoidance
- Response in 2 rtt after ~2.5 s
- Rate 2.2/s (with 50 ms wait)
- Transfer achievable throughput grows slowly from 250 to 800 Mbit/s

Slide 27: Time Series of Request-Response Latency
- Alberta - CERN
  - Round trip time 150 ms
  - 1 Mbyte of data returned
  - Stable for ~150 s at 300 ms
  - Falls to 160 ms with ~80 μs variation
- Manchester - CERN
  - Round trip time 20 ms
  - 1 Mbyte of data returned
  - Stable for ~18 s at ~42.5 ms
  - Then alternate points at 29 & 42.5 ms

Slide 28: Using the Trigger DAQ Application

Slide 29: Time Series of T/DAQ event rate
- Manchester - CERN, round trip time 20 ms, 1 Mbyte of data returned
- 3 nodes: 1 Gigabit Ethernet + two 100 Mbit
- 2 nodes: two 100 Mbit nodes
- 1 node: one 100 Mbit node
- Event rate:
  - Use tcpmon transfer time of ~42.5 ms
  - Add the time to return the data: 95 ms
  - Expected rate 10.5/s
  - Observe ~6/s for the gigabit node
- Reason: TCP buffers could not be set large enough in the T/DAQ application
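The expected rate above is simple arithmetic on the per-event times. A one-line check, with the reading that the 95 ms is the total request-plus-return time per event treated as an assumption (chosen because it reproduces the quoted 10.5/s):

    # Expected event rate from the per-event times quoted on the slide above.
    transfer_ms = 42.5     # tcpmon request-response transfer time, Manchester-CERN
    total_ms = 95.0        # transfer + time to return the processed data (assumed reading)
    print(f"expected rate ~ {1000.0 / total_ms:.1f} events/s")   # ~10.5/s, vs ~6/s observed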

Slide 30: Tcpdump of the Trigger DAQ Application

Slide 31: tcpdump of the T/DAQ dataflow at SFI (1)
- CERN-Manchester, 1.0 Mbyte event
- Remote EFD requests event from SFI
- Incoming event request, followed by ACK
- SFI sends the event as N x 1448 byte packets
- Limited by TCP receive buffer
- Time 115 ms (~4 ev/s)
- When TCP ACKs arrive more data is sent

Slide 32: Tcpdump of TCP Slowstart at SFI (2)
- CERN-Manchester, 1.0 Mbyte event
- Remote EFD requests event from SFI: first event request
- SFI sends the event as N x 1448 byte packets
- Limited by TCP slowstart
- Time 320 ms
- When ACKs arrive more data is sent

Slide 33: tcpdump of the T/DAQ dataflow for SFI & SFO
- CERN-Manchester, another test run, 1.0 Mbyte event
- Remote EFD requests events from SFI
- Remote EFD sending computation back to SFO
- Link setup & TCP slowstart
- Links closed by the application

Slide 34: Some Conclusions
- The TCP protocol dynamics strongly influence the behaviour of the application.
- Care is required with the application design, e.g. use of timeouts.
- With the correct TCP buffer sizes:
  - It is not throughput but the round-trip nature of the application protocol that determines performance.
  - Requesting the 1-2 Mbytes of data takes 1 or 2 round trips.
  - TCP slowstart (the opening of Cwnd) considerably lengthens the time for the first block of data.
  - Implementation "improvements" (Cwnd reduction) kill performance!
- When the TCP buffer sizes are too small (default):
  - The amount of data sent is limited on each rtt.
  - Data is sent and arrives in bursts.
  - It takes many round trips to send 1 or 2 Mbytes.
- The end hosts themselves:
  - CPU power is required for the TCP/IP stack as well as the application.
  - Packets can be lost in the IP stack due to lack of processing power.
- Although the application is ATLAS-specific, the network interactions are applicable to other areas including:
  - Remote iSCSI
  - Remote database accesses
  - Real-time Grid computing, e.g. real-time interactive medical image processing

Slide 35: Radio Astronomy e-VLBI

Slide 36: Radio Astronomy (with help from Ralph Spencer, Jodrell Bank)
- The study of celestial objects at 1 m wavelength.
- Sensitivity for continuum sources ∝ 1/√(Bτ), where B = bandwidth and τ = integration time.
- High resolution achieved by interferometers.
- Some radio emitting X-ray binary stars in our own galaxy: GRS (MERLIN), SS433 (MERLIN and European VLBI), Cygnus X-1 (VLBA).

Slide 37: Earth-Rotation Synthesis and Fringes
- Need ~12 hours for full synthesis, not necessarily collecting data for all that time.
- NB trade-off between B and τ for sensitivity.
- Telescope data correlated in pairs: N(N-1)/2 baselines.
- MERLIN u-v coverage; fringes obtained with the correct signal phase.

Slide 38: The European VLBI Network: EVN
- Detailed radio imaging uses antenna networks over 100s to 1000s of km.
- At the faintest levels, the sky teems with galaxies being formed.
- Radio penetrates cosmic dust, so the process can be seen clearly.
- Telescopes in place ...
- Disk recording at 512 Mb/s.
- Real-time connection allows greater: response, reliability, sensitivity.

Slide 39: EVN-NREN connectivity
[Map: Westerbork (Netherlands, dedicated Gbit link), Onsala (Sweden, Gbit link, Chalmers University of Technology, Gothenburg), Jodrell Bank and Cambridge (UK, MERLIN), Dwingeloo (DWDM link), Medicina (Italy), Torun (Poland, Gbit link).]

Slide 40: UDP Throughput Manchester-Dwingeloo (Nov 2003)
- Manchester: 2.0 GHz Xeon; Dwingeloo: 1.2 GHz PIII
- Near wire rate, 950 Mbps
- NB the record stands at 6.6 Gbps SLAC-CERN
- 4th year project: Adam Mathews, Steve O'Toole
[Plots: throughput vs packet spacing, packet loss, CPU kernel load on sender and receiver.]

Slide 41: Packet loss distribution
- Cumulative distribution of packet loss; each bin is 12 μs wide
- Long range effects in the data?
- Poisson comparison
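The Poisson question above can be tested by comparing the spacings between lost packets with the memoryless (exponential) distribution a Poisson loss process would give. A small sketch; the input format (a list of lost-packet sequence numbers) is an assumption.

    # Compare measured inter-loss gaps with a Poisson (memoryless) loss process.
    import math

    def loss_gap_cdf(lost_packet_indices):
        gaps = sorted(b - a for a, b in zip(lost_packet_indices, lost_packet_indices[1:]))
        mean_gap = sum(gaps) / len(gaps)
        n = len(gaps)
        for i, g in enumerate(gaps, 1):
            empirical = i / n                           # measured cumulative distribution
            poisson = 1.0 - math.exp(-g / mean_gap)     # exponential CDF with the same mean
            print(f"gap {g:8d}  measured {empirical:5.3f}  Poisson {poisson:5.3f}")

    # Example with made-up loss positions; long-range effects show up as an excess
    # of both very short and very long gaps relative to the Poisson curve.
    loss_gap_cdf([10, 250, 260, 1200, 1210, 5000])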

Slide 42: UDP Tests, January 2005
- Simon Casey (PhD project)
- Between JBO and JIVE in Dwingeloo, using the production network
- Period of high packet loss (3%):

Slide 43: The GÉANT2 Launch, June 2005

Slide 44: e-VLBI at the GÉANT2 Launch, June 2005
[Map: Jodrell Bank (UK), Dwingeloo (DWDM link), Medicina (Italy), Torun (Poland).]

Slide 45: e-VLBI UDP Data Streams

Slide 46: UDP Performance: 3 Flows on GÉANT
- Throughput: 5 hour run
  - Jodrell to JIVE: 2.0 GHz dual Xeon to 2.4 GHz dual Xeon, Mbit/s
  - Medicina (Bologna) to JIVE: 800 MHz PIII to Mark5 (GHz PIII), 330 Mbit/s, limited by the sending PC
  - Torun to JIVE: 2.4 GHz dual Xeon to Mark5 (GHz PIII), Mbit/s, limited by security policing (>400 Mbit/s -> 20 Mbit/s)?
- Throughput: 50 min period
- Period is ~17 min

Slide 47: UDP Performance: 3 Flows on GÉANT
- Packet loss & re-ordering
  - Jodrell: 2.0 GHz Xeon; loss 0 - 12%; reordering significant
  - Medicina: 800 MHz PIII; loss ~6%; reordering insignificant
  - Torun: 2.4 GHz Xeon; loss %; reordering insignificant

Slide 48: Hour Flows on UKLight, Jodrell - JIVE, 26 June 2005
- Throughput: Jodrell to JIVE, 2.4 GHz dual Xeon to 2.4 GHz dual Xeon, Mbit/s
- Traffic through SURFnet
- Packet loss: only 3 groups with lost packets each; no packets lost the rest of the time
- Packet re-ordering: none

Slide 49: Summary, Conclusions
- The host is critical: motherboards, NICs, RAID controllers and disks matter
  - The NICs should be well designed:
    - NIC should use 64 bit 133 MHz PCI-X (66 MHz PCI can be OK)
    - NIC/drivers: CSR access / clean buffer management / good interrupt handling
  - Worry about the CPU-memory bandwidth as well as the PCI bandwidth
    - Data crosses the memory bus at least 3 times
  - Separate the data transfers: use motherboards with multiple 64 bit PCI-X buses
    - 32 bit 33 MHz is too slow for Gigabit rates
    - 64 bit 33 MHz > 80% used
  - Choose a modern high throughput RAID controller
    - Consider SW RAID0 of RAID5 HW controllers
- Need plenty of CPU power for sustained 1 Gbit/s transfers
- Packet loss is a killer
  - Check on campus links & equipment, and access links to backbones
- New stacks are stable and give better response & performance
  - Still need to set the TCP buffer sizes! (see the sizing sketch after this list)
  - Check other kernel settings, e.g. window-scale
- Application architecture & implementation is also important
- The interaction between HW, protocol processing, and the disk sub-system is complex
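One concrete part of the "set the TCP buffer sizes" advice is sizing the buffers to at least the bandwidth-delay product of the path, otherwise the advertised window, not the network, caps the rate. A small sketch; the 1 Gbit/s path rate is an assumed example value.

    # Socket buffer >= bandwidth * round-trip time, else the TCP window limits throughput.
    def bdp_bytes(rate_bit_s, rtt_s):
        return rate_bit_s * rtt_s / 8

    for name, rtt in (("UK", 0.006), ("Europe", 0.020), ("USA", 0.150)):
        need = bdp_bytes(1e9, rtt)                      # 1 Gbit/s path assumed
        print(f"{name:7s} rtt {rtt * 1000:5.1f} ms -> buffer >= {need / 1e6:.1f} Mbyte")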

Slide 50: More Information - Some URLs
- Real-Time Remote Farm site
- UKLight web site
- DataTAG project web site
- UDPmon / TCPmon kit + writeup (Software & Tools)
- Motherboard and NIC tests, & "Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS special issue (Publications)
- TCP tuning information
- TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing (Publications)
- PFLDnet
- DANTE PERT

Slide 51: Any Questions?

Slide 52: Backup Slides

Slide 53: Latency Measurements
- UDP/IP packets sent between back-to-back systems
  - Processed in a similar manner to TCP/IP
  - Not subject to flow control & congestion avoidance algorithms
  - Used the UDPmon test program
- Latency
  - Round trip times measured using request-response UDP frames
  - Latency as a function of frame size:
    - Slope is given by: mem-mem copy(s) + PCI + Gig Ethernet + PCI + mem-mem copy(s)
    - Intercept indicates: processing times + HW latencies
  - Histograms of 'singleton' measurements
- Tells us about:
  - Behaviour of the IP stack
  - The way the HW operates
  - Interrupt coalescence
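A minimal sketch of the request-response latency measurement described above, for illustration (it is not UDPmon itself); the port, frame sizes and timeout are assumptions. Repeating the requester for a range of frame sizes gives the slope and intercept discussed on the slide.

    # UDP request-response latency sketch (in the spirit of the measurement above).
    import socket, time

    PORT = 14196                      # assumed port

    def responder():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", PORT))
        while True:
            data, addr = sock.recvfrom(65535)
            sock.sendto(data, addr)   # echo the frame straight back

    def requester(host, frame_size, n=1000):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(1.0)          # treat a missing reply as a lost frame
        payload = b"\0" * frame_size
        samples = []
        for _ in range(n):
            t0 = time.time()
            sock.sendto(payload, (host, PORT))
            try:
                sock.recvfrom(65535)
            except socket.timeout:
                continue
            samples.append(time.time() - t0)
        samples.sort()
        print(f"{frame_size} bytes: {len(samples)} replies, "
              f"median rtt {samples[len(samples) // 2] * 1e6:.1f} us")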

Slide 54: Throughput Measurements
- UDP throughput
- Send a controlled stream of UDP frames spaced at regular intervals (n bytes, number of packets, wait time)
- Protocol between sender and receiver:
  - Zero stats / OK done
  - Send data frames at regular intervals
  - Signal end of test / OK done
  - Get remote statistics; the receiver sends back: number received, number lost + loss pattern, number out-of-order, CPU load & number of interrupts, 1-way delay
- Recorded: time to send, time to receive, inter-packet time (histogram)
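A sketch of the sending side of such a test: a stream of sequence-numbered UDP frames at a controlled inter-packet spacing. This illustrates the idea only and is not UDPmon; the receiver (not shown) would count received, lost and out-of-order frames from the sequence numbers.

    # UDP stream sender with controlled inter-packet spacing (illustrative sketch).
    import socket, time

    def send_stream(host, port, frame_bytes=1472, n_packets=10000, spacing_us=20):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        payload = bytearray(frame_bytes)
        gap = spacing_us / 1e6
        start = time.time()
        for seq in range(n_packets):
            payload[0:4] = seq.to_bytes(4, "big")        # sequence number for loss/reorder checks
            sock.sendto(payload, (host, port))
            target = start + (seq + 1) * gap
            while time.time() < target:                  # busy-wait to hold the spacing
                pass
        elapsed = time.time() - start
        print(f"sent {n_packets} frames in {elapsed:.3f} s "
              f"({n_packets * frame_bytes * 8 / elapsed / 1e6:.0f} Mbit/s of payload)")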

Slide 55: PCI Bus & Gigabit Ethernet Activity
- PCI activity measured with a logic analyser:
  - PCI probe cards in the sending PC
  - Gigabit Ethernet fiber probe card
  - PCI probe cards in the receiving PC
[Diagram: CPU, memory, chipset and NIC in each PC, with the PCI buses and the Gigabit Ethernet link instrumented and displayed on the logic analyser; possible bottlenecks marked.]

Slide 56: End Hosts & NICs, CERN-nat-Manc
- Use UDP packets to characterise host, NIC & network
  - SuperMicro P4DP8 motherboard
  - Dual Xeon 2.2 GHz CPU
  - 400 MHz system bus
  - 64 bit 66 MHz PCI / 133 MHz PCI-X bus
- The network can sustain 1 Gbps of UDP traffic
- The average server can lose smaller packets
- Packet loss caused by lack of power in the PC receiving the traffic
- Out of order packets due to WAN routers
- Lightpaths look like extended LANs: no re-ordering
[Plots: request-response latency, throughput, packet loss, re-order.]

Slide 57: TCP (Reno) - Details
- Time for TCP to recover its throughput from 1 lost packet: the congestion window halves and then grows by one segment per rtt, giving a recovery time of roughly C x rtt^2 / (2 x MSS), with the capacity C and MSS in the same units (bits).
- For an rtt of ~200 ms: ~2 min
- Typical rtts: UK 6 ms, Europe 20 ms, USA 150 ms
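A worked version of that recovery-time estimate for the listed round-trip times. The 1 Gbit/s link rate is an assumption for illustration; the recovery time scales linearly with the assumed rate, so the slide's own figures correspond to whatever rate was assumed there.

    # Time for TCP Reno to regain full rate after a single loss: the window drops to W/2
    # and grows by one MSS per RTT, so recovery needs ~W/2 RTTs where W = C*rtt/(8*MSS).
    MSS = 1460          # bytes
    C = 1e9             # assumed 1 Gbit/s path

    def reno_recovery_s(rtt_s):
        w_segments = C * rtt_s / (8 * MSS)       # window at full rate, in segments
        return (w_segments / 2) * rtt_s          # one segment per RTT to grow back

    for name, rtt in (("UK", 0.006), ("Europe", 0.020), ("USA", 0.150)):
        print(f"{name:7s} rtt {rtt * 1000:5.1f} ms -> recovery ~ {reno_recovery_s(rtt):7.1f} s")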

Slide 58: Network & Disk Interactions
- Disk write, mem-disk: 1735 Mbit/s; tends to be in 1 die
- Disk write + UDP 1500, mem-disk: 1218 Mbit/s; both dies at ~80%
- Disk write + CPU-to-mem copy, mem-disk: 1341 Mbit/s
  - 1 CPU at ~60%, the other at 20%
  - Large user mode usage
  - Below cut = high BW; high BW = die 1 used
- Disk write + CPU load, mem-disk: 1334 Mbit/s
  - 1 CPU at ~60%, the other at 20%
  - All CPUs saturated in user mode
- Total CPU load; kernel CPU load