Slide 1: High Performance Networking for ALL
GridPP Meeting, Edinburgh, 4-5 Feb 04. R. Hughes-Jones, The University of Manchester.
- Members of GridPP are in many network collaborations, including MB-NG
- Close links with: SLAC; UKERNA, SURFnet and other NRNs; Dante; Internet2; StarLight, NetherLight; GGF; RIPE; industry …

Slide 2: Network Monitoring [1]
- Architecture: DataGrid WP7 code, extended by Gareth (Manchester)
- Technology transfer to UK e-Science: developed by Mark Lees (DL), fed back into DataGrid by Gareth
- Links to GGF NM-WG, Dante, Internet2
- Characteristics, schema & web services
- Success

Slide 3: Network Monitoring [2]
- 24 Jan to 4 Feb 04, TCP iperf, RAL to HEP sites: only 2 sites > 80 Mbit/s
- 24 Jan to 4 Feb 04, TCP iperf, DL to HEP sites: HELP!

Slide 4: High bandwidth, long distance… Where is my throughput?
RIPE-47, Amsterdam, 29 January 2004. Robin Tasker, CCLRC, Daresbury Laboratory, UK. DataTAG is a project sponsored by the European Commission under EU grant IST.

Slide 5: Throughput… What's the problem?
- One terabyte of data transferred in less than an hour!
- In February 2003 the transatlantic DataTAG network was extended: CERN - Chicago - Sunnyvale (>10,000 km). For the first time, a terabyte of data was transferred across the Atlantic in less than one hour using a single TCP (Reno) stream. The transfer was accomplished from Sunnyvale to Geneva at a rate of 2.38 Gbit/s.

Slide 6: Internet2 Land Speed Record
- In October 2003, DataTAG set a new Internet2 Land Speed Record by transferring 1.1 terabytes of data in less than 30 minutes from Geneva to Chicago across the DataTAG provision, corresponding to an average rate of 5.44 Gbit/s using a single TCP (Reno) stream.

Slide 7: So how did we do that?
- Management of the end-to-end connection
- Memory-to-memory transfer; no disk system involved
- Processor speed and system bus characteristics
- TCP configuration: window size and frame size (MTU) (see the sketch below)
- Network interface card, its driver, and their configuration
- An end-to-end no-loss environment from CERN to Sunnyvale!
- At least a 2.5 Gbit/s capacity pipe on the end-to-end path
- A single TCP connection on the end-to-end path
- No real user application
- That's to say: not the usual user experience!
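The window-size bullet is the crucial one: the TCP window must cover the bandwidth-delay product of the path, or the pipe can never be kept full. A minimal sketch of requesting such a window with standard POSIX socket options; the RTT and bandwidth figures are illustrative assumptions, not the record configuration:

```python
import socket

# Bandwidth-delay product: an assumed 2.5 Gbit/s path at ~180 ms RTT
# needs about 2.5e9 / 8 * 0.18 ~= 56 Mbytes of data in flight.
WINDOW = 64 * 1024 * 1024   # request a window above the BDP

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask for large socket buffers; on Linux the grant is capped by the
# net.core.rmem_max / net.core.wmem_max sysctls, which must be raised
# beforehand for a window this size to take effect.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WINDOW)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WINDOW)
print("granted send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
```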

Slide 8: Realistically, what's the problem & why do network research?
- End-system issues: network interface card, driver and their configuration; TCP and its configuration; operating system and its configuration; disk system; processor speed; bus speed and capability
- Network infrastructure issues: obsolete network equipment; configured bandwidth restrictions; topology; security restrictions (e.g., firewalls); sub-optimal routing; transport protocols
- Network capacity and the influence of others: many, many TCP connections; mice and elephants on the path; congestion

Slide 9: End Hosts: Buses, NICs and Drivers
- Latency, throughput and bus activity measured using UDP packets to characterise the Intel PRO/10GbE Server Adapter (a probe sketch follows below)
- SuperMicro P4DP8-G2 motherboard; dual Xeon 2.2 GHz CPUs; 400 MHz system bus; 133 MHz PCI-X bus
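The UDP characterisation works by streaming fixed-size packets with a controlled inter-packet wait and observing latency and achieved rate (and, with a logic analyser, bus activity). A toy sender/receiver in the spirit of that method, not the actual UDPmon code; the host name and port are placeholders:

```python
import socket, time

def send_stream(dest=("receiver.example.org", 5001),
                size=1472, count=10_000, wait_us=10):
    """Pace fixed-size UDP packets with a microsecond busy-wait."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * size
    t0 = time.perf_counter()
    for _ in range(count):
        s.sendto(payload, dest)
        t_next = time.perf_counter() + wait_us * 1e-6
        while time.perf_counter() < t_next:     # spin for precise spacing
            pass
    return count * size * 8 / (time.perf_counter() - t0)   # send rate, bit/s

def receive_stream(port=5001, size=1472, count=10_000):
    """Report the achieved receive rate for a stream of UDP packets."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", port))
    s.recvfrom(size)                            # first packet starts the clock
    t0, got = time.perf_counter(), 1
    while got < count:
        s.recvfrom(size)
        got += 1
    return got * size * 8 / (time.perf_counter() - t0)     # receive rate, bit/s
```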

Slide 10: End Hosts: Understanding NIC Drivers
- Linux driver basics, TX: application system call; encapsulation in UDP/TCP and IP headers; enqueue on the device send queue; driver places information in the DMA descriptor ring; NIC reads data from main memory via DMA and sends it on the wire; NIC signals to the processor that the TX descriptor has been sent
- Linux driver basics, RX: NIC places data in main memory via DMA to a free RX descriptor; NIC signals that the RX descriptor has data; driver passes the frame to the IP layer and cleans the RX descriptor; IP layer passes the data to the application
- Linux NAPI driver model: on receiving a packet, the NIC raises an interrupt; the driver switches off RX interrupts and schedules a poll of the RX DMA ring; frames are pulled off the DMA ring and processed up to the application; when all frames are processed, RX interrupts are re-enabled. Dramatic reduction in RX interrupts under load.
- See "Improving the performance of a Gigabit Ethernet driver under Linux"

Slide 11: Protocols: TCP (Reno) Performance
- AIMD on high-bandwidth, long-distance networks: the poor performance of TCP in high-bandwidth wide-area networks is due in part to the TCP congestion control algorithm.
- For each ACK in an RTT without loss: cwnd -> cwnd + a/cwnd (additive increase, a = 1)
- For each window experiencing loss: cwnd -> cwnd - b*cwnd (multiplicative decrease, b = 1/2)

Slide 12: Protocols: HighSpeed TCP & Scalable TCP
- Adjusting the AIMD algorithm. TCP Reno: for each ACK in an RTT without loss, cwnd -> cwnd + a/cwnd (additive increase, a = 1); for each window experiencing loss, cwnd -> cwnd - b*cwnd (multiplicative decrease, b = 1/2).
- HighSpeed TCP: a and b vary depending on the current cwnd. a increases more rapidly with larger cwnd, so the connection returns to the optimal cwnd for the network path sooner; b decreases less aggressively, so cwnd is cut less on loss, and the drop in throughput is not as severe.
- Scalable TCP: a and b are fixed adjustments for the increase and decrease of cwnd, such that the increase is greater than in TCP Reno and the decrease on loss is less than in TCP Reno. (A toy comparison of the rules follows below.)
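To see why the adjusted constants matter, here is a toy per-RTT simulation of the cwnd rules above, comparing Reno with Scalable TCP. The Scalable constants (a = 0.01 per ACK, b = 0.125) are taken from Kelly's Scalable TCP proposal; this is an illustration, not a model of the kernel stacks:

```python
def reno_step(cwnd, loss):
    # Reno: +a/cwnd per ACK (~ +1 segment per RTT), halve on loss.
    return cwnd / 2 if loss else cwnd + 1

def scalable_step(cwnd, loss):
    # Scalable: multiplicative increase of ~1% per RTT, lose 12.5% on loss.
    return cwnd * (1 - 0.125) if loss else cwnd * 1.01

for step in (reno_step, scalable_step):
    cwnd = 1000.0                               # segments; arbitrary start
    for rtt in range(200):
        cwnd = step(cwnd, loss=(rtt == 50))     # a single loss at RTT 50
    print("%s: cwnd after 200 RTTs = %.0f" % (step.__name__, cwnd))
```

With a large cwnd, Reno's +1 segment per RTT takes thousands of RTTs to recover what a single halving throws away; the Scalable rules recover in a fixed number of RTTs regardless of cwnd.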

Slide 13: Protocols: HighSpeed TCP & Scalable TCP – Implementations
- [Plots: HighSpeed TCP; Scalable TCP]
- HighSpeed TCP implemented by Gareth (Manchester); Scalable TCP implemented by Tom Kelly (Cambridge)
- Integration of the stacks into the DataTAG kernel: Yee (UCL) + Gareth
- Success

Slide 14: Some Measurements of Throughput: CERN - SARA
- Using the GÉANT backup link; 1 GByte file transfers; blue = data, red = TCP ACKs
- Standard TCP: average throughput 167 Mbit/s. Users see Mbit/s!
- HighSpeed TCP: average throughput 345 Mbit/s
- Scalable TCP: average throughput 340 Mbit/s

Slide 15: Users, the Campus & the MAN [1]
- NNW to SJ4 access: 2.5 Gbit PoS; hits 1 Gbit, 50%
- Manchester to NNW access: 2 * 1 Gbit Ethernet
- (Pete White, Pat Meyrs)

Slide 16: Users, the Campus & the MAN [2]
- LMN to site 1 access: 1 Gbit Ethernet
- LMN to site 2 access: 1 Gbit Ethernet
- Message: continue to work with your network group; understand the traffic levels; understand the network topology

Slide 17: 10 GigEthernet: Tuning PCI-X

Slide 18: 10 GigEthernet at SC2003 BW Challenge (Phoenix)
- Three server systems with 10 GigEthernet NICs; used the DataTAG altAIMD stack; 9000 byte MTU
- Streams from the SLAC/FNAL booth in Phoenix to: Palo Alto PAIX (17 ms rtt); Chicago StarLight (65 ms rtt); Amsterdam SARA (175 ms rtt)

Slide 19: Helping Real Users [1] – Radio Astronomy VLBI
- PoC with NRNs & GÉANT: 1024 Mbit/s, 24 on 7, NOW

Slide 20: VLBI Project: Throughput, Jitter & 1-way Delay
- 1472 byte packets, Manchester -> Dwingeloo (JIVE); FWHM 22 µs (back-to-back: 3 µs)
- 1-way delay: note the packet loss (points with zero 1-way delay)

Slide 21: VLBI Project: Packet Loss Distribution
- Measure the time between lost packets in the time series of packets sent. Lost 1410 packets in 0.6 s.
- Is it a Poisson process? Assume the Poisson process is stationary: λ(t) = λ.
- Use the probability density function P(t) = λe^(−λt); mean λ = 2360/s [426 µs].
- Plotting the log: the slope differs from that expected, so an additional process could be involved. (A fitting sketch follows below.)
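The Poisson test can be scripted directly from P(t) = λe^(−λt): if losses are Poisson, the histogram of inter-loss times is exponential and its log is a straight line of slope −λ. A sketch using synthetic gaps in place of the measured series, with λ = 2360/s from the slide:

```python
import math, random

# If losses form a stationary Poisson process, inter-loss times follow
# P(t) = lam * exp(-lam * t), so log(histogram counts) vs t is a straight
# line of slope -lam. Synthetic gaps stand in for the measured series.
lam = 2360.0                         # losses per second (mean gap 426 us)
gaps = [random.expovariate(lam) for _ in range(100_000)]

bin_w = 100e-6                       # 100 us histogram bins
hist = {}
for g in gaps:
    k = int(g / bin_w)
    hist[k] = hist.get(k, 0) + 1

# Slope of log(count) between two bins estimates -lam; a measured slope
# that deviates from this hints at a second loss process.
b0, b1 = 1, 5
slope = (math.log(hist[b1]) - math.log(hist[b0])) / ((b1 - b0) * bin_w)
print("fitted lambda ~ %.0f /s (true %.0f /s)" % (-slope, lam))
```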

Slide 22: VLBI Traffic Flows – Only testing!
- Manchester – NetNorthWest – SuperJANET access links
- Two 1 Gbit/s access links: SJ4 to GÉANT; GÉANT to SURFnet

Slide 23: Throughput & PCI Transactions on the Mark5 PC
- Pattern: read/write n bytes, wait time, repeat
- Mark5 uses a Supermicro P3TDLE motherboard: 1.2 GHz PIII, memory bus 133/100 MHz, 2 * 64-bit 66 MHz PCI and 4 * 32-bit 33 MHz PCI slots
- [Diagram: SuperStor NIC input card, IDE disc pack, Ethernet, logic analyser display]

Slide 24: PCI Activity: Read Multiple Data Blocks, 0 wait
- Each data block: set up CSRs, data movement, update CSRs
- For 0 wait between reads: data blocks ~600 µs long take ~6 ms, followed by a 744 µs gap
- PCI transfer rate 1188 Mbit/s (148.5 Mbytes/s); Read_sstor rate 778 Mbit/s (97 Mbytes/s); PCI bus occupancy 68.44%
- Concern about Ethernet traffic: 64-bit 33 MHz PCI needs ~82% occupancy for 930 Mbit/s, so expect only ~360 Mbit/s (a check of this arithmetic follows below)
- [Trace labels: data transfer, CSR access, PCI burst 4096 bytes, data block 131,072 bytes]
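The "expect ~360 Mbit/s" figure follows from simple bus-time bookkeeping: with 68.44% of the PCI bus consumed by the disk reads, only the remaining bus time is available for Ethernet, which needs ~82% occupancy for the full Gigabit rate. A quick check of that reasoning; the linear-scaling assumption is ours:

```python
# Bus-time bookkeeping behind the slide's numbers.
raw_pci = 64 * 33e6                 # 64-bit 33 MHz PCI: 2112 Mbit/s raw
occupancy_disk = 0.6844             # fraction of bus time used by disk reads
need_for_gige = 0.82                # slide: ~82% occupancy for 930 Mbit/s

headroom = 1 - occupancy_disk       # bus time left over for Ethernet
expected = 930e6 * headroom / need_for_gige   # assume rate scales with bus time
print("raw PCI: %.0f Mbit/s, headroom: %.1f%%, expected Ethernet ~%.0f Mbit/s"
      % (raw_pci / 1e6, 100 * headroom, expected / 1e6))   # ~360 Mbit/s
```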

Slide 25: PCI Activity: Read Throughput
- Flat, then a 1/t dependence: ~860 Mbit/s for large read blocks
- CPU load ~20%; concern about the CPU load needed to drive a Gigabit link

Slide 26: Helping Real Users [2] – HEP: BaBar & CMS Application Throughput

Slide 27: BaBar Case Study: Disk Performance
- BaBar disk server: Tyan Tiger S2466N motherboard, one 64-bit 66 MHz PCI bus, Athlon MP2000+ CPU, AMD-760 MPX chipset, 3Ware RAID5 with 8 * 200 GB Maxtor IDE 7200 rpm disks
- Note the VM parameter readahead max
- Disk to memory (read): max throughput 1.2 Gbit/s (150 MBytes/s)
- Memory to disk (write): max throughput 400 Mbit/s (50 MBytes/s) [not as fast as RAID0]
- (A simple read-throughput sketch follows below.)
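A crude way to reproduce the disk-to-memory number is to stream a large file sequentially and time it. A sketch only: the path is a placeholder, and the page cache must be cold (a fresh boot, or a file larger than RAM) for the result to reflect the disks rather than memory:

```python
import os, time

def read_throughput(path="/data/testfile", block=1024 * 1024):
    """Sequential read throughput in MBytes/s, dd-style."""
    fd = os.open(path, os.O_RDONLY)
    total, t0 = 0, time.perf_counter()
    while True:
        buf = os.read(fd, block)   # 1 MB reads let the RAID and readahead stream
        if not buf:
            break
        total += len(buf)
    os.close(fd)
    return total / (time.perf_counter() - t0) / 1e6
```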

Slide 28: BaBar: Serial ATA RAID Controllers
- 3Ware, 66 MHz PCI; ICP, 66 MHz PCI

Slide 29: BaBar Case Study: RAID Throughput & PCI Activity
- 3Ware RAID5, parallel EIDE; the 3Ware card forces the PCI bus to 33 MHz
- BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s
- Disk-to-disk throughput with bbcp: 320–360 Mbit/s (40–45 Mbytes/s); the PCI bus is effectively full!
- [Plots: read from RAID5 disks; write to RAID5 disks]

Slide 30: BaBar Case Study on the MB-NG / SuperJANET4 Development Network
- [Diagram: BaBar PC with 3Ware RAID5 at RAL and PC with 3Ware RAID5 at MCC, joined by OSM-1OC48-POS-SS routers; Gigabit Ethernet and 2.5 Gbit PoS access, 2.5 Gbit PoS core, MPLS, administrative domains, SJ4 Dev network]
- Status / tests: the Manchester host has the DataTAG TCP stack; the RAL host is now available
- Tests: BaBar-BaBar mem-mem; BaBar-BaBar real data over MB-NG; BaBar-BaBar real data over SJ4; mbng-mbng real data over MB-NG; mbng-mbng real data over SJ4
- Different TCP stacks already installed

Slide 31: Study of Applications on the MB-NG / SuperJANET4 Development Network
- [Diagram: PC with 3Ware RAID0 at UCL and PC with 3Ware RAID0 at MCC, joined by OSM-1OC48-POS-SS routers; Gigabit Ethernet and 2.5 Gbit PoS access, 2.5 Gbit PoS core, MPLS, administrative domains, SJ4 Dev network]

Slide 32: TCP mem-mem, lon2-man1
- [Plots: throughput (Mbit/s) vs. time (hours) for standard TCP and HighSpeed TCP memory-to-memory transfers; interrupt coalescence Tx 64, Tx-abs 64, Rx 64, Rx-abs 64]

Slide 33: Gridftp Throughput, HighSpeed TCP
- Interrupt coalescence on; txqueuelen 2000; TCP buffer 1 Mbyte (rtt*BW = 750 kbytes; see the calculation below)
- Interface throughput, ACKs received, data moved: 520 Mbit/s
- The same holds for back-to-back tests, so it's not that simple!
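The buffer sizing on this slide is just the bandwidth-delay product. A one-liner makes the 750 kbytes explicit, assuming a 1 Gbit/s path and a ~6 ms RTT (roughly Manchester-London on MB-NG):

```python
# TCP buffer sizing from the bandwidth-delay product.
def bdp_bytes(bandwidth_bps, rtt_s):
    return bandwidth_bps * rtt_s / 8

print("rtt * BW = %.0f kbytes" % (bdp_bytes(1e9, 0.006) / 1e3))  # 750 kbytes
```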

Slide 34: Gridftp Throughput + Web100
- Throughput in Mbit/s: note the alternation between 600/800 Mbit/s and zero
- Cwnd is smooth; no duplicate ACKs / send stalls / timeouts

Slide 35: HTTP Data Transfers, HighSpeed TCP
- Apache web server, out of the box!; prototype client using the curl HTTP library
- 1 Mbyte TCP buffers; 2 Gbyte file; throughput 72 MBytes/s
- Cwnd: some variation; no duplicate ACKs / send stalls / timeouts

Slide 36: More Information – Some URLs
- MB-NG project web site
- DataTAG project web site
- UDPmon / TCPmon kit + write-up
- Motherboard and NIC tests
- TCP tuning information