Presentation is loading. Please wait.

Presentation is loading. Please wait.

Slide: 1 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 1 TCP/IP and Other Transports for High Bandwidth Applications.

Similar presentations


Presentation on theme: "Slide: 1 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 1 TCP/IP and Other Transports for High Bandwidth Applications."— Presentation transcript:

1 Slide: 1 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 1 TCP/IP and Other Transports for High Bandwidth Applications Real Applications on Real Networks Richard Hughes-Jones University of Manchester www.hep.man.ac.uk/~rich/ then “Talks” then look for “Brasov” www.hep.man.ac.uk/~rich/

2 Slide: 2 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 2 This is what researchers find when they try to use high performance networks. uReal Applications on Real Networks Disk-2-disk applications on real networks Memory-2-memory tests Comparison of different data moving applications The effect (improvement) of different TCP Stacks Transatlantic disk-2-disk at Gigabit speeds Remote Computing Farms The effect of distance Protocol vs implementation Radio Astronomy e-VLBI Users with data that is random noise ! Thanks for allowing me to use their slides to: Sylvain Ravot CERN, Les Cottrell SLAC, Brian Tierney LBL, Robin Tasker DL Ralph Spencer Jodrell Bank What we might cover!

3 Slide: 3 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 3 u SuperMicro P4DP8-2G (P4DP6) uDual Xeon u 400/522 MHz Front side bus u 6 PCI PCI-X slots u 4 independent PCI buses 64 bit 66 MHz PCI 100 MHz PCI-X 133 MHz PCI-X u Dual Gigabit Ethernet u Adaptec AIC-7899W dual channel SCSI u UDMA/100 bus master/EIDE channels data transfer rates of 100 MB/sec burst “Server Quality” Motherboards

4 Slide: 4 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 4 “Server Quality” Motherboards u Boston/Supermicro H8DAR u Two Dual Core Opterons u 200 MHz DDR Memory Theory BW: 6.4Gbit u HyperTransport u 2 independent PCI buses 133 MHz PCI-X u 2 Gigabit Ethernet u SATA u ( PCI-e )

5 Slide: 5 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 5 UK Transfers MB-NG and SuperJANET4 Throughput for real users

6 Slide: 6 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 6 Topology of the MB – NG Network Key Gigabit Ethernet 2.5 Gbit POS Access MPLS Admin. Domains UCL Domain Edge Router Cisco 7609 man01 man03 Boundary Router Cisco 7609 RAL Domain Manchester Domain lon02 man02 ral01 UKERNA Development Network Boundary Router Cisco 7609 ral02 lon03 lon01 HW RAID

7 Slide: 7 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 7 Topology of the Production Network Key Gigabit Ethernet 2.5 Gbit POS Access 10 Gbit POS man01 RAL Domain Manchester Domain ral01 HW RAID routers switches 3 routers 2 switches

8 Slide: 8 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 8 iperf Throughput + Web100 u SuperMicro on MB-NG network u HighSpeed TCP u Linespeed 940 Mbit/s u DupACK ? <10 (expect ~400) u BaBar on Production network u Standard TCP u 425 Mbit/s u DupACKs 350-400 – re-transmits

9 Slide: 9 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 9 Applications: Throughput Mbit/s u HighSpeed TCP u 2 GByte file RAID5 u SuperMicro + SuperJANET u bbcp u bbftp u Apachie u Gridftp u Previous work used RAID0 (not disk limited)

10 Slide: 10 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 10 bbftp: What else is going on? u Scalable TCP u BaBar + SuperJANET u SuperMicro + SuperJANET u Congestion window – duplicate ACK u Variation not TCP related? Disk speed / bus transfer Application

11 Slide: 11 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 11 bbftp: Host & Network Effects u 2 Gbyte file RAID5 Disks: 1200 Mbit/s read 600 Mbit/s write u Scalable TCP u BaBar + SuperJANET Instantaneous 220 - 625 Mbit/s u SuperMicro + SuperJANET Instantaneous 400 - 665 Mbit/s for 6 sec Then 0 - 480 Mbit/s u SuperMicro + MB-NG Instantaneous 880 - 950 Mbit/s for 1.3 sec Then 215 - 625 Mbit/s

12 Slide: 12 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 12 Average Transfer Rates Mbit/s AppTCP StackSuperMicro on MB-NG SuperMicro on SuperJANET4 BaBar on SuperJANET4 SC2004 on UKLight IperfStandard940350-370425940 HighSpeed940510570940 Scalable940580-650605940 bbcpStandard434290-310290 HighSpeed435385360 Scalable432400-430380 bbftpStandard400-410325320825 HighSpeed370-390380 Scalable430345-532380875 apacheStandard425260300-360 HighSpeed430370315 Scalable428400317 GridftpStandard405240 HighSpeed320 Scalable335 New stacks give more throughput Rate decreases

13 Slide: 13 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 13 Transatlantic Disk to Disk Transfers With UKLight SuperComputing 2004

14 Slide: 14 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 14 SC2004 UKLIGHT Overview MB-NG 7600 OSR Manchester ULCC UKLight UCL HEP UCL network K2 Ci Chicago Starlight Amsterdam SC2004 Caltech Booth UltraLight IP SLAC Booth Cisco 6509 UKLight 10G Four 1GE channels UKLight 10G Surfnet/ EuroLink 10G Two 1GE channels NLR Lambda NLR-PITT-STAR-10GE-16 K2 Ci Caltech 7600

15 Slide: 15 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 15 uSCINet Collaboration at SC2004 uSetting up the BW Bunker uThe BW Challenge at the SLAC Booth uWorking with S2io, Sun, Chelsio

16 Slide: 16 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 16 Transatlantic Ethernet: TCP Throughput Tests uSupermicro X5DPE-G2 PCs uDual 2.9 GHz Xenon CPU FSB 533 MHz u1500 byte MTU u2.6.6 Linux Kernel uMemory-memory TCP throughput uStandard TCP uWire rate throughput of 940 Mbit/s uFirst 10 sec uWork in progress to study: Implementation detail Advanced stacks Effect of packet loss Sharing

17 Slide: 17 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 17 SC2004 Disk-Disk bbftp ubbftp file transfer program uses TCP/IP uUKLight: Path:- London-Chicago-London; PCs:- Supermicro +3Ware RAID0 uMTU 1500 bytes; Socket size 22 Mbytes; rtt 177ms; SACK off uMove a 2 Gbyte file uWeb100 plots: uStandard TCP uAverage 825 Mbit/s u(bbcp: 670 Mbit/s) uScalable TCP uAverage 875 Mbit/s u(bbcp: 701 Mbit/s ~4.5s of overhead) uDisk-TCP-Disk at 1Gbit/s

18 Slide: 18 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 18 Network & Disk Interactions (work in progress) uHosts: Supermicro X5DPE-G2 motherboards dual 2.8 GHz Zeon CPUs with 512 k byte cache and 1 M byte memory 3Ware 8506-8 controller on 133 MHz PCI-X bus configured as RAID0 six 74.3 GByte Western Digital Raptor WD740 SATA disks 64k byte stripe size uMeasure memory to RAID0 transfer rates with & without UDP traffic Disk write 1735 Mbit/s Disk write + 1500 MTU UDP 1218 Mbit/s Drop of 30% Disk write + 9000 MTU UDP 1400 Mbit/s Drop of 19% % CPU kernel mode

19 Slide: 19 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 19 Remote Computing Farms in the ATLAS TDAQ Experiment

20 Slide: 20 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 20 Remote Computing Concepts ROB L2PU SFI PF Local Event Processing Farms ATLAS Detectors – Level 1 Trigger SFOs Mass storage Experimental Area CERN B513 Copenhagen Edmonton Krakow Manchester PF Remote Event Processing Farms PF lightpaths PF Data Collection Network Back End Network GÉANT Switch Level 2 Trigger Event Builders ~PByte/sec 320 MByte/sec

21 Slide: 21 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 21 ATLAS Remote Farms – Network Connectivity

22 Slide: 22 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 22 ATLAS Application Protocol u Event Request EFD requests an event from SFI SFI replies with the event ~2Mbytes u Processing of event u Return of computation EF asks SFO for buffer space SFO sends OK EF transfers results of the computation u tcpmon - instrumented TCP request-response program emulates the Event Filter EFD to SFI communication. Send OK Send event data Request event ●●● Request Buffer Send processed event Process event Time Request-Response time (Histogram) Event Filter Daemon EFD SFI and SFO

23 Slide: 23 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 23 Using Web100 TCP Stack Instrumentation to analyse application protocol - tcpmon

24 Slide: 24 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 24 tcpmon: TCP Activity Manc-CERN Req-Resp Round trip time 20 ms 64 byte Request green 1 Mbyte Response blue TCP in slow start 1st event takes 19 rtt or ~ 380 ms TCP Congestion window gets re-set on each Request TCP stack implementation detail to reduce Cwnd after inactivity Even after 10s, each response takes 13 rtt or ~260 ms Transfer achievable throughput 120 Mbit/s

25 Slide: 25 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 25 tcpmon: TCP Activity Manc-cern Req-Resp TCP stack tuned Round trip time 20 ms 64 byte Request green 1 Mbyte Response blue TCP starts in slow start 1 st event takes 19 rtt or ~ 380 ms TCP Congestion window grows nicely Response takes 2 rtt after ~1.5s Rate ~10/s (with 50ms wait) Transfer achievable throughput grows to 800 Mbit/s

26 Slide: 26 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 26 Round trip time 150 ms 64 byte Request green 1 Mbyte Response blue TCP starts in slow start 1 st event takes 11 rtt or ~ 1.67 s tcpmon: TCP Activity Alberta-CERN Req-Resp TCP stack tuned TCP Congestion window in slow start to ~1.8s then congestion avoidance Response in 2 rtt after ~2.5s Rate 2.2/s (with 50ms wait) Transfer achievable throughput grows slowly from 250 to 800 Mbit/s

27 Slide: 27 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 27 Time Series of Request-Response Latency Alberta – CERN Round trip time 150 ms 1 Mbyte of data returned Stable for ~150s at 300ms Falls to 160ms with ~80 μs variation Manchester – CERN Round trip time 20 ms 1 Mbyte of data returned Stable for ~18s at ~42.5ms Then alternate points 29 & 42.5 ms

28 Slide: 28 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 28 Using the Trigger DAQ Application

29 Slide: 29 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 29 Time Series of T/DAQ event rate Manchester – CERN Round trip time 20 ms 1 Mbyte of data returned 3 nodes: 1 GEthernet + two 100Mbit 2 nodes: two 100Mbit nodes 1node: one 100Mbit node Event Rate: Use tcpmon transfer time of ~42.5ms Add the time to return the data 95ms Expected rate 10.5/s Observe ~6/s for the gigabit node Reason: TCP buffers could not be set large enough in T/DAQ application

30 Slide: 30 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 30 Tcpdump of the Trigger DAQ Application

31 Slide: 31 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 31 tcpdump of the T/DAQ dataflow at SFI (1) Cern-Manchester 1.0 Mbyte event Remote EFD requests event from SFI Incoming event request Followed by ACK N 1448 byte packets SFI sends event Limited by TCP receive buffer Time 115 ms (~4 ev/s) When TCP ACKs arrive more data is sent. ●●●●●●

32 Slide: 32 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 32 Tcpdump of TCP Slowstart at SFI (2) Cern-Manchester 1.0 Mbyte event Remote EFD requests event from SFI First event request N 1448 byte packets SFI sends event Limited by TCP Slowstart Time 320 ms When ACKs arrive more data sent.

33 Slide: 33 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 33 tcpdump of the T/DAQ dataflow for SFI &SFO Cern-Manchester – another test run 1.0 Mbyte event Remote EFD requests events from SFI Remote EFD sending computation back to SFO Links closed by Application Link setup & TCP slowstart

34 Slide: 34 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 34 Some Conclusions uThe TCP protocol dynamics strongly influence the behaviour of the Application. uCare is required with the Application design eg use of timeouts. uWith the correct TCP buffer sizes It is not throughput but the round-trip nature of the application protocol that determines performance. Requesting the 1-2Mbytes of data takes 1 or 2 round trips TCP Slowstart (the opening of Cwnd) considerably lengthens time for the first block of data. Implementation “improvements” (Cwnd reduction) kill performance! uWhen the TCP buffer sizes are too small (default) The amount of data sent is limited on each rtt Data is send and arrives in bursts It takes many round trips to send 1 or 2 Mbytes uThe End Hosts themselves CPU power is required for the TCP/IP stack as well and the application Packets can be lost in the IP stack due to lack of processing power uAlthough the application is ATLAS-specific, the network interactions is applicable to other areas including: Remote iSCSI Remote database accesses Real-time Grid Computing – eg Real-Time Interactive Medical Image processing

35 Slide: 35 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 35 Radio Astronomy e-VLBI

36 Slide: 36 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 36 Radio Astronomy with help from Ralph Spencer Jodrell Bank uThe study of celestial objects at 1m wavelength.  Sensitivity for continuum sources B=bandwidth,  integration time. uHigh resolution achieved by interferometers. Some radio emitting X-ray binary stars in our own galaxy: GRS 1915+105 MERLIN SS433 MERLIN and European VLBI Cygnus X-1 VLBA

37 Slide: 37 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 37 Earth-Rotation Synthesis and Fringes Need ~ 12 hours for full synthesis, not necessarily collecting data for all that time. NB Trade-off between B and  for sensitivity. Telescope data correlated in pairs: N(N-1)/2 baselines Merlin u-v coverage Fringes Obtained with the correct signal phase

38 Slide: 38 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 38 The European VLBI Network: EVN uDetailed radio imaging uses antenna networks over 100s- 1000s km uAt faintest levels, sky teems with galaxies being formed uRadio penetrates cosmic dust - see process clearly uTelescopes in place … uDisk recording at 512Mb/s uReal-time connection allows greater: response reliability sensitivity

39 Slide: 39 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 39 Westerbork Netherlands Dedicated Gbit link EVN-NREN Onsala Sweden Gbit link Jodrell Bank UK Dwingeloo DWDM link Cambridge UK MERLIN Medicina Italy Chalmers University of Technology, Gothenburg Torun Poland Gbit link

40 Slide: 40 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 40 Throughput vs packet spacing Manchester: 2.0G Hz Xeon Dwingeloo: 1.2 GHz PIII Near wire rate, 950 Mbps NB record stands at 6.6 Gbps SLAC-CERN Packet loss CPU Kernel Load sender CPU Kernel Load receiver 4 th Year project Adam Mathews Steve O’Toole UDP Throughput Manchester-Dwingeloo (Nov 2003)

41 Slide: 41 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 41 Packet loss distribution: Cumulative distribution Cumulative distribution of packet loss, each bin is 12  sec wide Long range effects in the data? Poisson

42 Slide: 42 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 42 26 th January 2005 UDP Tests Simon Casey (PhD project) Between JBO and JIVE in Dwingeloo, using production network Period of high packet loss (3%):

43 Slide: 43 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 43 The GÉANT2 Launch June 2005

44 Slide: 44 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 44 Jodrell Bank UK Dwingeloo DWDM link Medicina Italy Torun Poland e-VLBI at the GÉANT2 Launch Jun 2005

45 Slide: 45 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 45 e-VLBI UDP Data Streams

46 Slide: 46 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 46 UDP Performance: 3 Flows on GÉANT uThroughput: 5 Hour run uJodrell: JIVE 2.0 GHz dual Xeon – 2.4 GHz dual Xeon 670-840 Mbit/s uMedicina (Bologna): JIVE 800 MHz PIII – mark623 1.2 GHz PIII 330 Mbit/s limited by sending PC uTorun: JIVE 2.4 GHz dual Xeon – mark575 1.2 GHz PIII 245-325 Mbit/s limited by security policing (>400Mbit/s  20 Mbit/s) ? uThroughput: 50 min period uPeriod is ~17 min

47 Slide: 47 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 47 UDP Performance: 3 Flows on GÉANT uPacket Loss & Re-ordering uJodrell: 2.0 GHz Xeon Loss 0 – 12% Reordering significant uMedicina: 800 MHz PIII Loss ~6% Reordering in-significant uTorun: 2.4 GHz Xeon Loss 6 - 12% Reordering in-significant

48 Slide: 48 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 48 18 Hour Flows on UKLight Jodrell – JIVE, 26 June 2005 uThroughput: uJodrell: JIVE 2.4 GHz dual Xeon – 2.4 GHz dual Xeon 960-980 Mbit/s uTraffic through SURFnet uPacket Loss Only 3 groups with 10-150 lost packets each No packets lost the rest of the time uPacket re-ordering None

49 Slide: 49 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 49 uHost is critical: Motherboards NICs, RAID controllers and Disks matter The NICs should be well designed: NIC should use 64 bit 133 MHz PCI-X (66 MHz PCI can be OK) NIC/drivers: CSR access / Clean buffer management / Good interrupt handling Worry about the CPU-Memory bandwidth as well as the PCI bandwidth Data crosses the memory bus at least 3 times Separate the data transfers – use motherboards with multiple 64 bit PCI-X buses 32 bit 33 MHz is too slow for Gigabit rates 64 bit 33 MHz > 80% used Choose a modern high throughput RAID controller Consider SW RAID0 of RAID5 HW controllers uNeed plenty of CPU power for sustained 1 Gbit/s transfers uPacket loss is a killer Check on campus links & equipment, and access links to backbones uNew stacks are stable give better response & performance Still need to set the tcp buffer sizes ! Check other kernel settings e.g. window-scale, uApplication architecture & implementation is also important uInteraction between HW, protocol processing, and disk sub-system complex Summary, Conclusions MB - NG

50 Slide: 50 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 50 More Information Some URLs uReal-Time Remote Farm site http://csr.phys.ualberta.ca/real-time uUKLight web site: http://www.uklight.ac.uk uDataTAG project web site: http://www.datatag.org/ uUDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/ (Software & Tools) uMotherboard and NIC Tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/ “Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards” FGCS Special issue 2004 http:// www.hep.man.ac.uk/~rich/ (Publications) uTCP tuning information may be found at: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html uTCP stack comparisons: “Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks” Journal of Grid Computing 2004 http:// www.hep.man.ac.uk/~rich/ (Publications) uPFLDnet http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/ uDante PERT http://www.geant2.net/server/show/nav.00d00h002

51 Slide: 51 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 51 Any Questions?

52 Slide: 52 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 52 Backup Slides

53 Slide: 53 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 53 uUDP/IP packets sent between back-to-back systems Processed in a similar manner to TCP/IP Not subject to flow control & congestion avoidance algorithms Used UDPmon test program uLatency uRound trip times measured using Request-Response UDP frames uLatency as a function of frame size Slope is given by: Mem-mem copy(s) + pci + Gig Ethernet + pci + mem-mem copy(s) Intercept indicates: processing times + HW latencies uHistograms of ‘singleton’ measurements uTells us about: Behavior of the IP stack The way the HW operates Interrupt coalescence Latency Measurements

54 Slide: 54 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 54 Throughput Measurements uUDP Throughput uSend a controlled stream of UDP frames spaced at regular intervals n bytes Number of packets Wait time time  Zero stats OK done ●●● Get remote statistics Send statistics: No. received No. lost + loss pattern No. out-of-order CPU load & no. int 1-way delay Send data frames at regular intervals ●●● Time to send Time to receive Inter-packet time (Histogram) Signal end of test OK done Time Sender Receiver

55 Slide: 55 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 55 PCI Bus & Gigabit Ethernet Activity uPCI Activity uLogic Analyzer with PCI Probe cards in sending PC Gigabit Ethernet Fiber Probe Card PCI Probe cards in receiving PC Gigabit Ethernet Probe CPU mem chipset NIC CPU mem NIC chipset Logic Analyser Display PCI bus Possible Bottlenecks

56 Slide: 56 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 56 End Hosts & NICs CERN-nat-Manc. Request-Response Latency Throughput Packet Loss Re-Order uUse UDP packets to characterise Host, NIC & Network SuperMicro P4DP8 motherboard Dual Xenon 2.2GHz CPU 400 MHz System bus 64 bit 66 MHz PCI / 133 MHz PCI-X bus uThe network can sustain 1Gbps of UDP traffic uThe average server can loose smaller packets uPacket loss caused by lack of power in the PC receiving the traffic uOut of order packets due to WAN routers uLightpaths look like extended LANS have no re-ordering

57 Slide: 57 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 57 TCP (Reno) – Details uTime for TCP to recover its throughput from 1 lost packet given by: u for rtt of ~200 ms: 2 min UK 6 ms Europe 20 ms USA 150 ms

58 Slide: 58 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 58 Network & Disk Interactions uDisk Write mem-disk: 1735 Mbit/s Tends to be in 1 die uDisk Write + UDP 1500 mem-disk : 1218 Mbit/s Both dies at ~80% uDisk Write + CPU  mem mem-disk : 1341 Mbit/s 1 CPU at ~60% other 20% Large user mode usage Below Cut = hi BW Hi BW = die1 used uDisk Write + CPUload mem-disk : 1334 Mbit/s 1 CPU at ~60% other 20% All CPUs saturated in user mode Total CPU load Kernel CPU load


Download ppt "Slide: 1 Richard Hughes-Jones Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester 1 TCP/IP and Other Transports for High Bandwidth Applications."

Similar presentations


Ads by Google