Geneva – Kraków network measurements for the ATLAS Real-Time Remote Computing Farm Studies R. Hughes-Jones (Univ. of Manchester), K. Korcyl (IFJ-PAN),

Presentation transcript:

Geneva – Kraków network measurements for the ATLAS Real-Time Remote Computing Farm Studies
R. Hughes-Jones (Univ. of Manchester), K. Korcyl (IFJ-PAN), C. Meirosu (CERN)

MOTIVATION
Several experiments, including ATLAS at the Large Hadron Collider (LHC) and D0 at Fermilab, have expressed interest in using remote computing farms for processing and analysing, in real time, the information from particle collision events. Different architectures have been suggested, from pseudo-real-time file transfer with subsequent remote processing to the real-time requesting of individual events as described here.

To test the feasibility of using remote farms for real-time processing, a collaboration was set up between members of the ATLAS Trigger/DAQ community, with support from several national research and education network operators (DARENET, Canarie, Netera, PSNC, UKERNA and Dante), to demonstrate a Proof of Concept and measure end-to-end network performance.

The testbed was centred at CERN and used three different types of wide-area high-speed network infrastructure to link the remote sites:
- an end-to-end lightpath (SONET circuit) to the University of Alberta in Canada
- standard Internet connectivity to the University of Manchester in the UK and the Niels Bohr Institute in Denmark
- a Virtual Private Network (VPN) composed of an MPLS tunnel over the GÉANT network and an Ethernet VPN over the PIONIER network to IFJ PAN Krakow in Poland

EQUIPMENT
We developed custom equipment to measure the quality of service at Layer 2/3. The equipment is based on an Alteon Gigabit Ethernet (GE) network interface card (NIC) reprogrammed to act as an IP traffic generator, a custom clock card and commercial Global Positioning System equipment (used as a global clock time reference). We can measure one-way latency, inter-arrival time, frame loss and re-ordering on a packet-by-packet basis, as a function of load, up to Gigabit Ethernet speed.
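The per-packet quantities listed above can be illustrated with a short Python sketch. It is not the firmware of the Alteon-based equipment: the `Record` type, the `analyse` function and the integer-microsecond timestamps are assumptions made for illustration, and a frame is counted as re-ordered here when it arrives after a frame carrying a higher sequence number, which is only one of several possible definitions.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One received frame (hypothetical record layout)."""
    seq: int    # sequence number written by the sender
    tx_us: int  # send timestamp, microseconds on a shared clock (e.g. GPS)
    rx_us: int  # receive timestamp, microseconds on the same clock


def analyse(records, n_sent):
    """Compute per-packet metrics from frames listed in arrival order."""
    # One-way latency needs sender and receiver clocks to be synchronised.
    latency_us = [r.rx_us - r.tx_us for r in records]
    # Inter-arrival time between consecutive received frames.
    inter_arrival_us = [b.rx_us - a.rx_us for a, b in zip(records, records[1:])]
    # Frames that were sent but never seen at the receiver.
    lost = n_sent - len({r.seq for r in records})
    # A frame is counted as re-ordered if a higher sequence number
    # has already arrived before it.
    highest_seen = -1
    reordered = 0
    for r in records:
        if r.seq < highest_seen:
            reordered += 1
        else:
            highest_seen = r.seq
    return {
        "latency_us": latency_us,
        "inter_arrival_us": inter_arrival_us,
        "lost": lost,
        "reordered": reordered,
    }
```

Repeating such an analysis at different offered loads would give each metric as a function of load, which is how the measurements are described in the text.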
For the Layer 4 (TCP) measurements we developed and used the “tcpmon” program. tcpmon is an instrumented request-response program that emulates the communication between the EFD and SFI components of the ATLAS Event Filter. It builds on the experience of UDPmon, a generic tool that can be used to automate hardware and network performance measurements using UDP packets. UDPmon calculates the CPU load and the number of interrupts generated by the network interface card during a given test, along with standard network-related parameters such as latency, inter-arrival time and bandwidth. tcpmon and UDPmon run on standard PCs under the Linux operating system.

CONCLUSIONS
- The quality of service over long-distance connections may vary considerably from moment to moment, even though it might still meet the Service Level Agreement (based on long-term averages).
- Long-distance Ethernet circuits, tunneled over routed networks, may produce out-of-order packets, which would not be the case in a LAN environment.
- Applications with real-time requirements should monitor the performance of the underlying network and adapt accordingly.
- Out-of-order packets are important for our application. Studies are under way to determine the real impact.

RESULTS – summary:
- Out-of-order frames are present, even though the connection is composed of a Layer 2 tunnel over MPLS and a pure Ethernet VLAN.
- The number of out-of-order frames may vary, depending on the offered load; we have not observed any out-of-order frames during our tests for loads lower than 500 Mbit/s.
- Is this relevant to the 1.5 MB transfers we would have in our application? Yes!
  - Minimal impact: using a modern TCP stack, if frames are “not too much out of order”, the stack will not request re-transmits, but the CPU load required for the bookkeeping is higher.
  - Worst case: the stack will require a re-transmit, halving the TCP window in the process, hence reducing the maximum transfer rate for a given time interval.

ATLAS Application Protocol
Event Request
- EFD requests an event from SFI
- SFI replies with the event (~2 Mbytes)
Processing of event
Return of computation
- EF asks SFO for buffer space
- SFO sends OK
- EF transfers results of the computation

[Figure: time-sequence diagram between the Event Filter (EFD) and the SFI and SFO, with the steps Request event, Send event data, Process event, Request Buffer, Send OK, Send processed event; and a histogram of the request-response time.]

[Figure: Remote Computing Concepts. Labels: ATLAS Detectors – Level 1 Trigger, ROB, Level 2 Trigger (L2PU), Event Builders (SFI), SFOs, Mass storage, Data Collection Network, Back End Network, Switch, Experimental Area, CERN B513, Local Event Processing Farms (PF), GÉANT, lightpaths, Remote Event Processing Farms (PF) at Copenhagen, Edmonton, Krakow and Manchester.]

[Figure: network map. Nodes: Gdańsk, Poznań, Zielona Góra, Katowice, Kraków, Lublin, Warszawa, Bydgoszcz, Toruń, Częstochowa, Białystok, Olsztyn, Rzeszów, Bielsko-Biała, Koszalin, Szczecin, Wrocław, Łódź, Kielce, Puławy, Opole, Radom. Legend: GÉANT 10 Gb/s, Metropolitan Area Networks, 622 Mb/s, 155 Mb/s, 10 Gb/s, own fibers, leased channels.]

- steady-state request-response latency: ~140 ms
- event rate: ~7.2 events/s
- the first event took 600 ms (due to the start-up time on the TCP connection)

Web100 parameters on the server located at CERN (data source):
- Green: small requests
- Blue: big responses
- TCP ACK packets also counted (in each direction)
- One response = 1 MB ~ 380 packets
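The Event Request exchange (a small request answered by a large event, with the request-response time recorded per event) can be sketched in Python as below. This is not the tcpmon program: the function names, the 64-byte request size and the use of a local socket pair in place of a TCP connection across a WAN are all assumptions chosen to keep the sketch self-contained.

```python
import socket
import threading
import time

REQUEST_SIZE = 64  # bytes per request; hypothetical value, not from the source


def recv_exact(sock, n):
    """Read exactly n bytes from sock; raise if the peer closes early."""
    remaining = n
    chunks = []
    while remaining > 0:
        chunk = sock.recv(min(65536, remaining))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def serve(sock, n_events, response_size):
    """Stand-in for the data source: answer each request with one 'event'."""
    payload = bytes(response_size)
    for _ in range(n_events):
        recv_exact(sock, REQUEST_SIZE)
        sock.sendall(payload)


def run_exchange(n_events=5, response_size=1_000_000):
    """Return the request-response time (in seconds) of each event."""
    # Local stream sockets: no real network is involved in this sketch.
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve, args=(server, n_events, response_size))
    worker.start()
    times = []
    try:
        for _ in range(n_events):
            start = time.perf_counter()
            client.sendall(b"R" * REQUEST_SIZE)  # small request
            recv_exact(client, response_size)    # big response
            times.append(time.perf_counter() - start)
    finally:
        worker.join()
        client.close()
        server.close()
    return times


if __name__ == "__main__":
    for i, t in enumerate(run_exchange()):
        print(f"event {i}: {t * 1e3:.2f} ms")
```

With only one request outstanding at a time, the event rate is bounded by the inverse of the request-response time: at the ~140 ms steady-state latency reported above, 1 / 0.140 s is about 7.1 events/s, in line with the quoted ~7.2 events/s.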