1 LHCOPN Status and Plans. David Foster, Head, Communications and Networks, CERN. Joint-Techs Hawaii, January 2008.
2 Acknowledgments. Many presentations and material in the public domain have contributed to this presentation, too numerous to mention individually.
4 CERN – March 2007. 26 659 m in circumference. SC magnets are pre-cooled to -193.2°C (80 K) using 10 080 tonnes of liquid nitrogen; 60 tonnes of liquid helium bring them down to -271.3°C (1.9 K). 600 million proton collisions per second. The internal pressure of the LHC is 10⁻¹³ atm, ten times less than the pressure on the Moon.
5 CERN's Detectors. To observe the collisions, collaborators from around the world are building four huge experiments: ALICE, ATLAS, CMS and LHCb. Detector components are constructed all over the world. Funding comes mostly from the participating institutes, with less than 20% from CERN.
6 The LHC Computing Challenge. Signal/noise: 10⁻⁹. Data volume: high rate x large number of channels x 4 experiments = 15 petabytes of new data each year. Compute power: event complexity x number of events x thousands of users = 100 k of today's fastest CPUs. Worldwide analysis and funding: computing funded locally in major regions and countries, efficient analysis everywhere, GRID technology.
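As a back-of-the-envelope illustration of what 15 PB of new data per year implies as a sustained export rate, here is a minimal Python sketch; the 15 PB/year figure is from the slide, and the closing comment about headroom is only an illustrative interpretation.

```python
# Back-of-the-envelope data-rate estimate from the figures on this slide.
PB = 1e15                      # bytes in a petabyte (decimal convention)
new_data_per_year = 15 * PB    # ~15 PB of new data each year (from the slide)

seconds_per_year = 365 * 24 * 3600
avg_rate_bytes_s = new_data_per_year / seconds_per_year
avg_rate_gbit_s = avg_rate_bytes_s * 8 / 1e9

print(f"Average export rate: {avg_rate_gbit_s:.2f} Gbit/s")
# Roughly 3.8 Gbit/s averaged over the year; real provisioning must also allow
# for duty cycle, re-processing and catch-up, hence dedicated 10G paths.
```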
9 LHC Computing: a Multi-science Grid. 1999 – MONARC project: first LHC computing architecture, a hierarchical distributed model. 2000 – growing interest in grid technology; the HEP community was the main driver in launching the DataGrid project. 2001-2004 – EU DataGrid project: middleware and testbed for an operational grid. 2002-2005 – LHC Computing Grid (LCG): deploying the results of DataGrid to provide a production facility for LHC experiments. 2004-2006 – EU EGEE project phase 1: starts from the LCG grid, a shared production infrastructure expanding to other communities and sciences.
10 The WLCG Distribution of Resources. Tier-0 (the accelerator centre): data acquisition and initial processing of raw data; distribution of data to the Tier-1s. Tier-1 (11 centres, "online" to the data acquisition process): high availability, managed mass storage with a grid-enabled data service, data-heavy analysis, national and regional support. The Tier-1 sites are: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY). Tier-2 (~200 centres in ~40 countries): simulation and end-user analysis, batch and interactive.
11 Centers around the world form a supercomputer. The EGEE and OSG projects are the basis of the Worldwide LHC Computing Grid project (WLCG). Inter-operation between grids is working!
12 Tier-1 Centers: TRIUMF (Canada); GridKa (Germany); IN2P3 (France); CNAF (Italy); SARA/NIKHEF (NL); Nordic Data Grid Facility (NDGF); ASCC (Taipei); RAL (UK); BNL (US); FNAL (US); PIC (Spain). The Grid is now in operation, working on reliability, scaling up and sustainability.
14 LHCOPN Mission. To assure the T0-T1 transfer capability, essential for the Grid to distribute data out to the T1s; capacity must be large enough to deal with most situations, including "catch-up". The excess capacity can be used for T1-T1 transfers, at lower priority than T0-T1, and may not be sufficient for all T1-T1 requirements. Resiliency objective: no single failure should cause a T1 to be isolated; the infrastructure can be improved. The network naturally started as an unprotected "star" – insufficient for a production network, but it enabled rapid progress. It has become a reason for, and has leveraged, cross-border fibre, an excellent side effect of the overall approach.
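The resiliency objective can be phrased as a simple graph check: remove each link in turn and verify that every T1 is still reachable from the T0. Below is a minimal Python sketch using the T0-T1 and cross-border T1-T1 links listed on the schematic slides later in this talk; the transatlantic BNL/FNAL paths are omitted, so this is a simplified topology, not the real one.

```python
from collections import defaultdict, deque

# T0-T1 and cross-border T1-T1 links, as listed on the schematic slides
# later in the talk (transatlantic detail for BNL/FNAL is omitted here).
links = [
    ("CERN", "RAL"), ("CERN", "PIC"), ("CERN", "IN2P3"), ("CERN", "CNAF"),
    ("CERN", "GRIDKA"), ("CERN", "NDGF"), ("CERN", "SARA"),
    ("CERN", "TRIUMF"), ("CERN", "ASGC"),
    ("GRIDKA", "CNAF"), ("GRIDKA", "IN2P3"), ("GRIDKA", "SARA"), ("SARA", "NDGF"),
]

def reachable(edges, start="CERN"):
    """Breadth-first search: which sites can still be reached from `start`?"""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen

sites = {site for link in links for site in link}
for failed in links:
    isolated = sites - reachable([l for l in links if l != failed])
    if isolated:
        print(f"failure of {failed} isolates {sorted(isolated)}")
```

In this simplified model the check flags RAL, PIC, TRIUMF and ASGC as each depending on a single access lambda, which is consistent with the Layer-3 backup issues for RAL and PIC noted on the resiliency slide.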
15 LHCOPN Design Information. All technical content is on the LHCOPN Twiki: http://lhcopn.cern.ch. Coordination process: LHCOPN meetings (every 3 months); active working groups on routing, monitoring and operations. Active interfaces to external networking activities: European network policy groups, US research networking, the Grid Deployment Board, the LCG Management Board, EGEE.
23 Basic Link Layer Monitoring. perfSONAR is well advanced in deployment (but not yet complete). It monitors the up/down status of the links and is integrated into the End-to-End Coordination Unit (E2ECU) run by DANTE. This provides simple indications of "hard" faults but is insufficient to understand the quality of the connectivity.
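As an illustration of the aggregation idea behind end-to-end up/down monitoring, here is a minimal Python sketch; the segment states and the CERN-GRIDKA example are invented for illustration and are not taken from the E2ECU or perfSONAR.

```python
from enum import Enum

class Status(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"   # a domain that has not (yet) reported

def end_to_end_status(segment_states):
    """Aggregate per-domain segment states into one circuit state:
    any DOWN segment makes the circuit DOWN; otherwise any UNKNOWN
    segment leaves the circuit UNKNOWN; only all-UP means UP."""
    if Status.DOWN in segment_states:
        return Status.DOWN
    if Status.UNKNOWN in segment_states:
        return Status.UNKNOWN
    return Status.UP

# Hypothetical segments of one T0-T1 circuit crossing three domains.
cern_gridka = [Status.UP, Status.UP, Status.DOWN]
print(end_to_end_status(cern_gridka))   # Status.DOWN -> raise a "hard fault" alarm
```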
26 Active Monitoring. Active monitoring is needed, with implementation consistency for accurate results: one-way delay, TCP achievable bandwidth, ICMP-based round-trip time, and traceroute information for path changes. This is needed for service quality issues. The first mission is T0-T1 and T1-T1; the T1 deployment could also be used for T1-T2 measurements as a second step, with a corresponding T2 infrastructure.
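Two of the probes listed here, ICMP round-trip time and traceroute-based path-change detection, can be sketched with standard command-line tools; one-way delay and achievable TCP bandwidth need dedicated instruments (synchronised clocks, bandwidth testers) and are not shown. A minimal Python sketch, assuming Linux-style ping/traceroute flags and using a placeholder hostname:

```python
import hashlib
import subprocess

TARGET = "t1.example.org"   # placeholder hostname, not a real LHCOPN endpoint

def icmp_rtt(host, count=4):
    """Run ping (Linux-style flags) and return its raw output;
    the rtt min/avg/max/mdev summary is in the last line."""
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True, timeout=30)
    return result.stdout

def path_fingerprint(host):
    """Run traceroute and hash the hop list, so that a change in the
    forwarding path shows up as a changed fingerprint."""
    result = subprocess.run(["traceroute", "-n", host],
                            capture_output=True, text=True, timeout=60)
    hops = [line.split()[1] for line in result.stdout.splitlines()[1:]
            if line.split() and line.split()[1] != "*"]
    return hashlib.sha256(" ".join(hops).encode()).hexdigest()[:12], hops

if __name__ == "__main__":
    lines = icmp_rtt(TARGET).splitlines()
    print(lines[-1] if lines else "no ping output")   # rtt summary line
    fp, hops = path_fingerprint(TARGET)
    print(f"path fingerprint {fp} over {len(hops)} hops")
```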
28 Monitoring Evolution. There is a long-standing collaboration on measurement and monitoring technologies; the monitoring working group of the LHCOPN, with ESnet and DANTE, has been leading the effort. DANTE has proposed a managed service: manage the tools and archives, manage the hardware and O/S, manage the integrity of the information. Sites have some obligations: on-site operations support; provision of a terminal server; a dedicated IP port on the border router; a PSTN/ISDN line for out-of-band communication; a Gigabit Ethernet switch; a GPS antenna; protected power; rack space.
29 Operational Procedures. These have to be finalised but need to deal with change and incident management; many parties are involved, and we have to agree on the real processes. A recent operations workshop made some progress. We try to avoid, wherever possible, too many "coordination units". All parties agreed that we need some centralised information to have a global view of the network and incidents; a further workshop is planned to quantify this. We also need to understand the existing processes used by the T1s.
30 Resiliency Issues. Physical fibre path considerations continue: some lambdas have been re-routed, others still may be. Layer-3 backup paths for RAL and PIC are still an issue: in the case of RAL, excessive costs seem to be the problem; for PIC, there is still some hope of a CBF link between RedIRIS and RENATER. Overall the situation is quite good with the CBF links, but it can still be improved; most major "single" failures are protected against.
31 T0-T1 Lambda routing (schematic). [Map of the European and transatlantic lambda routes.] T0-T1 lambdas: CERN-RAL, CERN-PIC, CERN-IN2P3, CERN-CNAF, CERN-GRIDKA, CERN-NDGF, CERN-SARA, CERN-TRIUMF, CERN-ASGC, USLHCNET NY (AC-2), USLHCNET NY (VSNL N), USLHCNET Chicago (VSNL S). TRIUMF and ASGC are carried via SURFnet; the ASGC path is via SMW-3 or 4 (?).
32 T1-T1 Lambda routing (schematic). [Same map as the previous slide.] T1-T1 lambdas: GRIDKA-CNAF, GRIDKA-IN2P3, GRIDKA-SARA, SARA-NDGF.
33 Some Initial Observations. Between CERN and Basel, the following lambdas run in the same fibre pair: CERN-GRIDKA, CERN-NDGF, CERN-SARA, CERN-SURFnet-TRIUMF/ASGC (x2), USLHCNET NY (AC-2). The following lambdas run in the same (sub-)duct/trench: all of the above plus CERN-CNAF and USLHCNET NY (VSNL N) [supplier is COLT]. USLHCNET Chicago (VSNL S) MAY run in the same (sub-)duct/trench as all of the above [awaiting info from Qwest]. Between Basel and Zurich, the following lambdas run in the same trench: CERN-CNAF and GRIDKA-CNAF (T1-T1); USLHCNET Chicago (VSNL S) MAY run in the same trench as these [awaiting info from Qwest].
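These shared fibre pairs and ducts are shared-risk link groups: one trench cut takes out several lambdas at once. The sketch below extends the earlier single-failure check to group failures, assuming the networkx package is available; the SRLG membership follows this slide, but the USLHCNET lambdas are not modelled and the CERN-TRIUMF/CERN-ASGC entries stand in for the SURFnet-carried lambdas, so this remains a simplified model.

```python
import networkx as nx   # assumes the networkx package is available

# T0-T1 and cross-border T1-T1 links, as on the schematic slides
# (transatlantic BNL/FNAL paths via USLHCNET are not modelled).
links = [("CERN", "RAL"), ("CERN", "PIC"), ("CERN", "IN2P3"), ("CERN", "CNAF"),
         ("CERN", "GRIDKA"), ("CERN", "NDGF"), ("CERN", "SARA"),
         ("CERN", "TRIUMF"), ("CERN", "ASGC"),
         ("GRIDKA", "CNAF"), ("GRIDKA", "IN2P3"), ("GRIDKA", "SARA"),
         ("SARA", "NDGF")]

# Shared-risk link groups from the observations on this slide; the TRIUMF and
# ASGC entries stand in for the SURFnet-carried lambdas in the same fibre pair.
srlgs = {
    "CERN-Basel fibre pair": {("CERN", "GRIDKA"), ("CERN", "NDGF"), ("CERN", "SARA"),
                              ("CERN", "TRIUMF"), ("CERN", "ASGC")},
    "CERN-Basel (sub-)duct": {("CERN", "GRIDKA"), ("CERN", "NDGF"), ("CERN", "SARA"),
                              ("CERN", "TRIUMF"), ("CERN", "ASGC"), ("CERN", "CNAF")},
}

sites = {site for link in links for site in link}
for name, group in srlgs.items():
    surviving = nx.Graph([l for l in links if l not in group])  # fail the whole group
    isolated = sites - nx.node_connected_component(surviving, "CERN")
    print(f"{name} cut -> T1s cut off from the T0: {sorted(isolated) or 'none'}")
```

In this simplified model the cross-border T1-T1 links keep the European T1s connected to the T0 even when the whole duct is cut; the lambdas with no alternative in the model are the SURFnet-carried TRIUMF and ASGC ones, which illustrates the value of the CBF links noted earlier.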
34 Closing Remarks. The LHCOPN is an important part of the overall requirements for LHC networking, and it is a (relatively) simple concept: statically allocated 10G paths in Europe, plus managed bandwidth on the 10G transatlantic links via USLHCNet. Multi-domain operations remain to be completely solved; this is a new requirement for the parties involved and a learning process for everyone. Many tools and ideas exist, and the work now is to pull this all together into a robust operational framework.