INFN TIER1 (IT-INFN-CNAF), “Concerns from sites” session, LHC OPN/ONE “Networking for WLCG” Workshop, CERN, 10-2-2014, Stefano Zani


INFN Tier1 experiment activity
CNAF is a Tier1 centre for all the LHC experiments and provides resources to many (about 20) other experiments, such as AMS, ARGO, AUGER, Borexino, CDF, Pamela, Virgo, Xenon…
[Chart: WCT (HEP-Spec06) CPU usage per experiment, 2013.]

INFN TIER1 resources today and after LS1 (pledge 2015)
- Total computing resources: 195K HEP-Spec06 (~17K job slots).
- Computing resources for LHC: current 100K HEP-Spec06, 10000 job slots (pledged 2014); after LS1 130K HEP-Spec06, ~13000 job slots (pledged 2015).
- Storage resources for LHC: current 11 PB disk and 16 PB tape (pledged 2014); after LS1 13 PB disk and 24 PB tape (pledged 2015).
- Pledged numbers do not suggest any big increase of computing resources in 2015 (20-30%).
- Hardware: TwinSquare chassis (4 mainboards in 2U, 24 cores); DDN SFA 12K storage connected to a SAN.
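For reference, a minimal sketch recomputing the 2014-to-2015 pledged growth from the figures on this slide (only the numbers quoted above are used):

```python
# Quick check of the 2014 -> 2015 pledged growth quoted on the slide.
pledges = {
    "CPU (kHS06)": (100, 130),
    "Disk (PB)":   (11, 13),
    "Tape (PB)":   (16, 24),
}

for resource, (y2014, y2015) in pledges.items():
    growth = (y2015 - y2014) / y2014 * 100
    print(f"{resource:12s}: {y2014:>4} -> {y2015:<4} (+{growth:.0f}%)")

# CPU grows by ~30%, disk by ~18%, tape by ~50%: only tape sees a large
# increase, consistent with the "no big increment" remark above.
```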

Farming and storage local interconnection
[Diagram: a Cisco 7600 (general Internet) and a Nexus core (LHCOPN/ONE); ~80 disk servers; TOR farming switches connecting the worker nodes (~17K job slots), uplinked at 2x10 Gb/s and up to 4x10 Gb/s through aggregation switches; SAN with 13 PB of disk.]
In order to guarantee a minimum throughput of 5 MB/s per job slot, starting from the next tender we will probably connect all the WNs at 10 Gb/s.
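A minimal sketch of the sizing argument behind that choice, assuming on the order of 24 job slots per worker node (the per-node slot count is an assumption for illustration, not a figure quoted on this slide):

```python
# Back-of-the-envelope check of worker-node uplink sizing.
# 5 MB/s per job slot is the target from the slide; the number of job
# slots per worker node is an assumed value for illustration only.
MB_PER_SLOT = 5          # MB/s guaranteed per job slot
SLOTS_PER_WN = 24        # assumed job slots per worker node

required_mbps = MB_PER_SLOT * SLOTS_PER_WN * 8   # MB/s -> Mb/s
print(f"Required per-WN throughput: {required_mbps} Mb/s "
      f"({required_mbps / 1000:.2f} Gb/s)")
# ~0.96 Gb/s: a single 1 Gb/s NIC is already at its limit, so moving the
# WNs to 10 Gb/s links leaves headroom as core counts per node grow.
```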

LHC OPN and ONE access link
[Diagram: the CNAF Nexus 7018 connected to the GARR Juniper router through a 2x10 Gb/s load-sharing port channel, carrying two point-to-point VLANs (1001 and 1003) for LHC OPN and LHC ONE.]
LHC OPN and LHC ONE BGP peerings are made on the same port channel, using two VLANs reserved for the point-to-point interfaces, thus sharing the total access bandwidth (now 20 Gb/s, 40 Gb/s soon).

WAN connectivity
[Diagram: the CNAF Nexus and Cisco 7600 connect through GARR Bo1/GARR Mi1 to LHC OPN, LHC ONE and general IP, reaching the other Tier1s (RAL, PIC, TRIUMF, BNL, FNAL, TW-ASGC, NDGF, IN2P3, SARA); a 10 Gb/s CNAF-FNAL path is dedicated to CDF data preservation; a cross-border fibre (CBF) is also shown.]
A 20 Gb/s physical link (2x10 Gb/s) is shared by LHCOPN and LHCONE, and 10 Gb/s is used for general IP connectivity; the plan is general IP at 20 Gb/s and LHCOPN/ONE at 40 Gb/s.

WAN LHC total utilization (LHC OPN + LHCONE): how the network is used
[Traffic graphs: LHC OPN + ONE daily, weekly and yearly, IN/OUT rates with peaks.]
Average IN: 4.1 Gb/s, average OUT: 4.8 Gb/s; max IN: 20 Gb/s, max OUT: 20 Gb/s. We are observing many peaks at the nominal maximum speed of the link.
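To put those averages in context, a minimal sketch converting them into daily volumes and average utilization of the access link (it assumes only the 20 Gb/s shared OPN+ONE link described earlier):

```python
# Convert the quoted average rates into daily volumes and average
# utilization of the 20 Gb/s shared LHCOPN/LHCONE access link.
LINK_GBPS = 20.0
averages = {"IN": 4.1, "OUT": 4.8}       # Gb/s, from the yearly graph

SECONDS_PER_DAY = 24 * 3600
for direction, gbps in averages.items():
    tb_per_day = gbps / 8 * SECONDS_PER_DAY / 1000   # Gb/s -> TB/day
    utilization = gbps / LINK_GBPS
    print(f"{direction:3s}: ~{tb_per_day:.0f} TB/day, "
          f"~{utilization:.0%} of the link on average")
# Roughly 44 TB/day in and 52 TB/day out, i.e. the link runs at about a
# fifth to a quarter of capacity on average, while peaks reach 100%.
```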

LHC-OPN vs LHCONE (yearly traffic graphs)
- LHC OPN: many peaks up to 20 Gb/s; average IN 3.2 Gb/s, average OUT 2.9 Gb/s; max IN 20 Gb/s, max OUT 20 Gb/s; top applications: 53% Xrootd, 47% GridFTP; top peers: CERN, KIT, IN2P3…
- LHC ONE: some peaks up to 16 Gb/s; average IN 0.9 Gb/s, average OUT 1.9 Gb/s; max IN 12 Gb/s, max OUT 16 Gb/s; top applications: 70% GridFTP, 30% Xrootd; top peers: SLAC, INFN MI, INFN LNL…
LHCOPN traffic is significantly higher than traffic on LHCONE, and the most relevant traffic peaks are mainly on LHC-OPN routes.

“Analysis of a peak” (LHC-OPN link): in this case the application using most of the bandwidth was Xrootd.

“Analysis of a peak” (LHC-OPN link), looking at the source and destination IPs:
- Xrootd: from KIT → CNAF (ALICE Xrootd servers)
- Xrootd: from CERN → CNAF (worker nodes)
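As an illustration of this kind of drill-down, a minimal sketch that aggregates flow records by source/destination pair to find the top talkers behind a peak; the records, labels and byte counts below are invented for the example:

```python
# Aggregate NetFlow/sFlow-style records by (source, destination) pair to
# see which endpoints dominate a traffic peak. All values are invented
# for illustration; a real analysis would read exported flow records.
from collections import Counter

flows = [  # (source, destination, bytes)
    ("KIT",  "CNAF ALICE Xrootd servers", 9_000_000_000),
    ("CERN", "CNAF worker nodes",         7_500_000_000),
    ("KIT",  "CNAF ALICE Xrootd servers", 4_000_000_000),
]

top_talkers = Counter()
for src, dst, nbytes in flows:
    top_talkers[(src, dst)] += nbytes

for (src, dst), nbytes in top_talkers.most_common(5):
    print(f"{src:6s} -> {dst:28s} {nbytes / 1e9:6.1f} GB")
```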

T1 WAN connection, GARR side: present (1 Feb 2014)
[Diagram: GARR-X POPs BO1 and MI1 (MX routers and a T1600); from the INFN CNAF LAN, the CNAF-IP primary access at 10 Gb/s and the LHC-T1 access at 2x10 Gb/s (20 Gb/s); onward links of 40 Gb/s towards CERN-LHCOPN and DE-KIT and 100 Gb/s towards GEANT-LHCONE.]

T1 WAN connection, GARR side: next step, evolution (Q1/Q2 2014)
[Diagram: same topology, with the LHC-T1 access upgraded to 4x10 Gb/s and a 2x10 Gb/s backup access for the INFN CNAF computing centre; CNAF-IP primary access still at 10 Gb/s.]

Next steps in CNAF WAN connectivity: evolution to 100 Gb/s
[Diagram: same topology, with the LHC-T1 access upgraded to 100 Gb/s.]
If more bandwidth is necessary, the GARR-X network can connect the TIER1 and part of the Italian TIER2s at 100 Gb/s.

CNAF views and general concerns
We are not experiencing real WAN network problems: experiments are using the centre, and the network I/O seems to be good even during short periods of bandwidth saturation… but we are not in data taking.
Concern: direct access to data over the WAN (for example analysis traffic) can potentially saturate any WAN link. We need to understand the data movement or access model better, in order to provide bandwidth where it is necessary and to protect the essential connectivity resources.

Open questions
Do we keep LHCOPN? Do we change it? Do we keep the LHCONE L3VPN? Do we change it?
The answer to these questions depends on the role of the Tiers in the next computing models.
- If T0-T1 guaranteed bandwidth during data taking is still mandatory, we should keep LHCOPN (or part of it), in order to have better control of the most relevant traffic paths and faster troubleshooting procedures in case of network problems.
- If the data flows become more and more distributed as a full mesh between Tiers, an L3 approach on over-provisioned resources dedicated to LHC (like the LHCONE VRF) could be the best matching solution.

Services and tools needed?
Flow analysis tools (NetFlow/sFlow analysers) have to be improved by network admins at site level. Services (such as FTS) used to optimize and tune the main data transfers (and flows) from and to a site could be very useful to the experiments and to the sites too.
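A minimal sketch of the kind of per-application breakdown a site-level flow analyser can produce (as in the LHC-OPN vs LHCONE slide above). The flow records are invented, and the port-to-application mapping is an assumption: Xrootd's default port is 1094 and GridFTP's control port is 2811, while real GridFTP data channels use negotiated ephemeral ports:

```python
# Classify NetFlow/sFlow-style records by destination port to estimate
# the per-application traffic share. Records are invented; the port ->
# application mapping is an assumption (GridFTP data channels actually
# run on negotiated ephemeral ports, so a real tool needs extra logic).
from collections import defaultdict

PORT_TO_APP = {1094: "xrootd", 2811: "gridftp"}

flows = [  # (src, dst, dst_port, bytes)
    ("peer-A", "cnaf-disk-01",    1094, 12_000_000_000),
    ("peer-B", "cnaf-disk-07",    1094,  8_500_000_000),
    ("peer-C", "cnaf-gridftp-03", 2811,  6_200_000_000),
]

bytes_per_app = defaultdict(int)
for src, dst, dport, nbytes in flows:
    bytes_per_app[PORT_TO_APP.get(dport, "other")] += nbytes

total = sum(bytes_per_app.values())
for app, nbytes in sorted(bytes_per_app.items(), key=lambda kv: -kv[1]):
    print(f"{app:8s} {nbytes / total:6.1%}  ({nbytes / 1e9:.1f} GB)")
```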

Thank You!

Backup Slides

Internal network possible evolution
[Diagram: two Nexus 7018 core switches interconnected by 40 Gb/s links and a 2x10 Gb/s VPC link; farming switches and disk servers attached at 10 Gb/s; LHC OPN/ONE uplinks.]
Redundancy: 2 core switches. Scalability: up to … Gb/s ports.

CNAF network devices
4 core switch/routers (fully redundant):
- Cisco Nexus 7018 (TIER1 core switch and WAN access router): … 10 Gb/s ports, 192 Gigabit ports, 550 Gb/s per slot
- Extreme Networks BD8810 (Tier1 concentrator): 24 10 Gb/s ports, 96 Gigabit ports
- Cisco 7606 (general IP WAN access router)
- Cisco 6506 (offices core switch)
More than 100 TOR switches (about 4800 Gigabit ports and … 10 Gb/s ports):
- 40 Extreme Summit X00/X450 (48x1 Gb/s + 4 uplinks)
- 11 3Com 4800 (48x1 Gb/s + 4 uplinks)
- 12 Juniper EX4200 (48x1 Gb/s + 2x10 Gb/s uplinks)
- 14 Cisco 4948 (48x1 Gb/s + 4x10 Gb/s uplinks)
- 20 Cisco 3032 (DELL blade)
- 4 DELL PowerConnect 7048 (48x1 Gb/s + 4x10 Gb/s)

GARR POP
One of the main GARR POPs is hosted by CNAF, inside the TIER1 computing centre, so the Tier1's WAN connections are made within the LAN (local patches). The GARR Bo1 POP can today activate up to 160 lambdas, and it is possible to activate the first 100 Gb/s links.
[Photo: the GARR BO1 POP.]

Next steps in CNAF WAN connectivity
[Diagram: the CNAF Nexus 7018 connected to the GARR router (BO) for OPN + ONE at 20 Gb/s now, 40 Gb/s within days, and 100 Gb/s at the end, if needed.]
The GARR backbone is already connected to GEANT with 2x100 Gb/s links. If more bandwidth becomes necessary, the GARR-X network can connect the TIER1 and part of the Italian TIER2s at 100 Gb/s.