Andrea Chierici On behalf of INFN-T1 staff


1 INFN-T1 site report
Andrea Chierici, on behalf of INFN-T1 staff

2 New Logo!

3 Outline
Network
Data management & Storage
Farming
Projects and activities

4 Network

5 Current Status
WAN connectivity:
  LHCOPN+LHCONE shared physical link: 6x10Gb/s
  LHCOPN: 4x10Gb/s dedicated link to CERN
  LHCONE: potentially 60Gb/s (Geant peering at 100Gb/s in Milan)
  General IP: 2x10Gb/s
INFN Tier-1 WAN upgrade to 100Gb/s starting from the end of this year (GARR has completed the Milan-Bologna link upgrade).
LAN evolution:
  2 new core switches (Cisco Nexus 9516)
  Acquisition of 2 modules (32 x 100Gb Ethernet ports)
  Tender completed, delivery in 1 month
  Network downtime required to replace the fabric modules and to upgrade the existing interface modules so that the new “EX series” modules can be inserted
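As a quick cross-check of the WAN figures above, the nominal capacity of each bundle is simply the number of links times the per-link speed. The Python sketch below is illustrative only: the dictionary and function names are hypothetical, and only the link counts and speeds are taken from the slide.

```python
# Link bundles as quoted on the slide: (number of links, speed per link in Gb/s).
# The names used as keys are labels chosen for this sketch.
WAN_LINKS = {
    "LHCOPN+LHCONE shared": (6, 10),
    "LHCOPN dedicated to CERN": (4, 10),
    "General IP": (2, 10),
}


def aggregate_gbps(count, speed_gbps):
    """Nominal aggregate capacity of a bundle of identical links, in Gb/s."""
    return count * speed_gbps


def summarize(links):
    """Map each bundle name to its nominal aggregate capacity in Gb/s."""
    return {
        name: aggregate_gbps(count, speed)
        for name, (count, speed) in links.items()
    }


if __name__ == "__main__":
    for name, total in summarize(WAN_LINKS).items():
        print(f"{name}: {total} Gb/s")
```

The 60 Gb/s obtained for the shared physical link is consistent with the "potentially 60Gb/s" quoted for LHCONE. These are nominal figures, not measured throughput.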

6 Network diagram
[Diagram labels: General IP (2x10Gb/s); LHC OPN/ONE; Desk Resources (3x10Gb/s, shown twice); Cisco7600; VPC Link; Nexus 9516 (two units); 4x40Gb/s; Most Recent Computing resources; 3x40Gb/s (6x40Gb/s in total); Nexus 7018; Old “single homed” resources]
New storage disk servers will be connected at 2x100Gb/s.
Every disk server or farming switch is connected to both core switches.

7 Developments
100Gb/s connection should allow us to reduce the number of disk servers
Today this solution is still expensive:
  Using 100G Base-SR4 (multimode) → expensive MPO cabling expansion
  Using 100G Base-LR (single mode) → expensive transceivers
IPv6 deployment ongoing
  Dual stack on CEs working
  GridFTP and XRootD coming next

8 Data Management & Storage

9 Storage: state of the art
23 PB usable (29 PB raw) on disks
56 PB on tapes
Provided to:
  Tier-1 services for all 4 LHC experiments
  Tier-1 services for several non-LHC experiments
  General data and computing services for ~30 HEP and Astrophysics experiments
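The disk figures above imply a usable-to-raw ratio that is easy to work out. The following Python sketch does only that arithmetic; the function names are hypothetical, and the slide does not say how the difference between raw and usable capacity is spent.

```python
def usable_fraction(usable_pb, raw_pb):
    """Fraction of raw disk capacity that is usable."""
    if raw_pb <= 0:
        raise ValueError("raw capacity must be positive")
    return usable_pb / raw_pb


def overhead_pb(usable_pb, raw_pb):
    """Raw capacity not available as usable space, in PB."""
    return raw_pb - usable_pb


if __name__ == "__main__":
    # Figures quoted on the slide: 23 PB usable, 29 PB raw.
    print(f"usable fraction: {usable_fraction(23, 29):.1%}")
    print(f"overhead: {overhead_pb(23, 29)} PB")
```

With the quoted numbers, roughly four fifths of the raw disk capacity is usable.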

10 Latest news and issues
2016 tender finally in production
  All non-LHC experiments at pledge
2017 tender behind schedule
  → Should be ready by Nov. 2017, 100Gbit servers
Had to refurbish in-house 2 old DDN storage appliances
  Replaced old disks with bigger ones
  No maintenance, working in mirror mode
  In production for ATLAS 2 weeks ago
  In production for LHCb last week

11 IBM Licensing issues
GPFS and TSM licenses are a major concern
  Working on settling a new deal
  Licenses are very expensive
  We haven’t found a convenient and affordable alternative
For TSM we moved from an active-active to an active-standby configuration for all HSM nodes, halving the cost of support

12 Farming

13 Computing resources
Farm power: 220K HS06
2017 tender: still to be assigned
  Probably AMD EPYC
  Increased requests compared to the past:
    SSD disks
    10Gbit network
    At least 16 physical cores
  Sadly only one bidder

14 Farm external extension
In 2018 many internal resources will go out of warranty
We will not have enough money to replace them all
We will borrow some pledges from the CINECA “Marconi” cluster
  Dual 2697v4 CPUs (2x 18 cores), 128 GB RAM, 2x 1TB SAS disks, 2x 10GbE links
Adopt the BARI ReCaS approach (these nodes should be phased out too)

15 Issues and developments
LSF “Job exit threshold exceeded” issue suffered during summer
Investigating grid middleware on CentOS 7
  BDII seems ready
  WNs released in August, lack of documentation
  Singularity
  “Rumors” of other sites upgrading
  Are experiments ready for CentOS 7?

16 Projects and activities

17 Major Projects
ISO certification for a subset of racks in the computing room
  Several regulations to respect, strict activity control
  Got “informal” certification notice yesterday!
  Will be used to host experiments with sensitive data
Cloud deployment
  We are still finalizing our configuration, delay due to uncertainties in Italian laws
  Who is responsible in case a cloud machine is used improperly?

18 Major Projects (2)
Deployment of condor pilot
  Condor CE to follow
Migration from Puppet v3 to v5
  While working on v4, v5 came out, so we decided to skip directly to the latest
  Waiting for Foreman 1.16 to come out with direct support for Puppet v5
  Testbed ready next month
  Full upgrade foreseen by the next meeting

19 Questions?

