Dynamic Extension of the INFN Tier-1 on external resources

1 Dynamic Extension of the INFN Tier-1 on external resources
Vincenzo Ciaschini 25/10/2016 CCR Workshop on HTCondor

2 Tier1 is always saturated
- Computing resources at INFN-T1 are always completely used, with a large number of waiting jobs (about 50% of the running jobs)
- A huge resource request (and increase) is expected in the next years, mostly coming from the LHC experiments
[Figure: INFN Tier-1 farm usage in 2016]

3 Cloud bursting on Commercial Provider

4 Bursting on Aruba
- One of the main Italian commercial resource providers (web, hosting, mail, cloud, ...)
- Main data center in Arezzo (near Florence)
- Small-scale test: 10 VMs x 8 cores (160 GHz in total), managed by VMware
- Use of idle CPU cycles: when a customer requires a resource we are using, the CPU clock speed of "our" VMs is decreased to a few MHz (the VMs are not destroyed); a clock-monitoring sketch follows below
- Only CMS multicore jobs tested
- No storage on site: remote data access via Xrootd
- Use of GPN (no dedicated NREN infrastructure)
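Since Aruba throttles the VM clock rather than destroying the VM, the effective capacity can be observed from inside the guest. A minimal sketch (not part of the original setup) that samples the per-core clock reported in /proc/cpuinfo on a Linux VM:

```python
# Sketch: sample the current per-core clock speed on a Linux VM.
# Useful to see when the provider throttles "our" cores to a few MHz.
# Illustrative only; not part of the original Aruba setup.
import time

def core_clocks_mhz(path="/proc/cpuinfo"):
    """Return the current 'cpu MHz' value reported for each core."""
    clocks = []
    with open(path) as f:
        for line in f:
            if line.lower().startswith("cpu mhz"):
                clocks.append(float(line.split(":")[1]))
    return clocks

if __name__ == "__main__":
    while True:
        clocks = core_clocks_mhz()
        print(f"min={min(clocks):.0f} MHz  "
              f"max={max(clocks):.0f} MHz  "
              f"avg={sum(clocks)/len(clocks):.0f} MHz")
        time.sleep(60)  # sample once per minute
```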

5 Implementation
- Uses an in-house tool, dynfarm
- Authenticates connections coming from remote hosts and creates a VPN split tunnel connecting them to the Tier-1 (a split-tunnel routing sketch follows below)
  - Only the CEs, LSF and Argus are visible to the remote WNs
- Configures the remote machines to make them compatible with the rest of the farm and with the requirements to run jobs
- Only requires outbound connectivity
- Shared file system access through a R/O GPFS cache (AFM, see later for details)
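To illustrate the split-tunnel idea (only the Tier-1 service endpoints are reachable through the tunnel, everything else keeps the WN's default route), here is a minimal sketch. The tunnel device name and the subnets are hypothetical placeholders, not the actual dynfarm configuration:

```python
# Sketch of split-tunnel routing: route only the Tier-1 service subnets
# (CEs, LSF, Argus) through the VPN device, leave everything else on the
# default route. Device name and subnets are hypothetical placeholders.
import subprocess

TUN_DEV = "tun0"                     # VPN tunnel created after authentication
TIER1_SERVICE_SUBNETS = [
    "192.0.2.0/28",                  # CEs (placeholder)
    "192.0.2.16/28",                 # LSF master/schedulers (placeholder)
    "192.0.2.32/32",                 # Argus authorization service (placeholder)
]

def add_split_tunnel_routes():
    for subnet in TIER1_SERVICE_SUBNETS:
        # Equivalent to: ip route add <subnet> dev tun0
        subprocess.run(["ip", "route", "add", subnet, "dev", TUN_DEV], check=True)

if __name__ == "__main__":
    add_split_tunnel_routes()
```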

6 Usage
- Capped at 160 GHz by Aruba

7 Results
- The jobs on the remote machines are the same as those running at the Tier-1
- Efficiency depends on the job type
  - Very good for Monte Carlo, low for the rest
  - Range from 0.9 down to 0.49
- Efficiency could be increased with better network bandwidth or by caching data; tests are foreseen

8 Static expansion to remote farms

9 Remote extension to Bari-ReCaS
- 48 WNs (~26 kHS06) and ~330 TB of disk in the Bari-ReCaS data center allocated to the INFN-T1 farm for the WLCG experiments
  - Bari-ReCaS hosts a Tier-2 for CMS and Alice
  - 10% of CNAF total resources, 13% of the resources pledged to the WLCG experiments
- Goal: direct and transparent access from INFN-T1
  - Our model is the CERN Wigner extension
  - Distance: ~600 km, RTT: ~10 ms

10 BARI – INFN-T1 connectivity
- Commissioning tests: Bari-ReCaS WNs are to be considered as if on the CNAF LAN
- L3 VPN configured: 2x10 Gb/s, MTU=9000 (an MTU verification sketch follows below)
- A CNAF /22 subnet allocated to the Bari WNs
- Service network (e.g. for IPMI) accessible
- Bari WNs route through CNAF, including LHCONE, LHCOPN and GPN
- Distance: ~600 km, RTT: ~10 ms
[Figure: layout of the CNAF-Bari VPN]
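A quick way to verify that jumbo frames (MTU 9000) survive the whole CNAF-Bari path is to send non-fragmentable ICMP packets sized just below the MTU. A minimal sketch; the target host name is a placeholder:

```python
# Sketch: verify that MTU 9000 holds end-to-end on the CNAF-Bari VPN by
# sending non-fragmentable pings sized just below the MTU
# (9000 bytes - 20 IP header - 8 ICMP header = 8972 bytes of payload).
# The target hostname is a placeholder.
import subprocess

def check_jumbo_frames(host, mtu=9000, count=3):
    payload = mtu - 28  # subtract IP + ICMP headers
    # Linux iputils ping: -M do forbids fragmentation, -s sets payload size
    result = subprocess.run(
        ["ping", "-c", str(count), "-M", "do", "-s", str(payload), host],
        capture_output=True, text=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    ok = check_jumbo_frames("wn-bari.example.infn.it")  # placeholder WN
    print("jumbo frames OK" if ok else "path MTU smaller than 9000")
```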

11 Farm extension setup
- The INFN-T1 LSF master dispatches jobs also to the Bari-ReCaS WNs
  - Remote WNs are considered (and really appear) as local resources
  - CEs and other service nodes are only at INFN-T1
- Auxiliary services configured at Bari-ReCaS (a CVMFS client configuration sketch follows below)
  - CVMFS Squid servers (for software distribution)
  - Frontier Squid servers (used by ATLAS and CMS for the conditions DB)
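To make the remote WNs use the local Squid servers for software distribution, the CVMFS client is pointed at them. A minimal sketch that writes /etc/cvmfs/default.local with CVMFS_HTTP_PROXY set to the local squids; the host names and the quota value are hypothetical placeholders, not the actual Bari-ReCaS configuration:

```python
# Sketch: point the CVMFS client on a Bari-ReCaS WN to the local Squid
# servers. Host names and quota are hypothetical placeholders; the real
# Bari-ReCaS squids are not named in the slides.
CVMFS_LOCAL_CONF = "/etc/cvmfs/default.local"

SQUIDS = ["squid1.ba.example.it:3128", "squid2.ba.example.it:3128"]

def write_cvmfs_config(path=CVMFS_LOCAL_CONF):
    # '|'-separated proxies form a load-balanced group; DIRECT is kept
    # as a last-resort fallback group after the ';'.
    proxy_string = "|".join(f"http://{s}" for s in SQUIDS) + ";DIRECT"
    with open(path, "w") as f:
        f.write(f'CVMFS_HTTP_PROXY="{proxy_string}"\n')
        f.write("CVMFS_QUOTA_LIMIT=20000\n")  # local cache size in MB (example value)

if __name__ == "__main__":
    write_cvmfs_config()
```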

12 Data access
- Data at CNAF are organized in GPFS file systems
  - Local access through POSIX, GridFTP, Xrootd and HTTP
- A remote fs mount from CNAF is unfeasible (~100x the local RTT)
- Jobs expect to access data in the same way as if they were running at INFN-T1
  - Not all experiments can use a fallback protocol (a fallback sketch follows below)
  - A POSIX cache for the required data is needed at Bari-ReCaS
  - Alice uses Xrootd only (no cache needed)
- Cache implemented with AFM (a GPFS native feature)
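For experiments that do support a fallback protocol, the idea is: try the POSIX path (served locally or through the cache), otherwise read the file remotely over Xrootd from CNAF. Mount point, redirector name and file name below are hypothetical placeholders:

```python
# Sketch of the protocol fallback: prefer local POSIX access (GPFS/AFM
# cache), otherwise build a remote Xrootd URL pointing back to CNAF.
# Mount point, redirector and file name are hypothetical placeholders.
import os

GPFS_MOUNT = "/storage/gpfs_data"             # local/AFM-cached mount (placeholder)
XROOTD_REDIRECTOR = "xrootd.cnaf.example.it"  # CNAF redirector (placeholder)

def resolve(lfn):
    """Return a POSIX path if the file is locally visible, else a root:// URL."""
    local_path = os.path.join(GPFS_MOUNT, lfn.lstrip("/"))
    if os.path.exists(local_path):
        return local_path
    return f"root://{XROOTD_REDIRECTOR}//{lfn.lstrip('/')}"

if __name__ == "__main__":
    print(resolve("/cms/store/mc/example/file.root"))  # placeholder file name
```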

13 Remote data access via GPFS AFM
- AFM is a cache providing a geographic replica of a file system; it manages R/W access to the cache
- Two sides:
  - Home: where the information lives
  - Remote: implemented as a cache
    - Data written to the cache is copied back to home as quickly as possible
    - Data is copied to the cache when requested
- AFM configured R/O for Bari-ReCaS (a toy read-through cache sketch follows below)
  - ~400 TB of cache vs. ~11 PB of data
- Several tunings and reconfigurations were required!
- In any case, we decided to avoid submitting high-throughput jobs to the remote site
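The R/O behaviour used for Bari-ReCaS boils down to read-through caching: on a cache miss the file is fetched from home, on a hit it is served locally, and writes are rejected on the remote side. A toy sketch of that semantics (not the actual AFM implementation; names are illustrative):

```python
# Toy sketch of the read-only, read-through semantics used at Bari-ReCaS:
# reads are served from the cache, misses are fetched from "home" (CNAF),
# writes are rejected on the remote side. Not the actual AFM implementation.
import shutil
from pathlib import Path

class ReadOnlyCache:
    def __init__(self, home_root, cache_root):
        self.home_root = Path(home_root)    # where the data really lives (home)
        self.cache_root = Path(cache_root)  # local cache fileset (remote side)

    def read(self, relpath):
        cached = self.cache_root / relpath
        if not cached.exists():                      # cache miss: fetch from home
            cached.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(self.home_root / relpath, cached)
        return cached.read_bytes()                   # cache hit (or just filled)

    def write(self, relpath, data):
        raise PermissionError("cache is configured read-only at the remote site")
```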

14 Results: Bari-ReCaS
- Job efficiency at Bari-ReCaS is equivalent to CNAF (or even better!)
  - In general, jobs at CNAF use WNs shared among several VOs; during the summer one of these (non-WLCG) submitted misconfigured jobs, affecting the efficiency of all the other VOs
  - Atlas submits only low-I/O jobs to Bari-ReCaS
  - Alice uses only Xrootd, no cache: "intense" WAN usage also from CNAF jobs
- The network appears not to be an issue
  - We can work without a cache if data comes via Xrootd
  - A cache is mandatory for some experiments
- Production quality since June 2016: ~550 k jobs (~8% of CNAF)

Bari-ReCaS jobs:
  Experiment  NJobs    Efficiency
  Alice       105109   0.87
  Atlas       366999   0.94
  CMS         34626    0.80
  LHCb        39310    0.92

CNAF jobs:
  Experiment  NJobs    Efficiency
  Alice       536361   0.86
  Atlas                0.87
  CMS         326891   0.76
  LHCb        263376   0.88

Job efficiency = CPT/WCT (CPU time / wall-clock time); a small calculation sketch follows below
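For reference, the efficiency figure in the tables is simply the ratio of the CPU time to the wall-clock time consumed by a job. A trivial sketch with made-up example numbers:

```python
# Job efficiency as used in the tables above: CPU time over wall-clock time.
def job_efficiency(cpu_time_s, wall_clock_s):
    return cpu_time_s / wall_clock_s

# Example (made-up numbers): 10.4 h of CPU used in 12 h of wall-clock time
print(f"{job_efficiency(10.4 * 3600, 12 * 3600):.2f}")  # -> 0.87
```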

15 Conclusions
- An extension of the INFN Tier-1 center with remote resources is possible
- Two different implementations:
  - INFN remote resources (Bari-ReCaS)
  - A commercial cloud provider (Aruba)
- Even if we are planning to upgrade the data center to host enough resources to guarantee at least the local execution of LHC Run 3 (2023), testing the (elastic) extension is important
  - The infrastructure will not scale indefinitely
  - External extensions could address temporary peak requests

