DPM Italian sites and EPEL testbed in Italy
Alessandro De Salvo (INFN, Roma1), Alessandra Doria (INFN, Napoli), Elisabetta Vilucchi (INFN, Laboratori Nazionali di Frascati)
DPM Workshop, Edinburgh, 13/12/2013

Outline of the talk:
– DPM in Italy
– Setup at the DPM Tier2 sites
– Setup of the EPEL testbed
– EPEL testing activity
– Other activities related to storage access in Italy

Italy in the DPM collaboration
Storage systems at ATLAS Italian sites:
– DPM at the Frascati, Napoli and Roma Tier2s.
– StoRM at the CNAF Tier1 and at the Milano Tier2.
The three people involved in the DPM collaboration are the site managers of the three DPM Tier2s. Alessandro D.S. is the ATLAS Italian Computing Coordinator. Some smaller sites also use DPM.
[Map of Italy showing the sites: Napoli, Frascati, Roma, CNAF T1, Milano]

DPM setup at INFN-Roma1
> 1 PB of disk space
– 18 disk servers, attached to 7 SAN systems (used in direct-attach mode) via 4/8 Gbps Fibre Channel
– All servers equipped with a 10 Gbps Ethernet connection to a central switch
– WAN connection via LHCONE at 10 Gbps
Installation
– Supervised by Puppet (some machines already using Foreman + Puppet)
– Currently using a custom Puppet module executing yaim
– Will move to a fully puppetized configuration soon; the infrastructure is ready
Backup policy
– MySQL main replica on the DPM head node
– Slave replica on a different node
– Backup: daily snapshots from the slave node
– Currently evaluating moving the database to our Percona XtraDB cluster (6 DB machines online), already used for the ATLAS global Installation System, hosted in Roma
Monitoring
– Ganglia/Nagios via custom plugins (an example check is sketched below)
– Will add the standard DPM plugins soon
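As an illustration of the kind of custom monitoring plugin and slave-based backup mentioned above, here is a minimal Nagios-style check of the MySQL slave replica. The hostname, monitoring account and thresholds are placeholders, not values taken from the talk:

```python
#!/usr/bin/env python
# Minimal sketch of a Nagios-style check for the MySQL slave replica of the
# DPM database. Hostname, credentials and thresholds are placeholders.
import sys
import pymysql  # assumed to be installed on the monitoring host

SLAVE_HOST = "dpm-db-slave.example.infn.it"   # hypothetical
WARN_LAG, CRIT_LAG = 300, 1800                # seconds

def main():
    conn = pymysql.connect(host=SLAVE_HOST, user="nagios", password="***")
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            cur.execute("SHOW SLAVE STATUS")
            st = cur.fetchone()
    finally:
        conn.close()

    if not st or st["Slave_IO_Running"] != "Yes" or st["Slave_SQL_Running"] != "Yes":
        print("CRITICAL: replication threads not running")
        return 2
    lag = st["Seconds_Behind_Master"]
    if lag is None or lag >= CRIT_LAG:
        print("CRITICAL: slave lag %s s" % lag)
        return 2
    if lag >= WARN_LAG:
        print("WARNING: slave lag %d s" % lag)
        return 1
    print("OK: slave lag %d s" % lag)
    return 0

if __name__ == "__main__":
    sys.exit(main())
```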

DPM setup at INFN-Napoli
1.2 PB of disk space
– 22 disk servers, attached to 9 SAN systems (used in direct-attach mode) via 4/8 Gbps Fibre Channel.
– All servers connected to a central switch with 10 Gbps Ethernet.
– WAN connection via LHCONE at 10 Gbps.
Setup
– DPM release on SL5, yaim configuration.
– The MySQL DB is on the same server as the DPM head node.
Backup policy
– Daily MySQL DB backups with Percona XtraBackup, kept for 7 days.
– A full backup of the whole head node (DB included) is done with rsync twice a day onto a secondary disk of another server (sketched below). In case of hardware failure, the other server can boot from this disk, starting an exact copy of the DPM head node not older than 12 h.
Monitoring
– Ganglia/Nagios via custom plugins
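A minimal sketch of the twice-daily rsync job described above, with destination, SSH setup and exclusion list as placeholders (the real ones are not given in the talk):

```python
#!/usr/bin/env python
# Sketch of a twice-daily job that mirrors the DPM head node onto a spare
# disk of another server, so that the spare can boot an identical copy.
# Destination, SSH setup and exclusion list are illustrative placeholders.
import subprocess
import sys

DEST = "root@dpm-spare.example.infn.it:/mnt/headnode-clone/"  # hypothetical
EXCLUDES = ["/proc/*", "/sys/*", "/dev/*", "/run/*", "/tmp/*",
            "/mnt/*", "/media/*", "/lost+found"]

def mirror_headnode():
    cmd = ["rsync", "-aAXH", "--delete", "--numeric-ids"]
    for pattern in EXCLUDES:
        cmd += ["--exclude", pattern]
    cmd += ["/", DEST]
    # Non-zero on failure, so cron/Nagios can notice a broken backup.
    return subprocess.call(cmd)

if __name__ == "__main__":
    sys.exit(mirror_headnode())
```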

DPM setup at INFN-Frascati (production site)
700 TB of disk space
– 7 disk servers, attached to 7 SAN systems (used in direct-attach mode) via 4/8 Gbps Fibre Channel
– All servers equipped with a 10 Gbps Ethernet connection to a central switch
– WAN connection via LHCONE at 10 Gbps soon (by the end of Feb ‘14)
Setup:
– DPM release; upgrade planned by the end of the year
– DPM DB on a separate MySQL server
Backup
– DPM DB replication with a MySQL master/slave configuration;
– If the MySQL server crashes, it is enough to change the DPM_HOST variable on the DPM head node to the MySQL slave hostname and rerun the yaim configuration (see the sketch below);
– Daily backup of the slave DB with mysqldump;
Monitoring
– Ganglia/Nagios custom plugins
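The failover step above can be scripted. A rough sketch follows, assuming a hypothetical site-info.def path, slave hostname and yaim node type; the exact variable to rewrite and the node type depend on the release and on the site configuration, so treat them all as placeholders:

```python
#!/usr/bin/env python
# Rough sketch of the failover step: point the head node configuration at the
# MySQL slave and re-run yaim. Paths, hostnames, the variable to rewrite and
# the yaim node type are assumptions, not taken from the site's real config.
import re
import subprocess

SITE_INFO = "/root/siteinfo/site-info.def"   # placeholder path
SLAVE_HOST = "dpm-db-slave.lnf.infn.it"      # placeholder hostname
VARIABLE = "DPM_HOST"                        # variable named in the talk; adjust to your site-info.def
NODE_TYPE = "emi_dpm_mysql"                  # assumption: depends on the DPM release

def point_to_slave():
    """Rewrite the chosen variable in site-info.def to the slave hostname."""
    with open(SITE_INFO) as f:
        text = f.read()
    text = re.sub(r'(?m)^(%s=).*$' % VARIABLE, r'\g<1>"%s"' % SLAVE_HOST, text)
    with open(SITE_INFO, "w") as f:
        f.write(text)

def rerun_yaim():
    # yaim's -c (configure), -s (site-info file) and -n (node type) options.
    subprocess.check_call(
        ["/opt/glite/yaim/bin/yaim", "-c", "-s", SITE_INFO, "-n", NODE_TYPE])

if __name__ == "__main__":
    point_to_slave()
    rerun_yaim()
```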

EPEL testbed setup at INFN-Frascati
Setup
– DPM head node: atlasdisk1.lnf.infn.it
– The head node is installed on a VM (2 cores, 4 GB RAM); the MySQL server is on the physical server (8 cores, 16 GB RAM).
– Only one disk server, with 500 GB of disk space.
– 1 Gbps links for LAN and WAN connections.
Installation
– SL6.4, DPM
– yaim configuration, to be moved to Puppet as soon as possible
– XRootD and WebDAV/https enabled (a quick smoke test is sketched below)
Added to Wahid's HammerCloud test page.
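A quick functional smoke test of the two doors enabled on the testbed could look like the sketch below: copy a small file in via XRootD and read it back via WebDAV/https. The head node name is the one quoted in the talk; the namespace path, VO and ports follow common DPM defaults and are assumptions, and a valid VOMS proxy is required:

```python
#!/usr/bin/env python
# Smoke test of the XRootD and WebDAV/https doors: write via xrdcp, read back
# via davix-get. Namespace path and VO are assumptions; a grid proxy is needed.
import os
import subprocess
import sys

HEAD = "atlasdisk1.lnf.infn.it"                        # head node named in the talk
REMOTE = "/dpm/lnf.infn.it/home/dteam/smoke-test.txt"  # hypothetical path

def run(cmd):
    print("+ " + " ".join(cmd))
    return subprocess.call(cmd) == 0

def main():
    proxy = "/tmp/x509up_u%d" % os.getuid()   # usual grid proxy location
    ok = True
    # Write through the XRootD door (default port 1094).
    ok &= run(["xrdcp", "-f", "/etc/hostname", "root://%s/%s" % (HEAD, REMOTE)])
    # Read back through the WebDAV/https door.
    ok &= run(["davix-get", "--capath", "/etc/grid-security/certificates",
               "-E", proxy, "https://%s%s" % (HEAD, REMOTE), "/tmp/smoke-test.txt"])
    return 0 if ok else 1

if __name__ == "__main__":
    sys.exit(main())
```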

EPEL testing activity
Installed for the first time at the end of summer 2013, too late for the first EPEL release validation!
Re-installed in November ’13 from the EPEL + EMI3 SL6 repos:
– EPEL-testing repo enabled, no autoupdate.
– Installation guide followed step by step to check its consistency.
– Very easy, no problem found: only a reference to the XRootD setup would be a useful addition.
Ready to test upgrades or reinstallation of new releases from scratch, according to the collaboration's needs.
The hardware is quite old and the Ethernet connection is 1 Gbps:
– the testbed is tailored for functionality tests;
– hardware improvements for performance tests could be planned if needed.
We are planning to test a High Availability setup.

DPM in HA
The DPM head node is virtualized
– Need to evaluate the performance impact of this
Two different virtualization approaches are possible
– Standard virtualization (KVM) over a shared filesystem (GlusterFS or Ceph)
– OpenStack-based virtualization
Both solutions are possible; there is more experience in Italy with the first option
– Will try to use either GlusterFS with GFAPI or Ceph for performance reasons; this needs very recent versions of QEMU and libvirt
– We also have an OpenStack Havana testbed in Roma and will experiment with that (it is also GlusterFS GFAPI enabled)
Database in HA mode via a Percona XtraDB cluster
– Using (multiple instances of) HAProxy as load balancer; a sketch of a typical health check follows below
Head node HA
– Pacemaker/Corosync with standard virtualization
– Still to be discussed for the Cloud option
Pool nodes
– May want to experiment with GlusterFS or Ceph for the testbed
Configuration operated via Foreman/Puppet
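A common way to let HAProxy health-check Percona XtraDB Cluster nodes is a small HTTP probe that reports whether the local node is Synced. The talk does not describe the actual check used; the sketch below is illustrative, with credentials and port as placeholders:

```python
#!/usr/bin/env python
# Minimal HTTP health probe for a Percona XtraDB Cluster node, of the kind
# HAProxy's "option httpchk" can poll. Credentials and port are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer
import pymysql

WSREP_SYNCED = "4"  # wsrep_local_state value meaning "Synced"

def node_is_synced():
    conn = pymysql.connect(host="127.0.0.1", user="clustercheck", password="***")
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW STATUS LIKE 'wsrep_local_state'")
            row = cur.fetchone()
    finally:
        conn.close()
    return row is not None and row[1] == WSREP_SYNCED

class Check(BaseHTTPRequestHandler):
    def do_GET(self):
        code, body = (200, b"synced\n") if node_is_synced() else (503, b"not synced\n")
        self.send_response(code)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9200), Check).serve_forever()
```

HAProxy would then mark a node down whenever the probe returns 503, so traffic only reaches synced cluster members.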

ATLAS data access at DPM sites
Different access modes can be set for the queues in PanDA, the ATLAS workload management system for production and distributed analysis. For DPM sites the usual configuration is:
– Analysis jobs access local input files directly via XRootD;
– MC production at DPM sites copies input files to the WNs.
The three Italian DPM Tier2s and the Tier1 are part of FAX
– Federated ATLAS storage systems using XRootD.
– FAX uses redirection technology to access data in the federated systems.
The use of FAX can be enabled in PanDA with an ad-hoc configuration of the PanDA queues. FAX redirection allows failover to storage systems at different sites in case a required file is not found locally (see the sketch below). FAX redirection in PanDA is not yet enabled for the Italian sites. Some tests with FAX will be shown.
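As an illustration of the failover idea only (not ATLAS's actual pilot code), a PyROOT sketch that tries the local SE first and falls back to a federation redirector; hostnames and paths are placeholders:

```python
#!/usr/bin/env python
# Illustration of XRootD failover: try the file on the local DPM first and,
# if that fails, retry through a FAX redirector. Hostnames, the redirector
# and the file path are placeholders, not the real ATLAS configuration.
import ROOT

LOCAL_SE = "root://dpm-head.example.infn.it//dpm/example.infn.it/home/atlas/data.root"
FAX_URL  = "root://fax-redirector.example.org//atlas/some/logical/path/data.root"

def open_with_failover(local_url, fax_url):
    f = ROOT.TFile.Open(local_url)
    if f and not f.IsZombie():
        print("opened locally: " + local_url)
        return f
    print("local open failed, falling back to the federation")
    f = ROOT.TFile.Open(fax_url)
    if f and not f.IsZombie():
        return f
    raise IOError("file not reachable locally nor via FAX")

if __name__ == "__main__":
    open_with_failover(LOCAL_SE, FAX_URL)
```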

PROOF on Demand (PoD) and tests with FAX in the ATLAS IT cloud
“PROOF-based analysis on the ATLAS Grid facilities: first experience with the PoD/PanDA plugin”, E. Vilucchi et al.
We presented this work at CHEP2013 on the use of PoD, a tool that lets users allocate a PROOF cluster on Grid resources to run their analysis.
– We tested the new PanDA plugin for PoD, using real analysis examples, in the ATLAS Italian cloud and at CERN.
– We also ran performance tests over the DPM, StoRM/GPFS and EOS storage systems with different access protocols:
  XRootD both on LAN and WAN;
  file protocol over StoRM/GPFS;
  XRootD on StoRM/GPFS and EOS over WAN;
  the FAX infrastructure was tested for XRootD access;
  a few tests of ROOT access over HTTP for DPM.

Results of local data access tests
The calibration tool performs only data access (no computation load):
– It is a PROOF job reading the whole content of each entry of the TTrees in the input files.
– It is used to check the site efficiency before running analyses with PoD;
– and to obtain an upper limit on the performance, in terms of MB/s, as a function of the number of workers used.
Sample datasets from real analyses used as input with PoD:
– DS1 (100 files), DS2 (300 files): standard D3PD datasets (~100 KB per event).
StoRM/GPFS and EOS have comparable performance. EOS and GPFS file access perform better, but the difference with DPM XRootD is only about 25%.
[Plots: max MB/s as a function of the number of workers, for the calibration with DS1 at CNAF, CERN and Roma1 and with DS2 at Roma1 and CNAF]
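The calibration measurement itself amounts to reading every entry of the TTrees and timing the bytes transferred. A single-worker (non-PROOF) sketch of that idea is shown below; the file URL and tree name are placeholders, while the real tool runs as a PROOF job:

```python
#!/usr/bin/env python
# Single-worker illustration of the calibration idea: read the full content
# of every entry of a TTree and report the achieved MB/s. The actual tool is
# a PROOF job; file URL and tree name here are placeholders.
import time
import ROOT

FILE_URL  = "root://dpm-head.example.infn.it//dpm/example.infn.it/home/atlas/d3pd.root"
TREE_NAME = "physics"   # hypothetical D3PD tree name

def read_throughput(url, tree_name):
    f = ROOT.TFile.Open(url)
    if not f or f.IsZombie():
        raise IOError("cannot open " + url)
    tree = f.Get(tree_name)
    tree.SetBranchStatus("*", 1)        # read every branch, as the tool does
    start, nbytes = time.time(), 0
    for i in range(int(tree.GetEntries())):
        nbytes += tree.GetEntry(i)      # GetEntry returns the bytes read
    elapsed = time.time() - start
    return nbytes / 1e6 / elapsed       # MB/s

if __name__ == "__main__":
    print("%.1f MB/s" % read_throughput(FILE_URL, TREE_NAME))
```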

Analysis tests
Analysis tests were run with three real analyses, on different input datasets (D3PD, user-defined ntuples) and with different event-processing loads.
I/O performance scalability at CERN, CNAF and Roma1:
– The performance scales linearly in a cluster of up to 100 WNs; no bottleneck due to data access was found.
– Compared with the calibration results, in this analysis the weight of the additional event processing is about 16%.
– Some preliminary tests with the http protocol gave slightly lower performance, but comparable with XRootD.
[Plots: max MB/s versus number of workers running analysis 1 with DS1 at CERN, CNAF and Roma1]

Remote XRootD access over WAN and the FAX infrastructure
We ran analyses with the PROOF cluster and the input datasets located at different sites, using:
– direct XRootD access (without FAX);
– FAX redirection.
FAX works smoothly. When an input container distributed across different sites is accessed through a central FAX redirector, the request is forwarded to the appropriate storage elements, selecting the local storage when possible; the mechanism is transparent for the user. Performance depends on the network connections.
[Plots: max MB/s as a function of the number of WNs for analysis 1 with DS1 over WAN: analysis 1 at CNAF with DS1 at Napoli; analysis 1 at CNAF with DS1 at Roma1; analysis 1 at Roma1 with DS1 at CNAF]
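To see where a redirector actually sends a request, one can ask it to locate a file. A small sketch using the standard xrdfs client via subprocess is given below; the redirector hostname and the logical path are placeholders:

```python
#!/usr/bin/env python
# Ask an XRootD redirector which storage endpoints can serve a given file,
# which is the mechanism the federation relies on. The redirector hostname
# and the logical file path below are placeholders.
import subprocess

REDIRECTOR = "fax-redirector.example.org"       # hypothetical
LFN = "/atlas/some/logical/path/data.root"      # hypothetical

def locate(redirector, path):
    out = subprocess.check_output(
        ["xrdfs", redirector, "locate", path], universal_newlines=True)
    # Each output line names an endpoint that can serve (or redirect to) the file.
    return [line.split()[0] for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    for endpoint in locate(REDIRECTOR, LFN):
        print(endpoint)
```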

WebDAV/http federation
An http(s) redirector has been installed at CNAF, deployed with StoRM on a test endpoint
– http(s) support in StoRM now includes WebDAV.
With DPM, http(s) access has already been tested and works without problems.
The LFC interface has been enabled for the http redirector
– Which LFC is addressed, CERN or CNAF?
When ATLAS switches from LFC to Rucio, a plugin interface for Rucio will be needed.
Shortly the redirector will use the production endpoint at CNAF.
– DPM sites will then be ready to join the http(s) federation.
No monitoring yet.
To be tested with ROOT
– ROOT support for https has been limited so far; TDavixFile is needed.
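For the ROOT test mentioned above, reading a file over https relies on ROOT's Davix plugin (TDavixFile). A minimal sketch follows, with a placeholder URL and assuming a ROOT build with Davix support and valid grid credentials:

```python
#!/usr/bin/env python
# Minimal check that ROOT can read a file over https through its Davix
# plugin (TDavixFile). The URL is a placeholder; a ROOT build with Davix
# support and a valid grid proxy/credentials are assumed.
import ROOT

URL = "https://dpm-head.example.infn.it/dpm/example.infn.it/home/atlas/d3pd.root"

f = ROOT.TFile.Open(URL)          # dispatches to the Davix plugin for https URLs
if not f or f.IsZombie():
    raise IOError("https open failed: " + URL)
print("opened %s, size %d bytes, class %s" % (URL, f.GetSize(), f.ClassName()))
f.Close()
```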

Conclusions
We are happy with our DPM systems and with the latest DPM developments.
A lot of interesting work can be done in testing new solutions with DPM.
Thanks to the DPM collaboration!