Testing CernVM-FS scalability at RAL Tier1
Ian Collier, RAL Tier1 Fabric Team
WLCG GDB - September 2010

Contents
Experiment software server issues at RAL
Required characteristics for software area
CernVM-FS - a possible solution?
Current tests at RAL
Outlook

Experiment Software Server issues
RAL software servers are similar to the setup described at PIC, with NFS servers, nodes for running install jobs, etc.
We see the same issues and problems, particularly load-related performance degradation on the Atlas software server.
We are not in a position to buy a NetApp or BlueArc – but even those appear not to be immune.
Upgrading the NFS server helped a bit – but depending on the job mix, Atlas jobs can still bring the server to its knees – and knock WNs offline.

Some Experiment Software Server characteristics
Do not require write access.
Many duplicated files, even within releases, never mind between releases (a way to check this is sketched below).
In Atlas' case, many repeated accesses of the same file during jobs.
Currently maintained with local install jobs at sites.
A caching read-only filesystem would be ideal.
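The duplication claim can be checked directly on an existing NFS software area. Below is a minimal Python sketch, assuming you point it at a real software-area path (the /opt/atlas/software default is purely illustrative): it walks the tree, hashes file contents, and reports how many of the bytes are exact repeats of files already seen.

#!/usr/bin/env python3
# Rough estimate of file-level duplication in a software area.
# The default path is illustrative only; pass a real software area as argv[1].
import hashlib
import os
import sys

def sha1_of(path, chunk=1 << 20):
    """SHA-1 of a file's content, read in 1 MB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def duplication_report(root):
    seen = set()           # content hashes already encountered
    total = duplicate = 0  # bytes scanned / bytes whose content is a repeat
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                size = os.path.getsize(path)
                digest = sha1_of(path)
            except OSError:
                continue   # skip unreadable files
            total += size
            if digest in seen:
                duplicate += size   # identical content stored more than once
            else:
                seen.add(digest)
    print("scanned %.1f GB, of which %.1f GB is duplicated content"
          % (total / 1e9, duplicate / 1e9))

if __name__ == "__main__":
    duplication_report(sys.argv[1] if len(sys.argv) > 1 else "/opt/atlas/software")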

CernVM-FS … perhaps?
HTTP and FUSE based (the first implementation was based on GROW-FS).
Developed to deliver experiment software uniformly to CernVM appliances.
Not inherently anything to do with virtualisation.
May deliver even greater benefits to physical WNs.
With local squid proxies it should scale easily – just add squids (although our initial tests suggest one proxy will be fine).
File-based deduplication (a side effect of integrity checks) – no file gets transferred twice to a given cache, and software areas contain many, many identical files (illustrated below).
Caches at CERN, at local squids, and on the client WN.
Repeated accesses of the software area during jobs (e.g. Atlas conditions files) all become local after the first access.
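A toy model of how that deduplication falls out of the integrity check, under the assumption (true for cvmfs) that files are addressed by a hash of their content: the lookup key, the integrity check and the cache file name are all the same digest, so identical files share one cached object and anything already in the local cache is never fetched again. The catalogue entries, payload, fetch function and cache directory below are invented for illustration; this is not the real client code.

# Toy content-addressed cache, in the spirit of the cvmfs client.
# The catalogue, payload, fetch function and cache directory are invented
# for illustration; they are not the real client internals.
import hashlib
import os

CACHE_DIR = "/tmp/toy-cvmfs-cache"       # stands in for the worker-node cache

PAYLOAD = b"#!/bin/sh\necho setup\n"     # placeholder file content
DIGEST = hashlib.sha1(PAYLOAD).hexdigest()

# A catalogue maps each path to the hash of the file's content;
# an identical file in a later release gets the same hash.
CATALOGUE = {
    "/atlas/16.0.2/setup.sh": DIGEST,
    "/atlas/16.0.3/setup.sh": DIGEST,
}

def fetch_via_proxy(digest):
    """Stand-in for an HTTP GET of the object through the site squid."""
    return PAYLOAD

def open_file(path):
    digest = CATALOGUE[path]
    cached = os.path.join(CACHE_DIR, digest)
    if not os.path.exists(cached):                     # transferred at most once
        data = fetch_via_proxy(digest)
        if hashlib.sha1(data).hexdigest() != digest:   # integrity check
            raise IOError("corrupt download for %s" % path)
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(cached, "wb") as f:
            f.write(data)
    return open(cached, "rb")                          # both paths share this object

if __name__ == "__main__":
    open_file("/atlas/16.0.2/setup.sh").close()        # first access: one download
    open_file("/atlas/16.0.3/setup.sh").close()        # identical file: served from cache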

CernVM-FS testing at RAL
Currently testing scalability at RAL.
So far just 800 or so jobs.
RAL is most interested in resolving the file server issues – if we can fix that, then more jobs will succeed.
Of course, if it means jobs run faster too, that would be nice.

CernVM-FS tests
Now testing at RAL: so far just 800 jobs or so, and the squid proxy barely misses a beat.
[Network traffic graph for the squid proxy shown on the slide.]

CernVM-FS tests
By comparison, the production Atlas software server in the same week (not an especially busy one):
[Load graph for the production Atlas software server shown on the slide.]

CernVM-FS at RAL - next steps
Scale tests to thousands of jobs – planning now.
Compare performance with the NFS server as a check.
Install the latest version – it supports multiple VOs.
Work out and document use with grid jobs to allow wider tests (a minimal pre-flight check is sketched below).
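As an example of the sort of thing that documentation would cover, here is a minimal sketch of a pre-flight check a grid job wrapper might run before relying on cvmfs; the repository name and the /cvmfs mount point are assumptions for illustration, not a prescribed interface.

# Minimal pre-flight check a job wrapper might run before using cvmfs.
# The repository path below is an assumption for illustration.
import os
import sys

REPOSITORY = "/cvmfs/atlas.cern.ch"   # illustrative mount point

def cvmfs_available(repo):
    """True if the repository can be listed; on nodes where cvmfs is mounted
    via autofs, listing also triggers the mount and first catalogue download."""
    try:
        return len(os.listdir(repo)) > 0
    except OSError:
        return False

if __name__ == "__main__":
    if not cvmfs_available(REPOSITORY):
        sys.exit("cvmfs repository %s not available on this node" % REPOSITORY)
    print("cvmfs OK: %s" % REPOSITORY)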

CernVM-FS - outlook
Security audit of the software still to be completed.
Questions about production support still to be answered.
Would eliminate local install jobs.
Potential for standard software distribution to all sites (with any per-site configuration areas kept on the server).

So far it looks very promising. Some more detail:
93/cvmfs-tr/cvmfstech.preview.pdf
essionId=3&resId=0&materialId=slides&confId=89681