CVMFS: Software Access Anywhere
Dan Bradley
Any Data, Anytime, Anywhere Project

Outline
Benefits of CVMFS to campus grids
Installing the FUSE client
Using the Parrot client (non-root)
A glideinWMS plugin
Existing repositories
Hosting your own repository
Some best practices

What is CVMFS?
Network filesystem
Read-only POSIX interface
– FUSE mounted
Fetches files via HTTP
– Verifies data integrity
Aggressive caching
– Local disk
– Web proxies
[Diagram: a CVMFS FUSE client with a local cache mounts /cvmfs/MyRepo and fetches files through a SQUID proxy from web servers httpd0, httpd1, httpd2]

Benefits of CVMFS to Campus Grids
Well suited for software distribution:
– Easily scalable
  Local disk cache for repeated access
  Add more web proxies as needed
– Highly available
  Robust error handling (failover, offline mode)
  Add more repository mirrors as needed
– Secure access over untrusted networks
  Strong security mechanisms for data integrity
– Works across administrative domains
  Including unprivileged environments (Parrot)

Truth in Advertising
Young project, active development, small team
Set expectations accordingly!
– e.g. the server component is rarely used outside CERN, so it has more rough edges than the client, which is used by many LHC sites

Getting the FUSE Client
1. Install the RPM
2. Tell it which HTTP proxies to use
3. Allocate cache space
4. Enable the desired repositories
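Steps 2 through 4 amount to editing the client configuration; a minimal sketch of /etc/cvmfs/default.local, where the repository list, proxy address, and quota are placeholders to adapt to your site:

CVMFS_REPOSITORIES=grid.cern.ch,cms.cern.ch        # repositories to enable
CVMFS_HTTP_PROXY="http://squid.example.edu:3128"   # placeholder site squid
CVMFS_QUOTA_LIMIT=20000                            # local cache quota, in MB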

Installing the FUSE Client
RPMs are available from CERN and OSG
– CERN:
– OSG:
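Once one of those RPM sources is configured, the install itself (step 1) is short; a sketch assuming a Red Hat-style node and the standard package name:

yum install -y cvmfs     # client package (exact name may depend on the RPM source)
cvmfs_config setup       # set up mount integration
cvmfs_config probe       # verify the enabled repositories mount and respond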

What if I am not root?
Parrot Virtual Filesystem
– No root privileges required
– Works as a job wrapper: parrot_run /cvmfs/repo/MyProgram …
See

Example Parrot Setup
$ wget <URL of cctools x86_64-redhat5 tarball>
$ tar xzf cctools-x86_64-redhat5.tar.gz
$ export PATH=`pwd`/cctools-x86_64-redhat5/bin:$PATH
$ export HTTP_PROXY=frontier01.hep.wisc.edu:3128
$ parrot_run bash
bash-3.2$ ls /cvmfs/grid.cern.ch
default  etc  glite

Parrot Performance Cost
Experience in CMS: for typical CMS jobs, running under Parrot is not much slower
Your mileage may vary
– Assume a 5% performance hit until proven otherwise

Parrot Cache
The CVMFS local cache is in the Parrot tmp area
– Default: /tmp/parrot.
– Only one instance of Parrot can use it at a time!
– Override with parrot_run -t
  e.g. a batch job could put it in a per-job tmp dir
Comparison to FUSE CVMFS
– Local cache is not shared between batch slots
  So it uses more bandwidth and disk space
– If the cache is deleted after the job runs, successive jobs in the same slot must start from scratch
  Could be a problem for short jobs (e.g. O(1) minute jobs)
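For example, a job wrapper on an HTCondor execute node might place the Parrot tmp/cache area in the job's scratch directory (a sketch; the proxy address is a placeholder, and HTCondor sets _CONDOR_SCRATCH_DIR to the per-job directory):

#!/bin/sh
# Use a per-job Parrot tmp area so concurrent jobs do not collide on /tmp/parrot.
export HTTP_PROXY=squid.example.edu:3128   # placeholder site proxy
parrot_run -t "$_CONDOR_SCRATCH_DIR/parrot" ./my_program "$@"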

Accessing Multiple Repositories
Not efficient in the current implementation
– Considered an experimental feature
– Disallowed by default
– But should be OK for occasional switching from one repository to another, say < 0.1 Hz
To enable multi-repository access in a single Parrot session:
export PARROT_ALLOW_SWITCHING_CVMFS_REPOSITORIES=1

Accessing Other Repositories
By default, Parrot knows about the CERN repositories
Parrot can be configured to access other repositories:
export PARROT_CVMFS_REPO='cms.hep.wisc.edu:pubkey=/path/to/cms.hep.wisc.edu.pub,url=<repository URL>'
(Or use the equivalent parrot_run -r option.)
See the Parrot user's manual for more cvmfs options
– e.g. local cache quota

Use case: FUSE CVMFS at home, glidein + Parrot abroad
Idea:
– Jobs can expect uniform CVMFS access wherever they land
– No need to modify job code for different environments
  Campus machines we administer
  OSG machines we don't administer

A glideinWMS Job Wrapper
If the job says it requires CVMFS:
– Wraps the job in Parrot
– Uses the site squid, if possible
– Otherwise, needs a friendly squid at home
  May limit scalability
  Access control?
See

glideinWMS CVMFS Local Cache
Two cases:
– Using glexec
  Each job has its own disk cache
  Deleted when the job exits
– Not using glexec
  The cache is saved for the lifespan of the glidein
  May improve efficiency for very short jobs
Do we need glexec?
– The wrapper uses Parrot's identity boxing feature
  Provides privilege separation between the job and the glidein
  But cannot be 100% trusted yet because the wrapper runs in a user-controlled environment (work in progress)

glideinWMS parrot_cfg
# Configure parrot cvmfs options.
# Here we just set the local cache quota.
# Only the default (CERN) repositories are enabled here.
PARROT_CVMFS_REPO=" :quota_limit=4000,quota_threshold=2000"
# Central proxies to use for CVMFS if the local site proxy cannot be used
CVMFS_PROXIES="
# CVMFS repository to use to test the site web proxy
CVMFS_TEST_REPO="
# Path to test to validate cvmfs access
CVMFS_TEST_PATH=/cvmfs/cms.cern.ch
# If true and parrot can't access CVMFS_TEST_PATH, abort glidein startup.
GlideinRequiresParrotCVMFS=false
# If true, all jobs are wrapped with parrot, regardless of the job's RequireCVMFS attribute.
GlideinAlwaysUseParrotWrapper=false

Example glideinWMS Job
# Tell the glidein to wrap the job in parrot
# (only relevant if the glidein config makes this feature optional)
+RequiresCVMFS = True
Executable = my_program
Output = stdout
Error = stderr
Queue

Existing Repositories
CERN repositories
– grid.cern.ch, cms.cern.ch, atlas.cern.ch, etc.
OASIS
– OSG project under development
– VOs may publish files in a repository hosted by OSG
– Alternative to maintaining files in OSG_APP at all target sites
Wisconsin OSG_APP (GLOW OSG site)
– VOs write to it like any other OSG_APP

CVMFS Server
Only needed if you wish to create your own repository
Lightweight service
– Kernel module to detect updates
– Program to prepare published files
– httpd to serve files
– Most I/O done by proxies
May also want a mirror server
– httpd + periodic sync of repository files

Managing the Repository
Simple case: one software maintainer (the cvmfs user)
– Updates the software tree
– Triggers the cvmfs publication step
– New files show up on clients an hour later (or less)
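A rough sketch of that cycle, with hypothetical paths: on a server of the style shown in the diagram two slides below, the maintainer writes into the repository's writable shadow tree and then publishes.

# As the repository maintainer (cvmfs user):
cp -r /home/maintainer/myapp-1.2 /cvmfs/myrepo.example.edu/software/   # hypothetical shadow-tree path
cvmfs_server publish   # prepare and sign the new catalog; clients pick it up within an hour or less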

Managing the Repository
More complicated scenario: implementing OSG_APP with CVMFS
– There are many software maintainers
– We don't want them to have to trigger publication
Tried periodically running publication
– Caused long delays and/or write errors for software maintainers operating at the time of publication
Instead, using a periodic rsync from the user-maintained tree into the cvmfs software tree
– Then publish to cvmfs
– Software maintainers are never blocked
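A periodic sync job for that setup might look roughly like this (a sketch; the source tree and shadow-tree paths are hypothetical, and the publish command is the one shown in the diagram on the next slide):

#!/bin/sh
# Run periodically from cron: mirror the user-writable OSG_APP tree into the
# CVMFS shadow tree, then publish, so maintainers are never blocked.
rsync -a --delete /osg_app/writable/ /cvmfs/cms.hep.wisc.edu/   # hypothetical paths
cvmfs_server publish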

Wisconsin OSG_APP Repository
[Diagram: maintainers write to a writable tree nfs-mounted on the CE (osggrid01.hep.wisc.edu, /cvmfs/cms.hep.wisc.edu and /cvmfs/pub/cms.hep.wisc.edu); rsync copies it into the CVMFS "shadow tree" on cvmfs01.hep.wisc.edu, where cvmfs_server publish produces web files served by httpd; cvmfs_pull mirrors them to cvmfs03.hep.wisc.edu; worker nodes (FUSE client with local cache) and OSG glideins (Parrot with local cache) access /cvmfs/cms.hep.wisc.edu through the SQUID cache01]

Some CVMFS Best Practices
The following examples are for HTCondor
– The ideas are more general

Integrating with HTCondor: health check
Problem: a job runs and fails on a machine with broken CVMFS
– e.g. the cache is on a broken/full filesystem
How to avoid such black holes:
– A startd cron job tests for working cvmfs
– It publishes MyRepo_CVMFS_Exists = True
  Actual expression: ifThenElse(isUndefined(LastHeardFrom), CurrentTime, LastHeardFrom) < 3600
  True until the test expires in 1 hour
– The job requires TARGET.MyRepo_CVMFS_Exists == True

check_cvmfs startd cron job
See
– Basic functional test
– Monitors cache space
  Important if the cache does not have its own dedicated partition
– Advertises the current CVMFS catalog version
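Wiring such a script into the startd takes only a few lines of HTCondor configuration; a sketch, with a hypothetical script path and cron-job name:

# Run the CVMFS health-check script periodically and merge its output
# (ClassAd attribute assignments) into the machine ad.
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) CVMFS
STARTD_CRON_CVMFS_EXECUTABLE = /usr/local/libexec/check_cvmfs.sh
STARTD_CRON_CVMFS_PERIOD = 10m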

Integration with HTCondor: stale FS
Problem: a job runs and fails on a machine that does not yet see the latest cvmfs contents
How to avoid this race condition:
– The startd cron job publishes the catalog version: MyRepo_CVMFS_Revision = 4162
– The job should require execute-node revision >= submit-node revision
  For OSG jobs, we do this in condor.pm
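The cron script itself just prints ClassAd assignments to stdout; a minimal sketch, using the placeholder repository "myrepo" and the attr command referenced in the example job below:

#!/bin/sh
# Publish CVMFS health and catalog revision into the machine ClassAd.
if ls /cvmfs/myrepo >/dev/null 2>&1; then
    echo "MyRepo_CVMFS_Exists = True"
    echo "MyRepo_CVMFS_Revision = $(attr -q -g revision /cvmfs/myrepo)"
else
    echo "MyRepo_CVMFS_Exists = False"
fi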

Example Job
# Set the following to the output of the command:
#   attr -q -g revision /cvmfs/myrepo
+SubmitNodeCVMFSRevision = 1234
Requirements = TARGET.MyRepo_CVMFS_Exists && TARGET.MyRepo_CVMFS_Revision >= SubmitNodeCVMFSRevision

Links
CVMFS website:
Parrot website:
Parrot CVMFS job wrapper for glideinWMS:
CVMFS OSG_APP implementation:
HTCondor cvmfs startd cron script: