Parrot and ATLAS Connect


1 Parrot and ATLAS Connect
Rob Gardner, Dave Lesny

2 ATLAS Connect
A Condor- and PanDA-based batch service to easily connect resources
Connect to ATLAS-compliant resources such as a Tier2
Connect to opportunistic resources such as campus clusters:
  Stampede cluster at the Texas Advanced Computing Center
  Midway cluster at the University of Chicago
  Illinois Campus Cluster at UIUC/NCSA
Each runs RHEL6 or equivalent, with either SLURM or PBS as the local scheduler

3 Accessing Stampede
Use a simple Condor submit via the BLAHP protocol (ssh login to the Stampede local submit host)
(factory based on
Test for prerequisites
APF uses the same mechanism
PanDA queues, operated from MWT2:
  APF for pilot submission
  CONNECT: production queue
  ANALY_CONNECT: analysis queue
MWT2 storage for DDM endpoints
Frontier squid service

4 Challenges
Additional system libraries (“ATLAS compatibility libraries”), as packaged in HEP_Oslibs_SL6
Access to CVMFS clients and cache
Environment variables normally set up by an OSG CE and needed by the pilot:
  $OSG_APP, $OSG_GRID, $VO_ATLAS_SW_DIR
Approach: provide these components via the user job wrapper

5 Approaches
Linux image with all libraries, built using fake[ch]root
  Deploy this image locally via tarball or via a CVMFS repo
  Use the CERN VM3 image in /cvmfs/cernvm-prod.cern.ch
Use Parrot to provide access to CVMFS repositories
  Use Parrot “--mount” to map file references into the image
  /usr/lib64 -> /cvmfs/cernvm-prod.cern.ch/cvm3/usr/lib64
Install a Certificate Authority and the OSG WN client
Emulate the CE by defining environment variables
  Some defined in APF ($VO_ATLAS_SW_DIR, $OSG_SITE_NAME)
  Others defined in the “wrapper” ($OSG_APP, $OSG_GRID)
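
The mount mapping and CE emulation on this slide can be sketched in a few lines of Python. This is only an illustration, not the wrapper used by ATLAS Connect: the function names, the `app` and `osg-wn` subdirectories, and the way the ATLAS software directory is passed in are all assumptions, and the exact spelling of the Parrot mount option should be checked against `parrot_run --help` for the installed version.

```python
# Image location taken from the slide; everything else is illustrative.
IMAGE_ROOT = "/cvmfs/cernvm-prod.cern.ch/cvm3"


def build_parrot_command(payload, mounts):
    """Return an argv list that runs `payload` under parrot_run.

    `mounts` maps the path the job refers to onto the path that should
    actually be used, e.g. /usr/lib64 onto the copy inside the image.
    """
    argv = ["parrot_run"]
    for seen, real in sorted(mounts.items()):
        argv += ["--mount", f"{seen}={real}"]
    return argv + list(payload)


def emulate_ce_env(base_env, install_dir, site_name, atlas_sw_dir):
    """Copy base_env and add the variables an OSG CE would normally define."""
    env = dict(base_env)
    env["OSG_APP"] = f"{install_dir}/app"      # hypothetical layout
    env["OSG_GRID"] = f"{install_dir}/osg-wn"  # hypothetical layout
    env["OSG_SITE_NAME"] = site_name
    env["VO_ATLAS_SW_DIR"] = atlas_sw_dir
    return env
```

The resulting list and dictionary could be handed to `subprocess.run(argv, env=env)`. Building them separately keeps the mapping easy to inspect before anything is launched.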

6 Problems (1): Symlinks cannot be followed between repositories
Not possible with Parrot due to restrictions in libcvmfs
  /cvmfs/osg.mwt2.org/atlas/sw -> /cvmfs/atlas.cern.ch/repo/sw
In general, we find cross-referencing CVMFS repos unreliable
  A Python script located in atlas.cern.ch needs a lib.so
  If lib.so resides in another repo, it might get “File not found”
Solution: use a local disk for the Linux image
  Download a tarball and install it locally on disk
  Also install the local OSG worker-node client and CA in the same location
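
One way to find the symlinks that trigger this problem is to compare the repository name on each side of the link. The sketch below is a hypothetical helper, not part of Parrot or CVMFS. It assumes the usual /cvmfs/<repo>/... path layout, and the function names are made up for illustration.

```python
import os
import posixpath


def cvmfs_repo(path):
    """Return the repository name of a /cvmfs/<repo>/... path, else None."""
    parts = path.split("/")
    if len(parts) >= 3 and parts[0] == "" and parts[1] == "cvmfs" and parts[2]:
        return parts[2]
    return None


def crosses_repos(link_path, target):
    """True if a symlink at link_path pointing to target leaves its repo."""
    if not target.startswith("/"):
        # Resolve a relative target against the directory holding the link.
        target = posixpath.normpath(
            posixpath.join(posixpath.dirname(link_path), target))
    src, dst = cvmfs_repo(link_path), cvmfs_repo(target)
    return src is not None and dst is not None and src != dst


def find_cross_repo_links(root):
    """Walk root and yield (link, target) pairs that cross repositories."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = posixpath.join(dirpath, name)
            if os.path.islink(path):
                target = os.readlink(path)
                if crosses_repos(path, target):
                    yield path, target
```

Applied to the example on the slide, `crosses_repos("/cvmfs/osg.mwt2.org/atlas/sw", "/cvmfs/atlas.cern.ch/repo/sw")` reports a cross-repository link.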

7 Problems (2): Parrot stability
Parrot is very sensitive to the kernel version
When used on 2.x kernels, many ATLAS programs hang
  Parrot uses ptrace and clones the system call
  A bug in ptrace in some kernels causes a timing problem
  The traced program is awakened with SIGCONT before it should be
  The result is that the program stays in the “T” state forever
Kernels known to have issues with Parrot:
  ICC el6.x86_64
  Stampede el6.x86_64
  Midway el6.x86_64
Custom kernel at MWT2 which seems to work is “ UL3.el6”
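
A hang of this kind can be spotted from outside the job by reading the process state from /proc. The following is a minimal sketch, assuming a Linux /proc filesystem. The function names are illustrative, and a traced process can legitimately pass through a stopped state for short periods, so a real watchdog would require the state to persist across several checks.

```python
def parse_proc_state(stat_line):
    """Extract the one-letter state from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' rather than on whitespace.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def is_stopped(pid):
    """True if the process is currently in a stopped or tracing-stop state."""
    with open(f"/proc/{pid}/stat") as f:
        # 'T' is stopped; newer kernels report tracing stop as 't'.
        return parse_proc_state(f.read()) in ("T", "t")
```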

8 Towards a solution: Parrot 4.1.4rc5
To work around the hangs, the CCTools team provided a feature:
  --cvmfs-enable-thread-clone-bugfix
  Stops many (not all) hangs, with a huge performance penalty
  A simple ALRB with an asetup of a release takes 10x to 100x longer
  Needed on 2.x kernels to avoid many of the hangs
Programs which tend to run on 2.x without the “bugfix”:
  ATLAS Local Root Base setup (and diagnostics db-readReal and db-fnget)
  Reconstruction
  PanDA pilots
  Validation jobs
Programs which tend to hang:
  Sherpa (always)
  Release 16.x jobs
  Some HammerCloud tests (16.x always, 17.x sometimes)
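
Because the workaround is slow, a wrapper might enable it only where the slide says it is needed, on 2.x kernels. The sketch below shows one possible way to make that decision. The flag name comes from the slide, the function names are hypothetical, and whether a given Parrot build accepts the flag is an assumption to verify locally.

```python
import platform

# Flag name as given on the slide for Parrot 4.1.4rc5.
BUGFIX_FLAG = "--cvmfs-enable-thread-clone-bugfix"


def kernel_major(release):
    """Return the major version from a kernel release string such as '2.6...'."""
    return int(release.split(".", 1)[0])


def parrot_flags(release=None):
    """Return extra parrot_run flags for the given (or running) kernel."""
    if release is None:
        release = platform.release()
    # Pay the performance penalty only on 2.x kernels.
    return [BUGFIX_FLAG] if kernel_major(release) < 3 else []
```

Jobs known to run without the workaround, such as pilots or reconstruction, could skip the flag even on 2.x kernels. That policy is left to the caller here.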

9 Alternatives to Parrot?
The CCTools team will be working on Parrot to fix bugs
May need to use kernel 3.x on the target site for reliability
Three solutions we are pursuing:
  Parrot with Chirp (avoids libcvmfs)
  NFS mounting of a local CVMFS (requires admin)
  Use Environment Modules, common at HPC facilities
    Treat the CVMFS client as a user application
    Jobs “module load cvmfs-client”
    Prefix has privileges, so it can load the needed FUSE modules
    Cache re-use by multi-core job slots
    Might be more palatable to HPC admins

10 Conclusions
Good experience accessing opportunistic resources without WLCG or ATLAS services
A general problem for campus clusters
It would greatly help if we:
  Relied on only one CVMFS repo + stock SL6 (like CMS)
We will continue pursuing the three alternatives
Hope we can learn from others here!

