
1 Working with Grid Sites in ATLAS Alessandro De Salvo
Working with Grid Sites in ATLAS
Alessandro De Salvo
Outline:
- Grid concepts
- Working with Grid certificates
- The ATLAS VO
- Getting info on the datasets
- Managing files
- Setting up the ATLAS software from CVMFS
- Using FAX
- Links and contacts
A. De Salvo – 27 Oct 2017

2 SECTION 1 Grid concepts

3 What is a grid?
Relation to the WWW? Uniform, easy access to shared information.
Relation to distributed computing? Local clusters, WAN (super)clusters, Condor.
Relation to distributed file systems? NFS, AFS, GPFS, Lustre, PanFS, ...
A grid gives selected user communities uniform access to distributed resources with independent administrations: computing, data storage, devices, ...

4 Why is it called a grid? Analogy to the power grid
You do not need to know where your electricity comes from: just plug in your devices.
You should not need to know where your computing is done: just plug into the grid for your computing needs.
You should not need to know where your data is stored: just plug into the grid for your storage needs.

5 Ian Foster’s checklist (Globus)
A grid coordinates resources that are not subject to centralized control, using standard, open, general-purpose protocols and interfaces, to deliver non-trivial qualities of service:
- Response time, throughput, capacity, availability, security
- Co-allocation of multiple resource types for complex work
- Utility of the combined system significantly greater than the sum of its parts

6 What is Cloud Computing?
Transparent use of generic computing resources off-site:
- Dynamically provisioned
- Metered
- Neutral to applications
(Diagram: a site connecting over the Internet to rented computer/data centers such as Amazon EC2, Amazon S3, Sun, private clouds.)

7 What is grid computing?
(Diagram: many sites connected to each other through the Internet.)

8 What is grid computing about?
A grid facilitates collaboration between members of a supported distributed community; they can form a Virtual Organization within that grid.
A grid allows distributed resources to be shared uniformly and securely for common goals: computing and data storage.
A grid can support multiple Virtual Organizations in parallel; sites, computer and data centers make selections according to the projects in which they participate, and the quality of service may differ per VO.

9 How does a grid work?
Middleware makes multiple computer and data centers look like a single system to the user: security, information system, data management, job management, monitoring, accounting. Not easy!

10 Where can we use grids?
Scientific collaborations: can also serve in spreading know-how to developing countries.
Industry? Commerce? Research collaborations, intra-company grids (mostly cloud computing); grid research may provide open standards and technologies.
Homes? Schools? E-learning; Internet Service Providers -> cloud computing.
Government? Hospitals? Other public services? Beware of sensitive/private data.

11 There are many grids
EGEE – Enabling Grids for E-sciencE: Europe and beyond
OSG – Open Science Grid: USA and beyond
National: INFNGrid (It), GridPP/NGS (UK), D-Grid (De), NAREGI (Jp), ...
Regional: NorduGrid (Nordic countries), BalticGrid (Baltic region), SEEGrid (S-E Europe), EUMedGrid (Mediterranean), ...
Interregional: EELA (Europe + Latin America), EUIndiaGrid, EUChinaGrid
WLCG – Worldwide LHC Computing Grid: federation of EGEE, OSG, Nordic Data Grid Facility, ...
Grids of Clouds: private/scientific clouds, commercial clouds

12 Projects collaborating with EGEE

13 There are many communities
High-energy physics, astrophysics, fusion, computational chemistry
Biomed – biological and medical research
Health-e-Child – linking pediatric centers
WISDOM – “in silico” drug and vaccine discovery
Earth sciences
UNOSAT – satellite image analysis for the UN
Digital libraries, e-learning
Industrial partners in EGEE: CGGVeritas – geophysical services, Philips

14 WLCG Tier-0, Tier-1 and Tier-2 centers
> 140 computing centers in 35 countries, with a hierarchical and regional organization.
12 large centers for primary data management: CERN = Tier-0 plus 11 Tier-1 centers in 10 countries, connected by fast network links: ASGC (Taiwan), BNL and FNAL (USA), CCIN2P3 (France), FZK (Germany), CNAF (Italy), NDGF (Nordic countries), PIC (Spain), RAL (UK), SARA-NIKHEF (NL), TRIUMF (Canada).
38 federations of smaller Tier-2 centers.

15 WLCG Tier-1 centers

16 WLCG sites

17 The ATLAS computing model
(Diagram of the data flow: Event Builder -> Event Filter (~Pb/sec) -> Tier 0 at CERN (calibration, first processing, mass storage) -> Tier-1 regional centres, e.g. US, UK (RAL), Spain (PIC), Italy (CNAF), for reprocessing, group analysis and mass storage -> Tier-2 centres (~622 Mb/s links) for analysis and simulation -> institutes with physics data caches, desktop workstations. Some data flows to the institutes for calibration and monitoring, and calibrations flow back. An average Tier 2 hosts ~25 physicists working on one or more channels.)

18 SECTION 2 Grid, Certificates and the ATLAS VO

19 The VO Mechanism
The Virtual Organization mechanism provides a way to authorize the user when a task is instantiated.
VOs are used to organize the credentials of sets of users: when a user submits a request to the grid, his/her credentials are compared with the information coming from the VOMS (Virtual Organization Membership Service) server.
The VOMS is populated with the information provided by the users and is managed by a VO administrator.
A user included in a VO is authorized to use all of the resources assigned to that particular VO, with different user privileges depending on role and group affiliation.
VO: a collection of people, resources, policies and agreements.

20 The VO implementation
VOMS (Admin+Server): voms2.cern.ch and lcg-voms2.cern.ch at CERN, vo.racf.bnl.gov at BNL; the VOMS-Admin registration service runs on lcg-voms2.cern.ch and serves OSG, LCG and NorduGrid.
CERN is currently providing 2 VOMS servers (one with the old lists and one with the new); BNL is combining the info in the production server (bnl-atlas-sync).
(In the original diagram the arrows signify dependencies, not dataflow.)

21 Certificates (help about the ATLAS VO, certificates and CAs)
The certificates usually come in an encrypted, packaged form (pkcs12); the .p12 files may be imported directly into the browsers.
To access the grid services the certificate must be split into a user certificate and a private key. Both files have to be stored in a directory called $HOME/.globus.
To split a pkcs12 certificate called my_cert.p12 into the cert & key:
openssl pkcs12 -nokeys -clcerts -in my_cert.p12 -out usercert.pem
openssl pkcs12 -nocerts -in my_cert.p12 -out userkey.pem
chmod 644 usercert.pem
chmod 600 userkey.pem
When generating usercert.pem and userkey.pem you will be asked for a password to protect them; this password will be used to submit jobs to the grid.
To package userkey.pem and usercert.pem back into a pkcs12 file named "My certificate" (optional, only used to select your certificate with a reasonable name in the browsers):
openssl pkcs12 -export -inkey userkey.pem -in usercert.pem -out my_cert.p12 -name "My certificate"
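As a quick sanity check (not on the original slide, but using standard openssl options) you can inspect the resulting files to confirm the certificate DN and expiry date and that the key passphrase works:
# Show the subject (DN) and validity period of the user certificate
openssl x509 -in usercert.pem -noout -subject -dates
# Verify the private key (you will be prompted for the passphrase you chose)
openssl rsa -in userkey.pem -check -noout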

22 Proxies
A proxy is a temporary delegation of the user’s credentials to the services. You will be able to submit Grid/Panda jobs only when you have a valid proxy.
By default the proxies have a validity of 12 hours; the maximum time allowed by the server is 96h.
To open a proxy, from a grid-enabled machine:
voms-proxy-init -voms atlas
Open a proxy with a specific group or role:
voms-proxy-init -voms atlas:/atlas/it
voms-proxy-init -voms atlas:/atlas/phys-higgs/Role=production
To check your proxy information:
voms-proxy-info
To destroy a proxy:
voms-proxy-destroy
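A few additional options that are handy in practice (standard voms-proxy flags, shown here as a sketch rather than part of the original slide):
# Print all proxy attributes, including the VOMS groups/roles (FQANs)
voms-proxy-info --all
# Print only the remaining proxy lifetime in seconds (useful in scripts)
voms-proxy-info --timeleft
# Request a longer proxy, up to the 96h server limit
voms-proxy-init -voms atlas -valid 96:00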

23 How to choose the correct groups/roles in the VO
/atlas/usatlas: OSG (Open Science Grid) users only; only US people may apply to this
/atlas/<country code>: funding agencies
/atlas/phys,perf,trig,...: physics and performance groups, only needed for group analysis and privileged access to data
Roles:
production: managers of the official ATLAS productions
pilot: analysis pilots (PanDA)

24 The ATLAS VO registration process
(Diagram: the user selects a VO in the LCG User Registration service (VOMS-Admin); the request is checked against the CERN HR database and a notification is sent to the VO administrator; once approved by the VO Manager, the ATLAS VOMS feeds the resources and authorization tools of LCG, OSG and NorduGrid.)

25 ATLAS VO FAQs (incomplete list)
What should I do if I need to renew or change my certificate in the VO? You need to add the new certificate DN via VOMS-Admin.
What if I need to change VO or leave ATLAS? You should contact the VO managers, in order to be unregistered from the old VO, and register again with the new VO.
The VOMRS registration server rejects my registration attempt, saying that the email address I’m using does not correspond to any ATLAS user at CERN. Check that you are correctly registered as an ATLAS user at CERN and use the email address you have registered at CERN.
For any other VO-related problem, please contact the VO managers.

26 Use of the Grid in ATLAS
In order to use the grid as an ATLAS member you need:
- A personal certificate, correctly installed in your machine (see the ATLAS computing workbook)
- To be correctly registered in the ATLAS VO
- Access to a grid-enabled front-end machine, or just access to CVMFS
Most common applications of the grid in ATLAS:
- PanDA (Production ANd Distributed Analysis): official MC productions and data reconstruction, user analysis
- Distributed storage
- Monitoring and historical views (dashboards): all ADC monitoring pages

27 SECTION 3 Accessing data stored in the Grid

28 Where are the files stored?
The production files are stored in several Storage Elements, scattered around the world. Direct access to the files is allowed for the ATLAS VO users.
Different tools are used to access the files, depending on the file location:
- File access (sites with POSIX filesystems)
- XRootD
- HTTPS
There are tools for searching for a specific file and for getting the file locally (local storage element or local machine).

29 How are the files organized?
The files stored in the sites are organized in reserved disk spaces, called Space Tokens:
- DATADISK/DATATAPE: real data
- GROUPDISK: group analysis data (now included in DATADISK)
- LOCALGROUPDISK: local analysis group data
- PRODDISK: production data buffer
- SCRATCHDISK: temporary analysis data
Users can ask for replication to LOCALGROUPDISK Space Tokens in remote sites. Everybody can read from there, but only the central data management system is able to write.
The other space tokens are not manageable by users; only central production can ask for replications there. ALL the space tokens are ATLAS-wide readable.
The results of the analysis jobs must be stored in the ATLASSCRATCHDISK Space Token, which is cleaned from time to time (normally after 30 days). Users are responsible for replicating the data stored in SCRATCHDISK to the LOCALGROUPDISK of a given site (see the example below).
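A minimal sketch of such a replication from the command line, assuming the Rucio clients are set up (see slide 32) and that you have enough quota on the target LOCALGROUPDISK; the dataset, account and RSE names below are purely illustrative:
# Ask Rucio for one replica of the dataset on a LOCALGROUPDISK endpoint
rucio add-rule user.jdoe:user.jdoe.my_analysis_output 1 INFN-ROMA1_LOCALGROUPDISK
# Check the status of your replication rules
rucio list-rules --account jdoe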

30 Data Placement
PD2P: dynamic placement of the data at the Tier-2s, based on data popularity.
Up to now the jobs run in the sites where the data are stored; this locality will break when we fully use the remote access protocols, such as XRootD and HTTPS.
(Diagram: the Tier-0 at CERN, the Tier-1 centers and the Tier-2 sites, with the Italian Cloud highlighted.)

31 The ATLAS Distributed Data Management system (Rucio)
File-level granularity.
Rucio accounts: a Rucio user is identified by their credentials, such as X509 certificates, username/password, or tokens.
Data Identifiers: files, datasets and containers follow an identical naming scheme composed of two strings, the scope and a name; the combination of the two is called a data identifier (DID).
Replica management: based on replication rules defined on logical files.
Accounting and quota: a quota is a policy limit which the system applies to an account; Rucio accounts are only accounted for the files they set replication rules on.

32 Getting files using DDM
The ATLAS Distributed Data Management system, codenamed Rucio, replaces the older DQ2 tools. The commands below use the Rucio clients, assuming you already have a grid environment and a valid VOMS proxy (with a nickname).
Set up the Rucio clients (in a clean shell):
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS
localSetupRucioClients
Get the list of the datasets matching a pattern (wildcards like '*' are allowed):
rucio list-dids <dataset name>
Get the list of the files in a dataset:
rucio list-files <dataset name>
Get the path of the files in a dataset:
rucio list-file-replicas [--rse <RSE>] <dataset name>
Get a dataset (if the destination directory is omitted the files are copied into the local directory):
rucio download <dataset name>
Users cannot access files on tape or on Tier-0 privileged pools; the files must be replicated to some other location before they can be accessed. Replication is possible for all the files and data to be accessed locally on remote sites; the destination must be a LOCALGROUPDISK space token.
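A worked example of the typical sequence (the dataset pattern and RSE name are only illustrative; pick a real dataset from the list-dids output):
# Find datasets matching a pattern
rucio list-dids mc15_13TeV:mc15_13TeV.*DAOD_TOPQ1*
# See where the replicas are, restricted to a given RSE (placeholder name)
rucio list-file-replicas --rse INFN-ROMA1_DATADISK mc15_13TeV:<dataset name>
# Download the dataset into a local directory
rucio download --dir /tmp/mydata mc15_13TeV:<dataset name>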

33 The Rucio UI
The Rucio UI is the new user access point to data transfer: monitoring of subscriptions/rules, space token usage, reports, etc.
R2D2: the new data transfer (replication request) interface.

34 SECTION 4 Setting up the ATLAS software from CVMFS

35 CVMFS
Dynamic software distribution model via CVMFS: virtual software installation by means of an HTTP file system.
Data store: compressed chunks (files), duplicates eliminated.
File catalog: directory structure, symlinks, SHA1 of regular files, digitally signed, time to live, nested catalogs.
Distribution of the condition files via CVMFS.
The experiment software is exported read-only and mounted in the remote nodes via the FUSE module, with a local cache for faster access.
Benefits from a squid hierarchy to guarantee performance, scalability and reliability (same squid type as the one used for Frontier).
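On a node where CVMFS is installed you can verify that the ATLAS repository is reachable before setting anything up; this quick check uses standard cvmfs_config commands and is not part of the original slide:
# Probe the ATLAS repository (mounts it on demand and reports OK/FAILED)
cvmfs_config probe atlas.cern.ch
# List the top of the repository to confirm the mount is readable
ls /cvmfs/atlas.cern.ch/repo/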

36 Setting up the ATLAS software via CVMFS [1]
Simple setup via ATLASLocalRootBase:
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
setupATLAS
Many software components are available via lsetup <tool1> [ <tool2> ...] (see lsetup -h):
lsetup agis        ATLAS Grid Information System
lsetup asetup      (or asetup) to set up an Athena release
lsetup atlantis    Atlantis: event display
lsetup eiclient    Event Index
lsetup emi         EMI: grid middleware user interface
lsetup fax         Federated XRootD data storage access (FAX)
lsetup ganga       Ganga: job definition and management client
lsetup lcgenv      lcgenv: setup tools from the CVMFS SFT repository
lsetup panda       PanDA: Production ANd Distributed Analysis
lsetup pod         Proof-on-Demand (obsolete)
lsetup pyami       pyAMI: ATLAS Metadata Interface python client
lsetup rcsetup     (or rcSetup) to set up an ASG release
lsetup root        ROOT data processing framework
lsetup rucio       distributed data management system client
lsetup sft         setup tools from the SFT repo (use lcgenv instead)
lsetup xrootd      XRootD data access
advancedTools      advanced tools menu
diagnostics        diagnostic tools menu
helpMe             more help
printMenu          show this menu
showVersions       show versions of installed software
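For instance, a typical interactive session that prepares both the Rucio client and ROOT could look like this (the tools are taken from the menu above; the snippet is only a sketch of one possible session):
# Set up ATLASLocalRootBase and pick the tools needed for the session
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
lsetup rucio root
# Confirm which versions ended up in the environment
showVersions
which root
rucio whoami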

37 Setting up the ATLAS software via CVMFS [2]
Pros:
- CVMFS is available in all the sites
- ATLASLocalRootBase shares the releases with the standard production/analysis jobs
- Nightlies are available in CVMFS and updated every night
- Simple setup, access to all the tools needed for the analysis; even the grid middleware can be set up
- Constantly updated
- Software pre-configured, statically or dynamically; automatic local site settings
- Faster access and less disk space used than traditional shared filesystems
Cons:
- CVMFS is not available offline, unless the files are already in the cache

38 SECTION 5 Working with FAX

39 FAX: the Federated ATLAS Storage System using XRootD
FAX (Federated ATLAS storage systems using XRootD) brings Tier-1, Tier-2 and Tier-3 storage resources together into a common namespace, accessible from anywhere.
It is based on the XRootD protocol and data distribution infrastructure: client software tools like ROOT or xrdcp can use FAX to reach storage services regardless of location. Increases in network bandwidth and data-structure-aware caching mechanisms (such as TTreeCache) make this possible.
FAX can be used as failover in jobs (not enabled by default).
Goal reached! >96% of the data covered (ATLAS Jamboree – Dec 2014).

40 Using FAX: basic introduction [1]
Pre-requisites: CVMFS, VOMS proxy.
Set up FAX using CVMFS and ROOT:
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh
localSetupFAX --rootVersion=current-SL6
Check dataset availability in FAX:
fax-is-dataset-covered <scope>:<dataset name>
Example: fax-is-dataset-covered mc15_13TeV:mc15_13TeV.PowhegPythiaEvtGen_P2012_ttbar_hdamp172p5_nonallhad.merge.DAOD_TOPQ1.e3698_s2608_s2183_r6765_r6282_p2413
Copy a dataset to the local storage (supports multiple streams, retries, partial dataset copy, skipping non-root files, timeouts and more):
fax-get <scope>:<dataset name>
Example: fax-get mc15_13TeV:mc15_13TeV.PowhegPythiaEvtGen_P2012_ttbar_hdamp172p5_nonallhad.merge.DAOD_TOPQ1.e3698_s2608_s2183_r6765_r6282_p2413
Find global logical file names (gLFNs):
fax-get-gLFNs <scope>:<dataset name>
Example: fax-get-gLFNs mc15_13TeV:mc15_13TeV.PowhegPythiaEvtGen_P2012_ttbar_hdamp172p5_nonallhad.merge.DAOD_TOPQ1.e3698_s2608_s2183_r6765_r6282_p2413

41 Using FAX: basic introduction [2]
Copy a file from FAX to local disk (the $STORAGEPREFIX depends on your local storage element):
xrdcp $STORAGEPREFIX/atlas/rucio/<scope>:<file name> /tmp/myLocalCopy.root
Open and inspect a file with ROOT, using a FAX-enabled storage element:
TFile *f = TFile::Open("root://grid-cert-03.roma1.infn.it//atlas/rucio/<scope>:<file name>")
Or using a redirector (here the IT redirector):
TFile *f = TFile::Open("root://atlas-xrd-it.cern.ch//atlas/rucio/<scope>:<file name>")
Example: TFile *f = TFile::Open("root://atlas-xrd-it.cern.ch//atlas/rucio/mc15_13TeV:DAOD_TOPQ _ pool.root.1")
Using FAX from a prun job: instead of giving the --inDS myDataset option, provide it with --pfnList my_list_of_gLFNS.txt, where my_list_of_gLFNS.txt is the output of fax-get-gLFNs (see the sketch below).
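A minimal sketch of such a prun submission, assuming the PanDA client is set up (lsetup panda); the analysis script name, nickname and output dataset below are placeholders, not part of the original slide:
# Collect the gLFNs of the input dataset into a text file
fax-get-gLFNs <scope>:<dataset name> > my_list_of_gLFNS.txt
# Submit an analysis job reading the inputs through FAX instead of --inDS
prun --exec "python myAnalysis.py %IN" \
     --pfnList my_list_of_gLFNS.txt \
     --outDS user.<nickname>.myFaxTest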

42 SECTION 6 Links and contacts

43 Documentation
Wiki pages:
- LCG/EGEE and the ATLAS VO
- ATLAS Computing Workbook
- ATLAS Distributed Data Management
- Distributed Analysis Support (DAST)
- ATLASLocalRootBase

44 Contacts
- ATLAS Italy computing contacts (T2 support, ATLAS Italy Computing list)
- Software distribution
- ATLAS VO
- DDM
- LCG Production System
- Distributed Analysis Support

