1 Use of the European Data Grid software in the framework of the BaBar distributed computing model
T. Adye (1), R. Barlow (2), B. Bense (3), D. Boutigny (4), D. Colling (5), B. Cowles (3), A. Forti (2), D. Smith (6), G. Grosdidier (7), A. Hasan (3), J. Martyniak (5), A. McNab (2), R. Walker (5)
(1) Rutherford Appleton Laboratory; (2) University of Manchester; (3) Stanford Linear Accelerator Center; (4) Laboratoire d'Annecy le Vieux de Physique des Particules, CNRS / IN2P3; (5) University of London, Imperial College; (6) University of Birmingham; (7) Laboratoire de l'Accélérateur Linéaire, CNRS / IN2P3
On behalf of the BaBar computing group
2 Motivations for BaBar-Grid & BaBar Specificities (1)
Distributed computing is one of the main axes of the BaBar computing model
– Tier A: main computing centers, holding all or a large fraction of the data
  Currently: SLAC, IN2P3, RAL and FZK/GridKa
  INFN Padova is specialized in data reprocessing and will probably become an analysis Tier A later
  INFN Ferrara (with SLAC) is looking into MC production on the Grid
– Tier B: does not really exist…
– Tier C: smaller centers, holding only small chunks of data or n-tuples
3 Motivations for BaBar-Grid & BaBar Specificities (2)
Special configuration in the UK:
– Large center at RAL
– Several smaller centers with significant computing and data storage resources
Main motivation for BaBar-Grid:
– Need a simple and reliable tool for remote job submission
– Data may be spread between several sites
  Need a metadata catalog and a tool to automatically split the jobs and submit them to the centers holding the data
BaBar is taking data: the introduction of Grid tools should not disrupt physics production
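The job-splitting idea can be illustrated with a minimal Python sketch. It assumes a metadata catalog that can be read as a mapping from a data collection name to the sites holding a copy. The function name `split_by_site`, the collection names and the catalog contents are hypothetical and are not part of the BaBar tools.

```python
from collections import defaultdict


def split_by_site(collections, catalog):
    """Group the requested data collections by a site that holds them.

    `catalog` maps a collection name to the list of sites holding a copy.
    Each collection goes to the first listed site; a real splitter would
    also weigh site load and the data format available there.
    """
    jobs = defaultdict(list)
    for name in collections:
        sites = catalog.get(name)
        if not sites:
            raise LookupError(f"no site holds collection {name!r}")
        jobs[sites[0]].append(name)
    return dict(jobs)


# Hypothetical catalog contents, for illustration only.
catalog = {
    "run1-a": ["RAL", "SLAC"],
    "run1-b": ["IN2P3"],
    "run2-a": ["RAL"],
}
plan = split_by_site(["run1-a", "run1-b", "run2-a"], catalog)
```

Each entry of `plan` would then become one sub-job submitted to the corresponding center.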
4 Short Term Goals for BaBarGrid developments
Set up a Grid system able to submit analysis jobs to the major Tier-A centers
Proof of concept
– Demonstrate usage in real analysis applications
– Test various Grid implementations and their inter-operability (EDG, LCG-1, VDT,…)
– Have to handle 2 data formats: Objectivity and Root
Data distribution
– "Distributing BaBar Data using the Storage Resource Broker (SRB)", W. Kroeger (previous talk)
– BdbServer++: a user-driven data location and retrieval tool (Poster)
Metadata catalog and automatic job splitting
– "BaBar WEB job submission with Globus authentication and AFS access", A. Forti
5 The BaBar Grid as of March 2003
[Diagram: a VO, an RC and an RB, together with five sites each shown with a CE, an SE and WNs]
6 European Data Grid (EDG) Setup
BaBar benefits from the EDG test bed installations at the European sites
– We just had to add a dedicated Virtual Organization (VO) and a Replica Catalog (RC) (in Manchester)
An automatic system has been developed so that any BaBar user can register their certificate with the VO
– The existence of a special file on the SLAC AFS is the proof that the user is registered in BaBar
We use the Resource Broker (RB) installed at Imperial College, which is shared with other experiments using the EDG test bed
We decided to restrict ourselves to basic RC usage
– We don't use GDMP
– We are looking forward to testing RLS
7 SLAC Setup
The EDG software has been deployed at SLAC
– Version 1.3.4, compatible with RB 1.4.x
Some special adaptations were needed
– The WNs run LSF
– The WNs are behind a firewall, so they cannot communicate directly with the RB
  Solved by splitting the submission scripts so that all communication goes through the Gatekeeper
SLAC accepts both EDG and DOE certificates
AFS:
– gssklog has been installed in order to get AFS tokens
EDG 1.3 / 1.4 requires RH 6.2, which is a real problem and needs a special arrangement with the Computing Services
8 RB Specificities
One major problem with EDG 1.4.x is related to the Meta Directory Service (MDS)
– Resources disappear at random from the Information Index (II)
2 solutions:
– Replace the dynamic Information Index by a static one (BDII)
  This is the solution tested and recommended by EDG
– Install monitoring scripts which automatically detect disappearing and reappearing resources and restart the II accordingly
  Sometimes gives a flaky II, oscillating as resources come in and out
  If this happens, the resource matching process fails
Both solutions have been tested at Imperial College
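The monitoring approach amounts to comparing successive snapshots of the resources listed in the II. The Python sketch below shows only that comparison. It is not the script used at Imperial College, and the function names and host names are hypothetical. How a snapshot is obtained from the II and how the II is restarted are left out.

```python
def diff_snapshots(previous, current):
    """Return (disappeared, reappeared) resources between two II snapshots."""
    previous, current = set(previous), set(current)
    return sorted(previous - current), sorted(current - previous)


def restart_needed(previous, current):
    """True when any resource dropped out of, or came back into, the II."""
    gone, back = diff_snapshots(previous, current)
    return bool(gone or back)


# Hypothetical CE host names, for illustration only.
before = ["ce.site-a.example", "ce.site-b.example"]
after = ["ce.site-b.example", "ce.site-c.example"]
gone, back = diff_snapshots(before, after)
```

A II that keeps oscillating would make `restart_needed` return true on almost every poll, which is the flaky behaviour described above.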
9 The Analysis Job Use Case
The user has an executable and a configuration file (tcl)
The executable needs input data in Objectivity or Root format, depending on the site where it runs
The result of the analysis job is a Root-tuple and a log file
We assume that a suitable BaBar release is available at the target site
– In the future, we may package the BaBar release so that it can be installed before the job actually runs
We want the executable to be stored in a Storage Element (SE) close to the Computing Element (CE)
The input tcl file is sent through the input sandbox (OK as it is relatively small)
The output log file is returned through the sandbox
The Root-tuple is stored in an SE close to the CE
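A job description for this use case might be generated as in the Python sketch below. The slides do not show the actual JDL, so this is an illustration only: the attribute names are those of EDG 1.x JDL as best understood and should be checked against the EDG documentation, and every file name, logical file name and catalog URL is hypothetical.

```python
def make_jdl(wrapper, tcl_file, log_file, input_data, replica_catalog):
    """Build a JDL text: wrapper and tcl file go out in the input sandbox,
    the log file comes back in the output sandbox."""

    def quoted(items):
        # JDL list syntax: {"a", "b"}
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    lines = [
        f'Executable = "{wrapper}";',
        f'StdOutput = "{log_file}";',
        f'StdError = "{log_file}";',
        f"InputSandbox = {quoted([wrapper, tcl_file])};",
        f"OutputSandbox = {quoted([log_file])};",
        f"InputData = {quoted(input_data)};",
        f'ReplicaCatalog = "{replica_catalog}";',
        'DataAccessProtocol = {"gridftp"};',
    ]
    return "\n".join(lines)


# All values below are hypothetical placeholders.
jdl = make_jdl(
    wrapper="run_analysis.sh",
    tcl_file="analysis.tcl",
    log_file="analysis.log",
    input_data=["LF:example-collection"],
    replica_catalog="ldap://rc.example.org:9011/rc=Example",
)
```

The executable and the Root-tuple do not appear in the sandboxes: in the use case above they live on the SE close to the CE.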
10 The Machinery
[Diagram: job flow between the UI, RB, II, RC, CE and close SE. Labels: tcl file, JDL, wrapper script, executable, InputData, Ntuple, log file through sandbox, globus-url-copy]
11 Getting a generic script able to run everywhere
Make use of the edg-brokerinfo commands
– Discover the CE and SE parameters
For instance:
– edg-brokerinfo getCloseSEs returns the hostnames of the closest SEs
– edg-brokerinfo getSEMountPoint returns the mount point of the SE file system
The EDG API makes it very simple to build a fully generic script
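A wrapper could query these two commands as in the Python sketch below. Two things are assumed rather than taken from the slides: that `getSEMountPoint` takes the SE hostname as its argument, and that both commands print whitespace-separated values. The helper names are hypothetical, and the `run` parameter exists only so the logic can be exercised without a Grid installation.

```python
import subprocess


def brokerinfo(*args, run=subprocess.check_output):
    """Run `edg-brokerinfo <args>` and return its output split on whitespace."""
    out = run(["edg-brokerinfo", *args])
    if isinstance(out, bytes):
        out = out.decode()
    return out.split()


def close_se_and_mount(run=subprocess.check_output):
    """Return (hostname, mount point) for the first close SE."""
    ses = brokerinfo("getCloseSEs", run=run)
    if not ses:
        raise RuntimeError("no close SE advertised for this CE")
    se = ses[0]
    # Assumption: the SE hostname is passed as the argument.
    mount = brokerinfo("getSEMountPoint", se, run=run)[0]
    return se, mount


def fake_run(cmd):
    """Stand-in for the real command, returning made-up values."""
    if cmd[1:] == ["getCloseSEs"]:
        return b"se1.example.org\nse2.example.org\n"
    if cmd[1:] == ["getSEMountPoint", "se1.example.org"]:
        return b"/flatfiles/example\n"
    raise ValueError(f"unexpected command: {cmd}")


se, mount = close_se_and_mount(run=fake_run)
```

On a worker node the default `run` would call the real command, so the same wrapper works at any site without site-specific settings.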
12 Results
Success rate, defined as: submission OK, and n-tuple and log files recovered
With the dynamic MDS equipped with the control scripts: success rate = 55% to 75%
– 98% of the failing jobs are due to the RB being unable to match the requested resources with any CE
With the static MDS: success rate ~ 99%
– A few jobs have been lost by the RB !!!
During the test we also hit a limit of 512 jobs present at the same time in the RB
– A serious limitation, but it should be removed in future versions
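One possible client-side way to live with the 512-job limit is to hold jobs back until the RB has room. The Python sketch below is not something the slides describe. The function name is hypothetical, and `submit` and `count_active` stand in for whatever submission and status calls are actually available.

```python
def submit_throttled(jobs, submit, count_active, limit=512):
    """Submit jobs while fewer than `limit` are active in the RB.

    Returns the jobs that could not be submitted yet, so the caller
    can retry them once earlier jobs have finished.
    """
    pending = list(jobs)
    while pending and count_active() < limit:
        submit(pending.pop(0))
    return pending


# Illustration with an in-memory stand-in for the RB.
active = []
left_over = submit_throttled(range(600), active.append, lambda: len(active))
```

Here 512 of the 600 jobs are submitted and the remaining 88 are returned for a later attempt.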
13 Monte-Carlo Production
Very active work to "grid-ify" BaBar MC production
Similar to the analysis application previously described, but with a stable and controlled environment
– Store the MC executable on the SE(s)
– Produce output files in Root format and store them in the SE
– Send the data back to SLAC or a Tier A
Need to package MC production in order to be able to run at any institute, even those not maintaining BaBar software
– One difficulty: even if we produce data in Root format, we still need Objectivity for conditions data
See Poster Session for more details
14 Conclusions
Grid technology is of prime importance for BaBar to fully exploit its distributed computing model
Many Grid activities, related to:
– Data distribution
– MC production
– Analysis
We have demonstrated that EDG has all the necessary functionality for running analysis jobs on the Grid
Reliability is much better with the static MDS, but there are still several open issues on the scalability of the system
We look forward to testing EDG 2.0 and are open to other Grid implementations
– Will test VDT soon
– Will move to LCG-1 as soon as it is available
– We do not expect to have the same Grid software implemented everywhere…
We need to work on the inter-operability of the various systems