LcgCAF: CDF submission portal to LCG


1 LcgCAF: CDF submission portal to LCG
Federica Fanzago, for the CDF Italian Computing Group: Gabriele Compostella, Francesco Delli Paoli, Donatella Lucchesi, Daniel Jeans, Subir Sarkar and Igor Sfiligoi

2 CDF experiment and data volumes
The Tevatron at Fermilab (near Chicago) delivers proton-antiproton collisions at 1.96 TeV to the CDF and D0 experiments, fed by the Main Injector & Recycler and the antiproton source: 36 bunches, 396 ns crossing time, peak luminosity 2.3x10^32 cm^-2 s^-1.
[Figure: integrated luminosity (fb^-1) vs. fiscal year, 9/30/03 through 9/30/09, for antiproton stacking rates of 15-30 mA/hr, with a "we are here today" marker at ~2 fb^-1. Projections: by FY07, double the data collected up to FY06; by FY09, double the data collected up to FY07. The data set grows from ~2 to ~5.5 to ~9 fb^-1, with the corresponding CPU needs annotated as ~20 and ~23 on the plot.]

3 CDF Data Handling Model: how it was
Data processing: a dedicated production farm at FNAL. User analysis and Monte Carlo production: the CAF (Central Analysis Facility). FNAL hosts the data, so the FNAL farm is used to analyze them; remote sites run decentralized CAFs (dCAFs) that produce Monte Carlo, and some sites (e.g. CNAF) also host data.
CAF and dCAFs are CDF-dedicated farms, developed by H. Shih-Chieh, E. Lipeles, M. Neubauer, I. Sfiligoi, F. Wuerthwein, and structured as follows:
- The user desktop runs a CAF GUI: from the command line the user sets up the CDF environment, a GUI opens, and there the user selects the dedicated CAF on which to run, gives the directory path of the job tarball, and chooses whether to receive a mail when the job finishes.
- The head node runs the Submitter, Monitor and Mailer daemons; the Submitter uses Condor and sends the user tarball together with CafExe, which bundles part of the CDF software (the most used libraries).
- On the worker node, CafExe sets up the CDF environment, monitors the running job, allows interactive commands on it, and writes a log; MyJob.sh sets up the user environment and runs MyJob.exe.
A minimal wrapper of this kind is sketched below.
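As an illustration only, a CafExe-style wrapper might look like the following shell sketch. The environment-script name, tarball name and log format are assumptions, not the actual CDF implementation:

    #!/bin/sh
    # Hypothetical CafExe-style wrapper (illustration only).
    # Assumes: cdf_env.sh sets up the CDF environment on the WN, and the
    # user tarball contains MyJob.sh, which launches MyJob.exe.
    LOG=caf_wrapper.log
    echo "wrapper start: $(date)" >> $LOG
    . ./cdf_env.sh                     # set up the CDF environment
    tar xzf user_tarball.tgz           # unpack the user job
    ./MyJob.sh >> $LOG 2>&1 &          # run the user job in the background
    JOBPID=$!
    while kill -0 $JOBPID 2>/dev/null; do
        ps -p $JOBPID >> $LOG          # periodic snapshot of the running job
        sleep 60
    done
    echo "wrapper end: $(date)" >> $LOG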

4 Moving to a GRID Computing Model
Reasons to move to a GRID computing model:
- Resources need to expand: luminosity is expected to increase by a factor of ~4 before CDF ends.
- Resources were in dedicated pools, with limited room for expansion, and had to be maintained by CDF people.
- GRID resources can be fully exploited before the LHC experiments start analyzing data.
- CDF has no hope of having a large amount of resources without the GRID!
The answer: LcgCAF.

5 LcgCAF: General Architecture
The user desktop GUI sends jobs to the cdf-ui ~ cdf-head, which dispatches them to the GRID sites; the output flows back to a CDF-SE (the SE of a GRID site) and on to the robotic tape storage at FNAL.
A CDF user must have access to the CAF clients and hold a CDF-specific grid certificate. The CDF-UI is a central (grid) User Interface: the user tarball is sent to the CDF-UI as in the past. A substantial amount of work went into interfacing the CDF-UI with the grid while maintaining the basic idea of the old CAF head node.

6 Job Submission and Execution
Jobs travel from the cdf-ui ~ cdf-head through the WMS to a Grid site, where CafExe executes them. Daemons running on the CDF head node:
- Submitter: receives the user job, creates a wrapper to run it in the GRID environment, and makes the user tarball available.
- Monitor: provides job information to the user.
- Mailer: sends the user a mail with a job summary when the job is over (useful for user bookkeeping).
Job wrapper (CafExe): runs the user job and copies whatever is left at the end of the user code to the user-specified location. A sketch of the hand-off to the WMS follows.
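A hedged sketch of what the Submitter's hand-off to the GRID might look like in the LCG-2 era: it describes one job segment in a JDL file and submits it to the WMS. The wrapper and tarball names are assumptions carried over from the wrapper sketch above; the actual LcgCAF code is not shown in the slides:

    #!/bin/sh
    # Hypothetical Submitter step (illustration only): wrap the user job
    # in a JDL description and hand it to the LCG WMS.
    cat > segment.jdl <<'EOF'
    Executable    = "CafExe";
    Arguments     = "MyJob.sh";
    InputSandbox  = {"CafExe", "user_tarball.tgz"};
    StdOutput     = "job.out";
    StdError      = "job.err";
    OutputSandbox = {"job.out", "job.err"};
    RetryCount    = 3;
    EOF
    edg-job-submit -o jobids.txt segment.jdl   # LCG-2 era submission command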

7 Job Monitor
WMS: keeps track of job status on the GRID. WN: the job wrapper collects information such as load, dir, ps, top. Submission to the grid is asynchronous, which means "interactive" commands are not truly interactive: their information is provided by scripts forked alongside the user's job.
Interactive job monitor: running-job information is read from the LcgCAF Information System and sent to the user desktop. Available commands: CafMon list, CafMon kill, CafMon dir, CafMon tail, CafMon ps, CafMon top (an illustrative session is sketched below).
Web-based monitor: information on running jobs for all users, plus job/user history, is read from the LcgCAF Information System.
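As an illustration, a monitoring session from the user desktop might look like this; only the command names come from the slides, and the job-id and argument conventions are assumptions:

    # Illustrative monitoring session (argument conventions assumed):
    CafMon list                # list my jobs known to the Information System
    CafMon ps 1234             # 'ps' output from the worker node of job 1234
    CafMon top 1234            # 'top' snapshot from the same worker node
    CafMon dir 1234            # listing of the job's working directory
    CafMon tail 1234 job.log   # tail a file in the job's working directory
    CafMon kill 1234           # kill the job if it misbehaves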

8 CDF Code Distribution
To run, a CDF job needs the CDF code and the Run Condition DB available on the WN. Since we cannot expect all GRID sites to carry them, Parrot is used as a virtual file system for the code, and FroNTier for the DB. Parrot translates local commands (e.g. ls) into operations on remote resources over protocols such as HTTP. Since most requests are the same, a local cache is used to avoid overloading the system: a Squid web proxy cache, one per nation. For example, the Lcg sites CNAF and Lyon each see /home/cdfsoft served from the Fermilab server through their national proxy cache. A sketch of a job running under Parrot follows.
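A rough sketch of the Parrot mechanism, assuming the cctools parrot_run front end (the command name has varied across cctools releases) and its mount-list option; the code-server host name and setup-script path are invented for illustration:

    # Illustrative only: make /home/cdfsoft visible on the WN via Parrot,
    # fetching files over HTTP from the Fermilab code server, with every
    # request funnelled through the national Squid cache.
    export HTTP_PROXY=http://squid-cdf.cnaf.infn.it:3128   # hypothetical cache
    echo "/home/cdfsoft /http/cdfcode.fnal.gov/cdfsoft" > mountfile
    parrot_run -m mountfile sh -c '. /home/cdfsoft/setup.sh && ./MyJob.sh'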

9 CDF Output Storage
User job output is transferred from the Worker Nodes to a CDF Storage Element (CDF-SE) via gridftp, or to a CDF-Storage location via KRB5 rcp. From the CDF-SE or CDF-Storage, the output is copied using gridftp to the FNAL fileservers. An automatic procedure then copies the files from the fileservers to tape and declares them to the CDF catalogue. Both copy paths off the worker node are sketched below.
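A sketch of the two copy paths, using the standard globus-url-copy client for gridftp; the host names and storage paths are invented for illustration:

    # Illustrative only: ship the job output off the worker node.
    OUT=myjob_output.tgz
    # Path 1: gridftp to the site's CDF Storage Element
    globus-url-copy file://$PWD/$OUT \
        gsiftp://cdf-se.example.org/cdf/store/$OUT
    # Path 2 (alternative): Kerberos rcp to a CDF-Storage location
    rcp $OUT cdf-storage.example.org:/cdf/store/$OUT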

10 LcgCAF Usage: first GRID exercise
LcgCAF has been used by power users for B Monte Carlo production, but generic users are also starting to use it! This was the first exercise in using the grid for Monte Carlo production.
[Screenshot: LcgCAF Monitor, running sections over the last month]
[Screenshot: CDF VO monitored with standard LCG tools (GridICE), 1 week of Italian resources]

11 LcgCAF Usage
CDF B Monte Carlo production: ~5000 jobs in ~1 month.

Samples                                    Events on tape   ε      ε with 1 recovery
Bs->Dsπ (Ds->φπ, Ds->K*K, Ds->KsK)         ~6.2 x 10^6      ~94%   ~100%

User jobs: ~1000 jobs in a week, ~100% executed on these sites: [pie chart of sites].
Failures. GRID: site misconfiguration, temporary authentication problems, WMS overload. LcgCAF: Parrot/Squid cache stuck.
Retry mechanism: GRID failures -> WMS retry; LcgCAF failures -> "home-made" retry based on log parsing, sketched below.
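The "home-made" retry is not specified beyond "log parsing", so the following is only a plausible sketch: it assumes each segment leaves a log with a known success marker (the "wrapper end" line from the wrapper sketch earlier) and that failed segments are resubmitted through the WMS as in the submission sketch:

    # Illustrative only: resubmit segments whose logs lack a success marker.
    for log in logs/segment_*.log; do
        seg=$(basename $log .log)
        if ! grep -q "wrapper end" $log; then    # assumed success marker
            echo "resubmitting $seg"
            edg-job-submit -o jobids.txt jdl/$seg.jdl
        fi
    done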

12 Conclusions and Future developments
CDF has developed a portal to access the LCG resources, used to produce Monte Carlo data and for everyday user jobs: the GRID can serve a running experiment too.
There is no policy mechanism on LCG; user policy relies on site rules, which for LcgCAF users means a "random priority". We plan to implement a policy-management mechanism on top of the GRID policy tools.
An interface between SAM (the CDF catalogue) and the GRID services is under development, to allow: output-file declaration into the catalogue directly from LcgCAF; automatic transfer of output files from local CDF storage to FNAL tape; data analysis on LCG sites.

14 More details
Currently CDF accesses the following sites:

Site            Country   KSpecInt2K
CNAF-T1         Italy     1500
INFN-Padova     Italy       60
INFN-Catania    Italy      248
INFN-Bari       Italy       87
INFN-Legnaro    Italy      230
IN2P3-CC        France     500
FZK-LCG2        Germany   2000
IFAE            Spain       20
PIC             Spain      150
UKI-LT2-UCL-HE  UK         300
Liverpool       UK         100

CDF has produced and is producing Monte Carlo data exploiting these resources, demonstrating that the GRID can also serve a running experiment.

15 Performance (details)
CDF B Monte Carlo production (blank cells were not recoverable from the slide):

Sample               ε       ε with 1 recovery
Bs->Dsπ, Ds->φπ      ~92%    ~100%
Bs->Dsπ, Ds->K*K     ~98%
Bs->Dsπ, Ds->KsK

where the efficiency is ε = (# segments succeeded) / (# segments submitted).
GRID failures: site misconfiguration, temporary authentication problems, WMS overload. LcgCAF failures: Parrot/Squid cache stuck.
Retry mechanism: necessary to achieve ~100% efficiency. WMS retry for GRID failures, and the LcgCAF "home-made" retry based on log parsing.

