1 Plans for the integration of grid tools in the CMS computing environment
Claudio Grandi (INFN Bologna), on behalf of the CMS-CCS group
CHEP'03 Conference, San Diego, March 27th 2003

2 Data Challenge 2002 on Grid
Two "official" CMS productions ran on the grid in 2002:
– CMS-EDG Stress Test on the EDG testbed + CMS sites:
  ~260K events, CMKIN and CMSIM steps
  Top-down approach: more functionality but less robust; large manpower needed
– USCMS IGT Production in the US:
  1M events ntuple-only (full chain in a single job)
  500K up to CMSIM (two steps in a single job)
  Bottom-up approach: less functionality but more stable; little manpower needed
See talk by P. Capiluppi.

3 Data Challenge 2004
The next important computing milestone for CMS is the Data Challenge in 2004 (DC04):
– reconstruction and analysis on CMS data, sustained over one month, at 5% of the LHC rate at full luminosity (25% of the start-up luminosity rate)
– 50 million fully digitized events needed as input
– will exploit the LCG-1 resources
– is a pure computing challenge!
– see talk by V. Innocente for CMS data analysis

4 Pre-Challenge Production
Simulation and digitization of 50M events (PCP04):
– 6 months (July to December 2003)
– Transfer to CERN: ~1 TB/day for 2 months (Nov.-Dec. 2003)
– Distributed: most of the CMS Regional Centers will participate

                                   Simulation    Digitization   Reconstruction
  CPU per event (KSI2K s)          160           8              12
  Total CPU (KSI2K months)         3050          150            230
  Output size per event (MB)       2             1.5            0.5 (DST) + 0.02 (AOD)
  Total size of sample (TB)        100           75             25 (DST) + 1 (AOD)
  Resource request                 1000 for      150 for        460 (600 SI2K)
  (PIII 1 GHz CPU, ~400 SI2K)      5 months      2 months       for 1 month
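The totals in the table follow from the per-event figures and the 50M-event sample; a minimal sanity check (assuming a 30-day month; per-event numbers are the table's own, and the 3050/150/230 rounding is the slide's):

    # Sanity check of the PCP resource table above.
    EVENTS = 50e6
    SECONDS_PER_MONTH = 30 * 86400

    for step, ksi2k_s, event_mb in [("Simulation",     160, 2.0),
                                    ("Digitization",     8, 1.5),
                                    ("Reconstruction",  12, 0.52)]:  # DST + AOD
        cpu_months = EVENTS * ksi2k_s / SECONDS_PER_MONTH  # KSI2K months
        sample_tb  = EVENTS * event_mb / 1e6               # MB -> TB
        print(f"{step}: {cpu_months:.0f} KSI2K months, {sample_tb:.0f} TB")
    # -> ~3086, ~154 and ~231 KSI2K months (rounded in the table to
    #    3050, 150 and 230) and 100, 75 and 26 TB, matching the table.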

5 Boundary conditions for PCP
– CMS persistency is changing: POOL (by LCG) is replacing Objectivity/DB
– The CMS compiler is changing: gcc 3.2.2 is replacing 2.95.2
– The operating system is changing: Red Hat 7.3 is replacing 6.1.1
– The grid middleware structure is changing: EDG on top of VDT
CMS has to deal with all this while preparing for the Pre-Challenge Production!

6 PCP strategy
PCP cannot fail (no PCP means no DC04!):
– the basic strategy is to run on dedicated, fully controllable resources, without the need of grid tools
– grid-based prototypes have to be compatible with the basic non-grid environment
Jobs will run in a limited sandbox:
– input data local to the job
– local XML POOL catalogue (prepared by the production tools)
– output data/metadata and job monitoring data produced locally and moved to the site manager asynchronously
– synchronous components optionally update central catalogues; if they fail, the job continues and the catalogues are updated asynchronously
This reduces dependencies on the external environment and improves robustness. A sketch of the update policy follows.
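A minimal sketch of that policy, under stated assumptions (the journal file name, record layout and updater stub are hypothetical, not the actual CMS production code): every status record goes to a local journal first, the synchronous update is attempted but allowed to fail, and an asynchronous updater replays the journal later.

    # Journal locally first, then try the optional synchronous update;
    # a failure never stops the job, since the journal is replayed
    # asynchronously at the user's site.
    import json, time

    JOURNAL = "job_journal.log"

    def record(entry, sync_update=None):
        with open(JOURNAL, "a") as journal:        # always written locally
            journal.write(json.dumps(entry) + "\n")
        if sync_update is not None:
            try:
                sync_update(entry)                 # optional central-catalogue update
            except Exception:
                pass                               # job continues regardless

    record({"job": 42, "status": "started", "t": time.time()})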

7 Hybrid production model
[Workflow diagram. A Physics Group asks for an official dataset; the Production Manager defines assignments in the RefDB; a Site Manager starts an assignment (or a user starts a private production at the user's site) through MCRunJob. MCRunJob hands the job to one of several back ends: shell scripts to a local batch manager on a computer farm, a DAG to DAGMan (MOP), JDL to the EDG scheduler for the LCG-1 testbed, or Chimera VDL derivations into the Virtual Data Catalogue, processed by a planner.]

8 Limited-sandbox environment
File transfers, if needed, are managed by external tools (EDG-JSS, additional DAG nodes, etc.).
[Diagram. On the worker node, a job wrapper instruments the user job; a journal writer records the job input, job output and a journal, while a remote updater sends synchronous updates to the metadata DB. The journal is shipped to the user's site, where an asynchronous updater replays it into the metadata DB.]

9 Job production
Job production is done by MCRunJob (see talk by G. Graham):
– Modular: plug-ins exist for
  reading from the RefDB,
  reading from a simple GUI,
  submitting to a local resource manager,
  submitting to DAGMan/Condor-G (MOP),
  submitting to the EDG scheduler, and
  producing derivations in the Chimera Virtual Data Catalogue (see talk by R. Cavanaugh)
– Runs on the user's (e.g. the site manager's) host
– Also defines the sandboxes needed by the job
– If needed, the specific submission plug-in takes care of
  moving the sandbox files to the worker nodes and
  preparing the XML POOL catalogue with the input file information
A sketch of the plug-in pattern follows.
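The modularity can be pictured as a plug-in registry; the sketch below illustrates the pattern only, with hypothetical class names and job descriptions, not MCRunJob's actual interfaces.

    # Illustrative plug-in registry in the spirit of MCRunJob's design.
    SUBMITTERS = {}

    def submitter(name):
        def register(cls):
            SUBMITTERS[name] = cls                 # one plug-in per back end
            return cls
        return register

    @submitter("local")
    class LocalBatchSubmitter:
        def submit(self, job):
            print("to local batch manager:", job["script"])

    @submitter("dagman")
    class DAGManSubmitter:
        def submit(self, job):
            print("to DAGMan/Condor-G (MOP):", job["dag"])

    # The site manager only picks the back end; the job stays the same.
    SUBMITTERS["local"]().submit({"script": "cmkin_run.sh"})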

10 Job Metadata management
Job parameters that represent the job's running status are stored in a dedicated database:
– when did the job start?
– is it finished?
but also:
– how many events has it produced so far?
BOSS is a CMS-developed system that does exactly this, extracting the information from the job's standard input/output/error streams (a schematic sketch follows):
– the remote updater is based on MySQL
– a remote updater based on R-GMA is being developed for running in a grid environment (scalability tests are under way)
– see talk by C. Grandi
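The BOSS idea, very schematically: a filter watches the job's output stream and stores what it extracts in a database. BOSS's remote updater uses MySQL; sqlite3 appears below only to keep the sketch self-contained, and the regular expression is invented for illustration.

    # BOSS-style stream filter (schematic): parse the job's output and
    # keep the running status in a database table.
    import re, sqlite3

    db = sqlite3.connect("boss_demo.db")
    db.execute("CREATE TABLE IF NOT EXISTS job_status"
               " (jobid INTEGER PRIMARY KEY, events INTEGER)")

    EVENT_RE = re.compile(r"processed event\s+(\d+)")  # illustrative pattern

    def filter_stream(jobid, stream):
        for line in stream:
            m = EVENT_RE.search(line)
            if m:                                  # "how many events so far?"
                db.execute("REPLACE INTO job_status VALUES (?, ?)",
                           (jobid, int(m.group(1))))
                db.commit()

    filter_stream(1, ["processed event 1", "... noise ...", "processed event 2"])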

11 Dataset Metadata management
Dataset metadata are stored in the RefDB (see talk by V. Lefebure):
– which (logical) files is the dataset made of?
but also:
– which input parameters were passed to the simulation program?
– how many events have been produced so far?
Information may be updated in the RefDB in several ways:
– manual Site Manager operation
– automatic e-mail from the job
– a remote updater similar to BOSS + R-GMA, to be developed for running in a grid environment
The mapping of logical names to GUIDs, and of GUIDs to physical file names, will be done on the grid by RLS (sketched below).
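The two-level mapping the slide assigns to RLS, sketched with plain dictionaries (the file and storage-element names are made up for the example): a logical file name resolves to a GUID, and the GUID to one or more physical replicas.

    # LFN -> GUID -> PFN mapping, as RLS provides it on the grid.
    import uuid

    lfn_to_guid = {}
    guid_to_pfns = {}

    def register_replica(lfn, pfn):
        guid = lfn_to_guid.setdefault(lfn, str(uuid.uuid4()))
        guid_to_pfns.setdefault(guid, []).append(pfn)

    register_replica("sample.ntpl", "gsiftp://se1.example.org/cms/sample.ntpl")
    register_replica("sample.ntpl", "gsiftp://se2.example.org/cms/sample.ntpl")

    # A job resolves the logical name once, then picks a convenient replica.
    print(guid_to_pfns[lfn_to_guid["sample.ntpl"]])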

12 Other issues for PCP
Software distribution and installation:
– pre-installed software: rpm files installed by the LCG site administrators
– installed on demand (if possible): DAR files located using PACMAN or the Replica Manager (see the sketch after this slide)
– pile-up data (a huge dataset!):
  must be pre-installed at the sites (in an appropriate number of copies) to obtain reasonable performance on the grid;
  considered part of the digitization software
Data transfer:
– Replica Manager or direct gridFTP
– MSS access using SRM is under test (SE workshop...)
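The install-on-demand logic for DAR files might look like the following sketch; the local path, the storage-element URL and the use of globus-url-copy for the fetch are assumptions for illustration, not the actual PCP tooling.

    # Sketch: use the pre-installed DAR file if the site has it,
    # otherwise locate and fetch a replica. Paths and the source URL
    # are placeholders.
    import os, subprocess

    def ensure_software(dar_file, local_dir="/opt/cms/dar"):
        target = os.path.join(local_dir, dar_file)
        if os.path.exists(target):                 # pre-installed by site admins
            return target
        os.makedirs(local_dir, exist_ok=True)      # fall back: fetch on demand
        subprocess.run(["globus-url-copy",
                        "gsiftp://se.example.org/dar/" + dar_file,
                        "file://" + target], check=True)
        return target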

13 DC04 Workflow
Process data at 25 Hz (50 MB/s) at the Tier-0 (rate arithmetic sketched below):
– reconstruction produces DST and AOD
– AOD replicated to all Tier-1s (assume 4 centers)
– DST replicated to at least one Tier-1
– assume Digis are already replicated to at least one Tier-1, since there is no bandwidth to transfer Digis synchronously
– archive Digis to the tape library
– express lines transferred to selected Tier-1s: calibration streams, Higgs analysis stream, ...
Analysis & recalibration:
– produce new calibration data at selected Tier-1s and update the Conditions Database
– analysis from the Tier-2s on AOD and DST, occasionally on Digis
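A quick check of the Tier-0 rate in the slide's own numbers: 25 Hz and 50 MB/s together imply ~2 MB read per event, and a month at that rate slightly exceeds the 50M-event PCP sample (the efficiency remark below is our inference, not a slide figure).

    # DC04 Tier-0 input rate from the figures above.
    rate_hz  = 25                                  # events per second
    event_mb = 50 / rate_hz                        # -> 2.0 MB per event
    month_s  = 30 * 86400
    print(rate_hz * event_mb, "MB/s")              # -> 50.0 MB/s
    print(rate_hz * month_s / 1e6, "M events")     # -> ~65M events per month,
    # so the 50M-event sample corresponds to roughly 77% effective uptime.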

14 DC04 Strategy
DC04 is a computing challenge:
– run on LCG-1 (possibly complemented by CMS resources)
– use Replica Manager services to locate data
– use a Workload Management System to select resources
– use a Grid-wide monitoring system
– client-server analysis: Clarens (see talk by C. Steenberg)
Data management strategy (preliminary...):
– express lines pushed from the Tier-0 to the Tier-1s
– AOD and DST published by the Tier-0 and pulled by the Tier-1s
– Conditions DB segmented in read-only Calibration Sets, with versioned metadata stored in the RefDB
This is a temporary solution: specific middleware for read-write data management is needed.

15 Summary
The next CMS computing challenges will take place in a very dynamic environment:
– Data Challenge 2004 will be run on LCG-1
– the Pre-Challenge Production is already well defined:
  flexible production tools may run in a local or in a distributed environment;
  basically outside the Grid, but an ideal proof of maturity for Grid-based prototypes
– the Data Challenge architecture will be built on the experience CMS gains during PCP
