1 Use of the gLite-WMS in CMS for production and analysis
Giuseppe Codispoti, on behalf of the CMS Offline and Computing

2 Outline
- BossLite: the common interface to Grid and batch systems for the CMS tools
- gLite usage through BossLite
- gLite integration in the CMS tools
- Issues and proposed solutions
- WMS usage in analysis and MC production activities
- Overall performance
- Conclusions

3 Computing Model Overview
Complex system:
- Access to distributed resources through Grid middleware
- Access to local batch systems, e.g. local farms and the LSF-based CERN Analysis Facility (CAF) for high-priority tasks
- Access to CMS-specific Workload and Data Management tools
High job rate:
- Large experimental community (3k people)
- Huge amount of data produced by the experiment (up to 2 PB/year)
- Comparable amount of Monte Carlo data samples to be generated and accessed
See talk [192] by I. Fisk: Challenges for the CMS Computing Model in the First Year

4 BossLite: a common Grid/batch interface with logging facilities
- CMS interface to different Grids [WLCG, OSG] and batch systems [LSF, ARC, SGE, ...]
- Database to track and log information in an entity-relation schema
- Information logically remapped into Python objects that the CMS framework and tools can use transparently
- High efficiency and safe operation required in a multithreaded environment
- Database interaction through safe sessions and connections:
  - pool of connections
  - thread safe
  - focused on connection stability
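The connection-pool idea can be illustrated with a minimal Python sketch: a fixed set of database connections shared among worker threads, each acquired and released around a unit of work. The class, the jobs table, and the sqlite3 backend are illustrative assumptions, not the actual BossLite code:

import queue
import sqlite3

class ConnectionPool:
    """Hand out DB connections to worker threads; recycle them on release."""

    def __init__(self, db_path, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection created here be
            # used by whichever worker thread acquires it from the pool.
            self._pool.put(sqlite3.connect(db_path, check_same_thread=False))

    def acquire(self, timeout=30):
        # Blocks until a connection is free, so concurrent submission and
        # tracking threads never share a live connection.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

def log_job_status(pool, job_id, status):
    """Update one job's status using a pooled connection (assumes a 'jobs' table)."""
    conn = pool.acquire()
    try:
        conn.execute("UPDATE jobs SET status = ? WHERE job_id = ?",
                     (status, job_id))
        conn.commit()
    finally:
        pool.release(conn)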

5 BossLite Architecture
[Architecture diagram] A user Task describes identical jobs that access different parts of a dataset or produce parts of a MC sample. Static job info is shared across submissions, while runtime info is tracked per submission (Submission 1, 2, 3, ...). Plugins provide transparent interaction with Grid and local batch systems; a database backend handles logging and bookkeeping, with a pool of DB connections shared among threads.
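A minimal sketch of the entity layout the diagram suggests, with hypothetical class and field names rather than the real BossLite schema: a Task groups identical jobs, each Job carries the static info, and each (re)submission gets its own runtime record.

from dataclasses import dataclass, field
from typing import List

@dataclass
class RunningJob:
    """Runtime info: one record per (re)submission of a job."""
    submission: int        # 1, 2, 3, ... increases on resubmission
    grid_id: str = ""      # Grid/WMS identifier, filled at submit time
    status: str = "Created"

@dataclass
class Job:
    """Static info, shared by all submissions of this job."""
    job_id: int
    arguments: str         # e.g. which part of the dataset to process
    runs: List[RunningJob] = field(default_factory=list)

@dataclass
class Task:
    """A set of identical jobs over a dataset or MC sample."""
    name: str
    jobs: List[Job] = field(default_factory=list)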

6 The BossLite interface to gLite
- Bulk submission, bulk match-making, bulk status query: faster, more efficient
- Access through the WMProxy Python API:
  - needed to associate BossLite jobs with their Grid identifiers (not trivial through the CLI)
  - no parsing of "human readable" streams
  - but the API is complex, and the UI tools (e.g. UI configuration, input sandbox transfers, ...) are lost
  - exposed to Python compatibility issues
- Access to LB information through the API:
  - easy and fast check of job status
  - easy to extract much more useful information at runtime: destination queue, status reason, scheduling timestamps, ...
Note: the CMS computing model uses its own data location system; the WMS match-making is only used to select among the available resources hosting the selected data.
A sketch of the bulk-submission idea follows below.
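The bulk operations rest on the standard gLite "collection" JDL: one JDL document describing all jobs of a task, registered with the WMS in a single call instead of one call per job. Below is a minimal sketch of building such a collection; the helper function and its parameters are illustrative assumptions, not BossLite code.

def collection_jdl(executable, per_job_arguments, input_sandbox):
    """Build a gLite collection JDL with one node per job of the task."""
    nodes = ",\n    ".join(
        '[ Executable = "%s"; Arguments = "%s"; ]' % (executable, args)
        for args in per_job_arguments
    )
    sandbox = ", ".join('"%s"' % f for f in input_sandbox)
    header = '[\n  Type = "collection";\n'
    sandbox_line = "  InputSandbox = { %s };\n" % sandbox
    nodes_block = "  Nodes = {\n    %s\n  };\n" % nodes
    return header + sandbox_line + nodes_block + "]"

# e.g. three analysis jobs, each reading a different slice of a dataset:
print(collection_jdl("cmsRun.sh",
                     ["slice=%d" % i for i in range(3)],
                     ["cmsRun.sh", "task_config.tar.gz"]))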

7 CMS use cases
- Monte Carlo production: automatic, parallelized system for the simulation and reconstruction of huge data samples
- Basic analysis tasks (single user): transparent usage of the Grid infrastructure as well as of local batch systems, integrated with the CMS workload management system
- Regime analysis and intensive analysis tasks: centralized system dealing with huge tasks, automating the analysis workflow and optimizing Grid usage; high-concurrency system for a multi-user environment

8 Production Agent (ProdAgent)
- Sequential job submission (collections)
- Multi-threaded status query (see the sketch after this slide)
- Multi-threaded output retrieval for log files and production reports
- Files limited in size, produced directly at the CMS destination sites (merged later through ad hoc jobs)
See poster [82] by F. van Lingen: MC production and processing system - Design and experiences
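A minimal sketch of the multi-threaded status query: worker threads drain a shared queue of job identifiers and record the status of each. The stubbed query_status stands in for the real LB/WMS lookup and is an illustrative assumption.

import queue
import threading

def query_status(job_id):
    # Stub: in the real system this would go through the LB API.
    return "Running"

def status_worker(job_queue, results):
    """Drain the queue, recording each job's status."""
    while True:
        try:
            job_id = job_queue.get_nowait()
        except queue.Empty:
            return
        results[job_id] = query_status(job_id)

def poll_all(job_ids, n_threads=10):
    job_queue = queue.Queue()
    for job_id in job_ids:
        job_queue.put(job_id)
    results = {}
    workers = [threading.Thread(target=status_worker,
                                args=(job_queue, results))
               for _ in range(n_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results

print(poll_all(range(100)))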

9 CMS Remote Analysis Builder (CRAB)
All UI functionalities wrapped with the WMProxy/LB API.

10 CRAB Analysis Server
- Multi-threaded job submission (collections, many users concurrently)
- Multi-threaded status query
- Multi-threaded output handling and WMS purge
- Direct ISB/OSB shipping from the WN: variable size, possible to implement CMS-specific policies bypassing the WMS (using gLite features!)
See talk [77] by D. Spiga: Automatization of User Analysis Workflow in CMS

11 Evolution
Fruitful collaboration with the gLite developers to fix bugs and implement the features CMS needs.
Proposed xml/json output for the CLI commands:
- same level of detail for the job association
- reuses all UI functionalities: everything is already there, it just needs to be made accessible
- specific error logging
- accessible through a simple subprocess, encapsulating environment/compatibility issues
- simpler intermediate layer
- reuse & simplify!
A sketch of this subprocess-based access follows below.
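A minimal sketch of what that access could look like: the standard glite-wms-job-submit command driven through a subprocess. Note that the --json flag is the *proposed* machine-readable output, not an existing option; today the command prints human-readable text.

import json
import subprocess

def submit_collection(jdl_path, config="ui.conf"):
    """Submit a collection JDL via the CLI and return the job identifiers."""
    cmd = ["glite-wms-job-submit",
           "-a",              # automatic proxy delegation
           "-c", config,      # existing UI configuration stays usable
           "--json",          # hypothetical: the proposed structured output
           jdl_path]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # With structured output, the node-name -> grid-identifier association
    # that is hard to parse from the human-readable stream becomes trivial.
    return json.loads(proc.stdout)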

12 Single WMS usage in everyday activities
- Typical instantaneous load of a single WMS (jobs running/idle): up to 5k jobs handled simultaneously per WMS in everyday analysis
- Daily job rate, including ended jobs: the typical load for a single WMS may already reach 15k jobs
- Stress tests reached 30k jobs per day for a single WMS with no sign of a breaking point!
- MC production and analysis jobs are balanced over many WMSs: currently 7 for analysis, 4 for MC production
(Plots: active, ended, aborted, idle and running jobs per WMS.)

13 Overall performance of the CMS tools
- The limits reached so far are mainly due to tracking and output retrieval/handling; some optimizations are already in place, other small tweaks are possible
- The WMS architecture is such that the system scales linearly with the number of WMSs: add as many WMSs to a CMS service as needed (see the sketch after this slide)
- The CMS architecture is similar: deploy as many instances of ProdAgent and CRAB Server as needed; no scaling problems foreseen at the expected rates of 50-100k jobs/day for MC production and 100-200k jobs/day for analysis
- A single CRAB Server instance in multi-user mode reached 50k jobs per day using 2 WMSs
- A single ProdAgent instance reached around 30k jobs per day; the lower performance lies in the output copy from the WMS, and we plan to reduce the size and number of the files to be retrieved
(Plot: daily job rate, peaking at 50k jobs.)
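The linear scaling comes from spreading submissions across independent WMS endpoints. A minimal sketch of the balancing idea, round-robin over a pool of endpoints; the URLs and the helper are illustrative assumptions, not the actual CMS configuration.

import itertools

# 7 WMSs for analysis, as quoted on the previous slide; the endpoint
# URLs below are placeholders.
ANALYSIS_WMS = [
    "https://wms%02d.example.org:7443/glite_wms_wmproxy_server" % i
    for i in range(1, 8)
]

_wms_cycle = itertools.cycle(ANALYSIS_WMS)

def pick_wms():
    """Return the WMS endpoint to use for the next collection submission."""
    return next(_wms_cycle)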

14 CMS Grid activity with the gLite WMS
- CMS has been using the gLite WMS for years
- ~75k jobs per day: ~30k analysis, ~20k MC production, ~25k other activities
- Jobs uniformly distributed over more than 40 sites
- Increased activity during the last year, from the May 2008 challenge (CCRC08, see [312], [389]) to March 2009
Poster [389]: CMS results from Computing Challenges and Commissioning of the computing infrastructure
Poster [312]: Commissioning Distributed Analysis at the CMS Tier-2 Centers

15 Job distribution per activity
From May 2008 to March 2009: 23M total jobs submitted
- 58% success, 25% application failures, 12% grid failures, 5% cancelled
8.8M analysis jobs:
- 81% success, ~9% application failures, 10% grid failures
- about 78% of the total analysis jobs have been submitted with the gLite WMS for years (the rest mainly through CondorG)
- ~600 distinct real users in the last 3 months
5.3M MC production jobs:
- 87% success, 4% application failures, 7% grid failures, 2% cancelled
6.6M JobRobot jobs, plus 2.3M jobs from other test activities

16 Conclusions
- CMS successfully uses the WMS for Monte Carlo production and analysis tasks
- We are able to reach more than 30k jobs per day with a single WMS
- Each CMS application service may use as many WMSs in parallel as needed: up to 50k jobs per day from a single CMS server
- We are able to cover the CMS needs by deploying a few instances of CRAB Server/ProdAgent
- Everyday experience and usage has long allowed us to improve the system and provide feedback to the WMS/gLite developers

17 Author list
G. Codispoti, C. Grandi, A. Fanfani (INFN Bologna); D. Spiga, V. Miccio, A. Sciabà (CERN); F. Fanzago (CERN, INFN-CNAF); M. Cinquilli (INFN Perugia); F. Farina (INFN Milano, CERN); S. Lacaprara (INFN-LNL); S. Belforte (INFN Trieste); D. Bonacorsi, A. Sartirana, D. DonGiovanni, D. Cesini (INFN-CNAF); S. Lemaitre, M. Litmaath, E. Roche, Y. Calas (CERN); S. Wakefield (IC London); J. Hernandez (CIEMAT)

