Use of the gLite-WMS in CMS for production and analysis
Giuseppe Codispoti, on behalf of the CMS Offline and Computing

Outline
- BossLite: the common interface to Grid and batch systems for the CMS tools
- gLite usage through BossLite
- gLite integration in the CMS tools
- Issues and proposed solutions
- WMS usage in analysis and MC production activities
- Overall performance
- Conclusions

Computing Model Overview
A complex system:
- Access to distributed resources through Grid middleware
- Access to local batch systems (e.g. local farms and the LSF-based CERN Analysis Facility, CAF, for high-priority tasks)
- Access to CMS-specific Workload and Data Management tools
High job rate:
- Large experimental community (~3k people)
- Huge amount of data produced by the experiment (up to 2 PB/year)
- A comparable amount of Monte Carlo samples to be generated and accessed
See talk [192] by I. Fisk: Challenges for the CMS Computing Model in the First Year

BossLite: a common Grid/batch interface with logging facilities
- CMS interface to different Grids [WLCG, OSG] and batch systems [LSF, ARC, SGE...]
- Database to track and log information in an entity-relation schema
- Information logically remapped into Python objects that can be used transparently by the CMS framework and tools
- High efficiency and safe operation in a multithreaded environment are required
- Database interaction through safe sessions and connections:
  - Pool of connections
  - Thread safe
  - Focused on connection stability
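To illustrate the connection handling described above, below is a minimal sketch of a thread-safe pool of database connections shared among worker threads, assuming SQLite and a hypothetical "jobs" table as stand-ins for the real BossLite backend and schema:

import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Minimal thread-safe pool of database connections (illustrative only)."""

    def __init__(self, factory, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    @contextmanager
    def session(self):
        # Block until a connection is free, hand it to the caller,
        # and always return it to the pool afterwards.
        conn = self._pool.get()
        try:
            yield conn
        finally:
            self._pool.put(conn)

# SQLite as a stand-in for the real database backend;
# check_same_thread=False lets a connection be reused across threads.
pool = ConnectionPool(lambda: sqlite3.connect("bosslite.db", check_same_thread=False), size=3)

def log_job_status(job_id, status):
    # Assumes a hypothetical "jobs" table; shown only to illustrate the pattern.
    with pool.session() as conn:
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
        conn.commit()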

BossLite Architecture
- User Task Description: identical jobs accessing different parts of a dataset or producing parts of an MC sample
- Plugins for transparent interaction with Grid and local batch systems
- Database backend for logging and bookkeeping (shared job info, job static info, runtime info per submission)
- Pool of DB connections shared among threads
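A minimal sketch of how the task/job entities above could be modelled in Python; the class and field names are illustrative assumptions, not the actual BossLite schema:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RunningJob:
    """Runtime information for one submission attempt of a job."""
    submission: int
    grid_id: Optional[str] = None        # e.g. the gLite job identifier
    status: str = "Created"
    destination: Optional[str] = None    # CE/queue chosen by the WMS

@dataclass
class Job:
    """Static job information plus the history of its submission attempts."""
    name: str
    arguments: str                       # e.g. which part of the dataset to process
    runs: List[RunningJob] = field(default_factory=list)

@dataclass
class Task:
    """A user task: a set of identical jobs over different dataset slices."""
    name: str
    jobs: List[Job] = field(default_factory=list)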

The BossLite interface to gLite
- Bulk submission, bulk match-making, bulk status query:
  - Faster, more efficient
- Access through the WMProxy Python API:
  - Needed to allow the association between BossLite jobs and their Grid identifiers (not trivial through the CLI)
  - No parsing of "human readable" streams
  - But the use of the API is complex and the UI tools (e.g. UI configuration, input sandbox transfers, ...) are lost
  - Exposed to Python compatibility issues
- Access to LB information through the API:
  - Easy and fast check of job status
  - Easy to extract much more useful information at runtime: destination queue, status reason, scheduling timestamps...
Note: the CMS computing model uses its own data location system; the WMS match-making is only used to select among the available resources hosting the selected data.
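To make the bulk-operation idea concrete, here is a sketch of a scheduler plugin performing a collection submission and a bulk status query, reusing the Task/Job/RunningJob classes sketched above; the wmproxy_client and lb_client objects and their methods are hypothetical stand-ins for the real WMProxy and LB APIs:

class GliteSchedulerPlugin:
    """Illustrative plugin wrapping bulk gLite operations (hypothetical clients)."""

    def __init__(self, wmproxy_client, lb_client):
        # Hypothetical client objects standing in for the WMProxy and LB APIs.
        self.wmproxy = wmproxy_client
        self.lb = lb_client

    def submit(self, task):
        """Submit all jobs of a task as one collection and record the grid ids."""
        jdls = [job.arguments for job in task.jobs]      # placeholder for per-node JDL snippets
        node_ids = self.wmproxy.submit_collection(jdls)  # hypothetical bulk submission call
        for job, grid_id in zip(task.jobs, node_ids):
            job.runs.append(RunningJob(submission=len(job.runs) + 1, grid_id=grid_id))

    def query(self, task):
        """Bulk status query through the LB, updating the runtime objects."""
        ids = [job.runs[-1].grid_id for job in task.jobs if job.runs]
        statuses = self.lb.bulk_status(ids)              # hypothetical bulk query call
        for job in task.jobs:
            if not job.runs:
                continue
            run = job.runs[-1]
            info = statuses.get(run.grid_id, {})
            run.status = info.get("status", run.status)
            run.destination = info.get("destination", run.destination)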

CMS use cases
- Monte Carlo production:
  - Automatic, parallelized system for the simulation/reconstruction of huge data samples
- Basic analysis tasks (single user):
  - Transparent usage of the Grid infrastructure as well as of local batch systems, integrated with the CMS workload management system
- Steady-state (regime) and intensive analysis tasks:
  - Centralized system dealing with huge tasks, automating the analysis workflow and optimizing Grid usage
  - High-concurrency system for a multi-user environment

Production Agent (ProdAgent)
- Sequential job submission (collections)
- Multi-threaded status query
- Multi-threaded output retrieval for log files and production reports
- Files limited in size, produced directly at the CMS destination sites (merged later through ad hoc jobs)
See Poster [82] by F. Van Lingen: MC production and processing system – Design and experiences
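A minimal sketch of the multi-threaded status query pattern, polling several tasks in parallel with a thread pool; plugin.query refers to the illustrative scheduler plugin sketched above, not to the actual ProdAgent code:

from concurrent.futures import ThreadPoolExecutor

def poll_tasks(plugin, tasks, max_workers=4):
    """Query the status of several tasks in parallel (illustrative only)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker performs one bulk LB query for its task.
        list(pool.map(plugin.query, tasks))

def count_states(tasks):
    """Summarize job states across all tasks, e.g. for a production report."""
    summary = {}
    for task in tasks:
        for job in task.jobs:
            state = job.runs[-1].status if job.runs else "Created"
            summary[state] = summary.get(state, 0) + 1
    return summary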

CMS Remote Analysis Builder (CRAB)
- All UI functionalities wrapped with the WMProxy/LB API

CRAB Analysis Server
- Multi-threaded job submission (collections, many users concurrently)
- Multi-threaded status query
- Multi-threaded output handling and WMS purge
- Direct ISB/OSB shipping from the WN: variable size, possible to implement CMS-specific policies bypassing the WMS (using gLite features!)
See talk [77] by D. Spiga: Automatization of User Analysis Workflow in CMS
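A minimal sketch of how submissions from many users could be queued and dispatched concurrently by worker threads; the queue-based design is an assumption for illustration, not the actual CRAB Analysis Server implementation:

import queue
import threading

submission_queue = queue.Queue()

def submission_worker(plugin):
    """Worker thread: take (user, task) requests and submit them as collections."""
    while True:
        user, task = submission_queue.get()
        try:
            plugin.submit(task)      # bulk collection submission for this user's task
        finally:
            submission_queue.task_done()

def start_workers(plugin, n_threads=5):
    for _ in range(n_threads):
        threading.Thread(target=submission_worker, args=(plugin,), daemon=True).start()

# Tasks coming from different users are enqueued and processed concurrently:
# submission_queue.put(("alice", task_a))
# submission_queue.put(("bob", task_b))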

Evolution
- Fruitful collaboration with the gLite developers to fix bugs and implement the features CMS needs
- Proposed XML/JSON output for the CLI commands:
  - Same level of detail for the job association
  - Reuses all UI functionalities: everything is already there, it just needs to be made accessible
  - Specific error logging
  - Accessible through a simple subprocess, encapsulating environment/compatibility issues
  - Simpler intermediate layer
  - Reuse & simplify!
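A sketch of how the proposed machine-readable CLI output could be consumed from Python through a simple subprocess; the --json option and the returned structure follow the proposal above and are hypothetical, not guaranteed features of glite-wms-job-status:

import json
import subprocess

def job_status(grid_ids):
    """Call the gLite CLI and parse a machine-readable status report.

    NOTE: the --json flag is the *proposed* extension discussed above,
    not an established option; treat this as a sketch of the intended usage.
    """
    cmd = ["glite-wms-job-status", "--json"] + list(grid_ids)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # Hypothetical output format: {grid_id: {"status": ..., "destination": ...}}
    return json.loads(result.stdout)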

Single WMS usage in everyday activities
- Typical instantaneous load of a single WMS (jobs running/idle): up to 5 kJobs handled simultaneously per WMS in everyday analysis
- Daily job rate, including ended jobs: the typical job load for a single WMS may already reach 15 kJobs
- Stress tests reached 30 kJobs per day for a single WMS without any sign of a breaking point
- MC production and analysis jobs are balanced over many WMSs: currently 7 for analysis, 4 for MC production
[Plots: active/ended/aborted jobs and idle/running jobs for a single WMS]

Overall performance of the CMS tools
- The limits reached so far are mainly due to tracking and output retrieval/handling
- Some optimizations are already in place, other small tweaks are possible
- The WMS architecture is such that the system scales linearly with the number of WMSs:
  - Add as many WMSs to a CMS service as needed
- The CMS architecture is similar:
  - Deploy as many instances of ProdAgent and CRAB Server as needed: no scaling problems foreseen at the expected rates (50-100 kJobs/day for MC production and 100-200 kJobs/day for analysis)
- A single CRAB Server instance in multi-user mode reached 50 kJobs per day using 2 WMSs
- A single ProdAgent instance reached around 30 kJobs per day
  - Lower performance in the output copy from the WMS: we plan to reduce the size and number of the files to be retrieved
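As an illustration of the "add as many WMSs as needed" scaling argument, a minimal sketch of balancing submissions over a configurable list of WMS endpoints; the endpoint URLs and the round-robin policy are assumptions for illustration only:

from itertools import cycle

# Hypothetical WMS endpoints; in practice these would come from configuration.
WMS_ENDPOINTS = cycle([
    "https://wms1.example.org:7443/glite_wms_wmproxy_server",
    "https://wms2.example.org:7443/glite_wms_wmproxy_server",
])

def pick_wms():
    """Round-robin choice of the next WMS endpoint for a submission."""
    return next(WMS_ENDPOINTS)

def submit_balanced(plugin_factory, tasks):
    """Spread task submissions over the configured WMSs."""
    for task in tasks:
        endpoint = pick_wms()
        plugin = plugin_factory(endpoint)   # build a scheduler plugin bound to this WMS
        plugin.submit(task)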

CMS Grid activity with the gLite WMS
- ~75k jobs per day:
  - ~30k analysis
  - ~20k MC production
  - ~25k other activities
- Jobs uniformly distributed over more than 40 sites
- CMS has been using the gLite WMS for years
- Increase in activity during the last year:
  - May 2008 challenge (CCRC08, see [312], [389])
  - From May 2008 to March 2009
Poster [389]: CMS results from Computing Challenges and Commissioning of the computing infrastructure
Poster [312]: Commissioning Distributed Analysis at the CMS Tier-2 Centers

Job distribution per activity
From May 2008 to March 2009: 23 M total jobs submitted
- 8.8 M analysis jobs: 58% success, 25% application failures, 12% grid failures, 5% cancelled
- 5.3 M MC production jobs: 81% success, ~9% application failures, 10% grid failures
- 6.6 M JobRobot jobs: 87% success, 4% application failures, 7% grid failures, 2% cancelled
- plus 2.3 M jobs from other test activities
About 78% of the total analysis jobs have been submitted with the gLite WMS for years (the rest mainly CondorG)
~600 distinct real users in the last 3 months

Conclusions
- CMS successfully uses the gLite WMS for Monte Carlo production and analysis tasks
- We are able to reach more than 30 kJobs per day with a single WMS
- Each CMS application service may use as many WMSs in parallel as needed:
  - Up to 50 kJobs per day from a single CMS server
- We are able to cover the CMS needs by deploying a few instances of CRAB Server/ProdAgent
- The everyday experience and usage has allowed us, over the years, to improve the system and provide feedback to the WMS/gLite developers

Author list
G. Codispoti, C. Grandi, A. Fanfani (Bo-INFN)
D. Spiga, V. Miccio, A. Sciabà (CERN)
F. Fanzago (CERN, CNAF-INFN)
M. Cinquilli (Perugia-INFN)
F. Farina (Mi-INFN, CERN)
S. Lacaprara (LNL-INFN)
S. Belforte (Trieste-INFN)
D. Bonacorsi, A. Sartirana, D. DonGiovanni, D. Cesini (CNAF-INFN)
S. Lemaitre, M. Litmaath, E. Roche, Y. Calas (CERN)
S. Wakefield (IC-London)
J. Hernandez (CIEMAT)