1
CMS HLT production using Grid tools
Flavia Donno (INFN Pisa), Claudio Grandi (INFN Bologna), Ivano Lippi (INFN Padova), Francesco Prelz (INFN Milano), Andrea Sciabà (INFN Pisa), Massimo Sgaravatto (INFN Padova), Zhen Xie (INFN Pisa)
2
M. Sgaravatto - INFN Padova
Introduction
Goals
- Evaluate the existing GRID technologies with real applications in real production environments
- Can these GRID tools be useful to “manage” these HEP applications?
Collaboration between:
- CMS
- INFN-GRID WP 1 (Installation and Evaluation of the Globus toolkit), http://www.infn.it/globus
- DataGrid WP 1 (Grid Workload Management)
3
Applications
[Diagram of the CMS production chain. Labels: MC Prod. (HEPEVT ntuples, CMSIM, Signal Zebra files with HITS); ORCA Prod. (ORCA ooHit Formatter, ORCA Digitization (merge signal and MB), Objectivity Database, MB Objectivity Database, Catalog import); HLT Grp Databases (HLT Algorithms, New Reconstructed Objects); Mirrored Db’s]
4
Tested configuration for CMS production
[Diagram labels: Production manager; condor_submit (Globus Universe); Condor-G; Submit jobs; Globus GRAM over CONDOR; Globus GRAM over LSF; Local Resource Management Systems; CMS Farms; sites Bologna, Pisa, Padova]
- Condor-G as a reliable, crash-proof submitting service
- GRAM as a uniform interface to different resource management systems
5
Overview
- PC farms at each site installed and configured using the CMS farm kickstart toolkit
- PC farms managed by possibly different local resource management systems
- Globus GRAM as a uniform interface to the different local resource management systems
- Globus deployment using the INFNGRID distribution toolkit (see Zhen’s presentation), taking the INFN setup into account
6
Overview
- Condor-G as a reliable, crash-proof submitting service
- Job submission and monitoring by the production manager from a single machine
- The production manager decides on which Globus resource (farm) the job must be executed
- Executable and input files stored on the executing farm
- Output files created on the executing machine
- Log files created on the submitting machine
- Authentication using Globus GSI (use of certificates signed by the INFN CA)
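The submission setup listed above can be illustrated with a minimal Python sketch that writes a Condor-G submit description for one job in the Globus universe. It is only an illustration under stated assumptions, not the production scripts: the gatekeeper host, executable path and log path are hypothetical, and the submit commands used (universe, globusscheduler, executable, transfer_executable, log, queue) follow the Condor-G syntax of that period as far as we know it.

```python
def make_submit_description(gram_contact: str, executable: str, log_file: str) -> str:
    """Return the text of a Condor-G submit description for a single job.

    gram_contact: GRAM contact string of the farm chosen by the production
                  manager, e.g. "<gatekeeper host>/jobmanager-lsf".
    executable:   path of the executable on the executing farm.
    log_file:     path of the Condor-G user log on the submitting machine.
    """
    lines = [
        # Run the job through Globus GRAM rather than a local Condor pool.
        "universe = globus",
        # The farm is chosen explicitly by the production manager.
        f"globusscheduler = {gram_contact}",
        f"executable = {executable}",
        # The executable is already stored on the executing farm,
        # so it is not staged from the submitting machine.
        "transfer_executable = false",
        # Log file kept on the submitting machine.
        f"log = {log_file}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(make_submit_description(
        gram_contact="gatekeeper.example.infn.it/jobmanager-lsf",
        executable="/opt/cms/bin/run_production.sh",
        log_file="/home/prodmanager/logs/job_0001.log",
    ))
```

The resulting text would be saved to a file and passed to condor_submit on the submitting machine. Pointing a job at a different farm only requires changing the GRAM contact string, which is the sense in which GRAM acts as a uniform interface.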
7
Results
- The CMS production using Globus and Condor-G failed
- Many memory leaks found in the Globus jobmanager!
- ... but we (Francesco Prelz, INFN Milano) have been able to provide fixes for these bugs
- Fixes reported to the Globus team
- Feedback received only on the bugs in the GAA and GSS modules (new fixes “merged” with the original ones)
- Work in progress: tests with these fixes
- Fixes included in the INFN-GRID distribution
8
Other problems
Globus GRAM
- Some minor bugs found and fixed (fixes included in the INFN-GRID distribution)
- Some “major” problems still need to be addressed:
  - Scalability (one jobmanager for each job)
  - Reliability (the jobmanager is not persistent)
  - …
Condor-G
- Some problems in the current implementation (it’s a prototype):
  - Scalability on the submitting machine
  - Logging
  - …
9
Next steps
- New tests during the next CMS productions with the “patched” Globus jobmanager
- New tests with the new implementations of Condor-G and the Globus jobmanager (by the Condor team)
- Tests with bypass
  - Tool written by D. Thain (Condor team) that allows redirection of standard input/output/error to a remote machine (the submitting machine) while the program is running (split execution system)
  - Use of GSI authentication mechanisms
  - New implementation resilient to several kinds of failure
- Tests with the first WP 1 prototype
- “Integration” with software provided by the other WPs (e.g. replica management tools, ...)
10
Prototype workload management system architecture
[Diagram labels: condor_submit (Globus Universe); Condor-G; Master; Submit jobs (using Class-Ads); Grid Information Service (GIS); Resource Discovery; Information on characteristics and status of local resources; Globus GRAM over CONDOR, LSF and PBS at Site1, Site2 and Site3; Local Resource Management Systems; Farms; Other info]
- Globus GRAM as a uniform interface to different local resource management systems
- Condor-G able to provide a reliable, crash-proof job submission service
- Master chooses the Globus resources to which the jobs must be submitted
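The role of the Master in this architecture can be illustrated with a small Python sketch: given resource information of the kind a Grid Information Service might publish, pick the Globus resource to which a job is submitted. This is a plain-Python stand-in, not the ClassAd language and not the WP 1 prototype. The Resource fields, the selection policy (most free CPUs among farms that satisfy the job requirements) and all host names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Resource:
    """Hypothetical snapshot of one farm as published by an information service."""
    gram_contact: str      # GRAM contact string, e.g. "<host>/jobmanager-pbs"
    lrms: str              # local resource management system: condor, lsf, pbs
    free_cpus: int         # CPUs currently idle on the farm
    has_input_data: bool   # whether the job's input files are stored there


def choose_resource(resources: Iterable[Resource],
                    min_free_cpus: int = 1) -> Optional[Resource]:
    """Return the farm a job should go to, or None if no farm qualifies.

    Requirement (assumed): the farm holds the input data and has enough idle CPUs.
    Rank (assumed): prefer the farm with the most idle CPUs.
    """
    candidates = [
        r for r in resources
        if r.has_input_data and r.free_cpus >= min_free_cpus
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.free_cpus)


if __name__ == "__main__":
    # Hypothetical sites, mirroring the Site1/Site2/Site3 labels of the diagram.
    farms = [
        Resource("site1.example.org/jobmanager-condor", "condor", 4, True),
        Resource("site2.example.org/jobmanager-lsf", "lsf", 12, True),
        Resource("site3.example.org/jobmanager-pbs", "pbs", 30, False),
    ]
    chosen = choose_resource(farms)
    print(chosen.gram_contact if chosen else "no suitable resource")
```

The chosen GRAM contact string would then be handed to Condor-G for the actual submission, so the choice of farm moves from the production manager to the Master while the submission path stays the same.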