Production Tools in ATLAS, RWL Jones, GridPP EB, 24th June 2003


1 Production Tools in ATLAS, RWL Jones, GridPP EB, 24th June 2003

2 RWL Jones, Lancaster University
Grid in ATLAS
- ATLAS is a global collaboration, so the various Grid flavours are important
- Both US ATLAS and NorduGrid provide their own production tools
[Diagram labels: US-ATLAS, EDG Testbed Prod, NorduGrid]

3
- All the services are either taken from Globus, or written using Globus libraries and API
  - Should be fairly compatible with Globus-based solutions
- Information system knows everything
  - Substantially re-worked and patched Globus MDS
  - Distributed and multi-rooted
  - Allows for a mesh topology
- The server (“Grid Manager”) on each gatekeeper does most of the job
  - No need for a centralized broker
  - Pre- and post-stages files
  - Interacts with PBS
  - Keeps track of job status
  - Cleans up the mess
  - Sends mails to users
- The client (“User Interface”) does the Grid job submission, monitoring, termination, retrieval, cleaning, etc.
  - Interprets user’s job task
  - Gets the testbed status from the information system
  - Forwards the task to the best Grid Manager
  - Does some file uploading, if requested
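
The client-side step "gets the testbed status from the information system, forwards the task to the best Grid Manager" could be sketched roughly as below. This is only an illustration: the `ClusterStatus` fields, the ranking heuristic (fewest queued jobs, then most free CPUs), and the cluster and environment names are all assumptions, not the middleware's actual data model or algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterStatus:
    """Hypothetical snapshot of one cluster, as an information system might publish it."""
    name: str
    free_cpus: int
    queued_jobs: int
    runtime_envs: frozenset


def pick_grid_manager(clusters, required_env, cpus_needed=1):
    """Return the name of the 'best' cluster for a job, or None if none qualifies.

    'Best' is an assumed heuristic: fewest queued jobs, ties broken by most free CPUs.
    """
    candidates = [
        c for c in clusters
        if required_env in c.runtime_envs and c.free_cpus >= cpus_needed
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: (c.queued_jobs, -c.free_cpus))
    return best.name


# Made-up testbed status, standing in for a query to the information system.
testbed = [
    ClusterStatus("cluster-a", free_cpus=4, queued_jobs=10, runtime_envs=frozenset({"atlas-sim"})),
    ClusterStatus("cluster-b", free_cpus=2, queued_jobs=0, runtime_envs=frozenset({"atlas-sim"})),
    ClusterStatus("cluster-c", free_cpus=16, queued_jobs=0, runtime_envs=frozenset({"other-env"})),
]

chosen = pick_grid_manager(testbed, "atlas-sim")
print(chosen)
```

Because the choice is made in the client from published status, no central broker is involved, which matches the "no need for a centralized broker" point on the slide.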

4 Features and problems
- Features:
  - Relatively simple to join, expands rapidly
  - Installation is done on a single machine
  - Hides complexity of the distributed resources
  - Very convenient Replica Catalog implementation
  - Highly stable and reliable
  - Non-intrusive middleware
  - Accepts EDG certificates
  - Almost any runtime environment can be set up
- Problems:
  - Standard (à la Globus2) authentication and authorization mechanisms
  - Simplified (not more than in Globus2) data management system
  - No persistent book-keeping service
  - Simplified recovery mechanisms (as much as LRMS provides)
  - Lacks big storage facilities
  - Only command-line interface
  - No standardized procedure for runtime environment installation and validation

5 US GRAT Software
- GRid Applications Toolkit
- Used for U.S. Data Challenge production
- Based on Globus, Magda, AMI & MySQL
- Shell & Python scripts, modular design
- Rapid development platform
  - Essentially scripts
  - Quickly develop packages as needed by DC
    - Single particle production
    - Higgs & SUSY production
    - Pileup production & data management
    - Reconstruction
- Test grid middleware, test grid performance
- Modules can be easily enhanced or replaced by Condor-G, EDG resource broker, Chimera, replica catalogue, OGSA… (in progress)

6 GRAT Execution Model
1. Resource Discovery
2. Partition Selection
3. Job Creation
4. Pre-stage
5. Batch Submission
6. Job Parameterization
7. Simulation
8. Post-stage
9. Cataloging
10. Monitoring
[Diagram labels: DC1 Prod. (UTA), Remote Gatekeeper, Replica (local), MAGDA (BNL), Param (CERN), Batch Execution, scratch; arrows are numbered by step]
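
The ten steps can be read as an ordered pipeline. The sketch below shows one way such a script-driven sequence might be organised. Only the step names come from the slide; the `run_pipeline` function, the handler mapping, and the stop-on-first-failure behaviour are assumptions made for illustration, not GRAT code.

```python
# Step names taken from the slide, in execution order.
STEPS = [
    "Resource Discovery",
    "Partition Selection",
    "Job Creation",
    "Pre-stage",
    "Batch Submission",
    "Job Parameterization",
    "Simulation",
    "Post-stage",
    "Cataloging",
    "Monitoring",
]


def _noop(context):
    """Placeholder used when no handler is registered for a step."""
    return None


def run_pipeline(handlers, context):
    """Run each step's handler in order and return the list of completed steps.

    Stops at the first handler that raises (an assumed policy) and records
    the failing step and error message in the shared context dict.
    """
    completed = []
    for step in STEPS:
        handler = handlers.get(step, _noop)
        try:
            handler(context)
        except Exception as exc:
            context["failed_step"] = step
            context["error"] = str(exc)
            break
        completed.append(step)
    return completed


# With no handlers registered, every step is a no-op and all ten complete.
completed = run_pipeline({}, {})
print(len(completed))
```

Keeping each step behind its own handler mirrors the modular design mentioned for GRAT, where individual modules can be swapped for other middleware.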

7 US Middleware Evolution
- Used in current production software (GRAT & Grappa)
- Tested successfully (not yet used for large scale production)
- Under development and testing
- Tested for simulation (may be used for large scale reconstruction)

8
- What is the Atlas Commander?
  - Graphical interactive tool to support the production manager:
    - define jobs in large quantities
    - submit and monitor progress
    - scan log files for (un)known errors
    - update bookkeeping databases (AMI, Magda)
    - clean up in case of failures
  - Test bed for GANGA MC production components
- AtCom has its own web site
  - http://atlas-project-atcom.web.cern.ch/atlas-project-atcom/
  - Contains user guide, developer’s guide, documentation, downloads, relevant contact e-mails, etc.

9
- Architecture: application + plug-ins
  [Diagram labels: AtCom core; AMIMgt, MagdaMgt; Bookkeeping DBs: Magda, AMI; Plug-ins: LSFComputingSystem, EDGComputingSystem, NGComputingSystem, PBSComputingSystem, ...; Clusters]
- Two main functions of AtCom
  - definition of jobs
  - job submission/monitoring

10
- Architecture (continued)
  - A plug-in implements the abstract ‘cluster’ interface for a specific kind of cluster, e.g. LSF
  - A plug-in is a Java class + configuration parameters, e.g. LSF@TIMBUKTU
  - The AtCom configuration file defines all existing plug-ins and allows each to have its own configuration section
    - They are loaded at run-time
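
The plug-in idea (an abstract cluster interface, a concrete class plus configuration parameters, instantiation at run-time from a configuration file) might look like the following. AtCom's plug-ins are Java classes; this sketch uses Python purely for illustration. The `submit` and `status` method names, the configuration layout, and the `site` and `queue` parameters are assumptions. Only the `LSFComputingSystem` name and the `LSF@TIMBUKTU` label come from the slides.

```python
from abc import ABC, abstractmethod


class ComputingSystem(ABC):
    """Assumed shape of the abstract 'cluster' interface."""

    def __init__(self, **params):
        self.params = params

    @abstractmethod
    def submit(self, job_name):
        """Submit a job and return an identifier for it."""

    @abstractmethod
    def status(self, job_id):
        """Return the current status of a job."""


class LSFComputingSystem(ComputingSystem):
    """Placeholder plug-in; a real one would talk to the batch system."""

    def submit(self, job_name):
        return f"{self.params['site']}:{job_name}"

    def status(self, job_id):
        return "RUNNING"


# Registry of known plug-in classes, looked up by name at run-time.
PLUGIN_CLASSES = {"LSFComputingSystem": LSFComputingSystem}

# Hypothetical configuration: one section per configured plug-in instance.
CONFIG = {
    "LSF@TIMBUKTU": {
        "class": "LSFComputingSystem",
        "params": {"site": "TIMBUKTU", "queue": "long"},
    },
}


def load_plugins(config):
    """Instantiate one plug-in per configuration section."""
    plugins = {}
    for label, section in config.items():
        cls = PLUGIN_CLASSES[section["class"]]
        plugins[label] = cls(**section["params"])
    return plugins


plugins = load_plugins(CONFIG)
print(plugins["LSF@TIMBUKTU"].submit("job1"))
```

The same class can be configured several times under different labels, which is what distinguishes a plug-in class (LSF) from a configured instance (LSF@TIMBUKTU).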

11
- Available plug-ins
  - LSF: well understood and supported
  - NorduGrid: development suspended
  - PBS: developed by Alvin Tan
  - EDG: working, but no EDG-based clusters used in production
  - BQS: developed by Jerome Fulachier

12
- Bookkeeping databases
  - 5 logical database domains, two physical databases
    - physics meta-data
    - permanent production log
    - recipe catalog
    - transient production log
    - replica catalog
  - AMI (Atlas Meta-data Interface): MySQL DB hosted at Grenoble
  - Magda (Manager for grid-based data): MySQL DB hosted at BNL

13
- Monitoring
  - Jobs you submit are automatically added to the list of monitored jobs
  - Running jobs can be recovered from the part_run_info table if needed
    - e.g. after having closed AtCom
  - Any other partition can be added to the list as well
    - using the SQL query composer
    - allows you to “see” also finished, defined jobs
    - for the bar charts of course
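
Recovering the list of running jobs after a restart amounts to a query against the part_run_info table. The sketch below illustrates the idea with an in-memory SQLite database standing in for the production MySQL database. Only the table name comes from the slide; the `partition_id` and `status` columns and the sample rows are hypothetical.

```python
import sqlite3


def recover_running_jobs(conn):
    """Return the partition ids currently marked RUNNING (column names are assumed)."""
    rows = conn.execute(
        "SELECT partition_id FROM part_run_info "
        "WHERE status = ? ORDER BY partition_id",
        ("RUNNING",),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in for the bookkeeping database, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part_run_info (partition_id TEXT PRIMARY KEY, status TEXT)"
)
conn.executemany(
    "INSERT INTO part_run_info VALUES (?, ?)",
    [("p001", "RUNNING"), ("p002", "DONE"), ("p003", "RUNNING")],
)

monitored = recover_running_jobs(conn)
print(monitored)
```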


15
- When a job moves from RUNNING to DONE, post-processing commences
  - Resolve the validation script's logical name into a physical name and apply it to stdout/stderr in temp locations
    - returns 1=OK, 2=Undecided or 3=Failed
  - If OK:
    - register output files with the Magda replica catalog
    - resolve the extract script and apply it to stdout
    - copy/move logfiles to final destination
    - set status of partition to Validated
  - If Failed:
    - delete output files
  - If Undecided:
    - mark job as such
    - production manager can look at the output of the validation script or at the logfiles themselves and then force a decision as OK or Failed
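
The post-processing decision can be summarised as a small dispatch on the validation script's return code. The sketch below follows the slide's codes (1=OK, 2=Undecided, 3=Failed) and its list of actions. The `post_process` function, its `force` argument, and the "Failed" and "Undecided" status strings are assumptions; the slide only names the "Validated" status.

```python
from enum import IntEnum


class Validation(IntEnum):
    """Return codes of the validation script, as listed on the slide."""
    OK = 1
    UNDECIDED = 2
    FAILED = 3


def post_process(validation_code, force=None):
    """Return (actions, partition_status) for a job that has reached DONE.

    `force` models the production manager overriding an Undecided result
    with OK or Failed; it is ignored for any other result.
    """
    result = Validation(validation_code)
    if result is Validation.UNDECIDED and force is not None:
        forced = Validation(force)
        if forced is Validation.UNDECIDED:
            raise ValueError("a forced decision must be OK or Failed")
        result = forced

    if result is Validation.OK:
        actions = [
            "register output files with the replica catalog",
            "apply extract script to stdout",
            "copy/move logfiles to final destination",
        ]
        return actions, "Validated"
    if result is Validation.FAILED:
        return ["delete output files"], "Failed"
    return ["mark job as undecided"], "Undecided"


actions, status = post_process(1)
print(status)
```

Returning the actions as data rather than executing them keeps the sketch side-effect free; a real tool would perform each step and handle its failures.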

16 The Future
- GANGA is starting to provide the required functionality
- For DC2, a new tool is being built, and the GANGA core should be its basis
- DCs require immediate solutions
- Robust tools require slow development

