Prototyping Virtual Data Technologies in ATLAS Data Challenge 1 Production K. De (UT Arlington), D. Malon (ANL), P. Nevski (BNL), A. Vaniachine (ANL)

CHEP 2003, March 24-28, La Jolla. Pavel Nevski (BNL/CERN)

2 ATLAS Data Challenge
Computational challenges facing the LHC experiments are unprecedented. For ATLAS alone, the raw data amount to 1.3 PB/year. A traditional centralized computing infrastructure would be simpler and would reduce the data management overhead; in reality, however, CERN alone can provide only a fraction of the required resources. The emerging worldwide computing model embraces a global data and computation infrastructure to meet the LHC computing needs.

3 Distributed Production
A significant fraction of ATLAS Data Challenge 1 (DC1) was performed in a Grid environment. Grid technologies will naturally offer all Collaboration members a uniform way of carrying out computing tasks.

4 Centralized Management
For large production tasks distributed worldwide to run efficiently, it is essential to establish centralized production management tools. The ATLAS metadata catalogue AMI and the replica catalogue Magda exemplify such Grid tools deployed in DC1. To complete the data management architecture for distributed production, ATLAS prototyped Virtual Data services.

5 Introducing Virtual Data
Prevailing views have been data-centric: we need to produce the data (ASAP), and recipes are just tools used in the process. Their value has not been fully appreciated. Preparing recipes for data production requires significant effort and encapsulates considerable knowledge. Because production recipes must be fully validated, in DC0 it took more time to develop proper recipes than to run the production. The GriPhyN project (www.griphyn.org) introduced a different perspective: recipes are as valuable as the data. If you have the recipes (Virtual Data), you do not need the data: you can reproduce the data on demand.
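The "recipes instead of data" idea can be illustrated with a toy catalog that stores recipes and treats stored data only as a cache: if the data are missing, the recipe is re-run. This is a minimal sketch under stated assumptions: `Recipe`, `VirtualDataCatalog`, `toy_generator`, and the dataset name `dc1.evgen.0001` are hypothetical, and the transformation is assumed to be deterministic (same parameters, same output), which real production achieves by fixing random seeds and software versions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Recipe:
    """Hypothetical production recipe: a transformation plus its parameters."""
    transformation: Callable[..., bytes]
    params: tuple  # (name, value) pairs


@dataclass
class VirtualDataCatalog:
    """Toy catalog: recipes are authoritative, stored data is only a cache."""
    recipes: dict = field(default_factory=dict)
    cache: dict = field(default_factory=dict)

    def register(self, name: str, recipe: Recipe) -> None:
        self.recipes[name] = recipe

    def get(self, name: str) -> bytes:
        # Return cached data if present; otherwise reproduce it on demand.
        if name not in self.cache:
            recipe = self.recipes[name]
            self.cache[name] = recipe.transformation(**dict(recipe.params))
        return self.cache[name]


def toy_generator(seed: int, n_events: int) -> bytes:
    # Deterministic stand-in for an event generator: same inputs, same bytes.
    return hashlib.sha256(f"{seed}:{n_events}".encode()).digest() * n_events


catalog = VirtualDataCatalog()
catalog.register("dc1.evgen.0001", Recipe(toy_generator, (("seed", 42), ("n_events", 3))))
first = catalog.get("dc1.evgen.0001")
catalog.cache.clear()                          # "lose" the data...
assert catalog.get("dc1.evgen.0001") == first  # ...and reproduce it from the recipe
```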

6 Data Management Architecture
AMI: ATLAS Metadata Interface
MAGDA: MAnager for Grid-based DAta
Virtual Data Catalog prototype: COOKBOOK
The collection of production recipes, COOKBOOK, complements the ATLAS Grid tools.

7 Certified Production Recipes
For each data transformation step in the DC1 processing pipeline, the essential content of the verified data production recipes was captured and preserved in the COOKBOOK database. During DC1 production, the COOKBOOK database server delivered, in a controlled way, the validated production parameters and the templated production recipes for thousands of event generation and detector simulation jobs around the world, simplifying production management.
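"Delivered in a controlled way" can be read as: a job receives parameters only for a known step and dataset, only if they have been validated, and only in read-only form. The sketch below illustrates that policy. It is an assumption-laden toy, not the actual COOKBOOK server interface; `ParameterSet`, `CookbookServer`, the step names, the dataset identifiers, and the parameter keys are all hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class ParameterSet:
    step: str        # hypothetical step name, e.g. "evgen" or "simul"
    dataset: str     # hypothetical dataset identifier
    params: dict
    validated: bool  # set only after the recipe has passed validation


class CookbookServer:
    """Toy stand-in for a recipe server (not the real COOKBOOK API)."""

    def __init__(self) -> None:
        self._entries = {}

    def store(self, entry: ParameterSet) -> None:
        self._entries[(entry.step, entry.dataset)] = entry

    def fetch(self, step: str, dataset: str) -> MappingProxyType:
        entry = self._entries.get((step, dataset))
        if entry is None:
            raise KeyError(f"no recipe for {step}/{dataset}")
        if not entry.validated:
            # Controlled delivery: unvalidated settings never reach a job.
            raise PermissionError(f"{step}/{dataset} is not validated")
        # Read-only view, so a job cannot silently alter certified values.
        return MappingProxyType(dict(entry.params))


server = CookbookServer()
server.store(ParameterSet("simul", "dc1.002000", {"n_events": 50, "random_seed": 2000}, True))
params = server.fetch("simul", "dc1.002000")
```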

8 Data-driven Workflow
[Figure: DC1 data-driven workflow. Transformation steps: Athena generators, atlsim simulation, atlsim pile-up, Athena reconstruction, Athena conversion, Athena QA, Athena Atlfast, Atlfast reconstruction. Data products: HepMC.root, digis.zebra, digis.root, geometry.zebra, geometry.root, recon.root, QA.ntuple, filtering.ntuple, Atlfast.root.]
The fully implemented DC1 workflow, comprising multiple independent data transformation steps, is complicated.
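One way to see why such a workflow is hard to manage by hand is to write its step dependencies down as a graph and let a topological sort derive a valid execution order and the steps that can run in parallel. The graph below is a simplified assumption loosely based on the steps named in this talk, not the exact DC1 workflow; it uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Simplified, assumed dependency graph: each step maps to the steps whose
# outputs it consumes. The real DC1 workflow has more steps and formats.
workflow = {
    "event_generation": set(),
    "detector_simulation": {"event_generation"},
    "pileup_digitization": {"detector_simulation"},
    "reconstruction": {"pileup_digitization"},
    "atlfast": {"event_generation"},
    "quality_assurance": {"reconstruction", "atlfast"},
}

# One valid sequential order for the whole workflow.
order = list(TopologicalSorter(workflow).static_order())


def batches(graph):
    """Yield groups of steps whose inputs are all available, so that each
    group could be dispatched in parallel."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    while ts.is_active():
        ready = sorted(ts.get_ready())
        yield ready
        ts.done(*ready)
```

For this graph, `batches(workflow)` yields event generation first, then detector simulation and Atlfast together, followed by the remaining steps one at a time.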

9 Benefits of Technology
Because the Virtual Data services project was innovative, the data volume allocated to the production test of the system was limited to about one fifth of all DC1 data. The major benefit of Virtual Data technologies was demonstrated by simplifying the management of the parameter collections, which differed for each of the more than two hundred datasets produced in DC1. The significant reduction in parameter management overhead enabled about half of all DC1 datasets to be processed successfully with the Virtual Data services prototype.
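A common way to keep hundreds of slightly different parameter collections manageable is to store shared defaults once and record only each dataset's differences, resolving the full set at lookup time. The sketch below shows that layering with `collections.ChainMap`. It is an assumed scheme for illustration, not a description of how COOKBOOK actually stored its collections; all keys, values, and dataset names are hypothetical.

```python
from collections import ChainMap

# Hypothetical shared defaults for one transformation step.
SIMULATION_DEFAULTS = {"n_events": 1000, "geometry": "dc1_default", "random_seed": 0}

# Only the per-dataset differences are stored (hypothetical values).
DATASET_OVERRIDES = {
    "dc1.002000": {"random_seed": 2000},
    "dc1.002001": {"random_seed": 2001, "n_events": 500},
}


def parameters_for(dataset: str) -> dict:
    """Resolve the full parameter set; dataset overrides win over defaults."""
    return dict(ChainMap(DATASET_OVERRIDES.get(dataset, {}), SIMULATION_DEFAULTS))
```

With this layout, changing a default touches one entry instead of every dataset, while each dataset's deviations remain explicit and auditable.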

10 Data Reproducibility
Another benefit of Virtual Data technologies is the simplification of the data reprocessing step. We have found it useful to distinguish, both conceptually and in design, the data required before a transformation is invoked from the data provenance information collected during and after the transformation.
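The distinction between "what must exist before the run" and "what is learned during and after the run" maps naturally onto two separate record types: an immutable prerequisites record that fully determines the output, and a mutable provenance record filled in as the job executes. This is a hypothetical sketch; the class names, fields, and the `run` helper are illustrative assumptions rather than the COOKBOOK schema.

```python
from __future__ import annotations

import hashlib
import platform
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class Prerequisites:
    """Everything that must exist *before* the transformation is invoked."""
    transformation: str  # hypothetical transformation name
    version: str
    parameters: tuple    # (name, value) pairs


@dataclass
class Provenance:
    """Facts that can only be recorded *during or after* the run."""
    host: str
    started: datetime
    finished: Optional[datetime] = None
    output_checksums: dict = field(default_factory=dict)


def run(pre: Prerequisites, produce: Callable[..., bytes]) -> tuple[bytes, Provenance]:
    prov = Provenance(host=platform.node(), started=datetime.now(timezone.utc))
    output = produce(**dict(pre.parameters))
    prov.finished = datetime.now(timezone.utc)
    prov.output_checksums["output"] = hashlib.sha256(output).hexdigest()
    return output, prov
```

Reprocessing then amounts to calling `run` again with the same `Prerequisites`: the output checksum should match, while host and timestamps in the new provenance record may differ.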

11 Knowledge Management
For each major data transformation step identified in the ATLAS data processing pipeline (event generation, detector simulation, background pile-up and digitization, etc.), the COOKBOOK catalogue encapsulates the specific data transformation knowledge and the validated parameter settings that must exist before the transformation can be invoked.

12 Virtual Data Pipeline
1. Event generation step: templated jobOptions
2. Simulation step: transformation attributes
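The two styles of recipe on this slide can be contrasted in a few lines: a templated jobOptions file is text with placeholders filled from catalog values, while transformation attributes are passed as arguments to a fixed transformation. This is an illustrative sketch only; the template contents, option names, and the `run_simulation` command are hypothetical placeholders, not real Athena job options or ATLAS scripts.

```python
from string import Template

# Hypothetical jobOptions-style template; option names are placeholders,
# not actual Athena properties.
EVGEN_JOB_OPTIONS = Template(
    "RunNumber    = $run\n"
    "EventsPerJob = $n_events\n"
    "RandomSeed   = $seed\n"
)


def render_job_options(values: dict) -> str:
    # substitute() raises KeyError if any placeholder is left unfilled,
    # so an incomplete parameter set cannot produce a job.
    return EVGEN_JOB_OPTIONS.substitute(values)


def simulation_command(attributes: dict) -> list:
    # Transformation attributes passed as --name=value arguments to a
    # hypothetical simulation wrapper.
    return ["run_simulation"] + [f"--{k}={v}" for k, v in sorted(attributes.items())]
```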

13 Prototype Production System
High-throughput features: scatter-gather data processing architecture.
Fault-tolerance features: independent agents; a pull model for assigning tasks to agents (rather than push); local caching of output and input data (except Objy (Objectivity) data).
ATLAS DC0 and DC1 parameter settings for simulations are recorded in the Virtual Data Catalog prototype database, COOKBOOK, using normalized components: parameter collections that are structured orthogonally (diagram labels: data reproducibility, application complexity, Grid location).
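The pull model and scatter-gather pattern can be sketched in-process: work units are published to a shared queue, each independent agent takes the next unit only when it is idle, and results are merged at the end. A slow agent simply processes fewer units, and no central dispatcher has to track agent state. This is a hypothetical illustration: threads stand in for distributed agents at Grid sites, squaring a number stands in for real processing, and reassigning work from a crashed agent is not shown.

```python
import queue
import threading


def agent(name, tasks, results):
    """Independent agent: pulls the next task when idle instead of having a
    central dispatcher push work to it."""
    while True:
        try:
            partition = tasks.get_nowait()
        except queue.Empty:
            return  # nothing left to pull: the agent exits on its own
        value = partition * partition  # stand-in for real event processing
        results.put((partition, name, value))


def scatter_gather(partitions, n_agents=4):
    tasks = queue.Queue()
    for p in partitions:  # scatter: publish independent work units
        tasks.put(p)
    results = queue.Queue()
    agents = [threading.Thread(target=agent, args=(f"agent-{i}", tasks, results))
              for i in range(n_agents)]
    for a in agents:
        a.start()
    for a in agents:
        a.join()
    gathered = {}
    while not results.empty():  # gather: merge results from all agents
        partition, _name, value = results.get()
        gathered[partition] = value
    return gathered
```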

14 Production Experience
Jobs accessing the COOKBOOK VDC prototype: ~8000
Hosts accessing the COOKBOOK VDC prototype: ~700
Domains accessing the COOKBOOK VDC prototype: 32 (R1951.sc02.org, bu.edu, cacr.caltech.edu, cern.ch, cnaf.infn.it, cs.wisc.edu, dhcp.fnal.gov, dyn.optonline.net, fnal.gov, gridpp.rl.ac.uk, hep.anl.gov, hep.man.ac.uk, ihep.su, in2p3.fr, iu.edu, lip.pt, nersc.gov, nhn.ou.edu, phys.ufl.edu, phys.unm.edu, phys.uwm.edu, physics.indiana.edu, physics.lsa.umich.edu, quark.lu.se, roma1.infn.it, sinp.msu.ru, uchicago.edu, ucs.indiana.edu, ucsd.edu, uits.iupui.edu, usatlas.bnl.gov, uta.edu)
Countries: CH, IT, FR, PT, RU, SE, UK, US
Many thanks to the ATLAS collaborators who tried, tested, and used the COOKBOOK VDC prototype!

15 Fault Tolerance Improvement
Although the production system relied on the COOKBOOK VDC prototype running at a single central location (CERN), the reported failure rate attributable to this single-point-of-failure architecture was remarkably low (better than 0.001) over the whole DC1 production period. VDC access robustness could be improved further by deploying catalog replicas at different geographic locations.
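A rough estimate of what replication could buy: suppose each catalog replica is unavailable for a given request with probability p, and a client can fall back to any of n replicas. If (and only if) replica failures are independent, a request fails only when all replicas fail. Taking the reported single-server rate as the bound p < 10^{-3}:

```latex
P_{\text{fail}}(n) = \prod_{i=1}^{n} p_i = p^{n},
\qquad p < 10^{-3} \;\Rightarrow\; P_{\text{fail}}(2) < 10^{-6}.
```

In practice failures are partly correlated (shared network paths, identical software, replica synchronization problems), so this should be read as an optimistic limit rather than an expected figure.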

16 Integrated Solutions
Database deployment proposal for ATLAS Data Challenges (slide by Luc Goossens)

17 Roadmap to Success
Based on the positive experience with prototyping Virtual Data technologies in DC1, where both US ATLAS and CERN made significant contributions to the production using the VDC, the COOKBOOK database is being considered for deployment in ATLAS Data Challenges. We envision that the production recipe knowledge encapsulated in the COOKBOOK database will be integrated into a uniform system based on the Chimera technology from the GriPhyN project, eliminating 'manual' tracking of data dependencies between separate production steps and enabling multi-step compound data transformations on demand.
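The envisioned "multi-step compound transformations on demand" can be sketched as a recursive lookup: a request for a logical file finds the derivation that produces it, first materializes any missing inputs the same way, and only then runs the transformation, so no one has to track inter-step dependencies by hand. This does not reproduce Chimera's own interface; `Derivation`, `VirtualDataSystem`, and the logical file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Derivation:
    """Hypothetical derivation: the transformation that produces a logical
    file and the logical files it consumes."""
    transformation: Callable[..., str]
    inputs: List[str]


class VirtualDataSystem:
    def __init__(self) -> None:
        self.derivations: Dict[str, Derivation] = {}
        self.materialized: Dict[str, str] = {}
        self.executed: List[str] = []  # log of steps actually run

    def define(self, lfn: str, derivation: Derivation) -> None:
        self.derivations[lfn] = derivation

    def request(self, lfn: str, _visiting=None) -> str:
        """Return a logical file, deriving it and any missing upstream
        inputs on demand."""
        if lfn in self.materialized:
            return self.materialized[lfn]
        visiting = set() if _visiting is None else _visiting
        if lfn in visiting:
            raise ValueError(f"cyclic dependency at {lfn}")
        visiting.add(lfn)
        d = self.derivations[lfn]
        args = [self.request(i, visiting) for i in d.inputs]
        self.materialized[lfn] = d.transformation(*args)
        self.executed.append(lfn)
        visiting.discard(lfn)
        return self.materialized[lfn]


vds = VirtualDataSystem()
vds.define("evgen.0001", Derivation(lambda: "events", []))
vds.define("simul.0001", Derivation(lambda ev: f"hits({ev})", ["evgen.0001"]))
vds.define("digis.0001", Derivation(lambda h: f"digis({h})", ["simul.0001"]))
result = vds.request("digis.0001")  # runs all three steps in dependency order
```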

