1 Data Challenges in ATLAS Computing
Alexandre Vaniachine (ANL), ATLAS Collaboration
Invited talk at ACAT'2002, Moscow, Russia, June 25, 2002

2 Outline & Acknowledgements
- World Wide computing model
- Data persistency
- Application framework
- Data Challenges: Physics + Grid
- Grid integration in Data Challenges
- Data QA and Grid validation
Thanks to all ATLAS collaborators whose contributions I used in this talk.

3 Core Domains in ATLAS Computing
ATLAS Computing is right in the middle of its first period of Data Challenges. A Data Challenge (DC) is for the software what a test beam is for the detector: many components have to be brought together to work.
The separation of the data and the algorithms in the ATLAS software architecture determines our core domains (Application, Data, Grid):
- Persistency solutions for event data storage
- Software framework for data processing algorithms
- Grid computing for the data processing flow

4 World Wide Computing Model
The focus of my presentation is the integration of these three core software domains in the ATLAS Data Challenges, working towards a highly functional software suite plus a World Wide computing model which gives all ATLAS collaborators equal access, of equal quality, to ATLAS data.

5 ATLAS Computing Challenge
The emerging World Wide computing model is an answer to the LHC computing challenge:
- For ATLAS the raw data alone constitute 1.3 PB/year; adding "reconstructed" events and Monte Carlo data results in ~10 PB/year (~3 PB on disk).
- The required CPU, including analysis, is estimated at ~1.6 M SpecInt95.
- CERN alone can provide only a fraction of these resources.
Computing infrastructure, which was centralized in the past, will now be distributed (in contrast to the reverse trend for experiments that were more distributed in the past). Validation of the new Grid computing paradigm in the period before the LHC requires Data Challenges of increasing scope and complexity. These Data Challenges will use as much as possible the Grid middleware being developed in Grid projects around the world.
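To make the raw-data figure on this slide concrete, here is a back-of-the-envelope check in Python. The event size (~1.3 MB) and the yearly event count (10^9, i.e. ~100 Hz over ~10^7 s of running) are illustrative assumptions rather than numbers from the slide; they are chosen to reproduce the ~1.3 PB/year raw-data estimate quoted above.

```python
# Back-of-the-envelope check of the raw-data volume (illustrative assumptions).
trigger_rate_hz = 100          # assumed event rate to storage
live_seconds_per_year = 1e7    # assumed effective running time per year
raw_event_size_mb = 1.3        # assumed raw event size in MB

events_per_year = trigger_rate_hz * live_seconds_per_year      # ~1e9 events
raw_pb_per_year = events_per_year * raw_event_size_mb / 1e9    # MB -> PB (decimal)

print(f"events/year ~ {events_per_year:.1e}")
print(f"raw data    ~ {raw_pb_per_year:.1f} PB/year")  # ~1.3 PB/year, as on the slide
```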

6 Technology Independence
Ensuring that the 'application' software is independent of the underlying persistency technology is one of the defining characteristics of the ATLAS software architecture (the "transient/persistent" split).
- Integrated operation of the framework and database domains demonstrated the capability of switching between persistency technologies and of reading the same data from different frameworks.
- Implementation: the data description (persistent dictionary) is stored together with the data; the application framework uses the transient data dictionary for transient/persistent conversion.
- The Grid integration problem is very similar to the transient/persistent issue, since all objects become just a byte stream, whether on disk or on the network.
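The slide describes the transient/persistent split in words; the sketch below is a minimal, hypothetical illustration of the idea, not ATLAS code (the class, dictionary and function names are invented). Algorithms only ever see the transient object; a converter, driven by a data description stored alongside the data, turns it into a byte stream and back.

```python
import json, struct

# Hypothetical transient class: what algorithms see in memory.
class TrackTransient:
    def __init__(self, pt, eta, phi):
        self.pt, self.eta, self.phi = pt, eta, phi

# "Persistent dictionary": a data description stored together with the data,
# listing the attributes and their on-disk encoding.
TRACK_DICT = {"class": "Track", "attrs": ["pt", "eta", "phi"], "format": "<3d"}

def to_persistent(obj, ddict):
    """Transient -> byte stream, driven only by the stored data description."""
    values = [getattr(obj, a) for a in ddict["attrs"]]
    return struct.pack(ddict["format"], *values)

def from_persistent(blob, ddict):
    """Byte stream -> transient, again driven only by the data description."""
    values = struct.unpack(ddict["format"], blob)
    obj = TrackTransient.__new__(TrackTransient)
    for name, value in zip(ddict["attrs"], values):
        setattr(obj, name, value)
    return obj

# The same byte stream could sit in a file or travel over the network:
blob = to_persistent(TrackTransient(41.2, -0.7, 2.1), TRACK_DICT)
copy = from_persistent(blob, TRACK_DICT)
print(copy.pt, copy.eta, copy.phi)        # 41.2 -0.7 2.1
print(json.dumps(TRACK_DICT))             # the description itself is also just data
```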

7 ATLAS Database Architecture
- Independent of the underlying persistency technology
- Ready for Grid integration
- Data description stored together with the data

8 Change of Persistency Baseline
For some time ATLAS has had both a 'baseline' technology (Objectivity) and a baseline evaluation strategy:
- We implemented persistency in Objectivity for DC0.
- A ROOT-based conversion service (AthenaROOT) provides the persistence technology for Data Challenge 1.
- The technology strategy is to adopt the LHC-wide LHC Computing Grid (LCG) common persistence infrastructure (a hybrid of a relational and a ROOT-based streaming layer) as soon as this is feasible. ATLAS is committed to 'common solutions' and looks forward to LCG being the vehicle for providing these in an effective way.
Changing the persistency mechanism (e.g. Objectivity -> ROOT I/O) requires a change of "converter", but of nothing else. The ease of the baseline change demonstrates the benefits of decoupling the transient and persistent representations. In the long term our architecture is, in principle, capable of providing language independence.
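The "only the converter changes" point can be illustrated with a toy converter registry. Here json and pickle stand in for Objectivity and ROOT I/O purely for the sake of a runnable example; the registry and converter names are hypothetical and are not the ATLAS conversion-service API.

```python
import json, pickle

# Two toy persistency "technologies"; the algorithm code never sees them.
class JsonConverter:                       # stand-in for one backend
    def write(self, obj): return json.dumps(obj).encode()
    def read(self, blob): return json.loads(blob.decode())

class PickleConverter:                     # stand-in for another backend
    def write(self, obj): return pickle.dumps(obj)
    def read(self, blob): return pickle.loads(blob)

CONVERTERS = {"json": JsonConverter(), "pickle": PickleConverter()}

def run_job(event, technology):
    """The 'algorithm' side: identical no matter which backend is configured."""
    conv = CONVERTERS[technology]          # the only technology-dependent line
    blob = conv.write(event)
    return conv.read(blob)

event = {"run": 1, "event": 42, "tracks": [41.2, 17.5]}
assert run_job(event, "json") == run_job(event, "pickle") == event
print("same event read back through two different persistency backends")
```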

9 Athena Software Framework
ATLAS Computing is steadily progressing towards a highly functional software suite and is implementing the World Wide model. (Note that a legacy software suite was produced, still exists and is used: so it can be done for the ATLAS detector!)
The Athena Software Framework is used in the Data Challenges for:
- generator events production
- fast simulation
- data conversion
- production QA
- reconstruction (off-line and High Level Trigger)
Work in progress: integrating detector simulations. Future directions: Grid integration.

10 Athena Architecture Features
- Separation of data and algorithms
- Memory management
- Transient/Persistent separation
Athena has a common code base with the GAUDI framework (LHCb).
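As a rough sketch of what "separation of data and algorithms" plus per-event memory management looks like, here is a toy framework loop. The class names (TransientStore, Algorithm, and the two example algorithms) are hypothetical and far simpler than the real Athena/GAUDI interfaces.

```python
class TransientStore:
    """Toy transient event store: algorithms exchange data only through it."""
    def __init__(self): self._data = {}
    def record(self, key, obj): self._data[key] = obj
    def retrieve(self, key): return self._data[key]
    def clear(self): self._data.clear()      # memory management between events

class Algorithm:
    """Toy algorithm base class: algorithms hold no event data themselves."""
    def execute(self, store): raise NotImplementedError

class GeneratorAlg(Algorithm):
    def execute(self, store):
        store.record("McEvent", {"particles": [11, -11]})   # pretend generation

class ReconAlg(Algorithm):
    def execute(self, store):
        mc = store.retrieve("McEvent")
        store.record("RecoSummary", {"nLeptons": len(mc["particles"])})

def event_loop(algorithms, n_events):
    store = TransientStore()
    for _ in range(n_events):
        store.clear()                         # fresh store for each event
        for alg in algorithms:                # data and algorithms stay separate
            alg.execute(store)
        print(store.retrieve("RecoSummary"))

event_loop([GeneratorAlg(), ReconAlg()], n_events=2)
```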

11 ATLAS Detector Simulations
Scale of the problem:
- 25.5 million distinct volume copies
- 23 thousand different volume objects
- 4,673 different volume types
- managing up to a few hundred pile-up events
- one million hits per event on average

12 Universal Simulation Box
With all interfaces clearly defined, simulations become "Geant-neutral": you can in principle run G3, G4, Fluka or a parameterized simulation with no effect on the end users. A G4 robustness test was completed in DC0.
[Diagram: MC events (HepMC) enter the detector simulation program, which uses the detector description (DetDescription) to produce MCTruth and Hits that feed Digitisation.]
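A minimal sketch of what "Geant-neutral" means in code, assuming a hypothetical simulation interface: the downstream digitization depends only on the interface, so a full simulation, a fast parameterization, or another engine could be swapped in without the end user noticing. All names and numbers here are illustrative, not the ATLAS interface.

```python
import random
from abc import ABC, abstractmethod

class DetectorSimulation(ABC):
    """Hypothetical common interface: HepMC-like event in, hits out."""
    @abstractmethod
    def simulate(self, mc_event): ...

class FullSimStub(DetectorSimulation):     # stand-in for a full G4-style simulation
    def simulate(self, mc_event):
        return [{"cell": random.randrange(1000), "e": random.expovariate(1.0)}
                for _ in range(100 * len(mc_event["particles"]))]

class FastSimStub(DetectorSimulation):     # stand-in for a parameterized simulation
    def simulate(self, mc_event):
        return [{"cell": 0, "e": 10.0 * len(mc_event["particles"])}]

def digitize(hits):
    """Downstream step: identical whichever simulation produced the hits."""
    return sum(h["e"] for h in hits)

mc_event = {"particles": [11, -11, 22]}
for engine in (FullSimStub(), FastSimStub()):
    print(type(engine).__name__, "->", round(digitize(engine.simulate(mc_event)), 1))
```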

13 Data Challenges
Data Challenges prompted increasing integration of Grid components in ATLAS software.
DC0 was used to test the software readiness and the production pipeline continuity/robustness:
- Scale was limited to < 1 M events.
- Physics oriented: output for leptonic channel analyses and legacy Physics TDR data.
Despite the centralized production in DC0 we started deployment of our DC infrastructure (organized in 13 work packages), covering in particular Grid-related areas such as:
- production tools
- Grid tools for metadata bookkeeping and replica management
We started distributed production on the Grid in DC1.

14 DC0 Data Flow
- Multiple production pipelines
- Independent data transformation steps
- Quality Assurance procedures

15 Data Challenge 1
Reconstruction & analysis on a large scale: exercise the data model, study ROOT I/O performance, identify bottlenecks, exercise distributed analysis, ...
- Produce data for the High Level Trigger (HLT) TDR & Physics groups.
- Study the performance of Athena and of algorithms for use in the High Level Trigger.
- Test the 'data flow' through the HLT: byte-stream -> HLT algorithms -> recorded data.
- High statistics needed (background rejection study).
- Scale: ~10M simulated events produced over a period of days on O(1000) PCs.
Exercising the LHC Computing model: involvement of CERN and outside-CERN sites. Deployment of the ATLAS Grid infrastructure: outside sites are essential for this event scale.
Phase 1 (started in June):
- ~100M generator-level events (all data produced at CERN)
- ~10M simulated detector response events (June - July)
- ~10M reconstructed-object events
Phase 2 (September - December):
- Introduction and use of the new Event Data Model and Detector Description
- More countries/sites/processors
- Distributed reconstruction
- Additional samples including pile-up
- Distributed analyses
- Further tests of GEANT4

16 DC1 Phase 1 Resources
Organization & infrastructure are in place, led by the CERN ATLAS group: 2000 processors, with CPU capacity (in SI95·sec) adequate for ~4×10^7 simulated events, and 2/3 of the data produced outside of CERN. Production on a global scale, across Asia, Australia, Europe and North America: 17 countries, 26 production sites.
- Australia: Melbourne
- Canada: Alberta, TRIUMF
- Czech Republic: Prague
- Denmark: Copenhagen
- France: CCIN2P3 Lyon
- Germany: Karlsruhe
- Italy (INFN): CNAF, Milan, Roma1, Naples
- Japan: Tokyo
- Norway: Oslo
- Portugal: FCUL Lisboa
- Russia: RIVK BAK, JINR Dubna, ITEP Moscow, SINP MSU Moscow, IHEP Protvino
- Spain: IFIC Valencia
- Sweden: Stockholm
- Switzerland: CERN
- Taiwan: Academia Sinica
- UK: RAL, Lancaster, Liverpool (MAP)
- USA: BNL...

17 Data Challenge 2
Schedule: Spring - Autumn 2003.
Major physics goals:
- Physics samples have 'hidden' new physics
- Geant4 will play a major role
- Testing calibration and alignment procedures
Scope increased with respect to what was achieved in DC0 & DC1:
- Scale: a sample of 10^8 events
- System at ~50% of the complexity of the final system
Distributed production, simulation, reconstruction and analysis:
- Use of the Grid testbeds that will be built in the context of Phase 1 of the LHC Computing Grid Project
- Automatic 'splitting' and 'gathering' of long jobs; the best available sites chosen for each job
- Monitoring on a 'gridified' logging and bookkeeping system; interface to a full 'replica catalog' system; transparent access to the data for different MSS systems
- Grid certificates

18 Grid Integration in Data Challenges
Grid and Data Challenge communities have overlapping objectives:
- Grid middleware: testbed deployment, packaging, basic sequential services, user portals
- Data management: replicas, reliable file transfers, catalogs
- Resource management: job submission, scheduling, fault tolerance
- Quality Assurance: data reproducibility, application and data signatures, Grid QA

19 Grid Middleware?

20 Grid Middleware!

21 ATLAS Grid Testbeds
- US-ATLAS Grid Testbed
- EU DataGrid
- NorduGrid
For more information see the presentations by Roger Jones and Aleksandr Konstantinov.

22 Interfacing Athena to the GRID
Areas of work: data access (persistency), event selection, GANGA (job configuration & monitoring, resource estimation & booking, job scheduling, etc.), and Grappa, a Grid user interface for Athena.
[Diagram: the Athena/GAUDI application, steered through the GANGA/Grappa GUI, uses Virtual Data, Algorithms and GRID Services to deliver histograms, monitoring and results.]
Making the Athena framework work in the GRID environment requires an architectural design and components that make use of the Grid services.
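GANGA and Grappa are real projects whose interfaces are not reproduced here; the snippet below is only a hypothetical illustration of what "job configuration, resource estimation and submission" involves, with made-up field and function names.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AthenaJobSpec:                      # hypothetical job description
    application: str                      # framework to run (illustrative)
    job_options: str                      # recipe driving the framework
    input_dataset: str
    n_events: int
    site: str = "any"                     # let the scheduler pick a site

def estimate_cpu_hours(spec, sec_per_event=100.0):
    """Crude resource estimate; the per-event cost is an assumed number."""
    return spec.n_events * sec_per_event / 3600.0

def submit(spec):
    """Stand-in for submission: just serialize the spec for a Grid service."""
    return json.dumps(asdict(spec), indent=2)

job = AthenaJobSpec("Athena", "dc1_simulation_jobOptions.txt",
                    "dc1.sample.generator", n_events=5000)
print("estimated CPU hours:", round(estimate_cpu_hours(job), 1))
print(submit(job))
```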

23 Data Management Architecture
- AMI: ATLAS Metadata Interface
- MAGDA: MAnager for Grid-based DAta
- VDC: Virtual Data Catalog

24 AMI Architecture
Data warehousing principle (star architecture).
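The slide names only the principle; the SQL-in-Python snippet below is a generic illustration of a star layout (one fact table referencing dimension tables), with hypothetical table, column and dataset names rather than AMI's real schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Star layout: a central fact table surrounded by dimension tables.
cur.executescript("""
CREATE TABLE dim_dataset (dataset_id INTEGER PRIMARY KEY, name TEXT, project TEXT);
CREATE TABLE dim_site    (site_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE fact_production (
  fact_id INTEGER PRIMARY KEY,
  dataset_id INTEGER REFERENCES dim_dataset(dataset_id),
  site_id    INTEGER REFERENCES dim_site(site_id),
  n_events INTEGER,
  size_gb REAL);
""")
cur.execute("INSERT INTO dim_dataset VALUES (1, 'dc1.sample.simul', 'DC1')")
cur.execute("INSERT INTO dim_site VALUES (1, 'CERN', 'Switzerland')")
cur.execute("INSERT INTO fact_production VALUES (1, 1, 1, 100000, 250.0)")

# A typical warehouse-style query joins the fact table to its dimensions.
for row in cur.execute("""
    SELECT d.name, s.name, f.n_events
    FROM fact_production f
    JOIN dim_dataset d ON f.dataset_id = d.dataset_id
    JOIN dim_site    s ON f.site_id    = s.site_id"""):
    print(row)
```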

25 MAGDA Architecture
Component-based architecture emphasizing fault tolerance.

26 VDC Architecture
Two-layer architecture.

27 Introducing Virtual Data
Recipes for producing the data (jobOptions, kumacs) have to be fully tested, and the produced data have to be validated through a QA step. Preparing production recipes takes time and effort and encapsulates considerable knowledge; in DC0 more time was spent assembling the proper recipes than running the production jobs. Once you have the proper recipes, producing the data is straightforward.
After the data have been produced, what do we do with the developed recipes? Do we really need to save them? Data are primary, recipes are secondary.

28 Virtual Data Perspective
The GriPhyN project (www.griphyn.org) provides a different perspective: recipes are as valuable as the data; production recipes are the Virtual Data. If you have the recipes you do not need the data (you can reproduce them): recipes are primary, data are secondary. Do not throw away the recipes, save them (in the VDC). From the OO perspective, methods (recipes) are encapsulated together with the data in Virtual Data Objects.
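A toy rendering of the virtual-data idea, with invented names: the catalog stores the recipe and, possibly, its product; if the derived data are already materialized they are returned, otherwise the recipe is re-run to reproduce them.

```python
# Toy virtual-data catalog: each entry stores a recipe and, possibly, its product.
catalog = {
    "zee_sample": {
        "recipe": lambda: [f"Z->ee event {i}" for i in range(3)],  # stand-in recipe
        "product": None,                                           # not yet materialized
    },
}

def get_data(name):
    """Return the derived data, materializing them from the recipe if needed."""
    entry = catalog[name]
    if entry["product"] is None:            # data absent: reproduce from the recipe
        print(f"materializing '{name}' from its recipe")
        entry["product"] = entry["recipe"]()
    else:
        print(f"'{name}' already materialized, reusing it")
    return entry["product"]

print(get_data("zee_sample"))   # first call runs the recipe
print(get_data("zee_sample"))   # second call reuses the stored product
```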

29 VDC-based Production System
High-throughput features:
- scatter-gather data processing architecture
Fault tolerance features:
- independent agents
- pull model for agent task assignment (vs. push)
- local caching of output and input data (except Objectivity input)
ATLAS DC0 and DC1 parameter settings for simulations are recorded in the Virtual Data Catalog database using normalized components: parameter collections structured "orthogonally" along data reproducibility, application complexity and Grid location.
Automatic "garbage collection" by the job scheduler (see the sketch after this list):
- Agents pull the next derivation from the VDC.
- After the data have been materialized, agents register "success" in the VDC.
- When a previous invocation has not completed within the specified timeout period, it is invoked again.
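A minimal sketch, with invented names and states, of the pull model and timeout-based "garbage collection" described on this slide: agents pull the next pending (or timed-out) derivation from a catalog, run it, and register success; a derivation whose previous invocation exceeded the timeout simply becomes eligible again.

```python
import time

TIMEOUT = 2.0   # seconds; a real system would use a much longer window

# Toy VDC table: derivation name -> state ("pending", "running", "done") + start time.
vdc = {
    "simul.0001": {"state": "pending", "started": None},
    "simul.0002": {"state": "running", "started": time.time() - 10},  # stale: timed out
    "simul.0003": {"state": "done",    "started": None},
}

def pull_next(now):
    """Agent side: pull the next derivation that is pending or timed out."""
    for name, rec in vdc.items():
        timed_out = rec["state"] == "running" and now - rec["started"] > TIMEOUT
        if rec["state"] == "pending" or timed_out:
            rec["state"], rec["started"] = "running", now
            return name
    return None

def agent(agent_id):
    while (name := pull_next(time.time())) is not None:
        print(f"agent {agent_id}: materializing {name}")
        vdc[name]["state"] = "done"          # register success in the catalog

agent("A")    # picks up the pending derivation and the timed-out one
print({k: v["state"] for k, v in vdc.items()})
```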

30 Tree-like Data Flow
Exercising the rich possibilities of data processing composed of multiple independent data transformation steps.
[Diagram: a tree of transformation steps, e.g. Athena Generators -> HepMC.root; atlsim -> digis.zebra and geometry.zebra; Athena conversion -> digis.root and geometry.root; Athena recon -> recon.root; Athena Atlfast -> Atlfast.root and filtering.ntuple; Athena QA -> QA.ntuple at several points in the tree.]
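To illustrate the "multiple independent transformation steps" idea, here is a toy driver that walks a small dependency tree in the right order. The step and file names echo the slide's diagram, but the driver and the placeholder transformations are illustrative, not the DC production system.

```python
# Each output file is produced by one step from a list of input files (a small DAG).
steps = {
    "HepMC.root":   {"step": "Athena Generators", "inputs": []},
    "digis.zebra":  {"step": "atlsim",            "inputs": ["HepMC.root"]},
    "digis.root":   {"step": "Athena conversion", "inputs": ["digis.zebra"]},
    "recon.root":   {"step": "Athena recon",      "inputs": ["digis.root"]},
    "Atlfast.root": {"step": "Athena Atlfast",    "inputs": ["HepMC.root"]},
    "QA.ntuple":    {"step": "Athena QA",         "inputs": ["recon.root"]},
}

def produce(target, made=None):
    """Recursively materialize the inputs first, then run the step for `target`."""
    made = set() if made is None else made
    if target in made:
        return made
    for inp in steps[target]["inputs"]:
        produce(inp, made)
    print(f"{steps[target]['step']:>18} -> {target}")   # placeholder for the real job
    made.add(target)
    return made

done = set()
for leaf in ("QA.ntuple", "Atlfast.root"):   # two branches of the tree share HepMC.root
    produce(leaf, done)
```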

31 Data Reproducibility
The goal is to validate the DC sample production by ensuring the reproducibility of simulations run at different sites. We need a tool capable of establishing the similarity or the identity of two samples produced in different conditions, e.g. at different sites. This is a very important (and sometimes overlooked) component of Grid computing deployment. It is complementary to the software and/or data digital-signature approaches, which are still in the R&D phase.

32 Grid Production Validation
Simulations are run in different conditions, for instance with the same generation input but at different production sites. For each sample, reconstruction (i.e. Atrecon) is run to produce standard CBNT ntuples. The validation application launches specialized independent analyses for the ATLAS subsystems, and for each sample standard histograms are produced.

33 Comparison Procedure
[Figure panels: test sample, reference sample, superimposed samples, contributions to χ².]

34 Summary of Comparison
The comparison procedure ends with a χ² bar-chart summary, which gives a nice overview of how the samples compare.
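The validation tool itself is not shown in the transcript; below is a minimal sketch of the underlying idea, a per-histogram χ² between a test and a reference sample, using a simple two-histogram χ² formula and made-up bin contents and threshold.

```python
def chi2_per_ndf(test, reference):
    """Simple chi-square between two histograms with comparable statistics:
    chi2 = sum (t_i - r_i)^2 / (t_i + r_i), skipping empty bin pairs."""
    chi2, ndf = 0.0, 0
    for t, r in zip(test, reference):
        if t + r > 0:
            chi2 += (t - r) ** 2 / (t + r)
            ndf += 1
    return chi2 / ndf if ndf else 0.0

# Made-up bin contents for a few "standard histograms" (e.g. calorimeter energy).
histograms = {
    "em_calo_energy":  ([98, 205, 310, 190, 97], [105, 195, 300, 205, 95]),
    "had_calo_energy": ([50, 120, 160, 110, 40], [80, 150, 130,  90, 30]),  # discrepant
}

for name, (test, ref) in histograms.items():
    flag = "OK" if chi2_per_ndf(test, ref) < 2.0 else "CHECK"   # arbitrary threshold
    print(f"{name:16s} chi2/ndf = {chi2_per_ndf(test, ref):5.2f}  {flag}")
```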

35 Example of a Finding
Comparing the energy in the calorimeters for Z -> 2l samples from DC0 and DC1. It works! The difference is caused by the η cut applied at generation.

36 Summary
ATLAS computing is in the middle of its first period of Data Challenges of increasing scope and complexity, and is steadily progressing towards a highly functional software suite plus a World Wide computing model which gives all ATLAS collaborators equal access, of equal quality, to ATLAS data. These Data Challenges are executed at the prototype tier centers and use as much as possible the Grid middleware being developed in Grid projects around the world. In close collaboration between the Grid and Data Challenge communities, ATLAS is testing large-scale testbed prototypes, deploying prototype components to integrate and test Grid software in a production environment, and running Data Challenge 1 production in 26 prototype tier centers in 17 countries on four continents. A quite promising start for the ATLAS Data Challenges!

