ATLAS Distributed Analysis S. González de la Hoz 1, D. Liko 2, L. March 1 1 IFIC – Valencia 2 CERN.


1 ATLAS Distributed Analysis
S. González de la Hoz (1), D. Liko (2), L. March (1)
(1) IFIC, Valencia; (2) CERN

2 Introduction
ATLAS (A Toroidal LHC ApparatuS) is one of the four LHC (Large Hadron Collider) experiments based at CERN. It is a detector for the study of high-energy proton-proton collisions. The offline computing will have to deal with an output event rate of 200 Hz, i.e. 10^9 events per year, with an average event size of 1.6 MB.
Storage:
- Raw recording rate 320 MB/sec
- Accumulating at 5-8 PB/year
- 20 PB of disk
- 10 PB of tape
Processing:
- 40,000 of today's fastest PCs
A solution: Grid
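The rates quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 PB = 10^9 MB), and the variable names are ours rather than anything from the slides.

```python
# Cross-check of the numbers quoted on the Introduction slide.
# Assumption: decimal units, 1 PB = 1e9 MB.
EVENT_RATE_HZ = 200      # output event rate
EVENT_SIZE_MB = 1.6      # average event size
EVENTS_PER_YEAR = 1e9    # events recorded per year

# Raw recording rate: events per second times size per event.
raw_rate_mb_per_s = EVENT_RATE_HZ * EVENT_SIZE_MB

# Raw data volume per year, converted from MB to PB.
raw_volume_pb_per_year = EVENTS_PER_YEAR * EVENT_SIZE_MB / 1e9

# Data-taking time per year implied by the rate and the yearly event count.
live_seconds_per_year = EVENTS_PER_YEAR / EVENT_RATE_HZ

print(f"raw recording rate: {raw_rate_mb_per_s:.0f} MB/s")
print(f"raw data volume:    {raw_volume_pb_per_year:.1f} PB/year")
print(f"implied live time:  {live_seconds_per_year:.1e} s/year")
```

The product of rate and event size reproduces the 320 MB/sec on the slide. The raw stream alone comes to 1.6 PB per year; the slide does not break down the 5-8 PB/year total, which presumably also counts other data products (our assumption).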

3 ATLAS Data Challenge
In 2002 ATLAS computing planned a first series of Data Challenges (DCs) in order to validate its:
- Computing Model
- Software
- Data Model
The ATLAS collaboration decided to perform the DCs using the Grid middleware developed in several Grid projects (Grid flavours):
- LHC Computing Grid project (LCG) / EGEE (Enabling Grids for E-science in Europe), to which CERN is committed
- Open Science Grid / Grid3
- NorduGrid / ARC
The ATLAS collaboration is preparing for data taking and analysis at the CERN LHC, scheduled to start operating in 2008. Physics studies in ATLAS will require analysis of data volumes of the order of petabytes per year. The analysis will rely on computing resources and data distributed over the world-wide collaborating institutions.
In order to handle the task of the ATLAS DCs, an automated production system (ProdSys) was designed.

4 ATLAS Production System (ProdSys)
The ATLAS production system (ProdSys) consists of 4 components:
- The production database (ProdDB), which contains abstract job definitions;
- The Eowyn supervisor, which reads job definitions from the production database and presents them to the different Grid executors in an easy-to-parse XML format;
- The Executors, one for each Grid flavour, which receive the job definitions in XML format and convert them to the job description language (JDL) of that particular Grid;
- DDM, the ATLAS Distributed Data Management system, which moves files from their temporary output locations to their final destination on some Storage Element and registers the files in the catalogue of that Grid.
The ATLAS ProdSys has been successfully used to run production of ATLAS jobs at an unprecedented scale. On successful days more than 10,000 jobs were processed by the system. The experience gained in operating the system, which spans several grid systems, is considered essential also for performing analysis using Grid resources.
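The executor step (XML job definition in, Grid-specific job description out) can be pictured with a small sketch. The XML layout, element names and file names below are hypothetical, since the slides do not show the real ProdSys schema; the output uses a few common JDL-style attributes and is not meant to reproduce what Lexor or Condor-G actually generate.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML job definition; the real ProdSys schema is not shown in the slides.
JOB_XML = """
<job name="analysis_0001">
  <executable>run_analysis.sh</executable>
  <argument>input.AOD.root</argument>
  <argument>50</argument>
  <output>histograms.root</output>
  <output>ntuple.root</output>
</job>
"""


def xml_to_jdl(xml_text: str) -> str:
    """Convert the hypothetical XML job definition into JDL-style text."""
    job = ET.fromstring(xml_text)
    name = job.get("name")
    executable = job.findtext("executable")
    arguments = " ".join(arg.text for arg in job.findall("argument"))
    # Ship back stdout, stderr and every declared output file.
    outputs = [f"{name}.out", f"{name}.err"]
    outputs += [out.text for out in job.findall("output")]
    sandbox = ", ".join(f'"{item}"' for item in outputs)
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        f'StdOutput = "{name}.out";',
        f'StdError = "{name}.err";',
        f"OutputSandbox = {{{sandbox}}};",
    ]
    return "\n".join(lines)


jdl = xml_to_jdl(JOB_XML)
print(jdl)
```

Keeping the job definition abstract and pushing the translation into one function per Grid flavour is what lets the same supervisor feed several grids.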

5 Distributed Analysis Strategy
The grid-based ATLAS distributed analysis aims to meet the challenge of supporting distributed users, data and processing, enabling physicists to exploit the whole computing resource provided by the three ATLAS grid infrastructures: LCG/EGEE, OSG/Grid3 and NorduGrid/ARC.
Distributed Analysis must support all the analysis activities, including simulated data production, while shielding users from the complexities of the grid environment.
According to the ATLAS computing model, Distributed Analysis will enable users to submit jobs from any location, helping them to use the grid effectively for their analysis activities. In addition, Distributed Analysis should satisfy the requirement of the ATLAS analysis model: data is distributed among several computing facilities, and analysis jobs are in turn routed based on the availability of the relevant data.
The ATLAS strategy takes several approaches to Distributed Analysis in order to fully exploit its major grid deployments.
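The requirement that jobs be routed to where the data already sits can be sketched as a lookup in a replica catalogue. This is a minimal illustration under our own assumptions: the catalogue contents, dataset names and site names are invented, and the function is not part of any ATLAS tool.

```python
from typing import Dict, List, Set

# Hypothetical replica catalogue: dataset name -> sites holding a copy.
REPLICAS: Dict[str, Set[str]] = {
    "dataset_A": {"SITE_1", "SITE_2"},
    "dataset_B": {"SITE_2", "SITE_3"},
}


def candidate_sites(datasets: List[str], replicas: Dict[str, Set[str]]) -> List[str]:
    """Return the sites that hold every requested dataset, sorted by name."""
    if not datasets:
        return []
    common = set(replicas.get(datasets[0], set()))
    for name in datasets[1:]:
        # Keep only sites that also hold this dataset.
        common &= replicas.get(name, set())
    return sorted(common)


print(candidate_sites(["dataset_A"], REPLICAS))
print(candidate_sites(["dataset_A", "dataset_B"], REPLICAS))
print(candidate_sites(["dataset_C"], REPLICAS))
```

A job needing both datasets can only go to the one site that holds both, and a job asking for an unknown dataset gets no candidate site at all, so no input transfer is triggered at submission time.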

6 Setup for Distributed Analysis using ProdSys
Using the latest version of the Production System:
- Supervisor: Eowyn
- Executors: Condor-G, Lexor
- Data Management: DDM and LFC catalog
- Database: dedicated DA database
A generic analysis transformation has been created, which:
- compiles user code/package on the worker node
- processes Analysis Object Data (AOD) input files
- produces histogram + n-tuple files as outputs
User Interface: AtCom4
The ATLAS Commander (ATCOM) was used as a graphical user interface. It is currently used for task and job definitions:
- task: contains summary information about the jobs to be run (input/output datasets, transformation parameters, resource + environment requirements, etc.)
- job: concrete parameters needed for running, but no Grid specifics
- definitions follow the ProdDB schema and XML description
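The distinction between a task (summary information) and its jobs (concrete parameters, no Grid specifics) can be made concrete with two small records. The field names, file names and the splitting helper below are hypothetical and do not reflect the actual ProdDB schema; the 400 files and 100 files per job are the numbers reported later in this talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """Summary information shared by all jobs of a task (hypothetical fields)."""
    name: str
    input_files: List[str]
    transformation: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Job:
    """Concrete parameters for one run; deliberately no Grid-specific fields."""
    task_name: str
    index: int
    input_files: List[str]
    output_prefix: str


def expand(task: Task, files_per_job: int) -> List[Job]:
    """Split the task's input files into consecutive chunks, one job per chunk."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    jobs = []
    for start in range(0, len(task.input_files), files_per_job):
        index = start // files_per_job
        chunk = task.input_files[start:start + files_per_job]
        jobs.append(Job(task.name, index, chunk, f"{task.name}.{index:04d}"))
    return jobs


task = Task(
    name="zh_ttbar",
    input_files=[f"AOD_{i:03d}.root" for i in range(400)],
    transformation="generic_analysis",
)
jobs = expand(task, files_per_job=100)
print(len(jobs), jobs[0].output_prefix, len(jobs[0].input_files))
```

Nothing in `Job` names a site, queue or middleware, which matches the slide's point that Grid specifics are added later by the executors.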

7 Setup for Distributed Analysis using GANGA
GANGA is a user-friendly front-end tool for job definition and management. It allows switching between testing on a local batch system and large-scale data processing on distributed resources (the Grid). It supports applications based on the ATHENA framework, ProdSys and the DQ2 data-management system. It is written in Python.
A job in GANGA is constructed from a set of building blocks. All jobs have to specify the software to be run (application) and the processing system (back-end) to be used.
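The building-block idea (a job is an application plus a back-end, and the back-end can be swapped) can be illustrated without GANGA itself. The classes below are hypothetical stand-ins written for this sketch; they are not GANGA's API, and the executable and option-file names are invented.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """What to run (hypothetical stand-in for an ATHENA-based application)."""
    executable: str
    options: str

    def command(self) -> str:
        return f"{self.executable} {self.options}"


class LocalBackend:
    """Stand-in for a local batch system used for testing."""
    name = "local"

    def submit(self, command: str) -> str:
        return f"[{self.name}] would run: {command}"


class GridBackend:
    """Stand-in for a Grid back-end used for large-scale processing."""
    name = "grid"

    def submit(self, command: str) -> str:
        return f"[{self.name}] would run: {command}"


@dataclass
class SketchJob:
    """A job is just an application paired with a back-end."""
    application: Application
    backend: object

    def submit(self) -> str:
        return self.backend.submit(self.application.command())


app = Application(executable="athena.py", options="MyAnalysis_jobOptions.py")
job = SketchJob(application=app, backend=LocalBackend())
print(job.submit())

# Switching from local testing to the Grid changes only the back-end block.
job.backend = GridBackend()
print(job.submit())
```

Because the application never refers to the back-end, the same job definition can be tested locally and then sent to the Grid unchanged.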

8 Experience running the Analysis (I)
The analysis chosen was Z_H → t t-bar, a heavy Z decaying into tops in the Little Higgs model. The dataset was made in the official production for the ATLAS Exotics working group using the Athena full-chain simulation. A total of 400 AODs were produced, each containing 50 events (20,000 events in total). The analysis was performed using the production system and GANGA.
One dataset was used:
- 50 events per file, a total of 400 files.
- Jobs with 100 input files each were defined with ATCOM and GANGA.
- These jobs ran at several LCG sites. Each job produced three output files (ntuple, histogram and log), stored in Castor.
- ROOT was used to merge the histogram output files in a post-processing step.
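The bookkeeping on this slide and the merging step can be sketched in plain Python. In the talk the merging was done with ROOT; the code below does not use ROOT and only illustrates the idea of a bin-by-bin sum. The toy bin contents are invented for the example.

```python
from typing import List

Histogram = List[int]  # bin contents; all histograms must share the same binning


def merge(histograms: List[Histogram]) -> Histogram:
    """Merge per-job histograms by summing the contents of each bin."""
    if not histograms:
        raise ValueError("nothing to merge")
    n_bins = len(histograms[0])
    if any(len(h) != n_bins for h in histograms):
        raise ValueError("histograms must have identical binning")
    return [sum(bins) for bins in zip(*histograms)]


# Bookkeeping taken from the slide.
N_FILES = 400
EVENTS_PER_FILE = 50
FILES_PER_JOB = 100

total_events = N_FILES * EVENTS_PER_FILE
n_jobs = N_FILES // FILES_PER_JOB
events_per_job = FILES_PER_JOB * EVENTS_PER_FILE

# Toy per-job histograms (invented numbers), one per job.
per_job = [[10, 20, 5], [12, 18, 7], [9, 22, 4], [11, 19, 6]]
merged = merge(per_job)

print(total_events, n_jobs, events_per_job)
print(merged)
```

With 100 input files per job, the 400-file dataset splits into four jobs of 5,000 events each, so four histogram files have to be merged in the post-processing step.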

9 Experience running the Analysis (II)
GANGA EXPERIENCE:
- The IFIC Tier-2 infrastructure was used to process jobs, using our CE with dedicated queues for analysis jobs. Processing started within a few minutes.
- Jobs were also sent to several LCG sites. In this case the waiting time before the jobs started executing was very long, because the CE queues were occupied by production jobs. Hence, deploying the job priority mechanism is important in order to take full advantage of the whole grid infrastructure for distributed analysis.
- GANGA performed well in configuring, submitting and monitoring jobs and in retrieving their output. However, error handling and recovery of jobs that fail in the user analysis code need to be improved by automatic error parsing.
PRODUCTION SYSTEM EXPERIENCE:
- The analysis was done running our own supervisor and Lexor/Condor-G instance.
- Delays due to data transfer are no longer an issue, because the AOD input is available on-site and jobs are sent only to those sites.
- The system setup is not yet able to support long queues (simulation) and short queues (analysis) in parallel.
Work in progress on Job Priorities:
- queues are filled with simulation jobs
- long pending times for analysis jobs

10 Conclusions
- The analysis has been launched over 350000.
- Z_H → t t-bar reconstructed masses, after merging the histogram files, were produced through the ATLAS production system and GANGA ("à la Grid"), with 100 input files per job.
- With free resources, the system was able to process 10k-event jobs in about 10 min (total).

