Grid Computing at LHC and ATLAS Data Challenges
IMFP-2006, El Escorial, Madrid, Spain, April 4, 2006
Gilbert Poulard (CERN PH-ATC)

Slide 2: Overview
- Introduction
- LHC experiments: computing challenges
- WLCG: Worldwide LHC Computing Grid
- ATLAS experiment
  o Building the Computing System
- Conclusions

Slide 3: Introduction: LHC/CERN
[Aerial view of the LHC ring at CERN, with Geneva and Mont Blanc (4810 m) indicated]

Slide 4: LHC Computing Challenges
- Large distributed community
- Large data volume ... and access to it for everyone
- Large CPU capacity

Slide 5: Challenge 1: Large, distributed community
[Collaboration maps of ATLAS, CMS and LHCb]
~5000 physicists around the world, around the clock
"Offline" software effort: 1000 person-years per experiment
Software life span: 20 years

Slide 6: Challenge 2: Large data volume
[Table: per-experiment trigger rate [Hz], RAW [MB], ESD/rDST/RECO [MB], AOD [kB], Monte Carlo [MB/evt] and Monte Carlo as % of real data, for ALICE heavy-ion, ALICE pp, ATLAS, CMS and LHCb; the numeric values are not legible in this transcript]
pp running from 2008 on: ~2 x 10^9 events per experiment per year
10^6 seconds/year of heavy-ion running

Slide 7: Challenge 3: Large CPU capacity
[Table: ATLAS resource estimates for 2008, in CPU (MSi2k), Disk (PB) and Tape (PB), for Tier-0, the CERN Analysis Facility, the sum of Tier-1s, the sum of Tier-2s and the total; the numeric values are not legible in this transcript, but the total corresponds to ~50000 of today's CPUs]
- ATLAS resources in 2008
  o Assume 2 x 10^9 events per year (1.6 MB per event); a back-of-the-envelope check follows below
  o First-pass reconstruction will run at the CERN Tier-0
  o Re-processing will be done at the Tier-1s (Regional Computing Centres, 10 of them)
  o Monte Carlo simulation will be done at the Tier-2s (e.g. physics institutes, ~30 of them); full simulation of ~20% of the data rate
  o Analysis will be done at Analysis Facilities, Tier-2s, Tier-3s, ...
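A back-of-the-envelope sketch of the storage side of these estimates, assuming only the figures quoted above (2 x 10^9 events per year, 1.6 MB per event, full simulation of ~20% of the data rate); the CPU, disk and tape totals of the table cannot be reproduced from these inputs alone.

```python
# Back-of-the-envelope check of the ATLAS 2008 raw-data volume,
# using only the assumptions quoted on the slide.

events_per_year = 2e9        # assumed pp events per year
raw_event_size_mb = 1.6      # RAW event size in MB
sim_fraction = 0.20          # full simulation of ~20% of the data rate

raw_volume_pb = events_per_year * raw_event_size_mb / 1e9   # MB -> PB
simulated_events = events_per_year * sim_fraction

print(f"RAW data volume : {raw_volume_pb:.1f} PB/year")   # ~3.2 PB/year
print(f"Simulated events: {simulated_events:.1e}/year")   # ~4e8 events/year
```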

Slide 8: CPU Requirements
[Bar chart: CPU requirements over time, split between CERN, Tier-1s and Tier-2s; 58% of the requirement is pledged]

Slide 9: Disk Requirements
[Bar chart: disk requirements over time, split between CERN, Tier-1s and Tier-2s; 54% of the requirement is pledged]

Slide 10: Tape Requirements
[Bar chart: tape requirements over time, split between CERN and Tier-1s; 75% of the requirement is pledged]

Slide 11: LHC Computing Challenges
- Large distributed community
- Large data volume ... and access to it for everyone
- Large CPU capacity
- How to face the problems?
  o CERN Computing Review: "Grid" is the chosen solution
  o "Build" the LCG (LHC Computing Grid) project
  o Roadmap for the LCG project and for the experiments
  o In 2005 LCG became WLCG

Slide 12: What is the Grid?
- The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations.
- The Grid is an emerging infrastructure that provides seamless access to computing power and data storage capacity distributed over the globe.
  o Global resource sharing
  o Secure access
  o Resource use optimization
  o The "death of distance": networking
  o Open standards

Slide 13: The Worldwide LHC Computing Grid Project (WLCG)
- Collaboration
  o LHC experiments
  o Grid projects: Europe, US
  o Regional & national centres
- Choices
  o Adopt Grid technology
  o Go for a "Tier" hierarchy
- Goal
  o Prepare and deploy the computing environment to help the experiments analyse the data from the LHC detectors
[Diagram of the Tier hierarchy: CERN Tier-0; Tier-1 centres (Germany, USA, UK, France, Italy, Taipei, SARA, Spain); Tier-2 labs and universities forming grids for regional groups; Tier-3 physics-department resources and desktops for physics study groups]

IMPF-2006G. Poulard - CERN PH-ATC14  Members o The experiments o The computing centres – Tier-0, Tier-1, Tier-2  Memorandum of understanding o Resources, services, defined service levels o Resource commitments pledged for the next year, with a 5-year forward look The Worldwide LCG Collaboration

Slide 15: WLCG services are built on two major science grid infrastructures:
- EGEE - Enabling Grids for E-sciencE
- OSG - US Open Science Grid

Slide 16: Enabling Grids for E-sciencE (EGEE)
- EU-supported project
- Develop and operate a multi-science grid
- Assist scientific communities to embrace grid technology
- First phase concentrated on operations and technology
- Second phase: emphasis on extending the scientific, geographical and industrial scope
- A world-wide Grid infrastructure and international collaboration; in phase 2 it will have > 90 partners in 32 countries

Slide 17: Open Science Grid (OSG)
- Multi-disciplinary consortium
  o Running physics experiments: CDF, D0, LIGO, SDSS, STAR
  o US LHC collaborations
  o Biology, computational chemistry
  o Computer science research
  o Condor and Globus
  o DOE laboratory computing divisions
  o University IT facilities
- OSG today
  o 50 Compute Elements
  o 6 Storage Elements
  o VDT
  o 23 VOs

Slide 18: Architecture - Grid services
- Storage Element
  o Mass Storage System (MSS): CASTOR, Enstore, HPSS, dCache, etc.
  o Storage Resource Manager (SRM) provides a common way to access the MSS, independent of the implementation
  o File Transfer Services (FTS), provided e.g. by GridFTP or srmCopy
- Computing Element
  o Interface to the local batch system, e.g. Globus gatekeeper
  o Accounting, status query, job monitoring
- Virtual Organization Management
  o Virtual Organization Management Services (VOMS)
  o Authentication and authorization based on the VOMS model
- Grid Catalogue Services
  o Mapping of Globally Unique Identifiers (GUID) to local file names (a sketch follows below)
  o Hierarchical namespace, access control
- Interoperability
  o EGEE and OSG both use the Virtual Data Toolkit (VDT)
  o Different implementations are hidden by common interfaces
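To make the catalogue idea concrete, here is a minimal sketch of the GUID-to-replica bookkeeping a grid file catalogue maintains: one logical, hierarchical name per file plus a set of physical replicas at different sites. The class, method and file names are invented for illustration; this is not the real LFC or SRM interface.

```python
# Toy replica catalogue: maps a GUID to a logical file name (LFN)
# and to the physical replicas registered at different sites.
# Purely illustrative; not the real LFC/SRM interface.

from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CatalogueEntry:
    lfn: str                                               # logical, hierarchical name
    replicas: Dict[str, str] = field(default_factory=dict)  # site -> storage URL


class ToyReplicaCatalogue:
    def __init__(self) -> None:
        self._entries: Dict[str, CatalogueEntry] = {}

    def register_file(self, guid: str, lfn: str) -> None:
        self._entries[guid] = CatalogueEntry(lfn)

    def add_replica(self, guid: str, site: str, surl: str) -> None:
        self._entries[guid].replicas[site] = surl

    def lookup(self, guid: str) -> CatalogueEntry:
        return self._entries[guid]


cat = ToyReplicaCatalogue()
cat.register_file("a1b2-c3d4", "/grid/atlas/dc2/evgen/file001.root")
cat.add_replica("a1b2-c3d4", "CERN", "srm://castor.cern.ch/atlas/file001.root")
cat.add_replica("a1b2-c3d4", "BNL", "srm://dcache.bnl.gov/atlas/file001.root")
print(cat.lookup("a1b2-c3d4").replicas)
```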

Slide 19: Technology - Middleware
- Currently, the LCG-2 middleware is deployed at more than 100 sites
- It originated from Condor, EDG, Globus, VDT and other projects
- It will now evolve to include functionality of the gLite middleware provided by the EGEE project, which has just been made available
- Site services include security, the Computing Element (CE), the Storage Element (SE), and monitoring and accounting services; these are currently available both from LCG-2 and from gLite
- VO services such as the Workload Management System (WMS), file catalogues, information services and file transfer services exist in both flavours (LCG-2 and gLite), maintaining close relations with VDT, Condor and Globus

Slide 20: Technology - Fabric
- Moore's law still holds for processors and disk storage
  o For CPUs and disks we count a lot on the evolution of the consumer market
  o For processors we expect an increasing importance of 64-bit architectures and multicore chips
- Mass storage (tapes and robots) is still a computer-centre item with computer-centre pricing
  o It is too early to conclude on new tape drives and robots
- Networking has seen a rapid evolution recently
  o Ten-gigabit Ethernet is now in the production environment
  o Wide-area networking can already count on 10 Gb connections between Tier-0 and the Tier-1s; this will move gradually to the Tier-1 to Tier-2 connections

Slide 21: Common Physics Applications
- Core software libraries
  o SEAL-ROOT merger
  o Scripting: CINT, Python
  o Mathematical libraries
  o Fitting, MINUIT (in C++)
- Data management
  o POOL: ROOT I/O for bulk data, RDBMS for metadata
  o Conditions database: COOL
- Event simulation
  o Event generators: generator library (GENSER)
  o Detector simulation: GEANT4 (ATLAS, CMS, LHCb)
  o Physics validation: compare GEANT4, FLUKA, test beam
- Software development infrastructure
  o External libraries
  o Software development and documentation tools
  o Quality assurance and testing
  o Project portal: Savannah

Slide 22: The Hierarchical Model
- Tier-0 at CERN
  o Record RAW data (1.25 GB/s ALICE; 320 MB/s ATLAS)
  o Distribute a second copy to the Tier-1s
  o Calibrate and do first-pass reconstruction
- Tier-1 centres (11 defined)
  o Manage permanent storage: RAW, simulated, processed
  o Capacity for reprocessing and bulk analysis
- Tier-2 centres (>~100 identified)
  o Monte Carlo event simulation
  o End-user analysis
- Tier-3
  o Facilities at universities and laboratories
  o Access to data and processing in Tier-2s and Tier-1s
  o Outside the scope of the project

Slide 23: Tier-1s

Tier-1 Centre                      Experiments served with priority
                                   ALICE  ATLAS  CMS  LHCb
TRIUMF, Canada                             X
GridKA, Germany                      X     X      X    X
CC-IN2P3, France                     X     X      X    X
CNAF, Italy                          X     X      X    X
SARA/NIKHEF, NL                      X     X           X
Nordic Data Grid Facility (NDGF)     X     X      X
ASCC, Taipei                               X      X
RAL, UK                              X     X      X    X
BNL, US                                    X
FNAL, US                                          X
PIC, Spain                                 X      X    X

Slide 24: Tier-2s
[World map of Tier-2 sites] ~100 identified; the number is still growing

Slide 25: Tier Connectivity
[Network map] National Research Networks (NRENs) at the Tier-1s: ASnet, LHCnet/ESnet, GARR, RENATER, DFN, SURFnet6, NORDUnet, RedIRIS, UKERNA, CANARIE

Slide 26: Prototypes
- It is important that the hardware and software systems developed in the framework of LCG be exercised in more and more demanding challenges
- Data Challenges were recommended by the 'Hoffmann Review'. Though the main goal was to validate the distributed computing model and to gradually build the computing systems, the results have also been used for physics performance studies and for detector, trigger and DAQ design. Limitations of the Grids have been identified and are being addressed.
  o A series of Data Challenges has been run by the 4 experiments
- Presently, a series of Service Challenges aims at realistic end-to-end testing of experiment use cases over extended periods, leading to stable production services.
- The project 'A Realisation of Distributed Analysis for LHC' (ARDA) is developing end-to-end prototypes of distributed analysis systems, using the EGEE middleware gLite, for each of the LHC experiments.

Slide 27: Service Challenges
- Purpose
  o Understand what it takes to operate a real grid service: run for days/weeks at a time (not just limited to experiment Data Challenges)
  o Trigger and verify Tier-1 & large Tier-2 planning and deployment, tested with realistic usage patterns
  o Get the essential grid services ramped up to target levels of reliability, availability, scalability and end-to-end performance
- Four progressive steps from October 2004 through September 2006
  o End 2004: SC1 - data transfer to a subset of Tier-1s
  o Spring 2005: SC2 - include mass storage, all Tier-1s, some Tier-2s
  o 2nd half 2005: SC3 - Tier-1s, >20 Tier-2s, first set of baseline services
  o Jun-Sep 2006: SC4 - pilot service

Slide 28: Key dates for Service Preparation
[Timeline: Sep 05 - SC3 service phase; Jun 06 - SC4 service phase; Sep 06 - initial LHC service in stable operation; Apr 07 - LHC service commissioned; followed by cosmics, first beams, first physics and the full physics run]
- SC3: reliable base service; most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 1 GB/sec, including 500 MB/sec to mass storage (150 MB/sec & 60 MB/sec at Tier-1s)
- SC4: all Tier-1s, major Tier-2s; capable of supporting the full experiment software chain, including analysis; sustain the nominal final grid data throughput (~1.5 GB/sec mass storage throughput)
- LHC Service in Operation from September 2006: ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput

Slide 29: ARDA: A Realisation of Distributed Analysis for LHC
- Distributed analysis on the Grid is the most difficult and least defined topic
- ARDA sets out to develop end-to-end analysis prototypes using the LCG-supported middleware
- ALICE uses the AliROOT framework based on PROOF
- ATLAS has used DIAL services with the gLite prototype as backend; this is rapidly evolving
- CMS has prototyped the 'ARDA Support for CMS Analysis Processing' (ASAP), which is used by several CMS physicists for daily analysis work
- LHCb has based its prototype on GANGA, a common project between ATLAS and LHCb

Slide 30: Production Grids - what has been achieved
- Basic middleware
- A set of baseline services agreed and initial versions in production
- All major LCG sites active
- 1 GB/sec distribution data rate mass storage to mass storage, > 50% of the nominal LHC data rate
- Grid job failure rate 5-10% for most experiments, down from ~30% in 2004
- Sustained 10K jobs per day
- > 10K simultaneous jobs during prolonged periods

Slide 31: Summary on WLCG
- Two grid infrastructures are now in operation, on which we are able to complete the computing services for LHC
- Reliability and performance have improved significantly over the past year
- The focus of Service Challenge 4 is to demonstrate a basic but reliable service that can be scaled up by April 2007 to the capacity and performance needed for the first beams
- Development of new functionality and services must continue, but we must be careful that this does not interfere with the main priority for this year: reliable operation of the baseline services
(From Les Robertson, CHEP'06)

Slide 32: ATLAS (A Toroidal LHC ApparatuS)
- Detector for the study of high-energy proton-proton collisions
- The offline computing will have to deal with an output event rate of 200 Hz, i.e. 2 x 10^9 events per year with an average event size of 1.6 MB
- Researchers are spread all over the world
ATLAS: ~2000 collaborators, ~150 institutes, 34 countries
Detector parameters: diameter 25 m; barrel toroid length 26 m; end-cap end-wall chamber span 46 m; overall weight 7000 tons

IMPF-2006G. Poulard - CERN PH-ATC33 Tier2 Centre ~200kSI2k Event Builder Event Filter ~159kSI2k T0 ~5MSI2k UK Regional Centre (RAL) US Regional Centre Spanish Regional Centre (PIC) Italian Regional Centre SheffieldManchesterLiverpool Lancaster ~0.25TIPS Workstations 10 GB/sec 450 Mb/sec MB/s Some data for calibration and monitoring to institutess Calibrations flow back Each Tier 2 has ~25 physicists working on one or more channels Each Tier 2 should have the full AOD, TAG & relevant Physics Group summary data Tier 2 do bulk of simulation Physics data cache ~Pb/sec ~ 300MB/s/T1 /expt Tier2 Centre ~200kSI2k  622Mb/s Tier 0 Tier 1 Desk top PC (2004) = ~1 kSpecInt2k Northern Tier ~200kSI2k Tier 2  ~200 Tb/year/T2  ~7.7MSI2k/T1  ~2 Pb/year/T1  ~9 Pb/year/T1  No simulation  622Mb/s The Computing Model

Slide 34: ATLAS Data Challenges (1)
- LHC Computing Review (2001): "Experiments should carry out Data Challenges of increasing size and complexity to validate their Computing Model, their complete software suite and their Data Model, and to ensure the correctness of the technical choices to be made"

Slide 35: ATLAS Data Challenges (2)
- DC1
  o First ATLAS exercise on a world-wide scale: O(1000) CPUs at peak
  o Put in place the full software chain: simulation of the data, digitization, pile-up, reconstruction
  o Production system: tools for bookkeeping of data and jobs (~AMI), monitoring, code distribution
  o "Preliminary" Grid usage
    - NorduGrid: all production performed on the Grid
    - US: Grid used at the end of the exercise
    - LCG-EDG: some testing during the Data Challenge, but no "real" production
  o At least one person per contributing site: many people involved
  o Lessons learned
    - Management of failures is a key concern
    - Automate to cope with the large number of jobs
  o "Build" the ATLAS DC community
  o Physics: Monte Carlo data needed for the ATLAS High Level Trigger Technical Design Report

Slide 36: ATLAS Data Challenges (3)
- DC2 (2004)
  o Similar exercise to DC1 (scale; physics processes) BUT
  o Introduced the new ATLAS Production System (ProdSys)
    - Unsupervised production across many sites spread over three different Grids (US Grid3, ARC/NorduGrid, LCG-2)
    - Based on DC1 experience with AtCom and GRAT: core engine with plug-ins
    - 4 major components: production supervisor, executor, common data management system, common production database
    - Use middleware components as much as possible; avoid inventing ATLAS's own version of Grid: use the middleware broker, catalogs, information system, ...
- Immediately followed by the "Rome" production (2005)
  o Production of simulated data for an ATLAS physics workshop in Rome in June 2005, using the DC2 infrastructure

Slide 37: ATLAS Production System
- ATLAS uses 3 Grids
  o LCG (= EGEE)
  o ARC/NorduGrid (evolved from EDG)
  o OSG/Grid3 (US)
- Plus the possibility of local batch submission (4 interfaces)
- Input and output must be accessible from all Grids
- The system makes use of the native Grid middleware as much as possible (e.g. Grid catalogs); it does not "re-invent" its own solution

Slide 38: ATLAS Production System
- In order to handle the task of the ATLAS Data Challenges, an automated production system was developed. It consists of 4 components:
  o The production database, which contains abstract job definitions
  o A supervisor (Windmill; Eowyn) that reads the production database for job definitions and presents them to the different Grid executors in an easy-to-parse XML format
  o The executors, one for each Grid flavour, that receive the job definitions in XML format and convert them to the job description language of that particular Grid
  o DonQuijote (DQ), the ATLAS Data Management System, which moves files from their temporary output locations to their final destination on some Storage Element and registers the files in the Replica Location Service of that Grid
(A minimal sketch of this supervisor/executor pattern follows below.)
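The following toy sketch illustrates the supervisor/executor split described on this slide: a supervisor serializes abstract job definitions to XML and hands them to a Grid-specific executor, which translates them into that Grid's job description language. All class names, job fields and the JDL-like output are invented for illustration; this is not the real Windmill/Eowyn/ProdSys code.

```python
# Toy illustration of the supervisor/executor pattern (not the real ProdSys).
import xml.etree.ElementTree as ET

# "Production database": abstract job definitions (fields are invented)
PRODUCTION_DB = [
    {"id": 1, "transformation": "simulation", "input": "evgen.0001.pool"},
    {"id": 2, "transformation": "reconstruction", "input": "digits.0001.pool"},
]


def job_to_xml(job: dict) -> str:
    """Supervisor side: wrap an abstract job definition in easy-to-parse XML."""
    root = ET.Element("job", id=str(job["id"]))
    ET.SubElement(root, "transformation").text = job["transformation"]
    ET.SubElement(root, "input").text = job["input"]
    return ET.tostring(root, encoding="unicode")


class ToyLCGExecutor:
    """Executor side: translate the XML into a Grid-specific description."""

    def translate(self, xml_job: str) -> str:
        job = ET.fromstring(xml_job)
        # A real executor would build JDL (LCG), xRSL (NorduGrid), etc.
        return (f'Executable = "{job.findtext("transformation")}";\n'
                f'InputData  = "{job.findtext("input")}";')


executor = ToyLCGExecutor()
for job in PRODUCTION_DB:            # supervisor loop
    print(executor.translate(job_to_xml(job)), "\n")
```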

Slide 39: The 3 Grid flavours: LCG-2
[Map of LCG-2 sites, ATLAS DC2, autumn 2004] The number of sites and the resources are evolving quickly

Slide 40: The 3 Grid flavours: Grid3
- The deployed infrastructure has been in operation since November 2003
- At this moment it is running 3 HEP and 2 biological applications
- Over 100 users are authorized to run in Grid3
[Map of Grid3 sites, September 2004: multi-VO shared resources, ~3000 CPUs (shared); ATLAS DC2, autumn 2004]

Slide 41: The 3 Grid flavours: NorduGrid
- NorduGrid is a research collaboration established mainly across the Nordic countries, but it includes sites from other countries
- It contributed a significant part of DC1 (using the Grid in 2002)
- It supports production on several operating systems
[Map of NorduGrid sites: > 10 countries, 40+ sites, ~4000 CPUs, ~30 TB storage; ATLAS DC2, autumn 2004]

Slide 42: Production phases
[Diagram of the production chain and data formats: Pythia event generation -> physics events (HepMC) -> Geant4 detector simulation -> hits + MCTruth -> digitization, with pile-up of minimum-bias events -> digits (RDO) + MCTruth -> byte-stream raw digits / event mixing -> reconstruction -> ESD, AOD. Approximate data volumes for 10^7 events, as labelled across the diagram: ~5 TB, 20 TB, 30 TB, 20 TB, 5 TB. Persistency: Athena-POOL]
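For readability, here is the same chain written out as a small illustrative listing of the phases with the data formats they consume and produce, using only the stage and format names visible on the slide; the per-phase data volumes are left out because their assignment to individual phases is not legible in this transcript.

```python
# Illustrative summary of the production chain shown on the slide.
# Stage and format names come from the slide; the listing itself is
# just a convenient way to read the diagram.

PRODUCTION_CHAIN = [
    {"phase": "Event generation (Pythia)",
     "input": "-", "output": "physics events (HepMC)"},
    {"phase": "Detector simulation (Geant4)",
     "input": "events (HepMC)", "output": "hits + MCTruth"},
    {"phase": "Digitization / pile-up",
     "input": "hits + MCTruth (+ minimum-bias events)",
     "output": "digits (RDO) + MCTruth"},
    {"phase": "Event mixing / byte-stream conversion",
     "input": "digits (RDO)", "output": "byte-stream raw digits"},
    {"phase": "Reconstruction",
     "input": "byte-stream raw digits or RDO", "output": "ESD, AOD"},
]

for stage in PRODUCTION_CHAIN:
    print(f'{stage["phase"]:38s} {stage["input"]:42s} -> {stage["output"]}')
```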

Slide 43: ATLAS productions
- DC2
  o Few datasets
  o Different types of jobs
    - Physics event generation: very short
    - Geant simulation (Geant3 in DC1; Geant4 in DC2 & "Rome"): long, more than 10 hours
    - Digitization: medium, ~5 hours
    - Reconstruction: short
  o All types of jobs run sequentially, each phase one after the other
- "Rome"
  o Many different (>170) datasets, for different physics channels
  o Same types of jobs (event generation, simulation, etc.)
  o All types of jobs run in parallel
- Now "continuous" production
  o Goal is to reach 2M events per week
The different types of running have a large impact on the production rate

Slide 44: ATLAS Productions: countries (sites)
Participating countries, with the number of sites in DC2 and in the "Rome" production in parentheses:
Australia (1) (0), Austria (1), Canada (4) (3), CERN (1), Czech Republic (2), Denmark (4) (3), France (1) (4), Germany (1+2), Greece (0) (1), Hungary (0) (1), Italy (7) (17), Japan (1) (0), Netherlands (1) (2), Norway (3) (2), Poland (1), Portugal (0) (1), Russia (0) (2), Slovakia (0) (1), Slovenia (1), Spain (3), Sweden (7) (5), Switzerland (1) (1+1), Taiwan (1), UK (7) (8), USA (19)
Totals: DC2: 20 countries, 69 sites; "Rome": 22 countries, 84 sites
Additional per-Grid breakdowns shown on the slide: DC2: 13 countries, 31 sites / "Rome": 17 countries, 51 sites; and DC2: 7 countries, 19 sites / "Rome": 7 countries, 14 sites
Spring 2006: 30 countries, 126 sites (LCG: 104; OSG/Grid3: 8; NDGF: 14)

Slide 45: ATLAS DC2: Jobs
[Pie chart of DC2 jobs per country, as of 30 November 2004]
Total: 20 countries, 69 sites, ~2 MSi2k.months of CPU (the total number of jobs is not legible in this transcript)

Slide 46: Rome production: number of jobs
[Pie chart of Rome-production jobs per site, as of 17 June; the largest individual site shares are in the 4-6% range]

Slide 47: Rome production statistics
- 173 datasets
- 6.1 M events simulated and reconstructed (without pile-up)
- Total simulated data: 8.5 M events
- Pile-up done for 1.3 M events
  o 50 K reconstructed

Slide 48: ATLAS Production (2006)
[Plot of ATLAS production during 2006]

Slide 49: ATLAS Production (July - May 2005)
[Plot of ATLAS production over the period up to May 2005]

Slide 50: ATLAS & Service Challenge 3
- Tier-0 scaling tests
  o Test of the operations at the CERN Tier-0
  o Original goal: 10% exercise
- Preparation phase: July-October 2005
- Tests: October 2005 - January 2006

Slide 51: ATLAS & Service Challenge 3
- The Tier-0 facility at CERN is responsible for the following operations:
  o Calibration and alignment
  o First-pass ESD production
  o First-pass AOD production
  o TAG production
  o Archiving of primary RAW and first-pass ESD, AOD and TAG data
  o Distribution of primary RAW and first-pass ESD, AOD and TAG data

Slide 52: ATLAS SC3/Tier-0 (1)
- Components of Tier-0
  o Castor mass storage system and local replica catalogue
  o CPU farm
  o Conditions DB
  o TAG DB
  o Tier-0 production database
  o Data management system: Don Quijote 2 (DQ2)
  o All orchestrated by the Tier-0 Management System: TOM, based on the ATLAS Production System (ProdSys)

Slide 53: ATLAS SC3/Tier-0 (2)
- Deploy and test
  o LCG/gLite components (main focus on the Tier-0 exercise)
    - FTS server at T0 and T1
    - LFC catalogue at T0, T1 and T2
    - VOBOX at T0, T1 and T2
    - SRM Storage Element at T0, T1 and T2
  o ATLAS DQ2-specific components
    - Central DQ2 dataset catalogues
    - DQ2 site services, sitting in the VOBOXes
    - DQ2 client for TOM
(An illustrative sketch of the dataset-catalogue idea follows below.)
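The central dataset catalogues mentioned above are essentially a bookkeeping layer that groups files into named datasets and records which sites hold a copy of each dataset. The toy sketch below illustrates only that idea; it is not the real DQ2 client API, and the class, method and dataset names are invented.

```python
# Toy dataset bookkeeping in the spirit of DQ2: datasets group files
# (GUID + logical file name) and locations record which sites hold them.
# Purely illustrative; not the real DQ2 API.

class ToyDatasetCatalogue:
    def __init__(self):
        self.contents = {}    # dataset name -> list of (guid, lfn)
        self.locations = {}   # dataset name -> set of site names

    def add_dataset(self, name):
        self.contents.setdefault(name, [])
        self.locations.setdefault(name, set())

    def add_files(self, name, files):
        self.contents[name].extend(files)

    def register_location(self, name, site):
        self.locations[name].add(site)

    def list_replicas(self, name):
        return sorted(self.locations[name])


dq2 = ToyDatasetCatalogue()
dq2.add_dataset("rome.higgs.simul.0001")
dq2.add_files("rome.higgs.simul.0001",
              [("guid-0001", "HITS.0001.pool.root"),
               ("guid-0002", "HITS.0002.pool.root")])
dq2.register_location("rome.higgs.simul.0001", "CERN")
dq2.register_location("rome.higgs.simul.0001", "BNL")
print(dq2.list_replicas("rome.higgs.simul.0001"))   # ['BNL', 'CERN']
```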

Slide 54: ATLAS Tier-0 data flow
[Diagram: Event Filter -> Castor disk -> CPU farm -> Castor tape and export to Tier-1s, with per-stream figures:
- RAW: 1.6 GB/file, 0.2 Hz, 17K files/day, 320 MB/s, 27 TB/day
- ESD: 0.5 GB/file, 0.2 Hz, 17K files/day, 100 MB/s, 8 TB/day
- AOD: 10 MB/file, 2 Hz, 170K files/day, 20 MB/s, 1.6 TB/day
- AODm: 500 MB/file, 0.04 Hz, 3.4K files/day, 20 MB/s, 1.6 TB/day
Stream combinations labelled on the arrows: RAW+AOD; RAW + ESD (2x) + AODm (10x); RAW+ESD+AODm. Aggregate rates on the arrows: 0.44 Hz, 37K files/day, 440 MB/s; 1 Hz, 85K files/day, 720 MB/s; 0.4 Hz, 190K files/day, 340 MB/s; 2.24 Hz, 170K files/day (temp), 20K files/day (perm), 140 MB/s]
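The per-stream figures above are internally consistent: file size times file rate gives the bandwidth, and rate times the length of a day gives the files per day. The sketch below redoes that arithmetic and also shows one plausible reading of the aggregate arrows (for example, RAW plus two copies of ESD plus ten copies of AODm reproduces the 720 MB/s figure); that reading is an interpretation of the flattened diagram, not something stated explicitly in the transcript.

```python
# Consistency check of the Tier-0 data-flow figures quoted on the slide.
streams = {          # name: (file size in MB, file rate in Hz)
    "RAW":  (1600.0, 0.2),
    "ESD":  (500.0, 0.2),
    "AOD":  (10.0, 2.0),
    "AODm": (500.0, 0.04),
}

DAY = 86400  # seconds per day

for name, (size_mb, rate_hz) in streams.items():
    bandwidth_mb_s = size_mb * rate_hz
    files_per_day = rate_hz * DAY
    volume_tb_day = bandwidth_mb_s * DAY / 1e6
    print(f"{name:5s} {bandwidth_mb_s:6.1f} MB/s "
          f"{files_per_day / 1e3:6.1f}K files/day {volume_tb_day:5.1f} TB/day")

# One plausible reading of the aggregate arrows (assumed, see text above):
to_tape   = 1600 * 0.2 + 500 * 0.2 + 500 * 0.04            # RAW + ESD + AODm          = 440 MB/s
to_tier1s = 1600 * 0.2 + 2 * 500 * 0.2 + 10 * 500 * 0.04   # RAW + 2x ESD + 10x AODm   = 720 MB/s
reco_out  = 500 * 0.2 + 10 * 2 + 500 * 0.04                # ESD + AOD + AODm          = 140 MB/s
print(to_tape, to_tier1s, reco_out)
```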

Slide 55: Scope of the Tier-0 Scaling Test
- It was only possible to test
  o EF writing into Castor
  o ESD/AOD production on the reconstruction farm
  o archiving to tape
  o export to the Tier-1s of RAW/ESD/AOD
- The goal was to test as much as possible, as realistically as possible
- Mainly a data-flow/infrastructure test (no physics value); calibration & alignment processing not included yet, nor the CondDB and TagDB streams

Slide 56: Oct-Dec 2005 Test: Some Results
[Plot: Castor writing rates (Dec 19-20): EF farm -> Castor (write.raw); reco farm -> Castor (reco jobs: write.esd + write.aodtmp; AOD-merging jobs: write.aod)]

Slide 57: Tier-0 Internal Test, Jan 28-29, 2006
[Plots of measured transfer rates against nominal values:
- READING (nominal rate 780 MB/s): Disk -> WN and Disk -> Tape
- WRITING (nominal rate 460 MB/s): SFO -> Disk and WN -> Disk
- WRITING (nominal rate 440 MB/s): Disk -> Tape]

Slide 58: ATLAS SC4 Tests (June to December 2006)
- Complete Tier-0 test
  o Internal data transfer from the "Event Filter" farm to the Castor disk pool, Castor tape and the CPU farm
  o Calibration loop and handling of conditions data, including distribution of conditions data to Tier-1s (and Tier-2s)
  o Transfer of RAW, ESD, AOD and TAG data to Tier-1s
  o Transfer of AOD and TAG data to Tier-2s
  o Data and dataset registration in the DB
- Distributed production
  o Full simulation chain run at Tier-2s (and Tier-1s), with data distribution to Tier-1s, other Tier-2s and the CAF
  o Reprocessing of raw data at Tier-1s, with data distribution to other Tier-1s, Tier-2s and the CAF
- Distributed analysis
  o "Random" job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
  o Tests of the performance of job submission, distribution and output retrieval
Need to define and test the Tiers infrastructure and the Tier-1 to Tier-1 and Tier-1 to Tier-2 associations

Slide 59: ATLAS Tier-1s "2008" Resources
[Table: CPU (MSi2k and % of the requirement), Disk (PB and %) and Tape (PB and %) for 2008 for each ATLAS Tier-1: TRIUMF (Canada), CC-IN2P3 (France), FZK (Germany), CNAF (Italy), Nordic Data Grid Facility, SARA (Netherlands), PIC (Spain), ASGC (Taiwan), RAL (UK) and BNL (USA), together with the 2008 totals pledged, needed and missing; the numeric values are not legible in this transcript]

Slide 60: ATLAS Tiers Association (SC4 draft)

Country        Tier-1                      %     Associated Tier-1(s)    Tier-2s or planned Tier-2s
Canada         TRIUMF                      5.3   SARA                    East T2 Fed., West T2 Fed.
France         CC-IN2P3                    13.5  BNL                     CC-IN2P3 AF, GRIF, LPC, HEP-Beijing, Romanian T2
Germany        FZK-GridKa                  10.5  BNL                     DESY, Munich Fed., Freiburg Uni., Wuppertal Uni., FZU AS (CZ), Polish T2 Fed.
Italy          CNAF                        7.5   RAL                     INFN T2 Fed.
Netherlands    SARA                        13.0  TRIUMF, ASGC
               Nordic Data Grid Facility   5.5   PIC
Spain          PIC                         5.5   NDGF                    ATLAS T2 Fed.
Taiwan         ASGC                        7.7   SARA                    Taiwan AF Fed.
UK             RAL                         7.5   CNAF                    Grid London, NorthGrid, ScotGrid, SouthGrid
USA            BNL                         24    CC-IN2P3, FZK-GridKa    BU/HU T2, Midwest T2, Southwest T2

No association (yet): Melbourne Uni., ICEPP Tokyo, LIP T2, HEP-IL Fed., Russian Fed., CSCS (CH), UIBK, Brazilian T2 Fed.
(The additional Disk [TB] and Tape [PB] columns of the original table are not legible in this transcript.)

Slide 61: Computing System Commissioning
- We have defined the high-level goals of the Computing System Commissioning operation during 2006
  o More a running-in of continuous operation than a stand-alone challenge
- The main aim of Computing System Commissioning will be to test the software and computing infrastructure that we will need at the beginning of 2007:
  o Calibration and alignment procedures and conditions DB
  o Full trigger chain
  o Event reconstruction and data distribution
  o Distributed access to the data for analysis
- At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates

Slide 63: Conclusions (ATLAS)
- Data Challenges (1, 2) and productions ("Rome"; current "continuous" production)
  o Have proven that the 3 Grids (LCG-EGEE, OSG/Grid3 and ARC/NorduGrid) can be used in a coherent way for real large-scale productions: possible, but not easy
- In SC3
  o We succeeded in reaching the nominal data transfer rate at the Tier-0 (internally) and reasonable transfers to the Tier-1s
- SC4
  o Should allow us to test the full chain using the new WLCG middleware and infrastructure and the new ATLAS production and data management systems
  o This will include a more complete Tier-0 test, distributed productions and distributed analysis tests
- Computing System Commissioning
  o Will have as its main goal a fully working and operational system
  o Leading to a physics readiness report

Slide 64: Thank you