Grid Computing at LHC and ATLAS Data Challenges IMFP-2006 El Escorial, Madrid, Spain. April 4, 2006 Gilbert Poulard (CERN PH-ATC)

1 Grid Computing at LHC and ATLAS Data Challenges IMFP-2006 El Escorial, Madrid, Spain. April 4, 2006 Gilbert Poulard (CERN PH-ATC)

2 Overview
- Introduction
- LHC experiments computing challenges
- WLCG: Worldwide LHC Computing Grid
- ATLAS experiment: building the Computing System
- Conclusions

3 Introduction: the LHC at CERN
(Aerial photo of the LHC/CERN site, with Geneva and Mont Blanc (4810 m) indicated.)

4 LHC Computing Challenges
- Large distributed community
- Large data volume ... and access to it for everyone
- Large CPU capacity

5 Challenge 1: Large, distributed community
- CMS, ATLAS, LHCb: ~5000 physicists around the world, around the clock
- "Offline" software effort: 1000 person-years per experiment
- Software life span: 20 years

6 Large data volume
               Rate [Hz]  RAW [MB]  ESD/rDST/RECO [MB]  AOD [kB]  Monte Carlo [MB/evt]  Monte Carlo % of real
  ALICE HI        100       12.5          2.5             250            300                   100
  ALICE pp        100        1            0.04              4              0.4                 100
  ATLAS           200        1.6          0.5             100              2                    20
  CMS             150        1.5          0.25             50              2                   100
  LHCb           2000        0.025        0.5              20              -                     -
- 50 days of running in 2007
- 10^7 seconds/year pp from 2008 on -> ~2 x 10^9 events/experiment
- 10^6 seconds/year heavy ion
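
To make the scale concrete, here is a minimal back-of-the-envelope sketch (Python, decimal units) that multiplies each trigger rate from the table by the 10^7 live seconds per year quoted above for pp running. The numbers come from the table; the script itself is only an illustration, not an official accounting.

```python
# Rough cross-check of the table above: events/year = trigger rate x live seconds,
# RAW volume/year = events x RAW event size (decimal units, pp running only).

SECONDS_PER_YEAR_PP = 1e7  # pp live time assumed from 2008 on

experiments = {
    # name: (trigger rate in Hz, RAW event size in MB)
    "ATLAS":    (200, 1.6),
    "CMS":      (150, 1.5),
    "LHCb":     (2000, 0.025),
    "ALICE pp": (100, 1.0),
}

for name, (rate_hz, raw_mb) in experiments.items():
    events = rate_hz * SECONDS_PER_YEAR_PP
    raw_pb = events * raw_mb / 1e9          # 1 PB = 1e9 MB in decimal units
    print(f"{name:9s}: {events:.1e} events/year, ~{raw_pb:.2f} PB RAW/year")

# ATLAS comes out at 2.0e+09 events/year and ~3.2 PB of RAW data per year,
# consistent with the "~2 x 10^9 events/experiment" quoted above.
```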

7 Large CPU capacity
                            CPU (MSi2k)  Disk (PB)  Tape (PB)
  Tier-0                        4.1         0.4        5.7
  CERN Analysis Facility        2.7         1.9        0.5
  Sum of Tier-1s               24.0        14.4        9.0
  Sum of Tier-2s               19.9         8.7        0.0
  Total                        50.7        25.4       15.2
  (~50,000 of today's CPUs)
- ATLAS resources in 2008:
  - Assume 2 x 10^9 events per year (1.6 MB per event)
  - First-pass reconstruction will run at the CERN Tier-0
  - Re-processing will be done at the Tier-1s (Regional Computing Centres, 10 of them)
  - Monte Carlo simulation will be done at the Tier-2s (e.g. physics institutes, ~30 of them); full simulation of ~20% of the data rate
  - Analysis will be done at Analysis Facilities, Tier-2s, Tier-3s, ...

8 CPU Requirements (chart of CERN, Tier-1 and Tier-2 needs); 58% pledged

9 Disk Requirements (chart of CERN, Tier-1 and Tier-2 needs); 54% pledged

10 Tape Requirements (chart of CERN and Tier-1 needs); 75% pledged

11 LHC Computing Challenges
- Large distributed community
- Large data volume ... and access to it for everyone
- Large CPU capacity
- How to face the problems?
  - CERN Computing Review (2000-2001): the "Grid" is the chosen solution
  - "Build" the LCG (LHC Computing Grid) project
  - Roadmap for the LCG project and for the experiments
  - In 2005 LCG became WLCG

12 What is the Grid?
- The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations.
- The Grid is an emerging infrastructure that provides seamless access to computing power and data storage capacity distributed over the globe.
  - Global resource sharing
  - Secure access
  - Resource use optimization
  - The "death of distance": networking
  - Open standards

13 The Worldwide LHC Computing Grid Project - WLCG
- Collaboration
  - LHC experiments
  - Grid projects: Europe, US
  - Regional & national centres
- Choices
  - Adopt Grid technology
  - Go for a "Tier" hierarchy
- Goal
  - Prepare and deploy the computing environment to help the experiments analyse the data from the LHC detectors.
(Diagram: CERN Tier-0; Tier-1 centres in Germany, USA, UK, France, Italy, Taipei, SARA, Spain; Tier-2 centres at labs and universities serving regional groups; Tier-3 resources in physics departments and on desktops for physics study groups.)

14 The Worldwide LCG Collaboration
- Members
  - The experiments
  - The computing centres: Tier-0, Tier-1, Tier-2
- Memorandum of understanding
  - Resources, services, defined service levels
  - Resource commitments pledged for the next year, with a 5-year forward look

15 WLCG services are built on two major science grid infrastructures:
- EGEE - Enabling Grids for E-SciencE
- OSG - US Open Science Grid

16 Enabling Grids for E-SciencE
- EU-supported project
- Develop and operate a multi-science grid
- Assist scientific communities to embrace grid technology
- First phase concentrated on operations and technology
- Second phase (2006-08): emphasis on extending the scientific, geographical and industrial scope
- World-wide Grid infrastructure and international collaboration; in phase 2 it will have > 90 partners in 32 countries

17 Open Science Grid
- Multi-disciplinary consortium
  - Running physics experiments: CDF, D0, LIGO, SDSS, STAR
  - US LHC collaborations
  - Biology, computational chemistry
  - Computer science research
  - Condor and Globus
  - DOE laboratory computing divisions
  - University IT facilities
- OSG today
  - 50 Compute Elements
  - 6 Storage Elements
  - VDT 1.3.9
  - 23 VOs

18 Architecture - Grid services
- Storage Element
  - Mass Storage System (MSS): CASTOR, Enstore, HPSS, dCache, etc.
  - Storage Resource Manager (SRM) provides a common way to access the MSS, independent of implementation
  - File Transfer Services (FTS), provided e.g. by GridFTP or srmCopy
- Computing Element
  - Interface to the local batch system, e.g. the Globus gatekeeper
  - Accounting, status query, job monitoring
- Virtual Organization Management
  - Virtual Organization Management Services (VOMS)
  - Authentication and authorization based on the VOMS model
- Grid Catalogue Services
  - Mapping of Globally Unique Identifiers (GUIDs) to local file names (see the sketch below)
  - Hierarchical namespace, access control
- Interoperability
  - EGEE and OSG both use the Virtual Data Toolkit (VDT)
  - Different implementations are hidden by common interfaces
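
The catalogue bullet above is the piece most easily shown in code: a file is known grid-wide by a GUID, has a logical file name in a hierarchical namespace, and can have physical replicas registered at several storage elements. The sketch below is a toy in-memory version of that idea only; it is not the LFC/RLS API, and the GUID, paths and hostnames are invented for illustration.

```python
# Toy replica catalogue: GUID -> logical file name + registered physical replicas.
# Purely illustrative; real catalogues (LFC, RLS) also handle namespaces and ACLs.

from dataclasses import dataclass, field

@dataclass
class CatalogueEntry:
    guid: str                                     # globally unique identifier
    lfn: str                                      # logical file name
    replicas: list = field(default_factory=list)  # storage URLs of physical copies

catalogue: dict[str, CatalogueEntry] = {}

def register_file(guid: str, lfn: str) -> None:
    catalogue[guid] = CatalogueEntry(guid, lfn)

def add_replica(guid: str, surl: str) -> None:
    catalogue[guid].replicas.append(surl)

def list_replicas(guid: str) -> list:
    return catalogue[guid].replicas

# Example: one logical file with copies at two (made-up) storage elements.
register_file("guid-0001", "/grid/atlas/dc2/evgen/file001.root")
add_replica("guid-0001", "srm://se.tier1-a.example/atlas/file001.root")
add_replica("guid-0001", "srm://se.tier1-b.example/atlas/file001.root")
print(list_replicas("guid-0001"))
```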

19 Technology - Middleware
- Currently, the LCG-2 middleware is deployed at more than 100 sites.
- It originated from Condor, EDG, Globus, VDT and other projects.
- It will now evolve to include functionality of the gLite middleware provided by the EGEE project, which has just been made available.
- Site services include security, the Computing Element (CE), the Storage Element (SE), and monitoring and accounting services, currently available both from LCG-2 and gLite.
- VO services such as the Workload Management System (WMS), file catalogues, information services and file transfer services exist in both flavours (LCG-2 and gLite), maintaining close relations with VDT, Condor and Globus.

20 Technology - Fabric
- Moore's law still holds for processors and disk storage
  - For CPUs and disks we count a lot on the evolution of the consumer market
  - For processors we expect an increasing importance of 64-bit architectures and multicore chips
- Mass storage (tapes and robots) is still a computer-centre item with computer-centre pricing
  - It is too early to draw conclusions on new tape drives and robots
- Networking has seen a rapid evolution recently
  - Ten-gigabit Ethernet is now in the production environment
  - Wide-area networking can already count on 10 Gb connections between the Tier-0 and the Tier-1s; this will move gradually to the Tier-1 to Tier-2 connections

21 Common Physics Applications
- Core software libraries
  - SEAL-ROOT merger
  - Scripting: CINT, Python
  - Mathematical libraries
  - Fitting, MINUIT (in C++)
- Data management
  - POOL: ROOT I/O for bulk data, RDBMS for metadata
  - Conditions database: COOL
- Event simulation
  - Event generators: generator library (GENSER)
  - Detector simulation: GEANT4 (ATLAS, CMS, LHCb)
  - Physics validation: compare GEANT4, FLUKA, test beam
- Software development infrastructure
  - External libraries
  - Software development and documentation tools
  - Quality assurance and testing
  - Project portal: Savannah

22 The Hierarchical Model
- Tier-0 at CERN
  - Records RAW data (1.25 GB/s ALICE; 320 MB/s ATLAS)
  - Distributes a second copy to the Tier-1s
  - Calibrates and does first-pass reconstruction
- Tier-1 centres (11 defined)
  - Manage permanent storage: RAW, simulated, processed
  - Capacity for reprocessing and bulk analysis
- Tier-2 centres (>~100 identified)
  - Monte Carlo event simulation
  - End-user analysis
- Tier-3
  - Facilities at universities and laboratories
  - Access to data and processing in Tier-2s and Tier-1s
  - Outside the scope of the project

23 Tier-1s
  Tier-1 Centre                       Experiments served with priority
                                      ALICE  ATLAS  CMS  LHCb
  TRIUMF, Canada                              X
  GridKA, Germany                       X     X     X     X
  CC-IN2P3, France                      X     X     X     X
  CNAF, Italy                           X     X     X     X
  SARA/NIKHEF, NL                       X     X           X
  Nordic Data Grid Facility (NDGF)      X     X     X
  ASCC, Taipei                                X     X
  RAL, UK                               X     X     X     X
  BNL, US                                     X
  FNAL, US                                          X
  PIC, Spain                                  X     X     X

24 Tier-2s: ~100 identified, and the number is still growing

25 Tier-0/-1/-2 Connectivity
- National Research Networks (NRENs) at the Tier-1s: ASnet, LHCnet/ESnet, GARR, RENATER, DFN, SURFnet6, NORDUnet, RedIRIS, UKERNA, CANARIE

26 Prototypes
- It is important that the hardware and software systems developed in the framework of LCG be exercised in more and more demanding challenges.
- Data Challenges were recommended by the 'Hoffmann Review' of 2001. Though their main goal was to validate the distributed computing model and to gradually build up the computing systems, the results have also been used for physics performance studies and for detector, trigger and DAQ design. Limitations of the Grids have been identified and are being addressed.
  - A series of Data Challenges has been run by the 4 experiments.
- Presently, a series of Service Challenges aims at realistic end-to-end testing of experiment use-cases over extended periods, leading to stable production services.
- The project 'A Realisation of Distributed Analysis for LHC' (ARDA) is developing end-to-end prototypes of distributed analysis systems using the EGEE middleware gLite for each of the LHC experiments.

27 Service Challenges
- Purpose
  - Understand what it takes to operate a real grid service, running for days/weeks at a time (not just limited to experiment Data Challenges)
  - Trigger and verify Tier-1 and large Tier-2 planning and deployment, tested with realistic usage patterns
  - Get the essential grid services ramped up to target levels of reliability, availability, scalability and end-to-end performance
- Four progressive steps from October 2004 through September 2006
  - End 2004 - SC1: data transfer to a subset of Tier-1s
  - Spring 2005 - SC2: include mass storage, all Tier-1s, some Tier-2s
  - 2nd half 2005 - SC3: Tier-1s, >20 Tier-2s, first set of baseline services
  - Jun-Sep 2006 - SC4: pilot service

28 Key dates for Service Preparation
- Sep 05: SC3 service phase
- Jun 06: SC4 service phase
- Sep 06: initial LHC service in stable operation
- Apr 07: LHC service commissioned (cosmics and first beams in 2007, first physics and full physics run in 2008)
- SC3: reliable base service; most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 1 GB/sec, including mass storage 500 MB/sec (150 MB/sec & 60 MB/sec at Tier-1s)
- SC4: all Tier-1s, major Tier-2s; capable of supporting the full experiment software chain including analysis; sustain nominal final grid data throughput (~1.5 GB/sec mass storage throughput)
- LHC Service in Operation, September 2006: ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput

29 ARDA: A Realisation of Distributed Analysis for LHC
- Distributed analysis on the Grid is the most difficult and least defined topic.
- ARDA sets out to develop end-to-end analysis prototypes using the LCG-supported middleware.
- ALICE uses the AliROOT framework based on PROOF.
- ATLAS has used DIAL services with the gLite prototype as backend; this is rapidly evolving.
- CMS has prototyped the 'ARDA Support for CMS Analysis Processing' (ASAP), which is used by several CMS physicists for daily analysis work.
- LHCb has based its prototype on GANGA, a common project between ATLAS and LHCb.

30 Production Grids - what has been achieved
- Basic middleware
- A set of baseline services agreed, and initial versions in production
- All major LCG sites active
- 1 GB/sec distribution data rate, mass storage to mass storage: > 50% of the nominal LHC data rate
- Grid job failure rate 5-10% for most experiments, down from ~30% in 2004
- Sustained 10K jobs per day
- > 10K simultaneous jobs during prolonged periods

31 Summary on WLCG
- Two grid infrastructures are now in operation, on which we are able to complete the computing services for LHC.
- Reliability and performance have improved significantly over the past year.
- The focus of Service Challenge 4 is to demonstrate a basic but reliable service that can be scaled up by April 2007 to the capacity and performance needed for the first beams.
- Development of new functionality and services must continue, but we must be careful that this does not interfere with the main priority for this year: reliable operation of the baseline services.
(From Les Robertson, CHEP'06)

32 ATLAS (A Toroidal LHC ApparatuS)
- Detector for the study of high-energy proton-proton collisions.
- The offline computing will have to deal with an output event rate of 200 Hz, i.e. 2 x 10^9 events per year with an average event size of 1.6 MB.
- Researchers are spread all over the world: ~2000 collaborators, ~150 institutes, 34 countries.
- Detector dimensions: diameter 25 m; barrel toroid length 26 m; end-cap end-wall chamber span 46 m; overall weight 7000 tons.

33 The Computing Model
(Diagram of the ATLAS computing model: Event Builder -> Event Filter (~159 kSI2k) -> Tier-0 (~5 MSI2k) -> Tier-1s -> Tier-2s, with a 2004 desktop PC = ~1 kSpecInt2k as the unit of scale. Figures as labelled on the diagram: ~PB/sec off the detector into the physics data cache, 10 GB/sec and 450 Mb/sec links at CERN, ~300 MB/s per Tier-1 per experiment, 622 Mb/s links to Tier-2s, 100-1000 MB/s of calibration and monitoring data to institutes, with calibrations flowing back.)
- Tier-1s (e.g. RAL (UK), US, PIC (Spain), Italian regional centres): ~7.7 MSI2k per Tier-1, with ~2 PB/year and ~9 PB/year figures per Tier-1; no simulation at Tier-1
- Tier-2s (e.g. a Northern Tier of Sheffield, Manchester, Liverpool, Lancaster): ~200 kSI2k and ~200 TB/year per Tier-2; Tier-2s do the bulk of simulation; each Tier-2 has ~25 physicists working on one or more channels and should hold the full AOD, TAG and relevant physics-group summary data

34 ATLAS Data Challenges (1)
- LHC Computing Review (2001): "Experiments should carry out Data Challenges of increasing size and complexity to validate their Computing Model, their complete software suite, and their Data Model, and to ensure the correctness of the technical choices to be made."

35 ATLAS Data Challenges (2)
- DC1 (2002-2003)
  - First ATLAS exercise on a world-wide scale: O(1000) CPUs at peak
  - Put in place the full software chain: simulation of the data; digitization; pile-up; reconstruction
  - Production system tools: bookkeeping of data and jobs (~AMI); monitoring; code distribution
  - "Preliminary" Grid usage:
    - NorduGrid: all production performed on the Grid
    - US: Grid used at the end of the exercise
    - LCG-EDG: some testing during the Data Challenge, but no "real" production
  - At least one person per contributing site: many people involved
  - Lessons learned: management of failures is a key concern; automate to cope with the large number of jobs
  - "Build" the ATLAS DC community
  - Physics: Monte Carlo data needed for the ATLAS High Level Trigger Technical Design Report

36 ATLAS Data Challenges (3)
- DC2 (2004)
  - Similar exercise to DC1 (scale; physics processes), BUT
  - Introduced the new ATLAS Production System (ProdSys)
    - Unsupervised production across many sites spread over three different Grids (US Grid3; ARC/NorduGrid; LCG-2)
    - Based on DC1 experience with AtCom and GRAT: core engine with plug-ins
    - 4 major components: production supervisor; executor; common data management system; common production database
    - Use middleware components as much as possible: avoid inventing ATLAS's own version of the Grid; use the middleware broker, catalogs, information system, ...
- Immediately followed by the "Rome" production (2005)
  - Production of simulated data for an ATLAS physics workshop in Rome in June 2005, using the DC2 infrastructure.

37 ATLAS Production System
- ATLAS uses 3 Grids:
  - LCG (= EGEE)
  - ARC/NorduGrid (evolved from EDG)
  - OSG/Grid3 (US)
- Plus the possibility of local batch submission (4 interfaces)
- Input and output must be accessible from all Grids
- The system makes use of the native Grid middleware as much as possible (e.g. Grid catalogs); it does not "re-invent" its own solution.

38 ATLAS Production System
- In order to handle the task of the ATLAS Data Challenges, an automated production system was developed. It consists of 4 components (a sketch of the supervisor/executor interplay follows below):
  - The production database, which contains abstract job definitions
  - A supervisor (Windmill; Eowyn) that reads job definitions from the production database and presents them to the different Grid executors in an easy-to-parse XML format
  - The executors, one for each Grid flavor, which receive the job definitions in XML format and convert them to the job description language of that particular Grid
  - DonQuijote (DQ), the ATLAS Data Management System, which moves files from their temporary output locations to their final destination on some Storage Element and registers the files in the Replica Location Service of that Grid
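
The division of labour between the supervisor and the executors is easiest to see in code. The sketch below is a minimal, hypothetical rendering of that pattern: the class and method names are invented, the real components were Windmill/Eowyn and the per-grid executors, and real job definitions are far richer. The supervisor pulls abstract job definitions, renders them as XML, and each executor translates the XML into its grid's own job description.

```python
# Minimal sketch of the supervisor/executor pattern described above (hypothetical
# names; not the actual ProdSys code or schema).

import xml.etree.ElementTree as ET

def job_to_xml(job: dict) -> str:
    """Render an abstract job definition (as read from the production DB) as XML."""
    root = ET.Element("job", id=str(job["id"]), transformation=job["transformation"])
    for name, value in job["parameters"].items():
        ET.SubElement(root, "param", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")

class LCGExecutor:
    """Converts the common XML job description into this grid's job language."""
    def submit(self, job_xml: str) -> None:
        job = ET.fromstring(job_xml)
        jdl = 'Executable = "%s";' % job.get("transformation")  # JDL-like stub only
        print("submitting to LCG:", jdl)

class Supervisor:
    """Reads job definitions and hands each one to the executor of the chosen grid."""
    def __init__(self, executors: dict):
        self.executors = executors

    def run_once(self, pending_jobs) -> None:
        for job, grid in pending_jobs:
            self.executors[grid].submit(job_to_xml(job))

# One fake job definition routed to the (stub) LCG executor.
jobs = [({"id": 1, "transformation": "atlas_simul", "parameters": {"events": 50}}, "lcg")]
Supervisor({"lcg": LCGExecutor()}).run_once(jobs)
```

In the real system the supervisor also tracks job states in the production database and hands completed outputs to DonQuijote for movement and registration; the sketch leaves all of that out.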

39 The 3 Grid flavors: LCG-2
- (Map of LCG-2 sites, ATLAS DC2, autumn 2004.) The number of sites and the resources are evolving quickly.

40 The 3 Grid flavors: Grid3
- The deployed infrastructure has been in operation since November 2003
- At this moment running 3 HEP and 2 biological applications
- Over 100 users authorized to run in Grid3
- Sep 04: 30 sites, multi-VO shared resources, ~3000 CPUs (shared)
(Map of Grid3 sites, ATLAS DC2, autumn 2004.)

41 The 3 Grid flavors: NorduGrid
- NorduGrid is a research collaboration established mainly across the Nordic countries, but it includes sites from other countries.
- It contributed a significant part of DC1 (using the Grid in 2002).
- It supports production on several operating systems.
- > 10 countries, 40+ sites, ~4000 CPUs, ~30 TB storage
(Map of NorduGrid sites, ATLAS DC2, autumn 2004.)

42 Production phases
(Diagram of the ATLAS production chain; persistency: Athena-POOL.)
- Event generation (Pythia): physics events in HepMC format
- Detector simulation (Geant4): hits + MC truth
- Digitization, including pile-up with minimum-bias events: digits (RDO) + MC truth
- Event mixing and byte-stream conversion: byte-stream raw digits, as from the real detector
- Reconstruction: ESD and AOD
- Data volumes for 10^7 events, as labelled across the diagram: ~5 TB, 20 TB, 30 TB, 20 TB and 5 TB

43 ATLAS productions
- DC2
  - A few datasets
  - Different types of jobs:
    - Physics event generation: very short
    - Geant simulation (Geant3 in DC1; Geant4 in DC2 & "Rome"): long, more than 10 hours
    - Digitization: medium, ~5 hours
    - Reconstruction: short
  - All types of jobs run sequentially, each phase one after the other
- "Rome"
  - Many different (>170) datasets, covering different physics channels
  - Same types of jobs (event generation, simulation, etc.)
  - All types of jobs run in parallel
- Now "continuous" production
  - The goal is to reach 2M events per week (see the estimate sketched below)
- The different types of running have a large impact on the production rate.
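
The 2M-events-per-week goal can be turned into a rough capacity estimate. The events-per-job and CPU-hours-per-job figures below are assumptions made only for this illustration (the slide says no more than that simulation jobs run for over 10 hours), so the result is an order-of-magnitude sketch, not a planning number.

```python
# Back-of-the-envelope: what does "2M events per week" imply for simulation capacity?
events_per_week = 2_000_000
events_per_sim_job = 50   # ASSUMED job size, for illustration only
hours_per_sim_job = 10    # ASSUMED CPU time per simulation job ("more than 10 hours")

jobs_per_week = events_per_week / events_per_sim_job
cpu_hours_per_week = jobs_per_week * hours_per_sim_job
cpus_continuously_busy = cpu_hours_per_week / (7 * 24)

print(f"{jobs_per_week:,.0f} simulation jobs/week")            # 40,000
print(f"{cpu_hours_per_week:,.0f} CPU-hours/week")             # 400,000
print(f"~{cpus_continuously_busy:,.0f} CPUs busy full-time on simulation alone")  # ~2,381
```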

44 ATLAS productions: countries (sites)
- Participating countries, with the number of sites in DC2 and in "Rome" in parentheses where both are given: Australia (1) (0); Austria (1); Canada (4) (3); CERN (1); Czech Republic (2); Denmark (4) (3); France (1) (4); Germany (1+2); Greece (0) (1); Hungary (0) (1); Italy (7) (17); Japan (1) (0); Netherlands (1) (2); Norway (3) (2); Poland (1); Portugal (0) (1); Russia (0) (2); Slovakia (0) (1); Slovenia (1); Spain (3); Sweden (7) (5); Switzerland (1) (1+1); Taiwan (1); UK (7) (8); USA (19)
- Totals: DC2: 20 countries, 69 sites; "Rome": 22 countries, 84 sites
- Per-Grid breakdowns shown on the slide: DC2: 13 countries, 31 sites / "Rome": 17 countries, 51 sites; and DC2: 7 countries, 19 sites / "Rome": 7 countries, 14 sites
- Spring 2006: 30 countries, 126 sites (LCG: 104; OSG/Grid3: 8; NDGF: 14)

45 ATLAS DC2: jobs
- Total: 20 countries, 69 sites, ~260,000 jobs, ~2 MSi2k.months (as of 30 November 2004)

46 Rome production: number of jobs
(Pie chart of the job shares per site, as of 17 June 2005; the largest individual sites each contributed of order 4-6% of the jobs.)

47 Rome production statistics
- 173 datasets
- 6.1 M events simulated and reconstructed (without pile-up)
- Total simulated data: 8.5 M events
- Pile-up done for 1.3 M events, of which 50 K reconstructed

48 ATLAS Production (2006): production-rate chart

49 ATLAS Production (July 2004 - May 2005): production-rate chart

50 ATLAS & Service Challenge 3
- Tier-0 scaling tests: test of the operations at the CERN Tier-0
- Original goal: a 10% exercise
- Preparation phase: July-October 2005
- Tests: October 2005 - January 2006

51 ATLAS & Service Challenge 3
- The Tier-0 facility at CERN is responsible for the following operations:
  - Calibration and alignment
  - First-pass ESD production
  - First-pass AOD production
  - TAG production
  - Archiving of primary RAW and first-pass ESD, AOD and TAG data
  - Distribution of primary RAW and first-pass ESD, AOD and TAG data

52 ATLAS SC3/Tier-0 (1)
- Components of the Tier-0:
  - Castor mass storage system and local replica catalogue
  - CPU farm
  - Conditions DB
  - TAG DB
  - Tier-0 production database
  - Data management system, Don Quijote 2 (DQ2)
  - All orchestrated by the Tier-0 Management System (TOM), based on the ATLAS Production System (ProdSys)

53 ATLAS SC3/Tier-0 (2)
- Deploy and test
  - LCG/gLite components (main focus on the Tier-0 exercise):
    - FTS server at T0 and T1s
    - LFC catalog at T0, T1s and T2s
    - VOBOX at T0, T1s and T2s
    - SRM storage element at T0, T1s and T2s
  - ATLAS DQ2-specific components:
    - Central DQ2 dataset catalogs
    - DQ2 site services, sitting in the VOBOXes
    - DQ2 client for TOM
(A toy sketch of the dataset/subscription idea behind DQ2 follows below.)
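
The central dataset catalogs and the per-site services are the core of DQ2, and the underlying idea is compact enough to sketch: datasets are named collections of files, and a site "subscribes" to a dataset so that an agent running in that site's VOBOX pulls the missing files there. The code below is a toy illustration of that idea only; none of the names correspond to the real DQ2 API, and the dataset name and GUIDs are invented.

```python
# Toy sketch of the DQ2 dataset/subscription idea (invented names; not the DQ2 API).

central_datasets: dict[str, list] = {}   # dataset name -> list of file GUIDs
subscriptions: list[tuple] = []          # (dataset name, destination site)

def register_dataset(name: str, guids: list) -> None:
    central_datasets[name] = list(guids)

def subscribe(dataset: str, site: str) -> None:
    subscriptions.append((dataset, site))

def site_service_poll(site: str) -> None:
    """What a per-site agent (running in the site's VOBOX) would do: find datasets
    subscribed to this site and queue their constituent files for transfer."""
    for dataset, dest in subscriptions:
        if dest != site:
            continue
        for guid in central_datasets[dataset]:
            print(f"{site}: queue transfer of {guid} (dataset {dataset})")

register_dataset("example.minbias.simul", ["guid-001", "guid-002"])
subscribe("example.minbias.simul", "CERN-T0")
site_service_poll("CERN-T0")
```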

54 ATLAS Tier-0 dataflow
(Diagram of the flow from the Event Filter (EF) through the Tier-0 CPU farm and Castor to tape and to the Tier-1s. Per-stream figures, cross-checked in the sketch below:)
  Stream  File size     Rate      Files/day    Bandwidth   Volume/day
  RAW     1.6 GB/file   0.2 Hz    17K f/day    320 MB/s    27 TB/day
  ESD     0.5 GB/file   0.2 Hz    17K f/day    100 MB/s     8 TB/day
  AOD     10 MB/file    2 Hz      170K f/day    20 MB/s    1.6 TB/day
  AODm    500 MB/file   0.04 Hz   3.4K f/day    20 MB/s    1.6 TB/day
- Aggregate flows labelled on the diagram (RAW, ESD (2x) and AODm (10x) copies towards tape, the reconstruction farm and the Tier-1s): 0.44 Hz, 37K f/day, 440 MB/s; 1 Hz, 85K f/day, 720 MB/s; 0.4 Hz, 190K f/day, 340 MB/s; 2.24 Hz, 170K f/day (temp), 20K f/day (perm), 140 MB/s
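
The per-stream numbers in the table are internally consistent and easy to verify: files/day is the rate times 86,400 seconds, bandwidth is the rate times the file size, and the daily volume is the bandwidth integrated over a day. A small sketch (decimal units) reproducing them:

```python
# Cross-check of the per-stream Tier-0 figures above.
streams = {
    # name: (file size in MB, file rate in Hz)
    "RAW":  (1600, 0.2),
    "ESD":  (500, 0.2),
    "AOD":  (10, 2.0),
    "AODm": (500, 0.04),
}

SECONDS_PER_DAY = 86_400

for name, (size_mb, rate_hz) in streams.items():
    files_per_day = rate_hz * SECONDS_PER_DAY
    mb_per_s = rate_hz * size_mb
    tb_per_day = mb_per_s * SECONDS_PER_DAY / 1e6   # 1 TB = 1e6 MB (decimal)
    print(f"{name:4s}: {files_per_day:8,.0f} files/day  {mb_per_s:5.0f} MB/s  {tb_per_day:5.1f} TB/day")

# RAW comes out at ~17,280 files/day, 320 MB/s and ~27.6 TB/day, matching the diagram.
```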

55 Scope of the Tier-0 Scaling Test
- It was only possible to test:
  - EF writing into Castor
  - ESD/AOD production on the reconstruction farm
  - Archiving to tape
  - Export to the Tier-1s of RAW/ESD/AOD
- The goal was to test as much as possible, as realistically as possible
- Mainly a data-flow/infrastructure test (no physics value)
- Calibration & alignment processing not included yet, nor the CondDB and TagDB streams

56 Oct-Dec 2005 test: some results
- Castor writing rates (Dec 19-20) for: the EF farm -> Castor (write.raw); the reconstruction farm -> Castor (reconstruction jobs: write.esd + write.aodtmp); and the AOD-merging jobs (write.aod). (Rate plots not reproduced here.)

57 Tier-0 internal test, Jan 28-29, 2006
- Reading (nominal rate 780 MB/s): disk -> worker nodes, disk -> tape
- Writing (nominal rate 460 MB/s): SFO -> disk, worker nodes -> disk
- Writing (nominal rate 440 MB/s): disk -> tape
(Rate plots not reproduced here; the charts mark the 440, 460 and 780 MB/s nominal levels.)

58 ATLAS SC4 Tests (June to December 2006)
- Complete Tier-0 test
  - Internal data transfer from the "Event Filter" farm to the Castor disk pool, Castor tape and the CPU farm
  - Calibration loop and handling of conditions data, including distribution of conditions data to Tier-1s (and Tier-2s)
  - Transfer of RAW, ESD, AOD and TAG data to Tier-1s
  - Transfer of AOD and TAG data to Tier-2s
  - Data and dataset registration in the DB
- Distributed production
  - Full simulation chain run at Tier-2s (and Tier-1s), with data distribution to Tier-1s, other Tier-2s and the CAF
  - Reprocessing of raw data at Tier-1s, with data distribution to other Tier-1s, Tier-2s and the CAF
- Distributed analysis
  - "Random" job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
  - Tests of the performance of job submission, distribution and output retrieval
- Need to define and test the Tier infrastructure and the Tier-1/Tier-2 associations

59 ATLAS Tier-1s: "2008" resources (a quick consistency check follows below)
                                 CPU (MSI2k)    %    Disk (PB)    %    Tape (PB)    %
  Canada       TRIUMF                1.06      4.4      0.62     4.3      0.40     4.4
  France       CC-IN2P3              3.02     12.6      1.76    12.2      1.15    12.8
  Germany      FZK                   2.40     10.0      1.44    10.0      0.90    10.0
  Italy        CNAF                  1.76      7.3      0.80     5.5      0.67     7.5
  Nordic Data Grid Facility          1.46      6.1      0.62     4.3      0.62     6.9
  Netherlands  SARA                  3.05     12.7      1.78    12.3      1.16    12.9
  Spain        PIC                   1.20      5.0      0.72     5.0      0.45     5.0
  Taiwan       ASGC                  1.87      7.8      0.83     5.8      0.71     7.9
  UK           RAL                   1.57      6.5      0.89     6.2      1.03    11.5
  USA          BNL                   5.30     22.1      3.09    21.4      2.02    22.5
  Total 2008 pledged                22.69     94.5     12.55    87.0      9.11   101.4
  2008 needed                       23.97    100.0     14.43   100.0      8.99   100.0
  2008 missing                       1.28      5.5      1.88    13.0     -0.12    -1.4
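
A quick arithmetic check of the table: the "missing" row is simply needed minus pledged, and the percentage columns are relative to the 2008 need. The sketch below reproduces the bottom rows from the totals; small differences in the percentages (a few tenths) come from rounding of the per-site entries on the slide.

```python
# Consistency check of the pledged/needed/missing totals in the table above.
needed  = {"CPU (MSI2k)": 23.97, "Disk (PB)": 14.43, "Tape (PB)": 8.99}
pledged = {"CPU (MSI2k)": 22.69, "Disk (PB)": 12.55, "Tape (PB)": 9.11}

for resource, need in needed.items():
    have = pledged[resource]
    missing = need - have
    print(f"{resource}: pledged {have:.2f} ({100 * have / need:.1f}% of need), "
          f"missing {missing:+.2f} ({100 * missing / need:+.1f}%)")

# Tape is slightly over-pledged (about -1.3%); CPU and disk fall short of the need.
```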

60 ATLAS Tiers Association (SC4 draft)
- Tier-1 (share of ATLAS Tier-1 resources), associated Tier-1(s), and associated or planned Tier-2s:
  - Canada, TRIUMF (5.3%): associated Tier-1 SARA; Tier-2s: East T2 Fed., West T2 Fed.
  - France, CC-IN2P3 (13.5%): associated Tier-1 BNL; Tier-2s: CC-IN2P3 AF, GRIF, LPC, HEP-Beijing, Romanian T2
  - Germany, FZK-GridKa (10.5%): associated Tier-1 BNL; Tier-2s: DESY, Munich Fed., Freiburg Uni., Wuppertal Uni., FZU AS (CZ), Polish T2 Fed.
  - Italy, CNAF (7.5%): associated Tier-1 RAL; Tier-2s: INFN T2 Fed.
  - Netherlands, SARA (13.0%): associated Tier-1s TRIUMF, ASGC
  - Nordic Data Grid Facility (5.5%): associated Tier-1 PIC
  - Spain, PIC (5.5%): associated Tier-1 NDGF; Tier-2s: ATLAS T2 Fed.
  - Taiwan, ASGC (7.7%): associated Tier-1 SARA; Tier-2s: Taiwan AF Fed.
  - UK, RAL (7.5%): associated Tier-1 CNAF; Tier-2s: London Grid, NorthGrid, ScotGrid, SouthGrid
  - USA, BNL (24%): associated Tier-1s CC-IN2P3, FZK-GridKa; Tier-2s: BU/HU T2, Midwest T2, Southwest T2
- No association (yet): Melbourne Uni., ICEPP Tokyo, LIP T2, HEP-IL Fed., Russian Fed., CSCS (CH), UIBK, Brazilian T2 Fed.

61 Computing System Commissioning
- We have defined the high-level goals of the Computing System Commissioning operation during 2006: it is more a running-in of continuous operation than a stand-alone challenge.
- The main aim of Computing System Commissioning is to test the software and computing infrastructure that we will need at the beginning of 2007:
  - Calibration and alignment procedures and conditions DB
  - Full trigger chain
  - Event reconstruction and data distribution
  - Distributed access to the data for analysis
- At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates.

63 Conclusions (ATLAS)
- The Data Challenges (1, 2) and the productions ("Rome"; the current continuous production) have proven that the 3 Grids, LCG-EGEE, OSG/Grid3 and ARC/NorduGrid, can be used in a coherent way for real large-scale productions: possible, but not easy.
- In SC3 we succeeded in reaching the nominal data transfer rate at the Tier-0 (internally) and reasonable transfers to the Tier-1s.
- SC4 should allow us to test the full chain using the new WLCG middleware and infrastructure and the new ATLAS production and data management systems. This will include a more complete Tier-0 test, distributed productions and distributed analysis tests.
- Computing System Commissioning will have as its main goal a fully working and operational system, leading to a physics readiness report.

64 Thank you

