Grid Computing at LHC and ATLAS Data Challenges IMFP-2006 El Escorial, Madrid, Spain. April 4, 2006 Gilbert Poulard (CERN PH-ATC)
Overview
- Introduction
- LHC experiments
- Computing challenges
- WLCG: Worldwide LHC Computing Grid
- ATLAS experiment
  - Building the Computing System
- Conclusions
Introduction: LHC at CERN
(Aerial view of the Geneva region showing the LHC ring, with Mont Blanc, 4810 m, in the background)
LHC Computing Challenges
- Large distributed community
- Large data volume, and access to it for everyone
- Large CPU capacity
Challenge 1: Large, distributed community
- ~5000 physicists around the world, around the clock (ATLAS, CMS, LHCb, ...)
- "Offline" software effort: 1000 person-years per experiment
- Software life span: 20 years
Large data volume

Experiment | Rate [Hz] | RAW [MB] | ESD/rDST/RECO [MB] | AOD [kB] | MC [MB/evt] | MC % of real
ALICE HI   | 100       | 12.5     | 2.5                | 250      | 300         | 100
ALICE pp   | 100       | 1        | 0.04               | 4        | 0.4         | 100
ATLAS      | 200       | 1.6      | 0.5                | 100      | 2           | 20
CMS        | 150       | 1.5      | 0.25               | 50       | 2           | 100
LHCb       | 2000      | 0.025    | 0.5                | 20       |             |

- 50 days running in 2007
- 10^7 seconds/year pp from 2008 on: ~2 x 10^9 events/experiment
- 10^6 seconds/year heavy ion
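The yearly event counts and RAW volumes follow directly from the trigger rates and the ~10^7 live seconds of a pp year; a quick back-of-the-envelope check in Python using the rates and event sizes from the table:

```python
SECONDS_PER_YEAR = 1e7  # effective pp live time per year, from the slide

# (trigger rate [Hz], RAW event size [MB]) per experiment, from the table
experiments = {
    "ATLAS": (200, 1.6),
    "CMS":   (150, 1.5),
    "LHCb":  (2000, 0.025),
}

for name, (rate_hz, raw_mb) in experiments.items():
    events = rate_hz * SECONDS_PER_YEAR
    raw_pb = events * raw_mb / 1e9  # MB -> PB
    print(f"{name}: {events:.1e} events/year, {raw_pb:.1f} PB RAW/year")
```

ATLAS comes out at 2 x 10^9 events and ~3.2 PB of RAW data per nominal year, consistent with the "~2 x 10^9 events/experiment" figure on the slide.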
Large CPU capacity

ATLAS resources in 2008:

                        CPU (MSi2k) | Disk (PB) | Tape (PB)
Tier-0                  4.1         | 0.4       | 5.7
CERN Analysis Facility  2.7         | 1.9       | 0.5
Sum of Tier-1s          24.0        | 14.4      | 9.0
Sum of Tier-2s          19.9        | 8.7       | 0.0
Total                   50.7        | 25.4      | 15.2

(~50,000 of today's CPUs)
- Assume 2 x 10^9 events per year (1.6 MB per event)
- First-pass reconstruction will run at the CERN Tier-0
- Re-processing will be done at Tier-1s (Regional Computing Centres) (10)
- Monte Carlo simulation will be done at Tier-2s (e.g. physics institutes) (~30); full simulation of ~20% of the data rate
- Analysis will be done at Analysis Facilities, Tier-2s, Tier-3s, ...
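The "Total" row of the resource table is just the sum over the tiers; a small sanity check of the numbers quoted above:

```python
# CPU (MSi2k), Disk (PB), Tape (PB) per tier, from the table
tiers = {
    "Tier-0":         (4.1, 0.4, 5.7),
    "CERN AF":        (2.7, 1.9, 0.5),
    "Sum of Tier-1s": (24.0, 14.4, 9.0),
    "Sum of Tier-2s": (19.9, 8.7, 0.0),
}

# Sum each column and round to one decimal, as in the table
cpu, disk, tape = (round(sum(t[i] for t in tiers.values()), 1) for i in range(3))
print(cpu, disk, tape)  # 50.7 25.4 15.2, matching the 'Total' row
```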
CPU Requirements (chart: CERN, Tier-1s, Tier-2s; 58% pledged)
Disk Requirements (chart: CERN, Tier-1s, Tier-2s; 54% pledged)
Tape Requirements (chart: CERN, Tier-1s; 75% pledged)
LHC Computing Challenges
- Large distributed community
- Large data volume, and access to it for everyone
- Large CPU capacity
How to face these problems?
- CERN Computing Review (2000-2001): the "Grid" is the chosen solution
- "Build" the LCG (LHC Computing Grid) project
- Roadmap for the LCG project, and for the experiments
- In 2005 LCG became WLCG
What is the Grid?
- The World Wide Web provides seamless access to information stored in many millions of different geographical locations.
- The Grid is an emerging infrastructure that provides seamless access to computing power and data storage capacity distributed over the globe:
  - global resource sharing
  - secure access
  - resource use optimization
  - the "death of distance" (networking)
  - open standards
The Worldwide LHC Computing Grid Project (WLCG)
- Collaboration: the LHC experiments; Grid projects in Europe and the US; regional and national centres
- Choices: adopt Grid technology; go for a "Tier" hierarchy
- Goal: prepare and deploy the computing environment to help the experiments analyse the data from the LHC detectors.
(Diagram: Tier-0 at CERN; Tier-1 centres in Germany, the USA, the UK, France, Italy, Taipei, SARA and Spain; Tier-2 centres at labs and universities serving regional groups and physics study groups; Tier-3 resources at physics departments and desktops.)
The Worldwide LCG Collaboration
- Members: the experiments; the computing centres (Tier-0, Tier-1, Tier-2)
- Memorandum of Understanding: resources and services with defined service levels; resource commitments pledged for the next year, with a 5-year forward look
WLCG services are built on two major science grid infrastructures:
- EGEE: Enabling Grids for E-SciencE
- OSG: US Open Science Grid
Enabling Grids for E-SciencE (EGEE)
- EU-supported project to develop and operate a multi-science grid
- Assists scientific communities in embracing grid technology
- First phase concentrated on operations and technology
- Second phase (2006-08) emphasizes extending the scientific, geographical and industrial scope of the world-wide Grid infrastructure; the international collaboration in phase 2 will have > 90 partners in 32 countries
Open Science Grid (OSG)
- Multi-disciplinary consortium:
  - running physics experiments: CDF, D0, LIGO, SDSS, STAR
  - US LHC collaborations
  - biology, computational chemistry
  - computer science research: Condor and Globus
  - DOE laboratory computing divisions
  - university IT facilities
- OSG today: 50 Compute Elements; 6 Storage Elements; VDT 1.3.9; 23 VOs
Architecture: Grid services
- Storage Element: a Mass Storage System (MSS) such as CASTOR, Enstore, HPSS or dCache; the Storage Resource Manager (SRM) provides a common way to access the MSS, independent of implementation; File Transfer Services (FTS) provided e.g. by GridFTP or srmCopy
- Computing Element: interface to the local batch system, e.g. the Globus gatekeeper; accounting, status query, job monitoring
- Virtual Organization Management: Virtual Organization Management Services (VOMS); authentication and authorization based on the VOMS model
- Grid Catalogue Services: mapping of Globally Unique Identifiers (GUIDs) to local file names; hierarchical namespace; access control
- Interoperability: EGEE and OSG both use the Virtual Data Toolkit (VDT); different implementations are hidden behind common interfaces
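The catalogue idea, one immutable GUID mapped to any number of site-local replicas, can be illustrated with a minimal in-memory sketch (the class and method names are hypothetical; real catalogue services such as LFC add hierarchical namespaces and access control on top):

```python
from collections import defaultdict

class ReplicaCatalogue:
    """Toy GUID -> physical-replica mapping (illustration only)."""
    def __init__(self):
        self.replicas = defaultdict(list)   # GUID -> list of site URLs
        self.lfns = {}                      # logical file name -> GUID

    def register(self, lfn, guid, site_url):
        """Record one physical replica of a logical file."""
        self.lfns[lfn] = guid
        self.replicas[guid].append(site_url)

    def lookup(self, lfn):
        """Return all known physical replicas of a logical file."""
        return self.replicas.get(self.lfns.get(lfn), [])

cat = ReplicaCatalogue()
cat.register("/atlas/dc2/evgen.0001.root", "guid-1234",
             "srm://castor.cern.ch/atlas/evgen.0001.root")
cat.register("/atlas/dc2/evgen.0001.root", "guid-1234",
             "srm://dcache.bnl.gov/atlas/evgen.0001.root")
print(cat.lookup("/atlas/dc2/evgen.0001.root"))  # both replicas
```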
Technology: Middleware
- The LCG-2 middleware is currently deployed in more than 100 sites. It originated from Condor, EDG, Globus, VDT and other projects.
- It will now evolve to include functionality of the gLite middleware provided by the EGEE project, which has just become available.
- Site services include security, the Computing Element (CE), the Storage Element (SE), and monitoring and accounting services, currently available both from LCG-2 and from gLite.
- VO services such as the Workload Management System (WMS), file catalogues, information services and file transfer services exist in both flavours (LCG-2 and gLite), maintaining close relations with VDT, Condor and Globus.
Technology: Fabric
- Moore's law still holds for processors and disk storage. For CPUs and disks we count to a large extent on the evolution of the consumer market; for processors we expect an increasing importance of 64-bit architectures and multicore chips.
- Mass storage (tapes and robots) is still a computer-centre item with computer-centre pricing; it is too early to draw conclusions on new tape drives and robots.
- Networking has seen a rapid evolution recently: ten-gigabit Ethernet is now in the production environment, and wide-area networking can already count on 10 Gb/s connections between the Tier-0 and the Tier-1s. This will gradually extend to the Tier-1 to Tier-2 connections.
Common Physics Applications
- Core software libraries: SEAL-ROOT merger; scripting (CINT, Python); mathematical libraries; fitting, MINUIT (in C++)
- Data management: POOL (ROOT I/O for bulk data, RDBMS for metadata); conditions database (COOL)
- Event simulation: event generators (generator library, GENSER); detector simulation (GEANT4, used by ATLAS, CMS and LHCb); physics validation comparing GEANT4, FLUKA and test-beam data
- Software development infrastructure: external libraries; software development and documentation tools; quality assurance and testing; project portal (Savannah)
The Hierarchical Model
- Tier-0 at CERN: record RAW data (1.25 GB/s ALICE; 320 MB/s ATLAS); distribute a second copy to the Tier-1s; calibrate and do first-pass reconstruction
- Tier-1 centres (11 defined): manage permanent storage of RAW, simulated and processed data; capacity for reprocessing and bulk analysis
- Tier-2 centres (>~100 identified): Monte Carlo event simulation; end-user analysis
- Tier-3: facilities at universities and laboratories; access to data and processing at Tier-2s and Tier-1s; outside the scope of the project
Tier-1s

Tier-1 Centre                    | Experiments served with priority
                                 | ALICE | ATLAS | CMS | LHCb
TRIUMF, Canada                   |       |   X   |     |
GridKA, Germany                  |   X   |   X   |  X  |  X
CC-IN2P3, France                 |   X   |   X   |  X  |  X
CNAF, Italy                      |   X   |   X   |  X  |  X
SARA/NIKHEF, NL                  |   X   |   X   |     |  X
Nordic Data Grid Facility (NDGF) |   X   |   X   |  X  |
ASCC, Taipei                     |       |   X   |  X  |
RAL, UK                          |   X   |   X   |  X  |  X
BNL, US                          |       |   X   |     |
FNAL, US                         |       |       |  X  |
PIC, Spain                       |       |   X   |  X  |  X
Tier-2s: ~100 identified, and the number is still growing
Tier-0/-1/-2 Connectivity
National Research Networks (NRENs) at the Tier-1s: ASnet, LHCnet/ESnet, GARR, RENATER, DFN, SURFnet6, NORDUnet, RedIRIS, UKERNA, CANARIE
Prototypes
- It is important that the hardware and software systems developed in the framework of LCG be exercised in increasingly demanding challenges.
- Data Challenges were recommended by the 'Hoffmann Review' of 2001. Though their main goal was to validate the distributed computing model and to gradually build up the computing systems, the results have also been used for physics performance studies and for detector, trigger and DAQ design. Limitations of the Grids have been identified and are being addressed.
- A series of Data Challenges has been run by the 4 experiments.
- Presently, a series of Service Challenges aims at realistic end-to-end testing of experiment use cases over extended periods, leading to stable production services.
- The project 'A Realisation of Distributed Analysis for LHC' (ARDA) is developing end-to-end prototypes of distributed analysis systems, using the EGEE middleware gLite, for each of the LHC experiments.
Service Challenges
- Purpose:
  - understand what it takes to operate a real grid service, running for days/weeks at a time (not just limited to experiment Data Challenges)
  - trigger and verify Tier-1 and large Tier-2 planning and deployment, tested with realistic usage patterns
  - get the essential grid services ramped up to target levels of reliability, availability, scalability and end-to-end performance
- Four progressive steps from October 2004 through September 2006:
  - End 2004: SC1, data transfer to a subset of Tier-1s
  - Spring 2005: SC2, including mass storage, all Tier-1s, some Tier-2s
  - 2nd half 2005: SC3, Tier-1s and >20 Tier-2s, first set of baseline services
  - Jun-Sep 2006: SC4, pilot service
Key dates for Service Preparation
- Sep 05, SC3 service phase: reliable base service; most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 1 GB/s, including 500 MB/s to mass storage (150 MB/s and 60 MB/s at Tier-1s)
- Jun 06, SC4 service phase: all Tier-1s, major Tier-2s; capable of supporting the full experiment software chain, including analysis; sustain the nominal final grid data throughput (~1.5 GB/s mass storage throughput)
- Sep 06: initial LHC service in stable operation; ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput
- Apr 07: LHC service commissioned, ahead of first beams and cosmics (2007) and the full physics run (2008)
ARDA: A Realisation of Distributed Analysis for LHC
- Distributed analysis on the Grid is the most difficult and least defined topic.
- ARDA sets out to develop end-to-end analysis prototypes using the LCG-supported middleware:
  - ALICE uses the AliROOT framework based on PROOF.
  - ATLAS has used DIAL services with the gLite prototype as back-end; this is rapidly evolving.
  - CMS has prototyped the 'ARDA Support for CMS Analysis Processing' (ASAP), which is used by several CMS physicists for daily analysis work.
  - LHCb has based its prototype on GANGA, a common project between ATLAS and LHCb.
Production Grids: what has been achieved
- Basic middleware: a set of baseline services agreed, with initial versions in production
- All major LCG sites active
- 1 GB/s data distribution rate, mass storage to mass storage: > 50% of the nominal LHC data rate
- Grid job failure rate 5-10% for most experiments, down from ~30% in 2004
- Sustained 10k jobs per day; > 10k simultaneous jobs during prolonged periods
Summary on WLCG (from Les Robertson, CHEP'06)
- Two grid infrastructures are now in operation, on which we are able to complete the computing services for LHC.
- Reliability and performance have improved significantly over the past year.
- The focus of Service Challenge 4 is to demonstrate a basic but reliable service that can be scaled up, by April 2007, to the capacity and performance needed for the first beams.
- Development of new functionality and services must continue, but we must be careful that this does not interfere with the main priority for this year: reliable operation of the baseline services.
ATLAS (A Toroidal LHC ApparatuS)
- Detector for the study of high-energy proton-proton collisions.
- The offline computing will have to deal with an output event rate of 200 Hz, i.e. 2 x 10^9 events per year with an average event size of 1.6 MB.
- Researchers are spread all over the world: ~2000 collaborators, ~150 institutes, 34 countries.
- Detector dimensions: diameter 25 m; barrel toroid length 26 m; end-cap end-wall chamber span 46 m; overall weight 7000 tons.
The Computing Model
(Diagram of the ATLAS computing model:)
- Event Builder -> Event Filter (~159 kSI2k) -> Tier-0 (~5 MSI2k), with 320 MB/s into the Tier-0 and a ~Pb/s physics data cache upstream
- Tier-0 -> Tier-1s: ~300 MB/s per Tier-1 per experiment; each Tier-1 ~7.7 MSI2k, with ~2 PB/year and ~9 PB/year storage figures quoted; no simulation at Tier-1s
- Regional centres (e.g. RAL in the UK, PIC in Spain, plus US and Italian centres) connect to Tier-2s of ~200 kSI2k each (e.g. Sheffield, Manchester, Liverpool, Lancaster; a "Northern Tier") over 622 Mb/s links; some calibration and monitoring data flow to the institutes, and calibrations flow back
- Each Tier-2 (~200 TB/year): ~25 physicists working on one or more channels; holds the full AOD, TAG and relevant physics-group summary data; does the bulk of the simulation
- Desktop PC (2004) = ~1 kSpecInt2k
ATLAS Data Challenges (1)
LHC Computing Review (2001): "Experiments should carry out Data Challenges of increasing size and complexity to validate their Computing Model, their complete software suite and their Data Model, and to ensure the correctness of the technical choices to be made."
ATLAS Data Challenges (2)
DC1 (2002-2003)
- First ATLAS exercise on a world-wide scale: O(1000) CPUs at peak
- Put in place the full software chain: simulation of the data; digitization; pile-up; reconstruction
- Production system tools: bookkeeping of data and jobs (~AMI); monitoring; code distribution
- "Preliminary" Grid usage:
  - NorduGrid: all production performed on the Grid
  - US: Grid used at the end of the exercise
  - LCG-EDG: some testing during the Data Challenge, but no "real" production
- At least one person per contributing site; many people involved
- Lessons learned: management of failures is a key concern; automate to cope with the large number of jobs
- "Built" the ATLAS DC community
- Physics: Monte Carlo data needed for the ATLAS High Level Trigger Technical Design Report
ATLAS Data Challenges (3)
DC2 (2004)
- Similar exercise to DC1 (scale; physics processes), BUT
- Introduced the new ATLAS Production System (ProdSys):
  - unsupervised production across many sites spread over three different Grids (US Grid3; ARC/NorduGrid; LCG-2)
  - based on DC1 experience with AtCom and GRAT
  - core engine with plug-ins
  - 4 major components: production supervisor; executor; common data management system; common production database
  - use middleware components as much as possible: avoid inventing ATLAS's own version of the Grid; use the middleware broker, catalogs, information system, ...
- Immediately followed by the "Rome" production (2005): production of simulated data for an ATLAS physics workshop in Rome in June 2005, using the DC2 infrastructure.
ATLAS Production System
- ATLAS uses 3 Grids: LCG (= EGEE); ARC/NorduGrid (evolved from EDG); OSG/Grid3 (US)
- Plus the possibility of local batch submission (4 interfaces)
- Input and output must be accessible from all Grids
- The system makes use of the native Grid middleware as much as possible (e.g. Grid catalogs); it does not "re-invent" its own solution.
ATLAS Production System
To handle the task of the ATLAS Data Challenges, an automated production system was developed. It consists of 4 components:
- the production database, which contains abstract job definitions
- a supervisor (Windmill; Eowyn) that reads job definitions from the production database and presents them to the different Grid executors in an easy-to-parse XML format
- the executors, one for each Grid flavour, which receive the job definitions in XML format and convert them to the job description language of that particular Grid
- DonQuijote (DQ), the ATLAS data management system, which moves files from their temporary output locations to their final destination on some Storage Element and registers the files in the Replica Location Service of that Grid
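The supervisor/executor split can be sketched as follows. This is an illustrative toy, not the actual Windmill/Eowyn code: the job fields, transform name and XML shape are invented for the example, and the real system adds job state tracking, retries and the DQ data-management step:

```python
# Stand-in for the production database of abstract job definitions
jobs = [
    {"id": 1, "transform": "csc_simul", "grid": "LCG"},
    {"id": 2, "transform": "csc_simul", "grid": "OSG"},
]

def to_xml(job):
    """The supervisor hands jobs to executors in an easy-to-parse XML format."""
    return (f"<job id='{job['id']}'>"
            f"<transform>{job['transform']}</transform></job>")

class Executor:
    """One executor per Grid flavour converts the XML job definition
    to that Grid's own job description language and submits it."""
    def __init__(self, grid):
        self.grid = grid
    def submit(self, job_xml):
        # A real executor would translate job_xml to JDL/xRSL here
        return f"[{self.grid}] submitted {job_xml}"

executors = {g: Executor(g) for g in ("LCG", "OSG", "NorduGrid")}
for job in jobs:  # the supervisor loop: read definitions, dispatch to executors
    print(executors[job["grid"]].submit(to_xml(job)))
```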
The 3 Grid flavours: LCG-2
- The number of sites and the resources are evolving quickly (map as of ATLAS DC2, autumn 2004)
The 3 Grid flavours: Grid3 (ATLAS DC2, autumn 2004)
- The deployed infrastructure has been in operation since November 2003
- At this moment running 3 HEP and 2 biology applications
- Over 100 users authorized to run in Grid3
- Sep 04: 30 sites, multi-VO shared resources, ~3000 CPUs (shared)
The 3 Grid flavours: NorduGrid (ATLAS DC2, autumn 2004)
- NorduGrid is a research collaboration established mainly across the Nordic countries, but it includes sites from other countries too.
- It contributed a significant part of DC1 (using the Grid in 2002).
- It supports production on several operating systems.
- > 10 countries, 40+ sites, ~4000 CPUs, ~30 TB storage
Production phases
(Diagram of the production chain; persistency: Athena-POOL.)
- Event generation (Pythia): physics events as HepMC
- Detector simulation (Geant4): hits + MC truth
- Digitization, including pile-up with minimum-bias events: digits (RDO) + MC truth
- Event mixing and byte-stream conversion: byte-stream raw digits, as from the real detector
- Reconstruction: ESD and AOD
Data volumes for 10^7 events: ~5 TB generated events, ~20 TB hits, ~30 TB digits, ~20 TB byte-stream, ~5 TB ESD, and of order a TB of AOD.
ATLAS productions
- DC2:
  - few datasets; different types of jobs:
    - physics event generation: very short
    - Geant simulation (Geant3 in DC1; Geant4 in DC2 and "Rome"): long, more than 10 hours
    - digitization: medium, ~5 hours
    - reconstruction: short
  - all types of jobs ran sequentially, each phase one after the other
- "Rome":
  - many different (>170) datasets, covering different physics channels
  - same types of jobs (event generation, simulation, etc.), but all run in parallel
- Now "continuous" production; the goal is to reach 2 M events per week.
The different types of running have a large impact on the production rate.
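The 2 M events/week target translates into a sizeable steady CPU load. A rough estimate, where the job size (100 events per job) is an assumption made for the example and only the ~10 hour walltime of a long simulation job comes from the slide:

```python
# All inputs below except job_walltime_h are illustrative assumptions.
events_per_week = 2_000_000
events_per_job = 100          # assumed events per simulation job
job_walltime_h = 10           # "long" simulation job, from the slide
week_h = 7 * 24               # 168 hours

jobs_per_week = events_per_week / events_per_job   # 20,000 jobs
cpu_hours = jobs_per_week * job_walltime_h         # 200,000 CPU-hours
concurrent_cpus = cpu_hours / week_h               # CPUs busy around the clock
print(f"~{concurrent_cpus:.0f} CPUs busy continuously")  # ~1190
```

Under these assumptions the simulation step alone keeps on the order of a thousand CPUs busy continuously, which is why the production runs unattended across all three Grids.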
ATLAS productions: countries (sites)
Australia (1) (0), Austria (1), Canada (4) (3), CERN (1), Czech Republic (2), Denmark (4) (3), France (1) (4), Germany (1+2), Greece (0) (1), Hungary (0) (1), Italy (7) (17), Japan (1) (0), Netherlands (1) (2), Norway (3) (2), Poland (1), Portugal (0) (1), Russia (0) (2), Slovakia (0) (1), Slovenia (1), Spain (3), Sweden (7) (5), Switzerland (1) (1+1), Taiwan (1), UK (7) (8), USA (19)
- DC2: 20 countries, 69 sites in total; "Rome": 22 countries, 84 sites in total
- Per-Grid subsets: DC2 13 countries / 31 sites and 7 countries / 19 sites; "Rome" 17 countries / 51 sites and 7 countries / 14 sites
- Spring 2006: 30 countries, 126 sites (LCG: 104; OSG/Grid3: 8; NDGF: 14)
ATLAS DC2: jobs
As of 30 November 2004: 20 countries, 69 sites, ~260,000 jobs, ~2 MSi2k-months in total
Rome production: number of jobs
(Pie chart, as of 17 June 2005: contributions spread over many sites, with the largest individual shares around 4-6% each)
Rome production statistics
- 173 datasets
- 6.1 M events simulated and reconstructed (without pile-up)
- Total simulated data: 8.5 M events
- Pile-up done for 1.3 M events, of which 50 k reconstructed
ATLAS Production (2006) (chart)
ATLAS Production (July 2004 - May 2005) (chart)
ATLAS & Service Challenge 3
- Tier-0 scaling tests: test of the operations at the CERN Tier-0
- Original goal: a 10% exercise
- Preparation phase: July-October 2005; tests: October 2005 - January 2006
ATLAS & Service Challenge 3
The Tier-0 facility at CERN is responsible for the following operations:
- calibration and alignment
- first-pass ESD production
- first-pass AOD production
- TAG production
- archiving of primary RAW and first-pass ESD, AOD and TAG data
- distribution of primary RAW and first-pass ESD, AOD and TAG data
ATLAS SC3/Tier-0 (1)
Components of the Tier-0:
- Castor mass storage system and local replica catalogue
- CPU farm
- conditions DB
- TAG DB
- Tier-0 production database
- data management system, Don Quijote 2 (DQ2)
All orchestrated by the Tier-0 Management System (TOM), based on the ATLAS Production System (ProdSys).
ATLAS SC3/Tier-0 (2)
Deploy and test:
- LCG/gLite components (main focus on the T0 exercise):
  - FTS server at T0 and T1s
  - LFC catalogue at T0, T1s and T2s
  - VOBOX at T0, T1s and T2s
  - SRM Storage Element at T0, T1s and T2s
- ATLAS DQ2-specific components:
  - central DQ2 dataset catalogues
  - DQ2 site services, sitting in the VOBOXes
  - DQ2 client for TOM
ATLAS Tier-0 data flow (EF -> Tier-0 Castor and CPU farm -> tape and Tier-1s)

Stream | File size | File rate | Files/day | Bandwidth | Volume/day
RAW    | 1.6 GB    | 0.2 Hz    | 17 k      | 320 MB/s  | 27 TB
ESD    | 0.5 GB    | 0.2 Hz    | 17 k      | 100 MB/s  | 8 TB
AOD    | 10 MB     | 2 Hz      | 170 k     | 20 MB/s   | 1.6 TB
AODm   | 500 MB    | 0.04 Hz   | 3.4 k     | 20 MB/s   | 1.6 TB

Aggregate flows:
- to tape (RAW + ESD): 0.44 Hz, 37 k files/day, 440 MB/s
- to Tier-1s (RAW + ESD (2x) + AODm (10x)): 1 Hz, 85 k files/day, 720 MB/s
- RAW + AOD: 0.4 Hz, 190 k files/day, 340 MB/s
- 2.24 Hz, 170 k files/day (temporary) and 20 k files/day (permanent), 140 MB/s
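The per-stream numbers are internally consistent: file size times file rate gives the bandwidth, and multiplying by 86,400 seconds gives files/day and volume/day. A quick check, which reproduces the slide's figures up to its rounding (e.g. 17.3 k files/day quoted as 17 k, 27.6 TB/day as 27 TB):

```python
DAY = 86_400  # seconds per day

# (file size [MB], file rate [Hz]) per stream, from the slide
streams = {
    "RAW":  (1600, 0.2),
    "ESD":  (500, 0.2),
    "AOD":  (10, 2.0),
    "AODm": (500, 0.04),
}

for name, (size_mb, rate_hz) in streams.items():
    mb_per_s = size_mb * rate_hz          # bandwidth
    files_per_day = rate_hz * DAY
    tb_per_day = mb_per_s * DAY / 1e6     # MB/day -> TB/day
    print(f"{name}: {mb_per_s:.0f} MB/s, {files_per_day/1e3:.1f}k files/day, "
          f"{tb_per_day:.1f} TB/day")
```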
Scope of the Tier-0 Scaling Test
- It was only possible to test:
  - EF writing into Castor
  - ESD/AOD production on the reconstruction farm
  - archiving to tape
  - export to Tier-1s of RAW/ESD/AOD
- The goal was to test as much as possible, as realistically as possible; it was mainly a data-flow/infrastructure test (no physics value)
- Calibration & alignment processing not included yet, nor the CondDB and TagDB streams
Oct-Dec 2005 Test: Some Results
(Castor writing rates, Dec 19-20:)
- EF farm -> Castor (write.raw)
- reco farm -> Castor; reco jobs: write.esd + write.aodtmp
- AOD-merging jobs: write.aod
Tier-0 Internal Test, Jan 28-29, 2006
- Reading (nominal rate 780 MB/s): disk -> WN, disk -> tape; ~780 MB/s reached
- Writing (nominal rate 460 MB/s): SFO -> disk, WN -> disk; ~460 MB/s reached
- Writing to tape (nominal rate 440 MB/s): disk -> tape; ~440 MB/s reached
ATLAS SC4 Tests (June to December 2006)
- Complete Tier-0 test:
  - internal data transfer from the "Event Filter" farm to the Castor disk pool, Castor tape and the CPU farm
  - calibration loop and handling of conditions data, including distribution of conditions data to Tier-1s (and Tier-2s)
  - transfer of RAW, ESD, AOD and TAG data to Tier-1s
  - transfer of AOD and TAG data to Tier-2s
  - data and dataset registration in the DB
- Distributed production:
  - full simulation chain run at Tier-2s (and Tier-1s), with data distribution to Tier-1s, other Tier-2s and the CAF
  - reprocessing of raw data at Tier-1s, with data distribution to other Tier-1s, Tier-2s and the CAF
- Distributed analysis:
  - "random" job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
  - tests of the performance of job submission, distribution and output retrieval
Need to define and test the Tiers infrastructure and the Tier-1 to Tier-1 and Tier-1 to Tier-2 associations.
ATLAS Tier-1s: "2008" resources

                                  CPU            Disk           Tape
                                  MSi2k   %      PB      %      PB      %
Canada       TRIUMF               1.06    4.4    0.62    4.3    0.40    4.4
France       CC-IN2P3             3.02    12.6   1.76    12.2   1.15    12.8
Germany      FZK                  2.40    10     1.44    10     0.90    10
Italy        CNAF                 1.76    7.3    0.80    5.5    0.67    7.5
             NDGF                 1.46    6.1    0.62    4.3    0.62    6.9
Netherlands  SARA                 3.05    12.7   1.78    12.3   1.16    12.9
Spain        PIC                  1.20    5      0.72    5      0.45    5
Taiwan       ASGC                 1.87    7.8    0.83    5.8    0.71    7.9
UK           RAL                  1.57    6.5    0.89    6.2    1.03    11.5
USA          BNL                  5.30    22.1   3.09    21.4   2.02    22.5
2008 pledged                      22.69   94.5   12.55   87     9.11    101.4
2008 needed                       23.97   100    14.43   100    8.99    100
2008 missing                      1.28    5.5    1.88    13     -0.12   -1.4
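The pledged/needed percentages in the bottom rows can be recomputed from the totals; the result agrees with the table up to its rounding (tape is the only resource pledged above need):

```python
# Pledged vs needed ATLAS Tier-1 resources for 2008, from the table
pledged = {"CPU (MSi2k)": 22.69, "Disk (PB)": 12.55, "Tape (PB)": 9.11}
needed  = {"CPU (MSi2k)": 23.97, "Disk (PB)": 14.43, "Tape (PB)": 8.99}

for res in pledged:
    frac = 100 * pledged[res] / needed[res]     # % of need covered by pledges
    missing = needed[res] - pledged[res]        # shortfall (negative = surplus)
    print(f"{res}: {frac:.1f}% pledged, {missing:+.2f} missing")
```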
ATLAS Tiers Association (SC4 draft)

Tier-1 (share)            | Associated Tier-1(s)  | Tier-2s or planned Tier-2s
TRIUMF, Canada (5.3%)     | SARA                  | East T2 Fed.; West T2 Fed.
CC-IN2P3, France (13.5%)  | BNL                   | CC-IN2P3 AF; GRIF; LPC; HEP-Beijing; Romanian T2
FZK-GridKa, Germany (10.5%) | BNL                 | DESY; Munich Fed.; Freiburg Uni.; Wuppertal Uni.; FZU AS (CZ); Polish T2 Fed.
CNAF, Italy (7.5%)        | RAL                   | INFN T2 Fed.
SARA, Netherlands (13.0%) | TRIUMF; ASGC          |
NDGF (5.5%)               | PIC                   |
PIC, Spain (5.5%)         | NDGF                  | ATLAS T2 Fed.
ASGC, Taiwan (7.7%)       | SARA                  | Taiwan AF Fed.
RAL, UK (7.5%)            | CNAF                  | Grid London; NorthGrid; ScotGrid; SouthGrid
BNL, USA (24%)            | CC-IN2P3; FZK-GridKa  | BU/HU T2; Midwest T2; Southwest T2

No association (yet): Melbourne Uni.; ICEPP Tokyo; LIP T2; HEP-IL Fed.; Russian Fed.; CSCS (CH); UIBK; Brazilian T2 Fed.
Computing System Commissioning
- We have defined the high-level goals of the Computing System Commissioning operation during 2006: more a running-in of continuous operation than a stand-alone challenge.
- Its main aim will be to test the software and computing infrastructure that we will need at the beginning of 2007:
  - calibration and alignment procedures and conditions DB
  - full trigger chain
  - event reconstruction and data distribution
  - distributed access to the data for analysis
- At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates.
Conclusions (ATLAS)
- The Data Challenges (1, 2) and the productions ("Rome"; the current continuous production) have proven that the 3 Grids (LCG-EGEE, OSG/Grid3 and ARC/NorduGrid) can be used in a coherent way for real large-scale productions: possible, but not easy.
- In SC3 we succeeded in reaching the nominal data transfer rate at the Tier-0 (internally) and reasonable transfer rates to the Tier-1s.
- SC4 should allow us to test the full chain using the new WLCG middleware and infrastructure and the new ATLAS production and data management systems; this will include a more complete Tier-0 test, distributed productions and distributed analysis tests.
- Computing System Commissioning will have as its main goal a fully working and operational system, leading to a physics readiness report.
Thank you