
1. ATLAS Grid Computing Model and Data Challenges
Dario Barberis (CERN & Genoa University)
GridPP, 2 June 2004

2. Event Data Flow from Online to Offline
- The trigger system will reduce the event rate from 40 MHz to:
  - 20-30 kHz after the Level-1 trigger (muons and calorimetry)
  - ~3000 Hz after the Level-2 trigger (several algorithms in parallel, running independently for each subdetector)
  - ~200 Hz after the Event Filter (offline algorithms on the full event)
- These rates are almost independent of luminosity:
  - there is more interesting physics than 200 Hz even at low luminosity
  - trigger thresholds will be adjusted to follow the luminosity
- The nominal event size is 1.6 MB
  - initially it may be much larger (7-8 MB) until data compression in the calorimetry is switched on
- The nominal rate from online to offline is therefore 320 MB/s
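The quoted bandwidth is simply the Event Filter output rate multiplied by the nominal event size:

$$200\ \mathrm{Hz} \times 1.6\ \mathrm{MB/event} = 320\ \mathrm{MB/s}.$$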

3. Parameters of the Computing Model
- Data sizes:
  - Simulated Event Data: 2.0 MB (raw data + MC truth)
  - Raw Data: 1.6 MB (from the DAQ system)
  - Event Summary Data (ESD): 0.5 MB (full reconstruction output)
  - Analysis Object Data (AOD): 0.1 MB (summary of reconstruction)
  - TAG Data: 0.5 kB (event tags in SQL database)
- Other parameters:
  - Total trigger rate: 200 Hz
  - Physics trigger rate: 180 Hz
  - Nominal year: 10^7 s
  - Time/event for simulation: 60 kSI2k·s
  - Time/event for reconstruction: 6.4 kSI2k·s
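These parameters fix the scale of the problem. Below is a back-of-envelope Python sketch (an illustration, not an official ATLAS estimate) that turns them into yearly event counts, data volumes and the CPU power needed to keep first-pass reconstruction up with data taking:

```python
# Back-of-envelope estimate from the Computing Model parameters above.
TRIGGER_RATE_HZ = 200        # total trigger rate
NOMINAL_YEAR_S = 1e7         # effective seconds of data taking per year
RAW_MB, ESD_MB, AOD_MB = 1.6, 0.5, 0.1   # event sizes in MB
RECO_KSI2K_S = 6.4           # reconstruction time per event (kSI2k*s)

events_per_year = TRIGGER_RATE_HZ * NOMINAL_YEAR_S            # 2e9 events
raw_pb = events_per_year * RAW_MB / 1e9                       # MB -> PB
esd_pb = events_per_year * ESD_MB / 1e9
aod_pb = events_per_year * AOD_MB / 1e9
# Sustained CPU needed so first-pass reconstruction keeps up with data taking:
reco_ksi2k = events_per_year * RECO_KSI2K_S / NOMINAL_YEAR_S  # = rate * time/event

print(f"events/year: {events_per_year:.1e}")
print(f"raw data:    {raw_pb:.1f} PB/year")
print(f"ESD:         {esd_pb:.1f} PB/year, AOD: {aod_pb:.2f} PB/year")
print(f"reco CPU:    {reco_ksi2k:.0f} kSI2k sustained")
```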

4. Operation of Tier-0
- The Tier-0 facility at CERN will have to:
  - hold a copy of all raw data on tape
  - copy in real time all raw data to Tier-1s (the second copy is also useful for later reprocessing)
  - keep calibration data on disk
  - run first-pass reconstruction
  - distribute ESDs to external Tier-1s, 2/N to each one of N Tier-1s (see the placement sketch below)
- Currently under discussion:
  - shelf vs automatic tapes
  - archiving of simulated data
  - sharing of facilities between HLT and Tier-0
- Tier-0 will have to be a dedicated facility, where the CPU power and network bandwidth match the real-time event rate
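The "2/N to each one of N Tier-1s" ESD placement simply means every ESD dataset gets two replicas spread evenly over the Tier-1s. A minimal sketch of one way to do this is shown below; the Tier-1 names and the round-robin policy are illustrative assumptions, not the actual Tier-0 export machinery:

```python
from itertools import cycle

# Hypothetical Tier-1 names, for illustration only.
TIER1S = ["T1_A", "T1_B", "T1_C", "T1_D", "T1_E", "T1_F"]   # N = 6

def place_esd(datasets, tier1s=TIER1S):
    """Assign each ESD dataset to two distinct Tier-1s, round-robin.

    Over many datasets each Tier-1 ends up holding ~2/N of the ESD,
    matching the share quoted in the computing model.
    """
    ring = cycle(tier1s)
    placement = {}
    for ds in datasets:
        placement[ds] = (next(ring), next(ring))
    return placement

if __name__ == "__main__":
    demo = [f"esd_run{i:05d}" for i in range(12)]
    for ds, sites in place_esd(demo).items():
        print(ds, "->", sites)
```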

5. Operation of Tier-1s and Tier-2s
- We envisage at least 6 Tier-1s for ATLAS. Each one will:
  - keep on disk 2/N of the ESDs and a full copy of AODs and TAGs (a rough storage estimate follows this list)
  - keep on tape 1/N of the Raw Data
  - keep on disk 2/N of the currently simulated ESDs and on tape 1/N of previous versions
  - provide facilities (CPU and disk space) for user analysis (~200 users/Tier-1)
  - run simulation, calibration and/or reprocessing of real data
- We estimate ~4 Tier-2s for each Tier-1. Each one will:
  - keep on disk a full copy of AODs and TAGs
  - (possibly) keep on disk a selected sample of ESDs
  - provide facilities (CPU and disk space) for user analysis (~50 users/Tier-2)
  - run simulation and/or calibration procedures
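Combining these fractions with the event sizes of slide 3 gives a feel for the storage one Tier-1 needs per year of real data. The sketch below assumes N = 6 Tier-1s and ignores simulated data; the derived numbers are rough illustrations, not figures from the talk:

```python
# Rough per-Tier-1 storage estimate for one year of real data, N = 6.
N_TIER1 = 6
EVENTS_PER_YEAR = 200 * 1e7          # 200 Hz over a 10^7 s year
RAW_MB, ESD_MB, AOD_MB = 1.6, 0.5, 0.1
TAG_MB = 0.5e-3                      # 0.5 kB per event

mb_to_tb = 1e-6
esd_disk_tb = (2 / N_TIER1) * EVENTS_PER_YEAR * ESD_MB * mb_to_tb   # 2/N of ESD on disk
aod_disk_tb = EVENTS_PER_YEAR * AOD_MB * mb_to_tb                   # full AOD copy
tag_disk_tb = EVENTS_PER_YEAR * TAG_MB * mb_to_tb                   # full TAG copy
raw_tape_tb = (1 / N_TIER1) * EVENTS_PER_YEAR * RAW_MB * mb_to_tb   # 1/N of raw on tape

print(f"disk: ESD ~{esd_disk_tb:.0f} TB + AOD ~{aod_disk_tb:.0f} TB + TAG ~{tag_disk_tb:.0f} TB")
print(f"tape: raw ~{raw_tape_tb:.0f} TB")
```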

6. Analysis on Tier-2s and Tier-3s
- This is the area of the model changing most actively
  - we are trying to forecast resource usage and usage patterns from the Physics Working Groups
- Assume about ~10 selected large AOD datasets, one for each physics analysis group
- Assume that each large local centre will have the full TAG to allow simple selections
  - using these, jobs are submitted to the Tier-1 cloud to select on the full ESD (see the sketch after this list)
  - a new collection or ntuple-equivalent is returned to the local resource
- Distributed analysis systems are under development
  - metadata integration, event navigation and database design are all top priorities
  - ARDA may help, but will come late in the day for DC2 (risk of interference with DC2 developments)
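A minimal sketch of what such a TAG-based pre-selection could look like, assuming the TAGs are exposed as an SQL table of per-event attributes. The table name, columns and cuts are hypothetical, and an in-memory sqlite3 database stands in for the real TAG store; the selected run/event list would then seed the job sent to the Tier-1 cloud to read the full ESD:

```python
import sqlite3

# Hypothetical TAG schema: one row per event with a few summary attributes.
# An in-memory database with dummy rows stands in for the real TAG store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_tags (run INTEGER, event INTEGER, n_jets INTEGER, missing_et REAL)")
conn.executemany(
    "INSERT INTO event_tags VALUES (?, ?, ?, ?)",
    [(2100, 1, 2, 35.0), (2100, 2, 5, 140.0), (2100, 3, 4, 110.0)],
)

# Simple selection on the TAGs; the cuts are purely illustrative.
selected = conn.execute(
    "SELECT run, event FROM event_tags WHERE n_jets >= 4 AND missing_et > 100.0"
).fetchall()

# The (run, event) list defines an event collection; a job carrying this
# collection would be sent to the Tier-1 cloud to read the full ESD for
# just these events and return a new collection or ntuple-equivalent.
print(f"{len(selected)} events selected for ESD-level analysis:", selected)
conn.close()
```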

7. Data Challenge 2
- DC2 operation in 2004:
  - distributed production of >10^7 simulated events in May-July
  - events sent to the Tier-0 at CERN in ByteStream (raw data) format
  - reconstruction run on the prototype Tier-0 in a short period of time (~10 days, a 10% data-flow test)
  - reconstruction results distributed to Tier-1s and analysed on the Grid
- Main new software to be used (with respect to DC1 in 2002/2003):
  - Geant4-based simulation, pile-up and digitization in Athena
  - completely new EDM and Detector Description, interfaced to simulation and reconstruction
  - POOL persistency
  - LCG-2 Grid infrastructure
  - Distributed Production and Analysis environment

8. Phases of DC2 operation
- Consider DC2 as a three-part operation:
  - Part I: production of simulated data (May-July 2004)
    - needs Geant4, digitization and pile-up in Athena, POOL persistency
    - minimal reconstruction, just to validate the simulation suite
    - will run on any computing facilities we can get access to around the world
  - Part II: test of Tier-0 operation (July-August 2004)
    - needs full reconstruction software following the RTF report design, and the definition of AODs and TAGs
    - reconstruction will run on the Tier-0 prototype as if data were coming from the online system (at 10% of the rate)
    - output (ESD+AOD) will be distributed to Tier-1s in real time for analysis
  - in parallel: run distributed reconstruction on simulated data
    - this is useful for the physics community, as the MC truth information is kept
  - Part III: test of distributed analysis on the Grid (Aug.-Oct. 2004)
    - access to event and non-event data from anywhere in the world, both in organized and chaotic ways

9. DC2: Scenario & Time scale
- September 03: Release 7. Put in place, understand and validate Geant4, POOL, the LCG applications and the Event Data Model
- 17 March 04: Release 8 (simulation). Digitization, pile-up, byte-stream; persistency tests and reconstruction; testing and validation; run test-production
- 3 May 04: DC2/I. Start final validation; start simulation, pile-up & digitization, event mixing; transfer data to CERN
- End June 04: Release 9 (reconstruction)
- 15 July 04: DC2/II. Intensive reconstruction on Tier-0; distribution of ESD & AOD
- 1 August 04: DC2/III. Start physics analysis; reprocessing

10. Task Flow for DC2 data

11. DC2 resources (needed)
- Table columns: process | no. of events | time duration (months) | CPU power (kSI2k) | volume of data (TB) | at CERN | off site
- Phase I (May-July):
  - Simulation
  - Pile-up & digitization (RDO): (?), ~30 (?)
  - Event mixing & byte-stream: (small), 20?, 0
  - Total Phase I: ~100, ~60
- Phase II (>July):
  - Reconstruction Tier-0
  - Reconstruction Tier-1
  - Total: (39?), 71

12. DC2: Mass Production tools
- We use:
  - 3 Grid flavours (LCG-2, Grid3+, NorduGrid)
    - we must build over all three (submission, catalogues, ...)
  - an automated production system
    - new production DB (Oracle)
    - supervisor-executor component model (sketched below)
      - Windmill supervisor project
      - executors for each Grid and for legacy systems (LSF, PBS)
  - a data management system
    - Don Quijote DMS project
    - successor of Magda, but uses native catalogs
  - AMI (ATLAS Metadata Interface, MySQL database) for bookkeeping
    - going to web services
    - integrated with POOL
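To make the supervisor-executor split concrete, here is a minimal sketch of the pattern, assuming a supervisor that pulls pending job definitions from the production database and hands each one to the executor for its target system. The class and method names are invented for illustration and do not reproduce the actual Windmill or executor interfaces:

```python
from abc import ABC, abstractmethod

class Executor(ABC):
    """One executor per Grid flavour or legacy batch system."""

    @abstractmethod
    def submit(self, job: dict) -> str:
        """Submit a job definition, return a backend job identifier."""

class LCGExecutor(Executor):
    def submit(self, job):
        # A real executor would build a JDL and talk to the LCG workload
        # management system; here we only pretend to.
        return f"lcg-{job['id']}"

class NorduGridExecutor(Executor):
    def submit(self, job):
        return f"ng-{job['id']}"

class LSFExecutor(Executor):
    def submit(self, job):
        return f"lsf-{job['id']}"

class FakeProdDB:
    """Tiny in-memory stand-in for the Oracle production database."""
    def __init__(self, jobs):
        self.jobs = list(jobs)
        self.submitted = {}
    def fetch_pending(self):
        return [j for j in self.jobs if j["id"] not in self.submitted]
    def mark_submitted(self, job_id, backend_id):
        self.submitted[job_id] = backend_id

class Supervisor:
    """Pulls pending jobs from the production DB and dispatches them."""
    def __init__(self, proddb, executors):
        self.proddb = proddb            # production DB (stand-in here)
        self.executors = executors      # e.g. {"lcg": LCGExecutor(), ...}
    def run_once(self):
        for job in self.proddb.fetch_pending():
            backend_id = self.executors[job["target"]].submit(job)
            self.proddb.mark_submitted(job["id"], backend_id)

if __name__ == "__main__":
    db = FakeProdDB([{"id": 1, "target": "lcg"}, {"id": 2, "target": "ng"}])
    Supervisor(db, {"lcg": LCGExecutor(), "ng": NorduGridExecutor(), "lsf": LSFExecutor()}).run_once()
    print(db.submitted)   # {1: 'lcg-1', 2: 'ng-2'}
```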

13. New Production System (1)
- DC1 production in 2002/2003 was done mostly with traditional tools (scripts)
  - manpower intensive!
- Main features of the new system:
  - common production database for all of ATLAS
  - common ATLAS supervisor run by all facilities/managers
  - common data management system
  - executors developed by middleware experts (LCG, NorduGrid, Chimera teams)
  - final verification of the data done by the supervisor

14. New Production System (2)
(diagram: the Windmill supervisor connects to the production DB (ProdDB) and to the Don Quijote data management system with its RLS and AMI catalogs, and communicates via jabber/SOAP with the executors, Lexor (LCG), Dulcinea (NG), Capone (Grid3) and an LSF executor, which submit to LCG, NorduGrid, Grid3 and LSF respectively)

15. Roles of Tiers in DC2 (1)
- Tier-0
  - 20% of the simulation will be done at CERN
  - all data in ByteStream format (~16 TB) will be copied to CERN
  - reconstruction will be done at CERN (in ~10 days)
  - reconstruction output (ESD) will be exported in 2 copies from Tier-0 (2 x ~5 TB)
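As a quick cross-check (this arithmetic is added here, not taken from the slide), these volumes follow from applying the nominal event sizes of slide 3 to the roughly 10^7 DC2 events:

$$10^7 \times 1.6\ \mathrm{MB} \approx 16\ \mathrm{TB}, \qquad 10^7 \times 0.5\ \mathrm{MB} \approx 5\ \mathrm{TB}.$$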

16. Roles of Tiers in DC2 (2)
- Tier-1s will have to:
  - host simulated data produced by them or coming from Tier-2s, plus the ESD (& AOD) coming from Tier-0
  - run reconstruction in parallel to the Tier-0 exercise (~2 months)
    - this will include links to MC truth
    - produce and host ESD and AOD
  - provide access to the ATLAS VO members
- Tier-2s will:
  - run simulation (and other components if they wish to)
  - copy (replicate) their data to a Tier-1
- All information will be entered into the relevant database and catalog

17. ATLAS production
- Will be done as much as possible on the Grid
  - few production managers
  - data stored at the Tier-1s
  - Expressions of Interest are used to distribute the data efficiently, anticipating an efficient migration of the data
  - keep the possibility to use standard batch facilities, but through the same production system
  - several catalogs will be used; the DMS will take care of them
- Plan: 20% Grid3, 20% NorduGrid, 60% LCG-2 (10 Tier-1s)
  - to be adapted based on experience

18. Current Grid3 Status (3/1/04) (http://www.ivdgl.org/grid2003)
- 28 sites, multi-VO shared resources
- ~2000 CPUs
- dynamic: sites roll in/out

19. NorduGrid Resources: details
- NorduGrid middleware is deployed in:
  - Sweden (15 sites)
  - Denmark (10 sites)
  - Norway (3 sites)
  - Finland (3 sites)
  - Slovakia (1 site)
  - Estonia (1 site)
- Sites to join before/during DC2 (preliminary):
  - Norway (1-2 sites)
  - Russia (1-2 sites)
  - Estonia (1-2 sites)
  - Sweden (1-2 sites)
  - Finland (1 site)
  - Germany (1 site)
- Many of these resources will be available for ATLAS DC2 via the NorduGrid middleware
  - the Nordic countries will coordinate their shares
  - for the others, ATLAS representatives will negotiate the usage

20. LCG-2 today (May 14)

21. Tiers in DC2

Country          Tier-1     Sites  Grid        kSI2k
Australia                          NG          12
Austria                            LCG         7
Canada           TRIUMF     7      LCG         331
CERN                        1      LCG         700
China                                          30
Czech Republic                     LCG         25
France           CCIN2P3    1      LCG         ~140
Germany          GridKa     3      LCG+NG      90
Greece                             LCG         10
Israel                      2      LCG         23
Italy            CNAF       5      LCG         200
Japan            Tokyo      1      LCG         127
Netherlands      NIKHEF     1      LCG         75
NorduGrid        NG         30     NG          380
Poland                             LCG         80
Russia                             LCG         ~70
Slovakia                           LCG
Slovenia                           NG
Spain            PIC        4      LCG         50
Switzerland                        LCG         18
Taiwan           ASTW       1      LCG         78
UK               RAL        8      LCG         ~1000
US               BNL        28     Grid3/LCG   ~1000
Total                                          ~4500

22. ATLAS Distributed Analysis & GANGA
- The ADA (ATLAS Distributed Analysis) project started in late 2003 to bring together in a coherent way all the efforts already present in the ATLAS Collaboration to develop a distributed analysis infrastructure:
  - GANGA (GridPP in the UK): front-end, splitting
  - DIAL (PPDG in the USA): job model
- It is based on a client/server model with an abstract interface between services (see the sketch below)
  - a thin client on the user's computer; the analysis service itself consists of a collection of services on the server side
- The vast majority of GANGA modules fit easily into this scheme (or are being integrated right now):
  - GUI, CLI, JobOptions editor, job splitter, output merger, ...
- Job submission will go through (a clone of) the production system
  - using the existing infrastructure to access resources on the 3 Grids and the legacy systems
- The forthcoming release of ADA (with GANGA 2.0) will have the first basic functionality to allow DC2 Phase III to proceed
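A minimal sketch of the client/server idea, assuming a thin client that only talks to an abstract analysis-service interface while the server side does the work. The interface, class names and job description below are invented for illustration; they are not the actual ADA, DIAL or AJDL definitions:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class AnalysisJob:
    """Illustrative job description: what to run and on which dataset."""
    application: str      # e.g. an Athena job options fragment
    dataset: str          # logical dataset name
    output: str           # name of the result collection/ntuple

class AnalysisService(ABC):
    """Abstract interface the thin client programs against."""

    @abstractmethod
    def submit(self, job: AnalysisJob) -> str: ...

    @abstractmethod
    def status(self, job_id: str) -> str: ...

class ProdSysAnalysisService(AnalysisService):
    """Server-side implementation that would forward jobs to (a clone of)
    the production system, and hence to the 3 Grids and legacy systems."""

    def __init__(self):
        self._jobs = {}

    def submit(self, job):
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = "submitted"   # a real service would split the job
        return job_id                      # and hand the pieces to ProdSys

    def status(self, job_id):
        return self._jobs.get(job_id, "unknown")

# Thin client: all the heavy lifting stays behind the AnalysisService interface.
service = ProdSysAnalysisService()
jid = service.submit(AnalysisJob("MyAnalysis_jobOptions.py", "dc2.aod.higgs", "higgs_ntuple"))
print(jid, service.status(jid))
```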

23. Analysis
- This is just the first step
- Integrate with the ARDA back-end
- Much work is needed on metadata for analysis (LCG and GridPP metadata projects)
- NB: GANGA allows non-production MC job submission and data reconstruction end-to-end in LCG
- The interface to ProdSys will allow submission to any ATLAS resource
(diagram: client tools (GANGA GUI, GANGA and ROOT command-line clients, GANGA task management, graphical job builder, GANGA job management) talk through high-level service interfaces (AJDL) to the Analysis Service and other high-level services (dataset splitter, dataset merger, job management, catalogue services), which in turn sit on middleware service interfaces to the middleware services: CE, WMS, file catalog, etc.)

24. Monitoring & Accounting
- At a very early stage in DC2
  - needs more discussion within ATLAS
    - metrics to be defined
    - development of a coherent approach
  - current efforts:
    - job monitoring around the production database (see the sketch below)
      - publish on the web, in real time, relevant data concerning the running of DC2 and event production
      - SQL queries are submitted to the production DB hosted at CERN
      - the result is HTML-formatted and published on the web
      - a first basic tool is already available as a prototype
    - on LCG: an effort to verify the status of the Grid
      - two main tasks: site monitoring and job monitoring
      - based on GridICE and R-GMA, integrated with the current production Grid middleware
    - MonALISA is deployed for Grid3 and NG monitoring
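A minimal sketch of that kind of monitoring loop: run SQL queries against the production database and publish the result as a static HTML page. Here an in-memory sqlite3 database with an invented jobs table stands in for the Oracle production DB at CERN; the real prototype's schema and queries are not reproduced:

```python
import sqlite3
from datetime import datetime, timezone

# In-memory stand-in for the Oracle production DB; the schema is invented.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (site TEXT, status TEXT, events INTEGER)")
db.executemany("INSERT INTO jobs VALUES (?, ?, ?)", [
    ("LCG.CERN", "done", 5000), ("LCG.RAL", "running", 0),
    ("NG.UIO", "done", 2500), ("GRID3.BNL", "failed", 0),
])

# The monitoring query: job counts and events produced, per site and status.
rows = db.execute(
    "SELECT site, status, COUNT(*), SUM(events) FROM jobs GROUP BY site, status"
).fetchall()

# Format the result as a simple HTML table for web publication.
html = ["<html><body><h1>DC2 production status</h1>",
        f"<p>Generated {datetime.now(timezone.utc):%Y-%m-%d %H:%M} UTC</p>",
        "<table border='1'><tr><th>Site</th><th>Status</th><th>Jobs</th><th>Events</th></tr>"]
for site, status, njobs, nevents in rows:
    html.append(f"<tr><td>{site}</td><td>{status}</td><td>{njobs}</td><td>{nevents or 0}</td></tr>")
html.append("</table></body></html>")

with open("dc2_status.html", "w") as f:
    f.write("\n".join(html))
```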

25. DC2: where are we?
- DC2 Phase I
  - Part 1: event generation
    - Release (end April) for Pythia generation (70% of the data)
      - tested, validated, distributed
      - test production started 2 weeks ago
      - real production started this week with the current release
  - Part 2: Geant4 simulation
    - Release (mid May); reverted to Geant4 6.0 (with multiple scattering from 5.2)
      - tested, validated, distributed
      - production will start later this week with the current release
  - Part 3: pile-up and digitization
    - Release (bug-fix release, if needed, next week)
      - currently under test (performance optimization)
      - production later in June

26. ATLAS Computing Timeline
- POOL/SEAL release (done)
- ATLAS Release 7 (with POOL persistency) (done)
- LCG-1 deployment (done)
- ATLAS complete Geant4 validation (done)
- ATLAS Release 8 (done)
- DC2 Phase 1: simulation production (in progress)  <- NOW
- DC2 Phase 2: intensive reconstruction (the real challenge!)
- Combined test beams (barrel wedge)
- Computing Model paper
- Computing Memorandum of Understanding
- ATLAS Computing TDR and LCG TDR
- DC3: produce data for PRR and test LCG-n
- Physics Readiness Report
- Start commissioning run
- GO!

27. Final prototype: DC3
- We should consider DC3 as the final prototype, for both the software and the computing infrastructure
  - the tentative schedule is Q to end Q; the cosmic run will be later in 2006
- This means that on that timescale (in fact earlier than that, if we have learned anything from DC1 and DC2) we need:
  - a complete software chain for simulated and for real data, including aspects missing from DC2: trigger, alignment, etc.
  - a deployed Grid infrastructure capable of dealing with our data
  - enough resources to run at ~50% of the final data rate for a sizable amount of time
- After DC3 we will surely be forced to sort out problems day by day, as the need arises, for real, imperfect data coming from the DAQ: no time for more big developments!

