Univ. Milano-Bicocca e INFN


CMS data challenge
M. Paganoni, Univ. Milano-Bicocca and INFN

Contributors
- CNAF: William Bacchi, Daniele Bonacorsi, Paolo Capiluppi, Giuseppe Codispoti, Alessandra Fanfani
- T2 Roma: Luciano Barone, Pietro Govoni, Martina Malberti, Paolo Meridiani, Giovanni Organtini, Shahram Rahatlou, Francesco Safai Tehrani
- T2 LNL: Massimo Biasotto, Federica Fanzago, Ugo Gasparini, Martino Margoni, Gaetano Maron, Ezio Torassa
- Pisa: Filippo Ambroglini, Giuseppe Bagliesi, Paolo Bartalini, Tommaso Boccali, Federico Calzolari, Livio Fanò
- Bari: Nicola De Filippis, Giacinto Donvito, Giorgio Maggi

Computing Software Analysis Challenge 2006
A 50 million event exercise to test the workflow and dataflow as defined in the CMS computing model; a test at 25% of the capacity needed in 2008.
Main components:
- Preparation of large MC simulated datasets (some with HLT tags)
- Prompt reconstruction at Tier-0:
  - Reconstruction at 40 Hz (out of 150 Hz) using CMSSW
  - Application of calibration constants from the offline DB
  - Generation of Reco, AOD, and AlCaReco datasets
  - Splitting of an HLT-tagged sample into 10 streams
- Distribution of all AOD and some FEVT to all participating Tier-1s
- Calibration jobs on AlCaReco datasets at some Tier-1s and CAF
- Re-reconstruction performed at Tier-1s
- Skim jobs at some Tier-1s with data propagated to Tier-2s
- Physics jobs at Tier-2s and Tier-1s on AOD and Reco
Italian contribution

Official Timeline
- June 1: computing systems ready for Service Challenge SC4
- June 15: physics simulation validation complete
- July 1: start MC production
- Aug. 15: calibration, alignment, HLT, reconstruction, and analysis tools ready
- Aug. 30: 50 Mevt produced, 5M with HLT pre-processing
- Sep. 1: computing systems ready for CSA
- Sep. 15: start CSA06
- Oct. 1: start smooth operation for CSA06
- Oct. 30: end smooth operation for CSA06
- Nov. 15: finish CSA06

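Success metrics (goal, threshold, and result where the slide gives one)
- Number of Tier-1 sites: goal 7, threshold 5
- Number of Tier-2 sites: goal 20, threshold 15, result 24
- Weeks of sustained rate: goal 4, threshold 2
- Tier-0 efficiency: goal 80%, threshold 30%, result 100%
- Running jobs per day (2h) at Tier-1 + Tier-2: goal 50k, threshold 30k
- Grid job efficiency: goal 90%, threshold 75%, result 95%
- Data serving (storage to CPU): 1 MB/s/slot; 300 MB/s (T1), 100 MB/s (T2); result OK
- Data transfer Tier-0 to all Tier-1s (tape): goal 150 MB/s, threshold 75 MB/s, result 550 MB/s
- Data transfer Tier-1 to Tier-2: goal 20 MB/s, threshold 5 MB/s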
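As a reading aid, the rows of the success-metrics slide that quote all three numbers can be checked mechanically. The sketch below is illustrative only: the `classify` helper and its labels are assumptions, not part of the CSA06 tooling, and it treats every metric as "higher is better".

```python
# (goal, threshold, result) for the rows where the slide quotes all three numbers.
metrics = {
    "Tier-2 sites": (20, 15, 24),
    "Tier-0 efficiency (%)": (80, 30, 100),
    "Grid job efficiency (%)": (90, 75, 95),
    "Tier-0 to Tier-1 transfer (MB/s)": (150, 75, 550),
}


def classify(goal, threshold, result):
    """Hypothetical helper: compare a result with its goal and threshold."""
    if result >= goal:
        return "goal met"
    if result >= threshold:
        return "threshold met"
    return "missed"


status = {name: classify(*values) for name, values in metrics.items()}
for name, outcome in status.items():
    print(f"{name}: {outcome}")
```

With the numbers quoted on the slide, each of these four rows reaches its goal, not just its threshold.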

Computing resources
- Tier-0 (CERN): 1.4M SI2K (~1400 CPUs at CERN), 240 TB
- Tier-1 (7 sites): 2500 CPUs in total; 70 TB disk + tape as minimum to participate
- Tier-2 (25 sites): 2400 CPUs in total; average 10 TB disk at participating Tier-2s

MC production software and tools
- The ProdAgent tool is used to automate the production. It consists of many agents running in parallel: JobCreator, JobSubmitter, JobTracking, MergeSensor…
- Output files are registered in the Data Bookkeeping Service (DBS); blocks of files are registered in the Data Location System (DLS); a local catalogue is used to map LFNs to local PFNs
- Files are merged to an optimum size before transfer to CERN
- CMSSW is installed at remote sites via grid tools or directly by site admins
- Storage management systems deployed: CASTOR, dCache, DPM
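The local-catalogue step (mapping a logical file name to a site-specific physical file name) can be pictured with a toy prefix rule. This is a minimal sketch under stated assumptions: the storage prefix, the example path, and the `lfn_to_pfn` function are hypothetical and do not reproduce the actual catalogue format used by the sites.

```python
# Hypothetical site rule: this storage prefix is an illustration only.
SITE_PREFIX = "srm://se.example.org/pnfs/cms"


def lfn_to_pfn(lfn: str, prefix: str = SITE_PREFIX) -> str:
    """Map a logical file name (LFN) to a physical file name (PFN) by prefixing."""
    if not lfn.startswith("/"):
        raise ValueError("LFN must be an absolute path")
    return prefix.rstrip("/") + lfn


# Example path is made up for illustration.
print(lfn_to_pfn("/store/mc/example/file.root"))
```

The point of the indirection is that DBS and DLS only need to know logical names and locations, while each site keeps its own rule for where the bytes actually live.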

MC pre-production
4 production teams active:
- 1 for OSG: Ajit Mohapatra, Wisconsin (taking care of 7 OSG CMS Tier-2s)
- 3 for LCG:
  - Jose Hernandez, Madrid (Spain, France, Belgium, CERN)
  - Carsten Hof, Aachen (Germany, Estonia, Taiwan, Russia, Switzerland, FNAL)
  - Nicola De Filippis, Bari (Italy, UK, Hungary)
Large participation of CMS T1s and T2s

Monitoring of production via web interface
A first prototype of the monitoring was developed by the Bari team.

Monitoring of MinBias (1) Maximum rate per day: 1.15 M

Monitoring of MinBias (2)
[Plots per site: CNAF, LNL, Roma, Bari, Pisa]
Most of the failures at CNAF were related to stage-out and stage-in problems with CASTOR2.

Dataset statistics
Total: ~66 M events, no pileup. Total FEVT: O(150) TB.
1. Minimum bias (40M)
2. Z→µµ (2M)
3. W→eν (4M)
4. t-tbar (6M) [all decays]
5. Electroweak soup (5M): W→lν + Drell-Yan (m > 15 GeV) + WW + H→WW
6. HLT soup (5M), 10 effective MC HLT triggers: W (leptons) + Drell-Yan (leptons) + t-tbar (all modes) + dijets
7. Jet calibration soup (1M): dijet + Z+jet
8. Soft muon soup (2M): inclusive muons in minbias + J/Psi production
9. Exotics soup (1M): LM1 SUSY, Z' (700 GeV), and excited quark (2000 GeV) [all decays]
(slide annotation: for calibration)
12 M events produced by the LCG(3) team

Efficiency and problems
- Overall efficiency: 88% (probability for a job to end successfully once it is submitted)
- Grid efficiency: 95%
- Aborted jobs: jobs not submitted because requirements were not met (merge jobs), or jobs that failed after submission for Grid infrastructure reasons
Problems:
- Stage-out was the main cause of job failures. More robust checks were implemented, along with more stage-out attempts, a fallback strategy, etc.
- Merge jobs typically overloaded the storage system because of the high rate of read access; CASTOR2 at CNAF was tuned to cope with the needs of the production (D. Bonacorsi and CNAF admins)
- Site validation: storage, software tag, software mount points, matching of CE
- Consistency between fileblocks/files in DBS/DLS and the reality at sites
Support from the Italian Tier-1 and Tier-2s was very effective, also in August

Tier-0 tasks in CSA06
Reconstruction with CMSSW_1_0_x (x6):
- All main reconstruction components included
- Detector-specific local reconstruction and clustering
- Tracking (only 1 algorithm used), vertexing, standalone µ, jets
- Global µ (with tracker), electrons, photons, b and tau tagging
- Reconstruction time small: 4.5 s/event for minimum bias, 20 s/event for ttbar; the computing model assumes 25 s/event
Calibration/Alignment:
- Ability to pull in constants from the offline DB included for ECAL, Tracker, and Muon reconstruction
- Direct access to Oracle or via Frontier cache
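The per-event reconstruction times translate directly into a CPU count for a given event rate. The sketch below is a back-of-the-envelope estimate, assuming one event at a time per CPU and no idle time; the 40 Hz target is the prompt-reconstruction rate quoted in the challenge overview, and `cpus_needed` is a hypothetical helper.

```python
import math


def cpus_needed(rate_hz: float, seconds_per_event: float) -> int:
    """CPUs required to keep up with rate_hz if each event costs seconds_per_event."""
    return math.ceil(rate_hz * seconds_per_event)


TARGET_RATE_HZ = 40.0  # prompt-reconstruction rate quoted for CSA06

for label, seconds in [("minimum bias", 4.5), ("ttbar", 20.0), ("computing model", 25.0)]:
    print(f"{label}: {cpus_needed(TARGET_RATE_HZ, seconds)} CPUs")
```

Under these idealised assumptions, the faster-than-assumed reconstruction leaves headroom compared with the 25 s/event figure used in the computing model.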

Tier-0 operations
4 weeks uptime (goal), 207M events processed
- 2 Oct.: operations at Tier-0 started
- First week: mostly minbias (with some EWK) using CMSSW102, while bugs were fixed to improve robustness on signal samples
- Second week: processing included signal samples at rates generally matched to T1 bandwidth metrics, using CMSSW103
- After running for about 23 days (120M events at 100% uptime), it was decided to increase the scale for the last days
- Reprocessed all signal samples in ~5 days using CMSSW106 and maximum CPU usage
Performance:
- 160 Hz processing rate, peaking at 300 Hz
- 1250 CPUs for prompt reconstruction
- 150 CPUs for AOD and AlCaReco production (separate step)
- All constants pulled from Frontier, i.e. full complexity of the CSA exercise
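The event counts and durations on this slide imply average processing rates. The following is a rough worked calculation, assuming that the difference between the 207M total and the first 120M events was processed in the final ~5 days; that split is an interpretation of the slide, not a number it states.

```python
SECONDS_PER_DAY = 86_400


def average_rate_hz(events: float, days: float) -> float:
    """Average event rate in Hz over a period given in days."""
    return events / (days * SECONDS_PER_DAY)


first_phase = average_rate_hz(120e6, 23)         # first ~23 days
last_phase = average_rate_hz(207e6 - 120e6, 5)   # assumed: remainder in the last ~5 days
overall = average_rate_hz(207e6, 28)             # whole 4-week period

print(f"first phase: {first_phase:.1f} Hz")
print(f"last phase:  {last_phase:.1f} Hz")
print(f"overall:     {overall:.1f} Hz")
```

These are time averages over whole days, so they are naturally lower than instantaneous processing rates measured while jobs are running.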

Calibration/Alignment exercise at Tier-0 CAF
Calibration/alignment tasks:
- Specialized tasks to align/calibrate subsystems using start-up miscalibrated samples, e.g.
  - Align a portion of the Tracker with the HIP algorithm, using the Z→µµ sample on the central analysis facility (CAF) for prompt calibration/alignment
  - Intercalibrate ECAL crystals by phi symmetry in minbias events, π0/η, or isolated electrons from W/Z
- Specialized reduced RECO data format (AlCaReco) to be used for the calibration/alignment stream from Tier-0
- Mechanism to write constants back into the offline DB for later use
- Re-reconstruction at Tier-1 required to test new constants
- Propose that miscalibration is applied at RECO

Tracker Alignment exercise
CSA06 misalignment scenario: TIB dets and TOB rods misaligned by applying:
- random shifts, drawn from a flat distribution of width +/-100 µm, in (x,y,z) for the double-sided modules and in x (sensitive coordinate) for the single-sided ones
- random rotations, drawn from a flat distribution of width +/-10 mrad, in (alpha,beta,gamma) for all the modules
[Plot: TIB double-sided dets positions]
Alignment exercise:
- read the object in the DB and apply the initial misalignment
- run the iterative HIP algorithm and determine the alignment constants; 1M events used and 10 iterations
- jobs run in parallel on 20 CPUs on a dedicated queue at Tier-0
- new constants inserted into the DB
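The misalignment scenario described on this slide amounts to drawing uniform random numbers per module. The sketch below only illustrates that recipe: `misalign_module` and the dictionary layout are hypothetical, shifts are kept in whatever length unit the slide intends, and nothing here reflects the actual CMSSW misalignment tools.

```python
import random

SHIFT_HALF_WIDTH = 100.0    # flat distribution +/-100, in the slide's length unit
ROTATION_HALF_WIDTH = 10.0  # flat distribution +/-10 mrad


def misalign_module(double_sided: bool, rng: random.Random) -> dict:
    """Draw one random misalignment for a module (hypothetical data layout)."""
    def flat(half_width: float) -> float:
        return rng.uniform(-half_width, half_width)

    # Single-sided modules are shifted only along x, the sensitive coordinate.
    shift = {"x": flat(SHIFT_HALF_WIDTH), "y": 0.0, "z": 0.0}
    if double_sided:
        shift["y"] = flat(SHIFT_HALF_WIDTH)
        shift["z"] = flat(SHIFT_HALF_WIDTH)

    # All modules get random rotations around the three axes.
    rotation = {angle: flat(ROTATION_HALF_WIDTH) for angle in ("alpha", "beta", "gamma")}
    return {"shift": shift, "rotation": rotation}


rng = random.Random(2006)  # fixed seed so the toy scenario is reproducible
print(misalign_module(double_sided=True, rng=rng))
print(misalign_module(double_sided=False, rng=rng))
```

An alignment algorithm such as HIP then has to recover corrections that undo these random offsets, which is what the iterations in the exercise test.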

Transfer Tier-0/Tier-1s
- All 7 Tier-1s participated in the challenge and performed very well, with some storage element software or hardware problems at individual sites
- The longest downtime at any site was about 18 hours
- Files are injected into the CMS data transfer system PhEDEx and transferred using FTS
- Highest rate from CERN was 550 MB/s
First 3 weeks, average rate per site:
- ASGC: 14.3 MB/s
- CNAF: 18.0 MB/s
- FNAL: 47.8 MB/s
- GridKa: 21.7 MB/s
- IN2P3: 14.6 MB/s
- PIC: 14.4 MB/s
- RAL: 16.4 MB/s
- Total: 147 MB/s
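The per-site averages can be cross-checked against the quoted total with a few lines of arithmetic. This is only a consistency check on the numbers from the slide; the variable names are illustrative.

```python
# Average Tier-0 to Tier-1 rates over the first 3 weeks, in MB/s (from the slide).
site_rates = {
    "ASGC": 14.3,
    "CNAF": 18.0,
    "FNAL": 47.8,
    "GridKa": 21.7,
    "IN2P3": 14.6,
    "PIC": 14.4,
    "RAL": 16.4,
}

total_rate = sum(site_rates.values())
shares = {site: rate / total_rate for site, rate in site_rates.items()}

print(f"sum of site rates: {total_rate:.1f} MB/s")
for site, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{site}: {share:.0%} of the total")
```

The sum comes to 147.2 MB/s, consistent with the 147 MB/s total on the slide and close to the 150 MB/s goal for Tier-0 to Tier-1 transfers.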

Transfer Tier-0/Tier-1s

Skimming data at Tier-1s
- To fit data at T2s and to reduce primary datasets to manageable sizes, skim jobs had to be run at T1s to select events according to the analyses
- Skim configuration files were prepared for the RECO and AOD formats (also including some “MC truth” information)
- Organized skim jobs ran with ProdAgent
- Different skim procedures prepared by users for the same dataset were unified in a single skim job producing different streams
- 10 filters were prepared by the Italian groups for the planned analyses
- 4 teams ran skim jobs at Tier-1s:
  - N. De Filippis: Electroweak soup (RAL, CNAF, ASGC, IN2P3)
  - D. Mason: Jets (FNAL)
  - C. Hof: TTbar (FZK and FNAL)
  - J. Hernandez: Zmumu (PIC and CNAF)
- Skim job output files were shipped to Tier-2s for end-user analyses
9 Oct.: T1 skim jobs started
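The idea of unifying several skims into one job that writes different streams can be sketched as a single pass over the events with one filter per stream. Everything below is a toy illustration: the event dictionaries, the filter names, and `run_unified_skim` are assumptions, not the CMSSW or ProdAgent configuration actually used.

```python
from typing import Callable, Dict, Iterable, List

Event = dict
Filter = Callable[[Event], bool]


def run_unified_skim(events: Iterable[Event], filters: Dict[str, Filter]) -> Dict[str, List[Event]]:
    """Read each event once and append it to every stream whose filter accepts it."""
    streams: Dict[str, List[Event]] = {name: [] for name in filters}
    for event in events:
        for name, accept in filters.items():
            if accept(event):
                streams[name].append(event)
    return streams


# Hypothetical filters and toy events, for illustration only.
toy_filters = {
    "dimuon": lambda ev: ev["n_muons"] >= 2,
    "single_electron": lambda ev: ev["n_electrons"] >= 1,
}
toy_events = [
    {"id": 1, "n_muons": 2, "n_electrons": 0},
    {"id": 2, "n_muons": 0, "n_electrons": 1},
    {"id": 3, "n_muons": 2, "n_electrons": 1},
    {"id": 4, "n_muons": 0, "n_electrons": 0},
]

toy_streams = run_unified_skim(toy_events, toy_filters)
for name, selected in toy_streams.items():
    print(name, [ev["id"] for ev in selected])
```

Reading the input once for all filters is what reduces the load on Tier-1 storage compared with running each user's skim as a separate job over the same dataset.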

Monitoring of skim jobs at Tier-1s

Transfer of skim outputs from Tier-1s to Tier-2s
Problems related to:
- wrong configuration of Tier-2 sites
- wrong setup of download agents with FTS
- CNAF-related problems (FTS server, CASTOR)

Total transfer Tier-0 to Tier-1s and Tier-2s
Exceeded 1 PB in 1 month!
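To put the 1 PB figure in perspective, it can be converted into an average sustained rate. This is a rough estimate, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a 30-day month; neither convention is stated on the slide.

```python
BYTES_PER_PB = 1e15      # assumption: decimal petabyte
BYTES_PER_MB = 1e6       # assumption: decimal megabyte
SECONDS_PER_DAY = 86_400
DAYS_IN_MONTH = 30       # assumption: 30-day month

average_mb_per_s = BYTES_PER_PB / (DAYS_IN_MONTH * SECONDS_PER_DAY) / BYTES_PER_MB
print(f"1 PB in {DAYS_IN_MONTH} days is about {average_mb_per_s:.0f} MB/s on average")
```

Under these assumptions the aggregate average is a little under 400 MB/s, summed over all destination sites.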

Analyses at Tier-2s (1)

Analyses at Tier-2s (2)
All INFN Tier-2s took part in the last step of CSA06: the physics analyses starting from the output of the skim procedures
- LNL: W→µν selection
- Pisa: tau validation, minimum bias / underlying event
- Rome: electron reconstruction
- Bari: tracker misalignment

Analysis at Rome
Three analyses, with the goals:
- to study the electron reconstruction in Z→ee events (Meridiani)
- to measure the W mass in W→eν events (Tabarelli, Malberti)
- to run a simple calibration with W→eν events (Govoni)
Electron and Z mass reconstruction using the hybrid supercluster
[Plots: efficiency vs pT, efficiency vs η, mZ]

Analysis at Pisa (1)
The general idea is to simulate an "early data taking" activity of the τ group:
- study the tau tag efficiency from Z→ττ events
- study the misidentification with the recoiling jet in Z+jet, Z→µµ events
In addition: run the τ validation package on skimmed events: a pure di-tau sample and a ttbar sample (S. Gennai, G. Bagliesi).
[Plot: isolation efficiency vs isolation cone; pT of the jet]

Analysis at Pisa (2)
Study of minimum bias / underlying event (Fanò, Ambroglini, Bartalini)
- Monte Carlo tuning for LHC
- Pileup understanding
- UE contribution measurements in MB events
[Plots: UE, MinBias]

Analysis at LNL
Goal: to study the W→µν preselection with different Monte Carlo data samples
Two data samples were considered (Torassa, Margoni, Gasparini):
(1) the electroweak soup (3.4 M events, 50% W→µν and 50% DY)
(2) the soft muons (1.8 M events, 50% minimum bias and 50% J/ψ, pT(µ) > 4 GeV)
[Plots: EWK soup; GlobalMuon reconstructor]

Analysis at Bari
Goal: to study the effect of tracker misalignment on track reconstruction performance (De Filippis):
- with the perfect tracker geometry
- in the short-term and long-term misalignment scenarios, by reading misalignment positions and errors via frontier/squid from the offline database ORCAOFF
- by using the tracker module positions and errors obtained from the output of the alignment process that will be run at the CERN T0
Data samples used: Z→µµ and ttbar (the latter for computing the fake rate)

Analysis jobs at Bari
- CRAB_1_4_0 used to submit 1.8k jobs
- Grid efficiency = 99%, application efficiency = 94%
- Bunches of 150 jobs run in different time slots; max 45 jobs run in parallel
- The squid configuration was tuned to ensure that the alignment data were read from the local squid cache via the frontier client rather than from CERN (blue histogram)
Result: frontier/squid works as expected at the Bari Tier-2 when accessing alignment data

Re-reconstruction at Tier-1s
The last step of CSA06
Goal: to demonstrate re-reconstruction from some RAW data at Tier-1s as part of the calibration exercise
Status:
- access to the offline database via frontier working
- re-reconstruction demonstrated at ASGC, FNAL, IN2P3, PIC and CNAF
- running at RAL, and further tests at CNAF and PIC

Tier2 - LNL
- Import: 200 TB transferred, rate 20-50 MB/s
- Export: 60 TB transferred, rate 5-20 MB/s
SC4 challenge (Jun-Sep 06)
[Plot, Jun-Sep 2006 (~50K jobs): fake analysis jobs, MC production, user analysis]

Tier2 - LNL (2)
[Plot: total number of jobs running (last 6 months); CMS CSA06, CMS MC production]

Tier2 - Roma
New CPUs installed on 12/10/06

Pisa
[Plot, per day: fake analysis jobs, MC production]

Bari

Conclusions
- CSA06 was meant to be a challenge to commission the computing/software/analysis system, but in some cases it also required development and deployment of the tools
- CSA06 was successful at INFN (all the steps were executed), thanks to the full-time work of a few experts and to the coordinated effort of many people at Tier-1 and Tier-2 sites
- The CSA06 analysis exercises need to be the ramp-up for the physics program/organization in Italy
- CSA07 should cope with both simulated and real data and focus on start-up operations (calibration and alignment) and analysis preparation

Rescheduling

Requests 09/2006
1 TB = 1.5 kEuro; 1 box = 8 kSI = 3 kEuro
The total requests (including SJ) would bring the total over the 4 computing centres to about a factor 2 below the initial plan, to account for the rescheduling of LHC (248 TB versus 530 TB and 1014 kSI2K versus 1600 kSI2K)
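The "factor 2" statement can be checked against the numbers in parentheses, and the unit costs give a price per kSI2K. This is a small worked calculation under two assumptions: "kSI" on the slide is read as kSI2K, and the smaller figure in each pair is the rescaled total while the larger one is the initial plan.

```python
# Totals over the 4 computing centres, as quoted on the slide.
disk_rescaled_tb, disk_initial_tb = 248, 530
cpu_rescaled_ksi2k, cpu_initial_ksi2k = 1014, 1600

# Unit costs quoted on the slide (kEuro).
COST_PER_TB_KEURO = 1.5
COST_PER_BOX_KEURO = 3.0
KSI2K_PER_BOX = 8.0  # assumption: "8 kSI" means 8 kSI2K per box

disk_ratio = disk_initial_tb / disk_rescaled_tb
cpu_ratio = cpu_initial_ksi2k / cpu_rescaled_ksi2k
cost_per_ksi2k_keuro = COST_PER_BOX_KEURO / KSI2K_PER_BOX

print(f"disk: initial plan / rescaled total = {disk_ratio:.2f}")
print(f"CPU:  initial plan / rescaled total = {cpu_ratio:.2f}")
print(f"CPU unit cost: {cost_per_ksi2k_keuro} kEuro per kSI2K")
```

With these readings, disk ends up slightly more than a factor 2 below the initial plan and CPU about a factor 1.6 below it.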