
Nicola De Filippis, CMS Italia, Napoli, 13-14 Feb. 2007. MC production at the CMS Tiers in 2007: CMS-wide outlook and the Italian contribution


1 MC production at the CMS Tiers in 2007: CMS-wide outlook and the Italian contribution
Università, Politecnico and INFN Bari
N. De Filippis, M. Abbrescia, G. Cuscela, G. Donvito, G. Maggi, S. My, A. Pierro, A. Pompili, plus contributions from developers (Kavka, Fanfani, Codispoti, Bacchi)
CMS Italia, Napoli, 13-14 Feb. 2007

2 Outline
- Status of CMS Monte Carlo production: organization and current requests
- Monte Carlo production in Italy:
  - Post-CSA06 activity
  - Problems with sites
  - Efficiency of Italian sites
  - Reliability of sites
- CMS plans and milestones for 2007

3 MC production cycle
- Goal of MC production: to produce events for CMSSW validation (simulation/reconstruction) and physics studies
- Small RelVal samples are produced for each new CMSSW release
- PhysVal / HLT groups make requests in the form of cfg files
- Experts provide ProdAgent workflows
- Assignments to production teams are posted on the twiki: https://twiki.cern.ch/twiki/bin/view/CMS/ProdOps
- Currently 6 teams: LCG(1,2,3,5,6) and OSG
- Each team has O(10) dedicated T1/T2 sites
- When done, files are merged and injected into PhEDEx
- Too many manual steps and too many extra-production duties (e.g. monitoring and dealing with site availability and stability)
- A lot of pressure from the SDV group (P. Janot) to produce events ASAP

4 Current official requests (P. Kreuzer)
- After CSA06: CMSSW_1_1_1 and 1_1_2 used until Xmas
- CMSSW_1_2_0 released mid-Dec06
- Production with CMSSW_1_2_0 running continuously since Dec06
- PhysVal requests (10M w/o PU + 16.5M w PU)
- HLT requests (100M w/o PU + 20M w PU x 2)
- HLT + PU in 2 steps: GEN-SIM / DIGI-RECO
About 20M done, many running, but very tight schedule!
Some samples:
- QCD di-jets (0 < pt-bin < 3.5 TeV), w & w/o PU
- Excl. W & Z decays, Wjets (0 < pt < 1 TeV), w & w/o PU
- Inclusive ttbar, …
See https://twiki.cern.ch/twiki/bin/view/CMS/ProdOps120

5 PhysVal samples with CMSSW_1_2_0: LCG (3)

6 HLT samples with CMSSW_1_2_0: LCG (3)
Once the 120 bulk production is over, a few «special» requests will be addressed:
- Muon-enriched sample with 121: a few hundred thousand events
- Cosmics for Tracker with 122: 2.5-5M events

7 Ongoing effort of OSG and LCG(1,2,5,6)
Conclusions of P. Kreuzer:
- With 2 new and efficient production teams on board, the remaining 120 assignments should be delivered (at least partially) within 10 days.

8 MC production in Italy

9 Post-CSA06 activity (1)
- Official CSA06 note complete
- Internal CMS note on CSA06 in the Italian Tiers complete
- CSA06 analyses completed

10 Post-CSA06 activity (2)
From October 2006 until today the LCG(3) team:
- restarted the Monte Carlo production and ran it without stops, including during the Xmas break
- increased the number of experts able to run ProdAgent
- exported the monitoring tool developed at Bari to the other LCG teams
- produced about 15M events for Physics validation and HLT studies, with and without PU: 1/3 of the entire CMS production
- used the European LCG resources continuously, giving extensive feedback for resolving problems at remote sites

11 Sites used by the LCG(3) team
- CERN (used intensively before and after Xmas)
- Italian sites
- English sites
- Hungary
- Taiwan
- IN2P3

12 Ongoing effort of LCG (3)
Ongoing GEN-SIM and DIGI-RECO with low-luminosity pileup

13 Issues about ProdAgent
- Production setup at Bari:
  - 3 instances of PA running at Bari:
    - two for FEVT and GEN-SIM production
    - one for DIGI-RECO production with PU
  - one machine for on-line dump of the DBs
- Monitoring tool exported to other LCG teams with positive feedback.
- Job submission is rather slow (up to 2-3 jobs/min) due to:
  - performance of the PA machines, which are two years old
  - overhead of the RBs
  - no bulk submission
- Controlling jobs that failed or aborted because of middleware problems is difficult. Killing the jobs of a given production, or those submitted to a given site, was problematic; PA developers provided a script to do this.
- LCG(3) will smoothly hand over the English CEs to LCG(6) (the English team) and IN2P3 to LCG(5) (the Belgian team) for debugging and intensive use.
- In the long run: BulkSubmission & Resource Monitor
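To get a feel for what a submission rate of 2-3 jobs/min means in practice, the short Python sketch below estimates how long it takes to submit a batch of jobs one at a time. The batch size of 1200 jobs is a hypothetical figure chosen for illustration, not a number from the talk; only the 2-3 jobs/min range comes from the slide.

```python
def submission_hours(n_jobs: int, jobs_per_minute: float) -> float:
    """Wall-clock hours needed to submit n_jobs sequentially at a fixed rate."""
    if jobs_per_minute <= 0:
        raise ValueError("jobs_per_minute must be positive")
    return n_jobs / jobs_per_minute / 60.0


# Hypothetical batch size (an assumption, not taken from the slides).
N_JOBS = 1200

for rate in (2.0, 3.0):  # the rate range quoted on the slide
    hours = submission_hours(N_JOBS, rate)
    print(f"{N_JOBS} jobs at {rate:.0f} jobs/min: {hours:.1f} h")
```

Under these assumptions a single ProdAgent instance spends several hours on submission alone, which is consistent with bulk submission being listed as a long-term need.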

14 Problems with sites
- Most of the LCG(3) sites had various problems before and during the Xmas break
  - November: Bari, Pisa, Roma when restarting production; CNAF: problems with CASTOR
- English sites and IN2P3 had alternating periods of activity, also during the last month. Italian sites were really efficient during the last month.
- Debugging of sites is typically really painful and requires continuous interaction with the site administrators.
- Problems:
  - stage out was the main cause of job failures
  - site validation: storage, software tag, software mount points, local copy of PU
  - grid problems: instabilities of the CE because of high load, and overload of the RBs, which caused:
    - RB didn't change the status of jobs («Waiting» status forever)
    - no chance to monitor: FWJobreport and log files lost
    - difficult/tedious for production teams to kill jobs via BOSS commands
- The debugging of sites is not a task to be covered by production teams.
- CMS is reacting and preparing centralized tests to ensure the reliability of sites.

15 Efficiency of the Italian sites (last month): CNAF
[Plot annotations: No PU; CE replaced]
Except for a few days, CNAF worked very well, ensuring high efficiency of the CMS production during the last month.

16 Statistics of use of CNAF (last month)
CPU hours and percentage of Tier-1 resources used by CMS:

Month-week       | CPU hr | %
-----------------|--------|-------
15 Jan - 21 Jan  |        | 33.4%
22 Jan - 28 Jan  |        | 19.0%
29 Jan - 4 Feb   |        | 24.8%
5 Feb - 11 Feb   |        | 22.4%

The percentage of use depends on the fairshare setup at CNAF.
[Plot: successful jobs]
Queues always full of jobs; CMS at maximum use at CNAF.
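As a quick summary of the weekly percentages quoted on this slide, the following Python sketch computes the mean share and the busiest week. The percentages are copied from the slide; the helper names `mean_share` and `peak_week` are introduced here for illustration only.

```python
from typing import Dict

# Weekly share (%) of Tier-1 resources used by CMS at CNAF, as quoted on the slide.
weekly_share: Dict[str, float] = {
    "15 Jan - 21 Jan": 33.4,
    "22 Jan - 28 Jan": 19.0,
    "29 Jan - 4 Feb": 24.8,
    "5 Feb - 11 Feb": 22.4,
}


def mean_share(shares: Dict[str, float]) -> float:
    """Unweighted mean of the weekly percentages."""
    return sum(shares.values()) / len(shares)


def peak_week(shares: Dict[str, float]) -> str:
    """Label of the week with the highest percentage."""
    return max(shares, key=shares.get)


print(f"mean share: {mean_share(weekly_share):.1f}%")
print(f"peak week:  {peak_week(weekly_share)}")
```

The mean is unweighted because the CPU-hour totals per week are not available in the transcript; with those totals a CPU-hour-weighted average would be the more natural figure.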

17 Efficiency of the Italian sites (last month): INFN
Except for limited storage problems at Bari, Pisa and Rome, all the Italian Tier-2-like sites worked very well during the last month.

18 Statistics from dashboard

19 Reliability of sites: tests
Following the feedback on problems found by production operators, CMS is defining centralized tests, to be run at regular intervals, to certify sites for production and analysis. The ideas are:
1) Submit a small processing job for each advertised CMSSW release at a site. This job checks that:
  - the job can be submitted to the site
  - local stage out can be done
  - the report can be sent back via grid middleware
  - 10-event Minimum Bias?
  - test Frontier access as well?
2) Following completion of the test job, submit a read-back job, which:
  - verifies job submission
  - checks data access
  - cleans up the file to test the cleanup procedure
3) Check global DBS datasets at the site:
  - check read access to all fileblocks at the site
  - report back bad files and invalidate them in DBS
  - perhaps randomly select a dataset to test every day/week etc.
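The three-step test sequence on this slide can be sketched as a small Python driver. This is only an illustration of the ordering described above, not the actual CMS implementation: the check names, the `probe` callback (which stands in for real grid job submission and storage checks) and the `certify_site` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical check names paraphrasing the three steps on the slide.
PROCESSING_CHECKS = ["job_submission", "local_stage_out", "report_back"]
READBACK_CHECKS = ["job_submission", "data_access", "file_cleanup"]
DATASET_CHECKS = ["fileblock_read_access"]

# probe(step, check) -> True if the check passed; a stand-in for real grid calls.
Probe = Callable[[str, str], bool]


@dataclass
class StepResult:
    step: str
    failed_checks: List[str]

    @property
    def passed(self) -> bool:
        return not self.failed_checks


def run_step(step: str, checks: List[str], probe: Probe) -> StepResult:
    """Run every check of one step and record the ones that failed."""
    return StepResult(step, [c for c in checks if not probe(step, c)])


def certify_site(releases: List[str], probe: Probe) -> Tuple[bool, List[StepResult]]:
    """Run the processing and read-back jobs per release, then the dataset check."""
    results: List[StepResult] = []
    for release in releases:
        processing = run_step(f"processing[{release}]", PROCESSING_CHECKS, probe)
        results.append(processing)
        if processing.passed:
            # The read-back job is only submitted after the test job completes.
            results.append(run_step(f"readback[{release}]", READBACK_CHECKS, probe))
    results.append(run_step("datasets", DATASET_CHECKS, probe))
    return all(r.passed for r in results), results
```

For example, `certify_site(["CMSSW_1_2_0"], lambda step, check: check != "local_stage_out")` reports the site as not certified and skips the read-back job, mirroring the observation that stage out was the main cause of job failures. Running the dataset check even after a failed processing job is a design choice of this sketch, not something the slide specifies.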

20 Reliability of sites: SAM tests
SAM (Service Availability Monitoring)
Hopefully the human resources needed for MC production will decrease, so that fewer production teams will be needed, each submitting jobs to any site.

21 Plans for MC production in 2007

22 2007 milestones
- Finalize 120 production (aim for mid-Feb!)
- Expecting small 12x requests (RelVal, muon-enriched HLT, …)
- 130 release (all HLT components) end Feb07
- 130 HLT production in Mar07
- In parallel, Alpgen integration in production
  - Timescale: integrate until Mar07 + test samples, PH prod. Apr-May07
- 140 release (new geo) end Mar07
- 140 Physics production Apr-May07 (30M / month)
- 150 release mid-May07 with improved reco algorithms (re-RECO)
- Launch CSA07 with 16x end-July07
The contribution of Italy to these activities, and the manpower, are still to be defined. In addition, CSA07 during the summer could be a real problem.

23 Conclusions
- Monte Carlo production by the LCG(3) team has run continuously from the end of CSA06 until now
- About 15M events produced (1/3 of the overall CMS production)
- Italian sites have been working very well during the last month, ensuring high-efficiency production.
- Warning: keep close attention on the Italian Tiers, mainly CNAF
- Effective interaction between operators and PA developers
- The load on production operators should decrease as soon as the centralized SAM tests run to certify sites for production.
- The Italian contribution to the activities in preparation for, and during, CSA07 has to be discussed.

