Slide 1: Computing tools and analysis architectures: the CMS computing strategy
M. Paganoni, HCP2007, La Biodola, 23/5/2007


Slide 2: Outline
- CMS Computing and Analysis Model
- CMS workflow components
- 25% capacity test (CSA06 challenge)
- CMSSW validation
- LoadTest07, Site Availability Monitor and the gLite 3.1 Grid
- The goals for 2007:
  - physics validation with high statistics
  - full detector readout during commissioning
  - 50% capacity test (CSA07 challenge)
- Analysis workflow

Slide 3: CMS schedule (March-November 2007)
1) Detector Installation, Commissioning & Operation:
- First Global Readout Test
- Barrel ECAL inserted
- Tracker inserted
- Trigger/DAQ ready for system commissioning
- CMS ready to close
2) Preparation of Software, Computing and Physics Analysis:
- HLT exercise complete
- Pre-CSA07
- Computing, Software and Analysis Challenge 2007 (CSA07)
- 2007 physics analyses completed
- All CMS systems ready for global data taking

Slide 4: The present status of CMS computing
From development to operations:
- service/data challenges (both WLCG-wide and experiment-specific) of increasing scale and complexity
- data distribution, MC production, physics analysis
Primary needs:
- smoothly running Tier-1s and Tier-2s, concurrent with other experiments
- streamlined and automatic operations to ease the operational load
- full monitoring, for early detection of Grid and site problems and to reach stability
- sustainable operations in terms of data management, workload management, user support, site configuration and availability, under continuous significant load

Slide 5: The CMS computing model
Tier-0:
- accepts data from DAQ
- prompt reconstruction
- data archiving and distribution to T1s
Tier-1s:
- data and MC archiving
- re-processing
- skimming and other data-intensive analysis tasks
- data serving to T2s
Tier-2s (~30):
- user data analysis
- MC production
- calibration/alignment and detector studies

Slide 6: CMS data formats and data flow
- RAW: ~1.5 MB/ev; 2 copies (1 at T0 and 1 spread over T1s); 4.5 PB/yr
- RECO: ~250 kB/ev; 1 copy spread over T1s; 2.1 PB/yr
- AOD: ~50 kB/ev; 1 copy at each T1, served to T2s; 2.6 PB/yr
- TAG: ~1-10 kB/ev
- MC produced in 1:1 ratio with data
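The RAW figure above can be cross-checked with a back-of-envelope calculation. This sketch assumes a nominal rate of ~1.5e9 events/year and decimal units; the rate is an assumption introduced here, not a number from the slide, and the other tiers do not follow the same simple formula because they include reprocessing passes.

```python
# Back-of-envelope storage estimate for the RAW tier.
# EVENTS_PER_YEAR is an assumed nominal value, not taken from the slide.
EVENTS_PER_YEAR = 1.5e9   # assumed events recorded per year
RAW_EV_SIZE_MB = 1.5      # ~1.5 MB/event (from the slide)
RAW_COPIES = 2            # 1 copy at T0 + 1 spread over T1s

# MB -> PB with decimal prefixes (1 PB = 1e9 MB)
raw_pb_per_year = RAW_EV_SIZE_MB * EVENTS_PER_YEAR * RAW_COPIES / 1e9
print(raw_pb_per_year)    # 4.5, matching the 4.5 PB/yr on the slide
```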

Slide 7: The MC production
- Production of 200M events (50M/month) for HLT and Physics Notes, started at T2s with the new MC Production System
- Less man-power consuming, better handling of Grid-site unreliability, better use of resources, automatic retries, better error reporting/handling
- More flexible and automated architecture:
  - ProdManager (PM), plus the policy piece: manages the assignment of requests to one or more ProdAgents and tracks the global completion of the task
  - ProdAgent (PA): job creation, submission and tracking; management of merges, failures and resubmissions
(Diagram: PM as policy/scheduling controller at Tier-0/1, dispatching official and development MC production to PAs at Tier-1/2.)
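The automatic-retry behaviour described above can be sketched as a generic resubmission loop. `submit_with_retries` and `flaky_submit` are hypothetical names invented for this illustration; the real ProdAgent interface is considerably more elaborate.

```python
def submit_with_retries(job, submit, max_retries=3):
    """Resubmit a job on failure, keeping the error reports.

    Illustrative only: mimics the ProdAgent's automatic-retry idea,
    not its actual API.
    """
    errors = []
    for attempt in range(1, max_retries + 1):
        ok, report = submit(job)
        if ok:
            return {"job": job, "attempts": attempt, "status": "done"}
        errors.append(report)  # retained for later error handling/reporting
    return {"job": job, "attempts": max_retries,
            "status": "failed", "errors": errors}

# Toy submitter: fails on the first call, succeeds on the second.
calls = {"n": 0}
def flaky_submit(job):
    calls["n"] += 1
    if calls["n"] < 2:
        return (False, "site unreachable")
    return (True, "ok")

result = submit_with_retries("merge-job-1", flaky_submit)
# result["status"] == "done" after 2 attempts
```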

Slide 8: CMS Remote Analysis Builder (CRAB)
CRAB is a user-oriented tool for Grid submission and handling of physics analysis jobs:
- data discovery (DBS/DLS)
- interactions with the Grid (including error handling and resubmission)
- output retrieval
Routinely used since 2004 on both EGEE and OSG (MTCC, PTDR, CSA06, tracker commissioning, ...)
New client-server architecture:
- improved scalability
- increased automation
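Before submission, a CRAB-like tool partitions a dataset into per-job event ranges. The sketch below shows only that splitting step, with invented numbers and no real CRAB API.

```python
def split_by_events(total_events, events_per_job):
    """Split a dataset into (first_event, n_events) job slices.

    A minimal sketch of job splitting as a CRAB-like tool might do it;
    real splitting also follows fileblock boundaries.
    """
    jobs = []
    first = 0
    while first < total_events:
        n = min(events_per_job, total_events - first)
        jobs.append((first, n))
        first += n
    return jobs

jobs = split_by_events(1_000_000, 300_000)
# -> [(0, 300000), (300000, 300000), (600000, 300000), (900000, 100000)]
```

The last job simply receives the remainder, so no events are lost when the total is not a multiple of the per-job size.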

Slide 9: The data placement system (PhEDEx)
Data placement system for CMS, in production for more than 3 years:
- large-scale reliable dataset/fileblock replication
- multi-hop routing following a transfer topology (T0 → T1s → T2s)
- data pre-staging from tape, data archiving to tape
- monitoring, bookkeeping, priorities and policy, fail-over tactics
PhEDEx is made of a set of independent agents, integrated with the gLite File Transfer Service (FTS); it works with both EGEE and OSG
Automatic subscription to DBS/DLS
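Multi-hop routing over a T0 → T1 → T2 topology can be illustrated with a breadth-first search over the link graph. The site names and links below are invented for the example; real PhEDEx routing agents are far more elaborate (costs, priorities, fail-over).

```python
from collections import deque

# Hypothetical transfer topology following the T0 -> T1 -> T2 pattern.
LINKS = {
    "T0_CERN": ["T1_FNAL", "T1_CNAF"],
    "T1_FNAL": ["T2_Wisconsin", "T1_CNAF"],
    "T1_CNAF": ["T2_LNL"],
}

def route(src, dst):
    """Return the fewest-hop path from src to dst, or None if unreachable."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in LINKS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(route("T0_CERN", "T2_LNL"))  # ['T0_CERN', 'T1_CNAF', 'T2_LNL']
```

A T2 without a direct link to the source is reached through its regional T1, which is exactly the multi-hop behaviour the slide describes.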

Slide 10: Data processing workflow (diagram)

Slide 11: Computing, Software and Analysis challenge 2006 (CSA06)
The first test of the complete CMS workflow and dataflow: a 60M-event exercise to ramp up to 25% of the 2008 capacity
T0: prompt reconstruction
- 207M events reconstructed (RECO, AOD), applying alignment/calibration from the offline DB
- 0.5 PB transferred to 7 T1s
T1s: skimming (to get manageable datasets), re-reconstruction
- automatic data serving to T2s via injection into PhEDEx and registration in DBS/DLS
T2s: access to the skimmed data; alignment/calibration jobs; physics analysis jobs
- submission of analysis jobs to the Grid with CRAB by single users and groups
- insertion of new constants into the offline DB

Slide 12: CSA06: T0 and T0 → T1
Prompt reconstruction at T0:
- peak rate: >300 Hz for >10 hours
- uptime: 100% over 4 weeks
- best efficiency: 96% (1400 CPUs) for ~12 h
T0 → T1 transfer:
- average rate: 250 MB/s
- peak rate: 650 MB/s

Slide 13: CSA06: job submission
- >50K jobs/day in the final week, of which 30K/day robot jobs
- production jobs managed by ProdAgent
- analysis jobs submitted via CRAB to the Grid
- 90% job efficiency
(Plots: a typical CSA06 day; CRAB submissions)

Slide 14: CSA06: calibration
ECAL calibration with φ-symmetry of energy deposits in minimum-bias events, using a few hours of data
Calibration workflow exercised with: minimum bias (φ-symmetry), single electrons, W → eν, Z mass reconstruction
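The φ-symmetry method rests on the fact that, in minimum-bias events, the mean energy deposited in a ring of crystals at fixed η should not depend on φ, so each crystal's intercalibration constant can be taken as the ratio of the ring mean to that crystal's summed energy. A toy version, with invented numbers rather than real ECAL data:

```python
# Toy phi-symmetry intercalibration for one ring of crystals.
# The summed-ET values are made up for illustration.
crystal_sums = [102.0, 98.0, 105.0, 95.0]  # summed ET per crystal in the ring

ring_mean = sum(crystal_sums) / len(crystal_sums)        # 100.0
constants = [ring_mean / s for s in crystal_sums]        # intercalibration

# After applying the constants, every crystal matches the ring mean.
calibrated = [c * s for c, s in zip(constants, crystal_sums)]
```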

Slide 15: CSA06: alignment
Closing the loop: analysis of re-reconstructed Z → μμ data at a T1/T2 site
- determine the new alignment: run the HIP algorithm on multiple CPUs over a dedicated alignment skim from T0 (1M events, ~4 h on 20 CPUs)
- write the new alignment into the offline DB at T0
- distribute the offline DB to T1/T2s for re-reconstruction
(Plots: TIB module positions; reconstructed Z mass)

Slide 16: CMSSW validation: tracking
Reproduce with the CMSSW framework (1.2M lines of simulation, reconstruction and analysis software) the detector performance reported in PTDR vol. 1
(Plots: muons (CMSSW); CMSSW pixel seeding)

Slide 17: CMSSW validation: electrons
- electron classification
- momentum at vertex
- electron/supercluster matching
Already improving on PTDR results in many areas (forward tracking, electron reconstruction, ...)

Slide 18: Site Availability Monitor
Measure site availability by testing:
- analysis submission
- production
- database caching
- data transfer
Uses the Site Availability Monitor (SAM) infrastructure, developed in collaboration with LCG and CERN/IT
- the goal is 90% availability for T1s and 80% for T2s
- tests run at each EGEE site every 2 hours; currently 5 CMS-specific tests, with more under development
- feedback to site administrators, targeting individual components
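A SAM-style availability number is essentially the fraction of test runs that passed over a window, compared against the 90% (T1) / 80% (T2) targets. The test names and results below are invented for illustration; the real SAM framework aggregates over time and weights tests differently.

```python
def availability(results):
    """Fraction of passed tests; results is a list of (test_name, passed)."""
    passed = sum(1 for _, ok in results if ok)
    return passed / len(results)

# Hypothetical results for one site over one monitoring window.
site_results = [
    ("analysis-submission", True),
    ("production", True),
    ("database-caching", False),
    ("data-transfer", True),
]

avail = availability(site_results)   # 0.75
meets_t2_target = avail >= 0.80      # False: below the 80% goal for T2s
```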

Slide 19: WMS acceptance tests on gLite 3.1
- 115,000 jobs submitted in 7 days on a single WMS instance: ~16,000 jobs/day, well exceeding the acceptance criteria
- ~0.3% of jobs with problems, well below the required threshold; recoverable by the user with the proper command
- the WMS dispatched jobs to computing elements with no noticeable delay

Slide 20: CMS LoadTest 2007
An infrastructure by CMS to help the Tiers exercise transfers:
- based on a new traffic load generator
- coordinated within the CMS Facilities/Infrastructure project
- exercises T0 → T1 (tape), T1 → T1, T1 → T2 ('regional') and T1 → T2 ('non-regional') transfers
Important achievements:
- routinely transferring; all Tiers report it is useful
- higher participation of the Tiers, less effort, improved stability
- automatic, streamlined operations
(Plot: ~2.5 PB transferred in 1.5 months, T0-T1 only, LoadTest cycles vs. CSA06)
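The "~2.5 PB in 1.5 months" figure implies a sustained rate that can be compared with the CSA06 T0 → T1 numbers earlier in the talk. This sketch assumes 30-day months and decimal units, which are assumptions made here rather than stated on the slide.

```python
# Average rate implied by ~2.5 PB in 1.5 months (assumed 30-day months).
volume_mb = 2.5e9            # 2.5 PB expressed in MB (decimal prefixes)
seconds = 1.5 * 30 * 86400   # 1.5 months in seconds

avg_rate_mb_s = volume_mb / seconds
print(round(avg_rate_mb_s))  # ~640 MB/s sustained, summed over all links
```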

Slide 21: Goals of Computing in 2007
Support global data taking during detector commissioning:
- commissioning of the end-to-end chain: P5 → T0 → T1s (tape)
- data transfers and access through the complete data management system
- 3-4 days every month, starting in May
Demonstrate physics analysis performance using final software with high statistics:
- major MC production of up to 200M events, started in March
- analysis starts in June, finishes by September
Ramp up the distributed computing at scale (CSA07):
- 50% challenge of the 2008 system scale
- adding new functionality: HLT farm (DAQ storage manager → T0), T1-T1 and non-regional T1-T2 transfers
- increase the user load for physics analysis

Slide 22: CSA07 workflow (diagram)

Slide 23: CSA07 success metrics (table)

Slide 24: CSA07 and Physics Analysis
- roughly 10-15 T2s have sufficient storage and CPU resources to support multiple datasets
- skims in CSA06 were about ~500 GB; the largest of the raw samples was ~8 TB
- improvements in site availability with SAM
- improve non-regional Tier-1 to Tier-2 transfers
- publish data-hosting proposals for Tier-1 and Tier-2 sites
User analysis:
- distributed analysis through CRAB at Tier-2 centers
- dynamic use of Tier-2 storage
- calibration workflow activities

Slide 25: Ingredients for analysis workflows
- Event Filters: pre-select the analysis output
- Event Producers: can create new content to be included in the analysis output
- EDM output configurability: any collection can be kept or dropped
- Flexibility in the event content and in the different steps of data reduction
- Between the input and output of an analysis job, filters and producers can be mixed in any combination
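The filter / producer / configurable-output pattern above can be sketched with events as plain dictionaries. Nothing here is real EDM or CMSSW code; the function names, the pt threshold and the placeholder mass computation are all invented for the illustration.

```python
def pt_filter(event):
    """Event Filter: pre-select events (hypothetical 20 GeV cut)."""
    return event["muon_pt"] > 20.0

def mass_producer(event):
    """Event Producer: add new content to the event (placeholder formula)."""
    event = dict(event)
    event["dimuon_mass"] = 2 * event["muon_pt"]  # stand-in, not real physics
    return event

def output_module(event, keep):
    """EDM-style output configurability: keep or drop collections by name."""
    return {k: v for k, v in event.items() if k in keep}

events = [{"muon_pt": 25.0, "raw": "..."},
          {"muon_pt": 5.0, "raw": "..."}]

# Filters and producers mixed in a simple chain, then a configurable output.
selected = [mass_producer(e) for e in events if pt_filter(e)]
output = [output_module(e, keep={"dimuon_mass"}) for e in selected]
# -> one event survives, carrying only the produced collection
```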

Slide 26: Analysis workflow at Tier-0/CAF
- HLT output: one in-time processed stream, or the HLT primary streams, producing RECO, AOD and (optionally) RAW
- express stream for early discovery
- physics data quality monitoring
- Standard Model 'candles': object ID efficiency, calibration with control samples
- dedicated stream(s) for fast calibration (initial fast calibration)
The actual output of the HLT farm is still to be detailed.

Slide 27: Conclusions
- Commissioning and integration remain major tasks in 2007; balancing the needs of physics, computing and the detector will be a logistics challenge
- The transition to operations has started; scaling to production level while keeping high efficiency is the critical point
- Continuous effort, to be monitored in detail
- Keep the analysis model as flexible as possible
- An increasing number of CMS people will be involved in facilities, commissioning and operations to prepare for CMS physics analysis

