Published by Elizabeth Houston. Modified over 8 years ago.
1
The CMS Computing System: getting ready for Data Analysis
Matthias Kasemann, CERN/DESY
2
27.3.2007 ISGC 2007: CMS Computing 2/26
CMS achievements 2006
Magnet & Cosmics Test (August 06)
Detector Lowering (January 07)
3
CMS achievements 2006: Physics TDRs
Feb 2006: Volume I of the P-TDR describes detector performance and software.
Jun 2006: Volume II describes the physics performance.
The two volumes constitute the culmination of our plans for data analysis in CMS with up to 30 fb⁻¹ of data.
– The special study of detector commissioning and data analysis during the startup of CMS has been deferred to 2007.
This activity mobilized hundreds of collaborators during the past two years, and many useful lessons have been learned.
4
CMS: Computing highlights 2006
Main computing/software milestones:
– Magnet Test Cosmic Challenge (Apr 06)
– Computing, Software and Analysis Challenge 06 (Nov 06)
2006: a year of fundamental software changes
– New simulation and reconstruction software packages released; very positive feedback from users
– Developed procedures for release integration, building and distribution: Control release tools, Hypernews, Nightly builds, Tag collector, WorkBook, …
– Design control of all interfaces and data formats in place: CMSSW framework, framework-light, ROOT available for data access
Integration with CMS detector and commissioning activities
– Strong connections with various detector groups: key for commissioning
– Validation software packages and validation procedure in place: crucial for startup preparation
5
Major Milestone in 2006: CSA06
Combined Computing, Software, and Analysis challenge (CSA06)
A “25% of 2008” data challenge of the CMS data handling model and computing operations
– Integrated test of the full end-to-end chain of the complete system, from (simulated) raw data to analysis at Tier-1 and Tier-2 centers.
– Launched on Oct 2, 2006, after many months of preparation and the development of about 0.5M lines of software in the new CMSSW framework.
– Six weeks later, all technical goals of the challenge had been achieved. The code ran with a negligible crash rate and without any memory problems on all samples.
By the end of CSA06, the Tier-0 centre had reconstructed > 200M events, and > 1 Petabyte of data had been shipped across the network between Tier-0, Tier-1, and Tier-2 centers.
– Excellent collaboration with the IT department was an important factor in the success of the challenge
– World-wide distributed system of regional Tier-1 and Tier-2 centers
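As a rough cross-check of the data volume quoted on this slide, the sketch below converts "more than 1 Petabyte in about 6 weeks" into an average aggregate rate. It assumes decimal units (1 PB = 10^15 bytes) and a full 42-day window; neither is stated on the slide, and the function name is purely illustrative.

```python
def average_rate_mb_per_s(total_bytes: float, days: float) -> float:
    """Average transfer rate in decimal MB/s over a period given in days."""
    seconds = days * 24 * 3600
    return total_bytes / seconds / 1e6


# Assumptions (not from the slide): 1 PB = 1e15 bytes, window = 6 weeks = 42 days.
rate = average_rate_mb_per_s(1e15, 42)
print(f"Average aggregate rate: {rate:.0f} MB/s")
```

The result is an average summed over all Tier-0, Tier-1 and Tier-2 links, so it is not directly comparable to the per-path T0 to T1 goal.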
6
CSA06: T0 Goals & Achievements
Prompt reconstruction goal: 40 Hz
– Achieved 50 Hz for 2 weeks, then 100 Hz
– Peak rate: > 300 Hz for > 10 hours
– 207M events total
Uptime goal: 80% of best 2 weeks
– Achieved 100% of 4 weeks
Use of Frontier for DB access to prompt reconstruction conditions
– The CSA challenge was the first opportunity to test this on a large scale with developed reconstruction software
– Initial difficulties were encountered during commissioning, but patches and reduced logging allowed full inclusion into CSA
CPU use
– Max CPU efficiency: 96% of 1400 CPUs over ~12 hours
Explored realistic T0 operations, upgrading and intervening on a running system
7
CSA06: T0 → T1 Transfers
Goal was to sustain 150 MB/s to T1s
– Twice the expected 40 Hz output rate
Last week’s averages hit 350 MB/s (daily) and 650 MB/s (hourly), i.e. exceeded 2008 levels for ~10 days (with some backlog observed)
[Figure: monthly T1 transfer plot, annotated with the start of signal samples, the T0 rate in Hz, "min bias only" at the start, and the target rate]
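The relation between the transfer goal and the reconstruction rate can be made explicit with a little arithmetic. The sketch below assumes one reading of the slide: that 150 MB/s is twice the bandwidth a 40 Hz output stream would need. Under that assumption it derives the implied output size per event and the ratio of the achieved averages to the goal. The function names are hypothetical.

```python
GOAL_MB_S = 150.0   # sustained T0 to T1 goal quoted on the slide
T0_RATE_HZ = 40.0   # expected prompt-reconstruction output rate
HEADROOM = 2.0      # assumption: the goal is twice the bandwidth needed at 40 Hz


def implied_event_size_mb(goal_mb_s: float, rate_hz: float, headroom: float) -> float:
    """Output size per event implied by the goal, the rate and the headroom factor."""
    return goal_mb_s / headroom / rate_hz


def ratio_to_goal(achieved_mb_s: float, goal_mb_s: float = GOAL_MB_S) -> float:
    """How many times the goal an achieved rate represents."""
    return achieved_mb_s / goal_mb_s


print(f"Implied event size: {implied_event_size_mb(GOAL_MB_S, T0_RATE_HZ, HEADROOM):.3f} MB")
print(f"Daily average vs goal:  {ratio_to_goal(350):.2f}x")
print(f"Hourly average vs goal: {ratio_to_goal(650):.2f}x")
```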
8
CSA06: Individual T0 - T1 Performance
6 of 7 Tier-1s exceeded 90% availability for 30 days
U.S. T1 (FNAL) hit 2x its goal
5 sites stored data to MSS (tape)
[Figure: goals vs. achievements]
9
CSA06: Job Execution on the Grid
> 50K jobs/day submitted on all but one day in the final week
– > 30K/day robot jobs
– 90% job completion efficiency
– Robot jobs have the same mechanics as user job submissions via CRAB
– Mostly T2 centers, as expected
OSG carries a large proportion
– Scaling issues were encountered, but subsequently solved
10
CSA06: Prompt Tracker Alignment
Determine new alignment:
– Run the “HIP” algorithm on multiple CPUs at CERN over a dedicated alignment skim from T0
– 1 million events: ~4 h on 20 CPUs
Write the new alignment into the offline DB at T0 (ORCOFF); distribute the offline DB to T1/T2s
Closing the loop: analysis of re-reconstructed Z → μ⁺μ⁻ data at a T1/T2 site
– Three scenarios: ideal/misaligned/realigned (grid jobs at T1-PIC)
Results 2 days after AlCaReco!
[Figure: TIB DS modules, positions]
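The alignment timing quoted on this slide (1 million events in about 4 hours on 20 CPUs) can be turned into a per-event CPU cost. The sketch below assumes all 20 CPUs were busy for the full 4 hours and that the job scales linearly with CPU count; both are simplifying assumptions, not statements from the slide.

```python
def cpu_seconds_per_event(n_events: int, wall_hours: float, n_cpus: int) -> float:
    """CPU-seconds per event, assuming every CPU is busy for the whole run."""
    return wall_hours * 3600 * n_cpus / n_events


def wall_hours_needed(n_events: int, n_cpus: int, cost_s: float) -> float:
    """Wall-clock hours for n_events at a given per-event cost, assuming linear scaling."""
    return n_events * cost_s / n_cpus / 3600


cost = cpu_seconds_per_event(1_000_000, 4, 20)
print(f"About {cost:.3f} CPU-seconds per event")
print(f"Same sample on 40 CPUs: about {wall_hours_needed(1_000_000, 40, cost):.1f} h")
```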
11
CSA06: Physics Analysis Demonstrations
These demonstrations proved to be useful training exercises for collaborators in the new software and computing tools.
Muon:
– Extraction of W
– Di-muon reconstruction efficiency: Z, J/ψ → μ⁺μ⁻
Northwestern and Purdue groups and T2 activity
Tau:
– Selection of Z → ττ → l + jet
– Tau mis-id study from Z + jet
– Tau tagging efficiency
[Figure legend: 1 GLB + 1 tracker track; 2 GLB tracks; 1 GLB + 1 STA track]
12
CSA06 Summary
All goals were met
– T0 prompt reconstruction of RECO, AOD and AlCaReco, with Frontier access, at 100% efficiency for 207M events
– Export to T1 at 150 MB/s and higher
– Data reduction (skim) production performed at T1s and transferred to T2s
– Re-reconstruction demonstrated at 6 T1 centers
– Job load exceeded 50K/day
– Alignment/Calibration/Physics analyses widely demonstrated
CSA06 was a huge enterprise
– Commissioned the CMS data-handling workflow at 25% scale
– Everything worked down to the final analysis plots
– Many lessons can be drawn for the future as we prepare for data-handling operations, and there are more things to commission: DAQ Storage Manager → T0, and support of global data-taking during detector commissioning
13
Some Lessons from CSA06
CMS needs some development work to ease the operations load
Strong engagement with OSG, WLCG and sites was extremely useful
– Grid service and site problems were addressed promptly
– FTS at CERN was carefully monitored, with a response when needed
– CASTOR support at CERN was excellent
– Support from CERN IT was instrumental and key to success
Data management needs an automatic way to ensure consistency across all components
Scale testing continues to be an extremely important activity
14
CMS Outlook and Perspectives for 2007
Lower the whole detector and commission it underground.
Prepare the final distributed computing and software system and physics analysis capability.
The initial* CMS detector will be ready for collisions at 900 GeV at the end of 2007.
The low-luminosity detector will be ready for collisions at design energy in mid-2008.
* The initial CMS detector is the low-luminosity detector minus ECAL endcaps and pixels. Install both during the 07/08 winter shutdown.
15
CMS computing goals in 2007
Demonstrate physics analysis performance using final software with high statistics.
– Major MC production of up to 200M events started last week
– Analysis starts in June, finishes by September
Regular data taking: Detector - HLT - Tape - T0 - T1
– At regular intervals, 3-4 days per month, starting in May
– Month of October: MTCC3
Readout of (successively more) components; data will be processed and distributed to T1
16
Computing Commissioning Plans 2007
February
– Deploy PhEDEx 2.5
– T0-T1, T1-T1, T1-T2 independent transfers
– Restart job robot
– Start work on SAM
– FTS full deployment
March
– SRM v2.2 tests start
– T0-T1(tape)-T2 coupled transfers (same data)
– Measure data serving at sites (esp. T1)
– Production/analysis share at sites verified
April
– Repeat transfer tests with SRM v2.2, FTS v2
– Scale up job load
– gLite WMS test completed (synch. with Atlas)
May
– Start ramping up to CSA07
July
– CSA07
[Timeline, March to November: start large MC production; global detector run; start global data-taking runs; event filter tests; start analysis; preCSA07; CSA07; LHC engineering run]
17
Motivations for CSA07
There are two important goals for 2007, the last year of preparations for physics and analysis:
1) Scaling
– We need to reach 100% of system scale and functionality by spring of 2008
– CSA06 demonstrated between 25% and 50%, depending on the metric
2) We need to transition to sustainable operations
– This spans all areas of computing: data management, job processing, user support, site configuration and consistency
– In the past, functionality was valued higher than the operations load
– As we prepare for long-term support, this emphasis needs to change
18
CSA07 Goals: Increase Scale
CMS demonstrated 25% performance in 2006. We have two more factors of 2 to ramp up before data taking in 2008.
The data transfer between Tier-0 and Tier-1 reached about 50% of scale
– Very successful test, but some signs of system stress were visible
Job submission rate reached 25%.
We plan another formal challenge in 2007: a > 50% challenge in the summer of 2007
– Extend the system to include the HLT farm
– Add elements like simulation production
– Increase user load
– Run concurrently with other experiments stressing the system
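The "two more factors of 2" remark follows directly from the fractions quoted on this slide. A minimal sketch of that arithmetic is below; the function name is hypothetical, and the only inputs are the 25% and 50% figures above.

```python
import math


def doublings_to_full_scale(fraction_demonstrated: float) -> int:
    """Number of factor-of-2 increases needed to go from a demonstrated fraction to 100%."""
    if not 0 < fraction_demonstrated <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return math.ceil(math.log2(1 / fraction_demonstrated))


# Fractions quoted on the slide: job submission at 25%, T0 to T1 transfers at about 50%.
for metric, fraction in [("job submission", 0.25), ("T0-T1 transfer", 0.50)]:
    print(f"{metric}: {doublings_to_full_scale(fraction)} doubling(s) to reach 100%")
```

A challenge at more than 50% scale in summer 2007 would then leave at most one further doubling before data taking in 2008.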
19
CMS Computing Model & Resources
CMS Tier-1 centers:
20
CSA07 Workflow
21
CSA07 success metrics
22
CSA07 Goals for Tier-1s
In the Computing Model the Tier-1 centers perform 4 functions:
– Archive data, both real data and simulation from Tier-2 centers
– Execute skimming and selection for users and groups on the data
– Re-reconstruct raw data
– Serve data samples to Tier-2 centers for further analysis
As we transition to operations we should bring the Tier-1 centers into alignment with their core functionality
23
CSA07: expectations of Tier-2s
MC production at Tier-2s was a significant contributor to the 25M events/month for CSA06
When the experiment is running, the Tier-2s are the only dedicated simulation resources, and the expectation is 100M events per month
– CMS now produces 30M events/month; the goal for CSA07 is 50M
Analysis submission
The Tier-2s are expected to support communities
– Either local groups or regions of interest
– Only implemented in a couple of specific communities
Unlike Tier-1 data subscriptions and processing expectations, which are largely specified centrally by the experiment, the Tier-2s have control over the data and the activity
CMS will work to improve the reliability and availability of the Tier-2 centers
24
Tier-2 Analysis goals in 2007
Tier-2s are the primary analysis resource controlled by physicists
The activities are intended to be controlled by user communities
Up to now most of the analysis has been hosted at the Tier-1 sites
CMS will enlarge analysis support by hosting important physics samples exclusively at Tier-2 centers
We have roughly 10-15 sites that have sufficient disk and CPU resources to support multiple datasets
– Skims in CSA06 were ~500 GB
– The largest of the raw samples was ~8 TB
Force the migration of analysis to Tier-2s by hosting data at Tier-2s
25
Transition to operations in 2007: Goals
We plan to measure the transition to operations with concrete metrics
Site availability: SAM tests (Site Availability Monitor)
Put CMS functions in the site functional testing
– Analysis submissions
– Production
– Frontier
– Data transfer
Measure the site availability
The WLCG goal for the Tier-1s in early 2007 is 90%
– We should establish a goal for Tier-2s; 80% seems reasonable
Goals for summer of 07 would be 95% and 90% respectively
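To illustrate how such availability goals could be checked, here is a minimal sketch. It assumes availability is simply the fraction of passed functional tests over a period, which is a simplification and not necessarily how SAM computes it. The data layout, function names and sample results are hypothetical; only the percentage goals come from the slide.

```python
# Availability goals quoted on the slide, per tier and period.
GOALS = {
    "early_2007": {"T1": 0.90, "T2": 0.80},
    "summer_2007": {"T1": 0.95, "T2": 0.90},
}


def availability(test_results: list[bool]) -> float:
    """Fraction of passed tests (assumed definition of availability)."""
    if not test_results:
        raise ValueError("no test results")
    return sum(test_results) / len(test_results)


def meets_goal(tier: str, test_results: list[bool], period: str) -> bool:
    """True if the measured availability reaches the goal for this tier and period."""
    return availability(test_results) >= GOALS[period][tier]


# Hypothetical site: 17 of 20 functional tests passed.
results = [True] * 17 + [False] * 3
print(f"availability = {availability(results):.0%}")
print("T2 early-2007 goal met:", meets_goal("T2", results, "early_2007"))
print("T1 early-2007 goal met:", meets_goal("T1", results, "early_2007"))
```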
26
Prepare CMS for Analysis: Summary
2006 was a very successful year for CMS software and computing
2007 promises to be a very busy year for Computing and Offline
Commissioning and integration remain major tasks in 2007
– Balancing the needs of physics, computing and detector will be a logistics challenge
Transition to Operations has started; data operations group formed
Facilities will be ramping up resources to be ready for the pilot run and the 2008 physics run
An increased number of CMS people will be involved in the facilities, commissioning and operations to prepare for CMS analysis