CMS MC production tools
Dave Newbold, University of Bristol, 24/6/2003

1. CMS MC production tools

A lot of work in this area recently! Context: PCP03 (100TB+) just started.
- Short-term development team ~10 people; core deployment team ~10 people? (incl. UK).
- New generation of tools, based upon the existing distributed toolset: IMPALA, BOSS, RefDB.
  - Evolution draws from experience gained in DC02.
  - Not explicitly designed for use on the LCG testbed, but intended to operate on the Grid later (experience from the CMS EDG stress test, etc.).
- New umbrella project: OCTOPUS
  - Covers all CMS distributed production and Grid tools.
  - "Overtly Contrived Toolkit of Previously Unrelated Stuff"?
  - "Oh Crap: Time to Operate Production Uber-Software"?
  - Formal support system / bug tracking now in place (via Savannah).
- Our worldwide Octopus has more than eight arms…

2. The problems to solve

The nature of CMS production:
- Highly distributed (~30 sites).
- Some sites have MUCH more resource (kit, people) than others.
- We produce 'useful data', so DQM is very important.
- The application chain is somewhat complex:
  - Different event types require different processing chains.
  - High-lumi background simulation presents some special problems.

Some key issues:
- Communication (~fortnightly VRVS meetings, very useful).
- Documentation, support for installation and use of tools.
- Adaptability of the production system to local conditions (now easier).
- Real-time data and metadata validation.
- Data storage and migration between sites (data is NOT bunged off to CERN).
- 'Hotspots' in the distributed computing system (CERN + RAL, FNAL).

3. Core user-side toolset

- McRunjob: generic Python local production framework.
  - Originally a D0 tool; the D0 and CMS versions are almost merged.
  - 'Glues together' the various stages of a production chain in a consistent and generic way; handles job setup and input/output tracking.
  - CMS-specific classes are provided to configure our applications.
- ImpalaLite: CMS-specific modules in McRunjob.
  - Core functionality from IMPALA, handling job preparation.
  - Interfaces to the global CMS bookkeeping database (RefDB), data validation, job submission.
- BOSS: local job submission and tracking.
  - Provides a uniform interface to the various batch systems (PBS, LSF, BQS, MOP, etc.).
  - Based on a MySQL job tracking database.
  - BODE is a web-based front end for local job management.
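The 'glueing together' idea behind McRunjob can be sketched in a few lines of Python. This is an illustrative toy only, not the McRunjob API: the `Stage`/`Chain` classes, the file-extension transforms, and the stage names wired into the chain (CMKIN generation, CMSIM simulation, ORCA digitisation) are all assumptions made for the example.

```python
# Toy sketch of a McRunjob-style production chain: each stage consumes
# the previous stage's output files, and the framework keeps a simple
# input/output log per stage. Names are illustrative, not the real API.

class Stage:
    def __init__(self, name, transform):
        self.name = name
        self.transform = transform  # maps an input file name to an output name

    def run(self, inputs):
        return [self.transform(f) for f in inputs]

class Chain:
    def __init__(self, stages):
        self.stages = stages
        self.log = []  # (stage name, inputs, outputs) bookkeeping

    def run(self, inputs):
        for stage in self.stages:
            outputs = stage.run(inputs)
            self.log.append((stage.name, list(inputs), outputs))
            inputs = outputs
        return inputs

chain = Chain([
    Stage("CMKIN", lambda f: f.replace(".cfg", ".ntpl")),   # generation
    Stage("CMSIM", lambda f: f.replace(".ntpl", ".fz")),    # simulation
    Stage("ORCA",  lambda f: f.replace(".fz", ".db")),      # digitisation
])
print(chain.run(["run001.cfg"]))  # -> ['run001.db']
```

The point of the real framework is the same as this toy: the chain is declared once, in one place, and the per-stage input/output tracking comes for free.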

4. System-side toolset

- RefDB: central bookkeeping / metadata database.
  - Provides the (physicist) user interface for requesting data.
  - Web interface allows users to track their requests and drill down into the detailed metadata corresponding to produced data.
  - Used remotely by ImpalaLite at job preparation time to establish job input parameters, etc.
  - Based upon a MySQL database at CERN.
- DAR: packaging of applications.
  - Very simple way of automatically packaging CMS software components (CMKIN, CMSIM, OSCAR, ORCA) with required libraries, etc.
  - Minimal dependence upon site conditions.
  - Ensures uniformity of application versions, etc., across sites.
  - NB: only one current platform for production, Linux RH73.
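The kind of lookup ImpalaLite performs against RefDB at job-preparation time can be sketched as a parameter query. A minimal sketch, assuming an invented table layout: sqlite3 stands in for the real MySQL instance at CERN, and the `requests` table, its columns, and the sample values are all hypothetical.

```python
import sqlite3

# Hypothetical stand-in for RefDB: a requests table mapping a request ID
# to the parameters a job-preparation tool would need. sqlite3 is used
# only to keep the example self-contained; RefDB itself is MySQL at CERN.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE requests (
    request_id  INTEGER PRIMARY KEY,
    dataset     TEXT,
    nevents     INTEGER,
    app_version TEXT)""")
db.execute("INSERT INTO requests VALUES (42, 'bt03_ttbar', 100000, 'CMSIM133')")

def job_parameters(request_id):
    """Fetch the input parameters for one production request."""
    row = db.execute(
        "SELECT dataset, nevents, app_version FROM requests WHERE request_id=?",
        (request_id,)).fetchone()
    return {"dataset": row[0], "nevents": row[1], "version": row[2]}

print(job_parameters(42))
```

The design point this illustrates: because the bookkeeping lives in one central database, every production site prepares jobs from the same authoritative request record rather than from locally copied parameter files.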

5. RefDB web user interface

One drawback: you need a big laptop screen for the browser!

6. Data handling

- dCache: pileup background serving.
  - Highly challenging from the hardware point of view: e.g. we need to serve up to ~200 MByte/s to the RAL farm during the high-lumi digitisation step, and cheap disk servers don't cut it due to the 'random seek' access pattern.
  - Some large sites are planning to use dCache for the background library.
  - Each 'sub-farm' (workers on one network switch) has its own local disk pool; this should provide a scaleable solution without killing the network.
- SRB: wide-area data management.
  - Subject of some debate in CMS (versus Grid tools).
  - SRB is the short-term solution, since nothing else works at the 100TB scale (results from the CMS EDG stress test, UK/US work in '03).
  - Supported via UCSD / FNAL and the RAL e-science centre.
  - RAL will host the central MCAT server for PCP03 (thanks, RAL).
  - Generic interface to the RAL datastore is in the testing phase.
  - CMSUK is responsible for roll-out and support for PCP03.
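The scale of the pileup-serving problem can be made concrete with back-of-envelope arithmetic. The ~200 MByte/s aggregate demand is from the slide; the per-server figure for a cheap, seek-limited disk server is an assumed illustrative number, not a measurement.

```python
import math

# Rough sizing of a pileup disk pool. The farm needs ~200 MB/s aggregate
# during high-lumi digitisation (figure from the slide); a cheap disk
# server limited by random seeks is assumed here to sustain only ~10 MB/s.
farm_demand_mb_s = 200   # aggregate read rate needed by the farm
per_server_mb_s = 10     # assumed seek-limited rate per cheap server

servers_needed = math.ceil(farm_demand_mb_s / per_server_mb_s)
print(servers_needed)  # 20 servers just for background serving
```

This is why the sub-farm layout matters: splitting the pool across per-switch disk pools spreads both the seek load and the network traffic instead of funnelling everything through a few central servers.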

7. Grid integration

Current status:
- Toolset designed for distributed use… but not built on Grid middleware.
- A reflection of the current scalability of many Grid components?
- The EDG stress test taught us a lot about what is possible (now).

Plan: Grid tools to be introduced and tested during PCP03.
- The goal: Grid data handling, monitoring, job scheduling for DC04.
- Some first targets: BOSS + R-GMA for real-time monitoring; replica management to supplement / replace SRB.
- CMS-'owned' testbed ("LCG-0") in place at several sites.
  - Yes, yet another testbed.
  - Based upon the LCG pilot + VOMS + R-GMA + Ganglia.
  - Can test the "CMSprod" product, integrating the existing toolset with Grid middleware.
- NB: many crucial 'local' issues are unaddressed by the Grid model; discuss!

8. The worrying side effects of PCP

