Objective
- Build an analysis framework that enables CMS physicists to perform end-user analysis of DC04 data in batch mode on the Grid, i.e. run a private analysis algorithm over a set of reconstructed data.
- Batch mode means not interactive: submit once and leave to run.
- Deliver by October 2003.
Approach
- Build a tactical solution using, as far as possible, tools that already exist.
- Develop iteratively: deliver a first prototype quickly that covers the major use cases, then iterate on design and implementation to refine and extend the solution.
- Build a scalable architecture, since it will inevitably have to evolve as requirements change (or become better defined).
- Use good software design practice (i.e. distinct analysis, design and implementation phases).
What's involved
- Data handling
- Job splitting
- Job submission
- Job monitoring
- Job information archiving
Requirements Analysis (1)
Typical Use Case 1: Private Single-Step Analysis
- The user wishes to run their analysis algorithm over 10 million MC events from a particular dataset.
- S/he compiles the code and tests it over a small sample of data on a local machine.
- Configures a JDL listing the data sample to run over, the ORCA libraries and ORCA executable to use, plus any steering files required.
- Submits to the Analysis Framework (AF).
- Uses the AF to monitor jobs as they run.
- Uses the AF to locate output data stored on the Grid.
- Uses the AF to keep a record of the analysis details, which are stored locally (privately).
Requirements Analysis (2)
Typical Use Case 2: Private Multi-Step Analysis
- As Use Case 1, but the user wishes the input data for the analysis task to be the output data from a previous analysis.
Typical Use Case 3: Group Multi-Step Analysis
- A group of physicists wish to share input and output data.
- They must also share details of how the output data was created.
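Use Cases 2 and 3 both need a record, for every task, of how its output was made and which earlier task produced its input, so that a chain of analysis steps can be traced back. The sketch below is a minimal illustration under assumptions: the `AnalysisTask` fields and the `chain` and `lineage` helpers are hypothetical, not part of the framework's design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AnalysisTask:
    """Hypothetical record of one analysis step and how its output was made."""
    name: str
    orca_version: str
    executable: str
    input_files: List[str]
    output_files: List[str] = field(default_factory=list)
    parent: Optional[str] = None  # name of the task whose output this task consumed


def chain(previous: AnalysisTask, name: str, orca_version: str,
          executable: str) -> AnalysisTask:
    """Create a follow-on task whose input is the previous task's output."""
    if not previous.output_files:
        raise ValueError(f"task {previous.name!r} has no recorded output")
    return AnalysisTask(name=name, orca_version=orca_version,
                        executable=executable,
                        input_files=list(previous.output_files),
                        parent=previous.name)


def lineage(tasks: Dict[str, AnalysisTask], name: str) -> List[str]:
    """Follow parent links back to the first step; return names oldest first."""
    steps: List[str] = []
    current: Optional[str] = name
    while current is not None:
        steps.append(current)
        current = tasks[current].parent
    return list(reversed(steps))
```

For the group use case, the same records would live in a shared store rather than a private one, so every member can see how a shared output file was produced.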
Requirements Analysis (3)
Some other important requirements
- The user's analysis code should be identical whether running locally or on the Grid.
- No constraint on the size of the data sample to run over.
- The interface to the Analysis Framework must be simple to use (users are physicists, not software developers):
  - a single configuration file
  - single-step submission
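As an illustration of the "single configuration file" requirement, the sketch below parses one INI-style file with Python's standard `configparser` into everything a submission needs. The section names, keys and values (`[task]`, `[orca]`, `total_events`, the dataset and library names) are hypothetical placeholders, not the framework's actual format.

```python
import configparser
from dataclasses import dataclass
from typing import List

# Hypothetical single configuration file; all names and values are illustrative.
EXAMPLE_CONFIG = """
[task]
name = my_analysis
dataset = some_dataset_name
total_events = 10000000

[orca]
version = ORCA_x_y_z
executable = myAnalysis
libraries = libMyAnalysis.so
steering_files = my_steering_file
"""


@dataclass
class TaskConfig:
    name: str
    dataset: str
    total_events: int
    orca_version: str
    executable: str
    libraries: List[str]
    steering_files: List[str]


def _split(value: str) -> List[str]:
    """Split a comma- or whitespace-separated list, dropping empty entries."""
    return [item for item in value.replace(",", " ").split() if item]


def load_task_config(text: str) -> TaskConfig:
    """Parse the single configuration file; a missing section raises KeyError."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    task, orca = parser["task"], parser["orca"]
    return TaskConfig(
        name=task["name"],
        dataset=task["dataset"],
        total_events=task.getint("total_events"),
        orca_version=orca["version"],
        executable=orca["executable"],
        libraries=_split(orca.get("libraries", "")),
        steering_files=_split(orca.get("steering_files", "")),
    )
```

Keeping everything in one file means single-step submission can be a single command that takes only this file as its argument.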
Batch Object Submission System (BOSS)
[Diagram labels: boss submit, boss query, boss kill; BOSS; Wrapper; Local Scheduler; farm nodes; BOSS DB]
- Accepts job submissions from users
- Stores information about each job in a DB
- Builds a wrapper around the job (jobExecutor)
- Sends the wrapper to the local scheduler
- The wrapper sends information about the job to the DB
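The BOSS flow (record the job at submission, then let a wrapper report back while the job runs) can be pictured with the small sketch below. It assumes `sqlite3` as a stand-in for the BOSS MySQL database and a simplified, hypothetical `jobs` table, and it runs the command directly instead of handing it to a local scheduler; it is not BOSS's real jobExecutor.

```python
import sqlite3
import subprocess
import time
from typing import List

# sqlite3 stands in for the BOSS MySQL DB; this schema is a hypothetical simplification.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    command TEXT NOT NULL,
    status TEXT NOT NULL,
    start_time REAL,
    end_time REAL,
    exit_code INTEGER
)
"""


def submit(db: sqlite3.Connection, command: List[str]) -> int:
    """Store information about the job in the DB before it runs."""
    cur = db.execute(
        "INSERT INTO jobs (command, status) VALUES (?, 'submitted')",
        (" ".join(command),))
    db.commit()
    return cur.lastrowid


def run_wrapped(db: sqlite3.Connection, job_id: int, command: List[str]) -> int:
    """Act as the wrapper: run the job and send its status to the DB."""
    db.execute("UPDATE jobs SET status = 'running', start_time = ? WHERE id = ?",
               (time.time(), job_id))
    db.commit()
    result = subprocess.run(command)  # a real setup would go via the local scheduler
    status = "done" if result.returncode == 0 else "failed"
    db.execute(
        "UPDATE jobs SET status = ?, end_time = ?, exit_code = ? WHERE id = ?",
        (status, time.time(), result.returncode, job_id))
    db.commit()
    return result.returncode
```

Because the user's own executable is only wrapped, never modified, the same analysis code can run unchanged whether or not it is submitted through BOSS.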
Implementation
Three development areas:
- Job preparation/submission module
- Data handling interface
- Monitoring module
Job preparation and submission
- Split the analysis task into multiple jobs
- Prepare each job for submission to the Grid:
  - create the JDL
  - create the job wrapper script
- Archive details of each task and job (input/output files, software versions, etc.) for future reference by the user and to enable resubmission.
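The first two steps (split the task into one job per run, then write a JDL for each job) might look like the sketch below. The helper names, the wrapper and sandbox file names, and the identifiers passed as `InputData` are hypothetical. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`, `InputData`) are standard EDG/LCG JDL attributes, but the full set a Resource Broker needs for data-driven matching should be checked against the JDL documentation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_by_run(files: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Group (run number, file identifier) pairs so that each job covers one run."""
    jobs: Dict[int, List[str]] = defaultdict(list)
    for run, file_id in files:
        jobs[run].append(file_id)
    # Return a plain dict ordered by run number.
    return dict(sorted(jobs.items()))


def jdl_list(items: List[str]) -> str:
    """Render a Python list as a JDL list of quoted strings."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def make_jdl(run: int, input_data: List[str], wrapper: str,
             sandbox: List[str]) -> str:
    """Build a minimal JDL for the job that analyses a single run."""
    stdout, stderr = f"run{run}.out", f"run{run}.err"
    lines = [
        f'Executable = "{wrapper}";',
        f'Arguments = "{run}";',  # hypothetical: the wrapper takes the run number
        f'StdOutput = "{stdout}";',
        f'StdError = "{stderr}";',
        f"InputSandbox = {jdl_list([wrapper] + sandbox)};",
        f"OutputSandbox = {jdl_list([stdout, stderr])};",
        f"InputData = {jdl_list(input_data)};",  # intended to let the RB match job to data
    ]
    return "\n".join(lines) + "\n"
```

Writing each generated JDL to disk alongside the task record would also cover the archiving step, since resubmitting a job then only needs the stored JDL.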
Prototype tested on CMS-LCG-0
- Simple shell scripts written to emulate the framework
- ORCA installed on the CMS-LCG-0 worker nodes (WNs) at Imperial
- Simple ORCA analysis code written and compiled on a local machine
- Analysis jobs successfully submitted to and run on the Grid
- Important for the development of the job wrapper script
Data Handling
Input Data
- The physics catalogue (RefDB) contains metadata (on which the user makes a selection) and data file GUIDs.
- An interface to RefDB exists (currently PHP).
- The job will be split to analyse one run at a time (this ensures all data for a job is co-located).
- The Resource Broker (RB) will send each job to where the data for its run is.
Output Data
- Output file GUIDs and analysis details are stored:
  - private analysis use case: local MySQL database
  - group analysis use case: centralised database (RefDB)
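For the private use case, output bookkeeping could be as simple as one table keyed on output GUID. This sketch uses `sqlite3` in place of the local MySQL database; the `outputs` table, its columns and the helper names are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

# sqlite3 stands in for the local MySQL DB; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS outputs (
    task TEXT NOT NULL,
    run INTEGER NOT NULL,
    guid TEXT NOT NULL UNIQUE,
    orca_version TEXT NOT NULL
)
"""


def record_outputs(db: sqlite3.Connection, task: str, run: int,
                   guids: Iterable[str], orca_version: str) -> None:
    """Store the output file GUIDs of one per-run job with its analysis details."""
    db.executemany("INSERT INTO outputs VALUES (?, ?, ?, ?)",
                   [(task, run, guid, orca_version) for guid in guids])
    db.commit()


def outputs_for_task(db: sqlite3.Connection, task: str) -> List[Tuple[int, str]]:
    """Locate all outputs of a task, ordered by run then GUID."""
    return db.execute(
        "SELECT run, guid FROM outputs WHERE task = ? ORDER BY run, guid",
        (task,)).fetchall()
```

The `UNIQUE` constraint on `guid` makes an accidental double registration of the same output fail loudly instead of silently duplicating it.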
Monitoring
Using BOSS on the Grid
[Diagram labels: local BOSS; gateway; Grid scheduler; gatekeepers; farm nodes; BOSS DB; commands boss submit, boss query, boss kill, boss registerScheduler]
- Can be used now with native MySQL calls; tested and used for CMS productions on EDG and CMS/LCG-0.
- A more scalable transport mechanism using R-GMA is being investigated (IC/Brunel team; see Peter Hobson's talk).
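From the user's side, a `boss query` can be pictured as reading back whatever the wrappers on the farm nodes have written. A minimal sketch, with `sqlite3` standing in for the BOSS MySQL DB and a hypothetical `jobs` table: `report` plays the role of a wrapper sending its latest state, and `status_summary` the role of the query.

```python
import sqlite3
from typing import Dict

# Hypothetical, simplified job table; sqlite3 stands in for the BOSS MySQL DB.
SCHEMA = ("CREATE TABLE IF NOT EXISTS jobs "
          "(id INTEGER PRIMARY KEY, node TEXT, status TEXT NOT NULL)")


def report(db: sqlite3.Connection, job_id: int, node: str, status: str) -> None:
    """Wrapper side: overwrite the job's row with its latest node and status."""
    db.execute("INSERT OR REPLACE INTO jobs (id, node, status) VALUES (?, ?, ?)",
               (job_id, node, status))
    db.commit()


def status_summary(db: sqlite3.Connection) -> Dict[str, int]:
    """User side: count jobs in each status."""
    rows = db.execute("SELECT status, COUNT(*) FROM jobs GROUP BY status")
    return {status: count for status, count in rows}
```

Every farm node writing directly to one database is what a more scalable transport such as R-GMA would aim to replace; the query side would stay the same.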
Implementation Roadmap
- Requirements analysis
- Gain experience running ORCA locally
- Run ORCA on CMS-LCG-0 using a simple prototype
- Design
- Implementation:
  - build job submission module
  - build monitoring module
  - build data catalogue interface
- Commence testing on LCG-1
- Iterate design/implementation as users give feedback and new requirements emerge
[Status column: Completed; 1-Oct]