CMS Grid Batch Analysis Framework
Hugh Tallini, David Colling, Barry Macevoy, Stuart Wakefield (Imperial College London)
Contents
–Project objectives
–Requirements analysis
–Design outline
–Implementation
–Future plans
Objective
Build an analysis framework to enable CMS physicists to perform end-user analysis of DC04 data (i.e. run a private analysis algorithm over a set of reconstructed data) in batch mode (i.e. not interactive: submit once and leave to run) on the Grid. Deliver by October 03.
Approach
–Build a tactical solution using (as far as possible) tools that already exist.
–Iterative development: deliver a first prototype quickly covering the major use cases, iterating the design and implementation later to refine and extend the solution.
–Build a scalable architecture that will inevitably have to evolve as requirements change (or become better determined).
–Use good software design practices (i.e. distinct analysis, design and implementation phases).
DC04 Analysis Challenge
[Diagram: DC04 data flow across the Tier-0, data distribution, calibration and analysis challenges. A fake DAQ (CERN) feeds 1st-pass reconstruction at 25 Hz (raw 1 MB/evt, reco DST 0.5 MB/evt, 50 MB/s, 4 TB/day) into the CERN tape archive and a ~40 TB CERN disk pool (~10 days of data); 50M events (75 TB) from the PCP are archived. DST event streams (Higgs, SUSY background) and TAG/AOD replicas, together with conditions DB replicas, are distributed to Tier-1 and Tier-2 centres at ~1 TB/day over 2 months. Calibration samples feed calibration jobs against the master conditions DB; a Higgs background study requests new events from the event server.]
What's involved…
–Data handling
–Job splitting
–Job submission
–Job monitoring
–Job information archiving
Requirements Analysis (1)
Typical Use Case 1 – Private Single Step Analysis
–User wishes to run their analysis algorithm over 10 million MC events from a particular dataset.
–S/he compiles the code and tests it over a small sample of data on a local machine.
–Configures a JDL listing the data sample to run over, the ORCA libraries and ORCA executable to use, plus any steering files required.
–Submits to the framework.
–Uses the AF to monitor jobs as they run.
–Uses the AF to locate output data stored on the Grid.
–Uses the AF to keep a record of the analysis details, which are stored locally (privately).
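The single user configuration file in this use case could be a flat key=value spec. The sketch below shows one way such a file might be parsed before submission; the field names (dataset, orca_version, executable, steering_files) and the dataset name are hypothetical, not the framework's actual keys.

```python
# Hypothetical parser for a flat key=value user spec file.
# All field names and values below are illustrative only.

def parse_user_spec(text):
    """Parse a simple key=value spec into a dict; comma-separated values become lists."""
    spec = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Treat comma-separated values as lists (e.g. steering files).
        spec[key] = [v.strip() for v in value.split(",")] if "," in value else value
    return spec

example = """
# user analysis spec (illustrative)
dataset        = ttH_sample
orca_version   = ORCA_7_1_1
executable     = myAnalysis
steering_files = pedestals.dat, cuts.dat
"""

spec = parse_user_spec(example)
print(spec["executable"])      # myAnalysis
print(spec["steering_files"])  # ['pedestals.dat', 'cuts.dat']
```

Keeping the spec to a single flat file matches the later requirement that submission be a single step usable by physicists who are not software developers.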
Requirements Analysis (2)
Typical Use Case 2 – Private Multi Step Analysis
–As Use Case 1, but the user wishes the input data for the analysis task to be the output data from a previous analysis.
Typical Use Case 3 – Group Multi Step Analysis
–A group of physicists wish to share input and output data.
–They must also share details of how the output data is created.
Requirements Analysis (3)
Some other important requirements:
–User's analysis code should be identical whether running locally or on the Grid.
–No constraint on the size of the data sample to run over.
–Interface to the Analysis Framework must be simple to use (users are physicists, not software developers): single configuration file, single step submission.
Batch Object Submission System (BOSS)
–Accepts job submission from users
–Stores info about the job in a DB
–Builds a wrapper around the job (jobExecutor)
–Sends the wrapper to the local scheduler
–The wrapper sends info about the job to the DB
Commands: boss submit, boss query, boss kill
[Diagram: BOSS submits the wrapper to the local scheduler, which runs it on a farm node; the wrapper reports back to the BOSS DB.]
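The wrapper idea can be sketched as follows: the job is wrapped in a script that records start time, stop time and exit status, which in the real system would be pushed back to the BOSS database. This is a simplified illustration of the concept, not BOSS's actual jobExecutor; here the values are only echoed.

```python
def make_wrapper(job_id, executable, args=""):
    """Return the text of a BOSS-style wrapper script (illustrative sketch).

    In the real system the logged values would be sent to the BOSS DB;
    here they are simply echoed to stdout.
    """
    return f"""#!/bin/sh
# Wrapper for job {job_id} (sketch, not the real jobExecutor)
echo "JOB_ID={job_id}"
echo "START=$(date -u)"
{executable} {args}
STATUS=$?
echo "STOP=$(date -u)"
echo "EXIT_STATUS=$STATUS"
exit $STATUS
"""

script = make_wrapper(42, "./myAnalysis", "--run 1234")
print(script)
```

The wrapper is what the local scheduler actually executes on the farm node, so the user's executable never needs to know it is being monitored.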
System Design
[Schematic architecture: the user works through a UI feeding the job submission module; the submission and monitoring modules share a common BOSS/AF database; a data interface queries the physics meta-catalog; jobs reach Grid worker nodes (WN) via the resource broker (RB), with BOSS underpinning submission and monitoring.]
Implementation
3 Development Areas:
–Job preparation/submission module
–Data handling interface
–Monitoring module
Job preparation and submission
–Split the analysis task into multiple jobs.
–Prepare each job for submission to the Grid: create the JDL, create the job wrapper script.
–Archive details of each task and job (input/output files, software versions, etc.) for future reference by the user and to enable resubmission.
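Splitting one task into per-run jobs and generating a JDL for each could look like the sketch below. The attribute names follow the common EDG JDL style (Executable, Arguments, InputSandbox), but the exact JDL the framework emits is not specified on the slide, so treat this as an illustration only.

```python
def split_task(runs, executable, sandbox_files):
    """Create one job description (with hypothetical JDL text) per run."""
    jobs = []
    for run in runs:
        jdl = "\n".join([
            f'Executable = "{executable}";',
            f'Arguments = "--run {run}";',
            "InputSandbox = {%s};" % ", ".join(f'"{f}"' for f in sandbox_files),
            f'StdOutput = "job_{run}.out";',
            f'StdError = "job_{run}.err";',
        ])
        jobs.append({"run": run, "jdl": jdl})
    return jobs

jobs = split_task([2001, 2002, 2003], "wrapper.sh", ["wrapper.sh", "cuts.dat"])
print(len(jobs))  # 3 -- one job per run
print(jobs[0]["jdl"])
```

One job per run keeps each job's input data co-located, which is the splitting rule described on the data handling slide.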
Prototype
Tested on CMS-LCG-0:
–Simple shell scripts written to emulate the framework.
–ORCA installed on CMS-LCG-0 WNs at Imperial.
–Simple ORCA analysis code written and compiled on a local machine.
–Successfully submitted and ran analysis jobs on the Grid.
–Important for development of the job wrapper script.
Job Preparation and Submission Object Model
[UML class diagram:]
TASK: +createTask(UserSpecFile), +createJobs(..), +submitJobs(..), +killJobs(..)
TASKSPEC: -UserExecutable: string, -InputFiles: vector, -OutputFiles: vector, -DataQuery: string, -OrcaVersion: string, -JDLRequirements: string; +set(UserSpecFile), +getXYZ()
JOB: -DataSelection: int, -UniqID: string, -Executable: string, -OutGUIDs: vector, -LocalInFiles: vector, -OutSandbox: vector, -InGUIDs: vector; +AddOutGUID(File), +AddOutSandbox(File), +AddInSandbox(File), +Submit(), +Kill(), +getXYZ()
WRAPPER: +write(filename)
JDL: +write(filename)
PHYS_CAT: +getRuns(DataQuery): vector, +getGUIDS(run): vector
BOSS: +submitJob(..), +getJobStatus(..)
Associations: a TASK is described by a TASKSPEC and composed of JOBs; it queries PHYS_CAT; each JOB creates a WRAPPER and a JDL and uses BOSS.
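The core TASK/TASKSPEC/JOB relationships of this object model can be sketched in code: a task is described by a spec and composed of one job per run. This is a minimal, hypothetical rendering; the real classes and their signatures may differ.

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    """Mirrors TASKSPEC: the user's description of the analysis task."""
    user_executable: str
    data_query: str
    orca_version: str
    input_files: list = field(default_factory=list)
    output_files: list = field(default_factory=list)
    jdl_requirements: str = ""

@dataclass
class Job:
    """Mirrors JOB: one unit of work, here one run."""
    data_selection: int  # the run number this job analyses
    executable: str
    in_guids: list = field(default_factory=list)
    out_guids: list = field(default_factory=list)

class Task:
    """Mirrors TASK: composed of Jobs, described by a TaskSpec."""
    def __init__(self, spec):
        self.spec = spec
        self.jobs = []

    def create_jobs(self, runs):
        # One Job per run, so all of a job's input data is co-located.
        self.jobs = [Job(run, self.spec.user_executable) for run in runs]
        return self.jobs

spec = TaskSpec("myAnalysis", "dataset=ttH", "ORCA_7_1_1")
task = Task(spec)
task.create_jobs([2001, 2002])
print(len(task.jobs))  # 2
```

In the full model each Job would then write its WRAPPER and JDL files and hand them to BOSS for submission.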
Data Handling
Input Data
–The physics catalogue (RefDB) contains the metadata (which the user makes a selection on) and the data file GUIDs.
–An interface to RefDB exists (currently PHP).
–Jobs will be split to analyse one run at a time (ensures all data for a job is co-located).
–The RB will send the job to where the data for the run is.
Output Data
Output file GUIDs and analysis details are stored in:
–Private analysis use case: a local MySQL database
–Group analysis use case: a centralised database (RefDB)
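Archiving output GUIDs together with how they were produced amounts to inserting rows into a small catalogue table. The sketch below uses SQLite as a stand-in for the local MySQL database of the private use case, with a purely illustrative schema (task id, output GUID, ORCA version).

```python
import sqlite3

def open_catalog(path=":memory:"):
    """Open a private output catalogue (SQLite stands in for MySQL here)."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS outputs (
        task_id TEXT, out_guid TEXT, orca_version TEXT)""")
    return db

def record_output(db, task_id, guid, orca_version):
    """Archive an output file GUID with details of how it was produced."""
    db.execute("INSERT INTO outputs VALUES (?, ?, ?)",
               (task_id, guid, orca_version))
    db.commit()

db = open_catalog()
record_output(db, "task-001", "guid-abc-123", "ORCA_7_1_1")
rows = db.execute("SELECT out_guid FROM outputs WHERE task_id = ?",
                  ("task-001",)).fetchall()
print(rows)  # [('guid-abc-123',)]
```

For the group use case the same records would instead go to the shared, centralised RefDB, so the whole group can see how each output was created.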
Monitoring
Using BOSS on the Grid
–Can be used now via native MySQL calls – tested and used for CMS productions on EDG and CMS/LCG-0.
–A more scalable transport mechanism is being investigated using R-GMA (IC/Brunel team – see Peter Hobson's talk).
Commands: boss submit, boss query, boss kill, boss registerScheduler
[Diagram: boss commands go from the local BOSS gateway through the Grid scheduler and gatekeeper to the farm node; job information flows back to the BOSS DB.]
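With native MySQL calls, a `boss query`-style status lookup is just a SELECT over the job table in the BOSS DB. The sketch below illustrates this with SQLite standing in for MySQL and hypothetical column names; it is not BOSS's actual schema.

```python
import sqlite3

# Stand-in for the BOSS DB (hypothetical job table).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE job (id INTEGER, status TEXT, host TEXT)")
db.executemany("INSERT INTO job VALUES (?, ?, ?)", [
    (1, "RUNNING", "wn01.example.org"),
    (2, "DONE",    "wn02.example.org"),
])

def boss_query(db, job_id):
    """Mimic 'boss query <id>': fetch the stored status for one job."""
    row = db.execute("SELECT status, host FROM job WHERE id = ?",
                     (job_id,)).fetchone()
    return {"status": row[0], "host": row[1]} if row else None

print(boss_query(db, 1))  # {'status': 'RUNNING', 'host': 'wn01.example.org'}
```

The R-GMA investigation addresses the transport of these updates from the worker node back to the DB, not the query side shown here.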
Implementation Roadmap
–Requirements analysis
–Gain experience running ORCA locally
–Run ORCA on CMS-LCG-0 using simple prototype
–Design
–Implementation: build job submission module; build monitoring module; build data catalogue interface
–Commence testing on LCG-1
–Iterate design/implementation as users give feedback and new requirements emerge.
[Timeline markers on the slide: 1-Oct, Completed]