
RefDB: The Reference Database for CMS Monte Carlo Production. Véronique Lefébure, CERN & HIP. CHEP 2003, San Diego, California, 25th of March 2003.



2 Functionalities of RefDB
1. Management of Physics Production Requests
2. Distribution, Coordination and Progress Tracking of Production around the World: Production Assignments
3. Definition of Production Instructions for the workflow planner
4. Catalogue Publication of Real and Virtual Data
– MySQL database hosted at CERN
– Web server, .htaccess and PHP scripts
Véronique Lefébure - CHEP2003

3 General Data Flow
Web interface: http://cmsdoc.cern.ch/…./*.php
Physicists (many) enter Requests into RefDB; the Production Coordinator (one) turns them into Assignments; Production Operators (many) run the assignments with a Workflow Planner*; the RUNs on the CPUs send a Summary by e-mail to a mail box.
* IMPALA, McRunjob, CMSProd

4 Statistics
RefDB was designed and implemented in November and December 2001, and has been used intensively by CMS since January 2002:
– DAQ TDR Spring 2002 Production
– 2003 Production for the preparation of the 2004 Data Challenge
~20 requestors
>20 Regional Centres, >40 production sites, 70 production operators
>2000 requests and assignments
>300 parameter files, >1300 parameter values
~24 MB of MySQL data

5 Physics Production Request
Definition of an atomic production request ("Derivation"):
1. Executable ("Transformation")
2. Input Physics Parameters
3. Input Data and Number of Events
4. Input Production Parameters
Items 1 to 3 are defined by the Physicist; item 4 is defined by the Production Coordinator.
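The four parts of an atomic request can be pictured as one record that the physicist fills in first and the production coordinator completes later. The Python sketch below is only an illustration under that assumption: the class and field names (`ProductionRequest`, `production_params`, `ready_for_assignment`) are hypothetical, and RefDB itself kept this information in MySQL tables behind PHP web forms.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Executable:
    """The "Transformation": which binary to run."""
    software: str   # e.g. "ORCA"
    version: str    # e.g. "ORCA_7_1_1"
    name: str       # e.g. "writeAllDigis"


@dataclass
class ProductionRequest:
    """One atomic request ("Derivation"); field names are hypothetical."""
    # Items 1-3: chosen by the physicist
    executable: Executable
    parameter_files: list[str]            # names of physics parameter files
    input_collection: Optional[str]       # None if a new dataset is generated
    nb_events: int
    # Item 4: filled in later by the production coordinator
    production_params: dict[str, int] = field(default_factory=dict)

    def ready_for_assignment(self) -> bool:
        """True once the coordinator has supplied production parameters."""
        return self.nb_events > 0 and bool(self.production_params)
```

A freshly submitted request such as `ProductionRequest(Executable("ORCA", "ORCA_7_1_1", "writeAllDigis"), ["detector.par"], None, 1000)` is not yet ready; it becomes ready once a production parameter (here a hypothetical `events_per_run`) is added.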

6 Physics Production Request: 1. Executable
Selected according to:
– Software Name
– Software Version
– Executable Name (e.g. "ORCA ORCA_7_1_1 writeAllDigis")
Binaries distributed with the DAR* tool
Based on tagged code (CVS, SCRAM), but private code may be supported (system for loading and archiving code)
I/O file-type constraints
Monitoring schema and algorithm (can be used by BOSS**)
* DAR: "Distribution After Release" (http://computing.fnal.gov/cms/natasha/DAR)
** BOSS: "Batch Object Submission System" (http://www.bo.infn.it/cms/computing/BOSS)
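One way to read the I/O file-type constraint is that, in a chain of executables, each step must read the file type written by the step before it. The sketch below checks exactly that; the step keys and file-type names ("ntuple", "hits", "digis") are hypothetical examples, not RefDB's actual FileType entries.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExecutableIO:
    key: str                   # "software version executable" (hypothetical key)
    input_type: Optional[str]  # None for a generator that reads no event file
    output_type: str


def chain_errors(chain: Sequence[ExecutableIO]) -> list[str]:
    """Check that each step reads the file type written by the step before."""
    errors = []
    for prev, nxt in zip(chain, chain[1:]):
        if nxt.input_type != prev.output_type:
            errors.append(
                f"{nxt.key} reads {nxt.input_type!r} "
                f"but {prev.key} writes {prev.output_type!r}"
            )
    return errors
```

Returning a list of messages rather than raising at the first mismatch lets a web form report every incompatible step of a chain request at once.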

7 Tables: Software & Executable
– Software: Name, Version, Dates
– SoftwareType: Name
– SoftwareMap
– DARFile: Name, Dates, Status
– DarFileElement
– Executable: Name, Package
– ExecutableUse (links an Executable to its input and output FileTypes)
– FileType: Name
– MonitoringDefinition: Schema, Algorithms
– ProductionStep: Name, Shortname
Other diagram labels: Distribution, Web forms
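To make the table diagram concrete, here is a reduced relational sketch of the Software, Executable, FileType and ExecutableUse tables, using Python's built-in `sqlite3` as a stand-in for the MySQL database RefDB actually used. Column names beyond those on the slide (`id`, `SoftwareID`, `ReleaseDate`, `Direction`) are assumptions, and the DAR, monitoring and ProductionStep tables are omitted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE SoftwareType (id INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE Software (
    id INTEGER PRIMARY KEY,
    SoftwareTypeID INTEGER REFERENCES SoftwareType(id),
    Name TEXT NOT NULL,
    Version TEXT NOT NULL,
    ReleaseDate TEXT,
    UNIQUE (Name, Version)
);
CREATE TABLE Executable (
    id INTEGER PRIMARY KEY,
    SoftwareID INTEGER REFERENCES Software(id),
    Name TEXT NOT NULL,
    Package TEXT
);
CREATE TABLE FileType (id INTEGER PRIMARY KEY, Name TEXT UNIQUE);
-- which file types an executable reads ('in') or writes ('out')
CREATE TABLE ExecutableUse (
    ExecutableID INTEGER REFERENCES Executable(id),
    FileTypeID INTEGER REFERENCES FileType(id),
    Direction TEXT CHECK (Direction IN ('in', 'out'))
);
"""


def find_executable(db, software, version, name):
    """Look up an executable by (software name, version, executable name)."""
    row = db.execute(
        """SELECT e.id FROM Executable e JOIN Software s ON e.SoftwareID = s.id
           WHERE s.Name = ? AND s.Version = ? AND e.Name = ?""",
        (software, version, name)).fetchone()
    return row[0] if row else None


def file_types(db, executable_id, direction):
    """File types an executable reads ('in') or writes ('out')."""
    rows = db.execute(
        """SELECT f.Name FROM ExecutableUse u JOIN FileType f ON u.FileTypeID = f.id
           WHERE u.ExecutableID = ? AND u.Direction = ? ORDER BY f.Name""",
        (executable_id, direction)).fetchall()
    return [r[0] for r in rows]


def make_demo_db():
    """In-memory database with one hypothetical ORCA executable."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO SoftwareType (Name) VALUES ('reconstruction')")
    db.execute("INSERT INTO Software (SoftwareTypeID, Name, Version) "
               "VALUES (1, 'ORCA', 'ORCA_7_1_1')")
    db.execute("INSERT INTO Executable (SoftwareID, Name) VALUES (1, 'writeAllDigis')")
    db.execute("INSERT INTO FileType (Name) VALUES ('hits'), ('digis')")
    db.execute("INSERT INTO ExecutableUse VALUES (1, 1, 'in'), (1, 2, 'out')")
    return db
```

The `UNIQUE (Name, Version)` constraint reflects the slide's selection of an executable by software name, version and executable name.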

8 Tables: Monitoring
– MonitoringBlock: Regular Expression, Piece of code
– MonitoringDefinition
– MonitoringProcess
– MonitoringProcessType (pre, run, post)
– MonitoringSchema
– ProductionStep
– MonitoringObject: Name, Type, Description
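A MonitoringBlock pairs a MonitoringObject (name, type) with a regular expression applied to the job log. The sketch below shows that idea for the "run" phase only; the object names and log patterns are hypothetical, and the "piece of code" alternative on the slide is not modelled.

```python
import re

# Hypothetical MonitoringBlocks for the "run" phase: object name -> (type, regex)
RUN_BLOCKS = {
    "events_processed": (int, re.compile(r"Processed\s+(\d+)\s+events")),
    "output_file": (str, re.compile(r"Writing output to\s+(\S+)")),
}


def apply_blocks(log_text, blocks):
    """Extract one value per MonitoringObject; the last match in the log wins."""
    values = {}
    for name, (convert, pattern) in blocks.items():
        matches = pattern.findall(log_text)  # one capture group -> list of strings
        if matches:
            values[name] = convert(matches[-1])
    return values
```

Keeping the last match suits counters that are printed repeatedly as a job progresses; objects with no match are simply absent from the summary.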

9 Physics Production Request: 2. Physics Parameters
The input parameter file is made of one or more file fragments:
– Modularity: detector parameters, beam-luminosity parameters, …
Parameter file fragment: list of (Name, Value) pairs, one for each parameter
– Specialised scripts for file formatting
– Uniqueness checked
Single parameter and its value:
– selected by the Physicist
– or a new parameter and/or a new value entered by him/her
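Building a parameter file from fragments and checking its uniqueness can be sketched as follows, assuming a simple `name = value` text format and a content hash as the uniqueness key. Both the format and the hashing are assumptions for illustration; the slide only says that specialised scripts format the file and that uniqueness is checked.

```python
import hashlib


def assemble_parameter_file(fragments):
    """Merge fragments (each a list of (name, value) pairs) into one file body.

    Raises ValueError if the same parameter appears in two fragments.
    """
    merged = {}
    for fragment in fragments:
        for name, value in fragment:
            if name in merged:
                raise ValueError(f"parameter {name!r} defined in more than one fragment")
            merged[name] = value
    # Sorting makes the text independent of fragment order.
    return "\n".join(f"{n} = {v}" for n, v in sorted(merged.items()))


def content_key(text):
    """Hash of the file body, used here as a hypothetical uniqueness key."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def is_new(text, existing_keys):
    return content_key(text) not in existing_keys
```

Because the body is sorted by parameter name, the same set of fragments always yields the same key, so re-submitting an identical combination is detected regardless of order.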

10 Tables: Input Parameters
– Parameter: Name, Description
– ParameterFile: ListOfParameterValues, Location, URL
– ParameterType: Name
– ParameterValue: Value, Description
– SoftwareType: Name
– ParameterMap
Entered via web forms

11 Physics Production Request: 3. Input Data
Number of events to be produced or processed
Input data, either:
– selection of the logical name of an input data collection (real or virtual data); type checked
– or definition of the name of a new dataset; uniqueness checked
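The two alternatives for input data, each with its own check, can be sketched as a single validation function. This is a hypothetical illustration: the argument names are invented, and the collection "type" is modelled as the file type the chosen executable expects to read.

```python
from __future__ import annotations

from typing import Optional


def validate_input(expected_type: Optional[str],
                   collections: dict[str, str],
                   existing_datasets: set[str],
                   input_collection: Optional[str] = None,
                   new_dataset: Optional[str] = None) -> None:
    """Raise ValueError if the input-data choice of a request is invalid.

    expected_type: file type the selected executable reads.
    collections: logical collection name -> data type (real or virtual data).
    """
    # Exactly one of the two alternatives on the web form must be used.
    if (input_collection is None) == (new_dataset is None):
        raise ValueError("give either an input collection or a new dataset name")
    if input_collection is not None:
        if input_collection not in collections:
            raise ValueError(f"unknown collection {input_collection!r}")
        actual = collections[input_collection]
        if actual != expected_type:  # type check
            raise ValueError(
                f"{input_collection!r} holds {actual!r}, executable reads {expected_type!r}")
    elif new_dataset in existing_datasets:  # uniqueness check
        raise ValueError(f"dataset {new_dataset!r} already exists")
```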

12 Datasets and Collections
Dataset:
– Physics channel: primary interactions
– Detector configuration (geometry, material, magnetic field)
Collection:
– for particle tracking through the detector, track reconstruction, physics reconstruction
– one can change software, software versions, parameters
1 Dataset, many Collections (re-processing, beam luminosities, filtering, cloning and adding new objects, analysis ntuples, …)
Each Collection belongs to a Production Cycle.

13 Tables: Dataset & Collection
– Dataset: Name, Description, Validity, Date, Cross-section, NbOfEvents
– DataType
– DatasetMap
– Collection: DatasetName, CollectionName, Status, NbOfEvents; linked to its Input Collection
– Geometry
– Software
– Executable
– ParameterFile
– Owner: Name
– ProductionCycle: Name, Calo/Tk/MuDigis (on/off)
– PUCondition
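Since each Collection points to its Input Collection, several collections of one dataset form a tree, and the processing history of any collection can be recovered by following those links. The sketch below assumes an in-memory object per Collection row; the dataset, collection and cycle names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Collection:
    dataset: str
    name: str
    production_cycle: str
    input_collection: Optional[Collection] = None  # None for the first step

    def lineage(self) -> list[str]:
        """Names from the first collection of the dataset down to this one."""
        chain, node = [], self
        while node is not None:
            chain.append(f"{node.dataset}/{node.name}")
            node = node.input_collection
        return chain[::-1]
```

For example, two digitisation collections at different luminosities can share the same hits collection as input, which is the "1 Dataset, many Collections" pattern of the previous slide.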

14 Tables: Pile-Up Conditions
– Dataset
– DataType
– DatasetMap
– Collection
– ParameterFile
– PUCondition: Name (e.g. "Minimum Bias")

15 Physics Production Request: 4. Production Parameters
Data clustering, commit interval, monitoring, job splitting
Placeholders in the parameter file:
– for defining output file names, input/output run numbers, random number seeds, …
– overwritten, with values defined by RefDB, by the PHP script that gives the workflow planner access to the parameter file
Job decomposition defined either:
– by the granularity of the input data (runs), or
– by an adequate number of events per run, giving a reasonable job CPU time and output data size
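The two job-decomposition rules can be sketched as follows. The per-event CPU time and output size, and the limits used in the example, are hypothetical numbers; the sketch simply takes the largest number of events per run that satisfies both the CPU-time and the output-size limit.

```python
def events_per_run(cpu_s_per_event, mb_per_event, max_cpu_s, max_output_mb):
    """Largest events/run keeping both job CPU time and output size in bounds."""
    by_cpu = int(max_cpu_s // cpu_s_per_event)
    by_size = int(max_output_mb // mb_per_event)
    return max(1, min(by_cpu, by_size))


def split_by_events(total_events, n_per_run, first_run=1):
    """(run number, events) pairs; the last run takes the remainder."""
    runs, run, remaining = [], first_run, total_events
    while remaining > 0:
        n = min(n_per_run, remaining)
        runs.append((run, n))
        remaining -= n
        run += 1
    return runs


def split_by_input_runs(input_runs):
    """One job per input run: input_runs maps run number -> events in it."""
    return sorted(input_runs.items())
```

With, say, 30 s and 2 MB per event, an 8-hour CPU limit and a 1000 MB output limit, `events_per_run(30, 2, 8 * 3600, 1000)` gives 500 (the size limit dominates), and 1200 requested events become three runs of 500, 500 and 200 events.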

16 Physics Production Request: Procedure
All steps via web forms
Pre-registered "Requestors" for each Physics Group (.htaccess permissions)
Creation of parameter file(s) or selection of existing ones
Request web form starting from any point in the production chain (atomic or chain requests):
– selection of identity (Name, Group)
– selection of Software, Version, Executable
– selection of parameter file(s)
– selection of an input collection, or definition of a dataset name and description for new physics channels
– uniqueness of the request checked
E-mail notification to the Requestor, Group Coordinator and Production Coordinator
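The request uniqueness check can be illustrated by giving each request a canonical key built from the selected software, version, executable, parameter files and input. This is a hypothetical sketch: the key composition is an assumption (in particular, the requestor is deliberately left out, so two physicists asking for the same production collide), and e-mail notification is not modelled.

```python
from __future__ import annotations


def request_key(software: str, version: str, executable: str,
                parameter_files: list[str], input_ref: str) -> tuple:
    """Canonical identity of a request; parameter-file order is irrelevant."""
    return (software, version, executable, tuple(sorted(parameter_files)), input_ref)


class RequestRegistry:
    """Hypothetical in-memory stand-in for the RefDB Request table."""

    def __init__(self) -> None:
        self._ids: dict[tuple, int] = {}
        self._next_id = 1

    def submit(self, requestor: str, software: str, version: str,
               executable: str, parameter_files: list[str], input_ref: str) -> int:
        key = request_key(software, version, executable, parameter_files, input_ref)
        if key in self._ids:
            raise ValueError(f"{requestor}: identical to request {self._ids[key]}")
        request_id = self._next_id
        self._next_id += 1
        self._ids[key] = request_id
        return request_id
```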

17 Production Assignments
Assignment of (slices of) requests to Regional Centres (RC)
Assignments created centrally by the Production Coordinator, taking into account:
– minimising file transfers
– local physics interest
– farm performance and status, as a function of time
– local manpower availability, as a function of time
– priority of the request
An RC can be one farm, many farms or a Grid:
– assignments can be re-assigned by the local production coordinator to local production sites
Assignment status updated quasi-online:
– job monitoring: log file parsed, summary sent by e-mail
– estimation of local and global production rates
The AssignmentID is the key for the Production Instructions
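Estimating local and global production rates from the assignment status can be as simple as events produced per elapsed day, summed over assignments for the global figure. The sketch below assumes exactly that averaging; RefDB's actual estimator is not described on the slide, and the dates and counts in the example are hypothetical.

```python
from datetime import datetime


def production_rate(produced, start, now):
    """Average events per day since the assignment started."""
    days = (now - start).total_seconds() / 86400
    return produced / days if days > 0 else 0.0


def global_rate(assignments, now):
    """assignments: iterable of (events produced, start datetime) pairs."""
    return sum(production_rate(p, s, now) for p, s in assignments)


def estimated_completion_days(assigned, produced, rate):
    """Days left for an assignment at the current rate."""
    remaining = max(assigned - produced, 0)
    return remaining / rate if rate > 0 else float("inf")
```

For instance, an assignment started on 1 March with 50000 events produced by 11 March runs at 5000 events per day, so another 50000 assigned events would need about 10 more days.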

18 Tables: Request & Assignment
– Assignment: Dates (assignment, start, end), Status, NbOfEvents (assigned, produced), NbOfEventsPerRun, ChainAssignment, MasterCopyLocation
– RegionalCentre: Name, NickName, HostRC, MotherRCID, Dates (start, end)
– Request: Dates (request, delivery), NbOfEvents (requested, produced), NtupleOnly, Status
– Person, PersonType, PersonMap, PhysicsGroup
– Dataset, Collection, ProductionCycle (input and output)
– MonitoringDefinition

19 Production Instructions
– Executable name, Software, Version
– Parameter file URL
– Job splitting instructions URL: table of placeholders versus values
– Monitoring instructions URL: parsing script for the e-mail summary; parsing scripts and schema for BOSS (optional)
– URL for the geometry file or META files, i.e. the detector configuration (pre-created)
– Dataset name, production cycle
NB: the workflow planner knows which output files are to be saved.
Chain assignments: for running several executables sequentially in one job.
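Applying a table of placeholders versus values to a parameter file amounts to a text substitution that fails loudly on any placeholder without a value. The `@NAME@` syntax and the placeholder names below are hypothetical; in RefDB the substitution was done by a PHP script serving the parameter file to the workflow planner.

```python
import re

PLACEHOLDER = re.compile(r"@([A-Z_]+)@")  # hypothetical placeholder syntax


def fill_placeholders(template, values):
    """Replace every @NAME@ with values[NAME]; unknown names raise KeyError."""
    def repl(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {key}")
        return str(values[key])
    return PLACEHOLDER.sub(repl, template)
```

Raising on a missing value, rather than leaving the placeholder in place, keeps a job from starting with an unset seed or output file name.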

20 Production Book-Keeping
One table per dataset, one row per generation run, for each production cycle:
– Run Number
– Seeds
– (Cross-section)
– LFN
– Status
– Assignment ID
– Number of input events
– Number of output events
Monitored values are sent by e-mail at the end of successful jobs.
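A per-dataset book-keeping table with one row per run and production cycle could look like the sketch below, again using `sqlite3` in place of MySQL. The table-name prefix, column names and the layout of the parsed e-mail summary (`summary` dict keys) are assumptions.

```python
import re
import sqlite3


def table_name(dataset: str) -> str:
    """One book-keeping table per dataset; allow only safe SQL identifiers."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", dataset):
        raise ValueError(f"unsafe dataset name {dataset!r}")
    return f"Runs_{dataset}"


def create_run_table(db: sqlite3.Connection, dataset: str) -> None:
    db.execute(f"""
        CREATE TABLE IF NOT EXISTS {table_name(dataset)} (
            ProductionCycle TEXT NOT NULL,
            RunNumber INTEGER NOT NULL,
            Seed INTEGER,
            CrossSection REAL,
            LFN TEXT,
            Status TEXT,
            AssignmentID INTEGER,
            NbInputEvents INTEGER,
            NbOutputEvents INTEGER,
            PRIMARY KEY (ProductionCycle, RunNumber))""")


def record_summary(db: sqlite3.Connection, dataset: str, cycle: str,
                   summary: dict) -> None:
    """Store the values mailed at the end of a successful job."""
    db.execute(
        f"""INSERT OR REPLACE INTO {table_name(dataset)}
            (ProductionCycle, RunNumber, Seed, LFN, Status, AssignmentID,
             NbInputEvents, NbOutputEvents)
            VALUES (?, ?, ?, ?, 'done', ?, ?, ?)""",
        (cycle, summary["run"], summary["seed"], summary["lfn"],
         summary["assignment"], summary["n_in"], summary["n_out"]))
    db.commit()
```

Table names cannot be bound as SQL parameters, which is why the dataset name is validated before being interpolated; all values are still passed as bound parameters.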

21 Data Catalogue
RefDB tables:
– List of catalogues: Objectivity/DB, POOL; disk or tapes
– Catalogue – Publication Site map
– Catalogue – Collection map
Completeness checking
Scripts for dataset queries
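Completeness checking can be read as comparing the runs recorded in the book-keeping for a collection with the runs that the catalogues actually publish. The sketch below assumes that interpretation; the catalogue names in the example are hypothetical.

```python
def missing_runs(expected_runs, catalogues):
    """Runs in the book-keeping that no catalogue publishes yet.

    catalogues: mapping catalogue name -> iterable of run numbers it holds.
    """
    published = set()
    for runs in catalogues.values():
        published.update(runs)
    return sorted(set(expected_runs) - published)


def is_complete(expected_runs, catalogues):
    return not missing_runs(expected_runs, catalogues)
```

A run published by several catalogues counts once, so overlapping catalogues (for example a disk copy and a tape copy) do not hide a gap elsewhere.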

22 Prospects
– Local installation of RefDB for "private" productions
– Extend I/O file-type checking to software compatibility

23 Tables shown in the overall schema diagram
Software, Executable, ExecutableUse, FileType, ProductionStep, MonitoringBlock, MonitoringDefinition, MonitoringProcess, MonitoringSchema, MonitoringObject, Parameter, ParameterFile, ParameterValue, Dataset, Collection, Geometry, ProductionCycle, PUCondition, Assignment, RegionalCentre, Request, Person, PhysicsGroup

