RefDB: The Reference Database for CMS Monte Carlo Production Véronique Lefébure CERN & HIP CHEP 2003 - San Diego, California 25 th of March 2003.

Slides:



Advertisements
Similar presentations
CMS Grid Batch Analysis Framework
Advertisements

31/03/00 CMS(UK)Glenn Patrick What is the CMS(UK) Data Model? Assume that CMS software is available at every UK institute connected by some infrastructure.
RunJob in CMS Greg Graham Discussion Slides. RunJob in CMS RunJob is an Application Configuration and Job Creation Tool –RunJob uses metadata to abstract.
1 Databases in ALICE L.Betev LCG Database Deployment and Persistency Workshop Geneva, October 17, 2005.
23/04/2008VLVnT08, Toulon, FR, April 2008, M. Stavrianakou, NESTOR-NOA 1 First thoughts for KM3Net on-shore data storage and distribution Facilities VLV.
David Adams ATLAS DIAL Distributed Interactive Analysis of Large datasets David Adams BNL March 25, 2003 CHEP 2003 Data Analysis Environment and Visualization.
A tool to enable CMS Distributed Analysis
August 98 1 Jürgen Knobloch ATLAS Software Workshop Ann Arbor ATLAS Computing Planning ATLAS Software Workshop August 1998 Jürgen Knobloch Slides also.
ArcGIS Workflow Manager An Introduction
L3 Filtering: status and plans D  Computing Review Meeting: 9 th May 2002 Terry Wyatt, on behalf of the L3 Algorithms group. For more details of current.
The ATLAS Production System. The Architecture ATLAS Production Database Eowyn Lexor Lexor-CondorG Oracle SQL queries Dulcinea NorduGrid Panda OSGLCG The.
Dave Newbold, University of Bristol24/6/2003 CMS MC production tools A lot of work in this area recently! Context: PCP03 (100TB+) just started Short-term.
CMS Report – GridPP Collaboration Meeting VI Peter Hobson, Brunel University30/1/2003 CMS Status and Plans Progress towards GridPP milestones Workload.
Lecture 7 Interaction. Topics Implementing data flows An internet solution Transactions in MySQL 4-tier systems – business rule/presentation separation.
Test Of Distributed Data Quality Monitoring Of CMS Tracker Dataset H->ZZ->2e2mu with PileUp - 10,000 events ( ~ 50,000 hits for events) The monitoring.
The Pipeline Processing Framework LSST Applications Meeting IPAC Feb. 19, 2008 Raymond Plante National Center for Supercomputing Applications.
3 Sept 2001F HARRIS CHEP, Beijing 1 Moving the LHCb Monte Carlo production system to the GRID D.Galli,U.Marconi,V.Vagnoni INFN Bologna N Brook Bristol.
Copyright © 2007, Oracle. All rights reserved. Managing Concurrent Requests.
03/27/2003CHEP20031 Remote Operation of a Monte Carlo Production Farm Using Globus Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio.
F.Fanzago – INFN Padova ; S.Lacaprara – LNL; D.Spiga – Universita’ Perugia M.Corvo - CERN; N.DeFilippis - Universita' Bari; A.Fanfani – Universita’ Bologna;
INFSO-RI Enabling Grids for E-sciencE Logging and Bookkeeping and Job Provenance Services Ludek Matyska (CESNET) on behalf of the.
Geant4 Acceptance Suite for Key Observables CHEP06, T.I.F.R. Mumbai, February 2006 J. Apostolakis, I. MacLaren, J. Apostolakis, I. MacLaren, P. Mendez.
3rd June 2004 CDF Grid SAM:Metadata and Middleware Components Mòrag Burgon-Lyon University of Glasgow.
Marianne BargiottiBK Workshop – CERN - 6/12/ Bookkeeping Meta Data catalogue: present status Marianne Bargiotti CERN.
CHEP'07 September D0 data reprocessing on OSG Authors Andrew Baranovski (Fermilab) for B. Abbot, M. Diesburg, G. Garzoglio, T. Kurca, P. Mhashilkar.
Distribution After Release Tool Natalia Ratnikova.
Nick Brook Current status Future Collaboration Plans Future UK plans.
Claudio Grandi INFN Bologna CHEP'03 Conference, San Diego March 27th 2003 Plans for the integration of grid tools in the CMS computing environment Claudio.
LHCb and DataGRID - the workplan for 2001 Eric van Herwijnen Wednesday, 28 march 2001.
ATLAS and GridPP GridPP Collaboration Meeting, Edinburgh, 5 th November 2001 RWL Jones, Lancaster University.
Bookkeeping Tutorial. Bookkeeping & Monitoring Tutorial2 Bookkeeping content  Contains records of all “jobs” and all “files” that are created by production.
Stuart Wakefield Imperial College London Evolution of BOSS, a tool for job submission and tracking W. Bacchi, G. Codispoti, C. Grandi, INFN Bologna D.
Status of the LHCb MC production system Andrei Tsaregorodtsev, CPPM, Marseille DataGRID France workshop, Marseille, 24 September 2002.
NOVA Networked Object-based EnVironment for Analysis P. Nevski, A. Vaniachine, T. Wenaus NOVA is a project to develop distributed object oriented physics.
November SC06 Tampa F.Fanzago CRAB a user-friendly tool for CMS distributed analysis Federica Fanzago INFN-PADOVA for CRAB team.
Giuseppe Codispoti INFN - Bologna Egee User ForumMarch 2th BOSS: the CMS interface for job summission, monitoring and bookkeeping W. Bacchi, P.
Claudio Grandi INFN Bologna CHEP'03 Conference, San Diego March 27th 2003 BOSS: a tool for batch job monitoring and book-keeping Claudio Grandi (INFN Bologna)
ATLAS Data Challenges US ATLAS Physics & Computing ANL October 30th 2001 Gilbert Poulard CERN EP-ATC.
A university for the world real R © 2009, Chapter 9 The Runtime Environment Michael Adams.
David Adams ATLAS DIAL/ADA JDL and catalogs David Adams BNL December 4, 2003 ATLAS software workshop Production session CERN.
4 December 2003 Natalia Ratnikova Natalia Ratnikova 1 Software Packaging with DAR Natalia Ratnikova, Anzar Afaq, Greg Graham Fermilab Tony Wildish, Veronique.
1 Andrea Sciabà CERN Critical Services and Monitoring - CMS Andrea Sciabà WLCG Service Reliability Workshop 26 – 30 November, 2007.
PHENIX and the data grid >400 collaborators 3 continents + Israel +Brazil 100’s of TB of data per year Complex data with multiple disparate physics goals.
 CMS data challenges. The nature of the problem.  What is GMA ?  And what is R-GMA ?  Performance test description  Performance test results  Conclusions.
G.Govi CERN/IT-DB 1 September 26, 2003 POOL Integration, Testing and Release Procedure Integration  Packages structure  External dependencies  Configuration.
Bookkeeping Tutorial. 2 Bookkeeping content  Contains records of all “jobs” and all “files” that are produced by production jobs  Job:  In fact technically.
Online Monitoring System at KLOE Alessandra Doria INFN - Napoli for the KLOE collaboration CHEP 2000 Padova, 7-11 February 2000 NAPOLI.
Korea Workshop May GAE CMS Analysis (Example) Michael Thomas (on behalf of the GAE group)
INFSO-RI Enabling Grids for E-sciencE CRAB: a tool for CMS distributed analysis in grid environment Federica Fanzago INFN PADOVA.
Pavel Nevski DDM Workshop BNL, September 27, 2006 JOB DEFINITION as a part of Production.
Vincenzo Innocente, CERN/EPUser Collections1 Grid Scenarios in CMS Vincenzo Innocente CERN/EP Simulation, Reconstruction and Analysis scenarios.
DZero Monte Carlo Production Ideas for CMS Greg Graham Fermilab CD/CMS 1/16/01 CMS Production Meeting.
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
CMS Production Management Software Julia Andreeva CERN CHEP conference 2004.
Joe Foster 1 Two questions about datasets: –How do you find datasets with the processes, cuts, conditions you need for your analysis? –How do.
MESA A Simple Microarray Data Management Server. General MESA is a prototype web-based database solution for the massive amounts of initial data generated.
EDG Project Conference – Barcelona 13 May 2003 – n° 1 A.Fanfani INFN Bologna – CMS WP8 – Grid Planning in CMS Outline  CMS Data Challenges  CMS Production.
Muen Policy & Toolchain
U.S. ATLAS Grid Production Experience
CMS High Level Trigger Configuration Management
Moving the LHCb Monte Carlo production system to the GRID
BOSS: the CMS interface for job summission, monitoring and bookkeeping
BOSS: the CMS interface for job summission, monitoring and bookkeeping
BOSS: the CMS interface for job summission, monitoring and bookkeeping
GLAST Release Manager Automated code compilation via the Release Manager Navid Golpayegani, GSFC/SSAI Overview The Release Manager is a program responsible.
Artem Trunov and EKP team EPK – Uni Karlsruhe
Vandy Berten Luc Goossens Alvin Tan
ATLAS DC2 & Continuous production
Production client status
Presentation transcript:

RefDB: The Reference Database for CMS Monte Carlo Production Véronique Lefébure CERN & HIP CHEP San Diego, California 25 th of March 2003

Véronique Lefébure - CHEP20032 Functionalities of RefDB 1.Management of Physics Production Requests 2.Distribution, Coordination and Progress Tracking of Production around the World: Production Assignments 3.Definition of Production Instructions for workflow-planner 4. Catalogue Publication of Real and Virtual Data  MySQL Database hosted at CERN  Web-server,.htaccess and Php scripts

Véronique Lefébure - CHEP20033 General Data Flow Web Interface : RefDB Request Physicist (many) Production Coordinator (one) Assignment Production Operator (many) Workflow Planner * RUN Summary CPU Mail box *IMPALA, McRunjob, CMSProd

Véronique Lefébure - CHEP20034 Statistics RefDB was designed and implemented in Nov., Dec. of 2001, and is used intensively by CMS since January 2002 –DAQ TDR Spring 2002 Production –2003 Production for preparation of 2004 Data Challenge ~ 20 Requestors > 20 Regional Centres, >40 Production sites, 70 Production Operators > 2000 Requests, Assignments > 300 Parameter Files, > 1300 Parameter Values ~ 24 MB of MySQL data

Véronique Lefébure - CHEP20035 Physics Production Request Definition of an Atomic Production Request (“Derivation”): 1.Executable (“Transformation”) 2.Input Physics Parameters 3.Input Data and Number of Events 4.Input Production Parameters Defined by the Physicist Defined by the Production Coordinator

Véronique Lefébure - CHEP20036 Physics Production Request 1. Executable Selected according to –Software Name –Software Version –Executable Name (eg: “ORCA ORCA_7_1_1 writeAllDigis”) Binaries, distributed with DAR* tool Based on tagged code (CVS, SCRAM) but private code may be supported (system for loading and archiving code) I/O File-Type constraints Monitoring Schema and Algorithm (can be used by BOSS**) * DAR: “Distribution After Release” ( ** BOSS: “Batch Object Submission System” (

Véronique Lefébure - CHEP20037 Tables:Software & Executable Software Name, Version, Dates SoftwareType Name SoftwareMap DARFile Name, Dates, Status DarFileElement Executable Name, Package ExecutableUse FileType Name MonitoringDefinition Schema, Algorithms ProductionStep Name, Shortname Distribution Web forms in out

Véronique Lefébure - CHEP20038 Tables: Monitoring MonitoringBlock Regular Expression, Piece of code MonitoringDefinition MonitoringProcess MonitoringProcessType MonitoringSchema pre run post ProductionStep MonitoringObject Name, Type, Description

Véronique Lefébure - CHEP20039 Physics Production Request 2. Physics Parameters Input Parameter File is made of  1 File Fragment(s): –Modularity: Detector parameters Beam-luminosity parameters, … Parameter File Fragment: list of (Name,Value) pairs for each parameter –Specialised scripts for file formatting –Uniqueness checked Single Parameter and its Value: –selected by the Physicist –or new parameter and/or new Value entered by him/her

Véronique Lefébure - CHEP Tables: Input Parameters Parameter Name, Description ParameterFile ListOfParameterValues, Location, URL ParameterType Name ParameterValue Value, Description SoftwareType Name ParameterMap Web forms

Véronique Lefébure - CHEP Physics Production Request 3. Input Data Number of Events to be produced or processed Input Data: –Selection of Logical Name of Input Data Collection (Real or Virtual Data) Type checked or –Definition of the Name of a new Dataset Uniqueness checked

Véronique Lefébure - CHEP Datasets and Collections Dataset –Physics Channel: primary interactions –Detector Configuration (geometry, material, magnetic field) Collection –For Particle tracking through detector Track reconstruction Physics reconstruction – one can change Software Software versions Parameters 1 Dataset - Many Collections (re-processing, beam luminosities, filtering, cloning and adding new objects, analysis ntuples, …) Production Cycle

Véronique Lefébure - CHEP Tables: Dataset & Collection Dataset Name, Description, Validity, Date, Cross-section, NbOfEvents DataType DatasetMap Collection DatasetName, CollectionName Status, NbOfEvents Geometry Software Executable ParameterFile Owner Name ProductionCycle Calo/Tk/MuDigis(on/off) Name PUCondition Input Collection

Véronique Lefébure - CHEP Tables: Pile-Up Conditions Dataset DataType DatasetMap Collection ParameterFile PUCondition Name “Minimum Bias”

Véronique Lefébure - CHEP Physics Production Request 4. Production Parameters Data Clustering Commit Interval Monitoring JobSplitting Placeholders in Parameter file: –for defining Output file names input/output run numbers, random number seeds, …. –overwritten by the php script that gives access the to the Parameter file the workflow planner, with values defined by RefDB  Job decomposition defined either –by granularity of input data (runs) or –by adequate Nb of Events per Run for a reasonable job CPU time and output data size

Véronique Lefébure - CHEP Physics Production Request: Procedure All steps via web-forms Pre-registered “Requestors” for each Physics Group:.htaccess permissions Creation of Parameter File(s) or selection of existing ones Request web-form starting from any point in the production chain: atomic or chain requests –Selection of Identity (Name, Group) –Selection of Software, Version, Executable –Selection of Parameter file(s) –Selection of Input Collection or Definition of Dataset Name + Description for new Physics Channels –Uniqueness of Request checked notification to Requestor, Group Coordinator, Production Coordinator

Véronique Lefébure - CHEP Production Assignments Assignment of (slices of) Requests to Regional Centres Assignment centrally created by the Production Coordinator –Minimize file transfers –Local physics interest –Farm performance and status, function of time –Local manpower availability, function of time –Priority of request RC = 1 farm or many farms or Grid –Assignments can be re-assigned by local production coordinator to local production sites Assignment Status updated quasi online –Job Monitoring: log file parsed, summary sent by –Estimation of local and global production rate AssignmentID = key for Production Instructions

Véronique Lefébure - CHEP Tables: Request & Assignment Assignment Dates (assignment,Start, End) Status, NbOfEvents (assigned, produced) NbOfEventsperRun, ChainAssignment MasterCopyLocation RegionalCentre Name. NickName, HostRC MotherRCID, Dates (start, end) Request Dates (request, delivery), NbOfEvents(requested, produced) NtupleOnly, Status Person PersonType PersonMap PhysicsGroup Dataset CollectionProductionCycle input output MonitoringDefinition

Véronique Lefébure - CHEP Production Instructions Production Instructions: –Executable Name, Software, Version –Parameter File URL –Job Splitting Instructions URL Table of Placeholders versus Values –Monitoring Instructions URL Parsing script for summary Parsing scripts and schema for BOSS (optional) –URL for Geometry File or META files, i.e. Detector Configuration (pre-created) –Dataset Name, Production Cycle NB: Workflow-planner knows which output files to be saved Chain Assignments: for running sequentially several executables in one job

Véronique Lefébure - CHEP Production Book-Keeping one Table per Dataset, one Row per Generation Run for each Production Cycle: –Run Number –Seeds –(Cross-section) –LFN –Status –Assignment ID –Number of input Events –Number of output Events Monitored values sent by at end of successful jobs

Véronique Lefébure - CHEP Data Catalogue RefDB Tables: –List of Catalogues Objectivity/DB, POOL disk or tapes –Catalogue – Publication Site Map –Catalogue – Collection Map Completeness checking Scripts for Dataset queries

Véronique Lefébure - CHEP Prospects Local installation of RefDB for “private” productions Extend I/O file-type checking to Software compatibility

Véronique Lefébure - CHEP Software Executable ExecutableUse FileType ProductionStep MonitoringBlock MonitoringDefinition MonitoringProcess MonitoringSchemaMonitoringObject Parameter ParameterFileParameterValue Dataset Collection Geometry ProductionCycle PUCondition Assignment RegionalCentre Request PersonPhysicsGroup