Grid Scenarios in CMS: Simulation, Reconstruction and Analysis scenarios. Vincenzo Innocente, CERN/EP.

Presentation transcript:

Slide 2: CMS Data Analysis Model
[Diagram. Recoverable box labels: Detector Control, Online Monitoring, Environmental data, Simulation, Data Quality, Calibrations, Group Analysis, User Analysis on demand, Quasi-online Reconstruction, Event Filter, Object Formatter, Persistent Object Store Manager, Database Management System, Physics Paper. Recoverable arrow labels: "store", "Request part of event", "Store rec-Obj", "Store rec-Obj and calibrations".]

Slide 3: Forgive me the Obvious
- No simple solution to complex problems
- No Silver Bullet: technology is a helper, not a solution by itself
- What counts is the global analysis efficiency (time to paper)
- Single-job turnaround is just one component

Slide 4: Assumptions
- A generic CMS "query" is too complex to be fully specified in any form other than several thousand lines of "code", including:
  - which files and which objects will be opened (or not)
  - which objects will be created and stored, and where
- A CMS job is composed of:
  - a CMS software configuration, including the executable
  - a set of user shared libraries
  - a configuration file defining:
    - user libraries to load
    - input event collection
    - output dataset (including physical-clustering directives)
    - values for user-configurable parameters
- Heuristics exist that allow the configuration file to be "mapped" to file sets
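
The job composition listed above can be pictured as a small data structure. The following is only a sketch in Python, not COBRA code (which is C++): every class, field and function name is hypothetical, and the "heuristic" is reduced to a toy catalogue lookup keyed by the input collection.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobConfiguration:
    """Hypothetical stand-in for the configuration file named on the slide."""
    user_libraries: List[str]                 # user libraries to load
    input_collection: str                     # input event collection
    output_dataset: str                       # output dataset
    clustering_directives: Dict[str, str] = field(default_factory=dict)
    parameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class CmsJob:
    """Hypothetical stand-in for a complete job description."""
    software_release: str                     # CMS software configuration
    executable: str                           # executable it includes
    shared_libraries: List[str]               # user shared libraries
    config: JobConfiguration


def guess_file_set(job: CmsJob, catalogue: Dict[str, List[str]]) -> List[str]:
    """Toy heuristic: map the configuration to the files of its input collection."""
    return sorted(catalogue.get(job.config.input_collection, []))


# Illustrative values only; none of these names come from the talk.
job = CmsJob(
    software_release="release-x",
    executable="analysis.exe",
    shared_libraries=["libUserAnalysis.so"],
    config=JobConfiguration(
        user_libraries=["libUserAnalysis.so"],
        input_collection="my_selection",
        output_dataset="my_output",
        parameters={"pt_cut": 20.0},
    ),
)
catalogue = {"my_selection": ["file_b", "file_a"]}
print(guess_file_set(job, catalogue))
```

A real mapping would have to take the loaded libraries and parameters into account as well, which is exactly why the slide calls it a heuristic.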

Slide 5: Some COBRA principles
- Developers and users are the same physicists
- Everything is expressed in code (C++)
- Unique development and running environment
- Minimal prerequisites and pre-specifications
- Dependencies are implicit and expressed in the code
- COBRA discovers and self-adapts to:
  - the environment
  - the configuration
  - the data product to materialize
- COBRA is able to navigate from metadata to the event data and back

Slide 6: Generic COBRA job
- Input event collection is user-made
  - Can easily be time-ordered
  - May contain events
    - requiring different detector configurations
    - with different materialized data products
    - reconstructed with different configurations
- COBRA discovers materializable data products
  - from the existing configuration of the output dataset
  - from the instantiated algorithms (i.e. loaded shared libs)
  - Nothing prevents loading new libraries and instantiating new algorithms in the middle of the processing
- A data product is materialized
  - when an algorithm "dereferences" it
  - if an "equivalent version" is not present in the input event
  - if the user explicitly asks for it
- Dependencies among data products are known only a posteriori
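
The materialize-on-dereference behaviour described above, and the reason dependencies are known only afterwards, can be illustrated with a short sketch. It is written in Python for brevity and is not COBRA's implementation; the `Event` class, its `get` method and the product names are all hypothetical.

```python
class Event:
    """Toy event: a product is computed only when something asks for it."""

    def __init__(self, stored, algorithms):
        self.stored = dict(stored)      # products already present in the input event
        self.algorithms = algorithms    # product name -> callable(event), from "loaded libraries"
        self.materialized = []          # products computed during this job, in completion order
        self.dependencies = {}          # product -> set of products it dereferenced
        self._stack = []                # products currently being computed

    def get(self, name):
        # Record the dependency of whichever product is being computed right now.
        if self._stack:
            self.dependencies.setdefault(self._stack[-1], set()).add(name)
        # Materialize only if no version is already present in the event.
        if name not in self.stored:
            self._stack.append(name)
            try:
                self.stored[name] = self.algorithms[name](self)
            finally:
                self._stack.pop()
            self.materialized.append(name)
        return self.stored[name]


# Hypothetical algorithms: each one dereferences what it needs, nothing is declared up front.
algorithms = {
    "tracks": lambda ev: [hit * 2 for hit in ev.get("hits")],
    "jets": lambda ev: len(ev.get("tracks")) + len(ev.get("clusters")),
}
event = Event(stored={"hits": [1, 2, 3], "clusters": [7]}, algorithms=algorithms)
event.get("jets")                # the user explicitly asks for "jets"
print(event.materialized)        # ['tracks', 'jets']
print(event.dependencies)
```

The dependency table is filled in only as a side effect of running the algorithms, which is the sense in which it exists a posteriori: an event that already stored "tracks" would record a different set of materialized products for the same request.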

Slide 7: Grid Constraints
- Broker needs means to optimize resources in a competitive environment
  - Compute cost (CPU, I/O bandwidth and volume)
  - Identify input and output SE
  - Identify most suitable CE
  - Establish priorities
- Easy for bread-and-butter physics
- More difficult for discovery physics
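
To make the broker's task concrete, here is a minimal sketch of ranking compute elements (CE) by an estimated job cost. The cost model (CPU hours plus a charge for data not held on the local storage element) and all names and numbers are assumptions made for illustration; the talk does not specify a formula.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ComputeElement:
    """Hypothetical description of a CE and the files held by its nearby SE."""
    name: str
    cpu_cost_per_hour: float       # arbitrary accounting units
    transfer_cost_per_gb: float    # cost of fetching data not available locally
    local_files: FrozenSet[str]


def job_cost(ce: ComputeElement, cpu_hours: float, input_files: Dict[str, float]) -> float:
    """Estimate the cost of a job; input_files maps logical file name -> size in GB."""
    remote_gb = sum(size for lfn, size in input_files.items() if lfn not in ce.local_files)
    return cpu_hours * ce.cpu_cost_per_hour + remote_gb * ce.transfer_cost_per_gb


def choose_ce(ces: List[ComputeElement], cpu_hours: float,
              input_files: Dict[str, float]) -> ComputeElement:
    """Pick the CE with the lowest estimated cost."""
    return min(ces, key=lambda ce: job_cost(ce, cpu_hours, input_files))


ces = [
    ComputeElement("site_a", 1.0, 2.0, frozenset({"f1", "f2"})),
    ComputeElement("site_b", 0.5, 2.0, frozenset()),
]
inputs = {"f1": 10.0, "f2": 5.0}
best = choose_ce(ces, cpu_hours=4.0, input_files=inputs)
print(best.name, job_cost(best, 4.0, inputs))   # site_a 4.0
```

The estimate is only as good as the list of input files and the CPU figure fed into it, which is why the homogeneous "bread-and-butter" case is easy and the discovery case is not.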

Slide 8: Pre-emptive "Quasi-online" production
- Heuristics are usually easy
- Input event collection is homogeneous
  - Same materialized data products
  - Same configuration
- Output dataset is homogeneous
  - No fancy event selection
  - Fixed, "stamped" configuration
  - Predefined types of data products to materialize
- In steady state it is not difficult for production management (human) to know and specify
  - which input data products will be used
  - which data products will be materialized, and where

Slide 9: Reality also strikes production
- To go faster and keep up with the input rate
  - Some data products will not be materialized
  - Fancy selections will be put in place to fully reconstruct only some classes of events
  - Configuration (and algorithms) will be modified almost online to get the best resolution even from pass 1
- Input collection to reprocessing (pass 2) will usually not be as homogeneous as hoped...
  - unless restarting from scratch

Slide 10: Production of Analysis-Group AOD
- Input event collection
  - Selected events from previous pre-emptive reconstruction
  - Should contain (by definition) all required intermediate data products
- Output dataset
  - Well-defined configuration
  - Only and all the group AOD
- Access to raw data not required?
  - Recalibration
  - Fancy (slow) reconstruction algorithm in special cases

Slide 11: User end analysis
- Input event collection
  - Group analysis AOD
- Output dataset
  - Personal AOD
  - Ntuple
- No access to raw data
- No access to basic reconstructed objects
- BUT random full access to selected events for detailed (interactive) analysis and visualization purposes

Slide 12: A possible scenario
- Run a small (0.1%) production test
- For each event in the full input collection
  - Compare the input configuration (and the actual list of materialized data products) with the output configuration from the test
  - Produce a list of probable data products to materialize and their cost
  - For each data product get the actual dependencies from the production test
  - Compile the list of input data products
  - Map each input data product to a logical file
  - Compile the list of required input files
- Rearrange and split the input collection according to file location and cost of materialization
- Send jobs
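
The steps of this scenario can be sketched end to end as follows. This is a simplified Python illustration, not the COBRA application: the event records, the dependency table (assumed to come from the production test), the file catalogue and the site table are hypothetical structures, and the split uses file location only, leaving out the cost of materialization.

```python
import random
from collections import defaultdict


def sample_events(event_ids, fraction=0.001, seed=0):
    """Choose the events for the small production test (at least one event)."""
    ids = list(event_ids)
    count = max(1, round(len(ids) * fraction))
    return random.Random(seed).sample(ids, count)


def plan_event(present, wanted, dependencies):
    """Return (products to materialize, input products to read) for one event.

    present:      products already materialized in this event
    wanted:       products required by the output configuration
    dependencies: product -> products it dereferences, as seen in the test
    """
    to_materialize, inputs = set(), set()
    pending = [p for p in wanted if p not in present]
    while pending:
        product = pending.pop()
        if product in to_materialize:
            continue
        to_materialize.add(product)
        for dep in dependencies.get(product, ()):
            if dep in present:
                inputs.add(dep)          # read from an existing file
            else:
                pending.append(dep)      # has to be materialized as well
    return to_materialize, inputs


def split_collection(events, wanted, dependencies, site_of):
    """Group event ids by the set of sites holding their required input files."""
    jobs = defaultdict(list)
    for event_id, info in events.items():
        _, inputs = plan_event(info["present"], wanted, dependencies)
        files = {info["files"][product] for product in inputs}
        jobs[frozenset(site_of[f] for f in files)].append(event_id)
    return dict(jobs)


# Hypothetical inputs: dependencies as they might be recorded by the production test.
dependencies = {"jets": {"tracks", "clusters"}, "tracks": {"hits"}}
events = {
    1: {"present": {"hits", "clusters"},
        "files": {"hits": "raw_a", "clusters": "reco_a"}},
    2: {"present": {"hits", "clusters", "tracks"},
        "files": {"hits": "raw_b", "clusters": "reco_b", "tracks": "reco_b"}},
}
site_of = {"raw_a": "T0", "reco_a": "T1", "raw_b": "T0", "reco_b": "T1"}
jobs = split_collection(events, ["jets"], dependencies, site_of)
print(jobs)
```

In this toy input, event 1 lacks "tracks" and therefore needs its raw hits from a second site, while event 2 can be served from a single site; the two events end up in different jobs even though they request the same output.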

Slide 13: Conclusion
CMS can already provide today a COBRA application that, by
- running 0.1% of the production
- processing all event headers and metadata
produces, for each event, the list of the most probable data products to materialize and the logical files to access.