CRAB: A tool to enable CMS Distributed Analysis

CMS (fast) overview
- CMS will produce a large amount of data (events): ~2 PB/year (assumes startup luminosity 2x10^33 cm^-2 s^-1)
- All events will be stored in files: O(10^6) files/year
- Files will be grouped in Fileblocks: O(10^3) Fileblocks/year
- Fileblocks will be grouped in Datasets: O(10^3) Datasets (total after 10 years of CMS)

CMS Computing Model
[Diagram: recorded data flows from the online system to the offline farm / Tier 0 at the CERN computer centre, then to Tier 1 regional centres (Italy Regional Center, Fermilab, France), to Tier 2 centres, and to Tier 3 institutes and workstations.]
The CMS offline computing system is arranged in four Tiers and is geographically distributed.

So what?
- A large amount of data to be analyzed
- A large community of physicists who want to access the data
- Many distributed sites where the data will be stored

Help!
- WLCG, the Worldwide LHC Computing Grid: a distributed computing environment
- Two main flavours: LCG/gLite in Europe, OSG in the US
- CRAB: a Python tool which helps the user build, manage and control analysis jobs over grid environments

Typical user analysis workflow
- The user writes his/her own analysis code, starting from the CMS-specific analysis software, and builds the executable and libraries
- He/she wants to apply the code to a given amount of events, splitting the load over many jobs
- But generally he/she is allowed to access only local data
- He/she has to write wrapper scripts and use a local batch system to exploit all the computing power; comfortable as long as the data you are looking for are sitting right by your side
- Then he/she has to submit everything by hand and check the status and overall progress
- Finally he/she has to collect all the output files and store them somewhere (a minimal sketch of this manual procedure is given below)
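The sketch below is not from the presentation: it is a purely illustrative Python fragment of the "by hand" procedure just described. The event counts, file names, executable name and the batch submission command (LSF's bsub) are all assumptions standing in for whatever the user's analysis and local batch system actually provide.

    import os
    import subprocess

    TOTAL_EVENTS = 1000000    # events the user wants to analyse (assumed)
    EVENTS_PER_JOB = 10000    # chosen split (assumed)

    n_jobs = TOTAL_EVENTS // EVENTS_PER_JOB
    for i in range(n_jobs):
        first_event = i * EVENTS_PER_JOB
        # One wrapper script per job: run the user executable on its slice
        # of events and give the output file a unique name.
        wrapper = f"job_{i}.sh"
        with open(wrapper, "w") as script:
            script.write("#!/bin/sh\n")
            script.write(f"./my_analysis --first-event {first_event} "
                         f"--max-events {EVENTS_PER_JOB} --out output_{i}.root\n")
        os.chmod(wrapper, 0o755)
        # Submit by hand to the local batch system (LSF's bsub, as an example).
        subprocess.run(["bsub", f"./{wrapper}"], check=True)
    # ...after which the user still has to poll each job's status and collect
    # every output file by hand.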

CRAB main purposes
- Makes it easy to create a large number of user analysis jobs; all jobs are assumed to be the same except for some parameters (events to be accessed, output file name, ...), which the user provides once in a configuration file (sketched below)
- Allows distributed data to be accessed efficiently, hiding the WLCG middleware complications; all interactions are transparent to the end user
- Manages job submission, tracking, monitoring and output harvesting; the user does not have to care about how to interact with sometimes complicated grid commands
- Leaves time to get a coffee...
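A hedged sketch of what such a configuration could look like is given here; the section and parameter names are only indicative of the kind of information CRAB expects (dataset, splitting, executable, output files) and the exact names depend on the CRAB version, so they should not be read as the tool's actual interface.

    [CRAB]
    jobtype    = orca        # kind of user application (illustrative value)
    scheduler  = edg         # grid middleware flavour to submit through (illustrative)

    [USER]
    dataset                 = my_signal_sample   # dataset to analyse (illustrative name)
    executable              = my_analysis        # executable built by the user beforehand
    total_number_of_events  = 100000             # events to process in total
    events_per_job          = 1000               # split: events handled by each job
    output_file             = output.root        # file to retrieve for every job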

CRAB workflow
[Diagram: from the User Interface, CRAB 1) locates the data by querying RefDB (DBS) and PubDB (DLS); 2) prepares the jobs; 3) submits them through the Resource Broker to LCG/OSG Computing Elements; 4) queries the job status; 5) retrieves the job output. Jobs run on Worker Nodes at the sites hosting the data, read it from the Storage Elements via the local file catalog, and return log files and job output.]
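In practice these five steps map onto a handful of CRAB commands issued on the UI. The sequence below is indicative only; the exact command names and options depend on the CRAB version installed.

    crab -create        # data discovery, job splitting, preparation of .sh/.jdl
    crab -submit        # submission of the prepared jobs, via BOSS, to the RB
    crab -status        # ask for the status of the whole submission
    crab -getoutput     # retrieve stdout/stderr and the analysis output files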

Main CRAB functionalities
- Data discovery: data are distributed, so we need to know where the data have been sent
- Job creation: both a .sh (wrapper script for the real executable) and a .jdl (a script which drives the real job towards the "grid"; see the sketch below); user parameters are passed via the config file (executable name, output file names, specific executable parameters, ...)
- Job submission: the scripts are ready to be sent to the sites which host the data; BOSS, the job submission and tracking tool, takes care of submitting the jobs to the Resource Broker
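For concreteness, here is a hedged sketch of the kind of JDL that could accompany the wrapper script; the attribute names follow the EDG/gLite JDL conventions, but the file names and the CMS software tag in the Requirements expression are assumptions, not values taken from the presentation.

    Executable     = "crab_wrapper.sh";
    Arguments      = "1";                 // index of this job within the task
    StdOutput      = "job_1.out";
    StdError       = "job_1.err";
    InputSandbox   = {"crab_wrapper.sh", "user_code.tgz"};
    OutputSandbox  = {"job_1.out", "job_1.err", "output_1.root"};
    Requirements   = Member("VO-cms-software-tag",
                            other.GlueHostApplicationSoftwareRunTimeEnvironment);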

Main CRAB functionalities (cont'd)
- CRAB monitors, via BOSS, the status of the whole submission; the user has to ask for the job status
- When the jobs finish, CRAB retrieves all the output: both standard output/error and the relevant files produced by the analysis code
- The job either copies the output to the SE or brings it back to the UI

So far (so good?)
CRAB is currently used to analyze data for the CMS Physics TDR (being written now...)
[Plot: most accessed datasets since last July]
(D. Spiga: CRAB Usage and jobs-flow Monitoring, DDA-252)

Some statistics
[Plots: CRAB jobs so far; most accessed sites since July '05]
(D. Spiga: CRAB Usage and jobs-flow Monitoring, DDA-252)

CRAB usage during CMS SC3
- CRAB has been extensively used to test the CMS T1 sites participating in SC3
- The goal was to stress the computing facilities through the full analysis chain over all the distributed data
(J. Andreeva: CMS/ARDA activity within the CMS distributed computing system, DDA-237)

CRAB (and CMS computing) evolves
- CRAB needs to evolve to integrate with the new CMS computing components:
  - New data discovery components (DBS, DLS): under testing
  - New Event Data Model
- New computing paradigm: integration into a set of services which manage jobs on behalf of the user, allowing him to interact only with "light" clients

Conclusions
- CRAB was born in April '05
- A lot of work and effort has gone into making it robust, flexible and reliable
- Users appreciate the tool and are asking for further improvements
- CRAB has been used to analyze data for the CMS Physics TDR
- CRAB is used to continuously test the CMS Tiers to prove the robustness of the whole infrastructure

Pointers
- CRAB web page: http://cmsdoc.cern.ch/cms/ccs/wm/www/Crab/ (links to documentation, tutorials and mailing lists)
- CRAB monitoring: http://cmsgridweb.pg.infn.it/crab/crabmon.php
- ARDA monitoring for CRAB jobs: http://www-asap.cern.ch/dashboard