A tool to enable CMS Distributed Analysis


1 CRAB: A tool to enable CMS Distributed Analysis

2 CMS overview (in brief)
CMS will produce a large amount of data (events):
~2 PB/year (assuming a startup luminosity of 2x10^33 cm^-2 s^-1)
All events are stored in files: O(10^6) files/year
Files are grouped into Fileblocks: O(10^3) Fileblocks/year
Fileblocks are grouped into Datasets: O(10^3) Datasets in total after 10 years of CMS
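To put these orders of magnitude together, a back-of-the-envelope calculation (a sketch only, using the round figures quoted above):

    # Rough scale estimate from the round figures above (illustrative only).
    PB = 1e15  # bytes

    data_per_year = 2 * PB        # ~2 PB/year at startup luminosity
    files_per_year = 1e6          # O(10^6) files/year
    fileblocks_per_year = 1e3     # O(10^3) Fileblocks/year

    avg_file_size_gb = data_per_year / files_per_year / 1e9
    files_per_fileblock = files_per_year / fileblocks_per_year

    print(f"average file size: ~{avg_file_size_gb:.0f} GB")      # ~2 GB
    print(f"files per Fileblock: ~{files_per_fileblock:.0f}")    # ~1000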

3 CMS Computing Model
[Diagram: recorded data flow from the online system and offline farm at the CERN computer centre (Tier 0) to Tier 1 regional centres (e.g. Italy Regional Center, Fermilab, France), then to Tier 2 centres, Tier 3 institutes and finally user workstations.]
The CMS offline computing system is arranged in four Tiers and is geographically distributed.

4 So what?
A large amount of data to be analyzed
A large community of physicists who want to access the data
Many distributed sites where the data will be stored

5 Help!
WLCG, the Worldwide LHC Computing Grid: a distributed computing environment
Two main flavours: LCG/gLite in Europe, OSG in the US
CRAB: a Python tool which helps the user build, manage and control analysis jobs over grid environments

6 Typical user analysis workflow
The user writes his/her own analysis code, starting from the CMS-specific analysis software, and builds the executable and libraries
He/she wants to apply the code to a given number of events, splitting the load over many jobs
But generally only local data can be accessed
So he/she has to write wrapper scripts and use a local batch system to exploit all the computing power
This is comfortable only as long as the data you are looking for sit right by your side
Then all the jobs have to be submitted by hand, and their status and overall progress checked
Finally all the output files have to be collected and stored somewhere
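A minimal sketch of what this by-hand approach amounts to, assuming a PBS-like local batch system reachable through qsub; the executable name, its command-line options and the event numbers are hypothetical placeholders:

    import os
    import subprocess

    TOTAL_EVENTS, N_JOBS = 100000, 10          # hypothetical workload
    EVENTS_PER_JOB = TOTAL_EVENTS // N_JOBS

    for i in range(N_JOBS):
        wrapper = f"job_{i}.sh"
        first_event = i * EVENTS_PER_JOB
        with open(wrapper, "w") as f:
            f.write("#!/bin/sh\n")
            # 'myAnalysis' and its options stand in for the user's real executable.
            f.write(f"./myAnalysis --first-event {first_event} "
                    f"--num-events {EVENTS_PER_JOB} --output out_{i}.root\n")
        os.chmod(wrapper, 0o755)
        # Submit the wrapper to the local batch system, one call per job.
        subprocess.run(["qsub", wrapper], check=True)

All of this bookkeeping (and its grid equivalent) is exactly what CRAB takes off the user's hands.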

7 CRAB main purposes
Makes it easy to create a large number of user analysis jobs, assuming all jobs are the same except for a few parameters (event range to be accessed, output file name, ...)
Allows distributed data to be accessed efficiently, hiding the WLCG middleware complications; all interactions are transparent to the end user
Manages job submission, tracking, monitoring and output harvesting, so the user does not have to deal with sometimes complicated grid commands
Leaves time to get a coffee ...
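The splitting idea can be sketched with a small helper (illustrative only, not CRAB's actual splitting code):

    def split_jobs(total_events, events_per_job):
        """Return one parameter set per job: the jobs are identical except
        for the event range they read and the output file they write."""
        jobs, first, job_id = [], 0, 1
        while first < total_events:
            n = min(events_per_job, total_events - first)
            jobs.append({"first_event": first,
                         "max_events": n,
                         "output_file": f"output_{job_id}.root"})
            first += n
            job_id += 1
        return jobs

    # Example: 25,000 events in chunks of 10,000 -> 3 jobs, the last one shorter.
    for params in split_jobs(25000, 10000):
        print(params)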

8 CRAB workflow
[Diagram: 1) data location: CRAB on the UI queries RefDb (DBS) and PubDb (DLS); 2) job preparation; 3) job submission through the Resource Broker (RB) to LCG/OSG Computing Elements (CEs), whose Worker Nodes (WNs) read the data from the local Storage Elements (SEs) via the local file catalog; 4) job status queries; 5) retrieval of the job output and log files back to the UI.]
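The five numbered steps can be pictured as the following loop; every function here is a stub standing in for a real CRAB or grid interaction, and the names and the dataset are made up for illustration:

    def locate_data(dataset):
        """Step 1: ask the data discovery services (RefDb/DBS, PubDb/DLS)
        which sites host the dataset."""
        return ["site_A", "site_B"]                    # placeholder answer

    def prepare_jobs(dataset, n_jobs):
        """Step 2: build the per-job wrapper scripts and grid job descriptions."""
        return [f"job_{i}" for i in range(n_jobs)]

    def submit(jobs, sites):
        """Step 3: hand the jobs to the Resource Broker, constrained to the hosting sites."""
        return list(jobs)

    def status(job):
        """Step 4: query the tracking database for the job state."""
        return "Done"                                  # placeholder answer

    def retrieve_output(job):
        """Step 5: bring stdout/stderr and the analysis output back to the UI."""
        print(f"retrieved output of {job}")

    dataset = "/MyDataset"                             # hypothetical dataset name
    jobs = submit(prepare_jobs(dataset, 3), locate_data(dataset))
    for job in jobs:
        if status(job) == "Done":
            retrieve_output(job)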

9 Main CRAB functionalities
Data discovery: data are distributed, so we need to know where they have been sent
Job creation: both the .sh (wrapper script for the real executable) and the .jdl (the script which drives the job towards the "grid"); user parameters are passed via a config file (executable name, output file names, specific executable parameters, ...)
Job submission: the scripts are sent to the sites which host the data; BOSS, the job submission and tracking tool, takes care of submitting the jobs to the Resource Broker
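As an illustration of the two files created per job, a much simplified sketch: the wrapper contents are placeholders, and the JDL attribute names follow common LCG usage rather than CRAB's actual output:

    job_id = 1

    # Minimal wrapper script: set up the environment, then run the user's executable.
    with open(f"crab_job_{job_id}.sh", "w") as sh:
        sh.write("#!/bin/sh\n")
        sh.write(f"./myAnalysis --output out_{job_id}.root\n")   # placeholder executable

    # Minimal JDL: tells the Resource Broker what to run and which files to move.
    with open(f"crab_job_{job_id}.jdl", "w") as jdl:
        jdl.write(f'Executable    = "crab_job_{job_id}.sh";\n')
        jdl.write(f'StdOutput     = "job_{job_id}.out";\n')
        jdl.write(f'StdError      = "job_{job_id}.err";\n')
        jdl.write(f'InputSandbox  = {{"crab_job_{job_id}.sh"}};\n')
        jdl.write(f'OutputSandbox = {{"job_{job_id}.out", "job_{job_id}.err", "out_{job_id}.root"}};\n')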

10 Main CRAB functionalities (cont'd)
CRAB monitors, via BOSS, the status of the whole submission; the user asks for the job status
When the jobs finish, CRAB retrieves all the output: both standard output/error and the relevant files produced by the analysis code
Either the job copies the output to an SE, or it brings it back to the UI
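The final output-handling choice can be sketched as follows; copy_to_storage_element and return_to_ui are placeholder names used for illustration, not CRAB functions:

    def copy_to_storage_element(path):
        """Option 1: stage the file out to the local Storage Element (e.g. via a grid copy command)."""
        print(f"would copy {path} to the local SE")

    def return_to_ui(path):
        """Option 2: ship the file back to the User Interface with the job output sandbox."""
        print(f"would return {path} to the UI")

    def handle_output(output_file, copy_to_se):
        if copy_to_se:
            copy_to_storage_element(output_file)
        else:
            return_to_ui(output_file)

    handle_output("out_1.root", copy_to_se=True)   # chosen per job, e.g. by output size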

11 So far (so good?)
CRAB is currently used to analyze data for the CMS Physics TDR (being written now ...)
[Plot: most accessed datasets since last July]
D. Spiga: CRAB Usage and jobs-flow Monitoring (DDA-252)

12 Some statistics
[Plots: CRAB jobs so far; most accessed sites since July '05]
D. Spiga: CRAB Usage and jobs-flow Monitoring (DDA-252)

13 CRAB usage during CMS SC3
CRAB has been extensively used to test the CMS T1 sites participating in SC3
The goal was to stress the computing facilities through the full analysis chain over all the distributed data
J. Andreeva: CMS/ARDA activity within the CMS distributed computing system (DDA-237)

14 CRAB (and CMS computing) evolves
CRAB needs to evolve to integrate with the new CMS computing components:
New data discovery components (DBS, DLS): under testing
New Event Data Model
New computing paradigm: integration into a set of services which manage jobs on behalf of the user, who interacts only with "light" clients

15 Conclusions
CRAB was born in April '05
A lot of work and effort has gone into making it robust, flexible and reliable
Users appreciate the tool and are asking for further improvements
CRAB has been used to analyze data for the CMS Physics TDR
CRAB is used to continuously test the CMS Tiers to prove the robustness of the whole infrastructure

16 Pointers
CRAB web page: links to documentation, tutorials and mailing lists
CRAB monitoring
ARDA monitoring for CRAB jobs

