Costin Grigoras ALICE Offline. In the period of steady LHC operation, The Grid usage is constant and high and, as foreseen, is used for massive RAW and.


1 Costin Grigoras ALICE Offline

2 In the period of steady LHC operation, Grid usage is constant and high and, as foreseen, the Grid is used for massive RAW and MC production and also (quite successfully) for end-user analysis. To help Grid users and administrators, many applications were developed in the early years of the Grid. ALICE has made an effort to consolidate all of these into a coherent set of monitoring and control tools. The following presentation is a quick overview of some of them. (2010-10-19, Consolidation of Grid operations)

3 Speed is of the essence: RAW reconstruction promptly follows data taking, allowing for immediate QA and physics analysis. LPM (Lightweight Production Manager): several triggers assure RAW and conditions data integrity; it is fully automatic; it also replicates RAW to T1; it manages not only Pass1, but all central RAW and MC productions and the organized analysis trains. Up to now, 360 production cycles have been handled by LPM.

4 Data processing jobs which must be launched only when a previous process has successfully completed. For example, the QA tasks are 'cascaded' after Pass1 RAW reco. is completed. The same holds for AOD production and data merging. The depth of cascading is unlimited. This considerably speeds up data production!
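
The cascading rule described on this slide (launch a task only after the task it depends on has succeeded) can be illustrated with a small sketch. This is not LPM code: the task names, the `DEPENDS_ON` table, and the `run` callback are hypothetical and only mirror the Reco, QA, AOD, and merging steps named in the talk.

```python
from collections import deque

# Hypothetical task names; each task lists the tasks it must wait for.
DEPENDS_ON = {
    "reco": [],
    "qa": ["reco"],
    "aod": ["reco"],
    "qa_merging": ["qa"],
    "aod_merging": ["aod"],
}


def cascade(depends_on, run):
    """Run every task once all of its prerequisites have succeeded.

    `run(task)` returns True on success. A failed task blocks everything
    cascaded after it. Returns the tasks that completed, in launch order.
    """
    done = []
    pending = deque(depends_on)
    progressed = True
    while pending and progressed:
        progressed = False
        for _ in range(len(pending)):
            task = pending.popleft()
            if all(dep in done for dep in depends_on[task]):
                if run(task):
                    done.append(task)
                    progressed = True
                # A failed task is dropped; its dependents stay blocked.
            else:
                pending.append(task)
    return done
```

Calling `cascade(DEPENDS_ON, lambda task: True)` launches all five tasks, while a failing `qa` task leaves `qa_merging` unlaunched without holding back the AOD branch.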

5 [Diagram labels: Reco. (1 job/chunk); QA (1 job/chunk); QA merging; Delete partial output; Merge ROOT tags; AOD (1 job/chunk); AOD merging; Delete partial output; Resubmit error jobs; When complete, start in parallel.] The same mechanism is also used for Monte Carlo productions and analysis trains on MC and RAW data.

6 Parallel productions are possible, with different weights / priorities. Branches can be temporarily disabled. Tasks can be simple JDLs or more complex code that prepares the execution (creating collections, checking conditions).
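
The talk does not say how LPM applies the weights, so the following is only one plausible reading: free job slots are shared among the enabled productions in proportion to integer weights, and disabled branches receive nothing. The function name, the production names, and the rounding rule are all assumptions made for illustration.

```python
def share_slots(free_slots, productions):
    """Split free job slots among enabled productions in proportion to weight.

    `productions` maps a (hypothetical) production name to a tuple
    (integer weight, enabled flag). Slots left over after integer
    division go to the heaviest productions first.
    """
    active = {
        name: weight
        for name, (weight, enabled) in productions.items()
        if enabled and weight > 0
    }
    total = sum(active.values())
    if total == 0:
        return {}
    shares = {name: free_slots * weight // total for name, weight in active.items()}
    leftover = free_slots - sum(shares.values())
    for name in sorted(active, key=active.get, reverse=True)[:leftover]:
        shares[name] += 1
    return shares
```

For example, `share_slots(10, {"raw_pass1": (3, True), "mc": (1, True), "train": (2, False)})` gives 8 slots to `raw_pass1`, 2 to `mc`, and none to the disabled `train` branch.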

7 Monitoring data (MonALISA) is used to trigger the LPM activity. New jobs are submitted when the number of waiting tasks falls below a threshold. Pre-staging of data from tape is triggered before the reconstruction jobs are submitted. Running jobs are tracked individually for resource usage. Automatic alerts are raised in case of unreasonable disk/memory/CPU consumption; jobs can be terminated…
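
As a rough sketch of the trigger logic on this slide, the function below decides what to do in one monitoring cycle: nothing while enough tasks are waiting, otherwise request staging for chunks still on tape and submit only chunks that are already staged. The function name, its parameters, and the chunk identifiers are hypothetical; the real LPM and MonALISA interfaces are not shown in the talk.

```python
def plan_cycle(waiting, threshold, chunks, staged, batch_size):
    """Decide the actions for one monitoring cycle.

    waiting    -- number of tasks currently waiting in the queue
    threshold  -- submit only when `waiting` is below this value
    chunks     -- ordered list of chunk identifiers still to process
    staged     -- set of chunks already available on disk
    batch_size -- maximum number of chunks considered per cycle

    Returns (to_stage, to_submit): chunks needing a pre-staging request
    and chunks whose jobs can be submitted now.
    """
    if waiting >= threshold:
        return [], []
    batch = chunks[:batch_size]
    to_submit = [chunk for chunk in batch if chunk in staged]
    to_stage = [chunk for chunk in batch if chunk not in staged]
    return to_stage, to_submit
```

With `plan_cycle(50, 100, ["a", "b", "c"], {"a"}, 2)` the sketch submits `a` and requests staging for `b`; `c` waits for a later cycle.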

8 Trigger now at 2 GB RSS. Mail is sent to both the admins and the user.
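
A minimal sketch of such a memory trigger is shown below. It only builds the alert (recipients and message) and sends nothing. The record keys `job_id`, `owner`, and `rss_kb` are hypothetical, and reading "2 GB" as 2 × 1024 × 1024 kB is an assumption.

```python
# 2 GB expressed in kB; the binary interpretation is an assumption.
RSS_LIMIT_KB = 2 * 1024 * 1024


def rss_alerts(jobs, admins):
    """Return (recipients, message) for every job whose RSS exceeds the limit.

    `jobs` is an iterable of dicts with the hypothetical keys
    `job_id`, `owner` and `rss_kb`. Both the admins and the job owner
    are listed as recipients, as described on the slide.
    """
    alerts = []
    for job in jobs:
        if job["rss_kb"] > RSS_LIMIT_KB:
            recipients = list(admins) + [job["owner"]]
            rss_gb = job["rss_kb"] / 1024**2
            message = f"Job {job['job_id']} uses {rss_gb:.1f} GB RSS"
            alerts.append((recipients, message))
    return alerts
```

A job exactly at the limit does not trigger an alert in this sketch; only jobs above it do.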

9 A client-to-storage metric allows the automatic discovery of the closest (working) storage elements from every job. Based on: the network topology information collected by MonALISA; continuous functional tests of storages; SE occupancy status. Users specify the number of output replicas and the type of storage (disk, custodial), but not the SEs.
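
The selection described here can be sketched as a filter followed by a sort: keep the storage elements that pass the functional tests, match the requested type, and still have free space, then take the closest ones. The dictionary keys, the numeric `distance` value, and the SE names are hypothetical; the actual metric used by AliEn and MonALISA is not given in the talk.

```python
def pick_storage(elements, replicas, kind):
    """Return the names of the `replicas` closest usable storage elements.

    `elements` is a list of dicts with the hypothetical keys:
      name          -- SE identifier
      kind          -- "disk" or "custodial"
      working       -- result of the latest functional test
      free_fraction -- share of the SE that is still free (0.0 to 1.0)
      distance      -- client-to-storage metric, smaller means closer
    """
    candidates = [
        se
        for se in elements
        if se["working"] and se["kind"] == kind and se["free_fraction"] > 0
    ]
    candidates.sort(key=lambda se: se["distance"])
    return [se["name"] for se in candidates[:replicas]]
```

The user-facing inputs are just `replicas` and `kind`, which matches the point on the slide that users never name the SEs themselves.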

10 [Figure panels: France, Italy, Nordic Countries, Russia, USA]

11 Web-based access to the AliEn catalogue (with certificate authentication). Insert your favorite plugin (ROOT) here.

12 Viewer with syntax highlighting and catalogue links. SE discovery syntax is highlighted.

13 Full job tracking, with submission and resubmission capabilities.

14 Detailed view of a particular masterjob. All trace logs can be accessed online.

15 The Grid has been in full production mode for almost one year. Its operation is very successful, providing millions of CPU days and PBs of storage. To efficiently use these resources, consolidated tools

16 http://alimonitor.cern.ch/

