Trains status & tests, M. Gheata


Trains status & tests M. Gheata

Train types run centrally
FILTERING
– Default trains for p-p and Pb-Pb, data and MC (4)
  Special configuration needs to be applied for the different cases (pre-configure stage)
  Std AOD, muons and vertexing
  Running automatically after reco
– On-demand filtering using special tender settings
  Generally for pass2
  Std AOD, muons and vertexing
– Quite stable code, to be moved into the Release
– Vertexing is cut dependent (needs many passes), to be done on demand as post-filtering from now on (?)
– Memory stable (~2 GB), but CPU and disk consuming
– A queue of trains according to priorities for non-default trains (?)
  Discussed and approved weekly
PWG trains
– Just started to be run centrally, finished tests
– Probably many trains per PWG needed, managed by the PWGs

PWG1 QA train
20 different modules, 1 to come (FMD)
– Heavy memory requirements (above 3 GB): cleanup needed, see next slides
– Train gets kicked out by some sites
Merging is heavy for runs with too many chunks
– Memory in the last merging step gets too big
– Process only 10% of chunks
Some new trigger classes used recently (CINT7, EMC7, …)
– QA train was set up for kMB; now we need to run it with kAnyInt plus other triggers
– All tasks requesting a special trigger should filter from the trigger mask and fill separate histograms (within the memory limit)
– Requirements described in Savannah bug #84007
To be run with the Release

Testing QA wagons
Prepare checking scripts to be run on a test machine
– Testing each QA wagon independently on a local dataset
– Checking is based on syswatch.root
– Future support in the AliEn plug-in for producing the scripts
Simple check and fit of the memory profiles extracted by the analysis manager
– Doing a linear fit in a specified range (stable regions) and extracting the memory value + leak
– Subtracting the baseline given by the EventSelection task + OCDB connect
– Not easy to set up automatically due to memory jumps

Most recent checks
Versions tested on pcaliense05
– Root v e
– AliRoot v AN
Run tested: from LHC11c
– 120K events
Wagons that did not finish the test:
– ZDC (segfault) – savannah #
Wagons checked in parallel (few hours)

Resident memory for the train
– Train run on 50 local files (~ twice the size of QA jobs)
– Some leaks make the memory go above 3 GB

Resident memory (1): [memory profile plots] BASELINE (PhysSel + CDBconnect), SDDdEdx, SDD, TPC (!), SPD, VERTEX, VZERO, QAsym

Resident memory (2): [memory profile plots] TRD, ITS, ITSsaTracks, ITSalign, CALO, MUONtrig, ImpParRes, MUON tracking

Resident memory (3): [memory profile plots] TOF, HMPID, T0, ZDC (segfault)

Train leak plot
– Fit parameters (p0 + p1*event) for all memory plots
– p0 should subtract the baseline (not plotted here)
– p1 (slope) plotted in MB per 100K events (2x what the train usually processes per job)
– Some leaks visible, at a level that does not affect the job (except TPC: more than 200 MB per 100K events)

Output size in memory
– Looping over the task directories in the output file, reading all keys into memory
– Taking the difference between the resident memory after and before reading the histograms into memory
– Sum is ~1.1 GB

Performance checks: CPU time, I/O
– Local dataset on a test machine; processing took 2 days for ~100K events
– valgrind --tool=callgrind aliroot -b -q QA.C
– 61.5% CPU, 38.5% I/O (the ratio is different if accessing remote files)

CPU performance per task
– ITS ImpParRes: 28.2%
– ITS VertexESD: 10.1%
– Friends: 10.4%
– QASym: 5.7%
– ITS align: 5.6%
– ITS tracking: 3.5%
ITS tasks use ~50% of the CPU time

Trains management
Trains can now be prepared and checked using the AliEn plug-in
Submission and management now done mostly by Costin and Latchezar
– We need a simple system that can be used by train administrators: start, stop, cleanup, input datasets
– Defining a queue of central non-default trains

Conclusions
– Checking macros for the QA train are now implemented
– Many central trains, and more to come: we need an automated assembly and checking procedure
– Make train administration easier and allow more people to use it