Analysis tools in ALICE


Andrei Gheata, Tier-1/2 Workshop, 24-26 January 2012, KIT

Outline
- Data challenges for ALICE
- Data structures concerning analysis
- The analysis framework
- Analysis data flow & grid
- Analysis tools @GRID
- Perspectives
A.Gheata, Tier-1/2 Workshop, Analysis tools in ALICE

ALICE Data Challenges
- ALICE is recording and storing an unprecedented amount of data
  - A few PByte/year, taking replication into account
- We deal with events ranging from kBytes to ~GByte
  - Multiplicities of up to a few tens of thousands of tracks
  - Very different access patterns, wide I/O ranges and TTL for jobs
- All this data needs to be processed several times
  - Large throughput, intensive I/O, disk space
- We implemented a framework to handle all that and provided tools to make data handling and analysis easier for the users

Main data types in ALICE
[Diagram labels: Raw data; AliRoot RECONSTRUCTION; OCDB (conditions, calibration, alignment data; updated by pass0-passN); Event Summary Data Pass1 (T0), Pass2 (T1), PassN (T1); Monte Carlo; filtering; AOD standard + extra; AliEn FC; ESD; Analysis]
- ESD: run/event numbers, trigger word, primary vertex, arrays of tracks/vertices, detector info
- AOD standard: cleaned-up ESDs, reducing the size by a factor of 5
  - Can be extended on user demand with extra information
- ESD and AOD inherit from the same base class (keeping the same event interface)

Analysis data flow
[Diagram labels, in slide order]
- Central analysis, user trains, PWG lego trains
- Reconstruction pass N; AliESDs.root (T0, T1); QA; 1xT0, 2xT1; Analysis results; same job: FILTERING; AliAOD.root (T0, T1, T2)
- Simulation; AliESDs.root, Kinematics.root (T0, T1, T2); QA; 3xT0/1/2; same job: FILTERING; AliAOD.root

PHYSICS WORKING GROUPS
- PWG-PP, Detector Performance: all analyses aiming to assess the quality of the data (both simulated and reconstructed) and of the general utilities (9 activities)
- PWG-CF, Correlations, fluctuations & bulk properties (4 activities)
- PWG-DQ, Dileptons and quarkonia (5 activities)
- PWG-HF, Heavy flavors (3 activities)
- PWG-GA, Photon and pion working group
- PWG-LF, Light flavor and spectra
- PWG-JE, Jets
- PWG-UD

Towards organized analysis
- Limited resources compared to the number of users/demands
  - We have a democratic computing model, but a small component of chaotic access is unavoidable
  - To make analysis efficient, we learned that certain rules and access patterns have to be implemented
- The ALICE analysis framework was designed to maximize the CPU usage for the same amount of I/O
  - By grouping in the same job as many analysis tasks as possible within the available memory and time slot
- We provided an analysis framework that sits between the user analysis algorithms and the existing back-ends
  - Common access to data and CPU for a "train" of analysis tasks
  - Develops a well-documented knowledge base and terminology within a uniform environment
  - Optimizes CPU/IO usage and makes results reproducible
  - Hides the complexity of the GRID and PROOF systems and balances usage of distributed resources
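The train idea on this slide (many tasks sharing one pass over the data) can be illustrated with a minimal standalone sketch. It does not use ROOT or AliRoot; `Event`, `Wagon` and `runTrain` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an event; real ALICE events are far richer.
struct Event {
    int multiplicity;  // number of tracks in the event
};

// A "wagon" is any callable that consumes one event.
using Wagon = std::function<void(const Event&)>;

// Read every event once and hand it to all wagons.
// Returns the number of event reads, which equals the number of events
// no matter how many wagons ride the train.
std::size_t runTrain(const std::vector<Event>& events,
                     const std::vector<Wagon>& wagons) {
    std::size_t reads = 0;
    for (const Event& ev : events) {
        ++reads;  // one (simulated) I/O operation per event
        for (const Wagon& w : wagons) w(ev);
    }
    return reads;
}
```

Running the same wagons as separate jobs would repeat the read loop once per wagon, which is the I/O cost the train is meant to avoid.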

Scheduled analysis producing AODs
[Diagram labels: Acceptance and Efficiency Correction Services, tag selection; TASK 1, TASK 2, TASK …, TASK N; AOD; ESD/AOD; Monte Carlo Truth]

AliAnalysisTask framework
- Data-oriented model composed of independent tasks
  - Able to process generic data types, but specialized for ROOT trees and ALICE-specific events (ESD, AOD, MC truth)
  - Tasks are connected via data containers (they receive input data and publish results without direct dependencies)
  - Methods to be implemented by tasks are inspired by the TSelector model
- Functionality provided for single- and multi-event analysis
  - Event loop steered by TTree::Process()
- Tasks are owned by a manager class
  - Hides computing-scheme-dependent code (same approach for LOCAL, PROOF and GRID modes)
  - Steers event processing in the task graph (the train model is the commonly used one, but it is just a trivial use case)
- General design, steering components not bound to ALICE software
[Diagram labels: AliAnalysisTask; INPUT SLOT 0, INPUT SLOT 1, OUTPUT SLOT 0; CONTAINER 0, CONTAINER 1, CONTAINER 2]

How analysis tasks work
[Diagram labels: AliAnalysisManager (TObjArray *fContainers, TObjArray *fTasks); AliAnalysisSelector; Chain->Process(); EVENT LOOP: top container (ESD chain), top-level tasks and containers ("Train"), task1, task2, output1, output2; POST EVENT LOOP: Task Fit, task4, result, result]

Task life cycle
- Client:
  - Constructor: task object created and configured via API
  - LocalInit(): optional initialization method, called once
- Worker:
  - CreateOutputObjects(): called once on each worker to book histograms and connect trees to files
  - ConnectInputData(): called for each change of tree in the chain
  - Exec(): main event processing method, called for each event
- Terminate(): called once on the client after the merged results are back
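The calling order described on this slide can be sketched with a toy task that only records which hook was called. The method names follow the slide, but `ToyTask` and `runLifeCycle` are hypothetical; nothing here derives from the real AliAnalysisTask, and a single worker without a merging step is assumed.

```cpp
#include <string>
#include <vector>

// Toy task: records the life-cycle hooks in the order they are called.
class ToyTask {
public:
    std::vector<std::string> calls;  // hook names, in call order
    int nEvents = 0;                 // number of Exec() calls
    void LocalInit()           { calls.push_back("LocalInit"); }
    void CreateOutputObjects() { calls.push_back("CreateOutputObjects"); }
    void ConnectInputData()    { calls.push_back("ConnectInputData"); }
    void Exec()                { ++nEvents; }
    void Terminate()           { calls.push_back("Terminate"); }
};

// Drive one task over a chain of trees; treeSizes[i] is the number of
// events in the i-th tree.
void runLifeCycle(ToyTask& task, const std::vector<int>& treeSizes) {
    task.LocalInit();             // client side, once
    task.CreateOutputObjects();   // worker side, once per worker
    for (int n : treeSizes) {
        task.ConnectInputData();  // on each change of tree
        for (int i = 0; i < n; ++i) task.Exec();  // once per event
    }
    task.Terminate();             // client side, after results are back
}
```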

The overall picture
- Virtual layers on top of the basic analysis objects (events, particles, tracks) and data access via handlers, allowing generic analysis on different data types
[Diagram labels: Tasks: AliAnalysisManager, AliAnalysisTask, AliAnalysisTaskSE, UserANALYSISTask; handlers: AliVEventHandler, AliESDInputHandler, AliAODInputHandler, AliMCEventHandler, AliAODHandler (Output); Data: AliVEvent, AliESDEvent (AliAODEvent), AliAODEvent, AliMCEvent, AliVParticle, AliESDtrack, AliAODTrack, AliMCParticle]

A transparent approach
- MY MACHINE: MyAnalysis.C, MyResults.root
    StartAnalysis("local")
- + PROOF SETUP
    TProof::Open("user@lxb6046")
    gProof->UploadPackage("pack.par")
    gProof->EnablePackage("pack")
    ....
    StartAnalysis("proof")
- + AliEn SETUP
    CREATE + CONFIGURE GRID PLUGIN
    StartAnalysis("grid")

PROOF Analysis
- PROOF stands for Parallel ROOT Facility
  - Uses trivial event-based parallelism
  - Forces user implementation of a TSelector-derived class
- The framework comes with a special selector streaming a full analysis session to the PROOF cluster
  - Completely transparent for users
- The same local analysis can be run in PROOF with minor changes
  - Connect to the PROOF cluster and upload/enable the user code
  - StartAnalysis("proof")
- This mode is in production in different ALICE AF centers (CAF and SKAF being the best known)
  - Very good scalability for reasonable cluster load
  - Used when fast response is needed
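Event-based parallelism works because events are independent: each worker can process its own slice, and the partial results are merged afterwards. The sketch below simulates this sequentially in plain C++, with no PROOF or ROOT calls. `Histogram`, `processSlice` and `runParallelSketch` are hypothetical names, and multiplicities are assumed to be non-negative with `nBins` and `nWorkers` greater than zero.

```cpp
#include <cstddef>
#include <vector>

using Histogram = std::vector<long>;  // bin contents

// Fill a histogram of event multiplicities for the slice [begin, end).
Histogram processSlice(const std::vector<int>& mult, std::size_t begin,
                       std::size_t end, std::size_t nBins) {
    Histogram h(nBins, 0);
    for (std::size_t i = begin; i < end; ++i) {
        std::size_t bin = static_cast<std::size_t>(mult[i]);
        if (bin >= nBins) bin = nBins - 1;  // overflow goes to the last bin
        ++h[bin];
    }
    return h;
}

// Split the events into nWorkers contiguous slices, process each slice
// independently, then merge by adding bin contents.
Histogram runParallelSketch(const std::vector<int>& mult,
                            std::size_t nWorkers, std::size_t nBins) {
    Histogram merged(nBins, 0);
    const std::size_t n = mult.size();
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t begin = n * w / nWorkers;
        const std::size_t end = n * (w + 1) / nWorkers;
        const Histogram part = processSlice(mult, begin, end, nBins);
        for (std::size_t b = 0; b < nBins; ++b) merged[b] += part[b];
    }
    return merged;
}
```

Because merging is a plain bin-wise sum, the result does not depend on the number of workers, which is what lets the same analysis run unchanged in local and parallel modes.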

GRID Analysis
- Submitting batch analysis directly to the GRID is somewhat more difficult because the user has to:
  - Create dataset(s), write a fully customized JDL, write executable and validation scripts, copy all dependency files to the AliEn FC, handle merging …
- An AliEn plugin tool for the ALICE analysis framework was developed to:
  - Keep the user at the ROOT prompt, allowing straightforward customization via a simple API
  - Automate all interactions with AliEn: generate all needed files, submit the job and collect the results
  - Everything is implemented using the ROOT TGrid interface

No JDL, simple API

GRID analysis via plugin
[Diagram labels: CLIENT: MyAnalysis.C, Analysis Manager (task1, task2, task3, taskN), AM->StartAnalysis("grid"), AM->StartAnalysis("local"), Terminate(), MyAnalysis.root; AliEn grid plugin: SetGridDataDir(), AddRunNumber(), SetAdditionalLibs(), SetOutputFile(); TAlien; ALIEN UI; generated files: MyAnalysis.jdl, AnalysisPlayer.C, Dataset.xml; submit; File catalog; WN: AM, Analysis Manager, Outputs, Terminate(); SE: Outputs]

Analysis plug-in evolved
- Due to the great success of the AliEn plug-in, it was extended to also handle local and PROOF analysis modes -> analysis plug-in
- A further extension was recently implemented to allow generation and testing of analysis trains assembled via web forms (LEGO trains)
- Practically all ALICE users now use this tool to deploy their analysis in GRID
  - Physics working groups will also do so in the near future

Central analysis trains
- The framework is used centrally by alidaq/alitrain power users to deploy general-interest analysis trains (reconstruction QA and ESD filtering)
  - Also running common analyses such as centrality or event plane
- Regular runs of these trains in GRID are scheduled together with reconstruction and via Savannah tasks
  - Can accommodate custom requirements to some extent
- The testing procedure for these trains is quite extensive, to minimize the number of jobs failing due to bugs or hitting memory limits
- In the near future all PWG production analysis will also run in such central trains
  - Via the LEGO train framework (see next)

Checking train leaks
- Tests are run on each new tag/release on dedicated local machines
- Each analysis wagon in the train is run separately, while its memory usage is sampled for each event
- Memory profiles are fitted to extract the leaks
- Savannah bugs are opened and assigned to those responsible, while the faulty code is temporarily disabled from production
[Plot axis label: MB]

Testing individual output sizes
- Looping over the task directories in the output file, reading all keys into memory
- Taking the difference between the resident memory after and before reading the histograms into memory
- Making sure that the output per task and the overall sum are below established limits
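The limit check itself can be sketched as follows, assuming the per-task sizes have already been measured as resident-memory differences. The function names and the limit values are hypothetical; the actual limits used in ALICE are not given on the slide.

```cpp
#include <map>
#include <string>
#include <vector>

// Size attributed to one task: resident memory after reading its output
// minus resident memory before (both in MB).
double outputSizeMB(double rssBeforeMB, double rssAfterMB) {
    return rssAfterMB - rssBeforeMB;
}

// Return the names of tasks whose output exceeds perTaskLimitMB, in
// alphabetical order; append "TOTAL" if the sum over all tasks exceeds
// totalLimitMB. An empty result means every check passed.
std::vector<std::string> checkOutputSizes(
        const std::map<std::string, double>& sizeMB,
        double perTaskLimitMB, double totalLimitMB) {
    std::vector<std::string> offenders;
    double total = 0.0;
    for (const auto& entry : sizeMB) {
        total += entry.second;
        if (entry.second > perTaskLimitMB) offenders.push_back(entry.first);
    }
    if (total > totalLimitMB) offenders.push_back("TOTAL");
    return offenders;
}
```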

Performance checks
- Valgrind checks on local datasets, profiling the main cycle consumers
- Allows understanding the CPU/IO performed by the train
[Plot labels: CPU time, I/O]

CPU performance per task
[Chart labels: ITS ImpParRes 28.2%, QASym 5.7%, ITS VertexESD 10.1%, Friends 10.4%, ITS align 5.6%, ITS tracking 3.5%]
- Collecting info on the train balancing (CPU-wise) allows better grouping of the analysis tasks to be run

Support for lego trains
- We added support for a self-consistent description of an analysis task, including possible dependencies on other tasks (physics selection, centrality, …)
  - Embedded in a custom object (AliAnalysisTaskCfg), this makes lego-like components that can be used stand-alone to assemble trains
- Combined with the appropriate web-enabled train management and job submission tools, it makes a complete system to handle central analysis trains
- Motivations:
  - Automate validity and performance checks at the individual task level
  - Ease train assembly and submission
  - Better balance the central maintenance effort with the PWG groups
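Once each wagon declares its dependencies, a train can be assembled by ordering the wagons so that every dependency runs first. The sketch below uses a standard topological sort (Kahn's algorithm) for this; it is not taken from AliAnalysisTaskCfg, whose internals are not shown in the slides. `WagonCfg` and `orderWagons` are hypothetical, and wagon names are assumed to be unique.

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical wagon description: a name plus the names it depends on.
struct WagonCfg {
    std::string name;
    std::vector<std::string> deps;
};

// Order wagons so that each one comes after its dependencies.
// Returns an empty vector if a dependency is unknown or if the
// dependencies form a cycle.
std::vector<std::string> orderWagons(const std::vector<WagonCfg>& wagons) {
    std::map<std::string, int> pending;  // unmet dependencies per wagon
    std::map<std::string, std::vector<std::string>> dependents;
    for (const WagonCfg& w : wagons) pending[w.name] = 0;
    for (const WagonCfg& w : wagons) {
        for (const std::string& d : w.deps) {
            if (pending.find(d) == pending.end()) return {};  // unknown
            ++pending[w.name];
            dependents[d].push_back(w.name);
        }
    }
    std::queue<std::string> ready;  // wagons with all dependencies met
    for (const WagonCfg& w : wagons)
        if (pending[w.name] == 0) ready.push(w.name);
    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::string name = ready.front();
        ready.pop();
        order.push_back(name);
        for (const std::string& next : dependents[name])
            if (--pending[next] == 0) ready.push(next);
    }
    if (order.size() != wagons.size()) return {};  // cycle detected
    return order;
}
```

For example, a user wagon depending on physics selection and centrality is placed after both, whatever order the wagons were registered in.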

LEGO support in the analysis plug-in
[Diagram labels: plugin->CopyLocalDataset; Stress test; Benchmarks; plugin->GenerateTest(); plugin->GenerateTrain(); TObjArray *AliAnalysisTaskCfg::ExtractModulesFrom("train.cfg"); plugin->AddModules(array)]

Configuration & Testing
[Diagram labels: Train: Base line, Phys Sel, Centr Sel, User A, User B, User C]
- Configuration
  - New class AliAnalysisTaskCfg
  - Contains the description of wagons (add-task macro, libraries, dependencies)
- Testing
  - Uses the alientest04 machine
  - Downloads AliEn packages (ROOT, AliRoot)
  - Copies a part of the input data set to the local machine
  - Runs tests per wagon
  - Uses syswatch to extract mem/cpu information
  - Also tests a "base line" task, which is empty

Workflow
1. User adds wagons
2. Composes train
3. Generates test files + executes test
4. Recompose after test
5. Generates train jdl + scripts
6. Runs train
[Diagram labels: User; Train operator; LPM; MonALISA; AliEn; config; Test machine; test results; train files]

Screenshot
[Screenshot panels: Handler configuration; Wagon configuration; Data configuration; Testing and running status]

Summary
- ALICE developed a dedicated analysis framework to optimize efficiency and usage of GRID resources
  - Core part very general, providing a template formalizing the different analysis phases
  - Allows running many analysis algorithms in one go to alleviate the I/O problem
  - Virtualizes the base data structures and the data access to allow development of generic analysis code
- Several efforts were made to develop and improve the analysis tools for the users
  - To hide the specificities of the processing infrastructures
  - To allow monitoring the performance of the analysis code
  - To encourage clustering and central running of analysis code
- The framework and data structures are now stable
  - Moving to a new era in ALICE analysis, we are now focusing on the stability of the analysis code and on improving the tools that automate checking and deploying it