Analysis tools in ALICE


1 Analysis tools in ALICE
Andrei Gheata, Tier-1/2 Workshop, 24-26 January, KIT

2 Analysis tools in ALICE
Outline Data challenges for ALICE Data structures concerning analysis The analysis framework Analysis data flow & grid Analysis Perspectives A.Gheata, Tier-1/2 Workshop Analysis tools in ALICE

3 Analysis tools in ALICE
ALICE Data Challenges
ALICE is recording and storing an unprecedented amount of data
  A few PByte/year, taking replication into account
  Events range from kBytes to ~GByte
  Multiplicities of up to a few tens of thousands of tracks
  Very different access patterns, wide I/O ranges and TTL for jobs
All this data needs to be processed several times
  Large throughput, intensive I/O, disk space
We implemented a framework to handle all that and provided tools to make data handling and analysis easier for the users

4 Main data types in ALICE
[Diagram labels: Raw data; Conditions, Calibration, Alignment data; OCDB (updated by pass0-passN); AliRoot RECONSTRUCTION; Event Summary Data Pass1 (T0), Pass2 (T1), PassN (T1); filtering; AOD standard; AliEn FC; ESD; Analysis; Monte Carlo; + extra]
ESD: run/event numbers, trigger word, primary vertex, arrays of tracks/vertices, detector info
AOD standard: cleaned-up ESDs, reducing the size by a factor of 5
  Can be extended on user demand with extra information
ESD and AOD inherit from the same base class (keeping the same event interface)

5 Analysis tools in ALICE
Analysis data flow
[Diagram labels: Central analysis, user trains, PWG lego trains; Analysis results. Reconstruction pass N; AliESDs.root; T0, T1; QA; 1xT0, 2xT1; same job FILTERING; AliAOD.root; T0, T1, T2. Simulation; AliESDs.root, Kinematics.root; T0, T1, T2; QA; 3xT0/1/2; same job FILTERING; AliAOD.root]

6 PHYSICS WORKING GROUPS
PWG-PP Detector Performance: all analyses aiming to assess the quality of the data (both simulated and reconstructed) and of the general utilities (9 activities)
PWG-CF Correlations, fluctuations & bulk properties (4 activities)
PWG-DQ Dileptons and quarkonia (5 activities)
PWG-HF Heavy flavors (3 activities)
PWG-GA Photon and pion working group
PWG-LF Light flavor and spectra
PWG-JE Jets
PWG-UD

7 Towards organized analysis
Limited resources compared to the number of users/demands
  We have a democratic computing model, but a small component of chaotic access is unavoidable
  To make analysis efficient, we learned that certain rules and access patterns have to be enforced
The ALICE analysis framework was designed to maximize the CPU usage for the same amount of I/O
  By grouping as many analysis tasks as possible in the same job, within the available memory and time slot
We provided an analysis framework that sits between the user analysis algorithms and the existing back-ends
  Common access to data and CPU for a "train" of analysis tasks
  Develops a well-documented knowledge base and terminology within a uniform environment
  Optimizes CPU/IO usage and makes results reproducible
  Hides the complexity of the GRID and PROOF systems and balances usage of distributed resources
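
The I/O argument for a train can be illustrated with a minimal sketch. It does not use the ALICE framework: `EventSource`, `Task`, `runSeparately` and `runTrain` are hypothetical names introduced only for this example, and an "event" is reduced to a single number. The point is only that a train reads each event once for all tasks, while separate jobs read it once per task.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an input file: counts how often an event is read.
struct EventSource {
    std::vector<double> events;  // toy "events", one number each
    std::size_t reads = 0;       // number of read operations performed

    double read(std::size_t i) { ++reads; return events[i]; }
};

using Task = std::function<void(double)>;  // one analysis task ("wagon")

// One job per task: every task re-reads the full input.
void runSeparately(EventSource& src, const std::vector<Task>& tasks) {
    for (const Task& task : tasks)
        for (std::size_t i = 0; i < src.events.size(); ++i)
            task(src.read(i));
}

// Train: each event is read once and handed to all tasks.
void runTrain(EventSource& src, const std::vector<Task>& tasks) {
    for (std::size_t i = 0; i < src.events.size(); ++i) {
        const double event = src.read(i);
        for (const Task& task : tasks) task(event);
    }
}
```

With N tasks and M events, `runSeparately` performs N x M reads and `runTrain` performs M, for the same amount of task CPU work.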

8 Scheduled analysis producing AODs
[Diagram labels: ESD/AOD; Monte Carlo Truth; Acceptance and Efficiency Correction Services, tag selection; TASK 1, TASK 2, TASK …, TASK N; AOD]

9 AliAnalysisTask framework
Data-oriented model composed of independent tasks
  Able to process generic data types, but specialized for ROOT trees and ALICE-specific events (ESD, AOD, MC truth)
  Tasks connected via data containers (receive input data and publish results without direct dependencies)
Methods to be implemented by tasks inspired by the TSelector model
  Functionality provided for single and multi-event analysis
  Event loop steered by: TTree::Process()
Tasks are owned by a manager class
  Hides computing-scheme-dependent code (same approach for LOCAL, PROOF and GRID modes)
  Steers event processing in the task graph (the train model is the most commonly used, but it is just a trivial use case)
General design, steering components not bound to ALICE software
[Diagram: AliAnalysisTask with INPUT SLOT 0, INPUT SLOT 1 and OUTPUT SLOT 0, connected to CONTAINER 0, CONTAINER 1 and CONTAINER 2]
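
The idea of tasks that only know their containers, and not each other, can be sketched in a few lines. This is a toy model and not the AliAnalysisTask API: `Container`, `Task` and `runGraph` are hypothetical names, and a container holds a single number instead of a ROOT object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical data container: holds one value published by a producer.
struct Container {
    double value = 0.0;
    bool ready = false;
};

// Hypothetical task: reads from input containers, publishes to one output.
struct Task {
    std::vector<std::shared_ptr<Container>> inputs;  // input slots
    std::shared_ptr<Container> output;               // output slot 0
    std::function<double(const std::vector<double>&)> exec;

    // Runs only when all inputs are ready; returns true if it ran.
    bool tryRun() {
        std::vector<double> in;
        for (const auto& c : inputs) {
            if (!c->ready) return false;
            in.push_back(c->value);
        }
        output->value = exec(in);
        output->ready = true;
        return true;
    }
};

// Minimal manager: keeps sweeping until no further task can run.
// Returns the number of tasks that were executed.
int runGraph(std::vector<Task>& tasks) {
    std::vector<bool> done(tasks.size(), false);
    int executed = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            if (!done[i] && tasks[i].tryRun()) {
                done[i] = true;
                ++executed;
                progress = true;
            }
        }
    }
    return executed;
}
```

Because execution is driven by container readiness, the order in which tasks are registered does not matter. A train is the simple case where every task reads the same top-level input container.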

10 How analysis tasks work
[Diagram labels: AliAnalysisManager (TObjArray *fContainers, TObjArray *fTasks); AliAnalysisSelector; Chain->Process(). EVENT LOOP: top container, ESD chain, top-level tasks and containers ("train"): task1, task2, output1, output2. POST EVENT LOOP: fit task, task4, result, result]

11 Analysis tools in ALICE
Task life cycle
Client:
  Constructor: task object created and configured via API
  LocalInit(): optional initialization method, called once
Worker:
  CreateOutputObjects(): called once on each worker to book histograms and connect trees to files
  ConnectInputData(): called for each change of tree in the chain
  Exec(): main event processing method, called for each event
Client:
  Terminate(): called once on the client after the merged results are back
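
The call order of the life cycle can be sketched as follows. The method names are taken from the slide, but the classes are a toy model and not the AliRoot ones: `ToyTask`, `LoggingTask` and `runSession` are hypothetical, there is a single worker, and no merging takes place.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical task base class mirroring the method names on the slide.
class ToyTask {
public:
    virtual ~ToyTask() = default;
    virtual void LocalInit() {}
    virtual void CreateOutputObjects() {}
    virtual void ConnectInputData() {}
    virtual void Exec(int event) = 0;
    virtual void Terminate() {}
};

// Records every call so that the order can be inspected afterwards.
class LoggingTask : public ToyTask {
public:
    std::vector<std::string> log;
    void LocalInit() override { log.push_back("LocalInit"); }
    void CreateOutputObjects() override { log.push_back("CreateOutputObjects"); }
    void ConnectInputData() override { log.push_back("ConnectInputData"); }
    void Exec(int) override { log.push_back("Exec"); }
    void Terminate() override { log.push_back("Terminate"); }
};

// Drives one task through a single-worker session.
// `trees` holds the number of events in each tree (file) of the chain.
void runSession(ToyTask& task, const std::vector<int>& trees) {
    task.LocalInit();            // client side, once
    task.CreateOutputObjects();  // worker side, once per worker
    int event = 0;
    for (int nEvents : trees) {
        task.ConnectInputData();  // once per change of tree
        for (int i = 0; i < nEvents; ++i) task.Exec(event++);
    }
    task.Terminate();            // client side, once, after results are back
}
```

For a chain of two trees with 2 and 1 events, the log reads: LocalInit, CreateOutputObjects, ConnectInputData, Exec, Exec, ConnectInputData, Exec, Terminate.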

12 Analysis tools in ALICE
The overall picture
Virtual layers on top of the basic analysis objects (events, particles, tracks) AND data access via handlers, allowing generic analysis on different data types
[Class diagram labels. Manager and handlers: AliAnalysisManager, AliVEventHandler, AliESDInputHandler, AliAODInputHandler, AliMCEventHandler, AliAODHandler (Output). Data: AliVEvent, AliESDEvent (AliAODEvent), AliAODEvent, AliMCEvent, AliVParticle, AliESDtrack, AliAODTrack, AliMCParticle. Tasks: AliAnalysisTask, AliAnalysisTaskSE, UserANALYSISTask]

13 A transparent approach
MyAnalysis.C runs on MY MACHINE and produces MyResults.root in every mode:
Local: StartAnalysis("local")
PROOF: PROOF setup (gProof->UploadPackage("pack.par"), gProof->EnablePackage("pack"), ....), then StartAnalysis("proof")
GRID: AliEn setup, create + configure the grid plugin, then StartAnalysis("grid")

14 Analysis tools in ALICE
PROOF Analysis
PROOF stands for Parallel ROOT Facility
  Uses trivial event-based parallelism
  Forces user implementation of a TSelector-derived class
The framework comes with a special selector that streams a full analysis session to the PROOF cluster
  Completely transparent for users
The same local analysis can be run in PROOF with minor changes
  Connect to the PROOF cluster and upload/enable the user code
  StartAnalysis("proof")
This mode is in production in different ALICE AF centers (CAF and SKAF being the best known)
  Very good scalability for reasonable cluster load
  Used when fast response is needed

15 Analysis tools in ALICE
GRID Analysis
Submitting batch analysis directly in GRID is somewhat more difficult because the user has to:
  Create dataset(s), write a fully customized JDL, write executable and validation scripts, copy all dependency files to the AliEn FC, handle merging …
An AliEn plugin tool for the ALICE analysis framework was developed to:
  Keep the user at the ROOT prompt, allowing straightforward customization via a simple API
  Automate all interactions with AliEn: generate all needed files, submit the job and collect the results
Everything is implemented using the ROOT TGrid interface

16 Analysis tools in ALICE
No JDL, simple API

17 GRID analysis via plugin
[Diagram labels. CLIENT: MyAnalysis.C; Analysis Manager (AM); AM->StartAnalysis("grid"); AM->StartAnalysis("local"); AliEn grid plugin: SetGridDataDir(), AddRunNumber(), SetAdditionalLibs(), SetOutputFile(); MyAnalysis.jdl, AnalysisPlayer.C, Dataset.xml; TAlien; ALIEN UI; submit; File catalog. Worker nodes (WN): Analysis Manager with task1, task2, task3, taskN; Outputs; SE; Terminate(). Result: MyAnalysis.root]

18 Analysis plug-in evolved
Due to the high success of the AliEn plug-in, it was extended to also handle local and PROOF analysis modes -> analysis plug-in
An extra extension was recently implemented to allow generation and testing of analysis trains assembled via web forms (LEGO trains)
Practically all ALICE users are now using this tool to deploy their analysis in GRID
  Physics working groups also in the near future

19 Central analysis trains
The framework is used centrally by alidaq/alitrain power users to deploy general-interest analysis trains (reconstruction QA and ESD filtering)
  Also running common analyses such as centrality or event plane
Regular runs of these trains in GRID are scheduled together with reconstruction and via Savannah tasks
  Can accommodate custom requirements to some extent
The testing procedure for these trains is quite extensive, to minimize the number of jobs failing due to bugs or memory limits
In the near future all PWG production analysis will also run in such central trains
  Via the LEGO train framework (see next)

20 Analysis tools in ALICE
Checking train leaks
[Plot; axis label: MB]
Tests are run on each new tag/release on dedicated local machines
Each analysis wagon in the train is run separately, while its memory usage is sampled for each event
Memory profiles are fitted to extract the leaks
Savannah bugs are opened and assigned to the responsible developers, while the faulty code is temporarily disabled from production
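
One way to extract a leak from a sampled memory profile is a straight-line fit of memory against event number, where the slope is the growth per event. The slide does not say which fit function is used, so the linear model below is an assumption, and `LineFit`, `fitMemoryProfile`, `looksLeaky` and the threshold parameter are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct LineFit {
    double slope;      // MB per event
    double intercept;  // MB at event 0
};

// Ordinary least-squares fit of memoryMB[i] against the event index i.
// Requires at least two samples.
LineFit fitMemoryProfile(const std::vector<double>& memoryMB) {
    const double n = static_cast<double>(memoryMB.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < memoryMB.size(); ++i) {
        const double x = static_cast<double>(i);
        const double y = memoryMB[i];
        sx += x;
        sy += y;
        sxx += x * x;
        sxy += x * y;
    }
    const double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    const double intercept = (sy - slope * sx) / n;
    return {slope, intercept};
}

// Flags a wagon whose fitted growth exceeds a (hypothetical) threshold.
bool looksLeaky(const std::vector<double>& memoryMB, double maxMBPerEvent) {
    return fitMemoryProfile(memoryMB).slope > maxMBPerEvent;
}
```

A profile of 100, 100.5, 101, 101.5, 102 MB gives a slope of 0.5 MB/event, while a flat profile gives 0. A real profile usually has an initialization ramp, which would have to be excluded from the fit range.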

21 Testing individual output sizes
Looping over the task directories in the output file, reading all keys into memory
Taking the difference between the resident memory after and before reading the histograms into memory
Making sure that the output per task and the overall sum are below the established limits

22 Analysis tools in ALICE
Performance checks
[Plots: CPU time, I/O]
Valgrind checks on local datasets, profiling the main cycle consumers
Allows understanding the CPU/IO performed by the train

23 CPU performance per task
ITS ImpParRes: 28.2%
QASym: 5.7%
ITS VertexESD: 10.1%
Friends: 10.4%
ITS align: 5.6%
ITS tracking: 3.5%
Collecting info on the train balancing (CPU-wise) allows better grouping of the analysis tasks to be run

24 Support for lego trains
We added support for a self-consistent description of an analysis task, including possible dependencies on other tasks (physics selection, centrality, …)
  Embedded in a custom object (AliAnalysisTaskCfg), this makes lego-like components that can be used stand-alone to assemble trains
  Combined with the appropriate web-enabled train management and job submission tools, it makes a complete system to handle central analysis trains
Motivations:
  Automate validity and performance checks at individual task level
  Ease train assembly and submission
  Better balance the central maintenance effort with the PWG groups
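
Once each wagon declares its dependencies, a train can be assembled by ordering the wagons so that every wagon runs after the ones it depends on. The sketch below uses a depth-first topological sort. It is not the AliAnalysisTaskCfg implementation: `Wagon` and `assembleTrain` are hypothetical names, and the dependency of a centrality wagon on physics selection in the usage note is an assumption made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wagon description: a name plus the wagons it depends on.
struct Wagon {
    std::string name;
    std::vector<std::string> dependencies;
};

// Orders wagons so that each one comes after its dependencies
// (depth-first topological sort). Throws std::runtime_error if a
// dependency is unknown or if the dependencies form a cycle.
std::vector<std::string> assembleTrain(const std::vector<Wagon>& wagons) {
    std::map<std::string, const Wagon*> byName;
    for (const Wagon& w : wagons) byName[w.name] = &w;

    std::vector<std::string> order;
    std::set<std::string> done;
    std::set<std::string> inProgress;

    std::function<void(const std::string&)> visit = [&](const std::string& name) {
        if (done.count(name)) return;
        if (inProgress.count(name))
            throw std::runtime_error("cyclic dependency at " + name);
        auto it = byName.find(name);
        if (it == byName.end())
            throw std::runtime_error("unknown wagon " + name);
        inProgress.insert(name);
        for (const std::string& dep : it->second->dependencies) visit(dep);
        inProgress.erase(name);
        done.insert(name);
        order.push_back(name);
    };

    for (const Wagon& w : wagons) visit(w.name);
    return order;
}
```

For wagons UserA (depending on PhysSel and CentrSel), CentrSel (depending on PhysSel) and PhysSel, the assembled order is PhysSel, CentrSel, UserA, regardless of the order in which they were added.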

25 LEGO support in the analysis plug-in
[Diagram labels: TObjArray * AliAnalysisTaskCfg::ExtractModulesFrom("train.cfg"); plugin->AddModules(array); plugin->GenerateTest(); plugin->CopyLocalDataset; stress test; benchmarks; plugin->GenerateTrain()]

26 Configuration & Testing
[Diagram: train made of the wagons Base line, Phys Sel, Centr Sel, User A, User B, User C]
Train configuration
  New class AliAnalysisTaskCfg
  Contains description of wagons (add-task macro, libraries, dependencies)
Testing
  Uses the alientest04 machine
  Downloads AliEn packages (ROOT, AliRoot)
  Copies a part of the input data set to the local machine
  Runs tests per wagon
  Uses syswatch to extract mem/cpu information
  Also tests the "base line" task, which is empty

27 Workflow
[Diagram labels: User; Train operator; LPM; MonALISA; AliEn; Test machine; config; train files; test results]
1. User adds wagons
2. Composes train
3. Generates test files + executes test
4. Recompose after test
5. Generates train jdl + scripts
6. Runs train

28 Screenshot
[Screenshot panels: Handler configuration; Wagon configuration; Data configuration; Testing and running status]

29 Analysis tools in ALICE
Summary
ALICE developed a dedicated analysis framework to optimize efficiency and usage of GRID resources
  Core part very general, providing a template that formalizes the different analysis phases
  Allows running many analysis algorithms in one go to alleviate the I/O problem
  Virtualizes the base data structures and the data access to allow development of generic analysis code
Several efforts were made to develop and improve the analysis tools for the users
  To hide the specificities of the processing infrastructures
  To allow monitoring the performance of the analysis code
  To encourage clustering and central running of analysis code
The framework and data structures are now stable
  Moving to a new era in ALICE analysis, we are now focusing on the stability of the analysis code and on improving the tools that automate checking and deploying it

