PROOF and ALICE Analysis Facilities Arsen Hayrapetyan Yerevan Physics Institute, CERN.


PROOF
PROOF stands for Parallel ROOT Facility.
It allows parallel processing of large amounts of data.
The output results can be visualised directly (e.g. the output histogram can be drawn at the end of the PROOF session).
The data you process can reside on your computer disk (PROOF Lite), on PROOF cluster disks or on the grid.
The usage of PROOF is transparent
◦ The same code can be run locally and in a PROOF system (certain rules have to be followed)
PROOF is part of ROOT
ALICE Offline Tutorial, March

How does PROOF analysis work?
[Figure: a client (local PC) running root sends ana.C to a remote PROOF cluster. The PROOF master passes it to the PROOF slaves on node1 to node4; each slave runs ana.C in root on its data and returns a result, which reaches the client as stdout/result.]

Event based (trivial) Parallelism
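Event-based parallelism is called trivial because each event can be processed without looking at any other event, so the event list can simply be cut into pieces. The following is a minimal sketch of that idea in plain C++, not PROOF code: `splitEvents`, `processEvent` and `processRange` are hypothetical names, and the static, equal-sized split is an assumption made for brevity (it does not reproduce how PROOF actually hands out work).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [first, last)

// Hypothetical helper: split event indices [0, nEvents) into one contiguous
// range per worker; range sizes differ by at most one event.
std::vector<Range> splitEvents(std::size_t nEvents, std::size_t nWorkers) {
    std::vector<Range> ranges;
    if (nWorkers == 0) return ranges;
    const std::size_t base = nEvents / nWorkers;
    const std::size_t extra = nEvents % nWorkers;  // first `extra` workers get one more
    std::size_t first = 0;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t size = base + (w < extra ? 1 : 0);
        ranges.emplace_back(first, first + size);
        first += size;
    }
    return ranges;
}

// Toy per-event "analysis": the result depends on this event only.
long processEvent(std::size_t eventIndex) {
    return static_cast<long>(eventIndex % 7);
}

// What a single worker does with the range it was given.
long processRange(const Range& range) {
    long sum = 0;
    for (std::size_t i = range.first; i < range.second; ++i) {
        sum += processEvent(i);
    }
    return sum;
}
```

Because no event depends on another, adding up the per-range sums gives the same number as one pass over all events, whatever the number of workers.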

Terminology
Client: your machine running a ROOT session that is connected to a PROOF master
Master: PROOF machine coordinating work between slaves
Slave/Worker: PROOF machine that processes data
Query: a job submitted from the client to the PROOF system. A query consists of a selector and a chain
Selector: a class containing the analysis code. In ALICE we use the Analysis Framework, therefore an AliAnalysisTaskSE is sufficient
Chain: a list of files (trees) to process (more details later)

How to use PROOF
The analysis framework is used
◦ Files to be analyzed are put into a chain → TChain
◦ Analysis written as a task (already introduced in previous tutorial) → AliAnalysisTaskSE
◦ The same analysis as written previously can be used
If additional libraries are needed, these have to be distributed as a "package" (PAR: PROOF ARchive)
[Figure: Input Files (TChain) → Analysis (AliAnalysisTaskSE) → Output]

AliAnalysisTaskSE
Classes derived from AliAnalysisTaskSE can run locally, in PROOF and in AliEn
◦ "Constructor": once on your client
◦ UserCreateOutputObjects(): once on each slave
◦ ConnectInputData(): for each tree
◦ UserExec(): for each event
◦ Terminate(): once on your client
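The call frequencies listed for the task methods can be made concrete with a toy driver. This is a sketch in plain C++ under stated assumptions: `ToyTask` and `runToyQuery` are hypothetical and are not the real AliAnalysisTaskSE or the PROOF scheduler. They only borrow the method names so that the counters show how often each stage runs. For simplicity a single object counts all calls, whereas on a real cluster each worker holds its own copy of the task.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a task; NOT the real AliAnalysisTaskSE.
// Each method only counts how many times it was called.
struct ToyTask {
    int created = 0;     // calls to UserCreateOutputObjects()
    int connected = 0;   // calls to ConnectInputData()
    int executed = 0;    // calls to UserExec()
    int terminated = 0;  // calls to Terminate()

    void UserCreateOutputObjects() { ++created; }
    void ConnectInputData() { ++connected; }
    void UserExec() { ++executed; }
    void Terminate() { ++terminated; }
};

// Hypothetical driver: workers[w][t] is the number of events in tree t
// assigned to worker w.
ToyTask runToyQuery(const std::vector<std::vector<int>>& workers) {
    ToyTask task;  // construction: once, on the client
    for (const auto& trees : workers) {
        task.UserCreateOutputObjects();  // once per worker
        for (int nEvents : trees) {
            task.ConnectInputData();  // once per tree
            for (int e = 0; e < nEvents; ++e) {
                task.UserExec();  // once per event
            }
        }
    }
    task.Terminate();  // once, back on the client
    return task;
}
```

With two workers, one holding trees of 3 and 2 events and the other a tree of 4 events, the counters end at 2 (output creation), 3 (trees), 9 (events) and 1 (terminate).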

Class TTree
A tree is a container for data storage
It consists of several branches
◦ These can be in one or several files
◦ Branches are stored contiguously (split mode)
◦ When reading a tree, certain branches can be switched off → speed up of analysis when not all data is needed
Set of helper functions to visualize content (e.g. Draw, Scan)
Compressed
[Figure: a tree of points (x, y, z) is split into branches; in the file the values are stored per branch: xxxxxxxxxx yyyyyyyyyy zzzzzzzzzz]
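Why switching off branches speeds up reading can be seen from a toy model of split-mode storage. The sketch below is plain C++ and not ROOT's TTree: `ToyTree` and its members are hypothetical, and `setBranchStatus` is only named after the idea of enabling or disabling a branch. Each branch is one contiguous vector, and a full pass touches only the enabled ones.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy column-wise container; NOT ROOT's TTree.
struct ToyTree {
    // One contiguous vector per branch, as in split mode.
    std::map<std::string, std::vector<double>> branches;
    std::map<std::string, bool> enabled;

    void addBranch(const std::string& name, std::vector<double> values) {
        branches[name] = std::move(values);
        enabled[name] = true;  // branches start switched on
    }

    void setBranchStatus(const std::string& name, bool on) {
        if (branches.count(name) != 0) enabled[name] = on;
    }

    // Number of values a full pass over the tree has to read:
    // disabled branches are skipped entirely.
    std::size_t valuesRead() const {
        std::size_t n = 0;
        for (const auto& kv : branches) {
            const auto it = enabled.find(kv.first);
            if (it != enabled.end() && it->second) n += kv.second.size();
        }
        return n;
    }
};
```

For a tree with branches x, y and z of four entries each, a pass reads 12 values; with y and z switched off it reads 4.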

TChain
A chain is a list of trees (in several files)
Normal TTree functions can be used
◦ Draw(...), Scan(...) → these iterate over all elements of the chain
[Figure: Chain = Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3), Tree5 (File4)]
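A chain can behave like one long tree because a global entry number can always be translated into "which tree, which entry inside it". The following plain C++ sketch illustrates that bookkeeping. `ToyChain`, `add`, `entries` and `locate` are hypothetical names, not the TChain interface.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy chain; NOT ROOT's TChain. It stores only the entry count of each tree.
struct ToyChain {
    std::vector<std::size_t> treeSizes;

    void add(std::size_t nEntries) { treeSizes.push_back(nEntries); }

    // Total number of entries over all trees in the chain.
    std::size_t entries() const {
        std::size_t total = 0;
        for (std::size_t n : treeSizes) total += n;
        return total;
    }

    // Translate a global entry number into {tree index, entry within that tree}.
    std::pair<std::size_t, std::size_t> locate(std::size_t globalEntry) const {
        std::size_t remaining = globalEntry;
        for (std::size_t t = 0; t < treeSizes.size(); ++t) {
            if (remaining < treeSizes[t]) return {t, remaining};
            remaining -= treeSizes[t];  // skip past this tree
        }
        throw std::out_of_range("entry beyond end of chain");
    }
};
```

A loop from 0 to `entries()` that calls `locate` on each index visits every entry of every tree in order, which is the sense in which functions such as Draw or Scan iterate over all elements of a chain.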

Merging
The analysis runs on several slaves, therefore partial results have to be merged
Merging can be done in one of the following ways:
◦ On a few workers (submergers; their number and location are decided by PROOF) and, finally, on the master
◦ Directly on the master (not desirable in case of large output)
[Figure: Result from Slave 1 + Result from Slave 2 → Merge() → Final result]
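Both merging strategies give the same final result as long as the merge operation is a plain bin-by-bin sum. The sketch below shows this for toy histograms in standard C++. `Histogram`, `mergeAll` and `mergeInGroups` are hypothetical names, and the fixed group size is an assumption (in PROOF the number and location of submergers are chosen by the system).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Histogram = std::vector<long>;  // bin contents only

// Direct merge: add all partial histograms bin by bin.
Histogram mergeAll(const std::vector<Histogram>& partials) {
    Histogram total;
    for (const auto& h : partials) {
        if (total.empty()) {
            total = h;  // first histogram fixes the binning
            continue;
        }
        if (h.size() != total.size()) {
            throw std::invalid_argument("histograms have different binning");
        }
        for (std::size_t b = 0; b < h.size(); ++b) total[b] += h[b];
    }
    return total;
}

// Staged merge, first step: each group of `groupSize` partial results is
// merged on its own (the role of a submerger); the caller merges the group
// results afterwards (the role of the master).
std::vector<Histogram> mergeInGroups(const std::vector<Histogram>& partials,
                                     std::size_t groupSize) {
    if (groupSize == 0) throw std::invalid_argument("groupSize must be > 0");
    std::vector<Histogram> groupResults;
    for (std::size_t start = 0; start < partials.size(); start += groupSize) {
        const std::size_t end = std::min(start + groupSize, partials.size());
        const std::vector<Histogram> group(partials.begin() + start,
                                           partials.begin() + end);
        groupResults.push_back(mergeAll(group));
    }
    return groupResults;
}
```

Calling `mergeAll(mergeInGroups(partials, 2))` returns the same bins as `mergeAll(partials)`; the staged version only changes where the additions happen, which matters when the output is large.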

Workflow Summary
[Figure: Input chain (Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3), Tree5 (File4)) → Analysis (AliAnalysisTask) running in proof → Output → Merged Output]

Packages
PAR files: PROOF ARchive. Like Java jar
◦ Gzipped tar file
◦ PROOF-INF directory
    ◦ BUILD.sh, building the package, executed per slave
    ◦ SETUP.C, set environment, load libraries, executed per slave
API to manage and activate packages
◦ UploadPackage("package")
◦ EnablePackage("package")

CERN Analysis Facility
The CERN Analysis Facility (CAF) will run PROOF for ALICE
◦ Prompt analysis of pp data
◦ Pilot analysis of PbPb data
◦ Calibration & Alignment
Available to the whole collaboration, but the number of users will be limited for efficiency reasons
Design goals
◦ 500 CPUs
◦ 100 TB of selected data locally available

ALICE Analysis Facilities (AAF)
◦ CAF - CERN
◦ SKAF - Slovakia
◦ KiAF - Korea
◦ SAF - France (Subatech)
◦ LAF - France (CCIN2P3, Lyon)
◦ JRAF - Russia (JINR)
◦ TAF - Italy (Torino)

PROOF datasets
A dataset represents a list of files (e.g. physics run X)
◦ Correspondence between AliEn collection and PROOF dataset
Users register datasets
◦ The files contained in a dataset are automatically staged from AliEn (and kept available)
◦ Datasets are used for processing with PROOF
    ◦ Contain all relevant information to start processing (location of files, abstract description of content of files)
Datasets are public for reading; common datasets are available (for data of common interest)

Datasets in Practice
Upload to PROOF cluster
◦ gProof->RegisterDataSet("myDataSet", proofColl);
Check status
◦ gProof->ShowDataSets();
◦ -> Datasets -> CAF

Looking at the task
Constructor
◦ Called once when the task is created
◦ Input/Output is connected
UserCreateOutputObjects
◦ Called once per slave
◦ Create histograms
UserExec
◦ Called once per event
◦ Track loop, tracks are counted, histogram filled, output "posted"
Terminate
◦ Called once on the client (your laptop/PC)
◦ Histogram read back from the output stream, visualized, saved to disk

Reading log files
When your task crashes
You can access the output of the last query by clicking on the "Show Log" button in the PROOF progress window
You can retrieve the output from any previous query
◦ Open ROOT
◦ Get a PROOF manager object: mgr = TProof::Mgr("alice-caf")
◦ Get the log files from the last session: logs = mgr->GetSessionLogs(0) // 0 = last session
◦ Display them: logs->Display()
◦ Search for a special word (e.g. segmentation violation): logs->Grep("segmentation violation")
◦ Save them to a file: logs->Save("*", "logs.txt")

Some Goodies...
Resetting environment
◦ TProof::Reset("alicecaf")
◦ TProof::Reset("alicecaf", kTRUE)
Compile with debug
◦ Load(" +g")