ALICE Offline Tutorial PART 3: PROOF. Alice Core Offline, 5th June, 2008.


1 ALICE Offline Tutorial PART 3: PROOF. Alice Core Offline, 5th June, 2008

2 PROOF
  Parallel ROOT Facility
  - Interactive parallel analysis on a local cluster
    - Parallel processing of (local) data (trivial parallelism)
    - Fast feedback
    - Output handling with direct visualization
  - Not a batch system
  - PROOF itself is not related to the Grid
    - Can access Grid files
  - The usage of PROOF is transparent
    - The same code can be run locally and in a PROOF system (certain rules have to be followed)
  - PROOF is part of ROOT

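3 PROOF Schema
  [Diagram: the client (local PC) runs root with ana.C and connects to a remote PROOF cluster. The PROOF master distributes ana.C to the PROOF slaves (node1 to node4); each slave runs root on its own data and returns a result. stdout and the result are sent back to the client.]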

4 Event-based (trivial) parallelism
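Event-based parallelism works because events are independent: any subset of events can be processed on any worker, and the partial results are combined afterwards. A minimal sketch of the idea in plain C++, assuming a static, even split (the helper SplitEvents is hypothetical and not part of PROOF, which hands out work in packets at run time):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of PROOF: split nEvents into one
// half-open range [first, last) per worker, as evenly as possible.
std::vector<std::pair<std::size_t, std::size_t>>
SplitEvents(std::size_t nEvents, std::size_t nWorkers)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (nWorkers == 0) return ranges;              // nobody to distribute to
    const std::size_t base  = nEvents / nWorkers;
    const std::size_t extra = nEvents % nWorkers;  // first `extra` workers get one more event
    std::size_t first = 0;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t count = base + (w < extra ? 1 : 0);
        ranges.emplace_back(first, first + count);
        first += count;
    }
    return ranges;
}
```

Every event lands in exactly one range, so the workers never need to communicate while processing.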

5 Terminology
  - Client: your machine running a ROOT session that is connected to a PROOF master
  - Master: PROOF machine coordinating work between slaves
  - Slave/Worker: PROOF machine that processes data
  - Query: a job submitted from the client to the PROOF system. A query consists of a selector and a chain
  - Selector: a class containing the analysis code
    - In ALICE we use the Analysis Framework, therefore an AliAnalysisTask is sufficient
  - Chain: a list of files (trees) to process (more details later)

6 ALICE Analysis Framework
  - Transparent access to all resources with the same code
    - Usage: local, AliEn grid, CAF/PROOF
  - Transparent access to different inputs
    - ESD, AOD, Kinematics tree (MC truth)
  - Allows for "scheduled" analysis
    - Common and well-tested environment to run several tasks
  - Defines a common terminology

7 How to use PROOF
  - The analysis framework is used
  - Files to be analyzed are put into a chain -> TChain
  - Analysis written as a task -> AliAnalysisTask
  - The same analysis as in the local case can be used in a parallel environment
  - If additional libraries are needed, these have to be distributed as a "package"
  [Diagram: Input Files (TChain) -> Analysis (AliAnalysisTask) -> Output]

8 AliAnalysisTask
  Classes derived from AliAnalysisTask can run locally, in PROOF and in AliEn
  - "Constructor": once on your client
  - CreateOutputObjects(): once on each slave
  - ConnectInputData(): for each tree
  - Exec(): for each event
  - Terminate(): once on your client
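The call frequencies above can be made concrete with a toy driver. This is only a sketch: ToyTask and RunToy are hypothetical stand-ins, not the AliAnalysisTask API, and for brevity a single object counts the calls that in PROOF would be spread over one task copy per slave.

```cpp
#include <vector>

// Toy stand-in for an analysis task; NOT the real AliAnalysisTask API.
struct ToyTask {
    int nCreateOutput = 0, nConnectInput = 0, nExec = 0, nTerminate = 0;
    void CreateOutputObjects() { ++nCreateOutput; }
    void ConnectInputData()    { ++nConnectInput; }
    void Exec()                { ++nExec; }
    void Terminate()           { ++nTerminate; }
};

// workers[w][t] is the number of events in tree t assigned to slave w.
ToyTask RunToy(const std::vector<std::vector<int>>& workers)
{
    ToyTask counts;
    for (const auto& trees : workers) {
        counts.CreateOutputObjects();                // once on each slave
        for (int nEvents : trees) {
            counts.ConnectInputData();               // for each tree
            for (int e = 0; e < nEvents; ++e)
                counts.Exec();                       // for each event
        }
    }
    counts.Terminate();                              // once, back on the client
    return counts;
}
```

With two slaves holding trees of 100 and 50 events on the first and 70 events on the second, the driver records 2 CreateOutputObjects calls, 3 ConnectInputData calls, 220 Exec calls and 1 Terminate call.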

9 Class TTree
  - A tree is a container for data storage
  - It consists of several branches
    - These can be in one or several files
    - Branches are stored contiguously (split mode)
  - When reading a tree, certain branches can be switched off -> speed-up of analysis when not all data is needed
  - Set of helper functions to visualize content (e.g. Draw, Scan)
  - Compressed
  [Diagram: a Tree of "point" entries with members x, y, z is split into Branches, which are stored contiguously in the File as xxxxxxxxxx yyyyyyyyyy zzzzzzzzzz]
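The split-mode layout in the diagram corresponds to storing one contiguous array per data member instead of one record per point. A minimal sketch in plain C++ (Point, SplitPoints, Split and SumX are illustrative names, not ROOT classes) shows why switching off branches helps: a reader that needs only x never touches y or z.

```cpp
#include <vector>

// One entry as the user sees it.
struct Point { float x, y, z; };

// Split-mode layout: one contiguous array ("branch") per data member.
struct SplitPoints {
    std::vector<float> x, y, z;
};

// Convert a list of points into the per-member layout.
SplitPoints Split(const std::vector<Point>& points)
{
    SplitPoints s;
    for (const Point& p : points) {
        s.x.push_back(p.x);
        s.y.push_back(p.y);
        s.z.push_back(p.z);
    }
    return s;
}

// An analysis that needs only x reads a single branch.
double SumX(const SplitPoints& s)
{
    double sum = 0.0;
    for (float v : s.x) sum += v;
    return sum;
}
```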

10 TChain
  - A chain is a list of trees (in several files)
  - Normal TTree functions can be used
    - Draw(...), Scan(...) -> these iterate over all elements of the chain
  - Selectors can be used with chains
    - Process(const char* selectorFileName)
  - After using SetProof() these calls are run in PROOF
  [Diagram: Chain = Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3), Tree5 (File4)]
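Iterating over "all elements of the chain" amounts to treating the trees as one long list of entries. The following sketch (ToyTree and Locate are hypothetical, not the TChain implementation) maps a global entry number to the tree that holds it and the entry number inside that tree:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy description of one tree in a chain: where it lives and how many entries it has.
struct ToyTree {
    std::string file;
    std::size_t entries;
};

// Map a global entry number to (tree index, entry within that tree).
// Returns false if the global entry lies beyond the end of the chain.
bool Locate(const std::vector<ToyTree>& chain, std::size_t global,
            std::size_t& treeIndex, std::size_t& local)
{
    std::size_t offset = 0;  // number of entries in all preceding trees
    for (std::size_t i = 0; i < chain.size(); ++i) {
        if (global < offset + chain[i].entries) {
            treeIndex = i;
            local = global - offset;
            return true;
        }
        offset += chain[i].entries;
    }
    return false;
}
```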

11 Merging
  - The analysis runs on several slaves, therefore partial results have to be merged
  - Objects are identified by name
  - Standard merging implementation for histograms available
  - Other classes need to implement Merge(TCollection*)
  - When no merging function is available, all the individual objects are returned
  [Diagram: Result from Slave 1 + Result from Slave 2 -> Merge() -> Final result]
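For histograms, merging amounts to adding the bin contents of all partial results that carry the same name. A minimal sketch of that idea, assuming identical binning on every slave (ToyHist and MergeOutputs are illustrative and do not reproduce ROOT's Merge(TCollection*) interface):

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy histogram: a name plus bin contents. Not ROOT's TH1.
struct ToyHist {
    std::string name;
    std::vector<double> bins;
};

// Merge the output lists of all slaves. Objects with the same name are
// combined by adding bin contents; identical binning is assumed.
std::map<std::string, ToyHist>
MergeOutputs(const std::vector<std::vector<ToyHist>>& perWorker)
{
    std::map<std::string, ToyHist> merged;
    for (const auto& outputs : perWorker) {
        for (const auto& h : outputs) {
            auto it = merged.find(h.name);
            if (it == merged.end()) {
                merged.emplace(h.name, h);  // first partial result with this name
                continue;
            }
            if (it->second.bins.size() != h.bins.size())
                throw std::invalid_argument("binning mismatch for " + h.name);
            for (std::size_t i = 0; i < h.bins.size(); ++i)
                it->second.bins[i] += h.bins[i];
        }
    }
    return merged;
}
```

Because objects are matched by name, two output histograms that share a name would be added together, so each output object needs a unique name.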

12 AliAnalysisManager - PROOF mode
  [Diagram: on the CLIENT, MyAnalysis.C sets up the Analysis Manager (AM) with task1 ... taskN, the input chain and the outputs, and calls AM->StartAnalysis("proof"). The AM is sent to PROOF in the input list. On each worker, AliAnalysisSelector (a TSelector) runs SlaveBegin(), Process() and SlaveTerminate() on its copy of the AM and produces outputs O1 ... On. The master merges them into the output list (O), and Terminate() runs on the client.]

13 Workflow Summary
  [Diagram: Input chain (Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3), Tree5 (File4)) and Analysis (AliAnalysisTask) -> proof]

14 Workflow Summary
  [Diagram: Analysis (AliAnalysisTask) -> proof -> Output -> Merged Output]

15 Packages
  - PAR files: PROOF ARchive. Like a Java jar
    - Gzipped tar file
    - PROOF-INF directory
      - BUILD.sh: builds the package, executed per slave
      - SETUP.C: sets the environment and loads libraries, executed per slave
  - API to manage and activate packages
    - UploadPackage("package")
    - EnablePackage("package")

16 CERN Analysis Facility
  - The CERN Analysis Facility (CAF) will run PROOF for ALICE
    - Prompt analysis of pp data
    - Pilot analysis of PbPb data
    - Calibration & Alignment
  - Available to the whole collaboration, but the number of users will be limited for efficiency reasons
  - Design goals
    - 500 CPUs
    - 100 TB of selected data locally available

17 Evaluation of PROOF
  - Test setup since May 2006
    - 40 machines, 2 CPUs each, 200 GB disk
  - Tests performed
    - Usability tests
    - Simple speedup plot
    - Evaluation of different query types
    - Evaluation of the system when running a combination of query types
  - Goal: realistic simulation of users using the system

18 Query Type Cocktail
  - A realistic stress test consists of different users that submit different types of queries
  - 4 different query types
    - 20% very short queries
    - 40% short queries
    - 20% medium queries
    - 20% long queries
  - User mix
  - 33 nodes available for the test
  - Maximum average speedup for 10 users = 6.6 (33 nodes = 66 CPUs)
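The quoted maximum of 6.6 is presumably the number of CPUs divided by the number of concurrent users (66 / 10), i.e. the best each user can get on average when the cluster is shared equally. A one-function sketch of that arithmetic (MaxAverageSpeedup is a hypothetical helper, and equal sharing is an assumption):

```cpp
// Upper bound on the average speedup per user when `users` share the
// cluster equally: total CPUs divided by the number of concurrent users.
double MaxAverageSpeedup(int nodes, int cpusPerNode, int users)
{
    if (users <= 0) return 0.0;  // no users, no meaningful speedup
    return static_cast<double>(nodes * cpusPerNode) / users;
}
```

For the test setup, MaxAverageSpeedup(33, 2, 10) evaluates 66 / 10.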

19 Hands-On
  - Getting ready...
  - Run a task that accesses ESD
    - Locally
    - PROOF
  - Modify it...
  - Run a task that accesses MC
    - PROOF
  - Reading log files, resetting session, etc.

20 Warm up
  - Preconditions: from your lxplus account, do:
      wget http://aliceinfo.cern.ch/Offline/Activities/Analysis/CAF/proof-tutorial.tgz
      gtar xvzf proof-tutorial.tgz
  - Set up environment
      source /afs/cern.ch/alice/caf/caf-lxplus.sh v4-14-Release
    - NB: This works only for the tutorial machines, i.e. SLC4 i686
    - How to enable it on LXPLUS is explained later in the tutorial
  - Check ROOT
    - Start it. Does it show ROOT version 5.21/01?

21 Files to be used
  - CreateESDChain.C: creates a chain from a list of file names
  - ESD82XX_30K.txt: list of PDC07 files distributed on the CAF
  - AF-v4-12.par: PAR archive for PDC07 data and analysis framework
  - AliAnalysisTaskPt.{cxx,h}: task that creates an uncorrected pT spectrum from ESD tracks
  - AliAnalysisTaskPtMC.{cxx,h}: task that creates a pT spectrum from the MC particles

22 Run a task locally
  - Start ROOT
  - Try the following lines and, once they work, add them to a macro run.C (enclose in {})
  - Load needed libraries
      gSystem->Load("libTree");
      gSystem->Load("libGeom");
      gSystem->Load("libVMC");
      gSystem->Load("libPhysics");
      gSystem->Load("libSTEERBase");
      gSystem->Load("libAOD");
      gSystem->Load("libESD");
      gSystem->Load("libANALYSIS");
  - Add the AliRoot include path (only needed for local case)
      gROOT->ProcessLine(".include $ALICE_ROOT/include");

23 Run a task locally (2)
  - Create the analysis manager
      mgr = new AliAnalysisManager("mgr");
  - Create the analysis task and add it to the manager
      gROOT->LoadMacro("AliAnalysisTaskPt.cxx++g");
      task = new AliAnalysisTaskPt;
      mgr->AddTask(task);
    - "+" means compile; "g" means debug
  - Add the ESD handler (to access the ESD)
      AliESDInputHandler* esdH = new AliESDInputHandler;
      mgr->SetInputEventHandler(esdH);
  - Add the lines to the macro run.C

24 Run a task locally (3)
  - Create a chain
      gROOT->LoadMacro("CreateESDChain.C");
      chain = CreateESDChain("ESD82XX_30K.txt", 10);
  - Attach the input (the chain)
      cInput = mgr->CreateContainer("cInput", TChain::Class(), AliAnalysisManager::kInputContainer);
      mgr->ConnectInput(task, 0, cInput);
  - Create a place for the output (a histogram: TH1)
      cOutput = mgr->CreateContainer("cOutput", TH1::Class(), AliAnalysisManager::kOutputContainer, "Pt.root");
      mgr->ConnectOutput(task, 0, cOutput);
  - Enable debug (optional)
      mgr->SetDebugLevel(2);
  - Add the lines to the macro run.C

25 Run a task locally (4)
  - Initialize the manager
      mgr->InitAnalysis();
  - Print the status (optional)
      mgr->PrintStatus();
  - Run the analysis
      mgr->StartAnalysis("local", chain);
  - Add the lines to the macro run.C
  - After running, look at the output and check the content of the file Pt.root

26 run.C

27 Package Management
  - Connecting to the PROOF cluster
      TProof::Open("lxb6046");
  - Managing packages
    - Upload (= copy to the cluster)
        gProof->UploadPackage("AF-v4-14");
    - Enable (= compile)
        gProof->EnablePackage("AF-v4-14");
    - Clean (= remove)
        gProof->ClearPackage("AF-v4-14");
      Known issue on AFS: removal may fail. Try again after a few seconds.
    - Clean all (in case some libraries are messed up)
        gProof->ClearPackages();

28 Running a task in PROOF
  - Copy run.C to runProof.C
  - Add connecting to the cluster
      TProof::Open("lxb6046")
  - Replace the loading of the libraries with uploading the packages
      gProof->UploadPackage("AF-v4-14")
      gProof->EnablePackage("AF-v4-14")
  - Replace the loading of the task with
      gProof->Load("AliAnalysisTaskPt.cxx++g")
  - Replace in StartAnalysis "local" with "proof"
  - Run it!
  - Increase the number of files to 200
  [Screenshots: result with 20 files; result with 200 files]

29 runProof.C

30 Progress dialog
  [Screenshot of the PROOF progress dialog with callouts:]
  - Query statistics
  - Abort query and view results up to now
  - Abort query and discard results
  - Show log files
  - Show processing rate

31 Looking at the task
  - Constructor
    - Called once when the task is created
    - Input/Output is connected
  - ConnectInputData (usually does not need to be changed)
    - Called once per tree on each slave
    - Connect fESD pointer
  - CreateOutputObjects
    - Called once per slave
    - Create histograms
  - Exec
    - Called once per event
    - Track loop: tracks are counted, histogram filled, output "posted"
  - Terminate
    - Called once on the client (your laptop/PC)
    - Histogram read back from the output stream, visualized, saved to disk

32 Changing the task
  - Add a |η| < 0.5 cut
      Float_t eta = track->Eta();
      if (TMath::Abs(eta) > 0.5) continue;
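The effect of the cut can be checked outside the framework on a plain list of pseudorapidity values. A small sketch in standard C++, where CountAccepted is a hypothetical helper and std::abs stands in for TMath::Abs:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Count the tracks that survive the cut, using the same test as in the
// track loop: a track is skipped when |eta| exceeds maxAbsEta.
std::size_t CountAccepted(const std::vector<double>& etas, double maxAbsEta)
{
    std::size_t accepted = 0;
    for (double eta : etas) {
        if (std::abs(eta) > maxAbsEta) continue;  // rejected by the cut
        ++accepted;
    }
    return accepted;
}
```

Note that the condition rejects only |η| > 0.5, so a track with |η| exactly 0.5 is kept, which is marginally looser than the "|η| < 0.5" stated in the slide title.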

33 Changing the task (2)
  Add a second plot: η distribution
  - Header file (.h file)
    - Add new member: TH1F* fEta; // eta distribution
  - Constructor
    - Initialize member: fEta(0)
    - Add second output slot: DefineOutput(1, TH1F::Class())
  - CreateOutputObjects
    - Create histogram
        fEta = new TH1F("fEta", "#eta distribution", 20, -2, 2);
  - Exec
    - Get η like in the previous example
    - Fill histogram: fEta->Fill(eta);
    - Post output: PostData(1, fEta)
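The histogram above has 20 bins of width 0.2 between -2 and 2. The following sketch shows which bin a given η falls into, following ROOT's numbering convention for fixed-width bins (bin 0 is underflow, bins 1 to 20 are regular, bin 21 is overflow). FindBin here is a free function written for illustration, not the TH1 member:

```cpp
// Bin index in ROOT's numbering convention for fixed-width bins:
// 0 = underflow, 1..nbins = regular bins, nbins + 1 = overflow.
int FindBin(double x, int nbins, double xmin, double xmax)
{
    if (x < xmin)  return 0;          // below the axis range
    if (x >= xmax) return nbins + 1;  // the upper edge is excluded
    return 1 + static_cast<int>(nbins * (x - xmin) / (xmax - xmin));
}
```

With this binning, η = 0 falls into bin 11, and tracks with |η| >= 2 end up in the underflow or overflow bin instead of the visible range.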

34 Changing the task (3)
  - Terminate
    - Read the histogram from the output slot
        fEta = dynamic_cast<TH1F*>(GetOutputData(1));
    - Check that the object was retrieved
        if (!fEta) { Printf("ERROR: fEta was not found"); return; }
    - Draw the histogram
        new TCanvas;
        fEta->DrawCopy();
  - Copy runProof.C to runProof2.C and change:
    - Add a second output slot
        cOutput2 = mgr->CreateContainer("cOutput2", TH1::Class(), AliAnalysisManager::kOutputContainer, "Pt.root");
        mgr->ConnectOutput(task, 1, cOutput2);

35 Read Monte Carlo tracks
  - Use task AliAnalysisTaskPtMC.{h,cxx}
  - Copy runProof.C to runProofMC.C
    - Change AliAnalysisTaskPt to AliAnalysisTaskPtMC
    - Add access to the MC event handler
        handler = new AliMCEventHandler;
        mgr->SetMCtruthEventHandler(handler);
    - Change output filename to PtMC.root
  - Run it!

36 runProofMC.C

37 Looking at the MC task
  - Very similar to the ESD track case
  - Instead of looping over the content of fESD, the MC event is retrieved by
      AliMCEventHandler* eventHandler = dynamic_cast<AliMCEventHandler*>(AliAnalysisManager::GetAnalysisManager()->GetMCtruthEventHandler());
      if (!eventHandler) { Printf("ERROR: Could not retrieve MC event handler"); return; }
      AliMCEvent* mcEvent = eventHandler->MCEvent();
      if (!mcEvent) { Printf("ERROR: Could not retrieve MC event"); return; }

38 Reading log files
  - When your task crashes
    - You can access the output via the PROOF progress window
    - In rare cases you have to restart the ROOT session
  - Reading output from the last query
    - Open ROOT
    - Get a PROOF manager object
        mgr = TProof::Mgr("lxb6046")
    - Get the log files from the last session
        logs = mgr->GetSessionLogs(0)
    - Display them
        logs->Display()
    - Search for a special word (e.g. segmentation violation)
        logs->Grep("segmentation violation")
    - Save them to a file
        logs->Save("*", "logs.txt")

39 Some Goodies...
  - Resetting the environment
      TProof::Reset("lxb6046")
    - Do not put this in your macro; if really needed, call it manually in a ROOT session
  - Compile with debug
    - Append "+g" to the file name passed to Load(), e.g. gProof->Load("AliAnalysisTaskPt.cxx++g") as on slide 28
  - Create a package from AliRoot
      make ESD.par

40 CAF use from LXPLUS
  - "Any" ROOT version is ok
  - Recommended version for LXPLUS
    - LXPLUS runs SLC4 on x86_64
        cd /afs/cern.ch/alice/caf
        source caf-lxplus.sh v4-12-Release    (bash shell)
    - v4-12-Release is the AliRoot version that is enabled at the same time
  - More information on http://aliceinfo.cern.ch/Offline/Activities/Analysis/CAF
  - Please join the mailing list alice-project-analysis-task-force@cern.ch by going to http://listboxservices.web.cern.ch/listboxservices/

41 Backup

42 PROOF Installation
  - Install ROOT with PROOF enabled (default)
    - More information: http://root.cern.ch
  - Configuration (see next slides)
    - xrootd config file: xrd.cf
    - PROOF config file: proof.conf
  - Start xrootd service
    - Requires unprivileged user account

43 xrd.cf
    ## Load the XrdProofd protocol:
    xrd.protocol xproofd:1093 /opt/root/lib/libXrdProofd.so
    ## Set ROOTSYS
    xpd.rootsys /opt/root
    ## Working directory for sessions
    xpd.workdir /pool/proofbox

44 xrd.cf (2)
    ## xpd.resource static [ ] [ucfg: ] [wmx: ] [selopt: ]
    xpd.resource static /etc/proof/proof.conf wmx:-1 selopt:roundrobin
    ## Server role (master, worker) [default: any]
    xpd.role worker if lxb*.cern.ch
    xpd.role master if lxb6046.cern.ch
    ## Master(s) allowed to connect. By default all connections are allowed.
    xpd.allow lxb6046.cern.ch

45 proof.conf
    ## machine running the master
    master lxb6046.cern.ch
    ## machine(s) running workers; dual-CPU machines have to be listed twice
    worker lxb6047.cern.ch
    worker lxb6048.cern.ch
    ...
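Since a machine has to appear once per CPU, writing proof.conf by hand gets repetitive for larger clusters. A small sketch of a generator, assuming only the "master <host>" / "worker <host>" line format shown above (MakeProofConf is a hypothetical helper, not part of ROOT):

```cpp
#include <string>
#include <utility>
#include <vector>

// Build the text of a proof.conf file: one "master" line, then one
// "worker" line per CPU of each worker machine (host name, CPU count).
std::string MakeProofConf(const std::string& master,
                          const std::vector<std::pair<std::string, int>>& workers)
{
    std::string conf = "master " + master + "\n";
    for (const auto& w : workers)
        for (int cpu = 0; cpu < w.second; ++cpu)
            conf += "worker " + w.first + "\n";  // a dual-CPU host is listed twice
    return conf;
}
```

The returned string can be written to the file named in the xpd.resource directive.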

46 Starting xrootd Service
    xrootd -b -l xrootd.log -R proofaccount -c xrd.cf -d
  Options:
    -b : background (skip for debugging)
    -l : log file
    -R : user account that runs xrootd service
    -c : configuration file
    -d : debug flag
  Do not forget full paths to the files

47 Learning about Branches
  - The ESD tree consists of several branches
  - Switching off unneeded branches increases the speed of the analysis significantly
  - Looking at the available branches
      chain = new TChain("esdTree")
      chain->Add("root://lxb6046.cern.ch//pool/proofpool/pdc06/100/002/root_archive.zip#AliESDs.root")
      chain->Print()
  - Disable all branches (in Init)
      tree->SetBranchStatus("*", 0)
  - Enable a needed branch (in Init)
      tree->SetBranchStatus("fTracks.fCp", 1)
  - Try this! What is the increase in processing speed?

