PROOF and AnT in PHOBOS
Kristjan Gulbrandsen
March 25, 2004
Collaboration Meeting

2 What is PROOF?
A system integrated into ROOT that allows interactive analysis of large data sets using parallel processing and I/O
Transparent: the difference between running a local session and running over multiple computers is minimal
Adaptable: can react to network conditions, system performance and multiple architectures
Scalable: no manifest limitations on size

3 PROOF Architecture
Client connects to a master server local to the cluster
Master server talks to slaves on the nodes where (ideally) the data is located
Slaves run in parallel
Master server collects the results, minimizing slow interaction with the client
[Diagram labels: User, Internet, Master, Slave]
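
The idea of sending work to the node that already holds the data can be sketched in plain C++. This is only an illustration of the placement idea, not PROOF's actual scheduler: the DataFile struct, the assign_files function and the round-robin fallback for files with no local slave are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one data file and the node that stores it.
struct DataFile {
    std::string name;
    std::string node;  // host that holds the file on local disk
};

// Assign each file to a slave node. Prefer the slave running on the node
// that holds the file; otherwise fall back to round-robin over all slaves.
std::map<std::string, std::vector<std::string>>
assign_files(const std::vector<DataFile>& files,
             const std::vector<std::string>& slave_nodes) {
    assert(!slave_nodes.empty() && "need at least one slave");

    std::map<std::string, std::vector<std::string>> plan;
    for (const auto& node : slave_nodes) {
        plan.emplace(node, std::vector<std::string>{});  // one entry per slave
    }

    std::size_t next = 0;  // round-robin cursor for files with no local slave
    for (const auto& f : files) {
        auto it = plan.find(f.node);
        if (it != plan.end()) {
            it->second.push_back(f.name);               // local read
        } else {
            plan[slave_nodes[next]].push_back(f.name);  // remote read
            next = (next + 1) % slave_nodes.size();
        }
    }
    return plan;
}
```

With slaves on node1 and node2, a file stored on node1 goes to the node1 slave, while a file stored on a node without a slave is handed to the next slave in turn.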

4 TSelector Interface
class TSelector {
    Begin()
    SlaveBegin()
    Process()
    SlaveTerminate()
    Terminate()
}
Client: Begin(), Terminate()
(n) Slaves: SlaveBegin(), Process() ... Process() ... Process() ..., SlaveTerminate()
[Diagram annotations: "Create histograms"; "code normally in for loops" (next to the Process() calls)]
If a tree exists, tree->MakeSelector() creates a skeleton class deriving from TSelector
A copy of each object exists in each slave
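
The call order on this slide can be mimicked without ROOT. The sketch below is a mock, not ROOT's TSelector: MockSelector, run_session, the four-bin histogram and the way entries are dealt out to slaves are all assumptions chosen to keep the example small. A plain loop stands in for the slaves, which would run in parallel on a real cluster.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mock of the five-step interface on the slide; NOT ROOT's TSelector.
struct MockSelector {
    std::vector<int> hist;  // this copy's histogram (bin counts)

    void Begin() {}                                   // client, before the run
    void SlaveBegin() { hist.assign(4, 0); }          // create histograms
    void Process(int entry) { ++hist[entry % 4]; }    // former for-loop body
    void SlaveTerminate() {}                          // slave-side cleanup
    void Terminate() {}                               // client, after the run
};

// Run n_entries entries over n_slaves selector copies and return the
// merged histogram, following the call order shown on the slide.
std::vector<int> run_session(std::size_t n_entries, std::size_t n_slaves) {
    assert(n_slaves > 0);

    MockSelector client;
    client.Begin();

    std::vector<int> merged(4, 0);
    for (std::size_t s = 0; s < n_slaves; ++s) {  // stand-in for parallel slaves
        MockSelector copy;                        // each slave owns its copy
        copy.SlaveBegin();
        for (std::size_t e = s; e < n_entries; e += n_slaves) {
            copy.Process(static_cast<int>(e));    // one call per entry
        }
        copy.SlaveTerminate();
        for (std::size_t b = 0; b < merged.size(); ++b) {
            merged[b] += copy.hist[b];            // master merges slave output
        }
    }

    client.Terminate();
    return merged;
}
```

Because every slave fills its own copy and the copies are summed afterwards, run_session gives the same histogram for one slave as for several.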

5 Using PROOF
Call gROOT->Proof("proof:// ") to begin a PROOF session
A set of file names must be added to a TDSet, similar to adding files to a TChain
Call TDSet->Process( ), passing the file that contains the TSelector code
Additional supporting files/libraries can be used by creating PAR files

6 PROOF Execution
[Diagram labels: Local PC (root), Remote PROOF Cluster (proof on node1, node2, node3, node4), ana.C, *.root, TFile, TNetFile, stdout/obj; legend: proof = master server, proof = slave server]
Local execution:
    $ root ana.C
    $ root
    root [0] .x ana.C
Local tree, then the same macro on the PROOF cluster:
    $ root
    root [0] tree->Process("ana.C")
    root [1] gROOT->Proof("remote")
    root [2] dset->Process("ana.C")
Cluster configuration:
    #proof.conf
    slave node1
    slave node2
    slave node3
    slave node4

7 PROOF in PHOBOS
PROOF is installed on the Pharm cluster
The newest ROOT version (4.00/03) is needed and exists in /usr/local/root
Proofserver is compiled with libnew (for now) to allow PhatII classes to be used without modification
The PhatII structure is ideal for transferring individual libraries among slave nodes

8 AnT Trees
A tree format has been created to hold summary information for analyses
Trees are designed to hold the basic summary information used for analyses and to allow pieces of data to be ignored (not read), decreasing I/O
TRefs allow partial information to be read in while maintaining the ability to cross-reference information (i.e. tracks referring to their hits)
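
Why skipping a block of data reduces I/O can be shown with a toy reader. Nothing here is the AnT code or a ROOT call: StoredEvent, Reader, make_event and the record counter are hypothetical, and the choice of mom[2] as the beam direction is an assumption. ROOT itself offers TTree::SetBranchStatus for switching branches on and off; whether the AnT macros use it is not stated on the slide.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Hit   { float dE; };
struct Track { float mom[3]; };

// One stored event with two independently readable blocks ("branches").
struct StoredEvent {
    std::vector<Track> tracks;
    std::vector<Hit>   hits;
};

// Build a toy event: every track has momentum (3, 4, 5) and its own hits.
StoredEvent make_event(std::size_t n_tracks, std::size_t hits_per_track) {
    assert(n_tracks > 0);
    StoredEvent ev;
    Track t = {{3.0f, 4.0f, 5.0f}};
    ev.tracks.assign(n_tracks, t);
    ev.hits.assign(n_tracks * hits_per_track, Hit{1.0f});
    return ev;
}

// Hypothetical reader that copies only the enabled blocks and counts records.
struct Reader {
    bool read_hits = true;         // false: skip the hit block entirely
    std::size_t records_read = 0;  // crude stand-in for I/O volume

    StoredEvent read(const StoredEvent& on_disk) {
        StoredEvent ev;
        ev.tracks = on_disk.tracks;
        records_read += on_disk.tracks.size();
        if (read_hits) {
            ev.hits = on_disk.hits;
            records_read += on_disk.hits.size();
        }
        return ev;
    }
};

// Transverse momentum, assuming mom[2] lies along the beam axis.
double transverse_momentum(const Track& t) {
    return std::sqrt(double(t.mom[0]) * t.mom[0] + double(t.mom[1]) * t.mom[1]);
}
```

A track-only quantity such as the transverse momentum is still available when read_hits is false, while the number of records touched drops to the number of tracks.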

9 AnT Structure
(-> marks a cross reference)
EventInfo:   Run, Seq, Ev_No, Date, Time, Polarity, Prim_vtx->
Tracks[]:    PID, Charge, MeandE, SigmadE, Prob, Chi^2, Xprod[3], Mom[3], HitArray[]->
TriggerInfo: IsCol, L0, L1, EOct, ERing, TrgT_Extra[], TrgE_Extra[]
Paddle:      TruncMeanP, TruncMeanN, SumP, SumN, TDiff
Vertex[]:    Status, ID, Prob, Pos[3], Sigma[3]
Hits[]:      Layer, SensorLabel, dE, Pos[3], Pad[2]
ZDC:         SumP, SumN, TZDCP, TZDCN
TOF Info? PCAL Info?
HitArrays are being developed
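
A reduced version of this layout, with the cross references written as plain indices, might look as follows. It is a sketch under assumptions: only a few fields per block are kept, the C++ types are guesses, the blocks are flattened into one Event struct, and indices stand in for the TRefs (with Prim_vtx assumed to point into the vertex list and HitArray into the hit list).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Field names follow the slide; the C++ types are assumptions.
struct Hit {
    int   Layer;
    float dE;
    float Pos[3];
};

struct Vertex {
    int   Status;
    float Prob;
    float Pos[3];
};

struct Track {
    int   Charge;
    float Mom[3];
    std::vector<std::size_t> HitArray;  // indices into Event::Hits (TRef stand-in)
};

struct Event {
    int Run;
    int Ev_No;
    std::size_t Prim_vtx;               // index into Vertices (TRef stand-in)
    std::vector<Track>  Tracks;
    std::vector<Vertex> Vertices;
    std::vector<Hit>    Hits;
};

// Sum the energy loss of the hits that a track refers to.
float track_dE(const Event& ev, const Track& trk) {
    float sum = 0.0f;
    for (std::size_t i : trk.HitArray) {
        assert(i < ev.Hits.size());     // a reference must point at a stored hit
        sum += ev.Hits[i].dE;
    }
    return sum;
}

// Resolve the event's primary-vertex reference.
const Vertex& primary_vertex(const Event& ev) {
    assert(ev.Prim_vtx < ev.Vertices.size());
    return ev.Vertices[ev.Prim_vtx];
}
```

Keeping the references as indices means a track stays small and self-describing, and its hits only have to be present when a function such as track_dE actually follows the reference.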

10 Current AnT Trees
Prototype AnT trees currently exist on Pharm (10 runs, 56 Seqs) and can be used
Analysis personnel are needed to use the trees and to provide information about the additions necessary to make them useful for many analyses

11 Analysis using AnT/PROOF
AnT/PROOF has been used to generate p_t distributions from current data
Using AnT/PROOF speeds up the analysis from an hour to a minute
Disabling hit read-in speeds up processing by more than a factor of 10

12 Summary
PROOF is ready for use on Pharm
Simple example macros exist explaining how to use PROOF
AnT trees have been created for quick analysis of large data sets in conjunction with PROOF
Users are needed to test/try both PROOF and AnT, to provide information on the data format and to stress the PROOF architecture

