Introduction to the PROOF system René Brun CERN Do-Son school on Advanced Computing and GRID Technologies for Research Institute of Information Technology, VAST, Hanoi.


Introduction to the PROOF system René Brun CERN Do-Son school on Advanced Computing and GRID Technologies for Research Institute of Information Technology, VAST, Hanoi

René Brun2 Outline  PROOF essentials  Integration with experiment software  ALICE experience at the CAF

René Brun3 PROOF essentials  Motivation: provide an alternative, dynamic approach to end-user HEP analysis on distributed systems  Typical HEP analysis is a continuous refinement cycle: implement algorithm, run over data set, make improvements  Data sets are collections of independent events  Large (e.g. ALICE ESD+AOD: ~350 TB / year)  Spread over many disks and mass storage systems  Exploiting this intrinsic parallelism is the only way to analyze the data in a reasonable time

René Brun4 PROOF essentials: classic approach (diagram: a catalog query selects the data files; the myAna.C jobs are split manually and submitted to the batch farm queues manager; each job reads its files from storage; the outputs are merged for the final analysis)  static use of resources  jobs frozen: 1 job / worker node  manual splitting, merging  monitoring requires instrumentation

René Brun5 PROOF essentials: an alternative approach (diagram: a PROOF query (data file list plus myAna.C) goes to the MASTER, which schedules the work on the PROOF farm; workers read the files from storage and return real-time feedback (merged) and the final outputs)  farm perceived as extension of local PC  same syntax as in local session  more dynamic use of resources  real time feedback  automated splitting and merging

René Brun6 PROOF essentials: target Three classes of analysis jobs: short analyses using local resources (e.g. end-analysis calculations, visualization); medium-term jobs, e.g. analysis design and development, also using non-local resources; long analysis jobs with well-defined algorithms (e.g. production of personal trees)  Optimize response for short / medium jobs  Perceive medium as short

René Brun7 PROOF essentials: design goals  Transparency  minimal impact on ROOT user habits  Scalability  full exploitation of available resources  Adaptability  cope transparently with heterogeneous environments  Real-time interaction and feedback  Addresses the case of  Central or Departmental Analysis Facilities (Tier-2’s)  Multi-core, multi-disk desktops

René Brun8 Trivial Parallelism

René Brun9 Terminology  Client  Your machine running a ROOT session that is connected to a PROOF master  Master  PROOF machine coordinating work between slaves  Slave/Worker  PROOF machine that processes data  Query  A job submitted from the client to the PROOF system. A query consists of a selector and a chain  Selector  A class containing the analysis code (more details later)  Chain  A list of files (trees) to process (more details later)

René Brun10 How to use PROOF  Files to be analyzed are listed in a chain (  TTree/TChain)  Analysis written as a selector (  TSelector)  Input/Output is sent using dedicated lists  If additional libraries are needed, these have to be distributed as packages ("par" = PROOF archive) Analysis (TSelector) Input Files (TChain) Output (TList) Input (TList)

René Brun11 Class TTree  A tree is a container for data storage  It consists of several branches  These can be in one or several files  Branches are stored contiguously (split mode)  When reading a tree, certain branches can be switched off  speeds up the analysis when not all data is needed  Set of helper functions to visualize content (e.g. Draw, Scan)  Compressed (diagram: a tree with a branch 'point' holding the leaves x, y, z; in split mode each branch's values are stored contiguously in the file)

René Brun12 TChain  A chain is a list of trees (in several files)  Normal TTree methods can be used  Draw, Scan  these iterate over all elements of the chain  Selectors can be used with chains  Process(const char* selectorFileName)  After using SetProof() these calls are run in PROOF Chain Tree1 (File1) Tree2 (File2) Tree3 (File3) Tree4 (File3) Tree5 (File4)

René Brun13 PROOF essentials: what can be done?  Ideally everything that can be split into independent tasks  Currently available:  Processing of trees (see next slide)  Processing of independent objects in a file  Tree processing and drawing functionality complete
LOCAL:
// Create a chain of trees
root[0] TChain *c = CreateMyChain();
// MySelec is a TSelector
root[1] c->Process("MySelec.C+");
PROOF:
// Create a chain of trees
root[0] TChain *c = CreateMyChain();
// Start PROOF and tell the chain to use it
root[1] TProof::Open("masterURL");
root[2] c->SetProof();
// Process goes via PROOF
root[3] c->Process("MySelec.C+");

René Brun14 The ROOT data model: Trees & Selectors Begin(): create histos, …, define output list; Process(): preselection, analysis; Terminate(): final analysis (fitting, …), output list (diagram: the Selector loops over the events of a Chain, reading only the needed branches of each tree)

René Brun15 PROOF essentials: multi-tier architecture Node sessions started by xproofd (xrootd) Structured master: - adapts to clusters of clusters - improves scalability One sub-master per geographic domain Heterogeneous hardware / OS

René Brun16 PROOF essentials: connection layer  Sets up the client session  Authentication, sandbox setup, start of sessions on nodes  Based on xrootd  Lightweight, industrial-strength networking and protocol handler  New PROOF-related protocol plug-in, xpd  xpd launches and controls PROOF sessions (proofserv)  xrootd acts as a coordinator on the farm  Client disconnection / reconnection handled naturally  Can use the same daemon for data and PROOF serving

René Brun17 TSelector  Classes derived from TSelector can run locally and in PROOF  Begin() and Terminate(): called once, on your client  SlaveBegin() and SlaveTerminate(): called once on each slave  Init(TTree* tree): called for each tree  Process(Long64_t entry): called for each event

René Brun18 Input / Output  The TSelector class has two members of type TList:  fInput, fOutput  These are used to get input data or put output data  Input list  Before running a query the input list is populated gProof->AddInput(myObj)  In the selector (Begin, SlaveBegin) the object is retrieved: fInput->FindObject("myObject")

René Brun19 Input / Output (2)  Output list  The output has to be added to the output list on each slave (in SlaveBegin/SlaveTerminate) fOutput->Add(fResult)  PROOF merges the results from each slave automatically (see next slide)  On your client (in Terminate) you retrieve the object and save it, display it,... fOutput->FindObject("myResult")

René Brun20 Input / Output (3)  Merging  Objects are identified by name  Standard merging implementation for histograms, trees, n-tuples available  Other classes need to implement Merge(TCollection*)  When no merging function is available all the individual objects are returned Result from Slave 1 Result from Slave 2 Final result Merge()

René Brun21 Workflow Summary (diagram: a Chain of trees, Tree1 (File1), Tree2 (File2), Tree3 (File3), Tree4 (File3), Tree5 (File4), is processed via PROOF by the Analysis (TSelector) together with its Input (TList))

René Brun22 Workflow Summary (diagram: the Analysis (TSelector) and its Input (TList) are distributed via PROOF; each worker produces an Output (TList) and PROOF combines them into the Merged Output)

René Brun23 PROOF essentials: dynamic load balancing  Pull architecture guarantees scalability  Adapts to permanent / temporary variations in performance (diagram: the Master hands out packets to Worker 1 … Worker N on request)

René Brun24 PROOF essentials: intrinsic scalability  Strictly concurrent user jobs at CAF (100% CPU used)  In-memory data  Dual Xeon, 2.8 GHz  CMS analysis (I. Gonzales, Cantabria)  1 master, 80 workers  Dual Xeon 3.2 GHz  Local data: 1.4 GB / node  Non-blocking GB Ethernet (plot: scaling measured for 1, 2, 4 and 8 concurrent users)

René Brun25 PROOF essentials: exploiting multi-cores  ALICE search for π0’s  4 GB simulated data  Instantaneous rates (evt/s, MB/s)  Clear advantage of quad core  Additional computing power fully exploited

René Brun26 PROOF essentials: additional remarks  Intrinsic serial overhead small  requires reasonable connection between a (sub-)master and its workers  Hardware considerations  IO bound analysis (frequent in HEP) often limited by hard drive access: N small disks are much better than 1 big one  Good amount of RAM for efficient data caching  Data access is The Issue:  Optimize for data locality, when possible  Efficient access to mass storage (next slide)

René Brun27 PROOF essentials: data access issues  Low latency in data access is essential for high performance  Not only a PROOF issue  File opening overhead  Minimized using asynchronous open techniques  Data retrieval  caching and pre-fetching of the data segments to be analyzed (recently introduced in ROOT for TTree)  Techniques improving network performance (e.g. InfiniBand) or file access (e.g. memory-based file serving, PetaCache) should be evaluated

René Brun28 PROOF essentials: scheduling multiple users  Fair resource sharing, enforcing priority policies  Priority-based worker-level load balancing  Simple and solid implementation, no central unit: lower-priority sessions are slowed down; group priorities are defined in the configuration file  Future: central scheduler for per-query decisions based on:  cluster load, resources needed by the query, user history and priorities  Generic interface to external schedulers planned  MAUI, LSF, …

René Brun29 PROOF essentials: management tools  Data sets  Optimized distribution of data files on the farm, by direct upload or by staging from mass storage (e.g. CASTOR)  Query results  Retrieve, archive  Packages  Optimized upload of additional libraries needed by the analysis

René Brun30 PROOF essentials: monitoring  Internal  File access rates, packet latencies, processing time, etc.  Basic set of histograms available at tunable frequency; client temporary output objects can also be retrieved  Possibility of a detailed tree for further analysis  MonALISA-based  Each host reports CPU, memory, swap, network  Each worker reports CPU, memory, evt/s, IO vs. network rate  pcalimonitor.cern.ch:8889 (plot: network traffic between nodes)

René Brun31 PROOF GUI controller  Allows full on-click control  define a new session  submit a query, execute a command  query editor  create / pick up a TChain  choose selectors  online monitoring of feedback histograms  browse folders with results of query  retrieve, delete, archive functionality

René Brun32 PROOF basics (1)  To loop over bigger statistics, PROOF can be used  (Re-)start your ROOT session  Connect to PROOF: TProof::Open("...")  Create a chain of files that are stored in the PROOF cluster: .L CreateESDChain.C chain = CreateESDChain("input.txt", 1000)

René Brun33 PROOF basics (2)  "Connect" the chain with PROOF chain->SetProof()  Draw something chain->Draw("fZ") chain->Draw("fTPCncls:fTPCnclsF", "", "COLZ")  Local (before): 1 file = 100 events  PROOF (now): 1000 files = events

René Brun34 PROOF packages  (Re-)start your ROOT session  Connect to the PROOF server: TProof::Open("...")  Upload the ESD package (only the first time; can be done each time) gProof->UploadPackage("ESD")  Build the package, load the library (each time) gProof->EnablePackage("ESD")  Show available packages gProof->ShowPackages()  Remove a package (don’t do this now, we need it) gProof->ClearPackage("ESD") gProof->ClearPackages()

René Brun35 Run selector locally  (Re-)start ROOT session  Load ESD library (needed for this data).x loadlibs.C  Create a chain and add a local file chain = new TChain("esdTree") chain->Add("AliESDs.root")  Execute the selector chain->Process("TMySelector.cxx+")  Look at the output

René Brun36 Run selector in PROOF  (Re-)start ROOT session  Connect to the PROOF server: TProof::Open("...")  Create a (long) chain: .L CreateESDChain.C chain = CreateESDChain("input.txt", 100)  Enable the ESD package gProof->EnablePackage("ESD")  "Connect" the chain with PROOF chain->SetProof()  Execute the selector chain->Process("TMySelector.cxx+")

René Brun37 Progress dialog (screenshot: shows the query statistics and the processing rate, with buttons to abort the query and view the results obtained so far, abort the query and discard the results, and show the log files)

René Brun38 Integration with experiment software: main issues  Finding, using the experiment software  Environment settings, libraries loading  Implementing the analysis algorithms  TSelector strengths Automatic tree interaction Structured analysis  TSelector weaknesses Big macros New analysis implies new selector Change in the tree definition implies a new selector  Add layer to improve flexibility and to hide irrelevant details

René Brun39 Integration with experiment software  Experiment software framework available on nodes  Working-group dedicated packages uploaded / enabled as PROOF packages (next slide)  Allows users to run their own modifications  Minimal ROOT environment set by the daemons before starting proofserv  Setting the experiment environment  Statically, before starting xrootd (inherited by proofserv)  Dynamically, by evaluating a user-defined script in front of proofserv, which allows different versions to be selected at run time

René Brun40 PROOF package management  Allows client to add software to be used in the analysis  Uploaded in the form of PAR files (Proof ARchive)  Simple structure package/ –Source / binary files package/PROOF-INF/BUILD.sh –How to build the package (makefile) package/PROOF-INF/SETUP.C –How to enable the package (load, dependencies)  Versioning support being added  Possibility to modify library / include paths to use public external packages (experiment libraries)

René Brun41 Integration with experiment software: ALICE  AliROOT analysis framework  Deployed on all nodes  Needs to be rebuilt for each new ROOT version (versioning issue being solved)  One additional package (ESD) needed to read Event Summary Data  Uploaded as PAR file  Working group software automatically converted to PROOF packages (‘make’ target added to Makefile)  Generic AliSelector hiding details  User’s selector derives from AliSelector  Access to data via the member fESD (diagram: AliSelector derives from TSelector)

René Brun42 Integration with experiment software: ALICE  Alternative solution:  split the analysis into functional modules (tasks)  Each task corresponds to a well-defined action  Tasks are connected via input/output data containers, defining their inter-dependencies  The user creates tasks (deriving from AliAnalysisTask) and registers them with a task manager provided by the framework  The task manager, which derives from TSelector, takes care of the proper execution respecting the dependencies

René Brun43 Integration with experiment software: Phobos  TAM: Tree Analysis Modules solution  Modules structured like TSelector (Begin, Process, …) separating tree structure from analysis  Organization:  TAModule (: public TTask), base class of all modules ReqBranch (name, pointer) – attach to a branch in Begin() or SlaveBegin() LoadBranch (name) –Load the branch data in Process()  TAMSelector (: public TSelector) Module running and management Handle interaction with tree  TAMOutput: stores module output objects

René Brun44 Integration with experiment software: Phobos. Example of user’s module:
class TMyMod : public TAModule {
private:
   TPhAnTEventInfo* fEvtInfo;   // event info
   TH1F*            fEvtNumH;   // event num histogram
protected:
   void SlaveBegin();
   void Process();
};
void TMyMod::SlaveBegin() {
   ReqBranch("eventInfo", fEvtInfo);
   fEvtNumH = new TH1F("EvtNumH", "Event Num", 10, 0, 10);
}
void TMyMod::Process() {
   LoadBranch("eventInfo");
   fEvtNumH->Fill(fEvtInfo->fEventNum);
}

René Brun45 Integration with experiment software: Phobos. Example analysis:
Build module hierarchy:
TMyMod* myMod = new TMyMod;
TMyOtherMod* subMod = new TMyOtherMod;
myMod->Add(subMod);
No PROOF:
TAMSelector* mySel = new TAMSelector;
mySel->AddInput(myMod);
tree->Process(mySel);
TList* output = mySel->GetModOutput();
PROOF:
dset->AddInput(myMod);
dset->Process("TAMSelector");
TList* output = gProof->GetOutputList();

René Brun46 Integration with experiment software: CMS  Environment: CMS needs to run SCRAM before proofserv  PROOF_INITCMD contains the path of a script  The script initializes the CMS environment using SCRAM
TProof::AddEnvVar("PROOF_INITCMD", "~maartenb/proj/cms/CMSSW_1_1_1/setup_proof.sh")
#!/bin/sh
# Export the architecture
export SCRAM_ARCH=slc3_ia32_gcc323
# Init CMS defaults
cd ~maartenb/proj/cms/CMSSW_1_1_1
. /app/cms/cmsset_default.sh
# Init runtime environment
scramv1 runtime -sh > /tmp/dummy
cat /tmp/dummy

René Brun47 Integration with experiment software: CMS  CMSSW: the software framework provides the EDAnalyzer technology for analysis purposes  Write algorithms that can be used with both technologies (EDAnalyzer and TSelector)  Possible with a well-defined interface:
class MyAnalysisAlgorithm {
   void process( const edm::Event & );
   void postProcess( TList & );
   void terminate( TList & );
};
 Used in a templated TSelector framework: TFWLiteSelector

René Brun48 Integration with experiment software: CMS  In PROOF, selector libraries are distributed as PAR files
// Load framework library
gSystem->Load("libFWCoreFWLite");
AutoLibraryLoader::enable();
// Load TSelector library
gSystem->Load("libPhysicsToolsParallelAnalysis");

René Brun49 ALICE experience at the CAF  Alternative to using the Grid  Massive execution of jobs vs. fast response time  Available to the whole collaboration  number of users will be limited for efficiency reasons  Design goals  500 CPUs, 200 TB of selected data locally available  CERN Analysis Facility used for short / medium tasks  p-p prompt analysis, Pb-Pb pilot analysis  Calibration & Alignment

René Brun50 ALICE at CAF: example of data distribution  20% last-day RAW events: 3.2M PbPb or 40M pp  20% fixed RAW events: 1.6M PbPb and 20M pp  20% fixed ESDs: 8M PbPb and 500M pp  40% cache for files retrieved from AliEn Grid, Castor Total: 200 TB (sizes of single events from the Computing TDR)

René Brun51 ALICE experience at the CAF: test setup  Test setup since May 2006  40 machines, 2 CPUs each (Xeon 2.8 GHz), ~200 GB disk  5 as development partition, 35 as production partition  Machine pools are managed by xrootd  Fraction of data of Physics Data Challenge ’06 distributed (~1M events)  Tests performed  Usability tests  Speedup tests  Evaluation of the system when running a combination of query types  Integration with ALICE’s analysis framework (AliROOT)

René Brun52 ALICE experience at the CAF: realistic stress test  A realistic stress test consists of different users that submit different types of queries  4 different query types  20% very short queries (0.4 GB)  40% short queries (8 GB)  20% medium queries (60 GB)  20% long queries (200 GB)  User mix  33 nodes available for the test  Maximum average speedup for 10 users = 6.6 (33 nodes = 66 CPUs)

René Brun53 ALICE experience at the CAF: query types
Name       # files   # evts   processed data   avg. time*      submission interval
VeryShort  20        2K       0.4 GB           9 ± 1 s         30 ± 15 s
Short      20        40K      8 GB             150 ± 10 s      120 ± 30 s
Medium     150       300K     60 GB            1,380 ± 60 s    300 ± 120 s
Long       500       1M       200 GB           4,500 ± 200 s   600 ± 120 s
*run in PROOF, 10 users, 10 workers each

René Brun54 ALICE experience at the CAF: speed-up

René Brun55 ALICE experience at the CAF: speed-up  Theoretical batch limit achieved and by-passed automatically  Machine load was 80-90% during the test  Adding workers may be inefficient  Tune the number of workers for optimal response  Depends on query type and internals (number, type and size of the output objects)  Shows the importance of active scheduling

René Brun56 PROOF Dataset Features  A dataset represents a list of files (e.g. physics run X)  Correspondence between AliEn dataset and PROOF dataset  Users register datasets  The files contained in a dataset are automatically staged from AliEn/CASTOR (and kept available)  Datasets are used for processing with PROOF Contain all relevant information to start processing (location of files, abstract description of content of files)  File-level storing by underlying xrootd infrastructure  Datasets are public for reading  Global datasets

René Brun57 PROOF worker / xrootd disk server (many) PROOF Master / xrootd redirector PROOF master Dataset removes dataset uses dataset registers dataset data manager daemon data manager daemon keeps dataset persistent by requesting staging updating file information touching files olbd/ xrootd olbd/ xrootd file stager stages files removes files that are not used (least recently used above threshold) selects disk server and forwards stage request WN disk … write delete read read, touch stage Dataset concept AliEn SE CASTOR MSS

René Brun58 Monitoring with MonALISA  Cluster (machine-level) monitoring with ApMon  Query statistics  Sent at the end of each query  CPU quotas: consolidation done by ML  Disk quotas: visualized by ML  Watch live: pcalimonitor.cern.ch, "CAF monitoring" (plots: CPU per group, an aggregation of the CPU used by each query type over time; overall cluster usage > 70% (user CPU only))

René Brun59 Summary  PROOF provides an alternative approach to HEP analysis on farms, automatically avoiding under-usage while preserving the goodies of interactivity  The real issue is data access (everybody is affected!)  pre-fetching and asynchronous techniques help  Alternative technologies (e.g. InfiniBand) or alternative ideas (PetaCache) are worth investigating