PROOF
Gerardo GANIS, CERN / LCG
2nd LCG-France Colloquium, 15/03/2007

Outline
- PROOF essentials
- Integration with experiment software
- ALICE experience at the CAF


PROOF essentials
Motivation: provide an alternative, dynamic approach to end-user HEP analysis on distributed systems.
- Typical HEP analysis is a continuous refinement cycle: implement the algorithm, run over the data set, make improvements, repeat.
- Data sets are collections of independent events
  - Large (e.g. ALICE ESD+AOD: ~350 TB / year)
  - Spread over many disks and mass storage systems
- Exploiting the intrinsic parallelism is the only way to analyze the data in reasonable times.

PROOF essentials: the classic (batch) approach
[Diagram: the user looks up the files in a catalog, splits the data by hand, submits myAna.C jobs to the batch farm queues; a manager collects the job outputs from storage and merges them for the final analysis.]
- Static use of resources
- Jobs frozen: 1 job / worker node
- Manual splitting and merging
- Monitoring requires instrumentation

PROOF essentials: an alternative approach
[Diagram: the client sends a PROOF query (data file list + myAna.C) to the MASTER of the PROOF farm; the scheduler distributes the work, workers read the files from storage, and the merged final outputs plus real-time feedback are returned to the client.]
- Farm perceived as an extension of the local PC
- Same syntax as in a local session
- More dynamic use of resources
- Real-time feedback
- Automated splitting and merging

PROOF essentials: target
- Short analysis using local resources, e.g. end-analysis calculations, visualization
- Medium-term jobs, e.g. analysis design and development, also using non-local resources
- Long analysis jobs with well-defined algorithms (e.g. production of personal trees)
Optimize the response for short / medium jobs: the goal is to perceive medium jobs as short ones.

PROOF essentials: design goals
- Transparency: minimal impact on ROOT user habits
- Scalability: full exploitation of the available resources
- Adaptability: cope transparently with heterogeneous environments
- Real-time interaction and feedback
Addresses the case of central or departmental analysis facilities (Tier-2s) as well as multi-core, multi-disk desktops.

PROOF essentials: what can be done?
Ideally everything that can be split into independent tasks. Currently available:
- Processing of trees (see next slide): tree processing and drawing functionality complete
- Processing of independent objects in a file

LOCAL session:

  // Create a chain of trees (e.g. with a helper macro)
  root[0] TChain *c = CreateMyChain();
  // MySelec is a TSelector
  root[1] c->Process("MySelec.C+");

PROOF session:

  // Create a chain of trees (e.g. with a helper macro)
  root[0] TChain *c = CreateMyChain();
  // Start PROOF and tell the chain to use it
  root[1] TProof::Open("masterURL");
  root[2] c->SetProof();
  // Process() now goes via PROOF
  root[3] c->Process("MySelec.C+");

The ROOT data model: Trees & Selectors
The selector drives the event loop over a chain of trees, reading only the needed parts (branches, leaves) of each event:
- Begin(): create histograms, ..., define the output list
- Process(): called for each event; preselection, then analysis
- Terminate(): final analysis (fitting, ...) on the output list
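To make the model concrete, a minimal selector skeleton could look as follows; this is an illustrative sketch (class name, histogram and branch handling are simplified), not code taken from the slides:

  // MySelec.C - minimal TSelector sketch (illustrative)
  #include "TSelector.h"
  #include "TTree.h"
  #include "TH1F.h"

  class MySelec : public TSelector {
  public:
     TTree *fChain;   // tree/chain being processed
     TH1F  *fHist;    // example output histogram

     MySelec() : fChain(0), fHist(0) { }
     virtual Int_t  Version() const { return 2; }
     virtual void   Init(TTree *tree) { fChain = tree; }
     virtual void   SlaveBegin(TTree *) {
        // create output objects and register them in the output list
        fHist = new TH1F("hEx", "example", 100, 0., 10.);
        fOutput->Add(fHist);
     }
     virtual Bool_t Process(Long64_t entry) {
        fChain->GetTree()->GetEntry(entry);  // read the needed parts of the event
        // ... preselection and analysis, e.g. fHist->Fill(...) ...
        return kTRUE;
     }
     virtual void   Terminate() {
        // final analysis (fitting, drawing, ...) on the merged fOutput
     }
     ClassDef(MySelec, 0)
  };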

PROOF essentials: multi-tier architecture
- Structured master, with one sub-master per geographic domain
  - Adapts to clusters of clusters
  - Improves scalability
- Heterogeneous hardware / OS supported
- Node sessions started by xrootd (xproofd)
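For reference, the static cluster description read by the master is a simple text file (commonly proof.conf); a minimal sketch with placeholder host names, assuming the usual keyword syntax:

  # proof.conf - static description of the PROOF cluster (illustrative)
  master proofmaster.example.org
  worker node01.example.org
  worker node02.example.org
  worker node03.example.org  perf=100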

PROOF essentials: connection layer
- Sets up the client session: authentication, sandbox setup, starting the sessions on the nodes
- Based on xrootd: a lightweight, industrial-strength networking and protocol handler
- New PROOF-related protocol plug-in, xpd
  - xpd launches and controls the PROOF sessions (proofserv)
  - xrootd acts as a coordinator on the farm
- Client disconnection / reconnection handled naturally
- The same daemon can be used for data and PROOF serving
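A rough sketch of how the xpd plug-in is wired into the xrootd configuration; the directive names below are assumptions based on contemporary xproofd setups and may not match the exact syntax:

  # xrootd configuration fragment enabling the PROOF protocol plug-in (sketch)
  xrd.protocol xproofd:1093 libXrdProofd.so   # load xpd on port 1093
  xpd.rootsys  /opt/root                      # ROOT installation used by proofserv
  xpd.workdir  /pool/proofbox                 # sandbox area for the sessions
  xpd.resource static /etc/proof/proof.conf   # static list of workers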

PROOF essentials: dynamic load balancing
- Pull architecture guarantees scalability: each worker asks the master for the next packet of events as soon as it has finished the previous one
- Adapts to permanent or temporary variations in worker performance
[Diagram: the master hands out packets to Worker 1 ... Worker N on request.]
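The pull scheme can be illustrated with a toy sketch (plain C++, not actual PROOF code): the master owns the list of packets and each worker asks for the next one as soon as it is done, so faster workers automatically receive more work.

  #include <cstdio>
  #include <vector>

  struct Packet { long long first, nEvents; };

  struct Master {                       // toy stand-in for the PROOF master
     std::vector<Packet> fPackets;
     bool GetNextPacket(Packet &p) {    // called by a worker when it is idle
        if (fPackets.empty()) return false;
        p = fPackets.back();
        fPackets.pop_back();
        return true;
     }
  };

  void WorkerLoop(int id, Master &m) {  // one worker pulling until no work is left
     Packet p;
     while (m.GetNextPacket(p))
        std::printf("worker %d: processing events %lld-%lld\n",
                    id, p.first, p.first + p.nEvents - 1);
  }

  int main() {
     Master m;
     for (long long i = 0; i < 10; ++i) {
        Packet p = { i * 1000, 1000 };  // 10 packets of 1000 events
        m.fPackets.push_back(p);
     }
     WorkerLoop(1, m);                  // sequential stand-in for N parallel workers
     return 0;
  }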

PROOF essentials: intrinsic scalability
[Plot 1: strictly concurrent user jobs at the CAF (100% CPU used), in-memory data, dual Xeon 2.8 GHz.]
[Plot 2: CMS analysis with 1 master and 80 workers (dual Xeon 3.2 GHz, local data: 1.4 GB / node, non-blocking Gigabit Ethernet), for 1, 2, 4 and 8 concurrent users. I. Gonzales, Cantabria]

PROOF essentials: exploiting multi-cores
[Plot: ALICE search for 0's on 4 GB of simulated data; instantaneous rates (evt/s, MB/s).]
- Clear advantage of the quad core
- The additional computing power is fully exploited

PROOF essentials: additional remarks
- The intrinsic serial overhead is small, but a reasonable connection between a (sub-)master and its workers is required
- Hardware considerations
  - IO-bound analysis (frequent in HEP) is often limited by hard-drive access: N small disks are much better than 1 big one
  - A good amount of RAM is needed for efficient data caching
- Data access is The Issue:
  - Optimize for data locality, when possible
  - Efficient access to mass storage (next slide)

PROOF essentials: data access issues
Low latency in data access is essential for high performance (not only a PROOF issue).
- File-opening overhead: minimized using asynchronous-open techniques
- Data retrieval: caching and pre-fetching of the data segments to be analyzed, recently introduced in ROOT for TTree
- Techniques improving network performance (e.g. InfiniBand) or file access (e.g. memory-based file serving, PetaCache) should be evaluated
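For illustration, the asynchronous open and the TTree read-ahead cache mentioned above can be used roughly as follows (ROOT macro fragment; file and tree names are placeholders, and the exact methods available depend on the ROOT version):

  // Asynchronous file open: issue the request, do other work, then attach
  TFileOpenHandle *h = TFile::AsyncOpen("root://server//data/run1.root");
  // ... other initialization while the open is in flight ...
  TFile *f = TFile::Open(h);              // completes the asynchronous open

  // TTree cache: pre-fetch the baskets of the branches that will be read
  TTree *t = (TTree *) f->Get("esdTree");
  t->SetCacheSize(10000000);              // 10 MB read-ahead cache
  t->AddBranchToCache("*");               // cache all (or only the needed) branches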

PROOF essentials: scheduling multiple users
- Goal: fair resource sharing, enforcing priority policies
- Priority-based worker-level load balancing
  - Simple and solid implementation, no central unit
  - Lower-priority sessions are slowed down
  - Group priorities defined in the configuration file
- Future: central scheduler taking per-query decisions based on the cluster load, the resources needed by the query, the user history and priorities
  - Generic interface to external schedulers planned (MAUI, LSF, ...)

PROOF essentials: management tools
- Data sets: optimized distribution of data files on the farm
  - By direct upload
  - By staging from mass storage (e.g. CASTOR)
- Query results: retrieve, archive
- Packages: optimized upload of additional libraries needed by the analysis
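A sketch of the data-set interface from the client session (the data-set name and file URLs are placeholders; the method names assume the dataset support of contemporary TProof versions):

  // Build the list of files and register it as a named data set
  TList *files = new TList;
  files->Add(new TFileInfo("root://server//data/file_001.root"));
  files->Add(new TFileInfo("root://server//data/file_002.root"));
  gProof->UploadDataSet("MyDataSet", files);

  // Inspect the data sets known to the cluster
  gProof->ShowDataSets();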

PROOF essentials: monitoring
- Internal
  - File-access rates, packet latencies, processing time, etc.
  - Basic set of histograms available at a tunable frequency
  - Client temporary output objects can also be retrieved
  - Possibility of a detailed tree for further analysis
- MonALISA-based
  - Each host reports CPU, memory, swap, network
  - Each worker reports CPU, memory, evt/s, IO vs. network rate
  - e.g. pcalimonitor.cern.ch:8889 (network traffic between nodes)

PROOF GUI controller
Allows full point-and-click control:
- Define a new session
- Submit a query, execute a command
- Query editor: create / pick up a TChain, choose selectors
- Online monitoring of feedback histograms
- Browse folders with the results of a query; retrieve, delete, archive functionality
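In ROOT this controller is provided by the session viewer; assuming the TSessionViewer class that implements the PROOF GUI, it can be started from the prompt:

  // Start the PROOF GUI controller from a ROOT session
  root[0] new TSessionViewer();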

Outline
- PROOF essentials
- Integration with experiment software
  - Main issues
  - PROOF packages
  - Examples: ALICE, Phobos, CMS
- ALICE experience at the CAF

Integration with experiment software: main issues
- Finding and using the experiment software: environment settings, library loading
- Implementing the analysis algorithms
  - TSelector strengths: automatic tree interaction, structured analysis
  - TSelector weaknesses: big macros; a new analysis implies a new selector; a change in the tree definition implies a new selector
  - Add a layer to improve flexibility and to hide irrelevant details

Integration with experiment software
- Experiment software framework available on the nodes
  - Working-group dedicated packages uploaded / enabled as PROOF packages (next slide)
  - Allows users to run their own modifications
- Minimal ROOT environment set by the daemons before starting proofserv
- Setting the experiment environment
  - Statically, before starting xrootd (inherited by proofserv)
  - Dynamically, by evaluating a user-defined script before starting proofserv; allows selecting different versions at run time

PROOF package management
- Allows the client to add software to be used in the analysis
- Uploaded in the form of PAR files (Proof ARchive), with a simple structure:
  - package/ : source / binary files
  - package/PROOF-INF/BUILD.sh : how to build the package (makefile)
  - package/PROOF-INF/SETUP.C : how to enable the package (load, dependencies)
- Versioning support being added
- Possibility to modify library / include paths to use public external packages (experiment libraries)
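From the client session a PAR package is typically handled along these lines (the package name is a placeholder):

  // Upload the archive, then build and enable it on all nodes
  gProof->UploadPackage("MyAnalysis.par");
  gProof->EnablePackage("MyAnalysis");
  // List the known / enabled packages of the current session
  gProof->ShowPackages();
  gProof->ShowEnabledPackages();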

Integration with experiment software: ALICE
- AliROOT analysis framework deployed on all nodes
  - Needs to be rebuilt for each new ROOT version (versioning issue being solved)
- One additional package (ESD) needed to read the Event Summary Data, uploaded as a PAR file
- Working-group software automatically converted to PROOF packages ('make' target added to the Makefile)
- Generic AliSelector hiding the details: the user's selector derives from AliSelector and accesses the data through the member fESD
  (class hierarchy: TSelector <- AliSelector <- <UserSelector>)
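A user selector in this scheme might look roughly as follows; this is only a sketch, and the AliSelector interface and ESD accessors used here are assumptions:

  // Sketch of a user selector built on AliSelector (illustrative names)
  class MyPtSelector : public AliSelector {
  public:
     TH1F *fPtH;   // example output histogram

     MyPtSelector() : fPtH(0) { }

     virtual void SlaveBegin(TTree *tree) {
        AliSelector::SlaveBegin(tree);
        fPtH = new TH1F("hPt", "track p_{T}", 100, 0., 10.);
        fOutput->Add(fPtH);
     }
     virtual Bool_t Process(Long64_t entry) {
        // AliSelector reads the event and fills the fESD member
        if (!AliSelector::Process(entry)) return kFALSE;
        for (Int_t i = 0; i < fESD->GetNumberOfTracks(); i++)
           fPtH->Fill(fESD->GetTrack(i)->Pt());
        return kTRUE;
     }
     ClassDef(MyPtSelector, 0)
  };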

Integration with experiment software: ALICE (alternative solution)
- Split the analysis into functional modules (tasks); each task corresponds to a well-defined action
- Tasks are connected via input/output data containers, defining their inter-dependencies
- The user creates tasks (derived from AliAnalysisTask) and registers them with a task manager provided by the framework
- The task manager, which derives from TSelector, takes care of the proper execution, respecting the dependencies

Integration with experiment software: Phobos
TAM (Tree Analysis Modules) solution: modules structured like a TSelector (Begin, Process, ...), separating the tree structure from the analysis.
Organization:
- TAModule (: public TTask), base class of all modules
  - ReqBranch(name, pointer): attach to a branch, in Begin() or SlaveBegin()
  - LoadBranch(name): load the branch data, in Process()
- TAMSelector (: public TSelector)
  - Module running and management
  - Handles the interaction with the tree
- TAMOutput: stores the module output objects

Integration with experiment software: Phobos
Example of a user's module:

  class TMyMod : public TAModule {
  private:
     TPhAnTEventInfo *fEvtInfo;   // event info
     TH1F            *fEvtNumH;   // event number histogram
  protected:
     void SlaveBegin();
     void Process();
  };

  void TMyMod::SlaveBegin()
  {
     ReqBranch("eventInfo", fEvtInfo);
     fEvtNumH = new TH1F("EvtNumH", "Event Num", 10, 0, 10);
  }

  void TMyMod::Process()
  {
     LoadBranch("eventInfo");
     fEvtNumH->Fill(fEvtInfo->fEventNum);
  }

Integration with experiment software: Phobos
Example analysis: build the module hierarchy, then run it with or without PROOF.

  // Build the module hierarchy
  TMyMod      *myMod  = new TMyMod;
  TMyOtherMod *subMod = new TMyOtherMod;
  myMod->Add(subMod);

  // Without PROOF: run the selector directly on the tree
  TAMSelector *mySel = new TAMSelector;
  mySel->AddInput(myMod);
  tree->Process(mySel);
  TList *output = mySel->GetModOutput();

  // With PROOF: run on a data set; the selector is instantiated on the workers
  dset->AddInput(myMod);
  dset->Process("TAMSelector");
  TList *output = gProof->GetOutputList();

Integration with experiment software: CMS
Environment: CMS needs to run SCRAM before proofserv starts.
- PROOF_INITCMD contains the path of a script
- The script initializes the CMS environment using SCRAM

  TProof::AddEnvVar("PROOF_INITCMD",
                    "~maartenb/proj/cms/CMSSW_1_1_1/setup_proof.sh")

  #!/bin/sh
  # Export the architecture
  export SCRAM_ARCH=slc3_ia32_gcc323
  # Init CMS defaults
  cd ~maartenb/proj/cms/CMSSW_1_1_1
  . /app/cms/cmsset_default.sh
  # Init runtime environment
  scramv1 runtime -sh > /tmp/dummy
  cat /tmp/dummy

Integration with experiment software: CMS
- CMSSW (the software framework) provides the EDAnalyzer technology for analysis
- Goal: write algorithms that can be used with both technologies (EDAnalyzer and TSelector); possible if the interface is well defined:

  class MyAnalysisAlgorithm {
     void process( const edm::Event & );
     void postProcess( TList & );
     void terminate( TList & );
  };

- Used in a templated TSelector framework: TFWLiteSelector<MyAnalysisAlgorithm>

Integration with experiment software: CMS
In PROOF, the selector libraries are distributed as a PAR file:

  // Load framework library
  gSystem->Load("libFWCoreFWLite");
  AutoLibraryLoader::enable();
  // Load TSelector library
  gSystem->Load("libPhysicsToolsParallelAnalysis");

Outline
- PROOF essentials
- Integration with experiment software
- ALICE experience at the CAF

ALICE experience at the CAF
- CERN Analysis Facility (CAF): used for short / medium tasks
  - p-p prompt analysis, Pb-Pb pilot analysis
  - Calibration & alignment
- Alternative to using the Grid: massive execution of jobs vs. fast response time
- Available to the whole collaboration; the number of users will be limited for efficiency reasons
- Design goals: 500 CPUs, 200 TB of selected data locally available

ALICE at the CAF: example of data distribution (total: 200 TB)
- 20%  Last-day RAW events: 3.2M PbPb or 40M pp
- 20%  Fixed RAW events: 1.6M PbPb and 20M pp
- 20%  Fixed ESDs: 8M PbPb and 500M pp
- 40%  Cache for files retrieved from the AliEn Grid and CASTOR
(Sizes of single events taken from the Computing TDR.)

ALICE experience at the CAF: test setup
- Test setup in place since May 2006
  - 40 machines, 2 CPUs each (Xeon 2.8 GHz), ~200 GB disk
  - 5 machines as development partition, 35 as production partition
  - Machine pools managed by xrootd
  - Fraction of the Physics Data Challenge '06 data distributed (~1M events)
- Tests performed
  - Usability tests
  - Speed-up tests
  - Evaluation of the system when running a combination of query types
  - Integration with ALICE's analysis framework (AliROOT)

ALICE experience at the CAF: realistic stress test
A realistic stress test consists of different users submitting different types of queries.
- 4 query types in the user mix:
  - 20% very short queries (0.4 GB)
  - 40% short queries (8 GB)
  - 20% medium queries (60 GB)
  - 20% long queries (200 GB)
- 33 nodes available for the test
- Maximum average speed-up for 10 users = 6.6 (33 nodes = 66 CPUs)

ALICE experience at the CAF: query types

  Name        # files   # evts   Processed data   Avg. time*       Submission interval
  VeryShort        20       2K          0.4 GB        9 ± 1 s            30 ± 15 s
  Short             -      40K            8 GB      150 ± 10 s          120 ± 30 s
  Medium          150     300K           60 GB    1,380 ± 60 s         300 ± 120 s
  Long            500       1M          200 GB   4,500 ± 200 s         600 ± 120 s

  * run in PROOF, 10 users, 10 workers each

ALICE experience at the CAF: speed-up
[Plot: speed-up measured during the stress test.]

ALICE experience at the CAF: speed-up (observations)
- The theoretical batch limit is reached and by-passed automatically
- The machine load was 80-90% during the test
- Adding workers may be inefficient: tune the number of workers for an optimal response
  - Depends on the query type and its internals (number, type and size of the output objects)
- Shows the importance of active scheduling

Summary
- PROOF provides an alternative approach to HEP analysis on farms, trying to automatically avoid under-usage of resources while preserving the goodies of interactivity
- The real issue is data access (everybody is affected!)
  - Pre-fetching and asynchronous techniques help
  - Alternative technologies (e.g. InfiniBand) or alternative ideas (PetaCache) are worth investigating
- ALICE is pioneering the system in the LHC environment using a test CAF at CERN
- CMS has manifested its interest and test clusters are being set up
- A lot of useful feedback: PROOF is steadily improving

Credits
PROOF team: M. Ballintijn, B. Bellenot, L. Franco, G. Ganis, J. Iwaszkiewicz, F. Rademakers
J.F. Grosse-Oetringhaus, A. Peters (ALICE)
I. Gonzales, L. Lista (CMS)
A. Hanushevsky (SLAC)
C. Reed (MIT, Phobos)

Questions?