1  PROOF – The Parallel ROOT Facility
Gerardo Ganis / CERN
CHEP06, Computing in High Energy Physics, 13 – 17 Feb 2006, Mumbai, India
"Bring the KB to the PB and not the PB to the KB"

2  What is this talk about?
End-user analysis of HEP data on distributed systems using the ROOT data model.
ROOT? A package providing:
- efficient data storage supporting structured data sets
- an efficient query system to access the information
- a complete set of tools for scientific analysis
- advanced 2D / 3D visualization and GUI systems
- a C++ interpreter
HEP data? Collections of independent events.

3  The ROOT data model: Trees & Selectors
[Diagram: a Chain of Trees made of branches and leaves, entries 1 … n; only the needed parts are read.]
The Selector drives the loop over events:
- Begin(): create histograms, …, define the output list
- Process(): called for each event; preselection and analysis
- Terminate(): final analysis (fitting, …) on the output list
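The structure on this slide maps onto ROOT's TSelector interface. Below is a minimal sketch of such a selector; the class name MySelector, the branch name "pt" and the histogram are illustrative choices, not taken from the talk.

    #include <TSelector.h>
    #include <TTree.h>
    #include <TH1F.h>

    // Minimal selector sketch; class, branch and histogram names are illustrative.
    class MySelector : public TSelector {
    public:
       TTree  *fChain;   // the chain being analyzed
       Float_t fPt;      // buffer for the "pt" branch
       TH1F   *fHpt;     // output histogram

       MySelector() : fChain(0), fPt(0), fHpt(0) {}

       void Init(TTree *tree) {
          fChain = tree;
          fChain->SetBranchAddress("pt", &fPt);    // read only the needed branch
       }
       void SlaveBegin(TTree *) {
          fHpt = new TH1F("hpt", "p_{T}", 100, 0., 50.);
          fOutput->Add(fHpt);                      // objects in fOutput are merged on the master
       }
       Bool_t Process(Long64_t entry) {
          fChain->GetEntry(entry);                 // one call per event
          if (fPt > 1.0) fHpt->Fill(fPt);          // preselection + analysis
          return kTRUE;
       }
       void Terminate() {
          // final analysis (fitting, drawing, ...) on the merged output list
       }
    };

Run locally, such a class is the argument of the chain's Process(); on PROOF the same class is shipped to the workers and its output-list objects are merged on the master.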

4  End-user analysis scenarios
- Analysis performance is typically I/O bound.
- Analysis requirements of LHC experiments (~100 users):
  - CPU: ~ MSI2k (~ 250 Intel Core Duo)
  - Storage: ~ 0.3 – 1.6 PB
  - Bandwidth: ~ 100 MB/s
- The intrinsic parallelism of the data is classically exploited by splitting long analysis jobs into smaller sub-jobs that address different portions of the data and run concurrently.
- Farms of a few hundred nodes; good access to the data is required.

5  “Classic” approach
[Diagram: the user queries a catalog, splits the data file list, and submits myAna.C jobs to the batch farm (queues, manager); the jobs read from storage, the outputs are merged, and the final analysis is run on the merged result.]
- “static” use of resources
- jobs frozen: 1 job / worker node
- “manual” splitting and merging
- limited monitoring (only at the end of each job)

6  The PROOF approach
[Diagram: the PROOF query (data file list, myAna.C) goes to the MASTER of the PROOF farm via a scheduler; the farm accesses the catalog and storage; merged final outputs and merged feedback objects are returned to the user.]
- the farm is perceived as an extension of the local PC
- same macro and syntax as in a local session
- more dynamic use of resources
- real-time feedback
- automated splitting and merging
A minimal usage sketch follows.
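As a sketch of what "same macro, same syntax" means in practice (the dataset, selector file and master host name below are placeholders, and the exact calls may differ between ROOT versions):

    // Local session: process the chain on this machine.
    TChain chain("Events");
    chain.Add("root://server//data/run1.root");   // placeholder dataset
    chain.Process("MySelector.C+");

    // PROOF session: same chain and selector, processed on the farm.
    TProof::Open("proofmaster.example.org");      // connect to the master
    chain.SetProof();                             // route Process() through PROOF
    chain.Process("MySelector.C+");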

7  PROOF – Multi-tier architecture
[Diagram: client – master – sub-masters – workers (proofd); connection quality: "good connection? VERY important" within the farm, "less important" towards the client.]
- Optimize for data locality or for efficient data-server access.
- Adapts to clusters of clusters or wide-area virtual clusters.
- Geographically separated domains; heterogeneous machine types.

8  PROOF: ingredients
- PROOF servers are full-featured ROOT applications.
- Communication layer set up via light daemons ((x)proofd); authentication: password-based, GSI, Kerberos, …
- Dynamic load balancing: pull architecture, i.e. workers ask for work when finished, so faster workers get more work.
- Merging infrastructure via Merge():
  - implemented for standard objects: histograms, trees, …
  - user-defined strategy by overloading / defining Merge() (see the sketch after this list)
- Feedback at tunable frequency:
  - standard statistics histograms (events/packets per node, …)
  - temporary version of any output object
- Package manager for optimized upload of additional libraries needed by the analysis.
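For the user-defined merging strategy mentioned above, here is a minimal sketch of a custom mergeable class; the class name MyCounter and its members are assumptions for illustration, not code from the talk.

    #include <TObject.h>
    #include <TCollection.h>

    // Sketch of a user-defined mergeable object; the class name is illustrative.
    // In practice a ROOT dictionary (ClassDef + ACLiC) is needed so the object
    // can be shipped from the workers to the master.
    class MyCounter : public TObject {
    public:
       Long64_t fCount;

       MyCounter() : fCount(0) {}

       // Called on the master with the list of copies produced by the workers.
       Long64_t Merge(TCollection *list) {
          TIter next(list);
          while (TObject *obj = next()) {
             MyCounter *c = dynamic_cast<MyCounter*>(obj);
             if (c) fCount += c->fCount;
          }
          return fCount;   // return a value >= 0 on success, -1 on failure
       }
    };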

9  PROOF – Scalability (case of data locality)
- Test bed: 32 nodes, each with dual Itanium II 1 GHz CPUs, 2 GB RAM, 2 x 75 GB 15K SCSI disks, Fast Ethernet (a Gigabit Ethernet NIC was not used).
- Each node holds one copy of the data set (4 files, 277 MB in total); over 32 nodes: 8.8 GB in 128 files, 9 million events.
- Processing the 8.8 GB / 128 files takes 325 s on 1 node and 12 s on 32 nodes in parallel; efficiency ~ 90 %.

10  PROOF: data access and scheduling issues
- Low latency in data access is essential:
  - minimize file-opening overhead (asynchronous open)
  - caching, asynchronous read-ahead of the required segments
  - supported by xrootd (see F. Furano, #368, Feb 15th, 16:20; A. Hanushevsky, #407, Feb 15th, 17:00)
- Scheduling of large numbers of users:
  - interface to a generic resource broker to optimize the load
  - batch systems have this already: concrete implementations exist for LSF and Condor; planned for BQS, Sun Grid Engine, …
  - on the GRID, use the available services to determine the session configuration
A hedged sketch of the asynchronous open is shown below.
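To illustrate the asynchronous open, here is a hedged sketch based on ROOT's TFile interface; the file URL is a placeholder and the details should be checked against the ROOT version in use.

    #include <TFile.h>

    void asyncOpen()
    {
       // Start the open in the background (no blocking on network latency).
       TFileOpenHandle *h = TFile::AsyncOpen("root://server//data/run1.root");

       // ... do other useful work while the open is in flight ...

       // Block only when the file is actually needed.
       TFile *f = TFile::Open(h);
       if (f && !f->IsZombie()) {
          // use the file ...
          f->Close();
       }
    }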

11  PROOF @ GRID
[Diagram: user session (ROOT, TGrid UI / Queue UI) – GRID service interfaces (Grid/ROOT authentication, Grid access-control service, Grid file/metadata catalogue, proofd startup) – PROOF master server – PROOF sub-master servers – PROOF slave servers.]
- Guaranteed site access through PROOF sub-masters calling out to the master (agent technology).
- The client retrieves the list of logical files (LFN + MSN).
- Slave servers access the data via xrootd from local disk pools.
- Demo'ed by ALICE at SC03, SC04, …

12  What is our goal?

13  Typical end-user job-length distribution
[Plot annotation: three regimes of job length.]
- Interactive analysis using local resources, e.g. end-analysis calculations, visualization.
- Analysis jobs with well-defined algorithms (e.g. production of personal trees).
- Medium-term jobs, e.g. analysis design and development, also using non-local resources.
Goal: bring these to the same level of perception.

14  Sample of analysis activity
Monday, 10h15 – ROOT session on my laptop:
- AQ1: a 1 s query produces a local histogram
- AQ2: a 10 min query submitted to PROOF1
- AQ3 → AQ7: short queries
- AQ8: a 10 h query submitted to PROOF2
Monday, 16h25 – ROOT session on my laptop:
- BQ1: browse results of AQ2
- BQ2: browse temporary results of AQ8
- BQ3 → BQ6: submit four 10 min queries to PROOF1
Wednesday, 8h40 – browse from any web browser:
- CQ1: browse results of AQ8, BQ3 → BQ6

15  What do we have now?
- Most of the ingredients are in place:
  - support for multiple sessions
  - asynchronous (non-blocking) running mode
  - support for disconnect / reconnect
  - xrootd used as launcher of server sessions
  - query-results classification and management (retrieve / archive / remove)
- Command-line controlled via the TProof API (a hedged sketch follows) … but also GUI controlled.
Virtual demo
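A hedged sketch of the command-line query management through the TProof API; the master host and the query reference strings are placeholders.

    #include <TProof.h>

    void manageQueries()
    {
       TProof *p = TProof::Open("proofmaster.example.org");

       p->ShowQueries();              // list the known queries and their status
       p->Retrieve("session:query1"); // fetch the full results of a finished query
       p->Archive("session:query1");  // archive them for later browsing
       p->Remove("session:query2");   // drop a query that is no longer needed
    }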

16  A real PROOF session - connection
[GUI screenshot: predefined session, define new session, session startup progress bar, session startup status: ready.]

17  A real PROOF session - package manager
[GUI screenshot: the Package tab.]
- PAR (Proof ARchive): a package directory with a PROOF-INF subdirectory containing BUILD.sh and SETUP.C.
- Controls the setup of each worker.
A hedged usage sketch follows.
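A hedged sketch of how such a PAR package could be uploaded and enabled from the client; the package name MyAna.par and the master host are illustrative.

    #include <TProof.h>

    void usePackage()
    {
       TProof *p = TProof::Open("proofmaster.example.org");

       // MyAna.par is an archive containing MyAna/PROOF-INF/BUILD.sh
       // (how to build the package) and MyAna/PROOF-INF/SETUP.C (how to load it).
       p->UploadPackage("MyAna.par");   // transfer the archive to the cluster
       p->EnablePackage("MyAna");       // run BUILD.sh and SETUP.C on every worker
    }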

18  A real PROOF session - query definition and running
[GUI screenshot: query name, execute to create the chain, select the chain, choose the selector, feedback histograms, processing information.]

19  A real PROOF session: query browsing and finalization
[GUI screenshot: details about the query, folder with the output objects, raw histogram, finalization.]

20  A real PROOF session: disconnection / reconnection
- Running sessions are kept alive by the server-side coordinator.
- Reconnection is much faster: no process has to be forked.
[GUI screenshot: disconnect, reconnect; the query is now terminated.]

21  A real PROOF session: chain viewer
[GUI screenshot: the chain viewer, opened via right-click.]

22  Ongoing activities / near-future plans
- Data file upload manager: optimally distribute the data on the cluster storage; keep track of existing data sets for optimized re-runs.
- Dynamic cluster configuration: come-and-go functionality for worker nodes; use the olbd network to get information about the load on the cluster.
- Optimizations: packetizer (re-assignment of packets being processed to fast idle slaves); data access (fully exploit the asynchronous features of xrootd).
- Monitoring of cluster behaviour: MonALISA allows the definition of ad-hoc parameters, e.g. I/O per node per query.
- Improve handling of error conditions: identify the cases that hang the system, improve error logging, …; exploit the olbd control network for a better overview of the cluster.
- Testing and consolidation; documentation.

23  Who's using PROOF?
- ALICE (see, e.g., PROOF@AliEn at CHEP04; LHCC review, Nov 2005)
- PHOBOS (M. Ballantijn, #374, Feb 15th, 16:40)
- CMS analysis prototype (I. Gonzáles, #267, Feb 14th, 16:00)
Development test beds:
- Currently using CERN phased-out machines: 35 dual Pentium III 800 MHz / 512 MB RAM, 100 Mbit/s Ethernet, 600 GB total hard disk.
- Contact with Lyon / CC-IN2P3 (ALICE): up to 16 dual Xeon 2.8 GHz, 200 GB scratch each.
- Request for a new test bed at CERN (LCG / ALICE / CMS).

24  People working on the project
- B. Bellenot, M. Biskup, R. Brun, G. Ganis, J. Iwaszkiewicz, G. Kickinger, P. Nilsson, A. Peters, F. Rademakers (CERN)
- M. Ballintijn, C. Loizides, C. Reed (MIT)
- D. Feichtinger (PSI)
- P. Canal (FNAL)

25  The End
Questions?
Links:
- http://root.cern.ch/root/PROOF.html
- Discussion topic at the ROOT forum: http://root.cern.ch/phpBB2/

26  PROOF – backup slides
- Additional recent improvements
- Stateless connection with XrdProofd
- XrdProofd basics
- Connection layer
- Pull architecture
- PROOF@AliEn: command-line session

27  Additional recent improvements
- Session startup: parallel startup with threads; optimized sequential startup; startup status and progress bar.
- New progressive packetizer: opens files as needed; continuously re-estimates the number of entries; can order files based on availability.
- Draw() and viewer functionality via PROOF.

28  Interactive batch: stateless connection with XrdProofd
- Alive sessions need coordination when clients disconnect; the existing proofd would have required a deep re-design.
- xrd (the networking and work-dispatching layer of xrootd) is a generic framework to handle protocols, already used to launch rootd for TNetFile clients.
- A dedicated protocol launches and manages PROOF sessions: they are forked as separate processes to protect against client bugs and talk to the coordinator via a UNIX socket.
- Disconnect / reconnect is handled naturally.
- Asynchronous reading allows setting up a control / interrupt network independent of OOB.
- The xrd/olbd control network is used to query status information.
(See A. Hanushevsky, #407, Feb 15th, 17:00.)

29  XrdProofd basics
- Prototype based on XROOTD.
[Diagram: in XROOTD, client links are served by XrdXrootdProtocol, which accesses files; in XPD, links are served by XrdProofdProtocol, which talks to proofserv and to the worker servers.]
- XrdProofdProtocol: client gateway to proofserv.
- A static area holds all client information and its activities.

30  XrdProofd communication layer
[Diagram: the client connects through a TXSocket to the master XrdProofd, which fork()s a proofserv; the master connects to the XrdProofd on each slave (slave 1 … slave n), which fork()s a proofslave.]

31  Workflow: pull architecture
Dynamic load balancing is achieved naturally: each worker asks the master for the next packet only when it has finished the previous one. A self-contained toy illustration follows.
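The following self-contained toy program (an illustration only, not PROOF code) shows why the pull scheme balances the load automatically: faster workers finish their packets sooner, ask again sooner, and therefore process more packets.

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
       const long nPackets = 100;                 // total work units
       std::atomic<long> next{0};                 // shared "packetizer"
       std::vector<long> done(3, 0);              // packets finished per worker

       auto worker = [&](int id, int speedMs) {
          for (;;) {
             long p = next.fetch_add(1);          // pull the next packet
             if (p >= nPackets) break;            // no more work
             std::this_thread::sleep_for(std::chrono::milliseconds(speedMs));
             ++done[id];
          }
       };

       std::vector<std::thread> pool;
       pool.emplace_back(worker, 0, 1);           // fast worker
       pool.emplace_back(worker, 1, 2);
       pool.emplace_back(worker, 2, 4);           // slow worker
       for (auto &t : pool) t.join();

       for (int i = 0; i < 3; ++i)
          std::printf("worker %d processed %ld packets\n", i, done[i]);
    }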

32  PROOF @ AliEn: command-line session
TGrid: abstract interface for all Grid services.

    // Connect
    TGrid *alien = TGrid::Connect("alien://");

    // Query the file catalogue
    TString path = "/alice/cern.ch/user/p/peters/analysis/miniesd/";
    TGridResult *res = alien->Query(path, "*.root");

    // Create a chain from the list of files
    // (one possible way to attach the catalogue result; the slide's exact call was garbled)
    TChain chain("Events", "session");
    chain.AddFileInfoList(res->GetFileInfoList());

    // Open a PROOF session
    TProof *proof = TProof::Open("proofmaster");

    // Process your query through the session
    chain.SetProof();   // route Process() through the open PROOF session
    chain.Process("selector.C");

