1 PROOF: the Parallel ROOT Facility
Scheduling and Load-balancing
ACAT 2007
Jan Iwaszkiewicz ¹ ², Gerardo Ganis ¹, Fons Rademakers ¹
¹ CERN PH/SFT, ² University of Warsaw

2 Outline
ACAT, 23 - 27th of April 2007, Jan Iwaszkiewicz, CERN PH/SFT
Introduction to Parallel ROOT Facility
Packetizer – load balancing
Resource Scheduling

3 Analysis of the Large Hadron Collider data
Necessity of distributed analysis
ROOT – popular particle physics data analysis framework
PROOF (ROOT's extension) – automatically parallelizes processing across computing clusters or multicore machines

4 Who is using PROOF
PHOBOS
  – MIT, dedicated cluster, interfaced with Condor
  – Real data analysis, in production
ALICE
  – CERN Analysis Facility (CAF)
CMS
  – Santander group, dedicated cluster
  – Physics TDR analysis
Very positive experience: functionality, large speedup, efficient
But not really the LHC scenario: usage limited to a few experienced users

5 Using PROOF: example
PROOF is designed for analysis of independent objects, e.g. ROOT Trees (basic data format in particle physics)
Example of processing a set of ROOT trees:

Local ROOT:
    // Create a chain of trees
    root[0] TChain *c = CreateMyChain();
    // MySelec is a TSelector
    root[1] c->Process("MySelec.C+");

PROOF:
    // Create a chain of trees
    root[0] TChain *c = CreateMyChain();
    // Start PROOF and tell the chain to use it
    root[1] TProof::Open("masterURL");
    root[2] c->SetProof();
    // Process goes via PROOF
    root[3] c->Process("MySelec.C+");

6 Classic batch processing
[Diagram: catalog, storage, batch farm with queues and manager; steps: query, data file splitting, submit myAna.C jobs, merging, final analysis]
static use of resources
jobs frozen: 1 job / worker node
external splitting, merging of outputs

7 PROOF processing
[Diagram: catalog, storage, PROOF farm with scheduler and MASTER; steps: query, PROOF job (data file list, myAna.C), files, feedbacks (merged), final outputs (merged)]
farm perceived as extension of local PC
same syntax as in local session
more dynamic use of resources
real time feedback
automated splitting and merging

8 Challenges for PROOF
Remain efficient under heavy load
100% exploitation of resources
Reliability

9 Levels of scheduling
The packetizer
  – Load balancing on the level of a job
Resource scheduling (assigning resources to different jobs)
  – Introducing a central scheduler
  – Priority based scheduling on worker nodes

10 Packetizer's role
Lookup – check locations of all files and initiate staging, if needed
Workers contact the packetizer and ask for new packets (pull architecture)
A packet has info on
  – which file to open
  – which part of the file to process
The packetizer keeps assigning packets until the dataset is processed
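
The pull model on this slide can be sketched in a few lines of plain C++. This is only an illustration, not the ROOT implementation: the names `Packet`, `FileInfo`, `SimplePacketizer` and `nextPacket` are hypothetical, and the sketch ignores file lookup, staging and data locality.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A packet names a file and the range of entries to process in it.
struct Packet {
    std::string file;
    long long firstEntry;
    long long numEntries;
};

// One file of the dataset and the number of entries it holds.
struct FileInfo {
    std::string name;
    long long entries;
};

// Hypothetical packetizer that hands out fixed-size packets in file order.
class SimplePacketizer {
public:
    SimplePacketizer(std::vector<FileInfo> files, long long packetSize)
        : files_(std::move(files)), packetSize_(packetSize) {
        assert(packetSize_ > 0);
    }

    // Called by a worker whenever it is idle (pull model).
    // Fills 'out' and returns true, or returns false once the
    // whole dataset has been handed out.
    bool nextPacket(Packet& out) {
        // Skip files that are fully assigned (or empty).
        while (current_ < files_.size() &&
               offset_ >= files_[current_].entries) {
            ++current_;
            offset_ = 0;
        }
        if (current_ == files_.size()) return false;

        const FileInfo& f = files_[current_];
        long long n = std::min(packetSize_, f.entries - offset_);
        out = Packet{f.name, offset_, n};
        offset_ += n;
        return true;
    }

private:
    std::vector<FileInfo> files_;
    long long packetSize_;
    std::size_t current_ = 0;   // index of the file being split
    long long offset_ = 0;      // first unassigned entry in that file
};
```

Each worker would call `nextPacket` in a loop until it returns false, so a faster worker simply comes back more often and ends up processing more packets.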

11 PROOF dynamic load balancing
Pull architecture guarantees scalability
Adapts to variations in performance
[Diagram: timeline of packets handed out by the Master to Worker 1 through Worker N; a packet is the unit of work distribution]

12 TPacketizer: the original packetizer
Strategy
  – Each worker processes its local files and then processes remaining remote files
  – Fixed size packets
  – Avoid overloading data server by allowing max 4 remote files to be served
Problems with the TPacketizer
  – Long tails with some I/O bound jobs
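
A rough sketch of the file-selection rule described above, in plain C++. It assumes the limit of 4 applies per data server and counts files being read by workers on other hosts; that reading, and the names `DatasetFile`, `pickNextFile` and `kMaxRemoteFilesPerServer`, are assumptions made for illustration and are not taken from the TPacketizer source.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct DatasetFile {
    std::string name;
    std::string host;   // data server that stores the file
    bool assigned;      // already given to some worker
};

// Cap quoted on the slide; assumed here to be per data server.
const int kMaxRemoteFilesPerServer = 4;

// Returns the index of the file the worker should open next,
// or -1 if nothing can be assigned right now.
// remoteFilesServed[host] counts files that 'host' is currently serving
// to workers on other hosts; the caller is expected to decrement it
// when such a file has been fully processed.
int pickNextFile(std::vector<DatasetFile>& files,
                 const std::string& workerHost,
                 std::map<std::string, int>& remoteFilesServed) {
    assert(!workerHost.empty());

    // 1. Prefer files stored on the worker's own host.
    for (std::size_t i = 0; i < files.size(); ++i) {
        if (!files[i].assigned && files[i].host == workerHost) {
            files[i].assigned = true;
            return static_cast<int>(i);
        }
    }

    // 2. No local files left: take a remote file, but only from a
    //    server that is not yet serving the maximum number of files.
    for (std::size_t i = 0; i < files.size(); ++i) {
        if (files[i].assigned) continue;
        int& served = remoteFilesServed[files[i].host];
        if (served < kMaxRemoteFilesPerServer) {
            files[i].assigned = true;
            ++served;
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With this rule a worker whose local files are exhausted can end up waiting on a busy server, which is one plausible way the long tails mentioned on the slide could arise.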

13 Performance tests with ALICE
35 PCs, dual Xeon 2.8 GHz, ~200 GB disk
  – Standard CERN hardware for LHC
Machine pools managed by xrootd
  – Data of Physics Data Challenge '06 distributed (~ 1 M events)
Tests performed
  – Speedup (scalability) tests
  – System response when running a combination of job types for increasing # of concurrent users

14 Example of problems with some I/O bound jobs
[Plots: processing rate during a query; resource utilization]

15 How to improve
Focus on I/O bound jobs
  – Limited by hard drive or network bandwidth
Predict which data servers can become bottlenecks
Make sure that other workers help analyze data from those servers
Use time-based packet sizes
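
One way to read "time-based packet sizes" is that each packet should take about the same wall-clock time on whichever worker receives it. The sketch below derives the number of entries from the rate a worker has achieved so far. The function name, the parameters and the clamping to a minimum and maximum size are assumptions for illustration, not the actual packetizer code.

```cpp
#include <cassert>

// Number of entries to put in the next packet so that it should take
// about 'targetSeconds' on this worker, given the worker's history.
long long packetSizeForWorker(long long entriesDone, double secondsSpent,
                              double targetSeconds,
                              long long minEntries, long long maxEntries) {
    assert(minEntries > 0 && minEntries <= maxEntries);
    assert(targetSeconds > 0.0);

    // No measurement yet: start with the smallest packet.
    if (entriesDone <= 0 || secondsSpent <= 0.0) return minEntries;

    double rate = static_cast<double>(entriesDone) / secondsSpent;
    double wanted = rate * targetSeconds;

    // Clamp before converting back to an integer.
    if (wanted <= static_cast<double>(minEntries)) return minEntries;
    if (wanted >= static_cast<double>(maxEntries)) return maxEntries;
    return static_cast<long long>(wanted);
}
```

A worker that processed 1000 entries in 2 seconds would get a 2000-entry packet for a 4-second target, while a slow worker gets proportionally fewer entries, which keeps the end of the query from being dominated by one large packet.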

16 TAdaptivePacketizer
Strategy
  – Predicting the processing time of local files for each worker
  – For the workers that are expected to finish faster, keep assigning remote files from the beginning of the job
  – Assign remote files from the most heavily loaded file servers
  – Variable packet size
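
The idea of steering remote work towards the most heavily loaded file servers can be sketched as follows. Here "most heavily loaded" is taken to mean "longest predicted time to finish its own data", which is an assumption; the struct `ServerLoad` and both function names are hypothetical and do not come from TAdaptivePacketizer.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

struct ServerLoad {
    std::string host;
    long long bytesLeft;   // unprocessed data stored on this server
    double localRate;      // bytes/s achieved by the workers on this host
};

// Predicted time for a server's own workers to finish its local data.
double predictedSeconds(const ServerLoad& s) {
    assert(s.bytesLeft >= 0 && s.localRate >= 0.0);
    if (s.bytesLeft == 0) return 0.0;
    // Data but no local processing power: never finishes without help.
    if (s.localRate == 0.0) return std::numeric_limits<double>::infinity();
    return static_cast<double>(s.bytesLeft) / s.localRate;
}

// Index of the remote server a helping worker should read from:
// the one predicted to finish last. Returns -1 if no remote data is left.
int mostLoadedRemoteServer(const std::vector<ServerLoad>& servers,
                           const std::string& workerHost) {
    int best = -1;
    double bestSeconds = 0.0;
    for (std::size_t i = 0; i < servers.size(); ++i) {
        if (servers[i].host == workerHost) continue;   // local, not remote
        double t = predictedSeconds(servers[i]);
        if (t > bestSeconds) {
            best = static_cast<int>(i);
            bestSeconds = t;
        }
    }
    return best;
}
```

A worker predicted to finish its local files early would call `mostLoadedRemoteServer` from the start of the job, so the likely bottleneck servers are relieved before the tail of the query rather than after it.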

17 Improvement by up to 30%
[Plots: TPacketizer vs. TAdaptivePacketizer]

18 Scaling comparison for randomly distributed data set

19 Resource scheduling
Motivation
Central scheduler
  – Model
  – Interface
Priority based scheduling on worker nodes

20 Why scheduling?
Controlling resources and how they are used
Improving efficiency
  – assigning to a job those nodes that have data which needs to be analyzed
Implementing different scheduling policies
  – e.g. fair share, group priorities & quotas
Efficient use even in case of congestion

21 PROOF specific requirements
Interactive system
  – Jobs should be processed as soon as submitted
  – However, when max. system throughput is reached, some jobs have to be postponed
I/O bound jobs use more resources at the start and less at the end (file distribution)
Try to process data locally
User defines a dataset, not the # of workers
Possibility to remove/add workers during a job

22 Starting a query with a central scheduler (planned)
[Diagram: Client, Master, External Scheduler, packetizer; labels: job, Dataset Lookup, Start workers, Cluster status, User priority, history]

23 Plans
Interface for scheduling "per job"
  – Special functionality will allow changing the set of nodes during a session without losing user libraries and other settings
Removing workers during a job
Integration with a scheduler
  – Maui, LSF?

24 Priority based scheduling on nodes
Priority-based worker level load balancing
  – Simple and solid implementation, no central unit
  – Group priorities defined in the configuration file
Performed on each worker node independently
Lower priority processes slow down
  – sleep before next packet request
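
The "sleep before next packet request" mechanism could look roughly like this. The formula is an assumption chosen so that a process is busy for a fraction of the time equal to its priority divided by the highest priority; the slide does not say how PROOF actually computes the delay, and the function name and parameters are hypothetical.

```cpp
#include <cassert>

// Seconds a worker process should sleep before asking for its next
// packet, given how long the previous packet took to process.
double sleepBeforeNextRequest(double lastPacketSeconds,
                              int groupPriority, int maxPriority) {
    assert(lastPacketSeconds >= 0.0);
    assert(groupPriority > 0 && groupPriority <= maxPriority);

    // A group at maxPriority never sleeps. A group with half that
    // priority sleeps as long as it worked, so it is busy about
    // half of the time.
    double ratio = static_cast<double>(maxPriority) / groupPriority;
    return lastPacketSeconds * (ratio - 1.0);
}
```

Each worker process can evaluate this locally after every packet and pass the result to `std::this_thread::sleep_for`, which fits the slide's point that no central unit is needed.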

25 Summary
The adaptive packetizer is working very well in the current environment. It will be further tuned after introducing the scheduler.
Advanced work on the PROOF interface to the scheduler.
Priority-based scheduling on nodes is being tested.

26 The PROOF Team
Maarten Ballintijn
Bertrand Bellenot
Rene Brun
Gerardo Ganis
Jan Iwaszkiewicz
Andreas Peters
Fons Rademakers
http://root.cern.ch

