june 6, 2002 D.H.J. Epema/PDS/TUD
Processor Co-Allocation in Multicluster Systems
DAS-2 Workshop, Amsterdam, June 6, 2002
Anca Bucur and Dick Epema
Parallel and Distributed Systems Group
Delft University of Technology
Introduction (1)
In multicluster systems (such as the DAS, or grids), jobs may use co-allocation, i.e., span multiple clusters:
– to use the available capacity
– to process geographically distributed data
Single-application performance issues:
– application restructuring
– wide-area runtime systems (e.g., optimized collective communication operations)
Multiple-application performance issues:
– design and analysis of scheduling policies
– minimizing response time and maximizing the maximal utilization
Introduction (2): Example
In April 2001, the Cactus Computational Toolkit was used for four-hour astrophysics simulations involving Einstein's General Relativity equations.
Equipment:
– at NCSA: 480 CPUs of three SGI Origin2000 systems
– at SDSC: 1020 CPUs of Blue Horizon
– OC Mbit/s network
Introduction (3): Problems
[Figure: processors of clusters 1, 2 and 3 versus time (patterned areas are idle), showing where a job with components 1, 2 and 3 fits if its request is flexible and where it fits if its request is unordered]
System Model
– a multicluster system consisting of clusters of processors, all of equal speed
– communication speed ratio: the ratio of the wide-area to the local message transfer time
Job Components
A job consists of job components, each of which goes to a single cluster, with one task per processor.
Distributions of job-component sizes:
– uniform: U[a,b]
– truncated and adapted geometric (favors small sizes and powers of 2): D(q) on [1,b]
[Figure: the components of a job mapped onto the clusters of the system]
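The two size distributions can be sketched in Python as follows, taking U[a,b] to be uniform over the integers a to b. The slide only says that D(q) is a truncated, adapted geometric distribution favoring small sizes and powers of 2, so this sketch assumes a weight of q**i for size i, multiplied by a hypothetical factor `power_of_two_weight` (3.0 here, an assumption) when i is a power of 2. All function names are illustrative.

```python
import random


def uniform_pmf(a, b):
    """U[a,b]: every integer size a..b is equally likely."""
    return {i: 1.0 / (b - a + 1) for i in range(a, b + 1)}


def geometric_pmf(q, b, power_of_two_weight=3.0):
    """Sketch of D(q) on [1,b]: weight q**i for size i, truncated to [1, b].

    ASSUMPTION: powers of 2 get their weight multiplied by the hypothetical
    factor power_of_two_weight; the weights are then normalized to sum to 1.
    """
    weights = {}
    for i in range(1, b + 1):
        is_power_of_two = (i & (i - 1)) == 0
        weights[i] = q ** i * (power_of_two_weight if is_power_of_two else 1.0)
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def mean_and_cv(pmf):
    """Mean and coefficient of variation of a {size: probability} distribution."""
    mean = sum(i * p for i, p in pmf.items())
    variance = sum((i - mean) ** 2 * p for i, p in pmf.items())
    return mean, variance ** 0.5 / mean


def sample(pmf, rng=random):
    """Draw one job-component size from the distribution."""
    sizes = list(pmf)
    return rng.choices(sizes, weights=[pmf[s] for s in sizes])[0]
```

For example, `mean_and_cv(geometric_pmf(0.9, 8))` gives the mean and coefficient of variation of D(0.9) on [1,8] under this assumed weighting, so it can be compared with U[1,7].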
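Job Request Types (1)
Ordered and unordered requests specify their job-component sizes:
– ordered: the request also fixes the cluster of each component
– unordered: the scheduler chooses the clusters
[Figure: example ordered and unordered requests]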
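Job Request Types (2)
Flexible and total requests only specify the total number of processors needed:
– flexible: the scheduler may split the job over the clusters as it sees fit
– total: the job must be placed in a single cluster
[Figure: example flexible and total requests]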
Fitting a Job (1)
Deciding whether an ordered or a total request fits is straightforward.
For an unordered request:
– order the components by decreasing size
– use First-Fit (FF) or Worst-Fit (WF)
[Figure: WF placement of a job's components on the system; processors in use and idle]
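A minimal Python sketch of these fit checks, assuming the system state is just the number of idle processors per cluster. The function names are hypothetical, and the rule that each component of an unordered request goes to a different cluster is an assumption of this sketch.

```python
def fits_ordered(components, idle):
    """Ordered request: component k must fit in the idle processors of cluster k."""
    if len(components) > len(idle):
        return False
    return all(size <= free for size, free in zip(components, idle))


def fits_total(total, idle):
    """Total request: the whole job must fit in the idle processors of one cluster."""
    return any(total <= free for free in idle)


def place_unordered(components, idle, policy="WF"):
    """Place an unordered request; return [(size, cluster), ...] or None.

    Components are handled in order of decreasing size.
    ASSUMPTION: each component goes to a different cluster.
    FF picks the lowest-numbered cluster with enough idle processors,
    WF the one with the most idle processors.
    """
    if policy not in ("FF", "WF"):
        raise ValueError("policy must be 'FF' or 'WF'")
    used = set()
    placement = []
    for size in sorted(components, reverse=True):
        candidates = [c for c, free in enumerate(idle) if c not in used and free >= size]
        if not candidates:
            return None  # some component cannot be placed: the job does not fit
        if policy == "FF":
            chosen = candidates[0]
        else:
            chosen = max(candidates, key=lambda c: idle[c])
        used.add(chosen)
        placement.append((size, chosen))
    return placement
```

With idle processors [10, 4, 8, 2], the request (3, 5) places its component of size 5 on cluster 0 under both policies; FF then puts the component of size 3 on cluster 1, while WF puts it on cluster 2.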
Fitting a Job (2)
For a flexible request:
– determine the minimal number of clusters needed
– either fill the least-loaded clusters completely (CF) or balance the load (LB) (variation: LB-A)
[Figure: CF and LB placements of a job; processors in use and idle]
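The two placements for a flexible request can be sketched as below, assuming all clusters have the same size, so that the least-loaded clusters are those with the most idle processors. Implementing LB by handing out processors one at a time to the selected cluster with the most idle processors left is our interpretation of "balance load"; LB-A is not modeled. Function names are hypothetical.

```python
def minimal_clusters(total, idle):
    """Fewest clusters (most idle first) that can hold `total` processors, or None."""
    order = sorted(range(len(idle)), key=lambda c: idle[c], reverse=True)
    chosen, capacity = [], 0
    for c in order:
        if capacity >= total:
            break
        chosen.append(c)
        capacity += idle[c]
    return chosen if capacity >= total else None


def place_cf(total, idle):
    """CF sketch: fill the chosen clusters completely; the last gets the remainder."""
    chosen = minimal_clusters(total, idle)
    if chosen is None:
        return None
    placement, remaining = {}, total
    for c in chosen:
        placement[c] = min(idle[c], remaining)
        remaining -= placement[c]
    return placement


def place_lb(total, idle):
    """LB sketch (assumed water-filling): give each processor to the chosen
    cluster with the most idle processors left, equalizing what remains idle."""
    chosen = minimal_clusters(total, idle)
    if chosen is None:
        return None
    left = {c: idle[c] for c in chosen}
    placement = {c: 0 for c in chosen}
    for _ in range(total):
        c = max(chosen, key=lambda k: left[k])  # ties go to the earlier cluster
        left[c] -= 1
        placement[c] += 1
    return placement
```

For a job of 12 processors and idle processors [10, 3, 7, 1], both sketches use clusters 0 and 2; CF takes all 10 idle processors of cluster 0, whereas LB leaves 2 and 3 processors idle on those clusters.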
Scheduling Policies
– First Come First Served (FCFS)
– Fit Processors First Served (FPFS): search the queue for jobs that fit
[Figure: job queue and system]
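The difference between the two policies can be sketched as one scheduling pass over the queue. To keep it short, this sketch assumes total requests placed with first fit in a single cluster; the fit test of any other request type could replace `first_fit_cluster`. Names are hypothetical.

```python
def first_fit_cluster(size, idle):
    """Index of the first cluster with at least `size` idle processors, or None."""
    for c, free in enumerate(idle):
        if size <= free:
            return c
    return None


def schedule(queue, idle, policy="FCFS"):
    """One pass over `queue`, a list of (name, size) total requests.

    FCFS stops at the first job that does not fit; FPFS skips it and keeps
    searching the queue for jobs that fit. Returns (started, waiting, idle_after).
    """
    if policy not in ("FCFS", "FPFS"):
        raise ValueError("policy must be 'FCFS' or 'FPFS'")
    idle = list(idle)
    started, waiting = [], []
    blocked = False
    for name, size in queue:
        cluster = None if blocked else first_fit_cluster(size, idle)
        if cluster is None:
            waiting.append((name, size))
            blocked = policy == "FCFS"  # under FCFS the head job blocks the rest
            continue
        idle[cluster] -= size
        started.append((name, cluster))
    return started, waiting, idle
```

With two clusters of 32 idle processors and a queue of jobs of sizes 20, 40 and 10, FCFS starts only the first job, while FPFS also starts the third one behind the job of size 40 that does not fit.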
Interarrival/Service Times
– Poisson arrival process in the simulations
– all tasks in a job have the same service time
Service-time distributions used:
– deterministic (mean 1)
– exponential (mean 1)
– hyperexponential (mean 1, coefficient of variation 3)
– derived from the DAS
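A Python sketch of these inputs. The slide does not say how the hyperexponential distribution was parameterized; this sketch assumes a two-phase hyperexponential with "balanced means", which matches any mean and any coefficient of variation of at least 1. The DAS-derived distribution is not modeled, and all names are hypothetical.

```python
import math
import random


def hyperexponential_params(mean, cv):
    """Two-phase hyperexponential with the given mean and cv (cv >= 1).

    ASSUMPTION: balanced-means fit, each phase contributes mean/2 to the mean.
    Returns ((p1, rate1), (p2, rate2)).
    """
    if cv < 1.0:
        raise ValueError("a hyperexponential distribution needs cv >= 1")
    c2 = cv * cv
    p1 = 0.5 * (1.0 + math.sqrt((c2 - 1.0) / (c2 + 1.0)))
    p2 = 1.0 - p1
    return (p1, 2.0 * p1 / mean), (p2, 2.0 * p2 / mean)


def h2_mean_and_cv(params):
    """Exact mean and coefficient of variation of a two-phase hyperexponential."""
    mean = sum(p / rate for p, rate in params)
    second_moment = sum(2.0 * p / rate ** 2 for p, rate in params)
    return mean, math.sqrt(second_moment - mean ** 2) / mean


def service_time(kind, rng=random):
    """Draw one service time with mean 1, shared by all tasks of a job."""
    if kind == "deterministic":
        return 1.0
    if kind == "exponential":
        return rng.expovariate(1.0)
    if kind == "hyperexponential":
        (p1, rate1), (_, rate2) = hyperexponential_params(1.0, 3.0)
        return rng.expovariate(rate1 if rng.random() < p1 else rate2)
    raise ValueError("unknown service-time distribution: " + kind)


def poisson_arrivals(rate, count, rng=random):
    """Arrival times of a Poisson process: exponential interarrival times, mean 1/rate."""
    t, times = 0.0, []
    for _ in range(count):
        t += rng.expovariate(rate)
        times.append(t)
    return times
```

Under the balanced-means fit with mean 1 and coefficient of variation 3, about 95% of the jobs come from the fast phase, while the rare slow phase supplies the other half of the mean.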
Communication
We model jobs both without and with communication.
With communication:
– tasks alternate between computation and communication phases
– communication phase: all-to-all personalized communication
– time for a single local synchronous message send operation:
– communication speed ratios considered: 1-100
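The effect of the communication speed ratio on one communication phase can be sketched as below. The local send time on the slide is not recoverable, so `t_local` is a hypothetical parameter; a wide-area send takes `ratio * t_local`, following the definition of the communication speed ratio. That each task sends its messages one after another and that the phase lasts as long as the slowest task are assumptions of this sketch.

```python
def communication_phase_time(components, t_local, ratio):
    """Duration of one all-to-all personalized communication phase.

    components: number of tasks (one per processor) in each cluster the job uses.
    t_local:    time of one local synchronous send (hypothetical value).
    ratio:      communication speed ratio; a wide-area send takes ratio * t_local.
    ASSUMPTION: each task sends one message to every other task of the job,
    sequentially, and the phase ends when the slowest task is done.
    """
    total_tasks = sum(components)
    slowest = 0.0
    for size in components:
        if size == 0:
            continue
        local_sends = size - 1                 # peers in the same cluster
        wide_area_sends = total_tasks - size   # peers in other clusters
        duration = local_sends * t_local + wide_area_sends * ratio * t_local
        slowest = max(slowest, duration)
    return slowest
```

Comparing, say, `communication_phase_time((16,), 1.0, r)` with `communication_phase_time((4, 4, 4, 4), 1.0, r)` for r between 1 and 100 shows how quickly co-allocation makes the communication phases of the same job longer under these assumptions.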
Single-cluster DAS Statistics
[Figure: histograms of the number of jobs versus service time and versus the number of nodes requested]
– mean: (not recoverable), coeff. of var.: 1.11
– mean: (62.66), coeff. of var.: 5.37
Performance Evaluation
Parameters we vary:
– job request structure
– job-component-size distribution
– service-time distribution
– number and sizes of clusters (base case: 4 clusters of 32 processors)
– placement of unordered and flexible jobs
– scheduling policy
– communication speed ratio
– co-allocation versus no co-allocation
– queueing structure (global/local)
Performance metrics:
– mean response time (simulation only)
– maximal utilization (analysis and simulation)
Influence of Structure and Size
[Figure: response time versus utilization for total, ordered and unordered requests]
[Table: mean and coefficient of variation of each job-component-size distribution]
– U[1,7]
– D(0.9) on [1,8]
– D(0.768) on [1,32]
– U[1,14]
– D(0.894) on [1,32]
Influence of Communication Speed Ratio
[Figure: two panels of response time versus utilization; curves from right to left: total, flexible, unordered, ordered]
Co-Allocation versus No Co-Allocation (1)
[Figure: response time versus utilization without communication, for flexible jobs and for unordered jobs with 1, 2 and 4 components]
Job size: 4 x D(0.9) on [1,8] (fits on a single cluster)
Co-Allocation versus No Co-Allocation (2)
[Figure: response time versus utilization with communication, for flexible jobs placed with LB-A at communication speed ratios 5 and 50, and without co-allocation (FF)]
Job size: 4 x D(0.9) on [1,8]
An Application on the DAS (1)
The application solves the Poisson equation with a red-black Gauss-Seidel scheme.
Measurements on the DAS (times in ms):
– time for diffusing local errors and computing the global error: 14 ms
[Table: per configuration on the unit square, the number of iterations and the times for the update, for exchanging borders in a single cluster, and for exchanging borders in a multicluster]
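A sequential Python sketch of a red-black Gauss-Seidel sweep for the 2D Poisson equation -(u_xx + u_yy) = f with fixed boundary values, using the standard 5-point stencil. Red points (i + j even) only depend on black neighbors and vice versa, so all points of one color can be updated independently. The parallel decomposition and the border exchanges measured on the DAS are omitted, and the function name is hypothetical.

```python
def red_black_gauss_seidel(u, f, h, sweeps):
    """Red-black Gauss-Seidel for -(u_xx + u_yy) = f on an n x n grid, spacing h.

    u is updated in place; its boundary rows and columns hold fixed (Dirichlet)
    values. Each sweep first updates all red points (i + j even), then all black
    points (i + j odd). Returns the largest change made during the last sweep.
    """
    n = len(u)
    largest_change = 0.0
    for _ in range(sweeps):
        largest_change = 0.0
        for color in (0, 1):  # 0: red points, 1: black points
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != color:
                        continue
                    new = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                                  + h * h * f[i][j])
                    largest_change = max(largest_change, abs(new - u[i][j]))
                    u[i][j] = new
    return largest_change
```

The returned largest change is a simple local stopping criterion; a parallel version would additionally need to combine such local values into a global error.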
An Application on the DAS (2)
[Figure: response time versus utilization for total and ordered requests]
Equal mix of jobs of sizes (2,2,2,2) and (4,4,4,4)
Maximal Utilization (1)
Assume: a constant backlog, ordered jobs, and exponential service times (no communication).
Consider: the joint probability distribution of the sizes of the jobs in the system.
Result: this distribution is the same
– when the system has been running for a long time
– when the system is filled from the empty state
Use the convolution of the job-size distribution to determine the distribution of the number of jobs in the system.
Compute the maximal utilization.
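A Python sketch of this computation for ordered jobs, under assumptions that go beyond the slide: FCFS, every job has exactly one component per cluster, components are i.i.d. with a given size distribution, and all clusters have the same size. Filling from the empty state stops when the next job does not fit. Because the per-cluster sums only grow, the k-fold convolution of the component-size distribution gives the probability of stopping with exactly k jobs, and the expected number of busy processors at that point, divided by the total number of processors, is the maximal utilization. Function names are hypothetical.

```python
def convolve(p, q):
    """Distribution of the sum of two independent integer random variables."""
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0.0) + pa * qb
    return r


def max_utilization_ordered(pmf, cluster_size, num_clusters):
    """Maximal utilization for ordered jobs, from filling an empty system.

    pmf: component-size distribution {size: probability}, sizes >= 1.
    ASSUMPTIONS: FCFS, one component per cluster, i.i.d. components,
    num_clusters clusters of cluster_size processors each.
    """
    n, c = cluster_size, num_clusters

    def cdf(x):
        """P(component size <= x)."""
        return sum(p for s, p in pmf.items() if s <= x)

    f_k = {0: 1.0}  # per-cluster sum of k components, restricted to <= n
    busy = 0.0      # expected number of busy processors when filling stops
    while f_k:
        # Per cluster: k components fit, and also the component of job k + 1 fits.
        fit = sum(f_k.values())
        fit_busy = sum(s * p for s, p in f_k.items())
        next_fits = sum(p * cdf(n - s) for s, p in f_k.items())
        next_fits_busy = sum(s * p * cdf(n - s) for s, p in f_k.items())
        # Stop after exactly k jobs: all clusters hold them, but not job k + 1 too.
        busy += c * (fit_busy * fit ** (c - 1) - next_fits_busy * next_fits ** (c - 1))
        f_k = {s: p for s, p in convolve(f_k, pmf).items() if s <= n}
    return busy / (c * n)
```

Under these assumptions, `1 - max_utilization_ordered({i: 1 / (b - a + 1) for i in range(a, b + 1)}, 32, 4)` would give the capacity loss for ordered jobs with U[a,b] component sizes in 4 clusters of size 32.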
Maximal Utilization (2)
We have an approximation for the maximal utilization for unordered jobs with WF.
We use simulations to validate this approximation.
Capacity loss (1 - max. util.) for 4 clusters of size 32 and uniform job-component sizes U[a,b]:
[Table: columns a, b, ordered (exact), unordered (approx.), unordered (simul.), total (exact)]