
Grid Scheduling, Cécile Germain-Renaud


1 Grid Scheduling, Cécile Germain-Renaud

2 Scheduling
Job
– A computation to run on a machine
– Possibly with network access, e.g. input/output files (coarse grain) or communication with other jobs (the DAG model)
Schedule
– s(J) = date at which the execution of task J begins
– Alloc(J) = machine assigned to J
One of the oldest problems in computer science
Principles of classification: [Graham et al. Optimization and approximation in deterministic sequencing and scheduling: A survey. Ann. Discrete Math. 5 (1979), 287-326]
Computer-aided classification of complexity results (4536 at the time of the paper): [Lageweg et al. Computer-aided complexity classification of combinatorial problems. CACM 25:11, 1982]

3 Classical scheduling in HPC
Context: parallel computing/computers
Application = Directed Acyclic Graph (T, E, w, c)
– T = set of sequential tasks
– E = dependence constraints
– w(t) = computational cost of task t
– c(t,t’) = communication cost (data sent from t to t’)
Infrastructure
– p identical processors
– With or without preemption, dedicated (no sharing)
An optimization problem whose objective function is the makespan (total execution time): S(T) = max over t in T of (s(t) + w(t))
Complexity
– NP-complete for independent tasks without communication (E = ∅, p = 2 and c = 0)
– NP-complete for UET-UCT graphs (w = c = 1)
– Very old result: without communication, list scheduling provides a (2 - 1/p) approximation
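As an illustration of list scheduling in the simplest setting mentioned on this slide (independent tasks, no communication, p identical processors), here is a minimal Python sketch. The function name `list_schedule` and its return values are assumptions made for this example, not part of the original slides; each task is simply handed, in list order, to the processor that becomes free first.

```python
import heapq


def list_schedule(costs, p):
    """Greedy list scheduling of independent tasks on p identical processors.

    costs[t] is w(t), the computational cost of task t.
    Returns (s, alloc, makespan): start dates, processor assignment,
    and max over t of (s(t) + w(t)).
    """
    if p < 1:
        raise ValueError("at least one processor is required")
    # Heap of (date at which the processor becomes free, processor index).
    free_at = [(0, proc) for proc in range(p)]
    heapq.heapify(free_at)
    s, alloc = {}, {}
    for task, w in enumerate(costs):
        # Take the processor that is free earliest and start the task there.
        date, proc = heapq.heappop(free_at)
        s[task] = date
        alloc[task] = proc
        heapq.heappush(free_at, (date + w, proc))
    makespan = max((s[t] + costs[t] for t in s), default=0)
    return s, alloc, makespan


if __name__ == "__main__":
    s, alloc, makespan = list_schedule([3, 3, 2, 2, 2], p=2)
    print(s, alloc, makespan)
```

On the task list [3, 3, 2, 2, 2] with p = 2, this greedy order gives a makespan of 7, whereas the optimum is 6 (3+3 on one processor, 2+2+2 on the other); 7 stays within the (2 - 1/p) × 6 = 9 guarantee.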

4 Scheduling in Institutional Grids
Institutional: federation of resources
– Accounted for: fair-share on the medium to long time scale is a primary constraint
– Partially autonomous local policies must be allowed
Grid
– Permanent regime: on-line decisions
– Large scale: strongly distributed information system and scheduling services
Relevant contexts
– Autonomous, multi-agent systems
– Auction algorithms
– Service Level Agreement (SLA) technology

5 EGEE gLite Scheduling
[Diagram labels: UI, Broker, Site (node), CE, Local scheduler, Proc]

6 EGEE gLite Scheduling
[Diagram labels: UI, Broker, BDII, Site (node), CE, Local scheduler, Proc; arrow: Publish]

7 EGEE gLite Scheduling
[Diagram labels: UI, Broker, BDII, Site (node), CE, Local scheduler, Proc; arrows: Publish, Query, Rank]
The information published is
– Static: e.g. which type of VO is accepted
– Dynamic: expected traversal time

8 EGEE gLite Scheduling
[Diagram labels: UI, Broker, BDII, Site (node), CE, Local scheduler, Proc; arrows: Publish, Query, Rank]
Rank: may be any user-defined function, e.g. to avoid "bad" machines
Default is locality first, expected traversal time second
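The ranking step can be sketched in a few lines of Python. This is only an illustration of the idea (locality first, expected traversal time second, with an optional user-supplied function to avoid "bad" machines); it is plain Python, not gLite's job description syntax, and the names `ComputingElement`, `holds_input_data`, `default_rank` and `choose_ce` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    # All fields are hypothetical stand-ins for what a site publishes.
    name: str
    holds_input_data: bool          # locality: input data is close to this CE
    expected_traversal_time: float  # dynamic information, in seconds


def default_rank(ce):
    """Sort key: local CEs first, then the shortest expected traversal time."""
    # False sorts before True, so "not local" pushes remote CEs to the end.
    return (not ce.holds_input_data, ce.expected_traversal_time)


def choose_ce(candidates, rank=default_rank, blacklist=frozenset()):
    """Return the best-ranked CE, skipping user-blacklisted ("bad") ones."""
    eligible = [ce for ce in candidates if ce.name not in blacklist]
    if not eligible:
        return None
    return min(eligible, key=rank)


if __name__ == "__main__":
    ces = [
        ComputingElement("ce-a", False, 10.0),
        ComputingElement("ce-b", True, 500.0),
        ComputingElement("ce-c", True, 100.0),
    ]
    print(choose_ce(ces).name)                      # local and fastest
    print(choose_ce(ces, blacklist={"ce-c"}).name)  # user avoids ce-c
```

Passing a different `rank` function is the sketch's counterpart of a user-defined ranking: any key that orders the candidates can be substituted for the default.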

9 EGEE gLite Scheduling
[Diagram labels: UI, Broker, BDII, Site (node), CE, Local scheduler, Proc; arrows: Publish, Query, Update]
Update BDII broker cache

10 Not only academic
[Plot; axis labels: Execution time (s), Overhead Ratio]
Long waiting times
When EGEE was not so heavily loaded

11 Batch scheduling
Very complex policies
Maximise throughput under constraints
– Weighted fair-share: VOs, type of jobs
– Priorities
– Hardware requirements
– Advance reservations
An indication of job duration is given by the type of queue: infinite, long, medium, short, and exotic ones
[B. Bode et al. The Portable Batch Scheduler and the Maui Scheduler on Linux Clusters]

12 Classical vs Grid
(Relatively) easy:
– Throughput instead of makespan, plus a master-slave graph instead of a DAG, makes it possible, for instance, to define cyclic schedules in polynomial time that are asymptotically optimal, but not local [Y. Robert] [A. Rosenberg]
Moderately difficult: information about
– Applications
– Infrastructures
The same program on different data may run at very different speeds
The network performance is dynamic
Really difficult:
– Queues managed by local policies
– On-line decisions

13 Information and Scheduling (I)
Considerable work has been done on predicting CPU load in shared environments (desktops, clusters, desktop grids) [P.A. Dinda, R. Wolski, J. Schopf]
– The basic technique is linear time-series analysis
– Self-similarity and epochal behavior
– The usual goal is the prediction of the next value
– Applied to soft real-time scheduling on shared clusters
– Practical application in NWS
$z_t = \mu + \frac{\theta(B)}{\phi(B)\,(1 - B)^d}\, a_t$
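To make "linear time-series analysis" and "prediction of the next value" concrete, here is a minimal Python sketch of the simplest linear model, a first-order autoregression: z_t - mu = phi (z_{t-1} - mu) + a_t, where mu is the series mean and a_t is noise. This particular model and the function names `fit_ar1` and `predict_next` are assumptions chosen for brevity; the cited work uses richer models, and this is not the NWS implementation.

```python
def fit_ar1(series):
    """Estimate (mu, phi) of z_t - mu = phi * (z_{t-1} - mu) + a_t.

    mu is the sample mean; phi is the least-squares slope of the centered
    series regressed on its own previous value.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mu = sum(series) / n
    centered = [z - mu for z in series]
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    den = sum(c * c for c in centered[:-1])
    # A constant history carries no information about phi: predict the mean.
    phi = num / den if den else 0.0
    return mu, phi


def predict_next(series, mu, phi):
    """One-step-ahead prediction from the last observed value."""
    return mu + phi * (series[-1] - mu)


if __name__ == "__main__":
    load = [0.9, 0.7, 0.8, 0.6, 0.7, 0.5]  # made-up CPU load samples
    mu, phi = fit_ar1(load)
    print(mu, phi, predict_next(load, mu, phi))
```

The design choice here is deliberately minimal: a single lag and no differencing, so that the fit is two sums and the prediction is one line.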

14 Information and scheduling (III)
Less work on predicting the behavior of dedicated systems
Papers are on parallel systems, mostly based on time-series techniques, but at least one is based on a genetic algorithm [Downey, Foster, Wolski]
The traces are much more difficult to access
No time slice; irregular time series: the records are event-driven
Which analysis?
– Average waiting time: clear but not very useful for prediction
– Fitting a distribution: not convincing for parallel systems
– Predicting an upper bound with a confidence interval: what is the metric of success?
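One possible reading of the last option, predicting an upper bound and then asking how to measure success, can be sketched as follows. The sketch is an assumption made for illustration, not a method taken from the cited papers: the bound is the empirical q-quantile of past waiting times, and the success metric is the fraction of later jobs whose wait stayed under the bound. The names `upper_bound` and `coverage` are hypothetical.

```python
import math


def upper_bound(waits, q=0.95):
    """Empirical q-quantile of past waiting times, used as a predicted bound."""
    if not waits:
        raise ValueError("need at least one past waiting time")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(waits)
    # Smallest order statistic with at least a fraction q of the sample
    # at or below it, clamped to a valid index.
    k = math.ceil(q * len(ordered)) - 1
    k = max(0, min(k, len(ordered) - 1))
    return ordered[k]


def coverage(bound, future_waits):
    """Fraction of later jobs whose waiting time did not exceed the bound."""
    if not future_waits:
        raise ValueError("need at least one later waiting time")
    return sum(w <= bound for w in future_waits) / len(future_waits)


if __name__ == "__main__":
    past = [40, 10, 30, 20]    # made-up waiting times, in seconds
    later = [5, 20, 25, 100]
    bound = upper_bound(past, q=0.75)
    print(bound, coverage(bound, later))
```

A coverage close to q suggests the bound is calibrated; a bound can also be trivially "successful" by being very large, so in practice its tightness would have to be reported alongside coverage.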

15 Information and grid
We cannot directly log the entire state of the system
– Access rights
– Size
Currently available data
– The lifecycle of jobs going through certain brokers
– The job ranking at the same brokers
– The detailed behavior of the queues on certain sites
– Certain = LAL + possibly other mainstream ones
Easy to get
– Summary data about the lifecycle of all jobs
– From which it could be possible to reconstruct the detailed state and dynamics of the CE

16 What should we learn?
Learning beyond time series makes sense in a grid: massive use of community programs instead of (?) sparse runs of a very long and complex digital experiment
Information, as sketched before
– Beware: this is not a steady-state system
New users, new machines and new software are the expected regime for some years from now
A community-based resource will tend to display correlated activity
– Is there an invariant social graph? Is it a feature?
System algorithms, e.g. a site scheduler or the broker
– Validation?
Scheduling algorithms
– Validation?

