Job Scheduling for Grid Computing on Metacomputers. Keqin Li. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05).

1 Job Scheduling for Grid Computing on Metacomputers. Keqin Li. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05)

2 Outline
 Introduction
 The Scheduling Model
 A Communication Cost Model
 Scheduling Algorithms
 Worst-Case Performance Analysis
 Experimental Data

3 Introduction (1) A metacomputer is a network of computational resources linked by software in such a way that they can be used as easily as a single computer. A metacomputer is able to support distributed supercomputing applications by combining multiple high-speed, high-capacity resources on a computational grid into a single, virtual distributed supercomputer.

4 Introduction (2) The most significant result of the paper is that, using any initial order of jobs and any processor allocation algorithm, the list scheduling algorithm achieves a worst-case performance bound expressed with the following notation:
 p is the maximum size of an individual machine
 P is the total size of a metacomputer
 s is the minimum job size, with s ≥ p
 α is the ratio of the communication bandwidth within a parallel machine to the communication bandwidth of the network
 β is the fraction of the communication time in the jobs

5 The Scheduling Model

6 A metacomputer is specified as M = (P_1, P_2, ..., P_m), where P_j, 1 ≤ j ≤ m, is the name as well as the size (i.e., the number of processors) of a parallel machine. Let P = P_1 + P_2 + ... + P_m denote the total number of processors. The m machines are connected by a LAN, MAN, WAN, or the Internet. A job J is specified as (s, t), where s is the size of J (i.e., the number of processors required to execute J) and t is J's execution time. The cost of J is the product st. Given a metacomputer M and a list of jobs L = (J_1, J_2, ..., J_n), where J_i = (s_i, t_i), 1 ≤ i ≤ n, we are interested in scheduling the n jobs on M.
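The model above can be sketched directly in code. This is an illustrative representation only; the machine sizes and job parameters are made-up example values, not data from the paper.

```python
from dataclasses import dataclass

@dataclass
class Job:
    size: int    # s: number of processors required
    time: float  # t: execution time

# M = (P_1, P_2, P_3): machine sizes (hypothetical example values)
machines = [8, 16, 4]
P = sum(machines)          # total number of processors, P = P_1 + ... + P_m

jobs = [Job(4, 10.0), Job(6, 3.0)]
costs = [j.size * j.time for j in jobs]  # the cost of J is the product s*t
```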

7 A schedule of a job J_i = (s_i, t_i) specifies:
 τ_i, the starting time of J_i
 a division of J_i into r_i subjobs J_{i,1}, J_{i,2}, ..., J_{i,r_i}, of sizes s_{i,1}, s_{i,2}, ..., s_{i,r_i}, respectively, with s_i = s_{i,1} + s_{i,2} + ... + s_{i,r_i}
 the machine assignment: the subjob J_{i,k} is executed on P_{j_k} by using s_{i,k} processors, for all 1 ≤ k ≤ r_i

8 A Communication Cost Model

9 The s_i processors allocated to J_i communicate with each other during the execution of J_i. Communication between two processors residing on different machines connected by a LAN, MAN, WAN, or the Internet takes significantly longer than communication on the same machine. The communication cost model takes both inter-machine and intra-machine communication into consideration. The execution time t_i is divided into two components, t_i = t_{i,comp} + t_{i,comm}. Each processor on P_{j_k} needs to communicate with the s_{i,k} processors on P_{j_k} and the s_i − s_{i,k} processors on P_{j_k'} with k' ≠ k. This gives t*_{i,k}, the execution time of the subjob J_{i,k} on P_{j_k}, as

10 (formula for the subjob execution time t*_{i,k})

11 The execution time of job J_i is t*_i = max(t*_{i,1}, t*_{i,2}, ..., t*_{i,r_i}); we call t*_i the effective execution time of job J_i. The above measure of extra communication time among processors on different machines discourages division of a job into small subjobs.
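The effective execution time can be sketched as follows. The slide's per-subjob formula is not reproduced in this transcript, so the form used here is an assumed reconstruction: each processor pays t_{i,comm} of communication time, with messages to processors on other machines slowed by the factor α ≥ 1. Under this assumption an unsplit job (r_i = 1) gets t*_i = t_{i,comp} + t_{i,comm} = t_i, and any split into subjobs can only increase t*_i, matching the stated penalty for dividing a job.

```python
def effective_time(t_comp, t_comm, sizes, alpha):
    """Effective execution time t*_i = max_k t*_{i,k} for subjob sizes s_{i,k}.

    Assumed per-subjob form: intra-machine partners cost 1 unit each,
    inter-machine partners cost alpha units each, normalized over the
    s_i - 1 communication partners of each processor.
    """
    s = sum(sizes)            # s_i = s_{i,1} + ... + s_{i,r_i}
    if s == 1:
        return t_comp         # a single processor does not communicate
    per_subjob = [
        t_comp + t_comm * ((sk - 1) + alpha * (s - sk)) / (s - 1)
        for sk in sizes
    ]
    return max(per_subjob)
```

For example, with t_comp = 5, t_comm = 2 and α = 3, keeping a size-4 job on one machine gives t* = 7, while splitting it 2 + 2 across two machines gives a larger t*, which is exactly the discouragement the slide describes.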

12 Our job scheduling problem for grid computing on metacomputers can be formally defined as follows: given a metacomputer M = (P_1, P_2, ..., P_m) and a list of jobs L = (J_1, J_2, ..., J_n), where J_i = (s_i, t_i), 1 ≤ i ≤ n, find a schedule ψ of L, ψ = (ψ_1, ψ_2, ..., ψ_n), with ψ_i = (τ_i, (P_{j_1}, s_{i,1}), (P_{j_2}, s_{i,2}), ..., (P_{j_{r_i}}, s_{i,r_i})), where J_i is executed during the time interval [τ_i, τ_i + t*_i] by using s_{i,k} processors on P_{j_k} for all 1 ≤ k ≤ r_i, such that the total execution time of L on M is minimized.
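The objective above, the total execution time (schedule length), is the latest finishing time over all jobs. A minimal sketch, with hypothetical (τ_i, t*_i) pairs standing in for ψ:

```python
def schedule_length(schedule):
    """Schedule length of L on M: max over jobs of tau_i + t*_i."""
    return max(tau + t_star for tau, t_star in schedule)

# Hypothetical schedule: (starting time tau_i, effective execution time t*_i)
psi = [(0.0, 7.0), (0.0, 3.0), (3.0, 5.0)]
# the third job finishes last, at 3.0 + 5.0 = 8.0
```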

13 When α = 1, that is, when extra communication time over a LAN, MAN, WAN, or the Internet is not a concern, the above scheduling problem is equivalent to the problem of scheduling independent parallel tasks in multiprocessors, which is NP-hard even when all tasks are sequential.

14 Scheduling Algorithms

15 A complete description of the list scheduling (LS) algorithm is given in the next slide. There is a choice of the initial order of the jobs in L. Four ordering strategies:
 Largest Job First (LJF) – Jobs are arranged such that s_1 ≥ s_2 ≥ ... ≥ s_n
 Longest Time First (LTF) – Jobs are arranged such that t_1 ≥ t_2 ≥ ... ≥ t_n
 Largest Cost First (LCF) – Jobs are arranged such that s_1 t_1 ≥ s_2 t_2 ≥ ... ≥ s_n t_n
 Unordered (U) – Jobs are arranged in any order
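The four orderings amount to sorting L by different keys before LS runs. A minimal sketch, with jobs as (s_i, t_i) pairs and ties broken arbitrarily:

```python
def order_jobs(jobs, strategy):
    """Return jobs (a list of (s_i, t_i) pairs) in the given initial order."""
    if strategy == "LJF":   # Largest Job First: s_1 >= s_2 >= ... >= s_n
        return sorted(jobs, key=lambda j: -j[0])
    if strategy == "LTF":   # Longest Time First: t_1 >= t_2 >= ... >= t_n
        return sorted(jobs, key=lambda j: -j[1])
    if strategy == "LCF":   # Largest Cost First: s_1*t_1 >= ... >= s_n*t_n
        return sorted(jobs, key=lambda j: -j[0] * j[1])
    return list(jobs)       # U: unordered, keep the given order
```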

16 The number of available processors P'_j on machine P_j is dynamically maintained. The total number of available processors is P' = P'_1 + P'_2 + ... + P'_m.
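The bookkeeping on this slide can be sketched as follows: P'_j decreases when a subjob starts on P_j and increases when it finishes. The machine sizes and the allocation are hypothetical example values.

```python
# P'_1, P'_2, P'_3: available processors per machine, initially all free
avail = [8, 16, 4]

def start(allocation):
    """Claim processors: allocation is a list of (machine index, s_{i,k})."""
    for j, s_k in allocation:
        avail[j] -= s_k

def finish(allocation):
    """Release the processors claimed by start()."""
    for j, s_k in allocation:
        avail[j] += s_k

# a job of size 10 split as 4 processors on machine 0 and 6 on machine 1
start([(0, 4), (1, 6)])
# P' = P'_1 + P'_2 + P'_3 is now 28 - 10 = 18
```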

17 (listing of the LS algorithm)

18 (listing of the LS algorithm, continued)

19 Each job scheduling algorithm needs to use a processor allocation algorithm to find resources in a metacomputer. Several processor allocation algorithms have been proposed, including Naive, LMF (largest machine first), SMF (smallest machine first), and MEET (minimum effective execution time).
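LMF and SMF can be sketched as greedy splitters that visit machines in order of available size and take processors until the job's demand s is met. This is an assumed reading of the names, not the paper's exact procedures, and MEET is omitted here because it depends on the full communication cost model.

```python
def allocate(avail, s, largest_first=True):
    """Greedily split a job of size s across machines (LMF/SMF sketch).

    avail: available processors P'_j per machine.
    Returns a list of (machine index, s_{i,k}) pairs, or None if
    fewer than s processors are free in total.
    """
    if sum(avail) < s:
        return None
    order = sorted(range(len(avail)),
                   key=lambda j: -avail[j] if largest_first else avail[j])
    allocation, need = [], s
    for j in order:
        take = min(avail[j], need)
        if take > 0:
            allocation.append((j, take))
            need -= take
        if need == 0:
            break
    return allocation
```

LMF favors few large pieces (fewer machines per job), while SMF fills small machines first; under the cost model above, fewer pieces means less inter-machine communication for the job being placed.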

20 Worst-Case Performance Analysis

21 Let A(L) be the length of a schedule produced by algorithm A for a list L of jobs, and OPT(L) be the length of an optimal schedule of L. We say that algorithm A achieves worst-case performance bound B if A(L)/OPT(L) ≤ B for all L.

22 Let t*_{i,LS} be the effective execution time of a job J_i in an LS schedule. Assume that all the n jobs are executed during the time interval [0, LS(L)]. Let J_i be a job which is finished at time LS(L). It is clear that before J_i is scheduled at time LS(L) − t*_{i,LS}, there are no s_i processors available; otherwise, J_i would be scheduled earlier. That is, during the time interval [0, LS(L) − t*_{i,LS}], the number of busy processors is at least P − s_i + 1. During the time interval [LS(L) − t*_{i,LS}, LS(L)], the number of busy processors is at least s_i. Define the effective cost of L in an LS schedule as C*_{LS}(L) = s_1 t*_{1,LS} + s_2 t*_{2,LS} + ... + s_n t*_{n,LS}. Then, we have (P − s_i + 1)(LS(L) − t*_{i,LS}) + s_i t*_{i,LS} ≤ C*_{LS}(L).

23 No matter which processor allocation algorithm is used, we always have ... The effective execution time of J_i in an optimal schedule is ... Thus, we get ... where ... It is clear that φ_i is an increasing function of s_i, which is minimized when s_i = s. Hence, we have ... where ...

24 Since ..., the right-hand side of the above inequality is minimized when ...

25 The right-hand side of the above inequality is a decreasing function of s_i, which is maximized when s_i = s.

26 Theorem. If P_j ≤ p for all 1 ≤ j ≤ m, and s_i ≥ s for all 1 ≤ i ≤ n, where p ≤ s, then algorithm LS can achieve the worst-case performance bound ... where ... The above performance bound is independent of the initial order of L and of the processor allocation algorithm.

27 Corollary. If a metacomputer contains only sequential machines, i.e., p = 1, communication heterogeneity vanishes and the worst-case performance bound in the theorem becomes ...

28 (experimental data, shown as charts in the original slides)

