Scheduling of parallel jobs in a heterogeneous grid environment

Problem Addressed

- Scheduling of parallel jobs in a heterogeneous grid environment
  - Each site has a homogeneous cluster of processors, but processors at different sites have different speeds
- Much of the research on scheduling in heterogeneous systems has focused on independent sequential jobs
- Research on parallel job scheduling has concentrated primarily on the homogeneous context
- The algorithms used for scheduling sequential tasks on heterogeneous systems are too computationally complex to extend to parallel jobs
- We extend the techniques used for parallel job scheduling in a homogeneous context to the heterogeneous context

A Characterization of Approaches to Parallel Job Scheduling

Backfilling

Conservative
Every job is given a reservation when it enters the system, and a job is allowed to backfill only if it does not violate any of the previous reservations.
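The conservative rule can be pictured as a per-site reservation profile: each arriving job is placed at the earliest time where enough processors stay free for its whole estimated runtime, without moving anyone already reserved. A minimal sketch, assuming rigid jobs with a fixed processor count and a runtime estimate; the names `ConservativeSchedule`, `Reservation` and `submit` are hypothetical and not taken from the poster or from the Silver/Maui scheduler.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reservation:
    job_id: str
    start: float
    end: float
    procs: int


@dataclass
class ConservativeSchedule:
    """Hypothetical single-site profile for conservative backfilling."""
    total_procs: int
    reservations: List[Reservation] = field(default_factory=list)

    def _free_at(self, t: float) -> int:
        # Processors not held by any reservation active at time t
        # (reservation intervals are half-open: [start, end)).
        used = sum(r.procs for r in self.reservations if r.start <= t < r.end)
        return self.total_procs - used

    def _fits(self, start: float, procs: int, runtime: float) -> bool:
        # Free capacity only drops when a reservation begins, so checking the
        # window start plus every reservation start inside the window suffices.
        end = start + runtime
        checkpoints = [start] + [r.start for r in self.reservations if start < r.start < end]
        return all(self._free_at(t) >= procs for t in checkpoints)

    def submit(self, job_id: str, now: float, procs: int, runtime: float) -> float:
        """Reserve the earliest slot that violates no earlier reservation."""
        if procs > self.total_procs or runtime <= 0:
            raise ValueError("job does not fit this site")
        # The earliest feasible start is either `now` or a moment when an
        # existing reservation ends (the only times free capacity grows).
        candidates = sorted({now} | {r.end for r in self.reservations if r.end > now})
        for start in candidates:
            if self._fits(start, procs, runtime):
                self.reservations.append(Reservation(job_id, start, start + runtime, procs))
                return start
        # The last candidate has every processor free, so this is never reached.
        raise RuntimeError("no feasible start found")
```

For example, on a 10-processor site where job A holds 8 processors from time 0 to 10, a 6-processor job B is reserved at time 10; a later 2-processor, 5-unit job C then backfills into the hole at time 0 without delaying B's reservation.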
EASY
Only the job at the head of the queue is given a reservation, and a job is allowed to backfill if it does not violate this reservation.

In backfilling, a later-arriving job is allowed to leapfrog previously queued jobs.

[Figure: backfilling example, processors vs. time]

Simulation Environment

- Heterogeneous sites, with a homogeneous cluster of processors at each site
- 5000-job subset of either the 430-node Cornell Theory Center (CTC) trace or the trace from the 128-node IBM SP2 system at the San Diego Supercomputer Center (SDSC)
- NAS Parallel Benchmarks 2.0 used to model the heterogeneous runtimes

Benchmark | IBM SP (P2SC 160 MHz) | Cray T3E 900 | IBM SP (WN/66) | SGI Origin 2000
LU Class B (256 Nodes) | 24.235.694.89320.328* (column boundaries not recoverable)
MG Class B (256 Nodes) | 1.1* 1.8 1.82.27241.3147 (column boundaries not recoverable)
MG Class B (8 Nodes) | 17.2* | 25.3 | 34.3 | 35.5
IS Class B (8 Nodes) | 17.7 | 16.3* | 22.6 | 23.3
* Denotes best runtime for the job

Metrics

We use the following metrics for evaluating the proposed schemes:
- Average slowdown
- Average turnaround time
- Utilization
- Effective utilization

Conservative vs. Aggressive

- Jobs are processed in arrival order by the meta-scheduler
- Greedy assigns each job to the site with the lowest instantaneous load
- Greedy-MR (Multiple Requests) submits each job to all sites
  - When the job starts at a site, the other instances are removed
- We have shown this mechanism to be effective in a homogeneous context (HPDC ’02)
- However, only a slight improvement is seen in a heterogeneous context

Aggressive vs. Conservative

- Jobs are processed in arrival order by the meta-scheduler
- In a heterogeneous context, the site where the job starts the earliest may not be the best site
- To obtain the completion time of a job at a particular site, conservative backfilling has to be employed at the local site
- Conservative backfilling performs better than aggressive backfilling in all cases, the opposite of what is seen in a homogeneous context
- Backfilling improves because the dynamic removal of replicated jobs creates holes at each site, and because each site has more jobs to attempt to backfill

Efficacy Based Queues

- Explicitly take efficacy into account to improve the effective utilization
- Use efficacy as the priority order for the jobs in the reserved and idle queues
- Starvation free
- Effective utilization increases and turnaround time decreases, despite the decrease in raw utilization

Conclusions and Future Work

- Improvements in turnaround time and effective utilization for parallel job scheduling in a heterogeneous grid environment have been demonstrated through simulation
- Next steps:
  - Incorporate these changes into the Silver/Maui Scheduler
  - Deploy and evaluate the scheduler first on our research clusters, and then at the Ohio Supercomputer Center

Restricted Multi-Site Reservations

- When network bandwidth is limited, jobs can be submitted to a smaller number of sites; by making fewer reservations, the multi-reservation scheduler can still realize a substantial fraction of the benefits achievable from a scheduler that schedules each job at all sites
- Use a more accurate approach to select the sites to which the job is submitted: instead of using the instantaneous load, we query each site to determine the earliest completion time
- When using completion time as the criterion, submitting to fewer sites can be almost as effective as submitting to all sites

Gerald Sabin, Rajkumar Kettimuthu, Arun Rajan, P. Sadayappan
Supported in part by Sandia National Laboratory
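The meta-scheduler choices in the Conservative vs. Aggressive material (Greedy, Greedy-MR, and the refinement that ranks sites by completion time) can be contrasted in a small sketch. It assumes a deliberately simple hypothetical site model: homogeneous processors per site, a relative speed factor that divides the job's runtime, first-come first-served starts with no backfilling, and an "instantaneous load" defined as remaining busy time averaged over processors. `Site`, `pick_greedy`, `pick_greedy_mr` and `pick_earliest_completion` are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    """Hypothetical site model: homogeneous processors, FCFS starts, no backfilling."""
    name: str
    speed: float             # assumed relative speed; runtime = base_runtime / speed
    free_times: List[float]  # time at which each processor becomes free

    def load(self, now: float) -> float:
        # Assumed load measure: remaining busy time averaged over processors.
        return sum(max(t - now, 0.0) for t in self.free_times) / len(self.free_times)

    def start_time(self, now: float, procs: int) -> float:
        # A rigid job starts once its `procs` earliest-free processors are free.
        return max(now, sorted(self.free_times)[procs - 1])

    def completion_time(self, now: float, procs: int, base_runtime: float) -> float:
        return self.start_time(now, procs) + base_runtime / self.speed


def eligible(sites: List[Site], procs: int) -> List[Site]:
    return [s for s in sites if len(s.free_times) >= procs]


def pick_greedy(sites: List[Site], now: float, procs: int) -> Site:
    # Greedy: a single submission to the least-loaded site.
    return min(eligible(sites, procs), key=lambda s: s.load(now))


def pick_greedy_mr(sites: List[Site], now: float, procs: int) -> Site:
    # Greedy-MR: a replica goes to every site; the first replica to start wins
    # and the others are removed. Start times are assumed fixed at submission,
    # so the winner is simply the earliest-starting site.
    return min(eligible(sites, procs), key=lambda s: s.start_time(now, procs))


def pick_earliest_completion(sites: List[Site], now: float, procs: int,
                             base_runtime: float) -> Site:
    # Heterogeneous refinement: rank sites by when the job would finish.
    return min(eligible(sites, procs),
               key=lambda s: s.completion_time(now, procs, base_runtime))
```

With a 4-processor "fast" site (speed 4) busy until time 5 and an idle 4-processor "slow" site (speed 1), a 4-processor job with base runtime 8 goes to the slow site under both Greedy and Greedy-MR, since it starts there at time 0 and finishes at 8. Ranking by completion time picks the fast site instead (start 5, finish 7), which is the point that the earliest-starting site may not be the best one.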

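The Efficacy Based Queues idea, ordering a site's queue by how well each job suits that site, can be sketched as follows. The poster does not define efficacy; this sketch assumes efficacy is the job's best runtime over all sites divided by its runtime at the given site, and that effective work counts only the processor-time the job would need on its fastest site. The ordering shown does not by itself provide the starvation freedom the poster claims. `QueuedJob`, `efficacy`, `efficacy_order` and `raw_and_effective_work` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class QueuedJob:
    job_id: str
    arrival: float
    procs: int
    runtimes: Dict[str, float]  # predicted runtime at each site (hypothetical input)


def efficacy(job: QueuedJob, site: str) -> float:
    # Assumed definition: best runtime over all sites divided by the runtime
    # at this site, so 1.0 means this site is the job's fastest platform.
    return min(job.runtimes.values()) / job.runtimes[site]


def efficacy_order(queue: List[QueuedJob], site: str) -> List[QueuedJob]:
    # Highest efficacy first; arrival time breaks ties, so equal-efficacy
    # jobs keep their first-come, first-served order.
    return sorted(queue, key=lambda j: (-efficacy(j, site), j.arrival))


def raw_and_effective_work(job: QueuedJob, site: str) -> Tuple[float, float]:
    # Assumed accounting: raw work is the processor-time consumed at `site`;
    # effective work is what the job would need on its fastest site.
    raw = job.procs * job.runtimes[site]
    return raw, raw * efficacy(job, site)
```

Taking the 8-node MG and IS Class B runtimes as sample inputs, a Cray T3E site would run IS (efficacy 1.0) ahead of an earlier-arriving MG (efficacy 17.2/25.3, about 0.68), while an IBM SP (P2SC) site would keep MG first. Running MG on the T3E anyway would consume 8 × 25.3 processor-seconds of raw work but only 8 × 17.2 of effective work, which is how raw utilization can rise while effective utilization does not.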

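Restricted Multi-Site Reservations comes down to choosing which k sites receive a job's replicas, and the poster's change is the selection criterion: instantaneous load versus a per-site query for the job's earliest completion time. A minimal sketch, assuming each site can be asked for a single numeric score; `choose_sites` and the sample loads and completion times below are hypothetical.

```python
from typing import Callable, List


def choose_sites(sites: List[str], score: Callable[[str], float], k: int) -> List[str]:
    """Return the k sites with the lowest score, ties kept in input order.

    Passing the instantaneous load as `score` gives load-based selection;
    passing a (hypothetical) per-site query for the job's earliest
    completion time gives the completion-time criterion.
    """
    if not 1 <= k <= len(sites):
        raise ValueError("k must be between 1 and the number of sites")
    # sorted() is stable and calls `score` once per site, i.e. one query each.
    return sorted(sites, key=score)[:k]
```

With hypothetical completion estimates A = 120, B = 45, C = 60 and loads A = 0.2, B = 0.9, C = 0.5, choosing k = 2 by completion time gives B and C, whereas choosing by load gives A and C: the lightly loaded site A is not where the job would finish soonest.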