Scheduling of parallel jobs in a heterogeneous grid environment

Presentation transcript:

Gerald Sabin, Rajkumar Kettimuthu, Arun Rajan, P. Sadayappan
Supported in part by Sandia National Laboratories

Problem Addressed
Scheduling of parallel jobs in a heterogeneous grid environment.
Each site has a homogeneous cluster of processors, but processors at different sites have different speeds.
Much of the research on scheduling in heterogeneous systems has focused on independent sequential jobs.
Research on parallel job scheduling has concentrated primarily on the homogeneous context.
The algorithms used for scheduling sequential tasks on heterogeneous systems are too computationally complex to extend to parallel jobs.
We extend the techniques used for parallel job scheduling in a homogeneous context to the heterogeneous context.

A Characterization of Approaches to Parallel Job Scheduling

Backfilling
Backfilling: a later-arriving job is allowed to leapfrog previously queued jobs.
Conservative: every job is given a reservation when it enters the system, and a job is allowed to backfill only if it does not violate any of the previous reservations.
EASY: only the job at the head of the queue is given a reservation, and a job is allowed to backfill if it does not violate this reservation.
[Figure: backfilling schedule; axes: Processors, Time]

Simulation Environment
Heterogeneous sites, with a homogeneous cluster of processors at each site.
A 5000-job subset of either the trace from the 430-node Cornell Theory Center (CTC) system or the trace from the 128-node IBM SP2 system at the San Diego Supercomputer Center (SDSC).
NAS Parallel Benchmarks 2.0 used to model the heterogeneous runtimes.
[Table: benchmark runtimes; most values are missing from this transcript. Benchmarks: LU Class B (256 Nodes), MG Class B (256 Nodes), MG Class B (8 Nodes), IS Class B (8 Nodes). Machines: IBM SP (P2SC 160 MHz), Cray T3E 900, IBM SP (WN/66), SGI Origin 2000. Surviving entries: 1.1*, 17.2*. * Denotes best runtime for the job.]

Metrics
We use the following metrics for evaluating the proposed schemes:
Average Slowdown
Average Turnaround Time
Utilization
Effective Utilization

[Figure label: Conservative vs. Aggressive]

Jobs are processed in arrival order by the meta-scheduler.
Greedy assigns each job to the site with the lowest instantaneous load.
Greedy-MR (Multiple Requests) submits each job to all sites.
When the job starts at a site, the other instances are removed.
We have shown this mechanism to be effective in a homogeneous context (HPDC ’02).
However, only a slight improvement is seen in a heterogeneous context.

Jobs are processed in arrival order by the meta-scheduler.
In a heterogeneous context, the site where the job starts the earliest may not be the best site.
To obtain the completion time of a job at a particular site, conservative backfilling has to be employed at the local site.

Aggressive vs. Conservative
Conservative performs better than aggressive in all cases, quite the opposite of a homogeneous context.
Backfilling improves because of holes created by the dynamic removal of replicated jobs at each site, and because each site has more jobs to attempt to backfill.

Efficacy Based Queues
Explicitly take efficacy into account to improve the effective utilization.
Use efficacy as the priority order for the jobs in the reserved and idle queues.
Starvation free.
Effective utilization increases and turnaround time decreases, despite the decrease in raw utilization.

Conclusions and Future Work
Improvements in turnaround time and effective utilization for parallel job scheduling in a heterogeneous grid environment have been demonstrated through simulation.
Next Steps:
Incorporate these changes into the Silver/Maui Scheduler.
Deploy and evaluate the scheduler first on our research clusters, and then at the Ohio Supercomputer Center.

Restricted Multi-Site Reservations
When network bandwidth is limited, jobs can be submitted to a smaller number of sites. By making fewer reservations, the multi-reservation scheduler can still realize a substantial fraction of the benefits achievable from a scheduler that schedules each job at all sites.
Use a more accurate approach to select the sites to which a job is submitted: instead of using the instantaneous load, we query each site to determine the earliest completion time.
When using completion time as the criterion, submitting to fewer sites can be almost as effective as submitting to all sites.
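
The two site-selection criteria described above (lowest instantaneous load versus earliest reported completion time) can be contrasted in a minimal Python sketch. The `Site` record, its field names, the sample numbers, and the parameter `k` are all illustrative assumptions; the poster does not say how load is measured or how a site is queried.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    load: float            # instantaneous load (hypothetical unit)
    est_completion: float  # completion time the site reports for this job (hypothetical query result)


def greedy_site(sites):
    """Greedy: pick the single site with the lowest instantaneous load."""
    return min(sites, key=lambda s: s.load)


def restricted_sites(sites, k):
    """Restricted multi-site: pick the k sites with the earliest reported completion time."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return sorted(sites, key=lambda s: s.est_completion)[:k]


# Made-up example: site A is lightly loaded but slow, site B is busier but faster.
sites = [
    Site("A", load=0.2, est_completion=900.0),
    Site("B", load=0.7, est_completion=400.0),
    Site("C", load=0.5, est_completion=650.0),
]

print(greedy_site(sites).name)                       # A
print([s.name for s in restricted_sites(sites, 2)])  # ['B', 'C']
```

In this made-up data the load-based choice picks a site that would finish the job last, which is the situation the completion-time criterion is meant to avoid.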
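
The EASY rule from the Backfilling section (a job may backfill if it does not violate the reservation of the job at the head of the queue) is commonly checked with a two-part test. The sketch below uses that common formulation as an assumption; the function name, its parameters, and the use of estimated runtimes are hypothetical and not taken from the poster.

```python
def can_backfill_easy(job_procs, job_runtime, now, free_procs, shadow_time, extra_procs):
    """Return True if a waiting job may start now without delaying the head job.

    shadow_time: start time reserved for the job at the head of the queue.
    extra_procs: processors that remain idle at shadow_time even after the
                 head job has started.
    job_runtime: the job's estimated runtime (assumed to be an upper bound).
    """
    if job_procs > free_procs:
        return False  # does not fit in the currently idle processors
    finishes_before_reservation = now + job_runtime <= shadow_time
    fits_beside_head_job = job_procs <= extra_procs
    return finishes_before_reservation or fits_beside_head_job


# 4 idle processors now; head job is reserved to start at t=100, leaving 1 spare processor.
print(can_backfill_easy(2, 50, 0, 4, 100, 1))   # True: finishes before the reservation
print(can_backfill_easy(2, 200, 0, 4, 100, 1))  # False: would delay the head job
print(can_backfill_easy(1, 200, 0, 4, 100, 1))  # True: runs beside the head job
```

Conservative backfilling would instead have to check the candidate against the reservation of every previously queued job, not only the head job.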