Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications
Rohan Kurian, Pavan Balaji, P. Sadayappan
The Ohio State University

Parameter Sweep Applications
- An important class of applications: a set of independent tasks
- MCell application: 3D simulations for sub-cellular architecture/physiology
- GTOMO (Parallel Tomography) application: multiple view-point simulation
- Systems exist for scheduling PSAs on the Grid
- Cluster-based scheduling?
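
To make "a set of independent tasks" concrete, the following is a minimal Python sketch that expands a parameter grid into one task per parameter combination. The function `make_sweep_tasks` and the parameter names `diffusion` and `seed` are hypothetical; real PSAs such as MCell define their own inputs.

```python
from itertools import product

def make_sweep_tasks(grid):
    """Expand a parameter grid (name -> list of values) into independent tasks.

    Each task is one parameter combination; no task depends on another,
    so they can run in any order and on any processor.
    """
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]

# Hypothetical sweep: 2 diffusion values x 3 random seeds = 6 tasks.
tasks = make_sweep_tasks({"diffusion": [0.1, 0.2], "seed": [1, 2, 3]})
print(len(tasks))  # 6
print(tasks[0])    # {'diffusion': 0.1, 'seed': 1}
```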

Application-Level Schedulers
- Manage the scheduling of applications
- Break the application into appropriate chunks
- Examples: APST (AppLeS Parameter Sweep Template), NIMROD
- Use a greedy approach to schedule PSA chunks

Presentation Roadmap  Job Scheduling in Clusters  Multi-Site Job Scheduling  PSA Scheduling Strategies  Multi-Site Scheduling of PSAs  Performance Evaluation  Conclusions

Job Scheduling in Clusters
- Mapping arriving jobs to available resources
- Multiple scheduling schemes: First Come First Serve (FCFS), conservative scheduling, aggressive (EASY) scheduling
- Fair-share constraints: a user cannot have more than N queued jobs
- Submitting the multiple chunks of a PSA job as separate jobs violates the fair-share constraints
- Instead, combine the chunks to form a single parallel job
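
The aggressive (EASY) scheme reserves processors only for the job at the head of the queue; a later job may backfill if it fits now and either finishes before that reservation (the "shadow time") or uses only processors the head job will not need. A minimal Python sketch of that rule follows, assuming each job supplies a runtime estimate and the scheduler knows the end times of running jobs. `Job`, `reservation` and `easy_backfill` are hypothetical names, not taken from the authors' simulator.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    procs: int       # processors requested
    estimate: float  # user-supplied runtime estimate

def reservation(now, free, running, head):
    """Shadow time for the queue head and the processors left over at that time.

    running: list of (end_time, procs) for jobs currently executing.
    """
    avail, shadow = free, now
    for end, procs in sorted(running):
        if avail >= head.procs:
            break
        avail += procs
        shadow = end
    if avail < head.procs:
        raise ValueError("head job needs more processors than the machine has")
    return shadow, avail - head.procs

def easy_backfill(now, free, running, queue):
    """Return the jobs started at time `now` under EASY backfilling."""
    started, queue, running = [], list(queue), list(running)
    # Start jobs strictly in FCFS order while the head of the queue fits.
    while queue and queue[0].procs <= free:
        job = queue.pop(0)
        started.append(job)
        free -= job.procs
        running.append((now + job.estimate, job.procs))
    if not queue:
        return started
    # Only the head job gets a reservation; this is what makes EASY aggressive.
    shadow, extra = reservation(now, free, running, queue[0])
    for job in queue[1:]:
        if job.procs > free:
            continue
        if now + job.estimate <= shadow:
            pass                # finishes before the head's reservation
        elif job.procs <= extra:
            extra -= job.procs  # runs on processors the head will not need
        else:
            continue
        started.append(job)
        free -= job.procs
    return started

queue = [Job("A", 5, 30), Job("B", 1, 8), Job("C", 1, 50), Job("D", 1, 50)]
started = easy_backfill(now=0, free=2, running=[(10, 4), (20, 4)], queue=queue)
print([j.name for j in started])  # ['B', 'C']
```

In the example, A cannot start until time 10, when one processor will still be spare; B backfills because it ends by time 10, and C backfills on that spare processor.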

Formation of PSAs in Clusters
[Figure: small independent tasks combined into a single parallel Parameter Sweep Application]
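
One way to combine independent tasks into a single parallel job is sketched below in Python, under stated assumptions: every task runs on one processor with a known runtime, the combined job requests a fixed `width` of processors, and tasks are packed with the longest-processing-time-first (LPT) heuristic. LPT is a swapped-in technique for illustration; the slides do not say how chunks are packed, and `combine_tasks` is a hypothetical name.

```python
import heapq

def combine_tasks(task_runtimes, width):
    """Pack independent single-processor tasks into one parallel job of `width` processors.

    LPT heuristic: take tasks longest first and give each to the currently
    least-loaded processor. Returns (task indices per processor, runtime the
    combined job must request).
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    heap = [(0.0, p) for p in range(width)]  # (current load, processor index)
    assignment = [[] for _ in range(width)]
    for idx, rt in sorted(enumerate(task_runtimes), key=lambda x: -x[1]):
        load, p = heapq.heappop(heap)
        assignment[p].append(idx)
        heapq.heappush(heap, (load + rt, p))
    runtime = max(load for load, _ in heap)
    return assignment, runtime

# Six tasks packed into one 3-processor job: one queued job instead of six.
assignment, runtime = combine_tasks([5, 4, 3, 3, 2, 1], width=3)
print(assignment)  # [[0, 5], [1, 4], [2, 3]]
print(runtime)     # 6.0
```

Submitting the result as one job keeps the user within a fair-share limit on queued jobs.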

Presentation Roadmap  Job Scheduling in Clusters  Multi-Site Job Scheduling  PSA Scheduling Strategies  Multi-Site Scheduling of PSAs  Performance Evaluation  Conclusions

Multi-Site Job Scheduling
- Multiple simultaneous requests: a job is submitted to multiple sites and started on whichever cluster can start it earliest
- Existing schemes have limitations with heterogeneous clusters and with sites that use different scheduling schemes

Multiple Simultaneous Requests
[Figure: incoming jobs; Sites 1, 2 and 3, each with a meta scheduler above a local scheduler]
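
The limitation with heterogeneous clusters can be seen in a small Python sketch: under multiple simultaneous requests the copy that starts first wins and the others are cancelled, but on sites of different speeds the earliest start need not give the earliest completion. The function names, start times and runtime are hypothetical; the speeds reuse the 2:1:3 ratio from the evaluation, and runtimes are assumed to scale inversely with speed.

```python
def earliest_start_site(start_times):
    """Multiple simultaneous requests: the copy that starts first wins."""
    winner = min(start_times, key=start_times.get)
    cancelled = sorted(s for s in start_times if s != winner)
    return winner, cancelled

def completion_time(site, start_times, runtime, speeds):
    """Completion on `site`, assuming runtime scales inversely with relative speed."""
    return start_times[site] + runtime / speeds[site]

starts = {"site1": 100.0, "site2": 50.0, "site3": 80.0}  # hypothetical start times
speeds = {"site1": 2.0, "site2": 1.0, "site3": 3.0}
winner, cancelled = earliest_start_site(starts)
print(winner, cancelled)                               # site2 ['site1', 'site3']
print(completion_time(winner, starts, 300.0, speeds))  # 350.0
# A speed-aware choice would have finished at 180.0 on site3.
print(min(starts, key=lambda s: completion_time(s, starts, 300.0, speeds)))  # site3
```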

Presentation Roadmap  Job Scheduling in Clusters  Multi-Site Job Scheduling  PSA Scheduling Strategies  Multi-Site Scheduling of PSAs  Performance Evaluation  Conclusions

PSA Scheduling Strategies
Flooding-based job shredding
- Submit all chunks of the PSA at once (greedy approach)
- Improves user and system metrics
- Does not ensure fairness to non-PSA jobs
Opportune job shredding
- Uses an additional application-level scheduler that monitors the current schedule of the system
- If no normal backfill is possible, allows PSA jobs to shred and backfill
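
The two strategies can be contrasted in a minimal Python sketch. It assumes a PSA made of identical single-processor tasks of length `task_time`, and assumes the application-level scheduler sizes each shredded chunk so that it finishes before the head job's reservation (shadow time), which preserves the EASY guarantee. `flood` and `opportune_chunk` are hypothetical names; the slides do not specify how chunk sizes are chosen.

```python
import math

def flood(remaining_tasks, chunk_size):
    """Flooding: split the whole PSA into chunks and submit them all at once."""
    return [min(chunk_size, remaining_tasks - i)
            for i in range(0, remaining_tasks, chunk_size)]

def opportune_chunk(now, idle_procs, shadow, task_time, remaining_tasks,
                    normal_backfill_possible):
    """Opportune shredding: carve one PSA chunk out of the current idle 'hole'.

    shadow: time of the head job's reservation (math.inf if nothing is waiting).
    Returns (procs, tasks) for the chunk to submit now, or None.
    """
    if normal_backfill_possible or idle_procs <= 0 or remaining_tasks <= 0:
        return None  # leave the hole to ordinary backfilling, or nothing to do
    if math.isinf(shadow):
        per_proc = math.ceil(remaining_tasks / idle_procs)
    else:
        # Number of tasks one processor can finish before the reservation starts.
        per_proc = math.floor((shadow - now) / task_time)
    if per_proc < 1:
        return None
    procs = min(idle_procs, math.ceil(remaining_tasks / per_proc))
    return procs, min(remaining_tasks, procs * per_proc)

print(flood(10, 4))                              # [4, 4, 2]
print(opportune_chunk(0, 4, 10, 3, 20, False))   # (4, 12)
print(opportune_chunk(0, 4, 10, 3, 20, True))    # None
```

Flooding places every chunk in the queue immediately, competing with non-PSA jobs for backfill slots; the opportune variant submits work only into holes that ordinary backfilling cannot use.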

Presentation Roadmap  Job Scheduling in Clusters  Multi-Site Job Scheduling  PSA Scheduling Strategies  Multi-Site Scheduling of PSAs  Performance Evaluation  Conclusions

Multi-Site Scheduling for PSAs
- Two-level application-level schedulers
- No constraints on sites: they may have different speeds and different scheduling policies
- Similar to "Multiple Simultaneous Requests", but simultaneous requests are made only for PSAs

Multi-Site Scheduling for PSAs
[Figure: a meta application-level scheduler above Sites 1, 2 and 3; each site has an application-level scheduler, a job queue and a local scheduler]
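
A minimal Python sketch of the two-level structure, under stated assumptions: each site-level application-level scheduler reports an opportunity (idle processors and the time window until its next reservation), task runtimes scale inversely with site speed, and the meta scheduler serves faster sites first. `Opportunity`, `dispatch` and the faster-first policy are hypothetical illustrations, not the authors' design.

```python
import math
from dataclasses import dataclass

@dataclass
class Opportunity:
    site: str
    idle_procs: int  # processors idle at the site right now
    window: float    # time until the site's next reservation

def dispatch(opportunities, speeds, task_time, remaining):
    """Meta application-level scheduler: hand PSA tasks to site-level schedulers.

    task_time is the task runtime on a speed-1.0 site. Faster sites are served
    first (an assumed policy). Returns ({site: tasks assigned}, tasks left over).
    """
    plan = {}
    for opp in sorted(opportunities, key=lambda o: -speeds[o.site]):
        if remaining == 0:
            break
        local_time = task_time / speeds[opp.site]
        capacity = opp.idle_procs * math.floor(opp.window / local_time)
        n = min(remaining, capacity)
        if n > 0:
            plan[opp.site] = plan.get(opp.site, 0) + n
            remaining -= n
    return plan, remaining

speeds = {"site1": 2.0, "site2": 1.0, "site3": 3.0}  # the 2:1:3 ratio from the evaluation
opps = [Opportunity("site1", 2, 10.0), Opportunity("site2", 4, 10.0),
        Opportunity("site3", 1, 5.0)]
plan, left = dispatch(opps, speeds, task_time=6.0, remaining=10)
print(plan, left)  # {'site3': 2, 'site1': 6, 'site2': 2} 0
```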

Presentation Roadmap  Job Scheduling in Clusters  Multi-Site Job Scheduling  PSA Scheduling Strategies  Multi-Site Scheduling of PSAs  Performance Evaluation  Conclusions

Performance Metrics
- Response time = completion time - submit time
- Slowdown = response time / runtime
- Loss of Capacity (LOC):
  $\Delta LOC = \min\{\sum_{j \in \text{waiting}} p_j,\ N_{\text{idle}}\}$, where $p_j$ is the number of processors requested by waiting job $j$ and $N_{\text{idle}}$ is the number of idle processors
  $\Delta T$ = time for which this state lasts
  $LOC = \sum \Delta LOC \times \Delta T$
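
The three metrics can be computed from a simulation log as in the minimal Python sketch below. It assumes the log is available as a time-ordered list of system states, each recording the idle processor count and the processor requests of waiting jobs, with each state lasting until the next one (the last until `horizon`). `job_metrics`, `loss_of_capacity` and the event layout are hypothetical, and LOC is left unnormalized as on the slide.

```python
def job_metrics(submit, start, end):
    """Response time and slowdown for one job."""
    response = end - submit
    runtime = end - start
    return response, response / runtime

def loss_of_capacity(events, horizon):
    """events: time-ordered (time, idle_procs, [procs of each waiting job]).

    Each state lasts until the next event; the last one lasts until `horizon`.
    Capacity is lost only when processors sit idle while jobs are waiting.
    """
    ends = [e[0] for e in events[1:]] + [horizon]
    loc = 0.0
    for (t, idle, waiting), end in zip(events, ends):
        delta_loc = min(sum(waiting), idle)
        loc += delta_loc * (end - t)
    return loc

events = [(0, 4, []), (5, 4, [6, 2]), (10, 0, [6, 2]), (20, 2, [])]
print(loss_of_capacity(events, horizon=30))  # 20.0
print(job_metrics(submit=0, start=30, end=40))  # (40, 4.0)
```

Only the interval from time 5 to 10 contributes: 4 processors are idle while waiting jobs request 8, so 4 x 5 = 20 processor-seconds of capacity are lost.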

Evaluation Scheme
- Simulation-based approach
- CTC trace from Feitelson's archive
- EASY backfilling used
- For multi-site evaluation: CTC traces from 3 different months, with processing speeds in the ratio 2:1:3

Flooding-Based Job Shredding
- Up to 60% improvement for PSA jobs
- Up to 90% worse performance for non-PSA jobs

Flooding: Job Category-wise Breakup
- Narrow-short non-PSA jobs suffer the most
- Loss of backfilling opportunities is the main reason

Flooding: Loss of Capacity
- Up to 75% improvement in loss of capacity

Opportune Job Shredding
- Up to 70% improvement for PSA jobs
- Less than 2% worse performance for non-PSA jobs

Opportune: Job Category-wise Breakup
- No category of non-PSA jobs suffers more than 7%

Opportune: Loss of Capacity
- Up to 12% improvement in loss of capacity

Opportune (Multi-Site)
- Up to 95% improvement for PSA jobs
- No significant loss of performance for non-PSA jobs

Opportune (Multi-Site): Response Time
- Up to 75% improvement for PSA jobs
- No significant loss of performance for non-PSA jobs

Opportune (Multi-Site): Slowdown
- Up to 95% improvement for PSA jobs
- No significant loss of performance for non-PSA jobs

Opportune (Multi-Site): Loss of Capacity
- Up to 45% improvement in loss of capacity

Concluding Remarks
- Opportune job shredding: efficient scheduling of PSAs
- Single-site and multi-site versions
- Significant improvement for PSA jobs
- Ensures that non-PSA jobs are not significantly affected
- Plan to integrate this with production schedulers

Thank You!