

1 Grid Scheduling Cécile Germain-Renaud

2 Scheduling
Job
– A computation to run on a machine
– Possibly with network access, e.g. input/output files (coarse grain) or communication with other jobs (the DAG model)
Schedule
– s(J) = date at which execution of task J begins
– Alloc(J) = machine assigned to J
One of the oldest Computer Science problems
Principles of classification: [Graham et al. Optimization and approximation in deterministic sequencing and scheduling: A survey. Ann. Discrete Math. 5 (1979)]
Computer-aided classification of complexity results (4536 at the time of the paper) [Lageweg et al. Computer-aided complexity classification of combinatorial problems. CACM 25:11, 1982]

3 Classical scheduling in HPC
Context: parallel computing/computers
Application = Directed Acyclic Graph (T, E, w, c)
– T = set of sequential tasks
– E = dependence constraints
– w(t) = computational cost of task t
– c(t,t’) = communication cost (data sent from t to t’)
Infrastructure
– p identical processors
– With or without preemption, dedicated (no sharing)
An optimization problem whose objective function is the makespan (total execution time): S(T) = max over t in T of (s(t) + w(t))
Complexity
– NP-complete even for independent tasks without communication: E = ∅, p = 2 and c = 0
– NP-complete for UET-UCT graphs (w = c = 1)
– A very old result: without communication, list scheduling provides a (2 − 1/p)-approximation
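As an illustration of the list-scheduling bound mentioned on this slide, here is a minimal sketch of greedy list scheduling on p identical processors, ignoring communication costs (c = 0). The function name `list_schedule`, the priority order (plain sorted task names) and the data layout are assumptions made for the example, not something taken from the slides.

```python
import heapq
from collections import defaultdict


def list_schedule(w, edges, p):
    """Greedy list scheduling of a DAG on p identical processors (c = 0).

    w     : dict mapping each task t to its computational cost w(t)
    edges : iterable of (t, t2) pairs, meaning t must finish before t2 starts
    p     : number of processors
    Returns (s, alloc, makespan), where s[t] is the start date of t and
    alloc[t] is the processor assigned to t.
    """
    succ = defaultdict(list)
    indeg = {t: 0 for t in w}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1

    # The "list": ready tasks in an arbitrary fixed priority order.
    ready = sorted(t for t in w if indeg[t] == 0)
    free = list(range(p))
    running = []  # min-heap of (finish_time, task, processor)
    s, alloc = {}, {}
    now = 0

    while ready or running:
        # Never leave a processor idle while a task is ready.
        while ready and free:
            t = ready.pop(0)
            proc = free.pop(0)
            s[t], alloc[t] = now, proc
            heapq.heappush(running, (now + w[t], t, proc))
        # Advance time to the next task completion.
        now, t, proc = heapq.heappop(running)
        free.append(proc)
        for u in succ[t]:
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)

    makespan = max((s[t] + w[t] for t in w), default=0)
    return s, alloc, makespan


if __name__ == "__main__":
    w = {"a": 2, "b": 3, "c": 1, "d": 2}
    edges = [("a", "c"), ("b", "d")]
    print(list_schedule(w, edges, p=2))
```

The only rule the sketch enforces is that no processor stays idle while a task is ready; the (2 − 1/p) guarantee holds for any priority order satisfying that rule.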

4 Scheduling in Institutional Grids
Institutional: federation of resources
– Accounted for: fair-share on the medium to long time scale is a primary constraint
– Partially autonomous: local policies must be allowed
Grid
– Permanent regime: on-line decisions
– Large scale: strongly distributed (information system, scheduling services)
Relevant contexts
– Autonomous, multi-agent systems
– Auction algorithms
– Service Level Agreement (SLA) technology

5 EGEE gLite Scheduling
[Diagram. Labels: UI, Broker, Site (node), CE, Local scheduler, Proc]

6 EGEE gLite Scheduling
[Diagram. Labels: UI, Broker, BDII, Publish, Site (node), CE, Local scheduler, Proc]

7 EGEE gLite Scheduling
[Diagram. Labels: UI, Broker, BDII, Publish, Query, Rank, Site (node), CE, Local scheduler, Proc]
The information published is
– Static: e.g. which type of VO is accepted
– Dynamic: expected traversal time

8 EGEE gLite Scheduling
[Diagram. Labels: UI, Broker, BDII, Publish, Query, Rank, Site (node), CE, Local scheduler, Proc]
Rank: may be any user-defined function, e.g. avoid "bad" machines
Default is first locality, second expected traversal time
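The ranking step can be sketched as follows. This is not the gLite API: the `CE` record, its field names (in particular `holds_input_data` as a stand-in for "locality") and the functions `default_rank` and `choose_ce` are hypothetical, introduced only to show how a default rank (locality first, expected traversal time second) and a user-defined rank could be plugged into the same selection.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class CE:
    """Hypothetical view of what a computing element publishes."""
    name: str
    accepted_vos: FrozenSet[str]       # static information
    expected_traversal_time: float     # dynamic information, in seconds
    holds_input_data: bool             # assumed stand-in for "locality"


def default_rank(ce: CE):
    # Lower is better: local data first, then shortest expected traversal time.
    return (0 if ce.holds_input_data else 1, ce.expected_traversal_time)


def choose_ce(ces: List[CE], vo: str,
              rank: Callable[[CE], tuple] = default_rank) -> Optional[CE]:
    """Keep the CEs that accept the VO, then pick the best-ranked one."""
    candidates = [ce for ce in ces if vo in ce.accepted_vos]
    if not candidates:
        return None
    return min(candidates, key=rank)


if __name__ == "__main__":
    ces = [
        CE("ce-a", frozenset({"atlas"}), 100.0, False),
        CE("ce-b", frozenset({"atlas"}), 500.0, True),
        CE("ce-c", frozenset({"biomed"}), 10.0, True),
    ]
    print(choose_ce(ces, "atlas").name)

    # A user-defined rank that pushes known "bad" machines to the end.
    bad = {"ce-b"}
    avoid_bad = lambda ce: (ce.name in bad,) + default_rank(ce)
    print(choose_ce(ces, "atlas", rank=avoid_bad).name)
```

Returning a tuple from the rank function gives the lexicographic "first this, then that" ordering without hand-tuned weights.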

9 EGEE gLite Scheduling
[Diagram. Labels: UI, Broker, BDII, Publish, Query, Update, BDII broker cache, Site (node), CE, Local scheduler, Proc]

10 Not only academic
[Plot. Axes: Execution time (s), Overhead Ratio]
Long waiting times
When EGEE was not so heavily loaded

11 Batch scheduling
Very complex policies
Maximise throughput under constraints
– Weighted fair-share: VOs, type of jobs
– Priorities
– Hardware requirements
– Advance reservations
An indication of job duration is given by the type of queue: infinite, long, medium, short, and exotic ones
[B. Bode et al. The Portable Batch Scheduler and the Maui Scheduler on Linux Clusters]

12 Classical vs Grid
(Relatively) easy:
– Throughput instead of makespan, together with a master-slave graph instead of a DAG, makes it possible for instance to define, in polynomial time, cyclic schedules that are asymptotically optimal, but not local [Y. Robert] [A. Rosenberg]
Moderately difficult: information about
– Applications
– Infrastructures
The same program on different data may run at very different speeds
The network performance is dynamic
Really difficult:
– Queues managed by local policies
– On-line decisions

13 Information and Scheduling (I)
Considerable work has been done on predicting CPU load in shared environments (desktops, clusters, desktop grids) [P.A. Dinda, R. Wolski, J. Schopf]
– The basic technique is linear time-series analysis
– Self-similarity and epochal behavior
– The usual goal is the prediction of the next value
– Applied to soft real-time scheduling on shared clusters
– Practical application in NWS
$z_t = \mu + \dfrac{\theta(B)}{\phi(B)\,(1 - B)^d}\, a_t$
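To make "linear time-series analysis" and "prediction of the next value" concrete, here is a minimal sketch using a first-order autoregressive model, AR(1), fitted by least squares on the mean-centred series. AR(1) is chosen only because it is the simplest linear model; it is not claimed to be the model used by the cited authors or by NWS. The function names `fit_ar1` and `predict_next` are hypothetical.

```python
def fit_ar1(z):
    """Estimate (mu, phi) for the model z[t] - mu = phi * (z[t-1] - mu) + noise."""
    n = len(z)
    mu = sum(z) / n
    num = sum((z[t] - mu) * (z[t - 1] - mu) for t in range(1, n))
    den = sum((z[t - 1] - mu) ** 2 for t in range(1, n))
    phi = num / den if den else 0.0  # constant series: fall back to the mean
    return mu, phi


def predict_next(z):
    """One-step-ahead prediction of the next load value."""
    mu, phi = fit_ar1(z)
    return mu + phi * (z[-1] - mu)


if __name__ == "__main__":
    load = [0.42, 0.45, 0.50, 0.48, 0.55, 0.60, 0.58, 0.61]
    print(predict_next(load))
```

Higher-order or differenced models follow the same pattern: fit the coefficients on the recent history, then apply them to the latest observations.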

14 Information and scheduling (III)
Less work on predicting the behavior of dedicated systems
Papers are on parallel systems, mostly based on time-series techniques, but at least one is based on a genetic algorithm [Downey, Foster, Wolski]
The traces are much more difficult to access
No time slice; irregular time series: the records are event-driven
Which analysis?
– Average waiting time: clear but not very useful for prediction
– Fitting a distribution: not convincing for parallel systems
– Predicting an upper bound with a confidence interval: metric of success?
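One way to read "predicting an upper bound with a confidence interval" is a distribution-free bound on a quantile of the waiting time, built from order statistics of past waits. The sketch below assumes the past waits are independent samples from one continuous distribution, which is a strong assumption for a non-stationary queue; the function names and the coverage metric are illustrative choices, not taken from the cited papers.

```python
from math import comb


def quantile_upper_bound(history, q=0.95, confidence=0.95):
    """Upper confidence bound on the q-quantile of the waiting time.

    Returns the smallest order statistic x_(k) such that
    P(Binomial(n, q) <= k - 1) >= confidence, i.e. such that x_(k) is at
    least the true q-quantile with the requested confidence.
    Returns None when the history is too short to give such a bound.
    """
    xs = sorted(history)
    n = len(xs)
    cdf = 0.0
    for k in range(1, n + 1):
        # Add P(Binomial(n, q) == k - 1) to the running CDF.
        cdf += comb(n, k - 1) * q ** (k - 1) * (1 - q) ** (n - k + 1)
        if cdf >= confidence:
            return xs[k - 1]
    return None


def coverage(bound, future_waits):
    """A possible metric of success: fraction of later waits below the bound."""
    return sum(1 for x in future_waits if x <= bound) / len(future_waits)


if __name__ == "__main__":
    past = list(range(1, 101))  # synthetic waiting times, in seconds
    b = quantile_upper_bound(past, q=0.5, confidence=0.95)
    print(b, coverage(b, [10, 40, 70, 90]))
```

Measuring coverage on later jobs gives one candidate answer to the "metric of success" question: a bound at quantile q is doing its job if roughly a fraction q of subsequent waits fall below it.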

15 Information and grid
We cannot directly log the entire state of the system
– Access rights
– Size
Currently available data
– The lifecycle of jobs going through certain brokers
– The job ranking at the same brokers
– The detailed behavior of the queues on certain sites
– Certain = LAL + possibly other mainstream
Easy to get
– Summary data about the lifecycle of all jobs
– From which it could be possible to reconstruct the detailed state and dynamics of the CE

16 What should we learn?
Learning besides time series makes sense in a grid: massive use of community programs instead of (?) sparse runs of a very long and complex digital experiment
Information as sketched before
– Beware: this is not a steady-state system. New users, new machines and new software are the expected regime for some years from now
A community-based resource will tend to display correlated activity
– Is there an invariant social graph? Is it a feature?
System algorithms, e.g. a site scheduler or the broker
– Validation?
Scheduling algorithms
– Validation?