Stochastic DAG Scheduling Using a Monte Carlo Approach. Heterogeneous Computing Workshop (at IPDPS) 2012. Extended version: Elsevier JPDC (accepted July 2013, in Press).


Stochastic DAG Scheduling Using a Monte Carlo Approach. Heterogeneous Computing Workshop (at IPDPS) 2012. Extended version: Elsevier JPDC (accepted July 2013, in Press).
Wei Zheng, Department of Computer Science, Xiamen University, Xiamen, China.
Rizos Sakellariou, School of Computer Science, The University of Manchester, UK.

Previous Presentation (9/06/13)
Research area: scheduling workflows in a heterogeneous environment with variable performance.
DAG scheduling approaches: Static (full-ahead); Just-in-time / Dynamic; Rescheduling (runtime).

This Presentation
DAG scheduling approaches: Static (full-ahead); Just-in-time / Dynamic; Rescheduling (runtime).

Introduction
The general DAG scheduling assumption is that the estimated execution time of each task is known in advance (several estimation techniques exist, e.g. averaging over several runs). Similarly, estimated data transfer times are assumed to be known in advance.
A study* has shown that there can be significant deviations in observed performance in Grids. To address these deviations, two approaches are prevalent:
- Just-in-time scheduling (high overhead)
- Runtime rescheduling (static schedule + runtime changes); hypothesis**: this may waste resources and increase makespan if the static schedule is not very good.
* A. Lastovetsky, J. Twamley, Towards a realistic performance model for networks of heterogeneous computers, in: M. Ng, A. Doncescu, L. Yang, T. Leng (Eds.), High Performance Computational Science and Engineering, IFIP International Federation for Information Processing, vol. 172, Springer, Boston, 2005, pp. 39–57.
** R. Sakellariou, H. Zhao, A low-cost rescheduling policy for efficient mapping of workflows on grid systems, Sci. Program. 12(4) (2004) 253–262.

Problem Addressed
Generate a better (i.e. lower-makespan) static schedule based on a stochastic model of the variation in the performance (execution time) of the individual tasks in the graph.

Background and Related Work
Heterogeneous Earliest Finish Time (HEFT) heuristic (discussed in the previous presentation): a list-based scheduling algorithm. Tasks are prioritized by their "bLevel" (so, in essence, tasks on the critical path get higher priority). Once a task is chosen, it is mapped to the "best" available resource.
bLevel(i) = w_i + max_{j ∈ Succ(i)} ( w_{i→j} + bLevel(j) )
where w_i is the execution cost of task i and w_{i→j} is the communication cost of edge (i, j).
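A minimal sketch of how the bLevel recursion above might be computed over a DAG. The example graph and the cost tables w and c are hypothetical, introduced only for illustration:

```python
from functools import lru_cache

# Hypothetical example DAG: task -> list of successor tasks
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
w = {1: 10.0, 2: 8.0, 3: 12.0, 4: 5.0}                    # execution cost per task
c = {(1, 2): 3.0, (1, 3): 4.0, (2, 4): 2.0, (3, 4): 1.0}  # edge communication cost

@lru_cache(maxsize=None)
def blevel(i):
    """bLevel(i) = w_i + max over successors j of (w_{i->j} + bLevel(j))."""
    if not succ[i]:
        return w[i]                       # exit task: bLevel is its own cost
    return w[i] + max(c[(i, j)] + blevel(j) for j in succ[i])

# Higher bLevel => higher priority in HEFT's list phase
priority_order = sorted(succ, key=blevel, reverse=True)
print(priority_order)  # [1, 3, 2, 4] for the example costs above
```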

Problem Description
G = (N, E): a DAG with one entry node and one exit node.
R: the set of heterogeneous resources.
ET_{i,p}: random variable for the execution time of task i on resource p.
Assumption: network bandwidth is constant.
M: makespan = finish time of the exit node.
Goal: find a schedule Ω (an assignment of N to R with no overlap, no preemption, no migration) that minimizes the makespan.
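For illustration, a small sketch (not from the slides) of how the makespan of a fixed assignment could be evaluated for one concrete set of execution times; the parameter names and the zero intra-resource communication cost are assumptions made here for simplicity:

```python
def makespan(order, pred, assign, exec_time, comm_time):
    """Finish time of the exit node for a fixed mapping assign (task -> resource),
    processing tasks in a topological/priority order."""
    finish = {}          # task -> finish time
    resource_free = {}   # resource -> time at which the resource becomes free
    for i in order:
        p = assign[i]
        # Earliest start: all predecessors done, plus a transfer delay if a
        # predecessor ran on a different resource (zero cost on the same one).
        ready = max((finish[j] + (0.0 if assign[j] == p else comm_time[(j, i)])
                     for j in pred[i]), default=0.0)
        start = max(ready, resource_free.get(p, 0.0))
        finish[i] = start + exec_time[(i, p)]
        resource_free[p] = finish[i]
    return max(finish.values())  # equals the finish time of the exit node
```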

Methodology
Assumption: analytical methods that solve the probabilistic optimization problem are too expensive, so a Monte Carlo sampling (MCS) method is used (see the sketch after this list):
1. Define a space comprising the possible input values, I_G = { ET_{i,p} : i ∈ N, p ∈ R }.
2. Take an independent random sample from the space: P_G = f_smp(I_G) = { t_{i,p} : i ∈ N, p ∈ R }.
3. Perform a deterministic computation using the sampled input and store the result: Ω_G = Static_Scheduling_HEFT(G, P_G).
4. Repeat steps 2 and 3 until some exit condition is met (e.g. a fixed number of repetitions).
5. Aggregate the stored results of the individual computations into the final result.
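A minimal sketch of the sampling loop described above. The function heft_schedule stands in for Static_Scheduling_HEFT, et_model is a hypothetical per-(task, resource) distribution table, and the Gaussian choice is illustrative:

```python
import random
from collections import Counter

def sample_times(et_model):
    """Draw one concrete execution time t_{i,p} for every (task, resource) pair.
    et_model maps (i, p) -> (mean, stddev); a Gaussian is assumed for illustration."""
    return {(i, p): max(0.0, random.gauss(mu, sigma))
            for (i, p), (mu, sigma) in et_model.items()}

def mcs_production(G, et_model, heft_schedule, repetitions=10000):
    """Run the deterministic heuristic on many sampled inputs and count how often
    each candidate schedule is produced. Schedules are assumed hashable,
    e.g. tuples of (task, resource) pairs."""
    candidates = Counter()
    for _ in range(repetitions):
        times = sample_times(et_model)        # step 2: independent random sample
        schedule = heft_schedule(G, times)    # step 3: deterministic HEFT run
        candidates[schedule] += 1             # stored results for step 5
    return candidates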

MCS-Based Scheduling
Complexity: depends on the underlying deterministic scheduling algorithm. For HEFT it is O(v + e*r) = O(e*r), where v is the number of tasks, e the number of edges and r the number of resources.
First loop: O(e*r*m)
Second loop: O(e*n*k)
Total: O(e*r*m + e*n*k)

Example

Example results: 10,000 iterations in the production phase (Gaussian distribution), 200 iterations in the selection phase; about a 20% reduction in makespan, with an absolute increase in algorithm running time of 1.2 s.
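A sketch of what the selection phase could look like, under the assumption (not spelled out on this slide) that each distinct candidate schedule from the production phase is re-evaluated on fresh samples and the one with the lowest mean makespan is kept. Here evaluate_makespan is a hypothetical wrapper around a deterministic makespan computation such as the one sketched under Problem Description, and sample_times is the helper from the production-phase sketch:

```python
def mcs_selection(G, et_model, candidates, evaluate_makespan, trials=200):
    """Score each distinct candidate schedule over fresh Monte Carlo samples
    and return the one with the lowest mean makespan."""
    best_schedule, best_mean = None, float("inf")
    for schedule in candidates:
        total = 0.0
        for _ in range(trials):
            times = sample_times(et_model)                  # fresh independent sample
            total += evaluate_makespan(G, schedule, times)  # deterministic evaluation
        mean = total / trials
        if mean < best_mean:
            best_schedule, best_mean = schedule, mean
    return best_schedule
```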

Evaluation Graphs

Threshold Calculation

Convergence (no. of repetitions)

Convergence

Makespan Performance Evaluation
Schedulers compared:
- Static: HEFT (baseline) using mean ET values
- Autopsy: static HEFT using the actual (known) ET values
- MCS (static)
- ReStatic and ReMCS: the Static and MCS schedules, respectively, with rescheduling applied at runtime
Experimental setup: graphs are produced by a random generator of a given type, and task execution times vary between runs. A "mean" ET is selected for each task, and a probability distribution is used to pick the actual execution time; the variation is bounded by the Quality of Estimation (QoE), with 0 < QoE < 1 (see the sketch below).
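One plausible way to realize the QoE-bounded variation described above. The slide only states that the variation is bounded by QoE, so the uniform relative deviation used here is an assumption for illustration:

```python
import random

def actual_execution_time(mean_et, qoe, rng=random):
    """Draw an 'actual' execution time around its mean estimate.
    Illustrative assumption: the relative deviation is uniform and bounded by QoE."""
    deviation = rng.uniform(-qoe, qoe)
    return mean_et * (1.0 + deviation)

# Hypothetical usage: a task with a mean estimate of 10 s and QoE = 0.3
# yields an actual time anywhere in [7 s, 13 s].
print(actual_execution_time(10.0, 0.3))
```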

Makespan performance evaluation

Summary
It is possible to obtain a good full-ahead static schedule that performs well under prediction inaccuracy, without too much overhead. MCS, which provides a more robust procedure for selecting the initial schedule, generally results in better performance when rescheduling is applied.