
1
Scheduling Parameter Sweep Applications
Sathish Vadhiyar
Sources/Credits: papers on a survey of heuristics and the APST papers. Figures taken from the papers.

2
Background
- Tasks of a job do not have dependencies
- A machine executes a single task at a time (space-shared machines)
- The collection of tasks and machines is known a priori
- Matching of tasks to machines is done offline
- Estimates of the execution time of each task on each machine are known

3
Scheduling Problem
- ETC: expected time to compute matrix; ETC(i, j) is the estimated execution time of task i on machine j
- Notation: mat(j) is the machine availability time of machine j, i.e., the earliest time at which j has completed all tasks previously assigned to it
- Completion time: ct(i, j) = mat(j) + ETC(i, j)
- Objective: the makespan is max ct(i, j); find the heuristic with minimum makespan
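The completion-time bookkeeping above can be sketched in a few lines (a hypothetical Python helper; `etc` and `mat` mirror the ETC matrix and machine availability times defined on this slide):

```python
def completion_time(etc, mat, i, j):
    """ct(i, j) = mat(j) + ETC(i, j)."""
    return mat[j] + etc[i][j]

def assign(etc, mat, i, j):
    """Assign task i to machine j: j becomes available only after finishing i."""
    mat[j] = completion_time(etc, mat, i, j)

def makespan(mat):
    """Makespan: the time at which the last machine finishes its tasks."""
    return max(mat)
```

Every heuristic that follows differs only in the order in which tasks are considered and in how the pair (i, j) is chosen.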

4
Scheduling Heuristics
1. Opportunistic Load Balancing (OLB)
- Assign the next task (arbitrary order) to the next available machine
- Regardless of the task's ETC on that machine
2. User Directed Allocation (UDA)
- Assign the next task (arbitrary order) to the machine with the lowest ETC
- Regardless of machine availability
3. Fast Greedy
- Assign each task (arbitrary order) to the machine with minimum ct
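Of the three, only Fast Greedy uses both the ETC values and machine availability. A minimal sketch, assuming tasks arrive in index order:

```python
def fast_greedy(etc, num_machines):
    """Fast greedy: assign each task, in arbitrary (here, index) order,
    to the machine with the minimum completion time mat(j) + ETC(i, j)."""
    mat = [0.0] * num_machines
    mapping = []
    for row in etc:
        j = min(range(num_machines), key=lambda m: mat[m] + row[m])
        mat[j] += row[j]          # machine j is busy until it finishes this task
        mapping.append(j)
    return mapping, max(mat)      # assignment vector and resulting makespan
```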

5
Scheduling Heuristics
4. Min-Min
- Start with a list of unmapped tasks, U
- Determine the set of minimum completion times for U
- Choose the task that has the minimum of the minimum completion times and assign it to the machine that provides that minimum completion time
- Remove the newly mapped task from U and repeat
- Theme: map as many tasks as possible to their first choice of machine
- Since short jobs are mapped first, the percentage of tasks allocated to their first choice is high
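A sketch of Min-Min (assuming `etc` is the ETC matrix as a list of rows, one per task; ties are broken by whichever task is examined first):

```python
def min_min(etc, num_machines):
    """Min-Min: repeatedly pick, among unmapped tasks, the task whose
    minimum completion time is smallest, and map it to the machine
    achieving that minimum."""
    mat = [0.0] * num_machines
    unmapped = set(range(len(etc)))
    mapping = {}
    while unmapped:
        best = None  # (min completion time, task, machine)
        for i in unmapped:
            j = min(range(num_machines), key=lambda m: mat[m] + etc[i][m])
            ct = mat[j] + etc[i][j]
            if best is None or ct < best[0]:
                best = (ct, i, j)
        ct, i, j = best
        mat[j] = ct               # machine j now finishes at ct
        mapping[i] = j
        unmapped.remove(i)
    return mapping, max(mat)
```

Max-Min on the next slide is identical except that the task with the *maximum* of the minimum completion times is chosen at each step.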

6
Scheduling Heuristics
5. Max-Min
- Start with a list of unmapped tasks, U
- Determine the set of minimum completion times for U
- Choose the task that has the maximum of the minimum completion times and assign it to the machine that provides its minimum completion time
- Remove the newly mapped task from U and repeat
- Avoids starvation of long tasks
- Long tasks execute concurrently with short tasks
- Better machine utilization
6. Greedy
- Combination of Max-Min and Min-Min
- Evaluates both and picks the better solution

7
Scheduling Heuristics: Genetic Algorithm

8
GA
- Operates on 200 chromosomes; a chromosome represents a mapping of tasks to machines, a vector of size t
- Initial population: 200 chromosomes, randomly generated with one Min-Min seed
- Evaluation: the initial population is evaluated based on fitness value (makespan)
- Selection: roulette wheel, which probabilistically generates a new population, with better mappings, from the previous population
- Elitism: guarantees that the best (fittest) solution is carried forward

9
GA: Roulette Wheel Scheme
[Figure residue: table of chromosomes 1-4 with their scores and selection probabilities]
- Select a random number, r, between 0 and 1
- Progressively add the selection probabilities until the sum is greater than r
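The wheel can be sketched as follows; the slides do not give the exact fitness scaling, so the inverse of the makespan is used here as a hypothetical weight (lower makespan, higher selection probability):

```python
import random

def roulette_select(population, makespans):
    """Roulette-wheel selection: pick a chromosome with probability
    proportional to its weight (here, 1/makespan, an assumption)."""
    weights = [1.0 / m for m in makespans]
    total = sum(weights)
    r = random.random()           # random number in [0, 1)
    cum = 0.0
    for chrom, w in zip(population, weights):
        cum += w / total          # progressively add selection probabilities
        if cum > r:
            return chrom
    return population[-1]         # guard against floating-point round-off
```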

10
GA
- Crossover: choose pairs of chromosomes; for every pair, choose a random point and exchange machine assignments from that point to the end of the chromosome
- Mutation: for every chromosome, randomly select a task and randomly reassign it to a new machine
- Evaluation
- Stopping criterion: either 1000 iterations, or no change in the elite chromosome for 150 iterations
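Both operators act on a chromosome stored as a machine-assignment vector of length t; a minimal sketch:

```python
import random

def crossover(a, b):
    """Single-point crossover: pick a random point and exchange the two
    chromosomes' machine assignments from that point to the end."""
    p = random.randrange(1, len(a))
    a[p:], b[p:] = b[p:], a[p:]   # slice swap; RHS is evaluated first

def mutate(chrom, num_machines):
    """Mutation: randomly select a task and reassign it to a random machine."""
    i = random.randrange(len(chrom))
    chrom[i] = random.randrange(num_machines)
```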

11
Simulated Annealing
- Poorer solutions are accepted with a probability that depends on the temperature value
- Start with an initial mapping; the initial temperature is the initial makespan
- Each iteration:
  - Generate a new mapping by mutating the previous mapping; obtain the new makespan
  - If the new makespan is better, accept
  - If the new makespan is worse, accept if a random number z in [0, 1] is greater than y, where y depends on the temperature
  - Reduce the temperature by 10%
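The slide does not show the formula for y, so this sketch substitutes the common Metropolis criterion (accept a worse mapping with probability exp(-delta/temperature)); treat the exact acceptance rule as an assumption:

```python
import math, random

def accept(new_makespan, old_makespan, temperature):
    """Metropolis-style acceptance (an assumption; the slide's formula for y
    is not shown). Better mappings are always accepted; worse ones with a
    probability that shrinks as the temperature drops."""
    if new_makespan <= old_makespan:
        return True
    delta = new_makespan - old_makespan
    return random.random() < math.exp(-delta / temperature)
```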

12
Genetic Simulated Annealing (GSA)
- Almost the same as GA; during selection, SA is employed to form the new population
- Initial system temperature: the average makespan of the population
- Each iteration of GA: post-mutation or post-crossover, the new chromosome is compared with the previous chromosome; if (new makespan) < (old makespan + temperature), the new chromosome becomes part of the population
- Temperature decreased by 10% each iteration

13
Tabu Search
- Keeps track of regions of the solution space that have already been searched
- Starts with a random mapping
- Generate all possible pairs of tasks (i, j), i in (0, t-1) and j in (i+1, t)
- Exchange the machine assignments of i and j (a short hop) and evaluate the makespan
- If the makespan is better (a successful short hop), the search restarts from i = 0; otherwise it continues from the previous (i, j)
- Continue until 1200 successful short hops or all pairs have been evaluated
- Add the final mapping to the tabu list, which keeps track of the solution space already searched
- Generate a new random mapping that differs from the searched solution space in at least half the machine assignments (a long hop)
- The search continues until a fixed number of short and long hops
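The short-hop phase can be sketched as follows (the 1200-hop cap, the tabu list, and long hops are omitted for brevity; `mapping[i]` is the machine assigned to task i):

```python
def makespan_of(etc, mapping, num_machines):
    """Makespan of a full assignment vector."""
    mat = [0.0] * num_machines
    for i, j in enumerate(mapping):
        mat[j] += etc[i][j]
    return max(mat)

def short_hops(etc, mapping, num_machines):
    """Scan task pairs (i, j); swap their machine assignments, keep the
    swap if the makespan improves (a successful short hop) and restart
    the scan from i = 0, otherwise undo and continue."""
    t = len(mapping)
    best = makespan_of(etc, mapping, num_machines)
    i = 0
    while i < t - 1:
        improved = False
        for j in range(i + 1, t):
            mapping[i], mapping[j] = mapping[j], mapping[i]
            m = makespan_of(etc, mapping, num_machines)
            if m < best:
                best, improved = m, True
                break
            mapping[i], mapping[j] = mapping[j], mapping[i]  # undo the swap
        i = 0 if improved else i + 1
    return best
```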

14
A Comparison Study of Static Mapping Heuristics for a Class of Meta-Tasks on Heterogeneous Computing Systems, Tracy D. Braun et al.

15
Simulation Results
- The ETC matrix is randomly generated using a uniform distribution
- ETC matrices may be:
  - Consistent: machine i is faster than machine j for all tasks; each row is sorted across all columns
  - Inconsistent: not sorted
  - Semi-consistent: sorted across even columns
- t = 512, m = 16
- Heuristic running times: OLB, UDA, Fast Greedy and Greedy took a few seconds; SA and tabu about 30 seconds; GA and GSA about 60 seconds; A* about 20 minutes

16
Task Execution Results
- Consistent cases: UDA performs worst because all tasks are assigned to the same machine
- Inconsistent cases:
  - UDA improves because the "best" machines are distributed, avoiding load imbalance
  - Fast Greedy and Min-Min improve upon UDA since they consider MCTs, and MCTs are also evenly distributed
  - Tabu search gave poor performance since there are more successful short hops than long hops
- In both cases, Min-Min performed better than Max-Min due to the nature of the task mix
- In both cases, the GAs performed the best

17
AppLeS Parameter Sweep Template (APST)

18
APST
- For efficient deployment of parameter sweep applications on the Grid
- Distinct experiments share large input files and produce large output files
- Shared data files must be co-located with experiments
- PSA: a set of independent tasks
  - Input: a set of files; a single file can be input to more than one task
  - Output: each task produces exactly one output
- The number of computational tasks in a PSA is orders of magnitude greater than the number of processors

19
Scheduling Heuristics
- Self-scheduled workqueue: an adaptive scheduling algorithm that greedily assigns tasks to nodes as soon as the nodes become available
- Suitable when:
  - There are no large input files
  - Large clusters are interconnected with high-speed networks
  - Computation-to-data-movement time ratios are high

20
Algorithm

21
Gantt Chart

22
Step 4

23
Heuristics
- Min-min: f is the minimum of CT(i, j); best is the minimum
- Max-min: f is the minimum of CT(i, j); best is the maximum
- Sufferage: f is the ratio between the second minimum and the minimum; best is the maximum. A host should be given to the task that would "suffer" the most if not given that host
- XSufferage: site-level; uses cluster-level MCTs and cluster-level sufferage. Avoids a deficiency of Sufferage that arises when a task's input file is in a cluster and two hosts in the cluster have identical performance
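A sketch of Sufferage, with one hedge: the slide describes f as the ratio of the second-minimum to the minimum completion time, while the HCW 2000 paper defines the sufferage value as their difference; the difference form is used here.

```python
def sufferage(etc, num_machines):
    """Sufferage: at each step, assign the task whose sufferage value
    (second-best minus best completion time) is largest, on the machine
    giving its best completion time. Assumes num_machines > 1."""
    mat = [0.0] * num_machines
    unmapped = set(range(len(etc)))
    mapping = {}
    while unmapped:
        best = None  # (sufferage value, task, best machine, best ct)
        for i in unmapped:
            cts = sorted((mat[j] + etc[i][j], j) for j in range(num_machines))
            suff = cts[1][0] - cts[0][0]
            if best is None or suff > best[0]:
                best = (suff, i, cts[0][1], cts[0][0])
        _, i, j, ct = best
        mat[j] = ct
        mapping[i] = j
        unmapped.remove(i)
    return mapping, max(mat)
```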

24
Sufferage and XSufferage Host k in cluster j

25
Sufferage and XSufferage

26
Sample APST Setup

27
Impact of Quality of Information on Scheduling
- Random noise in [-p, +p], with p ranging from 0% to 100%, is added to the performance estimates

28
Results

29
Results

30
Robust Static Allocation of Resources for Independent Tasks… (Sugavanam et al., JPDC 2007)
- ETC numbers are just estimates and are inaccurate
- Tasks must be mapped so as to maximize the robustness of the makespan against estimation errors
- Robustness: the degradation in makespan stays within acceptable limits when the estimates are perturbed
- The goal is to maximize the collective allowable estimation error without the makespan exceeding its constraint

31
Problem Formulation
- C^est: the vector of estimated execution times on a machine; C: the vector of actual times
- The performance feature Ø that determines the robustness of the makespan is the set of machine finish times, {F_j, 1 ≤ j ≤ M}

32
Robustness Radius
- Within the robustness radius, the finish time of machine j is at most the makespan constraint, tau
- The robustness radius can be interpreted as the perpendicular distance from C^est to the hyperplane tau - F_j(C) = 0
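The omitted formula can be reconstructed from the hyperplane-distance interpretation, assuming F_j(C) is the sum of the execution times of the n_j tasks mapped to machine j (so the gradient of F_j is a vector of n_j ones):

```latex
r(F_j, C^{est}) = \frac{\tau - F_j(C^{est})}{\sqrt{n_j}}
```

where n_j is the number of tasks mapped to machine j; this is the perpendicular distance from C^{est} to the hyperplane \tau - F_j(C) = 0.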

33
Robustness Metric
- The robustness metric for a mapping is the minimum of the robustness radii over all machines
- If the Euclidean distance between the actual and estimated times is no larger than this metric, the makespan will be at most the constraint, tau
- The larger the robustness metric, the better the mapping
- The problem is thus to maximize the metric subject to the makespan staying within the time constraint

34
Heuristics – Max-Max

35
Greedy Iterative Maximization (GIM)

36

37
Sum Iterative Maximization (SIM)
- "Robustness improvement": the change in the sum of the robustness radii of the machines after a task reassignment or swap

38
Sum Iterative Maximization (SIM)

39
Genitor

40
Genitor

41
Memetic Algorithm
- Combines global search using a genetic algorithm with local search using hill climbing

42
Memetic

43
HereBoy Evolutionary Algorithm Combines GA and SA

44
HereBoy Evolutionary Algorithm
- User-defined maximum mutation rate; the mutation rate is reduced as the current robustness approaches the upper bound
- Poorer solutions are accepted with a probability capped at a user-defined maximum

45
UB Calculation for HereBoy
- Assumes a homogeneous MET system: the execution time of each task on all machines equals its minimum execution time on the original set of machines
- The tasks are arranged in ascending order
- N = T/M; the first N tasks are stored in a set S

46
GIM and SIM had low makespan and high robustness

47
References
- Henri Casanova, Graziano Obertelli, Francine Berman and Rich Wolski, "The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid", Proceedings of the Supercomputing Conference (SC'2000).
- Henri Casanova, Arnaud Legrand, Dmitrii Zagorodnov and Francine Berman, "Heuristics for Scheduling Parameter Sweep Applications in Grid Environments", Proceedings of the 9th Heterogeneous Computing Workshop (HCW'2000).
- Tracy D. Braun, Howard Jay Siegel, Noah Beck, Ladislau L. Bölöni, Albert I. Reuther, Mitchell D. Theys, Bin Yao, Richard F. Freund, Muthucumaru Maheswaran, James P. Robertson, Debra Hensgen, "A Comparison Study of Static Mapping Heuristics for a Class of Meta-Tasks on Heterogeneous Computing Systems", Eighth Heterogeneous Computing Workshop, April 1999, San Juan, Puerto Rico.
- Prasanna Sugavanam, H.J. Siegel, Anthony A. Maciejewski, Mohana Oltikar, Ashish Mehta, Ron Pichel, Aaron Horiuchi, Vladimir Shestak, Mohammad Al-Otaibi, Yogish Krishnamurthy, et al., "Robust static allocation of resources for independent tasks under makespan and dollar cost constraints", Journal of Parallel and Distributed Computing, Volume 67, Issue 4, April 2007.

48
Simulation Background
- ETC(i, j) = B(i) * x_r(i, j)
- B(i) = x_b(i), with x_b(i) ∈ [1, Φ_b] and x_r(i, j) ∈ [1, Φ_r]
- Φ_r and Φ_b control machine and task heterogeneity
- The ETC matrix is randomly generated using a uniform distribution
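The generation scheme can be sketched directly (a hypothetical helper; `phi_b` and `phi_r` stand for the heterogeneity parameters Φ_b and Φ_r):

```python
import random

def generate_etc(num_tasks, num_machines, phi_b, phi_r):
    """ETC(i, j) = B(i) * x_r(i, j), with B(i) = x_b(i) drawn uniformly
    from [1, phi_b] (task heterogeneity) and x_r(i, j) drawn uniformly
    from [1, phi_r] (machine heterogeneity)."""
    etc = []
    for _ in range(num_tasks):
        b = random.uniform(1, phi_b)                       # task base time B(i)
        etc.append([b * random.uniform(1, phi_r) for _ in range(num_machines)])
    return etc
```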

49
Models used in APST

50
Scheduling
- Goal: minimize the application's makespan, the time between when the first input file is submitted and when the last output file is obtained
- Sched(): takes into account resource performance estimates to generate a plan for assigning file transfers to links and tasks to hosts
- Sched() is called repeatedly at various points in time (scheduling events) to make scheduling more adaptive
- At each scheduling event, sched() knows:
  - The Grid topology
  - The number and locations of copies of input files
  - The list of computations and file transfers currently underway or already completed

51
APST Architecture
- NWS is also used for forecasting ETC values
