COST IC804 – IC805 Joint meeting, February 7-8 2013 Jorge G. Barbosa, Altino M. Sampaio, Hamid Harabnejad Universidade do Porto, Faculdade de Engenharia,

1 Experiments on cost/power and failure aware scheduling for clouds and grids
Jorge G. Barbosa, Altino M. Sampaio, Hamid Harabnejad
Universidade do Porto, Faculdade de Engenharia, LIACC, Porto, Portugal, jbarbosa@fe.up.pt
COST IC804 – IC805 Joint meeting, February 7-8 2013

2 Outline
- Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
- A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters
COST IC804 – IC805 Joint meeting, Tenerife, February 7-8 2013

3 Outline
- Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
- A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters

4 Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
Cloud computing paradigm
- Dynamic provisioning of computing services.
- Employs Virtual Machine (VM) technologies for consolidation and environment isolation purposes.
- Node failure can occur due to hardware or software problems.
Image source: http://www.commputation.kit.edu/92.php

5 Characteristics
Dependability of the infrastructure
- Distributed systems continue to grow in scale and in complexity
- Failures become the norm, which can lead to violation of the negotiated SLAs
- Mean Time Between Failures (MTBF) would be 1.25h on a petaflop system (1)
Energy consumption
- The main part of energy consumption is determined by the CPU
- Energy consumption dominates the operational costs
[Figure: Tasks 1..n run in VMs 1..n, hosted through a VMM on PM 1..PM m. PM – Physical Machine]
(1) S. Fu, "Failure-aware resource management for high-availability computing clusters with distributed virtual machines," Journal of Parallel and Distributed Computing, vol. 70, April 2010, pp. 384-393, doi: 10.1016/j.jpdc.2010.01.002.

6 Related Work
Proposed architecture for reconfigurable distributed VM (1)
- Dynamic allocation of VMs, considering PMs' reliability
- Based on a failure predictor tool with 76.5% accuracy
(1) Optimistic Best-Fit (OBFIT) algorithm
- Selects the PM with minimum weighted available capacity and reliability.
(2) Pessimistic Best-Fit (PBFIT) algorithm
- Also selects unreliable PMs in order to increase the job completion rate.
- Selects the unreliable PM p with capacity C_p such that C_avg + C_p results in the minimum required capacity, where C_avg is the average capacity from reliable PMs.

7 Approach
The goal: construct power- and failure-aware computing environments, in order to maximize the rate of jobs completed by their deadline
- It is a best-effort approach, not an SLA-based approach;
- Virtual-to-physical resource mapping decisions must consider both the power-efficiency and reliability levels of compute nodes;
- Dynamic update of virtual-to-physical configurations (CPU usage and migration).

8 Approach
Multi-objective scheduling algorithms are addressed in three ways:
1- Finding the Pareto optimal solutions, and letting the user select the best solution.
2- Combining the two functions in a single objective function.
3- Bicriteria scheduling, in which the user specifies a limitation for one criterion (power or budget constraints), and the algorithm tries to optimize the other criterion under this constraint.

9 Approach
Leverage virtualization tools
- Xen credit scheduler
  - Dynamically update cap parameter
  - But enforcing work-conserving
- Stop & copy migration
  - Faster VM migrations, preferable for proactive failure management
[Figures: CPU% versus power consumption; a VM over time on PM1, PM2, and PM3 with failure and stop & copy migration events; increasing failure prediction accuracy]

10 System Overview
Cloud architecture
- Private cloud
- Homogeneous PMs
- Cluster coordinator manages users' jobs
- VMs are created and destroyed dynamically
Users' jobs
- A job is a set of independent tasks
- A task runs in a single VM, and its CPU-intensive workload is known
- The number of tasks per job and the task deadlines are defined by the user
[Figure: private cloud management architecture]

11 Power Model
- Linear power model: P = p1 + p2 · CPU%
- Power efficiency of P
- Completion rate of users' jobs
- Working efficiency: measures the quantity of useful work done (i.e. completed users' jobs) by the consumed power.
[Figure: example of power efficiency curve (p1 = 175 W, p2 = 75 W)]
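
The linear power model on this slide can be sketched in a few lines of Python. The defaults p1 = 175 W and p2 = 75 W come from the example curve; treating CPU% as a fraction in [0, 1] is an assumption. The slide's formula for power efficiency did not survive in the transcript, so `power_efficiency` below uses an assumed definition (CPU share per watt, normalized to 1.0 at full load) purely for illustration.

```python
def power_watts(cpu_fraction, p1=175.0, p2=75.0):
    """Linear power model P = p1 + p2 * CPU%, with CPU% given as a fraction in [0, 1]."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be in [0, 1]")
    return p1 + p2 * cpu_fraction


def power_efficiency(cpu_fraction, p1=175.0, p2=75.0):
    """ASSUMED definition (not taken from the slides): CPU share delivered per
    watt, scaled so that a fully loaded PM has efficiency 1.0."""
    return cpu_fraction / power_watts(cpu_fraction, p1, p2) * (p1 + p2)


if __name__ == "__main__":
    for load in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"CPU {load:4.0%}: {power_watts(load):6.1f} W, "
              f"efficiency {power_efficiency(load):.3f}")
```

Under this assumed definition, efficiency grows with load because the constant term p1 is spread over more useful work, which is the usual motivation for consolidating VMs on fewer PMs.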

12 Proposed algorithms
Minimum Time Task Execution (MTTE) algorithm
Selects a PM if:
- It guarantees the maximum processing power required by the VM (task);
- It has higher reliability;
- And it increases CPU power efficiency.
[Equations: PM i capacity constraints; slack time to accomplish task t]

13 Proposed algorithms
Relaxed Time Task Execution (RTTE) algorithm
- Unlike MTTE, the RTTE algorithm always reserves for the VM the minimum amount of resources necessary to accomplish the task within its deadline
[Figure: host CPU from 0% to 100%, with the VM cap set in the Xen credit scheduler]
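
The RTTE idea of reserving only the minimum resources needed to meet the deadline can be illustrated with a short sketch. This is not the authors' implementation: the function names, the MFLOP workload unit, and the 800 MFLOPS default (borrowed from the simulation setup) are assumptions made for the example.

```python
import math


def minimum_cap(remaining_mflop, seconds_to_deadline, pm_mflops=800.0):
    """Smallest CPU fraction of one PM that finishes the remaining work by the
    deadline. Returns None when even 100% of the PM would be too slow."""
    if seconds_to_deadline <= 0:
        return None
    required = remaining_mflop / (seconds_to_deadline * pm_mflops)
    return required if required <= 1.0 else None


def to_xen_cap(cap_fraction):
    """Convert a CPU fraction to an integer percentage, rounding up so the
    deadline is not missed. The floor of 1 avoids 0, which Xen's credit
    scheduler interprets as "no cap"."""
    return max(1, math.ceil(cap_fraction * 100))


if __name__ == "__main__":
    cap = minimum_cap(remaining_mflop=40_000.0, seconds_to_deadline=100.0)
    print(cap, to_xen_cap(cap))
```

A task with 40 000 MFLOP left and 100 seconds to its deadline needs half of an 800 MFLOPS core, so the VM would be capped at 50% and the remaining capacity stays available to other VMs.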

14 Performance Analysis
Simulation setup
- 50 PMs, each modeled with one CPU core with performance equivalent to 800 MFLOPS;
- VM stop & copy migration overhead takes 12 seconds;
- 30 synthetic jobs, each consisting of 5 CPU-intensive workload tasks;
- Failed PMs stay unavailable for 60 seconds;
- The predicted occurrence time of a failure precedes the actual occurrence time;
- Failure instants, job arrival times, and task workload sizes follow a uniform distribution.
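
A synthetic workload of this shape could be generated as in the sketch below. The job and task counts follow the slide; the arrival and workload ranges are hypothetical placeholders, because the slides do not state the bounds of the uniform distributions.

```python
import random


def make_jobs(num_jobs=30, tasks_per_job=5,
              arrival_range=(0.0, 600.0),         # seconds; hypothetical bounds
              workload_range=(8_000.0, 80_000.0),  # MFLOP; hypothetical bounds
              seed=0):
    """Build jobs whose arrival times and task workloads are drawn uniformly."""
    rng = random.Random(seed)  # private generator keeps runs reproducible
    jobs = []
    for job_id in range(num_jobs):
        jobs.append({
            "id": job_id,
            "arrival_s": rng.uniform(*arrival_range),
            "task_mflop": [rng.uniform(*workload_range)
                           for _ in range(tasks_per_job)],
        })
    # The coordinator would see jobs in arrival order.
    return sorted(jobs, key=lambda job: job["arrival_s"])


if __name__ == "__main__":
    for job in make_jobs()[:3]:
        print(job["id"], round(job["arrival_s"], 1),
              [round(w) for w in job["task_mflop"]])
```

Fixing the seed makes a run repeatable, which helps when comparing several allocation algorithms on the same workload.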

15 Performance Analysis
Implementation considerations
- Stabilization to avoid multiple migrations
- Concurrence among cluster coordinators
Algorithms compared to ours
- Common Best-Fit (CBFIT): selects the PM with the maximum power efficiency and does not consider resource reliability
- Optimistic Best-Fit (OBFIT)
- Pessimistic Best-Fit (PBFIT)

16 Performance Analysis
Migrations occurring due to proactive failure management only:
- The failure predictor tool has 76.5% accuracy; the RTTE algorithm presents the best results;
- Working efficiency, as well as the job completion rate, decreases with failure prediction inaccuracy.

17 Performance Analysis
Migrations occurring due to proactive failure management and power efficiency:
- Sliding window of 36 seconds, with a threshold of 65% (a migration starts if CPU usage is below 65%);
- RTTE returns the best results for 76.5% failure prediction accuracy;
- Compared to the earlier results, the rate of completed jobs diminishes, since the number of VM migrations increases.
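
The sliding-window trigger can be sketched as follows. The 36-second window and 65% threshold are from the slide; the one-sample-per-second rate, the use of the window mean, and the class name are assumptions, since the slides do not say how CPU usage is aggregated over the window.

```python
from collections import deque


class UtilizationWindow:
    """Tracks recent CPU usage of one PM and flags it for consolidation."""

    def __init__(self, window_seconds=36, sample_period=1, threshold=0.65):
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=window_seconds // sample_period)
        self.threshold = threshold

    def add(self, cpu_fraction):
        self.samples.append(cpu_fraction)

    def should_migrate(self):
        # Wait for a full window so a short dip does not trigger a migration.
        if len(self.samples) < self.samples.maxlen:
            return False
        mean_usage = sum(self.samples) / len(self.samples)
        return mean_usage < self.threshold


if __name__ == "__main__":
    window = UtilizationWindow()
    for _ in range(36):
        window.add(0.5)
    print(window.should_migrate())
```

Requiring a full window before deciding is one way to get the stabilization that avoids repeated migrations of the same VM.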

18 Performance Analysis
Number of migrations occurring due to failure management and power efficiency
- RTTE and MTTE have a stable number of migrations and respawns as failure prediction accuracy varies
Migrations occurring due to proactive failure management only (75% accuracy)
- RTTE and MTTE return the best working efficiency as the number of failures in the cloud infrastructure rises

19 Conclusions (1)
Concluding remarks:
- Power- and failure-aware dynamic allocations improve the job completion rate;
- Dynamically adjusting the cap parameter of the Xen credit scheduler proves capable of obtaining a better job completion rate (RTTE);
- An excessive number of VM migrations to optimize power efficiency reduces the job completion rate.
Future directions:
- Dynamic allocation considering workload characteristics;
- Data locality;
- Scalability;
- Compare/integrate DVFS feature;
- Improve PM consolidation (why 65% threshold?);
- Heterogeneous CPUs.

20 Outline
- Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
- A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters

21 A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters
A job is represented by a workflow
- A workflow is a Directed Acyclic Graph (DAG)
- A node is an individual task
- An edge represents the inter-job dependency
Workflow scheduling
- Mapping tasks to resources
- Main goal is to have a lower finish time of the exit task
[Figure: workflow tasks mapped to CPU1, CPU2, and CPU3]

22 Introduction
Target platform:
- Utility Grids that are maintained and managed by a service provider.
- Based on user requirements, the provider finds a schedule that meets the user's constraints.
In utility Grids, QoS attributes other than execution time, such as economic cost or deadline, may be considered. It is a multi-objective problem.
Multi-objective scheduling algorithms are addressed in three ways:
1- Finding the Pareto optimal solutions, and letting the user select the best solution;
2- Combining the two functions in a single objective function;
3- Bicriteria scheduling, in which the user specifies a limitation for one criterion (power or budget constraints), and the algorithm tries to optimize the other criterion under this constraint.

23 Proposed Algorithm
Heterogeneous Budget Constraint Scheduling Algorithm (HBCS)
HBCS has two phases:
- Task selection phase: we use the upward rank to assign priorities to the tasks in the DAG
- Processor selection phase: we combine both objective functions (cost and time) in a single function; the processor that maximizes that function for the current task is selected.
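
For the task selection phase, the upward rank as commonly defined for HEFT is rank_u(t) = w(t) + max over successors s of (c(t, s) + rank_u(s)), where w is the average execution cost of a task and c the average communication cost of an edge. The sketch below computes it for a small four-task DAG. The dictionary-based graph representation and the example costs are illustrative assumptions, and the slides do not confirm that HBCS uses exactly this variant.

```python
from functools import lru_cache


def upward_ranks(avg_cost, successors, comm_cost):
    """Return {task: rank_u}. Exit tasks get just their own average cost."""

    @lru_cache(maxsize=None)
    def rank(task):
        tail = max(
            (comm_cost.get((task, s), 0.0) + rank(s)
             for s in successors.get(task, ())),
            default=0.0,  # exit task: no successors
        )
        return avg_cost[task] + tail

    return {task: rank(task) for task in avg_cost}


if __name__ == "__main__":
    # Hypothetical diamond-shaped workflow: A -> {B, C} -> D
    avg_cost = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
    successors = {"A": ("B", "C"), "B": ("D",), "C": ("D",)}
    comm_cost = {("A", "B"): 1.0, ("A", "C"): 4.0,
                 ("B", "D"): 2.0, ("C", "D"): 1.0}
    ranks = upward_ranks(avg_cost, successors, comm_cost)
    # Tasks are scheduled in decreasing order of upward rank.
    print(sorted(ranks, key=ranks.get, reverse=True), ranks)
```

Sorting by decreasing upward rank always yields a valid topological order, so every task is considered after all of its predecessors.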

24 Proposed Algorithm
Heterogeneous Budget Constraint Scheduling Algorithm (HBCS)
[Equation: objective function, 0 <= k <= 1]

25 Experimental Result
Workflow structure:
- Synthetic DAG generation (www.loria.fr/~suter/dags.html)
- Applications have between 30 and 50 tasks, generated randomly.
- Total number of DAGs in our simulation is 1000.
Workflow budget:
- BUDGET = C_cheapest + k (C_HEFT – C_cheapest), with 0 <= k <= 1
- Lowest budget (k=0): cheapest scheduling, higher makespan
- Highest budget (k=1): shortest makespan (HEFT scheduling)
[Equation: performance metric]
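
The budget formula interpolates linearly between the cost of the cheapest schedule and the cost of the HEFT schedule. A minimal sketch, with a hypothetical function name and example costs chosen only for illustration:

```python
def workflow_budget(cost_cheapest, cost_heft, k):
    """BUDGET = C_cheapest + k * (C_HEFT - C_cheapest), with 0 <= k <= 1."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be in [0, 1]")
    return cost_cheapest + k * (cost_heft - cost_cheapest)


if __name__ == "__main__":
    # Hypothetical costs: cheapest schedule 100, HEFT schedule 250.
    for k in (0.0, 0.5, 1.0):
        print(k, workflow_budget(100.0, 250.0, k))
```

With these example costs, k = 0 gives a budget of 100 (only the cheapest schedule fits), k = 1 gives 250 (enough for the HEFT schedule), and k = 0.5 gives 175.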

26 Experimental Result
Simulation platform:
- We use SIMGRID, which allows a realistic description of the infrastructure parameters.
- We consider a bandwidth sharing policy; only one processor can send data over one network link at a time.
- We consider nodes of clusters from the GRID'5000 platform.

27 Results
[Figures: results for the Sophia, Rennes, and Grenoble clusters; HBCS time complexity]

28 Conclusions (2)
Concluding remarks:
- We considered a realistic model of the infrastructure;
- The HBCS algorithm achieves better performance (makespan and time complexity), in particular for lower budget values.
Future directions:
- Compare other combinations of cost and time factors in the objective function;
- Data locality;
- Multiple DAG scheduling.
