COST IC804 – IC805 Joint meeting, February 7-8 2013 Jorge G. Barbosa, Altino M. Sampaio, Hamid Harabnejad Universidade do Porto, Faculdade de Engenharia,


COST IC804 – IC805 Joint meeting, February 7-8, 2013
Jorge G. Barbosa, Altino M. Sampaio, Hamid Harabnejad
Universidade do Porto, Faculdade de Engenharia, LIACC, Porto, Portugal
Experiments on cost/power and failure aware scheduling for clouds and grids

Outline
- Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
- A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters
COST IC804 – IC805 Joint meeting, Tenerife, February


Dynamic Power- and Failure-aware Cloud Resources Allocation for Sets of Independent Tasks
Cloud computing paradigm
- Dynamic provisioning of computing services.
- Employs Virtual Machine (VM) technologies for consolidation and environment isolation purposes.
- Node failure can occur due to hardware or software problems.

Characteristics
- Dependability of the infrastructure
  - Distributed systems continue to grow in scale and complexity.
  - Failures become the norm, which can lead to violations of the negotiated SLAs.
  - Mean Time Between Failures (MTBF) would be 1.25 h on a petaflop system (1).
- Energy consumption
  - The main part of energy consumption is determined by the CPU.
  - Energy consumption dominates the operational costs.
[Figure: Task 1 ... Task n run in VM 1 ... VM n, hosted through a VMM on PM 1 ... PM m. PM – Physical Machine]
(1) S. Fu, "Failure-aware resource management for high-availability computing clusters with distributed virtual machines," Journal of Parallel and Distributed Computing, vol. 70, April 2010, pp , doi: /j.jpdc

Related Work
- Dynamic allocation of VMs, considering the PMs' reliability
- Based on a failure predictor tool with 76.5% accuracy
(1) Optimistic Best-Fit (OBFIT) algorithm
  - Selects the PM with minimum weighted available capacity and reliability.
(2) Pessimistic Best-Fit (PBFIT) algorithm
  - Also selects unreliable PMs in order to increase the job completion rate.
  - Selects the unreliable PM p with capacity C_p such that C_avg + C_p results in the minimum required capacity, where C_avg is the average capacity of the reliable PMs.
[Figure: Proposed architecture for reconfigurable distributed VM (1)]

Approach
- The goal: construct power- and failure-aware computing environments, in order to maximize the rate of jobs completed by their deadline.
- It is a best-effort approach, not an SLA-based approach.
- Virtual-to-physical resource mapping decisions must consider both the power efficiency and the reliability levels of the compute nodes.
- Dynamic update of virtual-to-physical configurations (CPU usage and migration).

Approach
- Multi-objective scheduling problems are addressed in three ways:
  1- Finding the Pareto-optimal solutions and letting the user select the best one.
  2- Combining the two objective functions into a single objective function.
  3- Bicriteria scheduling, in which the user specifies a limit for one criterion (a power or budget constraint) and the algorithm tries to optimize the other criterion under this constraint.
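The first and third options can be illustrated with a minimal Python sketch. It assumes each candidate schedule is summarized by a hypothetical (label, makespan, cost) tuple in which both values are to be minimized; the names and the sample values are illustrative and do not come from the slides.

```python
from typing import List, Optional, Tuple

# Hypothetical summary of a candidate schedule: (label, makespan, cost).
Schedule = Tuple[str, float, float]


def pareto_front(candidates: List[Schedule]) -> List[Schedule]:
    """Option 1: keep the candidates that no other candidate dominates."""
    front = []
    for name, time_a, cost_a in candidates:
        dominated = any(
            time_b <= time_a and cost_b <= cost_a
            and (time_b < time_a or cost_b < cost_a)
            for _, time_b, cost_b in candidates
        )
        if not dominated:
            front.append((name, time_a, cost_a))
    return front


def fastest_within_budget(candidates: List[Schedule],
                          budget: float) -> Optional[Schedule]:
    """Option 3: minimize makespan subject to cost <= budget."""
    feasible = [c for c in candidates if c[2] <= budget]
    return min(feasible, key=lambda c: c[1]) if feasible else None


if __name__ == "__main__":
    options = [("A", 100, 50), ("B", 80, 70), ("C", 120, 60), ("D", 80, 90)]
    print(pareto_front(options))
    print(fastest_within_budget(options, budget=75))
```

Option 1 leaves the final choice to the user, whereas option 3 returns a single schedule once the constraint is fixed.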

Approach
- Leverage virtualization tools
  - Xen credit scheduler
    - Dynamically update the cap parameter
    - But enforcing work-conserving mode
  - Stop & copy migration
    - Faster VM migrations, preferable for proactive failure management
[Figure: CPU% and power consumption of VMs on PM1, PM2 and PM3 over time, with failures and stop & copy migrations marked; increasing failure prediction accuracy]

System Overview
- Cloud architecture
  - Private cloud
  - Homogeneous PMs
  - A cluster coordinator manages the users' jobs
  - VMs are created and destroyed dynamically
- Users' jobs
  - A job is a set of independent tasks
  - A task runs in a single VM, and its CPU-intensive workload is known
  - The number of tasks per job and the task deadlines are defined by the user
[Figure: Private cloud management architecture]

Power Model
- Linear power model: P = p1 + p2 · CPU%
- Power efficiency of P
- Completion rate of users' jobs
- Working efficiency: measures the quantity of useful work done (i.e. completed users' jobs) relative to the consumed power.
[Figure: Example of power efficiency curve (p1 = 175 W, p2 = 75 W)]
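A small Python sketch of the linear model, using the p1 and p2 values from the example curve. The `work_per_watt` ratio is only an assumed stand-in used to show why higher utilization pays off; it is not the power efficiency definition from the slides, whose formula is not in this transcript.

```python
def power_watts(cpu_fraction: float, p1: float = 175.0, p2: float = 75.0) -> float:
    """Linear model P = p1 + p2 * CPU%, with CPU% given as a fraction in [0, 1]."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    return p1 + p2 * cpu_fraction


def work_per_watt(cpu_fraction: float, p1: float = 175.0, p2: float = 75.0) -> float:
    """ASSUMPTION: CPU share delivered per watt, an illustrative ratio only."""
    return cpu_fraction / power_watts(cpu_fraction, p1, p2)


if __name__ == "__main__":
    for load in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"CPU {load:4.0%}: {power_watts(load):6.1f} W, "
              f"{work_per_watt(load):.5f} CPU share per watt")
```

With these parameters an idle PM already draws 175 W and a fully loaded one 250 W, so the assumed ratio grows with utilization.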

Proposed algorithms
- Minimum Time Task Execution (MTTE) algorithm
- Selects a PM if:
  - it guarantees the maximum processing power required by the VM (task);
  - it has higher reliability;
  - and it increases CPU power efficiency.
- PM i capacity constraints
- Slack time to accomplish task t

Proposed algorithms
- Relaxed Time Task Execution (RTTE) algorithm
- Unlike MTTE, the RTTE algorithm always reserves for the VM the minimum amount of resources necessary to accomplish the task within its deadline.
[Figure: Host CPU from 0% to 100%, with the VM cap set in the Xen credit scheduler]
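The "minimum amount of resources" idea can be sketched as a cap computation. The sketch assumes the remaining workload of a task is known in MFLOP and that the cap is an integer percentage of one core; the function name and units are hypothetical, and the slides do not give the exact formula RTTE uses.

```python
import math
from typing import Optional


def minimum_cap_percent(remaining_mflop: float,
                        pm_mflops: float,
                        seconds_to_deadline: float) -> Optional[int]:
    """Smallest integer CPU cap (percent of one core) that still meets the deadline.

    Returns None when even 100% of the core is not enough.
    ASSUMPTION: the task progresses linearly with the CPU share it receives.
    """
    if seconds_to_deadline <= 0:
        return None
    required_rate = remaining_mflop / seconds_to_deadline  # MFLOPS needed
    cap = math.ceil(100.0 * required_rate / pm_mflops)
    if cap > 100:
        return None
    # Keep at least 1, since a cap of 0 means "no cap" in the Xen credit scheduler.
    return max(1, cap)


if __name__ == "__main__":
    # 800 MFLOPS matches the PM model used in the simulation setup.
    print(minimum_cap_percent(40_000, 800, 100))
    print(minimum_cap_percent(30_000, 800, 100))
    print(minimum_cap_percent(40_000, 800, 25))
```

Recomputing the value as the deadline approaches mirrors the dynamic update of the cap parameter mentioned earlier in the deck.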

Performance Analysis
- Simulation setup
  - 50 PMs, each modeled with one CPU core with performance equivalent to 800 MFLOPS;
  - VM stop & copy migration overhead is 12 s;
  - 30 synthetic jobs, each consisting of 5 CPU-intensive tasks;
  - Failed PMs stay unavailable for 60 s;
  - The predicted occurrence time of a failure precedes the actual occurrence time;
  - Failure instants, job arrival times, and task workload sizes follow a uniform distribution.

Performance Analysis
- Implementation considerations
  - Stabilization to avoid multiple migrations
  - Concurrency among cluster coordinators
- Algorithms compared to ours
  - Common Best-Fit (CBFIT)
    - Selects the PM with the maximum power efficiency and does not consider resource reliability
  - Optimistic Best-Fit (OBFIT)
  - Pessimistic Best-Fit (PBFIT)

Performance Analysis
- Migrations occurring due to proactive failure management only:
  - The failure predictor tool has 76.5% accuracy; the RTTE algorithm presents the best results;
  - Working efficiency, as well as the job completion rate, decreases with failure prediction inaccuracy.

Performance Analysis
- Migrations occurring due to proactive failure management and power efficiency:
  - Sliding window of 36 seconds, with a threshold of 65% (a migration starts if CPU usage is below 65%);
  - RTTE returns the best results for 76.5% failure prediction accuracy;
  - Compared to the earlier results, the rate of completed jobs diminishes, since the number of VM migrations increases.
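One possible reading of the sliding-window rule is sketched below in Python. It assumes one CPU usage sample per second and that the mean over the full window is compared against the threshold; the slides do not say how usage is aggregated, so both choices and the class name are assumptions.

```python
from collections import deque


class UnderloadDetector:
    """Flags a PM whose mean CPU usage over a sliding window is below a threshold."""

    def __init__(self, window_seconds: int = 36, sample_period: int = 1,
                 threshold: float = 0.65) -> None:
        # deque(maxlen=...) drops the oldest sample automatically.
        self.samples = deque(maxlen=window_seconds // sample_period)
        self.threshold = threshold

    def observe(self, cpu_fraction: float) -> bool:
        """Record one sample; return True when a migration should be triggered."""
        self.samples.append(cpu_fraction)
        window_full = len(self.samples) == self.samples.maxlen
        mean_usage = sum(self.samples) / len(self.samples)
        return window_full and mean_usage < self.threshold


if __name__ == "__main__":
    detector = UnderloadDetector()
    usage = [0.9] * 20 + [0.3] * 36  # load drops after 20 seconds
    for second, sample in enumerate(usage):
        if detector.observe(sample):
            print(f"migration triggered at t={second} s")
            break
```

Waiting for a full window before reacting is one simple way to get the stabilization against repeated migrations listed under the implementation considerations.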

Performance Analysis
- Number of migrations occurring due to failure management and power efficiency
  - RTTE and MTTE show a stable number of migrations and respawns as the failure prediction accuracy varies
- Migrations occurring due to proactive failure management only (75% accuracy)
  - RTTE and MTTE return the best working efficiency as the number of failures in the cloud infrastructure rises

Conclusions (1)
- Concluding remarks:
  - Power- and failure-aware dynamic allocations improve the job completion rate;
  - Dynamically adjusting the cap parameter of the Xen credit scheduler proves capable of obtaining a better job completion rate (RTTE);
  - An excessive number of VM migrations to optimize power efficiency reduces the job completion rate.
- Future directions:
  - Dynamic allocation considering workload characteristics;
  - Data locality;
  - Scalability;
  - Compare/integrate DVFS feature;
  - Improve PM consolidation (why 65% threshold?);
  - Heterogeneous CPUs.


A Budget Constrained Scheduling Algorithm for Workflow Applications on Heterogeneous Clusters
- A job is represented by a workflow
  - A workflow is a Directed Acyclic Graph (DAG)
  - A node is an individual task
  - An edge represents a dependency between tasks
- Workflow scheduling
  - Mapping tasks to resources
  - The main goal is to minimize the finish time of the exit task
[Figure: workflow tasks mapped to CPU1, CPU2 and CPU3]

Introduction
Target platform:
- Utility Grids that are maintained and managed by a service provider.
- Based on the user requirements, the provider finds a schedule that meets the user's constraints.
In utility Grids, QoS attributes other than execution time, such as economic cost or deadline, may be considered. It is a multi-objective problem.
Multi-objective scheduling problems are addressed in three ways:
1- Finding the Pareto-optimal solutions and letting the user select the best one;
2- Combining the two objective functions into a single objective function;
3- Bicriteria scheduling, in which the user specifies a limit for one criterion (a power or budget constraint) and the algorithm tries to optimize the other criterion under this constraint.

Proposed Algorithm
Heterogeneous Budget Constraint Scheduling Algorithm (HBCS)
HBCS has two phases:
- Task selection phase: we use the upward rank to assign priorities to the tasks in the DAG.
- Processor selection phase: we combine both objective functions (cost and time) into a single function; the processor that maximizes that function for the current task is selected.
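The task selection phase can be sketched with the usual HEFT-style upward rank, rank_u(i) = w_i + max over successors j of (c_ij + rank_u(j)), where w_i is the average computation cost of task i and c_ij the average communication cost of edge (i, j). The slides name the upward rank without restating its formula, so this definition, the data layout and the toy DAG below are assumptions; the processor selection objective is not reproduced because its formula is not in this transcript.

```python
from functools import lru_cache
from typing import Dict, List


def upward_ranks(avg_cost: Dict[str, float],
                 succ: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Compute rank_u for every task.

    avg_cost[t]   : average computation cost of task t over all processors
    succ[t][child]: average communication cost of the edge t -> child
    """
    @lru_cache(maxsize=None)
    def rank(task: str) -> float:
        children = succ.get(task, {})
        # An exit task has no successors, so its rank is its own cost.
        tail = max((comm + rank(child) for child, comm in children.items()),
                   default=0.0)
        return avg_cost[task] + tail

    return {task: rank(task) for task in avg_cost}


def priority_order(ranks: Dict[str, float]) -> List[str]:
    """Tasks sorted by decreasing upward rank (highest priority first)."""
    return sorted(ranks, key=ranks.get, reverse=True)


if __name__ == "__main__":
    costs = {"A": 4.0, "B": 3.0, "C": 6.0, "D": 2.0}
    edges = {"A": {"B": 2.0, "C": 1.0}, "B": {"D": 3.0}, "C": {"D": 1.0}}
    ranks = upward_ranks(costs, edges)
    print(ranks)
    print(priority_order(ranks))
```

Sorting by decreasing rank always places a task before its successors when costs are positive, so the resulting order respects the DAG precedence constraints.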

Proposed Algorithm
Heterogeneous Budget Constraint Scheduling Algorithm (HBCS)
(Objective function), with 0 <= k <= 1

Experimental Result
Workflow structure:
- Synthetic DAG generation (
- Applications have between 30 and 50 tasks, generated randomly.
- Total number of DAGs in our simulation is
Workflow budget:
- BUDGET = C_cheapest + k (C_HEFT – C_cheapest), with 0 <= k <= 1
- Lowest budget (k = 0): cheapest scheduling, higher makespan
- Highest budget (k = 1): shortest makespan (HEFT scheduling)
Performance metric:
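The budget formula is a linear interpolation between the cost of the cheapest schedule and the cost of the HEFT schedule. A short Python sketch, with a hypothetical function name and made-up cost values:

```python
def workflow_budget(cost_cheapest: float, cost_heft: float, k: float) -> float:
    """BUDGET = C_cheapest + k * (C_HEFT - C_cheapest), with 0 <= k <= 1."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be between 0 and 1")
    return cost_cheapest + k * (cost_heft - cost_cheapest)


if __name__ == "__main__":
    # Illustrative costs only: cheapest schedule 100, HEFT schedule 300.
    for k in (0.0, 0.25, 0.5, 1.0):
        print(f"k={k:.2f} -> budget {workflow_budget(100.0, 300.0, k):.1f}")
```

Sweeping k from 0 to 1 therefore covers every budget between the cheapest schedule and the one HEFT would produce.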

Experimental Result
Simulation platform:
- We use SIMGRID, which allows a realistic description of the infrastructure parameters.
- We consider a bandwidth sharing policy; only one processor can send data over one network link at a time.
- We consider nodes of clusters from the GRID'5000 platform.

Results
[Figures: results for the Sophia, Rennes and Grenoble clusters; HBCS time complexity]

Conclusions (2)
- Concluding remarks
  - We considered a realistic model of the infrastructure;
  - The HBCS algorithm achieves better performance, in particular for lower budget values (makespan and time complexity).
- Future directions
  - Compare other combinations of cost and time factors in the objective function;
  - Data locality;
  - Multiple DAG scheduling.
