11-2-2014 Challenge the future Delft University of Technology Overprovisioning for Performance Consistency in Grids Nezih Yigitbasi and Dick Epema Parallel.


1 Overprovisioning for Performance Consistency in Grids. Nezih Yigitbasi and Dick Epema, Parallel and Distributed Systems Group, Delft University of Technology. 11-2-2014. http://guardg.st.ewi.tudelft.nl/

2 The Problem: Performance inconsistency in grids. Inconsistent performance is common in grids because of bursty workloads, variable background loads, a high rate of failures, and a highly dynamic and heterogeneous environment. Figure: Bag-of-Tasks with 128 tasks submitted every 15 minutes (annotation: ~70X). How can we provide consistent performance in grids?

3 Our goals. GOAL-1: Realistic performance evaluation of static and dynamic overprovisioning strategies (system's perspective). GOAL-2: Dynamically determine the overprovisioning factor (Κ) for user-specified performance requirements (user's perspective).

4 Outline: Overprovisioning Strategies; Experimental Setup; Results; Dynamically Determining Κ; Conclusions.

5 Overprovisioning (I). Overprovisioning means increasing the system capacity to provide better, and in particular consistent, performance even under variable workloads and unexpected demands. Pros: simple; obviates the need for complex algorithms; easy to deploy and maintain. Cons: cost-ineffective; workloads may evolve (e.g., an increasing user base); underutilized systems.

6 Overprovisioning (II). Overprovisioning is the preferred way of providing performance guarantees: typical data center utilization is no more than 15-50%, and telecommunication systems have ~30% utilization on average (L. A. Barroso and U. Hölzle, The Case for Energy-Proportional Computing, IEEE Computer, December 2007). High overprovisioning factors (Κ) are common in modern systems. Google: 450,000 (2005); Microsoft: 218,000 (mid-2008); Facebook: 10,000+ (2009).

7 Overprovisioning strategies. 1. Static: (i) Largest, (ii) All, (iii) Number. Where should we deploy the resources? Does it make any difference? 2. Dynamic: dynamic overprovisioning, a.k.a. auto-scaling, with low/high thresholds for acquiring/releasing resources. Given Κ, it is straightforward to determine the number of processors for a strategy. Figure: capacity versus time, showing demand, static capacity, dynamic capacity, and waste.
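
The last point can be illustrated with a minimal sketch. It rests on assumptions the slides do not state: Κ is taken to be the ratio of overprovisioned capacity to initial capacity, Largest is taken to add all extra processors to the largest cluster, and All is taken to spread them evenly over every cluster. Number is left out because it would need the size of the newly added clusters. All function names and cluster sizes are hypothetical.

```python
import math

def extra_processors(cluster_sizes, k):
    """Processors to add so total capacity is k times the initial capacity.

    Assumption: K = overprovisioned capacity / initial capacity.
    The small epsilon guards against floating-point noise in k * total.
    """
    total = sum(cluster_sizes)
    return max(0, math.ceil(k * total - 1e-9) - total)

def overprovision_largest(cluster_sizes, k):
    """Assumed 'Largest' strategy: all extra processors go to the largest cluster."""
    sizes = list(cluster_sizes)
    largest = max(range(len(sizes)), key=lambda i: sizes[i])
    sizes[largest] += extra_processors(cluster_sizes, k)
    return sizes

def overprovision_all(cluster_sizes, k):
    """Assumed 'All' strategy: extra processors are spread evenly over all clusters."""
    extra = extra_processors(cluster_sizes, k)
    share, remainder = divmod(extra, len(cluster_sizes))
    return [s + share + (1 if i < remainder else 0)
            for i, s in enumerate(cluster_sizes)]

# Hypothetical three-cluster system with 128 processors in total.
initial = [64, 32, 32]
largest_plan = overprovision_largest(initial, 2.0)
all_plan = overprovision_all(initial, 2.0)
```

Both plans end up with the same total capacity and differ only in where the processors are placed, which is the question the slide raises.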

8 Outline: Overprovisioning Strategies; Experimental Setup; Results; Dynamically Determining Κ; Conclusions.

9 System model. DAS-3 multi-cluster grid. Global Resource Managers (GRM) interact with Local Resource Managers (LRM). Figure labels: GRM, global queue, global job, LRM, local queues, local jobs.

10 Workload. Realistic workloads consisting of Bags-of-Tasks (BoTs). Simulations use 10 workloads with 80% load; each workload has ~1650 BoTs and ~10K tasks, and the duration of each workload is between 1 day and 1 week. Real background load trace: DAS-3 trace of June 2008 (http://gwa.ewi.tudelft.nl/). (Distribution parameters are determined after base-two log transformation.)

11 Scheduling model. We consider the following BoT scheduling policies. 1. Static Scheduling: statically partitions tasks across clusters. 2. Dynamic Scheduling: takes cluster load into account, in two variants, Dynamic Per Task Scheduling and Dynamic Per BoT Scheduling. 3. Prediction-based Scheduling: uses the average of the last two runtimes for prediction and sends the task to the cluster that is predicted to lead to the earliest completion time (ECT).
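
A minimal sketch of the prediction-based policy follows. The slides only say that the prediction is the average of the last two runtimes and that the task goes to the cluster with the earliest predicted completion time. How the completion time is estimated is an assumption here: the time at which the cluster becomes free plus the predicted runtime, with runtime history kept per cluster. The data layout and function names are hypothetical.

```python
def predict_runtime(history, default=1.0):
    """Average of the last two observed runtimes.

    Falls back to `default` (a hypothetical value) when there is no history,
    and to the single observation when only one runtime is known.
    """
    recent = history[-2:]
    return sum(recent) / len(recent) if recent else default

def pick_cluster_ect(clusters, now=0.0):
    """Return the cluster name with the earliest predicted completion time.

    clusters: dict mapping name -> {"free_at": float, "runtimes": [float, ...]}
    Assumption: completion = max(now, free_at) + predicted runtime.
    """
    def completion(name):
        c = clusters[name]
        return max(now, c["free_at"]) + predict_runtime(c["runtimes"])
    return min(clusters, key=completion)

# Hypothetical state: "a" is busy until t=10 but runs tasks quickly,
# "b" is idle but has been slow recently.
clusters = {
    "a": {"free_at": 10.0, "runtimes": [4.0, 6.0]},
    "b": {"free_at": 0.0, "runtimes": [20.0, 30.0]},
}
chosen = pick_cluster_ect(clusters)
```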

12 Methodology. Compare the overprovisioned system with the initial system (NO). For Dynamic: 69/129 s and 18/23 s for the min/max acquisition and release times, respectively, and 60%/70% for the low/high thresholds. Κ varies over time, so for a fair comparison it is kept within a ±10% range.
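
The threshold rule for the Dynamic strategy can be sketched as below. The 60%/70% thresholds come from the slide. The rest is assumed for illustration: that the thresholds apply to processor utilization, that resources are acquired above the high threshold and released below the low one, and that capacity changes by a fixed step. The function name and parameters are hypothetical, and acquisition/release delays are not modeled.

```python
LOW_THRESHOLD = 0.60   # from the slide: low threshold
HIGH_THRESHOLD = 0.70  # from the slide: high threshold

def scaling_decision(busy, total, step=1, min_total=1):
    """Return the change in processor count for one control interval.

    Assumptions: utilization = busy / total; acquire `step` processors when
    utilization exceeds the high threshold, release `step` when it drops
    below the low threshold, and never shrink below `min_total`.
    """
    utilization = busy / total
    if utilization > HIGH_THRESHOLD:
        return step
    if utilization < LOW_THRESHOLD and total - step >= min_total:
        return -step
    return 0
```

The gap between the two thresholds acts as a dead band, so the system does not oscillate between acquiring and releasing when utilization hovers around a single value.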

13 Traditional performance metrics. Makespan of a BoT: the difference between the earliest time of submission of any of its tasks and the latest time of completion of any of its tasks. Normalized Schedule Length (NSL) of a BoT: the ratio of its makespan to the sum of the runtimes of its tasks on a reference processor (slowdown). Figure: timeline from "first task submitted" to "last task done", marking the makespan.
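
The two definitions above translate directly into code. This is a small sketch; the task record layout (`submit`, `finish`, `ref_runtime`) and the sample values are hypothetical.

```python
def makespan(tasks):
    """Latest completion time minus earliest submission time over all tasks of a BoT."""
    return max(t["finish"] for t in tasks) - min(t["submit"] for t in tasks)

def nsl(tasks):
    """Normalized Schedule Length: makespan divided by the sum of the
    task runtimes on a reference processor."""
    return makespan(tasks) / sum(t["ref_runtime"] for t in tasks)

# Hypothetical BoT with two tasks (times in seconds).
bot = [
    {"submit": 0.0, "finish": 50.0, "ref_runtime": 20.0},
    {"submit": 5.0, "finish": 80.0, "ref_runtime": 30.0},
]
bot_makespan = makespan(bot)  # 80.0 - 0.0
bot_nsl = nsl(bot)            # 80.0 / 50.0
```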

14 Consistency metrics. We define two metrics to capture the notion of consistency across two dimensions. The system gets more consistent as C_d gets closer to 1 and C_s gets closer to 0. A tighter range of the NSL is a sign of better consistency.

15 Outline: Overprovisioning Strategies; Experimental Setup; Results; Dynamically Determining Κ; Conclusions.

16 Performance of scheduling policies. ECT is the worst; Dynamic Per Task is the best.

17 Performance of different strategies. Figure panels: different overprovisioning factors (Κ); different strategies. Consistency obtained with overprovisioning is much better than in the initial system (NO). Static strategies provide similar performance (only Κ matters). All and Largest are viable alternatives to Number, as Number increases the administration, installation, and maintenance costs. The Dynamic strategy performs better than the static strategies. Κ = 2.5 is the critical value.

18 Cost of different strategies. Cost is measured in CPU-hours: the time a processor is used [h], with partial instance-hours rounded up to one hour, similar to the Amazon EC2 on-demand instance pricing model. Significant reduction in cost, as high as ~40%.
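
The rounding rule in the cost metric can be sketched as follows. The input format (one usage duration in seconds per processor lease) and the function name are hypothetical; the slides only specify that partial hours are rounded up to a full hour.

```python
import math

def billed_cpu_hours(usage_seconds):
    """Total CPU-hours with every partial hour rounded up to a full hour.

    usage_seconds: iterable of per-lease usage durations in seconds
    (hypothetical input format). Leases with zero usage cost nothing.
    """
    return sum(math.ceil(s / 3600) for s in usage_seconds if s > 0)

# One exact hour, one hour plus a second, ten seconds, and an unused lease.
total_hours = billed_cpu_hours([3600, 3601, 10, 0])  # 1 + 2 + 1 + 0
```

Under this rule, many short leases are billed much more than their actual usage, which is why how often a dynamic strategy acquires and releases processors matters for cost.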

19 Outline: Overprovisioning Strategies; Experimental Setup; Results; Dynamically Determining Κ; Conclusions.

20 Determining Κ dynamically. So far we have taken the system's perspective; now we take the user's perspective. How can we dynamically determine Κ given the user's performance requirements? We use a simple feedback-control approach to deploy additional resources dynamically to meet user performance requirements.
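
One way such a feedback loop could look is sketched below. This is not the controller from the talk, whose details the slides do not give. It assumes the requirement is a target makespan range, that the controller compares the average of recent makespans against that range, and that Κ moves by a fixed step within fixed bounds. All names and default values are hypothetical.

```python
def update_k(k, recent_makespans, target_low, target_high,
             step=0.1, k_min=1.0, k_max=4.0):
    """Return the overprovisioning factor for the next control interval.

    Assumed rule: raise K when the average recent makespan is above the
    target range, lower it when below, keep it when inside the range.
    """
    if not recent_makespans:
        return k  # no feedback yet, leave K unchanged
    observed = sum(recent_makespans) / len(recent_makespans)
    if observed > target_high:
        k += step
    elif observed < target_low:
        k -= step
    # Clamp to the allowed range and round away floating-point noise.
    return round(min(k_max, max(k_min, k)), 10)

# Hypothetical run against a [250, 300] minute target range.
k_next = update_k(1.0, [400.0, 350.0], 250.0, 300.0)
```

Lowering Κ when performance is better than required releases resources that are not needed to meet the requirement.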

21 Evaluation. Simulated DAS-3 without background load. ~1.5-month workload consisting of ~33K BoTs. We show empirically that the controller stabilizes. The average makespan for the workload in the initial system (without the controller) is ~3120 minutes. Three scenarios, from tight to loose performance requirements: [250m-300m], [700m-750m], [1000m-1250m].

22 Results (I). Significant improvement, as high as ~65%, when the performance requirements are tight; ~40%-50% improvement for loose performance requirements.

23 Results (II). Figure panels: [250m-300m], [700m-750m], [1000m-1250m].

24 Conclusions. GOAL-1: Realistic performance evaluation of different strategies. Overprovisioning improves performance consistency significantly. Static strategies provide similar performance (only Κ matters). The Dynamic strategy performs better than the static strategies. The critical value needs to be determined to maximize the benefit of overprovisioning. GOAL-2: Dynamically determining Κ for given user performance requirements. A feedback-controlled system tunes Κ dynamically using historical performance data and the specified performance requirements. The number of BoTs meeting the performance requirements increases significantly, by as much as 65%, compared to the initial system.

25 More Information. Guard-g Project (http://guardg.st.ewi.tudelft.nl/), PDS publication database (http://www.pds.twi.tudelft.nl). Thank you! Questions? Comments? Contact: M.N.Yigitbasi@tudelft.nl, http://www.st.ewi.tudelft.nl/~nezih/

