Cost-effective Cloud HPC Resource Provisioning by Building Semi-Elastic Virtual Clusters Shuangcheng Niu 1, Jidong Zhai 1, Xiaosong Ma 2,3 Xiongchao Tang.


1 Cost-effective Cloud HPC Resource Provisioning by Building Semi-Elastic Virtual Clusters
Shuangcheng Niu (1), Jidong Zhai (1), Xiaosong Ma (2,3), Xiongchao Tang (1), Wenguang Chen (1)
(1) THU, (2) NCSU, (3) ORNL

2 “HPC in Cloud” Is a Trend?
HPC in cloud
◦ On-demand
◦ Elastic
◦ No upfront cost
◦ Savings on management fees
◦ …
More and more engineers are starting to use HPC in the cloud

3 “On-demand Model” Is Effective?
Reserved instance pricing model
◦ 6 reserved instance classes in Amazon EC2 CCI
◦ Discounted charge rate with upfront fee
(Figure annotations: 6.8%, 38.3%)

4 “On-demand Model” Is Lower Utilized!
Reserved instance pricing model
◦ Difficult for individual users to utilize
SDSC Data Star system trace
◦ 391 days
◦ 460 users
◦ 1 user, 1 3Y-Light

Instance Type   Used
3Y-Medium       0
3Y-Light        0.15%
On-demand       99.85%

5 Short Jobs
Hourly charging granularity
A delay of several minutes at instance start-up
Maybe I should pack my short jobs to lower my rental cost.
(Figure annotation: 70%)
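
The effect of hourly charging on short jobs can be illustrated with a small sketch. It assumes single-instance jobs, whole-hour billing per instance, and strictly serial packing on one instance; the job runtimes and the function names are hypothetical and do not come from the presentation.

```python
import math

def separate_instance_hours(runtimes_min):
    """Billed hours when each job gets its own instance (rounded up per job)."""
    return sum(math.ceil(r / 60) for r in runtimes_min)

def packed_instance_hours(runtimes_min):
    """Billed hours when the jobs run back to back on a single instance."""
    return math.ceil(sum(runtimes_min) / 60)

# Hypothetical single-instance jobs, runtimes in minutes.
jobs = [10, 15, 20, 5]
print(separate_instance_hours(jobs))  # 4 billed hours
print(packed_instance_hours(jobs))    # 1 billed hour
```

Serial packing lowers the bill but makes later jobs wait, which is the responsiveness trade-off the SEC model tries to manage.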

6 Our Proposal
Semi-Elastic Cluster computing model
◦ Organization-owned
◦ Cloud-based virtual cluster
◦ Dynamic capacity
◦ Sharing resources between users

7 SEC Architecture

8 SEC Model
Traditional local cluster: wait time 15 min, utilization 56.7%

9 SEC Model
Traditional local cluster: wait time 15 min, utilization 56.7%
Pure on-demand cloud: wait time 0 min, utilization 70.8%

10 SEC Model
Traditional local cluster: wait time 15 min, utilization 56.7%
Pure on-demand cloud: wait time 0 min, utilization 70.8%
Semi-elastic cluster: wait time 0 min, utilization 77.3%

11 Aggregated Workloads
SEC trace slices with SDSC Data Star workload

Instance Type   Used
3Y-Medium       73.66%
3Y-Light        15.75%
On-demand       10.59%

12 SEC Challenges
Finer-tuned capacity
◦ Intelligently controlled capacity according to job queue and submission history
◦ Trade-off between responsiveness and lower cost
Aggregated workloads
◦ Predict long-term resource requirements
◦ Auto resource provisioning
Evaluation without real traces

13 Job Scheduling & Cluster Size Scaling
Problem definition
◦ Configurable wait time constraint
◦ Minimize total cost
Batch scheduling
◦ Extended backfilling algorithms
◦ Dynamic resource provisioning
Resource provisioning strategies
◦ Wait-time bounded instance acquisition
◦ Expanding capacity according to job queue
Job placement policies
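
The slide does not spell out the acquisition rule, so the following is only a sketch of one way a wait-time bounded policy could look. It assumes a FIFO queue ordered by submission time, no backfilling, and a single wait-time threshold; `QueuedJob`, `instances_to_acquire`, and all parameters are hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class QueuedJob:
    job_id: str
    nodes: int          # instances the job needs
    submit_time: float  # seconds since some epoch

def instances_to_acquire(queue, idle_nodes, now, wait_threshold):
    """Return how many new instances to request right now.

    Jobs are served in FIFO order. A job starts on idle instances if
    enough are free; otherwise capacity is expanded only once the job
    has already waited for wait_threshold seconds.
    """
    free = idle_nodes
    to_acquire = 0
    for job in queue:
        if job.nodes <= free:
            free -= job.nodes                # start on idle instances
        elif now - job.submit_time >= wait_threshold:
            to_acquire += job.nodes - free   # cover the shortfall
            free = 0
        else:
            break  # head job keeps waiting; younger jobs queue behind it
    return to_acquire

queue = [QueuedJob("a", 4, 0), QueuedJob("b", 8, 100), QueuedJob("c", 2, 800)]
print(instances_to_acquire(queue, idle_nodes=6, now=1000, wait_threshold=900))  # 6
```

A larger threshold defers acquisition and gives running jobs time to release instances for reuse, at the price of longer waits.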

14 Experimental Setup
Workload
◦ 391-day trace from SDSC’s Data Star system
Cloud platform
◦ Amazon's EC2 Cluster Compute Instances (CCIs)
◦ Eight Extra Large instances (cc2.8xlarge)
◦ 16 processors (2 × Intel Xeon E5-2670, eight-core)
◦ 60.5 GB memory
◦ 4 × 850 GB instance storage

15 SEC vs. On-demand Model
◦ Individual
◦ NoWait
◦ SEC-On-Demand
◦ SEC-Hybrid
Trace: SDSC DS
(Figure annotations: 61.0%, 13.3%)

16 SEC vs. Local Cluster
◦ Traditional local cluster
◦ SEC-Hybrid
Trace: SDSC DS

17 Offline Reserved Instance Configuration
Offline configuration problem
◦ Input
  ▪ Utilization matrix U_{n×m} (from given cluster capacity trace)
  ▪ Pricing classes {C_0, C_1, C_2, …, C_h}
◦ Solution
  ▪ Purchased instance matrix R_{n×m}, where R_{i,k} ≥ 0
◦ Optimization
  ▪ Minimize total rental cost
A hard problem!

18 Offline Forward Greedy Algorithm
Choose a larger time interval, e.g. a week
◦ Reduces computation granularity
Running: at the beginning of each time interval
Steps:
1) Calculate every instance's utilization level based on the given future demands
2) Identify the first economical class for each instance
3) Summarize the provisioning plan
4) Compare the provisioning plan with the current inventory and decide how many instances to purchase
5) Adjust the active reserved instances
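
Steps 2 to 4 can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it reads "first economical class" as the class with the lowest cost over one common term, it ignores differing reservation lengths, and the class names and prices are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class PricingClass:
    name: str
    upfront: int  # one-time fee for the term (hypothetical units)
    hourly: int   # charge per used hour (hypothetical units)

def cheapest_class(utilization, term_hours, classes):
    """Step 2: pick the class with the lowest total cost for one instance."""
    used_hours = utilization * term_hours
    return min(classes, key=lambda c: c.upfront + c.hourly * used_hours)

def purchase_plan(utilizations, term_hours, classes, inventory):
    """Steps 3 and 4: summarize the plan, then subtract what is already owned."""
    plan = Counter(cheapest_class(u, term_hours, classes).name
                   for u in utilizations)
    return {name: count - inventory.get(name, 0)
            for name, count in plan.items()
            if name != "on-demand" and count > inventory.get(name, 0)}

# Hypothetical price list: higher upfront fee buys a lower hourly rate.
CLASSES = [
    PricingClass("on-demand", upfront=0, hourly=10),
    PricingClass("light", upfront=2000, hourly=6),
    PricingClass("medium", upfront=5000, hourly=2),
]

# Predicted utilization level of four instance slots over a 1000-hour term.
print(purchase_plan([0.9, 0.9, 0.6, 0.1], 1000, CLASSES, {"medium": 1}))
```

With these made-up prices, heavily used slots map to the class with the larger upfront fee, lightly used slots stay on-demand, and only the gap between the plan and the existing inventory is purchased.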

19 Offline Optimal-Competitive Algorithm
Transform the original pricing classes into new classes
TotalCost(C_k) ≥ TotalCost(C_k') = [expression not captured in the transcript]

20 Online Reserved Instance Configuration
Use weekly time intervals
◦ Reduce computation complexity
◦ Reduce short-term variance
◦ Less impact on long-term reservation decisions
Evolution model
◦ Assumes a quadratic polynomial model

21 Long-Term Demand Prediction
Classical Exponential Smoothing (ES) method
◦ Relatively simple
◦ Quite robust to non-stationary noise
◦ Widely used
Our prediction method
◦ Extended Holt's double-parameter ES method
◦ Automatically adjusted smoothing factors
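
For reference, the textbook form of Holt's double-parameter (level and trend) exponential smoothing is sketched below. It uses fixed smoothing factors `alpha` and `beta`; the authors' extension that adjusts these factors automatically is not described in the slides and is not reproduced here. The function name and the sample series are hypothetical.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` future points with Holt's linear-trend smoothing.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Hypothetical weekly demand (e.g. node-hours) with a steady upward trend.
weekly_demand = [10, 12, 14, 16, 18]
print(holt_forecast(weekly_demand, alpha=0.5, beta=0.5, horizon=3))
```

On a perfectly linear series the method reproduces the trend exactly, so the three forecasts continue the sequence with 20, 22, and 24.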

22 Verifying Workloads
Validation workloads
◦ HPC cluster: bounded by fixed machine size, 6 real traces
◦ SEC: semi-elastic machine size
◦ SNS: not bounded, 6 SNS traces

23 SNS-based Synthetic Workloads
(Figure: Synthetic Workload Generation. Labels: SNS search traffic, active users, HPC trace slices, resource demand, synthetic workload.)

24 Reserved Instance Configuration Analysis
HPC trace

25 Reserved Instance Configuration Analysis
Synthetic workloads using SNS trace

26 Overhead Analysis with SEC Prototype
Overhead for data protection with instance reuse
◦ Reformatting EC2 ephemeral 4×845GB disks
◦ 3.4 seconds
Configuration overhead when requesting new instances
◦ Configuring host names, hosts file, file system, etc.
◦ About 8.0 seconds
Configuration overhead when releasing instances
◦ About 5.0 seconds

27 Conclusion
SEC: A new execution model for HPC
◦ Organization-owned dynamic cloud-based clusters
◦ Reduced costs through workload aggregation
◦ Better responsiveness through instance reuse
◦ Higher utilization through efficient use of residual resources
SEC can potentially become a viable alternative to organizations owning and managing physical clusters

28 Related Work
[1] Parallel Workloads Archive. http://www.cs.huji.ac.il/labs/parallel/workload/, 2012.
[2] SLURM: A Highly Scalable Resource Manager. https://computing.llnl.gov/linux/slurm/, 2012.
[3] StarCluster. http://web.mit.edu/star/cluster/, 2012.
[4] Google Trends. http://www.google.com/trends/, 2013.
[5] E. S. Gardner Jr. Exponential smoothing: The state of the art. Journal of Forecasting, 1985.
[6] W. Voorsluys, S. Garg, and R. Buyya. Provisioning spot market cloud resources to create cost-effective virtual clusters. Algorithms and Architectures for Parallel Processing, 2011.
[7] H. Zhao, M. Pan, X. Liu, X. Li, and Y. Fang. Optimal resource rental planning for elastic applications in cloud market. In Parallel & Distributed Processing Symposium (IPDPS), IEEE, 2012.

29 Acknowledgments
We would like to thank
◦ The HPC workloads archive
◦ The anonymous reviewers and our shepherd
◦ Our funding sources: research grants from the Chinese 863 project, NSF grants, a joint faculty appointment between ORNL and NCSU, and a senior visiting scholarship at Tsinghua University

30 Thanks!

31 Classical HPC traces
SDSC’s Data Star, SDSC's Blue Horizon (SDSC Blue), SDSC's IBM SP2 (SDSC SP2), Cornell Theory Center IBM SP2 (CTC SP2), High Performance Computing Center North (HPC2N), Sandia Ross cluster (Sandia Ross)
(Figure: variance in node-hours per active user)

32 Synthetic workloads
SNS search trace from Google Trends

33 Cost-responsiveness analysis
Local cluster expense items

34 Impact of scheduling parameters

35 Impact of scheduling parameters
(Figure: average wait time. Labels: expanding strategies, wait time threshold.)

36 Impact of scheduling parameters
(Figure: average charge rate. Labels: expanding strategies, wait time threshold.)

37 Overhead Analysis with SEC Prototype
Overhead for data protection with instance reuse
◦ Reformatting EC2 ephemeral 4×845GB disks
◦ 3.4 seconds
Configuration overhead when requesting new instances
◦ Configuring host names, hosts file, and the file system
◦ Setting up user accounts and adding nodes to the SLURM partition
Configuration overhead when releasing instances

