General and Effective Monetary Optimizations for Workflows in IaaS Clouds
Amelie Chi Zhou, email@example.com
Xtra Computing Group, http://pdcc.ntu.edu.sg/xtra
Nanyang Technological University, Singapore
presented by
Workflows for Scientific Applications
Workflows are structured
– Tasks have very different I/O and computational behavior.
Real-world workflows
– Montage, Ligo, Epigenomics, water simulation
Workflow ensembles [Malawski et al., SC’12]
– Compositions of workflows with similar structures but different parameters and priorities
(Figure: structures of the Montage, Ligo and Epigenomics workflows.)
Running Workflows on IaaS Clouds
IaaS clouds
– Provide fundamental computing resources for users to provision
– Examples: Amazon EC2, Rackspace, OpenStack, Google Compute Engine, …
Example projects
– Montage, Broadband and Epigenomics on Amazon EC2 [Juve et al., eScience’09]
– Astronomy applications on Nimbus, Eucalyptus and EC2 [Vöckler et al., ScienceCloud’11]
– …
Workflows in IaaS Clouds
Features of IaaS clouds
– Pay-as-you-go pricing (e.g., an hourly pricing scheme)
– Rich and evolving cloud offerings
Research problems
– Monetary cost optimizations
– Performance optimizations
– Elasticity
– Fault tolerance
– …
Are the current solutions ideal and sufficient?
Monetary Cost Opportunities
Instance types
– Amazon EC2 provides 29 instance types
Instance reuse
– Hourly charging scheme
Pricing schemes
– On-demand, spot and reserved pricing
vs.
– Tasks can have very different I/O and computational behavior.
– Workflows have different deadlines and monetary constraints.
– Users may have various workflow application scenarios.
Current Solutions Are Far from Ideal
Problems of current approaches
– Auto-scaling [Mao et al., SC’11] resource management: more effective optimizations give 29% less cost
– They assume static cloud performance and pricing: accounting for cloud dynamics and spot instances gives 73% less cost
– Heuristic-based cost and performance optimizations are problem-specific, so they are likely to be suboptimal for evolving and diversified workflow applications.
Our Research Efforts
Effectiveness
– Dyna: minimizes the monetary cost of workflows, addressing both price and performance dynamics in clouds
Generality
– ToF: defines transformation operations to model common cost and performance optimizations
– Deco: designs a declarative language called WLog to specify various workflow optimization problems
Generality (ToF and Deco) is the focus of this presentation.
Overall Design
We design general workflow optimization frameworks to fully explore the optimization opportunities in workflows.
(Figure: layered design with a problem specification layer, an optimization layer and an execution layer; components shown are WLog programs, the transformation-based optimizer, Deco and ToF.)
Outline
Related Work
Generalized Optimization Frameworks
– General transformations for cost and performance optimizations
– A declarative language for workflow optimization problems
Conclusions
Related Work
Performance and monetary cost optimization heuristics
– Auto-scaling [Mao et al., SC’11]: a fixed sequence of workflow optimizations
– Workflow scheduling with performance and cost constraints [Kllapi et al., SIGMOD’11]: considers only one on-demand instance type
These heuristics are designed for specific optimization problems, and the optimization opportunities are not fully explored.
Related Work (cont’d)
Generalized optimization frameworks: overhead is a problem
– Generalized bin-ball abstraction for resource allocation [Rai et al., SoCC’12]: uses GPU acceleration, but it is not always convenient to model a problem with the bin-ball model
– Declarative language to model a wide range of constraint optimization problems (COPs) [Liu et al., VLDB’12]: targets distributed systems and is unaware of the special features and optimization opportunities of workflows
There is no general optimization framework for workflows.
ToF: A Transformation-based Optimization Framework
Outline
– Main contributions of this work
– System overview
– Design details
– Evaluation results
Main Contributions
This study makes two major contributions:
– We define a series of common transformations for the performance and cost optimizations of workflows.
– We design a lightweight optimizer to guide the transformation process.
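Workflow Transformation
Definitions
– Instance assignment graph: each node represents the instance configuration for a task; the graph has the same structure as the workflow DAG.
– Transformation operation: a structural change in the instance assignment graph
(Figure: instance assignment graphs for a four-task workflow (tasks 0 to 3) before and after transformations; merged nodes are labeled with task sets such as 1,2 and 1,2,3.)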
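To make the definition concrete, here is a minimal Python sketch of an initial instance assignment graph. The encoding is an assumption for illustration and is not ToF's actual data structure: `AssignmentNode` holds a set of task IDs and an instance type, and the type name `m1.small` is hypothetical. A transformation such as Merge would replace two nodes with one whose task set is their union.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssignmentNode:
    """A vertex of the instance assignment graph (hypothetical encoding).

    tasks: IDs of the workflow tasks that share this instance configuration
    itype: instance type used for those tasks
    """
    tasks: frozenset
    itype: str


def initial_assignment_graph(dag_edges, task_ids, default_type):
    """One node per task, with exactly the edges of the workflow DAG."""
    node_of = {t: AssignmentNode(frozenset([t]), default_type) for t in task_ids}
    edges = {(node_of[u], node_of[v]) for u, v in dag_edges}
    return set(node_of.values()), edges


# Hypothetical four-task workflow: 0 -> {1, 2} -> 3.
dag = [(0, 1), (0, 2), (1, 3), (2, 3)]
nodes, edges = initial_assignment_graph(dag, range(4), "m1.small")
print(len(nodes), "nodes,", len(edges), "edges")
```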
System Overview
Design ideas
– Two types of transformations
  – Main schemes: reduce cost
  – Auxiliary schemes: help the main schemes reduce cost
– A cost model guides the transformation-based optimization
– Periodic batch optimization
  – Maximizes instance sharing and reuse
  – Reduces optimizer overhead
(Figure: optimization process in one plan period. Main schemes and auxiliary schemes are applied under the guidance of the cost model; while the termination check fails the process repeats, and when it succeeds the plan is output.)
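The optimization process in one plan period can be sketched as a cost-model-guided loop. This is a hypothetical illustration and not ToF's implementation: `optimize`, `best_candidate`, the toy `demote` scheme and the prices are all assumptions. The loop tries the main schemes first. If none of them lowers the cost, it reshapes the plan with an auxiliary scheme and retries the main schemes. It terminates when no transformation reduces the cost.

```python
def best_candidate(plan, schemes, cost):
    """Cheapest plan produced by any scheme, or None if no scheme applies."""
    return min((c for scheme in schemes for c in scheme(plan)), key=cost, default=None)


def optimize(plan, main_schemes, aux_schemes, cost, max_rounds=100):
    """Cost-model-guided transformation loop for one plan period (a sketch)."""
    for _ in range(max_rounds):
        cand = best_candidate(plan, main_schemes, cost)
        if cand is None or cost(cand) >= cost(plan):
            # Main schemes alone cannot help: reshape the plan with an auxiliary
            # scheme, then keep the cheapest plan a main scheme reaches from there.
            cand = None
            for aux in aux_schemes:
                for reshaped in aux(plan):
                    c = best_candidate(reshaped, main_schemes, cost)
                    if c is not None and (cand is None or cost(c) < cost(cand)):
                        cand = c
        if cand is None or cost(cand) >= cost(plan):
            return plan  # termination: no transformation reduces the cost
        plan = cand
    return plan


# Toy cost model: one instance type per task, hypothetical prices.
PRICE = {"large": 4.0, "small": 1.0}
CAN_DEMOTE = {0}  # hypothetical: only task 0 still meets its deadline on "small"


def demote(plan):
    """Toy main scheme: move one eligible task from "large" to "small"."""
    for i, itype in enumerate(plan):
        if itype == "large" and i in CAN_DEMOTE:
            yield plan[:i] + ("small",) + plan[i + 1:]


def cost(plan):
    return sum(PRICE[t] for t in plan)


result = optimize(("large", "large"), [demote], [], cost)
print(result)
```

In the toy run, task 0 is demoted, and the loop stops once no main or auxiliary scheme can lower the cost further.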
Design Details
Transformation operations
– Main schemes: Merge, Demote
– Auxiliary schemes: Move, Promote, Split, Co-scheduling
– Transformations can be combined with each other
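The two instance-type operations can be sketched as follows. The sketch assumes a hypothetical three-type catalogue ordered by price, in which a faster type divides a task's runtime by its relative speed. It also assumes each task has its own deadline in minutes. Demote lowers cost when the slower, cheaper type still meets the deadline. Promote is shown only as the reverse move that an auxiliary scheme might make. All names, prices and speeds are illustrative.

```python
import math

# Hypothetical catalogue, cheapest first: (name, $ per hour, relative speed).
TYPES = [("small", 0.06, 1.0), ("medium", 0.12, 2.0), ("large", 0.24, 4.0)]


def runtime(work, idx):
    """Minutes on type idx for a task needing `work` minutes on the slowest type."""
    return work / TYPES[idx][2]


def task_cost(work, idx):
    """Hourly billing: partial hours are charged as full hours."""
    return math.ceil(runtime(work, idx) / 60) * TYPES[idx][1]


def demote(assign, task, work, deadline):
    """Main scheme: move a task one type down if it still meets its deadline."""
    idx = assign[task]
    if idx == 0 or runtime(work, idx - 1) > deadline:
        return None
    return {**assign, task: idx - 1}


def promote(assign, task):
    """Auxiliary scheme: move a task one type up (faster, more expensive)."""
    idx = assign[task]
    if idx == len(TYPES) - 1:
        return None
    return {**assign, task: idx + 1}


plan = {"t1": 2}  # t1 starts on "large"
demoted = demote(plan, "t1", work=100, deadline=60)
print(demoted, task_cost(100, 2), task_cost(100, demoted["t1"]))
```

Here t1 needs 25 minutes on "large" and 50 minutes on "medium", so one Demote halves its hourly charge. A second Demote to "small" is rejected because 100 minutes would miss the 60-minute deadline.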
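Using Transformations
Example of using the Move and Merge operations
(Figure: a Move operation, which only transforms the shape of the instance assignment graph, followed by a Merge operation, which reduces cost.)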
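The following sketch shows why Move alone does not reduce cost but can enable a Merge under hourly billing. It assumes two 20-minute tasks running on separate instances of the same hypothetical type. It also assumes that t2 may legally start at minute 20, for instance because its only predecessor t1 finishes then. The helpers `instance_cost`, `move` and `merge` are illustrative and are not ToF's API.

```python
import math

PRICE_PER_HOUR = 0.12  # hypothetical price of the instance type both tasks use


def instance_cost(slots):
    """Hourly billing from first start to last end (minutes), rounded up to hours."""
    start = min(s for _, s, _ in slots)
    end = max(e for _, _, e in slots)
    return math.ceil((end - start) / 60) * PRICE_PER_HOUR


def move(slots, task, new_start):
    """Move: shift one task in time, keeping its duration (changes only the shape)."""
    return [(t, new_start, new_start + (e - s)) if t == task else (t, s, e)
            for t, s, e in slots]


def merge(a, b):
    """Merge: run two instances' tasks on one instance if no slots overlap."""
    combined = sorted(a + b, key=lambda slot: slot[1])
    for (_, _, end1), (_, start2, _) in zip(combined, combined[1:]):
        if start2 < end1:
            return None  # overlapping tasks cannot share one instance
    return combined


a = [("t1", 0, 20)]   # t1 runs from minute 0 to 20 on one instance
b = [("t2", 50, 70)]  # t2 runs from minute 50 to 70 on another instance
print(instance_cost(a) + instance_cost(b))         # two separate instance-hours
print(instance_cost(merge(a, b)))                  # merged span of 70 minutes
print(instance_cost(merge(a, move(b, "t2", 20))))  # Move first, then Merge
```

Merging directly still spans 70 minutes and is billed as two hours. After the Move, the merged instance is busy for 40 minutes and is billed as one hour.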
Experimental Setup
Workload
– Montage, Ligo and Mixed
– Workflow submissions follow a Poisson process
Comparisons
– ToF
– Baseline: only implements the initial instance configuration
– Auto-scaling [Mao et al., SC’11]
– Greedy: randomly selects the transformation during optimization
All results are normalized to Baseline.
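A Poisson submission process can be generated from exponentially distributed inter-arrival times, as in this minimal sketch. The rate of 10 workflows per hour, the 24-hour horizon and the seed are assumptions. The slide does not give the actual arrival rate.

```python
import random


def poisson_submissions(rate_per_hour, horizon_hours, seed=0):
    """Submission times (hours) of a Poisson process with the given rate.

    Inter-arrival times of a Poisson process are exponentially distributed with
    mean 1 / rate, so exponential draws are accumulated until the horizon.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_hour)
        if t > horizon_hours:
            return times
        times.append(t)


arrivals = poisson_submissions(rate_per_hour=10, horizon_hours=24, seed=42)
print(len(arrivals), "workflows submitted in 24 hours")
```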
Evaluation Results on Cost Optimizations
(Figure: optimization results under the Amazon EC2 pricing scheme.)
ToF obtains the lowest monetary cost on all workflows, reducing cost by 29% over Auto-scaling, 27% over Baseline and 17% over Greedy.
Evaluation Results on Performance Optimizations
(Figure: performance optimization results.)
ToF obtains the lowest average execution time on all workflows, reducing it by 21% over Auto-scaling, 21% over Baseline and 18% over Greedy.
Deco: A Declarative Optimization Framework
Outline
– Main contributions of this work
– System overview
– A declarative language for workflows
– GPU-accelerated search engine
– Evaluation results
Main Contributions
This work makes three main contributions:
– A declarative language for resource provisioning of scientific workflows in IaaS clouds
– A generalized optimization framework that serves a wide range of optimization problems
– A fast GPU-based implementation for low optimization overhead
Motivating Ideas
Why a declarative language?
– Declarative languages such as HTML, SQL and Prolog
– Concise and clear
– Focus on what to do rather than how to do it
Why GPU acceleration?
– Generic search has a large runtime overhead
– The Monte Carlo method, used for probabilistic approximation [De Raedt et al., IJCAI’07], is suitable for GPU acceleration
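This small illustration shows Monte Carlo approximation applied to a probabilistic deadline such as deadline(95%, 10h). The sketch samples the execution time of a hypothetical three-task chain whose task runtimes vary normally. It then checks whether the estimated probability of finishing within D reaches P. The runtime distributions and the sample count are assumptions. Each sample is independent of the others, which is why the method is attractive for GPU parallelization.

```python
import random

# Hypothetical 3-task chain: (mean, standard deviation) of each runtime in hours.
TASKS = [(2.0, 0.3), (3.0, 0.5), (2.5, 0.4)]


def sample_execution_time(rng):
    """One Monte Carlo draw of the chain's execution time under performance variance."""
    return sum(max(0.0, rng.gauss(mu, sigma)) for mu, sigma in TASKS)


def deadline_probability(deadline, n=20000, seed=1):
    """Estimate P(execution time <= deadline) from n independent samples."""
    rng = random.Random(seed)
    hits = sum(sample_execution_time(rng) <= deadline for _ in range(n))
    return hits / n


# deadline(P, D) holds when the estimated probability reaches P.
for d in (8.0, 10.0):
    p = deadline_probability(d)
    print(f"D = {d}h: P(T <= D) ~ {p:.3f}, deadline(95%, {d}h) satisfied: {p >= 0.95}")
```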
System Overview
Overview of the Deco system
– WLog, a declarative language for workflows
– GPU-accelerated search engine
WLog: A Declarative Language for Workflows
WLog is designed based on Prolog. A WLog program describing a workflow scheduling problem:
goal minimize Ct in totalcost(Ct).
cons deadline(95%, 10h).
var configs(Tid, Vid) forall task(Tid) and vm(Vid).
r1 import(amazonec2).
r2 import(montage).
r3 path(X,Y,Y,C) :- edge(X,Y), exetime(X,Vid,T), C is T.
r4 path(X,Y,Z,C) :- edge(X,Z), Z\==Y, path(Z,Y,Z2,C1), exetime(X,Vid,T), C is T+C1.
r5 maxtime(Path,T) :- setof([Z,C],path(root,tail,Z,C),Set), max(Set,[Path,T]).
r6 cost(Tid,Vid,C) :- price(Vid,Up), exetime(Tid,Vid,T), C is ceil(T/60.0)*Up.
r7 totalcost(Ct) :- findall(C,cost(Tid,Vid,C),Bag), sum(Bag,Ct).
Problem-specific keywords:
– goal: optimization goal defined by the user
– cons: problem constraint defined by the user
– var: problem variable to be optimized
– deadline(P, D): a probabilistic deadline requirement that D is at the P-th percentile of the workflow execution time
– import(cloud): imports the cloud-related facts from the cloud metadata
– import(daxfile): imports the workflow-related facts generated from a DAX file
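For readers less familiar with Prolog, rules r3 to r7 can be read as the following Python sketch. The facts are hypothetical stand-ins for import(amazonec2) and import(montage): one `m1.small` type, four tasks, and their prices and execution times. `CONFIGS` fixes one binding of the variable configs(Tid, Vid), which the search engine would otherwise explore. As in r3 and r4, a path's cost sums the execution times of the nodes before `tail`.

```python
import math

# Hypothetical facts standing in for import(amazonec2) and import(montage).
PRICE = {"m1.small": 0.06}  # price(Vid, Up): dollars per hour
EXETIME = {("root", "m1.small"): 30, ("a", "m1.small"): 90,
           ("b", "m1.small"): 45, ("tail", "m1.small"): 10}  # exetime(Tid, Vid, T): minutes
EDGES = {"root": ["a", "b"], "a": ["tail"], "b": ["tail"], "tail": []}  # edge(X, Y)
# One binding of configs(Tid, Vid): every task on the hypothetical m1.small type.
CONFIGS = {"root": "m1.small", "a": "m1.small", "b": "m1.small", "tail": "m1.small"}


def exetime(tid):
    return EXETIME[(tid, CONFIGS[tid])]


def path_costs(x, y):
    """Every C with path(x, y, _, C): summed execution time of the nodes before y."""
    for z in EDGES[x]:
        if z == y:
            yield exetime(x)             # r3: direct edge, C is T
        else:
            for c1 in path_costs(z, y):  # r4: Z \== Y, recurse, C is T + C1
                yield exetime(x) + c1


def maxtime():
    """r5: the longest root-to-tail path cost (the path itself is omitted here)."""
    return max(path_costs("root", "tail"))


def cost(tid):
    """r6: C is ceil(T/60.0) * Up, i.e. hourly billing of T minutes."""
    return math.ceil(exetime(tid) / 60.0) * PRICE[CONFIGS[tid]]


def totalcost():
    """r7: the sum of the task costs, which `goal minimize Ct` minimizes."""
    return sum(cost(tid) for tid in CONFIGS)


print(maxtime(), round(totalcost(), 2))
```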
GPU Accelerations
Exploration vs. exploitation
– Exploitation prioritizes partial results.
– Exploration traverses the search tree level by level, which gives the GPU an opportunity to parallelize the search process.
Memory optimizations
– Minimize the usage of global memory
– Reduce accesses to shared memory
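Level-by-level exploration can be sketched as a breadth-first search over task-to-VM-type assignments. This is a sequential, hypothetical illustration of the idea and not Deco's GPU search engine. It assumes a chain of tasks, hourly billing and a single deadline, and it prunes partial assignments that already miss the deadline.

```python
import math


def level_by_level_search(tasks, vm_types, runtime, price, deadline):
    """Breadth-first ('exploration') search over task-to-VM-type assignments.

    Level i of the search tree fixes the VM type of task i. Nodes within a level
    are expanded independently of each other, which is the parallelism a GPU can
    exploit (e.g., one thread per node); this sketch expands them sequentially.
    Tasks are assumed to run one after another, so their runtimes add up.
    """
    frontier = [((), 0.0, 0.0)]  # (partial assignment, elapsed hours, cost)
    for task in tasks:
        next_level = []
        for assign, elapsed, cost in frontier:
            for vm in vm_types:
                t = runtime[task][vm]
                if elapsed + t <= deadline:  # prune plans that already miss the deadline
                    next_level.append((assign + (vm,), elapsed + t,
                                       cost + math.ceil(t) * price[vm]))
        frontier = next_level
    return min(frontier, key=lambda node: node[2], default=None)


runtime = {"t1": {"small": 3, "large": 1}, "t2": {"small": 2, "large": 1}}  # hours
price = {"small": 1.0, "large": 3.0}                                         # $ per hour
best = level_by_level_search(["t1", "t2"], ["small", "large"], runtime, price, deadline=4)
print(best)
```

With a 4-hour deadline, running both tasks on "small" is pruned at the second level. The cheapest feasible plan puts t1 on "large" and t2 on "small".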
Evaluation Settings
Three use cases
– Workflow scheduling problem
– Workflow ensemble [Malawski et al., SC’12]; goal: execute more high-priority workflows within a given budget and deadline
– Follow-the-cost: multiple workflows, multiple datacenters
Comparison for the workflow ensemble problem
– Algorithms: Deco vs. SPSS [Malawski et al., SC’12]
– Ensemble types: constant, uniform sorted, uniform unsorted, Pareto sorted, Pareto unsorted
– Five budgets are generated in [MinBudget, MaxBudget]
All results are normalized to those of SPSS.
Evaluation Results
Under all ensemble types and budget constraints, Deco obtains a better score than SPSS.
(Figure: scores obtained by SPSS and Deco with different ensemble types under budgets 1 to 5 and a fixed deadline; the workflow type is Ligo.)
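The score metric in the workflow ensemble problem weights completed workflows by priority. The sketch below assumes the exponential score of Malawski et al. [SC’12], in which a completed workflow with priority p (0 being the highest) contributes 2^-p. It also shows how a result is normalized to SPSS. The completed-workflow lists are hypothetical and are not measured results.

```python
def ensemble_score(completed_priorities):
    """Exponential score: each completed workflow with priority p adds 2**-p."""
    return sum(2.0 ** -p for p in completed_priorities)


# Hypothetical runs: priorities of the workflows each algorithm completed.
spss_done = [0, 1, 3, 4]
deco_done = [0, 1, 2, 3]
print(ensemble_score(deco_done) / ensemble_score(spss_done))  # score normalized to SPSS
```

Both hypothetical runs complete four workflows. The second one scores higher because it completes the priority-2 workflow instead of the priority-4 one.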
Evaluation Results (cont’d)
Programmability of WLog in Deco (lines of code)
– Without Deco, users (re-)implement the workflow application in C++.
– With Deco, users implement it in WLog.
Use case | C++ implementation | WLog
Workflow Scheduling | 1950 | 10
Workflow Ensemble | 1960 | 13
Follow-the-Cost | 2230 | 15
Deco allows much lower coding complexity than a manual implementation.
Performance Speedup of GPUs
(Figure: performance speedup of the GPU implementation over a single-core CPU implementation for the three applications: 437x, 93x and 31x.)
Conclusions
IaaS clouds have become an attractive platform for hosting workflows. Despite recent efforts in monetary cost optimizations of workflows in the cloud, there is still much room for improvement. Given the complex cloud offerings and problem specifications, we develop general optimization frameworks:
– ToF achieves up to 29% improvement over the state-of-the-art algorithm.
– Deco achieves up to 77% improvement over the state-of-the-art algorithm.
Future Work
Energy-efficient cloud
– Reduce the cloud provider's investment cost with energy-efficient hardware and software, potentially lowering instance prices
Optimization opportunities in multi-cloud environments
– Utilize different cloud offerings, e.g., instance types, to further reduce cost
References
– Maciej Malawski, Gideon Juve, Ewa Deelman, and Jarek Nabrzyski. 2012. Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds. SC '12, 11 pages.
– G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. P. Berman, and P. Maechling. 2009. Scientific workflow applications on Amazon EC2. E-Science Workshops, pp. 59-66, 9-11 Dec. 2009.
– Jens-Sönke Vöckler, Gideon Juve, Ewa Deelman, Mats Rynge, and Bruce Berriman. 2011. Experiences using cloud computing for a scientific workflow application. ScienceCloud '11, pp. 15-24.
– Ming Mao and Marty Humphrey. 2011. Auto-scaling to minimize cost and meet application deadlines in cloud workflows. SC '11: 49.
– Herald Kllapi, Eva Sitaridi, Manolis M. Tsangaris, and Yannis Ioannidis. 2011. Schedule optimization for data processing flows on the cloud. SIGMOD '11, pp. 289-300.
– Anshul Rai, Ranjita Bhagwan, and Saikat Guha. 2012. Generalized resource allocation for the cloud. SoCC '12, Article 15, 12 pages.
– Changbin Liu, Lu Ren, Boon Thau Loo, Yun Mao, and Prithwish Basu. 2012. Cologne: a declarative distributed constraint optimization platform. Proc. VLDB Endow. 5(8), pp. 752-763.
– L. De Raedt, A. Kimmig, and H. Toivonen. 2007. ProbLog: a probabilistic Prolog and its application in link discovery. IJCAI 2007, pp. 2462-2467.
– Amelie Chi Zhou and Bingsheng He. Transformation-based monetary cost optimizations for workflows in the cloud. Accepted by TCC, Dec. 2013.
– Amelie Chi Zhou and Bingsheng He. A declarative optimization framework for workflows in IaaS clouds. Submitted to SC 2014.
– Amelie Chi Zhou, Bingsheng He, and Cheng Liu. Monetary cost optimizations for hosting workflow-as-a-service in IaaS clouds. Submitted to ToC, 2014.
Thank you!
Amelie Chi Zhou, firstname.lastname@example.org
Advisor: Bingsheng He, email@example.com
Xtra Computing Group, http://pdcc.ntu.edu.sg/xtra
Nanyang Technological University, Singapore