1 CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics
By Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang (NSDI 2017). Presented by Ari Ball-Burack, Tuesday, 19 November 2019.

2 Context
Big data analytics applications run on clusters of virtual machines (VMs).
Choosing a cloud configuration (#instances, CPU, memory, disk, network) is important, especially for recurring jobs, but it is difficult and costly.
The relationship between configuration, running time, and cost is complex and nonlinear.
Brute force is expensive: more than 40 AWS and Azure instance types, 18 Google types plus customizable memory and #cores, and an effectively unlimited choice of cluster sizes.

3 Related Work
Agarwal et al. and Ferguson et al. report that up to 40% of analytics jobs are recurring.
Performance-modeling approaches can predict application performance but are hard to adapt, and accurate modeling requires knowledge of the application's structure.
Static search has high overhead.
Coordinate descent (searching one dimension at a time) is suboptimal in the presence of local minima (see the toy example below).
Venkataraman et al.: Ernest (NSDI 2016) trains a performance model for machine learning applications, but it does not work well for other applications, e.g. SQL queries.
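
A tiny illustration of the local-minimum problem mentioned above for coordinate descent, using a made-up 3x3 cost table; the numbers and dimension names are invented for illustration and are not data from the paper.

```python
# Toy illustration (not from the paper) of why one-dimension-at-a-time search
# can stall at a local minimum when configuration dimensions interact.
import numpy as np

# Hypothetical cost table: rows = cluster sizes, columns = instance types.
# The global minimum (1.0) is at (2, 2), but it is "fenced off" by high costs.
cost = np.array([
    [3.0, 4.0, 9.0],
    [4.0, 5.0, 9.0],
    [9.0, 9.0, 1.0],
])

def coordinate_descent(cost, start):
    r, c = start
    while True:
        best_r = int(np.argmin(cost[:, c]))    # optimize rows, holding the column fixed
        best_c = int(np.argmin(cost[best_r]))  # then optimize columns, holding the row fixed
        if (best_r, best_c) == (r, c):
            return r, c, cost[r, c]
        r, c = best_r, best_c

print(coordinate_descent(cost, start=(1, 1)))  # -> (0, 0, 3.0), not the global optimum (2, 2, 1.0)
```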

4 Contributions
“…we just need a model that is accurate enough for us to separate the best configuration from the rest.”
CherryPick uses Bayesian Optimization to minimize “cost” subject to a running time constraint.
The approach is non-parametric (no pre-defined form for the cost function), reaches a near-optimal solution in few samples, and tolerates uncertainty.
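
As a rough sketch, the search problem described on this slide (and quantified on the next) can be written as a constrained minimization. The symbols below, x for a candidate configuration, P for price per unit time, T for running time, and T_max for the deadline, are my own labels for illustration rather than notation taken from the paper.

```latex
% Illustrative formulation only; symbol names are assumptions, not the paper's notation.
\begin{aligned}
\min_{x \in \mathcal{X}} \quad & C(x) = P(x)\,T(x) && \text{(cost = price per unit time} \times \text{running time)} \\
\text{s.t.} \quad & T(x) \le T_{\max} && \text{(running time constraint)}
\end{aligned}
```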

5 Bayesian Optimization Specifics
Cost is the product of running time and price.
Acquisition function: maximize Expected Improvement (EI).
Prior function: Gaussian Process.
Start with I (= 3) initial samples.
Stop when EI is less than a threshold (= 10%) and at least N (= 6) configurations have been explored.
Built on top of Spearmint in Python.
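
To make the loop above concrete, here is a minimal sketch of a Bayesian-optimization search with a Gaussian Process prior and an Expected Improvement acquisition function. It uses scikit-learn and SciPy rather than Spearmint (which CherryPick is actually built on), and run_job, the toy candidate grid, and the constants are illustrative placeholders, not CherryPick's real implementation.

```python
# Illustrative Bayesian-optimization loop in the spirit of this slide.
# Uses scikit-learn's Gaussian process plus a hand-rolled Expected Improvement;
# run_job() is a stand-in for deploying a cluster and timing the workload.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def run_job(config):
    # Placeholder: return total cost (price/hour * hours) for this configuration.
    return float(np.sum(config ** 2))  # toy objective

def expected_improvement(gp, X_candidates, best_cost, xi=0.01):
    mu, sigma = gp.predict(X_candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    imp = best_cost - mu - xi          # improvement over the best cost so far (minimizing)
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

def cherrypick_style_search(candidates, n_init=3, min_samples=6, ei_thresh=0.10,
                            rng=np.random.default_rng(0)):
    # Start with a few randomly chosen initial samples.
    sampled_idx = list(rng.choice(len(candidates), size=n_init, replace=False))
    costs = [run_job(candidates[i]) for i in sampled_idx]

    while True:
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(candidates[sampled_idx], costs)

        best_cost = min(costs)
        ei = expected_improvement(gp, candidates, best_cost)
        ei[sampled_idx] = 0.0  # do not re-sample configurations already run

        # Stop when EI (relative to the best observed cost) drops below the
        # threshold and at least `min_samples` configurations have been explored.
        if ei.max() / best_cost < ei_thresh and len(sampled_idx) >= min_samples:
            break

        nxt = int(np.argmax(ei))
        sampled_idx.append(nxt)
        costs.append(run_job(candidates[nxt]))

    best = sampled_idx[int(np.argmin(costs))]
    return candidates[best], min(costs)

if __name__ == "__main__":
    # Toy 2-D "configuration space" standing in for (#instances, instance type).
    grid = np.array([[a, b] for a in np.linspace(-2, 2, 8) for b in np.linspace(-2, 2, 8)])
    config, cost = cherrypick_style_search(grid)
    print("best config:", config, "estimated cost:", cost)
```

The stopping rule here interprets “EI less than 10%” as EI relative to the best cost observed so far, which is one plausible reading of the slide; the real system also has to honor the running-time constraint from slide 4, which this sketch omits.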

6 Bayesian Optimization Example

7 Experimental Results
Evaluated with 5 types of analytics applications on 66 AWS EC2 cloud configurations.
Compared against exhaustive search, coordinate descent, random search with a budget, and Ernest.
CherryPick finds the exact optimum 45-90% of the time and is within 5% of the optimum at the median; results improve with a lower EI stopping threshold.
Exhaustive search requires far more time and 6-9 times the cost.
CherryPick is more stable than coordinate descent, with a slightly better median and a tail closer to optimal.
It is better and more stable than random search with a budget; only at 4x the search cost does random search perform similarly.
It achieves running cost similar to Ernest's with lower search cost.

8 Experimental Results

9 Evaluation Strengths
Clear improvement over existing systems (Ernest) and over more naïve approaches.
The authors are thorough in describing drawbacks (dependence on representative workloads, acquisition-function slowdown for larger search spaces) and directions for future research (Monte Carlo acquisition functions, non-Gaussian Process prior functions).

10 Evaluation Critiques
A “45-90% chance” of finding the optimal configuration is vague!
Representative workload problem: acknowledged, but still difficult (requires human intuition, and sometimes deliberate workload inflation).
“We admit that a better prior [than Gaussian Process] might be found given some domain knowledge of specific applications.” Why not make the prior an optional parameter, as the EI threshold is?
For recurring jobs, might paying 6-9 times the search cost be worth a 5% reduction in running cost?

11 Questions?

