MINERVA: an automated resource provisioning tool for large-scale storage systems G. Alvarez, E. Borowsky, S. Go, T. Romer, R. Becker-Szendy, R. Golding,


1 MINERVA: an automated resource provisioning tool for large-scale storage systems G. Alvarez, E. Borowsky, S. Go, T. Romer, R. Becker-Szendy, R. Golding, A. Merchant, M. Spasojevic, A. Veitch, J. Wilkes

2 Large Scale Storage Systems
- Very difficult to configure and design
  - 10 – 100s of host computers
  - 10 – 100s of storage devices
  - 10 – 1000s of disks/logical volumes
  - Terabytes of capacity
- Meet throughput demands
- Maximize capacity utilization
- Automation would be nice…

3 MINERVA
- Subdivide problem into three stages
  - Choose correct device set
  - Choose correct configuration parameters
  - Map user data onto devices
  - NP-hard
- Architectural elements
  - Declarative descriptions of storage workload requirements
  - Constraint-based problem representation
  - Optimization strategies and heuristics
  - Analytic performance models

4 MINERVA Inputs
- Workload Description
  - Data type descriptions and access patterns
  - Two types
    - Stores
      - Logically contiguous data (db table or filesystem)
    - Streams
      - Sequences of accesses on a store (pattern and throughput)
- Device Descriptions
  - Disk information (number, size, and type)
  - Array information (number of LUNs)
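
The inputs above can be pictured as plain records. The following is a minimal sketch, not MINERVA's actual description format: every class and field name (`Stream`, `Store`, `DeviceDescription`, `request_rate_iops`, and so on) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stream:
    """A sequence of accesses on a store (hypothetical fields)."""
    name: str
    request_rate_iops: float   # average requests per second
    request_size_kb: float     # average request size
    read_fraction: float       # 0.0 = all writes, 1.0 = all reads
    sequential: bool           # access pattern flag


@dataclass
class Store:
    """Logically contiguous data, e.g. a database table or a filesystem."""
    name: str
    capacity_gb: float
    streams: List[Stream] = field(default_factory=list)

    def bandwidth_mb_s(self) -> float:
        # Aggregate throughput demanded by all streams on this store.
        kb_per_s = sum(s.request_rate_iops * s.request_size_kb for s in self.streams)
        return kb_per_s / 1024.0


@dataclass
class DeviceDescription:
    """Disk and array information for one candidate array type."""
    array_type: str
    num_disks: int
    disk_size_gb: float
    disk_type: str
    max_luns: int


orders = Store("orders", capacity_gb=20.0,
               streams=[Stream("scan", 1024.0, 8.0, 1.0, True)])
```

Here `orders.bandwidth_mb_s()` is the throughput the store's single stream demands, in MB/s.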

5 MINERVA Objects

6 MINERVA Outputs
- Assignment
  - Device set taken from Device Descriptions
  - Mapping of stores to devices
  - $2^n n^m$ possible configurations
  - $O((2m)^m)$ complexity
- Goal
  - Minimum cost that meets performance requirements
- Effector tool
  - Takes assignment as input
  - Automated configuration of physical devices
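
To get a feel for the size of the search space, the count on this slide can be evaluated directly. The slide does not define its symbols; this sketch assumes n is the number of candidate devices (each either used or not, giving a factor of 2 to the power n) and m is the number of stores (each mapped to one of n devices, giving n to the power m). The function name is hypothetical.

```python
def configuration_count(n_devices: int, m_stores: int) -> int:
    """Number of (device subset, store mapping) pairs under the assumed reading."""
    return (2 ** n_devices) * (n_devices ** m_stores)


# With as many devices as stores the count equals (2m)^m.
for m in (3, 5, 10):
    print(m, configuration_count(m, m), (2 * m) ** m)
```

Under this reading, even ten stores on ten devices already yield more than ten trillion configurations, which is why exhaustive search is not an option.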

7 Storage System Lifecycle

8 Architecture
- Array Allocation
  - Tagger
    - Assigns a preferred RAID level
  - Allocator
    - Determines number of arrays
- Array Configuration
  - Array Designer
    - Actually configures the arrays
- Store Assignment
  - Solver
    - Assigns stores to LUNs
  - Optimizer
    - Prunes unused resources and balances load
- Evaluator
  - Verifies design with analytic models

9 Architecture

10 MINERVA Process

11 Analytical Device Models
- Determine feasibility
- Predicted throughput error rate = 20%
- Streams
  - Modeled as ON-OFF Markov-modulated Poisson process
- Arrays
  - Array controller, bus connection, disks
- Case Study
  - HP SureStore Model 30/FC High Availability disk array
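
For an ON-OFF Markov-modulated Poisson process, requests arrive at a fixed rate while the stream is ON and not at all while it is OFF, with exponentially distributed ON and OFF periods. The long-run request rate then follows from the fraction of time spent ON. This is a minimal sketch of that standard result; the function and parameter names are hypothetical and the numbers are illustrative, not taken from the presentation.

```python
def on_off_mean_rate(rate_on: float, mean_on_s: float, mean_off_s: float) -> float:
    """Long-run arrival rate of an ON-OFF Poisson stream."""
    # Stationary probability of finding the stream in the ON state.
    p_on = mean_on_s / (mean_on_s + mean_off_s)
    return rate_on * p_on


# A stream issuing 100 requests/s while ON, ON for 1 s and OFF for 3 s on average.
average_iops = on_off_mean_rate(100.0, 1.0, 3.0)
```

The gap between the ON rate and the long-run average is what makes phasing matter: two bursty streams that are never ON together can share a LUN that could not serve both bursts at once.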

12 Tagger
- Choose storage class based on access pattern
  - RAID 1/0 or RAID 5
- Rule based
  1. Determines capacity-bound stores
  2. Estimates average number of I/O operations per second (IOPS)

13 Capacity Rules
- Calculated per GB of storage
- Capacity bound = RAID 5

14 IOPS Estimation
- RAID level = least number of per-disk IOPS
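
The two tagger rules can be sketched as a small function. This is an illustration under stated assumptions, not the tagger's real rule set: the `iops_per_gb_threshold` cut-off and the `raid5_width` group size are hypothetical, both RAID levels are compared over the same number of disks, and the disk-I/O costs are the textbook ones (a RAID 1/0 write costs 2 disk I/Os, a small RAID 5 write costs 4, a full-stripe RAID 5 write costs width/(width-1) per data block).

```python
def tag_store(capacity_gb: float, read_iops: float, write_iops: float,
              large_writes: bool = False,
              iops_per_gb_threshold: float = 1.0,   # hypothetical cut-off
              raid5_width: int = 5) -> str:          # hypothetical group size
    """Return the preferred RAID level for one store."""
    # Rule 1: few I/Os per GB means the store is capacity bound -> RAID 5.
    if (read_iops + write_iops) / capacity_gb < iops_per_gb_threshold:
        return "RAID 5"

    # Rule 2: estimate back-end disk I/Os under each level, pick the smaller.
    raid10_ios = read_iops + 2.0 * write_iops
    raid5_write_cost = raid5_width / (raid5_width - 1) if large_writes else 4.0
    raid5_ios = read_iops + raid5_write_cost * write_iops
    return "RAID 5" if raid5_ios < raid10_ios else "RAID 1/0"


print(tag_store(100.0, 10.0, 10.0))                      # capacity bound
print(tag_store(10.0, 100.0, 100.0))                     # small random writes
print(tag_store(10.0, 100.0, 100.0, large_writes=True))  # full-stripe writes
```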

15 Allocator
- Selects a reasonable set of arrays
- 3 steps
  - Consider type and number of arrays
  - Consider array configurations
  - Consider LUN divisions and RAID configurations

16 Allocator models
- Can only use analytic device models
  - Ignores stream phasing
- Rillifier handles large resource demands
  - Distribute workload among different LUNs
  - Stores become shards
    - Excessive capacity requirements
  - Streams become rills
    - Excessive throughput requirements
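
The splitting step can be sketched as follows. This is a simplification under assumptions the slide does not state: the piece count is driven by whichever of capacity or throughput exceeds a single LUN's limit by more, and the demand is divided evenly. The function name and the per-LUN limits are hypothetical.

```python
import math
from typing import List, Tuple


def rillify(capacity_gb: float, bandwidth_mb_s: float,
            max_lun_gb: float, max_lun_mb_s: float) -> List[Tuple[float, float]]:
    """Split one store and its stream demand into (shard_gb, rill_mb_s) pieces."""
    pieces = max(
        1,
        math.ceil(capacity_gb / max_lun_gb),       # too big -> shards
        math.ceil(bandwidth_mb_s / max_lun_mb_s),  # too fast -> rills
    )
    # Even split; each piece is guaranteed to fit within one LUN's limits.
    return [(capacity_gb / pieces, bandwidth_mb_s / pieces)] * pieces


shards = rillify(100.0, 10.0, max_lun_gb=40.0, max_lun_mb_s=20.0)
```

In this example the 100 GB store exceeds the 40 GB limit, so it is cut into three shards, each carrying a third of the throughput.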

17 Allocator Search
- Uses Branch-and-Bound strategy
- Determines number of array types
  - Chooses lowest cost that supports workload
- Searches array configurations
  - Starts with mixed arrays
  - Iteratively converts arrays to dedicated types
- Branch-and-Bound, dedicated bias
  - Searches in reverse order starting with dedicated types
- Calls array designer with configuration
  - If array designer fails, search continues
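
Branch-and-bound can be illustrated on a much smaller version of the allocator's first step: choosing how many arrays of each type to buy. The sketch below is a generic branch-and-bound, not MINERVA's search. It assumes each array type is summarized by a hypothetical (cost, capacity, bandwidth) triple and that the workload is summarized by total capacity and bandwidth only.

```python
from typing import List, Optional, Tuple

ArrayType = Tuple[float, float, float]  # (cost, capacity_gb, bandwidth_mb_s)


def cheapest_array_mix(array_types: List[ArrayType], need_gb: float,
                       need_mb_s: float, max_per_type: int = 8
                       ) -> Tuple[float, Optional[List[int]]]:
    """Return (lowest cost, arrays bought per type) that covers the workload."""
    best = {"cost": float("inf"), "counts": None}

    def search(i: int, counts: List[int], cost: float, gb: float, mb_s: float) -> None:
        if cost >= best["cost"]:
            return  # bound: adding arrays never lowers cost, so prune here
        if gb >= need_gb and mb_s >= need_mb_s:
            # Feasible: record it and stop, more arrays would only cost more.
            best["cost"] = cost
            best["counts"] = counts + [0] * (len(array_types) - i)
            return
        if i == len(array_types):
            return  # all types decided but workload still not covered
        c, g, b = array_types[i]
        for k in range(max_per_type + 1):  # branch on the count of type i
            search(i + 1, counts + [k], cost + k * c, gb + k * g, mb_s + k * b)

    search(0, [], 0.0, 0.0, 0.0)
    return best["cost"], best["counts"]


result = cheapest_array_mix([(10.0, 100.0, 50.0), (25.0, 300.0, 60.0)],
                            need_gb=300.0, need_mb_s=100.0)
```

With these illustrative numbers, three small arrays (cost 30) beat one large plus one small (cost 35), and every branch costing 30 or more after that point is pruned without being expanded.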

18 Array Designer
- Determines LUN sizes and array parameters
- Starts with simple cases of equal size LUNs
- Also considers greedy configuration
- Workload description determines LUN size
- Relies on Optimizer to take care of unused capacity
- Target disk assignment done with round robin across buses

19 Solver
- Assigns stores to LUNs
- Multidimensional constrained bin-packing
- Uses analytic device models to evaluate objective function
- Constraints:
  - LUN capacity
  - LUN phased utilization
  - Array bus bandwidth
  - Array controller utilization

20 Solver Heuristics
- Simple Random
  - 50 random cases using first fit
- Toyoda
  - Best fit using gradient function
  - Objective function combined with economic utilization
    - (1/penalty - lun_cost)
  - Favors LUNs already in use or low cost
  - LUNs filled in order of increasing cost
  - Minimizes resource contention
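
The Simple Random heuristic can be sketched as repeated first fit over shuffled store orders, keeping the cheapest feasible result. This is a reduced illustration: it checks only two of the solver's constraints (LUN capacity and a single utilization figure), replaces the analytic device models with simple sums, and uses hypothetical names and tuple layouts.

```python
import random
from typing import Dict, List, Optional, Tuple

StoreDemand = Tuple[float, float]  # (capacity_gb, utilization in 0..1)
Lun = Tuple[float, float]          # (capacity_gb, cost)


def first_fit(stores: List[StoreDemand], luns: List[Lun]) -> Optional[List[int]]:
    """Place each store on the first LUN with room; None if one does not fit."""
    free = [[gb, 1.0] for gb, _ in luns]  # remaining [capacity, utilization]
    placed = []
    for gb, util in stores:
        for j, (free_gb, free_util) in enumerate(free):
            if gb <= free_gb and util <= free_util:
                free[j][0] -= gb
                free[j][1] -= util
                placed.append(j)
                break
        else:
            return None
    return placed


def simple_random(stores: List[StoreDemand], luns: List[Lun],
                  trials: int = 50, seed: int = 0
                  ) -> Tuple[float, Optional[Dict[int, int]]]:
    """Try `trials` random store orders; return (cost, store index -> LUN index)."""
    rng = random.Random(seed)
    order = list(range(len(stores)))
    best_cost, best = float("inf"), None
    for _ in range(trials):
        rng.shuffle(order)
        placed = first_fit([stores[i] for i in order], luns)
        if placed is None:
            continue
        cost = sum(luns[j][1] for j in set(placed))  # pay only for LUNs in use
        if cost < best_cost:
            best_cost, best = cost, dict(zip(order, placed))
    return best_cost, best


cost, mapping = simple_random([(50.0, 0.2)] * 4, [(100.0, 1.0)] * 3)
```

First fit is sensitive to the order in which stores arrive, which is why the heuristic samples many orders instead of trusting one.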

21 Solver Heuristics 2
- ToyodaWeighted
  - Maps gradients against remaining available resources
  - Maps stores to LUNs such that utilization is balanced
  - Objective_function * cos(α)
  - Objective_function = max_lun_cost - lun_cost
  - Minimizes cost
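
The weighting term can be made concrete with a small sketch. The slide does not define α; this example assumes it is the angle between a store's resource-demand vector and a LUN's remaining-resource vector, so that a store whose demands are shaped like what the LUN has left scores highest. The function names and vector layout are hypothetical.

```python
import math
from typing import Sequence


def cos_alpha(demand: Sequence[float], remaining: Sequence[float]) -> float:
    """Cosine of the angle between demand and remaining-resource vectors."""
    dot = sum(d * r for d, r in zip(demand, remaining))
    norm = math.sqrt(sum(d * d for d in demand)) * math.sqrt(sum(r * r for r in remaining))
    return dot / norm if norm else 0.0


def toyoda_weighted_score(demand: Sequence[float], remaining: Sequence[float],
                          lun_cost: float, max_lun_cost: float) -> float:
    """Objective_function * cos(alpha), with Objective_function = max_lun_cost - lun_cost."""
    return (max_lun_cost - lun_cost) * cos_alpha(demand, remaining)


# Vectors are (capacity, utilization). A capacity-heavy store scores higher
# on a LUN that has mostly capacity left than on one with mostly utilization left.
good = toyoda_weighted_score((1.0, 0.0), (1.0, 0.0), lun_cost=2.0, max_lun_cost=5.0)
poor = toyoda_weighted_score((1.0, 0.0), (0.0, 1.0), lun_cost=2.0, max_lun_cost=5.0)
```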

22 Toyoda and ToyodaWeighted

23 Optimizer
- Reruns Solver against configuration
- Reduces required arrays
- Runs ToyodaWeighted with new objective function
  - Objective_value = 1 - lun_utilization
  - Assigns stores to underutilized LUNs
- Variations
  - Simple Random
    - Randomized first fit, chooses lowest utilization variance
  - Simple Balanced
    - Round robin first fit, based on capacity and utilization constraints
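
The optimizer's objective and the selection rule of its Simple Random variation can be sketched in a few lines. The candidate assignments are represented here only by their per-LUN utilizations, and the function names are hypothetical.

```python
from statistics import pvariance
from typing import List


def objective_value(lun_utilization: float) -> float:
    """Higher for emptier LUNs, so stores gravitate toward underutilized ones."""
    return 1.0 - lun_utilization


def most_balanced(candidates: List[List[float]]) -> List[float]:
    """Among candidate assignments, pick the lowest utilization variance."""
    return min(candidates, key=pvariance)


# Two candidate assignments over the same two LUNs (illustrative numbers).
chosen = most_balanced([[0.9, 0.1], [0.5, 0.5]])
```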

24 Clusterer
- Addresses performance scaling issues
  - With many stores runtime grew to days
- Combines multiple stores into a cluster
  - Cluster is mapped instead of stores
- Cluster rules based on observation
  - 10 MB/s bandwidth
  - 2 GB size
- Increases cost ~3%
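
One way to read the cluster rules is as caps on a cluster's aggregate bandwidth and size. The slide does not say how the two figures are applied, so that reading is an assumption, as is the greedy in-order grouping used in this sketch; the function name and tuple layout are hypothetical.

```python
from typing import List, Tuple

StoreInfo = Tuple[str, float, float]  # (name, size_gb, bandwidth_mb_s)


def cluster_stores(stores: List[StoreInfo], max_mb_s: float = 10.0,
                   max_gb: float = 2.0) -> List[List[str]]:
    """Greedily group stores until a cluster would exceed either cap."""
    clusters: List[List[str]] = []
    current: List[str] = []
    cur_gb = cur_bw = 0.0
    for name, gb, mb_s in stores:
        # Close the current cluster if adding this store would break a cap.
        if current and (cur_gb + gb > max_gb or cur_bw + mb_s > max_mb_s):
            clusters.append(current)
            current, cur_gb, cur_bw = [], 0.0, 0.0
        current.append(name)
        cur_gb += gb
        cur_bw += mb_s
    if current:
        clusters.append(current)
    return clusters


groups = cluster_stores([("a", 1.0, 4.0), ("b", 0.5, 4.0),
                         ("c", 1.0, 1.0), ("d", 0.5, 9.5)])
```

The solver then places three clusters instead of four stores; with thousands of small stores the reduction in problem size is what brings the runtime back down.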

25 Evaluation
- Analytic model performance predictions
- Evaluate sensitivity to workload changes
- Effect of design changes
- Measure live system

26 Model Validation
- Based on single FC-30
- Ran performance tests on physical system
- Compared results to model predictions
- Results showed mean error rate of +5.4%
  - Range of [-11%, +19%]
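
A signed mean error and a range like the ones reported above can be computed from paired predictions and measurements. This sketch assumes the sign convention that a positive error means the model over-predicts; the slide does not state the convention, and the throughput numbers below are made up for illustration, not the presentation's data.

```python
from typing import List, Tuple


def relative_errors(predicted: List[float], measured: List[float]) -> List[float]:
    """Signed error of each prediction, in percent of the measured value."""
    return [(p - m) / m * 100.0 for p, m in zip(predicted, measured)]


def summarize(errors: List[float]) -> Tuple[float, float, float]:
    """Return (mean, min, max) of the signed errors."""
    return sum(errors) / len(errors), min(errors), max(errors)


# Illustrative throughput values only.
errs = relative_errors([110.0, 90.0, 120.0], [100.0, 100.0, 100.0])
mean_err, low, high = summarize(errs)
```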

27 Safety and Sensitivity
- Examined scaling of workload parameters
- Start with baseline workload, then modify a single parameter
- Wanted to have 3 effects
  - Mixing of appropriate RAID levels
  - Requiring non-trivial number of arrays (2+)
  - Balanced store performance requirements

28 Scaling Store Size and Bandwidth
- Store size scaling
  - System becomes capacity bound
  - Creates RAID 5 LUNs
  - System size scales linearly with store size
- Bandwidth scaling
  - Ratio of RAID 1/0 to RAID 5 increases linearly

29

30 Scaling Number of Stores
- Number of arrays scales linearly with stores

31 Running time
- Quadratic increase with number of stores

32 Workload Variability
- Workload attributes randomly taken from log-normal distribution
  - Baseline values = mean distribution values
- Capacity utilization drops with increased variability
  - RAID 5 LUNs increase
  - Segmentation increases
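
Drawing attributes from a log-normal distribution whose mean equals the baseline requires shifting the underlying normal's location, because a log-normal with parameters mu and sigma has mean exp(mu + sigma^2/2). A minimal sketch, with hypothetical function names and sigma standing in for the variability knob:

```python
import math
import random
from typing import Tuple


def lognormal_params(mean: float, sigma: float) -> Tuple[float, float]:
    """Return (mu, sigma) such that the log-normal's mean equals `mean`."""
    return math.log(mean) - sigma ** 2 / 2.0, sigma


def sample_attribute(baseline: float, sigma: float, rng: random.Random) -> float:
    """Draw one workload attribute value centred (in mean) on the baseline."""
    mu, s = lognormal_params(baseline, sigma)
    return rng.lognormvariate(mu, s)


rng = random.Random(42)
mu, s = lognormal_params(100.0, 0.5)
sizes = [sample_attribute(100.0, 0.5, rng) for _ in range(5)]
```

Raising sigma keeps the average demand fixed while spreading individual stores further apart, which is the setting in which the slide reports falling capacity utilization.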

33 Workload variance

34 Whole System Validation
- MINERVA vs. Human Expert
- 3 aspects
  - Comparison of resultant system cost
  - Comparison of application performance
  - Low runtime and minimal human interaction
- Based on TPC-D benchmark
  - Decision Support system based on DB queries
- Human designers from HP system benchmarking team

35 Execution Times

