MINERVA: an automated resource provisioning tool for large-scale storage systems G. Alvarez, E. Borowsky, S. Go, T. Romer, R. Becker-Szendy, R. Golding,


MINERVA: an automated resource provisioning tool for large-scale storage systems G. Alvarez, E. Borowsky, S. Go, T. Romer, R. Becker-Szendy, R. Golding, A. Merchant, M. Spasojevic, A. Veitch, J. Wilkes

Large Scale Storage Systems
- Very difficult to configure and design
  - 10 – 100s of host computers
  - 10 – 100s of storage devices
  - 10 – 1000s of disks/logical volumes
  - Terabytes of capacity
- Meet throughput demands
- Maximize capacity utilization
- Automation would be nice…

MINERVA
- Subdivide problem into three stages
  - Choose correct device set
  - Choose correct configuration parameters
  - Map user data onto devices
  - NP-hard
- Architectural elements
  - Declarative descriptions of storage workload requirements
  - Constraint-based problem representation
  - Optimization strategies and heuristics
  - Analytic performance models

MINERVA Inputs
- Workload Description
  - Data type descriptions and access patterns
  - Two types
    - Stores
      - Logically contiguous data (db table or filesystem)
    - Streams
      - Sequences of accesses on a store (pattern and throughput)
- Device Descriptions
  - Disk information (number, size, and type)
  - Array information (number of LUNs)
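
The inputs listed on this slide can be pictured as a few plain records. The sketch below is only an illustration: every class and field name (`Stream`, `Store`, `ArrayDescription`, `read_fraction`, and so on) is hypothetical and is not taken from MINERVA's actual input language.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stream:
    """A sequence of accesses against one store (hypothetical fields)."""
    name: str
    request_rate_iops: float   # average requests per second
    request_size_kb: float     # average request size in KB
    read_fraction: float       # 1.0 = all reads, 0.0 = all writes
    sequential: bool           # access pattern: sequential or random

    def bandwidth_mb_s(self) -> float:
        # Throughput implied by request rate and request size.
        return self.request_rate_iops * self.request_size_kb / 1024.0


@dataclass
class Store:
    """Logically contiguous data, e.g. a database table or a filesystem."""
    name: str
    capacity_gb: float
    streams: List[Stream] = field(default_factory=list)

    def total_iops(self) -> float:
        return sum(s.request_rate_iops for s in self.streams)

    def total_bandwidth_mb_s(self) -> float:
        return sum(s.bandwidth_mb_s() for s in self.streams)


@dataclass
class ArrayDescription:
    """Device description: disk and array information (assumed fields)."""
    model: str
    disk_count: int
    disk_size_gb: float
    disk_type: str
    max_luns: int
```

A workload is then a list of `Store` objects, and the candidate hardware is a list of `ArrayDescription` objects.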

MINERVA Objects

MINERVA Outputs
- Assignment
  - Device set taken from Device Descriptions
  - Mapping of stores to devices
  - 2^n n^m possible configurations
  - O((2m)^m) complexity
- Goal
  - Minimum cost that meets performance requirements
- Effector tool
  - Takes assignment as input
  - Automated configuration of physical devices
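
To get a feel for the size of the search space, the snippet below evaluates the configuration count under one reading of the slide: the count is 2^n · n^m, with n taken as the number of candidate devices and m as the number of stores. The slide does not define either symbol, so both the formula layout and the meaning of `n_devices` and `m_stores` are assumptions.

```python
def configuration_count(n_devices: int, m_stores: int) -> int:
    """Count configurations as 2^n * n^m (assumed reading of the slide).

    2^n: which subset of the n candidate devices is used.
    n^m: which device each of the m stores is mapped to.
    """
    return (2 ** n_devices) * (n_devices ** m_stores)


if __name__ == "__main__":
    for n, m in [(2, 3), (10, 20), (50, 100)]:
        print(n, m, configuration_count(n, m))
```

Even small inputs produce counts far too large to enumerate, which is why the later stages rely on heuristics.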

Storage System Lifecycle

Architecture
- Array Allocation
  - Tagger
    - Assigns a preferred RAID level
  - Allocator
    - Determines number of arrays
- Array Configuration
  - Array Designer
    - Actually configures the arrays
- Store Assignment
  - Solver
    - Assigns stores to LUNs
  - Optimizer
    - Prunes unused resources and balances load
  - Evaluator
    - Verifies design with analytic models

Architecture

MINERVA Process

Analytical Device Models
- Determines feasibility
- Predicted throughput error rate = 20%
- Streams
  - Modeled as ON-OFF Markov-modulated Poisson process
- Arrays
  - Array controller, bus connection, disks
- Case Study
  - HP SureStore Model 30/FC High Availability disk array
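
An ON-OFF Poisson source alternates between bursts, during which requests arrive as a Poisson process, and silent periods. The sketch below assumes exponentially distributed ON and OFF durations (the two-state Markov-modulated case). The function and parameter names are hypothetical, and this is not MINERVA's device model, only an illustration of the stream abstraction.

```python
import random


def on_fraction(mean_on_s: float, mean_off_s: float) -> float:
    """Long-run fraction of time the stream spends in the ON state."""
    return mean_on_s / (mean_on_s + mean_off_s)


def mean_rate(rate_on: float, mean_on_s: float, mean_off_s: float) -> float:
    """Long-run average request rate (requests per second)."""
    return rate_on * on_fraction(mean_on_s, mean_off_s)


def simulate_arrivals(rate_on, mean_on_s, mean_off_s, horizon_s, seed=0):
    """Generate arrival times in [0, horizon_s), starting in the ON state."""
    rng = random.Random(seed)
    t, on, arrivals = 0.0, True, []
    while t < horizon_s:
        # Exponential sojourn time in the current state.
        mean = mean_on_s if on else mean_off_s
        end = min(t + rng.expovariate(1.0 / mean), horizon_s)
        if on:
            # Poisson arrivals: exponential gaps at rate_on while ON.
            a = t + rng.expovariate(rate_on)
            while a < end:
                arrivals.append(a)
                a += rng.expovariate(rate_on)
        t, on = end, not on
    return arrivals
```

Two independent streams with ON fractions p1 and p2 are both active a fraction p1 · p2 of the time, which is the kind of overlap a utilization model has to account for.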

Tagger
- Choose storage class based on access pattern
  - RAID 1/0 or RAID 5
- Rule based
  1. Determines capacity-bound stores
  2. Estimates average number of I/O operations per second (IOPS)

Capacity Rules
- Calculated per GB of storage
- Capacity bound = RAID 5

IOPS Estimation
- RAID level = least number of per-disk IOPS
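
The two tagging rules can be sketched as follows. The slides give neither the capacity-bound threshold nor the IOPS formulas, so everything numeric here is an assumption: the threshold `capacity_bound_iops_per_gb` is hypothetical, and the disk I/O costs are the usual textbook ones (a RAID 1/0 write touches two disks, a small RAID 5 write costs four disk I/Os, a full-stripe RAID 5 write costs k/(k-1) per logical write on k disks).

```python
def backend_iops(read_iops, write_iops, raid, sequential_writes, disks):
    """Disk-level I/Os per second generated by a store (textbook costs)."""
    if raid == "RAID 1/0":
        return read_iops + 2.0 * write_iops          # mirrored writes
    if sequential_writes:
        # Full-stripe writes: one parity write per (disks - 1) data writes.
        return read_iops + write_iops * disks / (disks - 1)
    return read_iops + 4.0 * write_iops              # read-modify-write


def tag_store(capacity_gb, read_iops, write_iops, sequential_writes=False,
              disks=5, capacity_bound_iops_per_gb=1.0):
    """Return the preferred RAID level for one store."""
    # Rule 1: low I/O density per GB means the store is capacity bound.
    if (read_iops + write_iops) / capacity_gb < capacity_bound_iops_per_gb:
        return "RAID 5"
    # Rule 2: otherwise pick the level with the fewest per-disk IOPS.
    per_disk = {
        raid: backend_iops(read_iops, write_iops, raid,
                           sequential_writes, disks) / disks
        for raid in ("RAID 1/0", "RAID 5")
    }
    return min(per_disk, key=per_disk.get)  # ties go to RAID 1/0
```

With these assumed costs, a random read/write mix lands on RAID 1/0, while a large, rarely accessed store or a sequential-write stream lands on RAID 5.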

Allocator
- Reasonable set of arrays
- 3 steps
  - Consider type and number of arrays
  - Consider array configurations
  - Consider LUN divisions and RAID configurations

Allocator models
- Can only use analytic device models
  - Ignores stream phasing
- Rillifier handles large resource demands
  - Distribute workload among different LUNs
  - Stores become shards
    - Excessive capacity requirements
  - Streams become rills
    - Excessive throughput requirements
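
The splitting step can be sketched as an even division of an oversized demand into the fewest pieces that each fit one LUN. The per-LUN limits `lun_gb` and `lun_mb_s` and the even-split policy are assumptions for illustration; the slides do not say how MINERVA sizes shards and rills.

```python
import math


def split_evenly(total: float, limit: float) -> list:
    """Split `total` into the fewest equal pieces that are each <= limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    pieces = max(1, math.ceil(total / limit))
    return [total / pieces] * pieces


def rillify(store_gb, stream_mb_s, lun_gb, lun_mb_s):
    """Return (shards, rills) for one store and one of its streams.

    Shards split excessive capacity; rills split excessive throughput.
    """
    shards = split_evenly(store_gb, lun_gb)
    rills = split_evenly(stream_mb_s, lun_mb_s)
    return shards, rills
```

For example, a 250 GB store against a 100 GB LUN limit becomes three shards, while a store that already fits stays in one piece.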

Allocator Search
- Uses Branch-and-Bound strategy
- Determines number of array types
  - Chooses lowest cost that supports workload
- Searches array configurations
  - Starts with mixed arrays
  - Iteratively converts arrays to dedicated types
  - Branch and Bound-bias dedicated
    - Searches in reverse order starting with dedicated types
- Calls array designer with configuration
  - If array designer fails, search continues
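
As a generic illustration of branch-and-bound for the "lowest cost that supports the workload" step, the sketch below picks how many arrays of each type to buy so that total capacity and total IOPS cover the demand. The array types, their numbers, and the `max_per_type` cap are hypothetical, and the mixed-versus-dedicated configuration search from the slide is not modeled.

```python
import math


def cheapest_array_mix(types, need_gb, need_iops, max_per_type=8):
    """types: list of (cost, capacity_gb, iops) per array type.

    Returns (best_cost, counts), or (math.inf, None) if nothing fits.
    """
    best = [math.inf, None]

    def search(i, counts, cost, gb, iops):
        # Bound: a partial solution that already costs as much as the
        # best complete one cannot improve on it.
        if cost >= best[0]:
            return
        if gb >= need_gb and iops >= need_iops:
            best[0] = cost
            best[1] = counts + [0] * (len(types) - i)
            return
        if i == len(types):
            return
        c, g, p = types[i]
        # Branch: try every count of array type i.
        for k in range(max_per_type + 1):
            search(i + 1, counts + [k], cost + k * c, gb + k * g, iops + k * p)

    search(0, [], 0.0, 0.0, 0.0)
    return best[0], best[1]
```

The pruning test is what separates this from plain enumeration: once a feasible mix is known, every branch whose running cost already matches or exceeds it is skipped.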

Array Designer
- Determines LUN sizes and array parameters
  - Starts with simple cases of equal-size LUNs
  - Also considers greedy configuration
    - Workload description determines LUN size
  - Relies on Optimizer to take care of unused capacity
- Target disk assignment done with round robin across buses

Solver
- Assigns stores to LUNs
- Multidimensional constrained bin-packing
- Uses analytic device models to evaluate objective function
- Constraints:
  - LUN capacity
  - LUN phased utilization
  - Array bus bandwidth
  - Array controller utilization
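
The four constraints can be read as a feasibility test that must pass before a store is placed on a LUN. The sketch below is a simplification under stated assumptions: demands are treated as additive, whereas the slide's phased utilization depends on which streams are active together, and all class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Array:
    bus_mb_s: float                      # bus bandwidth limit
    bus_used_mb_s: float = 0.0
    controller_utilization: float = 0.0  # 0.0 to 1.0


@dataclass
class Lun:
    array: Array
    capacity_gb: float
    used_gb: float = 0.0
    utilization: float = 0.0             # 0.0 to 1.0


@dataclass
class Demand:
    capacity_gb: float
    lun_utilization: float
    bus_mb_s: float
    controller_utilization: float


def fits(d: Demand, lun: Lun) -> bool:
    """Check all four constraints: two per LUN, two per array."""
    a = lun.array
    return (lun.used_gb + d.capacity_gb <= lun.capacity_gb
            and lun.utilization + d.lun_utilization <= 1.0
            and a.bus_used_mb_s + d.bus_mb_s <= a.bus_mb_s
            and a.controller_utilization + d.controller_utilization <= 1.0)


def assign(d: Demand, lun: Lun) -> None:
    """Place the demand on the LUN, updating LUN and array usage."""
    if not fits(d, lun):
        raise ValueError("demand does not fit on this LUN")
    lun.used_gb += d.capacity_gb
    lun.utilization += d.lun_utilization
    lun.array.bus_used_mb_s += d.bus_mb_s
    lun.array.controller_utilization += d.controller_utilization
```

Because bus bandwidth and controller utilization are tracked on the shared `Array`, a store can be rejected by a LUN that still has free capacity.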

Solver Heuristics
- Simple Random
  - 50 random cases using first fit
- Toyoda
  - Best fit using gradient function
  - Objective function combined with economic utilization
    - (1/penalty - lun_cost)
  - Favors LUNs already in use or low cost
  - LUNs filled in order of increasing cost
  - Minimizes resource contention
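
The Simple Random heuristic can be sketched as first fit applied to 50 random store orderings, keeping the cheapest feasible result. The two-dimensional demand (capacity, utilization), the per-LUN cost, and the rule "cost = sum of the costs of LUNs actually used" are assumptions made for this illustration.

```python
import math
import random


def first_fit(order, stores, luns):
    """Place stores in `order`; each goes to the first LUN with room.

    stores: list of (capacity_gb, utilization)
    luns:   list of (capacity_gb, utilization_limit, cost)
    Returns {store_index: lun_index}, or None if some store fits nowhere.
    """
    remaining = [[cap, util] for cap, util, _ in luns]
    assignment = {}
    for s in order:
        for j, rem in enumerate(remaining):
            if all(d <= r + 1e-9 for d, r in zip(stores[s], rem)):
                rem[0] -= stores[s][0]
                rem[1] -= stores[s][1]
                assignment[s] = j
                break
        else:
            return None  # no LUN could take this store
    return assignment


def simple_random(stores, luns, trials=50, seed=0):
    """Try `trials` random orderings and keep the cheapest assignment."""
    rng = random.Random(seed)
    order = list(range(len(stores)))
    best, best_cost = None, math.inf
    for _ in range(trials):
        rng.shuffle(order)
        a = first_fit(order, stores, luns)
        if a is None:
            continue
        cost = sum(luns[j][2] for j in set(a.values()))
        if cost < best_cost:
            best, best_cost = a, cost
    return best, best_cost
```

First fit is sensitive to ordering, so repeating it over random permutations is a cheap way to escape a bad initial order.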

Solver Heuristics 2
- ToyodaWeighted
  - Maps gradients against remaining available resources
  - Maps stores to LUNs such that utilization is balanced
  - Objective_function * cos(α)
  - Objective_function = max_lun_cost - lun_cost
  - Minimizes cost
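
One way to read the ToyodaWeighted score is sketched below. It assumes α is the angle between a store's demand vector and a LUN's remaining-resource vector, so that a store is steered toward the LUN whose free resources have the same shape as its demand. That interpretation of α, and all names in the code, are assumptions; only the two formulas come from the slide.

```python
import math


def cos_alpha(demand, remaining):
    """Cosine of the angle between demand and remaining-resource vectors."""
    dot = sum(d * r for d, r in zip(demand, remaining))
    norm = (math.sqrt(sum(d * d for d in demand))
            * math.sqrt(sum(r * r for r in remaining)))
    return dot / norm if norm else 0.0


def toyoda_weighted_score(demand, remaining, lun_cost, max_lun_cost):
    """Objective_function * cos(alpha), with the slide's cost objective."""
    objective = max_lun_cost - lun_cost
    return objective * cos_alpha(demand, remaining)


def pick_lun(demand, luns, max_lun_cost):
    """luns: list of (remaining_vector, cost). Returns best index or None."""
    feasible = [i for i, (rem, _) in enumerate(luns)
                if all(d <= r for d, r in zip(demand, rem))]
    if not feasible:
        return None
    return max(feasible,
               key=lambda i: toyoda_weighted_score(demand, luns[i][0],
                                                   luns[i][1], max_lun_cost))
```

A capacity-heavy store scores highest on a LUN with mostly capacity left, which tends to keep utilization balanced across dimensions.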

Toyoda and ToyodaWeighted

Optimizer
- Reruns Solver against configuration
  - Reduces required arrays
- Runs ToyodaWeighted with new objective function
  - Objective_value = 1 - lun_utilization
  - Assigns stores to underutilized LUNs
- Variations
  - Simple Random
    - Randomized first fit, chooses lowest utilization variance
  - Simple Balanced
    - Round robin first fit, based on capacity and utilization constraints
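
The rebalancing objective and the variance criterion can be sketched in a few lines. The sketch assumes utilization is a single number per LUN between 0 and 1 and that a store adds a fixed amount to it; the function names are hypothetical.

```python
import statistics


def objective_value(lun_utilization: float) -> float:
    """The slide's rebalancing objective: higher for emptier LUNs."""
    return 1.0 - lun_utilization


def pick_underutilized(lun_utilizations, store_utilization):
    """Index of the least-utilized LUN that can still take the store."""
    candidates = [i for i, u in enumerate(lun_utilizations)
                  if u + store_utilization <= 1.0]
    if not candidates:
        return None
    return max(candidates, key=lambda i: objective_value(lun_utilizations[i]))


def utilization_variance(lun_utilizations) -> float:
    """Spread of utilization across LUNs; lower means better balance."""
    return statistics.pvariance(lun_utilizations)
```

Comparing `utilization_variance` across candidate assignments gives the selection rule named in the Simple Random variation.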

Clusterer
- Addresses performance scaling issues
  - With many stores, runtime grew to days
- Combines multiple stores into a cluster
  - Cluster is mapped instead of stores
- Cluster rules based on observation
  - 10MB/s bandwidth
  - 2GB size
- Increases cost ~3%
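
A greedy version of the clustering step is sketched below. It assumes the two figures on the slide act as per-cluster caps (at most 10 MB/s of combined bandwidth and 2 GB of combined size) and that stores are added to the first cluster with room. Both the interpretation of the limits and the first-fit policy are assumptions.

```python
def cluster_stores(stores, max_mb_s=10.0, max_gb=2.0):
    """Group small stores so the solver maps clusters instead of stores.

    stores: list of (name, size_gb, bandwidth_mb_s)
    Returns a list of clusters, each a list of store names. A store that
    exceeds a cap on its own ends up in a cluster by itself.
    """
    clusters = []  # each entry: [names, total_gb, total_mb_s]
    for name, size_gb, mb_s in stores:
        for c in clusters:
            if c[1] + size_gb <= max_gb and c[2] + mb_s <= max_mb_s:
                c[0].append(name)
                c[1] += size_gb
                c[2] += mb_s
                break
        else:
            clusters.append([[name], size_gb, mb_s])
    return [c[0] for c in clusters]
```

Fewer items to place shortens the solver's runtime, at the price of coarser placement decisions.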

Evaluation
- Analytic model performance predictions
- Evaluate sensitivity to workload changes
- Effect of design changes
- Measure live system

Model Validation
- Based on single FC-30
- Ran performance tests on physical system
- Compared results to model predictions
- Results showed mean error rate of +5.4%
  - Range of [-11%, +19%]

Safety and Sensitivity
- Examined scaling of workload parameters
  - Start with baseline workload, then modify a single parameter
- Wanted to have 3 effects
  - Mixing of appropriate RAID levels
  - Requiring non-trivial number of arrays (2+)
  - Balanced store performance requirements

Scaling Store Size and Bandwidth
- Store size scaling
  - System becomes capacity bound
  - Creates RAID 5 LUNs
  - System size scales linearly with store size
- Bandwidth scaling
  - Ratio of RAID 1/0 to RAID 5 increases linearly

Scaling Number of Stores
- Number of arrays scales linearly with stores

Running time
- Quadratic increase with number of stores

Workload Variability
- Workload attributes randomly taken from log-normal distribution
  - Baseline values = mean distribution values
- Capacity utilization drops with increased variability
  - RAID 5 LUNs increase
  - Segmentation increases
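
Drawing an attribute from a log-normal distribution whose mean equals the baseline requires shifting the underlying normal's location: a log-normal with parameters μ and σ has mean exp(μ + σ²/2), so μ = ln(baseline) - σ²/2. The sketch below applies that identity. The function names and the choice of σ as the variability knob are assumptions.

```python
import math
import random


def lognormal_params(mean: float, sigma: float):
    """Return (mu, sigma) of the underlying normal for a given mean."""
    return math.log(mean) - sigma ** 2 / 2.0, sigma


def sample_attribute(baseline: float, sigma: float, rng: random.Random):
    """Draw one workload attribute whose expected value is `baseline`."""
    mu, s = lognormal_params(baseline, sigma)
    return rng.lognormvariate(mu, s)


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = [sample_attribute(100.0, 0.5, rng) for _ in range(5)]
    print(sizes)
```

Raising `sigma` widens the spread of store sizes and bandwidths while leaving their average at the baseline, which matches the experiment described on the slide.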

Workload variance

Whole System Validation
- MINERVA vs. Human Expert
- 3 aspects
  - Comparison of resultant system cost
  - Comparison of application performance
  - Low runtime and minimal human interaction
- Based on TPC-D benchmark
  - Decision Support system based on DB queries
- Human designers from HP system benchmarking team

Execution Times