Elasca: Workload-Aware Elastic Scalability for Partition Based Database Systems
Taha Rafiq, MMath Thesis Presentation, 24/04/2013

Outline
- Introduction & Motivation
- VoltDB & Elastic Scale-Out Mechanism
- Partition Placement Problem
- Workload-Aware Optimizer
- Experiments & Results
- Supporting Multi-Partition Transactions
- Conclusion

Introduction & Motivation

DBMS Scalability: Replication & Partitioning
Replication: durability, fault tolerance, availability, faster reads. Problems: consistency (writes become complex).
Partitioning: horizontal or vertical; improves performance and lets more data be stored. Problems: complicates application logic; multi-partition transactions are costly.

Traditional (DBMS) Scalability
Scalability: the ability of a system to be enlarged to handle a growing amount of work.
Higher Load → Add Resources → Better Performance, at the cost of Expensive Downtime.
Resources are added to a system that is experiencing more load than it can handle in order to improve performance. However, traditional DBMSs do not allow resources to be added on the fly, resulting in expensive downtime that can be costly for many use cases (e.g., Amazon).

Elastic (DBMS) Scalability
Elasticity: the use of computing resources that vary dynamically to meet a variable workload.
Higher Load → Dynamically Add Resources → Better Performance, with No Downtime.
Elastic scalability alleviates this problem by allowing resources to be added while the system is live, without significantly affecting performance.

Elastically Scaling a Partition-Based DBMS: Re-Partitioning
[Diagram: Partition 1 on Node 1 is split into Partitions 1 and 2 across Nodes 1 and 2 on scale-out; the split is merged back on scale-in.]
Re-partitioning is difficult in an elastic setting: we must decide how to partition the data and do it while the system is live.

Elastically Scaling a Partition-Based DBMS: Partition Migration
[Diagram: partitions P1-P4 start on Node 1; P3 and P4 migrate to Node 2 on scale-out and move back on scale-in.]
A more attractive option is partition migration: a large number of small partitions are aggregated on a few nodes, then migrated to new nodes as those nodes are added.

Partition Migration for Elastic Scalability
- Mechanism: how to add/remove nodes and move partitions. The mechanism must minimize the effect on transaction processing and maintain consistency.
- Policy/Strategy: which partitions to move, when, and where, during scale-out and scale-in.

Elasca
Elasca = Elastic Scale-Out Mechanism + Partition Placement & Migration Optimizer
Elasca consists of:
- an elastic scale-out mechanism built into VoltDB, a commercial partition-based DBMS
- a workload-aware partition placement and migration optimizer

VoltDB & Elastic Scale-Out Mechanism

What is VoltDB?
- In-memory, partition-based DBMS: no disk access, so very fast
- Shared-nothing architecture with serial execution: no locks
- Stored procedures: no arbitrary transactions
- Replication: fault tolerance & durability

VoltDB Architecture
[Diagram: each node runs a Client Interface, an Initiator, and execution sites (ES1, ES2) hosting partitions (P1-P3); clients connect through the client interfaces.]
Execution sites map to cores and run as threads.

Single-Partition Transactions
[Diagram: a single-partition transaction is routed via the client interface and initiator to the one execution site owning the target partition.]

Multi-Partition Transactions
[Diagram: a multi-partition transaction involves execution sites on multiple nodes, with one execution site acting as coordinator.]

Elastic Scale-Out Mechanism
[Diagram: a scale-out node with its own client interface, initiator, and execution sites joins a cluster hosting partitions P1-P4.]
- Initially, the scale-out node is not part of the cluster and is perceived as a failed node.
- The scale-out node 'rejoins' the cluster and informs all the nodes which partitions it needs to recover.
- The nodes containing those partitions stream the partition data to the scale-out node.
- After partition migration is complete, the source execution sites and partitions are shut down.
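A high-level sketch of these four steps in Python, using a plain dict to track partition ownership (the node and helper names are hypothetical; the real mechanism is built into VoltDB's node-recovery path):

```python
def scale_out(cluster, new_node, partitions_to_move):
    """Sketch of the four scale-out steps above.

    cluster            - dict: node name -> set of partitions it hosts
    new_node           - name of the joining (previously "failed") node
    partitions_to_move - partitions the new node announces it will recover

    Actual data streaming is elided; this only tracks partition ownership.
    (Illustrative only, not VoltDB's or Elasca's real API.)
    """
    # Steps 1-2: the node rejoins the cluster and announces the
    # partitions it needs to recover.
    cluster[new_node] = set()

    # Remember which node currently hosts each partition to be moved.
    sources = {p: node for node, parts in cluster.items()
               for p in parts if p in partitions_to_move}

    # Step 3: each source node streams those partitions to the new node.
    for p in partitions_to_move:
        cluster[new_node].add(p)

    # Step 4: after migration completes, the source execution sites and
    # partitions are shut down, i.e., ownership is dropped at the source.
    for p, node in sources.items():
        cluster[node].discard(p)
    return cluster

# Example: P3 and P4 migrate from node1 to the scale-out node node2.
print(scale_out({"node1": {"P1", "P2", "P3", "P4"}}, "node2", {"P3", "P4"}))
```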

Overcommitting Cores
VoltDB suggests keeping partitions per node < cores per node, which wastes resources when load is low or data access is skewed.
Idea: aggregate extra partitions on each node and scale out when load increases, so that execution sites can run on separate cores while leaving at least one core for host-level tasks.

Partition Placement Problem

Given… Cluster and System Specifications
Number of CPU cores, maximum number of nodes, memory.

Given… Current Partition-to-Node Assignment
[Table: partitions P1-P8 assigned across Node 1, Node 2, and Node 3.]

Find… Optimal Partition-to-Node Assignment (For the Next Time Interval)
[Table: the new assignment of partitions P1-P8 to Nodes 1-3 is the unknown.]

Optimization Objectives
- Maximize throughput: match the performance of a static, fully provisioned system.
- Minimize resources used: use the minimum number of nodes required to meet performance demands.
The objectives differ in importance.

Optimization Objectives (continued)
- Minimize data movement: data movement adversely affects system performance and incurs network costs.
- Balance load effectively: this minimizes the risk of overloading a node during the next time interval.

Workload-Aware Optimizer

System Overview

Statistics Collected
- α: the maximum number of transactions that can be executed on a partition per second, i.e., the maximum capacity of an execution site.
- β: the CPU overhead of host-level tasks, i.e., how much CPU capacity the Initiator uses.

Effect of β

Estimating CPU Load
- CPU load generated by each partition
- Average CPU load of host-level tasks per node
- Average CPU load per node
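A minimal sketch of how these three quantities can be computed, assuming per-partition transaction rates together with the α and β statistics from the previous slide (the thesis's exact formulas may differ):

```python
def estimate_cpu_load(txn_rates, alpha, beta, assignment, num_nodes):
    """Estimate per-partition and per-node CPU load.

    txn_rates[p]  - observed transactions/second for partition p
    alpha         - max transactions/second one execution site can handle
    beta          - fraction of a node's CPU consumed by host-level tasks
    assignment[p] - index of the node currently hosting partition p
    """
    # CPU load generated by each partition, as a fraction of one
    # execution site's capacity.
    partition_load = {p: rate / alpha for p, rate in txn_rates.items()}

    # Average CPU load per node: the host-level overhead beta plus the
    # loads of the partitions assigned to that node.
    node_load = [beta] * num_nodes
    for p, load in partition_load.items():
        node_load[assignment[p]] += load
    return partition_load, node_load

# Example: two partitions on node 0, one on node 1.
loads, per_node = estimate_cpu_load(
    {"P1": 500, "P2": 300, "P3": 800},
    alpha=1000, beta=0.1,
    assignment={"P1": 0, "P2": 0, "P3": 1}, num_nodes=2)
```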

Optimizer Details
Mathematical optimization vs. heuristics: we formulate partition placement as a mixed-integer linear program (MILP).
- Can be solved using any general-purpose solver (we use IBM ILOG CPLEX)
- Applicable to a wide variety of scenarios

Objective Function
A two-stage objective function: minimize data movement as the primary objective and balance load as the secondary objective.
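A plausible shape for such a two-stage objective, written in hypothetical notation: binary variables $x_{pn}$ place partition $p$ on node $n$, $\bar{x}_{pn}$ is the current placement, $s_p$ is partition $p$'s size, $L_n$ is node $n$'s estimated CPU load, and a small weight $\varepsilon$ makes load balance the secondary objective (a sketch; the thesis's exact formulation may differ):

$$\min \;\; \sum_{p}\sum_{n} s_p \, x_{pn} \,(1 - \bar{x}_{pn}) \;+\; \varepsilon \left( \max_n L_n - \min_n L_n \right)$$

The first term counts data moved relative to the current placement; the second penalizes load imbalance. The "Effect of ε" slide below explores this trade-off.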

Effect of ε

Minimizing Resources Used
- Calculate the minimum number of nodes that can handle the combined load of all partitions, using a relaxed (non-integer) assignment.
- Explicitly tell the optimizer how many nodes to use.
- If the optimizer cannot find a solution with the minimum N nodes, it tries again with N + 1 nodes.
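A sketch of this search loop, with `solve(n)` standing in for the MILP solver invocation (a hypothetical callback; the CPLEX call itself is omitted):

```python
import math

def solve_with_fewest_nodes(partition_loads, node_capacity, max_nodes, solve):
    """Find a placement that uses as few nodes as the load allows.

    partition_loads - estimated CPU load of each partition
    node_capacity   - usable CPU capacity of a single node
    solve(n)        - runs the MILP restricted to n nodes; returns a
                      placement, or None if infeasible (hypothetical)
    """
    # Lower bound from the relaxed, non-integer assignment: total load
    # divided by per-node capacity, rounded up.
    n = max(1, math.ceil(sum(partition_loads) / node_capacity))
    while n <= max_nodes:
        placement = solve(n)
        if placement is not None:
            return n, placement
        n += 1  # no solution with the minimum: retry with N + 1 nodes
    raise RuntimeError("no feasible placement within the node budget")
```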

Constraints
- Replication: replicas of a given partition must be assigned to different nodes.
- CPU capacity: the summed load of the partitions on a node must be less than the node's capacity.
- Memory capacity: all partitions assigned to a node must fit in its memory.
- Host-level tasks: the overhead of host-level tasks must not exceed the capacity of a single core.
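In the same hypothetical notation, with replication factor $k$, per-partition load $\ell_p$ (from the CPU-load estimates above), node CPU capacity $C$, node memory $M$, and single-core capacity $c$, the constraints might be sketched as follows (again, not necessarily the thesis's exact formulation):

$$\sum_n x_{pn} = k \;\; \forall p \quad \text{(binary } x_{pn} \text{ forces the } k \text{ replicas onto distinct nodes)}$$
$$\sum_p \ell_p \, x_{pn} \le C \;\; \forall n \quad \text{(CPU capacity)}$$
$$\sum_p s_p \, x_{pn} \le M \;\; \forall n \quad \text{(memory capacity)}$$
$$\beta \le c \quad \text{(host-level overhead fits on one core)}$$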

Staggering Scale-In
A fluctuating workload can result in excessive data movement. Staggering scale-in mitigates this problem: scale-in is delayed by s time steps, using slightly more resources in exchange for stability.
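A small sketch of such a staggering rule, assuming the optimizer emits a recommended node count every interval (illustrative logic under that assumption, not Elasca's exact policy):

```python
def staggered_node_count(recommendations, current_nodes, s):
    """Delay scale-in by s time steps to avoid thrashing.

    recommendations - optimizer's recommended node counts, most recent last
    current_nodes   - number of nodes currently in use
    s               - consecutive steps a scale-in must persist before acting
    """
    target = recommendations[-1]
    if target >= current_nodes:
        return target  # scale out (or hold) immediately
    recent = recommendations[-s:]
    if len(recent) == s and all(r < current_nodes for r in recent):
        # Demand has stayed low for s steps: scale in, but only down to
        # the largest node count needed during that window.
        return max(recent)
    return current_nodes  # defer the scale-in for now
```

Deferring the decision uses slightly more node-intervals but avoids migrating the same partitions back and forth when the workload oscillates.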

Experimental Evaluation

Optimizers Evaluated
- ELASCA: our workload-aware optimizer
- ELASCA-S: ELASCA with staggered scale-in
- OFFLINE: an offline optimizer that minimizes resources used and data movement
- GREEDY: a greedy first-fit optimizer
- SCO: a static, fully provisioned system (no optimization)

Benchmarks Used
- TPC-C: modified to make it cleanly partitioned and fit in memory (3.6 GB)
- TATP: Telecommunication Application Transaction Processing benchmark (250 MB)
- YCSB: Yahoo! Cloud Serving Benchmark with a 50/50 read/write ratio (1 GB)

Dynamic Workloads
- Varying the aggregate request rate: periodic waveforms (sine, triangle, sawtooth)
- Skewing the data access: temporal skew; statistical distributions (uniform, normal, categorical, Zipfian)

Temporal Skew

Experimental Setup
- Each experiment runs for 1 hour, split into 15 time intervals; the optimizer runs every four minutes.
- Combination of simulation and actual runs: exact numbers for data movement, resources used, and load balance come from simulation.
- The cluster has 4 nodes, plus 2 separate client machines.

Data Movement (TPC-C): Triangle Wave (f = 1)
[Figure annotations: ELASCA 63%; ELASCA-S 72%.]
Why is ELASCA better? Load balance plus data movement form its primary objective.

Data Movement (TPC-C): Triangle Wave (f = 1), Zipfian Skew
[Figure annotations, P = 64: ELASCA 73%; ELASCA-S 79%.]

Data Movement (TPC-C): Triangle Wave (f = 4)
[Figure annotations: ELASCA-S; GREEDY 83%; ELASCA 57%.]

Computing Resources Saved (TPC-C): Triangle Wave (f = 1)
[Figure annotation: up to 14% difference.]

Load Balance (TPC-C): Triangle Wave (f = 1)
[Figure annotation: 4x higher variance.]

Database Throughput (TPC-C): Sine Wave (f = 2)
[Figure annotations: ELASCA worst case 6%; GREEDY worst case 14%.]

Database Throughput (TPC-C): Sine Wave (f = 2), Normal Skew

Database Throughput (TATP): Sine Wave (f = 2)
The gap is small because of the initiator bottleneck and fewer partitions.

Database Throughput (YCSB): Sine Wave (f = 2)

Database Throughput (TPC-C): Triangle Wave (f = 4)
[Figure annotations, worst cases: GREEDY 21%; ELASCA 15%; ELASCA-S 6%; OFFLINE 12%.]

Optimizer Scalability

Supporting Multi-Partition Transactions

Factors Affecting Performance
- Maximum MPT throughput (η): the maximum number of transactions an execution site can coordinate per second
- Probability of MPTs (p_mpt): the percentage of transactions that are MPTs
- Partitions involved in MPTs: the number of partitions involved in each MPT

Changes to Model
The CPU load generated by each partition is now the sum of:
- load due to transaction work (same as for SPTs), and
- load due to coordinating MPTs.
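Using the statistics α and η defined earlier, the extended per-partition load estimate plausibly takes a form like the following, with hypothetical rates $r_p^{\text{txn}}$ (transaction work executed at partition $p$) and $r_p^{\text{coord}}$ (MPTs coordinated by $p$'s execution site); the thesis's exact formula may differ:

$$\ell_p = \frac{r_p^{\text{txn}}}{\alpha} + \frac{r_p^{\text{coord}}}{\eta}$$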

Maximum MPT Throughput
Simulation-based.

Probability of MPTs

Effect on Resources Saved
We modified the optimizer and ran it again to obtain these values.

Effect on Data Movement

Conclusion

Related Work
- Data replication and partitioning
- Database consolidation
- Live database migration
- Key-value stores
- Data placement

Elasca = Elastic Scale-Out Mechanism + Partition Placement & Migration Optimizer

Conclusion
Elasca = Mechanism + Optimizer
The workload-aware optimizer:
- meets performance demands
- minimizes computing resources used
- minimizes data movement
- effectively balances load
- scales to large problem sizes in an online setting

Future Work
- Migrating to VoltDB 3.0: intelligent client routing, master/slave partitions
- Supporting multi-partition transactions
- Automated parameter tuning
- Transaction mixes
- Workload prediction

Thank You. Questions?