Harold C. Lim, Shivnath Babu and Jeffrey S. Chase (Duke University): AUTOMATED CONTROL FOR ELASTIC STORAGE. Presented by: Yonggang Liu, Department of Electrical and Computer Engineering, University of Florida.

1 Harold C. Lim, Shivnath Babu and Jeffrey S. Chase (Duke University): AUTOMATED CONTROL FOR ELASTIC STORAGE. Presented by: Yonggang Liu, Department of Electrical and Computer Engineering, University of Florida

2 Outline
- Introduction
- System overview
- System architecture and modeling methodologies
- Evaluation
- Contribution and related work
- Discussions and future work

4 Introduction - Popularity of highly dynamic workloads
Many web-based services (especially Web 2.0 services) experience rapid load surges and drops. One Facebook application grew from 25,000 to 250,000 users in 3 days, with up to 20,000 new users signing up per hour at peak times.
Elastic services offered by cloud computing are one solution: they grow or shrink service capacity dynamically as the load changes.

5 Introduction - Elasticity in cloud computing
Elasticity is one of cloud computing’s greatest features: systems acquire and release resources in response to users’ dynamic workloads, and users pay only for what they need.
[Figure: SLAs, Web Services, Virtualization. Picture provided by Dr. Andy Li from UF.]

6 Introduction - Topic of this paper
This paper addresses the challenges of controlling elastic storage for a data-intensive service in a cloud computing environment. Intuitively, the controller does the following:
- If performance does not meet the Service Level Objective (SLO) → grow storage capacity.
- If performance meets the SLO and system utilization is low → shrink storage capacity.
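The intuitive grow/shrink rule can be written as a small decision function. This is a minimal sketch, not the paper's controller: the `Action` enum, the function `decide`, and the low-utilization threshold `low_util` are hypothetical names and values; only the 3-second SLO used in the example is taken from the evaluation slides.

```python
from enum import Enum


class Action(Enum):
    GROW = "grow"
    SHRINK = "shrink"
    HOLD = "hold"


def decide(response_time_s: float, slo_s: float,
           utilization: float, low_util: float) -> Action:
    """Grow on an SLO violation; shrink if the SLO is met and utilization is low."""
    if response_time_s > slo_s:
        # Performance misses the Service Level Objective: add storage capacity.
        return Action.GROW
    if utilization < low_util:
        # SLO met and the storage tier is mostly idle: release capacity.
        return Action.SHRINK
    # SLO met and utilization acceptable: leave the storage tier as it is.
    return Action.HOLD


# Hypothetical readings: 3 s SLO, shrink below 10% CPU utilization.
print(decide(4.2, 3.0, 0.35, 0.10))  # Action.GROW
print(decide(1.1, 3.0, 0.05, 0.10))  # Action.SHRINK
print(decide(2.0, 3.0, 0.25, 0.10))  # Action.HOLD
```

A one-shot rule like this ignores rebalancing lag and actuator delay, which is why the paper moves on to feedback control.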

7 Introduction - Topic of this paper
In this paper, the Hadoop Distributed File System (HDFS) is used as the storage system.
When the controller increases the storage size:
- Create new storage instances.
- Move stored data to the new instances (data rebalancing).
When the controller reduces the storage size:
- Remove a certain number of storage instances.
- Blocks whose replica count has dropped below the replication degree N are re-replicated onto the remaining nodes. HDFS does this automatically.
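To make the shrink case concrete, the sketch below counts how many new replicas each block needs after some nodes are removed. It only illustrates the bookkeeping; in HDFS the NameNode detects under-replicated blocks and schedules re-replication itself. The function name, block IDs, and node names are hypothetical.

```python
def replicas_to_create(placement: dict[str, set[str]],
                       removed: set[str], n: int) -> dict[str, int]:
    """For each block, count the replicas needed to restore replication
    degree n after the nodes in `removed` leave the cluster."""
    needed: dict[str, int] = {}
    for block, nodes in placement.items():
        alive = nodes - removed
        if not alive:
            # No surviving copy to replicate from: the block would be lost.
            raise ValueError(f"all replicas of {block} are on removed nodes")
        if len(alive) < n:
            needed[block] = n - len(alive)
    return needed


# Hypothetical layout with replication degree N = 3 on nodes A..E.
placement = {
    "blk_1": {"A", "B", "C"},
    "blk_2": {"B", "C", "D"},
    "blk_3": {"A", "D", "E"},
}
print(replicas_to_create(placement, removed={"A"}, n=3))
# {'blk_1': 1, 'blk_3': 1}
```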

9 System overview - The big picture
[Figure: Controller with a Sensor (gather measurements) and an Actuator (manage instances); Cloud Provider (Amazon EC2); Elastic Service with Web Tier (Apache server), Application Tier (Facebook core), and Storage Tier (Hadoop DFS); Clients.]
Suppose we host the Facebook server on Amazon EC2 instances using the proposed control techniques. When the sensor detects a higher system load, the controller creates more storage instances and rebalances data. When the sensor detects a lower system load, the controller removes some storage nodes.

10 System overview - Challenges in elastic storage control
Controlling elastic storage involves many challenges:
- Data rebalancing. Newly added storage nodes are not effective until data rebalancing is done.
- Interference with the guest service. Data rebalancing also consumes system resources.
- Actuator delay. The controller must account for the delay of its control operations; otherwise it may respond too late or become unstable.

12 System architecture
The controller is composed of:
- Horizontal Scale Controller (HSC): grows and shrinks the number of storage nodes.
- Data Rebalance Controller (DRC): controls the data transfers that rebalance the storage tier after it grows or shrinks.
- State machine: coordinates the actions of the HSC and the DRC.

13 System architecture - Horizontal Scale Controller (HSC)
- Actuator: the HSC uses cloud APIs to change the number of active server instances.
- Sensor: the paper uses CPU utilization on the storage nodes as the feedback metric. It is easy to measure and strongly correlated with the overall response time of the Cloudstone benchmark when the storage tier is the bottleneck.

14 Modeling methodology - System model without controller
The system without a controller can be described by the following block diagram:
- U(z): input to the system, the number of storage instances.
- D(z): the effect of client workload variation, expressed as a change in the number of storage instances.
- V(z): the effective number of storage instances.
- Y(z): output of the system, the CPU utilization on the storage nodes.
- G(z): transfer function of the storage system.
[Block diagram: U(z) and D(z) are summed (+, +) to give V(z), which passes through G(z) to give Y(z); that is, V(z) = U(z) + D(z) and Y(z) = G(z)V(z).]

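15 Modeling methodology - Controller - Integral control
[Block diagram: the reference R(z) and the output Y(z) enter a summing junction (+, -) to give the error E(z); the controller K(z) maps E(z) to the input U(z); U(z) and D(z) are summed (+, +) to give V(z), which passes through G(z) to give Y(z).]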
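Reading the integral-control block diagram as four loop relations, the closed-loop response follows by eliminating E(z), U(z), and V(z). The last line assumes the standard discrete integral law u(k+1) = u(k) + K_I e(k) with zero initial conditions; the gain symbol K_I is introduced here and is not given on the slides.

```latex
\begin{aligned}
V(z) &= U(z) + D(z), \qquad Y(z) = G(z)\,V(z),\\
E(z) &= R(z) - Y(z), \qquad U(z) = K(z)\,E(z),\\
Y(z) &= G(z)\bigl[K(z)\bigl(R(z) - Y(z)\bigr) + D(z)\bigr]\\
\Rightarrow\; Y(z) &= \frac{G(z)K(z)}{1 + G(z)K(z)}\,R(z) + \frac{G(z)}{1 + G(z)K(z)}\,D(z),\\
u(k+1) &= u(k) + K_I\,e(k) \;\Rightarrow\; K(z) = \frac{K_I}{z - 1}.
\end{aligned}
```

The first term shows how the utilization tracks the target R(z); the second shows how much of the workload disturbance D(z) still reaches the output.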

16 Modeling methodology - Controller - discrete control functions
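Because the actuator can only add or remove whole instances, a discrete-time integral controller has to turn its continuous output into an integer node count. A minimal sketch, assuming the class name, gain, clamp limits, and round-half-up rule (all hypothetical); only the 20% utilization target comes from the evaluation slides. The update is written with (y - y_ref) so that a positive gain adds nodes when utilization is above target; in the E(z) = R(z) - Y(z) convention of the block diagram this corresponds to a negative K_I, since adding nodes lowers utilization.

```python
import math


class IntegralController:
    """Discrete integral control of the storage-node count.

    y is the average CPU utilization of the storage nodes (0.0-1.0) and
    u is a continuous "desired node count" that is rounded for the actuator.
    """

    def __init__(self, k_i: float, y_ref: float, u0: float,
                 u_min: int = 1, u_max: int = 64) -> None:
        self.k_i = k_i        # integral gain (hypothetical value)
        self.y_ref = y_ref    # target utilization, e.g. 0.20
        self.u = float(u0)    # integrator state
        self.u_min = u_min
        self.u_max = u_max

    def step(self, y: float) -> int:
        # u[k+1] = u[k] + K_I * (y[k] - y_ref): utilization above target adds nodes.
        self.u += self.k_i * (y - self.y_ref)
        # Clamp the integrator so it cannot wind up beyond the allowed cluster size.
        self.u = min(max(self.u, self.u_min), self.u_max)
        # The actuator only accepts whole instances: round half up.
        return math.floor(self.u + 0.5)


ctrl = IntegralController(k_i=10.0, y_ref=0.20, u0=4)
print(ctrl.step(0.50))  # 7
print(ctrl.step(0.10))  # 6
```

Rounding alone can make the node count flip back and forth around a boundary, which is the motivation for a target range instead of a single target point.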

17 Modeling methodology - Proportional thresholding
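One way to realize proportional thresholding is to replace the single utilization target with a band whose lower edge depends on the cluster size. The sketch below is a hedged reconstruction of the idea, not the paper's exact rule: it assumes load spreads evenly across nodes, steps by at most one node, and uses hypothetical function names. With n nodes at utilization y, removing one node raises utilization to about y * n / (n - 1); requiring that to stay at or below y_high gives y_low = y_high * (n - 1) / n, so the band narrows as the cluster grows.

```python
def target_range(n_nodes: int, y_high: float) -> tuple[float, float]:
    """Utilization band [y_low, y_high] for a cluster of n_nodes (even-load assumption)."""
    return y_high * (n_nodes - 1) / n_nodes, y_high


def decide_nodes(y: float, n_nodes: int, y_high: float, n_min: int = 1) -> int:
    """Return the new node count, stepping by at most one node."""
    y_low, _ = target_range(n_nodes, y_high)
    if y > y_high:
        # Above the band: add a node.
        return n_nodes + 1
    if y < y_low and n_nodes > n_min:
        # Below the band: removing a node will not push utilization above y_high.
        return n_nodes - 1
    # Inside the band: hold, which avoids add/remove oscillation.
    return n_nodes


for n in (2, 4, 10):
    low, high = target_range(n, y_high=0.20)
    print(n, round(low, 3), high)  # the band narrows as the cluster grows
print(decide_nodes(0.25, 4, 0.20))  # 5
print(decide_nodes(0.10, 4, 0.20))  # 3
print(decide_nodes(0.18, 4, 0.20))  # 4
```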

18 System architecture - Data Rebalance Controller (DRC)
The DRC rebalances the layout of data in the system after the number of storage nodes grows or shrinks. Rebalancing causes both actuator delay and interference.
Tuning knob of the HDFS rebalancer: the bandwidth b allocated to the rebalancer. Choosing b controls the tradeoff between lag and interference:
- Large b: fast rebalancing, but a serious impact on the normal service.
- Small b: slow rebalancing, with little disruption to the normal service.
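The lag/interference tradeoff can be illustrated with a toy cost model. Everything here is an assumption for illustration: lag is taken as data volume divided by b, interference as added response time growing linearly with b, and the weights, candidate throttles, and data size are made up. The paper builds its own models of the impact of b, which are not reproduced here, and HDFS itself only provides the throttle, not this selection logic.

```python
def rebalance_time_s(data_mb: float, b_mb_per_s: float) -> float:
    """Lag model: time to move the data at rebalancer bandwidth b."""
    return data_mb / b_mb_per_s


def choose_bandwidth(data_mb: float, candidates: list[float],
                     latency_per_mb_s: float, w_lag: float, w_int: float) -> float:
    """Pick the candidate b minimizing a weighted sum of lag and interference.

    Interference is modeled (hypothetically) as extra response time that grows
    linearly with b; lag falls as 1/b, so the cost has an interior minimum.
    """
    def cost(b: float) -> float:
        lag = rebalance_time_s(data_mb, b)
        interference = latency_per_mb_s * b
        return w_lag * lag + w_int * interference

    return min(candidates, key=cost)


# Hypothetical numbers: 8000 MB to move, candidate throttles in MB/s,
# 0.01 s of extra response time per MB/s of rebalancer bandwidth.
best = choose_bandwidth(8000, [5, 10, 20, 40, 80], 0.01, w_lag=1.0, w_int=500.0)
print(best)  # 40
```

Raising `w_lag` pushes the choice toward a larger b (faster, more disruptive); raising `w_int` pushes it toward a smaller b.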

19 Modeling Methodology - Modeling the impacts of b

20 Modeling Methodology - Balancing between lag and interference

21 System architecture - State machine
Recall that:
- The Horizontal Scale Controller (HSC) increases or decreases the number of storage nodes.
- The Data Rebalance Controller (DRC) rebalances the storage after the number of storage nodes changes.
The two have mutual dependencies:
- After the HSC adds a new storage node, the system does not get the node's full benefit until the DRC completes rebalancing.
- While one component is acting, it introduces noise into the sensor measurements of the other.
To preserve stability during adjustments, a state machine coordinates the HSC and the DRC and manages these mutual dependencies.

22 System architecture - State machine
The following diagram shows the internal state machine of the elasticity controller in the storage tier.
[State diagram (Elasticity Controller, Storage Tier): Init → Horizontal Scale state. Horizontal Scale state: storage tier configuration changed? No → stay; Yes → Rebalance state. Rebalance state: rebalancing done? No → stay; Yes → Horizontal Scale state.]
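The coordinating state machine can be sketched as a single transition function. This is a reading of the diagram, not code from the paper: it assumes Init leads unconditionally to the Horizontal Scale state, and the names `State` and `next_state` are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    HORIZONTAL_SCALE = auto()
    REBALANCE = auto()


def next_state(state: State, config_changed: bool, rebalancing_done: bool) -> State:
    """One transition of the coordinator between the HSC and the DRC."""
    if state is State.INIT:
        return State.HORIZONTAL_SCALE
    if state is State.HORIZONTAL_SCALE:
        # The HSC is active; once it changes the storage tier, hand over to the DRC.
        return State.REBALANCE if config_changed else State.HORIZONTAL_SCALE
    # State.REBALANCE: the DRC is active; return control only when rebalancing finishes.
    return State.HORIZONTAL_SCALE if rebalancing_done else State.REBALANCE


# Hypothetical trace of (storage tier configuration changed?, rebalancing done?)
events = [(False, False), (True, False), (False, False), (False, True)]
state = State.INIT
for changed, done in events:
    state = next_state(state, changed, done)
    print(state.name)
# HORIZONTAL_SCALE, REBALANCE, REBALANCE, HORIZONTAL_SCALE
```

Because only one controller acts in each state, neither one sees the other's actions as noise in its sensor measurements.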

24 Evaluation - Experimental Testbed
- The paper runs Cloudstone with GlassFish as the front-end application server tier.
  - Cloudstone: a flexible Web 2.0 benchmark generator.
  - GlassFish: an open-source application server project.
- HDFS is used for storage. It is modified to expose the rebalancer's bandwidth throttle b as an actuator to the external controller.
- The paper deploys a local ORCA cluster as the cloud infrastructure provider.
  - ORCA: a resource control framework that provides a resource leasing service; guests can lease resources from a substrate resource provider, such as a cloud provider.

25 Evaluation - Experimental Testbed
- The experimental service cluster is a group of servers on a local network.
- To fully explore the effects of the storage tier, the other tiers are statically over-provisioned.
- The storage tier nodes are dynamically allocated virtual machine instances, all with a fixed resource configuration: 30 GB disk space, 512 MB RAM, a single disk arm, and a 2.8 GHz CPU.
- HDFS is preloaded with at least 36 GB of data.

26 Evaluation - Controller Effectiveness
[Figure panels: (a1) CPU utilization, static; (b1) Response time, static; (a2) CPU utilization, dynamic; (b2) Response time, dynamic.]
Target response time: 3 seconds. Target CPU utilization: 20%.
Observations from the figures:
1. Dynamic provisioning adapts to the load burst.
2. Instance creation and data rebalancing have a cost and take time to take effect.

27 Evaluation - Controller Effectiveness
[Figure panels: (a1) CPU utilization, static; (b1) Response time, static; (a2) CPU utilization, dynamic; (b2) Response time, dynamic.]
Target response time: 3 seconds. Target CPU utilization: 20%.
Observations from the figures:
1. Dynamic provisioning is responsive enough to adapt to a small load increase.
2. The cost and delay of node creation and rebalancing are smaller than in the previous experiment.

28 Evaluation - Resource Efficiency
[Figure panels: (a1) CPU utilization, static; (b1) Response time, static; (a2) CPU utilization, dynamic; (b2) Response time, dynamic.]
Target response time: 3 seconds. Target CPU utilization: 20%.
Observations from the figures:
1. Shrinking the storage tier has much lower cost and delay than growing it.
2. There are almost no SLO violations during the resizing process.

29 Evaluation - Comparison of Rebalance Policies

31 Contribution and related work
- This paper is the first to address the problem of automated control for elastic storage in cloud computing.
- SCADS is related work on dynamically scaling a storage system. It uses machine learning to predict resource requirements.
- Padala et al. proposed a decoupled architecture (between the guest and the cloud provider) for cloud computing, but did not consider actuator constraints.
- Aqueduct uses a feedback controller to throttle the bandwidth used for rebalancing so that SLOs are not violated. As a result, rebalancing may be left with very little bandwidth.

33 Discussions and future work
[Block diagram: the integral control loop (R(z), K(z), E(z), U(z), D(z), V(z), G(z), Y(z)) extended with two additional blocks, H(z) and W(z).]

34 Discussions and future work

35 Discussions and future work
- Make the resource configuration of newly created storage instances tunable.
- Resize the storage tier by adding or removing storage instances with flexible resource configurations.
- Optimize the system by exploring the capacity and efficiency of individual storage instances, rather than only the number of instances. This requires investigating the performance of storage nodes under different setups: disk size, CPU frequency, RAM size, etc.

36 THANK YOU!
