Curator: Self-Managing Storage for Enterprise Clusters

Presentation on theme: "Curator: Self-Managing Storage for Enterprise Clusters"— Presentation transcript:

1 Curator: Self-Managing Storage for Enterprise Clusters
I. Cano*, S. Aiyar, V. Arora, M. Bhattacharyya, A. Chaganti, C. Cheah, B. Chun, K. Gupta, V. Khot and A. Krishnamurthy* Nutanix Inc. *University of Washington NSDI ’17

2 Enterprises have been increasingly relying on cluster storage systems

3 Enterprises have been increasingly relying on cluster storage systems
Transparent access to storage
Scale up storage by buying more resources

4 Cluster storage systems embody significant functionality to support the needs of enterprise clusters
Automatic replication and recovery
Seamless integration of SSDs and HDDs
Snapshotting and reclamation of unnecessary data
Space-saving transformations

5 Much of their functionality can be performed in the background to keep the I/O path as simple as possible.
What is the appropriate systems support for engineering these background tasks?

6 Curator Framework and systems support for building background tasks
Extensible, flexible and scalable framework: borrow ideas from big data analytics but use them inside of a storage system
Synchronization between background tasks and foreground I/O: co-design background and foreground components
Heterogeneity across and within clusters over time: use Reinforcement Learning to deal with heterogeneity

8 Distributed Key-Value Store
Metadata for the entire storage system is stored in the key-value store
Foreground I/O and background tasks coordinate using the key-value store
The key-value store supports replication and consistency using Paxos
[Diagram: key-value store shared by the I/O Service (foreground) and Storage Mgmt. (background)]

9 Data Structures and Metadata Maps
Data is stored in units called extents
Extents are grouped together and stored as extent groups on physical devices
Example: a file's 8K ranges (0-8K, 8-16K, ...) map to extents 33, 44 and 56
Extent Id Map: eid → egid (e.g., extent 33 → extent group 1984)
Extent Group Id Map: egid → physical location (e.g., extent group 1984 → disk1, disk2)
Multiple levels of indirection simplify data sharing across files and help minimize map updates
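The two-level lookup above can be sketched as follows. This is an illustrative model, not Nutanix's actual API; the map contents mirror the slide's example.

```python
# Extent Id Map: eid -> egid (extents 33, 44, 56 live in egroup 1984)
extent_id_map = {33: 1984, 44: 1984, 56: 1984}

# Extent Group Id Map: egid -> replica locations
extent_group_id_map = {1984: ["disk1", "disk2"]}

def locate_extent(eid):
    """Resolve an extent to its physical replicas via both maps."""
    egid = extent_id_map[eid]
    return extent_group_id_map[egid]

# The benefit of the indirection: moving an extent group touches only
# one entry in the second map; every extent pointing at it is unchanged.
extent_group_id_map[1984] = ["disk3", "disk2"]
```

After the single update, every extent in egroup 1984 resolves to the new location, which is why this layout minimizes map updates.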

10 MapReduce Framework
Globally distributed maps are processed using MapReduce
System-wide knowledge of metadata is used to perform various self-managing tasks
[Diagram: Storage Mgmt. uses MapReduce over the key-value store, alongside the I/O Service]

11 Example: Tiering
Move cold data from fast (SSD) to slow storage (HDD, cloud)
Identify cold data using a MapReduce job
Modified time (mtime): Extent Group Id Map
Access time (atime): Extent Group Id Access Map

12 Example: Tiering
Is egid 120 "cold"? Its mtime is owned by Node A and its atime by Node D
Maps are globally distributed, so this is not a local decision
Use MapReduce to perform a "join"
[Diagram: metadata ring across four nodes (A, B, C, D), with a Curator slave on each node and Curator (B) acting as master]

13 Example: Tiering
Map phase: scan both metadata maps, emit egid → mtime or atime, partition using egid
Reduce phase: reduce based on egid, generate tuples (egid, mtime, atime), sort locally and identify the cold egroups
[Diagram: Curator (A) emits 120 → mtime, Curator (B) emits 120 → atime; the reducer joins them into (120, mtime, atime)]
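The map/reduce "join" above can be sketched in miniature. The record shapes and the cold cutoff are illustrative stand-ins for the real Extent Group Id Map and Access Map scans:

```python
from collections import defaultdict

COLD_CUTOFF = 1000  # illustrative timestamp threshold

def map_phase(mtime_map, atime_map):
    """Scan both metadata maps and emit egid-keyed records."""
    for egid, mtime in mtime_map.items():
        yield egid, ("mtime", mtime)
    for egid, atime in atime_map.items():
        yield egid, ("atime", atime)

def reduce_phase(records):
    """Group by egid, build (egid, mtime, atime), pick cold egroups."""
    grouped = defaultdict(dict)
    for egid, (kind, ts) in records:
        grouped[egid][kind] = ts
    cold = []
    for egid, fields in grouped.items():
        mtime, atime = fields.get("mtime"), fields.get("atime")
        if mtime is not None and atime is not None:
            if max(mtime, atime) < COLD_CUTOFF:
                cold.append(egid)
    return sorted(cold)

# egid 120 was last touched before the cutoff, egid 121 after it
cold = reduce_phase(map_phase({120: 500, 121: 2000}, {120: 700, 121: 1800}))
```

Partitioning by egid guarantees that both records for the same extent group land at the same reducer, which is what makes the join possible despite the maps being owned by different nodes.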

14 Other tasks implemented using Curator
Disk Failure, Disk Balancing, Fault Tolerance, Garbage Collection, Data Removal, Compression, Erasure Coding, Deduplication, Snapshot Tree Reduction

15 Challenges
Extensible, flexible and scalable framework
Synchronization between background tasks and foreground I/O
Heterogeneity across and within clusters over time

17 Co-Design Background Tasks and Foreground I/O
I/O Service provides an extended low-level API
Storage Mgmt. only gives hints to the I/O Service to act on the data
Background tasks are batched and throttled
Soft updates on metadata to achieve lock-free execution
[Diagram: Storage Mgmt. sends tasks to the I/O Service; both share the key-value store]
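The hint/batch/throttle pattern can be sketched as below. All names and limits are made up for illustration; the point is that the background component only queues hints and releases them under an in-flight cap, so foreground I/O stays responsive:

```python
from collections import deque

class HintQueue:
    """Background tasks enqueue hints; batches are released rate-limited."""

    def __init__(self, batch_size=4, max_inflight=8):
        self.pending = deque()
        self.inflight = 0
        self.batch_size = batch_size
        self.max_inflight = max_inflight

    def add_hint(self, hint):
        self.pending.append(hint)

    def next_batch(self):
        """Release at most one batch, honoring the in-flight cap."""
        budget = min(self.batch_size, self.max_inflight - self.inflight)
        n = min(budget, len(self.pending))
        batch = [self.pending.popleft() for _ in range(max(n, 0))]
        self.inflight += len(batch)
        return batch

    def complete(self, n):
        """Called when the I/O service finishes n hinted operations."""
        self.inflight -= n

q = HintQueue(batch_size=2, max_inflight=3)
for egid in [120, 121, 122, 123]:
    q.add_hint(("migrate_cold", egid))
first = q.next_batch()   # 2 hints released
second = q.next_batch()  # only 1 more fits under the in-flight cap
```

Because the queue only carries hints, the I/O service remains free to reorder, defer, or drop them when foreground load is high.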

18 Evaluation
Clusters in the wild: ~50 clusters over a period of 2.5 months
Key results:
Recovery: 95% of clusters have at most 0.1% under-replicated data
Garbage: 90% of clusters have at most 2% garbage
More results in the paper

19 Tiering
Goal: maximize SSD effectiveness for both reads and writes
Curator uses a threshold to achieve some "desired" SSD occupancy
Observation: 50% of clusters have 75% usage

20 Challenges
Extensible, flexible and scalable framework
Synchronization between background tasks and foreground I/O
Heterogeneity across and within clusters over time

21 Machine Learning: Reinforcement Learning
Heterogeneity across cluster deployments: resources, workloads
Dynamic changes within individual clusters: changing workloads, different loads over time
Threshold-based heuristics are sub-optimal
ML to the rescue: Reinforcement Learning

22 Reinforcement Learning 101
[Diagram: agent/environment loop. The agent observes state s_t and takes action a_t; the environment returns reward r_{t+1} and next state s_{t+1}]
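The agent/environment loop can be made concrete with tabular Q-learning on a toy two-state environment. Everything here (the environment, the hyperparameters) is illustrative; it only shows the observe-act-reward cycle from the slide:

```python
import random

random.seed(0)
STATES, ACTIONS = [0, 1], [0, 1]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def env_step(s, a):
    """Toy environment: only action 1 in state 1 pays off."""
    reward = 1.0 if (s == 1 and a == 1) else 0.0
    return (s + 1) % 2, reward  # next state, reward

s = 0
for _ in range(500):
    # epsilon-greedy action selection
    if random.random() < eps:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: Q[(s, x)])
    s_next, r = env_step(s, a)
    # standard Q-learning update toward r + gamma * max_a' Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, x)] for x in ACTIONS) - Q[(s, a)])
    s = s_next
```

After training, the learned values prefer action 1 in state 1, which is the behavior the toy reward function encodes.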

23 Agent Deployment
How is the agent deployed? Two options:
No prior knowledge, learning from scratch
Some prior knowledge, then it keeps on learning while deployed

24 Use Case: Tiering
State = (cpu, memory, ssd, iops, riops, wiops)
Actions = { run, not run }
Reward = -latency
Leverage threshold-based heuristics to "bootstrap" the agent
Build a dataset from real traces: ~40 clusters in-the-wild, ~32K State-Action-Reward examples
Pre-train two models, one for each action
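A minimal sketch of the "one model per action" bootstrap: from logged (state, action, reward) tuples, fit a separate reward predictor for each action, then act greedily on predicted reward (reward = -latency, so higher is better). A trivial 1-nearest-neighbor predictor stands in for the real models, and all the data is made up:

```python
def nearest_reward(dataset, state):
    """Predict reward as that of the closest logged state."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(dataset, key=lambda ex: dist(ex[0], state))[1]

# Logged examples per action: (state, reward) with reward = -latency.
# State = (cpu, mem, ssd, iops, riops, wiops); values are illustrative.
examples = {
    "run":     [((0.2, 0.3, 0.9, 100, 60, 40), -5.0),
                ((0.8, 0.7, 0.9, 900, 500, 400), -30.0)],
    "not_run": [((0.2, 0.3, 0.9, 100, 60, 40), -9.0),
                ((0.8, 0.7, 0.9, 900, 500, 400), -12.0)],
}

def choose_action(state):
    """Greedy policy: pick the action whose model predicts higher reward."""
    return max(examples, key=lambda a: nearest_reward(examples[a], state))

idle_choice = choose_action((0.25, 0.3, 0.9, 120, 70, 50))    # "run"
busy_choice = choose_action((0.8, 0.7, 0.9, 900, 500, 400))   # "not_run"
```

Pre-training on traces gives the agent a sensible starting policy (option two on the previous slide), and it can keep refining the per-action models online once deployed.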

25 Evaluation
Metric     | Avg. improvement
Latency    | 12%
SSD reads  | 16%

26 Conclusions
Curator: framework and systems support for building background tasks
Borrowed ideas from big data analytics but used them inside of a storage system: distributed key-value store for metadata, MapReduce framework to process metadata
Co-designed background tasks and foreground I/O: the I/O Service provides an extended low-level API, and Storage Mgmt. only gives hints to the I/O Service to act on the data
Used Reinforcement Learning to deal with heterogeneity across and within clusters over time
Results on Tiering showed up to ~20% latency improvements

