Curator: Self-Managing Storage for Enterprise Clusters
I. Cano*, S. Aiyar, V. Arora, M. Bhattacharyya, A. Chaganti, C. Cheah, B. Chun, K. Gupta, V. Khot and A. Krishnamurthy*
Nutanix Inc., *University of Washington
NSDI ’17
Enterprises have been increasingly relying on cluster storage systems
- Transparent access to storage
- Scale up storage by buying more resources
Cluster storage systems embody significant functionality to support the needs of enterprise clusters
- Automatic replication and recovery
- Seamless integration of SSDs and HDDs
- Snapshotting and reclamation of unnecessary data
- Space-saving transformations
- …
Much of their functionality can be performed in the background, keeping the I/O path as simple as possible.

What is the appropriate systems support for engineering these background tasks?
Curator Framework and systems support for building background tasks
- Extensible, flexible and scalable framework
  - Borrow ideas from big data analytics but use them inside of a storage system
- Synchronization between background tasks and foreground I/O
  - Co-design the background and foreground components
- Heterogeneity across and within clusters over time
  - Use reinforcement learning to deal with heterogeneity
Distributed Key-Value Store
- Metadata for the entire storage system is stored in the key-value store
- Foreground I/O and background tasks coordinate using the key-value store (a coordination sketch follows below)
- The key-value store supports replication and consistency using Paxos

[Diagram: I/O Service (foreground) and Storage Mgmt (background) both sit on top of the key-value store]
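To make the coordination idea concrete, here is a minimal sketch of background tasks and foreground I/O sharing a versioned key-value store. The `KVStore` class and its compare-and-swap semantics are hypothetical stand-ins for the Paxos-backed store the talk describes, not Curator's actual API.

```python
# Toy versioned key-value store; a real deployment replicates via Paxos.
class KVStore:
    def __init__(self):
        self._data = {}
        self._versions = {}

    def read(self, key):
        return self._data.get(key), self._versions.get(key, 0)

    def compare_and_swap(self, key, expected_version, value):
        # Succeeds only if nobody updated the key since we read it,
        # which lets foreground I/O and background tasks race safely.
        if self._versions.get(key, 0) != expected_version:
            return False
        self._data[key] = value
        self._versions[key] = expected_version + 1
        return True

def background_update(kv, key, transform):
    # Background tasks retry on conflict instead of taking locks,
    # so they never block the foreground I/O path.
    while True:
        value, version = kv.read(key)
        if kv.compare_and_swap(key, version, transform(value)):
            return

kv = KVStore()
background_update(kv, "egid/1984", lambda v: (v or 0) + 1)
print(kv.read("egid/1984"))  # -> (1, 1)
```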
Data Structures and Metadata Maps
- Data is stored in units called extents
- Extents are grouped together and stored as extent groups on physical devices

A file's address space is divided into ranges (0-8K, 8-16K, 16-24K, 24-32K, 32-40K) that map to extents (e.g., 33, 44, 56).

Extent Id Map (eid -> egid):
  33 -> 1984
  44 -> …
  56 -> …

Extent Group Id Map (egid -> physical location):
  1984 -> disk1, disk2
  …

Multiple levels of redirection simplify data sharing across files and help minimize map updates. (A toy lookup is sketched below.)
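The following sketch walks a file offset through the two levels of redirection described above. The concrete dictionaries are illustrative; the real maps live sharded in the distributed key-value store, and the egids for extents 44 and 56 are elided on the slide, so 1984 is assumed for them here.

```python
EXTENT_SIZE = 8 * 1024  # 8K extents, matching the slide's file layout

# file offset range -> extent id, indexed by offset // EXTENT_SIZE
file_extents = {0: 33, 1: 44, 2: 56}

# Extent Id Map: eid -> egid (44/56 -> 1984 is assumed for illustration)
extent_id_map = {33: 1984, 44: 1984, 56: 1984}

# Extent Group Id Map: egid -> physical locations (replicas)
extent_group_id_map = {1984: ["disk1", "disk2"]}

def locate(offset):
    """Resolve a file offset to the disks holding its data."""
    eid = file_extents[offset // EXTENT_SIZE]
    egid = extent_id_map[eid]            # first level of redirection
    return extent_group_id_map[egid]     # second level: physical placement

print(locate(10 * 1024))  # offset in the 8-16K range -> ['disk1', 'disk2']
```

Because files reference extents rather than disks directly, moving an extent group only updates the Extent Group Id Map, and deduplicated extents can be shared across files without touching per-file state.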
MapReduce Framework
- Globally distributed maps are processed using MapReduce
- System-wide knowledge of the metadata is used to perform various self-managing tasks

[Diagram: Storage Mgmt uses MapReduce; both it and the I/O Service sit on top of the key-value store]
Example: Tiering
- Move cold data from fast storage (SSD) to slow storage (HDD, cloud)
- Identify cold data using a MapReduce job
  - Modified time (mtime): Extent Group Id Map
  - Access time (atime): Extent Group Id Access Map
Example: Tiering
- Is egid 120 "cold"? Its mtime is owned by Node A, while its atime is owned by Node D
- The maps are globally distributed, so this is not a local decision (see the ring sketch below)
- Use MapReduce to perform a "join"

[Diagram: metadata ring spanning Nodes A-D, each running a Curator instance; Curator (B) is the master, the others are slaves]
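As a hedged illustration of why the two attributes can have different owners: metadata keys are partitioned around a ring, so the mtime and atime entries for the same egid, living in different maps, hash to different positions. The hashing scheme below is invented for illustration, not Curator's actual partitioning.

```python
import hashlib
from bisect import bisect_right

NODES = ["A", "B", "C", "D"]

def _hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

# Each node gets a position on the ring.
RING = sorted((_hash(f"node-{n}"), n) for n in NODES)

def owner(key):
    """A key is owned by the first node clockwise from its hash."""
    points = [p for p, _ in RING]
    idx = bisect_right(points, _hash(key)) % len(RING)
    return RING[idx][1]

# Different maps mean different keys, hence (potentially) different owners.
print(owner("egid-map/120"))         # node owning the mtime entry
print(owner("egid-access-map/120"))  # node owning the atime entry
```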
Example: Tiering

Map phase:
- Scan both metadata maps
- Emit egid -> mtime or egid -> atime
- Partition using egid

Reduce phase:
- Reduce based on egid
- Generate tuples (egid, mtime, atime)
- Sort locally and identify the cold egroups

(A minimal sketch of this join appears below.)

[Diagram: Curator (A) emits 120 -> mtime, Curator (B) emits 120 -> atime, and the responsible reducer joins them into (120, mtime, atime)]
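Here is a single-process sketch of that join. The coldness threshold and record formats are invented for illustration; in Curator, the map tasks scan the distributed metadata maps on every node and the shuffle partitions emissions by egid.

```python
from collections import defaultdict
import time

COLD_AFTER = 30 * 24 * 3600  # hypothetical: cold if untouched for 30 days

def map_phase(egid_map_records, access_map_records):
    # Scan both maps and emit (egid, attribute) pairs.
    for egid, mtime in egid_map_records:
        yield egid, ("mtime", mtime)
    for egid, atime in access_map_records:
        yield egid, ("atime", atime)

def reduce_phase(pairs, now):
    # Group by egid (the "partition by egid" step), then join.
    grouped = defaultdict(dict)
    for egid, (kind, ts) in pairs:
        grouped[egid][kind] = ts
    cold = []
    for egid, attrs in grouped.items():
        last_touch = max(attrs.get("mtime", 0), attrs.get("atime", 0))
        if now - last_touch > COLD_AFTER:
            cold.append((egid, attrs.get("mtime"), attrs.get("atime")))
    # Coldest egroups first, as candidates for demotion to HDD.
    return sorted(cold, key=lambda t: max(t[1] or 0, t[2] or 0))

now = time.time()
pairs = map_phase([(120, now - 90 * 24 * 3600)], [(120, now - 60 * 24 * 3600)])
print(reduce_phase(pairs, now))  # egid 120 comes out as a cold egroup
```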
Other tasks implemented using Curator
- Disk Failure
- Disk Balancing
- Fault Tolerance
- Garbage Collection
- Data Removal
- Compression
- Erasure Coding
- Deduplication
- Snapshot Tree Reduction
Challenges
- Extensible, flexible and scalable framework
- Synchronization between background tasks and foreground I/O
- Heterogeneity across and within clusters over time
Co-Design Background Tasks and Foreground I/O
- The I/O Service provides an extended low-level API
- Storage Mgmt only gives hints to the I/O Service to act on the data
- Background tasks are batched and throttled (sketched below)
- Soft updates on metadata achieve lock-free execution

[Diagram: Storage Mgmt sends tasks to the I/O Service; both share the key-value store]
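A hedged sketch of the batching/throttling idea: the background tier submits hints, and the I/O service decides when, and whether, to act on them. The class and method names are invented for illustration, not Curator's API.

```python
import queue, threading, time

class ThrottledHintQueue:
    def __init__(self, max_per_second):
        self._hints = queue.Queue()
        self._interval = 1.0 / max_per_second

    def submit(self, hint):
        # Background task side: fire-and-forget. A hint may be dropped or
        # deferred; correctness never depends on it being executed.
        self._hints.put(hint)

    def drain(self, execute, batch_size=16):
        # I/O service side: pull hints in small batches, pausing between
        # batches so background work never starves foreground I/O.
        while True:
            batch = []
            while len(batch) < batch_size:
                try:
                    batch.append(self._hints.get_nowait())
                except queue.Empty:
                    break
            for hint in batch:
                execute(hint)
            time.sleep(self._interval * max(len(batch), 1))

q = ThrottledHintQueue(max_per_second=100)
q.submit(("migrate_egroup", 120, "hdd"))  # hint: egid 120 looks cold
threading.Thread(target=q.drain, args=(print,), daemon=True).start()
time.sleep(0.1)  # let the daemon drain the hint before the demo exits
```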
Evaluation
- Clusters in the wild: ~50 clusters over a period of 2.5 months
- Key results
  - Recovery: 95% of clusters have at most 0.1% under-replicated data
  - Garbage: 90% of clusters have at most 2% garbage
- More results in the paper
Tiering
- Goal: maximize SSD effectiveness for both reads and writes
- Curator uses a threshold to achieve some "desired" SSD occupancy (a sketch of this heuristic follows below)
- In the wild, 50% of clusters have 75% usage
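A sketch of that threshold heuristic: once SSD occupancy crosses a target, demote the coldest egroups to HDD until occupancy drops back under it. The 75% target and the helper names are assumptions for illustration, not Curator's actual values.

```python
DESIRED_SSD_OCCUPANCY = 0.75  # illustrative target, not Curator's value

def maybe_run_tiering(ssd_used_bytes, ssd_total_bytes, cold_egroups, demote):
    """Demote cold egroups until occupancy falls back under the target."""
    used = ssd_used_bytes
    if used / ssd_total_bytes <= DESIRED_SSD_OCCUPANCY:
        return  # below threshold: do nothing, keep the I/O path quiet
    for egid, size in cold_egroups:       # assumed sorted, coldest first
        demote(egid)                      # hint to the I/O service
        used -= size
        if used / ssd_total_bytes <= DESIRED_SSD_OCCUPANCY:
            break

maybe_run_tiering(
    ssd_used_bytes=80, ssd_total_bytes=100,
    cold_egroups=[(120, 4), (121, 4)],
    demote=lambda egid: print(f"demote egroup {egid} to HDD"),
)
```

A fixed threshold like this is exactly what the next section argues is sub-optimal across heterogeneous clusters.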
Challenges (recap)
- Extensible, flexible and scalable framework
- Synchronization between background tasks and foreground I/O
- Heterogeneity across and within clusters over time
Machine Learning - Reinforcement Learning
- Heterogeneity across cluster deployments
  - Resources
  - Workloads
- Dynamic changes within individual clusters
  - Changing workloads
  - Different loads over time
- Threshold-based heuristics are sub-optimal
- ML to the rescue: reinforcement learning!
Reinforcement Learning 101
[Diagram: the agent observes state s_t and reward r_t, takes action a_t, and the environment responds with the next state s_{t+1} and reward r_{t+1}]

(A toy version of this loop is sketched below.)
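To make the loop concrete, here is a minimal, self-contained version of the agent-environment interaction from the diagram. The two-state environment and tabular Q-learning agent are toy stand-ins, not anything from Curator.

```python
import random

class ToyEnv:
    """Reward +1 for action 1 in state 0, else 0; states alternate."""
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        reward = 1.0 if (self.state == 0 and action == 1) else 0.0
        self.state = 1 - self.state
        return self.state, reward

class QAgent:
    def __init__(self, n_states=2, n_actions=2, eps=0.1, alpha=0.5, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.eps, self.alpha, self.gamma = eps, alpha, gamma
    def act(self, s):
        if random.random() < self.eps:                 # explore
            return random.randrange(len(self.q[s]))
        return max(range(len(self.q[s])), key=lambda a: self.q[s][a])
    def update(self, s, a, r, s2):                     # Q-learning update
        target = r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

env, agent = ToyEnv(), QAgent()
state = env.reset()
for t in range(1000):                # the s_t, a_t, r_{t+1}, s_{t+1} loop
    action = agent.act(state)
    next_state, reward = env.step(action)
    agent.update(state, action, reward, next_state)
    state = next_state
print(agent.q)  # the agent learns to pick action 1 in state 0
```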
Agent Deployment
How is the agent deployed?
- With no prior knowledge, learning from scratch
- With some prior knowledge, then continuing to learn while deployed
Use Case: Tiering
- State = (cpu, memory, ssd, iops, riops, wiops)
- Actions = { run, not run }
- Reward = -latency
- Leverage the threshold-based heuristics to "bootstrap" the agent (sketched below)
  - Build a dataset from real traces: ~40 clusters in the wild, ~32K examples, State-Action-Reward tuples
  - Pre-train two models, one for each action
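A hedged sketch of that setup: one regression model per action predicts the reward (-latency), and the agent runs the tiering job when "run" looks better, with occasional exploration so it keeps learning after deployment. The linear models, epsilon value, and synthetic data are illustrative assumptions, not the talk's actual model.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

ACTIONS = ("run", "not_run")
FEATURES = ["cpu", "memory", "ssd", "iops", "riops", "wiops"]  # state layout

# One model per action, pre-trained offline on State-Action-Reward tuples
# mined from cluster traces (the ~32K examples in the talk).
models = {a: SGDRegressor(max_iter=1000) for a in ACTIONS}

def pretrain(dataset):
    # dataset: iterable of (state_vector, action, reward = -latency)
    for action in ACTIONS:
        X = np.array([s for s, a, r in dataset if a == action])
        y = np.array([r for s, a, r in dataset if a == action])
        models[action].fit(X, y)

def choose_action(state, eps=0.05):
    # Epsilon-greedy: mostly pick the action with the higher predicted
    # reward, occasionally explore.
    if np.random.rand() < eps:
        return np.random.choice(ACTIONS)
    preds = {a: models[a].predict([state])[0] for a in ACTIONS}
    return max(preds, key=preds.get)

# Tiny synthetic dataset, just to make the sketch executable.
rng = np.random.default_rng(0)
data = [(rng.random(6), rng.choice(ACTIONS), -rng.random()) for _ in range(200)]
pretrain(data)
print(choose_action(rng.random(6)))
```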
Evaluation

Metric      Avg. improvement
Latency     12%
SSD reads   16%
Conclusions
- Curator: framework and systems support for building background tasks
- Borrowed ideas from big data analytics but used them inside of a storage system
  - Distributed key-value store for metadata
  - MapReduce framework to process the metadata
- Co-designed background tasks and foreground I/O
  - The I/O Service provides an extended low-level API
  - Storage Mgmt only gives hints to the I/O Service to act on the actual data
- Used reinforcement learning to deal with heterogeneity across and within clusters over time
  - Results on tiering showed up to ~20% latency improvements