Towards Automating the Configuration of a Distributed Storage System


1. Towards Automating the Configuration of a Distributed Storage System
Lauro B. Costa, Matei Ripeanu
{lauroc, matei}@ece.ubc.ca
NetSysLab, University of British Columbia

2. Distributed File Systems
- Different workloads: read-only or write-only, high data similarity
- Different optimizations: temporary local storage, deduplication, replication
- One size does not fit all: each choice may be optimal for a specific workload, but not for all

3. Configurable Optimizations
- MosaStore [1] and Ursa Minor [2] propose file systems with configurable optimizations
- The user must choose which optimizations to enable
[1] S. Al-Kiswany, A. Gharaibeh, M. Ripeanu. The Case for a Versatile Storage System. SOSP Workshop on Hot Topics in Storage and File Systems (HotStorage '09).
[2] M. Abd-El-Malek, W. V. Courtright et al. Ursa Minor: Versatile Cluster-Based Storage. In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST '05).

4. Tuning a File System
Identify the system parameters
Define a target metric
do
  Define a target value
  Configure the parameters
  Measure and analyze the performance
while not satisfied
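The manual tuning loop above is essentially a do-while loop. Below is a minimal Python sketch. It assumes a hypothetical `measure` callback that runs the workload under one configuration and returns a lower-is-better metric such as time. For simplicity, the target value is fixed rather than redefined on each iteration.

```python
from typing import Callable, Iterable, Optional

def manual_tuning(
    candidates: Iterable[dict],
    measure: Callable[[dict], float],
    target: float,
) -> Optional[dict]:
    """Try configurations in order until the measured metric (lower is better)
    meets the target value; return None if no candidate is satisfactory."""
    for config in candidates:        # configure the parameters
        metric = measure(config)     # measure and analyze the performance
        if metric <= target:         # stop once satisfied
            return config
    return None


# Hypothetical measurements for two configurations.
times = {"dedup_on": 12.0, "dedup_off": 8.0}
configs = [{"name": "dedup_on"}, {"name": "dedup_off"}]
print(manual_tuning(configs, lambda c: times[c["name"]], target=10.0))  # {'name': 'dedup_off'}
```

Every call to `measure` runs the real workload, which is why the slides describe tuning as time-consuming.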

5. Tuning Is Hard
- Defining metrics and target values can be complex
- Lack of knowledge about the distributed system, the application, or the application's workload
- The workload or infrastructure can change
- Tuning is time-consuming

6. Deduplication: Detecting Similarity
[Figure: File A consists of blocks X, Y, Z, which hash to AAAA, BBBB, CCCC; File B consists of blocks W, Y, Z, which hash to DDDD, BBBB, CCCC. Only the first block is different.]
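Block-level similarity detection can be sketched in a few lines of Python. This sketch assumes fixed-size blocks and SHA-256 fingerprints; the slides do not specify the block size or hash function. It reproduces the File A / File B example using 4-byte blocks.

```python
import hashlib
from typing import List

def block_hashes(data: bytes, block_size: int) -> List[str]:
    """Split data into fixed-size blocks and fingerprint each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def new_blocks(previous: bytes, current: bytes, block_size: int) -> List[int]:
    """Indices of blocks in `current` whose hash does not already appear in
    `previous`; with deduplication, only these blocks need to be written."""
    known = set(block_hashes(previous, block_size))
    return [i for i, h in enumerate(block_hashes(current, block_size))
            if h not in known]


# File A = X Y Z, File B = W Y Z, as on the slide.
file_a = b"XXXX" + b"YYYY" + b"ZZZZ"
file_b = b"WWWW" + b"YYYY" + b"ZZZZ"
print(new_blocks(file_a, file_b, block_size=4))  # [0]
```

Fixed-size blocks keep the sketch simple. The future-work slide lists variable block boundaries as a further parameter for similarity detection.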

7. Deduplication for Checkpointing?
- Checkpointing applications write multiple snapshots
- Successive snapshots may have high data similarity
- Similarity depends on a number of factors, e.g.:
  - process-level or application-level checkpointing
  - checkpointing frequency

8. How can we configure the file system parameters (optimizations) with minimal human intervention?

9. Agenda
- Motivation: configurable file systems
- Architecture to automatically configure a FS
- First case: checkpointing applications
- Implementation
- Evaluation
- Summary and future work

10. Requirements
- Be easy to configure: minimal human intervention
- Achieve satisfactory performance: performance close to the administrator's intention
- Have a reasonable automated-configuration cost: overhead small enough to make the system worth using

11. Loop for Automated Configuration

12. Controller
- Utility function: captures the utility of the metrics
  - Simple for a single target metric, e.g., time
  - Reduces several target metrics to a single dimension
- Predictor: estimates how a configuration change affects the target metrics

13. The controller decides the configuration by comparing the utility of the current metrics with that of the predicted metrics
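The controller's decision rule can be sketched as follows. The metric names and the `utility` callable are illustrative assumptions. The sketch switches the deduplication setting only when the predicted metrics of the other setting score strictly higher, so a tie keeps the current setting.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]  # e.g. {"time": seconds, "storage": MB}

def decide(dedup_on: bool,
           current: Metrics,
           predicted_other: Metrics,
           utility: Callable[[Metrics], float]) -> bool:
    """Return the deduplication setting to use next (higher utility is better)."""
    if utility(predicted_other) > utility(current):
        return not dedup_on   # the other setting is predicted to be better
    return dedup_on           # keep the current setting


def time_only(m: Metrics) -> float:
    """Optimize for time only: lower time means higher utility."""
    return -m["time"]


print(decide(False, {"time": 10.0, "storage": 100.0},
             {"time": 6.0, "storage": 40.0}, time_only))  # True
```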

14. Agenda
- Motivation: configurable file systems
- Architecture to automatically configure a FS
- First case: checkpointing applications
- Implementation
- Evaluation
- Conclusions
- Future work

15. Data Deduplication
- Can save storage space and network bandwidth
- Has a high computational cost for hashing the data
- Mechanism to choose between two options: data deduplication on or off

16. Control Loop for Deduplication
- Metrics: time spent and storage space
- Keep a history of writes:
  - total time
  - number of blocks received
  - number of similar blocks
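The write history needs only a few running counters. Below is a minimal sketch with hypothetical field and method names matching the three quantities on the slide. The derived similarity ratio is an assumption about how the counters would be used.

```python
from dataclasses import dataclass

@dataclass
class WriteHistory:
    """Running totals kept by the control loop (hypothetical names)."""
    total_time: float = 0.0    # seconds spent in writes
    blocks_received: int = 0   # blocks the client asked to write
    blocks_similar: int = 0    # blocks whose hash was already stored

    def record(self, elapsed: float, received: int, similar: int) -> None:
        """Add the measurements collected for one write."""
        self.total_time += elapsed
        self.blocks_received += received
        self.blocks_similar += similar

    @property
    def similarity(self) -> float:
        """Fraction of received blocks that were duplicates (0.0 if none)."""
        if self.blocks_received == 0:
            return 0.0
        return self.blocks_similar / self.blocks_received


h = WriteHistory()
h.record(elapsed=2.0, received=100, similar=25)
h.record(elapsed=1.0, received=100, similar=75)
print(h.similarity)  # 0.5
```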

17. Utility Function
- The administrator assigns weights to capture the relative importance of each metric, e.g.:
  - 1 × time + 0 × storage
  - 0.5 × time + 0.5 × storage
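The weighted utility function reduces both metrics to one number. This sketch assumes the metrics are already normalized to unitless costs, where lower is better. The default weights reproduce the time-only case. Negating the weighted sum, so that a higher utility is better, is a convention chosen here and not necessarily the authors'.

```python
def utility(time_norm: float, storage_norm: float,
            w_time: float = 1.0, w_storage: float = 0.0) -> float:
    """Weighted utility, e.g. 1 x time + 0 x storage or 0.5 x time + 0.5 x storage.
    Inputs are normalized costs (lower is better), so the weighted sum is
    negated: a higher utility means a better outcome."""
    return -(w_time * time_norm + w_storage * storage_norm)


print(utility(0.4, 0.9))            # time only: -0.4
print(utility(0.5, 1.0, 0.5, 0.5))  # equal weights: -0.75
```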

18. Predictor
- No deduplication: space = number of blocks; time = I/O operations
- Deduplication: space and I/O operations take similarity into account; time also includes hashing the data
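One way to fill in the predictor is a simple linear cost model. This is an assumption, not the authors' exact model. The per-block I/O time and per-block hashing time (hypothetical parameters `io_s_per_block` and `hash_s_per_block`) would be estimated from the write history.

```python
from typing import NamedTuple

class Prediction(NamedTuple):
    space_blocks: int   # blocks that must be stored
    time_s: float       # estimated write time in seconds

def predict(blocks: int, similar: int, dedup: bool,
            io_s_per_block: float, hash_s_per_block: float) -> Prediction:
    """Without deduplication, every block is stored and sent.
    With deduplication, only non-similar blocks are, but every block is hashed."""
    if not dedup:
        return Prediction(blocks, blocks * io_s_per_block)
    unique = blocks - similar
    return Prediction(unique, unique * io_s_per_block + blocks * hash_s_per_block)


print(predict(8, 6, dedup=False, io_s_per_block=0.25, hash_s_per_block=0.125))
# Prediction(space_blocks=8, time_s=2.0)
print(predict(8, 6, dedup=True, io_s_per_block=0.25, hash_s_per_block=0.125))
# Prediction(space_blocks=2, time_s=1.5)
```

With no similar blocks, the same model predicts that deduplication is slower (8 blocks, 3.0 s), because the hashing cost buys no I/O savings.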

19. Evaluation
- Three aspects:
  - effort to configure
  - performance
  - overhead
- Experimental setup: 10 storage nodes, 1 Gbps NICs

20. Evaluation: Workload
- Synthetic, with varying similarity
- For each similarity level, write 100 snapshots
- Similar results for several snapshot sizes: 32, 64, 128, 256, and 512 MB
- Plots shown for 256 MB
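A synthetic checkpoint workload with controlled similarity might be generated as below. The generator is hypothetical, since the slides do not describe how the workload was built. Each new snapshot keeps a `similarity` fraction of the previous snapshot's blocks and replaces the rest with random bytes. Sizes are tiny here; the evaluation wrote 100 snapshots per similarity level, at 32 to 512 MB each. `random.Random.randbytes` requires Python 3.9 or later.

```python
import random
from typing import List

def synthetic_snapshots(n_snapshots: int, n_blocks: int, block_size: int,
                        similarity: float, seed: int = 0) -> List[bytes]:
    """Each snapshot after the first rewrites round((1 - similarity) * n_blocks)
    randomly chosen blocks of the previous one with fresh random data."""
    rng = random.Random(seed)  # seeded for reproducibility
    blocks = [rng.randbytes(block_size) for _ in range(n_blocks)]
    snapshots = [b"".join(blocks)]
    n_changed = round((1.0 - similarity) * n_blocks)
    for _ in range(n_snapshots - 1):
        for i in rng.sample(range(n_blocks), n_changed):
            blocks[i] = rng.randbytes(block_size)
        snapshots.append(b"".join(blocks))
    return snapshots


snaps = synthetic_snapshots(n_snapshots=3, n_blocks=10, block_size=16, similarity=0.7)
```

Comparing the first two snapshots block by block should find 7 of the 10 blocks unchanged.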

21. Effort to Configure
- Small effort: the administrator specifies the weight of each metric
- No effort in the default case: the system optimizes for time

22. Optimizing for Time

23. Optimizing for Time
- The hashing cost is paid off by the savings in I/O operations
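Under a simple linear cost model, the trade-off on this slide reduces to a break-even similarity. The model is an assumption, not the authors' analysis. Let $N$ be the number of blocks written, $s$ the fraction of similar blocks, $t_{io}$ the I/O time per block, and $t_h$ the hashing time per block:

```latex
\begin{aligned}
T_{\text{off}} &= N\,t_{io} \\
T_{\text{on}}  &= (1-s)\,N\,t_{io} + N\,t_h \\
T_{\text{on}} < T_{\text{off}} &\iff N\,t_h < s\,N\,t_{io} \iff s > \frac{t_h}{t_{io}}
\end{aligned}
```

Under this model, deduplication saves time once the similarity exceeds the per-block ratio of hashing cost to I/O cost. For example, with $t_h / t_{io} = 0.5$, more than half of the blocks must be similar.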

24. Optimizing for Time

25. Overhead
- Memory: less than 1 KB
- Computational:
  - low similarity: within 5% in the evaluated cases
  - high similarity: negligible

26. Summary
- Initial study on automatically configuring a file system
- Data deduplication is configured properly with low overhead

27. Future Work
- More parameters for similarity detection: variable block boundaries, block sizes, offloading to the GPU
- Constraints for utility functions, e.g., the best time for a maximum storage space
- More optimizations (replication, buffer size, caching policies) and metrics (reliability, energy)


29. Mixing Time and Storage Space


31. MosaStore Architecture
[Figure: metadata manager, benefactors (storage nodes), client (FS interface)...]

32. Prototype in MosaStore
- Deduplication can be turned on and off on the fly
- The write flow collects the measurements
- The monitor and the controller are co-located with the client

33. Utility
- Utility is a measure of relative satisfaction: how happy the administrator is
- Money is a good proxy, but complicated
- Focus on simple cases:
  - 100% time + 0% space
  - 50% time + 50% space
  - constraint on space, optimize for time
- The function cannot combine different units, so the metrics are normalized
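The normalization step and the constrained case can be sketched as follows. Normalizing each metric by a baseline measurement is one possible choice; here the baseline is hypothetically the run with deduplication off. The slides only state that the metrics must be normalized.

```python
from typing import List, Optional, Tuple

def normalize(value: float, baseline: float) -> float:
    """Make a metric unitless by expressing it relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return value / baseline

def mixed_cost(time_s: float, storage_mb: float,
               base_time_s: float, base_storage_mb: float,
               w_time: float = 0.5, w_space: float = 0.5) -> float:
    """50% time + 50% space on normalized values; lower is better."""
    return (w_time * normalize(time_s, base_time_s)
            + w_space * normalize(storage_mb, base_storage_mb))

Option = Tuple[str, float, float]  # (name, time in s, storage in MB)

def fastest_within_space(options: List[Option],
                         max_storage_mb: float) -> Optional[Option]:
    """Constraint on space, optimize for time: the fastest option that fits
    within the storage limit, or None if none fits."""
    feasible = [o for o in options if o[2] <= max_storage_mb]
    return min(feasible, key=lambda o: o[1], default=None)


print(mixed_cost(12.0, 64.0, base_time_s=8.0, base_storage_mb=256.0))  # 0.875
```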

34. MosaStore Architecture
- Storage space aggregated from nodes in a network
- Naming schemes: BYHASH, BYSEQ
- File creation/write:
  - collects metrics
  - has the option to activate similarity detection
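The slide names two block-naming schemes without defining them. Suppose, as the names suggest, that BYSEQ names a block by its file and sequence position, while BYHASH names it by a hash of its content. The difference for deduplication can then be illustrated with a hypothetical `block_name` helper. The helper is not MosaStore's API, and SHA-256 is an assumed hash.

```python
import hashlib

def block_name(scheme: str, file_id: str, seq: int, data: bytes) -> str:
    """BYSEQ: identical content at different positions gets different names.
    BYHASH: identical content always gets the same name, so a duplicate
    block can be recognized, and stored once, by its name alone."""
    if scheme == "BYSEQ":
        return f"{file_id}:{seq}"
    if scheme == "BYHASH":
        return hashlib.sha256(data).hexdigest()
    raise ValueError(f"unknown naming scheme: {scheme}")


print(block_name("BYSEQ", "ckpt1", 0, b"YYYY"))  # ckpt1:0
print(block_name("BYHASH", "ckpt1", 0, b"YYYY")
      == block_name("BYHASH", "ckpt2", 3, b"YYYY"))  # True
```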

