Towards Automating the Configuration of a Distributed Storage System. Lauro B. Costa, Matei Ripeanu {lauroc, NetSysLab, University of British Columbia.

Towards Automating the Configuration of a Distributed Storage System Lauro B. Costa Matei Ripeanu {lauroc, NetSysLab University of British Columbia

2 Distributed File Systems
- Different workloads: read/write-only, high data similarity
- Different optimizations: temporary local storage, deduplication, replication
- One size does not fit all: each choice may be optimal for a specific workload, but not for all

3 Configurable Optimizations
- MosaStore [1] and Ursa Minor [2] propose file systems with configurable optimizations
- The user must choose the optimizations
[1] S. Al-Kiswany, A. Gharaibeh, M. Ripeanu. The Case for a Versatile Storage System. SOSP Workshop on Hot Topics in Storage and File Systems (HotStorage '09).
[2] M. Abd-El-Malek, W. V. Courtright et al. Ursa Minor: Versatile Cluster-Based Storage. Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST '05).

4 Tuning a File System
Identify the system parameters
Define a target metric
do
    Define a target value
    Configure the parameters
    Measure and analyze the performance
while not satisfied

5 Tuning Is Hard
- Defining metrics and target values can be complex
- Administrators may lack knowledge of the distributed system, the application, or the application's workload
- The workload or infrastructure can change
- Tuning is time-consuming

6 Deduplication: Detecting Similarity
Only the first block is different:
- File A: blocks X Y Z, hashes AAAA BBBB CCCC
- File B: blocks W Y Z, hashes DDDD BBBB CCCC
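The detection step on this slide can be sketched in a few lines: split each file into fixed-size blocks, hash each block, and count the blocks of the new file whose hash already exists. This is a minimal sketch, not MosaStore's code; the function names, the use of SHA-1, and the 4-byte block size are illustrative assumptions.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list:
    """Split data into fixed-size blocks and return each block's SHA-1 hex digest."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def similar_blocks(old: bytes, new: bytes, block_size: int) -> int:
    """Count blocks of `new` whose hash already appears among the blocks of `old`."""
    seen = set(block_hashes(old, block_size))
    return sum(1 for h in block_hashes(new, block_size) if h in seen)


# Mirrors the slide: File A = X Y Z, File B = W Y Z (hypothetical 4-byte blocks).
file_a = b"XXXX" + b"YYYY" + b"ZZZZ"
file_b = b"WWWW" + b"YYYY" + b"ZZZZ"
print(similar_blocks(file_a, file_b, 4))  # 2: only the first block differs
```

Only blocks with a new hash need to be stored or sent, which is where the space and bandwidth savings come from.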

7 Deduplication for Checkpointing?
- Checkpointing applications write multiple snapshots
- Successive snapshots may have high data similarity
- Similarity depends on a number of factors, e.g.:
  - process-level or application-level checkpointing
  - checkpointing frequency

8 How can we configure the file system parameters (optimizations) with minimal human intervention?

9 Agenda
- Motivation: configurable file systems
- Architecture to automatically configure a file system
- First case: checkpointing applications
- Implementation
- Evaluation
- Summary and future work

10 Requirements
- Be easy to configure: minimal human intervention
- Be able to reach satisfactory performance: performance close to the administrator's intention
- Have a reasonable automated configuration cost: overhead small enough to make the system worth using

11 Loop for Automated Configuration

12 Controller
- A utility function captures the utility of the metrics
  - It is simple for one target metric, e.g., time
  - It reduces several target metrics to a single dimension
- A predictor estimates how a change affects the target metrics

13 The controller decides the configuration by comparing the utility of the current and predicted metrics
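The decision rule can be sketched as: compute the utility of the metrics measured under the current configuration, compute the utility of each alternative's predicted metrics, and switch only if an alternative is strictly better. This is a minimal sketch; `decide`, `time_only`, the metric key `time_s`, and the configuration names are hypothetical.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]


def decide(current_config: str,
           current_metrics: Metrics,
           predicted: Dict[str, Metrics],
           utility: Callable[[Metrics], float]) -> str:
    """Return the configuration with the highest utility.

    current_metrics: metrics measured under current_config.
    predicted: the predictor's estimated metrics for each alternative configuration.
    """
    best_config = current_config
    best_utility = utility(current_metrics)
    for config, metrics in predicted.items():
        candidate = utility(metrics)
        if candidate > best_utility:  # switch only on a strict improvement
            best_config, best_utility = config, candidate
    return best_config


def time_only(m: Metrics) -> float:
    """Optimize for time: less time means higher utility."""
    return -m["time_s"]


print(decide("dedup_off", {"time_s": 10.0}, {"dedup_on": {"time_s": 8.0}}, time_only))
# dedup_on
```

Keeping the current configuration on ties avoids flipping the setting back and forth when both options look equally good.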

14 Agenda
- Motivation: configurable file systems
- Architecture to automatically configure a file system
- First case: checkpointing applications
- Implementation
- Evaluation
- Conclusions
- Future work

15 Data Deduplication
- Can save storage space and network bandwidth
- Has a high computational cost to hash data
- Mechanism to choose between two options: data deduplication on or off

16 Control Loop for Deduplication
- Metrics: time spent and storage space
- Keep a history of writes:
  - total time
  - number of blocks received
  - number of similar blocks
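The write history on this slide needs only three counters, which fits the small memory overhead reported later. A minimal sketch, assuming each write reports its elapsed time, the number of blocks it received, and how many of them were similar; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WriteHistory:
    total_time: float = 0.0   # seconds spent in writes
    blocks_received: int = 0  # blocks written by the application
    blocks_similar: int = 0   # blocks found to duplicate existing data

    def record(self, elapsed: float, received: int, similar: int) -> None:
        """Accumulate the measurements collected for one write."""
        self.total_time += elapsed
        self.blocks_received += received
        self.blocks_similar += similar

    def similarity(self) -> float:
        """Fraction of received blocks that were similar (0.0 before any write)."""
        if self.blocks_received == 0:
            return 0.0
        return self.blocks_similar / self.blocks_received


h = WriteHistory()
h.record(1.5, 100, 40)
h.record(0.5, 100, 60)
print(h.total_time, h.similarity())  # 2.0 0.5
```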

17 Utility Function
The administrator assigns weights that capture the relative importance of each metric, e.g.:
- 1 x time + 0 x storage
- 0.5 x time + 0.5 x storage
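A weighted utility of this form can be sketched as a negated weighted sum of costs. Since seconds and bytes cannot be added directly, this sketch divides each metric by a reference value first. The choice of reference (here, the measurements without deduplication) and all numbers are hypothetical assumptions for illustration.

```python
def utility(time_s: float, storage: float,
            w_time: float, w_storage: float,
            ref_time_s: float, ref_storage: float) -> float:
    """Higher is better.

    Each metric is divided by a reference value so that both terms are
    unitless before weighting; the weighted sum is a cost, so it is negated.
    """
    cost = w_time * (time_s / ref_time_s) + w_storage * (storage / ref_storage)
    return -cost


# Hypothetical measurements: dedup off = (8 s, 100 GB), dedup on = (10 s, 40 GB).
ref = (8.0, 100.0)
for weights in [(1.0, 0.0), (0.5, 0.5)]:
    off = utility(8.0, 100.0, *weights, *ref)
    on = utility(10.0, 40.0, *weights, *ref)
    print(weights, "dedup on" if on > off else "dedup off")
# (1.0, 0.0) dedup off
# (0.5, 0.5) dedup on
```

With 1 x time + 0 x storage only speed matters, so the faster option wins; giving storage half the weight lets the space savings outweigh the extra time.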

18 Predictor
- No deduplication: space = number of blocks; time = I/O operations
- Deduplication: space = number of blocks, considering similarity; time = I/O operations, considering similarity, + time for hashing data
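The predictor's two rows can be turned into a small cost model. This is a minimal sketch under assumed inputs: the similarity fraction comes from the write history, and the per-block I/O time, per-block hashing time, and block size are hypothetical parameters, not values from the deck.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    time_s: float
    space_bytes: float


def predict(n_blocks: int, similarity: float, block_bytes: int,
            io_time_per_block_s: float, hash_time_per_block_s: float,
            dedup: bool) -> Prediction:
    """Estimate time and space for writing n_blocks with deduplication on or off."""
    if not dedup:
        # Every block is transferred and stored.
        return Prediction(n_blocks * io_time_per_block_s, n_blocks * block_bytes)
    unique = n_blocks * (1.0 - similarity)
    # Only non-similar blocks are transferred and stored, but every block is hashed.
    return Prediction(unique * io_time_per_block_s + n_blocks * hash_time_per_block_s,
                      unique * block_bytes)


MiB = 1 << 20
print(predict(100, 0.5, MiB, 0.01, 0.002, dedup=False))  # about 1.0 s, 100 MiB
print(predict(100, 0.5, MiB, 0.01, 0.002, dedup=True))   # about 0.7 s, 50 MiB
```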

19 Evaluation
Three aspects:
- Effort to configure
- Performance
- Overhead
Experimental setup: 10 storage nodes, 1 Gbps NICs

20 Evaluation: Workload
- Synthetic workload with varied similarity
- For each similarity level, write 100 snapshots
- Similar results for several snapshot sizes: 32, 64, 128, 256, and 512 MB
- Plots shown for 256 MB

21 Effort to Configure
- Small effort: the administrator specifies the weights for each metric
- No effort in the default case: the system optimizes for time

22 Optimizing for Time

23 Optimizing for Time
The hashing cost is paid off by savings in I/O operations
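When the hashing cost pays off can be worked out from a simple per-block model. Assume (hypothetically) that writing $N$ blocks costs $t_{io}$ of I/O time per transferred block, that hashing costs $t_h$ per block, and that a fraction $s$ of blocks is similar. Deduplication then reduces write time exactly when $s$ exceeds the ratio of hashing cost to I/O cost:

```latex
T_{\text{off}} = N\,t_{io}, \qquad T_{\text{on}} = N(1-s)\,t_{io} + N\,t_h

T_{\text{on}} < T_{\text{off}}
  \iff N(1-s)\,t_{io} + N\,t_h < N\,t_{io}
  \iff t_h < s\,t_{io}
  \iff s > \frac{t_h}{t_{io}}
```

For example, if hashing a block took one fifth of the time needed to send it, deduplication would save time whenever more than 20% of the blocks were similar.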

24 Optimizing for Time

25 Overhead
- Memory: less than 1 KB
- Computational:
  - low similarity: within 5% in the evaluated cases
  - high similarity: negligible

26 Summary
- Initial study on automatically configuring a file system
- Data deduplication is configured properly with low overhead

27 Future Work
- More parameters for similarity detection: variable block boundaries, block sizes, offloading to GPUs
- Constraints for utility functions, e.g., best time for a maximum storage space
- More optimizations and metrics:
  - optimizations: replication, buffer size, caching policies
  - metrics: reliability, energy


29 Mixing time and storage space


31 MosaStore Architecture
- Metadata manager
- Benefactors (storage nodes)
- Client (FS interface)...

32 Prototype in MosaStore
- Deduplication can be turned on and off on the fly
- The write flow collects the measurements
- The monitor and controller are co-located with the client

33 Utility
- Utility is a measure of relative satisfaction: how happy the administrator is
- Money is a good proxy, but complicated
- Focus on simple cases:
  - 100% time + 0% space
  - 50% time + 50% space
  - a constraint on space, optimizing for time
- The function cannot combine different units: normalize
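The third case, a constraint on space while optimizing for time, can be sketched as filtering out configurations whose predicted space exceeds a cap and picking the fastest of the rest. A minimal sketch; the function name, option names, and numbers are hypothetical.

```python
from typing import Dict, Optional, Tuple


def best_time_under_space_cap(options: Dict[str, Tuple[float, float]],
                              max_space: float) -> Optional[str]:
    """options maps a configuration to (predicted_time_s, predicted_space).

    Returns the fastest configuration whose space fits under the cap,
    or None if no configuration satisfies the constraint.
    """
    feasible = {name: t for name, (t, space) in options.items() if space <= max_space}
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


options = {"dedup_off": (1.0, 100.0), "dedup_on": (1.2, 50.0)}
print(best_time_under_space_cap(options, 80.0))   # dedup_on: only it fits
print(best_time_under_space_cap(options, 200.0))  # dedup_off: both fit, it is faster
```

Unlike a weighted sum, a constraint needs no normalization: space is only compared against a cap in its own unit, and only time is optimized.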

34 MosaStore Architecture
- Storage space aggregated from nodes in a network
- Naming schemes: BYHASH, BYSEQ
- File creation/write:
  - collects metrics
  - has the option to activate similarity detection
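One way to read the two naming schemes is that BYSEQ names each block by file and position, while BYHASH names it by a hash of its content, so identical blocks collapse into one stored entry. This is an assumed interpretation, not MosaStore's implementation; the `write_file` helper, the name formats, and SHA-1 are hypothetical.

```python
import hashlib
from typing import Dict, List


def write_file(store: Dict[str, bytes], file_id: str, data: bytes,
               block_size: int, scheme: str) -> List[str]:
    """Write data block by block into store and return the block names (hypothetical).

    BYSEQ:  name = "<file_id>-<index>", so every block gets its own entry.
    BYHASH: name = SHA-1 of the block, so identical blocks share one entry.
    """
    names: List[str] = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        if scheme == "BYHASH":
            name = hashlib.sha1(block).hexdigest()
        elif scheme == "BYSEQ":
            name = f"{file_id}-{index}"
        else:
            raise ValueError(f"unknown naming scheme: {scheme}")
        store[name] = block
        names.append(name)
    return names


byhash: Dict[str, bytes] = {}
byseq: Dict[str, bytes] = {}
for fid, data in [("a", b"XXXXYYYYZZZZ"), ("b", b"WWWWYYYYZZZZ")]:
    write_file(byhash, fid, data, 4, "BYHASH")
    write_file(byseq, fid, data, 4, "BYSEQ")
print(len(byhash), len(byseq))  # 4 6
```

Under this reading, switching deduplication on or off on the fly amounts to choosing the naming scheme for each write.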