Jiang Zhou, Wei Xie, Dong Dai, and Yong Chen

Pattern-Directed Replication Scheme for Heterogeneous Object-based Storage
Jiang Zhou, Wei Xie, Dong Dai, and Yong Chen
Data-Intensive Scalable Computing Lab, Texas Tech University
CCGrid-2017

Outline
Motivation
Pattern-Directed Replication Scheme
  Architecture Overview
  I/O Trace Collection
  Data Access Pattern Analysis
  Pattern-Directed Replication
Evaluation
Conclusion & Future Work

Here is the outline of the presentation. I will begin with the motivation, then present the pattern-directed replication scheme and its evaluation, and close with the conclusion and future work.

Motivation
The object-based storage model is an important storage architecture (e.g., Lustre, Ceph, Sheepdog)
  Objects are the logical entities
  Exploits the inherent advantages of storage devices
Heterogeneous storage systems are promising
  Magnetic media (hard disk drives, HDDs)
  Non-volatile storage-class memory (SCM)
    High bandwidth, low latency, and no mechanical components
    Limited by small capacity, short lifetime, and high cost

The motivation of our work is to understand data access patterns for data replication in heterogeneous object-based storage systems. The object-based storage model, which abstracts files as multiple data objects stored on object-based devices, has become increasingly important for distributed storage systems in data centers. It has been widely adopted in production systems such as Lustre, Ceph, and Sheepdog. Recently, heterogeneous storage systems have gained increasing attention. They combine conventional hard disk drives (HDDs) with emerging devices such as non-volatile storage-class memory (SCM), which includes solid state drives and phase change memory. Compared with HDDs, SCM offers high bandwidth, low latency, and no mechanical components, but it is limited by small capacity, short lifetime, and high cost.

Motivation
Replication is a common approach to enhance data availability in storage systems
  Automatically copies data and distributes it as multiple replicas
  Data can be retrieved from other replicas if one copy is inaccessible or corrupted
Limitations of current replication schemes for object-based storage
  There has been little work on data replication for heterogeneous object-based storage systems
  They cannot take full advantage of heterogeneous device merits
  They do not consider object access patterns well
    They place hot, frequently accessed data on SSDs and cold, sequentially accessed data on HDDs
    They do not consider the access correlation of objects

On the other hand, replication is a common approach to enhance data availability in storage systems. Its core concept is to automatically replicate data objects and distribute them across multiple devices. With replication, data can be retrieved from other replicas if one copy is inaccessible or corrupted. Although numerous studies have been devoted to the design and development of data replication schemes, there has been little work on data replication for heterogeneous object-based storage systems. Because heterogeneous devices differ in performance and capacity, replicas should be placed according to device characteristics. Existing replication strategies for heterogeneous storage mainly rely on frequency or sequentiality: they place replicas of hot, frequently accessed data on SSDs and cold, sequentially accessed data on HDDs. However, this does not consider object access patterns well. For instance, two strongly correlated objects that are normally accessed one right after the other might be placed on an HDD and an SSD, respectively, which leads to poor performance when the same access pattern appears again.

Pattern-Directed Replication Scheme (PRS)
Addressing two technical challenges
  Analyze object access patterns to guide data replication
  Optimize replica layout to take full advantage of heterogeneous device characteristics
Approach
  Selectively replicate data objects
  Group objects based on their temporal locality or access frequency and reorganize them for replication
  Use a pseudo-random algorithm to optimize replica layout by considering device bandwidth and capacity

This work introduces a novel pattern-directed replication scheme (PRS) to achieve highly efficient data replication in heterogeneous object-based storage systems. The PRS addresses two technical challenges: analyzing object access patterns to guide data replication, and optimizing replica layout to take full advantage of heterogeneous device characteristics. Unlike traditional schemes, the PRS selectively replicates data objects and distributes replicas to different storage devices according to device features. It reorganizes objects using an object distance calculation and arranges replicas according to the data access patterns identified for the application. In addition, the PRS uses a pseudo-random algorithm to optimize replica layout by considering storage device performance and capacity.

Architecture Overview
Next, we introduce the detailed design of the pattern-directed replication scheme (PRS) for heterogeneous object-based storage systems. First, as shown in the figure, applications running on physical machines or virtual machines (VMs) send I/O requests to the backend storage system. The first replica of each data object is placed with the original data layout, e.g., via the consistent hashing algorithm. Second, the second and third replicas of an object are created by the PRS after pattern analysis. The data access pattern analysis is based on runtime object I/O requests, which are recorded by a trace collector in each storage node. The PRS analyzes the trace to find data access patterns; this can be done periodically in real time or offline. It groups objects based on the identified access patterns to create the second and later replicas, and places them with a replica distribution algorithm in the background. Third, when the same application runs again, I/O requests for an object that has been replicated with the PRS are redirected to the new copy in the optimized layout for better performance.

I/O Trace Collection
Collected aspects of object requests
  Object ID
  Start offset in an object
  Size of access
  Access time
  Operation code
An example of an FIO trace on Sheepdog (one node's trace); columns: Time, Opcode, Object ID, Offset, Size

An I/O trace collector is responsible for collecting runtime object information in the object-based storage system. We collect five aspects of each object request for further pattern analysis, as listed on the slide. The figure on the right shows an example of an FIO trace on Sheepdog, a distributed storage system for virtual machine storage. In a distributed storage system, each storage node generates one trace file that contains all the I/O requests the node receives or forwards. The trace file is used for data access pattern analysis and for object reorganization for replication on that node.
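To make the five collected fields concrete, here is a minimal Python sketch of a trace record and a parser. The on-disk layout is an assumption: one whitespace-separated record per line in the column order shown on the slide (time, opcode, object ID, offset, size). The names `TraceRecord`, `parse_trace_line`, and `parse_trace` are hypothetical, as are the opcode labels and the time unit; the actual Sheepdog trace format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    time: float      # access time (seconds is an assumed unit)
    opcode: str      # operation code, e.g. "READ" (labels are assumed)
    object_id: str   # object identifier
    offset: int      # start offset within the object, in bytes
    size: int        # size of the access, in bytes


def parse_trace_line(line: str) -> TraceRecord:
    """Parse one record in the assumed order: time opcode object_id offset size."""
    time, opcode, object_id, offset, size = line.split()
    return TraceRecord(float(time), opcode, object_id, int(offset), int(size))


def parse_trace(lines):
    """Parse all non-empty lines and return the records sorted by access time."""
    records = [parse_trace_line(line) for line in lines if line.strip()]
    return sorted(records, key=lambda r: r.time)
```

Sorting by access time is a convenience for the later pattern analysis, which looks at the order in which objects are accessed.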

Data Access Pattern
Object division and placement
  The scheme targets data replication for virtual machine storage
  A file is divided into fixed-size 4MB objects when it is stored

The figure shows a typical data placement for a virtual machine using object-based storage. In this figure, the 40GB image file is divided into fixed-size 4MB objects when it is stored.
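The arithmetic behind this division can be sketched in a few lines of Python. The sketch assumes binary units (4MB = 4 × 1024 × 1024 bytes) and objects numbered from zero in file order; the function names are hypothetical. Under these assumptions a 40GB image yields 10240 objects.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # 4MB objects, as on the slide (binary units assumed)


def object_count(image_bytes: int, object_size: int = OBJECT_SIZE) -> int:
    """Number of fixed-size objects needed to hold the image (ceiling division)."""
    return -(-image_bytes // object_size)


def locate(byte_offset: int, object_size: int = OBJECT_SIZE):
    """Map a byte offset in the image to (object index, offset inside that object)."""
    return divmod(byte_offset, object_size)
```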

Two Types of Object Correlation
Local data access pattern
  A temporal-based correlation
  Reflects the object access order in a period of time
  Objects that are accessed together in a given interval and accessed periodically
Global data access pattern
  Represents the objects that are most frequently accessed during the application run time
  Adopts a configuration-aware method to find hot objects
  A certain percentage of objects is considered hot data according to the proportion of HDD and SSD capacity

To characterize object access behavior, we define two types of data access patterns: a local data access pattern and a global data access pattern. The local data access pattern represents a temporal-based correlation, which reflects the object access order in a period of time. Objects that are accessed together in a given interval and accessed periodically belong to this pattern. The global data access pattern is not limited to accesses within a period of time. Instead, it represents the objects that are most frequently accessed during the whole application run time. We adopt a configuration-aware method to find the hot objects that belong to the global pattern: a certain percentage of objects is considered hot data according to the proportion of HDD and SSD capacity.
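The two pattern types can be illustrated with a small Python sketch. It is not the authors' algorithm. Two assumptions are made: the hot fraction is taken as SSD capacity divided by total capacity (the slides only say it follows the proportion of HDD and SSD capacity), and temporal correlation is approximated by counting how many fixed-length time windows contain both objects of a pair. The names `hot_objects` and `co_access_counts` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def hot_objects(accesses, ssd_capacity, hdd_capacity):
    """Global pattern: the most frequently accessed objects.

    accesses is an iterable of (time, object_id) pairs. The hot fraction
    ssd / (ssd + hdd) is an assumed formula, not taken from the slides.
    """
    counts = Counter(obj for _, obj in accesses)
    fraction = ssd_capacity / (ssd_capacity + hdd_capacity)
    k = int(len(counts) * fraction)
    # Rank by descending frequency; break ties by object ID for determinism.
    ranked = sorted(counts, key=lambda o: (-counts[o], o))
    return set(ranked[:k])


def co_access_counts(accesses, window):
    """Local pattern: per object pair, the number of time windows containing both."""
    buckets = defaultdict(set)
    for t, obj in accesses:
        buckets[int(t // window)].add(obj)
    pairs = Counter()
    for objs in buckets.values():
        pairs.update(combinations(sorted(objs), 2))
    return pairs
```

A pair that shows up together in many windows is a candidate for the local pattern; the window length and the threshold on the count are tuning choices left open here.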

Pattern-Directed Replication Scheme
Group objects that belong to local or global patterns
  Uses the object clustering algorithm
  The objects in the same group/pattern are merged into a new object and replicated
First replicate objects (hot data) with a global pattern
Then replicate objects that belong to local patterns

Using the object clustering algorithm, the PRS groups objects based on the local pattern or the global pattern. The objects in the same group are merged into a new object and replicated. Among the identified data access patterns, the PRS first replicates objects with a global pattern; that is, a proportion of hot objects is grouped for replication first. Objects that do not belong to the global pattern are then grouped according to the identified local patterns.
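The ordering described here (global pattern first, then local patterns for the remaining objects) can be sketched as follows. The slides do not specify the clustering algorithm, so this Python sketch substitutes a simple stand-in: non-hot objects are linked when their co-access count reaches a threshold, and each connected component becomes one group. The function name, the `min_count` threshold, and the input shapes are all assumptions.

```python
def plan_replica_groups(hot, pair_counts, min_count=2):
    """Return replication groups: the hot group first, then local-pattern groups.

    hot is a set of object IDs; pair_counts maps (obj_a, obj_b) to a
    co-access count. Connected components stand in for the clustering step.
    """
    groups = []
    if hot:
        groups.append(sorted(hot))  # global pattern is replicated first

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in pair_counts.items():
        # Only objects outside the global pattern join local groups.
        if n >= min_count and a not in hot and b not in hot:
            parent[find(a)] = find(b)

    clusters = {}
    for obj in list(parent):
        clusters.setdefault(find(obj), []).append(obj)
    groups.extend(sorted(sorted(c) for c in clusters.values() if len(c) > 1))
    return groups
```

For example, with `hot = {"a"}` and strong co-access between `b`, `c`, and `d`, the plan is the hot group `["a"]` followed by the local group `["b", "c", "d"]`.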

Optimized replica placement
Adopt a pseudo-random number algorithm to optimize replica placement
  First assign heterogeneous nodes to segments with different ranges according to each node's average bandwidth
  The data is then mapped to one node/range by pseudo-random hash functions
  If the number of replicas is m, then m random numbers are generated to map the m copies to different nodes/ranges

The PRS first assigns heterogeneous nodes to different segments on a number line. Each node is assigned one or more segments according to its average bandwidth. After the segment assignment, the data is mapped to one segment by pseudo-random hash functions. We choose pseudo-random number mapping because it distributes data in proportion to segment length. If the number of replicas is m, m random numbers are generated from the random number sequence to map the m copies to m segments. Because a higher-performance node receives more data, it may run out of space; if one or more storage nodes do not have sufficient storage space, the replicas are placed on other nodes to keep the load balanced.
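The segment mapping can be sketched in Python as below. This is an illustration under stated assumptions, not the PRS implementation: each node gets a single segment whose length equals its bandwidth (the slides allow several segments per node), Python's `random.Random` (the standard Mersenne Twister, not SFMT) is seeded with the object ID so that placement is repeatable, and a draw is repeated when it lands on a node that is already chosen or lacks space. All names are hypothetical.

```python
import bisect
import random


def build_segments(bandwidth):
    """Lay nodes on a number line; node i owns [bounds[i-1], bounds[i])."""
    nodes = sorted(n for n in bandwidth if bandwidth[n] > 0)
    bounds, total = [], 0.0
    for n in nodes:
        total += bandwidth[n]
        bounds.append(total)
    return nodes, bounds


def place_replicas(object_id, m, bandwidth, free_space, object_size):
    """Pick m distinct nodes, each with probability proportional to bandwidth."""
    nodes, bounds = build_segments(bandwidth)
    eligible = {n for n in nodes if free_space[n] >= object_size}
    if len(eligible) < m:
        raise ValueError("not enough nodes with sufficient free space")
    rng = random.Random(object_id)  # same object ID gives the same sequence
    chosen = []
    while len(chosen) < m:
        x = rng.random() * bounds[-1]
        idx = min(bisect.bisect_right(bounds, x), len(nodes) - 1)
        node = nodes[idx]
        # Skip full nodes and nodes that already hold a replica of this object.
        if node in eligible and node not in chosen:
            chosen.append(node)
    return chosen
```

Because segment length tracks bandwidth, a node with five times the bandwidth is expected to receive about five times as many first draws, which is the proportional distribution the talk describes.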

Evaluation
Evaluate the PRS scheme based on Sheepdog
  Use SIMD-oriented Fast Mersenne Twister (SFMT) for pseudo-random numbers
32-node cluster running Linux
  16 storage nodes, half HDD nodes and half SSD nodes
  16 client nodes hosting VMs
Benchmark
  Run 1 to 16 clients to launch FIO on different virtual machines
  Each job accesses an independent 100MB file asynchronously with a request size of 256KB

We now present the evaluation results of the proposed PRS based on the Sheepdog storage system. The experiments were conducted on a local 32-node cluster, including 16 storage nodes (half HDD nodes and half SSD nodes) and 16 client nodes hosting virtual machines. We use the SFMT algorithm to generate the pseudo-random numbers for replica placement. We use multiple clients running on different nodes to perform read operations and measure the aggregate bandwidth. We run 1 to 16 clients, each launching FIO on a different virtual machine. Each FIO instance has 16 jobs, and each job accesses an independent 100MB file asynchronously with a request size of 256KB.

Evaluation (Cont.)
Test procedure
  First run the benchmark once to collect the I/O trace
  Then the objects that belong to identified access patterns are replicated asynchronously
Comparison
  Consistent hashing algorithm
  CRUSH algorithm (straw buckets)
  Make 3 replicas

In our tests, we first run the application once to collect the I/O trace. Then the objects with identified data access patterns are replicated asynchronously, so that the application can benefit from the PRS replicas in future runs. We compare the PRS with two data replication schemes in Sheepdog: one based on the consistent hashing algorithm and one based on straw buckets, a typical replication algorithm in CRUSH. Each scheme makes 3 replicas.

FIO Performance
Figure panels: (a) random read performance; (b) sequential read performance
Compared with the consistent hashing and CRUSH algorithms, PRS shows improved bandwidth

As seen in the figures, the PRS scheme shows improved bandwidth compared with the consistent hashing and CRUSH algorithms. Its performance also grows more rapidly as the number of clients increases. Because the PRS groups objects with identified access patterns, it reduces the number of I/O accesses and the network communication cost. At the same time, the PRS scheme places replicas in an optimized layout, which makes efficient use of the performance advantage of SSDs.

Summary
Introduce a novel pattern-directed replication scheme (PRS) to achieve highly efficient data replication in heterogeneous object-based storage systems
Build a prototype of PRS within the Sheepdog distributed storage system and carry out evaluation tests
Experimental results show that PRS takes the unique features of storage devices in a heterogeneous environment into account and significantly improves performance

In summary, this work introduces a novel pattern-directed replication scheme to achieve highly efficient data replication in heterogeneous object-based storage systems. To verify its potential, we built a prototype of the PRS within the Sheepdog distributed storage system and carried out evaluation tests. The experimental results show that the PRS is an effective replication scheme for heterogeneous storage systems. It takes the unique features of storage devices in a heterogeneous environment into account and significantly improves performance.

Thank you
Q&A
For more information: http://discl.cs.ttu.edu

Thanks! This research is supported in part by the National Science Foundation under grant CNS , CNS , IIP , and CCF
