Even Data Placement for Load Balance in Reliable Distributed Deduplication Storage Systems

Min Xu1, Yunfeng Zhu2, Patrick P. C. Lee1, Yinlong Xu2
1The Chinese University of Hong Kong, 2University of Science and Technology of China
IWQoS ’15

Hello, I will present our work, “Even …”

Data Explosion
Storage cost: ~8 trillion US dollars
Source: IDC's Digital Universe Study, December 2012

Nowadays, nearly everyone has different types of electronic devices, and these devices generate large volumes of digital data. To store the data properly, users tend to upload it to storage service providers such as Dropbox. However, the global data volume has been growing explosively, and it is estimated that by the year 2020 the global digital universe will be as large as around 40,000 exabytes. Storing such a large volume of data on storage devices such as hard disks will cost around 8 trillion US dollars, which poses a huge financial burden on storage service providers.

Modern Storage Products
Deduplication removes content redundancy to improve storage efficiency
Store only one copy of chunks with the same content
Up to 20x storage cost reduction with practical storage workloads [Andrews, ExaGrid’13]
Adopt a distributed architecture to extend the storage capacity
Deploy erasure coding for high fault tolerance

Therefore, modern storage products utilize deduplication to remove content-level redundancy and improve storage efficiency. Deduplication stores only one copy of chunks with the same content, and previous studies have shown that the storage cost can be reduced by up to 20 times for practical workloads. In addition, storage products also adopt a distributed architecture to extend storage capacity, and erasure coding for high fault tolerance.

Deduplication

[Figure: three files, each with 4 chunks, uploaded to 4 storage nodes. File 1 = A B C D; File 2 = A E F G; File 3 = A E H I. Legend: duplicate chunk vs. unique chunk.]

Now, let's see how deduplication works with this toy example. Three files, each with 4 chunks, are uploaded to a storage cluster with 4 storage nodes. For file 1, all four chunks are unique, and a conventional distribution algorithm distributes the 4 chunks to the 4 storage nodes in a round-robin manner. For file 2, chunk A already exists in the storage system, so we regard it as a duplicate chunk; the 3 unique chunks (E, F, and G) are evenly distributed across the storage cluster. The same applies to file 3. Note that in this example, deduplication reduces the storage cost by 25%.
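The dedup decision itself is simple to express in code. Below is a minimal sketch (not the authors' prototype, which is written in C): it fingerprints each chunk with SHA-1 and consults an in-memory index that stands in for the metadata server's database.

```python
import hashlib

def split_duplicates(chunks, index):
    """Split a file's chunks into duplicates and uniques.

    `chunks` is a list of byte strings; `index` maps a SHA-1
    fingerprint to the node that already stores the chunk (a
    stand-in for the metadata server's database).
    """
    duplicates, uniques = [], []
    for chunk in chunks:
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in index:
            duplicates.append((fp, index[fp]))  # read from existing node
        else:
            uniques.append((fp, chunk))         # must be placed and stored
    return duplicates, uniques
```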

What's the problem?

[Figure: the three files are read back from Nodes 1–4; after round-robin placement, Node 1 holds chunks A, E, and H, so reads of files 2 and 3 bottleneck on Node 1.]

Now, consider downloading the files from the storage system. For file 1, we need only 1 parallel read to retrieve the file. For file 2, file retrieval is bottlenecked by reading two chunks from node 1, and it gets even worse for file 3. We can see that conventional distribution algorithms blindly cluster chunks of a file on a single storage node. In a networked storage cluster with parallel data access and limited link bandwidth, reading a single file is bottlenecked by the storage node holding the most chunks of that file.

Conventional distribution algorithms are unaware of duplicate chunks: they achieve only storage balance, and the resulting poor read balance can degrade parallel file read performance.

Our Contributions
We model the file data distribution problem in a reliable distributed deduplication storage system as an integer optimization problem
We propose a polynomial-time algorithm, the even data placement (EDP) algorithm, to tackle the problem
We implement a prototype of the proposed algorithms and conduct extensive evaluations on top of it

Problem Modeling
Balance chunks of a file online
We only need to manage the unique chunks
Information on duplicate chunks is available at upload time
Preserve the storage balance
In each upload, the volume of unique data distributed to each storage node should be comparable

Problem Modeling
We balance multiple files together so that both storage balance and read balance are achievable

[Figure: unique chunks of Files 1–4 accumulated into one distribution batch across Nodes 1–5.]

We realize that we have to balance multiple files at a time to achieve both storage balance and read balance. We accumulate the unique chunks of multiple files into a distribution batch. Within the batch, the number of unique chunks sent to each storage node is fixed; it is pre-set at configuration time and decided by the cluster scale and the coding scheme (refer to the paper for details). Within the distribution batch, we can freely place the unique chunks of each file to achieve read balance.

Optimization Problem
The theoretically optimal distribution of a file should have $E_i$ chunks on each node, where

$$E_i = \frac{\sum_{j=1}^{N} (u_{i,j} + d_{i,j})}{N}, \quad 1 \le i \le t$$

Thus, we aim to find a distribution that minimizes the gaps between the bottlenecks of the files and their corresponding optimal values:

$$\sum_{i=1}^{t} \left(1 - \frac{E_i}{\max_{j=1}^{N} (u_{i,j} + d_{i,j})}\right)$$

To balance the distribution of a file's chunks, we first need to know the theoretically optimal result we can achieve. We calculate the total number of chunks of the file, counting both its unique chunks $u_{i,j}$ and its duplicate chunks $d_{i,j}$ on each of the $N$ nodes, and divide it by the number of nodes in the storage cluster. The result, $E_i$, is the ideal number of chunks of file $i$ on each node.
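As a quick sanity check against the earlier toy example ($N = 4$ nodes): file 2 has 4 chunks in total (the duplicate A plus the uniques E, F, and G), so

$$E_2 = \frac{4}{4} = 1,$$

and since round-robin placement leaves two of file 2's chunks on node 1, the file's gap term is $1 - \frac{1}{2} = \frac{1}{2}$; a placement with at most one chunk of file 2 per node would drive its gap to 0.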

Optimization Problem

$$\text{Minimize:} \quad \sum_{i=1}^{t} \left(1 - \frac{E_i}{\max_{j=1}^{N} (u_{i,j} + d_{i,j})}\right)$$

$$\text{Subject to:} \quad \sum_{j=1}^{N} u_{i,j} = U_i, \quad 1 \le i \le t, \qquad (1)$$

$$\sum_{i=1}^{t} u_{i,j} = C, \quad 1 \le j \le N, \qquad (2)$$

$$u_{i,j} \in \{0, 1, \ldots, U_i\}, \quad 1 \le i \le t \text{ and } 1 \le j \le N. \qquad (3)$$

Constraint (1): all unique chunks should be assigned to one of the storage nodes
Constraint (2): each storage node should be assigned C unique chunks
Constraint (3): each chunk is distributed integrally

The first constraint indicates that all unique chunks of a file should be assigned to one of the storage nodes. The second constraint indicates that each storage node should be assigned $C$ unique chunks to sustain storage balance. The last constraint implies that the chunk is the unit of data distribution, which manifests the integral nature of the problem.

Algorithm Design
The algorithm should run fast
The online processing should never bottleneck the upload operation
A brute-force approach is prohibitive due to the huge solution space of the problem
The algorithm's output should be accurate
Guarantee improvement over conventional distribution approaches in terms of read performance
Close to the optimal result for practical workloads

Greedy Assignment
Each unique chunk of a file in a batch is logically assigned to the storage node with the fewest existing chunks of that file
Existing chunks include both the duplicate chunks and the previously assigned unique chunks of the file
To break ties, the first priority is the node that holds the previous contiguous chunk; the second priority is the node succeeding the last used node in round-robin fashion

[Example: two files placed over 3 nodes, with duplicate counts $(d_{1,1}, d_{1,2}, d_{1,3}) = (2,0,1)$ and $(d_{2,1}, d_{2,2}, d_{2,3}) = (1,2,1)$. After greedy assignment, the objective value is $\left(1-\frac{7}{3\times 3}\right)+\left(1-\frac{6}{3\times 3}\right)=\frac{5}{9}$.]

A code sketch of this greedy step follows below.
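Here is a minimal sketch of the greedy step, with the tie-breaking simplified to the lowest node index (the contiguity-aware rule above is omitted for brevity); it is my reading of the slide, not the authors' C implementation:

```python
def greedy_assign(num_unique, dup_counts, capacity):
    """Greedily place one file's unique chunks onto nodes.

    dup_counts[j]: the file's duplicate chunks already on node j.
    capacity[j]:   unique-chunk slots still free on node j in the
                   current batch (enforces storage balance); it is
                   updated in place. Assumes the batch has enough
                   free slots for num_unique chunks.
    Returns u, where u[j] is the number of the file's unique
    chunks assigned to node j.
    """
    n = len(dup_counts)
    load = list(dup_counts)  # this file's chunks per node so far
    u = [0] * n
    for _ in range(num_unique):
        # among nodes with free batch slots, pick the one holding
        # the fewest chunks of this file (ties: lowest index)
        j = min((k for k in range(n) if capacity[k] > 0),
                key=lambda k: load[k])
        u[j] += 1
        load[j] += 1
        capacity[j] -= 1
    return u
```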

Inter-file Swapping
Greedy assignment may fail for later files in a batch due to the constraints of storage load balancing
Identify a previously placed file and swap some of its chunks with the current file to further reduce the objective value

[Example: continuing the previous example, swapping chunks between files 1 and 2 balances file 2 perfectly, reducing the objective value to $\left(1-\frac{7}{3\times 3}\right)+\left(1-\frac{6}{3\times 2}\right)=\frac{2}{9}$.]

A sketch of the swap step follows below.
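One way to realize the swap step, sketched under the assumption that a swap exchanges one unique chunk of an earlier file with one of the current file between a pair of nodes (the paper's exact swap selection may differ); each node's total unique-chunk count is unchanged, so storage balance is preserved:

```python
def try_swap(u_prev, u_cur, gap_prev, gap_cur):
    """Attempt single-chunk swaps between two files' assignments.

    u_prev, u_cur: per-node unique-chunk counts of an earlier file
    and the current file. gap_prev/gap_cur compute each file's
    objective term from its per-node counts (closing over that
    file's duplicate counts). Keeps the first swap that lowers
    the combined objective; returns True if one was found.
    """
    n = len(u_cur)
    base = gap_prev(u_prev) + gap_cur(u_cur)
    for a in range(n):
        for b in range(n):
            if a == b or u_prev[a] == 0 or u_cur[b] == 0:
                continue
            # one chunk of the earlier file moves a -> b, and one
            # chunk of the current file moves b -> a, so each
            # node's total load is unchanged
            u_prev[a] -= 1; u_prev[b] += 1
            u_cur[b] -= 1;  u_cur[a] += 1
            if gap_prev(u_prev) + gap_cur(u_cur) < base:
                return True
            u_prev[a] += 1; u_prev[b] -= 1  # revert the swap
            u_cur[b] += 1;  u_cur[a] -= 1
    return False
```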

Discussions
Both greedy assignment and inter-file swapping are conducted in memory, without I/O operations
The time complexity of EDP is polynomial: linear in the number of unique chunks in a batch and quadratic in the number of files in a batch
We extend EDP to tackle the distribution problem with a heterogeneous topology or a variable-size chunking scheme

Discussions
We evaluate the effectiveness of our EDP algorithm via simulations and experiments
The size of a distribution batch has intricate effects on the performance of EDP
If the batch is so small that all the unique chunks belong to the same file, EDP has no improvement over conventional algorithms
If the batch contains unique chunks of many distinct files, considerable improvement of EDP over conventional algorithms is expected, at the cost of increased processing overhead

Implementation

[Architecture: Client (chunking, coding, I/O) <-> MDS (deduplication, metadata DB, placement) <-> Storage Nodes]

Client: chunking, coding, and data transmission
MDS: deduplication and data placement control
Storage node: read/write of chunk data

Implementation notes: (1) configurable fixed-size or variable-size chunking; (2) OpenSSL for hashing (SHA-1); (3) Jerasure [Plank, 2014] and GF-Complete [Plank, 2013] for encoding; (4) implemented in C.
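For concreteness, a minimal fixed-size chunking and fingerprinting loop in Python (the prototype itself is in C with OpenSSL; the 4KB chunk size matches the evaluation setup, while the function name is illustrative):

```python
import hashlib

CHUNK_SIZE = 4096  # 4KB fixed-size chunks, as in the evaluation

def chunk_and_fingerprint(path):
    """Yield (sha1_fingerprint, chunk) pairs for one file using
    fixed-size chunking; the fingerprints are what the MDS
    deduplicates against."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield hashlib.sha1(chunk).hexdigest(), chunk
```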

Evaluation Setup
How does EDP perform compared to the baseline round-robin distribution?
Real-world datasets
Setup: 16 storage nodes using (14,10) erasure coding; $C = 1050$; 4KB chunk size; variable-size chunking for FSLHOME and fixed-size chunking for LINUX and LINUXTAR

Dataset   | Logical Size (GB) | Physical Size (GB) | File Type
FSLHOME   | 6615              | 348                | Mixed-size
LINUX     | 7.81              | 2.34               | Small
LINUXTAR  | 101               | 47.46              | Large

Trace-driven Simulation
Read balance gap: $1 - \frac{E_i}{\max_{j=1}^{N} (u_{i,j} + d_{i,j})}$, for each file $i$

[Figure: read balance gap on FSLHOME.]

EDP consistently outperforms the baseline distribution in terms of the read balance metric.
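This metric is straightforward to compute from per-node chunk counts; here is a small helper matching the formula above (the function name is mine):

```python
def read_balance_gap(u, d):
    """Read balance gap of one file: 1 - E_i / bottleneck, where
    u[j] and d[j] are the file's unique and duplicate chunk counts
    on node j. 0 means reads are perfectly balanced."""
    totals = [uj + dj for uj, dj in zip(u, d)]
    ideal = sum(totals) / len(totals)  # E_i
    return 1 - ideal / max(totals)
```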

Test-bed Performance

[Figure: normalized read latency on LINUX, fixed-size chunking scheme.]

Normalized read latency is the read latency of files distributed by EDP divided by that of files distributed by the baseline. The link bandwidth between each storage node and the client is 100Mbps. Improved read load balance directly translates into a reduction in parallel read latency: on average, EDP reduces the read latency of the baseline by 37.41%.

Heterogeneity

[Figure: normalized read latency on LINUX under heterogeneous network bandwidths.]

In a heterogeneous network, cost-based EDP (CEDP) outperforms both EDP and the baseline, and a larger variation in network bandwidth leads to a larger improvement. Cost-based EDP balances the distribution of chunks based on the download cost, i.e., the latency, from each storage node, instead of the number of chunks on each node. We configure the link bandwidths of 5, 6, and 5 storage nodes to be 10Mbps, 100Mbps, and 1Gbps, respectively.

Computational Overheads

[Figure: processing time versus the number of files in a batch.]

As the number of files increases, the processing time of EDP increases faster than that of the baseline, due to EDP's quadratic complexity in the number of files. We leave it as future work to reduce this gap via parallel processing or by bounding the number of files in a batch.

Evaluation Summary
EDP consistently outperforms the baseline in trace-driven simulations, and its performance is close to optimal for practical workloads
EDP reduces the file read latency of the baseline by 37.41% in the test-bed experiments
In a heterogeneous network, CEDP outperforms both EDP and the baseline
EDP's processing time lags as the number of files grows large; we will improve this in future work

Conclusions
We study the read load balance problem in reliable distributed deduplication storage systems
We propose a polynomial-time algorithm, EDP, to solve the problem efficiently
We also extend the algorithm to tackle practical factors, including heterogeneous topology and variable-size chunking
Our prototype is available at http://ansrlab.cse.cuhk.edu.hk/software/edp