A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation. Yinlong Xu, University of Science and Technology of China.

Presentation transcript:

A Hybrid Approach of Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation Yinlong Xu University of Science and Technology of China A joint work with Liping Xiang, John C.S. Lui and Qian Chang

I would like to thank John C.S. Lui, Raymond W. Yeung, Patrick B.C. Lee, Alfred C.L. Ho!

Outline
Background
A Hybrid Recovery Approach for Single Disk Failure
Row-Diagonal Optimal Recovery (RDOR) for Single Disk Failure: a recovery scheme with minimum disk reads; balancing disk reads; optimizing memory usage
Performance Evaluation
Summary


Remark: This work can be applied to two RAID-6 codes, RDP and EVENODD. This talk takes RDP as an example.

RDP Code
An RDP stripe has the form of a $(p-1)\times(p+1)$ matrix, where $p$ is a prime number. The first $p-1$ columns are information columns. The last two are parity columns (row parity, diagonal parity). One diagonal, the missing diagonal, has no parity symbol.
Row parity (for $p=5$): $d_{0,4} = d_{0,0} \oplus d_{0,1} \oplus d_{0,2} \oplus d_{0,3}$
Diagonal parity (for $p=5$): $d_{0,5} = d_{0,0} \oplus d_{2,3} \oplus d_{3,2} \oplus d_{1,4}$
Note: With RDP code, all information data is recoverable when any two disks fail.
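The two parity equations above can be checked with a small sketch. This is only an illustration, assuming symbols are plain integers and that column $p-1$ holds row parity and column $p$ holds diagonal parity; the function name `rdp_encode` is hypothetical and not from the talk.

```python
import random

def rdp_encode(data, p):
    """Turn a (p-1) x (p-1) block of integer symbols into a (p-1) x (p+1) stripe.

    Assumed layout: columns 0..p-2 are information, column p-1 is row parity,
    column p is diagonal parity.
    """
    stripe = [row[:] + [0, 0] for row in data]
    # Row parity: XOR of the information symbols in the same row.
    for i in range(p - 1):
        for c in range(p - 1):
            stripe[i][p - 1] ^= data[i][c]
    # Diagonal parity: diagonal j holds the symbols with (row + column) mod p == j,
    # taken over columns 0..p-1. Diagonal p-1 is the missing diagonal.
    for j in range(p - 1):
        for r in range(p - 1):
            stripe[j][p] ^= stripe[r][(j - r) % p]
    return stripe

p = 5
random.seed(1)
data = [[random.randrange(256) for _ in range(p - 1)] for _ in range(p - 1)]
stripe = rdp_encode(data, p)
```

With $p=5$ the stripe has 4 rows and 6 columns, matching Disks 0 to 5 in the examples of this talk.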

Outline of our work
Problem: The recovery of a single disk failure in RDP-coded systems.
Motivation: RDP code tolerates two disk failures, but a single disk failure is much more probable than a double disk failure.
Contributions:
Give the lower bound of disk reads.
Propose a recovery scheme such that:
Disk reads match the lower bound, a reduction of 1/4 compared with the conventional scheme.
Disk reads are balanced.
Extra memory usage is minimal: $(p-1)/2$ blocks.
XOR operations: no more than the conventional scheme.

A Naive Recovery Scheme for Single Disk Failure of RDP Code: Case 1
Case 1: A single information disk fails. The row parity disk and the other information disks are used for the recovery.
[Figure: the recovery of Disk 1, rebuilding $d_{0,1}$, $d_{1,1}$, $d_{2,1}$, $d_{3,1}$.]

A Naive Recovery Scheme for Single Disk Failure of RDP Code: Case 2
Case 2: A single parity disk fails. The recovery is equivalent to the parity encoding.
[Figure: the recovery of the diagonal parity disk, rebuilding $d_{0,5}$, $d_{1,5}$, $d_{2,5}$, $d_{3,5}$.]

Features of the Naive Recovery Scheme
It uses only a single parity column for single disk failure recovery, although there are two parity columns in the array.
$(p-1)^2$ symbols are read from the disks for the recovery.

Questions
Can the disk reads be reduced for the recovery of a single disk failure?
What if both parity disks are used for single disk failure recovery?

Some Benefits from Reducing Disk Reads
Speeding up the recovery
Relieving system bus load
Relieving disk load
Enhancing the service performance seen by users
Saving disk energy
…


Row Parity or Diagonal Parity?
Either row parity or diagonal parity can be used to recover an erased symbol.
[Figure: two copies of the array on Disks 0 to 5 with Disk 1 erased. In the first, $d_{0,1}$ is recovered by row parity; in the second, $d_{0,1}$ is recovered by diagonal parity.]

A Hybrid Recovery Approach for Single Disk Failure
Using diagonal parity to recover $d_{0,1}$;
Using diagonal parity to recover $d_{1,1}$;
Using row parity to recover $d_{2,1}$;
Using row parity to recover $d_{3,1}$.
Notes: There are 4 overlapping symbols, which would otherwise need to be read twice. If the 4 overlapping symbols are pre-stored in memory, the number of disk reads is reduced to $16 - 4 = 12 < 16$.
[Figure: the array on Disks 0 to 5 with Disk 1 erased and the overlapping symbols highlighted.]
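The count of 12 versus 16 can be reproduced by enumerating the symbols each parity set touches. The sketch below is only an illustration for $p=5$ with Disk 1 erased; symbols are identified by (row, disk) pairs, and the helper name `symbols_read` is an assumption, not part of the talk.

```python
def symbols_read(p, k, use_diagonal):
    """Distinct (row, disk) symbols read to rebuild Disk k.

    use_diagonal[i] is True when d_{i,k} is rebuilt from its diagonal parity
    set and False when it is rebuilt from its row parity set.
    """
    needed = set()
    for i in range(p - 1):
        if use_diagonal[i]:
            j = (i + k) % p  # index of the diagonal through d_{i,k}
            assert j != p - 1, "the missing diagonal has no parity symbol"
            members = {(r, (j - r) % p) for r in range(p - 1)} | {(j, p)}
        else:
            members = {(i, c) for c in range(p)}  # row set: disks 0..p-1
        needed |= members - {(i, k)}
    return needed

naive = symbols_read(5, 1, [False, False, False, False])
hybrid = symbols_read(5, 1, [True, True, False, False])
print(len(naive), len(hybrid))
```

Because the overlapping symbols appear only once in the set, its size is the number of disk reads when each overlapping symbol is kept in memory after its first read.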

Consideration of Hybrid Recovery Approach
By using memory reads instead of disk reads:
The recovery process will be sped up. Note: a memory read is about 100 times faster than a disk read.
The communication load of the storage system will be reduced.
During the recovery, the more overlapping symbols there are, the fewer symbols need to be read from disks.
Questions
What is the lower bound of disk reads for single disk failure recovery?
How can a recovery scheme be designed to match this lower bound?


Row Parity Sets
$R_i = \{d_{i,k} \mid 0 \le k \le p-1\}$ is the $i$-th row parity set.
Each symbol in $R_i$ can be recovered from the other symbols in $R_i$.
E.g., $R_0 = \{d_{0,0}, d_{0,1}, d_{0,2}, d_{0,3}, d_{0,4}\}$. Because $d_{0,4} = d_{0,0} \oplus d_{0,1} \oplus d_{0,2} \oplus d_{0,3}$, we have $d_{0,1} = d_{0,0} \oplus d_{0,2} \oplus d_{0,3} \oplus d_{0,4}$.
[Figure: the $4 \times 6$ array of symbols $d_{i,k}$ on Disks 0 to 5, with the row parity and diagonal parity columns marked.]

Diagonal Parity Sets
$D_j = \{d_{i,k} \mid (i+k) \bmod p = j,\ 0 \le i \le p-2,\ 0 \le k \le p\}$ is the $j$-th diagonal parity set.
Each symbol in $D_j$ can be recovered from the other symbols in $D_j$.
E.g., $D_1 = \{d_{1,0}, d_{0,1}, d_{3,3}, d_{2,4}, d_{1,5}\}$, so $d_{0,1} = d_{1,0} \oplus d_{3,3} \oplus d_{2,4} \oplus d_{1,5}$.
[Figure: the $4 \times 6$ array of symbols $d_{i,k}$ on Disks 0 to 5, with the row parity and diagonal parity columns marked.]

Overlapping Symbols
There is exactly one common symbol (called the overlapping symbol) between each pair of $R_i$ and $D_j$.
E.g., $R_1 \cap D_3 = \{d_{1,2}\}$.
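The one-overlap property can be checked exhaustively for a small prime. This is a minimal sketch for $p=5$ that builds the parity sets from the definitions of $R_i$ and $D_j$ given in this talk, with symbols written as (row, disk) pairs; the function names are illustrative.

```python
def row_set(i, p):
    # R_i: row i over disks 0..p-1 (information disks plus the row parity disk).
    return {(i, c) for c in range(p)}

def diag_set(j, p):
    # D_j: symbols with (row + disk) mod p == j on disks 0..p-1,
    # plus the diagonal parity symbol d_{j,p}.
    return {(r, (j - r) % p) for r in range(p - 1)} | {(j, p)}

p = 5
overlaps = {
    (i, j): row_set(i, p) & diag_set(j, p)
    for i in range(p - 1)
    for j in range(p - 1)
}
print(overlaps[(1, 3)])  # the single symbol shared by R_1 and D_3
```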

Special Cases of Parity Sets
Symbols on Disk $p$ (the diagonal parity disk) belong only to their diagonal parity sets.
Symbols on the missing diagonal belong only to their row parity sets.
Disk $p$ can only be recovered by diagonal parity. This work only considers the recovery of Disk $k$ with $k \ne p$.

Recovery Combination
A combination of parity sets $(R_i, \ldots, D_j)$ corresponds to a recovery scheme.
E.g., using the recovery combination $(D_1, D_2, R_2, R_3)$ to recover Disk 1:
Using $D_1$ to recover $d_{0,1}$;
Using $D_2$ to recover $d_{1,1}$;
Using $R_2$ to recover $d_{2,1}$;
Using $R_3$ to recover $d_{3,1}$.

Number of Overlapping Symbols
Assumption: Disk $k$ is erased, so the $p-1$ symbols $d_{0,k}, d_{1,k}, \ldots, d_{p-2,k}$ need to be recovered.
A recovery scheme: $t$ erased symbols are recovered from diagonal parity sets, and the other $p-1-t$ symbols from row parity sets.
Conclusion: there are $t(p-1-t) = -\left(t - (p-1)/2\right)^2 + (p-1)^2/4$ overlapping symbols. When $t = (p-1)/2$, the number of overlapping symbols is maximized.
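For reference, the identity in the conclusion is a completed square. The symbol $N(t)$ for the number of overlapping symbols is introduced here only for this worked step and does not appear in the talk.

```latex
\begin{align*}
N(t) &= t\,(p-1-t) = -t^2 + (p-1)\,t \\
     &= -\left(t - \frac{p-1}{2}\right)^2 + \frac{(p-1)^2}{4}, \\
\max_t N(t) &= \frac{(p-1)^2}{4} \quad \text{at } t = \frac{p-1}{2}.
\end{align*}
```

Since $p$ is an odd prime, $(p-1)/2$ is an integer, so the maximum is attained. For $p = 5$ it gives $t = 2$ and $N = 4$, the four overlapping symbols of the hybrid example.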

Lower Bound of Disk Reads for Single Failure Recovery
The maximum number of overlapping symbols is $(p-1)^2/4$, so at most $(p-1)^2/4$ disk reads can be saved during recovery.
Conclusion: The lower bound of disk reads for recovery is $(p-1)^2 - (p-1)^2/4 = 3(p-1)^2/4$.
[Figure: the array on Disks 0 to 5, marking the symbols to be read and the overlapping symbols.]

Read Optimal Recovery Scheme
Any recovery combination consisting of $(p-1)/2$ row parity sets and $(p-1)/2$ diagonal parity sets is read optimal. Such a scheme is named Row-Diagonal Optimal Recovery (RDOR).
Conclusion: RDOR reduces disk reads by approximately 25% compared with the naive scheme.
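For small primes the lower bound can be confirmed by brute force over every row/diagonal choice. The sketch below is an illustration rather than the authors' procedure: it assumes the parity-set definitions of this talk, identifies symbols by (row, disk) pairs, and skips choices that would use the missing diagonal. The names `read_count` and `min_reads` are hypothetical.

```python
from itertools import product

def read_count(p, k, x):
    """Distinct symbols read when x[i] == 1 rebuilds d_{i,k} diagonally, else by row."""
    needed = set()
    for i in range(p - 1):
        if x[i]:
            j = (i + k) % p
            members = {(r, (j - r) % p) for r in range(p - 1)} | {(j, p)}
        else:
            members = {(i, c) for c in range(p)}
        needed |= members - {(i, k)}
    return len(needed)

def min_reads(p, k):
    # The symbol of Disk k on the missing diagonal can only use row parity.
    forced_row = (p - 1 - k) % p
    best = None
    for x in product((0, 1), repeat=p - 1):
        if forced_row < p - 1 and x[forced_row]:
            continue
        n = read_count(p, k, x)
        best = n if best is None else min(best, n)
    return best

for p in (5, 7):
    print(p, [min_reads(p, k) for k in range(p)], 3 * (p - 1) ** 2 // 4)
```

The search covers only $2^{p-1}$ sequences per disk, so it is practical only for small $p$; it is a check on the closed-form bound, not a replacement for it.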


Example: Two Read Optimal Recovery Combinations
$(R_0, R_1, R_2, D_3, D_4, D_5)$: disk reads are unbalanced.
$(D_0, D_1, R_2, D_3, R_4, R_5)$: disk reads are balanced.

Problem and Questions
During recovery, the disk with the most read operations may slow down the recovery.
Question: To reduce the recovery time, what is a balanced and read-optimal recovery scheme?
A balanced scheme reads the same (or almost the same) number of symbols from each disk.

Average Reads from Each Disk
The minimum number of disk reads for recovery is $3(p-1)^2/4$.
To be read optimal, $(p-1)/2$ symbols are read from Disk $p$ (the diagonal parity disk).
Conclusion: The average number of symbols to be read from each of the other surviving disks (all except Disk $k$ and Disk $p$) is $\left[3(p-1)^2/4 - (p-1)/2\right] / (p-1) = (3p-5)/4$.
Note: A balanced read-optimal recovery should read $(p-1)/2$ symbols from Disk $p$ and $(3p-5)/4$ symbols from each of the other disks.

Example: A Balanced Example
$(D_0, D_1, R_2, D_3, R_4, R_5)$ is balanced.
E.g., for $p = 7$:
Total: $3(p-1)^2/4 = 27$
Disk 7: $(p-1)/2 = 3$
Each of the other disks: $(27 - 3)/6 = 4$
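The per-disk read counts of the two read-optimal combinations for $p = 7$ and $k = 0$ can be tallied directly. This sketch assumes the parity-set definitions of this talk and encodes each combination as a 0/1 list (1 for a diagonal parity set, 0 for a row parity set); `reads_per_disk` is an illustrative name.

```python
from collections import Counter

def reads_per_disk(p, k, x):
    """Number of distinct symbols read from each of Disks 0..p."""
    needed = set()
    for i in range(p - 1):
        if x[i]:
            j = (i + k) % p
            members = {(r, (j - r) % p) for r in range(p - 1)} | {(j, p)}
        else:
            members = {(i, c) for c in range(p)}
        needed |= members - {(i, k)}
    counts = Counter(disk for _, disk in needed)
    return [counts[disk] for disk in range(p + 1)]

# (R0, R1, R2, D3, D4, D5) and (D0, D1, R2, D3, R4, R5), both for Disk 0.
unbalanced = reads_per_disk(7, 0, [0, 0, 0, 1, 1, 1])
balanced = reads_per_disk(7, 0, [1, 1, 0, 1, 0, 0])
print(unbalanced)
print(balanced)
```

Both lists sum to 27, but the first combination loads some disks with 5 reads, while the second reads 4 symbols from each of Disks 1 to 6 and 3 from Disk 7.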

Recovery Sequence
A recovery sequence $x_0, x_1, \ldots, x_{p-2}, x_{p-1}$ corresponds to a recovery combination, where
$x_i = 0$ means that $d_{i,k}$ is recovered from its row parity set;
$x_i = 1$ means that $d_{i,k}$ is recovered from its diagonal parity set.
The last entry, $x_{p-1}$, is an additional symbol.
E.g., $(D_0, D_1, R_2, D_3, R_4, R_5)$ corresponds to the sequence 1101000.

Condition of Read Optimal and Balanced Recovery Sequence
A recovery sequence $\{x_i\}_{0 \le i \le p-1}$ is read optimal and balanced if and only if:
(1) It is read optimal: $x_0 + x_1 + \cdots + x_{p-2} + x_{p-1} = (p-1)/2$.
(2) Symbols in the missing diagonal and the added row are recovered by row parity: $x_{p-1-k} = x_{p-1} = 0$.
(3) $(3p-5)/4$ symbols are read from each Disk $j$ ($0 \le j \le p-1$, $j \ne k$).

Read Optimal and Balanced Recovery: An Example
$(D_0, D_1, R_2, D_3, R_4, R_5)$ is a read optimal and balanced recovery combination for $p = 7$ and $k = 0$.
The corresponding recovery sequence $x_0 x_1 \ldots x_5 x_6 = 1101000$ satisfies:
(1) $x_0 + x_1 + \cdots + x_5 + x_6 = (p-1)/2 = 3$
(2) $x_{p-1-k} = x_{p-1} = 0$

Condition of Read Optimal and Balanced Recovery Sequence (Cont.)
When $x_i = 0$ or $x_{(i+j-k) \bmod p} = 1$, $d_{i,j}$ in Disk $j$ is used for recovery.
Equivalently, $d_{i,j}$ is used for recovery exactly when $x_i \left(1 - x_{(i+j-k) \bmod p}\right) = 0$.
The number of symbols that need to be read from Disk $j$ is $p-1$ minus the number of symbols not read from Disk $j$.
E.g., for Disk 3 ($p = 7$, $k = 0$): $x_2 = 0$, so $d_{2,3}$ is read; $x_0 = 1$, so $d_{4,3}$ is used.

Read Optimal and Balanced Recovery: An Example (Cont.)
The recovery sequence $x_0 x_1 \ldots x_5 x_6 = 1101000$ also satisfies:
(3) 4 symbols are read from each Disk $j$ ($0 \le j \le 6$, $j \ne 0$).
E.g., for Disk 3:
$x_0 = 1$, so $d_{4,3}$ is read;
$x_1 = 1$, so $d_{5,3}$ is read;
$x_2 = 0$, so $d_{2,3}$ is read;
$x_3 = 1$, so $d_{0,3}$ is read;
$x_4 = 0$, so $d_{4,3}$ is read (already counted);
$x_5 = 0$, so $d_{5,3}$ is read (already counted).
The remaining symbols of Disk 3, $d_{1,3}$ and $d_{3,3}$, are not read.
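The Disk 3 walk-through can be replayed with a short sketch. It assumes that $d_{i,j}$ is read when $x_i = 0$ or when the entry indexed by $(i + j - k) \bmod p$ equals 1; that index is inferred from the worked example (for instance $x_0 = 1$ pulls in $d_{4,3}$ because $(4 + 3 - 0) \bmod 7 = 0$) and is not stated explicitly in this transcript. The function name is hypothetical.

```python
def rows_read(x, p, k, j):
    """Rows i of surviving Disk j whose symbol d_{i,j} is read during recovery.

    x has p entries; x[p-1] is the additional symbol and is expected to be 0.
    """
    used = []
    for i in range(p - 1):
        via_row = x[i] == 0                    # row set R_i is used
        via_diag = x[(i + j - k) % p] == 1     # diagonal through d_{i,j} is used
        if via_row or via_diag:
            used.append(i)
    return used

x = [1, 1, 0, 1, 0, 0, 0]   # the sequence 1101000, p = 7, k = 0
print(rows_read(x, 7, 0, 3))
print([len(rows_read(x, 7, 0, j)) for j in range(1, 7)])
```

Under that assumption, Disk 3 contributes rows 0, 2, 4 and 5, and every one of Disks 1 to 6 contributes exactly 4 symbols, which agrees with $(3p-5)/4 = 4$.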

Recovery Set
Given a recovery sequence $\{x_i\}_{0 \le i \le p-1}$, define $A = \{\, i \mid x_i = 1,\ 0 \le i \le p-1 \,\}$ as the recovery set.
E.g., $x_0 x_1 \ldots x_5 x_6 = 1101000$ gives $A = \{0, 1, 3\}$.

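Recovery Set
$x_i \, x_{(i+j-k) \bmod p} = 1$ if and only if $i \in A$ and $(i+j-k) \bmod p \in A$, so the balance condition can be stated in terms of $A$.
Balanced Recovery Set: $A$ corresponds to a balanced sequence if and only if every $t$ ($1 \le t \le p-1$) has a multiplicity of $(p-3)/4$ in the multiset $M_A = \{\, a_1 - a_2 \mid a_1, a_2 \in A,\ a_1 \ne a_2 \,\}$, with differences taken modulo $p$.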
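The multiplicity condition is easy to test for a given set. The sketch below assumes differences are taken modulo $p$ and checks two recovery sets for $p = 7$: $\{0, 1, 3\}$ from the balanced example and $\{3, 4, 5\}$ from the unbalanced combination $(R_0, R_1, R_2, D_3, D_4, D_5)$. The function names are illustrative.

```python
from collections import Counter

def difference_multiset(A, p):
    """Multiset of a1 - a2 (mod p) over ordered pairs of distinct elements of A."""
    return Counter((a1 - a2) % p for a1 in A for a2 in A if a1 != a2)

def is_balanced_set(A, p):
    """True when every t in 1..p-1 occurs exactly (p-3)/4 times as a difference."""
    if (p - 3) % 4 != 0:
        return False  # exact balance needs (p-3)/4 to be an integer
    m = difference_multiset(A, p)
    return all(m[t] == (p - 3) // 4 for t in range(1, p))

print(is_balanced_set({0, 1, 3}, 7))
print(is_balanced_set({3, 4, 5}, 7))
```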

The Existence of Read Optimal and Balanced Recovery Set
By using the concept of a (partial) difference set, we have the following conclusions.
Given a prime number $p$, the set of nonzero squares $D = \{\, i^2 \mid 1 \le i \le (p-1)/2 \,\}$ in $F_p$ is a difference set.
There is a $g \in F_p$ such that $A = D + g$ corresponds to a read-optimal and balanced recovery sequence $\{x_i\}_{0 \le i \le p-1}$ for the recovery of Disk $k$ ($k \ne p$).
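A minimal sketch of this construction: compute the nonzero squares modulo $p$ and keep every shift $g$ for which $A = D + g$ avoids the two positions that condition (2) forces to zero, namely $p-1$ and $(p-1-k) \bmod p$. The function name is hypothetical, and the sketch only filters on condition (2); it relies on the difference-set property stated above for balance.

```python
def candidate_recovery_sets(p, k):
    """All (g, A) with A = D + g (mod p) that keep x_{p-1} = x_{p-1-k} = 0."""
    squares = {(i * i) % p for i in range(1, (p - 1) // 2 + 1)}
    forbidden = {p - 1, (p - 1 - k) % p}
    result = []
    for g in range(p):
        A = {(s + g) % p for s in squares}
        if not (A & forbidden):
            result.append((g, sorted(A)))
    return result

candidates = candidate_recovery_sets(7, 0)
print(candidates)
```

For $p = 7$ and $k = 0$ the squares are $\{1, 2, 4\}$, and the shift $g = 6$ yields $A = \{0, 1, 3\}$, the recovery set of the example sequence 1101000.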

Review of the Read Balance Problem
Find the average number of disk reads on each disk.
Define the recovery sequence and the recovery set to describe a recovery scheme.
Find the constraints under which a recovery set is read optimal and balanced.
Use the concept of a (partial) difference set to solve these constraints.
The read optimal and balanced recovery scheme corresponds to the resulting recovery set.


Extra Memory Usage Problem
The number of overlapping symbols that must be stored in memory is at most $(p-1)^2/4$.
The more overlapping symbols there are, the more extra memory is used.
Question: How can the extra memory usage be minimized while the scheme stays read optimal and balanced?

Main Idea of Optimizing Extra Memory Usage
E.g., using $(D_1, D_2, R_2, R_3)$ to recover Disk 1:
Using $D_1$ to recover $d_{0,1}$: pre-store $d_{3,3}$; pre-store $d_{2,4}$.
Using $D_2$ to recover $d_{1,1}$: pre-store $d_{2,0}$; pre-store $d_{3,4}$.
Using $R_2$ to recover $d_{2,1}$: read $d_{2,0}$, $d_{2,4}$ from memory.
Using $R_3$ to recover $d_{3,1}$: read $d_{3,3}$, $d_{3,4}$ from memory.
This needs four extra memory units.

Main Idea of Optimizing Extra Memory Usage
Main idea: Store the XOR-sum of the overlapping symbols instead of the original symbols to reduce extra memory usage.
Two extra memory units, M[2] and M[3], are reserved for the recovery of $d_{2,1}$ and $d_{3,1}$:
M[2] = 0, M[3] = 0;
M[2] = $d_{2,4}$, M[3] = $d_{3,3}$;
M[2] = $d_{2,0} \oplus d_{2,4}$, M[3] = $d_{3,3} \oplus d_{3,4}$.
Only two extra memory units are needed.
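The accumulator idea can be sketched end to end for the $(D_1, D_2, R_2, R_3)$ example. The code below is an illustration under stated assumptions, not the authors' implementation: symbols are integers, the stripe layout follows the RDP description in this talk, and the names `rdp_encode` and `recover_with_accumulators` are hypothetical. It processes the diagonal parity sets first, as in the M[2], M[3] trace above.

```python
import random

def rdp_encode(data, p):
    """Append row parity (column p-1) and diagonal parity (column p)."""
    stripe = [row[:] + [0, 0] for row in data]
    for i in range(p - 1):
        for c in range(p - 1):
            stripe[i][p - 1] ^= data[i][c]
    for j in range(p - 1):
        for r in range(p - 1):
            stripe[j][p] ^= stripe[r][(j - r) % p]
    return stripe

def recover_with_accumulators(stripe, p, k, diag_rows, row_rows):
    used_diags = {(i + k) % p for i in diag_rows}
    M = {i: 0 for i in row_rows}        # one XOR accumulator per row-parity row
    recovered, disk_reads = {}, 0
    # Phase 1: diagonal sets. Overlapping symbols are folded into M as they are read.
    for i in diag_rows:
        j = (i + k) % p
        acc = stripe[j][p]              # diagonal parity symbol d_{j,p}
        disk_reads += 1
        for r in range(p - 1):
            c = (j - r) % p
            if c == k:
                continue                # the lost symbol itself
            sym = stripe[r][c]
            disk_reads += 1
            acc ^= sym
            if r in M:
                M[r] ^= sym             # symbol also belongs to R_r
        recovered[i] = acc
    # Phase 2: row sets. Overlapping symbols come from M instead of the disks.
    for i in row_rows:
        acc = M[i]
        for c in range(p):
            if c == k or (i + c) % p in used_diags:
                continue
            acc ^= stripe[i][c]
            disk_reads += 1
        recovered[i] = acc
    return recovered, disk_reads, len(M)

random.seed(7)
data = [[random.randrange(256) for _ in range(4)] for _ in range(4)]
stripe = rdp_encode(data, 5)
recovered, disk_reads, units = recover_with_accumulators(stripe, 5, 1, [0, 1], [2, 3])
```

In this run the four symbols of Disk 1 are rebuilt with 12 disk reads while only two accumulators are held in memory, rather than four stored overlapping symbols.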

Read Optimal and Balanced Recovery Scheme with Minimum Memory Usage
Use the read optimal and balanced recovery combination.
The recovery process is executed in a "row-parity-first" manner:
First, recover all symbols that use row parity sets.
Then, use diagonal parity sets to recover the other symbols.
$(p-1)/2$ memory units are reserved to recover the $(p-1)/2$ symbols that use diagonal parity sets for recovery.


Methodology
Experiment settings: off-line recovery mode; DiskSim simulation; disk array size $p+1$ = 8, 14, and 20; strip size from 16KB to 64KB.
Metrics: total recovery time; individual disk access time.

Experimental Results: Recovery Time
The total recovery time of RDOR is less than that of the naive scheme as the strip size varies from 16KB to 64KB.
Moreover, with a strip size of less than 32KB, the recovery time of RDOR is approximately 20% lower than that of the naive scheme.

Experimental Results: Disk Access Time
The average disk access time of RDOR is reduced by 15.16% to 22.60% when $p = 7$ and the strip size varies from 16KB to 64KB.
In on-line scenarios, each disk will be more available to serve user requests.


Summary
The proposed single-failure recovery scheme, RDOR, addresses the following:
Lower bound of disk reads for recovery: when $k \ne p$, the number of symbols read from disk is reduced by 1/4 compared with the conventional strategy.
Balancing disk reads: the number of read operations on each disk is the same (or almost the same).
Minimum memory usage: at any time, the maximum number of overlapping symbols (or their XOR-sums) stored in memory is $(p-1)/2$.
XOR operations: no more than the conventional scheme.

Future work
Design efficient recovery algorithms for other codes.
Construct codes that tolerate multiple failures but are more efficient for single failure recovery.

Thank you !