Disk Failures. Xiaqing He, ID: 204, Dr. Lin. Content: 1) RAID stands for "redundant array of independent disks"; 2) several schemes to recover from disk crashes.


1 Disk Failures Xiaqing He ID: 204 Dr. Lin

2 Content
1) RAID stands for "redundant array of independent disks"
2) Several schemes to recover from disk crashes:
-- Mirroring: RAID level 1
-- Parity checks: RAID level 4
-- An improvement: RAID level 5
-- Multiple disk crashes: RAID level 6

3 Mirroring
The simplest scheme to recover from disk crashes.
How mirroring works:
-- make two or more copies of the data on different disks
Benefits:
-- the data survives if one disk fails;
-- reads can be spread over the copies, so several blocks can be accessed at once

4 Mirroring (cont'd)
When can data be lost?
-- only if the second (mirror) disk crashes while the crash of the first disk is still being repaired.
How likely is that? Suppose:
-- one disk: mean time to failure = 10 years;
-- either of the two disks: one failure every 5 years on average;
-- replacing the failed disk takes 3 hours = 1/2,920 year.
So:
-- the probability that the mirror disk fails during the repair = 1/10 * 1/2,920 = 1/29,200;
-- the rate of data loss with mirroring = 1/5 * 1/29,200 = 1/146,000 per year, i.e. a mean time to data loss of 146,000 years
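
The arithmetic on this slide can be checked with a short Python sketch. The variable names are illustrative and not from the slides; the only inputs are the slide's own figures (10-year mean time to failure, 3-hour repair, two disks).

```python
from fractions import Fraction

# Figures taken from the slide; the variable names are illustrative.
disk_mttf_years = 10                         # mean time to failure of one disk
repair_time_years = Fraction(3, 24 * 365)    # 3 hours expressed in years

# Probability that the surviving disk fails during the repair window.
p_mirror_fails = Fraction(1, disk_mttf_years) * repair_time_years

# With two disks, one of them fails on average once every 5 years.
first_failure_rate = Fraction(2, disk_mttf_years)

# Expected data-loss events per year, and the mean time to data loss.
loss_rate_per_year = first_failure_rate * p_mirror_fails
mean_years_to_loss = 1 / loss_rate_per_year

print(repair_time_years)    # 1/2920
print(p_mirror_fails)       # 1/29200
print(mean_years_to_loss)   # 146000
```

Exact fractions are used instead of floats so that the results match the slide's figures digit for digit.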

5 Parity Blocks
Why change?
-- disadvantage of mirroring: it needs as many redundant disks as data disks
What's new?
-- RAID level 4 uses only one redundant disk
How does this one redundant disk work?
-- modulo-2 sum;
-- the j-th bit of the redundant disk is the modulo-2 sum of the j-th bits of all the data disks.
Example (next slide)

6 Parity Blocks (cont'd): example
Data disks:
Disk 1: 11110000
Disk 2: 10101010
Disk 3: 00111000
Redundant disk:
Disk 4: 01100010
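
As a minimal sketch of the rule above, the redundant block can be computed as the bitwise XOR (modulo-2 sum) of the data blocks. The function name `parity_block` is an assumption made for this illustration; the bit strings are the ones on the slide.

```python
from functools import reduce

def parity_block(blocks):
    """Return the bitwise modulo-2 sum (XOR) of equal-length bit strings."""
    width = len(blocks[0])
    total = reduce(lambda a, b: a ^ b, (int(bits, 2) for bits in blocks))
    return format(total, f"0{width}b")

data_disks = ["11110000", "10101010", "00111000"]
print(parity_block(data_disks))  # 01100010
```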

7 RAID 4 (cont'd)
Reading
-- the same as reading a block from any disk.
Writing
1) change the data disk;
2) change the corresponding block of the redundant disk.
Why?
-- the redundant disk must continue to hold the parity checks for the corresponding blocks of all the data disks

8 RAID 4 (cont'd): writing
For a total of N data disks:
1) Naive way:
-- read the corresponding blocks of the N data disks and compute their modulo-2 sum;
-- rewrite the block of the redundant disk with that sum.
2) Better way:
-- take the modulo-2 sum of the old and new versions of the data block that was rewritten;
-- flip the bits of the redundant block at the positions where that sum has a 1

9 RAID 4 (cont'd): writing example
Data disks:
Disk 1: 11110000
Disk 2: 10101010 => 01100110
Disk 3: 00111000
To do:
-- the modulo-2 sum of the old and new versions of disk 2 is 11001100;
-- so we need to change positions 1, 2, 5, 6 of the redundant disk.
Redundant disk:
Disk 4: 01100010 => 10101110
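
The "better way" of writing can be sketched in Python as follows. The helper names `xor_bits` and `update_parity` are hypothetical; the blocks are the ones from this slide.

```python
def xor_bits(a, b):
    """Bitwise modulo-2 sum of two equal-length bit strings."""
    return format(int(a, 2) ^ int(b, 2), f"0{len(a)}b")

def update_parity(old_parity, old_block, new_block):
    """Update the redundant block without reading the other data disks."""
    change = xor_bits(old_block, new_block)     # 1 where the data bit changed
    return xor_bits(old_parity, change), change  # flip exactly those bits

new_parity, change = update_parity("01100010", "10101010", "01100110")
positions = [i + 1 for i, bit in enumerate(change) if bit == "1"]

print(change)      # 11001100
print(positions)   # [1, 2, 5, 6]
print(new_parity)  # 10101110
```

The result agrees with recomputing the parity from scratch over the three data disks (11110000, 01100110, 00111000).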

10 RAID 4 (cont'd): failure recovery
The redundant disk crashes:
-- swap in a new disk and recompute its contents from all the data disks.
One of the data disks crashes:
-- swap in a new disk;
-- recompute its contents from the other disks, i.e. the remaining data disks and the redundant disk.
How to recompute? (The rule is the same in both cases, which is what makes the improvement in RAID 5 possible.)
-- take the modulo-2 sum of the corresponding bits of all the other disks
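
A small sketch of the recovery rule, using the four blocks from the earlier parity example (disks 1-3 are data, disk 4 is redundant). The function name `recover_block` is an assumption for illustration. The loop fails each disk in turn and shows that the same rule rebuilds it, whether it is a data disk or the redundant disk.

```python
from functools import reduce

def recover_block(surviving_blocks):
    """Rebuild a lost block as the modulo-2 sum of all surviving blocks."""
    width = len(surviving_blocks[0])
    total = reduce(lambda a, b: a ^ b, (int(bits, 2) for bits in surviving_blocks))
    return format(total, f"0{width}b")

disks = {1: "11110000", 2: "10101010", 3: "00111000", 4: "01100010"}

for failed in disks:
    survivors = [bits for number, bits in disks.items() if number != failed]
    rebuilt = recover_block(survivors)
    print(failed, rebuilt, rebuilt == disks[failed])
```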

11 An Improvement: RAID 5
Why is an improvement needed?
-- shortcoming of RAID level 4: the redundant disk is a bottleneck (every update of a data disk also has to read and write the redundant disk).
Principle of RAID level 5 (RAID 5):
-- treat each disk as the redundant disk for some of the blocks.
Why is this feasible?
-- the rule of failure recovery is the same for the redundant disk and the data disks: "take the modulo-2 sum of the corresponding bits of all the other disks";
-- so there is no need to treat one disk as the redundant disk and the others as data disks

12 3) RAID 5 (cont'd)
How do we decide for which blocks a given disk acts as the redundant disk?
-- if there are n+1 disks, numbered 0 to n, then cylinder i of disk j is treated as redundant if j is the remainder when i is divided by n+1.
Example (next slide)

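13 3) RAID 5 (cont'd): example
n = 3, so there are 4 disks. Cylinders for which each disk is the redundant disk:
-- the first disk, numbered 0: 4, 8, 12, ...;
-- the second disk, numbered 1: 1, 5, 9, ...;
-- the third disk, numbered 2: 2, 6, 10, ...;
-- the fourth disk, numbered 3: 3, 7, 11, ...
Suppose a write is equally likely to go to any of the 4 disks. The probability that a given disk takes part in that write is:
1/4 (it holds the data block) + 3/4 * 1/3 (it holds the redundant block) = 1/2
In general, with m disks in total: 1/m + (m-1)/m * 1/(m-1) = 2/m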
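
The cylinder assignment and the write probability can be sketched as below. The function names `redundant_disk` and `write_share` are hypothetical, and `num_disks` stands for the total number of disks (n+1 on the previous slide, m in the formula).

```python
from fractions import Fraction

def redundant_disk(cylinder, num_disks):
    """Number of the disk that is redundant for this cylinder (disks 0..num_disks-1)."""
    return cylinder % num_disks

def write_share(num_disks):
    """Probability that a given disk takes part in a uniformly random write."""
    m = Fraction(num_disks)
    as_data_disk = 1 / m                        # it holds the block being written
    as_redundant = (m - 1) / m * 1 / (m - 1)    # it holds the parity for that block
    return as_data_disk + as_redundant

num_disks = 4
for disk in range(num_disks):
    cylinders = [c for c in range(1, 13) if redundant_disk(c, num_disks) == disk]
    print(disk, cylinders)

print(write_share(4))   # 1/2
```

With four disks each disk sees half of all writes, whereas the single redundant disk of RAID 4 sees every write.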

14 4) Coping with multiple disk crashes
RAID 6: can deal with any number of disk crashes, provided enough redundant disks are used.
Example: a system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7).
How do we set up the 3 x 7 matrix? (Why 3 rows? Because there are 3 redundant disks.)
1) the columns are all the different combinations of three 0's and 1's, except for all three 0's;
2) the column of each redundant disk has a single 1;
3) the column of each data disk has at least two 1's

15 4) Coping with multiple disk crashes (cont'd)
Reading:
-- read from the data disks and ignore the redundant disks.
Writing:
-- change the data disk;
-- change the corresponding bits of all the redundant disks that have a 1 in a row of the matrix where the written data disk also has a 1

16 4) Coping with multiple disk crashes (cont'd)
With 4 data disks and 3 redundant disks, how can the system correct up to 2 disk crashes? Suppose disks a and b have failed:
-- find some row r of the 3 x 7 matrix in which the columns for a and b differ (say a has 0 and b has 1);
-- compute the correct b by taking the modulo-2 sum of the corresponding bits of all the disks other than b that have a 1 in row r;
-- once b is correct, compute the correct a from the other disks, all of which are now available.
Example (following slides)

17 4) Coping with multiple disk crashes (cont'd): example
The 3 x 7 matrix:
              data disks   redundant disks
disk number   1 2 3 4      5 6 7
row 1         1 1 1 0      1 0 0
row 2         1 1 0 1      0 1 0
row 3         1 0 1 1      0 0 1

18 4) Coping with multiple disk crashes (cont'd): example
First block of each disk:
disk  contents
1)    11110000
2)    10101010
3)    00111000
4)    01000001
5)    01100010
6)    00011011
7)    10001001
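
The contents of the redundant disks 5-7 follow from the 3 x 7 matrix: each row defines one parity group, and the redundant disk with a 1 in that row holds the modulo-2 sum of the data disks with a 1 in the same row. The sketch below assumes that reading of the matrix; the names `MATRIX`, `xor_all` and `redundant_blocks` are illustrative.

```python
from functools import reduce

# Rows of the 3 x 7 matrix; column k (1-based) belongs to disk k.
MATRIX = ["1110100", "1101010", "1011001"]

def xor_all(blocks):
    """Bitwise modulo-2 sum of equal-length bit strings."""
    width = len(blocks[0])
    total = reduce(lambda a, b: a ^ b, (int(bits, 2) for bits in blocks))
    return format(total, f"0{width}b")

def redundant_blocks(matrix, data):
    """Compute the block of every redundant disk from the data-disk blocks."""
    first_redundant = len(data) + 1
    result = {}
    for row in matrix:
        members = [d for d in data if row[d - 1] == "1"]
        # The redundant disk of this row is the one with a 1 among disks 5-7.
        redundant = next(d for d in range(first_redundant, len(row) + 1)
                         if row[d - 1] == "1")
        result[redundant] = xor_all([data[d] for d in members])
    return result

data = {1: "11110000", 2: "10101010", 3: "00111000", 4: "01000001"}
print(redundant_blocks(MATRIX, data))
# {5: '01100010', 6: '00011011', 7: '10001001'}
```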

19 4) Coping with multiple disk crashes (cont'd): example
Two disks crash (disks 2 and 5):
disk  contents
1)    11110000
2)    ????????
3)    00111000
4)    01000001
5)    ????????
6)    00011011
7)    10001001

20 4) Coping with multiple disk crashes (cont'd): example
In the 3 x 7 matrix, row 2 is a row in which disks 2 and 5 have different values: disk 2 has 1 and disk 5 has 0. So:
-- compute the first block of disk 2 as the modulo-2 sum of the corresponding bits of disks 1, 4, 6 (the other disks with a 1 in row 2);
-- then compute the first block of disk 5 as the modulo-2 sum of the corresponding bits of disks 1, 2, 3 (the other disks with a 1 in row 1).
disk  contents
1)    11110000
2)    ???????? => 10101010
3)    00111000
4)    01000001
5)    ???????? => 01100010
6)    00011011
7)    10001001
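
The two-step recovery can be sketched in Python under the same reading of the matrix (each row is a parity group whose members sum to zero modulo 2). The function name `recover_two` and the use of `None` for a failed disk are assumptions for this illustration, not part of the slides.

```python
from functools import reduce

MATRIX = ["1110100", "1101010", "1011001"]   # column k (1-based) is disk k

def xor_all(blocks):
    """Bitwise modulo-2 sum of equal-length bit strings."""
    width = len(blocks[0])
    total = reduce(lambda a, b: a ^ b, (int(bits, 2) for bits in blocks))
    return format(total, f"0{width}b")

def recover_two(matrix, blocks, a, b):
    """Rebuild failed disks a and b; blocks maps disk number -> bits or None."""
    blocks = dict(blocks)
    # Step 1: a row where the two columns differ involves only one failed disk.
    row = next(r for r in matrix if r[a - 1] != r[b - 1])
    first = a if row[a - 1] == "1" else b
    second = b if first == a else a
    group = [d for d in blocks if row[d - 1] == "1" and d != first]
    blocks[first] = xor_all([blocks[d] for d in group])
    # Step 2: any row containing the other disk now has a single unknown.
    row = next(r for r in matrix if r[second - 1] == "1")
    group = [d for d in blocks if row[d - 1] == "1" and d != second]
    blocks[second] = xor_all([blocks[d] for d in group])
    return blocks

crashed = {1: "11110000", 2: None, 3: "00111000", 4: "01000001",
           5: None, 6: "00011011", 7: "10001001"}
repaired = recover_two(MATRIX, crashed, 2, 5)
print(repaired[2])  # 10101010
print(repaired[5])  # 01100010
```

Step 1 always finds a row because no two columns of the matrix are identical.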

