DISK FAILURES. Prof. T.Y. Lin, CS-257. Presenter: Shailesh Benake (104)
Index. 13.4 Disk Failures; Intermittent Failures; Organizing Data by Cylinders; Stable Storage; Error-Handling Capabilities of Stable Storage; Recovery from Disk Crashes; Mirroring as a Redundancy Technique; Parity Blocks; An Improvement: RAID 5; Coping With Multiple Disk Crashes
Intermittent Failures. An intermittent failure occurs when we try to read a sector but the correct content of that sector is not delivered to the disk controller. The controller therefore needs a way to tell a good sector from a bad one. To check that a write is correct, a read is performed afterwards: the read operation tells us whether the sector is good or bad.
Checksums. Each sector has some additional bits, called the checksum. The checksum is set depending on the values of the data bits stored in that sector. With a checksum, the probability of accepting a bad sector as good is small. A simple checksum is a single parity bit: if the data bits contain an odd number of 1s (odd parity), add a parity bit 1; if they contain an even number of 1s (even parity), add a parity bit 0. The number of 1s among the data bits and the parity bit together is then always even.
Failure types and their remedies. Intermittent failure: parity check. Media decay and write failure: stable storage. Disk crash: RAID. Example: 1. A sequence with an odd number of 1s gets parity bit 1. 2. A sequence with an even number of 1s gets parity bit 0.
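The parity rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the two bit strings are assumptions chosen for the example, not values taken from the slides.

```python
def parity_bit(bits: str) -> str:
    """Return the parity bit for a string of '0'/'1' characters."""
    ones = bits.count("1")
    # Odd number of 1s -> parity bit 1; even number of 1s -> parity bit 0.
    return "1" if ones % 2 == 1 else "0"


def with_parity(bits: str) -> str:
    """Append the parity bit so that the total number of 1s is even."""
    return bits + parity_bit(bits)


def looks_good(stored: str) -> bool:
    """A stored sequence passes the check if it holds an even number of 1s."""
    return stored.count("1") % 2 == 0


# Illustrative bit strings (assumed for this example).
print(with_parity("01101000"))  # three 1s, so the parity bit is 1
print(with_parity("11101110"))  # six 1s, so the parity bit is 0
```

Note that a single parity bit only detects an odd number of flipped bits; an even number of flips passes the check unnoticed.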
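Stable Storage. Checksums detect errors; stable storage lets us correct them. Sectors are paired, and each pair represents one sector's contents X, with a left copy Xl and a right copy Xr. When X is written, the parity check of each copy is verified. If a copy cannot be written correctly, a spare sector is substituted for Xl or Xr, and the write is repeated until a good value is returned.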
Error-Handling Capabilities of Stable Storage. Media failures: if one of Xl and Xr fails, X can be read from the other copy. If both fail, X is not readable, but the probability of this is very small. Write failure: if power fails during a write of X, there are two cases. 1. The failure occurs while writing Xl: Xr remains good, and X can be read from Xr. 2. The failure occurs after writing Xl: we can read X from Xl, since Xr may or may not hold the correct copy of X.
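The write and read policies for a stable-storage pair can be sketched as follows. This is a toy in-memory model under stated assumptions: the `Sector` and `StablePair` classes, the one-bit checksum, and the `MAX_TRIES` retry limit are all hypothetical and only illustrate the idea of writing Xl first, then Xr, and reading from whichever copy is good.

```python
MAX_TRIES = 3  # assumed retry limit before giving up on a copy


def parity(data: bytes) -> int:
    """One-bit checksum: 1 if the data holds an odd number of 1 bits."""
    return sum(bin(b).count("1") for b in data) % 2


class Sector:
    """Hypothetical sector: data bytes plus a stored parity checksum."""

    def __init__(self):
        self.data = None
        self.check = None

    def write(self, data: bytes) -> None:
        self.data = data
        self.check = parity(data)

    def is_good(self) -> bool:
        return self.data is not None and parity(self.data) == self.check


class StablePair:
    """One logical sector X stored as a left copy (Xl) and a right copy (Xr)."""

    def __init__(self):
        self.left = Sector()
        self.right = Sector()

    def _write_copy(self, side: str, data: bytes) -> None:
        for _ in range(MAX_TRIES):
            sector = getattr(self, side)
            sector.write(data)
            if sector.is_good():
                return
        # Assume a media failure: substitute a spare sector for this copy.
        spare = Sector()
        spare.write(data)
        setattr(self, side, spare)

    def write(self, data: bytes) -> None:
        # Xl is written and verified before Xr is touched, so at every
        # moment at least one copy holds a complete old or new value.
        self._write_copy("left", data)
        self._write_copy("right", data)

    def read(self) -> bytes:
        for sector in (self.left, self.right):
            # A real disk would re-read here; the in-memory model just rechecks.
            for _ in range(MAX_TRIES):
                if sector.is_good():
                    return sector.data
        raise IOError("both copies of X are unreadable")


pair = StablePair()
pair.write(b"payload")
pair.left.check ^= 1   # simulate media decay in Xl
print(pair.read())     # X is still readable from Xr
```

Writing the two copies strictly one after the other is what makes the power-failure cases above work: an interrupted write can damage at most one copy.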
Recovery from Disk Crashes: Ways to recover the data. The most serious mode of failure for disks is the head crash, in which data is permanently destroyed. To reduce the risk of data loss from disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks). Each scheme starts with one or more disks that hold the data (the data disks) and adds one or more disks that hold information completely determined by the contents of the data disks (the redundant disks).
Mirroring as a Redundancy Technique. The mirroring scheme is referred to as RAID level 1. In this scheme we mirror each disk: one disk is called the data disk and the other the redundant disk. The only way data can be lost is if the second disk crashes while the first crash is being repaired.
Parity Blocks. The RAID level 4 scheme uses only one redundant disk, no matter how many data disks there are. In the redundant disk, the ith block consists of the parity checks for the ith blocks of all the data disks. That is, the jth bits of the ith blocks of all the disks, data and redundant together, must contain an even number of 1s, and the redundant disk's bit is chosen to make this condition true.
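As a small sketch of how the redundant block follows from the data blocks, the example below computes a bitwise modulo-2 sum (XOR) column by column. The three one-byte data blocks and the helper name `xor_blocks` are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bitwise modulo-2 sum of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Assumed ith blocks of three data disks (one byte each for readability).
data_blocks = [
    bytes([0b11110000]),
    bytes([0b10101010]),
    bytes([0b00111000]),
]

# The redundant disk's ith block makes every bit column have even parity.
redundant = xor_blocks(data_blocks)
print(format(redundant[0], "08b"))  # 01100010
```

A quick consistency check is that the modulo-2 sum of all four blocks, data and redundant, is all zeros.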
Parity Block: Reading. Reading a block from a data disk is the same as reading a block from any disk. Alternatively, we can read the corresponding block from each of the other disks and compute the block we want by taking their modulo-2 sum. For example, given the blocks of disks 2, 3, and 4, taking the modulo-2 sum of the bits in each column yields the block of disk 1.
Parity Block: Writing. When we write a new block of a data disk, we need to change the corresponding block of the redundant disk as well. One approach is to read the corresponding blocks of all the other data disks, compute the modulo-2 sum with the new block, and write the result to the redundant disk. With n data disks, this requires n-1 reads of data blocks, one write of the data block, and one write of the redundant disk's block, for a total of n+1 disk I/Os. A better approach requires only four disk I/Os: 1. Read the old value of the data block being changed. 2. Read the corresponding block of the redundant disk. 3. Write the new data block. 4. Recalculate and write the block of the redundant disk. For step 4, the modulo-2 sum of the old and new data blocks marks the bit positions that changed, and exactly those positions are flipped in the redundant block.
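The four-I/O update can be sketched as below. The in-memory `disks` layout (a list of disks, each a list of byte blocks), the function names, and the sample values are assumptions for illustration only.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise modulo-2 sum of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def write_block(disks, disk_no, block_no, new_data, parity_disk):
    """Update one data block and the redundant block using four disk I/Os."""
    old_data = disks[disk_no][block_no]            # I/O 1: read old data block
    old_parity = disks[parity_disk][block_no]      # I/O 2: read redundant block
    disks[disk_no][block_no] = new_data            # I/O 3: write new data block
    changed = xor(old_data, new_data)              # 1s mark the bits that changed
    disks[parity_disk][block_no] = xor(old_parity, changed)  # I/O 4: write parity


# Assumed layout: disks 0..2 hold data, disk 3 is the redundant disk.
disks = [
    [bytes([0b11110000])],
    [bytes([0b10101010])],
    [bytes([0b00111000])],
    [bytes([0b01100010])],
]
write_block(disks, disk_no=1, block_no=0,
            new_data=bytes([0b11001100]), parity_disk=3)
print(format(disks[3][0][0], "08b"))  # 00000100
```

The other data disks are never touched, which is why the cost stays at four I/Os regardless of how many data disks there are.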
Parity Blocks: Failure Recovery. If any one disk crashes, we recover it by computing the modulo-2 sum of the corresponding blocks on all the other disks. Suppose that disk 2 fails. We need to recompute each block of the replacement disk. We are given the corresponding blocks of the first and third data disks and of the redundant disk (disk 4), while the block of disk 2 is unknown (????????). Taking the modulo-2 sum of each column of the three known blocks gives the missing block of disk 2.
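A minimal sketch of this recovery step follows. The one-byte block values and the name `recover_block` are assumed for the example; the same computation also serves to read a block indirectly from the other disks.

```python
def recover_block(surviving_blocks):
    """Rebuild a lost block as the modulo-2 sum of all surviving blocks."""
    result = bytearray(len(surviving_blocks[0]))
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Assumed blocks: disk 2 has crashed; disks 1 and 3 hold data, disk 4 is redundant.
disk1 = bytes([0b11110000])
disk3 = bytes([0b00111000])
disk4 = bytes([0b01100010])

disk2 = recover_block([disk1, disk3, disk4])
print(format(disk2[0], "08b"))  # 10101010
```

The procedure is the same whether the failed disk held data or parity, because every column across all disks has even parity.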
An Improvement: RAID 5. RAID 4 is effective in preserving data unless there are two simultaneous disk crashes. However, whatever scheme we use for updating the disks, every write must read and write the redundant disk's block. If there are n data disks, the number of writes to the redundant disk will be n times the average number of writes to any one data disk, so the redundant disk becomes a bottleneck. We do not have to treat one disk as the redundant disk and the others as data disks. Rather, we can treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.
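One simple way to spread the redundant blocks over the disks is to rotate the role by block number. The rule below (block i is redundant on disk i mod the number of disks) is an assumed placement for illustration; real RAID 5 implementations use various layouts.

```python
from collections import Counter


def redundant_disk(block_no: int, num_disks: int) -> int:
    """Assumed rotation rule: the disk holding parity for a given block number."""
    return block_no % num_disks


def parity_writes_per_disk(written_blocks, num_disks):
    """Count how many parity updates each disk receives for a list of block writes."""
    counts = Counter()
    for block_no in written_blocks:
        counts[redundant_disk(block_no, num_disks)] += 1
    return counts


# With 4 disks and writes spread evenly over blocks 0..7,
# each disk receives the same share of parity updates.
load = parity_writes_per_disk(range(8), num_disks=4)
print(sorted(load.items()))  # [(0, 2), (1, 2), (2, 2), (3, 2)]
```

Compared with RAID 4, where all eight parity updates would land on a single disk, the rotation evens out the write load when writes are spread uniformly across blocks.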