Disk Failures
Xiaqing He ID: 204 Dr. Lin

Content
1) RAID stands for “redundant array of independent disks”.
2) Several schemes to recover from disk crashes:
-- Mirroring: RAID level 1
-- Parity checks: RAID level 4
-- Improvement: RAID level 5
-- RAID level 6

Mirroring
The simplest scheme to recover from disk crashes.
Mirroring: making two or more copies of the data on different disks.
Benefits:
-- the data survive if one disk fails;
-- reads can be divided among the disks, so several blocks can be accessed at once.
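A minimal Python sketch of mirroring as a toy in-memory model, assuming each "disk" is a dictionary from block number to contents. The class `MirroredDisks` and its methods are hypothetical illustrations, not part of the slides: every write goes to all surviving copies, reads rotate among the copies, and a replaced disk is refilled from a survivor.

```python
class MirroredDisks:
    """Toy model of mirroring (RAID level 1); each disk is a dict block -> bytes."""

    def __init__(self, copies=2):
        self.disks = [{} for _ in range(copies)]
        self.failed = set()
        self._turn = 0  # round-robin counter used to spread reads

    def alive(self):
        return [i for i in range(len(self.disks)) if i not in self.failed]

    def write(self, block, data):
        # Every surviving copy must receive the new contents.
        for i in self.alive():
            self.disks[i][block] = data

    def read(self, block):
        # Any surviving copy can serve the read; rotate to divide the load.
        survivors = self.alive()
        if not survivors:
            raise RuntimeError("all copies lost: data loss")
        i = survivors[self._turn % len(survivors)]
        self._turn += 1
        return self.disks[i][block]

    def fail(self, i):
        # Simulate a crash: the disk's contents are gone.
        self.failed.add(i)
        self.disks[i] = {}

    def replace(self, i):
        # Swap in a new disk and copy everything from a surviving mirror
        # (assumes at least one copy survived).
        source = self.alive()[0]
        self.disks[i] = dict(self.disks[source])
        self.failed.discard(i)


m = MirroredDisks()
m.write(7, b"payroll")
m.fail(0)                        # the first disk crashes
assert m.read(7) == b"payroll"   # the mirror still serves the block
m.replace(0)                     # repair finishes before the mirror fails
m.fail(1)
assert m.read(7) == b"payroll"
```

Data is lost only if every copy fails before a replacement is finished, which is the case the next slide quantifies.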

Mirroring (cont’d)
• When can data be lost?
-- Only if the second (mirror) disk crashes while the first disk's crash is being repaired.
• Probability:
Suppose:
-- one disk: mean time to failure = 10 years;
-- with two disks, the mean time until one of them fails = 5 years;
-- replacing the failed disk takes 3 hours = 1/2,920 year.
So: the probability that the mirror disk fails during the repair = 1/10 * 1/2,920 = 1/29,200;
the probability of data loss with mirroring = 1/5 * 1/29,200 = 1/146,000 per year, i.e., a mean time to data loss of about 146,000 years.
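The arithmetic above can be checked with exact fractions. This sketch reproduces the slide's approximation (failure rate multiplied by the repair window) rather than a full reliability model, and assumes a 365-day year of 8,760 hours, which is what gives 3 hours = 1/2,920 year.

```python
from fractions import Fraction

MTTF_ONE_DISK = 10          # years, as assumed on the slide
HOURS_PER_YEAR = 365 * 24   # 8,760 hours (leap years ignored)

# With two disks, one of them fails on average every 10 / 2 = 5 years.
mttf_first_failure = Fraction(MTTF_ONE_DISK, 2)

# Replacing the failed disk takes 3 hours, i.e. 3 / 8,760 = 1/2,920 of a year.
repair_time = Fraction(3, HOURS_PER_YEAR)

# Chance that the mirror also fails inside that repair window.
p_mirror_fails = Fraction(1, MTTF_ONE_DISK) * repair_time

# Yearly chance of data loss: a first failure (1/5 per year) that is
# followed by a mirror failure before the repair is complete.
p_loss_per_year = (1 / mttf_first_failure) * p_mirror_fails

print(repair_time, p_mirror_fails, p_loss_per_year)  # 1/2920 1/29200 1/146000
print(1 / p_loss_per_year)                            # 146000
```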

Parity Blocks
• Why change? -- The disadvantage of mirroring: it needs at least as many redundant disks as data disks.
• What’s new? -- RAID level 4 uses only one redundant disk.
• How does this one redundant disk work? -- By the modulo-2 sum: the jth bit of the redundant disk is the modulo-2 sum of the jth bits of all the data disks (1 if an odd number of those bits are 1, otherwise 0).
• Example

Parity Blocks (cont’d): Example
Data disks:
Disk 1:
Disk 2:
Disk 3:
Redundant disk:
Disk 4:
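Because the modulo-2 sum of bits is the same as bitwise XOR, the redundant block can be computed with Python's `operator.xor`. The 8-bit block values below are hypothetical, chosen only for illustration, and `parity_block` is a helper name introduced here.

```python
from functools import reduce
from operator import xor

def parity_block(blocks):
    """Modulo-2 sum of corresponding bits = bitwise XOR of the blocks."""
    return reduce(xor, blocks, 0)

# Hypothetical 8-bit first blocks of three data disks.
disk1 = 0b11110000
disk2 = 0b10101010
disk3 = 0b00111000

disk4 = parity_block([disk1, disk2, disk3])   # the redundant disk
print(f"{disk4:08b}")                          # 01100010

# Every bit position now holds an even number of 1's across all four disks.
assert parity_block([disk1, disk2, disk3, disk4]) == 0
```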

RAID 4 (cont’d)
• Reading: the same as reading blocks from any single disk.
• Writing: 1) change the data disk; 2) change the corresponding block of the redundant disk.
Why? The redundant disk holds the parity checks for the corresponding blocks of all the data disks, so it must stay consistent with every write.

RAID 4 (cont’d): writing
For a total of N data disks:
1) Naive way: read the corresponding blocks of the N data disks and compute their modulo-2 sum; rewrite the redundant disk with that sum.
2) Better way: take the modulo-2 sum of the old and new versions of the data block that was rewritten; change (flip) the bits of the redundant disk in the positions where this sum has 1's.
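A small sketch comparing the two ways of updating the redundant block, using hypothetical 8-bit blocks and helper names introduced here. The better way works because old XOR new has a 1 exactly where the rewritten block changed, so it reads only the rewritten data disk and the redundant disk, however large N is.

```python
from functools import reduce
from operator import xor

def naive_parity(data_blocks):
    # Way 1: read every data disk's block and recompute the modulo-2 sum.
    return reduce(xor, data_blocks, 0)

def better_parity(old_parity, old_block, new_block):
    # Way 2: XOR-ing (old ^ new) into the old parity flips only the
    # parity bits whose data bit changed.
    return old_parity ^ (old_block ^ new_block)

blocks = [0b11110000, 0b10101010, 0b00111000]   # hypothetical data disks 1-3
parity = naive_parity(blocks)

new_block = 0b11001100                          # hypothetical new contents of disk 2
parity = better_parity(parity, blocks[1], new_block)
blocks[1] = new_block

# Both ways agree, but way 2 needed only disk 2 and the redundant disk.
assert parity == naive_parity(blocks)
print(f"{parity:08b}")   # 00000100
```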

RAID 4 (cont’d): writing example
Data disks:
Disk 1:
Disk 2:
Disk 3:
To do: rewrite disk 2.
Modulo-2 sum of the old and new versions of disk 2:
So we need to change positions 1, 2, 5, 6 of the redundant disk.
Redundant disk:
Disk 4:
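The step "find the positions where the modulo-2 sum is 1, then flip them on the redundant disk" can be sketched as follows. All block values are hypothetical, and positions are counted from 0 at the leftmost bit (an assumption about the slide's numbering); with these values the changed positions come out as 1, 2, 5, 6.

```python
WIDTH = 8  # bits per block in this toy example

def positions_to_change(old_block, new_block):
    """Positions (0 = leftmost bit) where old XOR new is 1."""
    diff = old_block ^ new_block
    return [p for p in range(WIDTH) if (diff >> (WIDTH - 1 - p)) & 1]

old_disk2, new_disk2 = 0b10101010, 0b11001100  # hypothetical versions of disk 2
redundant = 0b01100010                         # hypothetical old redundant block

positions = positions_to_change(old_disk2, new_disk2)
print(f"{old_disk2 ^ new_disk2:08b}", positions)  # 01100110 [1, 2, 5, 6]

for p in positions:
    redundant ^= 1 << (WIDTH - 1 - p)   # flip that bit of disk 4
print(f"{redundant:08b}")               # 00000100
```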

RAID 4 (cont’d): failure recovery
• Redundant disk crash: swap in a new disk and recompute its contents from all the data disks.
• Data disk crash: swap in a new disk and recompute its contents from the other disks, i.e., the remaining data disks and the redundant disk.
• How to recompute? Take the modulo-2 sum of all the corresponding bits of all the other disks. The rule is the same in both cases, which is what makes the RAID 5 improvement possible.
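A sketch of the recovery rule, assuming hypothetical 8-bit first blocks for three data disks plus their redundant disk (last entry). The same function restores any one lost disk, data or redundant, which is the symmetry noted above.

```python
from functools import reduce
from operator import xor

def recover(disks, lost):
    """Rebuild disk number `lost` as the modulo-2 sum of all the other disks."""
    return reduce(xor, (block for i, block in enumerate(disks) if i != lost), 0)

# Hypothetical first blocks: three data disks, then the redundant disk.
disks = [0b11110000, 0b10101010, 0b00111000, 0b01100010]

# The same rule restores a data disk or the redundant disk.
for lost in range(len(disks)):
    assert recover(disks, lost) == disks[lost]
print(f"{recover(disks, 1):08b}")   # 10101010
```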

An Improvement: RAID 5
• Why is an improvement needed? A shortcoming of RAID level 4 is a bottleneck: every update to any data disk must also read and write the single redundant disk.
• Principle of RAID level 5: treat each disk as the redundant disk for some of the blocks.
• Why is this feasible? The failure-recovery rule is the same for the redundant disk and the data disks: “take the modulo-2 sum of all the corresponding bits of all the other disks”. So there is no need to treat one disk as the redundant disk and the others as data disks.

3) RAID 5 (cont’d)
• How do we know which blocks of each disk use that disk as the redundant disk? If there are n+1 disks, numbered 0 to n, then cylinder i of disk j is redundant if j is the remainder when i is divided by n+1.
• Example
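The placement rule is a single remainder operation. A minimal sketch (the function name `redundant_disk` is introduced here), numbering cylinders from 1 as the RAID 5 example does:

```python
def redundant_disk(cylinder, n):
    """Disk (numbered 0..n) that holds the redundant block of this cylinder."""
    return cylinder % (n + 1)

n = 3  # n + 1 = 4 disks, numbered 0-3
layout = {disk: [c for c in range(1, 13) if redundant_disk(c, n) == disk]
          for disk in range(n + 1)}
print(layout)
# {0: [4, 8, 12], 1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11]}
```

Because consecutive cylinders rotate through the disks, parity updates are spread evenly instead of all landing on one disk.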

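3) RAID 5 (cont’d): example
n = 3, so there are four disks:
the first disk, numbered 0, is redundant for cylinders 4, 8, 12, ...;
the second disk, numbered 1: 1, 5, 9, ...;
the third disk, numbered 2: 2, 6, 10, ...;
the fourth disk, numbered 3: 3, 7, 11, ....
Suppose writes are equally likely to go to any of the 4 disks. For any one of the 4 disks, the probability of being written is 1/4 + 3/4 * 1/3 = 1/2 (1/4: the write is to that disk's data; 3/4 * 1/3: the write is to one of the other three disks, and this disk holds the redundant block).
In general, with m disks: 1/m + (m-1)/m * 1/(m-1) = 2/m.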
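The 2/m result can be confirmed by exact enumeration. This sketch assumes every data block is equally likely to be written and counts, over one full rotation of m cylinders, how many writes touch a given disk either as the data disk or as that cylinder's redundant disk. The function `write_probability` is a hypothetical helper.

```python
from fractions import Fraction

def write_probability(m, disk=0):
    """Exact chance that `disk` is touched by one random block write in an
    m-disk RAID 5 array, enumerating one full rotation of m cylinders."""
    touched = total = 0
    for cylinder in range(m):
        parity = cylinder % m
        for data_disk in range(m):
            if data_disk == parity:
                continue  # the redundant block itself is not a write target
            total += 1
            # A write touches the data disk and the cylinder's redundant disk.
            if disk in (data_disk, parity):
                touched += 1
    return Fraction(touched, total)

print(write_probability(4))  # 1/2
for m in range(2, 9):
    assert write_probability(m) == Fraction(2, m)
```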

4) Coping with multiple disk crashes
• RAID 6: can deal with any number of disk crashes if enough redundant disks are used.
• Example: a system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7).
How to set up the 3*7 matrix? (Why 3 rows? There are 3 redundant disks.)
1) the columns are all the possible combinations of three 0's and 1's, except all three 0's;
2) the column of each redundant disk has a single 1;
3) the column of each data disk has at least two 1's.
Each row defines one redundant disk: it holds the modulo-2 sum of the data disks that have a 1 in that row.
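A sketch that checks the three rules for one possible matrix. The matrix is an assumed example: it agrees with the row-2 recovery step described later in this example, but it is not otherwise taken from the slides. Column k of each row corresponds to disk k+1.

```python
# Hypothetical 3*7 matrix; column k is disk k+1.
MATRIX = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]
DATA_DISKS = [1, 2, 3, 4]
REDUNDANT_DISKS = [5, 6, 7]

def column(disk):
    """The three bits in the matrix column for `disk` (numbered 1-7)."""
    return tuple(row[disk - 1] for row in MATRIX)

def check_rules():
    cols = [column(d) for d in DATA_DISKS + REDUNDANT_DISKS]
    nonzero = {(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)} - {(0, 0, 0)}
    # Rule 1: the 7 columns are exactly the 7 nonzero 3-bit patterns.
    assert len(set(cols)) == 7 and set(cols) == nonzero
    # Rule 2: each redundant disk's column has a single 1.
    assert all(sum(column(d)) == 1 for d in REDUNDANT_DISKS)
    # Rule 3: each data disk's column has at least two 1's.
    assert all(sum(column(d)) >= 2 for d in DATA_DISKS)

check_rules()

# Each row is one parity group: its disks' blocks XOR to zero.
for row in MATRIX:
    print([d for d in DATA_DISKS + REDUNDANT_DISKS if row[d - 1]])
# [1, 2, 3, 5]
# [1, 2, 4, 6]
# [1, 3, 4, 7]
```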

4) Coping with multiple disk crashes (cont’d)
• Reading: read from the data disks and ignore the redundant disks.
• Writing: change the data disk, then change the corresponding bits of every redundant disk whose row has a 1 in that data disk's column.
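A sketch of filling in and updating the redundant disks, using an assumed 3*7 matrix and hypothetical block values (neither is taken from the slides). A write flips the changed bits on exactly those redundant disks whose row has a 1 in the written disk's column, the same trick as the RAID 4 "better way".

```python
from functools import reduce
from operator import xor

# Hypothetical 3*7 matrix; column k is disk k+1.
MATRIX = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]
DATA_DISKS = (1, 2, 3, 4)
REDUNDANT_DISKS = (5, 6, 7)   # redundant disk 5 + i has its single 1 in row i

def redundant_value(blocks, row):
    """Modulo-2 sum of the data disks that have a 1 in this row."""
    return reduce(xor, (blocks[d] for d in DATA_DISKS if MATRIX[row][d - 1]), 0)

def write(blocks, disk, new_value):
    """Rewrite a data disk and patch every redundant disk whose row
    has a 1 in that disk's column."""
    changed = blocks[disk] ^ new_value
    blocks[disk] = new_value
    for row, redundant in enumerate(REDUNDANT_DISKS):
        if MATRIX[row][disk - 1]:
            blocks[redundant] ^= changed

blocks = {1: 0b11110000, 2: 0b10101010, 3: 0b00111000, 4: 0b01000001}  # hypothetical
for row, redundant in enumerate(REDUNDANT_DISKS):
    blocks[redundant] = redundant_value(blocks, row)

write(blocks, 2, 0b00001111)   # touches disks 2, 5 and 6; disk 7 is untouched
# Every redundant disk still equals the modulo-2 sum of its row's data disks.
assert all(blocks[r] == redundant_value(blocks, row)
           for row, r in enumerate(REDUNDANT_DISKS))
```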

4) Coping with multiple disk crashes (cont’d)
• In a system with 4 data disks and 3 redundant disks, how can up to 2 disk crashes be corrected?
Suppose disks a and b failed:
-- find some row r of the 3*7 matrix in which the columns for a and b are different (such a row exists because no two columns are the same); suppose a has a 0 and b has a 1 in row r;
-- compute the correct b by taking the modulo-2 sum of the corresponding bits from all the disks other than b that have a 1 in row r (a is not among them);
-- after getting the correct b, compute the correct a with all the other disks available.
• Example
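The two-crash recovery procedure can be sketched as below. The matrix and block values are assumptions for illustration (the function names are hypothetical too); with this matrix, crashing disks 2 and 5 follows the same path as the worked example: disk 2 is rebuilt from disks 1, 4, 6, then disk 5 from disks 1, 2, 3.

```python
from functools import reduce
from operator import xor

# Hypothetical 3*7 matrix; column k is disk k+1, disks 5-7 are redundant.
MATRIX = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]

def group(row):
    """Disks (1-7) with a 1 in this row; their blocks XOR to zero."""
    return [d for d in range(1, 8) if MATRIX[row][d - 1]]

def rebuild(blocks, target, row):
    """Recompute `target` from the other members of row's parity group."""
    return reduce(xor, (blocks[d] for d in group(row) if d != target), 0)

def recover_two(blocks, a, b):
    """Restore lost disks a and b in place; `blocks` maps disk -> block."""
    # Columns are all different, so some row separates a from b.
    r = next(i for i in range(len(MATRIX)) if MATRIX[i][a - 1] != MATRIX[i][b - 1])
    if MATRIX[r][a - 1] == 1:
        a, b = b, a                    # arrange for a to have the 0, b the 1
    blocks[b] = rebuild(blocks, b, r)  # a is not in this group
    # With b restored, use any row in which a has a 1.
    ra = next(i for i in range(len(MATRIX)) if MATRIX[i][a - 1] == 1)
    blocks[a] = rebuild(blocks, a, ra)

# Hypothetical first blocks of data disks 1-4; redundant disks filled per row.
blocks = {1: 0b11110000, 2: 0b10101010, 3: 0b00111000, 4: 0b01000001}
for row, redundant in enumerate((5, 6, 7)):
    blocks[redundant] = rebuild(blocks, redundant, row)
original = dict(blocks)

del blocks[2], blocks[5]   # disks 2 and 5 crash
recover_two(blocks, 2, 5)  # disk 2 from disks 1, 4, 6; then disk 5 from 1, 2, 3
assert blocks == original
```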

4) Coping with multiple disk crashes (cont’d): example
[3*7 matrix: columns labeled by disk number; disks 1-4 are data disks, disks 5-7 are redundant disks]

4) Coping with multiple disk crashes (cont’d): example
[Table: contents of the first block of each of disks 1-7]

4) Coping with multiple disk crashes (cont’d): example
Two disks crash.
[Table: contents of the first block of disks 1-7; disks 2 and 5 are unknown (?????????)]

4) Coping with multiple disk crashes (cont’d): example
In the 3*7 matrix, row 2 has different values in the columns for disks 2 and 5: disk 2 has a 1 and disk 5 has a 0. So:
-- first compute the first block of disk 2 as the modulo-2 sum of the corresponding bits of disks 1, 4, 6 (the other disks with a 1 in row 2);
-- then compute the first block of disk 5 as the modulo-2 sum of the corresponding bits of disks 1, 2, 3 (the other disks with a 1 in the row where disk 5 has its single 1).
[Table: first blocks of disks 1-7 before and after recovering disks 2 and 5]