ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 6 – RAID ©Manuel Rodriguez.

Readings
Read:
–Chapter 9: "Storing Data: Disks and Files"
–Paper: "A Case for Redundant Arrays of Inexpensive Disks (RAID)" by David A. Patterson, Garth Gibson, and Randy H. Katz

The problem with Striping
Striping has the advantage of speeding up disk access. But the use of a disk array decreases the reliability of the storage system, because more disks mean more possible points of failure.
Mean-time-to-failure (MTTF)
–Mean time for a disk to fail and lose its data
MTTF is inversely proportional to the number of components in use by the system.
–The more components we have, the more likely one of them will fail!

MTTF in terms of number of disks
Suppose we have N disks in the array. Then the MTTF of the array is given by:
MTTF(array) = MTTF(single disk) / N
This assumes any disk can fail with equal probability.

MTTF in a disk array
Suppose we have a single disk with a MTTF of 50,000 hrs (5.7 years). Then, if we build an array with 50 disks, we have a MTTF for the array of 50,000/50 = 1,000 hrs, or about 42 days, because any disk can fail at any given time with equal probability.
–Disk failures are more common when disks are new (bad disk from factory) or old (wear due to usage).
Moral of the story: more does not necessarily mean better!
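The arithmetic above can be sanity-checked in a few lines (Python is used here purely for illustration):

```python
def array_mttf(disk_mttf_hours: float, n_disks: int) -> float:
    """MTTF of an N-disk array, assuming independent, equally likely failures:
    MTTF(array) = MTTF(single disk) / N."""
    return disk_mttf_hours / n_disks

mttf = array_mttf(50_000, 50)   # 50,000-hour disks, 50-disk array
print(mttf)                     # 1000.0 hours
print(mttf / 24)                # roughly 41.7 days, i.e. about 42 days
```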

Increasing MTTF with redundancy
We can increase the MTTF of a disk array by storing some redundant information in the array.
–This information can be used to recover from a disk failure.
The information should be carefully selected so it can be used to reconstruct the original data after a failure.
What to store as redundant information?
–A full copy of each data block
–Parity bits for a set of bit locations across all the disks
Where to store it?
–Check disks – disks in the array used only for this purpose
–All disks – spread the redundant information over every disk in the array

Redundancy unit: Data Blocks
One approach is to keep a backup copy of each data block in the array. This is called mirroring. The backup can be on:
–another disk or disk array
–tape (very slow …)
Advantage:
–Easy to recover from a failure: just read the block from the backup.
Disadvantages:
–Requires up to twice the storage space
–Writes are more expensive: each data block must be written to two different locations every time
–Snapshot writes are unfeasible (failures can happen at any time!)

Redundancy Unit: Parity bits
Consider an array of N disks, and consider the k-th block on each disk. Each block consists of several kilobytes, and each byte has 8 bits. We can store redundant information about the i-th bit position of each data block:
–a parity bit.
The parity bit records whether the number of bits set to 1 in the group of corresponding bit locations of the data blocks is even or odd. For example, if bit position 1024 has parity 0, then an even number of bits were set to 1 at bit position 1024. Otherwise, its parity must be 1.

Parity bits
Consider these bytes:
–b1 = 11010010, b2 = 10101110, b3 = 01001001
If we take the XOR of these bytes we get a byte that has the parity for all bit positions in b1, b2, b3. Notice the following:
–For bit position 0, the parity is 1, meaning an odd number of the bytes have value 1 at bit position 0.
–For bit position 1, the parity is 0, meaning an even number of the bytes have value 1 at bit position 1.
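The same computation in code (Python; the byte values are illustrative, chosen so bit position 0 has odd parity and bit position 1 has even parity):

```python
b1, b2, b3 = 0b11010010, 0b10101110, 0b00110101

# XOR computes the parity of every bit position at once.
parity = b1 ^ b2 ^ b3
print(f"{parity:08b}")          # 01001001

# Bit position 0: parity 1 -> an odd number of the bytes have a 1 there.
assert (parity >> 0) & 1 == 1
# Bit position 1: parity 0 -> an even number of the bytes have a 1 there.
assert (parity >> 1) & 1 == 0
```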

Redundancy as blocks of parity bits
For each set of corresponding data blocks, compute and store a parity block that holds the parity bits for each bit location in the data blocks.
[Diagram: a disk array controller on a bus connecting Disk 0 through Disk 3, each holding Block 0, plus Check Disk 0 holding the parity block for Block 0.]

Reliability groups in disk array
Organize the disks in the array into groups called reliability groups. Each group has:
–1 or more data disks, where the data blocks are stored
–0 or more check disks, where the blocks of parity bits are stored
If a data disk fails, then the check disk(s) for its reliability group can be used to recover the data lost from that disk. There is a recovery algorithm that works for any failed disk m in the disk array. It can recover from up to 1 disk failure.

Recovery algorithm
Suppose we have an array of N data disks and M check disks (in this case there is one reliability group). Suppose disk p fails. We buy a replacement, and then we can recover the data as follows. For each data block k on disk p:
–Read data block k on every data disk r, with r != p
–Read parity block k from its check disk w
–For each bit position i in block k of disk p:
Count the number of bits set to 1 at position i in the blocks coming from disks other than p. Let this number be j.
If j is odd and the parity bit is 1, then bit position i is set to 0.
If j is even and the parity bit is 0, then bit position i is set to 0.
Else, bit position i is set to 1.

Recovery algorithm: Example
Suppose we have an array of 5 data disks plus 1 check disk (one reliability group). Suppose disk 1 fails. We buy a replacement, and then we can recover the data as follows. For each data block k on disk 1:
–Read data block k on disks 0, 2, 3, 4
–Read parity block k on check disk 0
–For each bit position i in block k of disk 1:
Count the number of bits set to 1 at position i in each block k coming from disks 0, 2, 3, 4. Let this number be j.
If j is odd and the parity bit is 1, then bit position i is set to 0.
If j is even and the parity bit is 0, then bit position i is set to 0.
Else, bit position i is set to 1.
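The bit-counting rule above is equivalent to XOR-ing the surviving blocks with the parity block. A minimal sketch (Python; the disk contents are made-up single-byte blocks, and `recover_block` is a hypothetical helper):

```python
def recover_block(surviving_blocks, parity_block):
    """Rebuild the lost block by XOR-ing the surviving data blocks
    into the parity block, byte by byte."""
    rebuilt = bytearray(parity_block)
    for block in surviving_blocks:
        for i, byte in enumerate(block):
            rebuilt[i] ^= byte
    return bytes(rebuilt)

# Example: 4 data disks + 1 check disk; disk 1 fails.
d0, d1, d2, d3 = b"\x12", b"\xAB", b"\x0F", b"\x77"
parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(d0, d1, d2, d3))

# XOR of the survivors with the parity block reconstructs d1 exactly.
assert recover_block([d0, d2, d3], parity) == d1
```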

RAID Organization
RAID:
–Originally: Redundant Array of Inexpensive Disks
–Now: Redundant Array of Independent Disks
The RAID organization combines the ideas of striping, redundancy as parity bits, and reliability groups. A RAID system has one or more reliability groups.
–For simplicity we shall assume only one group.
RAID systems can have varying numbers of check disks per reliability group, depending on the RAID level chosen for the system. Each RAID level represents a different tradeoff between storage requirements, write speed, and recovery complexity.

RAID Analysis
Suppose we have a disk array with 4 data disks. Let's analyze how many check disks we need to build a RAID with 1 reliability group of 4 data disks plus the check disks.
Note: Effective space utilization is a measure of the amount of space in the disk array that is used to store data. It is given as a percentage by the formula:
Effective space utilization = (# of data disks / total # of disks) × 100%
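A sketch of this formula in code (Python; the disk counts are illustrative):

```python
def space_utilization(data_disks: int, check_disks: int) -> float:
    """Effective space utilization, as a percentage of total array capacity."""
    return 100.0 * data_disks / (data_disks + check_disks)

print(space_utilization(4, 0))   # no redundancy: 100.0
print(space_utilization(4, 4))   # full mirroring: 50.0
print(space_utilization(4, 1))   # one disk's worth of parity: 80.0
```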

RAID Level 0
RAID Level 0: Non-redundant
Uses data striping to distribute data blocks and increase the maximum disk bandwidth available.
–Disk bandwidth refers to the aggregate rate of moving data from the disk array to main memory, e.g. 200 MB/sec.
The solution with the lowest cost, but with little reliability. Write performance is the best, since only 1 block is written in every write operation, at a cost of 1 I/O. Read performance is not the best, since a given block can only be read from one disk. Effective space utilization is 100%.
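The striping mapping itself can be sketched as follows (Python; a round-robin block placement is assumed here, since the slide does not fix a particular layout):

```python
def locate(logical_block: int, n_disks: int):
    """Round-robin striping: return (disk holding the block, offset on that disk)."""
    return logical_block % n_disks, logical_block // n_disks

# With 4 disks, consecutive logical blocks land on consecutive disks,
# so a large sequential read can be serviced by all disks in parallel.
assert [locate(k, 4) for k in range(5)] == [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1)]
```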

RAID Level 1
RAID Level 1: Mirrored
Each data block is duplicated:
–original copy + mirror copy
No striping! The most expensive solution, since it requires twice the space of the expected data set size. Every write involves two writes (original + copy):
–They cannot be done simultaneously, to prevent double corruption
–First write the data disk, then the copy on the mirror disk
Reads are fast since a block can be fetched from either the
–data disk, or the
–mirror disk

RAID Level 1 (cont…)
In RAID Level 1, a data block can be fetched from the disk with the least contention. Since we need to pair disks in groups of two (original + copy), the space utilization is 50%, independent of the number of disks. RAID Level 1 is only good for small data workloads where the cost of mirroring is not an issue.

RAID Level 0+1
RAID Level 0+1: Striping and Mirroring
–Also called RAID Level 10
Combines mirroring and striping. Data is striped over the data disks.
–Parallel I/O for high throughput (full disk array bandwidth)
Each data disk is copied onto a mirror disk. Writes require 2 I/Os (original disk + mirror disk). Blocks can be read from either the original disk or the mirror disk.
–Better read performance, since more parallelism can be achieved.
–No need to wait for a busy disk: just go to its mirror disk!

RAID Level 0+1 (cont…)
Space utilization is 50% (half data and half copies). RAID Level 0+1 is better than RAID 1 because of striping. RAID Level 0+1 is good for workloads with small data sets, where the cost of mirroring is not an issue. It is also good for workloads with high percentages of writes, since a write is always 2 I/Os to relatively unloaded disks (especially the mirrors).

RAID Level 2
RAID Level 2: Error-Correcting Codes
Uses striping with a 1-bit striping unit, and a Hamming code for redundancy on C check disks.
–Can indicate which disk failed
–Makes the number of check disks grow logarithmically with respect to the number of data disks
Reads are expensive, since to read 1 bit we need to read 1 physical data block, the one storing the bit. Therefore, to read 1 logical data block from the array we need to read multiple physical data blocks, one from each disk, to get all the necessary bits.

RAID Level 2 (cont…)
Since we are striping with 1-bit units, if we have an array with m data disks, then reading the m bits of a striping group requires 1 block from each disk, for a total of m I/Os. Therefore, reading 1 logical data block from the RAID requires reading at least m physical blocks, so the cost is at least m I/Os. Level 2 is good for requests of large contiguous data blocks, since the system will fetch physical blocks that contain the required data. Level 2 is bad for requests of small data, since the I/Os will be wasted fetching just a few bits and throwing away the rest.

RAID Level 2 (cont…)
Writes are expensive with Level 2 RAID. A write operation on N data disks involves:
–Reading at least N data blocks into memory
–Reading the C check disks
–Modifying the N data blocks with the new data
–Modifying the C check disks to update the Hamming codes
–Writing N + C blocks back to the disk array
This is called a read-modify-write cycle. Level 2 has better space utilization than Level 1.

RAID Level 3
RAID Level 3: Bit-Interleaved Parity
Uses striping with a 1-bit striping unit. Does not use Hamming codes, but simply computes bit parity.
–The disk controller can tell which disk has failed.
Only 1 check disk is needed to store the parity bits of the data disks in the array. A RAID Level 3 system has N disks, where N-1 are data disks and one is the check disk.

ICOM 6005Dr. Manuel Rodriguez Martinez25 RAID Level 3 (Cont…) Reading or writing a logical data block in a RAID Level 3 involves reading at least N-1 data blocks from the array. Writing requires a read-modify-write cycle.

RAID Level 4
RAID Level 4: Block-Interleaved Parity
Uses striping with a 1-block striping unit.
–A logical data block is the same as a physical data block.
Computes redundancy as parity bits, and has 1 check disk to store the parity bits for all corresponding blocks in the array. Reads can be run in parallel.
–Works well for both large and small data requests.
Writes require a read-modify-write cycle, but only involve:
–the data disk for the block being modified (target block k)
–the check disk (parity block for block k)

RAID Level 4 (cont…)
The parity block k is updated incrementally, to avoid reading block k from all the data disks.
–Only need to read parity block k and the data block k being modified
–Parity is computed as follows:
New parity block = (Old block XOR New block) XOR Old parity block
In this way the read-modify-write cycle avoids reading the data block on every disk to compute the parity. The read-modify-write cycle only performs 4 I/Os (2 reads and 2 writes, of the target data block and the parity block). Space utilization is the same as RAID Level 3.
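The incremental rule can be checked against recomputing the parity from scratch (Python sketch; block contents are illustrative single-byte values):

```python
def update_parity(old_block: int, new_block: int, old_parity: int) -> int:
    """Incremental parity update: new_parity = (old XOR new) XOR old_parity."""
    return (old_block ^ new_block) ^ old_parity

# Four data blocks and their parity.
blocks = [0x12, 0xAB, 0x0F, 0x77]
parity = blocks[0] ^ blocks[1] ^ blocks[2] ^ blocks[3]

# Overwrite block 1 and update the parity incrementally (reads only 2 blocks).
new_b1 = 0x5C
incremental = update_parity(blocks[1], new_b1, parity)

# Full recomputation (reads all 4 blocks) gives the same answer.
blocks[1] = new_b1
full = blocks[0] ^ blocks[1] ^ blocks[2] ^ blocks[3]
assert incremental == full
```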

RAID Level 4 (cont…)
In RAID Levels 3 and 4, the check disk is only used in write operations; it does not help with reads. Moreover, the check disk becomes a bottleneck, since it must participate in every write operation.

RAID Level 5
RAID Level 5: Block-Interleaved Distributed Parity
Uses striping with a 1-block striping unit. Redundancy is stored as blocks of parity bits, but the parity blocks are distributed over all the disks in the array.
–Every disk is both a data disk and a check disk.
Best of both worlds:
–Fast reads
–Fast writes
Reads are efficient since they can be run in parallel.
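One way to distribute the parity blocks is to rotate them across the disks, stripe by stripe (Python sketch; the slides do not fix a placement scheme, so this rotated layout is an assumption):

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk holding the parity block for a given stripe (rotated placement)."""
    return (n_disks - 1 - stripe) % n_disks

# With 5 disks, parity rotates so no single disk serves every write:
assert [parity_disk(s, 5) for s in range(5)] == [4, 3, 2, 1, 0]
```

Because every disk takes a turn holding parity, write traffic to parity blocks is spread evenly instead of piling up on one check disk.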

RAID Level 5 (cont…)
Writes still involve a read-modify-write cycle, but the cost of writing the parity blocks is lowered by scattering them over all the disks in the array.
–Removes the contention at a single check disk
RAID Level 5 is a good general-purpose system for:
–Small reads
–Large reads
–Intensive writes
Space utilization is equivalent to Levels 3 and 4, since there is 1 disk's worth of parity blocks in the system!

RAID Level 6
RAID Level 6: P+Q Redundancy
RAID Levels 1-5 can only recover from 1 disk failure. In a large disk array, there is a high probability that two disks might fail simultaneously. RAID Level 6 provides recovery from 2 disk failures. Uses striping with a 1-block striping unit. Redundancy is stored as parity bits and Reed-Solomon codes.
–Requires two check disks for the data disks in the array

RAID Level 6 (cont…)
Reads are as in RAID Level 5. Writes involve a read-modify-write cycle that involves 4 I/Os:
–1 for the data block
–1 for the parity block
–2 for the Reed-Solomon codes