
RAID: HIGH PERFORMANCE, RELIABLE SECONDARY STORAGE P. M. Chen, U. Michigan E. K. Lee, DEC SRC G. A. Gibson, CMU R. H. Katz, U. C. Berkeley D. A. Patterson, U. C. Berkeley

Highlights The seven RAID organizations Why RAID-1, RAID-3 and RAID-5 are the most interesting The small-write problem occurring with RAID-5 –Possible solutions Review of actual implementations

Original Motivation Replacing large and expensive mainframe hard drives (IBM 3310) with several cheaper Winchester disk drives Works, but introduces a data reliability problem: –Assume the MTTF of a disk drive is 30,000 hours –The MTTDL of a set of n drives is then 30,000/n hours –n = 10 means an MTTDL of only 3,000 hours
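A quick check of that arithmetic, assuming independent failures (a minimal sketch; the 30,000-hour figure is the one quoted above):

```python
# Without redundancy, any single disk failure loses data, so the mean time
# to data loss (MTTDL) of n disks is the single-disk MTTF divided by n.
DISK_MTTF_HOURS = 30_000

def mttdl_unprotected(n_disks: int, mttf: float = DISK_MTTF_HOURS) -> float:
    return mttf / n_disks

print(mttdl_unprotected(10))   # 3000.0 hours, roughly four months
```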

Today's Motivation "Cheap" SCSI hard drives are now big enough for most applications We use RAID today for –Increasing disk throughput by allowing parallel access –Eliminating the need to make disk backups (disk drives have become too big to be backed up efficiently)

RAID 0 Spreads (stripes) data over multiple disk drives Advantages –Simple to implement –Fast Disadvantage –Very unreliable: RAID 0 with n disks has an MTTF equal to 1/n of the MTTF of a single disk
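One common way to stripe at block granularity is a round-robin mapping of logical blocks to drives; the helper below is a generic sketch, not any particular product's layout:

```python
def block_location(logical_block: int, n_disks: int) -> tuple[int, int]:
    """Round-robin striping: return (disk index, block offset on that disk)."""
    return logical_block % n_disks, logical_block // n_disks

# With 4 disks, consecutive logical blocks land on consecutive drives:
print([block_location(b, 4) for b in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```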

RAID 1 Mirroring –Two copies of each disk block on two separate drives Advantages –Simple to implement and fault-tolerant Disadvantage –Requires twice the disk capacity of normal file systems

RAID 2 Instead of duplicating the data blocks, we use an error-correcting code A very bad idea, because disk drives either work correctly or do not work at all –The only possible errors are omission errors –We only need an omission correction code –A single parity bit is enough to correct a single omission

RAID 2

RAID 3 Requires N+1 disk drives –N drives contain data (1/N of each data block on each drive) –Block b[k] is now partitioned into N fragments b[k,1], b[k,2], ..., b[k,N] –The parity drive contains the exclusive or of these N fragments: p[k] = b[k,1] ⊕ b[k,2] ⊕ ... ⊕ b[k,N]
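A minimal sketch of the parity computation and of how a lost fragment is rebuilt (byte-wise XOR; the fragment values are made up for the example):

```python
from functools import reduce

def parity(fragments: list[bytes]) -> bytes:
    """XOR the N equal-sized fragments of a block to obtain the parity fragment."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*fragments))

def reconstruct(surviving: list[bytes], parity_frag: bytes) -> bytes:
    """Rebuild one missing fragment: XOR of the parity and all surviving fragments."""
    return parity(surviving + [parity_frag])

fragments = [b"\x01\x02", b"\x0f\x00", b"\xf0\xff"]      # N = 3 fragments of one block
p = parity(fragments)
assert reconstruct([fragments[0], fragments[2]], p) == fragments[1]
```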

RAID 3 A stripe consists of a single block

RAID 4 Requires N+1 disk drives –N drives contain data (individual blocks) –The parity drive contains the exclusive or of the N blocks in a stripe: p[k] = b[k] ⊕ b[k+1] ⊕ ... ⊕ b[k+N-1]

RAID 4 A stripe now contains multiple blocks

RAID 5 The single parity drive of RAID-4 is involved in every write –This limits parallelism RAID-5 distributes the parity blocks among the N+1 drives
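One simple way to rotate parity is round-robin by stripe number; the mapping below is only an illustration (real implementations use several placement variants, such as left-symmetric layouts):

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk holding the parity block of a given stripe, rotated round-robin."""
    return (n_disks - 1 - stripe) % n_disks

# With 5 disks, the parity block moves to a different disk on each stripe:
print([parity_disk(s, 5) for s in range(5)])   # [4, 3, 2, 1, 0]
```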

RAID 5

The small write problem Specific to RAID 5 Happens when we want to update a single block –The block belongs to a stripe –How can we compute the new value of its parity block p[k]?

First solution Read the values of the N-1 other blocks in the stripe Recompute p[k] = b[k] ⊕ b[k+1] ⊕ ... ⊕ b[k+N-1] This solution requires –N-1 reads –2 writes (new block and parity block)

Second solution Assume we want to update block b[m] Read the old values of b[m] and of the parity block p[k] Compute new p[k] = new b[m] ⊕ old b[m] ⊕ old p[k] This solution requires –2 reads (old values of the block and of the parity block) –2 writes (new block and parity block)
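The two solutions compute the same parity; a small sketch with made-up block contents makes that concrete:

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def parity_by_reconstruction(other_blocks: list[bytes], new_block: bytes) -> bytes:
    """First solution: read the N-1 other blocks and recompute the parity."""
    p = new_block
    for blk in other_blocks:
        p = xor(p, blk)
    return p

def parity_by_read_modify_write(old_block: bytes, new_block: bytes, old_parity: bytes) -> bytes:
    """Second solution: new p[k] = new b[m] XOR old b[m] XOR old p[k]."""
    return xor(xor(new_block, old_block), old_parity)

stripe = [b"\x11", b"\x22", b"\x33"]                     # b[k], b[k+1], b[k+2]
old_p = parity_by_reconstruction(stripe[1:], stripe[0])
new_b = b"\x44"                                          # new value of b[k]
assert parity_by_read_modify_write(stripe[0], new_b, old_p) == \
       parity_by_reconstruction(stripe[1:], new_b)
```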

RAID 6 Each stripe has two redundant blocks: –P + Q redundancy Advantage –Much higher reliability Disadvantage: –Costlier updates

PERFORMANCE COMPARISON Focus on system throughput Measure it against system cost, expressed as the number of disk drives

Throughputs per dollar (relative to RAID 0, for a parity group of G disks):
–RAID 1: small read 1, small write ½, large read 1, large write ½
–RAID 3: small read 1/G, small write 1/G, large read (G-1)/G, large write (G-1)/G
–RAID 5: small read 1, small write max(1/G, 1/4), large read 1, large write (G-1)/G
–RAID 6: small read 1, small write max(1/G, 1/6), large read 1, large write (G-2)/G
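To make the comparison concrete, a small helper that evaluates these expressions for a chosen group size G (the formulas are the ones listed above; the choice of G = 10 is arbitrary):

```python
from fractions import Fraction

def throughput_per_dollar(g: int) -> dict[str, tuple]:
    """(small read, small write, large read, large write) relative to RAID 0,
    for a parity group of g disks."""
    return {
        "RAID 1": (1, Fraction(1, 2), 1, Fraction(1, 2)),
        "RAID 3": (Fraction(1, g), Fraction(1, g), Fraction(g - 1, g), Fraction(g - 1, g)),
        "RAID 5": (1, max(Fraction(1, g), Fraction(1, 4)), 1, Fraction(g - 1, g)),
        "RAID 6": (1, max(Fraction(1, g), Fraction(1, 6)), 1, Fraction(g - 2, g)),
    }

for level, values in throughput_per_dollar(10).items():
    print(level, [str(v) for v in values])
```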

Discussion The performance per dollar of RAID 3 is always less than or equal to that of a RAID 5 system For small writes, –RAID 3, 5 and 6 are equally cost-effective at small group sizes –RAID 5 and 6 are better at large group sizes

RELIABILITY Theoretical reliability is very high –Especially for RAID 6 In practice, –System crashes can cause parity inconsistencies –Uncorrectable bit errors can happen during repair times (on the order of one error in 10^14 bits read) –Correlated disk failures happen!

Impact of parity inconsistencies Happen when the system crashes during an update –The new data were written but the parity block was not updated Has little impact on RAID 3 (one bad block) Significant impact on RAID 5 Bigger impact on RAID 6 –Same as simultaneous failures of both the P and Q blocks

Discussion System crashes and unrecoverable bit errors have the biggest effect on MTTDL P + Q redundant disks protect against correlated disk failures and unrecoverable bit errors –Still vulnerable to system crashes –Should use NVRAM for write buffers

IMPLEMENTATION CONSIDERATIONS Must prevent users from reading corrupted data from a failed disk –Mark blocks located on the failed disk invalid –Mark reconstructed blocks valid To avoid regenerating all parity blocks after a crash –Must keep track of which parity blocks may be inconsistent, and store that state in stable storage
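A hedged sketch of that bookkeeping (the file-based "stable storage" below is a stand-in; a real controller would use NVRAM or an on-disk log, and the function names are invented for the example):

```python
import json
import os

STATE_FILE = "parity_state.json"   # stand-in for stable storage

def _load() -> set:
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return set(json.load(f))
    return set()

def _save(stripes: set) -> None:
    with open(STATE_FILE, "w") as f:
        json.dump(sorted(stripes), f)
        f.flush()
        os.fsync(f.fileno())       # state must be durable before the data write starts

def begin_update(stripe: int) -> None:
    """Mark a stripe's parity as possibly inconsistent before writing new data."""
    s = _load(); s.add(stripe); _save(s)

def end_update(stripe: int) -> None:
    """Clear the mark once both the data and parity blocks have been written."""
    s = _load(); s.discard(stripe); _save(s)

def stripes_to_rebuild() -> set:
    """After a crash, only these stripes need their parity regenerated."""
    return _load()
```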

Discussion Maintaining consistent/inconsistent state information for all parity blocks is a problem for software RAID systems –They rarely have NVRAM If updates exhibit locality, keep track in stable storage of the small number of parity blocks that could be inconsistent Otherwise use group commits

SMALL WRITES REVISITED (I) Asynchronous writes can help if future updates overwrite previous ones Caching recently read blocks can help if the old data needed to compute the new parity are in the cache Caching recently written parity can also help –Parity is computed over many logically consecutive blocks

SMALL WRITES REVISITED (II) Floating Parity –Makes parity updates cheaper by putting the new parity in a rotationally nearby unallocated block –Requires directories of the locations of nearby unallocated blocks –Should be implemented at the controller level

SMALL WRITES REVISITED (III) Parity Logging: –Defers the cost of the parity update by logging the XOR of the old and new data –The log is replayed later to update the parity –Reduces the update cost to two blocking writes (if we already have the old data block in RAM) –Works because nearly all storage systems have idle times
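A minimal sketch of the idea, assuming the old data block is already cached (the in-memory log and the names below are illustrative only):

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

parity_log: list[tuple[int, bytes]] = []   # (stripe number, XOR of old and new data)

def log_small_write(stripe: int, old_block: bytes, new_block: bytes) -> None:
    """Defer the parity update: append only the 'parity delta' to the log."""
    parity_log.append((stripe, xor(old_block, new_block)))

def replay_log(parity_blocks: dict) -> None:
    """During idle time, fold the logged deltas into the stored parity blocks."""
    while parity_log:
        stripe, delta = parity_log.pop(0)
        parity_blocks[stripe] = xor(parity_blocks[stripe], delta)
```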

Declustered Parity (I) Addresses the issue of the high read cost when recovering from a failure Looking at the example: –A failure of disk 2 generates additional read requests to disks 0, 1 and 3 every time a read request is made for a block that was stored on disk 2

Declustered Parity (II)

Declustered Parity (III) With declustered parity: –The same disk belongs to different groups Looking at the example: –Disk 2 is in groups (0, 1, 2, 3), (4, 5, 2, 3) and so on –Additional read requests caused by a failure of disk 2 are now spread among all remaining disks
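A toy layout in the same spirit (the group assignment below is invented purely to show how a failed disk's reconstruction reads spread over many survivors; it is not a balanced declustering scheme):

```python
def stripe_group(stripe: int, n_disks: int, group_size: int) -> list[int]:
    """Toy declustered layout: stripe s uses group_size consecutive disks
    starting at disk s mod n_disks, so the groups rotate over all disks."""
    return [(stripe + i) % n_disks for i in range(group_size)]

# If disk 2 fails, the extra reconstruction reads hit different disks for
# different stripes instead of always the same fixed partners:
n_disks, group_size = 7, 4
for s in range(n_disks):
    group = stripe_group(s, n_disks, group_size)
    if 2 in group:
        print(f"stripe {s}: read disks {[d for d in group if d != 2]}")
```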

Declustered Parity (IV) The extra workload caused by the failure of a disk is now shared by all remaining disks Sole disadvantage: –A failure of any two disks now results in data loss –In a standard set of RAID arrays, data are lost only if the two failed disks belong to the same array

Exploiting On-Line Spare Disks Distributed Sparing: –No dedicated spare disk –Each disk has 1/(N+1) of its capacity reserved Parity Sparing: –Also spreads the spare space, but uses it to store additional parity blocks –Can split groups into half groups More …

Distributed Sparing S0, S1 and S2 represent spare blocks

CASE STUDIES TickerTAIP AutoRAID –See presentation

TickerTAIP (I) Traditional RAID architectures have –A central RAID controller interfacing with the host and processing all I/O requests –Disk drives organized in strings –One disk controller per disk string (mostly SCSI)

TickerTAIP (II) The capabilities of the RAID controller are crucial to the performance of the array –It can become memory-bound –It presents a single point of failure –It can become a bottleneck Having a spare controller is an expensive proposition

TickerTAIP (III) Uses a cooperating set of array controller nodes Major benefits are: – Fault-tolerance – Scalability – Smooth incremental growth – Flexibility: can mix and match components

TickerTAIP (IV) Architecture: controller nodes attached to host interconnects

TickerTAIP (V) A TickerTAIP array consists of: –Worker nodes connected to one or more local disks through a bus –Originator nodes interfacing with host computer clients –A high-performance small-area network: mesh-based switching network (Datamesh), or PCI backplanes for small networks

TickerTAIP (VI) Worker and originator nodes can be combined or separated Parity calculations are done in a decentralized fashion: –The bottleneck is memory bandwidth, not CPU speed –Cheaper than providing faster paths to a dedicated parity engine

CONCLUSION RAID's original purpose was to take advantage of Winchester drives that were smaller and cheaper than conventional disk drives –Replace a single large drive with an array of smaller drives Nobody does that anymore! The main purpose of RAID today is to build fault-tolerant file systems that do not need backups