Disk Arrays COEN 180

Large Storage Systems
A collection of disks stores large amounts of data.
Performance advantage: each drive can satisfy only so many I/Os per second, so data spread across more drives is more accessible.
JBOD: Just a Bunch Of Disks.

Large Storage Systems
Principal difficulty: reliability. Data needs to be stored redundantly.
Mirroring, replication: simple; expensive (double, triple, … storage costs); good performance.
Erasure-correcting codes: complex; save storage; moderate performance.

Large Storage Systems: Mirrored Disks
Used by Tandem from the 1970s until 1997, when the company was bought by Compaq.
NonStop architecture: used redundancy (CPU, storage) for fail-over capacity.
Data is replicated on both drives.
Performance: writes are as fast as in the single-disk model. Reads are slightly faster, since we can serve the read from the drive with the best expected service time.

Disk Performance Modeling Basics
Service time: time to satisfy a request if the system is otherwise idle.
Response time: time to satisfy a request at a given system load.
Response time = service time + waiting time.
Utilization: fraction of time the system is busy.

Disk Performance Modeling Basics
M/M/1 queue: a single server. Assume Poisson arrivals and exponentially distributed service times.
Arrival rate $\lambda$, service time $S$, response time $R$.
Utilization: $U = \lambda S$ (Little's law).
Determine $R$ by $R = S + UR$, hence $R = S/(1-U) = S/(1-\lambda S)$.
Plot (not reproduced): $R$ for $S = 1$, hence $U = \lambda$.
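The response-time formula of this slide can be tried out numerically. The following is a minimal sketch, assuming the M/M/1 result $R = S/(1-\lambda S)$ with arrival rate $\lambda$ and service time $S$; the function name and the sample loads are illustrative and not part of the slides.

```python
def mm1_response_time(service_time: float, arrival_rate: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U), with U = lambda * S."""
    utilization = arrival_rate * service_time
    if not 0.0 <= utilization < 1.0:
        # At U >= 1 the queue grows without bound and R is undefined.
        raise ValueError("utilization must satisfy 0 <= U < 1")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    service_time = 0.010  # assumed example value: 10 ms per request
    for arrival_rate in (10, 50, 80, 95):  # requests per second
        r = mm1_response_time(service_time, arrival_rate)
        print(f"load {arrival_rate:3d} IO/s  U = {arrival_rate * service_time:.2f}  "
              f"R = {r * 1000:.1f} ms")
```

With a 10 ms service time the model gives a response time of 20 ms at 50% utilization and 50 ms at 80%, which shows how quickly waiting time grows as the disk approaches saturation.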

Disk Performance Modeling Basics
We need to determine the service time of a disk request.
Service time = seek time + latency + transfer time.
Industrial (but wrong) determination: seek time = time to travel one third of a disk. Why?

Disk Performance Modeling Basics
Assume that the head is positioned on a random track and that the target track is another random track.
Given $x \in [0,1]$, calculate $D(x)$, the expected distance of a random point in $[0,1]$ from $x$.

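Disk Performance Modeling Basics
Given $x \in [0,1]$, the expected distance of a random point in $[0,1]$ from $x$ is
$D(x) = \int_0^1 |x-y|\,dy = \int_0^x (x-y)\,dy + \int_x^1 (y-x)\,dy = x^2/2 + (1-x)^2/2 = x^2 - x + 1/2$.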

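Disk Performance Modeling Basics
Now calculate the average distance from a random point to a random point in $[0,1]$. With $D(x) = x^2 - x + 1/2$:
$\int_0^1 D(x)\,dx = 1/3 - 1/2 + 1/2 = 1/3$.
This is the origin of the one-third rule.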
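The one-third value can be checked numerically. This is a small sketch, assuming track positions are modeled as uniform points in $[0,1]$ as on the slides; the function names and the midpoint-rule grid are illustrative choices.

```python
def expected_distance_from(x: float, steps: int = 1000) -> float:
    """Approximate D(x), the mean distance from x to a uniform point in [0, 1]."""
    # Midpoint rule: sample the target position at the center of each sub-interval.
    return sum(abs(x - (j + 0.5) / steps) for j in range(steps)) / steps


def average_seek_distance(steps: int = 400) -> float:
    """Approximate the mean distance between two independent uniform points."""
    return sum(
        expected_distance_from((i + 0.5) / steps, steps) for i in range(steps)
    ) / steps


if __name__ == "__main__":
    print("D(0.0) ~", expected_distance_from(0.0))   # head at the edge
    print("D(0.5) ~", expected_distance_from(0.5))   # head in the middle
    print("average distance ~", average_seek_distance())
```

The head-in-the-middle case gives the shortest expected travel, and averaging over all starting positions approaches 1/3.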

Disk Performance Modeling Basics
Is average seek time = seek time for average distance?
No: seek time does not depend linearly on seek distance.
A seek consists of acceleration, cruising (if the seek distance is long), braking, and exact positioning.

Disk Performance Modeling Basics
Is average seek time = seek time for average distance?
Practical measurements suggest that seek time grows roughly as the square root of the seek distance.
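To see why the two quantities differ, the sketch below compares them under a pure square-root model. It assumes seek time is exactly proportional to the square root of the distance, with proportionality constant 1, and uniform random start and target positions in $[0,1]$; both are simplifications for illustration, not measured drive behavior.

```python
import math


def mean_over_grid(func, steps: int = 400) -> float:
    """Average func(distance) over a midpoint grid of (start, target) pairs."""
    total = 0.0
    for i in range(steps):
        for j in range(steps):
            distance = abs(i - j) / steps
            total += func(distance)
    return total / (steps * steps)


def seek_time(distance: float) -> float:
    """Assumed model: seek time proportional to sqrt(distance), constant 1."""
    return math.sqrt(distance)


if __name__ == "__main__":
    average_distance = mean_over_grid(lambda d: d)
    print("seek time for the average distance:", seek_time(average_distance))
    print("average seek time:                 ", mean_over_grid(seek_time))
```

Under this model the exact average seek time is $\int_0^1 2(1-d)\sqrt{d}\,dd = 8/15 \approx 0.53$, while the seek time for the average distance is $\sqrt{1/3} \approx 0.58$, so quoting the latter overstates the average.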

Disk Performance Modeling Basics: Rules of Thumb
Keep the utilization of disks between 50% and 80%.

Disk Arrays: Dealing with Reliability
RAID: Redundant Array of Inexpensive (Independent) Disks.
RAID Levels
RAID Level 0: JBOD (striping).
RAID Level 1: Mirroring.
RAID Level 2: Encodes symbols (bytes) with a Hamming code and stores each bit of a symbol on a different disk. Not used in practice.

Disk Arrays: Dealing with Reliability
RAID Levels
RAID Level 3: Encodes symbols (bytes) with the simple parity code. Breaks a file up into n stripes, calculates a parity stripe, and stores all n + 1 stripes on n + 1 disks.

Disk Arrays: Dealing with Reliability
RAID Levels
RAID Level 4: Maintains n data drives. Files are stored completely on one drive, or perhaps in stripes if files become very large. An additional drive stores the byte-wise parity of the data drives.

Disk Arrays: Level 4 RAID
The load on the parity drive and on the data drives is uneven.

Disk Arrays: Dealing with Reliability
RAID Level 5: No dedicated parity disk. Data is stored in blocks. Blocks in parallel positions on the disks form a reliability stripe. One block in each reliability stripe is the parity of the others. No performance bottleneck.

Disk Arrays: Dealing with Reliability
RAID Level 6: Like RAID Level 5, but every stripe has two parity blocks. Lower write performance. 2-failure resilience.
RAID Level 7: Proprietary name for a RAID Level 3 with lots of caching. (Marketing bogus.)

Disk Arrays: Disk Array Operations
Reads: directly from the data blocks in RAID Levels 3-6.
Writes:
Large writes: write to all blocks in a single reliability stripe. Calculate the parity from the data and write it.
Small writes: need to maintain parity.
Option 1: Write the data, then read all other blocks in the stripe and recalculate the parity.
Option 2: Read the old data, then overwrite it. Calculate the difference (XOR) between old and new data. Then read the old parity, XOR it with that difference, and overwrite the parity block with the result.
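The two small-write options can be compared on a toy stripe. This is a minimal sketch, assuming blocks are equally sized byte strings held in memory; the helper names are hypothetical and no real disk I/O is performed.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two blocks of equal size."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(blocks: list[bytes]) -> bytes:
    """Option 1 building block: parity recomputed from all data blocks."""
    parity = bytes(len(blocks[0]))  # all-zero block
    for block in blocks:
        parity = xor_blocks(parity, block)
    return parity


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Option 2: new parity = old parity XOR (old data XOR new data)."""
    difference = xor_blocks(old_data, new_data)
    return xor_blocks(old_parity, difference)


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]
    parity = parity_of(stripe)
    new_block = b"ZZZZ"
    updated = small_write_parity(stripe[1], new_block, parity)
    stripe[1] = new_block
    print("options agree:", updated == parity_of(stripe))
```

Option 2 touches only the data block and the parity block regardless of stripe width, which is why it is the usual choice for wide stripes.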

Disk Arrays: Disk Array Operations
Reconstruction (RAID Levels 4-5):
Systematically: reconstruct only lost data. Read all surviving blocks in the reliability stripe and calculate their parity. This is the lost data block. Write the data block in place of parity.
Out-of-order reconstruction for data that is being read.
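The reconstruction step is a single XOR over the survivors. The sketch below assumes one lost block per reliability stripe and in-memory byte strings; the function names are illustrative.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two blocks of equal size."""
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct_lost_block(surviving_blocks: list[bytes]) -> bytes:
    """XOR of all surviving blocks (data and parity) yields the lost block."""
    return reduce(xor_blocks, surviving_blocks)


if __name__ == "__main__":
    data = [b"disk0---", b"disk1---", b"disk2---"]
    parity = reduce(xor_blocks, data)
    # Pretend the drive holding data[1] failed.
    survivors = [data[0], data[2], parity]
    print(reconstruct_lost_block(survivors))
```

The same function also rebuilds a lost parity block, since parity is just one more member of the XOR sum.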

Disk Arrays: Performance Analysis
Assume that read and write service times are the same: seek + latency (+ transfer).
A small write involves a read-modify-write operation, which takes about twice as long as a read or write service time: seek + latency + transfer + two latencies + transfer.

Disk Arrays: Performance Analysis, Level 4 RAID
Offered read load $r$, offered write load $w$, $n$ disks, service time $S$.
Utilization at a data disk: $rS/(n-1) + 2wS/(n-1)$.
Utilization at the parity disk: $2wS$.
Equal utilization only if $r = 2(n-2)w$.

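Disk Arrays: Performance Analysis, Level 4 RAID
Offered load $\lambda$. Assume only small writes. Assume a read ratio of $\rho$, so that the read load is $\rho\lambda$ and the write load is $(1-\rho)\lambda$.
Utilization at a data disk, with $n$ disks in total: $\rho\lambda S/(n-1) + (1-\rho)\,2\lambda S/(n-1)$.
Utilization at the parity disk: $(1-\rho)\,2\lambda S$.
Plot (not reproduced): utilization of the parity disk and of a data disk against offered load (IO/sec). Parameters: 4+1 layout, 70% reads, service time 10 msec.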

Disk Arrays: Performance Analysis, RAID Level 5
Offered load $\lambda$, read ratio $\rho$, $n$ disks.
Read load per disk: $\rho\lambda S/n$.
Write load per disk: $(1-\rho)\lambda \cdot 4S/n$.
Every write leads to two read-modify-write operations.

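Disk Arrays: Level 4 RAID vs Level 5 RAID
Plot (not reproduced): utilization against offered load for the RAID Level 4 parity drive and data drive, for RAID Level 5, and for an array without parity disk (JBOD). Parameters: 4+1 layout, 70% reads, service time 10 msec.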
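The comparison can be recomputed from the utilization formulas. This sketch assumes the Level 4 expressions $rS/(n-1) + 2wS/(n-1)$ (data disk) and $2wS$ (parity disk), and the Level 5 per-disk expression $(rS + 4wS)/n$, with read load $r$, write load $w$, service time $S$ and $n$ disks in total; the function names and the sample load are illustrative.

```python
def raid4_utilization(load: float, read_ratio: float, service_time: float,
                      n_disks: int) -> tuple[float, float]:
    """Return (data disk utilization, parity disk utilization) for RAID Level 4."""
    reads = read_ratio * load
    writes = (1.0 - read_ratio) * load
    # Reads and read-modify-writes (2S each) spread over the n - 1 data disks.
    data_disk = (reads * service_time + 2.0 * writes * service_time) / (n_disks - 1)
    # Every small write also costs one read-modify-write on the parity disk.
    parity_disk = 2.0 * writes * service_time
    return data_disk, parity_disk


def raid5_utilization(load: float, read_ratio: float, service_time: float,
                      n_disks: int) -> float:
    """Per-disk utilization for RAID Level 5 (parity spread over all disks)."""
    reads = read_ratio * load
    writes = (1.0 - read_ratio) * load
    # Each write causes two read-modify-write operations of 2S each.
    return (reads * service_time + 4.0 * writes * service_time) / n_disks


if __name__ == "__main__":
    # Parameters from the slide: 4+1 layout, 70% reads, 10 msec service time.
    n, rho, s = 5, 0.7, 0.010
    for load in (50, 100, 150):
        data, parity = raid4_utilization(load, rho, s, n)
        print(f"{load:3d} IO/s  RAID 4 data {data:.3f}  parity {parity:.3f}  "
              f"RAID 5 {raid5_utilization(load, rho, s, n):.3f}")
```

With these parameters the Level 4 parity disk reaches full utilization at about 167 IO/sec, while a Level 5 disk does so only at about 263 IO/sec, which is the bottleneck effect the plot illustrates.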

Disk Arrays: Performance
Small writes are expensive.
Parity logging (Daniel Stodolsky, Garth Gibson, Mark Holland):
Write operation: read the old data, write the new data, and send the XOR of the two to a parity log file.
Whenever the parity log file becomes too big, process it by updating the parity information.
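The idea of deferring parity updates can be sketched on a single stripe. This is a heavily simplified illustration, assuming blocks are small integers, one stripe, and an in-memory list as the log; the class name, the `log_limit` parameter and the flush policy are hypothetical and do not reproduce the layout used in the parity logging work.

```python
from functools import reduce
from operator import xor


class ParityLoggedStripe:
    """One reliability stripe whose parity updates are deferred via a log."""

    def __init__(self, data_blocks: list[int], log_limit: int = 4) -> None:
        self.data = list(data_blocks)
        self.parity = reduce(xor, self.data, 0)
        self.log: list[int] = []      # pending XOR differences
        self.log_limit = log_limit    # assumed threshold for "too big"

    def small_write(self, index: int, new_block: int) -> None:
        """Read old data, write new data, append their XOR to the log."""
        difference = self.data[index] ^ new_block
        self.data[index] = new_block
        self.log.append(difference)
        if len(self.log) >= self.log_limit:
            self.apply_log()

    def apply_log(self) -> None:
        """Process the log: fold every logged difference into the parity."""
        for difference in self.log:
            self.parity ^= difference
        self.log.clear()


if __name__ == "__main__":
    stripe = ParityLoggedStripe([1, 2, 3], log_limit=2)
    stripe.small_write(0, 7)   # logged, parity is stale for now
    stripe.small_write(2, 9)   # log reaches the limit and is applied
    print(stripe.parity == (7 ^ 2 ^ 9))
```

Between flushes the on-disk parity is stale, so a real implementation has to consult the log during reconstruction; the sketch leaves that out.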

Disk Arrays: Reliability
Reliability is accurately described by the probability of failure at every moment in time.

Disk Arrays: Reliability
Reliability is often given by the Mean Time To Data Loss (MTTDL).
Warning: MTTDL numbers can be deceiving. In the plot (not reproduced), the red line is more reliable during the design life, but has the lower MTTDL.

Disk Arrays
Use a Markov model to model the system in various states.
States describe the system.
The model assumes constant transition rates.
Transitions correspond to component failure and component repair.

Disk Arrays: One-Component System
Diagram (not reproduced): the initial state leads to the failure state (absorbing) at failure rate $\lambda$.
MTTDL = MTTF = $1/\lambda$.

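Disk Arrays: Two-Component System without Repair
Diagram (not reproduced): the initial state (2 components working) leads at rate $2\lambda$ to the state with 1 component working, one failed, which leads at rate $\lambda$ to the failure state (absorbing).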

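Disk Arrays: Two-Component System with Repair
Diagram (not reproduced): the initial state (2 components working) leads at rate $2\lambda$ to the state with 1 component working, one failed. From there, a repair transition at rate $\mu$ leads back to the initial state, and a failure transition at rate $\lambda$ leads to the failure state (absorbing).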

Disk Arrays: How to Calculate MTTF
1. Start with the original Markov model.
2. Remove the failure state.
3. Replace the transition(s) to the failure state with failure transitions to the initial state. This models a meta-system where we replace a failed system immediately with a new one.
4. Calculate the steady-state solution of the Markov model, which typically has become ergodic.
5. Use this to calculate the average rate at which a failure transition is taken. The reciprocal of this rate is the MTTF.

Disk Arrays: One-Component System
The system is in the initial state all the time.
The failure transition is taken at rate $\lambda$.
Loss rate $L = \lambda$.
MTTDL = $1/L = 1/\lambda$.

Disk Arrays: Two-Component System without Repair
States: state 2 (initial state, 2 components working) and state 1 (1 component working, one failed).
Steady-state solution: let $x$ be the probability of being in state 2 and $y$ the probability of being in state 1. Then:
Inflow into state 2 = outflow from state 2: $2\lambda x = \lambda y$.
The probabilities sum to 1: $x + y = 1$.

Disk Arrays: Two-Component System without Repair
Steady-state solution: $2\lambda x = \lambda y$ and $x + y = 1$.
The solution is $x = 1/3$, $y = 2/3$.
The loss rate is $L = (2/3)\lambda$.
MTTF = $1/L = 1.5\,(1/\lambda)$ (1.5 times better than before).

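Disk Arrays: Two-Component System with Repair
States: state 2 (initial state, 2 components working) and state 1 (1 component working, one failed), with failure rate $\lambda$ per component and repair rate $\mu$.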
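The steady-state method can be applied to the two-component system with repair. The following is a sketch under assumptions: the per-component failure rate is written `lam`, the repair rate `mu` (a symbol chosen here), and the failure transition out of the one-failed state is redirected to the initial state. The balance equation for the initial state then reads $2\lambda x = (\lambda + \mu)\,y$ with $x + y = 1$.

```python
def two_component_mttf(lam: float, mu: float = 0.0) -> float:
    """MTTF of a two-component system via the steady-state (meta-system) method.

    x = probability of 2 components working, y = probability of 1 working.
    Balance for the initial state: 2*lam*x = (lam + mu)*y, and x + y = 1.
    """
    y = 2.0 * lam / (3.0 * lam + mu)
    x = 1.0 - y
    assert abs(2.0 * lam * x - (lam + mu) * y) < 1e-12 * max(1.0, lam + mu)
    loss_rate = lam * y          # rate at which the failure transition is taken
    return 1.0 / loss_rate


if __name__ == "__main__":
    lam = 1.0 / 100_000          # assumed example: component MTTF of 100,000 hours
    print("without repair:", two_component_mttf(lam), "hours")
    print("24 h repair:   ", two_component_mttf(lam, mu=1.0 / 24), "hours")
```

Setting `mu = 0` recovers the no-repair result of 1.5 times the single-component MTTF. In closed form the expression is $(3\lambda + \mu)/(2\lambda^2)$, so a repair rate much larger than the failure rate dominates the result.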

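Disk Arrays: RAID Level 4/5 Reliability
Diagram (not reproduced): the initial state (n disks) leads at rate $n\lambda$ to the state with n – 1 disks. From there, a repair transition at rate $\mu$ leads back to the initial state, and a failure transition at rate $(n-1)\lambda$ leads to the failure state (absorbing).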

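Disk Arrays: RAID Level 6 Reliability
Diagram (not reproduced): the initial state (n disks) leads at rate $n\lambda$ to the state with n – 1 disks, which leads at rate $(n-1)\lambda$ to the state with n – 2 disks, which leads at rate $(n-2)\lambda$ to the failure state (absorbing). Repair transitions lead from n – 1 disks back to n disks at rate $\mu$ and from n – 2 disks back to n – 1 disks at rate $2\mu$.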
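MTTDL values for chains of this shape can be computed with a short recurrence. Note that this sketch uses expected first-passage times rather than the steady-state method of the slides; both describe the mean time until the absorbing failure state is reached. The rates assumed here are failure rate `lam` per disk, repair rate `mu` with one failed disk and `2 * mu` with two failed disks; the function names are illustrative.

```python
def mttdl(failure_rates: list[float], repair_rates: list[float]) -> float:
    """Mean time from state 0 to absorption in a linear failure/repair chain.

    failure_rates[i] is the rate from state i (i failed disks) to state i + 1;
    repair_rates[i] is the rate from state i back to state i - 1 (0 for i = 0).
    """
    hop = 0.0     # expected time to first move from state i to state i + 1
    total = 0.0
    for fail, repair in zip(failure_rates, repair_rates):
        # h_i = (1 + repair_i * h_{i-1}) / fail_i, with h_0 = 1 / fail_0.
        hop = (1.0 + repair * hop) / fail
        total += hop
    return total


def raid5_mttdl(n: int, lam: float, mu: float) -> float:
    """RAID Level 4/5: data loss on the second concurrent disk failure."""
    return mttdl([n * lam, (n - 1) * lam], [0.0, mu])


def raid6_mttdl(n: int, lam: float, mu: float) -> float:
    """RAID Level 6: data loss on the third concurrent disk failure."""
    return mttdl([n * lam, (n - 1) * lam, (n - 2) * lam], [0.0, mu, 2.0 * mu])


if __name__ == "__main__":
    lam, mu = 1.0 / 100_000, 1.0 / 24   # assumed: 100,000 h disk MTTF, 24 h repair
    print("RAID 5, 5 disks:", raid5_mttdl(5, lam, mu), "hours")
    print("RAID 6, 6 disks:", raid6_mttdl(6, lam, mu), "hours")
```

For the RAID Level 4/5 chain the recurrence reduces to the closed form $((2n-1)\lambda + \mu)/(n(n-1)\lambda^2)$.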

Disk Arrays: Sparing
Create more resilience by adding a hot spare.
Failover to the hot spare reconstructs the contents of the lost disk on the spare disk.
Distributed sparing (Menon et al.): distribute the spare space throughout the disk array.