Thomas Schwarz, S.J. Qin Xin, Ethan Miller, Darrell Long, Andy Hospodor, Spencer Ng Summarized by Leonid Kibrik.

Outline
Archive storage system properties
Disk structure
Disk failure
Data redundancy
Disk scrubbing
Disk power cycling
Simulation results

Archive storage systems
Large-scale, disk-based storage systems are becoming increasingly attractive for archival storage.
They contain a very large number of disks and store petabytes of data.
Most of the disks are powered off between accesses to conserve power and extend disk lifetime.
This design is known as a Massive Array of Mainly Idle Disks (MAID).

Disk structure
Data on a disk is addressed in blocks.
Each block contains one or more sectors.
A sector usually contains 512 bytes of data.
Each sector also carries error-correcting code (ECC) bits.

Preamble: used to synchronize the R/W head
ECC: usually a Reed-Solomon code

Disk failure
Many block defects result from imperfections in the machining of the disk substrate, non-uniformity in the magnetic coating, or contaminants within the head-disk assembly.
Disk manufacturers try to detect manufacturing defects during a self-scan and map out the defective blocks (P-list).
During the lifetime of a disk, additional blocks can be mapped out (G-list).
The drive detects an error only when it reads the affected block.
When a block contains multiple errors, the ECC can correct the errors, flag the read as unsuccessful, or mis-correct the error (extremely rare).

Disk failure rates
Most block failures are unrelated to one another.
Disk drive manufacturers specify device failure rates as MTBF (Mean Time Between Failures).
In practice, the observed values depend heavily on operating conditions, which are frequently worse than the manufacturers' implicit assumptions.
Errors may occur even when the disk is not being accessed.
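
To make the MTBF figure more concrete, the sketch below converts an MTBF into the probability that a drive fails within a given time. It assumes a constant failure rate (an exponential lifetime model), which the slides do not state; the function names and the 100,000-hour value are illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def failure_probability(hours, mtbf_hours):
    """Probability that a drive fails within `hours` of operation.

    Assumes a constant failure rate of 1 / MTBF (exponential lifetimes),
    which is a modelling assumption, not something the slides specify.
    """
    return 1.0 - math.exp(-hours / mtbf_hours)


def annual_failure_probability(mtbf_hours):
    """Failure probability over one year of continuous operation."""
    return failure_probability(HOURS_PER_YEAR, mtbf_hours)


if __name__ == "__main__":
    # Illustrative MTBF value; real operating conditions may be worse.
    print(annual_failure_probability(100_000))
```

Under this model, a drive with an MTBF of 100,000 hours has roughly an 8% chance of failing during a year of continuous operation, which is why failures become routine once thousands of drives are involved.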

Data redundancy
In large systems disk failures become frequent, so the data must be stored with some form of redundancy.
Collect data into large reliability blocks, group m of these blocks into a redundancy group, and add k parity blocks to the group.
Parity blocks are calculated with an erasure-correcting code.
The data is recoverable if m of the n = m + k blocks that make up the redundancy group can be accessed.
If errors go undetected, the data is in jeopardy.
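
The redundancy-group idea can be sketched for the simplest case, k = 1, where the single parity block is the bytewise XOR of the m data blocks (the scheme RAID 5 uses). The slides do not say which erasure-correcting code the system uses, so this is only an illustration; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_group(data_blocks):
    """Return the n = m + 1 blocks of a redundancy group: data plus one parity."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(group):
    """Rebuild a group in which at most one block is missing (marked None)."""
    missing = [i for i, block in enumerate(group) if block is None]
    if len(missing) > 1:
        raise ValueError("more than k = 1 blocks lost; data is unrecoverable")
    repaired = list(group)
    if missing:
        survivors = [block for block in group if block is not None]
        # XOR of the m surviving blocks reproduces the missing one.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    group = make_group([b"abcd", b"efgh", b"ijkl"])
    damaged = [group[0], None, group[2], group[3]]
    print(recover(damaged) == group)
```

With k = 1 a second undetected failure in the same group makes the data unrecoverable, which is why early error detection matters.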

Disk scrubbing
Disk scrubbing means reading all the data in a certain region, called a scrubbing block (s-block).
If a sector fails, the disk's internal ECC flags the sector as unreadable, but only when the sector is read.
Scrubbing therefore periodically reads each s-block into the drive buffer.
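
A scrub pass can be sketched as a loop that reads every sector of an s-block and records the ones the drive reports as unreadable. Everything here is a toy model: `UnreadableSector`, `scrub`, and `make_fake_disk` are hypothetical names, and a real system would issue reads through the drive interface rather than a Python callback.

```python
class UnreadableSector(Exception):
    """Raised by the (hypothetical) read function when ECC cannot recover a sector."""


def scrub(read_sector, sector_ids):
    """Read every sector of an s-block and return the ids that failed."""
    bad = []
    for sector_id in sector_ids:
        try:
            read_sector(sector_id)  # the data is discarded; only readability matters
        except UnreadableSector:
            bad.append(sector_id)
    return bad


def make_fake_disk(bad_sectors):
    """Build a stand-in read function whose listed sectors always fail."""
    def read_sector(sector_id):
        if sector_id in bad_sectors:
            raise UnreadableSector(sector_id)
        return bytes(512)  # a 512-byte sector of zeros
    return read_sector


if __name__ == "__main__":
    print(scrub(make_fake_disk({3, 7}), range(10)))
```

The sectors reported by a scrub are the ones that would then be rebuilt from the redundancy group before another failure occurs.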

Different scrubbing strategies
Random scrubbing: scrub an s-block at random times, with a fixed mean time between scrubs.
Deterministic scrubbing: scrub an s-block at fixed time intervals.
Opportunistic scrubbing: piggyback as much as possible on other disk operations to avoid additional power-on cycles.
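
The three strategies can be compared by the scrub times they generate. The sketch below is one possible reading: it assumes exponentially distributed gaps for random scrubbing (the slides only fix the mean) and models opportunistic scrubbing as "scrub on an access unless the last scrub was too recent". All function names are hypothetical.

```python
import random


def deterministic_schedule(interval, horizon):
    """Scrub times at fixed intervals, up to and including the horizon."""
    return [interval * i for i in range(1, int(horizon // interval) + 1)]


def random_schedule(mean_interval, horizon, rng):
    """Scrub times with exponentially distributed gaps (an assumed distribution)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mean_interval)
        if t > horizon:
            return times
        times.append(t)


def opportunistic_schedule(access_times, min_interval):
    """Scrub on an access only if the previous scrub is at least min_interval old."""
    times, last = [], float("-inf")
    for t in sorted(access_times):
        if t - last >= min_interval:
            times.append(t)
            last = t
    return times


if __name__ == "__main__":
    print(deterministic_schedule(4, 12))
    print(random_schedule(4.0, 12.0, random.Random(0)))
    print(opportunistic_schedule([1, 2, 6, 7, 11], 4))
```

In this model the opportunistic schedule never powers a disk on by itself, so a disk that is never accessed is never scrubbed.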

Power cycling and reliability
Turning a disk on and off has a significant impact on its reliability.
This is especially true for commodity disks, which lack the techniques that more expensive laptop disks use to keep the R/W head from touching the surface during power-down.
Disk manufacturers are reluctant to publish actual failure rates, because they depend strongly on how the disks are operated.
An analysis of Seagate data estimates that, in terms of drive reliability, one power cycle is equivalent to running the drive for eight hours.
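
The eight-hour equivalence suggests a simple way to account for wear: count each power cycle as eight extra operating hours. The sketch below does only that; the function name and the example numbers are illustrative assumptions, not figures from the slides.

```python
HOURS_PER_POWER_CYCLE = 8  # the Seagate-based estimate quoted above


def effective_hours(powered_on_hours, power_cycles):
    """Operating hours plus the wear equivalent of the power cycles."""
    return powered_on_hours + HOURS_PER_POWER_CYCLE * power_cycles


if __name__ == "__main__":
    # Hypothetical disk: powered on 3 times in a year, 2 hours each time.
    print(effective_hours(powered_on_hours=6, power_cycles=3))
```

For the hypothetical disk in the example, the three power cycles contribute 24 of the 30 effective hours, so avoiding extra power-ups matters more than shortening the time spent running.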

Simulation results
1 PB archival data store.
Disks have an MTBF of 10^5 hours.
10,000 disk drives.
10 GB reliability blocks.
~1 TB/day traffic.
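
A quick back-of-the-envelope check shows the scale these parameters imply. The sketch assumes decimal units (1 PB = 10^15 bytes), a constant failure rate, and drives that run continuously; the last assumption does not hold for a mostly idle archive, so the failure count is only an order-of-magnitude figure.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years

NUM_DISKS = 10_000
MTBF_HOURS = 10**5
STORE_BYTES = 10**15             # 1 PB, decimal units assumed
RELIABILITY_BLOCK_BYTES = 10**10  # 10 GB, decimal units assumed


def expected_failures_per_year(num_disks, mtbf_hours):
    """Expected drive failures per year at a constant rate of 1 / MTBF per drive."""
    return num_disks * HOURS_PER_YEAR / mtbf_hours


def reliability_block_count(store_bytes, block_bytes):
    """Number of reliability blocks needed to hold the stored data."""
    return store_bytes // block_bytes


if __name__ == "__main__":
    print(expected_failures_per_year(NUM_DISKS, MTBF_HOURS))
    print(reliability_block_count(STORE_BYTES, RELIABILITY_BLOCK_BYTES))
```

Under these assumptions the system would see about 876 drive failures per year and hold 100,000 reliability blocks.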

Simulation results
Two redundancy schemes: two-way mirroring and RAID 5.
For the random and deterministic schemes, each disk is scrubbed on average 3 times per year.
For the opportunistic scheme, each disk is scrubbed no more than 3 times per year.

Two-way Mirroring

RAID 5 redundancy scheme

Result analysis
When no scrubbing is done, a great deal of data is lost.
Random scrubbing performs the worst of the three methods.
The opportunistic scheme provides high reliability when data is accessed relatively frequently, but the number of data losses increases when data is accessed infrequently.
For systems where data is accessed infrequently, disks must be powered on periodically for scrubbing, in addition to being scrubbed when they are accessed normally.

Conclusion
In a system that contains a large number of disks, disk failures are likely to occur.
For a redundancy method to be effective, errors need to be detected as soon as possible.
Disk scrubbing is an essential technique in a large storage system.
Opportunistic scrubbing is an attractive scheme that detects errors without unnecessarily power cycling disk drives.