Parity Declustering for Continuous Operation in Redundant Disk Arrays. Mark Holland, Garth A. Gibson.


Purpose of Parity Declustering Parity declustering is designed to balance cost against data reliability and performance during failure recovery. It improves on standard parity organizations by reducing the additional load placed on surviving disks during the reconstruction of a failed disk's contents, which yields higher user throughput during recovery and/or shorter recovery time.

Declustered Parity Layout RAID 5 is a special case of a declustered parity layout: in RAID 5, G = C, where C is the number of disks in the array and G is the number of stripe units in a parity stripe.

Definition of some terms A data unit is the minimum amount of contiguous user data allocated to one disk before any data is allocated to any other disk. A parity unit is a block of parity information that is the size of a data stripe unit. A parity stripe is the set of data units over which a parity unit is computed, plus the parity unit itself. In the example layout, each S is either a data unit or a parity unit; four S's together form one parity stripe.

Example declustered layout Di.j represents one of the four data units in parity stripe i, and Pi represents the parity unit for parity stripe i. The declustering ratio is defined as α = (G-1)/(C-1); it gives the fraction of each surviving disk that must be read during the reconstruction of a failed disk. For example, D1.0, D1.1, D1.2 and P1 together form one parity stripe, so G = 4; with C = 5 disks, α = 3/4 = 75%. In RAID 5, G = C, so α = 100%.

Data layout strategy How should data be laid out in a parity-declustered disk array? Our goals:
1. Single failure correcting: no two stripe units in the same parity stripe may reside on the same physical disk.
2. Distributed reconstruction: when any disk fails, its user workload should be evenly distributed across all other disks in the array.
3. Distributed parity: parity information should be evenly distributed across the array.
4. Efficient mapping: the function mapping a file system's logical block address to physical disk addresses must be efficiently computable.
5. Large-write optimization: a write of a full parity stripe should not require the usual four accesses (read old data, read old parity, write new data, write new parity), since parity can be computed directly from the new data.
6. Maximal parallelism: a read of contiguous user data should achieve maximal disk parallelism.
A small checker for the first three criteria is sketched below.
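The following is a minimal sketch, not code from the paper; the layout representation (a list of parity stripes, each a list of (disk, is_parity) pairs) is an assumption made for illustration. It checks criteria 1-3 on a candidate layout.

```python
from itertools import combinations
from collections import Counter

def check_layout(stripes, num_disks):
    # Criterion 1: single failure correcting -- no two units of one
    # parity stripe may reside on the same disk.
    for stripe in stripes:
        disks = [d for d, _ in stripe]
        assert len(disks) == len(set(disks)), "two units of one stripe share a disk"

    # Criterion 2: distributed reconstruction -- every pair of disks
    # should co-occur in the same number of parity stripes.
    pair_counts = {pair: 0 for pair in combinations(range(num_disks), 2)}
    for stripe in stripes:
        for pair in combinations(sorted(d for d, _ in stripe), 2):
            pair_counts[pair] += 1
    assert len(set(pair_counts.values())) == 1, "reconstruction load not balanced"

    # Criterion 3: distributed parity -- each disk should hold the same
    # number of parity units.
    parity_per_disk = Counter(d for stripe in stripes for d, is_parity in stripe if is_parity)
    assert len({parity_per_disk[d] for d in range(num_disks)}) == 1, "parity not evenly distributed"
    return True
```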

Layout strategy The distributed reconstruction criterion requires that the same number of units be read from each surviving disk during the reconstruction of a failed disk. This is achieved if the number of times a pair of disks contains stripe units from the same parity stripe is constant across all pairs of disks. Such a layout can be derived from a balanced incomplete block design. A block design is an arrangement of v distinct objects into b tuples, each containing k elements, such that each object appears in exactly r tuples and each pair of objects appears in exactly λ tuples.

Complete block design This is simpler than a balanced incomplete block design. A block design is called complete when it includes all combinations of exactly k distinct elements selected from the set of v objects. The number of such combinations is C(v, k) = v! / (k!(v-k)!).

Example complete block design In this example, we arrange v = 5 distinct objects (numbers 0-4) into b = 5 tuples of k = 4 elements each, such that each object appears in exactly r = 4 tuples and each pair of objects appears in exactly λ = 3 tuples. For example, number 0 appears in 4 tuples, the pair (0,1) appears in tuples 0, 1 and 2, and the pair (1,4) appears in tuples 1, 2 and 4. The design is complete because it includes all C(5,4) = 5 combinations of exactly 4 distinct elements selected from the set of 5 objects.
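As a sketch (Python assumed for illustration, not code from the paper), a complete block design can be generated with itertools.combinations and its parameters b, r and λ verified directly:

```python
from itertools import combinations
from collections import Counter
from math import comb

def complete_block_design(v, k):
    """All k-element subsets of {0, ..., v-1}."""
    return list(combinations(range(v), k))

design = complete_block_design(5, 4)
print(design)                       # the 5 tuples of the example
assert len(design) == comb(5, 4)    # b = C(v, k) = 5

# Each object appears in r = b*k/v = 4 tuples.
occurrences = Counter(x for t in design for x in t)
assert all(c == 4 for c in occurrences.values())

# Each pair of objects appears in lambda = r*(k-1)/(v-1) = 3 tuples.
pair_occurrences = Counter(p for t in design for p in combinations(t, 2))
assert all(c == 3 for c in pair_occurrences.values())
```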

Layout with complete block design
Tuple 0: 0,1,2,3
Tuple 1: 0,1,2,4
Tuple 2: 0,1,3,4
Tuple 3: 0,2,3,4
Tuple 4: 1,2,3,4
If we associate disks with objects (numbers) and parity stripes with tuples, we get the layout shown. Although it is complete, it violates design goal 3: parity is not distributed evenly, so the parity on disk 4 becomes the bottleneck for write operations.

We duplicate the previous layout G times, assigning parity to a different element of each tuple in each duplication; the result is the full block design table shown above. A sketch of this construction follows.
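A minimal sketch of the construction (the representation is assumed, as in the earlier checker sketch): duplicate the complete block design G times and, in the j-th copy, mark the j-th element of every tuple as the parity unit, so parity is spread evenly over the disks.

```python
from itertools import combinations

def full_block_design_layout(num_disks, stripe_size):
    """Duplicate the complete block design G (= stripe_size) times,
    rotating which element of each tuple holds parity."""
    base = list(combinations(range(num_disks), stripe_size))  # complete block design
    stripes = []
    for parity_pos in range(stripe_size):       # one duplication per parity position
        for tup in base:
            stripes.append([(disk, i == parity_pos) for i, disk in enumerate(tup)])
    return stripes

stripes = full_block_design_layout(5, 4)
print(len(stripes))          # 5 tuples x 4 duplications = 20 parity stripes
# check_layout(stripes, num_disks=5) from the earlier sketch confirms criteria 1-3.
```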

Problem with full block design The block design table may be very large, so an efficient mapping (our fourth criterion) is not guaranteed. Our fifth and sixth criteria depend on the data mapping function used by higher levels of software. The large-write optimization is guaranteed, but parallel reads cannot always achieve maximal parallelism: not all sets of five adjacent data units in the mapping (D0.0, D0.1, D0.2, D1.0, D1.1, D1.2, D2.0, etc.) are allocated on five different disks. For example, reading five adjacent data units starting at data unit 0 causes disks 0 and 1 to be used twice, and disks 3 and 4 not at all.

Problem with full block design In addition, when the number of disks in the array (C) is large relative to the number of stripe units in a parity stripe (G), the full block design cannot be implemented. For example, a 41-disk array with 20% parity overhead (G = 5) allocated by a complete block design would have about 3,750,000 tuples (C(41,5) ≈ 750,000 tuples, duplicated G = 5 times). This cannot be implemented, because even large disks rarely have more than a few million sectors.
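The arithmetic behind that figure, as a quick check:

```python
from math import comb

C, G = 41, 5                       # disks, stripe units per parity stripe
tuples_complete = comb(C, G)       # complete block design: 749,398 tuples
tuples_full = tuples_complete * G  # duplicated G times for parity rotation
print(tuples_complete, tuples_full)  # 749398, 3746990 (about 3.75 million)
```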

Balanced incomplete block design Our goal is to find a small block design on C objects with a tuple size of G. Hall presents a list containing a large number of known block designs and states that, within the bounds of this list, a solution is given in every case where one is known to exist. When a balanced incomplete block design with the required parameters is not known, we resort to choosing the closest feasible design point, that is, the design that yields a value of α closest to the one desired.

Balanced incomplete block design We choose the closest feasible design point from the subset of Hall's list of designs; a small sketch of that selection follows.
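A minimal sketch of the selection. The table entries below are illustrative placeholder (v, k) pairs, not a transcription of Hall's list; v plays the role of C (disks) and k the role of G (stripe units per parity stripe).

```python
def closest_design(known_designs, alpha_wanted):
    """Pick the known design whose declustering ratio alpha = (k-1)/(v-1)
    is closest to the desired ratio."""
    def alpha(design):
        v, k = design
        return (k - 1) / (v - 1)
    return min(known_designs, key=lambda d: abs(alpha(d) - alpha_wanted))

designs = [(7, 3), (9, 3), (13, 4), (21, 5)]   # placeholder (v, k) pairs
print(closest_design(designs, alpha_wanted=0.25))
```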

Average response time These two figures show that, except for writes with α = 0.1, fault-free performance is essentially independent of the degree of parity declustering. Declustering may even lead to slightly better average response time in degraded mode than in fault-free mode, because a user write may then induce only one write access.

Reconstruction performance Parity declustering yields higher user performance during recovery than RAID 5. The simplest reconstruction algorithm makes a single sweep through the contents of the failed disk: for each stripe unit on the replacement disk, the reconstruction process reads all other stripe units in the corresponding parity stripe and computes their exclusive-or; the resulting unit is then written to the replacement disk. The time needed to entirely repair a failed disk is the time needed to replace it in the array plus the time needed to reconstruct its entire contents and store them on the replacement. Continuous-operation systems require data availability during reconstruction. A sketch of the single-sweep reconstruction follows.
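A minimal sketch of single-sweep reconstruction; the disk-access helpers (read_unit, write_unit) and the stripe map representation are assumptions made for illustration, not the paper's interfaces.

```python
def reconstruct_failed_disk(failed_disk, replacement_disk, stripes, read_unit, write_unit):
    """For every stripe unit that lived on the failed disk, XOR together all
    surviving units of its parity stripe and write the result to the replacement."""
    for stripe in stripes:                       # stripe: list of (disk, offset) pairs
        victims = [(d, off) for d, off in stripe if d == failed_disk]
        if not victims:
            continue                             # this parity stripe is unaffected
        (_, lost_offset), = victims              # at most one unit per disk (criterion 1)
        rebuilt = None
        for disk, offset in stripe:
            if disk == failed_disk:
                continue
            unit = read_unit(disk, offset)       # surviving data or parity unit
            rebuilt = unit if rebuilt is None else bytes(a ^ b for a, b in zip(rebuilt, unit))
        write_unit(replacement_disk, lost_offset, rebuilt)
```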

Four reconstruction algorithms
Minimal-update algorithm: no extra work is sent to the replacement disk; whenever possible, user writes are folded into the parity unit, and neither reconstruction optimization is enabled.
User-writes algorithm: all user writes explicitly targeted at the replacement disk are sent directly to the replacement.
Redirection of reads: user accesses to data that has already been reconstructed are serviced by (redirected to) the replacement disk, rather than invoking on-the-fly reconstruction as they would if the data were not yet available.
Piggybacking of writes: user reads that cause on-the-fly reconstruction also cause the reconstructed data to be written to the replacement disk; this is targeted at speeding up reconstruction.
(Redirection of reads and piggybacking of writes were proposed by Muntz and Lui.) A dispatch sketch for a user read under these options follows.
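A minimal sketch, assuming hypothetical helpers (already_reconstructed, reconstruct_on_the_fly, read_unit, write_unit), of how a user read that targets the failed disk's data might be serviced under the redirection and piggybacking options:

```python
def user_read(offset, redirect_reads, piggyback_writes,
              already_reconstructed, reconstruct_on_the_fly,
              read_unit, write_unit, replacement_disk):
    """Service a user read that targets data of the failed disk."""
    if redirect_reads and already_reconstructed(offset):
        # Redirection of reads: the data is already on the replacement disk.
        return read_unit(replacement_disk, offset)
    # Otherwise rebuild the unit on the fly from the surviving disks.
    data = reconstruct_on_the_fly(offset)
    if piggyback_writes:
        # Piggybacking of writes: keep the freshly rebuilt unit so the
        # reconstruction sweep does not have to rebuild it again.
        write_unit(replacement_disk, offset, data)
    return data
```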

Comparison of the four algorithms The test results showed that Muntz and Lui's redirection of reads and redirection plus piggybacking do not consistently decrease reconstruction time relative to the simpler algorithms. The reason is that loading the replacement disk with random work penalizes the reconstruction writes to that disk more than off-loading benefits the surviving disks, unless the surviving disks are highly utilized. Because reconstruction writes are sequential and do not require long seeks, even a small amount of random load imposed on the replacement disk can greatly increase its average access times.

Conclusion We demonstrated that parity declustering, a strategy for allocating parity in a single-failure-correcting redundant disk array that trades increased parity overhead for reduced user-performance degradation during on-line failure recovery, can be effectively implemented in array-controlling software. Using a block design to map parity stripes onto a disk array ensures that both the parity update load and the on-line reconstruction load are balanced over all disks in the array.

Questions 1. What is parity declustering? 2. What are the data layout goals? 3. What is the disadvantage of a complete block design?