Presentation on theme: "A. Ian Vogelesang Tools Competency Center (TCC) Hitachi Data Systems"— Presentation transcript:
1. Hitachi Data Systems WebTech Series: RAID Concepts
A. Ian Vogelesang, Tools Competency Center (TCC), Hitachi Data Systems
2. Hitachi Data Systems WebTech Educational Seminar Series: RAID Concepts
Who should attend:
  Systems and Storage Administrators
  Storage Specialists & Consultants
  IT Team Lead
  System and Network Architects
  IT Staff
  Operations and IT Managers
  Others who are looking for storage management techniques
3. How RAID type impacts cost
The factors we will examine:
  Disk drive capacity vs. disk drive IOPS capability
  The impact of RAID level on disk drive activity
Topics to cover along the way:
  RAID concepts (RAID-1 vs. RAID-5 vs. RAID-6)
  The 30-second “elevator pitch” on data flow through the subsystem
The conclusion will be that the I/O access pattern, rather than storage capacity in GB, is very often the determining factor.
4. Growth in recording density drives $/GB
[Chart: “Areal Density Progress”: areal density in megabits/in² (log scale) by production year, from the IBM RAMAC (first hard disk drive) through the 1st MR head and 1st GMR head to perpendicular recording, with growth-rate labels of 25% CGR, 60% CGR, 100% CGR, and 40%/yr.]
5. Areal density growth will continue
[Chart: areal density (Gb/in²) over time (2006, 2011, 2014) for longitudinal, perpendicular, bit patterned media, and thermally-assisted writing; range labels 1,500-4,000 and 2,000-15,000 Gb/in².]
10,000 Gb/in² = 10 Tb/in², corresponding to a 50 TB 3.5-inch drive, a 12 TB 2.5-inch drive, or a 1 TB 1-inch drive.
Roughly a 60 million-fold (> 50 million) increase in areal density over 50 years.
6. Here’s the problem
Drive capacities keep doubling every 1.5 years or so.
If you take the data that used to be on two disk drives and put it onto one drive that’s twice as big, you also combine the I/O activity that was on the original two drives onto the one double-size drive.
The problem is that as drive capacity keeps increasing, the number of I/Os per second (IOPS) that a drive can handle has not been increasing.
An I/O operation consists of a seek, ½ turn of latency, and data transfer.
Data transfer for a 4K block is now down to around 1% of a rotation.
Positioning the head takes over 1 rotation (seek + ½ turn latency).
IOPS capability is ALL about mechanical positioning.
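The service-time breakdown above (seek + ½ turn latency + transfer) can be turned into a rough IOPS estimate. The sketch below is only an illustration: the function name and the 3.5 ms average seek time are assumptions, not figures from the presentation. The "about 1% of a rotation" transfer time and the 50% busy target do come from the slides.

```python
def random_iops(rpm: float, avg_seek_ms: float,
                transfer_rotations: float = 0.01, busy: float = 0.5) -> float:
    """Estimate random 4K IOPS for one drive at a target utilization."""
    rotation_ms = 60_000.0 / rpm                    # time for one full rotation
    latency_ms = rotation_ms / 2.0                  # on average, half a turn
    transfer_ms = transfer_rotations * rotation_ms  # ~1% of a rotation for 4K
    service_ms = avg_seek_ms + latency_ms + transfer_ms
    return busy * 1000.0 / service_ms


# Hypothetical 15K RPM drive with an assumed 3.5 ms average seek.
print(round(random_iops(15_000, 3.5), 1))  # prints 90.3
```

Note that drive capacity does not appear anywhere in the calculation, which is the point of the slide: a bigger drive with the same mechanics handles the same number of I/Os per second.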
7. IOPS capability at 50% busy by drive type
[Bar chart: 4K random read and write IOPS at 50% busy for drive types 10K73, 10K146, 10K300, 15K73, 15K146, and SATA 7K400, on a scale up to 100 IOPS. * Includes read verify after write.]
Note that IOPS capability is the same for different drive capacities with the same RPM.
These are green zone upper limits per drive for back-end I/O, including RAID-penalty I/Os.
8. Access density capability
Combining the data that used to be on two drives onto one double-size drive also combines (doubles) the I/O activity on the bigger drive. This illustrates that, for a given workload, there is a certain amount of I/O activity per GB of data.
This activity per GB is called the “access density” of the workload, and is measured in IOPS per GB.
Over the last few decades, as disk drive storage capacity has become much cheaper, it became economic to store graphics, then audio, and now video.
The introduction of these new data types has reduced typical access densities by about a factor of 10 over the last 20 years.
However, access density is going down more slowly than disk drive capacity is going up.
Typical access densities are reported in the 0.6 to 1.0 IOPS per GB range.
9. Random read IOPS capability by drive type
This chart shows what access density each drive type can handle if you fill it up with data. The marker for each drive shows the green zone upper limit at 50% busy.
The left-to-right position of the marker shows the maximum access density that the drive can comfortably handle.
[Chart: drive types 7K400, 10K300, 15K300, 10K146, 15K146, 10K73, and 15K73, grouped by 7200, 10K, and 15K RPM.]
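As a rough illustration of what the chart expresses, the access density a full drive can handle is its IOPS capability divided by its capacity. The function names and the 60 IOPS figure below are assumptions chosen for the example, not values read from the chart.

```python
def max_access_density(drive_iops: float, capacity_gb: float) -> float:
    """IOPS per GB that a completely full drive can sustain."""
    return drive_iops / capacity_gb


def fillable_fraction(workload_density: float, drive_iops: float,
                      capacity_gb: float) -> float:
    """Fraction of the drive that can hold data before it gets too busy."""
    return min(1.0, max_access_density(drive_iops, capacity_gb) / workload_density)


# Assumed: two drives with the same RPM, hence the same (hypothetical) 60 IOPS limit.
print(round(max_access_density(60, 73), 2))      # 0.82 IOPS/GB
print(round(max_access_density(60, 300), 2))     # 0.2 IOPS/GB
print(round(fillable_fraction(0.6, 60, 300), 2)) # 0.33 of the 300 GB drive
```

Under these assumed numbers, a workload at 0.6 IOPS per GB could fill the 73 GB drive completely but only about a third of the 300 GB drive.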
10. RAID makes the access density problem worse
The basic idea behind RAID is to make sure that you don’t lose any data when a single drive fails.
This means that whenever a host writes data to the subsystem, at least two disks need to be updated.
The amount of extra disk drive I/O activity needed to handle write activity is the key factor in determining the lowest cost solution as a combination of disk drive RPM, disk drive capacity, and RAID type.
That’s why we will look at how different RAID levels work.
It is very rare that the access density is so low that you can completely fill up the cheapest drive.
Only for things like a home PVR will a 750 GB SATA drive make the smallest dent in your wallet while getting the job done.
11. 30-second “elevator pitch” on subsystem data flow
Random read hits are stripped off by cache and do not reach the back end.
Random read misses go through cache unaltered and go straight to the appropriate back-end disk drive.
  This is the only type of I/O operation where the host always “sees” the performance of the back-end disk drive.
Random writes
  Host sees random writes complete at electronic speed.
  Host only sees delay if too many pending writes build up.
  Each host random write is transformed going through cache into a multiple I/O pattern that depends on RAID type.
Sequential I/O
  Host sequential I/O is at electronic speed.
  Cache acts like a “holding tank”.
  Back end puts [removes] “back-end buckets” of data into [out of] the tank to keep the tank at an appropriate level.
12. What is RAID?
The term comes from a 1988 paper by a group of researchers at UC Berkeley: “Redundant Array of Inexpensive Disks”.
The original idea was to use cheap (i.e. PC) disk drives arranged in a RAID to give you “mainframe” reliability.
Now most call it Redundant Array of Independent Disks.
A RAID is an arrangement of data on disk drives in such a way that if a disk drive fails, you can still get the data back from the remaining disks.
  RAID-1 is mirroring: just keep two copies.
  RAID-5 uses parity: recovers from single drive failures.
  RAID-6 uses dual parity: recovers from double drive failures.
13. RAID-1 random reads / writes
Also called “mirroring”: two copies of the data, which requires 2x the number of disk drives.
For writes, a copy must be written to both disk drives (figure: new data XYZ written to Copy #1 and Copy #2).
  Two parity group disk drive writes for every host write.
  Don’t care what the previous data was; just overwrite with the new data.
For reads, the data can be read from either disk drive (figure: data ABC read from Copy #1 or Copy #2).
  Read activity distributed over both copies reduces disk drive busy (due to reads) to ½ of what it would be to read from a single (non-RAID) disk drive.
14. RAID-1 sequential read
2 sets of parallel I/O operations, each set reading 4 data chunks (2 MB).
Parity group data MB/s = 4 x drive MB/s.
[Figure: 2+2 parity group shown; chunks 1 to 8 are read from four drives, which hold chunks 1 to 8 and their mirror copies 1’ to 8’.]
15. RAID-1 sequential write
4 sets of parallel I/O operations, each writing 2 data chunks (1 MB) and 2 parity (mirror copy) chunks.
Parity group data MB/s = 2 x drive MB/s.
[Figure: 2+2 parity group shown; chunks 1 to 8 are written to four drives as chunks 1 to 8 and their mirror copies 1’ to 8’.]
16. RAID-1 comments
Since RAID-1 requires doubling the number of disk drives to store the data, people tend to think of RAID-1 as the most expensive type of RAID.
However, due to the intensity of host access, in RAID subsystems one often cannot completely “fill up” the disk drive with data, because the disk drive would become too busy.
RAID-1 offers the lowest “RAID penalty”: only two disk drive I/Os per random write, compared to four for RAID-5 and six for RAID-6.
For this reason, when the workload is sufficiently active and has a lot of random writes, RAID-1 will be the cheapest RAID type because it has the fewest disk drive I/O operations per random write.
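The write penalties quoted above (two, four, and six back-end I/Os per host random write) translate directly into back-end load. The following is a minimal sketch; the function name and the 700/300 read-miss/write mix are assumptions made for illustration.

```python
# Back-end disk I/Os generated per host random write, as quoted on the slide.
WRITE_PENALTY = {"RAID-1": 2, "RAID-5": 4, "RAID-6": 6}


def back_end_iops(read_miss_iops: float, write_iops: float, raid_type: str) -> float:
    """Back-end disk IOPS: each read miss costs 1 I/O, each write costs the penalty."""
    return read_miss_iops + WRITE_PENALTY[raid_type] * write_iops


# Hypothetical workload: 700 random read misses/s and 300 random writes/s.
for raid in ("RAID-1", "RAID-5", "RAID-6"):
    print(raid, back_end_iops(700, 300, raid))
```

With this assumed mix, the same host workload produces 1300, 1900, and 2500 back-end IOPS for RAID-1, RAID-5, and RAID-6 respectively.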
17. RAID-1’s “RAID penalty”
Penalty in space
  Double the number of disk drives required.
Penalty in disk drive utilization (disk drive % busy)
  Twice the number of I/O operations required for all writes.
  No penalty for read operations; read operations are distributed over twice the number of drives.
18. RAID-5 parity concept
Example (three data drives plus parity):
  Data          10011
  Data          11111
  Data          00000
  (odd) parity  01100
0 XOR 1 XOR 0 = 1: there is an odd number of 1s in this bit position, so the parity bit is 1.
1 XOR 1 XOR 0 = 0: with an even number of 1s in this bit position, the parity bit is set to 0.
Each parity bit indicates whether or not there is an odd number of “1” bits in that bit position across the whole parity group (“odd parity”).
If you add more data drives, you don’t add any more parity.
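The parity rule on this slide is a bitwise XOR across the data chunks. A minimal sketch using the slide's own 5-bit values follows; the function name is an assumption, and real subsystems operate on whole chunks rather than 5-bit integers.

```python
from functools import reduce
from operator import xor


def parity(chunks):
    """XOR all data chunks together, bit position by bit position."""
    return reduce(xor, chunks)


data = [0b10011, 0b11111, 0b00000]   # the three data values from the slide
print(format(parity(data), "05b"))   # prints 01100

# Adding another data drive still produces a single parity value.
print(format(parity(data + [0b00001]), "05b"))  # prints 01101
```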
19. RAID-5 – if drive containing parity fails
[Figure: Data 10011, Data 11111, Data 00000, Parity 01100; the parity drive has failed.]
You still have the data.
Better reconstruct the parity on a spare disk drive right away, just in case a second drive fails.
20. RAID-5 – if drive containing data fails
[Figure: Data 10011, Data (failed drive), Data 00000, Parity 01100; the reconstructed data is 11111.]
A “1” parity bit says there originally was an odd number of “1” data bits in this position across the data drives.
Since there is now an even number of “1” bits in that position on the remaining data disks, we know that the missing data bit is a “1”.
If a drive that had data on it fails, you can reconstruct the missing data.
Read the corresponding “chunk” from all the remaining data drives, and see how many “1” bits there are in each position.
By comparing how many “1” bits there are in each bit position on the remaining disk drives with what the parity tells you there originally was, you can reconstruct the data.
Better reconstruct the data on a spare disk drive right away, just in case a second drive fails.
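Counting “1” bits per position and comparing with the parity is equivalent to XORing the surviving chunks together. The sketch below assumes, as the figure suggests, that the second data drive (11111) is the one that failed; the function name is illustrative.

```python
from functools import reduce
from operator import xor


def reconstruct(surviving_chunks):
    """XOR of all surviving chunks (data and parity) yields the lost chunk."""
    return reduce(xor, surviving_chunks)


surviving = [0b10011, 0b00000, 0b01100]  # data #1, data #3, parity
print(format(reconstruct(surviving), "05b"))  # prints 11111
```

The same function also rebuilds a lost parity chunk, since parity is simply the XOR of the data chunks.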
21. RAID-5 random read hit
Read hits operate at electronic speed: just transfer the data from cache.
[Figure: host reads data #3; a copy of data #3 (00000) is already in cache. Drives: Data #1 10011, Data #2 11111, Data #3 00000, Parity 01100.]
22. RAID-5 random read miss
Read misses are the ONLY operation that “sees” the speed of the disk drive during normal (not overloaded) operation.
I.e., read misses are the only type of host I/O operation that does not complete at electronic speed with just an access to cache.
[Figure: host reads data #1; a copy of data #1 (10011) is staged into cache alongside the copy of data #3 (00000). Drives: Data #1 10011, Data #2 11111, Data #3 00000, Parity 01100.]
23. RAID-5 random write
Read old data, read old parity.
Remove old data from old parity, giving “partial parity” (parity for the rest of the row). The partial parity corresponds to the remaining part of the stripe without the old data.
Add new data into partial parity to generate “new parity”.
Write new data and new parity to disk.
[Figure: new data #2 from host is 01010. Old parity 01100 - old data 11111 = partial parity 10011; partial parity 10011 + new data 01010 = new parity 11001. Drives before the write: Data #1 10011, Data #2 11111, Data #3 00000, Parity 01100.]
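Both the “remove” (-) and “add” (+) steps in the figure are XOR operations, which is why only the old data and old parity need to be read. A minimal sketch with the slide's values (function name assumed for illustration):

```python
def update_parity(old_data: int, new_data: int, old_parity: int):
    """Read-modify-write parity update for one RAID-5 random write."""
    partial = old_parity ^ old_data   # parity of the rest of the row
    new_parity = partial ^ new_data   # fold the new data back in
    return partial, new_parity


partial, new_parity = update_parity(old_data=0b11111, new_data=0b01010,
                                    old_parity=0b01100)
print(format(partial, "05b"), format(new_parity, "05b"))  # prints 10011 11001

# Cross-check against recomputing parity from the full row (data #1, new #2, data #3).
assert new_parity == 0b10011 ^ 0b01010 ^ 0b00000
```

This accounts for the four back-end I/Os per host random write: two reads (old data, old parity) and two writes (new data, new parity).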
24. RAID-5 sequential read
The subsystem “detects” that the host is reading sequentially after a few sequential I/Os. (The first few are treated as random reads.)
The subsystem performs “sequential pre-fetch” to load stripes of data from the parity group into cache in advance of when the host will request the data.
The subsystem can usually easily keep up with the host, as transfers from the parity group are performed in parallel.
[Figure: stripes of data staged from the parity group into cache.]
25. RAID-5 sequential read example
In parallel, read a chunk from each drive in the parity group.
3 sets of parallel I/O operations to read 12 chunks (6 MB).
Parity group MB/s = 4 x drive MB/s.
[Figure: four-drive parity group holding chunks 1 to 12, with one parity chunk per row of three data chunks (Parity 1, 2, 3; Parity 4, 5, 6; Parity 7, 8, 9; Parity 10, 11, 12).]
26. RAID-5 sequential write
First compute the parity chunk for a row, then write the row to disk.
4 sets of parallel I/O operations to write 12 data chunks (6 MB) with 4 parity chunks.
Parity group data MB/s = 3 x drive MB/s.
[Figure: chunks 1 to 12 written to a four-drive parity group, each row of three data chunks together with its parity chunk (Parity 1, 2, 3; Parity 4, 5, 6; Parity 7, 8, 9; Parity 10, 11, 12).]
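The sequential throughput multipliers on the RAID-1 and RAID-5 sequential slides follow one pattern: reads draw data from every drive in the parity group, while writes deliver host data at the rate of the data drives only, because the remaining I/Os carry redundancy. A rough sketch, with assumed function names and an assumed 50 MB/s per drive:

```python
def sequential_read_mbps(data_drives: int, redundancy_drives: int,
                         drive_mbps: float) -> float:
    """All drives in the parity group supply data chunks in parallel."""
    return (data_drives + redundancy_drives) * drive_mbps


def sequential_write_mbps(data_drives: int, drive_mbps: float) -> float:
    """Host data rate; the redundancy writes consume the remaining drive time."""
    return data_drives * drive_mbps


DRIVE_MBPS = 50.0  # hypothetical per-drive sequential rate
print(sequential_read_mbps(3, 1, DRIVE_MBPS), sequential_write_mbps(3, DRIVE_MBPS))  # RAID-5 3+1
print(sequential_read_mbps(2, 2, DRIVE_MBPS), sequential_write_mbps(2, DRIVE_MBPS))  # RAID-1 2+2
```

Under these assumptions, RAID-5 3+1 reads at 4x and writes at 3x the drive rate, and RAID-1 2+2 reads at 4x and writes at 2x, matching the multipliers on the slides.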
27. RAID-5 comments
For sequential reads and writes, RAID-5 is very good. It’s very space efficient (smallest space for parity), and sequential reads and writes are efficient, since they operate on whole stripes.
For low access density (light activity), RAID-5 is very good. The 4x RAID-5 write penalty is (nearly) invisible to the host, because it’s non-synchronous.
For workloads with higher access density and more random writes, RAID-5 can be throughput-limited due to all the extra parity group I/O operations needed to handle the RAID-5 “write penalty”.
28. RAID-5 “RAID penalty”
Penalty in space
  For 3+1, 33% extra space for parity.
  For 7+1, 14% extra space for parity.
Penalty in disk drive utilization (disk drive % busy)
  Random writes: four times the number of I/O operations (300% extra I/Os).
  Sequential writes: for 3+1, 33% extra I/Os; for 7+1, 14% extra I/Os.
29. RAID-6 “6D + 2P” parity group
RAID-6 is an extension of the RAID-5 concept which uses two separate parity-type fields, usually called “P” and “Q”.
The mathematics are beyond a basic course*, but RAID-6 allows data to be reconstructed from the remaining drives in a parity group when any one or two drives have failed.
* The math is the same as for ECC used to correct errors in DRAM memory or on the surface of disk drives.
Each RAID-6 host random write turns into 6 parity group I/O operations:
  Read old data, read old P, read old Q.
  (Compute new P, Q.)
  Write new data, write new P, write new Q.
RAID-6 parity group sizes usually start at 6+2.
This has the same space efficiency as RAID
30. RAID-6 “RAID penalty”
6+2 penalty in space
  33% extra space for parity.
6+2 penalty in disk drive utilization (disk drive % busy)
  Random writes: six times the number of I/O operations (500% extra I/Os).
  Sequential writes: 33% extra I/Os.
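The percentages on the RAID-5 and RAID-6 penalty slides are simple ratios: extra space (and extra sequential-write I/O) is parity drives divided by data drives, and extra random-write I/O is the write penalty minus the one I/O the host asked for. A small sketch with illustrative function names:

```python
def space_overhead_pct(data_drives: int, parity_drives: int) -> float:
    """Extra space for parity (also extra I/Os for full-stripe sequential writes)."""
    return 100.0 * parity_drives / data_drives


def random_write_extra_io_pct(ios_per_host_write: int) -> float:
    """Extra back-end I/Os per host random write, as a percentage."""
    return 100.0 * (ios_per_host_write - 1)


print(round(space_overhead_pct(3, 1)))   # RAID-5 3+1: 33
print(round(space_overhead_pct(7, 1)))   # RAID-5 7+1: 14
print(round(space_overhead_pct(6, 2)))   # RAID-6 6+2: 33
print(random_write_extra_io_pct(4))      # RAID-5: 300.0
print(random_write_extra_io_pct(6))      # RAID-6: 500.0
```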
31. RAID-1 vs RAID-5 vs RAID-6 summary
The concept of RAID with parity groups permits data to be recovered after a single drive failure for RAID-1 and RAID-5, or a double drive failure for RAID-6.
RAID-1 trades off more space utilization for a lower RAID penalty for writes and lower degradation after a drive failure.
RAID-1 can be cheaper (require fewer disk drives) than RAID-5 where there is concentrated random write activity.
RAID-5 achieves redundancy with less parity space overhead, but at the expense of a higher “RAID penalty” for random writes and a larger performance degradation upon a drive failure.
33. RAID-5 can often be more expensive
[Chart: back-end disk drive busy for a RAID-1 and a RAID-5 configuration; random writes shown in solid blue.]
See how much busier the “back end” disk drives are for the RAID-5 configuration, all due to random writes (solid blue).
In this case, the RAID-1 configuration was cheaper, because fewer disk drives were needed to handle the back-end I/O activity.
RAID-1 drives could be completely filled, whereas the RAID-5 drives could only be filled to 55% of their capacity.
34. Conclusions – factors driving lowest cost
The lowest cost configuration in terms of disk drive RPM, disk drive capacity, and RAID type depends strongly on the access density and the read:write ratio.
If there is even moderate access density with significant random write activity, RAID-1 will often turn out to be the lowest cost total solution, due to being able to fill up more of the drives’ capacity with data.
Where access densities are higher, 15K RPM drives will often turn out to offer the lowest cost overall solution.
SATA drives, due to their low IOPS capability, can only be filled if the data has very low access density, and therefore are rarely the cheapest.
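One way to see why RAID-1 can come out cheaper is to size a configuration twice, once by capacity and once by back-end IOPS, and take the larger drive count. The sketch below is a simplified model, not the presenter's sizing method: the function name, the 300 GB / 60 IOPS drive, and the 10,000 GB workload at 1.0 IOPS per GB with a 50:50 read-miss:write mix are all assumptions.

```python
import math

# Back-end disk I/Os per host random write, as given in the presentation.
WRITE_PENALTY = {"RAID-1": 2, "RAID-5": 4, "RAID-6": 6}


def drives_needed(data_gb, read_miss_iops, write_iops, raid_type,
                  usable_fraction, drive_gb, drive_iops):
    """Drive count is set by whichever limit is hit first: space or IOPS."""
    by_capacity = math.ceil(data_gb / usable_fraction / drive_gb)
    back_end = read_miss_iops + WRITE_PENALTY[raid_type] * write_iops
    by_iops = math.ceil(back_end / drive_iops)
    return max(by_capacity, by_iops)


# Hypothetical workload and drive (all values assumed for illustration).
workload = dict(data_gb=10_000, read_miss_iops=5_000, write_iops=5_000,
                drive_gb=300, drive_iops=60)
print(drives_needed(raid_type="RAID-1", usable_fraction=0.50, **workload))  # prints 250
print(drives_needed(raid_type="RAID-5", usable_fraction=0.75, **workload))  # prints 417
```

In this assumed case both configurations are limited by IOPS rather than capacity, so the lower RAID-1 write penalty wins even though RAID-1 stores half as much data per drive.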
35. www.hds.com/webtech
Upcoming WebTech Sessions:
  19 September: Enterprise Data Replication Architectures that Work: Overview and Perspectives
  17 October: 10 Steps To Determine if SANs Are Right For You