RAID What every server wants! Copyright © by Curt Hill
Copyright © 2003-2017 by Curt Hill

History I
- In the late 1970s there were two types of disks:
  - Hard
  - Floppy
- Hard drives were used only on mainframes and minis
  - Quite expensive
- Floppies were mostly used on personal computers
  - These were largely hobby or light business machines
  - Low capacity
- In about 1984 hard drives started appearing on IBM personal computers

History II
- After 1984 there were two types of hard drives:
  - Professional
    - Large M
    - Expensive: $50,000 and up
  - Amateur
    - Small: 5-50 M
    - Inexpensive: $2500 or less

History III
- At this point the laws of economics take charge
- The large disks were a small market, typically tens of thousands of units
- The small disks were a mass market, typically tens of millions of units
- What happens to the prices?
- The small drives advance along the learning curve and become even less expensive

History IV
- Someone gets a bright idea:
  - Should I buy one 500 M professional disk for $50,000 or ten 50 M amateur disks at $100 each?
- RAID is born: Redundant Array of Inexpensive Disks
  - Later Redundant Array of Independent Disks
- There might even be a performance improvement if the disks in the array can be accessed independently
- However, we need an exotic controller to treat the ten disks as if they were one

Issues
- Redundancy
  - If the small disks are less reliable, or if the data is so important, we need multiple copies for safety
- Speed
  - Increase speed by writing part of the data to one drive and another part to another drive at the same time
- Versions
  - There have been many, termed levels
  - Currently level 0 through level 6

Redundancy
- Multiple copies allow one disk to crash without the computer losing data
- Two forms:
  - Mirroring
    - AKA shadowing, duplexing
    - Used early on
  - Error Correction Codes
    - More popular later

Mirroring
- Write all the data twice
  - Once to a disk and once to its mirror
- This is done simultaneously, so there is no extra delay
- A read only has to wait for the faster of the two disks
- If a disk crashes, nothing has to be recomputed: the replacement is just a copy of the surviving disk
- High disk overhead: two disks are needed to store one disk's worth of data
- Different parts of the two disks can be read at the same time to increase speed
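
The mirroring behavior described above can be sketched as a toy model. This is only an illustration: the class name `MirroredPair` and its methods are hypothetical, each "disk" is a Python dict, and real controllers issue the two writes in parallel rather than in a loop.

```python
class MirroredPair:
    """Toy model of two mirrored disks; each disk maps block number -> bytes."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, block, data):
        # Every write goes to both disks (skipping a disk marked as failed).
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block, prefer=0):
        # Either copy is valid; fall back to the mirror if one disk is gone.
        for i in (prefer, 1 - prefer):
            if self.disks[i] is not None:
                return self.disks[i][block]
        raise IOError("both disks failed")

    def fail(self, i):
        # Simulate a crash of disk i.
        self.disks[i] = None

    def rebuild(self, i):
        # Nothing is recomputed: the new disk is a plain copy of the survivor.
        self.disks[i] = dict(self.disks[1 - i])


pair = MirroredPair()
pair.write(0, b"payroll")
pair.fail(0)
print(pair.read(0))   # still readable from the mirror
pair.rebuild(0)
```

The cost is visible in the model: two dicts hold one disk's worth of data.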

Error Correction Code
- We should have already seen the background on Error Correction Codes
  - Otherwise see the ECC.ppt presentation
- Many of the EC codes were pioneered by Richard Hamming of Bell Labs around 1950
  - Thus known as Hamming codes

ECC
- Instead of mirroring, which requires double the disk space, use an ECC
- Conceptually:
  - Each of the eight data bits is placed on a separate drive and the four ECC bits on four more drives
  - Any one disk that fails may be recreated from the rest
- Recall that the larger the data, the better the ratio of data to parity bits
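
The ratio of data bits to check bits can be worked out directly. The sketch below assumes a single-error-correcting Hamming code, where r check bits cover m data bits whenever 2^r >= m + r + 1; the function name `hamming_check_bits` is made up for this example.

```python
def hamming_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


# Overhead shrinks as the data word grows.
for m in (4, 8, 16, 32, 64):
    r = hamming_check_bits(m)
    print(f"{m:2d} data bits -> {r} check bits, overhead {r / m:.0%}")
```

Eight data bits need four check bits, matching the eight-plus-four arrangement above, while sixty-four data bits need only seven.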

Speed
- Speed requires parallelism
  - Do two things at once
- With a mirrored disk, read the front half from one and the back half from the other
  - Transfer time is cut in half
- This leads to a more general approach: stripes

Stripes
- Cut the data into stripes
- If you have N disks
  - Partition the file into N pieces
  - Read or write N pieces at a time
- Best if each piece goes to a separate controller
  - Controllers are generally much slower than memory, so use multiple controllers
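
Striping can be sketched as dealing fixed-size chunks round-robin onto N disks. This is a minimal illustration, not a controller design: the names `stripe` and `unstripe` and the 4-byte chunk size are assumptions, and each "disk" is just a Python list.

```python
def stripe(data, n_disks, chunk=4):
    """Deal fixed-size chunks of data round-robin onto n_disks lists."""
    disks = [[] for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks].append(data[i:i + chunk])
    return disks


def unstripe(disks):
    """Reassemble the data by reading one chunk from each disk in turn."""
    out = []
    longest = max(len(d) for d in disks)
    for row in range(longest):
        for d in disks:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)


disks = stripe(b"ABCDEFGHIJKL", n_disks=3, chunk=2)
print(disks)            # each inner list is what one disk holds
print(unstripe(disks))  # the original bytes again
```

In a real array the chunks in one row would be transferred at the same time, which is where the speedup comes from.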

Controllers
- Controllers may be hardware or software or both
- Some RAID levels are complicated enough that a software controller becomes a problem
- Thus we generally prefer RAID implemented in hardware

RAID Levels
- Originally specified with five levels, 1-5; levels 0 and 6 were added later
- Some of these
  - Were commercially popular
  - Were replaced by better techniques before adoption
- RAID 5 and 6 are now dominant, with 6 replacing 5
- Two is the minimum number of disks, but more is usually an option

Level 0
- Striped disk array
- Minimum of two disks
- No redundancy or error checking
  - If a drive is lost, all the data is lost
  - The only RAID level that does not protect data
- Some argue that this is pre-RAID
  - Let's not argue over the obsolete

Level 0 - Striping
Disk 1  Disk 2  Disk 3  Disk 4
A1      A2      A3      A4
B1      B2      B3      B4
C1      C2      C3      C4
- Each file is represented by a letter (A-C)
- Four stripes within each file (A1-A4)
- Each block of data is written in four pieces
- No redundancy

Level 1
- Mirroring without striping
- Minimum of two disks
- 100% redundancy
- A rebuilt disk is just a copy, not computed
- Simple controller
- Cannot be expanded on the fly

Level 1 - Mirroring
Disk 1  Disk 2  Disk 3  Disk 4
A       A       E       E
B       B       F       F
C       C       G       G
- Four disks, each a mirror of another
- Each block of data is written twice
- Needs twice as much space

Level 2
- Striped disk at the bit level
- ECC provides the redundancy
- Minimum of seven disks for storing a 4-bit word
- Not commercially viable
- Number of parity disks is proportional to the log of the number of data disks
- Not very flexible

Level 2 – Striping with ECC
Disk 1  Disk 2  Disk 3  Disk 4  Disk 5  Disk 6  Disk 7
A1      A2      A3      A4      EAx     EAy     EAz
B1      B2      B3      B4      EBx     EBy     EBz
C1      C2      C3      C4      ECx     ECy     ECz
- Four stripes of data protected by three of ECC
- ECC is computed on the fly
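
Four data bits protected by three ECC bits is the shape of a Hamming(7,4) code, which can be sketched in a few lines. This is an illustration under assumptions: it uses the textbook bit ordering (check bits at positions 1, 2 and 4), which need not match the disk order on the slide, and the function names `encode` and `correct` are made up here. It handles the harder case where the position of the bad bit is unknown; with a crashed disk the position is known, which only makes recovery easier.

```python
def encode(d):
    """Encode 4 data bits as a 7-bit Hamming codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def correct(c):
    """Repair at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4      # 0 means no error, else the 1-based bad position
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
code = encode(word)
code[4] ^= 1                        # one "disk" returns a wrong bit
print(correct(code) == word)
```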

Level 3
- Striped disk array with one ECC disk
- Each block is striped across disks
- One disk may fail without diminishing throughput
- Minimum of three disks
- High data rates
- Controller should be in hardware, not just software

Level 3 – Striping with ECC
Disk 1  Disk 2  Disk 3  Disk 4  Disk 5
A1      A2      A3      A4      ECCA
B1      B2      B3      B4      ECCB
C1      C2      C3      C4      ECCC
- Four stripes of data protected by one of ECC
- ECC is computed on the fly
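
With a single ECC disk, recovery from one failed disk can be sketched with plain XOR parity. This assumes the ECC block is the byte-wise XOR of the data blocks, which is the usual choice for a single redundant disk; the slide itself only says "ECC". The helper name `xor_blocks` and the sample block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data stripes and their parity block, computed on the fly at write time.
data = [b"A1..", b"A2..", b"A3..", b"A4.."]
parity = xor_blocks(data)

# Lose the second disk, then rebuild it from the survivors plus parity.
survivors = data[:1] + data[2:]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt)
```

XOR works here because the position of the missing block is known; the parity only has to fill in an erasure, not locate an error.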

Level 4
- Similar to level 3 except
  - Blocks are not subdivided
  - Different blocks are written to different disks
- Minimum of three disks
- Controller is complex
  - Should be hardware
- Not easy to rebuild in case of failure

Level 5
- Striped disk array with distributed ECC blocks
- Each disk stores both data and ECC
- An ECC block is never on the same disk as the data it protects
- Minimum of three disks
- Any single disk failure will result in no loss of data
- Most complex controller design

Level 5 – Striping with Distributed ECC
Disk 1  Disk 2  Disk 3  Disk 4  Disk 5
A1      A2      A3      A4      ECC5
B1      B2      B3      ECC4    B5
C1      C2      ECC3    C4      C5
D1      ECC2    D3      D4      D5
ECC1    E2      E3      E4      E5
- Four stripes of data
- Each ECC protects the other four
- No disks are only data or only ECC
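
The rotation of the ECC block from disk to disk can be sketched as a small layout generator. This shows one possible rotation (the ECC block moves one disk to the left on each successive stripe); real implementations use several different rotation orders. The function name `raid5_layout` and the `D`/`P` labels are made up for the example.

```python
def raid5_layout(n_disks, n_stripes):
    """Return one row of labels per stripe; the parity block shifts left each stripe."""
    rows = []
    for s in range(n_stripes):
        parity_disk = (n_disks - 1 - s) % n_disks
        row = []
        k = 0                                  # index of the data block within the stripe
        for disk in range(n_disks):
            if disk == parity_disk:
                row.append(f"P{s}")            # ECC block for stripe s
            else:
                row.append(f"D{s}.{k}")        # k-th data block of stripe s
                k += 1
        rows.append(row)
    return rows


for row in raid5_layout(5, 5):
    print("  ".join(f"{cell:5}" for cell in row))
```

Each column of the printed grid is one disk. Every disk ends up holding both data and ECC, and no stripe keeps its ECC on a disk that also holds that stripe's data.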

Level 6
- Striped disk array
- Similar to level 5 except that it uses two independent ECC schemes
- ECC blocks distributed among data disks
- Minimum of four disks
- Up to two disks may fail without loss of data

Level 6 – Striping with Two Distributed ECCs
Disk 1  Disk 2  Disk 3  Disk 4  Disk 5
A1      A2      A3      ECCb    ECC4
B1      B2      ECCc    ECC3    B3
C1      ECCd    ECC2    C2      C3
ECC1    ECCa    D1      D2      D3
- Four stripes of data
- Two types of ECC, each of which protects a different group
- No disks are only data or only ECC
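
The space cost of redundancy at levels 0, 1, 5 and 6 can be compared with a little arithmetic. The sketch below assumes equal-sized disks and, for level 1, disks arranged in mirrored pairs; the minimum disk counts are the ones stated on the level slides. The function name `usable_disks` is made up.

```python
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}   # minimums as given for each level


def usable_disks(level, n):
    """Disks' worth of usable space in an n-disk array of equal-sized disks."""
    if level not in MIN_DISKS:
        raise ValueError("level not covered by this sketch")
    if n < MIN_DISKS[level]:
        raise ValueError(f"level {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return n          # no redundancy at all
    if level == 1:
        return n // 2     # assumes mirrored pairs
    if level == 5:
        return n - 1      # one disk's worth of ECC
    return n - 2          # level 6: two disks' worth of ECC


for level in (0, 1, 5, 6):
    print(f"level {level}: {usable_disks(level, 8)} of 8 disks hold data")
```

Levels 5 and 6 give up a fixed number of disks, so their overhead shrinks as the array grows, while mirroring always costs half.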

Others
- Several combinations of these exist
- 0 + 1
  - Mirroring a striped disk
- 10
  - Striping a mirrored disk
- 53
  - Level 0 and 3
  - Level 0 whose stripes are level 3 arrays

What is coming?
- As disk arrays get larger, the likelihood of multiple simultaneous disk failures becomes significant
- Triple parity schemes that survive three failures have appeared for this reason, for example RAID-Z3 in ZFS
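
How quickly multiple failures become likely as an array grows can be illustrated with a binomial calculation. This is a rough sketch under loud assumptions: every disk fails independently with the same probability over some period, and the 3% figure is an arbitrary illustrative value, not a measured failure rate. Real failures are often correlated, which makes matters worse. The function name `prob_at_least` is made up.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n disks fail), each failing independently with probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


P_FAIL = 0.03   # assumed per-disk failure probability, for illustration only

for n in (4, 8, 16, 32, 64):
    print(f"{n:2d} disks: P(2 or more fail) = {prob_at_least(2, n, P_FAIL):.4f}, "
          f"P(3 or more fail) = {prob_at_least(3, n, P_FAIL):.4f}")
```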

Finally
- One thing that slows a database is disk access
  - The parallelism of RAID is an asset in this case
- Another problem for a database is robustness
  - That is, minimizing down time
  - The redundancy of RAID helps with this
- Every database wants RAID!