CS 245 Database System Principles, Notes 02: Hardware


1 CS 245 Database System Principles, Notes 02: Hardware

2 Outline. Hardware: Disks; Access Times; Example: Megatron 747; Optimizations. Other Topics: Storage costs; Using secondary storage; Disk failures.

3 The Memory Hierarchy. Volatile: cache, 10 nanoseconds (10^-8 seconds); main memory, 10-100 nanoseconds (10^-8 to 10^-7 seconds). Non-volatile: secondary storage (disk), 10-30 milliseconds (0.01 to 0.03 seconds); tertiary storage, 1000 times slower than secondary memory.

4 Storage Cost (figure, from Gray & Reuter): typical capacity (bytes, 10^3 to 10^15) versus access time (sec, 10^-9 to 10^3) for cache, electronic main, electronic secondary, magnetic and optical disks, online tape, nearline tape and optical disks, and offline tape.

5 Storage Cost (figure, from Gray & Reuter): dollars/MB (10^-4 to 10^4) versus access time (sec, 10^-9 to 10^3) for the same storage classes.

6 Figure: APP1 … APPn; DBMS; Data Storage.

7 Typical Computer (figure): P, M, C, Secondary Storage ...

8 Processor: fast, slow, reduced instruction set, with cache, pipelined… Speed: 100 → 500 → 1000 MIPS. Memory: fast, slow, non-volatile, read-only,… Access time: 10^-6 → 10^-9 sec (1 µs → 1 ns).

9 Secondary storage: many flavors. Disk: floppy (hard, soft), removable packs, Winchester, RAM disks, optical, CD-ROM…, arrays. Tape: reel, cartridge, robots.

10 Focus on: “Typical Disk”. Terms: Platter, Head, Actuator; Cylinder, Track; Sector (physical), Block (logical), Gap; …

12 Top View (figure): track, sector. The sector is the unit of error correction/detection.

13 Disk Controller: 1. Buffers data in and out of disk. 2. Schedules the disk heads. 3. Manages the “bad blocks” so they are not used.

14 “Typical” Numbers. Diameter: 1 inch → 15 inches. Cylinders: 100 → 2000. Surfaces (tracks/cylinder): 1 (CDs) → 2 (floppies) → 30. Sector size: 512 B → 50 KB. Capacity: 360 KB (old floppy) → 400 GB (I use).

15 Disk Access Time (figure): “I want block X” → ? → block X in memory.

16 Time = Seek Time + Rotational Delay + Transfer Time + Other

17 Seek Time (figure): time versus cylinders traveled, from 1 to N; the time axis is marked x and 3x or 5x.

18 Average Random Seek Time: $S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SEEKTIME}(i \to j)}{N(N-1)}$

19 Average Random Seek Time: $S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SEEKTIME}(i \to j)}{N(N-1)}$. “Typical” S: 10 ms → 40 ms.

20 Rotational Delay (figure): “Head Here”, “Block I Want”.

21 Average Rotational Delay: R = 1/2 revolution. “Typical” R = 8.33 ms (3600 RPM).
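
The "half a revolution" rule can be checked with a few lines of Python. This is a minimal sketch; the function name is ours, not from the slides.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2.0

print(round(avg_rotational_delay_ms(3600), 2))  # 8.33
print(round(avg_rotational_delay_ms(7200), 2))  # 4.17
```

At 3600 RPM one revolution takes 16.67 ms, so the average wait is 8.33 ms; at 7200 RPM it halves to 4.17 ms.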

22 Transfer Rate: r. “Typical” r: 1 → 3 MB/second. Transfer time = block size / r.

23 Other Delays: CPU time to issue I/O; contention for controller; contention for bus, memory.

24 Other Delays: CPU time to issue I/O; contention for controller; contention for bus, memory. “Typical” value: 0.

25 So far: random block access. What about reading the “next” block?

26 If we do things right (e.g., double buffer, stagger blocks…), time to get the next block = (block size / r) + negligible, where the negligible part covers: skip gap; switch track; once in a while, next cylinder.

27 Rule of Thumb. Random I/O: expensive. Sequential I/O: much less. Ex: 1 KB block: random I/O ~ 10 ms; sequential I/O ~ 1 ms.

28 Cost for writing is similar to reading… unless we want to verify! Then we need to add a (full) rotation + block size / r.

29 To Modify a Block?

30 To Modify a Block: (a) read block; (b) modify in memory; (c) write block; [(d) verify?]

31 Block Address: physical device; cylinder #; surface #; sector.

32 Complication: Bad Blocks. Messy to handle. May map via software to the integer sequence 1, 2, …, m of logical block addresses (LBA); a map translates these to the actual block addresses.

33 The Megatron 747: An Example (Examples 13.1, 13.2). 16 surfaces, 3.5 inch diameter (outer 1 inch used); 2^16 = 65536 tracks/surface; 2^8 = 256 sectors/track; 2^12 = 4096 bytes/sector; 2^40 bytes = 1 TB disk.

34 Megatron 747 Disk (cont.). 7200 RPM, about 8.33 ms/rotation. Head movement takes 1 ms to start plus 1 ms for every 4000 cylinders, so the head moves one track in 1.00025 ms and 65536 tracks in about 17.38 ms. 10% overhead between blocks.

35 7200 RPM = 120 revolutions/sec; 1 rev. = 8.33 msec. One track: ...

36 7200 RPM = 120 revolutions/sec; 1 rev. = 8.33 msec. One track: ... Time over useful data: (8.33)(0.9) = 7.5 ms. Time over gaps: (8.33)(0.1) = 0.83 ms. Transfer time for 1 sector = 7.5/256 = 0.029 ms. Transfer time for 1 sector + gap = 8.3/256 = 0.032 ms.
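
The per-track arithmetic above can be reproduced directly from the Megatron 747 parameters. A minimal sketch; the constant names are ours.

```python
RPM = 7200
SECTORS_PER_TRACK = 256
GAP_FRACTION = 0.10  # 10% of each track is gaps between sectors

rotation_ms = 60_000 / RPM                            # about 8.33 ms per revolution
data_ms = rotation_ms * (1 - GAP_FRACTION)            # about 7.5 ms over useful data
gap_ms = rotation_ms * GAP_FRACTION                   # about 0.83 ms over gaps
sector_ms = data_ms / SECTORS_PER_TRACK               # about 0.029 ms per sector
sector_plus_gap_ms = rotation_ms / SECTORS_PER_TRACK  # about 0.0326 ms per sector + gap

print(rotation_ms, data_ms, gap_ms, sector_ms, sector_plus_gap_ms)
```

The slide divides the rounded value 8.3 by 256, which is why it reports 0.032 ms for one sector plus gap.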

37 Timing for Megatron 747. Seek time: MIN: 0 ms; MAX: 17.38 ms; AVE: 6.46 ms (why not 17.38/2 = 8.69 ms?)

38 Average travel distance as a function of initial head position (figure): initial position runs from 0 to 65536; the average distance is 32768 tracks when the head starts at either edge and 16384 when it starts in the middle.
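
The question of why the average seek is 6.46 ms rather than 17.38/2 can be checked numerically. The sketch below assumes the Megatron 747 seek model (1 ms to start plus 1 ms per 4000 cylinders) and start and target cylinders that are uniformly random; the function names are ours. Over all ordered pairs of distinct cylinders the mean distance is (N + 1)/3, about one third of the disk rather than one half.

```python
def seek_ms(cylinders_moved: int) -> float:
    """Megatron 747 seek model: 1 ms to start plus 1 ms per 4000 cylinders."""
    if cylinders_moved == 0:
        return 0.0
    return 1.0 + cylinders_moved / 4000.0

def avg_seek_distance(n: int) -> float:
    """Mean |i - j| over all ordered pairs i != j of n cylinders.

    The sum of |i - j| over those pairs is (n - 1) * n * (n + 1) / 3 and
    there are n * (n - 1) pairs, so the mean is (n + 1) / 3.
    """
    return (n + 1) / 3.0

def brute_force_avg_distance(n: int) -> float:
    """Same quantity by direct enumeration; only practical for small n."""
    total = sum(abs(i - j) for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

N = 65_536
print(round(seek_ms(N), 2))                         # 17.38 (maximum)
print(round(1.0 + avg_seek_distance(N) / 4000, 2))  # 6.46 (average)
```

Because the seek time is linear in the distance, the average time is the seek time of the average distance.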

39 Timing for Megatron 747. Rotational delay: MIN: 0 ms; MAX: 8.33 ms; AVE: 4.17 ms.

40 T_1 = time to read one random sector. T_1 = seek + rotational delay + transfer time = 6.46 + 4.17 + 0.029 = 10.66 ms.

41 Suppose the OS deals with 16 KB blocks (4 sectors per block). T_4 = 6.46 + 4.17 + (0.029) × 1 + (0.032) × 3 = 10.76 ms [compare to T_1 = 10.66 ms].

42 T_T = time to read a full track (starting at any block). T_T = 6.46 + (0.032/2) + 8.33* = 14.81 ms, where 0.032/2 is the time to get to the first block. * Actually, a bit less; we do not have to read the last gap.
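
The three read times above follow one pattern, sketched here with the rounded constants from the slides. The helper name `read_sectors_ms` is ours.

```python
SEEK_AVG = 6.46      # ms, average seek
ROT_AVG = 4.17       # ms, average rotational delay
SECTOR = 0.029       # ms, transfer time for one sector
SECTOR_GAP = 0.032   # ms, one sector plus the gap that follows it
ROTATION = 8.33      # ms, one full revolution

def read_sectors_ms(k: int) -> float:
    """Random read of k consecutive sectors; the gap after the last one is not read."""
    return SEEK_AVG + ROT_AVG + SECTOR + SECTOR_GAP * (k - 1)

t1 = read_sectors_ms(1)                       # about 10.66 ms
t4 = read_sectors_ms(4)                       # about 10.76 ms
t_track = SEEK_AVG + SECTOR_GAP / 2 + ROTATION  # about 14.81 ms

print(t1, t4, t_track)
```

Reading four sectors costs only about 0.1 ms more than reading one, which is the argument for larger blocks.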

43 Block Size Selection? Big block → amortize I/O cost. Unfortunately... big block → read in more useless stuff, and it takes longer to read!

44 Trend: as memory prices drop, blocks get bigger...

45 Outline. Hardware: Disks; Access Times; Example: Megatron 747; Optimizations. Other Topics: Storage Costs; Using Secondary Storage; Disk Failures.

46 The I/O Model of Computation. Access of a block is very expensive. Random block access is the norm. It is reasonable to count only the disk I/Os.

47 External Sorting. 10M records of 100 bytes = 1 GB file. 4 KB blocks, each holding 40 records; the entire file takes about 2^18 blocks. 50 MB of available main memory holds 12800 blocks, about 1/20th of the file. Requirement: sort by primary key field.
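
The sizing numbers for this sorting example can be derived step by step. A minimal sketch, assuming 4 KB means 4096 bytes and 50 MB means 50 × 2^20 bytes; the variable names are ours.

```python
import math

RECORDS = 10_000_000
RECORD_BYTES = 100
BLOCK_BYTES = 4096
MEMORY_BYTES = 50 * 2**20   # 50 MB of main memory

records_per_block = BLOCK_BYTES // RECORD_BYTES        # 40 (96 bytes per block unused)
file_blocks = math.ceil(RECORDS / records_per_block)   # 250,000, just under 2**18 = 262,144
memory_blocks = MEMORY_BYTES // BLOCK_BYTES            # 12,800
sublists = math.ceil(file_blocks / memory_blocks)      # 20 memory loads

print(records_per_block, file_blocks, memory_blocks, sublists)
```

The slides round the file size up to 2^18 blocks, and the 20 memory loads become the 20 sorted sublists of the two-phase sort.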

48 External Sorting. Main-memory sorting algorithms may not be good candidates here. Time complexity in this case is estimated from the number of disk I/Os, instead of CPU time.

49 Merge Sort. Take two sorted lists and repeatedly choose the smaller of the “heads” of the lists: merging 1, 3, 5 with 2, 4, 6 gives 1, 2, 3, 4, 5, 6. A recursive algorithm: divide the records into two parts, recursively mergesort the parts, and merge the resulting lists. Each record is read from and written to disk log_2 n times.
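
The merge step and the recursion described on this slide can be sketched in memory as follows. This illustrates the algorithm only; it does not model the disk reads and writes that make plain merge sort expensive on large files.

```python
def merge(left, right):
    """Merge two sorted lists by repeatedly taking the smaller head."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # at most one of these two tails is non-empty
    out.extend(right[j:])
    return out

def merge_sort(records):
    """Divide into two parts, sort each recursively, then merge."""
    if len(records) <= 1:
        return list(records)
    mid = len(records) // 2
    return merge(merge_sort(records[:mid]), merge_sort(records[mid:]))

print(merge([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```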

50 Two-Phase, Multiway Merge Sort (2PMMS), Phase 1: fill main memory with records; sort them with your favorite main-memory sorting algorithm; write the sorted list to disk; repeat until all records have been put into one of the sorted lists.

51 Phase 2

52 Two-Phase, Multiway Merge Sort (2PMMS), Phase 2: initially load the input buffers with the first block of their respective sorted lists; repeatedly run a competition among the first unchosen records of each of the buffered blocks; if an input block is exhausted, get the next block from the same file; if the output block is full, write it to disk.
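
The two phases can be sketched as an in-memory simulation. This is an illustration under simplifying assumptions, not the textbook's code: Python lists stand in for disk files, `memory_capacity` and `block_size` are counted in records, a heap runs the "competition", and output buffering is not modeled. All names are ours.

```python
import heapq

def phase1(records, memory_capacity):
    """Cut the input into memory-sized chunks and sort each one into a sorted sublist."""
    return [sorted(records[i:i + memory_capacity])
            for i in range(0, len(records), memory_capacity)]

def phase2(runs, block_size):
    """Merge all sorted sublists, reading each one a block at a time."""
    def blocks(run):
        for i in range(0, len(run), block_size):
            yield run[i:i + block_size]        # stands in for one disk read

    readers = [blocks(run) for run in runs]
    buffers = [next(r, []) for r in readers]   # first block of every sublist
    heap = [(buf[0], k, 0) for k, buf in enumerate(buffers) if buf]
    heapq.heapify(heap)

    output = []
    while heap:
        value, k, pos = heapq.heappop(heap)    # smallest unchosen record wins
        output.append(value)
        pos += 1
        if pos == len(buffers[k]):             # input block exhausted: fetch the next
            buffers[k], pos = next(readers[k], []), 0
        if buffers[k]:
            heapq.heappush(heap, (buffers[k][pos], k, pos))
    return output

data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
print(phase2(phase1(data, memory_capacity=4), block_size=2))
```

With a memory capacity of 4 records the input becomes three sorted sublists, which Phase 2 merges in a single pass.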

53 Analysis of Naïve Implementation. 2 reads and 2 writes for each block. Recall that the file is stored on 2^18 blocks. Assume a random block access takes 10 ms. Overall time for 2PMMS = 2^20 × 0.01 s ≈ 175 minutes ≈ 3 hours.
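
The three-hour estimate is just a count of random I/Os multiplied by the cost of each. A minimal sketch; the constant names are ours.

```python
FILE_BLOCKS = 2**18
IOS_PER_BLOCK = 4          # each block is read twice and written twice
RANDOM_IO_SECONDS = 0.01   # 10 ms per random block access

total_ios = FILE_BLOCKS * IOS_PER_BLOCK          # 2**20 I/Os
total_seconds = total_ios * RANDOM_IO_SECONDS    # about 10,486 s
total_minutes = total_seconds / 60               # about 174.8 minutes
total_hours = total_minutes / 60                 # about 2.9 hours

print(total_ios, total_seconds, total_minutes, total_hours)
```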

54 Optimizations (in controller or O.S.): disk scheduling algorithms (e.g., elevator algorithm); cylinderization; track (or larger) buffer; pre-fetch; arrays; mirrored disks; on-disk cache.

55 Cylinderization. Place data by cylinder so that we can read block after block with no seek time and/or rotational delay. Assume the records are on 1000 consecutive cylinders, with 1 MB of data per cylinder = 256 blocks per cylinder. Assume the transfer time for one block is 0.05 ms (actually 0.032 for the 747).

56 Application to Phase 1. Load main memory from 50 consecutive cylinders (50 MB of data = 12800 blocks). Time to transfer 12800 blocks = 0.05 ms × 12800 = 0.64 sec. One random seek and 49 one-cylinder seeks cost 0.06 sec (actually 0.056 for the 747). Write the sorted list onto 50 consecutive cylinders, so the write time is also about 0.7 sec. Overall cost = 20 × 2 × 0.7 = 28 sec.
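
The Phase 1 estimate with cylinderization can be recomputed from its parts. A minimal sketch that uses the Megatron 747 seek figures from the earlier slides (6.46 ms average seek, 1.00025 ms for a one-cylinder seek); the constant names are ours.

```python
BLOCK_TRANSFER_MS = 0.05
BLOCKS_PER_LOAD = 12_800
CYLINDERS_PER_LOAD = 50
AVG_SEEK_MS = 6.46          # one random seek to reach the first cylinder
ONE_CYL_SEEK_MS = 1.00025   # moving to the adjacent cylinder
SUBLISTS = 20               # number of memory loads

transfer_s = BLOCK_TRANSFER_MS * BLOCKS_PER_LOAD / 1000                     # 0.64 s
seek_s = (AVG_SEEK_MS + (CYLINDERS_PER_LOAD - 1) * ONE_CYL_SEEK_MS) / 1000  # about 0.056 s
load_s = transfer_s + seek_s                                                # about 0.7 s
phase1_s = SUBLISTS * 2 * load_s   # each load is read once and written once: about 28 s

print(transfer_s, seek_s, load_s, phase1_s)
```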

57 But in Phase 2… blocks are read from the 20 sublists in a data-dependent order (random disk I/O requests), so we cannot take much advantage of cylinderization.

58 Track Buffer. If we have extra space for main-memory buffers, consider loading buffers in advance of need. Similarly, keep output blocks buffered in main memory until it is convenient to write them to disk.

59 Application to Phase 2. Buffer one cylinder for each sublist and the output. Recall that one cylinder (1 MB) = 256 blocks. Time to read 1000 cylinders = 1000 × (256 × 0.05 ms + 6.5 ms) ≈ 19.3 sec. Similarly, time to write the output buffer = 19.3 sec.
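
The Phase 2 figure with one buffered cylinder per sublist works out as below. A minimal sketch; the constant names are ours, and the combined read-plus-write total on the last line is our own sum of the two figures, not a number stated on the slide.

```python
BLOCKS_PER_CYLINDER = 256
BLOCK_TRANSFER_MS = 0.05
POSITIONING_MS = 6.5   # per-cylinder positioning cost used on the slide
CYLINDERS = 1000

cylinder_ms = BLOCKS_PER_CYLINDER * BLOCK_TRANSFER_MS + POSITIONING_MS  # 19.3 ms
read_s = CYLINDERS * cylinder_ms / 1000                                 # about 19.3 s
write_s = read_s                                                        # output costs the same
phase2_s = read_s + write_s                                             # about 38.6 s

print(cylinder_ms, read_s, phase2_s)
```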

60 Double Buffering. Problem: we have a file (a sequence of blocks B1, B2, …) and a program (process B1; process B2; process B3; ...).

61 Single Buffer Solution: (1) read B1 → buffer; (2) process data in buffer; (3) read B2 → buffer; (4) process data in buffer; ...

62 Say P = time to process one block, R = time to read in 1 block, n = # blocks. Single buffer time = n(P + R).

63 Double Buffering (figure): the disk holds blocks A through G; memory holds two buffers.

64 Double Buffering (figure): A is processed in one buffer while B is read into the other.

65 Double Buffering (figure): A is done; B is processed while C is read.

66 Double Buffering (figure): the preceding steps shown together.

67 Say P ≥ R. What is the processing time? P = processing time/block; R = I/O time/block; n = # blocks.

68 Say P ≥ R. What is the processing time? P = processing time/block; R = I/O time/block; n = # blocks. Double buffering time = R + nP. Single buffering time = n(R + P).
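
The two formulas can be compared directly. A minimal sketch; the function names and the sample values are ours. The double-buffering formula holds only when processing a block takes at least as long as reading one, so that every read after the first is hidden behind processing.

```python
def single_buffer_time(n: int, p: float, r: float) -> float:
    """Read and process strictly alternate: n * (P + R)."""
    return n * (p + r)

def double_buffer_time(n: int, p: float, r: float) -> float:
    """Only the first read is not overlapped with processing: R + n * P."""
    if p < r:
        raise ValueError("this formula assumes P >= R")
    return r + n * p

# Hypothetical values: 100 blocks, 2 ms to process and 1 ms to read each block.
print(single_buffer_time(100, 2, 1))  # 300
print(double_buffer_time(100, 2, 1))  # 201
```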

69 Multiple Disks. If the sublists are stored on multiple (cheap) different disks, then we can read or write multiple blocks at once. Better performance at the cost of $. Example 13.5.

70 Mirror Disks. If we use n disks and mirror the original disk with n-1 copies, then we can read n blocks at once. Writing requires a write on each disk, so there is no speedup. Obvious minus: extra disk cost.

71 Disk Scheduling Algorithms. Concurrent processes generate (random) disk I/O requests. The disk controller (or OS, DBMS) decides in which order these requests are serviced. Elevator algorithm: the head moves from center to rim, then back again; the idea is to honor “nearby” requests whenever possible.
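
The elevator idea can be sketched for a fixed snapshot of pending requests. This is a simplification under our own assumptions: a real scheduler also admits requests that arrive during the sweep, and the function name and cylinder numbers are hypothetical, not taken from Example 13.6.

```python
def elevator_order(head, requests, moving_up=True):
    """Order in which a snapshot of pending cylinder requests is served.

    The head keeps moving in its current direction, serving every request
    it passes, then reverses and serves the remaining ones on the way back.
    """
    if moving_up:
        first = sorted(c for c in requests if c >= head)
        second = sorted((c for c in requests if c < head), reverse=True)
    else:
        first = sorted((c for c in requests if c <= head), reverse=True)
        second = sorted(c for c in requests if c > head)
    return first + second

print(elevator_order(50, [10, 95, 52, 40, 70]))                   # [52, 70, 95, 40, 10]
print(elevator_order(50, [10, 95, 52, 40, 70], moving_up=False))  # [40, 10, 52, 70, 95]
```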

72 Example 13.6, p. 572

73 Disk Failures. Partial vs. total; intermittent vs. permanent. 1. Intermittent failure. 2. Media decay. 3. Disk crash.

74 Coping with Disk Failures. Detection: e.g., checksum (Example 13.7, p. 576). Correction → redundancy.

75 Disk Arrays: RAIDs (various flavors); block striping; mirrored. Logically one disk.

76 RAID Level 1: mirror the disk. Redundancy allows data to survive a disk crash.

77 MTTF for Mirrored Disks. Assume: 10% failure per year, so MTTF = 10 years; 3 hours to replace and restore a failed disk. If one disk fails, then the other had better not fail in the next three hours. Example 13.7, p. 579.

78 MTTF for Mirrored Disks (cont.). Probability of any one disk failing in one year = 1/10 (by def.). 3 hours = 1/2920 year. Probability that the mirror disk fails during those 3 hours = (1/10) × (1/2920).

79 MTTF for Mirrored Disks (cont.). Probability that one of the two disks fails in one year = 1/10 + 1/10 = 1/5. Probability that one disk fails and the mirror disk then fails while the crashed disk is being restored = (1/5) × (1/10) × (1/2920). MTTF for mirrored disks = 5 × 10 × 2920 = 146000 years.
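
The MTTF calculation can be followed with exact fractions. A minimal sketch using the slide's assumptions (10% failure rate per disk per year, 3-hour repair window, 365-day year); the variable names are ours.

```python
from fractions import Fraction

p_disk_fails_per_year = Fraction(1, 10)
repair_window_years = Fraction(3, 365 * 24)   # 3 hours = 1/2920 year

# Either of the two disks fails first.
p_first_failure = 2 * p_disk_fails_per_year                           # 1/5
# The surviving disk fails before the repair finishes.
p_second_during_repair = p_disk_fails_per_year * repair_window_years  # 1/29200
p_data_loss_per_year = p_first_failure * p_second_during_repair       # 1/146000
mttf_years = 1 / p_data_loss_per_year                                 # 146000

print(repair_window_years, p_data_loss_per_year, mttf_years)
```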

80 RAID Level 4. Use n data disks and one redundant disk. Each block of the redundant disk is the parity check for the corresponding blocks of the data disks.

81 RAID Level 4 - Example

82 RAID Level 4 - Recovery

83 RAID Level 4 - Reading

84 RAID Level 4 - Writing. Example 13.10, p. 581.
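
Parity, recovery, and writing in RAID 4 all reduce to bitwise XOR. The sketch below uses one-byte blocks with illustrative bit patterns; the function name and values are ours, chosen for the example.

```python
from functools import reduce

def parity(blocks):
    """Bitwise XOR (modulo-2 sum) of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks, one block each (one byte per block for brevity).
data = [bytes([0b11110000]), bytes([0b10101010]), bytes([0b00111000])]
redundant = parity(data)                      # 0b01100010

# Recovery: a lost block is the parity of all surviving blocks.
recovered = parity([data[0], data[2], redundant])
assert recovered == data[1]

# Writing: XOR the old and new contents into the parity block,
# so the other data disks do not need to be read.
new_block = bytes([0b11001100])
redundant = parity([redundant, data[1], new_block])
data[1] = new_block
assert redundant == parity(data)
```

The write path touches only the changed data disk and the redundant disk, which is why the redundant disk sees every write.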

85 RAID Level 5. The redundant disk is the bottleneck for writing in RAID 4. Solution: vary the redundant disk for different blocks. Example: with n disks, block j is redundant on disk i if i = the remainder of j/n. Example 13.12, p. 583.
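
The rotation rule is a single modulo operation. A minimal sketch, assuming disks are numbered 0 to n-1 and blocks from 0; the function name is ours.

```python
def redundant_disk(block_j: int, n_disks: int) -> int:
    """Disk that holds the parity for block j: the remainder of j / n."""
    return block_j % n_disks

# With 4 disks, the parity role cycles through all of them.
print([redundant_disk(j, 4) for j in range(6)])  # [0, 1, 2, 3, 0, 1]
```

Because consecutive blocks keep their parity on different disks, parity writes are spread across the array instead of queuing on one disk.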

86 At what level do we cope? Single disk: e.g., error-correcting codes. Disk array: logical → physical.

87 Operating system: e.g., stable storage (logical block → Copy A, Copy B).

88 Database system: e.g., log (current DB, last week’s DB).

89 Summary. Hardware: Disks; Access Times; Example: Megatron 747; Optimizations. Other Topics: Storage Costs; Using Secondary Storage; Disk Failures.

90 Assignments (First Edition): 1. Ex. 2.2.1, p. 39. 2. Ex. 2.3.1, p. 48; Ex. 2.3.4, p. 49. 3. Ex. 2.4.1, p. 61; Ex. 2.4.5, p. 63 (refer to tips on p. 58). 4. Ex. 2.5.2, p. 67. 5. Ex. 2.6.1, p. 77; Ex. 2.6.5, p. 78.

91 Assignments (Second Edition): 1. Ex. 13.2.1, 13.2.2, p. 567. 2. Ex. 13.3.1, 13.3.4, p. 575. 3. Ex. 13.4.2, 13.4.3, 13.4.5, 13.4.7, p. 589. 4. Ex. 13.5.1, 13.5.2, p. 593. 5. Ex. 13.6.5, 13.6.7, 13.6.9, p. 603. 6. Ex. 13.7.1, 13.7.3, p. 611. 7. Ex. 13.8.1, p. 615.

92 Assignments (Second Edition in Chinese): 1. Ex. 2.2.1, 2.2.3, p. 17. 2. Ex. 2.3.1, 2.3.4, p. 21. 3. Ex. 2.4.2, 2.4.3, 2.4.5, 2.4.7, p. 29. 4. Ex. 2.5.1, 2.5.2, p. 31. 5. Ex. 2.6.5, 2.6.7, 2.6.9, p. 37. 6. Ex. 2.7.1, 2.7.3, p. 41. 7. Ex. 2.8.1, p. 44.

93 Reading Materials. 1. D.A. Patterson, G.A. Gibson, and R.H. Katz, “A case for redundant arrays of inexpensive disks,” Proc. ACM SIGMOD, 1988. 2. Bianca Schroeder and Garth A. Gibson, “Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?” 5th USENIX Conference on File and Storage Technologies.

