Chapter 2 Data Storage
How does a computer system store and manage very large volumes of data?

Outline
- Memory Hierarchy
- Using Hard Disks Efficiently
- Accessing Hard Disks Quickly
- Keeping Hard Disks Safely
- Mechanics of Hard Disks

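The Memory Hierarchy
[Figure: the levels of the memory hierarchy: cache, main memory, secondary storage (disk), and tertiary storage. Figure labels: DBMS; programs and main-memory DBMS's; disk used as virtual memory and as file system. Across the levels, capacity runs from small to large, speed from fast to slow, and cost from high to low.]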

Cache
- Capacity: up to 1 megabyte
- Transfer time between cache and processor: 10 nanoseconds
- Transfer time between cache and main memory: 100 nanoseconds

Main Memory
- Capacity: up to 10 gigabytes
- Random access
- Access time in the 10-100 nanosecond range

Virtual Memory
- Most machines use a 32-bit address space, which covers up to 4 gigabytes.
- Main memory is usually smaller, typically 256 megabytes.
- Virtual memory is supported by the machine hardware and the operating system through a paging mechanism.
- A main-memory database system can be implemented on top of virtual memory.

Secondary Storage
- Significantly more capacious than main memory
- Significantly cheaper than main memory
- Significantly slower than main memory
- Magnetic disks are usually used as secondary storage.

Tertiary Storage
- Data volumes are measured in terabytes.
- Slower and cheaper than secondary storage
- Access times vary widely.
- Ad-hoc tape storage, optical-disk juke boxes, and tape silos are the common forms of tertiary storage.

Volatile and Nonvolatile Storage
- A volatile device, such as main memory, "forgets" its contents when the power goes off.
- A nonvolatile device, such as a magnetic disk, tape, or flash memory, keeps its contents intact in the presence of power failures.

[Figure: access time versus capacity for various levels of the memory hierarchy (tertiary, secondary, Zip disk, floppy disk, main memory, cache). The horizontal axis measures seconds in exponents of 10. The vertical axis measures bytes in exponents of 10.]

Mechanics of Disks
[Figure: a typical disk, showing platters (each platter = 2 surfaces), disk heads, and a cylinder.]
[Figure: top view of a disk surface, showing tracks, sectors, and gaps.]

Disk Controller
- Controls the movement of the disk heads and positions them at a particular radius
- Selects a surface, and selects a sector from the track on that surface that is under the head
- Transfers data

[Figure: schematic of a simple computer system, in which the processor, main memory, and disk controller are connected by a bus, and the disks are attached to the disk controller.]

Disk Storage Characteristics
The typical measures:
- Rotation speed of the disk assembly
- Number of platters per unit
- Number of tracks per surface
- Number of bytes per track
Example: Megatron 747's characteristics:
- 3840 RPM
- Four platters providing eight surfaces
- 8192 tracks per surface
- On average, 256 sectors per track and 512 bytes per sector

Capacity of Megatron 747
8 surfaces × 8192 tracks × 256 sectors × 512 bytes = 2^33 bytes = 8 gigabytes
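The capacity arithmetic can be checked with a few lines of Python. This is only a sketch; the constant names are illustrative and not part of the slides.

```python
# Megatron 747 parameters as given on the slide (names are illustrative).
SURFACES = 8
TRACKS_PER_SURFACE = 8192
SECTORS_PER_TRACK = 256      # average
BYTES_PER_SECTOR = 512

capacity_bytes = (SURFACES * TRACKS_PER_SURFACE
                  * SECTORS_PER_TRACK * BYTES_PER_SECTOR)
capacity_gigabytes = capacity_bytes / 2**30  # gigabyte taken as 2^30 bytes

print(capacity_bytes, capacity_gigabytes)
```

Every factor is a power of two (2^3, 2^13, 2^8, 2^9), which is why the product is exactly 2^33 bytes.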

Block Address:
- Physical device
- Cylinder #
- Surface #
- Sector

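Disk Access Characteristics
Disk Access Time = Seek Time + Rotational Delay + Transfer Time + Other
[Figure: the cause of rotational latency. The head is at one position on the track, and the disk must rotate until the block we want arrives under it.]
[Figure: seek time varies with distance traveled. Horizontal axis: cylinders traveled, from 1 to MAX. Vertical axis: seek time, from x up to a range of 3x~20x.]
[Figure: average travel distance as a function of initial head position. Horizontal axis: starting track (0, 4096, 8192). Vertical axis: average travel (0, 2048, 4096).]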

Average Random Seek Time
$$S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SEEKTIME}(i \to j)}{N(N-1)}$$
"Typical" S: 10 ms to 40 ms

Average Rotational Delay
R = 1/2 revolution
"Typical" R = 8.33 ms (at 3600 RPM, one revolution takes 16.67 ms)

Transfer Rate: t
"Typical" t: 1 to 3 MB/second
Transfer time = block size / t

Other Delays
- CPU time to issue I/O
- Contention for controller
- Contention for bus, memory
"Typical" value: 0

Average time to read a 4096-byte block from Megatron 747
- At 3840 rpm, the disk makes one rotation in 1/64th of a second.
- The heads take one millisecond to start and stop, plus one additional millisecond for every 500 cylinders traveled.
- Seek time: 1 + 2730/500 = 6.5 milliseconds (2730 cylinders is about one third of the 8192 tracks, the average travel distance)
- Rotational latency: (1/64)/2 × 1000 = 7.8 milliseconds
- Transfer time: the block covers 8 sectors and the 7 gaps between them, an arc of 36 × 7/256 + 324 × 8/256 = 11.109 degrees, so (11.109/360)/64 × 1000 = 0.5 milliseconds
- The average latency is 6.5 + 7.8 + 0.5 = 14.8 ms
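The same calculation, written out as a small Python sketch. The variable names are illustrative. The 36 and 324 degree figures are taken from the slide's own expression (the share of each track taken by gaps and by sectors), and the one-third average travel distance is the assumption behind the 2730 figure.

```python
RPM = 3840
TRACKS = 8192
SECTORS_PER_TRACK = 256
BLOCK_SECTORS = 4096 // 512      # a 4096-byte block spans 8 sectors
GAP_DEGREES = 36.0               # degrees of each track taken by gaps
SECTOR_DEGREES = 324.0           # degrees of each track taken by sectors

rotation_ms = 60_000 / RPM       # one full rotation, in milliseconds

# Seek: 1 ms to start and stop, plus 1 ms per 500 cylinders traveled.
average_cylinders = TRACKS // 3  # assumed average travel: one third of the disk
seek_ms = 1 + average_cylinders / 500

# Rotational latency: on average, half a rotation.
rotational_ms = rotation_ms / 2

# Transfer: the arc covered by 8 sectors and the 7 gaps between them.
block_degrees = (GAP_DEGREES * (BLOCK_SECTORS - 1) / SECTORS_PER_TRACK
                 + SECTOR_DEGREES * BLOCK_SECTORS / SECTORS_PER_TRACK)
transfer_ms = block_degrees / 360 * rotation_ms

total_ms = seek_ms + rotational_ms + transfer_ms
print(round(seek_ms, 1), round(rotational_ms, 1),
      round(transfer_ms, 1), round(total_ms, 1))
```

The unrounded terms sum to slightly less than the sum of the rounded terms, but both round to 14.8 ms.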

Cost for Writing
Similar to reading, unless we want to verify the write. Verification adds one full rotation plus the transfer time (block size / t).

To Modify a Block
(a) Read the block
(b) Modify it in memory
(c) Write the block
(d) Optionally, verify the write

Using Hard Disks Efficiently
A disk access takes much longer than the time likely to be spent manipulating the data in main memory, so algorithms should be designed to limit the number of disk accesses.

The I/O Model of Computation
- Dominance of I/O cost: when the data is too large to fit in main memory, moving disk blocks between disk and memory often takes much longer than processing the data once it is in main memory.
- Algorithms need to change under the I/O model.
- The evaluation of algorithms for data in secondary storage focuses on the number of disk I/O's required.

Sorting Data in Secondary Storage
There are a number of well-known algorithms for sorting data in main memory. However, when the data is much larger than main memory, we should consider how to reduce the number of times each block moves between main memory and secondary storage.

Merging two sorted lists to make one sorted list.
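A minimal sketch of the basic merge step, assuming the two inputs are already-sorted Python lists held in memory. The function name `merge_sorted` is illustrative.

```python
def merge_sorted(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly move the smaller of the two front elements to the output.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; the remainder of the other is already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sorted([1, 4, 9], [2, 3, 10]))  # [1, 2, 3, 4, 9, 10]
```

Each element is examined once, so the merge reads every input block a single time, which is what makes merging attractive when the lists live on disk.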

Two-Phase, Multiway Merge-Sort
Phase 1: Repeatedly sort main-memory-sized pieces of the data, producing sorted sublists.
Phase 2: Merge all the sorted sublists into a single sorted list.

[Figure: main-memory organization for multiway merging. There are input buffers, one for each sorted list, with pointers to the first unchosen records; the smallest unchosen record is selected and moved to the output buffer.]
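The two phases can be sketched in Python as follows. This is an in-memory simulation, not a disk-based implementation: `memory_capacity` (the number of records that fit in main memory) and the function name are assumptions made for illustration, and the sorted sublists are kept as Python lists rather than written to disk.

```python
import heapq


def two_phase_multiway_merge_sort(records, memory_capacity):
    """Simulate two-phase, multiway merge-sort on a list of records."""
    # Phase 1: sort main-memory-sized pieces into sorted sublists.
    sublists = [
        sorted(records[start:start + memory_capacity])
        for start in range(0, len(records), memory_capacity)
    ]
    # Phase 2: merge all sublists at once, always selecting the smallest
    # unchosen record among the fronts of the sublists.
    return list(heapq.merge(*sublists))


data = [42, 7, 19, 3, 88, 61, 5, 23, 14, 70]
print(two_phase_multiway_merge_sort(data, memory_capacity=4))
```

`heapq.merge` plays the role of the "select smallest unchosen" step: it looks only at the current front element of each sublist, much as the merge looks only at the current block of each input buffer.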

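How large a set of records can be sorted
- Block size: B bytes
- Memory size: M bytes
- Record size: R bytes
Main memory holds M/B blocks; one is reserved as the output buffer, so Phase 2 can merge at most (M/B) - 1 sublists, each holding up to M/R records.
Total number of records that can be sorted: (M/R)((M/B) - 1)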
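A short sketch of the formula in Python. The sample values (4096-byte blocks, 64 megabytes of memory, 100-byte records) are hypothetical and chosen only to show the order of magnitude; integer division is used so that partial records and partial blocks are not counted.

```python
def max_sortable_records(block_bytes, memory_bytes, record_bytes):
    """Upper bound on records sortable by two-phase, multiway merge-sort."""
    records_per_sublist = memory_bytes // record_bytes   # M/R
    max_sublists = memory_bytes // block_bytes - 1       # (M/B) - 1
    return records_per_sublist * max_sublists


# Hypothetical sizes, for illustration only.
print(max_sortable_records(block_bytes=4096,
                           memory_bytes=64 * 2**20,
                           record_bytes=100))
```

Because M appears in both factors, the bound grows with the square of the memory size.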

Accessing Hard Disks Quickly
- Organizing Data by Cylinders
- Using Multiple Disks
- Mirroring Disks
- Disk Scheduling and the Elevator Algorithm
- Prefetching and Large-Scale Buffering

Organizing Data by Cylinders
Disk Access Time = Seek Time + Rotational Delay + Transfer Time = 6.5 ms + 7.8 ms + 0.5 ms
- With blocks distributed randomly on disk, sorting 10,000,000 records by Two-Phase, Multiway Merge-Sort takes 250 minutes.
- With blocks organized by cylinders, it takes 2.15 minutes for one phase + 125 minutes for the second phase.
Place blocks that are accessed together on the same cylinder, so that we can often avoid seek time and possibly rotational latency.

Using Multiple Disks
Replace one Megatron 747 (four platters with eight surfaces) by four Megatron 737s (one platter with two surfaces each).
Effect on Two-Phase, Multiway Merge-Sort:
1. Phase 1: speed-up of 4 times
2. Phase 2: speed-up of 2~3 times
Divide the data among several smaller disks rather than one large one. With more head assemblies, the disks can go after blocks independently, which increases the number of block accesses per unit time.

Mirroring Disks
- Enhances reliability
- Speeds up reading but not writing

Disk Scheduling and the Elevator Algorithm

Arrival times for six block-access requests:

Cylinder of Request   First time available
1000                  0
3000                  0
7000                  0
2000                  20
8000                  30
5000                  40

Finishing times for block accesses using the elevator algorithm:

Cylinder of Request   Time completed
1000                  8.3
3000                  21.6
7000                  38.9
8000                  50.2
5000                  65.5
2000                  80.8

Finishing times for block accesses using the first-come-first-served algorithm:

Cylinder of Request   Time completed
1000                  8.3
3000                  21.6
7000                  38.9
2000                  58.2
8000                  79.5
5000                  94.8
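The two schedules can be reproduced with a small simulation. This is a sketch under assumptions that are not stated on the slide but are consistent with its numbers: times are in milliseconds, the head starts at cylinder 1000 moving toward higher cylinders, a seek costs 1 ms plus 1 ms per 500 cylinders (and nothing if the head does not move), and every access adds 8.3 ms of rotational latency plus transfer time. All names are illustrative.

```python
LATENCY_PLUS_TRANSFER_MS = 8.3   # assumed: 7.8 ms latency + 0.5 ms transfer

# (cylinder, first time available), from the arrival-times table.
REQUESTS = [(1000, 0), (3000, 0), (7000, 0),
            (2000, 20), (8000, 30), (5000, 40)]


def access_ms(head, cylinder):
    """Time for one block access starting with the head at `head`."""
    distance = abs(cylinder - head)
    seek = 1 + distance / 500 if distance else 0.0
    return seek + LATENCY_PLUS_TRANSFER_MS


def first_come_first_served(requests, head=1000):
    time, finished = 0.0, []
    for cylinder, arrival in sorted(requests, key=lambda r: r[1]):
        time = max(time, arrival) + access_ms(head, cylinder)
        head = cylinder
        finished.append((cylinder, round(time, 1)))
    return finished


def elevator(requests, head=1000):
    time, direction, finished = 0.0, 1, []
    waiting = list(requests)
    while waiting:
        available = [r for r in waiting if r[1] <= time]
        if not available:                 # idle until the next arrival
            time = min(r[1] for r in waiting)
            continue
        # Keep sweeping in the current direction while requests lie ahead.
        ahead = [r for r in available if (r[0] - head) * direction >= 0]
        if not ahead:                     # nothing ahead: reverse the sweep
            direction = -direction
            ahead = available
        request = min(ahead, key=lambda r: abs(r[0] - head))
        waiting.remove(request)
        time += access_ms(head, request[0])
        head = request[0]
        finished.append((head, round(time, 1)))
    return finished


print(elevator(REQUESTS))
print(first_come_first_served(REQUESTS))
```

In this model the elevator algorithm serves cylinder 8000 right after 7000 instead of traveling back to 2000 first, which is where its shorter total time comes from.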

Prefetching and Large-Scale Buffering
Prefetch blocks into main memory in anticipation of their later use. Using track-sized or cylinder-sized output buffers can eliminate seek time and rotational latency.
1. Store the sorted sublists on whole, consecutive cylinders, with the blocks on each track being consecutive blocks of the sorted sublist.
2. Read whole tracks or whole cylinders whenever we need some more records from a given list.
[Figure: Input Buffer 1 and Input Buffer 2 between the disk read and the merge.]
[Figure: Output Buffer 1 and Output Buffer 2 between the merge and the disk write.]

Keeping Hard Disks Safely
- Intermittent failure
- Media decay
- Write failure
- Disk crashes

Intermittent Failures
A disk read returns a pair (W, S):
- W: the data in the sector that is read
- S: a status bit that tells whether or not the read was successful
[Figure: the disk read is repeated while S == "bad"; when S == "good", W is accepted, although we may be fooled.]
[Figure: a disk write is followed by a disk read and a status check.]

Checksums
If there is an odd number of 1's among a collection of bits, we say the bits have odd parity, or that their parity bit is 1. If there is an even number of 1's among a collection of bits, we say the bits have even parity, or that their parity bit is 0.
01101000 -------> 011010001
11101110 -------> 111011100
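A minimal sketch of the parity rule, assuming the bits are given as a string of '0' and '1' characters. The function names are illustrative.

```python
def parity_bit(bits):
    """Return '1' for an odd number of 1's, '0' for an even number."""
    return str(bits.count("1") % 2)


def with_parity(bits):
    """Append the parity bit, so the result always has an even number of 1's."""
    return bits + parity_bit(bits)


print(with_parity("01101000"))  # 011010001
print(with_parity("11101110"))  # 111011100
```

A reader can check a stored sector by recomputing the count of 1's over all nine bits: an odd count means some odd number of bits was corrupted.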

Stable Storage
Checksums will almost certainly detect a media failure or a failure to read or write correctly, but they do not help us correct the error. To deal with these problems, we can implement a policy known as stable storage, in which each sector X is stored as a pair of copies, X_L and X_R.
The stable-storage writing policy:
(1) Write the value of X into X_L. Check that the value has status "good". If not, repeat the write. If the write still fails after a set number of attempts, fix up X_L.
(2) Repeat (1) for X_R.
The stable-storage reading policy:
(1) To obtain the value of X, read X_L. If status "bad" is returned, repeat the read a set number of times. If a value with status "good" is eventually returned, take that value as X.
(2) If we cannot read X_L, repeat (1) with X_R.
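The two policies can be sketched against a simulated sector. Everything here is hypothetical scaffolding: `FlakySector` is a made-up stand-in for a disk sector whose first few writes fail, `MAX_ATTEMPTS` stands for the "set number" of attempts, the write is assumed to report its status directly instead of being checked by a separate read, and the fix-up step is left to the caller.

```python
MAX_ATTEMPTS = 3   # hypothetical value for the "set number" of attempts


class FlakySector:
    """Simulated sector whose first `bad_writes` write attempts fail."""

    def __init__(self, bad_writes=0):
        self.value, self.status = None, "bad"
        self.bad_writes = bad_writes

    def write(self, value):
        if self.bad_writes > 0:
            self.bad_writes -= 1
            self.value, self.status = None, "bad"
        else:
            self.value, self.status = value, "good"
        return self.status

    def read(self):
        return self.value, self.status


def write_copy(sector, value):
    """Write one copy, retrying; False means the copy needs a fix-up."""
    for _ in range(MAX_ATTEMPTS):
        if sector.write(value) == "good":
            return True
    return False


def stable_write(x_left, x_right, value):
    # Write X_L first, then X_R, so the two copies are never written at once.
    return write_copy(x_left, value), write_copy(x_right, value)


def stable_read(x_left, x_right):
    # Try X_L a set number of times, then fall back to X_R.
    for sector in (x_left, x_right):
        for _ in range(MAX_ATTEMPTS):
            value, status = sector.read()
            if status == "good":
                return value
    raise IOError("neither copy could be read")


left, right = FlakySector(bad_writes=5), FlakySector()
print(stable_write(left, right, "new value"))  # (False, True)
print(stable_read(left, right))                # new value
```

In the usage example the left copy never accepts the write, yet the read still succeeds from the right copy.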

Error-Handling Capabilities of Stable Storage
- Media failure: if one copy fails, read the other.
- Write failure: if the failure occurred while writing X_L, copy X_R to X_L; if the failure occurred after writing X_L, copy X_L to X_R.

Recovery from Disk Crashes
RAID (Redundant Arrays of Independent Disks) was developed to reduce the risk of data loss from disk crashes.

RAID 1
Mirroring
[Figure: a data disk and its redundant disk.]

RAID 4
Disk 1: 11110000
Disk 2: 10101010
Disk 3: 00111000
The redundant disk will have the following parity check bits:
Disk 4: 01100010
While mirroring uses as many redundant disks as there are data disks, RAID 4 uses only one redundant disk no matter how many data disks there are.
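The parity check bits are the column-wise modulo-2 sum of the data blocks, which is a bitwise XOR. A minimal sketch, with blocks represented as Python integers and an illustrative function name:

```python
from functools import reduce
from operator import xor


def parity_block(blocks):
    """Bitwise modulo-2 sum (XOR) of the blocks at the same position."""
    return reduce(xor, blocks, 0)


data_disks = [0b11110000, 0b10101010, 0b00111000]
print(format(parity_block(data_disks), "08b"))  # 01100010
```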

Reading
Reading blocks from a data disk is no different from reading blocks from any disk. In some circumstances, we can actually get the effect of two simultaneous reads from one of the data disks. Suppose Disk 1 is busy and we want to read it, while none of the other disks are busy:
Disk 2: 10101010
Disk 3: 00111000
Disk 4: 01100010
If we take the modulo-2 sum of the bits in each column, we obtain the content of Disk 1:
Disk 1: 11110000

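Writing
Disk 1: 11110000
Disk 2: 10101010 -----> 11001100
Disk 3: 00111000
Redundant disk 4: 01100010
The modulo-2 sum of the old and new values of Disk 2 is 10101010 + 11001100 = 01100110. Adding it, modulo 2, to the redundant disk gives the new parity: 01100010 + 01100110 = 00000100.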
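The write rule touches only the changed data disk and the redundant disk. A sketch with an illustrative function name, using XOR for the modulo-2 sum:

```python
def updated_parity(old_parity, old_block, new_block):
    """New parity after one data block changes from old_block to new_block."""
    change = old_block ^ new_block      # bits that flipped in the data block
    return old_parity ^ change          # flip the same bits in the parity


new_parity = updated_parity(0b01100010, 0b10101010, 0b11001100)
print(format(new_parity, "08b"))  # 00000100
```

The result agrees with recomputing the parity from scratch over the three data disks after the change, but needs only two reads and two writes regardless of how many data disks there are.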

Failure Recovery
Disk 1: 11110000
Disk 2: ????????
Disk 3: 00111000
Disk 4: 01100010
Taking the modulo-2 sum of the surviving disks, Disk 2 is: 10101010
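Recovery of a single failed disk can be sketched as follows, assuming the disks are held in a Python list with `None` marking the failed one. The function name is illustrative.

```python
def recover_failed_disk(disks):
    """Rebuild the single missing block (None) from the surviving blocks."""
    failed = [i for i, block in enumerate(disks) if block is None]
    if len(failed) != 1:
        raise ValueError("single parity can rebuild exactly one failed disk")
    rebuilt = 0
    for block in disks:
        if block is not None:
            rebuilt ^= block            # modulo-2 sum of the survivors
    repaired = list(disks)
    repaired[failed[0]] = rebuilt
    return repaired


disks = [0b11110000, None, 0b00111000, 0b01100010]   # disk 2 has failed
print(format(recover_failed_disk(disks)[1], "08b"))  # 10101010
```

The same computation serves the read trick for a busy disk: the busy disk is simply treated as the missing one.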

RAID 5
RAID 4 suffers from a bottleneck that becomes apparent when we re-examine the process of writing a new data block: every write must also update the single redundant disk. RAID 5 removes the bottleneck by treating each disk as the redundant disk for some of the blocks.
[Figure: Disk 1, Disk 2, Disk 3.]

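Coping With Multiple Disk Crashes (RAID 6)
Disks 1-4 are data disks; disks 5-7 are redundant disks. Each row of the matrix is one parity check over the disks marked with a 1.

         Data Disks    Redundant Disks
Disk:    1  2  3  4    5  6  7
         1  1  1  0    1  0  0
         1  1  0  1    0  1  0
         1  0  1  1    0  0  1

a) The columns are every possible column of three 0's and 1's, except for the all-0 column.
b) The columns for the redundant disks have a single 1.
c) The columns for the data disks each have at least two 1's.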

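Writing
Changing disk 2 from 10101010 to 00001111 also changes redundant disks 5 and 6, whose parity checks include disk 2; disk 7 is unchanged.

Disk   Content before   Content after
1)     11110000         11110000
2)     10101010         00001111
3)     00111000         00111000
4)     01000001         01000001
5)     01100010         11000111
6)     00011011         10111110
7)     10001001         10001001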

Failure Recovery
Disk 2 and disk 5 fail. Disk 2 is recovered from disks 1, 4, and 6; disk 5 is then recovered from disks 1, 2, and 3.

Disk   After failure   Disk 2 recovered   Disk 5 recovered
1)     11110000        11110000           11110000
2)     ????????        00001111           00001111
3)     00111000        00111000           00111000
4)     01000001        01000001           01000001
5)     ????????        ????????           11000111
6)     10111110        10111110           10111110
7)     10001001        10001001           10001001
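The two-disk recovery can be sketched in Python. The parity groups below are read off the three rows of the RAID 6 matrix (disks 1, 2, 3, 5; disks 1, 2, 4, 6; disks 1, 3, 4, 7); the dictionary representation, the use of `None` for a failed disk, and the function name are assumptions made for illustration.

```python
from functools import reduce
from operator import xor

# Each tuple lists the disks covered by one parity check (one matrix row).
PARITY_ROWS = [(1, 2, 3, 5), (1, 2, 4, 6), (1, 3, 4, 7)]


def recover(disks):
    """Rebuild failed disks (None) using rows with exactly one missing disk."""
    disks = dict(disks)
    progress = True
    while progress and any(block is None for block in disks.values()):
        progress = False
        for row in PARITY_ROWS:
            missing = [d for d in row if disks[d] is None]
            if len(missing) == 1:
                survivors = (disks[d] for d in row if d != missing[0])
                disks[missing[0]] = reduce(xor, survivors, 0)
                progress = True
    return disks


failed = {1: 0b11110000, 2: None, 3: 0b00111000, 4: 0b01000001,
          5: None, 6: 0b10111110, 7: 0b10001001}
repaired = recover(failed)
print(format(repaired[2], "08b"), format(repaired[5], "08b"))
```

Because any two columns of the matrix differ in some row, there is always a row that covers exactly one of two failed disks; once that disk is rebuilt, the other follows from any row that covers it.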
