Data Storage: Memory Hierarchy, Disks (Source: our textbook)

Slide 2: Strawman Implementation
- Use the UNIX file system to store relations, e.g.
  - Students(name, id, dept) in file /usr/db/Students
- One line per tuple, each component stored as a character string, with # as a separator, e.g.
  - a tuple could be: Smith#123#CS
- Store the schema in /usr/db/schema, e.g.:
  - Students#name#STR#id#INT#dept#STR

Slide 3: Strawman cont'd
- To execute SELECT * FROM R WHERE <condition>:
  1. read the schema file for R
  2. check that <condition> is valid for R
  3. print out column headers
  4. read file R and for each line
     - check the condition
     - print the line if true
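The selection procedure on this slide can be sketched in Python. This is a minimal sketch under assumptions: the function names `load_schema` and `select_all` are hypothetical, in-memory lists of lines stand in for the files /usr/db/schema and /usr/db/R, and the condition is restricted to a single equality test on one attribute.

```python
def load_schema(schema_lines, relation):
    """Return [(attribute, type), ...] for `relation`, parsed from lines
    such as 'Students#name#STR#id#INT#dept#STR'."""
    for line in schema_lines:
        fields = line.strip().split("#")
        if fields[0] == relation:
            rest = fields[1:]
            return list(zip(rest[0::2], rest[1::2]))
    raise KeyError(relation)


def select_all(schema_lines, data_lines, relation, attr, value):
    """SELECT * FROM relation WHERE attr = value, by scanning every line."""
    schema = load_schema(schema_lines, relation)      # step 1: read schema
    names = [name for name, _ in schema]
    if attr not in names:                             # step 2: validate condition
        raise ValueError(f"{attr} is not an attribute of {relation}")
    result = ["#".join(names)]                        # step 3: column headers
    idx = names.index(attr)
    for line in data_lines:                           # step 4: scan the relation
        fields = line.rstrip("\n").split("#")
        if fields[idx] == str(value):
            result.append(line.rstrip("\n"))
    return result


schema = ["Students#name#STR#id#INT#dept#STR"]
students = ["Smith#123#CS", "Jones#456#EE"]
rows = select_all(schema, students, "Students", "dept", "CS")
```

Every query reads the whole relation, which is the cost the following slides criticize.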

Slide 4: Strawman cont'd
- To execute a query involving a join of two relations R and S:
    for each tuple (line) in R do
      for each tuple (line) in S do
        if the condition is satisfied then
          display the desired attributes
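The nested loops of the join slide translate almost directly into Python. This is a sketch only: `nested_loop_join` is a hypothetical name, the join condition is assumed to be equality between one column of R and one column of S, and the `depts` relation is invented for illustration.

```python
def nested_loop_join(r_lines, s_lines, r_idx, s_idx):
    """Join R and S on R[r_idx] == S[s_idx] by comparing every pair of lines."""
    result = []
    for r_line in r_lines:                 # for each tuple (line) in R
        r = r_line.split("#")
        for s_line in s_lines:             # for each tuple (line) in S
            s = s_line.split("#")
            if r[r_idx] == s[s_idx]:       # the join condition
                result.append(r + s)
    return result


students = ["Smith#123#CS", "Jones#456#EE"]
depts = ["CS#Bldg1", "EE#Bldg2"]           # hypothetical Depts(name, building)
joined = nested_loop_join(students, depts, 2, 0)
```

The inner loop runs once per line of R, so the work grows with the product of the two relation sizes.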

Slide 5: What's Wrong?
- The storage of the tuples on disk is inflexible: if a student changes major from EE to ECON, the entire file must be rewritten
- Search is very expensive (read entire relation)
- Query processing is "brute force"; there are faster ways to do joins, etc.
- Data is not buffered between disk and main memory
- No concurrency control
- No reliability in case of a crash

Slide 6: How to Fix these Problems
- Take advantage of the characteristics of computer hardware with clever algorithms to do things better
- We will cover
  - data storage (predominantly disks)
  - how to represent data elements
  - indexes
  - query optimization
  - failure recovery
  - concurrency control

Slide 7: Memory Hierarchy
From faster, smaller, and more expensive (top) to slower, larger, and cheaper (bottom):
- cache
- main memory
- secondary storage (disk)
- tertiary storage (tapes, CD-ROM)

Slide 8: Cache Memory
- Transfer a few bytes at a time between cache and main memory: instruction, integer, floating point, short string
- Processor operates on instructions and data in the cache
- Typical size: 1 Mbyte (2^20 bytes)
- Typical speed to/from main memory: 100 nanosec (1 nanosec = 10^-9 sec)

Slide 9: Main Memory
- Typical size: 100 Mbytes to 10 Gbytes (1 Gbyte = 2^30 bytes)
- Typical access speed (to read or write): 10 to 100 nanosec
- At least 100 times larger than cache
- At least 10 times slower than cache

Slide 10: Secondary Storage
- Usually disk
- Divided logically into blocks, the unit of transfer between disk and main memory (a transfer is called a disk I/O)
- Typical size: 100 Gbytes
- Typical speed: 10 millisec (1 millisec = 10^-3 sec)
- At least 100 times larger than main memory
- Much slower than main memory and much, much slower than cache: can execute several million instructions during one disk I/O

Slide 11: Tertiary Storage
- Tape(s)
- CD-ROM(s)
- At least 1000 times slower than secondary storage
- At least 1000 times larger than secondary storage

Slide 12: Volatile vs. Nonvolatile
- Storage is volatile if the data is lost when the power is gone
- Usually main memory is volatile
- Usually secondary and tertiary storage is nonvolatile
- Thus every change made to a database in main memory must be backed up on disk before it can be permanent.

Slide 13: Disks
[Figure: disk assembly showing the spindle, disk heads, and platters]
- platters: each has two surfaces; each surface consists of tracks (concentric rings)
- one head per surface, very close to the surface, does the reading and writing

Slide 14: More on Disks
[Figure: top view of a disk surface]
- orange ring is a track
- black squares are gaps, which don't hold data
- part of a track between two gaps is a sector
- one or more sectors make a block

Slide 15: Disk Controller
- controls the mechanical actuator that moves the heads in and out (radius, distance from spindle)
  - one track from each surface at the same radius forms a cylinder
- selects a surface
- selects a sector (senses when that sector is under the corresponding head)
- transfers bits

Slide 16: Typical Values
- Rotation speed: 5400 rpm
- Number of platters: 5
- Number of tracks/surface: 20,000
- Number of sectors/track: 500
- Number of bytes/sector: thousands

Slide 17: Disk Latency for a Read
- Time between issuing the command to read a block and when the contents of the block appear in main memory:
  - time for processor and disk controller to process the request, including resolving any contention (negligible)
  - seek time: time to move heads to the correct radius (0 to 40 millisec)
  - rotational latency: time until the first sector of the block is under the head (5 millisec)
  - transfer time: time until all sectors of the block have passed under the head; depends on rotation speed and size of block
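A rough worked estimate of read latency, using the typical values from the earlier slide (5400 rpm, 500 sectors per track). The 20 millisec seek (midpoint of the 0 to 40 millisec range) and the 8-sector block are assumptions chosen for illustration, gaps between sectors are ignored, and the function names are hypothetical.

```python
def rotation_time_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm


def read_latency_ms(seek_ms, rpm, sectors_per_track, sectors_per_block):
    """Seek time + average rotational latency + transfer time for one block."""
    rotation = rotation_time_ms(rpm)
    rotational_latency = rotation / 2          # on average, half a rotation
    transfer = rotation * sectors_per_block / sectors_per_track
    return seek_ms + rotational_latency + transfer


latency = read_latency_ms(seek_ms=20.0, rpm=5400,
                          sectors_per_track=500, sectors_per_block=8)
print(f"estimated read latency: {latency:.2f} ms")
```

At 5400 rpm a rotation takes about 11.1 millisec, so the average rotational latency of about 5.6 millisec is in line with the 5 millisec figure on the slide, and the transfer time is small next to the seek.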

Slide 18: Disk Latency for Updates
- For a write: like reading plus verification (read back and compare)
- To modify a block:
  - read it into main memory
  - change it in main memory
  - write it back to disk

Slide 19: Moral of the Story
- Disk accesses are orders of magnitude slower than accesses to main memory.
- They are unavoidable in large databases.
- Thus do everything possible to minimize them.
- Can lead to different algorithms than for the main memory model.

Slide 20: Speeding Up Disk Accesses
1. Place blocks accessed together on the same cylinder
   - reduces seek time and rotational latency
2. Divide data among several disks
   - head assemblies can move in parallel
3. Mirror a disk: make copies of it
   - speeds up reads: get data from the disk whose head is closest to the desired block
   - no effect on writes: write to all copies
   - also helps with fault tolerance

Slide 21: Speeding up Disk Accesses
4. Be clever about the order in which read and write requests are serviced, i.e., algorithm in OS or DBMS or disk controller
   - Ex: elevator algorithm
5. Prefetch blocks to main memory in anticipation of future use (buffering)

Slide 22: Elevator Algorithm
- Works well when there are many "independent" read and write requests, i.e., requests that don't need to be done in a particular order, that are randomly distributed over the disk.
- Disk head assembly sweeps in and out repeatedly
- When heads pass a cylinder with pending requests, they stop to do the request
- When reaching a point with no pending requests ahead, change direction
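The sweep described on this slide can be sketched as a function that returns the order in which cylinders are serviced. This is a simplified sketch: it assumes all requests are known up front (no new arrivals during a sweep), treats duplicate cylinder numbers as one stop, and the name `elevator_order` and its parameters are hypothetical.

```python
def elevator_order(start, requests, direction=1):
    """Return the service order of the requested cylinders.

    start:     cylinder where the heads currently are
    requests:  pending cylinder numbers
    direction: +1 if the heads are moving toward higher cylinder
               numbers, -1 if toward lower ones
    """
    pending = sorted(set(requests))
    order = []
    pos = start
    while pending:
        if direction > 0:
            ahead = [c for c in pending if c >= pos]
        else:
            ahead = [c for c in reversed(pending) if c <= pos]
        if not ahead:
            direction = -direction     # nothing ahead: change direction
            continue
        for cylinder in ahead:         # stop at each cylinder passed
            order.append(cylinder)
            pending.remove(cylinder)
        pos = order[-1]
    return order


order = elevator_order(start=50, requests=[10, 95, 60, 30, 70], direction=1)
print(order)
```

Starting at cylinder 50 and moving toward higher numbers, the heads service 60, 70, and 95, then reverse and service 30 and 10.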

Slide 23: Prefetching
- Suppose you can predict the order in which blocks will be requested from disk.
- Load them into main memory buffers before they are needed.
- Have flexibility to schedule the reads efficiently
- Can also delay writing buffered blocks if the buffers are not needed immediately

Slide 24: Disk Failures
- Intermittent failure: attempt to read or write a sector fails but a subsequent try succeeds
- Impossible to read a sector
- Impossible to write a sector
- Disk crash: entire disk becomes unreadable

Slide 25: Coping with Intermittent Failures
- Use redundant bits in each sector
- Store checksums in the redundant bits
- After a read, check if the checksums are correct; if not, then try again
- After a write, can do a read and compare with the value written, or be optimistic and just check the checksum of the read

Slide 26: Checksums
- Suppose we use one extra bit, a parity bit.
  - if the number of 1's in the data bits is odd, then set the parity bit to 1, otherwise to 0
- This is not foolproof: 101 and 110 both have even parity, so the checksum would be 0 for both
- Use n parity bits in the checksum:
  - parity bit 1 stores the parity of every n-th bit, starting with the first bit,
  - parity bit 2 stores the parity of every n-th bit, starting with the second bit, etc.
  - Probability of missing an error is 1/2^n
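Both checksum schemes from this slide can be sketched in a few lines of Python. Bits are represented as lists of 0/1 integers, and the function names `parity_bit` and `interleaved_parity` are hypothetical.

```python
def parity_bit(bits):
    """1 if the number of 1's in `bits` is odd, otherwise 0."""
    return sum(bits) % 2


def interleaved_parity(bits, n):
    """n parity bits: parity bit k covers every n-th bit starting at bit k
    (k counted from 0 here, from 1 on the slide)."""
    return [sum(bits[k::n]) % 2 for k in range(n)]


a = [1, 0, 1]
b = [1, 1, 0]

# A single parity bit cannot tell 101 and 110 apart: both give 0.
same_single = parity_bit(a) == parity_bit(b)

# Two interleaved parity bits do distinguish this pair.
p_a = interleaved_parity(a, 2)
p_b = interleaved_parity(b, 2)
print(same_single, p_a, p_b)
```

With n = 2 the two strings get different checksums, although other pairs of strings can still collide, which is why the miss probability is 1/2^n rather than zero.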

Slide 27: Coping with Permanent Read/Write Errors
- Stable storage policy:
- Each "virtual" sector X is represented by two real sectors, X_L and X_R.
- To write value v to X:
  - repeat {write v to X_L, read from X_L} until the read's checksum is correct or the max # of tries is exceeded
  - do the same thing with X_R
  - if X_L or X_R is discovered to be bad, then must find a substitute

Slide 28: Handling Write Failures
- Suppose write(s) to X_L all fail.
  - Then the old value is safe in X_R.
- Suppose write(s) to X_R all fail.
  - Then the new value is safe in X_L.
- Assumption is that it is highly unlikely for two sectors to fail around the same time.

Slide 29: More on Stable Storage
- To read from X:
  - repeatedly read X_L until the checksum is good or the max # of tries is exceeded
  - if the read of X_L failed, then repeatedly read X_R until the checksum is good or the max # of tries is exceeded
- Handles permanent read failures, unless both X_L and X_R fail about the same time (unlikely)
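The stable storage write and read policies can be simulated in memory. This is a sketch under stated assumptions: the `Sector` class, its `bad` flag, and the retry limit `MAX_TRIES` are hypothetical stand-ins for real hardware, CRC-32 from Python's `zlib` module is swapped in for the parity checksums of the earlier slide, and a write is verified by reading back and comparing with the value written.

```python
import zlib

MAX_TRIES = 3  # hypothetical retry limit


class Sector:
    """Simulated sector holding a value and its checksum.
    Setting `bad` makes every later read and write fail."""

    def __init__(self):
        self.value = None
        self.checksum = None
        self.bad = False

    def write(self, value):
        if not self.bad:
            self.value = value
            self.checksum = zlib.crc32(value)

    def read(self):
        """Return the stored value if its checksum verifies, else None."""
        if self.bad or self.value is None:
            return None
        if zlib.crc32(self.value) != self.checksum:
            return None
        return self.value


def write_sector(sector, value):
    """Repeat {write, read back} until the read matches or tries run out."""
    for _ in range(MAX_TRIES):
        sector.write(value)
        if sector.read() == value:
            return True
    return False  # sector is bad: a substitute must be found


def stable_write(x_l, x_r, value):
    """Write X_L first, then X_R; report which writes succeeded."""
    return write_sector(x_l, value), write_sector(x_r, value)


def stable_read(x_l, x_r):
    """Try X_L up to MAX_TRIES times, then fall back to X_R."""
    for sector in (x_l, x_r):
        for _ in range(MAX_TRIES):
            value = sector.read()
            if value is not None:
                return value
    return None


x_l, x_r = Sector(), Sector()
stable_write(x_l, x_r, b"old")
x_l.bad = True                       # X_L fails permanently
status = stable_write(x_l, x_r, b"new")
recovered = stable_read(x_l, x_r)
```

Here the write to X_L fails but the write to X_R succeeds, so the read falls back to X_R and returns the new value.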

Slide 30: Coping with Disk Crashes
- "Mean time to failure" of a disk is the length of time by which 50% of such disks will have had a head crash
- Goal is to have a much longer "mean time to data loss" for your system
- Key idea: use redundancy
- Discuss three such approaches next…

Slide 31: Mirroring (RAID Level 1)
- Keep another copy of each disk: write to both, read from one.
- Only way data can be lost is if the second disk crashes while the first is being repaired.
- If the mean time to crash of a single disk is 10 years and it takes 3 hours to repair a disk, then the mean time to data loss is 146,000 years.
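The 146,000-year figure can be reproduced with a short calculation. This is one way to arrive at it, assuming disk failures are independent and occur at a uniform rate of one per mean-time-to-crash; the function name and the 365-day year are assumptions.

```python
HOURS_PER_YEAR = 365 * 24


def mean_time_to_data_loss_years(mttf_years, repair_hours, disks=2):
    """Estimate mean time to data loss for a mirrored pair of disks."""
    # With two disks, one of them crashes on average every mttf/2 years.
    years_between_crashes = mttf_years / disks
    # Probability that the surviving disk crashes during the repair window.
    repair_years = repair_hours / HOURS_PER_YEAR
    p_second_crash = repair_years / mttf_years
    # Data is lost only when a crash is followed by a second one in time.
    return years_between_crashes / p_second_crash


mttdl = mean_time_to_data_loss_years(mttf_years=10, repair_hours=3)
print(f"mean time to data loss: {mttdl:,.0f} years")
```

Three hours is 1/2920 of a year, so the chance of a second crash during a repair is 1/29,200; with a first crash every 5 years on average, data loss is expected every 5 x 29,200 = 146,000 years.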

Slide 32: Parity Blocks (RAID Level 4)
- Drawback of the previous scheme is that you need double the number of disks.
- Instead use one spare disk no matter how many data disks you have.
- Block i of the spare disk contains the parity checks for block i of all the data disks.
- If the spare disk fails, get a new spare.
- If a data disk fails, recompute its data from the other data disks and the spare.
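Parity across disks is a bytewise XOR, which is what makes recovery possible: XOR-ing the surviving blocks with the parity block yields the missing one. A minimal sketch, with a hypothetical helper `xor_blocks` and made-up two-byte blocks standing in for block i of three data disks:

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR (modulo-2 sum) of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Block i of three data disks (illustrative values).
data_disks = [b"\x0f\xa0", b"\x33\x55", b"\xf0\x0f"]

# Block i of the spare disk holds the parity of block i on all data disks.
spare = xor_blocks(data_disks)

# Suppose data disk 1 crashes: rebuild its block from the others and the spare.
survivors = [data_disks[0], data_disks[2], spare]
rebuilt = xor_blocks(survivors)
```

The same function computes the parity and performs the recovery, because XOR-ing a value in twice cancels it out.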

Slide 33: RAID Level 5
- Drawback of the previous scheme is that the spare disk is a bottleneck.
- Instead, let each data disk also serve as the spare disk for some blocks.
All these schemes assume only one crash at a time. RAID Level 6 uses error-correcting codes to be able to handle multiple crashes.
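For RAID Level 5, the slide does not say which disk holds the parity for which block. One simple possibility, shown here purely as an assumption, is a round-robin assignment in which the parity for block i lives on disk i mod n; the function name `parity_disk` is hypothetical.

```python
def parity_disk(block, num_disks):
    """Disk that stores the parity for `block` under round-robin placement
    (an assumed layout; the slide does not specify one)."""
    return block % num_disks


# With 4 disks, parity duty rotates evenly over all of them.
layout = [parity_disk(i, 4) for i in range(8)]
print(layout)
```

Because consecutive blocks put their parity on different disks, parity updates are spread over all disks instead of queuing at a single spare.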
