Chapter 2 Data Storage: How does a computer system store and manage very large volumes of data?

Outline
- Memory Hierarchy
- Using Hard Disks Efficiently
- Accessing Hard Disks Quickly
- Keeping Hard Disks Safe
- Mechanics of Hard Disks

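The Memory Hierarchy
[Figure: The memory hierarchy, from top to bottom: cache; main memory (used by programs and main-memory DBMSs); secondary storage (disk, used as virtual memory, through the file system, or by a DBMS); tertiary storage. Moving down the hierarchy, speed goes from fast to slow and cost from high to low, while capacity goes from small to large.]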

Cache
- Capacity: up to 1 megabyte
- Access time between cache and processor: about 10 nanoseconds
- Access time between cache and main memory: about 100 nanoseconds

Main Memory
- Capacity: up to 10 gigabytes
- Random access
- Access time in the nanosecond range

Virtual Memory
- Most machines use a 32-bit address space, which covers up to 4 gigabytes (2^32 bytes).
- Main memory is usually 256 megabytes.
- Virtual memory is supported by the machine hardware and the operating system through a paging mechanism.
- A main-memory database system can be implemented using virtual memory.

Secondary Storage
- Significantly more capacious than main memory
- Significantly cheaper than main memory
- Significantly slower than main memory
- Magnetic disks are usually used as secondary storage.

Tertiary Storage
- Data volumes measured in terabytes
- Slower and cheaper than secondary storage
- Access times vary widely
- Common tertiary storage devices: ad-hoc tape storage, optical-disk jukeboxes, and tape silos

Volatile and Nonvolatile Storage
A volatile device "forgets" its contents when the power goes off; main memory is an example. A nonvolatile device keeps its contents intact even in the presence of power failures; examples are magnetic disks, tapes, and flash memory.

[Figure: Access time versus capacity for various levels of the memory hierarchy: cache, main memory, zip disk, floppy disk, secondary storage, and tertiary storage. The horizontal axis measures access time in seconds, in powers of 10; the vertical axis measures capacity in bytes, in powers of 10.]

Mechanics of Disks
[Figure: A typical disk, showing the platters (each with 2 surfaces), the disk heads, and a cylinder; and a top view of a disk surface, showing tracks, sectors, and gaps.]

Disk Controller
The disk controller is responsible for:
- Moving the disk heads and positioning them at a particular radius
- Selecting a surface, and selecting a sector from the track on that surface that is under the head
- Transferring the data

[Figure: Schematic of a simple computer system: the processor, main memory, and disk controller are connected by a bus, and the disk controller manages the disks.]

Disk Storage Characteristics
The typical measures:
- Rotation speed of the disk assembly
- Number of platters per unit
- Number of tracks per surface
- Number of bytes per track
Example: the Megatron 747's characteristics:
- The disk assembly rotates at 3840 RPM (one rotation every 1/64th of a second).
- There are four platters, providing eight surfaces.
- There are 8192 tracks per surface.
- There are (on average) 256 sectors per track and 512 bytes per sector.

Capacity of the Megatron 747
$$8 \text{ surfaces} \times 8192 \text{ tracks} \times 256 \text{ sectors} \times 512 \text{ bytes} = 2^{33} \text{ bytes} = 8 \text{ gigabytes}$$
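The capacity product can be checked directly. This sketch hard-codes the Megatron 747 geometry listed above and treats the 256 sectors per track as exact, although the slide gives it only as an average.

```python
# Megatron 747 geometry from the slide (256 sectors/track is an average).
SURFACES = 8
TRACKS_PER_SURFACE = 8192
SECTORS_PER_TRACK = 256
BYTES_PER_SECTOR = 512

def capacity_bytes():
    """Total bytes = surfaces x tracks x sectors x bytes per sector."""
    return SURFACES * TRACKS_PER_SURFACE * SECTORS_PER_TRACK * BYTES_PER_SECTOR

cap = capacity_bytes()
print(cap, cap == 2**33, cap / 2**30)  # 8589934592 True 8.0
```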

Block Address
A block is addressed by: physical device, cylinder number, surface number, sector number.
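A minimal sketch of turning a logical sector number into a (device, cylinder, surface, sector) address. It assumes every track has exactly 256 sectors and that sectors are numbered so a whole cylinder (all 8 surfaces) fills before the next cylinder is used; the device name `"disk0"` is hypothetical.

```python
from typing import NamedTuple

SURFACES = 8            # Megatron 747
SECTORS_PER_TRACK = 256  # assumption: every track has exactly 256 sectors

class BlockAddress(NamedTuple):
    device: str
    cylinder: int
    surface: int
    sector: int

def to_address(device: str, n: int) -> BlockAddress:
    """Map logical sector n to a physical address, filling all surfaces
    of a cylinder before moving on to the next cylinder (assumed layout)."""
    per_cylinder = SURFACES * SECTORS_PER_TRACK
    cylinder, rest = divmod(n, per_cylinder)
    surface, sector = divmod(rest, SECTORS_PER_TRACK)
    return BlockAddress(device, cylinder, surface, sector)

print(to_address("disk0", 5000))
# BlockAddress(device='disk0', cylinder=2, surface=3, sector=136)
```

Numbering a cylinder completely before moving on means consecutive sectors need no seek, which is the idea behind organizing data by cylinders later in this chapter.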

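Disk Access Characteristics
Disk access time = seek time + rotational delay + transfer time + other delays
[Figure: The cause of rotational latency: the head is over the right track, but must wait for the block we want to rotate under it.]
[Figure: Seek time versus cylinders traveled: seek time varies with the distance traveled, from x for the shortest move to a maximum (MAX) that is typically in the range 3x to 20x.]
[Figure: Average travel distance as a function of the starting track (initial head position).]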

Average Random Seek Time
$$S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SEEKTIME}(i \rightarrow j)}{N(N-1)}$$
where N is the number of cylinders. "Typical" S: 10 ms to 40 ms
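The double sum has N(N-1) terms, but exactly 2(N - d) ordered pairs of cylinders are d apart, so it collapses to a single sum over d. This sketch evaluates that sum using the seek model from the Megatron 747 example (1 ms to start and stop plus 1 ms per 500 cylinders), which is assumed here to hold for every distance.

```python
def average_seek_time(n_cylinders, seek_time):
    """Average of seek_time(|i - j|) over all ordered pairs of distinct cylinders.

    Instead of the N(N-1)-term double sum, group pairs by distance d:
    exactly 2 * (N - d) ordered pairs are d cylinders apart.
    """
    n = n_cylinders
    total = sum(2 * (n - d) * seek_time(d) for d in range(1, n))
    return total / (n * (n - 1))

def megatron_seek_ms(distance):
    # Assumed seek model from the Megatron 747 example:
    # 1 ms to start and stop, plus 1 ms per 500 cylinders traveled.
    return 1 + distance / 500

print(f"{average_seek_time(8192, megatron_seek_ms):.2f} ms")  # 6.46 ms
```

The average distance works out to (N + 1)/3 cylinders, which is why the Megatron example uses roughly 8192/3.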

Average Rotational Delay
R = 1/2 revolution
"Typical" R = 8.33 ms at 3600 RPM (one revolution takes 60/3600 s ≈ 16.67 ms)

Transfer Rate: t
"Typical" t: 1 to 3 MB/second
Transfer time: block size / t

Other Delays
- CPU time to issue the I/O
- Contention for the controller
- Contention for the bus and memory
"Typical" value: 0

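Example: Average Time to Read a 4096-Byte Block from the Megatron 747
The Megatron 747 rotates at 3840 RPM, so it makes one rotation in 1/64th of a second. Its heads take 1 millisecond to start and stop, plus 1 additional millisecond for every 500 cylinders traveled. A 4096-byte block occupies 8 sectors and the 7 gaps between them; the gaps take up 36 of the 360 degrees of each track, and the 256 sectors the remaining 324 degrees.
- Seek time: the average distance traveled is about 8192/3 ≈ 2730 cylinders, so 1 + 2730/500 ≈ 6.5 milliseconds
- Rotational latency: half a rotation, (1/64)/2 × 1000 ≈ 7.8 milliseconds
- Transfer time: the block covers (36 × 7 + 324 × 8)/256 ≈ 11.1 degrees, so 11.1/360 × (1/64) × 1000 ≈ 0.5 milliseconds
The average time to read the block is therefore about 6.5 + 7.8 + 0.5 = 14.8 ms.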
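The same arithmetic as a small function. It is a sketch under the example's assumptions: gaps occupy 36 of the 360 degrees of every track, the average seek covers N/3 cylinders, and "other delays" are zero.

```python
RPM = 3840
CYLINDERS = 8192
SECTORS_PER_TRACK = 256
BYTES_PER_SECTOR = 512
GAP_DEGREES = 36  # assumption from the example: gaps use 36 of each track's 360 degrees

def megatron_read_ms(block_bytes=4096):
    """Return (seek, rotational latency, transfer) in ms for one random block read."""
    rotation_ms = 60_000 / RPM                   # 15.625 ms per rotation
    seek_ms = 1 + (CYLINDERS / 3) / 500          # average travel of about N/3 cylinders
    latency_ms = rotation_ms / 2                 # on average, wait half a rotation
    sectors = block_bytes // BYTES_PER_SECTOR    # 8 sectors for 4096 bytes
    gaps = sectors - 1                           # the 7 gaps between those sectors
    degrees = (GAP_DEGREES * gaps + (360 - GAP_DEGREES) * sectors) / SECTORS_PER_TRACK
    transfer_ms = degrees / 360 * rotation_ms    # fraction of a rotation under the head
    return seek_ms, latency_ms, transfer_ms

parts = megatron_read_ms()
print([round(p, 1) for p in parts], round(sum(parts), 1))  # [6.5, 7.8, 0.5] 14.8
```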

Cost for Writing
Writing a block costs about the same as reading it, unless we want to verify the write. Verification requires adding a full rotation plus (block size)/t to read the block back.

To Modify a Block
(a) Read the block
(b) Modify it in memory
(c) Write the block
[(d) Verify?]

Using Hard Disks Efficiently
A disk access takes much longer than the time we are likely to spend manipulating the data in main memory, so when designing algorithms we need to limit the number of disk accesses.

The I/O Model of Computation
Dominance of I/O cost: when the data is so large that it does not fit in main memory, reading and writing disk blocks between disk and memory often takes much longer than processing the data once it is in main memory. Algorithms therefore need to change under the I/O model: the evaluation of algorithms for data in secondary storage focuses on the number of disk I/Os they require.

Sorting Data in Secondary Storage
There are a number of well-known algorithms for sorting data in main memory. However, when the data is much larger than main memory, we should consider how to reduce the number of times each block is moved between main memory and secondary storage.

Merging two sorted lists to make one sorted list.
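A minimal sketch of the two-way merge: each list is scanned once, front to back, which is what makes merging attractive when the lists live on disk and can be read block by block.

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list in a single pass."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        # Take the smaller front element; ties go to `a`, keeping the merge stable.
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # One list is exhausted; the rest of the other is already in order.
    out.extend(a[i:])
    out.extend(b[j:])
    return out

print(merge([1, 4, 9], [2, 3, 10, 11]))  # [1, 2, 3, 4, 9, 10, 11]
```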

Two-Phase, Multiway Merge-Sort
Phase 1: Repeatedly fill main memory with a piece of the data, sort it, and write it out as a sorted sublist.
Phase 2: Merge all the sorted sublists into a single sorted list.

[Figure: Main-memory organization for multiway merging: one input buffer for each sorted sublist, with a pointer to the first unchosen record of each; the smallest unchosen record is selected and moved to the output buffer.]
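An in-memory sketch of both phases. The parameter `memory_records` is an assumption standing in for M/R (how many records fit in memory). A heap plays the role of "select smallest unchosen", and the (sublist, position) pair plays the role of each sublist's pointer; a real implementation would write the sublists to disk and read them back one block at a time.

```python
import heapq

def two_phase_multiway_merge_sort(records, memory_records):
    """In-memory sketch of Two-Phase, Multiway Merge-Sort."""
    # Phase 1: sort main-memory-sized pieces, producing sorted sublists.
    sublists = [sorted(records[i:i + memory_records])
                for i in range(0, len(records), memory_records)]

    # Phase 2: the heap holds the first unchosen record of each sublist,
    # as (record, sublist index, position in that sublist).
    heap = [(sub[0], k, 0) for k, sub in enumerate(sublists)]
    heapq.heapify(heap)
    output = []
    while heap:
        record, k, pos = heapq.heappop(heap)  # smallest unchosen record
        output.append(record)                 # move it to the output
        if pos + 1 < len(sublists[k]):        # advance that sublist's pointer
            heapq.heappush(heap, (sublists[k][pos + 1], k, pos + 1))
    return output

print(two_phase_multiway_merge_sort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0], 3))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```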

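How Large a Set of Records Can Be Sorted
Let the block size be B bytes, the main-memory size M bytes, and the record size R bytes. Phase 1 produces sorted sublists of M/R records each; Phase 2 needs one input buffer per sublist plus one output buffer, so it can merge at most M/B - 1 sublists. The total number of records that can be sorted is therefore
$$\frac{M}{R}\left(\frac{M}{B} - 1\right)$$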
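The bound as a function, using integer division since sublists and buffers come in whole units. The sizes in the usage line (64 MiB of memory, 4096-byte blocks, 100-byte records) are hypothetical, chosen only for illustration; they give roughly 1.1 × 10^10 records.

```python
def max_sortable_records(M, B, R):
    """Records that Two-Phase, Multiway Merge-Sort can handle with M bytes of
    main memory, B-byte blocks, and R-byte records."""
    records_per_sublist = M // R  # Phase 1: each sorted sublist fills memory
    max_sublists = M // B - 1     # Phase 2: one input buffer per sublist + one output buffer
    return records_per_sublist * max_sublists

# Hypothetical sizes: 64 MiB of memory, 4096-byte blocks, 100-byte records.
print(max_sortable_records(64 * 2**20, 4096, 100))  # 10994434704
```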

Accessing Hard Disks Quickly
- Organizing data by cylinders
- Using multiple disks
- Mirroring disks
- Disk scheduling and the elevator algorithm
- Prefetching and large-scale buffering

Organizing Data by Cylinders
Disk access time = seek time + rotational delay + transfer time ≈ 6.5 ms + 7.8 ms + 0.5 ms for the Megatron 747.
- With blocks distributed randomly on the disk, sorting 10,000,000 records by Two-Phase, Multiway Merge-Sort takes about 250 minutes.
- With the blocks organized by cylinders, the first phase takes about 2.15 minutes and the second phase about 125 minutes.
Place blocks that are accessed together on the same cylinder, so that we can often avoid seek time, and possibly rotational latency.
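A rough model of why cylinder organization helps, using the Megatron 747 timings from the example above. It assumes consecutive 4096-byte blocks on one cylinder (8 tracks × 32 blocks = 256 blocks) can be read back to back after a single seek and a single rotational wait, and it ignores head-switch delays.

```python
# Megatron 747 timings from the worked example (milliseconds).
SEEK_MS, LATENCY_MS, TRANSFER_MS = 6.46, 7.81, 0.48
BLOCKS_PER_CYLINDER = 256  # assumption: 8 tracks x 32 blocks of 4096 bytes

def random_read_ms(k):
    """Each of k randomly placed blocks pays its own seek, latency and transfer."""
    return k * (SEEK_MS + LATENCY_MS + TRANSFER_MS)

def cylinder_read_ms(k):
    """k consecutive blocks on one cylinder: one seek and one rotational wait,
    then back-to-back transfers (head-switch delays ignored)."""
    if k > BLOCKS_PER_CYLINDER:
        raise ValueError("this sketch models a single cylinder only")
    return SEEK_MS + LATENCY_MS + k * TRANSFER_MS

print(round(random_read_ms(256)), round(cylinder_read_ms(256)))  # 3776 137
```

Under these assumptions, reading a full cylinder is more than 25 times faster than reading the same number of scattered blocks, which is the effect behind the Phase 1 speedup on the slide.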

Using Multiple Disks
Example: replace one Megatron 747 (four platters, eight surfaces) with four Megatron 737s (one platter, two surfaces each). For Two-Phase, Multiway Merge-Sort:
1. Phase 1 speeds up about 4 times.
2. Phase 2 speeds up about 2 to 3 times.
Dividing the data among several smaller disks rather than one large one gives more head assemblies, which can go after blocks independently and increase the number of block accesses per unit time.

Mirroring Disks
- Enhances reliability
- Speeds up reading, since either copy can serve a read, but not writing, since every write must go to all copies

Disk Scheduling and the Elevator Algorithm
[Tables: arrival times (cylinder of request, first time available) for six block-access requests; finishing times (cylinder of request, time completed) for those block accesses using the elevator algorithm; and finishing times using the first-come-first-served algorithm.]
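A simplified sketch of the elevator algorithm's service order. Unlike the slide's tables, it assumes all requests are already pending (no arrival times): the heads sweep in one direction serving every request they reach, then reverse when no requests remain ahead. The request cylinders are hypothetical.

```python
def elevator_order(start, requests, moving_up=True):
    """Order in which the elevator algorithm serves pending cylinder requests:
    sweep in the current direction serving each request reached, then
    reverse and sweep back for the rest."""
    if moving_up:
        ahead = sorted(c for c in requests if c >= start)
        behind = sorted((c for c in requests if c < start), reverse=True)
    else:
        ahead = sorted((c for c in requests if c <= start), reverse=True)
        behind = sorted(c for c in requests if c > start)
    return ahead + behind

def cylinders_traveled(start, order):
    """Total head movement when requests are served in the given order."""
    total, position = 0, start
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return total

requests = [6000, 1000, 7000, 2500, 4000]  # hypothetical, in arrival order
scan = elevator_order(3000, requests)
print(scan, cylinders_traveled(3000, scan))          # [4000, 6000, 7000, 2500, 1000] 10000
print(requests, cylinders_traveled(3000, requests))  # [6000, 1000, 7000, 2500, 4000] 20000
```

Serving the requests first-come-first-served makes the heads travel twice as far here, because they keep crossing the disk back and forth.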

Prefetching and Large-Scale Buffering
Prefetch blocks into main memory in anticipation of their later use.
[Figure: Two input buffers feed the merge and are filled by disk reads; two output buffers are filled by the merge and emptied by disk writes.]
Using track-sized or cylinder-sized output buffers can eliminate seek time and rotational latency:
1. Store the sorted sublists on whole, consecutive cylinders, with the blocks on each track being consecutive blocks of the sorted sublist.
2. Read whole tracks or whole cylinders whenever we need more records from a given list.
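A minimal sketch of prefetching with a background thread and a bounded queue. `read_block(i)` is a hypothetical callback standing in for a disk read; while the consumer (for example, the merge) works on one block, the thread is already fetching the next ones.

```python
import queue
import threading

def prefetching_reader(read_block, n_blocks, depth=2):
    """Yield blocks 0..n_blocks-1 in order while a background thread reads ahead.

    depth bounds how many fetched blocks may wait in memory ahead of the
    consumer; depth=2 gives a double-buffering effect.
    """
    buffers = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the input

    def prefetch():
        try:
            for i in range(n_blocks):
                buffers.put(read_block(i))  # waits while the buffers are full
        except Exception as exc:
            buffers.put(exc)                # hand the error to the consumer
        finally:
            buffers.put(done)

    threading.Thread(target=prefetch, daemon=True).start()
    while (item := buffers.get()) is not done:
        if isinstance(item, Exception):
            raise item
        yield item

disk = [bytes([i]) * 4 for i in range(5)]  # hypothetical 4-byte blocks
for block in prefetching_reader(lambda i: disk[i], len(disk)):
    print(block)
```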

Keeping Hard Disks Safe
Kinds of failure: intermittent failures, media decay, write failures, and disk crashes.

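Intermittent Failures
A disk read returns a pair (W, S): W is the data in the sector that is read, and S is a status bit that tells whether or not the read was successful. If S == "bad", the read failed; if S == "good", we accept W, but we may be fooled, because a "good" status does not guarantee that W is correct.
[Figure: Disk writing, disk reading, and status checking.]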

Checksums
If there is an odd number of 1's among a collection of bits, we say the bits have odd parity, or that their parity bit is 1. If there is an even number of 1's among a collection of bits, we say the bits have even parity, or that their parity bit is 0.
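A minimal sketch of a single parity bit used as a checksum: the parity bit is appended so the stored bits always have even parity, and a read is accepted only if that still holds. A single parity bit detects any odd number of flipped bits but misses an even number.

```python
def parity_bit(bits):
    """1 if the bits contain an odd number of 1s (odd parity), else 0."""
    return sum(bits) % 2

def with_parity(bits):
    """Append the parity bit, so the stored sequence has even parity."""
    return bits + [parity_bit(bits)]

def looks_ok(stored):
    """The check passes iff the stored sequence still has even parity."""
    return parity_bit(stored) == 0

data = [0, 1, 1, 0, 1, 0, 0, 0]  # three 1s: odd parity, parity bit 1
stored = with_parity(data)
corrupted = stored.copy()
corrupted[2] ^= 1                # flip a single bit
print(looks_ok(stored), looks_ok(corrupted))  # True False
```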

Stable Storage
While checksums will almost certainly detect the existence of a media failure or a failure to read or write correctly, they do not help us correct the error. To deal with these problems, we can implement a policy known as stable storage: each value X is kept in two sectors, X_L and X_R.
The stable-storage writing policy:
(1) Write the value of X into X_L. Check that the value has status "good"; if not, repeat the write. If a set number of write attempts fails, fix up X_L (for example, by substituting a spare sector).
(2) Repeat (1) for X_R.
The stable-storage reading policy:
(1) To obtain the value of X, read X_L. If status "bad" is returned, repeat the read a set number of times. If a value with status "good" is eventually returned, take that value as X.
(2) If we cannot read X_L, repeat (1) with X_R.
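A sketch of the two policies on a toy sector model. `Sector` is hypothetical: a `failed` flag simulates a media defect, a defective sector ignores writes and returns status "bad" on reads, and the fix-up step is modeled by clearing the flag, standing in for substituting a spare sector.

```python
from dataclasses import dataclass

GOOD, BAD = "good", "bad"

@dataclass
class Sector:
    """Hypothetical model of one disk sector; failed=True simulates a media defect."""
    value: bytes = b""
    failed: bool = False

    def write(self, value: bytes) -> None:
        if not self.failed:
            self.value = value

    def read(self):
        """Return (W, S): the data and a status of "good" or "bad"."""
        return (self.value, GOOD) if not self.failed else (None, BAD)

def stable_write(x_left: Sector, x_right: Sector, value: bytes, tries: int = 3) -> None:
    for copy in (x_left, x_right):  # step (1) for X_L, then step (2) for X_R
        for _ in range(tries):
            copy.write(value)
            if copy.read()[1] == GOOD:  # check the status after writing
                break
        else:
            # Fix-up after repeated failures; clearing the flag stands in
            # for substituting a spare sector.
            copy.failed = False
            copy.write(value)

def stable_read(x_left: Sector, x_right: Sector, tries: int = 3) -> bytes:
    for copy in (x_left, x_right):  # try X_L first, then fall back to X_R
        for _ in range(tries):
            value, status = copy.read()
            if status == GOOD:
                return value
    raise OSError("neither X_L nor X_R could be read")
```

Usage: after `stable_write(left, right, b"v1")`, a later media failure on `left` still lets `stable_read(left, right)` return `b"v1"` from the right copy.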

Error-Handling Capabilities of Stable Storage
- Media failure: if one copy fails, read the other.
- Write failure: if the failure occurred while writing X_L, copy X_R to X_L (restoring the old value of X); if the failure occurred after writing X_L, copy X_L to X_R (so both copies hold the new value).

Recovery from Disk Crashes
RAID (Redundant Arrays of Independent Disks) has been developed to reduce the risk of data loss from disk crashes.

RAID 1: Mirroring
Each data disk has a redundant disk that holds an exact copy of it.

RAID 4
Disks 1, 2, and 3 hold data. The redundant Disk 4 holds parity-check bits: each of its bits is the modulo-2 sum of the corresponding bits of Disks 1, 2, and 3. While mirroring uses as many redundant disks as there are data disks, RAID 4 uses only one redundant disk no matter how many data disks there are.
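Computing the redundant block is a bitwise XOR (modulo-2 sum) of the corresponding data blocks. The one-byte blocks below are hypothetical example values.

```python
from functools import reduce

def parity_block(blocks):
    """Modulo-2 sum (bitwise XOR) of equal-length blocks, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical one-byte blocks on the three data disks.
disk1, disk2, disk3 = b"\xf0", b"\xaa", b"\x38"  # 11110000, 10101010, 00111000
disk4 = parity_block([disk1, disk2, disk3])
print(format(disk4[0], "08b"))  # 01100010
```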

Reading
Reading blocks from a data disk is no different from reading blocks from any disk. In some circumstances, we can actually get the effect of two simultaneous reads from one of the data disks. Suppose Disk 1 is busy and we want to read it, while none of the other disks is busy. We read the corresponding blocks of Disks 2, 3, and 4 and take the modulo-2 sum of the bits in each column; the result is the block on Disk 1.

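Writing
When a block on a data disk (Disks 1 to 3) is written, the corresponding block on the redundant Disk 4 must be rewritten too, so that each of its bits remains the modulo-2 sum of the corresponding data bits.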
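One way to update the redundant block, assumed here, avoids reading the other data disks: XOR out the old data and XOR in the new data. The byte values are hypothetical.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Bitwise modulo-2 sum of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def new_parity(old_parity, old_data, new_data):
    """Redundant block after one data block changes.

    XOR-ing out the old data and XOR-ing in the new data keeps every parity
    bit equal to the modulo-2 sum of the data bits, without reading the
    other data disks.
    """
    return xor_blocks(old_parity, old_data, new_data)

# Hypothetical example: Disk 2 changes from 10101010 to 11001100.
parity = new_parity(b"\x62", b"\xaa", b"\xcc")
print(format(parity[0], "08b"))  # 00000100
```

The cost is two reads (old data, old parity) and two writes (new data, new parity), regardless of how many data disks there are.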

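Failure Recovery
If Disk 2 fails, its contents are lost (????????). Because each bit of Disk 4 is the modulo-2 sum of the corresponding bits of Disks 1 to 3, the contents of Disk 2 are the modulo-2 sum of the corresponding bits of Disks 1, 3, and 4.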
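Rebuilding a failed disk is the same modulo-2 sum, taken over all surviving disks; the identical computation also serves the "read a busy Disk 1 from the others" case. The byte values are hypothetical.

```python
from functools import reduce

def rebuild(surviving_blocks):
    """Contents of the one failed disk: the modulo-2 sum of the corresponding
    blocks on every surviving disk, data and redundant alike."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*surviving_blocks))

# Hypothetical blocks: Disk 2 failed; Disks 1 and 3 hold data, Disk 4 holds parity.
disk2 = rebuild([b"\xf0", b"\x38", b"\x62"])
print(format(disk2[0], "08b"))  # 10101010
```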

RAID 5
RAID 4 suffers from a bottleneck, which we can see when we re-examine the process of writing a new data block: every write, to any data disk, must also update the single redundant disk. RAID 5 avoids this by treating each disk as the redundant disk for some of the blocks.
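A sketch of one simple way to spread the redundant role, assumed here: with n disks numbered 0 to n-1, disk (block number mod n) holds the redundant copy for that block number, so parity updates are shared evenly.

```python
def redundant_disk(block, n_disks):
    """Disk (numbered 0..n_disks-1) holding the redundant block for a given
    block number, rotating the role round-robin among all disks."""
    return block % n_disks

def layout(n_disks, n_blocks):
    """One row per block number: 'R' marks the redundant disk, 'D' a data disk."""
    return ["".join("R" if d == redundant_disk(b, n_disks) else "D"
                    for d in range(n_disks))
            for b in range(n_blocks)]

for row in layout(4, 4):
    print(row)
# RDDD
# DRDD
# DDRD
# DDDR
```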

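Coping With Multiple Disk Crashes (RAID 6)
The redundant disks are defined by a matrix of 0's and 1's with three rows and one column per disk, in which:
a) the columns are every possible column of three 0's and 1's, except for the all-0 column (giving seven disks);
b) the columns for the redundant disks each have a single 1 (giving three redundant disks);
c) the columns for the data disks each have at least two 1's (giving four data disks).
Each row defines one redundant disk: its bits are the modulo-2 sum of the corresponding bits of the data disks that have a 1 in that row.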

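Writing
When a data disk is written, every redundant disk whose row has a 1 in that data disk's column must be updated as well.
[Table: contents of Disks 1 to 7 before and after a write.]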

Failure Recovery
Suppose Disk 2 and Disk 5 fail, so their contents are lost (????????).
- Disk 2 is recovered from Disks 1, 4, and 6 (the row that contains Disk 2 but not Disk 5).
- Disk 5 is then recovered from Disks 1, 2, and 3.
[Table: contents of Disks 1 to 7 after the failure, after recovering Disk 2, and after recovering Disk 5.]
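A sketch of encoding and two-failure recovery. The full matrix is not legible on the slide, so the one below is an assumption: it satisfies rules a) to c) and agrees with the recovery steps above (Disk 2 from Disks 1, 4, 6; Disk 5 from Disks 1, 2, 3). Each row is written as the set of disks with a 1 in that row; the data bytes are hypothetical.

```python
from functools import reduce

# Assumed matrix rows (disk numbers with a 1 in each row); disks 5, 6, 7 are redundant.
ROWS = [{1, 2, 3, 5}, {1, 2, 4, 6}, {1, 3, 4, 7}]
REDUNDANT = (5, 6, 7)  # the disk whose column has its single 1 in each row

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data):
    """data maps data disks 1-4 to blocks; returns contents of all seven disks."""
    disks = dict(data)
    for row, r in zip(ROWS, REDUNDANT):
        disks[r] = xor_blocks([disks[d] for d in sorted(row - {r})])
    return disks

def recover(disks, failed):
    """Rebuild failed disks one at a time, each from a row in which it is the
    only unknown disk (always possible for up to two failures)."""
    known = {d: v for d, v in disks.items() if d not in failed}
    missing = set(failed)
    while missing:
        for row in ROWS:
            unknown = row & missing
            if len(unknown) == 1:
                d = unknown.pop()
                known[d] = xor_blocks([known[o] for o in sorted(row - {d})])
                missing.remove(d)
                break
        else:
            raise ValueError("too many failures for this code")
    return known

disks = encode({1: b"\x11", 2: b"\x22", 3: b"\x44", 4: b"\x88"})  # hypothetical data
print(recover(disks, {2, 5}) == disks)  # True
```

With failures {2, 5}, the loop first finds the row {1, 2, 4, 6} (only Disk 2 unknown), then the row {1, 2, 3, 5}, matching the order on the slide.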