Presentation is loading. Please wait.

Presentation is loading. Please wait.

1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Storage and File Structure.

Similar presentations


Presentation on theme: "1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Storage and File Structure."— Presentation transcript:

1 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Storage and File Structure

2 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure DBMS and Storage/File Structure Why do we need to know about storage/file structure  Many database technologies are developed to utilize the storage architecture/hierarchy  Data in the database needs to be organized and stored/retrieved efficiently

3 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Storage Hierarchy Magnetic Tape Optical Disk Magnetic Disk Flash Memory Cache unit price Memory Volatile primary storage Non-volatilespeed Secondary storage Tertiary storage

4 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Primary Storage (Volatile) Cache  Speed: 7 to 20 ns ( 1 nanosecond = 10 –9 seconds )  Capacity: A typical PC level 2 cache 64KB-2 MB. Within processors, level 1 cache usually ranges in size from 8 KB to 64 KB. Main memory:  Speed: 10s to 100s of nanoseconds;  Capacity: Up to a few Gigabytes widely used currently per-byte costs have decreased roughly factor of 2 every 2 to 3 years)

5 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Secondary Storage (Non-volatile) Flash memory  Speed: read speed similar to main memory, write is much slower  Capacity: 32M to 512M currently  Forms: SmartMedia, memory stick, secure digital, BIOS  Cost: roughly same as main memory Magnetic-disk  Capacities: up to roughly 100 GB currently Growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years)

6 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Tertiary Storage (Non-volatile) Optical storage  CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms  CD-RW, DVD-RW, and DVD-RAM  Reads and writes are slower than with magnetic disk  Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks available for storing large volumes of data

7 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Tertiary Storage (Non-volatile) Tape storage  non-volatile, used primarily for backup (to recover from disk failure), and for archival data  sequential-access – much slower than disk  very high capacity (40 to 300 GB tapes available)  Tape jukeboxes available for storing massive amounts of data hundreds of terabytes (1 terabyte = 10 9 bytes) to even a petabyte (1 petabyte = 10 12 bytes)

8 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Between Memory and Disk The permanent residency of database is mostly on disk In database, cost is usually measured by the number of disk I/O But disks are too slow and we need memory to be the buffers … but memory is volatile  this introduced a number of issues

9 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Lets talk about disk

10 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Disk Subsystem Disk interface standards families ATA (AT adaptor) range of standards SCSI (Small Computer System Interconnect) range of standards Several variants of each standard (different speeds and capabilities)

11 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Disk Speed Seek time (milliseconds) Rotation time/latency milliseconds Data-transfer rate (4-8MB/sec) Typical numbers: H 16,000 tracks per platter H sectors per track: 200 – 400 H 512 bytes per sector H 4-16KB per block H 5,400 - 15,000 r p m Access time = seek time + latency Discuss ways to improve disk reading speed

12 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure RAID RAID: Redundant Arrays of Independent Disks  high capacity and high speed by using multiple disks in parallel, and  high reliability by storing data redundantly

13 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Mean time to failure (MTTF) Average time the disk is expected to run continuously without any failure.  Typically 3 to 5 years (1 year = 8,760 hours)  MTTF = 30,000 to 1,200,000 hours for a new disk an MTTF of 1,200,000 hours for a new disk means that given 1000 relatively new disks, on an average one will fail every 1200 hours When number of disks increase, the chance of some disk failure increase proportionally

14 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Parallelism Two main goals of parallelism in a disk system: 1.Load balance multiple small accesses to increase throughput 2.Parallelize large accesses to reduce response time. Basic strategy: Stripping  Compare and contrast bit stripping and byte stripping

15 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Redundancy store extra information that can be used to rebuild information lost in a disk failure Basic strategy: mirroring, parity Mean time to data loss depends on mean time to failure, and mean time to repair  E.g. MTTF of 100,000 hours, mean time to repair of 10 hours gives mean time to data loss of 500*10 6 hours (or 57,000 years) for a mirrored pair of disks (ignoring dependent failure modes) 10010010 1 Data Parity

16 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure RAID levels

17 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Choice of RAID Levels Level 1 provides much better write performance than level 5  Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes  Level 1 preferred for high update environments such as log disks Level 1 had higher storage cost than level 5  disk drive capacities increasing rapidly (50%/year) whereas disk access times have decreased much less (x 3 in 10 years)  I/O requirements have increased greatly, e.g. for Web servers  When enough disks have been bought to satisfy required rate of I/O, they often have spare storage capacity so there is often no extra monetary cost for Level 1! Level 5 is preferred for applications with low update rate, and large amounts of data Level 1 is preferred for all other applications

18 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Buffer Management Database can not fit entirely in memory, needs memory as a buffer for speed reasons LRU is used in many OS  Spatial and temporal locality due to loops Database has a more predictable behavior  Example: join

19 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure DBMS Buffer Management Strategies Pinned block – memory block that is not allowed to be written back to disk. Toss-immediate strategy – frees the space occupied by a block as soon as the final tuple of that block has been processed Most recently used (MRU) strategy – system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the most recently used block. Statistical information based  E.g., the data dictionary is frequently accessed. Heuristic: keep data-dictionary blocks in main memory buffer Buffer managers also support forced output of blocks for the purpose of recovery

20 1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Exercises


Download ppt "1/14/2005Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Storage and File Structure."

Similar presentations


Ads by Google