Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.

Chapter Ten

Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive and limited in capacity Volatile Secondary memory Data on SM cannot be processed by CPU directly Slow, larger capacity, less expensive Non-volatile Secondary storage is the media of database storage

Disks and Files DBMS stores information on (“hard”) disks. This has major implications for DBMS design! READ: transfer data from disk to main memory (RAM). WRITE: transfer data from RAM to disk. Both are high-cost operations, relative to in- memory operations, so must be planned carefully!

Why Not Store Everything in Main Memory? Costs too much Main memory is volatile. We want data to be saved between runs. (Obviously!) Typical storage hierarchy: Main memory (RAM) for currently used data. Disk for the main database (secondary storage). Tapes for archiving older versions of the data (tertiary storage).

Disks Secondary storage device of choice. Main advantage over tapes: random access vs. sequential. Data is stored and retrieved in units called disk blocks or pages. Unlike RAM, time to retrieve a disk page varies depending upon location on disk. Therefore, relative placement of pages on disk has major impact on DBMS performance!

Components of a Disk Platters  The platters spin (say, 90rps). Spindle  The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!). Disk head Arm movement Arm assembly  Only one head reads/writes at any one time. Tracks Sector  Block size is a multiple of sector size (which is fixed).

Disk Storage Devices The division of a track into sectors is hard-coded on the disk surface and cannot be changed. One type of sector organization calls a portion of a track that subtends a fixed angle at the center as a sector. A track is divided into blocks. The block size B is fixed for each system. Typical block sizes range from B=512 bytes to B=4096 bytes. Whole blocks are transferred between disk and main memory for processing.

Accessing a Disk Page Time to access (read/write) a disk block: seek time ( moving arms to position disk head on track ) rotational delay ( waiting for block to rotate under head ) transfer time ( actually moving data to/from disk surface ) Seek time and rotational delay dominate. Seek time varies from about 1 to 20msec Rotational delay varies from 0 to 10msec Transfer rate is about 1msec per 4KB page Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?

Buffer Management in a DBMS Data must be in RAM for DBMS to operate on it! Table of pairs is maintained. DB MAIN MEMORY DISK disk page free frame Page Requests from Higher Levels BUFFER POOL choice of frame dictated by replacement policy

Records and Files Record consists of a collection of related data values or items (or fields, column etc) A file is a sequence of records made up of Fixed-length records Variable-length records A database is stored as a collection of files

Record Formats: Fixed Length Information about field types same for all records in a file; stored in system catalogs. Finding i’th field does not require scan of record. Base address (B) L1L2L3L4 F1F2F3F4 Address = B+L1+L2

Fixed-Length Records Simple approach: Store record i starting from byte n  (i – 1), where n is the size of each record. Record access is simple but records may cross blocks Modification: do not allow records to cross block boundaries Deletion of record i: alternatives: move records i + 1,..., n to i,..., n – 1 move record n to i do not move records, but link all free records on a free list

13 Fixed Length Records –Deletion Store the address of the first deleted record in the file header. Use this first record to store the address of the second deleted record, and so on Can think of these stored addresses as pointers since they “point” to the location of a record. More space efficient representation: reuse space for normal attributes of free records to store pointers. (No pointers stored in in-use records.)

Variable-Length Records (b & c on the diagram)

Record Organization (on Disks)

Files of Records Page or block is OK when doing I/O, but higher levels of DBMS operate on records, and files of records. FILE : A collection of pages, each containing a collection of records. Must support: insert/delete/modify record read a particular record (specified using record id) scan all records (possibly with some conditions on the records to be retrieved)

File Organization & Access Method File organization refers to physical arrangement of data in a file into records and pages of the secondary storage Access method refers to the steps involved in storing and retrieving record from a file Some common file organizations and access methods are discussed now

File Organization and Access Method – Unordered File Also called a heap or a pile file. Simplest file structure contains records in no particular order. As file grows and shrinks, disk pages are allocated and de- allocated. New records are inserted at the end of the file. To search for a record, a linear search through the file records is necessary. This requires reading and searching half the file blocks on the average, and is hence quite expensive. Record insertion is quite efficient. Reading the records in order of a particular field requires sorting the file records

File Organization and Access Method – Ordered File Also called a sequential file. File records are kept sorted by the values of an ordering field. Insertion is expensive: records must be inserted in the correct order. It is common to keep a separate unordered overflow (or transaction ) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file. A binary search can be used to search for a record on its ordering field value. This requires reading and searching log2 of the file blocks on the average, an improvement over linear search. Reading the records in order of the ordering field is quite efficient

Ordered File

File Organization and Access Method – Hash Files Hashing for disk files is called External Hashing The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1,..., bucket M-1. Typically, a bucket corresponds to one (or a fixed number of) disk block. One of the file fields is designated to be the hash key of the file. The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function. Search is very efficient on the hash key. Collisions occur when a new record hashes to a bucket that is already full. An overflow file is kept for storing such records. Overflow records that hash to each bucket can be linked together.

Hash Files

23 Indexing Structures for Files Index is a data structure that allows the DBMS to locate a particular records in a file more quickly and thereby speed response to user queries An index file consists of records (called index entries) of the form Any subset of the fields of a relation can be the search key for an index on the relation. Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation). Index files are typically much smaller than the original file search-key pointer

Types of Indexes There are different types of indexes Single-level Indexes Primary Indexes Clustering Indexes Secondary Indexes Multilevel Indexes

Single Level Index A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file. The index is usually specified on one field of the file (although it could be specified on several fields) One form of an index is a file of entries, which is ordered by field value

Types of Single Level Indexes Primary Index Defined on an ordered data file The data file is ordered on a key field Includes one index entry for each block in the data file; the index entry has the key field value for the first record in the block, which is called the block anchor A similar scheme can use the last record in a block. A primary index is a nondense (sparse) index, since it includes an entry for each disk block of the data file and the keys of its anchor record rather than for every search value.

Primary Index

An Example of Dense Index File

Types of Single Level Indexes Clustering Index Defined on an ordered data file The data file is ordered on a non-key field unlike primary index, which requires that the ordering field of the data file have a distinct value for each record. Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value. It is another example of nondense index where Insertion and Deletion is relatively straightforward with a clustering index

Clustering Index

Clustering Index With a Separate Block Per Record Group

Types of Single Level Indexes Secondary Index A secondary index provides a secondary means of accessing a file for which some primary access already exists. The secondary index may be on a field which is a candidate key and has a unique value in every record, or a nonkey with duplicate values. The index is an ordered file with two fields. The first field is of the same data type as some nonordering field of the data file that is an indexing field. The second field is either a block pointer or a record pointer. There can be many secondary indexes (and hence, indexing fields) for the same file. Includes one entry for each record in the data file; hence, it is a dense index

Secondary Index

Multi-Level Indexes Because a single-level index is an ordered file, we can create a primary index to the index itself ; in this case, the original index file is called the first- level index and the index to the index is called the second-level index. We can repeat the process, creating a third, fourth,..., top level until all entries of the top level fit in one disk block A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block

Multi-Level Index

Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.

Similar presentations

Presentation on theme: "Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.

Similar presentations

Presentation on theme: "Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive."— Presentation transcript:

Similar presentations

About project

Feedback