Recap of Feb 27: Disk-Block Access and Buffer Management Major concepts in Disk-Block Access covered: –Disk-arm Scheduling –Non-volatile write buffers.
Published byModified over 6 years ago
Presentation on theme: "Recap of Feb 27: Disk-Block Access and Buffer Management Major concepts in Disk-Block Access covered: –Disk-arm Scheduling –Non-volatile write buffers."— Presentation transcript:
Recap of Feb 27: Disk-Block Access and Buffer Management Major concepts in Disk-Block Access covered: –Disk-arm Scheduling –Non-volatile write buffers –Clustering –Log disks –Fragmentation Buffer Management –Overview of Buffers –Buffer replacement strategies –LRU, MRU, toss-immediate, pinning –Other buffer details (buffer interaction with existing OS, variations on buffers, etc.)
File Organization The database is stored logically as a collection of files. Each file is a sequence of records A record is a sequence of fields Easy so far, but as we just finished discussing, anything stored on disk is stored in blocks, which are a physical constraint unrelated to the storage system used for files. So how do we organize a file into blocks and records? –formatting fields within a record –formatting records within a block –assigning records to blocks.
Fixed-Length Records Simplest approach. We know the length of each record, and they are all the same –store record i starting from byte n * (i - 1), where n is record length –record access is simple
Fixed-Length Records Problems: –records may cross blocks normal modification: don’t permit records to cross block boundaries –deletion of record i leaves a gap, which requires some way of dealing with the empty space. –E.G., record 2 (A-215) is deleted from the example block on the right
Fixed-Length Records: Deletion One simple fix is to shift all the records down to fill the gap, as shown on the right. This involves a lot of work, so it might be slow
Fixed-Length Records: more Deletion Another fix would be to shift the last record (record n) to the deleted position I Much less work Not useful if the records are stored in order (sorted)
Fixed-Length Records: still more Deletion Another possibility is to not move records at all Maintain a header at the beginning of the file Store a link to the list of addresses of deleted records Use each deleted record to store the link to the next deleted record Essentially a linked list, often called a free list
Variable Length Records Can occur in a database system in several ways –storage of multiple record types in a file –record types that allow variable lengths for one or more fields –record types that allow repeating fields Several ways to store variable length records –attach an “end of record” symbol (delimiter) to mark the end of each record –store the length of the record at the beginning of each record –store header information at the beginning of the file with the location and length of each record –these techniques can be applied at the file, block, or record level
Variable Length Records Files: –delimit each record within the file –store a length field at the beginning of each record –store header information at the beginning of the file with the location and length of each record within the file Blocks: –delimit each record within the block –store a length field at the beginning of each record –store header information at the beginning of the block with the location and length of each record inside the block Records: –delimit each field within the record –store a length field at the beginning of each field –store header information at the beginning of the record with the location and length of each field
Variable Length Records Two more techniques for storing variable-length records –use fixed-length fields reserve space -- if there is a maximum space, just reserve that, and mark unused space with a special null (end-of-record) symbol wasteful if the maximum record length is much larger than the average record length –list representation represent variable-length records by lists of fixed-length records, chained together by pointers useful for variable-length records caused by repeating same-length fields we don’t want a single field of the variable-length record to cross the boundary of two fixed-length records in its representation, so this can also be wasteful of space
Organizing Records in a Block Two major ways we can organize records within a block (disk page) –fixed-packed (contiguous storage) –slotted page structure (indexed heap) 1) fixed-packed -- records are stored contiguously –highly inflexible –records may span over block boundary –fragmentation with deletions and insertions –external pointers may prevent internal block reorganization -- records are pinned to their address in the block
Organizing Records in a Block 2) slotted page structure –an initial block header storing block address and offset is used to reference the record –records are indexed within a block –insertions and deletions are easy (there are no assumptions about contiguity of records and record-address startpoints to deal with) –records may be rearranged within the block without concerns about external pointers –records are not pinned within the block
Organizing Records in a Files Given a set of records, how do we organize them in a file? Three possible methods are: –1. Heap -- no order at all. A record can be placed anywhere in the file where there is space –2. Sequential -- records are stored in a sorted order according to the value of a search key –3. Hashing -- a hash function computed on some attribute of each record is used to specify in which block of the file the record should be placed –Records of each relation are often stored in separate files. Sometimes it is useful to use a clustering file organization, where records of several relations might be stored in a single file.
Heap File Organization Heap -- no order at all. A record can be placed anywhere in the file where there is space –easy insert, easy delete –lack of any structure makes queries (including finding a particular record) very difficult –not usually useful for anything except very small relations
Sequential File Organization Sequential -- records are stored in a sorted order according to the value of a search key –designed for efficient queries in sorted order –very suitable for applications that require sequential processing of the entire file –difficult to maintain sorted order with insert/delete –deletions can use a free list (pointer chain) to mark empty space as previously described
Sequential File Organization Insertions use the following method: –locate the location to be inserted –if there is space there, insert with no more work –otherwise insert the record in an overflow block –in either case the pointer chain must be updated Every so often we need to reorganize the whole file to restore sequential order
Clustering File Organization Simple file structure stores each relation in a separate file –tuples can be represented as fixed-length records –easy to implement –well-suited to small databases Large databases often attempt to store many relations in one file using a clustering file organization E.G., relations customer and depositor shown to the right:
Clustering File Organization depositor relation stores the different accounts that a particular customer has customer relation stores the address information of a given customer Both relations use customer- name as a key Some common queries on the two relations join them based on the customer-name attribute
Clustering File Organization Storing the two relations together, sorted on customer- name, allows the join to be computed much more quickly There is a price to pay -- some operations are now more expensive (slower) for example, consider select* fromcustomer sequential pass through the customer relation is now hard
Clustering File Organization To allow sequential access through all tuples of the customer relation, we chain together all the tuples of that relation using pointers clustering results in variable- size records Careful use of clustering can produce significant performance gains