
1 Summarization – CS 257 Chapter – 13 Database Systems: The Complete Book Submitted by: Nitin Mathur Submitted to: Dr.T.Y.Lin

2 What is an Index? (Figure: a value is given to the index, which points to the blocks holding the matching records.) An index is any data structure that takes as input a property of records, typically the value of one or more fields, and finds the records with that property quickly.

3 Indexes on Sequential Files (Figure: a sequential file holding records with keys 10, 20, 30, 40, 50, 60, 70, 80, two per block.) This is a sequential file in which tuples are sorted by their primary key.

4 Definitions to Study Index Structures Data file: a sorted sequential file. Index file: consists of key-pointer pairs. Search key: a search key K in the index file is associated with a pointer to a data-file record that has search key K.

5 Dense Indexes (Figure: an index block with keys 10 through 80, one entry per record, pointing into the data file.) There is an entry in the index file for every record in the data file.

6 Sparse Indexes (Figure: an index with keys 10, 30, 50, 70, 90, 110, 130, 150, one entry per data block.) A sparse index holds only one key-pointer pair per data block. The key is for the first record on the data block.
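The sparse-index lookup described above can be sketched in Python (a toy helper, not from the text): given the sorted list of first-keys of the data blocks, binary-search for the last index entry whose key does not exceed the search key.

```python
import bisect

def sparse_lookup(sparse_index, key):
    """Return the number of the data block that may hold `key`.

    `sparse_index` is a sorted list holding the first key on each
    data block, e.g. [10, 30, 50, 70] for blocks of two records each.
    """
    # Rightmost index entry whose key is <= the search key.
    pos = bisect.bisect_right(sparse_index, key) - 1
    return max(pos, 0)

# Blocks hold records (10, 20), (30, 40), (50, 60), (70, 80).
index = [10, 30, 50, 70]
```

A lookup for key 40 lands on block 1, which holds records 30 and 40; the record must then be searched for within that block.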

7 Multiple Levels of Indexes (Figure: a second-level sparse index with keys 10, 90, 170, 250, 330, 410, 490, 570 pointing to blocks of the first-level index.) Adding a second level of sparse index reduces the number of index blocks that must be examined.

8 Managing Indexes during Data Modification Create/delete overflow blocks. Insert new blocks in sequential order. Slide tuples to adjacent blocks.

9 How actions on the sequential file affect the index file:

Action                          Dense Index   Sparse Index
Create empty overflow block     None          None
Delete empty overflow block     None          None
Create empty sequential block   None          Insert
Delete empty sequential block   None          Delete
Insert record                   Insert        Update (?)
Delete record                   Delete        Update (?)
Slide record                    Update        Update (?)

10 Secondary Indexes Why do we need secondary indexes? SELECT name, address FROM moviestar WHERE birthday = DATE '01/09/2008'; We need a secondary index on birthday to help with such queries.

11 With the advent of the WWW, keeping documents online and retrieving them has become one of the largest database problems. The easiest approach to document retrieval is to create a separate index for each word (problem: wasted storage space). The other approach is to use an inverted index. Document Retrieval and Inverted Indexes

12 The records here are a collection of documents. The inverted index itself consists of a set of word-pointer pairs. The inverted index's pointers refer to positions in a bucket file. Pointers in the bucket file can be: pointers to the document, or pointers to an occurrence of the word (perhaps a pair of the first block of the document and an integer indicating which word it is). When a pointer identifies a word occurrence, we can include extra information in the bucket array. For example, for documents using HTML or XML we can also include the markup associated with each word, so we can distinguish titles, headers, etc. Inverted Index
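As a toy illustration of the idea (a sketch, not the book's implementation), an inverted index can be built as a map from each word to a list of (document id, position) pairs, the pairs playing the role of bucket-file pointers to word occurrences:

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to a list of (doc_id, position) pairs.

    `documents` maps a document id to its text; positions count
    words from 0, standing in for the bucket-file pointers above.
    """
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

docs = {1: "the quick dog", 2: "the lazy cat"}
```

A real system would also apply stemming and drop stop words such as "the" before indexing.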

13

14  Stemming: remove suffixes to find the stem of each word (e.g., plurals can be treated as their singular versions).  Stop words: words such as "the" or "and" are excluded from the inverted index.  Example (with reference to the next figure): suppose we want to find documents about dogs that compare them with cats. This is difficult to solve without an understanding of the text, but we could get a good hint if we search for documents that 1. mention dogs in the title, and 2. mention cats in an anchor. More Information-Retrieval Techniques to Improve Effectiveness

15

16 The most commonly used index structure in commercial systems. Advantages: B-trees automatically maintain as many levels of index as are appropriate for the file size. B-trees manage the space on blocks so that no overflow blocks are needed. The structure of B-trees: the tree is balanced (all paths from root to leaf have the same length). There are typically three layers: the root, an intermediate layer, and leaves. B-Trees

17 Keys are distributed among the leaves in sorted order, from left to right. At the root there are at least two used pointers (exception: a tree with a single record). At a leaf, the last pointer points to the next leaf block to the right. At an interior node all n+1 pointers can be used, but at least ceil((n+1)/2) are actually used. Important rules about what can appear in the blocks of a B-tree

18

19 B-trees allow lookup, insertion, and deletion of records using few disk I/O's. 1. If the number of keys per block is reasonably large, then we rarely need to split or merge blocks; even when we do, these operations are limited to the leaves and their parents. 2. The number of disk I/O's needed is normally the number of levels of the B-tree plus one (for lookup) or plus two (for insert or delete). Example: suppose 340 key-pointer pairs fit in one block, and suppose the typical block is occupied halfway between the minimum and maximum, i.e., has 255 used pointers. With 255 children of the root and 255^2 = 65,025 leaves, the leaves point to 255^3, or about 16.6 million, records. That is, files with up to 16.6 million records can be accommodated by a 3-level B-tree. The number of disk I/O's can be reduced further by keeping the upper levels of the B-tree in main memory. Efficiency of B-trees
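The capacity arithmetic in the example above can be checked directly (a trivial sketch; the function name is ours, not the book's):

```python
def btree_records(typical_pointers, levels):
    """Records reachable by a B-tree in which every block uses
    `typical_pointers` pointers, with `levels` levels of blocks
    (root, interior, leaf for levels = 3): each level multiplies
    the fan-out once more."""
    return typical_pointers ** levels

# 340 key-pointer pairs fit per block; the typical block is assumed
# halfway between half-full (170) and full (340), i.e. 255 pointers.
```

With 255 pointers per block and 3 levels this gives 16,581,375 records, the "about 16.6 million" quoted above.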

20 Disk Failures  Intermittent Failures  Checksums  Stable Storage  Error-Handling Capabilities of Stable Storage

21 Types of Errors  Intermittent error: a read or write is unsuccessful.  Media decay: a bit or bits become permanently corrupted.  Write failure: we can neither write nor retrieve the data.  Disk crash: the entire disk becomes unreadable.

22 Intermittent Failures  An intermittent failure occurs if we try to read a sector but the correct contents of that sector are not delivered to the disk controller.  The controller checks whether the sector read is good or bad.  To check that a write was correct, a read is performed.  Whether a sector is good or bad is determined by the read operation.

23 Checksums  Each sector has some additional bits, called the checksum.  The checksum bits are set depending on the values of the data bits stored in that sector.  The probability of accepting a bad sector read is low if we use checksums.  A simple checksum is a parity bit: if the data has an odd number of 1's, the parity bit is 1; if the data has an even number of 1's, the parity bit is 0.  Either way, the total number of 1's is always even.

24 Example: 1. Sequence 01101000 has an odd number of 1's; parity bit 1 -> 011010001. 2. Sequence 11101110 has an even number of 1's; parity bit 0 -> 111011100.

25 A one-bit error in reading or writing the bits and their parity bit results in a sequence with odd parity, so the error can be detected. Error detection can be improved by keeping one parity bit for each byte. The probability is 50% that any one parity bit will detect a random error, so the chance that none of the eight does is only 1 in 2^8, or 1/256. Likewise, if n independent bits are used, the probability of missing an error is only 1/2^n.
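The even-parity rule above can be sketched in a few lines of Python (illustrative helpers, names ours):

```python
def parity_bit(bits):
    """Even-parity bit for a string of '0'/'1' characters:
    1 if the number of 1's is odd, else 0, so that data plus
    parity always contains an even number of 1's."""
    return bits.count("1") % 2

def has_error(bits_with_parity):
    """A stored sector (data plus parity) signals an error
    if its total number of 1's is odd."""
    return bits_with_parity.count("1") % 2 == 1
```

Flipping any single bit of a stored sector makes the count of 1's odd, which is exactly what `has_error` detects; two flipped bits, however, go unnoticed, which is why more parity bits lower the miss probability to 1/2^n.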

26 Stable Storage Stable storage helps recover from the disk failure known as media decay, in which, if we overwrite a sector, the new data is not read back correctly. Sectors are paired, and each pair, said to represent one value X, has a left copy Xl and a right copy Xr. We write each copy and verify it with its checksum, substituting a spare sector for Xl or Xr as needed, until a good value is stored.

27 Error-Handling Capabilities of Stable Storage Media failures: if one of Xl and Xr fails, X can be read from the other; X is unreadable only if both fail, and the probability of that is very small. Write failure (e.g., a power outage during a write): 1. If the failure occurred while writing Xl, then Xr remains good and X can be read from Xr. 2. If the failure occurred after writing Xl, then we can read X from Xl, since Xr may or may not hold the correct copy of X.

28 Recovery from Disk Crashes: Ways to Recover Data  The most serious mode of failure for disks is a "head crash," in which data is permanently destroyed.  To reduce the risk of data loss from disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks) schemes.

29 Recovery from Disk Crashes: Ways to Recover Data Each of these schemes starts with one or more disks that hold the data and adds one or more disks that hold information completely determined by the contents of the data disks; the added disks are called redundant disks.

30 Mirroring as a Redundancy Technique  The mirroring scheme is referred to as RAID level 1.  In this scheme we mirror each disk: one disk is called the data disk and the other the redundant disk.  In this case the only way data can be lost is if a second disk crashes while the first crash is being repaired.

31 Parity Blocks  The RAID level 4 scheme uses only one redundant disk, no matter how many data disks there are.  In the redundant disk, the ith block consists of parity checks for the ith blocks of all the data disks.  That is, the jth bits of the ith blocks of the data disks and the redundant disk must together have an even number of 1's, and the redundant disk's bit is chosen to make this condition true.

32 Parity Blocks: Reading Reading a block of a data disk is the same as reading a block of any disk. Alternatively, we can read the corresponding block from each of the other disks and compute the block we want as their modulo-2 sum. Example: disk 2: 10101010, disk 3: 00111000, disk 4: 01100010. Taking the modulo-2 sum of the bits in each column gives disk 1: 11110000.

33 Parity Blocks: Writing  When we write a new block of a data disk, we must update the corresponding block of the redundant disk as well.  One approach is to read the corresponding blocks of all the other data disks, compute the modulo-2 sum together with the new block, and write it to the redundant disk.  But this approach requires n-1 block reads, one write of the data block, and one write of the redundant block: n+1 disk I/O's in total.

34 Parity Blocks: Writing A better approach requires only four disk I/O's: 1. Read the old value of the data block being changed. 2. Read the corresponding block of the redundant disk. 3. Write the new data block. 4. Recalculate and write the block of the redundant disk.
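The four-I/O trick works because the new parity block equals the old parity XOR'ed with both the old and new versions of the changed data block: only the bit positions where the data actually changed need their parity flipped. A sketch, treating 8-bit blocks as integers (function name ours):

```python
def update_parity(old_data, new_data, old_parity):
    """New redundant block after rewriting one data block.

    XOR (modulo-2 sum) flips exactly the parity bits in the
    positions where old and new data differ, so no other data
    disk needs to be read."""
    return old_parity ^ old_data ^ new_data
```

The result is identical to recomputing the parity from all data disks from scratch, but needs only the two reads and two writes listed above.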

35 Parity Blocks: Failure Recovery If any one data disk crashes, we can compute each of its blocks as the modulo-2 sum of the corresponding blocks of the other disks. Suppose disk 2 fails. To recompute each block of the replacement disk, we are given the corresponding blocks of the first and third data disks and the redundant disk, so the situation looks like: disk 1: 11110000, disk 2: ????????, disk 3: 00111000, disk 4: 01100010. Taking the modulo-2 sum of each column, we deduce that the missing block of disk 2 is 10101010.
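The recovery step above is just an XOR over the surviving disks' blocks; a minimal sketch using the slide's numbers (helper name ours):

```python
from functools import reduce

def recover_block(surviving_blocks):
    """Recompute a lost block as the bitwise modulo-2 sum (XOR)
    of the corresponding blocks on all surviving disks, data and
    redundant alike."""
    return reduce(lambda a, b: a ^ b, surviving_blocks)

# Slide's example: disk 2 failed; disks 1, 3, 4 survive.
disk1, disk3, disk4 = 0b11110000, 0b00111000, 0b01100010
```

Because every column of bits has even parity, XOR-ing the survivors reproduces exactly the missing disk's bits.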

36 An Improvement: RAID 5  RAID 4 is effective in preserving data unless there are two simultaneous disk crashes.  Whatever scheme we use for updating the disks, we need to read and write the redundant disk's block; if there are n data disks, then the number of disk writes to the redundant disk will be n times the average number of writes to any one data disk.  However, we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is called RAID level 5.

37 Continued…  For instance, if there are n + 1 disks numbered 0 through n, we could treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n + 1.  For example, with n = 3 there are 4 disks. The disk numbered 0 is redundant for its cylinders numbered 4, 8, 12, and so on, because these are the numbers that leave remainder 0 when divided by 4.  The disk numbered 1 is redundant for cylinders 1, 5, 9, and so on; disk 2 is redundant for cylinders 2, 6, 10, ...; and disk 3 is redundant for 3, 7, 11, ....
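The cylinder-to-redundant-disk assignment described above is a one-line remainder computation; a sketch (function name ours):

```python
def redundant_disk_for(cylinder, n_disks):
    """Under the rotating RAID 5 layout above, disk j holds the
    redundant copy of cylinder i exactly when j == i mod n_disks
    (where n_disks = n + 1)."""
    return cylinder % n_disks
```

Spreading the redundant role this way means parity writes are divided evenly among all n + 1 disks instead of hammering a single parity disk.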

38 Coping With Multiple Disk Crashes The theory of error-correcting codes known as Hamming codes leads to RAID level 6, under which two simultaneous crashes are correctable. For example, with data disks 1-4 and redundant disks 5-7:  The bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3.  The bits of disk 6 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 4.  The bits of disk 7 are the modulo-2 sum of the corresponding bits of disks 1, 3, and 4.

39 Coping With Multiple Disk Crashes: Reading/Writing  We may read data from any data disk normally.  To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block. These bits are then added, in a modulo-2 sum, to the corresponding blocks of all those redundant disks whose rows of the Hamming matrix have a 1 in the written disk's column.

40  Records consist of fields.  Each record must have a schema, which is stored by the database system.  The schema includes the names and data types of the fields and their offsets within the record. RECORDS

41  Example: CREATE TABLE Moviestar (name CHAR(30) PRIMARY KEY, address VARCHAR(255), gender CHAR(1), birthdate DATE ); Fixed Length Records

42 Fixed-Length Records (Figure: name at offset 0, address at offset 30, gender at offset 286, birthdate at offset 287; record length 297.)

43  Each record starts at a byte within its block that is a multiple of 4.  All fields within the record start at a byte that is offset from the beginning of the record by a multiple of 4. Fixed-Length Records

44 With alignment, the record looks like this: (Figure: name at offset 0, address at offset 32, gender at offset 288, birthdate at offset 292; record length 304.) Fixed-Length Records
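The aligned offsets in the figure can be reproduced by rounding each field's start up to a multiple of 4; a sketch assuming the MovieStar field sizes (name 30 bytes, address stored in 256 bytes, gender 1 byte, birthdate stored in 10 bytes; the storage sizes for VARCHAR(255) and DATE are assumptions):

```python
def aligned_offsets(field_lengths, align=4):
    """Offsets of fields when each field must start at a multiple
    of `align` bytes from the start of the record; also returns the
    total (padded) record length."""
    offsets, pos = [], 0
    for length in field_lengths:
        # Round the current position up to the next multiple of `align`.
        pos = (pos + align - 1) // align * align
        offsets.append(pos)
        pos += length
    total = (pos + align - 1) // align * align
    return offsets, total
```

With field lengths [30, 256, 1, 10] this yields offsets 0, 32, 288, 292 and a record length of 304, matching the figure.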

45  The following information should be in the record: 1. The record schema (or a pointer to it). 2. The length of the record. 3. Timestamps. Many record layouts include a header of some small number of bytes to provide this additional information. Record Headers

46 Record Headers (Figure: a header holding a pointer to the schema, the record length, and a timestamp in bytes 0-12, followed by the name, address, gender, and birthdate fields at offsets 44, 300, 304, and 316.)

47 Records representing tuples of a relation are stored in blocks of the disk and moved into main memory when we need to access or update them. (Figure: header, record 1, record 2, ..., record n.) Records into Blocks

48 The block header contains the following information: links to one or more other blocks that are part of a network of blocks used for creating indexes to the tuples of a relation; information about the role played by this block in such a network; information about which relation the tuples of this block belong to; timestamps indicating the time of the block's last modification or access. Records into Blocks

49  A database consists of a server process that provides data from secondary storage to one or more client processes that are applications using the data.  The server and client processes may be on one machine, or the server and the various clients can be distributed over many machines. Client-Server Systems

50  The client application uses a "virtual" address space.  The operating system or DBMS decides which parts of the address space are currently located in main memory, and hardware maps the virtual address space to physical locations in main memory. Client - Server Systems

51  The server's data lives in a database address space.  The addresses of this space refer to blocks, and possibly to offsets within the block. Client - Server Systems

52  Physical addresses are byte strings that let us determine the place within the secondary storage system where a block or record can be found.  The bytes of a physical address indicate the following information: the host to which the storage is attached; an identifier for the disk or other device on which the block is located. Client-Server Systems: Physical Addresses

53  The number of the cylinder of the disk.  The number of the track within the cylinder.  The number of the block within the track.  The offset of the beginning of the record within the block. Client - Server Systems – Physical Address

54  Each block or record has a "logical address," which is an arbitrary string of bytes of some fixed length.  A map table, stored on disk in a known location, relates logical to physical addresses. Client - Server Systems Logical Address

55 Client-Server Systems: Logical Addresses (Figure: a map table with a logical-address column and a physical-address column, translating each logical address to its physical address.)

56  All the information needed for a physical address is found in the map table.  Many combinations of logical and physical addresses yield structured address schemes.  A very useful combination of physical and logical addresses is to keep in each block an offset table that holds the offsets of the records within the block, as suggested in the figure. Logical and Structured Addresses

57 (Figure: a block whose header holds a table of offsets growing from the front, while records 1-4 grow from the rear; the unused space lies between them. The offset table tells us the position of each record within the block.)

58  The address of a record is now the physical address of its block plus the offset of the entry in the block's offset table for that record. ADVANTAGES  We can move the record around within the block.  We can even allow the record to move to another block.  Finally, should the record be deleted, we have the option of leaving in its offset-table entry a tombstone, a special value that indicates the record has been deleted.
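A toy in-memory model of this scheme (names and layout are ours, purely illustrative): records are addressed by slot number in the offset table, so they can move within the block, and deletion leaves a tombstone in the slot rather than reusing it.

```python
TOMBSTONE = -1  # special offset value marking a deleted record

class Block:
    """A toy block with an offset table: a record's address is
    (block, slot), so the record can move inside the block without
    its address changing."""
    def __init__(self):
        self.data = bytearray()
        self.offsets = []              # slot number -> offset in self.data

    def insert(self, record: bytes) -> int:
        self.offsets.append(len(self.data))
        self.data += record
        return len(self.offsets) - 1   # the record's slot number

    def read(self, slot: int, length: int):
        off = self.offsets[slot]
        if off == TOMBSTONE:
            return None                # record was deleted
        return bytes(self.data[off:off + length])

    def delete(self, slot: int):
        self.offsets[slot] = TOMBSTONE # tombstone stays forever
```

Any external pointer holding (block, slot) keeps working after other records slide, and a dangling pointer to a deleted record hits the tombstone instead of stale bytes.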

59  Relational systems need the ability to represent pointers in tuples, and index structures are composed of blocks that usually have pointers within them.  Thus, we need to study the management of pointers as blocks are moved between main and secondary memory. Pointer Swizzling

60  Every block, record, object, or other referenceable data item has two forms of address: its database address and the memory address of the item.  In secondary storage, we must use the database address of the item.  However, when the item is in main memory, we can refer to it by either its database address or its memory address.

61  We need a table that translates from all those database addresses that are currently in virtual memory to their current memory addresses. Such a translation table is suggested in the figure.

62 (Figure: a translation table with a database-address column (DB-addr) and a memory-address column (mem-addr). The translation table turns database addresses into their equivalents in memory.)

63  To avoid the cost of repeatedly translating from database addresses to memory addresses, several techniques have been developed that are collectively known as pointer swizzling.  When we move a block from secondary to main memory, pointers within the block may be "swizzled," that is, translated from the database address space to the virtual address space.

64 A pointer actually consists of: 1. A bit indicating whether the pointer is currently a database address or a (swizzled) memory address. 2.The database or memory pointer, as appropriate.
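The tag-bit-plus-address representation above can be modeled with a single reserved bit; a sketch assuming 64-bit pointers with the high bit free to serve as the tag (an assumption, not from the text):

```python
SWIZZLED = 1 << 63   # high bit tags a pointer as a swizzled memory address

def swizzle(memory_addr):
    """Mark an address as a (swizzled) memory address."""
    return SWIZZLED | memory_addr

def is_swizzled(pointer):
    """True if the pointer currently holds a memory address,
    False if it still holds a database address."""
    return bool(pointer & SWIZZLED)

def target(pointer):
    """The database or memory address with the tag bit stripped."""
    return pointer & ~SWIZZLED
```

Following a pointer then starts with a single bit test: if the bit is set, dereference directly; otherwise consult the translation table first.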

65  As soon as a block is brought into memory, we locate all its pointers and addresses and enter them into the translation table if they are not already there.  However, we need some mechanism to locate the pointers. Automatic Swizzling

66 For example: 1. If the block holds records with a known schema, the schema will tell us where in the records the pointers are found. 2. If the block is used for one of the index structures then the block will hold pointers at known locations. 3. We may keep within the block header a list of where the pointers are.

67 (Figure: Block 1 on disk is read into memory; its swizzled pointer now refers to Block 2 in memory, while its unswizzled pointer still refers to Block 2 on disk. The structure of a pointer when swizzling is used.)

68  Alternatively, we can leave all pointers unswizzled when the block is first brought into memory.  We enter its address, and the addresses of its pointers, into the translation table, along with their memory equivalents.  If and when we follow a pointer P that is inside some block in memory, we swizzle it.  The difference between on-demand and automatic swizzling is that the latter tries to get all the pointers swizzled quickly and efficiently when the block is loaded into memory. Swizzling on Demand

69  The possible time saved by swizzling all of a block's pointers at one time must be weighed against the possibility that some swizzled pointers will never be followed.  In that case, any time spent swizzling and unswizzling those pointers is wasted. Drawback

70  Another option is to arrange that database pointers look like invalid memory addresses. If so, then we can allow the computer to follow any pointer as if it were in its memory form.  If the pointer happens to be unswizzled, the memory reference causes a hardware trap.  If the DBMS provides a function, invoked by the trap, that swizzles the pointer, then we can follow swizzled pointers in single instructions and need do something more time-consuming only when the pointer is unswizzled. Option

71  It is also possible never to swizzle pointers.  We still need the translation table, so the pointers may be followed in their unswizzled form. No Swizzling

72  it may be known by the application programmer whether the pointers in a block are likely to be followed.  This programmer may be able to specify explicitly that a block loaded into memory is to have its pointers swizzled, or the programmer may call for the pointers to be swizzled only as needed. Programmer Control of Swizzling

73  When a block is moved from memory back to disk, any pointers within that block must be "unswizzled," that is, translated back to database addresses.  The translation table can be used to associate addresses of the two types in either direction.  However, we do not want each unswizzling operation to require a search of the entire translation table. Returning Blocks to Disk

74  If we think of the translation table as a relation, then the problem of finding the memory address associated with a database address x can be expressed as the query: SELECT memAddr FROM TranslationTable WHERE dbAddr = x;

75  If we want to support the reverse query, then we need to have an index on attribute memAddr as well. SELECT dbAddr FROM TranslationTable WHERE memAddr = y;

76  A block in memory is said to be pinned if it cannot at the moment be written back to disk safely.  A bit telling whether or not a block is pinned can be located in the header of the block. Pinned Records and Blocks

77 Suppose block B1 has within it a swizzled pointer to some data item in block B2. If B2 were written back to disk and we then followed the pointer in B1, it would lead us to the buffer, which no longer holds B2; in effect, the pointer has become dangling. A block like B2 that is referred to by a swizzled pointer from somewhere else is therefore pinned. Reason for a Block to Be Pinned

78 If a block is pinned, we must either unpin it or let the block remain in memory, occupying space that could otherwise be used for some other block. To unpin a block that is pinned because of swizzled pointers from outside, we must "unswizzle" any pointers to it. Consequently, the translation table must record, for each database address whose data item is in memory, the places in memory where swizzled pointers to that item exist.

79  Two possible approaches are: 1.Keep the list of references to a memory address as a linked list attached to the entry for that address in the translation table. 2.If memory addresses are significantly shorter than database addresses, we can create the linked list in the space used for the pointers themselves.  That is, each space used for a database pointer is replaced by (a) The swizzled pointer, and (b) Another pointer that forms part of a linked list of all occurrences of this pointer.

80 (Figure: the translation-table entry for database address x points to the head of a linked list threading through the occurrences y of the swizzled pointer. A linked list of occurrences of a swizzled pointer.)

81 Records With Variable-Length Fields A simple but effective scheme is to put all fixed length fields ahead of the variable-length fields. We then place in the record header: 1. The length of the record. 2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order then the first of them needs no pointer; we know it immediately follows the fixed-length fields.
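The fixed-fields-first layout above can be sketched with Python's `struct` module; the exact header layout (two 4-byte little-endian integers: total record length, then the offset of the variable-length address field) is an assumption for illustration:

```python
import struct

HEADER_LEN = 8  # two 4-byte unsigned ints: record length, address offset

def pack_record(name: bytes, address: bytes) -> bytes:
    """Fixed-length field (name, padded to 30 bytes) first, then the
    variable-length address; the header gives the record length and
    where the address begins."""
    addr_offset = HEADER_LEN + 30
    total = addr_offset + len(address)
    header = struct.pack("<II", total, addr_offset)
    return header + name.ljust(30, b"\x00") + address

def unpack_address(record: bytes) -> bytes:
    """Recover the variable-length field using only the header."""
    total, addr_offset = struct.unpack_from("<II", record, 0)
    return record[addr_offset:total]
```

As the slide notes, since the address is the first (here, only) variable-length field, its offset is actually derivable from the fixed-length prefix; storing it anyway keeps the access code uniform when more variable-length fields follow.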

82

83 Records With Repeating Fields  A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first.  We can locate all occurrences of F as follows: let the number of bytes devoted to one instance of F be L; then we add to the offset of F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on.  Eventually, we reach the offset of the field following F, whereupon we stop.

84

85 An alternative representation is to keep the record of fixed length, and put the variable length portion - be it fields of variable length or fields that repeat an indefinite number of times - on a separate block. In the record itself we keep: 1. Pointers to the place where each repeating field begins, and 2. Either how many repetitions there are, or where the repetitions end.

86 Storing variable-length fields separately from the record

87 Variable-Format Records  The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of: 1. Information about the role of this field, such as: (a) The attribute or field name, (b) The type of the field, if it is not apparent from the field name and some readily available schema information, and (c) The length of the field, if it is not apparent from the type. 2. The value of the field.
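A minimal byte-level sketch of tagged fields (the layout is our assumption: a 1-byte field-name length, the name, a 2-byte value length, then the value, all strings UTF-8; real systems would also encode a type code):

```python
import struct

def encode_tagged(fields: dict) -> bytes:
    """Encode string fields as a sequence of tagged (name, value)
    pairs; absent fields simply are not written."""
    out = bytearray()
    for name, value in fields.items():
        n, v = name.encode(), value.encode()
        out += struct.pack("<B", len(n)) + n     # tag: name length + name
        out += struct.pack("<H", len(v)) + v     # value length + value
    return bytes(out)

def decode_tagged(data: bytes) -> dict:
    """Walk the tags to recover whichever fields are present."""
    fields, i = {}, 0
    while i < len(data):
        nlen = data[i]; i += 1
        name = data[i:i + nlen].decode(); i += nlen
        (vlen,) = struct.unpack_from("<H", data, i); i += 2
        fields[name] = data[i:i + vlen].decode(); i += vlen
    return fields
```

Because every field carries its own name and length, two records of the same relation may store entirely different subsets of fields, which is exactly the flexibility the next slide motivates.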

88 There are at least two reasons why tagged fields make sense. 1. Information-integration applications: sometimes a relation has been constructed from several earlier sources, and these sources have different kinds of information. For instance, our movie-star information may have come from several sources, one of which records birthdates while some give addresses and others do not, and so on. If there are not too many fields, we are probably best off leaving NULL those values we do not know. 2. Records with a very flexible schema: if many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.

89 A record with tagged fields

90 Records That Do Not Fit in a Block  Large values usually have variable length, but even if the length is fixed for all values of the type, we need special techniques to represent such values. In this section we consider a technique called "spanned records" that can be used to manage records larger than blocks.  Spanned records are also useful when records are smaller than blocks but packing whole records into blocks wastes significant amounts of space. For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment.

91 If records can be spanned, then every record and record fragment requires some extra header information: 1. Each record or fragment header must contain a bit telling whether or not it is a fragment. 2. If it is a fragment, then it needs bits telling whether it is the first or last fragment for its record. 3. If there is a next and/or previous fragment for the same record, then the fragment needs pointers to these other fragments.

92 Storing spanned records across blocks

93 BLOBS Binary Large OBjectS = BLOBS.  BLOBS can be images, movies, audio files, and other very large values that can be stored in files.  Storing BLOBS: they are stored in several blocks; it is preferable to store them consecutively on a cylinder, or striped across multiple disks, for efficient retrieval.  Retrieving BLOBS: a client retrieving a 2-hour movie may not want it all at once, and retrieving a specific part of the large value requires an index structure to make access efficient (example: an index by seconds on a movie BLOB).

94 Column Stores An alternative to storing tuples as records is to store each column as a record. Since an entire column of a relation may occupy far more than a single block, these records may span many blocks, much as long files do. If we keep the values in each column in the same order, then we can reconstruct the relation from the column records.

95 Consider this relation

96 INTRODUCTION What is a record? A record is a single, implicitly structured data item in a database table; a record is also called a tuple. What is record modification? We say records are modified when a data manipulation operation is performed on them.

97 STRUCTURE OF A RECORD RECORD STRUCTURE FOR A PERSON TABLE CREATE TABLE PERSON ( NAME CHAR(30), ADDRESS CHAR(256), GENDER CHAR(1), BIRTHDATE CHAR(10));

98 TYPES OF RECORDS FIXED LENGTH RECORDS CREATE TABLE SJSUSTUDENT(STUDENT_ID INT(9) NOT NULL, PHONE_NO INT(10) NOT NULL); VARIABLE LENGTH RECORDS CREATE TABLE SJSUSTUDENT(STUDENT_ID INT(9) NOT NULL, NAME CHAR(100),ADDRESS CHAR(100),PHONE_NO INT(10) NOT NULL);

99 RECORD MODIFICATION  Modification of Record Insert Update Delete  Issues even with Fixed Length Records  More Issues with Variable Length Records

100 STRUCTURE OF A BLOCK & RECORDS Various records are grouped and stored together in blocks of memory. (Figure: structure of a block.)

101 BLOCKS & RECORDS  If records need not be in any particular order, then we just find a block with enough empty space.  We keep track of all records/tuples in a relation/table using index structures and file-organization concepts.

102 Inserting New Records  If records are not required to be in a particular order, just find a block with empty space and place the record there (e.g., heap files).  What if the records are to be kept in a particular order (e.g., sorted by primary key)?  Locate the appropriate block, check whether space is available in the block, and if so, place the record there.

103 INSERTING NEW RECORDS  We may have to slide the Records in the Block to place the Record at an appropriate place in the Block and suitably edit the block header.

104 What If the Block Is Full?  We need to keep the record in a particular block, but the block is full. How do we deal with this?  We find room outside the block.  There are two approaches to finding room for the record: 1. find space on a nearby block, or 2. create an overflow block.

105 Approaches to Finding Room for a Record  Find space on a nearby block: suppose block B1 has no space; if space is available on block B2, move records of B1 to B2.  If there are external pointers to the records of B1 moved to B2, leave a forwarding address in the offset table of B1.

106 Approaches to Finding Room for a Record  Create an overflow block: each block B has in its header a pointer to an overflow block where additional records of B can be placed.

107 Deletion  We try to reclaim the space made available when a record is deleted.  If an offset table is used for storing information about the records of the block, we rearrange/slide the remaining records.  If sliding of records is not possible, we maintain a space-available list to keep track of the space available within the block.

108 Tombstone  What about pointers to deleted records?  A tombstone is placed in place of each deleted record.  A tombstone is a bit placed at the first byte of the deleted record to indicate that the record was deleted (0 = not deleted, 1 = deleted).  A tombstone is permanent.

109 Updating Records  For fixed-length records, an update has no effect on the storage system.  For variable-length records: if the length increases, we may have to "slide the records," as for insertion; if the length decreases, we update the space-available list and recover the space/eliminate overflow blocks, as for deletion.

110 Secondary Storage Management  Database systems always involve secondary storage like the disks and other devices that store large amount of data that persists over time.

111 The Memory Hierarchy  A typical computer system has several different components in which data may be stored.  These components have data capacities ranging over at least seven orders of magnitude and also have access speeds ranging over seven or more orders of magnitude.

112 The Memory hierarchy from the text book as follows:

113 Cache  The lowest level of the hierarchy is the cache. Some cache is found on the same chip as the microprocessor itself, and additional level-2 cache is found on another chip.  Data and instructions are moved into the cache from main memory when they are needed by the processor.  Cached data can be accessed by the processor in a few nanoseconds.

114 Main Memory  In the center of the action is the computer's main memory. We may think of everything that happens in the computer (instruction executions and data manipulations) as working on information that is resident in main memory.  Typical times to access data from main memory to the processor or cache are in the 10-100 nanosecond range.

115 Secondary Storage  Essentially every computer has some sort of secondary storage, which is a form of storage that is both significantly slower and significantly more capacious than main memory.  The time to transfer a single byte between disk and main memory is around 10 milliseconds.

116 Tertiary Storage  As capacious as a collection of disk units can be, there are databases much larger than what can be stored on the disk(s) of a single machine, or even of a substantial collection of machines.  Tertiary storage devices have been developed to hold data volumes measured in terabytes.  Tertiary storage is characterized by significantly higher read/write times than secondary storage, but also by much larger capacities and smaller cost per byte than is available from magnetic disks.

117 Transfer of Data Between Levels  Normally, data moves between adjacent levels of the hierarchy.  At the secondary and tertiary levels, accessing the desired data or finding the desired place to store it takes a great deal of time, so each level is organized to transfer large amounts of data to or from the level below whenever any data at all is needed.  The disk is organized into disk blocks, and entire blocks are moved to or from a contiguous section of main memory called a buffer.
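Block-at-a-time transfer can be sketched with ordinary file I/O standing in for the disk; the 4096-byte block size and the function name are illustrative assumptions.

```python
BLOCK_SIZE = 4096  # a common disk-block size, assumed for illustration

def read_block(path, block_number):
    """Read one whole disk block into an in-memory buffer.

    The unit of transfer between disk and memory is the block, never a
    single record: even to read one byte, the entire block is fetched."""
    with open(path, "rb") as f:
        f.seek(block_number * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)
```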

118 Volatile and Nonvolatile Storage  A volatile device "forgets" what is stored in it when the power goes off.  A nonvolatile device, on the other hand, is expected to keep its contents intact even for long periods when the device is turned off or there is a power failure.  Magnetic and optical materials hold their data in the absence of power.  Thus, essentially all secondary and tertiary storage devices are nonvolatile. On the other hand main memory is generally volatile.

119 Virtual Memory  When we write programs, the data we use (variables of the program, files read, and so on) occupies a virtual memory address space.  Many machines use a 32-bit address space; that is, there are 2^32 bytes, or 4 gigabytes.  The operating system manages virtual memory, keeping some of it in main memory and the rest on disk. Transfer between memory and disk is in units of disk blocks.
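The arithmetic above checks out directly; the 4096-byte block size is an illustrative assumption.

```python
VIRTUAL_SPACE = 2 ** 32   # bytes addressable with a 32-bit virtual address
BLOCK_SIZE = 4096         # assumed transfer unit between memory and disk

print(VIRTUAL_SPACE // 2 ** 30)      # 4 (gigabytes)
print(VIRTUAL_SPACE // BLOCK_SIZE)   # 1048576 blocks cover the whole space
```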

120 Disks  The use of secondary storage is one of the important characteristics of a DBMS, and secondary storage is almost exclusively based on magnetic disks

121 Mechanics of Disks  The two principal moving pieces of a disk drive are a disk assembly and a head assembly.  The disk assembly consists of one or more circular platters that rotate around a central spindle  The upper and lower surfaces of the platters are covered with a thin layer of magnetic material, on which bits are stored.

122 A typical disk format from the textbook is shown below:

123  0's and 1's are represented by different patterns in the magnetic material. A common diameter for disk platters is 3.5 inches. The disk is organized into tracks, which are concentric circles on a single platter. The tracks at a fixed radius from the center, among all the surfaces, form one cylinder.

124 A top view of a disk surface from the text is shown below:

125  Tracks are organized into sectors, which are segments of the circle separated by gaps that are not magnetized to represent either 0's or 1's.  The second movable piece, the head assembly, holds the disk heads.

126 The Disk Controller  One or more disk drives are controlled by a disk controller, a small processor capable of:  Controlling the mechanical actuator that moves the head assembly, to position the heads at a particular radius.  Transferring bits between the desired sector and main memory.  Selecting a surface from which to read or write, and selecting the sector from the track on that surface that is under the head.  A simple single-processor computer system is shown on the next slide.

127 Simple computer system from the text is shown below

128 Disk Access Characteristics:  Seek Time: The disk controller positions the head assembly at the cylinder containing the track on which the block is located. The time to do so is the seek time.  Rotational Latency: The disk controller waits while the first sector of the block moves under the head. This time is called the rotational latency.  Transfer Time: All the sectors of the block and the gaps between them pass under the head while the disk controller reads or writes the data. This delay is called the transfer time.  The sum of the seek time, rotational latency, and transfer time is the latency of the disk.
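A back-of-the-envelope calculation makes the three components concrete. All the numbers here (a 10,000 RPM drive, 6.5 ms average seek, 16 KB block on a 512 KB track) are assumed for illustration, not taken from the slides.

```python
def access_latency(seek_ms, rpm, block_bytes, track_bytes):
    """Estimate disk latency as seek + rotational latency + transfer time."""
    rotation_ms = 60_000 / rpm                       # one full revolution
    rotational_latency = rotation_ms / 2             # wait half a turn on average
    transfer = rotation_ms * (block_bytes / track_bytes)  # fraction of a turn
    return seek_ms + rotational_latency + transfer

# 6.5 ms seek + 3 ms rotational latency + 0.1875 ms transfer
print(access_latency(6.5, 10_000, 16_384, 524_288))   # 9.6875 ms
```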

129 B-Trees  A B-tree organizes its blocks into a tree. The tree is balanced, meaning that all paths from the root to a leaf have the same length. Typically, there are three layers in a B-tree: the root, an intermediate layer, and leaves, but any number of layers is possible.

130 Functionalities of B-Trees  B-trees automatically maintain as many levels of index as is appropriate for the size of the file being indexed.  B-trees manage the space on the blocks they use so that every block is between half used and completely full. No overflow blocks are needed.

131 Structure of B-Trees  There are three layers in a B-tree: the root, an intermediate layer, and leaves.  In a B-tree, each block has space for n search-key values and n+1 pointers. [The next slides illustrate the structure of a B-tree.]

132 B-Tree example, n = 3: [figure: root with key 100; interior nodes with keys (30) and (120, 150, 180); leaves holding 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200]

133 Sample non-leaf node: keys 57, 81, 95. Its four pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95.

134 Sample leaf node: keys 57, 81, 95, each with a pointer to the record holding that key. The leaf is reached from a non-leaf node above it, and one extra "sequence pointer" leads from this leaf to the next leaf in sequence.

135 In the textbook's notation, with n = 3: both leaf and non-leaf nodes are fixed-size blocks with room for n keys and n+1 pointers.

136 We don't want nodes to be too empty. Each node must use at least:  Non-leaf: ⌈(n+1)/2⌉ pointers  Leaf: ⌊(n+1)/2⌋ pointers to data

137 Full node vs. minimum node, n = 3: [figure: a full non-leaf holds keys 120, 150, 180; a minimum non-leaf holds only key 30; a full leaf holds keys 3, 5, 11; a minimum leaf holds keys 30, 35. Pointer slots count toward the minimum even if null.]

138 B-tree rules, for a tree of order n: (1) All leaves are at the same lowest level (balanced tree). (2) Pointers in leaves point to records, except for the "sequence pointer" to the next leaf.

                      Max ptrs   Max keys   Min ptrs              Min keys
  Non-leaf (non-root)   n+1        n        ⌈(n+1)/2⌉             ⌈(n+1)/2⌉ - 1
  Leaf (non-root)       n+1        n        ⌊(n+1)/2⌋ (to data)   ⌊(n+1)/2⌋
  Root                  n+1        n        1                     1
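The occupancy rules above (at least ⌈(n+1)/2⌉ pointers in a non-leaf, ⌊(n+1)/2⌋ data pointers in a leaf) can be checked for the running example n = 3. The function below is a sketch for non-root nodes only.

```python
import math

def node_limits(n, leaf):
    """Max/min pointers and keys for a non-root B-tree node of order n."""
    max_ptrs, max_keys = n + 1, n
    if leaf:
        min_ptrs = math.floor((n + 1) / 2)   # pointers to data records
        min_keys = min_ptrs                  # one key per data pointer
    else:
        min_ptrs = math.ceil((n + 1) / 2)    # child pointers
        min_keys = min_ptrs - 1              # one fewer key than pointers
    return max_ptrs, max_keys, min_ptrs, min_keys

print(node_limits(3, leaf=False))  # (4, 3, 2, 1)
print(node_limits(3, leaf=True))   # (4, 3, 2, 2)
```

These match the "minimum node" slide for n = 3: a minimum non-leaf has 2 pointers and 1 key, a minimum leaf has 2 data pointers and 2 keys.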

139 Applications of B-trees  1. The search key of the B-tree is the primary key for the data file, and the index is dense. That is, there is one key-pointer pair in a leaf for every record of the data file. The data file may or may not be sorted by primary key.  2. The data file is sorted by its primary key, and the B-tree is a sparse index with one key-pointer pair at a leaf for each block of the data file.  3. The data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree. For each key value K that appears in the data file there is one key-pointer pair at a leaf. That pointer goes to the first of the records that have K as their sort-key value.

140 Lookup in B-Trees  Suppose we want to find a record with search key 40.  We start at the root; the root holds key 13, and since 40 ≥ 13 we follow the pointer to the right subtree.  We apply the same rule at each level until we reach a leaf.
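The lookup procedure can be sketched recursively. The `Node` class is a hypothetical stand-in for a disk block; the child chosen at an interior node is the one whose range covers the search key, exactly as in the rule above.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None, pointers=None):
        self.keys = keys            # sorted search-key values
        self.children = children    # child nodes (non-leaf only)
        self.pointers = pointers    # record pointers (leaf only)

def lookup(node, key):
    """Descend from `node` to a leaf; return the record pointer or None."""
    if node.children is None:                 # reached a leaf
        if key in node.keys:
            return node.pointers[node.keys.index(key)]
        return None
    # Interior node: child i covers keys k with keys[i-1] <= k < keys[i],
    # so the child index is the number of keys <= `key`.
    return lookup(node.children[bisect_right(node.keys, key)], key)
```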

141 [figure: lookup for key 40, descending from a root holding key 13 through interior nodes to the leaf level]

142 Range Queries  B-trees are used for queries in which a range of values is asked for, such as: SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25;
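A range query walks the leaves left to right via their sequence pointers. This is a sketch: the `Leaf` layout is hypothetical, and a real DBMS would first descend from the root to the leaf containing the lower bound rather than starting from a given leaf.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys          # sorted keys in this leaf
        self.next_leaf = None     # sequence pointer to the next leaf

def range_query(first_leaf, lo, hi):
    """Collect all keys k with lo <= k <= hi, scanning leaves in order."""
    out = []
    leaf = first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:            # keys are sorted, so we can stop early
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next_leaf     # follow the sequence pointer
    return out
```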

143 Insert into B-tree (a) simple case – space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root
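Case (b), leaf overflow, can be sketched as follows. The representation is purely illustrative; the split keeps the ceiling half of the keys in the left leaf and pushes the first key of the new right leaf up into the parent.

```python
def split_leaf(keys, new_key, n):
    """Insert new_key into a full leaf holding n keys; return
    (left_keys, right_keys, key_to_push_up_into_the_parent)."""
    assert len(keys) == n, "split is only needed when the leaf is full"
    keys = sorted(keys + [new_key])
    mid = (len(keys) + 1) // 2           # left leaf keeps the ceiling half
    return keys[:mid], keys[mid:], keys[mid]

# Inserting 7 into the full leaf [3, 5, 11] (n = 3):
print(split_leaf([3, 5, 11], 7, n=3))    # ([3, 5], [7, 11], 7)
```

This matches the "insert key = 7" example on the following slide: the leaf 3 5 11 splits into 3 5 and 7 11, and 7 goes up to the parent.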

144 (a) Insert key = 32, n = 3: [figure: the leaf holding 30, 31 has space, so 32 is simply added to it]

145 (b) Insert key = 7, n = 3: [figure: the full leaf 3 5 11 splits into leaves 3 5 and 7 11, and key 7 is inserted into the parent]

146 (c) Insert key = 160, n = 3: [figure: the leaf 150 156 179 splits into 150 156 and 160 179; the parent 120 150 180 overflows in turn and splits, pushing a key up]

147 (d) New root, insert 45, n = 3: [figure: inserting 45 into the tree rooted at 10 20 30 forces splits to propagate all the way up; a new root holding 30 is created, increasing the tree's height]

148 Deletion from B-tree  (a) Simple case: no example needed  (b) Coalesce with a neighbor (sibling)  (c) Redistribute keys  (d) Cases (b) or (c) at a non-leaf

149 (b) Coalesce with sibling, delete 50, n = 4: [figure: after deleting 50, the under-full leaf's remaining key 40 is merged into the sibling leaf 10 20 30, and the separator 40 is removed from the parent]

150 (c) Redistribute keys, delete 50, n = 4: [figure: after deleting 50, key 35 is borrowed from the sibling leaf 10 20 30 35, and the parent separator 40 is updated to 35]

151 40 45 30 37 25 26 20 22 10 14 1313 10 2030 40 (d) Non-leaf coalese –Delete 37 n=4 40 30 25 new root

152 B-tree deletions in practice  Often, coalescing is not implemented: it is too hard and not worth it!

153 Why do we take 3 as the number of levels of a B-tree? Suppose our blocks are 4096 bytes, keys are 4-byte integers, and pointers are 8 bytes. If no header information is kept on the blocks, we want the largest integer n such that 4n + 8(n + 1) ≤ 4096. That value is n = 340, so 340 key-pointer pairs fit in one block for our example data. Suppose the average block has an occupancy midway between the minimum and maximum, i.e., a typical block has 255 pointers. With a root, 255 children, and 255 × 255 = 65,025 leaves, we have among those leaves 255³, or about 16.6 million, pointers to records. That is, files with up to about 16.6 million records can be accommodated by a 3-level B-tree.
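The arithmetic above can be reproduced directly; the 255-pointer figure is the text's "midway" occupancy estimate.

```python
BLOCK, KEY, PTR = 4096, 4, 8   # block, key, and pointer sizes in bytes

# Largest n with n keys and n+1 pointers per block: 4n + 8(n+1) <= 4096.
n = (BLOCK - PTR) // (KEY + PTR)
print(n)                        # 340 key-pointer pairs per block

typical = 255                   # the text's estimate of typical occupancy
print(typical ** 3)             # 16581375: ~16.6 million records in 3 levels
```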

