1
Quick Review of material covered Apr 8
B+-Tree Overview and some definitions
– balanced tree
– multi-level
– reorganizes itself on insertion and deletion
– built so each node fits on a single disk page
Examined mechanics of B+-tree Insertion and Deletion
– looked at several examples
We’ll finish up B+-trees with two more concepts:
– B+-tree File Organization
– B-tree index files
2
B+-tree File Organization
B+-tree indices solve the problem of index-file degradation, but the original data file will still degrade under a stream of insert/delete operations.
Solve data-file degradation by using a B+-tree file organization.
Leaf nodes in a B+-tree file organization store records, not pointers into a separate data file
– since records are larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a non-leaf node
– leaf nodes must still be kept at least half full
– insert and delete are handled in the same way as insert and delete for entries in a B+-tree index
3
B+-tree File Organization Example
Records are much bigger than pointers, so good space usage is important
To improve space usage, involve more sibling nodes in redistribution during splits and merges (to avoid a split/merge when possible)
– involving one sibling guarantees 50% space use
– involving two guarantees at least 2/3 space use, etc.
4
B-tree Index Files
B-trees are similar to B+-trees, but search-key values appear only once in the index (eliminates redundant storage of key values)
– search keys in non-leaf nodes don’t appear in the leaf nodes, so an additional pointer field for each search key in a non-leaf node must be stored to point to the bucket or record for that key value
– leaf nodes look like B+-tree leaf nodes: (P_1, K_1, P_2, K_2, …, P_n)
– non-leaf nodes look like so: (P_1, B_1, K_1, P_2, B_2, K_2, …, P_n), where the B_i are pointers to buckets or file records
5
B-tree Index File Example
[Figure: B-tree and B+-tree]
6
B-tree Index Files (cont.)
Advantages of B-tree Indices (vs. B+-trees)
– May use fewer tree nodes than a B+-tree on the same data
– Sometimes possible to find a specific key value before reaching a leaf node
Disadvantages of B-tree Indices
– Only a small fraction of key values are found early
– Non-leaf nodes are larger, so fanout is reduced, and B-trees may be slightly taller than B+-trees on the same data
– Insertion and deletion are more complicated than on B+-trees
– Implementation is more difficult than for B+-trees
In general, the advantages don’t outweigh the disadvantages
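The fanout argument can be made concrete with a rough back-of-the-envelope sketch. The page size, key size, and pointer size below are assumed values chosen only for illustration (they are not from the lecture), and the level estimate treats every node as completely full, so it is an approximation rather than an exact height formula.

```python
# Assumed sizes, for illustration only.
PAGE_BYTES = 4096      # one node = one disk page
KEY_BYTES = 32         # size of a search-key value
POINTER_BYTES = 8      # size of a child pointer or record/bucket pointer

def bplus_fanout():
    # B+-tree non-leaf entry: one key plus one child pointer.
    return PAGE_BYTES // (KEY_BYTES + POINTER_BYTES)

def btree_fanout():
    # B-tree non-leaf entry: key, child pointer, and the extra
    # record/bucket pointer B_i, so fewer entries fit per page.
    return PAGE_BYTES // (KEY_BYTES + 2 * POINTER_BYTES)

def levels_needed(num_keys, fanout):
    # Crude estimate: smallest number of levels whose fanout**levels
    # reaches num_keys, assuming every node is full.
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

if __name__ == "__main__":
    for name, fanout in (("B+-tree", bplus_fanout()), ("B-tree", btree_fanout())):
        print(name, "fanout:", fanout, "levels for 1,000,000 keys:",
              levels_needed(1_000_000, fanout))
```

Under these assumed sizes the B-tree fits fewer entries per non-leaf node and needs one more level for a million keys, which is the "slightly taller" effect described above. Different sizes can make the two heights equal.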
7
Hashing
We’ve examined Ordered Indices (design based upon sorting or ordering search-key values); the other major type of indexing technique is Hashing
Underlying concept is very simple:
– observation: small files don’t require indices or complicated search methods
– use some clever method, based upon the search key, to split a large file into a lot of little buckets
– each bucket is sufficiently small
– use the same method to find the bucket for a given search key
8
Hashing Basics
– A bucket is a unit of storage containing one or more records (typically a bucket is one disk block in size)
– In a hash file organization we find the bucket for a record directly from its search-key value using a hash function
– A hash function is a function that maps from the set of all search-key values K to the set of all bucket addresses B
– The hash function is used to locate records for access, insertion, and deletion
– Records with different search-key values may be mapped to the same bucket
   – so the entire bucket must be searched to find a record
   – buckets are designed to be small, so this task is usually not onerous
9
Hashed File Example
– So we:
   – divide the set of disk blocks that make up the file into buckets
   – devise a hash function that maps each key value into a bucket
V: set of key values
B: number of buckets
H: hashing function, H: V → {0, 1, 2, 3, …, B-1}
Example: V = 9-digit SS#; B = 1000; H = key modulo 1000
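The modulo example can be written out as a minimal sketch. The function name `h` and the sample key values are made up for illustration.

```python
NUM_BUCKETS = 1000  # B in the example above

def h(ssn: int) -> int:
    """Map a 9-digit key to a bucket number in 0..B-1 (key modulo B)."""
    return ssn % NUM_BUCKETS

# Two made-up keys that share their last three digits land in the same bucket.
print(h(123456789))  # 789
print(h(555000789))  # 789
```

With B = 1000, the bucket number is simply the last three digits of the key.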
10
Hash Functions
To search/insert/delete/modify a key k, do:
– compute H(k) to get the bucket number
– search sequentially in the bucket (heap organization within each bucket)
Choosing H: almost any function that generates “random” numbers in the range [0, B-1]
– try to distribute the keys evenly into the B buckets
– one rule of thumb when using MOD: use a prime number as the modulus
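The search/insert/delete procedure can be sketched in memory as follows. This is a toy model, not an on-disk implementation: the class name `HashFile`, integer keys, Python lists standing in for disk-block buckets, and the bucket count of 7 are all assumptions made for illustration.

```python
class HashFile:
    """Toy hashed file: each bucket is an unordered (heap) list of records."""

    def __init__(self, num_buckets=7):  # 7 is an arbitrary small prime
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # H(k) = k mod B selects the bucket.
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, record):
        self._bucket(key).append((key, record))

    def search(self, key):
        # Sequential scan of one bucket only.
        return [rec for k, rec in self._bucket(key) if k == key]

    def delete(self, key):
        bucket = self._bucket(key)
        kept = [(k, rec) for k, rec in bucket if k != key]
        removed = len(bucket) - len(kept)
        bucket[:] = kept  # update the bucket in place
        return removed

if __name__ == "__main__":
    f = HashFile()
    f.insert(10, "record A")
    f.insert(17, "record B")   # 10 and 17 both hash to bucket 3
    print(f.search(17))
    print(f.delete(10), f.search(10))
```

Every operation touches exactly one bucket, so the cost depends on how many records share that bucket rather than on the size of the whole file.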
11
Hash Functions (2)
A collision occurs when two or more key values go to the same bucket
– too many collisions increase search time and degrade performance
– no or few collisions means that each bucket has only one (or very few) key(s)
Worst-case hash functions map all search keys to the same bucket
12
Hash Functions (3)
Ideal hash functions are uniform
– each bucket is assigned the same number of search-key values from the set of all possible values
Ideal hash functions are random
– each bucket has approximately the same number of records assigned to it irrespective of the actual distribution of search-key values in the file
Finding a good hash function is not always easy
13
Examples of Hash Functions
Given 26 buckets and a string-valued search key, consider the following possible hash functions:
– Hash based upon the first letter of the string
– Hash based upon the last letter of the string
– Hash based upon the middle letter of the string
– Hash based upon the most common letter in the string
– Hash based upon the “average” letter in the string: the sum of the letters (using A=0, B=1, etc.) divided by the number of letters
– Hash based upon the length of the string (modulo 26)
Typical hash functions perform computation on the internal binary representation of the search key
– example: searching on a string value, hash based upon the binary sum of the characters in the string, modulo the number of buckets
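A few of these candidates can be compared with a short sketch. The function names, the helper `largest_bucket`, and the sample surnames are made up for illustration, and the character-sum version assumes the "binary sum" means adding the bytes of the UTF-8 encoding. Keys are assumed to be non-empty strings.

```python
from collections import Counter

NUM_BUCKETS = 26

def first_letter_hash(s):
    # Bucket by first letter (A=0, B=1, ...); modulo keeps odd characters in range.
    return (ord(s[0].upper()) - ord("A")) % NUM_BUCKETS

def length_hash(s):
    # Bucket by string length, modulo the number of buckets.
    return len(s) % NUM_BUCKETS

def char_sum_hash(s):
    # Sum of the bytes of the string, modulo the number of buckets.
    return sum(s.encode("utf-8")) % NUM_BUCKETS

def largest_bucket(hash_fn, keys):
    # Size of the fullest bucket: a simple measure of how badly keys pile up.
    counts = Counter(hash_fn(k) for k in keys)
    return max(counts.values())

# Made-up sample: surnames that all start with the same letter.
names = ["Smith", "Stone", "Silva", "Sato", "Singh", "Scott", "Shah", "Suzuki"]

if __name__ == "__main__":
    for fn in (first_letter_hash, length_hash, char_sum_hash):
        print(fn.__name__, "largest bucket:", largest_bucket(fn, names))
```

On this sample the first-letter hash sends every key to one bucket, which is the worst case described earlier, while the character-sum hash spreads the same keys over several buckets. A different sample could favor a different function, which is why the choice depends on the actual key distribution.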