
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, 11.10 –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.


1 Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, 11.10 –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7

2 Quick Review of material covered Apr 3
– Indexing methods are used to speed up access to desired data.
– Definitions: search key, ordered indices, hash indices.
– Ordered indices:
  – An ordered index stores the values of the search keys in sorted order.
  – Primary index: the search key also determines the sort order of the original file. Also called a clustering index.
  – Secondary index: the search key specifies an order different from the sequential order of the file.
  – An index-sequential file is an ordered sequential file with a primary index.
  – Dense and sparse indices
  – Multi-level index
  – Issues connected with index update operations

3 B+-Tree Index Files
– The main disadvantage of ISAM files is that performance degrades as the file grows, creating many overflow blocks and the need for periodic reorganization of the entire file.
– B+-trees are an alternative to indexed-sequential files:
  – used for both primary and secondary indexing
  – B+-trees are a multi-level index
– B+-tree index files automatically reorganize themselves with small local changes on insertion and deletion.
  – No reorganization of the entire file is required to maintain performance.
  – Disadvantages: extra insertion, deletion, and space overhead.
  – The advantages outweigh the disadvantages; B+-trees are used extensively.

4 B+-Tree Index Files (2)
Definition: a B+-tree of order n has:
– All leaves at the same level: it is a balanced tree (“B” in the name stands for “balanced”) with logarithmic performance.
– A root with between 1 and n-1 keys.
– All other nodes at least half full (>= 50% space utilization): non-leaf nodes have between ⌈n/2⌉ and n pointers (one fewer keys than pointers), and leaf nodes have between ⌈(n-1)/2⌉ and n-1 keys.
– We construct the tree with order n such that one node corresponds to one disk block I/O (in other words, each disk page read brings up one full tree node).

5 B+-Tree Index Files (3)
A B+-tree is a rooted tree satisfying the following properties:
– All paths from the root to a leaf are the same length.
– A search for an index value takes time proportional to the height of the tree (whether successful or unsuccessful).

6 B+-Tree Node Structure
The B+-tree is constructed so that each node (when full) fits on a single disk page.
– Parameters:
  – B: size of a block in bytes (e.g., 4096)
  – K: size of the key in bytes (e.g., 8)
  – P: size of a pointer in bytes (e.g., 4)
– An internal node holds n-1 keys and n pointers, so n must satisfy:
  (n-1)*K + n*P <= B
  n*(K+P) <= B + K
  n <= (B+K)/(K+P)
– With the example values above, this becomes n <= (4096+8)/(8+4) = 4104/12 = 342, so n can be at most 342.
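The node-size arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name `max_order` is made up for illustration and is not part of the lecture material.

```python
def max_order(block_size: int, key_size: int, pointer_size: int) -> int:
    """Largest n such that (n - 1) keys and n pointers fit in one block.

    Solves (n - 1) * key_size + n * pointer_size <= block_size for n,
    i.e. n <= (block_size + key_size) / (key_size + pointer_size).
    """
    return (block_size + key_size) // (key_size + pointer_size)


# Example values from the slide: 4096-byte block, 8-byte key, 4-byte pointer.
n = max_order(4096, 8, 4)
bytes_used = (n - 1) * 8 + n * 4
print(n, bytes_used)  # 342 4096
```

With these values a node of order 342 fills the 4096-byte block exactly, and order 343 would need 4108 bytes.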

7 B+-Tree Node Structure (2)
Typical B+-tree node:
– K_i are the search-key values.
– P_i are the pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).
– The search keys in a node are ordered: K_1 < K_2 < K_3 < … < K_{n-1}

8 Non-Leaf Nodes in B+-Trees
Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with n pointers:
– All the search keys in the subtree to which P_1 points are less than K_1.
– For 2 <= i <= n-1, all the search keys in the subtree to which P_i points have values greater than or equal to K_{i-1} and less than K_i.
– All the search keys in the subtree to which P_n points are greater than or equal to K_{n-1}.

9 Leaf Nodes in B+-Trees
– As mentioned last class, primary indices may be sparse indices. So a B+-tree constructed as a primary index (that is, where the search key order corresponds to the sort order of the original file) can have the pointers of its leaf nodes point to the position in the original file of the first occurrence of that key value.
– Secondary indices must be dense indices. A B+-tree constructed as a secondary index must have the pointers of its leaf nodes point to a bucket storing all locations where a given search key value occurs; this set of buckets is often called an occurrence file.

10 Example of a B+-tree B+-tree for the account file (n=3)

11 Another Example of a B+-tree
B+-tree for the account file (n=5)
– Leaf nodes must have between 2 and 4 values (⌈(n-1)/2⌉ and n-1, with n=5).
– Non-leaf nodes other than the root must have between 3 and 5 children (⌈n/2⌉ and n, with n=5).
– The root must have at least 2 children.
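As a quick check of these bounds for other orders, the following sketch computes them from n. The function name `occupancy_bounds` and the dictionary keys are illustrative choices, not notation from the slides.

```python
import math


def occupancy_bounds(n: int) -> dict:
    """Occupancy limits for a B+-tree of order n, as stated on the slide."""
    return {
        # Leaf nodes: between ceil((n-1)/2) and n-1 values.
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        # Non-leaf nodes other than the root: between ceil(n/2) and n children.
        "nonleaf_children": (math.ceil(n / 2), n),
        # The root (when it is not a leaf) needs at least 2 children.
        "root_min_children": 2,
    }


print(occupancy_bounds(5))
print(occupancy_bounds(3))
```

For n=5 this gives leaves with 2 to 4 values and non-leaf nodes with 3 to 5 children; for n=3 it gives leaves with 1 to 2 values and non-leaf nodes with 2 to 3 children.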

12 Observations about B+-trees
– Since the inter-node connections are done by pointers, “logically” close blocks need not be “physically” close.
– The non-leaf levels of the B+-tree form a hierarchy of sparse indices.
– The B+-tree contains a relatively small number of levels (logarithmic in the size of the main file), so searches can be conducted efficiently.
– Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time (as we shall examine later in class).

13 Queries on B+-trees
Find all records with a search-key value of k:
– Start with the root node (assume it has m pointers).
  – Examine the node for the smallest search-key value > k.
  – If we find such a value, say at K_j, follow the pointer P_j to its child node.
  – If no such value exists, then k >= K_{m-1}, so follow P_m.
– If the node reached is not a leaf node, repeat the procedure above and follow the corresponding pointer.
– Eventually we reach a leaf node. If we find a matching key value (our search value k = K_i for some i), then we follow P_i to the desired record or bucket. If we find no matching value, the search is unsuccessful and we are done.
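The lookup procedure can be sketched in Python as follows. The `Node` class, the `search` function, and the small integer-keyed tree are all hypothetical, built only to illustrate the steps; the binary search from the standard `bisect` module stands in for "examine the node for the smallest search-key value > k".

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical in-memory node: keys K_1..K_{m-1} and pointers P_1..P_m."""
    keys: list
    pointers: list  # children (non-leaf) or records/buckets (leaf)
    is_leaf: bool = False


def search(root: Node, k):
    """Return the record/bucket for key k, or None if k is absent."""
    node = root
    while not node.is_leaf:
        # 0-based position of the smallest key > k; equals len(keys) when
        # no such key exists, which selects the last pointer P_m.
        j = bisect_right(node.keys, k)
        node = node.pointers[j]
    i = bisect_left(node.keys, k)
    if i < len(node.keys) and node.keys[i] == k:
        return node.pointers[i]
    return None


# A two-level example tree (order and contents chosen for illustration).
leaf1 = Node([1, 4], ["rec1", "rec4"], is_leaf=True)
leaf2 = Node([7, 9], ["rec7", "rec9"], is_leaf=True)
leaf3 = Node([12, 15], ["rec12", "rec15"], is_leaf=True)
root = Node([7, 12], [leaf1, leaf2, leaf3])

print(search(root, 9))   # rec9
print(search(root, 8))   # None
```

Both the successful and the unsuccessful lookup visit exactly one node per level, which is why the cost depends only on the height of the tree.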

14 Queries on B+-trees (2)
– Processing a query traces a path from the root node to a leaf node.
– If there are K search-key values in the file, the path is no longer than ⌈log_⌈n/2⌉(K)⌉.
– A node is generally the same size as a disk block, typically 4 kilobytes, and n is typically around 100 (40 bytes per index entry).
– With 1 million search-key values and n=100, at most ⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
– In a balanced binary tree with 1 million search-key values, around 20 nodes are accessed in a lookup.
  – The difference is significant since every node access might need a disk I/O, costing around 20 milliseconds.
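The path-length comparison can be reproduced with a short sketch. The function name `max_path_length` is illustrative; it uses integer arithmetic rather than a floating-point logarithm so that exact powers of the fanout are not rounded the wrong way.

```python
def max_path_length(num_keys: int, n: int) -> int:
    """Smallest h with ceil(n/2) ** h >= num_keys, i.e. ceil(log_{ceil(n/2)}(num_keys))."""
    min_fanout = -(-n // 2)  # ceil(n / 2) using integer arithmetic
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= min_fanout
        levels += 1
    return levels


IO_MS = 20  # per-access disk cost assumed on the slide

btree_nodes = max_path_length(1_000_000, 100)  # minimum fanout 50
binary_nodes = max_path_length(1_000_000, 4)   # minimum fanout 2, like a binary tree
print(btree_nodes, btree_nodes * IO_MS)    # 4 80
print(binary_nodes, binary_nodes * IO_MS)  # 20 400
```

Under the slide's assumption of about 20 ms per node access, that is roughly 80 ms per lookup for the B+-tree against roughly 400 ms for the balanced binary tree.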

15 Insertion on B+-trees
– Find the leaf node in which the search-key value would appear.
– If the search-key value is already present, add the record to the main file and (if necessary) add a pointer to the record to the appropriate occurrence-file bucket.
– If the search-key value is not there, add the record to the main file as above (including creating a new occurrence-file bucket if necessary). Then:
  – If there is room in the leaf node, insert (key-value, pointer) in the leaf node.
  – Otherwise, the leaf overflows. Split the leaf node (along with the new entry).

16 Insertion on B+-trees (2)
Splitting a node:
– Take the n (search-key-value, pointer) pairs, including the one being inserted, in sorted order. Place half in the original node, and the other half in a new node.
– Let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split.
– If the parent becomes full by this new insertion, split it as described above, and propagate the split as far up as necessary.
The splitting of nodes proceeds upwards until a node that is not full is found. In the worst case the root node may be split, increasing the height of the tree by 1.
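A leaf split can be sketched as below. The function `split_leaf` and its list-of-pairs representation are assumptions made for illustration, and the choice to leave the larger half (⌈n/2⌉ entries) in the original node when n is odd is one common convention; the slide only says "half".

```python
def split_leaf(entries: list, new_entry: tuple, n: int):
    """Split a full leaf of order n that is receiving one more entry.

    entries: the leaf's n-1 (key, pointer) pairs in key order.
    Returns (original_node_entries, new_node_entries, k), where k is the
    least key in the new node and (k, new node) goes into the parent.
    """
    assert len(entries) == n - 1, "leaf must be full before a split"
    combined = sorted(entries + [new_entry], key=lambda e: e[0])
    mid = (len(combined) + 1) // 2  # ceil(n/2) entries stay in the original node
    left, right = combined[:mid], combined[mid:]
    return left, right, right[0][0]


full_leaf = [(10, "a"), (20, "b"), (30, "c"), (40, "d")]  # order n = 5
left, right, k = split_leaf(full_leaf, (25, "e"), 5)
print(left)   # [(10, 'a'), (20, 'b'), (25, 'e')]
print(right)  # [(30, 'c'), (40, 'd')]
print(k)      # 30
```

Both halves satisfy the minimum leaf occupancy for n=5 (at least 2 values), so only the parent needs further attention.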

17 Insertion on B+-trees Example

18 Deletion on B+-trees
– Find the record to be deleted, and remove it from the main file and the bucket (if necessary).
– If there is no occurrence-file bucket, or if the deletion caused the bucket to become empty, then delete (key-value, pointer) from the B+-tree leaf node.
– If the leaf node now has too few entries, underflow has occurred. If the active leaf node has a sibling with few enough entries that the combined entries can fit in a single node, then:
  – Combine all the entries of both nodes in a single one.
  – Delete the (K, P) pair pointing to the deleted node from the parent. Follow this procedure recursively if the parent node underflows.

19 Deletion on B+-trees (2)
– Otherwise, if no sibling node is small enough to combine with the active node without causing overflow, then:
  – Redistribute the pointers between the active node and the sibling so that both of them have sufficient pointers to avoid underflow.
  – Update the corresponding search-key value in the parent node.
  – No deletion occurs in the parent node, so no further recursion is necessary in this case.
– Deletions may cascade upwards until a node with ⌈n/2⌉ or more pointers is found. If the root node has only one pointer after deletion, it is removed and the sole child becomes the root (reducing the height of the tree by 1).
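The choice between combining two leaves and redistributing their entries can be sketched as follows. The function `rebalance_leaves`, its return format, and the use of plain key lists are assumptions for illustration; a real index would move the pointers along with the keys.

```python
def rebalance_leaves(left: list, right: list, n: int):
    """Resolve an underflow between two adjacent leaves of a B+-tree of order n.

    left, right: sorted key lists of the two siblings (left keys < right keys).
    Returns (action, new_left, new_right, separator):
      - "merge": everything fits in one leaf; the parent entry pointing to
        the emptied right node must then be deleted (separator is None).
      - "redistribute": entries are split evenly; the parent's search key
        between the two leaves is replaced by separator.
    """
    capacity = n - 1  # a leaf holds at most n-1 values
    combined = left + right
    if len(combined) <= capacity:
        return "merge", combined, [], None
    mid = (len(combined) + 1) // 2
    new_left, new_right = combined[:mid], combined[mid:]
    return "redistribute", new_left, new_right, new_right[0]


# Order n = 5: leaves need at least 2 values, so a leaf with 1 value underflows.
print(rebalance_leaves([1], [5, 7], 5))
print(rebalance_leaves([1], [5, 7, 9, 11], 5))
```

The first call merges the two leaves into `[1, 5, 7]`; the second redistributes into `[1, 5, 7]` and `[9, 11]` with 9 as the new separator, so the parent keeps the same number of pointers.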

20 Deletion on B+-trees Example 1

21 Deletion on B+-trees Example 2

22 Deletion on B+-trees Example 3

