Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b 0900-1015 TR Lee, Sin-Min.

Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b 0900-1015 TR Lee, Sin-Min

Storage We have been studying database languages and queries and have yet to study how data is stored. We will look at how data is accessed and the different Index methods used. Different types index methods have good and bad qualities which I will try to address.

Indices Indices is used to look up, input and delete data in a ordered manner. Speed and efficiency is a main goal in the different types of Indexing. Speed and efficiency includes access type, access time, insertion time, deletion time, and space overhead.

Definitions Pointer- identifies the disk block and offset in the disk block. Index record- holds search key value and pointers to the records with the value. Clustering index- an index whose search key also defines the sequential order of the file.

Index-Sequential File Organization Index-Sequential files are files (which holds information for data) ordered sequentially on a search key. Main disadvantage is that performance degrades as file size grows for lookups and sequential scans. Degradation can be fixed with reorganization of the file. Reorganization require lot of overhead space so frequent reorganization is undesirable.

B+ Tree Index Files Index-Sequential file organization is used but not as much as B+ index structure. B+ Tree indexing maintain efficiency despite insertion and deletion of data. They use idea of a balance tree in which every path from the root of the tree to a leaf is of the same length.

B+ Tree Index Files(cont’d) In a B+ Tree data structures, each node corresponds to a disk block. Each node is kept between half-full and completely full. A node in a B+- tree has n-1 search key values K1, K2….Kn-1 and n pointers P1, P2…Pn.

B+ Tree Index Files(cont’d) Search-key values are kept in sorted order. Leaf nodes store the records instead of pointers to records. Search-key values are kept in sorted order

B+ Tree Index Files(cont’d) Pointer P points to a file record with a search key value of a K. Each leaf holds up to n-1 values. A non-leaf node can hold up to n pointers and must hold n/2 ceiling pointers.

B+ Tree Index Files(cont’d) Number of pointers in a node is called fan out of a node. The root must hold at least 2 pointers but can have less than n/2.

Example of B + -Tree of Order 4 (with L = 5) 4155 75 223035 12 15 20 22 24 27 28 29 30 32 34 35 37 38 39 4549 60678791100 41 43 44 45 46 48 49 51 53 54 55 57 58 59 60 62 64 65 66 67 72 74 75 77 80 83 85 87 88 90 91 94 97 99 100 110 112 114 120 4155 75 223035 12 15 20 22 24 27 28 29 30 32 34 35 37 38 39 4549 60678791100 41 43 44 45 46 48 49 51 53 54 55 57 58 59 60 62 64 65 66 67 72 74 75 77 80 83 85 87 88 90 91 94 97 99 100 110 112 114 120

Search in a B+ Tree Search: Start at root; use key comparisons to go to leaf.

Inserting a Data Entry into a B+ Tree 1) Find correct leaf node 2) Add index entry to the node 3) If enough space, done! 4) Else, split the node Redistribute entries evenly between the current node and the new node 5) Insert to the parent 6) Go to Step 3

Deleting a Data Entry from a B+ Tree 1) Find correct leaf node 2) Remove the entry from the node 3) If the node is at least half full, done! 4) Else, possibly borrow some entries from a sibling 5) If not possible, merge the node with the sibling 6) Delete the separator between the node and the sibling from the parent node 7) Go to Step 3

Binary Trees VS. BTrees Binary tree only have 2 children max. For large files binary tree will be too high because of the limit of children and not enough keys per records. Btrees disk size can have many children depending on the disk block. Btrees are more realistic for indexing files because they easily maintain balance and can store many keys in only a few records.

B+ VS. B- Trees B+ trees store redundant search key values because index is smaller. In a B+ tree, all pointers to data records exists at the leaf-level nodes. B-tree eliminates redundancy but require additional pointers to do so. In a B-tree, pointers to data records exist at all levels of the tree.

References http://www.cs.sjsu.edu/~lee/cs157b/cs157 b.htmlhttp://www.cs.sjsu.edu/~lee/cs157b/cs157 b.html http://www.cs.ualberta.ca/~drafiei/291/note s/6-tree-indexes.pdfhttp://www.cs.ualberta.ca/~drafiei/291/note s/6-tree-indexes.pdf A. Silberschatz, H.F. Korth, S. Sudershan: Database System Concepts, 5th ed., McGraw-Hill, 2006

Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b 0900-1015 TR Lee, Sin-Min.

Similar presentations

Presentation on theme: "Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b 0900-1015 TR Lee, Sin-Min."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b 0900-1015 TR Lee, Sin-Min.

Similar presentations

Presentation on theme: "Indexing and Hashing (emphasis on B+ trees) By Huy Nguyen Cs157b 0900-1015 TR Lee, Sin-Min."— Presentation transcript:

Similar presentations

About project

Feedback