B+ Trees


2 B+ Trees
• What if you have A LOT of data that needs to be stored and accessed quickly?
• It won't all fit in memory.
• That means we have to access the hard drive (disk access).
• Ugh!
• A B+ tree is a tree whose nodes are pages on disk.
• It offers fast search
• and fast tree traversal.
• B+ tree: the most widely used index for database management systems.

3 B+ Tree:
• M-ary tree: a general tree with M-way branching
• We decide how many keys are in each node (that's the M)
• The tree is balanced: all paths from the root to a leaf have the same length
[Figure: example B+ tree. Root: 40. Interior nodes: 20 33 | 51 63. Leaves: 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*]

4 B+ Tree: Leaf vs Interior:
• In B+ trees, we must distinguish between leaf nodes and interior nodes.
• Leaf nodes:
  • Leaf nodes are where the data is
  • Leaf nodes are fixed in size
  • Leaf nodes are on the disk
  • Leaf nodes form a sorted linked list of nodes
• Interior nodes:
  • Only used to navigate to the correct leaf node
  • For each interior node, the number of pointers is the number of keys + 1
  • Interior nodes are also kept in sorted order

5 Example of interior node:
Keys: 57 81 95
Pointers: to keys k < 57 | to keys 57 ≤ k < 81 | to keys 81 ≤ k < 95 | to keys k ≥ 95
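
The pointer ranges in this example amount to counting how many keys are less than or equal to the search key. A minimal sketch, assuming the keys sit in a sorted std::vector (the helper name childIndex is hypothetical, not from the slides):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the index of the child pointer to follow.
// Child 0 covers k < keys[0]; child i covers keys[i-1] <= k < keys[i];
// the last child covers k >= keys.back().
std::size_t childIndex(const std::vector<int>& keys, int key) {
    assert(std::is_sorted(keys.begin(), keys.end()));
    // upper_bound finds the first key strictly greater than `key`;
    // its position equals the number of keys that are <= `key`.
    auto it = std::upper_bound(keys.begin(), keys.end(), key);
    return static_cast<std::size_t>(it - keys.begin());
}
```

With the keys 57 81 95, a search for 60 follows pointer 1 (57 ≤ k < 81), and a search for 95 follows pointer 3 (k ≥ 95).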

6 B+ Trees Fill factor
• Nodes (both interior and leaf) may be partially filled.
• There's a fill factor: a percentage that controls the minimum number of keys in all non-root nodes.
• Usually 50%
• Every node must be sufficiently filled
• So if the capacity of each node is 4 keys and the fill factor is 50%, no node other than the root can have fewer than 2 keys
• If a node is too empty, it's underfilled. Only the root can be underfilled.
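
The fill-factor rule can be written as a small check. This is only a sketch: the function names are hypothetical, and rounding the minimum up is an assumption, since the slide only gives the 4-key, 50% case.

```cpp
#include <cassert>

// Hypothetical helper: minimum number of keys in a non-root node.
// Rounding up is an assumption; the slide only shows capacity 4 at 50%.
int minKeys(int capacity, int fillPercent) {
    assert(capacity > 0 && fillPercent >= 0 && fillPercent <= 100);
    return (capacity * fillPercent + 99) / 100;  // integer ceiling
}

// True when a node breaks the fill-factor rule. The root is exempt.
bool violatesFillFactor(int keyCount, int capacity, int fillPercent, bool isRoot) {
    return !isRoot && keyCount < minKeys(capacity, fillPercent);
}
```

For a capacity of 4 and a fill factor of 50%, minKeys gives 2, so a non-root node holding a single key violates the rule while a root holding a single key does not.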

7 [Figure: B+ tree with an underfilled root. Root (underfilled): 17. Interior nodes: 5 14 | 24 33. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 17* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]

8 Searching B+ Tree:
• A lot like a binary search tree
• Except that all searches must end in leaf nodes
• The leaf node should contain the key we're looking for, or will definitively

LeafNode search(Node p, Key key) {
    if p is a leaf node, return p;
    otherwise, if key < p.keys[0], return search(p.beforeptr, key);
    otherwise, go through the keys in the node until key >= p.keys[i]
        and (p.keys[i] is the last key or key < p.keys[i+1]);
    return search(p.keys[i].afterptr, key);
}

• Searching takes at most log_d(n) steps, where d is the fill of each node (at least half of the maximum fill of each node) and n is the number of entries in the tree
• The larger d is, the shorter the height of the tree
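
The search pseudocode can be turned into a runnable sketch. Everything here is an assumption for illustration: an in-memory Node holding a sorted vector of keys and a vector of children (a leaf is a node with no children), rather than the disk pages and before/after pointers of the slides. It relies on std::vector accepting an incomplete element type, which is standard since C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory node: a leaf has no children,
// an interior node has keys.size() + 1 children.
struct Node {
    std::vector<int> keys;       // sorted
    std::vector<Node> children;  // empty for leaf nodes
};

// Walk from the root down to the leaf that would hold `key`.
const Node& findLeaf(const Node& root, int key) {
    const Node* p = &root;
    while (!p->children.empty()) {
        // Number of keys <= key is the index of the child to follow.
        auto it = std::upper_bound(p->keys.begin(), p->keys.end(), key);
        p = &p->children[static_cast<std::size_t>(it - p->keys.begin())];
    }
    return *p;
}

// Every search ends in a leaf; only there do we learn whether the key exists.
bool contains(const Node& root, int key) {
    const Node& leaf = findLeaf(root, key);
    return std::binary_search(leaf.keys.begin(), leaf.keys.end(), key);
}
```

The loop does the same work as the recursive pseudocode: one node is visited per level, so the number of nodes read is the height of the tree.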

9 [Figure: search example. Root: 19. Interior nodes: 5 14 | 24 33. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
Find: 6, 24, 39

10 Inserting:
• Trickier
• We have to worry about overfilling a node
• Case 1: the leaf node isn't filled
  • Insert the new key in order (remember it's a linked list)
  • Virtual memory: it seems like disk space is memory. We can pretend that the stuff stored on a disk is stored in memory; it just takes a bit longer to load
  • The node(s) above don't change

11 Insert 23*
[Figure, before: Root: 13 17 24 33. Leaves: 2* 3* 5* 7* | 13* 16* | 17* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
[Figure, after: Root: 13 17 24 33. Leaves: 2* 3* 5* 7* | 13* 16* | 17* 20* 22* 23* | 24* 27* 29* | 33* 34* 38* 39*]
No splitting required.

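12 Leaf Nodes
class LeafNode {
public:
    int keys[4];         // this could be a linked list
    Data *data[4];       // this is the data associated with each key
    int curr_fullness;   // number of keys currently stored
    LeafNode *nextleaf;  // next leaf in the sorted list of leaves
};

// to insert into a non-full leaf node
// (this appends, so the keys stay sorted only if newkey is the largest key in the leaf):
keys[curr_fullness] = newkey;
curr_fullness++;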
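
For the array version of the leaf, inserting in order means shifting the larger keys one slot to the right. A minimal sketch, assuming the LeafNode layout from the slide; the function insertIntoLeaf, the constant kCapacity, and the forward-declared Data type are hypothetical additions:

```cpp
#include <cassert>

struct Data;  // hypothetical payload type; the slide does not define it

const int kCapacity = 4;  // leaf capacity used on the slide

class LeafNode {
public:
    int keys[kCapacity];
    Data* data[kCapacity];
    int curr_fullness = 0;
    LeafNode* nextleaf = nullptr;
};

// Insert into a leaf that still has room, keeping the keys sorted.
// Returns false and changes nothing if the leaf is already full.
bool insertIntoLeaf(LeafNode& leaf, int newkey, Data* newdata) {
    if (leaf.curr_fullness >= kCapacity) return false;
    int i = leaf.curr_fullness;
    // Shift every key larger than newkey one slot to the right.
    while (i > 0 && leaf.keys[i - 1] > newkey) {
        leaf.keys[i] = leaf.keys[i - 1];
        leaf.data[i] = leaf.data[i - 1];
        --i;
    }
    leaf.keys[i] = newkey;
    leaf.data[i] = newdata;
    ++leaf.curr_fullness;
    return true;
}
```

A false return is the signal that the leaf is full and one of the splitting cases applies instead.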

13 B+ Tree insertion
• Case 2: the leaf node is full, but the parent node has space:
  1. Create a new sibling leaf node after the target leaf node (new_target)
  2. Split the (sorted) data in the full leaf node: half stays in the old leaf node, and half goes in the new target leaf node
  3. Adjust the fullness size of each of the leaf nodes
• Now the parent must point to the new target node
• Use the first value in the new target node, new_target.keys[0], and insert this key value into the parent

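14 Insert 21
[Figure, before: Root: 13 17 24. Leaves: 2* 3* 5* 7* | 13* 16* | 17* 20* 22* 23* | 24* 27* 29*]
[Figure, after splitting the leaf: Root: 13 17 24. Leaves: 2* 3* 5* 7* | 13* 16* | 17* 20* | 21* 22* 23* | 24* 27* 29*]
Insert 21 with pointer into parent node
[Figure, after: Root: 13 17 21 24. Leaves: 2* 3* 5* 7* | 13* 16* | 17* 20* | 21* 22* 23* | 24* 27* 29*]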

15 Case 2 insertion pseudocode:
• If the leaf node's keys are full:
  • Make a new node (the new node goes after the full node)
      newLeaf = new LeafNode();
  • Split the sorted keys between the old node and the new node
      oldLeaf.keys[0 to fullness/2-1] stay the same
      oldLeaf.curr_fullness = fullness/2;
      newLeaf.keys[0 to fullness/2] become oldkeys[fullness/2 to end], including the newly inserted key
      newLeaf.curr_fullness = fullness/2 + 1;
  • Link the old node to the new node
      tmp = oldLeaf.nextleaf;
      oldLeaf.nextleaf = newLeaf;
      newLeaf.nextleaf = tmp;
  • Now insert the first key in the new node into the parent node
      parent.keys[x] = newLeaf.keys[0];
      parent.leafptrs[x] = newLeaf;
We're done because we specified that the parent was not full.
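
The split steps above can be sketched in runnable form. This is an illustration under assumptions, not the deck's implementation: the Leaf struct stores its keys in a std::vector instead of a fixed array, the data pointers are left out, and splitLeaf and kCapacity are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

const std::size_t kCapacity = 4;  // assumed leaf capacity, as on the slides

struct Leaf {
    std::vector<int> keys;     // sorted, at most kCapacity entries
    Leaf* nextleaf = nullptr;  // non-owning link to the next leaf
};

// Split a full leaf while inserting newkey. Returns the new right sibling;
// the caller must insert its first key (and a pointer to it) into the parent.
std::unique_ptr<Leaf> splitLeaf(Leaf& oldLeaf, int newkey) {
    assert(oldLeaf.keys.size() == kCapacity);
    // Build the sorted run of old keys plus the new key.
    std::vector<int> all = oldLeaf.keys;
    all.insert(std::upper_bound(all.begin(), all.end(), newkey), newkey);

    const std::size_t half = kCapacity / 2;
    std::unique_ptr<Leaf> newLeaf(new Leaf);
    oldLeaf.keys.assign(all.begin(), all.begin() + half);  // first half stays
    newLeaf->keys.assign(all.begin() + half, all.end());   // rest moves over

    // Link the new leaf in right after the old one.
    newLeaf->nextleaf = oldLeaf.nextleaf;
    oldLeaf.nextleaf = newLeaf.get();
    return newLeaf;
}
```

Splitting the leaf 17 20 22 23 while inserting 21 leaves 17 20 in the old leaf and 21 22 23 in the new one, so 21 is the key that goes into the parent.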

16 Case 3: both target and parent are full:
• Create a new leaf node, divide the keys in half, and place half in each node. Use the first key in the new node as the key for the parent (like case 2)
• The interior (parent) node is full:
  • Create a new interior node
  • Divide the sorted keys (including the new node's key) between the old interior node and the new node
  • Insert a new pointer to the new interior node from its parent node, with the key being the first key in the new interior node
  • (That key moves up: we no longer need it in the interior node)
• Recursively insert the new key/pointer into the parent node, until the parent node is no longer full
• If you split the root, make a new root with the before pointer pointing to the old root, and a key and pointer to the new split-off node
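
The difference from a leaf split is that the middle key moves up instead of being copied. A minimal sketch of the key handling only, with hypothetical names (InteriorSplit, splitInteriorKeys); the child pointers would be divided the same way, with the first mid + 1 pointers staying in the old node and the rest going to the new one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for splitting an overfull interior node.
struct InteriorSplit {
    std::vector<int> leftKeys;   // stay in the old interior node
    int pushedUp = 0;            // moves to the parent, kept in neither half
    std::vector<int> rightKeys;  // go to the new interior node
};

// `keys` is sorted and already includes the key that caused the overflow.
InteriorSplit splitInteriorKeys(const std::vector<int>& keys) {
    assert(keys.size() >= 3);
    const std::size_t mid = keys.size() / 2;
    InteriorSplit out;
    out.leftKeys.assign(keys.begin(), keys.begin() + mid);
    out.pushedUp = keys[mid];
    out.rightKeys.assign(keys.begin() + mid + 1, keys.end());
    return out;
}
```

For the overfull key run 5 13 17 24 30, the old node keeps 5 13, the new node gets 24 30, and 17 moves up to the parent (or into a new root).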

17 Insert 8*
[Figure, before: Root: 13 17 24 30. Leaves: 2* 3* 5* 7* | 13* 16* | 17* 20* 22* | 24* 27* 29* | 30* 34* 38* 39*]
[Figure, after splitting the leaf: Root: 13 17 24 30. Leaves: 2* 3* | 5* 7* 8* | 13* 16* | 17* 20* 22* | 24* 27* 29* | 30* 34* 38* 39*]
Insert 5 into parent (the parent would then hold 5 13 17 24 30, which is too many)
Bring 17 up to the parent (or, in this case, make a new root)
[Figure, after: New root: 17. Interior nodes: 5 13 | 24 30. Leaves: 2* 3* | 5* 7* 8* | 13* 16* | 17* 20* 22* | 24* 27* 29* | 30* 34* 38* 39*]

18 Deleting from a B+ Tree:
• Worry about underflow
• Start at the root, find the leaf with the key
• Remove the key
• If the leaf is still at least half full, done!
• If the leaf is less than half full:
  • Try redistribution:
    • Borrow from a sibling (an adjacent node with the same parent as the leaf)
    • Change the key in the parent
  • If redistribution fails, merge:
    • Merge with the sibling
    • Delete the key from the parent of the leaf
• May need to propagate merging up the tree
  • If the parent ends up with underflow:
    • Adopt from a neighbor, update the parent
    • If necessary, merge and delete from the parent
• If the root ends up with only one child (not key, child!), make the child the new root
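
The leaf-level decision (done, borrow, or merge) can be sketched as follows. This is a simplified illustration with hypothetical names: it only looks at the right sibling, stores keys in vectors, and leaves propagation up the tree, as well as updating an ancestor key when the first key of a leaf is removed, to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class Outcome { NotFound, Done, Borrowed, Merged };

// Delete `key` from `leaf`. `right` is the adjacent sibling under the same
// parent and `separator` is the parent's key between the two leaves.
// On Merged, `right` is emptied and the caller must delete `separator`
// (and the pointer to `right`) from the parent.
Outcome deleteFromLeaf(std::vector<int>& leaf, std::vector<int>& right,
                       int& separator, std::size_t minKeys, int key) {
    auto it = std::lower_bound(leaf.begin(), leaf.end(), key);
    if (it == leaf.end() || *it != key) return Outcome::NotFound;
    leaf.erase(it);
    if (leaf.size() >= minKeys) return Outcome::Done;  // still at least half full

    if (right.size() > minKeys) {
        // Redistribution: borrow the sibling's smallest key.
        leaf.push_back(right.front());
        right.erase(right.begin());
        separator = right.front();  // change the key in the parent
        return Outcome::Borrowed;
    }
    // The sibling cannot spare a key: merge it into this leaf.
    leaf.insert(leaf.end(), right.begin(), right.end());
    right.clear();
    return Outcome::Merged;
}
```

With a minimum of 2 keys per leaf, deleting 20 from the leaf 20 22 next to 24 27 29 borrows 24 and changes the separator to 27; deleting 24 afterwards from 22 24 next to 27 29 forces a merge into 22 27 29.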

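19 Case 1a: Delete 5:
[Figure, before: Root: 17. Interior nodes: 5 14 | 24 30. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 17* 20* 22* | 24* 27* 29* | 30* 34* 38* 39*]
[Figure, after: Root: 17. Interior nodes: 7 14 | 24 30. Leaves: 2* 3* | 7* 8* | 14* 16* | 17* 20* 22* | 24* 27* 29* | 30* 34* 38* 39*]
The leaf is still at least half full. Note: we must modify the parent's key value (5 becomes 7).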

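20 Case 1b: Delete 17:
[Figure, before: Root: 17. Interior nodes: 5 14 | 24 30. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 17* 20* 22* | 24* 27* 29* | 30* 34* 38* 39*]
[Figure, after: Root: 20. Interior nodes: 5 14 | 24 30. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 20* 22* | 24* 27* 29* | 30* 34* 38* 39*]
The leaf is still at least half full. Note: the first value of a node pointed to by the BEFORE pointer is removed, so here we must modify the key value in the parent's parent (17 becomes 20 in the root).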

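21 Case 2: Delete 20:
[Figure, before: Root: 20. Interior nodes: 5 14 | 24 30. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 20* 22* | 24* 27* 29* | 30* 34* 38* 39*]
[Figure, after: Root: 22. Interior nodes: 5 14 | 27 30. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 30* 34* 38* 39*]
The leaf is less than half full, so use redistribution:
• Borrow from a sibling (an adjacent node with the same parent as the leaf)
• Change the key in the parent
• If the first value was removed from a node under the before pointer, change the key in the parent's parent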

22 Case 3: Delete 24:
[Figure, before: Root: 22. Interior nodes: 5 14 | 27 30. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 30* 34* 38* 39*]
[Figure, after merging the leaves: Root: 22. Interior nodes: 5 14 | 30. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 30* 34* 38* 39*]
[Figure, after merging the interior nodes: Root: 5 14 22 30. Leaves: 2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 30* 34* 38* 39*]
• Merge:
  • Merge with the sibling
  • Delete the key from the parent of the leaf
• Propagate merging up the tree
  • If the parent ends up with underflow:
    • Adopt from a neighbor, update the parent
    • If necessary, merge and delete from the parent
• If the root ends up with only one child (not key, child!):
  • Insert the root key into the child
  • Make the child the new root

23 B+ Tree:
• Both insertion and deletion take on the order of log_d(n) steps, the same bound as searching
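
To get a feel for the log_d(n) bound, the number of levels can be computed with integer arithmetic. A rough sketch (the function name levelsNeeded is hypothetical, and it ignores overflow for extremely large n):

```cpp
#include <cassert>
#include <cstdint>

// Smallest h such that d^h >= n, i.e. the ceiling of log_d(n),
// computed without floating point.
int levelsNeeded(std::uint64_t d, std::uint64_t n) {
    assert(d >= 2);
    int h = 0;
    std::uint64_t reach = 1;  // entries reachable with h levels of d-way branching
    while (reach < n) {
        reach *= d;
        ++h;
    }
    return h;
}
```

For one million entries, d = 2 needs 20 levels while d = 100 needs only 3, which is why a larger d gives a shorter tree and fewer disk accesses per operation.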

