B+-Trees.



Motivation. An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. Big-Oh analysis shows that insert, delete, and find operations each finish within O(log N) time. However, that analysis assumes all the data fits into main memory. When the tree is too large to fit in main memory and has to reside on disk, accessing each node is slow.

From Binary to K-ary. Idea: allow a node in a tree to have many children. More branching means a smaller tree height, and a smaller height means fewer nodes retrieved and so fewer disk accesses. A K-ary tree allows K-way branching: each internal node has at most K children. A complete K-ary tree has height roughly log_K N instead of log_2 N. If K = 20, then log_20 2^20 < 5, whereas log_2 2^20 = 20. Thus we can speed up access, since the number of nodes accessed decreases significantly. The idea behind a B-tree is that we want all leaves to be at the same level; we can achieve that by letting the branching factor vary from node to node.
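The height comparison on this slide can be checked numerically. The following is a minimal sketch; the helper name `approx_height` is hypothetical, and it only evaluates the logarithm, it does not build a tree.

```python
import math

def approx_height(n_items: int, branching: int) -> float:
    """Approximate height of a complete tree: log base `branching` of n_items."""
    return math.log(n_items) / math.log(branching)

n = 2 ** 20
binary_height = approx_height(n, 2)   # binary tree over 2^20 items
wide_height = approx_height(n, 20)    # 20-ary tree over the same items
```

With these inputs the binary tree needs about 20 levels while the 20-ary tree needs fewer than 5, which is the slide's point about disk accesses.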

K-ary Search Tree. A binary search tree has one key to decide which of the two branches to take. A K-ary search tree needs K-1 keys to decide which branch to take: “one more kid than key”. A K-ary search tree has restrictions: we don’t want it to degenerate into a linked list, or even a binary search tree. Thus, we require that each node is at least ½ full!

B+ Tree. A B+-tree of order K (K>3) is a K-ary tree with the following properties. The data items are stored in leaves. The root is either a leaf or has between 2 and K children. The non-leaf nodes store up to K-1 keys to guide the searching; key i represents the smallest key in subtree i+1. All non-leaf nodes (except the root) have between ⌈K/2⌉ and K children. All leaves are at the same depth and have between ⌈L/2⌉ and L data items, for some L (usually L << K, but we will assume K=L in our examples). Note: the text calls these trees B-trees, but B+-tree is the more generally used term.

Keys in Internal Nodes. Which keys are stored at the internal nodes? There are several ways to do it, and different books adopt different conventions. We will adopt the following convention: key i in an internal node is the smallest key in its (i+1)th subtree (i.e., the right subtree of key i). I would be even less strict: since internal nodes are only “roadsigns”, I would not bother to update the internal values, as long as they still route searches correctly. Even following this convention, there is no unique B+-tree for the same set of records.

B+ Tree Example 1 (Order 5, K=L=5). Records are stored at the leaves (we only show the keys here). Since L=5, each leaf has between 3 and 5 data items (the root can be an exception). Since K=5, each non-leaf node has between 3 and 5 children (the root can be an exception). Requiring nodes to be half full guarantees that the B+ tree does not degenerate into a linked list or a simple binary tree.
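The bounds quoted in this example follow from rounding K/2 and L/2 up. Here is a small sketch that computes them for any K and L. It assumes "half full" means the ceiling of K/2 (which is what the 3-to-5 range for K=5 implies) and assumes the usual exception that a non-leaf root may have as few as 2 children; the function name `occupancy_bounds` is hypothetical.

```python
import math

def occupancy_bounds(K: int, L: int) -> dict:
    """(min, max) occupancy per node kind, assuming ceil-based half-full rules."""
    half_k = math.ceil(K / 2)
    half_l = math.ceil(L / 2)
    return {
        "internal_children": (half_k, K),
        "internal_keys": (half_k - 1, K - 1),  # one more kid than key
        "root_children": (2, K),               # assumed exception for a non-leaf root
        "leaf_items": (half_l, L),
    }

example_1 = occupancy_bounds(K=5, L=5)
```

For K=L=5 this reproduces the ranges on the slide: 3 to 5 children per internal node and 3 to 5 items per leaf.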

B+ Tree Example 2 (Order K=L=4). We can talk about the left subtree and right subtree of a key in internal nodes.

B+ Tree in Practical Usage. Each internal node or leaf is designed to fit into one I/O block of data. An I/O block can usually hold quite a lot of data, which implies that the tree has only a few levels and that a search, insertion, or deletion takes only a few disk accesses. The B+-tree is a popular structure in commercial databases. To further speed up the search, the first one or two levels of the B+-tree are usually kept in main memory. The disadvantage of the B+-tree is wasted space: most nodes will have fewer than K-1 keys most of the time. The textbook calls the tree a B-tree instead of a B+-tree. In some other textbooks, B-tree refers to the variant where the actual records are kept at internal nodes as well as at the leaves. Such a scheme is not practical: keeping actual records at the internal nodes limits the number of keys stored there, and thus increases the number of tree levels.
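To make "fits into one I/O block" concrete, here is a rough sizing sketch. The block size (4096 bytes), key size (8 bytes), pointer size (8 bytes), and record size (256 bytes) are illustrative assumptions, not values from the slides, and the function names are hypothetical.

```python
def max_order(block_bytes: int, key_bytes: int, pointer_bytes: int) -> int:
    """Largest K such that K child pointers plus K-1 keys fit in one block."""
    # K * pointer + (K - 1) * key <= block  =>  K <= (block + key) / (pointer + key)
    return (block_bytes + key_bytes) // (key_bytes + pointer_bytes)

def max_leaf_items(block_bytes: int, record_bytes: int) -> int:
    """Largest L such that L whole records fit in one block."""
    return block_bytes // record_bytes

K = max_order(4096, 8, 8)        # assumed sizes, see lead-in
L = max_leaf_items(4096, 256)    # assumed record size
```

Under these assumptions K comes out far larger than L, which matches the earlier remark that usually L << K.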

Searching Example. Suppose that we want to search for the key K. The path traversed is shown in red.
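A search can be sketched as follows, assuming the convention adopted earlier (key i is the smallest key in subtree i+1), so a target equal to a separator goes right. The `Node` class and the small example tree are hypothetical and are not the tree from the slide's figure.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys
        self.children = children    # None marks a leaf

def search(root, target):
    """Return (found, nodes_visited) for target in the tree rooted at root."""
    node, visited = root, 1
    while node.children is not None:
        # Number of separators <= target is the index of the child to follow.
        node = node.children[bisect_right(node.keys, target)]
        visited += 1
    return target in node.keys, visited

leaves = [Node([1, 3, 5]), Node([7, 9, 11]), Node([13, 15, 17])]
root = Node([7, 13], leaves)
```

The number of nodes visited equals the number of levels, which is the quantity that matters when each node costs one disk access.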

Insertion. To insert a key k, first find the leaf loc where k belongs, then insert k into loc. Splitting of nodes (instead of the rotations used in AVL trees) maintains the properties of B+-trees. If leaf loc contains fewer than L keys, insert k into loc at the correct position. If loc is already full (i.e., it contains L keys), split it: cut loc off from its parent, split loc into two pieces xL and xR, and insert k into the correct piece. Then identify the key that separates xL and xR (the smallest key in xR), and insert a copy of it, together with the child pointers to xL and xR, into the old parent of loc.

Inserting “O” into a Non-full Leaf (K=3 L=3). Try inserting T.

Splitting a Leaf: Inserting T (K=4, L=3). The leaf is now overfull (an “unhappy” node). Break it apart and propagate the smallest key of the right-hand piece up to the next higher level.
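The leaf case can be sketched on plain sorted lists. This is a minimal illustration, not the slides' own code: the function name `insert_into_leaf` is hypothetical, and the choice to give the left piece the extra item when the count is odd is an assumption (either half may legally take it).

```python
from bisect import insort

def insert_into_leaf(leaf, key, L):
    """Insert key into the sorted list `leaf` (mutated in place).

    Returns (None, None) if the leaf still fits, otherwise
    (separator, right_leaf), where separator is a copy of the smallest
    key in the new right leaf and must be inserted into the parent.
    """
    insort(leaf, key)
    if len(leaf) <= L:
        return None, None
    mid = (len(leaf) + 1) // 2      # left piece keeps the larger half
    right = leaf[mid:]
    del leaf[mid:]
    return right[0], right

roomy = [10, 20]
no_split = insert_into_leaf(roomy, 15, L=3)

full = [10, 20, 30]
separator, right = insert_into_leaf(full, 25, L=3)
```

Note that the separator is copied up: it stays in the right leaf as data and also appears in the parent as a roadsign.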

Splitting Example 2: Insert M (L=3, K=4)

Splitting Example 2: Insert M (L=3, K=4). The leaf is unhappy: break it apart and propagate the smallest key of the right-hand piece up to the next higher level. The parent is NOW unhappy too, so we do the same thing again: break apart and propagate up.

Splitting an Internal Node. To insert a key k into a full internal node x: cut x off from its parent. Insert k and its left and right child pointers into x, pretending there is space; now x has K keys. Split x into two new internal nodes xL and xR, with xL containing the ⌈K/2⌉ - 1 smallest keys and xR containing the ⌊K/2⌋ largest keys. Note that the ⌈K/2⌉th key J is not placed in xL or xR. Make J the parent of xL and xR, and insert J together with its child pointers into the old parent of x.
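The internal split can be sketched on plain lists as well. It assumes the overfull node has already received the new key, so it holds K keys and K+1 child pointers, and that the left piece gets the ceiling of K/2 minus one keys while the right piece gets the floor of K/2. The function name `split_internal` and the string child labels are hypothetical.

```python
import math

def split_internal(keys, children, K):
    """Split an overfull internal node holding K keys and K+1 children.

    Returns ((left_keys, left_children), J, (right_keys, right_children)).
    J moves up to the parent and is kept in neither half.
    """
    assert len(keys) == K and len(children) == K + 1
    left_n = math.ceil(K / 2) - 1          # number of keys kept on the left
    J = keys[left_n]
    left = (keys[:left_n], children[:left_n + 1])
    right = (keys[left_n + 1:], children[left_n + 1:])
    return left, J, right

left, J, right = split_internal(
    [10, 20, 30, 40], ["c0", "c1", "c2", "c3", "c4"], K=4
)
```

Unlike the leaf split, the middle key is moved up rather than copied, because internal keys are only roadsigns and hold no data.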

Notice the multiple splits

Termination. Splitting will continue as long as we encounter full internal nodes. If the split internal node x does not have a parent (i.e., x is the root), then create a new root containing the key J and its two children.
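Putting the pieces together, the following is a compact recursive sketch of insertion with splits propagating upward and a new root created when the old root splits. The `Node` class, the function names, and the tie-breaking choices in the splits are assumptions made for illustration.

```python
from bisect import bisect_right, insort
import math

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children        # None marks a leaf

    @property
    def is_leaf(self):
        return self.children is None

def _insert(node, key, K, L):
    """Insert below node; return (key_for_parent, new_right_node) on a split."""
    if node.is_leaf:
        insort(node.keys, key)
        if len(node.keys) <= L:
            return None
        mid = (len(node.keys) + 1) // 2
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right     # leaf split: copy the key up
    i = bisect_right(node.keys, key)
    result = _insert(node.children[i], key, K, L)
    if result is None:
        return None
    sep, new_child = result
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.children) <= K:
        return None
    left_n = math.ceil(K / 2) - 1
    J = node.keys[left_n]
    right = Node(node.keys[left_n + 1:], node.children[left_n + 1:])
    node.keys = node.keys[:left_n]
    node.children = node.children[:left_n + 1]
    return J, right                     # internal split: move the key up

def insert(root, key, K, L):
    """Insert key and return the (possibly new) root."""
    result = _insert(root, key, K, L)
    if result is None:
        return root
    J, right = result
    return Node([J], [root, right])     # the tree grows one level at the top

root = Node([])
for k in range(1, 21):
    root = insert(root, k, K=4, L=3)
```

Because the tree only ever grows at the root, every leaf stays at the same depth.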

Deletion. Find the key in its leaf and delete it. The leaf may now have too few items. One fix is the reverse of insertion: pull down the separating key and slap the two neighboring nodes together. BUT when you combine neighbor nodes, you could get a node that is too large, and then you would have to split it apart again. In that case it is better to shift some of the records from a neighbor into the leaf that is too small.

Removal of a Key. Suppose target is deleted from leaf x. target can appear in at most one ancestor y of x as a key (why?). Node y is seen when we search down the tree, so after deleting from node x we can access y directly and replace target by the new smallest key in x.
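Leaf deletion can be sketched on a simplified two-level tree: one parent whose children are all leaves, held as plain lists. It prefers shifting a record from a neighbor and only merges when neither neighbor can spare one. The function name and the list-based layout are assumptions; in a taller tree the separator for the leftmost leaf lives in a higher ancestor, which this sketch does not model.

```python
import math

def delete_from_leaf(parent_keys, leaves, i, key, L):
    """Delete key from leaves[i]; parent_keys[j] separates leaves[j] and leaves[j+1]."""
    leaf = leaves[i]
    leaf.remove(key)
    if i > 0 and leaf:
        parent_keys[i - 1] = leaf[0]            # keep separator = smallest key
    min_items = math.ceil(L / 2)
    if len(leaf) >= min_items or len(leaves) == 1:
        return
    if i > 0 and len(leaves[i - 1]) > min_items:
        leaf.insert(0, leaves[i - 1].pop())     # shift one record from the left
        parent_keys[i - 1] = leaf[0]
    elif i + 1 < len(leaves) and len(leaves[i + 1]) > min_items:
        leaf.append(leaves[i + 1].pop(0))       # shift one record from the right
        parent_keys[i] = leaves[i + 1][0]
    elif i > 0:
        leaves[i - 1].extend(leaf)              # merge into the left neighbor
        del leaves[i], parent_keys[i - 1]
    else:
        leaf.extend(leaves[i + 1])              # merge the right neighbor in
        del leaves[i + 1], parent_keys[i]

keys = [5, 10]
leaves = [[1, 2], [5, 6, 7], [10, 11]]
delete_from_leaf(keys, leaves, 0, 2, L=4)       # borrows 5 from the right
after_borrow = ([list(l) for l in leaves], list(keys))
delete_from_leaf(keys, leaves, 2, 10, L=4)      # no spare records: merge left
```

A merge removes one key from the parent, so the parent itself may become too small; that case is handled one level up by the internal-node rules.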

Deletion Example (K=5, L=4): deletion causes no issues. Want to delete 15.

Again, no problems. Want to delete 9.

When a node becomes too small, you combine adjacent nodes: pull down the key between them from the parent and slap the two nodes together. Here, deletion of 10 leaves a node too small.

Now this node is unhappy. Pull down 7 and slap together.

Could combining ever be a problem? (K=5, L=4.) In this case, the circled node is unhappy, as it must have between 3 and 5 kids (except for the root). But if the unhappy node tries to combine with its left neighbor, there will be six kids (and keys 3, 5, 10, 24, 35). It will be unhappy again and will have to split. The same is true if it tries to combine with its right neighbor.

The solution is to slide a child over from the neighboring node (the child’s “aunt”).

Deleting a Key in an Internal Node. Suppose we remove a key from an internal node u, and u has fewer than ⌈K/2⌉ - 1 keys after that. Case 1: u is the root. If u is now empty, remove u and make its only child the new root.

Deleting a Key in an Internal Node. Case 2: the right sibling v of u has ⌈K/2⌉ keys or more. Move the separating key between u and v in their parent down to u. Make the leftmost child of v the rightmost child of u. Move the leftmost key in v up to become the new separating key between u and v in the parent. Case 2, mirrored: the left sibling v of u has ⌈K/2⌉ keys or more. Move the separating key between u and v in their parent down to u. Make the rightmost child of v the leftmost child of u. Move the rightmost key in v up to become the new separating key between u and v in the parent.
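Both directions of this key rotation can be sketched as below. The `Internal` dataclass, the function names, and the string child labels are hypothetical; `sep` is the index in the parent of the key that separates u and v.

```python
from dataclasses import dataclass

@dataclass
class Internal:
    keys: list
    children: list

def borrow_from_right(parent, sep, u, v):
    """v is the right sibling of u and has a key to spare."""
    u.keys.append(parent.keys[sep])            # separator moves down to u
    u.children.append(v.children.pop(0))       # v's leftmost child moves to u
    parent.keys[sep] = v.keys.pop(0)           # v's leftmost key moves up

def borrow_from_left(parent, sep, u, v):
    """v is the left sibling of u and has a key to spare."""
    u.keys.insert(0, parent.keys[sep])         # separator moves down to u
    u.children.insert(0, v.children.pop())     # v's rightmost child moves to u
    parent.keys[sep] = v.keys.pop()            # v's rightmost key moves up

# K = 5, so an internal node needs at least 2 keys; u has only 1.
u = Internal([5], ["A", "B"])
v = Internal([30, 40, 50], ["C", "D", "E", "F"])
parent = Internal([20], [u, v])
borrow_from_right(parent, 0, u, v)
```

After the call both nodes satisfy the minimum again, and the parent keeps the same number of keys, so nothing propagates further up.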

Deleting a Key in an Internal Node. Case 3: every sibling v of u contains exactly ⌈K/2⌉ - 1 keys. Move the separating key between u and v in their parent down to u. Move the keys and child pointers in u to v. Remove the pointer to u at the parent.
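The merge in Case 3 can be sketched for the situation where v is the left sibling of u (the right-sibling case is symmetric). The `Internal` dataclass and the function name `merge_into_left` are hypothetical; `sep` is the index of the separating key in the parent.

```python
from dataclasses import dataclass

@dataclass
class Internal:
    keys: list
    children: list

def merge_into_left(parent, sep):
    """Merge u = parent.children[sep + 1] into its left sibling v."""
    v = parent.children[sep]
    u = parent.children[sep + 1]
    u.keys.insert(0, parent.keys.pop(sep))   # separator moves down to u
    v.keys.extend(u.keys)                    # u's keys move to v
    v.children.extend(u.children)            # u's child pointers move to v
    del parent.children[sep + 1]             # drop the pointer to u
    return v

# K = 5: v holds the minimum of 2 keys, u has underflowed to 1 key.
v = Internal([10, 20], ["A", "B", "C"])
u = Internal([50], ["X", "Y"])
w = Internal([80, 90], ["P", "Q", "R"])
parent = Internal([30, 70], [v, u, w])
merged = merge_into_left(parent, 0)
```

The merged node holds at most K-1 keys, so it never overflows, but the parent has lost a key and may now need the same treatment itself.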