COMP261 Lecture 23 B Trees.


1 COMP261 Lecture 23 B Trees

2 Storing large amounts of data
File systems: lots of files, each file stored as lots of blocks.
Databases: large tables of data, each indexed by a key (or perhaps multiple alternative keys).
How do we access the data efficiently?
  individual items (given a key)
  sequence of all items (perhaps in order)
  assume data is stored in files on hard drives (slow access time)
Use some kind of index structure, usually also stored in a file.
B Trees, B+ Trees: data structures and algorithms for large data stored on disk (i.e., slow access).

3 COMP103 approaches
Efficient Set and Map implementations:
  Hash Tables: fast, but hard to list in order
  Binary Search Trees
Intended for in-core (in-memory) data structures.
Binary search tree: add M H T S R Q B A F D Z. Search: O(log n) while the tree stays balanced.
What if we add in alphabetical order?
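The question above can be checked with a small experiment. The following is a minimal sketch, assuming a plain unbalanced binary search tree; the names `Node`, `insert`, `height`, and `build` are illustrative and not from the lecture. It builds the tree twice from the same keys, once in the order given on the slide and once in alphabetical order, and compares the heights.

```python
class Node:
    """One node of a plain (unbalanced) binary search tree."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key iteratively and return the root; duplicates are ignored."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                return root
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = Node(key)
                return root
            cur = cur.right
        else:
            return root


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


keys = list("MHTSRQBAFDZ")
slide_order_height = height(build(keys))            # 5
sorted_order_height = height(build(sorted(keys)))   # 11: one node per level
print(slide_order_height, sorted_order_height)
```

With sorted input every new key goes to the right of the previous one, so the tree degenerates into a linked list and lookups become linear in the number of keys.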

4 Problems with Binary Search Trees
Trees can become unbalanced ⇒ lookup times can become linear.
How can we keep the tree balanced?
  AVL trees: self-balancing by tree rotations
  Red-Black trees: each node carries a colour bit
  Splay trees: move recently accessed node to the root
  B/B+ trees: always add levels at the top!
[Figure: tree diagram over the keys A, B, C, F, G, H, K, M, N, P.]

5 Problems with Binary Search Trees
Lots of pointer following ⇒ slow if each node is stored in a file! In this case, loading a node to read its value can take far longer than the comparisons we usually count.
How can we reduce pointer following?
  More data in each node
  More children of each node
  ⇒ "bushier" trees, fewer steps to leaves
[Figure: example tree diagrams.]

6 B Trees
Like binary search trees, but:
  Designed for external data structures, stored in files.
  Non-leaf nodes have up to m children and m-1 data values (for a B tree of order m, or m-1:m tree).
  Non-leaf, non-root nodes always have at least ⌈m/2⌉ children and ⌈m/2⌉-1 data values (i.e., at least half full).
  Leaf nodes have up to m-1 data values and no children.
  Adding is done "at the top" rather than "at the bottom", so leaves are all at the same depth.

7 B Trees
Like binary search trees, but:
  Non-leaf nodes have up to m children and m-1 data values.
  Non-leaf, non-root nodes always have at least ⌈m/2⌉ children and ⌈m/2⌉-1 data values (i.e., at least half full).
  Leaf nodes contain up to m-1 data values and no children.
  Adding is done "at the top" rather than "at the bottom", so leaves are all at the same depth.
B tree example: m=3 ("2-3 tree").
[Figure: 2-3 tree with root (N V); internal nodes (E J), (R), (X); leaves (A C), (G), (K M), (P Q), (S T), (W), (Y Z). Labels mark a node with 3 children and 2 values, and a node with 2 children and 1 value.]

8 2-3 trees
Every internal node is a 3-node or a 2-node:
  3-node: 3 children, 2 values
  2-node: 2 children, 1 value
All leaves have 1 or 2 values.
Data is kept in sorted order.
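The occupancy rules for a B tree of order m, with the 2-3 tree as the m = 3 case, can be written down as a small helper. This is a sketch only; the function name `btree_bounds` and the dictionary layout are assumptions made for illustration.

```python
import math


def btree_bounds(m):
    """Occupancy limits for a B tree of order m (hypothetical helper).

    The minimums apply to non-leaf, non-root nodes, as stated on the slides.
    """
    min_children = math.ceil(m / 2)
    return {
        "max_children": m,
        "max_keys": m - 1,
        "min_children": min_children,
        "min_keys": min_children - 1,
    }


print(btree_bounds(3))   # 2-3 tree: 2 or 3 children, 1 or 2 values
print(btree_bounds(20))
```

For m = 3 this gives nodes with 2 or 3 children and 1 or 2 values, which matches the 2-node and 3-node definitions above.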

9 B Trees: Search
Data values might be:
  single items (set of values)
  key:value pairs (map)
Search(key, node): just like binary search, but with more comparisons at each node:
  if node is null, return fail
  for i = 0 to k-1   (k is the number of keys in node)
    if key < keys[i], return search(key, children[i])
    if key == keys[i], return value[i]
  return search(key, children[k])
[Figure: 2-3 tree with root (E J) and leaves (A C), (G), (K M).]
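The search pseudocode above translates fairly directly into Python. The sketch below assumes a hypothetical `BNode` class holding parallel `keys` and `values` lists plus a `children` list that is empty for a leaf; the lowercase values stored in the example tree are made up for the demonstration. `None` plays the role of "fail".

```python
class BNode:
    """B tree node (illustrative): children is empty for a leaf."""
    def __init__(self, keys, values, children=None):
        self.keys = keys
        self.values = values
        self.children = children or []


def search(key, node):
    """Return the value stored under key, or None if it is absent."""
    if node is None:
        return None
    k = len(node.keys)
    for i in range(k):
        if key < node.keys[i]:
            # Descend into the child left of keys[i]; a leaf has no children.
            return search(key, node.children[i] if node.children else None)
        if key == node.keys[i]:
            return node.values[i]
    # key is larger than every key in this node: take the last child.
    return search(key, node.children[k] if node.children else None)


# The small 2-3 tree from the slide: root (E J), leaves (A C), (G), (K M).
leaves = [
    BNode(["A", "C"], ["a", "c"]),
    BNode(["G"], ["g"]),
    BNode(["K", "M"], ["k", "m"]),
]
root = BNode(["E", "J"], ["e", "j"], leaves)

print(search("G", root))   # found in the middle leaf
print(search("B", root))   # None: not in the tree
```

Each recursive call corresponds to loading one node, so the number of calls, not the number of key comparisons, is what matters when nodes live on disk.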

10 B Trees: Insert
To insert an item (e.g. a key:value pair):
  Search for the leaf where the item should be.
  If the leaf is not full, add the item to the leaf.
  If the leaf is full:
    Identify the middle item (an existing item or the new item).
    Create a new leaf node:
      retain the items before the middle item in the original node,
      put the items after the middle item in the new leaf node,
      push the middle item up to the parent, along with a pointer to the new node.
To add the new item to the parent:
  if the parent is not full:
    add the new item to the parent, and add the new child pointer just right of the new item
  else:
    split the parent node into two nodes (as for a leaf),
    push the middle item up to the grandparent,
    add a pointer to the new child just right of the pushed-up item.
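The insertion steps above can be sketched as follows. This is one possible reading, not the lecture's implementation: it stores keys only (a set rather than a map), represents nodes as plain dicts, and lets a node temporarily hold m keys before splitting it, which is a common way to "identify the middle item" among the existing items and the new one. The names `BTree`, `inorder`, and `leaf_depths` are hypothetical.

```python
import bisect


class BTree:
    """Keys-only B tree of order m (illustrative sketch)."""
    def __init__(self, m=3):
        self.m = m
        self.root = {"keys": [], "children": []}

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:
            # The root itself split: the tree grows a new level at the top.
            mid, right = split
            self.root = {"keys": [mid], "children": [self.root, right]}

    def _insert(self, node, key):
        i = bisect.bisect_left(node["keys"], key)
        if i < len(node["keys"]) and node["keys"][i] == key:
            return None                      # duplicate: nothing to do
        if not node["children"]:             # leaf: add the item here
            node["keys"].insert(i, key)
        else:
            split = self._insert(node["children"][i], key)
            if split is None:
                return None
            mid, right = split               # child split: adopt pushed-up item
            node["keys"].insert(i, mid)
            node["children"].insert(i + 1, right)
        if len(node["keys"]) <= self.m - 1:
            return None                      # node is not over-full
        return self._split(node)

    def _split(self, node):
        h = len(node["keys"]) // 2
        mid = node["keys"][h]
        right = {"keys": node["keys"][h + 1:],
                 "children": node["children"][h + 1:]}
        node["keys"] = node["keys"][:h]
        node["children"] = node["children"][:h + 1]
        return mid, right                    # middle item and new right node


def inorder(node):
    """All keys in sorted order."""
    if not node["children"]:
        return list(node["keys"])
    out = []
    for i, k in enumerate(node["keys"]):
        out += inorder(node["children"][i]) + [k]
    return out + inorder(node["children"][-1])


def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur (one element if balanced)."""
    if not node["children"]:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in node["children"]))


tree = BTree(m=3)
for key in [8, 5, 1, 7, 3, 12, 9, 6]:
    tree.insert(key)
print(tree.root["keys"], [c["keys"] for c in tree.root["children"]])
```

Because a split only ever pushes an item upward, a new level appears only when the root splits, which is why all leaves stay at the same depth.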

11 2-3 B Tree: Inserting values
Add M H T S R Q B A F D Z

12 2-3 B Tree: Inserting values
Add 8, 5, 1, 7, 3,12, 9, 6

13 2-3 B Tree: Deletion
Opposite of inserting: if a node becomes empty,
  if possible, rotate a value from a sibling through the parent to ensure the minimum number of values per node:
    if two siblings, require >= 5 keys in parent and siblings
    if one sibling, require >= 3 keys in parent and siblings
  else merge nodes: if two siblings, merge
Easier at a leaf. Harder if at an internal node.

14 B Tree: Deletion
To delete a value, x:
  Search for the node, say n, containing x.
  If n is a leaf, just delete x from n.
  If n is an internal node, replace x by the next smallest value (the largest in its left subtree) or the next largest value (the smallest in its right subtree), and delete that value from its leaf.
E.g. if children are leaves:
  […|x|…| ] […|w|…| ] ⇒ […|w| ] […| ]
or:
  […|x|…| ] […|y|…| ] ⇒ [y|…| ] […| ]
Added later and shown in lecture 25.

15 B Tree: Deletion
After deleting x, one leaf node has one fewer value. If this node now has fewer than the required minimum, we must rebalance the tree. Either:
  Rotate: move a value from a sibling node to the parent node, in place of the separator, and move the separator to the under-full leaf;
or:
  Merge: combine the under-full node with a sibling and the separator, and delete the empty node and the separator from the parent.
Added later and shown in lecture 25.
Does this cover all cases?

16 B Tree: Deletion - Rotate
If a sibling node has more than the minimum number of values, move its smallest/largest value to the parent node, in place of the separator, and move the separator to the under-full leaf.
If left sibling has a “spare” value:
  […|x|…| ] […|w|…| ] ⇒ […|w| ] […| ] […| ] [x|…| ]
Exercise!
Added later and shown in lecture 25.
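A minimal sketch of the left-sibling rotation described above, assuming nodes are plain dicts with `keys` and `children` lists (a representation chosen here for illustration, not taken from the lecture). The function name `rotate_from_left` is hypothetical. The example starts from a 2-3 tree with root (E J) whose middle leaf has just lost its only value.

```python
def rotate_from_left(parent, i):
    """Refill under-full child i by borrowing through the parent.

    The separator between child i-1 and child i moves down into child i,
    and the largest value of the left sibling moves up to replace it.
    """
    left = parent["children"][i - 1]
    node = parent["children"][i]
    node["keys"].insert(0, parent["keys"][i - 1])   # separator moves down
    parent["keys"][i - 1] = left["keys"].pop()      # largest of left moves up
    if left["children"]:
        # For internal nodes the sibling's last child moves across as well.
        node["children"].insert(0, left["children"].pop())


def leaf(*keys):
    return {"keys": list(keys), "children": []}


# Root (E J) with leaves (A C), (), (K M): the middle leaf is under-full.
parent = {"keys": ["E", "J"], "children": [leaf("A", "C"), leaf(), leaf("K", "M")]}
rotate_from_left(parent, 1)
print(parent["keys"], [c["keys"] for c in parent["children"]])
```

The rotation keeps the in-order sequence of values unchanged, which is why it preserves the search-tree ordering.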

17 B Tree: Deletion - Merge
If there is not a sibling with a “spare” value: merge the under-full leaf with a sibling (which has the minimum number of values).
Move the separator and all of the values from the sibling into the under-full node. Delete the sibling and delete the separator from the parent.
Merge:
  […|s|…| ] […|…| ] ⇒ […| ] […| ] […|s|…]
If left sibling has a “spare” value: Exercise!
Added later and shown in lecture 25.
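The merge step can be sketched in the same style. This assumes dict-based nodes with `keys` and `children` lists and merges with the right sibling; the name `merge_with_right` and the example tree are illustrative assumptions, not the lecture's code.

```python
def merge_with_right(parent, i):
    """Merge under-full child i with its right sibling (child i+1).

    The separator and all of the sibling's values move into child i;
    the sibling and the separator are then removed from the parent.
    """
    node = parent["children"][i]
    right = parent["children"][i + 1]
    separator = parent["keys"].pop(i)
    node["keys"] += [separator] + right["keys"]
    node["children"] += right["children"]
    del parent["children"][i + 1]


def leaf(*keys):
    return {"keys": list(keys), "children": []}


# Root (C J) with leaves (A), (), (M): no sibling has a spare value.
parent = {"keys": ["C", "J"], "children": [leaf("A"), leaf(), leaf("M")]}
merge_with_right(parent, 1)
print(parent["keys"], [c["keys"] for c in parent["children"]])
```

After a merge the parent has one fewer value, so the parent may itself become under-full and need a rotation or merge in turn.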

18 Deleting from leaves
[Figure: a sequence of 2-3 trees showing successive deletions from leaves, starting from the tree with root (E J) and leaves (A C), (G), (K M).]
Only reduce levels at the root.

19 More deletions: when an internal node becomes too empty, propagate the deletion up the tree
[Figure: 2-3 tree with root (N V); internal nodes (E J), (R), (X); leaves (A C), (G), (K), (P), (S), (W), (Y Z).]

20 More deletions: deletion from internal node ⇒ rotate key and child from sibling
[Figure: 2-3 tree with root (J V); internal nodes (E), (N), (X); leaves (A C), (G), (K), (R S), (W), (Y Z).]

21 Analysis
B trees are balanced:
  A new level is introduced only at the top.
  A level is removed only from the top.
  Therefore: all leaves are at the same level.
Cost of search/add/delete:
  O(log_⌈m/2⌉(n)) (at worst) = depth of tree with all nodes half full
  O(log_m(n)) (at best) = depth of tree with full nodes
If 100 million items in a B tree with m = 20:
  log_10(100,000,000) = 8
  log_20(100,000,000) ≈ 6.15
If a billion items in a B tree with m = 100:
  log_50(1,000,000,000) ≈ 5.3
  log_100(1,000,000,000) = 4.5
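The depth figures above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the helper name `depth_bounds` is an assumption, and the results are the logarithms themselves rather than exact level counts of any particular tree.

```python
import math


def depth_bounds(n, m):
    """Return (best, worst) depth estimates for n items in a B tree of order m.

    best  = log base m of n         (every node full)
    worst = log base ceil(m/2) of n (every node half full)
    """
    best = math.log(n, m)
    worst = math.log(n, math.ceil(m / 2))
    return best, worst


for n, m in [(100_000_000, 20), (1_000_000_000, 100)]:
    best, worst = depth_bounds(n, m)
    print(f"n={n:,} m={m}: best ~ {best:.2f}, worst ~ {worst:.2f}")
```

Even in the half-full case the depth stays in single digits for a billion items, which is the point of using wide nodes when each level costs a disk access.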

