Data Structures and Algorithms


Data Structures and Algorithms Course’s slides: Hierarchical data structures www.mif.vu.lt/~algis

Trees Linear access time of linked lists is prohibitive Does there exist any simple data structure for which the running time of most operations (search, insert, delete) is O(log N)?

Trees A tree is a collection of nodes The collection can be empty (recursive definition) If not empty, a tree consists of a distinguished node r (the root), and zero or more nonempty subtrees T1, T2, ...., Tk, each of whose roots are connected by a directed edge from r

Some Terminology Child and parent: every node except the root has one parent; a node can have an arbitrary number of children. Leaves: nodes with no children. Siblings: nodes with the same parent.

Some Terminology Path length: the number of edges on the path. Depth of a node: the length of the unique path from the root to that node; the depth of a tree is the depth of its deepest leaf. Height of a node: the length of the longest path from that node to a leaf; all leaves are at height 0, and the height of a tree is the height of its root. Ancestor and descendant; proper ancestor and proper descendant.

Example: UNIX Directory

Binary Trees A tree in which no node can have more than two children The depth of an “average” binary tree is considerably smaller than N, even though in the worst case the depth can be as large as N – 1.

Example: Expression Trees Leaves are operands (constants or variables) The other nodes (internal nodes) contain operators Will not be a binary tree if some operators are not binary

Tree traversal Used to print out the data in a tree in a certain order Pre-order traversal Print the data at the root Recursively print out all data in the left subtree Recursively print out all data in the right subtree

Preorder, Postorder and Inorder Preorder traversal node, left, right prefix expression ++a*bc*+*defg

Preorder, Postorder and Inorder Postorder traversal left, right, node postfix expression abc*+de*f+g*+ Inorder traversal left, node, right. infix expression a+b*c+d*e+f*g
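The three orders can be sketched in a few lines of Python (a minimal illustration; the bare-bones Node class is mine, not from the slides):

```python
class Node:
    """Minimal binary tree node (hypothetical helper, not from the slides)."""
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def preorder(t):   # node, left, right
    return [] if t is None else [t.data] + preorder(t.left) + preorder(t.right)

def inorder(t):    # left, node, right
    return [] if t is None else inorder(t.left) + [t.data] + inorder(t.right)

def postorder(t):  # left, right, node
    return [] if t is None else postorder(t.left) + postorder(t.right) + [t.data]

# Expression tree for a + b * c
expr = Node('+', Node('a'), Node('*', Node('b'), Node('c')))
print(''.join(preorder(expr)))   # prefix:  +a*bc
print(''.join(inorder(expr)))    # infix:   a+b*c
print(''.join(postorder(expr)))  # postfix: abc*+
```

Applied to the larger expression tree on the slides, the same three walks yield exactly the prefix, infix, and postfix strings shown above.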

Preorder, Postorder and Inorder

Binary Trees Possible operations on the Binary Tree ADT: parent, left_child, right_child, sibling, root, etc. Implementation: because a binary tree node has at most two children, we can keep direct pointers to them

Compare: Implementation of a general tree

Binary Search Trees Stores keys in the nodes in a way so that searching, insertion and deletion can be done efficiently. Binary search tree property For every node X, all the keys in its left subtree are smaller than the key value in X, and all the keys in its right subtree are larger than the key value in X

Binary Search Trees A binary search tree Not a binary search tree

Binary search trees Two binary search trees representing the same set: Average depth of a node is O(log N); maximum depth of a node is O(N)

Searching BST If we are searching for 15, then we are done. If we are searching for a key < 15, then we should search in the left subtree. If we are searching for a key > 15, then we should search in the right subtree.

Inorder traversal of BST Print out all the keys in sorted order Inorder: 2, 3, 4, 6, 7, 9, 13, 15, 17, 18, 20

findMin/findMax Return the node containing the smallest element in the tree Start at the root and go left as long as there is a left child. The stopping point is the smallest element Similarly for findMax Time complexity = O(height of the tree)
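A minimal Python sketch of findMin/findMax, assuming simple nodes with left/right pointers (the Node class and the example tree are hypothetical):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def find_min(t):
    # Follow left children until there is no left child.
    while t.left is not None:
        t = t.left
    return t

def find_max(t):
    # Symmetric: follow right children.
    while t.right is not None:
        t = t.right
    return t

# Hypothetical BST
root = Node(6, Node(3, Node(2), Node(4)), Node(8, Node(7), Node(9)))
print(find_min(root).key, find_max(root).key)  # 2 9
```

Both loops walk one root-to-leaf path, hence the O(height) cost stated above.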

Insert Proceed down the tree as you would with a find If X is found, do nothing (or update something) Otherwise, insert X at the last spot on the path traversed Time complexity = O(height of the tree)

Delete When we delete a node, we need to consider how we take care of the children of the deleted node. This has to be done such that the property of the search tree is maintained.

Delete Three cases: (1) the node is a leaf Delete it immediately (2) the node has one child Adjust a pointer from the parent to bypass that node

Delete (3) the node has 2 children replace the key of that node with the minimum element in the right subtree, then delete that minimum element The minimum element has either no child or only a right child (if it had a left child, that left child would be smaller and would have been chosen), so invoke case 1 or 2 Time complexity = O(height of the tree)
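The three deletion cases can be sketched in Python as follows (a minimal illustration with hypothetical Node/insert helpers, not the slides' own code):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = insert(t.left, key)
    elif key > t.key:
        t.right = insert(t.right, key)
    return t

def delete(t, key):
    if t is None:
        return None
    if key < t.key:
        t.left = delete(t.left, key)
    elif key > t.key:
        t.right = delete(t.right, key)
    else:
        # Cases 1 and 2: at most one child -- bypass the node.
        if t.left is None:
            return t.right
        if t.right is None:
            return t.left
        # Case 3: two children -- replace the key with the minimum of the
        # right subtree, then delete that minimum (which is case 1 or 2).
        m = t.right
        while m.left is not None:
            m = m.left
        t.key = m.key
        t.right = delete(t.right, m.key)
    return t

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

root = None
for k in [6, 2, 8, 1, 4, 3]:
    root = insert(root, k)
root = delete(root, 6)      # a node with two children
print(inorder(root))        # [1, 2, 3, 4, 8]
```

Note that deleting the root (two children) reduces to deleting the minimum of its right subtree, exactly as the slide describes.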

Binary search tree – best time All BST operations are O(d), where d is tree depth minimum d is for a binary tree with N nodes What is the best case tree? What is the worst case tree? So, best case running time of BST operations is O(log N) AVL Trees - Lecture 8 12/26/03

Binary Search Tree - Worst Time Worst case running time is O(N) What happens when you insert elements in ascending order? Insert: 2, 4, 6, 8, 10, 12 into an empty BST Problem: Lack of “balance”: compare depths of left and right subtree Unbalanced degenerate tree

Balanced and unbalanced BST (figure: several example trees over the keys 1–7) Is this “balanced”?

Approaches to balancing trees Don't balance May end up with some nodes very deep Strict balance The tree must always be balanced perfectly Pretty good balance Only allow a little out of balance Adjust on access Self-adjusting

Balancing binary search trees Many algorithms exist for keeping binary search trees balanced Adelson-Velskii and Landis (AVL) trees (height-balanced trees) Splay trees and other self-adjusting trees B-trees and other multiway search trees

Perfect balance Want a complete tree after every operation tree is full except possibly in the lower right This is expensive For example, insert 2 in the tree on the left and then rebuild as a complete tree (figure: tree before and after the rebuild)

AVL - good but not perfect balance AVL trees are height-balanced binary search trees Balance factor of a node height(left subtree) - height(right subtree) An AVL tree has balance factor calculated at every node For every node, heights of left and right subtree can differ by no more than 1 Store current heights in each node

Height of an AVL tree N(h) = minimum number of nodes in an AVL tree of height h. Basis N(0) = 1, N(1) = 2 Induction N(h) = N(h-1) + N(h-2) + 1 Solution (recall Fibonacci analysis) N(h) > φ^h (φ ≈ 1.62)

Height of an AVL Tree N(h) > φ^h (φ ≈ 1.62) Suppose we have n nodes in an AVL tree of height h. n ≥ N(h) (because N(h) was the minimum), so n > φ^h, hence log_φ n > h (relatively well balanced tree!!) h < 1.44 log2 n (i.e., Find takes O(log n))
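Written out, the height-bound argument from these two slides is:

```latex
N(h) = N(h-1) + N(h-2) + 1, \qquad N(0)=1,\ N(1)=2 .
\]
With $\varphi = \tfrac{1+\sqrt{5}}{2} \approx 1.62$ (the Fibonacci growth rate), induction gives
\[
N(h) > \varphi^{h},
\qquad\text{so}\qquad
n \;\ge\; N(h) \;>\; \varphi^{h}
\;\Longrightarrow\;
h \;<\; \log_{\varphi} n \;=\; \frac{\log_2 n}{\log_2 \varphi} \;\approx\; 1.44\,\log_2 n .
```

So Find (and every other O(height) operation) is O(log n) on an AVL tree.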

Node Heights (figure: Tree A (AVL) and Tree B (AVL), with the height and balance factor marked at each node) height of node = h balance factor = hleft − hright empty height = −1

Node heights after insert 7 (figure: Tree A still AVL; Tree B not AVL — one node now has balance factor 1 − (−1) = 2) height of node = h balance factor = hleft − hright empty height = −1

Insert and rotation in AVL trees Insert operation may cause balance factor to become 2 or –2 for some node only nodes on the path from insertion point to root node have possibly changed in height So after the Insert, go back up to the root node by node, updating heights If a new balance factor (the difference hleft − hright) is 2 or –2, adjust tree by rotation around the node

Single Rotation in an AVL Tree (figure: inserting 7 unbalances node 9; after a single rotation, 8 takes 9's place with children 7 and 9)

Insertions in AVL trees Let the node that needs rebalancing be α. There are 4 cases: Outside Cases (require single rotation): 1. Insertion into left subtree of left child of α. 2. Insertion into right subtree of right child of α. Inside Cases (require double rotation): 3. Insertion into right subtree of left child of α. 4. Insertion into left subtree of right child of α. The rebalancing is performed through four separate rotation algorithms.

AVL insertion: outside case Consider a valid AVL subtree (figure: node j with left child k; subtrees X and Y under k, Z under j, all of height h)

AVL Insertion: Outside Case Inserting into X destroys the AVL property at node j (figure: X grows to height h+1)

AVL Insertion: Outside Case Do a “right rotation”

Single right rotation Do a “right rotation” (figure: k rises, j falls to the right)

Outside Case Completed “Right rotation” done! (“Left rotation” is mirror symmetric) AVL property has been restored! (figure: k is now the subtree root with children X and j; j has children Y and Z)

AVL Insertion: Inside Case Consider a valid AVL subtree (figure: as before, subtrees X, Y, Z of height h)

AVL Insertion: Inside Case Inserting into Y destroys the AVL property at node j Does “right rotation” restore balance? (figure: Y grows to height h+1)

AVL Insertion: Inside Case “Right rotation” does not restore balance… now k is out of balance

AVL Insertion: Inside Case Consider the structure of subtree Y…

AVL Insertion: Inside Case Y = node i and subtrees V and W (figure: V and W have height h or h−1)

AVL Insertion: Inside Case We will do a left-right “double rotation” . . .

Double rotation : first rotation left rotation complete

Double rotation : second rotation Now do a right rotation

Double rotation : second rotation right rotation complete Balance has been restored (figure: i is now the subtree root with children k and j; subtrees V, X, W, Z reattached)

Implementation balance (1,0,-1) key left right No need to keep the height; just the difference in height, i.e. the balance factor; this has to be modified on the path of insertion even if you don’t perform rotations Once you have performed a rotation (single or double) you won’t need to go back up the tree

Single Rotation RotateFromRight(n : reference node pointer) { p : node pointer; p := n.right; n.right := p.left; p.left := n; n := p } You also need to modify the heights or balance factors of n and p (figure: left rotation around n after an insert into its right subtree)
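In Python, which has no reference parameters, the same rotation can be written to return the new subtree root (a sketch; the function name follows the slide's pseudocode, the Node class is mine):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_from_right(n):
    """Left rotation around n: n's right child p becomes the subtree root."""
    p = n.right
    n.right = p.left   # p's left subtree moves under n
    p.left = n
    return p           # caller re-attaches: parent.child = rotate_from_right(n)

# Right-heavy chain 4 -> 6 -> 8; rotating makes 6 the subtree root
t = Node(4, None, Node(6, None, Node(8)))
t = rotate_from_right(t)
print(t.key, t.left.key, t.right.key)  # 6 4 8
```

As the slide notes, a real implementation would also update the heights or balance factors of n and p here.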

Double Rotation Implement Double Rotation in two lines. DoubleRotateFromRight(n : reference node pointer) { ???? } (figure: node n with subtrees X, V, W, Z)

Insertion in AVL Trees Insert at the leaf (as for all BST) only nodes on the path from insertion point to root node have possibly changed in height So after the Insert, go back up to the root node by node, updating heights If a new balance factor (the difference hleft − hright) is 2 or –2, adjust tree by rotation around the node

Insert in BST Insert(T : reference tree pointer, x : element) : integer { if T = null then { T := new tree; T.data := x; return 1; } // the links to children are null case T.data = x : return 0; // Duplicate: do nothing T.data > x : return Insert(T.left, x); T.data < x : return Insert(T.right, x); endcase }
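A direct Python rendering of this pseudocode might look like the following (a sketch; since Python lacks reference parameters, the function returns the new subtree root together with the slide's 0/1 result):

```python
class Tree:
    """Hypothetical node type mirroring the slide's 'tree pointer'."""
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None

def insert(t, x):
    if t is None:
        return Tree(x), 1          # new node; links to children are null
    if t.data == x:
        return t, 0                # duplicate: do nothing
    if t.data > x:
        t.left, added = insert(t.left, x)
    else:
        t.right, added = insert(t.right, x)
    return t, added

root = None
for x in [5, 3, 8, 3]:
    root, added = insert(root, x)
print(added)  # 0 -- the final 3 is a duplicate
```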

Insert in AVL trees Insert(T : reference tree pointer, x : element) { if T = null then { T := new tree; T.data := x; T.height := 0; return; } case T.data = x : return; // Duplicate: do nothing T.data > x : Insert(T.left, x); if (height(T.left) - height(T.right)) = 2 { if T.left.data > x then // outside case T := RotatefromLeft(T); else // inside case T := DoubleRotatefromLeft(T); } T.data < x : Insert(T.right, x); // code similar to the left case endcase T.height := max(height(T.left), height(T.right)) + 1; return; }
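Putting the pieces together, a complete AVL insert can be sketched in Python (an illustration following the slides' outline, with heights stored in each node; the helper names are mine, not the slides'):

```python
class AVL:
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None
        self.height = 0              # empty subtree has height -1, as on the slides

def height(t):
    return -1 if t is None else t.height

def update(t):
    t.height = 1 + max(height(t.left), height(t.right))

def rotate_from_left(t):             # single right rotation
    p = t.left
    t.left = p.right
    p.right = t
    update(t); update(p)
    return p

def rotate_from_right(t):            # single left rotation
    p = t.right
    t.right = p.left
    p.left = t
    update(t); update(p)
    return p

def insert(t, x):
    if t is None:
        return AVL(x)
    if x == t.data:
        return t                     # duplicate: do nothing
    if x < t.data:
        t.left = insert(t.left, x)
        if height(t.left) - height(t.right) == 2:
            if x < t.left.data:      # outside case: single rotation
                t = rotate_from_left(t)
            else:                    # inside case: left-right double rotation
                t.left = rotate_from_right(t.left)
                t = rotate_from_left(t)
    else:
        t.right = insert(t.right, x)
        if height(t.right) - height(t.left) == 2:
            if x > t.right.data:
                t = rotate_from_right(t)
            else:
                t.right = rotate_from_left(t.right)
                t = rotate_from_right(t)
    update(t)
    return t

root = None
for x in range(1, 33):               # ascending order: worst case for a plain BST
    root = insert(root, x)
print(root.height)                   # 5 -- logarithmic, not 31
```

A plain BST would degenerate to height 31 on this input; any AVL tree with 32 nodes has height exactly 5 (height 4 holds at most 31 nodes, and height 6 needs at least N(6) = 33).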

Example of Insertions in an AVL Tree (figure: AVL tree with root 20, children 10 and 30; 30 has children 25 and 35) Insert 5, 40

Example of Insertions in an AVL Tree (figure: tree after inserting 5 and 40; still AVL) Now Insert 45

Single rotation (outside case) (figure: inserting 45 unbalances node 35; a single rotation makes 40 the parent of 35 and 45) Now Insert 34

Double rotation (inside case) (figure: inserting 34 unbalances node 30; a double rotation brings 35 up as the child of 20, with children 30 and 40) Insertion of 34

AVL Tree Deletion Similar but more complex than insertion Rotations and double rotations needed to rebalance Imbalance may propagate upward so that many rotations may be needed.

Pros and Cons of AVL Trees Arguments for AVL trees: Search is O(log N) since AVL trees are always balanced. Insertions and deletions are also O(log N). The height balancing adds no more than a constant factor to the speed of insertion. Arguments against using AVL trees: Difficult to program & debug; more space for the balance factor. Asymptotically faster, but rebalancing costs time. Most large searches are done in database systems on disk and use other structures (e.g. B-trees). May be OK to have O(N) for a single operation if total run time for many consecutive operations is fast (e.g. Splay trees).

Double Rotation Solution DoubleRotateFromRight(n : reference node pointer) { RotateFromLeft(n.right); RotateFromRight(n); }

Outline Balanced Search Trees 2-3 Trees 2-3-4 Trees Red-Black Trees

Why care about advanced implementations? Same entries, different insertion sequence:  Not good! Would like to keep tree balanced.

2-3 Trees Features each internal node has either 2 or 3 children all leaves are at the same level

2-3 Trees with Ordered Nodes leaf node can be either a 2-node or a 3-node

Example of 2-3 Tree

Traversing a 2-3 Tree inorder(in ttTree: TwoThreeTree) if(ttTree’s root node r is a leaf) visit the data item(s) else if(r has two data items) { inorder(left subtree of ttTree’s root) visit the first data item inorder(middle subtree of ttTree’s root) visit the second data item inorder(right subtree of ttTree’s root) } else visit the data item

Searching a 2-3 tree retrieveItem(in ttTree: TwoThreeTree, in searchKey:KeyType, out treeItem:TreeItemType):boolean if(searchKey is in ttTree’s root node r) { treeItem = the data portion of r return true } else if(r is a leaf) return false else return retrieveItem( appropriate subtree, searchKey, treeItem)

What did we gain? What is the time efficiency of searching for an item?

Gain: Ease of Keeping the Tree Balanced (figure: a binary search tree and a 2-3 tree, both after inserting items 39, 38, ... 32)

Inserting Items Insert 39

Inserting Items Insert 38 insert in leaf divide leaf and move middle value up to parent (figure: result)

Inserting Items Insert 37

Inserting Items Insert 36 insert in leaf overcrowded node divide leaf and move middle value up to parent

Inserting Items ... still inserting 36 divide overcrowded node, move middle value up to parent, attach children to smallest and largest result

Inserting Items After Insertion of 35, 34, 33

Inserting so far

Inserting so far

Inserting Items How do we insert 32?

Inserting Items creating a new root if necessary tree grows at the root

Inserting Items Final Result

Deleting Items Delete 70 70 80

Deleting Items Deleting 70: swap 70 with inorder successor (80)

Deleting Items Deleting 70: ... get rid of 70

Deleting Items Result

Deleting Items Delete 100

Deleting Items Deleting 100

Deleting Items Result

Deleting Items Delete 80

Deleting Items Deleting 80 ...

Deleting Items Deleting 80 ...

Deleting Items Deleting 80 ...

Deleting Items Final Result comparison with binary search tree

Deletion Algorithm I Deleting item I: Locate node n, which contains item I If node n is not a leaf, swap I with its inorder successor (deletion always begins at a leaf) If leaf node n contains another item, just delete item I else try to redistribute nodes from siblings (see next slide) if not possible, merge node (see next slide)

Deletion Algorithm II Redistribution A sibling has 2 items: redistribute item between siblings and parent Merging No sibling has 2 items: merge node move item from parent to sibling

Deletion Algorithm III Redistribution Internal node n has no item left: redistribute Merging Redistribution not possible: merge node move item from parent to sibling adopt child of n If n's parent ends up without item, apply process recursively

Deletion Algorithm IV If the merging process reaches the root and the root is without item, delete the root

Operations of 2-3 Trees all operations have time complexity of log n

2-3-4 Trees similar to 2-3 trees 4-nodes can have 3 items and 4 children (figure: a 4-node)

2-3-4 Tree example

2-3-4 Tree: Insertion Insertion procedure: similar to insertion in 2-3 trees items are inserted at the leafs since a 4-node cannot take another item, 4-nodes are split up during insertion process Strategy on the way from the root down to the leaf: split up all 4-nodes "on the way" → insertion can be done in one pass (remember: in 2-3 trees, a reverse pass might be necessary)

2-3-4 Tree: Insertion Inserting 60, 30, 10, 20, 50, 40, 70, 80, 15, 90, 100

2-3-4 Tree: Insertion Inserting 60, 30, 10, 20 ... ... 50, 40 ...

2-3-4 Tree: Insertion Inserting 50, 40 ... ... 70, ...

2-3-4 Tree: Insertion Inserting 70 ... ... 80, 15 ...

2-3-4 Tree: Insertion Inserting 80, 15 ... ... 90 ...

2-3-4 Tree: Insertion Inserting 90 ... ... 100 ...

2-3-4 Tree: Insertion Inserting 100 ...

2-3-4 Tree: Insertion Procedure Splitting 4-nodes during Insertion

2-3-4 Tree: Insertion Procedure Splitting a 4-node whose parent is a 2-node during insertion

2-3-4 Tree: Insertion Procedure Splitting a 4-node whose parent is a 3-node during insertion

2-3-4 Tree: Deletion Deletion procedure: similar to deletion in 2-3 trees items are deleted at the leafs → swap item of internal node with inorder successor note: a 2-node leaf creates a problem Strategy (different strategies possible) on the way from the root down to the leaf: turn 2-nodes (except root) into 3-nodes → deletion can be done in one pass (remember: in 2-3 trees, a reverse pass might be necessary)

2-3-4 Tree: Deletion Turning a 2-node into a 3-node ... Case 1: an adjacent sibling has 2 or 3 items "steal" item from sibling by rotating items and moving subtree (figure: "rotation" example over the keys 10, 20, 25, 30, 40, 50)

2-3-4 Tree: Deletion Turning a 2-node into a 3-node ... Case 2: each adjacent sibling has only one item → "steal" item from parent and merge node with sibling (note: parent has at least two items, unless it is the root) (figure: merging example over the keys 10, 25, 30, 35, 40, 50)

2-3-4 Tree: Deletion Practice Delete 32, 35, 40, 38, 39, 37, 60

Red-Black Tree binary-search-tree representation of 2-3-4 tree 3- and 4-nodes are represented by equivalent binary trees red and black child pointers are used to distinguish between original 2-nodes and 2-nodes that represent 3- and 4-nodes

Red-Black Representation of 4-node

Red-Black Representation of 3-node

Red-Black Tree Example

Red-Black Tree Example

Red-Black Tree Operations Traversals same as in binary search trees Insertion and Deletion analog to 2-3-4 tree need to split 4-nodes need to merge 2-nodes

Splitting a 4-node that is a root

Splitting a 4-node whose parent is a 2-node

Splitting a 4-node whose parent is a 3-node

Splitting a 4-node whose parent is a 3-node

Splitting a 4-node whose parent is a 3-node

Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t fit? We will have to use disk storage, but when this happens our usual time-complexity analysis fails The problem is that Big-Oh analysis assumes that all operations take roughly equal time This is not the case when disk access is involved

Motivation (cont.) Assume that a disk spins at 3600 RPM In 1 minute it makes 3600 revolutions, hence one revolution occurs in 1/60 of a second, or 16.7ms On average what we want is half way round this disk – it will take 8ms This sounds good until you realize that we get 120 disk accesses a second – the same time as 25 million instructions In other words, one disk access takes about the same time as 200,000 instructions It is worth executing lots of instructions to avoid a disk access

Motivation (cont.) Assume that we use a binary tree to store all the details of people in Canada (about 32 million records) We still end up with a very deep tree with lots of different disk accesses; log2 32,000,000 is about 25, so this takes about 0.21 seconds (if there is only one user of the program) We know we can’t improve on the log n for a binary tree But, the solution is to use more branches and thus less height! As branching increases, depth decreases

Definition of a B-tree A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which: 1. the number of keys in each non-leaf node is one less than the number of its children, and these keys partition the keys in the children in the fashion of a search tree 2. all leaves are on the same level 3. all non-leaf nodes except the root have at least ⌈m/2⌉ children 4. the root is either a leaf node, or it has from two to m children 5. a leaf node contains no more than m – 1 keys The number m should always be odd

An example B-Tree A B-tree of order 5 containing 26 items (figure: root 26; second level 6 12 and 42 51 62; leaves 1 2 4 / 7 8 / 13 15 18 25 / 27 29 / 45 46 48 / 53 55 60 / 64 70 90) Note that all the leaves are at the same level
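Searching such a multiway node is a straightforward generalization of BST search: locate the key's position among the node's keys, and if it is absent descend into the child between the two surrounding keys. A Python sketch (the small example tree here is hypothetical, not the slide's 26-item tree):

```python
import bisect

class BNode:
    """Multiway node: keys kept sorted; children[i] holds keys below keys[i]."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty for a leaf

def search(node, key):
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                # reached a leaf: key is absent
        return False
    return search(node.children[i], key)

# A small order-5 B-tree (hypothetical)
tree = BNode([6, 12, 42],
             [BNode([1, 2, 4]), BNode([7, 8]),
              BNode([13, 15, 18, 25]), BNode([45, 46, 48])])
print(search(tree, 18), search(tree, 5))  # True False
```

Only one node is examined per level, which is exactly why each level should correspond to one disk read.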

Constructing a B-tree Suppose we start with an empty B-tree and keys arrive in the following order: 1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45 We want to construct a B-tree of order 5 The first four items go into the root: 1 2 8 12 To put the fifth item in the root would violate condition 5 Therefore, when 25 arrives, pick the middle key to make a new root

Constructing a B-tree Add 25 to the tree: the node 1 2 8 12 25 exceeds order. Promote the middle key (8) and split.

Constructing a B-tree (contd.) 6, 14, 28 get added to the leaf nodes (figure: root 8; leaves 1 2 6 and 12 14 25 28)

Constructing a B-tree (contd.) Adding 17 to the right leaf node would over-fill it, so we take the middle key, promote it (to the root) and split the leaf (figure: root 8 17; leaves 1 2 6 / 12 14 / 25 28)

Constructing a B-tree (contd.) 7, 52, 16, 48 get added to the leaf nodes (figure: root 8 17; leaves 1 2 6 7 / 12 14 16 / 25 28 48 52)

Constructing a B-tree (contd.) Adding 68 causes us to split the right most leaf, promoting 48 to the root (figure: root 8 17 48; leaves 1 2 6 7 / 12 14 16 / 25 28 / 52 68)

Constructing a B-tree (contd.) Adding 3 causes us to split the left most leaf, promoting 3 to the root (figure: root 3 8 17 48; leaves 1 2 / 6 7 / 12 14 16 / 25 28 / 52 68)

Constructing a B-tree (contd.) Add 26, 29, 53, 55: these go into the leaves (figure: root 3 8 17 48; leaves 1 2 / 6 7 / 12 14 16 / 25 26 28 29 / 52 53 55 68)

Constructing a B-tree (contd.) Adding 45 increases the tree's height: the leaf 25 26 28 29 45 exceeds order, so 28 is promoted; the root 3 8 17 28 48 then exceeds order, so 17 is promoted to a new root (figure: root 17; second level 3 8 and 28 48)

Inserting into a B-Tree Attempt to insert the new key into a leaf If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf’s parent If this would result in the parent becoming too big, split the parent into two, promoting the middle key This strategy might have to be repeated all the way to the top If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher
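This bottom-up strategy can be sketched in Python: each recursive call reports whether its child split, and an overflowing node promotes its middle key (an illustration, not production code; the test data is the slides' own insertion sequence):

```python
import bisect

ORDER = 5                                  # max children; max keys = ORDER - 1

class BNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty for a leaf

def _insert(node, key):
    """Insert into this subtree; return (mid, right) if this node split."""
    i = bisect.bisect_left(node.keys, key)
    if node.children:                      # internal node: recurse, fix child
        split = _insert(node.children[i], key)
        if split:
            mid, right = split             # promoted key and new right sibling
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    else:                                  # leaf: insert in sorted position
        node.keys.insert(i, key)
    if len(node.keys) > ORDER - 1:         # overflow: promote the middle key
        m = len(node.keys) // 2
        right = BNode(node.keys[m + 1:], node.children[m + 1:])
        mid = node.keys[m]
        node.keys = node.keys[:m]
        node.children = node.children[:m + 1]
        return mid, right
    return None

def insert(root, key):
    split = _insert(root, key)
    if split:                              # root split: tree grows one level
        mid, right = split
        return BNode([mid], [root, right])
    return root

root = BNode()
for k in [1, 12, 8, 2, 25, 6, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 53, 55, 45]:
    root = insert(root, k)
print(root.keys)   # [17] -- matching the slides' final tree
```

Splits propagate upward exactly as described: only when the root itself splits does the tree gain a level.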

Exercise in Inserting a B-Tree Insert the following keys into a 5-way B-tree: 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56


Removal from a B-tree During insertion, the key always goes into a leaf. For deletion we wish to remove from a leaf. There are three possible ways we can do this: 1 - If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted. 2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf -- in this case we can delete the key and promote the predecessor or successor key to the non-leaf deleted key’s position.

Removal from a B-tree (2) If (1) or (2) leads to a leaf node containing less than the minimum number of keys then we have to look at the siblings immediately adjacent to the leaf in question: 3: if one of them has more than the minimum number of keys then we can promote one of its keys to the parent and take the parent key into our lacking leaf 4: if neither of them has more than the minimum number of keys then the lacking leaf and one of its neighbours can be combined with their shared parent (the opposite of promoting a key) and the new leaf will have the correct number of keys; if this step leaves the parent with too few keys then we repeat the process up to the root itself, if required

Type #1: Simple leaf deletion Assuming a 5-way B-Tree, as before... Delete 2: Since there are enough keys in the node, just delete it (figure: root 12 29 52; leaves 2 7 9 / 15 22 / 31 43 / 56 69 72) Note when printed: this slide is animated

Type #2: Simple non-leaf deletion Delete 52 Borrow the predecessor or (in this case) successor (figure: 56 replaces 52 in the root)

Type #4: Too few keys in node and its siblings Delete 72 Too few keys! Join back together (figure: root becomes 12 29 56)

Type #4: Too few keys in node and its siblings (figure: 56 and 69 merged into one leaf; root 12 29)

Type #3: Enough siblings Delete 22 Demote root key and promote leaf key (figure: root 12 29)

Type #3: Enough siblings (figure: root 12 31; 29 demoted into the leaf, 31 promoted)

Exercise in Removal from a B-Tree Given 5-way B-tree created by these data (last exercise): 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56 Add these further keys: 2, 6,12 Delete these keys: 4, 5, 7, 3, 14


Analysis of B-Trees The maximum number of items in a B-tree of order m and height h: root m – 1 level 1 m(m – 1) level 2 m^2(m – 1) . . . level h m^h(m – 1) So, the total number of items is (1 + m + m^2 + m^3 + … + m^h)(m – 1) = [(m^(h+1) – 1)/(m – 1)](m – 1) = m^(h+1) – 1 When m = 5 and h = 2 this gives 5^3 – 1 = 124
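A quick check of this formula (the helper name is mine):

```python
def max_items(m, h):
    """Maximum number of items in a B-tree of order m and height h."""
    return m ** (h + 1) - 1

print(max_items(5, 2))     # 124, as on the slide
print(max_items(101, 3))   # 104060400 -- the "approximately 100 million" case
```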

Reasons for using B-Trees When searching tables held on disc, the cost of each disc transfer is high but doesn't depend much on the amount of data transferred, especially if consecutive items are transferred If we use a B-tree of order 101, say, we can transfer each node in one disc read operation A B-tree of order 101 and height 3 can hold 101^4 – 1 items (approximately 100 million) and any item can be accessed with 3 disc reads (assuming we hold the root in memory) If we take m = 3, we get a 2-3 tree, in which non-leaf nodes have two or three children (i.e., one or two keys) B-Trees are always balanced (since the leaves are all at the same level), so 2-3 trees make a good type of balanced tree