Data Structures 2-3-4 Trees Phil Tayco Slide version 1.0 Apr. 23, 2015.


1 Data Structures 2-3-4 Trees Phil Tayco Slide version 1.0 Apr. 23, 2015

2 2-3-4 Trees Binary trees revisited
Binary trees combine the best of both worlds: dynamic memory usage and a binary search like the one you could perform on a sorted array. Searching a binary tree only achieves O(log n) as long as the tree is balanced. The balance of a tree depends on the order in which nodes are inserted and deleted, which can lead to imbalance. In the worst case, imbalance leads to O(n) search performance, at which point the tree is basically a linked list.

3 2-3-4 Trees Advanced tree ideas
As with other data structures, we try to address the cons. For trees, we want to efficiently maintain balance as inserts and deletes are performed. There are tree algorithms that already do this:
–AVL trees
–Red-black trees
These trees keep the basic structure of a binary tree node. As you would guess, their insert and delete algorithms are more complex than those of the standard tree.

4 2-3-4 Trees Multiway tree
What if we modified the tree node instead? Notice that each node here can contain multiple data elements and multiple child links. The modified structure is interesting, but it needs to work within a set of rules to guarantee balance.
Tree shown: root [20 40 60] with children [10] [30] [50] [70 80]

5 2-3-4 Trees Multiway tree
A non-leaf node with 1 data item always has 2 children.
Tree shown: root [20] with children [10] [30]

6 2-3-4 Trees Multiway tree
A non-leaf node with 2 data items always has 3 children.
Tree shown: root [20 40] with children [10] [30] [50 60]

7 2-3-4 Trees Multiway tree
A non-leaf node with 3 data items always has 4 children.
Tree shown: root [20 40 60] with children [10] [30] [50] [70 80]

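8 2-3-4 Trees Multiway tree
Leaf nodes have no children and can hold any number of data items up to the node's capacity (1, 2 or 3).
Tree shown: root [20 40 60] with leaves [10] [30 31 32] [50] [70 80]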

9 2-3-4 Trees Multiway tree
As before, the child nodes to the left and right of a data item hold values less than and greater than that item, which maintains order.
Tree shown: root [20 40 60] with leaves [10] [30 31 32] [50] [70 80]

10 2-3-4 Trees Similarities to Binary trees
While the number of data items and children per node has increased, the basic ordering is the same. Search starts at the root, compares the node's data items against the search value and descends to the appropriate child. Insert adds new data items at the appropriate leaf. The algorithms will show that balance is always maintained, which makes search and insert perform at O(log n), similar to a balanced binary tree.
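The search described on this slide can be sketched as follows. This is a minimal illustration, not the deck's code: SearchNode, Search234, contains and sampleTree are hypothetical names, and the sample tree is hand-built to match the tree drawn on slide 8.

```java
// Hypothetical minimal node: up to 3 sorted items and up to 4 children.
class SearchNode {
    int numItems;
    final int[] items = new int[3];
    final SearchNode[] children = new SearchNode[4];

    SearchNode(int... values) {
        for (int v : values) items[numItems++] = v;
    }

    boolean isLeaf() { return children[0] == null; }
}

class Search234 {
    // Returns true if key is stored in the subtree rooted at node.
    static boolean contains(SearchNode node, int key) {
        while (node != null) {
            int i = 0;
            // Skip the items that are smaller than the key.
            while (i < node.numItems && key > node.items[i]) i++;
            if (i < node.numItems && key == node.items[i]) return true;
            if (node.isLeaf()) return false;
            // Child i holds the values between items[i - 1] and items[i].
            node = node.children[i];
        }
        return false;
    }

    // Hand-built copy of the slide 8 tree (an assumption about its shape).
    static SearchNode sampleTree() {
        SearchNode root = new SearchNode(20, 40, 60);
        root.children[0] = new SearchNode(10);
        root.children[1] = new SearchNode(30, 31, 32);
        root.children[2] = new SearchNode(50);
        root.children[3] = new SearchNode(70, 80);
        return root;
    }
}
```

Calling Search234.contains(Search234.sampleTree(), 31) visits the root and then the [30 31 32] leaf, so each search touches one node per level.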

11 2-3-4 Trees Insert
New data items are inserted at the leaf level. To maintain balance, we add a rule to the normal search for the leaf that should receive the new data element:
–When visiting any node, if it is full, “split” the node
–Whether or not a split has occurred, continue down the path using the standard search until a leaf node is reached
–Once a leaf is reached, add the new data element to it (if it is full, perform another “split” first)

12 2-3-4 Trees Split
Splitting a node requires creating a new parent node or modifying the existing one, as well as creating a new sibling node. Data elements are moved and child pointers are readjusted as follows:
–A new node is created as a sibling to the full node
–The 3rd data item of the full node is moved to the sibling node as its 1st data item
–The 2nd data item of the full node is added to the parent node
–The 1st data item of the full node remains where it is
–The 3rd and 4th child pointers of the full node move to the sibling node as its 1st and 2nd child pointers

13 2-3-4 Trees Split example 1
We want to add 5 to the tree below. We start at the root. Its 1st data item is 14, so we go down the 1st child pointer. That node is full, so we must split it.
[14] (root)
  [3 6 10]
    [1 2] [4] [7 8] [12]
  [17]
    [16] [18 20]

14 2-3-4 Trees
Step 1: Create new sibling node. Notice that the parent node in this case is the root, and that the sibling is not yet attached to the parent (the 2nd child pointer of the root is still connected to the 17 node).
parent (root): [14]; current: [3 6 10]; sibling: empty; other child of root: [17]

15 2-3-4 Trees
Step 2: Move the 3rd item of current to the new node as its 1st item. Here, 10 moves from current to the new sibling node.
parent (root): [14]; current: [3 6]; sibling: [10]; other child of root: [17]

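16 2-3-4 Trees
Step 3: Move the 2nd item to the parent. Notice that 6 is inserted into the data item list of the parent. This shifts 14 and the child pointer on its right (to the 17 node) one position to the right, which frees the parent's 2nd child pointer for the sibling.
parent (root): [6 14]; current: [3]; sibling: [10]; other child of root: [17]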

17 2-3-4 Trees
Step 5: Move the 3rd and 4th child pointers of current to the sibling as its 1st and 2nd child pointers. This keeps the parent-child relationships and orders intact and balanced.
[6 14] (root, parent)
  [3] (current)
    [1 2] [4]
  [10] (sibling)
    [7 8] [12]
  [17]
    [16] [18 20]

18 2-3-4 Trees Split Analysis
The split keeps the non-leaf and leaf rules intact: it guarantees that non-leaf nodes with 1, 2 or 3 data items have 2, 3 or 4 child nodes. The split is performed as full nodes are encountered on the way down. In the previous example, the insert of 5 still has not been performed. The insert process resumes at the parent. Note that if the parent has become full as a result of the split, a split at that node is not performed.

19 2-3-4 Trees
Resume insert at parent. 5 is less than 6, so we go down the 1st child pointer. 5 is greater than 3 and that node has only 1 data item, so we go down its 2nd child pointer. The node with data item 4 is a leaf and is not full, so we add 5 there.
[6 14] (root)
  [3]
    [1 2] [4 5]
  [10]
    [7 8] [12]
  [17]
    [16] [18 20]

20 2-3-4 Trees Insert Analysis
The algorithm keeps the tree balanced. New nodes are created as needed by adding siblings before adding levels. Levels are added when the root node is the one that requires splitting. When splitting the root, the same split algorithm applies, but instead of adding the 2nd data item to the parent node, a new parent node is created (as the new root).

21 2-3-4 Trees Splitting the root
Here, we will insert 15. Before we even go down to a child node, we must split the root because it is full.
[20 40 60] (root)
  [10] [30 31 32] [50] [70 80]

22 2-3-4 Trees
Step 1: Create the sibling node. The algorithm works the same as before, except there is no “parent” node (yet).
current (root): [20 40 60]; sibling: empty; leaves: [10] [30 31 32] [50] [70 80]

23 2-3-4 Trees
Step 2: Create new root as parent. Since the current node is the root, we create another new node to be the parent (and the new root).
parent: empty; current (root): [20 40 60]; sibling: empty; leaves: [10] [30 31 32] [50] [70 80]

24 2-3-4 Trees
Step 3: Move data items. The normal split occurs: the 3rd item of current moves to the 1st position of the sibling, and the 2nd item of current moves to the 1st position of the parent.
parent: [40]; current (root): [20]; sibling: [60]; leaves: [10] [30 31 32] [50] [70 80]

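25 2-3-4 Trees
Step 4: Update pointers. The 3rd and 4th child pointers of current become the 1st and 2nd child pointers of the sibling. The 1st and 2nd child pointers of the new parent are set to the current and sibling nodes respectively.
parent: [40] with children current [20] and sibling [60]; current has leaves [10] [30 31 32]; sibling has leaves [50] [70 80]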

26 2-3-4 Trees
Step 5: New root and continue. Make the parent the new root of the tree. Resume the insert from the root (15 ends up going down to the leaf node with 10 and is added there). Notice that the full leaf node 30, 31, 32 is not split. This is because it is never visited.
[40] (root)
  [20]
    [10 15] [30 31 32]
  [60]
    [50] [70 80]

27 2-3-4 Trees Insert Analysis
Splitting only occurs when a visited node is full, keeping the 2-3-4 tree rules intact. Levels of the tree grow “upward” when the root node is full (because the new parent is created at that moment and becomes the new root). Splitting a leaf node never results in more than 4 children for a parent node (if the parent node had 4 children, it would have been full and split before any of its child leaf nodes was reached). Balance is maintained because even if one side gets “heavy” with data items, all leaves stay on the same level thanks to the splitting algorithm. The best way to understand the algorithm is to insert a series of numbers and draw the resulting tree.
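For readers who want to try the "insert a series of numbers" exercise in code before reading the slide-by-slide version that follows, here is a compact sketch of the top-down insert with splits. It is an assumption-laden condensation, not the deck's Node234 and Tree234 classes: MiniNode, MiniTree234, nextChild and describe are hypothetical names, and describe prints each node followed by its children in parentheses.

```java
// Sketch only: MiniNode and MiniTree234 are illustrative names, not the deck's classes.
class MiniNode {
    int numItems;
    MiniNode parent;
    final int[] items = new int[3];
    final MiniNode[] children = new MiniNode[4];

    boolean isFull() { return numItems == 3; }
    boolean isLeaf() { return children[0] == null; }

    // Link child into the given slot and keep its parent reference in sync.
    void setChild(int slot, MiniNode child) {
        children[slot] = child;
        if (child != null) child.parent = this;
    }

    // Insert into a non-full node, keeping items sorted; returns the index used.
    int insertItem(int value) {
        int i = numItems - 1;
        while (i >= 0 && items[i] > value) {
            items[i + 1] = items[i];
            i--;
        }
        items[i + 1] = value;
        numItems++;
        return i + 1;
    }
}

class MiniTree234 {
    MiniNode root = new MiniNode();

    void insert(int value) {
        MiniNode cur = root;
        while (true) {
            if (cur.isFull()) {
                split(cur);
                cur = nextChild(cur.parent, value); // resume from the parent
            } else if (cur.isLeaf()) {
                break;
            } else {
                cur = nextChild(cur, value);
            }
        }
        cur.insertItem(value);
    }

    // Child to follow for value: the first slot whose item is greater than value.
    private MiniNode nextChild(MiniNode node, int value) {
        int i = 0;
        while (i < node.numItems && value >= node.items[i]) i++;
        return node.children[i];
    }

    private void split(MiniNode n) {
        int third = n.items[2];
        int second = n.items[1];
        MiniNode thirdChild = n.children[2];
        MiniNode fourthChild = n.children[3];
        n.numItems = 1;            // only the 1st item stays in the full node
        n.children[2] = null;
        n.children[3] = null;
        MiniNode parent = n.parent;
        if (n == root) {           // splitting the root adds a level on top
            parent = new MiniNode();
            root = parent;
            parent.setChild(0, n);
        }
        int at = parent.insertItem(second);
        // Shift the child links right of the new item to open a slot for the sibling.
        for (int c = parent.numItems - 1; c > at; c--) {
            parent.setChild(c + 1, parent.children[c]);
        }
        MiniNode sibling = new MiniNode();
        parent.setChild(at + 1, sibling);
        sibling.insertItem(third);
        sibling.setChild(0, thirdChild);
        sibling.setChild(1, fourthChild);
    }

    // Nested text form: a node in brackets, then its children in parentheses.
    static String describe(MiniNode node) {
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < node.numItems; i++) out.append(i > 0 ? " " : "").append(node.items[i]);
        out.append("]");
        if (!node.isLeaf()) {
            out.append("(");
            for (int c = 0; c <= node.numItems; c++) out.append(describe(node.children[c]));
            out.append(")");
        }
        return out.toString();
    }
}
```

Inserting 10, 20, 30, 40, 50, 60, 70, 80 in that order with this sketch gives a full root [20 40 60] over the leaves [10] [30] [50] [70 80]. Inserting 15 next splits that root first, just as in the "Splitting the root" example, although the leaves differ because 31 and 32 were never inserted here.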

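28 2-3-4 Trees
public class Node234 {
    private int numItems;        // how many data items are currently stored
    private Node234 parent;      // link back to the parent (useful for handling splits)
    private Node234[] children;  // up to 4 child pointers
    private int[] dataItems;     // up to 3 data items; -1 marks an empty slot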

29 2-3-4 Trees
    public Node234() {
        numItems = 0;
        parent = null;
        children = new Node234[4];
        dataItems = new int[3];
        for (int n = 0; n < 4; n++)
            children[n] = null;
        for (int n = 0; n < 3; n++)
            dataItems[n] = -1;
    }

30 2-3-4 Trees
public class Tree234 {
    private Node234 root;

    public Tree234() {
        root = new Node234();
    }

31 2-3-4 Trees Node234 and Tree234 Code
More properties are needed here for the node:
–numItems to keep track of how many data items are in the node
–Reference to the parent node (useful for handling splits)
–Array of child pointers
–Array of data items
The arrays are allocated in the constructor and initialized to null (for children) and -1 (for data items). We could also use a Linked List for the child and data arrays, but they are so small that we don't necessarily need to (and arrays keep the code simpler to start). The Tree is just the root node. Note that it is not initialized to null, but to a new Node234 object with no data items.

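32 2-3-4 Trees
public void insert(int value) {
    Node234 current = root;
    while (true) {
        if (current.isFull()) {
            // Split the full node, then resume the descent from its parent.
            split(current);
            current = current.getParent();
            current = getNextChild(current, value);
        }
        // Assumed continuation (the slide is cut off after the if block):
        // stop at a leaf, otherwise keep descending.
        else if (current.isLeaf())      // isLeaf is an assumed Node234 helper
            break;
        else
            current = getNextChild(current, value);
    }
    current.insertItem(value);
}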

33 2-3-4 Trees Tree234 Insert Code
We start with a current node at the root. The loop goes down the child nodes of the tree until we reach a leaf. Along the way, if the node's isFull method returns true, we have to split it. After the split, we set current to its parent and then find the appropriate child to go to based on the value to be inserted. Many methods are being used here: isFull, split, getParent and getNextChild.
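The slides use getNextChild but never show its body. One plausible sketch, under the assumption that it simply compares the value against the node's data items from left to right, is given below. NavNode is a hypothetical stand-in for Node234 that carries only what this helper needs, and isLeaf is likewise an assumed helper.

```java
// Hypothetical stand-in for Node234 with only the members getNextChild needs.
class NavNode {
    int numItems;
    int[] dataItems = {-1, -1, -1};
    NavNode[] children = new NavNode[4];

    int getNumItems() { return numItems; }
    int getItem(int index) { return dataItems[index]; }
    NavNode getChild(int index) { return children[index]; }
    // Assumed helper: a node without a first child is a leaf.
    boolean isLeaf() { return children[0] == null; }
}

class NavDemo {
    // Picks the child whose value range contains value.
    static NavNode getNextChild(NavNode node, int value) {
        int n = node.getNumItems();
        for (int i = 0; i < n; i++) {
            // The first item larger than value marks the child to follow.
            if (value < node.getItem(i)) return node.getChild(i);
        }
        // value is not smaller than any item: follow the last child.
        return node.getChild(n);
    }
}
```

With a node holding 6 and 14, as the root does after the split in example 1, a value of 5 follows the 1st child pointer, 10 the 2nd and 20 the 3rd.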

34 2-3-4 Trees
public boolean isFull() {
    return (numItems == 3);
}

public Node234 getParent() {
    return parent;
}
// Note: these methods appear in the Node234 class (split and getNextChild are in Tree234)

35 2-3-4 Trees
private void split(Node234 n) {
    int thirdItem = n.removeItem();           // removeItem returns the last item, so the 3rd comes off first
    int secondItem = n.removeItem();
    Node234 fourthChild = n.removeChild(3);
    Node234 thirdChild = n.removeChild(2);
    Node234 sibling = new Node234();
    Node234 parent;

36 2-3-4 Trees Tree234 Split Code
If you haven't been drawing pictures while going through the code, now is the time to start. Split begins by removing the 3rd and 2nd data items from the full node and storing their values. These will be transferred to the sibling and parent nodes respectively. We do the same with the 3rd and 4th child pointers of the node, disconnecting them so we can transfer them to the sibling. We then create a new sibling node and a parent reference (parent is not a new node yet, as we haven't determined whether the full node is the root at this point). The setup is complete, but there are 2 new methods in Node234 to review: removeItem and removeChild.

37 2-3-4 Trees
public int removeItem() {
    int lastItem = dataItems[numItems - 1];
    dataItems[--numItems] = -1;
    return lastItem;
}
// This removes the last data item in the data array (setting it to -1), decrements numItems and returns the value that was removed

38 2-3-4 Trees
public Node234 removeChild(int n) {
    Node234 child = children[n];
    children[n] = null;
    return child;
}
// This sets the given child of the node to null while returning a reference to that child
// Now we can look at the next part of the split function…

39 2-3-4 Trees
    if (n == root) {
        parent = new Node234();
        root = parent;
        root.setChild(0, n);
    }
    else
        parent = n.getParent();
// If the node being split is the root, now create a new node as parent and root, and set its first child to the current node
// Otherwise, a parent exists and we just get it

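40 2-3-4 Trees
    int itemLocation = parent.insertItem(secondItem);
    int parentItems = parent.getNumItems();
    int c = parentItems - 1;
    // Shift every child pointer right of the new item one slot to the right.
    while (c > itemLocation) {
        Node234 temp = parent.removeChild(c);
        parent.setChild(c + 1, temp);
        c--;
    }
    // The freed slot just right of the new item receives the sibling.
    parent.setChild(itemLocation + 1, sibling);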

41 2-3-4 Trees Tree234 Split Code – adjusting the parent
The second item from the full node being split is inserted into the parent node using the node's insertItem function. The location of that insert can vary, so it is returned and used to determine how to adjust the child pointers of the parent. This is done by getting the number of items in the parent and looping from the last child pointer down to the one just right of the new item:
–At each iteration, we remove the child pointer at position c and set it at position c + 1, which shifts every child pointer after the inserted item one place to the right
Once that shift is complete, there is a “hole” immediately to the right of the item that was inserted into the parent. This hole is filled by connecting it to the new sibling node just created! Notice we have more Node234 functions: insertItem and getNumItems…

42 2-3-4 Trees
public int getNumItems() {
    return numItems;
}
// This method is a standard get function of a class, returning the numItems property
// insertItem is not as simple…

43 2-3-4 Trees
public int insertItem(int data) {
    numItems++;
    for (int n = 2; n >= 0; n--) {
        if (dataItems[n] == -1)
            continue;
// From right to left of the data items array, we check for non-empty data items (denoted as not equal to -1). If a spot is empty, ignore it

44 2-3-4 Trees
        else {
            int d = dataItems[n];
            if (data < d)
                dataItems[n + 1] = dataItems[n];   // shift the larger item right
            else {
                dataItems[n + 1] = data;           // new item goes just right of d
                return n + 1;
            }
        }
    }
    // Every existing item was larger (or the node was empty): use the first slot.
    dataItems[0] = data;
    return 0;
}

45 2-3-4 Trees Node234 Code – inserting a data item
The “else” branch here deals with encountering a data item as we go right to left in the data array looking for the correct place to insert the new data item. When a data item is found, compare it to the new item:
–If the new item is less than it, the new item belongs to the left, so we shift the data item in the array to the right by 1
–Otherwise, the new data item belongs to the right of this item in the array, so we set it there and return that index
If we reach the end of the loop, that means all data items in the array shifted to the right and the new item belongs in the first spot (index 0). We insert it there and return that index. A lot of bouncing back and forth between Node234 and Tree234! We're almost done though. At this point, we've created the sibling node and inserted the 2nd data item of the full node into the parent (created or existing). All that is left in the split function is to give the sibling its new data item and child pointers.

46 2-3-4 Trees
    sibling.insertItem(thirdItem);
    sibling.setChild(0, thirdChild);
    sibling.setChild(1, fourthChild);
}
// Using the Node functions previously discussed, we insert the 3rd data item from the full node into the sibling and set its 1st and 2nd child pointers to what was once the full node's 3rd and 4th children
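The split code depends on setChild, which the slides call but do not show. Because insert later calls getParent on whatever node it lands on, a reasonable assumption is that setChild also updates the child's parent reference. The sketch below illustrates that assumption with a hypothetical LinkNode class; it is not taken from the deck.

```java
// Hypothetical stand-in for the linking half of Node234.
class LinkNode {
    private LinkNode parent;
    private final LinkNode[] children = new LinkNode[4];

    // Attach child at the given slot and point the child back at this node.
    void setChild(int slot, LinkNode child) {
        children[slot] = child;
        // Splitting a leaf passes null children through, so guard against null.
        if (child != null) child.parent = this;
    }

    LinkNode getChild(int slot) { return children[slot]; }

    LinkNode getParent() { return parent; }

    // Detach and return the child at the given slot, as on slide 38.
    LinkNode removeChild(int slot) {
        LinkNode child = children[slot];
        children[slot] = null;
        return child;
    }
}
```

Without the parent update, the children moved to the sibling during a split would still report the old full node as their parent, and a later split of one of them would insert into the wrong node.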

47 2-3-4 Trees Efficiency
The insert algorithm and the splits of the 2-3-4 tree guarantee balance. The balance leads to O(log n) performance. Each node allocates room for 3 data items, which implies extra data usage and an impact on performance. Question 1: is the performance impact of traversing each node's data array significant? Question 2: is the array allocation of 3 elements per node a significant amount of data storage?

48 2-3-4 Trees Performance
In the worst case, for each node visited at each level, the entire data array is traversed before finding the element or determining the next level to descend to (searching for the tree's maximum value is one such case). Because of the way the insert and split algorithms work, it is rare for every node along the path to be full. Also, even if each node on each level was full when visited, the number of data item comparisons would still be O(log n) in the total number of data elements. This makes the search performance ultimately comparable to a balanced binary search tree.
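The O(log n) claim can be checked with a short bound. This is a standard counting argument rather than something stated on the slides; h (the number of levels) and n (the number of data items) are symbols introduced here, and the argument relies on all leaves being on the same level.

```latex
% Fewest items for h levels: every node holds 1 item and has 2 children.
% Most items for h levels: every node holds 3 items and has 4 children.
2^{h} - 1 \;\le\; n \;\le\; 4^{h} - 1
\quad\Longrightarrow\quad
\log_4(n+1) \;\le\; h \;\le\; \log_2(n+1)

% A search visits one node per level and makes at most 3 comparisons in each:
\text{comparisons per search} \;\le\; 3h \;\le\; 3\log_2(n+1) = O(\log n)
```

So even when every visited node is full, the extra work is a constant factor of at most 3 over the number of levels.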

49 2-3-4 Trees Data Storage
With most nodes in the tree not usually full, there is an amount of unused data space. The math works out to about 2/7 of the space being unused, based on the number of elements inserted into the tree. Compared to self-balancing trees like red-black trees and AVL trees, the overhead those trees spend on balancing is comparable to the amount of unused space here (you get a little better performance with 2-3-4 trees than with the balancing trees, at a relative price in data storage). Why not use a linked list instead of an array? There is an increased amount of overhead with doing that as well, but if that is necessary to relieve the unused space, it can be used.

50 2-3-4 Trees Tree Traversal
Displaying data in order with a binary search tree involved a simple recursion: display the subtree on the left, print the current element and then display the subtree on the right. The same concept applies to a 2-3-4 tree, except that you must now account for the multiple data items and child pointers:
–If the current node is not null, print the child[0] subtree, print data item[0], print the child[1] subtree
–If the current node has 2 data items, also print data item[1] and then print the child[2] subtree
–If the current node is full, also print data item[2] and then print the child[3] subtree

51 2-3-4 Trees
private void displayInOrder(Node234 current) {
    if (current != null) {
        displayInOrder(current.getChild(0));
        int n = current.getNumItems();
        for (int c = 0; c < n; c++) {
            System.out.println(current.getItem(c));
            displayInOrder(current.getChild(c + 1));
        }
    }
}
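To see the traversal produce sorted output, here is a self-contained variant that collects the items into a string instead of printing them. WalkNode, InOrder234, collect and sampleTree are hypothetical names; the sample tree is hand-built to match the tree reached on slide 26 after inserting 15.

```java
// Hypothetical minimal node for the traversal demo.
class WalkNode {
    int numItems;
    final int[] items = new int[3];
    final WalkNode[] children = new WalkNode[4];

    WalkNode(int... values) {
        for (int v : values) items[numItems++] = v;
    }
}

class InOrder234 {
    // Same recursion as displayInOrder, but appends to out instead of printing.
    static void collect(WalkNode node, StringBuilder out) {
        if (node == null) return;
        collect(node.children[0], out);
        for (int c = 0; c < node.numItems; c++) {
            out.append(node.items[c]).append(' ');
            collect(node.children[c + 1], out);
        }
    }

    // Hand-built copy of the slide 26 tree (an assumption about its shape).
    static WalkNode sampleTree() {
        WalkNode left = new WalkNode(20);
        left.children[0] = new WalkNode(10, 15);
        left.children[1] = new WalkNode(30, 31, 32);
        WalkNode right = new WalkNode(60);
        right.children[0] = new WalkNode(50);
        right.children[1] = new WalkNode(70, 80);
        WalkNode root = new WalkNode(40);
        root.children[0] = left;
        root.children[1] = right;
        return root;
    }
}
```

Collecting from the sample tree yields the items in ascending order: 10 15 20 30 31 32 40 50 60 70 80.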

52 2-3-4 Trees Delete
As you can imagine, the delete function is quite challenging:
–Removing an item at the leaf level is not hard
–Removing an item at a non-leaf level requires rearranging nodes and child pointers
The “cop out” discussed with Binary Trees is even more necessary here:
–Make each data item a class with an additional “isDeleted” property
–Set isDeleted to true on a data item when it is removed
–Rebuild the 2-3-4 tree as needed by walking through the tree and inserting the elements that are not flagged for deletion into a new tree
–The new 2-3-4 tree will still be a balanced tree
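The flag-and-rebuild approach can be sketched as below. All names (LazyItem, LazyNode, LazyDelete234, markDeleted, liveKeys) are hypothetical, and the sketch stops at collecting the surviving keys; feeding them to the insert routine of a fresh tree is left out to keep the example short.

```java
// Hypothetical data item carrying the isDeleted flag described on the slide.
class LazyItem {
    final int key;
    boolean isDeleted;

    LazyItem(int key) { this.key = key; }
}

// Hypothetical minimal node holding LazyItem objects instead of ints.
class LazyNode {
    int numItems;
    final LazyItem[] items = new LazyItem[3];
    final LazyNode[] children = new LazyNode[4];

    LazyNode(int... keys) {
        for (int k : keys) items[numItems++] = new LazyItem(k);
    }
}

class LazyDelete234 {
    // Flags key as deleted; returns false if it is absent or already flagged.
    static boolean markDeleted(LazyNode node, int key) {
        while (node != null) {
            int i = 0;
            while (i < node.numItems && key > node.items[i].key) i++;
            if (i < node.numItems && node.items[i].key == key) {
                if (node.items[i].isDeleted) return false;
                node.items[i].isDeleted = true;
                return true;
            }
            node = node.children[i]; // null below a leaf, which ends the loop
        }
        return false;
    }

    // In-order walk that keeps only the keys not flagged for deletion.
    static void liveKeys(LazyNode node, java.util.List<Integer> out) {
        if (node == null) return;
        liveKeys(node.children[0], out);
        for (int c = 0; c < node.numItems; c++) {
            if (!node.items[c].isDeleted) out.add(node.items[c].key);
            liveKeys(node.children[c + 1], out);
        }
    }
}
```

A flagged item stays in its node and still separates its neighbouring subtrees, so search and insert keep working unchanged until the tree is rebuilt from the keys that liveKeys returns.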

53 2-3-4 Trees Applications
Guaranteed balance is a big advantage that you get with a 2-3-4 tree over a binary search tree. A minimized node count together with balance also reduces the number of node visits. Fewer node visits are useful in applications where each node represents a significant unit of data:
–With disk blocks as nodes, fewer visits mean less time spent locating blocks of data on a track, which is slow
–Disk storage is a popular use of this data structure

54 2-3-4 Trees Other Multiway Trees
2-3 trees are similar to 2-3-4 trees:
–2 data items and 3 child pointers
–Same non-leaf node rules apply
Trees with larger numbers of data items per node follow the same rule for the number of data items and children (links = data items + 1), which makes the insert and split algorithm the same. 2-3 trees split only when the leaf is full and recursively split full parents up the tree (this keeps the number of splits necessary per insert to a minimum).

55 2-3-4 Trees Summary
Whether you use a self-balancing binary search tree or a 2-3-4 type tree, balance is the theme that keeps performance at O(log n). Self-balancing binary trees use less memory and allocate it more dynamically, while the algorithms for 2-3-4 trees are not as complex. The search is optimized by storing the data in some determined order. Search can reach O(1) performance if that order is not as significant and the data elements can be mapped in a different way…

