Presentation on theme: "Chapter 4: Trees Part II - AVL Tree"— Presentation transcript:
1 Chapter 4: Trees Part II - AVL Tree Mark Allen Weiss: Data Structures and Algorithm Analysis in JavaChapter 4: TreesPart II - AVL Tree
2 Better Binary Search Trees Prevent the degeneration of the BST :A BST can be set up to maintain balance during updating operations (insertions and removals)Types of BST which maintain the optimal performance:splay treesAVL treesRed-Black treesB-trees
3 AVL Trees Balanced binary search trees An AVL Tree is a binary search tree such that for every internal node v of T, the heights of the children of v can differ by at most 1.
4 Height of an AVL TreeProposition: The height of an AVL tree T storing n keys is O(log n).Justification: The easiest way to approach this problem is to find n(h): the minimum number of internal nodes of an AVL tree of height h.n(1) = 1 and n(2) = 2for n ≥ 3, an AVL tree of height h contains the root node, one AVL subtree of height n-1 and the other AVL subtree of height n-2. n(h) = 1 + n(h-1) + n(h-2)given n(h-1) > n(h-2) n(h) > 2n(h-2)n(h) > 2n(h-2)n(h) > 4n(h-4)…n(h) > 2in(h-2i)pick i = h/2 – 1 n(h) ≥ 2 h/2-1follow h < 2log n(h) +2 height of an AVL tree is O(log n)
5 InsertionA binary search tree T is called balanced if for every node v, the height of v’s children differ by at most one.Inserting a node into an AVL tree involves performing an expandExternal(w) on T, which changes the heights of some of the nodes in T.If an insertion causes T to become unbalanced, we travel up the tree from the newly created node until we find the first node x such that its grandparent z is unbalanced node.Since z became unbalanced by an insertion in the subtree rooted at its child y, height(y) = height(sibling(y)) + 2Need to rebalance...
6 Insertion: Rebalancing To rebalance the subtree rooted at z, we must perform a restructuringwe rename x, y, and z to a, b, and c based on the order of the nodes in an in-order traversal.z is replaced by b, whose children are now a and c whose children, in turn, consist of the four other subtrees formerly children of x, y, and z.
10 Restructure Algorithm Algorithm restructure(x):Input: A node x of a binary search tree T that has both a parent y and a grandparent zOutput: Tree T restructured by a rotation (eithersingle or double) involving nodes x, y, and z.1: Let (a, b, c) be an inorder listing of the nodes x, y, and z, and let (T0, T1, T2, T3) be an inorder listing of the the four subtrees of x, y, and z, not rooted at x, y, or z.Replace the subtree rooted at z with a new subtree rooted at bLet a be the left child of b and let T0, T1 be the left and right subtrees of a, respectively.Let c be the right child of b and let T2, T3 be the left and right subtrees of c, respectively.
11 Cut/Link Restructure Algorithm Let’s go into a little more detail on this algorithm...Any tree that needs to be balanced can be grouped into 7 parts: x, y, z, and the 4 trees anchored at the children of those nodes (T0-3)
12 Cut/Link Restructure Algorithm Make a new tree which is balanced and put the 7 parts from the old tree into the new tree so that the numbering is still correct when we do an in-order-traversal of the new tree.This works regardless of how the tree is originally unbalanced.Let’s see how it works!
13 Cut/Link Restructure Algorithm Number the 7 parts by doing an in-order-traversal. (note that x,y, and z are now renamed based upon their order within the traversal)
14 Cut/Link Restructure Algorithm Now create an Array, numbered 1 to 7 (the 0th element can be ignored with minimal waste of space)Cut() the 4 T trees and place them in their inorder rank in the array
15 Cut/Link Restructure Algorithm Now cut x,y, and z in that order (child,parent,grandparent) and place them in their inorder rank in the array.Now we can re-link these subtrees to the main tree.Link in rank 4 (b) where the subtree’s root formerly
16 Cut/Link Restructure Algorithm Link in ranks 2 (a) and 6 (c) as 4’s children.
17 Cut/Link Restructure Algorithm Finally, link in ranks 1,3,5, and 7 as the children of 2 and 6.Now you have a balanced tree!
18 Cut/Link Restructure Algorithm This algorithm for restructuring has the exact same effect as using the four rotation cases discussed earlier.Advantages: no case analysis, more elegantDisadvantage: can be more code to writeSame time complexity
19 RemovalWe can easily see that performing a removeAboveExternal(w) can cause T to become unbalanced.Let z be the first unbalanced node encountered while traveling up the tree from w. Also, let y be the child of z with the larger height, and let x be the child of y with the larger height.We can perform operation restructure(x) to restore balance at the subtree rooted at z.As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached
20 Removal (cont’d) example of deletion from an AVL tree: y x T T y z x T 8844177832504862142354TyxT218817785048622354Tyx444z
21 Removal (cont’d) example of deletion from an AVL tree: x y z T 88 17 78504862142354Tyxz244
22 B-TreesUp to now, all data that has been stored in the tree has been in memory.If data gets too big for main memory, what do we do?If we keep a pointer to the tree in main memory, we could bring in just the nodes that we need.For instance, to do an insert with a BST, if we need the left child, we do a disk access and retrieve the left child.If the left child is NIL, then we can do the insert, and store the child node on the disk.Not too good for a BST
23 B-TreesThe problem with BST: storing the data requires disk accesses, which is expensive, compared to execution of machine instructions.If we can reduce the number of disk accesses, then the procedures run faster.The only way to reduce the number of disk accesses is to increase the number of keys in a node.The BST allows only one key per leaf.Very good and often used for Search Engines!(when collection size gets very big the index does not fit in memory)
24 B-TreesIf we increase the number of keys in the nodes, how will we do any tree operations effectively?Above is a node with 7 keys. How do we add children?
25 B-Trees Clearly, the tree below is useless. 10 20 30 40 50 60 70 How many pointers do we need?Using the idea of BST, we need to be able to put nodes into the tree that have smaller, same and larger values than the node we are currently examining.
26 B-Trees: A General Case of Multi-Way Search Trees 706050403020108796432111227733445566We can easily find any value.We need to create operations, which require rules on what makes a tree a B-Tree.Clearly, having one key per node would be very bad.We need a mechanism to increase the height of the tree (since the number of keys in any node can get very high) so we can shift keys out of a node, making the nodes smaller.
27 B-Trees: Fields in a Node A B-Tree is a rooted tree (whose root is root[T])) having the following properties:1. Every internal node x has the following fields:leaf[x]key8[x]key7[x]key6[x]key5[x]key4[x]key3[x]key2[x]key1[x]n[x]c9[x]c8[x]c7[x]c6[x]c5[x]c4[x]c3[x]c2[x]c1[x]n[x] is the number of keys in the node. n[x] = 8 above.leaf[x] = false for internal nodes, since x is not a leaf.The keyi[x] are the values of the keys, where keyi[x] keyi+1[x].ci[x] are pointers to child nodes. All the keys in ci[x] have values that are between keyi-1[x] and keyi[x].
28 B-Trees Leaf nodes have no child pointers leaf[x] = true for leaf nodes.All leaf nodes are at the same levelleaf[x]key8[x]key7[x]key6[x]key5[x]key4[x]key3[x]key2[x]key1[x]n[x]
29 B-TreesThere are lower and upper bounds on the number of keys a node can contain. This depends on the “minimum degree” t 2, which we must specify for any given B-Tree.a. Every node other than the root must have at least t-1 keys. Every internal node other than the root thus has t children. If the tree is nonempty, the root must have at least one key.b. Every node can contain at most 2t-1 keys. Therefore, an internal node can have at most 2t children. A node is full if it contains exactly 2t-1 keys.
30 Height of B-TreeIf n 1, then for any n-key B-tree of height h and mimimum degree t 2,height = h logt[(n+1)/2]The important thing to notice is that the height of the tree is log base t. So, as t increases, the height, for any number of nodes n, will decrease.Using the formula logax = (logbx)/(logba), we can see thatlog2106 = (log10106)/(log102) 6/ 19log10106 = 6So, 13 less disk accesses to get to the leafs!
31 Basic OperationsThe root of the B-tree is always in main memory, so that a Disk-Read on the root is never required; a Disk-Write of the root is required, however, whenever the root node is changed.Any nodes that are passed as parameters must already have had a Disk-Read operation performed on them.
32 Searching a B-Tree Start at the leftmost key in the node, and go to theright until you go too far.If it is a leaf node, then youare done, as there is no leafto inspectOtherwise, retrieve the child nodefrom the disk, and put it intomemory
33 Inserting into B-trees Really very easy. Very similar with (2,4) trees.Just keep in mind that you are starting at the root, and then finding the subtree where the key should be inserted, and following the pointer.A deletion may eventually occur, and sometimes deletions force keys into their parents. So, if we encounter a full node on our way to the node where the insertion will take place, we must split that node into two.
34 Otherwise, call Nonfull() Inserting into B-trees (cont’d)B-Tree-InsertIf the node has2t-1 keys, it can’taccept any morekeys, so you needto split it into2 nodes beforedoing the insert.Otherwise, call Nonfull()