© Neeraj Suri EU-NSF ICT March 2006 Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de Introduction to Computer Science 2 Binary.


Binary Search Trees (BST) “searching in two directions…”

ICS-II – 2008, Lecture 13: Binary Search Trees

Binary Search Tree
[Figure: example binary search tree with root 30; remaining node labels as extracted: 10, 35, 1, 20, 28, 21]

Binary search trees
• A binary search tree is a tree where every node has at most two children
• Each node stores a key and some value
  • The value can also be a more complex structure or a pointer
• Keys are unique and are elements of a totally ordered set
  • The order is typically numerical or lexicographical
• For each node N and its left and right children L and R:
  • $K_L < K_N < K_R$ (all keys are distinct)
  • The condition extends to all keys in the left and right subtrees of N, not only to the direct children
  • Beware: definitions differ! $K_L < K_N \le K_R$ (Wikipedia); $K_L \le K_N \le K_R$ (Cormen)
• The condition on the key values permits
  • efficient searching
  • sequential and ordered processing of the data (in-order traversal)

Definition
• Already noted: binary trees have good access costs for searching
• But: while they are being constructed, binary trees can degenerate to a linear list (this is true for binary search trees too)
• The possible degeneration is the price for having simple construction operations (no costs for rearrangement)
  • A native binary search tree has no rearrangement operations
• Definition: A native binary search tree T is a binary tree that is either empty or in which each node contains a key such that:
  • all keys in the left subtree of T are less than the key of the root of T
  • all keys in the right subtree of T are greater than the key of the root of T
  • the left and right subtrees of T are native binary search trees too
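The recursive definition can be turned into a check fairly directly. The following is a minimal sketch, not the lecture's code: the names `BstNode` and `BstCheck` are made up for illustration, and keys are assumed to be `int`. It passes a lower and an upper bound down the tree, so that every key is compared against all of its ancestors and not only against its parent.

```java
/** Minimal node type for this sketch (hypothetical, not the lecture's class). */
class BstNode {
    int key;
    BstNode left, right;

    BstNode(int key, BstNode left, BstNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }
}

class BstCheck {
    /** Returns true if the tree rooted at node satisfies the strict definition. */
    static boolean isBst(BstNode node) {
        return isBst(node, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    // Every key in this subtree must lie strictly between low and high.
    private static boolean isBst(BstNode node, long low, long high) {
        if (node == null) return true;                 // the empty tree is a BST
        if (node.key <= low || node.key >= high) return false;
        return isBst(node.left, low, node.key)         // left keys < node.key
            && isBst(node.right, node.key, high);      // right keys > node.key
    }

    public static void main(String[] args) {
        // Valid: 30 with children 10 and 35.
        BstNode ok = new BstNode(30, new BstNode(10, null, null), new BstNode(35, null, null));
        // Invalid: 40 is in the left subtree of 30, although 10 < 40 holds locally.
        BstNode bad = new BstNode(30, new BstNode(10, null, new BstNode(40, null, null)), null);
        System.out.println(isBst(ok));   // true
        System.out.println(isBst(bad));  // false
    }
}
```

The second tree shows why comparing each node only with its direct children is not enough.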

Basic operations
• Basic operations on a binary search tree:
  • Insert
  • Delete
  • Search for a key K
  • Sequential processing of all keys
  • Find min/max element, find successor/predecessor of an element
• Example: Insert
  • Binary search trees are constructed by repeatedly inserting keys
  • New keys are always attached as leaves
  • Different sequences of insertions result in different tree structures
  • Procedure: the first key becomes the root; all following keys are inserted recursively into either the left or the right subtree (depending on the key value)

Java class

    class BinarySearchTree {
        int K;                   /* key */
        Info info;               /* stored record */
        BinarySearchTree L, R;   /* left and right subtrees */

        /* Constructor */
        public BinarySearchTree(int key, Info i) {... }

        /* insert record i with the given key into the tree */
        public BinarySearchTree insert(int key, Info i) {... }

        /* delete the record with the given key from the tree */
        public void delete(int key) {... }

        /* return the node with the given key if it exists, null otherwise */
        public BinarySearchTree find(int key) {... }

        /* sequential processing of all nodes in in-order */
        public void inOrder( ) {... }

        /* other methods... */
    }

Insert operation

    /* return a reference to the newly inserted node */
    public BinarySearchTree insert(int key, Info i) {
        if ( key < this.K ) {
            /* insert in the left subtree */
            if ( this.L == null ) {
                this.L = new BinarySearchTree( key, i );
                return this.L;
            } else
                return this.L.insert( key, i );   /* recursion */
        } else {
            /* key >= this.K: insert in the right subtree
               (an equal key is not rejected here) */
            if ( this.R == null ) {
                this.R = new BinarySearchTree( key, i );
                return this.R;
            } else
                return this.R.insert( key, i );   /* recursion */
        }
    }

Example
• Sequence of inserts: ORY, JFK, BRU, DUS, ZRH, MEX, ORD, NRT, ARN, GLA, GCM
• Min/Max? Predecessor(GLA)? Successor(GLA)?
[Figure: resulting tree. Levels from the root: ORY | JFK, ZRH | BRU, MEX | ARN, DUS, ORD | GLA, NRT | GCM]
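The questions on this slide can be explored with a small sketch. It assumes `String` keys compared with `compareTo` and uses a hypothetical class `KeyTree` rather than the lecture's `BinarySearchTree`. Minimum and maximum follow the left or right references to the end; predecessor and successor walk down from the root and remember the best candidate seen so far.

```java
class KeyTree {
    String key;
    KeyTree left, right;

    KeyTree(String key) { this.key = key; }

    /** Naive insert: smaller keys go left, all others go right. */
    void insert(String k) {
        if (k.compareTo(key) < 0) {
            if (left == null) left = new KeyTree(k); else left.insert(k);
        } else {
            if (right == null) right = new KeyTree(k); else right.insert(k);
        }
    }

    /** Builds a tree by inserting the keys in the given order (at least one key). */
    static KeyTree build(String... keys) {
        KeyTree root = new KeyTree(keys[0]);
        for (int i = 1; i < keys.length; i++) root.insert(keys[i]);
        return root;
    }

    /** Smallest key: follow the left references as far as possible. */
    String min() {
        KeyTree n = this;
        while (n.left != null) n = n.left;
        return n.key;
    }

    /** Largest key: follow the right references as far as possible. */
    String max() {
        KeyTree n = this;
        while (n.right != null) n = n.right;
        return n.key;
    }

    /** Largest key strictly smaller than k, or null if there is none. */
    String predecessor(String k) {
        String best = null;
        for (KeyTree n = this; n != null; ) {
            if (n.key.compareTo(k) < 0) { best = n.key; n = n.right; }
            else n = n.left;
        }
        return best;
    }

    /** Smallest key strictly greater than k, or null if there is none. */
    String successor(String k) {
        String best = null;
        for (KeyTree n = this; n != null; ) {
            if (n.key.compareTo(k) > 0) { best = n.key; n = n.left; }
            else n = n.right;
        }
        return best;
    }

    public static void main(String[] args) {
        KeyTree t = build("ORY", "JFK", "BRU", "DUS", "ZRH", "MEX",
                          "ORD", "NRT", "ARN", "GLA", "GCM");
        System.out.println(t.min() + " " + t.max());   // ARN ZRH
        System.out.println(t.predecessor("GLA"));      // GCM
        System.out.println(t.successor("GLA"));        // JFK
    }
}
```

All four operations follow a single path from the root, so their cost is bounded by the height of the tree.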

Example (2)
• Sequence of inserts: GLA, ARN, ORY, BRU, DUS, ZRH, MEX, ORD, NRT, JFK, GCM
[Figure: resulting tree. Levels from the root: GLA | ARN, ORY | BRU, MEX, ZRH | DUS, JFK, ORD | GCM, NRT]

Example (3)
• Sequence of inserts: ARN, BRU, DUS, GCM, GLA, JFK, MEX, NRT, ORD, ORY, ZRH
• A sorted sequence results in a degenerate tree
[Figure: degenerate tree with root ARN in which every node has only a right child, ending in ZRH]
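To see how strongly the insertion order shapes the tree, the three sequences from the examples can be inserted into an initially empty tree and the resulting heights compared. This is a sketch under assumptions: `HeightDemo` and its nested `Node` are hypothetical names, keys are `String`s, and height is counted in levels (a single node has height 1).

```java
class HeightDemo {
    static final class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    /** Iterative naive insert; returns the (possibly new) root. */
    static Node insert(Node root, String key) {
        Node fresh = new Node(key);
        if (root == null) return fresh;
        Node n = root;
        while (true) {
            if (key.compareTo(n.key) < 0) {
                if (n.left == null) { n.left = fresh; return root; }
                n = n.left;
            } else {
                if (n.right == null) { n.right = fresh; return root; }
                n = n.right;
            }
        }
    }

    static Node build(String... keys) {
        Node root = null;
        for (String k : keys) root = insert(root, k);
        return root;
    }

    /** Height as number of levels: 0 for the empty tree, 1 for a single node. */
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    public static void main(String[] args) {
        Node t1 = build("ORY", "JFK", "BRU", "DUS", "ZRH", "MEX", "ORD", "NRT", "ARN", "GLA", "GCM");
        Node t2 = build("GLA", "ARN", "ORY", "BRU", "DUS", "ZRH", "MEX", "ORD", "NRT", "JFK", "GCM");
        Node t3 = build("ARN", "BRU", "DUS", "GCM", "GLA", "JFK", "MEX", "NRT", "ORD", "ORY", "ZRH");
        System.out.println(height(t1) + " " + height(t2) + " " + height(t3)); // 6 5 11
    }
}
```

The sorted sequence produces one level per key, which is the degenerate list-like case.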

Analysis
• For n keys there are n! permutations, hence n! different sequences of inserts.
• Not all of them result in different trees.
  • Example: BRU, ARN, DUS and BRU, DUS, ARN produce the same tree
• The number of different native binary search trees for n keys is
  $\frac{1}{n+1}\binom{2n}{n}$
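The gap between the number of insert sequences and the number of distinct trees can be made concrete with a short computation. The sketch below uses a hypothetical class `TreeCount` and evaluates the formula above through the standard recurrence for these (Catalan) numbers, which avoids large intermediate binomials.

```java
class TreeCount {
    /** Number of distinct binary search trees for n keys: C(2n, n) / (n + 1). */
    static long distinctTrees(int n) {
        long c = 1;                                  // value for n = 0
        for (int i = 0; i < n; i++) {
            // recurrence: c(i + 1) = c(i) * 2 * (2i + 1) / (i + 2), always an exact division
            c = c * 2 * (2 * i + 1) / (i + 2);
        }
        return c;
    }

    /** Number of insert sequences for n distinct keys. */
    static long insertSequences(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    public static void main(String[] args) {
        System.out.println(distinctTrees(3) + " of " + insertSequences(3));   // 5 of 6
        System.out.println(distinctTrees(11) + " of " + insertSequences(11)); // 58786 of 39916800
    }
}
```

For three keys, six insert sequences yield five trees; the two sequences in the example above are the pair that coincide.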

Search (recursive)
• Searching for a key is similar to inserting one
• An unsuccessful search can be considered as "finding the insert position"

    /* return a reference to the node we are searching for, or null */
    BinarySearchTree find ( int key ) {
        if ( this.K == key ) return this;
        if ( key < this.K ) {
            /* search in the left subtree */
            if ( this.L == null ) return null;
            else return this.L.find( key );
        } else {
            /* key > this.K: search in the right subtree */
            if ( this.R == null ) return null;
            else return this.R.find( key );
        }
    }

Problems with this implementation? ;)

Search (iterative)
• Searching corresponds to walking along a specific path in the tree (in the worst case from the root to a leaf), so it does not need a stack and can be implemented iteratively and efficiently.

    BinarySearchTree find ( int key ) {
        BinarySearchTree root = this;
        while ( root != null && root.K != key ) {
            if ( key < root.K ) root = root.L;
            else root = root.R;
        }
        /* now we have either root == null or root.K == key */
        return root;
    }

Sequential Processing
• Processing all keys in sorted order can be achieved by an in-order traversal of the tree
  • Ascending key values by an LWR tree walk (left subtree, root, right subtree)
  • Descending key values by an RWL tree walk (right subtree, root, left subtree)
• Threads (as in threaded binary trees) can improve the efficiency of this operation
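Both walks can be sketched in a few lines. The class `TraversalDemo` is hypothetical, keys are assumed to be `int`, and the key values are borrowed from the introductory figure as extracted (the insertion order is an assumption and does not affect the result). The two methods differ only in which subtree is visited first.

```java
import java.util.ArrayList;
import java.util.List;

class TraversalDemo {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Naive recursive insert; returns the root of the subtree. */
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else n.right = insert(n.right, key);
        return n;
    }

    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    /** Left subtree, root, right subtree: keys in ascending order. */
    static void walkAscending(Node n, List<Integer> out) {
        if (n == null) return;
        walkAscending(n.left, out);
        out.add(n.key);
        walkAscending(n.right, out);
    }

    /** Right subtree, root, left subtree: keys in descending order. */
    static void walkDescending(Node n, List<Integer> out) {
        if (n == null) return;
        walkDescending(n.right, out);
        out.add(n.key);
        walkDescending(n.left, out);
    }

    static List<Integer> ascending(Node root) {
        List<Integer> out = new ArrayList<>();
        walkAscending(root, out);
        return out;
    }

    static List<Integer> descending(Node root) {
        List<Integer> out = new ArrayList<>();
        walkDescending(root, out);
        return out;
    }

    public static void main(String[] args) {
        Node root = build(30, 10, 35, 1, 20, 28, 21);
        System.out.println(ascending(root));   // [1, 10, 20, 21, 28, 30, 35]
        System.out.println(descending(root));  // [35, 30, 28, 21, 20, 10, 1]
    }
}
```

Each node is visited exactly once, so both walks take O(n) time; the recursion depth equals the height of the tree.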

Delete
• Deleting a node with key x is the most complicated operation.
• We differentiate between three cases:
  • Case 1: Node x is a leaf: the leaf can be deleted. There is no need for additional operations.
  • Case 2: Node x has an empty left or right subtree: delete node x and redirect the reference that pointed to x to the only non-empty subtree of x.
[Figure: case 2. Node x below its parent y has a single child z with subtrees T_l and T_r; after the deletion, z takes the place of x]

Delete
• Case 3: Node x has two non-empty subtrees: search either for the smallest right descendant ($s_r$) or for the greatest left descendant ($g_l$). Replace x with $s_r$ or $g_l$ and delete $s_r$ or $g_l$, respectively, from its original position.
  • This can be seen as switching the places of x and $s_r$ (or $g_l$) and then deleting x as in case 1 or case 2 ($s_r$ has no left child and $g_l$ has no right child, but either may still have one subtree)
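The three cases can be sketched as one recursive method that returns the new root of the subtree it was called on. This is an illustrative sketch, not the lecture's `delete`: the class `DeleteDemo` is hypothetical, keys are `String`s, only keys are stored (a real node would move its record along with the key), and case 3 always picks the smallest right descendant rather than the greatest left one.

```java
class DeleteDemo {
    static final class Node {
        String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node insert(Node n, String key) {
        if (n == null) return new Node(key);
        if (key.compareTo(n.key) < 0) n.left = insert(n.left, key);
        else n.right = insert(n.right, key);
        return n;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String k : keys) root = insert(root, k);
        return root;
    }

    /** Deletes key from the subtree rooted at n and returns the new subtree root. */
    static Node delete(Node n, String key) {
        if (n == null) return null;                       // key not present
        int cmp = key.compareTo(n.key);
        if (cmp < 0) { n.left = delete(n.left, key); return n; }
        if (cmp > 0) { n.right = delete(n.right, key); return n; }
        // n is the node x to delete
        if (n.left == null) return n.right;               // case 1 (leaf) or case 2
        if (n.right == null) return n.left;               // case 2
        Node sr = n.right;                                // case 3: smallest right descendant
        while (sr.left != null) sr = sr.left;
        n.key = sr.key;                                   // move the key of s_r into x's node
        n.right = delete(n.right, sr.key);                // remove s_r: a case 1 or case 2 delete
        return n;
    }

    /** Keys in ascending order, separated by single spaces. */
    static String inOrder(Node n) {
        if (n == null) return "";
        return (inOrder(n.left) + " " + n.key + " " + inOrder(n.right)).trim().replaceAll(" +", " ");
    }

    public static void main(String[] args) {
        Node root = build("GLA", "ARN", "ORY", "BRU", "DUS", "ZRH", "MEX", "ORD", "NRT", "JFK", "GCM");
        root = delete(root, "GCM");   // case 1: leaf
        root = delete(root, "BRU");   // case 2: only a right subtree (DUS)
        root = delete(root, "MEX");   // case 3: two subtrees, NRT moves up
        System.out.println(inOrder(root)); // ARN DUS GLA JFK NRT ORD ORY ZRH
    }
}
```

Returning the new subtree root lets the caller update its own reference, which handles deletion of the root without a special case.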

Delete
• Delete can be performed immediately (eager strategy) or delayed (lazy strategy)
• With the lazy strategy, deleted nodes are only marked as deleted and removed later (garbage collection).
• Nodes that are marked as deleted can, if needed, be reused (if the same key is reinserted)
• Deleting with an eager strategy is more complex than with a lazy one
• Searching with a lazy strategy is more complex than with an eager one (nodes that are marked as deleted also have to be handled)

Example: Case 1
Delete GCM
[Figure: the tree before and after deleting the leaf GCM]

Example: Case 2
Delete BRU:
[Figure: the tree before and after deleting BRU]

Example: Case 3
Deleting MEX has two possible results:
[Figure: the original tree and the two results, with either JFK or NRT in the place of MEX]

Things can be more complicated!
Delete ORY:
[Figure: the tree before and after deleting ORY; ORD takes the place of ORY]

Costs of the Basic Operations
• What are the costs of the operations in a tree with n nodes?
• Sequential processing has already been identified as O(n) (with different constant factors)
• Costs of deleting a node x:
  • If x is a leaf or has an empty subtree, the costs are bounded by the depth of x
  • If not, the node that will replace x has to be found. The costs of this operation are bounded by the height of the tree
• Direct search is the most important operation, since it is the basis for insertion and deletion
  • Search costs are in the worst case the costs of traversing the tree from the root to a leaf
  • Costs are bounded by the height of the tree
• Search will be investigated further because of its importance

Average Access Costs
• Possible measures (consider successful search first):
  • Number of accesses to the nodes (Z)
  • Number of key comparisons (C)
• The average number of accesses can be determined from the internal path length PL(T) of the tree:
  • Assumption: uniformly distributed access probability
  • $PL(T) = \sum_{i=1}^{n} \mathrm{depth}(K_i)$
  • Average path length $L = PL(T)/n$
  • On each path the root has to be counted as well, thus: $Z_{avg} = L + 1$

Example
• Internal path length: $PL(T) = 0 + 1 + 1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 5 = 28$, with $n = 11$
• $Z_{avg} = PL(T)/n + 1 = 28/11 + 1 \approx 3.55$ accesses
• Since two comparisons are needed per access (and only one for the last, successful one),
• $C_{avg} = 2 Z_{avg} - 1 \approx 6.09$ comparisons
[Figure: the example tree with root ORY from the first insert sequence]
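The numbers in this example can be reproduced by rebuilding the tree from the first insert sequence and summing the node depths. The sketch below uses a hypothetical class `PathLengthDemo` with `String` keys; the helper names are illustrative only.

```java
class PathLengthDemo {
    static final class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node insert(Node n, String key) {
        if (n == null) return new Node(key);
        if (key.compareTo(n.key) < 0) n.left = insert(n.left, key);
        else n.right = insert(n.right, key);
        return n;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String k : keys) root = insert(root, k);
        return root;
    }

    /** Sum of the depths of all nodes below n; call with depth 0 for the root. */
    static int internalPathLength(Node n, int depth) {
        if (n == null) return 0;
        return depth
             + internalPathLength(n.left, depth + 1)
             + internalPathLength(n.right, depth + 1);
    }

    static int size(Node n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }

    /** Average number of node accesses for a successful search: PL(T)/n + 1. */
    static double averageAccesses(Node root) {
        return (double) internalPathLength(root, 0) / size(root) + 1;
    }

    /** Average number of key comparisons: two per access, one for the last. */
    static double averageComparisons(Node root) {
        return 2 * averageAccesses(root) - 1;
    }

    public static void main(String[] args) {
        Node t = build("ORY", "JFK", "BRU", "DUS", "ZRH", "MEX", "ORD", "NRT", "ARN", "GLA", "GCM");
        System.out.println("PL = " + internalPathLength(t, 0) + ", n = " + size(t)); // PL = 28, n = 11
        System.out.println("Z_avg = " + averageAccesses(t));
        System.out.println("C_avg = " + averageComparisons(t));
    }
}
```

With PL(T) = 28 and n = 11 this gives 28/11 + 1 ≈ 3.545 accesses on average.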

Average Cost for Unsuccessful Search
• For an unsuccessful search, the sum of the path lengths to the “NULL” pointers is the decisive factor
• First determine the extended binary tree T’ of the tree T and then the external path length Ext of T’
  • For the example: $Ext = PL(T) + 2n = 28 + 22 = 50$
• Assumption: accesses to the “NULL” pointers are uniformly distributed
• Average number of comparisons of an unsuccessful search: $C_{avg}(n) = 2 \cdot Ext / (n+1)$
  • In the example: $C_{avg} = 2 \cdot 50 / 12 \approx 8.33$ comparisons.
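The external path length can also be computed directly, by treating every `null` reference as an external node and adding up the depths at which they occur. The following sketch assumes the example tree from the first insert sequence; `ExternalPathDemo` and its helpers are hypothetical names.

```java
class ExternalPathDemo {
    static final class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    static Node insert(Node n, String key) {
        if (n == null) return new Node(key);
        if (key.compareTo(n.key) < 0) n.left = insert(n.left, key);
        else n.right = insert(n.right, key);
        return n;
    }

    static Node build(String... keys) {
        Node root = null;
        for (String k : keys) root = insert(root, k);
        return root;
    }

    /** Sum of the depths of all null references; call with depth 0 for the root. */
    static int externalPathLength(Node n, int depth) {
        if (n == null) return depth;                 // an external node at this depth
        return externalPathLength(n.left, depth + 1)
             + externalPathLength(n.right, depth + 1);
    }

    /** Number of null references; n + 1 for a tree with n nodes. */
    static int externalNodes(Node n) {
        return n == null ? 1 : externalNodes(n.left) + externalNodes(n.right);
    }

    /** Average comparisons of an unsuccessful search: 2 * Ext / (n + 1). */
    static double averageComparisonsUnsuccessful(Node root) {
        return 2.0 * externalPathLength(root, 0) / externalNodes(root);
    }

    public static void main(String[] args) {
        Node t = build("ORY", "JFK", "BRU", "DUS", "ZRH", "MEX", "ORD", "NRT", "ARN", "GLA", "GCM");
        System.out.println("Ext = " + externalPathLength(t, 0));          // Ext = 50
        System.out.println("external nodes = " + externalNodes(t));      // external nodes = 12
        System.out.println("C_avg = " + averageComparisonsUnsuccessful(t));
    }
}
```

The result agrees with Ext = PL(T) + 2n = 28 + 22 = 50 and gives 100/12 ≈ 8.33 comparisons.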

Maximum Average of Access Costs
• The longest paths (and consequently the maximum costs) result when a binary search tree degenerates to a list.
  • Height $h = L_{max}$
  • At each level there is only one node, i.e., $n_i = 1$ for all i
  • $Z_{avg,max} = \frac{1}{n} \sum_{i=0}^{n-1} (i + 1) \cdot 1 = \frac{1}{2}(n + 1) \in O(n)$
• For degenerate trees the search costs are linear in the number of nodes

Minimum Average Access Costs
• Minimum access costs can be expected in a balanced tree structure
  • Optimal: complete tree, $h = \log_2(n+1) \Rightarrow Z_{avg,min} \in O(\log_2 n)$
  • (Nearly) balanced tree: $h = \lfloor \log_2 n \rfloor + 1 \Rightarrow Z_{avg,min} \in O(\log_2 n)$
• Using the formula for the average path length (and some maths):
  • $Z_{avg,min} \approx \log_2 n - 1$
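The "some maths" can be sketched for the complete tree, assuming levels are numbered from 0, level i holds $2^i$ nodes, and a node on level i costs i + 1 accesses. With $n = 2^h - 1$ nodes the total number of accesses over all keys is a standard sum:

```latex
\begin{align*}
\sum_{i=0}^{h-1} (i+1)\,2^{i} &= (h-1)\,2^{h} + 1 = h\,(n+1) - n
  && \text{(using } 2^{h} = n+1\text{)} \\
Z_{avg,min} &= \frac{1}{n}\bigl(h\,(n+1) - n\bigr)
  = \frac{n+1}{n}\,\log_2(n+1) - 1 \\
  &\approx \log_2 n - 1 \quad \text{for large } n
\end{align*}
```

For example, h = 3 gives n = 7 and a total of 1 + 4 + 12 = 17 accesses, which matches $(h-1)\,2^h + 1 = 17$.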

Average Access Costs
• First observation: avoid degenerate trees!
• Significant measure: (general) average access costs
  • If the average access costs are close to the minimum average access costs, the tree structure is OK
  • Otherwise, the tree should be rearranged
• More precisely, the problem is: determining the average access costs $Z_{avg,n}$ as the average value over all n keys and all n! search trees
  • Assumption: uniformly distributed access probability
• Next lecture: Balanced BSTs, AVL Trees!