© Neeraj Suri EU-NSF ICT March 2006 Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de Introduction to Computer Science 2 Balanced.


1 Introduction to Computer Science 2: Balanced Binary Search Trees (2) & Extended Binary Trees
Prof. Neeraj Suri, Brahim Ayari
© Neeraj Suri, EU-NSF ICT March 2006, Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de

2 ICS-II - 2008, Balanced Binary Search Trees (2) & Extended Binary Trees
Height of AVL Trees
- AVL trees are defined by the height difference of their subtrees
- Original goal: the tree should be as “balanced” as possible
- How balanced is an AVL tree?
- The answer is given by the theorem on the height of an AVL tree:
Theorem: For the height $h(T)$ of an AVL tree $T$ with $n$ nodes: $\lfloor \log_2 n \rfloor + 1 \le h(T) \le 1.44 \log_2(n+1)$

3 Fibonacci Trees
- The lower bound $\lfloor \log_2 n \rfloor + 1 \le h(T)$ comes from the minimal height of a balanced binary tree (already shown)
- For the proof of the upper bound one needs a special class of AVL trees: Fibonacci trees
- Fibonacci numbers: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$
Definition: Fibonacci trees are constructed as follows:
- The empty tree $T_0$ is a Fibonacci tree (height 0)
- The tree $T_1$, which contains only one node, is a Fibonacci tree of height 1
- If $T_{h-1}$ and $T_{h-2}$ are Fibonacci trees of heights $h-1$ and $h-2$, and $x$ is a node, then $T_h = (T_{h-1}, x, T_{h-2})$ is a Fibonacci tree of height $h$
- No other trees are Fibonacci trees
Observe: the number of nodes on the path from the root to the deepest leaf gives the height of the Fibonacci tree.

4 Fibonacci Trees
- $T_0$: empty tree; number of nodes $n_0 = 0$, $F_0 = 0$
- $T_1$: one node; $n_1 = 1$, $F_1 = 1$
- $T_2 = (T_1, x, T_0)$; $n_2 = 2$, $F_2 = 1$
- $T_3 = (T_2, x, T_1)$; $n_3 = 4$, $F_3 = 2$

5 Fibonacci Trees
- $T_4 = (T_3, x, T_2)$; number of nodes $n_4 = 7$, $F_4 = 3$
- $T_5 = (T_4, x, T_3)$; $n_5 = 12$, $F_5 = 5$
- $T_6$, $T_7$, etc. are built analogously

6 Fibonacci and AVL Trees
To prove: every Fibonacci tree is an AVL tree
Proof (by induction over $h$):
- Note: $T_h$ is always a tree of height $h$
- $T_0$ and $T_1$ are AVL trees
- If $T_{h-1}$ and $T_{h-2}$ are AVL trees, build $T_h = (T_{h-1}, x, T_{h-2})$ according to the rules
- As $T_{h-1}$ and $T_{h-2}$ are AVL trees, only the balance factor of the root remains to be checked
- $BF(T_h) = | h(T_{h-1}) - h(T_{h-2}) | = | (h - 1) - (h - 2) | = 1$

7 Fibonacci and AVL Trees
- Special note: for a given Fibonacci tree there is no AVL tree with the same height and fewer nodes
  - The construction gives AVL trees with maximal height
  - One can add more nodes while keeping the height, but one cannot remove any node without violating the AVL criterion (with the height kept unchanged)
- Fibonacci trees give the maximal height of an AVL tree for a given number of nodes
- Note: the number of nodes $n_h$ in $T_h$ is the $(h+2)$-th Fibonacci number minus 1, i.e., $n_h = F_{h+2} - 1$ (for $h \ge 0$)

8 Fibonacci and AVL Trees
- The following inequality holds for Fibonacci numbers: $F_h \ge \varphi^{h-2}$ for $h \ge 2$, with $\varphi = \tfrac{1}{2}(1 + \sqrt{5})$
- Let $n$ be the number of nodes in an AVL tree of height $h$. As $T_h$ contains a minimal number of nodes: $n \ge n_h$
- Insert $n_h = F_{h+2} - 1$: $n \ge n_h = F_{h+2} - 1 \ge \varphi^h - 1$, thus $n + 1 \ge \varphi^h$
- The number of nodes grows exponentially with the height
- Conversely: $h \le \log_\varphi(n + 1) = \frac{1}{\log_2 \varphi} \log_2(n+1) = 1.44\ldots \log_2(n+1)$
- Thus: the search path in an AVL tree is in the worst case 44% longer than in a complete tree
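The construction of the Fibonacci trees and the two claims about them (node count $n_h = F_{h+2} - 1$ and height at most $\log_\varphi(n+1)$) can be checked mechanically. The following is a minimal sketch under stated assumptions: the class `FibonacciTrees` and all method names are hypothetical and chosen for illustration; keys are omitted because only the shape of the tree matters here.

```java
// Illustrative sketch; class and method names are assumptions, not from the slides.
class FibonacciTrees {

    /** Plain binary tree node; keys are irrelevant for counting nodes. */
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    /** T_0 is empty (null), T_1 is a single node, T_h = (T_{h-1}, x, T_{h-2}). */
    static Node build(int h) {
        if (h == 0) return null;
        if (h == 1) return new Node(null, null);
        return new Node(build(h - 1), build(h - 2));
    }

    /** Height counted as the number of nodes on the longest root-to-leaf path. */
    static int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** AVL criterion: the subtree heights differ by at most 1 at every node. */
    static boolean isAvl(Node t) {
        if (t == null) return true;
        return Math.abs(height(t.left) - height(t.right)) <= 1
                && isAvl(t.left) && isAvl(t.right);
    }

    /** Fibonacci numbers with F_0 = 0 and F_1 = 1. */
    static long fib(int k) {
        long a = 0, b = 1;
        for (int i = 0; i < k; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    /** Upper bound log_phi(n + 1) on the height of an AVL tree with n nodes. */
    static double heightBound(long n) {
        double phi = (1 + Math.sqrt(5)) / 2;
        return Math.log(n + 1) / Math.log(phi);
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 10; h++) {
            Node t = build(h);
            System.out.printf("h=%d nodes=%d F(h+2)-1=%d avl=%b bound=%.2f%n",
                    h, size(t), fib(h + 2) - 1, isAvl(t), heightBound(size(t)));
        }
    }
}
```

The recursive `height` call inside `isAvl` keeps the sketch short at the price of quadratic work; an implementation meant for large trees would store the height in each node.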

9 Cost Analysis of AVL Trees
- $h \le c \log_2(n+1)$ means: the height of an AVL tree is bounded by $O(\log_2 n)$
- The cost of insertion is in $O(\log_2 n)$
  - Only the path from the root to the insertion point has to be considered
  - Rotations have constant cost
- The cost of deletion is in $O(\log_2 n)$
  - Each node on the path from the root to the deleted node causes at most one rotation
- AVL trees are worst-case efficient implementations of binary search trees
  - Natural trees need $\Omega(n)$ steps in the worst case
- Calculating the average height is still an open problem
  - Empirical results give $h = c + \log_2 n$ for $c \approx 0.2$

10 Weight Balanced Binary Search Trees
- Treat the “weight difference” of the two subtrees as the measure of balance
  - Weight = number of nodes in the subtree
- The properties are very similar to those of height balanced binary trees
- Let $T$ be a binary search tree, $T_L$ its left subtree, and $n(X)$ the number of nodes in a tree $X$
Definition: the value $\rho(T) = (n(T_L) + 1) / (n(T) + 1)$ is the root balance of $T$
Definition: a tree $T$ is $\alpha$-balanced if for every subtree $T'$: $\alpha \le \rho(T') \le 1 - \alpha$

11 Condition $\alpha \le \rho(T') \le 1 - \alpha$
- The set of all $\alpha$-balanced binary trees is called BB($\alpha$) (“bounded balance”)
- The definition of balance only considers the left subtree, but for a BB($\alpha$) tree it also holds for every subtree that $\alpha \le 1 - \rho'(T') \le 1 - \alpha$, where $\rho'$ is defined analogously to $\rho$ on the right subtree
- The parameter $\alpha$ defines the “distance” from a complete tree:
  - $\alpha = 1/2$: only complete trees allowed
  - $\alpha < 1/2$: relaxed condition
  - $\alpha = 0$: no structural conditions
  - $\alpha > 1/2$: makes no sense to consider

12 Example
- $\rho(T) = (n(T_L) + 1) / (n(T) + 1)$
- Choose $\alpha = 0.3$; then for every subtree: $\alpha = 0.3 \le \rho \le 1 - \alpha = 0.7$
- The tree is in BB($\alpha$) for $\alpha = 0.3$
Subtree with root | $\rho$
Mars | 3/10 = 0.3
Jupiter | 2/3 = 0.67
Pluto | 3/7 = 0.43
Mercury | 1/3 = 0.33
Uranus | 2/4 = 0.5
[Figure: binary search tree with root Mars over the keys Earth, Jupiter, Mars, Mercury, Neptune, Pluto, Saturn, Uranus, Venus]
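The root balances in this example can be recomputed with a few lines of code. The sketch below is only an illustration: the class `WeightBalance` and its methods are hypothetical names, and the shape of the planet tree is an assumption inferred from the listed root balances (the figure itself is not part of the transcript). A small tolerance is used because $\rho$ is computed in floating point.

```java
// Illustrative sketch; names and the exact tree shape are assumptions.
class WeightBalance {

    static final class Node {
        final String key;
        final Node left, right;
        Node(String key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** n(T): number of nodes in the tree rooted at t. */
    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** rho(T) = (n(T_L) + 1) / (n(T) + 1). */
    static double rootBalance(Node t) {
        return (size(t.left) + 1.0) / (size(t) + 1.0);
    }

    /** True if alpha <= rho(T') <= 1 - alpha holds for every non-empty subtree T'. */
    static boolean isBalanced(Node t, double alpha) {
        if (t == null) return true;
        double eps = 1e-9; // tolerance for the floating-point comparison
        double rho = rootBalance(t);
        return rho >= alpha - eps && rho <= 1 - alpha + eps
                && isBalanced(t.left, alpha) && isBalanced(t.right, alpha);
    }

    static Node leaf(String key) {
        return new Node(key, null, null);
    }

    /** Planet tree; the shape is inferred from the root balances in the example. */
    static Node planets() {
        Node jupiter = new Node("Jupiter", leaf("Earth"), null);
        Node mercury = new Node("Mercury", null, leaf("Neptune"));
        Node uranus = new Node("Uranus", leaf("Saturn"), leaf("Venus"));
        Node pluto = new Node("Pluto", mercury, uranus);
        return new Node("Mars", jupiter, pluto);
    }

    public static void main(String[] args) {
        Node t = planets();
        System.out.println("rho(Mars) = " + rootBalance(t));
        System.out.println("in BB(0.3): " + isBalanced(t, 0.3));
        System.out.println("in BB(0.4): " + isBalanced(t, 0.4));
    }
}
```

Recomputing `size` at every node is quadratic in the worst case; a practical BB($\alpha$) implementation would store the subtree size in each node and update it on insertion and deletion.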

13 Notes
- Already noted: $\rho = 1/2$ holds for complete trees
- Root balance $< 1/2$ means: there are fewer nodes in the left subtree
- $\alpha$ limits the root balance symmetrically from both sides
- Left tree is complete: the root balance tends towards 1 as the number of nodes increases
- Only $\alpha = 0$ allows all “degenerations”
- Not every tree (with $n$ nodes) can be transformed into a BB($\alpha$) tree for arbitrary $\alpha$
- There is at least one tree in BB($\alpha$) when $0.25 \le \alpha \le 1 - \tfrac{1}{2}\sqrt{2} \approx 0.293$

14 Height of Weight Balanced Trees
- Note: when traversing a path from the root to a leaf, one “loses” a number of nodes at every step, depending on $\alpha$
- Consider the path $p = v_1, v_2, \ldots, v_h$
- For the left and right subtrees $T_L$ and $T_R$ of a tree $T$ it holds (due to the BB($\alpha$) condition):
  - $n(T_L) + 1 \le (1 - \alpha)(n(T) + 1)$
  - $n(T_R) + 1 \le (1 - \alpha)(n(T) + 1)$
- Traversal of the path $p$:
  - $n(v_2) + 1 \le (1 - \alpha)(n(v_1) + 1)$
  - $n(v_3) + 1 \le (1 - \alpha)(n(v_2) + 1)$
  - ...
  - $n(v_h) + 1 \le (1 - \alpha)(n(v_{h-1}) + 1)$

15 Height of Weight Balanced Trees
- As $v_1$ is the root and $v_h$ a leaf: $n(T) + 1 = n(v_1) + 1$ and $n(v_h) + 1 = 2$
- Chaining the inequalities along the path: $2 = n(v_h) + 1 \le (1 - \alpha)^{h-1}(n(v_1) + 1) = (1 - \alpha)^{h-1}(n(T) + 1)$
- Apply logarithms on both sides: $1 \le (h - 1)\log_2(1 - \alpha) + \log_2(n(T) + 1)$
- Thus (note: $\log_2(1 - \alpha) < 0$), with $c = -\log_2(1 - \alpha) > 0$: $h - 1 \le \log_2(n(T) + 1) / c \in O(\log_2 n)$
- The height of the tree is logarithmic in the number of nodes
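To get a feeling for the constant, the bound can be evaluated for concrete values. The numbers below assume $\alpha = 0.3$ and $n = 9$ nodes (illustrative choices, not part of the proof), and $c$ stands for $-\log_2(1 - \alpha)$:

```latex
c = -\log_2(1 - \alpha) = -\log_2 0.7 \approx 0.515
\qquad
h - 1 \le \frac{\log_2(n + 1)}{c} = \frac{\log_2 10}{0.515} \approx \frac{3.32}{0.515} \approx 6.45
\quad\Rightarrow\quad h \le 7
```

By this estimate a BB(0.3) tree with nine nodes has height at most 7. The estimate is loose for small $n$, but it grows only logarithmically with the number of nodes.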

16 Operations on Weight Balanced Binary Trees
- Search works the same as for AVL trees
  - The cost is logarithmic
- For insertion/deletion the root balance must be updated along the path from the root to the corresponding position
  - If the criterion is violated: rotations as for AVL trees
- Open issues:
  - Are rotations appropriate measures for restructuring BB($\alpha$) trees?
  - How does one efficiently calculate the root balance?
- The number of rotations on the path to the root is limited: search/insertion/deletion are all in $O(\log_2 n)$

17 Positional Search in Balanced Binary Search Trees
- Comparison: tree implementations vs. linked lists
  - Balanced trees allow (almost) all operations in $O(\log_2 n)$
  - Linked lists need $O(n)$ for search/insertion/deletion!
  - For sequential traversal both perform in $O(n)$
- Should sorted data always be stored in trees?!
  - One should not underestimate the implementation costs
  - The “last” operation where lists “win” is positional search (the p-th element)
- Positional search: find the k-th element in a list
  - For trees the “list” is the inorder traversal

18 The Problem
- For lists:
  - Traverse k elements in $O(k)$
- For trees:
  - One does not “know” whether to go left or right, and one does not know anything about the number of nodes in the subtrees
  - In the worst case all nodes must be visited: $O(n)$!
- Can that be improved?

19 Rank of a Node
Definition: The rank of a node is the number of nodes in its left subtree plus 1
Rank = position of node x in the tree where x is the root
class BinarySearchTree {
    int K;                    /* key */
    Info info;                /* info */
    int balance;              /* BF, for AVL trees: -1, 0, +1 */
    int rank;                 /* number of nodes in the left subtree + 1 */
    BinarySearchTree L, R;    /* left and right subtree */
    /* constructor and methods ... */
    public BinarySearchTree findPos(int pos) { ... }
}

20 Algorithm
- Pseudo code:
  - Start at the root
  - If pos < rank: search in the left subtree
  - If pos > rank: subtract the rank from the position and search in the right subtree
  - The search stops when pos = rank
- Correctness:
  - The rank of a node is always its position in the subtree of which it is the root
- Note: after an insertion or deletion, every node on the path up to the root whose left subtree contains the change must update its rank

21 Example
[Figure: binary search tree over the cities Athens, Bern, Bonn, Cairo, Lima, Oslo, Paris, Prague, Rome, Sofia, Tokyo; each node is annotated with its rank, and the inorder positions run from pos = 1 to pos = 11]
- pos = 4 -> Cairo
- pos = 9 -> Rome

22 Java Method
public BinarySearchTree findPos(int pos) {
    BinarySearchTree root = this;
    /* walk down until the position matches the rank or the tree ends */
    while ((root != null) && (pos != root.rank)) {
        if (pos < root.rank) {
            root = root.L;
        } else {
            /* skip the left subtree and the current node */
            pos = pos - root.rank;
            root = root.R;
        }
    }
    return root;
}
Complexity in a balanced tree: $O(\log_2 n)$
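To try the positional search on the city example, the ranks have to be maintained during insertion. The following self-contained sketch does that for a plain, unbalanced binary search tree; the class `RankedTree`, its `insert` method, and the insertion order of the cities are assumptions made for illustration. AVL rebalancing is deliberately left out, and rotations would have to update the ranks as well.

```java
// Illustrative sketch; RankedTree and its methods are assumptions, not the lecture's code.
class RankedTree {
    final String key;
    int rank = 1;               // number of nodes in the left subtree + 1
    RankedTree left, right;

    RankedTree(String key) {
        this.key = key;
    }

    /** Inserts k into this subtree; returns false if k is already present. */
    boolean insert(String k) {
        int cmp = k.compareTo(key);
        if (cmp == 0) return false;
        if (cmp < 0) {
            boolean added;
            if (left == null) {
                left = new RankedTree(k);
                added = true;
            } else {
                added = left.insert(k);
            }
            if (added) rank++;  // one more node in the left subtree
            return added;
        }
        if (right == null) {
            right = new RankedTree(k);
            return true;
        }
        return right.insert(k);
    }

    /** Returns the node at inorder position pos (1-based), or null if out of range. */
    RankedTree findPos(int pos) {
        RankedTree node = this;
        while (node != null && pos != node.rank) {
            if (pos < node.rank) {
                node = node.left;
            } else {
                pos -= node.rank;   // skip the left subtree and the current node
                node = node.right;
            }
        }
        return node;
    }

    /** Builds a tree over the eleven cities; the insertion order is an assumption. */
    static RankedTree cities() {
        String[] order = {"Lima", "Bonn", "Prague", "Bern", "Cairo", "Paris",
                          "Sofia", "Athens", "Oslo", "Rome", "Tokyo"};
        RankedTree root = new RankedTree(order[0]);
        for (int i = 1; i < order.length; i++) {
            root.insert(order[i]);
        }
        return root;
    }

    public static void main(String[] args) {
        RankedTree root = cities();
        System.out.println("pos 4 -> " + root.findPos(4).key);
        System.out.println("pos 9 -> " + root.findPos(9).key);
    }
}
```

Because `findPos` follows a single path from the root, its cost is proportional to the height of the tree, which is $O(\log_2 n)$ only if the tree is kept balanced.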

23 Summary: Balanced Search Trees
Operation | Sequential list | Linked list | Bal. tree with rank
Search | $O(\log_2 n)$ (binary search) | $O(n)$ | $O(\log_2 n)$
Positional search (k-th element) | $O(1)$ | $O(k)$ | $O(\log_2 n)$
Insertion | $O(\log_2 n) + O(n)$ | $O(n)$; $O(1)$ at a known position | $O(\log_2 n)$
Deletion | $O(\log_2 n) + O(n)$ | $O(n)$; $O(1)$ at a known position (doubly linked) | $O(\log_2 n)$
Deletion of the k-th element | $O(n-k)$ | $O(k)$ | $O(\log_2 n)$
Sequential traversal | $O(n)$ | $O(n)$ | $O(n)$

24 Extended Binary Trees

25 Extended binary trees
- Replace NULL pointers with special (external) nodes.
- A binary tree to which external nodes have been added is called an extended binary tree.
- The data can be stored either in the internal or in the external nodes.
- The length of the path to a node indicates the cost of the search.

26 External and internal path length
- The cost of a search in an extended binary tree depends on the following parameters:
  - External path length = the sum over all path lengths from the root to the external nodes $S_i$ ($1 \le i \le n+1$): $\mathrm{Ext}_n = \sum_{i=1}^{n+1} \mathrm{depth}(S_i)$
  - Internal path length = the sum over all path lengths to the internal nodes $K_i$ ($1 \le i \le n$): $\mathrm{Int}_n = \sum_{i=1}^{n} \mathrm{depth}(K_i)$
- $\mathrm{Ext}_n = \mathrm{Int}_n + 2n$ (proof by induction)
- Extended binary trees with a minimal external path length also have a minimal internal path length.

27 Example
[Figure: extended binary tree with n = 7 internal nodes; every node is labeled with its depth]
- External path length: $\mathrm{Ext}_n = 3 + 4 + 4 + 2 + 3 + 3 + 3 + 3 = 25$
- Internal path length: $\mathrm{Int}_n = 0 + 1 + 1 + 2 + 2 + 2 + 3 = 11$
- $25 = \mathrm{Ext}_n = \mathrm{Int}_n + 2n = 11 + 14 = 25$

28 Minimal and maximal length
- For a given $n$, a balanced tree has the minimal internal path length.
- Example: in a complete tree of height $h$ (with $n = 2^h - 1$ nodes) the internal path length is $\mathrm{Int}_n = \sum_{i=1}^{h-1} i \cdot 2^i$
  - Example: $h = 4$, $n = 15$: $\mathrm{Int} = 34$, $\mathrm{Ext} = 16 \cdot 4 = 64$
- The internal path length becomes maximal if the tree degenerates to a linear list: $\mathrm{Int}_n = \sum_{i=1}^{n-1} i = n(n-1)/2$
  - For comparison: a list with $n = 15$ nodes has $\mathrm{Int} = 105$, $\mathrm{Ext} = 105 + 30 = 135$
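The path lengths quoted above (11 and 25 for the 7-node example, 34 and 64 for the complete tree with $h = 4$, 105 and 135 for the list with 15 nodes) can be recomputed with two short recursive functions. This is a sketch under stated assumptions: the class `PathLengths` and its helper names are hypothetical, external nodes are represented by `null` links, and the 7-node tree is one shape chosen to reproduce the depth sums of the example.

```java
// Illustrative sketch; names and the shape of the 7-node tree are assumptions.
class PathLengths {

    /** Internal node; a null child stands for an external node. */
    static final class Node {
        final Node left, right;
        Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** Sum of the depths of all internal nodes (the root has depth 0). */
    static int internal(Node t, int depth) {
        if (t == null) return 0;
        return depth + internal(t.left, depth + 1) + internal(t.right, depth + 1);
    }

    /** Sum of the depths of all external nodes, i.e., of all null links. */
    static int external(Node t, int depth) {
        if (t == null) return depth;
        return external(t.left, depth + 1) + external(t.right, depth + 1);
    }

    static Node leaf() {
        return new Node(null, null);
    }

    /** Tree with internal depths 0, 1, 1, 2, 2, 2, 3. */
    static Node example() {
        Node a = new Node(null, leaf());             // depth 2, one child at depth 3
        Node leftChild = new Node(a, null);          // depth 1
        Node rightChild = new Node(leaf(), leaf());  // depth 1, two children at depth 2
        return new Node(leftChild, rightChild);
    }

    /** Complete tree of height h with 2^h - 1 nodes. */
    static Node complete(int h) {
        return h == 0 ? null : new Node(complete(h - 1), complete(h - 1));
    }

    /** Tree degenerated to a linear list of n nodes. */
    static Node list(int n) {
        return n == 0 ? null : new Node(null, list(n - 1));
    }

    public static void main(String[] args) {
        Node[] trees = {example(), complete(4), list(15)};
        for (Node t : trees) {
            System.out.printf("n=%d Int=%d Ext=%d%n",
                    size(t), internal(t, 0), external(t, 0));
        }
    }
}
```

In each of the three cases the identity $\mathrm{Ext}_n = \mathrm{Int}_n + 2n$ can be read off the printed values.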

29 Weighted binary trees
- Often weights $q_i$ are assigned to the external nodes ($1 \le i \le n+1$).
- The weighted external path length is defined as $\mathrm{Ext}_w = \sum_{i=1}^{n+1} \mathrm{depth}(S_i) \cdot q_i$
- For weighted binary trees the properties of minimal and maximal path lengths no longer apply.
- Determining the minimal weighted external path length is an important practical problem.
- Example with the weights 3, 8, 15, 25: the balanced tree has $\mathrm{Ext}_w = 2 \cdot (3 + 8 + 15 + 25) = 102$, while a tree degenerated to a linear list has $\mathrm{Ext}_w = 1 \cdot 25 + 2 \cdot 15 + 3 \cdot 8 + 3 \cdot 3 = 88$ (less than 102 although it is a linear list)
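The two values 102 and 88 can be reproduced by summing depth times weight over the external nodes. In the sketch below the class `WeightedPath` and the helpers `ext` and `inner` are hypothetical names, and the placement of the weights in the degenerated tree (heaviest weight closest to the root) is an assumption that matches the value 88.

```java
// Illustrative sketch; names and the placement of the weights are assumptions.
class WeightedPath {

    /** Node of an extended binary tree; weight is used only on external nodes. */
    static final class Node {
        final int weight;
        final Node left, right;
        Node(int weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
        boolean isExternal() { return left == null && right == null; }
    }

    static Node ext(int weight) { return new Node(weight, null, null); }

    static Node inner(Node left, Node right) { return new Node(0, left, right); }

    /** Ext_w: sum over all external nodes of depth * weight. */
    static int weightedExternal(Node t, int depth) {
        if (t.isExternal()) return depth * t.weight;
        return weightedExternal(t.left, depth + 1) + weightedExternal(t.right, depth + 1);
    }

    /** All four external nodes at depth 2. */
    static Node balanced() {
        return inner(inner(ext(3), ext(8)), inner(ext(15), ext(25)));
    }

    /** Degenerated tree with the heaviest weight closest to the root. */
    static Node linear() {
        return inner(ext(25), inner(ext(15), inner(ext(8), ext(3))));
    }

    public static void main(String[] args) {
        System.out.println("balanced: " + weightedExternal(balanced(), 0));
        System.out.println("linear:   " + weightedExternal(linear(), 0));
    }
}
```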

30 Application example: optimal codes
- To convert a text file efficiently to bit strings, there are two alternatives:
  - Fixed length coding: each character has the same number of bits (e.g., ASCII)
  - Variable length coding: some characters are represented using fewer bits than others
- Example for coding with fixed length: 3-bit code for the alphabet A, B, C, D:
  - A = 001, B = 010, C = 011, D = 100
  - The message ABBAABCDADA is converted to 001010010001001010011100001100001 (length 33 bits)
- Using a 2-bit code, the same message can be coded with only 22 bits.
- To decode the message, group the bits in blocks of 3 (respectively 2) and look up each block in a table that maps codes to characters.

31 Application example: optimal codes (2)
- Idea: more frequently used characters are coded using fewer bits.
Character | Frequency | Coding
A | 5 | 0
B | 3 | 10
C | 1 | 111
D | 2 | 110
- Message: ABBAABCDADA
- Coding: 01010001011111001100
- Length: 20 bits!
- Variable length coding can reduce the memory space needed for storing the file.
- How can this special coding be found, and why is the decoding unique?

32 Application example: optimal codes (3)
- Representation of the frequencies and the coding as a weighted binary tree.
- First of all decoding: given a bit string:
  - Use the successive bits to traverse the tree, starting from the root.
  - When you arrive at an external node, use the character stored there and start again at the root.
Example: 010100010111...
1. Bit = 0: external node, A
2. Bit = 1: from the root to the right
3. Bit = 0: left, external node, B
4. Bit = 1: from the root to the right
5. Bit = 1: right
...
[Figure: code tree with external nodes A, B, D, C (weights 5, 3, 2, 1); edges to the left are labeled 0, edges to the right are labeled 1]
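The decoding procedure described above can be sketched as a walk through the code tree. The class `PrefixCode` and its methods are hypothetical names introduced for illustration; the code table (A = 0, B = 10, C = 111, D = 110) is the one from the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch; class and method names are assumptions.
class PrefixCode {

    static final class Node {
        Node zero, one;      // children reached by bit 0 and bit 1
        Character symbol;    // set only on external nodes
    }

    /** Builds the code tree from a table mapping characters to bit strings. */
    static Node buildTree(Map<Character, String> code) {
        Node root = new Node();
        for (Map.Entry<Character, String> e : code.entrySet()) {
            Node n = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (n.zero == null) n.zero = new Node();
                    n = n.zero;
                } else {
                    if (n.one == null) n.one = new Node();
                    n = n.one;
                }
            }
            n.symbol = e.getKey();
        }
        return root;
    }

    static String encode(String message, Map<Character, String> code) {
        StringBuilder bits = new StringBuilder();
        for (char c : message.toCharArray()) {
            bits.append(code.get(c));
        }
        return bits.toString();
    }

    /** Follows the bits from the root; at an external node, emits its character and restarts. */
    static String decode(String bits, Node root) {
        StringBuilder message = new StringBuilder();
        Node n = root;
        for (char bit : bits.toCharArray()) {
            n = (bit == '0') ? n.zero : n.one;
            if (n == null) throw new IllegalArgumentException("invalid bit string");
            if (n.symbol != null) {
                message.append(n.symbol);
                n = root;
            }
        }
        if (n != root) throw new IllegalArgumentException("incomplete code word");
        return message.toString();
    }

    /** The variable length code from the example. */
    static Map<Character, String> exampleCode() {
        Map<Character, String> code = new LinkedHashMap<>();
        code.put('A', "0");
        code.put('B', "10");
        code.put('C', "111");
        code.put('D', "110");
        return code;
    }

    public static void main(String[] args) {
        Map<Character, String> code = exampleCode();
        String bits = encode("ABBAABCDADA", code);
        System.out.println(bits + " (" + bits.length() + " bits)");
        System.out.println(decode(bits, buildTree(code)));
    }
}
```

Decoding needs no separators between code words because every character sits at an external node, so no code word can continue past the end of another one.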

33 Correctness condition
- Observation: in a variable length code, the code of one character must not be a prefix of the code of any other character.
- If the code is represented as an extended binary tree, uniqueness is guaranteed (only one character per external node).
- If the frequency of the characters in the original text is taken as the weight of the external nodes, then a tree with minimal weighted external path length yields an optimal code.
- How is a tree with minimal external path length generated?

34 Huffman Code
- Idea: characters are weighted and sorted according to their frequency
  - This also works independently of the text, e.g., for English (characters with relative weights):
E 1231 | T 959 | A 805 | O 794
N 719 | I 718 | S 659 | R 603
H 514 | L 403 | D 365 | C 320
U 310 | P 229 | F 228 | M 225
W 203 | Y 188 | B 162 | G 161
V 93 | K 52 | Q 20 | X
J 10 | Z 9
- A binary tree with minimal external path length is constructed as follows:
  - Each character is represented by a tree of its own with the corresponding weight (only one external node).
  - The two trees with the smallest weights are merged into a new tree.
  - The root of the new tree is marked with the sum of the weights of the original roots.
  - Continue until only one tree remains.

35 Example 1: Huffman
Alphabet and frequency: E = 29, T = 10, N = 9, I = 5, S = 4
- Step 1: (4, 5, 9, 10, 29); merge 4 + 5, new weight: 9
- Step 2: (9, 9, 10, 29); merge 9 + 9, new weight: 18
[Figure: partial trees after steps 1 and 2; left edges are labeled 0, right edges 1]

36 Example 1: Huffman (2)
- Step 3: (18, 10, 29), sorted: (10, 18, 29); merge 10 + 18, new weight: 28
- Step 4: (28, 29); merge 28 + 29, new weight: 57; finished!
[Figure: partial trees after steps 3 and 4; left edges are labeled 0, right edges 1]

37 Resulting tree
Coding:
Character | Code | Weight
E | 1 | 29
T | 00 | 10
N | 011 | 9
I | 0101 | 5
S | 0100 | 4
- $\mathrm{Ext}_w = 112$
- Using this coding, the codes are, e.g.:
  - TENNIS = 00101101101010100
  - SET = 0100100
  - NET = 011100
- Decoding as described before.
[Figure: Huffman tree with root weight 57 and inner nodes 28, 18, 9; external nodes E, T, N, S, I]

38 Some remarks
- The resulting tree is not regular.
- Regular trees are not always optimal.
  - Example: the best nearly complete tree for the weights 4, 5, 9, 10, 29 has $\mathrm{Ext}_w = 123$
- For the message ABBAABCDADA, 20 bits is optimal (see previous slides)

39 Example 2: Huffman
Character | p (%) | Code
A | 25 | 00
B | 4 | 1110
C | 13 | 100
D | 7 | 110
E | 35 | 01
F | 11 | 101
G | 2 | 11110
H | 3 | 11111
- Average number of bits without Huffman: 3 (because $2^3 = 8$)
- Average number of bits using the Huffman code: $\sum_i p_i \cdot l_i = 2.54$, where $l_i$ is the code length of character $i$
- There are other “valid” solutions! But the average number of bits remains the same for all these solutions (equal to Huffman)

40 Analysis
/* Algorithm Huffman */
for (int i = 1; i <= n-1; i++) {
    p1 = smallest element in list L
    remove p1 from L
    p2 = smallest element in L
    remove p2 from L
    create node p
    add p1 and p2 as left and right subtrees to p
    weight p = weight p1 + weight p2
    insert p into L
}
- The run time behavior depends in particular on the implementation of the list
  - Time required to find the node with the smallest weight
  - Time required to insert a new node
- “Naive” implementations give $O(n^2)$, “smarter” ones result in $O(n \log_2 n)$
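One possible “smarter” implementation keeps the trees in a binary heap, so that finding the smallest element and inserting a new node both take $O(\log_2 n)$. The sketch below uses `java.util.PriorityQueue` for this; the class `Huffman`, its method names, and the tie-breaking rule (on equal weights the more recently created node is taken first) are assumptions. The tie-breaking rule was chosen only so that the sketch reproduces the codes of Example 1; it does not affect the weighted external path length in these examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative sketch; names and the tie-breaking rule are assumptions.
class Huffman {

    static final class Node {
        final int weight;
        final int seq;           // creation order, used only to break ties
        final Character symbol;  // null for internal nodes
        final Node left, right;
        Node(int weight, int seq, Character symbol, Node left, Node right) {
            this.weight = weight;
            this.seq = seq;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    /** Builds the Huffman tree; assumes at least two characters. */
    static Node build(Map<Character, Integer> weights) {
        // Smallest weight first; on equal weights the newer node first.
        PriorityQueue<Node> queue = new PriorityQueue<>((a, b) ->
                a.weight != b.weight ? Integer.compare(a.weight, b.weight)
                                     : Integer.compare(b.seq, a.seq));
        int seq = 0;
        for (Map.Entry<Character, Integer> e : weights.entrySet()) {
            queue.add(new Node(e.getValue(), seq++, e.getKey(), null, null));
        }
        while (queue.size() > 1) {
            Node p1 = queue.poll();  // smallest element
            Node p2 = queue.poll();  // second smallest element
            queue.add(new Node(p1.weight + p2.weight, seq++, null, p1, p2));
        }
        return queue.poll();
    }

    /** Collects the code words: 0 for a left branch, 1 for a right branch. */
    static void collect(Node n, String prefix, Map<Character, String> out) {
        if (n.symbol != null) {
            out.put(n.symbol, prefix);
            return;
        }
        collect(n.left, prefix + "0", out);
        collect(n.right, prefix + "1", out);
    }

    static Map<Character, String> codes(Map<Character, Integer> weights) {
        Map<Character, String> out = new LinkedHashMap<>();
        collect(build(weights), "", out);
        return out;
    }

    /** Weighted external path length: sum of weight * code length. */
    static int weightedLength(Map<Character, Integer> weights) {
        Map<Character, String> codes = codes(weights);
        int sum = 0;
        for (Map.Entry<Character, Integer> e : weights.entrySet()) {
            sum += e.getValue() * codes.get(e.getKey()).length();
        }
        return sum;
    }

    /** Weights of Example 1 (E, T, N, I, S). */
    static Map<Character, Integer> example1() {
        Map<Character, Integer> w = new LinkedHashMap<>();
        w.put('E', 29); w.put('T', 10); w.put('N', 9); w.put('I', 5); w.put('S', 4);
        return w;
    }

    /** Weights of Example 2 (percentages for A to H). */
    static Map<Character, Integer> example2() {
        Map<Character, Integer> w = new LinkedHashMap<>();
        w.put('A', 25); w.put('B', 4); w.put('C', 13); w.put('D', 7);
        w.put('E', 35); w.put('F', 11); w.put('G', 2); w.put('H', 3);
        return w;
    }

    public static void main(String[] args) {
        System.out.println(codes(example1()) + " Ext_w=" + weightedLength(example1()));
        System.out.println(codes(example2()) + " Ext_w=" + weightedLength(example2()));
    }
}
```

With $n$ characters the loop runs $n - 1$ times and every heap operation costs $O(\log_2 n)$, which gives the $O(n \log_2 n)$ bound. For Example 2 the weighted length divided by 100 is the average number of bits per character.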

41 Optimality
- Observation: the weight of a node K in the Huffman tree is equal to the external path length of the subtree having K as root.
- Theorem: a Huffman tree is an extended binary tree with minimal external path length $\mathrm{Ext}_w$.
- Proof outline (by induction over n, the number of characters in the alphabet):
  - The statement to prove is A(n) = “A Huffman tree with n nodes has minimal external path length $\mathrm{Ext}_w$”.
  - Consider first n = 2: prove A(2) = “A Huffman tree with 2 nodes has minimal external path length”.

42 Optimality (2)
- Proof:
  - n = 2: only two characters with weights $q_1$ and $q_2$ result in a tree with $\mathrm{Ext}_w = q_1 + q_2$. This is minimal, because there are no other trees.
  - Induction hypothesis: for all $i \le n$, A(i) is true.
  - To prove: A(n+1) is true.
[Figure: tree with root V and subtrees $T_1$ and $T_2$]

43 Optimality (3)
- Proof:
  - Consider a Huffman tree T with n+1 nodes. This tree has a root V and two subtrees $T_1$ and $T_2$, which have the weights $q_1$ and $q_2$ respectively.
  - From the construction method we can deduce that for the weights $q_i$ of all internal nodes $n_i$ of $T_1$ and $T_2$: $q_i \le \min(q_1, q_2)$.
  - Therefore, for these weights $q_i$: $q_1 + q_2 > q_i$. So if V is replaced by any node in $T_1$ or $T_2$, the resulting tree will have a greater weight.
  - Replacing nodes within $T_1$ and $T_2$ does not help, because $T_1$ and $T_2$ are already optimal (both are trees with n nodes or fewer, and the induction hypothesis holds for them).
  - So T is an optimal tree with n+1 nodes.
[Figure: tree with root V of weight $q_1 + q_2$ and subtrees $T_1$ (weight $q_1$) and $T_2$ (weight $q_2$)]

44 Huffman Code: Applications
- Fax machine

45 Huffman: Other applications
- ZIP coding (at least a similar technique)
- In principle: most coding techniques with data reduction (lossless compression)
- NOT Huffman: lossy compression techniques like JPEG, MP3, MPEG, …

