© Neeraj Suri EU-NSF ICT March 2006 Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de Introduction to Computer Science 2 Balanced.


© Neeraj Suri EU-NSF ICT March 2006 Dependable Embedded Systems & SW Group Introduction to Computer Science 2 Balanced Binary Search Trees (2) & Extended Binary Trees Prof. Neeraj Suri Brahim Ayari

ICS-II Balanced Binary Search Trees (2) & Extended Binary Trees

Slide 2: Height of AVL Trees
- AVL trees are defined by the height difference of subtrees
- Original goal: the tree should be as “balanced” as possible
- How balanced is an AVL tree?
- The answer is given by the theorem on the height of an AVL tree:
Theorem: For the height $h(T)$ of an AVL tree $T$ with $n$ nodes: $\lfloor \log_2 n \rfloor + 1 \le h(T) \le 1.44 \log_2(n+1)$

Slide 3: Fibonacci Trees
- The lower bound $\lfloor \log_2 n \rfloor + 1 \le h(T)$ comes from the minimal height of a balanced binary tree (already shown)
- For the proof of the upper bound one needs a special class of AVL trees: Fibonacci trees
- Fibonacci numbers: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$
Definition: Fibonacci trees are constructed as follows:
- The empty tree $T_0$ is a Fibonacci tree (height 0)
- The tree $T_1$, which contains only one node, is a Fibonacci tree of height 1
- If $T_{h-1}$ and $T_{h-2}$ are Fibonacci trees of heights $h-1$ and $h-2$, and $x$ is a node, then $T_h = (T_{h-1}, x, T_{h-2})$ is a Fibonacci tree of height $h$
- No other trees are Fibonacci trees
-> Observe: the number of nodes on the path from the root to the deepest leaf gives the height of the Fibonacci tree!

Slide 4: Fibonacci Trees
- $T_0$: empty tree; number of nodes $n_0 = 0$, $F_0 = 0$
- $T_1$: one node; $n_1 = 1$, $F_1 = 1$
- $T_2 = (T_1, x, T_0)$; $n_2 = 2$, $F_2 = 1$
- $T_3 = (T_2, x, T_1)$; $n_3 = 4$, $F_3 = 2$

Slide 5: Fibonacci Trees
- $T_4 = (T_3, x, T_2)$; number of nodes $n_4 = 7$, $F_4 = 3$
- $T_5 = (T_4, x, T_3)$; $n_5 = 12$, $F_5 = 5$
- $T_6$, $T_7$, etc. analogously

Slide 6: Fibonacci and AVL Trees
To prove: every Fibonacci tree is an AVL tree.
Proof (by induction over $h$):
- Note: $T_h$ is always a tree of height $h$
- $T_0$ and $T_1$ are AVL trees
- If $T_{h-1}$ and $T_{h-2}$ are AVL trees, build $T_h = (T_{h-1}, x, T_{h-2})$ according to the rules
- As $T_{h-1}$ and $T_{h-2}$ are AVL trees, only the balance factor of the root remains to be checked
- $BF(T_h) = |h(T_{h-1}) - h(T_{h-2})| = |(h-1) - (h-2)| = 1$

Slide 7: Fibonacci and AVL Trees
- Special note: for a given Fibonacci tree there is no AVL tree with the same height and fewer nodes
  -> The construction gives AVL trees with maximal height
- One can add more nodes while keeping the height, but one cannot remove a node without violating the AVL criterion (if the height is to stay unchanged)
  -> Fibonacci trees give the maximal height of an AVL tree for a given number of nodes
- Note: the number of nodes $n_h$ in $T_h$ is the $(h+2)$-th Fibonacci number minus 1, i.e., $n_h = F_{h+2} - 1$ (for $h \ge 0$)
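The relation $n_h = F_{h+2} - 1$ can be checked numerically. The following is a minimal sketch, assuming the recurrence $n_h = n_{h-1} + n_{h-2} + 1$ that follows from $T_h = (T_{h-1}, x, T_{h-2})$; the class and method names are illustrative and not part of the lecture.

```java
/** Sketch: node counts of Fibonacci trees compared with Fibonacci numbers. */
class FibonacciTreeNodes {

    /** Number of nodes n_h of the Fibonacci tree T_h, via n_h = n_{h-1} + n_{h-2} + 1. */
    static long nodes(int h) {
        long prev = 0, cur = 1;            // n_0 = 0, n_1 = 1
        if (h == 0) return prev;
        for (int i = 2; i <= h; i++) {
            long next = cur + prev + 1;    // both subtrees plus the new root x
            prev = cur;
            cur = next;
        }
        return cur;
    }

    /** Fibonacci number F_k with F_0 = 0, F_1 = 1. */
    static long fib(int k) {
        long a = 0, b = 1;
        for (int i = 0; i < k; i++) {
            long t = a + b;
            a = b;
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 10; h++) {
            System.out.println("h=" + h + "  n_h=" + nodes(h)
                    + "  F_(h+2)-1=" + (fib(h + 2) - 1));
        }
    }
}
```

For h = 0 to 5 the first column reproduces the node counts 0, 1, 2, 4, 7, 12 listed for $T_0$ to $T_5$.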

Slide 8: Fibonacci and AVL Trees
- The following inequality holds for Fibonacci numbers: $F_h \ge \varphi^{h-2}$ for $h \ge 2$, where $\varphi = \frac{1}{2}(1 + \sqrt{5})$
- Let $n$ be the number of nodes in an AVL tree of height $h$. As $T_h$ contains the minimal number of nodes: $n \ge n_h$
- Insert $n_h = F_{h+2} - 1$: $n \ge n_h = F_{h+2} - 1 \ge \varphi^h - 1$, thus $n + 1 \ge \varphi^h$
- The number of nodes grows exponentially with the height
- Conversely: $h \le \log_\varphi(n+1) = \frac{1}{\log_2 \varphi} \log_2(n+1) \approx 1.44 \log_2(n+1)$
- Thus: the search path in an AVL tree is in the worst case 44% longer than in a complete tree
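A small numeric check of the bound, as a sketch only: it assumes that the Fibonacci tree $T_h$ is the AVL tree of height $h$ with the fewest nodes, computes the factor $1 / \log_2 \varphi$, and compares $h$ with $\log_\varphi(n_h + 1)$. All names are illustrative.

```java
/** Sketch: numeric check of h <= log_phi(n + 1), about 1.44 * log2(n + 1). */
class AvlHeightBound {
    static final double PHI = (1 + Math.sqrt(5)) / 2;

    /** Minimal number of nodes of an AVL tree of height h (Fibonacci tree T_h). */
    static long minNodes(int h) {
        long prev = 0, cur = 1;            // n_0 = 0, n_1 = 1
        if (h == 0) return 0;
        for (int i = 2; i <= h; i++) {
            long next = cur + prev + 1;
            prev = cur;
            cur = next;
        }
        return cur;
    }

    /** Upper bound on the height of an AVL tree with n nodes: log_phi(n + 1). */
    static double heightBound(long n) {
        return Math.log(n + 1) / Math.log(PHI);
    }

    /** The constant 1 / log2(phi) that appears as "1.44" in the theorem. */
    static double factor() {
        return Math.log(2) / Math.log(PHI);
    }

    public static void main(String[] args) {
        System.out.println("1 / log2(phi) = " + factor());
        for (int h = 1; h <= 10; h++) {
            long n = minNodes(h);
            System.out.println("h=" + h + "  n_h=" + n + "  bound=" + heightBound(n));
        }
    }
}
```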

Slide 9: Cost Analysis of AVL Trees
- $h \le c \log_2(n+1)$ means: the height of an AVL tree is in $O(\log_2 n)$
- Cost for insertion is in $O(\log_2 n)$
  - Only the path from the root to the insertion point needs to be considered
  - Rotations have constant cost
- Cost for deletion is in $O(\log_2 n)$
  - Each node on the path from the root to the deleted node causes at most one rotation
- AVL trees are worst-case efficient implementations of binary search trees
  - Natural trees need a number of steps linear in $n$ in the worst case
- Calculating the average height is still an open problem
  - Empirical results give $h = c + \log_2 n$ for $c \approx 0.2$

Slide 10: Weight Balanced Binary Search Trees
- Treat the “weight difference” of two subtrees as a measure of balance
  - Weight = number of nodes in the subtree
- The properties are very similar to those of height balanced binary trees
- Let $T$ be a binary search tree, $T_L$ its left subtree and $n(X)$ the number of nodes in a tree $X$
Definition: the value $\rho(T) = (n(T_L) + 1) / (n(T) + 1)$ is the root balance of $T$
Definition: a tree $T$ is $\alpha$-balanced if for every subtree $T'$: $\alpha \le \rho(T') \le 1 - \alpha$

Slide 11: Condition $\alpha \le \rho(T') \le 1 - \alpha$
- The set of all $\alpha$-balanced binary trees is called BB($\alpha$) (“bounded balance”)
- The definition of balance only considers the left subtree, but in a BB($\alpha$) tree every subtree $T'$ also satisfies $\alpha \le 1 - \rho'(T') \le 1 - \alpha$, where $\rho'$ is defined analogously to $\rho$ on the right subtree
- The parameter $\alpha$ defines the “distance” from a complete tree:
  - $\alpha = 1/2$: only complete trees allowed
  - $\alpha < 1/2$: relaxed condition
  - $\alpha = 0$: no structural conditions
  - $\alpha > 1/2$: makes no sense to consider

Slide 12: Example
- $\rho(T) = (n(T_L) + 1) / (n(T) + 1)$
- Choose $\alpha = 0.3$; then for every subtree: $\alpha = 0.3 \le \rho \le 1 - \alpha = 0.7$
- The tree is in BB($\alpha$) for $\alpha = 0.3$
Subtree with root | $\rho$
Mars | 3/10 = 0.3
Jupiter | 2/3 = 0.67
Pluto | 3/7 = 0.43
Mercury | 1/3 = 0.33
Uranus | 2/4 = 0.5
[Figure: binary search tree with the nodes Earth, Jupiter, Mars, Mercury, Neptune, Pluto, Saturn, Uranus, Venus]
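The root balance and the BB($\alpha$) condition can be sketched in a few lines of Java. The tree shape built in `example()` is an assumption: it is inferred from the listed root balances and the alphabetical key order, since the figure itself is not available. Class and method names are illustrative.

```java
/** Sketch: root balance and BB(alpha) test for a binary tree. */
class WeightBalance {

    static final class Node {
        final String key;
        final Node left, right;

        Node(String key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** n(T): number of nodes in the tree rooted at t (0 for the empty tree). */
    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    /** Root balance rho(T) = (n(T_L) + 1) / (n(T) + 1). */
    static double rootBalance(Node t) {
        return (size(t.left) + 1.0) / (size(t) + 1.0);
    }

    /** True if alpha <= rho(T') <= 1 - alpha holds for every non-empty subtree T'. */
    static boolean isBalanced(Node t, double alpha) {
        if (t == null) return true;
        double rho = rootBalance(t);
        return alpha <= rho && rho <= 1 - alpha
                && isBalanced(t.left, alpha) && isBalanced(t.right, alpha);
    }

    /** Assumed shape of the example tree (reconstructed, not taken from the figure). */
    static Node example() {
        Node jupiter = new Node("Jupiter", new Node("Earth", null, null), null);
        Node mercury = new Node("Mercury", null, new Node("Neptune", null, null));
        Node uranus = new Node("Uranus",
                new Node("Saturn", null, null), new Node("Venus", null, null));
        Node pluto = new Node("Pluto", mercury, uranus);
        return new Node("Mars", jupiter, pluto);
    }

    public static void main(String[] args) {
        Node root = example();
        System.out.println("rho(Mars) = " + rootBalance(root));
        System.out.println("in BB(0.3): " + isBalanced(root, 0.3));
    }
}
```

Recomputing `size` at every node keeps the sketch short but costs more than a constant per node; a real implementation would store the subtree size in each node.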

Slide 13: Notes
- Already noted: $\rho = 1/2$ holds for complete trees
- Root balance $\rho < 1/2$ means: there are fewer nodes in the left subtree
- $\alpha$ limits the root balance symmetrically from both sides
- Left tree is complete: root balance goes towards 1 with increasing number of nodes
- Only $\alpha = 0$ allows all “degenerations”
- Not every tree (with $n$ nodes) can be transformed into a BB($\alpha$) tree for arbitrary $\alpha$
- There is at least one tree in BB($\alpha$) when $0.25 \le \alpha \le 1 - \frac{1}{2}\sqrt{2} \approx 0.292$

Slide 14: Height of Weight Balanced Trees
- Note: when traversing the path from the root to a leaf, one “loses”, depending on $\alpha$, a number of nodes at every step
- Consider the path $p = v_1, v_2, \ldots, v_h$
- For the left and right subtrees $T_L$ and $T_R$ of a tree $T$ (due to the BB($\alpha$) condition):
  $n(T_L) + 1 \le (1 - \alpha)(n(T) + 1)$
  $n(T_R) + 1 \le (1 - \alpha)(n(T) + 1)$
- Traversal of path $p$, where $n(v_i)$ is the number of nodes in the subtree rooted at $v_i$:
  $n(v_2) + 1 \le (1 - \alpha)(n(v_1) + 1)$
  $n(v_3) + 1 \le (1 - \alpha)(n(v_2) + 1)$
  ...
  $n(v_h) + 1 \le (1 - \alpha)(n(v_{h-1}) + 1)$

Slide 15: Height of Weight Balanced Trees
- As $v_1$ is the root and $v_h$ a leaf: $n(T) + 1 = n(v_1) + 1$ and $n(v_h) + 1 = 2$
- Chaining the inequalities: $2 = n(v_h) + 1 \le (1 - \alpha)^{h-1}(n(v_1) + 1) = (1 - \alpha)^{h-1}(n(T) + 1)$
- Apply logarithms on both sides: $1 \le (h - 1)\log_2(1 - \alpha) + \log_2(n(T) + 1)$
- Thus (note: $\log_2(1 - \alpha) < 0$), with $c = -\log_2(1 - \alpha) > 0$: $h - 1 \le \log_2(n(T) + 1) / c \in O(\log_2 n)$
- The height of the tree is logarithmic in the number of nodes

Slide 16: Operations on Weight Balanced Binary Trees
- Search is the same as for AVL trees
  - Cost is logarithmic
- For insertion/deletion the root balance must be updated along the path from the root to the corresponding position
  - If the criterion is violated: rotations as for AVL trees
- Open issues:
  - Are rotations appropriate measures for restructuring BB($\alpha$) trees?
  - How does one efficiently calculate the root balance?
- The number of rotations on the path to the root is limited: search/insertion/deletion are all in $O(\log_2 n)$

Slide 17: Position Search in Balanced Binary Search Tree
- Comparison: tree implementations vs. linked lists
  - Balanced trees allow (almost) all operations in $O(\log_2 n)$
  - Linked lists need $O(n)$ for search/insertion/deletion!
  - For sequential traversal both perform in $O(n)$
- Should sorted data always be stored in trees?!
  - One should not underestimate the implementation costs
- The “last” operation where lists “win” is positional search (the $k$-th element)
- Positional search: find the $k$-th element in a list
- For trees the “list” is the inorder traversal

Slide 18: The Problem
- For lists:
  - Traverse $k$ elements in $O(k)$
- For trees:
  - One does not “know” whether to go left or right, and one knows nothing about the number of nodes in the subtrees
  - In the worst case all nodes must be visited: $O(n)$!
- That can be improved!

Slide 19: Rank of a Node
Definition: The rank of a node is the number of nodes in its left subtree plus 1
- Rank = position of node x in the subtree where x is the root

    class BinarySearchTree {
        int K;                  /* key */
        Info info;              /* info */
        int balance;            /* BF, for AVL trees: -1, 0, +1 */
        int rank;               /* number of nodes in the left subtree plus 1 */
        BinarySearchTree L, R;  /* left and right subtree */
        /* constructor and methods ... */
        public BinarySearchTree posFind(int pos) { ... }
    }

Slide 20: Algorithm
- Pseudo code:
  - Start in the root
  - If pos < rank: search in the left subtree
  - If pos > rank: subtract the rank from the position and search in the right subtree
  - The search stops when pos = rank
- Correctness:
  - The rank of a node is always its position in the subtree where it is the root
- Note: after an insertion or deletion, every node on the path up to the root whose left subtree has changed must update its rank

Slide 21: Example
[Figure: binary search tree over the cities Athens, Bern, Bonn, Cairo, Lima, Oslo, Paris, Prague, Rome, Sofia and Tokyo; each node is annotated with its rank, and the inorder positions run from pos=1 to pos=11]
- pos = 4 -> Cairo
- pos = 9 -> Rome

Slide 22: Java Method

    public BinarySearchTree findPos(int pos) {
        BinarySearchTree root = this;
        /* descend until the position matches the rank or the tree ends */
        while ((root != null) && (pos != root.rank)) {
            if (pos < root.rank) {
                root = root.L;
            } else {
                pos = pos - root.rank;  /* skip the left subtree and the node itself */
                root = root.R;
            }
        }
        return root;  /* null if there is no element at this position */
    }

Complexity in a balanced tree: $O(\log_2 n)$
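A self-contained sketch of rank-based positional search, using the eleven city names from the example. The tree built here is simply a balanced tree over the sorted names, so its shape (and therefore the individual ranks) is an assumption and may differ from the tree in the figure. The class `RankedTree` and its `build` helper are illustrative.

```java
/** Sketch: positional search with per-node ranks (rank = size of left subtree + 1). */
class RankedTree {
    final String key;
    final int rank;
    final RankedTree left, right;

    RankedTree(String key, RankedTree left, RankedTree right) {
        this.key = key;
        this.left = left;
        this.right = right;
        this.rank = size(left) + 1;
    }

    static int size(RankedTree t) {
        return t == null ? 0 : size(t.left) + 1 + size(t.right);
    }

    /** Builds a balanced search tree from the sorted keys in the range [lo, hi). */
    static RankedTree build(String[] sorted, int lo, int hi) {
        if (lo >= hi) return null;
        int mid = (lo + hi) >>> 1;
        return new RankedTree(sorted[mid],
                build(sorted, lo, mid), build(sorted, mid + 1, hi));
    }

    /** Returns the node at 1-based inorder position pos, or null if there is none. */
    RankedTree findPos(int pos) {
        RankedTree node = this;
        while (node != null && pos != node.rank) {
            if (pos < node.rank) {
                node = node.left;
            } else {
                pos -= node.rank;   // skip the left subtree and the node itself
                node = node.right;
            }
        }
        return node;
    }

    public static void main(String[] args) {
        String[] cities = {"Athens", "Bern", "Bonn", "Cairo", "Lima", "Oslo",
                "Paris", "Prague", "Rome", "Sofia", "Tokyo"};
        RankedTree root = build(cities, 0, cities.length);
        System.out.println(root.findPos(4).key);   // prints Cairo
        System.out.println(root.findPos(9).key);   // prints Rome
    }
}
```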

Slide 23: Summary: Balanced Search Trees
Operation | Sequential list | Linked list | Balanced tree with rank
Search | $O(\log_2 n)$ (binary search) | $O(n)$ | $O(\log_2 n)$
Positional search ($k$-th element) | $O(1)$ | $O(k)$ | $O(\log_2 n)$
Insertion | $O(\log_2 n) + O(n)$ | $O(n)$; $O(1)$ at a known position | $O(\log_2 n)$
Deletion | $O(\log_2 n) + O(n)$ | $O(n)$; $O(1)$ at a known position (doubly linked) | $O(\log_2 n)$
Deletion of the $k$-th element | $O(n-k)$ | $O(k)$ | $O(\log_2 n)$
Sequential traversal | $O(n)$ | $O(n)$ | $O(n)$

Slide 24: Extended Binary Trees

Slide 25: Extended binary trees
- Replace NULL pointers with special (external) nodes
- A binary tree to which external nodes are added is called an extended binary tree
- The data can be stored either in the internal or in the external nodes
- The length of the path to a node represents the cost of the search

Slide 26: External and internal path length
- The cost of a search in an extended binary tree depends on the following parameters:
  - External path length = the sum of all path lengths from the root to the external nodes $S_i$ ($1 \le i \le n+1$): $\mathrm{Ext}_n = \sum_{i=1}^{n+1} \mathrm{depth}(S_i)$
  - Internal path length = the sum of all path lengths to the internal nodes $K_i$ ($1 \le i \le n$): $\mathrm{Int}_n = \sum_{i=1}^{n} \mathrm{depth}(K_i)$
- $\mathrm{Ext}_n = \mathrm{Int}_n + 2n$ (proof by induction)
- Extended binary trees with minimal external path length also have minimal internal path length
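The induction behind $\mathrm{Ext}_n = \mathrm{Int}_n + 2n$ is only mentioned. One possible sketch, assuming the root has depth 0 and the tree is grown one internal node at a time:

```latex
\begin{align*}
&\text{Base } (n = 0):\quad \mathrm{Ext}_0 = \mathrm{Int}_0 = 0
  \quad\text{(a single external node at depth 0).}\\
&\text{Step } (n \to n+1):\ \text{replace an external node at depth } d
  \text{ by an internal node with two external children at depth } d+1.\\
&\mathrm{Int}_{n+1} = \mathrm{Int}_n + d, \qquad
 \mathrm{Ext}_{n+1} = \mathrm{Ext}_n - d + 2(d+1) = \mathrm{Ext}_n + d + 2.\\
&\mathrm{Ext}_{n+1} - \mathrm{Int}_{n+1}
  = (\mathrm{Ext}_n - \mathrm{Int}_n) + 2 = 2n + 2 = 2(n+1).
\end{align*}
```

Every extended binary tree with $n+1$ internal nodes can be obtained this way, because it always contains an internal node whose two children are both external.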

Slide 27: Example
- External path length: $\mathrm{Ext}_n = 25$
- Internal path length: $\mathrm{Int}_n = 11$
- Check with $n = 7$: $\mathrm{Ext}_n = \mathrm{Int}_n + 2n = 11 + 14 = 25$

Slide 28: Minimal and maximal length
- For a given $n$, a balanced tree has the minimal internal path length
- Example: in a complete tree of height $h$ (with $n = 2^h - 1$), the internal path length is $\mathrm{Int}_n = \sum_{i=1}^{h-1} i \cdot 2^i$
- The internal path length becomes maximal if the tree degenerates to a linear list: $\mathrm{Int}_n = \sum_{i=1}^{n-1} i = n(n-1)/2$
- Example: $h = 4$, $n = 15$: $\mathrm{Int} = 34$, $\mathrm{Ext} = 16 \cdot 4 = 64$
- For comparison: a list with $n = 15$ nodes has $\mathrm{Int} = 105$, $\mathrm{Ext} = 105 + 30 = 135$
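Both path lengths can be computed with two short recursions. The sketch below assumes that the root has depth 0 and that a `null` link stands for an external node; the class and helper names are illustrative.

```java
/** Sketch: internal and external path length of an extended binary tree. */
class PathLengths {

    static final class Node {
        final Node left, right;   // null stands for an external node

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Sum of the depths of all internal nodes; call with depth 0 for the root. */
    static long internal(Node t, int depth) {
        if (t == null) return 0;
        return depth + internal(t.left, depth + 1) + internal(t.right, depth + 1);
    }

    /** Sum of the depths of all external nodes (the null links). */
    static long external(Node t, int depth) {
        if (t == null) return depth;
        return external(t.left, depth + 1) + external(t.right, depth + 1);
    }

    /** Complete tree with the given number of levels (2^levels - 1 internal nodes). */
    static Node complete(int levels) {
        return levels == 0 ? null : new Node(complete(levels - 1), complete(levels - 1));
    }

    /** Degenerate tree: a chain of n nodes, each with only a right child. */
    static Node chain(int n) {
        Node t = null;
        for (int i = 0; i < n; i++) {
            t = new Node(null, t);
        }
        return t;
    }

    public static void main(String[] args) {
        Node full = complete(4);
        Node list = chain(15);
        System.out.println("complete: Int=" + internal(full, 0) + " Ext=" + external(full, 0));
        System.out.println("list:     Int=" + internal(list, 0) + " Ext=" + external(list, 0));
    }
}
```

For the complete tree with 15 nodes this gives Int = 34 and Ext = 64, and for the list with 15 nodes Int = 105 and Ext = 135; in both cases Ext = Int + 2n.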

Slide 29: Weighted binary trees
- Often weights $q_i$ are assigned to the external nodes ($1 \le i \le n+1$)
- The weighted external path length is defined as $\mathrm{Ext}_w = \sum_{i=1}^{n+1} \mathrm{depth}(S_i) \cdot q_i$
- For weighted binary trees the properties of minimal and maximal path lengths no longer apply
- Determining the minimal external path length is an important practical problem
[Figure: two weighted trees, one with $\mathrm{Ext}_w = 102$ and a linear list with $\mathrm{Ext}_w = 88$ (less than 102 although it is a linear list)]

Slide 30: Application example: optimal codes
- To convert a text file efficiently to bit strings, there are two alternatives:
  - Fixed length coding: each character has the same number of bits (e.g., ASCII)
  - Variable length coding: some characters are represented using fewer bits than others
- Example of fixed length coding: 3-bit code for the alphabet A, B, C, D:
  - A = 001, B = 010, C = 011, D = 100
- The message ABBAABCDADA is converted to 001 010 010 001 001 010 011 100 001 100 001 (length 33 bits)
- Using a 2-bit code, the same message can be coded with only 22 bits
- To decode the message, split it into groups of 3 bits (or 2 bits, respectively) and look up each group in a table that maps codes to characters
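Fixed length coding and table-based decoding can be sketched as follows. The 3-bit table is the one given above; the 2-bit table `CODE2` is an assumption, since the slide only states that a 2-bit code exists. Names are illustrative, and `Map.of` requires Java 9 or later.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: fixed length coding with a lookup table. */
class FixedLengthCode {
    /** The 3-bit code from the example. */
    static final Map<Character, String> CODE3 =
            Map.of('A', "001", 'B', "010", 'C', "011", 'D', "100");

    /** One possible 2-bit code (assumed; the concrete assignment is not given). */
    static final Map<Character, String> CODE2 =
            Map.of('A', "00", 'B', "01", 'C', "10", 'D', "11");

    /** Concatenates the code words of all characters of the message. */
    static String encode(String message, Map<Character, String> code) {
        StringBuilder bits = new StringBuilder();
        for (char c : message.toCharArray()) {
            bits.append(code.get(c));
        }
        return bits.toString();
    }

    /** Splits the bit string into groups of the given width and looks each one up. */
    static String decode(String bits, Map<Character, String> code, int width) {
        Map<String, Character> table = new HashMap<>();
        for (Map.Entry<Character, String> e : code.entrySet()) {
            table.put(e.getValue(), e.getKey());
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + width <= bits.length(); i += width) {
            out.append(table.get(bits.substring(i, i + width)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String message = "ABBAABCDADA";
        System.out.println(encode(message, CODE3).length() + " bits with the 3-bit code");
        System.out.println(encode(message, CODE2).length() + " bits with the 2-bit code");
    }
}
```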

Slide 31: Application example: optimal codes (2)
- Idea: more frequently used characters are coded using fewer bits
Character | A | B | C | D
Frequency | 5 | 3 | 1 | 2
Coding | 0 | 10 | 3-bit code | 3-bit code
- Message: ABBAABCDADA
- Length of the coded message: 20 bits ($5 \cdot 1 + 3 \cdot 2 + 1 \cdot 3 + 2 \cdot 3$)!
- Variable length coding can reduce the memory space needed for storing the file
- How can such a code be found, and why is the decoding unique?

Slide 32: Application example: optimal codes (3)
- Representation of the frequencies and the code as a weighted binary tree
- First of all, decoding. Given a bit string:
  - Use the successive bits to traverse the tree, starting from the root
  - When you arrive at an external node, output the character stored there and start again from the root
Example:
1. Bit = 0: external node, A
2. Bit = 1: from the root to the right
3. Bit = 0: left, external node, B
4. Bit = 1: from the root to the right
5. Bit = 1: right
[Figure: code tree with the external nodes A, B, D, C from left to right]
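Decoding by walking the code tree can be sketched as below. The codes A = 0 and B = 10 follow from the decoding steps; the assignment D = 110 and C = 111 is an assumption based on the left-to-right order of the leaves in the figure. Class and method names are illustrative, and `decode` expects a valid bit string for this code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: prefix code stored as a binary tree (left = bit 0, right = bit 1). */
class PrefixCodeTree {

    static final class Node {
        Character symbol;   // set only in external nodes
        Node left, right;
    }

    final Node root = new Node();
    final Map<Character, String> code = new LinkedHashMap<>();

    /** Inserts the path for one code word and stores the symbol at its end. */
    void add(char symbol, String bits) {
        Node node = root;
        for (char b : bits.toCharArray()) {
            if (b == '0') {
                if (node.left == null) node.left = new Node();
                node = node.left;
            } else {
                if (node.right == null) node.right = new Node();
                node = node.right;
            }
        }
        node.symbol = symbol;
        code.put(symbol, bits);
    }

    String encode(String message) {
        StringBuilder bits = new StringBuilder();
        for (char c : message.toCharArray()) {
            bits.append(code.get(c));
        }
        return bits.toString();
    }

    /** Follows the bits from the root; at an external node, emits it and restarts. */
    String decode(String bits) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (char b : bits.toCharArray()) {
            node = (b == '0') ? node.left : node.right;
            if (node.symbol != null) {
                out.append(node.symbol);
                node = root;
            }
        }
        return out.toString();
    }

    /** Code tree of the example; the codes of C and D are assumed. */
    static PrefixCodeTree example() {
        PrefixCodeTree t = new PrefixCodeTree();
        t.add('A', "0");
        t.add('B', "10");
        t.add('D', "110");
        t.add('C', "111");
        return t;
    }

    public static void main(String[] args) {
        PrefixCodeTree t = example();
        String bits = t.encode("ABBAABCDADA");
        System.out.println(bits + " (" + bits.length() + " bits)");
        System.out.println(t.decode(bits));
    }
}
```

Because every character sits in an external node, no code word can be a prefix of another, which is what makes restarting at the root after each character safe.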

Slide 33: Correctness condition
- Observation: in variable length coding, the code of one character must not be a prefix of the code of any other character
- If the code is represented as an extended binary tree, uniqueness is guaranteed (only one character per external node)
- If the frequency of the characters in the original text is taken as the weight of the external nodes, then a tree with minimal external path length yields an optimal code
- How is a tree with minimal external path length generated?

Slide 34: Huffman Code
- Idea: characters are weighted and sorted according to their frequency
- This also works independently of a particular text, e.g., for English (characters with relative weights):
E 1231, T 959, A 805, O 794, N 719, I 718, S 659, R 603, H 514, L 403, D 365, C 320, U 310, P 229, F 228, M 225, W 203, Y 188, B 162, G 161, V 93, K 52, Q 20, X (no weight given), J 10, Z 9
- A binary tree with minimal external path length is constructed as follows:
  - Each character is represented by a tree of its own with its corresponding weight (only one external node)
  - The two trees with the smallest weights are merged into a new tree
  - The root of the new tree is marked with the sum of the weights of the original roots
  - Continue until only one tree remains

Slide 35: Example 1: Huffman
- Alphabet and frequency: E 29, T 10, N 9, I 5, S 4
- Step 1: (4, 5, 9, 10, 29), new weight: 9
- Step 2: (9, 9, 10, 29), new weight: 18

Slide 36: Example 1: Huffman (2)
- Step 3: (18, 10, 29), sorted: (10, 18, 29), new weight: 28
- Step 4: (28, 29), finished!

Slide 37: Resulting tree
- Coding:
Character | Code | Weight
E | 1 | 29
T | 00 | 10
N | 011 | 9
I | 0101 | 5
S | 0100 | 4
- $\mathrm{Ext}_w = 112$
- Using this coding, the codes are, e.g.:
  - TENNIS = 00 1 011 011 0101 0100
  - SET = 0100 1 00
  - NET = 011 1 00
- Decoding as described before

Slide 38: Some remarks
- The resulting tree is not regular
- Regular trees are not always optimal
  - Example: the best nearly complete tree has $\mathrm{Ext}_w = 123$
- For the message ABBAABCDADA, 20 bits is optimal (see previous slides)

Slide 39: Example 2: Huffman
Character | p (%) | Code
A | 25 | 00
B | 4 | 1110
C | 13 | 100
D | 7 | 110
E | 35 | 01
F | 11 | 101
G | |
H | 3 | 11111
- Average number of bits without Huffman: 3 (because $2^3 = 8$)
- Average number of bits using Huffman code:
- There are other “valid” solutions! But the average number of bits is the same for all of them (equal to Huffman)
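The average code length is the sum of probability times code length over all characters. The sketch below recomputes it for this table under two assumptions that are not in the transcript: G has the remaining 2% (so that the percentages sum to 100) and receives the only free 5-bit code word, 11110. Names are illustrative.

```java
/** Sketch: average number of bits per character for a given code table. */
class AverageCodeLength {
    static final char[] SYMBOLS = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'};

    /** Percentages from the table; the 2 for G is an assumption. */
    static final int[] PERCENT = {25, 4, 13, 7, 35, 11, 2, 3};

    /** Code words from the table; the code word for G is an assumption. */
    static final String[] CODES = {"00", "1110", "100", "110", "01", "101", "11110", "11111"};

    /** Weighted mean of the code word lengths, with the percentages as weights. */
    static double averageBits(int[] percent, String[] codes) {
        long weighted = 0;
        long total = 0;
        for (int i = 0; i < percent.length; i++) {
            weighted += (long) percent[i] * codes[i].length();
            total += percent[i];
        }
        return (double) weighted / total;
    }

    public static void main(String[] args) {
        System.out.println("average bits per character: " + averageBits(PERCENT, CODES));
    }
}
```

Under these assumptions the average comes out at 2.54 bits per character, compared with 3 bits for the fixed length code.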

Slide 40: Analysis

    /* Algorithm Huffman */
    for (int i = 1; i <= n-1; i++) {
        p1 = smallest element in list L
        remove p1 from L
        p2 = smallest element in L
        remove p2 from L
        create node p
        add p1 and p2 as left and right subtrees to p
        weight p = weight p1 + weight p2
        insert p into L
    }

- The run time depends in particular on the implementation of the list
  - Time required to find the node with the smallest weight
  - Time required to insert a new node
- “Naive” implementations give $O(n^2)$, “smarter” ones result in $O(n \log_2 n)$
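One way to reach $O(n \log_2 n)$ is to keep the list L in a binary heap. The following sketch uses `java.util.PriorityQueue` for that; this choice, and all class and method names, are assumptions for illustration rather than the implementation used in the lecture. It is tried on the weights of Example 1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Sketch: Huffman tree construction with a priority queue as the list L. */
class HuffmanSketch {

    static final class Node {
        final long weight;
        final Character symbol;   // null for internal nodes
        final Node left, right;

        Node(char symbol, long weight) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.symbol = null;
            this.weight = left.weight + right.weight;
            this.left = left;
            this.right = right;
        }
    }

    /** Repeatedly merges the two trees with the smallest weights. */
    static Node build(Map<Character, Long> weights) {
        PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Long.compare(a.weight, b.weight));
        for (Map.Entry<Character, Long> e : weights.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue()));
        }
        while (queue.size() > 1) {
            Node p1 = queue.poll();         // smallest element
            Node p2 = queue.poll();         // second smallest element
            queue.add(new Node(p1, p2));    // weight p = weight p1 + weight p2
        }
        return queue.poll();
    }

    /** Collects the code words: 0 for a left branch, 1 for a right branch. */
    static void collect(Node t, String prefix, Map<Character, String> out) {
        if (t == null) return;
        if (t.symbol != null) {
            out.put(t.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        collect(t.left, prefix + "0", out);
        collect(t.right, prefix + "1", out);
    }

    /** Weighted external path length: sum of weight times code length. */
    static long weightedPathLength(Map<Character, Long> weights, Map<Character, String> codes) {
        long sum = 0;
        for (Map.Entry<Character, Long> e : weights.entrySet()) {
            sum += e.getValue() * codes.get(e.getKey()).length();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Character, Long> weights = Map.of('E', 29L, 'T', 10L, 'N', 9L, 'I', 5L, 'S', 4L);
        Map<Character, String> codes = new HashMap<>();
        collect(build(weights), "", codes);
        System.out.println(codes);
        System.out.println("Ext_w = " + weightedPathLength(weights, codes));
    }
}
```

The concrete bit patterns depend on how ties are broken and which subtree goes left, so they can differ from the table in the example; the code lengths and the value $\mathrm{Ext}_w = 112$ are the same.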

Slide 41: Optimality
- Observation: the weight of a node K in the Huffman tree is equal to the external path length of the subtree having K as root
- Theorem: a Huffman tree is an extended binary tree with minimal external path length $\mathrm{Ext}_w$
- Proof outline (by induction over n, the number of characters in the alphabet):
  - The statement to prove is A(n) = “A Huffman tree with n nodes has minimal external path length $\mathrm{Ext}_w$”
  - Consider first n = 2: prove A(2) = “A Huffman tree with 2 nodes has minimal external path length”

Slide 42: Optimality (2)
- Proof:
  - n = 2: only two characters with weights $q_1$ and $q_2$ result in a tree with $\mathrm{Ext}_w = q_1 + q_2$. This is minimal, because there are no other trees
  - Induction hypothesis: for all $i \le n$, A(i) is true
  - To prove: A(n+1) is true
[Figure: root V with subtrees $T_1$ and $T_2$]

Slide 43: Optimality (3)
- Proof:
  - Consider a Huffman tree T with n+1 nodes. This tree has a root V and two subtrees $T_1$ and $T_2$, which have the weights $q_1$ and $q_2$, respectively
  - From the construction method we can deduce that for the weights $q_i$ of all internal nodes $n_i$ of $T_1$ and $T_2$: $q_i \le \min(q_1, q_2)$
  - Therefore, for these weights $q_i$: $q_1 + q_2 > q_i$. So if V is replaced by any node in $T_1$ or $T_2$, the resulting tree will have a greater weight
  - Replacing nodes within $T_1$ and $T_2$ does not help, because $T_1$ and $T_2$ are already optimal (both are trees with n nodes or fewer, so the induction hypothesis holds for them)
  - So T is an optimal tree with n+1 nodes
[Figure: root V with weight $q_1 + q_2$ and subtrees $T_1$ (weight $q_1$) and $T_2$ (weight $q_2$)]

Slide 44: Huffman Code: Applications
- Fax machine

Slide 45: Huffman: Other applications
- ZIP coding (at least a similar technique)
- In principle: most coding techniques with data reduction (lossless compression)
- NOT Huffman: the lossy steps of compression techniques like JPEG, MP3, MPEG, … (JPEG and MP3, for example, still apply Huffman coding as a final lossless stage)