Introduction to Computer Science 2 Lecture 7: Extended binary trees


Introduction to Computer Science 2 Lecture 7: Extended binary trees Prof. Neeraj Suri Brahim Ayari

In advance: Search in binary trees Binary trees can be considered as decision trees: each node represents a decision, and the edges represent the possible outcomes. Searching in such a tree means walking from the root to a leaf. [Figure: a decision tree whose internal nodes test A < 2, B < 5, and C > 7, with TRUE/FALSE branches leading to the leaves X1, X2, X3, X4]

Extended binary trees Replace NULL pointers with special (external) nodes. A binary tree to which external nodes are added in this way is called an extended binary tree. The data can be stored either in the internal or in the external nodes. The length of the path to a node reflects the cost of the search.

External and internal path length The cost of searching in extended binary trees depends on the following parameters: External path length = the sum of the path lengths from the root to all external nodes S_i (1 ≤ i ≤ n+1): Ext_n = Σ_{i=1..n+1} depth(S_i). Internal path length = the sum of the path lengths from the root to all internal nodes K_i (1 ≤ i ≤ n): Int_n = Σ_{i=1..n} depth(K_i). Ext_n = Int_n + 2n (proof by induction). An extended binary tree with minimal external path length also has minimal internal path length.
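The two path lengths and the identity Ext_n = Int_n + 2n can be sketched in a few lines of Python (a minimal sketch, not from the slides; a node is represented as a tuple of children and None stands for an external node):

```python
# Compute external and internal path length of an extended binary tree.
# A node is a tuple (left, right); None stands for an external node.

def path_lengths(node, depth=0):
    """Return (external, internal, number_of_internal_nodes)."""
    if node is None:                     # external node at this depth
        return depth, 0, 0
    left, right = node
    el, il, nl = path_lengths(left, depth + 1)
    er, ir, nr = path_lengths(right, depth + 1)
    # this node is internal and contributes its own depth to Int_n
    return el + er, il + ir + depth, nl + nr + 1

# A small tree with 3 internal nodes and hence 4 external nodes
tree = ((None, None), (None, None))
ext, internal, n = path_lengths(tree)
assert ext == internal + 2 * n           # Ext_n = Int_n + 2n
print(ext, internal, n)                  # 8 2 3
```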

Example (n = 7): External path length Ext_n = 3 + 4 + 4 + 2 + 3 + 3 + 3 + 3 = 25. Internal path length Int_n = 0 + 1 + 1 + 2 + 2 + 2 + 3 = 11. Check: 25 = Ext_n = Int_n + 2n = 11 + 14 = 25. [Figure: the example tree with its 7 internal nodes at depths 0 to 3 and 8 external nodes at depths 2 to 4]

Minimal and maximal length For a given n, a balanced tree has the minimal internal path length. Example: in a complete tree of height h (so n = 2^h − 1), the internal path length is Int_n = Σ_{i=1..h−1} i · 2^i. The internal path length becomes maximal if the tree degenerates to a linear list: Int_n = Σ_{i=1..n−1} i = n(n−1)/2. Example: h = 4, n = 15: Int = 34, Ext = 16 · 4 = 64. For comparison: a list with n = 15 nodes has Int = 105 and Ext = 105 + 30 = 135.
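The numbers in this example can be verified directly (a quick check of the formulas as stated above):

```python
# Check the path lengths for a complete tree of height h = 4 (n = 15)
# and for a degenerate linear list with the same n.

h, n = 4, 15

# complete tree: 2^i internal nodes at each depth i = 0 .. h-1
int_complete = sum(i * 2**i for i in range(h))
ext_complete = int_complete + 2 * n          # Ext_n = Int_n + 2n

# degenerate list: one internal node at each depth 0 .. n-1
int_list = n * (n - 1) // 2
ext_list = int_list + 2 * n

print(int_complete, ext_complete, int_list, ext_list)   # 34 64 105 135
```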

Weighted binary trees Often weights q_i (1 ≤ i ≤ n+1) are assigned to the external nodes. The weighted external path length is defined as Ext_w = Σ_{i=1..n+1} depth(S_i) · q_i. For weighted binary trees the minimal/maximal path length properties above no longer apply. Determining a tree with minimal weighted external path length is an important practical problem. Example with weights 3, 8, 15, 25: a balanced tree (all external nodes at depth 2) has Ext_w = 102, while a degenerate, list-like tree with the heavy weights near the root has Ext_w = 88, which is less than 102 although the tree is a linear list.

Application example: optimal codes To convert a text file efficiently into a bit string, there are two alternatives: Fixed-length coding: each character is encoded with the same number of bits (e.g., ASCII). Variable-length coding: some characters are represented with fewer bits than others. Example of fixed-length coding, a 3-bit code for the alphabet A, B, C, D: A = 001, B = 010, C = 011, D = 100. The message ABBAABCDADA is converted to 001010010001001010011100001100001 (length 33 bits). Using a 2-bit code, the same message can be encoded with only 22 bits. To decode the message, group the bits into blocks of 3 (or 2, respectively) and look each block up in a table mapping codes to characters.
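Fixed-length coding as described here can be sketched as follows (the concrete 2-bit code words are an assumption; any one-to-one assignment works):

```python
# Fixed-length (2-bit) coding for the four-character alphabet.
CODE = {"A": "00", "B": "01", "C": "10", "D": "11"}
DECODE = {bits: ch for ch, bits in CODE.items()}

def encode(msg):
    return "".join(CODE[ch] for ch in msg)

def decode(bits, width=2):
    # group the bit string into fixed-width blocks, look each one up
    return "".join(DECODE[bits[i:i + width]]
                   for i in range(0, len(bits), width))

msg = "ABBAABCDADA"
bits = encode(msg)
print(len(bits))          # 22 bits, versus 33 with the 3-bit code
assert decode(bits) == msg
```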

Application example: optimal codes (2) Idea: more frequently used characters are encoded with fewer bits. Message: ABBAABCDADA
Character:  A  B   C    D
Frequency:  5  3   1    2
Code:       0  10  111  110
Coding: 01010001011111001100, length: 20 bits! Variable-length coding can reduce the memory space needed for storing the file. How can this special coding be found, and why is the decoding unique?

Application example: optimal codes (3) The frequencies and the coding are represented as a weighted binary tree. First, decoding: given a bit string, use the successive bits to traverse the tree starting from the root (0 = left, 1 = right). When you arrive at an external node, output the character stored there and continue from the root. Example: 010100010111...
1. Bit = 0: external node, A
2. Bit = 1: from the root to the right
3. Bit = 0: left, external node, B
4. Bit = 1: from the root to the right
5. Bit = 0: left, external node, B ...
[Figure: code tree with external node A (5) directly left of the root, B (3) at depth 2, and D (2) and C (1) at depth 3]
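The tree walk described above can be sketched like this (the tree is the one from the slide, A = 0, B = 10, C = 111, D = 110, built here as nested tuples where a leaf is just the character):

```python
# Decode a bit string by walking the code tree from the root.
tree = ("A", ("B", ("D", "C")))    # 0 = left, 1 = right

def decode(bits, tree):
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # external node reached
            out.append(node)
            node = tree            # restart at the root
    return "".join(out)

print(decode("01010001011111001100", tree))   # ABBAABCDADA
```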

Correctness condition Observation: with variable-length coding, the code of one character must not be a prefix of the code of any other character. If the code is represented as an extended binary tree (only one character per external node), this uniqueness is guaranteed. If the frequency of each character in the original text is taken as the weight of its external node, then a tree with minimal weighted external path length yields an optimal code. How is a tree with minimal external path length generated?
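The prefix condition is easy to check mechanically (a sketch; the helper name is our own, and it relies on the fact that in sorted order a prefix always appears immediately before some word it prefixes):

```python
# Check that no code word is a prefix of another.
def is_prefix_free(codes):
    words = sorted(codes)          # a prefix sorts adjacent to its extension
    return all(not b.startswith(a)
               for a, b in zip(words, words[1:]))

assert is_prefix_free(["0", "10", "111", "110"])   # code from the slides
assert not is_prefix_free(["0", "01", "11"])       # "0" is a prefix of "01"
```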

Huffman Code Idea: characters are weighted and sorted according to their frequency. This also works independently of the concrete text, e.g., for English (characters with relative weights): E 1231, T 959, A 805, O 794, N 719, I 718, S 659, R 603, H 514, L 403, D 365, C 320, U 310, P 229, F 228, M 225, W 203, Y 188, B 162, G 161, V 93, K 52, Q 20, X, J 10, Z 9. A binary tree with minimal external path length is constructed as follows: Each character is represented by a tree of its own with its corresponding weight (a single external node). The two trees with the smallest weights are merged into a new tree; the root of the new tree is marked with the sum of the weights of the original roots. Continue until only one tree remains.
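The construction can be sketched with a binary heap, which is also what gives the O(n log n) behavior discussed in the analysis slide below (a minimal sketch, not the lecture's implementation; ties are broken arbitrarily, so the concrete code words may differ from the slides, but the code lengths agree):

```python
import heapq
from itertools import count

def huffman_codes(freq):
    """Build a Huffman tree over {char: weight} and return the codes."""
    tie = count()                        # tie-breaker so tuples compare
    heap = [(w, next(tie), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # the two smallest trees ...
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))  # ... merged

    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):        # external node: record its code
            codes[node] = prefix or "0"
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

# Frequencies of the running example ABBAABCDADA
codes = huffman_codes({"A": 5, "B": 3, "C": 1, "D": 2})
print({ch: len(c) for ch, c in codes.items()})
# code lengths A:1, B:2, C:3, D:3 — the 20-bit solution from the slides
```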

Example 1: Huffman Alphabet and frequencies: E 29, T 10, N 9, I 5, S 4.
Step 1: (4, 5, 9, 10, 29), merge 4 + 5, new weight: 9
Step 2: (9, 9, 10, 29), merge 9 + 9, new weight: 18

Example 1: Huffman (2)
Step 3: (10, 18, 29), merge 10 + 18, new weight: 28
Step 4: (28, 29), merge 28 + 29, new weight: 57, finished!
[Figure: the resulting tree with root 57; its children are 28 and E (29); 28 splits into T (10) and 18; 18 splits into an internal node 9 and N (9); the internal 9 splits into S (4) and I (5)]

Resulting tree Coding:
Character  Code   Weight
E          1      29
T          00     10
N          011    9
I          0101   5
S          0100   4
Ext_w = 1·29 + 2·10 + 3·9 + 4·5 + 4·4 = 112. Using this coding, e.g.: TENNIS = 00101101101010100, SET = 0100100, NET = 011100. Decoding works as described before.
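The code table and the value Ext_w = 112 can be checked directly (a sketch using the table from this slide):

```python
# Encode with the Example-1 code table and verify Ext_w.
CODES = {"E": "1", "T": "00", "N": "011", "I": "0101", "S": "0100"}
WEIGHTS = {"E": 29, "T": 10, "N": 9, "I": 5, "S": 4}

def encode(word):
    return "".join(CODES[ch] for ch in word)

# weighted external path length: code length times weight, summed
ext_w = sum(len(CODES[ch]) * w for ch, w in WEIGHTS.items())
print(ext_w)                      # 112
print(encode("TENNIS"))           # 00101101101010100
assert encode("SET") == "0100100"
assert encode("NET") == "011100"
```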

Some remarks The resulting tree is not regular. Regular trees are not always optimal: for these weights, the best nearly complete tree has Ext_w = 123 (versus 112 for the Huffman tree). For the message ABBAABCDADA, the 20-bit encoding is optimal (see the previous slides). [Figure: a nearly complete tree over the weights 29, 10, 9, 5, 4]

Example 2: Huffman
Z    p (%)   Code
A    25      00
B    4       1110
C    13      100
D    7       110
E    35      01
F    11      101
G    2       11110
H    3       11111
Average number of bits without Huffman: 3 (because 2^3 = 8). Average number of bits using the Huffman code: 0.25·2 + 0.04·4 + 0.13·3 + 0.07·3 + 0.35·2 + 0.11·3 + 0.02·5 + 0.03·5 = 2.54. There are other "valid" solutions, but the average number of bits is the same for all of them (equal to Huffman).
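The average code length follows directly from the table (a quick check of the computation above):

```python
# Average number of bits per character for the Example-2 code.
table = {
    "A": (0.25, "00"),    "B": (0.04, "1110"),
    "C": (0.13, "100"),   "D": (0.07, "110"),
    "E": (0.35, "01"),    "F": (0.11, "101"),
    "G": (0.02, "11110"), "H": (0.03, "11111"),
}

# expected code length = sum of probability * code length
avg = sum(p * len(code) for p, code in table.values())
print(round(avg, 2))   # 2.54, versus 3 bits for a fixed-length code
```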

Analysis
/* Algorithm Huffman */
for (int i = 1; i <= n-1; i++) {
    p1 = smallest element in list L
    remove p1 from L
    p2 = smallest element in L
    remove p2 from L
    create node p
    add p1 and p2 as left and right subtrees of p
    weight(p) = weight(p1) + weight(p2)
    insert p into L
}
The run time behavior depends in particular on the implementation of the list: the time required to find the node with the smallest weight, and the time required to insert a new node. "Naive" implementations give O(n^2); "smarter" ones (e.g., using a heap) result in O(n log n).

Optimality Observation: the weight of a node K in the Huffman tree is equal to the external path length of the subtree having K as root. Theorem: a Huffman tree is an extended binary tree with minimal weighted external path length Ext_w. Proof outline (by induction over n, the number of characters in the alphabet): the statement to prove is A(n) = "a Huffman tree with n nodes has minimal external path length Ext_w". Base case n = 2: prove A(2) = "a Huffman tree with 2 nodes has minimal external path length".

Optimality (2) Proof, n = 2: only two characters with weights q1 and q2 result in a tree with Ext_w = q1 + q2. This is minimal, because there are no other trees. Induction hypothesis: for all i ≤ k, A(i) is true. To prove: A(k+1) is true. [Figure: a tree with root V and subtrees T1 and T2]

Optimality (3) Proof: consider a Huffman tree T with k+1 nodes. This tree has a root V and two subtrees T1 and T2 with weights q1 and q2, respectively. From the construction method we can deduce that the weights qi of all internal nodes ni of T1 and T2 satisfy qi ≤ min(q1, q2). Therefore, for these weights qi: q1 + q2 > qi. So if V is replaced by any node of T1 or T2, the resulting tree has a greater weight. Exchanging nodes within T1 and T2 makes no sense either, because T1 and T2 are already optimal (both are trees with k nodes or fewer, so the induction hypothesis holds for them). Hence T is an optimal tree with k+1 nodes. [Figure: root V with weight q1 + q2, subtrees T1 (weight q1) and T2 (weight q2)]

Huffman Code: Applications Fax machine

Huffman: Other applications ZIP coding (at least a similar technique). In principle, most coding techniques with data reduction (lossless compression). NOT Huffman: lossy compression techniques like JPEG, MP3, MPEG, ...