
1 CPSC 335: Compression and Huffman Coding. Dr. Marina Gavrilova, Computer Science, University of Calgary, Canada

2 Lecture Overview
- Huffman coding
- Non-determinism of the algorithm
- Implementations:
  - Singly linked list
  - Doubly linked list
  - Recursive top-down
  - Using a heap
- Adaptive Huffman coding

3 Huffman Coding
- The algorithm assigns a codeword to each character in the text according to its frequency; the codeword is usually represented as a bitstring.
- The algorithm starts with a set of individual trees, each consisting of a single node, sorted in order of increasing character probability.
- The two trees with the smallest probabilities are then selected and made the left and right subtrees of a new parent node, which combines their probabilities.
- At the end, 0 is assigned to every left branch of the tree and 1 to every right branch, and the codeword for each leaf (character) is read off; see the sketch below.
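
Below is a minimal sketch of the procedure just described; the slides give no code, so the HuffmanNode class and function names are mine. Trees are kept in a list sorted by increasing probability, the two smallest are merged until one tree remains, and codewords are then read off the branches (0 left, 1 right).

```python
class HuffmanNode:
    def __init__(self, prob, symbol=None, left=None, right=None):
        self.prob = prob       # probability (or frequency count) of this subtree
        self.symbol = symbol   # the character; None for internal nodes
        self.left = left
        self.right = right

def build_huffman_tree(probabilities):
    """probabilities: dict mapping symbol -> probability; returns the root."""
    trees = sorted((HuffmanNode(p, s) for s, p in probabilities.items()),
                   key=lambda n: n.prob)
    while len(trees) > 1:
        a = trees.pop(0)                  # the two smallest probabilities
        b = trees.pop(0)
        trees.append(HuffmanNode(a.prob + b.prob, left=a, right=b))
        trees.sort(key=lambda n: n.prob)  # keep the list ordered
    return trees[0]

def assign_codewords(node, prefix="", table=None):
    """0 on every left branch, 1 on every right branch."""
    if table is None:
        table = {}
    if node.symbol is not None:           # a leaf: its codeword is complete
        table[node.symbol] = prefix or "0"
    else:
        assign_codewords(node.left, prefix + "0", table)
        assign_codewords(node.right, prefix + "1", table)
    return table

codes = assign_codewords(build_huffman_tree({'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}))
# -> {'a': '0', 'b': '10', 'd': '110', 'c': '111'}; ties can change the exact
#    codes, which is the non-determinism discussed on the next slide.
```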

4 Non-determinism of Huffman Coding


6 Huffman Algorithm Implementation: Linked List
- The implementation depends on how the priority queue is represented; the queue must support removing the two smallest probabilities and inserting the new combined probability in its proper position.
- The first way to implement the priority queue is as a singly linked list of references to trees, which mirrors the algorithm presented on the previous slides.
- The tree with the smallest probability is replaced by the newly created tree.
- Among trees with the same probability, the first trees encountered are chosen; a sketch follows.
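
A hedged sketch of this variant follows; the ListCell class and helper names are mine, and HuffmanNode is reused from the earlier sketch. Each pass scans the singly linked list for the two smallest trees, replaces the smaller one's cell contents with the merged tree, and unlinks the other cell.

```python
class ListCell:
    """A singly linked list cell referencing one Huffman tree."""
    def __init__(self, tree, next=None):
        self.tree = tree
        self.next = next

def merge_pass(head):
    """Merge the two smallest trees in the list; returns the new head."""
    # Scan for the two cells with the smallest probabilities; on ties the
    # first cell encountered wins, as on the slide.
    first = second = None
    cell = head
    while cell is not None:
        if first is None or cell.tree.prob < first.tree.prob:
            first, second = cell, first
        elif second is None or cell.tree.prob < second.tree.prob:
            second = cell
        cell = cell.next
    # The cell holding the smallest tree is replaced by the new tree ...
    first.tree = HuffmanNode(first.tree.prob + second.tree.prob,
                             left=first.tree, right=second.tree)
    # ... and the cell that held the second smallest tree is unlinked.
    prev, cell = None, head
    while cell is not second:
        prev, cell = cell, cell.next
    if prev is None:
        return second.next
    prev.next = second.next
    return head

def build_huffman_list(probabilities):
    head = None
    for s, p in probabilities.items():
        head = ListCell(HuffmanNode(p, s), head)
    while head.next is not None:          # one tree left == done
        head = merge_pass(head)
    return head.tree
```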

7 Doubly Linked List
- All probability nodes are ordered first, so the first two trees in the list are always the ones removed.
- The new tree is inserted into its sorted position by scanning from the end of the list.
- A doubly linked list of references to trees, with immediate access to both the beginning and the end of the list, supports this; the insertion step is sketched below.
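
The distinctive step here is the sorted insertion from the tail; removing the two smallest trees is simply taking the first two cells from the head. Below is a sketch of that insertion with a hypothetical DCell class (names mine, not from the slides).

```python
class DCell:
    """A doubly linked list cell referencing one Huffman tree."""
    def __init__(self, tree):
        self.tree = tree
        self.prev = None
        self.next = None

def insert_from_tail(head, tail, cell):
    """Insert cell into the sorted list, scanning from the end; returns (head, tail)."""
    spot = tail
    while spot is not None and spot.tree.prob > cell.tree.prob:
        spot = spot.prev                  # walk left past larger probabilities
    if spot is None:                      # cell becomes the new first element
        cell.next = head
        if head is not None:
            head.prev = cell
        head = cell
    else:                                 # splice cell in right after spot
        cell.prev, cell.next = spot, spot.next
        if spot.next is not None:
            spot.next.prev = cell
        spot.next = cell
    if cell.next is None:                 # cell landed at the end of the list
        tail = cell
    return head, tail
```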

8 Doubly Linked List Implementation

9 Recursive Implementation
- A top-down approach builds the tree starting from the highest probability: the root probability is known once the lower probabilities of the root's children have been determined, those in turn are known once still lower probabilities have been computed, and so on.
- Thus, a recursive algorithm can be used; one possible sketch follows.
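
One way to realize the recursive idea is sketched below; this is my reading of the slide, not code from it. Each call removes the two smallest probabilities, inserts their sum back into the sorted list, and recurses, so the root is produced last, as the recursion unwinds. HuffmanNode is the class from the earlier sketch.

```python
def build_recursive(trees):
    """trees: list of HuffmanNode, sorted by increasing probability."""
    if len(trees) == 1:
        return trees[0]                   # the completed tree's root
    a, b = trees[0], trees[1]             # the two smallest probabilities
    parent = HuffmanNode(a.prob + b.prob, left=a, right=b)
    rest = trees[2:]
    i = 0                                 # insert parent in sorted position
    while i < len(rest) and rest[i].prob <= parent.prob:
        i += 1
    rest.insert(i, parent)
    return build_recursive(rest)          # root returned as recursion unwinds
```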

10 Implementation Using a Heap
- A min-heap of probabilities is built, so the smallest probability sits at the root.
- The smallest probability is removed, and the new root is set to the sum of the two smallest probabilities.
- The heap property is then restored.
- Processing is complete when only one node is left in the heap; a sketch using Python's heapq follows.
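
A sketch with the standard-library heapq module is below. heapq has no "overwrite the root" operation, so popping both minima and pushing their sum is used as an equivalent of the slide's replace-and-restore step. The tie-breaking counter is my addition, needed because HuffmanNode objects are not comparable.

```python
import heapq
from itertools import count

def build_with_heap(probabilities):
    """Build the Huffman tree with a min-heap of (probability, tree) entries."""
    order = count()      # tie-breaker so equal probabilities stay comparable
    heap = [(p, next(order), HuffmanNode(p, s))
            for s, p in probabilities.items()]
    heapq.heapify(heap)  # smallest probability rises to the root
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)    # smallest probability
        p2, _, b = heapq.heappop(heap)    # second smallest
        heapq.heappush(heap, (p1 + p2, next(order),
                              HuffmanNode(p1 + p2, left=a, right=b)))
    return heap[0][2]
```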

11 Huffman implementation with a heap

12 Huffman Coding for pairs of characters

13 Adaptive Huffman Coding
- Devised by Robert Gallager and improved by Donald Knuth.
- The algorithm is based on the sibling property: if each node has a sibling, and a breadth-first right-to-left traversal of the tree generates a list of nodes with non-increasing frequency counters, then it is a Huffman tree.
- In adaptive Huffman coding, the tree includes a counter for each symbol that is updated every time the corresponding symbol is coded.
- Checking whether the sibling property holds ensures that the tree under construction is a Huffman tree; if the property is violated, the tree is rearranged to restore it. A property-check sketch follows.
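
Below is a sketch of a sibling-property check, stated exactly as on this slide; the function name is mine, and nodes are assumed to carry a count attribute (the adaptive frequency counter) alongside left and right children.

```python
def has_sibling_property(root):
    """Breadth-first, right to left from the root, the frequency counters
    must never increase; nodes are assumed to have .count, .left, .right."""
    counts, level = [], [root]
    while level:
        counts.extend(node.count for node in reversed(level))  # right to left
        level = [child for node in level
                 for child in (node.left, node.right) if child is not None]
    return all(x >= y for x, y in zip(counts, counts[1:]))
```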


16 Sources
Web links:
- MP3 Converter: http://www.mp3-onverter.com/mp3codec/huffman_coding.htm
- Practical Huffman Coding: http://www.compressconsult.com/huffman/
- Drozdek textbook, Chapter 11

17 Shannon-Fano
- In the field of data compression, Shannon-Fano coding, named after Claude Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).
- It is suboptimal in the sense that it does not achieve the lowest possible expected codeword length, as Huffman coding does; however, unlike Huffman coding, it guarantees that all codeword lengths are within one bit of their theoretical ideal (the entropy).

18 Shannon-Fano Coding
1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.
2. Sort the list of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.
3. Divide the list into two parts, with the total frequency count of the left part being as close to the total of the right as possible.
4. Assign the binary digit 0 to the left part of the list and the digit 1 to the right part. This means the codes for the symbols in the first part all start with 0, and the codes in the second part all start with 1.
5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree. A sketch follows the list.
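
A direct sketch of the five steps is below (function and variable names are mine). The split point in step 3 is found by trying every division of the sorted list and keeping the one that makes the two totals closest.

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, frequency) pairs; returns symbol -> codeword."""
    # Steps 1-2: sort by decreasing frequency.
    items = sorted(symbols, key=lambda sf: sf[1], reverse=True)
    codes = {}

    def split(group, prefix):
        if len(group) == 1:               # a leaf: its code is complete
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(f for _, f in group)
        # Step 3: find the split making the left total closest to half.
        running, best_i, best_diff = 0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]    # total of the first i symbols
            diff = abs(total - 2 * running)   # |left total - right total|
            if diff < best_diff:
                best_i, best_diff = i, diff
        # Steps 4-5: 0 for the left part, 1 for the right, then recurse.
        split(group[:best_i], prefix + "0")
        split(group[best_i:], prefix + "1")

    split(items, "")
    return codes

# Example: shannon_fano([('a', 15), ('b', 7), ('c', 6), ('d', 6), ('e', 5)])
```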

19 Shannon-Fano example

20 Shannon-Fano References
- Shannon, C.E. (July 1948). "A Mathematical Theory of Communication". Bell System Technical Journal 27: 379-423. http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
- Fano, R.M. (1949). "The Transmission of Information". Technical Report No. 65. Cambridge, MA: Research Laboratory of Electronics at MIT.

