CPSC 335 Compression and Huffman Coding Dr. Marina Gavrilova Computer Science University of Calgary Canada

Lecture Overview
- Huffman Coding
- Non-determinism of the algorithm
- Implementations:
  - Singly-linked list
  - Doubly-linked list
  - Recursive top-down
  - Using a heap
- Adaptive Huffman coding

Huffman Coding
The algorithm assigns a codeword to each character in the text according to its frequency; the codeword is usually represented as a bitstring.
The algorithm starts with a set of individual trees, each consisting of a single node, sorted in order of increasing character probability.
The two trees with the smallest probabilities are then selected and combined: they become the left and right subtrees of a new parent node, whose probability is the sum of theirs.
When a single tree remains, 0 is assigned to every left branch and 1 to every right branch, and the codeword for each leaf (character) is read off its root-to-leaf path.
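The slides contain no source code, so below is a minimal C++ sketch of the procedure just described. It is not the course's reference implementation: the symbols and probabilities are illustrative, and the sorted vector is only one of several ways to hold the forest (the following slides discuss linked lists and a heap).

```cpp
// Minimal sketch of Huffman tree construction and codeword assignment.
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Node {
    double prob;             // probability of the subtree
    char   symbol;           // meaningful only for leaves
    Node*  left  = nullptr;  // branch labelled 0
    Node*  right = nullptr;  // branch labelled 1
    bool isLeaf() const { return left == nullptr && right == nullptr; }
};

// Repeatedly combine the two smallest-probability trees until one remains.
Node* buildHuffman(std::vector<Node*> forest) {
    while (forest.size() > 1) {
        std::sort(forest.begin(), forest.end(),
                  [](Node* a, Node* b) { return a->prob < b->prob; });
        Node* parent = new Node{forest[0]->prob + forest[1]->prob, '\0',
                                forest[0], forest[1]};
        forest.erase(forest.begin(), forest.begin() + 2);
        forest.push_back(parent);
    }
    return forest.front();
}

// Assign 0 to left branches and 1 to right branches; leaves get codewords.
void assignCodes(const Node* n, const std::string& code,
                 std::map<char, std::string>& out) {
    if (n->isLeaf()) { out[n->symbol] = code; return; }
    assignCodes(n->left, code + "0", out);
    assignCodes(n->right, code + "1", out);
}

int main() {
    std::vector<Node*> leaves = {
        new Node{0.05, 'a'}, new Node{0.10, 'b'}, new Node{0.15, 'c'},
        new Node{0.30, 'd'}, new Node{0.40, 'e'}};
    std::map<char, std::string> codes;
    assignCodes(buildHuffman(leaves), "", codes);
    for (const auto& [sym, code] : codes)
        std::cout << sym << " -> " << code << '\n';
}
```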

Non-determinism of Huffman Coding
(Example figure omitted in the transcript: when several trees have equal probability, they can be combined in different orders, so one text admits several different Huffman codes with the same expected length.)

Huffman Algorithm Implementation – Linked List
The implementation depends on how the priority queue is represented; the queue must support removing the two smallest probabilities and inserting the new combined probability in the proper position.
The first way to implement the priority queue is a singly linked list of references to trees, which resembles the algorithm presented in the previous slides.
The tree with the smallest probability is replaced by the newly created tree; among trees with the same probability, the first trees encountered are chosen.
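A sketch of this variant, assuming a hypothetical Cell type for the list slots (the slides do not show the course's actual types): scan for the two smallest trees, store the merged tree in the slot of the smaller one, and unlink the other slot.

```cpp
// Sketch of the singly-linked-list priority queue described above.
struct Tree { double prob; Tree* left; Tree* right; };
struct Cell { Tree* tree; Cell* next; };  // one list slot per tree

// One merge step: find the two smallest trees (first encountered on ties),
// replace the smallest tree by the merged tree, and unlink the other slot.
Cell* mergeStep(Cell* head) {
    Cell *min1 = nullptr, *min2 = nullptr;        // slots of two smallest
    Cell *prev1 = nullptr, *prev2 = nullptr;      // their predecessors
    for (Cell *prev = nullptr, *p = head; p; prev = p, p = p->next) {
        if (min1 == nullptr || p->tree->prob < min1->tree->prob) {
            min2 = min1; prev2 = prev1;           // old best becomes second
            min1 = p;    prev1 = prev;
        } else if (min2 == nullptr || p->tree->prob < min2->tree->prob) {
            min2 = p;    prev2 = prev;
        }
    }
    min1->tree = new Tree{min1->tree->prob + min2->tree->prob,
                          min1->tree, min2->tree};
    if (prev2) prev2->next = min2->next; else head = min2->next;
    delete min2;
    return head;
}

int main() {
    double probs[] = {0.4, 0.1, 0.3, 0.2};        // illustrative values
    Cell* head = nullptr;
    for (double q : probs)                        // build the initial list
        head = new Cell{new Tree{q, nullptr, nullptr}, head};
    while (head->next) head = mergeStep(head);
    // head->tree now points at the root of a Huffman tree (prob ~ 1.0)
    return 0;
}
```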

Doubly Linked List
All probability nodes are first ordered, and the first two trees are always removed.
The new tree is inserted at the end of the list, in sorted order.
A doubly linked list of references to trees, with immediate access to both the beginning and the end of the list, is used.
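A sketch of this variant, with std::list standing in for the hand-built doubly linked list: sort once, always pop the first two trees, and insert the merged tree by scanning backwards from the end.

```cpp
// Sketch of the sorted doubly-linked-list variant described above.
#include <iterator>
#include <list>

struct Tree { double prob; Tree* left; Tree* right; };

Tree* huffmanDoublyLinked(std::list<Tree*> trees) {
    trees.sort([](Tree* a, Tree* b) { return a->prob < b->prob; });
    while (trees.size() > 1) {
        Tree* a = trees.front(); trees.pop_front();   // smallest
        Tree* b = trees.front(); trees.pop_front();   // second smallest
        Tree* parent = new Tree{a->prob + b->prob, a, b};
        // Immediate access to the end lets us search backwards for the
        // sorted insertion point of the new, relatively large probability.
        auto pos = trees.end();
        while (pos != trees.begin() && (*std::prev(pos))->prob > parent->prob)
            --pos;
        trees.insert(pos, parent);
    }
    return trees.front();
}

int main() {
    std::list<Tree*> trees;
    for (double q : {0.4, 0.1, 0.3, 0.2})             // illustrative values
        trees.push_back(new Tree{q, nullptr, nullptr});
    Tree* root = huffmanDoublyLinked(trees);
    return (root->left && root->right) ? 0 : 1;       // root combines all
}
```

Scanning from the back exploits the immediate access to the end mentioned above: the merged probability is at least as large as everything removed so far, so the insertion point tends to lie near the tail.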

Doubly Linked List Implementation

Recursive Implementation
A top-down approach builds the tree starting from the highest probability: the root probability is known once the lower probabilities of the root's children have been determined, which in turn are known once still lower probabilities have been computed, and so on.
Thus, a recursive algorithm can be used.

Implementation Using a Heap
A min-heap of probabilities is built, so the smallest probability is at the root.
The smallest probability is removed and the heap property is restored; the root, which now holds the second smallest probability, is then overwritten with the sum of the two smallest probabilities, and the heap property is restored again.
Processing is complete when only one node is left in the heap.
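A sketch using std::priority_queue as the min-heap. The standard container cannot overwrite its root in place as described above, so this version pops both minima and pushes their sum; the resulting tree is the same, only the constant factors differ.

```cpp
// Sketch of the heap-based variant, with std::priority_queue configured
// as a min-heap on probability.
#include <queue>
#include <vector>

struct Tree { double prob; Tree* left; Tree* right; };

struct GreaterProb {  // inverted comparison turns the max-heap into a min-heap
    bool operator()(const Tree* a, const Tree* b) const {
        return a->prob > b->prob;
    }
};

Tree* huffmanWithHeap(const std::vector<double>& probs) {
    std::priority_queue<Tree*, std::vector<Tree*>, GreaterProb> heap;
    for (double q : probs)
        heap.push(new Tree{q, nullptr, nullptr});
    while (heap.size() > 1) {              // done when one node is left
        Tree* a = heap.top(); heap.pop();  // smallest probability
        Tree* b = heap.top(); heap.pop();  // second smallest
        heap.push(new Tree{a->prob + b->prob, a, b});
    }
    return heap.top();                     // root of the Huffman tree
}

int main() {
    Tree* root = huffmanWithHeap({0.1, 0.2, 0.3, 0.4});  // illustrative
    return root != nullptr ? 0 : 1;
}
```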

Huffman implementation with a heap

Huffman Coding for pairs of characters

Adaptive Huffman Coding
Devised by Robert Gallager and improved by Donald Knuth.
The algorithm is based on the sibling property: if each node has a sibling and a breadth-first right-to-left tree traversal generates a list of nodes with non-increasing frequency counters, the tree is a Huffman tree.
In adaptive Huffman coding, the tree includes a counter for each symbol, updated every time the corresponding symbol is coded.
Checking that the sibling property holds ensures that the tree under construction is still a Huffman tree; if the property is violated, the tree is restructured to restore it.
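A sketch of checking the sibling property exactly as stated above: a breadth-first, right-to-left traversal whose visited counters must never increase, and in which every node other than the root must have a sibling. The Node type and counter values are illustrative; a full adaptive coder would also restructure the tree when this check fails.

```cpp
// Sketch: verify the sibling property by a breadth-first, right-to-left
// traversal, checking that frequency counters never increase.
#include <climits>
#include <deque>

struct Node { int count; Node* left; Node* right; };

bool hasSiblingProperty(Node* root) {
    if (root == nullptr) return true;
    std::deque<Node*> queue;
    queue.push_back(root);
    int prev = INT_MAX;                        // counters must not increase
    while (!queue.empty()) {
        Node* n = queue.front(); queue.pop_front();
        if (n->count > prev) return false;     // counter order violated
        prev = n->count;
        if ((n->left == nullptr) != (n->right == nullptr))
            return false;                      // a child without a sibling
        if (n->right) {
            queue.push_back(n->right);         // right before left gives
            queue.push_back(n->left);          // right-to-left level order
        }
    }
    return true;
}

int main() {
    Node lo{2, nullptr, nullptr}, hi{3, nullptr, nullptr};
    Node root{5, &lo, &hi};    // larger counter on the right branch
    return hasSiblingProperty(&root) ? 0 : 1;
}
```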

Sources
Web links:
- MP3 Converter:
- Practical Huffman Coding:
Drozdek Textbook – Chapter 11

Shannon–Fano
In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).
It is suboptimal in the sense that, unlike Huffman coding, it does not always achieve the lowest possible expected codeword length; however, unlike Huffman coding, it guarantees that every codeword length is within one bit of its theoretical ideal, −log₂ P(x).

Shannon–Fano Coding
1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.
2. Sort the list of symbols by frequency, with the most frequently occurring symbols at the left and the least common at the right.
3. Divide the list into two parts, with the total frequency count of the left part as close as possible to the total of the right part.
4. Assign the binary digit 0 to the left part of the list and the digit 1 to the right part. This means the codes for the symbols in the first part all start with 0 and the codes in the second part all start with 1.
5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree (see the sketch below).
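A sketch of these five steps; the symbols and frequency counts are illustrative.

```cpp
// Sketch of the five Shannon–Fano steps listed above.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Sym = std::pair<char, double>;  // (symbol, frequency count)

// Recursively split s[lo, hi) so the two halves' totals are as close as
// possible; the left half's codes get 0 appended, the right half's get 1.
void shannonFano(const std::vector<Sym>& s, std::size_t lo, std::size_t hi,
                 const std::string& code, std::map<char, std::string>& out) {
    if (hi - lo == 1) { out[s[lo].first] = code; return; }  // a code leaf
    double total = 0;
    for (std::size_t i = lo; i < hi; ++i) total += s[i].second;
    std::size_t split = lo + 1;
    double acc = 0, best = total;        // best |left - right| found so far
    for (std::size_t i = lo; i + 1 < hi; ++i) {
        acc += s[i].second;
        double diff = std::fabs(acc - (total - acc));
        if (diff < best) { best = diff; split = i + 1; }
    }
    shannonFano(s, lo, split, code + "0", out);   // left part: digit 0
    shannonFano(s, split, hi, code + "1", out);   // right part: digit 1
}

int main() {
    std::vector<Sym> s = {{'a', 15}, {'b', 7}, {'c', 6}, {'d', 6}, {'e', 5}};
    std::sort(s.begin(), s.end(),                 // most frequent first
              [](const Sym& x, const Sym& y) { return x.second > y.second; });
    std::map<char, std::string> codes;
    shannonFano(s, 0, s.size(), "", codes);
    for (const auto& [sym, code] : codes)
        std::cout << sym << " -> " << code << '\n';
}
```

Running this on the counts 15, 7, 6, 6, 5 yields the codes a=00, b=01, c=10, d=110, e=111.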

Shannon–Fano example

Shannon–Fano References
- Shannon, C.E. (July 1948). "A Mathematical Theory of Communication". Bell System Technical Journal 27: 379–423. http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
- Fano, R.M. (1949). "The transmission of information". Technical Report No. 65. Cambridge, Mass., USA: Research Laboratory of Electronics at MIT.