1 Huffman Codes Drozdek Chapter 11

2 Objectives You will be able to: construct an optimal variable-bit-length code (a Huffman code) for an alphabet with a known probability for each letter occurring in a message; construct a tree for decoding messages encoded in a Huffman code; and construct a tree for encoding messages in a Huffman code.

3 Huffman Codes Common character codes such as ASCII and EBCDIC use the same size data structure for all characters: eight bits per character. Contrast Morse code, which uses variable-length sequences. Averaged over many messages with given character probabilities, variable-length codes can produce shorter messages than fixed-length codes.

4 Variable-Length Codes Each character in such a code has a weight (probability) and a length. The expected message length per character is the sum, over all the characters, of the products of the code lengths and the probabilities: (0.2*2) + (0.1*4) + (0.1*4) + (0.15*3) + (0.45*1) = 2.1
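
As a sanity check, the expected length is easy to compute directly from the probabilities and code lengths. A minimal C++ sketch (not from the slides), using the five (probability, length) pairs above:

#include <iostream>
#include <vector>
#include <utility>

int main()
{
    // (probability, code length) pairs for the example alphabet a..e
    std::vector<std::pair<double, int>> code =
        { {0.20, 2}, {0.10, 4}, {0.10, 4}, {0.15, 3}, {0.45, 1} };

    double expected_length = 0.0;
    for (const auto& pl : code)
        expected_length += pl.first * pl.second;  // probability * length

    std::cout << expected_length << '\n';         // prints 2.1
    return 0;
}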

5 Immediate Decodability A code is immediately decodable when no sequence of bits that represents a character is a prefix of a longer sequence for another character. Each character can then be decoded as soon as its last bit arrives, without waiting for remaining bits. Note how the previous scheme is not immediately decodable, and this one is.

6 Immediate Decodability Codes that are immediately decodable are called prefix codes: no valid code symbol is a prefix of another valid code symbol. They are perhaps better called prefix-free codes.
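
To make the prefix-free property concrete, here is a small sketch (not part of the slides) that tests whether any codeword is a prefix of another:

#include <iostream>
#include <string>
#include <vector>

// Returns true if no codeword is a prefix of another codeword.
bool is_prefix_free(const std::vector<std::string>& codes)
{
    for (size_t i = 0; i < codes.size(); ++i)
        for (size_t j = 0; j < codes.size(); ++j)
            if (i != j && codes[j].compare(0, codes[i].size(), codes[i]) == 0)
                return false;  // codes[i] is a prefix of codes[j]
    return true;
}

int main()
{
    // The Huffman code constructed later in these slides.
    std::cout << is_prefix_free({"011", "000", "001", "010", "1"})  // prints 1
              << '\n';
    return 0;
}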

7 Optimal Codes We seek codes that are immediately decodable and whose average message length, over a large number of messages, is minimal. For a set of n characters { C1, ..., Cn } with weights { w1, ..., wn }, we need an algorithm that generates variable-length bit strings representing the characters.

8 Huffman Codes An optimal code scheme developed by David A. Huffman while a PhD student at MIT: "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the I.R.E., Sept. 1952.

9 Huffman's Algorithm How to determine an optimal code for a set of N characters given their relative frequencies (or weights).

10 Huffman's Algorithm
Initialize a list of one-node binary trees, one node for each character, containing the character and its weight.
While there is more than one tree in the list:
- Find two trees in the list having minimal weights.
- Remove those trees from the list and make them the left and right subtrees of a new node having the sum of their weights as its weight.
- Label the arc to the left subtree with 0 and the arc to the right subtree with 1.
- Add the new tree to the list.
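
The pairing step maps naturally onto a min-priority queue keyed on weight. A compact sketch under that assumption, with a minimal illustrative Node struct; the program developed later in these slides uses a sorted std::list of Char_Freq objects instead:

#include <queue>
#include <vector>

struct Node
{
    double weight;
    Node* left;   // arc labeled 0
    Node* right;  // arc labeled 1
};

// Comparator so that the smallest weight comes out of the queue first.
struct By_Weight
{
    bool operator()(const Node* a, const Node* b) const
    {
        return a->weight > b->weight;
    }
};

// Builds the Huffman tree; assumes at least one leaf.
Node* Build_Huffman(const std::vector<Node*>& leaves)
{
    std::priority_queue<Node*, std::vector<Node*>, By_Weight>
        pq(leaves.begin(), leaves.end());
    while (pq.size() > 1)
    {
        Node* a = pq.top(); pq.pop();  // two minimal-weight trees
        Node* b = pq.top(); pq.pop();
        pq.push(new Node{a->weight + b->weight, a, b});  // merged tree
    }
    return pq.top();  // root of the final Huffman tree
}

A priority queue makes each extraction O(log n); the slide program's repeated sort of a short list is simpler and adequate for small alphabets.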

11 Huffman's Algorithm The code for character Ci is the bit string along the path from the root to Ci in the final binary tree.

12 Example Given characters and probabilities (a: 0.2, b: 0.1, c: 0.1, d: 0.15, e: 0.45), the end result is:

Character   Huffman Code
A           011
B           000
C           001
D           010
E           1

Note the arbitrary choice for the sibling of D.

13 Alternate Result The other choice of sibling for D yields different codes, but the average message length is the same.

14 Huffman Decoding Algorithm Given a message as a string of 0's and 1's:
Initialize pointer p to the root of the Huffman tree.
While the end of the message string has not been reached:
- Let x be the next bit of the message string.
- If x is 0, move p to the left child; else move p to the right child.
- If p points to a leaf, display the character at that leaf and reset p to the root of the Huffman tree.
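
A direct C++ translation of this loop, as a sketch; it uses an illustrative Node struct with a character field at the leaves, rather than the Char_Freq class built later in these slides:

#include <string>

struct Node
{
    char ch;      // character; meaningful at leaves only
    Node* left;   // arc labeled 0
    Node* right;  // arc labeled 1
};

// Decode a string of '0' and '1' characters against a Huffman tree.
std::string Decode(const Node* root, const std::string& bits)
{
    std::string message;
    const Node* p = root;  // pointer p starts at the root
    for (char x : bits)
    {
        p = (x == '0') ? p->left : p->right;  // follow the labeled arc
        if (p->left == 0 && p->right == 0)    // p points to a leaf
        {
            message += p->ch;                 // display the character
            p = root;                         // reset p to the root
        }
    }
    return message;
}

With a tree matching the code table on slide 12, Decode(root, "0001011010") would return "BEAD".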

15 Huffman Decoding Algorithm For the message string 0001011010 (B = 000, E = 1, A = 011, D = 010), using the Huffman tree and the decoding algorithm, the answer is: B E A D

16 Implementing a Huffman Code Program Let's implement a program to build a Huffman code tree, and to encode and decode text messages using the resulting Huffman code. Limit input to letters and spaces. Convert letters to lower case.
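
Restricting the input can be handled up front. A minimal sketch using the standard <cctype> functions; the function name is illustrative, not from the slides:

#include <cctype>
#include <string>

// Keep only letters and spaces, converting letters to lower case.
std::string Normalize(const std::string& input)
{
    std::string out;
    for (std::string::size_type i = 0; i < input.size(); ++i)
    {
        unsigned char c = input[i];  // unsigned char avoids UB in isalpha
        if (std::isalpha(c))
            out += static_cast<char>(std::tolower(c));
        else if (c == ' ')
            out += ' ';
    }
    return out;
}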

17 Implementing a Huffman Code Program In order to create a Huffman code for English text, we need weighting factors for the letters. Frequency tables are readily available. To simplify testing and debugging, start with a small example: just the letters A, B, C, D, and E.

18 Getting Started Create a new empty C++ project in Visual Studio (Huffman_Code), or a directory in Unix. Add a C++ source file, main.cpp.

19 main.cpp

#include <iostream>
using namespace std;

int main(void)
{
    cout << "This is the Huffman Code program" << endl;
    cin.get();
    return 0;
}

Build and test.

20 Program Running

21 Class Char_Freq We need a class to hold the elements of a Huffman tree. Data: the character and its frequency (probability of occurrence). Pointers: left child and right child. Add class Char_Freq.

22 Char_Freq.h

#pragma once
#include <iostream>
using std::ostream;

class Char_Freq
{
private:
    char ch;
    double freq;
    Char_Freq* left;
    Char_Freq* right;
public:
    Char_Freq(void);
    Char_Freq(char c, double f);
    Char_Freq(char c, double f, Char_Freq* Left, Char_Freq* Right);
    char Ch() const { return ch; }
    double Freq() const { return freq; }
    bool operator<(const Char_Freq& rhs) const;
    friend ostream& operator<<(ostream& os, const Char_Freq& cf);
};

23 Char_Freq.cpp

#include "Char_Freq.h"

Char_Freq::Char_Freq(void)
{
}

Char_Freq::Char_Freq(char c, double f)
    : ch(c), freq(f), left(0), right(0)
{
}

Char_Freq::Char_Freq(char c, double f, Char_Freq* Left, Char_Freq* Right)
    : ch(c), freq(f), left(Left), right(Right)
{
}

bool Char_Freq::operator<(const Char_Freq& rhs) const
{
    return this->freq < rhs.freq;
}

ostream& operator<<(ostream& os, const Char_Freq& cf)
{
    os << cf.ch << " " << cf.freq;
    return os;
}

24 The Huffman Tree Add class Huffman_Tree, which will hold the code to build and access the Huffman code for a specific set of characters and frequencies.

25 Starting the Huffman Tree We will build multiple trees of Char_Freq elements and keep the roots in a list, using the Standard Template Library list class. Initially there is one tree per character to be coded, each consisting of a root only. Method Add() will be used to add char-freq pairs to the list.

26 Huffman_Tree.h

#pragma once
#include <list>
#include "Char_Freq.h"

class Huffman_Tree
{
public:
    Huffman_Tree(void);
    ~Huffman_Tree(void) {};

    // Add a single node tree to the list.
    void Add(char c, double frequency);
    void Display_List(void);
private:
    std::list<Char_Freq> node_list;
};

27 Huffman_Tree.cpp

#include <iostream>
#include "Huffman_Tree.h"
using namespace std;

Huffman_Tree::Huffman_Tree(void)
{
}

void Huffman_Tree::Add(char c, double frequency)
{
    Char_Freq cf(c, frequency);
    node_list.push_back(cf);
}

28 Huffman_Tree.cpp

void Huffman_Tree::Display_List(void)
{
    cout << "Character frequency list:" << endl;
    list<Char_Freq>::iterator itr;
    for (itr = node_list.begin(); itr != node_list.end(); ++itr)
    {
        cout << *itr << endl;
    }
}

29 main.cpp

#include <iostream>
#include "Huffman_Tree.h"
using namespace std;

Huffman_Tree huffman_tree;

int main(void)
{
    cout << "This is the Huffman code program.\n\n";
    huffman_tree.Add('a', 0.2);
    huffman_tree.Add('b', 0.1);
    huffman_tree.Add('c', 0.1);
    huffman_tree.Add('d', 0.15);
    huffman_tree.Add('e', 0.45);
    huffman_tree.Display_List();
    cin.get();
    return 0;
}

30 Program in Action

31 Implementing Huffman's Algorithm Huffman's algorithm requires us to identify two trees with minimal total frequency. To do this we can sort the list. The < operator for the Char_Freq class compares the frequency values, so the sort method of the list template class will sort the trees into increasing order by frequency.

32 Implementing Huffman's Algorithm Add function Make_Decode_Tree to class Huffman_Tree. Repeatedly:
- Sort the list of trees by frequency.
- Remove the first two trees.
- Create a new node with these trees as subtrees; its frequency is the sum of their frequencies.
- Add the new node to the list.
Continue until there is only one node on the list.

33 Huffman_Tree.h Add a new public method: void Make_Decode_Tree(void);

34 Huffman_Tree.cpp Start by sorting the list, then display the sorted list.

void Huffman_Tree::Make_Decode_Tree(void)
{
    node_list.sort();
    cout << "\nSorted list:\n";
    Display_List();
}

35 main.cpp Add a call to Make_Decode_Tree().

int main(void)
{
    cout << "This is the Huffman code program.\n";
    huffman_tree.Add('a', 0.2);
    huffman_tree.Add('b', 0.1);
    huffman_tree.Add('c', 0.1);
    huffman_tree.Add('d', 0.15);
    huffman_tree.Add('e', 0.45);
    huffman_tree.Display_List();
    huffman_tree.Make_Decode_Tree();
    cin.get();
    return 0;
}

36 Program in Action

37 Huffman_Tree.cpp Add to function Make_Decode_Tree():

while (node_list.size() > 1)
{
    // Remove the two minimal-frequency trees from the sorted list.
    Char_Freq* cf1 = new Char_Freq(node_list.front());
    node_list.pop_front();
    Char_Freq* cf2 = new Char_Freq(node_list.front());
    node_list.pop_front();

    // Make them the subtrees of a new node whose frequency is the
    // sum of their frequencies, then re-sort the list.
    Char_Freq cf3(0, cf1->Freq() + cf2->Freq(), cf1, cf2);
    node_list.push_back(cf3);
    node_list.sort();
}

This is the essence of Huffman's algorithm!

38 Huffman_Tree.h Add a new private member variable to class Huffman_Tree to hold the root of the tree.

private:
    std::list<Char_Freq> node_list;
    Char_Freq decode_tree_root;
};

39 Huffman_Tree.cpp In order to check our results we need to be able to display the tree, and also show the code as a list. Add public functions to Huffman_Tree.h:

void Display_Decode_Tree(Char_Freq* cf, int indent) const;
void Display_Code(Char_Freq* cf, std::string prefix) const;

(Huffman_Tree.h will also need #include <string> for the std::string parameter.) Add at the top of Huffman_Tree.cpp:

#include <iomanip>

40 Display_Decode_Tree()

void Huffman_Tree::Display_Decode_Tree(Char_Freq* cf, int indent) const
{
    if (cf->left != 0)
    {
        Display_Decode_Tree(cf->left, indent + 8);
    }
    cout << setw(indent) << " " << *cf << endl;
    if (cf->right != 0)
    {
        Display_Decode_Tree(cf->right, indent + 8);
    }
}

Note the access of private members of cf. Make class Huffman_Tree a friend of class Char_Freq.

41 Char_Freq.h Add at the end of Char_Freq.h:

    bool operator<(const Char_Freq& rhs) const;
    friend ostream& operator<<(ostream& os, const Char_Freq& cf);
    friend class Huffman_Tree;
};

42 Char_Freq.cpp Update operator<< to handle merged nodes, whose ch will be 0.

ostream& operator<<(ostream& os, const Char_Freq& cf)
{
    if (cf.ch > 0)
    {
        os << cf.ch << " " << cf.freq;
    }
    else
    {
        os << '*' << " " << cf.freq;
    }
    return os;
}

43 Huffman_Tree.cpp Add at the end of function Make_Decode_Tree():

decode_tree_root = node_list.front();
cout << endl << "The Huffman Tree" << endl;
Display_Decode_Tree(&decode_tree_root, 0);

44 Program in Action