Lossless Compression and Huffman Codes
Sophia Soohoo
CS 157B
Lossless Data Compression
o Any compression algorithm can be viewed as a function that maps sequences of units into other sequences of units.
o Lossless: the original data can be exactly reconstructed from the compressed data.
o Lossless is in contrast to lossy data compression, which allows only an approximation of the original data to be reconstructed, in exchange for better compression rates.
David A. Huffman
o BS in Electrical Engineering at Ohio State University
o Worked as a radar maintenance officer for the US Navy
o PhD student in Electrical Engineering at MIT, 1952
o Was given the choice of writing a term paper or taking a final exam
o Paper topic: the most efficient method for representing numbers, letters, or other symbols as binary code
Huffman Coding
o Uses the minimum number of bits
o Variable-length coding – good for data transfer
o Different symbols have codewords of different lengths
o Symbols with higher frequency get shorter codewords
o Symbols with lower frequency get longer codewords
o "Z" will have a longer code representation than "E" if looking at the frequency of character occurrences in an alphabet
o No codeword is a prefix of another codeword!
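The prefix property in the last bullet can be checked mechanically. A minimal sketch (the helper name is my own; the table is the E/T/N/I/S code used on the following slides):

```python
def is_prefix_free(codes):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codes.values())
    # After lexicographic sorting, any prefix pair must appear adjacently.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

codes = {"E": "0", "T": "11", "N": "100", "I": "1010", "S": "1011"}
print(is_prefix_free(codes))  # True: this table is a valid prefix code
```

The prefix property is what makes left-to-right decoding unambiguous: at any point, at most one codeword can match.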
Decoding

Symbol  Code
E       0
T       11
N       100
I       1010
S       1011

To determine the original message, read the string of bits from left to right and use the table to determine the individual symbols.
Decode the following:
Decoding

Symbol  Code
E       0
T       11
N       100
I       1010
S       1011

11 0 100 100 1010 1011  →  T E N N I S
Original string: TENNIS
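The left-to-right decoding procedure on this slide can be sketched as follows (the function name is illustrative):

```python
def huffman_decode(bits, codes):
    """Decode a bit string by matching codewords left to right.

    This works because the code is prefix-free: at any position,
    at most one codeword can match the bits read so far.
    """
    inverse = {w: sym for sym, w in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:          # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)

codes = {"E": "0", "T": "11", "N": "100", "I": "1010", "S": "1011"}
print(huffman_decode("11010010010101011", codes))  # TENNIS
```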
Representing a Huffman Table as a Binary Tree
o Codewords are represented by a binary tree
o Each leaf stores a character
o Each internal node has two children
o Left edge = 0
o Right edge = 1
o The codeword for a character is the path from the root to the leaf storing that character
o The codewords represented by the leaves of the tree form a prefix code
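One way to sketch this tree representation in code (the class and function names are my own; the tree below encodes the E/T/N/I/S table from the decoding slides):

```python
class Node:
    """A Huffman tree node: leaves carry a symbol, internal nodes do not."""
    def __init__(self, symbol=None, left=None, right=None):
        self.symbol, self.left, self.right = symbol, left, right

# Tree for the E/T/N/I/S table: left edge = 0, right edge = 1.
tree = Node(left=Node("E"),
            right=Node(left=Node(left=Node("N"),
                                 right=Node(left=Node("I"),
                                            right=Node("S"))),
                       right=Node("T")))

def decode_with_tree(bits, root):
    """Follow edges bit by bit; emit a symbol at each leaf, restart at root."""
    out, node = [], root
    for b in bits:
        node = node.left if b == "0" else node.right
        if node.symbol is not None:   # reached a leaf
            out.append(node.symbol)
            node = root
    return "".join(out)

print(decode_with_tree("110", tree))  # TE
```

Decoding with the tree and decoding with the table give the same result; the tree simply makes the "no codeword is a prefix of another" property visible as "symbols only at leaves".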
Constructing Huffman Codes
o Goal: construct a prefix code for Σ: associate each letter i with a codeword w_i to minimize the average codeword length:
L = Σ_i p_i · |w_i|
where p_i is the probability of letter i and |w_i| is the length of its codeword
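The objective can be computed directly. A small sketch (the probabilities below are illustrative values I chose, not from the slides; the codewords are the E/T/N/I/S table from earlier):

```python
def average_length(p, w):
    """Average codeword length: sum over symbols of p_i * |w_i|."""
    return sum(p[s] * len(w[s]) for s in p)

# Illustrative probabilities for the E/T/N/I/S code.
p = {"E": 0.4, "T": 0.25, "N": 0.15, "I": 0.1, "S": 0.1}
w = {"E": "0", "T": "11", "N": "100", "I": "1010", "S": "1011"}
print(average_length(p, w))  # 0.4*1 + 0.25*2 + 0.15*3 + 0.1*4 + 0.1*4 = 2.15
```

Note how the most probable symbol ("E") having the shortest codeword is exactly what drives this average down.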
Example

Letter  p_i   w_i
A
B
C       0.2   01
D       0.3   10
E       0.3   11

where p_i = probability of w_i
Algorithm
1. Make a leaf node for each symbol, and attach the symbol's generation probability to its leaf node
2. Take the two nodes with the smallest probability p_i and connect them to a new node, which becomes the parent of those nodes
   o Label the right edge 1
   o Label the left edge 0
   o The probability of the new node is the sum of the probabilities of the two connected nodes
3. If only one node is left, the code construction is complete. If not, go back to step 2.
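The steps above map naturally onto a min-heap. A minimal sketch using Python's heapq (a tie-break counter keeps heap entries comparable when probabilities are equal; the nested-tuple tree format is my own choice):

```python
import heapq
from itertools import count

def huffman_tree(probs):
    """Build a Huffman tree from {symbol: probability}.

    Returns nested pairs (left, right) with symbols at the leaves;
    by convention, left edge = 0 and right edge = 1.
    """
    tiebreak = count()  # prevents comparing subtrees on probability ties
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # two smallest probabilities
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    return heap[0][2]

print(huffman_tree({"A": 0.5, "B": 0.3, "C": 0.2}))  # ('A', ('C', 'B'))
```

Each heappop/heappush round is one iteration of step 2; the loop condition is exactly the termination test in step 3.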
Example

Symbol  Probability
A       0.387
B       0.194
C       0.161
D       0.129
E       0.129
Example – Creating the Tree
(figure: leaf nodes D, C, A, B, E)

Symbol  Probability
A       0.387
B       0.194
C       0.161
D       0.129
E       0.129
Example – Iterate Step 2
Take the two leaf nodes with the smallest probability p_i and connect them into a new node (which becomes the parent of those nodes)
o Green nodes – nodes to be evaluated
o White nodes – nodes which have already been evaluated
o Blue nodes – nodes added in this iteration
(tree figure: D, C, A, B, E)
Example – Iterate Step 2
(tree figure: D, C, A, B, E)
Note: when two nodes are connected by a parent, the parent should be evaluated in the next iteration
Example – Iterate Step 2
(tree figure: D, B, A, C, E)
Example: Completed Tree
(tree figure with leaves D, C, A, B, E)
Example: Table for the Huffman Code

Symbol  Code
A       0
B       111
C       110
D       100
E       101

Generate the table by reading from the root node to the leaf for each symbol.
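Reading the table off the tree amounts to a depth-first walk, appending 0 for each left edge and 1 for each right edge. A sketch (the nested-tuple tree literal is hand-built to match this slide's codes):

```python
def code_table(tree, prefix=""):
    """Walk the tree; a left edge appends '0', a right edge appends '1'."""
    if isinstance(tree, str):               # leaf: a single symbol
        return {tree: prefix}
    left, right = tree
    return {**code_table(left, prefix + "0"),
            **code_table(right, prefix + "1")}

# Nested-tuple tree matching the slide's codes:
# A = 0, D = 100, E = 101, C = 110, B = 111.
tree = ("A", (("D", "E"), ("C", "B")))
print(code_table(tree))
# {'A': '0', 'D': '100', 'E': '101', 'C': '110', 'B': '111'}
```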
Practice

Symbol  Probability  Huffman Code
A       0.45         ?
B       0.13         ?
C       0.12         ?
D       0.16         ?
E       0.09         ?
F       0.05         ?
Practice Solution
(Huffman tree figure: leaves A 0.45, B 0.13, C 0.12, D 0.16, E 0.09, F 0.05; read each code from the root to the leaf)
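The practice exercise can be checked mechanically. The sketch below runs the heap-based construction on the practice probabilities, tracking only the depth of each symbol; the exact bit patterns depend on which merged node becomes the left vs. right child, but the codeword lengths are forced:

```python
import heapq
from itertools import count

def codeword_lengths(probs):
    """Huffman construction that records only each symbol's depth."""
    tiebreak = count()  # keeps heap entries comparable on probability ties
    heap = [(p, next(tiebreak), {sym: 0}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)      # two smallest probabilities
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}  # one level deeper
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"A": 0.45, "B": 0.13, "C": 0.12, "D": 0.16, "E": 0.09, "F": 0.05}
print(codeword_lengths(probs))  # A: 1 bit; B, C, D: 3 bits; E, F: 4 bits
```

As expected, the most frequent symbol A gets a one-bit codeword and the rarest symbols E and F get the longest ones.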
Questions?