A Data Compression Algorithm: Huffman Compression


1 A Data Compression Algorithm: Huffman Compression
Gordon College

2 Compression
Definition: the process of encoding information using fewer bits than the original representation.
Reason: to save valuable resources such as communication bandwidth or hard disk space.
[Slide figure: a block of text shrinks under Compress and is restored by Uncompress.]

3 Compression Types
Lossy
- Loses some information during compression, so the exact original cannot be recovered (e.g., JPEG)
- Normally provides better compression
- Used when loss is acceptable: image, sound, and video files

4 Compression Types
Lossless
- The exact original can be recovered
- Usually exploits statistical redundancy
- Used when loss is not acceptable: data files
Basic term: compression ratio - the ratio of the number of bits in the original data to the number of bits in the compressed data. For example, 3:1 means the original file was 3000 bytes and the compressed file is only 1000 bytes.

5 Variable-Length Codes
Recall that ASCII, EBCDIC, and Unicode use the same-size data structure for all characters.
Morse code, by contrast, uses variable-length sequences.
Huffman compression is a variable-length encoding scheme.

6 Variable-Length Codes
Each character in such a code has a weight (probability) and a length.
The expected length is the sum of the products of the weights and lengths over all characters:
E = w1·l1 + w2·l2 + ... + wn·ln
In the slide's worked example this sum is 2.1.
Goal: minimize the expected length.
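The expected-length calculation can be checked with a short script. The weights and code lengths below are illustrative assumptions (the slide's own numbers did not survive transcription), chosen so the sum comes out to the slide's value of 2.1:

```python
# Illustrative weights and code lengths (assumed, not from the slide),
# chosen so the expected length matches the slide's value of 2.1.
weights = [0.2, 0.1, 0.1, 0.3, 0.3]   # probabilities, summing to 1
lengths = [2, 4, 4, 2, 1]             # code length of each character

# Expected length = sum over all characters of weight x length.
expected_length = sum(w * l for w, l in zip(weights, lengths))
print(round(expected_length, 2))  # 2.1
```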

7 Huffman Compression
- Uses prefix codes (a sequence of optimal binary codes)
- Uses a greedy algorithm: it makes each decision based only on the data at hand
- A popular and effective choice for data compression

8 Huffman Compression
Basic algorithm:
1. Generate a table that contains the frequency of each character in the text.
2. Using the frequency table, assign each character a "bit code" (a sequence of bits that represents the character).
3. Write the bit code to the file instead of the character.
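Steps 2 and 3 amount to a table lookup per character. The code table below is a made-up example, not one derived from real frequencies:

```python
# Hypothetical code table; a real one would come from the frequency table.
codes = {'a': '0', 'b': '10', 'c': '11'}

def huffman_encode(text, codes):
    """Replace each character with its bit code (step 3 of the algorithm)."""
    return "".join(codes[ch] for ch in text)

print(huffman_encode("abca", codes))  # 010110
```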

9 Immediate Decodability
Definition: no sequence of bits that represents a character is a prefix of a longer sequence for another character.
Purpose: a character can be decoded as soon as its last bit arrives, without waiting for the bits that follow.
[Slide figures: one coding scheme that is not immediately decodable and one that is.]
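The prefix condition can be checked directly. This is a sketch of our own (the function name and the example code sets are not from the slide):

```python
from itertools import combinations

def immediately_decodable(codes):
    """True if no code is a prefix of another, i.e. the set is a prefix code."""
    return not any(a.startswith(b) or b.startswith(a)
                   for a, b in combinations(codes, 2))

print(immediately_decodable(["0", "01", "10"]))          # False: "0" prefixes "01"
print(immediately_decodable(["0", "10", "110", "111"]))  # True
```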

10 Huffman Compression
Huffman (1951): uses the frequencies of symbols in a string to build a variable-rate prefix code.
- Each symbol is mapped to a binary string.
- More frequent symbols have shorter codes.
- No code is a prefix of another.
[Slide figure: an example of codes that are not Huffman codes.]

11 Huffman Codes
We seek codes that are:
- Immediately decodable
- Of minimal expected code length per character
For a set of n characters { C1 .. Cn } with weights { w1 .. wn }, we need an algorithm that generates the n bit strings representing the codes.

12 Cost of a Huffman Tree
Let p1, p2, ..., pm be the probabilities for the symbols a1, a2, ..., am, respectively. Define the cost of the Huffman tree T to be
HC(T) = p1·r1 + p2·r2 + ... + pm·rm
where ri is the length of the path from the root to ai. HC(T) is the expected length of the code of a symbol coded by the tree T; that is, HC(T) is the bit rate of the code.

13 Example of Cost
Example: a 1/2, b 1/8, c 1/8, d 1/4
HC(T) = 1 x 1/2 + 3 x 1/8 + 3 x 1/8 + 2 x 1/4 = 1.75
[Slide figure: the corresponding tree with leaves a, b, c, d.]

14 Huffman Tree
Input: probabilities p1, p2, ..., pm for symbols a1, a2, ..., am, respectively.
Output: a tree that minimizes the average number of bits (the bit rate) needed to code a symbol. That is, it minimizes
HC(T) = p1·r1 + p2·r2 + ... + pm·rm
where ri is the length of the path from the root to ai. Such a tree is called a Huffman tree, or Huffman code.

15 Recursive Algorithm - Huffman Codes
1. Initialize a list of n one-node binary trees, each containing the weight of one character.
2. Repeat the following n - 1 times:
   a. Find the two trees T' and T" in the list with minimal weights w' and w".
   b. Replace these two trees with a binary tree whose root has weight w' + w" and whose subtrees are T' and T", labeling the links to these subtrees 0 and 1.
(Minimal weights means these characters are the least common.)
[Slide figure: a new root with link 0 to subtree B (.1) and link 1 to subtree C (.1).]

16 Huffman's Algorithm
The code for character Ci is the bit string labeling the path in the final binary tree from the root to Ci.
[Slide figure: a set of characters and the resulting codes.]
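Reading the codes off the tree is a depth-first walk that accumulates the 0/1 labels along each root-to-leaf path. The toy tree below is our own example, not the slide's:

```python
# A tiny hand-built tree: internal nodes are (left, right) pairs, leaves
# are characters. This tree is a made-up example, not the slide's.
tree = (('a', ('b', 'c')), 'd')

def codes_from_tree(node, prefix=""):
    """Collect each leaf's code: the 0/1 labels on the path from the root."""
    if isinstance(node, str):                  # leaf: the path so far is its code
        return {node: prefix}
    left, right = node
    table = codes_from_tree(left, prefix + "0")
    table.update(codes_from_tree(right, prefix + "1"))
    return table

print(codes_from_tree(tree))  # {'a': '00', 'b': '010', 'c': '011', 'd': '1'}
```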

17 Huffman Decoding Algorithm
1. Initialize a pointer p to the root of the Huffman tree.
2. While the end of the message string has not been reached, repeat:
   a. Let x be the next bit in the string.
   b. If x = 0, set p to the left child pointer; else set p to the right child pointer.
   c. If p points to a leaf:
      i. Output the character at that leaf.
      ii. Reset p to the root of the Huffman tree.
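The decoding loop above translates almost directly into code. The toy tree here is again our own example, not the slide's:

```python
# Internal nodes are (left, right) pairs, leaves are characters.
tree = (('a', ('b', 'c')), 'd')   # made-up tree: a=00, b=010, c=011, d=1

def huffman_decode(bits, tree):
    """Walk the tree bit by bit, emitting a character at each leaf."""
    out = []
    p = tree                             # step 1: start at the root
    for x in bits:                       # step 2a: next bit of the message
        p = p[0] if x == "0" else p[1]   # step 2b: go left on 0, right on 1
        if isinstance(p, str):           # step 2c: reached a leaf
            out.append(p)                #   i. output its character
            p = tree                     #   ii. reset to the root
    return "".join(out)

print(huffman_decode("010011001", tree))  # bcad
```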

18 Huffman Decoding Algorithm
Exercise: for a given message string, decode it using the Huffman tree and the decoding algorithm.

19 Iterative Huffman Tree Algorithm
Form a node for each symbol ai with weight pi;
insert the nodes in a min priority queue ordered by probability;
while the priority queue has more than one element do
    min1 := delete-min;
    min2 := delete-min;
    create a new node n;
    n.weight := min1.weight + min2.weight;
    n.left := min1;   (also associate this link with bit 0)
    n.right := min2;  (also associate this link with bit 1)
    insert(n);
return the last node in the priority queue.
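The pseudocode above maps naturally onto Python's heapq module as the min priority queue. The counter is a tie-breaker we add so the heap never has to compare tree nodes; with equal weights, a different tie-breaking order can produce a different (but equally optimal) tree:

```python
import heapq
from itertools import count

def build_huffman_tree(freqs):
    """Iteratively merge the two minimum-weight nodes until one tree remains.
    Internal nodes are (left, right) pairs; leaves are the symbols."""
    tick = count()   # tie-breaker: keeps heap comparisons away from the nodes
    heap = [(w, next(tick), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, min1 = heapq.heappop(heap)        # delete-min
        w2, _, min2 = heapq.heappop(heap)        # delete-min
        # new node: weight w1 + w2, left child carries bit 0, right bit 1
        heapq.heappush(heap, (w1 + w2, next(tick), (min1, min2)))
    return heap[0][2]

# Slide 20's probabilities:
tree = build_huffman_tree({'a': .4, 'b': .1, 'c': .3, 'd': .1, 'e': .1})
```

Any tree this produces for slide 20's probabilities has the optimal expected length of 2.1 bits per symbol, even though the exact shape depends on tie-breaking.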

20 Example of Huffman Tree Algorithm (1)
P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1

21 Example of Huffman Tree Algorithm (2)

22 Example of Huffman Tree Algorithm (3)

23 Example of Huffman Tree Algorithm (4)

24 Huffman Code

25 In-class example: "I will praise you and I will love you Lord"
Index  Sym    Freq
0      space  9
1      I      2
2      L      1
3      a      2
4      d      2
5      e      2
6      i      3
7      l      5
8      n      1
9      o      4
10     p      1
11     r      2
12     s      1
13     u      2
14     v      1
15     w      2
16     y      2
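The frequency column can be reproduced with collections.Counter:

```python
from collections import Counter

message = "I will praise you and I will love you Lord"
freq = Counter(message)

# The most frequent symbols will receive the shortest Huffman codes.
print(freq.most_common(4))  # [(' ', 9), ('l', 5), ('o', 4), ('i', 3)]
print(sum(freq.values()))   # 42 characters in total
```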

26 In-class example: "I will praise you and I will love you Lord"
Index  Sym  Freq  Parent  Left  Right  NBits  Bits
(Same symbols and frequencies as the previous slide; the Parent, Left, Right, NBits, and Bits columns are filled in as the Huffman tree is constructed.)

