1 HUFFMAN CODES

2 Greedy Algorithm Design
Steps of greedy algorithm design:
1. Formulate the optimization problem.
2. Show that the greedy choice can lead to an optimal solution, so that the greedy choice is always safe (the greedy-choice property).
3. Demonstrate that an optimal solution to the original problem = greedy choice + an optimal solution to the subproblem (the optimal substructure property).
Together, these two properties are a good clue that a greedy strategy will solve the problem.

3 Encoding and Compression of Data
Definition: reducing the size of data (the number of bits needed to represent it).
Benefits: less storage needed; lower transmission cost, latency, and bandwidth.
Applicable to many forms of data transmission. Our example: text files.

4 Huffman Codes
Proposed by David A. Huffman in 1952 in "A Method for the Construction of Minimum-Redundancy Codes." Used for compressing data (sequences of characters); widely used and very efficient (savings of 20-90%). The method uses a table of the characters' occurrence frequencies and outputs a binary string.

5 Huffman Coding
Huffman codes can be used to compress information, as WinZip does (although WinZip doesn't use the Huffman algorithm itself; JPEG does use Huffman coding as part of its compression process). Not all characters occur with the same frequency! The basic idea: instead of storing each character in a file as an 8-bit ASCII value, store the more frequently occurring characters using fewer bits and the less frequently occurring characters using more bits. On average this should decrease the file size (usually by about half).

6 Computer Data Encoding
How do we represent data in binary? Historical solution: fixed-length codes. Encode every symbol as a unique binary string of a fixed length. Example: ASCII (an 8-bit code).

7 Total Space Usage in Bits: Fixed-Length Codes
Assume an ℓ-bit fixed-length code. A file of n characters needs nℓ bits. Example: suppose an alphabet of 3 symbols {A, B, C} and a file of 1,000,000 characters. A fixed-length code needs 2 bits per symbol (the smallest ℓ with 2^ℓ ≥ 3), for a total of 2,000,000 bits.

8 Variable-Length Codes
Idea: in order to save space, use fewer bits for frequent characters and more bits for rare characters.

9 Huffman Codes
3 bits are enough to represent 6 characters with a fixed-length code. A variable-length code gives frequent characters short codewords and infrequent characters long codewords.
e.g. "abc" = "000 001 010" (fixed-length) vs. "abc" = "0 101 100" (variable-length)
Example: a file of 100,000 characters containing only 'a' to 'f':

Char  Frequency (thousands)  Fixed-length codeword  Variable-length codeword
'a'   45                     000                     0
'b'   13                     001                     101
'c'   12                     010                     100
'd'   16                     011                     111
'e'   9                      100                     1101
'f'   5                      101                     1100

Fixed-length encoding: 100,000 * 3 = 300,000 bits.
Variable-length encoding: (45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4) * 1,000 = 224,000 bits.
Saving approximately 25%.

10 Prefix Codes
In a variable-length code we can use an idea called a prefix code, in which no codeword is also a prefix of some other codeword. A prefix code can always achieve the optimal data compression among character codes, and prefix codes are desirable because they simplify decoding.

11 How do we decode? In a fixed-length code, we know where every character starts, since they all have the same number of bits. Example: A = 000, B = 101, C = 100. The bit string 000 000 000 101 101 100 splits into groups of three and decodes to A A A B B C.

12 How do we decode? In a variable-length code, we use an idea called a prefix code, where no code is a prefix of another. Example: A = 0, B = 10, C = 11. None of these codes is a prefix of another.

13 How do we decode? Example: A = 0, B = 10, C = 11. So, for the string A A A B B C, the encoding is 0 0 0 10 10 11 = 000101011.

14 Prefix Code
Example: A = 0, B = 10, C = 11. Decode the string 000101011:
0 | 0 | 0 | 10 | 10 | 11 → A A A B B C
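A minimal sketch in Python of how a prefix code is decoded without separators, using the three-symbol code above (the names are illustrative):

# Codeword -> character. Valid because no codeword is a prefix of another,
# so the first complete match is always the right one.
CODES = {"0": "A", "10": "B", "11": "C"}

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b                   # accumulate bits until they form a codeword
        if buf in CODES:
            out.append(CODES[buf])
            buf = ""
    return " ".join(out)

print(decode("000101011"))         # A A A B B C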

15 Idea
Consider a binary tree, with 0 meaning a left turn and 1 meaning a right turn. Example: consider the paths from the root to each of the leaves A, B, C, D:
A : 0
B : 10
C : 110
D : 111
(In the tree, A hangs off the root's left edge; each right turn descends one level, leaving B, C, and D at depths 2, 3, and 3.)

16 Observe: This is a prefix code, since each of the leaves has a path ending at it, with no continuation. If the tree is full, then we are not "wasting" bits. If we make sure that the more frequent symbols are closer to the root, then they will have shorter codewords.

17 Huffman Codes
A file of 100,000 characters. The coding schemes can be represented by trees: each leaf is labeled with a character and its frequency of occurrence, and each internal node is labeled with the sum of the frequencies of the leaves in its subtree.

Char  Frequency (thousands)  Fixed-length codeword  Variable-length codeword
'a'   45                     000                     0
'b'   13                     001                     101
'c'   12                     010                     100
'd'   16                     011                     111
'e'   9                      100                     1101
'f'   5                      101                     1100

The fixed-length code's tree is not a full binary tree. The variable-length code's tree is a full binary tree: every nonleaf node has 2 children (root 100 splits into a:45 and 55; 55 splits into 25 = c:12 + b:13 and 30 = 14 + d:16; 14 = f:5 + e:9).

18 Huffman Codes - Greedy Algorithm
To find an optimal code for a file:
1. The coding must be unambiguous. Consider codes in which no codeword is also a prefix of another codeword (prefix codes). Prefix codes are unambiguous: once the codewords are decided, it is easy to compress (encode) and decompress (decode).
2. The file size must be smallest. Such a code can be represented by a full binary tree, with the less frequent characters usually near the bottom.

Let C be the alphabet (e.g. C = {'a','b','c','d','e','f'}). For each character c, the number of bits to encode all of c's occurrences is freq(c) * depth(c), where depth(c) is c's depth in the tree. The cost of the tree T (the number of bits to encode the file) is

B(T) = Σ (c in C) freq(c) * depth(c)

e.g. with the variable-length code above, "abc" is coded as "0 101 100".
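As a quick sanity check of the cost formula, this Python sketch computes B(T) for the variable-length code above (the variable names are illustrative):

# Frequencies (in thousands) and tree depths from the example above.
freq  = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
depth = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}

bits = sum(freq[c] * depth[c] for c in freq) * 1000  # B(T), scaled to characters
print(bits)                                          # 224000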

19 Huffman Codes - Greedy Algorithm
1. Consider all pairs: <frequency, symbol>. 2. Choose the two lowest frequencies and make them siblings, with a new parent node having the combined frequency. 3. Iterate.

20 Huffman Codes
How do we find the optimal prefix code? Huffman's code (1952) was invented to solve it: a greedy approach built around Q, a min-priority queue keyed on frequency. Repeatedly extracting and merging the two lowest-frequency nodes gives this sequence of queue states:
f:5  e:9  c:12  b:13  d:16  a:45
c:12  b:13  14(f,e)  d:16  a:45
14(f,e)  d:16  25(c,b)  a:45
25(c,b)  30(14,d)  a:45
a:45  55(25,30)
100(a,55)

21 Huffman Codes
Q: a min-priority queue.

HUFFMAN(C)
1  Build Q from C
2  For i = 1 to |C|-1
3      Allocate a new node z
4      z.left = x = EXTRACT_MIN(Q)
5      z.right = y = EXTRACT_MIN(Q)
6      z.freq = x.freq + y.freq
7      Insert z into Q in correct position
8  Return EXTRACT_MIN(Q)

If Q is implemented as a binary min-heap, "Build Q from C" is O(n), the for loop runs O(n) times, EXTRACT_MIN(Q) is O(lg n), and "Insert z into Q" is O(lg n). So HUFFMAN(C) is O(n lg n).
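A minimal runnable sketch of this pseudocode in Python, using heapq as the min-priority queue (the Node class and function names are illustrative, not part of the original pseudocode):

import heapq

class Node:
    # Leaves carry a character; internal nodes carry only a frequency.
    def __init__(self, freq, char=None, left=None, right=None):
        self.freq, self.char = freq, char
        self.left, self.right = left, right
    def __lt__(self, other):            # heapq orders nodes by frequency
        return self.freq < other.freq

def huffman(freqs):
    # freqs: dict of character -> frequency; returns the root of the tree.
    q = [Node(f, c) for c, f in freqs.items()]
    heapq.heapify(q)                    # "Build Q from C": O(n)
    for _ in range(len(freqs) - 1):     # |C| - 1 merges
        x = heapq.heappop(q)            # EXTRACT_MIN: O(lg n)
        y = heapq.heappop(q)
        heapq.heappush(q, Node(x.freq + y.freq, left=x, right=y))
    return q[0]

root = huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5})
print(root.freq)                        # 100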

22 Example

23 Building a Tree: Scan the Original Text
(Slides 23-57 are adapted from Mike Scott, "An Introduction to Huffman Coding," March 21, 2000.)
Consider the following short text: "Eerie eyes seen near lake." Count up the occurrences of all characters in the text.

24 Building a Tree: Scan the Original Text
"Eerie eyes seen near lake." What characters are present?
E  e  r  i  space  y  s  n  a  l  k  .

25 Building a Tree: Scan the Original Text
What is the frequency of each character in the text?

Char   Freq.    Char   Freq.    Char    Freq.
E      1        y      1        k       1
e      8        s      2        l       1
r      2        n      2        space   4
i      1        a      2        .       1
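In Python the counting step is a one-liner; a minimal sketch using collections.Counter:

from collections import Counter

print(Counter("Eerie eyes seen near lake."))
# Counter({'e': 8, ' ': 4, 'r': 2, 's': 2, 'n': 2, 'a': 2,
#          'E': 1, 'i': 1, 'y': 1, 'l': 1, 'k': 1, '.': 1})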

26 Building a Tree: Prioritize Characters
Create binary tree nodes with the character and frequency of each character. Place the nodes in a priority queue: the lower the occurrence count, the higher the priority in the queue.

27 Building a Tree
The queue after inserting all nodes (lowest frequency = highest priority):
E:1  i:1  y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  sp:4  e:8

28-50 Building a Tree
The original slides animate the merging step by step: repeatedly remove the two lowest-frequency nodes, make them children of a new node whose frequency is their sum, and insert the new node back into the queue.
1.  E:1 + i:1 → 2          queue: y:1 l:1 k:1 .:1 r:2 s:2 n:2 a:2 2 sp:4 e:8
2.  y:1 + l:1 → 2          queue: k:1 .:1 r:2 s:2 n:2 a:2 2 2 sp:4 e:8
3.  k:1 + .:1 → 2          queue: r:2 s:2 n:2 a:2 2 2 2 sp:4 e:8
4.  r:2 + s:2 → 4          queue: n:2 a:2 2 2 2 sp:4 4 e:8
5.  n:2 + a:2 → 4          queue: 2 2 2 sp:4 4 4 e:8
6.  2(E,i) + 2(y,l) → 4    queue: 2(k,.) sp:4 4 4 4 e:8
7.  2(k,.) + sp:4 → 6      queue: 4 4 4 6 e:8
    What is happening to the characters with a low number of occurrences? They are sinking to the bottom of the tree.
8.  4(r,s) + 4(n,a) → 8    queue: 4(E,i,y,l) 6 8 e:8
9.  4(E,i,y,l) + 6 → 10    queue: 8 e:8 10
10. e:8 + 8(r,s,n,a) → 16  queue: 10 16
11. 10 + 16 → 26           a single tree remains

51 Building a Tree
This tree contains the new code words for each character. The frequency of the root node should equal the number of characters in the text: "Eerie eyes seen near lake." has 26 characters, and the root frequency is 26.

52 Encoding the File: Traverse Tree for Codes
Perform a traversal of the tree to obtain the new code words. Going left is a 0; going right is a 1. A code word is complete only when a leaf node is reached.

53 Encoding the File: Traverse Tree for Codes
Char    Code      Char    Code
E       0000      space   011
i       0001      e       10
y       0010      r       1100
l       0011      s       1101
k       0100      n       1110
.       0101      a       1111
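A minimal sketch of this traversal in Python, assuming the Node class from the earlier Huffman sketch:

def codewords(node, prefix="", table=None):
    # Depth-first walk: a left edge appends '0', a right edge appends '1'.
    # A code word is complete only when a leaf is reached.
    if table is None:
        table = {}
    if node.char is not None:
        table[node.char] = prefix       # leaf: record the finished code word
    else:
        codewords(node.left, prefix + "0", table)
        codewords(node.right, prefix + "1", table)
    return table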

54 Encoding the File
Rescan the text and encode the file using the new code words: "Eerie eyes seen near lake."
Why is there no need for a separator character? Because this is a prefix code: no code word is a prefix of another, so the end of each code word is unambiguous.
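The encoding pass is then one table lookup per character; a minimal sketch assuming the codewords table built above:

def encode(text, table):
    # No separator is needed: since no code word is a prefix of another,
    # the code word boundaries are unambiguous.
    return "".join(table[c] for c in text)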

55 Encoding the File: Results
Have we made things any better? The new encoding takes 84 bits (the six frequency-1 characters at depth 4 contribute 24 bits; r, s, n, a contribute 32; space contributes 12; e contributes 16), while ASCII would take 8 * 26 = 208 bits.

56 Decoding the File
How does the receiver know what the codes are? The tree is constructed for each text file, reflecting that file's frequencies, so the receiver must also be given the tree. Data transmission is bit-based rather than byte-based.

57 Decoding the File
Once the receiver has the tree, it scans the incoming bit stream: 0 → go left, 1 → go right; output a character and restart at the root each time a leaf is reached.
Quiz: what does the encoded bit stream decode to?
A. elk nay sir   B. eel snake   C. eel kin sly   D. eek snarl nil   E. eel snarl.
(Walking the tree yields e e l space s n a r l: answer E, "eel snarl.")
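A minimal sketch of this bit-by-bit tree walk in Python, again assuming the Node class from the earlier sketch:

def decode_bits(root, bits):
    out, node = [], root
    for b in bits:
        node = node.left if b == "0" else node.right   # 0 = left, 1 = right
        if node.char is not None:       # reached a leaf: emit and restart
            out.append(node.char)
            node = root
    return "".join(out)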

58 Summary
Huffman coding is a technique used to compress files for transmission. It uses statistical coding: more frequently used symbols have shorter code words. It works well for text and fax transmission, and it is an application that uses several data structures.

