
Huffman Coding.


1 Huffman Coding

2 A simple example Suppose we have a message of 10 symbols drawn from 5 distinct symbols, e.g. [►♣♣♠☻►♣☼►☻]. How can we code this message using 0/1 so that the coded message has minimum length (for transmission or storage)? 5 distinct symbols → at least 3 bits per symbol with a fixed-length code. For this simple encoding, the length of the coded message is 10*3 = 30 bits.

3 A simple example (cont.) Intuition: symbols that are more frequent should have shorter codes; yet since the code lengths then differ, there must be a way of telling where each codeword ends. For a Huffman code, the length of the encoded message ►♣♣♠☻►♣☼►☻ is 3*2 + 3*2 + 2*2 + 1*3 + 1*3 = 22 bits.
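
As a quick check of this sum, the sketch below counts each symbol and multiplies by an assumed codeword length. The `lengths` table is an assumption: it is one valid Huffman length assignment for these counts (2 bits for the three most frequent symbols, 3 bits for the two rare ones), matching the factors in the slide's sum. The products add up to 22 bits.

```python
from collections import Counter

message = "►♣♣♠☻►♣☼►☻"
counts = Counter(message)          # ►:3, ♣:3, ☻:2, ♠:1, ☼:1

# Assumed codeword lengths (one valid Huffman assignment for these counts).
lengths = {"►": 2, "♣": 2, "☻": 2, "♠": 3, "☼": 3}

# Total message length = sum over symbols of (occurrences * codeword length).
total_bits = sum(counts[s] * lengths[s] for s in counts)
print(total_bits)  # 22
```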

4 Another Example
A = 0, B = 100, C = 1010, D = 1011, R = 11
ABRACADABRA = 01001101010010110100110
This is eleven letters in 23 bits. A fixed-width encoding would require 3 bits for each of five different letters, or 33 bits for 11 letters. Notice that the encoded bit string can be decoded unambiguously!
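
A minimal sketch of encoding and decoding with this table follows. The function names `encode` and `decode` are illustrative; the decoder relies on the fact that no codeword in the table is a prefix of another.

```python
CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def encode(text, code):
    """Concatenate the codeword of each character."""
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    """Read bits left to right; emit a character whenever a codeword completes."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:      # safe because the code is prefix-free
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

bits = encode("ABRACADABRA", CODE)
print(bits, len(bits))              # 23 bits in total
print(decode(bits, CODE))           # ABRACADABRA
```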

5 Huffman codes
Binary character code: each character is represented by a unique binary string. A data file (here, per 100 characters) can be coded in two ways:

                        a     b     c     d     e     f
frequency (%)           45    13    12    16    9     5
fixed-length code       000   001   010   011   100   101
variable-length code    0     101   100   111   1101  1100

The first way needs 100×3 = 300 bits. The second way needs 45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224 bits. 2018/11/22
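
The two totals can be recomputed directly. In this sketch the variable-length codewords for a, b and c are taken to be 0, 101 and 100, which is an assumption consistent with the lengths 1, 3, 3 used in the slide's sum; the weighted sum comes to 224 bits.

```python
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

fixed = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100", "f": "101"}
variable = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

def cost(freq, code):
    """Bits needed: frequency of each character times its codeword length."""
    return sum(freq[c] * len(code[c]) for c in freq)

print(cost(freq, fixed))     # 300
print(cost(freq, variable))  # 224
```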

6 Variable-length code
Variable-length codes must be read with care. (Codewords: a=0, b=00, c=01, d=11.) Where to cut? 00 can be read as either aa or b. The prefixes of 0011 are 0, 00, 001, and 0011. Prefix codes: no codeword is a prefix of any other codeword (prefix-free). Prefix codes are simple to encode and decode.
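
The prefix-free property is easy to test mechanically. The sketch below (the helper name `is_prefix_free` is illustrative) sorts the codewords: if one codeword is a prefix of another, it is also a prefix of the word that immediately follows it in sorted order, so checking neighbours is enough.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codewords)
    # In lexicographic order a prefix is immediately followed by a word
    # that starts with it, so only adjacent pairs need checking.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

ambiguous = ["0", "00", "01", "11"]                    # a, b, c, d from this slide
prefix_code = ["0", "101", "100", "111", "1101", "1100"]

print(is_prefix_free(ambiguous))     # False: 0 is a prefix of 00 and 01
print(is_prefix_free(prefix_code))   # True
```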

7 Using the codewords in the table to encode and decode
Encode: abc = 0·101·100 = 0101100 (just concatenate the codewords). Decode: 001011101 = 0·0·101·1101 = aabe.

                        a     b     c     d     e     f
frequency (%)           45    13    12    16    9     5
fixed-length code       000   001   010   011   100   101
variable-length code    0     101   100   111   1101  1100

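8 Encode: abc = 0·101·100 = 0101100 (just concatenate the codewords). Decode: 001011101 = 0·0·101·1101 = aabe (use the right-hand binary tree below).
[Figure: two binary trees over the leaves a:45, b:13, c:12, d:16, e:9, f:5, with edges labelled 0 and 1. Left: tree for the fixed-length codewords, with internal nodes 100, 86, 14, 58, 28, 14. Right: tree for the variable-length codewords, with internal nodes 100, 55, 25, 30, 14.]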
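
Decoding with the tree can be sketched as a walk from the root. Everything about the representation here is an assumption made for illustration: the tree is written as nested tuples `(left, right)` with characters at the leaves, 0 means "go left" and 1 means "go right", and the input bit string 001011101 is chosen because it decodes to the slide's aabe.

```python
# Variable-length code tree: a | ((c, b), ((f, e), d))
TREE = ("a", (("c", "b"), (("f", "e"), "d")))

def decode_with_tree(bits, tree):
    """Follow 0 = left, 1 = right; emit a character at each leaf and restart."""
    out, node = [], tree
    for b in bits:
        node = node[int(b)]
        if isinstance(node, str):   # reached a leaf
            out.append(node)
            node = tree
    if node is not tree:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)

print(decode_with_tree("001011101", TREE))  # aabe
print(decode_with_tree("0101100", TREE))    # abc
```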

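9 Binary tree
Every nonleaf node has two children. Why? If a nonleaf node had only one child, that node could be removed and every codeword below it would become one bit shorter. The fixed-length code in our example is not optimal. The total number of bits required to encode a file is

$$B(T) = \sum_{c \in C} f(c)\, d_T(c)$$

where f(c) is the frequency (number of occurrences) of c in the file and d_T(c) denotes the depth of c's leaf in the tree T.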
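
The cost of a tree, frequency times leaf depth summed over all characters, can be evaluated recursively. This sketch assumes the same nested-tuple representation of the variable-length tree used for illustration only (internal node = `(left, right)`, leaf = character).

```python
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
TREE = ("a", (("c", "b"), (("f", "e"), "d")))

def tree_cost(tree, freq, depth=0):
    """Sum of f(c) * depth(c) over all leaves c of the tree."""
    if isinstance(tree, str):            # leaf: contributes f(c) * d_T(c)
        return freq[tree] * depth
    left, right = tree
    return tree_cost(left, freq, depth + 1) + tree_cost(right, freq, depth + 1)

print(tree_cost(TREE, freq))  # 224
```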

10 Constructing an optimal coding scheme
Formal definition of the problem: Input: a set of characters C = {c1, c2, …, cn}, where each c ∈ C has frequency f[c]. Output: a binary tree representing codewords such that the total number of bits required for the file is minimized. Huffman proposed a greedy algorithm to solve the problem.

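11 [Figure, steps (a)-(b): (a) the initial queue in increasing order of frequency: f:5, e:9, c:12, b:13, d:16, a:45. (b) f:5 and e:9 are merged under a new node of frequency 14; the queue is now c:12, b:13, 14, d:16, a:45.]

12 [Figure, steps (c)-(d): (c) c:12 and b:13 are merged under a new node of frequency 25; the queue is now 14, d:16, 25, a:45. (d) 14 and d:16 are merged under a new node of frequency 30; the queue is now 25, 30, a:45.]

13 [Figure, steps (e)-(f): (e) 25 and 30 are merged under a new node of frequency 55; the queue is now a:45, 55. (f) a:45 and 55 are merged under the root of frequency 100. Left edges are labelled 0 and right edges 1.]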

14 HUFFMAN(C)
1  n := |C|
2  Q := C
3  for i := 1 to n-1 do
4      z := ALLOCATE_NODE()
5      x := left[z] := EXTRACT_MIN(Q)
6      y := right[z] := EXTRACT_MIN(Q)
7      f[z] := f[x] + f[y]
8      INSERT(Q, z)
9  return EXTRACT_MIN(Q)
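
The pseudocode can be sketched in Python with the standard `heapq` module as the priority queue. This is one possible rendering, not the course's implementation: the nested-tuple tree, the tie-breaking counter, and the helper `codewords` are assumptions added for illustration, and the input is assumed to be non-empty.

```python
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman tree; leaves are symbols, internal nodes are (left, right)."""
    tie = count()   # unique sequence number so heapq never has to compare trees
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)                               # line 2: Q := C
    for _ in range(len(freq) - 1):                    # line 3
        fx, _tx, x = heapq.heappop(heap)              # line 5: x = EXTRACT_MIN(Q)
        fy, _ty, y = heapq.heappop(heap)              # line 6: y = EXTRACT_MIN(Q)
        heapq.heappush(heap, (fx + fy, next(tie), (x, y)))  # lines 7-8
    return heap[0][2]                                 # line 9: the root

def codewords(tree, prefix=""):
    """Read codewords off the tree: 0 for a left edge, 1 for a right edge."""
    if not isinstance(tree, tuple):
        return {tree: prefix or "0"}   # single-symbol input still gets one bit
    left, right = tree
    return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = codewords(huffman(freq))
print(code)
```

For the six-character example this yields a=0, b=101, c=100, d=111, e=1101, f=1100.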

15 The Huffman Algorithm This algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. C is a set of n characters, and each character c in C has a defined frequency f[c]. Q is a min-priority queue, keyed on f, used to identify the two least-frequent objects to merge together. The result of a merger is a new object (internal node) whose frequency is the sum of the frequencies of the two merged objects.

16 Time complexity
Lines 4-8 are executed n-1 times. Each heap operation in Lines 4-8 takes O(lg n) time. The total time required is O(n lg n). Note: The details of the heap operations will not be tested. The time complexity O(n lg n) should be remembered.

17 A Complete Example
An Introduction to Huffman Coding, March 21, 2000, Mike Scott
Scan the original text: Eerie eyes seen near lake. What characters are present? E e r i space y s n a l k .

18 Building a Tree
Scan the original text: Eerie eyes seen near lake. What is the frequency of each character in the text?

Char   Freq.    Char   Freq.    Char   Freq.
E      1        y      1        k      1
e      8        s      2        .      1
r      2        n      2
i      1        a      2
space  4        l      1
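
The frequency scan can be reproduced with `collections.Counter`. This is a small sketch for checking the counts, not part of the original slides.

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)               # case-sensitive: 'E' and 'e' are different

for ch, n in sorted(freq.items(), key=lambda item: item[1]):
    label = "space" if ch == " " else ch
    print(label, n)

print(len(freq), sum(freq.values()))  # 12 distinct characters, 26 in total
```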

19 Building a Tree
The array after inserting all nodes: E:1 i:1 y:1 l:1 k:1 .:1 r:2 s:2 n:2 a:2 sp:4 e:8

20-42 Building a Tree
[Figures: the priority queue before and after each merge. At each step the two lowest-frequency nodes are dequeued, joined under a new node whose frequency is their sum, and the new node is enqueued.]
Slide 20: E:1 i:1 y:1 l:1 k:1 .:1 r:2 s:2 n:2 a:2 sp:4 e:8
Slides 21-22: E:1 + i:1 → 2
Slides 23-24: y:1 + l:1 → 2
Slides 25-26: k:1 + .:1 → 2
Slides 27-28: r:2 + s:2 → 4
Slides 29-30: n:2 + a:2 → 4
Slides 31-32: 2 (E, i) + 2 (y, l) → 4
Slides 33-34: 2 (k, .) + sp:4 → 6. What is happening to the characters with a low number of occurrences?
Slides 35-36: 4 (r, s) + 4 (n, a) → 8
Slides 37-38: 4 (E, i, y, l) + 6 → 10
Slides 39-40: e:8 + 8 → 16
Slides 41-42: 10 + 16 → 26. After enqueueing this node there is only one node left in the priority queue.
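
The length of the encoded text for this example can be checked without building the tree: each merge pushes every leaf beneath it one level deeper, so the total number of bits equals the sum of the frequencies of all merged nodes. The sketch below (the function name `huffman_cost` is illustrative) computes that sum and compares it with a fixed-length code, which would need 4 bits for each of the 12 distinct characters.

```python
import heapq
from collections import Counter

def huffman_cost(weights):
    """Total encoded length in bits = sum of the weights of all merged nodes."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged                 # every leaf under this node gains one bit
        heapq.heappush(heap, merged)
    return total

text = "Eerie eyes seen near lake."
freq = Counter(text)

huffman_bits = huffman_cost(freq.values())
fixed_bits = len(text) * max(1, (len(freq) - 1).bit_length())
print(huffman_bits, fixed_bits)  # 84 104
```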

43-76 Using heap
CS3335 Design and Analysis of Algorithms/WANG Lusheng
[Figures: the heap-based construction for a:45, b:13, c:12, d:16, e:9, f:5, one frame per heap operation. Each node record holds its parent (P), left child (L), right child (R), name, and key.]
Merges, in order: f:5 + e:9 → g:14; c:12 + b:13 → h:25; g:14 + d:16 → i:30; h:25 + i:30 → j:55; a:45 + j:55 → k:100.
Final node records (slides 75-76):

node   parent   left   right   key
k      -        a      j       100
a      k        -      -       45
j      k        h      i       55
h      j        c      b       25
i      j        g      d       30
c      h        -      -       12
b      h        -      -       13
g      i        f      e       14
d      i        -      -       16
f      g        -      -       5
e      g        -      -       9
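
The node records in these frames (parent, left, right, name, key) can be mirrored with a small class. The following is a sketch in Python rather than the course's Java MyHeap; the class name `Node`, the function `build`, and the use of `heapq` in place of a hand-written heap are all assumptions made for illustration. Internal nodes are labelled g, h, i, ... in creation order.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional

@dataclass(order=True)
class Node:
    key: int                                            # frequency; the only field compared
    letter: str = field(default="", compare=False)
    parent: Optional["Node"] = field(default=None, compare=False, repr=False)
    left: Optional["Node"] = field(default=None, compare=False, repr=False)
    right: Optional["Node"] = field(default=None, compare=False, repr=False)

def build(freq):
    """Build the Huffman tree and return its root, linking parent pointers."""
    heap = [Node(k, ch) for ch, k in freq.items()]
    heapq.heapify(heap)
    names = iter("ghijklmnopqrstuvwxyz")                # hypothetical labels for new nodes
    while len(heap) > 1:
        x = heapq.heappop(heap)                         # smallest key becomes the left child
        y = heapq.heappop(heap)                         # next smallest becomes the right child
        z = Node(x.key + y.key, next(names), left=x, right=y)
        x.parent = y.parent = z
        heapq.heappush(heap, z)
    return heap[0]

root = build({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
print(root.letter, root.key, root.left.letter, root.right.letter)  # k 100 a j
```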

77 Exercise Modify MyHeap.java in Tutorial 6's folder so that the class ArrayNode has five data fields: int key; char letter; ArrayNode parent; ArrayNode left; ArrayNode right; and use the modified MyHeap to construct a Huffman code tree. The program should read n pairs (ai, bi) from the keyboard, where ai is the number of times that character/letter bi appears, and construct the Huffman code tree for the n pairs.

