Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan Room: C3-222, ext: 1204, Lecture 4 (week 2)

Similar presentations


Presentation on theme: "1 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan Room: C3-222, ext: 1204, Lecture 4 (week 2)"— Presentation transcript:

1 1 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan Room: C3-222, ext: 1204, Lecture 4 (week 2)

2 2 ● Proposed by Dr. David A. Huffman in 1952 ➢ “A Method for the Construction of Minimum Redundancy Codes” ● Applicable to many forms of data transmission ➢ Mostly used example: text files Huffman Coding

3 3 Huffman Coding Algorithm: – The procedure to build Huffman codes – Extended Huffman Codes (new) Adaptive Huffman Coding (new) – Update Procedure – Decoding Procedure Huffman Coding: What We Will Discuss

4 4 Shannon-Fano Coding ● The second code based on Shannon’s theory ✗ It is a suboptimal code (it took a graduate student (Huffman) to fix it!) ● Algorithm: ➢ Start with empty codes ➢ Compute frequency statistics for all symbols ➢ Order the symbols in the set by frequency ➢ Split the set (almost half-half!) to minimize*difference ➢ Add ‘0’ to the codes in the first set and ‘1’ to the rest ➢ Recursively assign the rest of the code bits for the two subsets, until sets cannot be split.

5 5 Shannon-Fano Coding ● Example: ● Assume a sequence: A={a,b,c,d,e,f} with the following occurrence weights, {9, 8, 6, 5, 4, 2}, respectively ● Apply Shannon-Fano Coding and discuss the sub- optimality

6 6 Shannon-Fano Coding

7 7

8 8

9 9

10 10 Shannon-Fano Coding

11 11 Shannon-Fano Coding ef

12 12 Shannon-Fano Coding ● Shannon-Fano does not always produce optimal prefix codes; the ordering is performed only once at the beginning!! ● Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest expected code word length, under the constraints that each symbol is represented by a code formed of an integral number of bits ● Sometimes prefix-free codes are called, Huffman codes ● Symbol-by-symbol Huffman coding (the easiest one) is only optimal if the probabilities of these symbols are independent and are some power of a half, i.e. (½) n

13 13 ● Proposed by Dr. David A. Huffman in 1952 ➢ “A Method for the Construction of Minimum Redundancy Codes” ● Applicable to many forms of data transmission ➢ Our example: text files ● In general, Huffman coding is a form of statistical coding as not all characters occur with the same frequency! Huffman Coding

14 14 ● Why Huffman coding (likewise all entropy coding): – Code word lengths are no longer fixed like ASCII. – Code word lengths vary and will be shorter for the more frequently used characters, i.e., overall shorter average code length! Huffman Coding: The Same Idea

15 15 1. Scan the text to be compressed and compute the occurrence of all characters. 2. Sort or prioritize characters based on number of occurrences in text (from low-to-high). 3. Build Huffman code tree based on prioritized list. 4. Perform a traversal of tree to determine all code words. 5. Scan text again and create (encode) the characters in a new coded file using the Huffman codes. Huffman Coding: The Algorithm

16 16 ● Consider the following short text: – Eerie eyes seen near lake. ● Count up the occurrences of all characters in the text – E e r i e e y e s s e e n n e a r l a k e. Huffman Coding: Building The Tree E e r i space y s n a r l k.Example

17 17 Eerie eyes seen near lake. What is the frequency of each character in the text? Huffman Coding: Building The Tree Example Char Freq. Char Freq. Char Freq. E 1 y1 k1 e 8 s 2.1 r 2 n 2 i 1 a2 space 4 l1

18 18 Huffman Coding: Building The Tree 1. Create binary tree nodes with character and frequency of each character 2. Place nodes in a priority queue “??” The lower the occurrence, the higher the priority in the queue

19 19 Huffman Coding: Building The Tree second bonus assignment: Construct the Huffman tree as follows!! Uses binary tree nodes (OOP-like View; second bonus assignment: Construct the Huffman tree as follows!!) public class HuffNode { public char myChar; public int myFrequency; public HuffNode, myLeft, myRight; } priorityQueue myQueue;

20 20 Huffman Coding: Building The Tree  The queue after inserting all nodes  Null Pointers are not shown E1E1 i1i1 y1y1 l1l1 k1k1.1.1 r2r2 s2s2 n2n2 a2a2 sp 4 e8e8

21 21 Huffman Coding: Building The Tree  While priority queue contains two or more nodes – Create new node – Dequeue node and make it left sub-tree – Dequeue next node and make it right sub-tree – Frequency of new node equals sum of frequency of left and right children – Enqueue new node back into queue in the right order!!

22 22 Huffman Coding: Building The Tree E1E1 i1i1 y1y1 l1l1 k1k1.1.1 r2r2 s2s2 n2n2 a2a2 sp 4 e8e8

23 23 Huffman Coding: Building The Tree y1y1 l1l1 k1k1.1.1 r2r2 s2s2 n2n2 a2a2 sp 4 e8e8 2 E1E1 i1i1

24 24 Huffman Coding: Building The Tree k1k1.1.1 r2r2 s2s2 n2n2 a2a2 sp 4 e8e8 E1E1 i1i1 2 y1y1 l1l

25 CS 307 E1E1 i1i1 k1k1.1.1 r2r2 s2s2 n2n2 a2a2 sp 4 e8e8 2 y1y1 l1l1 2 Huffman Coding: Building The Tree

26 CS 307 E1E1 i1i1 r2r2 s2s2 n2n2 a2a2 sp 4 e8e8 2 y1y1 l1l1 2 k1k Huffman Coding: Building The Tree

27 CS 307 E1E1 i1i1 r2r2 s2s2 n2n2 a2a2 sp 4 e8e8 2 y1y1 l1l1 2 k1k Huffman Coding: Building The Tree

28 CS 307 E1E1 i1i1 n2n2 a2a2 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 Huffman Coding: Building The Tree

29 CS 307 E1E1 i1i1 n2n2 a2a2 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 Huffman Coding: Building The Tree

30 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a2 4 Huffman Coding: Building The Tree

31 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a2 4 Huffman Coding: Building The Tree

32 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a2 4 4 Huffman Coding: Building The Tree

33 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a2 4 4 Huffman Coding: Building The Tree

34 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree

35 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a What is happening to the characters with a low number of occurrences? Huffman Coding: Building The Tree

36 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree

37 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree

38 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree

39 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree

40 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree

41 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree

42 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree

43 CS 307 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree After enqueueing this node there is only one node left in priority queue.

44 CS 307 Dequeue the single node left in the queue. This tree contains the new code words for each character. Frequency of root node should equal number of characters in text. E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Huffman Coding: Building The Tree Eerie eyes seen near lake. ----> 26 characters

45 CS 307  Perform a traversal of the tree to obtain new code words  Going left is a 0 going right is a 1  code word is only completed when a leaf node is reached E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a Encoding the File Traverse Tree

46 Huffman Coding (contd.) CharCode E0000 i0001 y0010 l0011 k space 011 e10 r1100 s1101 n1110 a1111 E1E1 i1i1 sp 4 e8e8 2 y1y1 l1l1 2 k1k r2r2 s2s2 4 n2n2 a2a

47 Huffman Coding: Example (2) Init: Create a priority set out of each letter 47

48 Huffman Coding: Example (2) 1. Sort the sets according to probability (lowest first) 48

49 Huffman Coding: Example (2) 2. Insert prefix ‘1’ into the codes of top set (lowest priority) letters 49

50 Huffman Coding: Example (2) 3. Insert prefix ‘0’ into the codes of the second set letters 50

51 Huffman Coding: Example (2) 4. Merge the top two sets 51

52 Huffman Coding: Example (2) 1. Sort sets according to probability (lowest first) 52

53 Huffman Coding: Example (2) 2. Insert prefix ‘1’ into the codes of top set (lowest priority) letters 53

54 Huffman Coding: Example (2) 3. Insert prefix ‘0’ into the codes of the second set letters 54

55 Huffman Coding: Example (2) 4. Merge the top two sets 55

56 Huffman Coding: Example (2) 1. Sort sets according to probability (lowest first) 56

57 Huffman Coding: Example (2) 2. Insert prefix ‘1’ into the codes of top set (lowest priority) letters 57

58 Huffman Coding: Example (2) 3. Insert prefix ‘0’ into the codes of the second set letters 58

59 Huffman Coding: Example (2) 4. Merge the top two sets 59

60 Huffman Coding: Example (2) 1. Sort sets according to probability (lowest first) 60

61 Huffman Coding: Example (2) 2. Insert prefix ‘1’ into the codes of top set (lowest priority) letters 61

62 Huffman Coding: Example (2) 3. Insert prefix ‘0’ into the codes of the second set letters 62

63 Huffman Coding: Example (2) The END 4. Merge the top two sets 63

64 Example Summary Average code length  l = 0.4x x x x x4 = 2.2 bits/symbol Entropy  H = Σ s=a..e P(s) log 2 P(s) = bits/symbol Redundancy  l – H = bits/symbol 64

65 Example: Tree 1 Average code length l = 0.4x x x x x4 = 2.2 bits/symbol 65

66 Huffman Coding: Example (3) Symbols: {a,b,c,d,e,} Weights: {0.2, 0.4, 0.2,0.1, 0.1} Required: Maximum Variance Tree! b, 0.4 c, 0.2 a, 0.2 de, b, 0.4 c, dea, 0.4 b, deac, 0.6 deacb, 1.0 b, 0.4 c, 0.2 a, 0.2 e, 0.1 d,

67 Example: Tree 1 Average code length l = 0.4x x x x x4 = 2.2 bits/symbol 67

68 Huffman Coding: Example (3) Symbols: {a,b,c,d,e,} Weights: {0.2, 0.4, 0.2,0.1, 0.1} Required: Minimum Variance Tree! b, 0.4 c, 0.2 a, 0.2 de, b, 0.4 de, ac, deb, 0.6 deacb, 1.0 b, 0.4 c, 0.2 a, 0.2 e, 0.1 d,

69 Example: Minimum Variance Tree Average code length l = 0.4x2 + ( )x3+ ( )*2 = 2.2 bits/symbol 69 b ed c a

70 Example: Yet Another Tree Average code length l = 0.4x1 + ( )x3 = 2.2 bits/symbol 70

71 Min Variance Huffman Trees Huffman codes are not unique All versions yield the same average length Which one should we choose? The one with the minimum variance in codeword lengths i.e., with the minimum height tree Why? It will ensure the least amount of variability in the encoded stream 71

72 Another Example! Consider the source: A = {a, b, c}, P(a) = 0.8, P(b) = 0.02, P(c) = 0.18 H = bits/symbol Huffman code: a → 0 b → 11 c → 10 l = 1.2 bits/symbol Redundancy = bits/symbol (on average)(47%!) Q: Could we do better? 72

73 Extended Huffman Codes Example 1: Consider encoding sequences of two letters Redundancy = bits/symbol l = ( 0.64x x x x x x x x x8)/2 /2 = /2 bits/symbol 73

74 Extended Huffman Codes (Remarks) The idea can be extended further Consider all possible n m sequences (we did 3 2 ) In theory, by considering more sequences we can improve the coding !! (is it applicable ? ) In reality, the exponential growth of the alphabet makes this impractical E.g., for length 3 ASCII seq.: = 2 24 = 16M Practical consideration: most sequences would have zero frequency → Other methods are needed (Adaptive Huffman Coding) 74


Download ppt "1 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan Room: C3-222, ext: 1204, Lecture 4 (week 2)"

Similar presentations


Ads by Google