Published by Raul Sullivan. Modified over 4 years ago.

1
Chapter 4 Variable–Length and Huffman Codes

2
Unique Decodability

We must always be able to determine where one code word ends and the next one begins.

Counterexample: suppose s_1 = 0; s_2 = 1; s_3 = 11; s_4 = 00. Then 0011 = s_4 s_3 or s_1 s_1 s_3.

Unique decodability means that any two distinct sequences of symbols (of possibly differing lengths) result in distinct encoded strings. (4.1, 2)
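The counterexample can be checked mechanically. The following is a minimal sketch, not part of the slides: the helper name `parsings` is hypothetical, and it simply enumerates every way of splitting a string into code words by brute force.

```python
def parsings(message, code):
    """Return every way to split `message` into code words of `code`.

    `code` maps symbol names to code words; the result is a list of
    symbol sequences. Brute force, so only suitable for short strings.
    """
    if message == "":
        return [[]]  # exactly one way to parse the empty string
    results = []
    for symbol, word in code.items():
        if message.startswith(word):
            for rest in parsings(message[len(word):], code):
                results.append([symbol] + rest)
    return results


# The code from the counterexample above.
code = {"s1": "0", "s2": "1", "s3": "11", "s4": "00"}
for parse in parsings("0011", code):
    print(" ".join(parse))
```

More than one printed line means the string is ambiguous, so the code is not uniquely decodable. Both parsings named on the slide appear among the results.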

3
Instantaneous Codes

No code word is the prefix of another. By reading a continuous sequence of code words, one can instantaneously determine the end of each code word.

Example: s_1 = 0; s_2 = 10; s_3 = 110; s_4 = 111

[Figure: decoding tree for s_1, …, s_4]

Consider the reverse: s_1 = 0; s_2 = 01; s_3 = 011; s_4 = 111. The string 0111……111 is uniquely decodable, but the first symbol cannot be decoded without reading all the way to the end. (4.3)
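As an illustration of the prefix condition, here is a small sketch (the function names `is_prefix_free` and `decode` are hypothetical, not from the slides). It tests whether a code is instantaneous and decodes a bit string one bit at a time, emitting a symbol as soon as a code word is complete.

```python
def is_prefix_free(words):
    """True if no code word is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


def decode(bits, code):
    """Decode `bits` with a prefix-free `code` ({symbol: code word})."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # a code word just ended: emit immediately
            out.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a code word")
    return out


instantaneous = {"s1": "0", "s2": "10", "s3": "110", "s4": "111"}
reverse = {"s1": "0", "s2": "01", "s3": "011", "s4": "111"}

print(is_prefix_free(instantaneous.values()))  # the instantaneous code
print(is_prefix_free(reverse.values()))        # the reversed code
print(decode("010110111", instantaneous))
```

The greedy `decode` is only correct for prefix-free codes. For the reversed code it would emit s_1 as soon as it saw a 0, which is exactly the situation the slide warns about.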

4
Constructing Instantaneous Codes

Comma code: s_1 = 0; s_2 = 10; s_3 = 110; s_4 = 1110; s_5 = 1111
Modification: s_1 = 00; s_2 = 01; s_3 = 10; s_4 = 110; s_5 = 111

[Figure: decoding tree for the modified code]

Notice that every code word is located on a leaf. (4.4)

5
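Kraft Inequality

Theorem: There exists an instantaneous code for S where each symbol s ∈ S is encoded in radix r with length |s| if and only if

$\sum_{s \in S} r^{-|s|} \le 1$

Proof (⇒): By induction on the height (maximal length path) of the decoding tree, n = max{|s| : s ∈ S}. For simplicity, pick r = 2 (the binary case).

Basis: n = 1. The tree has either a single leaf s_1 (on edge 0 or 1) or two leaves s_1 (edge 0) and s_2 (edge 1), so the sum is 1/2 ≤ 1 or 1/2 + 1/2 = 1 ≤ 1. (Could use n = 0 here!)

Induction: n > 1. The root has subtrees T_0 and T_1, each of height < n. By IH, the leaves of T_0 and T_1 satisfy the Kraft inequality. Prefixing one symbol at the top of the tree increases all the lengths by one, so, writing |s|' for the length of s inside its subtree,

$\sum_{s \in S} 2^{-|s|} = \frac{1}{2} \sum_{s \in T_0} 2^{-|s|'} + \frac{1}{2} \sum_{s \in T_1} 2^{-|s|'} \le \frac{1}{2} + \frac{1}{2} = 1$

(4.5)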

6
Same argument for radix r:

Basis: n = 1. The root has at most r leaves s_1, …, s_{≤r}, each of length 1, so the sum is at most r · (1/r) = 1.

Induction: n > 1. The root has at most r subtrees T_0, …, T_{≤r−1}. By IH each subtree's own Kraft sum is ≤ 1, and prefixing one symbol multiplies it by 1/r, so adding at most r of these together gives ≤ 1.

Strict inequality in the binary case implies that not all internal nodes have degree 2, but if a node has degree 1, then clearly that edge can be removed by contraction. (4.5)

7
Kraft Inequality (⇐)

Construct a code via decoding trees. Number the symbols s_1, …, s_q so that l_1 ≤ … ≤ l_q and assume K ≤ 1, where $K = \sum_{i=1}^{q} r^{-l_i}$ is the Kraft sum. Greedy method: proceed left-to-right, systematically assigning leaves to code words, so that you never pass through or land on a previous one. The only way this method could fail is if it runs out of nodes (tree is over-full), but that would mean K > 1.

Examples (r = 2):

lengths 1, 3, 3, 3:  ½ + ⅛ + ⅛ + ⅛ < 1  (one leaf not used)
lengths 1, 2, 3, 3:  ½ + ¼ + ⅛ + ⅛ = 1
lengths 1, 2, 2, 3:  ½ + ¼ + ¼ + ⅛ > 1  (tree is over-full)

[Figure: decoding trees for the three examples]

(4.5)
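The greedy method can be sketched in a few lines. This is one possible reading of the construction, not the slide author's code: `kraft_sum` and `greedy_code` are hypothetical names, and code words are produced by walking left to right through the nodes at each depth. Digits are written with `str`, so the sketch assumes r ≤ 10.

```python
from fractions import Fraction


def kraft_sum(lengths, r=2):
    """Exact Kraft sum K = sum of r^(-l) over the given lengths."""
    return sum(Fraction(1, r ** l) for l in lengths)


def greedy_code(lengths, r=2):
    """Assign code words left to right in the r-ary decoding tree.

    Returns a list of code words for the sorted lengths, or None if the
    tree runs out of nodes (which happens exactly when K > 1).
    """
    words, value, prev = [], 0, 0
    for l in sorted(lengths):
        value *= r ** (l - prev)  # descend to depth l below the next free node
        if value >= r ** l:       # no free node left at this depth
            return None
        digits, v = [], value
        for _ in range(l):        # write `value` as l digits in radix r
            v, d = divmod(v, r)
            digits.append(str(d))
        words.append("".join(reversed(digits)))
        value += 1                # step to the next node on the right
        prev = l
    return words


for lengths in ([1, 3, 3, 3], [1, 2, 3, 3], [1, 2, 2, 3]):
    print(lengths, kraft_sum(lengths), greedy_code(lengths))
```

For the three examples above, the first leaves the leaf 111 unused, the second fills the tree exactly, and the third returns `None` because the tree is over-full.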

8
Shortened Block Codes

With exactly 2^m symbols, we can form a set of code words each of length m: b_1 …… b_m, b_i ∈ {0,1}. This is a complete binary decoding tree of depth m. With < 2^m symbols, we can chop off branches to get modified (shortened) block codes. (4.6)

[Figure: Ex 1 and Ex 2, two shortened block-code decoding trees for s_1, …, s_5]

9
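McMillan Inequality

Idea: Uniquely decodable codes satisfy the same bounds as instantaneous codes.

Theorem: Suppose we have a uniquely decodable code in radix r with lengths l_1 ≤ … ≤ l_q. Then its Kraft sum $K = \sum_{i=1}^{q} r^{-l_i}$ is ≤ 1.

Proof: Use a multinomial expansion to see that

$K^n = \left( \sum_{i=1}^{q} r^{-l_i} \right)^n = \sum_{k = n l_1}^{n l_q} N_k r^{-k}$

where N_k = the number of ways n l's can add up to k, which is the same as the number of different ways n symbols can form a coded message of length k. Because of uniqueness, this must be ≤ r^k, the number of radix-r strings of length k. Hence

$K^n \le \sum_{k = n l_1}^{n l_q} r^k r^{-k} \le n l_q$

for every n. If K > 1, then K^n would grow exponentially and eventually exceed n l_q, so K ≤ 1.

Conclusion: WLOG we can use only instantaneous codes. (4.7)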

10
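Average code length

Our goal is to minimize the average coded length $L_{avg} = \sum_{i=1}^{q} p_i l_i$.

If p_n > p_m then l_n ≤ l_m. For if p_m < p_n with l_m < l_n, then interchanging the encodings for s_m and s_n we get

$\text{old} - \text{new} = (p_m l_m + p_n l_n) - (p_m l_n + p_n l_m) = (p_n - p_m)(l_n - l_m) > 0$

so old > new, and the interchange shortens the average length.

So we can assume that if p_1 ≥ … ≥ p_q then l_1 ≤ … ≤ l_q, because if p_i = p_{i+1} with l_i > l_{i+1}, we can just switch s_i and s_{i+1}. (4.8)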

11
Huffman algorithm

Start with S = {s_1, …, s_q}, the source alphabet, and consider B = {0, 1} as our code alphabet (binary).

First, observe that we may take l_{q−1} = l_q: since the code is instantaneous, s_{q−1} is not a prefix of s_q, so dropping the extra trailing symbols from s_q (if l_q > l_{q−1}) won't hurt.

So, we can combine s_{q−1} and s_q into a "combo-symbol" (s_{q−1}+s_q) with probability (p_{q−1}+p_q) and get a code for the reduced alphabet.

For q = 1, assign s_1 = ε.
For q > 1, let s_{q−1} = (s_{q−1}+s_q) 0 and s_q = (s_{q−1}+s_q) 1.

Example:

probabilities              code words
0.4  0.2  0.2  0.1  0.1    1  01  000  0010  0011
0.4  0.2  0.2  0.2         1  01  000  001
0.4  0.4  0.2              1  00  01
0.6  0.4                   0  1
1.0                        ε

N. B. the case for q = 1 does not produce a valid code. (4.8)
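The merge-and-split procedure can be sketched with a priority queue. This is a minimal illustration under stated assumptions, not the slide author's implementation: `huffman` is a hypothetical name, exact fractions are used to avoid rounding, and ties between equal probabilities are broken by insertion order. The individual code words may therefore differ from the table above, although the average length is the same.

```python
import heapq
from fractions import Fraction
from itertools import count


def huffman(probs):
    """Binary Huffman code for {symbol: probability}, assuming q >= 2."""
    tie = count()  # tie-breaker so the heap never compares the dict payloads
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_last, _, last = heapq.heappop(heap)      # least probable (s_q)
        p_prev, _, previous = heapq.heappop(heap)  # next least (s_{q-1})
        # Splitting the combo-symbol: s_{q-1} gets digit 0, s_q gets digit 1.
        merged = {s: "0" + w for s, w in previous.items()}
        merged.update({s: "1" + w for s, w in last.items()})
        heapq.heappush(heap, (p_prev + p_last, next(tie), merged))
    return heap[0][2]


probs = {
    "s1": Fraction(4, 10),
    "s2": Fraction(2, 10),
    "s3": Fraction(2, 10),
    "s4": Fraction(1, 10),
    "s5": Fraction(1, 10),
}
code = huffman(probs)
average = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(average)
```

Building the words by prepending a digit at each merge is equivalent to appending 0 or 1 to the combo-symbol's code word when the reduction is undone.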

12
Huffman is always of shortest average length

Assume p_1 ≥ … ≥ p_q. Let L_avg be the average length of the Huffman code and L the average length of any alternative instantaneous code. We are trying to show L_avg ≤ L. We know l_1 ≤ … ≤ l_q.

Example: p_1 = 0.7; p_2 = p_3 = p_4 = 0.1. Compare L_avg = 1.5 to log_2 q = 2.

Base Case: For q = 2, no shorter code exists.

[Figure: one-level decoding tree with leaves s_1 and s_2]

Induction Step: For q > 2 take any instantaneous code for s_1, …, s_q with minimal average length. (4.8)

13
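Claim that l_{q−1} = l_q = l_{q−1,q} + 1, where l_{q−1,q} is the length of the combined symbol (s_{q−1}+s_q) in the reduced code, because s_{q−1} and s_q hang from the combined symbol on edges 0 and 1.

[Figure: reduced code over s_1, …, with the combined symbol s_{q−1}+s_q split into s_{q−1} and s_q; total height = l_q]

So its reduced code, with average length L′, will always satisfy:

$L' = \sum_{i=1}^{q-2} p_i l_i + (p_{q-1} + p_q)(l_q - 1) = L - (p_{q-1} + p_q)$

By IH, L′_avg ≤ L′. But more importantly the reduced Huffman code shares the same properties so it also satisfies the same equation L′_avg + (p_{q−1} + p_q) = L_avg, hence L_avg ≤ L. (4.8)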

14
Code Extensions

Take p_1 = ⅔ and p_2 = ⅓. The Huffman code gives s_1 = 0, s_2 = 1, L_avg = 1.

Square the symbol alphabet to get S^2:

s_{1,1} = s_1 s_1; s_{1,2} = s_1 s_2; s_{2,1} = s_2 s_1; s_{2,2} = s_2 s_2
p_{1,1} = 4⁄9; p_{1,2} = 2⁄9; p_{2,1} = 2⁄9; p_{2,2} = 1⁄9

Apply Huffman to S^2: s_{1,1} = 1; s_{1,2} = 01; s_{2,1} = 000; s_{2,2} = 001

L_avg = 4⁄9 · 1 + 2⁄9 · 2 + 2⁄9 · 3 + 1⁄9 · 3 = 17⁄9 < 2. But we are sending two symbols at a time, so the cost per source symbol is 17⁄18 < 1. (4.10)
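The arithmetic for the squared alphabet is easy to verify exactly. The sketch below takes the probabilities and code words from the slide as given; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

p = {"s1": Fraction(2, 3), "s2": Fraction(1, 3)}

# Probabilities of the pairs in S^2 (symbols assumed independent).
pairs = {a + b: p[a] * p[b] for a, b in product(p, repeat=2)}

# Huffman code for S^2 as listed on the slide.
code = {"s1s1": "1", "s1s2": "01", "s2s1": "000", "s2s2": "001"}

avg_per_pair = sum(pairs[k] * len(code[k]) for k in pairs)
avg_per_symbol = avg_per_pair / 2  # each code word carries two source symbols

print(avg_per_pair, avg_per_symbol)
```

The per-symbol figure is what should be compared with L_avg = 1 for the unextended code.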

15
Huffman Codes in radix r

At each stage down, we merge the last (least probable) r states into 1, reducing the # of states by r − 1. Since we end with one state, we must begin with a no. of states ≡ 1 mod (r − 1). We pad out states with probability 0 to get this.

Example: r = 4; k = 3

probabilities                                                   code words
0.22  0.2  0.18  0.15  0.1  0.08  0.05  0.02  0.0  0.0 (pads)   1  2  3  00  01  02  030  031
0.22  0.2  0.18  0.15  0.1  0.08  0.07                          1  2  3  00  01  02  03
0.4  0.22  0.2  0.18                                            0  1  2  3
1.0

(4.11)
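The padding rule and the r-way merge can be sketched as follows. This is an illustrative sketch, not the slide author's code: `huffman_radix` and the symbol labels s1 to s8 are hypothetical, probabilities are exact fractions, and within each merge the most probable state gets digit 0. Digits are written with `str`, so the sketch assumes r ≤ 10.

```python
import heapq
from fractions import Fraction
from itertools import count


def huffman_radix(probs, r):
    """Radix-r Huffman code for {symbol: probability}."""
    tie = count()  # tie-breaker so the heap never compares the dict payloads
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    # Pad with probability-0 states until the count is 1 mod (r - 1).
    while (len(heap) - 1) % (r - 1) != 0:
        heap.append((Fraction(0), next(tie), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        # Pop the r least probable states, in ascending order of probability.
        group = [heapq.heappop(heap) for _ in range(r)]
        merged = {}
        for digit, (_, _, words) in enumerate(reversed(group)):
            for s, w in words.items():
                merged[s] = str(digit) + w
        heapq.heappush(heap, (sum(g[0] for g in group), next(tie), merged))
    return heap[0][2]


values = [22, 20, 18, 15, 10, 8, 5, 2]  # probabilities in hundredths
probs = {f"s{i}": Fraction(v, 100) for i, v in enumerate(values, start=1)}
code = huffman_radix(probs, r=4)
print(code)
```

With these eight probabilities the function adds two pads, giving ten states, and the pads simply occupy the unused code words under the last merged state.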
