Trees Addenda
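Huffman Codes
  Fixed-length codes: ASCII (7 bits), EBCDIC (8 bits, IBM mainframes), and fixed-width Unicode encodings such as UTF-32 (32 bits) use the same number of bits for every character
  Variable-length codes: Morse code and others use sequences of different lengths for different characters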
Variable-Length Codes
  Each character has:
    A weight (its probability of occurrence)
    A code length (in bits)
  Expected code length: the sum, over all characters, of weight x length

  Char  Code  Length  Weight
  A     01    2       .2
  B     1000  4       .1
  C     1010  4       .1
  D     100   3       .15
  E     1     1       .45

  For A, B, C, D, E: 0.2 x 2 + 0.1 x 4 + 0.1 x 4 + 0.15 x 3 + 0.45 x 1 = 2.1 bits per character
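As a quick check of the arithmetic above, here is a minimal Python sketch. The names `codes`, `weights`, and `expected_length` are illustrative only and do not come from the slides; the weight of C (.1) is taken from the worked sum.

```python
# Codes and weights from the table above (C's weight .1 is read off the worked sum).
codes = {"A": "01", "B": "1000", "C": "1010", "D": "100", "E": "1"}
weights = {"A": 0.2, "B": 0.1, "C": 0.1, "D": 0.15, "E": 0.45}

def expected_length(codes, weights):
    """Sum of weight x code length over all characters."""
    return sum(weights[ch] * len(code) for ch, code in codes.items())

# Rounded to hide floating-point noise in the last digits.
print(round(expected_length(codes, weights), 2))
```

Any code table with the same lengths (2, 4, 4, 3, 1) gives the same expected length, since only the lengths enter the sum.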
Decoding
  1. Examine the code string bit by bit
  2. When a complete code sequence is found:
       Announce recognition of the character
       Start decoding the next character
Immediate Decodability
  No code sequence is a prefix of another code (i.e., every code has a unique start)
  Each character can be decoded without waiting for the remaining bits

  NO: D is a prefix of B (and E is a prefix of B, C, and D), so the decoder must look at later bits, possibly the whole string, before deciding

  Char  Code  Length
  A     01    2
  B     1000  4
  C     1010  4
  D     100   3
  E     1     1

  YES: no code is a prefix of another

  Char  Code  Length
  A     01    2
  B     0000  4
  C     0001  4
  D     001   3
  E     1     1
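The prefix condition can be tested mechanically. The following Python sketch is one straightforward way to do it, comparing every ordered pair of codes; the function name `is_immediately_decodable` and the two dictionary names are made up for illustration. The second dictionary uses the codes A 01, B 0000, C 0001, D 001, E 1.

```python
from itertools import permutations

def is_immediately_decodable(codes):
    """True if no code is a prefix of (or equal to) a different character's code."""
    return not any(b.startswith(a) for a, b in permutations(codes.values(), 2))

# First table: D (100) is a prefix of B (1000).
not_prefix_free = {"A": "01", "B": "1000", "C": "1010", "D": "100", "E": "1"}
# Second table: no code starts with another code.
prefix_free = {"A": "01", "B": "0000", "C": "0001", "D": "001", "E": "1"}

print(is_immediately_decodable(not_prefix_free))
print(is_immediately_decodable(prefix_free))
```

The pairwise comparison is quadratic in the number of characters, which is fine for small alphabets like this one.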
Huffman Codes
  Immediately decodable
  Minimal expected code length
  Need an algorithm that builds the codes for a set of n characters
Huffman Encoding
  1. Initialize a list of n one-node binary trees, one per character, each holding that character's weight
  2. Do the following n - 1 times:
       a. Find the two trees T' and T" in the list with minimal weights w' and w"
       b. Replace these two with one binary tree whose root has weight w' + w" and whose subtrees are T' and T"
  3. Label the two edges leaving each parent node 0 and 1
  4. The code for character Ci is the bit string of edge labels on the path from the root to Ci
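The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the only way to implement the algorithm: it keeps the trees in a `heapq` priority queue instead of scanning a list, it uses integer weights in hundredths to avoid floating-point ties, and the tie-breaking and left/right placement rule (the `order` field) is an assumption, since the algorithm itself leaves both choices open. The name `huffman_codes` is hypothetical.

```python
import heapq

def huffman_codes(weights):
    """Build a code table from a non-empty {character: weight} mapping."""
    # Heap entries are (weight, order, tree). A tree is either a single
    # character (a leaf) or a (left, right) pair. `order` is unique, so
    # the trees themselves are never compared.
    heap = [(w, i, ch) for i, (ch, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    next_order = -1  # merged trees get negative orders (assumed tie-break rule)
    while len(heap) > 1:  # runs n - 1 times
        w1, o1, t1 = heapq.heappop(heap)  # the two minimal-weight trees
        w2, o2, t2 = heapq.heappop(heap)
        # Assumed placement rule: the tree with the lower order goes left.
        left, right = (t1, t2) if o1 < o2 else (t2, t1)
        heapq.heappush(heap, (w1 + w2, next_order, (left, right)))
        next_order -= 1

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, str):          # leaf: record the path so far
            codes[tree] = prefix or "0"    # a lone character still needs one bit
        else:
            walk(tree[0], prefix + "0")    # left edge labeled 0
            walk(tree[1], prefix + "1")    # right edge labeled 1

    walk(heap[0][2], "")
    return codes

# Weights in hundredths: A .2, B .1, C .1, D .15, E .45
print(huffman_codes({"A": 20, "B": 10, "C": 10, "D": 15, "E": 45}))
# {'B': '0000', 'C': '0001', 'D': '001', 'A': '01', 'E': '1'}
```

A different tie-breaking rule can produce different bit strings, and even different code lengths, while still giving the same minimal expected length.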
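Huffman Encoding (Example)
  Value of each parent = sum of its children
  Leaves, left to right: B (.1), C (.1), D (.15), A (.2), E (.45)
  Merges:
    B (.1) + C (.1) = .2
    .2 + D (.15) = .35
    .35 + A (.2) = .55
    .55 + E (.45) = 1
  With left edges labeled 0 and right edges labeled 1, the codes are A 01, B 0000, C 0001, D 001, E 1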
Huffman Decoding
  1. Initialize pointer p to the root of the Huffman tree
  2. While not at the end of the message string:
       a. Let x be the next bit in the string
       b. If x = 0, set p = left child pointer; else set p = right child pointer
       c. If p points to a leaf:
            Display the character stored in that leaf
            Reset p to the root of the Huffman tree
  e.g., code string 0000101001 decodes as 0000 (B) 1 (E) 01 (A) 001 (D)
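The decoding loop translates almost line for line into Python. In this sketch the tree is represented as nested dictionaries keyed by "0" and "1", rebuilt from a code table; that representation and the names `build_tree` and `decode` are assumptions made for illustration. The table used is A 01, B 0000, C 0001, D 001, E 1.

```python
def build_tree(codes):
    """Turn a {character: code} table into a tree of nested dicts."""
    root = {}
    for ch, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})  # create internal nodes as needed
        node[code[-1]] = ch                  # the leaf holds the character
    return root

def decode(bits, root):
    """Walk the tree bit by bit, emitting a character at each leaf."""
    out = []
    p = root
    for x in bits:
        p = p[x]                 # "0" follows the left child, "1" the right
        if isinstance(p, str):   # p points to a leaf
            out.append(p)
            p = root             # reset to the root for the next character
    if p is not root:
        raise ValueError("message ends in the middle of a code")
    return "".join(out)

codes = {"A": "01", "B": "0000", "C": "0001", "D": "001", "E": "1"}
tree = build_tree(codes)

# Round trip: encode a message with the table, then decode it again.
bits = "".join(codes[ch] for ch in "BEAD")
print(bits, decode(bits, tree))  # 0000101001 BEAD
```

Because the code is immediately decodable, the loop never needs to look ahead: each character is emitted the moment its last bit is read.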