Synchronization of Huffman codes Marek Biskup Warsaw University Phd-Open,

Marek Biskup - Synchronization of Huffman Codes

Huffman Codes
- Each letter has a corresponding binary string (its code)
- The codes form a complete binary tree (every inner node has two children)
- The depth of a letter depends on its probability in the source
- The code is uniquely decodable
[Figure: Huffman tree with leaves a, b, c, d, e; height h = 3, N = 5 letters]

Coding and decoding
- Source sequence: bbeabce
- Encoded text: b b e a a c b c e c
- Encoding: for each input letter, print out its code
- Decoding: use the Huffman tree as a finite automaton. Start in the root; when you reach a leaf, print out its letter and start again.
[Figure: Huffman tree over a, b, c, d, e]
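
The encoder and the tree-as-automaton decoder described above can be sketched as follows. The code table is an assumption: the tree on the slide was a figure and its exact shape is not recoverable here, so `CODE` is simply one complete prefix code with N = 5 and h = 3. The names `CODE`, `build_tree`, `encode` and `decode` are illustrative only.

```python
# Hypothetical prefix code (an assumption, not necessarily the tree from the slide).
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}

def build_tree(code):
    """Build the code tree as nested dicts; a leaf is stored as its letter."""
    root = {}
    for letter, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = letter
    return root

def encode(text, code=CODE):
    """For each input letter, output its codeword."""
    return "".join(code[ch] for ch in text)

def decode(bits, code=CODE):
    """Walk the tree bit by bit; at a leaf, emit its letter and restart at the root."""
    root = build_tree(code)
    node, out = root, []
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            out.append(node)
            node = root
    return "".join(out)

if __name__ == "__main__":
    bits = encode("bbeabce")
    print(bits, decode(bits))
```

With this assumed table, decoding the output of `encode` returns the original sequence, because no codeword is a prefix of another.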

Parallel decoding
- Use two processors to decode a string
- CPU1 starts from the beginning
- CPU2 starts in the middle
- Where is the middle of bbeaacbcec?
- CPU2 output: d d e c. Wrong!
[Figure: encoded string split between CPU1 and CPU2; Huffman tree over a, b, c, d, e]

Parallel decoding
- Correct: b b e a a c b c e c
- Incorrect: b b e a a d d e c
- Synchronization!
[Figure: Huffman tree over a, b, c, d, e]
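
A minimal sketch of two-processor decoding, under the same assumed code table as before (the slide's tree is not recoverable, so the wrong letters produced here differ from the slide's). CPU2 starts at an arbitrary bit offset `split`; CPU1 keeps going until it hits a codeword boundary that CPU2 also produced, and from that point CPU2's output is trusted. `decode_from` and `parallel_decode` are hypothetical helper names, and the two "CPUs" run sequentially here for clarity.

```python
# Hypothetical prefix code (an assumption, not necessarily the tree from the slide).
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
LETTER = {word: letter for letter, word in CODE.items()}

def encode(text):
    return "".join(CODE[ch] for ch in text)

def decode_from(bits, start):
    """Yield (letter, end_offset) pairs, decoding greedily from bit offset `start`.

    A trailing partial codeword is silently dropped.
    """
    word = ""
    for pos in range(start, len(bits)):
        word += bits[pos]
        if word in LETTER:
            yield LETTER[word], pos + 1
            word = ""

def parallel_decode(bits, split):
    """Return (decoded text, bit offset where CPU1 and CPU2 agree)."""
    cpu2 = list(decode_from(bits, split))      # may start inside a codeword
    cpu2_ends = {end for _, end in cpu2}
    head = []
    for letter, end in decode_from(bits, 0):   # CPU1 is always correctly aligned
        head.append(letter)
        if end in cpu2_ends:                   # both decoders share this boundary
            tail = [l for l, e in cpu2 if e > end]
            return "".join(head + tail), end
    return "".join(head), None

if __name__ == "__main__":
    bits = encode("bbeaacbcec")
    print("CPU2 alone from bit 12:", "".join(l for l, _ in decode_from(bits, 12)))
    print(parallel_decode(bits, 12))
```

Everything CPU2 decoded before the shared boundary is discarded, which is why the time to synchronization matters for the speed-up.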

Bit corruption
- Correct: b b e a a c b c e c
- Bit error: d e c a b d d e c
- Synchronization!
[Figure: Huffman tree over a, b, c, d, e]
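
The effect of a single flipped bit can be reproduced with a small experiment. This is a sketch under the same assumed code table (not necessarily the slide's tree, so the corrupted output differs slightly from the slide); `flip` and `resync_point` are hypothetical helpers.

```python
# Hypothetical prefix code (an assumption, not necessarily the tree from the slide).
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
LETTER = {word: letter for letter, word in CODE.items()}

def encode(text):
    return "".join(CODE[ch] for ch in text)

def decode_with_ends(bits):
    """Return (letter, end_offset) pairs for a greedy decode from bit 0."""
    out, word = [], ""
    for pos, bit in enumerate(bits):
        word += bit
        if word in LETTER:
            out.append((LETTER[word], pos + 1))
            word = ""
    return out

def flip(bits, i):
    """Return `bits` with bit i inverted."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]

def resync_point(bits, i):
    """First codeword boundary after bit i shared by the clean and corrupted parses."""
    clean_ends = {end for _, end in decode_with_ends(bits) if end > i}
    for _, end in decode_with_ends(flip(bits, i)):
        if end in clean_ends:
            return end
    return None

if __name__ == "__main__":
    bits = encode("bbeaacbcec")
    print("".join(l for l, _ in decode_with_ends(flip(bits, 0))))
    print("resynchronized at bit", resync_point(bits, 0))
```

From the returned offset onward the corrupted stream decodes exactly like the clean one; everything before it may be wrong.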

Huffman code automaton
- Huffman tree = finite automaton, with ε-transitions from leaves to the root
- Synchronization: the automaton is in the root when on a codeword boundary
  [Figure: bit stream with leaf-to-root transitions marked, decoded as c b c e c]
- Lack of synchronization:
  - the automaton is in the root when inside a codeword
  - the automaton is in an inner node when on a codeword boundary
  [Figure: bit stream decoded out of phase as d d e c]
[Figure: Huffman tree over a, b, c, d, e]

Synchronization
- A Huffman code is self-synchronizing if for any inner node there is a sequence of codewords such that the automaton reaches the root
- Every self-synchronizing Huffman code will eventually resynchronize (for an ε-guaranteed source)
- Almost all Huffman codes are self-synchronizing
- Definition: a synchronizing string is a sequence of bits that moves the automaton from any node to the root
- Theorem: a Huffman code is self-synchronizing iff it has a synchronizing string
[Figure: Huffman tree over a, b, c, d, e] Synchronizing string: 0110
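
The definition can be checked mechanically. The sketch below models the automaton's states as proper prefixes of codewords (the empty string is the root), tests whether a bit string is synchronizing, and searches for a shortest synchronizing string by breadth-first search over sets of states. The code table is again an assumption; for this particular table the search happens to return 0110. The subset search is exponential in the worst case and is meant only for small codes; it is not the method behind the bounds quoted in the talk.

```python
from collections import deque

# Hypothetical complete prefix code (an assumption, not necessarily the slide's tree).
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}

def inner_nodes(code):
    """Automaton states: every proper prefix of a codeword ('' is the root)."""
    return {w[:i] for w in code.values() for i in range(len(w))}

def step(state, bit, words):
    """Follow one bit; reaching a leaf takes the ε-transition back to the root."""
    nxt = state + bit
    return "" if nxt in words else nxt

def is_synchronizing(bits, code):
    """True if `bits` moves the automaton from every node to the root."""
    words = set(code.values())
    for state in inner_nodes(code):
        for bit in bits:
            state = step(state, bit, words)
        if state != "":
            return False
    return True

def shortest_synchronizing_string(code):
    """BFS over sets of states; returns None if the code is not self-synchronizing."""
    words = set(code.values())
    start = frozenset(inner_nodes(code))
    seen, queue = {start}, deque([(start, "")])
    while queue:
        states, path = queue.popleft()
        if states == {""}:
            return path
        for bit in "01":
            nxt = frozenset(step(s, bit, words) for s in states)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + bit))
    return None

if __name__ == "__main__":
    print(is_synchronizing("0110", CODE))
    print(shortest_synchronizing_string(CODE))
```

A fixed-length code such as 00, 01, 10, 11 makes the search return `None`: a decoder that starts one bit off never recovers.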

Synchronizing codewords
- Can a synchronizing string be a codeword? Yes!
[Figure: Huffman tree over a, b, c, d, e]

Optimal codes
- Minimum-redundancy codes are not unique
[Figure: two Huffman trees over a, b, c, d, e; one has no synchronizing codeword, the other has 2 synchronizing codewords]

Code characteristics
- Open problems: choose the best Huffman code with respect to:
  - Average number of bits to synchronization
  - The length of the synchronizing string
  - Existence and length of synchronizing codewords
- Open problem? The limit on the number of bits in a synchronizing string:
  - O(N^3): known result for all automata
  - O(h N log N): my result for Huffman automata
  - O(N^2): Černý conjecture for all automata

Detecting synchronization
- Can a decoder find out that it has synchronized? Yes! For example, if it receives a synchronizing string
- A more general algorithm:
  - Try to start decoding the text from h consecutive positions (h "decoders")
  - Synchronization takes place if all decoders reach the same word boundary
  - This can be done without increasing the complexity of decoding (no dependence on h)
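
A naive sketch of the h-decoder idea, using the same assumed code table. Since no codeword is longer than h bits, any h consecutive start positions include a true codeword boundary, so one of the h decoders is correctly aligned; once all of them sit in the root at the same time, that position must be a true boundary. This version costs O(h) work per bit and therefore does not achieve the h-independent complexity mentioned above; `detect_sync` is a hypothetical name.

```python
# Hypothetical complete prefix code (an assumption, not necessarily the slide's tree).
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
WORDS = set(CODE.values())
H = max(len(w) for w in WORDS)  # height of the code tree

def step(state, bit):
    """Follow one bit; reaching a leaf returns the automaton to the root ('')."""
    nxt = state + bit
    return "" if nxt in WORDS else nxt

def detect_sync(bits, start, h=H):
    """Run decoders started at start, start+1, ..., start+h-1.

    Returns the first bit offset at which all h decoders are in the root,
    or None if that never happens before the input ends.
    """
    states = {}  # decoder start offset -> current automaton state
    for pos in range(start, len(bits)):
        if len(states) < h:
            states[pos] = ""  # launch the next decoder at this offset
        states = {s: step(st, bits[pos]) for s, st in states.items()}
        if len(states) == h and all(st == "" for st in states.values()):
            return pos + 1
    return None

if __name__ == "__main__":
    bits = "".join(CODE[ch] for ch in "bbeaacbcec")
    print(detect_sync(bits, 12))
```

The decoder does not need to know which of the h candidates was the correct one; agreement alone certifies the boundary.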

Guaranteed synchronization
- Self-synchronizing Huffman codes: no upper bound on the number of bits before synchronization
- My work (together with Prof. Wojciech Plandowski): an extension to Huffman coding
  - No redundancy if the code would synchronize
  - Small redundancy if it wouldn't: O(1/N) per bit, where N is the number of bits before guaranteed synchronization
  - Linear time in the number of coded bits
- Coder:
  - Analyze each possible starting position of a decoder
  - Add a synchronization string whenever there is a decoder with the number of lost bits above the threshold
- Decoder:
  - Just decode
  - Skip synchronization strings inserted by the coder
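
One possible toy reading of the coder and decoder outline is sketched below. It is not the linear-time algorithm from the talk and makes no claim about its redundancy bound. Several things are assumptions: the code table, the use of 0110 as the inserted string, the rule that insertion happens only on codeword boundaries, and the idea that the decoder recognizes inserted strings by replaying the coder's bookkeeping on the bits it has already read. `Tracker` records, for every state a late-starting decoder could be in, the largest number of bits such a decoder has consumed without synchronizing.

```python
# Hypothetical code table and synchronizing string (assumptions for illustration).
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
LETTER = {word: letter for letter, word in CODE.items()}
SYNC = "0110"

def step(state, bit):
    nxt = state + bit
    return "" if nxt in LETTER else nxt

class Tracker:
    """Follows the aligned decoder and every decoder that could have started late."""
    def __init__(self):
        self.true = ""   # state of the correctly aligned decoder
        self.lost = {}   # state of a misaligned decoder -> bits it has read so far

    def feed(self, bit):
        self.lost.setdefault("", 0)        # a decoder may start at this very bit
        nxt = {}
        for state, age in self.lost.items():
            s = step(state, bit)
            nxt[s] = max(nxt.get(s, 0), age + 1)
        self.true = step(self.true, bit)
        nxt.pop(self.true, None)           # same state as the true decoder: synchronized
        self.lost = nxt

    def worst(self):
        return max(self.lost.values(), default=0)

def encode(text, threshold):
    assert threshold >= len(SYNC)          # otherwise coder and decoder could disagree
    t, out = Tracker(), []
    for ch in text:
        # Insert SYNC on a codeword boundary when some decoder has lost too many bits.
        for word in ([SYNC] if t.worst() > threshold else []) + [CODE[ch]]:
            for bit in word:
                t.feed(bit)
            out.append(word)
    return "".join(out)

def decode(bits, threshold):
    t, out, pos = Tracker(), [], 0
    while pos < len(bits):
        if t.worst() > threshold:          # mirror the coder's decision, then skip SYNC
            for bit in bits[pos:pos + len(SYNC)]:
                t.feed(bit)
            pos += len(SYNC)
        word = ""
        while word not in LETTER:
            word += bits[pos]
            t.feed(bits[pos])
            pos += 1
        out.append(LETTER[word])
    return "".join(out)

if __name__ == "__main__":
    coded = encode("a" * 12, threshold=8)
    print(coded, decode(coded, threshold=8))
```

A run of the letter a never resynchronizes a decoder that starts one bit late under this table, so the sketch inserts 0110 periodically there, while a text that synchronizes on its own before the threshold is reached is emitted unchanged.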

Summary
- Huffman codes can be decompressed in parallel
- After some number of bits (on average), a decoder that starts in the middle will synchronize
- There is no upper bound on the number of incorrectly decoded symbols
- With a small additional redundancy one may impose such a bound