Synchronization of Huffman Codes
Marek Biskup, Warsaw University
PhD-Open, 2007-05-26


Slide 2: Huffman Codes
- Each letter has a corresponding binary string (its code)
- The codes form a complete binary tree
- The depth of a letter depends on its probability in the source
- The code is uniquely decodable (no codeword is a prefix of another)
[Figure: example Huffman tree with leaves a, b, c, d, e; height h = 3, N = 5 letters]
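As a concrete anchor for the rest of the transcript, here is a minimal sketch, in Python, of the standard Huffman construction the slide assumes; the five-letter alphabet and the frequencies are illustrative, not taken from the talk.

```python
import heapq
import itertools

def huffman_code(freqs):
    """Build a Huffman code for a {symbol: frequency} map; returns {symbol: bits}."""
    tiebreak = itertools.count()  # keeps heap comparisons away from the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)   # the two least frequent subtrees...
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}   # ...become siblings
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

# Illustrative frequencies; the exact tree shape depends on how ties break.
print(huffman_code({"a": 0.3, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.1}))
```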

Slide 3: Coding and Decoding
- Source sequence: bbeaacbcec
- Encoded text: the codewords for b b e a a c b c e c, concatenated
- Encoding: for each input letter, print out its code
- Decoding: use the Huffman tree as a finite automaton; start at the root and, whenever you reach a leaf, print out its letter and start again at the root (sketched below)
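The decoding loop below is a small sketch of exactly this automaton view. The tree shape and the codeword assignment (a=00, b=01, c=10, d=110, e=111) are assumptions chosen to be consistent with the slide's five-letter, height-3 example; the later sketches reuse these definitions.

```python
# An internal node is a pair (left, right); a leaf is its letter.
tree = (('a', 'b'), ('c', ('d', 'e')))        # a=00 b=01 c=10 d=110 e=111
encode = {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}

def decode(bits, tree):
    """Walk the tree as an automaton: one edge per bit, restart at each leaf."""
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == '0' else node[1]
        if isinstance(node, str):              # leaf reached: emit its letter
            out.append(node)
            node = tree                        # implicit epsilon-move to the root
    return ''.join(out)

bits = ''.join(encode[ch] for ch in 'bbeaacbcec')
print(decode(bits, tree))                      # -> bbeaacbcec
```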

Slide 4: Parallel Decoding
- Use two processors to decode a string: CPU1 starts from the beginning, CPU2 starts in the middle
- Where is the middle of the encoded bit stream for bbeaacbcec? CPU2 cannot tell whether it starts on a codeword boundary
- CPU2 decodes: d d e c. Wrong!
[Figure: the bit stream split between CPU1 and CPU2]

Slide 5: Parallel Decoding (continued)
- Correct: b b e a a c b c e c
- Incorrect: b b e a a d d e c
- After a few wrong symbols the two parses agree again: synchronization!
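Reusing decode, tree and bits from the sketch above, the effect is easy to reproduce: a decoder dropped at an arbitrary bit offset misreads symbols until it happens to land back on a codeword boundary. The offset below is illustrative.

```python
# CPU2 starts at a bit offset that is not a codeword boundary.
middle = len(bits) // 2 - 1                    # offset 10 here, mid-codeword
print(decode(bits[middle:], tree))             # wrong start: badec
print(decode(bits, tree))                      # reference:   bbeaacbcec
# Both outputs end in 'ec': after a few bits CPU2 has resynchronized.
```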

Slide 6: Bit Corruption
- Correct: b b e a a c b c e c
- After a bit error: d e c a b d d e c
- Synchronization! The decoder recovers after a few symbols
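The same harness illustrates bit corruption: flipping a single bit (here the first one, an arbitrary choice) garbles the beginning of the decoded text, after which the decoder resynchronizes on its own.

```python
# Reusing bits/tree/decode from above: flip the first bit of the stream.
corrupted = ('1' if bits[0] == '0' else '0') + bits[1:]
print(decode(corrupted, tree))   # decabadec: wrong at first, correct tail 'ec'
```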

Slide 7: The Huffman Code Automaton
- Huffman tree = finite automaton, with ε-transitions from the leaves back to the root
- Synchronization: the automaton is at the root exactly when the stream is on a codeword boundary (trace: c b c e c)
- Lack of synchronization: the automaton is at the root while inside a codeword, or at an inner node on a codeword boundary (trace: d d e c)
[Figure: two automaton traces over the example tree, root visits marked with $]
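Continuing the sketch (tree and bits as above), the automaton view can be made explicit: each bit moves the state to a child, and reaching a leaf emits its letter and takes the ε-transition back to the root. Marking every root visit with a $ reproduces the spirit of the slide's traces.

```python
def step(node, bit, root):
    """One automaton transition: returns (next_state, emitted_letter_or_None)."""
    child = node[0] if bit == '0' else node[1]
    if isinstance(child, str):     # leaf: emit its letter, epsilon back to the root
        return root, child
    return child, None

state, trace = tree, []
for b in bits:
    state, letter = step(state, b, tree)
    trace.append((letter + '$') if letter else b)
print(' '.join(trace))   # 0 b$ 0 b$ 1 1 e$ 0 a$ 0 a$ 1 c$ 0 b$ 1 c$ 1 1 e$ 1 c$
```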

Slide 8: Synchronization
- A Huffman code is self-synchronizing if, for every inner node, there is a sequence of codewords that takes the automaton to the root
- Every self-synchronizing Huffman code will eventually resynchronize (for an ε-guaranteed source)
- Almost all Huffman codes are self-synchronizing
- Definition: a synchronizing string is a sequence of bits that moves every node to the root
- Theorem: a Huffman code is self-synchronizing iff it has a synchronizing string
- Synchronizing string for the example tree: 0110 (checked mechanically below)
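The definition suggests a mechanical check, sketched here with the step function from above: run the bit string from every inner node and test whether every run ends at the root. For the illustrative tree used in these sketches, 0110 happens to be synchronizing, matching the string on the slide.

```python
def inner_nodes(tree):
    """All inner nodes of the tree, i.e. all states of the automaton."""
    if isinstance(tree, str):
        return []
    return [tree] + inner_nodes(tree[0]) + inner_nodes(tree[1])

def is_synchronizing(bits, tree):
    """True iff `bits` drives every automaton state back to the root."""
    for start in inner_nodes(tree):
        node = start
        for b in bits:
            node, _ = step(node, b, tree)
        if node is not tree:
            return False
    return True

print(is_synchronizing('0110', tree))   # -> True for this tree
```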

Slide 9: Synchronizing Codewords
- Can a synchronizing string be a codeword? Yes!
[Figure: a Huffman tree with a synchronizing codeword]

Slide 10: Optimal Codes
- Minimum-redundancy codes are not unique: the same source can have several optimal trees
- Of the two optimal trees on the slide, one has no synchronizing codeword and the other has 2 synchronizing codewords
[Figure: two optimal Huffman trees over a, b, c, d, e]
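The checker above makes the slide's point easy to demonstrate on two hypothetical tree shapes over the same five letters (not the trees drawn on the slide): the codeword sets differ in how many synchronizing codewords they contain. Here the counts come out as none versus one; the slide's pair is none versus two.

```python
def codewords(node, prefix=''):
    """Map each letter to its codeword by walking the tree."""
    if isinstance(node, str):
        return {node: prefix}
    return {**codewords(node[0], prefix + '0'), **codewords(node[1], prefix + '1')}

for t in [(('a', 'b'), ('c', ('d', 'e'))),    # depths 2,2,2,3,3
          ('a', (('b', 'c'), ('d', 'e')))]:   # depths 1,3,3,3,3
    sync = [cw for cw in codewords(t).values() if is_synchronizing(cw, t)]
    print(codewords(t), '-> synchronizing codewords:', sync)
# First tree: none of its codewords synchronize; second tree: '100' does.
```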

Slide 11: Code Characteristics
- Open problems: choose the best Huffman code with respect to:
  - the average number of bits to synchronization
  - the length of the synchronizing string
  - the existence and length of synchronizing codewords
- Open problem: the limit on the number of bits in a synchronizing string
  - O(N³): known result for all automata
  - O(hN log N): my result for Huffman automata
  - O(N²): the Černý conjecture for all automata

Slide 12: Detecting Synchronization
- Can a decoder find out that it has synchronized? Yes! For example, when it receives a synchronizing string
- A more general algorithm (sketched below):
  - try to start decoding the text from h consecutive positions (h "decoders")
  - synchronization has taken place once all decoders reach the same codeword boundary
- This can be done without increasing the complexity of decoding (no dependence on h)
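A naive version of this algorithm, reusing step, tree and bits from above: start h decoders at h consecutive bit offsets and report the first position at which all of them sit on the root together. This direct form costs O(h) work per bit; the result claimed on the slide is that the test can be arranged so the factor of h disappears, which is not shown here.

```python
def first_common_boundary(bits, tree, h):
    """First bit count after which all h decoders sit on the root together."""
    states = [None] * h                  # decoder j starts at bit offset j
    for i, b in enumerate(bits):
        for j in range(h):
            if i == j:
                states[j] = tree         # decoder j comes to life at its offset
            if states[j] is not None:
                states[j], _ = step(states[j], b, tree)
        if all(s is tree for s in states):
            return i + 1                 # every decoder is on a codeword boundary
    return None

print(first_common_boundary(bits, tree, h=3))   # -> 17 for the running example
```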

Slide 13: Guaranteed Synchronization
- Self-synchronizing Huffman codes give no upper bound on the number of bits before synchronization
- My work (together with Prof. Wojciech Plandowski): an extension of Huffman coding
  - no redundancy if the code would synchronize on its own
  - small redundancy if it would not: O(1/N) per bit, where N is the number of bits before guaranteed synchronization
  - linear time in the number of coded bits
- Coder: analyze each possible starting position of a decoder; add a synchronizing string whenever some decoder's count of lost bits exceeds the threshold (a toy version follows)
- Decoder: just decode, skipping the synchronizing strings inserted by the coder
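The slide only outlines the coder, so the following toy sketch illustrates that outline rather than the actual Biskup-Plandowski scheme: track the automaton states that a decoder started at any earlier bit offset could be in, and append a synchronizing string once some such decoder has been lost for more than a threshold number of bits. The threshold, the bookkeeping, and the unmarked placement of the inserted strings are all simplifying assumptions.

```python
def encode_with_guarantee(msg, encode, tree, sync='0110', threshold=8):
    """Toy coder: cap how long a decoder started at a wrong offset stays lost."""
    out, true_state, lost = [], tree, {}   # lost: automaton state -> age in bits
    for ch in msg:
        for b in encode[ch]:
            out.append(b)
            lost.setdefault(tree, 0)       # a decoder may (mis)start before this bit
            advanced = {}
            for s, age in lost.items():    # advance every hypothetical lost decoder
                s2, _ = step(s, b, tree)
                advanced[s2] = max(advanced.get(s2, -1), age + 1)
            true_state, _ = step(true_state, b, tree)
            advanced.pop(true_state, None) # same state as the true parse: recovered
            lost = advanced
        if lost and max(lost.values()) > threshold:
            out.extend(sync)               # sync string: every state back to the root
            lost = {}                      # (inserted between codewords only)
    return ''.join(out)

padded = encode_with_guarantee('bbeaacbcec', encode, tree)
print(len(bits), '->', len(padded))        # longer: sync strings were inserted
```

In the real scheme the decoder must be able to recognize and skip the inserted strings; how they are marked is not covered by the slides, so this toy version leaves the output stream unannotated.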

Slide 14: Summary
- Huffman codes can be decompressed in parallel
- After some number of bits (on average), a decoder that starts in the middle will synchronize
- There is no upper bound on the number of incorrectly decoded symbols
- With a small amount of additional redundancy, one may impose such a bound