Data Coding Run Length Coding

1 Data Coding Run Length Coding
Rather than coding each symbol of a long run of the same symbol, describe the run as the symbol plus a run length for that symbol.
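A minimal Python sketch of run length coding, assuming the data is a string and each run is emitted as a (symbol, run length) pair. The function names `rle_encode` and `rle_decode` are illustrative, not from the slides.

```python
from itertools import groupby

def rle_encode(data):
    """Describe each run of identical symbols as (symbol, run length)."""
    return [(sym, len(list(run))) for sym, run in groupby(data)]

def rle_decode(pairs):
    """Expand each (symbol, run length) pair back into the run."""
    return "".join(sym * n for sym, n in pairs)

print(rle_encode("AAAABBBCCD"))  # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
```

Runs of length 1 (like the final `D`) still cost a full pair, which is why run length coding only pays off when long runs are common.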

2 Data Coding Entropy
The entropy of a sequence of symbols is the theoretical minimum average number of bits needed to represent the sequence, given the probability of occurrence of each symbol. Suppose we ran a coin-tossing experiment, recording how often a head or a tail occurs: Prob(head) = 0.5, Prob(tail) = 0.5.
Log2(1/prob(symbol)) gives the optimum number of bits required to encode a symbol given its probability of occurrence:
Log2(1/prob(head)) = 1 (“0”)
Log2(1/prob(tail)) = 1 (“1”)
Multiplying each by the probability of occurrence of its symbol and summing gives the average number of bits required to encode a symbol:
[prob(head) x Log2(1/prob(head))] + [prob(tail) x Log2(1/prob(tail))] = [0.5 x 1] + [0.5 x 1] = 1
In practice, the number of symbols and their probabilities of occurrence are more complicated than in this simple example, so bits cannot be assigned to symbols intuitively. Instead, the following algorithms have been devised to approach this minimum: Shannon-Fano, Huffman, Arithmetic and Lempel-Ziv-Welch.
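The averaging step above generalizes to any list of probabilities: sum prob(s) x Log2(1/prob(s)) over all symbols. A small Python sketch (the function name `entropy` is illustrative):

```python
import math

def entropy(probs):
    """Average of Log2(1/p) weighted by p: minimum average bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit per toss
print(entropy([0.9, 0.1]))  # biased coin: about 0.47 bits per toss
# Symbol counts 15, 7, 6, 6, 5 out of 39 (the Huffman example that follows)
print(round(entropy([c / 39 for c in (15, 7, 6, 6, 5)]), 2))  # 2.19
```

The biased coin shows why intuitive assignment fails: no code that spends whole bits on each toss separately can get below 1 bit per toss, yet the entropy is under half a bit.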

3 Data Coding Huffman Coding
Encoding for Huffman Coding Algorithm (a bottom-up approach):
Initialization: Put all nodes in an OPEN list and keep it sorted at all times (e.g., ABCDE).
Repeat until the OPEN list has only one node left:
(a) From OPEN, pick the two nodes having the lowest frequencies/probabilities and create a parent node for them.
(b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it into OPEN.
(c) Assign codes 0 and 1 to the two branches of the tree, and delete the children from OPEN.
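The OPEN-list procedure can be sketched in Python with a min-heap standing in for the sorted list. This is one possible implementation, not the slides' own: each heap entry carries the partial codes of the symbols beneath it, and a counter breaks ties so the heap never compares those dictionaries. The counts used below are the A to E counts from the next slide.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build {symbol: code} by repeatedly merging the two lowest-weight nodes."""
    if len(freqs) == 1:                      # a lone symbol still needs 1 bit
        return {s: "0" for s in freqs}
    tie = count()                            # tie-breaker for equal weights
    open_list = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(open_list)                 # OPEN, kept sorted by weight
    while len(open_list) > 1:
        w0, _, left = heapq.heappop(open_list)   # lowest weight
        w1, _, right = heapq.heappop(open_list)  # second lowest
        # Parent node: prefix 0 on one branch, 1 on the other
        parent = {s: "0" + c for s, c in left.items()}
        parent.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(open_list, (w0 + w1, next(tie), parent))
    return open_list[0][2]

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_codes(freqs)
print(sum(freqs[s] * len(codes[s]) for s in freqs))  # 87 bits in total
```

Ties (C and D both have 6) can be broken either way; the individual codes may change, but the code lengths and the 87-bit total do not.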

4 Data Coding Huffman Coding: Encoding Algorithm
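Encoding for Huffman Coding Algorithm (with sorting after each merge), for the symbol counts A (15), B (7), C (6), D (6), E (5):
[Figure: Huffman tree built by the merges]
D (6) + E (5) → (11)
B (7) + C (6) → (13)
(11) + (13) → (24)
A (15) + (24) → (39), the root
Each branch is labelled 0 or 1, so A receives a 1-bit code and B, C, D and E receive 3-bit codes.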

5 Data Coding Huffman Coding: Encoding Algorithm
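Change of base, used to evaluate the logarithms with log10:
$$\log_e x = y \;\Rightarrow\; x = e^{y} \;\Rightarrow\; \log_{10} x = \log_{10} e^{y} = y \log_{10} e \;\Rightarrow\; y = \frac{\log_{10} x}{\log_{10} e}$$
(and in the same way $\log_2 x = \log_{10} x / \log_{10} 2$).
TOTAL (# of bits) for the 39 symbols, with code lengths A = 1 and B = C = D = E = 3: 15 × 1 + (7 + 6 + 6 + 5) × 3 = 87
$$\text{entropy} = \frac{1}{39}\sum_{s} \text{count}(s)\,\log_2\frac{39}{\text{count}(s)} = \frac{15 \times 1.38 + 7 \times 2.48 + 6 \times 2.70 + 6 \times 2.70 + 5 \times 2.96}{39} \approx \frac{85.25}{39} \approx 2.19 \text{ bits/symbol}$$
Number of bits needed for Huffman coding: 87 / 39 ≈ 2.23 bits/symbol.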

6 Data Coding Adaptive Huffman Coding: Motivations
The previous algorithms require a priori statistical knowledge, which is often not available (e.g., live audio, video). Even when it is available, sending it can be a heavy overhead, especially when many tables have to be sent because a non-order-0 model is used, i.e., one that takes into account the impact of the previous symbol on the probability of the current symbol (e.g., "q" and "u" often come together, ...). The solution is to use adaptive algorithms.

7 Data Coding Adaptive Huffman Coding: Algorithm
The encoder and decoder use exactly the same initialization and update_model routines.
ENCODER:
Initialize_model();
while ((c = getc(input)) != eof) { encode(c, output); update_model(c); }
DECODER:
Initialize_model();
while ((c = decode(input)) != eof) { putc(c, output); update_model(c); }
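The symmetry of the two loops can be illustrated with a deliberately naive Python sketch. It is not the incremental sibling-property update described on the following slides: here update_model only bumps a count and the whole Huffman tree is rebuilt before every symbol. Assumptions: the alphabet (at least two symbols) is known to both sides, every count starts at 1 so no escape code is needed, and the decoder is told the message length `n` instead of reading an eof symbol. All function names are illustrative.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Deterministic Huffman table, so both sides build the identical tree."""
    tie = count()
    heap = [(f, next(tie), {s: ""}) for s, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]

def initialize_model(alphabet):
    return {s: 1 for s in alphabet}      # every symbol starts with count 1

def update_model(model, c):
    model[c] += 1                        # tree is rebuilt from counts next time

def adaptive_encode(message, alphabet):
    model, out = initialize_model(alphabet), []
    for c in message:
        out.append(huffman_codes(model)[c])   # encode(c, output)
        update_model(model, c)
    return "".join(out)

def adaptive_decode(bits, alphabet, n):
    model, out, pos = initialize_model(alphabet), [], 0
    for _ in range(n):
        inverse = {code: s for s, code in huffman_codes(model).items()}
        code = ""
        while code not in inverse:       # prefix-free: first match is the symbol
            code += bits[pos]
            pos += 1
        c = inverse[code]
        out.append(c)                    # putc(c, output)
        update_model(model, c)
    return "".join(out)

bits = adaptive_encode("abracadabra", "abcdr")
print(adaptive_decode(bits, "abcdr", 11))  # abracadabra
```

Rebuilding the tree on every symbol is simple but slow; the point of the sibling-property swaps is to get the same kind of adaptivity with only a few local changes per symbol.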

8 Data Coding Adaptive Huffman Coding: Example
update_model does two things: (a) it increments the count, and (b) it updates the Huffman tree. During the updates, the Huffman tree maintains its sibling property, i.e., the nodes (internal and leaf) are arranged in order of increasing weight (see figure).
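The sibling property can be checked mechanically. The Python sketch below assumes a hypothetical representation: a list of (name, weight, parent name) tuples in node-number order, lowest first, with the root last. It tests the two conditions usually stated for the property: weights never decrease along the numbering, and nodes 1 and 2, 3 and 4, and so on share a parent. The test data is the static tree from slide 4; the internal-node names `n11`, `n13`, `n24`, `n39` are made up.

```python
def has_sibling_property(nodes):
    """nodes: [(name, weight, parent_name)] in node-number order, root last."""
    weights = [w for _, w, _ in nodes]
    if any(a > b for a, b in zip(weights, weights[1:])):
        return False                     # weights must not decrease
    body = nodes[:-1]                    # every node except the root
    if len(body) % 2:
        return False                     # non-root nodes come in sibling pairs
    # Nodes 2k-1 and 2k (1-based) must be siblings
    return all(body[i][2] == body[i + 1][2] for i in range(0, len(body), 2))

tree = [("E", 5, "n11"), ("D", 6, "n11"), ("C", 6, "n13"), ("B", 7, "n13"),
        ("n11", 11, "n24"), ("n13", 13, "n24"), ("A", 15, "n39"),
        ("n24", 24, "n39"), ("n39", 39, None)]
print(has_sibling_property(tree))  # True
```

When an update would break the first condition, the adaptive algorithm restores it by swapping nodes, as the next slide describes.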

9 Data Coding Adaptive Huffman Coding: Example (cont.)
When swapping is necessary, the farthest node with weight W is swapped with the node whose weight has just been increased to W + 1. Note: if the node with weight W has a subtree beneath it, the subtree goes with it. The Huffman tree can look very different after node swapping; e.g., in the third tree, node A is swapped again and becomes node #5. It is now encoded using only 2 bits.

10 Data Coding Adaptive Huffman Coding: Example (cont.)

11 Data Coding Arithmetic Coding: Algorithm
A message is represented by an interval of real numbers (floating point) between 0.0 and 1.0. As the message becomes longer, the interval needed to represent it becomes smaller, and the number of bits needed to specify that interval grows. A single number (floating point) in the interval can be uniquely decoded to recreate the exact stream of symbols that went into its construction.

12 Data Coding Arithmetic: Example
Consider the letters (a, e, i, o, u, EOS), where EOS represents the end of the message. As an example, the message "eaiiEOS" is coded.
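A Python sketch of the interval narrowing for this message. The slide's probability table did not survive, so the probabilities in `PROBS` are assumed purely for illustration. The sketch uses exact `Fraction` arithmetic to sidestep floating-point precision limits; practical coders instead use fixed-precision integers with renormalization, which is not shown here.

```python
from fractions import Fraction as F

# Hypothetical model: assumed probabilities, not taken from the slides
PROBS = {"a": F(2, 10), "e": F(3, 10), "i": F(1, 10),
         "o": F(2, 10), "u": F(1, 10), "EOS": F(1, 10)}

def cumulative(probs):
    """Give each symbol its sub-range [lo, hi) of [0, 1)."""
    ranges, lo = {}, F(0)
    for sym, p in probs.items():
        ranges[sym] = (lo, lo + p)
        lo += p
    return ranges

def arith_encode(symbols, probs):
    """Narrow [low, high) once per symbol; any number inside codes the message."""
    ranges = cumulative(probs)
    low, high = F(0), F(1)
    for s in symbols:
        width = high - low
        s_lo, s_hi = ranges[s]
        low, high = low + width * s_lo, low + width * s_hi
    return low, high

def arith_decode(value, probs):
    """Find the sub-range holding value, emit it, rescale; stop at EOS."""
    ranges = cumulative(probs)
    out = []
    while not out or out[-1] != "EOS":
        sym, (s_lo, s_hi) = next((s, r) for s, r in ranges.items()
                                 if r[0] <= value < r[1])
        out.append(sym)
        value = (value - s_lo) / (s_hi - s_lo)   # rescale into [0, 1)
    return out

low, high = arith_encode(["e", "a", "i", "i", "EOS"], PROBS)
print(float(low), float(high))      # 0.23354 0.2336
print(arith_decode(low, PROBS))     # ['e', 'a', 'i', 'i', 'EOS']
```

The EOS symbol is what tells the decoder to stop; without it, the decoder would also need the message length.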

13 Data Coding Arithmetic: Example

14 Data Coding Lempel-Ziv-Welch Compression Algorithms
Huffman and arithmetic coding assume a stationary source. Lempel-Ziv-Welch is an adaptive lossless coding technique that “learns” its symbols as it codes. The original methods are due to Ziv and Lempel (1977 and 1978); Terry Welch improved the scheme in 1984 (called LZW compression). The Lempel-Ziv family is used in, e.g., zip, gzip, pkzip, winzip, GIF, V.42 bis and Stacker; GIF and V.42 bis use LZW itself, while the zip/gzip family uses the LZ77-based DEFLATE method. Reference: Terry A. Welch, "A Technique for High Performance Data Compression", IEEE Computer, Vol. 17, No. 6, 1984, pp

15 Data Coding: Lempel-Ziv (LZ78) Binary Compression Algorithm
The LZ78 algorithm maintains a stack or tree containing all the phrases into which it has divided the portion of the data sequence parsed so far. The next phrase is formed by concatenating two items:
1. The phrase in the structure that achieves the longest match with the beginning of the as-yet-unparsed portion of the data.
2. The source datum just beyond the end of this maximal match.

16 Data Coding Lempel-Ziv LZ78 Binary Coding Example
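Input string (shown parsed into its LZ78 phrases): 0, 00, 1, 01, 10, 000, 010, 100, 1001, 0001, 001, …
[Figure: LZ78 tree]
Change of base for the logarithm:
$$\log_2 k = x \;\Rightarrow\; k = 2^{x} \;\Rightarrow\; \log_{10} k = x \log_{10} 2 \;\Rightarrow\; x = \frac{\log_{10} k}{\log_{10} 2}$$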
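A Python sketch of the LZ78 parse, run on the bit string obtained by concatenating the phrases listed above. Each phrase is recorded as (ancestor node number, appended bit), with node 0 as the root. The bit count uses an assumed cost model: phrase n points at one of n possible ancestors (the root plus n-1 earlier phrases), costing ceil(log2 n) bits, plus 1 bit for the appended datum. Function names are illustrative.

```python
def lz78_parse(bits):
    """Split a bit string into LZ78 phrases as (ancestor_index, new_bit) pairs."""
    index = {"": 0}                  # phrase -> node number; "" is the root
    pairs, current = [], ""
    for b in bits:
        if current + b in index:
            current += b             # keep extending the longest match
        else:
            pairs.append((index[current], b))
            index[current + b] = len(index)   # new node gets the next number
            current = ""
    if current:                      # input ended inside an existing phrase
        pairs.append((index[current], ""))
    return pairs

def phrases(pairs):
    """Rebuild the phrases from the pairs, as a decoder would."""
    table = [""]
    for ancestor, bit in pairs:
        table.append(table[ancestor] + bit)
    return table[1:]

def lz78_bits(pairs):
    """Assumed cost: ceil(log2 n) pointer bits for phrase n, plus its new bit."""
    return sum((n - 1).bit_length() + len(bit)
               for n, (_, bit) in enumerate(pairs, start=1))

pairs = lz78_parse("0001011000001010010010001001")
print(phrases(pairs))    # ['0', '00', '1', '01', '10', '000', ...]
print(lz78_bits(pairs))  # 40
```

`(n - 1).bit_length()` equals ceil(log2 n) for n ≥ 1, which avoids the floating-point change of base. On such a short input the pointers cost more than they save (40 bits out for 28 bits in); the gains come on long inputs.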

17 Data Coding LZ78S, LZ78E Compression Algorithm
The LZ78S algorithm improves compression performance by noting that when a node is specified for the second time as the ancestor node of the current phrase, both encoder and decoder immediately know that this phrase must end with the remaining descendant of that node rather than the descendant that was appended earlier (e.g., if the node has already been extended with a 0, the new phrase must extend it with a 1, so that bit need not be sent).
The LZ78E algorithm improves compression performance by noting that after phrase 3 has been parsed, the root node will never again be used as the ancestor node, because both its descendants are now in the tree. Likewise, after phrase 4 has been parsed, node 1 will never be used as the ancestor node, because both its descendants are now in the tree. These “dead” nodes no longer need to be coded.

18 Data Coding Lempel-Ziv-Welch Compression Algorithm
Given Webster's English dictionary, which contains about 159,000 entries, find a way to build the dictionary adaptively.
w = NIL;
while ( read a character k )
{
    if wk exists in the dictionary
        w = wk;
    else
    {
        add wk to the dictionary;
        output the code for w;
        w = k;
    }
}
output the code for w;   /* flush the final match */
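The encoder pseudocode translates almost line for line into Python. This sketch assumes the dictionary starts with the 256 single-byte characters (codes 0 to 255) and that new entries are numbered from 256; it outputs plain integers, so codes below 256 are characters. The input is the example string from the next slide.

```python
def lzw_encode(text):
    """LZW encoder: returns a list of integer codes."""
    dictionary = {chr(i): i for i in range(256)}   # single characters
    next_code = 256
    w, output = "", []                             # w = NIL
    for k in text:
        wk = w + k
        if wk in dictionary:
            w = wk                          # keep extending the match
        else:
            dictionary[wk] = next_code      # add wk to the dictionary
            next_code += 1
            output.append(dictionary[w])    # output the code for w
            w = k
    if w:
        output.append(dictionary[w])        # flush the final match
    return output

codes = lzw_encode("^WED^WE^WEE^WEB^WET")
print("".join(chr(c) if c < 256 else f"<{c}>" for c in codes))
# ^WED<256>E<260><261><257>B<260>T
```

Without the final flush, the trailing `T` would be lost.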

19 Data Coding Lempel-Ziv-Welch Coding: Example
Input string is "^WED^WE^WEE^WEB^WET". The 19-symbol input has been reduced to an output of 7 symbols plus 5 codes. Each code/symbol will need more than 8 bits, say 9 bits, so the 12 outputs take 108 bits against 152 bits (19 × 8) for the input. Usually, compression doesn't start until a large number of bytes (e.g., > 100) have been read in.

20 Data Coding Lempel-Ziv-Welch Decompression Algorithm
read a character k;
output k;
w = k;
while ( read a character k )   /* k could be a character or a code. */
{
    if k is in the dictionary
        entry = dictionary entry for k;
    else                       /* k is the code about to be added */
        entry = w + w[0];
    output entry;
    add w + entry[0] to the dictionary;
    w = entry;
}
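A Python sketch of the decoder, using the same assumed numbering as a byte-based encoder (characters 0 to 255, new entries from 256). It includes the special case where a code arrives one step before the decoder has added it to its dictionary; this happens for inputs such as "aaaa", which encodes to 97, 256, 97.

```python
def lzw_decode(codes):
    """LZW decoder: rebuilds the dictionary while reading integer codes."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]                 # first code is always a character
    output = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == next_code:
            entry = w + w[0]                 # code defined by this very step
        else:
            raise ValueError(f"invalid LZW code {k}")
        output.append(entry)
        dictionary[next_code] = w + entry[0] # add w + entry[0]
        next_code += 1
        w = entry
    return "".join(output)

codes = [ord(c) for c in "^WED"] + [256, ord("E"), 260, 261, 257,
                                    ord("B"), 260, ord("T")]
print(lzw_decode(codes))  # ^WED^WE^WEE^WEB^WET
```

The decoder never receives the dictionary; it reconstructs each entry one step behind the encoder.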

21 Data Coding: Lempel-Ziv-Welch Decoding Example
Input string is "^WED<256>E<260><261><257>B<260>T", which decodes back to "^WED^WE^WEE^WEB^WET".

22 Data Coding Binary Lempel-Ziv-Welch Coding: Example
Input string is “ ”.

