Download presentation
Published byJesse Potter Modified over 8 years ago
1
Lecture 7 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan
Room: C3-222, ext: 1204, 1 1 1
2
Dictionary Techniques
Statistical compression methods Use the statistical model of the data, and the quality of compression they achieve depends on how good that model is. Dictionary-based compression methods Do not use a statistical model The dictionary holds strings of symbols and it may be static or dynamic (adaptive). Static is permanent (allowing the addition of strings but no deletions). Dynamic holds strings previously found in the input stream, allowing for additions and deletions of strings as new input is being read. 2 2
3
Dictionary Techniques
So far we assumed independent symbol with some statistical model: Not true for many common data types, e.g.: text, images, code the quality of compression they achieve depends on how good that model is Basic idea (ADAPTIVE) Identify frequent symbol patterns. Encode those frequent symbols more efficiently. Use these encoded pattern as a default (less efficient) encoding for the rest! Notes This looks reasonable for things like “text, image” … Also, it looks reasonable for things which is not very random. 3 3
4
Lempel-Ziv Compression Techniques
Static coding requires two-passes: one pass to compute probabilities (or frequencies) and determine the mapping, and a second pass to encode. Examples of Static techniques: Static Huffman Coding All of the adaptive methods are one-pass methods; only one scan of the message (to encode) is required. Examples of adaptive techniques: LZ77?, LZ78?, and Adaptive Huffman Coding 4 4
5
Lempel-Ziv Compression Techniques
LZ77 (Sliding Window) Variants: LZSS (Lempel-Ziv-Storer-Szymanski) Applications: gzip, Squeeze, LHA, PKZIP, ZOO LZ78 (Dictionary Based) Variants: LZW (Lempel-Ziv-Welch), LZC (Lempel-Ziv-Compress) Applications: compress, ARC , V.42bis (LZ78), gzip, zlib, GIF, CCITT (modems), PAK (LZ77) Traditionally: LZ77 was better but slower, but the gzip version is almost as fast as any LZ78. 5 5
6
Example Dictionary All/some letters from the alphabet +
As many diagrams(pairs/group of letters) as possible Example: A= {a, b, c, d, r} 3 pairs: ab, ac, ad Encode the following: abracadabra 6 6
7
Example 7 7
8
Example 8 8
9
Example 9 9
10
Example 10 10
11
Example 11 11
12
Example 12 12
13
Example 13 13
14
Example 14 14
15
Example 15 15
16
LZ78 Compression Algorithm
LZ78 inserts one- or multi-character, distinct patterns of the message to be encoded in a Dictionary. The multi-character patterns are of the form: C0C Cn-1Cn. The prefix of a pattern refers to all the pattern characters except the last: C0C Cn-1 LZ78 Output: Note: The dictionary is usually implemented as a hash table. 16
17
LZ78 Compression Algorithm (cont’d)
Dictionary empty ; Prefix empty ; DictionaryIndex 1; while(characterStream is not empty) { Char next character in characterStream; if(Prefix + Char exists in the Dictionary) Prefix Prefix + Char ; else if(Prefix is empty) CodeWordForPrefix 0 ; CodeWordForPrefix DictionaryIndex for Prefix ; Output: (CodeWordForPrefix, Char) ; insertInDictionary( ( DictionaryIndex , Prefix + Char) ); DictionaryIndex++ ; } if(Prefix is not empty) CodeWordForPrefix DictionaryIndex for Prefix; Output: (CodeWordForPrefix , ) ; 17
18
Example 1: LZ78 Compression
Encode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ78 algorithm. The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B) Note: The above is just a representation, the commas and parentheses are not transmitted; we will discuss the actual form of the compressed message later! 18
19
Example 1: LZ78 Compression (cont’d)
How to solve it :-) 1. A is not in the Dictionary; insert it 2. B is not in the Dictionary; insert it 3. B is in the Dictionary. BC is not in the Dictionary; insert it. 4. B is in the Dictionary. BC is in the Dictionary. BCA is not in the Dictionary; insert it. 5. B is in the Dictionary. BA is not in the Dictionary; insert it. 6. B is in the Dictionary. BCA is in the Dictionary. BCAA is not in the Dictionary; insert it. 7. B is in the Dictionary, BC is in the Dictionary, BCA is in the Dictionary, BCAA is in the Dictionary. However, BCAAB is not in the Dictionary; insert it. 19
20
Example 2: LZ78 Compression
Encode (i.e., compress) the string BABAABRRRA using the LZ78 algorithm. The compressed message is: (0,B)(0,A)(1,A)(2,B)(0,R)(5,R)(2, ) 20
21
Example 2: LZ78 Compression (cont’d)
1. B is not in the Dictionary; insert it 2. A is not in the Dictionary; insert it 3. B is in the Dictionary. BA is not in the Dictionary; insert it. 4. A is in the Dictionary. AB is not in the Dictionary; insert it. 5. R is not in the Dictionary; insert it. 6. R is in the Dictionary. RR is not in the Dictionary; insert it. 7. A is in the Dictionary and it is the last input character; output a pair containing its index: (2, ) 21
22
Example 3: LZ78 Compression
Encode (i.e., compress) the string AAAAAAAAA using the LZ78 algorithm. 1. A is not in the Dictionary; insert it 2. A is in the Dictionary AA is not in the Dictionary; insert it 3. A is in the Dictionary. AA is in the Dictionary. AAA is not in the Dictionary; insert it. 4. A is in the Dictionary. AAA is in the Dictionary and it is the last pattern; output a pair containing its index: (3, ) 22
23
LZ78 Compression: Number of bits transmitted
Example: Uncompressed String: ABBCBCABABCAABCAAB Number of bits = Total number of characters * 8 = 18 * 8 = 144 bits Suppose the codewords are indexed starting from 1: Compressed string( codewords): (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B) Codeword index Each code word consists of an integer and a character: The character is represented by 8 bits. The number of bits n required to represent the integer part of the codeword with index i is given by: Alternatively number of bits required to represent the integer part of the codeword with index i is the number of significant bits required to represent the integer i – 1 23
24
LZ78 Compression: Number of bits transmitted
Codeword (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B) index Bits: (1 + 8) + (1 + 8) + (2 + 8) + (2 + 8) + (3 + 8) + (3 + 8) + (3 + 8) = 70 bits The actual compressed message is: 0A0B10C11A010A100A110B where each character is replaced by its binary 8-bit ASCII code. 24
25
LZ78 Decompression Algorithm (self study)
Dictionary empty ; DictionaryIndex 1 ; while(there are more (CodeWord, Char) pairs in codestream){ CodeWord next CodeWord in codestream ; Char character corresponding to CodeWord ; if(CodeWord = = 0) String empty ; else String string at index CodeWord in Dictionary ; Output: String + Char ; insertInDictionary( (DictionaryIndex , String + Char) ) ; DictionaryIndex++; } 25
26
LZ78 Decompression Algorithm (self study)
input: (CI, character) pairs output: if(CI == 0) output: currentCharacter else output: stringAtIndex CI + currentCharacter Insert: current output in dictionary 26
27
Example 1: LZ78 Decompression
Decode (i.e., decompress) the sequence (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B) The decompressed message is: ABBCBCABABCAABCAAB 27
28
Example 2: LZ78 Decompression
Decode (i.e., decompress) the sequence (0, B) (0, A) (1, A) (2, B) (0, R) (5, R) (2, ) The decompressed message is: BABAABRRRA 28
29
Example 3: LZ78 Decompression
Decode (i.e., decompress) the sequence (0, A) (1, A) (2, A) (3, ) The decompressed message is: AAAAAAAAA 29
30
Exercises 1. Use LZ78 to trace encoding the string SATATASACITASA.
2. Write a MATLAB program that encodes a given string using LZ78. 3. Write a MATLAB program that decodes a given set of encoded codewords using LZ78. 30
31
LZ77: Sliding Window Lempel-Ziv
(o)ffset= search_ptr – match_ptr = 7 (l)ength= number of consecutive letters matched = 4 (c)odeword(r) Encoding Notes <o, l, c> = <7, 4, C(‘r’)> |search buff| = S, S + |LA buff| = |W(indow)| Page31 31
32
LZ77: Example (0,0,a) (1,1,c) (3,3,a) (0,0,b) (3,3,a) (1,1,a)
|LA buf| a c b (1,1,c) a c b (3,3,a) |search buff| a c b (0,0,b) O = 3 a c b (3,3,a) | search buff | | LA buff | O=1 a c b (1,1,a) (S)earch Buffer (size = 6) Page32 l (Length of matched Letters) Next character 32
33
LZ77 Decoding Decoder keeps the same dictionary window as encoder.
For each message it looks it up in the dictionary and inserts a copy (which one at the initialization?) What if l > o? (only part of the message is in the dictionary.) E.g. input = abcd, codeword = (2,9,e) Simply copy starting at the cursor for (i = 0; i < length; i++) out[cursor+i] = out[cursor-offset+i] Out = abcdcdcdcdcdce Page33 33
34
Comparison to Lempel-Ziv 78
Both LZ77 and LZ78 and their variants keep a “dictionary” of recent strings that have been seen. The differences are: How the dictionary is stored (LZ78 is a table! However, LZ77 is decoded online) How it is extended (both extends an existing entry by one character) How it is indexed (LZ78 indexes the nodes of the table) Page34 34
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.