Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lempel-Ziv Compression Techniques

Similar presentations


Presentation on theme: "Lempel-Ziv Compression Techniques"— Presentation transcript:

1 Lempel-Ziv Compression Techniques
Introduction to Run-Length Encoding Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 Example 1: Encoding using LZ78 Example 2: Decoding using LZ78

2 Introduction to Run-Length Encoding
A run is defined as a sequence of identical characters. Run-length encoding assumes data has many runs. Each character that repeats itself in a sequence three or more times is replaced by the single character and its frequency. Data to be transmitted is first scanned to find the coding information before transmission. For example, the message AAAABBBAABBBBBCCCCCCCCDABCBAAABBBBCCCD is replaced by 4A3BAA5B8CDABCB3A4B3CD Saving can be dramatic when long runs are present.

3 Run-Length Encoding: Possible Problems
A problem arises if one of the characters being transmitted is a digit, as in which is represented as (eleven 1s, one 5, five 4s). A solution is instead of using a number n, a character can be used whose ASCII value is n. For example, the run of 43 consecutive letters “c” is represented as +c (“+” has ASSCII code 43), and the run of 49 1s is coded as 11 (“1” has ASCII code 49). This method is only modestly efficient for text files and widely used for encoding fax images. The main advantage of this algorithm is simplicity and speed.

4 Introduction to Lempel-Ziv Encoding
Data compression up until the late 1970's mainly directed towards creating better methodologies for Huffman coding. An innovative, radically different method was introduced in1977 by Abraham Lempel and Jacob Ziv. This technique (called Lempel-Ziv) actually consists of two considerably different algorithms, LZ77 and LZ78. Due to patents, LZ77 and LZ78 led to many variants: The zip and unzip use the LZH techique while UNIX's compress methods belong to the LZW and LZC classes. LZH LZB LZSS LZR LZ77 Variants LZFG LZJ LZMW LZT LZC LZW LZ78 Variants

5 LZ78 Compression Algorithm
Start with empty dictionary P = empty WHILE (there more characters in the char stream) C = next character in the char stream IF (the string P+C in the dictionary) P = P+C ELSE //(if P is empty,zero is its codeword) output the code word corresponding to P output C add the string P+C to the dictionary IF (P is not empty) END

6 Example 1: LZ78 Compression 1
Example 1: Use the LZ78 algorithm to encode the message ABBCBCABABCAABCAAB Solution: The encoding process is presented below in which: The column STEP indicates the number of the encoding step. The column POS indicates the current position in the input data. The column DICTIONARY shows what string has been added to the dictionary. The index of the string is equal to the step number. The column OUTPUT presents the output in the form (W,C). W represents the index of prefix in the dictionary. The output of each step decodes to the string that has been added to the dictionary.

7 Example 1: LZ78 Compression Step 1
ABBCBCABABCAABCAAB POS = 1 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A)

8 Example 1: LZ78 Compression Step 2
ABBCBCABABCAABCAAB POS = 2 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) B (0,B)

9 Example 1: LZ78 Compression Step 3
ABBCBCABABCAABCAAB POS = 3 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) B (0,B) BC (2,C)

10 Example 1: LZ78 Compression Step 4
ABBCBCABABCAABCAAB POS = 5 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) B (0,B) BC (2,C) BCA (3,A)

11 Example 1: LZ78 Compression Step 5
ABBCBCABABCAABCAAB POS = 8 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) B (0,B) BC (2,C) BCA (3,A) BA (2,A)

12 Example 1: LZ78 Compression Step 6
ABBCBCABABCAABCAAB POS = 10 C = A P = BCA DICTIONARY OUTPUT STEP POS A (0,A) B (0,B) BC (2,C) BCA (3,A) BA (2,A) BCAA (4,A)

13 Example 1: LZ78 Compression Step 7
ABBCBCABABCAABCAAB POS = 14 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) B (0,B) BC (2,C) BCA (3,A) BA (2,A) BCAA (4,A) BCAAB (6,B)

14 Example 1: Coded Information in Bits
Now, we calculate the number of bits needed to represent the coded information <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> The number of bits needed to represent each integer with index n is at most equal to the number of bits needed to represent the (n-1)th index. For example, the number of bits needed to represent the integer 3 within index 4 is equal to 2, because it takes two bits to express 3 (the (4-1)th index) in binary. Thus, we see that the number of bits required when the string ABBCBCABABCAABCAAB is compressed is (1+8)+(1+8)+(2+8)+(2+8)+(2+8)+(3+8)+(3+8) = 70 bits

15 LZ78 Decompression Algorithm
At the start the dictionary is empty While (more code words in the code stream) W = code word C = character following it output the string of W + C add the string of W + C to the dictionary end

16 Example 2: LZ78 Decompression
Example 2: Use the above algorithm to decompress the compressed output sequence of Example 1, namely, <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

17 Example 2: LZ78 Decompression Step 1
<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1

18 Example 2: LZ78 Decompression Step 2
<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2

19 Example 2: LZ78 Decompression Step 3
<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3

20 Example 2: LZ78 Decompression Step 4
<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3 BCA (3,A) 5 4

21 Example 2: LZ78 Decompression Step 5
<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3 BCA (3,A) 5 4 BA (2,A) 8

22 Example 2: LZ78 Decompression Step 6
<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3 BCA (3,A) 5 4 BA (2,A) 8 BCAA (4,A) 10 6

23 Example 2: LZ78 Decompression Step 7
<(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3 BCA (3,A) 5 4 BA (2,A) 8 BCAA (4,A) 10 6 BCAAB (6,B) 14 7

24 LZ78: Concluding Remarks
LZ77 is based on a sliding window. Its output is very similar to that LZ78. The compression ratio of LZ77 is similar to that of LZ78. The biggest advantage of LZ78 over the LZ77 algorithm is the reduced number of string comparisons in each encoding step. More application areas of LZ77 and LZ78: LZ77: gzip, Squeeze, LHA, PKZIP, ZOO LZ78: compress, GIF, CCITT (modems), ARC, PAK

25 Exercises How many bits are required to represent the codeword (2,C) generated on page 11? How many bits are required to represent the associated string without compression? Use LZ78 to trace encoding the string SATATASACITASA. Write a Java program that encodes a given string using LZ78. Write a Java program that decodes a given set of encoded code words using LZ78.


Download ppt "Lempel-Ziv Compression Techniques"

Similar presentations


Ads by Google