Lempel-Ziv Compression Techniques

Slides:



Advertisements
Similar presentations
15-583:Algorithms in the Real World
Advertisements

Source Coding Data Compression A.J. Han Vinck. DATA COMPRESSION NO LOSS of information and exact reproduction (low compression ratio 1:4) general problem.
CSCI 3280 Tutorial 6. Outline  Theory part of LZW  Tree representation of LZW  Table representation of LZW.
Greedy Algorithms (Huffman Coding)
Lempel-Ziv-Welch (LZW) Compression Algorithm
Algorithms for Data Compression
Lecture 6 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan
Lossless Compression - II Hao Jiang Computer Science Department Sept. 18, 2007.
Algorithm Programming Some Topics in Compression Bar-Ilan University תשס"ח by Moshe Fresko.
Huffman Encoding 16-Apr-17.
Introduction to Data Compression
Lempel-Ziv Compression Techniques Classification of Lossless Compression techniques Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 LZ78 Encoding Algorithm.
Compression JPG compression, Source: Original 10:1 Compression 45:1 Compression.
1 Lempel-Ziv algorithms Burrows-Wheeler Data Compression.
A Data Compression Algorithm: Huffman Compression
Algorithm Programming Some Topics in Compression
Lempel-Ziv-Welch (LZW) Compression Algorithm
Lempel-Ziv Compression Techniques
Source Coding Hafiz Malik Dept. of Electrical & Computer Engineering The University of Michigan-Dearborn
Data Compression Basics & Huffman Coding
Lossless Compression Multimedia Systems (Module 2 Lesson 3)
Management Information Systems Lection 06 Archiving information CLARK UNIVERSITY College of Professional and Continuing Education (COPACE)
Dale & Lewis Chapter 3 Data Representation
Lecture 10 Data Compression.
Chapter 2 Source Coding (part 2)
Noiseless Coding. Introduction Noiseless Coding Compression without distortion Basic Concept Symbols with lower probabilities are represented by the binary.
Text Compression Spring 2007 CSE, POSTECH. 2 2 Data Compression Deals with reducing the size of data – Reduce storage space and hence storage cost Compression.
Source Coding-Compression
296.3Page 1 CPS 296.3:Algorithms in the Real World Data Compression: Lecture 2.5.
Fundamental Structures of Computer Science Feb. 24, 2005 Ananda Guna Lempel-Ziv Compression.
Lecture 29. Data Compression Algorithms 1. Commonly, algorithms are analyzed on the base probability factor such as average case in linear search. Amortized.
Fundamental Structures of Computer Science March 23, 2006 Ananda Guna Lempel-Ziv Compression.
Fundamental Data Structures and Algorithms Aleks Nanevski February 10, 2004 based on a lecture by Peter Lee LZW Compression.
 The amount of data we deal with is getting larger  Not only do larger files require more disk space, they take longer to transmit  Many times files.
Survey on Improving Dynamic Web Performance Guide:- Dr. G. ShanmungaSundaram (M.Tech, Ph.D), Assistant Professor, Dept of IT, SMVEC. Aswini. S M.Tech CSE.
Multimedia Data Introduction to Lossless Data Compression Dr Sandra I. Woolley Electronic, Electrical.
The LZ family LZ77 LZ78 LZR LZSS LZB LZH – used by zip and unzip
Addressing Image Compression Techniques on current Internet Technologies By: Eduardo J. Moreira & Onyeka Ezenwoye CIS-6931 Term Paper.
Lossless Compression CIS 465 Multimedia. Compression Compression: the process of coding that will effectively reduce the total number of bits needed to.
Fundamental Data Structures and Algorithms Margaret Reid-Miller 24 February 2005 LZW Compression.
Lecture 4: Lossless Compression(1) Hongli Luo Fall 2011.
CS654: Digital Image Analysis Lecture 34: Different Coding Techniques.
Lecture 7 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan
compress! From theoretical viewpoint...
Lempel-Ziv-Welch Compression
Lampel ZIV (LZ) code The Lempel-Ziv algorithm is a variable-to-fixed length code Basically, there are two versions of the algorithm LZ77 and LZ78 are the.
Lecture 12 Huffman Coding King Fahd University of Petroleum & Minerals College of Computer Science & Engineering Information & Computer Science Department.
LZW (Lempel-Ziv-welch) compression method The LZW method to compress data is an evolution of the method originally created by Abraham Lempel and Jacob.
15-853Page :Algorithms in the Real World Data Compression III Lempel-Ziv algorithms Burrows-Wheeler Introduction to Lossy Compression.
CS 1501: Algorithm Implementation
Computer Sciences Department1. 2 Data Compression and techniques.
Page 1 Algorithms in the Real World Lempel-Ziv Burroughs-Wheeler ACB.
Data Compression: Huffman Coding in Weiss (p.389)
Unit 13 Data Compression King Fahd University of Petroleum & Minerals College of Computer Science & Engineering Information & Computer Science Department.
CSE 589 Applied Algorithms Spring 1999
Textbook does not really deal with compression.
Compression & Huffman Codes
Data Compression.
Lempel-Ziv-Welch (LZW) Compression Algorithm
Applied Algorithmics - week7
Lempel-Ziv Compression Techniques
Lempel-Ziv-Welch (LZW) Compression Algorithm
Lempel-Ziv-Welch (LZW) Compression Algorithm
Why Compress? To reduce the volume of data to be transmitted (text, fax, images) To reduce the bandwidth required for transmission and to reduce storage.
Lempel-Ziv Compression Techniques
Chapter 11 Data Compression
Table 3. Decompression process using LZW
CPS 296.3:Algorithms in the Real World
Lempel-Ziv-Welch (LZW) Compression Algorithm
Presentation transcript:

Lempel-Ziv Compression Techniques Introduction to Run-Length Encoding Introduction to Lempel-Ziv Encoding: LZ77 & LZ78 Example 1: Encoding using LZ78 Example 2: Decoding using LZ78

Introduction to Run-Length Encoding A run is defined as a sequence of identical characters. Run-length encoding assumes data has many runs. Each character that repeats itself in a sequence three or more times is replaced by the single character and its frequency. Data to be transmitted is first scanned to find the coding information before transmission. For example, the message AAAABBBAABBBBBCCCCCCCCDABCBAAABBBBCCCD is replaced by 4A3BAA5B8CDABCB3A4B3CD Saving can be dramatic when long runs are present.

Run-Length Encoding: Possible Problems A problem arises if one of the characters being transmitted is a digit, as in 11111111111544444 which is represented as 1111554 (eleven 1s, one 5, five 4s). A solution is instead of using a number n, a character can be used whose ASCII value is n. For example, the run of 43 consecutive letters “c” is represented as +c (“+” has ASSCII code 43), and the run of 49 1s is coded as 11 (“1” has ASCII code 49). This method is only modestly efficient for text files and widely used for encoding fax images. The main advantage of this algorithm is simplicity and speed.

Introduction to Lempel-Ziv Encoding Data compression up until the late 1970's mainly directed towards creating better methodologies for Huffman coding. An innovative, radically different method was introduced in1977 by Abraham Lempel and Jacob Ziv. This technique (called Lempel-Ziv) actually consists of two considerably different algorithms, LZ77 and LZ78. Due to patents, LZ77 and LZ78 led to many variants: The zip and unzip use the LZH techique while UNIX's compress methods belong to the LZW and LZC classes. LZH LZB LZSS LZR LZ77 Variants LZFG LZJ LZMW LZT LZC LZW LZ78 Variants

LZ78 Compression Algorithm Start with empty dictionary P = empty WHILE (there more characters in the char stream) C = next character in the char stream IF (the string P+C in the dictionary) P = P+C ELSE //(if P is empty,zero is its codeword) output the code word corresponding to P output C add the string P+C to the dictionary IF (P is not empty) END

Example 1: LZ78 Compression 1 Example 1: Use the LZ78 algorithm to encode the message ABBCBCABABCAABCAAB Solution: The encoding process is presented below in which: The column STEP indicates the number of the encoding step. The column POS indicates the current position in the input data. The column DICTIONARY shows what string has been added to the dictionary. The index of the string is equal to the step number. The column OUTPUT presents the output in the form (W,C). W represents the index of prefix in the dictionary. The output of each step decodes to the string that has been added to the dictionary.

Example 1: LZ78 Compression Step 1 ABBCBCABABCAABCAAB POS = 1 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) 1 1

Example 1: LZ78 Compression Step 2 ABBCBCABABCAABCAAB POS = 2 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) 1 1 B (0,B) 2 2

Example 1: LZ78 Compression Step 3 ABBCBCABABCAABCAAB POS = 3 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) 1 1 B (0,B) 2 2 BC (2,C) 3 3

Example 1: LZ78 Compression Step 4 ABBCBCABABCAABCAAB POS = 5 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) 1 1 B (0,B) 2 2 BC (2,C) 3 3 BCA (3,A) 4 5

Example 1: LZ78 Compression Step 5 ABBCBCABABCAABCAAB POS = 8 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) 1 1 B (0,B) 2 2 BC (2,C) 3 3 BCA (3,A) 4 5 BA (2,A) 5 8

Example 1: LZ78 Compression Step 6 ABBCBCABABCAABCAAB POS = 10 C = A P = BCA DICTIONARY OUTPUT STEP POS A (0,A) 1 1 B (0,B) 2 2 BC (2,C) 3 3 BCA (3,A) 4 5 BA (2,A) 5 8 BCAA (4,A) 6 10

Example 1: LZ78 Compression Step 7 ABBCBCABABCAABCAAB POS = 14 C = empty P = empty DICTIONARY OUTPUT STEP POS A (0,A) 1 1 B (0,B) 2 2 BC (2,C) 3 3 BCA (3,A) 4 5 BA (2,A) 5 8 BCAA (4,A) 6 10 BCAAB (6,B) 7 14

Example 1: Coded Information in Bits Now, we calculate the number of bits needed to represent the coded information <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> The number of bits needed to represent each integer with index n is at most equal to the number of bits needed to represent the (n-1)th index. For example, the number of bits needed to represent the integer 3 within index 4 is equal to 2, because it takes two bits to express 3 (the (4-1)th index) in binary. Thus, we see that the number of bits required when the string ABBCBCABABCAABCAAB is compressed is (1+8)+(1+8)+(2+8)+(2+8)+(2+8)+(3+8)+(3+8) = 70 bits

LZ78 Decompression Algorithm At the start the dictionary is empty While (more code words in the code stream) W = code word C = character following it output the string of W + C add the string of W + C to the dictionary end

Example 2: LZ78 Decompression Example 2: Use the above algorithm to decompress the compressed output sequence of Example 1, namely, <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)>

Example 2: LZ78 Decompression Step 1 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1

Example 2: LZ78 Decompression Step 2 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2

Example 2: LZ78 Decompression Step 3 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3

Example 2: LZ78 Decompression Step 4 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3 BCA (3,A) 5 4

Example 2: LZ78 Decompression Step 5 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3 BCA (3,A) 5 4 BA (2,A) 8

Example 2: LZ78 Decompression Step 6 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3 BCA (3,A) 5 4 BA (2,A) 8 BCAA (4,A) 10 6

Example 2: LZ78 Decompression Step 7 <(0,A)><(0,B)><(2,C)><(3,A)><(2,A)><(4,A)><(6,B)> W=empty C=empty DICTIONARY OUTPUT INPUT POS STEP A (0,A) 1 B (0,B) 2 BC (2,C) 3 BCA (3,A) 5 4 BA (2,A) 8 BCAA (4,A) 10 6 BCAAB (6,B) 14 7

LZ78: Concluding Remarks LZ77 is based on a sliding window. Its output is very similar to that LZ78. The compression ratio of LZ77 is similar to that of LZ78. The biggest advantage of LZ78 over the LZ77 algorithm is the reduced number of string comparisons in each encoding step. More application areas of LZ77 and LZ78: LZ77: gzip, Squeeze, LHA, PKZIP, ZOO LZ78: compress, GIF, CCITT (modems), ARC, PAK

Exercises How many bits are required to represent the codeword (2,C) generated on page 11? How many bits are required to represent the associated string without compression? Use LZ78 to trace encoding the string SATATASACITASA. Write a Java program that encodes a given string using LZ78. Write a Java program that decodes a given set of encoded code words using LZ78.