Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS 1501: Algorithm Implementation

Similar presentations


Presentation on theme: "CS 1501: Algorithm Implementation"— Presentation transcript:

1 CS 1501: Algorithm Implementation
LZW Data Compression

2 Data Compression Reduce the size of data
Reduces storage space and hence the cost Reduces time to retrieve and transmit data Compression ratio = original data size / compressed data size Retrieval time from disk or Internet.

3 Lossless and Lossy Compression
compressedData = compress(originalData) decompressedData = decompress(compressedData) Lossless compression  originalData = decompressedData Lossy compression  originalData  decompressedData

4 Lossless and Lossy Compression
Lossy compressors generally obtain much higher compression ratios as compared to lossless compressors e.g vs. 2 Lossless compression is essential in applications such as text file compression. Lossy compression is acceptable in many audio and imaging applications In video transmission, a slight loss in the transmitted video is not noticable by the human eye. In some applications, such as medical images, local laws may forbid the use of lossy compression even though you may not be able to visually see the difference between the original and lossy images.

5 Text Compression LZW Compression Lossless compression is essential
Popular text compressors such as zip and Unix’s compress are based on the LZW (Lempel-Ziv-Welch) method. LZW Compression Character sequences in the original text are replaced by codes that are dynamically determined. The code table is not encoded into the compressed text, because it may be reconstructed from the compressed text during decompression.

6 LZW Compression Assume the letters in the text are limited to {a, b}
In practice, the alphabet may be the 256 character ASCII set. The characters in the alphabet are assigned code numbers beginning at 0 The initial code table is: code key a 1 b

7 Compression public static void compress() {
String input = BinaryStdIn.readString(); TST<Integer> st = new TST<Integer>(); for (int i = 0; i < R; i++) st.put("" + (char) i, i); int code = R+1; // R is codeword for EOF while (input.length() > 0) { String s = st.longestPrefixOf(input); // Find max prefix match s. BinaryStdOut.write(st.get(s), W); // Print s's encoding. int t = s.length(); if (t < input.length() && code < L) // Add s to symbol table. st.put(input.substring(0, t + 1), code++); input = input.substring(t); // Scan past s in input. } BinaryStdOut.write(R, W); BinaryStdOut.close(); Compression takes O(n) expected time, because the expected complexity of the hash table operations in O(1).

8 LZW Compression code key a 1 b Original text = abababbabaabbabbaabba
a 1 b Original text = abababbabaabbabbaabba Compression is done by scanning the original text from left to right. Find longest prefix p for which there is a code in the code table. Represent p by its code pCode and assign the next available code number to pc, where c is the next character in the text to be compressed.

9 LZW Compression Original text = abababbabaabbabbaabba
code key a 1 b 2 ab Original text = abababbabaabbabbaabba p = a pCode = Compressed text = 0 c = b Enter p | c = ab into the code table.

10 Original text = abababbabaabbabbaabba Compressed text = 0
code key a 1 b 2 ab 3 ba Original text = abababbabaabbabbaabba Compressed text = 0 p = b pCode = Compressed text = 01 c = a Enter p | c = ba into the code table.

11 Original text = abababbabaabbabbaabba Compressed text = 01
code key a 1 b 2 ab 3 4 aba ba Original text = abababbabaabbabbaabba Compressed text = 01 p = ab pCode = Compressed text = 012 c = a Enter aba into the code table

12 Original text = abababbabaabbabbaabba Compressed text = 012
code key a 1 b 2 ab 3 4 5 abb ba aba Original text = abababbabaabbabbaabba Compressed text = 012 p = ab pCode = Compressed text = 0122 c = b Enter abb into the code table.

13 Original text = abababbabaabbabbaabba Compressed text = 0122
code key a 1 b 2 ab 3 4 5 6 bab ba aba abb Original text = abababbabaabbabbaabba Compressed text = 0122 p = ba pCode = Compressed text = 01223 c = b Enter bab into the code table.

14 Original text = abababbabaabbabbaabba Compressed text = 01223
code key a 1 b 2 ab 3 4 5 6 bab 7 baa ba aba abb Original text = abababbabaabbabbaabba Compressed text = 01223 p = ba pCode = Compressed text = c = a Enter baa into the code table.

15 Original text = abababbabaabbabbaabba Compressed text = 012233
code key a 1 b 2 ab 3 4 5 6 bab 7 baa 8 abba ba aba abb Original text = abababbabaabbabbaabba Compressed text = p = abb pCode = Compressed text = c = a Enter abba into the code table.

16 Original text = abababbabaabbabbaabba Compressed text = 0122335
code key a 1 b 2 ab 3 4 5 6 bab 7 baa 8 abba 9 abbaa ba aba abb Original text = abababbabaabbabbaabba Compressed text = p = abba pCode = 8 Compressed text = c = a Enter abbaa into the code table.

17 Original text = abababbabaabbabbaabba Compressed text = 01223358
code key a 1 b 2 ab 3 4 5 6 bab 7 baa 8 abba 9 abbaa ba aba abb Original text = abababbabaabbabbaabba Compressed text = p = abba pCode = Compressed text = c = null No need to enter anything to the table

18 Code Table Representation
key a 1 b 2 ab 3 4 5 6 bab 7 baa 8 abba 9 abbaa ba aba abb Dictionary. Pairs are (key, element) = (key, code). Operations are: get(key) and put(key, code) Limit number of codes to 212 Use a hash table Convert variable length keys into fixed length keys. Each key has the form pc, where the string p is a key that is already in the table. Replace pc with (pCode)c

19 Code Table Representation
key a 1 b 2 ab 3 ba 4 aba 5 abb 6 bab 7 baa 8 abba 9 abbaa 0b 1a 2a 2b 3b 3a 5a 8a

20 LZW Decompression Original text = abababbabaabbabbaabba
code key a 1 b Original text = abababbabaabbabbaabba Compressed text = Convert codes to text from left to right. pCode=0 represents p=a DecompressedText = a

21 Expand public static void expand() { String[] st = new String[L];
int i; // next available codeword value // initialize symbol table with all 1-character strings for (i = 0; i < R; i++) st[i] = "" + (char) i; st[i++] = ""; // (unused) lookahead for EOF int codeword = BinaryStdIn.readInt(W); if (codeword == R) return; // expanded message is empty string String val = st[codeword]; while (true) { BinaryStdOut.write(val); codeword = BinaryStdIn.readInt(W); if (codeword == R) break; String s = st[codeword]; if (i == codeword) s = val + val.charAt(0); // special case hack if (i < L) st[i++] = val + s.charAt(0); val = s; } BinaryStdOut.close(); Compression takes O(n) expected time, because the expected complexity of the hash table operations in O(1).

22 LZW Decompression Original text = abababbabaabbabbaabba
code key a 1 b 2 ab Original text = abababbabaabbabbaabba Compressed text = pCode=1 represents p = b DecompressedText = ab lastP = a followed by first character of p is entered into the code table (a | b)

23 Original text = abababbabaabbabbaabba Compressed text = 012233588
code key a 1 b 2 ab 3 ba Original text = abababbabaabbabbaabba Compressed text = pCode = 2 represents p=ab DecompressedText = abab lastP = b followed by first character of p is entered into the code table (b | a)

24 Original text = abababbabaabbabbaabba Compressed text = 012233588
code key a 1 b 2 ab 3 ba 4 aba Original text = abababbabaabbabbaabba Compressed text = pCode = 2 represents p = ab DecompressedText = ababab. lastP = ab followed by first character of p is entered into the code table ( ab | a )

25 Original text = abababbabaabbabbaabba Compressed text = 012233588
code key a 1 b 2 ab 3 ba 4 aba 5 abb Original text = abababbabaabbabbaabba Compressed text = pCode = 3 represents p = ba DecompressedText = abababba. lastP = ab followed by first character of p is entered into the code table

26 Original text = abababbabaabbabbaabba Compressed text = 012233588
code key a 1 b 2 ab 3 ba 4 aba 5 abb 6 bab Original text = abababbabaabbabbaabba Compressed text = pCode = 3 represents p = ba DecompressedText = abababbaba. lastP = ba followed by first character of p is entered into the code table.

27 Original text = abababbabaabbabbaabba Compressed text = 012233588
code key a 1 b 2 ab 3 ba 4 aba 5 abb 6 bab 7 baa Original text = abababbabaabbabbaabba Compressed text = pCode = 5 represents p = abb DecompressedText = abababbabaabb. lastP = ba followed by first character of p is entered into the code table.

28 Original text = abababbabaabbabbaabba Compressed text = 012233588
code key a 1 b 2 ab 3 ba 4 aba 5 abb 6 bab 7 baa 8 abba Original text = abababbabaabbabbaabba Compressed text = 8 represents ??? When a code is not in the table (then, it is the last one entered), and its key is lastP followed by first character of lastP lastP = abb So 8 represents p = abba Decompressed text = abababbabaabbabba

29 Original text = abababbabaabbabbaabba Compressed text = 012233588
code key a 1 b 2 ab 3 ba 4 aba 5 abb 6 bab 7 baa 8 9 abbaa abba Original text = abababbabaabbabbaabba Compressed text = pCode= 8 represents p=abba DecompressedText=abababbabaabbabbaabba. lastP = abba followed by first character of p is entered into the code table ( abba | a )

30 Code Table Representation
key a 1 b 2 ab 3 4 5 6 bab 7 baa 8 abba 9 abbaa ba aba abb Dictionary. Pairs are (code, what the code represents) = (code, key). Operations are : get(code, key) and put(key) Code is an integer 0, 1, 2, … Use a 1D array codeTable. E.g. codeTable[code] = Key Each key has the form pc, where the string p is a codeKey that is already in the table; therefore you may replace pc with (pCode)c

31 Time Complexity Compression Decompression Expected time = O(n)
where n is the length of the text that is being compressed. Decompression O(n), where n is the length of the decompressed text. Compression takes O(n) expected time, because the expected complexity of the hash table operations in O(1).


Download ppt "CS 1501: Algorithm Implementation"

Similar presentations


Ads by Google