Lecture 6 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan


Lecture 6: Source Coding and Compression. Dr.-Ing. Khaled Shawky Hassan, Room: C3-222, ext: 1204, Email: khaled.shawky@guc.edu.eg

Dictionary Techniques
Statistical compression methods use a statistical model of the data, and the quality of compression they achieve depends on how good that model is.
Dictionary-based compression methods do not use a statistical model. The dictionary holds strings of symbols and may be static or dynamic (adaptive):
- A static dictionary is permanent (it may allow the addition of strings but no deletions).
- A dynamic dictionary holds strings previously found in the input stream, allowing additions and deletions of strings as new input is read.

Dictionary Techniques
So far we assumed independent symbols with some statistical model. This is not true for many common data types, e.g. text, images, and code, and the quality of compression achieved depends on how good that model is.
Basic idea (adaptive):
- Identify frequently occurring symbol patterns.
- Encode those frequent patterns more efficiently.
- Use a default (less efficient) encoding for the rest.
Notes: This looks reasonable for data like text and images, and in general for data that are not very random.

Lempel-Ziv Compression Techniques
Static coding requires two passes: one pass to compute probabilities (or frequencies) and determine the mapping, and a second pass to encode. Example of a static technique: static Huffman coding.
All adaptive methods are one-pass methods; only one scan of the message (to encode) is required. Examples of adaptive techniques: LZ77, LZ78, and adaptive Huffman coding.

Lempel-Ziv Compression Techniques
LZ77 (sliding window)
- Variants: LZSS (Lempel-Ziv-Storer-Szymanski)
- Applications: gzip, Squeeze, LHA, PKZIP, ZOO
LZ78 (full dictionary based)
- Variants: LZW (Lempel-Ziv-Welch), LZC (Lempel-Ziv-Compress)
- Applications: compress, ARC, V.42bis (LZ78), zlib, GIF, CCITT (modems), PAK (LZ77)
Traditionally, LZ77 compressed better but was slower; the gzip version, however, is almost as fast as any LZ78 implementation.

Example Dictionary
A static dictionary contains all/some letters of the alphabet plus as many digrams (pairs/groups of letters) as possible.
Example: alphabet A = {a, b, c, d, r} plus 3 pairs: ab, ac, ad.
Encode the following: abracadabra

Example (slides 7-15): step-by-step figures for encoding "abracadabra" with the static dictionary above; the figures themselves are not included in the transcript. A minimal sketch of this encoding follows.
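The sketch below (my own illustration, not from the slides) performs greedy static-dictionary encoding with the alphabet {a, b, c, d, r} and the digrams ab, ac, ad; the entry order and the 3-bit code width are illustrative assumptions.

```python
# Minimal sketch of static-dictionary (digram) encoding.
# Dictionary: 5 single letters + 3 digrams = 8 entries -> 3 bits per codeword.
# Entry order/indices are assumptions for illustration, not from the slides.
DICTIONARY = ["a", "b", "c", "d", "r", "ab", "ac", "ad"]

def encode_static(text):
    """Greedily emit a two-character entry when the next two characters
    form one, otherwise emit the single-character entry."""
    codes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in DICTIONARY:   # prefer a digram entry
            codes.append(DICTIONARY.index(pair))
            i += 2
        else:                                        # fall back to one character
            codes.append(DICTIONARY.index(text[i]))
            i += 1
    return codes

if __name__ == "__main__":
    codes = encode_static("abracadabra")
    print(codes)                    # [5, 4, 6, 7, 5, 4, 0] -> ab, r, ac, ad, ab, r, a
    print("bits:", 3 * len(codes))  # 7 codewords * 3 bits = 21 bits
```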

LZ78 Compression Algorithm
LZ78 inserts one- or multi-character, distinct patterns of the message to be encoded into a dictionary. The multi-character patterns are of the form C0 C1 ... Cn-1 Cn. The prefix of a pattern refers to all the pattern characters except the last: C0 C1 ... Cn-1.
LZ78 output: a sequence of pairs (codeword index of the prefix, last character of the pattern).
Note: The dictionary is usually implemented as a hash table.

LZ78 Compression Algorithm (cont'd)
Dictionary ← empty; Prefix ← empty; DictionaryIndex ← 1;
while (characterStream is not empty)
{
    Char ← next character in characterStream;
    if (Prefix + Char exists in the Dictionary)
        Prefix ← Prefix + Char;
    else
    {
        if (Prefix is empty)
            CodeWordForPrefix ← 0;
        else
            CodeWordForPrefix ← DictionaryIndex for Prefix;
        Output: (CodeWordForPrefix, Char);
        insertInDictionary( (DictionaryIndex, Prefix + Char) );
        DictionaryIndex++;
        Prefix ← empty;
    }
}
if (Prefix is not empty)
{
    CodeWordForPrefix ← DictionaryIndex for Prefix;
    Output: (CodeWordForPrefix, );
}
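As a concrete illustration of the pseudocode above, here is a minimal Python sketch of the same encoder (not part of the original slides; the function name is my own).

```python
def lz78_encode(message):
    """Minimal LZ78 encoder sketch: returns a list of (index, char) pairs.
    Index 0 means 'empty prefix'; dictionary indices start at 1."""
    dictionary = {}          # pattern -> index
    prefix = ""
    next_index = 1
    output = []
    for char in message:
        if prefix + char in dictionary:
            prefix += char                     # keep extending the match
        else:
            code = dictionary.get(prefix, 0)   # 0 when the prefix is empty
            output.append((code, char))
            dictionary[prefix + char] = next_index
            next_index += 1
            prefix = ""
    if prefix:                                 # leftover prefix at end of input
        output.append((dictionary[prefix], ""))
    return output

print(lz78_encode("ABBCBCABABCAABCAAB"))
# [(0,'A'), (0,'B'), (2,'C'), (3,'A'), (2,'A'), (4,'A'), (6,'B')]
print(lz78_encode("BABAABRRRA"))
# [(0,'B'), (0,'A'), (1,'A'), (2,'B'), (0,'R'), (5,'R'), (2,'')]
```

The two test strings are the ones used in Examples 1 and 2 below, and the sketch reproduces the codewords given there.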

Example 1: LZ78 Compression Encode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ78 algorithm. The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B) Note: The above is just a representation, the commas and parentheses are not transmitted; we will discuss the actual form of the compressed message later!

Example 1: LZ78 Compression (cont'd)
How to solve it :-)
1. A is not in the Dictionary; insert it.
2. B is not in the Dictionary; insert it.
3. B is in the Dictionary. BC is not in the Dictionary; insert it.
4. B is in the Dictionary. BC is in the Dictionary. BCA is not in the Dictionary; insert it.
5. B is in the Dictionary. BA is not in the Dictionary; insert it.
6. B is in the Dictionary. BCA is in the Dictionary. BCAA is not in the Dictionary; insert it.
7. B is in the Dictionary, BC is in the Dictionary, BCA is in the Dictionary, BCAA is in the Dictionary. However, BCAAB is not in the Dictionary; insert it.

Example 2: LZ78 Compression Encode (i.e., compress) the string BABAABRRRA using the LZ78 algorithm. The compressed message is: (0,B)(0,A)(1,A)(2,B)(0,R)(5,R)(2, )

Example 2: LZ78 Compression (cont'd)
1. B is not in the Dictionary; insert it.
2. A is not in the Dictionary; insert it.
3. B is in the Dictionary. BA is not in the Dictionary; insert it.
4. A is in the Dictionary. AB is not in the Dictionary; insert it.
5. R is not in the Dictionary; insert it.
6. R is in the Dictionary. RR is not in the Dictionary; insert it.
7. A is in the Dictionary and it is the last input character; output a pair containing its index: (2, )

Example 3: LZ78 Compression
Encode (i.e., compress) the string AAAAAAAAA using the LZ78 algorithm.
1. A is not in the Dictionary; insert it.
2. A is in the Dictionary. AA is not in the Dictionary; insert it.
3. A is in the Dictionary. AA is in the Dictionary. AAA is not in the Dictionary; insert it.
4. A is in the Dictionary. AA is in the Dictionary. AAA is in the Dictionary and it is the last pattern; output a pair containing its index: (3, )
The compressed message is: (0,A)(1,A)(2,A)(3, )

LZ78 Compression: Number of bits transmitted
Example: Uncompressed string: ABBCBCABABCAABCAAB
Number of bits = total number of characters * 8 = 18 * 8 = 144 bits
Suppose the codewords are indexed starting from 1:
Compressed string (codewords): (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)
Codeword index:                   1      2      3      4      5      6      7
Each codeword consists of an integer and a character:
- The character is represented by 8 bits.
- The number of bits n required to represent the integer part of the codeword with index i is n = ⌈log2(i)⌉, taking n = 1 for i = 1 (since ⌈log2(1)⌉ = 0 but at least one bit is sent).
Alternatively, the number of bits required to represent the integer part of the codeword with index i is the number of significant bits required to represent the integer i - 1 (again, at least 1 bit).

LZ78 Compression: Number of bits transmitted
Codeword: (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B)
Index:       1      2      3      4      5      6      7
Bits: (1 + 8) + (1 + 8) + (2 + 8) + (2 + 8) + (3 + 8) + (3 + 8) + (3 + 8) = 71 bits
The actual compressed message is: 0A0B10C11A010A100A110B, where each character is replaced by its 8-bit binary ASCII code.
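A small sketch (my own, not from the slides) that reproduces this bit count from a list of LZ78 codewords; it assumes 8 bits per character and ⌈log2(i)⌉ bits (minimum 1) for the integer part of the i-th codeword.

```python
import math

def lz78_bit_count(codewords, char_bits=8):
    """Count the bits needed to transmit LZ78 codewords.
    The i-th codeword (1-based) uses max(1, ceil(log2(i))) bits for its
    integer part plus char_bits for its character (if one is present)."""
    total = 0
    for i, (index, char) in enumerate(codewords, start=1):
        int_bits = max(1, math.ceil(math.log2(i)))
        total += int_bits + (char_bits if char else 0)
    return total

codewords = [(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]
print(lz78_bit_count(codewords))   # 71
```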

LZ78 Decompression Algorithm (self study)
input: (CI, character) pairs
for each pair:
    if (CI == 0)
        output: currentCharacter;
    else
        output: stringAtIndex(CI) + currentCharacter;
    insert the current output string into the Dictionary;
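A matching Python sketch of the decoder (again my own illustration, mirroring the encoder sketch above); the decompression examples on the following slides can be reproduced with it.

```python
def lz78_decode(codewords):
    """Minimal LZ78 decoder sketch: rebuilds the dictionary while decoding.
    Entry 0 is the empty string; new entries are appended as pairs arrive."""
    dictionary = [""]             # index 0 = empty prefix
    output = []
    for index, char in codewords:
        entry = dictionary[index] + char
        output.append(entry)
        dictionary.append(entry)  # the decoded string becomes the next entry
    return "".join(output)

pairs = [(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]
print(lz78_decode(pairs))         # ABBCBCABABCAABCAAB
```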

Example 1: LZ78 Decompression Decode (i.e., decompress) the sequence (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B) The decompressed message is: ABBCBCABABCAABCAAB

Example 2: LZ78 Decompression Decode (i.e., decompress) the sequence (0, B) (0, A) (1, A) (2, B) (0, R) (5, R) (2, ) The decompressed message is: BABAABRRRA

Example 3: LZ78 Decompression Decode (i.e., decompress) the sequence (0, A) (1, A) (2, A) (3, ) The decompressed message is: AAAAAAAAA

Exercises
1. Use LZ78 to trace the encoding of the string SATATASACITASA.
2. Write a MATLAB program that encodes a given string using LZ78.
3. Write a MATLAB program that decodes a given set of encoded codewords using LZ78.

LZ77: Sliding Window Lempel-Ziv
Encoding notes:
- (o)ffset = search_ptr - match_ptr = 7
- (l)ength = number of consecutive letters matched = 4
- (c)odeword of the character following the match, e.g., C('r')
- The emitted triple is <o, l, c> = <7, 4, C('r')>
- |search buff| = S, and S + |LA buff| = |W(indow)|

LZ77: Example
With a search buffer of size 6, the input is encoded step by step into the triples (0,0,a) (1,1,c) (3,3,a) (0,0,b) (3,3,a) (1,1,a). [The per-step sliding-window figures, showing the search buffer, the look-ahead buffer, the offset o, and the match length l at each step, are not reproduced in the transcript.]
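For illustration, below is a minimal LZ77 encoder sketch (my own code, not from the slides). It emits (offset, length, next-char) triples with a bounded search buffer and lets matches extend into the look-ahead buffer, so its output can differ from the lecture figures, which appear to use different conventions.

```python
def lz77_encode(data, search_size=6, lookahead_size=6):
    """Sketch of LZ77 triple encoding: (offset, length, next character).
    Greedy longest match inside the search buffer; offset 0 / length 0
    means 'no match, emit a literal'."""
    triples = []
    cursor = 0
    while cursor < len(data):
        best_offset, best_length = 0, 0
        start = max(0, cursor - search_size)
        for pos in range(start, cursor):
            length = 0
            # The match may run past the cursor (overlap into the look-ahead).
            while (length < lookahead_size
                   and cursor + length < len(data) - 1
                   and data[pos + length] == data[cursor + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = cursor - pos, length
        next_char = data[cursor + best_length]
        triples.append((best_offset, best_length, next_char))
        cursor += best_length + 1
    return triples

print(lz77_encode("aacaacabcabaaac"))
# [(0, 0, 'a'), (1, 1, 'c'), (3, 4, 'b'), (3, 3, 'a'), (1, 2, 'c')]
# (The lecture figures may show different triples, e.g. if matches are
#  not allowed to extend past the cursor.)
```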

LZ77 Decoding
The decoder keeps the same dictionary window as the encoder. For each codeword it looks the match up in the window and inserts a copy (which one at initialization?).
What if l > o? (Only part of the match is in the dictionary.) E.g., input = abcd, codeword = (2, 9, e).
Simply copy starting at the cursor:
for (i = 0; i < length; i++)
    out[cursor + i] = out[cursor - offset + i];
Out = abcdcdcdcdcdce
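A minimal Python sketch of this decoder (my own illustration); the byte-by-byte copy handles the overlapping case l > o exactly as in the loop above.

```python
def lz77_decode(prefix, triples):
    """Decode LZ77 (offset, length, char) triples, starting from an
    already-decoded prefix. The sequential copy makes overlapping
    matches (length > offset) work automatically."""
    out = list(prefix)
    for offset, length, char in triples:
        cursor = len(out)
        for i in range(length):
            out.append(out[cursor - offset + i])  # may read bytes just written
        out.append(char)
    return "".join(out)

# The slide's overlap example: prefix "abcd", codeword (2, 9, e).
print(lz77_decode("abcd", [(2, 9, "e")]))   # abcdcdcdcdcdce
```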

LZ77 Optimizations used by gzip
LZSS: Output one of the following two formats: (0, position, length) or (1, char). Typically the second (literal) format is used if the match length is < 3.
Example on the string aacaacabcabaaac, the first steps output: (1,a), (1,a), (1,c), (0,3,4).

Optimizations used by gzip (cont.)
- Huffman code the positions, lengths, and chars (as an outer code).
- Non-greedy parsing: possibly use a shorter match so that the next match is better.
- Use a hash table to store the dictionary:
  - The hash is based on strings of length 3.
  - Find the longest match within the correct hash bucket.
  - Limit the length of the search.
  - Store entries within a bucket in order of position.
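To make the hash-table idea concrete, here is a rough sketch (my own, not gzip's actual code) of indexing the window by 3-character prefixes and searching only the matching bucket, newest positions first, with a capped chain length.

```python
from collections import defaultdict

def find_match(data, cursor, hash_table, max_chain=16):
    """Search only positions whose 3-char prefix equals the one at the cursor.
    Positions are stored newest-first and the chain length is capped,
    mirroring (in spirit) gzip's bounded hash-chain search."""
    key = data[cursor:cursor + 3]
    best_offset, best_length = 0, 0
    for pos in hash_table.get(key, [])[:max_chain]:
        length = 0
        while cursor + length < len(data) and data[pos + length] == data[cursor + length]:
            length += 1
        if length > best_length:
            best_offset, best_length = cursor - pos, length
    return best_offset, best_length

def insert_position(data, cursor, hash_table):
    """Record the 3-char string starting at cursor, newest position first."""
    hash_table[data[cursor:cursor + 3]].insert(0, cursor)

# Hypothetical usage: index part of the window, then look for a match.
data = "aacaacabcabaaac"
table = defaultdict(list)
for p in range(0, 8):              # pretend positions 0..7 are already in the window
    insert_position(data, p, table)
print(find_match(data, 8, table))  # (3, 3): "cab" at the cursor matches offset 3
```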

Theory behind LZ77
Sliding-window LZ is asymptotically optimal [Wyner-Ziv, 94]: it will compress "long enough" strings down to the source entropy as the window size goes to infinity.
The problem: "long enough" is really, really long.