Algorithm Programming 1 (89-210): Some Topics in Compression. Bar-Ilan University, 2007-2008 (Hebrew year 5768), by Moshe Fresko.

Huffman Coding
- Variable-length encoding
- Works on the probabilities (frequencies) of symbols (characters, words, etc.)
- Building the tree:
  - Take the two least frequent symbols/nodes
  - Join them under a new parent node
  - The parent node's frequency is the sum of its children's frequencies
  - Repeat until a single tree contains all the symbols
- The path from the root to a leaf gives that symbol's code (e.g., 0 for a left branch, 1 for a right branch)
- Frequent symbols end up near the root, which gives them short codes
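The tree-building steps above can be sketched in Python as follows. This is a minimal illustration, assuming symbols are single characters and using the standard heapq module as the priority queue; the function name huffman_codes is hypothetical. Ties between equal frequencies are broken arbitrarily, so the exact 0/1 patterns may differ from a hand-built tree while the total encoded length stays the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a lone symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so heapq never has to compare two subtrees
    # heap items: (frequency, tie, node); a leaf is a symbol, an inner node is a (left, right) tuple
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent node
        f2, _, right = heapq.heappop(heap)  # second least frequent node
        # the parent's frequency is the sum of its children's frequencies
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes: dict[str, str] = {}

    def walk(node, path: str) -> None:
        if isinstance(node, tuple):  # inner node: 0 for left, 1 for right
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:  # leaf: its root-to-leaf path is its code
            codes[node] = path

    walk(heap[0][2], "")
    return codes
```

For ABBABCABBBBC (B seven times, A three times, C twice), B gets a 1-bit code and A and C get 2-bit codes, 17 bits in total instead of 24 with a fixed 2 bits per symbol.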

LZ77
- Introduced in 1977 by Abraham Lempel and Jacob Ziv
- Dictionary based
- Works within a sliding window of size n
- Decoding is easy and fast; encoding is slower, since it must search the window for matches
- Produces a list of tuples (Pos,Len,C):
  - Pos: position backwards from the current position
  - Len: number of symbols to copy
  - C: the next character
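Because each tuple only says "go back Pos, copy Len symbols, then emit C", decoding is a short loop. A minimal sketch (the name lz77_decode is hypothetical); copying one symbol at a time lets a pointer overlap the text it is producing, so Len may be larger than Pos.

```python
def lz77_decode(tuples: list[tuple[int, int, str]]) -> str:
    out: list[str] = []
    for pos_back, length, ch in tuples:
        start = len(out) - pos_back
        # copy symbol by symbol: later symbols of the copy may have been
        # written by earlier iterations of this same loop (overlapping match)
        for i in range(length):
            out.append(out[start + i])
        out.append(ch)  # the explicit next character C
    return "".join(out)
```

Decoding [(0,0,'A'), (0,0,'B'), (1,1,'A'), (3,1,'C'), (3,2,'B'), (2,2,'C')] gives back ABBABCABBBBC.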

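LZ77
- Based on strings that repeat themselves
  - "An outcry in Spain is an outcry in vain" is encoded as "An outcry in Spa(6,3)is a(22,12)v(21,3)"
  - "aaaaaaaaaa" is encoded as "a(1,9)"
- Each pair (Pos,Len) means: go back Pos positions and copy Len symbols. In a(1,9) the copy overlaps the text it is producing, so a single literal a expands into ten.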

LZ77 - Example
- Window size: 5
- Input: ABBABCABBBBC

  Next Seq   Code
  A          (0,0,A)
  B          (0,0,B)
  BA         (1,1,A)
  BC         (3,1,C)
  ABB        (3,2,B)
  BBC        (2,2,C)
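A sketch of an encoder that reproduces the table above. It assumes the search buffer holds the last window symbols, a match may run on into the text being encoded, at least one symbol is kept back for C, and ties between equally long matches go to the farthest one (this tie-break is an assumption, chosen because it matches the table). The name lz77_encode is hypothetical, and the brute-force search is only for clarity; practical encoders typically use hash chains or similar structures to find matches faster.

```python
def lz77_encode(data: str, window: int) -> list[tuple[int, int, str]]:
    out: list[tuple[int, int, str]] = []
    pos, n = 0, len(data)
    while pos < n:
        best_dist, best_len = 0, 0
        # candidate starts lie inside the last `window` symbols; scanning
        # oldest-first with a strict ">" keeps the farthest of equal matches
        for cand in range(max(0, pos - window), pos):
            length = 0
            # stop one symbol early so there is always a next character C;
            # cand + length may pass pos, i.e. the match may overlap itself
            while pos + length < n - 1 and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_dist, best_len = pos - cand, length
        out.append((best_dist, best_len, data[pos + best_len]))
        pos += best_len + 1  # skip the copied symbols plus C
    return out
```

With window 5, lz77_encode("ABBABCABBBBC", 5) yields the six tuples in the table.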

LZ77 - Some Variations
- LZSS: a flag bit distinguishes pointers from literal characters.
- LZR: no limit on the pointer size.
- LZH: the pointers are compressed with Huffman coding.
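The "outcry" example mixes literal characters with (Pos,Len) pairs, which is the kind of output an LZSS-style encoder produces. A sketch, assuming an unbounded window, greedy longest matches, and a hypothetical minimum match length of 3 below which a literal is emitted instead; each token carries its flag (0 = literal, 1 = pointer). The names lzss_encode and render are illustrative.

```python
def lzss_encode(text: str, min_len: int = 3) -> list[tuple]:
    tokens: list[tuple] = []
    pos = 0
    while pos < len(text):
        best_dist, best_len = 0, 0
        for cand in range(pos):  # unbounded window (assumption)
            length = 0
            while pos + length < len(text) and text[cand + length] == text[pos + length]:
                length += 1
            if length > best_len:  # strict ">" keeps the farthest of equal matches
                best_dist, best_len = pos - cand, length
        if best_len >= min_len:
            tokens.append((1, (best_dist, best_len)))  # flag 1: pointer (Pos, Len)
            pos += best_len
        else:
            tokens.append((0, text[pos]))  # flag 0: literal character
            pos += 1
    return tokens


def render(tokens: list[tuple]) -> str:
    # write pointers as (Pos,Len) and literals as themselves
    return "".join(f"({p[0]},{p[1]})" if flag else p for flag, p in tokens)
```

render(lzss_encode("An outcry in Spain is an outcry in vain")) gives back "An outcry in Spa(6,3)is a(22,12)v(21,3)", and "aaaaaaaaaa" becomes "a(1,9)".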

LZ78
- Instead of a window onto previously seen text, a dictionary of phrases is built
- Both encoding and decoding are simple:
  - From the current position in the text, find the longest phrase that is in the dictionary
  - Output the pair (Index,NextChar)
    - Index: the dictionary index of that phrase (0 if there is none)
    - NextChar: the character following that phrase
  - Add to the dictionary the new phrase formed by appending NextChar to the matched phrase
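The encoding steps above can be sketched in Python as follows, assuming phrases are numbered from 1 in the order they are added and that a final phrase cut off by the end of the input is emitted with None standing in for EOLN. The name lz78_encode is hypothetical.

```python
from typing import Optional


def lz78_encode(text: str) -> list[tuple[int, Optional[str]]]:
    phrases: dict[str, int] = {}  # phrase -> dictionary index (index 0 = empty phrase)
    out: list[tuple[int, Optional[str]]] = []
    pos = 0
    while pos < len(text):
        index, end = 0, pos
        # grow the match while text[pos:end + 1] is already a dictionary phrase
        while end < len(text) and text[pos:end + 1] in phrases:
            index = phrases[text[pos:end + 1]]
            end += 1
        if end == len(text):
            out.append((index, None))  # input ended inside a known phrase (EOLN)
            break
        out.append((index, text[end]))
        phrases[text[pos:end + 1]] = len(phrases) + 1  # new phrase = match + next char
        pos = end + 1
    return out
```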

LZ78 - Example
- Input: ABBABCABBBBC

  Input   Output     Add to dictionary
  A       (0,A)      1 = "A"
  B       (0,B)      2 = "B"
  BA      (2,A)      3 = "BA"
  BC      (2,C)      4 = "BC"
  AB      (1,B)      5 = "AB"
  BB      (2,B)      6 = "BB"
  BC      (4,EOLN)   -

- Dictionary size
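Decoding rebuilds the same dictionary as the encoder: every (Index,NextChar) pair is the phrase at Index plus NextChar, and that new phrase receives the next index. A minimal sketch, assuming None stands for EOLN; the name lz78_decode is hypothetical.

```python
from typing import Optional


def lz78_decode(pairs: list[tuple[int, Optional[str]]]) -> str:
    phrases = [""]  # index 0 is the empty phrase
    out: list[str] = []
    for index, ch in pairs:
        if ch is None:  # EOLN: the input ended right after a known phrase
            out.append(phrases[index])
            continue
        phrase = phrases[index] + ch
        phrases.append(phrase)  # gets the next index, exactly as in the encoder
        out.append(phrase)
    return "".join(out)
```

Feeding it the Output column of the table, [(0,'A'), (0,'B'), (2,'A'), (2,'C'), (1,'B'), (2,'B'), (4,None)], returns ABBABCABBBBC.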

LZW
- Produces only a list of dictionary entry indexes
- Encoding:
  1. Start with an initial dictionary, for example all 256 single-byte characters (codes 0..255)
  2. From the input, find the longest string that exists in the dictionary
  3. Output this string's index in the dictionary
  4. Append the next input character to that string and add the result to the dictionary
  5. Continue from that next character, going back to step 2
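Steps 1 to 5 can be sketched directly in Python, recording each step as a row (input string, next character, output code, new entry) so the result can be compared with hand-worked tables such as the ones for ABBABCABBBBC and ababcbababaaaaaaa. The name lzw_trace and its parameters are hypothetical; it assumes every symbol of the text appears in the given alphabet, numbered from first_code.

```python
def lzw_trace(text: str, alphabet: str, first_code: int = 0) -> list[tuple]:
    # step 1: initial dictionary, one entry per symbol of the alphabet
    table = {ch: first_code + i for i, ch in enumerate(alphabet)}
    rows: list[tuple] = []
    pos = 0
    while pos < len(text):
        # step 2: longest string starting at pos that is in the dictionary
        end = pos + 1
        while end < len(text) and text[pos:end + 1] in table:
            end += 1
        w = text[pos:end]
        if end == len(text):
            rows.append((w, None, table[w], None))  # step 3 only: no next character
            break
        new_code = first_code + len(table)
        table[w + text[end]] = new_code  # step 4: w + next char becomes a new entry
        rows.append((w, text[end], table[w], (new_code, w + text[end])))  # step 3
        pos = end  # step 5: continue from the next character
    return rows
```

With A=0, B=1, C=2, the output codes for ABBABCABBBBC come out as 0 1 1 3 2 3 4 1 2.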

LZW - Example
- Input: ABBABCABBBBC
- Initial dictionary: 0 = "A", 1 = "B", 2 = "C"

  Input   NextChar   Output   Add to dictionary
  A       B          0        3 = "AB"
  B       B          1        4 = "BB"
  B       A          1        5 = "BA"
  AB      C          3        6 = "ABC"
  C       A          2        7 = "CA"
  AB      B          3        8 = "ABB"
  BB      B          4        9 = "BBB"
  B       C          1        10 = "BC"
  C       -          2        -

- Dictionary size: ?

LZW - Encoding Example
- T = ababcbababaaaaaaa
- Initial dictionary entries: 1 = a, 2 = b, 3 = c

  Input   Output   NextSymbol   Add to dictionary
  a       1        b            4 = ab
  b       2        a            5 = ba
  ab      4        c            6 = abc
  c       3        b            7 = cb
  ba      5        b            8 = bab
  bab     8        a            9 = baba
  a       1        a            10 = aa
  aa      10       a            11 = aaa
  aaa     11       a            12 = aaaa
  a       1        -            -

LZW - Encoding Algorithm

  w = empty
  while (read next symbol k) {
      if wk exists in the dictionary
          w = wk
      else {
          add wk to the dictionary
          output the code for w
          w = k
      }
  }
  output the code for w    // flush the last string at the end of the input
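A runnable Python version of this pseudocode, assuming the input is a byte string and the initial dictionary holds the 256 single-byte strings (codes 0..255), so new entries get codes 256, 257, and so on. The name lzw_encode is hypothetical.

```python
def lzw_encode(data: bytes) -> list[int]:
    # initial dictionary: all 256 single-byte strings, codes 0..255
    table = {bytes([i]): i for i in range(256)}
    out: list[int] = []
    w = b""
    for byte in data:  # read next symbol k
        k = bytes([byte])
        if w + k in table:
            w = w + k  # keep extending the current match
        else:
            table[w + k] = len(table)  # new entries get codes 256, 257, ...
            out.append(table[w])
            w = k
    if w:  # flush the final match at the end of the input
        out.append(table[w])
    return out
```

With ASCII codes (A=65, B=66, C=67), b"ABBABCABBBBC" encodes to 65 66 66 256 67 256 257 66 67, the same sequence as 0 1 1 3 2 3 4 1 2 in the three-letter example with every code shifted.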

LZW - Decoding Algorithm

  read a code k
  output the dictionary entry for k
  w = dictionary entry for k
  while (read a code k) {
      entry = dictionary entry for k
      output entry
      add w + entry[0] to the dictionary
      w = entry
  }

LZW - Decoding
- The previous algorithm has a special-case problem
  - It can come up when decoding practically any large file
  - It is the case where the code just read is not in the dictionary yet
  - Example: ABABABA
    - Initially: A=1, B=2
    - Encoder output = 1 2 3 5
    - While decoding, the algorithm above will not find the dictionary entry ABA=5: the encoder added entry 5 one step before using it, but the decoder would only add entry 5 after reading code 5 itself
  - A small additional check solves the problem
- Be careful to handle it in Exercise 3
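One common way to write the additional check: when the code read equals the next code not yet assigned, the only string it can stand for is w followed by the first character of w, because the encoder built that entry from w and then immediately started its next match with the same character. A sketch in Python, with a hypothetical name lzw_decode and an alphabet numbered from first_code as in the examples.

```python
def lzw_decode(codes: list[int], alphabet: str, first_code: int = 0) -> str:
    if not codes:
        return ""
    table = {first_code + i: ch for i, ch in enumerate(alphabet)}
    next_code = first_code + len(alphabet)
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # special case: the code refers to the entry being defined right now
            entry = w + w[0]
        else:
            raise ValueError(f"invalid LZW code {code}")
        table[next_code] = w + entry[0]  # add w + entry[0] to the dictionary
        next_code += 1
        out.append(entry)
        w = entry
    return "".join(out)
```

With A=1 and B=2, decoding 1 2 3 5 reaches code 5 before it exists, takes the special-case branch, and produces ABABABA.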

LZW - Dictionary Length
- Typical dictionary length: 14 bits = 16384 entries (the first 256 of them are single bytes)
- What if the dictionary runs out of space?
  1. Stop adding entries to the dictionary
  2. Delete the whole dictionary (this will be used in the exercise)
  3. LRU: throw out the entries that have not been used recently
  4. Monitor performance, and flush the dictionary when performance is poor
  5. Double the dictionary size
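Option 2 (delete the whole dictionary) only works if the encoder and decoder reset at exactly the same point. A sketch over bytes under a hypothetical policy: once the dictionary holds max_size entries (16384 for 14 bits), the encoder outputs the current code and then resets to the 256 single-byte entries instead of adding a new one. The decoder adds each entry one step later than the encoder, so it performs the reset when it would otherwise have added an entry to a full dictionary. The names and the exact reset timing are illustrative, not the exercise's specification.

```python
def _fresh_codes() -> dict[bytes, int]:
    # initial dictionary: the 256 single-byte strings
    return {bytes([i]): i for i in range(256)}


def lzw_encode_reset(data: bytes, max_size: int = 2 ** 14) -> list[int]:
    table = _fresh_codes()
    out: list[int] = []
    w = b""
    for byte in data:
        k = bytes([byte])
        if w + k in table:
            w = w + k
            continue
        out.append(table[w])
        if len(table) < max_size:
            table[w + k] = len(table)  # room left: add the new string
        else:
            table = _fresh_codes()  # dictionary full: delete it and start over
        w = k
    if w:
        out.append(table[w])
    return out


def lzw_decode_reset(codes: list[int], max_size: int = 2 ** 14) -> bytes:
    if not codes:
        return b""
    table = [bytes([i]) for i in range(256)]
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if len(table) >= max_size:
            # the encoder reset here instead of adding an entry; mirror it
            table = [bytes([i]) for i in range(256)]
            entry = table[code]
        else:
            if code < len(table):
                entry = table[code]
            elif code == len(table):
                entry = w + w[:1]  # special case: entry not added yet
            else:
                raise ValueError(f"invalid LZW code {code}")
            table.append(w + entry[:1])
        out.append(entry)
        w = entry
    return b"".join(out)
```

A very small max_size such as 257 forces frequent resets, which makes it easy to check that both sides stay in step.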

