1 Analysis of Algorithms Chapter - 08 Data Compression.

Presentation transcript:

1 Analysis of Algorithms Chapter - 08 Data Compression

2 This Chapter Contains the Following Topics:
1. Why Data Compression?
2. Lossless and Lossy Compression
3. Fixed-Length Coding
4. Variable-Length Coding
5. Huffman Coding

3 Why Data Compression?
- What is data compression?
  - Transformation of data into a more compact form.
  - Compressed data can be transferred faster than uncompressed data.
- Why compress data?
  - Saves storage space.
  - Saves transmission time over a network.
- Example:
  - Suppose the ASCII code of a character is 1 byte.
  - Suppose we have a text file containing one hundred instances of 'a'.
  - The file size would then be about 100 bytes.
  - Let us store this as "100a" in a new file to convey the same information.
  - The new file size would be 4 bytes: 4/100 of the original, i.e., a 96% saving.
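The "100a" example on this slide is a form of run-length encoding. A minimal sketch of that idea in Python (the function name `rle_encode` is illustrative, not from the chapter):

```python
def rle_encode(text):
    """Replace each run of identical characters by its count followed
    by the character, e.g. 'a' * 100 -> '100a'."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                       # extend the current run
        out.append(f"{j - i}{text[i]}")  # emit count + character
        i = j
    return "".join(out)

print(rle_encode("a" * 100))  # '100a': 4 bytes instead of 100, a 96% saving
```

Note this is only a win when the input has long runs; on text with no repeats it would double the size, which is why the chapter moves on to frequency-based coding.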

4 Lossless and Lossy Data Compression
- The last example shows "lossless" compression.
  - The original data can be retrieved by decompression.
  - Lossless compression is used when data integrity is important.
  - Example software: WinZip, gzip, compress, etc.
- "Lossy" means the original is not retrievable.
  - Reduces size by permanently eliminating certain information.
  - When uncompressed, only part of the original information is there (but the user may not notice it).
- When can we use lossy compression?
  - For audio, images, and video.
  - JPEG, MPEG, etc. are example formats.

5 Fixed-Length Coding
- Coding: a way to represent information.
- Two ways: fixed-length and variable-length coding.
- The code for a character is a "codeword".
- We consider binary codes: each character is represented by a unique binary codeword.
- Fixed-length coding: the codewords of all characters have the same length.
  - E.g., ASCII, Unicode, etc.
- Suppose there are n characters. What is the minimum number of bits per character needed for fixed-length coding? ⌈log2 n⌉.
- Example: {a, b, c, d, e}; 5 characters.
  - ⌈log2 5⌉ = ⌈2.32...⌉ = 3 bits per character.
  - We can have the codewords: a=000, b=001, c=010, d=011, e=100.

6 Variable-Length Coding
- The length of codewords may differ from character to character.
  - Frequent characters get short codewords; infrequent ones get long codewords.
- Example:

  Character:  a    b    c    d    e     f
  Frequency:  46   13   12   16   8     5
  Codeword:   0    101  100  111  1101  1100

- Make sure that no codeword occurs as the prefix of another codeword.
  - What we need is a "prefix-free code".
  - The example above is a prefix-free code.
- Prefix-free codes give unique decoding.
  - E.g., "001011101" is decoded as "aabe" based on the table above.
- The Huffman coding algorithm shows how to obtain prefix-free codes.
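The unique-decoding claim can be sketched in Python: because the code is prefix-free, a left-to-right scan can emit a character the moment the bit buffer matches a codeword, with no lookahead. The code table is the one from the slide:

```python
# Prefix-free code from the slide's table.
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
DECODE = {cw: ch for ch, cw in CODE.items()}  # codeword -> character

def decode(bits):
    """Greedy left-to-right decoding; valid because no codeword
    is a prefix of another."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:        # buffer matches a full codeword
            out.append(DECODE[buf])
            buf = ""             # start the next codeword
    return "".join(out)

print(decode("001011101"))  # 'aabe', matching the slide's example
```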

7 Huffman Coding Algorithm
- Huffman invented a greedy method to construct an optimal prefix-free variable-length code.
  - The code is based on the frequency of occurrence of each character.
- The optimal code is given by a full binary tree: every internal node has 2 children.
  - If |C| is the size of the alphabet, there are |C| leaves and |C|-1 internal nodes.
- We build the tree bottom-up.
  - Begin with |C| leaves.
  - Perform |C|-1 "merging" operations.
- Let f[c] denote the frequency of character c.
- We use a priority queue Q in which high priority means low frequency.
  - GetMin(Q) removes the element with the lowest frequency and returns it.

8 An Algorithm

Input:  Alphabet C and frequencies f[]
Result: Optimal coding tree for C

Algorithm Huffman(C, f)
{
    n := |C|; Q := C;
    for i := 1 to n-1 do
    {
        z := NewNode();
        x := z.left  := GetMin(Q);
        y := z.right := GetMin(Q);
        f[z] := f[x] + f[y];
        Insert(Q, z);
    }
    return GetMin(Q);
}

- Running time is O(n lg n).
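A Python sketch of the pseudocode above, using `heapq` as the priority queue Q (`heappop` plays the role of GetMin, `heappush` of Insert); the tie-breaking counter and tuple-based tree nodes are implementation choices, not part of the slides:

```python
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman coding tree from a {char: frequency} dict.
    A node is a tuple (char, left, right); char is None for internal nodes."""
    tie = count()  # breaks frequency ties so heap tuples never compare nodes
    q = [(f, next(tie), (ch, None, None)) for ch, f in freq.items()]
    heapq.heapify(q)
    for _ in range(len(freq) - 1):      # |C| - 1 merging operations
        fx, _, x = heapq.heappop(q)     # x := GetMin(Q)
        fy, _, y = heapq.heappop(q)     # y := GetMin(Q)
        heapq.heappush(q, (fx + fy, next(tie), (None, x, y)))  # f[z] = f[x]+f[y]
    return heapq.heappop(q)[2]          # root of the optimal coding tree

def codes(node, prefix=""):
    """Read the codewords off the tree: left edge = 0, right edge = 1."""
    ch, left, right = node
    if ch is not None:                  # leaf: emit its codeword
        return {ch: prefix or "0"}
    table = {}
    table.update(codes(left, prefix + "0"))
    table.update(codes(right, prefix + "1"))
    return table
```

Running this on the frequencies of slide 9 (a=45, b=13, c=12, d=16, e=9, f=5) yields codeword lengths 1, 3, 3, 3, 4, 4 respectively; the exact bit patterns depend on how ties are broken, but the total encoded length is the same for any optimal code.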

9 Example
- Obtain the optimal coding for the following using the Huffman algorithm:

  Character:  a    b    c    d    e    f
  Frequency:  45   13   12   16   9    5

10 Example (Contd.)
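The tree figure for this slide did not survive in the transcript. As a self-contained sketch, the sequence of frequency merges that the Huffman algorithm performs on slide 9's frequencies can be reproduced with a heap (working with frequencies only, since only the merge order is being illustrated):

```python
import heapq

# Frequencies from slide 9: a=45, b=13, c=12, d=16, e=9, f=5.
freq = [45, 13, 12, 16, 9, 5]
heapq.heapify(freq)

merges = []
while len(freq) > 1:
    x = heapq.heappop(freq)       # GetMin: smallest remaining frequency
    y = heapq.heappop(freq)       # GetMin: next smallest
    heapq.heappush(freq, x + y)   # merged node's frequency
    merges.append((x, y, x + y))

print(merges)
# [(5, 9, 14), (12, 13, 25), (14, 16, 30), (25, 30, 55), (45, 55, 100)]
```

Each tuple is one internal node of the coding tree; the final merge reaches the total frequency 100 at the root.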

11 End of Chapter - 08

