Published by Catherine Hardy. Modified over 9 years ago.
Analysis of Algorithms
Chapter 08: Data Compression
This chapter covers the following topics:
1. Why Data Compression?
2. Lossless and Lossy Compression
3. Fixed-Length Coding
4. Variable-Length Coding
5. Huffman Coding
Why Data Compression?
What is data compression?
- Transformation of data into a more compact form.
- Compressed data can be transferred faster than uncompressed data.
Why compress data?
- Saves storage space.
- Saves transmission time over a network.
Example:
- Suppose the ASCII code of a character is 1 byte, and we have a text file containing one hundred instances of 'a'. The file size would be about 100 bytes.
- Store this as "100a" in a new file to convey the same information. The new file size would be 4 bytes.
- 4 bytes instead of 100: a 96% saving.
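The "100a" trick above is a form of run-length encoding. As a minimal Python sketch (not from the slides; the function name is ours), it collapses each run of a repeated character into a count followed by the character:

```python
def rle_compress(text):
    """Collapse each run of a repeated character into "<count><char>"."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                      # extend the run of text[i]
        out.append(f"{j - i}{text[i]}") # emit count, then the character
        i = j
    return "".join(out)

print(rle_compress("a" * 100))  # "100a": 4 bytes instead of 100
```

Note this only pays off when the input actually contains long runs; on text with no repeats it makes the file larger.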
Lossless and Lossy Data Compression
- The last example shows "lossless" compression: the original data can be retrieved by decompression.
- Lossless compression is used when data integrity is important.
- Example software: WinZip, gzip, compress, etc.
- "Lossy" means the original is not retrievable: size is reduced by permanently eliminating certain information.
- When decompressed, only part of the original information is there (but the user may not notice it).
- When can we use lossy compression? For audio, images, and video.
- JPEG, MPEG, etc. are examples.
Fixed-Length Coding
- Coding: a way to represent information.
- Two ways: fixed-length and variable-length coding.
- The code for a character is a "codeword".
- We consider binary codes: each character is represented by a unique binary codeword.
Fixed-length coding:
- The codewords of all characters have the same length, e.g., ASCII, Unicode, etc.
- Suppose there are n characters. What is the minimum number of bits needed for fixed-length coding? ceil(log2 n)
- Example: {a, b, c, d, e}, 5 characters. ceil(log2 5) = ceil(2.32...) = 3 bits per character.
- We can have codewords: a=000, b=001, c=010, d=011, e=100.
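The fixed-length assignment above can be sketched in a few lines of Python (our own illustration, not part of the slides): number the characters 0, 1, 2, ... and write each index in binary using ceil(log2 n) bits.

```python
import math

def fixed_length_codes(chars):
    """Assign each character a binary codeword of ceil(log2 n) bits."""
    bits = math.ceil(math.log2(len(chars)))
    # format(i, "03b") renders i as a 3-digit binary string, e.g. 4 -> "100"
    return {c: format(i, f"0{bits}b") for i, c in enumerate(chars)}

codes = fixed_length_codes(["a", "b", "c", "d", "e"])
print(codes)  # {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'e': '100'}
```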
Variable-Length Coding
- The lengths of codewords may differ from character to character: frequent characters get short codewords, infrequent ones get long codewords.
- Example:

  Character:  a    b    c    d    e     f
  Frequency:  46   13   12   16   8     5
  Codeword:   0    101  100  111  1101  1100

- Make sure that no codeword occurs as the prefix of another codeword: what we need is a "prefix-free code".
- The last example is a prefix-free code.
- Prefix-free codes give unique decoding. E.g., "001011101" is decoded as "aabe" based on the table in the last example.
- The Huffman coding algorithm shows how to obtain prefix-free codes.
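Unique decoding of a prefix-free code can be done greedily, left to right: accumulate bits until they match a codeword, emit that character, and start over. A short Python sketch (our own, assuming the codebook from the table above):

```python
def decode_prefix_free(bits, codebook):
    """Greedy left-to-right decoding; correct because no codeword
    is a prefix of another, so the first match is the only match."""
    inverse = {cw: ch for ch, cw in codebook.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:       # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("leftover bits: input is not a sequence of codewords")
    return "".join(out)

codebook = {"a": "0", "b": "101", "c": "100",
            "d": "111", "e": "1101", "f": "1100"}
print(decode_prefix_free("001011101", codebook))  # "aabe"
```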
Huffman Coding Algorithm
- Huffman invented a greedy method to construct an optimal prefix-free variable-length code.
- The code is based on frequency of occurrence.
- An optimal code is given by a full binary tree: every internal node has 2 children.
- If |C| is the size of the alphabet, there are |C| leaves and |C|-1 internal nodes.
- We build the tree bottom-up: begin with |C| leaves and perform |C|-1 "merging" operations.
- Let f[c] denote the frequency of character c.
- We use a priority queue Q in which high priority means low frequency; GetMin(Q) removes the element with the lowest frequency and returns it.
An Algorithm
Input: alphabet C and frequencies f[ ]
Result: optimal coding tree for C

Algorithm Huffman(C, f)
{
    n := |C|;
    Q := C;
    for i := 1 to n-1 do
    {
        z := NewNode();
        x := z.left  := GetMin(Q);
        y := z.right := GetMin(Q);
        f[z] := f[x] + f[y];
        Insert(Q, z);
    }
    return GetMin(Q);
}

Running time is O(n lg n).
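The pseudocode above can be turned into runnable Python using the standard-library `heapq` module as the priority queue (a sketch of ours, not from the slides; rather than building explicit tree nodes, it carries each subtree as a dict of partial codewords, prepending 0 for the left child and 1 for the right at every merge):

```python
import heapq
import itertools

def huffman_codes(freq):
    """Return {char: codeword} for a frequency table {char: frequency}.
    The itertools counter breaks frequency ties so the heap never
    has to compare two dicts."""
    counter = itertools.count()
    # Each heap entry: (subtree frequency, tiebreak, {char: codeword so far})
    heap = [(f, next(counter), {c: ""}) for c, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)   # two lowest-frequency subtrees
        fy, _, y = heapq.heappop(heap)
        merged = {c: "0" + w for c, w in x.items()}   # x becomes left child
        merged.update({c: "1" + w for c, w in y.items()})  # y becomes right
        heapq.heappush(heap, (fx + fy, next(counter), merged))
    return heap[0][2]
```

Tie-breaking in the heap may produce a different (mirrored) tree than a hand trace, but any tree it produces is optimal, so the total encoded length is the same.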
Example
Obtain the optimal coding for the following using the Huffman algorithm:

  Character:  a    b    c    d    e    f
  Frequency:  45   13   12   16   9    5
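Tracing the merges by hand for this table: f+e = 14, then c+b = 25, then 14+d = 30, then 25+30 = 55, then a+55 = 100, which yields one optimal code (tie-breaking may produce a mirror image). The short check below (our own illustration) verifies that this code is prefix-free and counts its weighted length:

```python
freq  = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
# One optimal prefix-free code obtained from the merge order above
codes = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

# No codeword may be a prefix of another
words = list(codes.values())
assert all(not w2.startswith(w1) for w1 in words for w2 in words if w1 != w2)

total = sum(freq[c] * len(codes[c]) for c in freq)
print(total)  # 224 bits per 100 characters, versus 300 for 3-bit fixed-length
```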
Example (Contd.)
[Figure: the Huffman tree built by the successive merges; not recoverable from the extracted text.]
End of Chapter 08