Data compression (www.fakengineer.com). Presentation transcript:
INTRODUCTION If you download many programs and files off the Internet, you have probably encountered ZIP files before. This compression system is a very handy invention, especially for Web users, because it reduces the overall number of bits and bytes in a file so it can be transmitted faster over slower Internet connections, or take up less space on a disk. The technique behind these ZIP files is known as "data compression".
How does data compression work? It is based on the following processes: Finding Redundancy: Let us take an example. In his 1961 inaugural address, John F. Kennedy delivered this famous line: "Ask not what your country can do for you -- ask what you can do for your country." The quote has 17 words, made up of 61 letters, 16 spaces, one dash and one period. If each letter, space or punctuation mark takes up one unit of memory, we get a total file size of 79 units.
How does data compression work? To get the file size down, we need to look for redundancies. In the above quote the words ask, what, your, country, can, do, for, you each appear twice. That means nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote. To construct the second half of the phrase, we just point to the words in the first half and fill in the spaces and punctuation.
Looking it Up In this step we pick out the words that are repeated and put them into a numbered index:
Word      Index
ask       1
what      2
your      3
country   4
can       5
do        6
for       7
you       8
Our sentence now reads: "1 not 2 3 4 5 6 7 8 -- 1 2 8 5 6 7 3 4."
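The word-index step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `index` and `encoded` are made up for this example, words are compared in lowercase, and the final period is dropped before encoding.

```python
import re
from collections import Counter

quote = "Ask not what your country can do for you -- ask what you can do for your country."

# Split into lowercase words, ignoring punctuation, and count each word.
words = re.findall(r"[a-z]+", quote.lower())
counts = Counter(words)

# Number every repeated word in order of first appearance.
index = {}
for word in words:
    if counts[word] > 1 and word not in index:
        index[word] = len(index) + 1

# Replace each repeated word by its index number; keep "not" and "--" as they are.
tokens = quote.rstrip(".").lower().split()
encoded = " ".join(str(index.get(token, token)) for token in tokens)

print(index)
print(encoded)
```

Only "not" and the dash survive as literal text; every other word is a pointer into the index.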
Searching for Patterns The phrase "can do for" is repeated, one time followed by "your" and one time followed by "you," giving us a repeated pattern of "can do for you." This lets us write 15 characters (including spaces) with a single index number, while "your country" only lets us write 13 characters (with spaces), so the program would overwrite the "your country" entry as just "r country," and then write a separate entry for "can do for you."
Searching for Patterns Using the patterns we picked out above, and writing "_" for each space, we come up with this larger dictionary:
Entry (with spaces)   Index
ask_                  1
what_                 2
you                   3
r_country             4
_can_do_for_you       5
And the quote is converted to this smaller sentence: "1not_2345_--_12354"
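To see that nothing is lost, the compressed sentence can be expanded again by looking each number up in the dictionary. The sketch below assumes the five-entry dictionary described above, with real spaces in place of the space marker; the names `dictionary`, `compressed` and `expand` are illustrative only.

```python
# Dictionary of repeated patterns; spaces are stored as part of each entry.
dictionary = {
    1: "ask ",
    2: "what ",
    3: "you",
    4: "r country",
    5: " can do for you",
}

# The compressed sentence: integers are dictionary references, strings are literal text.
compressed = [1, "not ", 2, 3, 4, 5, " -- ", 1, 2, 3, 5, 4]

def expand(tokens, dictionary):
    """Rebuild the text by replacing every reference with its dictionary entry."""
    return "".join(dictionary[t] if isinstance(t, int) else t for t in tokens)

restored = expand(compressed, dictionary)
print(restored)
```

Entries 3 and 4 join to form "your", which is why "you" and "r country" can be stored separately without losing the original wording.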
Data compression methods Data compression methods fall into two families: lossless methods and lossy methods.
Lossless compression The following are some of the techniques used in lossless compression. Run-Length Encoding: When data contain strings of repeated symbols (such as bits or characters), each string can be replaced by a special marker, followed by the repeated symbol, followed by the number of occurrences.
Run-Length Encoding Figure 2(a): Original data. Figure 2(b): Compressed data: 5726#409321# #015. The symbol # is the marker. The symbol 4 is repeated 09 times. The symbol 3 is repeated 19 times. The symbol 0 is repeated 15 times.
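A minimal sketch of this marker scheme in Python follows. It makes several assumptions that the slides do not state: the marker never occurs in the data, the count is always written as two digits (so runs are capped at 99), and runs shorter than four symbols are left alone because the four-character marker form would not save space. The sample input is made up for illustration and is not the data from Figure 2.

```python
def rle_encode(data: str, marker: str = "#", min_run: int = 4) -> str:
    """Replace each long run with marker + symbol + two-digit count."""
    out = []
    i = 0
    while i < len(data):
        j = i
        # Extend j to the end of the current run, capped at 99 symbols.
        while j < len(data) and data[j] == data[i] and j - i < 99:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(f"{marker}{data[i]}{run:02d}")
        else:
            out.append(data[i] * run)
        i = j
    return "".join(out)

def rle_decode(encoded: str, marker: str = "#") -> str:
    """Expand every marker + symbol + two-digit count back into a run."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == marker:
            symbol = encoded[i + 1]
            run = int(encoded[i + 2:i + 4])
            out.append(symbol * run)
            i += 4
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)

# Hypothetical input: nine 4s and fifteen 0s embedded in other digits.
sample = "5726" + "4" * 9 + "321" + "0" * 15
packed = rle_encode(sample)
print(packed)
```

The fixed two-digit count is what lets the decoder tell a count apart from ordinary digits that follow it.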
Statistical Compression Statistical compression gives shorter codes to more frequent symbols. The three common encoding systems using this principle are Morse code, Huffman encoding and Lempel-Ziv-Welch encoding. Morse code: It uses variable-length combinations of marks (dashes) and spaces (dots) to encode data. One-symbol codes represent the most frequent characters, and longer codes represent less frequent characters. For example, a single dot (.) represents the character E, while three dashes and a dot (--.-) represent the character Q.
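Huffman Encoding Figure 3: Bit assignment based on frequency of characters
E: 0    T: 1
A: 00   I: 01   M: 10   N: 11
C: 000  D: 001  G: 010  K: 011  O: 100  R: 101  S: 110  U: 111
Figure 4: Multiple interpretations of transmitted data
Code sent: 00101010011110
First interpretation: E I G O T U E
Second interpretation: A M R E I T S
Third interpretation: D G O U M
Because some codes in Figure 3 are prefixes of others, the receiver cannot tell where one character ends and the next begins.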
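The ambiguity in Figure 4, and the way a Huffman code avoids it, can be checked with a short Python sketch. The table `NAIVE` copies the bit assignment of Figure 3; the helper names and the letter frequencies passed to `huffman` are assumptions made up for this illustration, and the construction shown is the usual textbook one (repeatedly merge the two least frequent subtrees), not necessarily the exact procedure on the slides.

```python
import heapq
from itertools import count

# Bit assignment from Figure 3: shorter codes for more frequent characters.
NAIVE = {"E": "0", "T": "1", "A": "00", "I": "01", "M": "10", "N": "11",
         "C": "000", "D": "001", "G": "010", "K": "011",
         "O": "100", "R": "101", "S": "110", "U": "111"}

def decodings(bits, table):
    """Yield every character string whose codes concatenate to `bits`."""
    if not bits:
        yield ""
        return
    for ch, code in table.items():
        if bits.startswith(code):
            for rest in decodings(bits[len(code):], table):
                yield ch + rest

def is_prefix_free(table):
    """True if no code is the beginning of another code."""
    codes = list(table.values())
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def huffman(freqs):
    """Build a prefix-free code from a {symbol: frequency} mapping."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merge the two least frequent subtrees: 0 for one branch, 1 for the other.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

sent = "".join(NAIVE[ch] for ch in "EIGOTUE")
parses = set(decodings(sent, NAIVE))

# Hypothetical frequencies, for illustration only.
code = huffman({"E": 12, "T": 9, "A": 8, "O": 7, "Q": 1})
encoded = "".join(code[ch] for ch in "TEA")
```

With the Figure 3 table the same bit string has many valid readings, while the Huffman table yields exactly one, because no Huffman code is the beginning of another.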
Lempel-Ziv-Welch Encoding The LZW method of compressing data is an evolution of the method originally created by Abraham Lempel and Jacob Ziv. The compression, which takes place at the sender site, has the following components: a dictionary, a buffer, and an algorithm.
Lempel-Ziv-Welch Dictionary: Figure 5: Original dictionary for a three-symbol text (code 1 = A, code 2 = B, code 3 = C). Buffer: Figure 6: Buffer at the compression site (labels: symbols from the text, strings to dictionary, codes sent).
Compression algorithm Figure 7 shows the flow chart for the compression algorithm.
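As a rough companion to the flow chart, here is a minimal Python sketch of the usual LZW compression loop. It assumes the three-symbol starting dictionary (A = 1, B = 2, C = 3) and that the text contains only those symbols; the function name `lzw_compress` and the sample text are made up for this example, and the code is not taken from Figure 7.

```python
def lzw_compress(text, alphabet="ABC"):
    """Return the list of codes sent and the dictionary built along the way."""
    # Original dictionary: one entry per symbol, numbered from 1.
    dictionary = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_code = len(dictionary) + 1
    buffer = ""
    codes = []
    for ch in text:
        candidate = buffer + ch
        if candidate in dictionary:
            # Keep growing the buffer while the string is already known.
            buffer = candidate
        else:
            # Send the code for the known part, then learn the new string.
            codes.append(dictionary[buffer])
            dictionary[candidate] = next_code
            next_code += 1
            buffer = ch
    if buffer:
        # Flush whatever is left in the buffer at the end of the text.
        codes.append(dictionary[buffer])
    return codes, dictionary

codes, dictionary = lzw_compress("ABABABA")
print(codes)
```

For the sample text, seven symbols are sent as four codes, and the dictionary gains the strings AB, BA and ABA.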
Decompression algorithm The decompression process, which takes place at the receiver site, uses the same components as the compression process. Dictionary: A very interesting point is that the sender does not send the dictionary created by the compression process; instead, the dictionary is created at the receiver site and, surprisingly, it is an exact replica of the dictionary created at the sender site. Buffers: Figure 8: Buffers at the decompression site (labels: buffer, temporary buffer, codes received, string to dictionary, symbols to be printed).
Decompression algorithm Figure 9 shows the flow chart for the decompression algorithm.
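The receiver's side can be sketched in Python in the same spirit. The sketch assumes the same three-symbol starting dictionary (1 = A, 2 = B, 3 = C); the name `lzw_decompress` and the sample codes are illustrative, and the logic is the standard LZW decoding loop rather than a transcription of Figure 9.

```python
def lzw_decompress(codes, alphabet="ABC"):
    """Return the decoded text and the dictionary rebuilt at the receiver."""
    dictionary = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_code = len(dictionary) + 1
    if not codes:
        return "", dictionary
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the string being defined right now:
            # it must be the previous string plus its own first symbol.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid code: {code}")
        out.append(entry)
        # Rebuild the entry the sender added at this point.
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out), dictionary

text, rebuilt = lzw_decompress([1, 2, 4, 6])
print(text)
```

The receiver adds each new string one step after the sender did, which is why a code can occasionally arrive before its entry exists; the `code == next_code` branch handles that case.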
Lossy compression If the decompressed data need not be an exact replica of the original information, only something very close to it, we can use a lossy data compression method. Several methods have been developed using lossy compression techniques. Joint Photographic Experts Group (JPEG) is used to compress pictures and graphics. Moving Picture Experts Group (MPEG) is used to compress video.
Conclusion With technologies developing at a rapid rate, new data compression methods are arising. One of them is JBIG (Joint Bi-level Image Experts Group). It is made for image compression and is a lossless method. Data compression techniques based on artificial neural networks are also being developed.