
1 Comp 335 File Structures Data Compression

2 Why Study Data Compression? Conserves storage space. Files can be transmitted faster because there are fewer bits to transmit. “Matures” the computer science student!!

3 What is Data Compression? The “encoding” of data in a file in such a way that it takes up less space. Two Basic Methods of Data Compression: (1) REDUNDANCY REDUCTION, (2) REPEATING SEQUENCE SUPPRESSION

4 Redundancy Reduction Decreasing the number of bits used to represent the data. Example: Assume a record maintains a STATE field which can contain the abbreviation for any of the fifty states. This necessitates a two-byte field.

5 Redundancy Reduction There are only fifty possible values which could be in this STATE field. A two-byte field is 16 bits. A bit savings could be created by using x bits to represent the 50 different possibilities. How many bits would we need? The answer is 6. Why? Five bits can distinguish only 2^5 = 32 values, which is too few, while six bits can distinguish 2^6 = 64, which covers all 50.
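The bit count for the STATE field can be checked with a short sketch. The function name `bits_needed` is made up for this illustration and does not come from the slides; it returns the smallest number of bits that gives every value its own code.

```python
def bits_needed(num_values: int) -> int:
    """Smallest number of bits that gives each of num_values items a distinct code."""
    if num_values < 1:
        raise ValueError("need at least one value")
    # The largest code is num_values - 1; its binary length is the width required.
    # A single value still gets one bit so that it occupies some space.
    return max(1, (num_values - 1).bit_length())


print(bits_needed(50))   # the fifty states
print(bits_needed(32))   # exactly fills five bits
print(bits_needed(256))  # every possible byte code
```

Any count from 33 to 64 values gives the same answer of 6, so the STATE field has room for 14 more codes before a seventh bit is needed.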

6 Redundancy Reduction We exchange 16 bits for 6... SO WHAT IS THE BIG DEAL? Suppose a file contains 1,000,000 records where each record has this field. How much space would be saved if each state field were 6 bits instead of 16? Each record saves 10 bits, so the total is 10,000,000 bits or 1,250,000 bytes.
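The savings only materialize if the 6-bit codes are actually stored back to back rather than one per byte. The following is a minimal sketch of that packing, assuming each state has already been mapped to an integer from 0 to 49. The helper names `pack_codes` and `unpack_codes` and the most-significant-bit-first layout are assumptions made for this example, not something the slides specify.

```python
def pack_codes(codes, width=6):
    """Pack small integers into bytes, `width` bits each, most significant bit first."""
    acc = 0
    for code in codes:
        if not 0 <= code < (1 << width):
            raise ValueError(f"{code} does not fit in {width} bits")
        acc = (acc << width) | code
    total_bits = width * len(codes)
    pad = (-total_bits) % 8          # zero bits added so the result ends on a byte boundary
    acc <<= pad
    return acc.to_bytes((total_bits + pad) // 8, "big")


def unpack_codes(data, count, width=6):
    """Recover `count` integers written by pack_codes."""
    acc = int.from_bytes(data, "big")
    acc >>= len(data) * 8 - count * width   # drop the padding bits
    mask = (1 << width) - 1
    return [(acc >> (width * (count - 1 - i))) & mask for i in range(count)]


states = [0, 49, 17, 5]              # four hypothetical state indexes
packed = pack_codes(states)
print(len(packed))                   # 3 bytes instead of 8 bytes of two-letter fields
print(unpack_codes(packed, len(states)))

saved_bits = 1_000_000 * (16 - 6)    # the arithmetic from the slide
print(saved_bits, saved_bits // 8)
```

The reader has to know how many codes were written, because the padding bits at the end are indistinguishable from data. In a record file that count would normally come from the record count.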

7 Repeating Sequence Suppression Used with files in which byte patterns repeat. Graphic files such as bitmaps are good candidates for this type of compression. Run-length encoding is a method of suppressing these repeating sequences.

8 Run-Length Encoding Replace each repeating sequence with three (or four) bytes. Byte 1: run-length code indicator. Byte 2: the byte code which is repeated. Byte 3: the number of times the byte code is repeated. (With only one count byte, a single run cannot be recorded as more than 255 repetitions.)
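The three-byte scheme can be sketched as follows. Several details are assumptions made for this example because the slide leaves them open: the indicator value `0xFF`, the rule that runs shorter than four bytes are copied unchanged (a three-byte code would not save anything), and the rule that a literal indicator byte in the input is always written as a run so the decoder never misreads it.

```python
INDICATOR = 0xFF   # assumed run-length code indicator; the slide does not name a value
MIN_RUN = 4        # assumed threshold: shorter runs are cheaper left as they are
MAX_RUN = 255      # the largest count that fits in the single count byte


def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        value = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == value and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN or value == INDICATOR:
            out += bytes([INDICATOR, value, run])   # indicator, repeated byte, count
        else:
            out += bytes([value]) * run             # short run: copy unchanged
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == INDICATOR:
            value, count = encoded[i + 1], encoded[i + 2]
            out += bytes([value]) * count
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return bytes(out)


row = b"\x00" * 300 + b"\x07\x07" + b"\x00" * 10    # a made-up bitmap row
packed = rle_encode(row)
print(len(row), len(packed))
print(rle_decode(packed) == row)
```

The run of 300 zero bytes is split into one code for 255 repetitions and a second for the remaining 45, which is the one-byte count limit from the slide at work.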

9 Fixed and Variable Length Encoding This is a type of redundancy reduction. The idea is to take a file and examine the byte codes which occur in it. (There can only be 256 different byte codes.) Fixed length encoding creates a special encoding for each byte code, using only as many bits as are necessary to distinguish the unique byte codes in the file. Every new encoding has the same number of bits.
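A minimal sketch of fixed length encoding, under the assumption that codes are handed out in sorted byte order (the slide does not say how codes are assigned). The function name `fixed_length_table` is invented for this example, and the codes are shown as strings of 0s and 1s for readability rather than packed into bytes.

```python
def fixed_length_table(data: bytes) -> dict[int, str]:
    """Give every byte code that occurs in data a bit string of the same width."""
    symbols = sorted(set(data))
    # Width needed to number the symbols 0 .. len(symbols) - 1 (at least one bit).
    width = max(1, (len(symbols) - 1).bit_length())
    return {sym: format(index, f"0{width}b") for index, sym in enumerate(symbols)}


text = b"abracadabra"
table = fixed_length_table(text)
for sym, code in table.items():
    print(chr(sym), code)

encoded_bits = "".join(table[b] for b in text)
print(len(encoded_bits), "bits instead of", len(text) * 8)
```

The example text uses five distinct byte codes, so three bits per symbol are enough. The table itself must be stored alongside the encoded data so that a reader can reverse the mapping.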

10 Fixed and Variable Length Encoding Variable length encoding does the same as fixed length encoding, except that each new encoding can have a different number of bits. The Huffman algorithm is a very popular data compression algorithm and is a variable length encoding algorithm. We will be studying this algorithm in depth. Morse code, however, is among the oldest and most familiar variable length encoding schemes.
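As a preview of variable length encoding, here is a sketch of the usual textbook construction of Huffman codes: repeatedly merge the two least frequent groups of symbols, prefixing one group's codes with 0 and the other's with 1. It is not necessarily the version presented later in the course. The function name `huffman_codes` and the tie-breaking order are assumptions of this example, so the exact bit patterns may differ from another implementation while the total encoded length stays the same.

```python
import heapq
from collections import Counter


def huffman_codes(data: bytes) -> dict[int, str]:
    """Map each byte code in data to a bit string; frequent bytes get shorter strings."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}   # a lone symbol still needs one bit
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dictionaries.
    heap = [(weight, sym, {sym: ""}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    tie_breaker = 256                    # byte values stop at 255, so this stays unique
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in group1.items()}
        merged.update({sym: "1" + code for sym, code in group2.items()})
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


text = b"abracadabra"
codes = huffman_codes(text)
for sym, code in sorted(codes.items(), key=lambda item: len(item[1])):
    print(chr(sym), code)

total_bits = sum(len(codes[b]) for b in text)
print(total_bits, "bits instead of", len(text) * 8)
```

No code in the result is a prefix of another, which is what lets a decoder split the bit stream back into symbols without separators.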

11 Lossy vs Lossless Compression Lossless: a type of data compression algorithm which compresses a file in a manner that allows a perfect reconstruction of the original when decompressed. It is used in the popular ZIP format as well as the Unix tool gzip. Typical uses are executable files, source code, and some image file formats, notably PNG. The Huffman algorithm is a lossless algorithm. Lossy: this type of algorithm does not allow a perfect reconstruction of the original file. It is most often used for compressing sound, video, and images.
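The difference can be seen in a few lines. The lossless half uses Python's standard `zlib` module, which implements DEFLATE, the method used by gzip and commonly by ZIP. The lossy half is a deliberately crude stand-in: the hypothetical `quantize` helper simply discards the low bits of each sample, which is not how real audio or image codecs work but shows the same one-way loss of detail.

```python
import zlib

# Lossless: decompressing gives back exactly the bytes that went in.
original = b"to be or not to be, that is the question. " * 20
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)


def quantize(samples: bytes, keep_bits: int = 4) -> bytes:
    """Crude lossy step: keep only the top keep_bits of each 8-bit sample."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(sample & mask for sample in samples)


# Lossy: nearby values collapse together and the originals cannot be recovered.
samples = bytes([200, 201, 203, 198, 17, 18])   # made-up 8-bit sample values
approx = quantize(samples)
print(list(samples))
print(list(approx))
```

After quantizing, the first four samples are identical, so a run-length or Huffman pass over them would do much better than on the original values. That trade of accuracy for compressibility is the bargain lossy methods make.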

