 The amount of data we deal with is getting larger  Not only do larger files require more disk space, they take longer to transmit  Many times files.


2
• The amount of data we deal with is getting larger
• Not only do larger files require more disk space, they take longer to transmit
• Files are often compressed to save space or to speed up transmission

3
• The simplest type of redundancy in a file is long runs of repeated characters
• AAAABBBAABBBBBCCCCCCCC
• This string can be represented more compactly by replacing each run of a repeated character with a count followed by a single occurrence of the character
• 4A3B2A5B8C
• For binary files a refined version of this method can yield dramatic savings
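
The run-length idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the refined binary-file variant the slide mentions; the function name `rle_encode` is an assumption made for this example.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with its count and the character."""
    # groupby yields one (character, run) pair per maximal run of equal characters.
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


print(rle_encode("AAAABBBAABBBBBCCCCCCCC"))  # 4A3B2A5B8C
```

Note that this naive form only saves space when runs are long; a string with no repeats would double in length.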

4
• Suppose we wish to encode ABRACADABRA
• Instead of using the standard 8 (or 16) bits to represent these letters, why not use 3?
• A = 000
• B = 001
• C = 010
• D = 011
• R = 100
• Encoded: 000 001 100 000 010 000 011 000 001 100 000
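
As a rough sketch, the 3-bit code from this slide can be applied with a plain dictionary lookup. The names `FIXED_CODE` and `encode_fixed` are assumptions made for this example.

```python
# The 3-bit code table from the slide.
FIXED_CODE = {"A": "000", "B": "001", "C": "010", "D": "011", "R": "100"}


def encode_fixed(text, code=FIXED_CODE):
    """Encode text with a fixed-width code, separating code words with blanks."""
    return " ".join(code[ch] for ch in text)


encoded = encode_fixed("ABRACADABRA")
print(encoded)  # 000 001 100 000 010 000 011 000 001 100 000

# 11 letters at 3 bits each, compared with 8 bits per letter.
print(len(encoded.replace(" ", "")), "bits instead of", 8 * len("ABRACADABRA"))  # 33 bits instead of 88
```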

5
• Why use the same number of bits for each letter?
• A = 0
• B = 1
• C = 01
• D = 10
• R = 11
• Encoded: 0 1 11 0 01 0 10 0 1 11 0
• This is not really a code because it depends on the blanks
• Without the blanks: 011100101001110
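
One way to see why this assignment depends on the blanks is to enumerate every message that produces the same bit string once the blanks are removed. The sketch below does that by brute force; `AMBIGUOUS_CODE` and `all_decodings` are names assumed for this example.

```python
# The variable-length assignment from the slide (not a prefix code).
AMBIGUOUS_CODE = {"A": "0", "B": "1", "C": "01", "D": "10", "R": "11"}


def all_decodings(bits, code=AMBIGUOUS_CODE):
    """Return every message whose encoding, written without blanks, equals bits."""
    if not bits:
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Try this code word first, then decode whatever remains.
            results.extend(symbol + rest for rest in all_decodings(bits[len(word):], code))
    return results


decodings = all_decodings("011100101001110")
print("ABRACADABRA" in decodings)  # True
print(len(decodings) > 1)          # True: other messages fit the same bits
```

Because both 0 and 1 are code words on their own, reading the string one bit at a time is always a valid alternative decoding, so the receiver cannot tell which message was meant.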

6
• A slightly different code
• A = 1
• B = 010
• C = 000
• D = 001
• R = 011
• Can you decode this without the blanks?
• 0001010

7
• A slightly different code
• A = 1
• C = 000
• D = 001
• B = 010
• R = 011
• Why can you decode without having the blanks?

8
• A (5) = 1
• C (1) = 000
• D (1) = 001
• B (2) = 010
• R (2) = 011
• What do you notice about the number of bits used to represent each character?
[Figure: binary code tree with leaves A, C, D, B and R; branches labeled 0 and 1]
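
To make the observation concrete, the total message length can be computed by weighting each code word length by how often its letter occurs, taking the numbers in parentheses as the letter counts in ABRACADABRA. The variable names below are assumptions made for this example.

```python
# Letter counts and code words as listed on the slide.
frequencies = {"A": 5, "C": 1, "D": 1, "B": 2, "R": 2}
code = {"A": "1", "C": "000", "D": "001", "B": "010", "R": "011"}

# Total bits = sum over letters of (count * code word length).
variable_bits = sum(frequencies[s] * len(code[s]) for s in frequencies)

# The earlier fixed-width code spends 3 bits on every letter.
fixed_bits = 3 * sum(frequencies.values())

print(variable_bits, fixed_bits)  # 23 33
```

The most frequent letter gets the shortest code word, which is where the saving comes from.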

9
• The general method for finding this code was developed by D. Huffman in 1952
• Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code
• The most common source symbols use shorter strings of bits than less common source symbols
• Used in many compression programs

10
• Start with your text: GO GO TIGERS
• Build a frequency table

Character  Frequency
G          3
O          2
(space)    2
T          1
I          1
E          1
R          1
S          1
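
The frequency table above can be reproduced with Python's standard `collections.Counter`; the variable name `frequency_table` is an assumption made for this example.

```python
from collections import Counter

# Count every character in the text, including the blanks.
frequency_table = Counter("GO GO TIGERS")

# most_common() lists entries from most to least frequent.
for character, count in frequency_table.most_common():
    print(repr(character), count)
```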

11
• Create a tree from the two characters that appear least often
• Merge them into a single table entry whose frequency is the sum of the two
• Repeat until everything is merged into one tree
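
The merge procedure on this slide can be sketched with a priority queue. This is one possible implementation under stated assumptions, not necessarily the one used in the presentation: the function name `huffman_code` is invented here, ties between equal frequencies are broken by insertion order, and the left branch is labeled 0. Different tie-breaking gives different code words but the same total length.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Build a prefix code by repeatedly merging the two least frequent entries."""
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    heap = [(freq, next(tiebreak), symbol) for symbol, freq in Counter(text).items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a single symbol still needs one bit

    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent entry
        f2, _, right = heapq.heappop(heap)  # next least frequent entry
        # The merged entry is a subtree whose frequency is the sum of its parts.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    code = {}

    def assign(node, prefix):
        if isinstance(node, tuple):  # internal node: (left subtree, right subtree)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                        # leaf: a single character
            code[node] = prefix

    assign(heap[0][2], "")
    return code


for symbol, word in sorted(huffman_code("GO GO TIGERS").items(), key=lambda kv: len(kv[1])):
    print(repr(symbol), word)
```

Running the same function on ABRACADABRA gives a code with the same code word lengths as the earlier slide (one bit for A, three bits for the other letters), though the exact bit patterns may differ.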

