Presentation is loading. Please wait.

Presentation is loading. Please wait.

Bits and Huffman Encoding Please get a piece of paper and a pen and put your name and netid on it. Make sure you can turn in it after class without losing.

Similar presentations


Presentation on theme: "Bits and Huffman Encoding Please get a piece of paper and a pen and put your name and netid on it. Make sure you can turn in it after class without losing."— Presentation transcript:

1 Bits and Huffman Encoding Please get a piece of paper and a pen and put your name and netid on it. Make sure you can turn in it after class without losing your notes.

2 How is data stored in a computer? Several different ways are possible In an ASCII-encoded text file (hithero known as a “text file”), each character consists of 8- bit chunks What’s a bit?

3 Bits Mean What You Want Them To Mean 01001101 01001001 01001011 01000101 MI K E But why 8-bit chunks? Couldn’t we interpret the same string as 4 bit chunks? 0100 1101 0100 1001 0100 1011 0100 0101 4134941145 With 8 bit chucks there are 2 8 (256) possible characters. With 4 bit chunks there are 2 4 (16) possible characters.

4 What if we want to save memory? Could just use fewer bits per chunk, but that limits the number of characters we can represent But my document contains a lot more ‘a’s than ‘%’s. What if the code for a was “01” and the code for % was “10000101”

5 A Problem Mike decides he wants to compress his vast collection of Harry Potter fanfic (all stored a textfiles). He notices that the character ‘H’ occurs much more frequently in the textfiles than does the character ‘%’, but they both use 8 bits. So he changes the encoding so that ‘H’ is represented by ‘01’ (2 bits) and ‘%’ is represented by ‘1010101010101010’ (16 bits). Assuming all other characters stay at 8 bits (a bit unlikely…but just pretend)…how much more frequent do ‘H’s need to be than ‘%’s for there to be a space savings? 1.There must be more than 3Hs for every %s 2.There must be more than 3Hs for every 2%s 3.There must be more than 4Hs for every 3%s 4.As long as there more Hs than %s you get a savings

6 We can get savings if we know what characters occur frequently So imagine you look at some set of data with character frequencies. How much can you save?

7 Consider the string: AACCCAAABAADAE LetterFrequency A8 C3 B1 D1 E1 What’s the best encoding?

8 Why will this encoding not work? LetterFrequencyEncoding A80 C31 B110 D101 E111 What is the encoding for AAE? What is the encoding for ADC?

9 Consider the string: AACCCAAABAADAE The rule: one character’s encoding cannot be the prefix of another. So if A=01, no other encoding can begin with 01 LetterFrequencyEncoding A8? C3? B1? D1? E1? Work with your row to find the most efficient possible encoding for each character. I was able to encode the entire string in 25 bits. See if you can do it that efficiently – but be careful with prefixes. Write your encoding, plus your computation of its total length, on your handin sheet.

10 My Encoding AACCCAAABAADAE The rule: one character’s encoding cannot be the prefix of another. So if A=01, no other encoding can begin with 01 LetterFrequencyEncoding A80 C311 B1100 D11010 E11011 There a several possible encodings that can give you 25. I’m pretty sure there’s no more efficient encoding, at least as far as compressing individual characters.

11 Given a set of frequencies, how should you select the character encoding for maximum efficiency? Huffman Encoding

12 The basic idea We will make a “huffman tree”. By repeatedly combining nodes with small frequencies into “meta nodes” with larger frequency A letter’s position in the tree will tell us what its encoded form is

13 Generate a huffman tree and encodings for this circumstance Follow the tutorial linked off the resources section in Sakai Letter2FrequencyEncoding A12? C9? B8? D8? E5? F5? Write the huffman tree and the resultant encoding on your handin sheet.

14 The Header The Magic Number Counts Or a tree for extra credit

15 The Psuedo-EOF character A made up character to write to your file

16 Please turn your sheet in at the back of the room (try for a neat pile)


Download ppt "Bits and Huffman Encoding Please get a piece of paper and a pen and put your name and netid on it. Make sure you can turn in it after class without losing."

Similar presentations


Ads by Google