Greedy Algorithms (Huffman Coding)

1 Greedy Algorithms (Huffman Coding)

2 Huffman Coding
A technique to compress data effectively.
[Figure: original file -> Huffman coding -> compressed file]
- Usually between 20% and 90% compression
- Lossless compression: no information is lost
- When you decompress, you get the original file back

3 Huffman Coding: Applications
[Figure: original file and compressed file]
- Saving space: store compressed files instead of the original files
- Transmitting files or data: send compressed data to save transmission time and power
- Encryption and decryption: the compressed file cannot be read without knowing the "key" (the coding tree). This is obfuscation only, not cryptographically secure encryption.

4 Main Idea: Frequency-Based Encoding
Assume only 6 characters appear in this file: E, A, C, T, K, N. The frequencies are:

Character  Frequency
E          10,000
A           4,000
C             300
T             200
K             100
N             100

Option 1 (No compression)
- Each character = 1 byte (8 bits)
- Total file size = 14,700 * 8 = 117,600 bits

Option 2 (Fixed-size compression)
- We have 6 characters, so we need 3 bits to encode them
- Total file size = 14,700 * 3 = 44,100 bits

Character  Fixed Encoding
E          000
A          001
C          010
T          100
K          110
N          111

5 Main Idea: Frequency-Based Encoding (Cont'd)
Assume only 6 characters appear in this file: E, A, C, T, K, N.

Option 3 (Huffman compression)
- Variable-length compression
- Assign shorter codes to more frequent characters and longer codes to less frequent characters

Character  Frequency  Huffman Encoding
E          10,000     0
A           4,000     10
C             300     110
T             200     1110
K             100     11110
N             100     11111

Total file size:
(10,000 x 1) + (4,000 x 2) + (300 x 3) + (200 x 4) + (100 x 5) + (100 x 5) = 20,700 bits
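The arithmetic for the three options can be checked with a short Python sketch. The frequencies and codes are the ones from the tables above; the variable names are illustrative only.

```python
# Character frequencies from the slide (they sum to 14,700).
freq = {"E": 10_000, "A": 4_000, "C": 300, "T": 200, "K": 100, "N": 100}

# Variable-length codes from Option 3.
huffman = {"E": "0", "A": "10", "C": "110", "T": "1110", "K": "11110", "N": "11111"}

total_chars = sum(freq.values())

# Option 1: one byte (8 bits) per character.
uncompressed_bits = total_chars * 8

# Option 2: the smallest fixed width that can distinguish 6 symbols is 3 bits.
fixed_width = (len(freq) - 1).bit_length()
fixed_bits = total_chars * fixed_width

# Option 3: each character costs as many bits as its code is long.
huffman_bits = sum(freq[ch] * len(code) for ch, code in huffman.items())

print(uncompressed_bits)  # 117600
print(fixed_bits)         # 44100
print(huffman_bits)       # 20700
```

With these numbers the variable-length code needs less than half the bits of the 3-bit fixed encoding, because the two most frequent characters get the two shortest codes.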

6 Huffman Coding
A variable-length coding for characters:
- More frequent characters -> shorter codes
- Less frequent characters -> longer codes
- It is not like ASCII coding, where all characters have the same code length (8 bits)
Two main questions:
- How to assign codes (encoding process)?
- How to decode, i.e., generate the original file from the compressed file (decoding process)?

7 Decoding for fixed-length codes is much easier

Character  Fixed-length Encoding
E          000
A          001
C          010
T          100
K          110
N          111

Divide the bit string into groups of 3, then decode each group: C A T K N E
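A minimal Python sketch of fixed-length decoding follows. The bit string is not preserved in this transcript, so the one used here is an assumption, reconstructed by encoding the slide's decoded output (C A T K N E) with the table above. The function name `decode_fixed` is hypothetical.

```python
# Fixed-length table from the slide.
FIXED = {"E": "000", "A": "001", "C": "010", "T": "100", "K": "110", "N": "111"}

def decode_fixed(bits, table, width=3):
    """Split the bit string into width-sized groups and look each one up."""
    if len(bits) % width != 0:
        raise ValueError("bit string length is not a multiple of the code width")
    reverse = {code: ch for ch, code in table.items()}
    groups = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(reverse[g] for g in groups)

# Assumed input: 010 001 100 110 111 000
decoded = decode_fixed("010001100110111000", FIXED)
print(decoded)  # CATKNE
```

Because every code has the same width, the group boundaries are known in advance and no lookahead is needed.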

8 Decoding for variable-length codes is not that easy…

Character  Variable-length Encoding
E          0
A          00
C          001

What does 000001 mean? It could be EEEC, EAC, or AEC.
Huffman encoding is guaranteed to avoid this ambiguity: there is always a single decoding.
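The ambiguity can be made concrete by enumerating every way to split the bit string into codes. This is an illustrative Python sketch using the table from this slide; `all_decodings` is a hypothetical helper, not part of any library.

```python
def all_decodings(bits, table):
    """Return every string whose encoding under `table` is exactly `bits`."""
    if not bits:
        return [""]
    results = []
    for ch, code in table.items():
        if bits.startswith(code):
            # Consume this code, then decode the remainder in every possible way.
            for rest in all_decodings(bits[len(code):], table):
                results.append(ch + rest)
    return results

# The ambiguous (non-prefix-free) table from the slide.
AMBIGUOUS = {"E": "0", "A": "00", "C": "001"}

decodings = sorted(all_decodings("000001", AMBIGUOUS))
print(decodings)  # ['AEC', 'EAC', 'EEEC']
```

The three results arise because the code for E (0) is a prefix of the code for A (00), so the decoder cannot tell where one code ends.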

9 Huffman Algorithm
Step 1: Get Frequencies
- Scan the file to be compressed and count the occurrence of each character
- Sort the characters based on their frequency
Step 2: Build Tree & Assign Codes
- Build a Huffman-code tree (binary tree)
- Traverse the tree to assign codes
Step 3: Encode (Compress)
- Scan the file again and replace each character by its code
Step 4: Decode (Decompress)
- Huffman tree is the key to decompress the file

10 Step 1: Get Frequencies
Input file: Eerie eyes seen near lake.
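Step 1 can be sketched in Python with `collections.Counter` from the standard library. The sort order used for printing (highest count first) is just one reasonable choice, not something the slide specifies.

```python
from collections import Counter

text = "Eerie eyes seen near lake."

# Count how often each character occurs (case-sensitive; spaces and '.' count too).
freq = Counter(text)

# Print the characters sorted by descending frequency, then alphabetically.
for ch, n in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
    print(repr(ch), n)
```

Note that uppercase "E" and lowercase "e" are distinct characters here, so they get separate counts.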

11 Step 2: Build Huffman Tree & Assign Codes
The Huffman tree is a binary tree in which each character is a leaf node.
- Initially, each node is a separate root
- At each step, select the two roots with the smallest frequencies and connect them to a new parent (break ties arbitrarily) [the greedy choice]
- The parent gets the sum of the frequencies of its two child nodes
- Repeat until only one root remains
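The greedy loop above can be sketched in Python with a min-heap from the standard `heapq` module. The tree representation (a leaf is a character, an internal node is a `(left, right)` pair) and the name `build_huffman_tree` are assumptions made for this example. Ties are broken by insertion order, which is one arbitrary choice among many, so the resulting tree may differ in shape from the one drawn on the slides.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(freq):
    """Return (root_weight, tree) for a non-empty {char: frequency} mapping."""
    if not freq:
        raise ValueError("need at least one character")
    tiebreak = count()  # unique counter so heap entries never compare trees
    # Initially every character is its own root.
    heap = [(w, next(tiebreak), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Greedy choice: take the two roots with the smallest frequencies...
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        # ...and connect them under a new parent whose weight is their sum.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    weight, _, tree = heap[0]
    return weight, tree

weight, tree = build_huffman_tree(Counter("Eerie eyes seen near lake."))
print(weight)  # 26
```

Each merge removes two roots and adds one, so a file with n distinct characters needs exactly n - 1 merges.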

12 Example
Each character has a leaf node with its frequency

13-23 Find the smallest two frequencies… Replace them with their parent
[Figures: one merge step per slide]

24 Now we have a single root…This is the Huffman Tree

25 Let's Analyze the Huffman Tree
- All characters are at the leaf nodes
- The number at the root = the number of characters in the file
- High-frequency characters (e.g., "e") are near the root
- Low-frequency characters are far from the root
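These properties can be checked with a small Python sketch that runs the same greedy merging but tracks only each character's depth instead of building the tree. The helper name `leaf_depths` is hypothetical, and ties are broken by insertion order, so depths of equally frequent characters may differ from the slide's figure.

```python
import heapq
from collections import Counter

def leaf_depths(freq):
    """Run the greedy merges, returning (root_weight, {char: depth in the tree})."""
    # Each heap entry: (weight, tiebreak id, {char: depth below this root}).
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Every leaf under the new parent moves one level further from the root.
        merged = {ch: d + 1 for ch, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    root_weight, _, depths = heap[0]
    return root_weight, depths

text = "Eerie eyes seen near lake."
root_weight, depths = leaf_depths(Counter(text))

print(root_weight == len(text))             # True: the root holds the file length
print(depths["e"] == min(depths.values()))  # True: the most frequent char is nearest the root
```

A character's depth equals the length of its code, which is why placing frequent characters near the root shortens the encoded file.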

26 Let's Assign Codes
Traverse the tree:
- Any left edge -> add label 0
- Any right edge -> add label 1
- The code for each character is its root-to-leaf label sequence

27 Let's Assign Codes (Cont'd)
[Figure: the Huffman tree with its edges labeled 0 and 1]

28 Let's Assign Codes: Coding Table
[Figure: the coding table, read off each character's root-to-leaf label sequence]
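The traversal can be sketched as a short recursive Python function. Since the coding table for the "Eerie eyes" example is not preserved in this transcript, the sketch uses a hand-built tree that corresponds to the six-character example from slide 5; the nested-tuple representation (a leaf is a character, an internal node is a `(left, right)` pair) and the name `assign_codes` are assumptions for illustration.

```python
def assign_codes(tree, prefix="", table=None):
    """Label left edges 0 and right edges 1; a leaf's code is its root-to-leaf path."""
    if table is None:
        table = {}
    if isinstance(tree, str):
        # Leaf: record the labels collected on the way down.
        # (A tree that is a single leaf has an empty path, so give it "0".)
        table[tree] = prefix or "0"
    else:
        left, right = tree
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

# Hand-built tree matching the slide-5 example (E is the left child of the root).
tree = ("E", ("A", ("C", ("T", ("K", "N")))))

codes = assign_codes(tree)
print(codes)
# {'E': '0', 'A': '10', 'C': '110', 'T': '1110', 'K': '11110', 'N': '11111'}
```

Because characters sit only at leaves, no root-to-leaf path can pass through another character, which is what makes the resulting codes prefix-free.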

30 Step 3: Encode (Compress) the File
Input file: Eerie eyes seen near lake.
Input file + coding table -> generate the encoded file: 0000 10 1100 0001 10 ….
Notice that no code is a prefix of any other code. This ensures that the decoding is unique (unlike Slide 8).
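A minimal Python sketch of encoding plus a prefix-free check follows. The full coding table for the "Eerie eyes" example is not preserved in this transcript, so the sketch borrows the six-character table from slide 5 and the character sequence C A T K N E from slide 7 as an illustrative input. The names `encode` and `is_prefix_free` are hypothetical.

```python
def encode(text, table):
    """Replace each character by its code and concatenate the results."""
    return "".join(table[ch] for ch in text)

def is_prefix_free(table):
    """True if no code is a prefix of a different character's code."""
    codes = list(table.values())
    return not any(
        i != j and codes[j].startswith(codes[i])
        for i in range(len(codes))
        for j in range(len(codes))
    )

# Variable-length table from slide 5.
HUFFMAN = {"E": "0", "A": "10", "C": "110", "T": "1110", "K": "11110", "N": "11111"}
# The ambiguous table from slide 8, for contrast.
AMBIGUOUS = {"E": "0", "A": "00", "C": "001"}

bits = encode("CATKNE", HUFFMAN)
print(bits)                       # 11010111011110111110
print(is_prefix_free(HUFFMAN))    # True
print(is_prefix_free(AMBIGUOUS))  # False
```

The encoded output is written without separators; the spaces shown in the slide's bit string are only there to help the reader see the code boundaries.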

31 Step 4: Decode (Decompress)
You must have the encoded file and the coding tree.
- Scan the encoded file
- For each 0 -> move left in the tree
- For each 1 -> move right
- When you reach a leaf node -> emit that character and go back to the root
Encoded file (0000 10 1100 0001 ….) + coding tree -> generate the original file (Eerie …)
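The tree walk can be sketched in Python as below. As an assumption for illustration, the tree is the hand-built one for the six-character example from slide 5 (a leaf is a character, an internal node is a `(left, right)` pair), and the input is that example's encoding of C A T K N E; the slide's own tree for "Eerie eyes" is not preserved in this transcript.

```python
def decode(bits, tree):
    """Walk the tree bit by bit: 0 goes left, 1 goes right, a leaf emits a character."""
    out = []
    node = tree
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"unexpected symbol {bit!r} in bit string")
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):
            out.append(node)  # reached a leaf: emit it...
            node = tree       # ...and go back to the root
    if node is not tree:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(out)

# Tree for the codes E=0, A=10, C=110, T=1110, K=11110, N=11111.
tree = ("E", ("A", ("C", ("T", ("K", "N")))))

decoded = decode("11010111011110111110", tree)
print(decoded)  # CATKNE
```

No separators are needed between codes: reaching a leaf is the only signal that a code has ended, which works only because the code is prefix-free.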

