HUFFMAN CODES

Greedy Algorithm Design

Steps of greedy algorithm design:
1. Formulate the optimization problem.
2. Show that the greedy choice can lead to an optimal solution, so that the greedy choice is always safe (the Greedy-Choice Property).
3. Demonstrate that an optimal solution to the original problem = the greedy choice + an optimal solution to the remaining subproblem (the Optimal Substructure Property).

Together, these two properties are a good clue that a greedy strategy will solve the problem.

Encoding and Compression of Data

Definition: reduce the size of data, i.e., the number of bits needed to represent it.
Benefits: reduces the storage needed; reduces transmission cost / latency / bandwidth.
Applicable to many forms of data transmission. Our example: text files.

Huffman Codes

Proposed by David A. Huffman in 1952 in the paper "A Method for the Construction of Minimum-Redundancy Codes". Used for compressing data (sequences of characters); widely used and very efficient (savings of 20% to 90% are typical). The method uses a table of the frequencies of occurrence of the characters and outputs a binary string.

Huffman Coding

Huffman codes can be used to compress information. WinZip's DEFLATE format uses Huffman coding as one of its stages (combined with LZ77), and JPEG uses Huffman coding as part of its compression process. The key observation is that not all characters occur with the same frequency. So instead of storing each character in a file as an 8-bit ASCII value, we store the more frequently occurring characters using fewer bits and the less frequently occurring characters using more bits. On average this decreases the file size, often by about half.

Computer Data Encoding

How do we represent data in binary? Historical solution: fixed-length codes. Encode every symbol by a unique binary string of a fixed length. Example: ASCII (an 8-bit code).

Total Space Usage in Bits: Fixed-Length Codes

Assume an ℓ-bit fixed-length code. A file of n characters needs nℓ bits. Example: suppose an alphabet of 3 symbols {A, B, C} and a file of 1,000,000 characters. A fixed-length code needs ⌈log₂ 3⌉ = 2 bits per symbol, for a total of 2,000,000 bits.
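
A minimal Python sketch of that arithmetic (the alphabet and file size are the ones from the example above):

import math

alphabet = {"A", "B", "C"}
n = 1_000_000                                           # characters in the file
bits_per_symbol = math.ceil(math.log2(len(alphabet)))   # 2 bits cover 3 symbols
print(bits_per_symbol * n)                              # -> 2000000 bits in total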

Variable-Length Codes

Idea: in order to save space, use fewer bits for frequent characters and more bits for rare characters.

Huffman Codes

A file of 100,000 characters, containing only 'a' to 'f'. Three bits suffice for a fixed-length code over 6 characters; a variable-length code instead gives frequent characters short codewords and infrequent characters long codewords. For example, "abc" is "000001010" under the fixed-length code below but "0101100" under the variable-length code.

Char   Frequency   Fixed-length codeword   Variable-length codeword
'a'    45,000      000                     0
'b'    13,000      001                     101
'c'    12,000      010                     100
'd'    16,000      011                     111
'e'    9,000       100                     1101
'f'    5,000       101                     1100

Fixed-length encoding of the file: 100,000 x 3 = 300,000 bits. Variable-length encoding: 1*45,000 + 3*13,000 + 3*12,000 + 3*16,000 + 4*9,000 + 4*5,000 = 224,000 bits, a saving of approximately 25%.
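
A quick sanity check of that arithmetic, assuming the frequency table and codeword lengths above:

freq = {"a": 45000, "b": 13000, "c": 12000, "d": 16000, "e": 9000, "f": 5000}
lengths = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 4, "f": 4}   # variable-length codeword lengths
fixed = 3 * sum(freq.values())                               # 300000 bits at 3 bits/char
variable = sum(freq[c] * lengths[c] for c in freq)           # 224000 bits
print(fixed, variable, f"{1 - variable / fixed:.0%}")        # -> 300000 224000 25%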

Prefix Codes

With variable-length codes we can use an idea called a prefix code, in which no codeword is also a prefix of some other codeword. A prefix code can always achieve the optimal data compression among character codes, so nothing is lost by restricting attention to prefix codes. Prefix codes are also desirable because they simplify decoding.

How do we decode?

With a fixed-length code, we know where every character starts, since they all have the same number of bits. Example: A = 000, B = 101, C = 100.

000 000 000 101 101 100
 A   A   A   B   B   C
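
A minimal sketch of fixed-length decoding (the 3-bit code table is the one above):

code = {"000": "A", "101": "B", "100": "C"}
bits = "000000000101101100"
# With a fixed 3-bit code, codeword boundaries fall at every third bit.
chunks = [bits[i:i + 3] for i in range(0, len(bits), 3)]
print("".join(code[c] for c in chunks))   # -> "AAABBC"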

How do we decode?

With a variable-length code, we use an idea called a prefix code, where no codeword is a prefix of another. Example: A = 0, B = 10, C = 11. None of these codewords is a prefix of another.

How do we decode?

Example: A = 0, B = 10, C = 11. So the string A A A B B C is encoded as 0 0 0 10 10 11.

Prefix Code

Example: A = 0, B = 10, C = 11. Decode the string 000101011:

0 0 0 10 10 11
A A A B  B  C
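
A minimal decoding sketch: read bits into a buffer until the buffer matches a codeword. This works precisely because the code is prefix-free, so the first match is the only possible one.

def decode(bits, code):
    inverse = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # unique match, guaranteed by the prefix property
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("input ended in the middle of a codeword")
    return "".join(out)

print(decode("000101011", {"A": "0", "B": "10", "C": "11"}))   # -> "AAABBC"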

Idea

Consider a binary tree, with 0 meaning a left turn and 1 meaning a right turn. Example: consider the paths from the root to each of the leaves A, B, C, D in a right-leaning tree (A hangs off the root, B one level deeper, and C and D are the deepest pair):

A : 0
B : 10
C : 110
D : 111

Observe: this is a prefix code, since each codeword is a path ending at a leaf, with no continuation. If the tree is full then we are not "wasting" bits. And if we make sure that the more frequent symbols are closer to the root, then they get shorter codewords.
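
A minimal sketch of reading a prefix code off a tree, representing the tree as nested tuples (leaves are symbols, internal nodes are (left, right) pairs); the tree here is the right-leaning A/B/C/D example above:

tree = ("A", ("B", ("C", "D")))

def codes_from_tree(node, prefix=""):
    if not isinstance(node, tuple):                     # leaf: the path so far is the codeword
        return {node: prefix}
    left, right = node
    table = codes_from_tree(left, prefix + "0")         # a left turn appends a 0
    table.update(codes_from_tree(right, prefix + "1"))  # a right turn appends a 1
    return table

print(codes_from_tree(tree))   # -> {'A': '0', 'B': '10', 'C': '110', 'D': '111'}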

Huffman Codes

A file of 100,000 characters. Either coding scheme can be represented by a binary tree: each leaf is labeled with a character and its frequency of occurrence, and each internal node is labeled with the sum of the frequencies of the leaves in its subtree.

Char   Frequency (thousands)   Fixed-length codeword   Variable-length codeword
'a'    45                      000                     0
'b'    13                      001                     101
'c'    12                      010                     100
'd'    16                      011                     111
'e'    9                       100                     1101
'f'    5                       101                     1100

(Tree diagrams omitted. The fixed-length code corresponds to a tree that is not full; the variable-length code corresponds to a full binary tree, i.e., one in which every non-leaf node has 2 children.)

Huffman Codes - Greedy Algorithm

To find an optimal code for a file:

1. The coding must be unambiguous. We consider codes in which no codeword is also a prefix of another codeword, i.e., prefix codes. Prefix codes are unambiguous, and once the codewords are decided it is easy to compress (encode) and decompress (decode).
2. The file size must be smallest. Such a code can be represented by a full binary tree, with the less frequent characters near the bottom.

Let C be the alphabet (e.g., C = {'a','b','c','d','e','f'}). For each character c, the number of bits needed to encode all of c's occurrences is freq(c) * depth(c), so the cost of the tree, i.e., the number of bits to encode the file, is

B(T) = Σ_{c ∈ C} freq(c) * depth(c)

With the variable-length codewords above ('a' = 0, 'b' = 101, 'c' = 100, 'd' = 111, 'e' = 1101, 'f' = 1100), e.g., "abc" is coded as "0101100".

Huffman Codes - Greedy Algorithm

1. Consider all pairs <frequency, symbol>.
2. Choose the two lowest frequencies and make them siblings, with their new parent having the combined frequency.
3. Iterate.

Huffman Codes

How do we find the optimal prefix code? The Huffman code (1952) was invented to solve exactly this, using a greedy approach driven by a min-priority queue Q. For the example file, Q evolves as follows (lowest frequency first):

f:5  e:9  c:12  b:13  d:16  a:45
c:12  b:13  14  d:16  a:45        (f and e merged into a node of frequency 14)
14  d:16  25  a:45                (c and b merged into 25)
25  30  a:45                      (14 and d:16 merged into 30)
a:45  55                          (25 and 30 merged into 55)
100                               (a:45 and 55 merged into the root, 100)

Huffman Codes

Q is a min-priority queue.

HUFFMAN(C)
1  build Q from C
2  for i = 1 to |C| - 1
3      allocate a new node z
4      z.left = x = EXTRACT-MIN(Q)
5      z.right = y = EXTRACT-MIN(Q)
6      z.freq = x.freq + y.freq
7      insert z into Q at its correct position
8  return EXTRACT-MIN(Q)

If Q is implemented as a binary min-heap: "build Q from C" is O(n); the for loop runs O(n) times; each EXTRACT-MIN and each insert is O(lg n). So HUFFMAN(C) runs in O(n lg n) time.
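
The pseudocode translates almost line for line into Python using the heapq module. This is a sketch, not the textbook's code: trees are nested tuples as in the earlier sketch, and a running counter serves as a tie-breaker so heap entries stay comparable when frequencies are equal. As a bonus, the total cost B(T) can be accumulated during the merges, since a merge of combined frequency f pushes its whole subtree one level deeper and therefore adds f to the final bit count.

import heapq
from itertools import count

def huffman(freqs):
    ids = count()                       # tie-breaker for equal frequencies
    q = [(f, next(ids), sym) for sym, f in freqs.items()]
    heapq.heapify(q)                    # "build Q from C": O(n)
    total_bits = 0
    for _ in range(len(freqs) - 1):     # |C| - 1 merges
        fx, _, x = heapq.heappop(q)     # EXTRACT-MIN: O(lg n)
        fy, _, y = heapq.heappop(q)     # EXTRACT-MIN: O(lg n)
        total_bits += fx + fy           # this merge deepens both subtrees by one level
        heapq.heappush(q, (fx + fy, next(ids), (x, y)))   # insert z: O(lg n)
    return q[0][2], total_bits          # (root of the tree, cost B(T))

freqs = {"a": 45000, "b": 13000, "c": 12000, "d": 16000, "e": 9000, "f": 5000}
tree, bits = huffman(freqs)
print(bits)   # -> 224000, matching the table computed earlier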

Example

Building a Tree: Scan the Original Text

Consider the following short text: "Eerie eyes seen near lake." Count up the occurrences of all characters in the text.

Building a Tree: Scan the Original Text

"Eerie eyes seen near lake." What characters are present? E, e, r, i, space, y, s, n, a, l, k, and the period.

Building a Tree: Scan the Original Text

"Eerie eyes seen near lake." What is the frequency of each character in the text?

Char    Freq.
E       1
e       8
r       2
i       1
space   4
y       1
s       2
n       2
a       2
l       1
k       1
.       1
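
This count is easy to reproduce, e.g. with Python's collections.Counter:

from collections import Counter

freq = Counter("Eerie eyes seen near lake.")
print(freq)                 # Counter({'e': 8, ' ': 4, 'r': 2, 's': 2, 'n': 2, 'a': 2, ...})
print(sum(freq.values()))   # -> 26, which will also be the frequency at the root of the tree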

Building a Tree: Prioritize Characters

Create a binary tree node for each character, holding the character and its frequency. Place the nodes in a priority queue: the lower the occurrence count, the higher the priority in the queue.

Building a Tree: Merging Nodes

The queue after inserting all nodes, lowest frequency first:

E:1  i:1  y:1  l:1  k:1  .:1  r:2  s:2  n:2  a:2  sp:4  e:8

Repeatedly remove the two lowest-frequency nodes from the queue, make them the children of a new node whose frequency is the sum of theirs, and insert the new node back into the queue:

1.  E:1 + i:1 → 2
2.  y:1 + l:1 → 2
3.  k:1 + .:1 → 2
4.  r:2 + s:2 → 4
5.  n:2 + a:2 → 4
6.  (E,i):2 + (y,l):2 → 4
7.  (k,.):2 + sp:4 → 6
8.  (r,s):4 + (n,a):4 → 8
9.  (E,i,y,l):4 + (k,.,sp):6 → 10
10. e:8 + (r,s,n,a):8 → 16
11. (E,i,y,l,k,.,sp):10 + (e,r,s,n,a):16 → 26, the root

Notice what is happening to the characters with a low number of occurrences: each merge pushes them one level deeper, so the rarest characters end up farthest from the root.

Building a Tree

The finished tree contains the new codewords for each character. As a check, the frequency of the root node should equal the number of characters in the text: "Eerie eyes seen near lake." has 26 characters, and the root's frequency is 26.

Encoding the File: Traverse the Tree for Codes

Perform a traversal of the tree to obtain the new codewords: going left is a 0, going right is a 1. A codeword is complete only when a leaf node is reached.

Encoding the File: Traverse the Tree for Codes

Char    Code
E       0000
i       0001
y       0010
l       0011
k       0100
.       0101
space   011
e       10
r       1100
s       1101
n       1110
a       1111

Encoding the File

Rescan the text and encode the file using the new codewords. "Eerie eyes seen near lake." becomes:

000010110000011001110001010110101111011010111001111101011111100011001111110100100101

Why is there no need for a separator character? Because the code is prefix-free: no codeword is a prefix of another, so each codeword ends unambiguously.

Encoding the File: Results

Have we made things any better? The encoding above takes 84 bits, while 8-bit ASCII would take 8 * 26 = 208 bits.
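
The bit count is easy to verify; a sketch that re-encodes the sentence with the codeword table above hard-coded:

code = {"E": "0000", "i": "0001", "y": "0010", "l": "0011", "k": "0100",
        ".": "0101", " ": "011",  "e": "10",   "r": "1100", "s": "1101",
        "n": "1110", "a": "1111"}
text = "Eerie eyes seen near lake."
encoded = "".join(code[ch] for ch in text)
print(len(encoded), "bits vs", 8 * len(text), "bits of 8-bit ASCII")   # -> 84 bits vs 208 bits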

Decoding the File

How does the receiver know what the codes are? The tree is constructed afresh for each text file, since it depends on that file's character frequencies, so the tree (or the frequency table) is transmitted along with the data. Note that the transmission is bit-based rather than byte-based.

Decoding the File

Once the receiver has the tree, it scans the incoming bit stream: on a 0, go left; on a 1, go right. Decode the stream 10100011011110111101111110000110101:

A. elk nay sir
B. eel snake
C. eel kin sly
D. eek snarl nil
E. eel snarl.

Splitting the stream at the leaves gives 10 10 0011 011 1101 1110 1111 1100 0011 0101, which reads e e l space s n a r l . The answer is E: "eel snarl."
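
A sketch of that receiver: rebuild the tree from the codeword table (nested dicts, one level per bit), then walk it bit by bit, emitting a symbol and returning to the root at each leaf.

code = {"E": "0000", "i": "0001", "y": "0010", "l": "0011", "k": "0100",
        ".": "0101", " ": "011",  "e": "10",   "r": "1100", "s": "1101",
        "n": "1110", "a": "1111"}

def decode_with_tree(bits, code):
    root = {}
    for sym, word in code.items():       # insert each codeword as a root-to-leaf path
        node = root
        for b in word[:-1]:
            node = node.setdefault(b, {})
        node[word[-1]] = sym             # the final bit lands on the leaf
    out, node = [], root
    for b in bits:
        node = node[b]                   # 0 = left branch, 1 = right branch
        if not isinstance(node, dict):   # reached a leaf: emit and restart at the root
            out.append(node)
            node = root
    return "".join(out)

print(decode_with_tree("10100011011110111101111110000110101", code))   # -> "eel snarl."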

Summary

Huffman coding is a technique used to compress files for transmission. It uses statistical coding: more frequently used symbols have shorter codewords. It works well for text and fax transmission, and it is an application that brings together several data structures (binary trees and priority queues).