
Huffman codes
Binary character code: each character is represented by a unique binary string. A data file (of 100 characters, with the frequencies below) can be coded in two ways:

                       a     b     c     d     e     f
frequency (%)          45    13    12    16    9     5
fixed-length code      000   001   010   011   100   101
variable-length code   0     101   100   111   1101  1100

The first way needs 100 × 3 = 300 bits. The second way needs 45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224 bits.
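As a quick check of the bit counts, a minimal Python sketch using the frequencies and code lengths from the table above (the variable names are illustrative):

```python
# Frequencies (per 100 characters) and the two codes from the table above.
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
fixed = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100", "f": "101"}
variable = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

fixed_bits = sum(freq[c] * len(fixed[c]) for c in freq)        # 100 * 3 = 300
variable_bits = sum(freq[c] * len(variable[c]) for c in freq)  # 224
print(fixed_bits, variable_bits)  # 300 224
```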

Variable-length code
Decoding needs some care. Consider 001011101 with the codewords a=0, b=00, c=01, d=11. Where do we cut? 00 can be read as either aa or b, so this code is ambiguous. (The prefixes of 0011 are 0, 00, 001, and 0011.)
Prefix codes: no codeword is a prefix of any other codeword (prefix-free). Prefix codes are simple to encode and decode.
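A minimal sketch of decoding with a prefix-free code (the helper name decode_prefix and the bit-by-bit matching loop are illustrative, not from the slides):

```python
def decode_prefix(bits, codewords):
    """Decode a bit string using a prefix-free code given as {char: codeword}."""
    inverse = {word: ch for ch, word in codewords.items()}  # codeword -> character
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:   # prefix-free: the first match is the whole codeword
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("leftover bits: not a valid sequence of codewords")
    return "".join(out)

code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
print(decode_prefix("001011101", code))  # aabe
```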

Using the codewords in the table to encode and decode
Encode: abc = 0.101.100 = 0101100 (just concatenate the codewords).
Decode: 001011101 = 0.0.101.1101 = aabe

                       a     b     c     d     e     f
frequency (%)          45    13    12    16    9     5
fixed-length code      000   001   010   011   100   101
variable-length code   0     101   100   111   1101  1100
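Encoding is just table lookup plus concatenation; a one-line sketch using the same (assumed) code dictionary as above:

```python
code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
encode = lambda text: "".join(code[ch] for ch in text)
print(encode("abc"))  # 0101100
```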

Encode: abc = 0.101.100 = 0101100 (just concatenate the codewords).
Decode: 001011101 = 0.0.101.1101 = aabe (use the binary tree for the variable-length code, on the right).
(Figure: two code trees over the leaves a:45, b:13, c:12, d:16, e:9, f:5. Left: the tree for the fixed-length code. Right: the tree for the variable-length code.)
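Decoding with the code tree walks down from the root, going left on 0 and right on 1 until a leaf is reached. A minimal sketch; since the slide's figure is not reproduced here, the tree is built from the code table, with nested dicts standing in for tree nodes:

```python
def build_tree(codewords):
    """Build a nested-dict code tree from {char: codeword}; leaves hold the character."""
    root = {}
    for ch, word in codewords.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch
    return root

def decode_with_tree(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf: emit the character, restart at the root
            out.append(node)
            node = root
    return "".join(out)

code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
print(decode_with_tree("001011101", build_tree(code)))  # aabe
```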

Binary tree
Every nonleaf node has two children. The fixed-length code in our example is not optimal. The total number of bits required to encode a file is
    B(T) = sum over all characters c in C of f(c)·dT(c)
where f(c) is the frequency (number of occurrences) of c in the file and dT(c) is the depth of c's leaf in the tree T.

Constructing an optimal code
Formal definition of the problem:
Input: a set of characters C = {c1, c2, ..., cn}, where each c in C has frequency f[c].
Output: a binary tree representing codewords so that the total number of bits required for the file is minimized.
Huffman proposed a greedy algorithm to solve the problem.

(Figure: step-by-step construction of the Huffman tree for a:45, b:13, c:12, d:16, e:9, f:5. (b) merge f:5 and e:9 into a node of weight 14; (c) merge c:12 and b:13 into a node of weight 25; (d) merge the 14-node and d:16 into a node of weight 30; (e) merge the 25-node and the 30-node into a node of weight 55; (f) merge a:45 and the 55-node into the root of weight 100.)

HUFFMAN(C)
1  n := |C|
2  Q := C
3  for i := 1 to n-1 do
4      z := ALLOCATE_NODE()
5      x := left[z] := EXTRACT_MIN(Q)
6      y := right[z] := EXTRACT_MIN(Q)
7      f[z] := f[x] + f[y]
8      INSERT(Q, z)
9  return EXTRACT_MIN(Q)

The Huffman Algorithm
This algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. C is a set of n characters, and each character c in C has a defined frequency f[c]. Q is a priority queue, keyed on f, used to identify the two least-frequent objects to merge. The result of a merger is a new object (an internal node) whose frequency is the sum of the frequencies of the two merged objects.
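A runnable sketch of the same bottom-up construction in Python, using heapq as the priority queue. The tuple layout, the tie-breaking counter, and the code-extraction helper are implementation choices, not part of the slides:

```python
import heapq
from itertools import count

def huffman(freq):
    """Build a Huffman code from {char: frequency}; returns {char: codeword}."""
    tiebreak = count()  # keeps heap comparisons away from the tree nodes themselves
    # Each heap entry is (frequency, tiebreak, tree); a tree is a char or a (left, right) pair.
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    for _ in range(len(freq) - 1):         # n-1 merges, as in lines 3-8 of HUFFMAN(C)
        fx, _, x = heapq.heappop(heap)     # EXTRACT_MIN(Q)
        fy, _, y = heapq.heappop(heap)     # EXTRACT_MIN(Q)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))  # f[z] = f[x] + f[y]
    _, _, root = heap[0]                   # the remaining object is the root of T

    codes = {}
    def walk(node, prefix):                # label left edges 0 and right edges 1
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"    # degenerate case: a one-character alphabet
    walk(root, "")
    return codes

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
print(huffman(freq))  # codeword lengths come out as a:1, b:3, c:3, d:3, e:4, f:4
```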

Time complexity
The loop body (lines 4-8) is executed n-1 times, and each heap operation in it takes O(lg n) time. The total time required is O(n lg n).
Note: the details of the heap operations will not be tested; the O(n lg n) time complexity should be remembered.

Another example: frequencies e:4, a:6, c:6, b:9, d:11.
(Figure: step-by-step construction. Merge e:4 and a:6 into a node of weight 10; merge c:6 and b:9 into a node of weight 15; merge the 10-node and d:11 into a node of weight 21; merge the 15-node and the 21-node into the root of weight 36.)
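Running the huffman sketch from the algorithm slide on this alphabet reproduces the same merges (the exact codeword assignments depend on how the tie between the two weight-6 characters is broken, but the total cost of 82 bits does not):

```python
# Reuses huffman() from the sketch above.
freq = {"e": 4, "a": 6, "c": 6, "b": 9, "d": 11}
print(huffman(freq))  # with this implementation: e:3, a:3, c:2, b:2, d:2 (82 bits in total)
```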

Correctness of Huffman's Greedy Algorithm (Fun Part, not required)
Again, we use our general strategy. Let x and y be the two characters in C having the lowest frequencies (the first two characters selected by the greedy algorithm). We will show two properties:
1. There exists an optimal solution Topt (a binary tree representing codewords) such that x and y are siblings in Topt.
2. Let z be a new character with frequency f[z] = f[x] + f[y] and C' = C - {x, y} ∪ {z}. Let T' be an optimal tree for C'. Then we can get Topt from T' by replacing the leaf z with an internal node whose two children are the leaves x and y.

Proof of Property 1
(Figure: Topt, with two deepest siblings b and c; and Tnew, obtained from Topt by exchanging x with b and y with c.)
Look at two siblings at the lowest level of Topt, say b and c. Exchange x with b and y with c to obtain Tnew. Then B(Topt) - B(Tnew) ≥ 0, since f[x] and f[y] are the smallest frequencies. So Tnew is also optimal and has x and y as siblings; Property 1 is proved.
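A sketch of the algebra behind that inequality, assuming (without loss of generality) f[x] ≤ f[b] and f[y] ≤ f[c], and writing d(·) for depths in Topt:

```latex
\begin{align*}
B(T_{\mathrm{opt}}) - B(T_{\mathrm{new}})
  &= \bigl(f[x]\,d(x) + f[b]\,d(b) + f[y]\,d(y) + f[c]\,d(c)\bigr) \\
  &\quad - \bigl(f[x]\,d(b) + f[b]\,d(x) + f[y]\,d(c) + f[c]\,d(y)\bigr) \\
  &= (f[b]-f[x])\,\bigl(d(b)-d(x)\bigr) + (f[c]-f[y])\,\bigl(d(c)-d(y)\bigr) \;\ge\; 0,
\end{align*}
```

since b and c are at maximum depth (so d(b) ≥ d(x) and d(c) ≥ d(y)) and x and y have minimum frequency.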

Proof of Property 2
Let z be a new character with frequency f[z] = f[x] + f[y] and C' = C - {x, y} ∪ {z}. Let T' be an optimal tree for C'. Then we can get Topt from T' by replacing the leaf z with an internal node whose children are x and y.
Proof: Let T be the tree obtained from T' by replacing z with these three nodes. Then
B(T) = B(T') + f[x] + f[y]. ... (1)
(The codewords for x and y are one bit longer than the codeword for z.)
Now we prove that T is optimal for C, by contradiction. Suppose T is not optimal; then B(T) > B(Topt). ... (2)
By Property 1, x and y are siblings in Topt. Thus we can delete x and y from Topt, turning their parent into a leaf z, and get another tree T'' for C'. Then
B(T'') = B(Topt) - f[x] - f[y]     (same argument as (1), applied to Topt)
       < B(T) - f[x] - f[y]        (using (2))
       = B(T')                     (using (1)).
Thus B(T'') < B(T'), contradicting the assumption that T' is optimal for C'.
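A sketch of the algebra behind identity (1), writing d_T and d_T' for depths in T and T' (every character other than x, y, z keeps its depth, and d_T(x) = d_T(y) = d_T'(z) + 1):

```latex
\begin{align*}
B(T) &= \sum_{c \in C' \setminus \{z\}} f[c]\, d_{T'}(c)
        + (f[x] + f[y])\,\bigl(d_{T'}(z) + 1\bigr) \\
     &= \Bigl(\sum_{c \in C' \setminus \{z\}} f[c]\, d_{T'}(c) + f[z]\, d_{T'}(z)\Bigr)
        + f[x] + f[y] \\
     &= B(T') + f[x] + f[y].
\end{align*}
```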