
Huffman Codes

Overview
• Huffman codes: a widely used technique for compressing data; savings of 20% to 90% are typical, depending on the characteristics of the data being compressed.
• Huffman's greedy algorithm uses a table of the frequencies of occurrence of each character to build up an optimal way of representing each character as a binary string.
• C denotes the alphabet of characters to be encoded.
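As a quick sanity check on the claimed savings, the following Python snippet (not part of the original slides) uses the six frequencies that appear in the worked example later in this deck; the letter-to-frequency assignment and the codeword lengths are assumptions taken from the optimal tree built there.

```python
# Assumed frequency table (per 100 characters); the numbers 45, 12, 13, 5, 9, 16
# are the ones used in the worked example later in these slides, the letters are assumed.
freq   = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
# Codeword lengths of the optimal prefix-code tree built in that example.
length = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}

fixed_bits   = 3 * sum(freq.values())                   # fixed-length code: 3 bits/char -> 300
huffman_bits = sum(freq[c] * length[c] for c in freq)   # optimal prefix code            -> 224

print(fixed_bits, huffman_bits)                         # 300 224
print(f"savings: {1 - huffman_bits / fixed_bits:.1%}")  # savings: 25.3%
```

Even on this small alphabet the optimal code saves about a quarter of the space; more skewed frequency distributions give the larger savings quoted above.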

Prefix Code
• Prefix(-free) code: no codeword is also a prefix of some other codeword (unambiguous).
» An optimal data compression achievable by a character code can always be achieved with a prefix code.
» Prefix codes simplify both encoding (compression) and decoding.
• Encoding: concatenate the codewords, e.g. with a = 0, b = 101, c = 100: abc -> 0·101·100 = 0101100.
• Decoding: the first codeword of an encoded string is unambiguous, e.g. 001011101 -> 0·0·101·1101 -> aabe (with e = 1101).
» Use a binary tree to represent the prefix code for easy decoding.
• An optimal code is always represented by a full binary tree, in which every non-leaf node has two children.
» |C| leaves and |C| - 1 internal nodes (Exercise B.5-3).
• Cost of a code tree T: B(T) = Σ_{c ∈ C} f[c] d_T(c), where f[c] is the frequency of c and d_T(c), the depth of c in T, is the length of c's codeword.
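To make tree-based decoding concrete, here is a minimal Python sketch (not from the slides). The codeword table is an assumption chosen to match the example code above; decoding walks the binary tree, going left on 0 and right on 1, and emits a character whenever it reaches a leaf.

```python
def build_decoding_tree(codewords):
    """Turn a {char: bitstring} table into a nested-dict binary tree."""
    root = {}
    for ch, bits in codewords.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(b, {})   # internal node on the path
        node[bits[-1]] = ch                 # leaf stores the character
    return root

def decode(bitstring, codewords):
    """Decode a prefix-coded bitstring by repeated root-to-leaf walks."""
    root = build_decoding_tree(codewords)
    out, node = [], root
    for b in bitstring:
        node = node[b]
        if isinstance(node, str):           # reached a leaf: emit it, restart at the root
            out.append(node)
            node = root
    return ''.join(out)

# Assumed codeword table (the standard textbook example code).
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}
print(decode('001011101', code))            # -> aabe
```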

[Figure: binary-tree representation of a fixed-length code for the example alphabet (not optimal).]

Constructing A Huffman Code
• C is a set of n characters.
» Each character c ∈ C is an object with a frequency, denoted by f[c].
• The algorithm builds the code tree T in a bottom-up manner.
» Begin with |C| leaves and perform a sequence of |C| - 1 merging operations.
» A min-priority queue Q, keyed on f, is used to identify the two least-frequent objects to merge.
» The result of merging two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.

Constructing A Huffman Code (Cont.)
[Pseudocode of HUFFMAN(C) shown on the slide: build the min-priority queue Q from the characters of C; repeat |C| - 1 times: extract the two minimum-frequency nodes x and y, create a new node z with f[z] = f[x] + f[y] and children x and y, and insert z into Q; return the single remaining node as the root of the code tree.]
• Each EXTRACT-MIN and INSERT on a binary min-heap takes O(lg n) time.
• Building the initial heap takes O(n) time, and the loop performs |C| - 1 merges.
• Total computation time = O(n lg n).
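A minimal, runnable Python sketch of this construction (not the slide's own pseudocode); heapq plays the role of the min-priority queue Q, and the tuple counter used for tie-breaking is an implementation detail added here. Each of the |C| - 1 merges costs O(lg n) heap work, matching the bound above.

```python
import heapq

def huffman(freqs):
    """Build a Huffman code from a {char: frequency} dict, following the
    greedy construction described above. Returns a {char: codeword} dict."""
    # Tree nodes are tuples (char, left, right); the running counter breaks
    # frequency ties so the heap never tries to compare two tree nodes.
    heap = [(f, i, (c, None, None)) for i, (c, f) in enumerate(freqs.items())]
    heapq.heapify(heap)                                        # O(n)
    counter = len(heap)
    for _ in range(len(freqs) - 1):                            # |C| - 1 merges
        fx, _, x = heapq.heappop(heap)                         # two least-frequent objects...
        fy, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (fx + fy, counter, (None, x, y)))  # ...merged into a new node z
        counter += 1                                           # each heap op is O(lg n)
    _, _, root = heap[0]

    codewords = {}
    def walk(node, prefix):
        ch, left, right = node
        if ch is not None:                                     # leaf: record its codeword
            codewords[ch] = prefix or '0'
        else:
            walk(left, prefix + '0')
            walk(right, prefix + '1')
    walk(root, '')
    return codewords

# Frequencies from the worked example later in these slides (letters assumed).
code = huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5})
print(code)   # one optimal code; exact bit patterns depend on tie-breaking
```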

Lemma 16.2 (Greedy-Choice)
• Let C be an alphabet in which each character c ∈ C has frequency f[c]. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.
• Proof idea: take an optimal tree T and let a and b be sibling leaves of maximum depth; swap a with x to obtain T', then swap b with y to obtain T''. Since each swap does not increase the cost, the resulting tree T'' is also an optimal tree.

Proof of Lemma 16.2
• Let a and b be sibling leaves of maximum depth in an optimal tree T. Without loss of generality, assume f[a] ≤ f[b] and f[x] ≤ f[y]; since x and y have the lowest frequencies, f[x] ≤ f[a] and f[y] ≤ f[b].
• Let T' be T with a and x exchanged, and T'' be T' with b and y exchanged.
• The cost difference between T and T' is B(T) - B(T') = (f[a] - f[x])(d_T(a) - d_T(x)) ≥ 0, and similarly B(T') - B(T'') ≥ 0, so B(T'') ≤ B(T); but T is optimal, so B(T) ≤ B(T''), hence B(T'') = B(T).
• Therefore T'' is an optimal tree in which x and y appear as sibling leaves of maximum depth.
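The slide states the swap inequality without the intermediate algebra; the following derivation is a reconstruction along the standard textbook lines, using the cost B(T) = Σ f[c] d_T(c) defined earlier (only the leaves a and x change depth when they are exchanged):

```latex
\begin{aligned}
B(T) - B(T')
  &= \sum_{c \in C} f[c]\, d_T(c) \;-\; \sum_{c \in C} f[c]\, d_{T'}(c) \\
  &= f[x]\, d_T(x) + f[a]\, d_T(a) \;-\; f[x]\, d_{T'}(x) - f[a]\, d_{T'}(a) \\
  &= f[x]\, d_T(x) + f[a]\, d_T(a) \;-\; f[x]\, d_T(a) - f[a]\, d_T(x) \\
  &= \bigl(f[a] - f[x]\bigr)\bigl(d_T(a) - d_T(x)\bigr) \;\ge\; 0.
\end{aligned}
```

The product is non-negative because f[a] ≥ f[x] (x has a lowest frequency in C) and d_T(a) ≥ d_T(x) (a is a leaf of maximum depth). The same argument applied to b and y gives B(T') - B(T'') ≥ 0.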

Greedy-Choice?
• Define the cost of a single merger as the sum of the frequencies of the two items being merged.
• Of all possible mergers at each step, HUFFMAN chooses the one that incurs the least cost.

Lemma 16.3 (Optimal Substructure)
• Let C' = C - {x, y} ∪ {z}, where
» f[z] = f[x] + f[y].
• Let T' be any tree representing an optimal prefix code for C'. Then the tree T, obtained from T' by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for C.
• Observation: B(T) = B(T') + f[x] + f[y], i.e. B(T') = B(T) - f[x] - f[y].
» For each c ∈ C - {x, y}: d_T(c) = d_{T'}(c), so f[c] d_T(c) = f[c] d_{T'}(c).
» d_T(x) = d_T(y) = d_{T'}(z) + 1.
» Hence f[x] d_T(x) + f[y] d_T(y) = (f[x] + f[y])(d_{T'}(z) + 1) = f[z] d_{T'}(z) + (f[x] + f[y]).

Example of the observation B(T') = B(T) - f[x] - f[y], with the leaves x (f = 5) and y (f = 9) merged into z (f[z] = 14):
B(T)  = 45·1 + 12·3 + 13·3 + 5·4 + 9·4 + 16·3 = 224
B(T') = 45·1 + 12·3 + 13·3 + (5 + 9)·3 + 16·3 = 210 = B(T) - f[x] - f[y]
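A two-line Python check of this arithmetic (the numbers are those above; x and y are the leaves with frequencies 5 and 9):

```python
# Verify B(T) = B(T') + f[x] + f[y] for the example tree above.
B_T      = 45*1 + 12*3 + 13*3 + 5*4 + 9*4 + 16*3   # 224
B_Tprime = 45*1 + 12*3 + 13*3 + (5 + 9)*3 + 16*3   # 210
assert B_T == B_Tprime + 5 + 9
print(B_T, B_Tprime)                                # 224 210
```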

Proof of Lemma 16.3
• Prove by contradiction.
• Suppose that T does not represent an optimal prefix code for C. Then there exists a tree T'' such that B(T'') < B(T).
• Without loss of generality (by Lemma 16.2), T'' has x and y as siblings. Let T''' be the tree T'' with the common parent of x and y replaced by a leaf z of frequency f[z] = f[x] + f[y]. Then
  B(T''') = B(T'') - f[x] - f[y] < B(T) - f[x] - f[y] = B(T').
» So T''' is better than T', contradicting the assumption that T' represents an optimal prefix code for C'.