Proving the Correctness of Huffman’s Algorithm

Quiz: Suppose we have an alphabet containing 4 characters, {a, b, c, d}. Which of the following codes could be valid Huffman codes, and which could not?
1. a=00, b=01, c=10, d=11
2. a=0, b=10, c=110, d=111
3. a=0, b=10, c=110, d=1111
4. a=00, b=01, c=101, d=100
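One way to check the candidates: a Huffman code always corresponds to a full binary tree (every internal node has two children), so its codeword lengths must satisfy the Kraft equality, i.e. the sum of 2^(-length) over all codewords is exactly 1; a sum below 1 betrays an unused branch of the tree. A minimal Python sketch of this necessary-condition check (the variable names are ours):

    from fractions import Fraction

    def kraft_sum(code):
        # Sum of 2^(-len(w)) over all codewords w, in exact arithmetic.
        return sum(Fraction(1, 2 ** len(w)) for w in code.values())

    candidates = [
        {"a": "00", "b": "01", "c": "10", "d": "11"},
        {"a": "0", "b": "10", "c": "110", "d": "111"},
        {"a": "0", "b": "10", "c": "110", "d": "1111"},
        {"a": "00", "b": "01", "c": "101", "d": "100"},
    ]
    for code in candidates:
        s = kraft_sum(code)
        print(code, "Kraft sum =", s, "->", "possible" if s == 1 else "not a Huffman code")

Codes 1 and 2 pass (and each is indeed produced by Huffman's algorithm for suitable frequencies, e.g. four roughly equal frequencies for code 1); codes 3 and 4 leave part of the tree unused, so no run of the algorithm can produce them.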

Huffman algorithm: We must prove that the Huffman algorithm always produces optimal codes. A code is optimal for a given source (an alphabet with given character frequencies) if the length of the encoded source is less than or equal to the length produced by any other uniquely decodable code.

The sibling property: Let C be an alphabet in which each character has a frequency, and let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit (x and y are "siblings").
Proof: we take the tree T representing an arbitrary optimal prefix code and modify it into a tree representing another optimal prefix code in which the characters x and y appear as sibling leaves of maximum depth.

The sibling property (cont.): In the optimal tree T, leaves a and b are two siblings of maximum depth, while the lowest-frequency leaves x and y may appear in arbitrary positions in T. Swapping leaves a and x produces tree T', and then swapping leaves b and y produces tree T''. Since neither swap increases the cost (the computation below makes this precise), the resulting tree T'' is also an optimal tree, and in it x and y are sibling leaves of maximum depth.
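To see why a swap cannot increase the cost, write depth(c) for the depth of character c's leaf, so that the cost of a tree is the sum of freq[c] * depth(c) over all characters. Exchanging the leaves x and a changes the cost by

    freq[x]*depth(a) + freq[a]*depth(x) - freq[x]*depth(x) - freq[a]*depth(a)
        = -(freq[a] - freq[x]) * (depth(a) - depth(x)) <= 0,

since freq[x] <= freq[a] (x has minimal frequency) and depth(x) <= depth(a) (a has maximal depth). The same computation applies to the swap of b and y.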

Huffman algorithm – Recursive formulation

Procedure RecursiveHuffman(SetOfCharactersWithFreq C)
    n = card(C)
    if n <= 2 then
        assign a 1-bit codeword to each character
    else
        select from C the characters x, y having the minimum frequencies
        create the character z having the frequency freq[x] + freq[y]
        create the set C' = C - {x, y} + {z}, card(C') = n - 1
        RecursiveHuffman(C')
        codeword of x = codeword of z + "1"
        codeword of y = codeword of z + "0"
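A runnable sketch of this recursive formulation in Python (the function and variable names are ours; frequencies are passed as a dict, and a tuple stands in for the merged pseudo-character z):

    import heapq

    def recursive_huffman(freq):
        # freq: dict mapping each character to its frequency.
        # Returns a dict mapping each character to its binary codeword.
        if len(freq) <= 2:
            # Base case: at most two characters, one bit each.
            return {c: str(i) for i, c in enumerate(freq)}
        # Select the two characters x, y with minimum frequencies.
        x, y = heapq.nsmallest(2, freq, key=freq.get)
        # Create the pseudo-character z with frequency freq[x] + freq[y].
        z = (x, y)  # a tuple cannot collide with the real characters
        reduced = {c: f for c, f in freq.items() if c not in (x, y)}
        reduced[z] = freq[x] + freq[y]
        # Solve the smaller instance C' = C - {x, y} + {z}.
        codes = recursive_huffman(reduced)
        # The codewords of x and y are z's codeword extended by one bit.
        codes[x] = codes[z] + "1"
        codes[y] = codes[z] + "0"
        del codes[z]
        return codes

    print(recursive_huffman({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))

For the frequency set above, this yields codeword lengths 1, 3, 3, 3, 4, 4, the same lengths as the usual iterative, heap-based formulation of the algorithm.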

Huffman – proof: We prove that the Huffman algorithm produces optimal codes by induction on n, the number of characters in the alphabet.
Base case, n = 2: each character is encoded using 1 bit, and this is optimal (you cannot use less than 1 bit per character).
Inductive hypothesis: assume that Huffman produces optimal codes for all alphabets having n = 2, ..., k-1 characters, for some k > 2.
Inductive step: show that Huffman produces optimal codes also for alphabets having n = k characters.

Huffman Proof – Inductive step: For alphabets of size k: let C = {c1, c2, ..., ck} be an alphabet having k characters, and let x and y be its two least frequent characters. Let L be the length of the encoding that Huffman's algorithm produces for the source C. The recursive call produces a code for C' = C - {x, y} + {z}, an alphabet of k-1 characters with the same frequencies as C, except that freq[z] = freq[x] + freq[y]. By the inductive hypothesis, the code produced for C' (having k-1 characters) is optimal; let its encoded length be L'. Then
L = L' + freq[x] + freq[y],
because the codewords for C are exactly the codewords for C', except that the single character z is replaced by the two characters x and y, whose codewords are 1 bit longer than the codeword of z. We therefore pay 1 extra bit for every occurrence of x or y, and together these occur freq[x] + freq[y] = freq[z] times.
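A small numeric illustration of L = L' + freq[x] + freq[y], using the classic six-character frequency set (the optimal code for the reduced alphabet below is one we supply by hand; it is what the algorithm itself would compute):

    freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
    # x = "f" and y = "e" are the two least frequent; "z" is their merge.
    reduced = {"a": 45, "b": 13, "c": 12, "d": 16, "z": 14}
    codes_reduced = {"a": "0", "b": "101", "c": "100", "d": "111", "z": "110"}

    # Expand z into f and e, each one bit longer (as the algorithm does).
    codes_full = dict(codes_reduced)
    prefix = codes_full.pop("z")
    codes_full["f"] = prefix + "1"
    codes_full["e"] = prefix + "0"

    def encoded_length(freq, codes):
        # Total encoded length = sum of frequency * codeword length.
        return sum(freq[c] * len(codes[c]) for c in freq)

    L_reduced = encoded_length(reduced, codes_reduced)   # L' = 210
    L_full = encoded_length(freq, codes_full)            # L  = 224
    assert L_full == L_reduced + freq["f"] + freq["e"]   # 224 = 210 + 14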

Huffman Proof – Inductive step (cont.): To prove that the code is optimal for C, we must show that no other code can achieve a length LL < L. Assume, for contradiction, that another code encodes C with length LL < L. By the sibling property, we may take this code to be one in which the codewords for x and y are siblings: there exists an optimal prefix code in which the two least frequent characters have codewords of the same length, differing only in the last bit. Merging these sibling codewords turns the code into a code for C' with some length LL', where LL = LL' + freq[x] + freq[y]. Since the code of length L' is optimal for C', we have L' <= LL'. Therefore
LL = LL' + freq[x] + freq[y] >= L' + freq[x] + freq[y] = L.
But LL >= L contradicts the assumption LL < L, so no other code can encode C with a length shorter than L.
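The proof needs no computation, but for small alphabets the claim can also be verified by brute force: every full binary code tree arises from some sequence of pairwise merges, so comparing Huffman's greedy merge order against all merge orders checks optimality directly. A small Python sketch (the weights are an arbitrary example of ours):

    import heapq
    from itertools import combinations

    def huffman_cost(weights):
        # Huffman tree cost: each merge of weights a and b adds a + b
        # to the total encoded length.
        heap = list(weights)
        heapq.heapify(heap)
        total = 0
        while len(heap) > 1:
            a, b = heapq.heappop(heap), heapq.heappop(heap)
            total += a + b
            heapq.heappush(heap, a + b)
        return total

    def best_cost(weights):
        # Minimum encoded length over ALL full binary trees, by trying
        # every pair at every merge step (exponential; tiny inputs only).
        if len(weights) == 1:
            return 0
        best = None
        for i, j in combinations(range(len(weights)), 2):
            rest = [w for k, w in enumerate(weights) if k != i and k != j]
            cost = weights[i] + weights[j] + best_cost(rest + [weights[i] + weights[j]])
            if best is None or cost < best:
                best = cost
        return best

    weights = [45, 13, 12, 16, 9, 5]
    assert huffman_cost(weights) == best_cost(weights) == 224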