Greedy Algorithms Huffman Coding


Greedy Algorithms Huffman Coding Credits: Thanks to Dr. Suzan Koknar-Tezel for the slides on Huffman Coding.

Huffman Codes Widely used technique for compressing data Achieves a savings of 20% - 90% Assigns binary codes to characters

Fixed-length code? Consider a 6-character alphabet {a,b,c,d,e,f} Fixed-length: 3 bits per character Encoding a 100K character file requires 300K bits

Variable-length code Suppose you know the frequencies of characters in advance Main idea: Fewer bits for frequently occurring characters More bits for less frequent characters

Variable-length codes An example: Consider a 100,000-character file with only 6 different characters:

Character:                  a     b     c     d     e     f
Frequency (in thousands):   45    13    12    16    9     5
Variable-length codeword:   0     101   100   111   1101  1100

Savings compared to: ASCII – 72%, Unicode – 86%, Fixed-Len – 25%

Another way to look at this: the relative probability of character 'a' is 45K/100K = 0.45. Expected encoded character length: 0.45·1 + 0.12·3 + 0.13·3 + 0.16·3 + 0.09·4 + 0.05·4 = 2.24 bits. So for a string of n characters, the expected encoded length is 2.24·n bits.
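A few lines of Python (a sketch; the frequencies and codeword lengths are taken from the example above) confirm this average and the savings percentages quoted on the previous slide:

# Frequencies (thousands of occurrences) and codeword lengths for the
# 100,000-character example file.
freq_k = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
code_len = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}

total_bits = sum(freq_k[c] * 1000 * code_len[c] for c in freq_k)
print(total_bits)             # 224000
print(total_bits / 100_000)   # 2.24 bits per character on average

# Savings relative to 8-bit ASCII, 16-bit Unicode, and the 3-bit fixed code:
for name, bits in [('ASCII', 8), ('Unicode', 16), ('Fixed-Len', 3)]:
    saved = 1 - total_bits / (bits * 100_000)
    print(f'{name}: {saved:.0%}')   # ASCII: 72%, Unicode: 86%, Fixed-Len: 25%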

How to decode? Example: a = 0, b = 01, c = 10. Decode 0010: does it translate to "aac" or "aba"? Ambiguous.

How to decode? Example: a = 0, b = 101, c = 100. Decode 00100: it translates unambiguously to "aac".

What is the difference between the previous two codes?

What is the difference between the previous two codes? The second one is a prefix code!

Prefix Codes In a prefix code, no codeword is a prefix of another codeword. Why would we want this? It simplifies decoding: once a string of bits matches a character's code, we can output that character with no ambiguity. There is no need to look ahead, as the sketch below shows.
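As an illustration, here is a minimal decoding sketch in Python, using the prefix code a = 0, b = 101, c = 100 from the earlier slide; a character is output the moment the accumulated bits match a codeword:

# Left-to-right decoding with a prefix code: no look-ahead required.
codewords = {'0': 'a', '101': 'b', '100': 'c'}

def decode(bits):
    out, cur = [], ''
    for bit in bits:
        cur += bit
        if cur in codewords:   # a full match is never a prefix of another codeword
            out.append(codewords[cur])
            cur = ''
    return ''.join(out)

print(decode('00100'))  # 'aac'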

Prefix Codes (cont) We can use a binary tree for decoding: if the bit is 0, follow the left branch; if it is 1, follow the right branch. The leaves are the characters.
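A sketch of that tree walk in Python; the Node class and the hand-built tree (encoding a = 0, c = 100, b = 101 from the earlier example) are illustrative, not part of the original slides:

class Node:
    def __init__(self, char=None, left=None, right=None):
        self.char, self.left, self.right = char, left, right

# Tree for the code a = 0, c = 100, b = 101.
root = Node(left=Node('a'),
            right=Node(left=Node(left=Node('c'), right=Node('b'))))

def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node.left if bit == '0' else node.right
        if node.char is not None:   # reached a leaf: emit it and restart at the root
            out.append(node.char)
            node = root
    return ''.join(out)

print(decode('00100', root))  # 'aac'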

Prefix Codes (cont) [Figure: the prefix-code tree for the example file. Leaf frequencies (in thousands): a 45, b 13, c 12, d 16, e 9, f 5; merging f and e gives an internal node of frequency 14. The resulting codewords are a = 0, c = 100, b = 101, d = 111, f = 1100, e = 1101.]

Prefix Codes (cont) Given a tree T corresponding to a prefix code, we can compute the number of bits needed to encode the file: C = the set of unique characters in the file; f(c) = the frequency of character c in the file; d_T(c) = the depth of c's leaf node in T = the length of the code for character c.

Prefix Codes (cont) Then B(T), the number of bits required to encode the file, is

B(T) = Σ_{c ∈ C} f(c) · d_T(c)
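Evaluating this sum for the example tree (frequencies in thousands; depths read off the figure above, noting that d_T(c) is exactly the length of c's codeword) reproduces the earlier totals. A quick sketch:

# B(T) for the example file: f in thousands, d_T from the tree above.
f = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
d_T = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}
B = sum(f[c] * d_T[c] for c in f)
print(B)        # 224 thousand bits
print(B / 100)  # 2.24 bits per character, as computed earlier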

Huffman Codes (cont) Huffman's algorithm determines an optimal variable-length code (a Huffman code), i.e., one that minimizes B(T).

Greedy Algorithm for Huffman Codes Repeatedly merge the two lowest-frequency nodes x and y (leaf or internal) into a new node z until a single tree remains, setting f(z) = f(x) + f(y). You can also view this as replacing x and y with a single new character z in the alphabet; once the process is complete, if the code for z is determined to be, say, 11, then the code for x is 110 and the code for y is 111. Use a priority queue Q to keep the nodes ordered by frequency.

Example of Creating a Huffman Code [Figures: building the tree step by step for the alphabet c 15, b 25, d 40, a 50, e 75]

Merge the two lowest-frequency nodes, c (15) and b (25), into a node of frequency 40. Merge that node (40) with d (40) into a node of frequency 80. Merge a (50) and e (75) into a node of frequency 125. Finally, merge 80 and 125 into the root, of frequency 205.

Huffman Algorithm

Huffman(C)
  n = |C|
  Q = C                    // Q is a binary min-heap; Build-Heap: Θ(n)
  for i = 1 to n-1
    z = Allocate-Node()
    x = Extract-Min(Q)     // Θ(lg n), executed Θ(n) times
    y = Extract-Min(Q)     // Θ(lg n), executed Θ(n) times
    left(z) = x
    right(z) = y
    f(z) = f(x) + f(y)
    Insert(Q, z)           // Θ(lg n), executed Θ(n) times
  return Extract-Min(Q)    // return the root of the tree

Total run time: Θ(n lg n)
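For reference, here is a runnable Python version of this pseudocode, using the standard-library heapq module as the min-priority queue Q. This is a sketch: with different tie-breaking among equal frequencies the individual codewords may differ from the figures above, though the total cost B(T) is always the same.

import heapq
from itertools import count

def huffman(freqs):
    # Build a Huffman code from a {character: frequency} map.
    # A heap entry is (frequency, tie-breaker, tree), where a tree is
    # either a character (leaf) or a (left, right) pair (internal node).
    tie = count()   # unique tie-breaker so trees themselves are never compared
    q = [(f, next(tie), ch) for ch, f in freqs.items()]
    heapq.heapify(q)                                      # Build-Heap: Θ(n)
    for _ in range(len(freqs) - 1):                       # n-1 merges
        fx, _, x = heapq.heappop(q)                       # Extract-Min: Θ(lg n)
        fy, _, y = heapq.heappop(q)
        heapq.heappush(q, (fx + fy, next(tie), (x, y)))   # Insert: Θ(lg n)
    root = q[0][2]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: 0 goes left, 1 goes right
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:                             # leaf: record the finished codeword
            codes[node] = prefix or '0'   # single-character alphabet edge case
    walk(root, '')
    return codes

print(huffman({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}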

Correctness Claim: Consider the two characters x and y with the lowest frequencies. Then there is an optimal tree in which x and y are siblings at the deepest level of the tree.

Proof Let T be an arbitrary optimal prefix-code tree, and let a and b be two siblings at the deepest level of T. We will show that we can convert T into another prefix-code tree in which x and y are siblings at the deepest level, without increasing the cost: switch a with x, then switch b with y.

[Figure: three trees. First, T, with x and y higher in the tree and siblings a and b at the deepest level; second, the tree after swapping a with x; third, the tree after also swapping b with y, so that x and y are siblings at the deepest level.]

Assume f(x) ≤ f(y) and f(a) ≤ f(b). Since x and y have the lowest frequencies, we know that f(x) ≤ f(a) and f(y) ≤ f(b). In the calculation below, d_T(a) − d_T(x) is non-negative because a is at the maximum depth, and f(a) − f(x) is non-negative because x has (one of) the lowest frequencies.
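Spelled out (a sketch of the standard exchange calculation; T' denotes the tree obtained from T by swapping a and x), the cost comparison behind those two remarks is:

\begin{align*}
B(T) - B(T') &= \sum_{c \in C} f(c)\, d_T(c) - \sum_{c \in C} f(c)\, d_{T'}(c) \\
&= f(x)\, d_T(x) + f(a)\, d_T(a) - f(x)\, d_T(a) - f(a)\, d_T(x) \\
&= \bigl( f(a) - f(x) \bigr) \bigl( d_T(a) - d_T(x) \bigr) \;\ge\; 0.
\end{align*}

Both factors are non-negative, so the swap never increases the cost; the identical calculation for b and y handles the second swap.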

Since B(T') ≤ B(T), T' is at least as good as T. But T is optimal, so T' must be optimal too. Thus, moving x to the bottom (and, similarly, y to the bottom) yields an optimal solution.

The previous claim asserts that the greedy choice of Huffman's algorithm is the proper one to make.

Claim: Huffman's algorithm produces an optimal prefix-code tree. Proof (by induction on n = |C|): Basis: n = 1; the tree consists of a single leaf, which is optimal. Inductive case: assume that for strictly fewer than n characters, Huffman's algorithm produces an optimal tree; show this holds for exactly n characters.

By the previous claim, in the optimal tree the lowest-frequency characters x and y are siblings at the deepest level. Remove x and y, replacing them with a new character z with f(z) = f(x) + f(y). Thus n − 1 characters remain in the alphabet.

Let T' be any tree representing a prefix code for this (n−1)-character alphabet. Then we can obtain a prefix-code tree T for the original set of n characters by replacing the leaf node for z with an internal node having x and y as children. The cost of T is

B(T) = B(T') − f(z)·d(z) + f(x)·(d(z)+1) + f(y)·(d(z)+1)
     = B(T') − (f(x)+f(y))·d(z) + (f(x)+f(y))·(d(z)+1)
     = B(T') + f(x) + f(y)

To minimize B(T), we need T' to be built optimally, which is exactly what the inductive hypothesis says Huffman's algorithm does.

[Figure: the leaf for z is replaced by an internal node with children x and y.]