Algorithms CSCI 235, Spring 2019 Lecture 31 Huffman Codes


Huffman Codes Huffman codes are a way to compress data. They are widely used and effective, typically saving 20% to 90% of space, depending on the data. A greedy algorithm uses the frequency of occurrence of each character to build an optimal code that represents each character as a binary string.

Variable length codes We can save space by assigning frequently occurring characters short codes and infrequently occurring characters long codes.

Character:      a     b     c     d     e     f    Total
Frequency:     45    13    12    16     9     5      100
Code word:      0   101   100   111  1101  1100

Number of bits = (45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4) * 1000 = 2.24 x 10^5 bits (this is optimal).
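As a check, the arithmetic above can be reproduced with a short script (a sketch; as on the slide, each frequency is scaled by 1000, corresponding to a 100,000-character file):

```python
# Frequencies and variable-length code words from the table above
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

# Total bits = sum over characters of (occurrences * code word length)
total_bits = sum(freq[ch] * 1000 * len(code[ch]) for ch in freq)
print(total_bits)  # 224000, i.e. 2.24 x 10^5 bits
```

For comparison, a fixed-length code for 6 characters needs 3 bits per character, or 3.0 x 10^5 bits, so the variable-length code saves about 25%.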

Tree for variable length code
[Tree figure: the root (total 100) has leaf a:45 as its left child (edge 0) and an internal node 55 as its right child (edge 1). Node 55 splits into 25 and 30; 25 splits into leaves c:12 and b:13; 30 splits into an internal node 14 and leaf d:16; 14 splits into leaves f:5 and e:9. Left edges are labeled 0, right edges 1.]
Only full binary trees (every non-leaf node has exactly 2 children) represent optimal codes.

Size of Tree The tree for an optimal prefix code has 1 leaf for each letter in the alphabet. Let C be the alphabet from which the characters are drawn, and |C| the size of C (the number of characters in the alphabet). Then the number of leaves in the tree = |C|, and the number of internal nodes in the tree = |C| - 1.

Cost of the tree The number of bits to encode a given character c is the depth dT(c) of the leaf that contains that character. The depth is the length of the path from the root to the leaf. Given a tree T corresponding to a prefix code, the number of bits to encode a file is:

    B(T) = sum over all c in C of f(c) * dT(c)

where f(c) is the frequency of c. This is the cost of the tree.
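The cost formula can be evaluated directly for the example tree (a sketch; the depths are read off the tree on the earlier slide):

```python
# f(c): frequencies; d(c): depth of each leaf in the example tree
f = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
d = {'a': 1, 'b': 3, 'c': 3, 'd': 3, 'e': 4, 'f': 4}

# B(T) = sum over c in C of f(c) * dT(c)
cost = sum(f[c] * d[c] for c in f)
print(cost)  # 224
```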

Greedy Algorithm to create a Huffman code The greedy algorithm to create an optimal prefix code works as follows: We store the characters in Q, which we implement as a binary min-heap. (A binary min-heap is a heap with the minimum at the root; each child's value is greater than or equal to its parent's value.) At each stage, our greedy choice is to combine the two nodes with the smallest frequencies into a single new node whose frequency is the sum of the frequencies of the original two nodes. (The new node gets pointers to the original two nodes as its left and right children.)

Greedy Pseudocode

Huffman(C)
    n = |C|
    Q = C                             // store characters of C in a min-heap Q
    for i = 1 to n - 1
        z = Allocate-Node()           // create an empty node
        z.left = x = Extract-Min(Q)   // node with lowest frequency
        z.right = y = Extract-Min(Q)  // next lowest frequency
        z.freq = x.freq + y.freq
        Insert(Q, z)                  // insert z in the appropriate place in Q
    return Extract-Min(Q)             // last remaining node in Q is root of tree
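A minimal runnable transcription of this pseudocode, using Python's heapq module as the binary min-heap (the tuple layout and helper names are illustrative, not part of the slide; exact code words depend on tie-breaking, but the code lengths and cost do not):

```python
import heapq
import itertools

def huffman(freqs):
    """Build a Huffman tree from a {character: frequency} table.
    Leaves are 1-tuples (char,); internal nodes are pairs (left, right)."""
    tie = itertools.count()              # tie-breaker so the heap never compares trees
    q = [(f, next(tie), (c,)) for c, f in freqs.items()]
    heapq.heapify(q)                     # Q = C, stored as a binary min-heap
    for _ in range(len(freqs) - 1):      # n - 1 merges build the whole tree
        fx, _, x = heapq.heappop(q)      # Extract-Min: lowest frequency
        fy, _, y = heapq.heappop(q)      # Extract-Min: next lowest frequency
        heapq.heappush(q, (fx + fy, next(tie), (x, y)))  # Insert(Q, z)
    return q[0][2]                       # last remaining node is the root

def code_lengths(node, depth=0):
    """Depth of each leaf = length of that character's code word."""
    if len(node) == 1:                   # leaf: (char,)
        return {node[0]: depth}
    left, right = node
    lengths = code_lengths(left, depth + 1)
    lengths.update(code_lengths(right, depth + 1))
    return lengths

freqs = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
lengths = code_lengths(huffman(freqs))
cost = sum(freqs[c] * lengths[c] for c in freqs)
print(lengths, cost)  # lengths 1,3,3,3,4,4 give cost 224
```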

Example Q ← f:5, e:9, c:12, b:13, d:16, a:45. We will work this out in class.
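A sketch of the merge sequence for this example: at each step the two smallest frequencies are combined, and the sum goes back into the heap.

```python
import heapq

q = [5, 9, 12, 13, 16, 45]   # f, e, c, b, d, a
heapq.heapify(q)
merges = []
while len(q) > 1:
    x = heapq.heappop(q)      # smallest frequency
    y = heapq.heappop(q)      # next smallest
    merges.append(x + y)      # new internal node's frequency
    heapq.heappush(q, x + y)
print(merges)  # [14, 25, 30, 55, 100]
```

The sums 14, 25, 30, 55, 100 are exactly the internal nodes of the tree on the earlier slide.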

Running time of Huffman algorithm Assume Q is implemented as a binary min-heap. The time to build the heap = ? The for loop is executed n-1 times (once for each internal node of the tree). Time for each Extract-Min? Time for each Insert? T(n) = ?

Showing the greedy choice property We can show that this algorithm produces optimal trees if we can show the greedy choice property and the optimal substructure property for this problem. The Greedy Choice Property: Let C be an alphabet in which each character c has frequency f(c), and let x and y be two characters in C having the lowest frequencies. We need to show that there exists an optimal prefix code for C in which the code words for x and y have the same length and differ only in the last bit (i.e., x and y are siblings, children of the same parent, in an optimal tree). We will show this in class.

Using Huffman codes To decode a file, start at the root and proceed down the tree according to the bits in the message (0 = left, 1 = right). When a leaf is encountered, output the character at that leaf and restart at the root. Each message may use a different tree, so the tree must be saved with the message. Huffman codes are effective for long files, where the savings in the message offset the cost of storing the tree. They are also effective when the tree can be precomputed and used for a large number of messages (e.g., a tree based on the frequency of occurrence of characters in the English language). Huffman codes are not very good for random files, where each character occurs with about the same frequency.
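A decoding sketch along these lines, with the example tree hard-coded as nested tuples (this representation is illustrative; it matches the code a=0, b=101, c=100, d=111, e=1101, f=1100):

```python
# Internal node = (left, right); leaf = a character.
tree = ('a', (('c', 'b'), (('f', 'e'), 'd')))

def decode(bits, root):
    """Walk the tree: 0 = left, 1 = right; at a leaf, emit and restart."""
    out, node = [], root
    for b in bits:
        node = node[0] if b == '0' else node[1]
        if isinstance(node, str):      # reached a leaf
            out.append(node)
            node = root                # restart at the root
    return ''.join(out)

print(decode('0100101', tree))  # 'acb' (0 -> a, 100 -> c, 101 -> b)
```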