Huffman Codes

Presentation transcript:

Huffman Codes
A message consists of five characters: a, b, c, d, e
Probabilities: .12, .40, .15, .08, .25
Goal: encode each character as a sequence of 0's and 1's so that no character's code is a prefix of any other character's code (the prefix property)
With the prefix property, a string of 0's and 1's can be decoded by repeatedly deleting prefixes of the string that are codes for characters

Example
Symbol   Probability   Code 1   Code 2
  a          .12         000      000
  b          .40         001      11
  c          .15         010      01
  d          .08         011      001
  e          .25         100      10
Both codes have the prefix property
Decode code 1: "grab" 3 bits at a time and translate each group into a character
Ex.: 001 010 011 → bcd

Example Cont'd
Symbol   Probability   Code 1   Code 2
  a          .12         000      000
  b          .40         001      11
  c          .15         010      01
  d          .08         011      001
  e          .25         100      10
Decode code 2: repeatedly "grab" a prefix that is the code for some character and remove it from the input
The only difference is that the input cannot be "sliced" up all at once: how many bits to grab depends on which character is encoded
Ex.: 11 01 001 → bcd
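To make the prefix-grabbing loop concrete, here is a minimal Python sketch; the table name CODE2 and the helper decode_prefix are illustrative names, not from the slides:

# Code 2 from the example: no codeword is a prefix of another
CODE2 = {"a": "000", "b": "11", "c": "01", "d": "001", "e": "10"}

def decode_prefix(bits, code):
    # Repeatedly "grab" the unique codeword prefix of the remaining input;
    # the prefix property guarantees the first match is the right one
    reverse = {v: k for k, v in code.items()}  # bit string -> symbol
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:        # a complete codeword has been read
            out.append(reverse[buf])
            buf = ""
    if buf:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)

print(decode_prefix("1101001", CODE2))  # -> bcd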

Big Deal?
Huffman coding results in a shorter average length of the compressed (encoded) message
Average length: multiply the length of the code for each symbol by the probability of occurrence of that symbol, and sum
Code 1 has average length 3
Code 2 has average length 2.2: (3*.12) + (2*.40) + (2*.15) + (3*.08) + (2*.25)
Can we do better?
Problem: given a set of characters and their probabilities, find a code with the prefix property such that the average length of a code for a character is minimum
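As a sanity check, both averages can be computed directly; a small sketch over the same table (the names probs, code1, code2, and average_length are mine):

probs = {"a": .12, "b": .40, "c": .15, "d": .08, "e": .25}
code1 = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100"}
code2 = {"a": "000", "b": "11", "c": "01", "d": "001", "e": "10"}

def average_length(code, probs):
    # Expected bits per symbol: sum over symbols of (codeword length * probability)
    return sum(len(code[s]) * p for s, p in probs.items())

print(average_length(code1, probs))  # ~3.0
print(average_length(code2, probs))  # ~2.2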

Representation
Think of prefix codes as paths in binary trees: following a path from a node to its left child appends a 0 to the code, and proceeding from a node to its right child appends a 1
Label the leaves of the tree with the characters they represent
Any prefix code can be represented as a binary tree; the prefix property guarantees that no character's code ends at an interior node
Conversely, labeling the leaves of any binary tree with characters gives a code with the prefix property
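The tree view of decoding can be sketched the same way; here leaves are symbols and each interior node is a (left, right) pair, a representation chosen for this transcript rather than prescribed by the slides:

def decode_tree(bits, root):
    # Follow edges from the root: 0 goes left, 1 goes right; a leaf emits a symbol
    out, node = [], root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):   # reached a leaf: a character
            out.append(node)
            node = root             # restart at the root for the next codeword
    return "".join(out)

# Tree for Code 2 (a=000, b=11, c=01, d=001, e=10)
code2_tree = ((("a", "d"), "c"), ("e", "b"))
print(decode_tree("1101001", code2_tree))  # -> bcd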

Sample Binary Trees
[Figure: the two code trees, with 0 on left edges and 1 on right edges. The tree for Code 1 has all five leaves a, b, c, d, e at depth 3; the tree for Code 2 is unbalanced, with leaves at depths 2 and 3 matching the codeword lengths]

Huffman's Algorithm
Select the two characters a and b having the lowest probabilities and replace them with a single (imaginary) character, say x
x's probability of occurrence is the sum of the probabilities of a and b
Now find an optimal prefix code for this smaller set of characters, applying the same procedure recursively
A code for the original character set is obtained by taking the code for x and appending a 0 for a and appending a 1 for b
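The next slides walk through this construction on the running example. Here is a minimal iterative sketch in Python using the standard heapq module; the names huffman and codes and the tuple-based trees are choices made here, and different tie-breaking may orient subtrees differently, giving a different but equally optimal code:

import heapq
from itertools import count

def huffman(probs):
    # Min-heap of (probability, tie-breaker, tree); the counter keeps
    # comparisons well defined when two probabilities are equal
    tick = count()
    heap = [(p, next(tick), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # two lowest-probability trees
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tick), (left, right)))
    return heap[0][2]

def codes(tree, prefix=""):
    # Walk the tree: a left edge appends 0, a right edge appends 1
    if isinstance(tree, str):               # leaf: a character
        return {tree: prefix or "0"}
    left, right = tree
    table = codes(left, prefix + "0")
    table.update(codes(right, prefix + "1"))
    return table

print(codes(huffman({"a": .12, "b": .40, "c": .15, "d": .08, "e": .25})))
# -> {'b': '0', 'e': '10', 'c': '110', 'd': '1110', 'a': '1111'}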

Steps in the Construction of a Huffman Tree
Sort the input characters by frequency: d (.08), a (.12), c (.15), e (.25), b (.40)
[Figure: the five characters drawn as single-node trees in this order]

Merge a and d
The two lowest-frequency characters, d (.08) and a (.12), merge into a node of weight .20
Forest is now: {d, a} (.20), c (.15), e (.25), b (.40)

Merge a, d with c
c (.15) and the node {d, a} (.20) merge into a node of weight .35
Forest is now: {c, {d, a}} (.35), e (.25), b (.40)

Merge a, c, d with e
e (.25) and the node {c, {d, a}} (.35) merge into a node of weight .60
Forest is now: {e, {c, {d, a}}} (.60), b (.40)

Final Tree
The last merge joins b (.40) with the .60 subtree, giving a root of total weight 1.00
Codes: a - 1111, b - 0, c - 110, d - 1110, e - 10
Average code length: (4*.12) + (1*.40) + (3*.15) + (4*.08) + (2*.25) = 2.15
[Figure: the final Huffman tree, with the codeword bits labeling the edges]
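Combining the earlier sketches (huffman, codes, and average_length as defined above), the slide's numbers can be reproduced:

probs = {"a": .12, "b": .40, "c": .15, "d": .08, "e": .25}
table = codes(huffman(probs))
print(table)                          # a=1111, b=0, c=110, d=1110, e=10
print(average_length(table, probs))   # ~2.15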

Huffman Algorithm
An example of a greedy algorithm: it combines nodes whenever possible, without considering potential drawbacks inherent in such a move
I.e., at each individual stage it selects the option that is "locally optimal"
Recall the vertex coloring problem: greedy strategies do not always yield an optimal solution
However, Huffman coding is optimal; see the textbook for a proof

Finishing Remarks
Huffman coding works well in theory, but it rests on several restrictive assumptions:
(1) The frequency of a letter is independent of its context in the message (not true in the English language)
(2) Huffman coding works better when there is large variation in the frequencies of letters, and the actual frequencies must match the expected ones
Examples: DEED → 8 bits (vs. 12 bits with a 3-bit fixed-length code); FUZZ → 20 bits (vs. 12 bits fixed-length)