A Data Compression Algorithm: Huffman Compression

A Data Compression Algorithm: Huffman Compression Gordon College

Compression Definition: a process of encoding that uses fewer bits. Reason: to save valuable resources such as communication bandwidth or hard disk space. (The slide illustrates a block of repetitive text being compressed and then uncompressed back to the original.)

Compression Types Lossy: loses some information during compression, so the exact original cannot be recovered (e.g., JPEG). Normally provides better compression. Used when loss is acceptable - image, sound, and video files.

Compression Types Lossless: the exact original can be recovered. Usually exploits statistical redundancy. Used when loss is not acceptable - data. Basic term: Compression Ratio - the ratio of the number of bits in the original data to the number of bits in the compressed data. For example, 3:1 means the original file was 3000 bytes and the compressed file is only 1000 bytes.

Variable-Length Codes Recall that ASCII, EBCDIC, and Unicode use the same size data structure for all characters. Contrast Morse code, which uses variable-length sequences. Huffman compression is a variable-length encoding scheme.

Variable-Length Codes Each character in such a code has a weight (probability) and a length. The expected length is the sum of the products of the weights and lengths over all the characters: 0.2 x 2 + 0.1 x 4 + 0.1 x 4 + 0.15 x 3 + 0.45 x 1 = 2.1. Goal: minimize the expected length.
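The expected-length computation above can be sketched in a few lines of Python; the weights and lengths are the ones from the slide's worked example:

```python
# Expected code length = sum over characters of weight x code length.
# Weights and lengths are taken from the slide's worked example.
weights = [0.2, 0.1, 0.1, 0.15, 0.45]
lengths = [2, 4, 4, 3, 1]

expected_length = sum(w * n for w, n in zip(weights, lengths))
print(expected_length)  # 2.1 (up to floating-point rounding)
```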

Huffman Compression Uses prefix codes (a sequence of optimal binary codes). Uses a greedy algorithm - makes the locally best choice based on the data at hand. Popular and effective choice for data compression.

Huffman Compression Basic algorithm: Generate a table that contains the frequency of each character in the text. Using the frequency table, assign each character a "bit code" (a sequence of bits to represent the character). Write the bit code to the file instead of the character.
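Step 1, building the frequency table, is a one-liner with Python's standard library. A minimal sketch (the sample text is an arbitrary illustration, not from the slides):

```python
from collections import Counter

# Build a character-frequency table for a text (step 1 of the basic algorithm).
text = "abracadabra"
freq = Counter(text)
print(freq.most_common())  # [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
```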

Immediate Decodability Definition: no sequence of bits that represents a character is a prefix of a longer sequence for another character. Purpose: each character can be decoded without waiting for the remaining bits. (The slide contrasts two coding schemes: one that is not immediately decodable and one that is.)
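Immediate decodability is exactly the prefix-free property, which is easy to test mechanically. A minimal sketch (the two code tables below are hypothetical examples, not the ones pictured on the slide):

```python
def is_prefix_free(codes):
    """True if no codeword in the table is a prefix of another."""
    words = sorted(codes.values())
    # After lexicographic sorting, any prefix relationship must appear
    # between adjacent neighbours, so one pass suffices.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free({'a': '0', 'b': '10', 'c': '11'}))  # True: immediately decodable
print(is_prefix_free({'a': '0', 'b': '01', 'c': '11'}))  # False: '0' is a prefix of '01'
```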

Huffman Compression Huffman (1951) uses the frequencies of symbols in a string to build a variable-rate prefix code. Each symbol is mapped to a binary string. More frequent symbols have shorter codes. No code is a prefix of another. (The slide also shows a code table that is not a Huffman code, for contrast.)

Huffman Codes We seek codes that are immediately decodable and in which each character has minimal expected code length. For a set of n characters {C1, ..., Cn} with weights {w1, ..., wn}, we need an algorithm that generates the n bit strings representing the codes.

Cost of a Huffman Tree Let p1, p2, ..., pm be the probabilities for the symbols a1, a2, ..., am, respectively. Define the cost of the Huffman tree T to be HC(T) = p1 r1 + p2 r2 + ... + pm rm, where ri is the length of the path from the root to ai. HC(T) is the expected length of the code of a symbol coded by the tree T. HC(T) is the bit rate of the code.

Example of Cost Example: a 1/2, b 1/8, c 1/8, d 1/4. HC(T) = 1 x 1/2 + 3 x 1/8 + 3 x 1/8 + 2 x 1/4 = 1.75. (The tree on the slide has a at depth 1, d at depth 2, and b and c at depth 3.)
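The same cost can be checked programmatically. The depths below are the ones the formula implies (a at depth 1, b and c at depth 3, d at depth 2):

```python
# Cost (expected bit rate) of the example tree: HC(T) = sum of p_i * r_i.
probs  = {'a': 0.5, 'b': 0.125, 'c': 0.125, 'd': 0.25}
depths = {'a': 1,   'b': 3,     'c': 3,     'd': 2}

cost = sum(probs[s] * depths[s] for s in probs)
print(cost)  # 1.75
```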

Huffman Tree Input: probabilities p1, p2, ..., pm for symbols a1, a2, ..., am, respectively. Output: a tree that minimizes the average number of bits (bit rate) to code a symbol. That is, it minimizes HC(T) = p1 r1 + p2 r2 + ... + pm rm, where ri is the length of the path from the root to ai. This is a Huffman tree, or Huffman code.

Recursive Algorithm - Huffman Codes
Initialize a list of n one-node binary trees, one containing the weight of each character.
Repeat the following n - 1 times:
a. Find the two trees T' and T" in the list with minimal weights w' and w".
b. Replace these two trees with a binary tree whose root weight is w' + w" and whose subtrees are T' and T"; label the links to these subtrees 0 and 1.
(Choosing minimal weights pairs the least common characters first; the slide shows trees B and C, each of weight .1, merged under a root of weight .2, with links labeled 0 and 1.)

Huffman's Algorithm The code for character Ci is the bit string labeling the path in the final binary tree from the root to Ci. (The slide shows a set of characters with weights and the resulting codes.)

Huffman Decoding Algorithm
Initialize pointer p to the root of the Huffman tree.
While the end of the message string has not been reached, repeat the following:
a. Let x be the next bit in the string.
b. If x = 0, set p to its left child; else set p to its right child.
c. If p points to a leaf:
i. Output the character at that leaf.
ii. Reset p to the root of the Huffman tree.

Huffman Decoding Algorithm Decode the message string 0101011010 using the Huffman tree and the decoding algorithm above.

Iterative Huffman Tree Algorithm
Form a node for each symbol ai with weight pi;
Insert the nodes in a min priority queue ordered by probability;
While the priority queue has more than one element do
    min1 := delete-min;
    min2 := delete-min;
    create a new node n;
    n.weight := min1.weight + min2.weight;
    n.left := min1; (also associate this link with bit 0)
    n.right := min2; (also associate this link with bit 1)
    insert(n)
Return the last node in the priority queue.
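A sketch of this iterative algorithm using Python's heapq module as the min priority queue, run on the probabilities from the example that follows. The insertion counter is an implementation detail that breaks ties so nodes of equal weight are never compared directly:

```python
import heapq
import itertools

def huffman_tree(probs):
    """Build a Huffman tree as nested tuples: leaf = symbol, node = (left, right)."""
    tiebreak = itertools.count()
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # min1 := delete-min
        p2, _, right = heapq.heappop(heap)   # min2 := delete-min
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    return heap[0][2]                        # the last node in the queue

def codes_from_tree(node, prefix=''):
    """Read the codes off the tree: left link = bit 0, right link = bit 1."""
    if not isinstance(node, tuple):          # leaf: a symbol
        return {node: prefix}
    left, right = node
    return {**codes_from_tree(left, prefix + '0'),
            **codes_from_tree(right, prefix + '1')}

probs = {'a': 0.4, 'b': 0.1, 'c': 0.3, 'd': 0.1, 'e': 0.1}
codes = codes_from_tree(huffman_tree(probs))
print(codes)
```

Ties among equal-weight nodes can be broken either way, so the exact bit patterns may differ from the slides, but the code lengths (and hence the bit rate) are optimal either way.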

Example of Huffman Tree Algorithm (1) P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1

Example of Huffman Tree Algorithm (2)

Example of Huffman Tree Algorithm (3)

Example of Huffman Tree Algorithm (4)

Huffman Code

In class example: I will praise you and I will love you Lord

Index  Sym    Freq
0      space  9
1      I      2
2      L      1
3      a      2
4      d      2
5      e      2
6      i      3
7      l      5
8      n      1
9      o      4
10     p      1
11     r      2
12     s      1
13     u      2
14     v      1
15     w      2
16     y      2
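The frequency column can be verified directly (a quick sanity check, not part of the original slides):

```python
from collections import Counter

# Reproduce the frequency table for the in-class sentence.
text = "I will praise you and I will love you Lord"
freq = Counter(text)
print(freq[' '], freq['l'], freq['o'], freq['i'])  # 9 5 4 3, matching the table
```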

In class example: I will praise you and I will love you Lord

Index  Sym    Freq  Parent  Left  Right  Nbits  Bits
0      space  9     30      -1    -1     2      01
1      I      2     23      -1    -1     5      11010
2      L      1     17      -1    -1     5      00010
3      a      2     20      -1    -1     5      11110
4      d      2     22      -1    -1     5      11101
5      e      2     21      -1    -1     4      0000
6      i      3     25      -1    -1     4      1100
7      l      5     28      -1    -1     3      101
8      n      1     17      -1    -1     5      00011
9      o      4     26      -1    -1     3      001
10     p      1     18      -1    -1     6      100110
11     r      2     23      -1    -1     5      11011
12     s      1     18      -1    -1     6      100111
13     u      2     24      -1    -1     4      1000
14     v      1     19      -1    -1     5      10010
15     w      2     20      -1    -1     5      11111
16     y      2     22      -1    -1     5      11100