Data Compressor---Huffman Encoding and Decoding

Huffman Encoding Compression
Typically, in files and messages, each character requires 1 byte (8 bits) - already wasting 1 bit for most purposes!
Question: What is the smallest number of bits that can be used to store an arbitrary piece of text?
Idea: Find the frequency of occurrence of each character, then encode:
Frequent characters → short bit strings
Rarer characters → longer bit strings

Huffman's Algorithm (1952)
Repeatedly merges trees - maintains a forest.
Tree weight = the sum of its leaves' frequencies.
For C characters to code, start with C single-node trees.
Select two trees, T1 and T2, of smallest weights and merge them.
C - 1 merge operations in total.
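The merge loop above can be sketched with a binary min-heap holding the forest. This is an illustrative sketch, not the slides' own code: the function name and the tuple-based tree representation (internal nodes are (left, right) pairs, leaves are characters) are assumptions.

```python
import heapq
import itertools

def build_huffman_tree(freqs):
    """Build a Huffman tree from a {character: frequency} map.

    Each forest entry is (weight, tiebreak, tree); a tree is either a
    leaf character or a (left, right) pair. Exactly C - 1 merges occur.
    """
    counter = itertools.count()   # tiebreak so heapq never compares trees
    forest = [(w, next(counter), ch) for ch, w in freqs.items()]
    heapq.heapify(forest)         # start with C single-node trees
    while len(forest) > 1:        # C - 1 merge operations
        w1, _, t1 = heapq.heappop(forest)   # two smallest-weight trees
        w2, _, t2 = heapq.heappop(forest)
        heapq.heappush(forest, (w1 + w2, next(counter), (t1, t2)))
    return forest[0][2]
```

For frequencies {'a': 5, 'b': 2, 'c': 1}, the two lightest trees ('c' and 'b') merge first, and the result then merges with 'a'.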

Huffman Encoding - Encoding
Use a tree: encode a character by following the path from the root to its leaf, e.g. E is 00, S is 011.
Frequent characters (E, T) → 2-bit encodings
Others (A, S, N, O) → 3-bit encodings

Huffman Encoding - Encoding
Following the tree for every character is inefficient in practice - use a direct-addressed lookup table instead, e.g.:
A → 010
E → 00
B, N, S, T → ...
Finding the optimal encoding: the smallest number of bits to represent arbitrary text.
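The lookup table can be built with a single traversal of the tree, recording 0 for each left branch and 1 for each right branch. A minimal sketch, assuming internal nodes are (left, right) pairs and leaves are characters (the function name is illustrative):

```python
def make_code_table(tree, prefix=""):
    """Walk the tree once: 0 for a left branch, 1 for a right branch.

    Returns {character: bit string}. Internal nodes are (left, right)
    pairs; leaves are characters, and the path so far is the code.
    """
    if not isinstance(tree, tuple):     # leaf: emit the accumulated path
        return {tree: prefix}
    left, right = tree
    table = make_code_table(left, prefix + "0")
    table.update(make_code_table(right, prefix + "1"))
    return table
```

For example, the tree (('c', 'b'), 'a') yields the table {'c': '00', 'b': '01', 'a': '1'}, so encoding a message is then a simple dictionary lookup per character.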

A divide-and-conquer approach might have us asking which characters should appear in the left and right subtrees, trying to build the tree from the top down. A greedy approach instead places our n characters in n sub-trees and starts by combining the two least-weight nodes into a single tree, whose root is assigned the sum of the two leaf weights.

Huffman Encoding
Divide and conquer:
Decide on a root - n choices
Decide on roots for sub-trees - n choices
Repeat n times → O(n!)
Greedy approach:
Sort characters by frequency
Form the two lowest-weight nodes into a sub-tree
Sub-tree weight = sum of the weights of its nodes
Move the new tree to its correct place in the sorted order

Standard Coding Scheme

Binary Tree Representation
For a character set of C characters, the standard fixed-length coding needs ⌈log2 C⌉ bits.
A fixed-length code can be represented by a binary tree where characters are stored only in leaf nodes - a binary trie.
Each character's path: start at the root, follow the branches, and record 0 for each left branch and 1 for each right branch.
An optimal code is always a full tree - every node is either a leaf or has two children.
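The ⌈log2 C⌉ figure is easy to compute directly; for instance, a 26-letter alphabet needs 5 bits per character and a 256-symbol alphabet needs 8 (the function name is illustrative):

```python
import math

def fixed_length_bits(alphabet_size):
    """Bits per character for a standard fixed-length code: ceil(log2 C)."""
    return math.ceil(math.log2(alphabet_size))
```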

Representation by a Binary Trie

Improved Binary Trie

Prefix Code
A fixed-length character code that places characters only at the leaves guarantees that any bit sequence can be decoded unambiguously.
Prefix code: character codes may have varying lengths, as long as no character's code is a prefix of another's.
That means characters can appear only at the leaves.
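The prefix property can be checked mechanically: after sorting the code words lexicographically, any code word that is a prefix of another is immediately followed by a word it prefixes, so one linear pass suffices. A sketch (the function name is an assumption for illustration):

```python
def is_prefix_code(codes):
    """True iff no code word is a prefix of another (unambiguous decoding)."""
    words = sorted(codes)   # a prefix sorts immediately before a word it prefixes
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))
```

For example, {00, 010, 011, 1} is a valid prefix code, while {0, 01} is not, because 0 is a prefix of 01.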

Optimal Prefix Code Tree

Optimal Prefix Code Cost

Huffman’s Algorithm Example - I

Huffman’s Algorithm Example - II

Huffman’s Algorithm Example - III

Huffman’s Algorithm Example - IV

Huffman’s Algorithm Example - V

Huffman’s Algorithm Example - VI

Huffman’s Algorithm Example-VII

Huffman Encoding - Operation Initial sequence Sorted by frequency Combine lowest two into sub-tree Move it to correct place

After shifting sub-tree to its correct place... Huffman Encoding - Operation Combine next lowest pair Move sub-tree to correct place

Move the new tree to the correct place... Huffman Encoding - Operation Now the lowest two are the “14” sub-tree and D Combine and move to correct place

Move the new tree to the correct place... Huffman Encoding - Operation Now the lowest two are the “25” and “30” trees Combine and move to correct place

Huffman Encoding - Operation Combine last two trees

How do we decode a Huffman-encoded bit string? With variable-length codes, it may seem impossible to break an encoded string of bits back into characters, but the decoding procedure is deceptively simple. Start at the root of the decoding tree with the first bit in the stream, and use each successive bit to decide whether to go left or right. When we reach a leaf of the tree, we've decoded a character, so we place that character onto the (uncompressed) output stream and return to the root. The next bit in the input stream is the first bit of the next character.
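The decoding walk described above can be sketched as follows. This is an illustrative sketch, not the slides' code: it assumes a tree where internal nodes are (left, right) pairs and leaves are characters.

```python
def huffman_decode(bits, tree):
    """Decode a bit string by walking the tree: 0 = left, 1 = right.

    Each time a leaf is reached, one character has been decoded; emit it
    and restart at the root, so the next bit begins the next character.
    """
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if not isinstance(node, tuple):   # reached a leaf
            out.append(node)
            node = tree
    return "".join(out)
```

For the tree (('c', 'b'), 'a'), the codes are c = 00, b = 01, a = 1, so the bit string 10001 decodes to "acb".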

Huffman Encoding - Decoding

Huffman Encoding - Time Complexity
Sort keys: O(n log n)
Repeat n times:
Form new sub-tree: O(1)
Move sub-tree into place: O(log n) (binary search)
Total: O(n log n)
Overall: O(n log n)