Huffman Encoding Huffman coding is a method for compressing standard text documents. It makes use of a binary tree to develop codes of varying lengths for the letters used in the original message. Huffman coding is also part of the JPEG image compression scheme. The algorithm was introduced by David Huffman in 1952 as part of a course assignment at MIT.

Lecture No.26 Data Structures Dr. Sohail Aslam

Huffman Encoding To understand Huffman encoding, it is best to work through a simple example: encoding the 33-character phrase "traversing threaded binary trees" (including spaces and a trailing newline). If we sent the phrase as a message in a network using standard 8-bit ASCII codes, we would have to send 8*33 = 264 bits. Using the Huffman algorithm, we can send the message with only 122 bits.

Huffman Encoding List all the letters used, including the "space" character, along with the frequency with which they occur in the message. Consider each of these (character, frequency) pairs to be nodes; they are actually leaf nodes, as we will see. Pick the two nodes with the lowest frequency; if there is a tie, pick randomly among those with equal frequencies.

Huffman Encoding Make a new node out of these two, and make the two nodes its children. This new node is assigned the sum of the frequencies of its children. Continue the process of combining the two nodes of lowest frequency until only one node, the root, remains.
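The combining procedure above maps directly onto a min-heap (priority queue). The following Python sketch is not from the lecture; the counter is only a tie-breaker so tuples never compare subtree objects:

```python
import heapq
from itertools import count

def build_huffman_tree(freqs):
    """Repeatedly merge the two lowest-frequency nodes until one root remains."""
    tie = count()
    # heap entries: (frequency, tie-breaker, leaf symbol or (left, right) subtree)
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest frequency
        f2, _, right = heapq.heappop(heap)   # second lowest
        # the new internal node's frequency is the sum of its children's
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    return heap[0]

freqs = {'\n': 1, ' ': 3, 'a': 3, 'b': 1, 'd': 2, 'e': 5, 'g': 1, 'h': 1,
         'i': 2, 'n': 2, 'r': 5, 's': 2, 't': 3, 'v': 1, 'y': 1}
total, _, tree = build_huffman_tree(freqs)
print(total)  # 33: the root's frequency equals the message length
```

Because ties may be broken differently, the resulting tree (and hence the codes) can differ from run to run, but the total encoded length is always minimal.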

Huffman Encoding Original text: "traversing threaded binary trees" plus a trailing newline. Size: 33 characters, counting spaces and the newline. Frequencies: NL: 1, SP: 3, a: 3, b: 1, d: 2, e: 5, g: 1, h: 1, i: 2, n: 2, r: 5, s: 2, t: 3, v: 1, y: 1.
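The table above can be reproduced in a couple of lines (a Python sketch, with the NL character supplied as a trailing newline):

```python
from collections import Counter

text = "traversing threaded binary trees\n"   # the phrase plus NL
freqs = Counter(text)

print(len(text))                # 33 characters
print(freqs['e'], freqs['r'])   # 5 5: the two most frequent letters
print(freqs[' '], freqs['\n'])  # 3 1
```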

Huffman Encoding [Tree-construction diagrams omitted.] Starting from the 15 leaf nodes, the slides combine the two lowest-frequency nodes at each step under a new internal node whose frequency is equal to the sum of the frequencies of the two children nodes. There are a number of ways to combine nodes; we have chosen just one such way. In this run, the merges create internal nodes with frequencies 2, 2, 2, 4, 4, 4, 5, 6, 8, 9, 10, 14, and 19; the final merge produces the root with frequency 33, the total character count.

Huffman Encoding Start at the root. Assign 0 to the left branch and 1 to the right branch. Repeat the process down the left and right subtrees. To get the code for a character, traverse the tree from the root to the character's leaf node and read off the 0s and 1s along the path.
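This traversal is a simple recursion. A sketch, assuming a tuple representation where an internal node is a (left, right) pair and a leaf is a character:

```python
def assign_codes(node, prefix="", codes=None):
    """Assign 0 going left and 1 going right; a leaf's code is the path to it."""
    if codes is None:
        codes = {}
    if isinstance(node, tuple):                     # internal node
        assign_codes(node[0], prefix + "0", codes)  # left branch: 0
        assign_codes(node[1], prefix + "1", codes)  # right branch: 1
    else:                                           # leaf: record the path
        codes[node] = prefix
    return codes

# tiny example tree: ((a, b), c)
print(assign_codes((("a", "b"), "c")))  # {'a': '00', 'b': '01', 'c': '1'}
```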

Huffman Encoding [Code-assignment diagrams omitted: the finished tree is relabeled level by level, from the root down to the leaves, with 0 on each left branch and 1 on each right branch.]

Huffman Encoding Huffman character codes: NL → 10000, SP → 1111, a → 000, b → 10001, d → 0100, e → 101, g → 10010, h → 10011, i → 0101, n → 0110, r → 110, s → 0111, t → 001, v → 11100, y → 11101. Notice that the code is variable-length: letters with higher frequencies have shorter codes. The tree could have been built in a number of ways; each would have yielded different codes, but the code would still be minimal.
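One property worth checking: the table is prefix-free, meaning no codeword is a prefix of another, which is what lets a decoder recognize each codeword the moment its last bit arrives. A quick check of the table above (a sketch):

```python
codes = {'\n': '10000', ' ': '1111', 'a': '000', 'b': '10001',
         'd': '0100', 'e': '101', 'g': '10010', 'h': '10011',
         'i': '0101', 'n': '0110', 'r': '110', 's': '0111',
         't': '001', 'v': '11100', 'y': '11101'}

words = list(codes.values())
clash = any(a != b and b.startswith(a) for a in words for b in words)
print(clash)  # False: no codeword is a prefix of another
```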

Huffman Encoding Original: traversing threaded binary trees With 8 bits per character, the length would be 264 bits. Encoded: 001110000111001011100111010101101001011110011001111010100001001010100111110000101011000011011101111100111010110101110000 Compressed into 122 bits, a 54% reduction.
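Encoding and decoding with the table can be verified end to end (a sketch; the greedy decoder works precisely because the code is prefix-free):

```python
codes = {'\n': '10000', ' ': '1111', 'a': '000', 'b': '10001',
         'd': '0100', 'e': '101', 'g': '10010', 'h': '10011',
         'i': '0101', 'n': '0110', 'r': '110', 's': '0111',
         't': '001', 'v': '11100', 'y': '11101'}
text = "traversing threaded binary trees\n"

encoded = "".join(codes[ch] for ch in text)
print(len(encoded), 8 * len(text))  # 122 264

# decode: accumulate bits until they match a codeword, then emit it
decode = {bits: ch for ch, bits in codes.items()}
out, cur = [], ""
for bit in encoded:
    cur += bit
    if cur in decode:
        out.append(decode[cur])
        cur = ""
print("".join(out) == text)  # True: lossless round trip
```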

Mathematical Properties of Binary Trees

Properties of Binary Tree Property: A binary tree with N internal nodes has N+1 external nodes.

Properties of Binary Tree A binary tree with N internal nodes has N+1 external nodes. [Diagram omitted: an example tree with 9 internal nodes and 10 external nodes.]

Properties of Binary Tree Property: A binary tree with N internal nodes has 2N links: N-1 links to internal nodes and N+1 links to external nodes.

Properties of Binary Tree Property: A binary tree with N internal nodes has 2N links: N-1 links to internal nodes and N+1 links to external nodes. [Diagram omitted: the same example tree with its links marked, showing 8 internal links and 10 external links.]

Properties of Binary Tree Property: A binary tree with N internal nodes has 2N links: N-1 links to internal nodes and N+1 links to external nodes. In every rooted tree, each node, except the root, has a unique parent. Every link connects a node to its parent, so there are N-1 links connecting internal nodes. Similarly, each of the N+1 external nodes has one link to its parent. Thus there are (N-1) + (N+1) = 2N links in total.
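Both properties can be checked empirically on randomly generated extended binary trees (a sketch; internal nodes are modeled as (left, right) pairs and external nodes as None):

```python
import random

def random_tree(n):
    """An extended binary tree with exactly n internal nodes."""
    if n == 0:
        return None                   # an external node
    k = random.randint(0, n - 1)      # internal nodes placed in the left subtree
    return (random_tree(k), random_tree(n - 1 - k))

def externals(t):
    if t is None:
        return 1
    return externals(t[0]) + externals(t[1])

def links(t):
    """Each internal node contributes two links, one to each child."""
    if t is None:
        return 0
    return 2 + links(t[0]) + links(t[1])

for n in (1, 5, 9, 50):
    t = random_tree(n)
    assert externals(t) == n + 1   # N internal nodes -> N+1 external nodes
    assert links(t) == 2 * n       # 2N links in total
print("properties hold")
```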