Presentation transcript:

Huffman Codes

Encoding messages
- Encode a message composed of a string of characters
- Codes used by computer systems:
  - ASCII uses 8 bits per character; can encode 256 characters
  - Unicode uses 16 bits per character; can encode 65,536 characters; includes all characters encoded by ASCII
- ASCII and Unicode are fixed-length codes: all characters are represented by the same number of bits

Problems
- Suppose that we want to encode a message constructed from the symbols A, B, C, D, and E using a fixed-length code
- How many bits are required to encode each symbol?
  - at least 3 bits are required
  - 2 bits are not enough (can only encode four symbols)
- How many bits are required to encode the message DEAACAAAAABA?
  - there are twelve symbols, and each requires 3 bits
  - 12 * 3 = 36 bits are required
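The arithmetic on this slide can be checked with a short Python sketch (an illustration added to this transcript; the function name is my own):

```python
import math

def fixed_length_bits(num_symbols: int) -> int:
    # A fixed-length code needs ceil(log2(n)) bits to distinguish n symbols.
    return math.ceil(math.log2(num_symbols))

message = "DEAACAAAAABA"
bits_per_symbol = fixed_length_bits(len(set("ABCDE")))  # 5 symbols -> 3 bits
total = bits_per_symbol * len(message)                  # 12 symbols * 3 bits
print(bits_per_symbol, total)  # 3 36
```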

Drawbacks of fixed-length codes
- Wasted space
  - Unicode uses twice as much space as ASCII; inefficient for plain-text messages containing only ASCII characters
- Same number of bits used to represent all characters
  - 'a' and 'e' occur more frequently than 'q' and 'z'
- Potential solution: use variable-length codes
  - a variable number of bits represents each character when the frequency of occurrence is known
  - short codes for characters that occur frequently

Advantages of variable-length codes
- The advantage of variable-length codes over fixed-length codes is that short codes can be given to characters that occur frequently
  - on average, the length of the encoded message is less than with a fixed-length encoding
- Potential problem: how do we know where one character ends and another begins? (not a problem if the number of bits is fixed!)
- Example: A = 00, B = 01, C = 10, D = 11
  - 00 10 11 01 00 11 11 11 11 11 decodes as A C D B A D D D D D

Prefix property
- A code has the prefix property if no character code is the prefix (start of the code) of another character's code
- Example:

  Symbol  Code
  P       000
  Q       11
  R       01
  S       001
  T       10

  - 000 is not a prefix of 11, 01, 001, or 10
  - 11 is not a prefix of 000, 01, 001, or 10, and so on
- 01 001 10 11 000 10 decodes unambiguously as R S T Q P T
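The prefix property is easy to test mechanically. A small sketch (added for this transcript; the function name is my own) checks every pair of codewords:

```python
def has_prefix_property(codes):
    # No codeword may be a prefix of any other codeword.
    codes = list(codes)
    return not any(a != b and b.startswith(a)
                   for a in codes for b in codes)

good = {"P": "000", "Q": "11", "R": "01", "S": "001", "T": "10"}
bad  = {"P": "0", "Q": "1", "R": "01", "S": "10", "T": "11"}
print(has_prefix_property(good.values()))  # True
print(has_prefix_property(bad.values()))   # False
```

The second table fails because, for example, P = 0 is a prefix of R = 01, which is exactly what makes decoding ambiguous on the next slide.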

Code without prefix property
- The following code does not have the prefix property:

  Symbol  Code
  P       0
  Q       1
  R       01
  S       10
  T       11

- The pattern 1110 can be decoded as QQQP, QTP, QQS, or TS

Problem
- Design a variable-length prefix-free code such that the message DEAACAAAAABA can be encoded using 22 bits
- Possible solution:
  - A occurs eight times, while B, C, D, and E each occur once
  - represent A with the one-bit code 0; the remaining codes cannot start with 0
  - represent B with the two-bit code 10; the remaining codes cannot start with 0 or 10
  - represent C with 110
  - represent D with 1110
  - represent E with 11110

Encoded message

  Symbol  Code
  A       0
  B       10
  C       110
  D       1110
  E       11110

DEAACAAAAABA encodes as 1110 11110 0 0 110 0 0 0 0 0 10 0 (22 bits)

Another possible code

  Symbol  Code
  A       0
  B       100
  C       101
  D       1101
  E       1111

DEAACAAAAABA encodes as 1101 1111 0 0 101 0 0 0 0 0 100 0 (22 bits)

Better code

  Symbol  Code
  A       0
  B       100
  C       101
  D       110
  E       111

DEAACAAAAABA encodes as 110 111 0 0 101 0 0 0 0 0 100 0 (20 bits)
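The bit counts on the last few slides follow directly from the code tables: the total length is the sum of the codeword lengths of the message's characters. A short sketch (added for this transcript) compares the 22-bit and 20-bit codes:

```python
def encoded_length(message, code):
    # Total bits = sum of the codeword length of each character.
    return sum(len(code[ch]) for ch in message)

message = "DEAACAAAAABA"
first  = {"A": "0", "B": "10",  "C": "110", "D": "1110", "E": "11110"}
better = {"A": "0", "B": "100", "C": "101", "D": "110",  "E": "111"}
print(encoded_length(message, first))   # 22
print(encoded_length(message, better))  # 20
```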

What code to use?
- Question: Is there a variable-length code that makes the most efficient use of space?
- Answer: Yes!

Huffman coding tree
- Binary tree
  - each leaf contains a symbol (character)
  - label the edge from a node to its left child with 0
  - label the edge from a node to its right child with 1
- The code for any symbol is obtained by following the path from the root to the leaf containing that symbol
- The code has the prefix property
  - a leaf node cannot appear on the path to another leaf
  - note: fixed-length codes are represented by a complete Huffman tree and clearly have the prefix property

Building a Huffman tree
- Find the frequency of each symbol occurring in the message
- Begin with a forest of single-node trees
  - each contains a symbol and its frequency
- Repeat:
  - select the two trees with the smallest frequencies at the root
  - produce a new binary tree with the selected trees as children, and store the sum of their frequencies in the root
- The process ends when there is one tree
  - this is the Huffman coding tree
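The steps above translate naturally into a few lines of Python using a min-heap (a sketch added for this transcript, not part of the original slides; a tree is represented as either a bare symbol for a leaf or a (left, right) pair for an internal node):

```python
import heapq
from collections import Counter

def build_huffman_tree(message):
    # Begin with a forest of single-node trees: (frequency, tiebreak, symbol).
    # The tiebreak counter keeps heapq from ever comparing two trees directly.
    heap = [(freq, i, sym)
            for i, (sym, freq) in enumerate(Counter(message).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Select the two trees with the smallest root frequencies...
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        # ...and merge them, storing the sum of frequencies at the new root.
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    return heap[0][2]  # the single remaining tree is the Huffman coding tree

tree = build_huffman_tree("THIS_IS_HIS_MESSAGE")
```

Tie-breaking (which of two equal-frequency trees is popped first, and which child goes left) can vary between implementations, so the exact tree may differ from the slides while the total encoded length stays optimal.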

Example
- Build the Huffman coding tree for the message "This is his message"
- Character frequencies:

  Symbol     A  G  M  T  E  H  _  I  S
  Frequency  1  1  1  1  2  2  3  3  5

- Begin with a forest of single-node trees, one per symbol

Steps 1-8 (the original slides show tree diagrams at each step; the merges they depict are summarized here)
- Step 1: merge A (1) and G (1) into a tree with root frequency 2
- Step 2: merge M (1) and T (1) into a tree with root frequency 2
- Step 3: merge E (2) and H (2) into a tree with root frequency 4
- Step 4: merge the AG tree (2) and the MT tree (2) into a tree with root frequency 4
- Step 5: merge _ (3) and I (3) into a tree with root frequency 6
- Step 6: merge the AGMT tree (4) and the EH tree (4) into a tree with root frequency 8
- Step 7: merge S (5) and the _I tree (6) into a tree with root frequency 11
- Step 8: merge the two remaining trees (8 and 11) into the final tree with root frequency 19

Label edges
- Label each edge to a left child 0 and each edge to a right child 1

Huffman code & encoded message

  Symbol  Code
  S       11
  E       010
  H       011
  _       100
  I       101
  A       0000
  G       0001
  M       0010
  T       0011

"This is his message" encodes as 0011 011 101 11 100 101 11 100 011 101 11 100 0010 010 11 11 0000 0001 010 (56 bits, versus 19 * 8 = 152 bits in ASCII)
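With the final code table in hand, encoding and decoding are both straightforward, and the prefix property is what makes decoding unambiguous. A sketch added for this transcript (the message is written with uppercase letters and "_" for spaces, matching the symbols in the table):

```python
code = {"S": "11", "E": "010", "H": "011", "_": "100", "I": "101",
        "A": "0000", "G": "0001", "M": "0010", "T": "0011"}

message = "THIS_IS_HIS_MESSAGE"
encoded = "".join(code[ch] for ch in message)
print(len(encoded))  # 56 bits, versus 19 * 8 = 152 bits in ASCII

# Decoding: scan the bits, emitting a symbol whenever the buffer matches a
# codeword. The prefix property guarantees at most one match at each point.
decode = {bits: sym for sym, bits in code.items()}
out, buf = [], ""
for bit in encoded:
    buf += bit
    if buf in decode:
        out.append(decode[buf])
        buf = ""
print("".join(out) == message)  # True
```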