Huffman Coding (2nd Method)

Huffman coding (2nd Method)
The Huffman code is a source code in which the average word length of the code words approaches the fundamental limit set by the entropy of a discrete memoryless source. This code is "optimum" in the sense that it provides the smallest average code word length for a given discrete memoryless source. The Huffman coding procedure is as follows: the source symbols (messages) are listed in order of decreasing probability, and the two source symbols of lowest probability are assigned a 0 and a 1 (this part of the step is referred to as the splitting stage).

These two source symbols (messages) are then regarded as being "combined" into a new source symbol (message) whose probability equals the sum of the two original probabilities. The probability of the new symbol is placed in the list in accordance with its value. This procedure is repeated until we are left with only two source symbols (messages), to which a 0 and a 1 are assigned. The code word of each original source symbol is found by working backward and tracing the sequence of 0s and 1s assigned to that symbol and its successors.

The Huffman coding procedure can be stated as an algorithm as follows (a code sketch is given after the list):
1. List the source symbols (messages) in order of decreasing probability.
2. Assign a 0 and a 1 to the two source symbols of lowest probability.
3. Combine these two source symbols into a new message.
4. The probability of this new message is equal to the sum of the probabilities of the two original symbols.
5. Place the probability of this new message in the list according to its value.
6. Repeat this procedure until we are left with only two source symbols, to which a 0 and a 1 are assigned.
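
The listed steps can be sketched in code. The following Python sketch is illustrative only (the function name huffman_code and the use of a priority queue are choices of this sketch, not part of the slides); it repeatedly combines the two least probable entries, exactly as in steps 2-6, prepending the assigned bit to every symbol in each combined group.

import heapq

def huffman_code(probabilities):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    # Each heap entry is (probability, tie_breaker, {symbol: partial code word}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)   # lowest probability
        p2, _, group2 = heapq.heappop(heap)   # second lowest probability
        # Assign 0 to one group and 1 to the other by prepending the bit.
        group1 = {sym: "0" + code for sym, code in group1.items()}
        group2 = {sym: "1" + code for sym, code in group2.items()}
        # The combined message carries the sum of the two probabilities.
        heapq.heappush(heap, (p1 + p2, counter, {**group1, **group2}))
        counter += 1
    return heap[0][2]

# The source used in the worked example that follows:
print(huffman_code({"m1": 0.4, "m2": 0.2, "m3": 0.2, "m4": 0.1, "m5": 0.1}))
# e.g. {'m2': '00', 'm3': '01', 'm4': '100', 'm5': '101', 'm1': '11'}

Ties are broken arbitrarily here rather than by moving the combined message as high (or as low) as possible, so the bit patterns may differ from the figures below, but the average code word length is the same.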

Example: Construct the Huffman code for the following source and find the average code length, entropy, efficiency, and redundancy.

Message       m1    m2    m3    m4    m5
Probability   0.4   0.2   0.2   0.1   0.1

Step 1: Arrange the messages in order of decreasing probability.
Step 2: Assign 0 and 1 to the two messages having the lowest probability.
Step 3: Combine these two messages into a new message and place its probability in the probability list as per its value.
Steps 4 and 5: Repeat this procedure.
Step 6: Write the code word for each message by tracking back from the last stage to the first stage.

Step 1: Arrange the messages in order of decreasing probability.
Step 2: The two messages having the lowest probability are assigned 0 and 1; these are m4 and m5, as shown in the figure.

Message   Probability
m1        0.4
m2        0.2
m3        0.2
m4        0.1
m5        0.1

New message with probability of (0.1 + 0.1) = 0.2

Step 3: Now consider these two messages m4 and m5 as being combined into a new message, and place the probability of the new combined message in the list according to its value. Place the combined message as high as possible when its probability is equal to that of another message. This is shown in the figure.

Stage I : 0.4 (m1), 0.2 (m2), 0.2 (m3), 0.1 (m4), 0.1 (m5)
Stage II: 0.4 (m1), 0.2 (new message m4+m5, placed as high as possible), 0.2 (m2), 0.2 (m3)

Step 4: Now consider the two messages of lowest probability in Stage II of the figure and assign 0 and 1 to these two messages. Consider these two messages as being combined to form a new message with probability of (0.2 + 0.2) = 0.4. Place the probability of the combined message according to its value in Stage III, as high as possible if another message has the same probability. This is shown in the figure.
Step 5: Follow the same procedure till only two messages remain and assign 0 and 1 to them. All this is shown in the figure.

Message probabilities at each stage (combined messages moved as high as possible):
Stage I  : 0.4 (m1), 0.2 (m2), 0.2 (m3), 0.1 (m4), 0.1 (m5)
Stage II : 0.4 (m1), 0.2 (m4+m5), 0.2 (m2), 0.2 (m3)
Stage III: 0.4 (m2+m3), 0.4 (m1), 0.2 (m4+m5)
Stage IV : 0.6 (m1+m4+m5), 0.4 (m2+m3)

Read the encircled bits to get the code for m5 as 011.

Step 6: How do we write the code for a message? Consider the green path shown in the figure. To write the code for message m5, this path is used: start from Stage IV and track back up to Stage I along the dotted path, writing down the code word in 0s and 1s starting from Stage IV. The code word for message m5 is 011. The code words for the other messages, obtained in the same way, are shown in the table.

Message       m1    m2    m3    m4    m5
Probability   0.4   0.2   0.2   0.1   0.1
Code word     00    10    11    010   011

To find the average code word length: the average code word length is given as
L = Σ Pi Li = (0.4 x 2) + (0.2 x 2) + (0.2 x 2) + (0.1 x 3) + (0.1 x 3) = 2.2 bits/message
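
Because a Huffman code is a prefix code, a stream of concatenated code words can be decoded unambiguously. The sketch below illustrates this with the code table above; the helper names encode and decode are illustrative, and the bit patterns for m1-m4 are one consistent assignment (only m5 = 011 and the code word lengths are fixed by the figure).

code = {"m1": "00", "m2": "10", "m3": "11", "m4": "010", "m5": "011"}

def encode(messages, code):
    """Concatenate the code words of a sequence of messages."""
    return "".join(code[m] for m in messages)

def decode(bits, code):
    """Decode a bit string by matching prefixes against the code table."""
    inverse = {cw: m for m, cw in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:        # a complete code word has been read
            decoded.append(inverse[current])
            current = ""
    return decoded

stream = encode(["m1", "m5", "m2", "m4"], code)
print(stream)                 # 0001110010
print(decode(stream, code))   # ['m1', 'm5', 'm2', 'm4']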

Find the entropy of the source: the entropy of the source is given as
H = Σ Pi log2(1/Pi)
  = 0.4 log2(1/0.4) + 0.2 log2(1/0.2) + 0.2 log2(1/0.2) + 0.1 log2(1/0.1) + 0.1 log2(1/0.1)
H = 2.12 bits/message

Code efficiency:
ŋ = (H / L) x 100% = 2.12 / 2.2 = 0.9636 = 96.36%
The redundancy is 1 - ŋ = 3.64%.
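
As a numerical check of the figures above (an illustrative sketch; the code word lengths 2, 2, 2, 3, 3 come from the code table):

from math import log2

probs   = [0.4, 0.2, 0.2, 0.1, 0.1]   # m1 ... m5
lengths = [2, 2, 2, 3, 3]             # code word lengths from the table

L   = sum(p * l for p, l in zip(probs, lengths))   # average code word length
H   = sum(p * log2(1 / p) for p in probs)          # entropy of the source
eff = H / L                                        # code efficiency

print(f"L = {L:.2f} bits/message")                 # 2.20
print(f"H = {H:.4f} bits/message")                 # 2.1219, i.e. about 2.12
print(f"efficiency = {eff * 100:.2f} %")           # 96.45 % (96.36 % if H is rounded to 2.12)
print(f"redundancy = {(1 - eff) * 100:.2f} %")     # 3.55 %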

Example: Consider the same memoryless source as in the above example; all the data are the same. Find the Huffman code by moving the probability of the combined message as low as possible, and find the code words of this second Huffman code by tracking backwards through the various stages. All the steps are the same as those followed for the first Huffman code explained in the above example, except that the combined message is now placed as low as possible. This is shown in the figure.

Message probabilities at each stage (combined messages moved as low as possible):
Stage I  : 0.4 (m1), 0.2 (m2), 0.2 (m3), 0.1 (m4), 0.1 (m5)
Stage II : 0.4 (m1), 0.2 (m2), 0.2 (m3), 0.2 (m4+m5)
Stage III: 0.4 (m1), 0.4 (m3+m4+m5), 0.2 (m2)
Stage IV : 0.6 (m2+m3+m4+m5), 0.4 (m1)

Read the encircled bits to get the code for m5 as 0011.

To find the code words: this procedure is the same as that followed in the previous example. Follow the green path in the figure to obtain the code for message m5. The code words for all the messages can be obtained in the same way; one consistent assignment (using the same 0/1 convention as before) is listed below.

Message   Probability   Code word
m1        0.4           1
m2        0.2           01
m3        0.2           000
m4        0.1           0010
m5        0.1           0011

Note that the average code word length is again L = (0.4 x 1) + (0.2 x 2) + (0.2 x 3) + (0.1 x 4) + (0.1 x 4) = 2.2 bits/message, but some messages now need more bits than before (code words up to 4 bits long instead of 3), so the variance of the code word length is larger than for the first code.
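
The difference between the two constructions can be made concrete by comparing the variance of the code word lengths. The sketch below uses the lengths from the two code tables above; the helper avg_and_var is illustrative.

probs        = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths_high = [2, 2, 2, 3, 3]   # combined messages moved as high as possible
lengths_low  = [1, 2, 3, 4, 4]   # combined messages moved as low as possible

def avg_and_var(probs, lengths):
    """Average code word length and the variance of the length."""
    L = sum(p * l for p, l in zip(probs, lengths))
    var = sum(p * (l - L) ** 2 for p, l in zip(probs, lengths))
    return L, var

print(avg_and_var(probs, lengths_high))   # about (2.2, 0.16)
print(avg_and_var(probs, lengths_low))    # about (2.2, 1.36)

Both codes have the same average length of 2.2 bits/message, but the second code has a much larger variance, which is why the "as high as possible" placement is normally preferred.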

Example: Construct the Huffman code for the following source and find the average code length, entropy, efficiency, and redundancy.

Symbol        s0      s1      s2      s3      s4      s5      s6
Probability   0.25    0.25    0.125   0.125   0.125   0.0625  0.0625

Solution: The Huffman code for the source alphabet is constructed as shown in the figure.

Symbol probabilities at each stage:
Stage I  : 0.25, 0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625
Stage II : 0.25, 0.25, 0.125, 0.125, 0.125, 0.125
Stage III: 0.25, 0.25, 0.25, 0.125, 0.125
Stage IV : 0.25, 0.25, 0.25, 0.25
Stage V  : 0.5, 0.25, 0.25
Stage VI : 0.5, 0.5

The encircled bits on the dotted path correspond to the code for symbol s0, i.e. s0 = 10.

Follow the path indicated by the dotted line to obtain the code word for symbol s0 as 10. Similarly we can obtain the code words for the remaining symbols. These are listed in the table (the code word lengths follow from the construction; the bit patterns shown are one consistent assignment, with s0 = 10 fixed by the figure).

Symbol   Probability   Code word   Code word length
s0       0.25          10          2 bits
s1       0.25          11          2 bits
s2       0.125         001         3 bits
s3       0.125         010         3 bits
s4       0.125         011         3 bits
s5       0.0625        0000        4 bits
s6       0.0625        0001        4 bits

To compute the efficiency: from the table, the average code length is
L = Σ Pi Li = (0.25 x 2) + (0.25 x 2) + (0.125 x 3) x 3 + (0.0625 x 4) x 2
L = 2.625 bits/symbol

The average information per message (entropy) is
H = Σ Pi log2(1/Pi)
H = [0.25 log2(4)] x 2 + [0.125 log2(8)] x 3 + [0.0625 log2(16)] x 2
H = [0.25 x 2 x 2] + [0.125 x 3 x 3] + [0.0625 x 4 x 2]
H = 2.625 bits/message

Code efficiency:
ŋ = (H / L) x 100 = (2.625 / 2.625) x 100
ŋ = 100%
The redundancy is therefore zero.
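
The efficiency is exactly 100% here because every probability is a negative power of two (a dyadic source), so each code word length equals log2(1/Pi) exactly. A brief check (illustrative sketch, lengths taken from the table above):

from math import log2

probs   = [0.25, 0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625]   # s0 ... s6
lengths = [2, 2, 3, 3, 3, 4, 4]                               # from the code table

H = sum(p * log2(1 / p) for p in probs)          # 2.625 bits/symbol
L = sum(p * l for p, l in zip(probs, lengths))   # 2.625 bits/symbol

# Every length equals log2(1/p), so L equals H and the efficiency is 100 %.
assert all(l == log2(1 / p) for p, l in zip(probs, lengths))
print(f"H = {H}, L = {L}, efficiency = {H / L * 100:.0f} %")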

Difference Between Huffman & Shannon-Fano Codes
The question is whether another method would provide a better code efficiency. According to information theory, a perfect code should offer an average code length equal to the entropy of the source, which for the example data amounts to 134,882 bits in total. For comparison purposes, the same example is encoded by the Huffman algorithm:

Problem: Construct the Shannon-Fano code and the Huffman code for the given source values. Compare the average code length, entropy, code efficiency, and redundancy.

The Shannon-Fano code does not offer the best code efficiency for this exemplary data set, although this is not necessarily the case for every frequency distribution. At best, Shannon-Fano coding provides a result similar to Huffman coding; it will never exceed it, because the Huffman code is optimal among symbol-by-symbol prefix codes. The optimum of 134,882 bits is not reached by either method.
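
For comparison with the Huffman construction sketched earlier, the following is an illustrative Shannon-Fano sketch (the function name shannon_fano and the even-split rule used here are assumptions of this sketch): the symbols are sorted by decreasing probability and the list is repeatedly split into two parts of as nearly equal total probability as possible, with 0 assigned to the upper part and 1 to the lower part.

def shannon_fano(probabilities):
    """Build a Shannon-Fano code for a dict {symbol: probability}."""
    items = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) <= 1:
            return
        total, running = sum(p for _, p in group), 0.0
        best_i, best_diff = 1, float("inf")
        # Choose the split point that divides the total probability most evenly.
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_i, best_diff = i, diff
        upper, lower = group[:best_i], group[best_i:]
        for sym, _ in upper:
            codes[sym] += "0"
        for sym, _ in lower:
            codes[sym] += "1"
        split(upper)
        split(lower)

    split(items)
    return codes

# Same five-message source as in the first worked example:
print(shannon_fano({"m1": 0.4, "m2": 0.2, "m3": 0.2, "m4": 0.1, "m5": 0.1}))
# e.g. {'m1': '0', 'm2': '10', 'm3': '110', 'm4': '1110', 'm5': '1111'}

For this particular source the Shannon-Fano code also averages 2.2 bits/message; in general it is at best equal to, and never better than, the Huffman code.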

Another Comparative Look at Huffman vs Shannon Fano Code

Entropy
The average information per message is termed the entropy, denoted by H. We can determine the average information (H) contained in a message generated by a source by multiplying the information of each message by its probability of occurrence and summing over the entire alphabet set. If Pi is the probability of occurrence of symbol ai, the information content of the message ai is log2(1/Pi). The entropy H of a source with n symbols is calculated as

H = Σ Pi log2(1/Pi)   (summed over i = 1 to n)

EXAMPLE: A source generates messages from the alphabet set (a, b, c, d, e, f, g, h). Calculate the entropy of the source for the probabilities of occurrence indicated within the brackets:
a (0.48), b (0.08), c (0.12), d (0.02), e (0.12), f (0.04), g (0.06), h (0.08)

Solution: Calculation of the entropy (H) of the source:
H = 0.48 log2(1/0.48) + 0.08 log2(1/0.08) + 0.12 log2(1/0.12) + 0.02 log2(1/0.02) + 0.12 log2(1/0.12) + 0.04 log2(1/0.04) + 0.06 log2(1/0.06) + 0.08 log2(1/0.08) = 2.367 bits/message
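
A quick check of this arithmetic (illustrative sketch; the entropy helper simply implements the formula above):

from math import log2

probs = {"a": 0.48, "b": 0.08, "c": 0.12, "d": 0.02,
         "e": 0.12, "f": 0.04, "g": 0.06, "h": 0.08}

def entropy(p):
    """H = sum over the alphabet of P_i * log2(1 / P_i)."""
    return sum(pi * log2(1 / pi) for pi in p.values())

print(f"H = {entropy(probs):.4f} bits/message")   # 2.3676, i.e. about 2.367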

Redundancy
If we wish to encode the alphabet set of the above example, one obvious way is to assign a fixed-length 3-bit code to each of its eight symbols. But we would not be utilizing the full information-carrying capability of the code, since the entropy of the source is only 2.367 bits/message. We can reduce the average number of bits required to encode the alphabet by using a variable-length code instead of the fixed 3-bit code: some of the symbols can be assigned fewer than three bits so that the average code length is reduced. The average code length L is the expected value

L = Σ Pi Li   (summed over i = 1 to n)

So the redundancy is calculated as
Redundancy = Average Code Length - Entropy = L - H

EXAMPLE: A source generates messages from the alphabet set (a, b, c, d, e, f, g, h) with the probabilities indicated below. Calculate the average code length and the redundancy for the variable-length code shown (one prefix-free assignment with code word lengths 1, 4, 3, 5, 3, 5, 4, 4).

Symbol   Probability   Code
a        0.48          1
b        0.08          0011
c        0.12          011
d        0.02          00000
e        0.12          010
f        0.04          00001
g        0.06          0001
h        0.08          0010

Solution
The entropy H of the source is:
H = 0.48 log2(1/0.48) + 0.08 log2(1/0.08) + 0.12 log2(1/0.12) + 0.02 log2(1/0.02) + 0.12 log2(1/0.12) + 0.04 log2(1/0.04) + 0.06 log2(1/0.06) + 0.08 log2(1/0.08) = 2.367

The average code length L is:
L = (0.48 x 1) + (0.08 x 4) + (0.12 x 3) + (0.02 x 5) + (0.12 x 3) + (0.04 x 5) + (0.06 x 4) + (0.08 x 4) = 2.38

The redundancy R is:
R = L - H = 2.38 - 2.367 = 0.013, or 1.3%
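
And a matching check for the average code length and redundancy (illustrative sketch; the code word lengths are those of the table above):

from math import log2

probs   = [0.48, 0.08, 0.12, 0.02, 0.12, 0.04, 0.06, 0.08]   # a ... h
lengths = [1, 4, 3, 5, 3, 5, 4, 4]                           # code word lengths from the table

H = sum(p * log2(1 / p) for p in probs)          # entropy
L = sum(p * l for p, l in zip(probs, lengths))   # average code length

print(f"H = {H:.4f} bits/message")   # 2.3676
print(f"L = {L:.2f} bits/message")   # 2.38
print(f"R = L - H = {L - H:.4f}")    # 0.0124, about 0.013 (1.3 %) when H is rounded to 2.367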