
ENTROPY & RUN LENGTH CODING

Contents  What is Entropy Coding?  Huffman Encoding  Huffman Encoding Example  Arithmetic Coding  Encoding Algorithm for Arithmetic Coding  Decoding Algorithm for Arithmetic Decoding  Run Length Encoding  Question & Answer  References

What is Entropy Coding?  Entropy coding is a lossless compression scheme.  One of the main types of entropy coding creates and assigns a unique prefix-free code to each unique symbol that occurs in the input.  These entropy encoders then compress data by replacing each fixed-length input symbol with the corresponding variable-length prefix-free output code word.

Continue……  The length of each code word is approximately proportional to the negative logarithm of the symbol's probability.  Therefore, the most common symbols use the shortest codes.  According to Shannon's source coding theorem, the optimal code length for a symbol is −log_b(P), where b is the number of symbols used to make output codes and P is the probability of the input symbol.
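The formula above can be checked numerically. A minimal Python sketch (illustrative only, not part of the original slides) computes the optimal code length −log2(P) for each symbol and the entropy of the source, using the "BILL GATES" probabilities that appear later in these slides:

```python
import math

# Probabilities of the symbols in "BILL GATES" ('^' stands for the space)
probs = {"^": 0.1, "A": 0.1, "B": 0.1, "E": 0.1, "G": 0.1,
         "I": 0.1, "L": 0.2, "S": 0.1, "T": 0.1}

# Optimal code length for each symbol: -log_b(P), here with b = 2 (bits)
optimal_bits = {sym: -math.log2(p) for sym, p in probs.items()}

# Entropy = expected optimal code length = sum over symbols of P * -log2(P)
entropy = sum(p * -math.log2(p) for p in probs.values())

for sym, bits in sorted(optimal_bits.items()):
    print(f"{sym}: P={probs[sym]:.1f}, optimal length {bits:.3f} bits")
print(f"entropy = {entropy:.3f} bits/symbol")
```

Note that a symbol with P = 0.9 would get −log2(0.9) ≈ 0.152 bits, a fractional code length that no whole-bit code can achieve.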

Entropy Encoding Techniques-  Huffman Coding  Arithmetic coding

Huffman Encoding-  For each encoding unit (letter, symbol, or any character), associate a frequency with it.  You can use a percentage or a probability of occurrence for the encoding unit.  Create a binary tree node whose children are the encoding units with the smallest frequencies/probabilities.  The frequency of this new node is the sum of the frequencies/probabilities of its children.  Repeat this procedure until all the encoding units are covered by a single binary tree.

Example, step I  Assume that the relative frequencies are:  A: 40  B: 20  C: 10  D: 10  R: 20  (I chose simpler numbers than the real frequencies)  The smallest numbers are 10 and 10 (C and D), so connect those

Example, step II  C and D have already been used, and the new node above them (call it C+D) has value 20  The smallest values are B, C+D, and R, all of which have value 20  Connect any two of these

Example, step III  The smallest value is R (20), while A and B+C+D both have value 40  Connect R to either of the others

Example, step IV  Connect the final two nodes

Example, step V  Assign 0 to left branches, 1 to right branches  Each encoding is a path from the root  A = 0 B = 100 C = 1010 D = 1011 R = 11  Each path terminates at a leaf  Do you see why encoded strings are decodable?
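The construction walked through above can be sketched in Python (an illustrative sketch, not the presentation's own code) using a min-heap of (frequency, tree) pairs. Tie-breaking among equal frequencies may produce a different but equally optimal tree, so the total encoded cost is the invariant to check:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code from a {symbol: frequency} map.

    Returns {symbol: bitstring}, assigning 0 to left branches and
    1 to right branches, as in the slides."""
    tick = count()  # tie-breaker so heapq never compares trees directly
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)  # the two smallest frequencies...
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (a, b)))  # ...become siblings
    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):  # internal node: recurse into both children
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:                        # leaf: record its codeword
            codes[node] = path
    walk(heap[0][2], "")
    return codes

freqs = {"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}
codes = huffman_codes(freqs)
cost = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, "total bits per 100 symbols:", cost)  # cost is 220 for any optimal tree
```

With the example frequencies, every optimal tree costs 220 bits per 100 symbols (an average of 2.2 bits/symbol), matching the codeword lengths 1, 3, 4, 4, 2 found above.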

Unique prefix property  A = 0 B = 100 C = 1010 D = 1011 R = 11  No bit string is a prefix of any other bit string  For example, if we added E=01, then A (0) would be a prefix of E  Similarly, if we added F=10, then it would be a prefix of three other encodings (B=100, C=1010, and D=1011)  The unique prefix property holds because, in a binary tree, a leaf is not on a path to any other node
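Because of the unique prefix property, a bit stream can be decoded greedily: accumulate bits until they match a codeword, emit that symbol, and start over. A small Python sketch (illustrative, not from the slides) using the example code:

```python
def decode(bits, codes):
    """Greedy prefix-code decoder: consume bits until a codeword matches."""
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:            # no codeword is a prefix of another,
            out.append(inverse[buf])  # so the first match is the right one
            buf = ""
    assert buf == "", "leftover bits: input was not a whole number of codewords"
    return "".join(out)

codes = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}
encoded = "".join(codes[c] for c in "ABRACADABRA")
print(encoded)                  # the concatenated bit stream
print(decode(encoded, codes))   # -> ABRACADABRA
```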

Data compression-  Huffman encoding is a simple example of data compression: representing data in fewer bits than it would otherwise need  A more sophisticated method is GIF (Graphics Interchange Format) compression, for .gif files  Another is JPEG (Joint Photographic Experts Group), for .jpg files  Unlike the others, JPEG is lossy: it loses information  Generally OK for photographs (if you don't compress them too much), because decompression adds "fake" data very similar to the original

Arithmetic Coding

Arithmetic Coding-  Huffman coding has been proven the best among the fixed-to-variable-length coding methods available.  Yet, since Huffman codes have to be an integral number of bits long, while the entropy value of a symbol is almost always a fractional number, the theoretically possible compressed size cannot be achieved.

Arithmetic Coding (Cont…)  For example, if a statistical method assigns a 90% probability to a given character, the optimal code size would be 0.15 bits.  The Huffman coding system would probably assign a 1-bit code to the symbol, which is six times longer than necessary.

Arithmetic Coding (Cont…)  Arithmetic coding bypasses the idea of replacing an input symbol with a specific code. It replaces a stream of input symbols with a single floating-point output number.

Suppose that we want to encode the message "BILL GATES"

Character   Probability   Range
^ (space)   1/10          0.00 - 0.10
A           1/10          0.10 - 0.20
B           1/10          0.20 - 0.30
E           1/10          0.30 - 0.40
G           1/10          0.40 - 0.50
I           1/10          0.50 - 0.60
L           2/10          0.60 - 0.80
S           1/10          0.80 - 0.90
T           1/10          0.90 - 1.00

Encoding Algorithm for Arithmetic Coding-  Encoding algorithm for arithmetic coding:

low = 0.0; high = 1.0;
while not EOF do
    range = high - low;
    read(c);
    high = low + range × high_range(c);
    low = low + range × low_range(c);
end do
output(low);
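The loop above can be sketched in Python (an illustrative sketch, not the presentation's code; '^' stands for the space, and exact fractions are used to avoid floating-point rounding):

```python
from fractions import Fraction as F

# Cumulative ranges for the "BILL GATES" alphabet
ranges = {"^": (F(0), F(1,10)), "A": (F(1,10), F(2,10)), "B": (F(2,10), F(3,10)),
          "E": (F(3,10), F(4,10)), "G": (F(4,10), F(5,10)), "I": (F(5,10), F(6,10)),
          "L": (F(6,10), F(8,10)), "S": (F(8,10), F(9,10)), "T": (F(9,10), F(1))}

def arith_encode(message):
    """Narrow [low, high) once per symbol; any number in the final range works."""
    low, high = F(0), F(1)
    for c in message:
        rng = high - low
        high = low + rng * ranges[c][1]
        low = low + rng * ranges[c][0]
    return low, high

low, high = arith_encode("BILL^GATES")
print(float(low), float(high))  # low = 0.2572167752, high = 0.2572167756
```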

Continue……………….  To encode the first character B properly, the final coded message has to be a number greater than or equal to 0.20 and less than 0.30.  range = 1.0 - 0.0 = 1.0  high = 0.0 + 1.0 × 0.3 = 0.3  low = 0.0 + 1.0 × 0.2 = 0.2  After the first character is encoded, the low end for the range is changed from 0.00 to 0.20 and the high end for the range is changed from 1.00 to 0.30.

Continue…………..  The next character to be encoded, the letter I, owns the range 0.50 to 0.60 within the new sub-range of 0.20 to 0.30.  So, the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range.  Thus, this number is further restricted to 0.25 to 0.26.

Continue……………………….  Note that any number between 0.25 and 0.26 is a legal encoding of 'BI'. Thus, a number that is best suited for binary representation is selected.  (Condition: the length of the encoded message is known, or EOF is used.)

[Figure: the interval (0, 1) is repeatedly subdivided among the symbols ^, A, B, E, G, I, L, S, T as each character of "BILL GATES" is encoded.]

Continue……………..

Character   Low            High
B           0.2            0.3
I           0.25           0.26
L           0.256          0.258
L           0.2572         0.2576
^ (space)   0.25720        0.25724
G           0.257216       0.257220
A           0.2572164      0.2572168
T           0.25721676     0.25721680
E           0.257216772    0.257216776
S           0.2572167752   0.2572167756

Continue……………….  So, the final value 0.2572167752 (or any value between 0.2572167752 and 0.2572167756, if the length of the encoded message is known at the decode end) will uniquely encode the message 'BILL GATES'.

Arithmetic Coding (Decoding)  Decoding is the inverse process.  Since 0.2572167752 falls between 0.2 and 0.3, the first character must be 'B'.  Remove the effect of 'B' from 0.2572167752 by first subtracting the low value of B, 0.2, giving 0.0572167752.  Then divide by the width of the range of 'B', 0.1. This gives a value of 0.572167752.

Decoding (Cont………..)  Then calculate where 0.572167752 lands, which is in the range of the next letter, 'I'.  The process repeats until 0 is reached or the known length of the message is exhausted.

Arithmetic Decoding Algorithm-  Decoding algorithm:

r = input_number;
repeat
    search for c such that r falls in its range;
    output(c);
    r = r - low_range(c);
    r = r ÷ (high_range(c) - low_range(c));
until EOF or the length of the message is reached
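The decoding loop can likewise be sketched in Python (an illustrative sketch with the same assumed ranges and '^' for the space; exact fractions avoid rounding error):

```python
from fractions import Fraction as F

ranges = {"^": (F(0), F(1,10)), "A": (F(1,10), F(2,10)), "B": (F(2,10), F(3,10)),
          "E": (F(3,10), F(4,10)), "G": (F(4,10), F(5,10)), "I": (F(5,10), F(6,10)),
          "L": (F(6,10), F(8,10)), "S": (F(8,10), F(9,10)), "T": (F(9,10), F(1))}

def arith_decode(r, length):
    """Invert the encoder: find the symbol whose range contains r, then rescale."""
    out = []
    for _ in range(length):
        # search for c such that r falls in its range
        c = next(s for s, (lo, hi) in ranges.items() if lo <= r < hi)
        out.append(c)
        lo, hi = ranges[c]
        r = (r - lo) / (hi - lo)  # remove the symbol's effect and rescale
    return "".join(out)

print(arith_decode(F(2572167752, 10**10), 10))  # -> BILL^GATES
```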

r              c           Low   High   Range
0.2572167752   B           0.2   0.3    0.1
0.572167752    I           0.5   0.6    0.1
0.72167752     L           0.6   0.8    0.2
0.6083876      L           0.6   0.8    0.2
0.041938       ^ (space)   0.0   0.1    0.1
0.41938        G           0.4   0.5    0.1
0.1938         A           0.1   0.2    0.1
0.938          T           0.9   1.0    0.1
0.38           E           0.3   0.4    0.1
0.8            S           0.8   0.9    0.1
0.0

Arithmetic Coding Summary  In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol.  The new range is proportional to the predefined probability attached to that symbol.  Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.

Continue…………………..  Coding rate theoretically approaches the high-order entropy.  Not as popular as Huffman coding because multiplications (×) and divisions (÷) are needed.

Run Length Encoder/Decoder

What is RLE?  A compression technique  Represents data using pairs of a value and a run length  A run length is defined as the number of consecutive equal values: e.g. the run 5 5 5 5 is encoded as the value 5 with run length 4

Advantages of RLE-  Useful for compressing data that contains repeated values  e.g. the output from a filter, where many consecutive values are 0  Very simple compared with other compression techniques  Reversible (lossless) compression: decompression is just as easy

Applications-  I-frame compression in video uses a run length encoder!

RLE Effectiveness-  Compression effectiveness depends on the input  Must have consecutive runs of values in order to maximize compression  Best case: all values the same; can represent any length of input using just two values  Worst case: no repeating values; compressed data is twice the length of the original!  Should only be used in situations where we know for sure that the data has repeating values
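The best-case/worst-case claim can be checked with a small Python sketch (illustrative only; the presentation's own MATLAB implementation follows):

```python
def rle_encode(values):
    """Encode a list as [value, run_length, value, run_length, ...]."""
    encoded = []
    i = 0
    while i < len(values):
        run = 1
        while i + run < len(values) and values[i + run] == values[i]:
            run += 1  # count consecutive equal values
        encoded += [values[i], run]
        i += run
    return encoded

best = rle_encode([7] * 1000)       # best case: one long run
worst = rle_encode(list(range(8)))  # worst case: no repeats at all
print(best)        # [7, 1000] - any run length fits in two values
print(len(worst))  # 16 - twice the length of the 8-value input
```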

Encoder - Algorithm  Start on the first element of the input  Examine the next value  If it is the same as the previous value  Keep a counter of consecutive values  Keep examining the next value until a different value or the end of the input is found, then output the value followed by the counter. Repeat  If it is not the same as the previous value  Output the previous value followed by '1' (the run length). Repeat

Encoder – Matlab Code

% Run Length Encoder
% EE113D Project
function encoded = RLE_encode(input)
my_size = size(input);
len = my_size(2);   % renamed to avoid shadowing the built-in 'length'
run_length = 1;
encoded = [];
for i = 2:len
    if input(i) == input(i-1)
        run_length = run_length + 1;
    else
        encoded = [encoded input(i-1) run_length];
        run_length = 1;
    end
end
if len > 1
    % Add the last value and its run length to the output
    encoded = [encoded input(len) run_length];
else
    % Special case if the input is of length 1
    encoded = [input(1) 1];
end

Encoder – Matlab Results

>> RLE_encode([0 0 0 0 0 0 0 0 0 0 0])
ans = 0 11

Encoder  Input from separate.asm file  In the form of a vector  e.g. ‘array.word 4,5,5,2,7,3,6,9,9,10,10,10,10,10,10,0,0’  Output is declared as data memory space  Examine memory to get output  Originally declared to be all -1.  Immediate Problem  Output size not known until run-time (depends on input size as well as input pattern)  Cannot initialize variable size array

Encoder  Solution  Limit user input to preset length (16)  Initialize output to worst case (double input length – 32)  Initialize output to all -1’s (we’re only handling positive numbers and 0 as inputs)  Output ends when -1 first appears or if length of output equals to worst case

Decoder – Matlab Code

% Run Length Decoder
% EE113D Project
% The input to this function should be the output from the Run Length Encoder,
% which means it assumes an even number of elements in the input. The first
% element is a value followed by its run count. Thus all odd elements in
% the input are assumed to be values and even elements the run counts.
function decoded = RLE_decode(encoded)
my_size = size(encoded);
len = my_size(2);
index = 1;
decoded = [];
% iterate through the input
while (index <= len)
    % get the value, which is followed by its run count
    value = encoded(index);
    run_length = encoded(index + 1);
    for i = 1:run_length
        % loop adding 'value' to the output 'run_length' times
        decoded = [decoded value];
    end
    % move index to the next value element (odd element)
    index = index + 2;
end

Decoder – Matlab Results

>> RLE_decode([0 12])
ans = 0 0 0 0 0 0 0 0 0 0 0 0

References:  is.cs.nthu.edu.tw/course/2012Spring/ISA530100/chapt06.ppt  ihoque.bol.ucla.edu/presentation.ppt