Huffman Codes


Huffman Codes
Information coding:
– Most information-transmission machines (a computer terminal, the Voyager spacecraft) use a binary code.
– Why? Because their electric signals are either present or absent at any given time.
Suppose the Voyager on-board camera is sensitive to four shades of gray:
– White
– Light gray
– Dark gray
– Black
A camera picture is digitized into (400*600) "dots", then transmitted by radio to Earth as a single stream of signals, to be reconstructed and printed.

Huffman Codes
In designing a binary code, we want to decide how to encode the "color" of each dot in binary so that:
– 1) No signals are wasted (efficiency)
– 2) The message is recognizable at the receiving end (more on this later)
Example encoding:
– White – 0001
– Light gray – 0010
– Dark gray – 0100
– Black – 1000
WASTEFUL! At 4 "digits" per symbol (dot), one picture would cost 4*24000 = 96,000 signals, almost 100,000.
How many digits do you actually need per dot?
– 1 is not enough: only 2 values
– 2 is enough: 4 values
– 3 is too many
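To make the arithmetic concrete, here is a minimal Python sketch (not part of the original slides; the dot count of 24,000 is the one used in the slides' cost figures, and all names are illustrative):

import math

shades = ["white", "light gray", "dark gray", "black"]
n_dots = 24000  # dots per picture, as used in the slides' cost arithmetic

bits_needed = math.ceil(math.log2(len(shades)))  # 2 digits identify one of 4 shades
one_hot_cost = 4 * n_dots                        # 4 digits per dot: 96,000 signals
fixed_cost = bits_needed * n_dots                # 2 digits per dot: 48,000 signals

print(bits_needed, one_hot_cost, fixed_cost)     # 2 96000 48000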

Huffman Codes
Try 2: a fixed-length code of length 2 (two yes/no questions suffice to identify the color).
– W – 00
– LG – 01
– DG – 10
– B – 11
No problem on the receiving end: every two digits define one dot.
Encoding/decoding mechanism: a decision tree [figure: a balanced binary tree with branches labeled 0 and 1 and leaves W, LG, DG, B]. To decode, start at the root and follow the branches until a leaf is reached.
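A small sketch of the decoding procedure just described, assuming the balanced tree above (left branch = 0, right branch = 1); the function and variable names are mine:

# Walk the decision tree: two digits identify one dot.
TREE = (("W", "LG"), ("DG", "B"))   # TREE[b1][b2] is the leaf reached by digits b1, b2

def decode_fixed_length(bits):
    """Decode a string of binary digits into dots, two digits per dot."""
    dots = []
    for i in range(0, len(bits), 2):
        b1, b2 = int(bits[i]), int(bits[i + 1])
        dots.append(TREE[b1][b2])
    return dots

print(decode_fixed_length("000111"))   # ['W', 'LG', 'B']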

Huffman Codes
There are other tree shapes with four leaf nodes [figure: a skewed tree with W at depth 1, LG at depth 2, and DG and B at depth 3]. Which one is better? The criterion is the weighted average code length. Suppose the shades occur with these probabilities:
– W – 0.40
– LG – 0.30
– DG – 0.18
– B – 0.12

Huffman Codes
VARIABLE-LENGTH CODE
Weighted average for tree 1 = 0.40*2 + 0.30*2 + 0.18*2 + 0.12*2 = 2
Weighted average for tree 2 = 0.40*1 + 0.30*2 + 0.18*3 + 0.12*3 = 1.9
On average, tree 2 is better: one picture costs only 1.9*24000 = 45,600 signals, less than half of the first try.
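The same comparison as a short sketch (the probabilities and code lengths are taken from the slides; the variable names are mine):

probs = {"W": 0.40, "LG": 0.30, "DG": 0.18, "B": 0.12}
tree1_lengths = {"W": 2, "LG": 2, "DG": 2, "B": 2}   # balanced, fixed-length tree
tree2_lengths = {"W": 1, "LG": 2, "DG": 3, "B": 3}   # skewed, variable-length tree

def average_length(lengths):
    """Weighted average number of digits per dot."""
    return sum(probs[s] * lengths[s] for s in probs)

print(average_length(tree1_lengths))           # 2.0
print(average_length(tree2_lengths))           # 1.9 (up to float rounding)
print(average_length(tree2_lengths) * 24000)   # about 45,600 signals per picture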

Huffman Codes
General problem:
– Given n symbols with their respective probabilities, which is the best tree (code)?
– That is, the one requiring the fewest digits (yes/no questions) on average to identify a symbol.
Construct the tree from the leaves to the root (see the sketch below):
– 1) Label each leaf with its probability.
– 2) Determine the two fatherless nodes with the smallest probabilities. In case of a tie, choose arbitrarily.
– 3) Create a father for these two nodes; label the father with the sum of the two probabilities.
– 4) Repeat steps 2) and 3) until there is one fatherless node left (the root).
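The four steps above translate directly into a short program. Below is a minimal Python sketch (mine, not from the slides), using a min-heap to find the two fatherless nodes with the smallest probabilities; ties are broken arbitrarily via an insertion counter:

import heapq
import itertools

def build_huffman_tree(probs):
    """probs: dict mapping symbol -> probability.
    Returns the root of the tree; a leaf is a symbol, an internal node is a (left, right) pair."""
    counter = itertools.count()               # tie-breaker so the heap never compares subtrees
    heap = [(p, next(counter), sym) for sym, p in probs.items()]   # step 1: label the leaves
    heapq.heapify(heap)
    while len(heap) > 1:                      # step 4: repeat until one fatherless node remains
        p1, _, left = heapq.heappop(heap)     # step 2: the two smallest fatherless nodes
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(counter), (left, right)))   # step 3: their father
    return heap[0][2]                         # the single remaining fatherless node: the root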

In our case, by convention, a left branch is labeled 0 and a right branch is labeled 1 [figure: the resulting tree, with leaves B, DG, LG, W]. A code obtained by this method is a minimum-redundancy, or Huffman, code. Reading the branch labels from the root down, one assignment consistent with this tree is:
– W – 1
– LG – 01
– DG – 001
– B – 000
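Continuing the sketch above (it reuses build_huffman_tree from the previous block), the codewords are read off the tree by labeling left branches 0 and right branches 1. Which child ends up on the left depends on insertion order and tie-breaking, so the individual bits may differ from the figure, but the codeword lengths, and hence the average length, match:

def assign_codes(node, prefix="", table=None):
    """Label left branches 0 and right branches 1; a leaf's codeword is its path from the root."""
    if table is None:
        table = {}
    if isinstance(node, tuple):                       # internal node: recurse into both children
        assign_codes(node[0], prefix + "0", table)
        assign_codes(node[1], prefix + "1", table)
    else:                                             # leaf: record the accumulated path
        table[node] = prefix
    return table

codes = assign_codes(build_huffman_tree({"W": 0.40, "LG": 0.30, "DG": 0.18, "B": 0.12}))
print(codes)   # lengths: W 1 digit, LG 2 digits, DG and B 3 digits each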

Sample Huffman code: minimize the average number of yes/no questions necessary to distinguish one of 5 symbols that occur with known probabilities.
– a – 0.28 – codeword 01
– b – 0.25 – codeword 11
– c – 0.21 – codeword 10
– d – 0.15 – codeword 001
– e – 0.11 – codeword 000

Weighted Average Length = 2*(0.28 + 0.25 + 0.21) + 3*(0.15 + 0.11) = 2*0.74 + 3*0.26 = 2.26
The Huffman code is always a prefix code. A prefix code satisfies the prefix condition: no codeword is a prefix of another codeword.
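A quick sketch checking this arithmetic (the probability 0.11 for e is the value implied by the totals above; the variable names are mine):

code = {"a": "01", "b": "11", "c": "10", "d": "001", "e": "000"}
probs = {"a": 0.28, "b": 0.25, "c": 0.21, "d": 0.15, "e": 0.11}

average = sum(probs[s] * len(code[s]) for s in code)
print(average)   # 2*0.74 + 3*0.26 = 2.26 (up to float rounding)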

Example.
Not a prefix code: a – 0, b – 1, c – 00, d – 01. On receiving 00 the decoding is ambiguous: we cannot tell whether it is aa or c.
Also not a prefix code: a – 0, b – 01, c – 10 (here 0 is a prefix of 01).
A prefix code is not ambiguous: at any point it is possible to delimit the symbol just received.
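The prefix condition is easy to test mechanically. A minimal sketch (the function name is mine), applied to the examples above and to the five-symbol Huffman code:

def is_prefix_code(code):
    """True if no codeword is a prefix of another codeword."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

print(is_prefix_code({"a": "0", "b": "1", "c": "00", "d": "01"}))   # False: 0 is a prefix of 00
print(is_prefix_code({"a": "0", "b": "01", "c": "10"}))             # False: 0 is a prefix of 01
print(is_prefix_code({"a": "01", "b": "11", "c": "10", "d": "001", "e": "000"}))   # True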