Huffman Encoding 16-Apr-17.


Entropy

Entropy is a measure of information content: the number of bits actually required to store data. Entropy is sometimes called a measure of surprise.

A highly predictable sequence contains little actual information.
- Example: 11011011011011011011011011 (what’s next?)
- Example: I didn’t win the lottery this week

A completely unpredictable sequence of n bits contains n bits of information.
- Example: 01000001110110011010010000 (what’s next?)
- Example: I just won $10 million in the lottery!!!!

Note that nothing says the information has to have any “meaning” (whatever that is).

Actual information content

A partially predictable sequence of n bits carries less than n bits of information.
- Example #1: 111110101111111100101111101100
  Blocks of 3: 111 110 101 111 111 100 101 111 101 100
- Example #2: 101111011111110111111011111100
  Unequal probabilities: p(1) = 0.75, p(0) = 0.25
- Example #3: "We, the people, in order to form a..."
  Unequal character probabilities: e and t are common, j and q are uncommon
- Example #4: {we, the, people, in, order, to, ...}
  Unequal word probabilities: the is very common
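To make the "less than n bits" claim concrete, here is a minimal sketch (not from the original slides) that computes the Shannon entropy for Example #2's symbol probabilities; H = -Σ p·log2(p) gives the average bits of information per symbol.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Example #2 above: p(1) = 0.75, p(0) = 0.25
h = entropy([0.75, 0.25])
print(f"{h:.3f} bits per symbol")                                   # about 0.811
print(f"{30 * h:.1f} bits of information in the 30-bit sequence")   # about 24.3
```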

Fixed and variable bit widths

To encode English text, we need 26 lower-case letters, 26 upper-case letters, and a handful of punctuation marks. We can get by with 64 characters (6 bits) in all, so each character is 6 bits wide.

We can do better, provided:
- some characters are more frequent than others,
- characters may have different bit widths, so that, for example, e uses only one or two bits while x uses several, and
- we have a way of decoding the bit stream, i.e., of telling where each character begins and ends.
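As a quick check on the 6-bit figure (a small sketch, not from the slides): a fixed-width code for an alphabet of k characters needs ceil(log2(k)) bits per character.

```python
import math

def fixed_width_bits(alphabet_size):
    """Bits per character for a fixed-width code over the given alphabet."""
    return math.ceil(math.log2(alphabet_size))

print(fixed_width_bits(64))   # 6 bits: enough for 52 letters plus punctuation
print(fixed_width_bits(128))  # 7 bits: the size of the ASCII character set
```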

Example Huffman encoding

A = 0, B = 100, C = 1010, D = 1011, R = 11

ABRACADABRA = 01001101010010110100110

This is eleven letters in 23 bits. A fixed-width encoding would require 3 bits for five different letters, or 33 bits for 11 letters. Notice that the encoded bit string can be decoded!
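The 23-bit string can be reproduced mechanically from the table; here is a minimal sketch (not from the original slides) that simply concatenates each letter's code.

```python
# The code table from the slide above
codes = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

message = "ABRACADABRA"
encoded = "".join(codes[ch] for ch in message)

print(encoded)               # 01001101010010110100110
print(len(encoded), "bits")  # 23 bits for 11 letters, versus 33 at 3 bits per letter
```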

Why it works

In this example, A was the most common letter. In ABRACADABRA:
- 5 As, and the code for A is 1 bit long
- 2 Rs, and the code for R is 2 bits long
- 2 Bs, and the code for B is 3 bits long
- 1 C, and the code for C is 4 bits long
- 1 D, and the code for D is 4 bits long

Total: 5(1) + 2(2) + 2(3) + 1(4) + 1(4) = 23 bits.

Creating a Huffman encoding

For each encoding unit (a letter, in this example), associate a frequency: the number of times it occurs. You can also use a percentage or a probability.

Create a binary tree whose two children are the encoding units with the smallest frequencies; the frequency of the new root is the sum of the frequencies of its leaves. Repeat this procedure, treating each new tree as a single encoding unit, until all the encoding units are in one binary tree. (A sketch of this procedure follows below.)
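A minimal sketch of this greedy procedure (not from the original slides), using a priority queue to repeatedly merge the two lowest-frequency trees; the frequencies are taken from the worked example that follows.

```python
import heapq
import itertools

def build_huffman_tree(frequencies):
    """Return the root of a Huffman tree for a dict of {symbol: frequency}.

    Leaves are symbols; internal nodes are (left, right) pairs.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares subtrees
    heap = [(freq, next(counter), symbol) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # smallest remaining frequency
        f2, _, right = heapq.heappop(heap)  # second smallest
        heapq.heappush(heap, (f1 + f2, next(counter), (left, right)))
    return heap[0][2]

# Frequencies from the step-by-step example below. Ties may be broken in a
# different order than in the slides' drawings; any such tree is still optimal.
print(build_huffman_tree({"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}))
```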

Example, step I

Assume that the relative frequencies are: A: 40, B: 20, C: 10, D: 10, R: 20 (I chose simpler numbers than the real frequencies).

The smallest numbers are 10 and 10 (C and D), so connect those.

Example, step II

C and D have already been used, and the new node above them (call it C+D) has value 20.

The smallest values are now B, C+D, and R, all of which have value 20. Connect any two of these (here, B and C+D).

Example, step III

The smallest value is now R (20), while A and B+C+D both have value 40.

Connect R to either of the others (here, to B+C+D).

Example, step IV

Connect the final two nodes: A (40) and R+B+C+D (60). The root of the finished tree has value 100.
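Steps I-IV can be traced directly with a small sketch (not from the slides); each call merges the two smallest remaining weights into one node, exactly as in the diagrams.

```python
# Weights from step I; each merge replaces two entries with their sum.
weights = {"A": 40, "B": 20, "C": 10, "D": 10, "R": 20}

def merge(weights, x, y):
    """Combine two entries into a single node named x+y, as in the slides."""
    weights[x + "+" + y] = weights.pop(x) + weights.pop(y)

merge(weights, "C", "D")        # step I:   C and D (10 + 10 = 20)
merge(weights, "B", "C+D")      # step II:  B and C+D (20 + 20 = 40)
merge(weights, "R", "B+C+D")    # step III: R and B+C+D (20 + 40 = 60)
merge(weights, "A", "R+B+C+D")  # step IV:  A and the rest (40 + 60 = 100)
print(weights)                  # {'A+R+B+C+D': 100}
```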

Example, step V

Assign 0 to left branches and 1 to right branches. Each encoding is a path from the root:

A = 0, B = 100, C = 1010, D = 1011, R = 11

Each path terminates at a leaf. Do you see why encoded strings are decodable?
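A minimal sketch (not from the slides) of the 0/1 assignment: walk the finished tree, appending 0 for each left branch and 1 for each right branch. The nested tuples below hand-encode the tree built in steps I-IV.

```python
def assign_codes(node, prefix="", table=None):
    """Walk the tree; left edges append '0', right edges append '1'."""
    if table is None:
        table = {}
    if isinstance(node, str):        # a leaf: record the path taken to reach it
        table[node] = prefix or "0"
        return table
    left, right = node
    assign_codes(left, prefix + "0", table)
    assign_codes(right, prefix + "1", table)
    return table

# The tree from steps I-IV: A to the left of the root; on the right,
# B paired with (C, D), and that subtree paired with R.
tree = ("A", (("B", ("C", "D")), "R"))
print(assign_codes(tree))  # {'A': '0', 'B': '100', 'C': '1010', 'D': '1011', 'R': '11'}
```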

Unique prefix property

A = 0, B = 100, C = 1010, D = 1011, R = 11

No bit string is a prefix of any other bit string. For example, if we added E = 01, then A (0) would be a prefix of E. Similarly, if we added F = 10, then it would be a prefix of three other encodings (B = 100, C = 1010, and D = 1011).

The unique prefix property holds because, in a binary tree, a leaf is not on the path to any other node.
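The prefix property is exactly what makes decoding unambiguous: read bits until they match a code, emit that letter, and start over. A minimal decoder sketch (not from the slides):

```python
codes = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}
decode_table = {code: letter for letter, code in codes.items()}

def decode(bits):
    """Greedy prefix matching; unambiguous because no code is a prefix of another."""
    result, current = [], ""
    for bit in bits:
        current += bit
        if current in decode_table:  # at most one code can ever match
            result.append(decode_table[current])
            current = ""
    return "".join(result)

print(decode("01001101010010110100110"))  # ABRACADABRA
```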

Practical considerations

It is not practical to create a Huffman encoding for a single short string, such as ABRACADABRA. To decode it, you would need the code table, and if you include the code table with the message, the whole thing is bigger than just the ASCII message.

Huffman encoding is practical if:
- the encoded string is large relative to the code table, OR
- we agree on the code table beforehand (for example, it’s easy to find a table of letter frequencies for English or any other alphabet-based language).

About the example

My example gave a nice, good-looking binary tree, with no lines crossing other lines. That’s because I chose my example and numbers carefully. If you do this for real data, you can expect your drawing to be a lot messier, and that’s OK.

Data compression

Huffman encoding is a simple example of data compression: representing data in fewer bits than it would otherwise need.

A more sophisticated method is GIF (Graphics Interchange Format) compression, for .gif files. Another is JPEG (Joint Photographic Experts Group), for .jpg files. Unlike the others, JPEG is lossy: it loses information. That is generally OK for photographs (if you don’t compress them too much), because decompression adds “fake” data very similar to the original.

The End