Data Compression: Huffman Coding (Weiss §10.1.2, p. 389)

2 Why compress files?
– Resources are limited:
  – Long-term storage (disk space)
  – Internet transfers (network bandwidth)
  – Fast memory access (cache)
– Because we can

3 Is compression possible?
– Most data contains redundancies
  – E.g. human-readable text: not all symbol combinations are equally likely
  – In English, some letter pairs (“qu”, “th”, etc.) appear more frequently than others
– The essential information content is much less
  – Information theory, developed by Claude Shannon in the late 1940s
  – If you have n equally likely symbols, how many bits do you need to represent them? Answer: ⌈log₂ n⌉
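A minimal sketch of both quantities in Python (the function names are mine, not from the slides): ⌈log₂ n⌉ bits suffice for n equally likely symbols, and Shannon entropy gives the average bits per symbol when they are not equally likely.

    import math

    def fixed_length_bits(n):
        """Bits needed to give each of n equally likely symbols its own code."""
        return math.ceil(math.log2(n))

    def shannon_entropy(probs):
        """Average information content (bits/symbol) of a source with the
        given symbol probabilities."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(fixed_length_bits(128))                      # 7 (cf. 7-bit ASCII)
    print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75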

4 What can be compressed?
Which of the following would we require in pristine shape (lossless)?
– C++ source file
– Binary executable
– Photograph of your thumb
– Video of a monkey eating a banana
– MP3 ringtones

5 Lossless vs. lossy compression
– An encoder maps the original data X to a compressed form Y; a decoder maps Y back to a decompressed X′
– Lossless compression: X = X′ (reversible, or entropy, coding)
– Lossy compression: X ≠ X′ (irreversible coding)
– Compression ratio: |X| / |Y|, where |X| is the number of bits in X
  – E.g. compressing an 88-bit message into 22 bits gives a ratio of 4:1

6 Lossy Compression
– Ideal for high-fidelity signals that carry more data than humans can perceive
– Most audio and video information can be removed without being noticed
– Standards:
  – JPEG (Joint Photographic Experts Group)
  – MPEG (Moving Picture Experts Group)
  – MP3 (MPEG-1, Layer 3)
– Can get compression ratios of 10:1

7 Lossless Compression
– No data is lost; used when the data has no fidelity to spare
– Standards: gzip, Unix compress, zip, GIF
– Can get compression ratios of 4:1
– Another technique is run-length encoding (RLE), a component of several formats (BMP, PCX, TIFF, PDF)
  – A run of characters is replaced by the number of characters of that type and a single character:
    RTAAAAAADEEEE → RT*6AD*4E
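A small Python sketch of this flavor of RLE (the “*count” marker follows the slide's example; the helper name and the min_run threshold are my assumptions):

    def rle_encode(s, min_run=3):
        """Replace each run of min_run or more identical characters
        with '*<count><char>'; shorter runs are copied verbatim."""
        out, i = [], 0
        while i < len(s):
            j = i
            while j < len(s) and s[j] == s[i]:
                j += 1                 # extend the current run
            run = j - i
            out.append(f"*{run}{s[i]}" if run >= min_run else s[i] * run)
            i = j
        return "".join(out)

    print(rle_encode("RTAAAAAADEEEE"))  # RT*6AD*4E

A real format would also need to escape a literal '*' in the input; the sketch ignores that.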

8 Lossless Compression of Text
– ASCII = fixed 8 bits per character
– Example: “hello there”
  – 11 characters × 8 bits = 88 bits
– Can we encode this message using fewer bits?
  – We really only need 7 bits to distinguish 128 things
  – Looking at JUST this message, there are only 6 distinct letters plus one space = 7 symbols, which needs only 3 bits each
– Encode “aabddcaa”: with 4 distinct symbols a fixed-length code needs 2 bits per character = 16 bits; Huffman can do it in 14 bits

9 Huffman Coding
– Uses the frequencies of symbols in a string to build a prefix code
– Prefix code: no code in our encoding is a prefix of another code

    Letter  Code
    a       0
    b       100
    c       101
    d       11

– Note: codes are variable length (1 to 3 bits per character)
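To make the 14-bit claim from the previous slide concrete, here is a sketch that encodes with the table above (Python; the names are mine):

    # The prefix code from the slide.
    CODE = {"a": "0", "b": "100", "c": "101", "d": "11"}

    def encode(msg):
        return "".join(CODE[ch] for ch in msg)

    bits = encode("aabddcaa")
    print(bits, len(bits))  # 00100111110100 14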

10 Huffman Tree
– All symbols are at leaves
– Edges are labeled with 0 or 1
– For the code above: a sits at depth 1, d at depth 2, b and c at depth 3
– Why does this guarantee a prefix code? Because every symbol is a leaf, the path spelling one codeword can never continue on to another symbol

11 Decoding a Prefix Code

    loop until end of the message:
        start at the root of the tree
        loop:
            if the bit read = 1, take the 1-child
            else take the 0-child
        until the node is a leaf
        report the character found at that leaf
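A Python sketch of that loop: build_tree turns a code table into a binary trie and decode walks it bit by bit (the helper names are mine; the 14-bit string is what slide 9's table produces for “aabddcaa”):

    def build_tree(code):
        """Turn a code table into a trie: internal nodes are dicts
        keyed by '0'/'1', leaves are decoded characters."""
        root = {}
        for ch, word in code.items():
            node = root
            for bit in word[:-1]:
                node = node.setdefault(bit, {})
            node[word[-1]] = ch
        return root

    def decode(bits, root):
        out, node = [], root
        for b in bits:
            node = node[b]                  # take the 0-child or 1-child
            if not isinstance(node, dict):  # leaf: report the character
                out.append(node)
                node = root                 # restart at the root
        return "".join(out)

    code = {"a": "0", "b": "100", "c": "101", "d": "11"}
    print(decode("00100111110100", build_tree(code)))  # aabddcaa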

12 Decode
Using the code a = 0, b = 100, c = 101, d = 11, an 8-character message costs:
– 8 × 8 bits = 64 bits in ASCII
– 8 × 2 bits = 16 bits with a fixed-length 2-bit code
– 14 bits with Huffman, which exploits the symbols' frequencies
Why did we need the code to be a prefix code? (So the bit stream can be decoded unambiguously, with no separators between codewords.)

13 Cost of a Huffman Tree
The cost of a Huffman tree containing n symbols is the expected length of a codeword:

    C(T) = p₁r₁ + p₂r₂ + p₃r₃ + … + pₙrₙ

where pᵢ is the probability that symbol i occurs and rᵢ is the length of the path from the root to its leaf (i.e. its codeword length).
For the previous example: C(T) = (0.50 × 1) + (0.125 × 3) + (0.125 × 3) + (0.25 × 2) = 1.75 bits per symbol.
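The same arithmetic in Python, with the (pᵢ, rᵢ) pairs taken from the slide:

    pairs = [(0.50, 1), (0.125, 3), (0.125, 3), (0.25, 2)]
    print(sum(p * r for p, r in pairs))  # 1.75 bits per symbol on average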

14 Constructing a Huffman Tree

    Letter  Code  Frequency
    a       0     0.500
    b       100   0.125
    c       101   0.125
    d       11    0.250

What is the cost of this tree?

15 Huffman Tree Construction, Part the First
Given a symbol-frequency table:
– Start with a forest of one-node trees, one for each symbol
– Associate a frequency with each tree:

    a: 0.5    b: 0.125    c: 0.125    d: 0.25

16 Huffman Tree Construction, Part the Second
While there is more than one tree:
– Pick the two trees with the smallest frequencies
– Combine them into one tree, and add their frequencies
  – E.g. b (0.125) and c (0.125) merge first, into a tree of frequency 0.25
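A compact sketch of the whole construction using a binary heap (Python's heapq; the function name and the tie-breaking counter are my additions):

    import heapq
    from itertools import count

    def huffman_code(freqs):
        """Build a Huffman code from a {symbol: frequency} table by
        repeatedly merging the two lowest-frequency trees."""
        ties = count()  # breaks frequency ties so heap tuples stay comparable
        heap = [(f, next(ties), sym) for sym, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(ties), (left, right)))
        _, _, root = heap[0]

        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):   # internal node: label edges 0 and 1
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                         # leaf: record the codeword
                codes[node] = prefix or "0"
        walk(root, "")
        return codes

    print(huffman_code({"a": 0.50, "b": 0.125, "c": 0.125, "d": 0.25}))
    # {'a': '0', 'd': '10', 'b': '110', 'c': '111'}

The 0/1 labels differ from slide 9's table, but the codeword lengths (and hence the cost) are the same, which is exactly the point of the next slide.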

17 Huffman Tree Construction, Part the Third
– Pick arbitrary 0-1 labellings for the edges
– More than one Huffman tree is possible
– How do you get from one Huffman tree to another? (For instance, swap the 0 and 1 labels on any internal node's edges)

18 Digression: Why “anti-compress” files?
– Error-correcting codes: by adding redundancy to data instead of removing it, we can make it robust to noise
– Noise on the communication channel corrupts some bits; the redundancy lets the receiver detect, and often correct, the damage
– Used in:
  – CD/DVD optical storage
  – Hard disk magnetic storage
  – WiFi, Ethernet, CDMA
– Examples: checksums, the phonetic alphabet
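A toy illustration of added redundancy: a single even-parity bit, which detects any one flipped bit (Python; the names are mine, and real checksums such as CRCs are more elaborate but rest on the same idea):

    def add_parity(bits):
        """Append a bit that makes the total number of 1s even."""
        return bits + str(bits.count("1") % 2)

    def parity_ok(bits):
        return bits.count("1") % 2 == 0

    word = add_parity("1011001")
    print(word, parity_ok(word))            # 10110010 True
    corrupted = "0" + word[1:]              # channel noise flips one bit
    print(corrupted, parity_ok(corrupted))  # 00110010 False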