Lecture 4: Lossless Compression (1)
Hongli Luo, Fall 2011

Lecture 4: Lossless Compression (1)
Topics (Chapter 7):
- Introduction
- Basics of Information Theory
- Compression techniques
  - Lossless compression
  - Lossy compression

4.1 Introduction
- Compression: the process of coding that will effectively reduce the total number of bits needed to represent certain information.

Introduction
- Compression ratio = B0 / B1
  - B0: number of bits before compression
  - B1: number of bits after compression
  - A ratio greater than 1 means the compressed representation is smaller than the original.

Types of Compression
- Lossless compression
  - Does not lose information: the signal can be perfectly reconstructed after decompression.
  - Produces a variable bit rate.
  - Not guaranteed to actually reduce the data size; this depends on the characteristics of the data.
  - Example: WinZip
- Lossy compression
  - Loses some information: the signal is not perfectly reconstructed after decompression.
  - Can produce any desired constant bit rate.
  - Examples: JPEG, MPEG

4.2 Basics of Information Theory
- Model information at the source
  - Model the data at the source as a stream of symbols; this defines the "vocabulary" of the source.
  - Each symbol in the vocabulary is represented by bits.
  - If the vocabulary has N symbols, each symbol is represented with log2(N) bits.
    - Text in ASCII code, 8 bits/character: N = 2^8 = 256 symbols
    - Speech, 16 bits/sample: N = 2^16 = 65,536 symbols
    - Color image, 3 x 8 bits/pixel: N = 2^24 ≈ 17 x 10^6 symbols
    - 8x8 image blocks, 8 x 64 bits/block: N = 2^512 ≈ 10^154 symbols
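
As a quick sanity check of the vocabulary sizes above, here is a minimal Python sketch (the bit counts are taken from the slide; the 10^x figures are approximate orders of magnitude):

import math

# Fixed-length coding: a unit represented with b bits spans a vocabulary of N = 2**b symbols.
examples = [("ASCII character", 8), ("speech sample", 16),
            ("color pixel (3 x 8 bits)", 24), ("8x8 block of 8-bit pixels", 8 * 64)]

for name, bits in examples:
    if bits <= 32:
        print(f"{name}: N = 2^{bits} = {2 ** bits:,} symbols")
    else:
        # report only the order of magnitude so the 2**512 case stays readable
        magnitude = round(bits * math.log10(2))
        print(f"{name}: N = 2^{bits}, roughly 10^{magnitude} symbols")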

Lossless Compression
- Lossless compression techniques ensure no loss of data after compression/decompression.
- Coding: "translate" each symbol in the vocabulary into a binary codeword. Codewords may have different binary lengths.
- Example: a vocabulary of 4 symbols (a, b, c, d). Each can be represented in binary with 2 bits, but coded with a different number of bits:
  - a (00) -> 000
  - b (01) -> 001
  - c (10) -> 01
  - d (11) -> 1
- The goal of coding is to minimize the average symbol length.
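
The following Python sketch (assuming the codebook from this slide and a made-up symbol stream) encodes the stream with the variable-length codewords and compares the result to the fixed 2-bit representation:

# Variable-length codebook from the slide: frequent symbols get shorter codewords.
codebook = {"a": "000", "b": "001", "c": "01", "d": "1"}

def encode(symbols, codebook):
    """Concatenate the codeword of each symbol into one bit string."""
    return "".join(codebook[s] for s in symbols)

stream = "ddddccba"                    # illustrative stream in which d and c dominate
bits = encode(stream, codebook)
print(bits)                            # '11110101001000'
print(f"variable-length: {len(bits)} bits, fixed 2-bit code: {2 * len(stream)} bits")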

Average Symbol Length
- The vocabulary of the source has N symbols.
- l(i): binary length (in bits) of the codeword for the i-th symbol.
- Symbol i has been emitted m(i) times.
- M: total number of symbols that the source emits (in every T seconds).
- Number of bits emitted in T seconds: Σ_i m(i) · l(i).
- Probability P(i) of a symbol: the fraction of times it occurs in the transmission, defined as P(i) = m(i) / M.

Average Symbol Length
- Average length per symbol (average symbol length): l_bar = (1/M) · Σ_i m(i) · l(i) = Σ_i P(i) · l(i) bits/symbol.
- Average bit rate: total bits / T = l_bar · (M / T) bits per second.
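
A minimal sketch of these definitions in Python, using illustrative symbol counts and an assumed observation window T (both made up for the example):

# Symbol counts m(i) observed over T seconds, and codeword lengths l(i) in bits,
# matching the 4-symbol codebook used earlier.
counts  = {"a": 1, "b": 1, "c": 2, "d": 4}           # m(i); M = 8 symbols emitted
lengths = {"a": 3, "b": 3, "c": 2, "d": 1}           # l(i)
T = 1.0                                              # observation window in seconds (illustrative)

M = sum(counts.values())
total_bits = sum(counts[s] * lengths[s] for s in counts)   # bits emitted in T seconds
avg_symbol_length = total_bits / M                          # equals sum_i P(i) * l(i)
avg_bit_rate = total_bits / T                               # bits per second

print(f"M = {M}, total bits = {total_bits}")
print(f"average symbol length = {avg_symbol_length:.3f} bits/symbol")
print(f"average bit rate = {avg_bit_rate:.1f} bits/second")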

Minimum Average Symbol Length
- Goal of compression:
  - Minimize the number of bits being transmitted.
  - Equivalent to minimizing the average symbol length.
- How to reduce the average symbol length:
  - Assign shorter codewords to symbols that appear more frequently.
  - Assign longer codewords to symbols that appear less frequently.

Minimum Average Symbol Length
- What is the lower bound of the average symbol length?
  - It is decided by the entropy of the source.
  - Shannon's information theorem: the average binary length of the encoded symbols is always greater than or equal to the entropy H of the source.

Entropy
- The entropy η of an information source with alphabet S = {s_1, s_2, ..., s_n} is:
  η = H(S) = Σ_{i=1..n} p_i · log2(1 / p_i) = - Σ_{i=1..n} p_i · log2(p_i)
  - p_i: probability that symbol s_i will occur in S.
  - log2(1 / p_i) indicates the amount of information (the self-information defined by Shannon) contained in s_i, which corresponds to the number of bits needed to encode s_i.
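
A short Python sketch of the entropy formula, assuming base-2 logarithms so that the result is in bits per symbol:

import math

def entropy(probs):
    """Shannon entropy: sum of p_i * log2(1/p_i), skipping zero-probability symbols."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Uniform source with N = 256 symbols: entropy = log2(256) = 8 bits/symbol.
print(entropy([1 / 256] * 256))              # 8.0

# Skewed source: frequent symbols carry less self-information, so the entropy drops.
print(entropy([0.5, 0.25, 0.125, 0.125]))    # 1.75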

Entropy
- The entropy is a characteristic of a given source of symbols.
- Entropy is largest (equal to log2(N)) when all symbols are equally probable, i.e., the chances that each symbol appears are similar (the symbols are uniformly distributed in the source).
- Entropy is small (but always >= 0) when some symbols are much more likely to appear than others.

Entropy and Code Length
- The entropy η represents the average amount of information contained per symbol in the source S.
- The entropy specifies the lower bound for the average number of bits needed to code each symbol in S, i.e., η <= l_bar.
  - l_bar: the average length (measured in bits) of the codewords produced by the encoder.
- Efficiency of the encoder: η / l_bar.
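
A sketch combining these two quantities for the earlier 4-symbol example (the probabilities and codeword lengths are illustrative assumptions, chosen so the codebook above is optimal):

import math

# Source probabilities and codeword lengths for the 4-symbol example.
probs   = {"a": 0.125, "b": 0.125, "c": 0.25, "d": 0.5}
lengths = {"a": 3,     "b": 3,     "c": 2,    "d": 1}

eta  = sum(p * math.log2(1 / p) for p in probs.values())   # entropy, bits/symbol
lbar = sum(probs[s] * lengths[s] for s in probs)            # average codeword length

print(f"entropy = {eta:.3f} bits/symbol, average length = {lbar:.3f} bits/symbol")
print(f"encoder efficiency = {eta / lbar:.3f}")             # 1.0 here: the code meets the entropy bound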

Distribution of Gray-Level Intensities
- Fig. 7.2(a) shows the histogram of an image with a uniform distribution of gray-level intensities, i.e., p_i = 1/256. Hence, the entropy of this image is
  η = log2(256) = 8   (Eq. 7.4)
- No compression is possible for this image: its entropy already equals the 8 bits/pixel of the raw representation!

4.3 Compression Techniques
- Compression techniques are broadly classified into:
  - Lossless compression
    - Run-length encoding
    - Variable-length coding (entropy coding): Shannon–Fano algorithm, Huffman coding, adaptive Huffman coding
    - Arithmetic coding
    - LZW
  - Lossy compression

Run-Length Encoding
- A sequence of elements c_1, c_2, ..., c_i, ... is mapped to runs (c_i, l_i):
  - c_i = the symbol
  - l_i = the length of symbol c_i's run
- Example: given the sequence of symbols {1,1,1,1,3,3,4,4,4,3,3,5,2,2,2}, the run-length encoding is (1,4),(3,2),(4,3),(3,2),(5,1),(2,3).
- Applying run-length encoding to a bi-level image (with only 1-bit black and white pixels):
  - Assume the starting run is of a particular color (either black or white).
  - Code the length of each run.
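
A minimal run-length encoder/decoder sketch in Python that reproduces the example above:

from itertools import groupby

def rle_encode(seq):
    """Map a sequence of symbols to (symbol, run_length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(seq)]

def rle_decode(pairs):
    """Expand (symbol, run_length) pairs back into the original sequence."""
    return [symbol for symbol, length in pairs for _ in range(length)]

data = [1, 1, 1, 1, 3, 3, 4, 4, 4, 3, 3, 5, 2, 2, 2]
runs = rle_encode(data)
print(runs)                      # [(1, 4), (3, 2), (4, 3), (3, 2), (5, 1), (2, 3)]
assert rle_decode(runs) == data  # lossless: decoding reproduces the input exactly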

Variable-Length Coding
- VLC generates variable-length codewords from fixed-length symbols.
- VLC is one of the best-known entropy coding methods.
- Methods of VLC:
  - Shannon–Fano algorithm
  - Huffman coding
  - Adaptive Huffman coding

Shannon–Fano Algorithm
- A top-down approach. Steps:
  1. Sort the symbols according to the frequency count of their occurrences.
  2. Recursively divide the symbols into two parts, each with approximately the same total count, until each part contains only one symbol.
- Example: coding of "HELLO"
  - Sort the symbols according to their frequencies: L (2), H (1), E (1), O (1).

- Assign bit 0 to the left branches and bit 1 to the right branches of the resulting coding tree.

- Result for "HELLO":
  - Coded bits: 10 bits
  - Raw data (8 bits per character): 40 bits
  - The compressed data is 10/40 = 25% of the original size (compression ratio B0/B1 = 40/10 = 4).
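
A recursive Shannon–Fano sketch in Python that reproduces the "HELLO" numbers (the split rule here is one simple greedy variant, so individual codewords may differ from the lecture's tree when counts tie, but the total stays at 10 bits):

from collections import Counter

def shannon_fano(symbols_with_counts, prefix=""):
    """Split the (symbol, count) list, sorted by descending count, into two parts
    at the first point where the running count reaches half the total (a simple
    greedy split), then recurse with 0/1 appended to the prefix."""
    if len(symbols_with_counts) == 1:
        symbol, _ = symbols_with_counts[0]
        return {symbol: prefix or "0"}
    total = sum(count for _, count in symbols_with_counts)
    running, split = 0, 1
    for i, (_, count) in enumerate(symbols_with_counts[:-1], start=1):
        running += count
        if running >= total / 2:
            split = i
            break
    left, right = symbols_with_counts[:split], symbols_with_counts[split:]
    codes = shannon_fano(left, prefix + "0")
    codes.update(shannon_fano(right, prefix + "1"))
    return codes

message = "HELLO"
counts = Counter(message).most_common()       # [('L', 2), ('H', 1), ('E', 1), ('O', 1)]
codes = shannon_fano(counts)
coded_bits = sum(len(codes[ch]) for ch in message)
print(codes)                                  # e.g. {'L': '00', 'H': '01', 'E': '10', 'O': '11'}
print(f"coded: {coded_bits} bits, raw (8 bits/char): {8 * len(message)} bits")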