Increasing Information per Bit


Increasing Information per Bit
- Information in a source
  - Mathematical Models of Sources
  - Information Measures
- Compressing information
  - Huffman encoding
  - Optimal Compression for DMS?
  - Lempel-Ziv-Welch Algorithm
  - For Stationary Sources?
  - Practical Compression
- Quantization of analog data
  - Scalar Quantization
  - Vector Quantization
  - Model Based Coding
  - Practical Quantization
    - μ-law encoding
    - Delta Modulation
    - Linear Predictive Coding (LPC)

Huffman encoding
- A variable-length binary code for a DMS (finite alphabet, fixed probabilities)
- The code satisfies the Prefix Condition: no codeword is a prefix of another
- Prefix-condition codes are instantaneously and unambiguously decodable as the bits arrive
  - e.g., {0, 10, 110, 111} is OK
  - {0, 01, 011, 111} is not OK: the received bits 0111... could be read as 0,111 or as the start of 011,1..., so the decoder cannot decide as the bits arrive
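As a rough illustration (not from the slides), the sketch below checks the prefix condition and shows how an instantaneous decoder can emit a symbol the moment a codeword completes. The codeword sets are the two from the example above; the symbol names A–D are made up.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another (the prefix condition)."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

def decode_instantaneous(bits, code_map):
    """Decode a bit string symbol by symbol; works only for prefix-free codes."""
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in code_map:          # a codeword is recognized the moment it ends
            symbols.append(code_map[buffer])
            buffer = ""
    return symbols

# {0, 10, 110, 111} satisfies the prefix condition; {0, 01, 011, 111} does not.
print(is_prefix_free(["0", "10", "110", "111"]))   # True
print(is_prefix_free(["0", "01", "011", "111"]))   # False

code_map = {"0": "A", "10": "B", "110": "C", "111": "D"}
print(decode_instantaneous("010110111", code_map))  # ['A', 'B', 'C', 'D']
```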

Huffman encoding
- Use the letter probabilities to order the coding priorities
- Low-probability letters are assigned codewords first and get more bits; high-probability letters end up with the short codewords
- This evens out the information carried per transmitted bit

Huffman encoding
[Figure: Huffman code tree for the five symbols D0 – D4]
- Use a code tree to make the code
- Combine the two symbols with the lowest probabilities into a new block symbol
- Assign a 1 to one of the old symbols' codewords and a 0 to the other
- Now reorder and combine the two lowest-probability symbols of the new set
- Each time a synthesized block symbol again has the lowest probability it is merged again and the codewords inside it grow by one bit, so the high-probability symbols keep the shorter codewords
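A minimal sketch of the merge procedure just described, assuming symbol probabilities are given as a Python dictionary. The D0–D4 probabilities here are made up for illustration; the slide's actual table is not reproduced in this transcript.

```python
import heapq

def huffman_code(probs):
    """Build a binary Huffman code by repeatedly merging the two least
    probable (block) symbols, prepending a 0 to one group's codewords
    and a 1 to the other's."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)                      # tie-breaker for equal probabilities
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # lowest probability
        p1, _, code1 = heapq.heappop(heap)   # second lowest
        merged = {sym: "0" + cw for sym, cw in code0.items()}
        merged.update({sym: "1" + cw for sym, cw in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical probabilities for D0..D4 (not the slide's table).
probs = {"D0": 0.4, "D1": 0.2, "D2": 0.2, "D3": 0.1, "D4": 0.1}
print(huffman_code(probs))   # a prefix-free code averaging 2.2 bits/letter here
```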

Huffman encoding
- Result: the self-information, or entropy, is H(X) = 2.11 bits, the best possible average number of bits per letter
- Average number of bits per letter: $\bar{n} = \sum_k n_k P(x_k)$, where $n_k$ is the number of bits in the codeword for symbol $x_k$
- So the efficiency is $H(X) / \bar{n}$
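A small sketch of the entropy, average-length, and efficiency calculations. The probabilities and the code are hypothetical (one valid Huffman code for this distribution; tie-breaking in the merges can produce different but equally good codes), since the slide's numbers beyond H(X) = 2.11 are not reproduced here.

```python
from math import log2

def entropy(probs):
    """H(X) = sum over symbols of P(x_k) * log2(1 / P(x_k))."""
    return sum(p * log2(1 / p) for p in probs.values())

def average_length(probs, code):
    """Average number of bits per letter: n_bar = sum_k n_k * P(x_k)."""
    return sum(len(code[sym]) * p for sym, p in probs.items())

# Hypothetical probabilities for D0..D4 and one valid Huffman code for them.
probs = {"D0": 0.4, "D1": 0.2, "D2": 0.2, "D3": 0.1, "D4": 0.1}
code = {"D0": "1", "D1": "01", "D2": "000", "D3": "0010", "D4": "0011"}

H = entropy(probs)                    # about 2.12 bits for these probabilities
n_bar = average_length(probs, code)   # 2.2 bits per letter
print(f"H(X) = {H:.2f}, n_bar = {n_bar:.2f}, efficiency = {H / n_bar:.2%}")
```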

Huffman encoding
- Let's compare this to a simple fixed-length 3-bit code: every letter costs 3 bits, well above the Huffman average and the entropy H(X) = 2.11 bits

Huffman encoding: another example
[Figure: Huffman code tree for the four symbols D0 – D3]

Huffman encoding: multi-symbol block codes
- Use new symbols made of blocks of J original symbols
- One can show the new code's information per bit satisfies $H(X) \le \bar{n}_J / J < H(X) + 1/J$, where $\bar{n}_J$ is the average number of bits per block
- So a large enough block code gets you as close to H(X) as you want

Huffman encoding
- Let's consider a J = 2 block code example (a sketch of the idea follows below)
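Since the slide's J = 2 table is not reproduced in this transcript, the following sketch illustrates the block-code idea on a hypothetical two-letter DMS: Huffman-coding blocks of J letters drives the average bits per original letter from 1 down toward H(X).

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given distribution."""
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)   # two least probable (block) symbols
        p1, _, group1 = heapq.heappop(heap)
        for sym in group0 + group1:           # every symbol under the merge gains one bit
            lengths[sym] += 1
        heapq.heappush(heap, (p0 + p1, counter, group0 + group1))
        counter += 1
    return lengths

# Hypothetical two-letter DMS (these probabilities are not from the slides).
p = {"a": 0.9, "b": 0.1}
H = -sum(q * log2(q) for q in p.values())        # entropy, about 0.469 bits/letter

for J in (1, 2, 3, 4):
    # Every J-letter block becomes one symbol of the extended source.
    blocks = {"": 1.0}
    for _ in range(J):
        blocks = {w + s: q * p[s] for w, q in blocks.items() for s in p}
    lengths = huffman_lengths(blocks)
    n_bar = sum(q * lengths[w] for w, q in blocks.items()) / J   # bits per original letter
    print(f"J={J}: {n_bar:.3f} bits/letter   (H(X) = {H:.3f})")
```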

Encoding Stationary Sources
- Now the joint probabilities of blocks of symbols depend on the previous symbols (unless the source is a DMS)
- One can show the joint entropy is $H(X_1, X_2, \ldots, X_k) = \sum_{i=1}^{k} H(X_i \mid X_1, \ldots, X_{i-1}) \le \sum_{i=1}^{k} H(X_i)$, with equality only when the symbols are independent
- Which means fewer bits can be used than with a symbol-by-symbol code

Encoding Stationary Sources
- H(X | Y) is the conditional entropy: $H(X \mid Y) = \sum_i \sum_j P(x_i, y_j) \log_2 \frac{1}{P(x_i \mid y_j)}$
- i.e., the joint (total) probability of $x_i$ and $y_j$ weights the information in $x_i$ given $y_j$
- One can show $H(X \mid Y) \le H(X)$
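A small sketch of the H(X|Y) formula above, using a hypothetical binary joint distribution to check that H(X|Y) ≤ H(X).

```python
from math import log2

def conditional_entropy(joint):
    """H(X|Y) = sum_{i,j} P(x_i, y_j) * log2(1 / P(x_i | y_j))."""
    p_y = {}                                   # marginal of Y
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    # P(x|y) = P(x, y) / P(y), so log2(1 / P(x|y)) = log2(P(y) / P(x, y)).
    return sum(p * log2(p_y[y] / p) for (x, y), p in joint.items() if p > 0)

def entropy_x(joint):
    """H(X) computed from the marginal of X."""
    p_x = {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return -sum(p * log2(p) for p in p_x.values() if p > 0)

# Hypothetical binary joint distribution where Y depends strongly on X.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(conditional_entropy(joint), "<=", entropy_x(joint))   # about 0.47 <= 1.0
```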

Conditional Entropy
- Plotting this for n = m = 2, we see that when Y depends strongly on X, H(X|Y) is low
[Plot: H(X|Y) versus P(Y=0|X=0) and P(Y=1|X=1)]

Conditional Entropy
- To see how P(X|Y) and P(Y|X) relate, consider Bayes' rule: $P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$
- The two are very similar when P(X=0) ≈ 0.5
[Plot: P(X=0|Y=0) versus P(Y=0|X=0)]
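A tiny numerical illustration of this relation, assuming a symmetric binary channel with hypothetical parameters: with P(X=0) = 0.5 the two conditionals coincide, and they drift apart as P(X=0) is skewed.

```python
def p_x0_given_y0(p_y0_given_x0, p_y0_given_x1, p_x0):
    """Bayes' rule: P(X=0|Y=0) = P(Y=0|X=0) * P(X=0) / P(Y=0)."""
    p_y0 = p_y0_given_x0 * p_x0 + p_y0_given_x1 * (1.0 - p_x0)   # total probability
    return p_y0_given_x0 * p_x0 / p_y0

# Hypothetical symmetric channel: P(Y=0|X=0) = 0.8, P(Y=0|X=1) = 0.2.
for p_x0 in (0.5, 0.7, 0.9):
    post = p_x0_given_y0(0.8, 0.2, p_x0)
    print(f"P(X=0) = {p_x0}:  P(X=0|Y=0) = {post:.3f}  vs  P(Y=0|X=0) = 0.8")
```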

Optimal Codes for Stationary Sources
- One can show that for large blocks of symbols, Huffman encoding is efficient
- Define the per-letter block entropy $H_J(X) = \frac{1}{J} H(X_1, X_2, \ldots, X_J)$
- Then a Huffman code for blocks of J letters gives $H_J(X) \le \frac{\bar{n}_J}{J} < H_J(X) + \frac{1}{J}$
- Now if $J \to \infty$, both $H_J(X)$ and $\bar{n}_J / J$ approach the entropy rate of the source
- i.e., Huffman coding of large blocks is optimal

Lempel-Ziv-Welch Code
- Huffman encoding is efficient, but it needs the joint probabilities of large blocks of symbols, and finding those joint probabilities is hard
- LZW is independent of the source statistics: it is a universal source coding algorithm
- It is not optimal

Lempel-Ziv-Welch
- Build a table from strings not already in the table
- Output the table location (code) for strings already in the table
- Build the table again to decode

Input string = /WED/WE/WEE/WEB/WET

Characters input   Code output   New code value   New string
/W                 /             256              /W
E                  W             257              WE
D                  E             258              ED
/                  D             259              D/
WE                 256           260              /WE
/                  E             261              E/
WEE                260           262              /WEE
/W                 261           263              E/W
EB                 257           264              WEB
/                  B             265              B/
WET                260           266              /WET
<end>              T

Source: http://dogma.net/markn/articles/lzw/lzw.htm
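A compact sketch of the encoder just traced; running it on /WED/WE/WEE/WEB/WET reproduces the output codes in the table above.

```python
def lzw_encode(text):
    """LZW: emit the table index of the longest already-seen string, then add
    that string extended by one character as a new table entry."""
    table = {chr(i): i for i in range(256)}    # single characters occupy codes 0..255
    next_code = 256                            # new strings start at 256
    current = ""
    output = []
    for ch in text:
        if current + ch in table:
            current += ch                      # keep growing the match
        else:
            output.append(table[current])      # emit code for the longest match
            table[current + ch] = next_code    # new table entry
            next_code += 1
            current = ch
    output.append(table[current])
    return output

codes = lzw_encode("/WED/WE/WEE/WEB/WET")
# Codes >= 256 are table entries; smaller ones are plain characters.
print([chr(c) if c < 256 else c for c in codes])
# ['/', 'W', 'E', 'D', 256, 'E', 260, 261, 257, 'B', 260, 'T']
```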

Lempel-Ziv-Welch Decode

Input codes: / W E D 256 E 260 261 257 B 260 T

NEW_CODE   OLD_CODE   STRING / Output   CHARACTER   New table entry
/                     /
W          /          W                 W           256 = /W
E          W          E                 E           257 = WE
D          E          D                 D           258 = ED
256        D          /W                /           259 = D/
E          256        E                 E           260 = /WE
260        E          /WE               /           261 = E/
261        260        E/                E           262 = /WEE
257        261        WE                W           263 = E/W
B          257        B                 B           264 = WEB
260        B          /WE               /           265 = B/
T          260        T                 T           266 = /WET
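A matching sketch of the decoder, which rebuilds the table as it goes; feeding it the codes above recovers the original string. The "code not yet in the table" branch does not occur in this particular example but is included for completeness.

```python
def lzw_decode(codes):
    """Rebuild the encoder's table while decoding: each new entry is the
    previous string plus the first character of the current string."""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    old = table[codes[0]]
    result = [old]
    for code in codes[1:]:
        if code in table:
            string = table[code]
        else:                                  # code not yet in the table (cScSc case)
            string = old + old[0]
        result.append(string)
        table[next_code] = old + string[0]     # new table entry
        next_code += 1
        old = string
    return "".join(result)

codes = [ord(c) for c in "/WED"] + [256, ord("E"), 260, 261, 257, ord("B"), 260, ord("T")]
print(lzw_decode(codes))   # /WED/WE/WEE/WEB/WET
```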

Lempel-Ziv-Welch
- It typically takes hundreds of table entries before compression occurs
- The LZW patents (now expired) made licensing an issue for many years