1 Source Coding and Compression Dr.-Ing. Khaled Shawky Hassan Room: C3-222, ext: 1204, Lecture 2.


The Source Coding Theorem. Now we will learn four concepts: ● Self-information ● Entropy ● Kraft's inequality ● Modeling

The Source Coding Theorem - I. Assume: ● a source with a finite number of symbols S = {s_1, s_2, ..., s_N} ● symbol s_n has probability P(s_n) = p_n ● for every symbol s_n there is a codeword of l_n bits (bits/symbol) ● Before we make any further progress, let us define the term self-information. Theorem 1.1: (Self-Information) The self-information of a symbol s_n is measured by i(s_n) = log_b(1/p_n) = -log_b p_n.

Self-Information - I ● Properties of self-information: – The barking of a dog during a break-in carries information or not depending on the dog: ● if it barks only in dangerous situations, the barking is informative, ● if it barks all the time, it carries no information. – Independent events have additive self-information, i.e., i(s_a s_b) = i(s_a) + i(s_b). – To measure the information in bits we take the logarithm with base b = 2; – for b = e (2.7183) the information is measured in nats, and for b = 10 in dits (decimal digits).

Self-Information Examples – Example 1: the outcome of flipping a coin. ● If the coin is fair, i.e., P(heads) = P(tails) = 1/2, then i(heads) = i(tails) = -log_2(1/2) = 1 bit. ● If the coin tossing is not fair (someone is cheating ;-) ) and one outcome is much more likely than the other, then the self-information of the likely outcome is close to 0 bits (no surprise). – If we have a set of independent events s_n, then the average self-information associated with the experiment is H = Σ_n P(s_n) i(s_n) = -Σ_n P(s_n) log_b P(s_n). This is the entropy (H).
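A small numerical sketch of self-information (the biased-coin probability 0.99 is an illustrative value, not taken from the slide):

import math

def self_information(p, base=2):
    # i(s) = -log_b P(s): the information carried by an outcome of probability p
    return -math.log(p, base)

print(self_information(0.5))    # fair coin: 1.0 bit per outcome
print(self_information(0.99))   # very likely outcome: ~0.014 bits, almost no surprise
print(self_information(0.01))   # very unlikely outcome: ~6.64 bits, a lot of information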

Self-Information - IV (examples) Example 2: For a binary (0 or 1) source, the self-information of each symbol follows from its probability P(s_n), and averaging over the symbols gives the entropy. – If we have a set of independent events s_n, then the average self-information associated with the experiment is H(X) = -Σ_n P(s_n) log_2 P(s_n). This is the entropy (H). – H(X) is called the first-order entropy of the source. – It can be regarded as the degree of uncertainty about the following symbol.

Self-Information - V (examples) Example 3: Case 1: Imagine a person sitting in a room. Looking out of the window, she can clearly see that the sun is shining. If at this moment she receives a call from a neighbor saying "It is now daytime", does this message contain any information? Ans.: It does not; the message contains no information. Why? Because she is already certain that it is daytime. Case 2: A person has bought a lottery ticket. A friend calls to tell her that she has won first prize. Does this message contain any information? Ans.: It does. The message contains a lot of information, because the probability of winning first prize is very small. Conclusion: ✔ The information content of a message is inversely related to the probability of the occurrence of that message. ✔ If a message is very probable, it carries almost no information; if it is very improbable, it carries a lot of information.

Entropy as Information Content - II Entropy is a probabilistic measure of average information content: independent fair coin flips have an entropy of 1 bit per flip, while a source that always generates a long string of B's has an entropy of 0, since the next character will always be a 'B'. Shannon showed that: Definition 1.2: If the experiment is a source that puts out symbols s_n from a set A, then the entropy H = -Σ_n P(s_n) log_2 P(s_n) is a measure of the average number of binary symbols (bits) needed to encode the source, i.e., the average message size in bits.
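A sketch of the two examples above in code (the helper function is my own, not from the lecture):

import math

def entropy(probs, base=2):
    # H = sum_n P(s_n) * log_b(1/P(s_n)); zero-probability symbols contribute nothing
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # independent fair coin flips: 1.0 bit per flip
print(entropy([1.0]))        # a source that always emits 'B': 0.0 bits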

Source Coding Theorem Theorem 1.2: (Entropy) The minimum average length of a codeword is the entropy: L ≥ H(S). Entropy is the minimum expected average length; if the average length were pushed below H, the code could no longer be decoded without loss.

10 Lossless Data Compression ● Let's focus on the lossless data compression problem and not worry about noisy channel coding for now. In practice these two problems (noise and compression) are handled separately: – an efficient code for the source data removes the source redundancy; – then (if necessary) we design a channel code to help us transmit the source code over the channel (adding controlled redundancy) almost error free. ● Assumptions (for now): – the channel is perfectly noiseless, i.e., the receiver sees exactly the encoder's output; – we always require the output of the decoder to exactly match the original sequence X; – X is generated according to a fixed probabilistic model p(X), which is known!! ● We will measure the quality of our compression scheme by examining the average length of the encoded string Y, i.e., averaged over p(X). (Block diagram: original x → Encoder → compressed y → Decoder → decompressed x'.)

11 Notation for Sequences & Codes ● Assume a sequence of symbols X = {X_1, X_2, ..., X_n} from a finite source alphabet A_X. ● We almost always use A = {0, 1} (e.g. computer files, digital communication), but the theory can be generalized to any finite set. ● Encoder: outputs a new sequence Y = {Y_1, Y_2, ..., Y_n} (using a possibly different code alphabet A_Y). ● A symbol code C is a mapping A_X → A_Y; we use c(x) to denote the codeword to which C maps x. (Block diagram: original x → Encoder → compressed y → Decoder → decompressed x'.)

12 For Decoding, What Do We Need? ● We need to set some rules, like: – How does the channel terminate the transmission? (e.g. it could explicitly mark the end, it could send only 0s after the end, it could send random garbage after the end, ...) – How soon do we require a decoded symbol to be known? (e.g. "instantaneously", as soon as the codeword for the symbol is received; within a fixed delay of when its codeword is received; not until the entire message has been received, ...) ● Thus, we need to study the code classes!

13 Variable-Length Codes Non-singular codes ● A code is non-singular if each source symbol is mapped to a different non-empty bit string, i.e. the mapping from source symbols to bit strings is injective. Uniquely decodable codes ● A code is obviously not uniquely decodable if two symbols have the same codeword, i.e., if c(a_i) = c(a_j) for some i ≠ j, or if the combination of two codewords gives a third one. Prefix codes ● A code is a prefix code if no codeword is a prefix of any other. This means that symbols can be decoded instantaneously, as soon as their entire codeword is received.

14 Source Coding Theorem Code Word and Code Length: Let C(s_n) be the codeword corresponding to s_n and let l(s_n) denote the length of C(s_n). To proceed, let us focus on codes that are "instantaneously" decodable, i.e., prefix codes. Definition 1.3: A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword. – Example: ● Non-prefix: {s_1 = 0, s_2 = 01, s_3 = 011, s_4 = 0111} ● Prefix: {s_1 = 0, s_2 = 10, s_3 = 110, s_4 = 111}
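A quick programmatic check of the prefix property for the two example codes (a sketch; the helper name is mine):

def is_prefix_code(codewords):
    # A prefix (instantaneous) code: no codeword is a prefix of another codeword.
    return not any(i != j and c2.startswith(c1)
                   for i, c1 in enumerate(codewords)
                   for j, c2 in enumerate(codewords))

print(is_prefix_code(["0", "01", "011", "0111"]))  # False: "0" is a prefix of all the others
print(is_prefix_code(["0", "10", "110", "111"]))   # True: instantaneous code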

15 Four different classes Consider codes for the four symbols a_1, a_2, a_3, a_4. A singular code maps two different symbols to the same codeword; a non-singular code gives every symbol its own codeword, e.g. {a_1 = 0, a_2 = 010, a_3 = 01, a_4 = 10}. Decoding problem: 010 could mean a_1 a_4, or a_2, or a_3 a_1; hence this non-singular code is not uniquely decodable. (Diagram: all codes ⊃ non-singular codes.)

16 Four different classes A code can be non-singular and uniquely decodable without being instantaneous. Decoding problem: the sequence ... is uniquely decodable, but the first symbol (a_3 or a_4) cannot be decoded until the third '1' arrives. (Diagram: all codes ⊃ non-singular codes ⊃ uniquely decodable codes.)

17 Four different classes For the symbols a_1, ..., a_4 we can thus distinguish singular, non-singular, uniquely decodable and instantaneous codes, each class contained in the previous one: all codes ⊃ non-singular codes ⊃ uniquely decodable codes ⊃ instantaneous codes.

18 Code Trees and Tree Codes ● Consider, again, our favourite example code {a_1, ..., a_4} = {0, 10, 110, 111}. ● The codewords are the leaves in a code tree: each internal node branches into 0 and 1, with a_1 at the leaf reached by 0, a_2 by 10, a_3 by 110 and a_4 by 111. ● Tree codes are instantaneous. No codeword is a prefix of another!

19 Binary trees as prefix decoders
| Symbol | Code |
| a      | 00   |
| b      | 01   |
| c      | 1    |
(Decoder tree: branch left on 0, right on 1; leaves a, b, c.)
Decoding loop (pseudocode):
repeat
    curr = root
    repeat
        if get_bit(input) = 1
            curr = curr.right
        else
            curr = curr.left
        endif
    until isleaf(curr)
    output curr.symbol
until eof(input)

20 Binary trees as prefix decoders
| Symbol | Code |
| a      | 0    |
| b      | 10   |
| c      | 110  |
| d      | 1110 |
| r      | 1111 |
Decoding the encoded bit stream for "abracadabra": the decoder walks the code tree bit by bit, emitting 'a' as soon as a 0 is read, 'b' after 10, 'r' after 1111, and so on, so the output grows step by step as a, ab, abr, abra, ... until the whole word "abracadabra" is recovered.
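A minimal runnable sketch of this table-driven prefix decoder (the encoded bit string below is generated from the code table, since the bit string shown on the original slides is not reproduced here):

# Prefix code from the slide: a=0, b=10, c=110, d=1110, r=1111
CODE = {"a": "0", "b": "10", "c": "110", "d": "1110", "r": "1111"}

def build_tree(code):
    # Build the decoding tree: internal nodes are dicts keyed by '0'/'1', leaves are symbols.
    root = {}
    for symbol, bits in code.items():
        node = root
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})
        node[bits[-1]] = symbol
    return root

def decode(bits, root):
    # Walk the tree bit by bit; emit a symbol and restart at the root whenever a leaf is reached.
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):
            out.append(node)
            node = root
    return "".join(out)

encoded = "".join(CODE[s] for s in "abracadabra")
print(encoded)                               # 010111101100111001011110
print(decode(encoded, build_tree(CODE)))     # abracadabra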

Source Coding Theorem Theorem 1.2: (Kraft-McMillan Inequality) Any prefix (prefix-free) code satisfies K(C) = Σ_n 2^(-l(s_n)) ≤ 1. Conversely, given a set of codeword lengths satisfying this inequality, one can construct an instantaneous code with these word lengths!! (Note: it only says one can!)
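A quick numerical check of the inequality (a sketch; the lengths in the second call are an assumed counterexample):

def kraft_sum(lengths):
    # K = sum_n 2^(-l_n); any prefix (indeed any uniquely decodable) code satisfies K <= 1
    return sum(2.0 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> the lengths of {0, 10, 110, 111}; a prefix code exists
print(kraft_sum([1, 1, 2]))      # 1.25 -> violates the inequality; no prefix code has these lengths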

Source Coding Theorem Theorem 1.3: (Shannon, 1948) Any prefix code satisfies L = Σ_n P(s_n) l(s_n) ≥ H(S), i.e., its average codeword length is at least the entropy. Can you say why?

Source Coding Theorem Theorem 1.2: (Kraft-McMillan Inequality) Any prefix code satisfies K(C) = Σ_n 2^(-l(s_n)) ≤ 1. Proof idea: try to prove that [K(C)]^n does not grow exponentially with n, which forces K(C) ≤ 1. Note that each exponent appearing in the expansion of [K(C)]^n is simply the total length of a concatenation of n codewords; can this be less than n? Can you follow? (self study; K. Sayood, page 32, chapter 2) Do it yourself!!

37 Optimal Codes!! The average codeword length L = Σ_i p_i l_i [bits/codeword] should be minimized under the constraint of Kraft's inequality Σ_i 2^(-l_i) ≤ 1. Disregarding the integer constraints and differentiating (e.g. with a Lagrange multiplier), we get the optimal codeword lengths l_i = -log_2 p_i; then L = -Σ_i p_i log_2 p_i = H, i.e. the entropy limit is reached!

38 Optimal Codes!! But what about the integer constraints? l_i = -log_2 p_i is not always an integer! Choose l_i = ⌈-log_2 p_i⌉, i.e. round up, such that the lengths still satisfy Kraft's inequality and such that the average length satisfies H ≤ L < H + 1.

39 Optimal Codes and Theorem 1 ● Assume that the source X is memory-free, and create the tree code for the extended source, i.e., blocks of n symbols. ● We then have H(X^n) ≤ L_n < H(X^n) + 1, or, per source symbol, H(X) ≤ L_n / n < H(X) + 1/n. ● We can come arbitrarily close to the entropy for large n!
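A numerical sketch of this effect for a memory-free binary source (it reuses the probabilities 0.25/0.75 from the next slide and the simple rounded-up lengths l = ⌈-log_2 p⌉, not an optimal code):

import math
from itertools import product

p = {"0": 0.25, "1": 0.75}
H = -sum(q * math.log2(q) for q in p.values())   # source entropy, ~0.811 bits/symbol

for n in (1, 2, 3, 4):
    # Extended source: all blocks of n symbols with product probabilities.
    block_probs = [math.prod(p[s] for s in block) for block in product(p, repeat=n)]
    # Rounded-up (Shannon) code lengths for each block.
    avg_len = sum(q * math.ceil(-math.log2(q)) for q in block_probs)
    print(f"n={n}: {avg_len / n:.3f} bits/source symbol  (entropy = {H:.3f})")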

40 In Practice ● Two practical problems need to be solved: – bit assignment – the integer constraint (numbers of bits are integers!) Theoretically: choose l_i = ⌈-log_2 p_i⌉ and then H ≤ L < H + 1. Rounding up is not always the best! (Shannon coding!) – OR JUST SELECT NICE VALUES :-) ● Bad example: binary source with p_1 = 0.25, p_2 = 0.75: l_1 = -log_2 0.25 = log_2 4 = 2, l_2 = -log_2 0.75 = log_2 (4/3) ≈ 0.415, which is rounded up to 1. ● OR, instead, use, e.g., the Huffman algorithm (developed by D. Huffman, 1952) to create an optimal tree code!
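Since the slide points to Huffman's algorithm as the practical alternative, here is a minimal sketch of it (not the lecture's own implementation; it returns one optimal prefix code, unique only up to 0/1 swaps):

import heapq
import itertools

def huffman_code(probs):
    # Repeatedly merge the two least probable subtrees, prepending 0/1 to their codewords.
    counter = itertools.count()   # tie-breaker so the heap never compares the code dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

For the two-symbol source above, Huffman coding simply assigns the codewords 0 and 1, i.e. 1 bit/symbol on average, compared with 1.25 bits/symbol for the rounded-up lengths 2 and 1.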

Source Coding Theorem Example:

42 Modeling & Coding Developing compression algorithms ● Phase I: Modeling – develop the means to extract the redundancy in the data – redundancy → predictability ● Phase II: Coding – binary representation of the model's output, i.e., of how the data differ from the model – the representation depends on the "Modeling"

43 Modeling Example 1 Let us consider this arbitrary sequence: S_n = 9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21. Plain binary encoding requires 5 bits/sample; WHY? (The values go up to 21, so 5 bits per sample are needed.) Now, let us consider the model Ŝ_n = n + 8; thus: Ŝ_n = 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20. The sequence can then be encoded through the residual e_n = S_n - Ŝ_n = 0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1. How many bits are required now? Coding: '00' → -1, '01' → 0, '10' → 1, i.e. 2 bits/sample. Therefore, the model also needs to be designed and encoded in the algorithm.
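The same example as a short script (a direct transcription of the slide's numbers):

S = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

# Model: S_hat_n = n + 8 for n = 1, ..., 12
S_hat = [n + 8 for n in range(1, len(S) + 1)]

# Residual e_n = S_n - S_hat_n takes only the values -1, 0, 1,
# so 2 bits/sample suffice instead of 5 bits/sample for the raw values.
e = [s - sh for s, sh in zip(S, S_hat)]
print(e)   # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]

# The decoder reconstructs the sequence from the model plus the residual.
assert [sh + r for sh, r in zip(S_hat, e)] == S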

44 Modeling: Yet another example Let us consider another arbitrary sequence S_n of 16 samples with values between 1 and 10, and assume it correctly describes the probabilities generated by the source; then P(1) = P(6) = P(7) = P(10) = 1/16 and P(2) = P(3) = P(4) = P(5) = P(8) = P(9) = 2/16. Assuming the sequence is independent and identically distributed (i.i.d.), then H = 3.25 bits (per symbol). However, if we somehow find a correlation between consecutive samples, then instead of coding the samples we can code the difference only, i.e., e_n = S_n - S_{n-1}. Now P(1) = 13/16 and P(-1) = 3/16, and then H = 0.70 bits (per symbol).
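A quick check of the two entropy values quoted on this slide, using only the probabilities given there (a sketch):

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# i.i.d. model: four values with probability 1/16 and six values with probability 2/16.
print(round(entropy([1/16] * 4 + [2/16] * 6), 2))   # 3.25 bits/symbol

# Differences between consecutive samples: P(1) = 13/16, P(-1) = 3/16.
print(round(entropy([13/16, 3/16]), 2))             # 0.7 bits/symbol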

45 Markov Models ● Assume that each output symbol depends on the previous k symbols. Formally: ● let {x_n} be a sequence of observations; ● we call {x_n} a k-th-order discrete Markov chain (DMC) if P(x_n | x_{n-1}, ..., x_{n-k}) = P(x_n | x_{n-1}, ..., x_{n-k}, ...), i.e. the k most recent symbols carry all the information the past provides about x_n. ● Usually we use a first-order DMC (the knowledge of 1 symbol in the past is enough). ● The values of the past k symbols {x_{n-1}, ..., x_{n-k}} form the state of the process.

46 Non-Linear Markov Models ● Consider a BW image as a string of black & white pixels (e.g. row by row). ● Define two states, S_b and S_w, for the current pixel. ● Define the probabilities: P(S_b) = prob. of being in S_b, P(S_w) = prob. of being in S_w. ● Transition probabilities: P(b|b), P(b|w), P(w|b), P(w|w).

47 Markov Models Example ● Assume P(S_w) = 30/31, P(S_b) = 1/31, P(w|w) = 0.99, P(b|w) = 0.01, P(b|b) = 0.7, P(w|b) = 0.3. ● For the Markov model: H(S_b) = -0.3 log(0.3) - 0.7 log(0.7) = 0.881, H(S_w) = -0.01 log(0.01) - 0.99 log(0.99) = 0.081, H_Markov = 30/31 · 0.081 + 1/31 · 0.881 = 0.107 bits/pixel. ● How different is the i.i.d. entropy?
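The same numbers computed, including the i.i.d. entropy asked about at the end of the slide (a sketch):

import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

P_w, P_b = 30/31, 1/31
H_b = H([0.3, 0.7])     # entropy of the next pixel when in state S_b: ~0.881 bits
H_w = H([0.01, 0.99])   # entropy of the next pixel when in state S_w: ~0.081 bits

print(round(P_w * H_w + P_b * H_b, 3))   # H_Markov ~ 0.107 bits/pixel
print(round(H([P_w, P_b]), 3))           # H_iid    ~ 0.206 bits/pixel, almost twice as large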

48 Markov Models in Text Compression ● In written English, the probability of the next letter is heavily influenced by the previous ones, ➢ e.g. "u" after "q". ● Shannon's work: ➢ using a 2nd-order MM with 26 letters + space, he found H = 3.1 bits/letter; ➢ with a word-based model, H = 2.4 bits/letter; ➢ from human prediction based on the 100 previous letters, he found the limits 0.6 ≤ H ≤ 1.3 bits/letter. ● Longer context => better prediction.

49 Composite Source Model ● In many applications it is not easy to use a single model to describe the source. In such cases we can define a composite source, which can be viewed as a combination or composition of several sources, with only one source being active at any given time. E.g.: an executable contains very different kinds of content (code, text, images, ...). ● Solution for such complicated sources: a composite model, with different sources, each with its own model, working in sequence.

50 Let Us Decorate This: Knowing something about the source itself can help us to 'reduce' the entropy. This is called entropy encoding. Note that we cannot actually reduce the entropy of the source, as long as our coding is lossless; strictly speaking, we are only reducing our estimate of the entropy.