
1 Source Coding and Compression. Dr.-Ing. Khaled Shawky Hassan, Room: C3-222, ext: 1204, Email: khaled.shawky@guc.edu.eg. Lecture 2.

2 The Source Coding Theorem. In this lecture we will learn four concepts: ● Self-information ● Entropy ● Kraft's inequality ● Modeling

3 The Source Coding Theorem - I. Assume: ● A source with a finite number of symbols S = {s_1, s_2, ..., s_N} ● Symbol s_n has probability P(s_n) = p_n ● Assume that every symbol s_n is assigned a codeword of l_n bits (bits/symbol) ● Before we make any further progress, let us define the term self-information. Theorem 1.1: (Self-Information) The self-information of symbol s_n is measured by i(s_n) = log_b(1/p_n) = -log_b p_n.

4 Self-Information - I. ● Properties of self-information: – The barking of a dog during a break-in may or may not carry much information: ● if it barks only in dangerous situations, the bark is informative (rare event, high self-information) ● if it barks all the time, the bark carries no information (frequent event, low self-information). – Independent events have additive self-information, i.e., i(s_a s_b) = i(s_a) + i(s_b). – To measure the information in bits we take the logarithm with base b = 2 in the formula above. – For b = e (2.7183...) the information is measured in "nats", and for b = 10 in "dits" (decimal digits).

5 Self-Information Examples. – Example 1: the outcome of flipping a coin. ● If the coin is fair, i.e., P(head) = P(tail) = 1/2, then i(head) = i(tail) = -log_2(1/2) = 1 bit. ● If the coin toss is not fair (someone is cheating ;-) ) and one outcome is nearly certain, its self-information approaches -log_2 1 = 0 bits (no surprise). – If we have a set of independent events s_n, then the average self-information associated with the experiment is H = Σ_n p_n i(s_n) = -Σ_n p_n log_b p_n. This is the entropy (H).
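The coin examples can be checked numerically. A minimal sketch (not from the slides; the biased-coin probabilities are illustration values of my own):

    import math

    def self_information(p, base=2):
        """i(s) = -log_b P(s): rare events carry more information."""
        return -math.log(p, base)

    print(self_information(0.5))    # fair coin outcome: 1.0 bit
    print(self_information(0.99))   # near-certain outcome: ~0.0145 bits (no surprise)
    print(self_information(0.01))   # rare outcome: ~6.64 bits (big surprise)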

6 Self-Information - IV (examples). Example 2: For a binary (0 or 1) event where one symbol has probability P(s_n) = 0, the other symbol is certain and carries i = -log_2 1 = 0 bits; with the convention 0·log 0 = 0, such a source has zero entropy. – If we have a set of independent events s_n, then the average self-information associated with the experiment is the entropy H(X) = -Σ_n P(s_n) log_2 P(s_n). – H(X) is called the first-order entropy of the source. – It can be regarded as the degree of uncertainty about the following symbol.

7 Self-Information - V (examples). Example 3: Case 1: Imagine a person sitting in a room. Looking out the window, she can clearly see that the sun is shining. If at this moment she receives a call from a neighbor saying "It is now daytime", does this message contain any information? Ans.: It does not; the message contains no information. Why? Because she is already certain that it is daytime. Case 2: A person has bought a lottery ticket. A friend calls to tell her that she has won first prize. Does this message contain any information? Ans.: It does. The message contains a lot of information, because the probability of winning first prize is very small. Conclusion: ✔ The information content of a message is inversely related to the probability of occurrence of that message. ✔ If a message is very probable, it contains hardly any information. If it is very improbable, it contains a lot of information.

8 Entropy as Information Content - II. Entropy is a probabilistic measure of information content: independent fair coin flips have an entropy of 1 bit per flip; a source that always generates a long string of B's has an entropy of 0, since the next character will always be a 'B'. Shannon showed that: Definition 1.2: If the experiment is a source that puts out symbols s_n from a set A, then the entropy H(S) = -Σ_n P(s_n) log_2 P(s_n) is a measure of the average number of binary symbols (bits) needed to encode the source, i.e., the average message size in bits.
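Definition 1.2 translates directly into a few lines of code. A sketch (the example distributions are my own, not taken from the slides):

    import math

    def entropy(probs, base=2):
        """H = sum p * log_b(1/p); zero-probability symbols contribute 0 (0*log 0 := 0)."""
        return sum(p * math.log(1 / p, base) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))                 # fair coin flip: 1.0 bit/symbol
    print(entropy([1.0, 0.0]))                 # source that always emits 'B': 0.0 bits
    print(entropy([0.5, 0.25, 0.125, 0.125]))  # skewed 4-symbol source: 1.75 bits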

9 Source Coding Theorem. Theorem 1.2: (Entropy) The minimum average codeword length is bounded by the entropy: l̄ ≥ H(S). Entropy is the minimum expected average length; if the average length were smaller than H, the code could not be decoded without loss.

10 Lossless Data Compression. ● Let's focus on the lossless data compression problem and not worry about noisy channel coding for now. In practice these two problems (noise and compression) are handled separately: ● an efficient code for the source data removes source redundancy; ● then (if necessary) we design a channel code to help us transmit the source code over the channel (adding controlled redundancy) almost error-free. Assumptions (for now): – the channel is perfectly noiseless, i.e., the receiver sees exactly the encoder's output – we always require the output of the decoder to exactly match the original sequence X – X is generated according to a fixed probabilistic model p(X), which is known!! ● We measure the quality of our compression scheme by the average length of the encoded string Y, taken over p(X). [Diagram: original x → Encoder → compressed y → Decoder → decompressed x']

11 Notation for Sequences & Codes. ● Assume a sequence of symbols X = {X_1, X_2, ..., X_n} from a finite source alphabet A_X. ● We almost always use A = {0, 1} (e.g. computer files, digital communication), but the theory can be generalized to any finite set. ● Encoder: outputs a new sequence Y = {Y_1, Y_2, ..., Y_n} (using a possibly different code alphabet A_Y). ● A symbol code C is a mapping A_X → A_Y; we use c(x) to denote the codeword to which C maps x. [Diagram: original x → Encoder → compressed y → Decoder → decompressed x']

12 For Decoding, What Do We Need? ● We need to set some rules, such as: – How does the channel terminate the transmission? (e.g. it could explicitly mark the end, it could send only 0s after the end, it could send random garbage after the end, ...) – How soon do we require a decoded symbol to be known? (e.g. "instantaneously", as soon as the codeword for the symbol is received; within a fixed delay of when its codeword is received; not until the entire message has been received, ...) ● Thus, we need to study the classes of codes!

13 Variable-Length Codes. Non-singular codes ● A code is non-singular if each source symbol is mapped to a different non-empty bit string, i.e., the mapping from source symbols to bit strings is injective. Uniquely decodable codes ● A code is obviously not uniquely decodable if two symbols have the same codeword, i.e., if c(a_i) = c(a_j) for some i ≠ j, or if the combination of two codewords gives a third one. Prefix codes ● A code is a prefix code if no codeword is the prefix of another. This means that each symbol can be decoded instantaneously, as soon as its entire codeword has been received.

14 Source Coding Theorem. Code word and code length: let C(s_n) be the codeword corresponding to s_n and let l(s_n) denote the length of C(s_n). To proceed, let us focus on codes that can be decoded "instantaneously", i.e., prefix codes. Definition 1.3: A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword. – Example: ● Non-prefix: {s_1 = 0, s_2 = 01, s_3 = 011, s_4 = 0111} ● Prefix: {s_1 = 0, s_2 = 10, s_3 = 110, s_4 = 111}
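A quick way to test the prefix property of the two example codes above; a sketch of my own, not part of the lecture material (after sorting, any prefix relation must appear between neighbouring codewords):

    def is_prefix_code(codewords):
        """True if no codeword is a prefix of another (instantaneous code)."""
        words = sorted(codewords)   # a prefix sorts immediately before its extensions
        return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

    print(is_prefix_code(["0", "01", "011", "0111"]))  # False: each word prefixes the next
    print(is_prefix_code(["0", "10", "110", "111"]))   # True: instantaneous code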

15 Four different classes.
|Symbol | Singular | Non-singular |
| a_1   | 0        | 0            |
| a_2   | 0        | 010          |
| a_3   | 0        | 01           |
| a_4   | 0        | 10           |
Decoding problem: 010 could mean a_1 a_4, or a_2, or a_3 a_1. Hence the non-singular code is not uniquely decodable. (Code classes so far: all codes ⊃ non-singular codes.)

16 Four different classes.
|Symbol | Singular | Non-singular | Uniquely decodable |
| a_1   | 0        | 0            | 10                 |
| a_2   | 0        | 010          | 00                 |
| a_3   | 0        | 01           | 11                 |
| a_4   | 0        | 10           | 110                |
(Code classes: all codes ⊃ non-singular codes ⊃ uniquely decodable codes.) Decoding problem: 1100000000000000001... is uniquely decodable, but the first symbol (a_3 or a_4) cannot be decoded until the third '1' arrives.

17 Four different classes.
|Symbol | Singular | Non-singular | Uniquely decodable | Instantaneous |
| a_1   | 0        | 0            | 10                 | 0             |
| a_2   | 0        | 010          | 00                 | 10            |
| a_3   | 0        | 01           | 11                 | 110           |
| a_4   | 0        | 10           | 110                | 111           |
(Code classes: all codes ⊃ non-singular ⊃ uniquely decodable ⊃ instantaneous.)

18 Code Trees and Tree Codes. ● Consider, again, our favourite example code {a_1, ..., a_4} = {0, 10, 110, 111}. ● The codewords are the leaves of a code tree: from the root, branch '0' leads to a_1; branch '1' then '0' leads to a_2; '1','1','0' leads to a_3; and '1','1','1' leads to a_4. ● Tree codes are instantaneous: no codeword is a prefix of another!

19 Binary trees as prefix decoders.
|Symbol | Code |
| a     | 00   |
| b     | 01   |
| c     | 1    |
[Code tree: root, branch '0' leads to an inner node with leaves a ('0') and b ('1'); branch '1' leads directly to leaf c.]
Decoding loop (pseudocode):
    repeat
        curr = root
        repeat
            if get_bit(input) = 1
                curr = curr.right
            else
                curr = curr.left
            endif
        until isleaf(curr)
        output curr.symbol
    until eof(input)
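The pseudocode above can be turned into a small runnable decoder. A sketch under my own assumptions (the tree is a nested dict, the helper names are mine, and a prefix code is assumed); it also decodes the abracadabra example worked by hand on the following slides:

    def build_tree(code):
        """Build a binary code tree from {symbol: codeword}; leaves hold the symbol."""
        root = {}
        for symbol, word in code.items():
            node = root
            for bit in word[:-1]:
                node = node.setdefault(bit, {})
            node[word[-1]] = symbol              # last bit points at the leaf
        return root

    def decode(bits, root):
        """Walk the tree bit by bit; emit a symbol at each leaf and restart at the root."""
        out, node = [], root
        for bit in bits:
            node = node[bit]
            if not isinstance(node, dict):       # reached a leaf
                out.append(node)
                node = root
        return "".join(out)

    tree = build_tree({"a": "00", "b": "01", "c": "1"})
    print(decode("00011", tree))                             # "00"+"01"+"1" -> "abc"

    tree2 = build_tree({"a": "0", "b": "10", "c": "110", "d": "1110", "r": "1111"})
    print(decode("010111101100111001011110", tree2))         # -> "abracadabra"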

20-33 Binary trees as prefix decoders: decoding walkthrough.
|Symbol | Code |
| a     | 0    |
| b     | 10   |
| c     | 110  |
| d     | 1110 |
| r     | 1111 |
abracadabra = 010111101100111001011110
Decoding bit by bit, restarting from the root of the code tree after every decoded symbol (consumed bits shown as '-'):
    010111101100111001011110                  Output = -----------
    -10111101100111001011110   (read 0)       Output = a----------
    ---111101100111001011110   (read 10)      Output = ab---------
    -------01100111001011110   (read 1111)    Output = abr--------
    --------1100111001011110   (read 0)       Output = abra-------
    ... and so on !!!

34 Source Coding Theorem. Theorem 1.2: (Kraft-McMillan Inequality) Any prefix (prefix-free) code with codeword lengths l_1, ..., l_N satisfies Σ_n 2^(-l_n) ≤ 1. Conversely, given a set of codeword lengths satisfying this inequality, one can construct an instantaneous code with these word lengths!! (Note the "can": the inequality guarantees that such a code exists, not that every code with these lengths is instantaneous.)
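A numerical check of the Kraft-McMillan inequality on the two example codes from slide 14; a sketch of my own, not lecture code:

    def kraft_sum(lengths):
        """Sum over codewords of 2^(-l); a prefix code must give a value <= 1."""
        return sum(2.0 ** -l for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))   # lengths of {0, 10, 110, 111}: 1.0, feasible
    print(kraft_sum([1, 2, 3, 4]))   # lengths of {0, 01, 011, 0111}: 0.9375 <= 1,
                                     # yet that particular code is NOT a prefix code;
                                     # the inequality only says that SOME prefix code
                                     # with these lengths exists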

35 Source Coding Theorem. Theorem 1.3: (Shannon, 1948) Any prefix code satisfies l̄ = Σ_n p_n l(s_n) ≥ H(S). Can you say why?

36 Source Coding Theorem. Proof idea: let K(C) = Σ_n 2^(-l(s_n)) and try to prove that K(C)^n does not grow without bound as n grows, i.e., that K(C) ≤ 1. Expanding K(C)^n gives a sum of terms 2^(-(l_{i1} + ... + l_{in})), where l_{i1} + ... + l_{in} is simply the length of a concatenation of n codewords. Can you follow? (Self study: K. Sayood, Chapter 2, page 32.) Theorem 1.2: (Kraft-McMillan Inequality) Any prefix code satisfies Σ_n 2^(-l(s_n)) ≤ 1. Do it yourself!!

37 Optimal Codes!! Average codeword length: l̄ = Σ_i p_i l_i [bits/codeword]. Kraft's inequality: Σ_i 2^(-l_i) ≤ 1. l̄ should be minimized under the Kraft constraint. Disregarding the integer constraints and differentiating, we get the optimal codeword lengths l_i = -log_2 p_i; then l̄ = -Σ_i p_i log_2 p_i = H, i.e., the entropy limit is reached!
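For reference, the "differentiate" step compressed onto the slide can be written out as a short constrained-minimization argument; a sketch, treating the lengths as real numbers and the Kraft constraint as an equality:

    J = \sum_i p_i l_i + \lambda\Bigl(\sum_i 2^{-l_i} - 1\Bigr), \qquad
    \frac{\partial J}{\partial l_i} = p_i - \lambda \ln 2 \cdot 2^{-l_i} = 0
    \;\Rightarrow\; 2^{-l_i} = \frac{p_i}{\lambda \ln 2}.

Imposing \sum_i 2^{-l_i} = 1 gives \lambda \ln 2 = 1, hence 2^{-l_i} = p_i, i.e. l_i = -\log_2 p_i and \bar{l} = -\sum_i p_i \log_2 p_i = H.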

38 Optimal Codes!! But what about the integer constraints? l_i = -log_2 p_i is not always an integer! Choose l_i = ⌈-log_2 p_i⌉, i.e., the smallest integer such that l_i ≥ -log_2 p_i. These lengths still satisfy Kraft's inequality (since 2^(-l_i) ≤ p_i), and the resulting average length satisfies H ≤ l̄ < H + 1.

39 Optimal Codes and Theorem 1. ● Assume that the source X is memory-free (i.i.d.), and create the tree code for the extended source, i.e., blocks of n symbols. ● We then have n·H(X) ≤ l̄_n < n·H(X) + 1, or, per source symbol, H(X) ≤ l̄_n / n < H(X) + 1/n. ● We can come arbitrarily close to the entropy for large n!
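A small numerical sketch of this convergence, under my own choice of source: the binary source p = (0.25, 0.75) used on the next slide, with Shannon-style lengths ⌈-log2 P(block)⌉ assigned to blocks of n symbols:

    import math
    from itertools import product

    p = {"0": 0.25, "1": 0.75}
    H = -sum(q * math.log2(q) for q in p.values())          # ~0.8113 bits/symbol

    for n in (1, 2, 4, 8):
        avg = 0.0
        for block in product(p, repeat=n):                  # all 2^n blocks
            prob = math.prod(p[s] for s in block)
            avg += prob * math.ceil(-math.log2(prob))       # Shannon length for the block
        print(f"n={n}: {avg / n:.4f} bits/symbol (entropy H = {H:.4f})")

The per-symbol average stays within 1/n bits of H, so it approaches the entropy as the block length grows.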

40 In Practice. ● Two practical problems need to be solved: – bit assignment – the integer constraint (numbers of bits are integers!). ● Theoretically: choose l_i = -log_2 p_i, and then l̄ = H. – Rounding up (Shannon coding, l_i = ⌈-log_2 p_i⌉) is not always the best! – Or just select nice values :-) ● Bad example: binary source p_1 = 0.25, p_2 = 0.75: l_1 = -log_2 0.25 = log_2 4 = 2, but l_2 = -log_2 0.75 = log_2(4/3) = 0.41504 is not an integer. ● Instead, use, e.g., the Huffman algorithm (developed by D. Huffman, 1952) to create an optimal tree code!
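The slide points to Huffman's algorithm as the practical fix. A minimal sketch of Huffman code construction; the probabilities below are hypothetical dyadic values of my own, chosen so the result matches the {0, 10, 110, 111} example tree:

    import heapq

    def huffman_code(probs):
        """Binary Huffman code for {symbol: probability}; assumes at least two symbols."""
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        tie = len(heap)                      # tie-breaker so dicts are never compared
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)  # the two least probable subtrees
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (p1 + p2, tie, merged))
            tie += 1
        return heap[0][2]

    probs = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}
    code = huffman_code(probs)
    avg = sum(probs[s] * len(w) for s, w in code.items())
    print(code, avg)                         # codeword lengths 1, 2, 3, 3; average 1.75 = H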

41 Source Coding Theorem. Example: (worked example shown as a figure on the slide; not reproduced in the transcript)

42 Modeling & Coding: developing compression algorithms. ● Phase I: Modeling – develop the means to extract the redundancy in the data – redundancy → predictability. ● Phase II: Coding – binary representation of the model output – the representation depends on the "modeling".

43 Modeling Example 1. Let us consider this arbitrary sequence: S_n = 9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21. Plain binary encoding requires 5 bits/sample. WHY? (The values go up to 21, so 5 bits are needed.) Now, let us consider the model Ŝ_n = n + 8, i.e.: Ŝ_n = 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20. The sequence can then be represented by the residual e_n = S_n - Ŝ_n: 0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1. How many bits are required now? Coding: '00' → -1, '01' → 0, '10' → 1, i.e., 2 bits/sample. Note, however, that the model itself also needs to be designed and encoded in the algorithm.
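The arithmetic on this slide is easy to reproduce. A sketch (the 2-bit mapping is the one given on the slide; the helper names are mine):

    S = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

    model = [n + 8 for n in range(1, len(S) + 1)]          # S_hat_n = n + 8
    residual = [s - m for s, m in zip(S, model)]           # e_n = S_n - S_hat_n
    print(residual)                                        # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]

    bits_raw = max(S).bit_length()                         # 21 needs 5 bits/sample
    codebook = {-1: "00", 0: "01", 1: "10"}                # fixed 2-bit code for the residual
    encoded = "".join(codebook[e] for e in residual)
    print(bits_raw, len(encoded) / len(S))                 # 5 bits/sample vs 2 bits/sample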

44 Modeling: yet another example. Let us consider this arbitrary sequence: S_n = 1 2 3 2 3 4 5 4 5 6 7 8 9 8 9 10. Assume it correctly describes the probabilities generated by the source; then P(1) = P(6) = P(7) = P(10) = 1/16 and P(2) = P(3) = P(4) = P(5) = P(8) = P(9) = 2/16. Assuming the sequence is independent and identically distributed (i.i.d.), then H = -Σ P(s) log_2 P(s) = 3.25 bits/symbol. However, if we somehow find a correlation (here, each sample is close to the previous one), then instead of coding the samples we can code the differences only, i.e.: 1 1 1 -1 1 1 1 -1 1 1 1 1 1 -1 1 1. Now P(1) = 13/16, P(-1) = 3/16, and H = 0.70 bits (per symbol).
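A quick check of the two entropy figures on this slide; a sketch of my own, not part of the lecture material:

    import math
    from collections import Counter

    def entropy(samples):
        """First-order (i.i.d.) entropy estimate from relative frequencies."""
        counts = Counter(samples)
        total = len(samples)
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    S = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]
    diffs = [S[0]] + [b - a for a, b in zip(S, S[1:])]     # difference sequence: 1 1 1 -1 ...

    print(entropy(S))       # ~3.25 bits/symbol for the raw samples
    print(entropy(diffs))   # ~0.70 bits/symbol after the differencing model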

45 Markov Models. ● Assume that each output symbol depends on the previous k symbols. Formally: ● let {x_n} be a sequence of observations; ● we call {x_n} a k-th order discrete Markov chain (DMC) if P(x_n | x_{n-1}, ..., x_{n-k}) = P(x_n | x_{n-1}, ..., x_{n-k}, x_{n-k-1}, ...), i.e., knowledge of the past k symbols is equivalent to knowledge of the entire past. ● Usually we use a first-order DMC (knowledge of 1 symbol in the past is enough). ● The values {x_{n-1}, ..., x_{n-k}} constitute the state of the process.

46 Non-Linear Markov Models. ● Consider a B/W image as a string of black & white pixels (e.g. row by row). ● Define two states S_b and S_w for the current pixel. ● Define the probabilities P(S_b) = probability of being in S_b and P(S_w) = probability of being in S_w. ● Transition probabilities: P(b|b), P(b|w), P(w|b), P(w|w).

47 Markov Models Example. ● Assume P(S_w) = 30/31, P(S_b) = 1/31, P(w|w) = 0.99, P(b|w) = 0.01, P(b|b) = 0.7, P(w|b) = 0.3. ● For the Markov model: H(S_b) = -0.3 log_2(0.3) - 0.7 log_2(0.7) = 0.881, H(S_w) = -0.01 log_2(0.01) - 0.99 log_2(0.99) = 0.081, so H_Markov = (30/31)·0.081 + (1/31)·0.881 ≈ 0.107 bits/pixel. ● How does this differ from the i.i.d. model?
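Checking the numbers on this slide, and answering the closing question by comparing against an i.i.d. model of the same source; a sketch, not lecture code:

    import math

    def h(probs):
        """Entropy in bits of a discrete distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    P_w, P_b = 30 / 31, 1 / 31
    H_b = h([0.3, 0.7])                  # entropy of the next pixel given a black pixel, ~0.881
    H_w = h([0.01, 0.99])                # entropy given a white pixel, ~0.081
    H_markov = P_w * H_w + P_b * H_b     # ~0.107 bits/pixel
    H_iid = h([P_w, P_b])                # ~0.206 bits/pixel: ignoring context nearly doubles it

    print(H_b, H_w, H_markov, H_iid)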

48 Markov Models in Text Compression. ● In written English, the probability of the next letter is heavily influenced by the previous ones ➢ e.g. "u" after "q". ● Shannon's work: ➢ using a 2nd-order Markov model over 26 letters + space, he found H = 3.1 bits/letter ➢ with a word-based model, H = 2.4 bits/letter ➢ with human prediction based on 100 previous letters, he found the limits 0.6 ≤ H ≤ 1.3 bits/letter. ● Longer context => better prediction.

49 Composite Source Model. ● In many applications it is not easy to describe the source with a single model. In such cases we can define a composite source, which can be viewed as a combination or composition of several sources, with only one source being active at any given time. E.g., an executable file contains several very different kinds of content (code, text, images, ...). ● Solution: a composite model, in which different sources, each with its own model, are active in sequence.

50 Let Us Elaborate on This: Knowing something about the source itself can help us to "reduce" the entropy. This is called entropy encoding. Note that we cannot actually reduce the entropy of the source as long as our coding is lossless; strictly speaking, we are only reducing our estimate of the entropy.

