Lecture 2: Basic Information Theory TSBK01 Image Coding and Data Compression Jörgen Ahlberg Div. of Sensor Technology Swedish Defence Research Agency (FOI)


1 Lecture 2: Basic Information Theory TSBK01 Image Coding and Data Compression Jörgen Ahlberg Div. of Sensor Technology Swedish Defence Research Agency (FOI)

2 Today
1. What is information theory about?
2. Stochastic (information) sources.
3. Information and entropy.
4. Entropy for stochastic sources.
5. The source coding theorem.

3 Part 1: Information Theory
Claude Shannon: A Mathematical Theory of Communication, The Bell System Technical Journal, 1948.
Be careful! Sometimes referred to as Shannon-Weaver, since the standalone publication has a foreword by Weaver.

4 Quotes about Shannon
"What is information? Sidestepping questions about meaning, Shannon showed that it is a measurable commodity."
"Today, Shannon's insights help shape virtually all systems that store, process, or transmit information in digital form, from compact discs to computers, from facsimile machines to deep space probes."
"Information theory has also infiltrated fields outside communications, including linguistics, psychology, economics, biology, even the arts."

5 The communication chain: Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink (receiver).
Source: any source of information.
Source coder: change to an efficient representation, i.e., data compression.
Channel coder: change to an efficient representation for transmission, i.e., error control coding.
Channel: anything transmitting or storing information – a radio link, a cable, a disk, a CD, a piece of paper, …
Channel decoder: recover from channel distortion.
Source decoder: uncompress.

6 Fundamental Entities
H: the information content of the source.
R: the rate from the source coder.
C: the channel capacity.

7 Fundamental Theorems
Shannon 1: error-free transmission is possible if R ≥ H (source coding theorem, simplified) and C ≥ R (channel coding theorem, simplified).
Shannon 2: source coding and channel coding can be optimized independently, and binary symbols can be used as the intermediate format. Assumption: arbitrarily long delays.

8 Part 2: Stochastic Sources
A source outputs symbols X_1, X_2, ...
Each symbol takes its value from an alphabet A = (a_1, a_2, …).
Model: P(X_1, …, X_N) is assumed to be known for all combinations.
Example 1: a text is a sequence of symbols, each taking its value from the alphabet A = (a, …, z, A, …, Z, 1, 2, …, 9, !, ?, …).
Example 2: a (digitized) grayscale image is a sequence of symbols, each taking its value from the alphabet A = (0, 1) or A = (0, …, 255).

9 Two Special Cases
1. The memoryless source: each symbol is independent of the previous ones.
P(X_1, X_2, …, X_n) = P(X_1) · P(X_2) · … · P(X_n)
2. The Markov source: each symbol depends on the previous one.
P(X_1, X_2, …, X_n) = P(X_1) · P(X_2|X_1) · P(X_3|X_2) · … · P(X_n|X_{n-1})
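The two factorizations can be sketched directly in code. This is a minimal illustration, not from the lecture: the alphabet, symbol probabilities, and transition table below are invented examples.

```python
# Evaluating P(X_1, ..., X_n) under the memoryless and Markov models.

def prob_memoryless(seq, p):
    """P(x1) * P(x2) * ... * P(xn) for a memoryless source."""
    prob = 1.0
    for x in seq:
        prob *= p[x]
    return prob

def prob_markov(seq, p0, trans):
    """P(x1) * P(x2|x1) * ... * P(xn|x_{n-1}) for a first-order Markov source."""
    prob = p0[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= trans[prev][cur]
    return prob

p = {'0': 0.5, '1': 0.5}                 # invented symbol probabilities
trans = {'0': {'0': 0.9, '1': 0.1},      # invented P(next | previous)
         '1': {'0': 0.1, '1': 0.9}}

print(prob_memoryless("0011", p))        # 0.5^4 = 0.0625
print(prob_markov("0011", p, trans))     # 0.5 * 0.9 * 0.1 * 0.9 ≈ 0.0405
```

Note how the Markov model assigns a different probability to the same sequence because it rewards symbols that repeat.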

10 The Markov Source
A symbol depends only on the previous symbol, so the source can be modelled by a state diagram.
(State diagram: a ternary source with alphabet A = (a, b, c); the arcs are labelled with the transition probabilities 0.3, 0.7, 1.0, 0.5, 0.2.)

11 The Markov Source
Assume we are in state a, i.e., X_k = a. The probabilities for the next symbol are:
P(X_{k+1} = a | X_k = a) = 0.3
P(X_{k+1} = b | X_k = a) = 0.7
P(X_{k+1} = c | X_k = a) = 0

12 The Markov Source
So, if X_{k+1} = b, we know that X_{k+2} will equal c:
P(X_{k+2} = a | X_{k+1} = b) = 0
P(X_{k+2} = b | X_{k+1} = b) = 0
P(X_{k+2} = c | X_{k+1} = b) = 1

13 The Markov Source
If all the states can be reached, the stationary probabilities for the states can be calculated from the given transition probabilities.
Stationary probabilities? That's the probabilities π_i = P(X_k = a_i) for any k when X_{k-1}, X_{k-2}, … are not given.
Markov models can also represent sources with dependencies more than one step back: use a state diagram with several symbols in each state.
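The stationary probabilities solve π = πP together with Σπ_i = 1. A sketch of that calculation follows; since not all arc labels in the slide's diagram are readable here, the transition matrix is an invented example in the same spirit.

```python
# Solving for the stationary distribution pi of a Markov source: pi = pi P, sum(pi) = 1.
import numpy as np

P = np.array([[0.3, 0.7, 0.0],   # from a: P(a|a)=0.3, P(b|a)=0.7, P(c|a)=0
              [0.0, 0.0, 1.0],   # from b: always go to c
              [0.5, 0.0, 0.5]])  # from c: invented probabilities

# Stack the balance equations (P^T - I) pi = 0 with the normalization sum(pi) = 1.
A = np.vstack([P.T - np.eye(3), np.ones((1, 3))])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)   # stationary probabilities: pi @ P equals pi
```

The least-squares solve is a convenient way to impose the normalization; an eigenvector of Pᵀ for eigenvalue 1 would give the same answer.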

14 Analysis and Synthesis
Stochastic models can be used for analysing a source: find a model that represents the real-world source well, and then analyse the model instead of the real world.
Stochastic models can be used for synthesizing a source: use a random number generator in each step of a Markov model to generate a sequence simulating the source.
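The synthesis idea can be sketched as follows: draw a random number at each step and pick the next state accordingly. States a and b follow the lecture's example; the probabilities out of state c are invented, since the slide's diagram is not fully readable here.

```python
# Synthesizing a Markov source by running its state diagram with an RNG.
import random

trans = {'a': [('a', 0.3), ('b', 0.7)],
         'b': [('c', 1.0)],
         'c': [('a', 0.5), ('c', 0.5)]}   # invented row

def step(state, rng):
    """Draw the next state according to the transition probabilities."""
    r = rng.random()
    acc = 0.0
    for nxt, p in trans[state]:
        acc += p
        if r < acc:
            return nxt
    return trans[state][-1][0]   # guard against floating-point rounding

def synthesize(n, state='a', seed=1):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        out.append(state)
        state = step(state, rng)
    return ''.join(out)

print(synthesize(30))   # a string over {a, b, c}; every b is followed by a c
```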

15 Show plastic slides!

16 Part 3: Information and Entropy
Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
– If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
– If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
– If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
– Intuitively, the amount of information received is the same if P(heads) = 0.9 as if P(heads) = 0.1.

17 Self Information
So, let's look at it the way Shannon did. Assume a memoryless source with
– alphabet A = (a_1, …, a_n)
– symbol probabilities (p_1, …, p_n).
How much information do we get when finding out that the next symbol is a_i? According to Shannon, the self information of a_i is
i(a_i) = −log p_i
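As a quick check, the definition can be evaluated for the coin example from slide 16 (a sketch; `self_information` is just a throwaway helper name):

```python
# Self-information i(a) = -log2 p(a), measured in bits.
import math

def self_information(p):
    """Self-information of an outcome with probability p, in bits."""
    return -math.log2(p)

print(self_information(0.5))   # fair coin: 1.0 bit
print(self_information(1.0))   # certain outcome: 0 bits
print(self_information(0.9))   # about 0.15 bits: more than zero, less than one
print(self_information(0.1))   # about 3.32 bits: rare outcomes carry more information
```

Note that the self-information of a single outcome differs for p = 0.9 and p = 0.1; the symmetry mentioned on slide 16 holds for the average information per symbol, i.e., the entropy introduced on the following slides.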

18 Why?
Assume two independent events A and B, with probabilities P(A) = p_A and P(B) = p_B. The probability of both events happening is p_A · p_B, but the amounts of information should be added, not multiplied. Logarithms satisfy this! Since we want the information to increase with decreasing probability, we use the negative logarithm.

19 Self Information
Which logarithm? Pick the one you like! If you pick the natural log, you'll measure in nats; if you pick the 10-log, you'll get Hartleys; if you pick the 2-log (like everyone else), you'll get bits.

20 Self Information
On average over all the symbols, we get
H(X) = −Σ_i p_i log p_i.
H(X) is called the first order entropy of the source. It can be regarded as the degree of uncertainty about the following symbol.

21 Entropy
Example: a binary memoryless source (BMS), output e.g. 0 1 1 0 1 0 0 0 …
Let p = P(1); then P(0) = 1 − p and
H(X) = −p log p − (1 − p) log(1 − p),
often denoted h(p). The uncertainty (information) is greatest when p = 0.5.
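The binary entropy function can be sketched directly (the function name `h` follows the notation above):

```python
# Binary entropy function h(p) = -p log2 p - (1-p) log2 (1-p), in bits.
import math

def h(p):
    """Binary entropy in bits; h(0) = h(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(h(0.5))   # 1.0: the uncertainty is greatest at p = 0.5
print(h(0.9))   # about 0.469
print(h(0.1))   # same as h(0.9): the function is symmetric around p = 0.5
```

The symmetry h(p) = h(1 − p) is exactly the intuition from slide 16 that P(heads) = 0.9 and P(heads) = 0.1 convey the same average information.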

22 Entropy: Three Properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., p_i = 1/N.
3. The difference log N − H is called the redundancy of the source.

23 Part 4: Entropy for Memory Sources
Assume a block of source symbols (X_1, …, X_n) and define the block entropy
H_n = −Σ P(x_1, …, x_n) log P(x_1, …, x_n),
where the summation is done over all possible combinations of n symbols.
The entropy for a memory source is defined as
H = lim_{n→∞} H_n / n.
That is, let the block length go towards infinity, and divide by n to get the number of bits per symbol.

24 Entropy for a Markov Source
The entropy for a state S_k can be expressed as
H(S_k) = −Σ_l P_kl log P_kl,
where P_kl is the transition probability from state k to state l. Averaging over all states, we get the entropy for the Markov source as
H = Σ_k π_k H(S_k).
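A sketch of this average, using an invented symmetric two-state chain (for which the stationary distribution is uniform by symmetry):

```python
# Markov source entropy: H = sum_k pi_k * H(S_k), H(S_k) = -sum_l P_kl log2 P_kl.
import numpy as np

P = np.array([[0.9, 0.1],    # invented transition matrix
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])    # stationary distribution of this symmetric chain

def state_entropy(row):
    """Entropy of the next-symbol distribution in one state, in bits."""
    row = row[row > 0]
    return float(-np.sum(row * np.log2(row)))

H = sum(p * state_entropy(row) for p, row in zip(pi, P))
print(H)   # about 0.469 bits/symbol, i.e. h(0.1): much less than 1 bit, due to the memory
```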

25 The Run-length Source
Certain sources generate long runs or bursts of equal symbols. Example: a binary Markov source with two states A and B.
Probability for a burst of length r: P(r) = (1 − α)^{r−1} · α, where α denotes the probability that the run ends at a given symbol.
Entropy: H_R = −Σ_{r=1}^{∞} P(r) log P(r).
If the average run length is r̄, then H_R / r̄ = H_M, the per-symbol entropy of the Markov source.
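This relation can be checked numerically. The sketch below assumes geometrically distributed run lengths with an invented parameter `a` (playing the role of α above), so that the mean run length is 1/a and H_M = h(a):

```python
# Numerical check that H_R / (mean run length) equals the Markov entropy H_M.
import math

a = 0.1               # invented: probability that a run ends at each symbol
mean_run = 1 / a      # mean of the geometric run-length distribution

# Run-length entropy H_R = -sum_r P(r) log2 P(r), truncated where P(r) is negligible.
H_R = 0.0
for r in range(1, 2000):
    pr = (1 - a) ** (r - 1) * a
    H_R -= pr * math.log2(pr)

# Per-symbol entropy of the corresponding binary Markov source.
H_M = -a * math.log2(a) - (1 - a) * math.log2(1 - a)

print(H_R / mean_run, H_M)   # both about 0.469 bits
```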

26 Part 5: The Source Coding Theorem
The entropy is the smallest number of bits allowing error-free representation of the source. Why is this? Let's take a look at typical sequences!

27 Typical Sequences
Assume a long sequence from a binary memoryless source with P(1) = p.
Among n bits, there will be approximately w = n · p ones.
Thus, there are M = C(n, w) such typical sequences!
Only these sequences are interesting; all other sequences appear with a probability that shrinks as n grows.

28 How Many Are the Typical Sequences?
Using Stirling's approximation, log M = log C(n, w) ≈ n · h(p). Enumerating the typical sequences needs log M bits, i.e., approximately h(p) bits per symbol.
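This approximation can be checked numerically: log2 of the binomial count, divided by n, approaches h(p) as n grows (a sketch with an invented p):

```python
# Check that (1/n) log2 C(n, np) approaches the binary entropy h(p).
import math

def bits_per_symbol(n, p):
    """log2 of the number of typical sequences M = C(n, np), per symbol."""
    w = round(n * p)
    return math.log2(math.comb(n, w)) / n

p = 0.1
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # h(p), about 0.469
for n in (100, 1000, 10000):
    print(n, bits_per_symbol(n, p))                  # approaches h(p) from below
```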

29 How many bits do we need? Thus, we need H(X) bits per symbol to code any typical sequence!

30 The Source Coding Theorem
Does tell us
– that we can represent the output from a source X using H(X) bits/symbol,
– that we cannot do better.
Does not tell us
– how to do it.

31 Summary
The mathematical model of communication: source, source coder, channel coder, channel, …; rate, entropy, channel capacity.
Information-theoretical entities: information, self-information, uncertainty, entropy.
Sources: BMS, Markov, run-length.
The source coding theorem.

