# CY2G2 Information Theory 1

A review of the important points of Part I

Quantity of information (noiseless system):
a) depends on the probability of the event;
b) depends on the length of the message.

$$I = \log \frac{1}{p} = -\log p$$

where $p$ is the probability of the event.

Average information (entropy), for a source producing many symbols with probabilities $p_1, p_2$, etc.:

$$H = -\sum_i p_i \log p_i$$

To introduce the concept of information, we start by comparing the amount of information contained in two statements: “The sun has risen today” and “I am pleased to tell you that you have just won the lottery” (assuming this is true). We know that a communication system carries information-bearing baseband signals from one place to another over a communication channel. What information does the receiver gain if it receives a sinusoidal signal of fixed amplitude and fixed frequency? None: the signal is deterministic (known in advance), so it is uninteresting and brings no information to the receiver.

Two questions naturally arise: (1) What is information? (2) How can the information conveyed in a signal be quantified? A qualitative description of information is not helpful in this context; a quantitative measure is necessary. Information involves an element of uncertainty: the amount of information conveyed by an event depends on the probability of the event, that is, on how much the receiver is surprised by the message. In other words, information is the resolution of uncertainty. The smaller the probability (chance) of a message, the larger the quantity of information it carries. Information is therefore related to the inverse of probability.
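As a small numerical illustration, the sketch below takes the standard definitions $I = \log_2(1/p)$ and $H = -\sum_i p_i \log_2 p_i$ (base 2, so the results are in bits). The probabilities 0.999 and 1e-7 standing in for the sunrise and the lottery win are assumptions chosen only for illustration, as are the function names.

```python
import math

def self_information(p: float) -> float:
    """Information in bits of an event with probability p: I = log2(1/p)."""
    return -math.log2(p)

def entropy(probs) -> float:
    """Average information H = -sum(p * log2 p), in bits per symbol."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Assumed probabilities: a near-certain event and a very rare one.
print(round(self_information(0.999), 4))   # 0.0014 bits
print(round(self_information(1e-7), 2))    # 23.25 bits
print(entropy([0.5, 0.25, 0.25]))          # 1.5 bits per symbol
```

The near-certain event carries almost no information, while the rare one carries many bits, which matches the qualitative argument in the notes.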

Maximum entropy

For a binary source with symbol probabilities $p$ and $1-p$:

$$H = -\bigl(p \log_2 p + (1-p) \log_2 (1-p)\bigr)$$

which reaches its maximum of 1 bit when $p = 0.5$.

The base of the logarithm is not important here, but base 2 is the most common choice and gives the unit of information as the bit. Taking base e gives the nat (natural unit). The definition is presented more formally on the slide above. A few examples illustrate it:

i. Letters of the alphabet. If the 26 letters are assumed equiprobable, the probability of a letter is 1/26 and $I = \log_2 26 = 4.7$ bits.
ii. Numerals. If the digits 0 to 9 are assumed equiprobable, the probability of a digit is 1/10 and $I = \log_2 10 = 3.32$ bits.

Note that both cases assume equiprobability (a uniform probability distribution).
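The following sketch evaluates the entropy of a binary source, $H(p) = -(p \log_2 p + (1-p)\log_2(1-p))$, on a grid of values of $p$ to locate the maximum, and then recomputes the two equiprobable examples in bits and in nats. The grid spacing of 0.01 and the function name are assumptions of the sketch.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) in bits for a binary source; H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Scan p = 0.00, 0.01, ..., 1.00 and pick the value with the largest entropy.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=binary_entropy)
print(best_p, binary_entropy(best_p))      # 0.5 1.0

# Equiprobable alphabets from the text: information in bits and in nats.
for n in (26, 10):
    print(n, round(math.log2(n), 2), round(math.log(n), 2))
```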

Redundancy

Redundancy is the presence of more symbols in a message than is strictly necessary. Spoken languages usually have high redundancy; English has a redundancy of about 80 percent.

Redundancy is an important concept in information theory. To overcome noise and interference and to increase the reliability of the data transmitted through the channel, it is often necessary to introduce some redundancy, in a controlled manner, into the binary sequence from the source.

Conditional entropy H(j|i)

If there is intersymbol influence, the average information is given by

$$H(j|i) = -\sum_i \sum_j p(i,j) \log p(j|i)$$

where $p(j|i)$ is the conditional probability (the probability of $j$ given $i$) and $p(i,j)$ is the joint probability.
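A minimal sketch of the conditional entropy calculation follows. The joint probabilities describe a hypothetical two-symbol source in which a symbol tends to repeat itself; they are not taken from the slides. Redundancy is taken here as $1 - H/H_{max}$, the usual definition, which is an assumption since the transcript does not reproduce the formula.

```python
import math

def conditional_entropy(joint):
    """H(j|i) = -sum_ij p(i,j) log2 p(j|i), with p(j|i) = p(i,j) / p(i)."""
    h = 0.0
    for row in joint:                 # row i holds p(i, j) for every j
        p_i = sum(row)
        for p_ij in row:
            if p_ij > 0:
                h -= p_ij * math.log2(p_ij / p_i)
    return h

# Hypothetical joint probabilities p(i, j): repeats are four times as likely.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
h_cond = conditional_entropy(joint)
h_max = math.log2(len(joint))         # equiprobable symbols with no influence
print(round(h_cond, 3))               # 0.722 bits per symbol
print(round(1 - h_cond / h_max, 3))   # 0.278 (assumed redundancy definition)
```

Intersymbol influence lowers the average information below the 1 bit per symbol that two independent, equiprobable symbols would give.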

Coding in a noiseless channel: source coding

(Speed of transmission is the main consideration.)

Important properties of codes:
- uniquely decodable (all combinations of code words distinct)
- instantaneous (no code word is a prefix of another)
- compact (shorter code words are given to more probable symbols)
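The instantaneous property can be checked mechanically. The sketch below tests whether any code word is a prefix of another; the function name and the two sample codes are illustrative assumptions.

```python
def is_instantaneous(codewords):
    """True if no code word is a prefix of another (a prefix-free code)."""
    words = sorted(codewords)
    # In sorted order, a word that prefixes any other word also prefixes
    # its immediate successor, so comparing neighbours is sufficient.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_instantaneous(["0", "100", "101", "110", "111"]))  # True
print(is_instantaneous(["0", "01", "11"]))                  # False: "0" prefixes "01"
```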

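Important parameters:

Average length $L = \sum_i p_i l_i$, where $l_i$ is the length (in binary digits) of the code word for symbol $i$.

Efficiency $E = H / L$.

Coding methods:
- Fano-Shannon method
- Huffman's method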

Coding methods: Fano-Shannon method

1. Write the symbols in a table in descending order of probability.
2. Insert dividing lines to successively divide the probabilities into halves, quarters, etc. (or as near as possible).
3. Add a '0' and a '1' to the code at each division.
4. Obtain the final code for each symbol by reading the digits from the first division towards the symbol.

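| Symbol | Probability | Code |
|--------|-------------|------|
| s1 | 0.5 | 0 |
| s2 | 0.2 | 100 |
| s3 | 0.1 | 101 |
| s4 | 0.1 | 110 |
| s5 | 0.1 | 111 |

L = 0.5×1 + 0.2×3 + 3×0.1×3 = 2.0, H = 1.96, E = 0.98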
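The Fano-Shannon steps can be sketched recursively as below. This is a minimal illustration, not a reference implementation: the function name is hypothetical, and when two dividing lines are equally close to an even split the sketch takes the later one, an assumption chosen because it reproduces the codes in the example. The average length is $L = \sum_i p_i l_i$ and the efficiency is $E = H/L$.

```python
import math

def fano_shannon(symbols):
    """symbols: list of (name, probability) in descending order of probability.
    Returns {name: code word}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Find the dividing line that splits the probabilities most evenly.
    running, best_k, best_diff = 0.0, 1, float("inf")
    for k in range(1, len(symbols)):
        running += symbols[k - 1][1]
        diff = abs(total - 2 * running)
        if diff <= best_diff + 1e-12:      # assumed tie-break: later line wins
            best_k, best_diff = k, diff
    codes = {}
    for bit, group in (("0", symbols[:best_k]), ("1", symbols[best_k:])):
        for name, code in fano_shannon(group).items():
            codes[name] = bit + code       # digit of this division comes first
    return codes

source = [("s1", 0.5), ("s2", 0.2), ("s3", 0.1), ("s4", 0.1), ("s5", 0.1)]
codes = fano_shannon(source)
L = sum(p * len(codes[s]) for s, p in source)
H = -sum(p * math.log2(p) for _, p in source)
print(codes)   # {'s1': '0', 's2': '100', 's3': '101', 's4': '110', 's5': '111'}
print(round(L, 2), round(H, 2), round(H / L, 2))   # 2.0 1.96 0.98
```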

Coding methods: Huffman's method

1. Write the symbols in a table in descending order of probability.
2. Add the probabilities in pairs from the bottom and reorder.
3. Place a '0' or a '1' at each branch.
4. Obtain the final code for each symbol by reading the digits from the final combination back towards the symbol.

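| Stage | Entries in descending order of probability |
|-------|---------------------------------------------|
| 1 | S1 0.5, S2 0.2, S3 0.1, S4 0.1, S5 0.1 |
| 2 | S1 0.5, S2 0.2, {S5, S4} 0.2, S3 0.1 |
| 3 | S1 0.5, {S3, {S5, S4}} 0.3, S2 0.2 |
| 4 | S1 0.5, {S2, {S3, {S5, S4}}} 0.5 |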

Codes: S1, S2, S3, S4 and S5 receive code words of 1, 2, 3, 4 and 4 binary digits respectively.

L = 0.5×1 + 0.2×2 + 0.1×3 + 2×0.1×4 = 2.0, H = 1.96, E = 0.98
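The Huffman procedure can be sketched by mirroring the table directly: combine the two bottom entries, then reorder. The sketch tracks only code word lengths, since the 0/1 labelling of branches is a free choice. It assumes a combined entry is placed below existing entries of equal probability, which is the ordering the reduction stages above suggest; the function name is hypothetical.

```python
def huffman_lengths(probs):
    """probs: {symbol: probability}. Returns {symbol: code word length}."""
    # Table of [probability, symbols in the group], in descending order.
    table = sorted(([p, [s]] for s, p in probs.items()), key=lambda e: -e[0])
    lengths = dict.fromkeys(probs, 0)
    while len(table) > 1:
        p_low, g_low = table.pop()         # bottom entry
        p_next, g_next = table.pop()       # entry just above it
        for s in g_low + g_next:           # each combination adds one digit
            lengths[s] += 1
        merged = [p_low + p_next, g_next + g_low]
        # Reorder: move the combined entry up past strictly smaller entries only.
        i = len(table)
        while i > 0 and table[i - 1][0] < merged[0] - 1e-12:
            i -= 1
        table.insert(i, merged)
    return lengths

probs = {"S1": 0.5, "S2": 0.2, "S3": 0.1, "S4": 0.1, "S5": 0.1}
lengths = huffman_lengths(probs)
L = sum(p * lengths[s] for s, p in probs.items())
print(lengths)       # {'S1': 1, 'S2': 2, 'S3': 3, 'S4': 4, 'S5': 4}
print(round(L, 2))   # 2.0
```

A different placement of tied entries can give different individual lengths, but the average length of a Huffman code stays the same.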

Shannon's first theorem

Shannon proved formally that if the source symbols are coded in groups of n, the average length per symbol tends to the source entropy H as n tends to infinity. Consequently, a further increase in efficiency can be obtained by grouping the source symbols (in pairs, threes, and so on) and applying the coding procedure to the probabilities of the chosen groups.

Matching source to channel

The coding process is sometimes known as 'matching source to channel', that is, making the output of the coder as suitable as possible for the channel.
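The effect of grouping can be seen numerically. The sketch below codes groups of n independent symbols with a binary Huffman code and reports the average length per source symbol next to the entropy H. The three-symbol source with probabilities 0.8, 0.15 and 0.05 is an assumption for illustration, and the average length is computed as the sum of the combined probabilities, a standard identity for Huffman codes.

```python
import heapq
import itertools
import math

def huffman_average_length(probs):
    """Average code word length of a binary Huffman code for these probabilities."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged            # one extra digit for every symbol combined here
        heapq.heappush(heap, merged)
    return total

source = [0.8, 0.15, 0.05]         # assumed independent symbol probabilities
H = -sum(p * math.log2(p) for p in source)
results = {}
for n in (1, 2, 3, 4):
    group_probs = [math.prod(g) for g in itertools.product(source, repeat=n)]
    results[n] = huffman_average_length(group_probs) / n
    print(n, round(results[n], 3), round(H, 3))
```

For n = 1 and n = 2 this gives 1.2 and about 0.934 binary digits per symbol, against an entropy of about 0.884 bits per symbol; for every n the value lies between H and H + 1/n.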

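Example

An information source produces a long sequence of three independent symbols A, B, C with probabilities 16/20, 3/20 and 1/20 respectively; 100 such symbols are produced per second. The information is to be transmitted via a noiseless binary channel which can transmit up to 100 binary digits per second. Design a suitable compact instantaneous code and find the probabilities of the binary digits produced.

System: source (100 symbols/s) → coder → channel (binary digits 0, 1) → decoder.

Coding singly, using the Fano-Shannon method

| Symbol | Probability | Code |
|--------|-------------|------|
| A | 16/20 | 0 |
| B | 3/20 | 10 |
| C | 1/20 | 11 |

The average length is L = 0.8×1 + 0.15×2 + 0.05×2 = 1.2 binary digits per symbol, that is 120 binary digits per second, which is more than the channel can carry. Per symbol the coder emits on average 0.8 + 0.15 = 0.95 zeros and 0.15 + 2×0.05 = 0.25 ones, so p(0) = 0.95/1.2 ≈ 0.79 and p(1) = 0.25/1.2 ≈ 0.21.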
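A quick feasibility check for the single-symbol code is sketched below, using the code words A = 0, B = 10, C = 11 read from the table. It compares the binary digit rate with the channel limit and also prints the source entropy rate, which indicates how much room a better code has. The variable names are illustrative.

```python
import math

probs = {"A": 16 / 20, "B": 3 / 20, "C": 1 / 20}
codes = {"A": "0", "B": "10", "C": "11"}     # single-symbol code from the table
symbol_rate = 100                            # source symbols per second
channel_limit = 100                          # binary digits per second

L = sum(p * len(codes[s]) for s, p in probs.items())   # digits per symbol
digit_rate = L * symbol_rate
H = -sum(p * math.log2(p) for p in probs.values())     # bits per symbol

print(round(L, 2), round(digit_rate, 1))     # 1.2 120.0
print(digit_rate <= channel_limit)           # False: too fast for the channel
print(round(H * symbol_rate, 1))             # 88.4 bits per second
```

Since the entropy rate is below 100 bits per second, a more efficient code, such as one built on groups of symbols, can in principle fit the channel.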

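Coding in pairs, using the Fano-Shannon method

| Pair | Probability | Code |
|------|-------------|------|
| AA | 0.64 | 0 |
| AB | 0.12 | 10 |
| BA | 0.12 | 110 |
| AC | 0.04 | 11100 |
| CA | 0.04 | 11101 |
| BB | 0.0225 | 11110 |
| BC | 0.0075 | 111110 |
| CB | 0.0075 | |
| CC | 0.0025 | |

L = 1.865 binary digits per pair, R = 93.25 binary digits/s, p(0) = 0.547, p(1) = 0.453. The entropy of the output stream is −(p(0) log p(0) + p(1) log p(1)) = 0.993 bits, close to the maximum value of 1 bit (reached when p(0) = p(1)).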
