# Another question: consider a message (sequence of characters) from {a, b, c, d} encoded using the code shown. What is the probability that a randomly chosen bit from the encoded message is 1?


Another question: consider a message (sequence of characters) from {a, b, c, d} encoded using the code shown. What is the probability that a randomly chosen bit from the encoded message is 1?

| symbol | probability | codeword |
|--------|-------------|----------|
| a      | 1/2         | 0        |
| b      | 1/4         | 10       |
| c      | 1/8         | 110      |
| d      | 1/8         | 111      |

Answer = (expected number of 1s) / (expected number of bits). The expected number of 1s per symbol is 0·1/2 + 1·1/4 + 2·1/8 + 3·1/8 = 7/8, and the expected number of bits per symbol is 1·1/2 + 2·1/4 + 3·1/8 + 3·1/8 = 7/4, so the probability is 1/2.
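As a check, the slide's formula can be evaluated directly; a minimal sketch in Python, using exact arithmetic:

```python
from fractions import Fraction

# Code table from the slide: symbol -> (probability, codeword)
code = {
    "a": (Fraction(1, 2), "0"),
    "b": (Fraction(1, 4), "10"),
    "c": (Fraction(1, 8), "110"),
    "d": (Fraction(1, 8), "111"),
}

# Probability of a 1 = expected number of 1s / expected number of bits
expected_ones = sum(p * cw.count("1") for p, cw in code.values())
expected_bits = sum(p * len(cw) for p, cw in code.values())

print(expected_ones, expected_bits, expected_ones / expected_bits)  # 7/8 7/4 1/2
```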


Shannon-Fano theorem

Channel capacity:
- The output rate (bits/sec) of the encoder is determined by the entropy of the source (bits/symbol).
- If we increase the rate at which the source generates information (bits/symbol), eventually we reach the limit of the encoder (bits/sec). At this point the encoder's output rate has reached a limit: this is the channel capacity.

S-F theorem:
- Source has entropy H bits/symbol.
- Channel has capacity C bits/sec.
- It is possible to encode the source so that its symbols can be transmitted at up to C/H symbols per second, but no faster. E.g. if H = 2 bits/symbol and C = 1000 bits/sec, the source can be transmitted at up to 500 symbols/sec.
- (General proof in notes.)

[diagram: source → encode/transmit → channel → receive/decode → destination]

Conditional Entropy (lecture 3)

- The conditional entropy of A given B = b_k is the entropy of the probability distribution Pr(A | B = b_k).
- The conditional entropy of A given B is the average of this quantity over all b_k: the average uncertainty about A when B is known.
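The definition translates directly into code; a small sketch (the joint-table layout, one row per value of B, is an assumption for illustration):

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def conditional_entropy(joint):
    """H(A|B) from a joint table joint[b][a] = Pr(A=a, B=b):
    the entropy of Pr(A|B=b), averaged over b weighted by Pr(B=b)."""
    h = 0.0
    for row in joint:        # one row per value b of B
        pb = sum(row)        # Pr(B = b)
        if pb > 0:
            h += pb * entropy([pab / pb for pab in row])
    return h

print(conditional_entropy([[0.25, 0.25], [0.25, 0.25]]))  # 1.0 : A, B independent
print(conditional_entropy([[0.5, 0.0], [0.0, 0.5]]))      # 0.0 : B determines A
```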

Mutual information (lecture 3)

H(A, B) = H(B, A) = H(B) + H(A|B) = H(A) + H(B|A)

Rearranging: H(A) − H(A|B) = H(B) − H(B|A), i.e. I(A; B) = I(B; A).

I(A; B) = information about A contained in B.

[Venn diagram: H(A, B) partitioned into H(A|B), I(A; B) and H(B|A)]

Mutual information - example

- A (transmit): 0 with probability p, 1 with probability 1−p
- B (noise): 0 with probability q, 1 with probability 1−q
- C (receive): c = a + b mod 2

If p = q = 0.5: (i) what is the probability that c = 0? (ii) what is I(C; A)? What if p = 0.5 and q = 0.1? What about the general case, any p, q?

General case

- a = 0 with probability p: in this case Pr(c=0) = q, Pr(c=1) = 1−q.
- a = 1 with probability 1−p: in this case Pr(c=0) = 1−q, Pr(c=1) = q.

Either way, the average uncertainty about C given A is just the uncertainty of the noise, so H(C|A) = H(B), and

I(A; C) = H(C) − H(C|A) = H(C) − H(B)
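These formulas can be checked numerically; a sketch, with h2 denoting the binary entropy function:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_info_AC(p, q):
    """I(A;C) for C = A XOR B, with Pr(A=0)=p, Pr(B=0)=q.
    H(C|A) = H(B), so I(A;C) = H(C) - H(B)."""
    pc0 = p * q + (1 - p) * (1 - q)   # Pr(C = 0)
    return h2(pc0) - h2(q)

print(mutual_info_AC(0.5, 0.5))  # 0.0 : uniform noise destroys all information
print(mutual_info_AC(0.5, 0.1))  # ~0.531 : some information gets through
```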

Discrete Channel with Noise

[diagram: source → encode/transmit (X) → noisy channel → receive/decode (Y) → destination]

- equivocation = H(X | Y)
- transmission rate = H(X) − H(X | Y)
- channel capacity = max (transmission rate)

Noisy Channels

A noisy channel consists of an input alphabet X, an output alphabet Y and a set of conditional distributions Pr(y|x) for each y ∈ Y and x ∈ X.

[diagram: binary symmetric channel — input x ∈ {0, 1}, output y ∈ {0, 1}; each bit is received correctly with probability 1 − f and flipped with probability f]

Inferring input from output

Binary symmetric channel with error probability 0.15; source distribution P(x=0) = 0.9. We observe y = 1. Using Bayes:

Pr(x=0 | y=1) = (0.15 × 0.9) / (0.15 × 0.9 + 0.85 × 0.1) = 0.135 / 0.22 ≈ 0.61

so x = 0 is still more probable than x = 1.
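The Bayes update above can be computed as a minimal sketch:

```python
# Posterior Pr(x=0 | y=1) for a binary symmetric channel,
# flip probability f = 0.15, prior Pr(x=0) = 0.9.
f, p0 = 0.15, 0.9
like0 = f        # Pr(y=1 | x=0): the bit was flipped
like1 = 1 - f    # Pr(y=1 | x=1): the bit passed through
post0 = like0 * p0 / (like0 * p0 + like1 * (1 - p0))
print(round(post0, 3))  # 0.614 -> x=0 remains the more probable input
```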

Other useful models

- Binary erasure channel: input x ∈ {0, 1}, output y ∈ {0, 1, ?}; each bit is either received correctly or erased (output ?).
- Z channel: input x ∈ {0, 1}, output y ∈ {0, 1}; a 0 is always received correctly, but a 1 may be flipped to 0.

Information conveyed by a channel

Given an input distribution P(x) and the resulting output distribution P(y), the information conveyed is the mutual information I(X; Y). The channel capacity is found by asking: what is the distribution P(x) that maximises I(X; Y)? (The answer also depends on the channel's error matrix.)

[Venn diagram: H(X) and H(Y) overlapping in I(X; Y), with remainders H(X|Y) and H(Y|X)]
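For the binary symmetric channel the maximisation can be done by brute force over input distributions; a sketch, assuming flip probability f = 0.15 as in the earlier slide:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_info(p0, f):
    """I(X;Y) for a binary symmetric channel with flip probability f
    and input distribution Pr(x=0) = p0: I = H(Y) - H(Y|X) = h2(Pr(y=0)) - h2(f)."""
    py0 = p0 * (1 - f) + (1 - p0) * f   # Pr(y = 0)
    return h2(py0) - h2(f)

# Scan input distributions: the maximum of I(X;Y) is the channel capacity,
# achieved here by the uniform input p0 = 0.5.
best = max((bsc_mutual_info(p0 / 100, 0.15), p0 / 100) for p0 in range(101))
print(best)  # capacity ~ 0.39 bits per use, at p0 = 0.5
```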

Shannon's Fundamental Theorem

Consider a source with entropy R and a channel with capacity C such that R < C. There is a way of (block) coding the source so that it can be transmitted with arbitrarily small error:
- group input symbols together (block code)
- use the spare capacity for an error-correcting code (Hamming code etc.)

Example - noisy dice

Imagine restricting the input symbols to 2 and 5. This is a non-confusable subset: for any output, we would know the input (similarly {1, 4} or {3, 6}).
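This can be checked mechanically; a sketch assuming each die face can be received as itself or an adjacent face (wrapping 6 → 1), which is one reading of the slide's picture:

```python
def outputs(i):
    """Possible received faces for input face i, assuming each face
    can arrive as itself or an adjacent face (wrapping 6 -> 1)."""
    return {(i - 2) % 6 + 1, i, i % 6 + 1}

# Each candidate pair has disjoint output sets, so any output
# identifies the input: a non-confusable subset.
for subset in [(2, 5), (1, 4), (3, 6)]:
    a, b = subset
    print(subset, "non-confusable:", outputs(a).isdisjoint(outputs(b)))
```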

Outline of proof

Consider sequences of signals of length N:
- as N increases, the probability of error reduces (typical outputs are unlikely to overlap)
- as N → ∞, Pr(error) → 0
- e.g. binary symmetric channel, f = 0.15: repeat the signal N times

(figures from MacKay, Information Theory, Inference, and Learning Algorithms, CUP)

Outline

Consider a long time T, sequence length N:
- there are 2^{NH(X)} typical source sequences, each occurring with probability 2^{−NH(X)}
- a typical received signal y corresponds to 2^{NH(X|Y)} possible inputs
- choose 2^{NR} random input sequences to represent our source messages
- consider transmitting x_i: if it is corrupted, it may be decoded as x_j where j ≠ i
- if y is received, it corresponds to a set S_y of about 2^{NH(X|Y)} possible inputs; the probability that some other codeword x_j also lies in S_y is roughly 2^{NR} · 2^{NH(X|Y)} / 2^{NH(X)} = 2^{N(R−C)}
- since R < C, we can make this as small as we like by choosing N large

Error Detection / Correction

[diagram: source → encode/transmit (X) → noisy channel → receive/decode (Y) → destination, with error detection/correction and resend at the receiver]

Error-detecting code:
- Detects if one or more digits have been changed
- Cannot say which digits have changed
- E.g. parity check

Error-correcting code:
- Error detection as above
- Can also work out which digits have been changed
- E.g. Hamming code

Error detection

If code words are very similar then it is difficult to detect errors (e.g. 1010110100 and 1010110101); if code words are very different then it is easier (e.g. 1111111111 and 0000000000). Therefore the more different the code words, the better. Measure this using the Hamming distance d: the number of digits in which two words differ. E.g. 011011 and 010101 differ in 3 places, therefore d = 3.

Hamming distance

A measure of distance between words: decode to the nearest code word. E.g. with code words a = 10000, b = 01100, c = 10011, a received word such as 10001 or 11000 is matched to whichever code word is closest.

Use d to predict the number of errors we can detect/correct:
- d ≥ 2e + 1: can correct up to e errors per word
- d = 2e: can correct up to e − 1 errors per word, can detect e errors
- d ≥ e + 1: can detect up to e errors per word
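The distance calculation and the detect/correct rules above can be sketched as:

```python
def hamming_distance(u, v):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))

print(hamming_distance("011011", "010101"))  # 3

# Capabilities of a code with minimum distance d:
d = 3
print("detects up to", d - 1, "errors")          # from d >= e + 1
print("corrects up to", (d - 1) // 2, "errors")  # from d >= 2e + 1
```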

Error correction

Like an error-detecting code, but needs more bits (obvious really!). More efficient when larger code words are being used, at the cost of the coding/decoding arithmetic.

Hamming code:
- D = number of data bits
- P = number of parity bits
- C = D + P = code word length
- Hamming inequality for single error correction: D + P + 1 ≤ 2^P
- If P is small it is hardly worth doing: cheaper to re-send the code word
- If P ≥ 3, some increase in transmission rate is possible
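The Hamming inequality gives the minimum number of parity bits for a given number of data bits; a small sketch:

```python
def parity_bits_needed(D):
    """Smallest P satisfying the single-error-correction
    Hamming inequality D + P + 1 <= 2**P."""
    P = 1
    while D + P + 1 > 2 ** P:
        P += 1
    return P

for D in (1, 4, 11, 26):
    print(D, "data bits ->", parity_bits_needed(D), "parity bits")
# These (D, P) pairs are the classic (3,1), (7,4), (15,11), (31,26) Hamming codes.
```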

Hamming code

Process:
- Coding: take the data word from the encoder (before adding parity bits) and multiply by the generator matrix G using modulo-2 arithmetic. This gives the code word (d_1, …, d_D, p_1, …, p_P).
- Decoding: take the received code word (D + P bits) and multiply by the decoder matrix X using modulo-2 arithmetic. This gives a syndrome (or parity) vector s. If s contains all zeros then there are no errors; otherwise s is matched against the columns of X to find the position of the single error.

Example

E.g. D = 4, P = 3, with G = [I | A] and X = [A^T | I] (the matrices are shown on the slide). Encode 1001; receive 1101001 and decode: the syndrome matches column 2 of X, so the error is in bit 2.
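The transcript does not reproduce the G and X matrices, so the sketch below assumes the common systematic (7,4) choice p1 = d1⊕d2⊕d4, p2 = d1⊕d3⊕d4, p3 = d2⊕d3⊕d4, which reproduces the slide's example:

```python
import numpy as np

# Assumed systematic Hamming (7,4): codeword = (d1..d4, p1..p3),
# with p1 = d1^d2^d4, p2 = d1^d3^d4, p3 = d2^d3^d4.
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), A])    # generator matrix [I | A]
X = np.hstack([A.T, np.eye(3, dtype=int)])  # decoder matrix  [A^T | I]

def encode(data):
    """Multiply the 4 data bits by G, modulo 2."""
    return (np.array(data) @ G) % 2

def decode(received):
    """Compute the syndrome; if non-zero, it matches the column of X
    at the position of the single error, which is then flipped."""
    r = np.array(received)
    s = (X @ r) % 2
    if s.any():
        pos = int(np.where((X.T == s).all(axis=1))[0][0])
        r[pos] ^= 1
    return r[:4]  # recovered data bits

print(encode([1, 0, 0, 1]))               # [1 0 0 1 0 0 1]
print(decode([1, 1, 0, 1, 0, 0, 1]))      # error in bit 2 corrected -> [1 0 0 1]
```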

Summary

- probability: joint, conditional, Bayes
- entropy: decomposition, conditional, mutual information
- sources: simple, Markov, stationary, ergodic
- information capacity: source coding theorem, optimal coding / compression
- channel capacity: Shannon's fundamental theorem, error correction/detection

[diagram: source → encode/transmit (X) → noisy channel → receive/decode (Y) → destination]

Next stop …

Weeks 1-6: Theory of Discrete Information and Communication Systems. Weeks 7-12:
- Communications Systems Performance (Mark Beach) + coursework
- Languages, Automata and Complexity (Colin Campbell)
- 1st order Predicate Logic (Enza di Tomaso)

Units: EMAT 31520 Information Systems (CSE 3, Eng Maths 3, Knowledge Eng 3); EMAT 20530 Logic and Information (CSE 2, Eng Maths 2); EENG 32000 Communication Systems (EE 3, Avionics 3); EENG M2100 Communication Systems (MSc Comms/Sig Proc)

Pictorial representation

[Venn diagram of H(X), H(Y), I(X;Y), H(X|Y) and H(Y|X)] (from Volker Kuhn, Bremen)

Error correcting

[diagram: t = transmitted bits, s = source bits; the transmitted bits are chosen such that each circled set has even parity] Any two 4-bit codewords differ in at least 3 places.


