# Question (from exercises 2) Are the following sources likely to be stationary and ergodic? (i) Binary source, typical sequence aaaabaabbabbbababbbabbbbbabbbbaabbbbbba......


Question (from exercises 2): Are the following sources likely to be stationary and ergodic?

(i) Binary source, typical sequence aaaabaabbabbbababbbabbbbbabbbbaabbbbbba......
(ii) Quaternary source (4 symbols), typical sequences abbabbabbababbbaabbabbbab....... and cdccdcdccddcccdccdccddcccdc......
(iii) Ternary source (3 symbols), typical sequence AABACACBACCBACBABAABCACBAA…
(iv) Quaternary source, typical sequence 124124134124134124124 …

Definitions

A source is **stationary** if its symbol probabilities do not change with time, e.g. a binary source with Pr(0) = Pr(1) = 0.5, the probabilities assumed the same at all times.

A source is **ergodic** if it is stationary and:
(a) no proper subset of it is stationary, i.e. the source does not get locked into a subset of symbols or states;
(b) it is not periodic, i.e. the states do not occur in a regular pattern. E.g. the output s1 s2 s3 s1 s4 s3 s1 s4 s5 s1 s2 s5 s1 s4 s3 … is periodic because s1 occurs every 3 symbols.
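A rough empirical check of stationarity (a sketch, not a proof): split a typical sequence in two and compare symbol frequencies in each half. For source (i) from the question, `a` becomes noticeably rarer over time, which suggests the source is not stationary.

```python
from collections import Counter

def half_frequencies(seq):
    """Compare symbol frequencies in the two halves of a sequence.
    Roughly equal frequencies are consistent with stationarity."""
    mid = len(seq) // 2
    halves = [seq[:mid], seq[mid:]]
    return [{s: round(c / len(h), 2) for s, c in Counter(h).items()}
            for h in halves]

# Source (i): 'a' dominates early, 'b' dominates late.
print(half_frequencies("aaaabaabbabbbababbbabbbbbabbbbaabbbbbba"))
```

With only one short sequence this is merely suggestive; a real test would need many long realisations.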

Review

source → encode/transmit → channel (noise added) → receive/decode → destination; the ideal message enters the channel as a signal, and noise turns it into the actual received signal.

Measures of information: entropy; conditional entropy; mutual information; entropy per symbol (or per second); entropy of a Markov source; redundancy; information capacity (ergodic source).

Uses of redundancy: (i) remove redundancy to maximise information transfer; (ii) use redundancy to correct transmission errors.

Shannon Source Coding Theorem

N identical, independently distributed random variables, each with entropy H(X), can be compressed into N·H(X) bits with virtually no risk of information loss as N → ∞; conversely, if they are compressed into fewer than N·H(X) bits, it is virtually certain that information will be lost.
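A small numerical illustration of the bound (a sketch; zlib is a general-purpose compressor, not an optimal one, but a lossless coder cannot beat the entropy bound on typical data):

```python
import math
import random
import zlib

# A biased binary source with Pr(1) = 0.9 has entropy ~0.469 bits/symbol,
# so N symbols need about 0.469 * N bits; zlib should land above that
# bound but well below the naive 8 bits per stored byte.
p = 0.9
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
random.seed(0)
N = 100_000
data = bytes(random.random() < p for _ in range(N))  # one byte per symbol
compressed_bits = 8 * len(zlib.compress(data, 9))
print(f"entropy bound: {H * N:.0f} bits, zlib: {compressed_bits} bits")
```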

Optimal Coding

Requirements for a code: efficiency; uniquely decodable; immunity to noise; instantaneous.

Noise-free communication channel: source → encode/transmit → channel → receive/decode → destination.
- A = output of source = input to encoder: {a1, a2, ... am}
- X = output of transmitter = input to receiver: {b1, b2, ... bn}
- B = output of decoder = input to destination: {a1, a2, ... am}

Definitions

- **Coding**: conversion of source symbols into a different alphabet for transmission over a channel. Input to encoder = source alphabet A = {a1, ... am}; encoder output alphabet X = {b1, ... bn}. Coding is necessary if n < m.
- **Code word**: group of output symbols corresponding to an input symbol (or group of input symbols).
- **Code**: set (table) of all input symbols (or input words) and the corresponding code words.
- **Word length**: number of output symbols in a code word.
- **Average word length (AWL)**: AWL = Σ_i p_i N_i, where N_i = length of the word for symbol a_i and p_i = Pr(a_i).
- **Optimal code**: has minimum average word length for a given source.
- **Efficiency**: η = H / (AWL · log2 n), where H is the entropy per symbol of the source.
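The AWL and efficiency formulas above can be sketched directly in code (for a binary code, log2 n = 1, so efficiency reduces to H / AWL):

```python
import math

def average_word_length(probs, lengths):
    # AWL = sum_i p_i * N_i
    return sum(p, * []) if False else sum(p * n for p, n in zip(probs, lengths))

def efficiency(probs, lengths, n_output_symbols=2):
    # eta = H / (AWL * log2(n)); for a binary code log2(2) = 1
    H = -sum(p * math.log2(p) for p in probs if p > 0)
    return H / (average_word_length(probs, lengths) * math.log2(n_output_symbols))

# Prefix code a->0, b->10, c->110, d->111 with dyadic probabilities:
probs, lengths = [1/2, 1/4, 1/8, 1/8], [1, 2, 3, 3]
print(average_word_length(probs, lengths))  # 1.75
print(efficiency(probs, lengths))           # 1.0 (an optimal code)
```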

Binary encoding

A (binary) symbol code f is a mapping f: A → {0,1}⁺ or (abusing notation) f: A⁺ → {0,1}⁺, where {0,1}⁺ = {0, 1, 00, 01, 10, 11, 000, 001, … }.

- If f has an inverse then it is uniquely decodable.
- Compression is achieved (on average) by assigning shorter encodings to the more probable symbols in A and longer encodings to the less probable symbols.
- A code is easy to decode if we can identify the end of a codeword as soon as it arrives (instantaneous): no codeword can be a prefix of another codeword. E.g. 1 and 10 are both prefixes of 101.

Prefix codes: no codeword is a prefix of any other codeword.

- Also known as an instantaneous or self-punctuating code.
- An encoded string can be decoded from left to right without looking ahead to subsequent codewords.
- A prefix code is uniquely decodable (but not all uniquely decodable codes are prefix codes).
- Can be drawn as a tree whose leaves are the codewords.

Four example codes:
1. a=1, b=10, c=100, d=1000 (not a prefix code)
2. a=0, b=10, c=110, d=111 (a prefix code)
3. a=00, b=01, c=10, d=11 (a fixed-length prefix code)
4. a=0, b=01, c=011, d=111 (not a prefix code, though uniquely decodable)
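The prefix property is easy to check mechanically; a minimal sketch applied to the four example codes:

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another (instantaneous code)."""
    for c in codewords:
        for d in codewords:
            if c != d and d.startswith(c):
                return False
    return True

print(is_prefix_code(["1", "10", "100", "1000"]))  # False
print(is_prefix_code(["0", "10", "110", "111"]))   # True
print(is_prefix_code(["00", "01", "10", "11"]))    # True
print(is_prefix_code(["0", "01", "011", "111"]))   # False (yet uniquely decodable)
```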

Limits on prefix codes

- The maximum number of codewords of length l is 2^l.
- If we shorten one codeword, we must lengthen others to retain unique decodability.
- For any uniquely decodable binary code, the codeword lengths l_i satisfy the Kraft inequality: Σ_i 2^(−l_i) ≤ 1.

(Figure: the binary tree of all codewords of length up to 4.)
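The Kraft sum is a one-liner; a quick sketch over two sets of candidate lengths:

```python
def kraft_sum(lengths, n=2):
    # Uniquely decodable codes satisfy sum_i n^(-l_i) <= 1
    return sum(n ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  (complete prefix code: 0, 10, 110, 111)
print(kraft_sum([1, 1, 2]))     # 1.25 (> 1: no uniquely decodable code exists)
```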

Coding example

| source digit | code 1 | code 2 | code 3 |
|---|---|---|---|
| 0 | 0 | 0000 | 000 |
| 1 | 1 | 0001 | 001 |
| 2 | 10 | 0010 | 0110 |
| 3 | 11 | 0011 | 0111 |
| 4 | 100 | 0100 | 0100 |
| 5 | 101 | 0101 | 0101 |
| 6 | 110 | 0110 | 100 |
| 7 | 111 | 0111 | 101 |
| 8 | 1000 | 1000 | 110 |
| 9 | 1001 | 1001 | 111 |
| average word length | 2.6 | 4 | 3.4 |

Code 1 is variable length but not uniquely decodable (and violates the Kraft inequality); code 2 is fixed length, uniquely decodable, instantaneous and a prefix code; code 3 is variable length, uniquely decodable, instantaneous and a prefix code (it satisfies the Kraft inequality with equality). All source digits are equally probable, so the source entropy is log2 10 = 3.32 bits/sym.
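With all ten digits equally probable, the AWL is just the mean codeword length; a sketch checking the three codes from the table (the codeword lists below are transcribed from it):

```python
code1 = ["0", "1", "10", "11", "100", "101", "110", "111", "1000", "1001"]
code2 = [f"{i:04b}" for i in range(10)]  # fixed 4-bit words
code3 = ["000", "001", "0110", "0111", "0100", "0101", "100", "101", "110", "111"]
for code in (code1, code2, code3):
    print(sum(len(w) for w in code) / len(code))
# 2.6, 4.0, 3.4 -- only code 3 is both variable-length and prefix-free
```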

Prefix codes (reminder): variable length; uniquely decodable; instantaneous; can be represented as a tree; no code word is a prefix of another (e.g. if ABAACA is a code word then A, AB, ABA, ABAA, ABAAC cannot be used as code words); the word lengths satisfy the Kraft inequality Σ_i 2^(−l_i) ≤ 1.

Optimal prefix codes

- If Pr(a1) ≥ Pr(a2) ≥ … ≥ Pr(am), then l1 ≤ l2 ≤ … ≤ lm, where l_i = length of the word for symbol a_i.
- At least 2 (and up to n) of the least probable input symbols will have the same prefix and differ only in the last output symbol.
- Every possible sequence of up to l_m − 1 output symbols must either be a code word or have one of its prefixes used as a code word (l_m is the longest word length).
- For a binary code, the optimal word length for a symbol equals its information content, i.e. l_i = log2(1/p_i).

Converse

Conversely, any set of word lengths {l_i} implicitly defines a set of symbol probabilities {q_i = 2^(−l_i)} for which the word lengths {l_i} are optimal. E.g. for the code a=0, b=10, c=110, d=111 the implied probabilities are 1/2, 1/4, 1/8, 1/8.
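The implied probabilities for that example can be computed directly; note they sum to exactly 1, meaning the code is complete:

```python
# Word lengths {1, 2, 3, 3} imply dyadic probabilities q_i = 2^(-l_i)
lengths = {"a": 1, "b": 2, "c": 3, "d": 3}
q = {sym: 2.0 ** -l for sym, l in lengths.items()}
print(q)                # {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
print(sum(q.values()))  # 1.0 -- the Kraft sum: the code is complete
```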

Compression - How close can we get to the entropy?

We can always find a binary prefix code with average word length L satisfying H ≤ L < H + 1.

Let ⌈x⌉ be the smallest integer that is ≥ x, and choose word lengths l_i = ⌈log2(1/p_i)⌉. Clearly 2^(−l_i) ≤ p_i, so Σ_i 2^(−l_i) ≤ Σ_i p_i = 1: the Kraft inequality is satisfied and a prefix code with these lengths exists. Now consider the average word length: L = Σ_i p_i l_i < Σ_i p_i (log2(1/p_i) + 1) = H + 1, while L ≥ Σ_i p_i log2(1/p_i) = H.
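The construction above (the Shannon code) can be sketched and the H ≤ L < H + 1 bound verified numerically; the probabilities below are an arbitrary example:

```python
import math

def shannon_lengths(probs):
    # l_i = ceil(log2(1/p_i)); these satisfy the Kraft inequality because
    # 2^(-l_i) <= p_i sums to at most 1, so a prefix code exists.
    return [math.ceil(math.log2(1 / p)) for p in probs]

probs = [0.45, 0.25, 0.2, 0.1]
H = -sum(p * math.log2(p) for p in probs)
L = sum(p * l for p, l in zip(probs, shannon_lengths(probs)))
assert H <= L < H + 1
print(f"H = {H:.3f}, L = {L:.3f}")
```

For dyadic probabilities the ceiling is exact and L = H; otherwise the rounding costs up to one bit per symbol.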

Huffman prefix code: used for image compression.

General approach: work out necessary conditions for a code to be optimal, then use these to construct the code. From condition (3) of optimal prefix codes (earlier slide):

a_m → x x … x 0 (least probable)
a_(m−1) → x x … x 1 (next least probable)

therefore assign the final digit first. E.g. consider the source:

| Symbol | Probability |
|---|---|
| s1 | 0.1 |
| s2 | 0.25 |
| s3 | 0.2 |
| s4 | 0.45 |

Algorithm

1. Lay out all symbols in a line, one node per symbol.
2. Merge the two least probable symbols into a single node.
3. Add their probabilities and assign the sum to the merged node.
4. Repeat until only one node remains.
5. Assign binary codes working back from the last node, giving 0 to the lower-probability link at each step.
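The steps above can be sketched with a priority queue (a minimal implementation, not the only way to write it; tie-breaking can change the individual codewords but never the average word length):

```python
import heapq

def huffman(probs):
    """Build a Huffman code per the algorithm above: repeatedly merge the
    two least probable nodes, giving the lower-probability one a 0.
    Returns {symbol: codeword}."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p0, _, lo = heapq.heappop(heap)  # lower probability -> prepend 0
        p1, _, hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo.items()}
        merged.update({s: "1" + c for s, c in hi.items()})
        heapq.heappush(heap, (p0 + p1, next_id, merged))
        next_id += 1
    return heap[0][2]

source = {"s1": 0.1, "s2": 0.25, "s3": 0.2, "s4": 0.45}
code = huffman(source)
awl = sum(p * len(code[s]) for s, p in source.items())
print(code, awl)  # word lengths 1, 2, 3, 3; AWL = 1.85 bits/symbol
```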

Example

Start with the nodes s1 (Pr = 0.1), s2 (Pr = 0.25), s3 (Pr = 0.2), s4 (Pr = 0.45). The two least probable symbols, s1 and s3, are merged into a node of probability 0.3.

Example - contd.

Merging continues: the 0.3 node merges with s2 (0.25) to give a node of probability 0.55, which then merges with s4 (0.45) to give the root node of probability 1.

Example - step 5

Working back from the root and assigning 0 to the lower-probability link at each node gives s4 = 0, s2 = 10, s1 = 110, s3 = 111, with average word length 0.45×1 + 0.25×2 + 0.1×3 + 0.2×3 = 1.85 bits/symbol.

Comments

- We can choose a different ordering of 0 and 1 at each merging node, giving 2^m different codes (m = number of merging nodes, i.e. not symbol nodes); 2^3 = 8 in the previous example.
- But the AWL is the same for all of these codes, hence the source entropy and efficiency are also the same.
- What if n (the number of symbols in the code alphabet) is larger than 2? Condition (2) says we can group from 2 to n symbols; condition (3) effectively says we should use groups as large as possible and end with one composite symbol at the end.

Disadvantages of Huffman Code

- We have assumed that the probabilities of our source symbols are known and fixed; in practice symbol frequencies may vary with context (e.g. a Markov source).
- Up to 1 extra bit per symbol is needed, which can be serious if H(A) ≈ 1 bit! E.g. English: entropy is approximately 1 bit per character.
- Beyond symbol codes: arithmetic coding moves away from the idea that one symbol maps to an integer number of bits; see also e.g. Lempel-Ziv coding; not covered in this course.

Another question

Consider a message (a sequence of characters) from {a, b, c, d}, encoded using the code shown. What is the probability that a randomly chosen bit from the encoded message is a 1?

| symbol | probability | code word |
|---|---|---|
| a | 1/2 | 0 |
| b | 1/4 | 10 |
| c | 1/8 | 110 |
| d | 1/8 | 111 |
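One way to check an answer to this question numerically (a simulation sketch, not the analytical derivation): encode a long random message and count the fraction of 1s.

```python
import random

# Simulate a long message from the table above and inspect its encoding.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
probs = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}
random.seed(1)
symbols = random.choices(list(probs), weights=list(probs.values()), k=200_000)
bits = "".join(code[s] for s in symbols)
print(bits.count("1") / len(bits))
```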

Shannon-Fano theorem

Channel capacity:
- The entropy (bits/sec) of the encoder is determined by the entropy of the source (bits/sym).
- If we increase the rate at which the source generates information (bits/sym), eventually we reach the limit of the encoder (bits/sec); at this point the encoder's entropy has reached a limit. This is the channel capacity.

S-F theorem:
- The source has entropy H bits/symbol; the channel has capacity C bits/sec.
- It is possible to encode the source so that its symbols can be transmitted at up to C/H symbols per second, but no faster.
- (General proof in notes.)
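A tiny worked example of the C/H rate limit, with a hypothetical channel capacity (the source entropy is that of the ten-equiprobable-digit source from the earlier coding example):

```python
import math

H = math.log2(10)  # bits per symbol for ten equiprobable digits (~3.32)
C = 100.0          # bits per second -- an assumed channel capacity
print(C / H)       # maximum symbol rate, about 30.1 symbols per second
```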

