
1 exercise in the previous class
Determine the stationary probabilities. Compute the probability that 010 is produced.
[state diagram: three-state Markov source with states A, B, C; edge labels 0/0.4 and 1/0.6 (from A), 0/0.8 and 1/0.2 (from B), 0/0.5 and 1/0.5 (from C)]
Stationary probabilities: α = 0.5γ, β = 0.4α + 0.8β + 0.5γ, γ = 0.6α + 0.2β, α + β + γ = 1, so (α, β, γ) = (0.1, 0.7, 0.2).
P(010) = P(A)P(010|A) + P(B)P(010|B) + P(C)P(010|C) = 0.1 · 0.4 · 0.2 · 0.5 + ... = 0.09
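As a sanity check, here is a small Python sketch (not part of the original slides) that solves the stationary-probability equations and computes P(010). The transition structure below is the one recovered from those equations, so treat it as an assumption about the diagram.

```python
import numpy as np

# Transition structure recovered from the stationary-probability equations
# (an assumption, not taken verbatim from the slide):
#   from A: output 0 -> go to B (prob 0.4),  output 1 -> go to C (prob 0.6)
#   from B: output 0 -> stay in B (prob 0.8), output 1 -> go to C (prob 0.2)
#   from C: output 0 -> go to A (prob 0.5),  output 1 -> go to B (prob 0.5)
states = ['A', 'B', 'C']
trans = {'A': {'0': ('B', 0.4), '1': ('C', 0.6)},
         'B': {'0': ('B', 0.8), '1': ('C', 0.2)},
         'C': {'0': ('A', 0.5), '1': ('B', 0.5)}}

# State-transition matrix T[i, j] = P(next state j | current state i)
T = np.zeros((3, 3))
for i, s in enumerate(states):
    for nxt, p in trans[s].values():
        T[i, states.index(nxt)] += p

# Solve pi = pi T together with the probabilities summing to 1
# (least squares handles the redundant equation).
A = np.vstack([T.T - np.eye(3), np.ones((1, 3))])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(dict(zip(states, pi.round(3))))        # {'A': 0.1, 'B': 0.7, 'C': 0.2}

# P(010) = sum over starting states s of pi(s) * P(010 | start in s)
def seq_prob(start, seq):
    p, s = 1.0, start
    for sym in seq:
        s, q = trans[s][sym]
        p *= q
    return p

print(round(sum(pi[i] * seq_prob(s, '010') for i, s in enumerate(states)), 3))  # 0.09
```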

2 today's class
We define a quantitative measure of information.
step 1: the entropy of information sources
  large entropy → large uncertainty
step 2: information in one (ambiguous) message
  information = (BEFORE entropy) – (AFTER entropy)
step 3: mutual information between two events
  the event here, and the event at the other end of the channel

3 the entropy of a memoryless source
S: a memoryless & stationary information source that produces symbols a1, a2, ..., aM with probabilities p1, p2, ..., pM.
The first-order entropy (一次エントロピー) of S is
  H1(S) = –p1 log2 p1 – p2 log2 p2 – ... – pM log2 pM = –Σ_i pi log2 pi.
Each term –pi log2 pi is nonnegative (非負), so the entropy ≥ 0.
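A minimal Python sketch of this definition (the helper function below is my own, not from the slides):

```python
import math

def entropy(probs):
    """First-order entropy in bits: sum of -p * log2 p (terms with p = 0 contribute 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0   (fair coin)
print(entropy([0.8, 0.2]))   # about 0.722 (a biased source is easier to predict)
print(entropy([1.0]))        # 0.0   (no uncertainty at all)
```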

4 examples of entropy computation

5 what does the entropy represent?
[figure: the binary entropy function H(p) plotted against p; it reaches its maximum of 1.0 at p = 0.5]
large entropy → more uncertain about the result → more difficult to predict

6 general M-ary case
Lemma (補題): For an M-ary information source S,
  max H1(S) = log2 M (attained when all symbols have probability 1/M)
  min H1(S) = 0 (attained when one symbol has probability 1 and the others 0)
Proof (sketch): use recursion to show that H1(S) ≤ log2 M, with equality only for the uniform distribution.
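A quick numerical check of the lemma, here for M = 4 (an illustrative sketch, not from the slides):

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

M = 4
print(entropy([1 / M] * M), math.log2(M))   # 2.0 2.0 -> the maximum is log2 M
print(entropy([1, 0, 0, 0]))                # 0.0     -> the minimum is 0
```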

7 joint entropy
the joint (結合) entropy of SA and SB:
  H1(SA, SB) = –Σ_a Σ_b P(a, b) log2 P(a, b)
Lemma: H1(SA, SB) ≤ H1(SA) + H1(SB).
Lemma: H1(SA, SB) = H1(SA) + H1(SB) if P(SA, SB) = P(SA)P(SB), i.e. if the sources are independent.
[figure: diagrams of the entropies of SA and SB]

8 conditional entropy
the conditional (条件付) entropy of SA conditioned by SB:
  H1(SA|SB) = –Σ_a Σ_b P(a, b) log2 P(a|b)
Lemma: H1(SA|SB) = H1(SA) if P(SA, SB) = P(SA)P(SB).
Lemma: H1(SA, SB) = H1(SA) + H1(SB|SA) = H1(SB) + H1(SA|SB).
[figure: H1(SA, SB) decomposed as H1(SA) + H1(SB|SA), or equivalently H1(SB) + H1(SA|SB)]
To predict the events of SA and SB, first predict the event a of SA, and then predict the event b of SB assuming a is correct.
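The two lemmas can be checked numerically; the joint distribution below is a made-up example, not one from the slides:

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A made-up joint distribution P(a, b) for two binary sources SA and SB.
P = {('0', '0'): 0.4, ('0', '1'): 0.1, ('1', '0'): 0.2, ('1', '1'): 0.3}

PA = {a: sum(p for (x, _), p in P.items() if x == a) for a in '01'}   # marginal of SA
PB = {b: sum(p for (_, y), p in P.items() if y == b) for b in '01'}   # marginal of SB

H_AB = entropy(P.values())
# conditional entropy H1(SA|SB) = -sum_{a,b} P(a,b) log2 P(a|b)
H_A_given_B = sum(-p * math.log2(p / PB[b]) for (a, b), p in P.items() if p > 0)

print(round(H_AB, 4), "<=", round(entropy(PA.values()) + entropy(PB.values()), 4))
print(round(H_AB, 4), "==", round(entropy(PB.values()) + H_A_given_B, 4))   # chain rule
```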

9 extension of information sources
Slides 9 to 16 have little relation to the rest of today's talk, but a strong relation to chapter 2 of this course.
S^n: the n-th order extension of the information source S;
  the set of symbols of S^n is M^n = {(a1, ..., an) ; ai ∈ M}
  P(a1, ..., an) = P(a1) P(a2|a1) ... P(an|a1, ..., a(n–1))
"A symbol of S^n is a block of n symbols of S."
[figure: the output 0 1 0 0 0 1 0 1 of S, with M = {0, 1}, grouped into blocks of S^2, with M^2 = {00, 01, 10, 11}]

10 more entropies
the second-order extension of a coin toss:
  (extended) symbol / probability: HH 0.25, HT 0.25, TH 0.25, TT 0.25
the first-order entropy of this second-order extended source:
  H1(S^2) = 4 × (–0.25 log2 0.25) = 2 bits
This is the difficulty of predicting two symbols as a block.
Twice as difficult, because we have two symbols... really?
For further discussion, define:
  the n-th order entropy of S: Hn(S) = H1(S^n) / n
  the (limit) entropy of S: H(S) = lim_{n→∞} Hn(S)

11 memoryless case
S: a binary memoryless information source with P(0) = 0.8, P(1) = 0.2.
S: symbols 0, 1 with probabilities 0.8, 0.2
  H1(S) = –0.8 log2 0.8 – 0.2 log2 0.2 = 0.72
S^2: symbols 00, 01, 10, 11 with probabilities 0.64, 0.16, 0.16, 0.04
  H1(S^2) = –0.64 log2 0.64 – 0.16 log2 0.16 – 0.16 log2 0.16 – 0.04 log2 0.04 = 1.44
  H2(S) = H1(S^2)/2 = 1.44/2 = 0.72
For this particular example, we have H1(S^n) = 0.72n for any n.
  → Hn(S) = H(S) = 0.72; the entropy equals the first-order entropy.
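A short sketch that reproduces these numbers and checks H1(S^n) = 0.72n for a few values of n (same source with P(0) = 0.8, P(1) = 0.2; not part of the original slides):

```python
import math
from itertools import product

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

p = {'0': 0.8, '1': 0.2}   # the memoryless source from the slide

for n in (1, 2, 3):
    # each block of n symbols has probability = product of its symbol probabilities
    block_probs = [math.prod(p[s] for s in blk) for blk in product('01', repeat=n)]
    print(n, round(entropy(block_probs), 4), round(entropy(block_probs) / n, 4))
# H1(S^n) comes out as 0.7219, 1.4439, 2.1658, ... i.e. about 0.722 * n,
# so Hn(S) = H1(S^n)/n stays at about 0.722 for every n.
```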

12 proof for the memoryless case
Theorem: If S is memoryless & stationary, then H1(S^n) = n H1(S).
Sketch of the proof, for the case n = 2:
  H1(S^2) = –Σ_{x0} Σ_{x1} P(x0, x1) log2 P(x0, x1)
          = –Σ_{x0} Σ_{x1} P(x0)P(x1) (log2 P(x0) + log2 P(x1))    (memoryless: P(x0, x1) = P(x0)P(x1))
          = –Σ_{x0} P(x0) log2 P(x0) – Σ_{x1} P(x1) log2 P(x1)      (the sum of P(x0) is 1, and likewise for P(x1))
          = 2 H1(S).
Corollary: If S is memoryless & stationary, then H(S) = H1(S).

13 extension of Markov sources
[state diagram: two-state Markov source with states A and B; from A, 0/0.9 and 1/0.1; from B, 0/0.4 and 1/0.6]
stationary probabilities: (α, β) = (0.8, 0.2)
  P(0)  = 0.8·0.9 + 0.2·0.4 = 0.80
  P(1)  = 0.8·0.1 + 0.2·0.6 = 0.20
  P(00) = 0.8·0.9·0.9 + 0.2·0.4·0.9 = 0.72
  P(01) = 0.8·0.9·0.1 + 0.2·0.4·0.1 = 0.08
  P(10) = 0.8·0.1·0.4 + 0.2·0.6·0.4 = 0.08
  P(11) = 0.8·0.1·0.6 + 0.2·0.6·0.6 = 0.12
H1(S) = 0.722    H1(S^2) = 1.2914    H2(S) = H1(S^2)/2 = 0.6457
It is easier to predict two symbols as a block than to predict two symbols separately.
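A sketch that recomputes these entropies; the state-transition structure is the one read from the diagram above, so treat that encoding as an assumption:

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Two-state Markov source (as read from the diagram):
#   from A: output 0 -> A (0.9), output 1 -> B (0.1)
#   from B: output 0 -> A (0.4), output 1 -> B (0.6)
trans = {'A': {'0': ('A', 0.9), '1': ('B', 0.1)},
         'B': {'0': ('A', 0.4), '1': ('B', 0.6)}}
pi = {'A': 0.8, 'B': 0.2}                    # stationary probabilities

def block_prob(seq):
    """P(seq) when the source starts from its stationary distribution."""
    total = 0.0
    for s0 in pi:
        p, s = pi[s0], s0
        for sym in seq:
            s, q = trans[s][sym]
            p *= q
        total += p
    return total

P1 = [block_prob(s) for s in ('0', '1')]
P2 = [block_prob(a + b) for a in '01' for b in '01']
print(round(entropy(P1), 4))        # H1(S)   = 0.7219
print(round(entropy(P2), 4))        # H1(S^2) is about 1.291
print(round(entropy(P2) / 2, 4))    # H2(S)   = 0.6457
```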

14 the entropy of Markov sources
For a Markov source, we have H1(S) > H2(S) > ....
Theorem: The n-th order entropy approaches the limit entropy.
[figure: Hn(S) decreasing with n and converging to H(S)]
How to compute the limit entropy H(S) of a Markov source:
1. determine the stationary probabilities.
2. modify the transition edges into self-loops.
3. compute the entropy of each state.
4. take the weighted average of these entropies.

15 example
[state diagram: the same two-state Markov source; from A, 0/0.9 and 1/0.1; from B, 0/0.4 and 1/0.6]
stationary probabilities: (α, β) = (0.8, 0.2)
Treat each state as a memoryless source:
  state A: outputs 0 with prob. 0.9 and 1 with prob. 0.1 → entropy 0.469
  state B: outputs 0 with prob. 0.4 and 1 with prob. 0.6 → entropy 0.971
the weighted average = 0.8 × 0.469 + 0.2 × 0.971 = 0.5694 bit = H(S)
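The same weighted average in code (a sketch assuming the per-state output probabilities shown above):

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

out = {'A': [0.9, 0.1], 'B': [0.4, 0.6]}   # per-state output probabilities
pi  = {'A': 0.8, 'B': 0.2}                 # stationary probabilities

print(round(entropy(out['A']), 3), round(entropy(out['B']), 3))    # 0.469 0.971
print(round(sum(pi[s] * entropy(out[s]) for s in pi), 4))          # 0.5694 bit = H(S)
```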

16 summary of the extension of Markov sources
For a Markov information source...
  the n-th order entropy decreases as n increases.
  the n-th order entropy approaches the limit entropy.
  [figure: Hn(S) converging to H(S)]
This is the case for general information sources with memory. For Markov sources, furthermore, the limit entropy is easily determined.
For information sources with memory, predicting n symbols as a block is easier than predicting one symbol n times separately.

17 today's class
We define a quantitative measure of information.
step 1: the entropy of information sources
  large entropy → large uncertainty
step 2: information in one (ambiguous) message
  information = (BEFORE entropy) – (AFTER entropy)
step 3: mutual information between two events
  the event here, and the event at the other end of the channel

18 information in a message
There was a Tigers game, but you don't know the result. Statistics say P(win) = P(draw) = P(lose) = 1/3.
Mr. A, a Giants fan, tweets: "Tigers did not lose."
BEFORE you know the tweet... large uncertainty:
  P(win) = P(draw) = P(lose) = 1/3
AFTER you know the tweet... the uncertainty is reduced:
  P(win) = P(draw) = 1/2, P(lose) = 0
The uncertainty was reduced because the tweet carried information.

19 use the entropy
BEFORE: P(win) = P(draw) = P(lose) = 1/3
  the entropy is 3 × (–1/3) log2 (1/3) = 1.585 bit
AFTER: P(win) = P(draw) = 1/2, P(lose) = 0
  the entropy is 2 × (–1/2) log2 (1/2) = 1.000 bit
The amount of information in the tweet was
  1.585 – 1.000 = 0.585 bit
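The same BEFORE/AFTER computation as a tiny Python sketch (not part of the original slides):

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

before = [1/3, 1/3, 1/3]   # win / draw / lose before the tweet
after  = [1/2, 1/2, 0]     # after "Tigers did not lose"

print(round(entropy(before), 3))                    # 1.585 bit
print(round(entropy(after), 3))                     # 1.0 bit
print(round(entropy(before) - entropy(after), 3))   # 0.585 bit of information
```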

20 general case
An information source S at a remote site produces a symbol.
You obtain a "hint" about the produced symbol.
The hint tells you that S behaves as another source S'.
The amount of information brought by the hint is H(S) – H(S') bit.

21 another friend
Mr. B, whose tweeting behavior is given by the diagram, tweeted "oh!" yesterday.
[diagram: Mr. B's tweets given the result; win → "won!" (0.5) or "oh!" (0.5), draw → "oh!" (1.0), lose → "oh!" (0.5) or "lost!" (0.5)]
What is the amount of information in "oh!"?
  P(win | oh!) = 0.25, P(draw | oh!) = 0.50, P(lose | oh!) = 0.25
  the entropy AFTER receiving "oh!" is 1.5 bit.
  the information is only 1.585 – 1.5 = 0.085 bit.
Mr. A's tweet "Tigers did not lose" carried 0.585 bit of information.
→ Yesterday, Mr. B was less valuable than Mr. A.

22 another friend, another day
Mr. B tweeted "won!" today. What is the amount of information in "won!"?
  P(win | won!) = 1, P(draw | won!) = 0, P(lose | won!) = 0
  the entropy AFTER receiving "won!" is 0 bit.
  the information is as much as 1.585 – 0 = 1.585 bit.
→ Today, Mr. B is much more valuable than Mr. A.
So which of Mr. A and Mr. B is more valuable to you?
[diagram: Mr. B's tweets given the result, as on the previous slide]

23 the average amount of information
Mr. B:
  "won!" with prob. 1/6 ... 1.585 bit of information
  "oh!" with prob. 2/3 ... 0.085 bit of information
  "lost!" with prob. 1/6 ... 1.585 bit of information
  1.585 × 1/6 + 0.085 × 2/3 + 1.585 × 1/6 = 0.585 bit on average.
Mr. A:
  "not lost" with prob. 2/3 ... 0.585 bit of information
  "lost!" with prob. 1/3 ... 1.585 bit of information
  0.585 × 2/3 + 1.585 × 1/3 = 0.918 bit on average.
Mr. A gives 0.333 bit more information on average.
[diagrams: Mr. B's tweets (win → "won!" 0.5 / "oh!" 0.5, draw → "oh!" 1.0, lose → "oh!" 0.5 / "lost!" 0.5) and Mr. A's tweets (win, draw → "not lost" 1.0, lose → "lost!" 1.0)]
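A sketch that reproduces both averages; the conditional tweet probabilities are the ones read from the diagrams, so treat them as assumptions:

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

prior = {'win': 1/3, 'draw': 1/3, 'lose': 1/3}

# P(tweet | result) for each friend, as read from the diagrams
mr_b = {'win':  {'won!': 0.5, 'oh!': 0.5},
        'draw': {'oh!': 1.0},
        'lose': {'oh!': 0.5, 'lost!': 0.5}}
mr_a = {'win':  {'not lost': 1.0},
        'draw': {'not lost': 1.0},
        'lose': {'lost!': 1.0}}

def average_information(channel):
    # joint distribution P(result, tweet), then the tweet marginal
    joint = {(x, y): prior[x] * p for x, tweets in channel.items() for y, p in tweets.items()}
    p_tweet = {}
    for (x, y), p in joint.items():
        p_tweet[y] = p_tweet.get(y, 0.0) + p
    # average of (entropy BEFORE - entropy AFTER) over the possible tweets
    avg = 0.0
    for y, py in p_tweet.items():
        posterior = [joint[x, y] / py for x in prior if (x, y) in joint]
        avg += py * (entropy(prior.values()) - entropy(posterior))
    return avg

print(round(average_information(mr_b), 3))   # about 0.585 bit
print(round(average_information(mr_a), 3))   # about 0.918 bit
```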

24 what have we computed?
X = {win, draw, lose} ... the results of the game
Y = {won!, oh!, lost!} ... the tweets of Mr. B
The average amount of information we computed is
  I(X; Y) = H(X) – H(X|Y),
the mutual information between X and Y (相互情報量).

25 properties of mutual information
Lemma: I(X; Y) = I(Y; X)
proof:
  I(X; Y) = H(X) – H(X|Y)
          = H(X) – (H(X, Y) – H(Y))
          = H(X) – (H(X) + H(Y|X) – H(Y))
          = H(Y) – H(Y|X)
          = I(Y; X)
Lemma: H(X, Y) = H(X) + H(Y) – I(X; Y)
[figure: Venn-style diagram of H(X), H(Y), H(X|Y), H(Y|X) and I(X; Y) = I(Y; X) inside H(X, Y), with H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)]
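A numerical check of both lemmas on Mr. B's joint distribution (a sketch, not from the slides):

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

# joint distribution P(result, tweet) for Mr. B with the uniform prior
joint = {('win', 'won!'): 1/6, ('win', 'oh!'): 1/6, ('draw', 'oh!'): 1/3,
         ('lose', 'oh!'): 1/6, ('lose', 'lost!'): 1/6}

def marginal(index):
    m = {}
    for key, p in joint.items():
        m[key[index]] = m.get(key[index], 0.0) + p
    return list(m.values())

PX, PY = marginal(0), marginal(1)
H_XY = entropy(joint.values())
I_XY = entropy(PX) - (H_XY - entropy(PY))   # H(X) - H(X|Y)
I_YX = entropy(PY) - (H_XY - entropy(PX))   # H(Y) - H(Y|X)

print(round(I_XY, 3), round(I_YX, 3))                               # both about 0.585
print(round(H_XY, 3), round(entropy(PX) + entropy(PY) - I_XY, 3))   # both about 2.252
```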

26 mutual information and communication
Mr. A and Mr. B are two channels with different characteristics.
[figure: the game result ("win", ...) enters the channel and a tweet ("not lost", "oh!", ...) comes out; the information per tweet is combined by a weighted average]
We evaluated the mutual information between the input and the output of the channel.
The mutual information answers the question: "How much information can we send through the channel?"

27 from mutual information to the capacity
The mutual information changes if the channel is connected to a different information source.
[figure: Mr. A's channel (win, draw → "not lost", lose → "lost!") driven by two different inputs: with P(win, draw, lose) = (1/3, 1/3, 1/3) the mutual information is 0.918 bit; with (1/2, 0, 1/2) it is 1.000 bit]
channel capacity (通信路容量) ... the maximum of the mutual information (→ chapter 3)
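A sketch that evaluates the mutual information of Mr. A's channel for the two input distributions shown above (the channel mapping is deterministic here, so the maximum of 1 bit is easy to find):

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Mr. A viewed as a (deterministic) channel from the game result to the tweet
channel = {'win': 'not lost', 'draw': 'not lost', 'lose': 'lost!'}

def mutual_information(px):
    """I(X; Y) = H(X) - H(X|Y) for the given input distribution px."""
    joint = {}
    for x, y in channel.items():
        joint[(x, y)] = joint.get((x, y), 0.0) + px[x]
    py = {}
    for (x, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    H_X_given_Y = sum(-p * math.log2(p / py[y]) for (x, y), p in joint.items() if p > 0)
    return entropy(px.values()) - H_X_given_Y

print(round(mutual_information({'win': 1/3, 'draw': 1/3, 'lose': 1/3}), 3))  # 0.918 bit
print(round(mutual_information({'win': 1/2, 'draw': 0.0, 'lose': 1/2}), 3))  # 1.0 bit
```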

28 summary of today's class
the entropy of information sources
  large entropy → large uncertainty
information in one (ambiguous) message
  information = (BEFORE entropy) – (AFTER entropy)
mutual information between two events
  the event here, and the event at the other end of the channel

29 exercise
The tables show 100 days of statistics of the weather (X) and two forecasts of it (Y1 and Y2).
Y1 made 60 (= 45 + 15) "sunny" forecasts, but only 45 of those 60 days were really sunny...

  X:          sunny  rain        X:          sunny  rain
  Y1 sunny      45    15         Y2 sunny       0    43
  Y1 rain       12    28         Y2 rain       57     0

Q1: Compute P(X = Y1) and P(X = Y2).
Q2: Compute I(X; Y1) and I(X; Y2).
Q3: Which is the better forecasting?

