
Refinement of Two Fundamental Tools in Information Theory
Raymond W. Yeung
Institute of Network Coding, The Chinese University of Hong Kong
Joint work with Siu Wai Ho and Sergio Verdú

P.2 Discontinuity of Shannon's Information Measures
• Shannon's information measures: H(X), H(X|Y), I(X;Y) and I(X;Y|Z).
• They are described as continuous functions [Shannon 1948] [Csiszár & Körner 1981] [Cover & Thomas 1991] [McEliece 2002] [Yeung 2002].
• All Shannon's information measures are indeed discontinuous everywhere when random variables take values from countably infinite alphabets [Ho & Yeung 2005].
• e.g., X can be any positive integer.

P.3 Discontinuity of Entropy
• Let P_X = {1, 0, 0, ...}, so that H(P_X) = 0, and let P_{X_n} be a sequence of distributions on the positive integers converging to P_X.
• As n → ∞, we have P_{X_n} → P_X (the variational distance vanishes).
• However, H(P_{X_n}) does not converge to H(P_X) = 0; it can even grow without bound.
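A minimal numerical sketch of one such construction (this particular P_{X_n} is an illustrative choice, not necessarily the one used on the slide): put mass 1 - 1/n on the first symbol and spread the remaining 1/n uniformly over 2^(n·n) further symbols. The variational distance to P_X is 2/n → 0, yet the entropy grows roughly like n.

```python
import math

def entropy_bits_of_Pn(n: int) -> float:
    """Entropy (bits) of P_{X_n}: mass 1 - 1/n on symbol 1, and the remaining
    1/n spread uniformly over 2**(n*n) extra symbols (hypothetical construction)."""
    p0 = 1.0 - 1.0 / n
    tail = 1.0 / n
    m = n * n                       # the tail has 2**m atoms, each of mass tail / 2**m
    h = -p0 * math.log2(p0)
    # tail contribution: 2**m atoms, each contributing -(tail/2**m) * log2(tail/2**m)
    h += tail * (m - math.log2(tail))
    return h

for n in (2, 5, 10, 50):
    var_dist = 2.0 / n              # sum over x of |P_{X_n}(x) - P_X(x)|
    print(f"n={n:3d}  variational distance={var_dist:.3f}  H(P_Xn)={entropy_bits_of_Pn(n):.2f} bits")
```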

P.4 Discontinuity of Entropy
• Theorem 1: For any c ≥ 0 and any X taking values in a countably infinite alphabet with H(X) < ∞, there exists a sequence of random variables {X_n} such that
  lim_{n→∞} V(P_{X_n}, P_X) = 0 but lim_{n→∞} H(X_n) = H(X) + c,
  where V(·,·) denotes the variational distance.

P.5 Discontinuity of Entropy
• Theorem 2: For any c ≥ 0 and any X taking values in a countably infinite alphabet with H(X) < ∞, there exists a sequence of random variables {X_n} such that
  lim_{n→∞} D(P_{X_n} ‖ P_X) = 0 but lim_{n→∞} H(X_n) = H(X) + c.

P.6 Pinsker's Inequality
• By Pinsker's inequality, D(P ‖ Q) ≥ V²(P, Q) / (2 ln 2) (with D in bits and V the variational distance), so convergence w.r.t. divergence implies convergence w.r.t. variational distance.
• Therefore, Theorem 2 implies Theorem 1.
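A quick numerical sanity check of Pinsker's inequality (a sketch; the distributions are random and purely illustrative): for each pair (P, Q) it verifies D(P ‖ Q) ≥ V²(P, Q)/(2 ln 2), which is why divergence convergence forces variational convergence.

```python
import math
import random

def kl_bits(p, q):
    """D(P || Q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variational(p, q):
    """V(P, Q) = sum over i of |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_pmf(k):
    w = [random.random() + 1e-9 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

random.seed(0)
for _ in range(5):
    p, q = random_pmf(6), random_pmf(6)
    d, v = kl_bits(p, q), variational(p, q)
    bound = v * v / (2 * math.log(2))      # Pinsker: D(P||Q) >= V^2 / (2 ln 2)
    print(f"D = {d:.4f}   V^2/(2 ln 2) = {bound:.4f}   holds: {d >= bound}")
```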

P.7 Discontinuity of Entropy

P.8 Discontinuity of Shannon's Information Measures
• Theorem 3: For any X, Y and Z taking values in countably infinite alphabets with I(X;Y|Z) < ∞, there exists a sequence {(X_n, Y_n, Z_n)} whose joint distribution converges to that of (X, Y, Z) while I(X_n;Y_n|Z_n) does not converge to I(X;Y|Z).

P.9 Discontinuity of Shannon's Information Measures
[Diagram] Shannon's Information Measures → Typicality, Fano's Inequality → Applications: channel coding theorem, lossless/lossy source coding theorems, etc.

P.10 To Find the Capacity of a Communication Channel
[Diagram] Alice uses typicality to show that the channel capacity is ≥ C_1 (achievability), while Bob uses Fano's inequality to show that the capacity is ≤ C_2 (converse).

P.11 On Countably Infinite Alphabet
[Diagram] Shannon's Information Measures (discontinuous!) → Typicality, Fano's Inequality → Applications: channel coding theorem, lossless/lossy source coding theorems, etc.

12 Typicality
• Weak typicality was first introduced by Shannon [1948] to establish the source coding theorem.
• Strong typicality was first used by Wolfowitz [1964] and then by Berger [1978]. It was further developed into the method of types by Csiszár and Körner [1981].
• Strong typicality possesses stronger properties compared with weak typicality.
• It can be used only for random variables with finite alphabets.

13 Notation
• Consider an i.i.d. source {X_k, k ≥ 1}, where X_k takes values in a countable alphabet X.
• Let P_{X_k} = P_X for all k.
• Assume H(P_X) < ∞.
• Let X = (X_1, X_2, …, X_n).
• For a sequence x = (x_1, x_2, …, x_n) ∈ X^n,
  - N(x; x) is the number of occurrences of x in x,
  - q(x; x) = n^{-1} N(x; x), and
  - Q_X = {q(x; x)} is the empirical distribution of x.
• e.g., x = (1, 3, 2, 1, 1): N(1; x) = 3, N(2; x) = N(3; x) = 1, and Q_X = {3/5, 1/5, 1/5}.
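A short sketch of this notation in code (function names are mine): it computes N(x; x), q(x; x) and the empirical distribution Q_X for the example sequence above.

```python
from collections import Counter

def empirical_distribution(seq):
    """Return {a: q(a; seq)} where q(a; seq) = N(a; seq) / n."""
    n = len(seq)
    counts = Counter(seq)                       # N(a; seq)
    return {a: c / n for a, c in counts.items()}

x = (1, 3, 2, 1, 1)
print(Counter(x))                   # Counter({1: 3, 3: 1, 2: 1}), i.e. N(1;x)=3, N(2;x)=N(3;x)=1
print(empirical_distribution(x))    # {1: 0.6, 3: 0.2, 2: 0.2}, i.e. Q_X = {3/5, 1/5, 1/5}
```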

14 Weak Typicality
• Definition (Weak typicality): For any ε > 0, the weakly typical set W^n_{[X]ε} with respect to P_X is the set of sequences x = (x_1, x_2, …, x_n) ∈ X^n such that
  | -n^{-1} log p(x) - H(X) | ≤ ε.

15 Weak Typicality
• Definition 1 (Weak typicality): For any ε > 0, the weakly typical set W^n_{[X]ε} with respect to P_X is the set of sequences x = (x_1, x_2, …, x_n) ∈ X^n such that
  | -n^{-1} log p(x) - H(X) | ≤ ε.
• Note that -n^{-1} log p(x) = -n^{-1} Σ_{k=1}^n log P_X(x_k) is the empirical entropy of the sequence x, while H(X) = -Σ_x P_X(x) log P_X(x).

16 Asymptotic Equipartition Property
• Theorem 4 (Weak AEP): For any ε > 0:
  1) If x ∈ W^n_{[X]ε}, then 2^{-n(H(X)+ε)} ≤ p(x) ≤ 2^{-n(H(X)-ε)}.
  2) For sufficiently large n, Pr{X ∈ W^n_{[X]ε}} > 1 - ε.
  3) For sufficiently large n, (1 - ε) 2^{n(H(X)-ε)} ≤ |W^n_{[X]ε}| ≤ 2^{n(H(X)+ε)}.
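A sketch of how Weak AEP Property 2 shows up numerically (the source distribution, n, ε and the Monte Carlo setup are illustrative choices): draw i.i.d. sequences, compute the empirical entropy, and estimate the probability of falling in W^n_{[X]ε}.

```python
import math
import random

P = {0: 0.5, 1: 0.25, 2: 0.25}                        # illustrative source distribution
H = -sum(p * math.log2(p) for p in P.values())        # H(X) = 1.5 bits

def empirical_entropy(x):
    """-n^{-1} log2 p(x) for an i.i.d. source with distribution P."""
    return -sum(math.log2(P[a]) for a in x) / len(x)

def is_weakly_typical(x, eps):
    return abs(empirical_entropy(x) - H) <= eps

random.seed(1)
n, eps, trials = 200, 0.1, 2000
symbols, weights = list(P), list(P.values())
hits = sum(is_weakly_typical(random.choices(symbols, weights, k=n), eps)
           for _ in range(trials))
print(f"H(X) = {H}   estimated Pr{{X in W}} ~ {hits / trials:.3f}")   # close to 1 for large n
```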

17 Illustration of AEP
[Figure] X^n is the set of all n-sequences; the typical set of n-sequences has probability ≈ 1, and within it the distribution is ≈ uniform.

18 Strong Typicality
• Strong typicality has been defined in slightly different forms in the literature.
• Definition 2 (Strong typicality): For a finite alphabet X and any δ > 0, the strongly typical set T^n_{[X]δ} with respect to P_X is the set of sequences x = (x_1, x_2, …, x_n) ∈ X^n such that
  Σ_x |q(x; x) - P_X(x)| ≤ δ,
  i.e., the variational distance between the empirical distribution of the sequence x and P_X is small.
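In code, the membership test is just an ℓ1 distance between the empirical distribution and P_X (a sketch; the distribution and δ are illustrative):

```python
from collections import Counter

def strongly_typical(x, P, delta):
    """True iff sum over a of |q(a; x) - P(a)| <= delta (finite alphabet assumed)."""
    n = len(x)
    q = Counter(x)
    support = set(P) | set(q)
    return sum(abs(q.get(a, 0) / n - P.get(a, 0.0)) for a in support) <= delta

P = {0: 0.5, 1: 0.25, 2: 0.25}                         # illustrative distribution
print(strongly_typical((0, 0, 1, 2), P, delta=0.2))    # True: empirical distribution equals P
print(strongly_typical((1, 1, 1, 1), P, delta=0.2))    # False: empirical distribution far from P
```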

19 Asymptotic Equipartition Property
• Theorem 5 (Strong AEP): For a finite alphabet X and any δ > 0:
  1) If x ∈ T^n_{[X]δ}, then 2^{-n(H(X)+η)} ≤ p(x) ≤ 2^{-n(H(X)-η)}, where η → 0 as δ → 0.
  2) For sufficiently large n, Pr{X ∈ T^n_{[X]δ}} > 1 - δ.
  3) For sufficiently large n, (1 - δ) 2^{n(H(X)-η)} ≤ |T^n_{[X]δ}| ≤ 2^{n(H(X)+η)}.

P.20 Breakdown of Strong AEP
• If strong typicality is extended (in the natural way) to countably infinite alphabets, the strong AEP no longer holds.
• Specifically, Property 2 holds but Properties 1 and 3 do not.

21 Typicality
[Diagram] For a finite alphabet X, within X^n:
• Weak typicality: | -n^{-1} log p(x) - H(X) | ≤ ε (empirical entropy close to H(X)).
• Strong typicality: Σ_x |q(x; x) - P_X(x)| ≤ δ (empirical distribution close to P_X).

22 Unified Typicality
[Diagram] For a countably infinite alphabet X, within X^n, neither notion contains the other:
• Weak typicality: empirical entropy close to H(X).
• Strong typicality: empirical distribution close to P_X.
• There exist sequences x satisfying one condition but not the other.

23 Unified Typicality
[Diagram] For a countably infinite alphabet X, within X^n, the unified typical set is contained in both the weakly typical set and the strongly typical set.

24 Unified Typicality
• Definition 3 (Unified typicality): For any ε > 0, the unified typical set U^n_{[X]ε} with respect to P_X is the set of sequences x = (x_1, x_2, …, x_n) ∈ X^n such that
  D(Q_X ‖ P_X) + |H(Q_X) - H(P_X)| ≤ ε.
• Weak typicality: |D(Q_X ‖ P_X) + H(Q_X) - H(P_X)| ≤ ε.
• Strong typicality: Σ_x |q(x; x) - P_X(x)| ≤ δ.
• Each typicality corresponds to a "distance measure".
• Entropy is continuous w.r.t. the distance measure induced by unified typicality.
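A sketch comparing the three "distance measures" on a toy example (the numbers are mine; the unified measure follows Definition 3 as written above, and a small unified distance forces both the weak and the strong conditions to hold):

```python
import math
from collections import Counter

def entropy(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def divergence(q, p):
    """D(Q || P) in bits; infinite if Q puts mass outside the support of P."""
    return sum(qv * math.log2(qv / p[a]) if a in p and p[a] > 0 else math.inf
               for a, qv in q.items() if qv > 0)

def typicality_measures(x, P):
    n = len(x)
    Q = {a: c / n for a, c in Counter(x).items()}
    weak = abs(divergence(Q, P) + entropy(Q) - entropy(P))    # = |empirical entropy - H(X)|
    strong = sum(abs(Q.get(a, 0.0) - P.get(a, 0.0)) for a in set(P) | set(Q))
    unified = divergence(Q, P) + abs(entropy(Q) - entropy(P))
    return weak, strong, unified

P = {0: 0.5, 1: 0.25, 2: 0.25}
w, s, u = typicality_measures((0, 0, 0, 1, 1, 2, 2, 2), P)
print(f"weak = {w:.3f}   strong = {s:.3f}   unified = {u:.3f}")
```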

25 Asymptotic Equipartition Property
• Theorem 6 (Unified AEP): For any ε > 0:
  1) If x ∈ U^n_{[X]ε}, then 2^{-n(H(X)+ε)} ≤ p(x) ≤ 2^{-n(H(X)-ε)}.
  2) For sufficiently large n, Pr{X ∈ U^n_{[X]ε}} > 1 - ε.
  3) For sufficiently large n, (1 - ε) 2^{n(H(X)-ε)} ≤ |U^n_{[X]ε}| ≤ 2^{n(H(X)+ε)}.

26 Unified Typicality
• Theorem 7: For any x ∈ X^n, if x ∈ U^n_{[X]ε}, then x ∈ W^n_{[X]ε} and x ∈ T^n_{[X]δ}, where δ → 0 as ε → 0.

27 Unified Joint Typicality
• Consider a bivariate information source {(X_k, Y_k), k ≥ 1}, where the (X_k, Y_k) are i.i.d. with generic distribution P_XY.
• We use (X, Y) to denote the pair of generic random variables.
• Let (X, Y) = ((X_1, Y_1), (X_2, Y_2), …, (X_n, Y_n)).
• For the pair of sequences (x, y), the empirical distribution is Q_XY = {q(x, y; x, y)}, where q(x, y; x, y) = n^{-1} N(x, y; x, y).

28 Unified Joint Typicality
• Definition 4 (Unified joint typicality): For any ε > 0, the unified jointly typical set U^n_{[XY]ε} with respect to P_XY is the set of sequences (x, y) ∈ X^n × Y^n such that
  D(Q_XY ‖ P_XY) + |H(Q_XY) - H(P_XY)| + |H(Q_X) - H(P_X)| + |H(Q_Y) - H(P_Y)| ≤ ε.
• This definition cannot be simplified.

29 Conditional AEP
• Definition 5: For any x ∈ U^n_{[X]ε}, the conditional typical set of Y is defined as
  U^n_{[Y|X]ε}(x) = { y ∈ Y^n : (x, y) ∈ U^n_{[XY]ε} }.
• Theorem 8: For x ∈ U^n_{[X]ε}, if |U^n_{[Y|X]ε}(x)| ≥ 1, then
  2^{n(H(Y|X) - ν)} ≤ |U^n_{[Y|X]ε}(x)| ≤ 2^{n(H(Y|X) + ν)},
  where ν → 0 as ε → 0 and then n → ∞.

P.30 Illustration of Conditional AEP

31 Applications
• Rate-distortion theory
  - A version of the rate-distortion theorem was proved via strong typicality [Cover & Thomas 1991] [Yeung 2008].
  - It can now be easily generalized to countably infinite alphabets.
• Multi-source network coding
  - The achievable information rate region for the multi-source network coding problem was proved via strong typicality [Yeung 2008].
  - It can now be easily generalized to countably infinite alphabets.

P.32 Fano's Inequality
• Fano's inequality: For discrete random variables X and Y taking values in the same alphabet X = {1, 2, …, M}, let ε = Pr{X ≠ Y}. Then
  H(X|Y) ≤ h(ε) + ε log(M - 1),
  where h(x) = -x log x - (1 - x) log(1 - x) for 0 < x < 1 and h(0) = h(1) = 0.
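A small sketch that evaluates the right-hand side of Fano's inequality for a few error probabilities (the ε values and the alphabet size M = 4 are illustrative):

```python
import math

def binary_entropy(x: float) -> float:
    """h(x) = -x log2 x - (1 - x) log2 (1 - x), with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def fano_bound(eps: float, M: int) -> float:
    """Fano's upper bound on H(X|Y): h(eps) + eps * log2(M - 1)."""
    return binary_entropy(eps) + eps * math.log2(M - 1)

for eps in (0.01, 0.1, 0.3):
    print(f"eps = {eps}:  H(X|Y) <= {fano_bound(eps, M=4):.3f} bits")
```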

P.33 Motivation 1
• This upper bound on H(X|Y) is not tight.
• For fixed P_X and ε, one can always find a joint distribution with Pr{X ≠ Y} = ε such that H(X|Y) is strictly below Fano's bound.
• Then we can ask: for fixed P_X and ε, what is the maximum possible value of H(X|Y)?

P.34 Motivation 2
• If X is countably infinite, Fano's inequality no longer gives an upper bound on H(X|Y).
• It is possible that Pr{X_n ≠ Y_n} → 0 while H(X_n|Y_n) does not vanish, which can be explained by the discontinuity of entropy.
• For example, one can construct X_n and Y_n with H(X_n|Y_n) = H(X_n) → ∞ while Pr{X_n ≠ Y_n} → 0.
• Under what conditions does Pr{X ≠ Y} → 0 imply H(X|Y) → 0 for countably infinite alphabets?

P.35 Tight Upper Bound on H(X|Y)
• Theorem 9: For fixed P_X and ε = Pr{X ≠ Y}, H(X|Y) admits a tight upper bound that depends only on ε and P_X. (This is the simplest of the 3 cases.)

P.36 Generalizing Fano's Inequality
• Fano's inequality [Fano 1952] gives an upper bound on the conditional entropy H(X|Y) in terms of the error probability ε = Pr{X ≠ Y}.
• [Plot] H(X|Y) versus ε for P_X = [0.4, 0.4, 0.1, 0.1], comparing Fano's bound [Fano 1952] with the tight bound [Ho & Verdú 2008].

P.37 Generalizing Fano's Inequality
• e.g., X is a Poisson random variable with mean equal to 10.
• Fano's inequality no longer gives an upper bound on H(X|Y).
• [Plot] H(X|Y) versus ε for the Poisson source.

P.38 Generalizing Fano's Inequality
• e.g., X is a Poisson random variable with mean equal to 10.
• Fano's inequality no longer gives an upper bound on H(X|Y).
• [Plot] H(X|Y) versus ε for the Poisson source, now including the tight bound [Ho & Verdú 2008].

P.39 Joint Source-Channel Coding
[Diagram] Encoder → Channel → Decoder, forming a k-to-n joint source-channel code.

P.40 Error Probabilities
• The average symbol error probability is defined as λ_k = k^{-1} Σ_{i=1}^k Pr{S_i ≠ Ŝ_i}.
• The block error probability is defined as ε_k = Pr{(S_1, …, S_k) ≠ (Ŝ_1, …, Ŝ_k)}.
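A sketch that estimates both error probabilities from simulated source/reconstruction pairs (the binary source and the 5% symbol-flip "channel" are illustrative): it makes the difference between λ_k and ε_k concrete, since a block is in error as soon as any one of its k symbols is.

```python
import random

def symbol_and_block_error(pairs):
    """pairs: list of (s, s_hat) tuples of equal-length source/reconstruction blocks.
    Returns (average symbol error probability, block error probability)."""
    k = len(pairs[0][0])
    sym_errors = sum(sum(a != b for a, b in zip(s, s_hat)) for s, s_hat in pairs)
    blk_errors = sum(s != s_hat for s, s_hat in pairs)
    return sym_errors / (k * len(pairs)), blk_errors / len(pairs)

random.seed(2)
k, trials, flip = 10, 5000, 0.05
pairs = []
for _ in range(trials):
    s = tuple(random.randint(0, 1) for _ in range(k))
    s_hat = tuple(b ^ (random.random() < flip) for b in s)    # each symbol flipped w.p. 0.05
    pairs.append((s, s_hat))
lam, eps = symbol_and_block_error(pairs)
print(f"symbol error ~ {lam:.3f}   block error ~ {eps:.3f}")  # about 0.05 vs 1 - 0.95**10 ~ 0.40
```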

P.41 Symbol Error Rate
• Theorem 10: For any discrete memoryless source and general channel, the rate of a k-to-n joint source-channel code with average symbol error probability λ_k is bounded in terms of λ_k and an auxiliary random variable S*, where S* is constructed from {S_1, S_2, …, S_k}.

P.42 Block Error Rate
• Theorem 11: For any general discrete source and general channel, the block error probability ε_k of a k-to-n joint source-channel code admits an explicit lower bound in terms of the source and the channel.

P.43 Information Theoretic Security
• Weak secrecy has been considered in [Csiszár & Körner 78, Broadcast channel] and some seminal papers.
• [Wyner 75, Wiretap channel I] only stated that "a large value of the equivocation implies a large value of P_ew", where the equivocation refers to the wiretapper's conditional entropy of the source message given its observation, and P_ew denotes the wiretapper's error probability.
• It is important to clarify what exactly weak secrecy implies.

P.44 Weak Secrecy
• E.g., P_X = (0.4, 0.4, 0.1, 0.1).
• [Plot] H(X|Y) versus ε = P[X ≠ Y]: Fano's bound [Fano 1952], the tight bound [Ho & Verdú 2008], and the level H(X).

P.45 Weak Secrecy
• Theorem 12: For any discrete stationary memoryless source (i.i.d. source) with distribution P_X, if the weak secrecy condition holds (the normalized equivocation tends to the source entropy), then the error probability tends to the largest value achievable for P_X.
• Remark:
  - Weak secrecy together with only the stationary source assumption is insufficient to show the maximum error probability.
  - The proof is based on the tight upper bound on H(X|Y) in terms of the error probability.

P.46 Summary
[Diagram] Shannon's Information Measures → Typicality (Weak Typicality, Strong Typicality), Fano's Inequality → Applications: channel coding theorem, lossless/lossy source coding theorems.

P.47 On Countably Infinite Alphabet
[Diagram] Shannon's Information Measures (discontinuous!) → Typicality (Weak Typicality only) → Applications: channel coding theorem, lossless/lossy source coding theorem.

P.48 Unified Typicality
[Diagram] Shannon's Information Measures → Typicality (Unified Typicality) → Applications: channel coding theorem, MSNC/lossy source coding theorems.

P.49 Unified Typicality and Generalized Fano's Inequality
[Diagram] Shannon's Information Measures → Typicality (Unified Typicality), Generalized Fano's Inequality → Applications: MSNC/lossy source coding theorems, results on JSCC and information-theoretic security.

P.50 A lot of fundamental research in information theory is still waiting for us to investigate. Perhaps...

P.51 References
• S.-W. Ho and R. W. Yeung, "On the Discontinuity of the Shannon Information Measures," IEEE Trans. Inform. Theory, vol. 55, no. 12, pp. 5362–5374, Dec. 2009.
• S.-W. Ho and R. W. Yeung, "On Information Divergence Measures and a Unified Typicality," IEEE Trans. Inform. Theory, vol. 56, no. 12, pp. 5893–5905, Dec. 2010.
• S.-W. Ho and S. Verdú, "On the Interplay between Conditional Entropy and Error Probability," IEEE Trans. Inform. Theory, vol. 56, no. 12, pp. 5930–5942, Dec. 2010.
• S.-W. Ho, "On the Interplay between Shannon's Information Measures and Reliability Criteria," in Proc. IEEE Int. Symposium on Inform. Theory (ISIT 2009), Seoul, Korea, June 28–July 3, 2009.
• S.-W. Ho, "Bounds on the Rates of Joint Source-Channel Codes for General Sources and Channels," in Proc. IEEE Inform. Theory Workshop (ITW 2009), Taormina, Italy, Oct. 2009.

P.52 Q & A

P.53 Why Countably Infinite Alphabet?
• An important mathematical theory can provide insights that cannot be obtained by other means.
• Many problems involve random variables taking values in countably infinite alphabets.
• The finite alphabet is a special case.
• Benefits: tighter bounds, faster convergence rates, etc.
• In source coding, the alphabet size can be very large, infinite or unknown.

P.54 Discontinuity of Entropy
• Entropy is a measure of uncertainty.
• We can be more and more sure that a particular event will happen as time goes on, while at the same time the uncertainty of the whole picture keeps increasing.
• If one finds the above statement counter-intuitive, the concept that entropy is continuous may be rooted in one's mind.
• The limiting probability distribution may not fully characterize the asymptotic behavior of a Markov chain.

P.55 Discontinuity of Entropy
Suppose a child hides in a shopping mall whose floor plan is shown in the next slide. In each case, the chance that he hides in a particular room is directly proportional to the size of the room. We are only interested in which room the child is in, not his exact position inside the room. In which case do you expect it is easiest to locate the child?

P.56
[Floor plans] Case A: 1 blue room + 2 green rooms. Case B: 1 blue room + 16 green rooms. Case C: 1 blue room + … green rooms. Case D: 1 blue room + … green rooms.
[Table] For Cases A–D: the chance that the child is in the blue room and the chance that he is in a green room.

P.57 Discontinuity of Entropy
From Case A to Case D, the difficulty increases. By the Shannon entropy, the uncertainty is increasing even though the probability of the child being in the blue room is also increasing. We can continue this construction and make the chance of the blue room approach 1! The critical assumption is that the number of rooms can be unbounded. So we have seen that "there is a very sure event" and "large uncertainty of the whole picture" can exist at the same time. Imagine a city where everyone has a normal life every day with probability 0.99. With probability 0.01, however, any kind of accident beyond our imagination can happen. Would you feel a big uncertainty about your life if you were living in that city?
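A sketch in the spirit of the four cases, with made-up probabilities and room counts (the actual numbers on the slide are not reproduced here): the probability of the blue room rises from case to case, yet the entropy of "which room" keeps growing because the number of green rooms explodes.

```python
import math

def room_entropy(p_blue: float, n_green: int) -> float:
    """Entropy (bits) of the room index: blue room w.p. p_blue,
    the remaining mass 1 - p_blue spread uniformly over n_green green rooms."""
    q = 1.0 - p_blue
    return -p_blue * math.log2(p_blue) + q * (math.log2(n_green) - math.log2(q))

# Hypothetical cases: P(blue) increases while the green rooms multiply.
cases = [("A", 0.50, 2), ("B", 0.70, 16), ("C", 0.90, 2**25), ("D", 0.99, 2**400)]
for name, p_blue, n_green in cases:
    print(f"Case {name}: P(blue) = {p_blue:.2f}   H = {room_entropy(p_blue, n_green):.2f} bits")
```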

P.58 Weak secrecy is insufficient to show the maximum error probability.
Example 1: Let W, V and the X_i be binary random variables. Suppose W and V are independent and uniform, and let the X_i be defined from W and V.

P.59 Example 1 (continued): Let

Y_1    Y_2    Y_3    Y_4    …
 =      =      =      =
X_1    X_4    X_9    X_16
X_2    X_3    X_8    X_15
X_5    X_6    X_7    X_14
X_10   X_11   X_12   X_13

i.e., each Y_j is defined through the X_i listed in its column. Then weak secrecy holds even though the error probability is not maximal.

P.60 Joint Unified Typicality
[Example] Joint distributions Q = {q(x, y)} and P = {p(x, y)} on X × Y with D(Q ‖ P) ≪ 1. Question: can the defining condition be changed? Ans: it can be changed to …

P.61 Joint Unified Typicality
[Example] A second pair of joint distributions Q = {q(x, y)} and P = {p(x, y)} on X × Y with D(Q ‖ P) ≪ 1. Question: can the defining condition be changed? Ans: it can be changed to …

P.62 Asymptotic Equipartition Property
• Theorem 5 (Consistency): For any (x, y) ∈ X^n × Y^n, if (x, y) ∈ U^n_{[XY]ε}, then x ∈ U^n_{[X]ε} and y ∈ U^n_{[Y]ε}.
• Theorem 6 (Unified JAEP): For any ε > 0:
  1) If (x, y) ∈ U^n_{[XY]ε}, then 2^{-n(H(X,Y)+ε)} ≤ p(x, y) ≤ 2^{-n(H(X,Y)-ε)}.
  2) For sufficiently large n, Pr{(X, Y) ∈ U^n_{[XY]ε}} > 1 - ε.
  3) For sufficiently large n, (1 - ε) 2^{n(H(X,Y)-ε)} ≤ |U^n_{[XY]ε}| ≤ 2^{n(H(X,Y)+ε)}.