§1 Entropy and mutual information


§1 Entropy and mutual information
§1.1 Discrete random variables
 §1.1.1 Discrete memoryless source and entropy
 §1.1.2 Discrete memoryless channel and mutual information
§1.2 Discrete random vectors

§1.1.1 Discrete memoryless source and entropy 1. DMS (Discrete memoryless source). Probability space. Example 1.1.1: let X represent the outcome of a single roll of a fair die.
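For concreteness, the probability space of Example 1.1.1 (the slide's formula image is not reproduced in the transcript; this is the standard form):

\begin{pmatrix} X \\ P(X) \end{pmatrix} =
\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{pmatrix}.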

§1.1.1 Discrete memoryless source and entropy 2. Self-information. Example 1.1.2 (drawing balls, worked on the blackboard as a single guess and a double guess): source X contains red and white balls; source Y contains red, white, blue and black balls. Analyse the uncertainty of a red ball being drawn from X and from Y.

§1.1.1 Discrete memoryless source and entropy 2. Self-information. The function I(ai) = f[p(ai)] should satisfy the following conditions:
1) I(ai) is a monotone decreasing function of p(ai): if p(a1) > p(a2), then I(a1) < I(a2);
2) if p(ai) = 1, then I(ai) = 0;
3) if p(ai) = 0, then I(ai) → ∞;
4) if p(ai aj) = p(ai) p(aj), then I(ai aj) = I(ai) + I(aj).

§1.1.1 Discrete memoryless source and entropy Self-information: I(ai) = -log p(ai), measured in bits (base-2 logarithm), nats (base e) or harts (base 10). [Plot: I(ai) as a decreasing function of p(ai), equal to 0 at p(ai) = 1.] Remarks: I(ai) is the measure of the uncertainty about the outcome ai before it is observed, and the measure of the information that the occurrence of ai provides; if a and b are statistically independent, I(ab) = I(a) + I(b).

§1.1.1 Discrete memoryless source and entropy Definition: Suppose X is a discrete random variable whose range R = {a1, a2, …} is finite or countable, and let p(ai) = P{X = ai}. The entropy of X is defined by H(X) = -Σi p(ai) log p(ai). Entropy measures the average uncertainty (or randomness) about X, and equally the average amount of information provided by an observation of X.

§1.1.1 Discrete memoryless source and entropy Entropy: the amount of "information" provided by an observation of X. Example 1.1.3: a bag contains 100 balls, 80% red and the rest white. We draw one ball at random; how much information does each drawing provide? Let X represent the colour of the ball, a1 = red, a2 = white. Then H(X) = H(0.8, 0.2) = 0.722 bit/sig.
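A minimal sketch (not part of the slides) that checks the number in Example 1.1.3 with the entropy definition above:

```python
# Entropy in bits for Example 1.1.3: p(red) = 0.8, p(white) = 0.2.
from math import log2

def entropy(probs):
    """H(X) in bits, using the convention 0*log(1/0) = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.8, 0.2]))  # ~0.7219 bit/sig, matching the slide's 0.722
```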

§1.1.1 Discrete memoryless source and entropy Entropy: the average "uncertainty" or "randomness" about X. Example 1.1.4.

§1.1.1 Discrete memoryless source and entropy Notes: 1) units: bit/sig, nat/sig, hart/sig; 2) if p(ai) = 0, the term p(ai) log [1/p(ai)] is taken to be 0; 3) if R is infinite, H(X) may be +∞.

§1.1.1 Discrete memoryless source and entropy Example 1.1.5: entropy of a binary source (BS), H(X) = H(p, 1-p) = -p log p - (1-p) log(1-p). Viewed as a function of the probability vector (p1, p2, …, pr), H(p1, p2, …, pr) is called the entropy function.

§1.1.1 Discrete memoryless source and entropy 4. The properties of entropy. Theorem 1.1 (Theorem 1.1 in textbook): let X assume values in R = {x1, x2, …, xr}. Then:
1) H(X) ≥ 0 (non-negativity);
2) H(X) = 0 iff pi = 1 for some i (certainty);
3) H(X) ≤ log r, with equality iff pi = 1/r for all i (extremum; the basis of data compression).
Proof:
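A sketch of the proof of property 3), using the standard inequality ln z ≤ z - 1 (the slide's own proof is not reproduced in the transcript):

H(X) - \log r = \sum_{i=1}^{r} p_i \log\frac{1}{r p_i}
\le \log e \sum_{i=1}^{r} p_i\left(\frac{1}{r p_i} - 1\right) = 0,

with equality iff r p_i = 1 for all i, i.e. iff the distribution is uniform.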

§1.1.1 Discrete memoryless source and entropy 4. The properties of entropy. 4) Symmetry: H(p1, p2, …, pr) is unchanged under any permutation of its arguments. Example 1.1.6: let X, Y, Z be discrete random variables whose probability vectors are rearrangements of one another; they all have the same entropy.

§1.1.1 Discrete memoryless source and entropy 4. The properties of entropy. 5) Additivity: if X and Y are independent, then H(XY) = H(X) + H(Y). Proof:

§1.1.1 Discrete memoryless source and entropy Proof: for the joint source XY with independent components, p(xy) = p(x)p(y), so H(XY) = -Σx Σy p(x)p(y)[log p(x) + log p(y)] = H(X) + H(Y).

§1.1.1 Discrete memoryless source and entropy 4. The properties of entropy. 6) Convexity. Theorem 1.2: the entropy function H(p1, p2, …, pr) is a concave (convex-∩) function of the probability vector (p1, p2, …, pr). Example 1.1.5 (continued), entropy of a BS: [plot of H(p) against p, rising from 0 at p = 0 to its maximum of 1 bit at p = 1/2 and returning to 0 at p = 1].
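A small numerical illustration of the concavity claim (this check is an addition, not on the slide): for the binary entropy function, the value at the average argument exceeds the average of the values.

```python
# Concavity check for the binary entropy function H(p).
from math import log2

def h(p):
    """Binary entropy H(p, 1-p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

p1, p2 = 0.1, 0.6
mid = h((p1 + p2) / 2)           # entropy at the averaged argument
chord = (h(p1) + h(p2)) / 2      # average of the two entropies
print(mid, chord, mid >= chord)  # True: H((p1+p2)/2) >= (H(p1)+H(p2))/2
```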

§1.1.1 Discrete memoryless source and entropy 5. Conditional entropy. Definition: let X, Y be a pair of random variables with (X,Y) ~ p(x,y). The conditional entropy of X given Y is defined by H(X|Y) = -Σx,y p(x,y) log p(x|y) = Σy p(y) H(X|Y=y).

§1.1.1 Discrete memoryless source and entropy 5. Conditional entropy. Analysis:

§1.1.1 Discrete memoryless source and entropy 5. Conditional entropy. Example 1.1.7: pX(0) = 2/3, pX(1) = 1/3, and the channel maps X to Y ∈ {0, ?, 1} with p(0|0) = 3/4, p(?|0) = 1/4, p(1|1) = 1/2, p(?|1) = 1/2. Then H(X) = H(2/3, 1/3) = 0.9183 bit/sig, H(X|Y=0) = 0, H(X|Y=1) = 0, H(X|Y=?) = H(1/2, 1/2) = 1 bit/sig, and H(X|Y) = 1/3 bit/sig.
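A hedged check of Example 1.1.7, assuming the transition probabilities read off the slide diagram (p(0|0) = 3/4, p(?|0) = 1/4, p(1|1) = 1/2, p(?|1) = 1/2):

```python
# Verify H(X|Y) = 1/3 bit/sig for Example 1.1.7.
from math import log2

px = {0: 2/3, 1: 1/3}
pyx = {0: {'0': 3/4, '?': 1/4, '1': 0.0},
       1: {'0': 0.0, '?': 1/2, '1': 1/2}}

# joint p(x, y) and output marginal p(y)
pxy = {(x, y): px[x] * pyx[x][y] for x in px for y in pyx[x]}
py = {y: sum(pxy[(x, y)] for x in px) for y in ['0', '?', '1']}

# H(X|Y) = -sum p(x,y) log p(x|y)
hxy = -sum(p * log2(p / py[y]) for (x, y), p in pxy.items() if p > 0)
print(hxy)  # ~0.3333 bit/sig, matching the slide's H(X|Y) = 1/3
```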

§1.1.1 Discrete memoryless source and entropy 5. Conditional entropy. Theorem 1.3 (conditioning reduces entropy): H(X|Y) ≤ H(X), with equality iff X and Y are independent. Proof:

Review. KeyWords: measure of information; self-information; entropy; properties of entropy; conditional entropy.

Homework P44: T1.1, P44: T1.4, P44: T1.6. 4. Let X be a random variable taking on a finite number of values. What is the relationship between H(X) and H(Y) if (1) Y = 2X? (2) Y = cos X?

Homework

Homework 6. Given a chessboard with 8×8 = 64 squares, a chessman is placed on a square at random and we have to guess its location. Find the uncertainty of the result. If every square is labelled by its row and column number and we already know the row number of the chessman, what is the remaining uncertainty?

Homework (thinking): Coin flip. A fair coin is flipped until the first head occurs. Let X denote the number of flips required. Find the entropy H(X) in bits. Hint:

§1 Entropy and mutual information
§1.1 Discrete random variables
 §1.1.1 Discrete memoryless source and entropy
 §1.1.2 Discrete memoryless channel and mutual information
§1.2 Discrete random vectors

§1.1.2 Discrete memoryless channel and mutual information [Diagram: input X → channel p(y∣x) → output Y.]

§1.1.2 Discrete memoryless channel and mutual information 1. DMC (Discrete Memoryless Channel). The model of the DMC: r input symbols {0, 1, …, r-1}, s output symbols {0, 1, …, s-1}, connected by the transition probabilities p(y|x). [Diagram: each input fans out to the outputs with probabilities p(y|x).]

§1.1.2 Discrete memoryless channel and mutual information 1. DMC (Discrete Memoryless Channel). Representation of the DMC by a graph: input nodes x joined to output nodes y by edges labelled with the transition probabilities p(y|x), where p(y|x) ≥ 0 for all x, y and Σy p(y|x) = 1 for all x.

§1.1.2 Discrete memoryless channel and mutual information 1. DMC (Discrete Memoryless Channel). Representation of the DMC by its transition probability matrix: the r × s matrix P = [p(bj|ai)], each of whose rows sums to 1.

§1.1.2 Discrete memoryless channel and mutual information 1. DMC (Discrete Memoryless Channel). Representation of the DMC by formula: p(bj|ai) = P{Y = bj | X = ai}.

§1.1.2 Discrete memoryless channel and mutual information 1. DMC (Discrete Memoryless Channel). Example 1.1.8: BSC (Binary Symmetric Channel), r = s = 2, with p(0|0) = p(1|1) = 1-p and p(0|1) = p(1|0) = p. [Diagram: inputs 0 and 1 cross over to the opposite output with probability p.]
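Equivalently, as a transition probability matrix (a restatement of the probabilities above, not an extra formula from the slide):

P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.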

§1.1.2 Discrete memoryless channel and mutual information 1. DMC (Discrete Memoryless Channel). Example 1.1.9: BEC (Binary Erasure Channel). [Diagram: inputs 0 and 1 map either to themselves or to the erasure symbol "?".]

§1.1.2 Discrete memoryless channel and mutual information 1. DMC (Discrete Memoryless Channel). Example 1.1.9: BEC (Binary Erasure Channel), r = 2, s = 3, with p(0|0) = p, p(?|0) = 1-p, p(1|1) = q, p(?|1) = 1-q.

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Definition: I(X;Y) = H(X) - H(X|Y) is the average mutual information: the reduction in uncertainty about X conveyed by the observation of Y, i.e. the information about X obtained from Y. Here H(X) is the entropy of the input and H(X|Y), the equivocation, is computed from the backward probabilities p(ai|bj) of the channel p(y∣x).

§1.1.2 Discrete memoryless channel and mutual information 2. average mutual information definition I(X;Y) = H(X) – H(X|Y)
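Written out in full (a standard expansion consistent with the definition above; the slide's own formula image is not reproduced):

I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x\mid y)}{p(x)}
       = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}.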

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Definition: I(X;Y) versus I(x;y). The mutual information between individual values is I(x;y) = log [p(x|y)/p(x)], and the average mutual information is its expectation, I(X;Y) = EXY[I(x;y)]. Relation between I(X;Y) and H(X): I(X;Y) ≤ H(X), and taking Y = X gives I(X;X) = H(X).

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Properties. 1) Non-negativity of average mutual information. Theorem 1.4 (Theorem 1.3 in textbook): for any discrete random variables X and Y, I(X;Y) ≥ 0; moreover, I(X;Y) = 0 iff X and Y are independent. In other words, we do not expect to be misled on average by observing the output of the channel. Proof:

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Properties: a totally lossy channel, I(X;Y) = 0. Example: a cryptosystem. [Diagram: source S → encrypt (with key) → channel → decrypt → destination D, with a listener-in observing Y.] Caesar cryptography: message "arrive at four", ciphertext "duulyh dw irxu".
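A small illustrative sketch (the shift-by-3 Caesar cipher is an assumption inferred from the ciphertext above, not spelled out on the slide):

```python
# Reproduce the Caesar ciphertext shown above.
def caesar(text, shift=3):
    """Shift each lowercase letter by `shift` positions; leave other characters unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        else:
            out.append(ch)
    return ''.join(out)

print(caesar("arrive at four"))  # duulyh dw irxu
```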

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Properties. 2) Symmetry: I(X;Y) = I(Y;X). 3) Relationship between entropy and average mutual information (mnemonic Venn diagram: the regions H(X|Y), I(X;Y) and H(Y|X) make up the joint entropy H(XY), with H(X) and H(Y) as the two circles):
I(X;Y) = H(X) - H(X|Y)
I(X;Y) = H(Y) - H(Y|X)
I(X;Y) = H(X) + H(Y) - H(XY)

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Properties: recognising channels. [Diagrams: three example channels: one with inputs a1 … ar matched one-to-one to outputs b1 … br; one with inputs a1, a2 and outputs b1 … b5 with transition probabilities 1/2, 1/5, 2/5; and one with inputs a1, a2, a3 and outputs b1, b2.]

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Properties. 4) Convexity: I(X;Y) = f[P(x), P(y|x)], a function of both the input distribution P(x) and the transition probabilities P(y|x).

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Properties. 4) Convexity: I(X;Y) = f[P(x), P(y|x)]. Theorem 1.5 (Theorem 1.6 in textbook): I(X;Y) is a concave (convex-∩) function of the input probabilities P(x). Theorem 1.6 (Theorem 1.7 in textbook): I(X;Y) is a convex (convex-∪) function of the transition probabilities P(y|x).

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Example 1.1.10: analyse the I(X;Y) of a BSC with crossover probability p. [Diagrams: the source distribution and the two-input, two-output channel with probabilities 1-p and p.]

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information. Example 1.1.10: analyse the I(X;Y) of the BSC (derivation worked on the blackboard).

§1.1.2 Discrete memoryless channel and mutual information 2. Average mutual information Example 1.1.10 analyse the I(X;Y) of BSC
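A sketch of the derivation referred to above, assuming the source distribution is written as P(X=0) = ω, P(X=1) = 1-ω (the symbol ω is an assumption; the slide's source image is not reproduced):

I(X;Y) = H(Y) - H(Y\mid X) = H\bigl(\omega(1-p) + (1-\omega)p\bigr) - H(p),

which is maximized at ω = 1/2, giving I(X;Y) = 1 - H(p) bit/sig.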

Review. KeyWords: channel and its information measure; channel model; equivocation; average mutual information; mutual information; properties of average mutual information.

§1.1.2 Discrete memoryless channel and mutual information Thinking:

§1.1.2 Discrete memoryless channel and mutual information Example 1.1.11. Let the source have alphabet A = {0,1} with p0 = p1 = 0.5. Let encoder C have alphabet B = {0, 1, …, 7} and let the elements of B have a binary representation. The encoder is shown below. [Circuit diagram: a(t) drives two cascaded D flip-flops, and the outputs b0, b1, b2 are formed from a(t) and the flip-flop outputs, where the "addition" blocks are modulo-2 adders (i.e. exclusive-or gates).] Find the entropy of the coded output, and find the output sequence if the input sequence is a(t) = {101001011000001100111011} and the initial contents of the registers are …

§1.1.2 Discrete memoryless channel and mutual information [State-transition table: current state Yt against next state Yt+1 for the states 0-7.] a(t) = {101001011000001100111011}, b = {001242425124366675013666}.

Homework P45: T1.10, P46: T1.19 (except c). 3. Let the DMS convey a message through a channel. Calculate: H(X) and H(Y); the mutual information I(xi; yj) (i, j = 1, 2); the equivocation H(X|Y) and the average mutual information I(X;Y).

Homework 4. Suppose that I(X;Y) = 0. Does this imply that I(X;Z) = I(X;Z|Y)? 5. In a joint ensemble XY, the mutual information I(x;y) is a random variable. In this problem we are concerned with the variance of that random variable, VAR[I(x;y)]. (1) Prove that VAR[I(x;y)] = 0 iff there is a constant α such that, for all x, y with P(xy) > 0, P(xy) = αP(x)P(y). (2) Express I(X;Y) in terms of α and interpret the special case α = 1. (continued)

Homework 5. (3) For each of the channels in Fig. 5, find a probability assignment P(x) such that I(X;Y) > 0 and VAR[I(x;y)] = 0, and calculate I(X;Y). [Diagrams, Fig. 5: a channel with inputs a1, a2, a3 and outputs b1, b2 (transition probability 1), and a channel with inputs a1, a2, a3 and outputs b1, b2, b3 (transition probability 1/2).]

§1 Entropy and mutual information
§1.1 Discrete random variables
§1.2 Discrete random vectors
 §1.2.1 Extended source and joint entropy
 §1.2.2 Extended channel and mutual information

§1.2.1 Extended source and joint entropy 1. Source model. Example 1.2.1: the N-times extended source. Example: the twofold (N = 2) extension of a binary source.
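As an illustration of the translated example (a minimal sketch; the slide's own table is not reproduced), the twofold extension of a binary source with P(0) = p, P(1) = 1-p is:

\begin{pmatrix} X^2 \\ P \end{pmatrix} =
\begin{pmatrix} 00 & 01 & 10 & 11 \\ p^2 & p(1-p) & (1-p)p & (1-p)^2 \end{pmatrix},
\qquad H(X^2) = 2H(X).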

§1.2.1 Extended source and joint entropy 2. Joint entropy. Definition: the joint entropy H(XY) of a pair of discrete random variables (X,Y) with joint distribution p(x,y) is defined as H(XY) = -Σx,y p(x,y) log p(x,y), which can also be expressed as H(XY) = -E[log p(X,Y)].

§1.2.1 Extended source and joint entropy Extended DMS

§1.2.1 Extended source and joint entropy A source with memory: 1) conditional entropy; 2) joint entropy; 3) (per-symbol) entropy, measured in bit/sig.

§1.2.1 Extended source and joint entropy 3. Properties of joint entropy. Theorem 1.7 (chain rule): H(XY) = H(X) + H(Y|X). Proof:
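The proof announced on the slide is not included in the transcript; a standard one-line sketch is:

H(XY) = -\sum_{x,y} p(x,y)\log\bigl[p(x)\,p(y\mid x)\bigr]
      = -\sum_{x,y} p(x,y)\log p(x) - \sum_{x,y} p(x,y)\log p(y\mid x)
      = H(X) + H(Y\mid X).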

§1.2.1 Extended source and joint entropy 3. Properties of joint entropy. Example 1.2.3: let X be a random variable with the probability space and joint probability table p(x1, x2) shown on the slide. [Flattened table; the recoverable entries include the values 0, 1, 2 and the probabilities 1/4, 1/4, 1/24, 1/24, 1/24, 1/8.] Find H(X) and the (per-symbol) entropy.

§1.2.1 Extended source and joint entropy 3. Properties of joint entropy Relationship H(X2) ≥ H(X2|X1) H(X1X2) ≤ 2H(X1)

§1.2.1 Extended source and joint entropy 3. Properties of joint entropy. General stationary source: let X1, X2, …, XN be dependent; the joint probability is p(x1 x2 … xN) = p(x1) p(x2|x1) ⋯ p(xN|x1 … xN-1).

§1.2.1 Extended source and joint entropy 3. Properties of joint entropy. Definitions of the entropies: the conditional entropy H(XN | X1 … XN-1); the joint entropy H(X1 X2 … XN); the (per-symbol) entropy HN(X) = H(X1 X2 … XN)/N.

§1.2.1 Extended source and joint entropy 3. Properties of joint entropy. Theorem 1.8 (chain rule for entropy): let X1, X2, …, Xn be drawn according to p(x1, x2, …, xn); then H(X1 X2 … Xn) = Σi H(Xi | X1 … Xi-1). Proof (do it by yourself).

§1.2.1 Extended source and joint entropy 3. Properties of joint entropy. Relation of the entropies: if H(X) < ∞, then the per-symbol entropy HN(X) converges as N → ∞; its limit is the entropy rate, which is the basis of data compression.

§1.2.1 Extended source and joint entropy 3. Properties of joint entropy. Theorem 1.9 (independence bound on entropy): let X1, X2, …, Xn be drawn according to p(x1, x2, …, xn); then H(X1 X2 … Xn) ≤ Σi H(Xi), with equality iff the Xi are independent (P37, corollary, in textbook).

§1.2.1 Extended source and joint entropy 3. Properties of joint entropy. Example 1.2.4: suppose a memoryless source with A = {0,1} and equal probabilities emits a sequence of six symbols. Following the sixth symbol, a seventh symbol is transmitted which is the modulo-2 sum of the six previous symbols. What is the entropy of the seven-symbol sequence?
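A worked answer for Example 1.2.4 (an addition, not shown in the transcript), using the chain rule and the fact that the seventh symbol is determined by the first six:

H(X_1\cdots X_7) = H(X_1\cdots X_6) + H(X_7\mid X_1\cdots X_6) = 6 + 0 = 6\ \text{bits}.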

§1 Entropy and mutual information
§1.1 Discrete random variables
§1.2 Discrete random vectors
 §1.2.1 Extended source and joint entropy
 §1.2.2 Extended channel and mutual information

§1.2.2 Extended channel and mutual information 1. The model of the extended channel. A general communication system: source (U1, U2, …, Uk) → source encoder → XN = (X1, X2, …, XN) → channel → YN = (Y1, Y2, …, YN) → decoder → (V1, V2, …, Vk).

§1.2.2 Extended channel and mutual information 1. The model of extended channel Extended channel

§1.2.2 Extended channel and mutual information 1. The model of extended channel

§1.2.2 Extended channel and mutual information 2. Average mutual information example 1.2.5

§1.2.2 Extended channel and mutual information 3. The properties. Theorem 1.11 (Theorem 1.8 in textbook): if the components (X1, X2, …, XN) of XN are independent, then I(XN; YN) ≥ Σi I(Xi; Yi).

§1.2.2 Extended channel and mutual information 3. The properties. Theorem 1.12 (Theorem 1.9 in textbook): if XN = (X1, X2, …, XN) and YN = (Y1, Y2, …, YN) are random vectors and the channel is memoryless, that is, p(yN|xN) = Πi p(yi|xi), then I(XN; YN) ≤ Σi I(Xi; Yi).
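A short derivation sketch of Theorem 1.12 under the stated memoryless assumption (a standard argument, not reproduced from the slide):

I(X^N;Y^N) = H(Y^N) - H(Y^N\mid X^N)
           = H(Y^N) - \sum_{i=1}^{N} H(Y_i\mid X_i)
           \le \sum_{i=1}^{N} H(Y_i) - \sum_{i=1}^{N} H(Y_i\mid X_i)
           = \sum_{i=1}^{N} I(X_i;Y_i).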

§1.2.2 Extended channel and mutual information Example 1.2.6: let X1, X2, …, X5 be independent identically distributed random variables with common entropy H. Also let T be a permutation of the set {1, 2, 3, 4, 5}, and let Yi = XT(i). Show that …

Review. Keywords: vector; extended source; stationary source; extended channel; measure of information; joint entropy; (per-symbol) entropy; conditional entropy; entropy rate.

Review. Conclusions: conditioning reduces entropy; chain rule for entropy; independence bound on entropy; properties of …

Homework P47: T1.23, P47: T1.24,

Homework 4. Let X1, X2 be identically distributed random variables, and let … be: 1) show that …; 2) when …; 3) when …

Homework (thinking): 5. Shuffles increase entropy. Argue that, for any distribution on shuffles T and any distribution on card positions X, H(TX) ≥ H(TX|T) = H(X) if X and T are independent.