Introduction to Information theory


Introduction to Information theory A.J. Han Vinck University of Essen October 2002

content
Introduction
Entropy and some related properties
Source coding
Channel coding
Multi-user models
Constrained sequences
Applications to cryptography

First lecture
What is information theory about
Entropy, or the shortest average representation length
Some properties of entropy
Mutual information
Data processing theorem
Fano inequality

Field of Interest
Information theory deals with the problem of efficient and reliable transmission of information. It specifically encompasses theoretical and applied aspects of
- coding, communications and communication networks
- complexity and cryptography
- detection and estimation
- learning, Shannon theory, and stochastic processes

Some of the successes of IT
Satellite communications: Reed-Solomon codes (also used in the CD player), Viterbi algorithm
Public-key cryptosystems (Diffie-Hellman)
Compression algorithms: Huffman, Lempel-Ziv, MP3, JPEG, MPEG
Modem design with coded modulation (Ungerböck)
Codes for recording (CD, DVD)

OUR definition of information
Information is knowledge that can be used, i.e. data is not necessarily information. We:
1) specify a set of messages of interest to a receiver
2) select a message to be transmitted
3) sender and receiver form a pair

Model of a Communication System
Information source (message, e.g. English symbols) → Encoder / source coding (e.g. English to 0,1 sequence) → Communication channel (can have noise or distortion) → Decoder (e.g. 0,1 sequence back to English) → Destination

Shannon's (1948) definition of transmission of information:
Reproducing at one point (in time or space), either exactly or approximately, a message selected at another point.
Shannon uses Binary Information digiTS (BITS), 0 or 1:
n bits specify M = 2^n different messages, OR M messages are specified by n = ⌈log2 M⌉ bits.

Example: fixed-length representation
00000 → a, 00001 → b, …, 11001 → y, 11010 → z
- the alphabet: 26 letters, ⌈log2 26⌉ = 5 bits
- ASCII: 7 bits represent 128 characters
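A minimal Python sketch of such a fixed-length assignment (the variable names and the zero-based numbering of the letters are my own assumptions, chosen only for illustration):

```python
import math
import string

# Number of bits needed to give each of the 26 letters its own fixed-length codeword.
alphabet = string.ascii_lowercase                 # 'a' ... 'z', 26 letters
n_bits = math.ceil(math.log2(len(alphabet)))      # ceil(log2 26) = 5

# Assign codewords 00000, 00001, ... in alphabetical order (zero-based numbering).
code = {letter: format(i, f"0{n_bits}b") for i, letter in enumerate(alphabet)}

print(n_bits)                    # 5
print(code["a"], code["b"])      # 00000 00001
```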

Example: suppose we have a dictionary with 30,000 words.
These can be numbered (encoded) with 15 bits.
If the average word length is 5 letters, we need "on the average" 3 bits per letter.

Example: variable-length representation of messages

C1   C2    letter   frequency of occurrence P(*)
00   1     e        0.5
01   01    a        0.25
10   000   x        0.125
11   001   q        0.125

With C2: 0111001101000… → aeeqea…
Note: C2 is uniquely decodable! (check!)
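Because C2 is a prefix code, greedy left-to-right matching decodes it uniquely. A small Python sketch (the helper names are mine) applied to the example stream:

```python
# C2 from the table above: no codeword is a prefix of another, so greedy matching works.
C2 = {"e": "1", "a": "01", "x": "000", "q": "001"}
decode_table = {codeword: letter for letter, codeword in C2.items()}

def decode_c2(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_table:          # a complete codeword has been read
            out.append(decode_table[buf])
            buf = ""
    return "".join(out)

print(decode_c2("0111001101000"))        # -> 'aeeqeax'
```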

Efficiency of C1 and C2
Average number of coding symbols of C1: 0.5·2 + 0.25·2 + 0.125·2 + 0.125·2 = 2 bits per letter
Average number of coding symbols of C2: 0.5·1 + 0.25·2 + 0.125·3 + 0.125·3 = 1.75 bits per letter
C2 is more efficient than C1.

Another example
Source output: a, b, or c. Translate blocks of three output symbols into binary:

In    out
aaa   00000
aab   00001
aba   00010
…
ccc   11010

Coding each symbol separately: efficiency = 2 bits/output symbol
Coding blocks of three: efficiency = 5/3 bits/output symbol
Can we improve the efficiency further?
Homework: calculate the optimum efficiency (a numeric sketch follows below).
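A short numeric sketch of the block-coding idea, assuming fixed-length binary coding of blocks of n ternary symbols; it only illustrates how the rate changes with the block size, and the homework answer depends on the source statistics:

```python
import math

# Fixed-length binary coding of blocks of n ternary symbols (a, b, c):
# there are 3**n possible blocks, so ceil(log2(3**n)) bits suffice per block.
for n in (1, 2, 3, 4, 5):
    bits_per_block = math.ceil(math.log2(3 ** n))
    print(n, bits_per_block, bits_per_block / n)   # n=1: 2.0, n=3: 5/3 ~ 1.67, ...

# For equally likely symbols, no scheme can beat log2(3) ~ 1.585 bits/symbol.
print(math.log2(3))
```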

entropy
The minimum average number of binary digits needed to specify a source output (message) uniquely is called the “SOURCE ENTROPY”.

SHANNON (1948):
1) Source entropy := H(X) = - Σi Pi log2 Pi = L, the minimum average number of binary digits per source output
2) this minimum can be obtained!
QUESTION: how to represent a source output in digital form?
QUESTION: what is the source entropy of text, music, pictures?
QUESTION: are there algorithms that achieve this entropy?

entropy
Source X with output x ∈ {finite set of messages}
Example: binary source: x ∈ {0, 1} with P(x = 0) = p, P(x = 1) = 1 - p
M-ary source: x ∈ {1, 2, …, M} with Σi Pi = 1.
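A minimal entropy helper in Python (the function name is my own) matching this definition:

```python
import math

def entropy(probs):
    """H(X) = -sum p_i log2 p_i, in bits; terms with p_i = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Binary source with P(x=0) = p, P(x=1) = 1-p:
p = 0.25
print(entropy([p, 1 - p]))       # h(0.25) ~ 0.811 bits

# M-ary source with equal probabilities 1/M:
M = 8
print(entropy([1 / M] * M))      # log2(8) = 3 bits
```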

Joint Entropy: H(X,Y) = H(X) + H(Y|X), and also H(X,Y) = H(Y) + H(X|Y)
(intuition: first describe Y and then X given Y)
From this: H(X) – H(X|Y) = H(Y) – H(Y|X)
As a formula: H(X,Y) = - Σx,y P(x,y) log2 P(x,y)
Homework: check the chain rule from this formula (a derivation sketch follows below).
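A short derivation sketch of the chain rule from the joint-entropy formula above (a standard manipulation, not necessarily the slide's own steps):

```latex
\begin{aligned}
H(X,Y) &= -\sum_{x,y} P(x,y)\log_2 P(x,y)
        = -\sum_{x,y} P(x,y)\log_2\bigl(P(x)\,P(y\mid x)\bigr) \\
       &= -\sum_{x,y} P(x,y)\log_2 P(x) \;-\; \sum_{x,y} P(x,y)\log_2 P(y\mid x)
        = H(X) + H(Y\mid X).
\end{aligned}
```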

entropy
Useful approximation: log2 C(n, pn) ≈ n·h(p), where C(n, pn) is the number of binary sequences of length n with pn ones and h(p) is the binary entropy function (next slide).
Homework: prove the approximation using ln N! ≈ N ln N for N large.

Binary Entropy: h(p) = -p log2 p – (1-p) log2 (1-p)
Note: h(p) = h(1-p)

entropy
Interpretation: let a binary sequence of length n contain pn ones; then we can specify each such sequence with about n·h(p) bits.
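A quick numerical check of this interpretation (the particular n and p are arbitrary illustrative values):

```python
import math

def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 1000, 0.3
k = int(p * n)                              # number of ones
log2_count = math.log2(math.comb(n, k))     # log2 of the number of such sequences
print(log2_count, n * h(p))                 # ~ 876.1 vs ~ 881.3: close for large n,
                                            # they differ only in lower-order terms
```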

Transmission efficiency (1)
X → channel → Y
I need on the average H(X) bits per source output to describe the source symbols X.
After observing Y, I need H(X|Y) bits per source output.
The reduction in description length is called the transmitted information:
Transmitted R = H(X) - H(X|Y) = H(Y) – H(Y|X) (from the calculations above)
We can maximize R by changing the input probabilities. The maximum is called the CAPACITY (Shannon 1948).

Example 1
Suppose that X ∈ {000, 001, …, 111} with H(X) = 3 bits.
Channel: Y = parity of X.
H(X|Y) = 2 bits: we transmitted H(X) – H(X|Y) = 1 bit of information!
We know that X|Y ∈ {000, 011, 101, 110} or X|Y ∈ {001, 010, 100, 111}.
Homework: suppose the channel output gives the number of ones in X. What is then H(X) – H(X|Y)?
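This bookkeeping can be reproduced numerically. A Python sketch (helper names are mine) for uniform 3-bit X and Y = parity of X:

```python
import math
from collections import defaultdict

# Uniform X over the 8 three-bit strings, Y = parity (number of ones mod 2).
joint = defaultdict(float)
for i in range(8):
    x = format(i, "03b")
    joint[(x, x.count("1") % 2)] = 1 / 8

def H(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_y[y] += p

H_X = H([1 / 8] * 8)                                 # 3 bits
H_X_given_Y = H(joint.values()) - H(p_y.values())    # H(X,Y) - H(Y) = 3 - 1 = 2 bits
print(H_X, H_X_given_Y, H_X - H_X_given_Y)           # 3.0 2.0 1.0
```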

Transmission efficiency (2)
Example: the binary erasure channel. Inputs 0 and 1 are used with probability ½ each; an input is received correctly with probability 1-e and erased (output E) with probability e, so P(Y=0) = P(Y=1) = (1-e)/2 and P(Y=E) = e.
H(X) = 1, H(X|Y) = e, H(X) - H(X|Y) = 1 - e = maximum!
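A small numerical check of these values (uniform input; the erasure probability e = 0.2 is an arbitrary illustrative choice, and the variable names are mine):

```python
import math

def H(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

e = 0.2
# Joint distribution P(x, y) for a uniform input and the erasure channel:
# x is received as x with probability 1-e and as 'E' with probability e.
joint = {(0, 0): 0.5 * (1 - e), (0, "E"): 0.5 * e,
         (1, 1): 0.5 * (1 - e), (1, "E"): 0.5 * e}

p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

H_X_given_Y = H(joint.values()) - H(p_y.values())    # H(X,Y) - H(Y)
print(H_X_given_Y, 1 - H_X_given_Y)                  # 0.2 (= e) and 0.8 (= 1-e)
```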

Example 2
Suppose we have 2^n messages, specified by n bits and transmitted over the erasure channel (0 → 0 and 1 → 1 with probability 1-e, erasure E with probability e).
After n transmissions we are left with about ne erasures.
Thus the number of messages we cannot distinguish is about 2^{ne}: we transmitted n(1-e) bits of information over the channel!

Transmission efficiency (3)
Easily obtained when feedback is available: a transmitted 0 or 1 is either received correctly or as an erasure; if an erasure occurs, repeat until it is received correctly.
R = 1/T, where T = average time to transmit 1 correct bit = (1-e) + 2e(1-e) + 3e^2(1-e) + … = 1/(1-e), so R = 1 - e.
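The average-time sum spelled out (a standard geometric-series evaluation; the slide's intermediate steps are not in the transcript):

```latex
T \;=\; \sum_{k=1}^{\infty} k\, e^{k-1}(1-e)
  \;=\; (1-e)\,\frac{1}{(1-e)^{2}}
  \;=\; \frac{1}{1-e},
\qquad\text{so}\qquad
R \;=\; \frac{1}{T} \;=\; 1-e .
```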

Properties of entropy
A: For a source X with M different outputs: log2 M ≥ H(X) ≥ 0
(the "worst" we can do is just assign log2 M bits to each source output)
B: For a source X "related" to a source Y: H(X) ≥ H(X|Y)
(Y gives additional information about X)

Entropy: Proof of A
We use the following important inequalities:
log2 M = ln M · log2 e and M - 1 ≥ ln M ≥ 1 - 1/M
Homework: draw the inequality.

Entropy: Proof of A
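The formulas of this proof slide did not survive in the transcript. A standard argument, based on the inequality ln z ≤ z - 1 stated on the previous slide (not necessarily the slide's own derivation):

```latex
H(X) - \log_2 M
 = \sum_{i=1}^{M} P_i \log_2 \frac{1}{P_i} - \log_2 M
 = \sum_{i=1}^{M} P_i \log_2 \frac{1}{M P_i}
 \le \log_2 e \sum_{i=1}^{M} P_i \left(\frac{1}{M P_i} - 1\right)
 = \log_2 e\,(1 - 1) = 0 ,
```

so H(X) ≤ log2 M, with equality iff all Pi = 1/M; H(X) ≥ 0 holds because every term -Pi log2 Pi is non-negative.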

Entropy: Proof of B
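Again the formulas are missing from the transcript. A standard argument using the same inequality, summing over pairs with P(x,y) > 0 (not necessarily the slide's own derivation):

```latex
H(X\mid Y) - H(X)
 = \sum_{x,y} P(x,y)\log_2\frac{P(x)}{P(x\mid y)}
 \le \log_2 e \sum_{x,y} P(x,y)\left(\frac{P(x)}{P(x\mid y)} - 1\right)
 = \log_2 e \left(\sum_{x,y} P(x)P(y) - 1\right) \le 0 ,
```

so H(X|Y) ≤ H(X).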

Entropy: corollary
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
H(X,Y,Z) = H(X) + H(Y|X) + H(Z|XY) ≤ H(X) + H(Y) + H(Z)

homework
Consider a figure of points in the (X, Y) plane, with X and Y each taking values in {0, 1, 2, 3} [figure not reproduced]. All points are equally likely. Calculate H(X), H(X|Y) and H(X,Y).

mutual information
I(X;Y) := H(X) – H(X|Y) = H(Y) – H(Y|X) (homework: show this!)
i.e. the reduction in the description length of X given Y, or: the amount of information that Y gives about X.
Note that I(X;Y) ≥ 0.
Equivalently: I(X;Y|Z) = H(X|Z) – H(X|YZ), the amount of information that Y gives about X given Z.
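A small Python sketch (the function name is mine) that computes I(X;Y) directly from a joint distribution, checked here on a binary symmetric channel with crossover probability 0.1 and uniform input:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum p(x,y) log2[ p(x,y) / (p(x) p(y)) ]; joint given as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Binary symmetric channel, crossover 0.1, uniform input:
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(mutual_information(joint))    # ~ 0.531 bits = 1 - h(0.1)
```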

Data processing (1)
Let X, Y and Z form a Markov chain X → Y → Z, i.e. Z is independent of X given Y: P(x,y,z) = P(x) P(y|x) P(z|y).
X → P(y|x) → Y → P(z|y) → Z
Then I(X;Y) ≥ I(X;Z).
Conclusion: processing may destroy information.

Data processing (2)
To show: I(X;Y) ≥ I(X;Z).
Proof: I(X;(Y,Z)) = I(X;Y) + I(X;Z|Y) = I(X;Z) + I(X;Y|Z).
Now I(X;Z|Y) = 0 (conditional independence) and I(X;Y|Z) ≥ 0.
Thus I(X;Y) ≥ I(X;Z).

Fano inequality (1)
Suppose we have the following situation: Y is the observation of X.
X → p(y|x) → Y → decoder → X'
Y determines a unique estimate X': correct with probability 1-P, incorrect with probability P.

Fano inequality (2)
To describe X, given Y, we can act as follows (remember, Y uniquely specifies X'):
1. Specify whether X' ≠ X or X' = X; for this we need h(P) bits per estimate.
2. If X' = X, no further specification is needed; if X' ≠ X, we need at most log2(M-1) bits.
Hence: H(X|Y) ≤ h(P) + P log2(M-1).
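A quick numeric sanity check of the bound (the conditional distribution used here is a hypothetical one, chosen because it is exactly the case in which Fano's inequality is tight):

```python
import math

def H(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

M, Pe = 4, 0.1
# Given Y, the estimate X' equals X with probability 1 - Pe, and otherwise X is
# uniform over the remaining M - 1 values (the equality case of Fano's inequality).
cond = [1 - Pe] + [Pe / (M - 1)] * (M - 1)

lhs = H(cond)                                    # H(X|Y), the same for every y here
rhs = H([Pe, 1 - Pe]) + Pe * math.log2(M - 1)    # h(P) + P log2(M-1)
print(lhs, rhs)                                  # both ~ 0.6275: the bound holds with equality
```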

List decoding
Suppose that the decoder forms a list of size L, and PL is the probability that X is in the list.
Then H(X|Y) ≤ h(PL) + PL log2 L + (1-PL) log2(M-L).
The bound is not very tight, because of the log2 L term. Can you see why?