Information and coding theory

Information theory is concerned with the description of information sources, the representation of the information from a source, and the transmission of this information over a channel. It is perhaps the best example of how a deep mathematical theory can be successfully applied to engineering problems.
Information theory is a discipline in applied mathematics involving the quantification of data with the goal of enabling as much data as possible to be reliably stored on a medium and/or communicated over a channel. The measure of data, known as information entropy, is usually expressed by the average number of bits needed for storage or communication.
The field is at the crossroads of mathematics, statistics, computer science, physics, neurobiology, and electrical engineering.
Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the CD, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields.
Information theory is generally considered to have been founded in 1948 by Claude Shannon in his seminal work, "A Mathematical Theory of Communication".
The central paradigm of classic information theory is the engineering problem of the transmission of information over a noisy channel.

An avid chess player, Professor Shannon built a chess-playing computer years before IBM's Deep Blue came along. While on a trip to Russia in 1965, he challenged world champion Mikhail Botvinnik to a match. He lost in 42 moves, considered an excellent showing.
The most fundamental results of this theory are:
1. Shannon's source coding theorem, which establishes that, on average, the number of bits needed to represent the result of an uncertain event is given by its entropy;
2. Shannon's noisy-channel coding theorem, which states that reliable communication is possible over noisy channels provided that the rate of communication is below a certain threshold called the channel capacity.
The channel capacity can be approached by using appropriate encoding and decoding systems.
Consider predicting what the Prime Minister will do tomorrow. This prediction is an information source with two outcomes: he will be in his office, or he will run 10 miles naked through London.
Clearly, the outcome 'in office' contains little information; it is a highly probable outcome. The outcome 'naked run', however, contains considerable information; it is a highly improbable event.
In information theory, an information source is a probability distribution, i.e. a set of probabilities assigned to a set of outcomes ("Nothing is certain, except death and taxes"). This definition reflects the fact that the information contained in an outcome is determined not only by the outcome itself, but by how uncertain it is. An almost certain outcome contains little information. A measure of the information contained in an outcome was introduced by Hartley in 1927.
He defined the information contained in an outcome $x$ as

$$I(x) = -\log_2 P(x)$$

This measure satisfies our requirement that the information contained in an outcome increases with its uncertainty. If $P(x) = 1$, then $I(x) = 0$, telling us that a certain event contains no information.
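The measure above can be tried out numerically. The following is a minimal Python sketch; the function name `self_information` and the probabilities 0.999 and 0.001 assigned to the two outcomes of the prime minister source are assumptions made purely for illustration.

```python
import math


def self_information(p: float) -> float:
    """Information, in bits, of an outcome that occurs with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    # log2(1/p) equals -log2(p); this form returns 0.0 rather than -0.0 for p = 1.
    return math.log2(1.0 / p)


# Hypothetical probabilities for the two outcomes (not taken from the text).
p_office = 0.999
p_naked_run = 0.001

print("certain outcome:", self_information(1.0), "bits")
print("in office:      ", self_information(p_office), "bits")
print("naked run:      ", self_information(p_naked_run), "bits")
```

Under these assumed probabilities the improbable outcome carries close to 10 bits, while the probable one carries only a small fraction of a bit.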
The definition above also satisfies the requirement that the total information in independent events should add. Clearly, our prime minister prediction for two days contains twice as much information as for one day. For two independent outcomes $x_i$ and $x_j$,

$$I(x_i \text{ and } x_j) = -\log_2 P(x_i \text{ and } x_j) = -\log_2 P(x_i) P(x_j) = -\log_2 P(x_i) - \log_2 P(x_j) = I(x_i) + I(x_j)$$

Hartley's measure defines the information in a single outcome.
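The additivity property for independent outcomes can also be checked numerically. This is only a sketch: the helper `self_information` and the probability 0.001 for the improbable outcome on each of two days are illustrative assumptions.

```python
import math


def self_information(p: float) -> float:
    """Information, in bits, of an outcome that occurs with probability p."""
    return math.log2(1.0 / p)


# Hypothetical probability of the improbable outcome on each of two days.
p_day1 = 0.001
p_day2 = 0.001

# For independent outcomes the joint probability is the product.
p_joint = p_day1 * p_day2

info_joint = self_information(p_joint)
info_sum = self_information(p_day1) + self_information(p_day2)

print("information of the joint outcome:", info_joint)
print("sum of the individual values:    ", info_sum)
print("equal up to rounding:", math.isclose(info_joint, info_sum))
```

The logarithm is what turns the product of probabilities into a sum of information values.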
The entropy $H(X)$ defines the information content of the source $X$ as a whole. It is the mean information provided by the source. We have

$$H(X) = \sum_i P(x_i) I(x_i) = -\sum_i P(x_i) \log_2 P(x_i)$$

A binary symmetric source (BSS) is a source with two outputs whose probabilities are $p$ and $1-p$ respectively.
The prime minister discussed is a BSS. The entropy of the source is

$$H(X) = -p \log_2 p - (1-p) \log_2 (1-p)$$
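As a rough illustration, the entropy of a source can be computed directly from its list of probabilities. In the sketch below the function names `entropy` and `binary_entropy` are hypothetical, the value p = 0.001 is an assumed probability for the improbable outcome, and outcomes with zero probability are taken to contribute nothing to the sum.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Mean information of a source in bits, given its outcome probabilities."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes are skipped (convention: 0 * log2(0) = 0).
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0.0)


def binary_entropy(p: float) -> float:
    """Entropy of a two-outcome source with probabilities p and 1 - p."""
    return entropy([p, 1.0 - p])


# Hypothetical probability of the improbable outcome (an assumption).
p = 0.001
closed_form = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

print("general sum:", binary_entropy(p), "bits")
print("closed form:", closed_form, "bits")
print("fair coin:  ", binary_entropy(0.5), "bits")
```

With such a lopsided distribution the entropy is far below 1 bit, even though the rare outcome on its own is highly informative: it is weighted by its small probability in the average.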
The function takes the value zero when p=0 (using the convention that 0 log 0 = 0). When one outcome is certain, so is the other, and the entropy is zero. As p increases, so too does the entropy, until it reaches a maximum when p = 1-p = 0.5. When p is greater than 0.5, the curve declines symmetrically to zero, reached when p=1.
We conclude that the average information in the BSS is maximised when both outcomes are equally likely. The entropy measures the average uncertainty of the source. (The term entropy is borrowed from thermodynamics. There too it is a measure of the uncertainty or disorder of a system.)
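The shape of the entropy curve for the two-outcome source can be checked with a short numerical scan. This is a sketch only: the function name `binary_entropy` and the grid of 101 evenly spaced values of p are choices made for illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a source with outcome probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log2(0) = 0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# Evaluate the curve on an evenly spaced grid from 0 to 1 (an arbitrary choice).
grid = [i / 100 for i in range(101)]
values = [binary_entropy(p) for p in grid]
p_max = grid[values.index(max(values))]

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}  H = {binary_entropy(p):.4f} bits")

print("entropy is largest at p =", p_max)
```

On this grid the largest value occurs at p = 0.5, where the entropy is exactly 1 bit, and the values on either side mirror each other.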