
1 Hiroki Sayama sayama@binghamton.edu
NECSI Summer School 2008, Week 3 (Methods for the Study of Complex Systems): Information Theory
[Plot: self-information I(p) as a function of p]

2 Four approaches to complexity
Nonlinear Dynamics: Complexity = no closed-form solution, chaos
Information: Complexity = length of description, entropy
Computation: Complexity = computational time/space, algorithmic complexity
Collective Behavior: Complexity = multi-scale patterns, emergence

3 Information?
Matter: known since ancient times
Energy: known since the 19th century (industrial revolution)
Information: known since the 20th century (world wars, rise of computers)

4 An informal definition of information
Aspects of some physical phenomenon that can be used to select a smaller set of options out of the original set of options (things that reduce the number of possibilities)
An observer or interpreter is involved
A default set of options is needed

5 Quantitative Definition of Information

6 Quantitative definition of information
If something is expected to occur almost certainly, its occurrence should have nearly zero information
If something is expected to occur very rarely, its occurrence should have very large information
If an event is expected to occur with probability p, the information produced by its occurrence (self-information) is given by I(p) = −log p

7 Quantitative definition of information
I(p) = −log p
2 is often used as the base of the log; the unit of information is then the bit (binary digit)
[Plot: I(p) = −log2 p as a function of p]
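As a quick numerical illustration, here is a minimal Python sketch of self-information (the function name and the example probabilities are chosen only for illustration):

```python
import math

def self_information(p):
    """Self-information I(p) = -log2(p), measured in bits."""
    return -math.log2(p)

# A fair coin toss (p = 1/2) yields exactly 1 bit of information.
print(self_information(0.5))      # 1.0
# A rare event (p = 1/64) yields 6 bits.
print(self_information(1 / 64))   # 6.0
```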

8 Why log? To fulfill the additivity of information
For independent events A and B:
Self-information of "A happened": I(p_A)
Self-information of "B happened": I(p_B)
Self-information of "A and B happened": I(p_A p_B) = I(p_A) + I(p_B)
I(p) = −log p satisfies this additivity

9 Exercise You picked up a card from a well-shuffled deck of cards (without jokers):
How much self-information does the event "the card is a spade" have?
How much self-information does the event "the card is a king" have?
How much self-information does the event "the card is the king of spades" have?
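One way to check the answers numerically, assuming a standard 52-card deck and log base 2 (this worked check is an illustration, not part of the original slides):

```python
import math

def self_information(p):
    """Self-information I(p) = -log2(p), in bits."""
    return -math.log2(p)

print(self_information(13 / 52))  # "the card is a spade": 2.0 bits
print(self_information(4 / 52))   # "the card is a king": about 3.70 bits
print(self_information(1 / 52))   # "the card is the king of spades": about 5.70 bits
```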

10 Information Entropy

11 Some terminology
Event: an individual outcome (or a set of outcomes) to which a probability of its occurrence can be assigned
Sample space: the set of all possible individual events
Probability space: the combination of a sample space and a probability distribution (i.e., probabilities assigned to individual events)

12 Probability distribution and expected self-information
Probability distribution in probability space A: p_i (i = 1…n, Σ_i p_i = 1)
Expected self-information H(A) when one of the individual events happens:
H(A) = Σ_i p_i I(p_i) = −Σ_i p_i log p_i
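A minimal Python sketch of this formula (the function name and the example distributions are illustrative):

```python
import math

def entropy(probs):
    """Information entropy H = -sum(p * log2(p)) in bits; terms with p = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([1.0]))        # no uncertainty: 0.0 bits
print(entropy([1/6] * 6))    # fair die: about 2.585 bits
```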

13 What does H(A) mean?
Average amount of self-information the observer could obtain by one observation
Average "newsworthiness" the observer should expect for one event
Ambiguity of the knowledge the observer had about the system before observation
Amount of "ignorance" the observer had about the system before observation

14 What does H(A) mean? (continued)
Amount of "ignorance" the observer had about the system before observation
It quantitatively shows the lack of information (not the presence of information) before observation
This quantity is called the information entropy

15 Information entropy
Similar to thermodynamic entropy, both conceptually and mathematically
Entropy is zero if the system state is uniquely determined with no fluctuation
Entropy increases as the randomness within the system increases
Entropy is maximal if the system is completely random (i.e., if every event is equally likely to occur)

16 Exercise Prove the following:
Entropy is maximal if the system is completely random (i.e., if every event is equally likely to occur)
Show that f(p_1, p_2, …, p_n) = −Σ_{i=1…n} p_i log p_i (with Σ_{i=1…n} p_i = 1) takes its maximum when p_i = 1/n
Either remove one variable using the constraint, or use the method of Lagrange multipliers (a sketch of the latter follows)
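A sketch of the Lagrange-multiplier route hinted at above, using the natural log (the base only rescales H and does not move the maximum):

```latex
\mathcal{L}(p_1,\dots,p_n,\lambda)
  = -\sum_{i=1}^{n} p_i \ln p_i + \lambda\Bigl(\sum_{i=1}^{n} p_i - 1\Bigr)
\qquad
\frac{\partial \mathcal{L}}{\partial p_i} = -\ln p_i - 1 + \lambda = 0
\;\Longrightarrow\; p_i = e^{\lambda - 1}
```

Every p_i takes the same constant value, so the constraint Σ p_i = 1 forces p_i = 1/n; since H is concave, this stationary point is the maximum, with H = log n.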

17 Entropy and complex systems
Entropy shows how much information would be needed to fully specify the system's state in every single detail
Ordered → low information entropy
Disordered → high information entropy
This may not be consistent with the usual notion of "complexity"
Multiscale views are needed to address this issue

18 Information Entropy and Multiple Probability Spaces

19 Probability of composite events
Probability of the composite event (x, y): p(x, y) = p(y, x) = p(x | y) p(y) = p(y | x) p(x)
p(x | y): conditional probability for x to occur given that y has already occurred
p(x | y) = p(x) if X and Y are independent of each other

20 Exercise: Bayes’ theorem
Define p(x | y) using p(y | x) and p(x)
Use the following formulas as needed:
p(x) = Σ_y p(x, y), p(y) = Σ_x p(x, y)
p(x, y) = p(y | x) p(x) = p(x | y) p(y)
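One possible way to combine the formulas above into the requested expression:

```latex
p(x \mid y) \;=\; \frac{p(x, y)}{p(y)}
           \;=\; \frac{p(y \mid x)\, p(x)}{p(y)}
           \;=\; \frac{p(y \mid x)\, p(x)}{\sum_{x'} p(y \mid x')\, p(x')}
```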

21 Product probability space
Prob. space X: {x1, x2}, {p(x1), p(x2)}
Prob. space Y: {y1, y2}, {p(y1), p(y2)}
Product probability space XY: {(x1, y1), (x1, y2), (x2, y1), (x2, y2)}, {p(x1, y1), p(x1, y2), p(x2, y1), p(x2, y2)}
Its elements are composite events

22 Joint entropy Entropy of product probability space XY:
H(XY) = −Σ_x Σ_y p(x, y) log p(x, y)
H(XY) = H(YX)
If X and Y are independent: H(XY) = H(X) + H(Y)
If Y completely depends on X: H(XY) = H(X) (≥ H(Y))
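A minimal Python sketch of joint entropy computed from a joint distribution stored as a dict (the two example distributions are made up to illustrate the independent and fully dependent cases):

```python
import math

def joint_entropy(p_xy):
    """H(XY) = -sum over (x, y) of p(x, y) * log2 p(x, y)."""
    return -sum(p * math.log2(p) for p in p_xy.values() if p > 0)

# Two independent fair bits: H(XY) = H(X) + H(Y) = 2 bits.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Y is a copy of X: H(XY) = H(X) = 1 bit.
dependent = {(0, 0): 0.5, (1, 1): 0.5}

print(joint_entropy(independent))  # 2.0
print(joint_entropy(dependent))    # 1.0
```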

23 Conditional entropy
Expected entropy of Y when a specific event has occurred in X:
H(Y | X) = Σ_x p(x) H(Y | X=x) = −Σ_x p(x) Σ_y p(y | x) log p(y | x) = −Σ_x Σ_y p(y, x) log p(y | x)
If X and Y are independent: H(Y | X) = H(Y)
If Y completely depends on X: H(Y | X) = 0

24 Exercise Prove the following: H(Y | X) = H(YX) - H(X)
Hint: Use Bayes’ theorem
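A sketch of the proof, using the product rule p(y, x) = p(y | x) p(x) from slide 19:

```latex
\begin{align*}
H(Y \mid X) &= -\sum_x \sum_y p(y, x) \log p(y \mid x)
             = -\sum_x \sum_y p(y, x)\bigl[\log p(y, x) - \log p(x)\bigr] \\
            &= H(YX) + \sum_x \Bigl(\sum_y p(y, x)\Bigr) \log p(x)
             = H(YX) - H(X)
\end{align*}
```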

25 Mutual Information

26 Mutual information
Conditional entropy measures how much ambiguity still remains about Y after observing an event on X
The reduction of ambiguity about Y by one observation on X can be written as:
I(Y; X) = H(Y) − H(Y | X) (mutual information)

27 Symmetry of mutual information
I(Y; X) = H(Y) − H(Y | X) = H(Y) + H(X) − H(YX) = H(X) + H(Y) − H(XY) = I(X; Y)
Mutual information is symmetric in X and Y
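A Python sketch that computes mutual information from a joint distribution via I(X; Y) = H(X) + H(Y) − H(XY) (the helper names and example distributions are illustrative):

```python
import math

def entropy(probs):
    """H = -sum(p * log2(p)) in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(p_xy):
    """I(X; Y) = H(X) + H(Y) - H(XY), from a joint distribution dict {(x, y): p}."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0) + p   # marginal distribution of X
        p_y[y] = p_y.get(y, 0) + p   # marginal distribution of Y
    return entropy(p_x.values()) + entropy(p_y.values()) - entropy(p_xy.values())

# Y is a noiseless copy of X: I(X; Y) = H(Y) = 1 bit.
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))        # 1.0
# X and Y independent: I(X; Y) = 0.
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25,
                          (1, 0): 0.25, (1, 1): 0.25}))      # 0.0
```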

28 Exercise Prove the following: If X and Y are independent: I(X; Y) = 0
If Y completely depends on X: I(X; Y) = H(Y)

29 Exercise
[Figure: two coupled systems, System A with states 1, 2, 3 and System B with states a, b, c]
Measure the mutual information between the two systems shown in the figure.

30 Use of mutual information
Mutual information can be used to measure how much interaction exists between two subsystems in a complex system
Correlation works only for quantitative measures and detects only linear relationships
Mutual information works for qualitative (discrete, symbolic) measures and for nonlinear relationships as well

31 Information Source

32 Information source
A sequence of values of a random variable that obeys some probabilistic rules
The sequence may be over time or space
Values (events) may or may not be independent of each other
Examples: repeated coin tosses, sound, visual images

33 Memoryless and Markov information sources
Memoryless information source: p(0) = p(1) = 1/2
Markov information source: p(1|0) = p(0|1) = 1/4

34 Markov information source
An information source whose probability distribution at time t depends only on its immediate past value X_{t−1} (or on its past n values X_{t−1}, X_{t−2}, …, X_{t−n})
Cases with n > 1 can be converted into the n = 1 form by defining composite events
The probabilistic rules are given as a set of conditional probabilities, which can be written in the form of a transition probability matrix (TPM)

35 State-transition diagram
Markov information source with p(1|0) = p(0|1) = 1/4
[Diagram: two states, 0 and 1, each with a self-transition probability of 3/4 and a transition probability of 1/4 to the other state]

36 Matrix representation
Markov information source with p(1|0) = p(0|1) = 1/4
The probability vector at time t equals the TPM times the probability vector at time t−1:
(p0, p1) at time t = [[3/4, 1/4], [1/4, 3/4]] × (p0, p1) at time t−1
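A numpy sketch of this matrix representation, assuming the column-stochastic convention used on the slide (entry [y, x] is the probability of moving to state y from state x):

```python
import numpy as np

# TPM for p(1|0) = p(0|1) = 1/4; each column sums to 1.
tpm = np.array([[3/4, 1/4],
                [1/4, 3/4]])

p = np.array([1.0, 0.0])   # start in state 0 with certainty
for t in range(5):
    p = tpm @ p            # probability vector at the next time step
    print(t + 1, p)
```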

37 Exercise
Consider the sequence below as a Markov information source and create its state-transition diagram and matrix representation:
abcaccaabccccaaabc aaccacaccaaaaabcc
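One possible way to estimate the TPM from the sequence by counting transitions; this sketch treats the two lines as one continuous sequence, which is an assumption about the original slide:

```python
import numpy as np

# Assumption: the line break is a simple continuation of a single sequence.
sequence = "abcaccaabccccaaabc" + "aaccacaccaaaaabcc"
states = sorted(set(sequence))              # ['a', 'b', 'c']
index = {s: i for i, s in enumerate(states)}

counts = np.zeros((len(states), len(states)))
for prev, nxt in zip(sequence, sequence[1:]):
    counts[index[nxt], index[prev]] += 1    # row = next state, column = current state

tpm = counts / counts.sum(axis=0)           # normalize each column to sum to 1
print(states)
print(tpm)
```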

38 Review: Convenient properties of transition probability matrix
The product of two TPMs is also a TPM
Every TPM has eigenvalue 1
|λ| ≤ 1 for all eigenvalues of any TPM
If the transition network is strongly connected, the TPM has one and only one eigenvalue 1 (no degeneration)

39 Review: TPM and asymptotic probability distribution
|l|  1 for all eigenvalues of any TPM If the transition network is strongly connected, the TPM has one and only one eigenvalue 1 (no degeneration) → This eigenvalue is a unique dominant eigenvalue and the probability vector will eventually converge to its corresponding eigenvector

40 Exercise
Calculate the asymptotic probability distribution of the Markov information source with p(1|0) = p(0|1) = 1/4, i.e., with TPM = [[3/4, 1/4], [1/4, 3/4]].
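A numpy sketch for checking the answer: extract the eigenvector associated with eigenvalue 1 and normalize it into a probability vector (for this symmetric TPM the result should be [1/2, 1/2]):

```python
import numpy as np

tpm = np.array([[3/4, 1/4],
                [1/4, 3/4]])

eigenvalues, eigenvectors = np.linalg.eig(tpm)
i = np.argmin(np.abs(eigenvalues - 1))   # pick the eigenvalue closest to 1
v = np.real(eigenvectors[:, i])
asymptotic = v / v.sum()                 # normalize so the entries sum to 1
print(asymptotic)                        # expected: [0.5, 0.5]
```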

41 Calculating Entropy of Markov Information Source

42 Review: Information entropy
Expected information H(A) when one of the individual events happens: H(A) = Σ_i p_i I(p_i) = −Σ_i p_i log p_i
This applies only to memoryless information sources, in which events are independent of each other

43 Generalizing information entropy
For other types of information sources, where events are not independent, information entropy is defined as:
H{X} = lim_{k→∞} H(X_{k+1} | X_1 X_2 … X_k)
X_k: the k-th value of the random variable X

44 Calculating information entropy of Markov information source (1)
H{X} = lim_{k→∞} H(X_{k+1} | X_1 X_2 … X_k)
This is the expected entropy of the (k+1)-th value given a specific history of the past k values
All that matters is the last value of the history, so let's focus on X_k

45 Calculating information entropy of Markov information source (2)
p(X_k = x): probability for the last (k-th) value to be x
H(X_{k+1} | X_1 X_2 … X_k) = Σ_x p(X_k = x) H(X_{k+1} | X_k = x) = −Σ_x p(X_k = x) Σ_y a_{yx} log a_{yx} = Σ_x p(X_k = x) h(a_x)
a_{yx}: the element in the y-th row and x-th column of the TPM
h(a_x): the entropy of the x-th column vector of the TPM

46 Calculating information entropy of Markov information source (3)
H(X_{k+1} | X_1 X_2 … X_k) = Σ_x p(X_k = x) h(a_x)
If the information source has only one asymptotic probability distribution q: lim_{k→∞} p(X_k = x) = q_x (the x-th element of q)
H{X} = lim_{k→∞} H(X_{k+1} | X_1 X_2 … X_k) = h · q
h: a row vector whose x-th element is h(a_x)

47 Calculating information entropy of Markov information source (4)
H{X} = lim_{k→∞} H(X_{k+1} | X_1 X_2 … X_k) = h · q
If the information source has only one asymptotic probability distribution, its information entropy is given by the average of the entropies of the TPM's column vectors, weighted by that asymptotic probability distribution
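A Python sketch of the full calculation H{X} = h · q for the p(1|0) = p(0|1) = 1/4 source discussed above; the expected result is the entropy of a 3/4 vs. 1/4 split, roughly 0.811 bits per symbol:

```python
import math
import numpy as np

def entropy(probs):
    """H = -sum(p * log2(p)) in bits, skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

tpm = np.array([[3/4, 1/4],
                [1/4, 3/4]])

# Asymptotic distribution q: eigenvector of eigenvalue 1, normalized to sum to 1.
eigenvalues, eigenvectors = np.linalg.eig(tpm)
v = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1))])
q = v / v.sum()

# h: entropy of each column of the TPM.
h = np.array([entropy(tpm[:, x]) for x in range(tpm.shape[1])])

print(h @ q)   # entropy rate H{X}, about 0.811 bits per symbol
```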

48 Exercise
Calculate the information entropy of the Markov information source we discussed earlier:
abcaccaabccccaaabc aaccacaccaaaaabcc

49 Summary
The complexity of a system may be characterized using information: length of description, entropy (ambiguity of knowledge)
Mutual information quantifies the coupling between two components within a system
Entropy may be measured for Markov information sources as well

