1 Ariel Caticha on Information and Entropy, July 8, 2007

2 E. T. Jaynes, “Information Theory and Statistical Mechanics,” Physical Review, 1957.

3 Information and Entropy. Ariel Caticha, Department of Physics, University at Albany - SUNY. MaxEnt 2007.

4 Preliminaries. Goal: reasoning with incomplete information, expressed as degrees of rational belief. Problem 1: the description of a state of knowledge; a consistent web of beliefs is represented by probabilities.

5 Problem 2: updating probabilities when new information becomes available. What is information? Why entropy? Which entropy? Are Bayesian and entropy methods compatible? Updating methods: entropy and Bayes' rule.

6 Entropy and heat, multiplicities, disorder: Clausius, Maxwell, Boltzmann, Gibbs, ...
Entropy as a measure of information (MaxEnt): Shannon, Jaynes, Kullback, Renyi, ...
Entropy as a tool for updating (M.E.): Shore & Johnson, Skilling, Csiszar, ...

7 An important distinction: MaxEnt is a method to assign probabilities relative to an underlying measure; Bayes' rule is a method to update probabilities from a prior.
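
The contrast can be made concrete with a short sketch in generic notation (m(x) is the underlying measure, f a constraint function, x' the observed data; these symbols are not taken from the slide):

    MaxEnt (assignment):   maximize   S[p] = -\int dx\, p(x)\ln\frac{p(x)}{m(x)}
                           subject to \int dx\, p(x)\, f(x) = F
                           \Rightarrow p(x) \propto m(x)\, e^{-\lambda f(x)}

    Bayes (updating):      p(\theta) = q(\theta\,|\,x') = \frac{q(\theta)\, q(x'\,|\,\theta)}{q(x')}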

8 Our goal: MaxEnt allows arbitrary constraints but no priors; Bayes allows arbitrary priors but no constraints. M.E. is an updating method that allows both.

9 The logic behind the M.E. method. The goal: to update from old beliefs to new beliefs when new information becomes available. The prior q(x) is updated to the posterior p(x) subject to constraints. Information is what induces a change in beliefs; information is what constrains rational beliefs.

10 An analogy from mechanics: force is whatever induces a change of motion, carrying an initial state of motion to a final state of motion. Likewise, information is what induces a change of rational beliefs.

11 Question: how do we select a distribution from among all those that satisfy the constraints? Skilling: rank the distributions according to preference. Transitivity: if p1 is better than p2, and p2 is better than p3, then p1 is better than p3. To each p assign a real number S[p,q] such that S[p1,q] > S[p2,q] whenever p1 is ranked above p2.

12 This answers the question “Why an entropy?” Remarks: entropies are real numbers designed to be maximized. Next question: how do we select the functional S[p,q]? Answer: use induction. We want to generalize from special cases where the best distribution is known to all other cases.

13 Skilling's method of induction: if a general theory exists, it must apply to special cases. If a special case is known, it can be used to constrain the general theory. If enough special cases are known, the general theory is constrained completely.* The known special cases are called the axioms. (* But if too many are imposed, the general theory might not exist.)

14 How do we choose the axioms? Basic principle: Minimal Updating. Prior information is valuable; beliefs should be revised only to the extent required by new evidence. (Shore & Johnson, Skilling, Karbelkar, Uffink, A.C., ...)

15 Axiom 1: Locality. Local information has local effects: if the information does not refer to a domain D, then p(x|D) is not updated. Consequence: the entropy is an integral of a local function of p(x). Axiom 2: Coordinate invariance. Coordinates carry no information. Consequence: the entropy depends on the distributions only through coordinate invariants.
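
The consequences were displayed as formulas that did not survive in this transcript; a sketch of the standard forms, following Caticha's published derivation (F, Φ and the density m(x) are the symbols used there), is:

    Axiom 1 (locality):              S[p,q] = \int dx\, F\big(p(x), x\big)
    Axiom 2 (coordinate invariance): S[p,q] = \int dx\, m(x)\, \Phi\!\left(\frac{p(x)}{m(x)}, \frac{q(x)}{m(x)}\right)

where m(x) is a density still to be determined and the ratios p/m and q/m are the coordinate invariants.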

16 To determine m(x), use Axiom 1 (Locality) again: if there is no new information, there is no update. Consequence: m(x) is the prior. To determine the remaining function we need more special cases.
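
In the published derivation this step fixes m(x) ∝ q(x), so that at this stage the entropy reads (a reconstruction, since the formula is missing here):

    S[p,q] = \int dx\, q(x)\, \Phi\!\left(\frac{p(x)}{q(x)}\right)

with the function Φ still to be determined.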

17 Axiom 3: Consistency for independent systems. When systems are known to be independent, it should not matter whether they are treated jointly or separately. Consequence: a one-parameter family of entropies, the η-entropies of the next slide.
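
The resulting family is not reproduced in the transcript; one common way to write it (up to additive and multiplicative constants, following Caticha and Giffin) is:

    S_\eta[p,q] \;\propto\; \frac{1}{\eta(\eta+1)}\left[\,1 - \int dx\, \frac{p^{\eta+1}(x)}{q^{\eta}(x)}\,\right]

whose η → 0 limit is the logarithmic relative entropy  S_0[p,q] = -\int dx\, p(x)\ln\frac{p(x)}{q(x)}.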

18 We have a one-dimensional continuum of η-entropies. Can we live with η-dependent updating? NO!! We just need more known special cases to single out a unique S[p,q]. Suppose different systems could have different ηs.

19 Independent systems with different ηs. Single system 1: use S_η1. Single system 2: use S_η2. Combined system 1+2: use S_η with some undetermined η. But treating the combined system is equivalent to using S_η on system 1 and S_η on system 2 separately. Therefore η = η1 and η = η2.

20 Conclusion: η must be a universal constant. What is the value of η? We need more special cases! Hint: for large N we do not need entropy; we can use the law of large numbers.

21 Axiom 4: Consistency with large numbers. The counts follow a multinomial distribution; for large N the observed frequencies converge (in probability) to the probabilities.
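
The formulas are missing from the transcript; a sketch of the standard large-N statements (with counts n_i, frequencies f_i = n_i/N, and probabilities θ_i) is:

    P(n\,|\,N,\theta) = \frac{N!}{n_1!\cdots n_m!}\,\theta_1^{n_1}\cdots\theta_m^{n_m}

    \frac{1}{N}\ln P \;\longrightarrow\; -\sum_i f_i \ln\frac{f_i}{\theta_i}
    \qquad\text{and}\qquad f_i \longrightarrow \theta_i \quad\text{(in probability)}

Matching the η-family of entropies against this limit is what singles out η = 0.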

22 Conclusion: the only consistent ranking criterion for updating probabilities is the logarithmic relative entropy, S[p,q] = -∫dx p(x) ln[p(x)/q(x)]. Other entropies may be useful for other purposes, but for updating this is the only candidate of general applicability.
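
As a concrete illustration (my addition, not part of the talk), here is a minimal numerical sketch of an ME update of a discrete prior under a single expectation constraint. The exponential form of the maximizer, p ∝ q·exp(-λf), is the standard result of maximizing relative entropy with a Lagrange multiplier; the helper names are mine.

    import numpy as np

    def me_update(q, f, F, lam_lo=-50.0, lam_hi=50.0, tol=1e-12):
        """ME posterior p_i ∝ q_i exp(-lam f_i) with sum_i p_i f_i = F (bisection on lam)."""
        p = q
        for _ in range(200):
            lam = 0.5 * (lam_lo + lam_hi)
            w = q * np.exp(-lam * (f - f.mean()))   # shift f for numerical stability
            p = w / w.sum()
            m = (p * f).sum()
            if abs(m - F) < tol:
                break
            if m > F:          # <f> decreases as lam grows, so raise the lower bound
                lam_lo = lam
            else:
                lam_hi = lam
        return p

    x = np.arange(1, 7, dtype=float)        # a six-sided die
    q = np.full(6, 1 / 6)                   # uniform prior
    p = me_update(q, x, 4.5)                # constraint: mean pip count is 4.5
    print(np.round(p, 4), (p * x).sum())    # skewed toward large faces, mean ≈ 4.5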

23 Bayes' rule from ME. Maximize the appropriate entropy subject to the data constraints: this is an ∞ number of constraints, one for each data point.

24 The joint posterior then yields a new marginal for θ, which is Bayes' rule!!
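
The formulas on slides 23 and 24 did not survive in this transcript; the following sketch follows the published derivation by Caticha and Giffin, with joint prior q(x,θ) = q(θ) q(x|θ) and observed data x':

    maximize    S[p,q] = -\int dx\, d\theta\; p(x,\theta)\,\ln\frac{p(x,\theta)}{q(x,\theta)}
    subject to  p(x) \equiv \int d\theta\, p(x,\theta) = \delta(x - x')   (one constraint per value of x)

    \Rightarrow\quad p(x,\theta) = \delta(x - x')\, q(\theta\,|\,x)

    \Rightarrow\quad p(\theta) = q(\theta\,|\,x') = q(\theta)\,\frac{q(x'\,|\,\theta)}{q(x')},

which is Bayes' rule.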

25 A hybrid example: maximize the appropriate entropy to obtain the posterior.
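
The content of this slide is mostly missing. In the published version of this example (Giffin and Caticha, where the data x' is supplemented by an expectation constraint ⟨f(θ)⟩ = F), the ME posterior takes the form (up to the sign convention for the multiplier λ):

    p(\theta) \;\propto\; q(\theta)\, q(x'\,|\,\theta)\, e^{\lambda f(\theta)}

so that Bayes' rule (λ = 0) and a MaxEnt update (no data) appear as special cases of a single update.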

26 Conclusions and remarks: The tool for updating is (relative) entropy. Minimal updating: prior information is valuable. Entropy needs no interpretation. Bayes is a special case of M.E. Information is what constrains rational beliefs.

27 Bayes' rule for repeatable experiments: the same construction, with data constraints for each repetition and the corresponding posterior.

28 Constraints do not commute in general: the order in which constraints are processed matters.
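
A small numerical illustration (my addition, not from the talk): processing two expectation constraints sequentially with ME gives order-dependent results, because each update only guarantees that the most recently imposed constraint holds. The helper reproduces the bisection-based update sketched earlier.

    import numpy as np

    def me_update(q, f, F):
        """ME posterior p ∝ q*exp(-lam*f) with sum(p*f) = F, by bisection on lam."""
        lo, hi = -50.0, 50.0
        p = q
        for _ in range(200):
            lam = 0.5 * (lo + hi)
            p = q * np.exp(-lam * (f - f.mean()))
            p /= p.sum()
            if (p * f).sum() > F:
                lo = lam
            else:
                hi = lam
        return p

    x = np.arange(1, 7, dtype=float)
    q = np.full(6, 1 / 6)
    f1, F1 = x, 4.5                       # constraint C1: <x> = 4.5
    f2, F2 = (x - 3.5) ** 2, 1.5          # constraint C2: <(x - 3.5)^2> = 1.5

    p12 = me_update(me_update(q, f1, F1), f2, F2)   # C1 then C2
    p21 = me_update(me_update(q, f2, F2), f1, F1)   # C2 then C1
    print(np.round(p12, 4))
    print(np.round(p21, 4))
    print((p12 * f1).sum(), (p21 * f1).sum())       # only the second still has <x> = 4.5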

