Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t Kevin T. Kelly Department of Philosophy Carnegie Mellon University www.cmu.edu.

1 Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t Kevin T. Kelly Department of Philosophy Carnegie Mellon University www.cmu.edu

2 Further Reading
- “Efficient Convergence Implies Ockham's Razor”, Proceedings of the 2002 International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA, June 24-27, 2002 (with C. Glymour).
- “Why Probability Does Not Capture the Logic of Scientific Justification”, in C. Hitchcock, ed., Contemporary Debates in the Philosophy of Science, Oxford: Blackwell, 2004.
- “Justification as Truth-finding Efficiency: How Ockham's Razor Works”, Minds and Machines 14, 2004, pp. 485-505.
- “Learning, Simplicity, Truth, and Misinformation”, in The Philosophy of Information, under review.
- “Ockham's Razor, Efficiency, and the Infinite Game of Science”, proceedings, Foundations of the Formal Sciences 2004: Infinite Game Theory, Springer, under review.

3 Which Theory to Choose? Theories T1 through T5 are all compatible with the data.

4 Use Ockham’s Razor. (Theories T1 through T5, arrayed from simple to complex.)

5 Dilemma. If you know the truth is simple, then you don’t need Ockham.

6 Dilemma. If you don’t know the truth is simple, then how could a fixed simplicity bias help you if the truth is complex?

7 Puzzle. A fixed bias is like a broken thermometer that always reads “Cold!”. How could it possibly help you find unknown truth?

8 I. Ockham Apologists

9 Wishful Thinking. Simple theories are nice if true: testability, unity, best explanation, aesthetic appeal, data compression. But so is believing that you are the emperor.

10 Overfitting. Maximum likelihood estimates based on overly complex theories can have greater predictive error (AIC, cross-validation, etc.). The same is true even if you know the true model is complex: overfitting avoidance doesn’t converge to the true model and depends on random data. Even if God announces “The truth is complex,” the reply is “Thanks, but a simpler model still has lower predictive error.”

11 Ignorance = Knowledge. “Messy worlds are legion / Tidy worlds are few. / That is why the tidy worlds / Are those most likely true.” (Carnap)

12 Each tidy world gets prior probability 1/3. “Messy worlds are legion / Tidy worlds are few. / That is why the tidy worlds / Are those most likely true.” (Carnap)

13 Ignorance = Knowledge? Under a finer partition, the probabilities become 2/6, 1/6, 2/6, 1/6. “Messy worlds are legion / Tidy worlds are few. / That is why the tidy worlds / Are those most likely true.” (Carnap)

14 Depends on Notation. But mess depends on coding, as Goodman noticed, too: the picture is inverted if we translate green to grue. Does notation indicate truth?

15 Same for Algorithmic Complexity. Goodman’s problem works against every fixed simplicity ranking (independent of the processes by which data are generated and coded prior to learning). Extra problem: any pair-wise ranking of theories can be reversed by choosing an alternative computer language. So how could simplicity help us find the true theory?

16 Just Beg the Question. Assign high prior probability to simple theories. Why should you? A preference for complexity has the same “explanation”: “You presume simplicity; therefore you should presume simplicity!”

17 Miracle Argument. Simple data would be a miracle if a complex theory were true (Bayes, BIC, Putnam).

18 Begs the Question. “Fairness” between theories ⇒ bias against complex worlds.

19 Two Can Play That Game. “Fairness” between worlds ⇒ bias against the simple theory.

20 Convergence. At least a simplicity bias doesn’t prevent convergence to the truth (MDL, BIC, Bayes, SGS, etc.). But neither do other biases. One may as well recommend flat tires, since they can be fixed.

21 Does Ockham Have No Frock? Philosopher’s stone, perpetual motion, free lunch… Ockham’s razor??? The ash heap of history…

22 II. How Ockham Helps You Find the Truth

23 What is Guidance? Indication or tracking: too strong; a fixed bias can’t indicate anything. Convergence: too weak; true of other biases. “Straightest” convergence: just right?

24 A True Story Niagara Falls Pittsburgh Clarion

27 A True Story Niagara Falls Pittsburgh Clarion !

28 Clarion A True Story Niagara Falls Pittsburgh

29 A True Story ?

30

31 Ask directions!

32 A True Story Where’s …

33 What Does She Say? Turn around. The freeway ramp is on the left.

34 You Have Better Ideas. Phooey! The Sun was on the right!

35 You Have Better Ideas !!

36

37

38

39 Stay the Course! Ahem…

40

41

42 Don’t Flip-flop!

43

44

45 Then Again…

46

47 @@$##!

48 One Good Flip Can Save a Lot of Flop Pittsburgh

49 The U-Turn Pittsburgh

55 Pittsburgh Told ya!

61 Your Route Pittsburgh Needless U-turn

62 The Best Route Pittsburgh Told ya!

63 The Best Route Anywhere from There Pittsburgh NY, DC Told ya!

64 The Freeway to the Truth ComplexSimple Fixed advice for all destinations Fixed advice for all destinations Disregarding it entails an extra course reversal… Disregarding it entails an extra course reversal… Told ya!

65 The Freeway to the Truth ComplexSimple …even if the advice points away from the goal! …even if the advice points away from the goal! Told ya!

66 Counting Marbles

67

68 May come at any time…

75 Ockham’s Razor (“3?”). If you answer, answer with the current count.

76 Analogy. Marbles = detectable “effects”. Late appearance = difficulty of detection. Count = model (e.g., causal graph). Appearance times = free parameters.
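The counting game can be sketched in code. This is a minimal illustration of my own, not the slides' formalism: marbles may appear at any stage, and the Ockham learner always answers with the current observed count.

```python
# Hypothetical sketch: a marble-counting world where marbles appear at
# arbitrary stages, and an Ockham learner that reports the current count.

def ockham_learner(observed_count):
    """Ockham's razor for counting: report exactly what has appeared."""
    return observed_count

def run_game(appearance_times, horizon):
    """Feed the learner the data stream and count its revisions."""
    answers = []
    count = 0
    for t in range(horizon):
        count += appearance_times.count(t)  # marbles appearing at stage t
        answers.append(ockham_learner(count))
    # A revision occurs whenever the answer changes.
    revisions = sum(1 for a, b in zip(answers, answers[1:]) if a != b)
    return answers[-1], revisions

final, revisions = run_game(appearance_times=[2, 5, 5], horizon=10)
print(final, revisions)  # converges to the true count 3 after 2 revisions
```

Note that the Ockham learner's revisions are exactly the moments when new marbles appear; it never retracts needlessly.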

77 Analogy. U-turn = model revision (with content loss). Highway = revision-efficient truth-finding method.

78 The U-turn Argument. Suppose you converge to the truth but violate Ockham’s razor along the way. (Output so far: 3.)

79 The U-turn Argument. Where is that extra marble, anyway? (Output so far: 3.)

80 The U-turn Argument. It’s not coming, is it? (Output so far: 3.)

81 The U-turn Argument. If you never say 2, you’ll never converge to the truth…. (Output so far: 3.)

82 The U-turn Argument. That’s it. You should have listened to Ockham. (Outputs so far: 3, 2, 2, 2.)

83 The U-turn Argument. Oops! Well, no method is infallible! (Outputs so far: 3, 2, 2, 2.)

84 The U-turn Argument. If you never say 3, you’ll never converge to the truth…. (Outputs so far: 3, 2, 2, 2.)

85 The U-turn Argument. Embarrassing to be back at that old theory, eh? (Outputs so far: 3, 2, 2, 2, 3.)

86 The U-turn Argument. And so forth… (Outputs so far: 3, 2, 2, 2, 3, 4.)

87 The U-turn Argument. And so forth… (3, 2, 2, 2, 3, 4, 5.)

88 The U-turn Argument. And so forth… (3, 2, 2, 2, 3, 4, 5, 6.)

89 The U-turn Argument. And so forth… (3, 2, 2, 2, 3, 4, 5, 6, 7.)

90 The Score. You: 3, 2, 2, 2, 3, 4, 5, 6, 7 (in the subproblem).

91 The Score. Ockham: 2, 2, 2, 3, 4, 5, 6, 7 (in the subproblem).

92 Ockham is Necessary. If you converge to the truth and you violate Ockham’s razor, then some convergent method beats your worst-case revision bound in each answer in the subproblem entered at the time of the violation.

93 Ockham is Sufficient. If you converge to the truth and you never violate Ockham’s razor, then you achieve the worst-case revision bound of each convergent solution in each answer in each subproblem.

94 Efficiency. Efficiency = achievement of the best worst-case revision bound in each answer in each subproblem.
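The U-turn argument can be dramatized with a toy simulation. This is a hedged sketch of my own: the demon strategy and the particular violator below are illustrative choices, not the slides' formal construction.

```python
# Hedged sketch of the U-turn argument: a demon schedules marble
# appearances adversarially. The Ockham learner reports the current
# count; the violator leaps to "3" before any marbles appear and must
# U-turn back, paying an extra revision.

def demon(learner, true_count, horizon):
    """Release the next marble only once the learner's answer agrees
    with what has been shown; return the forced answer sequence."""
    shown = 0
    answers = []
    for t in range(horizon):
        answers.append(learner(shown, t))
        if answers[-1] == shown and shown < true_count:
            shown += 1  # demon springs the next marble
    return answers

def revisions(answers):
    return sum(1 for a, b in zip(answers, answers[1:]) if a != b)

ockham = lambda seen, t: seen                    # always the current count
violator = lambda seen, t: 3 if t < 3 else seen  # premature leap to 3

r_ockham = revisions(demon(ockham, true_count=2, horizon=20))
r_violator = revisions(demon(violator, true_count=2, horizon=20))
print(r_ockham, r_violator)  # the violator pays at least one extra revision
```

Against this demon the Ockham learner revises once per marble, while the violator must first retreat from its leap before it can converge.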

95 Ockham Efficiency Theorem. Among the convergent methods: Ockham = efficient!

96 “Mixed” Strategies. Mixed strategy = chance of output depends only on actual experience. Convergence in probability = chance of producing the true answer approaches 1 in the limit. Efficiency = achievement of the best worst-case expected revision bound in each answer in each subproblem.

97 Ockham Efficiency Theorem. Among the mixed methods that converge in probability: Ockham = efficient!

98 Dominance and “Support”. Every convergent method is weakly dominated in revisions by a clone who says “?” until stage n. 1. Convergence: must leap eventually. 2. Efficiency: only leap to the simplest. 3. Dominance: could always wait longer. But you can’t wait forever!

99 III. Ockham on Steroids

100 Ockham Wish List. General definition of Ockham’s razor. Compare revisions even when not bounded within answers. Prove the theorem for arbitrary empirical problems.

101 Empirical Problems. Problem = partition of a topological space. Potential answers = partition cells. Evidence = open (verifiable) propositions. Example: symmetry.

102 Example: Parameter Freeing. Euclidean topology. Say which parameters are zero. Evidence = an open neighborhood. (Curve fitting over (a1, a2): the answers are a1 = 0, a2 = 0; a1 > 0, a2 = 0; a1 > 0, a2 > 0; a1 = 0, a2 > 0.)

103 The Players. Scientist: produces an answer in response to current evidence. Demon: chooses evidence in response to the scientist’s choices.

104 Winning. The scientist wins by default if the demon doesn’t present an infinite nested sequence of basic open sets whose intersection is a singleton; else by merit if the scientist eventually always produces the true answer for the world selected by the demon’s choices.

105 Comparing Revisions. One answer sequence maps into another iff there is an order- and answer-preserving map from the first to the second (? is wild). Then the revisions of the first are as good as those of the second.

106 Comparing Revisions. The revisions of the first are strictly better if, in addition, the latter doesn’t map back into the former.
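One concrete reading of this mapping criterion can be coded up. This is my own sketch: the slides leave the map informal, and I treat “?” as a wildcard on either side of the comparison.

```python
# Hedged sketch: compare answer sequences by an order- and
# answer-preserving map, with '?' as a wildcard. `first` maps into
# `second` iff its entries can be matched, in order, to entries of
# `second` (a greedy subsequence match suffices for this reading).

def maps_into(first, second):
    j = 0
    for a in first:
        # advance until a compatible entry of `second` is found
        while j < len(second) and not (a == '?' or second[j] == '?'
                                       or a == second[j]):
            j += 1
        if j == len(second):
            return False
        j += 1
    return True

def as_good_as(first, second):
    """Revisions of `first` are as good as those of `second`."""
    return maps_into(first, second)

def strictly_better(first, second):
    """As good as, and the latter doesn't map back into the former."""
    return maps_into(first, second) and not maps_into(second, first)

print(as_good_as(['2', '3'], ['2', '2', '3']))            # True
print(strictly_better(['2', '3'], ['3', '2', '2', '3']))  # True
```

On this reading, the Ockham sequence 2, 3 is strictly better than the violator's 3, 2, 2, 3, since the latter cannot map back into the former.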

107 Comparing Methods. F is as good as G iff each output sequence of F is as good as some output sequence of G.

108 Comparing Methods. F is better than G iff F is as good as G and G is not as good as F.

109 Comparing Methods. F is strongly better than G iff each output sequence of F is strictly better than an output sequence of G but…

110 Comparing Methods. …no output sequence of G is as good as any of F.

111 Terminology. Efficient solution: as good as any solution in any subproblem.

112 What Simplicity Isn’t. Syntactic length. Data compression (MDL). Computational ease. Social “entrenchment” (Goodman). Number of free parameters (BIC, AIC). Euclidean dimensionality. These are only symptoms; they agree with simplicity only by accident!

113 What Simplicity Is. Simpler theories are compatible with deeper problems of induction. (Worst demon vs. smaller demon.)

114 Problem of Induction. No true information entails the true answer. This happens at answer boundaries.

115 Demonic Paths. A demonic path from w is a sequence of alternating answers that a demon can force an arbitrary convergent method through, starting from w. (E.g., 0 … 1 … 2 … 3 … 4.)

116 Simplicity Defined. The A-sequences are the demonic sequences beginning with answer A. A is as simple as B iff each B-sequence is as good as some A-sequence. E.g., the 3-sequences 3; 3, 4; 3, 4, 5; … are each as good as one of the 2-sequences 2, 3; 2, 3, 4; 2, 3, 4, 5; … So 2 is simpler than 3!

117 Ockham Answer. An answer as simple as any other answer; here, the number of observed particles. Each n-sequence n; n, n+1; n, n+1, n+2; … is as good as one of the sequences 2, …, n; 2, …, n, n+1; 2, …, n, n+1, n+2; … So 2 is Ockham!

118 Ockham Lemma. A is Ockham iff, for all demonic p, (A*p) ≤ some demonic sequence. (Demon: “I can force you through 2 but not through 3, 2. So 3 isn’t Ockham.”)

119 Ockham Answer. E.g., only the simplest curve compatible with the data is Ockham.

120 General Efficiency Theorem. If the topology is metrizable and separable and the question is countable, then: Ockham = efficient. Proof: uses Martin’s Borel determinacy theorem.

121 Stacked Problems. There is an Ockham answer at every stage (0, 1, 2, 3, …).

122 Non-Ockham ⇒ Strongly Worse. If the problem is a stacked countable partition over a restricted Polish space, then each Ockham solution is strongly better than each non-Ockham solution in the subproblem entered at the time of the violation.

123 Simplicity ≠ Low Dimension. Suppose God says the true parameter value is rational.

124 Simplicity ≠ Low Dimension. Topological dimension and integration theory dissolve. Does Ockham?

125 Simplicity ≠ Low Dimension. The proposed account survives in the preserved limit point structure.

126 IV. Ockham and Symmetry

127 Respect for Symmetry. If several simplest alternatives are available, don’t break the symmetry. Count the marbles of each color. You hear the first marble but don’t see it. Why red rather than green?

128 Respect for Symmetry. Before the noise, (0, 0) is Ockham. After the noise, no answer is Ockham: (0, 1) and (1, 0) are both demonic, while the sequences (0, 1), (1, 0) and (1, 0), (0, 1) are non-demonic.

129 Goodman’s Riddle. Count oneicles: a oneicle is a particle at any stage but one, where it is a non-particle. Oneicle translation is an auto-homeomorphism that does not preserve the problem. The unique Ockham answer is the current oneicle count, contradicting the unique Ockham answer in particle counting.

130 Supersymmetry. Say when each particle appears. Refines the counting problem. Every auto-homeomorphism preserves the problem. No answer is Ockham. No solution is Ockham. No method is efficient.

131 Dual Supersymmetry. Say only whether the particle count is even or odd. Coarsens the counting problem. The particle/oneicle auto-homeomorphism preserves the problem. Every answer is Ockham. Every solution is Ockham. Every solution is efficient.

132 Broken Symmetry. Count the even or just report odd. Coarsens the counting problem. Refines the even/odd problem. Unique Ockham answer at each stage. Exactly the Ockham solutions are efficient.

133 Simplicity Under Refinement.
- Supersymmetry (no answer is Ockham): time of particle appearance.
- Broken symmetry (unique Ockham answer): particle counting; oneicle counting; twoicle counting; particle counting or odd particles; oneicle counting or odd oneicles; twoicle counting or odd twoicles.
- Dual supersymmetry (both answers are Ockham): even/odd.

134 Proposed Theory is Right. Objective efficiency is grounded in problems. Symmetries in the preceding problems would wash out stronger simplicity distinctions. Hence, such distinctions would amount to mere conventions (like coordinate axes) that couldn’t have anything to do with objective efficiency.

135 Furthermore… If Ockham’s razor is forced to choose in the supersymmetrical problems, then either following Ockham’s razor increases revisions in some counting problems, or Ockham’s razor leads to contradictions as a problem is coarsened or refined.

136 V. Conclusion

137 What Ockham’s Razor Is. “Only output Ockham answers.” Ockham answer = a topological invariant of the empirical problem addressed.

138 What it Isn’t. A preference for brevity, computational ease, entrenchment, past success, Kolmogorov complexity, dimensionality, etc.

139 How it Works. Ockham’s razor is necessary for minimizing revisions prior to convergence to the truth.

140 How it Doesn’t. No possible method could: point at the truth; indicate the truth; bound the probability of error; bound the number of future revisions.

141 Spooky Ockham. Science without support or safety nets.

145 VI. Stochastic Ockham

146 “Mixed” Strategies. Mixed strategy = chance of output depends only on actual experience: P_e(M = H at n) = P_(e|n)(M = H at n).

147 Stochastic Case. Ockham = at each stage, you produce a non-Ockham answer with probability 0. Efficiency = achievement of the best worst-case expected revision bound in each answer in each subproblem, over all methods that converge to the truth in probability.

148 Stochastic Efficiency Theorem. Among the stochastic methods that converge in probability: Ockham = efficient!

149 Stochastic Methods. Your chance of producing an answer is a function of the observations made so far. (An urn is selected in light of the observations.)

150 Stochastic U-turn Argument. Suppose you converge in probability to the truth but produce a non-Ockham answer (3) with probability r > 0.

151 Stochastic U-turn Argument. Choose small ε > 0. Consider answer 4.

152 Stochastic U-turn Argument. By convergence in probability to the truth, you later produce 2 with probability p > 1 − ε/3.

153 Stochastic U-turn Argument. Etc.: then 3 with probability > 1 − ε/3, then 4 with probability > 1 − ε/3.

154 Stochastic U-turn Argument. Since ε can be chosen arbitrarily small: the sup of the probability of ≥ 3 revisions is ≥ r, and the sup of the probability of ≥ 2 revisions is 1.

155 Stochastic U-turn Argument. So the sup of expected revisions is ≥ 2 + 3r, but for Ockham it is 2 (in the subproblem).

156 VII. Statistical Inference (Beta Version)

157 The Statistical Puzzle of Simplicity. Assume: normal distribution, σ = 1, μ ≥ 0. Question: μ = 0 or μ > 0? Intuition: μ = 0 is simpler than μ > 0.

158 Analogy. Marbles: potentially small “effects”. Time: sample size. Simplicity: fewer free parameters tied to potential “effects”. Counting: freeing parameters in a model.

159 U-turn “in Probability”. Convergence in probability: the chance of producing the true model goes to unity. Retraction in probability: the chance of producing a model drops from above α to below 1 − α.

160 Suppose You (Probably) Choose a Model More Complex Than the Truth. True mean μ = 0, but the sample mean lands in the zone for choosing μ > 0. Revision counter: 0.

161 Eventually You Retract to the Truth (In Probability). True mean μ = 0; the sample mean lands in the zone for choosing μ = 0. Revision counter: 1.

162 So You (Probably) Output an Overly Simple Model Nearby. True mean μ > 0 but small; the sample mean lands in the zone for choosing μ = 0. Revision counter: 1.

163 Eventually You Retract to the Truth (In Probability). True mean μ > 0; the sample mean lands in the zone for choosing μ > 0. Revision counter: 2.

164 But Standard (Ockham) Testing Practice Requires Just One Retraction! True mean μ = 0; zone for choosing μ = 0. Revision counter: 0.

165 In the Simplest World, No Retractions. μ = 0; the sample mean stays in the zone for choosing μ = 0. Revision counter: 0.

166 In the Simplest World, No Retractions. μ = 0. Revision counter: 0.

167 In Remaining Worlds, at Most One Retraction. μ > 0; zone for choosing μ > 0. Revision counter: 0.

168 In Remaining Worlds, at Most One Retraction. μ > 0. Revision counter: 0.

169 In Remaining Worlds, at Most One Retraction. μ > 0; the sample mean eventually enters the zone for choosing μ > 0. Revision counter: 1.

170 So Ockham Beats All Violators. Ockham: at most one revision. Violator: at least two revisions in the worst case.

171 Summary. Standard practice is to “test” the point hypothesis rather than the composite alternative. This amounts to favoring the “simple” hypothesis a priori. It also minimizes revisions in probability!

172 Two-Dimensional Example. Assume: independent bivariate normal distribution of unit variance. Question: how many components of the joint mean are zero? Intuition: more nonzeros = more complex. Puzzle: how does it help to favor simplicity in less-than-simplest worlds?

173 A Real Model Selection Method. Bayes Information Criterion (BIC): BIC(M, sample) = −log(max prob that M can assign to sample) + log(sample size) × model complexity × ½. BIC method: choose M with the least BIC score.
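The BIC score on this slide can be computed directly for the earlier one-dimensional normal question (μ = 0 versus μ free, σ = 1 known). This is a hedged sketch: the data below are made up for illustration.

```python
import math

# Hedged sketch of the slide's formula:
# BIC(M, sample) = -log(max prob M assigns to sample)
#                  + log(sample size) * model_complexity * 1/2

def bic(sample, mu_fixed=None):
    n = len(sample)
    mu = sum(sample) / n if mu_fixed is None else mu_fixed  # MLE if free
    k = 0 if mu_fixed is not None else 1                    # free parameters
    # log-likelihood of N(mu, 1) data
    loglik = sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2
                 for x in sample)
    return -loglik + 0.5 * k * math.log(n)

data = [0.1, -0.2, 0.05, 0.0]  # data near zero: the simple model should win
print(bic(data, mu_fixed=0.0) < bic(data))  # True
```

The log(n) complexity penalty grows with sample size, which is what lets BIC keep choosing the simple model in the simplest world rather than flip-flopping.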

174 Official BIC Property. In the limit, minimizing BIC finds a model with maximal conditional probability when the prior probability is flat over models and fairly flat over parameters within a model. But it is also revision-efficient.

175 AIC in Simplest World. n = 2; μ = (0, 0); retractions = 0.

176 AIC in Simplest World. n = 100; μ = (0, 0); retractions = 0.

177 AIC in Simplest World. n = 4,000,000; μ = (0, 0); retractions = 0.

178 BIC in Simplest World. n = 2; μ = (0, 0); retractions = 0.

179 BIC in Simplest World. n = 100; μ = (0, 0); retractions = 0.

180 BIC in Simplest World. n = 4,000,000; μ = (0, 0); retractions = 0.

181 BIC in Simplest World. n = 20,000,000; μ = (0, 0); retractions = 0.

182 Performance in Complex World. n = 2; μ = (.05, .005); retractions = 0. (95% confidence region shown.)

183 Performance in Complex World. n = 100; μ = (.05, .005); retractions = 0.

184 Performance in Complex World. n = 30,000; μ = (.05, .005); retractions = 1.

185 Performance in Complex World. n = 4,000,000 (!); μ = (.05, .005); retractions = 2.

186 Question. Does the statistical retraction minimization story extend to violations in less-than-simplest worlds? Recall that the deterministic argument for higher retractions required the concept of minimizing retractions in each “subproblem”. A “subproblem” is a proposition verified at a given time in a given world. Some analogue “in probability” is required.

187 Subproblem. H is an α-subproblem in w at n iff there is a likelihood ratio test of {w} at significance < α such that this test has power < 1 − α at each world in H.

188 Significance Schedules. A significance schedule α(.) is a monotone decreasing sequence of significance levels converging to zero that drops so slowly that power can be increased monotonically with sample size.

189 Ockham Violation ⇒ Inefficient. (Subproblem in the (μX, μY) plane at sample size n.)

190 Ockham Violation ⇒ Inefficient. Ockham violation: probably say the blue hypothesis at the white world (p > α). (Subproblem at sample size n.)

191 Ockham Violation ⇒ Inefficient. Probably say blue; later, probably say white. (Subproblem at the time of violation.)

192 Ockham Violation ⇒ Inefficient. Probably say blue; later, probably say white. (Subproblem at the time of violation.)

193 Ockham Violation ⇒ Inefficient. Probably say blue; then probably say white; then probably say blue again. (Subproblem at the time of violation.)

194 Oops! Ockham ⇒ Inefficient? (Subproblem in the (μX, μY) plane.)

198 Oops! Ockham ⇒ Inefficient? Two retractions. (Subproblem in the (μX, μY) plane.)

199 Local Retraction Efficiency. Ockham does as well as the best subproblem performance in some neighborhood of w: at most one retraction there, versus two retractions for the violator. (Subproblem in the (μX, μY) plane.)

200 Ockham Violation ⇒ Inefficient. Note: no neighborhood around w avoids the extra retractions. (Subproblem at the time of violation.)

201 Gonzo Ockham ⇒ Inefficient. Gonzo = probably saying the simplest answer in the entire subproblem entered in the simplest world. (Subproblem in the (μX, μY) plane.)

202 Balance. Be Ockham (avoid complexity). Don’t be Gonzo Ockham (avoid bad fit). Truth-directed: the sole aim is to find the true model with minimal revisions! No circles: totally worst-case; no prior bias toward simple worlds.

203 THE END

