Ockham’s Razor in Causal Discovery: A New Explanation Kevin T. Kelly Conor Mayo-Wilson Department of Philosophy Joint Program in Logic and Computation.


1 Ockham’s Razor in Causal Discovery: A New Explanation. Kevin T. Kelly and Conor Mayo-Wilson, Department of Philosophy, Joint Program in Logic and Computation, Carnegie Mellon University. www.hss.cmu.edu/philosophy/faculty-kelly.php

2 I. Prediction vs. Policy

3 Predictive Links. Correlation or co-dependency allows one to predict Y from X. [Figure: Ash trays and Lung cancer; labels "scientist" and "policy maker"; "Ash trays linked to lung cancer!"]

4 Policy. Policy manipulates X to achieve a change in Y. [Figure: Ash trays and Lung cancer; "Ash trays linked to lung cancer!" "Prohibit ash trays!"]

5 Policy. Policy manipulates X to achieve a change in Y. [Figure: Ash trays and Lung cancer; "We failed!"]

6 Correlation is not Causation. Manipulation of X can destroy the correlation of X with Y. [Figure: Ash trays and Lung cancer; "We failed!"]

7 Standard Remedy. Randomized controlled study. [Figure: Ash trays and Lung cancer; "That’s what happens if you carry out the policy."]

8 Infeasibility. Expense. Morality. [Figure: Lead and IQ; "Let me force a few thousand children to eat lead."]

9 Infeasibility. Expense. Morality. [Figure: Lead and IQ; "Just joking!"]

10 Ironic Alliance. [Figure: Lead and IQ; industry: "Ha! You will never prove that lead affects IQ…"]

11 Ironic Alliance. [Figure: Lead and IQ; "And you can’t throw my people out of work on a mere whim."]

12 Ironic Alliance. [Figure: Lead and IQ; "So I will keep on polluting, which will never settle the matter because it is not a randomized trial."]

13 II. Causes From Correlations

14 Causal Discovery. Patterns of conditional correlation can imply unambiguous causal conclusions (Pearl, Spirtes, Glymour, Scheines, etc.). [Figure: network over Protein A, Protein B, Protein C, and Cancer protein; "Eliminate protein C!"]

15 Basic Idea. Causation is a directed, acyclic network over variables. What makes a network causal is a relation of compatibility between networks and joint probability distributions. [Figure: network G over X, Y, Z related by compatibility to a distribution p over X, Y, Z.]

16 Compatibility. Joint distribution p is compatible with directed, acyclic network G iff: Causal Markov Condition: each variable X is independent of its non-effects given its immediate causes. Faithfulness Condition: every conditional independence relation that holds in p is a consequence of the Causal Markov Condition. [Figure: example network over V, W, X, Y, Z.]
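The Causal Markov Condition can be read off a network mechanically. The sketch below is a minimal illustration, assuming a network is encoded as a Python dictionary from each variable to the set of its immediate causes (a hypothetical encoding, not one from the slides). For each variable it lists the non-effects that the variable is independent of given its immediate causes. Faithfulness cannot be checked this way, since it constrains p rather than G.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding: each variable maps to the set of its immediate causes (parents).
DAG = Dict[str, Set[str]]

def effects(dag: DAG, x: str) -> Set[str]:
    """All variables reachable from x along directed edges (x's direct and indirect effects)."""
    children = {v: {c for c, parents in dag.items() if v in parents} for v in dag}
    found: Set[str] = set()
    stack = [x]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def markov_independencies(dag: DAG) -> List[Tuple[str, FrozenSet[str], FrozenSet[str]]]:
    """(X, non-effects other than X's causes, X's causes): X is independent of the
    listed non-effects given its immediate causes."""
    facts = []
    for x in sorted(dag):
        causes = dag[x]
        non_effects = set(dag) - effects(dag, x) - causes - {x}
        if non_effects:
            facts.append((x, frozenset(non_effects), frozenset(causes)))
    return facts

common_cause = {"A": set(), "B": {"A"}, "C": {"A"}}         # B <- A -> C
common_effect = {"A": {"B", "C"}, "B": set(), "C": set()}   # B -> A <- C
print(markov_independencies(common_cause))
print(markov_independencies(common_effect))
```

For the common cause this yields "B is independent of C given A"; for the common effect it yields "B is independent of C given nothing", matching the next three slides.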

17 Common Cause. [Figure: A is a common cause of B and C.] B yields info about C (Faithfulness); B yields no further info about C given A (Markov).

18 Causal Chain. [Figure: causal chain between B and C passing through A.] B yields info about C (Faithfulness); B yields no further info about C given A (Markov).

19 Common Effect. [Figure: A is a common effect of B and C.] B yields no info about C (Markov); B yields extra info about C given A (Faithfulness).
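These patterns show up in simulated data. The sketch below assumes the linear causal case described on slide 37 (each variable a linear function of its parents plus an independent normal error term), with every coefficient set to 1 purely for illustration, and measures dependence by correlation and partial correlation. The function name `partial_corr` is a hypothetical helper, not a library call.

```python
import numpy as np

def partial_corr(x: np.ndarray, y: np.ndarray, z: np.ndarray) -> float:
    """Correlation of x and y after linearly regressing each on z (with intercept)."""
    design = np.column_stack([np.ones_like(z), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
n = 100_000

# Common cause B <- A -> C: every coefficient is 1 (an illustrative choice).
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = a + rng.normal(size=n)
cause = (float(np.corrcoef(b, c)[0, 1]), partial_corr(b, c, a))

# Common effect B -> A <- C.
b2 = rng.normal(size=n)
c2 = rng.normal(size=n)
a2 = b2 + c2 + rng.normal(size=n)
effect = (float(np.corrcoef(b2, c2)[0, 1]), partial_corr(b2, c2, a2))

print("common cause:  corr(B,C) = %.3f, corr(B,C | A) = %.3f" % cause)
print("common effect: corr(B,C) = %.3f, corr(B,C | A) = %.3f" % effect)
```

With these settings the common cause gives corr(B, C) near 0.5 and corr(B, C | A) near 0, while the common effect gives corr(B, C) near 0 and corr(B, C | A) near -0.5: conditioning on a common effect creates dependence.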

20 Distinguishability. [Figure: the common cause and the causal chains over A, B, C are indistinguishable; the common effect is distinctive.]

21 Immediate Connections. There is an immediate causal connection between X and Y iff X is dependent on Y given every subset of variables not containing X and Y (Spirtes, Glymour and Scheines). [Figure: when X and Y are directly connected, no intermediate conditioning set breaks the dependency; when X and Y are connected only through Z and W, some conditioning set breaks the dependency.]

22 Recovery of Skeleton. Apply the preceding condition to recover every non-oriented immediate causal connection. [Figure: recovered skeleton over X, Y, Z beside the true network.]
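Slides 21 and 22 together describe a brute-force procedure. The sketch below assumes an oracle `independent(x, y, s)` that reports the true conditional independence facts (a hypothetical stand-in for the independence relations the slides take as given). It keeps an edge exactly when no conditioning set breaks the dependency. Because it checks every subset, it is exponential in the number of variables; practical algorithms such as PC search conditioning sets more economically.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Iterator, Set

Oracle = Callable[[str, str, FrozenSet[str]], bool]

def subsets(items: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """Every subset of items, from the empty set upward."""
    pool = sorted(items)
    for k in range(len(pool) + 1):
        for combo in combinations(pool, k):
            yield frozenset(combo)

def recover_skeleton(variables: Set[str], independent: Oracle) -> Set[FrozenSet[str]]:
    """Keep X - Y iff X is dependent on Y given every subset of the other variables."""
    edges = set()
    for x, y in combinations(sorted(variables), 2):
        others = set(variables) - {x, y}
        if not any(independent(x, y, s) for s in subsets(others)):
            edges.add(frozenset({x, y}))
    return edges

# Hypothetical oracle for the true network X -> Y <- Z:
# the only independence is X _||_ Z given the empty set.
TRUE_INDEPENDENCIES = {(frozenset({"X", "Z"}), frozenset())}

def oracle(x: str, y: str, s: FrozenSet[str]) -> bool:
    return (frozenset({x, y}), s) in TRUE_INDEPENDENCIES

skeleton = recover_skeleton({"X", "Y", "Z"}, oracle)
print(sorted(sorted(edge) for edge in skeleton))  # [['X', 'Y'], ['Y', 'Z']]
```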

23 Orientation of Skeleton. Look for the distinctive pattern of common effects. [Figure: a common effect identified in the skeleton over X, Y, Z, beside the true network.]

24 Orientation of Skeleton. Look for the distinctive pattern of common effects. Draw all deductive consequences of these orientations. [Figure: the common effect is oriented first; because Y is not a common effect of the remaining pair, the remaining orientation must be downward; compared with the true network.]
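The two orientation steps can be sketched as follows. The sketch assumes the skeleton from the previous step and, for every non-adjacent pair, the conditioning set that broke its dependency (both passed in as plain Python sets; the encoding is hypothetical). Step 1 orients X - Y - Z as X → Y ← Z when X and Z are non-adjacent and Y was not needed to separate them. Step 2 applies one deductive consequence, the rule usually credited to Meek: if A → B - C with A and C non-adjacent, then B → C, since otherwise B would be an extra common effect. Further propagation rules are omitted.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

def orient(skeleton: Set[FrozenSet[str]],
           sepset: Dict[FrozenSet[str], Set[str]]) -> Set[Tuple[str, str]]:
    """Return directed edges (a, b), meaning a -> b; unoriented edges are left out."""
    adj: Dict[str, Set[str]] = {}
    for edge in skeleton:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    directed: Set[Tuple[str, str]] = set()
    # Step 1: common effects. X - Y - Z with X, Z non-adjacent and Y not in
    # the set that separated X from Z is oriented X -> Y <- Z.
    for y in adj:
        for x, z in combinations(sorted(adj[y]), 2):
            if z not in adj[x] and y not in sepset[frozenset({x, z})]:
                directed |= {(x, y), (z, y)}
    # Step 2: deductive consequences. A -> B - C with A, C non-adjacent forces
    # B -> C, because C -> B would make B a common effect that step 1 did not find.
    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for c in adj[b]:
                open_edge = (b, c) not in directed and (c, b) not in directed
                if c != a and open_edge and c not in adj[a]:
                    directed.add((b, c))
                    changed = True
    return directed

# Hypothetical input: skeleton X - Y, Z - Y, Y - W, where X and Z were separated
# by the empty set and W was separated from X and from Z by {Y}.
skeleton = {frozenset({"X", "Y"}), frozenset({"Z", "Y"}), frozenset({"Y", "W"})}
sepset = {frozenset({"X", "Z"}): set(), frozenset({"X", "W"}): {"Y"}, frozenset({"Z", "W"}): {"Y"}}
print(sorted(orient(skeleton, sepset)))  # [('X', 'Y'), ('Y', 'W'), ('Z', 'Y')]
```

As on slide 24, once X → Y ← Z is found, Y is not a common effect of Z and W, so the edge Y - W must point downward.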

25 Causation from Correlation. The following network is causally unambiguous if all variables are observed. [Figure: network over Protein A, Protein B, Protein C, and Cancer protein.]

26 Causation from Correlation. The red arrow is also immune to latent confounding causes. [Figure: the same network with one arrow highlighted in red.]

27 Brave New World for Policy. Experimental (confounder-proof) conclusions from correlational data! [Figure: network over Protein A, Protein B, Protein C, and Cancer protein; "Eliminate protein C!"]

28 III. The Catch

29 Metaphysics vs. Inference. The above results all assume that the true statistical independence relations for p are given. But they must be inferred from finite samples. [Figure: sample → inferred statistical dependencies → causal conclusions.]

30 Problem of Induction. Independence is indistinguishable from sufficiently small dependence at sample size n. [Figure: independence vs. dependence, given the data.]
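One way to quantify slide 30 in the linear case is the usual Fisher z approximation for the sample size needed to detect a correlation ρ with a two-sided test. The sketch below uses that standard approximation (not a method from the slides), with the conventional α = 0.05 and power 0.8 as illustrative defaults; `required_n` is a hypothetical name. The required n grows roughly like 1/ρ², so for any fixed sample size there are dependencies too small to detect.

```python
import math
from statistics import NormalDist

def required_n(rho: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate sample size at which a two-sided Fisher z test of zero correlation
    detects a true correlation rho with the given power."""
    z = NormalDist().inv_cdf
    effect = math.atanh(abs(rho))   # Fisher z transform of the true correlation
    return math.ceil(((z(1 - alpha / 2) + z(power)) / effect) ** 2 + 3)

for rho in (0.1, 0.01, 0.001):
    print(rho, required_n(rho))
```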

31 Bridging the Inductive Gap. Assume conditional independence until the data show otherwise. Ockham’s razor: assume no more causal complexity than necessary.
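In the linear case, "assume conditional independence until the data show otherwise" is typically implemented as a significance test that keeps independence unless it is rejected. The sketch below assumes a Fisher z test of a sample partial correlation r with k conditioning variables and n samples; the function name and α = 0.05 are illustrative choices. The same small partial correlation is accepted as independence at n = 100 but rejected at n = 10,000, which is how later samples can overturn earlier conclusions.

```python
import math
from statistics import NormalDist

def keep_independence(r: float, n: int, k: int, alpha: float = 0.05) -> bool:
    """Keep the hypothesis of conditional independence unless a Fisher z test of the
    sample partial correlation r (n samples, k conditioning variables) rejects it."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value >= alpha

# The same weak partial correlation at two sample sizes:
print(keep_independence(0.05, n=100, k=1))      # True: independence is kept
print(keep_independence(0.05, n=10_000, k=1))   # False: the dependence is detected
```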

32 Inferential Instability. There is no guarantee that small dependencies will not be detected later, and detecting them can have a spectacular impact on prior causal conclusions.

33 Current Policy Analysis. [Figure: network over Protein A, Protein B, Protein C, and Cancer protein; "Eliminate protein C!"]

34 As Sample Size Increases… [Figure: the same network with a weak connection to Protein D; "Rescind that order!"]

35 As Sample Size Increases Again… [Figure: the network with weak connections to Protein D and Protein E; "Eliminate protein C again!"]

36 As Sample Size Increases Again… [Figure: the network with weak connections to Protein D and Protein E; "Etc." "Eliminate protein C again!"]

37 Typical Applications. Linear Causal Case: each variable X is a linear function of its parents and a normally distributed hidden variable called an “error term”. The error terms are mutually independent. Discrete Multinomial Case: each variable X takes on a finite range of values.

38 An Optimistic Concession. No unobserved latent confounding causes. [Figure: Genetics, Smoking, Cancer.]

39 Causal Flipping Theorem. No matter what a consistent causal discovery procedure has seen so far, there exists a pair G, p satisfying the above assumptions such that the current sample is arbitrarily likely in p and the procedure produces arbitrarily many opposite conclusions in p about an arbitrary causal arrow in G as sample size increases. [Figure: "Oops, I meant…"]

40 Causal Flipping Theorem. Every consistent causal inference method is covered. Therefore, multiple instability is an intrinsic feature of the causal discovery problem. [Figure: "Oops, I meant…"]

41 The Crooked Course "Living in the midst of ignorance and considering themselves intelligent and enlightened, the senseless people go round and round, following crooked courses, just like the blind led by the blind." Katha Upanishad, I. ii. 5.

42 Extremist Reaction. Since causal discovery cannot lead straight to the truth, it is not justified. "I must remain silent. Therefore, I win."

43 Moderate Reaction. "Many explanations have been offered to make sense of the here-today-gone-tomorrow nature of medical wisdom — what we are advised with confidence one year is reversed the next — but the simplest one is that it is the natural rhythm of science." (Do We Really Know What Makes Us Healthy?, NY Times Magazine, Sept. 16, 2007.)

44 Skepticism Inverted. Unavoidable retractions are justified because they are unavoidable. Avoidable retractions are not justified because they are avoidable. So the best possible methods for causal discovery are those that minimize causal retractions. The best possible means for finding the truth are justified.

45 Larger Proposal. The same holds for Ockham’s razor in general when the aim is to find the true theory.

46 IV. Ockham’s Razor

47 Which Theory is Right? ???

48 Ockham Says: Choose the Simplest!

49 But Why? Gotcha!

50 Puzzle. An indicator must be sensitive to what it indicates. [Figure label: "simple".]

51 Puzzle. An indicator must be sensitive to what it indicates. [Figure label: "complex".]

52 Puzzle. But Ockham’s razor always points at simplicity. [Figure label: "simple".]

53 Puzzle. But Ockham’s razor always points at simplicity. [Figure label: "complex".]

54 Puzzle. How can a broken compass help you find something unless you already know where it is? [Figure label: "complex".]

55 Standard Accounts. 1. Prior simplicity bias: Bayes, BIC, MDL, MML, etc. 2. Risk minimization: SRM, AIC, cross-validation, etc.

56 1. Bayesian Account. Ockham’s razor is a feature of one’s personal prior belief state. Short run: no objective connection with finding the truth (the flipping theorem applies). Long run: converges to the truth, but other prior biases would also lead to convergence.

57 2. Risk Minimization Account. Risk minimization is about prediction rather than truth. It urges using a false causal theory rather than the known true theory for predictive purposes. Therefore, it is not suited to exact science or to practical policy applications.

58 V. A New Foundation for Ockham’s Razor

59 Connections to the Truth. Short-run reliability: too strong to be feasible when theory matters. Long-run convergence: too weak to single out Ockham’s razor. [Figure: paths between simple and complex theories.]

60 Middle Path. Short-run reliability: too strong to be feasible when theory matters. “Straightest” convergence: just right? Long-run convergence: too weak to single out Ockham’s razor. [Figure: paths between simple and complex theories.]

61 Empirical Problems. Set K of infinite input sequences. Partition of K into alternative theories. [Figure: K partitioned into T1, T2, T3.]

62 Empirical Methods. Map finite input sequences to theories or to “?”. [Figure: a finite input sequence e in K mapped to T3.]

63 Method Choice. At each stage, the scientist can choose a new method (agreeing with past theory choices). [Figure: input history e1, e2, e3, e4 and output history over T1, T2, T3.]

64 Aim: Converge to the Truth. [Figure: output sequence ?, T2, ?, T1, … settling on T1 in K.]

65 Retraction. Choosing T and then not choosing T next. [Figure: T followed by T′ or ?.]

66 Aim: Eliminate Needless Retractions. [Figure label: Truth.]

67 Aim: Eliminate Needless Retractions. [Figure label: Truth.]

68 Aim: Eliminate Needless Delays to Retractions. [Figure label: theory.]

69 Aim: Eliminate Needless Delays to Retractions. [Figure: a theory and the applications and corollaries built on it.]

70 Why Timed Retractions? Retraction minimization = generalized significance level. Retraction time minimization = generalized power.

71 Easy Retraction Time Comparisons. [Figure: output sequences of Method 1 and Method 2 over T1 to T4; one method's retractions are at least as many and at least as late.]
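Slides 65 and 71 can be made concrete for finite output sequences. The sketch below assumes outputs are strings naming theories, with "?" for suspending judgment; a retraction at stage i means some theory was output at stage i - 1 and not at stage i. The comparison is a simplified reading of "at least as many, at least as late" that pairs the i-th retractions of the two methods; the function names and output histories are hypothetical.

```python
from typing import List, Sequence

def retraction_times(outputs: Sequence[str]) -> List[int]:
    """Stages i at which a theory output at stage i - 1 is not output at stage i ("?" = no theory)."""
    return [i for i in range(1, len(outputs))
            if outputs[i - 1] != "?" and outputs[i] != outputs[i - 1]]

def at_least_as_bad(worse: List[int], better: List[int]) -> bool:
    """Simplified comparison: `worse` has at least as many retractions as `better`,
    and its i-th retraction comes no earlier than the i-th retraction of `better`."""
    return len(worse) >= len(better) and all(w >= b for w, b in zip(worse, better))

method_1 = ["T1", "T2", "T2", "T2", "T4"]           # hypothetical output histories
method_2 = ["T1", "T1", "T2", "T2", "T3", "T4"]
r1, r2 = retraction_times(method_1), retraction_times(method_2)
print(r1, r2)                      # [1, 4] [2, 4, 5]
print(at_least_as_bad(r2, r1))     # True
print(retraction_times(["?", "T1", "?", "T1"]))  # [2]
```

Here Method 2 retracts at stages 2, 4 and 5 and Method 1 at stages 1 and 4, so Method 2's retractions are at least as many and at least as late. Moving from "?" to a theory is not a retraction.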

72 Worst-case Retraction Time Bounds. [Figure: output sequences over T1 to T4 and their worst-case retraction time bound (1, 2, ∞).]

73 Curve Fitting. Data = open intervals around Y at rational values of X.

74 Curve Fitting. No effects: [figure: constant curve].

75 Curve Fitting. First-order effect: [figure: linear curve].

76 Curve Fitting. Second-order effect: [figure: quadratic curve].
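The Ockham answer in curve fitting is the lowest polynomial degree compatible with the effects seen so far. The sketch below simplifies the slides' setting by assuming exact data points at rational values of X instead of open intervals. The lowest degree of an exactly interpolating polynomial is read off Newton's divided differences, computed with exact fractions to avoid rounding. The function name is hypothetical.

```python
from fractions import Fraction
from numbers import Rational
from typing import List, Tuple

def ockham_degree(points: List[Tuple[Rational, Rational]]) -> int:
    """Lowest degree of a polynomial through all (x, y) points (distinct x values),
    read off Newton's divided differences."""
    xs = [Fraction(x) for x, _ in points]
    coeffs = [Fraction(y) for _, y in points]
    n = len(points)
    # After pass j, coeffs[i] holds the divided difference f[x_{i-j}, ..., x_i] for i >= j.
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - j])
    # coeffs[k] multiplies a degree-k Newton basis polynomial, so the last nonzero one sets the degree.
    nonzero = [k for k, c in enumerate(coeffs) if c != 0]
    return max(nonzero) if nonzero else 0

squares = [(0, 0), (1, 1), (2, 4), (3, 9)]   # exact samples of y = x**2
path = [ockham_degree(squares[:k]) for k in range(1, len(squares) + 1)]
print(path)  # [0, 1, 2, 2]
```

Feeding in the points of y = x² one at a time gives the Ockham path constant, linear, quadratic, quadratic: each new effect forces exactly one retraction, and the answer never runs ahead of the data.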

77–80 Ockham. [Figure: a path through Constant, Linear, Quadratic, Cubic.] "There yet?" "Maybe."

81 Ockham Violation. [Figure: Constant, Linear, Quadratic, Cubic.] "There yet?" "Maybe."

82 Ockham Violation. [Figure: Constant, Linear, Quadratic, Cubic.] "I know you’re coming!"

83 Ockham Violation. [Figure: Constant, Linear, Quadratic, Cubic.] "Maybe."

84 Ockham Violation. [Figure: Constant, Linear, Quadratic, Cubic.] "!!!" "Hmm, it’s quite nice here…"

85 Ockham Violation. [Figure: Constant, Linear, Quadratic, Cubic.] "You’re back! Learned your lesson?"

86 Violator’s Path. [Figure: Constant, Linear, Quadratic, Cubic.] "See, you shouldn’t run ahead, even if you are right!"

87 Ockham Path. [Figure: Constant, Linear, Quadratic, Cubic.]
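The game in slides 77 to 87 can be simulated. The sketch below uses a hypothetical encoding: a theory is the number of effects (0 = constant, 1 = linear, 2 = quadratic), and the demon reveals the next effect only after the method has output the Ockham answer for `patience` consecutive stages, which any convergent method must eventually do. The violator runs ahead to "quadratic" right after the first effect; the demon simply waits, so the violator must come back before the second effect appears. All names and parameters are illustrative.

```python
from typing import Callable, List

Method = Callable[[int, int], int]   # (effects seen, stages since last effect) -> theory

def ockham(seen: int, t: int) -> int:
    return seen                       # the simplest theory compatible with the effects seen

def violator(seen: int, t: int) -> int:
    return 2 if seen == 1 and t < 2 else seen   # runs ahead to "quadratic" after one effect

def play(method: Method, total_effects: int = 2, patience: int = 2,
         max_stages: int = 100) -> List[int]:
    """Hypothetical demon: reveal the next effect only after the method has output
    the Ockham answer for `patience` consecutive stages."""
    seen, t, streak, outputs = 0, 0, 0, []
    for _ in range(max_stages):
        answer = method(seen, t)
        outputs.append(answer)
        streak = streak + 1 if answer == seen else 0
        if streak < patience:
            t += 1
        elif seen < total_effects:
            seen, t, streak = seen + 1, 0, 0   # the demon finally shows the next effect
        else:
            break                              # settled on the true theory
    return outputs

def retractions(outputs: List[int]) -> int:
    return sum(1 for prev, cur in zip(outputs, outputs[1:]) if cur != prev)

for method in (ockham, violator):
    history = play(method)
    print(method.__name__, history, retractions(history))
```

Both methods end on the true theory, but the Ockham method retracts twice (0 → 1 → 2) while the violator retracts three times (0 → 2 → 1 → 2), even though its early guess was right.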

88 More General Argument Required. Cover the case in which the demon has branching paths (causal discovery).

89 More General Argument Required. Cover the case in which the scientist lags behind (using time as a cost). "Come on!"

90 Empirical Effects

91

92–98 Empirical Effects. May take arbitrarily long to discover, but can’t be taken back.

99 Empirical Theories. The true theory is determined by which effects appear.

100 Empirical Complexity. [Figure label: "More complex".]

101–102 Background Constraints. [Figure label: "More complex".]

103 Ockham’s Razor. Don’t select a theory unless it is uniquely simplest in light of experience.

104 Weak Ockham’s Razor. Don’t select a theory unless it is among the simplest in light of experience.

105 Stalwartness. Don’t retract your answer while it is uniquely simplest.
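For problems in which a theory is fixed by which effects appear (slide 99), slides 103 to 105 suggest a simple decision rule. The sketch below assumes each theory is given by the finite set of effects it predicts and, as a simplification of the empirical complexity defined on slide 122, ranks compatible theories by how many effects they add to those already observed; theory and effect names are hypothetical. Because it returns the uniquely simplest theory whenever there is one and "?" otherwise, it is Ockham in the strong sense and never retracts an answer that is still uniquely simplest.

```python
from typing import Dict, FrozenSet, Set

Theories = Dict[str, FrozenSet[str]]   # hypothetical: theory name -> effects it predicts

def ockham_answer(theories: Theories, observed: Set[str]) -> str:
    """Return the uniquely simplest theory compatible with the observed effects, else "?"."""
    compatible = {name: fx for name, fx in theories.items() if observed <= fx}
    if not compatible:
        return "?"
    extra = {name: len(fx - observed) for name, fx in compatible.items()}
    fewest = min(extra.values())
    simplest = [name for name, k in extra.items() if k == fewest]
    return simplest[0] if len(simplest) == 1 else "?"

polynomials = {"constant": frozenset(), "linear": frozenset({"e1"}),
               "quadratic": frozenset({"e1", "e2"})}
branching = {"A": frozenset({"a"}), "B": frozenset({"b"}), "AB": frozenset({"a", "b"})}
print(ockham_answer(polynomials, set()), ockham_answer(polynomials, {"e1"}))  # constant linear
print(ockham_answer(branching, set()), ockham_answer(branching, {"a"}))      # ? A
```

In the branching case, A and B tie before any effect appears, so strong Ockham suspends judgment rather than guessing.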

106 Stalwartness

107 Timed Retraction Bounds. r(M, e, n) = the least timed retraction bound covering the total timed retractions of M along input streams of complexity n that extend e. [Figure: bounds for M plotted against empirical complexity 0, 1, 2, 3, ….]

108 Efficiency of Method M at e. M converges to the truth no matter what; and for each convergent M′ that agrees with M up to the end of e, and for each n: r(M, e, n) ≤ r(M′, e, n). [Figure: bounds for M and M′ plotted against empirical complexity 0, 1, 2, 3, ….]

109 M is Beaten at e. There exists a convergent M′ that agrees with M up to the end of e such that: for each n, r(M, e, n) ≥ r(M′, e, n); and there exists n with r(M, e, n) > r(M′, e, n). [Figure: bounds for M and M′ plotted against empirical complexity 0, 1, 2, 3, ….]
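The efficiency and beating conditions are pointwise comparisons across complexity levels n. The sketch below is a simplification that assumes each bound r(M, e, n) has been summarized by a single number (for instance a worst-case retraction count, with `math.inf` for an unbounded case), whereas the slides' bounds are sequences of retraction times. The bound lists and function names are hypothetical, and only finitely many levels n are checked.

```python
import math
from typing import List, Sequence

def is_beaten(r_m: Sequence[float], r_other: Sequence[float]) -> bool:
    """M is beaten by M' if M' does at least as well at every complexity n
    and strictly better at some n."""
    pairs = list(zip(r_m, r_other))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)

def is_efficient(r_m: Sequence[float], competitors: List[Sequence[float]]) -> bool:
    """M is efficient if its bound is no worse than every competitor's at every n."""
    return all(all(a <= b for a, b in zip(r_m, r)) for r in competitors)

# Hypothetical bounds for complexities n = 0, 1, 2, 3.
ockham_bounds = [0, 1, 2, 3]
violator_bounds = [1, 2, 3, math.inf]
print(is_beaten(violator_bounds, ockham_bounds))                        # True
print(is_efficient(ockham_bounds, [ockham_bounds, violator_bounds]))   # True
```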

110 Ockham Efficiency Theorem. Let M be a solution. The following are equivalent: M is always strongly Ockham and stalwart; M is always efficient; M is never weakly beaten.

111 Example: Causal Inference. Effects are conditional statistical dependence relations, e.g. X dep Y | {Z}, {W}, {Z,W}; Y dep Z | {X}, {W}, {X,W}; X dep Z | {Y}, {Y,W}; …

112 Causal Discovery = Ockham’s Razor. [Figure: variables X, Y, Z, W.]

113 Ockham’s Razor. [Figure: variables X, Y, Z, W.] X dep Y | {Z}, {W}, {Z,W}.

114 Causal Discovery = Ockham’s Razor. [Figure: variables X, Y, Z, W.] X dep Y | {Z}, {W}, {Z,W}; Y dep Z | {X}, {W}, {X,W}; X dep Z | {Y}, {Y,W}.

115 Causal Discovery = Ockham’s Razor. [Figure: variables X, Y, Z, W.] X dep Y | {Z}, {W}, {Z,W}; Y dep Z | {X}, {W}, {X,W}; X dep Z | {Y}, {W}, {Y,W}.

116 Causal Discovery = Ockham’s Razor. [Figure: variables X, Y, Z, W.] X dep Y | {Z}, {W}, {Z,W}; Y dep Z | {X}, {W}, {X,W}; X dep Z | {Y}, {W}, {Y,W}; Z dep W | {X}, {Y}, {X,Y}; Y dep W | {Z}, {X,Z}.

117 Causal Discovery = Ockham’s Razor. [Figure: variables X, Y, Z, W.] X dep Y | {Z}, {W}, {Z,W}; Y dep Z | {X}, {W}, {X,W}; X dep Z | {Y}, {W}, {Y,W}; Z dep W | {X}, {Y}, {X,Y}; Y dep W | {X}, {Z}, {X,Z}.

118 VI. Simplicity Defined

119 Approach. Empirical complexity reflects the nested problems of induction posed by the problem. Hence, simplicity is problem-relative but topologically invariant.

120 Empirical Problems. Set K of infinite input sequences. Partition Q of K into alternative theories. [Figure: K partitioned into T1, T2, T3.]

121 Simplicity Concepts. A simplicity concept for (K, Q) is just a well-founded order < on a partition S of K, with ascending chains of order type not exceeding omega, such that: 1. each element of S is included in some answer in Q; 2. each downward union in (S, <) is closed; 3. incomparable sets share no boundary point; 4. each element of S is included in the boundary of its successor.

122 Empirical Complexity Defined. Let K|e denote the set of all possibilities compatible with observations e. Let (S, <) be a simplicity concept for (K|e, Q). Define c(w, e) = the length of the longest < path to the cell of S that contains w. Define c(T, e) = the least c(w, e) such that T is true in w.
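For a finite simplicity concept, c(w, e) is a longest-path computation. The sketch below assumes a hypothetical four-cell concept (no effects, effect a, effect b, both effects) given as a map from each cell to the cells immediately below it, with each cell assigned to a hypothetical theory. The topological conditions of slide 121 are taken for granted rather than checked. Path length is counted in edges, so the minimal cell has complexity 0, and c(w, e) is the complexity of the cell containing w.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical finite simplicity concept: each cell lists the cells immediately below it.
BELOW: Dict[str, List[str]] = {
    "no effects": [],
    "effect a": ["no effects"],
    "effect b": ["no effects"],
    "effects a, b": ["effect a", "effect b"],
}
# Hypothetical assignment of cells to theories (several cells may share a theory).
THEORY_OF: Dict[str, str] = {
    "no effects": "T0", "effect a": "T1", "effect b": "T1", "effects a, b": "T2",
}

@lru_cache(maxsize=None)
def c_cell(cell: str) -> int:
    """Length (in edges) of the longest < path ending at `cell`."""
    return max((c_cell(lower) + 1 for lower in BELOW[cell]), default=0)

def c_theory(theory: str) -> int:
    """Least complexity of a cell in which the theory is true."""
    return min(c_cell(cell) for cell, t in THEORY_OF.items() if t == theory)

print({cell: c_cell(cell) for cell in BELOW})
print({t: c_theory(t) for t in ("T0", "T1", "T2")})
```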

123 Applications. Polynomial laws: complexity = degree. Conservation laws: complexity = number of particle types minus number of conserved quantities. Causal networks: complexity = number of logically independent conditional dependencies entailed by faithfulness.

124 General Ockham Efficiency Theorem. Let M be a solution. The following are equivalent: M is always strongly Ockham and stalwart; M is always efficient; M is never beaten.

125 Conclusions. Causal truths are necessary for counterfactual predictions. Ockham’s razor is necessary for staying on the straightest path to the true theory but does not point at the true theory. No evasions or circles are required.

126 Future Directions. Extension of the unique efficiency theorem to stochastic model selection. Latent variables as Ockham conclusions. Degrees of retraction. Pooling of marginal Ockham conclusions. Retraction efficiency assessment of MDL and SRM.

127 Suggested Reading. "Ockham’s Razor, Truth, and Information", in Handbook of the Philosophy of Information, J. van Benthem and P. Adriaans, eds., to appear. "Ockham’s Razor, Empirical Complexity, and Truth-finding Efficiency", Theoretical Computer Science, 383: 270-289, 2007. Both available as pre-prints at: www.hss.cmu.edu/philosophy/faculty-kelly.php

128 1. Prior Simplicity Bias The simple theory is more plausible now because it was more plausible yesterday.

129 More Subtle Version. Simple data are a miracle in the complex theory but not in the simple theory. [Figure: theories P and C; regularity: retrograde motion of Venus at solar conjunction; "Has to be!"]

130 However… e would not be a miracle given P(θ). Why not this? [Figure: C and P.]

131 The Real Miracle. Ignorance about model: p(C) ≈ p(P); + ignorance about parameter setting: p(P(θ) | P) ≈ p(P(θ′) | P); = knowledge about C vs. P(θ): p(P(θ)) << p(C). Lead into gold. Perpetual motion. Free lunch. "Sounds good!"
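A worked version of slide 131, assuming for illustration that P has N equally weighted parameter settings (a hypothetical discretization) and that "ignorance" is modeled by equal priors at each level:

```latex
p(C) = p(P) = \tfrac{1}{2}, \qquad
p\big(P(\theta) \mid P\big) = \tfrac{1}{N} \ \text{for each setting } \theta,
\\[4pt]
p\big(P(\theta)\big) = p\big(P(\theta) \mid P\big)\, p(P) = \tfrac{1}{2N}
\;\ll\; \tfrac{1}{2} = p(C) \qquad \text{when } N \gg 1.
```

Two layers of equal weighting thus yield a strong preference for C over any particular setting of P, which is the knowledge extracted from ignorance that the slide calls the real miracle.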

132 Standard Paradox of Indifference. Ignorance of red vs. not-red + ignorance over not-red = knowledge about red vs. white. Knognorance = all the privileges of knowledge with none of the responsibilities. "Sounds good!"

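133 The Ellsberg Paradox. [Figure: urn in which one color has known probability 1/3 and the other two have unknown probabilities.]

134 Human Preference. [Figure: the same urn.] a > b, but a ∪ c < b ∪ c.

135 Human View. [Figure: the same urn, with labels "knowledge" and "ignorance" on the gambles.] a > b, but a ∪ c < b ∪ c.

136 Bayesian “Rationality”. [Figure: the same urn, with label "knognorance".] a > b, so a ∪ c > b ∪ c.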
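The Bayesian side of slides 134 to 136 can be checked directly. The sketch below assumes the standard Ellsberg setup that the figure appears to show: red has probability 1/3, black and yellow share the remaining 2/3 in unknown proportion, a is a bet on red, b a bet on black, and c adds yellow to either bet. For every subjective probability of black, expected utility ranks a against b exactly as it ranks a ∪ c against b ∪ c, so the common human pattern (a > b but a ∪ c < b ∪ c) has no Bayesian representation. The function name is hypothetical.

```python
from fractions import Fraction
from typing import Tuple

def ellsberg_preferences(p_black: Fraction) -> Tuple[bool, bool]:
    """Expected-utility comparisons (payoff 1 if the bet wins, else 0) for a
    subjective probability of black; red is known to be 1/3."""
    red = Fraction(1, 3)
    black = Fraction(p_black)
    yellow = Fraction(2, 3) - black
    a_over_b = red > black                       # bet on red beats bet on black
    bc_over_ac = black + yellow > red + yellow   # black-or-yellow beats red-or-yellow
    return a_over_b, bc_over_ac

# Scan subjective probabilities of black in [0, 2/3] on a fine grid.
grid = [Fraction(k, 300) for k in range(201)]
human_pattern = [q for q in grid if all(ellsberg_preferences(q))]
print("priors reproducing a > b and a ∪ c < b ∪ c:", human_pattern)
```

The list is empty: a > b requires P(black) < 1/3, while a ∪ c < b ∪ c requires P(black) > 1/3.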

137 In Any Event. The coherentist foundations of Bayesianism have nothing to do with short-run truth-conduciveness. "Not so loud!"

138 Bayesian Convergence. Too-simple theories get shot down… [Figure: updated opinion over theories ordered by complexity.]

139–141 Bayesian Convergence. Plausibility is transferred to the next-simplest theory… [Figure: updated opinion over theories ordered by complexity; "Blam!" "Plink!"]

142 Bayesian Convergence. The true theory is never shot down. [Figure: updated opinion over theories ordered by complexity; "Blam!" "Zing!"]

143 Convergence. But alternative strategies also converge: any theory choice in the short run is compatible with convergence in the long run.

144 Summary of Bayesian Approach. Prior-based explanations of Ockham’s razor are circular and based on a faulty model of ignorance. Convergence-based explanations of Ockham’s razor fail to single out Ockham’s razor.

145 2. Risk Minimization. Ockham’s razor minimizes the expected distance of empirical estimates from the true value. [Figure label: Truth.]

146 Unconstrained Estimates. Unconstrained estimates are centered on the truth but spread around it. [Figure: unconstrained aim; "Pop!"]

147 Constrained Estimates. Off-center but less spread. [Figure: clamped aim; truth.]

148 Constrained Estimates. Off-center but less spread. Overall improvement in expected distance from the truth… [Figure: clamped aim; truth; "Pop!"]

149 Doesn’t Find True Theory. The theory that minimizes estimation risk can be quite false… [Figure: clamped aim; "Four eyes!"]
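The trade-off in slides 146 to 149 is the familiar bias-variance decomposition of expected squared error. The sketch below takes the most extreme "clamped aim": always estimating a mean as 0, i.e. adopting the simpler theory that the effect is absent, and compares it with the unconstrained sample mean of n normal observations with standard deviation sigma. The numbers are illustrative. When the true mean mu satisfies mu² < sigma²/n, the false simple theory has the smaller risk.

```python
def risk_unconstrained(sigma: float, n: int) -> float:
    """Expected squared error of the sample mean: unbiased, variance sigma^2 / n."""
    return sigma ** 2 / n

def risk_clamped(mu: float) -> float:
    """Expected squared error of always answering 0: no variance, squared bias mu^2."""
    return mu ** 2

mu, sigma = 0.1, 1.0   # true effect is small but nonzero, so "no effect" is false
for n in (25, 400):
    better = "clamped" if risk_clamped(mu) < risk_unconstrained(sigma, n) else "unconstrained"
    print(n, risk_unconstrained(sigma, n), risk_clamped(mu), better)
```

At n = 25 the false "no effect" answer has lower risk; by n = 400 the unconstrained estimate wins. Which answer minimizes risk therefore depends on the sample size as well as on the truth.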

150 Makes Sense… …when the loss of an answer is similar in nearby distributions. [Figure: loss plotted against similarity to p; "Close is good enough!"]

151 But Not When Truth Matters… …i.e., when the loss of an answer is discontinuous with similarity. [Figure: loss plotted against similarity to p; "Close is no cigar!"]

152 Examples. Theoretical science: small terms matter. Causal discovery: practical policy depends on causal orientation, which depends upon small dependencies.

