
Slide 1: A Falsificationist Theory of Statistical Inference
Max Albert, Justus Liebig University Giessen, May 2010

1. The Problem
   1.1 Critical Rationalism (CR)
   1.2 Extending CR to Statistics
2. Maximizing Empirical Content
3. Conclusions

Slide 2: 1.1 Critical Rationalism (CR)

Ingredients
Law-like hypothesis H = ∀x (Cx → Fx).
Background knowledge B: corroborated assumptions, uncontentious observation statements; neither B ⊢ H nor B ⊢ ¬H.
Severe test: requires a test situation a ("trial") where Ca holds, and where B does not already imply Fa.

Basic Methodological Rules
Make severe tests.
Falsification: Reject H if you observe Ca ∧ ¬Fa.
Corroboration: Accept H if you observe Ca ∧ Fa.

Note: These are decision rules, or rules concerning rational beliefs. The complete argument is deductive (with premises concerning rational beliefs), not inductive (Musgrave 1993).

Slide 3: Possible Errors: Standard Fare in CR

First-kind error (FKE): erroneous falsification (false observation statements, due to observational errors).
Second-kind error (SKE): erroneous corroboration (false observation statements, or factors other than Ca leading to Fa).

Safeguards: improve the basic methodological rules.
Accept observation statements only if they come from certain trusted sources (qualified personnel, standard procedures, incentives to be thorough and truthful).
Make several trials. Vary conditions that should be irrelevant.

Error Probabilities
Error probabilities of safeguarding procedures are unknown.
Error probabilities in statistical testing come from an additional source ("sampling error").
Total error probabilities are therefore always unknown.

Slide 4: Blocking Ad-hoc Explanations

Consider H = ∀x (Cx → Fx) and relevant data Ca ∧ Fa, Cb ∧ Fb.
Afterwards, one can often find some different initial condition T that was also fulfilled in these trials (i.e., find that Ta, Tb). Therefore, one might claim that, for instance, H* = ∀x (Tx → Fx) is also corroborated because it also explains the data.
This is usually not accepted: proponents of H* will have to provide new data like, ideally, Ch ∧ ¬Th ∧ ¬Fh and ¬Ck ∧ Tk ∧ Fk.

Use Novelty Criterion (UNC, Worrall 2010): Old data that have been used to construct H* do not support H*. In other words: new data are required in order to show that an SKE occurred.
UNC regulates scientific competition: it provides incentives for new empirical work and prevents purely parasitic theories.

Slide 5: 1.2 Extending CR to Statistics

Ingredients
Law-like hypothesis H = (C, X, P): if initial condition C holds, then X is distributed according to probability distribution P.
Propensity interpretation, causal hypothesis: C is a (generalized) cause of the X values (Albert 2007). H implies that X is i.i.d. in different trials.
Add methodological rules for checking the then-part.

Fisher's Theory of Significance Testing
Let H_0 = (C, X, P).
1. Choose a level of significance α.
2. Choose a rejection region R with P(X ∈ R) = α where densities are lowest ("cut off the tails").
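For a discrete null distribution, Fisher's rule can be sketched as follows: sort the outcomes by their probability under H_0 and cut off the least probable ones until adding another would exceed α. The binomial null and the numbers below are illustrative assumptions, not from the slides.

```python
from math import comb

def binom_pmf(j, n, p):
    # Probability of j successes in n Bernoulli(p) trials.
    return comb(n, j) * p**j * (1 - p) ** (n - j)

def fisher_region(n, p, alpha):
    """Fisher-style rejection region: add outcomes in ascending order
    of null probability while P(X in R) stays within alpha."""
    region, mass = [], 0.0
    for j in sorted(range(n + 1), key=lambda j: binom_pmf(j, n, p)):
        pj = binom_pmf(j, n, p)
        if mass + pj > alpha:
            break
        region.append(j)
        mass += pj
    return sorted(region), mass

# H_0: fair coin, n = 10 tosses, alpha = 0.05 (hypothetical example)
region, mass = fisher_region(10, 0.5, 0.05)
print(region, mass)
```

With these numbers the rule cuts off both tails, rejecting for very few or very many successes, at an actual size below the nominal α.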

Slide 6: Critical Discussion of Fisher's Theory

Neyman's Paradox: transformations of X can map low-density areas into high-density areas and vice versa.
Neyman-Pearson Theory (NPT): choose H_1 = (C, X, P_1). Maximize the power P_1(X ∈ R) = 1 − β subject to P(X ∈ R) ≤ α.
But: unless the statistical model H_0 ∨ H_1 is corroborated, this ignores third-kind errors (misspecification: H_0 ∨ H_1 false).
Mayo & Spanos (2006): Fisher's theory supplies misspecification tests for statistical models. However: if Fisher's theory works, H_0 can be tested in isolation.

My position: Neyman's Paradox poses no problems (Albert 2001). Falsificationism supports Fisher's basic idea (Albert 1992). The NPT applies if a statistical model follows from background knowledge. If not, hypotheses should be tested in isolation.
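The NPT recipe for two fully specified discrete hypotheses can be sketched greedily: put outcomes into R in order of decreasing likelihood ratio P_1/P under H_0's size constraint. The two distributions below are hypothetical, and randomization at the boundary (needed to hit α exactly) is ignored.

```python
# Hypothetical discrete hypotheses over four outcomes.
p0 = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}  # H_0
p1 = {1: 0.05, 2: 0.15, 3: 0.30, 4: 0.50}  # H_1
alpha = 0.25

# Add outcomes by decreasing likelihood ratio p1/p0 while the
# size constraint P0(X in R) <= alpha still holds.
region, size, power = [], 0.0, 0.0
for x in sorted(p0, key=lambda x: p1[x] / p0[x], reverse=True):
    if size + p0[x] > alpha + 1e-12:
        break
    region.append(x)
    size += p0[x]
    power += p1[x]

print(sorted(region), size, power)
```

Note that R is chosen entirely by reference to the alternative H_1; this is exactly why, on the slide's argument, the NPT presupposes a corroborated statistical model H_0 ∨ H_1.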

Slide 7: 2. Maximizing Empirical Content

Fact: all observable random variables are discrete (and finite).
Consequence: Neyman's Paradox cannot occur. "Low probability" replaces "low density": choose a rejection region R with P(X ∈ R) = α where probabilities are lowest.
Example: for α = 0.1, choose R = {1, 6}, not R = {2}. But why?
Falsificationist reason: given α, the rule maximizes the "empirical content" (R's share of the sample space).
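The die example can be checked directly. Using the face probabilities (1, 2, 5, 7, 4, 1)/20 given on slide 8, cutting off the lowest-probability faces at α = 0.1 selects R = {1, 6}; R = {2} has the same significance level but covers a smaller share of the sample space.

```python
# Loaded-die null from slide 8: P(X = j) given by (1, 2, 5, 7, 4, 1)/20.
probs = {1: 1/20, 2: 2/20, 3: 5/20, 4: 7/20, 5: 4/20, 6: 1/20}
alpha = 0.1

# Greedily add the lowest-probability faces while staying within alpha.
region, mass = [], 0.0
for face in sorted(probs, key=probs.get):
    if mass + probs[face] > alpha + 1e-12:
        break
    region.append(face)
    mass += probs[face]

print(sorted(region), mass)
```

Both R = {1, 6} and R = {2} have P(X ∈ R) = 0.1, but the first contains 2 of 6 outcomes (EC = 1/3) against 1 of 6 (EC = 1/6) for the second.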

Slide 8: Empirical Content (EC) for Sample Size n

Consider H = (C, X, P) with X ∈ {1, ..., k}, P(X = j) = p_j, and sample space S_n = {(h_1, ..., h_k): Σ_j h_j = n}, with h_j the frequency of X = j. Let the rejection region be R ⊆ S_n. Then EC(R) = |R|/|S_n|.
Maximizing EC(R) subject to P(X ∈ R) ≤ α yields the multinomial goodness-of-fit (mgof) test (approximated by the χ² gof test for n·p_j ≥ 20).
For each n, the mgof test yields a trade-off between 1 − α and EC. The trade-off is concave to the origin and shifts outward with increasing n.

Interpretation of EC
Non-probabilistic measure of the vulnerability of the hypothesis. Substitute for power.
Severe test: 1 − α and EC large.

[Figure: trade-off curves between 1 − α and EC for (p_1, ..., p_6) = (1, 2, 5, 7, 4, 1)/20 and n = 1, 5, 10, 20]
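The definitions above admit a small brute-force sketch for modest n: enumerate the frequency vectors in S_n (there are C(n+k−1, k−1) of them), sort them by multinomial probability, and cut off the least probable ones subject to P(X ∈ R) ≤ α; EC(R) is then |R|/|S_n|. The greedy cut is optimal here because vectors are added in ascending probability order, so skipping one can never help.

```python
from math import comb, factorial

def multinomial_prob(h, p):
    # Probability of frequency vector h under i.i.d. sampling from p.
    n = sum(h)
    coef = factorial(n)
    for hj in h:
        coef //= factorial(hj)
    prob = float(coef)
    for hj, pj in zip(h, p):
        prob *= pj ** hj
    return prob

def compositions(n, k):
    # All (h_1, ..., h_k) with nonnegative entries summing to n.
    if k == 1:
        yield (n,)
        return
    for h1 in range(n + 1):
        for rest in compositions(n - h1, k - 1):
            yield (h1,) + rest

def mgof_region(n, p, alpha):
    """Mgof rejection region: cut off the lowest-probability frequency
    vectors subject to P(X in R) <= alpha, maximizing EC = |R|/|S_n|."""
    space = sorted(compositions(n, len(p)), key=lambda h: multinomial_prob(h, p))
    assert len(space) == comb(n + len(p) - 1, len(p) - 1)  # |S_n|
    region, mass = [], 0.0
    for h in space:
        ph = multinomial_prob(h, p)
        if mass + ph > alpha + 1e-12:
            break
        region.append(h)
        mass += ph
    return region, mass, len(region) / len(space)

# Slide 8's die: (p_1, ..., p_6) = (1, 2, 5, 7, 4, 1)/20, here with n = 5.
p = [1/20, 2/20, 5/20, 7/20, 4/20, 1/20]
region, mass, ec = mgof_region(5, p, 0.1)
print(len(region), mass, ec)
```

For n = 1 this reduces exactly to slide 7's example: the region holds the two unit vectors for faces 1 and 6, with EC = 2/6.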

Slide 9

Problem: why do samples differing w.r.t. order, like {Head, Tail} and {Tail, Head} in coin tossing, count as just one element?
Hacking (1965): neglecting order can only be justified by appeal to alternative hypotheses.
This is not quite right: H_0 = (C, X, P) implies that order is irrelevant. Ex post, one can always find "suspicious patterns" in the data. But the data do not support new hypotheses H_j = (C_j, X, P_j) explaining these patterns (UNC). Thus, tests can neglect order. Hypotheses inspired by patterns in the data must be tested on their own.
But: deciding how to count requires a further argument. The best argument takes the alternatives H_j = (C, X, P_j) into account: as the sample size n increases, the power of the test w.r.t. these alternatives goes up, approaching 1 for n → ∞.
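That H_0 itself makes order irrelevant is easy to verify for i.i.d. trials: any two sequences with the same frequency vector receive the same probability, so the null cannot distinguish {Head, Tail} from {Tail, Head}. The success probability below is an arbitrary illustration.

```python
p = 0.3  # hypothetical P(Head) under an i.i.d. coin-tossing null

def seq_prob(seq, p):
    # Probability of an ordered sequence of 'H'/'T' outcomes
    # under i.i.d. tosses with P(Head) = p.
    prob = 1.0
    for s in seq:
        prob *= p if s == "H" else (1 - p)
    return prob

# Same frequency vector (one H, one T), hence the same probability:
print(seq_prob("HT", p), seq_prob("TH", p))
```

Only the product of factors matters, and products are order-independent; this is why the mgof test can work with frequency vectors rather than ordered samples.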

Slide 10: 3. Conclusions

1. The multinomial goodness-of-fit test fulfills all the requirements of a falsificationist theory of statistical inference.
2. The NPT can be used by critical rationalists when the statistical model follows from background knowledge.
3. If there is no such model, each hypothesis of a disjunction (compound hypothesis) should be evaluated on its own ("disjunctive test criterion", defended by Bowley against Fisher; see Baird, BJPS 1983, on the χ² controversy).
4. Any kind of test can be used for heuristic purposes (diagnostic tests, search for suspicious patterns). This should not be confused with testing. Hypotheses found in this way have to be tested with new data (UNC).
