Introducing Statistical Inference with Randomization Tests Allan Rossman Cal Poly – San Luis Obispo


1 Introducing Statistical Inference with Randomization Tests Allan Rossman Cal Poly – San Luis Obispo arossman@calpoly.edu

2 Outline
2 × 2 tables
- Activity/example 1: Dolphin therapy?
- Activity/example 2: Murderous nurse?
Quantitative response
- Activity/example 3: Sleep deprivation?
- Activity/example 4: Age discrimination?
- Activity/example 5: Memory study?
Extensions, reflections, further reading

3 Example 1: Dolphin therapy?
Subjects who suffer from mild to moderate depression were flown to Honduras and randomly assigned to a treatment
Is dolphin therapy more effective than control?
Core question of inference:
- Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

4 Example 1 (cont.)
Standard approach: Could calculate test statistic, p-value from approximate sampling distribution (z, chi-square)
- But technical conditions do not hold
- But this would be approximate anyway
- But how does this relate to what “significance” means?

5 Example 1 (cont.)
Alternative: Simulate random assignment process many times, see how often such an extreme result occurs
- Assume no treatment effect (null model)
- Re-randomize 30 subjects to two groups (using cards)
  Assuming 13 improvers, 17 non-improvers regardless
- Determine number of improvers in dolphin group
  Or, equivalently, difference in improvement proportions
- Repeat large number of times (turn to computer)
- Ask whether observed result is in tail of distribution
  Indicating saw a surprising result under null model
  Providing evidence that dolphin therapy is more effective
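The card-shuffling procedure on this slide could be sketched in Python roughly as follows. This is only an illustration, not the applet's code. It assumes two groups of 15 subjects each and 10 improvers observed in the dolphin group; neither number is stated on the slide, but both are implied by the .467 difference in proportions reported later in the talk. The function name `simulate_dolphin` is made up for this sketch.

```python
import random

def simulate_dolphin(n_reps=10_000, seed=0):
    """Approximate the p-value by re-randomizing 30 subjects many times."""
    rng = random.Random(seed)
    # Null model: the 13 improvers (1) and 17 non-improvers (0) would have
    # had the same outcome regardless of the group they were assigned to.
    cards = [1] * 13 + [0] * 17
    group_size = 15            # ASSUMPTION: 15 subjects per group
    observed_improvers = 10    # ASSUMPTION: implied by the .467 difference
    count_extreme = 0
    for _ in range(n_reps):
        rng.shuffle(cards)                       # shuffle the cards
        dolphin = sum(cards[:group_size])        # deal 15 to the dolphin group
        if dolphin >= observed_improvers:        # at least as extreme as observed
            count_extreme += 1
    return count_extreme / n_reps

if __name__ == "__main__":
    print(simulate_dolphin())  # approximate p-value; varies with the seed
```

Counting improvers in the dolphin group is equivalent to using the difference in improvement proportions, because the total number of improvers is fixed at 13.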

6 Example 1 (cont.)
Applet: www.rossmanchance.com/applets (Dolphin study)

7 Example 1 (cont.)
Conclusion: Experimental result is statistically significant
- What does that mean; what is the logic behind that?
Experimental result very unlikely to occur by chance alone
A difference in success proportions at least as large as .467 (in favor of dolphin group) would happen in less than 2% of all possible random assignments if dolphin therapy were not effective

8 Example 1 (cont.)
Exact randomization distribution
- Hypergeometric distribution
- Fisher’s Exact Test
- p-value = .0127
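The exact p-value can be checked with a short hypergeometric tail sum. The sketch below again assumes 15 subjects per group and 10 improvers in the dolphin group (consistent with the 13 improvers, 30 subjects, and .467 difference given in the talk, but not stated outright). The helper name `hypergeom_upper_tail` is hypothetical.

```python
from math import comb

def hypergeom_upper_tail(total, successes, group_size, observed):
    """P(X >= observed) when group_size subjects are drawn at random
    from `total` subjects, of whom `successes` are improvers."""
    denom = comb(total, group_size)
    upper = min(successes, group_size)
    numer = sum(
        comb(successes, k) * comb(total - successes, group_size - k)
        for k in range(observed, upper + 1)
    )
    return numer / denom

# ASSUMPTION: 15 per group, 10 improvers observed in the dolphin group.
p_value = hypergeom_upper_tail(total=30, successes=13, group_size=15, observed=10)
print(round(p_value, 4))  # 0.0127
```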

9 Example 2: Murderous Nurse?
Murder trial: U.S. vs. Kristin Gilbert
- Accused of giving patients fatal dose of heart stimulant
- Data presented for 18 months of 8-hour shifts
- Relative risk: 6.34

10 Example 2 (cont.)
Structurally the same as previous example, but with one crucial difference
- No random assignment to groups
  Observational study
  Allows many potential explanations other than “random chance”
  - Confounding variables
  - Perhaps she worked intensive care unit or night shift
- Is statistical significance still relevant?
  Yes, to see if “random chance” can plausibly be ruled out as an explanation
  - Some statisticians disagree

11 Example 2 (cont.)
Simulation results
p-value: less than 1 in a billion

12 Example 2 (cont.)
Incredibly unlikely to observe such a difference/ratio by chance alone, if there were no difference between the groups
- But this does not prove, or perhaps even strongly suggest, guilt
  Observational study
  Allows many potential explanations other than “random chance”
  - Confounding variables
  - Perhaps she worked intensive care unit or night shift

13 Example 3: Lingering sleep deprivation?
Does sleep deprivation have harmful effects on cognitive functioning three days later?
- 21 subjects; random assignment
Core question of inference:
- Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

14 Example 3 (cont.)
Could calculate test statistic, p-value from approximate “sampling” distribution (if conditions are met)

15 Example 3 (cont.)
Simulate randomization process many times under null model, see how often such an extreme result (difference in group means) occurs
Start with tactile simulation using index cards
- Write each “score” on a card
- Shuffle the cards
- Randomly deal out 11 for deprived group, 10 for unrestricted group
- Calculate difference in group means
- Repeat many times
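The index-card steps map directly onto a few lines of code. The sketch below is one possible version, not the applet's implementation. The scores in `deprived` and `unrestricted` are made-up placeholders (the study's actual scores do not appear in this transcript), and the function name and the `statistic` parameter are illustrative choices.

```python
import random
from statistics import mean, median

def randomization_test(group_a, group_b, n_reps=10_000, statistic=mean, seed=0):
    """One-sided test: how often does a random re-dealing of the pooled
    scores give statistic(a) - statistic(b) at least as low as observed?"""
    rng = random.Random(seed)
    observed = statistic(group_a) - statistic(group_b)
    pooled = list(group_a) + list(group_b)      # write each score on a card
    n_a = len(group_a)
    count_extreme = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)                     # shuffle the cards
        diff = statistic(pooled[:n_a]) - statistic(pooled[n_a:])  # deal and compare
        if diff <= observed + 1e-12:            # small tolerance for float ties
            count_extreme += 1
    return observed, count_extreme / n_reps

# HYPOTHETICAL placeholder scores: 11 deprived, 10 unrestricted subjects.
deprived = [3.0, -5.0, 8.0, 1.0, -9.0, 12.0, 4.0, -2.0, 15.0, 6.0, 0.0]
unrestricted = [20.0, 11.0, -4.0, 14.0, 28.0, 35.0, 9.0, 17.0, 13.0, 25.0]

if __name__ == "__main__":
    print(randomization_test(deprived, unrestricted))
    # Swapping in medians only requires changing the statistic:
    print(randomization_test(deprived, unrestricted, statistic=median))
```

Passing the statistic as a parameter is what makes it easy to analyze medians instead of means.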

16 Example 3 (cont.)
Then use technology to simulate this randomization process
Applet: www.rossmanchance.com/applets/ (Randomization Tests)

17 Example 3 (cont.)
Conclusion: Fairly strong evidence that sleep deprivation produces lower improvements, on average, even three days later
- Justification: Experimental results as extreme as those in the actual study would be quite unlikely to occur by chance alone, if there were no effect of the sleep deprivation
Easy to analyze medians instead

18 Example 3 (cont.)
Exact randomization distribution:
Exact p-value: 2533/352,716 = .0072
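The denominator 352,716 is the number of ways to choose which 10 of the 21 subjects go into the unrestricted group (equivalently, which 11 are deprived). A quick check of that count and of the resulting p-value, taking the numerator 2533 from the slide as given:

```python
from math import comb

n_assignments = comb(21, 10)      # ways to split 21 subjects into groups of 11 and 10
print(n_assignments)              # 352716

p_value = 2533 / n_assignments    # 2533 is the count reported on the slide
print(round(p_value, 4))          # 0.0072
```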

19 Example 4: Age discrimination?
Martin vs. Westvaco (Statistics in Action)
Employee ages:
- 25, 33, 35, 38, 48, 55, 55, 55, 56, 64
Fired employee ages in bold:
- 25, 33, 35, 38, 48, 55, 55, 55, 56, 64
Robert Martin: 55 years old
Do the data provide evidence that the firing process was not “random”?
- How unlikely is it that a “random” firing process would produce such a large average age?

20 Example 4 (cont.)
Exact permutation distribution:
Exact p-value: 6 / 120 = .05
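With only 10 employees, every possible set of firings can be enumerated directly. The sketch below rests on an assumption: the bold highlighting that marked the fired employees did not survive in this transcript, so the code supposes that three employees aged 55, 55, and 64 were fired. That choice is consistent with the denominator 120 (the number of ways to choose 3 of 10) and reproduces the 6 / 120 on the slide, but it is not confirmed by the text here.

```python
from itertools import combinations

ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]
fired = [55, 55, 64]   # ASSUMPTION: the bold markers were lost in the transcript

def exact_permutation_counts(ages, fired):
    """Count the equally likely firing sets whose average age is at least
    as large as the observed one (comparing sums avoids float rounding)."""
    k = len(fired)
    observed_total = sum(fired)
    groups = list(combinations(ages, k))   # positions are distinct, so ties count separately
    n_extreme = sum(1 for g in groups if sum(g) >= observed_total)
    return n_extreme, len(groups)

n_extreme, n_total = exact_permutation_counts(ages, fired)
print(n_extreme, n_total, n_extreme / n_total)  # 6 120 0.05
```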

21 Example 5: Memorizing letters
You will be given a string of 30 letters
- Memorize as many as you can in 20 seconds (in order)
Design questions
- What kind of study is this?
- What kind of randomness was used in this study?
- What are the variables, and what kinds are they?
Analysis questions
- Do boxplots suggest a significant difference?
- Simulate a randomization test, interpret the results

22 Extensions
Matched pairs design
- Randomize within pairs (e.g., by flipping coin)
Comparing more than 2 groups
- Alternative to chi-square, ANOVA
- Same use of randomization
  Somewhat harder to define test statistic
Regression/correlation
- Randomize/permute one of the variables
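Two of these extensions can be sketched with the same shuffle-and-count logic. The code below is an illustration under stated assumptions, not material from the talk: the function names are invented, both tests are one-sided in the positive direction, and the data are made-up placeholders. For matched pairs, a coin flip within each pair amounts to randomly flipping the sign of that pair's difference. For correlation, one variable is permuted while the other stays fixed.

```python
import random
from math import sqrt
from statistics import mean

def paired_randomization_test(differences, n_reps=10_000, seed=0):
    """Matched pairs: flip a coin within each pair (random sign per difference)."""
    rng = random.Random(seed)
    observed = mean(differences)
    count = 0
    for _ in range(n_reps):
        flipped = [d if rng.random() < 0.5 else -d for d in differences]
        if mean(flipped) >= observed - 1e-12:
            count += 1
    return count / n_reps

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_permutation_test(x, y, n_reps=10_000, seed=0):
    """Correlation: permute y while keeping x fixed, recompute r each time."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    y_perm = list(y)
    count = 0
    for _ in range(n_reps):
        rng.shuffle(y_perm)
        if pearson_r(x, y_perm) >= observed - 1e-12:
            count += 1
    return count / n_reps

if __name__ == "__main__":
    # HYPOTHETICAL placeholder data, for illustration only.
    print(paired_randomization_test([2.1, 0.4, 1.7, -0.3, 2.8, 1.1, 0.9, 1.5]))
    print(correlation_permutation_test([1, 2, 3, 4, 5, 6, 7, 8],
                                       [2.0, 1.5, 3.9, 3.1, 5.2, 6.8, 6.1, 8.4]))
```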

23 Reflections
You can do this at beginning of course
- Then repeat for new scenarios with more richness
- Spiraling could lead to deeper conceptual understanding
Emphasizes scope of conclusions to be drawn from randomized experiments vs. observational studies
Makes clear that “inference” goes beyond data in hand
Very powerful, easily generalized
- Flexibility in choice of test statistic (e.g. medians, odds ratio)
- Generalize to more than two groups
Takes advantage of modern computing power
- Does not require assumptions of normality

24 Fisher on randomization tests
“The statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method.” – R.A. Fisher (1936)

25 Ptolemaic curriculum?
“Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.” – George Cobb (TISE, 2007)

26 Further reading
- Ernst (2005), Statistical Science
- Scheaffer and Tabor (2008), Mathematics Teacher
- Rossman (2008), Statistics Education Research Journal
- Statistics: A Guide to the Unknown (ed. R. Peck)
- NSF-funded project: http://statweb.calpoly.edu/csi/

27 More information
Please feel free to contact me
- arossman@calpoly.edu
Thanks very much!

