Concepts of Statistical Inference: A Randomization-Based Curriculum Allan Rossman, Beth Chance, John Holcomb Cal Poly – San Luis Obispo, Cleveland State University

Presentation transcript:

Concepts of Statistical Inference: A Randomization-Based Curriculum Allan Rossman, Beth Chance, John Holcomb Cal Poly – San Luis Obispo, Cleveland State University

2 Outline (CAUSE Webinar, April)
- Overview, motivation
- Three examples
- Merits, advantages
- Five questions
- Assessment issues
- Conclusions, lessons learned
- Q&A

3 Ptolemaic Curriculum?
“Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.” – George Cobb (TISE, 2007)

4 Is randomization-based approach feasible?
- Experience at post-calculus level
  - Developed spiral curriculum with logic of inference (Fisher’s Exact Test) in chapter 1
  - ISCAM: Investigating Statistical Concepts, Applications, and Methods
- New project
  - Rethinking for lower mathematical level
  - More complete shift, including focus on entire statistical process as a whole

5 Example 1: Helper/hinderer?
- Sixteen infants were shown two videotapes with a toy trying to climb a hill
  - One where a “helper” toy pushes the original toy up
  - One where a “hinderer” toy pushes the toy back down
- Infants were then presented with the two toys as wooden blocks
  - Researchers noted which toy infants chose
- r-Hinderer.html

6 Example 1: Helper/hinderer?
- Data: 14 of the 16 infants chose the “helper” toy
- Core question of inference:
  - Is such an extreme result unlikely to occur by chance (random selection) alone …
  - … if there were no genuine preference (null model)?

7 Analysis options
- Could use a binomial probability calculation
- We prefer a simulation approach
  - To emphasize issue of “how often would this happen in long run?”
  - Starting with tactile simulation

8 Strategy
- Students flip a fair coin 16 times
  - Count number of heads, representing choices of “helper” toy
  - Fair coin represents null model of no genuine preference
- Repeat several times, combine results
  - See how surprising to get 14 or more heads even with “such a small sample size”
  - Approximate (empirical) P-value
- Turn to applet for large number of repetitions: st3/BinomDist.html

9 Results
- Pretty unlikely to obtain 14 or more heads in 16 tosses of a fair coin, so …
- Pretty strong evidence that infants do have genuine preference for helper toy and were not just picking at random
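
The coin-flipping strategy above can be sketched in a few lines of Python. This is only an illustration of the simulation idea, not the applet the slides refer to; the function names, the number of repetitions, and the seed are assumptions chosen for the example. The exact binomial tail is included for comparison with the empirical P-value.

```python
import random
from math import comb


def simulate_helper_pvalue(n_infants=16, observed=14, reps=10_000, seed=1):
    """Empirical P-value: share of repetitions with at least `observed` heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    extreme = 0
    for _ in range(reps):
        # One repetition: 16 fair-coin flips, heads = "chose the helper toy".
        heads = sum(rng.random() < 0.5 for _ in range(n_infants))
        if heads >= observed:
            extreme += 1
    return extreme / reps


def exact_binomial_tail(n=16, k=14, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p), the calculation option on slide 7."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    print("empirical P-value:", simulate_helper_pvalue())
    print("exact binomial tail:", exact_binomial_tail())
```

Both numbers should come out small, which is the "pretty unlikely" result the slide describes.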

10 Example 2: Dolphin therapy?
- Subjects who suffer from mild to moderate depression were flown to Honduras, randomly assigned to a treatment
- Is dolphin therapy more effective than control?
- Core question of inference:
  - Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

11 Some approaches
- Could calculate test statistic, P-value from approximate sampling distribution (z, chi-square)
  - But it’s approximate
  - But conditions might not hold
  - But how does this relate to what “significance” means?
- Could conduct Fisher’s Exact Test
  - But there’s a lot of mathematical start-up required
  - But that’s still not closely tied to what “significance” means
    - Even though this is a randomization test

12 Alternative approach
- Simulate random assignment process many times, see how often such an extreme result occurs
  - Assume no treatment effect (null model)
  - Re-randomize 30 subjects to two groups (using cards)
    - Assuming 13 improvers, 17 non-improvers regardless
  - Determine number of improvers in dolphin group
    - Or, equivalently, difference in improvement proportions
  - Repeat large number of times (turn to computer)
  - Ask whether observed result is in tail of distribution
    - Indicating saw a surprising result under null model
    - Providing evidence that dolphin therapy is more effective
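
A rough Python sketch of the card-shuffling procedure follows. The 30 subjects and the 13/17 split come from the slide. The dolphin-group size of 15 and the observed count of 10 improvers are assumptions made for illustration (the transcript mentions "10 or more successes" only in an assessment question), and the function names are hypothetical.

```python
import random
from math import comb


def rerandomize_dolphins(n_improvers=13, n_nonimprovers=17, group_size=15,
                         observed=10, reps=10_000, seed=1):
    """Share of re-randomizations with at least `observed` improvers in the
    dolphin group. group_size=15 and observed=10 are assumed values."""
    rng = random.Random(seed)
    # One card per subject: 1 = improver, 0 = non-improver (fixed under the null).
    cards = [1] * n_improvers + [0] * n_nonimprovers
    extreme = 0
    for _ in range(reps):
        rng.shuffle(cards)                    # shuffle the deck
        dolphin_group = cards[:group_size]    # deal the first cards to "dolphin"
        if sum(dolphin_group) >= observed:
            extreme += 1
    return extreme / reps


def exact_hypergeometric_tail(total=30, improvers=13, group_size=15, observed=10):
    """Exact tail with both margins fixed (the Fisher's Exact Test calculation)."""
    upper = min(improvers, group_size)
    favorable = sum(comb(improvers, j) * comb(total - improvers, group_size - j)
                    for j in range(observed, upper + 1))
    return favorable / comb(total, group_size)


if __name__ == "__main__":
    print("empirical P-value:", rerandomize_dolphins())
    print("exact tail:", exact_hypergeometric_tail())
```

Counting improvers in the dolphin group is equivalent to using the difference in improvement proportions, as the slide notes, because the margins are fixed.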

13 Analysis
hins/Dolphins.html

14 Conclusion
- Experimental result is statistically significant
  - And what is the logic behind that?
- Observed result very unlikely to occur by chance (random assignment) alone (if dolphin therapy was not effective)

15 Example 3: Lingering sleep deprivation?
- Does sleep deprivation have harmful effects on cognitive functioning three days later?
  - 21 subjects; random assignment
- Core question of inference:
  - Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

16 One approach
- Calculate test statistic, p-value from approximate sampling distribution

17 Another approach
- Simulate randomization process many times under null model, see how often such an extreme result (difference in group means) occurs
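
The same re-randomization idea with a difference in group means might look like the sketch below. The study's data are not in this transcript, so the numbers in the usage example are invented placeholders, not the sleep-deprivation results. The function name and the one-sided "at least as large as observed" convention are also assumptions.

```python
import random


def permutation_test_mean_diff(group_a, group_b, reps=10_000, seed=1):
    """Return (observed difference in means, one-sided empirical P-value).

    The P-value is the share of re-randomizations whose difference
    mean(a) - mean(b) is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-assign responses to groups at random
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            extreme += 1
    return observed, extreme / reps


if __name__ == "__main__":
    # Placeholder values invented for illustration only (NOT the study's data).
    unrestricted = [10, 12, 14, 16, 18]
    deprived = [1, 2, 3, 4, 5]
    print(permutation_test_mean_diff(unrestricted, deprived))
```

Swapping in a different statistic (for example, a difference in medians) only changes the two lines that compute `observed` and `diff`.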

18 Advantages
- You can do this at beginning of course
  - Then repeat for new scenarios with more richness
  - Spiraling could lead to deeper conceptual understanding
- Emphasizes scope of conclusions to be drawn from randomized experiments vs. observational studies
- Makes clear that “inference” goes beyond data in hand
- Very powerful, easily generalized
  - Flexibility in choice of test statistic (e.g. medians, odds ratio)
  - Generalize to more than two groups
- Takes advantage of modern computing power

19 Question #1
- Should we match type of randomness in simulation to role of randomness in data collection?
  - Major goal: Recognize distinction between random assignment and random sampling, and the conclusions that each permits
  - Or should we stick to “one crank” (always re-randomize) in the analysis, for simplicity’s sake?
  - For example, with 2 × 2 table, always fix both margins, or only fix one margin (random samples from two independent groups), or fix neither margin (random sampling from one group, then cross-classifying)

20 Question #2
- What about interval estimation?
  - Estimating effect size at least as important as assessing significance
- How to introduce this?
  - Invert test
    - Test “all” possible values of parameter, see which do not put observed result in tail
    - Easy enough with binomial, but not as obvious how to introduce this (or if it’s possible) with 2×2 tables
  - Alternative: Estimate +/- margin-of-error
    - Could estimate margin-of-error with empirical randomization distribution or bootstrap distribution
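
For the binomial case, "inverting the test" can be sketched as follows, using the helper/hinderer data (14 of 16). The grid of candidate values, the 5% level split equally between the two tails, and the function names are assumptions for this example. Exact tail probabilities are used here, though simulated tails would follow the same logic.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def plausible_values(successes=14, n=16, alpha=0.05, grid_size=1000):
    """Candidate values of p that do NOT put the observed count in either tail."""
    keep = []
    for i in range(1, grid_size):
        p = i / grid_size
        upper_tail = sum(binom_pmf(k, n, p) for k in range(successes, n + 1))
        lower_tail = sum(binom_pmf(k, n, p) for k in range(0, successes + 1))
        # Keep p when the observed result is not surprising in either direction.
        if upper_tail > alpha / 2 and lower_tail > alpha / 2:
            keep.append(p)
    return keep


if __name__ == "__main__":
    values = plausible_values()
    print("interval of plausible values:", (min(values), max(values)))
```

The smallest and largest retained values form an interval estimate for the probability of choosing the helper toy.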

21 Question #3
- How much bootstrapping to introduce, and at what level of complexity?
  - Use to approximate SE only?
  - Use percentile intervals?
  - Use bias-correction?
- Too difficult for Stat 101 students?
- Provide any helpful insights?
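
To make the first two options concrete, here is a minimal bootstrap sketch that reuses the helper/hinderer sample (14 of 16). The function name, the number of resamples, and the simple index-based percentile rule are assumptions. Bias-correction is not attempted.

```python
import random
from statistics import stdev


def bootstrap_proportion(successes=14, n=16, reps=10_000, level=0.95, seed=1):
    """Return (bootstrap SE, percentile interval) for a sample proportion."""
    rng = random.Random(seed)
    sample = [1] * successes + [0] * (n - successes)
    props = []
    for _ in range(reps):
        resample = rng.choices(sample, k=n)  # draw n values WITH replacement
        props.append(sum(resample) / n)
    props.sort()
    se = stdev(props)  # option 1: use the bootstrap only to approximate the SE
    tail = (1 - level) / 2
    lower = props[int(tail * reps)]             # option 2: percentile interval
    upper = props[int((1 - tail) * reps) - 1]
    return se, (lower, upper)


if __name__ == "__main__":
    se, interval = bootstrap_proportion()
    print("bootstrap SE:", se)
    print("percentile interval:", interval)
```

With a sample this small and a proportion this close to 1, the percentile interval can run into the boundary at 1, which may be one of the complications the slide is asking about.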

22 Question #4
- What computing tools can help students to focus on understanding ideas?
  - While providing powerful, generalizable tool?
- Some possibilities
  - Java applets, Flash
    - Very visual, contextual, conceptual; less generalizable
  - Minitab
    - Provide students with macros? Or ask them to edit? Or ask them to write their own?
  - R
    - Need simpler interface?
  - Other packages?
  - StatCrunch, JMP have been adding resampling capabilities

23 Question #5
- What about normal-based methods?
- Do not ignore them!
  - Introduce after students have gained experience with randomization-based methods
  - Students will see t-tests in other courses, research literature
  - Process of standardization has inherent value
  - A common shape often arises for empirical randomization/sampling distributions
    - Duh!
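
As a small illustration of standardization, the sketch below computes a z-statistic for the helper/hinderer data (14 of 16, null value 0.5) and compares the normal-based tail with the exact binomial tail. The function names are hypothetical, and no continuity correction is applied.

```python
from math import comb, erfc, sqrt


def z_statistic(successes=14, n=16, p0=0.5):
    """Standardize the sample proportion under the null value p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))


def exact_upper_tail(successes=14, n=16, p0=0.5):
    """Exact binomial P(X >= successes), for comparison."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k)
               for k in range(successes, n + 1))


if __name__ == "__main__":
    z = z_statistic()
    print("z:", z)
    print("normal-based P-value:", normal_upper_tail(z))
    print("exact binomial P-value:", exact_upper_tail())
```

With only 16 observations the two tail probabilities differ somewhat, which connects back to the "but it’s approximate" point on slide 11.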

24 Assessment: Developing instruments that assess …
- Conceptual understanding of core logic of inference
  - Jargon-free multiple choice questions on interpretation, effect size, etc.
  - “Interpret this p-value in context”: probability of observed data, or more extreme, under randomness, if null model is true
- Ability to apply to new studies, scenarios
  - Define null model, design simulation, draw conclusion
  - More complicated scenarios (e.g., compare 3 groups)

25 Understanding of components of activity/simulation
- Designed for use after an in-class activity using simulation.
- Example Questions
  - What did the cards represent?
  - What did shuffling and dealing the cards represent?
  - What implicit assumption about the two groups did the shuffling of cards represent?
  - What observational units were represented by the dots on the dotplot?
  - Why did we count the number of repetitions with 10 or more “successes” (that is, why 10)?

26 Conducting small classroom experiments
- Research Questions:
  - Start with study that has a significant result or a non-significant one?
  - Start with binomial setting or 2×2 table?
  - Do tactile simulations add value beyond computer ones?
  - Do demonstrations of simulations provide less value than student-conducted simulations?

27 Conclusions/Lessons Learned
- Put core logic of inference at center
  - Normal-based methods obscure this logic
  - Develop students’ understanding with randomization-based inference
  - Emphasize connections among
    - Randomness in design of study
    - Inference procedure
    - Scope of conclusions
  - But more difficult than initially anticipated
    - “Devil is in the details”

28 Conclusions/Lessons Learned
- Don’t overlook null model in the simulation
- Simulation vs. Real study
- Plausible vs. Possible
- How much worry about being a tail probability
- How much worry about p-value = probability that null hypothesis is true

29 Thanks very much!
- Thanks to NSF (DUE-CCLI # )
- Thanks to George Cobb, advisory group
- More information:
  - Draft modules, assessment instruments
  - Questions/comments: