Teaching Introductory Statistics with Simulation-Based Inference Allan Rossman and Beth Chance Cal Poly – San Luis Obispo

AMATYC webinar, April 2016

Outline
- Who are you?
- Overview, motivation
- Three examples
- Advantages
- Implementation suggestions
- Assessment suggestions
- Resources
- Q&A

Who are you? How many years have you been teaching?
- < 1 year
- 1-3 years
- 4-8 years
- 8-15 years
- > 15 years

Who are you? How many years have you been teaching statistics?
- Never
- 1-3 years
- 4-8 years
- 8-15 years
- > 15 years

Who are you? What is your background in statistics?
- No formal background
- A course or two
- Several courses but no degree
- Undergraduate degree in statistics
- Graduate degree in statistics
- Other

Who are you? Have you used simulation in teaching statistics?
- Never
- A bit, to demonstrate probability ideas
- Somewhat, to demonstrate sampling distributions
- A great deal, as an inference tool as well as for pedagogical demonstrations

p-values: Do your students ever struggle if you ask them to explain to you what a p-value represents?
- Never
- Sometimes
- Always

p-values: Do your students ever ask you whether they want a large p-value or a small p-value?
- Never
- Sometimes
- Always

Ptolemaic Curriculum?
“Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.” – George Cobb (TISE, 2007)

Example 1: Helper/hinderer?
- Sixteen pre-verbal infants were shown two videos of a toy trying to climb a hill
  - One where a “helper” toy pushes the original toy up
  - One where a “hinderer” toy pushes the toy back down
- Infants were then presented with the two toys from the videos
  - Researchers noted which toy the infant chose to play with
- r-Hinderer.html

Example 1: Helper/hinderer?
- Data: 14 of the 16 infants chose the “helper” toy
- Two possible explanations
  - Infants choose randomly, no genuine preference, researchers just got lucky
  - Infants have a genuine preference for the helper toy
- Core question of inference:
  - Is such an extreme result unlikely to occur by chance (random choice) alone …
  - … if there were no genuine preference (null model)?

Analysis options
- Could use the normal approximation to the binomial, but sample size is too small for CLT
- Could use a binomial probability calculation
- We prefer a simulation approach
  - To illustrate “how often would we get a result like this just by random chance?”
  - Starting with tactile simulation
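For reference, the binomial probability calculation mentioned among the options can be written out for the helper/hinderer data (14 of 16 infants, each choosing the helper with probability 1/2 under the null model). This is a worked sketch; the symbol X, for the number of infants choosing the helper toy, is notation introduced here and does not appear on the slides.

```latex
P(X \ge 14) = \left[\binom{16}{14} + \binom{16}{15} + \binom{16}{16}\right]\left(\tfrac{1}{2}\right)^{16}
            = \frac{120 + 16 + 1}{65536}
            = \frac{137}{65536}
            \approx 0.0021
```

A simulation-based p-value should land close to this value when the number of repetitions is large.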

Strategy
- Students flip a fair coin 16 times
  - Count number of heads, representing choices of helper and hinderer toys
  - Under the null model of no genuine preference
- Repeat several times, combine results
  - See how surprising it is to get 14 or more heads even with “such a small sample size”
  - Approximate (empirical) p-value
- Turn to applet for large number of repetitions: (One Proportion)
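The coin-flipping strategy above can also be sketched in a few lines of Python, as a stand-in for the applet rather than a description of how the applet works. The function names, the seed, and the choice of 10,000 repetitions are assumptions made for illustration.

```python
import random


def simulate_heads(n_tosses=16, n_reps=10_000, seed=1):
    """Simulate n_reps sets of fair-coin tosses; return the head count of each set."""
    rng = random.Random(seed)  # fixed seed (an assumption) so results are repeatable
    return [
        sum(rng.random() < 0.5 for _ in range(n_tosses))
        for _ in range(n_reps)
    ]


def empirical_p_value(counts, observed):
    """Proportion of simulated counts at least as large as the observed count."""
    return sum(c >= observed for c in counts) / len(counts)


# Null model: no genuine preference, so each infant is a fair coin toss.
counts = simulate_heads()
p_value = empirical_p_value(counts, observed=14)  # 14 of 16 chose the helper
print(f"Approximate p-value: {p_value:.4f}")
```

Each entry of `counts` plays the role of one dot in the applet's dotplot: one repetition of 16 tosses under the null model.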

Results
- Pretty unlikely to obtain 14 or more heads in 16 tosses of a fair coin, so …
- Pretty strong evidence that pre-verbal infants do have a genuine preference for helper toy and were not just choosing at random

Follow-up activity: Facial prototyping
- Who is on the left – Bob or Tim?

Follow-up activity: Facial prototyping
- Does our sample result provide convincing evidence that people have a genuine tendency to assign the name Tim to the face on the left?
- How can we use simulation to investigate this question?
- What conclusion would you draw?
- Explain reasoning process behind conclusion

Example 2: Dolphin therapy?
- Subjects who suffer from mild to moderate depression were flown to Honduras, randomly assigned to a treatment
- Is dolphin therapy more effective than control?
- Core question of inference:
  - Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

Some approaches
- Could calculate test statistic, p-value from approximate sampling distribution (z, chi-square)
  - But it’s approximate
  - But conditions might not hold
  - But how does this relate to what “significance” means?
- Could conduct Fisher’s Exact Test
  - But there’s a lot of mathematical start-up required
  - But that’s still not closely tied to what “significance” means
  - Even though this is a randomization test

Alternative approach
- Simulate random assignment process many times, see how often such an extreme result occurs
  - 30 index cards representing 30 subjects
  - Assume no treatment effect (null model): 13 improver cards, 17 non-improver cards
  - Re-randomize 30 subjects to two groups of 15 and 15
  - Determine number of improvers in dolphin group (or, equivalently, difference in improvement proportions)
  - Repeat large number of times (turn to computer)
  - Ask whether observed result is in tail of distribution
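The card-shuffling procedure above might be sketched in Python as follows. The card counts (13 improvers, 17 non-improvers, two groups of 15) come from the slide. The observed count of 10 improvers in the dolphin group is an assumption, taken from the later assessment question about "10 or more successes", since the slides do not state the observed result directly. Function and variable names are illustrative.

```python
import random


def randomization_counts(n_improvers=13, n_subjects=30, group_size=15,
                         n_reps=10_000, seed=1):
    """Shuffle the cards n_reps times; return the number of improvers
    dealt to the first (dolphin) group in each repetition."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    # 1 = improver card, 0 = non-improver card
    cards = [1] * n_improvers + [0] * (n_subjects - n_improvers)
    results = []
    for _ in range(n_reps):
        rng.shuffle(cards)                       # re-randomize the 30 subjects
        results.append(sum(cards[:group_size]))  # improvers in the dolphin group
    return results


# ASSUMPTION: 10 improvers observed in the dolphin group (not stated on the slide).
OBSERVED_IMPROVERS = 10

dolphin_counts = randomization_counts()
p_value = sum(c >= OBSERVED_IMPROVERS for c in dolphin_counts) / len(dolphin_counts)
print(f"Approximate p-value: {p_value:.4f}")
```

Shuffling and dealing the cards stands for random assignment under the null model: whether a subject improves is fixed in advance and does not depend on the group they land in.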

Analysis (Two Proportions)

Conclusion
- Experimental result is statistically significant
  - And what is the logic behind that?
- Observed result very unlikely to occur by chance (random assignment) alone (if dolphin therapy were not effective)
  - Providing evidence that dolphin therapy is more effective

Example 3: Lingering sleep deprivation?
- Does sleep deprivation have harmful effects on cognitive functioning three days later?
  - 21 subjects; random assignment
- Core question of inference:
  - Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

One approach
- Calculate test statistic, p-value from approximate sampling distribution

Another approach
- Simulate randomization process many times under null model, see how often such an extreme result (difference in group means) occurs
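A minimal Python sketch of this randomization test for a difference in group means follows. The slides do not include the study's data, so the two lists below are made-up placeholder values (21 numbers in total, to match the 21 subjects) and are not the study's results. The one-sided direction of the test and all names are likewise assumptions.

```python
import random
from statistics import mean


def randomization_test_means(group_a, group_b, n_reps=10_000, seed=1):
    """Return (observed difference in means, approximate one-sided p-value).

    The p-value is the proportion of re-randomizations whose difference
    mean(A) - mean(B) is at least as large as the observed difference.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # null model: group labels are arbitrary
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            extreme += 1
    return observed, extreme / n_reps


# PLACEHOLDER values for illustration only; NOT the sleep deprivation data.
unrestricted = [20, 25, 15, 30, 22, 18, 27, 24, 19, 21]
deprived = [10, 12, 5, 18, 8, 14, 9, 16, 7, 11, 13]

observed, p_value = randomization_test_means(unrestricted, deprived)
print(f"Observed difference: {observed:.2f}, approximate p-value: {p_value:.4f}")
```

Swapping `mean` for `median` (also in the `statistics` module) changes the test statistic without changing the rest of the procedure.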

Advantages
- You can do this from beginning of course!
- Emphasizes entire process of conducting statistical investigations to answer real research questions
  - From data collection to inference in one day
  - As opposed to disconnected blocks of data analysis, then data collection, then probability, then statistical inference
- Leads to deeper understanding of concepts such as statistical significance, p-value, confidence
- Very powerful, easily generalized tool
  - Flexibility in choice of test statistic (e.g. medians, odds ratio)
  - Generalize to more than two groups

Implementation suggestions
- What about normal-based methods: why?
- Do not ignore them!
  - A common shape often arises for empirical randomization/sampling distributions (Duh!)
  - Students will see t-tests in other courses, research literature
  - Process of standardization has inherent value
  - Gain intuition through formulas

Implementation suggestions
- What about normal-based methods: how?
  - Introduce after students have gained experience with randomization-based methods
  - As a prediction of the simulation results
  - Focus on role of standard deviation of statistic (standard error)
- Don’t do it all for them (you or technology)
  - Start with tactile simulation (“I am that dot”)
  - Applet still requires some thought, like sample size or entering the observed statistic to find the p-value

Implementation suggestions
- What about interval estimation? Two possible simulation-based approaches
  - Invert test
    - Test “all” possible values of parameter, see which do not put observed result in tail
    - Easy enough (but tedious) with one-proportion situation (sliders), but not as obvious how to do this with comparing two proportions
  - Estimate +/- margin-of-error
    - Could estimate margin-of-error with simulated randomization distribution
    - Rough confidence interval as statistic ± 2×(SD of statistic)
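The "invert test" idea can be sketched for the one-proportion case using the helper/hinderer result (14 of 16). To keep the example deterministic, it swaps in exact binomial tail probabilities where the applet would use simulation. The grid of 999 candidate values and the 0.025 cutoff in each tail are assumptions chosen to give roughly 95% confidence.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def plausible_values(successes, n, tail_cutoff=0.025, grid_size=1000):
    """Candidate values of p that do NOT put the observed count in either tail."""
    kept = []
    for i in range(1, grid_size):
        p = i / grid_size  # candidate parameter value, 0.001 to 0.999
        upper_tail = sum(binom_pmf(k, n, p) for k in range(successes, n + 1))
        lower_tail = sum(binom_pmf(k, n, p) for k in range(0, successes + 1))
        if upper_tail > tail_cutoff and lower_tail > tail_cutoff:
            kept.append(p)
    return kept


kept = plausible_values(successes=14, n=16)
interval = (min(kept), max(kept))
print(f"Plausible values of p: {interval[0]:.3f} to {interval[1]:.3f}")
```

Moving the applet's slider through candidate values does the same thing by hand: each value of p is kept or rejected according to whether 14 of 16 falls in the tail of its null distribution.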

Implementation suggestions
- Can we introduce SBI gradually?
  - One class period: Use helper/hinderer activity to introduce concepts of statistical significance, p-value, could this have happened by random chance alone
  - Two class periods: Also use dolphin therapy activity to introduce inference for comparing two groups (chance = random assignment)
  - Three class periods: Also use sleep deprivation activity prior to two-sample t-tests
  - Four class periods: Also use an activity (perhaps draft lottery) to introduce inference for correlation (chance = drawing of numbers)

Assessment suggestions
- Quick assessment of understanding of class activity
  - What did the cards represent?
  - What did shuffling and dealing the cards represent?
  - What implicit assumption about the two groups did the shuffling of cards represent?
  - What observational units were represented by the dots on the dotplot?
  - Why did we count the number of repetitions with 10 or more “successes” (that is, why 10 and why “or more”)?

Assessment suggestions
- Conceptual understanding of logic of inference
  - Interpret p-value in context: Probability of observed data, or more extreme, under randomness hypothesis, if null model is true
  - Summarize conclusion in context, and explain reasoning process
  - Jargon-free multiple choice questions on interpretation, effect of changing sample size, etc.
  - Ability to apply to new studies, scenarios
    - Define null model, design simulation, draw conclusion
    - More complicated scenarios (e.g., compare 3 groups), new statistics (e.g., relative risk)

Assessment
A graduate student is designing a research study. She is hoping to show that the results of an experiment are statistically significant. What type of p-value would she want to obtain?
- The magnitude of a p-value has no impact on statistical significance.
- A large p-value
- A small p-value

Assessment
A study of the effectiveness of a nicotine lozenge for helping smokers to quit found that of 459 nicotine lozenge users, 46.0% successfully abstained for 6 weeks, compared to 29.7% of the 458 smokers in the control group. What will be the purpose of the simulation analysis?
- To increase the sample size of the study
- To estimate the difference in the treatment probabilities
- To determine if the observed difference is unlikely to have happened by chance alone
- To create a normal distribution
- To create similar groups

Assessment suggestions: Multiple-choice example
You want to investigate a claim that women are more likely than men to dream in color. You take a random sample of men and a random sample of women (in your community) and ask whether they dream in color, and compare the proportions of each gender that dream in color.

Assessment suggestions: Multiple-choice example
If the difference in the proportions (who dream in color) between the two samples has a small p-value, which would be the best interpretation?
A. It would not be very surprising to obtain the observed sample results if there is really no difference between the proportions of men and women in your community that dream in color.
B. It would be very surprising to obtain the observed sample results if there is really no difference between the proportions of men and women in your community that dream in color.
C. It would be very surprising to obtain the observed sample results if there is really a difference between the proportion of men and women in your community that dream in color.
D. The probability is very small that there is no difference between the proportions of men and women in your community that dream in color.
E. The probability is very small that there is a difference between the proportions of men and women in your community that dream in color.

Assessment suggestions: Multiple-choice example
Suppose two more studies are conducted on this issue. Both studies find 30% of women sampled dream in color, compared to 20% of men. But Study C consists of 100 people of each sex, whereas Study D consists of 40 people of each gender. Which study would provide stronger evidence that there is a genuine difference between men and women on this issue?
A. Study C
B. Study D
C. The strength of evidence would be the same for these two studies
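The effect of sample size in this question can be checked with a short calculation. The sketch below uses the pooled two-proportion z-statistic, a normal-based shortcut swapped in here for brevity rather than the simulation approach the slides favor. The function name is illustrative.

```python
from math import sqrt


def two_proportion_z(p1, p2, n1, n2):
    """Pooled two-proportion z-statistic for sample proportions p1 and p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Same observed proportions (30% vs 20%), different sample sizes.
z_c = two_proportion_z(0.30, 0.20, 100, 100)  # Study C: 100 per group
z_d = two_proportion_z(0.30, 0.20, 40, 40)    # Study D: 40 per group
print(f"Study C: z = {z_c:.2f}; Study D: z = {z_d:.2f}")
```

With the same difference in proportions, the larger study has the smaller standard error, so its result sits further out in the tail of the null distribution.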

Conclusions
- Put core logic of inference at center of course
  - Normal-based methods obscure this logic
  - Develop students’ understanding with experiential simulation-based inference
  - Emphasize connections among
    - Randomness in design of study
    - Inference procedure
    - Scope of conclusions

Resources
- Simulation-based inference blog
- ISI applets
- StatKey app: lock5stat.com/statkey

Thanks! Questions?