Using Simulations to Teach Statistical Inference
Beth Chance, Allan Rossman (Cal Poly)
ICTCM 2011


Joint work with
- Soma Roy, Karen McGaughey (Cal Poly)
- Alex Herrington (Cal Poly undergrad)
- John Holcomb (Cleveland State), George Cobb (Mt. Holyoke)
- Nathan Tintle, Jill VanderStoep, Todd Swanson (Hope College)
This project has been supported by the National Science Foundation, DUE/CCLI #

Outline
- Motivation/Goals
- Examples
  - Binomial process; randomized experiment, binary response; randomized experiment, quantitative response
  - Series of lab assignments
  - Discussion points
- Student feedback, evaluation results
- Design principles & implementation
- Observations, open questions

Motivation
Cobb (2007): 12 reasons to teach permutation tests…
- Model is “simple and easily grasped”
- Matches production process, links data production and inference
- Role for tactile and computer simulations
- Easily extendible to other designs (e.g., blocking)
- Fisherian logic
From “The Introductory Statistics Course: A Ptolemaic Curriculum” (TISE)

Goals
- Develop an introductory curriculum that focuses on a randomization-based approach to inference
  - vs. using simulation to teach traditional inference
  - From beginning of course, permeate all topics
- Improve understanding of inference and statistical process in general
  - More modern (computer-intensive) and flexible approach to inferential analysis

Brief overview of labs
- Case-study focus
- Pre-lab
  - Background, review questions submitted in advance
- 50-minute (computer) lab period
- Online instructions
  - Directed questions following statistical process
  - Embedded applets or statistical software
- Application/Extension
- Lab report with partner

Example 1: Friend or Foe (Helper/Hinderer)
- Videos
- Research question
- Pre-lab
- Descriptive analysis
- Introduction of null hypothesis, p-value terminology
- Plausible values
- Conclusions

Discussion Points
- Can this be done on day one?
  - Yes, if can motivate the simulation
    - Loaded dice
    - Before reveal the data?

How many infants would need to choose the helper toy for you to be convinced the choice was not made “at random,” but they actually prefer the helper toy?
- Many students can reason inferentially
  - “If a choice is made at complete random, then having 13 infants would be highly unlikely”
  - “Based on the coin flipping experiment, the results stated that at/over 12 was extremely rare. Therefore, at least 12 infants …”
  - “Would be around because it seems highly unlikely that given a option would choose the helper toy”

How many infants would need to choose the helper toy for you to be convinced the choice was not made “at random,” but they actually prefer the helper toy?
- But maybe not as well “distributionally”
  - Is it unusual? = “barely over half” vs. unusual compared to distribution
- Examine language carefully
  - “Unlikely that choice is random”
  - “Prove”
  - “Simulate”, “Repeated this study”
  - “At random” = 50/50, “model”; “Random” = anything is possible

Discussion Points
- Can this be done on day one?
  - Yes, if can motivate the simulation
    - Loaded dice
    - Before reveal the data?
    - Enough understanding of “chance model”?
    - Use of class data instead? (“observed” vs. research study)
  - Yes, if return to and build on the ideas throughout the course
    - So what comes next?

Discussion Points
- Tactile simulation
  - One coin 16 times vs. 16 coins
- Population vs. process
  - Defining the parameter
- 3Ss: statistic, simulate, strength of evidence
  - “Could have been” distribution of data
  - “What if the null was true” distribution of statistic
- Fill-in-the-blank wording
- Timing of final report
  - Follow-up in-class discussion
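
The "statistic, simulate, strength of evidence" steps for the helper/hinderer setting can be sketched in a few lines of Python. This is only an illustrative sketch, not the applet used in the labs: the function names, the 10,000 repetitions, and the seed are assumptions, and the candidate counts 12 and 13 are simply the thresholds that appear in the student responses quoted earlier.

```python
import random
from math import comb

def simulate_null_counts(n_infants=16, n_reps=10_000, seed=1):
    """Each repetition flips n_infants fair coins and counts the heads,
    standing in for infants who pick the helper toy purely at random."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_infants))
            for _ in range(n_reps)]

def simulated_p_value(counts, observed):
    """Strength of evidence: share of simulated counts at least as large as observed."""
    return sum(c >= observed for c in counts) / len(counts)

def exact_p_value(n_infants, observed):
    """Exact binomial tail probability P(X >= observed) for a fair coin."""
    tail = sum(comb(n_infants, k) for k in range(observed, n_infants + 1))
    return tail / 2 ** n_infants

counts = simulate_null_counts()
for observed in (12, 13):  # thresholds taken from the student quotes (illustrative)
    print(observed,
          simulated_p_value(counts, observed),
          round(exact_p_value(16, observed), 4))
```

Whether one coin is flipped 16 times or 16 coins are flipped once makes no difference to this code; each repetition is just 16 independent 50/50 outcomes.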

Example 2: Two Proportions
- Is Yawning Contagious?
  - Modelling entire process: data collection, descriptive statistics, inferential analysis, conclusions
  - Parallelisms to first example
  - Could random assignment alone produce a difference in the group proportions at least this extreme?
  - Card shuffling, recreate two-way table
  - Extend to own data
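
The card-shuffling idea can be mimicked in code: write each subject's outcome on a card, shuffle, deal the cards back into two groups of the original sizes, and record the difference in proportions. The sketch below is one possible version under stated assumptions. The counts (12 of 30 vs. 6 of 30) are hypothetical placeholders, not the yawning study's data, and the function name, repetition count, and seed are likewise illustrative.

```python
import random

def shuffle_differences(yes_a, n_a, yes_b, n_b, n_reps=5_000, seed=2):
    """Re-deal the outcome 'cards' at random into groups of size n_a and n_b
    and return the difference in group proportions from each shuffle."""
    rng = random.Random(seed)
    total_yes = yes_a + yes_b
    cards = [1] * total_yes + [0] * (n_a + n_b - total_yes)
    diffs = []
    for _ in range(n_reps):
        rng.shuffle(cards)
        dealt_yes_a = sum(cards[:n_a])          # first n_a cards form group A
        dealt_yes_b = total_yes - dealt_yes_a   # the rest form group B
        diffs.append(dealt_yes_a / n_a - dealt_yes_b / n_b)
    return diffs

# Hypothetical two-way table: 12 of 30 in group A, 6 of 30 in group B.
yes_a, n_a, yes_b, n_b = 12, 30, 6, 30
observed_diff = yes_a / n_a - yes_b / n_b
diffs = shuffle_differences(yes_a, n_a, yes_b, n_b)

# Small tolerance guards against floating-point ties at the observed value.
p_value = sum(d >= observed_diff - 1e-12 for d in diffs) / len(diffs)
print(round(observed_diff, 3), p_value)
```

Using the count in group A as the statistic instead of the difference in proportions would give the same p-value here, since the margins of the table are fixed by the shuffle.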

Lab Instructions

Exam Questions
- Horizontal axis
- Shade p-value
- Make up a research question

Discussion Points
- Starting with a significant result, but when ready to discuss insignificant?
- How critical is authentic data?
- Choice of statistic (count vs. difference in proportion)
- Role of traditional symbols and notation?
- Visualization of bar graphs from trial to trial
- Implementation of predict and test

Example 3: Two Means
- Are there lingering effects to sleep deprivation?
  - Randomized experiment
  - Quantitative data
  - Parallel inferential reasoning process
- Index cards
- Possible follow-up/extensions: what if -4.33?, medians, plausible values
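
The index-card simulation for a quantitative response follows the same reasoning as the two-proportion case, with the difference in group means as the statistic. A minimal sketch follows, assuming made-up scores: the two lists are hypothetical placeholders and not the sleep deprivation data, and the names, repetition count, and seed are illustrative choices.

```python
import random
from statistics import mean

def shuffled_mean_differences(group_a, group_b, n_reps=5_000, seed=3):
    """Pool all responses, re-deal them at random into groups of the original
    sizes, and return mean(A) - mean(B) for each re-randomization."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_reps):
        rng.shuffle(pooled)
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs

# Hypothetical scores for illustration only.
unrestricted = [12, 15, 18, 20, 22, 25, 28, 30]
deprived = [2, 5, 8, 10, 12, 14, 18, 21]

observed_diff = mean(unrestricted) - mean(deprived)
diffs = shuffled_mean_differences(unrestricted, deprived)
p_value = sum(d >= observed_diff for d in diffs) / len(diffs)
print(observed_diff, p_value)
```

Swapping `mean` for `statistics.median` in both places gives the medians extension mentioned on the slide without changing the rest of the procedure.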

Discussion Points
- Role of tactile simulation
- Scaffolding of lab report
  - Introductory sentences, labeling of graphs
  - Write conclusion to journal
- When should “normal-based” methods be introduced?
  - Alternative approximation to simulation
  - Position, method for confidence intervals
- Choice of technology
  - Advantages/disadvantages
  - Applets, Minitab, R, Fathom

Post-Lab Assessment (Fall 2010)
Following the lab comparing two groups on a quantitative variable (65 responses)
- Discuss the purpose of the simulation process
- What information does the simulation process reveal to help you answer the research question?
Results:
- Essentially correct: 35.4% demonstrated understanding of the big picture (looking at repeated shuffles to assess whether the observed results happened by chance)
- Partially correct: 38.5% (one of null or comparison)
- Incorrect: 26.1% (“better understand the data”)

Post-Lab Assessment (Fall 2010)
(E = essentially correct, P = partially correct, I = incorrect)
- Did students address the null hypothesis? 33.9% E / 38.5% P / 27.7% I
- Did students reference the random assignment? 36.9% E / 36.9% P / 26.2% I
- Did students focus on comparing the observed result? 64.6% E / 13.8% P / 21.5% I
- Did students explain how they would link the pieces together and draw their conclusion? 24.6% E / 60% P / 15% I

Student Surveys
[Five slides of survey-result charts, one labeled “Example 3 simulation”; figures not reproduced in the transcript]

Student Surveys
Helper/Hinderer (Winter 2011): Did the lab help you understand the overall process of a statistical investigation?

Student Surveys
Did subsequent labs increase understanding?

Remainder of labs
- Lab 4: Random babies
- Lab 5: Reese’s Pieces (demo)
  - Normal approximation, CLT for binary
  - Transition to formal test of significance (6 steps)
- Lab 6: Sleepless nights (finite population)
  - t approximation, CLT for quantitative, confidence interval
- Lab 7: Simulation of matched pairs
- Lab 8: Simulation of regression sampling
- Chi-square, ANOVA
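
One way to motivate the step from simulation to the normal approximation for a binary variable is to compare the spread of simulated sample proportions with the usual formula sqrt(p(1 - p)/n). The sketch below is a hedged illustration, not the Reese's Pieces applet: the process probability 0.5 and sample size 25 are hypothetical values chosen for convenience, and the names and seed are assumptions.

```python
import random
from math import sqrt
from statistics import pstdev

def simulate_sample_proportions(p, n, n_reps=5_000, seed=6):
    """Draw n_reps samples of size n from a process with success
    probability p and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(n_reps)]

p_success, n_candies = 0.5, 25            # hypothetical values
props = simulate_sample_proportions(p_success, n_candies)

theory_sd = sqrt(p_success * (1 - p_success) / n_candies)   # 0.1 for these inputs
simulated_sd = pstdev(props)

# Share of simulated proportions within two theoretical SDs of p;
# the normal model would predict roughly 95%.
within_2sd = sum(abs(x - p_success) <= 2 * theory_sd for x in props) / len(props)
print(round(simulated_sd, 4), round(theory_sd, 4), round(within_2sd, 3))
```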

Lab Report

Student Feedback (Winter 2011)
- Google Docs survey during last week of course
- Two instructors

Student end-of-course surveys (Winter 2011)
[Fifteen slides of survey-result charts; figures not reproduced in the transcript]

Top 2 most interesting labs
- Instructor A
  - Is Yawning Contagious?
  - Heart Rates (matched pairs)
- Instructor B
  - Friend or Foe
  - Is Yawning Contagious?
  - Reese’s Pieces

Top 2 most/least helpful labs
- Most helpful:
  - Friend or Foe
- Least helpful (Instructor B):
  - Random babies
  - Melting away (intro two-sample t, paired)

Exam 1
In a recent Gallup survey of 500 randomly selected US adult Republicans, 390 said they believe their congressional representative should vote to repeal the Healthcare Law. Suppose we wish to determine if significantly more than three-quarters (75%) of US adult Republicans favor repeal. The coin tossing simulation applet was used to generate the following two dotplots (A) and (B). Which, if either, of the two plots (A) and (B) was created using the correct procedure? Explain how you know.

Exam 1
- 35% picked B (usually citing null .75 × 500)
  - But some look at shape, or later p-value
- 29% picked A (observed result)
- 23% neither (wanted .5 × 500 = 250)
- 13% other responses: 0, .75, 50, can’t tell, anything possible, label is wrong
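
The distinction the exam question probes is that the simulation should be run under the null value (0.75), so the dotplot centers near .75 × 500 = 375, and the observed count of 390 is then located in that distribution. A rough Python sketch of that procedure follows. It is not the applet's implementation, and the function name, the 2,000 repetitions, and the seed are assumptions.

```python
import random

def simulate_null_counts(n=500, p_null=0.75, n_reps=2_000, seed=4):
    """Each repetition generates n responses with success probability p_null
    (the hypothesized value) and records the number of successes."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_null for _ in range(n))
            for _ in range(n_reps)]

null_counts = simulate_null_counts()
observed = 390  # from the survey described in the question

center = sum(null_counts) / len(null_counts)   # should sit near 375, not 390 or 250
p_value = sum(c >= observed for c in null_counts) / len(null_counts)
print(round(center, 1), p_value)
```

A dotplot centered at 390 would correspond to simulating from the observed result, and one centered at 250 to simulating from a fair coin; neither matches the hypothesis being tested.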

Exam 2
Heights of females are known to follow a normal distribution with a mean of 64 inches and a standard deviation of 3 inches. Consider the behavior of sample means. Each of the graphs below depicts the behavior of the sample mean heights of females.
a. One graph shows the distribution of sample means for many, many samples of size 10. The other graph shows the distribution of sample means for many, many samples of size 50. Which graph goes with which sample size?

Exam 2
- 85% matched n = 10 and n = 50
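
The two graphs in this question can be regenerated by simulation: draw many samples of each size from a normal population with mean 64 and standard deviation 3, and compare the spread of the sample means with 3/sqrt(n). The sketch below assumes 5,000 samples per size; the function name and seed are illustrative.

```python
import random
from math import sqrt
from statistics import stdev

def simulate_sample_means(n, mu=64.0, sigma=3.0, n_reps=5_000, seed=5):
    """Return the means of n_reps random samples of size n
    drawn from a Normal(mu, sigma) population."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n
            for _ in range(n_reps)]

means_10 = simulate_sample_means(10)
means_50 = simulate_sample_means(50)

sd_10, sd_50 = stdev(means_10), stdev(means_50)
print(round(sd_10, 3), round(3 / sqrt(10), 3))   # simulated vs. theoretical spread, n = 10
print(round(sd_50, 3), round(3 / sqrt(50), 3))   # simulated vs. theoretical spread, n = 50
```

Both distributions center near 64; the graph with the narrower spread belongs to the larger sample size.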

Exam 2
Suppose we wish to test the following hypotheses about the population of Cal Poly undergraduate women:
For which graph (A or B) would you expect the p-value to be smaller? Explain using the appropriate statistical reasoning.

Exam 2
- 77% picked B
  - Mixture of appealing to smaller SD/outliers, larger sample size means smaller p-value, and thinking in terms of test statistic
  - A few choices not internally consistent

Student understanding of p-value
CAOS questions (final exam)
- Statistically significant results correspond to small p-values
  - Traditional (National/Hope/CP): 69%/86%/41%
  - Randomization (Hope/CP): 95%/95%
- Recognize valid p-value interpretation
  - Traditional (National/Hope/CP): 57%/41%/74%
  - Randomization (Hope/CP): 60%/72%
- p-value as probability of Ho: invalid
  - Traditional (National/Hope/CP): 59%/69%/68%
  - Randomization (Hope/CP): 80%/89%

Student understanding of p-value
CAOS questions (final exam)
- p-value as probability of Ha: invalid
  - Traditional (National/Hope/CP): 54%/48%/72%
  - Randomization (Hope/CP): 45%/67%
- Recognize a simulation approach to evaluate significance (simulate with no preference vs. repeating the experiment)
  - Traditional (National/Hope/CP): 20%/20%/30%
  - Randomization (Hope/CP): 32%/40%

Student understanding of p-value
p-value interpretation in regression (final exam)

Student understanding of process
Video game question (final exam: NCSU, Hope, Cal Poly, UCLA, Rhodes College)
- What is the explanation for the process the student followed?
- Which of the following was used as a basis for simulating the data 1000 times?
- What does the histogram tell you about whether $5 incentives are effective in improving performance on the video game?
- Which of the following could be the approximate p-value in this situation?

Student understanding of process
Simulation process
- Fall: over 40% chose “This process allows her to determine how many times she needs to replicate the experiment for valid results.”
- About 70% pick “The $5 incentive and verbal encouragement are equally effective at improving performance.” as underlying assumption
- Still evidence some look at center at zero or shape as evidence of no treatment effect
- 1/3 to 1/2 could estimate p-value from graph

Example – 2009 AP Statistics Exam
A consumer organization would like a method for measuring the skewness of the data. One possible statistic for measuring skewness is the ratio mean/median….
- Calculate statistic for sample data…
- Draw conclusion from simulated data…
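
The same simulate-and-compare logic carries over to an unfamiliar statistic such as mean/median. The sketch below is a loose illustration rather than a reconstruction of the exam item: the sample values are hypothetical, and the choice of a Normal(50, 10) population as the symmetric reference model is an assumption (the ratio depends on where the population is centered, so a different reference model would change the simulated spread).

```python
import random
from statistics import mean, median

def skew_ratio(values):
    """The statistic from the question: sample mean divided by sample median."""
    return mean(values) / median(values)

def simulate_ratios(n, mu=50.0, sigma=10.0, n_reps=5_000, seed=7):
    """Ratios from samples of size n drawn from a symmetric
    Normal(mu, sigma) population (an assumed reference model)."""
    rng = random.Random(seed)
    return [skew_ratio([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(n_reps)]

# Hypothetical right-skewed sample, for illustration only.
sample = [1, 2, 2, 3, 3, 4, 5, 8, 12, 20]
observed_ratio = skew_ratio(sample)          # mean 6.0 over median 3.5

ratios = simulate_ratios(len(sample))
p_value = sum(r >= observed_ratio for r in ratios) / len(ratios)
print(round(observed_ratio, 3), p_value)
```

For a symmetric population the simulated ratios cluster near 1, so an observed ratio far above 1 is evidence of right skew.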

Design Principles
- Tactile simulation
- Visual, contextual animation of tactile simulation
- Intermediate animation capability
- Level of student construction
  - Ease of changing inputs
  - Connect elements between graphs
- Carefully designed, spiraling activities
  - “Stop!”
  - Thought questions
- Allow for student exploration

Implementation
- Early in course
- Repetition through course, connections
- Normal approximations
- Lab assignments
  - Focus on entire statistical process
  - Motivating research question
  - Follow-up application
  - Thought questions
  - Screen captures
  - Pre-lab questions
  - Minitab demos (Adobe Captivate)
- Exam questions

Observations
- Students quickly get a sense of trying to determine whether a result could be “just due to chance”
- Still struggle with more technical understanding
  - Under the null hypothesis
  - Observed vs. hypothesized value
- Students may fail to see connections between scenarios

Suggestions/Open Questions
- Begin with class discussion/brainstorming on how to evaluate data before showing class results
  - Loaded dice, biased coin tossing
  - Thought questions
- Student data vs. genuine research article
  - “the result” vs. “your result”
- Choice of first exposure
  - Significant?
  - Random sampling or random assignment

Suggestions/Open Questions
- Scaffolding
  - Observational units, variable
  - How would you add one more dot to graph?
  - At some point, require students to enter the correct “observed result” (e.g., Captivate)
  - At some point, ask students to design the simulation?
  - Start with fill-in-the-blank interpretation?

Suggestions/Open Questions
- One crank or more?
- When to connect to normal approximations?
  - How to make sure traditional methods don’t overtake once they are introduced?
  - How much to discuss exact methods?
- Confidence intervals

Summary
Very promising, but also need to be very careful, and need a strong cycle of repetition closely tied to rest of course…