# Requisite Knowledge for Teachers, Assessment Questions, Technology

Allan Rossman, Soma Roy, Beth Chance
Dept. of Statistics, Cal Poly – San Luis Obispo

Cal Poly
- STAT 217, 221: algebra-based, wide variety of majors
  - Several instructors (Chance, McGaughey, Rossman, Roy) use a simulation/randomization-based curriculum
  - Curricular materials (ISI) developed with Tintle, Cobb, Swanson, VanderStoep (Wiley, to appear)
- STAT 301: calculus-based, mostly Math and Stat majors
  - Uses simulation/randomization-based ISCAM materials (Chance and Rossman, 2nd ed.)

JMM 2013

Question on faculty interviews
Most challenging topic to teach in Stat 101?
- "Right" answer: sampling distributions
  - Central to understanding statistical inference: what would happen with repeated random sampling?
  - Very difficult cognitive step for students: seeing a proportion or average not as a number but as a (random) variable with a distribution
- Equally "right" answer: randomization distributions
  - Analogous to sampling distributions: what would happen with repeated random assignment?
  - Many teachers have not studied these

1. What is a randomization distribution?
Sleep deprivation study
[Figure: randomization distribution of the difference in group means]

Challenges in understanding this graph
- What's the variable? The difference in group means
- What are the observational units? The random assignments

What to look for in this graph?
- Shape: we really don't care
  - Only relevant to justify using a theoretical approximation (t)
- Center: only relevant to confirm that we correctly simulated under the null (0)

What to look for? (cont.)
- Variability: this we care about
  - It tells us which values of the statistic are typical, which are unusual, and which are rare when the null model is true
  - Seeing where the observed value of the statistic falls in the randomization (null) distribution determines the p-value, i.e., the strength of evidence
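A minimal Python sketch of how such a randomization distribution can be generated: pool the responses, re-randomize them into groups of the original sizes (as if the treatment had no effect), and record the difference in means each time. The data below are hypothetical numbers invented for illustration, not the sleep deprivation study's data, and the function names are our own. The center is only a sanity check (near 0), while the spread and the observed value's position give the p-value.

```python
import random


def randomization_distribution(group_a, group_b, reps=10_000, seed=1):
    """Re-randomize the pooled responses into groups of the original sizes
    (null model: treatment has no effect) and record mean(a) - mean(b)."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)  # a fresh random assignment of units to groups
        diffs.append(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
    return diffs


def upper_tail_p_value(diffs, observed):
    """Proportion of simulated differences at least as large as the observed one."""
    return sum(d >= observed for d in diffs) / len(diffs)


# Hypothetical improvement scores, for illustration only (NOT the study's data).
unrestricted = [25, 12, 18, 31, 9, 15, 22, 14, 27, 19, 11]
deprived = [4, -3, 10, 7, -8, 12, 2, 5, -1, 6]

observed = sum(unrestricted) / len(unrestricted) - sum(deprived) / len(deprived)
diffs = randomization_distribution(unrestricted, deprived)
p_value = upper_tail_p_value(diffs, observed)

# Center: should be near 0 if we simulated under the null.
center = sum(diffs) / len(diffs)
# Variability: which differences are typical vs. rare under the null.
spread = (sum((d - center) ** 2 for d in diffs) / (len(diffs) - 1)) ** 0.5
print(f"observed = {observed:.2f}, center = {center:.2f}, SD = {spread:.2f}, p = {p_value:.4f}")
```

Shuffling a pooled list is one convenient way to mimic random assignment; dealing index cards into two piles is the tactile version of the same step.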

Exact randomization distribution
- Distribution of differences in group means for all 352,716 possible random assignments
- Appropriate for a calculus-based class
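The count 352,716 equals C(21, 10), the number of ways to choose which 10 of 21 subjects go to one group. That is consistent with 21 subjects split into groups of 10 and 11, although the group sizes are our inference and are not stated on the slide. The minimal Python sketch below enumerates every assignment instead of sampling them. To keep the output easy to inspect, it uses a tiny hypothetical data set. Enumerating all 352,716 assignments works the same way and takes on the order of a second in pure Python.

```python
from itertools import combinations
from math import comb


def exact_randomization_distribution(group_a, group_b):
    """Difference in group means (a minus b) for EVERY possible assignment
    of the pooled responses into groups of the original sizes."""
    pooled = list(group_a) + list(group_b)
    n, n_a = len(pooled), len(group_a)
    n_b = n - n_a
    total = sum(pooled)
    diffs = []
    for idx in combinations(range(n), n_a):  # each choice of units for group a
        sum_a = sum(pooled[i] for i in idx)
        diffs.append(sum_a / n_a - (total - sum_a) / n_b)
    return diffs


# Number of distinct random assignments of 21 units into groups of 10 and 11.
print(comb(21, 10))  # 352716

# Tiny hypothetical data set (not from the study).
a, b = [8, 9, 12, 15], [3, 5, 6]
diffs = exact_randomization_distribution(a, b)
observed = sum(a) / len(a) - sum(b) / len(b)
# Small tolerance guards against floating-point ties.
exact_p = sum(d >= observed - 1e-12 for d in diffs) / len(diffs)
print(len(diffs), exact_p)  # 35 assignments; only the observed one is this extreme
```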

2. Why simulate?
Study: 14 of 16 infants chose the "nice" toy over the "mean" toy
- Two possible explanations:
  - Infants have a genuine preference for the nice toy
  - Infants choose randomly
- Why simulate? To investigate what could have happened by chance alone (random choices), and so to assess the plausibility of the "randomly choose" hypothesis
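A minimal Python sketch of the coin-flipping simulation for the infant study, assuming the "randomly choose" null means each infant picks the nice toy with probability 0.5. Each repetition flips 16 fair coins. The simulated p-value is the proportion of repetitions with 14 or more "nice" choices. The exact binomial tail, 137/65536, is printed for comparison. Function names are hypothetical.

```python
import random
from math import comb


def simulate_random_choices(n_infants=16, reps=10_000, seed=1):
    """Number of 'nice' choices out of n_infants when every infant picks at random."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_infants)) for _ in range(reps)]


counts = simulate_random_choices()
# Proportion of chance-alone repetitions giving 14 or more 'nice' choices.
p_sim = sum(c >= 14 for c in counts) / len(counts)
# Exact tail for comparison: P(X >= 14), X ~ Binomial(16, 0.5).
p_exact = sum(comb(16, k) for k in range(14, 17)) / 2 ** 16
print(f"simulated p = {p_sim:.4f}, exact p = {p_exact:.5f}")  # exact p = 137/65536
```

The simulation runs entirely under the null. It shows what chance alone produces, not a replication of the study.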

Why simulate? (cont.)
- Non-trivial for students to understand and articulate the motivation for simulation
- Especially understanding that the simulation is conducted assuming the null to be true, in order to assess its plausibility
- Some students think that simulation means replicating the study, generating a larger sample
- Some students think any use of software (e.g., to calculate a t-statistic) constitutes a simulation

3. Four more examples: A, B, C, D

Four more examples (cont.)
- Identical structure: two binary variables, 2×2 table of counts
- Could apply a two-proportion z-test or a chi-square test
- But there's a very important difference …
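For a 2×2 table, the two-proportion z-test and the chi-square test are the same test. The square of the pooled z statistic equals the Pearson chi-square statistic when no continuity correction is applied. The minimal Python sketch below checks this on the counts from Example 1 later in this talk (16 of 20 wins vs. 8 of 20). The helper names are our own.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Rows: win / lose; columns: $5 incentive / "do your best".
table = [[16, 8], [4, 12]]
z = two_proportion_z(16, 20, 8, 20)
chi2 = chi_square_2x2(table)
print(f"z = {z:.4f}, z^2 = {z * z:.4f}, chi-square = {chi2:.4f}")  # z^2 equals chi-square
```

The arithmetic is identical for all four examples. The difference lies in how the data were collected and therefore in what the result licenses us to conclude.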

Four more examples (cont.)
Very different uses of randomness:
- A: Random assignment
- B: Independent random samples
- C: One random sample
- D: No randomness

Four more examples (cont.)
Very different scope of conclusions to be drawn:
- A: Cause/effect
- B: Generalize to two populations
- C: Generalize to one population
- D: Rule out the "random chance" explanation

Four more examples: key question
Should the inference method mimic the randomness in data collection?
- A: Randomization test
- B: Random sample/bootstrap with fixed margin
- C: Random sample/bootstrap with only the total fixed
- D: ??? Permutation test

I've answered yes for the calculus-based course, no for the algebra-based course. But I think instructors should be aware of the issue.
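To make the A/B/C distinction concrete, here is a minimal Python sketch of three null-simulation schemes for the same 2×2 counts. The schemes are our own illustrative choices, not necessarily what ISI or ISCAM software does. Scheme A shuffles outcomes between groups, so both margins stay fixed. Scheme B draws two independent samples from the pooled proportion, a parametric-bootstrap flavor with group sizes fixed. Scheme C generates group membership and outcome independently, so only the overall total is fixed. For comparison only, all three are applied to Example 1's counts, even though that study used random assignment.

```python
import random


def sim_random_assignment(n1, n2, successes, rng):
    """A: re-randomize units to groups; group sizes AND total successes stay fixed."""
    cards = [1] * successes + [0] * (n1 + n2 - successes)
    rng.shuffle(cards)
    s1 = sum(cards[:n1])
    return s1 / n1 - (successes - s1) / n2


def sim_independent_samples(n1, n2, successes, rng):
    """B: two independent samples from one common population; only group sizes fixed."""
    p = successes / (n1 + n2)  # pooled proportion used as the common null value
    s1 = sum(rng.random() < p for _ in range(n1))
    s2 = sum(rng.random() < p for _ in range(n2))
    return s1 / n1 - s2 / n2


def sim_one_sample(n1, n2, successes, rng):
    """C: one random sample; only the overall total is fixed. Group membership
    and response are generated independently (the null of no association)."""
    n = n1 + n2
    p_group1, p_success = n1 / n, successes / n
    while True:
        g1 = s1 = g2 = s2 = 0
        for _ in range(n):
            success = rng.random() < p_success
            if rng.random() < p_group1:
                g1 += 1
                s1 += success
            else:
                g2 += 1
                s2 += success
        if g1 and g2:  # redraw in the rare case that a group is empty
            return s1 / g1 - s2 / g2


def null_distribution(simulate, n1, n2, successes, reps=5_000, seed=1):
    rng = random.Random(seed)
    return [simulate(n1, n2, successes, rng) for _ in range(reps)]


def two_sided_p(diffs, observed):
    return sum(abs(d) >= abs(observed) - 1e-12 for d in diffs) / len(diffs)


observed = 16 / 20 - 8 / 20
schemes = [("A", sim_random_assignment), ("B", sim_independent_samples), ("C", sim_one_sample)]
for name, sim in schemes:
    diffs = null_distribution(sim, 20, 20, 24)
    print(name, round(two_sided_p(diffs, observed), 3))
```

With counts like these, the three p-values tend to be close. That supports the pragmatic choice of not distinguishing the schemes in an algebra-based course.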

4. New example, three approaches
Halloween treat study: 148 chose candy, 135 chose a toy
- Two-sided test of equal likeliness
- Three approaches:
  - Simulation-based approximation
  - Normal-based approximation
  - Exact binomial calculation

Simulation-based approximation
Produces various approximate p-values (a different one each time the simulation is run)

Normal-based z-test
Produces one approximate p-value: .4396

Exact binomial p-value
Produces one exact answer: .4758
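A minimal Python sketch of all three approaches for the Halloween data (n = 283, 148 candy, null value 0.5, two-sided). The simulation uses random bits as fair coin flips and is run with a few different seeds to show that it produces various approximate p-values. The z-test uses no continuity correction. The exact calculation sums binomial probabilities in both tails using exact integer arithmetic. Function names are hypothetical.

```python
import random
from math import comb, erf, sqrt

N, CANDY = 283, 148           # 148 chose candy, 135 chose toy
DIST = abs(CANDY - N / 2)     # distance of observed count from N/2 = 141.5


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def simulation_p_value(reps=10_000, seed=1):
    """Each repetition: 283 fair 'coin flips' (candy vs. toy equally likely).
    Two-sided: count repetitions at least as far from 141.5 as 148 is."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = bin(rng.getrandbits(N)).count("1")  # number of 'candy' among N fair bits
        if abs(x - N / 2) >= DIST:
            hits += 1
    return hits / reps


def z_test_p_value():
    """One-proportion z-test of pi = 0.5, two-sided, no continuity correction."""
    z = (CANDY / N - 0.5) / sqrt(0.5 * 0.5 / N)
    return 2 * (1 - normal_cdf(abs(z)))


def exact_binomial_p_value():
    """P(X >= 148) + P(X <= 135) for X ~ Binomial(283, 0.5)."""
    upper = sum(comb(N, k) for k in range(CANDY, N + 1))
    lower = sum(comb(N, k) for k in range(0, N - CANDY + 1))
    return (upper + lower) / 2 ** N


for seed in (1, 2, 3):
    print(f"simulation (seed {seed}): {simulation_p_value(seed=seed):.4f}")
print(f"normal z-test:  {z_test_p_value():.4f}")
print(f"exact binomial: {exact_binomial_p_value():.4f}")
```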

Simulation vs. theory vs. exact
- Expect algebra-based students to understand simulation and theory
  - Maybe it's enough to expect them to select and apply any relevant test procedure for the one-proportion setting
- But expect calculus-based students to understand the relationships among the three approaches
- Notice that simulation "beats" theory here: with enough repetitions, the simulated p-value lands closer to the exact .4758 than the normal-based .4396 does

5. How to estimate the parameter?
- We do not reject the value .5 for the parameter in the Halloween treat example
- What other potential values of the parameter would not be rejected?
  - Perform more tests, using either simulation, the normal approximation, or the exact binomial
  - Use some pre-specified significance level
- Define as plausible the values not rejected
- Confidence interval for the parameter: the values not rejected by the test
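A minimal Python sketch of "confidence interval = values not rejected," assuming an exact binomial test at the 0.05 level over a grid of candidate values 0.001, 0.002, …, 0.999. For the two-sided p-value it uses the "double the smaller tail" convention, one of several in use. At 0.5 this reproduces the .4758 above. Names and the grid spacing are our own choices.

```python
from math import comb

N, X = 283, 148  # Halloween treat study: 148 of 283 chose candy


def exact_two_sided_p(x, n, p0):
    """Exact binomial two-sided p-value for H0: pi = p0, doubling the smaller tail."""
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    lower = sum(pmf[: x + 1])  # P(X <= x)
    upper = sum(pmf[x:])       # P(X >= x)
    return min(1.0, 2 * min(lower, upper))


def plausible_values(x, n, alpha=0.05):
    """Candidate parameter values on a 0.001 grid that the test does NOT reject."""
    grid = [i / 1000 for i in range(1, 1000)]
    return [p0 for p0 in grid if exact_two_sided_p(x, n, p0) >= alpha]


kept = plausible_values(X, N)
lo, hi = min(kept), max(kept)
print(f"p-value at 0.5: {exact_two_sided_p(X, N, 0.5):.4f}")
print(f"plausible values at the 0.05 level: {lo} to {hi}")
```

Swapping in a simulation-based or normal-based p-value function gives the corresponding "interval of plausible values" for the other two approaches.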

Summary: What teachers need to know
- Randomization distribution
- Motivation for simulation
- Common student misconceptions
- Connections among data collection, inference procedure, and scope of conclusions
- Subtle distinctions among simulation methods
- Simulation vs. theory vs. exact
- Confidence interval as inversion of a test; interval of plausible values

Assessment: Our favorite questions (on homework, quizzes, project reports, exams)
- Describe how to use simulation/randomization to calculate a p-value
- Interpret a p-value in the context of the study
- State an appropriate conclusion in the context of the study

Note: You are welcome to use any or all of the example questions. If you do, we would be thrilled if you could share your results with us. Thanks!

JMM 2013, San Diego

Describe how to use simulation to calculate a p-value
Example 1: "A research question of interest is whether financial incentives can improve performance. Alicia designed a study to test whether video game players are more likely to win on a certain video game when offered a \$5 incentive compared to when simply told to 'do your best.' Forty subjects were randomly assigned to one of two groups, with one group being offered \$5 for a win and the other group simply being told to 'do your best.' She collected the following data from her study:

|       | \$5 incentive | "Do your best" | Total |
|-------|---------------|----------------|-------|
| Win   | 16            | 8              | 24    |
| Lose  | 4             | 12             | 16    |
| Total | 20            | 20             | 40    |

Explain in detail how you would conduct a simulation (say with pennies or dice or index cards) to obtain a p-value for this research question."

Describe how to use simulation to calculate a p-value (cont.)
What we look for in the answer:
- The simulation assumes the null is true
- The simulation mimics the randomness of the study design (depending on the course)
- Records the value of an appropriate statistic for many repetitions
- Compares the observed value of the statistic to the simulated randomization distribution, to calculate the proportion of simulated values of the statistic at least as extreme as the observed statistic
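The index-card answer for Example 1 can be sketched in Python as follows. We assume the one-sided alternative that the incentive increases wins. Write "W" on 24 cards and "L" on 16, which keeps the outcomes fixed under the null that the incentive has no effect. Shuffle, deal 20 cards to the "\$5 incentive" pile, and count its wins. The p-value is the proportion of shuffles with 16 or more wins. The exact hypergeometric probability is printed for comparison. Function names are hypothetical.

```python
import random
from math import comb


def card_shuffle_simulation(wins=24, losses=16, group_size=20, reps=10_000, seed=1):
    """Shuffle 24 'W' and 16 'L' cards, deal 20 to the $5 pile, count its wins."""
    rng = random.Random(seed)
    cards = ["W"] * wins + ["L"] * losses
    counts = []
    for _ in range(reps):
        rng.shuffle(cards)                          # random assignment under the null
        counts.append(cards[:group_size].count("W"))
    return counts


counts = card_shuffle_simulation()
observed = 16
p_sim = sum(c >= observed for c in counts) / len(counts)
# Exact: P(X >= 16) for X ~ Hypergeometric(40 cards, 24 'W', 20 dealt).
p_exact = sum(comb(24, k) * comb(16, 20 - k) for k in range(observed, 21)) / comb(40, 20)
print(f"simulated p = {p_sim:.4f}, exact p = {p_exact:.4f}")  # exact is about 0.0112
```

Each line of the loop maps onto one item in the checklist above: the null is built into the fixed cards, the shuffle mimics random assignment, a statistic is recorded each repetition, and the tail proportion is computed at the end.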

Interpret a p-value
Example 2: In an article published in Psychology of Music (2010), researchers reported the results of a study conducted to investigate the effects of "romantic lyrics on compliance with a courtship request." … When a participant came in for the study, she was randomly assigned to listen to either a romantic song or a neutral song. After three minutes, she was greeted by a male "confederate" … who … asked for her phone number so that he could call her up to ask her out. Of the 44 women who listened to the romantic song, 23 gave their phone numbers, whereas of the 43 who listened to the neutral song, only 12 did. The p-value is computed to be … . Interpret this p-value in the context of the study.

Interpret a p-value (cont.)
What we look for in the answer:
- Similar to what we look for when describing the simulation
- What the p-value is a probability of; four components:
  - Assuming the null hypothesis is true
  - Randomness due to the study design
  - Comparing the observed statistic to the simulated randomization distribution
  - Direction (as or more extreme)
- All in context

State an appropriate conclusion
Example 3: To investigate whether there is an association between happiness and income level, we will use data from the 2002 General Social Survey (GSS), cross-classifying a person's perceived happiness with their family income level. The GSS is a survey of randomly selected U.S. adults who are not institutionalized. The p-value is found to be < … . State your conclusion in the context of the study.

Notes: Comment on how they often confuse "interpret" and "evaluate"? Are these questions the same for both audiences (301 and 217)?

State an appropriate conclusion (cont.)
What we check: that the answer addresses each of the following (where applicable) and includes appropriate justification for it:
- (S) Statistical significance
- (E) Estimation, i.e., statistical confidence
- (C) Causation (was random assignment used?)
- (G) Generalizability (whom does the sample represent?)
- All in context

Student response:

Notes: These were all great student responses; are you going to be able to give an idea of how many students can do this (and by when in the course)? (Cheap answer would be: these were A answers and we give 10-20% As?)

More examples

Example 4 (from AP Statistics): new statistic = mean / median
- Give simulation results for values of mean / median, based on a normal population
- Give the observed value of mean / median for the given sample data
- Do the simulation results suggest that the underlying population is skewed to the right? Explain.
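A minimal Python sketch of the kind of simulation results Example 4 supplies. The sample size (30), population mean (50), SD (10), and observed ratio (1.08) are all hypothetical values chosen for illustration, not taken from the AP item. Right skew pulls the mean above the median, so large values of mean/median count as evidence of right skew. The statistic only makes sense for positive-valued data.

```python
import random
from statistics import median


def simulate_mean_median_ratios(n=30, mu=50.0, sigma=10.0, reps=5_000, seed=1):
    """mean/median for repeated samples of size n from a Normal(mu, sigma) population."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ratios.append((sum(sample) / n) / median(sample))
    return ratios


ratios = simulate_mean_median_ratios()
observed_ratio = 1.08  # hypothetical observed value of mean/median
# How often does a symmetric (normal) population produce a ratio this large?
p_value = sum(r >= observed_ratio for r in ratios) / len(ratios)
print(f"average simulated ratio = {sum(ratios) / len(ratios):.4f}, p = {p_value:.4f}")
```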

Example 5: Present a graph of the null distribution and ask:
- At what number should the graph center? Why?
- Shade the region of the graph that represents the p-value.
- Calculate the approximate p-value.
- Which aspect of the graph tells you whether the study results are statistically significant: (a) shape, (b) center, or (c) variability?

Notes: Will tie this back to what Allan says?

Context: difference in proportions
For example, "Do incentives work?" … A national sample of 735 households was randomly selected, … 368 households were randomly assigned to receive a monetary incentive along with the advance letter, and the other 367 households were assigned to receive only the advance letter; 286 in the incentive group and 245 in the no-incentive group responded to the telephone survey.
- Does the center make sense? Why?
- Shade the region denoting the p-value.
- Are the results from the study statistically significant? How are you deciding: (a) shape, (b) center, or (c) variability?
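A minimal Python sketch of the null distribution students would be shown for the incentive study, assuming a one-sided alternative (the incentive raises the response rate). The 531 responders and 204 non-responders are shuffled between groups of 368 and 367, mimicking the random assignment. The center lands at about 0. The p-value is the proportion of shuffled differences at least as large as the observed 286/368 − 245/367 ≈ 0.11. Function names are hypothetical.

```python
import random

N1, X1 = 368, 286  # incentive group: households, responders
N2, X2 = 367, 245  # no-incentive group: households, responders


def shuffled_differences(reps=5_000, seed=1):
    """Re-randomize responders/non-responders into groups of 368 and 367."""
    rng = random.Random(seed)
    cards = [1] * (X1 + X2) + [0] * (N1 + N2 - X1 - X2)
    diffs = []
    for _ in range(reps):
        rng.shuffle(cards)
        s1 = sum(cards[:N1])
        diffs.append(s1 / N1 - (X1 + X2 - s1) / N2)
    return diffs


observed = X1 / N1 - X2 / N2
diffs = shuffled_differences()
center = sum(diffs) / len(diffs)  # near 0, as expected under the null
p_value = sum(d >= observed for d in diffs) / len(diffs)
print(f"observed = {observed:.4f}, center = {center:.4f}, p = {p_value:.4f}")
```

The significance question is answered by where the observed difference falls relative to the spread of this distribution. The center answers only the "does it make sense?" check.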

Example 6: Present two or more graphs and ask which represents the null distribution, where:
- One graph centers at the hypothesized value
- One graph centers at the observed statistic
- Perhaps include a third graph that centers at some commonly used null value (e.g., 0.5 for proportions, 0 for a difference in means) that is not the correct null value for the current study

For example: In playing Rock-Paper-Scissors against the instructor, 8 out of 42 students picked scissors. To test whether these data provide evidence that such players tend to pick scissors less often than would be expected by random chance, which graph (A, B, or C) is the appropriate null distribution? The three graphs are centered at:
- 14 (1/3 of 42)
- 21 (1/2 of 42)
- 8 (observed count)
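A minimal Python sketch of the correct null distribution for the Rock-Paper-Scissors item, assuming "random chance" means each student picks scissors with probability 1/3. The simulated counts center near 42 × 1/3 = 14. The one-sided p-value is the proportion of counts at or below the observed 8. The exact binomial tail is included for comparison. Names are hypothetical.

```python
import random
from math import comb

N, OBSERVED, P_NULL = 42, 8, 1 / 3


def simulate_scissors_counts(reps=10_000, seed=1):
    """Number of scissors picks out of 42 when each pick is scissors with prob 1/3."""
    rng = random.Random(seed)
    return [sum(rng.random() < P_NULL for _ in range(N)) for _ in range(reps)]


counts = simulate_scissors_counts()
center = sum(counts) / len(counts)  # should be near 14, not 21 or 8
p_sim = sum(c <= OBSERVED for c in counts) / len(counts)
p_exact = sum(comb(N, k) * P_NULL ** k * (1 - P_NULL) ** (N - k) for k in range(OBSERVED + 1))
print(f"center = {center:.2f}, simulated p = {p_sim:.4f}, exact p = {p_exact:.4f}")
```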

Example 7: Consider the following output.
Make up a research question for which this is plausible output. Clearly specify the observational units and variable(s) in the study. Do not use any of the contexts discussed in your lecture notes, labs, assignments, or any practice material.
Goal of question: Can students correctly identify the setting?
- How many variables, and what types of variables?
- Direction of the alternative (one-sided vs. two-sided)?

Notes: As Allan asked, can they correctly identify the setting (e.g., two proportions)? Delete b) here?

Multiple choice Example 8: randomized experiment

|                | Sample size | Mean score |
|----------------|-------------|------------|
| \$5 incentive  | 20          | 98         |
| "Do your best" | …           | 80         |

Could the difference in sample means (18) have occurred by chance alone? The question describes a tactile (card-shuffling) simulation process and gives a graph based on 1000 repetitions.

Example 8 (cont.): What does the histogram tell you about whether \$5 incentives are effective in improving performance on the video game?
- (a) The incentive is not effective because the distribution of differences generated is centered at 0.
- (b) The incentive is effective because the distribution of differences generated is centered at 0.
- (c) The incentive is not effective because the p-value is greater than .05.
- (d) The incentive is effective because the p-value is less than .05.

Example 8 (cont.): 1. What is the explanation for the simulation process? It allows her to:
- (a) determine whether the normal distribution fits the data.
- (b) compare the actual result to what could have happened by chance if gamers' performances were not affected by the treatment.
- (c) determine the % of time the \$5 incentive strategy would outperform the "do your best" strategy for all possible scenarios.
- (d) determine how many times she needs to replicate the experiment for valid results.

Example 8 (cont.): 2. Which of the following was used as a basis for simulating the data 1000 times?
- (a) The \$5 incentive is more effective than verbal encouragement for improving performance.
- (b) The \$5 incentive and verbal encouragement are equally effective at improving performance.
- (c) Verbal encouragement is more effective than a \$5 incentive for improving performance.
- (d) Both (a) and (b) but not (c).

Example 8 (cont.): 3. What is the approximate p-value in this situation? Recall that the research question is whether the incentive improves performance.
The answer options target:
- dividing by 100 instead of 1000
- a 2-sided p-value instead of 1-sided
- the correct answer
- a plainly wrong value

Example 8 (cont.): 5. Which of the following is the appropriate interpretation of the p-value?
- (a) The p-value is the probability that the \$5 incentive is not really helpful.
- (b) The p-value is the probability that the \$5 incentive is really helpful.
- (c) The p-value is the probability that she would get a result as extreme as she actually found, if the \$5 incentive is really not helpful.
- (d) The p-value is the probability that a student wins on the video game.

Example 9: You want to investigate a claim that women are more likely than men to dream in color. You take a random sample of men and a random sample of women (in your community), ask whether they dream in color, and compare the proportions of each gender that dream in color.

1) If the difference in the proportions (who dream in color) between the two samples turns out not to be statistically significant, which of the following is the best conclusion to draw? (Circle one.)
- (a) You have found strong evidence that there is no difference between the proportions of men and women in your community that dream in color.
- (b) You have not found enough evidence to conclude that there is a difference between the proportions of men and women in your community that dream in color.
- (c) Because the result is not significant, the study does not support any conclusion.

Example 9 (cont.): 2) If the difference in the proportions (who dream in color) between the two samples does turn out to have a small p-value, which one of the following is the best interpretation? (Circle one.)
- (a) It would not be very surprising to obtain the observed sample results if there is really no difference between men and women in your community.
- (b) It would be very surprising to obtain the observed sample results if there is really no difference between men and women.
- (c) It would be very surprising to obtain the observed sample results if there is really a difference between men and women.
- (d) The probability is very small that there is no difference between men and women in your community on this issue.
- (e) The probability is very small that there is a difference between men and women in your community on this issue.

3) Suppose that the difference between the sample groups turns out not to be significant, even though your review of the research suggested that there really is a difference between men and women. Which conclusion is most reasonable?
- (a) Something went wrong with the analysis.
- (b) There must not be a difference after all.
- (c) The sample size might have been too small to detect a difference even if there is one.

4) Suppose that two different studies are conducted on this issue. Study A finds that 40 of 100 women sampled dream in color, compared to 20 of 100 men. Study B finds that 35 of 100 women dream in color, compared to 25 of 100 men. Which study (A or B) provides stronger evidence that there is a difference between men and women on this issue?
- (a) Study A
- (b) Study B
- (c) The strength of evidence would be similar for these two studies.

5) Suppose that two more studies are conducted on this issue. Both studies find that 30% of women sampled dream in color, compared to 20% of men. But Study C consists of 100 people of each sex, whereas Study D consists of 40 people of each sex. Which study provides stronger evidence that there is a difference between men and women on this issue?
- (a) Study C
- (b) Study D
- (c) The strength of evidence would be similar for these two studies.

6) If the difference in the proportions (who dream in color) between the two samples does turn out to be statistically significant, which of the following is a possible explanation for this result?
- (a) Men and women in your community do not differ on this issue, but there is a small probability that random chance alone led to the difference we observed between the two samples.
- (b) Men and women in your community differ on this issue.
- (c) Either (a) or (b) is a possible explanation for this result.

7) Reconsider the previous question. Now think not about possible explanations but about plausible (believable) explanations. If the difference in the proportions (who dream in color) between the two samples does turn out to be statistically significant, which of the following is the more plausible explanation for this result?
- (a) Men and women in your community do not differ on this issue, but there is a small chance that random sampling alone led to the difference we observed between the two groups.
- (b) Men and women in your community differ on this issue.
- (c) They are both equally plausible explanations for this result.

Notes: Might get pretty strapped for time; can summarize what the goals of the questions were (and the purpose of the distractors?). Maybe also emphasize for the last few that we are hoping to gather assessment data across many, many institutions, if they might be willing to try one of these questions embedded in an exam? But try to steer clear of IRB questions.
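For questions 4) and 5), a minimal Python sketch comparing the strength of evidence with the pooled two-proportion z statistic. Here a larger |z| means a smaller p-value. Study D's counts (12 of 40 and 8 of 40) are simply 30% and 20% of 40. The helper name is our own. The comparison shows that a larger difference (A vs. B) and a larger sample size for the same difference (C vs. D) both strengthen the evidence.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for p1 - p2."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


# (women who dream in color, women sampled, men who dream in color, men sampled)
studies = {
    "A": (40, 100, 20, 100),
    "B": (35, 100, 25, 100),
    "C": (30, 100, 20, 100),
    "D": (12, 40, 8, 40),
}
z = {name: two_proportion_z(*counts) for name, counts in studies.items()}
for name, value in z.items():
    print(f"Study {name}: z = {value:.3f}")
```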

Technology: desired features
- Transparency: lets students "visualize" what the simulation/randomization is doing
- Easy to use: aids student learning and does not become an obstacle in the path to learning statistics; supervision is useful but not imperative
- Easily available: can be run on different platforms and is affordable
- Provides choices to the user: e.g., lets the user pick the direction to look in to calculate the p-value
- Interactive: e.g., with pop-up messages asking "Are you sure that's the direction?" or "Entry does not match table"

Technology as a teaching tool
How much of the thinking do we want the students to do when they are using technology?

Example: … vs. …

Example:
Notes: Will we present this as initial exposure vs. follow-ups? Highlight the choice of statistic, but without giving them too many? Talk about exact? Talk about the visual shuffling? Intermediate animation option? Ability to enter own data.

Example: Direction of alternative? Count or proportion? More generic "standardized statistic", though it does give the t/z symbol in the box. Maybe highlight that we do get a graph, whereas most packages won't have the accompanying graph? Could show the ANOVA or chi-square with the theoretical distribution overlaid?

That's all folks!