# What Students Learn (and Don’t Learn) about Inferential Reasoning in Introductory Statistics Courses 2014 Joint Statistical Meetings (JSM) Boston, MA Sharon.


What Students Learn (and Don’t Learn) about Inferential Reasoning in Introductory Statistics Courses

2014 Joint Statistical Meetings (JSM), Boston, MA

Sharon Lane-Getaz, St. Olaf College, Northfield, MN 55057, lanegeta@stolaf.edu

Objective

- What does statistics education research report about the correct conceptions, difficulties, and misconceptions people have with inferential reasoning?
- How might this help statistical consultants dealing with clients?
- Background: To assess the impact of methods for teaching inference, an instrument was developed to assess 14 known misconceptions and difficulties; items were added to assess correct conceptions.
- Measurement: Reasoning about P-values and Statistical Significance (RPASS) scale. Reliability in this study is Cronbach’s alpha = .76 (37 items).
- Study: Compare Pretest and Posttest proportions of students answering each item correctly, displayed on a scatterplot (canoe plot).
- Discussion: Emphasize what students generally learn and what problems tend to persist.
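As a reminder of what the reliability figure summarizes, here is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The RPASS responses are not reproduced in these slides, so the function name `cronbach_alpha` and the small score matrix are hypothetical illustrations, not the study's data or code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student item scores.

    scores: list of rows, one row per student, one column per item
    (for example 1 = correct, 0 = incorrect).
    """
    k = len(scores[0])  # number of items
    # Sample variance of each item across students
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each student's total score
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 students, 3 items (NOT the RPASS data)
demo_scores = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(round(cronbach_alpha(demo_scores), 3))
```

In this toy matrix every item ranks the students identically, so alpha reaches its maximum of 1. Real response data with item-level noise gives lower values, such as the .76 reported for the 37 RPASS items.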

Subjects and Setting

- Subjects (N = 138) from two introductory-level statistics courses aimed at the social sciences ($n_1 = 78$) and natural sciences ($n_2 = 60$).
- 138 of 167 enrolled students completed the Pretest and Posttest and consented to participate (83% response).
- 94 females, 43 males, 1 no response.
- 34 first years, 56 sophomores, 30 juniors, 18 seniors.
- Setting: Small liberal arts college (3000 students) in the upper Midwest US, in a small town of “cows, colleges and contentment.”
- Time: Spring semester 2011.

Broad range of results with two courses combined: RPASS-9 Pretest and Posttest Totals

Pre- and Posttest Totals Gains by Course

Aggregate Results for Both Courses (N = 138)

- 70% of 37 RPASS-9 Posttest items correct, on average.
- Five more Posttest items correct, on average:
  - RPASS-9 Posttest (Mean = 26.1, SD = 5.1)
  - RPASS-9 Pretest (Mean = 21.0, SD = 4.2)
- What did students learn, by item, and what did they not learn?
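The two headline figures follow directly from the reported means. The short sketch below redoes that arithmetic; the variable names are illustrative, and only the means and the item count come from the slide.

```python
N_ITEMS = 37        # items on the RPASS-9
PRE_MEAN = 21.0     # reported Pretest mean (items correct)
POST_MEAN = 26.1    # reported Posttest mean (items correct)

# Average share of items answered correctly on each test
pre_share = PRE_MEAN / N_ITEMS
post_share = POST_MEAN / N_ITEMS

# Average gain in number of items answered correctly
mean_gain = POST_MEAN - PRE_MEAN

print(f"Pretest share correct:  {pre_share:.1%}")
print(f"Posttest share correct: {post_share:.1%}")
print(f"Mean gain in items:     {mean_gain:.1f}")
```

The Posttest share comes out at roughly 70% of the 37 items, and the gain at about five items, matching the summary above.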

Item-Level Analysis (Canoe Plot)

Canoe Plot of item-level changes in proportion correct:

- Scatterplot of Pretest to Posttest proportions by item.
- A 95% confidence band along $p_{\text{posttest}} = p_{\text{pretest}}$ differentiates items with a significant difference in the proportion answering correctly from items with nonsignificant differences (Posttest – Pretest).
- Wilson adjusted margins of error maintain a 95% nominal rate (Agresti & Caffo, 2000).
- No family-wise correction; intended for descriptive purposes.
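One way to picture how an item ends up above, within, or below the band is the adjusted interval for a difference of proportions from Agresti & Caffo (2000), which adds one success and one failure to each sample before computing a Wald interval. The sketch below is an assumption-laden illustration: the slides do not give the exact band construction, the interval treats the two samples as independent even though the same students took both tests, and the function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def agresti_caffo_diff_ci(x_pre, x_post, n, conf=0.95):
    """Adjusted Wald interval for p_post - p_pre.

    Adds one success and one failure to each sample (Agresti & Caffo, 2000).
    x_pre, x_post: number of students answering the item correctly.
    n: number of students taking each test.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_pre = (x_pre + 1) / (n + 2)
    p_post = (x_post + 1) / (n + 2)
    se = sqrt(p_pre * (1 - p_pre) / (n + 2) + p_post * (1 - p_post) / (n + 2))
    diff = p_post - p_pre
    return diff - z * se, diff + z * se


def classify_item(x_pre, x_post, n):
    """Label an item relative to the band around p_posttest = p_pretest."""
    lower, upper = agresti_caffo_diff_ci(x_pre, x_post, n)
    if lower > 0:
        return "above"   # Posttest proportion significantly higher
    if upper < 0:
        return "below"   # Posttest proportion significantly lower
    return "within"      # difference not distinguishable from zero


# Hypothetical counts for N = 138 students (not taken from the study)
print(classify_item(41, 98, 138))
print(classify_item(70, 70, 138))
```

Because no family-wise correction is applied, each item's label is best read descriptively, as the slide notes.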

Proportion of Correct Responses by RPASS-9 Item: Pretest on x, Posttest on y (37 items, N = 138)

23 items are above the 95% confidence band, 13 are within it, and 1 is below.

Improved 14 Correct Conceptions of the 23 Items “Above the Band”

Improved Statistical Literacy:

- Recognize textbook definitions of p-value (1-1, 6-1)
- Link p-value to sampling variation (2-1)
- Understand p-value as a rareness measure (3a-2)

Improved Inferential Reasoning:

- Assess significance graphically (3b-1)
- Reason about variation (3c-2)
- Assess impact of alternative hypothesis on p-value (1-3, 4b-1)
- Differentiate small p-values, Type I and II errors (6-2, 6-7)
- Reason about sample size impact on p-value (6-4)
- Reason about strength of evidence vs. p-value (2-2, 4a-1, 6-3)

Note: (5) Green items indicate $p_c < .50$ on Pretest.

Improved (Suppressed) 9 Misconceptions of the 23 Items “Above the Band”

State conclusions within the confines of the scope of inference:

- Need random sample to generalize from sample to population (5-4)
- Need random assignment to draw causal conclusion (4a-3)

Interpret what a P-value is NOT:

- Always small or always desired to be a low value (3a-3, 3b-3)
- Probability the Null Hypothesis is false or true (5-1, 5-2)
- Alpha or significance level (4a-1)

Interpret that a small P-value does NOT mean:

- Chance caused results observed (2-4)
- Provides definitive, contrapositive proof (3a-1)

Note: (3) Red items indicate $p_c < .50$ on Pretest.

No Improvement: “Within the Band” Correct Conceptions (C)

- Reason about variation in boxplot depiction (3c-1) C
- Making correct rejection decision (4b-3) C
- Recognize an informal definition of p-value (1-2) C
- Recognize p-value as a conditional probability (2-3) C
- Use Confidence Intervals for statistical significance (2-5) C
- Differentiate p-values from effects (4a-2) C
- Interpret large p-value (4b-2) C
- Consider impact of sample size on p-values (4b-4, 6-4) C

Note: Green indicates $p_c < .50$ on Pretest.

No Improvement: “Within the Band” Misconceptions (M) or Multiple Choice Items

- Belief increased replications = increased sample size (4b-6) M
- Belief p-values always low or desired to be low (3b-2) M
- Differentiate statistical vs. practical significance (4b-5, 6-5) C/M
- Check conditions before making an inference (6-6) C/M

Note: Red indicates $p_c < .50$ on Pretest.

The One Item “Below the Band”: Unlearning, Guessing, Confusion?

Responses for one item suggest better reasoning on the Pretest than on the Posttest (just below the 95% confidence band):

- The item asks students to choose the correct direction to shade the p-value in the sampling distribution of means (3b-4).
- Students tend to select shade “to the right,” even though the alternative hypothesis suggests that one should shade the larger left tail.

Remind clients of caveats and limitations of the statistical inference process.

The P-value is an integrated part of the larger statistical process:

- Logic of inference (how we interpret results) depends on sample size, relates to effect size and importance, and depends on whether conditions were met.
- Scope of inference (what we can conclude) depends on randomness in the study design, that is, how the data were gathered.

A confidence interval (CI) estimates population parameters or true effects, given the sample we observed, and:

- Provides information complementary to what p-values give alone (bounds for the effect).
- Can assess statistical significance. For example, point out whether a null hypothesis value is in the interval or not. Is zero in the interval? Is the interval all positive or all negative?
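The questions a consultant might ask of a client's interval ("Is zero in the interval? Is it all positive or all negative?") can be written as a small check. This is only an illustrative sketch: the function name `interval_vs_null` and the example bounds are hypothetical, and the check says nothing about whether the interval's conditions were met.

```python
def interval_vs_null(lower, upper, null_value=0.0):
    """Report where a confidence interval sits relative to a null value.

    Returns "above" if the whole interval exceeds the null value,
    "below" if the whole interval is less than it, and "contains"
    if the null value lies inside the interval.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > null_value:
        return "above"
    if upper < null_value:
        return "below"
    return "contains"


# Hypothetical 95% intervals for a difference in means
print(interval_vs_null(0.8, 3.1))    # all positive
print(interval_vs_null(-1.2, 0.9))   # zero is inside
print(interval_vs_null(-4.0, -0.5))  # all negative
```

An interval that excludes the null value corresponds to a statistically significant result at the matching level, while the bounds themselves show how large or small the effect could plausibly be.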

A Surprise Aside

Students in a randomization-based curriculum learn more on average, but ironically show no improvement on 5 items associated with the randomization distribution:

- How one- or two-tailed test relates to p-value (4b-2) M
- Correct rejection decision (4b-3) C
- Impact of sample size on significance (4b-4) M
- Significance vs. practical importance (4b-5)
- Impact of increasing sample size vs. replications (4b-6) M

References

Agresti, A., & Caffo, B. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.

Chance, B. L., & Rossman, A. J. (2006). Investigating Statistical Concepts, Applications, and Methods. Belmont, CA: Brooks/Cole – Thomson Learning.

Cobb, G. (2007). The Introductory Statistics Course: A Ptolemaic Curriculum? Technology Innovations in Statistics Education, 1(1). http://repositories.cdlib.org/uclastat/cts/tise/

Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals and how to read pictures of data. American Psychologist, 60(2), 170-180.

delMas, R. C., Garfield, J. B., Ooms, A., & Chance, B. (2007). Assessing Students’ Conceptual Understanding after a First Course in Statistics. Statistics Education Research Journal [online], 6(2), 28-58. http://www.stat.auckland.ac.nz/serj

Lane-Getaz, S. J. (2013). Development of a Reliable Measure of Students’ Inferential Reasoning Ability. Statistics Education Research Journal (SERJ), 12(1), 20-47. http://iase-web.org/documents/SERJ/SERJ12(1)_LaneGetaz.pdf

Lane-Getaz, S. J. (2007). Toward the Development and Validation of the Reasoning about P-values and Statistical Significance Scale. In B. Phillips & L. Weldon (Eds.), Proceedings of the ISI / IASE Satellite Conference on Assessing Student Learning in Statistics. Voorburg, The Netherlands: ISI. http://www.stat.auckland.ac.nz/~iase/publications/sat07/Lane-Getaz.pdf

Utts, J. (2003). What Educated Citizens Should Know about Statistics and Probability. The American Statistician, 57(2), 74-79.

Contact Information & Slides

Sharon Lane-Getaz, lanegeta@stolaf.edu

I am on sabbatical this coming year and would love to collaborate with YOU to administer the RPASS at your institution! Let’s talk!

These JSM-2014 presentation slides will be available from: http://sharonlanegetaz.efoliomn.com/JSM2014

The differences in proportions by item appear in the Appendix of this presentation. Please see the proceedings for more!

Table 1. Proportion Correct on RPASS-9 Posttest item exceeds Pretest Proportion Correct (12 of 23 items)

| RPASS-9 item | Concept or difficulty assessed | $p_2 - p_1$ |
|---|---|---|
| 6-1 | Selects a textbook definition of a p-value given multiple choices. | .41 |
| 3b-1 | Uses a density curve and an observed value to estimate if the observed value (or more extreme) is statistically significant. | .36 |
| 5-3 | Reasons smaller p-value, stronger the evidence of a difference or effect. | .36 |
| 4a-1 | Confuses p-value with significance level. | .35 |
| 2-1 | Recognizes p-value in terms of variation in a sampling distribution. | .33 |
| 1-3 | Understands magnitude of p-value depends if test is one- or two-sided. | .30 |
| 4a-2 | Reasons greater evidence of a difference or effect, smaller the p-value. | .27 |
| 2-2 | Understands stronger evidence of difference or effect, smaller p-value. | .23 |
| 3c-2 | Employs graphical reasoning about variation. | .23 |
| 6-2 | Understands a small p-value suggests results are statistically significant. | .23 |
| 2-4 | Believes the p-value is the probability observed results are due to chance or caused by chance, if the null is true. | .22 |
| 3a-1 | Believes statistics provide definitive proof; misuses the deterministic Boolean logic of contrapositive proof. | .19 |

Note. a Items associated with sampling or randomization distribution. b Requests explanation of reasoning.

Table 1 contd. Proportion Correct on RPASS-9 Posttest exceeds Pretest Proportion Correct (11 of 23 items)

| RPASS-9 item | Concept or difficulty assessed | $p_2 - p_1$ |
|---|---|---|
| 4b-1 | Interprets a p-value for a one-tailed hypothesis. | .18 |
| 5-1 | Misinterprets a p-value as the probability the null hypothesis is false. | .17 |
| 5-2 | Believes p-value is the probability that the alternative hypothesis is true. | .17 |
| 6-3 | Understands stronger evidence of difference or effect, smaller p-value. | .17 |
| 6-4 | Reasons about impact of a small sample size on statistical significance. | .16 |
| 3a-2 | Understands the p-value as a rareness measure. | .14 |
| 4a-3 | Believes causal conclusion can be drawn from small p-values regardless of study design. | .14 |
| 1-1 | Recognizes a formal textbook definition of the p-value without context. | .13 |
| 3b-3 | Believes p-value is always a low number (or always desired to be low). | .13 |
| 3a-3 | Belief p-values are always a low value or are always desired to be a low value. | .12 |
| 6-7 | Differentiates between concepts of Type I and Type II error. | .12 |

Note. a Items associated with sampling or randomization distribution. b Requests explanation of reasoning.

Table 2. Equal Proportion of Students Answer RPASS-9 Item Correctly on Posttest and Pretest (13 items)

| RPASS-9 item | Concept or difficulty assessed | $p_2 - p_1$ |
|---|---|---|
| 6-5 | Understands small p-value does not mean practical importance. | .08 |
| 3b-2 | Believes p-value is always a low number (or desired to be low). | .07 |
| 4b-4 | Relationship between sample size and p-value. | .07 |
| 2-3 | Understands p-value is conditioned on the null hypothesis being true. | .06 |
| 2-5 | Confidence intervals can assess statistical significance, much like p-values are used when hypothesis testing. | .06 |
| 4b-5 | Differentiates statistical and practical significance. | .03 |
| 4b-2 | Difficulty with one- versus two-tailed p-value. | .01 |
| 3c-1 | Employs graphical reasoning about variation. | 0 |
| 4b-3 | Understands the rejection decision. | 0 |
| 5-4 | Confuses if statistical significance refers to a sample or a population. | -.05 |
| 4b-6 | Understands impact of increasing number of replications in a simulation versus the impact of increasing the sample size. | -.06 |
| 6-6 | Understands to conduct a significance test, conditions must be met. | -.06 |
| 1-2 | Recognizes an informal description of the p-value embedded in context. | -.07 |

Note. a Items associated with sampling or randomization distribution. b Requests explanation of reasoning.
