Presentation on theme: "Trends and Updates in the Teaching of Inferential Statistics Yusuf K. Bilgic - Visiting Assistant Professor in Statistics 2012-2014,"— Presentation transcript:
Trends and Updates in the Teaching of Inferential Statistics
Yusuf K. Bilgic, Visiting Assistant Professor in Statistics, SUNY Geneseo
Talk at RIT, April 2013, Mini-Conference 2
I wish Fisher knew it! Italy’s highest court overturned the acquittal of Amanda Knox, accused of the 2007 murder of Meredith Kercher... Miscalculations and misinterpretations of probabilities by judges and lawyers, from the odds of DNA matches to the chance of accidental death, have sent innocent people to jail and, perhaps, let murderers walk free... (NYT, 3/27). See the next slide for the statistical fallacy.
Reasoning: By the time Ms. Knox’s appeal was decided in 2011, however, techniques had advanced sufficiently to make a retest of the knife possible, and the prosecution asked the judge to have one done. But he refused. His reasoning? If the scientific community recognizes that a test on so small a sample cannot establish identity beyond a reasonable doubt, he explained, then neither could a second test on an even smaller sample…
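The fallacy is that independent pieces of evidence accumulate: a second test that is weak on its own still adds to the combined case. One way to see this is Bayes' rule in odds form, sketched below in Python. The prior (0.5) and the likelihood ratios (9 and 4) are made-up numbers for illustration, not values from the Knox case, and multiplying the ratios assumes the two tests are conditionally independent.

```python
def posterior_probability(prior_prob: float, likelihood_ratios: list[float]) -> float:
    """Update a prior probability with several pieces of evidence.

    Bayes' rule in odds form: posterior odds = prior odds * LR_1 * LR_2 * ...
    Multiplying the likelihood ratios is valid only if the tests are
    conditionally independent given the hypothesis (an assumption here).
    """
    odds = prior_prob / (1.0 - prior_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical numbers, chosen only to illustrate the logic.
PRIOR = 0.5      # prior probability that the hypothesis is true
LR_FIRST = 9.0   # first test on a small sample
LR_SECOND = 4.0  # retest on an even smaller sample: weaker, but still > 1

after_first = posterior_probability(PRIOR, [LR_FIRST])
after_both = posterior_probability(PRIOR, [LR_FIRST, LR_SECOND])
print(f"after the first test: {after_first:.3f}")  # 0.900
print(f"after both tests:     {after_both:.3f}")   # 0.973
```

Any likelihood ratio above 1 pushes the combined probability further in the same direction, and a result pointing the other way (ratio below 1) would be informative too. Either way, refusing the second test discards evidence.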
Talk Outline
– What Is Inferential Statistics?
– History
– Logic in Hypothesis Testing and Interpretations of the P-value
– Trends in Teaching Hypothesis Testing
– Reform Movements
– Position of Hypothesis Testing in Current and Developing Curricula
– Conclusion
What Is Inferential Statistics? Statistical inference is the process of drawing conclusions, decisions, or estimates from data that are subject to random variation.
[Figure: Big picture of statistical inference]
History
– 1700s: first uses of inferential statistics, in astronomy and geodesy (Pierre-Simon Laplace)
– P-value: formally introduced by Karl Pearson, 1914
– Modern use of P-values and null hypothesis testing: Fisher, 1920s
– The Neyman–Pearson approach to hypothesis testing and its debates with Fisher’s approach
The lady tasting tea: A lady (Muriel Bristol) claimed to be able to tell by taste how a cup of tea had been prepared: milk first, then tea, or tea first, then milk. She was presented sequentially with 8 cups, 4 prepared each way, and asked to identify the preparation of each cup (knowing that there were 4 of each). The null hypothesis was that she had no special ability, and the test was Fisher's exact test. In the actual experiment, Bristol correctly classified all 8 cups. The p-value was 1/C(8,4) = 1/70 ≈ 0.014, so Fisher rejected the null hypothesis (considering the outcome highly unlikely to be due to chance).
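The p-value arithmetic can be checked directly. A minimal Python sketch using only the standard library (the function names are illustrative): under the null hypothesis, every choice of 4 cups out of 8 is equally likely, so the number of milk-first cups she identifies correctly follows a hypergeometric distribution, which is the distribution behind Fisher's exact test.

```python
from math import comb

N_CUPS, N_MILK_FIRST = 8, 4  # design of Fisher's experiment


def prob_correct(k: int) -> float:
    """P(exactly k of the 4 milk-first cups are picked) under H0 (no ability).

    With no ability, her set of 4 "milk-first" picks is a random 4-subset
    of the 8 cups, so the count of correct picks is hypergeometric.
    """
    return (comb(N_MILK_FIRST, k)
            * comb(N_CUPS - N_MILK_FIRST, N_MILK_FIRST - k)
            / comb(N_CUPS, N_MILK_FIRST))


def p_value(observed: int) -> float:
    """One-sided p-value: probability of doing at least this well by chance."""
    return sum(prob_correct(k) for k in range(observed, N_MILK_FIRST + 1))


print(comb(8, 4))            # 70 equally likely ways to choose 4 cups
print(round(p_value(4), 4))  # all 8 cups right: 1/70, about 0.0143
print(round(p_value(3), 4))  # 3 of 4 milk-first cups right: 17/70, about 0.2429
```

The last line shows why the design needed a perfect score: one mistake would have produced a p-value near 0.24, far too common under chance to reject the null hypothesis.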
Inferential Statistics Timeline
When | What? | So what? (Statistics/Probability)
1700s | First use of inferential statistics | Error probability calculations
1770 | Laplace’s CLT and inference | Excess of boys compared to girls
1839 | ASA founded | Did you know …?
1914 | P-value introduced | Used by Pearson with the chi-squared distribution
1920 | Modern P-value with Fisher | Null hypothesis introduced
 | Neyman–Pearson approach | Null vs. alternative hypothesis introduced
1937 | Neyman introduced the confidence interval | In statistical testing
1980s | MCMC/Gibbs sampling | Applications with technology
2005 | GAISE Report | Reform in statistics education
2010+ | Reform projects | Concrete materials being developed
(continued)
Logic in Hypothesis Testing and Interpretations of the P-value
Theorems are proven, but statistical hypotheses are checked against data, with awareness of the limitations of the data and of the role of chance in any set of results.
Falsification: a hypothesis that is testable by empirical experiment conforms to the standards of the scientific method.
Null hypothesis significance testing (NHST): the most common way in which scientific inferences are made.
Different paradigms of statistical inference
– Fisher’s approach: inductive inference about a single hypothesis, using pseudo-falsification
– The Neyman–Pearson approach: rules for future behavior, based on a test with two complementary hypotheses, associated decision error rates, and a specified effect size
– The Bayesian approach: probabilities that measure the degree of belief in a particular hypothesis warranted by the evidence
Three paradigms
– Fisher's approach does not involve any alternative hypothesis.
– A shortcoming of the Neyman–Pearson (NP) approach is that the in-the-long-run condition of such testing is a fiction relative to actual scientific inquiry and decision making.
– The Bayesian approach dominated statistical thinking before Fisher, Neyman, and Pearson but was pushed aside in the 1920s as being too subjective.
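The "in-the-long-run" guarantee of the Neyman–Pearson approach can be made concrete by simulation. A minimal Python sketch under hypothetical assumptions that do not come from the talk: a one-sided binomial test with 20 trials, H0: success rate 0.5, α = 0.05, and an alternative rate of 0.75. The rejection region is fixed before any data are seen, and the error rate describes how often the procedure rejects over many imagined repetitions of the study.

```python
import random
from math import comb

N, ALPHA = 20, 0.05  # hypothetical design: 20 trials, nominal level 5%


def binom_tail(n: int, p: float, c: int) -> float:
    """Exact P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


# Neyman-Pearson: fix the rejection region "X >= CRIT" before seeing any data,
# choosing the smallest CRIT whose size under H0 (p = 0.5) is at most ALPHA.
CRIT = next(c for c in range(N + 1) if binom_tail(N, 0.5, c) <= ALPHA)


def long_run_rejection_rate(p_true: float, reps: int = 20_000, seed: int = 1) -> float:
    """Fraction of simulated repetitions of the study in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        successes = sum(rng.random() < p_true for _ in range(N))
        rejections += successes >= CRIT
    return rejections / reps


print("critical value:", CRIT)
print(f"exact Type I error rate:       {binom_tail(N, 0.5, CRIT):.4f}")
print(f"simulated rate, H0 true (0.5): {long_run_rejection_rate(0.5):.4f}")
print(f"exact power, H1 (0.75):        {binom_tail(N, 0.75, CRIT):.4f}")
print(f"simulated power, H1 (0.75):    {long_run_rejection_rate(0.75):.4f}")
```

The simulated rates settle near the exact ones (about 0.021 under H0, below the nominal 0.05 because the binomial distribution is discrete). The guarantee concerns the procedure over many repetitions, not the single study actually in hand, which is the gap this slide criticizes.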
Interpretations of probabilities
– ‘A measure of evidence’: Fisher suggested the p-value as an informal measure of statistical evidence.
– ‘Observed error rate’: Neyman dismissed the p-value as a measure of evidence and proposed the formal hypothesis-testing framework based on error rates.
– ‘Degree of belief’: in the Bayesian approach, probability expresses plausibility: the evidence informs an investigator so that his or her degree of belief in a hypothesis can be adjusted accordingly.
The first two approaches to testing and to interpreting the p-value are incompatible, yet they are mistakenly regarded as parts of a single, coherent approach to statistical inference.
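To see how the three readings differ, it helps to apply them to the same data. A minimal Python sketch under hypothetical assumptions, none of which come from the talk: 15 successes in 20 trials, H0: success rate 0.5, one specific alternative of 0.75, α = 0.05, and a 50/50 prior on the two hypotheses.

```python
from math import comb

# Hypothetical data and settings (not from the talk).
n, x = 20, 15       # 15 successes in 20 trials
p0, p1 = 0.5, 0.75  # H0 and one specific alternative (the NP "effect size")
alpha = 0.05        # NP significance level, fixed before seeing the data
prior_h1 = 0.5      # Bayesian prior probability of H1


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Fisher: the p-value as an informal measure of evidence against H0.
p_value = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))

# Neyman-Pearson: only the accept/reject decision is reported; its justification
# is the long-run Type I error rate alpha, not the size of this particular p-value.
decision = "reject H0" if p_value <= alpha else "do not reject H0"

# Bayes: update the prior degree of belief in H1 with the likelihood ratio.
lr = binom_pmf(x, n, p1) / binom_pmf(x, n, p0)
posterior_h1 = prior_h1 * lr / (prior_h1 * lr + (1 - prior_h1))

print(f"Fisher, p-value:        {p_value:.4f}")
print(f"NP decision at {alpha}: {decision}")
print(f"Bayes, P(H1 | data):    {posterior_h1:.3f}")
```

Here the p-value is about 0.021 and the posterior probability of H1 is about 0.93. The three outputs answer different questions (how surprising the data are under H0, what to decide under a fixed long-run error rate, how plausible H1 now is), which is why blending them into one framework causes confusion.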
Trends in Teaching Hypothesis Testing
– Academic statistics vs. practical statistics
– Dichotomous decisions vs. subjective decisions
Factors that shape the teaching of statistics:
– Vibrant statistics
– Philosophical evolution in statistics
– Educational updates: cognitivism/constructivism
– High-speed calculations
– Subjectivity
– The need for complex ‘human/social-related inferences’
Who is responsible?
Me: Hey Mom! You always force me into dichotomous decisions.
Mom: Are you sure?
Me: I am 100% sure you do this.
Mom: You did it again.
YKB
Shifts in teaching inferential statistics
From | To
Single p-value | Many p-values, simulations, meta-analysis
p-value | Estimate, CI, power, effect size
Theoretical probability facts | Noisy facts, data-driven facts
Conventional wisdom | ‘It depends’ likelihood decisions
Traditional parametric tests | Alternatives (nonparametric, Bayesian, bootstrap)
Theoretical emphasis in data analysis | Empirical, applied, concept-based, randomization-based
Behaviorism-based teaching | Cognitivism-based teaching
Objectivity | Subjectivity
Single | Interdisciplinary inclusions
NHST with p-value | Seeking alternatives/broader ways
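Several of the "To" entries (an estimate with a CI, an effect size, a bootstrap alternative to a parametric test) fit in a few lines of code. A minimal Python sketch, assuming two small hypothetical samples; the data and function names are illustrative, not from the talk. It reports a point estimate, a percentile bootstrap confidence interval, and Cohen's d in place of a single p-value.

```python
import random
import statistics

# Hypothetical measurements for two groups (illustration only).
treatment = [12.1, 14.3, 13.8, 15.2, 11.9, 14.7, 13.5, 16.0, 12.8, 14.1]
control = [11.2, 12.5, 10.9, 13.1, 12.0, 11.7, 12.9, 10.5, 12.2, 11.8]


def mean_diff(a, b):
    return statistics.fmean(a) - statistics.fmean(b)


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return mean_diff(a, b) / pooled_var ** 0.5


def bootstrap_ci(a, b, reps=10_000, level=0.95, seed=2013):
    """Percentile bootstrap CI for the difference in means.

    Each group is resampled with replacement, separately, `reps` times;
    the CI is read off the sorted bootstrap differences.
    """
    rng = random.Random(seed)
    diffs = sorted(
        mean_diff(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(reps)
    )
    lo_idx = round((1 - level) / 2 * reps)
    hi_idx = round((1 + level) / 2 * reps) - 1
    return diffs[lo_idx], diffs[hi_idx]


estimate = mean_diff(treatment, control)
low, high = bootstrap_ci(treatment, control)
print(f"estimate = {estimate:.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```

The percentile bootstrap is only one of several bootstrap intervals; it is used here because it is the easiest to explain to students.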
Reform Movements
The disagreements among Fisher, Pearson, and Neyman remain unresolved and are imperfectly integrated into present-day applications.
Reform efforts: the GAISE recommendations, the CATALST project, the CAUSE organization, and Project MOSAIC
– Radical changes in content and pedagogy
– Simulation-, empirical-, and randomization-based activities and resampling, with no procedural framework
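The simulation- and randomization-based activities promoted by these reform efforts often center on the randomization (permutation) test. A minimal Python sketch of that general technique for a two-group comparison of means; the data are hypothetical, and the code is not taken from any of the projects' materials.

```python
import random
import statistics


def permutation_test(a, b, reps=10_000, seed=1):
    """Two-sided randomization test for a difference in means.

    Under H0 the group labels are exchangeable, so we shuffle the pooled
    data, re-split it into groups of the original sizes, and count how often
    the shuffled difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        # small tolerance so floating-point ties with the observed value count
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # add-one correction keeps the estimated p-value strictly positive
    return observed, (extreme + 1) / (reps + 1)


# Hypothetical data (illustration only).
new_method = [23, 27, 31, 25, 29, 33, 28]
old_method = [21, 24, 22, 26, 20, 25, 23]
obs, p = permutation_test(new_method, old_method)
print(f"observed difference = {obs:.2f}, randomization p-value = {p:.4f}")
```

No distributional formula appears anywhere: the null distribution is built by shuffling, which is what makes the approach attractive for a course without a procedural framework.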
Position of Hypothesis Testing in Current and Developing Curricula
Despite the growth of Bayesian research, most undergraduate teaching is still based on frequentist inference (the framework on which the well-established methods of statistical hypothesis testing and confidence intervals rest).
– New and developing: Common Core standards, journals, APA guidelines
– Alternatives to hypothesis/significance testing and/or the p-value
Alternatives to p-values in testing
– Confidence intervals instead of hypothesis tests whenever possible (Newman; Agresti & Franklin; Cumming; APA…)
– Arguments for replacing hypothesis tests with confidence limits are increasing as a consequence of the confusion surrounding effect sizes (ES), p-values, and error rates (Newman)
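A confidence interval carries the test's verdict plus the size and precision of the estimate: with the same standard error and critical value, a 95% interval excludes a hypothesized mean μ0 exactly when the two-sided test rejects at the 5% level. A minimal Python sketch using a large-sample normal (z) approximation; the simulated sample, its seed, and the function name are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev


def z_interval_and_test(data, mu0, level=0.95):
    """Large-sample CI for a mean and the matching two-sided z-test of mu = mu0."""
    n = len(data)
    mean = fmean(data)
    se = stdev(data) / n ** 0.5                     # estimated standard error
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level 0.95
    ci = (mean - z_crit * se, mean + z_crit * se)
    z = (mean - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return ci, p


rng = random.Random(42)
sample = [rng.gauss(103, 15) for _ in range(50)]  # simulated, not real data

(low, high), _ = z_interval_and_test(sample, 100)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
for mu0 in (95, 100, 105):
    _, p = z_interval_and_test(sample, mu0)
    inside = low <= mu0 <= high
    print(f"mu0 = {mu0}: inside CI = {inside}, two-sided p = {p:.3f}")
```

Every mu0 inside the interval has p ≥ 0.05 and every mu0 outside has p < 0.05, but only the interval also shows how large and how precise the estimated mean is.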
Conclusion / Comments / Questions
Since P values are not likely to soon disappear from the pages of medical journals or from the toolbox of statisticians, the challenge remains how to use them and still properly convey the strength of evidence provided by research data (L. Held).
More work is needed on how to reflect current trends in undergraduate teaching.
I am looking for partners to write an article on today’s topic. Please let me know at
Do you agree? Let’s debate it at the next conference... Bayarri and Berger (2004): ‘In a related vein, we avoid the question of what is “pedagogically correct.” If pressed, we would probably argue that Bayesian statistics (with emphasis on objective Bayesian methodology) should be the type of statistics that is taught to the masses, with frequentist statistics being taught primarily to advanced statisticians.’
References
Cumming, G. (2011). Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Routledge Academic.
Newman, M. C. (2008). ‘What exactly are you inferring?’ A closer look at hypothesis testing. Environmental Toxicology and Chemistry, 27(5), 1013–1019.
Kass, R. E. (2011). Statistical inference: The big picture. Statistical Science, 26(1), 1–9. Institute of Mathematical Statistics. DOI: /10-STS337
Tishkovskaya, S., & Lancaster, G. A. (2012). Statistical education in the 21st century: A review of challenges, teaching innovations and strategies for reform. Journal of Statistics Education, 20(2). Lancaster University.
Goodman, S. N. P values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137, 485–496.
Held, L. (Leonhard Held), biostatistician.