Hypothesis Testing Part I – As a Diagnostic Test
This video is designed to accompany pages 95-116 in Making Sense of Uncertainty: Activities for Teaching Statistical Reasoning (Van-Griner Publishing Company).
Diagnostic Tests Diagnostic tests, such as field sobriety tests and home pregnancy tests, have to choose between two outcomes. A “positive” outcome means the test has uncovered adequate evidence of what it is designed to find. A “negative” outcome means the test does not have adequate evidence of what it is designed to find.
Experimentation Many experiments in medicine, social science, education, etc. are designed to make similar binary choices. A “positive” outcome means the experiment has uncovered adequate evidence that the treatment is effective. A “negative” outcome means the experiment has not found adequate evidence that the treatment is effective.
Flibanserin Study From TIME: The Flibanserin findings are based on the study of 1,378 premenopausal women who had been in a monogamous relationship for 10 years on average. The women were randomly assigned to take 100 mg of Flibanserin or a placebo daily and to record daily whether they had sex and whether it was satisfying.
The Choice Generally: Treatment is Not Effective or Treatment is Effective. In this case: Flibanserin is no better than a placebo or Flibanserin is better than a placebo.
Hypothesis Testing The paradigm for deciding between “Treatment is Not Effective” and “Treatment is Effective” is an example of what statisticians call “hypothesis testing.” This diagnostic test is not a kit or a physical examination. Rather, it consists of a collection of mathematical steps.
No Surprise Can’t make this decision risk free. There are two potential mistakes, a false negative and a false positive:

Experimental Results     Truth: Treatment Really is Ineffective     Truth: Treatment Really is Effective
Treatment Ineffective    True Negative                              False Negative
Treatment Effective      False Positive                             True Positive
Diagnostic Due Process Generally, sensitivity and specificity are used to evaluate how well a screening test performs. That evaluation informs our confidence in results produced by the test. Statistical science tends to focus on specificity for a similar role in hypothesis testing. This is partly because the sensitivity of most common hypothesis testing procedures is pretty good.
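Typical Screening Scenario
Evaluation of Test Performance: Data from test subjects are classified twice, once by the test (the Test Prediction, based on some rule) and once by a gold standard (the Actual Status, taken as the truth). Crossing Test Prediction (Negative or Positive) with Actual Status (Negative or Positive) gives a two-by-two table of counts, labeled A, B, C, and D. Sensitivity and specificity are computed from those counts.
Application of Test in Practice: Apply the test to a real person and get a Yes/No result. We are more likely to believe a “Yes” if the specificity is high, and a “No” if the sensitivity is high.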
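The evaluation step can be sketched in a few lines of Python. This is only an illustration: the function names and the counts are hypothetical, and the counts are named by their role (true positive, false negative, and so on) rather than by the letters A to D, so the sketch does not depend on how the slide's table is laid out.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly positive subjects that the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truly negative subjects that the test flags as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not from any real study).
true_neg, false_neg = 90, 5
false_pos, true_pos = 10, 45

print(sensitivity(true_pos, false_neg))   # 45 / 50  -> 0.9
print(specificity(true_neg, false_pos))   # 90 / 100 -> 0.9
```

The false positive rate is one minus the specificity, so a high specificity is what makes a “Yes” believable.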
Hypothesis Testing Analogy Data are collected from experimental subjects, and a truth is hypothesized. Adopt the awkward rule: “Based on the data from the experiment, say the treatment is effective.” This is like an automatic “YES.” Compute the false positive rate (FPR) for the awkward rule. If the FPR is small enough, accept the “YES” and conclude that the treatment is effective. Else, don’t trust the recommended “YES” and conclude that the treatment is not effective.
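The slides do not say how the false positive rate for the awkward rule is computed. One common way to estimate it, swapped in here purely as an illustration, is a one-sided permutation (shuffling) approach: hypothesize that the treatment is not effective, so the group labels are arbitrary, then count how often randomly relabeled data look at least as favorable to the treatment as the real data did. The function name, the scores, and the number of shuffles below are all assumptions made for this sketch.

```python
import random


def estimated_fpr(treatment, placebo, n_shuffles=10_000, seed=0):
    """Estimate how often 'say the treatment is effective' would be triggered
    by chance alone, assuming the treatment really is not effective."""
    rng = random.Random(seed)
    n_t = len(treatment)
    observed = sum(treatment) / n_t - sum(placebo) / len(placebo)

    pooled = list(treatment) + list(placebo)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel subjects at random
        fake_treat, fake_plac = pooled[:n_t], pooled[n_t:]
        diff = sum(fake_treat) / n_t - sum(fake_plac) / len(fake_plac)
        if diff >= observed:  # shuffled data look at least as good
            hits += 1
    return hits / n_shuffles


# Hypothetical outcome scores for illustration only (not the Flibanserin data).
treatment_scores = [10, 11, 12, 13, 14, 15]
placebo_scores = [1, 2, 3, 4, 5, 6]

fpr = estimated_fpr(treatment_scores, placebo_scores)
print("treatment effective" if fpr < 0.05 else "treatment not effective")
```

With groups this clearly separated, very few shuffles match the observed difference, so the estimated FPR is small and the automatic “YES” is accepted.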
Statistical Significance If the estimated false positive rate (FPR) for deciding between “Treatment is Not Effective” and “Treatment is Effective” is low enough (typically less than 0.05), the results of the experiment are said to be statistically significant.
Important Vocabulary Testing a hypothesis in the present context means choosing between H0: Treatment is Not Effective (the “null” hypothesis) and HA: Treatment is Effective (the “alternative” hypothesis). To make the choice, we have to compute an estimated false positive rate and compare it to 0.05. If the estimated FPR is smaller than 0.05, choose HA. Else, choose H0.
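The choice described above reduces to a single comparison. A minimal sketch follows, in which the function name and the string labels are hypothetical and the 0.05 cutoff is the one quoted in the slide:

```python
def choose_hypothesis(estimated_fpr, cutoff=0.05):
    """Return "HA" if the estimated false positive rate is below the cutoff,
    otherwise "H0", following the rule stated in the slide."""
    if not 0.0 <= estimated_fpr <= 1.0:
        raise ValueError("a false positive rate must lie between 0 and 1")
    return "HA" if estimated_fpr < cutoff else "H0"


print(choose_hypothesis(0.01))  # HA: treatment is effective
print(choose_hypothesis(0.20))  # H0: treatment is not effective
```

Because the rule says “smaller than 0.05,” an estimated FPR of exactly 0.05 falls on the H0 side in this sketch.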
Also Known As … In hypothesis testing the estimated false positive rate is more commonly called a p-value. That stands for “probability value”.
Liberties Statistical science, particularly statistical inference, is a very complex endeavor. In this presentation we have purposely avoided discussing a few things, including: (1) the distinction between two very different approaches to hypothesis testing, one due to Fisher and the other to Neyman and Pearson; (2) the difference between a p-value and a Type I error rate; and (3) the real and important distinction between “Accepting H0” and “Failing to Reject H0.” Your instructor may want to offer more details.
One-Sentence Reflection Statistical hypothesis testing amounts to a screening test that chooses between a null hypothesis and an alternative hypothesis based on the size of the estimated false positive rate.