# Lecture 4 – Statistics: Hypothesis Testing and Estimation

1 Lecture 4 – Statistics: Hypothesis Testing and Estimation. Michael Brown MD, MSc, Professor, Epidemiology and Emergency Medicine. Credit to Roger J. Lewis, MD, PhD, Department of Emergency Medicine, Harbor-UCLA Medical Center, and Jeff Jones, Grand Rapids MERC / MSU Program in Emergency Medicine. EPI-546 Block I

2 Dr. Michael Brown © Epidemiology Dept., Michigan State Univ. Today’s Topics
- Classical Hypothesis Testing
- Type I Error
- Type II Error, Power, Sample Size
- Point Estimates and Confidence Intervals
- Multiple Comparisons

3 Classical Hypothesis Testing: Steps
1. Define the null hypothesis
2. Define the alternative hypothesis
3. Calculate a p value
4. Accept or reject the null hypothesis based on the p value
5. If the null hypothesis is rejected, then accept the alternative hypothesis

4 Classical Hypothesis Testing
- The Null Hypothesis: no difference between the two groups to be compared

5 Classical Hypothesis Testing
- The Alternative Hypothesis: there is a difference between the two groups to be compared

6 Classical Hypothesis Testing: Defining the Alternative Hypothesis
- The size of the expected difference should be defined prior to data collection (a priori)
- The difference defined by the alternative hypothesis should be clinically significant
- Example: difference in pain score on a 100 mm VAS of 13 mm or greater

7 Classical Hypothesis Testing
- The p value: the probability of obtaining results at least as extreme as those observed, if the null hypothesis were true

8 Classical Hypothesis Testing: p value
- If p = 0.01, then the chance of obtaining results at least as extreme as those of the experiment, if the null hypothesis were true, is 1%
- Very unlikely to be due to chance alone!
- So we reject the null hypothesis

9 Classical Hypothesis Testing: p value
- If p = 0.01, then the chance of obtaining results at least as extreme as those of the experiment, if the null hypothesis were true, is 1%
- Very unlikely to be due to chance alone!
- So we reject the null hypothesis
- If p = 0.7, then the chance of obtaining results at least as extreme as those of the experiment, if the null hypothesis were true, is 70%
- So we accept the null hypothesis
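The definition of the p value above can be made concrete with a small exact permutation test. This is only a sketch: the pain scores are hypothetical (not data from the lecture), the function name `permutation_p_value` is made up for illustration, and a permutation test is just one of several ways to obtain a p value.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    extreme = 0
    splits = 0
    # Under the null hypothesis the group labels are interchangeable, so
    # enumerate every way of labelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        splits += 1
        # Count relabellings at least as extreme as the observed difference.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / splits


# Hypothetical pain scores (mm on a 100 mm VAS); assumed for illustration only.
treatment = [10, 12, 14, 16, 18]
control = [40, 42, 44, 46, 48]
p = permutation_p_value(treatment, control)
print(f"p = {p:.4f}")
```

With two completely separated groups of five, only 2 of the 252 possible relabellings are as extreme as the observed one, so the p value is small and the null hypothesis would be rejected at the usual 0.05 cut-point.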

10 Classical Hypothesis Testing: Rejecting the Null Hypothesis
- The cut-point for rejecting the null hypothesis is arbitrary (α)
- Typically, α = 0.05
- If the null hypothesis is rejected, then the alternative hypothesis is accepted as true

11

| Clinical Trial (statistical testing) | Jury Trial (criminal law) |
|---|---|
| Assume the null hypothesis | Presume innocent |
| Goal: detect a true difference (reject the null hypothesis) | Goal: convict the guilty |
| “Level of significance”: p < .05 | “Beyond reasonable doubt” |
| Requires: adequate sample size | Requires: convincing testimony |

12 Similar to a Trial by Jury…
- There are only 4 possible outcomes of a clinical trial:
  - 2 are correct: TP, TN
  - 2 are errors: FP, FN

13 TEST versus TRUTH

| TEST | TRUTH: Guilty | TRUTH: Innocent |
|---|---|---|
| Significant: reject H0 (P < 0.05) | TP | FP |
| Accept H0 (P > 0.05) | FN | TN |

14

| Clinical Trial (statistical testing) | Jury Trial (criminal law) |
|---|---|
| Appropriately reject the null hypothesis (TP) | Correct verdict: convict a guilty person |
| Appropriately accept the null hypothesis (TN) | Correct verdict: acquit the innocent |

15

| Clinical Trial (statistical testing) | Jury Trial (criminal law) |
|---|---|
| Correct inference: reject the null hypothesis | Correct verdict: convict a guilty person |
| Correct inference: accept the null hypothesis | Correct verdict: acquit the innocent |
| Incorrect inference (FP): Type I error | Incorrect verdict: hang innocent person |
| Incorrect inference (FN): Type II error | Incorrect verdict: guilty skates free |

16 Errors

| TEST | TRUTH: Guilty | TRUTH: Innocent |
|---|---|---|
| Significant: reject H0 (P < 0.05) | TP | FP: Type I (alpha) |
| Accept H0 (P > 0.05) | FN: Type II (beta) | TN |

17 Classical Hypothesis Testing: Type II Error
- A false-negative result
- A p value > .05 is obtained, yet the two groups are different
- The risk of a type II error = β

18 Type II Error
- Although there was a trend toward benefit, the p value was > .05
- Null hypothesis accepted
- Truth: a larger study demonstrated that the two groups were actually different
- Committed a Type II Error
- A typical pilot study has low power to detect a difference

19 Classical Hypothesis Testing: Power
- Power = 1 - β
- If power = 80%:
  - 80% probability of detecting a true difference if it exists
- Power is determined by sample size, the magnitude of the difference sought, and by α
- The pilot study had a small sample size, therefore “low” power
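The dependence of power on sample size, difference sought, and α can be sketched with a normal approximation for comparing two means. The 13 mm difference comes from the VAS example earlier in the lecture; the standard deviation of 25 mm, the group sizes, and the function name `approx_power` are assumptions chosen only for illustration. The formula also ignores the negligible chance of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, difference, sd, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value for the chosen alpha
    se = sd * sqrt(2 / n_per_group)        # standard error of the difference
    # Probability that the test statistic exceeds the critical value
    # when the true difference equals `difference`.
    return z.cdf(difference / se - z_crit)


# Assumed SD of 25 mm; a small "pilot" versus a larger study.
for n in (20, 59):
    print(n, round(approx_power(n, difference=13, sd=25), 2))
```

Under these assumptions the small study has well under 50% power, while the larger one reaches roughly 80%, which is the sense in which a pilot study has “low” power.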

20 Steps in Sample Size Determination
1. Define the type of data (continuous, ordinal, categorical, etc.)

21 A Few Examples of Statistical Tests

| Test | Comparison | Principal Assumptions |
|---|---|---|
| Student's t test | Means of two groups | Continuous variable, normally distributed, equal variance |
| Wilcoxon rank sum | Medians of two groups | Continuous variable |
| Chi-square | Proportions | Categorical variable, more than 5 patients in any particular "cell" |
| Fisher's exact | Proportions | Categorical variable |
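As one illustration of the table, the chi-square test for two proportions can be sketched for a 2 × 2 table using only the standard library. The counts are hypothetical, the function name `chi_square_2x2` is made up, no continuity correction is applied, and the cell-size check uses expected counts, which is a common way of stating the "more than 5 per cell" rule and is an assumption about what the slide intends.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Table layout:  group 1: a events, b non-events
                   group 2: c events, d non-events
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Smallest expected cell count; the test is unreliable when it is small.
    smallest_expected = min(row1, row2) * min(col1, col2) / n
    if smallest_expected < 5:
        raise ValueError("expected count below 5; consider Fisher's exact test")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical trial: 10/50 events with treatment versus 20/50 with control.
chi2, p_value = chi_square_2x2(10, 40, 20, 30)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```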

22 Steps in Sample Size Determination
1. Define the type of data (continuous, ordinal, categorical, etc.)
2. Define the size of the difference sought
3. Define α (usually 0.05)
4. Determine power desired (often 0.80)
5. Look up the sample size: tables, formulas or software
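For continuous data, the steps above reduce to a standard normal-approximation formula, n per group = 2 × ((z for α + z for power) × SD / difference)². The sketch below assumes a two-sided test, equal group sizes, and a standard deviation of 25 mm for the VAS example; the SD and the function name `sample_size_two_means` are illustrative assumptions, not values from the lecture.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_means(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # step 3: two-sided alpha
    z_power = z.inv_cdf(power)           # step 4: desired power
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return ceil(n)                       # round up to whole subjects


# Step 2: difference sought of 13 mm on the VAS; SD of 25 mm is assumed.
n_per_group = sample_size_two_means(difference=13, sd=25)
print(n_per_group)
```

Halving the difference sought roughly quadruples the required sample size, which is why the difference must be fixed before data collection.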

23 Today’s Topics
- Classical Hypothesis Testing
- Type I Error
- Type II Error, Power, Sample Size
- Point Estimates and Confidence Intervals
- Multiple Comparisons

24 Limitations of the p Value
- p < 0.05 tells us that the observed treatment difference is “statistically significant”
- p < 0.05 does not tell us:
  - The uncertainty around the point estimate
  - The likelihood that the true treatment effect is clinically important

25 Confidence Intervals: Example
- Purpose: to compare the effects of vasopressor A (V_A) and vasopressor B (V_B) based on post-treatment SBP in hypotensive patients
- Endpoint: post-treatment SBP
- Null hypothesis: mean SBP_A = mean SBP_B
- Results:
  - mean SBP_A = 70 mm Hg (after V_A)
  - mean SBP_B = 95 mm Hg (after V_B)
  - Observed difference = 25 mm Hg (p < 0.05)
  - The 25 mm Hg difference is the “point estimate”

26 The Point Estimate and the CI
- When using CIs, we report the point estimate and the limits of the CI surrounding the point estimate:
  - 25 mm Hg (95% CI: 5 to 44 mm Hg)

27 Interpretation of the CI
- Consider the comparison of vasopressor A and vasopressor B
- Since the 95% CI, 5 to 44 mm Hg, doesn’t include 0, this is equivalent to p < 0.05

(Figure: interval marked at 5, 25, and 44 mm Hg.)
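The link between a 95% CI that excludes 0 and p < 0.05 can be checked numerically. The sketch below assumes the interval was built as estimate ± z × standard error under a normal approximation, which the lecture does not state; the function name `p_from_ci` is hypothetical. The reported limits are rounded and not perfectly symmetric around 25, so the result is approximate.

```python
from statistics import NormalDist


def p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided p value (null: true difference = 0) from a CI."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - (1 - level) / 2)
    # Recover the standard error from the width of the interval.
    se = (upper - lower) / (2 * z_crit)
    z_stat = estimate / se
    return 2 * (1 - z.cdf(abs(z_stat)))


# Vasopressor example: 25 mm Hg (95% CI: 5 to 44 mm Hg).
p_value = p_from_ci(25, 5, 44)
print(f"p is about {p_value:.3f}")
```

An interval with the same point estimate that did include 0, for example a hypothetical -5 to 55 mm Hg, would give a p value above 0.05 with the same function.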

28 Interpretation of the CI
- Although the point estimate for the difference is 25 mm Hg, the results are consistent with the true difference being anywhere between 5 and 44 mm Hg

(Figure: interval marked at 5, 25, and 44 mm Hg.)

29 Why a 95% CI?
- The selection of 95% CIs (as opposed to 99% CIs, for example) is arbitrary
  - like the selection of 0.05 as the cutoff for a statistically significant p value

30 Middle Ear Squeeze Study
- For a power of 80%, we needed a sample size of approximately 120 subjects
- N = 116
  - 60 treatment
  - 56 control

Ann Emerg Med July 1992; 21:849-852.

31 Middle Ear Squeeze Study Using p value
- For a power of 80%, we needed a sample size of approximately 120 subjects
- N = 116
  - 60 treatment
  - 56 control
- Outcome - ear discomfort:
  - Treatment group 8%
  - Control group 32%
  - p = .001
- Sudafed works!

Ann Emerg Med July 1992; 21:849-852.

32 Middle Ear Squeeze Study Using Point Estimate and 95% CI
- Ear discomfort:
  - Treatment group 8%
  - Control group 32%
- Absolute Risk Reduction 24% (95% CI: 9.9 to 38.3%)
- NNT 4.2 (95% CI: 2.6 to 10.1)
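The NNT and its confidence limits on this slide follow directly from the absolute risk reduction, since NNT = 1 / ARR. The sketch below uses only the figures reported on the slide; the function name `nnt_with_ci` is made up, and inverting the CI limits this way is only meaningful when the whole ARR interval lies above 0.

```python
def nnt_with_ci(arr, arr_lower, arr_upper):
    """Number needed to treat and its CI from an absolute risk reduction.

    All arguments are proportions (0.24 means 24%). The CI limits swap
    places because a larger risk reduction means fewer patients to treat.
    """
    if arr_lower <= 0:
        raise ValueError("ARR interval includes 0; the NNT interval is undefined")
    return 1 / arr, 1 / arr_upper, 1 / arr_lower


# Middle ear squeeze study: ARR 24% (95% CI: 9.9 to 38.3%).
nnt, nnt_low, nnt_high = nnt_with_ci(0.24, 0.099, 0.383)
print(f"NNT {nnt:.1f} (95% CI: {nnt_low:.1f} to {nnt_high:.1f})")
```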

33 Cochrane Library. Wood-Baker, RR; Gibson, PG; Hannay, M; Walters, EH; Walters, JAE. Date of Most Recent Update: 26-July-2005.

34 Clinical vs. Statistical Significance
- Oral ondansetron vs. placebo
- 215 children with gastroenteritis
- Primary outcome: vomiting during oral hydration
  - RR = 0.4 (95% CI: 0.26 to 0.61)
  - NNT = 4.9 (95% CI: 3.1 to 10.3)
- Both clinically significant and statistically significant

N Engl J Med 2006; 354:1698-705

35 Clinical vs. Statistical Significance
- Secondary outcome: oral intake in ED
  - 239 ml vs. 196 ml
  - p = 0.001 (statistically significant)
- But is a difference of 9 tsp clinically significant?

36 Today’s Topics
- Classical Hypothesis Testing
- Type I Error
- Type II Error, Power, Sample Size
- Point Estimates and Confidence Intervals
- Multiple Comparisons

37 Multiple Comparisons
- When two identical groups of patients are compared, there is a chance (α) that a statistically significant p value will be obtained (type I error)
- When multiple comparisons are performed, the risk of one or more false-positive p values increases
- Multiple comparisons include:
  - Pair-wise comparisons of more than two groups
  - The comparison of multiple characteristics between two groups (e.g., sub-group analyses)
  - The comparison of two groups at multiple time points

38 Multiple Comparisons: Risk of ≥ 1 False Positive

| Number of Comparisons | Probability of at Least One Type I Error |
|---|---|
| 1 | 0.05 |
| 2 | 0.10 |
| 3 | 0.14 |
| 4 | 0.19 |
| 5 | 0.23 |
| 10 | 0.40 |
| 20 | 0.64 |
| 30 | 0.79 |

Assumes α = 0.05, uncorrelated comparisons
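The values in this table follow from the stated assumptions: with uncorrelated comparisons each tested at α, the probability of at least one type I error across n comparisons is 1 - (1 - α)^n. A minimal sketch that reproduces the table (the function name `familywise_error` is made up):

```python
def familywise_error(n_comparisons, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    # (1 - alpha) ** n is the chance that none of the n tests is falsely positive.
    return 1 - (1 - alpha) ** n_comparisons


for n in (1, 2, 3, 4, 5, 10, 20, 30):
    print(f"{n:>2} comparisons: {familywise_error(n):.2f}")
```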

39 Multiple Comparisons: Bonferroni Correction
- A method for reducing the overall risk of a type I error when making multiple comparisons
- The overall (study-wise) type I error risk desired (e.g., 0.05) is divided by the number of tests, and this new value is used as the α for each individual test
- Controls the type I error risk, but reduces the power (increased type II error risk)

40 Results: We tested these 24 associations in the independent validation cohort. Residents born under Leo had a higher probability of gastrointestinal hemorrhage (P = .04), while Sagittarians had a higher probability of humerus fracture (P = .01) compared to all other signs combined. After adjusting the significance level to account for multiple comparisons, none of the identified associations remained significant in either the derivation or validation cohort.

Bonferroni correction: .05/24 = 0.002 for statistical significance
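The Bonferroni arithmetic in this example can be sketched in a few lines. The P values and the number of tests are the ones quoted above; the function and variable names are made up for illustration.

```python
def bonferroni_threshold(overall_alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return overall_alpha / n_tests


threshold = bonferroni_threshold(0.05, 24)

# P values quoted in the results above.
reported = {
    "Leo / gastrointestinal hemorrhage": 0.04,
    "Sagittarius / humerus fracture": 0.01,
}
# A finding stays significant only if its P value is below the corrected threshold.
significant = {name: p < threshold for name, p in reported.items()}

print(f"threshold = {threshold:.4f}")
for name, is_significant in significant.items():
    print(name, "significant" if is_significant else "not significant")
```

Both associations pass the uncorrected 0.05 cut-point but fail the corrected one, which matches the conclusion quoted in the results.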

41 Statistical Issues to Consider if Planning a Study
- Define the most important question to be answered – the “primary objective”
- Define the size of the difference you wish to detect
- Get as much information as possible about what you expect to see in the control group

42 Statistical Issues to Consider if Planning a Study
- Define values for α and power, and the maximum sample size that is realistic
- Define clinically important subgroups of the population (a priori sub-group analyses)
- Determine whether there are important multiple comparisons

43 When You Visit the Statistician:
- Bring examples of published studies that illustrate the type of analysis you would like to perform at the end of the study
