1 Biostatistics in Practice Peter D. Christenson Biostatistician http://gcrc.LABioMed.org/Biostat Session 3: Testing Hypotheses

2 Session 3 Preparation We have been using a recent study on hyperactivity for the concepts in this course. The questions below, based on this paper, are intended to prepare you for Session 3. 1. Look at the bottom panel of Figure 3. Based on what we have discussed about confidence intervals, do you see evidence for change in hyperactivity under Mix A? 2. Repeat question 1 for placebo.

3 Session 3 Preparation: #1 and #2

4 Session 3 Preparation We have been using a recent study on hyperactivity for the concepts in this course. The questions below, based on this paper, are intended to prepare you for Session 3. 3. Now look at the fourth vertical bar in this same panel in Fig 3. Does it agree with your combined conclusions in questions 1 and 2?

5 Session 3 Preparation: #3

6 Session 3 Preparation We have been using a recent study on hyperactivity for the concepts in this course. The questions below, based on this paper, are intended to prepare you for Session 3. 4. Do you think that the negative conclusion for question #1 has been "proven"? 5. Do you think that the positive conclusion for question #2 has been "proven"?

7 Session 3 Preparation: #4 and #5 Possible values for real effect. Zero is “ruled out”.

8 Session 3 Preparation 5. From Tables 1 and 2, we see that (209 - 137)/209 = 34% of parents of the younger children and (160 - 130)/160 = 19% of parents of the older children initially were interested but did not complete the study. What are the main reported reasons for not completing? Does it seem logical that the rate is higher for the 3-year-olds? Do you have any intuition on whether the magnitude of the 34% vs. 19% difference is enough to support an age difference, regardless of the logical reason?

9 Session 3 Preparation #5
73% ↔ Consented ↔ 90%
66% ↔ Completed ↔ 81%
It is not intuitive whether 73% vs. 90% is real, or reproducible.

10 Session 3 Goals
Statistical testing concepts
Three most common tests
Software
Equivalence of testing and confidence intervals
False positive and false negative conclusions

11 Goal: Do Groups Differ By More than is Expected By Chance? Cohan (2005) Crit Care Med;33:2358-66.

12 Goal: Do Groups Differ By More than is Expected By Chance? First, need to:
Specify experimental units (Persons? Blood draws?).
Specify a single outcome for each unit (e.g., Yes/No, mean or min of several measurements?).
Examine raw data, e.g., a histogram, for meeting test requirements.
Specify the group summary measure to be used (e.g., %, or mean or median over units).
Choose the particular statistical test for the outcome.

13 Outcome Type → Statistical Test. Cohan (2005) Crit Care Med;33:2358-66.
Means → t test
%s → Chi-square test
Medians → Wilcoxon test
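
As a rough software counterpart to this mapping, the corresponding functions in Python's scipy.stats are sketched below (an illustrative assumption on my part; the slides do not name specific software, and the right function also depends on design details such as paired vs. independent groups).

    # Sketch: typical scipy.stats choices for each outcome type (independent groups).
    from scipy import stats

    test_by_outcome = {
        "means":    stats.ttest_ind,         # two-sample t-test
        "percents": stats.chi2_contingency,  # chi-square test on a table of counts
        "medians":  stats.mannwhitneyu,      # Wilcoxon rank-sum / Mann-Whitney U test
    }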

14 Minimal MAP: Group Distributions of Individual Units

AI Group (N=42), Stem.Leaf (multiply Stem.Leaf by 10**+1), # = count:
7 | 6         (1)
7 | 11334     (5)
6 | 555       (3)
6 | 01112344  (8)
5 | 5566778   (7)
5 | 01222234  (8)
4 | 57788     (5)
4 | 23        (2)
3 | 6         (1)
3 | 13        (2)

Non-AI Group (N=38), Stem.Leaf (multiply Stem.Leaf by 10**+1), # = count:
7 | 79          (2)
7 | 00111234    (8)
6 | 5556777888  (10)
6 | 00112234    (8)
5 | 67999       (5)
5 | 3           (1)
4 | 79          (2)
4 | 04          (2)

→ Approximately normally distributed → Use means to summarize groups. → Use t-test to compare means.
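
In software, this same check is usually a quick histogram or normality test on the raw values. Below is a sketch with a short hypothetical array standing in for one group's MAP values; the actual stem-and-leaf data above are not reproduced here.

    # Sketch: quick normality check before choosing means + t-test (hypothetical values).
    import numpy as np
    from scipy import stats

    map_values = np.array([56., 61., 48., 72., 53., 65., 58., 44., 63., 51.])  # stand-in data
    counts, edges = np.histogram(map_values, bins=5)   # rough shape of the distribution
    stat, p = stats.shapiro(map_values)                # Shapiro-Wilk test of normality
    print(counts, edges)
    print(p)   # a large p (e.g., > 0.05) gives no evidence against approximate normality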

15 Goal: Do Groups Differ By More than is Expected By Chance? Next, need to:
1. Calculate a standardized quantity for the particular test, a "test statistic". Often: t = (Diff in Group Means)/SE(Diff).
2. Compare the test statistic to what it is expected to be if (populations represented by) groups do not differ. Often: t approximately follows a normal bell curve.
3. Declare groups to differ if the test statistic is too deviant from expectations in (2) above. Often: absolute value of t > ~2.

16 t-Test for Minimal MAP: Step 1
1. Calculate a standardized quantity for the particular test, a "test statistic".
Group summaries: AI: N = 42, Mean = 56.17, Std Dev = 10.78, SE(Mean) = 1.66 = 10.78/√42. Non-AI: N = 38, Mean = 63.41, Std Dev = 8.71, SE(Mean) = 1.41 = 8.71/√38.
Diff in Group Means = 63.4 - 56.2 = 7.2
SE(Diff) ≈ sqrt[SEM₁² + SEM₂²] = sqrt(1.66² + 1.41²) ≈ 2.2
→ Test Statistic = t = (7.2 - 0)/2.2 = 3.28
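
A minimal sketch of this step in Python, using the summary statistics from the slide; small rounding differences from the slide's 3.28 are expected, since the software behind the slide used unrounded values.

    # Sketch: two-sample t statistic from group summary statistics (values from the slide).
    from math import sqrt

    n_ai, mean_ai, sd_ai = 42, 56.17, 10.78        # AI group
    n_non, mean_non, sd_non = 38, 63.41, 8.71      # non-AI group

    sem_ai = sd_ai / sqrt(n_ai)                    # ≈ 1.66
    sem_non = sd_non / sqrt(n_non)                 # ≈ 1.41

    diff = mean_non - mean_ai                      # ≈ 7.2
    se_diff = sqrt(sem_ai**2 + sem_non**2)         # ≈ 2.2
    t = (diff - 0) / se_diff                       # ≈ 3.3 (slide reports 3.28)
    print(diff, se_diff, t)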

17 t-Test for Minimal MAP: Step 2
2. Compare the test statistic to what it is expected to be if (populations represented by) groups do not differ. Often: t approximately follows a normal bell curve.
[Figure: the bell curve of expected values for the test statistic if the groups do not differ, with 95% of the chance expected in the center and the observed 3.28 far in the right tail. The area under a section of the curve = probability of values in that interval, e.g., 0.5 for 0 to ∞, and Prob(-2 to -1) is area = 0.14.]

18 t-Test for Minimal MAP: Step 3
3. Declare groups to differ if the test statistic is too deviant. [How much?] Convention: "too deviant" is ~2. "Two-tailed" = the 5% is allocated equally (2.5% in each tail) so that either group could be superior.
[Figure: the same bell curve, with 95% of the chance expected in the center, 2.5% in each tail, and the observed 3.28 beyond the upper cutoff.]
Conclude: the groups differ, since a value ≥ 3.28 has less than a 5% chance of occurring if there is no difference in the entire populations.

19 t-Test for Minimal MAP: p value
3. Declare groups to differ if the test statistic is too deviant. [How much?]
p-value: the probability of a test statistic at least as deviant as the one observed, if the populations really do not differ. Smaller values ↔ more evidence of group differences.
[Figure: the same bell curve; the area beyond the observed 3.28 is 0.0007, so the p value = 2(0.0007) = 0.0014 << 0.05.]
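
A minimal sketch of the p-value calculation, using scipy's t distribution with n1 + n2 - 2 = 78 degrees of freedom; the exact tail area depends slightly on the t-test variant, so small differences from the slide's 0.0014 are expected.

    # Sketch: two-sided p-value for the observed test statistic.
    from scipy import stats

    t_obs = 3.28
    df = 42 + 38 - 2                      # 78 degrees of freedom for a pooled two-sample t-test
    one_tail = stats.t.sf(t_obs, df)      # upper-tail area, roughly 0.0007
    p_two_sided = 2 * one_tail            # roughly 0.0015, close to the slide's 0.0014
    print(one_tail, p_two_sided)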

20 t-Test: Technical Note There are actually several types of t-tests:
Equal vs. unequal variance (variance = SD²), depending on whether the SDs are too different between the groups. [Yes, there is another statistical test for comparing the SDs.]
SE(Diff) ≈ sqrt[SEM₁² + SEM₂²] = sqrt(1.66² + 1.41²) ≈ 2.2 is approximate. There are more complicated exact formulas that software implements.
(Group summaries as before: AI: N = 42, Mean = 56.17, Std Dev = 10.78, SE(Mean) = 1.66; Non-AI: N = 38, Mean = 63.41, Std Dev = 8.71, SE(Mean) = 1.41.)
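
For illustration, both variants can be run directly from the summary statistics with scipy. This is a sketch only; the slides do not say which variant the software used, so treat the exact numbers as approximate.

    # Sketch: equal-variance (pooled) vs. unequal-variance (Welch) t-tests from summary stats.
    from scipy import stats

    pooled = stats.ttest_ind_from_stats(63.41, 8.71, 38,    # non-AI: mean, SD, N
                                        56.17, 10.78, 42,   # AI: mean, SD, N
                                        equal_var=True)     # classic pooled-variance t-test
    welch = stats.ttest_ind_from_stats(63.41, 8.71, 38,
                                       56.17, 10.78, 42,
                                       equal_var=False)     # Welch test, allows unequal SDs
    print(pooled)   # t ≈ 3.3, p ≈ 0.001
    print(welch)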

21 t-Test: Another Note There are other types of t-tests: A two-sided t-test assumes that differences (between groups or pre-to-post) are possible in both directions, e.g., an increase or a decrease. A one-sided t-test assumes that these differences can only go in one pre-specified direction (only an increase, or only a decrease), or that one group can only have higher (or only lower) responses than the other group. This is very rare, and generally not acceptable.
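
A minimal sketch of the difference in the resulting p-values, continuing the example above; the one-sided value is shown only to illustrate the halving, not to endorse its use.

    # Sketch: two-sided vs. one-sided p-values from the same observed t statistic.
    from scipy import stats

    t_obs, df = 3.28, 78
    p_two_sided = 2 * stats.t.sf(abs(t_obs), df)   # either direction counts as "deviant"
    p_one_sided = stats.t.sf(t_obs, df)            # only the pre-specified direction counts
    print(p_two_sided, p_one_sided)                # the one-sided value is half as large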

22 Back to Paper: Normal Range
Δ = 63.4 - 56.2 = 7.2 is the best guess for the MAP diff between a randomly chosen AI and non-AI patient, w/o other patient info.
What is the "normal" range for AI patients?
[Figure: the two group distributions; AI: N = 42, SD = 10.8; Non-AI: N = 38, SD = 8.7.]
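
As a rough answer, and an assumption on my part: applying the usual "mean ± 2 SD" reference-range rule for approximately normal data to the AI group's summary statistics gives the sketch below.

    # Sketch: approximate "normal" (reference) range for AI patients, mean ± 2 SD (assumed rule).
    mean_ai, sd_ai = 56.2, 10.8
    low, high = mean_ai - 2 * sd_ai, mean_ai + 2 * sd_ai
    print(low, high)   # roughly 34.6 to 77.8 mmHg, covering ~95% of individual AI patients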

23 Back to Paper: Confidence Intervals
Δ = 7.2 is the best guess for the MAP diff between the means of "all" AI and non-AI patients.
We are 95% sure that the diff is within ≈ 7.2 ± 2 SE(Diff) = 7.2 ± 2(2.2) = 2.8 to 11.6.
SE(Diff of Means) ≈ sqrt[SEM₁² + SEM₂²] = 2.2.
[Figure: the two group distributions; AI: N = 42, SD = 10.8, SE = 1.66; Non-AI: N = 38, SD = 8.7, SE = 1.41.]
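
A minimal sketch of that interval, using 2 as the approximate 95% multiplier as the slide does; software would use the exact t-distribution critical value instead.

    # Sketch: approximate 95% confidence interval for the difference in mean MAP.
    from math import sqrt

    diff = 7.2
    se_diff = sqrt(1.66**2 + 1.41**2)              # ≈ 2.2
    ci_low, ci_high = diff - 2 * se_diff, diff + 2 * se_diff
    print(ci_low, ci_high)                         # ≈ 2.8 to 11.6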

24 Back to Paper: t-test Δ = 7.2 is statistically significant (p=0.0014); i.e., only about 14 of 10,000 sets of 80 patients would differ by this much, if AI and non-AI patients really don't differ in MAP. Is Δ = 7.2 clinically significant?

25 Confidence Intervals ↔ Tests
[Figure: confidence intervals from the hyperactivity paper, annotated p > 0.05, p ≈ 0.05, and p < 0.05; intervals that include zero correspond to p > 0.05.]

26 Confidence Intervals ↔ Tests
|Δ/SE(Δ)| = |t| < 2
is equivalent to: |Δ| < 2 SE(Δ)
is equivalent to: -2 SE(Δ) < Δ < 2 SE(Δ)
is equivalent to: Δ - 2 SE(Δ) < 0 < Δ + 2 SE(Δ), i.e., the 95% confidence interval Δ ± 2 SE(Δ) contains 0.
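
A minimal sketch of this equivalence as a check, using a hypothetical helper function: with the slide's cutoff of 2, "|t| < 2" and "0 lies inside the 95% CI" always agree.

    # Sketch: |t| < 2 is the same statement as "0 lies inside the 95% CI".
    def test_vs_ci(diff, se, cutoff=2.0):
        not_significant = abs(diff / se) < cutoff                 # |t| < 2
        ci_low, ci_high = diff - cutoff * se, diff + cutoff * se  # 95% CI for the difference
        zero_inside = ci_low < 0 < ci_high
        return not_significant, zero_inside

    print(test_vs_ci(7.2, 2.2))   # MAP example: (False, False) -> significant, CI excludes 0
    print(test_vs_ci(1.5, 2.2))   # hypothetical smaller effect: (True, True)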

27 Confidence Intervals ↔ Tests 95% Confidence Intervals Non-overlapping 95% confidence intervals, as here, are sufficient for significant (p<0.05) group differences. However, non-overlap is not necessary: the intervals can overlap and the groups can still differ significantly.
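
A minimal numeric illustration with hypothetical group summaries (my own made-up numbers, not from the paper): the two individual 95% CIs overlap, yet the CI for the difference excludes 0, so the difference is significant.

    # Sketch: overlapping individual 95% CIs, yet a significant group difference.
    from math import sqrt

    mean1, sem1 = 10.0, 1.0    # hypothetical group 1
    mean2, sem2 = 13.2, 1.0    # hypothetical group 2

    ci1 = (mean1 - 2 * sem1, mean1 + 2 * sem1)          # 8.0 to 12.0
    ci2 = (mean2 - 2 * sem2, mean2 + 2 * sem2)          # 11.2 to 15.2 -> overlaps ci1
    diff = mean2 - mean1                                # 3.2
    se_diff = sqrt(sem1**2 + sem2**2)                   # ≈ 1.41
    ci_diff = (diff - 2 * se_diff, diff + 2 * se_diff)  # ≈ 0.4 to 6.0 -> excludes 0
    print(ci1, ci2, ci_diff)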

28 Back to Paper: Experimental Units Cannot use the t-test for comparing lab data from multiple blood draws per subject.
(Table footnote b: at least 100 µg/kg/min of propofol administered at the time of blood draw, or any pentobarbital in the 48 hrs before the blood draw.)

29 Tests on Percentages Is 26.3% vs. 61.9% statistically significant (p<0.05), i.e., is the difference so large that it has less than a 5% chance of occurring by chance if the groups do not really differ? Solution: same theme as for means. Find a test statistic and compare it to its expected values if the groups do not differ. See next slide.

30 Tests on Percentages
[Figure: chi-square distribution, with the expected value 1, the 95% cutoff 5.99, and the observed statistic 10.2 far in the right tail, with tail area 0.002.]
Here, the test statistic is a ratio, expected to be 1, rather than a difference, expected to be 0. Test statistic = 10.2 >> 5.99, so p < 0.05. In fact, p = 0.002.

31 Tests on Percentages: Chi-Square The chi-square test statistic (10.2 in the example) is found by first calculating the expected number of AI patients with MAP < 60, and the same for non-AI patients, if AI and non-AI really do not differ for this. Then, chi-square is found as the sum of standardized (Observed - Expected)². This should be close to 1, as in the graph on the previous slide, if the groups do not differ. The value 10.2 seems too big to have happened by chance (probability = 0.002).
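
For illustration, a sketch with a 2x2 table of counts reconstructed from the reported percentages (26.3% ≈ 10/38 and 61.9% ≈ 26/42); these counts are my reconstruction, not taken from the paper's table, so small differences from the slide's 10.2 and p = 0.002 (e.g., from a continuity correction) would not be surprising.

    # Sketch: chi-square test on a 2x2 table of counts (reconstructed, not from the paper).
    from scipy.stats import chi2_contingency

    #         MAP < 60   MAP >= 60
    table = [[26, 16],            # AI group      (26/42 ≈ 61.9%)
             [10, 28]]            # non-AI group  (10/38 ≈ 26.3%)

    chi2, p, df, expected = chi2_contingency(table, correction=False)
    print(chi2, p, df)            # chi2 ≈ 10.2 with 1 df, p ≈ 0.001
    print(expected)               # counts expected if the groups did not differ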

32 Back to t-Test
Declare groups to differ if the test statistic is too deviant. Convention: "too deviant" is ~2.
[Figure: the bell curve expected if groups do not differ, with 95% of the chance in the center, 2.5% in each tail, and the observed 3.28 beyond the cutoff.]
How much "deviance" is enough proof? Why not choose, say, |t| > 3, so that our chances of being wrong are even less, < 1%?

33 Graphical Representation of t-test
[Figure: two bell curves on the Δ axis, where Δ = effect (difference between group means), i.e., just Δ, not t = Δ/SE(Δ): one centered at 0 (no real effect) and one centered at a real effect of 3. The effect observed in the study is 1.13; values of Δ beyond the cutoff lead to concluding a real effect.]
\\\ = Probability: conclude an effect, but no real effect (5%).
/// = Probability: conclude no effect, but a real effect (41%).

34 Graphical Representation of t-test
[Same figure as the previous slide: no real effect (0) vs. real effect = 3, effect in study = 1.13, false positive 5%, false negative 41%.]
Suppose we need stronger proof, i.e., shift the cutoff to the right. Then the chance of a false positive is reduced to ~1%, but the chance of a false negative is increased to ~60%.

35 Power of a Study Statistical power is the sensitivity of a study to detect real effects, if they exist. It was 100 - 41 = 59% two slides back. This is the topic for the next session, #4.
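
A minimal sketch of how a power figure like 59% could be computed under the two-curve picture, using a normal approximation and a hypothetical SE(Δ) chosen only for illustration; the actual SE behind the figure is not given on the slides.

    # Sketch: power under the two-curve picture (normal approximation, hypothetical SE).
    from scipy.stats import norm

    real_effect = 3.0          # center of the "real effect" curve (from the figure)
    se_delta = 1.37            # hypothetical SE(Delta), chosen only for illustration
    cutoff = 1.96 * se_delta   # declare an effect if the observed Delta exceeds this

    power = norm.sf(cutoff, loc=real_effect, scale=se_delta)   # P(conclude effect | real effect)
    false_negative = 1 - power
    print(power, false_negative)   # ≈ 0.59 and ≈ 0.41 with these illustrative numbers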

