
1 Biostatistics in Practice, Session 3. Youngju Pak, Ph.D. UCLA Clinical and Translational Science Institute, LA BioMed/Harbor-UCLA Medical Center, School of Medicine. http://research.LABioMed.org/Biostat  E-mail: ypak@labiomed.org

2 Table of Contents
- Analogy of hypothesis testing
- How to compute a P-value and interpret it
- Understanding the sampling distribution and confidence intervals (CIs)
- How to interpret a CI
- The relationship between a P-value and a CI

3 The procedure of statistical inference (diagram): a sample is drawn from the population by a sampling mechanism (random sample or convenience sample); sample estimates of the population parameter (descriptive statistics) are then used to make inferences about the population parameter via confidence intervals or P-values.

4 Analogy for hypothesis testing. Example: a bet between two friends. Suppose you and a friend are playing a "fun" gambling game. Your friend has a coin which you flip: if it lands tails, your friend pays you $1; if it lands heads, you pay your friend $1. After 10 plays, you have gotten 9 heads. Do you trust your friend? Is this a fair coin? What is your argument?

5 Statistical Hypothesis Testing
- H0: fair coin (null) vs. Ha: unfair coin (alternative).
- Assume the coin is fair (assume H0 is true).
- You and your friend have to put a threshold value on the definition of "being RARE": if Prob(# of heads ≥ 9 | 10 trials) is less than a certain value, say α, then you agree that 9 heads out of 10 trials would rarely happen with a fair coin, i.e., it is very unlikely if H0 is true.
- The rule: if Prob(# of heads ≥ 9 | 10 trials) < 0.05 (α, the Type I error rate, also called the level of significance), then your friend agrees to conclude that it is not a fair coin, and you reject H0 in favor of Ha.

6 Statistical Hypothesis Testing, continued
- Collect data and compute the evidence under the assumption that H0 (fair coin) is true: P(# of heads ≥ 9 | 10 trials) ≈ 0.011 (1.1%).
- Make a decision: P(# of heads ≥ 9 | 10 trials) ≈ 1.1% < 5%, so this outcome is very unlikely if the coin were fair.
- We have found significant evidence to disprove H0 in favor of Ha.
- Therefore, conclude that it was an unfair coin (and thus the bet is invalid). A quick computation of this tail probability is sketched below.
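The tail probability on this slide can be reproduced directly from the binomial distribution. This is a minimal sketch (not part of the original slides), assuming scipy is available:

```python
# Sketch: P(9 or more heads in 10 flips of a fair coin).
from scipy.stats import binom

p_tail = binom.sf(8, n=10, p=0.5)  # P(X >= 9) = P(X > 8) for X ~ Binomial(10, 0.5)
print(f"P(>= 9 heads | 10 fair flips) = {p_tail:.4f}")  # ~0.0107, i.e. about 1.1%
```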

7 How to interpret a P-value of 1.1%, in general
- A P-value is predicated on the assumption that H0 is true.
- A P-value is NOT the probability of the alternative being correct.
- A P-value should be used as evidence to DISPROVE H0, not to prove Ha.

8 How to interpret P-values: Example. Acute secondary adrenal insufficiency (AI) after traumatic brain injury (TBI): a prospective study. Objective: to determine the prevalence, clinical characteristics, and effect of AI on TBI patients. Procedure: 80 TBI and 41 non-TBI patients were followed during hospitalization for up to 9 days, with blood samples taken every 8 hours and vital signs recorded every hour. A subject was classified as AI if two successive serum cortisol measurements were low.

9 Goal: do groups differ by more than is expected by chance? First, we need to:
- Specify the experimental units (persons? blood draws?).
- Specify a single outcome for each unit (e.g., binary Yes/No, or continuous).
- Examine the raw data, e.g., with a histogram, to check that test assumptions are met.
- Specify the group summary measure to be used (e.g., %, mean, or median over units) → descriptive statistics.
- Choose the particular statistical test for that outcome and make the inference with inferential statistics (CI, P-value).

10 Outcome type → statistical test (Cohan 2005, Crit Care Med;33:2358-66):
- Means → t-test
- Medians → Wilcoxon test
- %s (proportions) → chi-square test

11 t-Test for minimal mean arterial pressure (MAP): Step 1. Calculate a standardized quantity for the particular test, a "test statistic".
Group summaries: AI: N = 42, mean = 56.17, SD = 10.78, SE(mean) = 10.78/√42 ≈ 1.66. Non-AI: N = 38, mean = 63.41, SD = 8.71, SE(mean) = 8.71/√38 ≈ 1.41.
Difference in group means = 63.4 − 56.2 = 7.2 (the "signal").
SE(diff) ≈ sqrt[SEM1² + SEM2²] = sqrt(1.66² + 1.41²) ≈ 2.2 (the "noise" due to random sampling).
→ Test statistic t = (7.2 − 0)/2.2 ≈ 3.28, a signal-to-noise ratio. (A code sketch of this calculation follows.)
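The signal-to-noise calculation above can be written out as a short script. The summary statistics are taken from the slide; the code itself is an illustrative sketch, not the authors' analysis:

```python
# Sketch: test statistic for the minimal-MAP comparison from summary statistics.
import math

n_ai, mean_ai, sd_ai = 42, 56.17, 10.78       # AI group (from the slide)
n_non, mean_non, sd_non = 38, 63.41, 8.71     # non-AI group (from the slide)

sem_ai = sd_ai / math.sqrt(n_ai)              # ~1.66
sem_non = sd_non / math.sqrt(n_non)           # ~1.41

diff = mean_non - mean_ai                     # "signal", ~7.2
se_diff = math.sqrt(sem_ai**2 + sem_non**2)   # "noise", ~2.2
t_stat = diff / se_diff                       # ~3.3 (the slide's 3.28 after rounding)

print(f"diff = {diff:.1f}, SE(diff) = {se_diff:.2f}, t = {t_stat:.2f}")
```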

12 t-Test for minimal MAP: Step 2. Compare the test statistic to what it is expected to be if the (populations represented by the) groups do not differ (H0). Often t approximately follows a normal bell curve: when H0 is true, about 95% of its values are expected to fall between −2 and +2, with about 2.5% in each tail beyond ±2 (for reference, the area between −2 and −1 is about 0.14). The observed value is t = 3.28. Does a t statistic of 3.28 seem RARE to you? Why?

13 t-Test for minimal MAP: P-value. When H0 is true, 95% of t values are expected to fall in the central range; the observed value is 3.28. P-value = Prob(T ≥ 3.28) = 0.0007 (one-sided, the tail area beyond 3.28). In practice a two-sided P-value is usually used: two-sided P = 2 × one-sided P = 2 × 0.0007 = 0.0014 < 0.05.
3. Declare the groups to differ if the test statistic is RARE when H0 is true (how rare is set by α).
Conclusion: the groups differ, since a value ≥ 3.28 has less than a 5% chance of occurring if there is no difference in the entire population. Smaller P-values ↔ more evidence of group differences. (A sketch of this tail-area calculation follows.)
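The one- and two-sided tail areas can be computed from the t distribution. The degrees of freedom used here (42 + 38 − 2 = 78) are an assumption on my part; the resulting one-sided p-value comes out close to the slide's 0.0007:

```python
# Sketch: tail probabilities for the observed t statistic.
from scipy.stats import t

t_obs, df = 3.28, 42 + 38 - 2        # df = n1 + n2 - 2 is assumed here
p_one_sided = t.sf(t_obs, df)        # P(T > 3.28) under H0
p_two_sided = 2 * p_one_sided        # differences allowed in either direction

print(f"one-sided p ≈ {p_one_sided:.4f}, two-sided p ≈ {p_two_sided:.4f}")
```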

14 One-sided or two-sided P-values? There are one-sided and two-sided versions of these tests. A two-sided P-value assumes that differences (between groups, or pre-to-post) are possible in both directions, e.g., either an increase or a decrease. A one-sided P-value assumes these differences can only go one way: only an increase or only a decrease, or one group can only have higher (or only lower) responses than the other. Such an assumption is very rarely justified and is generally not acceptable.

15 Tests on percentages. Is 26.3% vs. 61.9% statistically significant (p < 0.05), i.e., is the difference so large that it has less than a 5% chance of occurring by chance if the groups do not really differ? Solution: the same theme as for means. Find a test statistic and compare it to the values expected if the groups do not differ. See the next slide.

16 Tests on percentages. (Note: a t-test cannot be used to compare lab data with multiple blood draws per subject.) Here the signal in the test statistic is a squared quantity, expected to be about 1 if the groups do not differ. Under H0, 95% of the chi-square distribution lies below the cutoff shown in the figure (5.99). The observed test statistic is 10.2 >> 5.99, so p < 0.05; in fact, the tail area beyond 10.2 gives p = 0.002.

17 Tests on percentages: chi-square. The chi-square test statistic (10.2 in the example) is found by first calculating the expected number of AI patients with MAP < 60, and the same for non-AI patients, if AI and non-AI patients really do not differ on this. Then chi-square is the sum over cells of (Observed − Expected)² / Expected. This sum should be close to 1, as in the graph on the previous slide, if the groups do not differ. The value 10.2 is too big (extreme) to have happened by chance (probability = 0.002) if there is no difference among "all" TBI subjects (H0). A sketch of the calculation follows.
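A chi-square calculation consistent with these numbers is sketched below. The cell counts are reconstructed from the reported percentages (61.9% of 42 AI patients and 26.3% of 38 non-AI patients with MAP < 60), so treat them as an assumption rather than the paper's raw data:

```python
# Sketch: Pearson chi-square for the reconstructed 2x2 table (1 degree of freedom).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[26, 16],    # AI:     MAP < 60, MAP >= 60
                  [10, 28]])   # non-AI: MAP < 60, MAP >= 60

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p:.4f}")  # ~10.2, p close to the slide's 0.002
```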

18 How rare is "RARE"? Convention: "too deviant" is about |t| > 2 (roughly a 5% chance under H0; the observed value here was 3.28). Why not choose, say, |t| > 3, so that our chance of being wrong is even smaller, < 1% (indeed < 0.5%)? Answer: because the chance of missing a real difference (the converse wrong conclusion) is then increased. This is analogous to setting the threshold for a diagnostic test of disease.

19 A statistically significant result:
- is not necessarily an important or even interesting result;
- may not be scientifically interesting or clinically significant;
- with large sample sizes, very small differences may turn out to be statistically significant; in such cases, the practical implications of any findings must be judged on other than statistical grounds.
Statistical significance does not imply practical significance.

20 How to interpret insignificant P-values. Possible explanations:
1. There is no difference (H0 is true).
2. There is a real difference (Ha is true), but we failed to detect it because of a small sample size (Type II error).
There is no way to determine whether a non-significant difference is the result of a small sample size or of the null hypothesis being correct. Thus, insignificant P-values should almost always be regarded as INCONCLUSIVE rather than as an indication of no effect (we fail to reject the null). An insignificant P-value does NOT prove H0.

21 Back to the paper: normal range. What is the "normal" range for the lowest MAP in AI patients, i.e., approximately what range contains 95% of subjects? (Group summaries: AI: N = 42, SD = 10.8; non-AI: N = 38, SD = 8.7.)

22 Back to the paper: normal range. Answer: 56.2 ± 2(10.8) ≈ 35 to 78, using the AI group's mean and SD (see the sketch below).
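A minimal sketch of the "mean ± 2 SD" normal-range rule, using the AI group's numbers from the slide:

```python
# Sketch: approximate range containing ~95% of individual AI patients' lowest MAP.
mean_ai, sd_ai = 56.2, 10.8
low, high = mean_ai - 2 * sd_ai, mean_ai + 2 * sd_ai
print(f"normal range ≈ {low:.0f} to {high:.0f}")  # ~35 to 78
```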

23 Back to the paper: confidence intervals. Δ = 63.4 − 56.2 = 7.2 is the best guess for the difference in mean MAP between "all" AI and non-AI patients. We are 95% sure that the true difference is within ≈ 7.2 ± 2·SE(diff) = 7.2 ± 2(2.2) = 2.8 to 11.6. (AI: N = 42, SD = 10.8, SE = 1.66; non-AI: N = 38, SD = 8.7, SE = 1.41; SE(diff of means) ≈ sqrt[SEM1² + SEM2²] ≈ 2.2.)
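The confidence interval for the difference of means can be reproduced from the SEs on the slide; the ±2 multiplier is the slide's approximation to 1.96. A sketch:

```python
# Sketch: 95% CI for the difference in mean minimal MAP.
import math

diff = 63.4 - 56.2                              # observed difference, ~7.2
se_diff = math.sqrt(1.66**2 + 1.41**2)          # SE(diff) ~ 2.2
ci_low, ci_high = diff - 2 * se_diff, diff + 2 * se_diff
print(f"diff = {diff:.1f}, 95% CI ≈ ({ci_low:.1f}, {ci_high:.1f})")  # ~2.8 to 11.6
```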

24 Sampling distribution and CI
- Sampling distribution: the distribution of a statistic (such as a sample mean or a t statistic) under repeated sampling from a target population.
- We can calculate a statistic from one random sample and use it as a point estimate for the population.
- How precise that statistic is depends on its sampling distribution.
- Since the sample mean is used most commonly, the sampling distribution of the mean is the one used most often.
- For a simulation of the sampling distribution and of a confidence interval for the sample mean, go to http://onlinestatbook.com/stat_sim/index.html (a small simulation sketch is also given below).
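As an alternative to the online applet, the sampling distribution of the mean can be simulated in a few lines. The population parameters below are arbitrary placeholders, not values from the paper:

```python
# Sketch: simulate the sampling distribution of the sample mean.
import numpy as np

rng = np.random.default_rng(0)
pop_mean, pop_sd, n, n_samples = 60, 10, 42, 10_000  # placeholder population and sample size

sample_means = rng.normal(pop_mean, pop_sd, size=(n_samples, n)).mean(axis=1)

print(f"mean of sample means ≈ {sample_means.mean():.2f}")      # close to pop_mean
print(f"SD of sample means  ≈ {sample_means.std(ddof=1):.2f}")  # close to pop_sd/sqrt(n) ≈ 1.54
```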

25 Confidence interval
- Whether your study is underpowered (e.g., pilot data) or overpowered (e.g., national surveys), the confidence interval provides a range for where the true effect (a population parameter) lies.
- How well does your sample mean (m) reflect the true mean?
- Generic form of a 95% CI for a mean (or proportion): lower limit = sample mean (proportion) − 1.96 × SE; upper limit = sample mean (proportion) + 1.96 × SE. The quantity 1.96 × SE is usually called "the margin of error".
- SE measures the variability of the sampling distribution of the sample mean (or proportion) under repeated sampling.
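The generic formula translates directly into code. The helper names and the example inputs below are illustrative, not taken from the paper's analysis:

```python
# Sketch: generic 95% CI as estimate ± 1.96 * SE (the margin of error).
import math

def ci_mean(mean, sd, n):
    se = sd / math.sqrt(n)                      # SE of the sample mean
    return mean - 1.96 * se, mean + 1.96 * se

def ci_proportion(p_hat, n):
    se = math.sqrt(p_hat * (1 - p_hat) / n)     # SE of the sample proportion
    return p_hat - 1.96 * se, p_hat + 1.96 * se

print(ci_mean(56.2, 10.8, 42))     # example with the AI group's summary numbers
print(ci_proportion(0.619, 42))    # example with the AI group's MAP < 60 proportion
```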

26 Revisiting the food additives study 2. Look at the left side of the bottom panel of Figure 3 and recall what we have said about confidence intervals. Would you conclude that there is a change in hyperactivity under Mix A? 3. Repeat question 2 for placebo.

27 Revisiting the food additives study, continued (figure).

28 (Figure annotation) The confidence interval shows the possible values for the real effect; zero is "ruled out".

29 Revisiting the food additives study, continued.
4. Do you think that the positive conclusion for question #3 has been "proven"? Yes, with 95% confidence.
5. Do you think that the negative conclusion for question #2 has been "proven"? No, since more subjects would give a narrower confidence interval.
Hypothesis testing makes a Yes or No conclusion about whether there is an effect and quantifies the chances of a correct conclusion either way. Confidence intervals give possible magnitudes of the effect.

30 Confidence intervals ↔ hypothesis tests (figure from the food additives study, showing intervals labeled p > 0.05, p ≈ 0.05, and p < 0.05).

31 Confidence intervals ↔ hypothesis tests: 95% confidence intervals (the AI study). Non-overlapping 95% confidence intervals, as here, are sufficient for significant (p < 0.05) group differences. However, non-overlap is not necessary: the intervals can overlap and the groups can still differ significantly (a numeric illustration follows).
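A small numeric illustration of the last point, using made-up numbers: two 95% CIs that overlap while the group difference is still significant.

```python
# Sketch: overlapping 95% CIs with a significant difference (hypothetical data).
import math
from scipy.stats import norm

m1, se1 = 10.0, 1.0
m2, se2 = 13.0, 1.0

ci1 = (m1 - 1.96 * se1, m1 + 1.96 * se1)     # (8.04, 11.96): overlaps ci2
ci2 = (m2 - 1.96 * se2, m2 + 1.96 * se2)     # (11.04, 14.96)

z = (m2 - m1) / math.sqrt(se1**2 + se2**2)   # ~2.12
p = 2 * norm.sf(z)                           # ~0.034 < 0.05
print(ci1, ci2, f"p ≈ {p:.3f}")
```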

32 Power of a study. Statistical power is the sensitivity of a study to detect real effects, if they exist. It needs to be balanced against the likelihood of wrongly declaring effects when they are non-existent; today we have been keeping that error at < 5%. Power is the topic of the next session (#4). A rough power calculation for the MAP comparison is sketched below.
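As a preview of session 4, here is a rough normal-approximation sketch of the power to detect the observed 7.2-unit MAP difference with the SEs used earlier; this is an illustration, not the study's own power calculation:

```python
# Sketch: approximate power of the two-sample comparison (normal approximation).
from scipy.stats import norm

diff, se_diff, alpha = 7.2, 2.2, 0.05
z_crit = norm.ppf(1 - alpha / 2)            # ~1.96 two-sided cutoff
power = norm.sf(z_crit - diff / se_diff)    # P(reject H0 | true difference = 7.2)
print(f"approximate power ≈ {power:.2f}")   # ~0.90
```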

