Group Comparisons Part 1 Robert Boudreau, PhD Co-Director of Methodology Core PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal.


1 Group Comparisons Part 1
Robert Boudreau, PhD
Co-Director of Methodology Core, PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases
Core Director for Biostatistics, Center for Aging and Population Health
Dept. of Epidemiology, GSPH

2 Flow chart for group comparisons
Measurements to be compared:
  continuous: Distribution approx normal or N ≥ 20?
    Yes: T-tests
    No: Non-parametrics
  discrete (binary, nominal, ordinal with few values)

3 Outline For Today
Continuous Distributions
  Normal distribution
  Mean
  Standard deviation (computation, interpretation)
  Confidence Intervals, t-distribution
  Comparing 2-groups
  T-tests
Next lecture
  Wilcoxon Rank-Sum (non-parametric)

4 Confidence Interval For a Continuous Variable
Aflatoxin levels of raw peanut kernels (n=15): 30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37
Aflatoxin is a natural toxin produced by certain strains of the molds Aspergillus flavus and A. parasiticus, which grow on peanuts stored in warm, humid silos. Peanuts aren't the only affected crop: aflatoxins have been found in pecans, pistachios and walnuts, as well as milk, grains, soybeans and spices. Aflatoxin is a potent carcinogen, known to cause liver cancer in laboratory animals, and may contribute to liver cancer in Africa, where peanuts are a dietary staple.

5 Aflatoxin levels of raw peanut kernels
Stem-and-leaf plot
Stem (tens) | Leaf (units)
1 | 6
2 | 2 3 6 6 7 8
3 | 0 1 5 6 7
4 | 8
5 | 0 2
Range = max - min = 52 - 16 = 36
Mode = 26 (highest frequency)

6 Aflatoxin levels of raw peanut kernels
30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37
Sorted: 16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
Q1 (1st Quartile: 25%) = 26, median = 30, Q3 (3rd Quartile: 75%) = 37
IQR = Q3 - Q1 = 37 - 26 = 11
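The descriptive statistics on slides 5 and 6 can be checked with a short sketch using only the Python standard library. The variable names are illustrative, and the choice of the "exclusive" quantile method is an assumption: it happens to reproduce the slide's quartiles, but other quartile conventions can give slightly different values.

```python
import statistics

# Aflatoxin levels of the 15 raw peanut kernels (from slide 4).
aflatoxin = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

# Range and mode, as reported on the stem-and-leaf slide.
data_range = max(aflatoxin) - min(aflatoxin)
mode = statistics.mode(aflatoxin)

# Quartiles. The "exclusive" method places the p-th quantile at
# position (n + 1) * p in the sorted data, i.e. the 4th, 8th and
# 12th ordered values when n = 15.
q1, median, q3 = statistics.quantiles(aflatoxin, n=4, method="exclusive")
iqr = q3 - q1

print(f"range={data_range}, mode={mode}")
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
```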

7 [Figure: plot of the aflatoxin data] No outliers; slightly skewed

8 Box-and-Whisker Plot (full Bell-labs version with outliers)

9 Standard Deviation (SD)
SD = sqrt( sum of (x_i - mean)^2 / (N-1) )
N-1 = degrees of freedom (df)
N datapoints (total pieces of information)
Parameters estimated: Mean: 1 df, SD: N-1 df
Large SD => data points widely spread out from the mean
Small SD => data points clustered closely around the mean
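As a rough illustration of the N-1 computation, the sketch below works out the sample mean and SD of the aflatoxin data from slide 4 step by step. The variable names are illustrative only.

```python
import math

# Aflatoxin levels of the 15 raw peanut kernels (from slide 4).
aflatoxin = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

n = len(aflatoxin)
mean = sum(aflatoxin) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in aflatoxin)

# Divide by N-1 (the degrees of freedom left after estimating the mean),
# then take the square root to return to the original units.
variance = sum_sq_dev / (n - 1)
sd = math.sqrt(variance)

print(f"n={n}, mean={mean:.2f}, SD={sd:.2f}")
```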

10 Empirical rule for interpreting SD in normal distributions

11 Empirical rule for interpreting SD
Hsieh et al. Effects of high-intensity exercise training in a pulmonary rehabilitation programme for patients with chronic obstructive pulmonary disease. Respirology (2007) 12:381–388
Age of cohort: 73.9 ± 6.7 (Mean ± SD)
“Patients who completed high-intensity training had significant improvements in FVC (2.47 ± 0.70 L, P = 0.024) at rest”.
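The empirical rule for normal distributions says that roughly 68%, 95% and 99.7% of values fall within 1, 2 and 3 SDs of the mean. The sketch below derives those percentages from the normal distribution and applies them to the reported cohort age. It assumes the ages are approximately normally distributed, which the slide does not state, and the function name is illustrative.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

# Reported cohort age: mean 73.9, SD 6.7 (assumed roughly normal here).
mean_age, sd_age = 73.9, 6.7

for k in (1, 2, 3):
    low = mean_age - k * sd_age
    high = mean_age + k * sd_age
    print(f"within {k} SD: {normal_coverage(k):.1%} "
          f"of ages expected between {low:.1f} and {high:.1f}")
```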

12 Rules for interpreting SDs that apply to any distribution
Chebyshev’s Inequality
At least 50% of the values are within √2 SDs of the mean
At least 75% of the values are within 2 SDs
At least 89% of the values are within 3 SDs
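These percentages follow from the general form of Chebyshev's inequality: for any k > 1, at least 1 - 1/k^2 of the values lie within k SDs of the mean. A minimal sketch, with an illustrative function name:

```python
import math

def chebyshev_bound(k):
    """Minimum fraction of values within k SDs of the mean (any distribution)."""
    if k <= 1:
        # The inequality gives no useful information for k <= 1.
        return 0.0
    return 1 - 1 / k ** 2

for label, k in (("sqrt(2)", math.sqrt(2)), ("2", 2), ("3", 3)):
    print(f"within {label} SDs: at least {chebyshev_bound(k):.0%}")
```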

13 Rules for interpreting SDs that apply to any distribution
Women’s Health Initiative Observational Study (WHI-OS)
  ~ 90,000 women
  longitudinal cohort study (8yrs and continuing)
Osteoporotic Fractures Ancillary Substudy => case-control study
  1200 cases (fractures), 1200 controls
  Inflammatory markers (e.g. IL-6)
  Hormones (estradiol), bone mineral density, …
  25(OH)2 Vitamin D3 (ng/ml)

14 Rules for interpreting SDs that apply to any distribution
Women’s Health Initiative Observational Study
Osteoporotic Fractures Ancillary Substudy
25(OH)2 Vitamin D3 (ng/ml), mean ± SD:
  32.8 ± 10.7 (controls)
  21.6 ± 13.6 (cases)

15 Rules for interpreting SDs that apply to any distribution
Women’s Health Initiative Observational Study
Osteoporotic Fractures Ancillary Substudy
25(OH)2 Vitamin D3 (ng/ml), mean ± SD:
  32.8 ± 10.7 (controls)
  21.6 ± 13.6 (cases)
Cases (SD = 13.6):
  At least 50% within √2 SDs: 21.6 ± 19.2 = (2.4, 40.8)
  At least 75% within 2 SDs: 21.6 ± 27.2 = (0, 48.8); the computed lower limit of -5.6 is truncated at 0
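The intervals for the cases can be reproduced with a small sketch. Truncating the lower limit at 0 is an assumption made here because a concentration cannot be negative; the function and parameter names are illustrative.

```python
import math

def chebyshev_interval(mean, sd, k, lower_limit=0.0):
    """Interval mean ± k*SD, with the lower end truncated at lower_limit."""
    half_width = k * sd
    return max(lower_limit, mean - half_width), mean + half_width

# Vitamin D summary for the fracture cases: mean 21.6, SD 13.6 (ng/ml).
case_mean, case_sd = 21.6, 13.6

at_least_half = chebyshev_interval(case_mean, case_sd, math.sqrt(2))
at_least_three_quarters = chebyshev_interval(case_mean, case_sd, 2)

print("at least 50% within:", tuple(round(v, 1) for v in at_least_half))
print("at least 75% within:", tuple(round(v, 1) for v in at_least_three_quarters))
```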

16 Confidence Interval for a Population Mean
Standard error* of the mean: SE(mean) = SD / √N
Confidence interval for the mean: sample mean ± t(α/2, N-1) × SE(mean)
* Standard error is the general term for the standard deviation of some estimator

17 [t-distribution table] df = ∞ => 1.96, the Normal dist (limit)
Example: n=19, df=18

18 Aflatoxin levels of raw peanut kernels
n = 15, df = 14 (= n-1)
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
t(0.025, 14) = 2.145
Standard error of the mean = 10.63/√15 = 2.744
95% C.I.: 32.47 ± 2.145*(10.63/√15) = 32.47 ± 2.145*2.744 = 32.47 ± 5.89
95% C.I.: (26.58, 38.36)
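The interval can be recomputed directly from the raw data with the sketch below. The critical value 2.145 is copied from the slide rather than computed, and the variable names are illustrative. Carrying full precision through the arithmetic gives an upper limit of about 38.35 rather than 38.36, a difference that comes only from rounding the intermediate values.

```python
import math
import statistics

# Aflatoxin levels of the 15 raw peanut kernels (from slide 4).
aflatoxin = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]

# Two-sided 95% critical value for df = 14, taken from the slide's t-table.
T_CRIT_DF14 = 2.145

n = len(aflatoxin)
mean = statistics.mean(aflatoxin)
sd = statistics.stdev(aflatoxin)   # sample SD (divides by n - 1)
se = sd / math.sqrt(n)             # standard error of the mean

margin = T_CRIT_DF14 * se
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean:.2f}, SD={sd:.2f}, SE={se:.3f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```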

19 Aflatoxin levels of raw peanut kernels
n = 15 peanuts sampled from silo
16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52
95% C.I.: (26.58, 38.36)
Hypothesized mean outside the 95% C.I. <=> p < 0.05 (using t-test)
Hypothesis: Mean of entire silo = 30 (p > 0.05 => not rejected)
H0: Mean of all silos = 25 (p < 0.05 => rejected)

20 Aflatoxin levels of raw peanut kernels
n = 15 peanuts sampled from silo
95% C.I.: (26.58, 38.36)
Hypothesized mean outside the 95% C.I. <=> p < 0.05 (using t-test)
Hypothesis: Mean of entire silo = 30
t = (mean - 30) / Stderr(mean) = (32.47 - 30)/2.744 = 0.90
t = 0.90, df = 14, p = 0.3833 (0.3833/2 = 0.19167 => see table)
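The test statistic and its two-sided p-value can be approximated without a statistics package. The sketch below uses the rounded mean and standard error from the slide and integrates the t density numerically with Simpson's rule. The function names and the integration approach are choices made for this illustration, not the method used in the lecture, which reads the value from a table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t|.

    The area from 0 to |t| is found with Simpson's rule (steps must be even)
    and doubled, since the density is symmetric around 0.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Rounded values from the slides: sample mean, standard error, df.
mean, se, df = 32.47, 2.744, 14
hypothesized_mean = 30

t_stat = (mean - hypothesized_mean) / se
p_value = two_sided_p(t_stat, df)

print(f"t={t_stat:.2f}, df={df}, two-sided p={p_value:.3f}")
```

As a sanity check, `two_sided_p(2.145, 14)` comes out very close to 0.05, matching the critical value used for the 95% confidence interval.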

21 [t-distribution table] t = 0.90, df = 14, p = 0.3833 (0.3833/2 = 0.19167)

22 2-sample independent t-test for comparing means of two groups
General Formula: t = (difference between the two estimates) / stderr(difference)
stdev = sqrt(variance)
If two independent estimators (e.g. group means): Variance(of difference) = sum of variances

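23 2-sample t-test to compare two groups
Case 1: Equal variances
“pooled” variance estimate: s_p^2 = [ (n1-1)*s1^2 + (n2-1)*s2^2 ] / (n1 + n2 - 2)
t = (mean1 - mean2) / ( s_p * √(1/n1 + 1/n2) )
df = n1 + n2 - 2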

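24 2-sample t-test to compare two groups
Case 2: Unequal variances
t = (mean1 - mean2) / √( s1^2/n1 + s2^2/n2 ) (denom = stderr of numerator)
D.F. = Welch-Satterthwaite equation (best approx df):
df = ( s1^2/n1 + s2^2/n2 )^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ]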

25 Does Cell Phone Use While Driving Impair Reaction Times?
Sample of 64 students from Univ of Utah
  Randomly assigned: cell phone group or control => 32 in each group
  On machine that simulated driving situations: => at irregular periods a target flashed red or green
  Participants instructed to hit “brake button” as soon as possible when they detected red light
  Control group listened to radio or to books-on-tape
  Cell phone group carried on conversation about a political issue with someone in another room

26

27 Does Cell Phone Use While Driving Impair Reaction Times? (milliseconds)
             N    Mean    SD
Cell Phone   32   585.2   89.6
Control      32   533.7   65.3
Difference        51.5
Stderr of difference = sqrt(89.6^2/32 + 65.3^2/32) = 19.6
df (Welch-Satterthwaite) = 56.685
t = 51.5/19.6 = 2.63, p = 0.011
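The unequal-variance calculation can be reproduced from the summary statistics alone. The sketch below computes the standard error, the Welch-Satterthwaite degrees of freedom and the t statistic; the function name and its argument order are illustrative, and the p-value is left to a t-table or statistics package.

```python
import math

def welch_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance (Welch) t statistic from group summary statistics.

    Returns (t, standard error of the difference, Welch-Satterthwaite df).
    """
    v1 = sd1 ** 2 / n1   # variance of the first group mean
    v2 = sd2 ** 2 / n2   # variance of the second group mean
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se, df

# Reaction times in milliseconds: cell phone group vs control group.
t_welch, se_welch, df_welch = welch_t_from_stats(585.2, 89.6, 32, 533.7, 65.3, 32)

print(f"SE={se_welch:.1f}, df={df_welch:.3f}, t={t_welch:.2f}")
```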

28 Removing one high outlier from cell phone group (“equal variances”)
             N    Mean    SD
Cell Phone   31   573.1   58.9
Control      32   533.7   65.3
Difference        39.4
Pooled SD (square root of the pooled variance) = 62.69
df = n1+n2-2 = 61
t = 39.4/(62.69*√(1/31+1/32)) = 2.52 (p = 0.015)
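A pooled-variance version of the calculation can be sketched the same way. The function name is illustrative. Note that with the rounded summary statistics shown on the slide, this gives a pooled SD near 62.2 and t near 2.51, close to but not identical with the slide's 62.69 and 2.52; the slide's figures may have been computed from the raw, unrounded data.

```python
import math

def pooled_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance (pooled) t statistic from group summary statistics.

    Returns (t, pooled SD, df).
    """
    df = n1 + n2 - 2
    # Weighted average of the two sample variances, weighted by their df.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se
    return t, pooled_sd, df

# Reaction times (ms) after removing one high outlier from the cell phone group.
t_pooled, sd_pooled, df_pooled = pooled_t_from_stats(573.1, 58.9, 31, 533.7, 65.3, 32)

print(f"pooled SD={sd_pooled:.2f}, df={df_pooled}, t={t_pooled:.2f}")
```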

