Biostat Didactic Seminar Series
Correlation and Regression Part 2
Robert Boudreau, PhD
Co-Director of Methodology Core, PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases
Core Director for Biostatistics, Center for Aging and Population Health
Dept. of Epidemiology, GSPH
Previous Biostat Didactics Fall 2009 – Spring
1. Descriptive Statistics: Examining Your Data
   Data types: Qualitative (Categorical), Ordinal, Quantitative
   Mean, SD, medians, quartiles, IQR, skewness, histograms, boxplots
2. Group Comparisons: Part 1
   Normal dist (mean, SD: 68%, 95%, 99.7% interpretation)
   t-dist, degrees of freedom (n-1)
   Confidence interval for the mean
3. Group Comparisons: Part 2
   Comparing means: Two-sample independent t-test
   pooled and unequal variance (Satterthwaite) versions
   interpretation of p-values, type I (false positive) and type II error
Previous Biostat Didactics Fall 2009 – Spring
4. Group Comparisons Part 3: Nonparametric Tests, Chi-squares and Fisher Exact
   Comparing groups having small sample sizes (< 20) or with non-normal distributions
   => Use Wilcoxon Rank-Sum Test (nonparametric)
      (based on rank-order when sorted rather than on actual numeric values)
   Comparing groups in the % falling into different categories
   => Use Chi-square, Fisher’s Exact (if any cell n < 5)
Previous Biostat Didactics Fall 2009 – Spring
5. Correlation, Regression and Covariate-Adjusted Group Comparisons
   Pearson vs Spearman correlation => linear vs monotone association
   Regression: interpretation of beta coefficients
   Standard errors, p-values
   Continuous predictor => beta coeff is a slope
   Dichotomous (e.g. group “dummy” 0,1 valued variable) => beta coeff is difference in response vs “referent”
      treatment_group = 1: knockout mouse
                      = 0: wild mouse (referent)
   Adjusting for important covariates when comparing groups
Flow chart for group comparisons
Measurements to be compared:
   continuous => Distribution approx normal or N ≥ 20?
      No  => Non-parametrics
      Yes => T-tests
   discrete (binary, nominal, ordinal with few values)
      => Chi-square, Fisher’s Exact
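The branches of this flow chart could be sketched in SAS roughly as follows. This is only an illustration: it borrows the dataset and variable names used later in these slides (kneeOA_vs_noOA, KneeOA, logCRP, female, white), and the choice of female by KneeOA as the categorical comparison is an assumption made purely to show the syntax.

```sas
* Continuous outcome, small N or non-normal: Wilcoxon rank-sum test;
proc npar1way data=kneeOA_vs_noOA wilcoxon;
  class KneeOA;          * two groups: 0 = no OA, 1 = knee OA;
  var logCRP;
  where female=1 and white=1;
run;

* Discrete outcome: chi-square, plus Fisher exact test for small cell counts;
* (female by KneeOA is a hypothetical example table);
proc freq data=kneeOA_vs_noOA;
  tables female*KneeOA / chisq fisher;
  where white=1;
run;
```

For the "continuous and approximately normal" branch, the two-sample t-test applies instead.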
Flow chart for regression models (includes adjusted group comparisons)
Outcome variable continuous or dichotomous?
   dichotomous => Time-to-event available (or relevant)?
      No  => Multiple logistic regression
      Yes => Cox proportional hazards regression
   continuous => Predictor variable categorical?
      No  => Multiple linear regression
      Yes (e.g. groups) => ANCOVA (multiple linear regression using dummy variable(s) for categorical var(s))
Analysis From Last Didactic …
In Health, Aging and Body Composition Knee-OA Substudy:
Examine association between SxRxKOA (knee OA) and CRP, adjusted for BMI.
Motivation: Sowers M, Hochberg M et al. C-reactive protein as a biomarker of emergent osteoarthritis. Osteoarthritis and Cartilage, Volume 10, Issue 8, August 2002, Pages
Conclusion: “CRP is highly associated with Knee OA; however, its high correlation with obesity limits its utility as an exclusive marker for knee OA”
All White Females in HABC (N=844) [includes SxRxKOA (n=93); also rest of parent study cohort]
N=5 had CRP > 30 (max=63.2)
log CRP
White Females
Difference in average logCRP: 0.76 – 0.43 = 0.33
Mean (SD) by Knee OA status, with p-values (equal vars / unequal vars):
   logCRP: No (n=752) 0.43 (0.83); Yes (n=92) 0.76 (0.58); P-value: 0.0002 (equal vars), < (unequal)
   BMI:    No (n=752) 25.4 (4.3);  Yes (n=92) 28.8 (5.2);  P-value: <
logCRP SD’s were significantly different (p<0.0001) => Use Satterthwaite unequal variance test
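A comparison of this kind could be produced with PROC TTEST, which reports both the pooled (equal variance) and Satterthwaite (unequal variance) tests along with a test of equality of variances. The sketch below assumes the same dataset and variable names used in the regression slides; it is not taken from the original analysis code.

```sas
* Two-sample t-tests of logCRP and BMI by knee OA status;
* Output includes Pooled and Satterthwaite rows and an Equality of Variances test;
proc ttest data=kneeOA_vs_noOA;
  class KneeOA;          * 0 = no OA, 1 = knee OA;
  var logCRP bmi;
  where female=1 and white=1;
run;
```

When the equality-of-variances test is significant, as it was for logCRP here, the Satterthwaite row is the one to read.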
Two-Group Unadjusted Comparison of Means Using Regression with Dummy-coded Groups
Data columns: HABCID logCRP kneeOA BMI
   * No OA is “referent” group (i.e. kneeOA=0);
   proc reg data=kneeOA_vs_noOA;
     model logCRP = KneeOA;
     where female=1 and white=1;
   run;
White Females: 2-Group Comparison Using Dummy-coded Groups
   * No OA is “referent” group (KneeOA=0);
   proc reg data=kneeOA_vs_noOA;
     model logCRP = KneeOA;
     where female=1 and white=1;
   run;
Note: Regression using Dummy (0, 1) for group variable (e.g. KneeOA=0,1)
In regression, equal (pooled) variance is assumed
Output callouts:
   Intercept = “No OA” mean
   KneeOA coefficient = “kneeOA” mean difference from referent
   Same p-value as equal variance t-test
Model: logCRP = b0 + b1*kneeOA   (b0 = intercept, b1 = KneeOA coefficient)
   KneeOA=0: logCRP = b0 + b1*0 = b0
   KneeOA=1: logCRP = b0 + b1*1 = b0 + b1
   proc reg data=kneeOA_vs_noOA;
     model logCRP = KneeOA;
     where female=1 and white=1;
   run;
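As a worked check of the dummy-variable model, the fitted values can be written out using the rounded group means from the earlier comparison table (0.43 for no OA, 0.76 for knee OA). The symbols b_0 and b_1 are introduced here for the intercept and the KneeOA coefficient; the numbers are rounded table values, not the regression output itself.

```latex
\begin{align*}
\widehat{\text{logCRP}} &= b_0 + b_1 \cdot \text{kneeOA},
  \qquad b_0 \approx 0.43,\quad b_1 \approx 0.76 - 0.43 = 0.33 \\
\text{kneeOA}=0:\quad \widehat{\text{logCRP}} &\approx 0.43 + 0.33 \cdot 0 = 0.43 \\
\text{kneeOA}=1:\quad \widehat{\text{logCRP}} &\approx 0.43 + 0.33 \cdot 1 = 0.76
\end{align*}
```

With a single 0/1 predictor, least squares reproduces the two group means exactly, which is why the intercept is the referent mean and the coefficient is the difference.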
ANCOVA (Analysis of Covariance) Compare logCRP adjusted for BMI
   proc reg data=kneeOA_vs_noOA;
     model logCRP = KneeOA bmi;
     where female=1 and white=1;
   run;
Note: Equal BMI slopes in each group are being modeled
Unadjusted diff was 0.33; BMI partially “explains” this difference
Unadjusted Mean Difference [figure label]
Notice: At any BMI level, the mean logCRP difference between KneeOA vs Not is smaller than the unadjusted difference
logCRP between KneeOA vs Not, Adjusted for BMI, Age and Anti-inflammatory Meds
Note: age is not significant (caveat: narrow HABC study age range: 69-80)
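A model of this form might be fit by adding the extra covariates to the ANCOVA model statement. The variable names age and antiinflam_med below are hypothetical (the slide does not show the actual names), with antiinflam_med assumed to be a 0/1 indicator of anti-inflammatory medication use.

```sas
* ANCOVA: knee OA difference in logCRP adjusted for BMI, age and meds;
* age and antiinflam_med are hypothetical variable names;
proc reg data=kneeOA_vs_noOA;
  model logCRP = KneeOA bmi age antiinflam_med;
  where female=1 and white=1;
run;
```

The KneeOA coefficient is then the mean logCRP difference between groups at equal values of the other covariates.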
Pearson Correlation
Pearson correlation = a measure of linear association
Pearson vs Spearman Correlation
Spearman: A measure of rank order correlation
   Works for any general trend that is increasing or decreasing and not necessarily linear
   Equals Pearson correlation using the ranks of the observations instead of actual values
   Heuristically: Spearman measures the degree to which low goes with low, middle with middle, high with high
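Both coefficients, and the "Pearson on the ranks" identity, could be checked in SAS along these lines. The sketch assumes the dataset and variables from the regression slides; the output dataset ranked and the rank variables r_logCRP and r_bmi are hypothetical names introduced for illustration.

```sas
* Pearson and Spearman correlations between logCRP and BMI;
proc corr data=kneeOA_vs_noOA pearson spearman;
  var logCRP bmi;
  where female=1 and white=1;
run;

* Replace values by their ranks (ties get the average rank by default);
proc rank data=kneeOA_vs_noOA out=ranked;
  var logCRP bmi;
  ranks r_logCRP r_bmi;
  where female=1 and white=1;
run;

* Pearson correlation of the ranks should match the Spearman value above;
proc corr data=ranked pearson;
  var r_logCRP r_bmi;
run;
```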
Effect of Centering BMI at 25
   proc reg data=kneeOA_vs_noOA;
     model logCRP = bmi_minus25;
     where female=1 and white=1 and kneeOA=1;
   run;
logCRP = b0 + b1*(BMI-25) = b0 at BMI=25 (see graphic)
(b0 = intercept, b1 = slope for BMI-25)
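The centered variable has to exist in the dataset before PROC REG can use it. One way to create it, assuming bmi_minus25 is simply BMI minus 25 as the slide title suggests, is a short DATA step that rewrites the dataset in place:

```sas
* Add a BMI variable centered at 25 to the analysis dataset;
data kneeOA_vs_noOA;
  set kneeOA_vs_noOA;
  bmi_minus25 = bmi - 25;   * equals 0 when BMI = 25;
run;
```

Centering changes only the intercept, which becomes the predicted logCRP at BMI=25 rather than at BMI=0; the slope is unchanged.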
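Effect of Centering BMI at 25
Model 2: logCRP = b0 + b1*(BMI-25)
                = b0 - b1*25 + b1*BMI
                = (b0 - 25*b1) + b1*BMI
(b0 = intercept and b1 = BMI slope of the centered model; the slope is unchanged, only the intercept shifts)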
Unadjusted Mean Difference [figure label]
ANCOVA (Analysis of Covariance): Centering BMI at 25
   proc reg data=kneeOA_vs_noOA;
     model logCRP = KneeOA bmi_minus25;
     where female=1 and white=1;
   run;
Note: Equal BMI slopes in each group are being modeled
Check of ANCOVA Assumption: Equality of BMI slopes: KneeOA vs Not
   proc reg data=knee_vs_noOA;
     model logCRP = KneeOA bmi BMI_x_KneeOA;
     where female=1 and white=1;
   run;
BMI_x_KneeOA is the “interaction term”
Data columns: HABCID logCRP kneeOA BMI BMI_x_KneeOA
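PROC REG has no built-in interaction syntax, so the interaction column must be computed beforehand. A minimal sketch, assuming BMI_x_KneeOA is the plain product of the two variables and using the dataset name shown on this slide:

```sas
* Build the BMI-by-group interaction term for the slope-equality check;
data knee_vs_noOA;
  set knee_vs_noOA;
  BMI_x_KneeOA = bmi * KneeOA;   * 0 for no OA, equals BMI for knee OA;
run;
```

With this coding, the coefficient on BMI_x_KneeOA is the difference in BMI slope between the knee OA group and the referent group, so its p-value tests whether the slopes are equal.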
Check of ANCOVA Assumption: Equality of BMI slopes: KneeOA vs Not
   proc reg data=knee_vs_noOA;
     model logCRP = KneeOA bmi BMI_x_KneeOA;
     where female=1 and white=1;
   run;
The “BMI” slopes are not significantly different (p=0.8019) => no evidence against parallel slopes, so treat them as parallel
Thank you
Questions, comments, suggestions or insights?
Remaining time: Open consultation …