
1 Biostat Didactic Seminar Series
Correlation and Regression Part 2
Robert Boudreau, PhD
Co-Director of Methodology Core, PITT-Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases
Core Director for Biostatistics, Center for Aging and Population Health
Dept. of Epidemiology, GSPH

2 Previous Biostat Didactics Fall 2009 – Spring 2010
1. Descriptive Statistics: Examining Your Data
   - Data types: Qualitative (Categorical), Ordinal, Quantitative
   - Mean, SD, medians, quartiles, IQR, skewness, histograms, boxplots
2. Group Comparisons: Part 1
   - Normal dist (mean, SD: 68%, 95%, 99.7% interpretation)
   - t-dist, degrees of freedom (n-1)
   - Confidence interval for the mean
3. Group Comparisons: Part 2
   - Comparing means: Two-sample independent t-test
   - Pooled and unequal variance (Satterthwaite) versions
   - Interpretation of p-values, type I (false positive) and type II error

3 Previous Biostat Didactics Fall 2009 – Spring 2010
4. Group Comparisons Part 3: Nonparametric Tests, Chi-squares and Fisher Exact
   - Comparing groups having small sample sizes (< 20) or with non-normal distributions
     => Use Wilcoxon Rank-Sum Test (nonparametric), based on rank order when sorted rather than on actual numeric values
   - Comparing groups in the % falling into different categories
     => Use Chi-square, or Fisher’s Exact (if any cell n < 5)

4 Previous Biostat Didactics Fall 2009 – Spring 2010
5. Correlation, Regression and Covariate-Adjusted Group Comparisons
   - Pearson vs Spearman correlation => linear vs monotone association
   - Regression: interpretation of beta coefficients
   - Standard errors, p-values
   - Continuous predictor => beta coeff is a slope
   - Dichotomous predictor (e.g. group “dummy” 0,1 valued variable) => beta coeff is difference in response vs “referent”
       treatment_group = 1: knockout mouse
       treatment_group = 0: wild mouse (referent)
   - Adjusting for important covariates when comparing groups

5 Flow chart for group comparisons
Measurements to be compared:
   - Continuous: is the distribution approx normal, or N ≥ 20?
       Yes => T-tests
       No => Non-parametrics
   - Discrete (binary, nominal, ordinal with few values)
       => Chi-square, Fisher’s Exact

6 Flow chart for regression models (includes adjusted group comparisons)
Outcome variable continuous or dichotomous?
   - Dichotomous: time-to-event available (or relevant)?
       No => Multiple logistic regression
       Yes => Cox proportional hazards regression
   - Continuous: predictor variable categorical?
       No => Multiple linear regression
       Yes (e.g. groups) => ANCOVA (multiple linear regression using dummy variable(s) for categorical var(s))

7 Analysis From Last Didactic …
In Health, Aging and Body Composition Knee-OA Substudy:
Examine association between SxRxKOA (knee OA) and CRP, adjusted for BMI.
Motivation: Sowers M, Hochberg M, et al. C-reactive protein as a biomarker of emergent osteoarthritis. Osteoarthritis and Cartilage, Volume 10, Issue 8, August 2002, Pages 595-601.
Conclusion: “CRP is highly associated with Knee OA; however, its high correlation with obesity limits its utility as an exclusive marker for knee OA”

8 All White Females in HABC (N=844) [includes SxRxKOA (n=93); also rest of parent study cohort]
N=5 had CRP > 30 (max=63.2)

9 log CRP

10 White Females
Mean (SD) by Knee OA status: No (n=752) vs Yes (n=92)
   logCRP: 0.43 (0.83) vs 0.76 (0.58); p = 0.0002 (equal variances), p < 0.0001 (unequal variances)
   BMI: 25.4 (4.3) vs 28.8 (5.2); p < 0.0001
Difference in average logCRP: 0.76 – 0.43 = 0.33
logCRP SDs were significantly different (p < 0.0001) => Use Satterthwaite unequal variance test
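The two p-values on this slide can be roughly reproduced from the reported summary statistics alone. The following is a minimal Python sketch (Python is used here for illustration; the slides themselves use SAS). It computes the pooled and Satterthwaite t statistics from the rounded means, SDs and group sizes, and approximates the p-values with the standard normal distribution, which is an assumption that is only reasonable because the degrees of freedom are large. The function name `two_sided_p_normal` is hypothetical.

```python
import math

# Summary statistics as reported on the slide (white females, logCRP).
n0, mean0, sd0 = 752, 0.43, 0.83   # No knee OA (referent)
n1, mean1, sd1 = 92, 0.76, 0.58    # Knee OA

diff = mean1 - mean0

# Equal-variance (pooled) t statistic.
sp2 = ((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / (n0 + n1 - 2)
t_pooled = diff / math.sqrt(sp2 * (1 / n0 + 1 / n1))
df_pooled = n0 + n1 - 2

# Unequal-variance t statistic with Satterthwaite degrees of freedom.
v0, v1 = sd0**2 / n0, sd1**2 / n1
t_welch = diff / math.sqrt(v0 + v1)
df_welch = (v0 + v1) ** 2 / (v0**2 / (n0 - 1) + v1**2 / (n1 - 1))


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal (large-df approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


print(f"pooled:        t = {t_pooled:.2f}, df = {df_pooled}, approx p = {two_sided_p_normal(t_pooled):.4f}")
print(f"Satterthwaite: t = {t_welch:.2f}, df = {df_welch:.1f}, approx p = {two_sided_p_normal(t_welch):.6f}")
```

Because the smaller group (Knee OA) also has the smaller SD, the pooled standard error is larger than the unequal-variance one, so the Satterthwaite test gives the larger t statistic here.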

11 Two-Group Unadjusted Comparison Of Means Using Regression with Dummy-coded Groups
* No OA is “referent” group (i.e. kneeOA=0)

HABCID   logCRP     kneeOA   BMI
1000      1.10972   0        22.5922
1001      0.16526   0        22.2751
1002      1.50988   0        26.1207
1003     -0.62048   0        26.9536
1014      0.65657   1        26.5266
1017      0.82039   1        30.2526
1033      0.84323   1        29.8458
1048      1.67787   1        39.8597

proc reg data=kneeOA_vs_noOA;
  model logCRP = KneeOA;
  where female=1 and white=1;
run;

12 White Females: 2-Group Comparison Using Dummy-coded Groups
* No OA is “referent” group (KneeOA=0);
proc reg data=kneeOA_vs_noOA;
  model logCRP = KneeOA;
  where female=1 and white=1;
run;
Note: Regression using Dummy (0, 1) for group variable (e.g. KneeOA=0,1)
In regression, equal (pooled) variance is assumed
Output annotations:
   - Intercept = “No OA” mean
   - KneeOA coefficient = “kneeOA” mean difference from referent
   - Same p-value as equal variance t-test

13 Model: logCRP = 0.42682 + 0.33091*kneeOA   (0.42682 is the intercept)
KneeOA=0 => logCRP = 0.42682 + 0.33091*0 = 0.42682
KneeOA=1 => logCRP = 0.42682 + 0.33091*1 = 0.75773
proc reg data=kneeOA_vs_noOA;
  model logCRP = KneeOA;
  where female=1 and white=1;
run;
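The correspondence between the dummy-coded regression and the two group means can be checked by hand. Below is a minimal Python sketch (an illustration, not the SAS used in the slides) that fits a one-predictor least-squares line to the eight rows excerpted on slide 11. Those rows are only a small excerpt, so the coefficients will not match the full-cohort output above. The helper name `simple_ols` is hypothetical.

```python
# Eight rows excerpted on slide 11: (logCRP, kneeOA).
rows = [
    (1.10972, 0), (0.16526, 0), (1.50988, 0), (-0.62048, 0),
    (0.65657, 1), (0.82039, 1), (0.84323, 1), (1.67787, 1),
]
y = [r[0] for r in rows]
x = [r[1] for r in rows]


def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


intercept, slope = simple_ols(x, y)

# Group means computed directly, for comparison.
no_oa = [yi for xi, yi in zip(x, y) if xi == 0]
oa = [yi for xi, yi in zip(x, y) if xi == 1]
mean_no_oa = sum(no_oa) / len(no_oa)
mean_oa = sum(oa) / len(oa)

print(f"intercept = {intercept:.5f}, referent (No OA) mean = {mean_no_oa:.5f}")
print(f"slope     = {slope:.5f}, difference in means    = {mean_oa - mean_no_oa:.5f}")
```

With a single 0/1 predictor, the intercept equals the referent group's mean and the slope equals the difference between the group means, which is why the regression reproduces the two-group comparison.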

14 ANCOVA (Analysis of Covariance) Compare logCRP adjusted for BMI 

15
proc reg data=kneeOA_vs_noOA;
  model logCRP = KneeOA bmi;
  where female=1 and white=1;
run;
Note: Equal BMI slopes in each group is being modeled
Unadjusted diff was 0.33; BMI partially “explains” this difference
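To make the adjustment concrete, here is a minimal Python sketch of the same two-predictor model (logCRP on KneeOA and BMI), fitted by solving the normal equations. It is an illustration in Python rather than the SAS shown above, it uses only the eight rows excerpted on slide 11, and the helper names `solve` and `ols` are hypothetical. The resulting numbers therefore describe the excerpt, not the full cohort.

```python
# Eight rows excerpted on slide 11: (logCRP, kneeOA, BMI).
rows = [
    (1.10972, 0, 22.5922), (0.16526, 0, 22.2751),
    (1.50988, 0, 26.1207), (-0.62048, 0, 26.9536),
    (0.65657, 1, 26.5266), (0.82039, 1, 30.2526),
    (0.84323, 1, 29.8458), (1.67787, 1, 39.8597),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


y = [r[0] for r in rows]
X = [[1.0, float(r[1]), r[2]] for r in rows]   # intercept, KneeOA dummy, BMI
b0, b_oa, b_bmi = ols(X, y)

# Unadjusted difference in means, for comparison with the adjusted coefficient.
oa = [r[0] for r in rows if r[1] == 1]
no_oa = [r[0] for r in rows if r[1] == 0]
unadjusted = sum(oa) / len(oa) - sum(no_oa) / len(no_oa)

residuals = [yi - (b0 + b_oa * xi[1] + b_bmi * xi[2]) for xi, yi in zip(X, y)]

print(f"unadjusted KneeOA difference:   {unadjusted:.4f}")
print(f"BMI-adjusted KneeOA difference: {b_oa:.4f}")
print(f"common BMI slope:               {b_bmi:.4f}")
```

The KneeOA coefficient is the vertical distance between two parallel lines that share the single BMI slope, which is the equal-slopes assumption noted on the slide.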

16 Unadjusted Mean Difference
Notice: At any BMI level, the mean logCRP difference between KneeOA vs Not is smaller than the unadjusted difference

17 logCRP between KneeOA vs Not Adjusted for BMI, Age and Anti-inflammatory Meds Note: age is not significant (caveat: narrow HABC study age range: 69-80)

18 White Females: 2-Group Comparison Using Dummy-coded Groups
* No OA is “referent” group (KneeOA=0);
proc reg data=kneeOA_vs_noOA;
  model logCRP = KneeOA;
  where female=1 and white=1;
run;
Note: Regression using Dummy (0, 1) for group variable (e.g. KneeOA=0,1)
In regression, equal (pooled) variance is assumed
Output annotations: “No OA” mean; “kneeOA” mean difference from referent

19

20 Pearson Correlation
Pearson Correlation = a measure of linear association

21 Pearson vs Spearman Correlation
Spearman: A measure of rank order correlation
   - Works for any general trend that is increasing or decreasing and not necessarily linear

22 Pearson vs Spearman Correlation
Spearman: A measure of rank order correlation
   - Works for any general trend that is increasing or decreasing and not necessarily linear
   - Equals Pearson Correlation using the ranks of the observations instead of actual values
   - Heuristically: Spearman measures degree that low goes with low, middle with middle, high with high
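The statement that Spearman equals Pearson computed on ranks can be shown directly. The following is a minimal Python sketch using made-up illustrative data (an exponential trend, not HABC data); the function names `pearson`, `ranks` and `spearman` are hypothetical helpers written for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: strictly increasing but strongly non-linear.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson  = {r_pearson:.3f}")
print(f"Spearman = {r_spearman:.3f}")
```

Because the trend is perfectly monotone, the ranks of x and y line up exactly and Spearman reaches 1, while Pearson stays below 1 because the relationship is not a straight line.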

23 Effect of Centering BMI at 25
proc reg data=kneeOA_vs_noOA;
  model logCRP = bmi_minus25;
  where female=1 and white=1 and kneeOA=1;
run;
logCRP = 0.58144 + 0.04699*(BMI-25)
       = 0.58144 at BMI=25 (see graphic)

24 Effect of Centering BMI at 25
Model 2: logCRP = 0.58144 + 0.04699*(BMI-25)
                = 0.58144 - 25*0.04699 + 0.04699*BMI
                = -0.59337 + 0.04699*BMI
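The algebra on this slide can be verified numerically. Below is a minimal Python sketch using the rounded coefficients displayed above; the function names are hypothetical. With these rounded values the uncentered intercept works out to about -0.5933, which agrees with the slide's -0.59337 to within what is presumably rounding of the displayed slope.

```python
# Coefficients as displayed on the slide (KneeOA group, BMI centered at 25).
intercept_centered = 0.58144
slope = 0.04699
center = 25


def predict_centered(bmi):
    """Prediction from the centered model: intercept + slope * (BMI - 25)."""
    return intercept_centered + slope * (bmi - center)


# Expanding the bracket moves -25 * slope into the intercept.
intercept_uncentered = intercept_centered - center * slope


def predict_uncentered(bmi):
    """Prediction from the equivalent uncentered model: intercept + slope * BMI."""
    return intercept_uncentered + slope * bmi


print(f"uncentered intercept = {intercept_uncentered:.5f}")
for bmi in (20, 25, 30, 35):
    print(bmi, round(predict_centered(bmi), 5), round(predict_uncentered(bmi), 5))
```

Centering changes only the intercept, which becomes the predicted logCRP at BMI=25 instead of at the meaningless value BMI=0. The slope and every fitted value stay the same.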

25

26 Unadjusted Mean Difference

27 ANCOVA (Analysis of Covariance) Centering BMI at 25
proc reg data=kneeOA_vs_noOA;
  model logCRP = KneeOA bmi_minus25;
  where female=1 and white=1;
run;
Note: Equal BMI slopes in each group is being modeled

28 Check of ANCOVA Assumption: Equality of BMI slopes: KneeOA vs Not
proc reg data=knee_vs_noOA;
  model logCRP = KneeOA bmi BMI_x_KneeOA;
  where female=1 and white=1;
run;
BMI_x_KneeOA is the “interaction term”:

HABCID   logCRP     kneeOA   BMI       BMI_x_KneeOA
1000      1.10972   0        22.5922    0.0000
1001      0.16526   0        22.2751    0.0000
1002      1.50988   0        26.1207    0.0000
1003     -0.62048   0        26.9536    0.0000
1014      0.65657   1        26.5266   26.5266
1017      0.82039   1        30.2526   30.2526
1033      0.84323   1        29.8458   29.8458
1048      1.67787   1        39.8597   39.8597

29 Check of ANCOVA Assumption: Equality of BMI slopes: KneeOA vs Not
proc reg data=knee_vs_noOA;
  model logCRP = KneeOA bmi BMI_x_KneeOA;
  where female=1 and white=1;
run;
The “BMI” slopes are not significantly different (p=0.8019) => the data are consistent with parallel slopes, so the equal-slopes ANCOVA model is reasonable
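To see why the interaction coefficient tests equality of slopes, the sketch below fits the interaction model to the eight rows excerpted on slide 28 and compares it with separate per-group fits. It is a minimal Python illustration rather than the SAS above; the helper names `solve`, `ols` and `group_slope` are hypothetical, and with only eight excerpted rows the estimates say nothing about the full-cohort result (p=0.8019).

```python
# Eight rows excerpted on slide 28: (logCRP, kneeOA, BMI).
rows = [
    (1.10972, 0, 22.5922), (0.16526, 0, 22.2751),
    (1.50988, 0, 26.1207), (-0.62048, 0, 26.9536),
    (0.65657, 1, 26.5266), (0.82039, 1, 30.2526),
    (0.84323, 1, 29.8458), (1.67787, 1, 39.8597),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def group_slope(group):
    """BMI slope from a separate one-predictor fit within one KneeOA group."""
    pts = [(bmi, crp) for crp, oa, bmi in rows if oa == group]
    xbar = sum(p[0] for p in pts) / len(pts)
    ybar = sum(p[1] for p in pts) / len(pts)
    sxy = sum((px - xbar) * (py - ybar) for px, py in pts)
    sxx = sum((px - xbar) ** 2 for px, _ in pts)
    return sxy / sxx


y = [crp for crp, _, _ in rows]
# Columns: intercept, KneeOA, BMI, BMI_x_KneeOA (the product of BMI and the dummy).
X = [[1.0, float(oa), bmi, bmi * oa] for _, oa, bmi in rows]
b0, b_oa, b_bmi, b_inter = ols(X, y)

print(f"BMI slope, No OA (b_bmi):            {b_bmi:.5f}")
print(f"BMI slope, Knee OA (b_bmi + b_inter): {b_bmi + b_inter:.5f}")
print(f"interaction = difference in slopes:  {b_inter:.5f}")
```

The BMI coefficient is the slope in the referent group, and the interaction coefficient is how much the Knee OA slope differs from it, so a test of whether the interaction coefficient is zero is a test of parallel slopes.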

30 Thank you
Questions, comments, suggestions or insights?
Remaining time: Open consultation …

