1 Lecture 15: Logistic Regression: Inference and link functions BMTRY 701 Biostatistical Methods II

2 More on our example

> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family=binomial)
> summary(pros5.reg)

Call:
glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -8.1061     0.9916  -8.174 2.97e-16 ***
log(psa)      0.4812     0.1448   3.323 0.000892 ***
gleason       1.0229     0.1595   6.412 1.43e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 512.29  on 379  degrees of freedom
Residual deviance: 403.90  on 377  degrees of freedom
AIC: 409.9
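To read these coefficients on the odds-ratio scale, exponentiate them. A minimal sketch, assuming pros5.reg is the fit above:

exp(coef(pros5.reg))               # odds ratios per one-unit increase
exp(confint.default(pros5.reg))    # Wald 95% confidence intervals, OR scale

Using confint() instead of confint.default() would give profile-likelihood intervals, which are generally preferred for logistic models.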

3 Other covariates: Simple logistic models

Covariate                    Beta     exp(Beta)    Z
Age                         -0.0082   0.99        -0.51
Race                        -0.054    0.95        -0.15
Vol                         -0.014    0.99        -2.26
Dig Exam (vs. no nodule)
  Unilobar left              0.88     2.41         2.81
  Unilobar right             1.56     4.76         4.78
  Bilobar                    2.10     8.17         5.44
Detection in RE              1.71     5.53         4.48
LogPSA                       0.87     2.39         6.62
Gleason                      1.24     3.46         8.12
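A table like this comes from fitting one simple logistic model per covariate. A minimal sketch, assuming the variables live in a data frame named pros (the names age and race are guesses; dpros is the digital exam result and dcaps the detection variable, as used later in the lecture):

vars <- c("age", "race", "vol", "factor(dpros)", "dcaps", "log(psa)", "gleason")
for (v in vars) {
  fit <- glm(as.formula(paste("cap.inv ~", v)), family = binomial, data = pros)
  est <- summary(fit)$coefficients[-1, , drop = FALSE]   # drop the intercept row
  print(round(cbind(beta = est[, 1], exp.beta = exp(est[, 1]), z = est[, 3]), 2))
}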

4 What is a good multiple regression model?

- Principles of model building are analogous to linear regression
- We use the same approach:
  - look for significant covariates in simple models
  - consider multicollinearity
  - look for confounding (i.e., a change in the betas when a covariate is removed)

5 Multiple regression model proposal

- Candidates: Gleason, logPSA, Volume, Digital Exam result, detection in RE
- But what about collinearity? With 5 covariates, there are choose(5, 2) = 10 pairs to check.

Continuous pairs:

          gleason log.psa.   vol
gleason      1.00     0.46 -0.06
log.psa.     0.46     1.00  0.05
vol         -0.06     0.05  1.00
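A sketch of how the matrix above can be computed, assuming the same data frame pros (vol has one missing value, hence the use= argument):

round(cor(with(pros, cbind(gleason, log.psa = log(psa), vol)),
          use = "pairwise.complete.obs"), 2)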

6 Categorical pairs

> dpros.dcaps <- epitab(dpros, dcaps)
> dpros.dcaps$tab
         Outcome
Predictor   1        p0  2         p1  oddsratio     lower     upper
        1  95 0.2802360  4 0.09756098   1.000000        NA        NA
        2 123 0.3628319  9 0.21951220   1.737805 0.5193327  5.815089
        3  84 0.2477876 12 0.29268293   3.392857 1.0540422 10.921270
        4  37 0.1091445 16 0.39024390  10.270270 3.2208157 32.748987
         Outcome
Predictor      p.value
        1           NA
        2 4.050642e-01
        3 3.777900e-02
        4 1.271225e-05

> fisher.test(table(dpros, dcaps))

        Fisher's Exact Test for Count Data

data:  table(dpros, dcaps)
p-value = 2.520e-05
alternative hypothesis: two.sided
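Note that epitab() is not in base R; it comes from the epitools package. A sketch of the call above, assuming dpros and dcaps are available in the workspace:

library(epitools)                    # install.packages("epitools") if needed
dpros.dcaps <- epitab(dpros, dcaps)  # odds ratios relative to the first dpros level
dpros.dcaps$tab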

7 Categorical vs. continuous

- t-tests and ANOVA: means by category

> summary(lm(log(psa) ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2506     0.1877   6.662 9.55e-11 ***
dcaps         0.8647     0.1632   5.300 1.97e-07 ***
---
Residual standard error: 0.9868 on 378 degrees of freedom
Multiple R-squared: 0.06917,   Adjusted R-squared: 0.06671
F-statistic: 28.09 on 1 and 378 DF,  p-value: 1.974e-07

> summary(lm(log(psa) ~ factor(dpros)))
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.1418087  0.0992064  21.589  < 2e-16 ***
factor(dpros)2 -0.1060634  0.1312377  -0.808    0.419
factor(dpros)3  0.0001465  0.1413909   0.001    0.999
factor(dpros)4  0.7431101  0.1680055   4.423 1.28e-05 ***
---
Residual standard error: 0.9871 on 376 degrees of freedom
Multiple R-squared: 0.07348,   Adjusted R-squared: 0.06609
F-statistic: 9.94 on 3 and 376 DF,  p-value: 2.547e-06
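These lm() fits are the two-sample t-test and the one-way ANOVA in regression form; a sketch of the direct versions, under the same assumptions:

t.test(log(psa) ~ dcaps, var.equal = TRUE)  # same t and p as the lm slope above
anova(lm(log(psa) ~ factor(dpros)))         # same F and p as the lm F-statistic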

8 Categorical vs. continuous

> summary(lm(vol ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.905      3.477   6.587 1.51e-10 ***
dcaps         -6.362      3.022  -2.106   0.0359 *
---
Residual standard error: 18.27 on 377 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.01162,   Adjusted R-squared: 0.009003
F-statistic: 4.434 on 1 and 377 DF,  p-value: 0.03589

> summary(lm(vol ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      17.417      1.858   9.374   <2e-16 ***
factor(dpros)2   -1.638      2.453  -0.668    0.505
factor(dpros)3   -1.976      2.641  -0.748    0.455
factor(dpros)4   -3.513      3.136  -1.120    0.263
---
Residual standard error: 18.39 on 375 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.003598,  Adjusted R-squared: -0.004373
F-statistic: 0.4514 on 3 and 375 DF,  p-value: 0.7164

9 Categorical vs. continuous

> summary(lm(gleason ~ dcaps))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2560     0.1991  26.401  < 2e-16 ***
dcaps         1.0183     0.1730   5.885 8.78e-09 ***
---
Residual standard error: 1.047 on 378 degrees of freedom
Multiple R-squared: 0.08394,   Adjusted R-squared: 0.08151
F-statistic: 34.63 on 1 and 378 DF,  p-value: 8.776e-09

> summary(lm(gleason ~ factor(dpros)))
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.9798     0.1060  56.402  < 2e-16 ***
factor(dpros)2   0.4217     0.1403   3.007  0.00282 **
factor(dpros)3   0.4890     0.1511   3.236  0.00132 **
factor(dpros)4   0.9636     0.1795   5.367 1.40e-07 ***
---
Residual standard error: 1.055 on 376 degrees of freedom
Multiple R-squared: 0.07411,   Adjusted R-squared: 0.06672
F-statistic: 10.03 on 3 and 376 DF,  p-value: 2.251e-06

10 Lots of “correlation” between covariates

- Given this much association among the covariates, we should expect insignificance and confounding in a joint model.
- Still, try the ‘full model’ and see what happens.

11 Full model results

> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps + factor(dpros),
+             family=binomial)
> summary(mreg)
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -8.617036   1.102909  -7.813 5.58e-15 ***
gleason         0.908424   0.166317   5.462 4.71e-08 ***
log(psa)        0.514200   0.156739   3.281  0.00104 **
vol            -0.014171   0.007712  -1.838  0.06612 .
dcaps           0.464952   0.456868   1.018  0.30882
factor(dpros)2  0.753759   0.355762   2.119  0.03411 *
factor(dpros)3  1.517838   0.372366   4.076 4.58e-05 ***
factor(dpros)4  1.384887   0.453127   3.056  0.00224 **
---
    Null deviance: 511.26 on 378 degrees of freedom
Residual deviance: 376.00 on 371 degrees of freedom
  (1 observation deleted due to missingness)
AIC: 392

12 What next?

- Drop or retain each term?
- How to interpret? (A drop1() sketch follows below.)
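One quick way to ask “drop or retain?” term by term is the likelihood ratio test developed on the next slides; R will run it for every term in one call. A minimal sketch, assuming mreg is the full model fit above:

drop1(mreg, test = "Chisq")   # LRT for removing each term from the full model

Note that for factor(dpros) this gives a single 3-df test of all three indicators together, which the individual Wald z-tests in summary() cannot provide.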

13 Likelihood Ratio Test

- Recall testing multiple coefficients in linear regression
- The approach there: ANOVA (an F-test)
- We don’t have ANOVA for logistic regression
- More general approach: the Likelihood Ratio Test (LRT)
- Based on the likelihood (or log-likelihood) for “competing” nested models

14 Likelihood Ratio Test

- H0: the small (reduced) model holds
- Ha: the large (full) model is needed
- Example (the comparison carried out below): H0: logit(π) = β0 + β1·gleason + β2·log(psa) + β3·vol, versus Ha: the same model plus the three factor(dpros) terms

15 Recall the likelihood function

For independent 0/1 outcomes y_i with P(y_i = 1) = π_i and logit(π_i) = x_i′β, the likelihood is

  L(β) = Π_i π_i^(y_i) · (1 − π_i)^(1 − y_i)

and the log-likelihood is

  logL(β) = Σ_i [ y_i·log(π_i) + (1 − y_i)·log(1 − π_i) ]

16 Estimating the log-likelihood

- Recall that we use the log-likelihood because it is simpler to work with (as in linear regression)
- MLEs: the betas are selected to maximize the likelihood
- The same betas also maximize the log-likelihood
- If we plug in the estimated betas, we get the ‘maximized’ log-likelihood for that model
- We compare the maximized log-likelihoods from competing (nested) models
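In R, the maximized log-likelihood can be pulled straight from a fitted model. A small sketch, using the model from slide 2:

logLik(pros5.reg)                   # maximized log-likelihood
-2 * as.numeric(logLik(pros5.reg))  # = residual deviance (403.90) for 0/1 outcomes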

17 Likelihood Ratio Test

- LR statistic: G² = −2·(logL(H0) − logL(H1))
- Under the null, G² ~ χ²(p − q), where p and q are the numbers of parameters in the large and small models
- If G² < the 1 − α quantile of χ²(p − q): fail to reject H0 (keep the small model)
- If G² > that quantile: reject H0 in favor of the large model

18 LRT in R

- For 0/1 data, −2·logL = the residual deviance reported by glm()
- So G² = Dev(0) − Dev(1)
- Fit the two models:

19
> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family=binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)
> mreg1

Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1        AIC: 391.1

> mreg0

Coefficients:
(Intercept)      gleason     log(psa)          vol
   -7.76759      0.99931      0.50406     -0.01583

Degrees of Freedom: 378 Total (i.e. Null);  375 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 399          AIC: 407

20 Testing DPROS

- Dev(0) − Dev(1) = 399.02 − 377.06 = 21.96
- p − q = 3 (the three factor(dpros) indicators)
- 0.95 quantile of χ²(3) = 7.81
- Conclusion? 21.96 > 7.81, so reject H0: the digital exam result belongs in the model
- p-value? 1 − pchisq(21.96, 3) ≈ 6.6e-05

21 More in R

qchisq(0.95, 3)
-2*(logLik(mreg0) - logLik(mreg1))
1 - pchisq(21.96, 3)

> anova(mreg0, mreg1)
Analysis of Deviance Table

Model 1: cap.inv ~ gleason + log(psa) + vol
Model 2: cap.inv ~ gleason + log(psa) + vol + factor(dpros)
  Resid. Df Resid. Dev Df Deviance
1       375     399.02
2       372     377.06  3    21.96
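By default, anova() on two glm fits reports the deviance change but no p-value; requesting the chi-square test adds it:

anova(mreg0, mreg1, test = "Chisq")  # adds a Pr(>Chi) column for the 3-df change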

22 Notes on LRT

- Again, the models have to be NESTED
- For comparing models that are not nested, you need to use other approaches
- Examples: AIC, BIC, DIC (a short AIC/BIC sketch follows)
- More on these next time…
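A minimal sketch of the AIC/BIC comparison mentioned above, using the two models already fit (they happen to be nested here, but the same calls work for non-nested fits):

AIC(mreg0, mreg1)   # smaller is better; penalty of 2 per parameter
BIC(mreg0, mreg1)   # heavier penalty, log(n) per parameter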

23 For next time, read the following article

Low Diagnostic Yield of Elective Coronary Angiography
Patel, Peterson, Dai, et al. NEJM 362(10), pp. 886-95, March 11, 2010
http://content.nejm.org/cgi/content/short/362/10/886?ssource=mfv

