Lecture 15: Logistic Regression: Inference and Link Functions
BMTRY 701 Biostatistical Methods II
More on our example

> pros5.reg <- glm(cap.inv ~ log(psa) + gleason, family=binomial)
> summary(pros5.reg)

Call:
glm(formula = cap.inv ~ log(psa) + gleason, family = binomial)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -8.1061     0.9916  -8.174 2.97e-16 ***
log(psa)      0.4812     0.1448   3.323 0.000892 ***
gleason       1.0229     0.1595   6.412 1.43e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 512.29 on 379 degrees of freedom
Residual deviance: 403.90 on 377 degrees of freedom
AIC: 409.9
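Because the model is linear on the log-odds scale, exponentiating the coefficients gives odds ratios. A minimal sketch using the pros5.reg fit above (exp(), coef(), and confint.default() are standard R):

exp(coef(pros5.reg))             # odds ratios for log(psa) and gleason
exp(confint.default(pros5.reg))  # Wald 95% confidence intervals, OR scale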
Other covariates: Simple logistic models

Covariate                   Beta     exp(Beta)   Z
Age                        -0.0082   0.99       -0.51
Race                       -0.054    0.95       -0.15
Vol                        -0.014    0.99       -2.26
Dig Exam (vs. no nodule):
  Unilobar left             0.88     2.41        2.81
  Unilobar right            1.56     4.76        4.78
  Bilobar                   2.10     8.17        5.44
Detection in RE             1.71     5.53        4.48
LogPSA                      0.87     2.39        6.62
Gleason                     1.24     3.46        8.12
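A sketch of how a table like this can be assembled, assuming the covariates sit in a data frame pros with lower-case column names (the name pros and the columns age and race are hypothetical; only cap.inv, psa, gleason, vol, dpros, and dcaps appear in the lecture code):

# fit each simple (one-covariate) logistic model and collect
# beta, exp(beta), and the Wald z statistic
for (v in c("age", "race", "vol", "dcaps")) {
  fit <- glm(reformulate(v, response = "cap.inv"),
             family = binomial, data = pros)
  b  <- coef(fit)[2]
  se <- sqrt(vcov(fit)[2, 2])
  cat(v, round(b, 3), round(exp(b), 2), round(b / se, 2), "\n")
}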
What is a good multiple regression model?

Principles of model building are analogous to linear regression; we use the same approach:
- look for significant covariates in simple models
- consider multicollinearity
- look for confounding (i.e., change in betas when a covariate is removed)
Multiple regression model proposal

Candidate covariates: Gleason, logPSA, Volume, Digital Exam result, detection in RE.
But what about collinearity? That is 5 choose 2 = 10 pairs to check.

Correlations among the continuous covariates:
          gleason log.psa.   vol
gleason      1.00     0.46 -0.06
log.psa.     0.46     1.00  0.05
vol         -0.06     0.05  1.00
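A sketch of how the matrix above can be computed, assuming the variables live in a data frame pros (the name is hypothetical):

round(cor(cbind(gleason  = pros$gleason,
                log.psa. = log(pros$psa),
                vol      = pros$vol),
          use = "pairwise.complete.obs"), 2)
# use= handles the one missing vol value seen in later output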
Categorical pairs

(epitab() is from the epitools package)

> dpros.dcaps <- epitab(dpros, dcaps)
> dpros.dcaps$tab
         Outcome
Predictor   1        p0  2         p1 oddsratio     lower     upper      p.value
        1  95 0.2802360  4 0.09756098  1.000000        NA        NA           NA
        2 123 0.3628319  9 0.21951220  1.737805 0.5193327  5.815089 4.050642e-01
        3  84 0.2477876 12 0.29268293  3.392857 1.0540422 10.921270 3.777900e-02
        4  37 0.1091445 16 0.39024390 10.270270 3.2208157 32.748987 1.271225e-05

> fisher.test(table(dpros, dcaps))

        Fisher's Exact Test for Count Data

data:  table(dpros, dcaps)
p-value = 2.520e-05
alternative hypothesis: two.sided
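For comparison, the same cross-tabulation can be tested with the large-sample chi-square test (fisher.test above is exact):

chisq.test(table(dpros, dcaps))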
Categorical vs. continuous

t-tests and ANOVA: means by category

> summary(lm(log(psa) ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2506     0.1877   6.662 9.55e-11 ***
dcaps         0.8647     0.1632   5.300 1.97e-07 ***
---
Residual standard error: 0.9868 on 378 degrees of freedom
Multiple R-squared: 0.06917,  Adjusted R-squared: 0.06671
F-statistic: 28.09 on 1 and 378 DF,  p-value: 1.974e-07

> summary(lm(log(psa) ~ factor(dpros)))

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     2.1418087  0.0992064  21.589  < 2e-16 ***
factor(dpros)2 -0.1060634  0.1312377  -0.808    0.419
factor(dpros)3  0.0001465  0.1413909   0.001    0.999
factor(dpros)4  0.7431101  0.1680055   4.423 1.28e-05 ***
---
Residual standard error: 0.9871 on 376 degrees of freedom
Multiple R-squared: 0.07348,  Adjusted R-squared: 0.06609
F-statistic: 9.94 on 3 and 376 DF,  p-value: 2.547e-06
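Note that lm() with a single binary predictor reproduces the pooled-variance two-sample t-test. Assuming dcaps has exactly two levels (as the output suggests), a sketch:

t.test(log(psa) ~ dcaps, var.equal = TRUE)  # matches the lm() t and p-value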
Categorical vs. continuous

> summary(lm(vol ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.905      3.477   6.587 1.51e-10 ***
dcaps         -6.362      3.022  -2.106   0.0359 *
---
Residual standard error: 18.27 on 377 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.01162,  Adjusted R-squared: 0.009003
F-statistic: 4.434 on 1 and 377 DF,  p-value: 0.03589

> summary(lm(vol ~ factor(dpros)))

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      17.417      1.858   9.374   <2e-16 ***
factor(dpros)2   -1.638      2.453  -0.668    0.505
factor(dpros)3   -1.976      2.641  -0.748    0.455
factor(dpros)4   -3.513      3.136  -1.120    0.263
---
Residual standard error: 18.39 on 375 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.003598,  Adjusted R-squared: -0.004373
F-statistic: 0.4514 on 3 and 375 DF,  p-value: 0.7164
Categorical vs. continuous

> summary(lm(gleason ~ dcaps))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.2560     0.1991  26.401  < 2e-16 ***
dcaps         1.0183     0.1730   5.885 8.78e-09 ***
---
Residual standard error: 1.047 on 378 degrees of freedom
Multiple R-squared: 0.08394,  Adjusted R-squared: 0.08151
F-statistic: 34.63 on 1 and 378 DF,  p-value: 8.776e-09

> summary(lm(gleason ~ factor(dpros)))

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.9798     0.1060  56.402  < 2e-16 ***
factor(dpros)2   0.4217     0.1403   3.007  0.00282 **
factor(dpros)3   0.4890     0.1511   3.236  0.00132 **
factor(dpros)4   0.9636     0.1795   5.367 1.40e-07 ***
---
Residual standard error: 1.055 on 376 degrees of freedom
Multiple R-squared: 0.07411,  Adjusted R-squared: 0.06672
F-statistic: 10.03 on 3 and 376 DF,  p-value: 2.251e-06
Lots of "correlation" between covariates

We should expect some insignificance and confounding in a multiple regression model. Still, try the "full model" and see what happens.
Full model results

> mreg <- glm(cap.inv ~ gleason + log(psa) + vol + dcaps + factor(dpros),
+             family=binomial)
> summary(mreg)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)    -8.617036   1.102909  -7.813 5.58e-15 ***
gleason         0.908424   0.166317   5.462 4.71e-08 ***
log(psa)        0.514200   0.156739   3.281  0.00104 **
vol            -0.014171   0.007712  -1.838  0.06612 .
dcaps           0.464952   0.456868   1.018  0.30882
factor(dpros)2  0.753759   0.355762   2.119  0.03411 *
factor(dpros)3  1.517838   0.372366   4.076 4.58e-05 ***
factor(dpros)4  1.384887   0.453127   3.056  0.00224 **
---
    Null deviance: 511.26 on 378 degrees of freedom
Residual deviance: 376.00 on 371 degrees of freedom
  (1 observation deleted due to missingness)
AIC: 392
What next?

Drop or retain? How to interpret?
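One quick way to ask "drop or retain?" for every term at once: drop1() refits the model omitting one term at a time and reports the likelihood ratio test introduced on the next slides. A sketch, using the mreg fit above:

drop1(mreg, test = "Chisq")  # LRT for each term, e.g. the 3-df test of factor(dpros)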
Likelihood Ratio Test

- Recall testing multiple coefficients in linear regression: the approach there was ANOVA
- We don't have ANOVA for logistic regression
- More general approach: the Likelihood Ratio Test
- Based on the likelihood (or log-likelihood) for "competing" nested models
Likelihood Ratio Test

H0: the small (reduced) model
Ha: the large (full) model

Example: H0: logit(p) = β0 + β1 gleason + β2 log(psa) + β3 vol, versus Ha: the same model plus the factor(dpros) terms (the comparison we carry out below).
Recall the likelihood function

For independent binary outcomes $y_i$ with $P(y_i = 1) = p_i$, the logistic regression likelihood is

$$L(\beta) = \prod_{i=1}^{n} p_i^{\,y_i}(1 - p_i)^{1 - y_i},
\qquad p_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$$
Estimating the log-likelihood

- Recall that we use the log-likelihood because it is simpler (back to linear regression)
- MLEs: the betas are selected to maximize the likelihood
- The same betas also maximize the log-likelihood
- If we plug in the estimated betas, we get our 'maximized' log-likelihood for that model
- We compare the log-likelihoods from competing (nested) models
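In symbols, the quantity being maximized is the Bernoulli log-likelihood (consistent with the likelihood on the previous slide):

$$\ell(\beta) = \log L(\beta) = \sum_{i=1}^{n}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right].$$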
Likelihood Ratio Test

LR statistic:

$$G^2 = -2\,\big(\log L(H_0) - \log L(H_1)\big)$$

Under the null, $G^2 \sim \chi^2_{p-q}$, where p and q are the numbers of parameters in the large and small models, respectively.

- If $G^2 < \chi^2_{p-q,\,1-\alpha}$, conclude H0
- If $G^2 > \chi^2_{p-q,\,1-\alpha}$, conclude H1
LRT in R

-2 logL = Residual Deviance, so G² = Dev(0) - Dev(1).

Fit two models:
> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family=binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)

> mreg1

Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null); 372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1    AIC: 391.1

> mreg0

Coefficients:
(Intercept)      gleason     log(psa)          vol
   -7.76759      0.99931      0.50406     -0.01583

Degrees of Freedom: 378 Total (i.e. Null); 375 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 399      AIC: 407
Testing DPROS

Dev(0) - Dev(1) = 399.02 - 377.06 = 21.96
p - q = 375 - 372 = 3
$\chi^2_{3,\,0.95}$ = 7.81
Conclusion? 21.96 > 7.81, so reject H0: the factor(dpros) terms belong in the model.
p-value? 1 - pchisq(21.96, 3) ≈ 6.6e-05
More in R

qchisq(0.95, 3)
-2 * (logLik(mreg0) - logLik(mreg1))
1 - pchisq(21.96, 3)

> anova(mreg0, mreg1)
Analysis of Deviance Table

Model 1: cap.inv ~ gleason + log(psa) + vol
Model 2: cap.inv ~ gleason + log(psa) + vol + factor(dpros)
  Resid. Df Resid. Dev Df Deviance
1       375     399.02
2       372     377.06  3    21.96
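anova() can also report the chi-square p-value directly (a sketch; test = "Chisq" is a standard argument of anova for glm fits):

anova(mreg0, mreg1, test = "Chisq")  # adds a Pr(>Chi) column to the table above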
Notes on LRT

- Again, the models have to be NESTED
- For comparing models that are not nested, you need other approaches
- Examples: AIC, BIC, DIC

Next time….
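In R, information criteria for several fitted models can be compared in one call. A sketch, reusing the fits above (these two happen to be nested, but AIC and BIC do not require it):

AIC(mreg0, mreg1)  # smaller is better
BIC(mreg0, mreg1)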
For next time, read the following article:

Low Diagnostic Yield of Elective Coronary Angiography
Patel, Peterson, Dai, et al. NEJM, 362(10), pp. 886-95, March 11, 2010.
http://content.nejm.org/cgi/content/short/362/10/886?ssource=mfv