Logistic Regression Homework Solutions
EPP 245/298 Statistical Analysis of Laboratory Data
December 1, 2004, EPP 245 Statistical Analysis of Laboratory Data

Exercise 11.1

Predict the risk of malaria from age and log-transformed antibody level using logistic regression.
First examine some plots.
Then fit the logistic regression.
Interpret the results.
Check model assumptions.
> library(ISwR)
Loading required package: survival
Loading required package: splines
> data(malaria)
> summary(malaria)
    subject            age              ab              mal
 Min.   : 1.00   Min.   : 3.00   Min.   : 2.0   Min.   :0.00
 1st Qu.:        1st Qu.:        1st Qu.:       1st Qu.:0.00
 Median :        Median : 9.00   Median :       Median :0.00
 Mean   :        Mean   : 8.86   Mean   :       Mean   :0.27
 3rd Qu.:        3rd Qu.:        3rd Qu.:       3rd Qu.:1.00
 Max.   :        Max.   :15.00   Max.   :       Max.   :1.00
> attach(malaria)
> plot(age,mal)
> plot(log(ab),mal)
> plot(age, log(ab))
> mal.glm <- glm(mal ~ age+log(ab),binomial)
> summary(mal.glm)

Call:
glm(formula = mal ~ age + log(ab), family = binomial)

Deviance Residuals:
 Min  1Q  Median  3Q  Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
age
log(ab)                                           ***
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  on 99  degrees of freedom
Residual deviance:  on 97  degrees of freedom
AIC:

Residual deviance shows no lack of fit.
> summary(glm(mal ~ log(ab),binomial))

Call:
glm(formula = mal ~ log(ab), family = binomial)

Deviance Residuals:
 Min  1Q  Median  3Q  Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       *
log(ab)                                           ***
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  on 99  degrees of freedom
Residual deviance:  on 98  degrees of freedom
AIC:

Number of Fisher Scoring iterations: 4
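The two fits above differ only in the age term, so they can be compared with a likelihood-ratio test, and the coefficients of the reduced model can be read as odds ratios. The sketch below illustrates both steps on simulated stand-in data. The data frame `sim` and the coefficient values used to generate it are assumptions made for illustration, not the ISwR malaria data.

```r
# Simulated stand-in data (hypothetical values, NOT the ISwR malaria data)
set.seed(1)
n   <- 100
age <- sample(3:15, n, replace = TRUE)
ab  <- exp(rnorm(n, mean = 4, sd = 1.5))     # positive antibody levels
mal <- rbinom(n, size = 1, prob = plogis(2 - 0.7 * log(ab)))
sim <- data.frame(age, ab, mal)

full    <- glm(mal ~ age + log(ab), family = binomial, data = sim)
reduced <- glm(mal ~ log(ab),       family = binomial, data = sim)

# Likelihood-ratio test for dropping age (1 degree of freedom)
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)

# Odds ratios: multiplicative change in the odds of malaria
# per one-unit increase in each predictor (here, per unit of log(ab))
odds_ratios <- exp(coef(reduced))
print(odds_ratios)
```

An odds ratio below 1 for log(ab) means the odds of malaria fall as the antibody level rises.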
Exercise 11.2

The 'gvhd' data frame (graft.vs.host in ISwR) has 37 rows and 9 columns. It contains data from patients receiving a nondepleted allogenic bone marrow transplant, with the purpose of finding variables associated with the development of acute graft-versus-host disease.
pnr     a numeric vector. Patient number.
rcpage  a numeric vector. Age of recipient (years).
donage  a numeric vector. Age of donor (years).
type    a numeric vector, type of leukaemia coded 1: AML, 2: ALL, 3: CML for acute myeloid, acute lymphatic, and chronic myeloid leukaemia.
preg    a numeric vector code, indicating whether donor has been pregnant. 0: no, 1: yes.
index   a numeric vector giving an index of mixed epidermal cell-lymphocyte reactions.
gvhd    a numeric vector code, graft versus host disease. 0: no, 1: yes.
time    a numeric vector. Follow-up time.
dead    a numeric vector code. 0: no (censored), 1: yes.
> summary(graft.vs.host)
      pnr         rcpage          donage           type            preg
 Min.   : 1   Min.   :13.00   Min.   :14.00   Min.   :1.000   Min.   :
 1st Qu.:10   1st Qu.:        1st Qu.:        1st Qu.:        1st Qu.:
 Median :19   Median :23.00   Median :23.00   Median :2.000   Median :
 Mean   :19   Mean   :25.43   Mean   :25.81   Mean   :1.973   Mean   :
 3rd Qu.:28   3rd Qu.:        3rd Qu.:        3rd Qu.:        3rd Qu.:
 Max.   :37   Max.   :43.00   Max.   :43.00   Max.   :3.000   Max.   :
     index        gvhd          time           dead
 Min.   :     Min.   :     Min.   : 41.0   Min.   :
 1st Qu.:     1st Qu.:     1st Qu.:        1st Qu.:
 Median :     Median :     Median :        Median :
 Mean   :     Mean   :     Mean   :        Mean   :
 3rd Qu.:     3rd Qu.:     3rd Qu.:        3rd Qu.:
 Max.   :     Max.   :     Max.   :        Max.   :
> attach(graft.vs.host)
> hist(index)
> hist(sqrt(index))
> hist(log(index))
> summary(glm(gvhd ~ rcpage+donage+type+preg+log(index),binomial))

Deviance Residuals:
 Min  1Q  Median  3Q  Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       *
rcpage
donage
type
type
preg
log(index)                                        *
---
    Null deviance:  on 36  degrees of freedom
Residual deviance:  on 30  degrees of freedom
AIC:

Number of Fisher Scoring iterations: 6
> drop1(glm(gvhd ~ rcpage+donage+type+preg+log(index), binomial),test="Chisq")
Single term deletions

Model:
gvhd ~ rcpage + donage + type + preg + log(index)
           Df Deviance AIC LRT Pr(Chi)
rcpage
donage
type
preg
log(index)                              *
> drop1(glm(gvhd ~ donage+type+preg+log(index),binomial), test="Chisq")
Single term deletions

Model:
gvhd ~ donage + type + preg + log(index)
           Df Deviance AIC LRT Pr(Chi)
donage
type
preg
log(index)                              *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> drop1(glm(gvhd ~ donage+preg+log(index),binomial),test="Chisq")
Single term deletions

Model:
gvhd ~ donage + preg + log(index)
           Df Deviance AIC LRT Pr(Chi)
donage                                  *
preg
log(index)                              ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> drop1(glm(gvhd ~ donage+log(index),binomial),test="Chisq")
Single term deletions

Model:
gvhd ~ donage + log(index)
           Df Deviance AIC LRT Pr(Chi)
donage                                  **
log(index)                              ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
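The sequence of drop1() calls above removes the least significant term one step at a time until every remaining term is significant. A minimal sketch of the same backward-elimination loop is shown below on simulated data. The variables `x1`, `x2`, `x3`, `y` and the 0.05 cutoff are assumptions chosen for illustration, and the loop is not the procedure used to produce the slides.

```r
# Simulated data (hypothetical): only x1 truly affects the outcome
set.seed(2)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * d$x1))

fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

repeat {
  tab   <- drop1(fit, test = "Chisq")
  pvals <- tab[-1, ncol(tab)]        # last column holds the LRT p-values;
  terms <- rownames(tab)[-1]         # first row is "<none>", so skip it
  # Stop when nothing is left to drop or every term is significant
  if (length(pvals) == 0 || max(pvals) < 0.05) break
  worst <- terms[which.max(pvals)]   # least significant term
  fit   <- update(fit, as.formula(paste(". ~ . -", worst)))
}

print(formula(fit))
```

With only 37 patients, as in the gvhd data, automatic selection of this kind is unstable, so inspecting each drop1() table by hand, as done above, is a reasonable choice.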
> summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq")

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
log(index)                                        **

    Null deviance:  on 36  degrees of freedom
Residual deviance:  on 34  degrees of freedom
AIC:

> summary(glm(gvhd ~ donage+sqrt(index),binomial),test="Chisq")

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
sqrt(index)                                       **

    Null deviance:  on 36  degrees of freedom
Residual deviance:  on 34  degrees of freedom
AIC:
> summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq")

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
log(index)                                        **

    Null deviance:  on 36  degrees of freedom
Residual deviance:  on 34  degrees of freedom
AIC:

> summary(glm(gvhd ~ donage+index,binomial),test="Chisq")

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
index                                             **

    Null deviance:  on 36  degrees of freedom
Residual deviance:  on 34  degrees of freedom
AIC:
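The three candidate scalings of index (log, square root, untransformed) give models with the same number of parameters, so they can be ranked directly by AIC or, equivalently, by residual deviance. The sketch below shows one way to collect the AIC values side by side. It uses simulated stand-in data: the data frame `sim` and its generating coefficients are assumptions, not the graft.vs.host data.

```r
# Simulated stand-in data (hypothetical values, NOT the ISwR graft.vs.host data)
set.seed(3)
n      <- 150
index  <- exp(rnorm(n, mean = 0.5, sd = 0.8))    # positive, right-skewed
donage <- round(runif(n, min = 14, max = 43))
gvhd   <- rbinom(n, size = 1,
                 prob = plogis(-3 + 0.08 * donage + 1.5 * log(index)))
sim    <- data.frame(index, donage, gvhd)

# Same model, three scalings of the index variable
fits <- list(
  log  = glm(gvhd ~ donage + log(index),  family = binomial, data = sim),
  sqrt = glm(gvhd ~ donage + sqrt(index), family = binomial, data = sim),
  raw  = glm(gvhd ~ donage + index,       family = binomial, data = sim)
)

# Smaller AIC indicates a better trade-off between fit and complexity
aic <- sapply(fits, AIC)
print(sort(aic))
```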
Other Issues

If we want to analyze time to GVHD, we could use the time variable together with the censoring variable.
This is complex because a patient who does not have GVHD may develop it later (one type of censoring), and a patient who has died can no longer develop it (another type).
We have competing risks.
Some of these issues will be addressed next quarter.
Exercise 11.3

The function confint() gives parameter confidence intervals for nonlinear models that are more accurate than the default intervals computed from the estimate and its standard error.
It uses a profile likelihood technique, which varies the parameter until the change in likelihood/deviance is too big.
> mal.glm <- glm(mal ~ age+log(ab),binomial)
> summary(mal.glm)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
age
log(ab)                                           ***

 (1.960)(.19552) = (-1.066, )

> library(MASS)
> confint(mal.glm)
Waiting for profiling to be done...
            2.5 %  97.5 %
(Intercept)
age
log(ab)
> gvhd.glm <- glm(gvhd ~ donage+log(index),binomial)
> summary(gvhd.glm)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       **
donage                                            *
log(index)                                        **

 (1.960)( ) = (0.6296, )
 (1.960)( ) = (0.0192, )

> confint(gvhd.glm)
Waiting for profiling to be done...
            2.5 %  97.5 %
(Intercept)
donage
log(index)
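The hand calculations above are Wald intervals, estimate ± 1.960 × standard error, which are then compared with the profile likelihood intervals from confint(). A minimal sketch of that comparison follows, using simulated data. The variables `x` and `y` and the generating coefficients are assumptions for illustration only.

```r
# Simulated data (hypothetical): one predictor, binary outcome
set.seed(4)
n <- 100
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

# Wald intervals by hand: estimate +/- z * standard error
est  <- coef(summary(fit))[, "Estimate"]
se   <- coef(summary(fit))[, "Std. Error"]
z    <- qnorm(0.975)                      # approximately 1.960
wald <- cbind(lower = est - z * se, upper = est + z * se)

# The same Wald intervals from the built-in default method
wald_builtin <- confint.default(fit)

# Profile likelihood intervals (the method confint() applies to glm fits)
prof <- confint(fit)

print(wald)
print(prof)
```

Wald intervals are always symmetric about the estimate. Profile likelihood intervals need not be, and the two differ most when the sample is small or the estimate is far from zero.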