Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Logistic Regression Homework Solutions EPP 245/298 Statistical Analysis of Laboratory Data.

Similar presentations


Presentation on theme: "1 Logistic Regression Homework Solutions EPP 245/298 Statistical Analysis of Laboratory Data."— Presentation transcript:

1 1 Logistic Regression Homework Solutions EPP 245/298 Statistical Analysis of Laboratory Data

2 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 2 Exercise 11.1 Predict risk of malaria from age and log transformed antibody level using logistic regression First examine some plots Then fit the logistic regression Interpret the results Check model assumptions

3 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 3 > library(ISwR) Loading required package: survival Loading required package: splines > data(malaria) > summary(malaria) subject age ab mal Min. : 1.00 Min. : 3.00 Min. : 2.0 Min. :0.00 1st Qu.: 25.75 1st Qu.: 5.75 1st Qu.: 29.0 1st Qu.:0.00 Median : 50.50 Median : 9.00 Median : 111.0 Median :0.00 Mean : 50.50 Mean : 8.86 Mean : 311.5 Mean :0.27 3rd Qu.: 75.25 3rd Qu.:12.00 3rd Qu.: 373.8 3rd Qu.:1.00 Max. :100.00 Max. :15.00 Max. :2066.0 Max. :1.00 > attach(malaria) > plot(age,mal) > plot(log(ab),mal) > plot(age, log(ab))

4 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 4

5 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 5

6 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 6

7 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 7 > mal.glm <- glm(mal ~ age+log(ab),binomial) > summary(mal.glm) Call: glm(formula = mal ~ age + log(ab), family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.8492 -0.7536 -0.4838 0.8809 2.5796 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.57234 0.95184 2.702 0.006883 ** age -0.06546 0.06772 -0.967 0.333703 log(ab) -0.68235 0.19552 -3.490 0.000483 *** --- (Dispersion parameter for binomial family taken to be 1) Null deviance: 116.652 on 99 degrees of freedom Residual deviance: 98.017 on 97 degrees of freedom AIC: 104.02 Residual deviance shows no lack of fit

8 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 8 > summary(glm(mal ~ log(ab),binomial)) Call: glm(formula = mal ~ log(ab), family = binomial) Deviance Residuals: Min 1Q Median 3Q Max -1.9159 -0.7339 -0.4854 0.8813 2.4722 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.1552 0.8401 2.565 0.010305 * log(ab) -0.7122 0.1932 -3.686 0.000228 *** --- (Dispersion parameter for binomial family taken to be 1) Null deviance: 116.652 on 99 degrees of freedom Residual deviance: 98.968 on 98 degrees of freedom AIC: 102.97 Number of Fisher Scoring iterations: 4

9 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 9 Exercise 11.2 The 'gvhd' data frame has 37 rows and 7 columns. It contains data from patients receiving a nondepleted allogenic bone marrow transplant, with the purpose of finding variables associated with the development of acute graft-versus-host disease.

10 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 10 pnr a numeric vector. Patient number. rcpage a numeric vector. Age of recipient (years). donage a numeric vector. Age of donor (years). type a numeric vector, type of leukaemia coded 1: AML, 2: ALL, 3: CML for acute myeloid, acute lymphatic, and chronic myeloid leukaemia. preg a numeric vector code, indicating whether donor has been pregnant. 0: no, 1: yes. index a numeric vector giving an index of mixed epidermal cell-lymphocyte reactions. gvhd a numeric vector code, graft versus host disease. 0: no, 1: yes. time a numeric vector. Follow-up time dead a numeric vector code 0: no (censored), 1: yes

11 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 11 > summary(graft.vs.host) pnr rcpage donage type preg Min. : 1 Min. :13.00 Min. :14.00 Min. :1.000 Min. :0.0000 1st Qu.:10 1st Qu.:20.00 1st Qu.:20.00 1st Qu.:1.000 1st Qu.:0.0000 Median :19 Median :23.00 Median :23.00 Median :2.000 Median :0.0000 Mean :19 Mean :25.43 Mean :25.81 Mean :1.973 Mean :0.2703 3rd Qu.:28 3rd Qu.:29.00 3rd Qu.:34.00 3rd Qu.:3.000 3rd Qu.:1.0000 Max. :37 Max. :43.00 Max. :43.00 Max. :3.000 Max. :1.0000 index gvhd time dead Min. : 0.270 Min. :0.0000 Min. : 41.0 Min. :0.0000 1st Qu.: 0.920 1st Qu.:0.0000 1st Qu.: 177.0 1st Qu.:0.0000 Median : 2.010 Median :0.0000 Median : 667.0 Median :0.0000 Mean : 2.556 Mean :0.4595 Mean : 669.8 Mean :0.4865 3rd Qu.: 3.730 3rd Qu.:1.0000 3rd Qu.:1105.0 3rd Qu.:1.0000 Max. :10.110 Max. :1.0000 Max. :1504.0 Max. :1.0000 > attach(graft.vs.host) > hist(index) > hist(sqrt(index)) > hist(log(index))

12 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 12

13 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 13

14 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 14

15 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 15 > summary(glm(gvhd ~ rcpage+donage+type+preg+log(index),binomial)) Deviance Residuals: Min 1Q Median 3Q Max -2.0007 -0.3646 -0.0956 0.5796 1.6828 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.47420 2.59647 -2.108 0.0350 * rcpage 0.01597 0.08205 0.195 0.8457 donage 0.10724 0.08262 1.298 0.1943 type2 -0.42824 1.25389 -0.342 0.7327 type3 1.61517 1.30743 1.235 0.2167 preg 1.66236 1.18103 1.408 0.1593 log(index) 1.81874 0.88919 2.045 0.0408 * --- Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 26.252 on 30 degrees of freedom AIC: 40.252 Number of Fisher Scoring iterations: 6

16 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 16 > drop1(glm(gvhd ~ rcpage+donage+type+preg+log(index), binomial),test="Chisq") Single term deletions Model: gvhd ~ rcpage + donage + type + preg + log(index) Df Deviance AIC LRT Pr(Chi) 26.252 40.252 rcpage 1 26.291 38.291 0.038 0.84495 donage 1 28.082 40.082 1.829 0.17622 type 2 28.959 38.959 2.706 0.25841 preg 1 28.447 40.447 2.195 0.13849 log(index) 1 32.343 44.343 6.090 0.01359 *

17 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 17 > drop1(glm(gvhd ~ donage+type+preg+log(index),binomial), test="Chisq") Single term deletions Model: gvhd ~ donage + type + preg + log(index) Df Deviance AIC LRT Pr(Chi) 26.291 38.291 donage 1 28.832 38.832 2.541 0.11092 type 2 29.403 37.403 3.113 0.21092 preg 1 28.812 38.812 2.522 0.11228 log(index) 1 32.630 42.630 6.340 0.01181 * --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

18 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 18 > drop1(glm(gvhd ~ donage+preg+log(index),binomial),test="Chisq") Single term deletions Model: gvhd ~ donage + preg + log(index) Df Deviance AIC LRT Pr(Chi) 29.403 37.403 donage 1 33.654 39.654 4.251 0.0392375 * preg 1 31.068 37.068 1.665 0.1968922 log(index) 1 41.723 47.723 12.320 0.0004482 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

19 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 19 > drop1(glm(gvhd ~ donage+log(index),binomial),test="Chisq") Single term deletions Model: gvhd ~ donage + log(index) Df Deviance AIC LRT Pr(Chi) 31.068 37.068 donage 1 37.740 41.740 6.671 0.0097993 ** log(index) 1 45.476 49.476 14.407 0.0001473 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

20 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 20 > summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq") Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.45399 2.08147 -2.620 0.00879 ** donage 0.14594 0.06465 2.257 0.02399 * log(index) 2.17773 0.78986 2.757 0.00583 ** Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 31.068 on 34 degrees of freedom AIC: 37.068 > summary(glm(gvhd ~ donage+sqrt(index),binomial),test="Chisq") Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -8.28968 2.77653 -2.986 0.00283 ** donage 0.14353 0.06218 2.308 0.02100 * sqrt(index) 2.96830 1.08381 2.739 0.00617 ** Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 31.591 on 34 degrees of freedom AIC: 37.591

21 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 21 > summary(glm(gvhd ~ donage+log(index),binomial),test="Chisq") Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.45399 2.08147 -2.620 0.00879 ** donage 0.14594 0.06465 2.257 0.02399 * log(index) 2.17773 0.78986 2.757 0.00583 ** Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 31.068 on 34 degrees of freedom AIC: 37.068 > summary(glm(gvhd ~ donage+index,binomial),test="Chisq") Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -6.08600 2.10453 -2.892 0.00383 ** donage 0.14075 0.06002 2.345 0.01902 * index 0.94439 0.35914 2.630 0.00855 ** Null deviance: 51.049 on 36 degrees of freedom Residual deviance: 32.362 on 34 degrees of freedom AIC: 38.362

22 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 22 Other Issues If we want to analyze time to GVHD, we could use the time and the censoring variable This is complex because someone who does not have GVHD may develop it later (one type of censoring) and cannot develop it if dead (another type) We have competing risks Some of these issues will be addressed next quarter

23 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 23 Exercise 11.3 The function confint() gives parameter confidence intervals for nonlinear models that are more accurate than the ones given by default This uses a profile likelihood technique, which varies the parameter until the change in likelihood/deviance is too big

24 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 24 > mal.glm <- glm(mal ~ age+log(ab),binomial) > summary(mal.glm) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.57234 0.95184 2.702 0.006883 ** age -0.06546 0.06772 -0.967 0.333703 log(ab) -0.68235 0.19552 -3.490 0.000483 *** -.68235 +- (1.960)(.19552) = (-1.066, -.2991) > library(MASS) > confint(mal.glm) Waiting for profiling to be done... 2.5 % 97.5 % (Intercept) 0.7811646 4.54544887 age -0.2024561 0.06556186 log(ab) -1.0985746 -0.32397509

25 December 1, 2004EPP 245 Statistical Analysis of Laboratory Data 25 > gvhd.glm <- glm(gvhd ~ donage+log(index),binomial) > summary(gvhd.glm) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -5.45399 2.08147 -2.620 0.00879 ** donage 0.14594 0.06465 2.257 0.02399 * log(index) 2.17773 0.78986 2.757 0.00583 ** 2.17773 +- (1.960)(0.78986) = (0.6296, 3.7259) 0.14594 +- (1.960)(0.06465) = (0.0192, 0.2727) > confint(gvhd.glm) Waiting for profiling to be done... 2.5 % 97.5 % (Intercept) -10.38957922 -1.9553765 donage 0.03262422 0.2946919 log(index) 0.89230108 4.0712645


Download ppt "1 Logistic Regression Homework Solutions EPP 245/298 Statistical Analysis of Laboratory Data."

Similar presentations


Ads by Google