1
Lecture 17: Regression Diagnostics II
Residuals
2
Regression Diagnostics
We've examined how to check the proportional hazards assumption:
- Graphical approaches
- Regression approaches
- Time-varying covariates
- Schoenfeld residual test
Once we have verified that this assumption is met, we also want to examine:
- Model goodness of fit
- Functional form of covariates
- Outliers or influential points
3
Using Residuals for Diagnostics
Several types:
- Schoenfeld residuals
- Cox-Snell residuals
- Martingale residuals
- Deviance residuals (similar to Martingale)
What are residuals?
- Linear regression: observed minus fitted response, $e_i = y_i - \hat{y}_i$
- What is the interpretation of a residual in Cox regression? Not necessarily the same
4
Cox-Snell “residuals”
$r_j = \hat{H}_0(T_j)\,\exp\left(\sum_k Z_{jk} b_k\right)$
Where
- H0(Tj) = baseline cumulative hazard at time Tj
- Zjk is the kth covariate value for the jth person
- bk is the coefficient estimate for the kth covariate
Interpretation
- The expected number of events for each observation
- Think of them as expected counts (not residuals)
Theory
- If the model fits, the rj's should look like a censored sample from a unit exponential distribution (i.e. λ = 1)
- That is, deviations from expected should be small
5
Why? Some Theory Behind This…
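Assume X has survival distribution $S_X(x)$
Recall the definitions:
- Survival function: $S_X(x) = P(X > x)$
- Cumulative hazard: $H_X(x) = -\log S_X(x)$, so $S_X(x) = \exp\{-H_X(x)\}$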
6
Why? Some Theory Behind This…
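If we define Y = H_X(X), the cumulative hazard of X evaluated at X itself
Then, for continuous X with strictly increasing H_X and any y ≥ 0,
$P(Y > y) = P(H_X(X) > y) = P(X > H_X^{-1}(y)) = S_X(H_X^{-1}(y)) = \exp\{-H_X(H_X^{-1}(y))\} = e^{-y}$
Thus Y ~ exp(1) regardless of the distribution of X
And so H_X(X) should look ~ exp(1)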
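A quick simulation can make this result concrete. The sketch below is only an illustration: the Weibull shape and scale, the seed, and the sample size are arbitrary assumptions, not values from the lecture. It applies the known Weibull cumulative hazard to a Weibull sample and compares the result with a unit exponential.

```r
# Sketch: cumulative hazard transform of a Weibull sample
# (shape, scale, seed and sample size are illustrative assumptions)
set.seed(17)
shape <- 2.5
scale <- 3
x <- rweibull(1e5, shape = shape, scale = scale)

# Weibull cumulative hazard: H(x) = (x / scale)^shape
y <- (x / scale)^shape

# A unit exponential has mean 1 and variance 1
mean_y <- mean(y)
var_y  <- var(y)
print(c(mean = mean_y, var = var_y))

# Empirical survival of Y against exp(-y) at a few points
grid <- c(0.5, 1, 2)
emp_surv <- sapply(grid, function(g) mean(y > g))
print(rbind(empirical = emp_surv, theoretical = exp(-grid)))
```

Changing the shape or scale should leave the comparison essentially unchanged, which is the point of the result: the transform removes the dependence on the distribution of X.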
7
Cox-Snell Residuals
How do we use residuals in linear regression? Assess:
- Model fit
- Model assumptions
- Shape of covariates
What should we compare these Cox-Snell residuals to?
8
Empirical vs. Fitted
rj = Cox-Snell residuals
- CS residuals are always >0
- Hcs(t) = Nelson-Aalen H(t) = empirical cumulative hazard
- Estimate by fitting a Cox model with the residuals as the time variable and δj as the event indicator
- If the model fits, i.e. assumptions hold and covariates are appropriately modeled, then: plot of rj's vs. Hcs(t) should be…
9
Implementation
Get Cox-Snell residuals:
- Get linear predictor(s) Zb
- Get baseline cumulative hazard
- Multiply
- OR… get them from R
Get cumulative hazard estimates:
- Estimate SNA(t) using KM approach with event time = rj and event indicator = δj
- Transform to Hcs(t) scale, Hcs(t) = -log SNA(t)
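The two routes to the Cox-Snell residuals listed above can be checked against each other. This is a minimal sketch on simulated data, not the lecture's BMT analysis: the data frame `d`, its columns, and the simulation parameters are all made up for illustration. It uses `basehaz(centered = FALSE)` so that the baseline refers to Z = 0 and pairs with an uncentered linear predictor.

```r
library(survival)

# Simulated data (illustrative): one covariate, exponential event and censoring times
set.seed(1)
n <- 200
d <- data.frame(z = rnorm(n))
event_time  <- rexp(n, rate = exp(0.5 * d$z))
censor_time <- rexp(n, rate = 0.3)
d$time   <- pmin(event_time, censor_time)
d$status <- as.numeric(event_time <= censor_time)

fit <- coxph(Surv(time, status) ~ z, data = d, ties = "breslow")

# Step 1: linear predictor Zb (uncentered, to pair with the Z = 0 baseline)
lp <- d$z * coef(fit)[["z"]]

# Step 2: baseline cumulative hazard at Z = 0, as a right-continuous step function
bh <- basehaz(fit, centered = FALSE)
H0 <- stepfun(bh$time, c(0, bh$hazard))

# Step 3: multiply
cs_manual <- H0(d$time) * exp(lp)

# "Get them from R": event indicator minus martingale residual
cs_short <- unname(d$status - residuals(fit, type = "martingale"))

max_diff <- max(abs(cs_manual - cs_short))
print(max_diff)
```

The shortcut is what the lecture code uses (`event - reg$residuals`); the manual route is mainly useful for seeing where the numbers come from.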
10
Getting Residuals
#fit regression
reg<-coxph(st~dx+fab+ttrans+mtx+dnage+ptage+dnage*ptage, method="breslow")
#get cox-snell residuals
par(mfrow=c(1, 2))
cs.res<-event-reg$residuals
#Plot of residuals vs. cum hazard
fitres<-survfit(coxph(Surv(cs.res, bmt$Either)~1, method="breslow"), type="aalen")
plot(fitres$time, -log(fitres$surv), type="s", xlab="Cox-Snell Residuals",
     ylab="Estimated Cumulative Hazard Function", lwd=2)
abline(0, 1, col=2, lwd=2)
## Alternatively, use
reg2<-survfit(Surv(cs.res, bmt$Either)~1)
Htilde<-cumsum(reg2$n.event/reg2$n.risk)
plot(reg2$time, Htilde, type="s", xlab="Cox-Snell Residuals",
     ylab="Estimated Cumulative Hazard Function", lwd=2, col=4)
11
Diagnostic Plots
12
What about MTX?
- Recall MTX did not meet the proportional hazards assumption
- Let's look at each MTX group to see how this affects fit
- Also look at MTX stratified model
13
R Code: MTX Groups
cs.res0<-cs.res[which(bmt$MTX==0)]; event0<-event[which(bmt$MTX==0)]
cs.res1<-cs.res[which(bmt$MTX==1)]; event1<-event[which(bmt$MTX==1)]
fitres0<-survfit(coxph(Surv(cs.res0,event0)~1,method="breslow"),type="aalen")
fitres1<-survfit(coxph(Surv(cs.res1,event1)~1,method="breslow"),type="aalen")
plot(fitres0$time, -log(fitres0$surv), type="s", xlab="Cox-Snell Residuals",
     ylab="Estimated Cumulative Hazard Function", col=3, lwd=2, lty=2,
     xlim=c(0, 3), ylim=c(0, 3))
lines(fitres1$time, -log(fitres1$surv), type="s", col=4, lwd=2, lty=2)
abline(0,1,col=2, lwd=2)
legend(2, .5, c("MTX=0","MTX=1"), col=3:4, lty=2, lwd=2, bty="n")
14
R: MTX Groups
15
R Code: MTX Stratified
st<-Surv(dfs, event)
reg.strat<-coxph(st~dx+fab+ttrans+dnage+ptage+dnage*ptage+strata(mtx), method="breslow")
cs.strat<-event-reg.strat$resid
cs.strat0<-cs.strat[which(bmt$MTX==0)]
cs.strat1<-cs.strat[which(bmt$MTX==1)]
fitres.strat0<-survfit(coxph(Surv(cs.strat0,event0)~1,method="breslow"),type="aalen")
fitres.strat1<-survfit(coxph(Surv(cs.strat1,event1)~1,method="breslow"),type="aalen")
plot(fitres.strat0$time, -log(fitres.strat0$surv), type="s", xlab="Cox-Snell Residuals",
     ylab="Estimated Cumulative Hazard Function", lwd=2, xlim=c(0, 3), ylim=c(0, 3))
lines(fitres.strat1$time, -log(fitres.strat1$surv), type="s", col=5, lwd=2)
abline(0,1,col=2, lwd=2)
legend(2, .5, c("MTX=0","MTX=1"), col=c(1,5), lwd=2, bty="n")
16
MTX Stratified Model
17
Alternative Plots for CS Residuals
There are alternative plots you can consider, but they tend to require larger N
We can plot:
- CS residuals vs. Exp(1)
- CS residuals vs. CS-NA cumulative hazard estimate
Let's consider a larger dataset…
18
Large Data Set
Study examining factors that impact the time until first-time mothers weaned their infants
Data includes information on 927 mothers
Variables in the data include:
- Race (white, black, other)
- Mother in poverty
- Smoking at childbirth
- Alcohol use at childbirth
- Age of the mother
- Years of school
- Prenatal care after the 3rd month
19
Model
> data(bfeed)
> mod<-coxph(Surv(duration, delta)~factor(race)+poverty+smoke+alcohol+agemth+yschool+pc3mth, data=bfeed)
> mod
Call:
coxph(formula = Surv(duration, delta) ~ factor(race) + poverty + smoke + yschool + pc3mth, data = bfeed)
 coef exp(coef) se(coef) z p
factor(race)
factor(race)
poverty
smoke
alcohol
yschool
Likelihood ratio test=29.7 on 6 df, p=4.51e-05
n= 927, number of events= 892
20
CS As Step Function vs. 45° Line
###Obtaining CS residuals
cs.resid<-bfeed$delta-mod$resid
### Fitting Cox model for residuals
### Compare to 45° line
fitres<-survfit(coxph(Surv(cs.resid,bfeed$delta)~1, method="breslow"), type="aalen")
plot(fitres$time, -log(fitres$surv), type="s", xlab="Cox-Snell Residuals",
     ylab="Estimated Cumulative Hazard Function", lwd=2, ylim=c(0, 8))
abline(0,1,col=2, lwd=2)
21
Comparison to Exp(1)
###Obtaining CS residuals
cs.resid<-bfeed$delta-mod$resid
### Also compare to exponential
efit<-survfit(Surv(duration, delta)~1, data=bfeed)
exp1<-rexp(10000, 1)
plot(density(exp1), lwd=2, col=1)
lines(density(cs.resid), col=2, lwd=2, lty=2)
22
Comparing CS and NA
###Comparing NA to CS
NAe<- -log(efit$surv)
cs<-cbind(cs.resid, bfeed$duration)
na<-cbind(NAe, efit$time)
all<-merge(cs, na, by=2, all=TRUE)
cs<-all$cs.resid[-927]
na<-all$NAe[-927]
plot(cs, na, pch=16, xlab="Cox-Snell", ylab="Cox-Snell - Nelson-Aalen")
abline(0,1, col=2, lwd=2)
fit1<-lm(na~cs)
abline(fit1, lwd=2, col=3)
23
Problem with Cox-Snell Approach
- CS residuals can diagnose that the model does not fit
- But they don't help figure out why or where
- Note, overall pattern can be helpful (e.g. CS > NA or vice versa)
- Martingale residuals are better
24
Martingale Residuals
$M_j = \delta_j - r_j$, where δj is the event indicator and rj is the Cox-Snell residual
- When the model is correct, E(Mj) = 0
- Range between -∞ and 1
- Difference over time between observed and expected number of events
- Mj tends to be negative if the estimated cumulative hazard is too large
- Mj tends to be positive if the estimated cumulative hazard is too small
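The basic properties of martingale residuals are easy to confirm numerically. The following is a small sketch on simulated data (the data frame `d` and all simulation settings are assumptions chosen for illustration): the residuals sum to zero, none exceeds 1, and censored subjects never get a positive value.

```r
library(survival)

# Simulated data (illustrative)
set.seed(2)
n <- 300
d <- data.frame(z = rnorm(n))
event_time  <- rexp(n, rate = exp(0.7 * d$z))
censor_time <- rexp(n, rate = 0.5)
d$time   <- pmin(event_time, censor_time)
d$status <- as.numeric(event_time <= censor_time)

fit  <- coxph(Surv(time, status) ~ z, data = d)
mart <- residuals(fit, type = "martingale")

# Observed minus expected events, summed over all subjects
sum_mart <- sum(mart)

# Upper bound of 1; a censored subject has 0 observed events, so M_j <= 0
max_mart     <- max(mart)
max_censored <- max(mart[d$status == 0])

print(c(sum = sum_mart, max = max_mart, max_censored = max_censored))
print(summary(mart))
```

The summary typically shows the long left tail that later motivates the deviance residuals.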
25
Martingale Residuals
- Average martingale residual can be computed for different values of a covariate, or for a range of covariate values
- Determines if the Mj's tend to be positive or negative in the range
- Helps to find improper specification of the effect of a covariate on the hazard
26
Use of Martingale Residuals
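To examine the best functional form of a given covariate Z1
Approach:
- Assume the optimal model is $h(t \mid Z^*, Z_1) = h_0(t)\exp(\beta^{*\prime} Z^*)\exp\{f(Z_1)\}$, where Z* holds the other covariates and f is unknown
- Fit model with only Z*
- Save Martingale residuals from Z* model
- Plot Martingale residuals versus Z1
- Use smoother to help find best transformation
Only works on continuous or ordinal variables!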
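The approach above can be tried on data where the right answer is known. In this sketch everything is simulated and hypothetical: `zstar` plays the role of Z*, `z1` is the covariate under study, and the true log-hazard is linear in log(z1). The model is fitted with Z* only, and the smoothed martingale residuals are then plotted against Z1.

```r
library(survival)

# Simulated data (illustrative): true log-hazard is 0.5 * zstar + 1.0 * log(z1)
set.seed(3)
n <- 500
d <- data.frame(
  zstar = rbinom(n, 1, 0.5),   # covariate kept in the model (Z*)
  z1    = runif(n, 0.2, 10)    # covariate whose functional form is unknown (Z1)
)
event_time  <- rexp(n, rate = 0.1 * exp(0.5 * d$zstar + 1.0 * log(d$z1)))
censor_time <- rexp(n, rate = 0.05)
d$time   <- pmin(event_time, censor_time)
d$status <- as.numeric(event_time <= censor_time)

# Fit the model with only Z* and save its martingale residuals
fit_star <- coxph(Surv(time, status) ~ zstar, data = d)
mart <- residuals(fit_star, type = "martingale")

# Plot residuals against Z1 and add a lowess smoother
plot(d$z1, mart, pch = 16, xlab = "Z1", ylab = "Martingale Residuals")
smooth <- lowess(d$z1, mart)
lines(smooth, col = 2, lwd = 2)

# Crude numeric summary of the trend
cor_log <- cor(mart, log(d$z1))
print(cor_log)
```

With these settings the smoother should rise steeply at small Z1 and flatten afterwards, which is the kind of shape that suggests a log transformation.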
27
Example
###Martingale Residuals- NHL vs HOD in BMT
library(survival)
library(KMsurv)   # provides the hodg data
data(hodg)
# Model without waiting time; its residuals are plotted against wtime
fit1<-coxph(Surv(time, delta)~factor(dtype)+factor(gtype)+score+factor(dtype)*factor(gtype), data=hodg)
res1<-resid(fit1, type="martingale")
# Model without Karnofsky score; its residuals are plotted against score
fit2<-coxph(Surv(time, delta)~factor(dtype)+factor(gtype)+wtime+factor(dtype)*factor(gtype), data=hodg)
res2<-resid(fit2, type="martingale")
par(mfrow=c(1,2))
plot(hodg$wtime, res1, xlab="Waiting Time (months)", ylab="Martingale Residuals", pch=16)
lines(lowess(hodg$wtime, res1), col=2, lwd=2)
lines(hodg$wtime, fitted(lm(res1~hodg$wtime)), col=4, lwd=2)
plot(hodg$score, res2, xlab="Karnofsky Score", ylab="Martingale Residuals", pch=16)
lines(lowess(hodg$score, res2), col=2, lwd=2)
lines(hodg$score, fitted(lm(res2~hodg$score)), col=4, lwd=2)
28
Martingale Plots
29
Model with “Inappropriate” Waiting Time
> fit<-coxph(Surv(time, cens)~wait+factor(dis)+factor(graft)+karn+factor(dis)*factor(graft), data=bmt2)
> summary(fit)
Call:
coxph(formula = Surv(time, cens) ~ wait + factor(dis) + factor(graft) + karn + factor(dis) * factor(graft), data = bmt2)
n= 43, number of events= 26
 coef exp(coef) se(coef) z Pr(>|z|)
wait
factor(dis) **
factor(graft)
karn e-05 ***
dis*graft *
30
How Do We Find Best Cutpoint?
Want cutoff that gives largest difference between individuals in the two data-defined groups
- Clinically chosen value (i.e. what do clinicians find meaningful?)
- Choose based on data (often good choice): Contal and O'Quigley
- Keep in mind this may bias the model towards inclusion of the covariate
31
Outcome-Oriented Choice
Contal and O'Quigley steps:
1. Identify possible unique cut points
2. Construct dichotomized predictor for all cut points
3. Conduct log-rank test for each dichotomized version of the variable
4. Choose cutoff based on largest log-rank statistic
Based on this procedure, waiting time of 84 months is the "best" cut point
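The four steps can be sketched as a simple loop over candidate cut points. This is a simplified illustration on simulated data, not the lecture's analysis: the data frame `d`, the variable `wait`, and the true change point at 80 are assumptions. It only locates the cut with the largest log-rank statistic and does not compute the adjusted p-value that the procedure also calls for.

```r
library(survival)

# Simulated data (illustrative): hazard is three times higher for waits of 80 or more
set.seed(4)
n <- 200
d <- data.frame(wait = round(runif(n, 1, 150)))
event_time  <- rexp(n, rate = 0.05 * ifelse(d$wait >= 80, 3, 1))
censor_time <- rexp(n, rate = 0.02)
d$time   <- pmin(event_time, censor_time)
d$status <- as.numeric(event_time <= censor_time)

# Step 1: candidate cut points (drop the minimum so both groups are non-empty)
cuts <- sort(unique(d$wait))[-1]

# Steps 2-3: dichotomize at each cut point and record the log-rank chi-square
chisq <- sapply(cuts, function(cp) {
  dd <- d
  dd$grp <- as.numeric(dd$wait >= cp)
  survdiff(Surv(time, status) ~ grp, data = dd)$chisq
})

# Step 4: cut point with the largest log-rank statistic
best_cut <- cuts[which.max(chisq)]
print(best_cut)

plot(cuts, chisq, type = "l", xlab = "Candidate cut point", ylab = "Log-rank chi-square")
abline(v = best_cut, col = 2, lty = 2)
```

Because the cut point is chosen to maximize the statistic, the naive p-value at `best_cut` is too optimistic, which is why an adjustment for the many cut points examined is needed.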
32
Model with Dichotomized Waiting Time
> #Model with dichotomized waiting time
> iwait<-ifelse(84<=bmt2$wait, 1, 0)
> fit.d<-coxph(Surv(time, cens)~iwait+factor(dis)+factor(graft)+karn+
+ factor(dis)*factor(graft), data=bmt2)
> summary(fit.d)
Call:
coxph(formula = Surv(time, cens) ~ iwait + factor(dis) + factor(graft) + karn + factor(dis) * factor(graft), data = bmt2)
n= 43, number of events= 26
 coef exp(coef) se(coef) z Pr(>|z|)
iwait *
factor(dis) **
factor(graft)
karn e-06 ***
dis:graft *
*NOTE: we cannot use the p-value for our waiting time indicator. We should adjust for multiple comparisons because we consider MANY cut points for waiting time (pg. 273 in text).
- Here the adjusted p-value is 0.679
33
What About Other Transformations
- Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver (1974 to 1984)
- 312 PBC patients randomized to placebo or D-penicillamine
- Clinical, biochemical, and histologic measures also collected
- Goal: develop natural history model (ignoring treatment) to determine how baseline status impacts survival
34
PBC Survival
35
PBC Example Covariates of interest
- Age
- Albumin
- Prothrombin time (i.e. clotting time)
- Presence of edema
- Serum bilirubin (mg/dL)
Edema is a factor variable and is used "as is"
What about the other variables?
36
Where to Start
1. Fit a model with age and edema
2. Get the martingale residuals from this fit
3. Plot the martingale residuals
   - vs. age
   - vs. albumin
   - vs. bilirubin
   - vs. prothrombin time
4. Check possible transformations where necessary
37
Age and Albumin
38
Albumin
39
Bilirubin
40
The Problem Child… Clotting Time
41
The Problem Child… Clotting Time
- Log transformation is a good first guess but it doesn't always work
- Deviations in the plot don't necessarily lead us easily to the best functional form
- There are many we can try:
  - Z×log(Z)
  - exp{Z}
  - Power transformations (think Box-Cox)
So let's "explore" a little
42
Try Z×ln(Z) and e^Z
43
Power Transformations?
44
Power Transformations?
45
The Point?
Sometimes it is difficult to find a good transformation
- Choose among the set of possibilities
- Is one transformation more interpretable?
- Does a particular transformation make clinical sense?
Add log(bilirubin) and log(albumin) to the model with age and edema to see if this helps
46
Model and Residuals
> ### Model including bilirubin and albumin
> fit<-coxph(Surv(time, status)~age + factor(edema)+log(bili)+log(albumin))
> summary(fit)
Call:
coxph(formula = Surv(time, status) ~ age + factor(edema) + log(bili) + log(albumin))
n= 393, number of events= 161
 coef exp(coef) se(coef) z Pr(>|z|)
age e-06 ***
factor(edema)
factor(edema) ***
log(bili) < 2e-16 ***
log(albumin) e-05 ***
> res1<-fit$resid
47
Looking Again at Clotting Time
### Model with JUST age and edema
fit<-coxph(Surv(time, status)~age + factor(edema))
# Residuals must come from this fit, not from the earlier 4-covariate model
res1<-resid(fit, type="martingale")
plot(protime, res1, xlab="Clot Time", ylab="Martingale Residuals", pch=16,
     main="Model w/ Age & Edema")
lines(lowess(protime, res1), col=2, lwd=2)
lines(protime, fitted(lm(res1~protime)), col=4, lwd=4)
### Model including bilirubin and albumin
fit<-coxph(Surv(time, status)~age + factor(edema)+log(bili)+log(albumin))
res2<-resid(fit, type="martingale")
plot(protime, res2, xlab="Clotting Time", ylab="Martingale Residuals", pch=16,
     main="Model with 4 covariates")
lines(lowess(protime, res2), col=2, lwd=2)
lines(protime, fitted(lm(res2~protime)), col=4, lwd=4)
48
Compare Residual Plots
49
Transformations in the Models?
50
Transformations in the Models?
51
Transformations in the Models?
52
What to Conclude? Transformations better but still not great
> pt1.5<- (protime)^(-1.5)
> fit<-coxph(Surv(time, status)~age + factor(edema)+log(bili)+log(albumin)+pt1.5)
> fit
Call:
coxph(formula = Surv(time, status) ~ age + factor(edema) + log(bili) + log(albumin) + pt1.5)
 coef exp(coef) se(coef) z p
age e
factor(edema) e
factor(edema) e
log(bili) e < 2e-16
log(albumin) e e-05
pt1.5 e
Likelihood ratio test=229 on 6 df, p=0
n= 418, number of events= 186
53
Concerns with Martingale Residuals?
One problem with Martingale residuals… they tend to be asymmetric
- Range from -∞ to 1
These are therefore best used to assess covariate form, NOT general goodness of fit.
Also note, there is susceptibility to overfitting when playing around with functional form
54
Outliers
Defined in survival as an unusual observed failure time given the covariate value Zj
- Martingale residuals do measure the degree to which the jth subject is an outlier
- BUT as we mentioned, the distribution is heavily skewed
- Makes it hard to identify outliers
55
Deviance Residuals
- Deviance residuals are a transformation of Martingale residuals
- Better behaved than Martingale residuals
- More like ~ N(0,1)
- Helpful for determining outliers
- Negative for survival times that are longer than expected (fewer events observed than the model expects, so Mj < 0)
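The transformation in question is usually written as

$D_j = \operatorname{sign}(M_j)\,\sqrt{-2\left[M_j + \delta_j \log(\delta_j - M_j)\right]}$

where Mj is the martingale residual and δj the event indicator. The sketch below applies that formula by hand and compares it with what `residuals(fit, type = "deviance")` returns. The data are simulated and the names (`d`, `dev_manual`) are illustrative assumptions.

```r
library(survival)

# Simulated data (illustrative)
set.seed(5)
n <- 300
d <- data.frame(z = rnorm(n))
event_time  <- rexp(n, rate = exp(0.7 * d$z))
censor_time <- rexp(n, rate = 0.5)
d$time   <- pmin(event_time, censor_time)
d$status <- as.numeric(event_time <= censor_time)

fit   <- coxph(Surv(time, status) ~ z, data = d)
mart  <- residuals(fit, type = "martingale")
delta <- d$status

# The log term only applies to subjects with an event (delta = 1)
log_term   <- ifelse(delta == 1, log(delta - mart), 0)
dev_manual <- sign(mart) * sqrt(-2 * (mart + delta * log_term))

dev_r    <- residuals(fit, type = "deviance")
max_diff <- max(abs(dev_manual - dev_r))
print(max_diff)

# The transformation keeps the sign of the martingale residual
same_sign <- all(sign(dev_r) == sign(mart))
print(same_sign)
```

Since the sign is preserved, a negative deviance residual still marks a subject with fewer observed events than expected, i.e. one who survived longer than the model predicts.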
56
Deviance vs. Martingale Residuals
- Deviance residuals have shorter left and longer right tails
- Distribution more closely resembles ~N(0,1)
- Because deviance residuals are approximately ~N(0,1), we can think of outliers as values outside the range (-3, 3)
- More conservative? (-2.5, 2.5)
57
Compare to ~ N(0, 1)
####################################
### DEVIANCE RESIDUALS ###
fit2<-coxph(Surv(time, delta) ~ factor(dtype) + factor(gtype) + score + wtime + factor(dtype)*factor(gtype), data=hodg)
#Comparing Density of Martingale and deviance residuals to ~N(0, 1)
par(mfrow=c(1,2))
mart.res<-resid(fit2, type="martingale")
plot(density(mart.res), main="Martingale Residuals", lwd=2)
lines(seq(-3,3,0.1), dnorm(seq(-3,3,0.1)), col=2, lwd=2)
dev.res<-resid(fit2, type="deviance")
plot(density(dev.res), main="Deviance Residuals", lwd=2, ylim=c(0, 0.4))
# N(0, 1) reference curve, as in the left panel
lines(seq(-3,3,0.1), dnorm(seq(-3,3,0.1)), col=2, lwd=2)
58
Compare to ~ N(0, 1)
59
Martingale vs. Deviance Residuals
#### Compare deviance to martingale residuals
par(mfrow=c(2,2))
fit1<-coxph(Surv(time, delta)~factor(dtype)+factor(gtype)+score+factor(dtype)*factor(gtype), data=hodg)
mart.res<-resid(fit1,type="martingale")
plot(hodg$wtime, mart.res, xlab="Time to Transplant (months)", ylab="Martingale Residuals", pch=16)
lines(lowess(hodg$wtime, mart.res),col=2, lwd=2)
lines(hodg$wtime, fitted(lm(mart.res~hodg$wtime)), col=4, lwd=2)
dev.res<-resid(fit1,type="deviance")
plot(hodg$wtime, dev.res, xlab="Time to Transplant (months)", ylab="Deviance Residuals", pch=16)
lines(lowess(hodg$wtime, dev.res),col=2, lwd=2)
lines(hodg$wtime, fitted(lm(dev.res~hodg$wtime)), col=4, lwd=2)
fit2<-coxph(Surv(time, delta)~factor(dtype)+factor(gtype)+wtime+factor(dtype)*factor(gtype), data=hodg)
mart.res<-resid(fit2,type="martingale")
plot(hodg$score, mart.res, xlab="Karnofsky Score", ylab="Martingale Residuals", pch=16)
lines(lowess(hodg$score, mart.res),col=2, lwd=2)
lines(hodg$score, fitted(lm(mart.res~hodg$score)), col=4, lwd=2)
dev.res<-resid(fit2,type="deviance")
plot(hodg$score, dev.res, xlab="Karnofsky Score", ylab="Deviance Residuals", pch=16)
lines(lowess(hodg$score, dev.res),col=2, lwd=2)
lines(hodg$score, fitted(lm(dev.res~hodg$score)), col=4, lwd=2)
61
Back to Outliers
In order to use our deviance residuals to determine potential outliers:
- Plot Dj versus the risk score, $\sum_k b_k Z_{jk}$
- Again, anything outside of (-3, 3) or even more conservative…
62
R Code
> fit<-coxph(Surv(time, status) ~ age + factor(edema) + log(bili) + log(albumin))
> fit
Call:
coxph(formula = Surv(time, status) ~ age + factor(edema) + log(bili) + log(albumin))
 coef exp(coef) se(coef) z p
age
factor(edema)
factor(edema)
log(bili) < 2e-16
log(albumin) e-05
Likelihood ratio test=190 on 5 df, p=0, n= 312, number of events= 144
> dev.res<-resid(fit, type="deviance")
> lp<-predict(fit, type="lp")
> plot(lp, dev.res, xlab="Risk Score", ylab="Deviance Residual", pch=16)
> abline(h=c(-2.5, 2.5), col="red", lwd=2)
> abline(h=c(-3, 3), col=4, lwd=2)
63
Outlier Plot
64
Investigating Outliers
> summary(dev.res)
 Min. 1st Qu. Median Mean 3rd Qu. Max
> summary(cbind(time, status, age, log(albumin), log(bili), edema))
      time           status           age         log(albumin)
 Min.   :  41   Min.   :0.000   Min.   :26.28   Min.   :
 1st Qu.:1093   1st Qu.:        1st Qu.:        1st Qu.:
 Median :1730   Median :0.000   Median :51.00   Median :
 Mean   :1918   Mean   :0.445   Mean   :50.74   Mean   :
 3rd Qu.:2614   3rd Qu.:        3rd Qu.:        3rd Qu.:
 Max.   :4795   Max.   :1.000   Max.   :78.44   Max.   :
   log(bili)        edema
 Min.   :       Min.   :
 1st Qu.:       1st Qu.:1.000
 Median :       Median :1.000
 Mean   :       Mean   :
 3rd Qu.:       3rd Qu.:1.000
 Max.   :       Max.   :3.000
65
Investigating Outliers
> cbind(dev.res, cbind(time, status, age, log(albumin), log(bili), edema))[abs(dev.res) >= 2.5,]
 dev.res time status age log(alb) log(bili) edema
66
Caveat with Deviance Residuals
- As we've seen, deviance residuals can be helpful for identifying outliers
- However, given that we are assuming a normal approximation for our residuals, we need to think about sample size
- In data with a large number of censored observations (>25%), deviance residuals will tend to be too large.
67
Influence
Consider only fixed-time covariates
High leverage
- An unusual observation with respect to the covariate vector Zi
High influence
- An observation for which the combination of the degree to which it is an outlier and its leverage gives it a strong influence on the estimates of β
68
Delta-Betas
- Let b be the estimate of β from all the data
- Let b(i) be the estimate of β from the data with the ith subject removed
- Then the delta-beta is $\Delta_i = b - b_{(i)}$
- This is a measure of the influence of the ith subject on the estimate of β
69
Delta-Betas
However, this is computationally intensive
- Fit model n times
There is an approximation that uses score residuals and the estimated variance-covariance matrix to calculate the delta-betas
Each subject has one for each covariate in the model
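To see how close the approximation is, the brute-force refits can be compared with `residuals(fit, type = "dfbeta")` on a small example. This is a sketch under assumptions: the data are simulated, there is a single covariate, and the exact delta-beta is taken as the full-data estimate minus the leave-one-out estimate. The sign convention of the approximation is assumed to match, so the comparison below uses the correlation rather than exact equality.

```r
library(survival)

# Simulated data (illustrative), kept small because the model is refitted n times
set.seed(6)
n <- 100
d <- data.frame(z = rnorm(n))
event_time  <- rexp(n, rate = exp(0.7 * d$z))
censor_time <- rexp(n, rate = 0.5)
d$time   <- pmin(event_time, censor_time)
d$status <- as.numeric(event_time <= censor_time)

fit <- coxph(Surv(time, status) ~ z, data = d)

# Exact delta-beta: refit n times, dropping one subject each time
exact <- sapply(seq_len(n), function(i) {
  fit_i <- coxph(Surv(time, status) ~ z, data = d[-i, ])
  coef(fit)[["z"]] - coef(fit_i)[["z"]]
})

# Approximation based on score residuals and the variance-covariance matrix
approx_db <- as.vector(residuals(fit, type = "dfbeta"))

agreement <- cor(exact, approx_db)
print(agreement)

plot(exact, approx_db, pch = 16,
     xlab = "Exact delta-beta (n refits)", ylab = "Approximate delta-beta")
abline(0, 1, col = 2, lwd = 2)
```

Points far from the origin in this plot are the subjects whose removal moves the coefficient the most, and they are the ones worth inspecting individually.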
70
Assessing Influence
> ### A look at delta-betas for influential points
> fit<-coxph(Surv(time, status)~age + factor(edema)+log(bili)+log(albumin))
> dfbeta<-residuals(fit, type="dfbeta")
> colnames(dfbeta)<-names(fit$coef)
> head(round(dfbeta, 5))
 age edema=0.5 edema=1 log(bili) log(albumin)
71
Influence Plots
> plot(pbc$id[-ids], dfbeta[,4], xlab="Patient ID", ylab="log(bilirubin) delta-beta", pch=16)
> pbc[ dfbeta[,"log(bili)"] < -.029, c(1,2,3,5,10,11,13)]
 id time status age edema log(bili) log(albumin)
72
Assessment of Influence
- Subject 81 is older and has high serum bilirubin (2 sd on log scale)
- Bilirubin is an important predictor of high risk, but subjects are in the upper 40th percentile of survival times
- We may want to do a sensitivity analysis with and without observations 81 and 362
- BUT unless we have very good reason (i.e. data entry error) to remove 81, we should not delete them