Lecture 16: Regression Diagnostics I. Proportional Hazards Assumption: graphical methods and regression methods.



Regression Diagnostics
Most interested in testing the proportional hazards assumption
Also looking for the functional form of covariates
Two types of methods:
– Graphical approaches
– Regression approaches

Graphical Approaches
Recall our graphical checks:
– Kernel smoothing
– Smoothing splines
Both provide information about whether or not the hazards cross.
Both can be implemented in R:
– "muhaz" package: kernel smoothing for survival
– "gss" package: smoothing splines for survival

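Graphical Approaches
Consider the Cox proportional hazards model (CPHM) with a single binary covariate Z: h(t|Z) = h0(t) exp{βZ}, so that H(t|Z=1) = H0(t) exp{β}.
This means we can also consider a plot of log H(t|Z=1) − log H(t|Z=0) against t.
If the hazards are proportional, this difference should be approximately equal to β at every t, i.e. a horizontal line.
There is no package in R for this plot:
– Calculate Nelson-Aalen (NA) cumulative hazard estimates for each group at each unique time point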
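The plot described above can be sketched in a few lines of R. This is a minimal illustration on simulated data, not the lecture's own code: the data frame `sim`, the true log hazard ratio `beta`, and the censoring rate are all assumptions made up for the example. It relies on `survfit()` returning the Nelson-Aalen cumulative hazard in its `cumhaz` component and uses `stepfun()` so both groups can be evaluated on a common time grid.

```r
library(survival)

set.seed(1)
n <- 200
z <- rbinom(n, 1, 0.5)                 # hypothetical binary covariate
beta <- 0.7                            # assumed true log hazard ratio
event_time <- rexp(n, rate = 0.1 * exp(beta * z))
cens_time <- rexp(n, rate = 0.03)
sim <- data.frame(time = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  z = z)

# Nelson-Aalen cumulative hazard estimate within each group
fit0 <- survfit(Surv(time, status) ~ 1, data = sim[sim$z == 0, ])
fit1 <- survfit(Surv(time, status) ~ 1, data = sim[sim$z == 1, ])

# Right-continuous step functions, equal to 0 before the first observed time
H0 <- stepfun(fit0$time, c(0, fit0$cumhaz))
H1 <- stepfun(fit1$time, c(0, fit1$cumhaz))

# Common grid: pooled event times at which both estimates are positive
grid <- sort(unique(sim$time[sim$status == 1]))
grid <- grid[H0(grid) > 0 & H1(grid) > 0]
log_diff <- log(H1(grid)) - log(H0(grid))

# Under proportional hazards the step curve should hover around the Cox estimate
b_hat <- unname(coef(coxph(Surv(time, status) ~ z, data = sim)))
plot(grid, log_diff, type = "s", xlab = "time",
     ylab = "log H(t|Z=1) - log H(t|Z=0)")
abline(h = b_hat, col = 2, lwd = 2)
```

Evaluating both step functions on one grid avoids the manual index padding needed when the two groups have different sets of observed times.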

Examples
Let's explore the graphical and regression checks for proportional hazards:
– Kidney infection data: surgical vs. percutaneous
– BMT data: FAB classification, methotrexate use

Survival for Kidney

Graphical Checks
### KIDNEY INFECTION EXAMPLE ###
# log cumulative hazard plots
library(KMsurv); library(survival)
data(kidney)
coxph(Surv(time, delta)~factor(type), data=kidney)
dat1<-kidney[kidney$type==1, ]
dat2<-kidney[kidney$type==2, ]
# cumulative hazard estimates from a null Cox model fitted within each group
fit1<-survfit(coxph(Surv(time, delta)~1, data=dat1), type="aalen")
fit2<-survfit(coxph(Surv(time, delta)~1, data=dat2), type="aalen")
times<-sort(unique(kidney$time))
ch1<--log(fit1$surv)
ch2<--log(fit2$surv)
# pad each group's estimate so both line up with the pooled unique times
ch1<-c(0, ch1[1:17],ch1[17],ch1[17],ch1[18:20],ch1[20],ch1[21:23],ch1[23])
ch2<-c(ch2[1:13],ch2[13],ch2[14:19],ch2[19],ch2[20],ch2[20],ch2[21:23],ch2[23],ch2[24])
plot(times, log(ch2)-log(ch1), type="s", xlab="time", ylab="log(H[t|Z=perc])-log(H[t|Z=surg])", lwd=2)
# horizontal reference line at 0.613 (presumably the coefficient from the Cox fit above)
lines(times, rep(0.613, length(times)), lwd=2, col=2)

Graphical Checks
# Smoothing splines
library(survival); library(gss)
cath<-kid$cath[order(kid$Time)]
event<-kid$d[order(kid$Time)]
times<-sort(kid$Time)
# hazard model with a catheter-by-time interaction, so the two hazards may differ in shape
hazfit<-sshzd(Surv(times, event)~cath*times)
haz<-hzdrate.sshzd(hazfit, data.frame(times=times, cath=cath))
h1<-haz[cath==1]; id1<-order(h1); t1<-times[cath==1]
h2<-haz[cath==0]; id2<-order(h2); t2<-times[cath==0]
plot(times[cath==0], haz[cath==0], xlim=c(0, max(times)), ylim=range(haz), xlab="Time", type="l", ylab="hazard", lwd=2, col=1)
lines(times[cath==1], haz[cath==1], lwd=2, col=2)
# two curves, so two legend colours
legend(0, .1, c("percutaneous","surgical"), col=1:2, lwd=2, cex=0.8)

Graphical Checks: Kidney

BMT Data
Let's conduct graphical checks for:
– French/American/British (FAB) disease classification. Recall this was significant in our original model.
– Methotrexate (MTX) use. Recall this was not significant.

Survival Curves for BMT: FAB and MTX

BMT Graphical Checks: FAB

BMT Graphical Checks: MTX

Graphical Approaches
Pretty pictures are nice and can be intuitive, but…
We generally prefer a statistical means of assessing whether an assumption holds.
This leads us to regression approaches.

Regression Approaches
Introduce a time-dependent covariate into the model.
General idea:
– If the PH model is valid, the time-dependent covariate will not be significant
– If the time-dependent covariate is significant, then there is "something" going on in terms of the HRs that varies over time

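Introduce Time Dependent Covariate
Create a new variable Z2(t) = Z1 × g(t), where g(t) is a function of time.
We don't know the functional form of g(t).
Try several possibilities, for example g(t) = log(t), g(t) = t, or a step function g(t) = I(t > c) for a chosen cutoff c (these are the forms used in the example code that follows).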

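Binary Case
Consider a binary covariate Z1.
Generate Z2(t) = Z1 × g(t).
Model is: h(t|Z(t)) = h0(t) exp{β1 Z1 + β2 Z1 g(t)}
Hazard ratio (Z1 = 1 vs. Z1 = 0) is: exp{β1 + β2 g(t)}, which is constant in t only if β2 = 0.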

New Time Dependent Covariate
Fit the proportional hazards model with Z1 and Z2(t), and estimate β1 and β2.
Test the local hypothesis:
– H0: β2 = 0 vs. HA: β2 ≠ 0
If you reject H0, you cannot assume proportional hazards.
Do this for each covariate in question.
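A minimal sketch of this test in R, on simulated data. It swaps in `survSplit()` from the survival package to build the counting-process (start, stop] data set, in place of the `expand.breakpoints()` helper used in the lecture code. The data frame `sim`, the choice g(t) = log(t), and all simulation settings are assumptions for illustration only.

```r
library(survival)

set.seed(2)
n <- 300
z1 <- rbinom(n, 1, 0.5)                      # hypothetical binary covariate
event_time <- rexp(n, rate = 0.1 * exp(0.5 * z1))
cens_time <- runif(n, 5, 30)
sim <- data.frame(time = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  z1 = z1)

# Split each subject's follow-up at every observed event time
cuts <- sort(unique(sim$time[sim$status == 1]))
long <- survSplit(Surv(time, status) ~ ., data = sim, cut = cuts,
                  episode = "interval")

# Z2(t) = Z1 * g(t) with g(t) = log(t), evaluated at the end of each interval
long$z2 <- long$z1 * log(long$time)

# Wald test of H0: beta2 = 0 is the row for z2 in the coefficient table
fit <- coxph(Surv(tstart, time, status) ~ z1 + z2, data = long)
wald_p <- summary(fit)$coefficients["z2", "Pr(>|z|)"]
print(fit)
```

Because the simulated hazards here are proportional by construction, a large p-value for `z2` is the expected outcome; with real data a small p-value would argue against proportional hazards for that covariate.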

Examples
Let's explore the regression check for proportional hazards in our two examples:
– Kidney infection data: surgical vs. percutaneous
– BMT data: FAB classification, methotrexate use

Regression Check: Kidney
### Kidney Example (are hazards for percutaneous and surgical proportional?)
# assumes expand.breakpoints() has been sourced separately; it is not part of the survival package
times<-sort(unique(kidney$time))
kidney$id<-1:nrow(kidney)
kid.long<-expand.breakpoints(kidney, index="id", status="delta", tevent="time", breakpoints=times)
# time-dependent covariates Z2(t) = Z1*g(t) for four choices of g(t)
kid.long$ttype1<-log(kid.long$Tstop)*(kid.long$type-1)
kid.long$ttype2<-kid.long$Tstop*(kid.long$type-1)
kid.long$ttype3<-ifelse(kid.long$Tstop>7.5, (kid.long$type-1), 0)
kid.long$ttype4<-ifelse(kid.long$Tstop>7.5, kid.long$Tstop*(kid.long$type-1), 0)
m1<-coxph(Surv(Tstart, Tstop, delta)~type+ttype1, data=kid.long)
m2<-coxph(Surv(Tstart, Tstop, delta)~type+ttype2, data=kid.long)
m3<-coxph(Surv(Tstart, Tstop, delta)~type+ttype3, data=kid.long)
m4<-coxph(Surv(Tstart, Tstop, delta)~type+ttype4, data=kid.long)

Results: Kidney
[Output of m1, m2, m3 and m4: coef, exp(coef), se(coef), z and p for type and for the time-dependent term ttype1 to ttype4 respectively; numeric values not preserved.]

BMT Regression Check: FAB
### BMT Example (are hazards for FAB classes proportional?)
bps<-sort(unique(c(bmt$DFS)))
bmt.long<-expand.breakpoints(bmt, index="id", status="Either", tevent="DFS", breakpoints=bps)
# create time-dependent covariates
bmt.long$txfab1<-log(bmt.long$Tstop)*(bmt.long$FAB)
bmt.long$txfab2<-bmt.long$Tstop*(bmt.long$FAB)
bmt.long$txfab3<-ifelse(bmt.long$Tstop>100, (bmt.long$FAB), 0)
bmt.long$txfab4<-ifelse(bmt.long$Tstop>100, bmt.long$Tstop*(bmt.long$FAB), 0)
m1<-coxph(Surv(Tstart, Tstop, Either)~FAB+txfab1, data=bmt.long)
m2<-coxph(Surv(Tstart, Tstop, Either)~FAB+txfab2, data=bmt.long)
m3<-coxph(Surv(Tstart, Tstop, Either)~FAB+txfab3, data=bmt.long)
m4<-coxph(Surv(Tstart, Tstop, Either)~FAB+txfab4, data=bmt.long)

Results: FAB
[Output of m1, m2, m3 and m4: coef, exp(coef), se(coef), z and p for FAB and for the time-dependent term txfab1 to txfab4 respectively; numeric values not preserved.]

BMT Regression Check: MTX
# create time-dependent covariates for MTX
bmt.long$txmtx1<-log(bmt.long$Tstop)*(bmt.long$MTX)
bmt.long$txmtx2<-bmt.long$Tstop*(bmt.long$MTX)
bmt.long$txmtx3<-ifelse(bmt.long$Tstop>400, (bmt.long$MTX), 0)
bmt.long$txmtx4<-ifelse(bmt.long$Tstop>400, bmt.long$Tstop*(bmt.long$MTX), 0)
m1<-coxph(Surv(Tstart, Tstop, Either)~MTX+txmtx1, data=bmt.long)
m2<-coxph(Surv(Tstart, Tstop, Either)~MTX+txmtx2, data=bmt.long)
m3<-coxph(Surv(Tstart, Tstop, Either)~MTX+txmtx3, data=bmt.long)
m4<-coxph(Surv(Tstart, Tstop, Either)~MTX+txmtx4, data=bmt.long)

Results: MTX
[Output of m1, m2, m3 and m4: coef, exp(coef), se(coef), z and p for MTX and for the time-dependent term txmtx1 to txmtx4 respectively; numeric values not preserved.]

Checking Model with >1 Covariate?
### Model where MTX is not time-varying
> m5a<-coxph(Surv(Tstart, Tstop, Either)~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge+MTX, data=bmt.long)
> m5a
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + MTX, data = bmt.long)
[Coefficient table (coef, exp(coef), se(coef), z, p) with rows factor(Disease) (two rows), FAB, PtAge, DonAge, MTX, PtAge:DonAge; numeric values not preserved.]
Likelihood ratio test=34.2 on 7 df, p=1.58e-05
n= 8665, number of events= 83

Checking Model with >1 Covariate?
### Model where MTX is time-varying
> m5<-coxph(Surv(Tstart, Tstop, Either)~factor(Disease) + FAB + PtAge + DonAge + PtAge*DonAge+MTX+txmtx1, data=bmt.long)
> m5
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + MTX + txmtx1, data = bmt.long)
[Coefficient table (coef, exp(coef), se(coef), z, p) with rows factor(Disease) (two rows), FAB, PtAge, DonAge, MTX, txmtx1, PtAge:DonAge; numeric values not preserved.]
Likelihood ratio test=39.5 on 8 df, p=3.93e-06
n= 8665, number of events= 83

Alternative Form of Time-Varying Covariate
So far we've guessed at g(t).
The problem is we don't necessarily know the correct functional form.
Consider a binary covariate, Z1, and assume that for Z1 the relative risk changes over time.
What if we use the data instead?
– Get the "best estimate" from the data

Change Point Model
Let Z2(t) = Z1 if t > τ, and Z2(t) = 0 if t ≤ τ.
This gives us a proportional hazards model with a change point at τ (Liang et al. (1990)).
Fit the proportional hazards model with Z1 and Z2(t). We now have a PH model whose HR differs before and after τ:
– Model: h(t|Z(t)) = h0(t) exp{β1 Z1 + β2 Z2(t)}
– h(t|Z(t)) = h0(t) exp{β1 Z1} if t ≤ τ
– h(t|Z(t)) = h0(t) exp{(β1 + β2) Z1} if t > τ
So we are fitting a PH model that includes a change point, which allows the HR to change after a specified time.

How to Determine τ
A change point for the relative risk was introduced. Where is the best change point?
Recall that the partial likelihood only changes at event times.
Calculate the log partial likelihood for each candidate τ, where τ ranges over the observed event times.
Choose the τ that yields the largest log-likelihood.
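The search described above can be sketched as a simple loop over candidate event times. This is an illustration on simulated data with an assumed true change point at t = 4; `survSplit()` again stands in for `expand.breakpoints()`, and trimming the candidates to the central 80% of event times is a choice made here to avoid fits where Z2(t) varies in almost no risk sets, not something the lecture prescribes.

```r
library(survival)

set.seed(3)
n <- 200
z1 <- rbinom(n, 1, 0.5)                       # hypothetical binary covariate
# Piecewise-exponential event times: log HR is 0.2 before t = 4 and -1.2 after
e <- rexp(n)
r_early <- 0.1 * exp(0.2 * z1)
r_late <- 0.1 * exp(-1.2 * z1)
event_time <- ifelse(e < 4 * r_early, e / r_early,
                     4 + (e - 4 * r_early) / r_late)
cens_time <- runif(n, 5, 40)
sim <- data.frame(time = pmin(event_time, cens_time),
                  status = as.integer(event_time <= cens_time),
                  z1 = z1)

# Counting-process data split at every event time
event_times <- sort(unique(sim$time[sim$status == 1]))
long <- survSplit(Surv(time, status) ~ ., data = sim, cut = event_times)

# Candidate change points: central 80% of the event times
q <- quantile(event_times, c(0.1, 0.9))
candidates <- event_times[event_times >= q[1] & event_times <= q[2]]

# Log partial likelihood of the change-point model at each candidate tau
loglik <- sapply(candidates, function(tau) {
  long$z2 <- ifelse(long$time > tau, long$z1, 0)   # Z2(t) = Z1 * I(t > tau)
  coxph(Surv(tstart, time, status) ~ z1 + z2, data = long,
        ties = "breslow")$loglik[2]
})

tau_hat <- candidates[which.max(loglik)]
plot(candidates, loglik, type = "s", xlab = "candidate change point",
     ylab = "log partial likelihood")
abline(v = tau_hat, col = 2)
```

Plotting the whole profile rather than only reporting the maximum shows how flat the likelihood is around the chosen change point.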

Kidney Example
# Change point model
cps<-sort(unique(kidney$time[which(kidney$delta==1)]))
LL<-c()
for (i in 1:length(cps))
{
  # Z2(t) equals type after the candidate change point and 0 before it
  z2<-ifelse(kid.long$Tstop>cps[i], kid.long$type, 0)
  mod<-coxph(Surv(Tstart, Tstop, delta)~type+z2, data=kid.long, method="breslow")
  LL<-append(LL, mod$loglik[2])
}
round(LL, digits=3)
[Printed log-likelihood values not preserved.]

Change Point Results
[Table with columns Event Times and Log Partial Likelihood; values not preserved.]

What About Multiple Comparisons?
Is it "fishing" to try many cutpoints?
No, we are conducting diagnostics, so we don't worry so much.
We aren't sure of the form of the time dependence, so we are being flexible in order to identify whether we are missing something.

Hazards Not Proportional
The proportional hazards assumption doesn't hold… what can we do?
Single binary covariate: consider a piecewise regression
– Change point identified by the data (alternate coding)
Many covariates: consider a model stratified on the non-proportional covariate

Kidney: 1-Covariate with Change-Point
> kid.long$z2<-ifelse(kid.long$Tstop>3.5, kid.long$cath, 0)
> kid.long$z3<-ifelse(kid.long$Tstop<=3.5, kid.long$cath, 0)
> mod<-coxph(Surv(Tstart, Tstop, d)~z2+z3, data=kid.long, method="breslow")
> mod
Call:
coxph(formula = Surv(Tstart, Tstop, d) ~ z2 + z3, data = kid.long, method = "breslow")
[Coefficient table (coef, exp(coef), se(coef), z, p) with rows z2 and z3; numeric values not preserved.]
Likelihood ratio test=13.9 on 2 df, p=
n= 1132, number of events= 26

Interpretation?
Up to 3.5 months, there is not a significant difference in risk of infection between the two groups.
However, after 3.5 months the risk of infection in patients with percutaneously placed catheters is 0.12 times the risk in patients with surgically placed catheters.
Recall our hazard rate plots:
– Hazards crossed at about 3.5 months

A Few Points
A single cutpoint may not be enough.
– There are "two" models
– Within each piece, we are still assuming proportional hazards
Check the proportional hazards assumption within each of the time intervals.
We can generate additional time-varying covariates within each interval.

Stratified Cox Regression
Recall stratification:
– Estimates the "pooled" association across strata
Stratification in regression:
– Estimates a pooled regression coefficient
– Strong assumption that associations between covariate and outcome are the same across strata

Estimation: Partial Likelihood Approach
Partition the dataset based on strata.
Define a partial log-likelihood per stratum.
The overall log-likelihood is based on the J strata.
Maximize LL(β) w.r.t. β.
Notice β is common across all the stratum-specific partial log-likelihoods.
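The steps above are usually written as follows. This is a sketch in standard notation, which is an assumption here because the slide's own formulas were not preserved: h_{0j} is the baseline hazard in stratum j, t_{j1}, ..., t_{jD_j} are the distinct event times in stratum j (no ties assumed), Z_{j(i)} is the covariate vector of the subject failing at t_{ji}, and R_j(t) is the set of subjects in stratum j still at risk at time t.

```latex
\begin{align*}
h_j(t \mid Z) &= h_{0j}(t)\,\exp(\beta^\top Z), \qquad j = 1,\dots,J \\
LL_j(\beta) &= \sum_{i=1}^{D_j}\left[\beta^\top Z_{j(i)}
  - \log \sum_{k \in R_j(t_{ji})} \exp\!\left(\beta^\top Z_{jk}\right)\right] \\
LL(\beta) &= \sum_{j=1}^{J} LL_j(\beta)
\end{align*}
```

Each stratum contributes its own risk sets and its own baseline hazard, but the same β appears in every term, which is what makes the estimate "pooled".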

BMT: Stratified Model
Steps:
1) Check the proportional hazards assumption
2) Fit a stratified Cox model
3) Check model assumptions (i.e. constant β's… more on this in a moment)
We've already seen that the proportional hazards assumption for methotrexate use is incorrect.

BMT Data
Associations between covariates and DFS, stratified by methotrexate (MTX) use
> reg2a<-coxph(Surv(Tstart, Tstop, Either)~factor(Disease)+FAB+DonAge+PtAge+DonAge*PtAge+PRt+strata(MTX), data = bmt.long2)
> reg2a
Call:
coxph(formula=Surv(Tstart, Tstop,Either)~factor(Disease)+FAB+DonAge+ PtAge+DonAge*PtAge+PRt+strata(MTX), data = bmt.long2)
[Coefficient table (coef, exp(coef), se(coef), z, p) with rows factor(Disease) (two rows), FAB, DonAge, PtAge, PRt, DonAge:PtAge; numeric values not preserved.]
Likelihood ratio test=33.4 on 7 df, p=2.23e-05
n= 19070, number of events= 83

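Is Assumption of Constant β Reasonable?
Testing the assumption:
– Divide the dataset into J strata
– Fit a model with p covariates in each stratum
– Define the sum of the J stratum-specific maximized log partial likelihoods
– Define the maximized log partial likelihood based on the stratified model
– Test significance via a likelihood ratio test (LRT)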
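Written out, the comparison takes the following form. This is a sketch under assumed notation, since the slide's formulas were not preserved: LL_j(β̂_j) is the maximized log partial likelihood from the separate fit in stratum j, and LL(β̂) is the maximized log partial likelihood of the stratified model with a common β.

```latex
\begin{align*}
X^2 &= -2\left[\,LL(\hat\beta) - \sum_{j=1}^{J} LL_j(\hat\beta_j)\right] \\
X^2 &\sim \chi^2_{(J-1)p} \quad \text{approximately, under } H_0:\ \beta_1 = \dots = \beta_J
\end{align*}
```

The separate fits estimate Jp coefficients and the stratified model estimates p, which gives the (J−1)p degrees of freedom. For the BMT example with J = 2 MTX strata and p = 7 coefficients, this is 7 degrees of freedom.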

Checking Constant β Assumption
# Testing stratification assumption
reg2a<-coxph(Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + DonAge + PtAge + DonAge*PtAge + PRt + strata(MTX), data=bmt.long2)
# separate fits within each MTX stratum
dat1<-bmt.long2[bmt.long2$MTX==0,]
reg2b<-coxph(Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + DonAge + PtAge + DonAge*PtAge + PRt + strata(MTX), data=dat1)
dat2<-bmt.long2[bmt.long2$MTX==1,]
reg2c<-coxph(Surv(Tstart, Tstop, Either) ~ factor(Disease) + FAB + DonAge + PtAge + DonAge*PtAge + PRt + strata(MTX), data=dat2)
# compare the stratified fit with the sum of the per-stratum fits
LL.strat<-reg2a$loglik[2]
LL.piece<-reg2b$loglik[2]+reg2c$loglik[2]
lrt<-2*(LL.piece-LL.strat)
# df = (J-1)*p = (2-1)*7 = 7
p.lrt<-1-pchisq(lrt, 7)

> reg2b
[Coefficient table (coef, exp(coef), se(coef), z, p) with rows factor(Disease) (two rows), FAB, DonAge, PtAge, PRt, DonAge:PtAge; numeric values not preserved.]
> reg2c
[Coefficient table with the same rows; numeric values not preserved.]

Results
> LL.piece<-reg2b$loglik[2]+reg2c$loglik[2]
> LL.strat<-reg2a$loglik[2]
> lrt<-2*(LL.piece-LL.strat)
> lrt
[1] [value not preserved]
> p.lrt<-1-pchisq(lrt, 7)
> p.lrt
[1] [value not preserved]

Next Time Regression diagnostics using residuals!