Cox Proportional Hazards Model and Its Extension with SPSS

Jinheum Kim (Univ. Suwon)

Outline
The Cox PH model and its characteristics
Evaluating the PH assumption
The stratified Cox model
Extension of the Cox model for time-dependent variables

Illustrative example: leukemia remission data
Survival time: time in weeks a patient is in remission
  failure: going out of remission (relapse=1)
  censored: remains in remission until the end of the study, is lost to follow-up, or withdraws before the end of the study (relapse=0)
Two groups of leukemia patients: 0=treatment, 1=placebo
Prognostic factors: log WBC, sex (0=female, 1=male)
Group of log WBC: 1=0-2.3, 2=2.3-3, 3=3+
Source: Freireich et al. (Blood, 1963)

The Cox PH model is defined as the hazard at time $t$ for an individual with a set of explanatory variables $X$
  $X = (X_1, \ldots, X_p)$: explanatory/predictor variables
  $h(t, X) = h_0(t) \, e^{\sum_{i=1}^{p} \beta_i X_i}$
Product of two quantities
  baseline hazard function $h_0(t)$: involves $t$ but not the $X$'s, unspecified (non-parametric!)
  exponential part: involves the $X$'s but not $t$ (the $X$'s are time-independent)
The Cox model is semi-parametric
If $X_1 = \cdots = X_p = 0$, i.e., no $X$'s in the model, $h(t, X) = h_0(t)$

Why is the Cox PH model popular?
It is robust, i.e., its results will closely approximate those of the correct parametric model
  when in doubt, the Cox model is a safe choice
It ensures that the fitted model always gives estimated hazards that are non-negative
Even though $h_0(t)$ is unspecified, we can estimate the $\beta$'s (or hazard ratios), and also $h(t, X)$ and its corresponding survival function $S(t, X)$

ML estimation of the Cox PH model
ML estimates: maximize the likelihood function $L$
  $L$ = joint probability of the observed data = $L(\beta)$
$L$ is actually called a partial likelihood function rather than a (complete) likelihood function
  considers probabilities only for subjects who fail
  does not consider probabilities for subjects who are censored

$L = L_1 \times L_2 \times \cdots \times L_k$
  $L_j$: portion of $L$ for the $j$th failure time given the risk set $R(t_j)$
  $R(t_j)$: set of individuals at risk at the $j$th failure time (called the risk set)
  $k$: number of failure times
Information on censored subjects is used prior to censorship: a person who is censored after the $j$th failure time is part of the risk set used to compute $L_j$, even though this person is censored later
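
The product form of $L$ can be made concrete with a small sketch. It is only an illustration, assuming a single covariate and no tied failure times; the function name and the toy data are hypothetical and are not the leukemia data.

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """ln L for a one-covariate Cox model, assuming no tied failure times."""
    ll = 0.0
    for j, (t_j, d_j) in enumerate(zip(times, events)):
        if d_j != 1:
            continue  # a censored subject contributes no factor L_j of its own
        # Risk set R(t_j): everyone still under observation at time t_j,
        # including subjects who are censored later.
        risk = [i for i, t in enumerate(times) if t >= t_j]
        denom = sum(math.exp(beta * x[i]) for i in risk)
        ll += beta * x[j] - math.log(denom)
    return ll

# Hypothetical toy data: four subjects, the third one censored at week 5.
times = [2, 4, 5, 7]
events = [1, 1, 0, 1]
x = [1, 0, 1, 0]

# At beta = 0 every subject has the same weight, so ln L = -ln(4 * 3 * 1).
print(log_partial_likelihood(0.0, times, events, x))
```

The censored subject never produces a term of its own, yet it sits in the risk sets at weeks 2 and 4, which is the sense in which its information is used prior to censorship.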

Steps for obtaining ML estimates
  form $L$ from the model
  maximize $\ln L$ by solving $\partial \ln L / \partial \beta_i = 0$, $i = 1, \ldots, p$
Solution by iteration
  guess at a solution
  modify the guess in successive steps
  stop when the solution is obtained
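
One common way to carry out the guess, modify, stop cycle is Newton-Raphson. The slides do not say which iterative method is used, so the choice here is an assumption, as are the function names and the toy data (one covariate, no ties, no censoring).

```python
import math

def score_and_information(beta, times, events, x):
    """d ln L / d beta and minus the second derivative (one covariate, no ties)."""
    score, info = 0.0, 0.0
    for j, (t_j, d_j) in enumerate(zip(times, events)):
        if d_j != 1:
            continue
        risk = [i for i, t in enumerate(times) if t >= t_j]
        w = [math.exp(beta * x[i]) for i in risk]
        total = sum(w)
        mean_x = sum(wi * x[i] for wi, i in zip(w, risk)) / total
        mean_x2 = sum(wi * x[i] ** 2 for wi, i in zip(w, risk)) / total
        score += x[j] - mean_x          # observed minus weighted average
        info += mean_x2 - mean_x ** 2   # weighted variance in the risk set
    return score, info

def fit_cox_newton(times, events, x, beta=0.0, tol=1e-8, max_iter=50):
    """Start from a guess, update it, stop once the score is numerically zero."""
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        if abs(score) < tol:
            break
        beta += score / info
    return beta

# Hypothetical toy data: six subjects, all observed to fail.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 1, 0, 0]

beta_hat = fit_cox_newton(times, events, x)
print(beta_hat, math.exp(beta_hat))  # estimated coefficient and hazard ratio
```

The sketch has no safeguard for data where the estimate is infinite (for example, when the subject who fails always has the largest covariate value in its risk set); real software adds step-halving and convergence checks.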

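Statistical inference for $\beta$ or the hazard ratio ($HR$)
  in a model with no interaction term involving a 0-1 exposure variable, $HR$ is defined as the exponentiation of the coefficient of the exposure variable
  estimated hazard ratio: $\widehat{HR} = e^{\hat\beta}$
Test hypothesis: $H_0: e^{\beta} = 1 \Leftrightarrow \beta = 0$
  use the Wald test or the LR test
95% CI for $e^{\beta}$: $\exp\{ \hat\beta \pm z \, \mathrm{se}(\hat\beta) \}$, with $z = 1.96$
95% CI for $\beta$: $\hat\beta \pm z \, \mathrm{se}(\hat\beta)$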
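
A short sketch of the Wald test and the two confidence intervals follows. The estimate and standard error passed in are hypothetical numbers chosen only to exercise the formulas, and the function name is not from any package.

```python
import math

def wald_summary(beta_hat, se, z=1.96):
    """Wald statistic, two-sided p-value, and 95% CIs for beta and exp(beta)."""
    wald_z = beta_hat / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(wald_z) / math.sqrt(2))
    ci_beta = (beta_hat - z * se, beta_hat + z * se)
    # The CI for the hazard ratio is the exponentiated CI for beta.
    ci_hr = (math.exp(ci_beta[0]), math.exp(ci_beta[1]))
    return {"hr": math.exp(beta_hat), "z": wald_z, "p": p_value,
            "ci_beta": ci_beta, "ci_hr": ci_hr}

# Hypothetical estimate and standard error.
summary = wald_summary(beta_hat=1.5, se=0.4)
print(summary)
```

Because the interval is built on the $\beta$ scale and then exponentiated, the interval for $e^{\beta}$ is not symmetric around $\widehat{HR}$.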

Computing the hazard ratio & adjusted survival curve using the Cox PH model
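Estimated hazard ratio: $\widehat{HR} = \dfrac{\hat h(t, X^*)}{\hat h(t, X)} = \exp\left\{ \sum_{i=1}^{p} \hat\beta_i (X_i^* - X_i) \right\}$
  $X^* = (X_1^*, \ldots, X_p^*)$, $X = (X_1, \ldots, X_p)$: sets of $X$'s for two individuals
Example
  $R_x$-only model with $X^* = 1$, $X = 0$: $\widehat{HR} = e^{\hat\beta_1}$
  $R_x$ + log WBC model with $X^* = (1, \log \mathrm{WBC}^*)$, $X = (0, \log \mathrm{WBC})$: $\widehat{HR} = \exp\{ \hat\beta_1 + \hat\beta_2 (\log \mathrm{WBC}^* - \log \mathrm{WBC}) \}$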
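
The general formula can be evaluated directly. In this sketch the coefficient values are hypothetical placeholders, not estimates from the leukemia data, and `hazard_ratio` is an illustrative name.

```python
import math

def hazard_ratio(beta, x_star, x):
    """exp{ sum_i beta_i * (x_star_i - x_i) } for two covariate vectors."""
    return math.exp(sum(b * (xs - xi) for b, xs, xi in zip(beta, x_star, x)))

# Hypothetical coefficients for (Rx, log WBC).
beta = [1.3, 1.6]

# Same log WBC in both individuals: only the Rx coefficient remains.
print(hazard_ratio(beta, x_star=[1, 2.5], x=[0, 2.5]))
# Different log WBC: the second coefficient enters through the difference.
print(hazard_ratio(beta, x_star=[1, 3.0], x=[0, 2.5]))
```

Note that $h_0(t)$ never appears: it cancels in the ratio, which is why the hazard ratio can be estimated without specifying the baseline hazard.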

Computing the hazard ratio & adjusted survival curve using the Cox PH model
No model: use KM curves
Cox model: adjusted survival curves (also step functions)
Cox PH survival model: $S(t, X) = [S_0(t)]^{\exp\left( \sum_{i=1}^{p} \beta_i X_i \right)}$
  $S_0(t)$: baseline survival function
Estimated survival function: $\hat S(t, X) = [\hat S_0(t)]^{\exp\left( \sum_{i=1}^{p} \hat\beta_i X_i \right)}$
Typically, use $X = \bar X$ or $X_{\text{median}}$
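
As a sketch of the adjusted survival formula, the function below raises a baseline survival curve to the power $\exp(\sum_i \hat\beta_i X_i)$. The baseline values and coefficients are hypothetical; in practice $\hat S_0(t)$ comes from the fitted model.

```python
import math

def adjusted_survival(s0, beta, x):
    """S(t, X) = S0(t) ** exp(sum_i beta_i * x_i), evaluated at each time point."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return [s ** math.exp(linear_predictor) for s in s0]

# Hypothetical baseline survival at successive failure times.
s0 = [1.0, 0.9, 0.7, 0.5]
beta = [1.3, 1.6]

print(adjusted_survival(s0, beta, x=[0, 0]))    # reproduces the baseline curve
print(adjusted_survival(s0, beta, x=[1, 0.5]))  # higher hazard, lower survival
```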

The meaning of the PH assumption
PH: $HR$ is constant over time, i.e., does not involve $t$
If the hazards cross, then a Cox PH model is not appropriate

Leukemia remission data: revisited

Checking the PH assumption
Three approaches
  graphical
    log-log survival curves: parallel?
    observed vs. expected curves: close?
  goodness-of-fit (GOF) test
  time-dependent variables (later!)

Graphical approach: log-log plots
Recall the Cox PH survival model: $S(t, X) = [S_0(t)]^{\exp\left( \sum_{i=1}^{p} \beta_i X_i \right)}$
log-log transform: $\ln[-\ln S(t, X)] = \sum_{i=1}^{p} \beta_i X_i + \ln[-\ln S_0(t)]$
The PH model is appropriate if empirical plots of log-log survival curves are parallel
  two individuals: $X_1 = (X_{11}, \ldots, X_{1p})$, $X_2 = (X_{21}, \ldots, X_{2p})$
  $-\ln[-\ln S(t, X_1)] = -\ln[-\ln S(t, X_2)] + \sum_{i=1}^{p} \beta_i (X_{2i} - X_{1i})$
Empirical plots: use $-\ln[-\ln \hat S]$
  $\hat S$ is a KM curve, or
  $\hat S$ is an adjusted survival curve for predictors satisfying the PH assumption; the predictor being assessed is not included in the model
Problems
  How parallel is parallel?
  How to categorize a continuous variable?
  How to evaluate several variables simultaneously?
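
The parallelism claim can be checked numerically. The sketch below builds two survival curves that satisfy PH exactly, using a hypothetical baseline curve and coefficient, and shows that the vertical gap between their log-log curves is the same at every time point.

```python
import math

def log_minus_log(s):
    """ln[-ln S]; requires 0 < S < 1."""
    return math.log(-math.log(s))

# Hypothetical baseline survival (the curve for X = 0) and coefficient.
s0 = [0.95, 0.80, 0.60, 0.40]
beta = 1.2

# Under PH, the curve for X = 1 is S0(t) ** exp(beta).
s1 = [s ** math.exp(beta) for s in s0]

# The gap between the two log-log curves at each time point.
gaps = [log_minus_log(b) - log_minus_log(a) for a, b in zip(s0, s1)]
print(gaps)  # each entry equals beta up to rounding error
```

With real data the curves are estimated, so the gap is only roughly constant, which is exactly the "how parallel is parallel?" problem.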

Graphical approach: log-log plots

Graphical approach: observed vs. expected plots
Observed plot
  stratify the data by categories of the predictor to be assessed
  obtain KM curves separately for each category
Expected plot
  fit a Cox PH model containing the predictor being assessed
  separately substitute the value of each category of the predictor into the formula for the estimated survival curve
Problems: how close is close?

Two options for expected plots for a continuous variable
  Use a Cox PH model with $(k-1)$ dummy variables for $k$ categories and obtain adjusted survival curves, i.e., $\hat S(t, X_c) = [\hat S_0(t)]^{\exp\left( \sum_{i=1}^{k-1} \hat\beta_i X_{ci} \right)}$
    $X_c = (X_{c1}, \ldots, X_{c,k-1})$ gives the values of the dummy variables for category $c$ ($c = 1, \ldots, k$)
  Use a Cox PH model containing the continuous predictor and obtain adjusted survival curves, i.e., $\hat S(t, \bar X_c) = [\hat S_0(t)]^{\exp( \hat\beta \bar X_c )}$
    $\bar X_c$: mean value of the variable $X$ within category $c$

GOF testing approach
Schoenfeld residuals are defined for (i) each predictor in the model and (ii) every subject who has an event
Example: $h(t) = h_0(t) \exp( \beta_1 R_x + \beta_2 \log \mathrm{WBC} + \beta_3 \mathrm{SEX} )$
  Schoenfeld residual for the $i$th subject for log WBC = observed log WBC minus the weighted average of log WBC for the other subjects still at risk at time $t$
  the weights are each subject's hazard
Underlying idea of the test: if PH holds, then the Schoenfeld residuals should be uncorrelated with survival time
  gives a more clear-cut decision than the graphical approaches
Drawbacks: a global test may fail to detect a specific kind of departure from PH
Recommendation: use both graphs and tests
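
The residual definition can be sketched for a single covariate. This is an illustration under stated assumptions: no tied failure times, weights proportional to $\exp(\beta x_i)$ (the baseline hazard cancels), and a risk set that includes the failing subject, as in the usual definition. The names and toy data are hypothetical.

```python
import math

def schoenfeld_residuals(beta, times, events, x):
    """(failure time, residual) pairs: observed x minus its risk-set weighted mean."""
    residuals = []
    for j, (t_j, d_j) in enumerate(zip(times, events)):
        if d_j != 1:
            continue  # residuals exist only for subjects who have the event
        risk = [i for i, t in enumerate(times) if t >= t_j]
        w = [math.exp(beta * x[i]) for i in risk]
        expected = sum(wi * x[i] for wi, i in zip(w, risk)) / sum(w)
        residuals.append((t_j, x[j] - expected))
    return residuals

# Hypothetical toy data: x plays the role of log WBC.
times = [2, 4, 5, 7]
events = [1, 1, 0, 1]
x = [2.0, 3.5, 1.0, 4.0]

for t, r in schoenfeld_residuals(0.0, times, events, x):
    print(t, r)
```

A GOF test would then examine whether these residuals are related to (ranked) failure time; that correlation step is left out of the sketch.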

The stratified Cox (SC) model
Assume that
  $Z_1, \ldots, Z_k$ do not satisfy the PH assumption
  $X_1, \ldots, X_p$ satisfy the PH assumption
Define a single new variable $Z^*$
  categorize each $Z_i$, $i = 1, \ldots, k$
  form combinations of categories (strata)
  the strata are the categories of $Z^*$
The SC model: $h_g(t, X) = h_{0g}(t) \exp\{ \beta_1 X_1 + \cdots + \beta_p X_p \}$, $g = 1, \ldots, k^*$
  $Z^*$ has $k^*$ categories, where $k^*$ = total number of combinations (strata)
  $Z^*$ is not included in the model
  different baseline hazard functions but the same coefficients

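The SC model
  different baselines give different survival curves: $\hat h_{01}(t) \Rightarrow \hat S_1(t), \ldots, \hat h_{0k^*}(t) \Rightarrow \hat S_{k^*}(t)$
  but $\widehat{HR}$ is the same for each stratum
Partial likelihood function: $L = L_1 \times \cdots \times L_{k^*}$
  $L_g$, $g = 1, \ldots, k^*$: likelihood derived from the hazard function $h_g(t, X)$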
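
The stratified likelihood can be sketched by forming risk sets within each stratum and adding the log-likelihoods, with one shared coefficient. The sketch assumes a single covariate and no ties; the stratum labels and data are hypothetical.

```python
import math

def stratum_log_pl(beta, times, events, x):
    """ln L_g for one stratum (one covariate, no tied failure times)."""
    ll = 0.0
    for j, (t_j, d_j) in enumerate(zip(times, events)):
        if d_j != 1:
            continue
        risk = [i for i, t in enumerate(times) if t >= t_j]
        ll += beta * x[j] - math.log(sum(math.exp(beta * x[i]) for i in risk))
    return ll

def stratified_log_pl(beta, strata):
    """ln L = sum over strata of ln L_g, with the same beta in every stratum."""
    return sum(stratum_log_pl(beta, *data) for data in strata.values())

# Hypothetical strata, each given as (times, events, x).
strata = {
    "female": ([2, 4], [1, 1], [1, 0]),
    "male": ([3, 5, 6], [1, 0, 1], [0, 1, 1]),
}

# At beta = 0: female gives -ln(2 * 1), male gives -ln(3 * 1), total -ln 6.
print(stratified_log_pl(0.0, strata))
```

Subjects are compared only with others in the same stratum, which is how the model allows a different baseline hazard $h_{0g}(t)$ per stratum while keeping one set of coefficients.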

How to test the no-interaction assumption
(No-interaction) SC model
  $\beta$ coefficients do not vary over strata
  no interaction of the variable $Z^*$ with $X_1, \ldots,$ or $X_p$
The SC model allowing interaction: $h_g(t, X) = h_{0g}(t) \exp\{ \beta_{1g} X_1 + \cdots + \beta_{pg} X_p \}$, $g = 1, \ldots, k^*$

Alternative SC model allowing interaction
  use product terms involving $Z^*$
  define $(k^* - 1)$ dummy variables $Z_1^*, \ldots, Z_{k^*-1}^*$ from $Z^*$
  form product terms $Z_i^* \times X_j$, $i = 1, \ldots, k^* - 1$; $j = 1, \ldots, p$
$h_g(t, X) = h_{0g}(t) \exp\{ \beta_1 X_1 + \cdots + \beta_p X_p + \beta_{11} (Z_1^* \times X_1) + \cdots + \beta_{p1} (Z_1^* \times X_p) + \cdots + \beta_{1,k^*-1} (Z_{k^*-1}^* \times X_1) + \cdots + \beta_{p,k^*-1} (Z_{k^*-1}^* \times X_p) \}$

Equivalence of two SC models with interaction
$h_g(t, X) = h_{0g}(t) \exp\{ \beta_{1g} \log \mathrm{WBC} + \beta_{2g} R_x \}$, $g = 1$ (female), $g = 2$ (male)
$h_g(t, X) = h_{0g}(t) \exp\{ \beta_1^* \log \mathrm{WBC} + \beta_2^* R_x + \beta_3^* (Z_1^* \times \log \mathrm{WBC}) + \beta_4^* (Z_1^* \times R_x) \}$, $Z_1^* = 1$ (female), $Z_1^* = 0$ (male)
For $g = 1$ (female): $\beta_{11} = \beta_1^* + \beta_3^*$; $\beta_{21} = \beta_2^* + \beta_4^*$
For $g = 2$ (male): $\beta_{12} = \beta_1^*$; $\beta_{22} = \beta_2^*$

Testing the no-interaction assumption
$LR = -2 \ln L_R - (-2 \ln L_F) \sim \chi^2_{(k^*-1) \times p}$ under $H_0$: no interaction
  $R$: no-interaction SC model
  $F$: SC model with interaction
  $df_R = n - p$; $df_F = n - (k^* \times p)$, or equivalently $n - (p + (k^* - 1) \times p)$
  $df_R - df_F = (k^* - 1) \times p$
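
The arithmetic of the LR test can be sketched as follows. The two $-2 \ln L$ values are hypothetical, and the p-value helper uses a closed form that only holds for even degrees of freedom (a simplifying assumption to avoid external libraries); a statistics package would normally supply the chi-square tail probability.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square_df > x), using the closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form below only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def lr_test_no_interaction(neg2_lnl_reduced, neg2_lnl_full, k_star, p):
    """LR statistic, its degrees of freedom (k* - 1) * p, and the p-value."""
    lr = neg2_lnl_reduced - neg2_lnl_full
    df = (k_star - 1) * p
    return lr, df, chi2_sf_even_df(lr, df)

# Hypothetical -2 ln L values; two strata (k* = 2) and two predictors (p = 2).
lr, df, p_value = lr_test_no_interaction(144.2, 141.0, k_star=2, p=2)
print(lr, df, p_value)
```

With $k^* = 2$ and $p = 2$ the test has 2 degrees of freedom, matching the two product terms in the sex-stratified example.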

Time-dependent variables
Definition
  Time-dependent: value of the variable differs over time (example: $E \times t$)
  Time-independent: value of the variable is constant over time (example: $E = 1/0$)

Examples: defined variables, internal variables, ancillary variables
Defined variables
  $E \times t$, $E \times (\log t - 3)$, $E \times g(t)$, where $g(t) = 1$ if $t \ge t_0$ and $g(t) = 0$ if $t < t_0$
Internal variables
  values change because of internal characteristics or behavior of the individual
  e.g., exposure level, employment status, smoking status, and obesity level at time $t$
Ancillary variables
  values change because of external characteristics
  e.g., air pollution index at time $t$

Examples Internal vs. ancillary
Heart transplant status at time $t$: $HT(t) = 1$ if the patient has received a transplant by time $t$, $HT(t) = 0$ if the patient has not received a transplant by time $t$
Internal (status determined from individual traits) vs. ancillary (status determined from external availability of a donor)
But the form of the extended Cox model and the procedure for analysis are the same regardless of variable type
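
A small sketch of $HT(t)$ as a function of time follows. The function names and the patient are hypothetical, and `None` is used here as an assumed convention for a patient who never receives a transplant.

```python
def ht(t, transplant_time):
    """HT(t): 1 if a transplant was received at or before time t, else 0."""
    if transplant_time is None:
        return 0
    return 1 if transplant_time <= t else 0

def ever_transplanted(transplant_time):
    """A fixed indicator that ignores when the transplant happened."""
    return 0 if transplant_time is None else 1

# Hypothetical patient transplanted at week 12, evaluated at three failure times.
failure_times = [5, 12, 20]
print([ht(t, 12) for t in failure_times])            # [0, 1, 1]
print([ever_transplanted(12) for _ in failure_times])  # [1, 1, 1]
```

The fixed indicator treats the patient as transplanted during the waiting period before week 12, whereas $HT(t)$ takes the value the patient actually had at each time the risk set is formed.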

Incorrect vs. correct time-dependent codes

The extended Cox model
$h(t, X(t)) = h_0(t) \exp\left\{ \sum_{i=1}^{p_1} \beta_i X_i + \sum_{j=1}^{p_2} \delta_j X_j(t) \right\}$
  $X(t) = (X_1, \ldots, X_{p_1}, X_1(t), \ldots, X_{p_2}(t))$: entire collection of predictors at time $t$
  provides one coefficient for $X_j(t)$ even though the values of $X_j(t)$ may change over time
  e.g., $h(t, X(t)) = h_0(t) \exp\{ \beta E + \delta (E \times t) \}$
Estimating regression parameters
  maximize the partial likelihood $L$
  risk sets are more complicated than those for the PH model
Statistical inference
  Wald and LR tests

The hazard ratio formula
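PH assumption is not satisfied for the extended Cox model
Example: $h(t, X(t)) = h_0(t) \exp\{ \beta E + \delta (E \times t) \}$, with $E = 1$ if exposed and $E = 0$ if unexposed
  $\widehat{HR}(t) = \dfrac{\hat h(t, X^*(t))}{\hat h(t, X(t))} = \exp\{ \hat\beta (1 - 0) + \hat\delta (1 \times t - 0 \times t) \} = \exp\{ \hat\beta + \hat\delta t \}$
  $X^*(t) = (E = 1, E \times t = t)$, $X(t) = (E = 0, E \times t = 0)$
If $\hat\delta > 0$, $\widehat{HR}(t)$ increases as $t$ increases
PH assumption not satisfied!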
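
To see how the hazard ratio moves with time in this example, the sketch below evaluates $\exp\{\hat\beta + \hat\delta t\}$ at a few time points. The coefficient values are hypothetical.

```python
import math

def hazard_ratio_at(t, beta, delta):
    """HR(t) = exp(beta + delta * t) for the model with E and E x t."""
    return math.exp(beta + delta * t)

# Hypothetical estimates with delta > 0.
beta_hat, delta_hat = 0.5, 0.05

hr_by_week = [hazard_ratio_at(t, beta_hat, delta_hat) for t in (0, 10, 20)]
print(hr_by_week)  # grows with t because delta_hat is positive
```

At $t = 0$ the ratio reduces to $e^{\hat\beta}$, and with $\hat\delta = 0$ it would stay there for all $t$, which is the PH case.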

In the hazard ratio formula, the coefficient $\hat\delta$ of the difference in the values of $X(t)$
  is itself not time-dependent
  represents the overall effect of the corresponding time-dependent variable, considering all times at which this variable has been measured in the study

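Example: $h(t, X(t)) = h_0(t) \exp\{ \delta E(t) \}$, with $E(t) = 1$ if exposed at time $t$ and $E(t) = 0$ if unexposed at time $t$
  $\widehat{HR}(t) = \dfrac{\hat h(t, E(t) = 1)}{\hat h(t, E(t) = 0)} = \exp\{ \hat\delta (1 - 0) \} = e^{\hat\delta}$, a fixed number
But PH is not satisfied: $\widehat{HR}(t)$ is time-dependent because $E(t)$ is time-dependent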

Assessing time-independent variables that do not satisfy the PH assumption
Use the extended Cox model to
  check the PH assumption
  assess the effect of a variable not satisfying the PH assumption
Three methods for checking the PH assumption
  graphical
  GOF test
  extended Cox model

Assessing time-independent variables that do not satisfy the PH assumption
Extended Cox model: $h(t, X(t)) = h_0(t) \exp\left\{ \sum_{i=1}^{p} \beta_i X_i + \sum_{i=1}^{p} \delta_i X_i g_i(t) \right\}$
  $g_i(t) = t$ or $\log t$
  $g_L(t) = t$, $g_l(t) = 0$ for $l \ne L$: one variable at a time
  $g_i(t) = 1$ if $t \ge t_0$, $0$ if $t < t_0$: heaviside function
Check the PH assumption
  test $H_0: \delta_1 = \cdots = \delta_p = 0$
  use the LR test: $LR \sim \chi^2_p$ under $H_0$
  if the PH test is significant, the extended Cox model is preferred

Assessing time-independent variables that do not satisfy the PH assumption
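Examples: $h(t, X(t)) = h_0(t) \exp\{ \beta E + \delta (E \times g(t)) \}$
  $g(t) = t$ $\Rightarrow$ $\widehat{HR}(t) = \exp\{ \hat\beta + \hat\delta t \}$
  $g(t) = 1$ if $t \ge t_0$, $0$ if $t < t_0$ $\Rightarrow$ $\widehat{HR}(t) = \exp\{ \hat\beta + \hat\delta \}$ if $t \ge t_0$, $\exp\{ \hat\beta \}$ if $t < t_0$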

$h(t, X(t)) = h_0(t) \exp\{ \beta_1 R_x + \beta_2 \log \mathrm{WBC} + \delta_1 (\mathrm{SEX} \times g_1(t)) + \delta_2 (\mathrm{SEX} \times g_2(t)) \}$
  $g_1(t) = 1$ if $t < 15$, $0$ if $t \ge 15$; $g_2(t) = 1$ if $t \ge 15$, $0$ if $t < 15$
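
With two heaviside functions, the hazard ratio for SEX (male vs. female, holding $R_x$ and log WBC fixed) takes one value before week 15 and another from week 15 onward. The sketch below evaluates it; the $\delta$ values are hypothetical and the function name is illustrative.

```python
import math

def sex_hazard_ratio(t, delta1, delta2, cutoff=15):
    """HR for SEX = 1 vs. SEX = 0 at time t under the two-heaviside model."""
    g1 = 1 if t < cutoff else 0   # switched on before the cutoff
    g2 = 1 if t >= cutoff else 0  # switched on from the cutoff onward
    return math.exp(delta1 * g1 + delta2 * g2)

# Hypothetical coefficients: elevated hazard early, reduced hazard later.
delta1_hat, delta2_hat = 1.0, -0.5

print(sex_hazard_ratio(10, delta1_hat, delta2_hat))  # exp(delta1) for t < 15
print(sex_hazard_ratio(15, delta1_hat, delta2_hat))  # exp(delta2) for t >= 15
```

Exactly one of $g_1(t)$ and $g_2(t)$ equals 1 at any time, so each $\delta$ is read directly as a log hazard ratio for its own time interval.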

References
Kalbfleisch, JD and Prentice, RL. (2002). The Statistical Analysis of Failure Time Data, Second Edition, Wiley.
Klein, JP and Moeschberger, ML. (2003). Survival Analysis: Techniques for Censored and Truncated Data, Springer.
Kleinbaum, DG and Klein, M. (2011). Survival Analysis, A Self-Learning Text, Springer. (data file name: anderson.dat)

Thank you!!!
