Introduction to Survival Analysis October 19, 2004 Brian F. Gage, MD, MSc with thanks to Bing Ho, MD, MPH Division of General Medical Sciences
Presentation goals Survival analysis compared w/ other regression techniques What is survival analysis When to use survival analysis Univariate method: Kaplan-Meier curves Multivariate methods: Cox-proportional hazards model Parametric models Assessment of adequacy of analysis Examples
What is survival analysis? Model time to failure or time to event Unlike linear regression, survival analysis has a dichotomous (binary) outcome Unlike logistic regression, survival analysis analyzes the time to an event –Why is that important? Able to account for censoring Can compare survival between 2+ groups Assess relationship between covariates and survival time
Importance of censored data Why is censored data important? What is the key assumption of censoring?
Types of censoring Subject does not experience event of interest Incomplete follow-up Lost to follow-up Withdraws from study Dies (if not being studied) Left or right censored
When to use survival analysis Examples Time to death or clinical endpoint Time in remission after treatment of disease Recidivism rate after addiction treatment When one believes that 1+ explanatory variable(s) explain the differences in time to an event Especially when follow-up is incomplete or variable
Relationship between survivor function and hazard function Survivor function, S(t), defines the probability of surviving longer than time t; this is what the Kaplan-Meier curves show. Hazard function is the negative derivative of the log of the survivor function over time, $h(t) = -\frac{d}{dt}\log S(t) = -S'(t)/S(t)$ –instantaneous risk of event at time t, given survival up to t (conditional failure rate) Survivor and hazard functions can be converted into each other
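The conversion between survivor and hazard functions can be checked numerically. The sketch below assumes the standard definitions h(t) = -d/dt log S(t) and S(t) = exp(-∫ h(u) du), and uses a Weibull survivor function with made-up scale and shape values purely as an illustration; none of the function names come from the presentation.

```python
import math

def weibull_survival(t, scale=2.0, shape=1.5):
    """Hypothetical survivor function: S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, scale=2.0, shape=1.5):
    """Closed-form hazard for the same Weibull model."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard_from_survival(surv, t, eps=1e-6):
    """h(t) = -d/dt log S(t), approximated with a central difference."""
    return -(math.log(surv(t + eps)) - math.log(surv(t - eps))) / (2 * eps)

def survival_from_hazard(hazard, t, steps=10000):
    """S(t) = exp(-integral of h from 0 to t), using the midpoint rule."""
    width = t / steps
    cumulative_hazard = sum(hazard((i + 0.5) * width) for i in range(steps)) * width
    return math.exp(-cumulative_hazard)

# Going in either direction recovers the other function (up to numerical error).
numeric_hazard = hazard_from_survival(weibull_survival, 1.0)
numeric_survival = survival_from_hazard(weibull_hazard, 3.0)
```

Comparing `numeric_hazard` with `weibull_hazard(1.0)` and `numeric_survival` with `weibull_survival(3.0)` shows the two routes agree.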
Approach to survival analysis Like other statistics we have studied we can do any of the following w/ survival analysis: Descriptive statistics Univariate statistics Multivariate statistics
Descriptive statistics Average survival When can this be calculated? What test would you use to compare average survival between 2 cohorts? Average hazard rate Total # of failures divided by total observed survival time (units are therefore 1/t or 1/pt-yrs) An incidence rate, with higher values indicating more events per unit of time
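The average hazard rate described above is a one-line calculation. A minimal sketch, with hypothetical follow-up times (in years) and event indicators that are not from the presentation:

```python
def average_hazard_rate(follow_up_years, event_observed):
    """Total number of failures divided by total observed person-time."""
    failures = sum(event_observed)
    person_years = sum(follow_up_years)
    return failures / person_years

# Hypothetical cohort: 1 = event observed, 0 = censored.
follow_up_years = [2.0, 5.0, 3.5, 1.0, 4.5]
event_observed = [1, 0, 1, 1, 0]

rate = average_hazard_rate(follow_up_years, event_observed)
# 3 failures over 16 person-years, i.e. 0.1875 events per person-year
```

Censored subjects still contribute their follow-up time to the denominator, which is how this rate accounts for incomplete follow-up.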
Univariate method: Kaplan-Meier survival curves Also known as product-limit formula Accounts for censoring Generates the characteristic “stair step” survival curves Does not account for confounding or effect modification by other covariates When is that a problem? When is that OK?
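The product-limit formula can be sketched in a few lines: at each event time, multiply the running survival estimate by (1 - deaths / number at risk), and remove censored subjects from the risk set without changing the estimate. The data below are hypothetical, and the function name is only illustrative.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk  # the "stair step" drop
            curve.append((t, survival))
        at_risk -= removed  # events and censored subjects both leave the risk set
    return curve

# Hypothetical data: subjects at times 2 and 4 include censored observations.
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

With these six subjects the estimate steps down at times 1, 2, 3, and 5; the censored observations only shrink the risk set.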
Comparing Kaplan-Meier curves Log-rank test can be used to compare survival curves Less-commonly used test: Wilcoxon, which places greater weights on events near time 0. Hypothesis test (test of significance) $H_0$: the curves are statistically the same $H_1$: the curves are statistically different Compares observed to expected cell counts Test statistic which is compared to a $\chi^2$ distribution
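As a rough illustration of "observed vs. expected" counts, here is a minimal two-group log-rank sketch. It assumes the usual hypergeometric variance at each event time and a 1-degree-of-freedom chi-square reference distribution; the data and function name are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 df."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in A under H0
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical cohorts with no censoring: group A fails earlier than group B.
chi_square, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

Two identical groups give a statistic of 0 (p = 1), while the separated groups above give a statistic of about 5.05.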
Comparing multiple Kaplan-Meier curves Multiple pair-wise comparisons produce cumulative Type I error – multiple comparison problem Instead, compare all curves at once analogous to using ANOVA to compare > 2 cohorts Then use judicious pair-wise testing
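The size of the multiple comparison problem can be shown with simple arithmetic. The sketch below assumes the pair-wise tests are independent (an idealization; pair-wise tests on shared cohorts are correlated) and uses a conventional alpha of 0.05 as an illustrative value.

```python
def pairwise_count(groups):
    """Number of distinct pair-wise comparisons among the groups."""
    return groups * (groups - 1) // 2

def familywise_error(alpha, comparisons):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** comparisons

comparisons = pairwise_count(4)                 # 4 curves -> 6 pair-wise tests
inflated = familywise_error(0.05, comparisons)  # about 0.265 instead of 0.05
```

With four curves, testing every pair at 0.05 gives roughly a 26% chance of at least one spurious difference, which is why a single overall test comes first.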
Limit of Kaplan-Meier curves What happens when you have several covariates that you believe contribute to survival? Example Smoking, hyperlipidemia, diabetes, and hypertension contribute to time to myocardial infarction Can use stratified K-M curves – for 2 or maybe 3 covariates Need another approach – multivariate Cox proportional hazards model is most common -- for many covariates (think multivariate regression or logistic regression rather than a Student’s t-test or the odds ratio from a 2 x 2 table)
Multivariate method: Cox proportional hazards Needed to assess effect of multiple covariates on survival Cox-proportional hazards is the most commonly used multivariate survival method Easy to implement in SPSS, Stata, or SAS Parametric approaches are an alternative, but they require stronger assumptions about h(t).
Cox proportional hazard model Works with the hazard function Conveniently separates baseline hazard function from covariates Baseline hazard function over time –$h(t) = h_0(t)\exp(B_1 X + B_0)$ Covariates are time independent $B_1$ is used to calculate the hazard ratio, which is similar to the relative risk Semiparametric: the baseline hazard is left unspecified Coefficients are estimated with a partial likelihood function
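Why the coefficient yields a hazard ratio follows directly from the model form on this slide: comparing two subjects whose covariate differs by one unit, the baseline hazard and the constant term cancel. A short derivation, using the slide's symbols:

```latex
\frac{h(t \mid X = x + 1)}{h(t \mid X = x)}
  = \frac{h_0(t)\,\exp\bigl(B_1 (x + 1) + B_0\bigr)}{h_0(t)\,\exp\bigl(B_1 x + B_0\bigr)}
  = \exp(B_1)
```

The ratio does not depend on t, which is the proportional hazards property.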
Cox proportional hazards model, continued Can handle both continuous and categorical predictor variables (think: logistic, linear regression) Without knowing baseline hazard $h_0(t)$, can still calculate coefficients for each covariate, and therefore hazard ratio Assumes multiplicative risk: this is the proportional hazard assumption Can be compensated in part with interaction terms
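One way to see that the coefficients do not require the baseline hazard is to fit a single-covariate model by maximizing the partial likelihood, which depends only on the ordering of event times. The sketch below is a bare-bones Newton-Raphson fit, assuming no tied event times and using hypothetical data; it is not how SPSS, Stata, or SAS implement the procedure.

```python
import math

def cox_fit_one_covariate(times, events, x, iterations=25):
    """Maximize the Cox partial log-likelihood for one covariate (no ties)."""
    beta = 0.0
    for _ in range(iterations):
        score = 0.0        # first derivative of the partial log-likelihood
        information = 0.0  # negative second derivative
        for i, (t_i, e_i) in enumerate(zip(times, events)):
            if not e_i:
                continue  # censored subjects only appear in risk sets
            risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
            weights = [math.exp(beta * x[j]) for j in risk_set]
            total = sum(weights)
            mean_x = sum(w * x[j] for w, j in zip(weights, risk_set)) / total
            mean_x2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk_set)) / total
            score += x[i] - mean_x
            information += mean_x2 - mean_x ** 2
        beta += score / information  # Newton-Raphson update
    return beta

# Hypothetical data: exposed subjects (x = 1) tend to fail earlier.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
exposed = [1, 1, 0, 1, 0, 1, 0, 0]

beta = cox_fit_one_covariate(times, events, exposed)
# Stretching the time axis (a different baseline hazard) leaves beta unchanged.
beta_rescaled = cox_fit_one_covariate([t ** 2 for t in times], events, exposed)
hazard_ratio = math.exp(beta)
```

Because only the rank order of the times enters the calculation, `beta` and `beta_rescaled` agree.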
Limitations of Cox PH model Does not accommodate variables that change over time Luckily most variables (e.g. gender, ethnicity, or congenital condition) are constant –If necessary, one can program time-dependent variables –When might you want this? Baseline hazard function, $h_0(t)$, is never specified You can estimate $h_0(t)$ accurately if you need to estimate S(t).
Hazard ratio What is the hazard ratio and how do you calculate it from your parameters, β? How do we estimate the relative risk from the hazard ratio (HR)? How do you determine significance of the hazard ratios (HRs)? Confidence intervals Chi square test
Assessing model adequacy Multiplicative assumption Proportional assumption: covariates are independent with respect to time and their hazard ratios are constant over time Three general ways to examine model adequacy Graphically Mathematically Computationally: Time-dependent variables (extended model)
Model adequacy: graphical approaches Several graphical approaches Do the survival curves intersect? Log-minus-log plots Observed vs. expected plots
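The idea behind the log-minus-log plot can be sketched numerically: under proportional hazards, log(-log S(t)) for two groups differs by a constant (the coefficient), so the plotted curves are parallel. The survival curves below are hypothetical closed-form examples; in practice the Kaplan-Meier estimates for each group would be transformed instead.

```python
import math

def log_minus_log(survival_probability):
    """Transform used on the y-axis of a log-minus-log plot."""
    return math.log(-math.log(survival_probability))

def baseline_survival(t):
    """Hypothetical baseline survivor function (Weibull form)."""
    return math.exp(-((t / 5.0) ** 1.3))

beta = 0.7  # hypothetical log hazard ratio for the exposed group

def exposed_survival(t):
    """Under proportional hazards, S(t | exposed) = S0(t) ** exp(beta)."""
    return baseline_survival(t) ** math.exp(beta)

# Vertical gap between the two transformed curves at several times.
gaps = [
    log_minus_log(exposed_survival(t)) - log_minus_log(baseline_survival(t))
    for t in (1, 2, 4, 8)
]
```

Every entry of `gaps` equals `beta`; curves that converge, diverge, or cross on this scale suggest the proportional hazards assumption is violated.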
Testing model adequacy mathematically with a goodness-of-fit test Uses a test of significance (hypothesis test) One-degree-of-freedom chi-square distribution p value for each coefficient Does not discriminate how a coefficient might deviate from the PH assumption
Example: Tumor Extent 3000 patients derived from SEER cancer registry and Medicare billing information Exploring the relationship between tumor extent and survival Hypothesis is that more extensive tumor involvement is related to poorer survival
Example: Tumor Extent Tumor extent may not be the only covariate that affects survival Multiple medical comorbidities may be associated with poorer outcome Ethnic and gender differences may contribute Cox proportional hazards model can quantify these relationships
Example: Tumor Extent Test proportional hazards assumption with log-minus-log plot Perform Cox PH regression Examine significant coefficients and corresponding hazard ratios
Example: Tumor Extent
The PHREG Procedure
Analysis of Maximum Likelihood Estimates

Variable  DF  Parameter  Standard  Chi-Square  Pr > ChiSq  Hazard  95% Hazard Ratio    Variable
              Estimate   Error                             Ratio   Confidence Limits   Label
age2      1   0.15690    0.05079   9.5430      0.0020      1.170   1.059    1.292      70
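The hazard ratio, confidence limits, and chi-square in the PHREG output can be reproduced from the parameter estimate and standard error alone. The sketch below assumes a Wald-type interval and test (which is consistent with the printed values); the function name is illustrative.

```python
import math

def hazard_ratio_summary(estimate, std_error, z=1.959964):
    """Hazard ratio, 95% Wald limits, Wald chi-square, and p-value (1 df)."""
    hazard_ratio = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    chi_square = (estimate / std_error) ** 2
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return hazard_ratio, lower, upper, chi_square, p_value

# Values for age2 taken from the PHREG output.
hr, lower, upper, chi_square, p_value = hazard_ratio_summary(0.15690, 0.05079)
```

Rounded to three decimals this gives 1.170 (1.059, 1.292), matching the output; the chi-square differs from 9.5430 only in the fourth decimal because the printed estimate and standard error are themselves rounded.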
"name": "Example: Tumor Extent 5 The PHREG Procedure Analysis of Maximum Likelihood Estimates Parameter Standard Hazard 95% Hazard Ratio Variable Variable DF Estimate Error Chi-Square Pr > ChiSq Ratio Confidence Limits Label age2 1 0.15690 0.05079 9.5430 0.0020 1.170 1.059 1.292 70 ChiSq Ratio Confidence Limits Label age2 1 0.15690 0.05079 9.5430 0.0020 1.170 1.059 1.292 70
Summary Survival analysis quantifies time to a single, dichotomous event Handles censored data well Survival and hazard can be mathematically converted to each other Kaplan-Meier survival curves can be compared statistically and graphically Cox proportional hazards models help distinguish individual contributions of covariates on survival, provided certain assumptions are met.