1
Stata: Flexible Parametric Survival models 1h
Hein Stigum Presentation, data and programs at: courses Feb-20 H.S.
2
Programs and Datasets
Syntax file: "0 Flexible Parametric Survival Course, Downloads"
Programs:
    ssc install stpm2, replace
Datasets:
    Paul Dickman course files
    Book files
3
Credits: Patrick Royston, Paul Lambert, Paul Dickman, Michael Crowther
4
Introduction
5
Effect measures
Risk: probability, proportion, %
Rate: km/h, cases/person-time
Hazard: $h(t) = \lim_{\delta t \to 0} \frac{\Pr(t \le T < t + \delta t \mid T \ge t)}{\delta t}$
    The probability of an event in a small time interval, given survival up to that time, per unit of time.
Example: diabetes type 1
Two mathematical quantities:
    Proportion: a fraction whose numerator (top) is part of the denominator (bottom). Example: colds in a class = 2/30 = 7%. No dimension, range 0-1.
    Rate: change in one quantity per change in another (time). Example: driving 100 km in 2 h gives an average speed of 50 km/h. Has a dimension, no upper bound.
    Odds: diseased per healthy person.
Statistical concept:
    Risk: a probability, no dimension. Example: flipping a coin.
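The link between a rate (hazard) and a risk can be written out with a standard identity. The constant hazard of 0.05 per person-year used below is an illustrative assumption, not a value from the course data.

```latex
\begin{align*}
S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right), \qquad \text{risk}(t) = 1 - S(t) \\
% assumed constant hazard h = 0.05 per person-year, follow-up t = 10 years
\text{risk}(10) &= 1 - e^{-0.05 \times 10} = 1 - e^{-0.5} \approx 0.39
\end{align*}
```

The hazard has the dimension 1/time and no upper bound, while the risk it implies over a fixed period stays between 0 and 1.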
6
Cohorts
Closed cohort: count persons, risk
Open cohort: count person-time, rate
Open cohort, new time scale: survival data
[Figure: follow-up of cohort members. Line = follow-up, circle = event (disease), red line = exposed]
Statistics: time-to-event data, censored data
MF 9510 Logistic, Survival and Cox
7
Timescale
8
Timescale
Possible timescales: time since diagnosis, age, calendar time
Choose timescale: use the most important and most non-linear one as analysis time
In this study we follow subjects from diagnosis to death, loss to follow-up or end of study.
Stata commands, st:
    enter     start of follow-up
    exit      end of follow-up
    origin    start of risk time
    failure   event (death) or not
9
Survival time setup
Time = years since diagnosis, based on a variable:
    stset ysdiag, failure(dead==1)
Time = years since diagnosis, based on dates:
    stset exit, failure(dead==1) origin(diag) scale(365.25)
Time = age, based on dates:
    stset exit, failure(dead==1) origin(birth) enter(diag) scale(365.25)
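The date-based setups can be tried on a toy dataset. This is a minimal sketch: the three subjects and all dates are made up for illustration, and the variable names (birth, diag, exit, dead) simply follow the slide.

```stata
* Toy data: three hypothetical subjects (dates are invented)
clear
input id str9 birth_s str9 diag_s str9 exit_s dead
1 "12may1950" "01jan2000" "15mar2004" 1
2 "03sep1962" "10jun2001" "31dec2009" 0
3 "27nov1948" "20feb2003" "05aug2005" 1
end

* Convert the strings to Stata daily dates
foreach v in birth diag exit {
    gen `v' = date(`v'_s, "DMY")
    format `v' %td
}

* Timescale 1: years since diagnosis
stset exit, failure(dead==1) origin(diag) scale(365.25) id(id)
list id _t0 _t _d, noobs

* Timescale 2: attained age, with delayed entry at diagnosis
stset exit, failure(dead==1) origin(birth) enter(diag) scale(365.25) id(id)
list id _t0 _t _d, noobs
```

With age as the timescale, _t0 is the age at diagnosis and _t the age at exit, so each subject only contributes risk time from the age at which follow-up started.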
10
Standard survival analysis
Kaplan-Meier plot of survival by treatment:
    sts graph, by(trt) risktable
Log-rank test of difference in survival:
    sts test trt
Cox proportional hazards model with one covariate:
    stcox trt
Predicted hazard curves from the Cox model:
    stcurve, hazard at1(trt=0) at2(trt=1)
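The four steps can be run end to end on a public dataset. This sketch assumes the German breast cancer data shipped with Stata (webuse brcancer) as a stand-in for the course data, with hormon (hormonal therapy, 0/1) playing the role of trt.

```stata
* Assumption: brcancer replaces the course dataset; hormon replaces trt
webuse brcancer, clear
stset rectime, failure(censrec==1) scale(365.24)

* Kaplan-Meier curves by treatment, with a number-at-risk table
sts graph, by(hormon) risktable

* Log-rank test of equal survival in the two groups
sts test hormon

* Cox proportional hazards model with one covariate
stcox hormon

* Smoothed hazard curves implied by the Cox model
stcurve, hazard at1(hormon=0) at2(hormon=1)
```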
11
Syntax: "5,1 Flexible Parametric Survival, Basic", section "Standard Survival Analysis", line 24
12
The hazard of hazard ratios
The hazard ratio is problematic under heterogeneity.
RCT example:
    Large variation in the risk of dying (heterogeneity, frailty)
    Treatment halves the risk of dying (for all subjects)
    Baseline: balanced, comparable groups
    Over time, treated: frail subjects will remain
    Over time, untreated: frail subjects will die out
    HR may go from 0.5 (true) to 1.5 (misleading) over time
Use instead: survival, ?
Use the HR with caution if the risk of the event is large and varying (Hernán 2010).
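How an individual-level HR of 0.5 can turn into a population HR above 1 can be worked through with a two-group frailty example. All numbers are illustrative assumptions: half of the subjects are frail (untreated hazard 2 per year), half are robust (untreated hazard 0.1 per year), hazards are constant, and treatment halves both.

```latex
\begin{align*}
% population hazard in one arm: hazards weighted by who is still alive
h_{\text{pop}}(t) &= \frac{\sum_i p_i\, h_i\, e^{-h_i t}}{\sum_i p_i\, e^{-h_i t}} \\[4pt]
t = 0:\quad
h_{\text{untr}} &= \frac{2 + 0.1}{2} = 1.05, \qquad
h_{\text{tr}} = \frac{1 + 0.05}{2} = 0.525, \qquad
\text{HR}(0) = 0.50 \\[4pt]
t = 2:\quad
h_{\text{untr}} &= \frac{2 e^{-4} + 0.1 e^{-0.2}}{e^{-4} + e^{-0.2}} \approx 0.142, \qquad
h_{\text{tr}} = \frac{1 e^{-2} + 0.05 e^{-0.1}}{e^{-2} + e^{-0.1}} \approx 0.174 \\[4pt]
\text{HR}(2) &\approx \frac{0.174}{0.142} \approx 1.23
\end{align*}
```

After two years almost all frail untreated subjects have died, while the treated arm still contains frail survivors, so the population HR exceeds 1 even though treatment halves every individual's hazard.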
13
Smoothers in regressions
Polynomials: x, x^2, x^3
Fractional polynomials: choose 2 of 8 powers (28 models): x^-2, x^-1, x^-0.5, log(x), x^0.5, x, x^2, x^3
Splines (restricted cubic splines, rcs): cubic
[Figure: spline curve with points c1 and c2 marked]
Polynomials: global. Splines: local.
Smoothers in survival: dose-response, baseline hazards, time dependent effects
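A polynomial and a restricted cubic spline for the same covariate can be compared in a few lines. This is a sketch under assumptions: it uses the public brcancer data, treats x1 as age, and picks 4 knots arbitrarily; mkspline is Stata's built-in command, whereas the course itself installs rcsgen.

```stata
* Assumption: brcancer as example data, x1 taken to be age in years
webuse brcancer, clear
stset rectime, failure(censrec==1) scale(365.24)

* Global smoother: quadratic polynomial in age
gen age2 = x1^2
stcox x1 age2

* Local smoother: restricted cubic spline, 4 knots -> 3 basis variables
mkspline agercs = x1, cubic nknots(4)
stcox agercs1 agercs2 agercs3
```

With 4 knots mkspline creates agercs1 to agercs3, where agercs1 is the original variable, so a linear effect is the special case in which the other two coefficients are zero.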
14
A physical “spline”
15
Reference risk, baseline hazard
Log-risk model: RR = 2. Twice the risk of what? Always report the reference (baseline) risk.
Cox model: HR = 2. Twice the rate of what? Always report the baseline hazard, h(t).
[Figure: baseline hazard over time] Peak at 0.25 years. Still positive hazard after 6 years.
Hazards near 0 may give large HRs.
16
Proportional vs non-proportional
[Figure panels: Hazards (baseline); Hazard ratios (non-proportional = time dependent, TD)]
How to model baseline, dose-response, and TD
17
Baseline, dose-response and TD
t = time, E = exposure (continuous), s(.) = spline, TD = time dependent
Generic models:
    $\ln h(t) = b_0 t + b_1 E$    (linear baseline, linear E)
    $\ln h(t) = s_0(t) + b_1 E$    (flexible baseline)
    $\ln h(t) = s_0(t) + s_1(E)$    (flexible dose response)
    $\ln h(t) = s_0(t) + b_1 E + b_2 t E$    (linear TD)
    $\ln h(t) = s_0(t) + b_1 E + s_2(t) E$    (flexible TD)
    $\ln h(t) = s_0(t) + s_1(E) + s_2(t) s_1(E)$    (flexible TD and dose response)
Royston-Parmar models:
    $\ln H(t) = s(\ln t) + b_1 E$    (proportional hazards)
    $\ln H(t) = s(\ln t) + b_1 E + s_2(\ln t) E$    (flexible TD)
18
Survival models
                            Cox               Parametric      Flexible
Baseline hazard             free
Estimate baseline hazard    may be unstable   easy, stable    easy, stable
[Figures: baseline hazard against time]
19
Royston-Parmar models
20
Royston-Parmar models
3 flavors, each generalizing a standard parametric model:
    Proportional hazards: Weibull
    Proportional odds: log-logistic
    Probit: log-normal
Stata syntax:
    stpm2 variables, scale(hazard) df(3)
    stpm2            survival time parametric model, version 2
    scale(hazard)    proportional hazards model
    df(3)            3 degrees of freedom for the baseline hazard = 3 cubic splines
    df(1)            = Weibull
R: flexsurv
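The claim that df(1) reproduces the Weibull model can be checked directly. A minimal sketch, assuming the public brcancer data and the user-written stpm2 (with its dependency rcsgen) installed from SSC:

```stata
* Requires: ssc install stpm2, replace   and   ssc install rcsgen, replace
* Assumption: brcancer as example data, hormon as the exposure
webuse brcancer, clear
stset rectime, failure(censrec==1) scale(365.24)

* Standard Weibull proportional hazards model
streg hormon, distribution(weibull)
scalar b_weibull = _b[hormon]

* Royston-Parmar model with 1 df for the baseline: same log hazard ratio
stpm2 hormon, scale(hazard) df(1) eform
display "Weibull: " b_weibull "   stpm2 df(1): " _b[hormon]

* More flexible baseline: 3 df
stpm2 hormon, scale(hazard) df(3) eform
```

The two log hazard ratios should agree up to numerical precision; moving to df(3) changes the baseline shape while the model stays proportional hazards.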
21
Syntax: "5,1 Flexible Parametric Survival, Basic", section "Unstable hazards from Cox model", line 24
22
Predict
23
Powerful predict command
Model: proportional hazards model
[Figure panels: Survival; Hazard; Survival difference; Hazard ratio; Hazard difference]
24
Predict
Model:
    stpm2 year8594, scale(hazard) df(4) eform
Predict:
    predict s, survival                        // survival
    predict sd, sdiff1(year8594 1) ci          // survival difference
    predict h, hazard per(1000)                // hazard
    predict hd, hdiff1(year8594 1) ci          // hazard difference
    predict hr, hrnumerator(year8594 1) ci     // hazard ratio
    +++
25
Out of sample predictions
Predict on a new timescale to:
    get nice plots in small data
    speed up predictions in big data
    predict beyond observed time
    make a table of results
Model:
    stpm2 year8594 female, scale(hazard) df(4) eform
New time range: t2 from 0 to 20, 100 values
Predict:
    predict s0, survival zero timevar(t2)
Note: all covariates must be specified when using a new timescale.
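Prediction on a new time variable can be sketched as follows. Assumptions: the public brcancer data stand in for the course data, the grid runs from 0.05 to 7 years (the model works on ln t, so the grid starts just above zero), and the names tt, s0 and s1 are made up for the example.

```stata
* Requires: ssc install stpm2, replace   and   ssc install rcsgen, replace
webuse brcancer, clear
stset rectime, failure(censrec==1) scale(365.24)
stpm2 hormon, scale(hazard) df(4) eform

* New time variable: 100 equally spaced values (fills the first 100 rows)
range tt 0.05 7 100

* Survival on the new timescale; every covariate is set explicitly
predict s0, survival at(hormon 0) timevar(tt) ci
predict s1, survival at(hormon 1) timevar(tt) ci

* Smooth plot from 100 points instead of one point per subject
twoway (line s0 tt) (line s1 tt), ytitle("Survival") xtitle("Years") ///
    legend(order(1 "hormon = 0" 2 "hormon = 1"))

* Small table of results at selected time points
list tt s0 s1 if inlist(_n, 20, 40, 60, 80, 100), noobs
```

Because predictions are only computed for the 100 grid rows, the same approach keeps prediction fast in large datasets.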
26
Syntax: "5,1 Flexible Parametric Survival, Basic", section "Predict options", line 24
27
Dose response modeling
Replace the exposure variable with splines:
    no margins command
    more difficult predictions
Four methods are shown in the syntax; the last is perhaps the easiest.
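One simple way to check whether a spline is needed for the dose response is a likelihood-ratio test against a linear effect. This sketch is not one of the four methods in the course syntax; it assumes the public brcancer data, treats x1 (age) as the continuous exposure, and uses Stata's built-in mkspline with an arbitrary choice of 4 knots.

```stata
* Requires: ssc install stpm2, replace   and   ssc install rcsgen, replace
webuse brcancer, clear
stset rectime, failure(censrec==1) scale(365.24)

* Spline basis for the exposure; agercs1 equals x1 itself
mkspline agercs = x1, cubic nknots(4)

* Linear dose response
stpm2 x1, scale(hazard) df(3)
estimates store linear

* Flexible dose response: exposure replaced by its spline terms
stpm2 agercs1 agercs2 agercs3, scale(hazard) df(3)
estimates store spline

* The linear model is nested in the spline model
lrtest linear spline
```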
28
Non-proportional hazards
Time dependent effects = non-proportional hazards
29
Proportional vs non-proportional
[Figure panels: Hazards (baseline); Hazard ratios (non-proportional = time dependent, TD)]
How to model baseline, dose-response, and TD
30
Baseline, dose-response and TD
t = time, E = exposure (continuous), s(.) = spline, TD = time dependent
Generic models:
    $\ln h(t) = s_0(t) + b_1 E$    (flexible baseline)
    $\ln h(t) = s_0(t) + s_1(E)$    (flexible dose response)
    $\ln h(t) = s_0(t) + b_1 E + b_2 t E$    (linear TD)
    $\ln h(t) = s_0(t) + b_1 E + s_2(t) E$    (flexible TD)
Royston-Parmar models:
    $\ln H(t) = s(\ln t) + b_1 E$    (proportional hazards)
    $\ln H(t) = s(\ln t) + b_1 E + s_2(\ln t) E$    (flexible TD)
31
Non-Proportional Cox vs Flexible par.
Non-proportional hazards = TD = time varying coefficients (tvc)
Cox model, time dependent effect:
    tvc(var) texp(_t)
    tvc(var) texp(_t>5)
    tvc(var) texp(ln(_t))
    stsplit
Flexible parametric:
    tvc(var) dftvc(2)        // two splines
Syntax: TD effects in Cox versus RP
32
Time varying coefficient
Model commands
Proportional hazards:
    stpm2 sex, scale(hazard) df(4) eform
Non-proportional hazards = time dependent HR:
    stpm2 sex, scale(hazard) df(4) eform tvc(sex) dftvc(2)
    Time varying coefficient of sex, with 2 splines
Testing for non-proportional hazards
Standard method:
    Fit a PH model
    Plot Schoenfeld residuals
    Add "tvc" terms as needed
Easier method:
    Add "tvc" terms (test PH vs TD with an LR test)
    Predict and plot HR
Syntax: Test Proportional Hazards assumption
"tvc" for confounders?
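Both testing routes can be sketched on a public dataset. Assumptions: brcancer replaces the melanoma data, hormon replaces sex as the exposure, and the stored estimate names ph and td are made up for the example.

```stata
* Requires: ssc install stpm2, replace   and   ssc install rcsgen, replace
webuse brcancer, clear
stset rectime, failure(censrec==1) scale(365.24)

* Standard method: Cox model, then a test based on Schoenfeld residuals
stcox hormon
estat phtest, detail

* Easier method: fit PH and TD models and compare them with an LR test
stpm2 hormon, scale(hazard) df(4) eform
estimates store ph
stpm2 hormon, scale(hazard) df(4) eform tvc(hormon) dftvc(2)
estimates store td
lrtest ph td

* Time dependent hazard ratio with confidence interval, from the TD model
predict hr, hrnumerator(hormon 1) ci
twoway (rarea hr_lci hr_uci _t, sort color(gs12)) (line hr _t, sort), ///
    yscale(log) ytitle("Hazard ratio") xtitle("Years") legend(off)
```

A small p-value from lrtest suggests that the two extra tvc spline terms are needed, that is, that the hazard ratio changes over time.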
33
Syntax: "5,2 Flexible Parametric Survival, Time Dependent", section "PH versus TD models using female as exposure", line 30
34
Summary so far
Effect of sex on melanoma survival:
    PH model: constant HR
    Non-PH model: time dependent HR (TD)
Next
TD hazard ratios for age:
    Are the sex estimates the same?
    Handle huge HR caused by hazards close to zero
    Plot HR
    Plot HD (hazard differences)
35
Syntax: "5,2 Flexible Parametric Survival, Time Dependent", section "PH versus TD models using age groups", line 85
36
Summary so far
TD hazard ratios for age:
    Sex estimates the same (PH vs TD)
    Plot HR
    Table of HR
    Huge HR caused by hazards close to zero
    Plot HD (hazard differences) as a better measure
    Plot survival
    Plot survival difference
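Hazard differences and survival differences come from the same predict command as the hazard ratio. A minimal sketch, assuming the public brcancer data with hormon as the exposure instead of the age groups used in the course syntax:

```stata
* Requires: ssc install stpm2, replace   and   ssc install rcsgen, replace
webuse brcancer, clear
stset rectime, failure(censrec==1) scale(365.24)
stpm2 hormon, scale(hazard) df(4) tvc(hormon) dftvc(2)

* Ratio, difference in hazards, and difference in survival (hormon 1 vs 0)
predict hr, hrnumerator(hormon 1) ci
predict hd, hdiff1(hormon 1) ci
predict sd, sdiff1(hormon 1) ci

* Hazard difference over time with its confidence band
twoway (rarea hd_lci hd_uci _t, sort color(gs12)) (line hd _t, sort), ///
    yline(0) ytitle("Hazard difference") xtitle("Years") legend(off)
```

When both hazards are close to zero the ratio can become very large while the difference stays small, which is why the difference is often easier to interpret.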
37
[Figure panels: Proportional HR; Time dependent HR] Huge HR: use HD instead
38
Summing up
Standard Cox:
    Proportional hazards, constant HR
    Limited options for non-proportional hazards
Standard Royston-Parmar:
    Flexible non-proportional hazards (splines)
    Predict and plot: hazards, hazard ratios, hazard differences, survival, survival differences
39
Other features of RP models
Relative survival
Competing risk
Joint modelling of longitudinal and survival data
Nested case-control and case-cohort
Multistate models (event history analysis)
Simulation of survival data
41
Course
42
Software
* From Statistical Software Components
* Flexible Parametric Survival
ssc install stpm2, replace
* Restricted Cubic Splines
ssc install rcsgen, replace
ssc install strcs, replace
* Competing risk
ssc install stcompet, replace
ssc install stcompadj, replace
ssc install stpm2cm, replace
ssc install stpm2cif, replace
ssc install stpepemori, replace
* Explained variance
ssc install str2d, replace
* Imputation
ssc install stsurvimpute, replace
* Symmetric nearest neighbor smoothing
ssc install running, replace
* Not from Statistical Software Components
* Relative survival, manual install
search strs
43
Syntax and data files
* Download Paul Dickman course files
mkdir c:\survival
cd c:\survival
net install all replace
* Download Royston-Lambert book material
net from
net get fpsaus-dta
net get fpsaus-do1
net get fpsaus-do2
* Syntax files for this course
Opus>Metodelunsj>Delte dokumenter>Flexible Parametric Survival Course:
    0 Flexible Parametric Survival Course, Downloads.do
    1 Flexible Parametric Survival Course, Basic.do
    2 Flexible Parametric Survival Course, TD.do
    3 Flexible Parametric Survival Course, Spesial.do
44
Extra slides
45
Generalized Weibull model
Often used parametric model: Weibull
Log cumulative hazard is linear in log t: $\ln H(t) = k \ln t - k \ln \lambda$
Generalize to (any) smooth baseline hazard: $\ln H(t) = s(\ln t) - k \ln \lambda$, where s = spline
Stable estimates on the log cumulative hazard scale.
Splines generalize to (almost) any baseline hazard shape.
Proportional hazards model
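The linearity in log t follows in two steps from the Weibull survival function. The parameterization with scale $\lambda$ and shape $k$ is an assumption, chosen to match the formula on the slide.

```latex
\begin{align*}
S(t) &= \exp\!\left[-\left(\frac{t}{\lambda}\right)^{k}\right]
  && \text{Weibull survival function} \\
H(t) &= -\ln S(t) = \left(\frac{t}{\lambda}\right)^{k}
  && \text{cumulative hazard} \\
\ln H(t) &= k \ln t - k \ln \lambda
  && \text{straight line in } \ln t \text{ with slope } k \\
\ln H(t) &= s(\ln t) - k \ln \lambda
  && \text{line } k \ln t \text{ replaced by a spline } s(\cdot)
\end{align*}
```

A spline with a single degree of freedom is a straight line in $\ln t$, which is why df(1) gives back the Weibull model.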
46
Purpose of regression
Estimation:
    Estimate the association between exposure and outcome, adjusted for other covariates
    Example: estimate the effect of smoking on lung cancer
    Focus: DAGs, bias, precision
Prediction:
    Use an estimated model to predict the outcome given covariates in a new dataset
    Example: predict air pollution by distance from roads
    Focus: predictive power, model fit, R2
Confounding matters in the first; the fit of the model matters in the last.
47
Baseline, dose-response and TD
t = time, E = exposure (continuous), s(.) = spline, tvc = time varying coefficient
Generic model (terms in order: baseline, dose response, time dependent):
    $\ln h(t) = b_0 t + b_1 E + b_2 t E$
    $\ln h(t) = s_0(t) + s_1(E) + s_2(t) s_1(E)$
Flexible parametric model:
    stpm2 agercs?, scale(hazard) df(3) dftvc(2) tvc(agercs?)
48
Cox versus Royston-Parmar-PH
Both are proportional hazards models
Almost identical effect estimates
But:
49
Ex: unstable hazards from Cox model
FP = flexible parametric
[Figure: hazard curves for Cox baseline and treated, FP baseline, FP treated]
Conclusion: unstable hazards from the Cox model (with artifacts)
50
Relative survival
Patient register: individual data
Population: tabular data
What is the survival of patients relative to the population?
SMR, relative survival