Event History Analysis 6

Name: Event History Analysis 6
Uploaded: 2017-07-06T07:05:36+00:00
Duration: PTM23S24

Sociology 8811, Lecture 20
Copyright © 2007 by Evan Schofer
Do not copy or distribute without permission

Announcements: Paper Assignment #2 handed out previously; due April 26.
Class topic: more details on Cox model diagnostics.
Next class: parametric models, AFT models (if time)… then new topics!

Review: Cox Model Basics
Choosing how to deal with ties: several algorithms are available. The Efron method is better than Breslow (Stata's default); the exact marginal method is even better, but time-consuming.
The baseline hazard rate: can be computed in Stata. Plus, you can estimate the hazard rate at particular values of the variables.
The proportional hazard assumption: we'll continue discussing several methods for checking it.
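To see what the tie-handling choice actually changes, here is a minimal pure-Python sketch of the Cox log partial likelihood for a single covariate. The function name and arguments are hypothetical (this is not Stata's code); in Stata the corresponding choice is made with the stcox options breslow, efron, exactm, or exactp. Breslow lets every tied failure see the full risk set, while Efron gradually removes the tied failures from the denominator.

```python
import math

def log_partial_likelihood(times, events, x, beta, ties="breslow"):
    """Cox log partial likelihood for one covariate (hypothetical helper).

    times: follow-up times; events: 1 = failure, 0 = censored; x: covariate.
    ties: "breslow" or "efron" approximation for tied failure times.
    """
    risk = [math.exp(beta * xi) for xi in x]
    loglik = 0.0
    for t in sorted({t for t, d in zip(times, events) if d == 1}):
        # Cases failing at t, and everyone still at risk at t
        failed = [i for i in range(len(times)) if times[i] == t and events[i] == 1]
        at_risk = [i for i in range(len(times)) if times[i] >= t]
        d = len(failed)
        s_risk = sum(risk[i] for i in at_risk)
        s_fail = sum(risk[i] for i in failed)
        loglik += sum(beta * x[i] for i in failed)
        if ties == "breslow":
            # Every tied failure is compared against the full risk set
            loglik -= d * math.log(s_risk)
        elif ties == "efron":
            # Tied failures are removed from the denominator a fraction at a time
            for r in range(d):
                loglik -= math.log(s_risk - (r / d) * s_fail)
        else:
            raise ValueError("ties must be 'breslow' or 'efron'")
    return loglik
```

With no tied failure times the two methods give identical results; they diverge as ties become more common, which is why the choice matters most for coarsely measured time (e.g., years).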

Proportional Hazard Assumption
Key assumption: proportional hazards. The hazards of different groups are proportional over time, i.e., the estimated hazard ratio does NOT vary over time.
Example: the effect of an "abstinence" program on sexual behavior. Issue: do abstinence programs lower the rate in a consistent manner across time? Or perhaps the rate is lower initially, but then jumps back up (maybe even exceeding the control group's rate).
Groups are assumed to have "parallel" hazards, rather than rates that diverge, converge, or cross.
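In the usual Cox notation (the symbols h_0, beta, and x are assumed here; the slides do not define them), take x = 1 for the abstinence group and x = 0 for the control group. The baseline hazard cancels out of the ratio, so the ratio cannot depend on t:

```latex
h(t \mid x) = h_0(t)\,\exp(\beta x)
\qquad\Longrightarrow\qquad
\frac{h(t \mid x = 1)}{h(t \mid x = 0)}
  = \frac{h_0(t)\,e^{\beta}}{h_0(t)\,e^{0}}
  = e^{\beta}
```

A violation means the effect is really a function of time, \beta(t), so the ratio e^{\beta(t)} changes as t increases.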

Strategies:
1. Visually examine raw hazard plots for sub-groups in your data. Watch for non-parallel trends. A simple, crude method, but it often identifies big violations.

Visual examination of the raw hazard rate: parallel trends in the hazard rate indicate proportionality.
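The numbers behind such a raw hazard plot can be sketched as crude interval rates (events divided by time at risk) computed separately for each group. This is a minimal illustration, assuming everyone enters the risk set at time 0 and using half-open intervals (lo, hi]; the function name is hypothetical.

```python
def interval_hazards(times, events, groups, cuts):
    """Crude hazard (events / time at risk) per interval (lo, hi], by group.

    cuts: interval boundaries, e.g. [0, 5, 10]; assumes entry at time 0.
    Returns {group: [rate for each interval]}; NaN if nobody is at risk.
    """
    result = {}
    for g in sorted(set(groups)):
        rates = []
        for lo, hi in zip(cuts[:-1], cuts[1:]):
            n_events, exposure = 0, 0.0
            for t, d, gi in zip(times, events, groups):
                if gi != g or t <= lo:
                    continue  # other group, or already out of the risk set
                exposure += min(t, hi) - lo  # time spent at risk inside (lo, hi]
                if d == 1 and t <= hi:
                    n_events += 1
            rates.append(n_events / exposure if exposure > 0 else float("nan"))
        result[g] = rates
    return result
```

Plotting each group's list of rates against the interval midpoints gives the crude picture: roughly parallel curves are consistent with proportionality, while converging or crossing curves are a warning sign.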

2. Plot -ln(-ln(survival)) versus ln(time) across values of the X variables. Stata calls this "stphplot". Parallel lines indicate proportional hazards; again, convergence, divergence, or crossing indicates a violation.
A less common approach: compare the observed survivor plot to predicted values (for different values of X). Stata calls this "stcoxkm". If the observed values are similar to the predicted ones, the assumption is not likely to be violated.
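The transformation behind "stphplot" can be sketched in a few lines: estimate the Kaplan-Meier survivor function for a group, then map each point to (ln t, -ln(-ln S(t))). This is an illustrative sketch with hypothetical function names, not Stata's implementation; it assumes positive survival times.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier survivor estimate at each distinct failure time."""
    surv, s = [], 1.0
    for t in sorted({t for t, d in zip(times, events) if d == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        s *= 1.0 - deaths / at_risk
        surv.append((t, s))
    return surv

def log_minus_log(surv):
    """(ln t, -ln(-ln S(t))) pairs; points with S = 0 or S = 1 are undefined and skipped."""
    return [(math.log(t), -math.log(-math.log(s))) for t, s in surv if 0.0 < s < 1.0]
```

Why parallel lines: under proportional hazards S_1(t) = S_0(t)^exp(beta), so -ln(-ln S_1(t)) = -ln(-ln S_0(t)) - beta. The two groups' curves differ by a constant vertical shift at every t.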

-ln(-ln(survivor)) vs. ln(time): "stphplot". Convergence suggests a violation of the proportional hazard assumption (but I've seen worse!)

Cox estimate vs. observed KM: "stcoxkm". Predicted differs from observed for countries in the West.

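3. Piecewise models
Piecewise = break the model up into pieces (by time). Ex: split the analysis into "early" vs. "late" time. If coefficients vary across time periods, hazards are not proportional.
Example (split the records at t = 10 first, so that cases surviving past 10 still contribute their early time at risk):
* Split records at t = 10 (data must already be stset)
stsplit period, at(10)
stcox var1 var2 var3 if _t0 < 10
stcox var1 var2 var3 if _t0 >= 10
Look for large changes in coefficients!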

In a piecewise model, coefficients would differ in non-proportional models.
Figure (early vs. late periods): in the proportional panel, the effect is the same in both time periods; in the non-proportional panel, the effect is negative in the early period and positive in the late period.
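Splitting is what makes the early/late comparison valid: a case that fails at t = 12 was still at risk during the early period, so it must appear in the early model as a censored piece. Stata's stsplit command does this kind of splitting; the Python function below is only a hypothetical stand-in that illustrates the bookkeeping on (id, start, stop, event) spells.

```python
def split_episodes(records, cuts):
    """Split (id, start, stop, event) spells at the given cut times.

    Each piece keeps the same id; only the last piece of a spell
    carries the original failure indicator.
    """
    pieces = []
    for pid, start, stop, event in records:
        lo = start
        for c in sorted(cuts):
            if lo < c < stop:
                pieces.append((pid, lo, c, 0))  # censored at the cut
                lo = c
        pieces.append((pid, lo, stop, event))
    return pieces

def period_subset(pieces, cut):
    """Early pieces start before the cut; late pieces start at or after it."""
    early = [p for p in pieces if p[1] < cut]
    late = [p for p in pieces if p[1] >= cut]
    return early, late
```

Selecting on the start of each piece (Stata's _t0) rather than its end keeps a piece that ends exactly at the cut in the early period.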

4. Tests based on re-estimating the model. Try including time interactions in your model.
Recall: an interaction means the effect of A on C varies with B. If the effect of variable X on the hazard rate (or ratio) varies with time, then hazards aren't proportional.
Recall the example: abstinence programs. Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on.

Figure: red = abstinence group; green = control. Panels: positive time interaction vs. no time interaction. In the non-proportional case, the effect of abstinence programs varies across time.

Strategy: create variables that reflect the interaction of the X variables with time. Significant effects of time interactions indicate non-proportional hazards. Fortunately, including the interaction term in the model corrects the problem.
Issue: X variables can interact with time in multiple ways: linearly, with "log time" or time squared, or with time dummies. You may have to try a range of things…

Figure: red = abstinence group; green = control. Linear time interaction: the effect grows consistently over time; try "Abstinence*time". Interaction with time period: the effect differs early vs. late; try "Abstinence*DLate".
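Because an interaction with time changes value as time passes, it has to be built on split (episode) data rather than one record per case. A minimal sketch, assuming (id, start, stop, event) pieces and a dict of time-constant covariate values; the function name, the field names, and the convention of evaluating time at the end of each piece are assumptions for illustration. If the pieces are split at every observed failure time, the value at stop is exactly the value each risk set uses.

```python
import math

def add_time_interactions(pieces, x, late_cut):
    """Attach X*time, X*ln(time), and X*DLate to each split episode.

    pieces: (id, start, stop, event) tuples, split finely enough that
            time is roughly constant within each piece.
    x: dict mapping id -> time-constant covariate (e.g. abstinence 1/0).
    Time is evaluated at the end of each piece (stop > 0 assumed).
    """
    rows = []
    for pid, start, stop, event in pieces:
        xi = x[pid]
        d_late = 1 if start >= late_cut else 0  # dummy for the late period
        rows.append({
            "id": pid, "start": start, "stop": stop, "event": event,
            "x": xi,
            "x_time": xi * stop,
            "x_logtime": xi * math.log(stop),
            "x_dlate": xi * d_late,
        })
    return rows
```

Each interaction column would then enter the Cox model alongside x itself; a significant coefficient on one of them suggests the effect of x changes with time in that form.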

5. Grambsch & Therneau test. Ex: Stata "estat phtest".
Tests for a non-zero slope of the Schoenfeld residuals vs. time. A zero slope implies that the log hazard ratio is constant over time, i.e., proportional hazards. Can be applied to the model as a whole (global test) or to each variable.
stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)
estat phtest
Output: "Test of proportional hazards assumption" (Time: Time), reporting chi2, df, and Prob>chi2 for the global test.
A significant chi-square indicates a violation of the proportional hazard assumption.

Notes on estat phtest:
1. Requires that you calculate Schoenfeld residuals when you run the original Cox model. And, if you want a test for each variable, you must also request scaled Schoenfeld residuals.
2. The test is based on identifying a non-zero time trend… but how should we characterize time? Options: normal/linear time, log time, time dummies, etc. Results may differ depending on your choice. Ex: estat phtest, log (specifies "log time").
A plot of smoothed Schoenfeld residuals can indicate the best way to characterize time: a linear trend (not a curve) indicates that time is characterized OK.
Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)
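The quantity the test looks at can be sketched directly: for each failure, the Schoenfeld residual is the failing case's x minus the risk-weighted mean of x over everyone at risk at that moment. Regressing the residuals on g(time) shows whether they trend. This is only a sketch of the idea, for one covariate with Breslow-style ties and hypothetical function names; it is not the chi-square statistic that estat phtest reports, which uses scaled residuals.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Unscaled Schoenfeld residuals for one covariate.

    For each failure i: x_i minus the exp(beta*x)-weighted mean of x
    over the risk set at t_i. Returns (t_i, residual) pairs.
    """
    out = []
    for i, (ti, di) in enumerate(zip(times, events)):
        if di != 1:
            continue
        at_risk = [j for j, tj in enumerate(times) if tj >= ti]
        w = [math.exp(beta * x[j]) for j in at_risk]
        xbar = sum(wj * x[j] for wj, j in zip(w, at_risk)) / sum(w)
        out.append((ti, x[i] - xbar))
    return out

def residual_time_slope(resid, transform=lambda t: t):
    """OLS slope of residuals on g(time); far from 0 hints at non-PH."""
    g = [transform(t) for t, _ in resid]
    r = [ri for _, ri in resid]
    gbar = sum(g) / len(g)
    rbar = sum(r) / len(r)
    sxx = sum((gi - gbar) ** 2 for gi in g)
    sxy = sum((gi - gbar) * (ri - rbar) for gi, ri in zip(g, r))
    return sxy / sxx
```

Passing transform=math.log mirrors the idea behind the "log" option: the same residuals, but the trend is measured against log time.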

What if the assumption is violated?
1. Improve model specification. Add time interactions to address non-proportionality. Ex: if high democracies are not proportional to low democracies, try adding "highdemoc*time". Variables can be interacted with linear time, log time, time dummies, etc., to address the issue.
2. Model groups separately. Split the sample along variables that are non-proportional.

What if the assumption is violated?
3. Use a stratified Cox model. Allows a different baseline hazard for each group. But you can't estimate the effect of the stratifying variable! Ex: stcox var1 var2 var3, strata(Dhighdemoc)
4. Use a piecewise model. Split time into chunks in which the PH assumption is met. Requires sufficient sample size in all time periods!

What if the assumption is violated?
5. Live with it (but temper your conclusions). Violation of the proportional hazard assumption tends to overestimate the effect of variables whose hazard ratios are increasing over time, and to underestimate those whose hazard ratios are decreasing.
However, Allison points out that the Cox model is reasonably robust; other issues (e.g., model misspecification) are bigger concerns.

Cox Model: Residuals
OLS regression: residual = difference between the observed Y and the predicted value, Yi - Y-hat.
EHA: residuals are more complicated. You could compute predicted failure minus observed failure… but what about censored cases? What is observed? There are a number of different ways to calculate residuals, each with different properties.

Cox Model: Residuals – Summary
From Cleves et al. (2004), An Introduction to Survival Analysis Using Stata, p. 184:
1. Cox-Snell residuals are useful for assessing overall model fit.
2. Martingale residuals are useful in determining the functional form of the covariates to be included in the model.
3. Schoenfeld residuals (scaled & unscaled), score residuals, and efficient score residuals are useful for checking and testing the proportional hazard assumption, examining leverage points, and identifying outliers.
4. Deviance residuals are useful in examining model accuracy and identifying outliers.

Martingale/Deviance Residuals: Outliers
Martingale residuals: the observed number of failures minus the expected number, accumulated over time. Feature: they range from +1 to -infinity.
Deviance residuals = martingale residuals rescaled to be symmetric around zero; easier to interpret.
Extreme martingale or deviance residuals may indicate outliers. Plot residuals vs. time, case number, IVs, etc., or simply sort the data by residuals and list the cases.
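The three residual types are closely linked, which a short sketch makes concrete. Assuming one covariate and the Breslow estimate of the baseline cumulative hazard H0(t), the Cox-Snell residual is a case's estimated cumulative hazard, the martingale residual is its event indicator minus that amount, and the deviance residual is the standard symmetrizing transform of the martingale residual. Function and variable names are hypothetical.

```python
import math

def cox_residuals(times, events, x, beta):
    """Cox-Snell, martingale, and deviance residuals for one covariate.

    Uses the Breslow estimate of the baseline cumulative hazard H0(t).
    """
    n = len(times)
    risk = [math.exp(beta * xi) for xi in x]
    fail_times = sorted({t for t, d in zip(times, events) if d == 1})
    # Breslow increments: d_j / (sum of exp(beta*x) over the risk set at t_j)
    increments = []
    for t in fail_times:
        d = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        s = sum(risk[k] for k in range(n) if times[k] >= t)
        increments.append((t, d / s))
    cox_snell, mart, dev = [], [], []
    for ti, di, ri in zip(times, events, risk):
        h0 = sum(inc for t, inc in increments if t <= ti)
        cs = h0 * ri   # Cox-Snell: estimated cumulative hazard for this case
        m = di - cs    # martingale: observed minus expected failures (at most 1)
        inner = -2.0 * (m + (di * math.log(di - m) if di == 1 else 0.0))
        dev.append(math.copysign(math.sqrt(max(inner, 0.0)), m))
        cox_snell.append(cs)
        mart.append(m)
    return cox_snell, mart, dev
```

Two properties are handy as sanity checks: martingale residuals never exceed 1 and, with the Breslow baseline, sum to zero; the deviance residuals keep the martingale residuals' signs while pulling in the long negative tail.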

Martingale & Deviance Residuals: Outliers
Stata code to identify outliers:
* Run Cox model, calculate martingale residuals
stcox var1 var2 var3, robust nohr mgale(mg)
* Creates variable "mg", which contains martingale residuals
* Next, compute deviance residuals using "predict"
predict dev, deviance
gen caseid = _n
* Create plots of various types
scatter mg caseid
* Deviance residual plots are generally easier to interpret
scatter dev caseid
scatter dev caseid, mlabel(newname2)
scatter dev _t

Deviance Residuals Plot
Extreme values may be outliers. Here, no obvious outliers are visible.

Efficient Score Residuals: Influential Cases
Procedure for identifying influential cases using efficient score residuals (ESRs): it is possible to compute DFBETAs based on ESRs.
DFBETA: the change in a variable's coefficient due to a particular case in the analysis. Cases with big DFBETAs may be overly influential.
Issue: Stata cannot automatically compute DFBETAs… you have to compute them manually. Also, computation is limited to 800 cases (for "Intercooled Stata"). Hopefully Stata will improve this in the future.

ESRs: Influential Cases
Stata code to estimate DFBETAs:
* Run Cox model, request efficient score residuals
* Creates vars esr1 to esr5, one for each of the 5 vars listed in the model
stcox gdp var1 var2 var3 var4, robust nohr esr(esr*)
* Create room for a matrix of up to 800 rows (for your cases)
set matsize 800
* Create ESR matrix; it needs one column per model variable to conform with V
mkmat esr1 esr2 esr3 esr4 esr5, matrix(esr)
* Multiply ESRs and var/cov matrix to estimate DFBETAs, save results
mat V = e(V)
mat Inf = esr*V
svmat Inf, names(s)
* Label estimates for subsequent plots
label var s1 "dfbeta - gdp"
label var s2 "dfbeta - var 1"
label var s3 "dfbeta - var 2"
label var s4 "dfbeta - var 3"
label var s5 "dfbeta - var 4"
* Plot DFBETAs for each variable vs. time or case number
scatter s1 _t, yline(0) mlab(caseID) s(i)
scatter s1 casenumber, yline(0) mlab(caseID) s(i)
* Look for extreme values (for each IV: s1 to s5)
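With a single covariate, the matrix product esr*V reduces to score residual times variance, which makes the logic easy to sketch. The version below assumes one covariate, Breslow ties, and the model-based variance 1/information (with the robust option, e(V) in the Stata code is instead the robust variance, so values would differ). The function name is hypothetical.

```python
import math

def score_residuals_and_dfbeta(times, events, x, beta):
    """Efficient score residuals and approximate DFBETAs for one covariate.

    DFBETA_i is approximated as score_i * V with V = 1 / information.
    """
    n = len(times)
    risk = [math.exp(beta * xi) for xi in x]
    fail_times = sorted({t for t, d in zip(times, events) if d == 1})
    stats = {}   # failure time -> (number failing, S0, risk-weighted mean of x)
    info = 0.0
    for t in fail_times:
        rs = [k for k in range(n) if times[k] >= t]
        s0 = sum(risk[k] for k in rs)
        s1 = sum(risk[k] * x[k] for k in rs)
        s2 = sum(risk[k] * x[k] ** 2 for k in rs)
        d = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        xbar = s1 / s0
        stats[t] = (d, s0, xbar)
        info += d * (s2 / s0 - xbar ** 2)
    scores = []
    for i in range(n):
        # Observed part: the Schoenfeld-type term if case i failed
        li = (x[i] - stats[times[i]][2]) if events[i] == 1 else 0.0
        # Compensator: case i's share of expected failures while it is at risk
        for t in fail_times:
            if t <= times[i]:
                d, s0, xbar = stats[t]
                li -= risk[i] * (x[i] - xbar) * d / s0
        scores.append(li)
    variance = 1.0 / info
    return scores, [s * variance for s in scores], variance
```

A useful check is that the score residuals add up to the total score for beta; a case whose DFBETA is large relative to the coefficient's standard error is the kind of case worth inspecting.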

DFBETA example: DFBETA for NGOs (plotted by case number)
The DFBETA value indicates that the presence of Latvia changes the NGO coefficient by … standard deviations.

Cox-Snell residuals: Model Fit
Cox-Snell residuals can be plotted to assess model fit. If the model fits well, a graph of the integrated (cumulative) hazard of the Cox-Snell residuals vs. the Cox-Snell residuals themselves will fall on the 45-degree line.
Strategy in Stata:
1. Run the Cox model, requesting martingale residuals.
2. Use "predict" to compute Cox-Snell residuals.
3. stset your data again, with the Cox-Snell residuals as the time variable.
4. Compute the integrated hazard.
5. Graph the integrated hazard versus the residuals.
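The reasoning behind the 45-degree line: if the model is right, the Cox-Snell residuals behave like censored draws from a unit exponential distribution, whose cumulative hazard is H(r) = r. So treating the residuals as "survival times" and estimating their cumulative hazard (here with Nelson-Aalen) should reproduce that line. A minimal sketch of steps 3 through 5, with hypothetical function names and a single number summarizing the gap.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard at each distinct failure time."""
    out, h = [], 0.0
    for t in sorted({t for t, d in zip(times, events) if d == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        h += deaths / at_risk
        out.append((t, h))
    return out

def cox_snell_fit_gap(cox_snell, events):
    """Largest |H(r) - r| when the Cox-Snell residuals r are used as survival times.

    The original event indicators are kept as the failure flags.
    """
    return max(abs(h - r) for r, h in nelson_aalen(cox_snell, events))
```

In practice the plot is more informative than one number: as the slides note, departures at the far right rest on very few cases, so the left and middle of the curve deserve more weight.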

Cox-Snell Model Fit Example
Cox-Snell plot for the environmental law data. This looks quite bad: the cumulative hazard should fall on the line… instead, there is a sizable gap.
Note: don't worry about small deviations from the line at the right edge of the plot; there are typically few cases there…

Stratified Cox Models
Stratified models allow different baseline hazards for sub-groups in your data, but constrain all model coefficients to be the same across all groups.
Useful if we think that some sub-groups have very different hazard curves over time AND we aren't really interested in differences across those groups (they are just a nuisance).
Another option is to simply analyze groups separately, but we lose sample size. Stratifying avoids that.
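What stratification does to the estimation can be sketched through the partial likelihood: risk sets are formed only within a stratum, so each stratum effectively gets its own baseline hazard, while a single beta is shared across strata. This sketch assumes one covariate and Breslow ties, and the function name is hypothetical. It also shows why the stratifying variable's effect cannot be estimated: a covariate that is constant within every stratum cancels out of every term.

```python
import math

def stratified_log_partial_likelihood(times, events, x, strata, beta):
    """Breslow log partial likelihood with risk sets restricted to each stratum."""
    loglik = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if di != 1:
            continue
        # Only cases in the same stratum, still at risk at t_i, enter the denominator
        denom = sum(math.exp(beta * x[j]) for j in range(len(times))
                    if strata[j] == strata[i] and times[j] >= ti)
        loglik += beta * x[i] - math.log(denom)
    return loglik
```

If x equals the stratum indicator (or anything constant within strata), the log likelihood is flat in beta, so there is no information about that coefficient.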

Cox Models for Grouped Data
Sometimes cases are not independent. Ex: students in the same class; people in the same family.
Two useful options:
1. Stata's "cluster" option: adjusts standard errors based on group membership.
stcox var1 var2 var3, cluster(FamilyID)
2. Cox model with shared frailty: another name for a random effects model. We'll discuss this later in the course. Don't confuse it with non-shared frailty models!
stcox var1 var2 var3, shared(FamilyID)

Stata note: tvc. Stata has a special option, tvc(), for "time-varying covariates". You DO NOT need to use this!!! It is designed for cases where you wish to specify the character of the time variation (e.g., a rate of decay). You can simply create time-varying variables in your dataset… Stata will analyze them properly using the stcox command.
