Event History Analysis 5 Sociology 8811 Lecture 19 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission.


1 Event History Analysis 5 Sociology 8811 Lecture 19 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission

2 Announcements
Class topic:
– Time-varying data: example
– More details on Cox models & other fully parametric Proportional Hazard models
Paper Assignment #2 handed out today
– Due April 26

3 EHA Models: In Greater Depth
Issues:
– Properties of Cox models (semi-parametric) and fully parametric models
   Plus relevant assumptions, diagnostics
   Strategies for outliers
   Model fit
– Choosing a model. What should you do?
– Other issues
   Accelerated failure time models
   Frailty
   Etc.

4 Event History Example
What factors affect how soon a country passes an environmental protection law?
Event: Passing an environmental law in a given year
Risk set: All countries that have not yet passed an environmental protection law
– We decided that risk begins in 1970 (when such laws were invented)
   Countries that became independent after 1970 are treated as entering the analysis “late”
Option #2: Duration since independence (age)
– But that was less appropriate for the research question.

5 Example: Environmental Laws
Cross-national time series dataset of nearly 100 countries
Event: when a country writes its first comprehensive environmental law (e.g., EPA)
Data taken from various sources
Independent variables: GDP, population, democracy, degradation, education, domestic and international NGOs
Time duration: analyses are from 1970-1998
– In other words, countries enter the “risk set” in 1970, or when they become independent
Total sample of 97 countries
– 73 countries have an event between 1970 and 1998.

6 Time-Varying Data Structure
In the previous example, each row of data was a separate survey respondent
– Because survey respondents were not tracked over multiple years, this data was not “time-varying”
In the current example, we have the advantage of time-varying data
– Each row of data is a country-year
– Our independent variables may change over time.

7 States, Spells, and Events
Example (India):
[Figure: State (0 or 1) plotted against Year, 1970 … 1983 1984 1985 1986 1987 1988 … 1998, marking Spell #1, the point labeled “Law written”, and Spell #2]

8 States, Spells, and Events
Example (Iran):
[Figure: State (0 or 1) plotted against Year, 1970 … 1983 1984 1985 1986 1987 1988 … 1998, marking a single Spell #1. No law written as of 1998]

9 Time-Varying Data Structure
Example (slide callouts: “Law written” for lawevent, “Spell” for start/end, “State” for ss/es, “Population” for pop):
newname2  newid3  year  lawevent  num  start  end   ss  es  pop
INDIA     1119    1978  0         1    1978   1979  0   0   656941
INDIA     1119    1979  0         1    1979   1980  0   0   672021
INDIA     1119    1980  0         1    1980   1981  0   0   687332
INDIA     1119    1981  0         1    1981   1982  0   0   702821
INDIA     1119    1982  0         1    1982   1983  0   0   718426
INDIA     1119    1983  0         1    1983   1984  0   0   734072
INDIA     1119    1984  0         1    1984   1985  0   0   749677
INDIA     1119    1985  0         1    1985   1986  0   0   765147
INDIA     1119    1986  1         1    1986   1987  0   1   781893
INDIA     1119    1987  0         1    1987   1988  1   1   798680
INDIA     1119    1988  0         1    1988   1989  1   1   815590
INDIA     1119    1989  0         1    1989   1990  1   1   832535
INDIA     1119    1990  0         1    1990   1991  1   1   849515
INDIA     1119    1991  0         1    1991   1992  1   1   866530
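The country-year layout can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the lecture dataset: the function name `country_year_rows` is hypothetical, and the coding rule (start state `ss` switches to 1 the year after the event, end state `es` switches to 1 in the event year) is an assumption read off the India rows.

```python
def country_year_rows(name, first_year, last_year, event_year=None):
    """Build one record per country-year with start/end states (hypothetical helper)."""
    rows = []
    for year in range(first_year, last_year + 1):
        # Assumed coding: ss = state at the start of the year, es = state at the end.
        ss = 1 if event_year is not None and year > event_year else 0
        es = 1 if event_year is not None and year >= event_year else 0
        rows.append({"name": name, "year": year,
                     "start": year, "end": year + 1, "ss": ss, "es": es})
    return rows


india = country_year_rows("INDIA", 1978, 1991, event_year=1986)

# Rows still "at risk" are those that start the year without a law.
at_risk = [row for row in india if row["ss"] == 0]
```

Under these assumptions the 1986 row is the only at-risk row with `es` equal to 1, which is the row a failure indicator such as `es==1` would pick out.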

10 Time-Varying Data Structure
(Same India country-year data as on the previous slide.)
Stset command: stset end, failure(es==1) origin(1970)
Note: It is common to drop cases that are not at risk (ex: if start state = 1)
– BUT, it is not necessary… Stata drops cases after the event by default… unless you specify exit(time .)

11 Time-Varying Data Structure
What if countries pass multiple laws? Called “repeated events”
1. Start state could be reset to zero
2. We can override the Stata default of removing cases after the first event occurs: exit(time .)
newname2  newid3  year  lawevent  num  start  end   ss  es  pop
INDIA     1119    1978  0         1    1978   1979  0   0   656941
INDIA     1119    1979  0         1    1979   1980  0   0   672021
INDIA     1119    1980  0         1    1980   1981  0   0   687332
INDIA     1119    1981  0         1    1981   1982  0   0   702821
INDIA     1119    1982  0         1    1982   1983  0   0   718426
INDIA     1119    1983  0         1    1983   1984  0   0   734072
INDIA     1119    1984  0         1    1984   1985  0   0   749677
INDIA     1119    1985  0         1    1985   1986  0   0   765147
INDIA     1119    1986  1         1    1986   1987  0   1   781893
INDIA     1119    1987  0         1    1987   1988  0   0   798680
INDIA     1119    1988  0         1    1988   1989  0   0   815590
INDIA     1119    1989  0         1    1989   1990  0   0   832535
INDIA     1119    1990  0         1    1990   1991  0   1   849515
INDIA     1119    1991  0         1    1991   1992  0   0   866530

12 Cumulative Survivor Function

13 Cumulative Survivor Function by Region

14 Cumulative Survivor Function West vs. non-West

15 Smoothed Hazard Function West vs. non-West

16 Constant Rate Model: Example
Simple one-variable model comparing west vs. non-west
streg west, dist(exponential) nohr
Exponential regression -- log relative-hazard form
No. of subjects = 97                 Number of obs = 2047
No. of failures = 81
Time at risk    = 2047
                                     Wald chi2(1)  = 12.10
Log pseudolikelihood = 275.49924     Prob > chi2   = 0.0005
(Std. Err. adjusted for 97 clusters in newid3)
------------------------------------------------------------------------------
             |             Robust
          _t |      Coef.  Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
        west |   .6931146   .1992638    3.48  0.001    .3025648   1.083664
       _cons |   -3.34054   .0807514  -41.37  0.000    -3.49881   -3.18227
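To read these coefficients as rates, recall that in a constant rate (exponential) model the hazard is exp(_cons + b × west). The sketch below plugs in the two coefficients printed above. The variable names are illustrative, and the "expected waiting time" uses the standard exponential result that mean duration is 1/rate, which the slide itself does not state.

```python
import math

# Coefficients copied from the streg output (log relative-hazard form).
b_west = 0.6931146
b_cons = -3.34054

# Implied constant hazard rates per country-year.
rate_nonwest = math.exp(b_cons)
rate_west = math.exp(b_cons + b_west)

# For a constant rate, the expected waiting time to the event is 1 / rate.
expected_years_nonwest = 1 / rate_nonwest
expected_years_west = 1 / rate_west

print(f"non-West rate: {rate_nonwest:.4f}, West rate: {rate_west:.4f}")
print(f"ratio West / non-West: {rate_west / rate_nonwest:.3f}")
```

The ratio of the two rates is simply exp(b_west), the hazard ratio Stata would report if `nohr` were omitted.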

17 Constant Rate Model: Example
Model with time-varying covariates
No. of subjects = 92                 Number of obs = 1938
No. of failures = 77
Time at risk    = 1938
                                     Wald chi2(6)  = 94.29
Log pseudolikelihood = 282.11796     Prob > chi2   = 0.0000
(Std. Err. adjusted for 92 clusters in newid3)
------------------------------------------------------------------------------
             |             Robust
          _t |      Coef.  Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   -.044568   .1842564   -0.24  0.809   -.4057039   .3165679
 degradation |  -.4766958   .1044108   -4.57  0.000   -.6813372  -.2720543
   education |   .0377531   .0130314    2.90  0.004    .0122121   .0632942
   democracy |   .2295392   .0959669    2.39  0.017    .0414475    .417631
         ngo |   .4258148   .1576803    2.70  0.007    .1167671   .7348624
        ingo |   .3114173    .365112    0.85  0.394   -.4041891   1.027024
       _cons |  -4.565513   1.864396   -2.45  0.014   -8.219663  -.9113642
Democratic countries enact laws at a higher rate than less-democratic countries

18 Constant Rate Model: Example
Same model – with Hazard Ratios
No. of subjects = 92                 Number of obs = 1938
No. of failures = 77
Time at risk    = 1938
                                     Wald chi2(6)  = 94.29
Log pseudolikelihood = 282.11796     Prob > chi2   = 0.0000
(Std. Err. adjusted for 92 clusters in newid3)
------------------------------------------------------------------------------
             |             Robust
          _t | Haz. Ratio  Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .9564106   .1762248   -0.24  0.809    .6665075   1.372409
 degradation |   .6208314   .0648215   -4.57  0.000      .50594   .7618129
   education |   1.038475   .0135328    2.90  0.004    1.012287    1.06534
   democracy |    1.25802   .1207283    2.39  0.017    1.042318    1.51836
         ngo |   1.530837   .2413828    2.70  0.007    1.123858   2.085195
        ingo |   1.365359    .498509    0.85  0.394    .6675179   2.792742
------------------------------------------------------------------------------
A 1-point increase in democracy increases the hazard rate by 25.8%!
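The hazard ratios are just the exponentiated coefficients from the previous slide, and the percent change in the hazard for a 1-unit increase is (ratio − 1) × 100. A minimal Python sketch of that arithmetic, with the coefficients typed in from the coefficient-form output (the dictionary and variable names are illustrative):

```python
import math

# Coefficients from the exponential model reported in coefficient form.
coefficients = {
    "gdp": -0.044568,
    "degradation": -0.4766958,
    "education": 0.0377531,
    "democracy": 0.2295392,
    "ngo": 0.4258148,
    "ingo": 0.3114173,
}

# Hazard ratio = exp(coefficient); percent change = (ratio - 1) * 100.
hazard_ratios = {name: math.exp(b) for name, b in coefficients.items()}
percent_change = {name: (hr - 1) * 100 for name, hr in hazard_ratios.items()}

for name in coefficients:
    print(f"{name:12s} HR = {hazard_ratios[name]:.4f}  ({percent_change[name]:+.1f}%)")
```

A ratio below 1 (for example, degradation) means the hazard falls as the variable rises, so the percent change comes out negative.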

19 Constant Rate Model: Example
What if we expect global civil society to have a particularly strong effect in the non-West?
Option #1: Create an interaction term
No. of subjects = 92                 Number of obs = 1938
No. of failures = 77
Time at risk    = 1938
                                     Wald chi2(8)  = 91.25
Log pseudolikelihood = 282.5435      Prob > chi2   = 0.0000
(Std. Err. adjusted for 92 clusters in newid3)
------------------------------------------------------------------------------
             |             Robust
          _t |      Coef.  Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |  -.0789765   .2546507   -0.31  0.756   -.5780827   .4201298
 degradation |  -.4656443   .1177774   -3.95  0.000   -.6964838  -.2348047
   education |   .0425672   .0137641    3.09  0.002      .01559   .0695444
   democracy |   .2277121   .0951693    2.39  0.017    .0411836   .4142406
         ngo |   .4069064   .1595268    2.55  0.011    .0942397   .7195732
        ingo |  -.1326514   .6842896   -0.19  0.846   -1.473834   1.208532
     nonwest |  -3.345421    4.94285   -0.68  0.499   -13.03323   6.342387
ingoXnonwest |     .49408   .6819827    0.72  0.469   -.8425815   1.830741
       _cons |   -1.28664   5.692187   -0.23  0.821   -12.44312   9.869841
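With an interaction term, the effect of ingo is no longer a single number: it is the main effect in the West (nonwest = 0) and the main effect plus the interaction in the non-West (nonwest = 1). The sketch below works that out from the printed coefficients. Variable names are illustrative, and since neither term is statistically significant here, the point estimates are shown only to demonstrate the arithmetic.

```python
import math

# Coefficients from the interaction model above.
b_ingo = -0.1326514
b_ingo_x_nonwest = 0.49408

# Effect of a 1-unit increase in ingo on the log hazard, by region.
effect_west = b_ingo                          # nonwest = 0
effect_nonwest = b_ingo + b_ingo_x_nonwest    # nonwest = 1

print(f"West:     coef = {effect_west:.4f}, hazard ratio = {math.exp(effect_west):.3f}")
print(f"non-West: coef = {effect_nonwest:.4f}, hazard ratio = {math.exp(effect_nonwest):.3f}")
```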

20 Constant Rate Model: Example
What if we expect global civil society to have a particularly strong effect in the non-West?
Option #2: Include only non-Western countries in the analysis
No. of subjects = 76                 Number of obs = 1720
No. of failures = 61
Time at risk    = 1720
                                     Wald chi2(6)  = 55.26
Log pseudolikelihood = 215.57325     Prob > chi2   = 0.0000
(Std. Err. adjusted for 76 clusters in newid3)
------------------------------------------------------------------------------
             |             Robust
          _t |      Coef.  Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .3521921   .3470927    1.01  0.310   -.3280971   1.032481
 degradation |  -.7326479   .2566293   -2.85  0.004   -1.235632  -.2296637
   education |   .0314009   .0193698    1.62  0.105   -.0065633    .069365
   democracy |   .2387203   .0935281    2.55  0.011    .0554087    .422032
         ngo |   .3604018   .1984957    1.82  0.069   -.0286426   .7494462
        ingo |   .5447586   .4949746    1.10  0.271   -.4253738   1.514891
       _cons |  -8.446306   3.872579   -2.18  0.029   -16.03642  -.8561915

21 Cox Models
The basic Cox model: [equation not captured in the transcript]
– Where h(t) is the hazard rate
– h_0(t) is some baseline hazard function (to be inferred from the data)
This obviates the need for building a specific functional form into the model
Also written as: [equation not captured in the transcript]
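The equations on this slide were images and are not in the transcript. For reference, the standard textbook statement of the Cox model, which may differ in notation from the slide, uses covariates X_1 … X_k with coefficients β_1 … β_k (symbols assumed here, not taken from the slide):

```latex
% Proportional hazards form: baseline hazard scaled by the covariates
h(t) = h_0(t)\,\exp\left(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k\right)

% Equivalent log-linear form
\ln h(t) = \ln h_0(t) + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k
```

Setting every X to zero leaves h(t) = h_0(t), which is why the baseline hazard corresponds to all covariates being zero.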

22 Cox Model: Example
Mostly similar to exponential model…
Cox regression -- Breslow method for ties
No. of subjects = 92                 Number of obs = 1938
No. of failures = 77
Time at risk    = 1938
                                     Wald chi2(6)  = 65.49
Log pseudolikelihood = -287.27209    Prob > chi2   = 0.0000
(Std. Err. adjusted for 92 clusters in newid3)
------------------------------------------------------------------------------
             |             Robust
          _t |      Coef.  Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4572288   .2025104    2.26  0.024    .0603157   .8541419
 degradation |  -.4311475   .1131853   -3.81  0.000   -.6529867  -.2093083
   education |   .0027517   .0136965    0.20  0.841    -.024093   .0295964
   democracy |   .2836321   .0911985    3.11  0.002    .1048862   .4623779
         ngo |   .2874221   .1614045    1.78  0.075   -.0289248    .603769
        ingo |   -.026845   .2391101   -0.11  0.911   -.4954922   .4418021
Most effects = similar… though education effect loses significance…

23 Cox Model Issues: Ties
1. How to handle ties in data
It is mathematically complex to estimate models when there are tied failures
– That is: two cases that have events at the exact same time
Several mathematical approaches:
– Breslow approximation – simplest approach
   Stata default, but not the best choice!
– Efron approximation – generally better
   More computationally intensive, but given the power of modern computers it is not an issue
   Efron is generally preferred

24 Cox Model Issues: Ties
– Exact marginal – “continuous time approximation”
   Box-Steffensmeier & Jones: “Averaged Likelihood”
   Assumes ties didn’t happen EXACTLY at the same time… and considers all possible orderings
– Exact partial – “discrete”
   Box-Steffensmeier & Jones: “exact discrete method”
   Assumes ties happened EXACTLY at the same time
– Advice: Use Efron at a minimum
   Exact methods are often more accurate
   Exact marginal often makes most sense… events rarely occur at the EXACT same time
   But exact methods can take a LONG time
   For big datasets with many ties, Efron is OK.
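To make the difference between the Breslow and Efron approximations concrete, the sketch below computes the log partial-likelihood contribution of a single failure time under each. It follows the standard textbook formulas rather than Stata's internal code. The function names are hypothetical, and the inputs are relative risk scores exp(xb) for everyone at risk (tied failures included) and for the tied failures alone.

```python
import math


def breslow_log_term(risk_scores, tied_scores):
    """Breslow: every tied failure uses the full risk-set total as its denominator."""
    d = len(tied_scores)
    return sum(math.log(s) for s in tied_scores) - d * math.log(sum(risk_scores))


def efron_log_term(risk_scores, tied_scores):
    """Efron: the k-th tied failure removes a fraction k/d of the tied cases' total."""
    d = len(tied_scores)
    total = sum(risk_scores)
    tied_total = sum(tied_scores)
    log_denominator = sum(math.log(total - (k / d) * tied_total) for k in range(d))
    return sum(math.log(s) for s in tied_scores) - log_denominator


# Illustrative risk set of four cases, two of which fail at the same time.
risk_set = [1.0, 2.0, 0.5, 1.5]
tied_failures = [1.0, 2.0]

print("Breslow:", breslow_log_term(risk_set, tied_failures))
print("Efron:  ", efron_log_term(risk_set, tied_failures))
```

With a single failure the two terms coincide. They diverge only when failures are tied, and the gap grows with the share of the risk set that fails at once.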

25 Cox Model: Baseline Hazard
Cox models involve a “baseline hazard”
– Note: baseline = when all covariates are zero
Question: What does the baseline hazard look like?
– Or baseline survivor & integrated hazard?
– Stata can estimate the baseline survivor, hazard, integrated hazard. Two steps:
1. You must ask Stata to save the info when you run the Cox model
– Ex: stcox gdp degradation education democracy ngo ingo, robust nohr basehc(h0)
2. Use “stcurve” command to plot the baseline curves
– Ex: stcurve, hazard OR stcurve, survival

26 Cox Model: Baseline Hazard Baseline rate: Adoption of environmental law

27 Cox Model: Baseline Hazard
Note: It may not always make sense to plot the baseline hazard
Baseline shows hazard when X variables are zero
Sometimes zero values aren’t very useful/interesting
– Example: Does it make sense to plot hazard of countries adopting laws, if GDP is zero?
   Hazard rate is quite low
In some cases, you’ll just get a flat zero curve
– Or extremely high values
– Solutions:
   1. Rescale indep vars before running Cox model
   2. Use stcurve to choose relevant values of vars.

28 Cox Model: Estimated Hazards
You can also use stcurve to plot estimated hazard rates based on values of indep vars
– Ex: What is hazard curve if democracy = 1, 5, 10?
Strategy: use “at” subcommand:
– stcurve, hazard at(democ=1) at2(democ=10)
NOTE: All other variables are pegged at the mean…
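Under proportional hazards, the curves for different democracy values are the same baseline curve multiplied by a constant, so the gap between them can be worked out by hand. The sketch below assumes the plotted variable is the `democracy` covariate from the Cox model reported earlier (coefficient .2836321) and that the other covariates are held at the same values in both curves; the function name is illustrative.

```python
import math

# Cox coefficient for democracy, taken from the Cox regression output.
b_democracy = 0.2836321


def relative_hazard(democracy_high, democracy_low, coefficient=b_democracy):
    """Ratio of estimated hazards at two democracy scores, other covariates held equal."""
    return math.exp(coefficient * (democracy_high - democracy_low))


ratio_10_vs_1 = relative_hazard(10, 1)
ratio_5_vs_1 = relative_hazard(5, 1)

print(f"democracy 10 vs 1: hazard is {ratio_10_vs_1:.1f} times higher at every t")
print(f"democracy 5 vs 1:  hazard is {ratio_5_vs_1:.1f} times higher at every t")
```

Because the ratio does not depend on t, the plotted curves keep the same relative gap at every point in time, which is the proportional hazards assumption in visual form.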

29 Cox: Estimated Hazard Rate Hazard rate for adoption of environmental law

30 Proportional Hazard Assumption
Key assumption: Proportional hazards
– Hazards of different groups are proportional over time
– i.e., the hazard ratio does NOT vary over time
Example: Effect of “abstinence” program on sexual behavior
– Issue: Do abstinence programs lower the rate in a consistent manner across time?
– Or, perhaps the rate is lower initially… but then the rate jumps back up (maybe even exceeds the control group).
Groups are assumed to have “parallel” hazards
– Rather than rates that diverge, converge (or cross).

31 Proportional Hazard Assumption
Strategies:
1. Visually examine raw hazard plots for sub-groups in your data
– Watch for non-parallel trends
– A simple, crude method… but often identifies big violations

32 Proportional Hazard Assumption
Visual examination of raw hazard rate
[Figure: raw hazard rates by group] Parallel trends in hazard rate look good!

33 Proportional Hazard Assumption
2. Plot –ln(–ln(survival)) versus ln(time) across values of X variables
– What Stata calls “stphplot”
– Parallel lines indicate proportional hazards
– Again, convergence and divergence (or crossing) indicates violation
A less-common approach: compare observed survivor plot to predicted values (for different values of X)
– What Stata calls “stcoxkm”
– If observed are similar to predicted, assumption is not likely to be violated.
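The quantity behind this kind of plot can be computed by hand: estimate a Kaplan-Meier survivor curve for each group, then transform each point to (ln t, −ln(−ln S)). The sketch below is a minimal pure-Python version using made-up durations. It does not reproduce Stata's stphplot, and the function names are hypothetical.

```python
import math


def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time (events: 1 = failed, 0 = censored)."""
    survival = 1.0
    curve = []
    event_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        failed = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - failed / at_risk
        curve.append((t, survival))
    return curve


def log_minus_log(curve):
    """Transform a survivor curve to (ln t, -ln(-ln S)); points with S = 0 or 1 are undefined."""
    return [(math.log(t), -math.log(-math.log(s))) for t, s in curve if 0 < s < 1]


# Made-up durations for one group: five cases, one censored at t = 3.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 1]

curve = kaplan_meier(durations, events)
points = log_minus_log(curve)
```

Running the same two functions on each subgroup and plotting the resulting point sets against each other gives the picture described above: roughly parallel point clouds are consistent with proportional hazards.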

34 Proportional Hazard Assumption
-ln(-ln(survivor)) vs. ln(time) – “stphplot”
[Figure: stphplot output] Convergence suggests violation of proportional hazard assumption (But, I’ve seen worse!)

35 Proportional Hazard Assumption
Cox estimate vs. observed KM – “stcoxkm”
[Figure: stcoxkm output] Predicted differs from observed for countries in West

36 Proportional Hazard Assumption
3. Piecewise Models
Piecewise = break model up into pieces (by time)
– Ex: Split analysis into “early” vs “late” time
If coefficients vary in different time periods, hazards are not proportional
– Example:
   stcox var1 var2 var3 if _t < 10
   stcox var1 var2 var3 if _t >= 10
Look for large changes in coefficients!

37 Proportional Hazard Assumption
In a piecewise model, coefficients would differ in non-proportional models
[Figure: two panels, each comparing Early and Late periods]
– Proportional: Here, the effect is the same in both time periods
– Non-Proportional: Here, the effect is negative in the early period and positive in the late period

38 Piecewise Models
Look at coefficients at 2 (or more) spans of time
EARLY
. stcox gdp degradation education democracy ngo ingo if year < 1985, robust nohr
          _t |      Coef.  Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4465818   .4255587    1.05  0.294   -.3874979   1.280661
 degradation |   -.282548   .1572746   -1.80  0.072   -.5908005   .0257045
   education |  -.0195118   .0328195   -0.59  0.552   -.0838368   .0448131
   democracy |   .2295673   .2625205    0.87  0.382   -.2849634    .744098
         ngo |   .6792462   .3110294    2.18  0.029    .0696399   1.288853
        ingo |   .6664661   .4804229    1.39  0.165   -.2751456   1.608078
------------------------------------------------------------------------------
LATE
. stcox gdp degradation education democracy ngo ingo if year >= 1985, robust nohr
          _t |      Coef.  Std. Err.       z   P>|z|   [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   .4963942    .357739    1.39  0.165   -.2047613    1.19755
 degradation |  -.5702894   .2395257   -2.38  0.017   -1.039751  -.1008277
   education |   .0142118   .0143762    0.99  0.323   -.0139649   .0423886
   democracy |   .2541799   .0981386    2.59  0.010    .0618317   .4465281
         ngo |   .1742862   .1448187    1.20  0.229   -.1095532   .4581256
        ingo |  -.1134661   .2104308   -0.54  0.590   -.5259028   .2989707
------------------------------------------------------------------------------
Note: Effect of ngo is larger in early period
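One rough way to judge whether an early/late difference is "large" is to compare it to its standard error. The sketch below is not part of the lecture. It applies a simple z statistic to the ngo and ingo coefficients printed above and assumes the two period estimates are independent, which is only an approximation because the same countries appear in both periods. The function name is hypothetical.

```python
import math


def coefficient_difference_z(b_early, se_early, b_late, se_late):
    """Rough z statistic for the gap between two coefficients, treating them as independent."""
    return (b_early - b_late) / math.sqrt(se_early ** 2 + se_late ** 2)


# Coefficients and robust standard errors from the EARLY and LATE models.
z_ngo = coefficient_difference_z(0.6792462, 0.3110294, 0.1742862, 0.1448187)
z_ingo = coefficient_difference_z(0.6664661, 0.4804229, -0.1134661, 0.2104308)

print(f"ngo:  z = {z_ngo:.2f}")
print(f"ingo: z = {z_ingo:.2f}")
```

Treat the result as a screening heuristic only. A formal test would interact the variable with a period dummy in a single model, as in the time-interaction strategy.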

39 Proportional Hazard Assumption
4. Tests based on re-estimating model
Try including time interactions in your model
– Recall: Interactions – effect of A on C varies with B
– If effect of variable X on hazard rate (or ratio) varies with time, then hazards aren’t proportional
Recall example: Abstinence programs
– Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on

40 Proportional Hazard Assumption
[Figure: two panels, Proportional and Non-Proportional. Red = Abstinence group; green = control]
In non-proportional case, the effect of abstinence programs varies across time

41 Proportional Hazard Assumption
Strategy: Create variables that reflect the interaction of X variables with time
– Significant effects of time interactions indicate non-proportional hazard
– Fortunately, inclusion of the interaction term in the model corrects the problem.
Issue: X variables can interact with time in multiple ways…
– Linearly
– With “log time” or time squared
– With time dummies
– You may have to try a range of things…
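The different kinds of time interaction listed above are simple transformations of the same two columns. As an illustration, the sketch below builds a linear, a log-time, and a late-period-dummy interaction for one record. The function name, the keys, and the cutoff are all hypothetical; the real variables would be created inside the country-year dataset before re-estimating the model.

```python
import math


def time_interactions(x, t, late_cutoff):
    """Build three candidate interactions between covariate x and analysis time t (t > 0)."""
    late = 1 if t >= late_cutoff else 0
    return {
        "x_time": x * t,               # effect changes linearly with time
        "x_logtime": x * math.log(t),  # effect changes with log time
        "x_late": x * late,            # effect differs early vs. late
    }


# Example: a treated case (x = 1) observed at t = 4 and t = 16, with "late" starting at 15.
early_record = time_interactions(1, 4, late_cutoff=15)
late_record = time_interactions(1, 16, late_cutoff=15)
```

Each new column would enter the model alongside the original X; a significant coefficient on the interaction is the signal of non-proportionality described above.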

42 Proportional Hazard Assumption
[Figure: two panels. Red = Abstinence group; green = control]
Linear time interaction: Effect grows consistently over time
– Try “Abstinence*time”
Interaction with time-period: Effect differs early vs. late
– Try “Abstinence*DLate”

43 Proportional Hazard Assumption
5. Grambsch & Therneau test
– Ex: Stata “estat phtest”
Test for non-zero slope of Schoenfeld residuals vs time
– A zero slope implies the log hazard ratio is constant over time, i.e., hazards are proportional
Can be applied to general model, or for each variable
. stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)
. estat phtest
Test of proportional hazards assumption
Time: Time
----------------------------------------------------------------
            |       chi2       df     Prob>chi2
------------+---------------------------------------------------
global test |      18.14        6        0.0059
----------------------------------------------------------------
Significant chi-square indicates violation of proportional hazard assumption

44 Proportional Hazard Assumption
Variable-by-variable test “estat phtest”:
. estat phtest, detail
Test of proportional hazards assumption
Time: Time
----------------------------------------------------------------
            |        rho       chi2       df     Prob>chi2
------------+---------------------------------------------------
gdp         |    0.09035       0.63        1        0.4277
degradation |   -0.22735       3.41        1        0.0646
education   |    0.06915       0.47        1        0.4950
democracy   |   -0.04929       0.20        1        0.6560
ngo         |   -0.18691       4.56        1        0.0327
ingo        |   -0.03759       0.34        1        0.5609
------------+---------------------------------------------------
global test |                 18.14        6        0.0059
----------------------------------------------------------------
Note: Certain variables are especially problematic…
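The Prob>chi2 column is the upper-tail probability of a chi-square distribution with the listed degrees of freedom. As a check on how to read the table, the sketch below recomputes two of the p-values with the Python standard library only. It uses closed-form expressions that hold for df = 1 and for even df, which happen to cover the cases shown here; the function names are hypothetical.

```python
import math


def chi2_pvalue_df1(x):
    """Upper-tail probability of a chi-square(1) variable: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


def chi2_pvalue_even_df(x, df):
    """Upper-tail probability for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("this closed form only holds for even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


p_ngo = chi2_pvalue_df1(4.56)             # ngo row: chi2 = 4.56, df = 1
p_global = chi2_pvalue_even_df(18.14, 6)  # global test: chi2 = 18.14, df = 6

print(f"ngo p-value:    {p_ngo:.4f}")
print(f"global p-value: {p_global:.4f}")
```

Small differences from the printed table are expected because the chi2 values in the output are rounded to two decimals.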

45 Proportional Hazard Assumption
Notes on estat phtest:
1. Requires that you calculate “schoenfeld residuals” when you run the original Cox model
– And, if you want a test for each variable, you must also request scaled schoenfeld residuals
2. Test is based on identifying non-zero time trend… but how should we characterize time?
– Options: normal/linear time, log time, time dummies, etc.
– Results may differ depending on your choice
– Ex: estat phtest, log – specifies “log time”
Plot of smoothed Schoenfeld residuals can indicate best way to characterize time
– Linear trend (not a curve) indicates that time is characterized OK
– Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)

46 Proportional Hazard Assumption
What if the assumption is violated?
1. Improve model specification
– Add time interactions to address nonproportionality
– Ex: If high democracies are not proportional to low democracies, try adding “highdemoc*time”
– Variables can be interacted with linear time, log time, time dummies, etc., to address the issue
2. Model groups separately
– Split sample along variables that are non-proportional.

47 Proportional Hazard Assumption
What if the assumption is violated?
3. Use a stratified Cox model
– Allows a different baseline hazard for each group
– But, you can’t estimate effect of stratifying variable!
– Ex: stcox var1 var2 var3, strata(Dhighdemoc)
4. Use a piecewise model
– Split time into chunks… in which PH assumption is met
– Requires sufficient sample size in all time periods!
5. Live with it (but temper your conclusions)
– Allison points out: Cox model is reasonably robust
– Other issues (e.g., model misspecification) are bigger issues.

