Announcements Assignment #1 due! Assignment #2 handed out Due in 1 week Agenda The idea of an EHA model Interpreting coefficients Types of EHA models –Break EHA data structures.
Survivor: Marriage Compare survivor for women, men: Survivor plot for Men (declines later) Survivor plot for Women (declines earlier)
Integrated Hazard: Marriage Compare Integrated Hazard for women, men: Integrated Hazard for men increases more slowly (and remains lower) than for women
Hazard Plot: Marriage Hazard Rate: Full Sample
Hazard Plot: Marriage Smoothed Hazard Rate: Full Sample
From Plots to Tests to Models It appears from the plots that women get married faster than men Issue: How do we test hypotheses about the difference in rates? Can we be confident that the observed difference between men and women is not merely due to sampling variability? One crude solution: Compute confidence intervals around plots Ex: sts graph, surv ci
Tests of Equality for Survivor Fns Idea: Conduct a hypothesis test to see if survivor functions differ across groups Like a t-test for difference in means… Example: Log-Rank Test Based on calculating the expected # failures at each point in time if there were no difference between groups Then, compute difference between observed failures and expected value for each group Analogous to a chi-square test of independence for a crosstab.
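The observed-versus-expected logic of the log-rank test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Stata's implementation: the function names and the ten-person data set are invented for the example, groups are coded 0/1, and the variance term is the standard hypergeometric one.

```python
import math

def logrank_chi2(times, events, groups):
    """Two-group log-rank test. events: 1 = failure, 0 = censored; groups: 0/1.
    Returns (observed failures in group 1, expected failures in group 1, chi-square)."""
    fail_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs1 = exp1 = var = 0.0
    for ft in fail_times:
        # Risk set: everyone still under observation at this failure time
        n = sum(1 for t in times if t >= ft)
        n1 = sum(1 for t, g in zip(times, groups) if t >= ft and g == 1)
        d = sum(1 for t, e in zip(times, events) if t == ft and e == 1)
        d1 = sum(1 for t, e, g in zip(times, events, groups)
                 if t == ft and e == 1 and g == 1)
        obs1 += d1
        # Expected failures in group 1 if both groups shared the same hazard
        exp1 += d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs1, exp1, (obs1 - exp1) ** 2 / var

def chi2_pvalue_1df(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical data: age at marriage (or censoring); group 1 = women, 0 = men
times = [2, 3, 4, 5, 6, 4, 6, 7, 8, 9]
events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

observed, expected, chi2 = logrank_chi2(times, events, groups)
print(observed, round(expected, 2), round(chi2, 2), round(chi2_pvalue_1df(chi2), 3))
```

If the two groups have identical event histories, observed equals expected at every failure time and the chi-square is zero.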
Log-rank Test Example: Do women marry earlier than men?
. sts test sex, logrank
failure _d: married == 1
analysis time _t: endtime
Log-rank test for equality of survivor functions
[Output table: events observed and events expected for each value of sex, plus Total, followed by chi2(1) and Pr>chi2. Numeric values not preserved.]
A significant chi-square (p < .05) indicates that the survivor plots differ
Tests of Equality for Survivor Fns Stata offers a variety of tests They mainly differ by how they weight cases –Some place greater weight on early failures, others on later –Tests available in Stata Log rank, Wilcoxon, Tarone-Ware, Peto-Peto-Prentice See Cleves et al for advice about which to use Usually, the results are similar across tests –But they differ in sensitivity to early vs. late events –Also: Cox test Based on a different principle Can be used with weighted data (“pweights”).
EHA Models Strategy: Model the hazard rate as a function of covariates Much like regression analysis Determine coefficients The extent to which change in independent variables results in a change in the hazard rate Use information from sample to compute t-values (and p-values) Test hypotheses about coefficients
EHA Models Issue: In standard regression, we must choose a proper “functional form” relating X’s to Y’s OLS is a “linear” model: it assumes a linear relationship, e.g. $Y = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + e$ Logistic regression for discrete dependent variables assumes an ‘S-curve’ relationship between variables When modeling the hazard rate h(t) over time, what relationship should we assume? There are many options: assume a flat hazard, or various S-shaped, U-shaped, or J-shaped curves We’ll discuss details later…
Constant Rate Models The simplest parametric EHA model assumes that the base hazard rate is generally “flat” over time Any observed changes are due to changed covariates Called a “Constant Rate” or “Exponential” model Note: assumption of constant rate isn’t always tenable Formula: $h(t) = e^{a + b_1 X_1 + \dots + b_n X_n}$ Usually rewritten as: $\ln h(t) = a + b_1 X_1 + \dots + b_n X_n$
Constant Rate Models Is the constant rate assumption tenable?
Constant Rate Models Question: Is the constant rate assumption tenable? Answer: Harder question than it seems… The hazard rate goes up and down over time –Not constant at all – even if smoothed However, if the change was merely the result of independent variables, then the underlying (base) rate might, in fact, be constant If your model doesn’t include variables that account for time variation in h(t), then a constant-rate model isn’t suitable.
Constant Rate Models Let’s run an analysis anyway… Ignore possible violation of assumptions regarding the functional form of h(t) Recall that the constant rate model is: $h(t) = e^{a + bX}$ In this case, we’ll only specify one X var: DFEMALE, a dummy variable indicating women Coefficient reflects difference in hazard rate for women versus men.
Constant Rate Model: Marriage A simple one-variable model comparing gender:
. streg sex, dist(exponential) nohr
[Output: header reporting No. of subjects, No. of failures, Time at risk, Number of obs, LR chi2(1), Prob > chi2, and Log likelihood; coefficient table for _t with columns Coef., Std. Err., z, P>|z|, [95% Conf. Interval] and rows Dfemale and _cons. Numeric values not preserved.]
The positive coefficient for DFemale indicates a higher hazard rate for women
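With a single dummy covariate, the constant rate estimates have a simple closed form: each group's estimated rate is its number of failures divided by its total time at risk, the constant is the log of the rate for the reference group, and the coefficient is the log of the ratio of the two rates. The sketch below illustrates this; the function name and the failure and exposure counts are made up for the example and are not the marriage data.

```python
import math

def exponential_fit_dummy(failures0, exposure0, failures1, exposure1):
    """Constant-rate (exponential) model with one 0/1 covariate.
    Returns (_cons, coef) on the log-hazard scale."""
    rate0 = failures0 / exposure0   # estimated hazard for the reference group (X = 0)
    rate1 = failures1 / exposure1   # estimated hazard for the comparison group (X = 1)
    cons = math.log(rate0)
    coef = math.log(rate1 / rate0)
    return cons, coef

# Hypothetical counts: 100 marriages in 2000 person-years for men,
# 121 marriages in 2000 person-years for women
cons, coef = exponential_fit_dummy(100, 2000.0, 121, 2000.0)
print(round(cons, 3), round(coef, 3))
```

A positive coefficient means the comparison group has the higher rate; here the rates are 0.05 and 0.0605, a ratio of 1.21.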
Constant Rate Coefficients Interpreting the EHA coefficient: b = .19 Coefficients reflect change in log of the hazard. Recall one of the ways to write the formula: $\ln h(t) = a + bX$ But we aren’t interested in log rates; we’re interested in change in the actual rate Solution: Exponentiate the coefficient i.e., use “inverse-log” function on calculator Result reflects the impact on the actual rate.
Constant Rate Coefficients Exponentiate the coefficient to generate the “hazard ratio” The hazard ratio is the factor by which the hazard rate is multiplied for each unit increase in the independent variable Multiplying by 1.21 results in a 21% increase A hazard ratio of 2.00 = a 100% increase A hazard ratio of .25 = a 75% decrease.
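The conversion from coefficient to hazard ratio to percent change can be checked with a short calculation. The helper names below are illustrative only.

```python
import math

def hazard_ratio(coef):
    """Exponentiate a log-hazard coefficient to get the hazard ratio."""
    return math.exp(coef)

def percent_change(ratio):
    """Percent change in the hazard rate implied by a hazard ratio."""
    return (ratio - 1) * 100

hr = hazard_ratio(0.19)
print(round(hr, 2))            # 1.21
print(percent_change(2.00))    # 100.0
print(percent_change(0.25))    # -75.0
```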
Constant Rate Coefficients The variable FEMALE is a dummy variable Women = 1, Men = 0 Increase from 0 to 1 (men to women) reflects a 21% increase in the hazard rate. Continuous measures, however, can change by many points (e.g., firm size, age, etc.) To determine effects of multiple point increases (e.g., firm size of 10 vs. 7) multiply repeatedly Ex: Hazard Ratio = .95, increase = 3 units: .95 x .95 x .95 = .86, indicating a 14% decrease.
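Repeated multiplication is the same as raising the hazard ratio to the power of the number of units, or equivalently exponentiating the coefficient times the number of units. A small sketch, with an illustrative function name:

```python
import math

def multi_unit_ratio(ratio, units):
    """Hazard ratio implied by a change of `units` in the covariate."""
    return ratio ** units

three_units = multi_unit_ratio(0.95, 3)
print(round(three_units, 2))                  # 0.86

# Same result from the coefficient scale: exp(units * b), where b = ln(ratio)
b = math.log(0.95)
print(round(math.exp(3 * b), 2))              # 0.86
```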
Hypothesis Tests: Marriage Final issue: Is the 21% higher hazard rate for women significantly different from the rate for men? Or is the observed difference likely due to chance? Solution: Hazard rate models calculate standard errors for coefficient estimates, allowing calculation of t-values and p-values
[Output table for _t: columns Coef., Std. Err., t, P>|t|; rows Female and _cons. Numeric values not preserved.]
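The test statistic is the coefficient divided by its standard error, compared against a standard normal distribution. The sketch below shows the arithmetic; the standard error of 0.08 is a made-up value for illustration, since the estimate from the marriage model is not reproduced here, and the function name is likewise hypothetical.

```python
import math

def wald_test(coef, se):
    """Return (z, two-sided p-value) for H0: coefficient = 0,
    using the standard normal distribution."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# b = .19 as in the example above; se = 0.08 is an assumed value
z, p = wald_test(0.19, 0.08)
print(round(z, 2), round(p, 3))
```

A p-value below .05 would lead us to reject the hypothesis that women and men have the same hazard rate.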
Types of EHA Models Two main types of proportional EHA models 1. Parametric models: specify a functional form of h(t) Constant rate; also Gompertz, Weibull, etc. 2. Cox models: also called “semi-parametric”; don’t specify a particular form for h(t) Each makes assumptions, like OLS assumptions regarding functional form, error variance, normality, etc. If assumptions are violated, results can’t be trusted.
Parametric Models Parametric models make assumptions about the shape of the hazard rate over time –Conditional on X Much like OLS regression assumes a linear relationship between X and Y, logit assumes s-curve Options: constant, Gompertz, Weibull There is a piecewise exponential option, too Note: They also make standard statistical assumptions: Independent random sample Properly specified model, etc, etc…
Cox Models The basic Cox model: $h(t) = h_0(t)\, e^{b_1 X_1 + \dots + b_n X_n}$ Where h(t) is the hazard rate, $h_0(t)$ is some baseline hazard function (to be inferred from the data), and the bX’s are coefficients and covariates Inferring $h_0(t)$ from the data obviates the need for building a specific functional form into the model
Cox Model: Example Marriage example:
[Output: header reporting No. of subjects, No. of failures, Time at risk, Number of obs, LR chi2(1), Prob > chi2, and Log likelihood; coefficient table for _t with columns Coef., Std. Err., z, P>|z| and row Female. Numeric values not preserved.]
Cox vs. Parametric: Differences Cox Models do not make assumptions about the time-dependence of the hazard rate –Cox models focus on time-ordering of observed events ONLY They do not draw information from periods in which no events occur –After all, to do this you’d need to make some assumption about what rate you’d expect in that interval… –Benefit: One less assumption to be violated –Cost: Cox model is less efficient than a properly specified parametric model Standard errors = bigger; more data needed to get statistically significant results.
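The claim that Cox models use only the ordering of events can be illustrated with the partial log-likelihood. The sketch below is a simplified version for one covariate that assumes no tied event times; the function name and the toy data are invented for the example. Stretching the time axis changes every duration but not the order of events, so the value is unchanged.

```python
import math

def cox_partial_loglik(b, times, events, x):
    """Cox partial log-likelihood for a single covariate (no tied event times).
    events: 1 = failure, 0 = censored."""
    ll = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored cases contribute only through risk sets
        # Risk set: everyone still under observation when case i fails
        risk = sum(math.exp(b * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        ll += b * x[i] - math.log(risk)
    return ll

# Hypothetical data: x = 1 for women, 0 for men
times = [2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 1]
x = [1, 1, 0, 1, 0, 0, 0]

original = cox_partial_loglik(0.19, times, events, x)
stretched = cox_partial_loglik(0.19, [t ** 2 for t in times], events, x)
print(original == stretched)   # True: only the ordering of times matters
```

A parametric model, by contrast, would give different likelihoods for the original and stretched durations, because it uses the lengths of the intervals and not just their order.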
Cox vs. Parametric: Similarities Models discussed so far are all “proportional hazard” models Assumption: covariates (X’s) raise or lower the hazard rate in a proportional manner across time Ex: If women have higher risk of marriage than men, that elevated risk will be consistent over all time… Another way of putting it: proportional hazard models assume that independent variables don’t interact with time, at least not in ways you haven’t controlled for i.e., that the hazard rates at different values of X are proportional (parallel) to each other over time
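A short derivation makes the proportionality assumption concrete. It assumes the usual proportional hazards form, $h(t \mid X) = h_0(t)\, e^{bX}$, with a single dummy covariate (women: X = 1, men: X = 0); the subscripts w and m are notation introduced here for the example.

```latex
\frac{h_w(t)}{h_m(t)}
  = \frac{h_0(t)\, e^{b \cdot 1}}{h_0(t)\, e^{b \cdot 0}}
  = e^{b}
```

The baseline hazard cancels, so the ratio does not depend on t. In the constant rate model the same result holds with $h_0(t) = e^{a}$. If the true ratio of women's to men's hazards changes over time, no single b can describe it.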
Proportional Hazard Models Proportionality: X variables shift h(t) up or down in a proportional manner [Figure: two panels plotting h(t) against time for women and men. Left panel: Proportional. Right panel: Not Proportional.]
Proportional Hazard Models Issue: Does the hazard rate for women diverge or converge with men over time? If so, the proportion (or ratio) of the rate changes. The proportional hazard assumption is violated Upcoming classes: We’ll discuss how to check the proportional hazard assumption and address violations…