6 Hazard Plot: Marriage
[Figure: Smoothed Hazard Rate: Full Sample]
7 From Plots to Tests to Models
It appears from the plots that women get married faster than men
Issue: How do we test hypotheses about the difference in rates?
  Can we be confident that the observed difference between men and women is not merely due to sampling variability?
One crude solution: Compute confidence intervals around plots
  Ex: sts graph, surv ci
8 Tests of Equality for Survivor Fns
Idea: Conduct a hypothesis test to see if survivor functions differ across groups
  Like a t-test for difference in means…
Example: Log-Rank Test
  Based on calculating the expected # of failures at each point in time if there were no difference between groups
  Then, compute the difference between observed failures and the expected value for each group
  Analogous to a chi-square test of independence for a crosstab.
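The observed-versus-expected logic of the log-rank test can be sketched in a few lines of Python. This is a minimal illustration, not Stata's implementation: the toy data and the function name `logrank` are hypothetical, and the variance term uses the standard hypergeometric formula for two groups.

```python
# Hypothetical toy data: (time, event, group); event = 1 is a failure, 0 is censored.
data = [
    (1, 1, "A"), (2, 1, "A"), (3, 0, "A"), (4, 1, "A"),
    (2, 1, "B"), (5, 1, "B"), (6, 0, "B"), (7, 1, "B"),
]

def logrank(data, group):
    """Return (observed, expected, chi2) failures for `group` in a two-group sample."""
    event_times = sorted({t for t, d, _ in data if d == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        # Risk set: everyone who has not yet failed or been censored before t.
        at_risk = [(ti, di, gi) for ti, di, gi in data if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == group)
        d = sum(1 for ti, di, _ in at_risk if ti == t and di == 1)
        d1 = sum(1 for ti, di, g in at_risk if ti == t and di == 1 and g == group)
        observed += d1
        # Expected failures in the group if both groups shared one hazard rate.
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    return observed, expected, chi2

obs_a, exp_a, chi2_a = logrank(data, "A")
obs_b, exp_b, chi2_b = logrank(data, "B")
print(obs_a, round(exp_a, 3), round(chi2_a, 3))
```

The chi-square statistic has 1 degree of freedom with two groups, and it is the same whichever group is used to compute it.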
9 Log-rank Test
Example: Do women marry earlier than men?
. sts test sex, logrank
failure _d: married == 1
analysis time _t: endtime
Log-rank test for equality of survivor functions
      | Events    Events
sex   | observed  expected
1     |
2     |
Total |
chi2(1) =
Pr>chi2 =
Significant Chi-square (p<.05) indicates that survivor plots differ
10 Tests of Equality for Survivor Fns
Stata offers a variety of tests
  They mainly differ by how they weight cases
  Some place greater weight on early failures, others on later
Tests available in Stata
  Log rank, Wilcoxon, Tarone-Ware, Peto-Peto-Prentice
  See Cleves et al for advice about which to use
Usually, the results are similar across tests
  But they differ in sensitivity to early vs. late events
Also: Cox test
  Based on a different principle
  Can be used with weighted data (“pweights”).
11 EHA Models
Strategy: Model the hazard rate as a function of covariates
  Much like regression analysis
Determine coefficients
  The extent to which a change in independent variables results in a change in the hazard rate
Use information from the sample to compute t-values (and p-values)
  Test hypotheses about coefficients
12 EHA Models
Issue: In standard regression, we must choose a proper “functional form” relating X’s to Y’s
  OLS is a “linear” model: it assumes a linear relationship
    e.g.: Y = a + b1X1 + b2X2 … + bnXn + e
  Logistic regression for discrete dependent variables assumes an ‘S-curve’ relationship between variables
When modeling the hazard rate h(t) over time, what relationship should we assume?
  There are many options: assume a flat hazard, or various S-shaped, U-shaped, or J-shaped curves
  We’ll discuss details later…
13 Constant Rate Models
The simplest parametric EHA model assumes that the base hazard rate is “flat” over time
  Any observed changes are due to changed covariates
  Called a “Constant Rate” or “Exponential” model
  Note: the assumption of a constant rate isn’t always tenable
Formula:
  h(t) = exp(a + b1X1 + b2X2 … + bnXn)
Usually rewritten as:
  ln h(t) = a + b1X1 + b2X2 … + bnXn
14 Constant Rate Models
Is the constant rate assumption tenable?
15 Constant Rate Models
Question: Is the constant rate assumption tenable?
Answer: Harder question than it seems…
  The hazard rate goes up and down over time
    Not constant at all, even if smoothed
  However, if the change were merely the result of independent variables, then the underlying (base) rate might, in fact, be constant
  If your model doesn’t include variables that account for time variation in h(t), then a constant-rate model isn’t suitable.
16 Constant Rate Models
Let’s run an analysis anyway…
  Ignore possible violation of assumptions regarding the functional form of h(t)
Recall that the constant rate model is:
  h(t) = exp(a + b1X1 + b2X2 … + bnXn)
In this case, we’ll only specify one X var:
  DFEMALE: dummy variable indicating women
  Coefficient reflects the difference in hazard rate for women versus men.
17 Constant Rate Model: Marriage
A simple one-variable model comparing gender
. streg sex, dist(exponential) nohr
No. of subjects =
Number of obs =
No. of failures =
Time at risk =
LR chi2(1) =
Log likelihood =
Prob > chi2 =
_t      | Coef.  Std. Err.  z  P>|z|  [95% Conf. Interval]
Dfemale |
_cons   |
The positive coefficient for DFemale indicates a higher hazard rate for women
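For a constant rate model with a single dummy variable, the estimates have a closed form, which makes the model easy to check by hand. The sketch below is an illustration on hypothetical toy data (not the marriage sample), assuming each group's rate is estimated as failures divided by time at risk; the names `spells` and `constant_rate` are invented for the example.

```python
import math

# Hypothetical spells: (years at risk, married = 1 / censored = 0, female dummy)
spells = [
    (4.0, 1, 1), (6.0, 1, 1), (5.0, 0, 1),
    (8.0, 1, 0), (10.0, 0, 0), (7.0, 1, 0),
]

def constant_rate(spells, female):
    """Estimate a flat hazard for one group: number of failures / total time at risk."""
    events = sum(d for _, d, f in spells if f == female)
    exposure = sum(t for t, _, f in spells if f == female)
    return events / exposure

rate_men = constant_rate(spells, 0)
rate_women = constant_rate(spells, 1)

# In ln h(t) = a + b * female, the constant is the log rate for men
# and b is the log of the ratio of the two rates.
a = math.log(rate_men)
b = math.log(rate_women / rate_men)
print(round(rate_men, 4), round(rate_women, 4), round(b, 4))
```

A positive `b` means the estimated rate for women is higher than the rate for men, which is how the coefficient in the output above is read.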
18 Constant Rate Coefficients
Interpreting the EHA coefficient: b = .19
Coefficients reflect change in the log of the hazard
  Recall one of the ways to write the formula:
    ln h(t) = a + b1X1 + b2X2 … + bnXn
But we aren’t interested in log rates
  We’re interested in change in the actual rate
Solution: Exponentiate the coefficient
  i.e., use the “inverse-log” function on a calculator
  Result reflects the impact on the actual rate.
19 Constant Rate Coefficients
Exponentiate the coefficient to generate the “hazard ratio”: exp(.19) = 1.21
The hazard ratio is the factor by which the hazard rate is multiplied for each unit increase in the independent variable
  Multiplying by 1.21 results in a 21% increase
  A hazard ratio of 2.00 = a 100% increase
  A hazard ratio of .25 = a 75% decrease in the rate.
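The conversion from coefficient to hazard ratio to percent change is simple arithmetic. A short Python sketch, with hypothetical helper names `hazard_ratio` and `percent_change`:

```python
import math

def hazard_ratio(b):
    """Exponentiate a log-hazard coefficient to get the hazard ratio."""
    return math.exp(b)

def percent_change(hr):
    """Percent change in the hazard rate implied by a hazard ratio."""
    return (hr - 1) * 100

hr = hazard_ratio(0.19)          # coefficient from the marriage example
print(round(hr, 2), round(percent_change(hr)))
print(percent_change(2.00), percent_change(0.25))
```

Ratios above 1 give positive percent changes (higher hazard); ratios below 1 give negative ones (lower hazard).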
20 Constant Rate Coefficients
The variable FEMALE is a dummy variable
  Women = 1, Men = 0
  Increase from 0 to 1 (men to women) reflects a 21% increase in the hazard rate
Continuous measures, however, can change by many points (e.g., firm size, age, etc.)
  To determine the effect of a multiple-point increase (e.g., firm size of 10 vs. 7), multiply by the hazard ratio once per unit
  Ex: Hazard Ratio = .95, increase = 3 units:
    .95 x .95 x .95 = .86, indicating a 14% decrease.
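Repeated multiplication is the same as raising the hazard ratio to the power of the change in X, or equivalently exponentiating the coefficient times that change. A minimal Python check of the slide's example (the function name `multi_unit_ratio` is hypothetical):

```python
import math

def multi_unit_ratio(hr, units):
    """Hazard ratio for a change of `units` in X, given the one-unit ratio."""
    return hr ** units

ratio = multi_unit_ratio(0.95, 3)            # .95 x .95 x .95
via_coef = math.exp(3 * math.log(0.95))      # exp(b * 3), with b = ln(.95)
print(round(ratio, 2), round((ratio - 1) * 100))
```

Both routes give the same answer, so the same approach works for fractional changes in X, where repeated multiplication is not available.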
21 Hypothesis Tests: Marriage
Final issue: Is the 21% higher hazard rate for women significantly different from the rate for men?
  Or is the observed difference likely due to chance?
Solution: Hazard rate models calculate standard errors for coefficient estimates
  Allowing calculation of t-values and p-values
_t     | Coef.  Std. Err.  t  P>|t|
Female |
_cons  |
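The test statistic is the coefficient divided by its standard error. The Python sketch below uses the coefficient .19 from the example, but the standard error of .05 is a made-up value for illustration only (the actual output is not reproduced here), and it assumes a two-sided test against the standard normal distribution.

```python
import math

def z_statistic(coef, std_err):
    """Test statistic for H0: coefficient = 0."""
    return coef / std_err

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

coef = 0.19      # coefficient from the marriage example
std_err = 0.05   # HYPOTHETICAL standard error, for illustration only
z = z_statistic(coef, std_err)
p = two_sided_p(z)
print(round(z, 2), p < 0.05)
```

With a p-value below .05, the null hypothesis that women and men have the same hazard rate would be rejected at the conventional level.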
22 Types of EHA Models
Two main types of proportional EHA models
1. Parametric Models
  Specify a functional form of h(t)
  Constant rate; also: Gompertz, Weibull, etc.
2. Cox Models
  Also called “semi-parametric”
  Don’t specify a particular form for h(t)
Each makes assumptions
  Like OLS assumptions regarding functional form, error variance, normality, etc.
  If assumptions are violated, results can’t be trusted.
23 Parametric Models
Parametric models make assumptions about the shape of the hazard rate over time
  Conditional on X
  Much like OLS regression assumes a linear relationship between X and Y, and logit assumes an S-curve
Options: constant, Gompertz, Weibull
  There is a piecewise exponential option, too
Note: They also make standard statistical assumptions:
  Independent random sample
  Properly specified model, etc., etc…
24 Cox Models
The basic Cox model:
  h(t) = h0(t) exp(b1X1 + b2X2 … + bnXn)
Where h(t) is the hazard rate
h0(t) is some baseline hazard function (to be inferred from the data)
  This obviates the need for building a specific functional form into the model
bX’s are coefficients and covariates
25 Cox Model: Example
Marriage example:
No. of subjects =
Number of obs = 29269
No. of failures =
Time at risk =
LR chi2(1) =
Log likelihood =
Prob > chi2 =
_t     | Coef.  Std. Err.  z  P>|z|
Female |
26 Cox vs. Parametric: Differences
Cox models do not make assumptions about the time-dependence of the hazard rate
  Cox models focus on the time-ordering of observed events ONLY
  They do not draw information from periods in which no events occur
    After all, to do this you’d need to make some assumption about what rate you’d expect in that interval…
Benefit: One less assumption to be violated
Cost: The Cox model is less efficient than a properly specified parametric model
  Standard errors are larger; more data are needed to get statistically significant results.
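The claim that Cox models use only the ordering of events can be checked directly with the partial likelihood. The Python sketch below is a bare-bones illustration on hypothetical toy data, assuming one covariate and no tied event times; it is not how Stata fits the model. Stretching the time axis changes the gaps between events but not their order, so the partial log-likelihood is unchanged.

```python
import math

def cox_partial_loglik(b, data):
    """Cox partial log-likelihood for one covariate.

    data: list of (time, event, x); assumes no tied event times.
    """
    total = 0.0
    for t_i, d_i, x_i in data:
        if d_i == 1:
            # Risk set: cases still under observation at this event time.
            risk_set = [x for t, _, x in data if t >= t_i]
            total += b * x_i - math.log(sum(math.exp(b * x) for x in risk_set))
    return total

# Hypothetical toy data: (time, event, female dummy)
data = [(2.0, 1, 1), (3.0, 1, 0), (5.0, 0, 1), (7.0, 1, 1), (9.0, 1, 0)]

# Same cases, same ordering, very different spacing between events.
stretched = [(t ** 3, d, x) for t, d, x in data]

ll_original = cox_partial_loglik(0.19, data)
ll_stretched = cox_partial_loglik(0.19, stretched)
print(ll_original == ll_stretched)
```

A parametric model fit to `data` and `stretched` would generally give different answers, because it also uses the length of the intervals between events.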
27 Cox vs. Parametric: Similarities
Models discussed so far are all “proportional hazard” models
Assumption: covariates (X’s) raise or lower the hazard rate in a proportional manner across time
  Ex: If women have a higher risk of marriage than men, that elevated risk will be consistent over all time…
Another way of putting it:
  Proportional hazard models, including Cox models, assume that independent variables don’t interact with time
    At least, not in ways you haven’t controlled for
  i.e., that the hazard rates at different values of X are proportional to each other over time (parallel when plotted on a log scale)
28 Proportional Hazard Models
Proportionality: X variables shift h(t) up or down in a proportional manner
[Figure: two panels, “Proportional” and “Not Proportional”, each plotting h(t) against time for Women and Men]
29 Proportional Hazard Models
Issue: Does the hazard rate for women diverge from, or converge toward, the rate for men over time?
  If so, the proportion (or ratio) of the rates changes.
  The proportional hazard assumption is violated
Upcoming classes:
  We’ll discuss how to check the proportional hazard assumption and address violations…
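One informal way to see what proportionality means is to compute the ratio of the two hazard rates at several time points. In the Python sketch below all three hazard functions are hypothetical, chosen only to contrast the two cases; this is an illustration of the idea, not a formal test of the assumption.

```python
times = [1, 2, 3, 4, 5]

def h_men(t):
    """Hypothetical hazard for men that rises over time."""
    return 0.02 * t

def h_women_proportional(t):
    """Women's hazard is always 1.21 times the rate for men."""
    return 1.21 * h_men(t)

def h_women_converging(t):
    """Women's hazard starts well above the rate for men, then converges toward it."""
    return h_men(t) * (2.0 - 0.2 * t)

ratios_prop = [h_women_proportional(t) / h_men(t) for t in times]
ratios_conv = [h_women_converging(t) / h_men(t) for t in times]

print([round(r, 2) for r in ratios_prop])
print([round(r, 2) for r in ratios_conv])
```

In the first case the ratio stays the same at every time point even though both hazards change; in the second the ratio shrinks over time, which is the pattern that violates the proportional hazard assumption.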