Published by Ulises Munford. Modified over 3 years ago.

1
Judith D. Singer & John B. Willett
Harvard Graduate School of Education
Discrete-time survival analysis
ALDA, Chapters 10, 11, and 12
"Times change, and we change with them." (Anonymous, quoted in Holinshed's Chronicles, 1578)

2
What we will cover:
- Making sure we're all on the same page: a quick review of basic descriptive statistics for discrete-time event data (Ch 10), using the age at first intercourse study
- Specifying a discrete-time hazard model (§11.1 & 11.2): both heuristic and formal representations
- Model fitting, interpretation, and comparison (§11.3-11.6): very similar to logistic regression
- Alternative specifications of the baseline in the discrete-time hazard model (§12.1): more parsimonious representations of TIME
- Including time-varying predictors (§12.3): use of a person-period data set makes them easy to include (although interpretations require care)
- Evaluating and relaxing the proportionality assumption (§12.5): not all predictors have time-constant effects
© Singer & Willett, page 2

3
The life table: Describing the distribution of event occurrence over time (ALDA, Section 10.1, pp 326-329)
[Life table omitted; column labels: n at risk, n censored, and n events in each period j]
Recall the grade of first intercourse study: 180 middle school boys were tracked from 7th through 12th grade. By the end of data collection (at the end of 12th grade), n=126 (70.0%) had had sex; n=54 (30.0%) were censored (still virgins).

4
The discrete-time hazard function: Assessing the conditional risk of event occurrence (ALDA, Section 10.2.1, pp 330-339)
Discrete-time hazard: the conditional probability that individual i experiences the target event in time period j ($T_i = j$), given that s/he didn't experience it in any earlier time period ($T_i \ge j$):
$h(t_{ij}) = \Pr\{T_i = j \mid T_i \ge j\}$
- Easy to estimate, because each value of hazard is based on that interval's risk set.
- As a probability, discrete-time hazard is bounded by 0 and 1. This is an issue for modeling that we'll need to address.

5
The survivor function (and median lifetime): Cumulating risk over time (ALDA, Section 10.2, pp 330-339)
[Figure: estimated survivor function S(t) plotted against grade (6 through 12), vertical axis from 0.00 to 1.00; estimated median lifetime ML = 10.6]
Discrete-time survival probability: the probability that individual i will survive beyond time period j ($T_i > j$), i.e., will not experience the event until after time period j:
$S(t_{ij}) = \Pr\{T_i > j\}$
- By definition, at the beginning of time, $S(t_0) = 1.0$.
- Strategy for estimation: since $h(t_{ij})$ tells us about the probability of event occurrence, $1 - h(t_{ij})$ tells us about the probability of non-occurrence (i.e., about survival).
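The estimation strategy above can be sketched in a few lines of Python: hazard is events divided by the risk set in each period, the survivor function is the running product of (1 - hazard), and the median lifetime is found by linear interpolation between the two periods that bracket S = 0.5. The per-grade counts below do not appear in the slide text; they are assumed values chosen to be consistent with the reported totals (180 boys, 126 events, 54 censored after 12th grade). The function names are hypothetical.

```python
# ASSUMPTION: per-grade counts are illustrative, not taken from the slide text.
# They are consistent with the slide's totals: 180 at risk initially, 126 events.
grades = [7, 8, 9, 10, 11, 12]
n_at_risk = [180, 165, 158, 134, 105, 80]
n_events = [15, 7, 24, 29, 25, 26]


def life_table(at_risk, events):
    """Return (hazard, survivor) lists, one value per time period."""
    hazard = [e / r for e, r in zip(events, at_risk)]
    survivor = []
    s = 1.0  # S(t_0) = 1.0 by definition
    for h in hazard:
        s *= 1.0 - h  # survive this period given survival so far
        survivor.append(s)
    return hazard, survivor


def median_lifetime(periods, survivor):
    """Interpolate linearly between the periods that bracket S = 0.5."""
    prev_t, prev_s = periods[0] - 1, 1.0
    for t, s in zip(periods, survivor):
        if s <= 0.5:
            return prev_t + (prev_s - 0.5) / (prev_s - s) * (t - prev_t)
        prev_t, prev_s = t, s
    return None  # the survivor function never drops to 0.5


hazard, survivor = life_table(n_at_risk, n_events)
ml = median_lifetime(grades, survivor)
print([round(h, 4) for h in hazard])
print([round(s, 4) for s in survivor])
print(round(ml, 1))
```

Under these assumed counts the interpolated median lifetime agrees with the ML = 10.6 shown on the slide, and the final survival probability equals the 30% who were censored.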

6
Towards a discrete-time hazard model: Inspecting sample plots of within-group hazard functions, in raw and transformed scales (ALDA, Section 11.1.1, pp 358-361)
[Figure: three panels of sample hazard functions for PT=1 and PT=0, in the raw hazard, odds, and logit scales]
Questions to ask when examining sample hazard functions:
- What is the shape of each hazard function?
- Does the relative level of hazard differ across groups?
The plots suggest the appropriateness of the dual partition introduced earlier, but how do we deal with the bounded nature of hazard?
Transform into odds:
- Solves the upper bound problem but not the lower bound.
Transform into logits:
- Usually regularizes distances between functions: stretches distances between small values and compresses distances between large values.
- Not bounded at all (although you need to get used to negative numbers).
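A small sketch of the two transformations may make the bounds concrete. The hazard values used here are hypothetical, picked only to show that the same raw gap of 0.05 is stretched more in the logit scale when the hazards are small; the function names are illustrative.

```python
import math


def odds(h):
    """Odds for a hazard probability 0 < h < 1; bounded below by 0 only."""
    return h / (1.0 - h)


def logit(h):
    """Log odds; unbounded in both directions."""
    return math.log(odds(h))


def inv_logit(x):
    """Map a logit back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


# ASSUMPTION: hypothetical hazard pairs, each separated by 0.05 in the raw scale.
for low, high in [(0.05, 0.10), (0.45, 0.50)]:
    print(low, high,
          "odds gap:", round(odds(high) - odds(low), 4),
          "logit gap:", round(logit(high) - logit(low), 4))
```

Logits of hazards below 0.5 are negative, which is why fitted logit hazard values in this kind of study are typically negative numbers.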

7
What population model might have generated these sample data? Sample hazard estimates, alternative hypothesized models, and parameterizing the DT hazard model (ALDA, Section 11.1.1, pp 366-369)
Alternative hypothesized models:
- Flat population logit hazard, shifted when PT switches from 0 to 1
- Linear population logit hazard, shifted when PT switches from 0 to 1
- General population logit hazard, shifted when PT switches from 0 to 1
[Figure: baseline logit hazard function (when PT=0) across grades 7 through 12, each grade marked by its time indicator: $(D_7 = 1), (D_8 = 1), \ldots, (D_{11} = 1), (D_{12} = 1)$]
When PT=1, you shift this entire baseline vertically by $\beta_1$.
How do we fit this model to data?

8
The person-period data set: The key to fitting the discrete-time hazard model (ALDA, Section 10.5.1, pp 351-354)

Person-level data set:
id    T    censor  pt
126   12   0       1
193   9    0       1
407   12   1       0

Person-period data set:
id    TIME  event  D7  D8  D9  D10  D11  D12  pt
126   7     0      1   0   0   0    0    0    1
126   8     0      0   1   0   0    0    0    1
126   9     0      0   0   1   0    0    0    1
126   10    0      0   0   0   1    0    0    1
126   11    0      0   0   0   0    1    0    1
126   12    1      0   0   0   0    0    1    1
193   7     0      1   0   0   0    0    0    1
193   8     0      0   1   0   0    0    0    1
193   9     1      0   0   1   0    0    0    1
407   7     0      1   0   0   0    0    0    0
407   8     0      0   1   0   0    0    0    0
407   9     0      0   0   1   0    0    0    0
407   10    0      0   0   0   1    0    0    0
407   11    0      0   0   0   0    1    0    0
407   12    0      0   0   0   0    0    1    0

When the model is fit to the person-period data set by logistic regression, all parameter estimates, standard errors, t- and z-statistics, goodness-of-fit statistics, and tests will be correct for the discrete-time hazard model.
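The conversion from the person-level to the person-period layout can be sketched as follows, using the three boys shown on the slide (ids 126, 193, and 407). The function name and the dictionary layout are hypothetical; only the variable names (id, T, censor, pt, TIME, event, D7 to D12) come from the slide.

```python
def to_person_period(people, first_period=7, last_period=12):
    """Expand one record per person into one record per person per period at risk."""
    rows = []
    for p in people:
        for t in range(first_period, p["T"] + 1):
            row = {
                "id": p["id"],
                "TIME": t,
                # The event indicator is 1 only in the last observed period,
                # and only if that person was not censored.
                "event": int(t == p["T"] and p["censor"] == 0),
            }
            for d in range(first_period, last_period + 1):
                row[f"D{d}"] = int(d == t)  # one time indicator per period
            row["pt"] = p["pt"]  # time-invariant predictor, copied to every row
            rows.append(row)
    return rows


people = [
    {"id": 126, "T": 12, "censor": 0, "pt": 1},
    {"id": 193, "T": 9, "censor": 0, "pt": 1},
    {"id": 407, "T": 12, "censor": 1, "pt": 0},
]

rows = to_person_period(people)
for r in rows:
    print(r)
```

Each boy contributes one row per grade in which he was still at risk, so the censored boy (id 407) has six rows whose event value is always 0.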

9
Model A: A baseline discrete-time hazard model with no substantive predictors (ALDA, Section 11.4.1, pp 386-388)
Because there are no predictors in Model A, this baseline is for the entire sample.
- If estimates are approximately equal, the baseline is flat.
- If estimates decline, hazard declines.
- If estimates increase (as they do here), hazard increases.

10
Models B & C: Uncontrolled effects of substantive predictors (ALDA, Section 11.4.2 & 11.4.3, pp 388-390)
Dichotomous predictors: As in regular logistic regression, antilogging a parameter estimate yields the estimated odds ratio associated with a 1-unit difference in the predictor. The estimated odds of first intercourse for boys who have experienced a parenting transition are 2.4 times the odds for boys who did not experience such a transition.
Continuous predictors: Antilogging still yields an estimated odds ratio associated with a 1-unit difference in the predictor. The estimated odds of first intercourse are 1.56 times as high (just over 50% higher) for boys whose parents score one unit higher on this antisocial behavior index.

11
Comparing nested models using deviance statistics (and non-nested models using information criteria) (ALDA, Section 11.6, pp 397-402)
[Table of fitted models omitted; surviving row label: TIME dummies]
- Deviance: smaller value, better fit; $\chi^2$ distribution; compare nested models.
- AIC, BIC: smaller value, better fit; compare non-nested models.
- Model B vs. Model A provides an uncontrolled test of $H_0: \beta_{PT} = 0$. Difference in deviance = 17.30 (1 df), p < .001.
- Model C vs. Model A provides an uncontrolled test of $H_0: \beta_{PAS} = 0$. Difference in deviance = 14.79 (1 df), p < .001.
- Model D vs. Models B and C provides controlled tests (both null hypotheses rejected as well).
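The p-values attached to these deviance differences can be checked with a chi-square upper-tail probability. The sketch below uses only the standard library: it starts from the closed forms for 1 and 2 degrees of freedom and steps up two degrees of freedom at a time with the recurrence for the regularized upper incomplete gamma function. The function name `chi2_sf` is hypothetical, and in practice a statistics library would normally do this job.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability Pr(chi-square with integer df >= 1 exceeds x)."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        q = math.erfc(math.sqrt(z))  # closed form for df = 1
        a = 0.5
    else:
        q = math.exp(-z)             # closed form for df = 2
        a = 1.0
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Deviance differences quoted on the slide, each on 1 df.
for label, diff in [("Model B vs. A", 17.30), ("Model C vs. A", 14.79)]:
    print(label, diff, chi2_sf(diff, 1))
```

Both differences give upper-tail probabilities well below .001, in line with the slide.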

12
Displaying fitted hazard and survivor functions: Substitute in prototypical predictor values and compute fitted values (ALDA, Section 11.5.1, pp 392-394)
Model B:
- In the logit hazard scale, a constant vertical separation of 0.8736.
- In the hazard scale, a non-constant vertical separation (no simple interpretation, because this is a proportional odds model, not a proportional hazards model!).
- The effect of PT cumulates into a large difference in estimated median lifetimes (9.9 vs. 11.8, about 2 years).
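A short sketch can show why a constant separation in the logit scale becomes a non-constant separation in the hazard scale while the odds ratio stays fixed. The separation 0.8736 is the value quoted on the slide; the baseline logit hazards are assumed, illustrative values that do not appear in the slide text.

```python
import math


def inv_logit(x):
    """Convert a logit hazard back to a hazard probability."""
    return 1.0 / (1.0 + math.exp(-x))


BETA_PT = 0.8736  # constant logit-scale separation quoted on the slide

# ASSUMPTION: illustrative baseline logit hazards (PT = 0) for grades 7-12.
baseline_logit = {7: -2.99, 8: -3.70, 9: -2.28, 10: -1.82, 11: -1.65, 12: -1.18}

rows = []
for grade, alpha in baseline_logit.items():
    h0 = inv_logit(alpha)            # fitted hazard when PT = 0
    h1 = inv_logit(alpha + BETA_PT)  # fitted hazard when PT = 1
    odds_ratio = (h1 / (1.0 - h1)) / (h0 / (1.0 - h0))
    rows.append((grade, h0, h1, h1 - h0, odds_ratio))

for grade, h0, h1, gap, odds_ratio in rows:
    print(grade, round(h0, 4), round(h1, 4), round(gap, 4), round(odds_ratio, 4))
```

The hazard gap changes from grade to grade, but the odds ratio in every row equals exp(0.8736), about 2.4, which is the proportional odds property the slide refers to.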

13
Pros and cons of the dummy specification for the main effect of TIME (ALDA, Section 12.1, pp 408-409)
PRO: The dummy specification for TIME is:
- Completely general, placing no constraints on the shape of the baseline (logit) hazard function;
- Easily interpretable: each associated parameter represents logit hazard in time period j for the baseline group;
- Consistent with life-table estimates.
CON: The dummy specification for TIME is also:
- Nothing more than an analytic decision, not a requirement of the discrete-time hazard model;
- Completely lacking in parsimony: if J is large, it requires the inclusion of many unknown parameters;
- A problem when it yields fitted functions that fluctuate erratically across time periods because of nothing more than sampling variation.
Three reasons for considering an alternative specification:
- Your study involves many discrete time periods (because data collection is long or time is less coarsely discretized).
- Hazard is expected to be near 0 in some time periods (causing convergence problems).
- Some time periods have small risk sets (because either the initial sample is small or hazard and censoring dramatically diminish the risk set over time).
The variable PERIOD in the person-period data set can be treated as continuous TIME.

14
Comparing the general specification to an ordered set of polynomials: Not necessarily the best, but a systematic set of informative choices (ALDA, Section 12.1.1, pp 409-412)

Order  Behavior of logit hazard  n parameters  Model: logit h(t_ij) =
n/a    General                   J             $\alpha_1 D_1 + \ldots + \alpha_J D_J$
0      Constant                  1             $\alpha_0 ONE$
1      Linear                    2             $\alpha_0 ONE + \alpha_1 (TIME - c)$
2      Quadratic                 3             $\alpha_0 ONE + \alpha_1 (TIME - c) + \alpha_2 (TIME - c)^2$
3      Cubic                     4             $\alpha_0 ONE + \alpha_1 (TIME - c) + \alpha_2 (TIME - c)^2 + \alpha_3 (TIME - c)^3$
4      3 stationary points       5             $\alpha_0 ONE + \alpha_1 (TIME - c) + \alpha_2 (TIME - c)^2 + \alpha_3 (TIME - c)^3 + \alpha_4 (TIME - c)^4$
5      4 stationary points       6             $\alpha_0 ONE + \alpha_1 (TIME - c) + \alpha_2 (TIME - c)^2 + \alpha_3 (TIME - c)^3 + \alpha_4 (TIME - c)^4 + \alpha_5 (TIME - c)^5$

Notes:
- General: always the best fit (lowest deviance).
- Constant: always the worst fit (highest deviance).
- Linear, quadratic, and cubic: common choices. The centering constant c helps interpretation.
- Orders 4 and 5: rarely adopted, but they give a sense of whether you should stick with the completely general specification.
Strategy for model comparison: because each lower-order model is nested within each higher-order model, deviance statistics can be directly compared to help make analytic decisions.
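As a rough illustration of how the polynomial predictors in this table could be built for each person-period record, the sketch below generates ONE, (TIME - c), (TIME - c)^2, and so on up to a chosen order. The function names, the coefficient values, and the centering constant c = 5 are all hypothetical.

```python
def polynomial_time_terms(time, order, c):
    """Return [ONE, (TIME - c), (TIME - c)**2, ...] up to the requested order."""
    return [(time - c) ** k for k in range(order + 1)]


def logit_hazard(time, alphas, c):
    """Evaluate a polynomial logit hazard; len(alphas) - 1 is the polynomial order."""
    terms = polynomial_time_terms(time, len(alphas) - 1, c)
    return sum(a * x for a, x in zip(alphas, terms))


# ASSUMPTION: hypothetical quadratic coefficients and centering constant.
alphas = [-2.0, 0.5, -0.1]
c = 5
for t in range(1, 10):
    print(t, polynomial_time_terms(t, 2, c), round(logit_hazard(t, alphas, c), 3))
```

With centering, every term except ONE is zero when TIME equals c, so the intercept is the logit hazard in period c. A polynomial of order k always uses k + 1 parameters, matching the n parameters column.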

15
Examining alternative polynomial specifications for TIME: Deviance statistics and fitted logit hazard functions (ALDA, Section 12.1.1, pp 412-419)
Sample: 260 faculty members (who had received a National Academy of Education Post-Doc). Data source: Gamse and Conger (1997), Abt Associates.
- Each was tracked for up to 9 years after taking his/her first academic job.
- By the end of data collection, n=166 (63.8%) had received tenure; the other 36.2% were censored (because they might eventually receive tenure somewhere).
[Figure: fitted logit hazard functions for the General, Constant, Linear, Quadratic, and Cubic specifications]
Comparisons always worth making:
- Is the added polynomial term necessary?
- Is this polynomial as good as the general specification?
Results:
- Constant is terrible.
- Linear is better, but not as good as general.
- Quadratic is better still, and nearly as good as general.
- Cubic on up seem thoroughly unnecessary.
The quadratic looks reasonably good, but can we test whether it's good enough?

16
Including time-varying predictors: Age of onset of first depressive episode (ALDA, Section 12.3, p 428)
Data source: Blair Wheaton and colleagues (1997), Stress & adversity across the life course
Sample: 1,393 adults ages 17 to 57
- 387 (27.8%) reported a first depression onset between ages 4 and 39.
Specification of baseline hazard function:
- Many person-periods (36,997) and very few actual events (387).
- Annual data between ages 4 and 39 requires 36 TIME dummies: hardly parsimonious.
- A cubic function of TIME fits nearly as well ($\chi^2$ = 34.51, 32 df, p > .25) as a completely general specification, and measurably better ($\chi^2$ = 5.83, 1 df, p < .05) than a quadratic.
Time-varying predictor: first parental divorce
- n=145 (10.4%) experienced a first parental divorce while still at risk of first depression onset.
- PD is time-varying, indicating whether the parents of individual i divorced during, or before, time period j.
- $PD_{ij} = 0$ in periods before the divorce.
- $PD_{ij} = 1$ in periods coincident with or subsequent to the divorce.

ID 40: Reported first depression onset at 23; first parental divorce at age 9

id   age  female  pd  event
40   4    1       0   0
40   5    1       0   0
40   6    1       0   0
40   7    1       0   0
40   8    1       0   0
40   9    1       1   0
40   10   1       1   0
40   11   1       1   0
40   …    …       …   …
40   22   1       1   0
40   23   1       1   1
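The coding rule for the time-varying predictor can be sketched as below, reproducing the records for ID 40 (female, parental divorce at age 9, first depression onset at 23). The function name and argument names are hypothetical; the column names follow the slide.

```python
def pd_person_periods(person_id, female, first_age, last_age, divorce_age, onset):
    """Build person-period rows with a time-varying parental divorce indicator.

    divorce_age is None when no parental divorce was observed.
    onset is True when the person experienced the event at last_age
    and False when the person was censored at last_age.
    """
    rows = []
    for age in range(first_age, last_age + 1):
        # pd switches to 1 in the period of the divorce and stays 1 afterwards.
        pd = int(divorce_age is not None and age >= divorce_age)
        event = int(onset and age == last_age)
        rows.append({"id": person_id, "age": age, "female": female,
                     "pd": pd, "event": event})
    return rows


rows = pd_person_periods(person_id=40, female=1, first_age=4, last_age=23,
                         divorce_age=9, onset=True)
for r in rows:
    print(r)
```

Unlike a time-invariant predictor, pd takes different values in different rows for the same person, which is what the person-period layout makes easy.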

17
Including a time-varying predictor in the discrete-time hazard model (ALDA, Section 12.3.1, pp 428-434)
[Figure panels: sample logit(proportions) of people experiencing first depression onset at each age, by PD status at that age; hypothesized population model (note constant effect of PD); implicit particular realization of the population model (for those whose parents divorce when they're age 20)]
What does $\beta_1$ tell us?
- It contrasts the population logit hazard for people who have experienced a parental divorce with those who have not.
- But because $PD_{ij}$ is time-varying, membership in the parental divorce group changes over time, so we're not always comparing the same people.
- The predictor effectively compares different groups of people at different times!
- But we're still assuming that the effect of the time-varying predictor is constant over time.

18
Interpreting a fitted DT hazard model that includes a TV predictor (ALDA, Section 12.3.2, pp 434-440)
- $e^{0.4151} = 1.51$: Controlling for gender, at every age from 4 to 39, the estimated odds of first depression onset are about 50% higher for individuals who experienced a concurrent, or previous, parental divorce.
- $e^{0.5455} = 1.73$: Controlling for parental divorce, the estimated odds of first depression onset are 73% higher for women.
What about a woman whose parents divorced when she was 20?
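One way to approach the closing question is to combine the two coefficients age by age. The sketch below assumes the fitted model is additive in the logit scale with the two coefficients quoted on the slide, and expresses the result as an odds multiplier relative to a man whose parents have not divorced by that age; the baseline function of TIME cancels out of that ratio. The function name is hypothetical.

```python
import math

B_PD = 0.4151      # coefficient for parental divorce, from the slide
B_FEMALE = 0.5455  # coefficient for FEMALE, from the slide


def odds_multiplier(age, female, divorce_age=None):
    """Odds of onset at this age relative to a man with no parental divorce so far."""
    pd = 1 if divorce_age is not None and age >= divorce_age else 0
    return math.exp(B_PD * pd + B_FEMALE * female)


# A woman whose parents divorced when she was 20.
for age in (19, 20, 21):
    print(age, round(odds_multiplier(age, female=1, divorce_age=20), 2))
```

Under this assumption her odds multiplier is 1.73 before age 20 and about 2.61 (1.51 times 1.73) from age 20 onward.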

19
The proportionality assumption: Is a predictor's effect constant over time or might it vary? (ALDA, Section 12.5.1, pp 451-456)
[Figure panels:]
- Predictor's effect is constant over time
- Predictor's effect increases over time
- Predictor's effect decreases over time
- Predictor's effect is particularly pronounced in certain time periods

20
Discrete-time hazard models that do not invoke the proportionality assumption (ALDA, Section 12.5.1, pp 454-456)
- A completely general representation: the predictor has a unique effect in each period.
- A more parsimonious representation: the predictor's effect changes linearly with time. $\beta_1$ assesses the effect of $X_1$ in time period c; $\beta_2$ describes how this effect linearly increases (if positive) or decreases (if negative).
- Another parsimonious representation: the predictor's effect differs across epochs. $\beta_2$ assesses the additional effect of $X_1$ during those time periods declared to be later in time.
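The model equations for this slide did not survive extraction. A hedged reconstruction, assuming the completely general baseline of time indicators $D_1, \ldots, D_J$ with parameters $\alpha_j$ used earlier, and a hypothetical indicator $LATE_j$ that equals 1 in the later epoch, might read:

```latex
\begin{align*}
\text{General:} \quad
  \operatorname{logit} h(t_{ij}) &= [\alpha_1 D_1 + \cdots + \alpha_J D_J]
    + \beta_1 X_{1} D_1 + \beta_2 X_{1} D_2 + \cdots + \beta_J X_{1} D_J \\
\text{Linear in time:} \quad
  \operatorname{logit} h(t_{ij}) &= [\alpha_1 D_1 + \cdots + \alpha_J D_J]
    + \beta_1 X_{1} + \beta_2 X_{1} (TIME_j - c) \\
\text{Epochs:} \quad
  \operatorname{logit} h(t_{ij}) &= [\alpha_1 D_1 + \cdots + \alpha_J D_J]
    + \beta_1 X_{1} + \beta_2 X_{1} \, LATE_j
\end{align*}
```

In the second form the effect of $X_1$ in period j is $\beta_1 + \beta_2 (TIME_j - c)$, which reduces to $\beta_1$ when $TIME_j = c$. In the third form the effect is $\beta_1$ in the earlier epoch and $\beta_1 + \beta_2$ in the later one.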

21
The proportionality assumption: Uncovering violations and simple solutions (ALDA, Section 12.4, p 443)
Data source: Graham (1997) dissertation
Sample: 3,790 high school students who participated in the Longitudinal Survey of American Youth (LSAY)
Research design:
- Tracked from 10th grade through 3rd semester of college, a total of 5 periods.
- Only n=132 (3.5%) took a math class for all of the 5 periods!
RQs:
- When are students most at risk of dropping out of math?
- What's the effect of gender?
- Does the gender differential vary over time?
What the sample hazard functions show:
- Risk of dropping out zig-zags over time: it peaks at 12th grade and 2nd semester of college.
- Magnitude of the gender differential varies over time: smallest in 11th grade and increases over time.
- Suggests that the proportionality assumption is being violated.

22
Checking the proportionality assumption: Is the effect of FEMALE constant over time? (ALDA, Section 12.5.2, pp 456-460)
All models include a completely general specification for TIME using 5 time dummies: HS11, HS12, COLL1, COLL2, and COLL3.
[Table of fitted models omitted; surviving deviance comparisons: 8.04 (4 df), ns; 6.50 (1 df), p = 0.0108]
