Presentation is loading. Please wait.

Presentation is loading. Please wait.

SC968: Panel Data Methods for Sociologists

Similar presentations


Presentation on theme: "SC968: Panel Data Methods for Sociologists"— Presentation transcript:

1 SC968: Panel Data Methods for Sociologists
Introduction to survival/event history models

2 Types of outcome Continuous OLS Linear regression
Binary Binary regression Logistic or probit regression Time to event data Survival or event history analysis

3 Examples of time to event data
Time to death Time to incidence of disease Unemployed - time till find job Time to birth of first child Smokers – time till quit smoking

4 Time to event data Analyse durations or length of time to reach endpoint Data are usually censored Don’t follow sample long enough for everyone to get to the endpoint (e.g. death)

5 4 key concepts for survival analysis
States Events Risk period Duration

6 States States are categories of the outcome variable of interest
Each person occupies exactly one state at any moment in time Examples alive, dead single, married, divorced, widowed never smoker, smoker, ex-smoker Set of possible states called the state space

7 Events A transition from one state to another
From an origin state to a destination state Possible events depend on the state space Examples From smoker to ex-smoker From married to widowed Not all transitions can be events E.g. from smoker to never smoker

8 Risk period Not all people can experience each state throughout the study period To be able to have a particular event, one must be in the origin state at some stage Example can only experience divorce if married The period of time that someone is at risk of a particular event is called the risk period All subjects at risk of an event at a point in time called the risk set

9 Duration Event history analysis is to do with the analysis of the duration of a nonoccurrence of an event or the length of time during the risk period Examples Duration of marriage Length of life In practice we model the probability of a transition conditional on being in the risk set

10 Example data ID Entry date Died End date 1 01/01/1991 01/01/2008
1 01/01/ /01/2008 01/01/ /01/ /01/2000 3 01/01/ /01/2005 4 01/01/ /07/ /07/2004

11 Calendar time Study follow-up ended

12 Study time in years censored event censored event

13 Censoring An observation is censored if it has incomplete information
We will only consider right censoring That is, the person did not have an event during the time that they were studied Common reasons for right censoring the study ends the person drops-out of the study the person has to be taken off a drug

14 Data Survival or event history data characterised by 2 variables
Time or duration of risk period Failure (event) 1 if not survived or event observed 0 if censored or event not yet occurred

15 What is the data structure?
ID Entry date Died End date Duration Event 01/01/ /01/ 01/01/ /01/ /01/ 01/01/ /01/ 01/01/ /07/ /07/ The row is a person The tricky part is often calculating the duration Remember we need an indicator for observed events/ censored cases

16 Worked example Random 20% sample from BHPS Waves 1 – 15
One record per person/wave Outcome: Duration of cohabitation Conditions on cohabiting in first wave Survival time: years from entry to the study in 1991 till year living without a partner

17 The data Duration = 6 years Event = 1 Ignore data after event = 1

18 The data (continued) Note missing waves before event

19 Preparing the data Select records for respondents who
were cohabiting in 1991 Declare that you want to set the data to survival time Important to check that you have set data as intended

20 Checking the data setup
1 if observation is to be used and 0 otherwise time of entry time of exit 1 if event, 0 if censoring or event not yet occurred

21 Checking the data setup
How do we know when this person separated?

22 Trying again!

23 Checking the new data setup
Now censored instead of an event

24 Summarising time to event data
Individuals followed up for different lengths of time So can’t use prevalence rates (% people who have an event) Use rates instead that take account of person years at risk Incidence rate per year Death rate per 1000 person years

25 Summarising time to event data
Number of observations Person-years <25% of sample had event by 15 elapsed years Rate per year

26 List the cumulative hazard function
Default is the survivor function

27 Graphs of survival time
Kaplan-Meier estimate of survival curve The Kaplan-Meier method estimates the cumulative probability of an individual surviving after baseline to any time, t

28

29 Kaplan-Meier graphs Can read off the estimated probability of surviving a relationship at any time point on the graph E.g. at 5 years 88% are still cohabiting The survival probability only changes when an event occurs So the graph is stepped and not a smooth curve

30

31

32 Testing equality of survival curves among groups
The log-rank test A non –parametric test that assesses the null hypothesis that there are no differences in survival times between groups

33 Log-rank test example Significant difference between men and women

34 The Cox regression model

35 Event History with Cox regression model Event History with Cox Model
No longer modelling the duration Modelling the hazard Hazard: measure of the probability that an event occurs at time t conditional on it not having occurred up until t Also known as the Cox proportional hazard model

36 Some hazard shapes Increasing Decreasing U-shaped Constant
Onset of Alzheimer's Decreasing Survival after surgery U-shaped Age specific mortality Constant Time till next arrives

37 Cox regression model Regression model for survival analysis
Can model time invariant and time varying explanatory variables Produces estimated hazard ratios (sometimes called rate ratios or risk ratios) Regression coefficients are on a log scale Exponentiate to get hazard ratio Similar to odds ratios from logistic models

38 Cox regression equation
is the hazard function for individual i is the baseline hazard function and can take any form It is estimated from the data (non parametric) are the covariates are the regression coefficients estimated from the data Effect of covariates is constant over time (parameterised) This is the proportional hazards assumption Therefore, Cox regression referred to as a semi-parametric model

39 Cox regression in Stata
Will first model a time invariant covariate (sex) on risk of partnership ending Then will add a time dependent covariate (age) to the model

40 Cox regression in Stata

41 Interpreting output from Cox regression
Cox model has no intercept It is included in the baseline hazard In our example, the baseline hazard is when sex=1 (male) The hazard ratio is the ratio of the hazard for a unit change in the covariate HR = 1.3 for women vs. men The risk of partnership breakdown is increased by 30% for women compared with men Hazard ratio assumed constant over time At any time point, the hazard of partnership breakdown for a woman is 1.3 times the hazard for a man

42 Interpreting output from Cox regression (ii)
The hazard ratio is equivalent to the odds that a female has a partnership breakdown before a man The probability of having a partnership breakdown first is = (hazard ratio) / (1 + hazard ratio) So in our example, a HR of 1.30 corresponds to a probability of 0.57 that a woman will experience a partnership breakdown first The probability or risk of partnership breakdown can be different each year but the relative risk is constant So if we know that the probability of a man having a partnership breakdown in the following year is 1.5% then the probability of a woman having a partnership breakdown in the following year is 0.015*1.30 = 1.95%

43

44

45 Time dependent covariates
Examples Current age group rather than age at baseline GHQ score may change over time and predict break-ups Will use age to predict duration of cohabitation Nonlinear relationship hypothesised Recode age into 8 equally spaced age groups

46 Cox regression with time dependent covariates

47 Cox regression assumptions
Assumption of proportional hazards No censoring patterns True starting time Plus assumptions for all modelling Sufficient sample size, proper model specification, independent observations, exogenous covariates, no high multicollinearity, random sampling, and so on

48 Proportional hazards assumption
Cox regression with time-invariant covariates assumes that the ratio of hazards for any two observations is the same across time periods This can be a false assumption, for example using age at baseline as a covariate If a covariate fails this assumption for hazard ratios that increase over time for that covariate, relative risk is overestimated for ratios that decrease over time, relative risk is underestimated standard errors are incorrect and significance tests are decreased in power

49 Testing the proportional hazards assumption
Graphical methods Comparison of Kaplan-Meier observed & predicted curves by group. Observed lines should be close to predicted Survival probability plots (cumulative survival against time for each group). Lines should not cross Log minus log plots (minus log cumulative hazard against log survival time). Lines should be parallel

50 Testing the proportional hazards assumption
Formal tests of proportional hazard assumption Include an interaction between the covariate and a function of time. Log time often used but could be any function. If significant then assumption violated Test the proportional hazards assumption on the basis of partial residuals. Type of residual known as Schoenfeld residuals.

51 When assumptions are not met
If categorical covariate, include the variable as a strata variable Allows underlying hazard function to differ between categories and be non proportional Estimates separate underlying baseline hazard for each stratum

52 When assumptions are not met
If a continuous covariate Consider splitting the follow-up time. For example, hazard may be proportional within first 5 years, next 5-10 years and so on Could covariate be included as time dependent covariate? There are different survival regression methods (e.g. parametric model)

53 Censoring assumptions
Censored cases must be independent of the survival distribution. There should be no pattern to these cases, which instead should be missing at random. If censoring is not independent, then censoring is said to be informative You have to judge this for yourself Usually don’t have any data that can be used to test the assumption Think carefully about start and end dates Always check a sample of records

54 True starting time The ideal model for survival analysis would be where there is a true zero time If the zero point is arbitrary or ambiguous, the data series will be different depending on starting point. The computed hazard rate coefficients could differ, sometimes markedly Conduct a sensitivity analysis to see how coefficients may change according to different starting points

55 Other extensions to survival analysis
Discrete (interval-censored) survival times Repeated events Multi-state models (more than 1 event type) Transition from employment to unemployment or leaving labour market Modelling type of exit from cohabiting relationship- separation/divorce/widowhood

56 Could you use logistic regression instead?
May produce similar results for short or fixed follow-up periods Examples everyone followed-up for 7 years maximum follow-up 5 years Results may differ if there are varying follow-up times If dates of entry and dates of events are available then better to use Cox regression

57 Finally…. This is just an introduction to survival/ event history analysis Only reviewed the Cox regression model Also parametric survival methods But Cox regression likely to suit type of analyses of interest to sociologists Consider an intensive course if you want to use survival analysis in your own work


Download ppt "SC968: Panel Data Methods for Sociologists"

Similar presentations


Ads by Google