Published by Alden Stobbe. Modified over 2 years ago.

1

2
CIQLE Workshop: Longitudinal data analysis, Silke Aisenbrey
CIQLE Workshop: Introduction to longitudinal data analysis with Stata: panel models and event history analysis
Silke Aisenbrey, Yale University

3
Goals for the workshop:
- Intro to Stata
- Modeling change over time: panel regression models (fixed, between, and random effects)
- Modeling whether and/or when events occur: event history analysis (data management for event history data, Kaplan-Meier, Cox, piecewise constant)

4
Open Stata. The windows:
- COMMAND: where commands are typed
- RESULTS: results and syntax
- REVIEW: review of syntax, whether entered as commands or via the menu
- VARIABLES: variables of the open file

5
Open data with the menu (Stata data --> eventex.dta)

6
- to see the real data
- to make changes directly in the data: erase variables or cases, make single changes in cases -->

7

8
Basic descriptive commands
Relational and logical operators in Stata:
    ==   is equal to
    ~=   is not equal to (also !=)
    >    greater than
    <    less than
    >=   greater than or equal to
    <=   less than or equal to
    &    and
    |    or
    ~    not (also !)

9
Basic descriptive commands
    sum var
    tab var1 var2
    tab var1 var2, col
combine with:
    …… if var1==2 & var3>0
    by var1: ……………
    sort …………
exercise, e.g.:
    tab abitur sex, col
    tab abitur sex if cohort==1930, col
    sort cohort
    by cohort: tab abitur sex
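The commands above can be tried without the workshop file. A minimal sketch, assuming a small made-up data set; the variable names follow the slides, but the values are invented for illustration only:

```stata
* Toy data: values are made up, only the variable names come from the slides
clear
input cohort sex abitur
1930 1 0
1930 1 1
1930 2 0
1930 2 0
1940 1 1
1940 1 1
1940 2 0
1940 2 1
end

* sum is the abbreviation of summarize, tab of tabulate
summarize abitur
* two-way table with column percentages
tabulate abitur sex, col
* the same table restricted to one cohort
tabulate abitur sex if cohort == 1930, col
* by requires the data to be sorted on the by-variable first
sort cohort
by cohort: tabulate abitur sex
```

The last two lines can also be written as one: bysort cohort: tabulate abitur sex.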

10
Basic commands for data management
    help command
    gen var1 = var2
    recode var1 (0=.) (1/8=2) (9=3)
    rename var1 var100
Use the following variables:
- cohort (indicator of cohort membership)
- sex (1=male, 2=female)
- agemaryc (age @ first marriage)
exercise, e.g.:
    sum agemaryc
recode age @ marriage into groups:
- generate a new variable
- recode the new variable into groups
- recode if marcens==0
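One possible way through the exercise, as a sketch only. The toy values, the new variable name agegrp, and the group boundaries are assumptions for illustration, and setting censored cases to missing is just one reading of "recode if marcens==0":

```stata
* Toy data: values are made up; agegrp and the cut points are hypothetical
clear
input agemaryc marcens
19 1
24 1
31 1
45 0
end

summarize agemaryc
* work on a copy so the original variable stays untouched
generate agegrp = agemaryc
* min and max refer to the smallest and largest nonmissing values
recode agegrp (min/20 = 1) (21/29 = 2) (30/max = 3)
* censored cases (marcens==0) have no observed marriage age: set to missing
recode agegrp (1/3 = .) if marcens == 0
tabulate agegrp, missing
```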

11
possible break

12
Intro to panel regression with Stata:
- panel data
- fixed effects
- between effects
- random effects
- fixed or random?

13
panel data (panelex1.dta)

14
Panel data:
Panel data, also called cross-sectional time-series data, are data in which multiple cases (people, firms, countries, etc.) are observed at two or more time periods.
- Cross-sectional data: only information about variance between subjects
- Panel data: two kinds of information, between and within subjects --> two sources of variance
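The two sources of variance can be written as a decomposition. A sketch in notation that is not from the slides: x_it is the value of x for case i at time t, x̄_i is the mean of case i over time, and x̄ is the overall mean.

```latex
x_{it} - \bar{x}
  \;=\; \underbrace{(\bar{x}_i - \bar{x})}_{\text{between subjects}}
  \;+\; \underbrace{(x_{it} - \bar{x}_i)}_{\text{within subjects}}
```

Cross-sectional data only contain the first term; panel data contain both.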

15
Janet: Basics of panel regression models

16
Cross-sectional vs. panel analyses
Open panelex1.dta and ignore the fact that we have repeated measures:
    regress income childrn
conclusion: more children --> higher income

17
Fixed effects model
Answers the question: what is the effect of x when x changes within persons over time?
E.g., Person A has two children at the first point in time and three children at the second; what effect does this change have on income?
- Information used: fixed effects estimates use the time-series information in the data
- Variance analyzed: within
- Problem: only time-variant variables can be included

18
Fixed effects
exercise: run a separate regression for each unit and then average the coefficients:
    regress income childrn if id==1
    regress income childrn if id==2

19
[( ) + ( )] / 2 = -2.5
exercise: generate a dummy variable for each person and regress with the dummy variables:
    tab id, g(iddum)
    reg income childrn iddum1 iddum2
conclusion: more children --> lower income

20
Fixed effects
- define the data set as panel data:
    tsset id t
- regression with the fixed effects command:
    xtreg income childrn, fe
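The contrast between the pooled and the fixed effects result can be reproduced on a tiny example. A sketch assuming made-up data for three persons over three waves (not panelex1.dta); only the variable names follow the slides:

```stata
* Toy panel: 3 persons x 3 waves, values invented for illustration
clear
input id t childrn income
1 1 0 12
1 2 1 11
1 3 2  9
2 1 1 20
2 2 2 18
2 3 3 17
3 1 2 30
3 2 3 29
3 3 4 26
end

* pooled regression ignoring the repeated measures:
* the childrn coefficient is positive in these data
regress income childrn

* declare the panel structure, then use only within-person change:
* the childrn coefficient is negative in these data
tsset id t
xtreg income childrn, fe

* dummy-variable version: same childrn coefficient as xtreg, fe
tabulate id, generate(iddum)
regress income childrn iddum2 iddum3
```

In the toy data, persons with more children have higher incomes, but each person's income falls as the number of children rises, which is why the two coefficients have opposite signs.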

21
Between effects model
Answers the question: what is the effect of x when x differs between persons?
E.g., Person A has on average three children and Person B has on average five children; what effect does this difference have on their income?
In the between effects model we model the mean response, where the means are calculated for each of the units.
- Information used: cross-sectional information (between subjects)
- Variance analyzed: between variance
- Variables: time-variant and time-invariant

22
Between effects
    regress income childrn
conclusion: more children --> more income
Define the data as panel data, then:
    xtreg dependent independent, be
average --->
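That the between estimator is a regression on person means can be checked directly. A sketch on made-up data (three persons, three waves; values are invented, only the variable names follow the slides):

```stata
* Toy panel: values invented for illustration
clear
input id t childrn income
1 1 0 12
1 2 1 11
1 3 2  9
2 1 1 20
2 2 2 18
2 3 3 17
3 1 2 30
3 2 3 29
3 3 4 26
end

tsset id t
* between effects: uses only differences between person means
xtreg income childrn, be

* the same by hand: reduce the data to one row of means per person
collapse (mean) income childrn, by(id)
regress income childrn
```

Both commands should report the same childrn coefficient, which is positive in these toy data.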

23
Random effects model
Assumption: there is no difference between the answers to the two questions:
1) What is the effect of x when x changes within the person? Person A has two children at the first point in time and three children at the second; what effect does this change have on their income?
2) What is the effect of x when x differs between persons? Person A has two children and Person B has three children; what effect does this difference have on their income?
- Information used: panel and cross-sectional (between and within subjects)
- Variance analyzed: between variance and within variance
- Variables: time-variant and time-invariant

24
Random effects model
- matrix-weighted average of the fixed and the between estimates
- assumes b1 has the same effect in the cross section as in the time series
- requires that the individual error terms be treated as random variables that follow the normal distribution
use:
    xtreg dependent independent if var==x, re
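One common way to see the "weighted average" idea is the quasi-demeaned form of the random effects (GLS) estimator. A sketch for a balanced panel with T time points; the symbols u_i (person-specific error), e_it (remaining error), and theta are notation assumed here, not taken from the slides:

```latex
y_{it} - \theta\,\bar{y}_i
  = (1-\theta)\,b_0
  + b_1\,(x_{it} - \theta\,\bar{x}_i)
  + \bigl[(1-\theta)\,u_i + (e_{it} - \theta\,\bar{e}_i)\bigr],
\qquad
\theta = 1 - \sqrt{\frac{\sigma_e^2}{T\,\sigma_u^2 + \sigma_e^2}}
```

With theta = 0 this is the pooled regression; with theta = 1 it is the fixed effects (within) regression. Random effects lies in between.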

25

26
possible break

27
Open data: panelex2.dta
varlist:

28
Tell Stata the structure of the data:
    tsset X Y
X = caseid
Y = time/wave
summary statistics:
    xtdes
    xtsum

29
Use the effects:
    xtreg dependent independent if sex==1, fe
    xtreg dependent independent if sex==1, be
    xtreg dependent independent if sex==1, re
exercise: compare/discuss the models, e.g.:
    xtreg indvar1 indvar2 … if sex==1, fe
- try to include time-invariant variables
- try to make a theoretical/empirical argument for why you use which model

30

31
Problems/Tests/Solutions:
What's the right model: fixed or random effects?
Test: Hausman test
Null hypothesis: the coefficients estimated by the efficient random effects estimator are the same as those estimated by the consistent fixed effects estimator.
- If the same (insignificant p-value, Prob>chi2 larger than .05) --> safe to use random effects.
- If significant p-value --> use fixed effects.
    xtreg y x1 x2 x3..., fe
    estimates store fixed
    xtreg y x1 x2 x3..., re
    estimates store random
    hausman fixed random
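The test sequence can be tried on simulated data. A sketch under stated assumptions: the data-generating process, the seed, and the sample size are invented, and x1 is deliberately correlated with the person-specific effect u, which is the situation in which random effects is inconsistent. It assumes a Stata version that provides rnormal().

```stata
* Simulated panel: 200 persons x 5 waves (all values are artificial)
clear
set seed 12345
set obs 200
generate id = _n
* person-specific effect, constant over time
generate u = rnormal()
expand 5
bysort id: generate t = _n
* x1 is correlated with u by construction
generate x1 = rnormal() + 0.5*u
generate y = 1 + 2*x1 + u + rnormal()

tsset id t
xtreg y x1, fe
estimates store fixed
xtreg y x1, re
estimates store random
* consistent estimator first, efficient estimator second
hausman fixed random
```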

32

33
Problems/Tests/Solutions:
Autocorrelation?
What is autocorrelation: the last time period's values affect current values.
test: xtserial
Install the user-written program: type findit xtserial or net search xtserial
    xtserial depvar indepvars

34
A significant test statistic indicates the presence of serial correlation.
Solution: use a model correcting for autocorrelation: xtregar instead of xtreg
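A sketch of the workflow on simulated data with first-order autocorrelated errors. The data-generating process (AR coefficient 0.6, 100 persons, 6 waves) is an assumption for illustration; the xtserial line is left as a comment because the command is user-written and must be installed first.

```stata
* Simulated panel with AR(1) errors within person (artificial values)
clear
set seed 2024
set obs 100
generate id = _n
expand 6
bysort id: generate t = _n
generate x = rnormal()
generate e = rnormal()
* e_t = 0.6*e_(t-1) + noise; replace works through observations in order
bysort id (t): replace e = 0.6*e[_n-1] + rnormal() if _n > 1
generate y = 1 + 0.5*x + e
tsset id t

* test for serial correlation (requires the user-written xtserial):
* xtserial y x

* fixed effects model with an AR(1) disturbance
xtregar y x, fe
```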

35

36
possible break

37
panel
- waves:
  - number of children @ wave 1 / 2 / 3 / 4
  - employed @ wave 1 / 2 / 3 / 4
  - income @ wave 1 / 2 / 3 / 4
- regression models: dependent variable continuous

event
- dates of events:
  - birth of first child @ 1963
  - birth of second child @ 1966…
  - start of first employment @…
  - start of unemployment @…
  - start of second employment @…
- time information in event data more precise
- dependent variable: event happens 0/1
- different data structure

38
Different Faces of Event History Data
Time: continuous / discrete

39
Types of censoring
- Subject does not experience the event of interest
- Incomplete follow-up:
  - lost to follow-up
  - withdraws from study
- Left or right censored

40

41
open data eventex.dta

42
Tell Stata that our data are survival data: stset
    stset X, failure(Y) id(Z)
X = time at which the event happens or the case is right censored; this is always needed
Y = 0 or missing means censored; all other values are interpreted as representing an event taking place (failure)
Z = id
three examples:
1) stset ageendsch
   event: end of school; time: age @ end of school
2) stset agemaryc, failure(marcens) id(caseid)
   event: marriage
3) stset agestjob, failure(stjob) id(caseid)
   event: first job
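To see what stset does, example 2 can be run on a handful of made-up cases. A sketch: the values are invented, only the variable names come from the slides.

```stata
* Toy data: marcens = 1 means married at agemaryc, 0 means censored
clear
input caseid agemaryc marcens
1 22 1
2 25 1
3 30 0
4 19 1
5 41 0
end

stset agemaryc, failure(marcens) id(caseid)
* stset creates _t (analysis time), _d (event indicator), _t0 and _st
list caseid agemaryc marcens _t0 _t _d
* short description of the survival-time data
stdescribe
```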

43
DATA MANAGEMENT HANNAH

44
Different Models of Event History

Time continuous:
- non-parametric: Kaplan-Meier, Nelson-Aalen, log-rank test for comparison between groups
- semi-parametric: Cox, piecewise constant
- parametric: exponential, Weibull, log-logistic, lognormal, Gompertz, generalized gamma

Time discrete:
- logistic
- log-log

Covariates:
- non-parametric: only qualitative covariates; compare survival experiences between groups (sex, cohorts)
- other models: inclusion of covariates in models (univariate, multivariate)

Extended from Jenkins 2005

45
Survivor function and hazard function
- The survivor function, S(t), defines the probability of surviving longer than time t.
- Survivor and hazard functions can be converted into each other.
- The hazard (instantaneous hazard, force of mortality) is the risk that an event will occur during a short time interval Δt starting at time t, given that the subject did not experience the event before that time.
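In formulas, for continuous time, with T denoting the time of the event (notation assumed here, not from the slides), the two functions and the conversion between them are:

```latex
\[
\begin{aligned}
S(t) &= \Pr(T > t), \\
h(t) &= \lim_{\Delta t \to 0}
        \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \\
h(t) &= -\frac{d}{dt}\,\ln S(t),
\qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right).
\end{aligned}
\]
```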

46
non-parametric: Kaplan-Meier
List the Kaplan-Meier survivor function:
    . sts list
    . sts list, by(sex) compare
Graph the Kaplan-Meier survivor function:
    . sts graph
    . sts graph, by(sex)
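A sketch of these commands on a small made-up data set, so the output can be inspected case by case. Values are invented; variable names follow the slides.

```stata
* Toy data: age at marriage, censoring indicator, sex (1=male, 2=female)
clear
input caseid agemaryc marcens sex
1 24 1 1
2 27 1 1
3 29 0 1
4 33 1 1
5 20 1 2
6 22 1 2
7 23 1 2
8 31 0 2
end

stset agemaryc, failure(marcens) id(caseid)
* survivor function for everyone, then side by side for both sexes
sts list
sts list, by(sex) compare
* one Kaplan-Meier curve per sex
sts graph, by(sex)
```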

47
non-parametric: Kaplan-Meier
exercise: stset your data for marriage, end of school, or first job, e.g.:
1) sts list
2) sts graph
3) sts list, by(…) compare
4) sts graph, by(..)

48
non-parametric: Nelson-Aalen
List the Nelson-Aalen cumulative hazard function:
    . sts list, na
    . sts list, na by(sex) compare
Graph the Nelson-Aalen cumulative hazard function:
    . sts graph, na
    . sts graph, na by(sex)
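For reference, the two non-parametric estimators in their standard form. The notation is assumed here, not taken from the slides: t_j are the observed event times, d_j is the number of events at t_j, and n_j is the number of cases still at risk just before t_j.

```latex
\[
\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)
\quad \text{(Kaplan-Meier)},
\qquad
\hat{H}(t) = \sum_{j:\, t_j \le t} \frac{d_j}{n_j}
\quad \text{(Nelson-Aalen)}.
\]
```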

49
non-parametric: Nelson-Aalen
exercise: stset your data for marriage, end of school, or first job
1) sts list, na
2) sts graph, na
3) sts list, na by(…) compare
4) sts graph, na by(..)

50
non-parametric: Kaplan-Meier
Comparing Kaplan-Meier curves
- The log-rank test can be used to compare survival curves
- Hypothesis test (test of significance):
  H0: the curves are statistically the same
  H1: the curves are statistically different
- Compares observed to expected cell counts
for age@marr:

51
non-parametric: Kaplan-Meier
Comparing Kaplan-Meier curves
exercise: test equality of survivor functions, e.g.:
    sts test abitur
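A sketch of the test on made-up data. The values are invented and the groups are far too small for a meaningful result; the point is only to show the observed and expected event counts that the log-rank test compares.

```stata
* Toy data: abitur = 0/1 defines the two groups to compare
clear
input caseid agemaryc marcens abitur
1 20 1 0
2 22 1 0
3 23 1 0
4 25 0 0
5 26 1 1
6 28 1 1
7 30 0 1
8 33 1 1
end

stset agemaryc, failure(marcens) id(caseid)
* log-rank test (the default of sts test): observed vs. expected events
sts test abitur
```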

52
non-parametric: Kaplan-Meier
Limit of Kaplan-Meier curves
- What happens when you have several covariates that you believe contribute to survival?
- Example: education, marital status, children, and gender contribute to job change
- Can use K-M curves for 2 or maybe 3 covariates
- Need another approach for many covariates: the multivariate Cox proportional hazards model is most common

53
semi-parametric models: Cox
Cox proportional hazards model
- Can handle both continuous and categorical predictor variables
- Without knowing the baseline hazard h0(t), can still calculate coefficients for each covariate, and therefore hazard ratios
- Assumes multiplicative risk --> proportional hazards assumption
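Why the hazard ratio does not depend on the baseline hazard can be seen from the standard form of the model. The notation (covariates x_1 ... x_k, coefficients b_1 ... b_k) is assumed here:

```latex
\[
h(t \mid x) = h_0(t)\,\exp(b_1 x_1 + \dots + b_k x_k),
\qquad
\frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = \exp(b_j).
\]
```

The unknown h_0(t) cancels in the ratio, and the ratio is the same at every t: this is the proportional hazards assumption.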

54
semi-parametric models: Cox
example: age at first marriage
    stcox sex
Interpretation: because the Cox model does not estimate a baseline hazard, there is no intercept in the output.
sex: male=1, female=2
Whatever the hazard rate at a particular time is for men, it is 1.5 times as high for women.
What does this mean in our case? Women get married younger than men do.

55
semi-parametric models: Cox
Interpretation of the regression coefficients
- An estimated hazard rate ratio greater than 1 indicates the covariate is associated with an increased hazard of experiencing the event of interest
- An estimated hazard rate ratio less than 1 indicates the covariate is associated with a decreased hazard of experiencing the event of interest
- An estimated hazard rate ratio of 1 indicates no association between covariate and hazard
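The link between hazard ratios and coefficients can be checked on simulated data. A sketch under stated assumptions: event times are exponential, the hazard for sex==2 is set to 1.5 times the hazard for sex==1, and observation stops at time 40; none of this is the workshop data.

```stata
* Simulated survival data (artificial): true hazard ratio for sex is 1.5
clear
set seed 42
set obs 300
generate caseid = _n
generate sex = 1 + (runiform() < 0.5)
* exponential event time with rate 0.05 (sex==1) or 0.075 (sex==2)
generate time = -ln(runiform()) / (0.05 * cond(sex == 2, 1.5, 1))
* right-censor everything still event-free at time 40
generate event = time < 40
replace time = 40 if !event

stset time, failure(event) id(caseid)
* default output: hazard ratios, i.e. exp(b)
stcox sex
* same model, coefficients b instead of hazard ratios
stcox sex, nohr
```

The hazard ratio in the first table is the exponential of the coefficient in the second; a ratio above 1 corresponds to a positive coefficient.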

56
Graphically: estimates for the functions
    stcox sex, basehc(H0)
    stcurve, hazard at1(sex=1) at2(sex=2)
    stcox sex, basesurv(S0)
    stcurve, survival at1(sex=1) at2(sex=2)

57
exercise: make your own Cox model and estimate the hazard and survival functions

58
Assessing model adequacy
Proportional hazards assumption: the effects of the covariates do not depend on time, so hazard ratios are constant over time
Three general ways to examine model adequacy:
- Graphically: do the survival curves intersect?
- Mathematically: Schoenfeld test
- Computationally: time-dependent variables (extended model)

59
compare with Kaplan-Meier:
    stcoxkm, by(sex)
exercise: do this with one of your estimates

60
"log-log" plots
    stphplot, by(sex)
exercise: do this with one of your estimates; stphplot can be adjusted --> look in the stphplot help

61
Mathematically: Schoenfeld test
Tests whether the log hazard ratio is constant over time; thus a rejection of the null hypothesis indicates a deviation from the proportional hazards assumption.
    stcox sex, schoenfeld(sch*) scaledsch(sca*)
    estat phtest
(if more than one variable: estat phtest, detail)
exercise: do this with your model; try to find a model that fits
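A sketch of the graphical and the Schoenfeld checks on simulated data in which the proportional hazards assumption holds by construction. Assumptions: the data are artificial, and the Stata version is 11 or later, where estat phtest can be run directly after stcox without the schoenfeld() and scaledsch() options shown on the slide.

```stata
* Simulated survival data (artificial), proportional hazards by design
clear
set seed 7
set obs 300
generate caseid = _n
generate sex = 1 + (runiform() < 0.5)
generate time = -ln(runiform()) / (0.05 * cond(sex == 2, 1.5, 1))
generate event = time < 40
replace time = 40 if !event
stset time, failure(event) id(caseid)

* graphical checks: roughly parallel lines support the assumption
stphplot, by(sex)
stcoxkm, by(sex)

* test based on Schoenfeld residuals, per covariate and global
stcox sex
estat phtest, detail
```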

62
Summary
- Survival analysis quantifies time to a single, dichotomous event
- Handles censored data well
- Survival and hazard can be mathematically converted to each other
- Kaplan-Meier survival curves can be compared graphically
- Cox proportional hazards models help distinguish individual contributions of covariates to survival, provided certain assumptions are met

63
It can get a lot more complicated than this
The proportional hazards model as shown only works when the time-to-event data are relatively simple.
Complications:
- non-proportional hazard rates
- time-dependent covariates
- competing risks
- multiple failures
- non-absorbing events
- etc.
An extensive literature exists for these situations, and software is available to handle them.

64
Semi-parametric models: piecewise constant
- the transition rate is not assumed to be constant over the observed time
- splits the data into user-defined time pieces
- transition rates are constant within each time piece
- but: transition rates change between time pieces

65
Semi-parametric models: piecewise constant
In Stata: a user-written command, an ado file by J. Sorensen: stpiece
    net search stpiece
install the file, then:
    stpiece abitur, tp(20 30 40) tv(sex)
tp: time pieces, intervals
tv: covariates whose influence might vary over time pieces
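If stpiece is not installed, the same idea can be sketched with built-in commands: split each spell at the chosen time points with stsplit, then fit an exponential model with one dummy per time piece. This is a swapped-in alternative, not the stpiece implementation; the data are simulated, the variable name piece is hypothetical, and factor-variable notation (i.piece) assumes Stata 11 or later.

```stata
* Simulated survival data (artificial values)
clear
set seed 99
set obs 500
generate caseid = _n
generate sex = 1 + (runiform() < 0.5)
generate time = -ln(runiform()) / (0.05 * cond(sex == 2, 1.5, 1))
generate event = time < 60
replace time = 60 if !event
* stsplit requires id() to be set
stset time, failure(event) id(caseid)

* split each spell at 20, 30 and 40; piece records the start of the interval
stsplit piece, at(20 30 40)

* exponential model: rate constant within a piece, different between pieces
streg i.piece sex, distribution(exponential)
```

Letting the effect of sex vary over time pieces, as tv(sex) does in stpiece, would correspond to adding an interaction such as i.piece#i.sex.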

66
the end
