Event History Analysis 5 Sociology 8811 Lecture 19 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission.



Announcements
Class topic:
– Time-varying data: example
– More details on Cox models & fully parametric proportional hazard models
Paper Assignment #2 handed out today
– Due April 26

EHA Models: In Greater Depth
Issues:
– Properties of Cox models (semi-parametric) and fully parametric models
   Plus relevant assumptions, diagnostics
   Strategies for outliers
   Model fit
– Choosing a model. What should you do?
– Other issues
   Accelerated failure time models
   Frailty
   Etc.

Event History Example
What factors affect how soon a country passes an environmental protection law?
Event: Passing an environmental law in a given year
Risk set: All countries that have not yet passed an environmental protection law
– We decided that risk begins in 1970 (when such laws were invented)
   Countries that became independent after 1970 are treated as entering the analysis “late”
Option #2: Duration since independence (age)
– But that was less appropriate for the research question.

Example: Environmental Laws
Cross-national time series dataset of nearly 100 countries
Event: when a country writes its first comprehensive environmental law (e.g., EPA)
– Data taken from various sources
Independent variables: GDP, population, democracy, degradation, education, domestic and international NGOs
Time duration: countries enter the “risk set” in 1970, or when they become independent
Total sample of 97 countries
– 73 countries have an event between 1970 and 1998.

Time-Varying Data Structure
In the previous example, each row of data was a separate survey respondent
– Because survey respondents were not tracked over multiple years, the data were not “time-varying”
In the current example, we have the advantage of time-varying data
– Each row of data is a country-year
– Our independent variables may change over time.

States, Spells, and Events
Example (India):
[Figure: timeline with axes Year (through 1998) and State; labels: Spell #1, Spell #2, Law written.]

States, Spells, and Events
Example (Iran):
[Figure: timeline with axes Year (through 1998) and State; labels: Spell #1, No law written as of 1998.]

Time-Varying Data Structure
Example:
[Table: 14 country-year rows for INDIA; column names as run together in the source: newname2 newid3 year laweventnum start end ss es pop; cell values not recoverable. Callouts: Law written, Spell, State, Population.]

Time-Varying Data Structure
[Table: 14 country-year rows for INDIA; column names as run together in the source: newname2 newid3 year laweventnum start end ss es pop; cell values not recoverable.]
Stset command: stset end, failure(es==1) origin(1970)
Note: It is common to drop cases that are not at risk (ex: if start state = 1)
– BUT, it is not necessary… Stata drops cases after the event by default… unless you specify exit(time .)

Time-Varying Data Structure
What if countries pass multiple laws? Called “repeated events”
1. Start state could be reset to zero
2. We can override the Stata default of removing cases after the first event occurs: exit(time .)
[Table: 14 country-year rows for INDIA; column names as run together in the source: newname2 newid3 year laweventnum start end ss es pop; cell values not recoverable.]

Cumulative Survivor Function

Cumulative Survivor Function by Region

Cumulative Survivor Function West vs. non-West

Smoothed Hazard Function West vs. non-West

Constant Rate Model: Example
Simple one-variable model comparing west vs. non-west
. streg west, dist(exponential) nohr
Exponential regression -- log relative-hazard form
No. of subjects = 97; Number of obs = 2047; No. of failures = 81; Time at risk = 2047
(Std. Err. adjusted for 97 clusters in newid3)
[Output table: Coef., Robust Std. Err., z, P>|z|, 95% Conf. Interval for west and _cons. The estimates, Wald chi2(1), Prob > chi2, and log pseudolikelihood are not recoverable.]
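As a rough arithmetic check on what a constant rate model estimates: with no covariates at all, the maximum-likelihood estimate of a constant hazard is failures divided by time at risk. The sketch below applies that to the totals in the output header above (81 failures, 2047 country-years). The intercept-only setup and the variable names are assumptions for illustration; the slide's model also includes west, so its _cons will differ.

```python
import math

# Totals copied from the streg output header above.
failures = 81
time_at_risk = 2047  # country-years

# Intercept-only exponential model (an assumption; the slide's model adds "west"):
# the MLE of the constant hazard is events divided by exposure.
rate = failures / time_at_risk

# In log relative-hazard form, the constant of such a model would be ln(rate).
log_rate = math.log(rate)

# Under a constant hazard, the expected waiting time is 1 / rate.
expected_years = 1 / rate

print(f"hazard per country-year: {rate:.4f}")
print(f"implied constant (log scale): {log_rate:.3f}")
print(f"expected years until first law: {expected_years:.1f}")
```

The point of the exercise is only that the exponential model's constant is a log rate; covariates then shift that rate up or down multiplicatively.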

Constant Rate Model: Example
Model with time-varying covariates
No. of subjects = 92; Number of obs = 1938; No. of failures = 77; Time at risk = 1938
(Std. Err. adjusted for 92 clusters in newid3)
[Output table: Coef., Robust Std. Err., z, P>|z|, 95% Conf. Interval for gdp, degradation, education, democracy, ngo, ingo, _cons. The estimates, Wald chi2(6), Prob > chi2, and log pseudolikelihood are not recoverable.]
Democratic countries enact laws at a higher rate than less-democratic countries

Constant Rate Model: Example
Same model, with Hazard Ratios
No. of subjects = 92; Number of obs = 1938; No. of failures = 77; Time at risk = 1938
(Std. Err. adjusted for 92 clusters in newid3)
[Output table: Haz. Ratio, Robust Std. Err., z, P>|z|, 95% Conf. Interval for gdp, degradation, education, democracy, ngo, ingo. The estimates, Wald chi2(6), Prob > chi2, and log pseudolikelihood are not recoverable.]
A 1-point increase in democracy increases the hazard rate by 25.8%!
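The coefficient table and the hazard-ratio table describe the same model: a hazard ratio is the exponentiated coefficient, and the percent change in the hazard per one-unit increase is (ratio − 1) × 100. A minimal sketch, assuming a hazard ratio of 1.258 for democracy (inferred from the 25.8% figure above, since the table values themselves are not shown); the function names are hypothetical.

```python
import math

def hazard_ratio(coef):
    """Convert a log relative-hazard coefficient to a hazard ratio."""
    return math.exp(coef)

def percent_change(hr, units=1):
    """Percent change in the hazard for a `units`-point increase in X."""
    return (hr ** units - 1) * 100

hr_democracy = 1.258                 # assumed, implied by the 25.8% statement
coef_democracy = math.log(hr_democracy)  # the matching "nohr" coefficient

print(f"coefficient: {coef_democracy:.3f}")
print(f"1-point increase: {percent_change(hr_democracy):.1f}%")
print(f"2-point increase: {percent_change(hr_democracy, units=2):.1f}%")
```

Note that effects compound multiplicatively: a 2-point increase multiplies the hazard by the ratio squared, which is more than twice the 1-point percentage.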

Constant Rate Model: Example
What if we expect global civil society to have a particularly strong effect in the non-West?
Option #1: Create an interaction term
No. of subjects = 92; Number of obs = 1938; No. of failures = 77; Time at risk = 1938
(Std. Err. adjusted for 92 clusters in newid3)
[Output table: Coef., Robust Std. Err., z, P>|z|, 95% Conf. Interval for gdp, degradation, education, democracy, ngo, ingo, nonwest, ingoXnonwest, _cons. The estimates, Wald chi2(8), Prob > chi2, and log pseudolikelihood are not recoverable.]

Constant Rate Model: Example
What if we expect global civil society to have a particularly strong effect in the non-West?
Option #2: Include only non-Western countries in the analysis
No. of subjects = 76; Number of obs = 1720; No. of failures = 61; Time at risk = 1720
(Std. Err. adjusted for 76 clusters in newid3)
[Output table: Coef., Robust Std. Err., z, P>|z|, 95% Conf. Interval for gdp, degradation, education, democracy, ngo, ingo, _cons. The estimates, Wald chi2(6), Prob > chi2, and log pseudolikelihood are not recoverable.]

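Cox Models
The basic Cox model: $h(t) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)$
Where h(t) is the hazard rate
– $h_0(t)$ is some baseline hazard function (to be inferred from the data)
– This obviates the need for building a specific functional form into the model
Also written as: $\ln h(t) = \ln h_0(t) + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$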

Cox Model: Example
Mostly similar to exponential model…
Cox regression -- Breslow method for ties
No. of subjects = 92; Number of obs = 1938; No. of failures = 77; Time at risk = 1938
(Std. Err. adjusted for 92 clusters in newid3)
[Output table: Coef., Robust Std. Err., z, P>|z|, 95% Conf. Interval for gdp, degradation, education, democracy, ngo, ingo. The estimates, Wald chi2(6), Prob > chi2, and log pseudolikelihood are not recoverable.]
Most effects are similar… though the education effect loses significance…

Cox Model Issues: Ties
1. How to handle ties in data
It is mathematically complex to estimate models when there are tied failures
– That is: two cases that have events at the exact same time
Several mathematical approaches:
– Breslow approximation: simplest approach
   Stata default, but not the best choice!
– Efron approximation: generally better
   More computationally intensive, but given the power of modern computers that is not an issue
   Efron is generally preferred

Cox Model Issues: Ties
– Exact marginal: “continuous time approximation”
   Box-Steffensmeier & Jones: “Averaged Likelihood”
   Assumes ties didn’t happen EXACTLY at the same time… and considers all possible orderings
– Exact partial: “discrete”
   Box-Steffensmeier & Jones: “exact discrete method”
   Assumes ties happened EXACTLY at the same time
– Advice: Use Efron at a minimum
   Exact methods are often more accurate
   Exact marginal often makes most sense… events rarely occur at the EXACT same time
   But exact methods can take a LONG time
   For big datasets with many ties, Efron is OK.
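To make the difference between the two approximations concrete, the sketch below computes the log partial-likelihood contribution of a single failure time under each. It is an illustration in Python rather than what Stata runs internally; the function name, the input layout (failed cases listed first), and the risk scores are all hypothetical. Breslow uses the full risk-set sum for every tied failure, while Efron removes a growing fraction of the tied cases' own risk from successive denominators.

```python
import math

def log_pl_tied(risk_scores, n_failed, method="efron"):
    """Log partial-likelihood contribution of one failure time.

    risk_scores: exp(x*beta) for every case in the risk set; by convention
                 here, the first n_failed entries are the cases that failed.
    """
    d = n_failed
    failed_sum = sum(risk_scores[:d])   # total risk of the tied failures
    risk_sum = sum(risk_scores)         # total risk of everyone at risk
    numerator = sum(math.log(s) for s in risk_scores[:d])

    if method == "breslow":
        # Same denominator for each of the d tied failures.
        denominator = d * math.log(risk_sum)
    elif method == "efron":
        # k-th denominator drops a fraction k/d of the tied cases' risk.
        denominator = sum(
            math.log(risk_sum - (k / d) * failed_sum) for k in range(d)
        )
    else:
        raise ValueError("method must be 'breslow' or 'efron'")
    return numerator - denominator

# Hypothetical risk set: two tied failures (scores 2.0 and 1.0), two survivors.
scores = [2.0, 1.0, 1.0, 1.0]
breslow = log_pl_tied(scores, 2, "breslow")
efron = log_pl_tied(scores, 2, "efron")
print(f"Breslow: {breslow:.4f}  Efron: {efron:.4f}")
```

With a single failure at a given time the two methods coincide; they diverge only as the number of ties grows, which is why the choice matters most in data recorded in coarse units such as years.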

Cox Model: Baseline Hazard
Cox models involve a “baseline hazard”
– Note: baseline = when all covariates are zero
Question: What does the baseline hazard look like?
– Or baseline survivor & integrated hazard?
Stata can estimate the baseline survivor, hazard, integrated hazard. Two steps:
1. You must ask Stata to save the info when you run the Cox model
– Ex: stcox gdp degradation education democracy ngo ingo, robust nohr basehc(h0)
2. Use the “stcurve” command to plot the baseline curves
– Ex: stcurve, hazard OR stcurve, survival

Cox Model: Baseline Hazard Baseline rate: Adoption of environmental law

Cox Model: Baseline Hazard
Note: It may not always make sense to plot the baseline hazard
– Baseline shows hazard when X variables are zero
– Sometimes zero values aren’t very useful/interesting
Example: Does it make sense to plot hazard of countries adopting laws, if GDP is zero?
– Hazard rate is quite low
– In some cases, you’ll just get a flat zero curve
– Or extremely high values
Solutions:
1. Rescale indep vars before running the Cox model
2. Use stcurve to choose relevant values of vars.

Cox Model: Estimated Hazards
You can also use stcurve to plot estimated hazard rates based on values of indep vars
– Ex: What is the hazard curve if democracy = 1, 5, 10?
Strategy: use the “at” subcommand: stcurve, hazard at(democ=1) at2(democ=10)
– NOTE: All other variables are pegged at the mean…

Cox: Estimated Hazard Rate Hazard rate for adoption of environmental law

Proportional Hazard Assumption
Key assumption: Proportional hazards
– Estimated hazards are proportional over time
– i.e., estimates of a hazard ratio do NOT vary over time
Example: Effect of “abstinence” program on sexual behavior
– Issue: Do abstinence programs lower the rate in a consistent manner across time?
– Or, perhaps the rate is lower initially… but then the rate jumps back up (maybe even exceeds the control group).
Groups are assumed to have “parallel” hazards
– Rather than rates that diverge, converge (or cross).
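In symbols, the assumption can be sketched as follows for a Cox-type hazard with a single 0/1 covariate X (for instance, program vs. control). The notation is assumed here for illustration rather than taken from the slide.

```latex
\[
h(t \mid X) = h_0(t)\,e^{\beta X}
\quad\Longrightarrow\quad
\frac{h(t \mid X = 1)}{h(t \mid X = 0)}
  = \frac{h_0(t)\,e^{\beta}}{h_0(t)}
  = e^{\beta}
\]
```

The baseline hazard cancels, so the ratio contains no t: the two groups' hazards can rise and fall together, but their ratio stays fixed. If the effect were instead a function of time, say β(t), the ratio would be e^{β(t)} and the hazards could converge, diverge, or cross.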

Proportional Hazard Assumption
Strategies:
1. Visually examine raw hazard plots for subgroups in your data
– Watch for non-parallel trends
– A simple, crude method… but often identifies big violations

Proportional Hazard Assumption Visual examination of raw hazard rate Parallel trends in hazard rate look good!

Proportional Hazard Assumption
2. Plot -ln(-ln(survival)) versus ln(time) across values of X variables
– What Stata calls “stphplot”
– Parallel lines indicate proportional hazards
– Again, convergence and divergence (or crossing) indicates violation
A less-common approach: compare observed survivor plot to predicted values (for different values of X)
– What Stata calls “stcoxkm”
– If observed are similar to predicted, assumption is not likely to be violated.
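Why parallel lines on this plot indicate proportional hazards can be checked numerically. The sketch below uses an exponential survivor function for two groups whose hazards differ by a fixed ratio, then computes -ln(-ln(S)) at several times. The base rate, the coefficient, and the function names are hypothetical values chosen for illustration, and the exponential form is an assumption made only to keep the example short.

```python
import math

def neg_log_neg_log(s):
    """The transformation plotted by a log-log survival plot."""
    return -math.log(-math.log(s))

base_rate = 0.04   # hypothetical baseline hazard per year
beta = 0.7         # hypothetical log hazard ratio for group X = 1

def survivor(t, x):
    """Exponential survivor function with proportional hazards."""
    return math.exp(-base_rate * math.exp(beta * x) * t)

times = [1, 5, 10, 20]
gaps = []
for t in times:
    line0 = neg_log_neg_log(survivor(t, 0))
    line1 = neg_log_neg_log(survivor(t, 1))
    gaps.append(line0 - line1)
    print(f"ln(t)={math.log(t):.2f}  X=0: {line0:.3f}  X=1: {line1:.3f}")

# Under proportional hazards the vertical gap equals beta at every time.
print([round(g, 6) for g in gaps])
```

Because the gap between the two lines is the same at every time point, the lines are parallel; a gap that shrinks or grows with ln(time) is what convergence or divergence on the plot reflects.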

Proportional Hazard Assumption -ln(-ln(survivor)) vs. ln(time) – “stphplot” Convergence suggests violation of proportional hazard assumption (But, I’ve seen worse!)

Proportional Hazard Assumption Cox estimate vs. observed KM – “stcoxkm” Predicted differs from observed for countries in West

Proportional Hazard Assumption
3. Piecewise Models
Piecewise = break model up into pieces (by time)
– Ex: Split analysis into “early” vs. “late” time
If coefficients vary in different time periods, hazards are not proportional
– Example:
   stcox var1 var2 var3 if _t < 10
   stcox var1 var2 var3 if _t >= 10
– Look for large changes in coefficients!

Proportional Hazard Assumption
In a piecewise model, coefficients would differ in non-proportional models
[Figure: two panels, each comparing Early and Late periods.]
– Proportional: here, the effect is the same in both time periods
– Non-Proportional: here, the effect is negative in the early period and positive in the late period

Piecewise Models
Look at coefficients at 2 (or more) spans of time
EARLY
. stcox gdp degradation education democracy ngo ingo if year < 1985, robust nohr
[Output table: Coef., Std. Err., z, P>|z|, 95% Conf. Interval for gdp, degradation, education, democracy, ngo, ingo; values not recoverable.]
LATE
. stcox gdp degradation education democracy ngo ingo if year >= 1985, robust nohr
[Output table: Coef., Std. Err., z, P>|z|, 95% Conf. Interval for gdp, degradation, education, democracy, ngo, ingo; values not recoverable.]
Note: Effect of ngo is larger in early period

Proportional Hazard Assumption
4. Tests based on re-estimating model
Try including time interactions in your model
– Recall: Interactions mean the effect of A on C varies with B
– If effect of variable X on hazard rate (or ratio) varies with time, then hazards aren’t proportional
Recall example: Abstinence programs
– Perhaps abstinence programs have a big effect initially, but the effect diminishes (or reverses) later on

Proportional Hazard Assumption
[Figure: two panels, Proportional and Non-Proportional. Red = abstinence group; green = control.]
In the non-proportional case, the effect of abstinence programs varies across time

Proportional Hazard Assumption
Strategy: Create variables that reflect the interaction of X variables with time
– Significant effects of time interactions indicate non-proportional hazards
– Fortunately, inclusion of the interaction term in the model corrects the problem.
Issue: X variables can interact with time in multiple ways…
– Linearly
– With “log time” or time squared
– With time dummies
– You may have to try a range of things…
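The three kinds of interaction listed above can be built mechanically once the data are in one-row-per-period form. The sketch below is a Python illustration, not the Stata workflow from the lecture; the row layout, the `_t` field, the cutoff of 10, and the function name are all hypothetical.

```python
import math

def add_time_interactions(rows, var, late_cutoff):
    """Return copies of period rows with three time interactions of `var`.

    Each row needs an analysis time `_t` > 0 (log time is undefined at 0).
    """
    out = []
    for row in rows:
        t = row["_t"]
        new = dict(row)
        new[f"{var}_x_time"] = row[var] * t                # linear time
        new[f"{var}_x_logtime"] = row[var] * math.log(t)   # log time
        late = 1 if t >= late_cutoff else 0                # time dummy
        new[f"{var}_x_late"] = row[var] * late
        out.append(new)
    return out

# Hypothetical person-period rows: id 1 is in the program, id 2 is a control.
rows = [
    {"id": 1, "_t": 1, "abstinence": 1},
    {"id": 1, "_t": 12, "abstinence": 1},
    {"id": 2, "_t": 12, "abstinence": 0},
]
expanded = add_time_interactions(rows, "abstinence", late_cutoff=10)
for r in expanded:
    print(r)
```

The interaction has to be computed on rows split by time, because analysis time changes within a subject; a single row per subject would freeze the interaction at one value. In Stata, the tvc() and texp() options of stcox are one way to request this kind of term without building the variables by hand.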

Proportional Hazard Assumption
[Figure: two panels. Red = abstinence group; green = control.]
Linear time interaction: effect grows consistently over time
– Try “Abstinence*time”
Interaction with time period: effect differs early vs. late
– Try “Abstinence*DLate”

Proportional Hazard Assumption
5. Grambsch & Therneau test
– Ex: Stata “estat phtest”
Test for non-zero slope of Schoenfeld residuals vs. time
– A zero slope implies the log hazard ratio is constant over time, i.e., hazards are proportional
– Can be applied to the general model, or to each variable
. stcox gdp degradation education democracy ngo ingo, robust nohr scaledsch(sca*) schoenfeld(sch*)
. estat phtest
Test of proportional hazards assumption
Time: Time
[Output table: chi2, df, Prob>chi2 for the global test; values not recoverable.]
Significant chi-square indicates violation of proportional hazard assumption

Proportional Hazard Assumption
Variable-by-variable test with “estat phtest”:
. estat phtest, detail
Test of proportional hazards assumption
Time: Time
[Output table: rho, chi2, df, Prob>chi2 for gdp, degradation, education, democracy, ngo, ingo, and the global test; values not recoverable.]
Note: Certain variables are especially problematic…

Proportional Hazard Assumption
Notes on estat phtest:
1. Requires that you calculate “Schoenfeld residuals” when you run the original Cox model
– And, if you want a test for each variable, you must also request scaled Schoenfeld residuals
2. Test is based on identifying a non-zero time trend… but how should we characterize time?
– Options: normal/linear time, log time, time dummies, etc.
– Results may differ depending on your choice
– Ex: estat phtest, log (specifies “log time”)
Plot of smoothed Schoenfeld residuals can indicate best way to characterize time
– Linear trend (not a curve) indicates that time is characterized OK
– Ex: estat phtest, plot(ngo) OR estat phtest, log plot(ngo)

Proportional Hazard Assumption
What if the assumption is violated?
1. Improve model specification
– Add time interactions to address nonproportionality
– Ex: If high democracies are not proportional to low democracies, try adding “highdemoc*time”
– Variables can be interacted with linear time, log time, time dummies, etc., to address the issue
2. Model groups separately
– Split sample along variables that are non-proportional.

Proportional Hazard Assumption
What if the assumption is violated?
3. Use a stratified Cox model
– Allows a different baseline hazard for each group
– But you can’t estimate the effect of the stratifying variable!
– Ex: stcox var1 var2 var3, strata(Dhighdemoc)
4. Use a piecewise model
– Split time into chunks… in which PH assumption is met
– Requires sufficient sample size in all time periods!
5. Live with it (but temper your conclusions)
– Allison points out: Cox model is reasonably robust
– Other issues (e.g., model misspecification) are bigger problems.
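The stratified model in point 3 can be sketched in symbols, with g indexing the strata (for example, high vs. low democracy). The notation is assumed here for illustration.

```latex
\[
h_g(t) = h_{0g}(t)\,\exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k),
\qquad g = 1, \dots, G
\]
```

Each stratum gets its own unrestricted baseline hazard, so hazards need not be proportional across strata, while the coefficients are shared. The stratifying variable is absorbed into the baselines, which is why no coefficient can be estimated for it.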