Analysis of Time to Event Data


Analysis of Time to Event Data: Kaplan-Meier and Cox Regression Analysis

Kaplan-Meier Analysis
We will motivate the construction of Kaplan-Meier survival curves, and the logrank test for comparing them, by beginning with the analysis of life tables. Kaplan-Meier analysis is the limiting case of this methodology.

Kaplan-Meier Analysis: Analysis of Life Tables
Suppose that we are measuring survival on a cohort of n individuals and that we are only able to assess their status at k+1 points in time: $t_1, t_2, \ldots, t_{k+1}$. For the interval $(t_i, t_{i+1})$ we know only the number who started the interval alive, the number who finished alive, and the number who died. This might be the case, for example, with the analysis of vital statistics data.

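Kaplan-Meier Analysis: Analysis of Life Tables
Our data may be arrayed as follows:
[Table: one row per interval, with $L_i$ = number alive at the start of the ith interval, $D_i$ = number who died during it, and $W_i$ = number withdrawn (lost to follow-up) during it.]
Note that $L_2 = L_1 - D_1 - W_1$ and more generally that $L_{i+1} = L_i - D_i - W_i$ for $i = 1, 2, \ldots, k$.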

Kaplan-Meier Analysis: Analysis of Life Tables
We wish to calculate
$S_i = S(t_i)$ = probability of surviving to the start of the ith interval
as well as
$H_i = H(t_i)$ = probability of dying during the ith interval given that you survived to the start of the ith interval.
We refer to $S(t_i)$ as the survival function and to $H(t_i)$ as the hazard function. Note that $H(t_i)$ is a conditional probability and is thus quite distinct from the unconditional probability of death
$f_i = f(t_i)$ = probability of dying during the ith interval.

Kaplan-Meier Analysis: Analysis of Life Tables
The three terms can be related through the formula
$f(t_i) = H(t_i) \cdot S(t_i)$,
which can be interpreted as
Pr{die in ith interval} = Pr{die during ith interval given that you survive to the start of the interval} × Pr{survive to start of ith interval}.
$H(t_i)$ can be readily estimated by
$h_i = \dfrac{D_i}{L_i - W_i/2}$,
where subtracting $W_i/2$ assumes that loss to follow-up (LTF) occurs uniformly over the interval.

Kaplan-Meier Analysis: Analysis of Life Tables
How do we compute an estimate, $s(t_i)$, of $S(t_i)$?
$s_1 = 1$ by definition
$s_2 = (1 - h_1)$ (probability that you don't die in the 1st interval)
$s_3 = s_2 (1 - h_2) = (1 - h_1)(1 - h_2)$
and more generally
$s_i = (1 - h_1)(1 - h_2) \cdots (1 - h_{i-1})$
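The life table calculation can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: it assumes the standard actuarial hazard estimate h_i = D_i / (L_i - W_i/2), which matches the slide's note that loss to follow-up occurs uniformly over the interval, and the counts below are invented for the example.

```python
def life_table(at_start, deaths, withdrawals):
    """Return (hazards, survival) for a life table.

    at_start    -- L_1, the number alive at the start of the first interval
    deaths      -- D_1..D_k, deaths in each interval
    withdrawals -- W_1..W_k, losses to follow-up in each interval
    """
    hazards = []
    survival = [1.0]          # s_1 = 1 by definition
    at_risk = at_start
    for d, w in zip(deaths, withdrawals):
        # Assumed actuarial adjustment: withdrawals are at risk for half the interval.
        h = d / (at_risk - w / 2)
        hazards.append(h)
        survival.append(survival[-1] * (1 - h))   # s_{i+1} = s_i * (1 - h_i)
        at_risk = at_risk - d - w                 # L_{i+1} = L_i - D_i - W_i
    return hazards, survival


# Hypothetical counts for three intervals.
hazards, survival = life_table(100, deaths=[10, 8, 5], withdrawals=[4, 6, 2])
for i, s in enumerate(survival, start=1):
    print(f"s_{i} = {s:.3f}")
```

The returned survival list has k+1 entries, one for the start of each interval plus the end of the last one.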

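Kaplan-Meier Analysis: The logrank test
Consider the ith time interval, $(t_i, t_{i+1})$, and assume that we have data for two groups, A & B. Our data might look as follows:
[Table: number at risk and number of deaths during the interval, for groups A and B and overall.]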

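Kaplan-Meier Analysis: The logrank test
Assuming no difference in survival between the groups, our best estimate of the probability of not surviving the interval is just the overall hazard, $h_i = 33/412$. If we apply this to the number at risk in each group we get the expected numbers of deaths under $H_0$:
$E_{ia} = h_i L_{ia}$ and $E_{ib} = h_i L_{ib}$,
where $L_{ia}$ and $L_{ib}$ are the numbers at risk in groups A and B at the start of the interval.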

Kaplan-Meier Analysis: The logrank test
Now let
$O_a$ = sum of the $D_{ia}$s = observed # deaths in group A
$O_b$ = sum of the $D_{ib}$s = observed # deaths in group B
$E_a$ = sum of the $E_{ia}$s = expected # deaths in group A
$E_b$ = sum of the $E_{ib}$s = expected # deaths in group B
It is easy to show that $O_a + O_b = E_a + E_b$, i.e., observed # deaths = expected # deaths.

Kaplan-Meier Analysis: The logrank test
The logrank test statistic is given by
$X^2 = \dfrac{(O_a - E_a)^2}{E_a} + \dfrac{(O_b - E_b)^2}{E_b}$
This is the same form as the Pearson $\chi^2$ test for 2-way tables! Under the null hypothesis of no difference in survival rates, $X^2$ will have a chi-square distribution with one degree of freedom. We reject $H_0$ if $X^2$ gets too big.

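Kaplan-Meier Analysis: The logrank test, more than 2 groups
Now suppose instead of just 2 groups we have some arbitrary number, g, of groups. Calculate $O_a, O_b, \ldots, O_g$ and $E_a, E_b, \ldots, E_g$ as before. Then
$X^2 = \dfrac{(O_a - E_a)^2}{E_a} + \dfrac{(O_b - E_b)^2}{E_b} + \cdots + \dfrac{(O_g - E_g)^2}{E_g} \sim \chi^2_{g-1}$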
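As a rough sketch of the logrank calculation described on these slides, the function below pools the hazard within each interval, accumulates observed and expected deaths per group, and forms the Pearson-style sum of (O - E)^2 / E. The function name and the counts are invented for illustration. Statistical packages typically compute the logrank test with a variance-based formula instead, so their values can differ somewhat from this simple form.

```python
def logrank_statistic(at_risk, deaths):
    """Pearson-form logrank statistic for g groups.

    at_risk[i][j] -- number at risk in group j at the start of interval i
    deaths[i][j]  -- number of deaths in group j during interval i
    Returns (X2, observed, expected); X2 is referred to chi-square with g-1 df.
    """
    g = len(at_risk[0])
    observed = [0.0] * g
    expected = [0.0] * g
    for risk_i, deaths_i in zip(at_risk, deaths):
        total_risk = sum(risk_i)
        if total_risk == 0:
            continue
        h = sum(deaths_i) / total_risk      # pooled hazard under H0
        for j in range(g):
            observed[j] += deaths_i[j]
            expected[j] += h * risk_i[j]    # E_ij = h_i * L_ij
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, observed, expected


# Hypothetical two-group data over three intervals.
at_risk = [[100, 100], [90, 80], [80, 65]]
deaths = [[5, 15], [6, 10], [4, 8]]
x2, observed, expected = logrank_statistic(at_risk, deaths)
print(f"X2 = {x2:.2f}, O = {observed}, E = {[round(e, 1) for e in expected]}")
```

For two groups the result is compared with the chi-square distribution on one degree of freedom, whose 5% critical value is 3.84.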

Kaplan-Meier Analysis: Kaplan-Meier, the limiting case
Kaplan-Meier survival curves, and the corresponding logrank test for comparing them, are just the limiting case of the life table methodology when our time intervals get small (e.g., time measured in days rather than in years). Multiple deaths and/or losses to follow-up at the same timepoint become less and less common. Otherwise, the calculations for H(t), S(t), and the logrank statistic are unchanged!

Kaplan-Meier/Cox Analysis: The data
In the limiting (K-M) case, we assume we observe n individuals over time, and that they enroll in, and drop out of, the study at varying times.
[Figure: one follow-up line per participant (observation times t1, t2, t6, t14, t87, t105, t142, …, tn) plotted against calendar time, from start of study to end of study; o = alive, x = dead.]

Kaplan-Meier/Cox Analysis: The data
For analysis we focus on time since entry into study and so rearrange the data as follows:
[Figure: the same follow-up lines aligned to a common origin and plotted against time in study, from start of follow-up to end of follow-up; o = alive, x = dead.]

Kaplan-Meier/Cox Analysis: The data
Thus our data are in the form of:
- An observation time
- An indicator of whether this time ended in the event of interest (e.g., death), or whether it was “censored”
  - Censoring can occur either through early dropout or because the participant was still “alive” at the end of the study
  - Be sure you know how to code for your stat package!
- Either a single variable indicating the groups to be compared (for K-M) or an arbitrary set of predictor variables (the Cox model)
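To make the data layout concrete, here is a small sketch of the Kaplan-Meier (product-limit) estimate computed directly from observation times and an event indicator. The coding 1 = event, 0 = censored is an assumption made for this example (check how your own package expects it), and the eight observations are invented.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- observation time for each participant
    events -- 1 if the time ended in the event, 0 if it was censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)     # everyone leaves the risk set at their own time
    at_risk = len(times)
    s = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            s *= 1 - d / at_risk     # same (1 - h) step as in the life table
            curve.append((t, s))
        at_risk -= leaving[t]        # deaths and censored both drop out after t
    return curve


# Hypothetical data: time in study (days) and event indicator.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 0, 1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```

Participants censored at the same time as a death are counted as still at risk at that time, which is the usual convention.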

Cox Regression Analysis: Overview
A strength of the Kaplan-Meier analysis is that it is totally nonparametric. We have to make no assumptions about the underlying true distribution of failure times. On the other hand, we can only compare a finite number of groups, and we have no way to adjust our comparison of curves for potentially confounding variables.

Cox Regression Analysis: Overview
While a number of fully parametric models for time to event data exist, perhaps the most common regression model in use for survival analysis is the Cox Proportional Hazards Regression model. The Cox model combines aspects of Kaplan-Meier analysis with parametric modelling, and thus provides a very flexible tool for modelling time to event data.

Cox Regression Analysis: The proportional hazards model
Whereas in Poisson regression we construct a linear model for the ln(incidence rate), in the Cox PH model we construct a linear model for the ln of the “instantaneous” incidence rate, which is also called the instantaneous hazard function. Recall that for the life table analysis we defined the hazard function as
$H_i = H(t_i)$ = probability of dying during the ith interval given that you survived to the start of the ith interval.
The instantaneous hazard is just the limiting case of $H_i$ as the interval $(t_i, t_{i+1})$ gets very, very small.

Cox Regression Analysis: The proportional hazards model
So let
$\lambda(t \mid X_1, X_2, \ldots, X_k)$ = probability of “dying” on day t given survival up to day t and baseline covariates $X_1, X_2, \ldots, X_k$
define the instantaneous hazard at time t. The Cox proportional hazards model assumes that $\lambda(t)$ can be written as
$\lambda(t \mid X_1, X_2, \ldots, X_k) = \lambda_0(t)\, \exp(b_1 X_1 + b_2 X_2 + \cdots + b_k X_k)$
Let’s break this equation down some to better understand it.

Cox Regression Analysis: The proportional hazards model
Nonparametric portion, the baseline hazard $\lambda_0(t)$: an unspecified “baseline” hazard function estimated via Kaplan-Meier methods. Think of it as the intercept, or $b_0$, term in our other regression models. It is considered to be a nuisance term that carries no information about the influence of the Xs on survival.
Parametric portion, $\exp(b_1 X_1 + \cdots + b_k X_k)$: the regression model. The bs are the coefficients estimated by your statistics package.
We use the term proportional hazards because the hazard functions are proportional for different values of the Xs.
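A short numerical sketch can show what “proportional” means here. The baseline hazard function and the coefficients below are invented purely for illustration; the point is only that, whatever the baseline looks like, the ratio of hazards for two covariate patterns is the same at every time.

```python
import math


def baseline(t):
    """Hypothetical baseline hazard; any positive function of t would do."""
    return 0.01 + 0.002 * t


def hazard(t, x, coefs):
    """Cox-form hazard: baseline(t) * exp(b_1*x_1 + ... + b_k*x_k)."""
    return baseline(t) * math.exp(sum(b * xj for b, xj in zip(coefs, x)))


coefs = [0.03, -0.60]    # hypothetical b_1 (age in years) and b_2 (a 0/1 indicator)
ratios = [
    hazard(t, [50, 1], coefs) / hazard(t, [50, 0], coefs)
    for t in (1, 10, 100)
]
print([round(r, 4) for r in ratios])   # the same ratio at every t
```

The baseline hazard cancels in the ratio, so the ratio equals exp(b_2) regardless of t.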

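Cox Regression Analysis: Interpretation of coefficients
With $X_2$ coded 1 for males and 0 for females, and the other Xs held fixed,
$RR_{\text{male vs female}} = \dfrac{\lambda_0(t)\, \exp(b_1 X_1 + b_2 \cdot 1 + \cdots + b_k X_k)}{\lambda_0(t)\, \exp(b_1 X_1 + b_2 \cdot 0 + \cdots + b_k X_k)} = e^{b_2}$
$\Rightarrow b_2 = \ln(RR_{\text{males vs females}})$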

Cox Regression Analysis: Generalizations
The Cox model may be generalized to handle time-dependent variables.
- The model conditions on the value of the covariate(s) at each failure time when estimating the bs.
- Your software package may not offer this option, and even if it does, your options for modelling the time-dependency may be limited.
We can get around the proportional hazards assumption to some extent by allowing the baseline hazard to vary arbitrarily for, say, smokers and nonsmokers.
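One common way software handles a time-dependent covariate is to split each participant's follow-up into (start, stop] intervals, with the covariate held constant within each interval. The sketch below assumes that layout and a single covariate that switches from 0 to 1 once; the function name and tuple format are hypothetical, not taken from the slides or from any particular package.

```python
def split_follow_up(total_time, event, switch_time=None):
    """Return (start, stop, event, exposed) rows for one participant.

    total_time  -- time from entry to death or censoring
    event       -- 1 if follow-up ended in the event, 0 if censored
    switch_time -- time at which the covariate changes from 0 to 1, if ever
    """
    if switch_time is None or switch_time >= total_time:
        return [(0, total_time, event, 0)]        # never exposed during follow-up
    if switch_time <= 0:
        return [(0, total_time, event, 1)]        # exposed from entry
    return [
        (0, switch_time, 0, 0),                   # unexposed time, no event yet
        (switch_time, total_time, event, 1),      # exposed time, carries the outcome
    ]


# Hypothetical participant: covariate switches at day 12, death at day 30.
rows = split_follow_up(30, 1, switch_time=12)
print(rows)
```

With this layout the time before the switch is credited to the unexposed state, which is what allows the model to use the covariate's current value at each failure time.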

Cox Regression Analysis: Assumptions
- Changes in any time-dependent covariates are not related to the outcome of interest (e.g., you don’t quit smoking because your health is getting worse in a study of mortality)
- Censoring is not related to the outcome of interest (e.g., healthy people aren’t more likely to leave the study early)

Cox Regression Example: Vollmer et al., NEJM, 1983
Background
- Very early days of transplantation, prior to federal funding of transplantation
- Evidence seemed to suggest huge benefits from transplantation
- Highly selected patient populations may bias results: only the healthiest patients were receiving transplants
The Question
- Does survival differ for patients on dialysis vs transplantation?

Cox Regression Example: Data features
Population
- Referral center for patients with end-stage renal disease (ESRD)
- Renal failure might be due to primary renal disease or secondary to diabetes or hypertension
Treatment Protocol
- Start on dialysis
- May get a transplant later
- Transplant may fail and the patient go back on dialysis
- Transplants may come from either a living-related donor or a cadaveric donor

Cox Regression Example: Data features
Obviously we have the potential for serious confounding in favor of the transplant groups.

Cox Regression Example: Getting started
Checking assumptions:
- Patients with diabetes and hypertension had a different disease process
- Hazards were not proportional, and we expected different covariate effects than for those with primary renal failure
- Therefore chose to conduct totally separate analyses for these individuals

Cox Regression Example: Telling a story
Totally unadjusted analysis:
- Kaplan-Meier analysis with patients classified according to ever-transplant status
- Observation time is time since enrollment into NKC, which credits transplant with pre-transplant survival
- This represents a very biased, but not atypical, analysis for the time

Cox Regression Example: Telling a story
Time-dependent, unadjusted analysis:
- K-M analysis again
- For transplant patients, now use time since transplantation
- Since this is K-M, still no way to give dialysis credit for pre-transplant survival
- No covariate adjustment
- Starting to see the curves come together

Cox Regression Example: Modeling covariates, checking assumptions
How to model age effects?
- Fairly smooth decline in mortality with increasing age
- Decided to model this six-level categorical variable as a linear effect in the model
- Not worried about “data snooping” since this is a nuisance term

Cox Regression Example: Modeling covariates, checking assumptions
How to model co-morbidities?
- PH assumption clearly not met
- Used separate model strata for “none” vs. “any” and used a linear trend for the latter, with 4-5 co-morbidities combined into a single group
- Again, the intent was to provide the best fit to this “nuisance” variable

Cox Regression Example: Telling a story
$RR_{\text{LRD vs. Dial}} = e^{-0.60} = 0.55$

Cox Regression Example: Telling a story
$RR_{\text{CAD vs. Dial}} = e^{0.01} = 1.01$
$RR_{\text{LRD vs. CAD}} = 0.55 / 1.01 \approx 0.54$
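The conversion from coefficients to relative risks is simple enough to check by hand or in code. The sketch below uses only the two rounded coefficients shown on the slides (-0.60 for living-related donor vs. dialysis and 0.01 for cadaveric donor vs. dialysis); the variable names are invented for the example.

```python
import math

b_lrd = -0.60   # living-related donor transplant vs. dialysis (rounded, from the slide)
b_cad = 0.01    # cadaveric donor transplant vs. dialysis (rounded, from the slide)

rr_lrd = math.exp(b_lrd)
rr_cad = math.exp(b_cad)
# The ratio of two relative risks is the exponential of the coefficient difference.
rr_lrd_vs_cad = math.exp(b_lrd - b_cad)

print(f"RR LRD vs. dialysis = {rr_lrd:.2f}")
print(f"RR CAD vs. dialysis = {rr_cad:.2f}")
print(f"RR LRD vs. CAD      = {rr_lrd_vs_cad:.2f}")
```

With these rounded coefficients the LRD vs. CAD ratio comes to about 0.54; coefficients carried to more decimal places could give a slightly different second digit.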

Cox Regression Example Telling a story

Cox Regression Example Telling a story

Cox Regression Example: Telling a story
- Used a time-dependent variable to let the RRs vary over time
- Suggests that even cadaveric (CAD) transplantation is beneficial if you get past peri-operative mortality

Cox Regression Example: Telling a story
Believe it or not, these are not K-M plots, but rather model-driven survival estimates. Illustrates the flexibility of what we can do with the Cox model!

Cox Regression Example: Summary
The Cox model is a powerful and flexible tool that can handle:
- Covariate information
- Time-dependent data
- Time-dependent RR effects
- Departures from the PH assumption (e.g., strata)
- Individual and group data
Caution: As with any complex model, it requires care in use and interpretation.