Introduction to Survival Analysis
Rich Holubkov, Ph.D.
September 23, 2010
Today’s Class
– Introduction to, and motivation for, basic survival analysis techniques (and why we need advanced techniques).
– Presented at a “newbie Master’s statistician” level.
– I will not assume you’ve seen survival data before, but will necessarily go through this material extremely quickly.
– I do assume you know biostatistics basics, such as the heuristics of chi-square statistics.
Motivation for survival analysis
Let’s say we have an event of interest, such as death or failure of a device. We may be interested in event rates at a particular timepoint (1 year), or in assessing event rates over time as patients are being followed up. We could just calculate one-year rates, but:
– What about losses to follow-up before one year?
– What about patients not yet followed for one year?
– What if most events happen within 30 days?
Motivation for survival analysis
In studies where people are followed over a period of time, we usually have:
– people in the study with different lengths of follow-up (whether or not they had an event)
– people dropping out of the study for various reasons at different times
We want to figure out rates of an event (like death, or liver transplant failure) at a particular timepoint (or at all timepoints), taking this differential follow-up into account.
Motivation for survival analysis
We will often want to compare survival data between treatment groups (logrank test) and quantify effects of factors on outcomes (Cox model and alternatives), sometimes in settings where patients can have competing outcomes (competing risks). Today, I will talk mainly about Kaplan-Meier curves and their comparison via logrank tests. No parametrics!
Motivating Example (briefly) My Registry has enrolled 1000 kids undergoing liver transplants from 1985 till 2010 (now). I want to estimate the chance of a child being alive at 10 years post-transplant. The problem is, Registry has a child who has been followed for 20 years, a child followed for just one year,… How can I use every child’s information?
Makes sense to look year by year! (“Actuarial life tables”)
I’ll see how many kids I had at the beginning of each year of follow-up (1000 enrolled in my Registry, 950 were around at the beginning of year two, etc.). If a child dropped out of the analysis without dying (let’s say, during the first year of follow-up, 25 withdrew or just weren’t around for long enough yet), it makes sense to count that child as being around and “at risk” for half a year.
Actuarial Life Tables
Of my 1000 enrolled kids, at 1 year, 950 were alive, 25 dead, and 25 dropouts/not followed long enough. Counting each dropout as at risk for half the year, 1-yr survival is:
(950 + 12.5) / (950 + 25 + 12.5) = 97.5%
Examine the 950 left at the beginning of Year 2 in the same way. At the end of 2 years, 900 were alive, 20 more dead, and 30 dropouts/not followed long enough. Additional 1-yr survival for these 950 is then:
(900 + 15) / (900 + 20 + 15) = 97.9%
2-yr survival = 97.5% × 97.9% = 95.4%
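The year-by-year arithmetic above can be sketched in a few lines of Python. This is only an illustration of the half-year convention for dropouts; the function name `actuarial_survival` and the tuple layout are my own assumptions, not part of any package.

```python
def actuarial_survival(intervals):
    """Actuarial (life-table) survival.

    intervals: list of (n_at_start, deaths, dropouts), one tuple per year.
    Each dropout is counted as at risk for half of the interval.
    Returns a list of (interval_survival, cumulative_survival).
    """
    cumulative = 1.0
    results = []
    for n_start, deaths, dropouts in intervals:
        effective_at_risk = n_start - dropouts / 2.0
        interval_survival = 1.0 - deaths / effective_at_risk
        cumulative *= interval_survival
        results.append((interval_survival, cumulative))
    return results


# Registry numbers from the slide: (at start of year, deaths, dropouts)
years = [(1000, 25, 25), (950, 20, 30)]
for year, (p, s) in enumerate(actuarial_survival(years), start=1):
    print(f"Year {year}: interval survival {p:.3f}, cumulative {s:.3f}")
# Year 1: interval survival 0.975, cumulative 0.975
# Year 2: interval survival 0.979, cumulative 0.954
```

Note that 1 - 25/987.5 is the same number as (950 + 12.5)/(950 + 25 + 12.5); the code just writes it as "one minus deaths over effective number at risk".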
Extend this idea, event by event! I can do this for 10 years, year by year. But, I can do this much more finely, effectively calculating an event rate at every timepoint where a child dies. Just need to know number left in the study (“at risk”) at each of those timepoints. This way of generating survival curves is called Kaplan-Meier analysis or the product-limit method.
A Kaplan-Meier Example
http://www.cancerguide.org/scurve_km.html
Let’s say we have just seven patients:
– Patient A: died at 1 year
– Patient B: dropped out at 2 years
– Patient C: dropped out at 3 years
– Patient D: died at 4 years
– Patient E: still in study, alive at 5 years
– Patient F: died at 10 years
– Patient G: still in study, alive at 12 years
So, three deaths, at 1, 4, and 10 years.
For basic survival analysis, we need for each patient:
– A well-defined “time zero”, the date that follow-up begins.
– A well-defined “date of event or last contact”, the last time we know whether the patient had an event or not.
– (The above two elements are together equivalent to “time on study”.)
– An indicator of whether or not the patient had an event at last contact.
Simple Analysis Dataset
For our seven-patient example (Event: 1 = died, 0 = censored; Time in years):

Patient   Status                         Event   Time
A         died at 1 year                 1       1
B         dropped out at 2 years         0       2
C         dropped out at 3 years         0       3
D         died at 4 years                1       4
E         in study, alive at 5 years     0       5
F         died at 10 years               1       10
G         in study, alive at 12 years    0       12
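To see the product-limit idea at work on this dataset, here is a minimal Python sketch. The function name `kaplan_meier` is hypothetical, and the code is written for clarity rather than speed: at each death time it counts who is still at risk and multiplies in the conditional survival.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate.

    times:  time on study for each patient
    events: 1 if the patient had the event at that time, 0 if censored
    Returns a list of (event_time, n_at_risk, n_events, survival).
    """
    data = list(zip(times, events))
    survival = 1.0
    curve = []
    # The estimate only changes at times where at least one event occurred.
    for t in sorted({u for u, e in data if e == 1}):
        at_risk = sum(1 for u, _ in data if u >= t)
        deaths = sum(1 for u, e in data if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, at_risk, deaths, survival))
    return curve


# Seven-patient example: patients A through G
times = [1, 2, 3, 4, 5, 10, 12]
events = [1, 0, 0, 1, 0, 1, 0]
for t, n, d, s in kaplan_meier(times, events):
    print(f"t={t:>2}  at risk={n}  deaths={d}  S(t)={s:.3f}")
# t= 1  at risk=7  deaths=1  S(t)=0.857
# t= 4  at risk=4  deaths=1  S(t)=0.643
# t=10  at risk=2  deaths=1  S(t)=0.321
```

The dropouts at 2, 3, and 5 years never cause a step in the curve; they only shrink the number at risk for later deaths.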
Dropouts: a critical assumption
– Statistically: “any dropouts must be noninformative”. Dropout time should be independent of failure time.
– Practically: a child who drops out of the analysis cannot differ from a child who stays in the analysis, in terms of chances of having an event as follow-up continues.
– Is this assumption ever reasonable?
– Is this assumption formally testable?
Simple Example
– A registry follows kids receiving transplants in rural states.
– Kids whose transplants are not doing as well tend to move to urban areas to be close to major medical centers. Let’s say they are lost to follow-up.
– My registry now has relatively healthier kids left as follow-up gets longer.
– Thus, my long-term follow-up data will give too rosy a picture of outcomes for children living in rural states.
A Variant…
– Recent improvements in surgical techniques/post-surgical treatment have improved prognosis.
– So, kids transplanted decades ago have worse long-term prognosis.
– But, these are the only kids in my registry with long-term follow-up.
– So, my long-term follow-up data will give an unduly grim picture for recently treated kids. It’s just like healthy kids dropping out early!
– Changes in survival probability over the enrollment period act like nonrandom dropout.
A Variant I’ve Seen A surgeon keeps his own registry. His research assistant regularly followed patients until about 5 years ago. He still regularly updates the registry database for important outcomes. Specifically, whenever he is notified that a patient has died, that patient’s survival data are updated. What will be wrong with the database for survival analysis? How can it be fixed?
Informative Censoring
– If we stay nonparametric, noninformative censoring cannot be tested using the observed failure/censoring times!
– Parametric methods do exist, for example jointly modeling dropout and events.
– Can compare entry characteristics and risk profiles of censored versus uncensored patients.
– Should look for time trends in long-term follow-up studies or RCTs.
– If the censoring rate is appreciable, there is usually concern about potential bias.
How can we compare two (or more) survival curves?
The standard approach to comparing survival curves between two or more groups is the “logrank test”, sometimes known as the Mantel-Cox test. The test can be motivated in several ways. I present a “chi-squared table” approach, along the lines of Mantel (1966).
How a logrank test works!
– At every timepoint when one or more kids have an event, we calculate the expected number of events in each group, assuming identical survival.
– If one child has an event when there are 25 kids at risk in Group A and 75 in Group B, we “expect” 0.25 events in Group A and 0.75 in Group B.
– Do this for all events, and see whether the sum of “Observed minus Expected” for a group is large relative to its variance, as in the chi-squared test.
Heuristic Derivation
– $j$ indexes the event times, $j = 1, \dots, J$.
– $N_{1j}$, $N_{2j}$: numbers at risk at time $j$, with $N_j = N_{1j} + N_{2j}$.
– $O_{1j}$, $O_{2j}$: numbers of events at time $j$, with $O_j = O_{1j} + O_{2j}$.
Consider a 2 x 2 table for each time $j$:

          Arm 1                Arm 2                Total
Dead      $O_{1j}$             $O_{2j}$             $O_j$
Alive     $N_{1j} - O_{1j}$    $N_{2j} - O_{2j}$    $N_j - O_j$
At Risk   $N_{1j}$             $N_{2j}$             $N_j$
How a logrank test works!
If the two arms truly have the same survival distribution, then $O_{1j}$ is hypergeometric, so it has expectation
$E_{1j} = O_j \, \dfrac{N_{1j}}{N_j}$
and variance
$V_{1j} = \dfrac{N_{1j} N_{2j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)}$.
Now get a statistic by summing over the $J$ event times, treating the terms as a sum of independent variables (this is very heuristic!!!)
How a logrank test works!
Under the null, with $E_{1j} = O_j \, N_{1j} / N_j$ and $V_{1j} = \dfrac{N_{1j} N_{2j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)}$, the statistic
$\dfrac{\left( \sum_j (O_{1j} - E_{1j}) \right)^2}{\sum_j V_{1j}}$
has (approximately) a chi-squared distribution with 1 d.f. This is the standard logrank test for testing equality between two survival curves. It readily generalizes to more than 2 groups, and to subgroups (strata) within each group to be compared.
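The heuristic derivation translates almost line by line into code. Below is a minimal Python sketch for two groups; the function names and the tiny four-patient dataset are hypothetical, chosen so the arithmetic can be checked by hand. The p-value uses the fact that a chi-squared variable with 1 d.f. is a squared standard normal.

```python
import math


def logrank_chi2(times, events, groups):
    """Two-group logrank chi-squared statistic (1 d.f.).

    times:  time on study; events: 1 = event, 0 = censored;
    groups: 1 or 2 (arm label).
    """
    records = list(zip(times, events, groups))
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for t in sorted({u for u, e, _ in records if e == 1}):
        n1 = sum(1 for u, _, g in records if u >= t and g == 1)
        n2 = sum(1 for u, _, g in records if u >= t and g == 2)
        n = n1 + n2
        o1 = sum(1 for u, e, g in records if u == t and e == 1 and g == 1)
        o = sum(1 for u, e, _ in records if u == t and e == 1)
        sum_o_minus_e += o1 - o * n1 / n          # O_1j - E_1j
        if n > 1:                                 # V_1j is 0/0 when N_j = 1
            sum_v += n1 * n2 * o * (n - o) / (n ** 2 * (n - 1))
    return sum_o_minus_e ** 2 / sum_v


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-squared(1) variable."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical data: both Arm 1 patients die before either Arm 2 patient.
chi2 = logrank_chi2([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 2, 2])
print(f"chi-squared = {chi2:.3f}, p = {chi2_1df_pvalue(chi2):.3f}")
```

For this toy dataset the sum of O minus E in Arm 1 is 1/2 + 2/3 = 7/6 and the summed variance is 1/4 + 2/9 = 17/36, so the statistic is 49/17. With four patients the chi-squared approximation is of course not to be trusted; the point is only to show the bookkeeping.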
Why this derivation?
Using our $E_{1j} = O_j \, N_{1j} / N_j$ and $V_{1j} = \dfrac{N_{1j} N_{2j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)}$, we can apply weights $w_j \equiv w(t_j)$ at each event time $t_j$, $j = 1, \dots, J$. Then, the weighted statistic
$\dfrac{\left( \sum_j w_j (O_{1j} - E_{1j}) \right)^2}{\sum_j w_j^2 V_{1j}}$
still has (approximately) a chi-squared distribution with 1 d.f. under the null. But, using $w_j \equiv 1$ yields the test that is most powerful (asymptotically, within this class of weighted tests) against alternatives with proportional hazards (where “risk of event in Arm 1 vs. Arm 2” is constant throughout follow-up).
Why is this of interest?
This is why we generally use a standard logrank (Mantel-Cox) test! But, programs like SAS will give or offer you other variants:
– $w_j = N_j$ (Gehan-Wilcoxon or Breslow test, gives more weight to earlier observations)
– $w_j$ = estimated proportion surviving at $t_j$ (Peto-Peto-Prentice Wilcoxon test, fully efficient for alternatives where the odds ratio of event for Arm 2 vs. Arm 1 is constant over time)
– There are classes of these test types…
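A short Python sketch may help show how little changes between these variants: only the weight attached to each event time. The function name `weighted_logrank_chi2` and its `weight` argument are hypothetical; only the unit weight and the Gehan weight ($w_j = N_j$) are implemented here. Because the denominator is the variance of a weighted sum, the weights enter it squared.

```python
def weighted_logrank_chi2(times, events, groups, weight="logrank"):
    """Weighted two-group logrank statistic (1 d.f.).

    weight="logrank": w_j = 1   (standard Mantel-Cox test)
    weight="gehan":   w_j = N_j (Gehan-Wilcoxon / Breslow test)
    groups must be coded 1 or 2; events: 1 = event, 0 = censored.
    """
    if weight not in ("logrank", "gehan"):
        raise ValueError("weight must be 'logrank' or 'gehan'")
    records = list(zip(times, events, groups))
    numerator = 0.0
    variance = 0.0
    for t in sorted({u for u, e, _ in records if e == 1}):
        n1 = sum(1 for u, _, g in records if u >= t and g == 1)
        n2 = sum(1 for u, _, g in records if u >= t and g == 2)
        n = n1 + n2
        o1 = sum(1 for u, e, g in records if u == t and e == 1 and g == 1)
        o = sum(1 for u, e, _ in records if u == t and e == 1)
        w = 1.0 if weight == "logrank" else float(n)
        numerator += w * (o1 - o * n1 / n)
        if n > 1:  # the variance term is undefined when only one is at risk
            variance += w ** 2 * n1 * n2 * o * (n - o) / (n ** 2 * (n - 1))
    return numerator ** 2 / variance


# Hypothetical toy data: Arm 1 events at 1 and 2, Arm 2 events at 3 and 4.
t = [1, 2, 3, 4]
e = [1, 1, 1, 1]
g = [1, 1, 2, 2]
print(weighted_logrank_chi2(t, e, g, weight="logrank"))
print(weighted_logrank_chi2(t, e, g, weight="gehan"))
```

On this toy dataset the Gehan weights are 4 and 3 at the two informative event times, giving 16/6 compared with 49/17 for the unweighted test. With real data the two can differ much more, especially when the curves separate early or late.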
Which test to use? For the practicing statistician, main thing is to be aware of these various test statistics and how they differ. We usually use the standard Mantel-Cox logrank test when planning a trial comparing survival curves, but whichever test is selected has to be prespecified. Can’t go hunting for significant p-values after an RCT. What about before?
Variance of Kaplan-Meier Estimates
Using the delta method, if estimated survival at time $t$ is $S(t)$, an estimate of the variance of $S(t)$ is
$S(t)^2 \sum_{t_i \le t} \dfrac{O_i}{N_i (N_i - O_i)}$.
This estimate (Greenwood’s formula), used by SAS, changes only at timepoints when events occur. Thus, the estimated variance need not track the number of subjects remaining at risk.
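As a worked illustration, here is a minimal Python sketch that applies Greenwood’s formula to the seven-patient example from earlier. The function name `km_with_greenwood` is hypothetical, and the interval shown is the plain “estimate plus or minus 1.96 standard errors” version, clipped to [0, 1]; packages often use a transformed interval instead, so treat this as one simple choice among several.

```python
import math


def km_with_greenwood(times, events):
    """Kaplan-Meier estimate with Greenwood standard errors.

    Returns a list of (event_time, survival, standard_error).
    """
    records = list(zip(times, events))
    survival = 1.0
    greenwood_sum = 0.0
    rows = []
    for t in sorted({u for u, e in records if e == 1}):
        n = sum(1 for u, _ in records if u >= t)             # N_i at risk
        d = sum(1 for u, e in records if u == t and e == 1)  # O_i events
        survival *= 1.0 - d / n
        if n > d:
            greenwood_sum += d / (n * (n - d))
            se = survival * math.sqrt(greenwood_sum)
        else:
            se = float("nan")  # everyone at risk had the event; term undefined
        rows.append((t, survival, se))
    return rows


times = [1, 2, 3, 4, 5, 10, 12]
events = [1, 0, 0, 1, 0, 1, 0]
for t, s, se in km_with_greenwood(times, events):
    lower = max(0.0, s - 1.96 * se)
    upper = min(1.0, s + 1.96 * se)
    print(f"t={t:>2}  S(t)={s:.3f}  SE={se:.3f}  95% CI=({lower:.3f}, {upper:.3f})")
```

With seven patients the intervals are very wide, which is the honest answer for a sample this small.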
Variance of Kaplan-Meier Estimates
You can use Greenwood’s formula to get confidence intervals for a Kaplan-Meier rate at a particular timepoint, or to test hypotheses about that rate. Be aware that the nonparametric Kaplan-Meier approach can give substantially larger standard error estimates than parametrically based survival models.
Variance of Kaplan-Meier Estimates
Peto (1977) derived an alternative formula: if estimated survival at time $t$ is $S(t)$ and the number at risk at time $t$ is $N(t)$, the Peto estimate of the variance of $S(t)$ is
$\dfrac{S(t)^2 \, [1 - S(t)]}{N(t)}$.
– Easy to compute, heuristically makes sense, perhaps(?) better when N is smaller.
– Worth knowing about and examining if testing an event rate is your primary goal!
– Neither formula is great with small N or high censoring.
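One way to see why Peto’s formula “heuristically makes sense”: with no censoring at all, the number still at risk is just the starting sample size times the fraction surviving, and the formula collapses to the ordinary binomial variance. The small Python sketch below checks this numerically; the function name `peto_variance` and the numbers (100 patients, 75% surviving) are hypothetical.

```python
import math


def peto_variance(survival, n_at_risk):
    """Peto's variance estimate: S(t)^2 * (1 - S(t)) / N(t)."""
    return survival ** 2 * (1.0 - survival) / n_at_risk


# Hypothetical no-censoring case: 100 patients enrolled, 75 still alive at t.
n_enrolled = 100
s = 0.75
n_at_risk = 75  # equals n_enrolled * s when nobody is censored

peto = peto_variance(s, n_at_risk)
binomial = s * (1.0 - s) / n_enrolled
print(f"Peto variance:     {peto:.6f}  (SE {math.sqrt(peto):.4f})")
print(f"Binomial variance: {binomial:.6f}  (SE {math.sqrt(binomial):.4f})")
```

When there is censoring, N(t) falls below the no-censoring value and the Peto variance grows accordingly, which is the behavior one would want.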
Interval Censoring
A patient dropping out of analysis before having an event is “right censored”. A patient who is known to have had an event during some interval, but not exactly when, is called “interval censored”. For example, a cardiac echo on 12/1/2009 shows a valve leak. The previous echo was on 11/1/2008, and the leak could have started anytime between the two echoes.
Left Censoring A patient who is known to have an event, but not more specifically than before a particular date, is called “left censored”. For example (Klein/Moeschberger), a study looking at time till first marijuana use may have information from some kids who report use, but cannot recall the date at all. Special case of interval censoring.
Interval Censoring Turnbull (1976) developed a nonparametric survival curve estimator applied to interval-censored data. Basically, an EM algorithm is used to compute the nonparametric MLE product-limit curve iteratively. Recently, more efficient algorithms have been developed to compute Kaplan-Meier type estimates.
Interval Censoring
Just something to be aware of, if you encounter this type of data. SAS macros %EMICM and %ICSTEST allow construction of nonparametric survival curves with interval-censored data, and their comparison using a generalized version of the logrank test. R also has routines (the “interval” package).
Not a Pop Quiz, but… A statistician familiar with survival basics may sometimes encounter data that didn’t turn out quite as expected. Here is an example I’ve certainly seen, though usually less extreme. I show hypothetical data on a six-month study with about 40 patients in each of two treatment groups…
Crossing Curves Example
– If this were a pilot observational study, the logrank test would clearly be inappropriate for a future trial whose data look like this.
– Can consider a different test, shorter follow-up, or otherwise varying the design of the future trial, depending on setting (e.g., survival after cancer therapy).
– If these are results of an actual RCT with a prespecified logrank test, you are at least partly stuck.
Basic Survival Analysis in SAS
To do Kaplan-Meier and logrank in SAS:

    proc lifetest data=curves plots=s;
      time days*event(0);
      strata group;
    run;

– plots=s option: asks for a Kaplan-Meier plot.
– time statement: days is time on study, event is the outcome indicator, with 0 as the value for patients who are censored (versus those who had an event).
– strata statement: compares levels of group.
Basic Survival Analysis in R
To do Kaplan-Meier curves in R:

    library(survival)
    mfit <- survfit(Surv(days, event == 1) ~ group, data = curves)
    plot(mfit)

In the survival library, Surv() creates a survival object. The first argument is follow-up time, the second is a status indicator (if true, the event occurred; otherwise the observation is censored). survfit() then computes the Kaplan-Meier estimator.
Logrank test:

    survdiff(Surv(days, event == 1) ~ group, data = curves)
Summary I have presented basic, nonparametric approaches to analysis of survival data. Kaplan-Meier curves and the logrank test should be a part of every biostatistician’s toolbox. Same is true for proportional hazards models, to be discussed next week by Nan Hu.
Summary The survival analysis assumption of “noninformative censoring” usually cannot be formally tested and should be assumed not to hold. So, if a study has substantial dropout, there may be bias. Compare the characteristics of dropouts to others to get an idea of how bad the situation could be.
Summary
The usual logrank test can be viewed as just one member of a class of tests with different weightings of each event time. This test weights all event times equally, and is usually preferred because it is the most powerful test in this class (asymptotically) when one treatment’s benefit, in terms of relative risk, is constant throughout follow-up. Variants may be preferred for nonstandard scenarios.
Bibliography
– Fleming TR, Harrington DP (1991). Counting Processes and Survival Analysis. Wiley.
– Kaplan EL, Meier P (1958). Nonparametric estimation from incomplete observations. JASA 53: 457–481.
– Klein JP, Moeschberger ML (2003). Survival Analysis: Techniques for Censored and Truncated Data. 2nd ed. Springer.
– Mantel N (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports 50(3): 163–170.
– Peto R, Pike MC, Armitage P, et al. (1977). Design and analysis of randomized clinical trials requiring prolonged observation of each patient, Part II. British Journal of Cancer 35: 1–39.
– Turnbull B (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. JRSS-B 38: 290–295.
Next Seminar
Nan Hu, Ph.D. will be discussing the Cox proportional hazards model, one week from today, on September 30th.