HSRP 734: Advanced Statistical Methods July 10, 2008.


1 HSRP 734: Advanced Statistical Methods July 10, 2008

2 Objectives
Describe the Kaplan-Meier estimated survival curve
Describe the log-rank test
Use SAS to implement

3 Kaplan-Meier Estimate of Survival Function S(t)
The Kaplan-Meier estimate is a simple, useful and popular estimate of the survival function.
This estimate incorporates both censored and noncensored observations.
It breaks the estimation problem down into small pieces.

4 Kaplan-Meier Estimate of the Survival Function S(t)
Start from the estimate for grouped survival data.
Let the interval lengths L_j become very small, all of length L = Δt, and let t_1, t_2, … be the times of events (survival times).

5 Kaplan-Meier Estimate of the Survival Function S(t)
There are two cases to consider in the previous equation.
Case 1. No event occurs in a bin (interval).
  The estimate of S(t) does not change, which means that we can ignore bins with no events.

6 Kaplan-Meier Estimate of the Survival Function S(t)
Case 2. y_j events occur in a bin (interval).
  Also, n_j persons enter the bin.
  Assume any censored times that occur in the bin occur at the end of the bin.

7 Kaplan-Meier Estimate of the Survival Function S(t)
So, as Δt → 0, we get the Kaplan-Meier estimate of the survival function S(t).
It is also called the “product-limit estimate” of the survival function S(t).
Note: each conditional probability estimate is obtained from the observed number at risk for an event and the observed number of events: (n_j - y_j) / n_j.
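The limiting estimate itself is not reproduced in this transcript. A sketch of its usual form, assuming the notation of the note above (n_j at risk and y_j events at event time t_j), is the product of the conditional survival probabilities over all event times up to t:

```latex
\[
  \hat{S}(t) \;=\; \prod_{j:\, t_j \le t} \frac{n_j - y_j}{n_j}
\]
```

Bins with no events contribute a factor of 1, which is why only the event times appear in the product.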

8 Kaplan-Meier Estimate of Survival Function S(t)
We begin by rank ordering the survival times (including the censored survival times). Then:
  Define each interval as starting at an observed time and ending just before the next ordered time.
  Identify the number at risk within each interval.
  Identify the number of events within each interval.
  Calculate the probability of surviving within that interval.
  Calculate the survival function for that interval as the probability of surviving that interval times the probability of surviving to the start of that interval.

9 Example - AML
Group                        Weeks in remission (i.e., time to relapse)
Maintenance chemo (X=1)      9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
No maintenance chemo (X=0)   5, 5, 8, 8, 12, 16+, 23, 27, 30+, 33, 43, 45
+ indicates a censored time to relapse; e.g., 13+ = more than 13 weeks to relapse

10 Example – AML
Calculation of Kaplan-Meier estimates in the “not maintained on chemotherapy” group:
Time t_j   At risk n_j   Events y_j   Estimate of S(t_j)
0          12            0            1.000
5          12            2            1.000 x ((12-2)/12) = 0.833
8          10            2            0.833 x ((10-2)/10) = 0.667
12         8             1            0.667 x ((8-1)/8) = 0.583
23         6             1            0.583 x ((6-1)/6) = 0.486
27         5             1            0.486 x ((5-1)/5) = 0.389
33         3             1            0.389 x ((3-1)/3) = 0.259
43         2             1            0.259 x ((2-1)/2) = 0.130
45         1             1            0.130 x ((1-1)/1) = 0

11 Example – AML (cont’d)
In the “maintained on chemotherapy” group:
Time t_j   At risk n_j   Events y_j   Estimate of S(t_j)
0          11            0            1.000
9          11            1            1.000 x ((11-1)/11) = 0.909
13         10            1            0.909 x ((10-1)/10) = 0.818
18         8             1            0.716
23         7             1            0.614
31         5             1            0.491
34         4             1            0.368
48         2             1            0.184
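The two hand calculations can be checked with a short script. This is a minimal Python sketch, not the SAS implementation the slides refer to; the function name `kaplan_meier` and the 1/0 event coding (1 = relapse, 0 = censored) are assumptions made for illustration. A subject censored at time t is counted as still at risk at t, which matches the convention that a tied censoring time is treated as slightly larger than the event time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return one (t_j, n_j, y_j, S(t_j)) row per distinct event time.

    times  -- observed times (event or censoring)
    events -- 1 if the time is an observed event, 0 if censored (assumed coding)
    """
    events_at = Counter(t for t, e in zip(times, events) if e)
    rows = []
    surv = 1.0
    for t in sorted(events_at):
        n_at_risk = sum(1 for u in times if u >= t)   # still under observation at t
        y = events_at[t]                              # events exactly at t
        surv *= (n_at_risk - y) / n_at_risk           # conditional survival factor
        rows.append((t, n_at_risk, y, surv))
    return rows


# AML data from the example slide; censored times (marked "+") have event = 0.
no_maint_times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
no_maint_events = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
maint_times = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
maint_events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

no_maint_rows = kaplan_meier(no_maint_times, no_maint_events)
maint_rows = kaplan_meier(maint_times, maint_events)

for label, rows in (("No maintenance", no_maint_rows), ("Maintenance", maint_rows)):
    print(label)
    for t, n, y, s in rows:
        print(f"  t={t:>3}  at risk={n:>2}  events={y}  S={s:.3f}")
```

Because the script carries full precision from row to row, a value can differ in the third decimal from a hand calculation that multiplies already-rounded values.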

12 Example – AML (cont’d)
The “Kaplan-Meier curve” plots the estimated survival function vs. time, with separate curves for each group.

13 Example – AML (cont’d)
Notes
The number of steps gives the number of distinct event times (and, when no events are tied, the total number of events).
If feasible, picture the censoring times on the graph.

14 Kaplan-Meier Estimate Using SAS

15 Comments on the Kaplan-Meier Estimate
If the event and censoring times are tied, we assume that the censoring time is slightly larger than the death time.
If the largest observation is an event, the Kaplan-Meier estimate is 0.
If the largest observation is censored, the Kaplan-Meier estimate remains constant forever.

16 Comments on the Kaplan-Meier Estimate
If we plot the empirical survival estimates, we observe a step function. If there are no ties and no censoring, the step function drops by 1/n at each event time.
With every censored observation, the size of the subsequent steps increases.
When does the number of intervals equal the number of deaths in the sample?
When does the number of intervals equal n?
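The 1/n step size can be seen from a short worked calculation. This sketch assumes n subjects, no ties and no censoring, so that at the j-th ordered event time t_(j) there are n - j + 1 subjects at risk and exactly one event; the product then telescopes:

```latex
\[
  \hat{S}\bigl(t_{(j)}\bigr)
  \;=\; \prod_{i=1}^{j} \frac{(n - i + 1) - 1}{n - i + 1}
  \;=\; \frac{n-1}{n}\cdot\frac{n-2}{n-1}\cdots\frac{n-j}{n-j+1}
  \;=\; \frac{n - j}{n},
  \qquad
  \hat{S}\bigl(t_{(j-1)}\bigr) - \hat{S}\bigl(t_{(j)}\bigr) \;=\; \frac{1}{n}.
\]
```

A censored observation removes a subject from the risk set without producing a step, so later factors have a smaller denominator and later steps are larger.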

17 Comments on the Kaplan-Meier Estimate
The Kaplan-Meier is a consistent estimate of the true S(t). That means that as the sample size gets large, the KM estimate converges to the true value.
The Kaplan-Meier estimate can be used to empirically estimate any cumulative distribution function.

18 Comments on the Kaplan-Meier Estimate
The step function in K-M curve really looks like this:
If you have a failure at t_1 then you want to say survivorship at t_1 should be less than 1.
For small data sets it matters, but for large data sets it does not matter.

19 Confidence Interval for S(t) – Greenwood’s Formula
Greenwood’s formula gives the variance of the Kaplan-Meier estimate Ŝ(t).
Using Greenwood’s formula, an approximate 95% CI for S(t) can be constructed.
There is a “problem”: the 95% CI is not constrained to lie within the interval (0,1).
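The formulas themselves are not reproduced in this transcript. A sketch of their standard form, assuming n_j at risk and y_j events at event time t_j (the notation used for the Kaplan-Meier estimate earlier), is:

```latex
\[
  \widehat{\mathrm{Var}}\bigl[\hat{S}(t)\bigr]
  \;=\; \hat{S}(t)^{2} \sum_{j:\, t_j \le t} \frac{y_j}{n_j\,(n_j - y_j)},
  \qquad
  \text{approximate 95\% CI:}\quad
  \hat{S}(t) \;\pm\; 1.96\,\sqrt{\widehat{\mathrm{Var}}\bigl[\hat{S}(t)\bigr]}
\]
```

Because the interval is symmetric around the estimate, its limits can fall below 0 or above 1 when the estimate is near either boundary.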

20 Confidence Interval for S(t) – Alternative Formula
Based on log(-log(S(t))), which ranges from -∞ to ∞.
Find the standard error of the above, find the CI of the above, then transform the CI to one for S(t).
This CI will lie within the interval [0,1].
This is the default in SAS.
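The difference between the two intervals can be illustrated numerically. The following is a minimal Python sketch, not SAS output: the function name `km_confidence_intervals` is hypothetical, events are coded 1 and censored times 0 by assumption, and the log-log standard error uses the delta-method approximation sqrt(sum of y_j / (n_j (n_j - y_j))) / |log Ŝ(t)|. The data are the maintained group from the AML example.

```python
import math


def km_confidence_intervals(times, events, z=1.96):
    """Return (t, S, plain_lo, plain_hi, loglog_lo, loglog_hi) per event time.

    plain  -- S +/- z * SE, with SE from Greenwood's formula
    loglog -- interval built on log(-log S) and transformed back to S
    """
    rows = []
    surv = 1.0
    gw_sum = 0.0  # running Greenwood sum of y / (n * (n - y))
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        y = sum(1 for u, e in zip(times, events) if e and u == t)
        if n == y:
            break  # estimate drops to 0; the Greenwood term is undefined here
        surv *= (n - y) / n
        gw_sum += y / (n * (n - y))
        se = surv * math.sqrt(gw_sum)
        plain = (surv - z * se, surv + z * se)
        # SE of log(-log S) by the delta method, then back-transform:
        # exp(-exp(L +/- z*se_ll)) equals S ** exp(+/- z*se_ll).
        se_ll = math.sqrt(gw_sum) / abs(math.log(surv))
        theta = math.exp(z * se_ll)
        loglog = (surv ** theta, surv ** (1.0 / theta))
        rows.append((t, surv, plain[0], plain[1], loglog[0], loglog[1]))
    return rows


# Maintained group from the AML example; censored times have event = 0.
maint_times = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
maint_events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

ci_rows = km_confidence_intervals(maint_times, maint_events)
for t, s, lo, hi, ll_lo, ll_hi in ci_rows:
    print(f"t={t:>3}  S={s:.3f}  plain=({lo:.3f}, {hi:.3f})  log-log=({ll_lo:.3f}, {ll_hi:.3f})")
```

In this example the plain interval extends above 1 at the first event time and below 0 at the last one, while the log-log interval stays inside (0, 1) at every event time.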

21 Log-rank test for comparing survivor curves
Are two survivor curves the same?
Use the times of events: t_1, t_2, ... (do not include censoring times)
Treat each event and its “set of persons still at risk” (i.e., risk set) at each time t_j as an independent table
Make a 2×2 table at each t_j
          Event   No Event      Total
Group A   a_j     n_jA - a_j    n_jA
Group B   c_j     n_jB - c_j    n_jB
Total     d_j     n_j - d_j     n_j

22 Log-rank test for comparing survivor curves
At each event time t_j, under the assumption of equal survival (i.e., S_A(t) = S_B(t)), the expected number of events in Group A out of the total events (d_j = a_j + c_j) is in proportion to the number at risk in Group A relative to the total at risk at time t_j:
E(a_j) = d_j x n_jA / n_j
Differences between a_j and E(a_j) represent evidence against the null hypothesis of equal survival in the two groups.

23 Log-rank test for comparing survivor curves
Use the Cochran-Mantel-Haenszel idea of pooling over events j to get the log-rank chi-squared statistic with one degree of freedom.
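The statistic itself is not reproduced in this transcript. A sketch of its usual Mantel-Haenszel form, assuming the 2×2 table notation above and the hypergeometric variance of a_j with the table margins held fixed, is:

```latex
\[
  E(a_j) = \frac{d_j\, n_{jA}}{n_j},
  \qquad
  \mathrm{Var}(a_j) = \frac{d_j\, n_{jA}\, n_{jB}\,(n_j - d_j)}{n_j^{2}\,(n_j - 1)},
  \qquad
  \chi^{2} = \frac{\Bigl[\sum_j \bigl(a_j - E(a_j)\bigr)\Bigr]^{2}}{\sum_j \mathrm{Var}(a_j)}
\]
```

Under the null hypothesis this statistic is compared to a chi-squared distribution with one degree of freedom.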

24 Log-rank test for comparing survivor curves
Idea summary:
  Create a 2x2 table at each uncensored failure time
  The construct of each 2x2 table is based on the corresponding risk set
  Combine information from all the tables
The null hypothesis is S_A(t) = S_B(t) for all time t.
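The steps in this summary can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the SAS procedure: the function name `logrank_test` is hypothetical, events are coded 1 and censored times 0, the variance of each table is the hypergeometric variance with fixed margins, and the p-value uses the identity P(chi-squared with 1 df > x) = erfc(sqrt(x / 2)). The data are the two AML groups from the example slide.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (observed_a, expected_a, chi2, p_value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed = expected = variance = 0.0
    for t in event_times:
        # Risk sets: everyone whose observed time is at least t.
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        # Events exactly at t in each group (a_j and c_j in the 2x2 table).
        a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        c = sum(1 for u, e in zip(times_b, events_b) if e and u == t)
        n, d = n_a + n_b, a + c
        observed += a
        expected += d * n_a / n
        if n > 1:  # a risk set of size 1 contributes no variance
            variance += d * n_a * n_b * (n - d) / (n * n * (n - 1))

    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-squared, 1 df
    return observed, expected, chi2, p_value


# AML data from the example slide; censored times (marked "+") have event = 0.
maint_times = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
maint_events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
no_maint_times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
no_maint_events = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

obs_a, exp_a, chi2, p_value = logrank_test(
    maint_times, maint_events, no_maint_times, no_maint_events
)
print(f"observed={obs_a:.0f}  expected={exp_a:.2f}  chi2={chi2:.2f}  p={p_value:.3f}")
```

Swapping the two groups changes the sign of the observed-minus-expected difference but leaves the chi-squared statistic unchanged, since the test is symmetric in the group labels.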

25 Comparisons across Groups
Extensions of the log-rank test to several groups require knowledge of matrix algebra. In general, these tests are well approximated by a chi-squared distribution with G-1 degrees of freedom.
Alternative tests:
  Wilcoxon family of tests (including Peto test)
  Likelihood ratio test (SAS)

26 Comparison between Log-Rank and Wilcoxon Tests
The log-rank test weights each failure time equally. No parametric model is assumed for failure times within a stratum.
The Wilcoxon test weights each failure time by a function of the number at risk. Thus, more weight tends to be given to early failure times. As in the log-rank test, no parametric model is assumed for failure times within a stratum.
Between these two tests (Wilcoxon and log-rank tests), the Wilcoxon test will tend to be better at picking up early departures from the null hypothesis and the log-rank test will tend to be more sensitive to departures in the tail.
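One way to see the relationship between the two tests is to write both as a single weighted statistic. The sketch below assumes the 2×2 table notation used for the log-rank test (a_j events in Group A, d_j events in total, n_jA, n_jB and n_j at risk at event time t_j) and introduces a weight w_j that is not part of the original slides:

```latex
\[
  \chi^{2}_{w}
  = \frac{\Bigl[\sum_j w_j \bigl(a_j - E(a_j)\bigr)\Bigr]^{2}}
         {\sum_j w_j^{2}\, \mathrm{Var}(a_j)},
  \qquad
  E(a_j) = \frac{d_j\, n_{jA}}{n_j},
  \quad
  \mathrm{Var}(a_j) = \frac{d_j\, n_{jA}\, n_{jB}\,(n_j - d_j)}{n_j^{2}\,(n_j - 1)}
\]
\[
  w_j = 1 \ \text{(log-rank)},
  \qquad
  w_j = n_j \ \text{(Gehan's generalized Wilcoxon)}
\]
```

Since n_j shrinks as subjects fail or are censored, the weight w_j = n_j is largest at early event times, which is the source of the Wilcoxon test's sensitivity to early differences.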

27 Comparison with Likelihood Ratio Test in SAS
The likelihood ratio test employed in SAS assumes the data within the various strata are exponentially distributed and censoring is non-informative. Thus, this is a parametric method that smooths across the entire curve.

