Kaplan-Meier and Nelson-Aalen Estimators


1 Kaplan-Meier and Nelson-Aalen Estimators
Donald Pierce
For continuous time, the failure rate or hazard function $\lambda(t) = f(t)/S(t)$ is just what one would think of as a “death rate”, where $f(t)$ is the density and $S(t)$ is the survival function. Hence we have the important relation $S(t) = \exp\{-\Lambda(t)\}$, where $\Lambda(t) = \int_0^t \lambda(u)\,du$ is the cumulative hazard.
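As a minimal numerical sketch of the relation between the hazard and the survival function, the following integrates a hazard numerically and compares the exponential of minus the cumulative hazard with the closed-form survival function. The Weibull hazard, its parameter values, and all function names are illustrative assumptions, not taken from the slides.

```python
import math

def hazard(t, shape=1.5, scale=2.0):
    """Weibull hazard (illustrative choice): (shape/scale) * (t/scale)**(shape-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def survival_exact(t, shape=1.5, scale=2.0):
    """Closed-form Weibull survival function exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))

def cumulative_hazard(t, steps=20000):
    """Midpoint-rule approximation of the integral of hazard(u) over [0, t]."""
    h = t / steps
    return sum(hazard((i + 0.5) * h) for i in range(steps)) * h

t = 3.0
s_from_hazard = math.exp(-cumulative_hazard(t))
# The two values should agree up to numerical integration error.
print(s_from_hazard, survival_exact(t))
```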

2 We are concerned today with nonparametric estimation of the survival function and/or the hazard function, based on a random sample with censoring. (Without censoring, one would estimate the survival function at time $t$ simply by the proportion of the sample failing after time $t$.) The response time for an individual is censored if that individual is removed from follow-up at a known time, so that the failure time is known only to be at least as large as the censoring time. Censoring often happens at the end of the follow-up period, but it also often happens at times intermixed with the failure times. This can be due to dropouts in a medical study but, more interestingly, it also arises for the following reason.

3 Censoring (right) and left-truncation (delayed entry)
Consider a study of people who enter at different ages.
[Figure: follow-up of individual subjects plotted against age, with events marked by x. Labels: study time, end of study, age.]

4 Much reasoning for estimation can be based on considering a partition of the time scale into fixed intervals, and on the fact that we can estimate the failure probability for an interval conditionally on the number at risk at the beginning of that interval. We can then multiply the corresponding conditional survival probabilities to obtain an unconditional estimate of survival to a given time, and then pass to the limit as the intervals become arbitrarily narrow. Except for the limiting argument, this has been done in actuarial work for hundreds of years to obtain estimates of survival probabilities. In the 1970s and 1980s there was enormous development in terms of martingale theory, with more formal emphasis on hazard functions as opposed to survival functions. Hazard functions are so inherently conditional that things become much simpler. More practically, with delayed entry as above, one can estimate hazard functions while it is hopeless to estimate survival probabilities, due to the variously conditional nature of the data.
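The multiplication of conditional probabilities over fixed intervals can be sketched as follows. The counts are hypothetical, and the function name is an illustrative assumption. Real actuarial tables often also adjust the number at risk for censoring within an interval, which this sketch does not do.

```python
def life_table_survival(at_risk, failures):
    """Multiply conditional survival probabilities over successive intervals.

    at_risk[k]  : number at risk at the start of interval k
    failures[k] : number failing during interval k
    Returns the estimated probability of surviving past the end of each interval.
    """
    surv = []
    s = 1.0
    for n, d in zip(at_risk, failures):
        s *= 1.0 - d / n  # conditional probability of surviving interval k
        surv.append(s)
    return surv

# Hypothetical counts; the at-risk numbers also drop because of censoring.
at_risk = [100, 80, 50]
failures = [10, 8, 5]
survival = life_table_survival(at_risk, failures)
print(survival)
```

Each interval here has a conditional failure probability of 0.1, so the estimated survival falls by a factor of 0.9 per interval.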

5 A main thing on my mind is the important distinction between estimating survival functions and estimating hazard functions. As I just said, the latter is far more straightforward, and also more generally applicable. (However, integrated hazards present the same difficulties as survival functions under delayed entry, since the lower limit of the integral is then the entry time.) The main theme of my Padua short course is that estimation of rates can be far more useful than trying to put things in terms of distributions of random variables. Not only are there practical advantages in estimating rates, but this also fits in better with the martingale approach (which is part of the gain in that approach). The following martingale results are given here largely to show that it is the hazard function that enters into things, and nothing about the survival function or conditioning on delayed entry times.

6 Elements of counting processes, for the case where each subject can respond only once. Consider first only the individual $i$:
$N_i(s)$: 0-1 indicator of failure in $[0, s]$, right continuous
$Y_i(s)$: 0-1 indicator of being at risk at $(s-)$, left continuous
$T_i$: failure time
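A minimal sketch of the two indicators for a single subject may help fix the right-continuous and left-continuous conventions. The function names `N` and `Y`, their arguments, and the example subject are illustrative assumptions.

```python
def N(s, time, failed):
    """Right-continuous 0-1 indicator of an observed failure in [0, s]."""
    return 1 if (failed and time <= s) else 0

def Y(s, time):
    """Left-continuous 0-1 indicator of being at risk just before s."""
    return 1 if time >= s else 0

# Hypothetical subject: followed until time 5.0, when a failure is observed.
time, failed = 5.0, True
grid = (4.9, 5.0, 5.1)
n_path = [N(s, time, failed) for s in grid]  # jumps to 1 at the failure time itself
y_path = [Y(s, time) for s in grid]          # still 1 at the failure time, 0 afterwards
print(n_path, y_path)
```

At the failure time both indicators equal 1: the subject is still at risk just before the failure, and the failure is counted from that instant on.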

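7 For individual $i$, with failure indicator $N_i$, at-risk indicator $Y_i$, and hazard $\lambda$, the counting process martingale is then
$M_i(t) = N_i(t) - \int_0^t Y_i(s)\,\lambda(s)\,ds$,
where the second term is called the compensator of $N_i$. The martingale defining property can be stated as $E[dM_i(t) \mid \mathcal{F}_{t-}] = 0$, where $\mathcal{F}_{t-}$ denotes the history just before time $t$. The two most important derived properties are that (a) $E[M_i(t)] = 0$, and (b) increments of $M_i$ over disjoint intervals are uncorrelated.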
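The zero-mean property can be checked by simulation. This is only a sketch under stated assumptions: a constant hazard, independent uniform censoring, and hypothetical parameter values and names. It computes the counting process minus its compensator for many independent subjects and averages the result.

```python
import math
import random

def martingale_value(t, failure_time, censor_time, rate):
    """Counting process minus compensator at time t, for one subject with constant hazard."""
    observed = min(failure_time, censor_time)
    # Observed failure in [0, t]: the failure must occur before censoring and by time t.
    n_t = 1 if (failure_time <= censor_time and failure_time <= t) else 0
    # Compensator: hazard times the time spent at risk up to t.
    compensator = rate * min(observed, t)
    return n_t - compensator

random.seed(0)
rate, t, n_subjects = 0.5, 2.0, 20000
values = []
for _ in range(n_subjects):
    failure_time = random.expovariate(rate)
    censor_time = random.uniform(0.0, 4.0)  # independent censoring (assumption)
    values.append(martingale_value(t, failure_time, censor_time, rate))

mean_m = sum(values) / n_subjects
print(mean_m)  # should be close to 0, up to simulation error
```

Note that only the hazard enters the compensator. Nothing about the survival function is needed, which is the point made in the slides.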

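8 Kaplan-Meier estimator: This is motivated by considering some fixed time intervals with endpoints $0 = a_0 < a_1 < \cdots < a_k$ and a factoring as
$S(a_k) = \prod_{j=1}^{k} P(T > a_j \mid T > a_{j-1})$.
This line of thought leads to the estimator
$\hat S(a_k) = \prod_{j=1}^{k} (1 - d_j/n_j)$,
with $d_j$ the number of failures in interval $j$ and $n_j$ the number at risk at its start. As the intervals become arbitrarily narrow, this becomes
$\hat S(t) = \prod_{j:\, t_j \le t} (1 - d_j/n_j)$,
where $t_1 < t_2 < \cdots$ are the distinct observed failure times. Understand that $d_j$ is now the number of failures at time $t_j$, and $n_j$ is the number at risk just before that time.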
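The limiting product form can be sketched directly in code. This is a minimal illustration on hypothetical data, not a replacement for a tested library routine. The names `kaplan_meier`, `d_j`, and `n_j` are assumptions chosen for the sketch, and a subject censored at a failure time is counted as at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(t_j, estimated survival at t_j)] at the distinct observed failure times.

    times  : observed follow-up time for each subject
    events : 1 if the subject failed at that time, 0 if censored
    """
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, s = [], 1.0
    for tj in failure_times:
        # Number at risk just before t_j (includes subjects censored exactly at t_j).
        n_j = sum(1 for t in times if t >= tj)
        # Number of failures at t_j.
        d_j = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        s *= 1.0 - d_j / n_j
        surv.append((tj, s))
    return surv

# Hypothetical data: six subjects, two of them censored (event = 0).
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(times, events)
print(km)
```

The estimate steps down only at failure times. Censored subjects contribute by staying in the risk set until they leave follow-up.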

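9 Nelson–Aalen estimator: Thinking again of time intervals, for each interval the estimate of the failure rate is the number of failures in the interval divided by the number at risk at the start of the interval. As the intervals become arbitrarily narrow, we have that
$\hat\lambda(t)\,dt = dN(t)/Y(t)$,
where $N(t)$ counts the failures in $[0, t]$ and $Y(t)$ is the number at risk just before $t$. This is nonzero only at failure times, and at those times it is (with no ties) 1 over the number at risk just before the failure. The Nelson-Aalen estimator of the cumulative or integrated hazard sums these estimates over the failures up to time $t$:
$\hat\Lambda(t) = \sum_{j:\, t_j \le t} d_j/n_j$,
with $d_j$ the number of failures at the failure time $t_j$ and $n_j$ the number at risk just before it.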
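The sum over failure times can be sketched in the same way. Again this is a minimal illustration on hypothetical data, with the names `nelson_aalen`, `d_j`, and `n_j` chosen for the sketch.

```python
def nelson_aalen(times, events):
    """Return [(t_j, estimated cumulative hazard at t_j)] at the distinct failure times.

    times  : observed follow-up time for each subject
    events : 1 if the subject failed at that time, 0 if censored
    """
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    out, cum = [], 0.0
    for tj in failure_times:
        n_j = sum(1 for t in times if t >= tj)  # at risk just before t_j
        d_j = sum(1 for t, e in zip(times, events) if t == tj and e == 1)  # failures at t_j
        cum += d_j / n_j  # jump of the cumulative hazard estimate at t_j
        out.append((tj, cum))
    return out

# Hypothetical data: six subjects, two of them censored (event = 0).
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
na = nelson_aalen(times, events)
print(na)
```

The jumps here are 1/6, 1/5, 1/3, and 1/1, so the final jump is large because only one subject remains at risk.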

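10 These both pertain to discrete distributions, requiring the following alterations to the hazard-survival relation. For discrete distributions with ordered support points $t_1 < t_2 < \cdots$, the hazard is defined as
$\lambda_k = P(T = t_k \mid T \ge t_k)$.
Then at the $k$th ordered value of the random variable,
$S(t_k) = \prod_{j \le k} (1 - \lambda_j)$,
and in connection with the previous relation, the cumulative hazard is $\Lambda(t_k) = \sum_{j \le k} \lambda_j$, so that this product of the factors $1 - \lambda_j$ takes the place of the exponential of $-\Lambda(t_k)$.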

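11 Many think of the K-M and N-A estimators as alternatives, but in fact they provide exactly the same discrete distribution and thus are just different representations of the same thing. The correct relationship between them is the one for discrete distributions. It is true, though, that when thinking of these as estimators for continuous distributions, they do provide different estimates. Writing $\hat S$ for the K-M estimator and $\hat\Lambda$ for the N-A estimator, one then expects to have $\hat S(t) = \exp\{-\hat\Lambda(t)\}$, but in fact $\hat S(t) = \prod_{j:\, t_j \le t} \{1 - \Delta\hat\Lambda(t_j)\} \ne \exp\{-\hat\Lambda(t)\}$, where $\Delta\hat\Lambda(t_j)$ is the jump of the N-A estimator at the failure time $t_j$. They are very nearly equal, though, for time ranges where the jumps of the N-A estimator are small.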
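The closeness of the two representations when the jumps are small, and their divergence when the jumps are large, can be seen numerically. This sketch uses hypothetical uncensored data (50 subjects failing at times 1 to 50), and the function name `km_and_na` is an assumption. It computes the K-M survival estimate and the exponential of minus the N-A cumulative hazard side by side.

```python
import math

def km_and_na(times, events):
    """Return [(t_j, K-M survival, exp(-N-A cumulative hazard))] at the failure times."""
    rows, s, cum = [], 1.0, 0.0
    for tj in sorted({t for t, e in zip(times, events) if e == 1}):
        n_j = sum(1 for t in times if t >= tj)  # at risk just before t_j
        d_j = sum(1 for t, e in zip(times, events) if t == tj and e == 1)  # failures at t_j
        jump = d_j / n_j      # jump of the N-A estimator at t_j
        s *= 1.0 - jump       # K-M: product of (1 - jump)
        cum += jump           # N-A: sum of jumps
        rows.append((tj, s, math.exp(-cum)))
    return rows

# Hypothetical data: 50 subjects failing at times 1, 2, ..., 50, no censoring.
times = list(range(1, 51))
events = [1] * 50
rows = km_and_na(times, events)
print(rows[0])   # early: jump is 1/50, the two values nearly agree
print(rows[-1])  # late: jump is 1, K-M is exactly 0 but exp(-N-A) is not
```

Since $1 - x \le e^{-x}$ for every jump $x$, the exponential form is never below the product form, and the gap grows as the risk set shrinks.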

12 Some useful (highly selected) references (for my course):
Kalbfleisch and Prentice, The Statistical Analysis of Failure Time Data, 2nd ed., Wiley
Therneau and Grambsch, Modeling Survival Data: Extending the Cox Model, Springer
Andersen, Borgan, Gill and Keiding, Statistical Models Based on Counting Processes, Springer-Verlag
Breslow and Day, Statistical Methods in Cancer Research, Vol. II: The Design and Analysis of Cohort Studies, International Agency for Research on Cancer, Lyon
Fleming and Harrington, Counting Processes and Survival Data, Wiley
Hosmer and Lemeshow, Applied Survival Analysis: Regression Modeling of Time-to-Event Data, Wiley

