Censoring: an observation of a survival r.v. is censored if we don’t know the survival time exactly. Usually there are 3 possible reasons for censoring.


1 Censoring
An observation of a survival r.v. is censored if we don’t know the survival time exactly. Usually there are 3 possible reasons for censoring:
–the study ends before the event occurs
–the subject is lost to follow-up during the study (e.g., they could have moved out of town)
–the subject withdraws from the study because of death (assuming death is not the event of interest) or for some other reason.
These are all “right-censored”, since the event occurs to the right of (later than) the time we last observe.
“Left-censoring” is also possible - examples?
They are Type I censored, since the actual number of censored observations is not known in advance.

2 See Fig. 5.1 on p. 74 - “R” corresponds to an uncensored observation.
We represent censored data as ordered pairs:
–(Def. 5.2): $Y_1, Y_2, \ldots, Y_n$ are right-censored by $t_1, t_2, \ldots, t_n$ if the sample consists of the pairs $(Z_i, \delta_i)$, where $Z_i = \min(Y_i, t_i)$ and $\delta_i$ is the censoring indicator recording whether $Y_i$ was observed ($Y_i \le t_i$) or censored ($Y_i > t_i$).
–Note that $t_i$ is the value of $Z_i$ when the observation is censored, and $Y_i$ is observed when it is uncensored.
–Assume the $Y$’s are independent of the $t$’s.
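To make Def. 5.2 concrete, here is a minimal Python sketch that builds the pairs $(Z_i, \delta_i)$ from lifetimes and censoring times. The function name `right_censor`, the sample values, and the coding of $\delta_i$ (1 = uncensored, 0 = censored) are assumptions for illustration; check the textbook's definition for the coding it uses.

```python
def right_censor(lifetimes, censor_times):
    """Return (Z_i, delta_i) pairs for lifetimes Y_i right-censored by t_i."""
    pairs = []
    for y, t in zip(lifetimes, censor_times):
        z = min(y, t)                  # Z_i = min(Y_i, t_i)
        delta = 1 if y <= t else 0     # assumed coding: 1 = uncensored, 0 = censored
        pairs.append((z, delta))
    return pairs


# Hypothetical lifetimes and censoring times (not from the textbook).
Y = [5.0, 12.0, 3.5, 8.0]
t = [10.0, 9.0, 6.0, 8.0]

sample = right_censor(Y, t)
print(sample)
```

In practice we never see both $Y_i$ and $t_i$ for the same subject; the sketch only shows how the observed pair arises from them.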

3 Example 5.2 (Stanford Heart Transplant Data) - note the form of the dataset: Days = Z, Cens = delta, and the other explanatory variables are Age, T5, etc.
Note in Example 5.3 the notation of using a “+” sign to represent a right-censored observation; the survival variable there is the survival time until death resulting from astrocytoma tumors.
Go back to Exercise 4.5 on page 68. Note the censoring in the treatment group (marked with “+”) but not in the placebo group - what is the meaning of these censored observations?
Now let’s write the data in Exercise 4.5 in a form that can be analyzed with various computer programs…

4 1. How many variables of interest are there here? (We need a column for each variable…) How many observations are there? [I’ll propose id, remission time, censor indicator, group.]
2. Use Excel to organize the data and then we’ll read it into R (or SAS later) for analysis.
3. Use read.csv(file=file.choose()) to get the data into R…
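As an alternative to typing the columns by hand in Excel, the proposed layout (id, remission time, censor indicator, group) can be sketched in Python. The helper names `parse_time` and `to_rows`, the column names, and the coding of the `event` column (1 = event observed, 0 = right-censored, i.e. written with a “+”) are assumptions, and the values are made up; they are not the Exercise 4.5 data.

```python
import csv
import io


def parse_time(token):
    """Split a time such as '6+' into (time, event); '+' marks right-censoring."""
    token = token.strip()
    censored = token.endswith("+")
    time = float(token.rstrip("+"))
    return time, 0 if censored else 1   # assumed coding: 1 = event observed


def to_rows(groups):
    """Turn {group: [tokens]} into one row (dict) per observation."""
    rows = []
    next_id = 1
    for group, tokens in groups.items():
        for token in tokens:
            time, event = parse_time(token)
            rows.append({"id": next_id, "time": time, "event": event, "group": group})
            next_id += 1
    return rows


# Made-up remission times for illustration only.
groups = {"treatment": ["6", "6+", "10+", "13"], "placebo": ["1", "4", "8"]}
rows = to_rows(groups)

# Write the rows as CSV text; saving this text to a file gives something
# read.csv() can load.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "time", "event", "group"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

One row per subject and one column per variable is the shape most survival routines expect.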

5 Section 5.4: Lifetable estimates
–Divide the lifetime axis into fixed disjoint intervals
–Estimate the conditional probability of survival across each interval
–Estimate S (the survival function) at the endpoints of the intervals
The time intervals are written $I_j = (a_{j-1}, a_j]$, $j = 1, 2, \ldots$, with $a_0 = 0$; the choice of the endpoints is up to the data analyst.
In a lifetable, the number at risk in any interval is the number alive and under consideration (not censored) at the start of the interval.
For any interval $I_j$ we write: $N_j$ = number at risk in $I_j$; $D_j$ = number of deaths (or observed failures) in $I_j$; $W_j$ = number of observations censored in $I_j$.
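The bookkeeping for $N_j$, $D_j$ and $W_j$ can be sketched as follows. The function name `lifetable_counts`, the data, and the convention that interval $j$ is $(a_{j-1}, a_j]$ with event coded 1 = death and 0 = censored are assumptions made for this illustration.

```python
def lifetable_counts(data, endpoints):
    """Count at-risk (N), deaths (D) and censored (W) per interval.

    data:      list of (time, event) pairs, event 1 = death, 0 = censored (assumed coding)
    endpoints: increasing a_1 < a_2 < ... ; a_0 = 0 is implied and interval j
               is taken to be (a_{j-1}, a_j] (an assumed convention)
    """
    k = len(endpoints)
    deaths = [0] * k
    censored = [0] * k
    for time, event in data:
        for j, a in enumerate(endpoints):
            if time <= a:              # first endpoint at or beyond the time
                if event:
                    deaths[j] += 1
                else:
                    censored[j] += 1
                break                  # times beyond the last endpoint are not tallied

    at_risk = []
    remaining = len(data)              # N_1 = n
    for j in range(k):
        at_risk.append(remaining)
        remaining -= deaths[j] + censored[j]   # N_{j+1} = N_j - D_j - W_j
    return at_risk, deaths, censored


# Hypothetical (time in years, event) observations.
data = [(0.5, 1), (1.5, 0), (1.8, 1), (2.2, 1), (2.9, 0), (3.5, 1), (4.0, 0), (4.6, 1)]
N, D, W = lifetable_counts(data, endpoints=[1, 2, 3, 4, 5])
print(N, D, W)
```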

6 Note: $N_1 = n$, since the total sample is initially at risk, and $N_j = N_{j-1} - D_{j-1} - W_{j-1}$; this shows the propagation of those at risk from interval $j-1$ to interval $j$.
Write: $p_j = P(\text{surviving through } I_j \mid \text{alive at start of } I_j) = P(Y > a_j \mid Y > a_{j-1}) = S(a_j)/S(a_{j-1})$.
Note that $p_1 = S(a_1)$, since $S(a_0) = S(0) = 1$.
Then $p_2 = S(a_2)/S(a_1) = S(a_2)/p_1$, so $S(a_2) = p_2 p_1$.
Continuing, $p_3 = S(a_3)/S(a_2) = S(a_3)/(p_2 p_1)$, so $S(a_3) = p_3 p_2 p_1$, and so forth until we get Theorem 5.1 (p. 82), which states that for every $j$, $S(a_j) = p_j \cdots p_3 p_2 p_1$, where $p_j$ is the conditional probability of surviving across $I_j$ given alive at the start of $I_j$.
Use this theorem to estimate the survival function at the endpoints of the intervals in the lifetable.
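A quick numerical check of Theorem 5.1: starting from a known survival function, compute the conditional probabilities $p_j$ and confirm that their running product recovers $S(a_j)$. The exponential survival function, its rate 0.3, the endpoints, and the function names are all hypothetical choices for this sketch.

```python
import math


def conditional_probs(S, endpoints):
    """p_j = S(a_j) / S(a_{j-1}), with S(a_0) = S(0) = 1."""
    probs = []
    previous = 1.0
    for a in endpoints:
        current = S(a)
        probs.append(current / previous)
        previous = current
    return probs


def survival_from_probs(probs):
    """S(a_j) = p_j * ... * p_2 * p_1 (Theorem 5.1), as a running product."""
    values = []
    running = 1.0
    for p in probs:
        running *= p
        values.append(running)
    return values


def S(t):
    """Hypothetical exponential survival function with rate 0.3."""
    return math.exp(-0.3 * t)


endpoints = [1.0, 2.0, 3.5, 5.0]
p = conditional_probs(S, endpoints)
recovered = survival_from_probs(p)

for a, value in zip(endpoints, recovered):
    print(a, value, S(a))   # the last two columns agree up to rounding
```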

7 In order to get the S’s we need to estimate the p’s…
The usual estimate of a proportion works here (5.3). Note that when estimating $1 - p_j$ we’re estimating the conditional probability of dying in the interval, given they were alive at the start of the interval…
So: $1 - \hat{p}_j = D_j / N_j$.
We define the effective number at risk as $N_j' = N_j - W_j/2$, which essentially assumes the censoring occurs uniformly across the interval.
So we apply this to our estimator above and get the actuarial estimate $\hat{p}_j = 1 - D_j / N_j'$.

8 If for a given $j$, $N_j' = 0$, then take the estimate to be 0.
So to estimate $S(a_j)$, use $\hat{S}(a_j) = \hat{p}_j \cdots \hat{p}_2 \hat{p}_1 = \prod_{i=1}^{j} \hat{p}_i$.
Two basic assumptions for the construction of lifetables are:
–censor times are independent of lifetimes…this assures that $p_j$ is the same for each individual
–failure times and censor times in a given interval are uniformly distributed across the interval
Think of a lifetable as a generalization of a frequency histogram that accounts for right censoring.
See Example 5.6 on pages 83-84 on melanoma survival (defined as the time from first treatment for melanoma to death, in years). Let’s go over this data carefully to understand the computations…try in Excel…and later in SAS!
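Before trying the computations in Excel, it may help to see them as a short Python sketch. It assumes the usual actuarial definitions, effective number at risk $N_j' = N_j - W_j/2$ and $\hat{p}_j = 1 - D_j/N_j'$, with $\hat{S}(a_j)$ as the running product of the $\hat{p}$'s. The function name and the counts are hypothetical (not the melanoma data of Example 5.6), and the $N_j' = 0$ special case is not handled here.

```python
def actuarial_lifetable(n, deaths, censored):
    """Return one dict per interval with N, N', p-hat and S-hat.

    n:        total sample size (N_1 = n)
    deaths:   D_j per interval
    censored: W_j per interval
    """
    rows = []
    at_risk = n
    survival = 1.0
    for d, w in zip(deaths, censored):
        effective = at_risk - w / 2          # N_j' = N_j - W_j / 2
        if effective <= 0:
            # The N_j' = 0 special case is left out of this sketch.
            raise ValueError("effective number at risk must be positive")
        p_hat = 1 - d / effective            # actuarial estimate of p_j
        survival *= p_hat                    # S-hat(a_j) = product of p-hats so far
        rows.append({"N": at_risk, "N_eff": effective, "p_hat": p_hat, "S_hat": survival})
        at_risk -= d + w                     # N_{j+1} = N_j - D_j - W_j
    return rows


# Hypothetical counts: 20 subjects followed over three intervals.
table = actuarial_lifetable(n=20, deaths=[2, 3, 1], censored=[2, 2, 4])
for row in table:
    print(row)
```

Each row corresponds to one line of the lifetable, so the output can be compared column by column with a spreadsheet version.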

9 Greenwood’s formula gives an error bound around the lifetable estimates. I won’t go through the derivation, but if you’re interested, see pages 85-86.
Theorem 5.3 (Greenwood’s Formula). The standard error of the lifetable estimate is given by $\mathrm{se}(\hat{S}(a_j)) \approx \hat{S}(a_j) \sqrt{\sum_{i=1}^{j} \frac{1 - \hat{p}_i}{N_i' \, \hat{p}_i}}$.
This formula is usable as long as the effective number at risk is not too small in the intervals. See Example 5.7 on page 87 for a use of this formula.
Go over Example 5.8 - use SAS.
HW (for midterm): #5.2, 5.5, 5.7
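Greenwood's formula is easy to evaluate once the lifetable columns are available. The sketch below assumes the standard form of the formula written with the effective number at risk, using $D_i / (N_i'(N_i' - D_i))$, which equals $(1 - \hat{p}_i)/(N_i' \hat{p}_i)$ when $\hat{p}_i = 1 - D_i/N_i'$. The function name and the counts are hypothetical, and every interval is assumed to have $D_i < N_i'$.

```python
import math


def greenwood_se(effective_at_risk, deaths):
    """Standard error of S-hat(a_j) for each interval via Greenwood's formula.

    effective_at_risk: N_j' per interval (assumed positive and larger than D_j)
    deaths:            D_j per interval
    """
    errors = []
    survival = 1.0
    running_sum = 0.0
    for n_eff, d in zip(effective_at_risk, deaths):
        survival *= 1 - d / n_eff                    # S-hat(a_j)
        running_sum += d / (n_eff * (n_eff - d))     # equals (1 - p-hat) / (N' * p-hat)
        errors.append(survival * math.sqrt(running_sum))
    return errors


# Hypothetical effective numbers at risk and death counts.
se = greenwood_se(effective_at_risk=[19.0, 15.0, 9.0], deaths=[2, 3, 1])
print(se)
```

For the first interval the result reduces to the familiar binomial standard error $\sqrt{\hat{p}_1(1 - \hat{p}_1)/N_1'}$, which is a handy sanity check.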

