# Surviving Survival Analysis

Today’s Talk: Surviving Survival Analysis By Kelley Mizukami By Dr. Olga Korosteleva

OUTLINE
- What is Survival Analysis?
- Censored Data
- Kaplan-Meier Estimator
- Log-Rank Test
- Cox Regression Model

WHAT IS SURVIVAL ANALYSIS?
Survival analysis is the branch of statistics that focuses on time-to-event data and their analysis. Survival data deal with the time until the occurrence of any well-defined event. The outcome variable examined is the survival time (the time until the occurrence of the event). It is distinctive because it can incorporate information from censored observations into the analysis.

OBJECTIVES OF SURVIVAL ANALYSIS
- Estimate the probability that an individual in a group survives beyond a given time. Ex) The probability of going longer than two months before a second heart attack for a group of MI patients.
- Compare time-to-event between two or more groups. Ex) Treatment vs. placebo patients in a randomized controlled trial.
- Assess the relationship of covariates to time-to-event. Ex) Do weight, BP, sugar, and height influence the survival time for a group of patients?

SITUATIONS WHEN WE CAN USE SURVIVAL ANALYSIS
We can use survival analysis when we wish to analyze survival times, or "time-to-event" times. Examples of time-to-event outcomes include:
- Time to death
- Time until response to a treatment
- Time until relapse of a disease
- Time until cancellation of service
- Time until resumption of smoking by someone who had quit
- Time until a certain percentage of weight loss

MORE EXAMPLES
Suppose you wish to analyze the time it takes for a student to complete a series of classes.
- Response/Status Variables: Time it takes to complete, status
- Predictor Variables: Age, Gender, Race, GPA

Suppose you wish to analyze the time between admittance to the hospital and death for a lung cancer patient.
- Response/Status Variables: Length of follow-up, status
- Predictor Variables: Age, Gender, Race, White Blood Counts, Tumor Type, Treatment Type, Cancerous Mass Size

MORE EXAMPLES
Suppose you are interested in comparing the time until you lose 10% body weight on one of two exercise programs.
- Response/Status Variables: Time it takes, status
- Predictor Variables: Age, Gender, Starting Weight, BP, BMI, Exercise Program

Suppose you are interested in the time it takes before one sees results for a certain treatment.
- Predictor Variables: Age, Gender, Type of Treatment, Weight, Height, exercise (Y/N), healthy eating (Y/N)

MORE EXAMPLES
Suppose you wish to compare the time it takes before you cancel your cable TV service when you use two different cable providers.
- Response/Status Variables: Time it takes, status
- Predictor Variables: Age, Gender, Race, Cable Provider, Average Income, Average number of complaints per month

DATA
Survival data can be one of two types:
- Complete data: the value of each sample unit is observed or known.
- Censored data: the time to the event of interest may not be observed, or the exact time is not known.

We distinguish censored data from complete data by adding a "+" to any value that is censored (e.g. 4+).
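The "+" notation maps naturally onto the (time, status) pairs that survival software expects. The following is a minimal sketch, assuming the times arrive as strings and that status is coded 1 for an observed event and 0 for a censored time; the function name `parse_survival_times` is hypothetical.

```python
from typing import List, Tuple


def parse_survival_times(values: List[str]) -> List[Tuple[float, int]]:
    """Turn strings such as "4" or "4+" into (time, status) pairs.

    status is 1 when the event was observed and 0 when the time is
    censored (this 1/0 coding is an assumption of the sketch).
    """
    pairs = []
    for raw in values:
        text = raw.strip()
        censored = text.endswith("+")      # a trailing "+" marks censoring
        time = float(text.rstrip("+"))     # the numeric part is the follow-up time
        pairs.append((time, 0 if censored else 1))
    return pairs


print(parse_survival_times(["2", "4+", "7"]))
# [(2.0, 1), (4.0, 0), (7.0, 1)]
```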

CENSORED DATA
Censored data can occur when:
- The event of interest is death, but the patient is still alive at the time of analysis.
- The individual was lost to follow-up without having the event of interest.
- The event of interest is death by cancer, but the patient died of an unrelated cause, such as a car accident.
- The patient is dropped from the study, without having experienced the event of interest, due to a protocol violation.

Even if an observation is censored, we still include it in our analysis.

FUNCTION DESCRIBING SURVIVAL TIMES
$T$ is a random variable that represents survival time. The distribution of survival time can be described by the survival function.

SURVIVAL FUNCTION
Let $T$ denote the survival time, a random variable with the survival function:
$s(t) = P(T \ge t)$
This is the probability that a subject selected at random survives to time $t$ or beyond. For a continuous $T$ it equals $P(T > t)$, the probability of surviving longer than time $t$.
Properties:
- $s(0) = 1$
- $s(t)$ is bounded by 0 and 1, since it is a probability
- $s(t)$ is a non-increasing function

SURVIVAL FUNCTION
If there is no censoring, then a good estimator of $s(t)$, at time $t$, is:
$\hat{s}(t) = \dfrac{\text{number of patients surviving longer than time } t}{\text{total number of patients on trial}}$
But usually there is censoring. In that case we can estimate $s(t)$ using the Kaplan-Meier estimator.
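The no-censoring estimator is just a proportion, which a few lines of Python can illustrate. This is a sketch under the assumption that every survival time is fully observed; the sample times are hypothetical.

```python
def empirical_survival(times, t):
    """Fraction of subjects whose observed survival time exceeds t.

    Valid only when no observation is censored.
    """
    if not times:
        raise ValueError("need at least one survival time")
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical, fully observed survival times.
times = [2, 5, 6, 10, 16]
print(empirical_survival(times, 0))  # 1.0
print(empirical_survival(times, 6))  # 0.4
```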

KAPLAN-MEIER ESTIMATOR

KAPLAN-MEIER (KM) ESTIMATOR
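The KM estimator helps us find $s(k)$ when there are censored data. To find it, break up the survival probability into a sequence of conditional probabilities, where $p_j$ is the probability of surviving through year $j$ given survival through year $j-1$. The probability of surviving $k$ ($k \ge 2$) or more years from the beginning of the study is then a product of observed survival rates:
$s(k) = p_1 \, p_2 \, p_3 \cdots p_k$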

KAPLAN-MEIER ESTIMATOR
$\hat{s}(t) = \prod_{j \mid t_{(j)} \le t} p_j = \prod_{j \mid t_{(j)} \le t} \dfrac{n_j - d_j}{n_j}$
- $p_j$: estimated by the proportion of people living through $t_{(j)}$ out of those who have survived beyond $t_{(j-1)}$
- $n_j$: number at risk at $t_{(j)}$
- $d_j$: number who died at $t_{(j)}$
- $n_j - d_j$: number who survived beyond $t_{(j)}$

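HOW TO CALCULATE THE KM ESTIMATOR
EVENT TIMES (n = 12):
RECALL: $\hat{s}(t) = \prod_{j \mid t_{(j)} \le t} p_j = \prod_{j \mid t_{(j)} \le t} \dfrac{n_j - d_j}{n_j}$
Censoring points are skipped, since the estimate does not change until we get to the next event time.
$\hat{s}(2) = \hat{s}(0) \, p_2 = 1 \times \frac{11}{12} \approx 0.92$
$\hat{s}(5) = \hat{s}(2) \, p_5 = \frac{11}{12} \times \frac{9}{10} = 0.825$
$\hat{s}(6) = \hat{s}(5) \, p_6 = 0.825 \times \frac{8}{9} \approx 0.73$
$\hat{s}(10) = \hat{s}(6) \, p_{10} \approx 0.73 \times \frac{6}{7} \approx 0.63$
$\hat{s}(16) = \hat{s}(10) \, p_{16} \approx 0.63 \times \frac{5 - 2}{5} \approx 0.38$
$\hat{s}(27) = \hat{s}(16) \, p_{27} \approx 0.38 \times \frac{2}{3} \approx 0.25$
$\hat{s}(30) = \hat{s}(27) \, p_{30} \approx 0.25 \times \frac{1}{2} \approx 0.13$
$\hat{s}(32) = \hat{s}(30) \, p_{32} \approx 0.13 \times \frac{0}{1} = 0$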
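The step-by-step product can be sketched in Python. The function below is a minimal illustration, not a replacement for a statistics package. The event times 2, 5, 6, 10, 16 (twice), 27, 30 and 32 come from the calculation above; the three censoring times (3, 7 and 15) are hypothetical placeholders, chosen only so that the numbers at risk are 12, 10, 9, 7, 5, 3, 2 and 1.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(event_time, n_at_risk, deaths, s_hat)] for each distinct event time.

    events uses 1 for an observed death and 0 for a censored time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    s_hat = 1.0
    table = []
    for t in sorted(deaths):
        # Everyone whose follow-up reaches t is still at risk at t.
        n_at_risk = sum(1 for x in times if x >= t)
        d = deaths[t]
        s_hat *= (n_at_risk - d) / n_at_risk
        table.append((t, n_at_risk, d, s_hat))
    return table


# Censoring times 3, 7 and 15 are assumptions made for this sketch.
times = [2, 3, 5, 6, 7, 10, 15, 16, 16, 27, 30, 32]
events = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1]

for t, n, d, s in kaplan_meier(times, events):
    print(f"t={t:>2}  at risk={n:>2}  deaths={d}  s_hat={s:.3f}")
```

Censored subjects never contribute a factor of their own; they only leave the risk set, which is why the estimate stays flat between event times.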

SURVIVAL CURVE

EXAMPLE DATA The MYEL Data Set: Myelomatosis Patients
The MYEL data set contains survival times for 25 patients diagnosed with myelomatosis (Peto et al., 1977). The patients were randomly assigned to two drug treatments. The variables are as follows:
- DUR is the time in days from the point of randomization to either death or censoring.
- STATUS has a value of 1 if dead and a value of 0 if alive. That is, 0 marks a censored observation (the patient was still alive) and 1 marks an uncensored one (the patient died).
- TREAT specifies a value of 1 or 2 that corresponds to the two treatments.
- RENAL has a value of 1 if renal functioning was normal.

WHAT DO THE DATA LOOK LIKE?
Snapshot of the data (columns: dur, status, treat, renal). The dur values visible in the snapshot are 8, 180, 632, 852, 52, 2240, 220, 63, 195, 76 and 70.

KM EXAMPLE USING SPSS
Analyze > Survival > Kaplan Meier
- Time: dur
- Status: status(1). Define 1 here, since it is the value indicating that the event (i.e. death) has occurred.
- Options: check off the survival plot

OUTPUT

LOG-RANK TEST Comparing the survival curves of two treatment groups

LOG-RANK TEST
Use the log-rank test to compare the survival functions of two samples.
- H0: The two survival functions are equal.
- Ha: The two survival functions are different.

TEST STATISTIC
Test statistic:
$Q = \dfrac{(O_A - E_A)^2}{\sum_{j=1}^{J} \mathrm{var}(d_{Aj})}$
- $O_A = \sum_{j=1}^{J} d_{Aj}$: total observed deaths from group A
- $E_A = \sum_{j=1}^{J} E(d_{Aj})$: total expected deaths from group A
- $E(d_{Aj}) = \dfrac{d_j \, n_{Aj}}{n_j}$
- $\mathrm{var}(d_{Aj}) = \dfrac{n_{Aj} \, n_{Bj} \, d_j \, (n_j - d_j)}{n_j^2 \, (n_j - 1)}$
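The statistic can be assembled directly from these pieces. The sketch below is a plain-Python illustration under the assumption that events are coded 1 for death and 0 for censored; the function name and the tiny two-group data set are hypothetical. Under H0, Q is usually compared with a chi-square distribution with 1 degree of freedom.

```python
def log_rank_statistic(times_a, events_a, times_b, events_b):
    """Log-rank statistic Q for two groups (events: 1 = death, 0 = censored)."""
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    pooled = zip(times_a + times_b, events_a + events_b)
    death_times = sorted({t for t, e in pooled if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # the variance term is undefined when only one subject is at risk
            variance += n_a * n_b * d * (n - d) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("variance is zero; the statistic is undefined")
    return (observed_a - expected_a) ** 2 / variance


# Hypothetical data: every subject dies, deaths alternate between groups.
q = log_rank_statistic([1, 3], [1, 1], [2, 4], [1, 1])
print(round(q, 4))  # 0.6154, i.e. 8/13
```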

EXAMPLE USING SPSS
Analyze > Survival > Kaplan Meier
- Time: dur
- Status: status(1). Define 1 here, since it is the value indicating that the event (i.e. death) has occurred.
- Factor: treat
- Options: check off the survival plot
- Click on "Compare Factor" and choose "Log-Rank"

OUTPUT

COX REGRESSION MODEL Incorporating Covariates

SURVIVAL MODELS Models that relate the time that passes before some event occurs to one or more covariates that may be associated with that amount of time.

COX REGRESSION MODEL This model produces a survival function that predicts the probability that the event has not yet occurred at a given time $t$, for given predictor variables (covariates).

COX REGRESSION MODEL
$\lambda(t, x_i) = \lambda_0(t) \, e^{\beta' x_i}$
- $t$ is the time
- $x_i$ are the covariates for the $i$th individual
- $\lambda_0(t)$ is the baseline hazard function, i.e. the hazard when all the covariates are equal to zero

HAZARD FUNCTION
The hazard function:
$\lambda(t) = \lim_{\Delta t \to 0} \dfrac{P(t < T < t + \Delta t \mid T \ge t)}{\Delta t}$
This is the risk of failure immediately after time $t$, given that the subject has survived up to time $t$.

INTERPRETATION OF THE BETAS
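First we need to find the hazard ratio for a one-unit increase in a covariate, provided the other covariates stay fixed:
$\dfrac{\lambda(t, x_1 + 1)}{\lambda(t, x_1)} = \dfrac{\lambda_0(t) \, e^{\beta_1 (x_1 + 1)}}{\lambda_0(t) \, e^{\beta_1 x_1}} = e^{\beta_1}$
The baseline hazard cancels, so the ratio does not depend on $t$. We interpret $\beta_1$ as the increase in log hazard per unit of $x_1$, and $e^{\beta_1}$ as the corresponding hazard ratio.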
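A quick numerical check makes the cancellation concrete. The sketch below evaluates the Cox hazard for hypothetical coefficients, covariates and baseline hazard values (none of them come from the MYEL analysis) and shows that the ratio for a one-unit increase in the first covariate is the same for every baseline.

```python
import math


def cox_hazard(baseline_hazard, beta, x):
    """lambda(t, x) = lambda_0(t) * exp(beta' x) at a single time point."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard * math.exp(linear_predictor)


beta = [0.5, -1.2]        # hypothetical coefficients
x = [2.0, 1.0]            # hypothetical covariate values
x_plus_one = [3.0, 1.0]   # first covariate raised by one unit

for baseline in (0.01, 0.2):  # two arbitrary baseline hazard values
    ratio = cox_hazard(baseline, beta, x_plus_one) / cox_hazard(baseline, beta, x)
    print(round(ratio, 6), round(math.exp(beta[0]), 6))
```

Both printed numbers agree on every line, whatever baseline value is used.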

EXAMPLE USING SPSS
Analyze > Survival > Cox Regression
SPSS fits the model with minus beta coefficients:
$\lambda(t, x_i) = \lambda_0(t) \, e^{-\beta' x_i}$
This has to be taken into account when interpreting the coefficients.
- Time: dur
- Status: status(1)
- Censoring value: 1
- Covariates: treat, renal
- Categorical: treat, renal

OUTPUT

OUTPUT
Interpretation:
- The hazard for patients receiving treatment 2 is 28.8% of that for treatment 1 patients.
- The hazard for patients with normal renal function is 1.6% of that for patients whose renal function is abnormal.
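These percentages are hazard ratios, so each can be restated as a percent change in hazard or as a log-hazard coefficient. The sketch below assumes the $\lambda_0(t) e^{\beta' x}$ form of the model, so that the coefficient is the natural log of the hazard ratio; the helper name is hypothetical.

```python
import math


def describe_hazard_ratio(hazard_ratio):
    """Return (beta, percent_change) for a hazard ratio equal to exp(beta)."""
    beta = math.log(hazard_ratio)
    percent_change = (hazard_ratio - 1.0) * 100.0
    return beta, percent_change


ratios = [
    ("treatment 2 vs treatment 1", 0.288),
    ("normal vs abnormal renal function", 0.016),
]
for label, hr in ratios:
    beta, pct = describe_hazard_ratio(hr)
    print(f"{label}: beta = {beta:.3f}, hazard change = {pct:.1f}%")
```

Read this way, treatment 2 corresponds to a hazard about 71.2% lower than treatment 1, and normal renal function to a hazard about 98.4% lower than abnormal renal function.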