Analysis of Complex Survey Data


1 Analysis of Complex Survey Data
Day 4: Survival analysis and Cox proportional hazards models

2 Nonparametric Survival Analysis
Kaplan-Meier Method (also called Product-Limit Method)
Life Table Method (also called Actuarial Method)

3 Nonparametric Survival Analysis
A statistical method to study time to an event:
1) Divide the risk period into many small time intervals
2) Treat each interval as a small cohort analysis
3) Combine the results for the intervals

4 Basic Concepts of Survival Analysis
Censoring
Time to an event
Survival Function

5 Censoring
At the end of the study, subjects have not experienced the event (outcome).
Or subjects withdrew from the study (lost to follow-up, or died from other diseases).
Survival analysis assumes that loss to follow-up (LTF) and competing-cause censoring are random (independent of exposure and outcome).
Survival analysis is most useful with longitudinal complex surveys (e.g., PSID, AddHealth).
It can also be used in cross-sectional studies that incorporate retrospective age-of-onset information.
Right censoring example: study the time to death after lung cancer diagnosis.
Known time to death: for those who died (uncensored observation).
Unknown time to death as of the end of the study period: for those who survive (censored observation).
Interval censoring: subjects experience the event (outcome) within an interval.
Examples:
(1) Framingham Heart Study. The ages at which subjects first developed coronary heart disease (CHD) are usually known exactly. However, the ages of first occurrence of the subcategory angina pectoris may be known only to be between two clinical examinations, approximately two years apart.
(2) Annual HIV testing. A person tested negative at the end of year 2 and is found to be infected at the end of year 3. The time of infection is interval censored between year 2 and year 3.

6 Censoring Example
Cohort size at start: 1,000, followed for 1 year
Number with disease: 28
Number LTF: 15
If we assume all dropped out on the 1st day of the study, the rate of disease per year = 28 / (1,000 - 15) = 28 / 985 = .0284
If we assume all dropped out on the last day of the study, the probability of disease = 28 / 1,000 = .0280
If the drop-out rate is constant over the period, the best estimate of when subjects dropped out is the midpoint; the probability of disease is then 28 / (1,000 - 7.5) = 28 / 992.5 = .0282
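The three assumptions about when the 15 lost subjects dropped out differ only in how much of the year each lost subject is counted as at risk. A minimal Python sketch of that arithmetic follows; the function name `disease_probability` and its `fraction_at_risk` parameter are illustrative names introduced here, not part of the slides.

```python
# Censoring example from the slide: 1,000 subjects followed for 1 year,
# 28 cases, 15 lost to follow-up (LTF).
cohort_size = 1000
cases = 28
lost = 15

def disease_probability(cases, cohort_size, lost, fraction_at_risk):
    """Cases divided by the effective denominator.

    fraction_at_risk is the share of the year each lost subject is assumed
    to have been observed: 0.0 = dropped out on day 1, 1.0 = dropped out on
    the last day, 0.5 = dropped out at the midpoint (hypothetical parameter).
    """
    effective_n = cohort_size - lost * (1.0 - fraction_at_risk)
    return cases / effective_n

first_day = disease_probability(cases, cohort_size, lost, 0.0)  # 28 / 985
last_day = disease_probability(cases, cohort_size, lost, 1.0)   # 28 / 1,000
midpoint = disease_probability(cases, cohort_size, lost, 0.5)   # 28 / 992.5

print(f"{first_day:.4f} {last_day:.4f} {midpoint:.4f}")  # 0.0284 0.0280 0.0282
```

The midpoint choice is the same half-interval convention that the life table method applies to censored subjects.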

7 Survival Function
The probability of surviving beyond a specific time t: S(t) = 1 - F(t)
F(t) = cumulative probability distribution for the endpoint (e.g., death)

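8 Probability of survival at each new time period
Probability of survival at each new time period = probability at that time period, conditional on "surviving" to that interval
[Figure: tree diagram with survival branches S1 to S4, conditional survival probabilities n, o, p, q, and a failure branch F at each step]
Probability of survival to S4 = n * o * p * q
Failures (F) = deaths or cases or losses to follow-up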

9 Life Table Method
A classical method of estimating the survival function in epidemiology and actuarial science
Time is partitioned into a fixed sequence of intervals (not necessarily of equal lengths)
Interval lengths are arbitrary
The larger the interval, the larger the bias
Useful for large samples

10 Effective Sample Size
[SAS output: The LIFETEST Procedure, Stratum 1: platelet = 0, Life Table Survival Estimates. Columns: Interval [Lower, Upper), Number Failed, Number Censored, Effective Sample Size (N*), Conditional Probability of Failure, Conditional Probability Standard Error, Survival, Failure.]
Effective sample size: whenever there is censoring (withdrawal or loss), we assume that, on average, those individuals who became lost or withdrawn during the interval were at risk for half the interval.
Censored at the midpoint of the interval: this is equivalent to assuming that the distribution of censoring times is uniform within the interval.
Thus, effective sample size (n*) = n - ½ (number censored)
E.g., effective sample size (1st interval) = 9 - ½ (0) = 9
E.g., effective sample size (2nd interval) = 5 - ½ (1) = 4.5

11 Conditional Probability of Failure
[SAS output: The LIFETEST Procedure, Stratum 1: platelet = 0, Life Table Survival Estimates, with the Conditional Probability of Failure, P(F), and cumulative Survival columns annotated.]
Conditional probability of failure, P(F) = number failed / effective sample size
It is an estimate of the probability that a patient will die during the interval, given that he/she made it to the start of the interval.
E.g., P(F) (1st interval) = 4/9 = .44
E.g., P(F) (2nd interval) = 2/4.5 = .44
Survival probability (in each interval) = 1 - failure probability (in each interval)
Cumulative survival probability through interval t = cumulative survival through interval t-1 * survival probability in interval t
E.g., S(1) = 1 * (1 - .4444) = 1 * .5556 = .5556
E.g., S(2) = 1 * (1 - .4444) * (1 - .4444) = 1 * .5556 * .5556 = .3086
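The life table steps (effective sample size, conditional probability of failure, cumulative survival) can be sketched in a few lines of Python. This is an illustrative sketch, not the SAS LIFETEST implementation; the function name `life_table` is introduced here, and the inputs are the two intervals quoted on the slides (9 entering with 4 failed and 0 censored, then 5 entering with 2 failed and 1 censored).

```python
def life_table(intervals):
    """Actuarial (life table) estimates.

    intervals: list of (n_entering, n_failed, n_censored) tuples, one per
    time interval, in chronological order.
    Returns a list of (effective_n, p_failure, cumulative_survival).
    """
    rows = []
    cumulative_survival = 1.0
    for n_entering, n_failed, n_censored in intervals:
        # Censored subjects are counted as at risk for half the interval.
        effective_n = n_entering - 0.5 * n_censored
        p_failure = n_failed / effective_n
        # Multiply by the conditional survival probability for this interval.
        cumulative_survival *= 1.0 - p_failure
        rows.append((effective_n, p_failure, cumulative_survival))
    return rows

rows = life_table([(9, 4, 0), (5, 2, 1)])
for effective_n, p_failure, cum_surv in rows:
    print(f"n*={effective_n:.1f}  P(F)={p_failure:.4f}  S={cum_surv:.4f}")
# n*=9.0  P(F)=0.4444  S=0.5556
# n*=4.5  P(F)=0.4444  S=0.3086
```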

12 Kaplan-Meier (Product-limit) Method
Time is partitioned into variable intervals
Whenever a case arises, set up a time interval
Use the actual censored and event times
If some censored times exceed the last event time, the average duration will be underestimated by the KM method

13 Kaplan-Meier Method
[Figure: follow-up timelines for six patients; x-axis: Months Since Enrollment, marked at 4, 10, 14, 24]
Patient 1 died
Patient 2 lost to follow-up
Patient 3 died
Patient 4 died
Patient 5 lost to follow-up
Patient 6 died

14 Kaplan-Meier Method
(1) Times to death from starting treatment (months)
(2) Number alive at each time
(3) Number who died at each time
(4) Hazard: proportion who died at that time, (3)/(2)
(5) Proportion who survived at that time, 1.00 - (4)
(6) Cumulative survival

(1)   (2)   (3)   (4)    (5)    (6)
4     6     1     .167   .833   .833
10    4     1     .250   .750   .625
14    3     1     .333   .667   .417
24    1     1     1.00   .000   .000
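The columns of the Kaplan-Meier table follow mechanically from the number at risk and the number of deaths at each event time. The sketch below is a minimal illustration, not a survey-weighted estimator; the function name `kaplan_meier` is introduced here, and the at-risk counts 6, 4, 3, 1 with one death at each time are the values implied by the hazards .167, .250, .333, 1.00 in the table.

```python
def kaplan_meier(event_table):
    """Product-limit estimate of the survival function.

    event_table: list of (time, n_at_risk, n_died) tuples, one per distinct
    event time, sorted by time. Censored subjects only reduce n_at_risk at
    later event times; they do not create rows of their own.
    Returns a list of (time, hazard, cumulative_survival).
    """
    curve = []
    survival = 1.0
    for time, n_at_risk, n_died in event_table:
        hazard = n_died / n_at_risk   # column (4): proportion who died
        survival *= 1.0 - hazard      # column (6): cumulative survival
        curve.append((time, hazard, survival))
    return curve

curve = kaplan_meier([(4, 6, 1), (10, 4, 1), (14, 3, 1), (24, 1, 1)])
for time, hazard, survival in curve:
    print(f"t={time:>2}  hazard={hazard:.3f}  S(t)={survival:.3f}")
# t= 4  hazard=0.167  S(t)=0.833
# t=10  hazard=0.250  S(t)=0.625
# t=14  hazard=0.333  S(t)=0.417
# t=24  hazard=1.000  S(t)=0.000
```

Note that the two patients lost to follow-up never appear as rows; they show up only as the drop in the number at risk from 5 to 4 before month 10 and from 2 to 1 before month 24.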

15 Kaplan-Meier Plot (N=6)
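[Figure: Kaplan-Meier step curve; y-axis: % Surviving (0 to 100); x-axis: Months After Enrollment. Cumulative survival steps down to .833 at 4 months, .625 at 10 months, .417 at 14 months, and .0 at 24 months.]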

16 Kaplan-Meier Curve (N = 5,398)
[Figure: Kaplan-Meier curves for three groups: Tort, No Fault 1, No Fault 2]
Source: "Effect of eliminating compensation for pain and suffering on the outcome of insurance claims for whiplash injury," Cassidy JD et al., N Engl J Med 2000;342:

17 Median Survival Time
[Figure: median survival time for three groups: Tort, No Fault 1, No Fault 2]

18 Semi-Parametric Methods
Not required to choose a particular probability distribution to represent survival time
Can incorporate time-dependent covariates
Example: exposure increases over time, as with drug dosage or with workers in hazardous occupations

19 Cox Proportional Hazards Model
Basic model of the hazard for individual i at time t:
$h_i(t) = h_0(t) \exp\{\beta_1 x_{i1} + \cdots + \beta_k x_{ik}\}$
$h_0(t)$: baseline hazard function (non-negative)
$\beta_1 x_{i1} + \cdots + \beta_k x_{ik}$: linear function of fixed covariates
Take the logarithm of both sides:
$\log h_i(t) = \log h_0(t) + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$
No need to specify the functional form of the log baseline hazard function $\log h_0(t)$

20 Cox Proportional Hazards Model
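Consider the hazard ratio of two individuals i and j:
$h_i(t) = h_0(t) \exp\{\beta_1 x_{i1} + \cdots + \beta_k x_{ik}\}$
$h_j(t) = h_0(t) \exp\{\beta_1 x_{j1} + \cdots + \beta_k x_{jk}\}$
Hazard ratio: $h_i(t) / h_j(t) = \exp\{\beta_1 (x_{i1} - x_{j1}) + \cdots + \beta_k (x_{ik} - x_{jk})\}$
The baseline hazard $h_0(t)$ cancels, so the hazard functions are multiplicatively related and the hazard ratio is constant over survival time.
The hazards of any two individuals are proportional.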
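The cancellation of the baseline hazard can be checked numerically. In the sketch below, every number is made up for illustration: the coefficients `beta`, the covariate vectors `x_i` and `x_j`, and the `baseline` function are hypothetical, and any non-negative baseline would give the same ratio.

```python
import math

def hazard(t, x, beta, baseline):
    """Cox model hazard: baseline(t) * exp(beta . x)."""
    return baseline(t) * math.exp(sum(b * v for b, v in zip(beta, x)))

def hazard_ratio(x_i, x_j, beta):
    """Hazard ratio of individual i to individual j; no baseline needed."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_i, x_j)))

# Hypothetical coefficients and covariate values (not from the slides).
beta = [0.4, -0.2]
x_i = [1.0, 3.0]
x_j = [0.0, 5.0]

def baseline(t):
    """Hypothetical baseline hazard; its shape does not affect the ratio."""
    return 0.01 * t ** 1.5

ratios = [hazard(t, x_i, beta, baseline) / hazard(t, x_j, beta, baseline)
          for t in (1.0, 5.0, 20.0)]

# Each ratio agrees with the closed-form hazard ratio, whatever the time t.
for ratio in ratios:
    assert math.isclose(ratio, hazard_ratio(x_i, x_j, beta))
print(hazard_ratio(x_i, x_j, beta))
```

Swapping in a different `baseline` function changes each individual's hazard but leaves the ratio unchanged, which is the proportional hazards property.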

21 Cox Proportional Hazards Model
2. Partial Likelihood Estimation
Estimate the β coefficients of the Cox model without having to specify the baseline hazard function $h_0(t)$
Partial likelihood depends only on the order in which events occur, not on the exact times of occurrence
Partial likelihood estimates are not fully efficient because of the loss of information about exact times of event occurrence
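To make the "order only" property concrete, here is a minimal sketch of the Cox partial log-likelihood for data with no tied event times and no survey weights (a simplification; a design-based package such as SUDAAN would add weights and design-based variance estimation). The function name, the covariate values, the coefficient, and the censoring times 7 and 20 are all hypothetical.

```python
import math

def cox_partial_log_likelihood(times, events, x, beta):
    """Partial log-likelihood, assuming no tied event times.

    times:  follow-up time for each subject
    events: 1 if the subject had the event, 0 if censored
    x:      one list of covariate values per subject
    beta:   one coefficient per covariate
    """
    scores = [sum(b * v for b, v in zip(beta, row)) for row in x]
    log_lik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            # Risk set: everyone still under observation at time t_i.
            risk_total = sum(math.exp(scores[j])
                             for j, t_j in enumerate(times) if t_j >= t_i)
            log_lik += scores[i] - math.log(risk_total)
    return log_lik

# Hypothetical data: six subjects, two censored, one binary covariate.
times = [4, 7, 10, 14, 20, 24]
events = [1, 0, 1, 1, 0, 1]
x = [[1], [0], [1], [0], [1], [0]]
beta = [0.5]

original = cox_partial_log_likelihood(times, events, x, beta)
# An order-preserving change of the time scale leaves the value unchanged.
rescaled = cox_partial_log_likelihood([t ** 2 for t in times], events, x, beta)
assert original == rescaled
```

Because only the composition of each risk set enters the calculation, squaring the times (or any other increasing transformation) gives exactly the same partial likelihood.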

22 Interpretation of Coefficients
No intercept
$h_0(t)$: an arbitrary function of time; it cancels out of the estimating equations
$e^{\beta}$: hazard ratio
Indicator variables (coded as 0 and 1): ratio of the estimated hazard for those with a value of 1 in X to the estimated hazard for those with a value of 0 in X (controlling for other covariates)
Quantitative (continuous) variables: $100(e^{\beta} - 1)$ is the estimated percent change in the hazard for each one-unit increase in X
For example, for the variable AGE, $e^{\beta} = 1.5$, which yields $100(1.5 - 1) = 50$. For each one-year increase in the age at diagnosis, the hazard of death goes up by an estimated 50 percent, controlling for other covariates.
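The percent-change reading of a coefficient is a one-line calculation. The sketch below uses the AGE example from the slide (hazard ratio 1.5); the function name `percent_change_in_hazard` and its `units` parameter are illustrative additions.

```python
import math

def percent_change_in_hazard(beta, units=1.0):
    """Estimated percent change in the hazard for a `units` increase in X."""
    return 100.0 * (math.exp(beta * units) - 1.0)

beta_age = math.log(1.5)  # coefficient whose hazard ratio is 1.5

one_year = percent_change_in_hazard(beta_age)
two_years = percent_change_in_hazard(beta_age, units=2.0)

print(round(one_year, 1))   # 50.0
print(round(two_years, 1))  # 125.0
```

The two-year figure is 125 percent rather than 100 percent because hazard ratios multiply across units: 1.5 * 1.5 = 2.25.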

23 Lab 4: estimating survival curves and Cox models in SUDAAN

