HSRP 734: Advanced Statistical Methods July 17, 2008.

Objectives: Describe and use the Cox proportional hazards model to describe and compare survival experiences. Use SAS to implement the model.

From Stratification to Modeling. What have we done so far? We estimated the survival function with a minimum of assumptions, and we compared the survival functions of various groups using nonparametric tests. As with a contingency table analysis, these tests are largely limited to simple stratifications.

From Stratification to Modeling. Goal: extend survival analysis to an approach that allows for multiple covariates of mixed forms (continuous, ordinal, and nominal categorical). We have two options for this extension: model the survival function (or the survival time), or model the hazard function, which takes values between 0 and ∞.

Cox Proportional Hazards Model. We will model the hazard function. The Cox proportional hazards model is a regression-based approach to survival analysis.

What Are Proportional Hazards? Two hazard functions are proportional if $h_1(t) = C\,h_0(t)$, that is, $h_1(t)/h_0(t) = C$, where the constant $C$ does not depend on time.

Cox Proportional Hazards Model. Cox assumed this proportionality constant and proposed the following model: $h(t; X) = h_0(t)\exp(\beta_1 X_1 + \cdots + \beta_p X_p)$, where $h_0(t)$ is the baseline hazard, which involves $t$ but not $X$; $\exp$ is the exponential function; and $\exp(\beta_1 X_1 + \cdots + \beta_p X_p)$ involves the $X$'s but not $t$ (as long as the $X$'s are time-independent).

Cox Proportional Hazards Model. Hazard rate = baseline hazard rate × a positive term that depends on a “score”. The score is a linear function of the explanatory factors. Note: the baseline hazard rate is the same for everyone. The “score” may be negative, but its exponential, and hence the multiplier, is always positive.

Cox Proportional Hazards Model. The Cox proportional hazards (PH) model uses one of many possible forms. We could use any function $g(X) > 0$ such that $h(t; X) = h_0(t)\,g(X)$; the Cox model takes $g(X) = \exp(\beta_1 X_1 + \cdots + \beta_p X_p)$.

Cox Proportional Hazards Model. The Cox PH model does not include an intercept term, because any intercept term could be absorbed into the baseline hazard.
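
To see why, suppose an intercept $\beta_0$ were written into the exponent. The short derivation below (where $\beta_0$ and $h_0^*(t)$ are notation introduced only for this illustration) shows that it merely rescales the unspecified baseline hazard:

```latex
h_0(t)\exp(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)
  = \underbrace{h_0(t)\,e^{\beta_0}}_{h_0^*(t)}\;\exp(\beta_1 X_1 + \cdots + \beta_p X_p)
```

Because $h_0(t)$ is left unspecified, $h_0^*(t)$ is just another baseline hazard, so $\beta_0$ cannot be separately estimated.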

Cox Proportional Hazards Model. The regression model for the hazard function (instantaneous incidence rate) as a function of $p$ explanatory ($X$) variables is specified as follows:
log hazard: $\log h(t; X) = \log h_0(t) + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$
hazard: $h(t; X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)$
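
A minimal Python sketch of the two equivalent forms, assuming the baseline hazard has already been evaluated at the time of interest. The function names and all numeric values are hypothetical, for illustration only.

```python
import math


def linear_predictor(betas, x):
    """The "score": beta_1*X_1 + ... + beta_p*X_p (no intercept)."""
    if len(betas) != len(x):
        raise ValueError("betas and x must have the same length")
    return sum(b * xi for b, xi in zip(betas, x))


def cox_hazard(h0_t, betas, x):
    """Hazard form: h(t; X) = h0(t) * exp(score)."""
    return h0_t * math.exp(linear_predictor(betas, x))


def cox_log_hazard(h0_t, betas, x):
    """Log-hazard form: log h(t; X) = log h0(t) + score."""
    return math.log(h0_t) + linear_predictor(betas, x)


# Hypothetical baseline hazard 0.05 at some time t, two covariates.
h = cox_hazard(0.05, [0.3, -0.7], [1.0, 2.0])  # equals 0.05 * exp(-1.1)
assert math.isclose(math.log(h), cox_log_hazard(0.05, [0.3, -0.7], [1.0, 2.0]))
```

Even a strongly negative score gives a positive hazard, because the exponential is always positive.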

Cox Proportional Hazards Model. Interpretation of $h_0(t)$: the baseline hazard (incidence) rate as a function of time. The baseline is the hazard when all $X$'s are zero; continuous variables often must be centered to make $h_0(t)$ interpretable.
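
The effect of centering can be checked numerically. This sketch assumes a single covariate, AGE, with hypothetical values for the coefficient, the baseline hazard at a fixed time, and the centering constant $c$. It shows that centering only moves the factor $\exp(\beta c)$ into the baseline, which then describes a person of age $c$ rather than an impossible age of zero.

```python
import math

beta_age = -0.02  # hypothetical log hazard ratio per year of age
h0 = 0.01         # hypothetical baseline hazard at time t for AGE = 0 (not meaningful)
c = 35.0          # hypothetical centering value, e.g. the sample mean age


def hazard_uncentered(age):
    return h0 * math.exp(beta_age * age)


# After centering, the baseline absorbs exp(beta * c) and refers to a person of age c.
h0_centered = h0 * math.exp(beta_age * c)


def hazard_centered(age):
    return h0_centered * math.exp(beta_age * (age - c))


# Both parameterizations give the same hazard for every age.
for age in (20.0, 35.0, 60.0):
    assert math.isclose(hazard_uncentered(age), hazard_centered(age))
```

The coefficient $\beta$ is unchanged by centering; only the meaning of the baseline changes.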

Cox Proportional Hazards Model. Interpretation of $\exp(\beta_1)$: since $h(t; X_1 + 1)/h(t; X_1) = \exp(\beta_1)$, with $h_0(t)$ and the other terms cancelling, $\exp(\beta_1)$ is the relative hazard associated with a 1-unit change in $X_1$ (i.e., $X_1 + 1$ vs. $X_1$), holding the other $X$'s constant, independent of time. In relative risk terms, $\exp(\beta_1)$ is the relative risk for $X_1 + 1$ vs. $X_1$, holding the other $X$'s constant, independent of time. The other $\exp(\beta_j)$'s have similar interpretations.
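
A quick numerical check of this interpretation, using a hypothetical time-varying baseline hazard and hypothetical coefficients: the ratio for $X_1 + 1$ vs. $X_1$ is the same at every time and for every starting value of $X_1$.

```python
import math


def h0(t):
    """Hypothetical baseline hazard that changes over time (illustration only)."""
    return 0.1 * t


def hazard(t, x1, x2, b1=0.4, b2=-0.25):
    """Cox model with two covariates; b1 and b2 are hypothetical coefficients."""
    return h0(t) * math.exp(b1 * x1 + b2 * x2)


# Ratio for X1 + 1 vs. X1, holding X2 fixed, at several times and X1 values.
for t in (0.5, 2.0, 10.0):
    for x1 in (0.0, 3.0):
        ratio = hazard(t, x1 + 1, x2=1.0) / hazard(t, x1, x2=1.0)
        assert math.isclose(ratio, math.exp(0.4))  # exp(b1), independent of t and x1
```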

Cox Proportional Hazards Model. Note: $\exp(\beta_1 X_1 + \cdots + \beta_p X_p)$ “multiplies” the baseline hazard $h_0(t)$ by the same amount regardless of the time $t$. This is therefore a “proportional hazards” model: the effect of any (fixed) $X$ is the same at any time during follow-up.

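Cox Proportional Hazards Model. Applying the formula relating $S(t)$ to the cumulative hazard, $S(t) = \exp(-H(t))$, to the proportional hazards model gives
$S(t; X) = \exp\{-H_0(t)\exp(\beta_1 X_1 + \cdots + \beta_p X_p)\} = [S_0(t)]^{\exp(\beta_1 X_1 + \cdots + \beta_p X_p)}$,
where $H_0(t) = \int_0^t h_0(u)\,du$ is the baseline cumulative hazard and $S_0(t) = \exp(-H_0(t))$ is the baseline survival function.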
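
A sketch confirming that the two forms of $S(t; X)$ agree. It assumes, purely for illustration, a constant baseline hazard of 0.1, so that $H_0(t) = 0.1t$; `score` stands for $\beta_1 X_1 + \cdots + \beta_p X_p$.

```python
import math


def baseline_cum_hazard(t):
    """Hypothetical cumulative baseline hazard H0(t) for a constant hazard of 0.1."""
    return 0.1 * t


def survival(t, score):
    """S(t; X) = exp(-H0(t) * exp(score))."""
    return math.exp(-baseline_cum_hazard(t) * math.exp(score))


def survival_via_power(t, score):
    """Equivalent form S(t; X) = S0(t) ** exp(score)."""
    s0 = math.exp(-baseline_cum_hazard(t))
    return s0 ** math.exp(score)


for t in (1.0, 5.0, 12.0):
    assert math.isclose(survival(t, 0.7), survival_via_power(t, 0.7))
```

A positive score raises $S_0(t)$ to a power greater than 1, so survival is lower than baseline at every time.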

Cox Proportional Hazards Model  is the focus whereas h 0 (t) is a nuisance variable  is the focus whereas h 0 (t) is a nuisance variable David Cox (1972) showed how to estimate  without having to assume a model for h 0 (t) David Cox (1972) showed how to estimate  without having to assume a model for h 0 (t) “Semi-parametric” “Semi-parametric” h 0 (t) is the baseline hazard - “non-parametric” part of the model h 0 (t) is the baseline hazard - “non-parametric” part of the model  1,  2, …,  p are the regression coefficients - “parametric” part of the model  1,  2, …,  p are the regression coefficients - “parametric” part of the model Think of estimating h 0 (t) with a step function Think of estimating h 0 (t) with a step function Let # steps get large — “partial likelihood” for  depends on , not h 0 (t) Let # steps get large — “partial likelihood” for  depends on , not h 0 (t)

Partial likelihood. The likelihood function used in Cox PH models is called a partial likelihood. We use only the part of the likelihood function that contains the $\beta$'s. It depends only on the ranks (ordering) of the event times, not on the actual time values.

Partial likelihood. Let the distinct survival times (times to failure) be $t_1 < t_2 < \cdots < t_k$, and let the corresponding “risk sets” be $R_1, R_2, \ldots, R_k$, where $R_j$ is the list of persons at risk just before $t_j$. Then the partial likelihood for $\beta$ is
$L(\beta) = \prod_{j=1}^{k} \dfrac{\exp(\beta' X_{(j)})}{\sum_{l \in R_j} \exp(\beta' X_l)}$,
where $\beta' X = \beta_1 X_1 + \cdots + \beta_p X_p$ and $X_{(j)}$ is the covariate vector of the person who fails at $t_j$. (This assumes no ties in the event times.) To estimate $\beta$, find the values of the $\beta$'s that maximize $L(\beta)$.
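
A minimal sketch of the partial likelihood for a single covariate, maximized by Newton-Raphson. It assumes no tied event times (tied data would need a correction such as the Breslow or Efron approximation, which this sketch does not implement), and the toy data are hypothetical.

```python
import math


def risk_set(times, t):
    """Indices of subjects still at risk just before time t (follow-up time >= t)."""
    return [k for k, tk in enumerate(times) if tk >= t]


def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one covariate, assuming no tied event times.

    times: follow-up times; events: 1 = event, 0 = censored; x: covariate values.
    """
    ll = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if di == 1:
            denom = sum(math.exp(beta * x[k]) for k in risk_set(times, ti))
            ll += beta * x[i] - math.log(denom)
    return ll


def fit_cox_1d(times, events, x, iters=25):
    """Maximize the log partial likelihood by Newton-Raphson, starting from beta = 0."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i, (ti, di) in enumerate(zip(times, events)):
            if di != 1:
                continue
            r = risk_set(times, ti)
            w = [math.exp(beta * x[k]) for k in r]
            sw = sum(w)
            mean = sum(wk * x[k] for wk, k in zip(w, r)) / sw  # weighted mean of x over R_j
            mean2 = sum(wk * x[k] ** 2 for wk, k in zip(w, r)) / sw
            score += x[i] - mean        # first derivative of the log partial likelihood
            info += mean2 - mean ** 2   # minus the second derivative
        beta += score / info
    return beta


# Hypothetical toy data: no tied times, binary covariate.
times = [2, 3, 5, 7, 8, 11, 13, 15]
events = [1, 1, 0, 1, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]
beta_hat = fit_cox_1d(times, events, x)
```

Replacing the times by any increasing transformation (for example, squaring them) leaves `cox_partial_loglik` unchanged, which illustrates that the partial likelihood depends only on the ranks of the times.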

Partial likelihood. Why does the partial likelihood make sense? Each factor of $L(\beta)$ is the probability that the person who failed at $t_j$ is the one who fails, given that exactly one person in $R_j$ fails. We choose $\beta$ so that the person who actually failed at each time was as likely as possible to fail, relative to the others who might have failed.
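
Each factor of $L(\beta)$ is one such probability, evaluated for the person who actually failed. A sketch with a hypothetical risk set and coefficient:

```python
import math


def failure_probabilities(beta, risk_set_x):
    """Cox-model probability that each member of a risk set is the one who fails,
    given that exactly one of them fails at this time (one covariate)."""
    weights = [math.exp(beta * xk) for xk in risk_set_x]
    total = sum(weights)
    return [wk / total for wk in weights]


# Hypothetical risk set of four people with covariate values 0, 0, 1, 1 and beta = 0.8.
probs = failure_probabilities(0.8, [0.0, 0.0, 1.0, 1.0])
```

The probabilities sum to 1 over the risk set, and the baseline hazard never appears, which is why it drops out of $L(\beta)$.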

Some General Comments/Thoughts. Similar to logistic regression, a simple function of the $\beta$'s has a particularly nice interpretation: $\exp(\beta)$ can be interpreted as a relative risk (strictly, a hazard ratio) for a one-unit change in the predictor.

Some General Comments/Thoughts. Using the common methods of estimation, it can be shown that the estimated regression parameters $\hat\beta$ have an asymptotically normal distribution with mean $\beta$ (the true parameter value) and finite variance.

Some General Comments/Thoughts. Asymptotic normality has two important implications: we can use the likelihood ratio, score, and Wald tests to make inferences from our data, and we can use the usual method to construct a 95% confidence interval, $\hat\beta \pm 1.96\,\mathrm{SE}(\hat\beta)$, exponentiating the limits to obtain a confidence interval for $\exp(\beta)$.
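
A sketch of the Wald test, the 95% CI for $\exp(\beta)$, and a 1-degree-of-freedom likelihood ratio test, using only the Python standard library. The helper names are hypothetical; the p-values rely on the identity $P(\chi^2_1 > g) = P(|Z| > \sqrt{g})$.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi2_1df_p(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def wald_summary(beta_hat, se):
    """Wald z, p-value, hazard ratio and 95% CI from an estimate and its standard error."""
    z = beta_hat / se
    lo, hi = beta_hat - 1.96 * se, beta_hat + 1.96 * se
    return {"z": z, "p": normal_two_sided_p(z), "hr": math.exp(beta_hat),
            "ci": (math.exp(lo), math.exp(hi))}


def lr_test(loglik_full, loglik_reduced):
    """Likelihood ratio test for one added parameter: G = 2 * (l_full - l_reduced)."""
    g = 2.0 * (loglik_full - loglik_reduced)
    return g, chi2_1df_p(g)
```

For example, with hypothetical values $\hat\beta = -0.5$ and SE $= 0.25$, `wald_summary(-0.5, 0.25)` gives $z = -2$ and a two-sided p-value of about 0.0455. The CI is symmetric on the log scale, so it is asymmetric around the hazard ratio itself.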

Confidence Intervals. Instead of comparing a 49-year-old to a 50-year-old (a one-unit difference in age), what if we want the hazard ratio and confidence interval comparing a 49-year-old to a 59-year-old?
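
For a $\delta$-unit difference, the log hazard ratio is $\delta\hat\beta$ with standard error $|\delta|\,\mathrm{SE}(\hat\beta)$, so the 95% CI is $\exp(\delta\hat\beta \pm 1.96\,|\delta|\,\mathrm{SE}(\hat\beta))$. For 59 vs. 49 years, $\delta = 10$; for 49 vs. 59, $\delta = -10$, which gives the reciprocal. The sketch below uses a hypothetical age coefficient and standard error, not estimates from these slides.

```python
import math


def hr_for_difference(beta_hat, se, delta, z=1.96):
    """Hazard ratio and 95% CI for a `delta`-unit difference in a continuous covariate.

    The log hazard ratio is delta * beta_hat and its standard error is |delta| * se.
    """
    log_hr = delta * beta_hat
    half_width = z * abs(delta) * se
    return math.exp(log_hr), (math.exp(log_hr - half_width), math.exp(log_hr + half_width))


# Hypothetical age coefficient (per year) and standard error, for illustration only.
hr10, ci10 = hr_for_difference(beta_hat=-0.02, se=0.008, delta=10)
```

Scaling $\hat\beta$ and its standard error by the same $|\delta|$ leaves the Wald z-statistic, and therefore the p-value, unchanged.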

Some General Comments/Thoughts. The Cox PH model is a regression model, so we can use the usual tools for model building (e.g., stepwise methods, or checking the linearity of a predictor via higher-order terms).

Two Examples: AML (one covariate); UIS (more than one covariate).

Example 1: Cox PH model for AML data. Semi-parametric model for the hazard (incidence) rate for the AML data: $h_i(t) = h_0(t)\exp(\beta X_i)$, where $h_i(t)$ is the hazard for person $i$ at week $t$, $h_0(t)$ is the hazard if $X_i = 0$ (not maintained group), and $\exp(\beta)$ is the multiplicative effect of $X_i = 1$ (maintained group).

Cox PH Model using SAS — AML

Example 1: Cox PH model for AML data (cont’d). $\exp(\hat\beta) = 0.444$: relative rate of AML relapse, maintained vs. not maintained; 95% CI: (0.16, 1.23). $1/0.444 = 2.25$: relative rate of AML relapse, not maintained vs. maintained; 95% CI: (1/1.23, 1/0.16) = (0.81, 6.26).
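
The inversion uses only the slide's rounded estimates: take the reciprocal of the hazard ratio and of each confidence limit, swapping the order of the limits. With the rounded lower limit 0.16, the upper limit comes out as 6.25 rather than the slide's 6.26, presumably because the slide inverted an unrounded lower limit.

```python
hr_maintained = 0.444         # maintained vs. not maintained (slide estimate)
ci_maintained = (0.16, 1.23)  # 95% CI (slide estimate, rounded)

hr_not_maintained = 1 / hr_maintained
# Inverting reverses the order of the limits: the old upper limit gives the new lower limit.
ci_not_maintained = (1 / ci_maintained[1], 1 / ci_maintained[0])

print(round(hr_not_maintained, 2), [round(c, 2) for c in ci_not_maintained])
# prints: 2.25 [0.81, 6.25]
```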

Example 2: Cox PH model for UIS data. The variables from the UIS study are described in Table 1.3 of Hosmer, D.W. and Lemeshow, S. (1998), Applied Survival Analysis: Regression Modeling of Time to Event Data, John Wiley and Sons Inc., New York, NY. This data set is available at … (select “datasets” and then “survival analysis”).

Example 2: Cox PH model for UIS data (cont’d). We use the Cox PH model to compare the two treatment randomization assignments, controlling for several covariates:
Compare the long treatment randomization assignment with the short treatment randomization assignment.
Use time to drug relapse as the response variable. The time variable is the time from the admission date to drug relapse, or to censoring due to the end of the study or loss to follow-up. (The definition of the variable CENSOR in the data set is questionable; however, we still use it for demonstration.)
Control for other risk factors in making the comparison.

Cox PH Model using SAS — UIS

The Description of UIS data. Data are in the file uissurv.dat; n = 628.
Variable: Description (Codes/Values)
ID: Identification code
AGE: Age at enrollment (years)
BECKTOTA: Beck Depression Score
HERCOC: Heroin/cocaine use during the 3 months prior to admission (1 = Heroin & Cocaine; 2 = Heroin Only; 3 = Cocaine Only; 4 = Neither Heroin nor Cocaine)
IVHX: History of IV drug use (1 = Never; 2 = Previous; 3 = Recent)

The Description of UIS data (cont’d)
Variable: Description (Codes/Values)
NDRUGTX: Number of prior drug treatments
RACE: Subject's race (0 = White; 1 = Non-White)
TREAT: Treatment randomization assignment (0 = Short; 1 = Long)
SITE: Treatment site (0 = A; 1 = B)
LOS: Length of stay in treatment, in days (admission date to exit date)
TIME: Time to drug relapse, in days (measured from admission date)
CENSOR: Event indicator, treating lost to follow-up as returned to drugs (1 = Returned to Drugs or Lost to Follow-Up; 0 = Otherwise)

Example 2: Cox PH model for UIS data (cont’d).
Model 1: $\log h(t) = \log h_0(t) + \beta_1\,\mathrm{TREAT}$
Model 2: $\log h(t) = \log h_0(t) + \beta_1\,\mathrm{TREAT} + \beta_2\,\mathrm{AGE} + \beta_3\,\mathrm{RACE} + \beta_4\,\mathrm{BECKTOTA} + \beta_5\,\mathrm{HERCOC.1} + \beta_6\,\mathrm{HERCOC.2} + \beta_7\,\mathrm{HERCOC.3}$
where HERCOC.1 = 1 if HERCOC = 1 and 0 otherwise; HERCOC.2 = 1 if HERCOC = 2 and 0 otherwise; HERCOC.3 = 1 if HERCOC = 3 and 0 otherwise (so HERCOC = 4, neither heroin nor cocaine, is the reference category).
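
A minimal sketch of the indicator coding for HERCOC used in Model 2, with HERCOC = 4 (neither heroin nor cocaine) as the reference category. The function name is hypothetical; in SAS the same variables would typically be created in a DATA step before calling PROC PHREG.

```python
def hercoc_dummies(hercoc):
    """Indicator coding for HERCOC with category 4 as the reference.

    Returns (HERCOC.1, HERCOC.2, HERCOC.3).
    """
    if hercoc not in (1, 2, 3, 4):
        raise ValueError(f"unexpected HERCOC code: {hercoc}")
    return tuple(1 if hercoc == level else 0 for level in (1, 2, 3))
```

Each of $\exp(\beta_5)$, $\exp(\beta_6)$, $\exp(\beta_7)$ then compares one HERCOC category with the reference group, holding the other covariates fixed.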

Example 2: Cox PH model for UIS data (cont’d). What is the relative risk of drug relapse for the long treatment group compared to the short treatment group, adjusting for age and other risk factors? The estimated hazard ratio for TREAT, $\exp(\hat\beta_1)$, corresponds to about a 20% reduction in the risk of drug relapse for patients in the long treatment randomization assignment compared with patients in the short treatment randomization assignment.

Example 2: Cox PH model for UIS data (cont’d). What is the interpretation of each coefficient?
AGE: controlling for treatment assignment and other risk factors, the risk of drug relapse, as estimated from a Cox model, is multiplied by 0.98 (about 2% lower) per year of age.
RACE: controlling for treatment assignment and other risk factors, the risk of drug relapse for non-white patients is 0.78 times (about 22% lower than) that for white patients.
BECKTOTA: controlling for treatment assignment and other risk factors, the risk of drug relapse is 1.01 times higher (about 1% higher) per unit increase in Beck Depression score.
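
Converting a hazard ratio into a percent change avoids phrasings such as “0.98 times lower”. A sketch using the hazard ratios quoted above (the helper name is hypothetical):

```python
def percent_change(hr):
    """Percent change in the hazard implied by a hazard ratio (negative = lower hazard)."""
    return (hr - 1.0) * 100.0


for name, hr in [("AGE (per year)", 0.98),
                 ("RACE (non-white vs. white)", 0.78),
                 ("BECKTOTA (per point)", 1.01)]:
    print(f"{name}: {percent_change(hr):+.0f}%")
```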

Example 2: Cox PH model for UIS data (cont’d). HERCOC.1: controlling for treatment assignment and other risk factors, the risk of drug relapse is $\exp(\hat\beta_5)$ times higher for patients who used heroin and cocaine than for those who used neither heroin nor cocaine; however, this hazard ratio is not statistically different from 1. HERCOC.2: you do it! HERCOC.3: you do it!

Example 2: Cox PH model for UIS data (cont’d). Since none of the HERCOC dummy variables is significant, you must think about another way to handle the variable HERCOC. How would you do it? Note that I chose the covariates arbitrarily for this demonstration; to find the best model properly, you need to go through model selection.

Example 2: Cox PH model for UIS data (cont’d). What is the relative risk of drug relapse for (A) a 45-year-old assigned to short treatment vs. (B) a 75-year-old assigned to long treatment, with all other covariates the same?

Example 2: Cox PH model for UIS data (cont’d). Here “const” collects $\log h_0(t)$ and the terms for the covariates shared by (A) and (B).
Log hazard for (A) $= \text{const} + 0 \times \hat\beta_1 + 45 \times \hat\beta_2 = \text{const} + 45\hat\beta_2$
Log hazard for (B) $= \text{const} + 1 \times \hat\beta_1 + 75 \times \hat\beta_2 = \text{const} + \hat\beta_1 + 75\hat\beta_2$
Difference in log hazards, (A) vs. (B): $-\hat\beta_1 - 30\hat\beta_2$
Relative risk, (A) vs. (B): $\exp(-\hat\beta_1 - 30\hat\beta_2) = 2.19$, a higher risk for the younger patient assigned to short treatment than for the older patient assigned to long treatment.
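
The same arithmetic as a sketch: only the covariates that differ between the two profiles contribute, because $h_0(t)$ and all shared terms cancel. The coefficients below are hypothetical placeholders, not the estimates behind the slide's 2.19.

```python
import math


def rr_profiles(b_treat, b_age, treat_a, age_a, treat_b, age_b):
    """Relative risk (hazard ratio) of profile A vs. profile B under a Cox model.

    Other covariates are assumed equal in A and B, so they cancel along with h0(t).
    """
    log_diff = b_treat * (treat_a - treat_b) + b_age * (age_a - age_b)
    return math.exp(log_diff)


# Hypothetical coefficients for TREAT and AGE (illustration only).
rr = rr_profiles(b_treat=-0.2, b_age=-0.02, treat_a=0, age_a=45, treat_b=1, age_b=75)
```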

Example 2: Cox PH model for UIS data (cont’d). How does the risk of a 70-year-old patient compare with that of a 60-year-old patient, assuming treatment and other risk factors are the same? The estimated difference in log hazards for two patients whose ages differ by 10 years, holding the other covariates fixed, is $10 \times \hat\beta_2$, so RR $= \exp(10\hat\beta_2) = 0.83$: a ten-year increase in age decreases the risk of drug relapse by about 17%. How would you determine whether age modifies the effect of long vs. short treatment assignment on the risk of drug relapse?