
Residuals
Residuals are used to investigate the lack of fit of a model to a given subject.
For Cox regression, there’s no easy analog to the usual “observed minus predicted” residual of linear regression.
The main alternatives are martingale, deviance, and Schoenfeld residuals.

Schoenfeld residuals
Schoenfeld (1982) proposed the first set of residuals for use with Cox regression packages.
Schoenfeld D. Partial residuals for the proportional hazards regression model. Biometrika, 1982, 69(1):239-241.
Schoenfeld residuals are not defined for censored individuals.
Instead of a single residual for each individual, there is a separate residual for each individual for each covariate.

Schoenfeld residuals
The Schoenfeld residual is defined as the covariate value for the individual that failed minus its expected value.
This yields one residual per covariate for each individual who failed.
The expected value of the covariate at time t is a weighted average of the covariate, weighted by the likelihood of failure for each individual in the risk set at t.

The function describing the failure pattern (the partial likelihood) is the product of J terms, one for each observed failure time.
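
The product over failure times and the residual definition can be written out as a sketch. The notation is an assumption, not taken from the slides: t_(j) is the j-th ordered failure time, R(t_(j)) the risk set at that time, x_(j) the covariate vector of the individual who fails, and k indexes covariates. Tied failure times are ignored for simplicity.

```latex
% Partial likelihood: one factor per observed failure time t_(1) < ... < t_(J)
L(\beta) = \prod_{j=1}^{J}
  \frac{\exp(\beta^\top x_{(j)})}{\sum_{l \in R(t_{(j)})} \exp(\beta^\top x_l)}

% Weight of individual l in the risk set at t_(j), evaluated at the fitted coefficients
p_l(t_{(j)}) =
  \frac{\exp(\hat\beta^\top x_l)}{\sum_{m \in R(t_{(j)})} \exp(\hat\beta^\top x_m)}

% Schoenfeld residual for covariate k at the j-th failure:
% observed covariate value minus its weighted average over the risk set
r_{jk} = x_{(j)k} - \sum_{l \in R(t_{(j)})} x_{lk} \, p_l(t_{(j)})
```

With these weights, the p_l(t_(j)) sum to 1 over each risk set, so the subtracted term is a proper weighted average.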

Example
Five people are left in our risk set at event time = 7 months:
Female, 55-year-old smoker
Male, 45-year-old non-smoker
Female, 67-year-old smoker
Male, 58-year-old smoker
Male, 70-year-old non-smoker
The 55-year-old female smoker is the one who has the event.

Example
Based on our model, we can calculate a predicted probability of death by time 7 for each person (call it “p-hat”):
Female, 55-year-old smoker: p-hat = .10
Male, 45-year-old non-smoker: p-hat = .05
Female, 67-year-old smoker: p-hat = .30
Male, 58-year-old smoker: p-hat = .20
Male, 70-year-old non-smoker: p-hat = .30
Thus, the expected value for the AGE of the person who failed is:
55(.10) + 45(.05) + 67(.30) + 58(.20) + 70(.30) = 60.45, or about 60
And the Schoenfeld residual is:
55 - 60.45 = -5.45, or about -5

Example
Based on our model, we can calculate a predicted probability of death by time 7 for each person (call it “p-hat”):
Female, 55-year-old smoker: p-hat = .10
Male, 45-year-old non-smoker: p-hat = .05
Female, 67-year-old smoker: p-hat = .30
Male, 58-year-old smoker: p-hat = .20
Male, 70-year-old non-smoker: p-hat = .30
With gender coded 0 = female and 1 = male, the expected value for the GENDER of the person who failed is:
0(.10) + 1(.05) + 0(.30) + 1(.20) + 1(.30) = .55
And the Schoenfeld residual is:
0 - .55 = -.55
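
The arithmetic of the two examples can be reproduced with a few lines of base R. This is only a sketch of the calculation on the slide: the variable names are made up for illustration, and gender is assumed to be coded 0 = female, 1 = male as the sums imply. Note that the slide's p-hat values add up to 0.95; in a fitted Cox model the weights over a risk set are normalized to sum to 1, so the numbers here are illustrative rather than output of a real fit.

```r
# Risk set at event time = 7 months (values copied from the example)
age    <- c(55, 45, 67, 58, 70)
male   <- c(0, 1, 0, 1, 1)                    # assumed coding: 0 = female, 1 = male
p_hat  <- c(0.10, 0.05, 0.30, 0.20, 0.30)     # weights given on the slide
failed <- 1                                   # the 55-year-old female smoker has the event

# Expected covariate values: weighted sums over the risk set
expected_age  <- sum(age * p_hat)
expected_male <- sum(male * p_hat)

# Schoenfeld residual = observed value for the person who failed minus expected value
resid_age  <- age[failed]  - expected_age
resid_male <- male[failed] - expected_male

print(c(expected_age = expected_age, resid_age = resid_age))
print(c(expected_male = expected_male, resid_male = resid_male))
```

The unrounded age residual is -5.45, which the slide reports as -5.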

Schoenfeld residuals
Since the Schoenfeld residuals are, in principle, independent of time, a plot that shows a non-random pattern against time is evidence of violation of the proportional hazards (PH) assumption.
Plot Schoenfeld residuals against time to evaluate the PH assumption.
Regress Schoenfeld residuals against time to test for independence between residuals and time.
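
Both steps can be sketched in R with the survival package. The lung data set (shipped with survival) and the covariates age and sex are stand-ins chosen for illustration, not the data behind these slides. The plain lm() fit is an unweighted simplification; it is not the weighted test of Grambsch and Therneau.

```r
library(survival)

# Illustrative Cox model on the lung data set from the survival package
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Schoenfeld residuals: one row per observed event, one column per covariate
sch <- residuals(fit, type = "schoenfeld")

# Event times in increasing order (status == 2 marks a death in lung)
event_time <- sort(lung$time[lung$status == 2])

# Plot the residuals for age against time, with a smoother and a zero line
plot(event_time, sch[, "age"],
     xlab = "Time (days)", ylab = "Schoenfeld residual (age)")
lines(lowess(event_time, sch[, "age"]))
abline(h = 0, lty = 2)

# Simple unweighted regression of the residuals on time
fit_time <- lm(sch[, "age"] ~ event_time)
print(summary(fit_time)$coefficients)
```

A slope close to zero with a large p-value is what the PH assumption predicts for that covariate.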

TEST OF PH ASSUMPTION
If the effect of an independent variable meets the proportional hazards assumption, the smoothed values of a quantity called scaled Schoenfeld residuals are roughly horizontal when plotted against survival time.
Grambsch and Therneau (1994) demonstrated that a test of non-zero slope in a weighted regression of the residuals on time can test for non-proportional hazards.
Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994; 81: 515-526.

Example: no pattern with time

Example: violation of PH

Summary of the many ways to evaluate PH assumption…
1. Examine log(-log(S(t))) plots
PH assumption is supported by parallel lines and refuted by lines that cross or nearly cross.
Must use categorical predictors or categories of a continuous predictor.
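
A log(-log(S(t))) plot can be drawn from Kaplan-Meier curves with plot.survfit and fun = "cloglog". The sketch below again uses the lung data from the survival package as an assumed example, with sex as the categorical predictor; the colors and legend labels are arbitrary.

```r
library(survival)

# Kaplan-Meier curves for each level of a categorical predictor
km <- survfit(Surv(time, status) ~ sex, data = lung)

# fun = "cloglog" plots log(-log(S(t))) against time on a log scale
plot(km, fun = "cloglog", col = c("blue", "red"),
     xlab = "Time (days, log scale)", ylab = "log(-log(S(t)))")
legend("topleft", legend = c("sex = 1", "sex = 2"),
       col = c("blue", "red"), lty = 1)
```

Roughly parallel curves are consistent with proportional hazards for that predictor.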

Summary of the many ways to evaluate PH assumption…
2. Include interaction with time in the model
PH assumption is supported by a non-significant interaction coefficient and refuted by a significant interaction coefficient.
Retaining the interaction term in the model corrects for the violation of PH.
Don’t complicate your model in this way unless it’s absolutely necessary!

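Testing Proportional Hazards
Model without time interactions: λ(t) = λ0(t) exp{β1 age + β2 drug}
Model with time interactions: λ(t) = λ0(t) exp{β1 age + β2 drug + β3 age*ln(t) + β4 drug*ln(t)}
Look at the p-values associated with β3 and β4 (Wald tests).
Do a partial likelihood ratio test comparing the two models (2 degrees of freedom, one for each interaction term).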
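
One way to fit the two models in R is the tt() time-transform facility of coxph. The sketch below is an assumption-laden illustration: it uses the lung data from the survival package, with sex standing in for the drug indicator, and builds the ln(t) interactions through a hypothetical helper named log_time. The likelihood ratio statistic is computed by hand from the fitted partial log-likelihoods.

```r
library(survival)

# Model without time interactions
fit0 <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Hypothetical helper: covariate multiplied by ln(t)
log_time <- function(x, t, ...) x * log(t)

# Model with age*ln(t) and sex*ln(t) interactions (one tt function per tt term)
fit1 <- coxph(Surv(time, status) ~ age + sex + tt(age) + tt(sex),
              data = lung, tt = list(log_time, log_time))

# Wald tests for the interaction coefficients (beta3 and beta4)
wald <- summary(fit1)$coefficients[c("tt(age)", "tt(sex)"), ]
print(wald)

# Partial likelihood ratio test, 2 degrees of freedom
lr_stat <- 2 * (fit1$loglik[2] - fit0$loglik[2])
lr_p    <- pchisq(lr_stat, df = 2, lower.tail = FALSE)
print(c(lr_stat = lr_stat, p_value = lr_p))
```

Small p-values for the tt() terms, or for the likelihood ratio test, point to non-proportional hazards.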

Summary of the many ways to evaluate PH assumption…
3. Plot Schoenfeld residuals
PH assumption is supported by a random pattern with time and refuted by a non-random pattern.

Summary of the many ways to evaluate PH assumption…
4. Regress Schoenfeld residuals against time to test for independence between residuals and time
PH assumption is supported by a non-significant relationship between residuals and time, and refuted by a significant relationship.

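Checking the proportional hazards assumption of the Cox model using Schoenfeld residuals
R code (Cox.fit is a previously fitted coxph model from the survival package):
Cox.resid <- cox.zph(Cox.fit)   # PH test based on scaled Schoenfeld residuals
plot(Cox.resid)                 # smoothed residuals against time, one panel per covariate
R output:
        rho    chisq  p
mut     0.105  0.798  0.372
sex     0.130  1.139  0.286
GLOBAL  NA     2.025  0.363
None of the p-values is below 0.05, so there is no evidence against proportional hazards for mut, for sex, or for the model as a whole.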
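
For readers who want to run the check end to end, here is a self-contained sketch. The data behind the slide's output (with covariates mut and sex) are not available, so the lung data set from the survival package is used as a stand-in and the numbers will differ. Recent versions of survival print chisq, df, and p columns and may not show the rho column seen in the slide's output.

```r
library(survival)

# Stand-in model; the slide's own data are not available
cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Test of the PH assumption: one row per covariate plus a GLOBAL row
cox_resid <- cox.zph(cox_fit)
print(cox_resid)

# Smoothed scaled Schoenfeld residuals against time, one plot per covariate
plot(cox_resid)

# p-values as a named vector, for programmatic checks
p_values <- cox_resid$table[, "p"]
print(p_values)
```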