
We’ll now look at the relationship between a survival variable Y and an explanatory variable X; e.g., Y could be remission time in a leukemia study and X could be white blood cell count. X is sometimes called the covariate or the regressor variable. Often there is more than one X variable, so we write $X^T = (X_1, \dots, X_p)$ when there are p explanatory variables (T = transpose). We write $Y_x$ for the response Y when X = x. Def 8.1: Let $Y_x$ denote the response depending on an observed vector X = x. A proportional hazards model for $Y_x$ is $h_x(y) = h_0(y)\,g_1(x)$, where $g_1$ is a positive function of x and $h_0(y)$ is called the baseline hazard and represents the hazard function for an individual having $g_1(x) = 1$. Often $g_1(x) = \exp(\beta_1 x_1 + \dots + \beta_p x_p)$.

Note how the “proportional” enters the picture (see p. 144 for definitions): for two covariate vectors $x$ and $x^*$, $h_x(y)/h_{x^*}(y) = g_1(x)/g_1(x^*)$. The two hazards are for two different individuals, distinguished by the values the explanatory variables take on for them. Note that the “baseline” hazard cancels out. In the simplest case, we work with the situation where the $g_1$ function is $\exp(\beta_1 x_1 + \dots + \beta_p x_p)$. It satisfies the properties $g_1(x) \ge 0$ and $g_1(0) = 1$, and the baseline hazard occurs when x = 0. Fitting this model follows the usual process of finding the best estimates of the beta values…

Then the standard proportional hazards model in Def. 8.1 becomes: $h_x(y) = h_0(y)\exp(\beta_1 x_1 + \dots + \beta_p x_p)$. The baseline hazard is then the hazard when x = 0 (all covariates = 0). We’ll estimate the betas using the given responses and covariates. NOTE: The hazard on the left equals the product of two functions: the baseline hazard (which doesn’t involve the covariates) and the other factor (which doesn’t involve the survival time y). This is called the Cox proportional hazards model. Good estimates of the betas and of the hazard and survival curves can be obtained in many different situations; i.e., this model is very robust. It is called semiparametric since we don’t have to assume a particular model for the survival function.
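A small numerical sketch of the cancellation described above: whatever the baseline hazard is, the ratio of hazards for two individuals depends only on their covariates. The Weibull baseline, the coefficient values, and the covariate vectors below are all hypothetical choices made for illustration; they are not estimates from the remission data.

```python
import math

def baseline_hazard(y, shape=1.5, scale=10.0):
    """Hypothetical Weibull baseline hazard h0(y); any positive function would do."""
    return (shape / scale) * (y / scale) ** (shape - 1)

def hazard(y, x, betas):
    """Proportional hazards model: h_x(y) = h0(y) * exp(beta_1*x_1 + ... + beta_p*x_p)."""
    linear_predictor = sum(b * xi for b, xi in zip(betas, x))
    return baseline_hazard(y) * math.exp(linear_predictor)

betas = [1.2, 0.8]   # hypothetical coefficients (illustration only)
x_a = [1, 3.0]       # individual A: first covariate = 1
x_b = [0, 3.0]       # individual B: first covariate = 0, same second covariate

# The ratio h_A(y) / h_B(y) evaluated at several times y.
ratios = [hazard(y, x_a, betas) / hazard(y, x_b, betas) for y in (1.0, 5.0, 20.0)]
print(ratios)  # every entry equals exp(1.2), whatever the time y
```

Swapping in a different baseline_hazard leaves ratios unchanged, which is the sense in which the model is semiparametric.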

Let’s look at Example 8.1, where there is only one covariate, namely “group” (usually control and experimental are the only two values). With group coded as x = 0 or 1, the proportional hazard (or the hazard ratio) is $h_1(y)/h_0(y) = \exp(\beta)$. So, if we could get an estimate of $\beta$ (call it $\hat\beta$), we could then have an estimate of the hazard ratio between two individuals in the two groups; i.e., $\exp(\hat\beta)$, so we could say that the hazard in the x = 1 group is estimated to be $\exp(\hat\beta)$ times the hazard in the x = 0 group.

Note on page 145 in (8.3) that the proportional hazards model has a so-called “power” effect on the baseline survival function: $S_x(y) = [S_0(y)]^{g_1(x)}$. Example 8.1 shows the effect of a single covariate X = group: $S_1(y) = [S_0(y)]^{\exp(\beta)}$. Notice also that the ratio of two hazards cancels out the baseline hazard and leaves a function that is constant over time.
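The “power” effect follows in two steps from the standard identity linking a survival function to its hazard, $S(y) = \exp(-\int_0^y h(u)\,du)$. The derivation below is a sketch that assumes this identity and the model $h_x(y) = h_0(y)\,g_1(x)$ of Def. 8.1; it is not copied from the text’s page 145.

```latex
\begin{aligned}
S_x(y) &= \exp\!\left(-\int_0^y h_x(u)\,du\right)
        = \exp\!\left(-g_1(x)\int_0^y h_0(u)\,du\right) \\
       &= \left[\exp\!\left(-\int_0^y h_0(u)\,du\right)\right]^{g_1(x)}
        = \left[S_0(y)\right]^{g_1(x)} .
\end{aligned}
```

With a single 0/1 group covariate and $g_1(x) = \exp(\beta x)$, this gives $S_1(y) = [S_0(y)]^{\exp(\beta)}$, so $\beta > 0$ pushes the survival curve for the x = 1 group below the baseline curve.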

SAS has a procedure that easily estimates the betas in the proportional hazards model. For example, in the remission times data:

    proc phreg;
      model remtime*censor(0)=grp;
    run;

    /* or if we put a second covariate in */
    proc phreg;
      model remtime*censor(0)=grp logWBC;
    run;

    /* note the use of the numeric variable grp,
       defined as grp=1 if group="pl" and 0 otherwise */

Now let’s consider the remission data example in more detail. Get the SAS output for the 3 models:
–grp only (model 1)
–grp and logWBC (model 2)
–grp, logWBC, and interaction term grp*logWBC (model 3)

For each model, we’ll do three things:
–do a statistical test of the null hypothesis beta=0
–get an estimate of the hazard ratio for each beta
–get a 95% confidence interval for the hazard ratio

There are two statistics we can compute to do a significance test of the betas:
–the Wald statistic is the estimator (beta-hat) divided by the standard error of the estimator. This statistic is approximately standard normal under the null hypothesis, and the p-value is obtained from the normal table.

–the second statistic is the so-called likelihood ratio (LR) statistic, which is used to compare models.

Use these statistics to compare model 3 with model 2; i.e., is the interaction term significant?
–the Wald statistic is -.342/.520 = -.66. The null hypothesis being tested is that beta=0 (for the coefficient of the interaction term). Use the normal table to see that 2*P(Z<-.66)=2(.2554)=.5108
–the LR statistic is computed as the difference between the -2LOG(L) values of the two models (written LR(model) here): LR(model 2) - LR(model 3) = .428. Now consider this as chi-square with 1 d.f. (one parameter difference between the two models) under the null hypothesis that the interaction term has coefficient zero, and we have P(chisq(1) > .428) = .513. Both tests agree that the interaction term is not significant.
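The two p-values above can be cross-checked without a normal or chi-square table. The sketch below uses only Python’s standard library and the numbers quoted on this slide (estimate -.342, standard error .520, LR difference .428). The helper names are illustrative, and the chi-square shortcut is valid for 1 d.f. only, where chi-square is the square of a standard normal.

```python
import math

def two_sided_normal_p(z):
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

def chi2_1df_upper_tail(x):
    """P(chi-square with 1 d.f. > x); uses the fact that chi-square(1) = Z squared."""
    return math.erfc(math.sqrt(x / 2))

# Wald test for the interaction coefficient in model 3 (values from the slide).
beta_hat = -0.342
std_err = 0.520
wald_z = beta_hat / std_err
wald_p = two_sided_normal_p(wald_z)

# Likelihood ratio test: difference in -2LOG(L) between model 2 and model 3.
lr_stat = 0.428
lr_p = chi2_1df_upper_tail(lr_stat)

print(f"Wald z = {wald_z:.3f}, two-sided p = {wald_p:.4f}")
print(f"LR chi-square = {lr_stat:.3f}, p = {lr_p:.4f}")
```

Both p-values come out near .51, matching the table lookups on the slide.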

Notice that in each of the three printouts, there is a section giving the values of three test statistics testing the so-called “Global Null Hypothesis: BETA=0”. In this case, BETA=0 refers to the vector of all the betas. The likelihood ratio chi-square statistic is obtained by subtracting the two -2LOG(L) statistics (the one without covariates {no x’s} minus the one with covariates). If the null hypothesis is true, then this chi-square will have d.f. equal to the number of covariates in the model. This same difference in -2LOG(L) values can be used to compare any two nested models: the statistic is chi-square with d.f. equal to the difference in the number of covariates, assuming the null hypothesis that the “extra” betas = 0 is true.

Now let’s look at the HRs in each of the three models. In model 1, the estimated HR is given in the SAS output. Let’s see how this is done: we’ve seen that HR = $\exp(\beta)$, so if X=1 is the placebo group, SAS gives the maximum likelihood estimate $\hat\beta$, and $\exp(\hat\beta)$ is the estimated hazard ratio. This means that the hazard for an individual in the placebo group is more than 4.5 times the hazard for an individual in the treatment group (at all times), ignoring logWBC.

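Consider Model 2’s hazard ratios. Writing $\beta_1$ for the coefficient of grp and $\beta_2$ for the coefficient of logWBC, the estimated HR for grp (adjusted for logWBC) is $\exp(\hat\beta_1)$, and the estimated HR for a one-unit increase in logWBC (adjusted for grp) is $\exp(\hat\beta_2)$. If we had a significant interaction term, with coefficient $\beta_3$ as in model 3, the estimated HR for grp would depend on logWBC and could be written $\exp(\hat\beta_1 + \hat\beta_3\,\text{logWBC})$.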

To get confidence intervals around the estimated HRs, we use beta-hat +/- 1.96*SE(beta-hat) to get 95% confidence intervals for the betas, then exponentiate the endpoints to get CIs for the HRs. To get the adjusted survival curves for the two groups (adjusted for the covariates, i.e., using model 2), we use the baseline statement in proc phreg:

    proc phreg;
      model remtime*censor(0)=grp logWBC;
      title "Model 2";
      baseline out=a survival=s upper=ucl lower=lcl;
    proc print data=a;
    run;
    quit;
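The confidence-interval recipe (beta-hat plus or minus 1.96 standard errors, then exponentiate both endpoints) is short enough to sketch directly. The function name is illustrative. The inputs reuse the interaction coefficient and standard error quoted earlier (-.342 and .520), only because those are the one estimate and standard error pair given in these slides.

```python
import math

def hazard_ratio_ci(beta_hat, std_err, z=1.96):
    """Return (HR, lower, upper): exponentiate beta_hat and the ends of its Wald interval."""
    lower_beta = beta_hat - z * std_err
    upper_beta = beta_hat + z * std_err
    return math.exp(beta_hat), math.exp(lower_beta), math.exp(upper_beta)

# Interaction coefficient from model 3 (values quoted in the slides).
hr, lower, upper = hazard_ratio_ci(-0.342, 0.520)
print(f"HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval is built on the log scale, it is not symmetric around the HR itself. An interval that contains 1 corresponds to a Wald test that does not reject beta=0 at the 5% level.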

To get the adjusted survival curves for specific values of the covariates, first create a dataset with the values you want to consider and then use the covariates= option as follows:

    …
    data b;
      grp=1;
      logWBC=2.93;
    run;
    …
    proc phreg data=remission;
      model remtime*censor(0)=grp logWBC;
      baseline out=a survival=s upper=ucl lower=lcl covariates=b/nomean;
    proc print data=a;
    run;
    quit;
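What the covariate-specific curve represents can be illustrated with the “power” relation $S_x(y) = [S_0(y)]^{\exp(\beta_1 x_1 + \dots + \beta_p x_p)}$. The sketch below is not a reimplementation of proc phreg: the baseline survival values and the coefficients are hypothetical placeholders, and only the covariate pattern grp=1, logWBC=2.93 is taken from the slide.

```python
import math

def adjusted_survival(baseline_survival, betas, covariates):
    """Raise each baseline survival value S0(y) (the curve at x = 0) to exp(beta . x)."""
    linear_predictor = sum(b * x for b, x in zip(betas, covariates))
    power = math.exp(linear_predictor)
    return [s0 ** power for s0 in baseline_survival]

# Hypothetical baseline survival probabilities at increasing times (illustration only).
s0_curve = [1.0, 0.98, 0.95, 0.90]
# Hypothetical coefficients for (grp, logWBC); NOT the fitted values from SAS.
betas = [1.0, 1.5]

# Covariate pattern from the slide's dataset b: grp=1, logWBC=2.93.
curve_b = adjusted_survival(s0_curve, betas, [1, 2.93])
print(curve_b)
```

With all covariates set to 0 the function returns the baseline curve unchanged, and a positive linear predictor lowers the curve at every time point.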