Additional Regression techniques Scott Harris October 2009.

2 Learning outcomes
By the end of this session you should:
–be aware of 2 additional regression techniques: Cox regression and logistic regression;
–know when these techniques are applicable;
–be able to interpret the results from these regression techniques.

3 Contents
Cox Regression
–Assumptions behind the model
–Fitting Cox regression models in SPSS
–Interpreting the model
–Testing the assumptions: log-log plot; plots of partial residuals against rank time
Logistic Regression
–When to use it
–‘How to’ in SPSS
–Interpreting the output

Cox regression

5 Cox regression models time-to-event data in the presence of censored cases. It allows the inclusion of predictor variables (covariates), which can be categorical or continuous. It can be extended to allow for time-dependent covariates (not covered here). It is also known as the Cox proportional hazards model or Cox model.

6 Hazard functions [Figure: hazard functions; axis label: Hazard]

7 Hazard rates & ratios The hazard rate is the probability that if the event in question has not already occurred, it will occur in the next time interval, divided by the length of that interval. This time interval is made very short, so that in effect the hazard rate represents an instantaneous rate. The hazard ratio is an estimate of the ratio of the hazard rate in the treated versus the control group.
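The verbal definitions above are commonly written in symbols as follows. The notation is an assumption for illustration and does not appear in the slides: T is the time to the event, h(t) is the hazard rate at time t, Δt is the length of the time interval, and the subscripts label the treated and control groups.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
\mathrm{HR}(t) = \frac{h_{\text{treated}}(t)}{h_{\text{control}}(t)}
```

The conditioning on T ≥ t is the "has not already occurred" part of the definition, and the limit is what makes the rate instantaneous.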

8 Cox regression: PH assumption Assumption of proportional hazards (PH): the ratio of the hazards between groups is constant over time, so the hazards do not vary differently over time. This can be assessed graphically by looking at the log-log plot: if the PH assumption holds then the curves should be approximately parallel. The partial (Schoenfeld) residuals can also be examined: if PH holds then the plot of the residuals should be roughly horizontal and centered close to 0.
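One way to see where the assumption comes from is the usual way of writing the Cox model. The symbols are assumptions for illustration and are not taken from the slides: h_0(t) is an unspecified baseline hazard, x_1, ..., x_p are the covariates and β_1, ..., β_p their coefficients.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)
\quad\Longrightarrow\quad
\frac{h(t \mid x_1 = a + 1,\, x_2, \dots, x_p)}{h(t \mid x_1 = a,\, x_2, \dots, x_p)} = \exp(\beta_1)
```

The baseline hazard cancels in the ratio, so the hazard ratio for a one-unit difference in a covariate does not depend on t. That is the proportional hazards assumption, and exp(β) is the hazard ratio that is reported for each covariate.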

9 SPSS – Cox regression Analyze → Survival → Cox Regression…

10 SPSS – Cox regression

11 SPSS – Cox regression
* Cox regression adjusted for age.
COXREG Time
  /STATUS=Status(1)
  /CONTRAST (Group)=Indicator(1)
  /METHOD=ENTER Age Group
  /SAVE=PRESID
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).

12 Info: Cox regression in SPSS
1) From the menus select ‘Analyze’ → ‘Survival’ → ‘Cox Regression…’.
2) Put the variable containing the time into the ‘Time:’ box.
3) Put the categorical variable that indicates whether a case had the event of interest or not into the ‘Status:’ box. Then click the ‘Define Event…’ button and enter the single value or range of values that indicates that the event occurred. Click ‘Continue’.
4) Add any other variables that you would like included in your model into the ‘Covariates:’ box.
5) If any of the variables that were included in the ‘Covariates:’ box are categorical then click the ‘Categorical…’ button. Each of these variables then needs to be moved to the ‘Categorical Covariates:’ box. In the ‘Change Contrast’ box decide, for each variable, whether the reference category should be the first or last level and make any changes if appropriate. Click ‘Continue’.
6) Click the ‘Save…’ button and tick the ‘Partial Residuals’ option in the ‘Diagnostics’ box. Click ‘Continue’.
7) Click the ‘Options’ button and tick the ‘CI for exp(β):’ option in the ‘Model Statistics’ box. Click ‘Continue’.
8) Finally click ‘OK’ to produce the test results or ‘Paste’ to add the syntax for this into your syntax file.

13 SPSS – Cox regression: Output This table, in conjunction with how the contrast was set up, defines how you should interpret the output for the categorical variables. Here the reference category was set up as the first level, which sets Group A as the reference. The output gives the hazard ratio for being in Group B relative to Group A (reference), with CI and p value, and the hazard ratio for each unit increase in Age, with CI and p value.

14 SPSS – Cox regression
            Hazard ratio (95% CI)    p value
Age         1.78 (1.21, 2.61)        0.003
Group B     8.80 (1.34, 57.94)       0.024
Here you can see that the hazard is 78% higher for each additional year of age and this effect is highly significant (p=0.003). Having adjusted for age, there also appears to be a clear difference between the groups, with a hazard ratio for Group B relative to Group A of 8.80 (95% CI: 1.34 to 57.94; p=0.024). Notice that this confidence interval is very wide and that the lower limit suggests that the true hazard ratio may be as low as 1.34.
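The arithmetic behind this interpretation can be sketched in a few lines of Python. This is a minimal illustration, not SPSS output: the function names are hypothetical, and it assumes the confidence intervals are symmetric on the log scale (Wald intervals with z = 1.96), which is what allows the coefficient, its standard error and an approximate p value to be recovered from the reported hazard ratio and CI.

```python
import math


def percent_change(hazard_ratio):
    """Percentage change in the hazard implied by a hazard ratio."""
    return (hazard_ratio - 1.0) * 100.0


def wald_summary(hazard_ratio, ci_lower, ci_upper, z_crit=1.96):
    """Recover the log-scale coefficient, its approximate standard error
    and a two-sided Wald p value from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(beta +/- z_crit * se).
    """
    beta = math.log(hazard_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Values taken from the table on this slide.
age_change = percent_change(1.78)
age_beta, age_se, age_p = wald_summary(1.78, 1.21, 2.61)
group_beta, group_se, group_p = wald_summary(8.80, 1.34, 57.94)

print(f"Age: {age_change:.0f}% higher hazard per year, approx p = {age_p:.3f}")
print(f"Group B: beta = {group_beta:.2f}, se = {group_se:.2f}, approx p = {group_p:.3f}")
```

Under these assumptions the recovered p values come out close to the reported 0.003 and 0.024, and the much larger standard error for Group B is what produces its very wide confidence interval.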

15 SPSS – Cox regression
            Hazard ratio (95% CI)    p value
Group B     2.56 (0.74, 8.82)        0.136
If we take Age out of the model then the effect of the groups is reduced, with Group B having an increased hazard relative to Group A (hazard ratio 2.56, 95% CI: 0.74 to 8.82), which is now not statistically significant at the 5% level (p=0.136). Model selection for survival models is as important as it is for other modelling procedures and needs to be thought about carefully.

16 The PH assumption: Log-log plot The log-log plot is one way to assess graphically whether the assumption of proportional hazards is reasonable. If the assumption holds then the separate lines on the log-log plot should be approximately parallel to each other.
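Why parallel lines indicate proportional hazards can be checked numerically. Under proportional hazards the survival curve of one group equals the other group's survival curve raised to the power of the hazard ratio, so the curves of ln(-ln S(t)) differ by the constant ln(HR). The sketch below uses made-up survival probabilities and a made-up hazard ratio purely for illustration.

```python
import math


def log_minus_log(survival):
    """Return ln(-ln S) for a survival probability strictly between 0 and 1."""
    return math.log(-math.log(survival))


# Hypothetical survival probabilities for the reference group at five times.
reference_survival = [0.95, 0.85, 0.70, 0.50, 0.30]
hazard_ratio = 2.0  # hypothetical constant hazard ratio

# Under proportional hazards: S_comparison(t) = S_reference(t) ** HR.
comparison_survival = [s ** hazard_ratio for s in reference_survival]

# Vertical gap between the two log-log curves at each time point.
gaps = [
    log_minus_log(s_cmp) - log_minus_log(s_ref)
    for s_ref, s_cmp in zip(reference_survival, comparison_survival)
]

for s_ref, gap in zip(reference_survival, gaps):
    print(f"S_ref = {s_ref:.2f}  gap = {gap:.4f}")
```

Every gap equals ln(2), which is the constant vertical separation that shows up as parallel lines on the plot. Curves that converge, diverge or cross suggest that the hazard ratio is changing over time.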

17 SPSS – The PH assumption: Log-log plot
To produce an accurate log-log plot in SPSS you need to define the categorical variable as a strata variable.
* Log-log plot.
COXREG Time
  /STATUS=Status(1)
  /STRATA=Group
  /METHOD=ENTER Age
  /PLOT LML
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).

18 Info: Cox regression: Log-log plot in SPSS
1) Follow the information sheet on producing a Cox regression, but stop after point 5.
2) To produce the log-log plot we need to remove the most important categorical variable from the ‘Covariates:’ box and put it into the ‘Strata:’ box instead. This variable is quite often the groups that we are looking to compare.
3) Once a variable is in the ‘Strata:’ box, click on the ‘Plots…’ button. Tick the option for the ‘Log minus log’ plot in the ‘Plot Type’ box. Click ‘Continue’.
4) Finally click ‘OK’ to produce the plot or ‘Paste’ to add the syntax for this into your syntax file.

19 SPSS – The PH assumption: Log-log plot Not enough cases in each stratum → dataset too small

20 SPSS – Cox regression: Aside Aside: Strata. Fitting the group variable as a strata variable instead of as a covariate, with no other covariates in the model, replicates the Kaplan-Meier plot if we ask for the survival plot.

21 SPSS – The PH assumption: Residual plots
Plot each set of partial residuals against rank time. If the PH assumption has not been violated then each of the plots:
–Should not show a clear trend over time (i.e. not drastically increasing or decreasing).
–Should be centered close to 0.
* Creating the ranks.
RANK VARIABLES=Time (A)
  /RANK
  /PRINT=YES
  /TIES=MEAN.
* Producing the scatter graphs.
GRAPH
  /SCATTERPLOT(BIVAR)=RTime WITH PR1_1
  /MISSING=LISTWISE.
GRAPH
  /SCATTERPLOT(BIVAR)=RTime WITH PR2_1
  /MISSING=LISTWISE.
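For readers unsure what ranking with /TIES=MEAN does to the time variable, the following is a minimal Python sketch of ascending ranks in which tied values share the mean of the ranks they span. The function name and the example times are hypothetical, and the code illustrates the idea rather than reproducing the SPSS implementation.

```python
def mean_ranks(values):
    """Ascending ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` to cover the whole run of values tied with order[pos].
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end correspond to ranks pos+1..end+1; take their mean.
        shared_rank = (pos + 1 + end + 1) / 2.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


# Hypothetical survival times; two cases share the time 12.
times = [5, 12, 12, 20, 3]
rank_time = mean_ranks(times)
print(rank_time)
```

The two cases tied at time 12 would occupy ranks 3 and 4, so each receives 3.5. Plotting residuals against rank time rather than raw time spaces the points evenly along the x axis, which can make a trend easier to judge when the event times are unevenly spread.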

22 Info: Cox regression: Residual plots in SPSS
1) Follow the information sheet on producing a Cox regression all the way through until the end. This will save a new set of variables to the dataset that contain the residuals (you will get 1 residual variable for each covariate in the model and their names will start with PR).
2) We now need to produce a rank time variable. To do this go to ‘Transform’ → ‘Rank Cases’.
–Put the time variable into the ‘Variable(s):’ box.
–Click ‘OK’ to produce the ranks or ‘Paste’ to add the syntax for this into your syntax file.
3) Now we have the 2 elements needed to produce the scatter plots. To draw the scatterplots go to ‘Graphs’ → ‘Scatter/Dot…’, then select ‘Simple Scatter’ and click ‘Define’. Put the new rank time on the x axis and each of the residual variables in turn on the y axis.
4) Finally click ‘OK’ to produce the plot or ‘Paste’ to add the syntax for this into your syntax file.
5) You can now edit the plot to improve presentation (see Introduction course notes). It is often useful to add a horizontal reference line at 0 to aid interpretation.

23 SPSS – The PH assumption: Residual plots These plots don’t seem to indicate any obvious trend and are generally centered close to zero, but we are dealing with a very small example dataset here.

Logistic regression

25 Logistic regression Logistic regression is used when the outcome variable is binary (is categorical and has 2 levels). Allows the inclusion of predictor variables (covariates). These can be categorical or continuous. The modeling is conducted on the log odds scale but the results should be presented on the odds scale (see categorical notes). Can be extended to deal with outcomes with more than 2 levels. These models are known as multinomial or ordinal regression (not covered here).
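The relationship between the log odds scale used for modelling and the odds scale used for presentation can be sketched as follows. The baseline probability and the coefficient are made-up numbers for illustration, and the function names are hypothetical.

```python
import math


def odds(probability):
    """Odds corresponding to a probability strictly between 0 and 1."""
    return probability / (1.0 - probability)


def log_odds(probability):
    """Log odds (logit) corresponding to a probability."""
    return math.log(odds(probability))


def probability_from_log_odds(value):
    """Inverse of log_odds: map a log odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-value))


baseline_probability = 0.20   # hypothetical risk in the reference category
coefficient = math.log(2.0)   # hypothetical coefficient on the log odds scale

baseline_odds = odds(baseline_probability)
# The model adds the coefficient on the log odds scale...
new_log_odds = log_odds(baseline_probability) + coefficient
new_odds = math.exp(new_log_odds)
# ...which multiplies the odds by exp(coefficient), the odds ratio.
odds_ratio = new_odds / baseline_odds
new_probability = probability_from_log_odds(new_log_odds)

print(f"odds ratio = {odds_ratio:.2f}, new probability = {new_probability:.3f}")
```

Adding a coefficient on the log odds scale is the same as multiplying the odds by exp(coefficient), which is why the exponentiated coefficients are the quantities to report. Note that doubling the odds here does not double the probability.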

26 SPSS – Logistic regression Analyze → Regression → Binary Logistic… [Screenshot callouts: binary outcome variable; all other covariates]

27 SPSS – Logistic regression… If you have any categorical variables then you need to use the ‘Categorical…’ option to set up how to deal with these. ln_yesno is a binary yes/no variable so we move it into the ‘Categorical Covariates:’ box.

28 SPSS – Logistic regression… For each categorical variable you now need to set up which level will be the reference category. Here ‘No’ is the first category (the lowest code) and so we set this as the reference. To check the coding, right click on the variable and select ‘Variable information’.

29 SPSS – Logistic regression… Go into the options and tick the box for confidence intervals for the odds ratios.

30 Info: Logistic Regression in SPSS
1) From the menus select ‘Analyze’ → ‘Regression’ → ‘Binary Logistic…’.
2) Put the variable containing the binary outcome into the ‘Dependent:’ box.
3) Add all other variables that you would like included in your model into the ‘Covariates:’ box.
4) If any of the variables that were included in the ‘Covariates:’ box are categorical then click the ‘Categorical…’ button. Each of these variables then needs to be moved to the ‘Categorical Covariates:’ box. In the ‘Change Contrast’ box decide, for each variable, whether the reference category should be the first or last level and make any changes if appropriate. Click ‘Continue’.
5) Click the ‘Options’ button and tick the ‘CI for exp(β):’ option in the ‘Statistics and Plots’ box. Click ‘Continue’.
6) Finally click ‘OK’ to produce the test results or ‘Paste’ to add the syntax for this into your syntax file.

31 SPSS Logistic Regression: Output
The first tables of the output show:
–Information on the amount of data used in the analysis.
–The coding of the outcome. This is very important as it identifies the level of the binary outcome that is being modelled. Here the higher level is 1, which was used to indicate subjects who died within 5 years, and so this is what our model will be looking at.
–Convergence information.

32 SPSS Logistic Regression: Output… The output shows the p values, the odds ratios and the 95% confidence intervals for the odds ratios. Interpretation: Having adjusted for lymph node involvement, each additional year of age multiplies the odds of mortality within 5 years by a factor of 0.99 (95% CI 0.97 to 1.01), a decrease of about 1% per year, although this was not statistically significant (p=0.375). Having adjusted for age, subjects with lymph node involvement have their odds of mortality within 5 years increased by a factor of 2.65 (95% CI 1.49 to 4.72) compared to those with no lymph node involvement. This effect was highly statistically significant (p=0.001).
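Two points in this interpretation are easy to check by hand: an odds ratio per year compounds multiplicatively over several years, and a 95% confidence interval that contains 1 corresponds to a result that is not significant at the 5% level. The sketch below uses the figures quoted on this slide. The function names are hypothetical and the 10-year comparison is an illustrative choice that assumes the age effect is linear on the log odds scale.

```python
def odds_ratio_for_difference(odds_ratio_per_unit, units):
    """Odds ratio implied for a difference of `units` in the covariate,
    assuming a linear effect on the log odds scale."""
    return odds_ratio_per_unit ** units


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def interval_contains_one(ci_lower, ci_upper):
    """True when the CI is compatible with no effect (odds ratio of 1)."""
    return ci_lower <= 1.0 <= ci_upper


# Figures quoted on the slide.
age_or, age_ci = 0.99, (0.97, 1.01)
lymph_or, lymph_ci = 2.65, (1.49, 4.72)

ten_year_or = odds_ratio_for_difference(age_or, 10)

print(f"Age, per year: {percent_change_in_odds(age_or):+.0f}% change in odds")
print(f"Age, per 10 years: odds ratio about {ten_year_or:.2f}")
print(f"Lymph nodes: {percent_change_in_odds(lymph_or):+.0f}% change in odds")
print("Age CI contains 1:", interval_contains_one(*age_ci))
print("Lymph node CI contains 1:", interval_contains_one(*lymph_ci))
```

The age interval spans 1, matching its non-significant p value, while the lymph node interval lies entirely above 1.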

33 Summary
You should now:
–be aware of 2 additional regression techniques: Cox regression and logistic regression;
–know when these techniques are applicable;
–be able to interpret the results from these regression techniques.

