Additional Regression techniques

Presentation transcript:

Additional Regression Techniques
Scott Harris, October 2009
© Scott Harris, University of Southampton

Learning outcomes
By the end of this session you should:
- be aware of 2 additional regression techniques:
  - Cox regression
  - Logistic regression;
- know when these techniques are applicable;
- be able to interpret the results from these regression techniques.

Contents
- Cox regression
  - Assumptions behind the model
  - Fitting Cox regression models in SPSS
  - Interpreting the model
  - Testing the assumptions
    - Log-log plot
    - Plots of partial residuals against rank time
- Logistic regression
  - When to use it
  - ‘How to’ in SPSS
  - Interpreting the output

Cox regression

Cox regression
- Models time-to-event data in the presence of censored cases.
- Allows the inclusion of predictor variables (covariates). These can be categorical or continuous.
- Can be extended to allow for time-dependent covariates (not covered here).
- Also known as the Cox proportional hazards model or the Cox model.

Hazard functions
[Figure: example hazard functions; axis label: Hazard]

Hazard rates & ratios
The hazard rate is the probability that, if the event in question has not already occurred, it will occur in the next time interval, divided by the length of that interval. This time interval is made very short, so that in effect the hazard rate represents an instantaneous rate.
The hazard ratio is an estimate of the ratio of the hazard rate in the treated group to the hazard rate in the control group.
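The definition above can be illustrated numerically. The following is a minimal Python sketch, not part of the original slides, and it assumes an exponential survival model with a hypothetical constant rate of 0.2 purely for illustration. It shows the conditional probability per unit time approaching the instantaneous rate as the interval shrinks.

```python
import math

def survival(t, rate):
    """Survival function S(t) under an exponential model (an assumption of this sketch)."""
    return math.exp(-rate * t)

def approx_hazard(t, dt, rate):
    """P(event in [t, t + dt) given no event before t), divided by the interval length dt."""
    conditional_prob = (survival(t, rate) - survival(t + dt, rate)) / survival(t, rate)
    return conditional_prob / dt

rate = 0.2  # hypothetical constant hazard (events per unit time)
for dt in (1.0, 0.1, 0.001):
    # As dt gets smaller the value moves towards the instantaneous rate
    print(dt, approx_hazard(t=3.0, dt=dt, rate=rate))
```

With a wide interval the value sits below the true rate; with a very short interval it is close to it, which is the sense in which the hazard rate is "instantaneous".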

Cox regression: PH assumption
Assumption of proportional hazards: the ratio of the hazards between groups is constant over time. The hazards themselves may change, but they stay in the same proportion to each other.
- Can be assessed graphically by looking at the log-log plot: if the PH model is true then the curves should be approximately parallel.
- Can also examine the residuals (Schoenfeld residuals): if PH is true then the plot of the residuals should be roughly horizontal and close to 0.

SPSS – Cox regression
Analyze → Survival → Cox Regression…


SPSS – Cox regression
* Cox regression adjusted for age .
COXREG Time
  /STATUS=Status(1)
  /CONTRAST (Group)=Indicator(1)
  /METHOD=ENTER Age Group
  /SAVE=PRESID
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .

Info: Cox regression in SPSS
- From the menus select ‘Analyze’ → ‘Survival’ → ‘Cox Regression…’.
- Put the variable containing the time into the ‘Time:’ box.
- Put the categorical variable that indicates whether a case had the event of interest or not into the ‘Status:’ box. Then click the ‘Define Event…’ button and enter the single value or range of values that indicate that the event occurred. Click ‘Continue’.
- Add any other variables that you would like included in your model into the ‘Covariates:’ box.
- If any of the variables that were included in the ‘Covariates:’ box are categorical then click the ‘Categorical…’ button. Each of these variables then needs to be moved to the ‘Categorical Covariates:’ box. In the ‘Change Contrast’ box decide, for each variable, whether the reference category should be the first or last level and make any changes if appropriate. Click ‘Continue’.
- Click the ‘Save…’ button and tick the ‘Partial Residuals’ option in the ‘Diagnostics’ box. Click ‘Continue’.
- Click the ‘Options’ button and tick the ‘CI for exp(β):’ option in the ‘Model Statistics’ box. Click ‘Continue’.
- Finally click ‘OK’ to produce the test results or ‘Paste’ to add the syntax for this into your syntax file.

SPSS – Cox regression: Output
[Screenshot: SPSS Cox regression output, with the following annotations]
- This table, in conjunction with how the contrast was set up, defines how you should interpret the output for the categorical variables. Here the reference category was set up as the first level, which sets Group A as the reference.
- Hazard ratio for each unit increase in Age, with CI and p value.
- Hazard ratio for being in Group B, relative to Group A (reference), with CI and p value.

SPSS – Cox regression

           Hazard ratio (95% CI)    p value
Age        1.78 (1.21, 2.61)        0.003
Group B    8.80 (1.34, 57.94)       0.024

Here you can see that the hazard is 78% higher for each additional year of age, and this effect is highly significant (p=0.003). Having adjusted for age, there also appears to be a clear difference between the groups, with a hazard ratio for Group B relative to Group A of 8.80 (95% CI: 1.34 to 57.94; p=0.024). Notice that this confidence interval is very wide and that the lower limit suggests that the true hazard ratio may be as low as 1.34.
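As a rough consistency check on a table like this one, the reported hazard ratio and confidence interval can be converted back to the log scale. The Python sketch below is not from the slides. It assumes the interval is a standard Wald interval (estimate ± 1.96 standard errors on the log hazard scale) and that the p value comes from a two-sided normal test; the function name is hypothetical.

```python
import math

def wald_from_ci(ratio, lower, upper, z_crit=1.96):
    """Recover the coefficient, standard error, z and two-sided p from a ratio and its 95% CI.

    Assumes a Wald interval on the log scale (an assumption, not confirmed by the slides).
    """
    coef = math.log(ratio)                                   # B = ln(hazard ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # CI width on the log scale
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))                     # two-sided normal p value
    return coef, se, z, p

for label, hr, lo, hi in [("Age", 1.78, 1.21, 2.61), ("Group B", 8.80, 1.34, 57.94)]:
    coef, se, z, p = wald_from_ci(hr, lo, hi)
    percent_change = (hr - 1) * 100  # e.g. a ratio of 1.78 is a 78% higher hazard
    print(label, round(coef, 3), round(se, 3), round(z, 2), round(p, 4), round(percent_change))
```

Under these assumptions the recovered p values land close to those reported in the table. Because the inputs are rounded to two decimal places, small discrepancies are expected.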

SPSS – Cox regression

           Hazard ratio (95% CI)    p value
Group B    2.56 (0.74, 8.82)        0.136

If we take Age out of the model then the estimated effect of group is reduced: Group B has a hazard ratio relative to Group A of 2.56 (95% CI: 0.74 to 8.82), which is no longer statistically significant at the 5% level (p=0.136). Model selection for survival models is as important as it is for other modelling procedures and needs to be thought about carefully.

The PH assumption: Log-log plot
The log-log plot is one way to assess graphically whether the assumption of proportional hazards is reasonable. If the assumption holds, the log-log plot should show the separate lines as approximately parallel to each other.
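Why parallel lines indicate proportional hazards can be seen with a small numerical sketch. The Python below is illustrative only: it assumes a hypothetical exponential baseline survival curve and a hypothetical hazard ratio of 2.5, and uses the fact that under proportional hazards the second group's survival curve equals the baseline curve raised to the power of the hazard ratio.

```python
import math

def log_minus_log(s):
    """The log(-log S(t)) transform of a survival probability."""
    return math.log(-math.log(s))

def loglog_gap(t, baseline_rate, hazard_ratio):
    """Vertical distance between the two groups' log-log curves at time t."""
    s_ref = math.exp(-baseline_rate * t)   # hypothetical baseline survival curve
    s_other = s_ref ** hazard_ratio        # proportional hazards: S1(t) = S0(t) ** HR
    return log_minus_log(s_other) - log_minus_log(s_ref)

hazard_ratio = 2.5   # hypothetical value for illustration
for t in (1, 2, 5, 10):
    # The gap is the same at every time point and equals ln(hazard ratio)
    print(t, round(loglog_gap(t, baseline_rate=0.1, hazard_ratio=hazard_ratio), 6))
print(round(math.log(hazard_ratio), 6))
```

The constant gap is what "approximately parallel" means in practice: curves that converge, diverge or cross suggest the hazard ratio is changing over time.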

SPSS – The PH assumption: Log-log plot
To produce an accurate log-log plot in SPSS you need to define the categorical variable as a strata variable.
* Log-log plot .
COXREG Time
  /STATUS=Status(1)
  /STRATA=Group
  /METHOD=ENTER Age
  /PLOT LML
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .

Info: Cox regression: Log-log plot in SPSS
- Follow the information sheet on producing a Cox regression, but stop after point 5.
- To produce the log-log plot we need to remove the most important categorical variable from the ‘Covariates:’ box and put it into the ‘Strata:’ box instead. This variable is quite often the groups that we are looking to compare.
- Once a variable is in the ‘Strata:’ box, click on the ‘Plots…’ button. Tick the option for the ‘Log minus log’ plot in the ‘Plot Type’ box. Click ‘Continue’.
- Finally click ‘OK’ to produce the plot or ‘Paste’ to add the syntax for this into your syntax file.

SPSS – The PH assumption: Log-log plot
[Figure: log minus log plot by group]
Not enough cases in each stratum → dataset too small.

SPSS – Cox regression: Aside
Aside: Strata
Fitting the group variable as a strata variable instead of as a covariate, with no other covariates in the model, replicates the Kaplan-Meier plot if we ask for the survival plot.

SPSS – The PH assumption: Residual plots
Plot each of the residuals against rank time. If the PH assumption has not been violated then each of the plots:
- should not show a clear trend over time (i.e. not drastically increasing or decreasing);
- should be centered close to 0.
* Creating the ranks .
RANK VARIABLES=Time (A)
  /RANK
  /PRINT=YES
  /TIES=MEAN .
* Producing the scatter graphs .
GRAPH
  /SCATTERPLOT(BIVAR)=RTime WITH PR1_1
  /MISSING=LISTWISE .
GRAPH
  /SCATTERPLOT(BIVAR)=RTime WITH PR2_1
  /MISSING=LISTWISE .
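For readers unfamiliar with how tied times are ranked, the following Python sketch shows one way to compute ascending ranks where tied values share the mean of their positions, which is the behaviour requested by `/TIES=MEAN` in the syntax above. It is an illustration written for this transcript, not SPSS's own implementation, and the function name and example times are hypothetical.

```python
def mean_ranks(values):
    """Ascending ranks (1-based); tied values all receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks

times = [5, 3, 3, 10]      # hypothetical survival times with one tie
print(mean_ranks(times))   # [3.0, 1.5, 1.5, 4.0]
```

Plotting residuals against rank time rather than raw time spreads the points evenly along the x axis, so a few very long survival times do not dominate the picture.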

Info: Cox regression: Residual plots in SPSS
- Follow the information sheet on producing a Cox regression all the way through until the end. This will save a new set of variables to the dataset that contain the residuals (you will get 1 residual for each covariate in the model and their names will start with PR).
- We now need to produce a rank time variable. To do this go to ‘Transform’ → ‘Rank Cases’. Put the time variable into the ‘Variable(s):’ box. Click ‘OK’ to produce the ranks or ‘Paste’ to add the syntax for this into your syntax file.
- We now have the 2 elements needed to produce the scatter plots. To draw the scatterplots go to ‘Graphs’ → ‘Scatter/Dot…’, then select ‘Simple Scatter’ and click ‘Define’.
- Put the new rank time on the x axis and each of the residual variables in turn on the y axis.
- Finally click ‘OK’ to produce the plot or ‘Paste’ to add the syntax for this into your syntax file.
- You can now edit the plot to improve presentation (see Introduction course notes). It is often useful to add a horizontal reference line at 0 to aid interpretation.

SPSS – The PH assumption: Residual plots
These plots don’t seem to indicate any obvious trend and are generally centered close to zero, but we are dealing with a very small example dataset here.

Logistic regression

Logistic regression
- Logistic regression is used when the outcome variable is binary (i.e. categorical with 2 levels).
- Allows the inclusion of predictor variables (covariates). These can be categorical or continuous.
- The modelling is conducted on the log odds scale but the results should be presented on the odds scale (see categorical notes).
- Can be extended to deal with outcomes with more than 2 levels. These models are known as multinomial or ordinal regression (not covered here).
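The relationship between the log odds scale used for modelling and the odds scale used for reporting can be sketched in a few lines of Python. This is an illustration only; the function names and the example coefficient are hypothetical and do not come from the slides.

```python
import math

def log_odds(p):
    """Probability -> log odds (the scale on which the model is fitted)."""
    return math.log(p / (1 - p))

def prob_from_log_odds(x):
    """Log odds -> probability (the inverse transform)."""
    return 1 / (1 + math.exp(-x))

def odds_ratio(coefficient):
    """A fitted coefficient B on the log odds scale is reported as the odds ratio exp(B)."""
    return math.exp(coefficient)

print(log_odds(0.5))                      # a probability of 0.5 is even odds, log odds 0
print(prob_from_log_odds(log_odds(0.3)))  # the round trip returns the original probability
print(odds_ratio(0.7))                    # hypothetical coefficient converted to an odds ratio
```

This is why SPSS output is read from the exp(B) column: a coefficient of 0 corresponds to an odds ratio of 1, meaning no association.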

SPSS – Logistic regression
Analyze → Regression → Binary Logistic…
[Screenshot: Logistic Regression dialog, annotated ‘Binary outcome variable’ and ‘All other covariates’]

SPSS – Logistic regression…
If you have any categorical variables then you need to use the ‘Categorical…’ option to set up how to deal with these. ln_yesno is a binary yes/no variable so we move it into the ‘Categorical Covariates:’ box.

SPSS – Logistic regression…
For each categorical variable you now need to set up which level will be the reference category. To check the coding, right click the variable and select ‘Variable information’. Here ‘No’ is the first category (the lowest code) and so we set this as the reference.

SPSS – Logistic regression…
Go into the options and tick the box for confidence intervals for the odds ratios.

Info: Logistic Regression in SPSS
- From the menus select ‘Analyze’ → ‘Regression’ → ‘Binary Logistic…’.
- Put the variable containing the binary outcome into the ‘Dependent:’ box.
- Add all other variables that you would like included in your model into the ‘Covariates:’ box.
- If any of the variables that were included in the ‘Covariates:’ box are categorical then click the ‘Categorical…’ button. Each of these variables then needs to be moved to the ‘Categorical Covariates:’ box. In the ‘Change Contrast’ box decide, for each variable, whether the reference category should be the first or last level and make any changes if appropriate. Click ‘Continue’.
- Click the ‘Options’ button and tick the ‘CI for exp(β):’ option in the ‘Statistics and Plots’ box. Click ‘Continue’.
- Finally click ‘OK’ to produce the test results or ‘Paste’ to add the syntax for this into your syntax file.

SPSS Logistic Regression: Output
[Screenshot: SPSS logistic regression output, with the following annotations]
- Information on the amount of data used in the analysis.
- Very important, as this identifies the level of the binary outcome that is being modelled. Here the higher level is 1, which was used to indicate subjects who died within 5 years, and so this is what our model will be looking at.
- Convergence information.

SPSS Logistic Regression: Output…
[Screenshot: SPSS logistic regression output, annotated: odds ratios; 95% confidence intervals for the odds ratios; p values]
Interpretation:
- Having adjusted for lymph node involvement, each additional year of age multiplies the odds of mortality within 5 years by a factor of 0.99 (95% CI 0.97 to 1.01), i.e. a slight decrease, although this was not statistically significant (p=0.375).
- Having adjusted for age, subjects with lymph node involvement have their odds of mortality within 5 years increased by a factor of 2.65 (95% CI 1.49 to 4.72) compared to those with no lymph node involvement. This effect was highly statistically significant (p=0.001).
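An odds ratio multiplies odds, not probabilities, which is easy to overlook when reading results like those above. The Python sketch below applies the reported odds ratio of 2.65 to a baseline risk. The baseline 5-year mortality of 20% is a hypothetical figure chosen for illustration and does not come from the slides; the function name is also hypothetical.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Convert a baseline probability to odds, multiply by the odds ratio, convert back."""
    baseline_odds = baseline_prob / (1 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)

baseline = 0.20  # hypothetical 5-year mortality without lymph node involvement
print(apply_odds_ratio(baseline, 2.65))  # implied risk with lymph node involvement

# A per-year odds ratio compounds multiplicatively: the odds ratio for a
# 10-year age difference is the per-year odds ratio raised to the power 10.
print(0.99 ** 10)
```

Note that the implied risk is well below 2.65 times the baseline risk; the gap between odds ratios and risk ratios widens as the baseline risk increases.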

Summary
You should now:
- be aware of 2 additional regression techniques:
  - Cox regression
  - Logistic regression;
- know when these techniques are applicable;
- be able to interpret the results from these regression techniques.
