Regression: Checking the Model Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.


1 Regression: Checking the Model Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research

2 Objectives of session
- Recognise the need to check fit of the model
- Carry out checks of assumptions in SPSS for simple linear regression
- Understand predictive model
- Understand residuals

3 How is the fitted line obtained?
- Use the method of least squares (LS)
- Seek to minimise the sum of squared vertical differences between each point and the fitted line
- Results in parameter estimates, or regression coefficients, of slope (b) and intercept (a): y = a + bx
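The least squares estimates have a closed form, which can be sketched in a few lines. This is a minimal illustration, not what SPSS runs internally; the function name `fit_line` and the data are made up for the example and are not the LDL data.

```python
def fit_line(x, y):
    """Return (a, b) that minimise sum((y_i - (a + b*x_i))**2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x about its mean, and cross-products of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical differences between the points and the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = fit_line(x, y)
print(round(a, 3), round(b, 3))
```

Any other line through the same data, for example one with a slightly different slope, gives a larger value of `sum_squared_errors`.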

4 Consider the fitted line y = a + bx
[Figure: fitted line; axes: Explanatory (x), Dependent (y); intercept a marked]

5 Consider the regression of minimum LDL cholesterol achieved on age
- Select Regression
- Linear…
- Dependent (y): Min LDL achieved
- Independent (x): Age_Base

6 Output from SPSS linear regression

Coefficients (a)
Model                 Unstandardized Coefficients   Standardized Coefficients   t        Sig.
                      B         Std. Error          Beta
1  (Constant)         2.024     .105                                            19.340   .000
   Age at baseline    -.008     .002                -.121                       -4.546   .000
a. Dependent Variable: Min LDL achieved

N.B. -0.008 may look very small but represents the DECREASE in LDL achieved for each increase of one unit of age, i.e. ONE year

7 Output from SPSS linear regression (same Coefficients table as on slide 6)
H0: slope b = 0
Test: t = slope/se = -4.546 with p < 0.001, so statistically significant (SPSS computes t from the unrounded estimates; the rounded values -0.008/0.002 give -4)
Predicted LDL = 2.024 - 0.008 x Age
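The standard error in the denominator of the t statistic comes from the residual variance and the spread of x. The sketch below shows that calculation for simple linear regression; the function name and data are hypothetical, and the output will not reproduce the SPSS figures, which come from the LDL data.

```python
import math


def slope_t_statistic(x, y):
    """Return (b, se_b, t) for the slope of a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares and residual variance on n - 2 degrees of freedom
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)
    se_b = math.sqrt(residual_variance / sxx)
    return b, se_b, b / se_b


# Hypothetical illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
b, se_b, t = slope_t_statistic(x, y)
print(round(b, 3), round(se_b, 4), round(t, 2))
```

The t value is then compared with a t distribution on n - 2 degrees of freedom to obtain the p-value.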

8 Prediction equation from linear regression
Predicted LDL achieved = 2.024 - 0.008 x Age
So for a man aged 65 the predicted LDL achieved = 2.024 - 0.008 x 65 = 1.504

Age   Predicted Min LDL
45    1.664
55    1.584
65    1.504
75    1.424
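The prediction table can be reproduced directly from the fitted equation. A small sketch, using the rounded coefficients from the SPSS output; the function name is made up for the example.

```python
INTERCEPT = 2.024  # (Constant) from the SPSS Coefficients table
SLOPE = -0.008     # Age at baseline coefficient (rounded)


def predicted_min_ldl(age):
    """Predicted minimum LDL achieved for a given baseline age in years."""
    return INTERCEPT + SLOPE * age


for age in (45, 55, 65, 75):
    print(age, round(predicted_min_ldl(age), 3))
# 45 1.664
# 55 1.584
# 65 1.504
# 75 1.424
```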

9 Assumptions of Regression
1. Relationship is linear
2. Outcome variable, and hence residuals or error terms, are approx. Normally distributed

10 Use Graphs and Scatterplot to obtain the Lowess line of fit

11
1. Create the scatterplot, then double-click it to enter the chart editor
2. Choose the icon ‘Add fit line at total’
3. Then select the type of fit, such as Lowess

12 Linear assumption: fitted Lowess smoothed line
The Lowess smoothed line (red) gives a good eyeball check of the linear assumption (fitted straight line, green)
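To show what a Lowess line does, here is a simplified sketch: at each x value it fits a straight line to the nearest points, weighted by the tricube function, and takes the fitted value. It omits the robustness iterations of full Lowess and is not SPSS's implementation; the function name, the `frac` parameter and the data are assumptions made for the example.

```python
def simple_lowess(x, y, frac=0.5):
    """Tricube-weighted local linear smoother (no robustness iterations)."""
    n = len(x)
    k = min(n, max(2, round(frac * n)))  # neighbours used for each local fit
    smoothed = []
    for x0 in x:
        # Bandwidth h: distance from x0 to its k-th nearest neighbour
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        if h == 0:
            # All k nearest points share the same x: average their y values
            w = [1.0 if xi == x0 else 0.0 for xi in x]
        else:
            w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Hypothetical data that are exactly linear: the smoother returns the same line
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
smooth = simple_lowess(x, y)
```

When the true relationship is linear, the smoothed line stays close to the least squares line; systematic curvature in the smoothed line suggests the linear assumption is doubtful.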

13 Definition of a residual
A residual is the difference between the actual value and the predicted value (fitted line), i.e. the unexplained variation:
$r_i = y_i - E(y_i)$
or
$r_i = y_i - (a + b x_i)$
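The definition translates directly into code. The sketch below fits a line to made-up data, computes the residuals, and checks that their mean is zero, which always holds for a least squares line with an intercept. The function names, the data and the observed LDL value of 2.0 are hypothetical.

```python
def fit_line(x, y):
    """Least squares intercept a and slope b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residuals(x, y, a, b):
    """r_i = y_i - (a + b * x_i): actual minus predicted."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b = fit_line(x, y)
r = residuals(x, y, a, b)
mean_residual = sum(r) / len(r)  # zero up to floating-point rounding

# With the LDL equation: a hypothetical man aged 65 whose observed LDL is 2.0
ldl_residual = residuals([65], [2.0], 2.024, -0.008)[0]  # 2.0 - 1.504
```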

14 Residuals

15 To assess the residuals in SPSS linear regression, select Plots…
- Normalised (standardised) predicted value of LDL
- Normalised residual
- Select histogram of residuals and normal probability plot

16 In SPSS linear regression, select Statistics…
- Select confidence intervals for regression coefficients
- Model fit
- Select Durbin-Watson for serial correlation, and casewise diagnostics for identification of outliers

17 Output: Scatterplot of residuals vs. predicted
Note:
1) Mean of residuals = 0
2) Most of the data lie within ±3 SDs of the mean

18 Assumptions of Regression
1. Relationship is linear
2. Outcome variable, and hence residuals or error terms, are approx. Normally distributed

19 Output: Histogram of standardised residuals
Plot of residuals with normal curve superimposed

20 Output: Cumulative probability plot
Look for deviation from the diagonal line to indicate non-normality
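The points on a cumulative probability (P-P) plot can be computed by hand: sort the standardised residuals, then pair each one's observed cumulative proportion with the cumulative probability a normal distribution would give it. This is a sketch under assumptions: the plotting position (i - 0.5)/n is one common convention and SPSS may use a different one, and the residuals are made up.

```python
from statistics import NormalDist, mean, stdev


def pp_plot_points(resid):
    """Return (observed, expected) cumulative probabilities for a P-P plot."""
    m = mean(resid)
    s = stdev(resid)
    z = sorted((r - m) / s for r in resid)  # standardised, in ascending order
    n = len(z)
    std_normal = NormalDist()  # mean 0, SD 1
    # Observed: assumed plotting position (i - 0.5)/n for the i-th smallest value
    # Expected: normal cumulative probability at that standardised residual
    return [((i + 0.5) / n, std_normal.cdf(zi)) for i, zi in enumerate(z)]


# Hypothetical residuals
resid = [-1.2, -0.4, 0.1, 0.3, 1.2]
points = pp_plot_points(resid)
max_gap = max(abs(obs - exp) for obs, exp in points)  # distance from diagonal
```

Points close to the diagonal (small `max_gap`) are consistent with normality; a systematic bow or S-shape away from it is not.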

21 Output: Description of residuals
Descriptive statistics for residuals
Subjects with standardised residuals > 3. Worth investigation?

Casewise Diagnostics (a)
Case Number   Std. Residual   Min LDL   Predicted   Residual
164           5.660           5.5840    1.518153    4.0658471
209           4.395           4.5260    1.368685    3.1573148
250           3.143           3.7875    1.529325    2.2581750
268           3.064           3.8730    1.671664    2.2013357
274           3.227           4.0953    1.777153    2.3180975
362           4.095           4.5350    1.593460    2.9415398
517           3.636           4.3240    1.711788    2.6122125
849           3.968           4.3290    1.478113    2.8508873
1047          4.207           4.4360    1.413686    3.0223141
1075          3.885           4.4040    1.613219    2.7907805
1103          3.519           3.9905    1.462584    2.5279157
1229          3.016           3.7660    1.599254    2.1667456
1290          3.975           4.2345    1.379107    2.8553933
a. Dependent Variable: Min LDL achieved
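The standardised residuals in the casewise table are the raw residuals divided by the standard error of the estimate (.7184048 in the Model Summary on slide 22). A sketch of the flagging rule, using two cases from the table plus one made-up case (9999) that should not be flagged; the function name is hypothetical.

```python
STD_ERROR_OF_ESTIMATE = 0.7184048  # from the SPSS Model Summary (slide 22)


def flag_outliers(cases, threshold=3.0):
    """Return (case_number, std_residual) for cases with |std residual| > threshold.

    cases is a list of (case_number, raw_residual) pairs.
    """
    flagged = []
    for case_number, raw_residual in cases:
        std_residual = raw_residual / STD_ERROR_OF_ESTIMATE
        if abs(std_residual) > threshold:
            flagged.append((case_number, round(std_residual, 3)))
    return flagged


cases = [
    (164, 4.0658471),   # from the casewise table
    (1229, 2.1667456),  # from the casewise table
    (9999, 0.5),        # hypothetical case, well within 3 SDs
]
flagged = flag_outliers(cases)  # flags cases 164 and 1229
```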

22 Output: Model fit and serial correlation

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .121 (a)   .015       .014                .7184048                     2.034
a. Predictors: (Constant), Age at baseline

R: correlation between min LDL achieved and Age at baseline, here 0.121
R²: % variation explained, here 1.5%, not particularly high
Durbin-Watson test: serial correlation of residuals; the statistic should be approximately 2 if there is no serial correlation
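Both summary quantities are simple to compute. The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals; in simple linear regression R² is the square of the correlation R. A sketch with made-up residual sequences:

```python
def durbin_watson(resid):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    denominator = sum(e ** 2 for e in resid)
    return numerator / denominator


# Hypothetical residual sequences, in data order
alternating = [1.0, -1.0, 1.0, -1.0]  # negative serial correlation: above 2
drifting = [1.0, 1.0, 1.0, 1.0]       # positive serial correlation: near 0
dw_alternating = durbin_watson(alternating)
dw_drifting = durbin_watson(drifting)

# R squared from the reported correlation
r_squared = 0.121 ** 2  # about 0.015, i.e. 1.5% of variation explained
```

The statistic ranges from 0 to 4, so the value of 2.034 reported by SPSS sits almost exactly at the no-serial-correlation midpoint.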

23 Summary
After fitting any regression model, check assumptions:
- Functional form: linearity is the default, often not the best fit; consider quadratic…
- Check residuals for approx. normality
- Check residuals for outliers (> 3 SDs)
- All accomplished within SPSS

24 Practical on Model Checking
Read in ‘LDL Data.sav’
1) Fit an age squared term in the min LDL model and check the fit of the model compared to the linear fit (Hint: use Transform/Compute to create the age squared term and fit age and age²)
2) Fit separate linear regressions of min Chol achieved on the predictors (a) baseline Chol, (b) APOE_lin, (c) adherence
Check assumptions and interpret results

