# SADC Course in Statistics Choosing the best model (Session 08)


## Slide 2: Learning Objectives

At the end of this session, you will be able to

- use a simple descriptive approach to select the most appropriate subset of explanatory variables
- apply methods of variable selection (based on statistical tests) in a meaningful way to get the best model
- appreciate the effect on t-probabilities when xs are added or dropped from a model
- understand the dangers of using automatic selection procedures

## Slide 3: Example of choosing the best set of xs

Consider (fictitious) data from a retrospective study of patients surviving less than 4 months after being diagnosed as having acute leukaemia.

Objective: To identify factors affecting survival time.

Variables were:

- y = survival time (days) after diagnosis
- x1 = number of chemotherapy sessions
- x2 = total volume of blood transfused
- x3 = number of days of hospital care
- x4 = age of patient (years)

## Slide 5: Summary statistics for all regressions

How many possible regression models exist?

Example with x1 and x3 to show summaries:

| Source | SS | df | MS | F | Prob>F |
|---|---|---|---|---|---|
| Model | 1488.691 | 2 | 744.346 | 6.07 | 0.0188 |
| Residual | 1227.072 | 10 | 122.707 | | |
| Total | 2715.763 | 12 | 226.314 | | |

No. of parameters fitted (p) = 3

$R_p^2 = 1488.69 / 2715.76 = 0.5482$

Adjusted $R_p^2 = 1 - 122.71 / 226.31 = 0.4578$
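The two summaries can be recomputed from the ANOVA table, and the number of candidate models follows from each of the four xs being either in or out of the model. A minimal Python sketch (the variable names are illustrative and not part of the course material):

```python
from itertools import combinations

# Sums of squares and degrees of freedom from the ANOVA table (model with x1 and x3)
ss_model, ss_resid, ss_total = 1488.691, 1227.072, 2715.763
df_resid, df_total = 10, 12

r2 = ss_model / ss_total            # proportion of variation explained
ms_resid = ss_resid / df_resid      # residual mean square
ms_total = ss_total / df_total      # total mean square
adj_r2 = 1 - ms_resid / ms_total    # penalises the number of fitted parameters

# Every subset of {x1, x2, x3, x4}, including the empty (constant-only) model
xs = ["x1", "x2", "x3", "x4"]
subsets = [c for k in range(len(xs) + 1) for c in combinations(xs, k)]

print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
print(f"number of possible models = {len(subsets)}")
```

Counting subsets this way includes the model with no xs at all, which is the "None" row of an all-regressions summary.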

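## Slide 6: Descriptive approach (all regressions)

| No. of xs | p = No. of parameters | Terms in model | $R^2$ | Adj. $R^2$ | Res. M.S. |
|---|---|---|---|---|---|
| | | None | 0 | 0 | 226.3 |
| 1 | 1 | x1 | 0.534 | 0.492 | 115.1 |
| 1 | 1 | x2 | 0.666 | 0.636 | 82.4 |
| 1 | 1 | x3 | 0.286 | 0.221 | 176.3 |
| 1 | 1 | x4 | 0.675 | 0.645 | 80.4 |
| 2 | 2 | x1, x2 | 0.979 | 0.974 | 5.8 |
| 2 | 2 | x1, x3 | 0.548 | 0.458 | 122.7 |
| 2 | 2 | x1, x4 | 0.972 | 0.967 | 7.5 |
| 2 | 2 | x2, x3 | 0.847 | 0.816 | 41.5 |
| 2 | 2 | x2, x4 | 0.680 | 0.616 | 86.9 |
| 2 | 2 | x3, x4 | 0.935 | 0.922 | 17.6 |
| 3 | 3 | x1, x2, x3 | 0.982 | 0.976 | 5.4 |
| 3 | 3 | x1, x2, x4 | 0.982 | 0.976 | 5.3 |
| 3 | 3 | x1, x3, x4 | 0.981 | 0.975 | 5.7 |
| 3 | 3 | x2, x3, x4 | 0.973 | 0.964 | 8.2 |
| 4 | 4 | x1, x2, x3, x4 | 0.982 | 0.974 | 6.0 |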
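One way to read the all-regressions summary is to pick out, for each number of xs, the subset with the smallest residual mean square. The Python sketch below does this using the Res. M.S. values copied from the table; the dictionary layout and names are illustrative assumptions, not part of the course material.

```python
# Residual mean squares copied from the all-regressions table (key = xs in the model)
RES_MS = {
    (): 226.3,
    ("x1",): 115.1, ("x2",): 82.4, ("x3",): 176.3, ("x4",): 80.4,
    ("x1", "x2"): 5.8, ("x1", "x3"): 122.7, ("x1", "x4"): 7.5,
    ("x2", "x3"): 41.5, ("x2", "x4"): 86.9, ("x3", "x4"): 17.6,
    ("x1", "x2", "x3"): 5.4, ("x1", "x2", "x4"): 5.3,
    ("x1", "x3", "x4"): 5.7, ("x2", "x3", "x4"): 8.2,
    ("x1", "x2", "x3", "x4"): 6.0,
}
TOTAL_MS = RES_MS[()]  # mean square of the constant-only model

# Keep the subset with the smallest residual mean square for each model size
best_by_size = {}
for terms, ms in RES_MS.items():
    k = len(terms)
    if k not in best_by_size or ms < RES_MS[best_by_size[k]]:
        best_by_size[k] = terms

for k in sorted(best_by_size):
    terms = best_by_size[k]
    adj_r2 = 1 - RES_MS[terms] / TOTAL_MS
    label = ", ".join(terms) or "None"
    print(f"{k} xs: {label:<15} Res. M.S. = {RES_MS[terms]:6.1f}  Adj. R2 = {adj_r2:.3f}")
```

Printing the best subset of each size makes it easier to see at what point adding another x stops reducing the residual mean square.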

## Slide 7: A descriptive approach… continued

Plot $R^2$ versus the number of parameters (p) in the model.

Which model would you select on the basis of these results?

## Slide 8: A descriptive approach… continued

Alternatively, plot the residual mean square. A small residual mean square is good!

Which model would you select on the basis of the residual mean square?
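The residual mean square and the adjusted $R^2$ carry the same information. Writing $n$ for the number of observations and $p$ for the number of fitted parameters including the constant (so $p = 3$ for the model with x1 and x3), the standard definitions are:

$$
\begin{aligned}
\text{Res. M.S.}_p &= \frac{\text{Residual SS}_p}{n - p}, \qquad
\text{Total M.S.} = \frac{\text{Total SS}}{n - 1}, \\
\text{Adjusted } R_p^2 &= 1 - \frac{\text{Res. M.S.}_p}{\text{Total M.S.}}
\end{aligned}
$$

The total mean square (226.3 here) is the same for every model, so the model with the smallest residual mean square is also the model with the largest adjusted $R^2$.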

## Slide 9: An inferential approach…

Use a sequential procedure to select variables that contribute most, and significantly, to the regression model. Three popular methods exist:

- Forward selection
- Backward elimination
- Stepwise regression

## Slide 10: Forward selection…

Select the best single variable (see the all-regressions table on slide 6).

Ask: is it contributing significantly? Answer: Yes (see below).

| y | Coef. | Std. Err. | t | P>\|t\| |
|---|---|---|---|---|
| x4 | -0.73816 | 0.1546 | -4.77 | 0.001 |
| const. | 117.57 | 5.2622 | 22.34 | 0.000 |

Now consider 2-variable models with x4.
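The t and P>|t| columns can be checked by hand: t is the coefficient divided by its standard error, and the p-value is the two-sided tail area of a t distribution on the residual degrees of freedom. The sketch below uses only the Python standard library and integrates the t density numerically with Simpson's rule; a statistics package would normally do this step. The function name is hypothetical, and the residual df of 11 is inferred from the total df of 12 in the ANOVA table minus one for x4.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a t statistic, via Simpson's rule on the t density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = total * h / 3           # P(0 < T < |t|)
    return 1 - 2 * central_area

# Coefficient and standard error for x4 from the single-variable output
coef, se, df = -0.73816, 0.1546, 11
t = coef / se
p = t_two_sided_p(t, df)
print(f"t = {t:.2f}, P>|t| = {p:.4f}")
```

The same function can be reused for any row of the later outputs by changing the coefficient, standard error and residual df.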

## Slide 11: Two-variable models with x4

| Model | y | Coef. | Std. Err. | t | P>\|t\| |
|---|---|---|---|---|---|
| x4, x1 | x4 | -0.61395 | 0.04864 | -12.62 | 0.000 |
| | x1 | 1.4400 | 0.13842 | 10.40 | 0.000 |
| | const. | 103.10 | 2.1240 | 48.54 | 0.000 |
| x4, x2 | x4 | -0.45694 | 0.69595 | -0.66 | 0.526 |
| | x2 | 0.31090 | 0.74861 | 0.42 | 0.687 |
| | const. | 94.160 | 56.627 | 1.66 | 0.127 |
| x4, x3 | x4 | -0.72460 | 0.07233 | -10.02 | 0.000 |
| | x3 | -1.1999 | 0.18902 | -6.35 | 0.000 |
| | const. | 131.28 | 3.2748 | 40.09 | 0.000 |
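The t-ratio for a given x depends on which other xs are in the model. As a small illustration, the Python sketch below recomputes t = Coef. / Std. Err. for x4 in the single-variable model and in each two-variable model, using the values reported in the outputs (the dictionary and its labels are illustrative):

```python
# (coefficient, standard error) for x4, copied from the regression outputs
x4_estimates = {
    "x4 only": (-0.73816, 0.1546),
    "x4 + x1": (-0.61395, 0.04864),
    "x4 + x2": (-0.45694, 0.69595),
    "x4 + x3": (-0.72460, 0.07233),
}

# t-ratio = coefficient divided by its standard error
t_ratios = {model: coef / se for model, (coef, se) in x4_estimates.items()}

for model, t in t_ratios.items():
    print(f"{model:<8} t for x4 = {t:7.2f}")
```

The standard error of x4 is much larger when x2 is also in the model, which is what makes its t-ratio collapse there; this suggests that x2 and x4 carry overlapping information.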

## Slide 12: Three-variable models with x4, x1

| Model | y | Coef. | Std. Err. | t | P>\|t\| |
|---|---|---|---|---|---|
| x4, x1, x2 | x4 | -0.23654 | 0.17329 | -1.37 | 0.205 |
| | x1 | 1.4519 | 0.11700 | 12.41 | 0.000 |
| | x2 | 0.41611 | 0.18561 | 2.24 | 0.052 |
| | const. | 71.648 | 14.142 | 5.07 | 0.001 |
| x4, x1, x3 | x4 | -0.64280 | 0.04454 | -14.43 | 0.000 |
| | x1 | 1.0519 | 0.22368 | 4.70 | 0.001 |
| | x3 | -0.41004 | 0.19923 | -2.06 | 0.070 |
| | const. | 111.68 | 4.5625 | 24.48 | 0.000 |

The model with x1, x2 and x4 would be selected, despite x4 now being non-significant!
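The forward selection steps can be retraced from the all-regressions summary alone, because the F-to-enter for one added x is the drop in residual sum of squares divided by the new residual mean square (and equals the square of that x's t-ratio, apart from rounding in the table). The Python sketch below makes two assumptions that are not stated in the slides: an entry threshold of F = 4.0, a common rule of thumb, and n = 13 observations, inferred from the total df of 12. All function and variable names are illustrative.

```python
N_OBS = 13        # inferred: the ANOVA table has total df = 12
F_TO_ENTER = 4.0  # assumed entry threshold, not taken from the slides

# Residual mean squares from the all-regressions table; key "14" means x1 and x4
RES_MS = {
    "": 226.3, "1": 115.1, "2": 82.4, "3": 176.3, "4": 80.4,
    "12": 5.8, "13": 122.7, "14": 7.5, "23": 41.5, "24": 86.9, "34": 17.6,
    "123": 5.4, "124": 5.3, "134": 5.7, "234": 8.2, "1234": 6.0,
}

def label(xs):
    """Table key for a set of x indices, e.g. {4, 1} -> '14'."""
    return "".join(str(i) for i in sorted(xs))

def rss(xs):
    """Residual sum of squares = residual mean square times residual df."""
    return RES_MS[label(xs)] * (N_OBS - len(xs) - 1)

def forward_selection():
    model, steps = set(), []
    while len(model) < 4:
        # F-to-enter: drop in residual SS over the residual M.S. of the larger model
        f_enter = {
            x: (rss(model) - rss(model | {x})) / RES_MS[label(model | {x})]
            for x in {1, 2, 3, 4} - model
        }
        best = max(f_enter, key=f_enter.get)
        if f_enter[best] < F_TO_ENTER:
            break
        model.add(best)
        steps.append((best, round(f_enter[best], 2)))
    return model, steps

model, steps = forward_selection()
print("order of entry:", [f"x{x} (F = {f})" for x, f in steps])
```

Because the table values are rounded, the F statistics are approximate. A stricter entry threshold could stop the procedure one step earlier.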

## Slide 13: Backward elimination gives x1, x2

| Model | y | Coef. | Std. Err. | t | P>\|t\| |
|---|---|---|---|---|---|
| x1, x2, x3, x4 | x1 | 1.5511 | 0.74477 | 2.08 | 0.071 |
| | x2 | 0.51017 | 0.7238 | 0.70 | 0.501 |
| | x3 | 0.10191 | 0.7547 | 0.14 | 0.896 |
| | x4 | -0.14406 | 0.7091 | -0.20 | 0.844 |
| x1, x2, x4 | x1 | 1.4519 | 0.11700 | 12.41 | 0.000 |
| | x2 | 0.41611 | 0.18561 | 2.24 | 0.052 |
| | x4 | -0.23654 | 0.17329 | -1.37 | 0.205 |
| x1, x2 | x1 | 1.4683 | 0.12130 | 12.10 | 0.000 |
| | x2 | 0.66225 | 0.04585 | 14.44 | 0.000 |
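Backward elimination can be retraced in the same way from the all-regressions summary: at each stage the x with the smallest F-to-remove (the rise in residual sum of squares when it is dropped, divided by the current residual mean square) is deleted if that F falls below a threshold. In the Python sketch below, the removal threshold of 4.0 and n = 13 (from the total df of 12) are assumptions, and the names are illustrative.

```python
N_OBS = 13         # inferred: the ANOVA table has total df = 12
F_TO_REMOVE = 4.0  # assumed removal threshold, not taken from the slides

# Residual mean squares from the all-regressions table; key "14" means x1 and x4
RES_MS = {
    "": 226.3, "1": 115.1, "2": 82.4, "3": 176.3, "4": 80.4,
    "12": 5.8, "13": 122.7, "14": 7.5, "23": 41.5, "24": 86.9, "34": 17.6,
    "123": 5.4, "124": 5.3, "134": 5.7, "234": 8.2, "1234": 6.0,
}

def label(xs):
    """Table key for a set of x indices, e.g. {4, 1} -> '14'."""
    return "".join(str(i) for i in sorted(xs))

def rss(xs):
    """Residual sum of squares = residual mean square times residual df."""
    return RES_MS[label(xs)] * (N_OBS - len(xs) - 1)

def backward_elimination():
    model, removed = {1, 2, 3, 4}, []
    while model:
        # F-to-remove: rise in residual SS when x is dropped, over the current residual M.S.
        f_remove = {
            x: (rss(model - {x}) - rss(model)) / RES_MS[label(model)]
            for x in model
        }
        weakest = min(f_remove, key=f_remove.get)
        if f_remove[weakest] >= F_TO_REMOVE:
            break                      # every remaining x is worth keeping
        model.remove(weakest)
        removed.append(weakest)
    return model, removed

model, removed = backward_elimination()
print("removed in order:", [f"x{x}" for x in removed])
print("final model:", [f"x{x}" for x in sorted(model)])
```

With rounded table values an F-to-remove can come out slightly negative; this only means the term contributes essentially nothing.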

## Slide 14: Stepwise selection procedure…

This is similar to forward selection, but at each stage of the process, all xs in the model are re-assessed to check whether those that entered the model at an earlier stage still remain important.

Note: Software packages allow automatic use of one of these procedures, with pre-specified p-values for selection and deletion of variables. This is usually available only with quantitative xs.
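A stepwise procedure can be sketched by combining a forward step with a re-assessment of the xs already in the model. The Python sketch below works from the residual mean squares in the all-regressions summary rather than from raw data. It uses F thresholds in place of p-values: F-to-enter = 4.0 and F-to-remove = 3.9 are assumptions, chosen so that a variable cannot be dropped immediately after it enters, and n = 13 is inferred from the total df of 12. It is one possible reading of the procedure, not a reproduction of any particular package.

```python
N_OBS = 13         # inferred: the ANOVA table has total df = 12
F_TO_ENTER = 4.0   # assumed thresholds, not taken from the slides
F_TO_REMOVE = 3.9

# Residual mean squares from the all-regressions table; key "14" means x1 and x4
RES_MS = {
    "": 226.3, "1": 115.1, "2": 82.4, "3": 176.3, "4": 80.4,
    "12": 5.8, "13": 122.7, "14": 7.5, "23": 41.5, "24": 86.9, "34": 17.6,
    "123": 5.4, "124": 5.3, "134": 5.7, "234": 8.2, "1234": 6.0,
}

def label(xs):
    """Table key for a set of x indices, e.g. {4, 1} -> '14'."""
    return "".join(str(i) for i in sorted(xs))

def rss(xs):
    """Residual sum of squares = residual mean square times residual df."""
    return RES_MS[label(xs)] * (N_OBS - len(xs) - 1)

def stepwise():
    model, history = set(), []
    while True:
        # Forward step: add the candidate with the largest F-to-enter, if large enough
        f_enter = {
            x: (rss(model) - rss(model | {x})) / RES_MS[label(model | {x})]
            for x in {1, 2, 3, 4} - model
        }
        best = max(f_enter, key=f_enter.get, default=None)
        if best is None or f_enter[best] < F_TO_ENTER:
            break
        model.add(best)
        history.append(f"add x{best}")
        # Re-assessment: drop xs already in the model whose F-to-remove is now too small
        while model:
            f_remove = {
                x: (rss(model - {x}) - rss(model)) / RES_MS[label(model)]
                for x in model
            }
            weakest = min(f_remove, key=f_remove.get)
            if f_remove[weakest] >= F_TO_REMOVE:
                break
            model.remove(weakest)
            history.append(f"drop x{weakest}")
    return model, history

model, history = stepwise()
print(" -> ".join(history))
print("final model:", [f"x{x}" for x in sorted(model)])
```

Different thresholds can lead to a different final model, which is one reason the output of an automatic procedure should not be accepted without scrutiny.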

## Slide 15: Discussion… in small groups

Look back at the results.

- What do you observe with the forward and backward procedures? Do they give the same results?
- Did the selection using the forward procedure seem sensible, given that for x4 the p-value = 0.205?
- Can you work out what model would result from a stepwise selection procedure?
- Is it a good idea to use such automatic selection procedures available in software packages? If not, why not?

## Slide 16: Discussion continued…

- Suppose a medical researcher told you that a model without x2 was not meaningful. How would you proceed with your model selection?
- What other latent (lurking) variables, measurable or non-measurable, might affect y?
- What further steps would you undertake before accepting the final model?

Practical work follows to ensure learning objectives are achieved…
