Lecture 20 Last Lecture: Effect of adding or deleting a variable


1 Lecture 20 Last Lecture: Effect of adding or deleting a variable
a). Adding a variable: decreases bias, increases the variance of estimation, complicates the model.
b). Deleting a variable: increases bias, decreases the variance of estimation, simplifies the model.
c). The best model is obtained by balancing the squared bias against the variance.
Criteria of Model Selection (those used in the outputs below): R-Sq, R-Sq(adj), C-p, S.
Today: Procedures for Model Selection
I. Evaluating All Possible Models
II. Evaluating Part of the Models: a) Forward Selection, b) Backward Elimination, c) Stepwise
4/12/2019 ST3131, Lecture 20

2 I. Evaluating All Possible Models
Procedure: Based on some criterion, e.g. R-Sq(adj) or C-p, compare all possible models and select the best one.
Advantage: the model that is globally best under the criterion is guaranteed to be found.
Drawback: the number of models to compare grows exponentially with the number of predictors (see the next slide).

3 # of all possible models with q predictor variables: 2^q
q=3: this number is 2^3 = 8:
0-variable model: YX0 (SSE0)
1-variable models: YX01, YX02, YX03 (SSE01, SSE02, SSE03)
2-variable models: YX012, YX013, YX023 (SSE012, SSE013, SSE023)
3-variable model: YX0123 (SSE0123)
q=6: this number is 2^6 = 64; q=7: this number is 2^7 = 128.
Use: When q is relatively small, say q<6. When q is large, say q=10, we have 2^10 = 1024 models to evaluate. This is too expensive to be practical or feasible.
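The count 2^q arises because each predictor is either in or out of the model. A minimal sketch (the helper name is mine, not from the lecture) that enumerates every subset of a predictor set:

```python
from itertools import combinations

def all_subsets(predictors):
    """Enumerate every subset of the predictor set, from the
    0-variable (intercept-only) model up to the full model."""
    subsets = []
    for k in range(len(predictors) + 1):
        subsets.extend(combinations(predictors, k))
    return subsets

print(len(all_subsets(["X1", "X2", "X3"])))   # 8, i.e. 2^3
print(len(all_subsets(list(range(10)))))      # 1024, i.e. 2^10
```

This makes the practicality limit concrete: the list to evaluate doubles with every added predictor.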

4 The New York Rivers Data
[Minitab best-subsets output for the New York Rivers data: one row per candidate model, with columns Vars, R-Sq, R-Sq(adj), C-p, S, and indicator columns marking which of X1–X4 are included; the numeric entries did not survive extraction.]

5 II. Evaluating Part of Models
To save computation, we evaluate just part of the models. There are 3 procedures to be considered.
a). Forward Selection:
Procedure: Start with the simplest model, with NO predictor variables.
Step 1: Introduce the variable which has the largest correlation coefficient with Y.
Step 2: Introduce the variable which has the largest correlation coefficient with the Y-residuals obtained after regressing Y on the variables already introduced.
Step 3: Repeat Step 2 until the coefficient of the latest introduced variable is not significant, i.e. its absolute t-test value is smaller than the pre-determined cutoff-value.
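The forward-selection steps above can be sketched in Python. This is a simplified illustration using numpy only; the function names are mine, and significance is judged by a raw |t| cutoff (default 1, as in the lecture's remark) rather than a p-value:

```python
import numpy as np

def ols(X, y):
    """Least-squares fit with an intercept; returns coefficients,
    their t-values, and the residuals."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - Xd.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
    return beta, beta / se, resid

def forward_select(X, y, cutoff=1.0):
    """Forward selection: add the predictor most correlated with the
    current residuals; stop when the newest coefficient's |t| < cutoff."""
    remaining = list(range(X.shape[1]))
    chosen, resid = [], y - y.mean()      # start from the no-predictor model
    while remaining:
        corrs = [abs(np.corrcoef(X[:, j], resid)[0, 1]) for j in remaining]
        best = remaining[int(np.argmax(corrs))]
        _, t, new_resid = ols(X[:, chosen + [best]], y)
        if abs(t[-1]) < cutoff:           # newest variable not significant: stop
            break
        chosen.append(best)
        remaining.remove(best)
        resid = new_resid
    return chosen
```

On data where one predictor carries a strong signal, that predictor enters at Step 1, mirroring the Forest variable in the rivers example.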

6 Forward Selection: Final Model, Advantage, and Drawback
The Final Model is the model with all introduced variables except the last one, whose coefficient is not significant (its absolute t-test value is smaller than the pre-determined cutoff-value).
# of models evaluated: at most q+1 (one fit per step), far fewer than the 2^q of all possible models.
Advantage: computationally cheap.
Drawback: the resulting model may not be the globally best one. (Why?)
Remark: The pre-determined cutoff-value is often taken as 1.

7 Correlations: Nitrogen, Agr, Forest, Rsdntial, ComIndl
[Pearson correlations of Nitrogen with Agr, Forest, Rsdntial, and ComIndl, with p-values; the numeric entries did not survive extraction. Forest has the largest absolute correlation with Nitrogen, so it enters first.]
Step 1 fit: Nitrogen regressed on Forest (coefficient table lost). R-Sq = 59.8%, R-Sq(adj) = 57.6%

8 Correlations: RESI1, Agr, Rsdntial, ComIndl
[Pearson correlations of RESI1 (the residuals from Step 1) with Agr, Rsdntial, and ComIndl, with p-values; the numeric entries did not survive extraction. ComIndl has the largest absolute correlation with the residuals, so it enters next.]
Step 2 fit: Nitrogen regressed on Forest and ComIndl (coefficient table lost). R-Sq = 69.4%, R-Sq(adj) = 65.7%

9 Correlations: RESI2, Agr, Rsdntial
[Pearson correlations of RESI2 (the residuals from Step 2) with Agr and Rsdntial, with p-values; the numeric entries did not survive extraction. Agr enters next.]
Step 3 fit: Nitrogen regressed on Forest, ComIndl, and Agr (coefficient table lost). R-Sq = 70.9%, R-Sq(adj) = 65.4%

10 b). Backward Elimination
Procedure: Start with the Full Model, with ALL predictor variables.
Step 1: Delete the variable which has the smallest absolute t-test value in the Full Model, provided that coefficient is not significant, i.e. its absolute t-test value is smaller than the pre-determined cutoff-value.
Step 2: Delete the variable which has the smallest absolute t-test value in the Reduced Model, provided that coefficient is not significant, i.e. its absolute t-test value is smaller than the pre-determined cutoff-value.
Step 3: Repeat Step 2 until all the coefficients in the latest Reduced Model are significant, i.e. their absolute t-test values are larger than the pre-determined cutoff-value.
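The backward-elimination steps above can likewise be sketched with numpy (function name mine; significance again judged by a raw |t| cutoff of 1):

```python
import numpy as np

def backward_eliminate(X, y, cutoff=1.0):
    """Backward elimination: start from the Full Model and repeatedly
    drop the predictor with the smallest |t|-value until every
    remaining |t| exceeds the cutoff."""
    def abs_t(cols):
        # |t|-values of the non-intercept coefficients for an OLS fit
        Xd = np.column_stack([np.ones(len(y)), X[:, cols]])
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta
        sigma2 = resid @ resid / (len(y) - Xd.shape[1])
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
        return np.abs(beta / se)[1:]

    cols = list(range(X.shape[1]))        # the Full Model
    while cols:
        t = abs_t(cols)
        worst = int(np.argmin(t))
        if t[worst] >= cutoff:            # everything significant: stop
            break
        cols.pop(worst)                   # delete the least significant variable
    return cols
```

A predictor with a strong effect survives to the Final Model, as Forest and ComIndl do in the rivers example.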

11 Backward Elimination: Final Model, Advantage, and Drawback
The Final Model is the latest Reduced Model, from which no coefficient can be deleted.
# of models evaluated: at most q (one fit per step), far fewer than the 2^q of all possible models.
Advantage: computationally cheap, and the t-test values are directly available from each fitted model.
Drawback: the resulting model may not be the globally best one. (Why?)
Remark: The pre-determined cutoff-value is often taken as 1. Deleting the coefficient with the smallest absolute t-test value is equivalent to deleting the variable with the smallest contribution to the reduction of SSE: the F-test value for comparing the (p+1)-variable Reduced Model with the p-variable Reduced Model is exactly the square of the t-test value for the deleted coefficient.

12 Regression Analysis: Nitrogen versus Agr, Forest, Rsdntial, ComIndl
Full Model fit: Nitrogen regressed on Agr, Forest, Rsdntial, and ComIndl (coefficient table lost). R-Sq = 70.9%, R-Sq(adj) = 63.2%
After deleting the variable with the smallest absolute t-value (Rsdntial): Nitrogen regressed on Agr, Forest, and ComIndl (coefficient table lost). R-Sq = 70.9%, R-Sq(adj) = 65.4%

13 Regression Analysis: Nitrogen versus Forest, ComIndl
Reduced Model fit after deleting Agr: Nitrogen regressed on Forest and ComIndl (coefficient table lost). R-Sq = 69.4%, R-Sq(adj) = 65.7%

14 c) Stepwise
Procedure: Start with the simplest model, with NO predictor variables.
Step 1: Introduce the variable which has the largest correlation coefficient with Y.
Step 2: Introduce the variable which has the largest correlation coefficient with the Y-residuals obtained after regressing Y on the variables already introduced.
Step 3: Check if some variables in the Current Model can be deleted: delete the variable which has the smallest absolute t-test value in the Current Model, provided it is not significant, i.e. smaller than the pre-determined cutoff-value.
Step 4: Repeat Step 3 until no coefficients can be deleted.
Step 5: Check if some of the remaining variables can be introduced into the current model: introduce the variable which has the largest correlation coefficient with the current Y-residuals, provided it is significant, i.e. its absolute t-test value is larger than the pre-determined cutoff-value.
Step 6: Repeat Steps 3, 4 and 5 until no variables in the current model can be deleted and no remaining variables can be introduced.
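The six steps above combine the two earlier procedures. A compact numpy sketch (names mine; a production implementation would also guard against a variable cycling in and out when the enter and remove thresholds conflict):

```python
import numpy as np

def stepwise(X, y, enter=1.0, remove=1.0):
    """Stepwise selection: alternate a forward step (add the candidate
    most correlated with the current residuals, if its |t| >= enter)
    with backward steps (drop any variable whose |t| < remove)."""
    n = len(y)

    def fit(cols):
        # OLS with intercept: |t|-values (excluding intercept) and residuals
        Xd = np.column_stack([np.ones(n), X[:, cols]])
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta
        sigma2 = resid @ resid / (n - Xd.shape[1])
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
        return np.abs(beta / se)[1:], resid

    cols, remaining = [], list(range(X.shape[1]))
    resid = y - y.mean()
    while remaining:
        # forward step: candidate most correlated with current residuals
        corrs = [abs(np.corrcoef(X[:, j], resid)[0, 1]) for j in remaining]
        cand = remaining[int(np.argmax(corrs))]
        t, _ = fit(cols + [cand])
        if t[-1] < enter:                 # best candidate not worth entering: stop
            break
        cols.append(cand)
        remaining.remove(cand)
        # backward steps: prune anything that became insignificant
        while len(cols) > 1:
            t, _ = fit(cols)
            worst = int(np.argmin(t))
            if t[worst] >= remove:
                break
            remaining.append(cols.pop(worst))
        _, resid = fit(cols)              # residuals of the current model
    return cols
```

Unlike pure Forward Selection, a variable introduced early can later leave the model if newcomers make it redundant.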

15 Remarks:
1. The pre-determined cutoff-value is often taken as 1 for both variable entering and variable leaving.
2. For the above three procedures, one can also use an F-test pre-determined cutoff-value, since the square of the t-test is the F-test of the (p+1)-variable model over the p-variable model.

16 Stepwise: Final Model, Advantage, and Drawback
The Final Model is the model in which no variables can be deleted and to which no variables can be introduced.
# of models evaluated: more than in Forward Selection or Backward Elimination, but still far fewer than the 2^q of all possible models.
Advantage: a variable introduced early can later be removed, so the search is more thorough than Forward Selection alone.
Drawback: the resulting model may not be the globally best one. (Why?)

17 Remarks: The above Model Selection Procedures should be used with Caution.
1. They should not be used mechanically to determine the "best" variables.
2. The order in which the variables enter or leave the model should NOT be interpreted as reflecting the relative importance of the variables.
3. All three procedures often give nearly the same selection of variables for non-collinear data. This may not be the case for collinear data.
4. We recommend the Backward Elimination procedure over the Forward Selection procedure. Reasons: a) the t-test values are directly available in the Coefficient Table in the Backward Elimination procedure, while in the Forward Selection procedure we need to compute the correlation coefficients between the Y-residuals and the remaining variables; b) the Backward Elimination procedure handles the multicollinearity problem better.

18 Example: Supervisor Performance Data
Y: overall rating of job being done by supervisor
X1: handles employee complaints
X2: does not allow special privileges
X3: opportunity to learn new things
X4: raises based on performance
X5: too critical of poor performance
X6: rate of advancing to better jobs
[Pearson correlations among Y, X1, ..., X6 with p-values; the numeric entries did not survive extraction.]

19 Stepwise Regression: Y versus X1, X2, X3, X4, X5, X6
Forward selection. Alpha-to-Enter: 1
Response is Y on 6 predictors, with N = 30
[Minitab forward-selection path: one column per step, showing the Constant and, for each entered Xi, its coefficient, T-Value and P-Value, plus S, R-Sq, R-Sq(adj) and C-p at each step; the numeric entries did not survive extraction.]

20 Stepwise Regression: Y versus X1, X2, X3, X4, X5, X6
Backward elimination. Alpha-to-Remove: 0
Response is Y on 6 predictors, with N = 30
[Minitab backward-elimination path: one column per step, showing the Constant and each remaining Xi's coefficient, T-Value and P-Value, plus S, R-Sq, R-Sq(adj) and C-p at each step; the numeric entries did not survive extraction.]

21 Stepwise Regression: Y versus X1, X2, X3, X4, X5, X6
Alpha-to-Enter: 0.8 Alpha-to-Remove: 0.8
Response is Y on 6 predictors, with N = 30
[Minitab stepwise path: one column per step, showing the Constant and each current Xi's coefficient, T-Value and P-Value, plus S, R-Sq, R-Sq(adj) and C-p at each step; the numeric entries did not survive extraction.]

22 Best Subsets Regression: Y versus X1, X2, X3, X4, X5, X6
Response is Y
[Minitab best-subsets output: one row per candidate model, with columns Vars, R-Sq, R-Sq(adj), C-p, S, and indicator columns marking which of X1–X6 are included; the numeric entries did not survive extraction.]

23 Remarks about Cp statistic
1). The first term of the Cp statistic, SSEp/σ̂², decreases as p increases, while the second term, 2(p+1) − n, increases with p. The second term tends to dominate, so the Cp statistic has an increasing trend over p; indeed, the expectation of Cp is about p+1 for an unbiased p-variable model.
2). The accuracy of the noise variance estimate σ̂², which is based on the Full Model, is a key factor in the accuracy of the Cp statistic. If the Full Model contains a large number of variables with little explanatory power, the estimate σ̂² is large, and then the first term of the Cp statistic is small. In this case the Cp statistic is of limited usefulness, since a good estimate of σ² is not available from the Full Model. Thus, we should use the Cp statistic with caution.
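As a concrete check of the "expectation is p+1" remark: for the Full Model itself, Cp reduces identically to q+1, because σ̂² = SSE_full/(n − q − 1). A small sketch using the standard definition of Mallows' Cp (the numeric SSE below is illustrative, not from the lecture's data):

```python
def mallows_cp(sse_p, sigma2_full, n, p):
    """Mallows' Cp for a submodel with p predictors plus an intercept:
    Cp = SSE_p / sigma2_full - (n - 2*(p + 1)),
    where sigma2_full is the residual mean square of the Full Model."""
    return sse_p / sigma2_full - (n - 2 * (p + 1))

# Hypothetical numbers: N and q match the supervisor data, SSE is made up.
n, q, sse_full = 30, 6, 54.0
sigma2_full = sse_full / (n - q - 1)           # residual mean square of Full Model
print(round(mallows_cp(sse_full, sigma2_full, n, q), 6))  # 7.0, i.e. q + 1
```

This is why a Cp close to p+1 flags a submodel as roughly unbiased, while a Cp far above p+1 flags substantial bias.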

24 After-class questions:
1. In the Forward Selection procedure, do we always choose the most useful variables to explain more information in the response?
2. In Backward Elimination, do we always delete the most insignificant variables?
3. In Stepwise, why do we need to introduce at least two variables before doing the backward elimination?
4. Given a model selection table, how can we select a proper cutoff-value based on some statistic so that the procedure will stop at some step?

