
1 Model Selection

2 Forward selection:
1. Regress Y on each of the k potential X variables.
2. Determine the best single-variable model.
3. Regress Y on the best variable and each of the remaining k−1 variables.
4. Determine the best model that includes the previous best variable and one new best variable.
5. If the adjusted R² declines, the standard error of the regression increases, the t-statistic of the best variable becomes insignificant, or the coefficients are theoretically inconsistent, STOP and use the previous best model.
Repeat steps 2–4 until stopped or the all-variable model has been reached. A sketch of this loop in code appears below.
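This greedy loop is easy to express in code. Below is a minimal sketch in Python using statsmodels, assuming the data sit in a pandas DataFrame; the function and column names are hypothetical, and only the adjusted-R² criterion from step 5 is shown (the other stopping criteria would be checked the same way).

```python
import statsmodels.api as sm


def forward_select(df, response, candidates):
    """Greedy forward selection; stops when adjusted R^2 stops improving."""
    selected = []
    best_adj_r2 = float("-inf")
    while candidates:
        # Steps 1/3: regress Y on the current set plus each remaining candidate.
        trials = []
        for x in candidates:
            X = sm.add_constant(df[selected + [x]])
            fit = sm.OLS(df[response], X).fit()
            trials.append((fit.rsquared_adj, x))
        # Steps 2/4: the candidate giving the best single-variable improvement.
        adj_r2, best_x = max(trials)
        # Step 5 (adjusted-R^2 criterion only): stop if no improvement.
        if adj_r2 <= best_adj_r2:
            break
        selected.append(best_x)
        candidates = [x for x in candidates if x != best_x]
        best_adj_r2 = adj_r2
    return selected
```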

3 Rationale for the stopping rules:
- If the adjusted R² declines when an additional variable is added, then the added explanatory value of the variable does not outweigh its modeling cost.
- If the standard error increases, then the additional variable has not improved estimation.
- If the t-statistic of one of the variables is insignificant, then there may be too many variables.
- If the coefficients are inconsistent with theory, multicollinearity effects may be at work.

4 Backward elimination:
1. Regress Y on all k potential X variables.
2. Use t-tests to determine which X has the least significance.
3. If this X does not meet some minimum level of significance, remove it from the model.
4. Regress Y on the remaining set of k−1 X variables.
Repeat steps 2–4 until all remaining Xs meet the minimum. A sketch of this loop appears below.
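A minimal sketch of this backward pass, again with statsmodels; the function name is hypothetical, and the default 0.15 "significance level to stay" is an illustrative choice taken from slide 6.

```python
import statsmodels.api as sm


def backward_eliminate(df, response, predictors, alpha_to_stay=0.15):
    """Drop the least significant X until all remaining Xs meet the threshold."""
    predictors = list(predictors)
    while predictors:
        X = sm.add_constant(df[predictors])
        fit = sm.OLS(df[response], X).fit()
        pvals = fit.pvalues.drop("const")   # p-values of the slopes only
        worst = pvals.idxmax()              # the X with the least significance
        if pvals[worst] <= alpha_to_stay:   # all Xs meet the minimum: stop
            return fit, predictors
        predictors.remove(worst)            # otherwise drop it and re-fit
    return None, []
```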

5 The t-tests should be used one at a time. T1 can tell you to drop X1 and keep X2–X6; T2 can tell you to drop X2 and keep X1 and X3–X6. Together, they don't necessarily tell you to drop both and keep X3–X6.

6 If a t-statistic is not significant, we can remove that X and simplify the model while still maintaining the model's high R². Typical stopping rule: continue until all remaining Xs meet some target "significance level to stay" (often .10 or .15, to keep more Xs).

7 The forward and backward heuristics may or may not arrive at the same final model; generally, however, the resulting models should be quite similar. Backward elimination requires starting with a model that includes all candidate explanatory variables, but some tools limit model size; Excel, for example, will only run a regression with up to 16 variables.

8 When using many variables in a regression, some of the explanatory variables may be highly correlated with other explanatory variables. In the extreme, when two of the variables are exactly linearly related, the multiple regression fails because it is unstable. Simple indicators of this multicollinearity: a failure of the F-test, an increase in the standard error, an insignificant t-statistic for a previously significant variable, and theoretically inconsistent coefficients. Recall also that when using a categorical variable, one of the categories must be "left out" as the reference category.

9 The variance inflation factors (VIFs) should be calculated after reaching a supposed stopping point in a multiple regression selection method. The VIF for each independent variable is calculated by regressing that independent variable against the other independent variables: VIF = 1 / (1 − R²), where R² comes from that auxiliary regression. A simple rule of thumb is that the VIFs should be less than 4.
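statsmodels ships this calculation directly, so a VIF table can be sketched as follows, assuming the selected predictors sit in a DataFrame X; the wrapper function name is hypothetical.

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor


def vif_table(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing X_j on the other Xs."""
    exog = sm.add_constant(X)  # intercept for the auxiliary regressions
    return {
        col: variance_inflation_factor(exog.values, i)
        for i, col in enumerate(exog.columns)
        if col != "const"      # the intercept's VIF is not meaningful
    }


# Rule of thumb from this slide: flag any variable whose VIF is 4 or more.
```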

10 The forward and backward heuristics rely on adding or deleting one variable at a time. It is, however, possible to evaluate the statistical significance of including an entire set of variables by constructing the partial F-statistic.

11 The partial F test. Suppose there are r variables in the group. Define the full model to be the one with all k predictors. Define the reduced model to be the one with the group left out (it has k − r variables).

12 Look at the increase in the sum of squared errors, SSE_reduced − SSE_full, to see how much of the explained variation is lost. Divide this by r, the number of variables in the group. Put this in ratio to the MSE of the full model. This is called the partial F statistic.

13 This statistic has an F distribution with r numerator and (n − k − 1) denominator degrees of freedom, as written out below.
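Putting slides 12 and 13 together, the partial F statistic can be written as (notation as defined above):

```latex
F \;=\; \frac{\left(\mathrm{SSE}_{\text{reduced}} - \mathrm{SSE}_{\text{full}}\right)/r}
             {\mathrm{MSE}_{\text{full}}}
\;\sim\; F_{\,r,\; n-k-1} \quad \text{under } H_0
```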

14 [Slide shows regression output labeled "Full" and "Reduced"; the numbers are used on the next slide.]

15 H0: the four coefficients in the group are all insignificant (zero).
H1: at least one variable coefficient in the group is useful.

F = [(889.042 − 765.939)/4] / 9.456 = 30.776 / 9.456 = 3.255

The correct F distribution to test against has 4 numerator and 81 denominator degrees of freedom. The tabulated value for a (4, 60) distribution is 2.53 at a significance level of .05 and 3.65 at a significance level of .01. Since 3.255 exceeds 2.53 but not 3.65, the group is significant at the .05 level but not at the .01 level.
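As a sanity check on these numbers, scipy's F distribution reproduces the statistic and its p-value (all inputs taken from this slide):

```python
from scipy.stats import f

sse_reduced, sse_full = 889.042, 765.939   # SSEs from the two fitted models
r, df_denom = 4, 81                        # group size; n - k - 1
mse_full = 9.456

F = ((sse_reduced - sse_full) / r) / mse_full
print(round(F, 3))        # 3.255
p = f.sf(F, r, df_denom)  # upper-tail p-value
print(p)                  # falls between .01 and .05, matching the critical values
```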

16 Indicator variables: extensions.
- Two lines, different slopes
- More than two categories
- Multicategory, multislope

17 Recall that using the Executive variable alone created a salary model with two lines having different intercepts. Adding the variable Alpha Experience resulted in a model that still has two lines with different intercepts. But what if there is an interaction effect between Executive status and Alpha experience?

18 The Executive status variable has two categories: 0 and 1. Create two variables from Alpha experience so that:
- when Executive = 0, the first variable retains Alpha's value and otherwise equals 0;
- when Executive = 1, the second variable retains Alpha's value and otherwise equals 0.
Regressing on these three variables (Executive status and the two Alpha variables) results in a model with two lines having different intercepts and different slopes, capturing a simple interaction effect among the variables; a sketch follows.
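A minimal sketch of this construction, assuming a hypothetical DataFrame with columns Salary, Executive (0/1), and Alpha; the derived column names are illustrative.

```python
import statsmodels.api as sm


def fit_interaction_model(df):
    """Two intercepts (via Executive) and two slopes (via the split Alpha variables)."""
    df = df.assign(
        # Alpha when Executive = 0, otherwise 0:
        Alpha_exec0=df["Alpha"] * (df["Executive"] == 0),
        # Alpha when Executive = 1, otherwise 0:
        Alpha_exec1=df["Alpha"] * (df["Executive"] == 1),
    )
    X = sm.add_constant(df[["Executive", "Alpha_exec0", "Alpha_exec1"]])
    return sm.OLS(df["Salary"], X).fit()
```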

19–21 [Slides 19–21 contain only untranscribed graphics; no text survives.]

