# R2, Y-Intercept, Tstat Yint, Slope, Tstat Slope for Education0, Education2, Education4, Education6, Education8

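| Variable | R2 | Y-Intercept | Tstat Yint | Slope | Tstat Slope |
|---|---|---|---|---|---|
| Education0 | 14% | 50329 | 15 | -33887 | -3 |
| Education2 | 11% | 50018 | 15 | -29833 | -2 |
| Education4 | 0% | 47675 | 10 | 88 | 0 |
| Education6 | 2% | 45562 | 12 | 8643 | 1 |
| Education8 | 27% | 44664 | 15 | 53014 | 4 |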

Multiple Regression

We use the least squares approach again:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$

Find $(b_0, b_1, b_2, \dots, b_k)$ to minimize the sum of squared residuals, where $k$ is the number of independent variables.
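As a minimal sketch of the least squares fit described above, the following uses NumPy's `numpy.linalg.lstsq`. The function name `fit_least_squares` and the synthetic data are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return (b0, b1, ..., bk) minimizing the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Synthetic, noise-free data (not from the slides): y = 2 + 3*x1 - 1*x2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
y_demo = 2.0 + 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]

b = fit_least_squares(X_demo, y_demo)
print(b)  # the fitted b0, b1, b2
```

Because the demo data contain no noise, the fitted coefficients should recover the intercept and slopes used to generate them, up to floating-point error.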

Multiple Regression

Things that are pretty much the same as in simple regression:

- $R^2$
- $S$ (the standard error of the regression)
- t-tests

Additional:

- F-test
- Adjusted $R^2$

Multiple Regression Output

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.709 |
| R Square | 0.503 |
| Adjusted R Square | 0.461 |
| Standard Error | 17748 |
| Observations | 52 |

F Stat = 11.889, Sig F = 0.00000

| | Coefficients | Standard Error | t Stat |
|---|---|---|---|
| Intercept | 16442.25 | 8874.13 | 1.85 |
| Education2 | 3742.75 | 12549.92 | 0.30 |
| Education4 | 31321.11 | 9486.85 | 3.30 |
| Education6 | 37762.75 | 10147.96 | 3.72 |
| Education8 | 81236.42 | 13555.46 | 5.99 |

Multiple Regression

- $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ (none of the Xs help explain Y)
- $H_1$: not all $\beta$s are 0 (at least one X is useful)

$H_0: R^2 = 0$ is an equivalent hypothesis.

Multiple Regression

The adjusted $R^2$:

- Allows for the comparison of models with different numbers of variables.
- "Penalizes" or adjusts the regular $R^2$ for the number of variables used.

- For the simple education regression, the model using the Education8 variable had an $R^2$ of 27% and an adjusted $R^2$ of 25.2%.
- For the multiple education regression, the model had an $R^2$ of 50% and an adjusted $R^2$ of 46.1%.
- Because the adjusted $R^2$ is higher even after penalizing for the additional variables, the multiple regression model explains more than the simple regression model.
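The slides do not spell out the adjustment, so the sketch below assumes the standard definition, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1), and applies it to the multiple education regression (R Square 0.503, 52 observations, 4 education variables). The function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables.

    Assumes the standard definition: 1 - (1 - R^2) * (n - 1) / (n - k - 1).
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Values taken from the regression output: R Square 0.503, 52 observations, 4 variables.
multiple = adjusted_r2(0.503, n=52, k=4)
print(f"{multiple:.3f}")  # 0.461
```

The result agrees with the Adjusted R Square reported in the summary output. The simple Education8 model can be checked the same way with k = 1, although starting from the rounded 27% will not reproduce the reported 25.2% exactly.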

Multiple Regression

- Explained variation = $R^2$, with $k$ degrees of freedom
- Unexplained variation = $1 - R^2$, with $n - k - 1$ degrees of freedom
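Dividing each variation by its degrees of freedom and taking the ratio gives the usual F statistic, F = (R2 / k) / ((1 - R2) / (n - k - 1)). A minimal sketch, using the values from the regression output and an illustrative function name `f_statistic`:

```python
def f_statistic(r2, n, k):
    """F statistic from R^2, with k and n - k - 1 degrees of freedom."""
    explained = r2 / k                      # explained variation per degree of freedom
    unexplained = (1.0 - r2) / (n - k - 1)  # unexplained variation per degree of freedom
    return explained / unexplained

# Values taken from the regression output: R Square 0.503, 52 observations, 4 variables.
f_stat = f_statistic(0.503, n=52, k=4)
print(f"{f_stat:.2f}")  # 11.89
```

This is close to the reported F Stat of 11.889; the small difference comes from using the rounded R Square of 0.503.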

- We have considered the education variables individually: 5 separate regressions.
- We have considered all education variables simultaneously: 1 regression.
- We need a method for considering various subsets and for modeling a general multiple-variable model.
- With 11 variables, there are $2^{11} - 1 = 2047$ possible regressions. It is generally impracticable to consider all possible combinations.

Model Selection

1. Regress Y on each of the k potential X variables.
2. Determine the best single-variable model.
3. Regress Y on the best variable and each of the remaining k-1 variables.
4. Determine the best model that includes the previous best variable and one new variable.
5. If the adjusted $R^2$ declines, the standard error of the regression increases, the F-test fails, the t-statistic of the best variable is insignificant, or the coefficients are theoretically inconsistent, STOP and use the previous best model.

Repeat steps 2-4 until stopped or until the all-variable model has been reached.
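The procedure above could be sketched roughly as follows. This is a simplification, not the slides' full method: it stops only when the adjusted R2 fails to improve, and it leaves out the standard error, F-test, t-statistic, and theoretical-consistency checks. The function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def adjusted_r2_of_fit(X, y):
    """Fit y on the columns of X (plus an intercept) and return adjusted R^2."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def forward_select(X, y, names):
    """Greedy forward selection; stops when adjusted R^2 no longer improves."""
    chosen, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining:
        # Try adding each remaining variable to the current best model.
        scores = {j: adjusted_r2_of_fit(X[:, chosen + [j]], y) for j in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best_score:
            break  # no improvement: keep the previous best model
        best_score = scores[candidate]
        chosen.append(candidate)
        remaining.remove(candidate)
    return [names[j] for j in chosen], best_score

# Synthetic data (not from the slides): y depends on x0 and x2 only; x1 is pure noise.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(60, 3))
y_demo = 5.0 * X_demo[:, 0] + 2.0 * X_demo[:, 2] + rng.normal(scale=0.5, size=60)

selected, score = forward_select(X_demo, y_demo, ["x0", "x1", "x2"])
print(selected, round(score, 3))
```

In practice the other stopping criteria matter: adjusted R2 alone can admit a variable whose coefficient is insignificant or has an implausible sign.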

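| Variable | R2 | Y-Int | T YInt | Slope | T Slope |
|---|---|---|---|---|---|
| Age | 83% | -28627 | -5.56 | 1979 | 15.42 |
| Prior Experience | 45% | 27639 | 6.85 | 3369 | 6.36 |
| Alpha Experience | 67% | 21458 | 6.59 | 3049 | 10.05 |
| Education | 42% | 11656 | 1.79 | 8448 | 6.05 |
| Executive | 76% | 37514 | 20.51 | 53088 | 12.73 |
| Gender | 3% | 52126 | 10.85 | -8480 | -1.27 |
| Education0 | 14% | 50329 | 15.43 | -33887 | -2.88 |
| Education2 | 11% | 50018 | 15.05 | -29833 | -2.49 |
| Education4 | 0% | 47675 | 9.57 | 88 | 0.01 |
| Education6 | 2% | 45562 | 11.80 | 8643 | 1.12 |
| Education8 | 27% | 44664 | 14.96 | 53014 | 4.26 |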

- The starting point for choosing the best regression model is the $R^2$. The model with the highest $R^2$ is the best model if the following conditions are met:
  - The t-statistics indicate that all of the coefficients are statistically significant.
  - The coefficients and the model are consistent with a desired theoretical model.
- If the conditions are not met, select the model with the next-highest $R^2$.
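The statistical part of this screening can be sketched with the single-variable results listed earlier. The cutoff of |t| >= 2 is an assumed rule of thumb for significance at roughly the 5% level, not a value given in the slides, and the theoretical-consistency condition is a judgment call that the code does not attempt.

```python
# (variable, R^2, t-stat of intercept, t-stat of slope), copied from the single-variable results
single_models = [
    ("Age", 0.83, -5.56, 15.42),
    ("Prior Experience", 0.45, 6.85, 6.36),
    ("Alpha Experience", 0.67, 6.59, 10.05),
    ("Education", 0.42, 1.79, 6.05),
    ("Executive", 0.76, 20.51, 12.73),
    ("Gender", 0.03, 10.85, -1.27),
    ("Education0", 0.14, 15.43, -2.88),
    ("Education2", 0.11, 15.05, -2.49),
    ("Education4", 0.00, 9.57, 0.01),
    ("Education6", 0.02, 11.80, 1.12),
    ("Education8", 0.27, 14.96, 4.26),
]

T_CUTOFF = 2.0  # assumed rule-of-thumb threshold, not from the slides

def rank_candidates(models, t_cutoff=T_CUTOFF):
    """Keep models whose intercept and slope are both significant, sorted by R^2."""
    significant = [
        m for m in models
        if abs(m[2]) >= t_cutoff and abs(m[3]) >= t_cutoff
    ]
    ordered = sorted(significant, key=lambda m: m[1], reverse=True)
    return [m[0] for m in ordered]

ranking = rank_candidates(single_models)
print(ranking[:2])  # ['Age', 'Executive']
```

The ranking only orders the statistically acceptable candidates; whether the top one is also theoretically acceptable still has to be decided by hand.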

- The best candidate model uses the variable Age: the $R^2$ is 83% and the t-statistics of the coefficients are all statistically significant. The sign and scale of the coefficients are plausible. However, it is unlikely that the company would choose, as a matter of theory, to construct a salary model based on age.
- The next best candidate model uses the variable Executive: the $R^2$ is 76% and the t-statistics are all statistically significant. The sign and scale of the coefficients are plausible. There is no conceptual business objection to the model.

- Keeping the Executive variable, add each of the remaining variables in turn to construct a two-variable model. This requires ten separate regressions.
- This process continues until a stopping criterion is met: the adjusted $R^2$ declines, the standard error increases, the F-test fails for all models, the t-statistics for the best model are not significant, the sign and scale of the coefficients are conceptually contradictory, or an irreducible theoretical contradiction is reached.
- The maximum number of regressions required for this process is 11 + 10 + 9 + … + 1 = 66: the 11 single-variable regressions plus at most 10 + 9 + … + 1 = 55 multiple regressions. This is a vast improvement over 2047.
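The counts in this comparison can be checked with a few lines of arithmetic. The variable names are illustrative; the sketch separates the 11 single-variable regressions of the first pass from the multiple regressions of the later passes.

```python
k = 11  # number of candidate variables

# Every non-empty subset of the k variables is a possible regression.
all_subsets = 2**k - 1

# Stepwise passes: k single-variable regressions, then k-1 two-variable ones, and so on.
single_variable = k
multiple_variable = sum(range(1, k))  # 10 + 9 + ... + 1
stepwise_total = single_variable + multiple_variable

print(all_subsets)        # 2047
print(multiple_variable)  # 55
print(stepwise_total)     # 66
```

Either way of counting, the stepwise procedure needs far fewer fits than trying every subset.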