
1 © Christopher Dougherty 1999–2006

A.1: The model is linear in parameters and correctly specified.
A.2: There does not exist an exact linear relationship among the regressors in the sample.
A.3: The disturbance term has zero expectation.
A.4: The disturbance term is homoscedastic.
A.5: The values of the disturbance term have independent distributions.
A.6: The disturbance term has a normal distribution.

PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

Moving from the simple to the multiple regression model, we start by restating the regression model assumptions. Only A.2 is different. Previously it was stated that there must be some variation in the X variable. We will explain the difference in one of the following lectures. Provided that the regression model assumptions are valid, the OLS estimators in the multiple regression model are unbiased and efficient, as in the simple regression model.
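The model to which these assumptions refer appears as an image in the original slides. For the two-regressor case discussed in what follows, it is presumably the standard specification

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i .$$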

2 © Christopher Dougherty 1999–2006 We will not attempt to prove efficiency. We will, however, outline a proof of unbiasedness. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

3 © Christopher Dougherty 1999–2006 The first step, as always, is to substitute for Y from the true relationship. The Y ingredients of b2 are actually in the form of Yi minus its mean, so it is convenient to obtain an expression for this. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

4 © Christopher Dougherty 1999–2006 After simplifying, we find that b2 can be decomposed into the true value β2 plus a weighted linear combination of the values of the disturbance term in the sample. This is what we found in the simple regression model. The difference is that the expression for the weights, which depend on all the values of X2 and X3 in the sample, is considerably more complicated. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS
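The expressions referred to here are images in the original slides. As a reconstruction in standard notation, with lower-case letters denoting deviations from sample means (x_{2i} = X_{2i} − \bar X_2, and so on), the OLS estimator and its decomposition are

$$b_2 = \frac{\sum_i x_{2i} y_i \sum_i x_{3i}^2 - \sum_i x_{3i} y_i \sum_i x_{2i} x_{3i}}{\sum_i x_{2i}^2 \sum_i x_{3i}^2 - \bigl(\sum_i x_{2i} x_{3i}\bigr)^2} = \beta_2 + \sum_i a_i^* u_i ,
\qquad
a_i^* = \frac{x_{2i} \sum_j x_{3j}^2 - x_{3i} \sum_j x_{2j} x_{3j}}{\sum_j x_{2j}^2 \sum_j x_{3j}^2 - \bigl(\sum_j x_{2j} x_{3j}\bigr)^2} .$$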

5 © Christopher Dougherty 1999–2006 Having reached this point, proving unbiasedness is easy. Taking expectations, β2 is unaffected, being a constant. The expectation of a sum is equal to the sum of the expectations. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

6 © Christopher Dougherty 1999–2006 The a* terms are nonstochastic, since they depend only on the values of X2 and X3, and these are assumed to be nonstochastic. Hence the a* terms may be taken out of the expectations as factors. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

7 © Christopher Dougherty 1999–2006 By Assumption A.3, E(ui) = 0 for all i. Hence E(b2) is equal to β2, and so b2 is an unbiased estimator. Similarly b3 is an unbiased estimator of β3. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS
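In symbols, the argument of the last three slides is

$$E(b_2) = \beta_2 + E\Bigl(\sum_i a_i^* u_i\Bigr) = \beta_2 + \sum_i a_i^* E(u_i) = \beta_2 .$$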

8 © Christopher Dougherty 1999–2006 Finally we will show that b1 is an unbiased estimator of β1. This is quite simple, so you should attempt to do it yourself before looking at the rest of this sequence. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

9 © Christopher Dougherty 1999–2006 First substitute for the sample mean of Y. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

10 © Christopher Dougherty 1999–2006 Now take expectations. The first three terms are nonstochastic, so they are unaffected by taking expectations. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

11 © Christopher Dougherty 1999–2006 The expected value of the mean of the disturbance term is zero, since E(u) is zero in each observation. We have just shown that E(b2) is equal to β2 and that E(b3) is equal to β3. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

12 © Christopher Dougherty 1999–2006 Hence b1 is an unbiased estimator of β1. PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS
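A sketch of the proof, reconstructing the equations on the slides: the intercept estimator is b_1 = \bar Y − b_2 \bar X_2 − b_3 \bar X_3, and substituting \bar Y = \beta_1 + \beta_2 \bar X_2 + \beta_3 \bar X_3 + \bar u gives

$$b_1 = \beta_1 + (\beta_2 - b_2)\bar X_2 + (\beta_3 - b_3)\bar X_3 + \bar u ,
\qquad
E(b_1) = \beta_1 + \bigl(\beta_2 - E(b_2)\bigr)\bar X_2 + \bigl(\beta_3 - E(b_3)\bigr)\bar X_3 + E(\bar u) = \beta_1 .$$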

13 © Christopher Dougherty 1999–2006 PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS This sequence investigates the variances and standard errors of the slope coefficients in a model with two explanatory variables. The expression for the variance of b2 is shown below. The expression for the variance of b3 is the same, with the subscripts 2 and 3 interchanged.
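The expression is an image in the original slides; it is presumably the standard result for the two-regressor model,

$$\sigma_{b_2}^2 = \frac{\sigma_u^2}{n\,\mathrm{MSD}(X_2)} \cdot \frac{1}{1 - r_{X_2 X_3}^2} ,
\qquad
\mathrm{MSD}(X_2) = \frac{1}{n}\sum_i (X_{2i} - \bar X_2)^2 .$$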

14 © Christopher Dougherty 1999–2006 The first factor in the expression is identical to that for the variance of the slope coefficient in a simple regression model. The variance of b2 depends on the variance of the disturbance term, the number of observations, and the mean square deviation of X2, for exactly the same reasons as in a simple regression model. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

15 © Christopher Dougherty 1999–2006 The difference is that in multiple regression analysis the expression is multiplied by a factor which depends on the correlation between X2 and X3. The higher the correlation between the explanatory variables, positive or negative, the greater the variance. This is easy to understand intuitively: the greater the correlation, the harder it is to discriminate between the effects of the explanatory variables on Y, and the less accurate the regression estimates will be. Note that the variance expression above is valid only for a model with two explanatory variables. When there are more than two, the expression becomes much more complex and it is sensible to switch to matrix algebra. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

16 © Christopher Dougherty 1999–2006 The standard deviation of the distribution of b2 is of course given by the square root of its variance. With the exception of the variance of u, we can calculate the components of the standard deviation from the sample data. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

17 © Christopher Dougherty 1999–2006 The variance of u has to be estimated. The mean square of the residuals provides a consistent estimator, but in a finite sample it is biased downwards by a factor (n – k) / n, where k is the number of parameters. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

18 © Christopher Dougherty 1999–2006 Obviously we can obtain an unbiased estimator by dividing the sum of the squares of the residuals by n – k instead of n. We denote this unbiased estimator s_u². PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

19 © Christopher Dougherty 1999–2006 Thus the estimate of the standard deviation of the probability distribution of b2, known for short as the standard error of b2, is given by the expression below. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS
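Putting the last three slides together (a reconstruction; the equations are images in the original),

$$s_u^2 = \frac{1}{n-k}\sum_i e_i^2 ,
\qquad
\mathrm{s.e.}(b_2) = \sqrt{\frac{s_u^2}{n\,\mathrm{MSD}(X_2)} \cdot \frac{1}{1 - r_{X_2 X_3}^2}} ,$$

where the e_i are the residuals.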

20 © Christopher Dougherty 1999–2006 We will use this expression to analyze why the standard error of S is larger for the union subsample than for the non-union subsample in earnings function regressions using Data Set 21.

. reg EARNINGS S EXP if COLLBARG==1

      Source |       SS       df       MS              Number of obs =     101
-------------+------------------------------           F(  2,    98) =    9.72
       Model |  3076.31726     2  1538.15863           Prob > F      =  0.0001
    Residual |  15501.9762    98   158.18343           R-squared     =  0.1656
-------------+------------------------------           Adj R-squared =  0.1486
       Total |  18578.2934   100  185.782934           Root MSE      =  12.577

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.333846   .5492604     4.25   0.000     1.243857    3.423836
         EXP |   .2235095   .3389455     0.66   0.511    -.4491169    .8961358
       _cons |  -15.12427   11.38141    -1.33   0.187    -37.71031    7.461779
------------------------------------------------------------------------------

PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

21 © Christopher Dougherty 1999–2006 [Regression output as slide 20.] To select a subsample in Stata, you add an 'if' statement to a command. The COLLBARG variable is equal to 1 for respondents whose rates of pay are determined by collective bargaining, and 0 for the others. Note that in tests for equality, Stata requires the = sign to be duplicated. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

22 © Christopher Dougherty 1999–2006 [Regression output as slide 20.] In the case of the union subsample, the standard error of S is 0.5493. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

23 © Christopher Dougherty 1999–2006

. reg EARNINGS S EXP if COLLBARG==0

      Source |       SS       df       MS              Number of obs =     439
-------------+------------------------------           F(  2,   436) =   57.77
       Model |  19540.1761     2  9770.08805           Prob > F      =  0.0000
    Residual |   73741.593   436  169.132094           R-squared     =  0.2095
-------------+------------------------------           Adj R-squared =  0.2058
       Total |  93281.7691   438  212.972076           Root MSE      =  13.005

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.721698   .2604411    10.45   0.000     2.209822    3.233574
         EXP |   .6077342   .1400846     4.34   0.000     .3324091    .8830592
       _cons |  -28.00805   4.643211    -6.03   0.000    -37.13391   -18.88219
------------------------------------------------------------------------------

In the case of the non-union subsample, the standard error of S is 0.2604, less than half as large. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

24 © Christopher Dougherty 1999–2006 We will explain the difference by looking at the components of the standard error.

Decomposition of the standard error of S

Components:
                 s_u       n    MSD(S)   r(S, EXP)     s.e.
  Union       12.577     101    6.2325     –0.4087   0.5493
  Non-union   13.005     439    5.8666     –0.1784   0.2604

Factors:
                 s_u    1/√n   1/√MSD(S)   1/√(1–r²)   product
  Union       12.577  0.0995      0.4006      1.0957    0.5493
  Non-union   13.005  0.0477      0.4129      1.0163    0.2603

PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

25 © Christopher Dougherty 1999–2006 [Regression output as slide 20.] We will start with s_u. RSS for the union subsample is 15501.9762. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

26 © Christopher Dougherty 1999–2006 [Regression output as slide 20.] There are 101 observations in the union subsample. k is equal to 3. Thus n – k is equal to 98. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

27 © Christopher Dougherty 1999–2006 [Regression output as slide 20.] RSS / (n – k) is equal to 158.183. To obtain s_u, we take the square root. This is 12.577. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

28 © Christopher Dougherty 1999–2006 [Decomposition table as slide 24.] We place this in the table, along with the number of observations. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

29 © Christopher Dougherty 1999–2006 [Regression output as slide 23.] Similarly, in the case of the non-union subsample, s_u is the square root of 169.132, which is 13.005. We also note that the number of observations in that subsample is 439. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

30 © Christopher Dougherty 1999–2006 [Decomposition table as slide 24.] We place these in the table. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

31 © Christopher Dougherty 1999–2006 [Decomposition table as slide 24.] We calculate the mean square deviation of S for the two subsamples from the sample data. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

32 © Christopher Dougherty 1999–2006

. cor S EXP if COLLBARG==1
(obs=101)

        |      S    EXP
--------+----------------
      S |  1.0000
    EXP | -0.4087  1.0000

. cor S EXP if COLLBARG==0
(obs=439)

        |      S    EXP
--------+----------------
      S |  1.0000
    EXP | -0.1784  1.0000

The correlation coefficients for S and EXP are –0.4087 and –0.1784 for the union and non-union subsamples, respectively. (Note that "cor" is the Stata command for computing correlations.) PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

33 © Christopher Dougherty 1999–2006 [Decomposition table as slide 24.] These entries complete the top half of the table. We will now look at the impact of each item on the standard error, using the expression for the standard error given earlier. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

34 © Christopher Dougherty 1999–2006 [Decomposition table as slide 24.] The s_u component needs no modification. It is a little larger for the non-union subsample, which has an adverse effect on that subsample's standard error. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

35 © Christopher Dougherty 1999–2006 [Decomposition table as slide 24.] The number of observations is much larger for the non-union subsample, so the second factor is much smaller than that for the union subsample. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

36 © Christopher Dougherty 1999–2006 [Decomposition table as slide 24.] Perhaps surprisingly, the variance in schooling is a little larger for the union subsample. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

37 © Christopher Dougherty 1999–2006 [Decomposition table as slide 24.] The correlation between schooling and work experience is greater for the union subsample, and this has an adverse effect on its standard error. Note that the sign of the correlation makes no difference, since it is squared. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS

38 [Decomposition table as slide 24.] We see that the reason the standard error is smaller for the non-union subsample is that there are far more observations than in the union subsample. Otherwise the standard errors would have been about the same. The greater correlation between S and EXP has an adverse effect on the union standard error, but this is just about offset by the smaller s_u and the larger variance of S. PRECISION OF THE MULTIPLE REGRESSION COEFFICIENTS
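As a check, the decomposition for the union subsample can be reproduced by hand. The following is a sketch, assuming Data Set 21 is loaded with the variable names used above:

* Verify the decomposition of the standard error of S (union subsample)
quietly regress EARNINGS S EXP if COLLBARG==1
scalar su = e(rmse)                    // s_u = 12.577
scalar n = e(N)                        // n = 101
quietly summarize S if COLLBARG==1
scalar msd = r(Var)*(r(N)-1)/r(N)      // MSD(S): variance with divisor n
quietly correlate S EXP if COLLBARG==1
scalar r23 = r(rho)                    // r(S, EXP) = -0.4087
display su*(1/sqrt(n))*(1/sqrt(msd))*(1/sqrt(1-r23^2))   // approx. 0.5493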

39 © Christopher Dougherty 1999–2006

  X2   X3    Y
  10   19   51
  11   21   56
  12   23   61
  13   25   66
  14   27   71
  15   29   76

Suppose that Y = 2 + 3X2 + X3 and that X3 = 2X2 – 1. There is no disturbance term in the equation for Y, but that is not important. Suppose that we have the six observations shown. MULTICOLLINEARITY

40 © Christopher Dougherty 1999–2006 [Line graphs of Y, X3, and X2.] The three variables are plotted as line graphs above. Looking at the data, it is impossible to tell whether the changes in Y are caused by changes in X2, by changes in X3, or jointly by changes in both X2 and X3. MULTICOLLINEARITY

41 © Christopher Dougherty 1999–2006

                      Change   Change   Change
  X2   X3    Y        in X2    in X3    in Y
  10   19   51           1        2        5
  11   21   56           1        2        5
  12   23   61           1        2        5
  13   25   66           1        2        5
  14   27   71           1        2        5
  15   29   76           1        2        5

Numerically, Y increases by 5 in each observation when X2 changes by 1. MULTICOLLINEARITY

42 © Christopher Dougherty 1999–2006 [Graph of the three series with the line Y = 1 + 5X2.] Hence the true relationship could have been Y = 1 + 5X2. MULTICOLLINEARITY
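The algebra behind this: substituting the exact relationship into the equation for Y,

$$Y = 2 + 3X_2 + X_3 = 2 + 3X_2 + (2X_2 - 1) = 1 + 5X_2 .$$

Equally, substituting X_2 = (X_3 + 1)/2 gives Y = 3.5 + 2.5X_3, so the data are consistent with both of these relationships, and indeed with any weighted average of them.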

43 © Christopher Dougherty 1999–2006 What would happen if you tried to run a regression when there is an exact linear relationship among the explanatory variables? We will investigate, using the model with two explanatory variables shown above. [Note: A disturbance term has now been included in the true model, but it makes no difference to the analysis.] MULTICOLLINEARITY

44 © Christopher Dougherty 1999–2006 The expression for the multiple regression coefficient b2 was given earlier. We will substitute for X3 using its relationship with X2. MULTICOLLINEARITY

45 © Christopher Dougherty 1999–2006 First, we will replace the terms highlighted with the expression derived below. MULTICOLLINEARITY

46 © Christopher Dougherty 1999–2006 Next, the terms that are highlighted now. MULTICOLLINEARITY

47 © Christopher Dougherty 1999–2006 Finally this term. MULTICOLLINEARITY
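Carrying out these substitutions explicitly (a reconstruction; the slides show the steps as images): with X_3 = 2X_2 − 1 the deviations satisfy x_{3i} = 2x_{2i}, so \sum x_{3i} y_i = 2\sum x_{2i} y_i, \sum x_{3i}^2 = 4\sum x_{2i}^2, and \sum x_{2i} x_{3i} = 2\sum x_{2i}^2. Hence

$$b_2 = \frac{\sum x_{2i} y_i \cdot 4\sum x_{2i}^2 - 2\sum x_{2i} y_i \cdot 2\sum x_{2i}^2}{\sum x_{2i}^2 \cdot 4\sum x_{2i}^2 - \bigl(2\sum x_{2i}^2\bigr)^2} = \frac{0}{0} .$$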

48 © Christopher Dougherty 1999–2006 After all the replacements, it turns out that the numerator and the denominator are both equal to zero. The regression coefficient is not defined. It is unusual for there to be an exact relationship among the explanatory variables in a regression. When this occurs, it is typically because there is a logical error in the specification. MULTICOLLINEARITY
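A quick way to see this in practice is to feed the six artificial observations to Stata; a sketch:

* Exact multicollinearity: Stata refuses to estimate both coefficients
clear
set obs 6
generate X2 = 9 + _n          // X2 = 10, 11, ..., 15
generate X3 = 2*X2 - 1        // exact linear relationship with X2
generate Y = 2 + 3*X2 + X3    // no disturbance term
regress Y X2 X3               // one regressor is dropped as collinear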

49 © Christopher Dougherty 1999–2006

. reg EARNINGS S EXP EXPSQ

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =   45.57
       Model |  22762.4472     3  7587.48241           Prob > F      =  0.0000
    Residual |  89247.7839   536  166.507059           R-squared     =  0.2032
-------------+------------------------------           Adj R-squared =  0.1988
       Total |  112010.231   539  207.811189           Root MSE      =  12.904

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.754372   .2417286    11.39   0.000     2.279521    3.229224
         EXP |  -.2353907    .665197    -0.35   0.724    -1.542103    1.071322
       EXPSQ |   .0267843   .0219115     1.22   0.222    -.0162586    .0698272
       _cons |  -22.21964   5.514827    -4.03   0.000    -33.05297   -11.38632
------------------------------------------------------------------------------

However, it often happens that there is an approximate relationship. For example, when relating earnings to schooling and work experience, it is often reasonable to suppose that the effect of work experience is subject to diminishing returns. A standard way of allowing for this is to include EXPSQ, the square of EXP, in the specification. According to the hypothesis of diminishing returns, β4 should be negative. MULTICOLLINEARITY
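The specification being fitted (a reconstruction; the equation is an image in the original) is presumably

$$\mathit{EARNINGS} = \beta_1 + \beta_2 S + \beta_3\,\mathit{EXP} + \beta_4\,\mathit{EXPSQ} + u ,
\qquad \mathit{EXPSQ} = \mathit{EXP}^2 .$$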

50 © Christopher Dougherty 1999–2006 [Regression output as slide 49.] We fit this specification using Data Set 21. The schooling component of the regression results is not much affected by the inclusion of the EXPSQ term. The coefficient of S indicates that an extra year of schooling increases hourly earnings by $2.75. MULTICOLLINEARITY

51 © Christopher Dougherty 1999–2006

. reg EARNINGS S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   67.54
       Model |  22513.6473     2  11256.8237           Prob > F      =  0.0000
    Residual |  89496.5838   537  166.660305           R-squared     =  0.2010
-------------+------------------------------           Adj R-squared =  0.1980
       Total |  112010.231   539  207.811189           Root MSE      =   12.91

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.678125   .2336497    11.46   0.000     2.219146    3.137105
         EXP |   .5624326   .1285136     4.38   0.000     .3099816    .8148837
       _cons |  -26.48501    4.27251    -6.20   0.000    -34.87789   -18.09213
------------------------------------------------------------------------------

In the specification without EXPSQ, shown here, the coefficient of S was 2.68, not much different. MULTICOLLINEARITY

52 © Christopher Dougherty 1999–2006 [Regression output as slide 49.] The standard error of the coefficient of S, 0.23 in the specification without EXPSQ, is also little changed, and the coefficient remains highly significant. MULTICOLLINEARITY

53 © Christopher Dougherty 1999–2006 [Regression output as slide 49.] By contrast, the inclusion of the new term has had a dramatic effect on the coefficient of EXP. Now it is negative, which makes little sense, and insignificant! MULTICOLLINEARITY

54 © Christopher Dougherty 1999–2006 [Regression output as slide 51.] Previously it had been positive and highly significant. MULTICOLLINEARITY

55 © Christopher Dougherty 1999–2006 [Regression output as slide 49.] MULTICOLLINEARITY

56 © Christopher Dougherty 1999–2006 [Regression output as slide 49.] The reason for these problems is that EXPSQ is highly correlated with EXP. This makes it difficult to discriminate between the individual effects of EXP and EXPSQ, and the regression estimates tend to be erratic.

. cor EXP EXPSQ
(obs=540)

        |    EXP  EXPSQ
--------+----------------
    EXP |  1.0000
  EXPSQ |  0.9812  1.0000

MULTICOLLINEARITY

57 © Christopher Dougherty 1999–2006 When high correlations among the explanatory variables lead to erratic point estimates of the coefficients, large standard errors, and unsatisfactorily low t statistics, the regression is said to be suffering from multicollinearity. Multicollinearity may also be caused by an approximate linear relationship among the explanatory variables. When there are only two, an approximate linear relationship means there will be a high correlation, but this is not always the case when there are more than two. MULTICOLLINEARITY

58 © Christopher Dougherty 1999–2006 ALLEVIATION OF MULTICOLLINEARITY What can you do about multicollinearity if you encounter it? We will discuss some possible measures, looking at the model with two explanatory variables. Before doing this, two important points should be emphasized. First, multicollinearity does not cause the regression coefficients to be biased. Their probability distributions are still centered over the true values, if the regression specification is correct, but they have unsatisfactorily large variances. Second, the standard errors and t tests remain valid. The standard errors are larger than they would have been in the absence of multicollinearity, warning us that the regression estimates are erratic. Since the problem of multicollinearity is caused by the variances of the coefficients being unsatisfactorily large, we will seek ways of reducing them.

59 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(1) Reduce σ_u² by including further relevant variables in the model.

We will focus on the slope coefficient and look at the various components of its variance. We might be able to reduce it by bringing more variables into the model and reducing σ_u², the variance of the disturbance term. ALLEVIATION OF MULTICOLLINEARITY

60 © Christopher Dougherty 1999–2006 [Regression output as slide 49.] The estimator of the variance of the disturbance term is the residual sum of squares divided by n – k, where n is the number of observations (540) and k is the number of parameters (4). Here it is 166.5. ALLEVIATION OF MULTICOLLINEARITY

61 © Christopher Dougherty 1999–2006

. reg EARNINGS S EXP EXPSQ MALE ASVABC

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  5,   534) =   37.24
       Model |  28957.3532     5  5791.47063           Prob > F      =  0.0000
    Residual |  83052.8779   534  155.529734           R-squared     =  0.2585
-------------+------------------------------           Adj R-squared =  0.2516
       Total |  112010.231   539  207.811189           Root MSE      =  12.471

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.031419    .296218     6.86   0.000     1.449524    2.613315
         EXP |  -.0816828   .6441767    -0.13   0.899    -1.347114    1.183748
       EXPSQ |   .0130223    .021334     0.61   0.542    -.0288866    .0549311
        MALE |   5.762358   1.104734     5.22   0.000     3.592201    7.932515
      ASVABC |   .2447687   .0714294     3.43   0.001     .1044516    .3850858
       _cons |  -26.18541   5.452032    -4.80   0.000    -36.89547   -15.47535
------------------------------------------------------------------------------

We now add two new variables that are often found to be determinants of earnings: MALE, the sex of the respondent, and ASVABC, the composite score on the cognitive tests in the Armed Services Vocational Aptitude Battery. MALE is a qualitative variable, and the treatment of such variables will be explained in Chapter 5. ALLEVIATION OF MULTICOLLINEARITY

62 © Christopher Dougherty 1999–2006 [Regression output as slide 61.] Both MALE and ASVABC have coefficients significant at the 0.1% level. ALLEVIATION OF MULTICOLLINEARITY

63 © Christopher Dougherty 1999–2006 [Upper panels of the regression outputs in slides 49 and 61: the residual mean square falls only from 166.5 to 155.5, and Root MSE from 12.904 to 12.471.] However, they account for only a small proportion of the variance in earnings, and the reduction in the estimate of the variance of the disturbance term is likewise small. ALLEVIATION OF MULTICOLLINEARITY

64 © Christopher Dougherty 1999–2006 [Coefficient tables of the regressions in slides 49 and 61.] As a consequence, the impact on the standard errors of EXP and EXPSQ is negligible. ALLEVIATION OF MULTICOLLINEARITY

65 © Christopher Dougherty 1999–2006 [Coefficient tables as slide 64.] Note how unstable the coefficients are. This is often a sign of multicollinearity. ALLEVIATION OF MULTICOLLINEARITY

66 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(2) Increase the number of observations.
    Surveys: increase the budget, or use clustering.
    Time series: use quarterly instead of annual data.

The next factor to look at is n, the number of observations. If you are working with cross-section data (individuals, households, enterprises, etc.) and you are undertaking a survey, you could increase the size of the sample by negotiating a bigger budget. Alternatively, you could use clustering: divide the country geographically into localities and select a number of these randomly, perhaps stratifying to make sure that metropolitan, other urban, and rural areas are properly represented. You then confine the survey to the areas selected. This reduces the travel time and cost of the fieldworkers, allowing them to interview a greater number of respondents. ALLEVIATION OF MULTICOLLINEARITY

67 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(2) Increase the number of observations.
    Surveys: increase the budget, or use clustering.
    Time series: use quarterly instead of annual data.

If you are working with time series data, you may be able to increase the sample by working with shorter time intervals for the data, for example quarterly or even monthly data instead of annual data. ALLEVIATION OF MULTICOLLINEARITY

68 © Christopher Dougherty 1999–2006

. reg EARNINGS S EXP EXPSQ MALE ASVABC

      Source |       SS       df       MS              Number of obs =    2714
-------------+------------------------------           F(  5,  2708) =  183.99
       Model |  161795.573     5  32359.1147           Prob > F      =  0.0000
    Residual |  476277.268  2708  175.877869           R-squared     =  0.2536
-------------+------------------------------           Adj R-squared =  0.2522
       Total |  638072.841  2713  235.190874           Root MSE      =  13.262

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.312461    .135428    17.08   0.000     2.046909    2.578014
         EXP |  -.3270651    .308231    -1.06   0.289    -.9314569    .2773268
       EXPSQ |    .023743   .0101558     2.34   0.019     .0038291    .0436569
        MALE |   5.947206   .5221755    11.39   0.000     4.923303    6.971108
      ASVABC |   .2086846   .0336869     6.19   0.000     .1426301    .2747392
       _cons |  -27.40462   2.579435   -10.62   0.000    -32.46248   -22.34676
------------------------------------------------------------------------------

Here is the result of running the regression with all 2,714 observations in the EAEF data set. ALLEVIATION OF MULTICOLLINEARITY

69 © Christopher Dougherty 1999–2006 [Coefficient tables of the regressions in slides 61 (540 observations) and 68 (2,714 observations).] Comparing this result with that using Data Set 21, we see that the standard errors are much smaller, as expected. ALLEVIATION OF MULTICOLLINEARITY

70 © Christopher Dougherty 1999–2006 [Coefficient tables as slide 69.] As a consequence, the t statistics of the variables are higher. However, the correlation between EXP and EXPSQ is as high as in the smaller sample, and the increase in the sample size has not been large enough to have much impact on the problem of multicollinearity. ALLEVIATION OF MULTICOLLINEARITY

71 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(3) Increase MSD(X2).

A third possible way of reducing the problem of multicollinearity might be to increase the variation in the explanatory variables. This is possible only at the design stage of a survey. For example, if you were planning a household survey with the aim of investigating how expenditure patterns vary with income, you should make sure that the sample included relatively rich and relatively poor households as well as middle-income households. ALLEVIATION OF MULTICOLLINEARITY

72 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(4) Reduce r(X2, X3), the correlation between the explanatory variables.

Another possibility might be to reduce the correlation between the explanatory variables. This is possible only at the design stage of a survey, and even then it is not easy. ALLEVIATION OF MULTICOLLINEARITY

73 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(5) Combine the correlated variables.

If the correlated variables are similar conceptually, it may be reasonable to combine them into some overall index. ALLEVIATION OF MULTICOLLINEARITY

74 © Christopher Dougherty 1999–2006 [Regression output as slide 61.] That is precisely what has been done with the three cognitive ASVAB variables. ASVABC has been calculated as a weighted average of ASVAB02 (arithmetic reasoning), ASVAB03 (word knowledge), and ASVAB04 (paragraph comprehension). The three components are highly correlated, and by combining them as a weighted average, rather than using them individually, one avoids a potential problem of multicollinearity. ALLEVIATION OF MULTICOLLINEARITY

75 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(6) Drop some of the correlated variables.

Dropping some of the correlated variables, if they have insignificant coefficients, may alleviate multicollinearity. However, this approach to multicollinearity is dangerous. It is possible that some of the variables with insignificant coefficients really do belong in the model and that the only reason their coefficients are insignificant is that there is a problem of multicollinearity. If that is the case, their omission may cause omitted variable bias, to be discussed in Chapter 6. ALLEVIATION OF MULTICOLLINEARITY

76 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(7) Empirical restriction.

A further way of dealing with the problem of multicollinearity is to use extraneous information, if available, concerning the coefficient of one of the variables. For example, suppose that Y in the model is the demand for a category of consumer expenditure, X is aggregate disposable personal income, and P is a price index for the category. To fit a model of this type you would use time series data. If X and P are highly correlated, which is often the case with time series variables, the problem of multicollinearity might be eliminated in the following way. ALLEVIATION OF MULTICOLLINEARITY

77 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(7) Empirical restriction.

Obtain data on income and expenditure on the category from a household survey and regress Y' on X'. (The ' marks indicate that the data are household data, not aggregate data.) This is a simple regression, because there will be relatively little variation in the price paid by the households. ALLEVIATION OF MULTICOLLINEARITY

78 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(7) Empirical restriction.

Now substitute b2' for β2 in the time series model. Subtract b2'X from both sides, and regress Z = Y – b2'X on price. This is a simple regression, so multicollinearity has been eliminated. ALLEVIATION OF MULTICOLLINEARITY
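Written out (a reconstruction of the equations, which are images in the original): the time series model is Y = \beta_1 + \beta_2 X + \beta_3 P + u, and the household regression supplies b_2', the estimated income coefficient. Imposing \beta_2 = b_2',

$$Z = Y - b_2' X = \beta_1 + \beta_3 P + u \quad \text{(approximately, since } b_2' \text{ only estimates } \beta_2\text{)},$$

and Z is regressed on P alone.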

79 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(7) Empirical restriction.

There are some problems with this technique. First, the β2 coefficients may be conceptually different in time series and cross-section contexts. Second, since we subtract the estimated income component b2'X, not the true income component β2X, from Y when constructing Z, we have introduced an element of measurement error in the dependent variable. ALLEVIATION OF MULTICOLLINEARITY

80 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(8) Theoretical restriction.

Last, but by no means least, is the use of a theoretical restriction, which is defined as a hypothetical relationship among the parameters of a regression model. It will be explained using an educational attainment model as an example. Suppose that we hypothesize that highest grade completed, S, depends on ASVABC, and on highest grade completed by the respondent's mother and father, SM and SF, respectively. ALLEVIATION OF MULTICOLLINEARITY

81 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001      .04688     .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

A one-point increase in ASVABC increases S by 0.13 years. ALLEVIATION OF MULTICOLLINEARITY

82 © Christopher Dougherty 1999–2006 [Regression output as slide 81.] S increases by 0.05 years for every extra year of schooling of the mother and by 0.11 years for every extra year of schooling of the father. Mother's education is generally held to be at least as important as father's education for educational attainment, if not more so, so this outcome is unexpected. ALLEVIATION OF MULTICOLLINEARITY

83 © Christopher Dougherty 1999–2006 [Regression output as slide 81.] It is also surprising that the coefficient of SM is not significant, even at the 5% level, using a one-sided test. ALLEVIATION OF MULTICOLLINEARITY

84 © Christopher Dougherty 1999–2006 [Regression output as slide 81.] However, assortative mating leads to correlation between SM and SF, and the regression appears to be suffering from multicollinearity.

. cor SM SF
(obs=540)

        |     SM     SF
--------+----------------
     SM |  1.0000
     SF |  0.6241  1.0000

ALLEVIATION OF MULTICOLLINEARITY

85 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(8) Theoretical restriction.

Suppose that we hypothesize that mother's and father's education are equally important. We can then impose the restriction β3 = β4. ALLEVIATION OF MULTICOLLINEARITY

86 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(8) Theoretical restriction.

This allows us to rewrite the equation so that SM and SF appear only through their sum. ALLEVIATION OF MULTICOLLINEARITY

87 © Christopher Dougherty 1999–2006 Possible measures for alleviating multicollinearity

(8) Theoretical restriction.

Defining SP to be the sum of SM and SF, the equation may be rewritten as shown below. The problem caused by the correlation between SM and SF has been eliminated. ALLEVIATION OF MULTICOLLINEARITY
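In equations (a reconstruction): the unrestricted model is S = \beta_1 + \beta_2\,\mathit{ASVABC} + \beta_3\,\mathit{SM} + \beta_4\,\mathit{SF} + u. Imposing \beta_3 = \beta_4,

$$S = \beta_1 + \beta_2\,\mathit{ASVABC} + \beta_3(\mathit{SM} + \mathit{SF}) + u = \beta_1 + \beta_2\,\mathit{ASVABC} + \beta_3\,\mathit{SP} + u .$$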

88 © Christopher Dougherty 1999–2006

. g SP=SM+SF

. reg S ASVABC SP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  156.04
       Model |  1177.98338     2  588.991689           Prob > F      =  0.0000
    Residual |  2026.99996   537  3.77467403           R-squared     =  0.3675
-------------+------------------------------           Adj R-squared =  0.3652
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9429

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1253106   .0098434    12.73   0.000     .1059743    .1446469
          SP |   .0828368   .0164247     5.04   0.000     .0505722    .1151014
       _cons |    5.29617   .4817972    10.99   0.000     4.349731    6.242608
------------------------------------------------------------------------------

The estimate of β3 is now 0.083 and highly significant. ALLEVIATION OF MULTICOLLINEARITY

89 © Christopher Dougherty 1999–2006

. g SP=SM+SF
. reg S ASVABC SP

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1253106   .0098434    12.73   0.000     .1059743    .1446469
          SP |   .0828368   .0164247     5.04   0.000     .0505722    .1151014
       _cons |    5.29617   .4817972    10.99   0.000     4.349731    6.242608
------------------------------------------------------------------------------

. reg S ASVABC SM SF

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

Not surprisingly, this is a compromise between the coefficients of SM and SF in the previous specification.

ALLEVIATION OF MULTICOLLINEARITY

90 © Christopher Dougherty 1999–2006

. g SP=SM+SF
. reg S ASVABC SP

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1253106   .0098434    12.73   0.000     .1059743    .1446469
          SP |   .0828368   .0164247     5.04   0.000     .0505722    .1151014
       _cons |    5.29617   .4817972    10.99   0.000     4.349731    6.242608
------------------------------------------------------------------------------

. reg S ASVABC SM SF

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

The standard error of SP is much smaller than those of SM and SF. The use of the restriction has led to a large gain in efficiency, and the problem of multicollinearity has been eliminated.

ALLEVIATION OF MULTICOLLINEARITY

91 © Christopher Dougherty 1999–2006

. g SP=SM+SF
. reg S ASVABC SP

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1253106   .0098434    12.73   0.000     .1059743    .1446469
          SP |   .0828368   .0164247     5.04   0.000     .0505722    .1151014
       _cons |    5.29617   .4817972    10.99   0.000     4.349731    6.242608
------------------------------------------------------------------------------

. reg S ASVABC SM SF

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

The t statistic of SP is very high. Thus it would appear that imposing the restriction has improved the regression results. However, the restriction may not be valid. We should test it, as in the sketch below. Testing theoretical restrictions is one of the topics in Chapter 6.

ALLEVIATION OF MULTICOLLINEARITY
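As a preview of that topic, here is a minimal sketch in Stata (assuming the data set used above is still in memory, with the same variable names):

* fit the unrestricted model, then test the restriction
* that SM and SF have equal coefficients
regress S ASVABC SM SF
test SM = SF

test reports an F statistic with (1, 536) degrees of freedom; if the restriction is not rejected, imposing it is defensible.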

92 © Christopher Dougherty 1999–2006 F TESTS OF GOODNESS OF FIT

This sequence describes two F tests of goodness of fit in a multiple regression model. The first relates to the goodness of fit of the equation as a whole. We will consider the general case where there are k – 1 explanatory variables.

For the F test of goodness of fit of the equation as a whole, the null hypothesis, in words, is that the model has no explanatory power at all. The model will have no explanatory power if it turns out that Y is unrelated to any of the explanatory variables. Mathematically, therefore, the null hypothesis is that all the slope coefficients are zero: H0: β2 = β3 = ... = βk = 0. The alternative hypothesis is that at least one of these β coefficients is different from zero.

In the multiple regression model there is a difference between the roles of the F and t tests. The F test tests the joint explanatory power of the variables, while the t tests test their explanatory power individually. In the simple regression model the F test was equivalent to the (two-sided) t test on the slope coefficient because the 'group' consisted of just one variable.

93 © Christopher Dougherty 1999–2006

ESS / TSS is the definition of R2. RSS / TSS is equal to (1 – R2). (See the last sequence in Chapter 2.)

F TESTS OF GOODNESS OF FIT
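In symbols, the F statistic being manipulated here is (a reconstruction consistent with the definitions in the surrounding text, with k – 1 explanatory variables and n observations):

F(k – 1, n – k) = [ESS / (k – 1)] / [RSS / (n – k)] = [R2 / (k – 1)] / [(1 – R2) / (n – k)]

where the second form follows on dividing the numerator and the denominator of the first by TSS.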

94 © Christopher Dougherty 1999–2006

The educational attainment model will be used as an example. We will suppose that S depends on ASVABC, the ability score, and on SM and SF, the highest grade completed by the mother and the father of the respondent, respectively.

The null hypothesis for the F test of goodness of fit is that all three slope coefficients are equal to zero: H0: β2 = β3 = β4 = 0. The alternative hypothesis is that at least one of them is non-zero.

F TESTS OF GOODNESS OF FIT

95 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

Here is the regression output using Data Set 21.

F TESTS OF GOODNESS OF FIT

96 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

The numerator of the F statistic is the explained sum of squares divided by k – 1. In the Stata output these numbers are given in the Model row.

F TESTS OF GOODNESS OF FIT

97 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

The denominator is the residual sum of squares divided by the number of degrees of freedom remaining. These numbers are given in the Residual row.

F TESTS OF GOODNESS OF FIT

98 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

Hence the F statistic is 393.789935 / 3.77539837 = 104.3. All serious regression packages compute it for you as part of the diagnostics in the regression output.

F TESTS OF GOODNESS OF FIT
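As an arithmetic check, a sketch in Stata (the figures are those in the output above):

* F statistic from the sums of squares in the Model and Residual rows
display (1181.36981/3) / (2023.61353/536)
* the same value computed from R-squared
display (0.3686/3) / ((1 - 0.3686)/536)

Both lines display approximately 104.30, matching the F statistic reported by Stata.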

99 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

The critical value for F(3,536) is not given in the F tables, but we know it must be lower than F(3,500), which is given. At the 0.1% level, this is 5.51. Hence we easily reject H0 at the 0.1% level.

F TESTS OF GOODNESS OF FIT
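In practice the tables are not needed; Stata's built-in distribution functions give exact values. A sketch:

* exact 0.1% critical value of F(3, 536)
display invFtail(3, 536, 0.001)
* p-value of the observed F statistic (effectively zero)
display Ftail(3, 536, 104.30)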

100 © Christopher Dougherty 1999–2006

It is unusual for the F statistic not to be significant if some of the t statistics are significant. In principle it could happen, though. Suppose that you ran a regression with 40 explanatory variables, none being a true determinant of the dependent variable. Then the F statistic should be low enough for H0 not to be rejected. However, if you are performing t tests on the slope coefficients at the 5% level, with a 5% chance of a Type I error, on average 2 of the 40 variables could be expected to have 'significant' coefficients.

The opposite can easily happen, though. Suppose you have a multiple regression model which is correctly specified and the R2 is high. You would expect to have a highly significant F statistic. However, if the explanatory variables are highly correlated and the model is subject to severe multicollinearity, the standard errors of the slope coefficients could all be so large that none of the t statistics is significant. In this situation you would know that your model is a good one, but you are not in a position to pinpoint the contributions made by the explanatory variables individually.
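The first scenario can be made concrete with a small simulation; a sketch (the seed, the sample size, and the variable names are arbitrary choices, not part of the original sequence):

* regress pure noise on 40 pure-noise regressors: the overall F test
* should generally not reject, but on average about two of the 40
* t statistics will appear 'significant' at the 5% level by chance
clear
set seed 12345
set obs 200
generate y = rnormal()
forvalues i = 1/40 {
    generate x`i' = rnormal()
}
regress y x1-x40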

101 © Christopher Dougherty 1999–2006

We now come to the other F test of goodness of fit. This is a test of the joint explanatory power of a group of variables when they are added to a regression model. For example, in the original specification, Y may be written as a simple function of X2:

Y = β1 + β2 X2 + u

In the second, we add X3 and X4:

Y = β1 + β2 X2 + β3 X3 + β4 X4 + u

F TESTS OF GOODNESS OF FIT

102 © Christopher Dougherty 1999–2006

The null hypothesis for the F test is that neither X3 nor X4 belongs in the model: H0: β3 = β4 = 0. The alternative hypothesis is that at least one of them does, perhaps both.

F TESTS OF GOODNESS OF FIT

103 © Christopher Dougherty 1999–2006

F(cost, d.f. remaining) = (improvement / cost) / (remaining unexplained / degrees of freedom remaining)

For this F test, and for several others which we will encounter, it is useful to think of the F statistic as having the structure indicated above.

The 'improvement' is the reduction in the residual sum of squares when the change is made, in this case, when the group of new variables is added.

The 'cost' is the reduction in the number of degrees of freedom remaining after making the change. In the present case it is equal to the number of new variables added, because that many new parameters are estimated. (Remember that the number of degrees of freedom in a regression equation is the number of observations, less the number of parameters estimated. In this example, it would fall from n – 2 to n – 4 when X3 and X4 are added.)

The 'remaining unexplained' is the residual sum of squares after making the change. The 'degrees of freedom remaining' is the number of degrees of freedom remaining after making the change.

In symbols, if RSS1 is the residual sum of squares before the change and RSS2 the residual sum of squares after it, the statistic is F = [(RSS1 – RSS2) / cost] / [RSS2 / d.f. remaining].

F TESTS OF GOODNESS OF FIT

104 © Christopher Dougherty 1999–2006

. reg S ASVABC

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  274.19
       Model |  1081.97059     1  1081.97059           Prob > F      =  0.0000
    Residual |  2123.01275   538  3.94612035           R-squared     =  0.3376
-------------+------------------------------           Adj R-squared =  0.3364
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9865

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |    .148084   .0089431    16.56   0.000     .1305165    .1656516
       _cons |   6.066225   .4672261    12.98   0.000     5.148413    6.984036
------------------------------------------------------------------------------

We will illustrate the test with an educational attainment example. Here is S regressed on ASVABC using Data Set 21. We make a note of the residual sum of squares.

F TESTS OF GOODNESS OF FIT

105 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

Now we have added the highest grade completed by each parent. Does parental education have a significant impact? Well, we can see that a t test would show that SF has a highly significant coefficient, but we will perform the F test anyway. We make a note of RSS.

F TESTS OF GOODNESS OF FIT

106 © Christopher Dougherty 1999–2006

The improvement is 2123.01275 – 2023.61353 = 99.39922, at a cost of two degrees of freedom, and the residual sum of squares remaining is 2023.61353 with 536 degrees of freedom. Hence the F statistic is (99.39922 / 2) / (2023.61353 / 536) = 13.16. The critical value of F(2,500) at the 0.1% level is 7.00. The critical value of F(2,536) must be lower, so we reject H0 and conclude that the parental education variables do have significant joint explanatory power.

F TESTS OF GOODNESS OF FIT
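Equivalently, the joint test can be run directly in Stata, since for OLS the F statistic reported by the test command coincides with this RSS comparison. A sketch:

* joint test that the coefficients of SM and SF are both zero
regress S ASVABC SM SF
test SM SF
* the same arithmetic by hand
display ((2123.01275 - 2023.61353)/2) / (2023.61353/536)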

107 © Christopher Dougherty 1999–2006

This sequence will conclude by showing that t tests are equivalent to marginal F tests when the additional group of variables consists of just one variable. Suppose that in the original model Y is a function of X2 and X3, and that in the revised model X4 is added.

F TESTS OF GOODNESS OF FIT

108 © Christopher Dougherty 1999–2006

The null hypothesis for the F test of the explanatory power of the additional 'group' is that all the new slope coefficients are equal to zero. There is of course only one new slope coefficient, β4, so the null hypothesis is H0: β4 = 0.

F TESTS OF GOODNESS OF FIT

109 © Christopher Dougherty 1999–2006

The F test has the usual structure:

F(cost, d.f. remaining) = (improvement / cost) / (remaining unexplained / degrees of freedom remaining)

We will illustrate it with an educational attainment model where S depends on ASVABC and SM in the original model and on SF as well in the revised model.

F TESTS OF GOODNESS OF FIT

110 © Christopher Dougherty 1999–2006

. reg S ASVABC SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  147.36
       Model |  1135.67473     2  567.837363           Prob > F      =  0.0000
    Residual |  2069.30861   537  3.85346109           R-squared     =  0.3543
-------------+------------------------------           Adj R-squared =  0.3519
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.963

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1328069   .0097389    13.64   0.000     .1136758     .151938
          SM |   .1235071   .0330837     3.73   0.000     .0585178    .1884963
       _cons |   5.420733   .4930224    10.99   0.000     4.452244    6.389222
------------------------------------------------------------------------------

Here is the regression of S on ASVABC and SM. We make a note of the residual sum of squares.

F TESTS OF GOODNESS OF FIT

111 © Christopher Dougherty 1999–2006

Now we add SF and again make a note of the residual sum of squares.

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

F TESTS OF GOODNESS OF FIT

112 © Christopher Dougherty 1999–2006

F(cost, d.f. remaining) = (improvement / cost) / (remaining unexplained / degrees of freedom remaining)

The improvement on adding SF is the reduction in the residual sum of squares: 2069.30861 – 2023.61353 = 45.69508.

F TESTS OF GOODNESS OF FIT

113 © Christopher Dougherty 1999–2006

F(cost, d.f. remaining) = (improvement / cost) / (remaining unexplained / degrees of freedom remaining)

The cost is just the single degree of freedom lost when estimating β4.

F TESTS OF GOODNESS OF FIT

114 © Christopher Dougherty 1999–2006

F(cost, d.f. remaining) = (improvement / cost) / (remaining unexplained / degrees of freedom remaining)

The remaining unexplained is the residual sum of squares after adding SF, 2023.61353.

F TESTS OF GOODNESS OF FIT

115 © Christopher Dougherty 1999–2006

F(cost, d.f. remaining) = (improvement / cost) / (remaining unexplained / degrees of freedom remaining)

The number of degrees of freedom remaining after adding SF is 540 – 4 = 536.

F TESTS OF GOODNESS OF FIT

116 © Christopher Dougherty 1999–2006

F(cost, d.f. remaining) = (improvement / cost) / (remaining unexplained / degrees of freedom remaining)

Hence the F statistic is (45.69508 / 1) / (2023.61353 / 536) = 12.10. The critical value of F(1,500) at the 0.1% significance level is 10.96. The critical value with 536 degrees of freedom must be lower, so we reject H0 at the 0.1% level. The null hypothesis we are testing is exactly the same as for a two-sided t test on the coefficient of SF.

F TESTS OF GOODNESS OF FIT

117 © Christopher Dougherty 1999–2006

We will perform the t test. The t statistic is 3.48.

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

F TESTS OF GOODNESS OF FIT

118 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

The critical value of t at the 0.1% level with 500 degrees of freedom is 3.31. The critical value with 536 degrees of freedom must be lower. So we reject H0 again.

F TESTS OF GOODNESS OF FIT

119 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

It can be shown that the F statistic for the F test of the explanatory power of a 'group' of one variable must be equal to the square of the t statistic for that variable. Here 3.48 squared is 12.11, against an F statistic of 12.10. (The difference in the last digit is due to rounding error.)

F TESTS OF GOODNESS OF FIT
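A quick numerical check in Stata (a sketch, using the figures noted above; the small discrepancy reflects the rounding of the t statistic to 3.48):

* square of the t statistic of SF
display 3.48^2
* marginal F statistic computed from the residual sums of squares
display ((2069.30861 - 2023.61353)/1) / (2023.61353/536)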

120 © Christopher Dougherty 1999–2006

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

It can also be shown that the critical value of F must be equal to the square of the critical value of t: 3.31 squared is 10.96. (These critical values are for 500 degrees of freedom, but the relationship must also hold for 536 degrees of freedom.)

F TESTS OF GOODNESS OF FIT
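The same check can be made with Stata's inverse distribution functions (a sketch; note that a 0.1% two-sided t test puts 0.05% in each tail):

* 0.1% two-sided critical value of t with 500 degrees of freedom, squared
display invttail(500, 0.0005)^2
* 0.1% critical value of F(1, 500)
display invFtail(1, 500, 0.001)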

121 © Christopher Dougherty 1999–2006 Hence the conclusions of the two tests must coincide. This result means that the t test of the coefficient of a variable is a test of its marginal explanatory power, after all the other variables have been included in the equation. If the variable is correlated with one or more of the other variables, its marginal explanatory power may be quite low, even if it genuinely belongs in the model. If all the variables are correlated, it is possible for all of them to have low marginal explanatory power and for none of the t tests to be significant, even though the F test for their joint explanatory power is highly significant. If this is the case, the model is said to be suffering from the problem of multicollinearity discussed earlier. F TESTS OF GOODNESS OF FIT

