VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE


1 In this sequence and the next we will investigate the consequences of misspecifying the regression model in terms of explanatory variables.
[Table: consequences of variable misspecification, fitted model against true model.]

2 To keep the analysis simple, we will assume that there are only two possibilities. Either Y depends only on X2, or it depends on both X2 and X3.

3 If Y depends only on X2, and we fit a simple regression model, we will not encounter any problems, assuming of course that the regression model assumptions are valid.

4 Likewise we will not encounter any problems if Y depends on both X2 and X3 and we fit the multiple regression.

5 In this sequence we will examine the consequences of fitting a simple regression when the true model is multiple.

6 In the next one we will do the opposite and examine the consequences of fitting a multiple regression when the true model is simple.

7 The omission of a relevant explanatory variable causes the regression coefficients to be biased (in general) and the standard errors to be invalid.

8 In the present case, the omission of X3 causes b2 to be biased by the term highlighted in yellow. We will explain this first intuitively and then demonstrate it mathematically.
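The equation shown on this slide did not survive in the transcript. A reconstruction that agrees with the derivation in the later slides is sketched below; the notation (true model with parameters β1, β2, β3 and disturbance u, fitted slope b2) is assumed from the surrounding text rather than copied from the slide.

```latex
% True model (assumed notation):  Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u
% Fitted model:                   \hat{Y} = b_1 + b_2 X_2
E(b_2) = \beta_2 + \beta_3 \,
  \frac{\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}
       {\sum (X_{2i} - \bar{X}_2)^2}
```

The second term on the right is the bias; it is presumably the term the slide highlights.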

9 The intuitive reason is that, in addition to its direct effect β2, X2 has an apparent indirect effect as a consequence of acting as a proxy for the missing X3.
[Diagram: arrows into Y showing the direct effect of X2, holding X3 constant (β2); the effect of X3 (β3); and the apparent effect of X2, acting as a mimic for X3.]

10 The strength of the proxy effect depends on two factors: the strength of the effect of X3 on Y, which is given by β3, and the ability of X2 to mimic X3.

11 The ability of X2 to mimic X3 is determined by the slope coefficient obtained when X3 is regressed on X2, the term highlighted in yellow.

12 We will now derive the expression for the bias mathematically.

13 Although Y really depends on X3 as well as X2, we make a mistake and regress Y on X2 only. The slope coefficient is therefore as shown.

14 We substitute for Y.

15 We simplify and demonstrate that b2 has three components.
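The equations for these three steps are missing from the transcript. The following is a reconstruction under the assumption that the true model is Y = β1 + β2 X2 + β3 X3 + u and that b2 is the ordinary least squares slope from regressing Y on X2 alone; it is a sketch, not the slide's exact layout.

```latex
\begin{align*}
b_2 &= \frac{\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y})}{\sum (X_{2i} - \bar{X}_2)^2} \\[4pt]
    &= \frac{\sum (X_{2i} - \bar{X}_2)\bigl[\beta_2 (X_{2i} - \bar{X}_2)
        + \beta_3 (X_{3i} - \bar{X}_3) + (u_i - \bar{u})\bigr]}
       {\sum (X_{2i} - \bar{X}_2)^2} \\[4pt]
    &= \beta_2
     + \beta_3 \frac{\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2}
     + \frac{\sum (X_{2i} - \bar{X}_2)(u_i - \bar{u})}{\sum (X_{2i} - \bar{X}_2)^2}
\end{align*}
```

The three components are the true value β2, the term involving β3, and the error term that depends on u.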

16 To investigate biasedness or unbiasedness, we take the expected value of b2. The first two terms are unaffected because they contain no random components. Thus we focus on the expectation of the error term.

17 X2 is nonstochastic, so the denominator of the error term is nonstochastic and may be taken outside the expression for the expectation.

18 In the numerator the expectation of a sum is equal to the sum of the expectations (first expected value rule).

19 In each product, the factor involving X2 may be taken out of the expectation because X2 is nonstochastic.

20 By Assumption A.3, the expected value of u is 0. It follows that the expected value of the sample mean of u is also 0. Hence the expected value of the error term is 0.
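The steps in slides 17 to 20 can be written out as one chain of equalities for the error term of b2. This is a reconstruction in assumed notation (the slide equations are not in the transcript), with each line corresponding to one slide.

```latex
\begin{align*}
E\!\left[\frac{\sum (X_{2i} - \bar{X}_2)(u_i - \bar{u})}{\sum (X_{2i} - \bar{X}_2)^2}\right]
  &= \frac{1}{\sum (X_{2i} - \bar{X}_2)^2}\,
     E\!\left[\sum (X_{2i} - \bar{X}_2)(u_i - \bar{u})\right] \\[4pt]
  &= \frac{1}{\sum (X_{2i} - \bar{X}_2)^2}\,
     \sum E\!\left[(X_{2i} - \bar{X}_2)(u_i - \bar{u})\right] \\[4pt]
  &= \frac{1}{\sum (X_{2i} - \bar{X}_2)^2}\,
     \sum (X_{2i} - \bar{X}_2)\, E(u_i - \bar{u}) \\[4pt]
  &= 0
\end{align*}
```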

21 Thus we have shown that the expected value of b2 is equal to the true value plus a bias term. Note: the definition of a bias is the difference between the expected value of an estimator and the true value of the parameter being estimated.
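A small Monte Carlo experiment can make this result concrete. The sketch below is not part of the slideshow: the parameter values, sample size and the way X3 is constructed are all hypothetical choices made for illustration. It keeps X2 and X3 fixed across replications (nonstochastic regressors), regresses Y on X2 only, and compares the average of b2 with β2 plus the bias term.

```python
import random


def ols_slope(x, y):
    """Slope coefficient from a simple regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_omitted_variable(beta1=2.0, beta2=0.5, beta3=0.8,
                              n=50, reps=2000, seed=12345):
    """Return (mean of b2 over replications, beta2 + bias term).

    All default values are hypothetical; they are not taken from the slides.
    """
    rng = random.Random(seed)
    # Fixed regressors: X3 is built to be positively correlated with X2.
    x2 = [float(i) for i in range(1, n + 1)]
    x3 = [0.5 * v + (7 * i) % 11 for i, v in enumerate(x2)]
    # Slope from regressing X3 on X2: the "ability of X2 to mimic X3".
    mimic_slope = ols_slope(x2, x3)
    expected_b2 = beta2 + beta3 * mimic_slope
    estimates = []
    for _ in range(reps):
        # Generate Y from the true (multiple) model with a fresh disturbance.
        y = [beta1 + beta2 * a + beta3 * b + rng.gauss(0.0, 1.0)
             for a, b in zip(x2, x3)]
        # Fit the misspecified simple regression of Y on X2 only.
        estimates.append(ols_slope(x2, y))
    mean_b2 = sum(estimates) / reps
    return mean_b2, expected_b2


mean_b2, expected_b2 = simulate_omitted_variable()
print("mean of b2 across replications:", round(mean_b2, 4))
print("beta2 plus bias term:          ", round(expected_b2, 4))
```

With these settings the average of b2 should sit close to β2 plus the bias term and well away from β2 itself.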

22 As a consequence of the misspecification, the standard errors, t tests and F test are invalid.

23 We will illustrate the bias using an educational attainment model. To keep the analysis simple, we will assume that in the true model S depends only on ASVABC and SM. The output below shows the corresponding regression using EAEF Data Set 21.

. reg S ASVABC SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  147.36
       Model |  1135.67473     2  567.837363           Prob > F      =  0.0000
    Residual |  2069.30861   537  3.85346109           R-squared     =  0.3543
-------------+------------------------------           Adj R-squared =  0.3519
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.963

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1328069   .0097389    13.64   0.000     .1136758     .151938
          SM |   .1235071   .0330837     3.73   0.000     .0585178    .1884963
       _cons |   5.420733   .4930224    10.99   0.000     4.452244    6.389222
------------------------------------------------------------------------------

24 We will run the regression a second time, omitting SM. Before we do this, we will try to predict the direction of the bias in the coefficient of ASVABC.

. cor SM ASVABC
(obs=540)

        |       SM   ASVABC
--------+------------------
      SM|   1.0000
  ASVABC|   0.4202   1.0000

25 It is reasonable to suppose, as a matter of common sense, that β3 is positive. This assumption is strongly supported by the fact that its estimate in the multiple regression is positive and highly significant.

26 The correlation between ASVABC and SM is positive, so the numerator of the bias term must be positive. The denominator is automatically positive since it is a sum of squares and there is some variation in ASVABC. Hence the bias should be positive.

27 Here is the regression omitting SM.

. reg S ASVABC

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  274.19
       Model |  1081.97059     1  1081.97059           Prob > F      =  0.0000
    Residual |  2123.01275   538  3.94612035           R-squared     =  0.3376
-------------+------------------------------           Adj R-squared =  0.3364
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9865

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |    .148084   .0089431    16.56   0.000     .1305165    .1656516
       _cons |   6.066225   .4672261    12.98   0.000     5.148413    6.984036
------------------------------------------------------------------------------

28 As you can see, the coefficient of ASVABC is indeed higher when SM is omitted (0.148, against 0.133 in the multiple regression). Part of the difference may be due to pure chance, but part is attributable to the bias.

29 Here is the regression omitting ASVABC instead of SM. We would expect b3 to be upwards biased. We anticipate that β2 is positive and we know that both the numerator and the denominator of the other factor in the bias expression are positive.

. reg S SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   80.93
       Model |  419.086251     1  419.086251           Prob > F      =  0.0000
    Residual |  2785.89708   538  5.17824736           R-squared     =  0.1308
-------------+------------------------------           Adj R-squared =  0.1291
       Total |  3204.98333   539  5.94616574           Root MSE      =  2.2756

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          SM |   .3130793   .0348012     9.00   0.000     .2447165    .3814422
       _cons |   10.04688   .4147121    24.23   0.000     9.232226    10.86153
------------------------------------------------------------------------------
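The bias expression referred to here is the mirror image of the one for b2, with the roles of the two variables exchanged. As a sketch in assumed notation, taking X2 to be ASVABC and X3 to be SM as the slide's use of β2 and b3 suggests:

```latex
E(b_3) = \beta_3 + \beta_2 \,
  \frac{\sum (X_{3i} - \bar{X}_3)(X_{2i} - \bar{X}_2)}
       {\sum (X_{3i} - \bar{X}_3)^2}
```

The numerator is the same cross-product sum as before, so its sign is again that of the correlation between ASVABC and SM; only the denominator changes, from the variation in ASVABC to the variation in SM.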

30 In this case the bias is quite dramatic. The coefficient of SM has more than doubled (0.313, against 0.124 in the multiple regression). The reason for the bigger effect is that the variation in SM is much smaller than that in ASVABC, while β2 and β3 are similar in size, judging by their estimates.
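The claim about the relative variation of SM and ASVABC can be checked from the reported coefficients alone. The sketch below relies on a standard algebraic property of least squares that the slides do not state explicitly: in any one sample, the coefficient in the simple regression equals the coefficient of the same variable in the multiple regression plus the estimated coefficient of the omitted variable times the slope from regressing the omitted variable on the included one. The function and variable names are hypothetical.

```python
def implied_auxiliary_slope(b_short, b_long_same, b_long_omitted):
    """Solve b_short = b_long_same + b_long_omitted * d for d.

    d is the slope from regressing the omitted variable on the included one.
    """
    return (b_short - b_long_same) / b_long_omitted


# Coefficients copied from the three regressions of S shown in the slides.
B_ASVABC_LONG, B_SM_LONG = 0.1328069, 0.1235071   # reg S ASVABC SM
B_ASVABC_SHORT = 0.148084                          # reg S ASVABC
B_SM_SHORT = 0.3130793                             # reg S SM

# Slope of SM on ASVABC (SM omitted) and of ASVABC on SM (ASVABC omitted).
d_sm_on_asvabc = implied_auxiliary_slope(B_ASVABC_SHORT, B_ASVABC_LONG, B_SM_LONG)
d_asvabc_on_sm = implied_auxiliary_slope(B_SM_SHORT, B_SM_LONG, B_ASVABC_LONG)

# Both slopes share the same sample covariance in the numerator, so their
# ratio is the ratio of the sample variances, var(ASVABC) / var(SM).
variance_ratio = d_asvabc_on_sm / d_sm_on_asvabc

# The product of the two slopes is the squared sample correlation, which
# can be compared with the reported correlation of 0.4202.
implied_r_squared = d_sm_on_asvabc * d_asvabc_on_sm

print("slope of SM on ASVABC:", round(d_sm_on_asvabc, 4))
print("slope of ASVABC on SM:", round(d_asvabc_on_sm, 4))
print("var(ASVABC) / var(SM):", round(variance_ratio, 2))
print("implied r squared:", round(implied_r_squared, 4),
      "reported:", round(0.4202 ** 2, 4))
```

These slopes are derived quantities, not figures reported in the slideshow. The differences they account for are sample estimates of the bias, computed with the estimated coefficients rather than the unknown true values.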

31 Finally, we will investigate how R² behaves when a variable is omitted. In the simple regression of S on ASVABC, R² is 0.34, and in the simple regression of S on SM it is 0.13.

32 Does this imply that ASVABC explains 34% of the variance in S and SM 13%? No, because the multiple regression reveals that their joint explanatory power is 0.35, not 0.47.

33 In the second regression, ASVABC is partly acting as a proxy for SM, and this inflates its apparent explanatory power. Similarly, in the third regression, SM is partly acting as a proxy for ASVABC, again inflating its apparent explanatory power.

34 However, it is also possible for omitted variable bias to lead to a reduction in the apparent explanatory power of a variable. This will be demonstrated using a simple earnings function model, supposing the logarithm of hourly earnings to depend on S and EXP.

. reg LGEARN S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  100.86
       Model |  50.9842581     2   25.492129           Prob > F      =  0.0000
    Residual |  135.723385   537  .252743734           R-squared     =  0.2731
-------------+------------------------------           Adj R-squared =  0.2704
       Total |  186.707643   539   .34639637           Root MSE      =  .50274

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1235911   .0090989    13.58   0.000     .1057173     .141465
         EXP |   .0350826   .0050046     7.01   0.000     .0252515    .0449137
       _cons |   .5093196   .1663823     3.06   0.002     .1824796    .8361596
------------------------------------------------------------------------------

35 If we omit EXP from the regression, the coefficient of S should be subject to a downward bias. β3 is likely to be positive. The numerator of the other factor in the bias term is negative since S and EXP are negatively correlated. The denominator is positive.

. cor S EXP
(obs=540)

        |        S      EXP
--------+------------------
       S|   1.0000
     EXP|  -0.2179   1.0000

36 For the same reasons, the coefficient of EXP in a simple regression of LGEARN on EXP should be downwards biased.

37 As can be seen, the coefficients of S and EXP are indeed lower in the simple regressions.

. reg LGEARN S

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .1096934   .0092691    11.83   0.000     .0914853    .1279014
       _cons |   1.292241   .1287252    10.04   0.000     1.039376    1.545107

. reg LGEARN EXP

------------------------------------------------------------------------------
      LGEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |   .0202708   .0056564     3.58   0.000     .0091595     .031382
       _cons |    2.44941   .0988233    24.79   0.000     2.255284    2.643537

38 A comparison of R² for the three regressions shows that the sum of R² in the simple regressions (0.2065 + 0.0233 = 0.2298) is actually less than R² in the multiple regression (0.2731).

. reg LGEARN S

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  140.05
       Model |  38.5643833     1  38.5643833           Prob > F      =  0.0000
    Residual |   148.14326   538  .275359219           R-squared     =  0.2065
-------------+------------------------------           Adj R-squared =  0.2051
       Total |  186.707643   539   .34639637           Root MSE      =  .52475

. reg LGEARN EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   12.84
       Model |  4.35309315     1  4.35309315           Prob > F      =  0.0004
    Residual |   182.35455   538  .338948978           R-squared     =  0.0233
-------------+------------------------------           Adj R-squared =  0.0215
       Total |  186.707643   539   .34639637           Root MSE      =  .58219

39 This is because the apparent explanatory power of S in the second regression has been undermined by the downwards bias in its coefficient. The same is true for the apparent explanatory power of EXP in the third equation.
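The two R² comparisons can be reproduced from the sums of squares in the Stata output. The sketch below is an illustration only; the dictionary layout and function names are hypothetical, while the numbers are the Model and Total sums of squares reported in the slides.

```python
def r_squared(model_ss, total_ss):
    """R-squared as the explained share of the total sum of squares."""
    return model_ss / total_ss


# Model and Total sums of squares copied from the Stata output in the slides.
schooling = {
    "total": 3204.98333,
    "multiple": 1135.67473,               # reg S ASVABC SM
    "simple": [1081.97059, 419.086251],   # reg S ASVABC, reg S SM
}
earnings = {
    "total": 186.707643,
    "multiple": 50.9842581,               # reg LGEARN S EXP
    "simple": [38.5643833, 4.35309315],   # reg LGEARN S, reg LGEARN EXP
}


def compare(example):
    """Return (R2 of the multiple regression, sum of R2 of the simple ones)."""
    multiple = r_squared(example["multiple"], example["total"])
    simple_sum = sum(r_squared(ss, example["total"]) for ss in example["simple"])
    return multiple, simple_sum


for name, example in (("schooling", schooling), ("earnings", earnings)):
    multiple, simple_sum = compare(example)
    relation = "exceeds" if simple_sum > multiple else "falls short of"
    print(f"{name}: sum of simple R2 = {simple_sum:.4f} "
          f"{relation} multiple R2 = {multiple:.4f}")
```

In the schooling example the regressors are positively correlated and the simple R² values overstate the joint explanatory power; in the earnings example the regressors are negatively correlated and the simple R² values understate it.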

40 Copyright Christopher Dougherty 2012. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Section 6.2 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/. Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse. 2012.11.09

