# Christopher Dougherty EC220 - Introduction to econometrics (chapter 6) Slideshow: variable misspecification iii: consequences for diagnostics Original.


## Variable misspecification III: consequences for diagnostics

`. reg LGEARN S WEIGHT85`

| Source | SS | df | MS |
|---|---|---|---|
| Model | 42.4015936 | 2 | 21.2007968 |
| Residual | 144.30605 | 537 | .26872635 |
| Total | 186.707643 | 539 | .34639637 |

Number of obs = 540; F(2, 537) = 78.89; Prob > F = 0.0000; R-squared = 0.2271; Adj R-squared = 0.2242; Root MSE = .51839

| LGEARN | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| S | .1092273 | .0091576 | 11.93 | 0.000 | .0912382 | .1272164 |
| WEIGHT85 | .0024192 | .0006402 | 3.78 | 0.000 | .0011616 | .0036769 |
| _cons | .9194011 | .1609538 | 5.71 | 0.000 | .6032248 | 1.235577 |

Here is a regression of the logarithm of hourly earnings on years of schooling and weight in pounds. The weight coefficient implies that an extra pound leads to a 0.24% increase in earnings, so four extra pounds lead to about a 1% increase. Can you really believe this?
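The percentage reading of the weight coefficient can be checked with a short calculation. This is a sketch assuming the usual semilogarithmic interpretation, in which a coefficient $b$ on weight means that one extra pound multiplies earnings by $e^{b}$; the symbol $b$ is introduced here for illustration and does not appear in the slides.

$$
\begin{aligned}
100\,(e^{0.0024192} - 1) &\approx 0.242\% \quad \text{(one extra pound)} \\
100\,(e^{4 \times 0.0024192} - 1) &\approx 0.972\% \quad \text{(four extra pounds)}
\end{aligned}
$$

For a coefficient this small, the exact figures are very close to the approximations $100b = 0.24\%$ and $4 \times 0.24\% \approx 1\%$.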

Perhaps not, but the t statistic is very highly significant. What is going on?

Older people tend to have more work experience, which increases their earnings. They also tend to weigh more. This could be an explanation.

`. reg LGEARN S EXP WEIGHT85`

| Source | SS | df | MS |
|---|---|---|---|
| Model | 52.6290507 | 3 | 17.5430169 |
| Residual | 134.078593 | 536 | .250146628 |
| Total | 186.707643 | 539 | .34639637 |

Number of obs = 540; F(3, 536) = 70.13; Prob > F = 0.0000; R-squared = 0.2819; Adj R-squared = 0.2779; Root MSE = .50015

| LGEARN | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| S | .1222516 | .0090671 | 13.48 | 0.000 | .1044402 | .140063 |
| EXP | .0324871 | .0050807 | 6.39 | 0.000 | .0225066 | .0424676 |
| WEIGHT85 | .0016163 | .0006303 | 2.56 | 0.011 | .0003781 | .0028545 |
| _cons | .318147 | .1815401 | 1.75 | 0.080 | -.0384704 | .6747644 |

Here we have controlled for work experience. The weight coefficient is lower, but still almost significant at the 1% level. Can you think of any other variable that might be correlated with both earnings and weight?

`. reg LGEARN S EXP MALE WEIGHT85`

| Source | SS | df | MS |
|---|---|---|---|
| Model | 60.6259571 | 4 | 15.1564893 |
| Residual | 126.081686 | 535 | .235666703 |
| Total | 186.707643 | 539 | .34639637 |

Number of obs = 540; F(4, 535) = 64.31; Prob > F = 0.0000; R-squared = 0.3247; Adj R-squared = 0.3197; Root MSE = .48546

| LGEARN | Coef. | Std. Err. | t | P>\|t\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| S | .1197587 | .0088112 | 13.59 | 0.000 | .10245 | .1370674 |
| EXP | .0282462 | .0049849 | 5.67 | 0.000 | .0184538 | .0380386 |
| MALE | .2953164 | .0506962 | 5.83 | 0.000 | .1957283 | .3949045 |
| WEIGHT85 | -.0006213 | .0007224 | -0.86 | 0.390 | -.0020404 | .0007978 |
| _cons | .6269889 | .1840109 | 3.41 | 0.001 | .2655164 | .9884614 |

The MALE dummy is such a variable. When it is included, the weight effect disappears.

The point of this example is that model misspecification – variable misspecification or indeed any kind of misspecification – in general will invalidate the regression diagnostics, and as a consequence the diagnostics may lead you to the wrong conclusions.
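The same pattern can be reproduced on simulated data. The sketch below is not the data set used in the slides: the sample size, the coefficients, and the helper `ols` are all hypothetical choices made for illustration. Log earnings depend on a male dummy but not on weight, while weight is higher on average for males. Regressing log earnings on weight alone should then give a large t statistic for weight, which shrinks once the dummy is included.

```python
import math
import random


def ols(y, columns):
    """OLS of y on an intercept plus the given columns (hypothetical helper).

    Returns (coefficients, t_statistics), with the intercept first.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(k)] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    b = [sum(inv[a][j] * xty[j] for j in range(k)) for a in range(k)]
    # Residual variance and conventional OLS standard errors
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[a][a]) for a in range(k)]
    t = [bj / sj for bj, sj in zip(b, se)]
    return b, t


# Simulated data; all parameter values are assumptions for illustration.
rng = random.Random(0)
n = 2000
male = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
# Weight is higher on average for males but has no effect on earnings.
weight = [140.0 + 35.0 * m + rng.gauss(0.0, 25.0) for m in male]
lgearn = [2.0 + 0.3 * m + rng.gauss(0.0, 0.5) for m in male]

# Misspecified model: the male dummy is omitted.
b_short, t_short = ols(lgearn, [weight])
# Model that includes the male dummy alongside the irrelevant weight variable.
b_long, t_long = ols(lgearn, [male, weight])

print("male omitted:  weight coef = %.5f, t = %.2f" % (b_short[1], t_short[1]))
print("male included: weight coef = %.5f, t = %.2f" % (b_long[2], t_long[2]))
```

In the first regression the weight coefficient picks up part of the effect of the omitted dummy, so its t statistic says little about weight itself. In the second, weight is an irrelevant variable in an otherwise correctly specified model, and its t test behaves as the usual theory says.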

In the original model, we had two kinds of variable misspecification. We omitted EXP and MALE, and we included the irrelevant variable WEIGHT85.

Including an irrelevant variable is one of the few types of misspecification that does not lead to the invalidation of the regression diagnostics. However, omitting relevant variables certainly does. This is why the t statistic in the original specification misled us.
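The mechanism can be written down for the simplest case. The following is a sketch under simplifying assumptions, not the four-variable model estimated above: suppose the true model has two explanatory variables, $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$, but $Y$ is regressed on $X_2$ alone, giving a slope estimate $b_2$. The symbols are introduced here for illustration. Treating the regressors as nonstochastic, the standard omitted variable bias result is

$$
E(b_2) = \beta_2 + \beta_3 \,
\frac{\sum_i (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum_i (X_{2i} - \bar{X}_2)^2}
$$

If $X_2$ plays the role of weight with $\beta_2 = 0$, and $X_3$ plays the role of an omitted variable such as MALE with $\beta_3 > 0$ and a positive sample covariance with weight, then $E(b_2) > 0$. The estimate is centered on the wrong value, so a t test of $H_0\colon \beta_2 = 0$ can reject even though the hypothesis is true.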

Copyright Christopher Dougherty 2011. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 6.3 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own and who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course 20 Elements of Econometrics www.londoninternational.ac.uk/lse.

11.07.25
