Christopher Dougherty EC220 - Introduction to econometrics (chapter 3) Slideshow: multicollinearity Original citation: Dougherty, C. (2012) EC220 - Introduction.


1 Christopher Dougherty EC220 - Introduction to econometrics (chapter 3) Slideshow: multicollinearity
Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 3). [Teaching Resource] © 2012 The Author
This version available at: http://learningresources.lse.ac.uk/129/
Available in LSE Learning Resources Online: May 2012
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms. http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/

2 MULTICOLLINEARITY 1 Suppose that Y = 2 + 3X2 + X3 and that X3 = 2X2 - 1. There is no disturbance term in the equation for Y, but that is not important. Suppose that we have the six observations shown.

X2    X3    Y
10    19    51
11    21    56
12    23    61
13    25    66
14    27    71
15    29    76

3 MULTICOLLINEARITY 2 [Figure: line graphs of Y, X3 and X2 across the six observations.] The three variables are plotted as line graphs above. Looking at the data, it is impossible to tell whether the changes in Y are caused by changes in X2, by changes in X3, or jointly by changes in both X2 and X3.

4 MULTICOLLINEARITY 3 Numerically, Y increases by 5 in each observation. X2 changes by 1.

                  Change from previous observation
X2    X3    Y     X2    X3    Y
10    19    51    1     2     5
11    21    56    1     2     5
12    23    61    1     2     5
13    25    66    1     2     5
14    27    71    1     2     5
15    29    76    1     2     5

5 MULTICOLLINEARITY 4 Hence the true relationship could have been Y = 1 + 5X2. [Figure: line graphs of Y, X3 and X2, annotated "Y = 1 + 5X2 ?".]

6 MULTICOLLINEARITY 5 However, it can also be seen from the table of changes that X3 increases by 2 in each observation.

7 MULTICOLLINEARITY 6 Hence the true relationship could have been Y = 3.5 + 2.5X3. [Figure: line graphs of Y, X3 and X2, annotated "Y = 3.5 + 2.5X3 ?".]

8 MULTICOLLINEARITY 7 These two possibilities are special cases of Y = 3.5 - 2.5p + 5pX2 + 2.5(1 - p)X3, which would fit the relationship for any value of p. Setting p = 1 gives Y = 1 + 5X2 and setting p = 0 gives Y = 3.5 + 2.5X3.

9 MULTICOLLINEARITY 8 There is no way that regression analysis, or any other technique, could determine the true relationship from this infinite set of possibilities, Y = 3.5 - 2.5p + 5pX2 + 2.5(1 - p)X3, given the sample data.
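
As a quick check of the claim that every value of p fits the sample, the following sketch evaluates the family of equations exactly on the six observations. The function names and the particular values of p tried are illustrative assumptions, not part of the original slides.

```python
from fractions import Fraction

# The six observations from the example.
X2 = [10, 11, 12, 13, 14, 15]
X3 = [19, 21, 23, 25, 27, 29]
Y = [51, 56, 61, 66, 71, 76]


def fitted(p, x2, x3):
    """Evaluate Y = 3.5 - 2.5p + 5p*X2 + 2.5(1 - p)*X3 with exact fractions."""
    p = Fraction(p)
    return (Fraction(7, 2) - Fraction(5, 2) * p
            + 5 * p * x2 + Fraction(5, 2) * (1 - p) * x3)


def fits_sample(p):
    """True if the equation for this p reproduces every observed Y."""
    return all(fitted(p, a, b) == y for a, b, y in zip(X2, X3, Y))


# Arbitrary illustrative values of p; p = 3/5 gives back Y = 2 + 3X2 + X3.
candidates = [Fraction(0), Fraction(3, 5), Fraction(1), Fraction(-2), Fraction(7, 3)]
results = {p: fits_sample(p) for p in candidates}

for p, ok in results.items():
    print(f"p = {p}: fits all six observations? {ok}")
```

Every candidate fits, which is why the sample cannot distinguish between them.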

10 MULTICOLLINEARITY 9 What would happen if you tried to run a regression when there is an exact linear relationship among the explanatory variables?

11 MULTICOLLINEARITY 10 We will investigate, using the model with two explanatory variables shown above. [Note: A disturbance term has now been included in the true model, but it makes no difference to the analysis.]

12 MULTICOLLINEARITY 11 The expression for the multiple regression coefficient b 2 is shown above. We will substitute for X 3 using its relationship with X 2.
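
The equations on this slide are not reproduced in the transcript. As a reference sketch, the standard OLS expression for b2 in a model with two explanatory variables is given below. The notation (β1, β2, β3 for the parameters, λ and μ for the exact relationship between X3 and X2) is assumed here and may differ from the slide.

```latex
% Assumed notation: true model and exact linear relationship
Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i ,
\qquad X_{3i} = \lambda + \mu X_{2i}

% Standard OLS estimator of \beta_2 with two explanatory variables
b_2 =
\frac{\sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y}) \sum (X_{3i}-\bar{X}_3)^2
      - \sum (X_{3i}-\bar{X}_3)(Y_i-\bar{Y}) \sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3)}
     {\sum (X_{2i}-\bar{X}_2)^2 \sum (X_{3i}-\bar{X}_3)^2
      - \left[ \sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3) \right]^2}
```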

13 MULTICOLLINEARITY 12 First, we will replace the terms highlighted.

14 MULTICOLLINEARITY 13 We have made the replacement.

15 MULTICOLLINEARITY 14 Next, the terms highlighted now.

16 MULTICOLLINEARITY 15 We have made the replacement.

17 MULTICOLLINEARITY 16 Finally this term.

18 MULTICOLLINEARITY 17 Again, we have made the replacement.
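
The highlighted terms are not visible in the transcript. A sketch of the three replacements, assuming the exact relationship is written X3 = λ + μX2 (assumed notation), so that each deviation of X3 from its mean is μ times the corresponding deviation of X2:

```latex
% Deviations of X3 are proportional to deviations of X2
X_{3i} - \bar{X}_3 = \mu \,(X_{2i} - \bar{X}_2)

% The three sums involving X3
\sum (X_{3i}-\bar{X}_3)^2 = \mu^2 \sum (X_{2i}-\bar{X}_2)^2
\sum (X_{2i}-\bar{X}_2)(X_{3i}-\bar{X}_3) = \mu \sum (X_{2i}-\bar{X}_2)^2
\sum (X_{3i}-\bar{X}_3)(Y_i-\bar{Y}) = \mu \sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y})

% Substituting into the standard OLS expression for b_2
b_2 =
\frac{\sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y}) \cdot \mu^2 \sum (X_{2i}-\bar{X}_2)^2
      - \mu \sum (X_{2i}-\bar{X}_2)(Y_i-\bar{Y}) \cdot \mu \sum (X_{2i}-\bar{X}_2)^2}
     {\sum (X_{2i}-\bar{X}_2)^2 \cdot \mu^2 \sum (X_{2i}-\bar{X}_2)^2
      - \left[ \mu \sum (X_{2i}-\bar{X}_2)^2 \right]^2}
```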

19 MULTICOLLINEARITY 18 It turns out that the numerator and the denominator are both equal to zero. The regression coefficient is not defined.
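
The same result can be checked numerically. The sketch below computes the numerator and denominator of the standard two-regressor OLS expression for b2 using the six observations from the earlier example, where X3 = 2X2 - 1. The helper names are illustrative assumptions.

```python
# Six observations from the example: X3 = 2*X2 - 1 and Y = 2 + 3*X2 + X3.
X2 = [10, 11, 12, 13, 14, 15]
X3 = [2 * x - 1 for x in X2]
Y = [2 + 3 * a + b for a, b in zip(X2, X3)]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of products of deviations of u and v from their sample means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Numerator and denominator of the OLS expression for b2.
numerator = cross(X2, Y) * cross(X3, X3) - cross(X3, Y) * cross(X2, X3)
denominator = cross(X2, X2) * cross(X3, X3) - cross(X2, X3) ** 2

print("numerator:", numerator)
print("denominator:", denominator)
```

Both quantities are zero for these data, so the ratio that would define b2 cannot be evaluated.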

20 MULTICOLLINEARITY 19 It is unusual for there to be an exact relationship among the explanatory variables in a regression. When this occurs, it is typically because there is a logical error in the specification.

21 MULTICOLLINEARITY 20 However, it often happens that there is an approximate relationship. For example, when relating earnings to schooling and work experience, it is often reasonable to suppose that the effect of work experience is subject to diminishing returns.

. reg EARNINGS S EXP EXPSQ

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =   45.57
       Model |  22762.4472     3  7587.48241           Prob > F      =  0.0000
    Residual |  89247.7839   536  166.507059           R-squared     =  0.2032
-------------+------------------------------           Adj R-squared =  0.1988
       Total |  112010.231   539  207.811189           Root MSE      =  12.904

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.754372   .2417286    11.39   0.000     2.279521    3.229224
         EXP |  -.2353907    .665197    -0.35   0.724    -1.542103    1.071322
       EXPSQ |   .0267843   .0219115     1.22   0.222    -.0162586    .0698272
       _cons |  -22.21964   5.514827    -4.03   0.000    -33.05297   -11.38632
------------------------------------------------------------------------------

22 MULTICOLLINEARITY 21 A standard way of allowing for this is to include EXPSQ, the square of EXP, in the specification (. reg EARNINGS S EXP EXPSQ). According to the hypothesis of diminishing returns, β4, the coefficient of EXPSQ, should be negative.

23 MULTICOLLINEARITY 22 We fit this specification using Data Set 21. The schooling component of the regression results is not much affected by the inclusion of the EXPSQ term. The coefficient of S indicates that an extra year of schooling increases hourly earnings by $2.75.

24 MULTICOLLINEARITY 23 In the specification without EXPSQ, the coefficient of S was 2.68, not much different.

. reg EARNINGS S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   67.54
       Model |  22513.6473     2  11256.8237           Prob > F      =  0.0000
    Residual |  89496.5838   537  166.660305           R-squared     =  0.2010
-------------+------------------------------           Adj R-squared =  0.1980
       Total |  112010.231   539  207.811189           Root MSE      =   12.91

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.678125   .2336497    11.46   0.000     2.219146    3.137105
         EXP |   .5624326   .1285136     4.38   0.000     .3099816    .8148837
       _cons |  -26.48501    4.27251    -6.20   0.000    -34.87789   -18.09213
------------------------------------------------------------------------------

25 MULTICOLLINEARITY 24 The standard error of the coefficient of S, 0.23 in the specification without EXPSQ and 0.24 with it, is also little changed and the coefficient remains highly significant.

26 MULTICOLLINEARITY 25 By contrast, the inclusion of the new term has had a dramatic effect on the coefficient of EXP. Now it is negative (-0.24), which makes little sense, and insignificant (t = -0.35).

27 MULTICOLLINEARITY 26 Previously, in the specification without EXPSQ, it had been positive (0.56) and highly significant (t = 4.38).

28 MULTICOLLINEARITY 27 The coefficient of EXPSQ is also strange. It is positive, suggesting increasing returns to experience. However, it is not significant.

29 MULTICOLLINEARITY 28 The reason for these problems is that EXPSQ is highly correlated with EXP. This makes it difficult to discriminate between the individual effects of EXP and EXPSQ, and the regression estimates tend to be erratic.

. cor EXP EXPSQ
(obs=540)

             |      EXP    EXPSQ
-------------+------------------
         EXP |   1.0000
       EXPSQ |   0.9812   1.0000
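
A high correlation between a variable and its square is typical when the variable takes only positive values. The sketch below illustrates this with hypothetical experience values of 1 to 20 years. These are made-up numbers, not Data Set 21, and the helper `pearson` is an illustrative assumption.

```python
import math


def pearson(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# Hypothetical years of work experience (an assumption for illustration).
exp_years = list(range(1, 21))
exp_sq = [x * x for x in exp_years]

r = pearson(exp_years, exp_sq)
print(f"correlation between EXP and EXPSQ: {r:.4f}")
```

For these hypothetical values the correlation is roughly 0.97, of the same order as the 0.9812 reported for the actual data.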

30 MULTICOLLINEARITY 29 The high correlation causes the standard error of EXP to be larger than it would have been if EXP and EXPSQ had been less highly correlated (0.665 with EXPSQ in the specification, against 0.129 without it), warning us that the point estimate is unreliable.
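
As a rough arithmetic illustration, in a regression with only two explanatory variables the standard error of a slope coefficient is inflated by the factor 1/sqrt(1 - r^2), where r is the correlation between them. Applying that factor here is an approximation: the regressions above also include S, and the estimated disturbance variance differs slightly between the two specifications. The variable names are illustrative assumptions, and the numbers are copied from the Stata output.

```python
import math

# Values taken from the regression and correlation output above.
r_exp_expsq = 0.9812       # correlation between EXP and EXPSQ
se_exp_without = 0.1285136  # std. err. of EXP in: reg EARNINGS S EXP
se_exp_with = 0.665197      # std. err. of EXP in: reg EARNINGS S EXP EXPSQ

# Inflation factor implied by the two-regressor formula (an approximation here).
implied_inflation = 1 / math.sqrt(1 - r_exp_expsq ** 2)

# Inflation actually observed between the two specifications.
observed_inflation = se_exp_with / se_exp_without

print(f"implied by correlation: {implied_inflation:.2f}")
print(f"observed in the output: {observed_inflation:.2f}")
```

Both figures come out a little above 5, so the correlation between EXP and EXPSQ accounts for nearly all of the increase in the standard error.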

31 MULTICOLLINEARITY 30 When high correlations among the explanatory variables lead to erratic point estimates of the coefficients, large standard errors and unsatisfactorily low t statistics, the regression is said to be suffering from multicollinearity.

32 MULTICOLLINEARITY 31 Note that the coefficients remain unbiased and the standard errors remain valid.

33 MULTICOLLINEARITY 32 Multicollinearity may also be caused by an approximate linear relationship among the explanatory variables. When there are only two, an approximate linear relationship means there will be a high correlation, but this is not always the case when there are more than two.
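
The last point can be illustrated with a small constructed example. In the sketch below, four hypothetical variables A, B, C and D satisfy D = A + B + C exactly, yet no pair of them is highly correlated. The data are invented for illustration, and an exact relationship is used instead of an approximate one so that the numbers are deterministic.

```python
import math
from itertools import combinations, product


def pearson(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# Hypothetical data: every combination of three 0/1 variables (8 observations).
rows = list(product([0, 1], repeat=3))
variables = {
    "A": [row[0] for row in rows],
    "B": [row[1] for row in rows],
    "C": [row[2] for row in rows],
    "D": [sum(row) for row in rows],  # exact linear relationship D = A + B + C
}

pairwise = {
    (x, y): pearson(variables[x], variables[y])
    for x, y in combinations(variables, 2)
}
max_abs_r = max(abs(r) for r in pairwise.values())

for pair, r in pairwise.items():
    print(pair, round(r, 3))
print("largest pairwise correlation:", round(max_abs_r, 3))
```

The largest pairwise correlation is well below what would normally be called high, so a correlation matrix alone would not reveal the linear dependence.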

34 Copyright Christopher Dougherty 2011. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.
The content of this slideshow comes from Section 3.4 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.
Individuals studying econometrics on their own and who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course 20 Elements of Econometrics www.londoninternational.ac.uk/lse.
11.07.25

