Presentation on theme: "MultiCollinearity. The Nature of the Problem OLS requires that the explanatory variables are independent of error term But they may not always be independent."— Presentation transcript:

1 MultiCollinearity

2 The Nature of the Problem
OLS requires that the explanatory variables are independent of the error term, but they may not always be independent of each other.
Multicollinearity: the sample data on the explanatory variables are perfectly or highly correlated.
Perfect collinearity: two of the X variables are identical (or one is a linear combination of the others)
– Stata automatically controls for this by dropping one of them
– Coefficients are indeterminate; standard errors are infinite
High degree of collinearity: large standard errors, imprecise coefficients, wide interval estimates
– In multiple regression analysis we cannot isolate independent effects, i.e. hold one variable constant while changing the other
– OLS estimators are still BLUE
Implications: large variances and covariances of the estimators
– Wide confidence intervals
– Insignificant t-statistics, non-rejection of zero-coefficient hypotheses, large P(Type II error)
– F-tests nevertheless reject joint insignificance, and R2 is high
– Estimators and standard errors sensitive to a few observations/data points
Detecting multicollinearity: high R2 with insignificant t-tests but significant F-tests; high correlation coefficients between the sample data on the variables
Solving multicollinearity:
– Impose economic restrictions, e.g. constant returns to scale (CRS) in a Cobb-Douglas production function
– Improve the sample data
– Drop a variable (risk of mis-specifying the model and omitted variable bias)
– Use rates of change of the variables (note the impact on the error term)

3 The Consequences This is not a violation of the Gauss-Markov theorem: OLS is still BLUE and consistent, and the standard errors and hypothesis tests are all still valid So what is the problem? – Imprecision

4 Imprecision Because the variables are correlated, they move together Therefore OLS cannot determine the partial effect of each with much reliability – Difficult to isolate the specific effect of one variable when they tend to move together This manifests itself as high standard errors Equivalently, the confidence intervals are very wide – Recall: CI = b +/- t*se
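The link between collinearity and wide intervals can be seen in a small simulation (my own illustrative sketch, not from the slides; the data and variable names are made up). It estimates the same model twice with numpy, once with nearly uncorrelated regressors and once with highly correlated ones, and compares the OLS standard error of the first slope:

```python
import numpy as np

# Illustrative simulation: y = 1 + 2*x1 + 2*x2 + e, estimated twice --
# once with nearly uncorrelated regressors, once with highly correlated ones.
rng = np.random.default_rng(0)
n = 200

def ols_se(x1, x2):
    """Return OLS standard errors for y = b0 + b1*x1 + b2*x2 + e."""
    y = 1 + 2 * x1 + 2 * x2 + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), x1, x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (n - X.shape[1])       # residual variance
    return np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

x1 = rng.normal(0, 1, n)
x2_indep = rng.normal(0, 1, n)                  # corr(x1, x2) near 0
x2_coll = x1 + rng.normal(0, 0.05, n)           # corr(x1, x2) near 1

se_indep = ols_se(x1, x2_indep)
se_coll = ols_se(x1, x2_coll)
print(se_indep[1], se_coll[1])  # SE of b1 is many times larger under collinearity
```

Since CI = b +/- t*se, the confidence interval for b1 widens by the same factor as the standard error.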

5 Perfect MC An extreme example: attempt a regression where one variable is perfectly correlated with another Standard errors would be infinite, because it is impossible to separate the independent effects of two variables that move exactly together Stata will spot perfect multicollinearity and drop one variable Can also happen if one variable is a linear combination of the others

6 Perfect Multicollinearity

. gen x2 = inc_pc
(1 missing value generated)

. regress price inc_pc hstock_pc x2

      Source |       SS       df       MS              Number of obs =      41
-------------+------------------------------           F(  2,    38) =  210.52
       Model |  6.7142e+11     2  3.3571e+11           Prob > F      =  0.0000
    Residual |  6.0598e+10    38  1.5947e+09           R-squared     =  0.9172
-------------+------------------------------           Adj R-squared =  0.9129
       Total |  7.3202e+11    40  1.8301e+10           Root MSE      =   39934

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      inc_pc |   16.15503   2.713043     5.95   0.000     10.66276     21.6473
   hstock_pc |   124653.9   389266.5     0.32   0.751    -663374.9    912682.6
          x2 |  (dropped)
       _cons |  -190506.4   80474.78    -2.37   0.023    -353419.1   -27593.71
------------------------------------------------------------------------------
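The mechanics behind Stata's "(dropped)" message can be sketched outside Stata (an illustrative numpy example with made-up data, not the slides' dataset): duplicating a column, as gen x2 = inc_pc does, makes the design matrix rank-deficient, so X'X is singular and the coefficients are indeterminate:

```python
import numpy as np

# Made-up data standing in for inc_pc and hstock_pc; x2 duplicates inc_pc,
# mirroring "gen x2 = inc_pc" on the slide.
rng = np.random.default_rng(1)
n = 41
inc_pc = rng.normal(30000, 5000, n)
hstock_pc = rng.normal(0.4, 0.05, n)
x2 = inc_pc.copy()

X = np.column_stack([np.ones(n), inc_pc, hstock_pc, x2])
print(np.linalg.matrix_rank(X))  # 3, not 4: one column carries no new information
# X'X is singular, so (X'X)^(-1) does not exist and OLS has no unique
# solution -- hence Stata drops x2 rather than attempt the inversion.
```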

7 Imperfect Multicollinearity If two (or more) x variables are highly correlated – but not perfectly correlated – Stata won't drop them, but the standard errors will be high The implications – CIs wider – More likely to fail to reject the null hypothesis – Variables will appear individually statistically insignificant – But they will be jointly significant (F-test)

8 Detecting MC Low t-statistics for individual tests of significance High F-statistic for test of joint significance High R2 All these signs suggest that the variables matter collectively but it is difficult to distinguish their individual effects.
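A further diagnostic worth knowing (not on the slide) is the variance inflation factor, VIF_j = 1/(1 - R_j2), where R_j2 comes from regressing x_j on the other regressors; values above about 10 are a conventional warning sign. A minimal numpy sketch with simulated data:

```python
import numpy as np

def vif(X, j):
    """VIF of column j of regressor matrix X (no constant column in X)."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    b = np.linalg.lstsq(Z, X[:, j], rcond=None)[0]   # auxiliary regression
    resid = X[:, j] - Z @ b
    tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
    r2 = 1 - resid @ resid / tss
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(2)
x1 = rng.normal(0, 1, 100)
x2 = 0.95 * x1 + rng.normal(0, 0.1, 100)   # highly collinear with x1
x3 = rng.normal(0, 1, 100)                 # unrelated to the others
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])  # large for x1 and x2, near 1 for x3
```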

9 An Example

. regress lnQ lnK lnL

      Source |       SS       df       MS              Number of obs =      33
-------------+------------------------------           F(  2,    30) =   33.12
       Model |  3.11227468     2  1.55613734           Prob > F      =  0.0000
    Residual |   1.4093888    30  .046979627           R-squared     =  0.6883
-------------+------------------------------           Adj R-squared =  0.6675
       Total |  4.52166348    32  .141301984           Root MSE      =  .21675

------------------------------------------------------------------------------
         lnQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lnK |   .4877311   .7038727     0.69   0.494    -.9497687    1.925231
         lnL |   .5589916   .8164384     0.68   0.499    -1.108398    2.226381
       _cons |  -.1286729   .5461324    -0.24   0.815    -1.244024    .9866783
------------------------------------------------------------------------------

10 The Example Production function example K and L tend to increase over time together Economically they have independent effects But we cannot estimate their separate effects reliably with this data – individually insignificant Nevertheless, K and L matter jointly for output – High R2 – High F statistic: can reject the null of joint insignificance

11 What to Do about it? Maybe nothing – OLS is still BLUE – Individual estimates are imprecise but the model could still be good at prediction Add more data in the hope of getting more precise estimates – Making use of consistency – The sampling distribution gets narrower as the sample size rises Drop a variable – Risk of omitted variable bias
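The "add more data" remedy can be illustrated with a quick simulation (my own sketch with simulated data, not from the slides): holding the degree of collinearity fixed, the standard error still shrinks roughly like 1/sqrt(n) as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(3)

def se_b1(n):
    """OLS standard error of b1 with highly collinear regressors, sample size n."""
    x1 = rng.normal(0, 1, n)
    x2 = x1 + rng.normal(0, 0.2, n)            # collinearity fixed across n
    y = 1 + 2 * x1 + 2 * x2 + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), x1, x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

print(se_b1(50), se_b1(5000))  # the larger sample gives a much smaller SE
```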

