Multicollinearity

The Nature of the Problem
OLS requires that the explanatory variables be independent of the error term, but they may not always be independent of each other.
Multicollinearity: the sample data on the explanatory variables are perfectly or highly correlated.
– Perfect collinearity arises when two of the X variables are identical; Stata automatically controls for this by dropping one of them.
– Perfect collinearity: coefficients are indeterminate and standard errors are infinite.
– A high degree of collinearity: large standard errors, imprecise coefficients, wide interval estimates.
In multiple regression analysis we cannot isolate independent effects, i.e. hold one variable constant while changing the other. The OLS estimators are still BLUE.
Implications – large variances and covariances of the estimators:
– wide confidence intervals
– insignificant t-statistics
– non-rejection of the zero-coefficient hypothesis; P(Type II error) is large
– F-tests nevertheless reject joint insignificance, and R² is high
– estimators and standard errors are sensitive to a few observations/data points
Detecting multicollinearity: a high R² with insignificant t-tests but significant F-tests; high correlation coefficients between the sample data on the variables.
Solving multicollinearity:
– impose economic restrictions, e.g. CRS in a Cobb-Douglas production function
– improve the sample data
– drop a variable (at the risk of mis-specifying the model and omitted variable bias)
– use rates of change of the variables (with an impact on the error term)
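A minimal sketch of the distinction between high and perfect correlation among regressors, using Stata's built-in auto dataset (an assumption; it is not the dataset used in these slides):

sysuse auto, clear
correlate weight length          // highly, but not perfectly, correlated regressors
gen double weight2 = 2*weight    // an exact linear function of weight
correlate weight weight2         // correlation is exactly 1: perfect collinearity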

The Consequences
This is not a violation of the Gauss-Markov theorem: OLS is still BLUE and consistent, and the standard errors and hypothesis tests are all still valid. So what is the problem? – Imprecision.

Imprecision
Because the variables are correlated, they move together, so OLS cannot determine the partial effect of each with much reliability.
– It is difficult to isolate the specific effect of one variable when the variables tend to move together.
This manifests itself as high standard errors; equivalently, the confidence intervals are very wide.
– Recall: CI = b ± t*se.
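To see the link between the standard error and the width of the interval, CI = b ± t*se can be computed by hand from Stata's stored results. A sketch, again assuming the built-in auto dataset:

sysuse auto, clear
regress price weight
* 95% confidence interval for the weight coefficient, computed manually
display "lower: " _b[weight] - invttail(e(df_r), 0.025)*_se[weight]
display "upper: " _b[weight] + invttail(e(df_r), 0.025)*_se[weight]

These two numbers reproduce the [95% Conf. Interval] columns of the regression output; the larger the standard error, the wider the interval.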

Perfect MC
An extreme example: attempt a regression where one variable is perfectly correlated with another. The standard errors would be infinite, because it would be impossible to separate the independent effects of two variables that move exactly together. Stata will spot perfect multicollinearity and drop one of the variables. This can also happen if one variable is a linear combination of the others.

Perfect Multicollinearity

. gen x2 = inc_pc
(1 missing value generated)

. regress price inc_pc hstock_pc x2

[Stata output, F(2, 38): the numerical results were lost in transcription. The coefficient table reports estimates for inc_pc, hstock_pc, and _cons; the key line is]
x2 | (dropped)
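Since the numbers above did not survive transcription, the same behaviour can be reproduced with the built-in auto dataset (a hypothetical stand-in for the housing-price data):

sysuse auto, clear
gen double weight2 = weight           // exact copy: perfectly collinear with weight
regress price weight length weight2   // Stata notes the collinearity and omits weight2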

Imperfect Multicollinearity
If two (or more) x variables are highly correlated – but not perfectly correlated – Stata won't drop them, but the standard errors will be high. The implications:
– wider confidence intervals
– more likely to fail to reject the null hypothesis
– variables will appear individually statistically insignificant
– but they will be jointly significant (F-test)
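A sketch with the auto data (assumed stand-in): weight and length are highly but not perfectly correlated, so neither is dropped, but the correlation inflates the standard errors.

sysuse auto, clear
regress price weight length   // standard errors inflated by the weight-length correlation
test weight length            // F-test of joint significance: H0 is that both coefficients are zero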

Detecting MC
– Low t-statistics for individual tests of significance
– High F-statistic for the test of joint significance
– High R²
All these signs suggest that the variables matter collectively but that it is difficult to distinguish their individual effects.
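Beyond eyeballing the t-statistics, F-statistic, and R², Stata offers a formal diagnostic, the variance inflation factor. A sketch (auto data assumed):

sysuse auto, clear
regress price weight length
estat vif                 // variance inflation factors; values above about 10 are a conventional warning sign
correlate weight length   // inspect pairwise correlations among the regressors directly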

An Example

. regress lnQ lnK lnL

[Stata output, F(2, 30): the numerical results were lost in transcription. The coefficient table reports estimates for lnK, lnL, and _cons.]

The Example
A production function example: K and L tend to increase together over time. Economically they have independent effects, but we cannot estimate their separate effects reliably with these data – individually they appear insignificant. Nevertheless, K and L matter jointly for output:
– high R²
– high F-statistic: we can reject the null of joint insignificance
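The joint test reported above can be run explicitly after the regression. A sketch using the slide's variable names (the underlying production-function dataset is not reproduced here, so this assumes it is already loaded):

regress lnQ lnK lnL
test lnK lnL       // H0: the coefficients on lnK and lnL are both zero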

What to Do about It?
Maybe nothing:
– OLS is still BLUE
– individual estimates are imprecise, but the model could still be good at prediction
Add more data in the hope of getting more precise estimates:
– this makes use of consistency: the sampling distribution gets narrower as the sample size rises
Drop a variable:
– at the risk of omitted variable bias
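A further option, mentioned on the opening slide, is to impose an economic restriction. A sketch of constrained least squares imposing constant returns to scale on the Cobb-Douglas example (again assuming the slide's lnQ/lnK/lnL data are loaded):

constraint 1 lnK + lnL = 1          // CRS: the output elasticities sum to one
cnsreg lnQ lnK lnL, constraints(1)  // constrained least squares under CRS

Estimating the restricted model sidesteps the collinearity between lnK and lnL, at the cost of maintaining the CRS assumption.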