Topics
- The form of the equation
- Assumptions
- The "axis of evil": collinearity, heteroscedasticity, and autocorrelation
- Model misspecification
  - Missing a critical variable
  - Including irrelevant variable(s)
The form of the equation

Yt = a1 + b2*X2 + b3*X3 + et

- Yt = dependent variable
- a1 = intercept
- b2 = constant (partial regression coefficient)
- b3 = constant (partial regression coefficient)
- X2 = explanatory variable
- X3 = explanatory variable
- et = error term
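A minimal sketch of estimating this equation with statsmodels in Python. The data are simulated; the coefficient values (2.0, 0.5, 1.5) and sample size are illustrative, not from the slides:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 100
X2 = rng.normal(10, 2, n)          # explanatory variable X2
X3 = rng.normal(5, 1, n)           # explanatory variable X3
e = rng.normal(0, 1, n)            # error term: zero mean, constant variance
Y = 2.0 + 0.5 * X2 + 1.5 * X3 + e  # Yt = a1 + b2*X2 + b3*X3 + et

X = sm.add_constant(np.column_stack([X2, X3]))  # adds the intercept a1
fit = sm.OLS(Y, X).fit()
print(fit.params)                  # estimates of a1, b2, b3
```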
Partial Correlation (slope) Coefficients
- b2 measures the change in the mean value of Y per unit change in X2, while holding the value of X3 constant (known in calculus as a partial derivative).
- In the simple model Y = a + bX, the derivative dY/dX = b; in the multiple model, the partial derivative of Y with respect to X2 is b2.
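To make the "holding X3 constant" interpretation concrete, this sketch (simulated data, illustrative coefficients) checks that the fitted b2 equals the change in predicted Y when X2 rises by one unit with X3 held fixed:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X2 = rng.normal(size=200)
X3 = rng.normal(size=200)
Y = 1.0 + 0.5 * X2 + 1.5 * X3 + rng.normal(size=200)

fit = sm.OLS(Y, sm.add_constant(np.column_stack([X2, X3]))).fit()
b2 = fit.params[1]

# Predicted Y at X2 = 0 and X2 = 1, holding X3 fixed at its mean:
x3bar = X3.mean()
y_lo = fit.predict([[1, 0.0, x3bar]])[0]
y_hi = fit.predict([[1, 1.0, x3bar]])[0]
print(b2, y_hi - y_lo)  # identical: b2 is the partial effect of X2 on Y
```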
Assumptions of MVR
- X2 and X3 are non-stochastic; that is, their values are fixed in repeated sampling.
- The error term e has a zero mean value (Σe/N = 0).
- Homoscedasticity: the variance of e is constant.
- No autocorrelation exists among the error terms.
- No exact collinearity exists between X2 and X3.
- The error term e follows the normal distribution with mean zero and constant variance.
Venn Diagram: Correlation & Coefficients of Determination (R2)
[Figure: two Venn diagrams showing the overlap of Y with X1 and X2]
- One panel: correlation exists between X1 and X2, so there is a portion of the variation of Y that can be attributed to either one.
- The other panel: no correlation exists between X1 and X2, so each variable explains a separate portion of the variation of Y.
A special case: Perfect Collinearity
[Figure: Venn diagram in which X2 overlaps X1 entirely]
X2 is a perfect function of X1. Including X2 would therefore be irrelevant, because it explains none of the variation in Y beyond what is already accounted for by X1. The model will not run.
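A small sketch (simulated data) of why the model "will not run": when X2 is an exact multiple of X1, the design matrix loses rank and the OLS normal equations have no unique solution:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = 2.0 * x1                      # X2 is a perfect function of X1
X = np.column_stack([np.ones(50), x1, x2])

print(np.linalg.matrix_rank(X))    # 2, not 3: columns are linearly dependent
try:
    np.linalg.inv(X.T @ X)         # the OLS normal equations cannot be solved
except np.linalg.LinAlgError as err:
    print("singular matrix:", err)
```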
Consequences of Collinearity
- Multicollinearity is related to sample-specific issues.
- Large variances and standard errors of the OLS estimators.
- Wider confidence intervals.
- Insignificant t ratios.
- A high R2 but few significant t ratios.
- OLS estimators and their standard errors are very sensitive to small changes in the data; they tend to be unstable.
- Wrong signs on regression coefficients.
- Difficulty determining the contribution of individual explanatory variables to the R2.
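One common way to quantify these symptoms is the variance inflation factor (VIF). A hedged sketch on simulated data follows; the 0.1 noise scale and the "VIF > 10" rule of thumb are illustrative conventions, not from the slides:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                   # independent regressor
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF > 10 is a common rule of thumb for serious collinearity
for i in range(1, X.shape[1]):            # skip the constant
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.1f}")
```

On this data, x1 and x2 show very large VIFs while x3 stays near 1, matching the symptom list above.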
IS IT BAD IF WE HAVE MULTICOLLINEARITY?
- If the goal of the study is to use the model to predict or forecast the future mean value of the dependent variable, collinearity may not be a problem.
- If the goal of the study is not prediction but reliable estimation of the parameters, then collinearity is a serious problem.
- Solutions: drop variables, acquire more data or a new sample, rethink the model, or transform the form of the variables.
Heteroscedasticity
Heteroscedasticity: the variance of e is not constant, which violates the assumption of homoscedasticity, or equal variance.
What to do when the pattern is not clear?
- Run a regression where you regress the residuals (error term) on Y.
LET'S ESTIMATE HETEROSCEDASTICITY
- Run a regression where the residuals become the dependent variable and home value the independent variable.
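A sketch of that auxiliary regression on simulated home-value data. The variable names, scales, and the added Breusch-Pagan test are illustrative assumptions, not from the slides:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
n = 300
tla = rng.uniform(800, 4000, n)        # living area (illustrative)
e = rng.normal(scale=0.01 * tla)       # error spread grows with house size
value = 50_000 + 60 * tla + e          # simulated home value

X = sm.add_constant(tla)
fit = sm.OLS(value, X).fit()

# The slide's idea: squared residuals as the dependent variable
aux = sm.OLS(fit.resid ** 2, X).fit()
print(aux.params, aux.pvalues)         # significant slope -> heteroscedasticity

# Formal Breusch-Pagan test: small p-value -> reject homoscedasticity
lm, lm_pval, fstat, f_pval = het_breuschpagan(fit.resid, X)
print("LM p-value:", lm_pval)
```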
Consequences of Heteroscedasticity
- OLS estimators are still linear.
- OLS estimators are still unbiased.
- But they no longer have minimum variance; they are no longer BLUE.
- Therefore, we run the risk of drawing wrong conclusions when doing hypothesis testing (H0: b = 0).
- Solutions: transform the variables, or develop a new model that takes nonlinearity into account (e.g., a logarithmic function).
Testing for Heteroscedasticity
- Regress the log of the squared residuals (log e2) on the predicted value (Y-hat) to see the pattern of heteroscedasticity.
[Figure: plot of log e2 against Y-hat]
- The pattern shows that the relationship is best described as a logarithmic function.
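A sketch of that log-residual regression (a Park-style auxiliary regression) on simulated data; the noise structure is an illustrative assumption:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(scale=0.5 * x)   # error spread grows with x

fit = sm.OLS(y, sm.add_constant(x)).fit()
log_e2 = np.log(fit.resid ** 2)             # log of squared residuals

# Regress log(e^2) on the fitted values; a significant slope
# indicates heteroscedasticity with a log-linear pattern
park = sm.OLS(log_e2, sm.add_constant(fit.fittedvalues)).fit()
print(park.params, park.pvalues)
```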
Autocorrelation
- Time-series correlation: the best predictor of sales for the present Christmas season is the previous Christmas season.
- Spatial correlation: the best predictor of a home's value is the value of a home next door, or in the same area or neighborhood.
- The best predictor for a politician to win an election as an incumbent is the previous election (ceteris paribus).
Autocorrelation
- Gujarati defines autocorrelation as "correlation between members of observations ordered in time [as in time-series data] or space [as in cross-sectional data]."
- The no-autocorrelation assumption is E(UiUj) = 0: the expected product of two different error terms Ui and Uj is zero.
- Autocorrelation is a symptom of model specification error; the regression model is not specified correctly because a variable is missing or has the wrong functional form.
The Durbin-Watson Test (d) of Autocorrelation
Values of d:
- d = 4: perfect negative correlation
- d = 2: no autocorrelation
- d = 0: perfect positive correlation
Let's do a "d" test
Here we solved the problems of collinearity, heteroscedasticity, and autocorrelation. It cannot get any better than this.
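A sketch of computing d with statsmodels on simulated data with AR(1) errors; the 0.8 persistence parameter is an illustrative assumption:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n = 200
x = np.linspace(0, 10, n)

# AR(1) errors: each error carries over part of the previous one
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 1 + 2 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(f"d = {durbin_watson(fit.resid):.2f}")  # well below 2: positive autocorrelation
```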
Model Misspecification: omitted variable bias, or underfitting a model
If the omitted variable is correlated with the included variables, then:
- The parameters estimated are biased; that is, their expected values do not match the true values.
- The error variance estimated is biased.
- The confidence intervals and hypothesis-testing procedures are unreliable.
- The R2 is also unreliable.
Let's run a model:
- LnVAL = a + b1*LNTLA + b2*LNBDR + b3*LNAGE + e (true model)
- LnVAL = a + b1*LNBDR + b2*LNAGE + e (underfitted)
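A hedged simulation of this bias, with made-up coefficients standing in for the LNTLA/LNBDR/LNAGE model (none of the numbers come from the slides). Because LNBDR is correlated with the omitted LNTLA, its coefficient absorbs the omitted effect:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
lntla = rng.normal(7.5, 0.4, n)                 # log living area
lnbdr = 0.5 * lntla + rng.normal(0, 0.1, n)     # bedrooms correlate with area
lnage = rng.normal(3.0, 0.5, n)
lnval = 1 + 0.9 * lntla + 0.2 * lnbdr - 0.1 * lnage + rng.normal(0, 0.2, n)

true_X = sm.add_constant(np.column_stack([lntla, lnbdr, lnage]))
under_X = sm.add_constant(np.column_stack([lnbdr, lnage]))   # LNTLA omitted

print(sm.OLS(lnval, true_X).fit().params)   # close to (1, 0.9, 0.2, -0.1)
print(sm.OLS(lnval, under_X).fit().params)  # LNBDR coefficient badly inflated
```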
Model Misspecification: irrelevant variable bias
- The unnecessary variable has no effect on Y (although R2 may increase).
- The model still gives us unbiased and consistent estimates of the coefficients.
- The major penalty is that the parameters are estimated less precisely; the confidence intervals are wider, which increases the risk of drawing an invalid inference during hypothesis testing (accepting H0: B = 0).
Let's run the following model:
- LNVALUE = a + b1*LNTLA + b2*LNBTH + b3*LNBDR + b4*LNAGE + e
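A small simulation of the precision penalty (variable names and scales are illustrative): adding an irrelevant regressor that is correlated with x1 leaves the x1 coefficient roughly unbiased but inflates its standard error:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x_irrel = x1 + rng.normal(scale=0.3, size=n)   # irrelevant, correlated with x1
y = 1 + 2 * x1 + rng.normal(size=n)            # true model uses only x1

lean = sm.OLS(y, sm.add_constant(x1)).fit()
fat = sm.OLS(y, sm.add_constant(np.column_stack([x1, x_irrel]))).fit()

print(lean.params[1], fat.params[1])  # both near the true value of 2
print(lean.bse[1], fat.bse[1])        # std. error on x1 grows, widening the CI
```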