The Regression Model for Cross-Section Data: Assumptions
The Regression Model (Cross-section data)
1) The simple linear regression model

y = β₀ + β₁x + u

The model has to be linear in the parameters (not in the variables); it explains the variable y in terms of the variable x.

β₀: intercept
β₁: slope parameter
y: dependent variable, explained variable, response variable, …
x: independent variable, explanatory variable, regressor, …
u: error term, disturbance, unobservables, …
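The slides contain no code; as a minimal sketch of this data-generating process (all parameter values invented for illustration), the simple model can be simulated and fitted in Python:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters (illustrative only)
beta0, beta1 = 2.0, 0.5
n = 500

x = rng.normal(10, 2, size=n)   # explanatory variable
u = rng.normal(0, 1, size=n)    # error term, E(u) = 0
y = beta0 + beta1 * x + u       # simple linear regression model

# OLS fit recovers the intercept and slope from the sample
b1, b0 = np.polyfit(x, y, deg=1)
print(f"estimated intercept {b0:.3f}, estimated slope {b1:.3f}")
```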
The Regression Model (Cross-section data)
2) The multiple linear regression model

y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + u

The model has more than one explanatory variable; it explains the variable y in terms of the variables x₁, x₂, …, xₖ.

β₀: intercept
β₁, …, βₖ: slope parameters
y: dependent variable, explained variable, response variable, …
x₁, …, xₖ: independent variables, explanatory variables, regressors, …
u: error term, disturbance, unobservables, other factors
Terminology for linear regression
Multiple Regression Analysis: Assumptions
The simplest way to estimate the parameters β is Ordinary Least Squares (OLS). Before estimating the linear regression model, we study its assumptions, the "Gauss-Markov assumptions." The Gauss-Markov theorem states that in a linear regression model whose errors have expectation zero, are uncorrelated, and have equal variance, the OLS estimator is the Best Linear Unbiased Estimator (BLUE) of β.

The first four of these assumptions lead to unbiasedness of OLS; together with the fifth, they are the Gauss-Markov assumptions:
1. Linear in parameters
2. Random sampling
3. No perfect collinearity (not needed in simple linear regression)
4. Zero conditional mean
5. Homoskedasticity
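As a hedged illustration (not part of the slides; data and parameters are invented), OLS for the multiple regression model can be computed directly from the normal equations β̂ = (X'X)⁻¹X'y:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2

# Hypothetical design matrix: intercept column plus two regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 0.5, -0.3])   # illustrative parameters
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS estimator: solve the normal equations (X'X) beta = X'y
# (solving is numerically more stable than explicitly inverting X'X)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to beta_true in a large sample
```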
Multiple Regression Analysis: Assumptions
If you are an econometrician, you must be concerned about bias, because bias gives you wrong results. With high bias, your estimate deviates from the true value; with low bias, your estimate is close to it. Keep this in mind before we go further. We will also distinguish between upward and downward bias.
Assumption of Multiple linear regression (MLR)
1) Assumption MLR.1 (Linear in parameters)

In the population, the relationship between y and the explanatory variables is linear:

y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + u
Assumption of Multiple linear regression (MLR)
2) Assumption MLR.2 (Random sampling)

The data {(xᵢ₁, …, xᵢₖ, yᵢ): i = 1, …, n} are a random sample drawn from the population. Each data point therefore follows the population equation:

yᵢ = β₀ + β₁xᵢ₁ + … + βₖxᵢₖ + uᵢ
The Simple Regression Model
Discussion of random sampling: wage and education
- The population consists, for example, of all workers of country A
- In the population, a linear relationship between wages (or log wages) and years of education holds
- Draw a worker completely at random from the population
- The wage and the years of education of the worker drawn are random, because one does not know beforehand which worker will be drawn
- Throw the worker back into the population and repeat the random draw n times
- The wages and years of education of the n sampled workers are used to estimate the linear relationship between wages and education
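A small sketch of this sampling scheme (the population data here are invented, not from the slides): drawing workers with replacement from a finite population with numpy:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population of workers: years of education and log wages
pop_educ = rng.integers(8, 21, size=100_000)
pop_logwage = 0.6 + 0.08 * pop_educ + rng.normal(0, 0.3, size=100_000)

# Random sample of n workers, drawn with replacement ("throw back")
n = 1_000
idx = rng.choice(pop_educ.size, size=n, replace=True)
sample_educ, sample_logwage = pop_educ[idx], pop_logwage[idx]

# Estimate the linear relationship from the sample
b1, b0 = np.polyfit(sample_educ, sample_logwage, deg=1)
print(f"estimated return to education: {b1:.3f}")
```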
Multiple Regression Analysis: Assumption
3) Assumption MLR.3 (No perfect collinearity)
(This assumption is not needed in simple linear regression.)

In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables.

Remarks on MLR.3:
- The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed. It does not require the regressors to be uncorrelated, only that no correlation equals ±1.
- If an explanatory variable is a perfect linear combination of the other explanatory variables, it is superfluous and may be eliminated.
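To make this concrete, here is a small illustration (invented data, not from the slides) of why perfect collinearity breaks OLS: if one column of X is an exact linear function of another, X'X is singular and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = 3.0 * x1 + 1.0   # perfect linear function of x1
X = np.column_stack([np.ones(n), x1, x2])

# Rank deficiency: 3 columns but rank 2, so X'X cannot be inverted
print(np.linalg.matrix_rank(X))    # 2
print(np.linalg.det(X.T @ X))      # ~0 (numerically singular)

# An imperfectly correlated regressor is allowed under MLR.3
x3 = 3.0 * x1 + rng.normal(scale=0.1, size=n)
X_ok = np.column_stack([np.ones(n), x1, x3])
print(np.linalg.matrix_rank(X_ok)) # 3
```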
Multiple Regression Analysis: Assumption
An example of multicollinearity. Schematically, the average standardized test score of a school is regressed on several expenditure categories:

avgscore = β₀ + β₁teachexp + β₂matexp + β₃otherexp + u

where teachexp is expenditure on teachers, matexp is expenditure on instructional materials, and otherexp is other expenditures.

The different expenditure categories will be strongly correlated, because if a school has a lot of resources it will spend a lot on everything. It will be hard to estimate the differential effects of the different expenditure categories, because all expenditures are either high or low together. For precise estimates of the differential effects, one would need information about situations where the expenditure categories change differentially. As a consequence, the sampling variance of the estimated effects will be large.
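As an illustrative sketch (all numbers invented; `statsmodels` is assumed to be available), the variance inflation can be seen by comparing the standard errors of the slope estimates when the two expenditure variables are nearly collinear versus when they vary independently:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
teachexp = rng.normal(50, 10, size=n)

# matexp nearly collinear with teachexp (rich schools spend more on everything)
matexp = 0.5 * teachexp + rng.normal(scale=1.0, size=n)
avgscore = 10 + 0.3 * teachexp + 0.2 * matexp + rng.normal(scale=5.0, size=n)

X = sm.add_constant(np.column_stack([teachexp, matexp]))
print(sm.OLS(avgscore, X).fit().bse)   # large standard errors on both slopes

# Same model, but matexp now varies independently: much smaller standard errors
matexp_ind = rng.normal(25, 5, size=n)
avgscore2 = 10 + 0.3 * teachexp + 0.2 * matexp_ind + rng.normal(scale=5.0, size=n)
X2 = sm.add_constant(np.column_stack([teachexp, matexp_ind]))
print(sm.OLS(avgscore2, X2).fit().bse)
```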
Multiple Regression Analysis: Assumption
How do we fix the multicollinearity problem? In the example above, it would probably be better to sum all expenditure categories together, because their separate effects cannot be disentangled.

A special case in linear regression:

avgscore = β₀ + β₁teachexp + β₂matexp + u
avgscore = β₀ + β₁teachexp + β₂teachexp² + u

The second model does not violate MLR.3: teachexp² is a function of teachexp, but not a linear function, so this is not perfect collinearity.
Multiple Regression Analysis: Assumption
In special cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias).

Omitted variable bias: the bias that arises in the OLS estimators when a relevant variable is omitted from the regression.
Multiple Regression Analysis: Assumption
Including irrelevant variables in a regression model. Suppose the true model contains x₁ and x₂, and an irrelevant variable x₃ (with β₃ = 0 in the population) is included in the estimated model:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + u

This causes no problem for unbiasedness, because E(β̂₃) = 0 and the other estimators remain unbiased. However, including irrelevant variables may increase the sampling variance.

Omitting relevant variables: the simple case.

True model (contains x₁ and x₂): y = β₀ + β₁x₁ + β₂x₂ + u
Estimated model (x₂ is omitted): ỹ = β̃₀ + β̃₁x₁
Multiple Regression Analysis: Assumption
Omitted variable bias

If x₁ and x₂ are correlated, assume a linear regression relationship between them:

x₂ = δ₀ + δ₁x₁ + v

where v is an error term. If y is regressed only on x₁, the estimated intercept is β̃₀ = β̂₀ + β̂₂δ̂₀, and the estimated slope on x₁ is β̃₁ = β̂₁ + β̂₂δ̂₁. Taking expectations gives E(β̃₁) = β₁ + β₂δ₁.

Conclusion: all estimated coefficients will be biased (unless x₂ is irrelevant, β₂ = 0, or x₁ and x₂ are uncorrelated, δ₁ = 0).
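A hedged numerical check of this formula (simulated data with invented parameters): omitting x₂ pushes the short-regression slope toward β₁ + β₂δ₁ rather than β₁.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

beta0, beta1, beta2 = 1.0, 2.0, 3.0   # true model parameters (illustrative)
delta0, delta1 = 0.5, 0.7             # x2 = delta0 + delta1*x1 + v

x1 = rng.normal(size=n)
x2 = delta0 + delta1 * x1 + rng.normal(size=n)
y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Short regression: y on x1 only (x2 omitted)
b1_short, _ = np.polyfit(x1, y, deg=1)
print(f"short-regression slope: {b1_short:.3f}")
print(f"beta1 + beta2*delta1  : {beta1 + beta2 * delta1:.3f}")  # ~ same value
```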
Multiple Regression Analysis: Assumption
Example: omitting ability in a wage equation

wage = β₀ + β₁educ + β₂abil + u

We expect both parameters to be positive, and educ and abil to be positively correlated. If abil is omitted, the return to education will be overestimated, because the bias term β₂δ₁ is positive. It will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able on average.
Summary: direction of the omitted variable bias in β̃₁

                                x₁, x₂ positively correlated   x₁, x₂ negatively correlated
x₂ has a positive effect on y   positive bias                  negative bias
x₂ has a negative effect on y   negative bias                  positive bias
Trade-off between Omitted Variable Bias and Multicollinearity
If x₂ is related to both x₁ and y, we need to include x₂ if we have data on it, because including x₂ removes the bias due to its omission. But the correlation between x₂ and x₁ (multicollinearity) will act to increase the standard errors of the regression coefficients. How do we deal with this problem? If omitted variable bias is a real concern, we add x₂ and live with the consequences of multicollinearity and larger standard errors; otherwise, we live with the omitted variable bias.
Multiple Regression Analysis: Assumption
Discussion:
- If x₂ is unrelated to x₁ but related to y, you can include x₂ in the model; there is no multicollinearity in this case.
- If x₂ is unrelated to both x₁ and y, you do not need to add x₂, because it is irrelevant to the model.
- If x₂ is related to x₁ but unrelated to y, you do not need to add x₂, because you would only face a multicollinearity problem.
Multiple Regression Analysis: Assumption
4) Assumption MLR.4 (Zero conditional mean)

E(u | x₁, …, xₖ) = 0

The values of the explanatory variables must contain no information about the mean of the unobserved factors u; in particular, this implies corr(xⱼ, u) = 0 for j = 1, …, k. In a multiple regression model, the zero conditional mean assumption is much more likely to hold than in a simple regression, because fewer unobserved factors end up in the error.
Multiple Regression Analysis: Assumption
Discussion of the zero conditional mean assumption:
- Explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of assumption MLR.4.
- Explanatory variables that are uncorrelated with the error term are called exogenous; MLR.4 holds if all explanatory variables are exogenous.
- Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators.
Multiple Regression Analysis: Assumption
Theorem 3.1 (Unbiasedness of OLS): under assumptions MLR.1 through MLR.4,

E(β̂ⱼ) = βⱼ,  j = 0, 1, …, k

Interpretation of unbiasedness:
- The estimated coefficients may be smaller or larger, depending on the sample that results from the random draw.
- However, on average they will be equal to the values that characterize the true relationship between y and x in the population.
- "On average" means: if drawing the random sample and doing the estimation were repeated many times.
- In a given sample, estimates may differ considerably from the true values.
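A small Monte Carlo sketch of "on average" (illustrative parameters, not from the slides): repeat the draw-and-estimate cycle many times and average the slope estimates.

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1 = 1.0, 2.0    # illustrative population parameters
n, reps = 50, 10_000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)   # MLR.1-MLR.4 hold here
    slopes[r], _ = np.polyfit(x, y, deg=1)       # estimate on this sample

# Individual estimates vary, but their mean is close to the true slope
print(f"mean of estimates: {slopes.mean():.3f} (true beta1 = {beta1})")
```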
Multiple Regression Analysis: Assumption
5) Assumption MLR.5 (Homoskedasticity)

Var(u | x₁, …, xₖ) = σ²

The values of the explanatory variables must contain no information about the variance of the unobserved factors. Shorthand notation: collecting all explanatory variables in a random vector x = (x₁, …, xₖ), the assumption reads Var(u | x) = σ². Example: in a wage equation, homoskedasticity requires that the variance of the unobserved wage determinants does not depend on the level of education. This assumption may be hard to justify in many cases.
Homoskedasticity

If the error variance is constant, we have homoskedasticity. In a plot of the residuals against the fitted values ŷ, the spread of the data points does not change much.

[Figure: residuals plotted against fitted values ŷ, with roughly constant spread]
Heteroskedasticity

When the requirement of a constant variance is violated, we have heteroskedasticity. A plot of the residuals against the predicted values ŷ will exhibit a cone shape: the spread of the residuals increases with ŷ.

[Figure: residuals plotted against fitted values ŷ, with spread increasing in ŷ]
Graphical check for Heteroskedasticity
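A sketch of such a graphical check in Python (the heteroskedastic data are invented for illustration; numpy and matplotlib are assumed to be available):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(1, 10, size=n)

# Error standard deviation grows with x: a heteroskedastic process
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x)

b1, b0 = np.polyfit(x, y, deg=1)
fitted = b0 + b1 * x
resid = y - fitted

# Residuals vs. fitted values: a cone shape suggests heteroskedasticity
plt.scatter(fitted, resid, marker="+")
plt.axhline(0, linewidth=1)
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.title("Graphical check for heteroskedasticity")
plt.show()
```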
Heteroskedasticity: when the variance of the error term differs across values of the explanatory variables, you have heteroskedasticity. Problem: the OLS estimators of the β's are no longer minimum variance, so you can no longer be sure that the value you get for β̂ⱼ lies close to the true βⱼ.
Assignment 2

1. Let kids denote the number of children ever born to a woman, and let educ denote the woman's years of education. A simple model relating fertility to years of education is

kidsᵢ = β₀ + β₁educᵢ + uᵢ

where uᵢ is the error term.
(i) What kinds of factors are contained in u? Are these likely to be correlated with the level of education?
(ii) Will a simple regression analysis uncover the partial effect (holding the other factors fixed) of education on the number of children? Explain. (Hint: use MLR.3.)

2. Explain the relationship between multicollinearity and omitted variable bias.