Chapter 6: MULTIPLE REGRESSION ANALYSIS


1 Chapter 6: MULTIPLE REGRESSION ANALYSIS
Econometrics Econ. 405

2 I. Regression Analysis Beyond Simple Models
In reality, economic theory is applied using more than one explanatory variable. Thus, the simple regression model (discussed in the last chapter) needs to be extended to include more than two variables. Adding more variables to the regression model requires revisiting the Classical Linear Regression Model (CLRM) assumptions.

3 Multiple regression analysis is better suited to “ceteris paribus” analysis because it allows us to control for several variables that simultaneously influence the dependent variable. In this case, the general functional form represents the relationship between the dependent variable (DV) and the independent variables (IVs), which yields a better model for estimating the dependent variable.

4 II. Motivation for Multiple Regression
Incorporate more explanatory factors into the model. Explicitly hold fixed other factors that would otherwise end up in the error term (u). Allow for more flexible functional forms. In short, multiple regression can solve problems that cannot be solved by simple regression.

6 In Model (1), wage is regressed on education alone: all other factors that could affect wage are thrown into the error term (u). Thus (u) would be correlated with “Education”, even though we must assume that (u) and (X) are uncorrelated (CLRM Assumption #6). In Model (2), experience is also included, so we can measure with confidence the effect of education on wage, holding experience fixed.

10 III. Features of Multiple Regression

12 Properties of OLS Regression
Recall the simple regression model. Algebraic properties of OLS regression: fitted or predicted values, ŷi = β̂0 + β̂1xi; deviations from the regression line (residuals), ûi = yi − ŷi; the residuals sum to zero; the correlation between the residuals and the regressor is zero; the sample averages of y and x lie on the regression line.

13 Multiple Regression Model
Algebraic properties of OLS regression: fitted or predicted values, ŷi = β̂0 + β̂1xi1 + … + β̂kxik; deviations from the regression line (residuals), ûi = yi − ŷi; the residuals sum to zero; the correlations between the residuals and each regressor are zero; the sample averages of y and of the regressors lie on the regression line.
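To make these algebraic properties concrete, here is a minimal NumPy sketch (not part of the original slides; the data-generating process, coefficient values, and variable names are hypothetical) that checks them on simulated data:

```python
# Sketch (illustrative only): verify the algebraic OLS properties on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)   # hypothetical data-generating process

X = np.column_stack([np.ones(n), x1, x2])             # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS coefficients
y_hat = X @ beta_hat                                  # fitted (predicted) values
u_hat = y - y_hat                                     # deviations from the regression line (residuals)

print(u_hat.sum())                                    # ~0: residuals sum to zero
print(x1 @ u_hat, x2 @ u_hat)                         # ~0: residuals uncorrelated with each regressor
print(y.mean() - beta_hat @ X.mean(axis=0))           # ~0: sample averages lie on the regression line
```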

14 IV. Goodness of Fit (R²)
Accordingly, goodness of fit is a measure of variation that shows how well the explanatory variables explain the dependent variable. TSS = total sum of squares; ESS = explained sum of squares; RSS = residual sum of squares.

15 Total sum of squares, explained sum of squares, residual sum of squares
The total sum of squares represents the total variation in the dependent variable. The explained sum of squares represents the variation explained by the regression. The residual sum of squares represents the variation not explained by the regression, so that TSS = ESS + RSS.

16 Total variation = explained part + unexplained part
The total variation splits into an explained part and an unexplained part, TSS = ESS + RSS. R-squared measures the fraction of the total variation that is explained by the regression.

17 The goodness of fit is a measure of variation showing how well the explanatory variables explain the dependent variable; under the multiple regression model it takes the same form, R² = ESS/TSS = 1 − RSS/TSS, now computed from the fitted values of the multiple regression.
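As an illustration, a short sketch of the TSS/ESS/RSS decomposition and R² for a multiple regression on simulated data (the coefficients and sample size below are hypothetical, and NumPy is used only for convenience):

```python
# Sketch (illustrative only): TSS, ESS, RSS and R-squared for a multiple regression.
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # constant plus two regressors
y = X @ np.array([1.0, 0.8, -1.2]) + rng.normal(size=n)      # hypothetical coefficients

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
u_hat = y - y_hat

tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
rss = np.sum(u_hat ** 2)                 # residual sum of squares

print(tss, ess + rss)                    # TSS = ESS + RSS
print(ess / tss, 1 - rss / tss)          # R-squared, both expressions agree
```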

18 V. Assumptions of the Multiple Regression Model
We continue within the framework of the classical linear regression model (CLRM) and continue to use the method of ordinary least squares (OLS) to estimate the coefficients. The simplest possible multiple regression model is the three-variable regression, with one DV and two IVs. Accordingly, the CLRM consists of the following assumptions:

19 E(Yi | X1i, X2i) = β0 + β1X1i + β2X2i
Assumptions: 1- Linearity: Yi = β0 + β1X1i + β2X2i + ui. 2- X values are fixed in repeated sampling: E(Yi | X1i, X2i) = β0 + β1X1i + β2X2i. 3- Zero mean value of ui: E(ui | X1i, X2i) = 0 for each i. 4- No serial correlation (autocorrelation): cov(ui, uj) = 0 for i ≠ j.

20 5- Homoscedasticity: var(ui) = σ². 6- Zero covariance between ui and each X variable: cov(ui, X1i) = cov(ui, X2i) = 0. 7- Number of observations vs. number of parameters: N > number of parameters.

21 8- Variability in the X values
9- The regression model is correctly specified: no specification bias. 10- No exact collinearity (perfect multicollinearity) between the X variables.

22 Now: Which CLRM assumptions apply to the simple regression model, and which apply to the multiple regression model? What is the key CLRM assumption for the multiple regression model?

23 Revisit the 10th Assumption:
There must be no exact linear relationship between X1 and X2, i.e., no collinearity or multicollinearity. Informally, no collinearity means that none of the regressors can be written as an exact linear combination of the remaining regressors in the model. Formally, no collinearity means that there exists no set of numbers λ1 and λ2, not both zero, such that λ1X1i + λ2X2i = 0.

24 If such an exact linear relationship exists, then X1 and X2 are said to be collinear or linearly dependent. On the other hand, if the last equation holds true only when λ1 = λ2 = 0, then X1 and X2 are said to be linearly independent. For example, if X1i = −4X2i, i.e., X1i + 4X2i = 0, the two variables are linearly dependent, and if both are included in a regression model we will have perfect collinearity, an exact linear relationship between the two regressors.
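A small sketch of what perfect collinearity does in practice, assuming the exact relationship X1 = −4·X2 used above (simulated data, purely illustrative): the design matrix loses rank, so the OLS normal equations have no unique solution.

```python
# Sketch (illustrative only): perfect collinearity, X1 = -4 * X2, i.e. X1 + 4*X2 = 0.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
x1 = -4.0 * x2                                 # exact linear dependence between the regressors
X = np.column_stack([np.ones(n), x1, x2])      # constant, x1, x2

print(np.linalg.matrix_rank(X))                # rank 2 < 3: one column is redundant
print(np.linalg.cond(X.T @ X))                 # enormous condition number: the normal
                                               # equations have no unique OLS solution
```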

25 The Importance of Assumptions:

26 Discussion of Assumption (3): E(ui | X1i , X2i) = 0
Explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of this assumption. Explanatory variables that are uncorrelated with the error term are called exogenous; this assumption holds if all explanatory variables are exogenous. Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators.

27 The Importance of Assumptions:
Assumption (5): Homoscedasticity, var(ui) = σ². The values of the explanatory variables must contain no information about the variance of the unobserved factors.

28 VI. Multiple Regression Analysis: Estimation
1- Estimating the error variance: An unbiased estimate of the error variance is obtained by dividing the residual sum of squares by the degrees of freedom, n − (k + 1), where k + 1 is the number of estimated regression coefficients (k slopes plus the intercept) and n is the number of observations. The n estimated squared residuals in the sum are not completely independent: they are related through the k + 1 equations that define the first-order conditions of the minimization problem.
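A minimal sketch of this estimator, σ̂² = RSS/(n − k − 1), on simulated data (the sample size, coefficients, and true error standard deviation below are hypothetical):

```python
# Sketch (illustrative only): unbiased error-variance estimate, sigma2_hat = RSS / (n - k - 1).
import numpy as np

rng = np.random.default_rng(3)
n, k = 300, 2                                            # n observations, k slope coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(scale=1.5, size=n)   # true sigma = 1.5

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_hat) ** 2)

sigma2_hat = rss / (n - k - 1)                           # divide by the degrees of freedom, not n
print(sigma2_hat, 1.5 ** 2)                              # close to the true error variance 2.25
```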

29 2- Sampling variances of OLS slope estimators:
Var(β̂j) = σ² / [TSSj (1 − R²j)], where σ² is the variance of the error term, TSSj = Σi (xij − x̄j)² is the total sample variation in xj, and R²j is the R-squared from a regression of explanatory variable xj on all other independent variables (including a constant).

30 3- Standard Errors for Regression Coefficients:
The estimated standard deviations of the regression coefficients are called “standard errors”. They measure how precisely the regression coefficients are estimated: se(β̂j) = σ̂ / √[TSSj (1 − R²j)], the estimated sampling variation of β̂j, obtained by replacing the unknown σ² with its estimate σ̂². Note that these formulas are only valid under the CLRM assumptions (in particular, there has to be homoscedasticity).
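The following sketch computes the standard error of one slope from this formula and checks it against the usual matrix expression √(σ̂²[(X′X)⁻¹]jj); the simulated data and the correlation between the regressors are hypothetical:

```python
# Sketch (illustrative only): se(b1_hat) = sigma_hat / sqrt(TSS1 * (1 - R2_1)),
# checked against the matrix formula sqrt(sigma2_hat * [(X'X)^-1]_11).
import numpy as np

rng = np.random.default_rng(4)
n = 400
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                 # regressors correlated by construction
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
rss = np.sum((y - X @ beta_hat) ** 2)
sigma2_hat = rss / (n - 2 - 1)                     # two slopes: n - k - 1 degrees of freedom

tss1 = np.sum((x1 - x1.mean()) ** 2)               # total sample variation in x1
Z = np.column_stack([np.ones(n), x2])              # the "other" regressors plus a constant
g = np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1 - np.sum((x1 - Z @ g) ** 2) / tss1        # R^2 from regressing x1 on the others

se_formula = np.sqrt(sigma2_hat / (tss1 * (1 - r2_1)))
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
print(se_formula, se_matrix)                       # the two expressions agree
```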

31 The Components of OLS Variances:
1) The error variance: a high error variance increases the sampling variance because there is more “noise” in the equation. A large error variance necessarily makes estimates imprecise. 2) The total sample variation in the explanatory variable: more sample variation leads to more precise estimates. Total sample variation automatically increases with the sample size, so increasing the sample size is a way to get more precise estimates.

32 3) Linear relationships among the independent variables
Regress xj on all other independent variables (including a constant). The higher the R² of this regression, the better xj can be linearly explained by the other independent variables, and in that case the sampling variance of β̂j will be higher. The problem of almost linearly dependent explanatory variables is called multicollinearity (explained next).

33 Example: Multicollinearity
Consider a regression of the average standardized test score of a school on expenditures for teachers, expenditures for instructional materials, and other expenditures. The different expenditure categories will be strongly correlated, because if a school has a lot of resources it will spend a lot on everything. It will be hard to estimate the differential effects of the different expenditure categories because all expenditures are either high or low together. For precise estimates of the differential effects, one would need information about situations where the expenditure categories change differentially. Therefore, the sampling variance of the estimated effects will be large.

34 Further Discussion of Multicollinearity
According to the example, it would probably be better to lump all expenditure categories together, because their effects cannot be disentangled. In other cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias, discussed next). Only the sampling variances of the variables involved in the multicollinearity will be inflated; the estimates of the other effects may still be very precise. Multicollinearity may be detected through a diagnostic called the Variance Inflation Factor (VIF) (explained in the next chapters).
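Although the VIF is only introduced formally in later chapters, the idea, VIFj = 1/(1 − R²j), can be sketched as follows (the function name vif and the simulated data are this sketch's own, not the course's):

```python
# Sketch (illustrative only): the idea behind the VIF, VIF_j = 1 / (1 - R2_j).
import numpy as np

def vif(X, j):
    """VIF of column j of X (X holds the regressors without the constant column)."""
    n = X.shape[0]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    coef = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
    resid = X[:, j] - others @ coef
    r2_j = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2_j)

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)        # nearly collinear with x1
x3 = rng.normal(size=n)                          # unrelated to the others
X = np.column_stack([x1, x2, x3])

print([round(vif(X, j), 1) for j in range(3)])   # large VIFs for x1 and x2, roughly 1 for x3
```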

35 The issue of Including and Omitting variables
Case (1): Including irrelevant variables in a regression model. This causes no bias: the estimated coefficients are still unbiased, because β3 = 0 in the population. However, including irrelevant variables may increase the sampling variance, so the OLS estimates would no longer be “best” (BLUE). Why?

36 Example: Case (2): Omitting relevant variables in a regression model
True model (contains x1 and x2): y = β0 + β1x1 + β2x2 + u. Estimated model (x2 is omitted): y is regressed on x1 only, so the omitted x2 is absorbed into the error term. If x1 and x2 are correlated, there is a linear regression relationship between them: x2 = δ0 + δ1x1 + e. If y is only regressed on x1, the estimated intercept is β̃0 = β̂0 + β̂2δ̂0 and the estimated slope on x1 is β̃1 = β̂1 + β̂2δ̂1. Conclusion: all estimated coefficients will be biased.

37 The return to education will be overestimated
Since β2 (the effect of the omitted ability variable on wage) and δ1 (the slope from regressing ability on education) will both be positive, the return to education will be overestimated: it will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able on average.
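A simulation sketch of this omitted-variable bias (the numbers are hypothetical, chosen only to mimic the education/ability story): the short-regression slope ends up at β1 + β2δ1, exactly as the decomposition on the previous slide predicts in-sample.

```python
# Sketch (illustrative only): omitted-variable bias in the education/ability story.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x1 = rng.normal(size=n)                        # e.g., education
x2 = 0.5 * x1 + rng.normal(size=n)             # e.g., ability, positively related to education
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)   # true return to education is 2.0

X_long = np.column_stack([np.ones(n), x1, x2])       # correct model
X_short = np.column_stack([np.ones(n), x1])          # x2 omitted

b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]
delta1 = np.linalg.lstsq(X_short, x2, rcond=None)[0][1]   # slope from regressing x2 on x1

print(b_long[1])                        # ~2.0: unbiased
print(b_short[1])                       # ~2.0 + 1.5*0.5 = 2.75: overestimated
print(b_long[1] + b_long[2] * delta1)   # matches the short-regression slope exactly in-sample
```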

38 Variances in Misspecified Models
The choice of whether to include a particular variable in a regression can be made by analyzing the tradeoff between bias and variance. It might be the case that the likely omitted variable bias in the misspecified model (2) is more than compensated for by its smaller variance. True population model: y = β0 + β1x1 + β2x2 + u. Estimated model (1): y regressed on x1 and x2, giving β̂1. Estimated model (2): y regressed on x1 only, giving β̃1.

39 Case (1): the estimators of β1 in both models are unbiased
Case (2): the estimator of β1 in model (2) is biased. Recall that, conditional on x1 and x2, the variance in model (2), Var(β̃1) = σ²/TSS1, is always smaller than (or equal to) the variance in model (1), Var(β̂1) = σ²/[TSS1(1 − R²1)]. Conclusion: do not include irrelevant regressors; trade off bias against variance. Caution: the bias will not vanish even in large samples.
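A quick sketch of this variance comparison, taking σ² as known for illustration and using hypothetical simulated regressors:

```python
# Sketch (illustrative only): conditional on the x's, Var(b1_tilde) = sigma^2 / TSS1 in the
# short model is never larger than Var(b1_hat) = sigma^2 / (TSS1 * (1 - R2_1)) in the long model.
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)           # correlated regressors (hypothetical)
sigma2 = 1.0                                 # error variance, taken as known for illustration

tss1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])
g = np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1 - np.sum((x1 - Z @ g) ** 2) / tss1  # R^2 from regressing x1 on x2

var_short = sigma2 / tss1                    # model (2): x2 omitted
var_long = sigma2 / (tss1 * (1 - r2_1))      # model (1): x2 included
print(var_short, var_long)                   # var_short <= var_long
```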

42 Further interpretation of the estimators

