
1 Lecture 9
Today: Ch. 3: Multiple Regression Analysis
- Example with two independent variables
- Frisch-Waugh-Lovell theorem

2 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

EARNINGS = β1 + β2 S + β3 EXP + u

[Figure: three-dimensional diagram with axes EARNINGS, S, and EXP; the intercept β1 is marked on the EARNINGS axis.]

We'll look at the geometrical interpretation of a multiple regression model with two explanatory variables. Specifically, we will look at an earnings function model where hourly earnings, EARNINGS, depend on years of schooling (highest grade completed), S, and years of work experience, EXP. The model has three dimensions, one each for EARNINGS, S, and EXP. The starting point for investigating the determination of EARNINGS is the intercept, β1. Literally, the intercept gives EARNINGS for those respondents who have no schooling and no work experience. However, there were no respondents with less than 6 years of schooling. Hence a literal interpretation of β1 would be unwise.

3  1 +  2 S © Christopher Dougherty 1999–2006 EARNINGS EXP The next term on the right side of the equation gives the effect of variations in S. A one year increase in S causes EARNINGS to increase by  2 dollars, holding EXP constant. S 11 pure S effect EARNINGS =  1 +  2 S +  3 EXP + u MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

4  1 +  3 EXP © Christopher Dougherty 1999–2006 pure EXP effect S 11 EARNINGS EXP EARNINGS =  1 +  2 S +  3 EXP + u Similarly, the third term gives the effect of variations in EXP. A one year increase in EXP causes earnings to increase by  3 dollars, holding S constant. MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

5 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

EARNINGS = β1 + β2 S + β3 EXP + u

[Figure: the plane EARNINGS = β1 + β2 S + β3 EXP, showing the pure S effect (β1 + β2 S), the pure EXP effect (β1 + β3 EXP), and the combined effect of S and EXP.]

Different combinations of S and EXP give rise to values of EARNINGS which lie on the plane shown in the diagram, defined by EARNINGS = β1 + β2 S + β3 EXP. This is the nonstochastic/deterministic (nonrandom) component of the model.

6 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

EARNINGS = β1 + β2 S + β3 EXP + u

[Figure: as before, with one observation at height β1 + β2 S + β3 EXP + u; the vertical distance u between the plane and the observation is marked.]

The final element of the model is the disturbance term, u. This causes the actual values of EARNINGS to deviate from the plane. In this observation, u happens to have a positive value.

7 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

EARNINGS = β1 + β2 S + β3 EXP + u

[Figure: as on the previous slide.]

A sample consists of a number of observations generated in this way. Note that the interpretation of the model does not depend on whether S and EXP are correlated or not. However, we do assume that the effects of S and EXP on EARNINGS are additive. The impact of a difference in S on EARNINGS is not affected by the value of EXP, or vice versa.

8 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

The regression coefficients are derived using the same least squares principle used in simple regression analysis. The fitted value of Y in observation i depends on our choice of b1, b2, and b3. The residual ei in observation i is the difference between the actual and fitted values of Y.
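In the usual notation, with X2 and X3 denoting the two explanatory variables (S and EXP here), the fitted value and residual are:

\hat{Y}_i = b_1 + b_2 X_{2i} + b_3 X_{3i}, \qquad e_i = Y_i - \hat{Y}_i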

9 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

We define RSS, the sum of the squares of the residuals, and choose b1, b2, and b3 so as to minimize it.
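In symbols:

RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} \right)^2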

10 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

First we expand RSS as shown, and then we use the first-order conditions for minimizing it.
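Written out, the three first-order conditions are:

\frac{\partial RSS}{\partial b_1} = -2 \sum (Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}) = 0

\frac{\partial RSS}{\partial b_2} = -2 \sum X_{2i} (Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}) = 0

\frac{\partial RSS}{\partial b_3} = -2 \sum X_{3i} (Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i}) = 0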

11 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

We thus obtain three equations in three unknowns. Solving for b1, b2, and b3, we obtain the expressions shown below. (The expression for b3 is the same as that for b2, with the subscripts 2 and 3 interchanged everywhere.)
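In sample variance and covariance notation, the solutions take the form:

b_2 = \frac{\mathrm{Cov}(X_2, Y)\,\mathrm{Var}(X_3) - \mathrm{Cov}(X_3, Y)\,\mathrm{Cov}(X_2, X_3)}{\mathrm{Var}(X_2)\,\mathrm{Var}(X_3) - \left[ \mathrm{Cov}(X_2, X_3) \right]^2}, \qquad b_1 = \bar{Y} - b_2 \bar{X}_2 - b_3 \bar{X}_3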

12 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

The expression for b1 is a straightforward extension of the expression for it in simple regression analysis. However, the expressions for the slope coefficients are considerably more complex than that for the slope coefficient in simple regression analysis. For the general case when there are many explanatory variables, ordinary algebra is inadequate. It is necessary to switch to matrix algebra.
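For reference, in matrix notation, with y the n × 1 vector of observations on the dependent variable and X the n × k matrix of regressors (first column all ones), the least squares estimator for any number of explanatory variables is:

b = (X'X)^{-1} X'y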

13 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

. reg EARNINGS S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   67.54
       Model |  22513.6473     2  11256.8237           Prob > F      =  0.0000
    Residual |  89496.5838   537  166.660305           R-squared     =  0.2010
-------------+------------------------------           Adj R-squared =  0.1980
       Total |  112010.231   539  207.811189           Root MSE      =   12.91

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.678125   .2336497    11.46   0.000     2.219146    3.137105
         EXP |   .5624326   .1285136     4.38   0.000     .3099816    .8148837
       _cons |  -26.48501    4.27251    -6.20   0.000    -34.87789   -18.09213
------------------------------------------------------------------------------

Here is the regression output for the earnings function using Data Set 21.

14 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

[Regression output for reg EARNINGS S EXP, as on the previous slide.]

It indicates that earnings increase by $2.68 for every extra year of schooling and by $0.56 for every extra year of work experience.

15 © Christopher Dougherty 1999–2006
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

[Regression output for reg EARNINGS S EXP, as on the previous slides.]

Literally, the intercept indicates that an individual who had no schooling or work experience would have hourly earnings of –$26.49. Obviously, this is impossible. The lowest value of S in the sample was 6. We have obtained a nonsense estimate because we have extrapolated too far from the data range.

16 © Christopher Dougherty 1999–2006
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

Suppose that you were particularly interested in the relationship between EARNINGS and S and wished to represent it graphically, using the sample data. A simple plot would be misleading.
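As a sketch of the kind of plot meant here (the original slide showed only the graph; the twoway syntax below is illustrative, assuming the data set is in memory):

. scatter EARNINGS S || lfit EARNINGS S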

17 © Christopher Dougherty 1999–2006
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. cor S EXP
(obs=540)

        |      S    EXP
--------+------------------
      S |  1.0000
    EXP | -0.2179  1.0000

Schooling is negatively correlated with work experience. The plot fails to take account of this, and as a consequence the regression line underestimates the impact of schooling on earnings: omitted variable bias. (Later, we'll discuss the mathematical details of this distortion.) To eliminate the distortion, you purge both EARNINGS and S of their components related to EXP and then draw a scatter diagram using the purged variables, as sketched below.
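Collected as a do-file sketch, the purging steps carried out one by one on the next slides (assumes the data set with EARNINGS, S, and EXP is in memory):

. reg EARNINGS EXP       // regress earnings on experience
. predict EEARN, resid   // EEARN: the part of EARNINGS unrelated to EXP
. reg S EXP              // regress schooling on experience
. predict ES, resid      // ES: the part of S unrelated to EXP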

18 © Christopher Dougherty 1999–2006
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg EARNINGS EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =    2.98
       Model |  617.717488     1  617.717488           Prob > F      =  0.0847
    Residual |  111392.514   538  207.049282           R-squared     =  0.0055
-------------+------------------------------           Adj R-squared =  0.0037
       Total |  112010.231   539  207.811189           Root MSE      =  14.389

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |   .2414715   .1398002     1.73   0.085    -.0331497    .5160927
       _cons |   15.55527   2.442468     6.37   0.000     10.75732    20.35321
------------------------------------------------------------------------------

. predict EEARN, resid

We start by regressing EARNINGS on EXP, as shown above. The residuals are the part of EARNINGS which is not related to EXP. The 'predict' command is the Stata command for saving the residuals from the most recent regression. We name them EEARN.

19 © Christopher Dougherty 1999–2006
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =   26.82
       Model |  152.160205     1  152.160205           Prob > F      =  0.0000
    Residual |  3052.82313   538  5.67439243           R-squared     =  0.0475
-------------+------------------------------           Adj R-squared =  0.0457
       Total |  3204.98333   539  5.94616574           Root MSE      =  2.3821

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         EXP |  -.1198454   .0231436    -5.18   0.000    -.1653083   -.0743826
       _cons |   15.69765   .4043447    38.82   0.000     14.90337    16.49194
------------------------------------------------------------------------------

. predict ES, resid

We do the same with S. We regress it on EXP and save the residuals as ES.

20 © Christopher Dougherty 1999–2006
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

[Figure: scatter diagram of EEARN against ES, with the trend line for the purged variables in black and the trend line from the earlier uncontrolled scatter diagram in red.]

Now we plot EEARN on ES and the scatter is a faithful representation of the relationship, both in terms of the slope of the trend line (the black line) and in terms of the variation about that line. As you would expect, the trend line is steeper than in the scatter diagram which did not control for EXP (reproduced here as the red line).
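A sketch of the corresponding Stata command (the slide showed only the graph; the twoway syntax is illustrative):

. scatter EEARN ES || lfit EEARN ES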

21 © Christopher Dougherty 1999–2006
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

. reg EEARN ES

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  131.63
       Model |  21895.9298     1  21895.9298           Prob > F      =  0.0000
    Residual |  89496.5833   538  166.350527           R-squared     =  0.1966
-------------+------------------------------           Adj R-squared =  0.1951
       Total |  111392.513   539  206.665145           Root MSE      =  12.898

------------------------------------------------------------------------------
       EEARN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ES |   2.678125   .2334325    11.47   0.000     2.219574    3.136676
       _cons |   8.10e-09   .5550284     0.00   1.000    -1.090288    1.090288
------------------------------------------------------------------------------

From the multiple regression:

. reg EARNINGS S EXP
------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.678125   .2336497    11.46   0.000     2.219146    3.137105
         EXP |   .5624326   .1285136     4.38   0.000     .3099816    .8148837
       _cons |  -26.48501    4.27251    -6.20   0.000    -34.87789   -18.09213
------------------------------------------------------------------------------

Here is the regression of EEARN on ES. We will content ourselves with verifying that the estimate of the slope coefficient is the same as that from the multiple regression. A mathematical proof that the technique works requires matrix algebra. This result is also called the Frisch-Waugh-Lovell theorem.
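In one standard matrix formulation of the theorem: partition the model as y = X_1 b_1 + X_2 b_2 + e and let M_2 = I - X_2 (X_2'X_2)^{-1} X_2', the matrix that turns any variable into its residuals from a regression on X_2. Then

b_1 = (X_1' M_2 X_1)^{-1} X_1' M_2 y

That is, regressing the purged M_2 y on the purged M_2 X_1 reproduces exactly the coefficients that the full regression assigns to X_1.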

22 © Christopher Dougherty 1999–2006
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

[Output of reg EEARN ES and of reg EARNINGS S EXP, as on the previous slide.]

Finally, a small and not very important technical point. You may have noticed that the standard error and t statistic do not quite match. The reason for this is that the number of degrees of freedom is overstated by 1 in the residuals regression. That regression has not made allowance for the fact that we have already used up 1 degree of freedom in removing EXP from the model.
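A worked check using the numbers above (the scaling follows from s^2 = RSS/(n - k)):

s^2 (residuals regression) = 89496.58 / 538 = 166.35
s^2 (multiple regression)  = 89496.58 / 537 = 166.66

Scaling the understated standard error by \sqrt{538/537} recovers the correct value: 0.2334325 × 1.00093 ≈ 0.2336497.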

23 © Christopher Dougherty 1999–2006
PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

A.1: The model is linear in parameters and correctly specified.
A.2: There does not exist an exact linear relationship among the regressors in the sample.
A.3: The disturbance term has zero expectation.
A.4: The disturbance term is homoscedastic.
A.5: The values of the disturbance term have independent distributions.
A.6: The disturbance term has a normal distribution.

Moving from the simple to the multiple regression model, we start by restating the regression model assumptions. Only A.2 is different. Previously it was stated that there must be some variation in the X variable. We will explain the difference in one of the following lectures. Provided that the regression model assumptions are valid, the OLS estimators in the multiple regression model are unbiased and efficient, as in the simple regression model.

