Lecture 9
Today: Ch. 3: Multiple Regression Analysis
Example with two independent variables
Frisch-Waugh-Lovell theorem

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE
© Christopher Dougherty 1999–2006

EARNINGS = β1 + β2 S + β3 EXP + u

We'll look at the geometrical interpretation of a multiple regression model with two explanatory variables. Specifically, we will look at an earnings function model where hourly earnings, EARNINGS, depend on years of schooling (highest grade completed), S, and years of work experience, EXP. The diagram for the model has three dimensions, one each for EARNINGS, S, and EXP. The starting point for investigating the determination of EARNINGS is the intercept, β1. Literally, the intercept gives EARNINGS for those respondents who have no schooling and no work experience. However, there were no respondents with less than 6 years of schooling, so a literal interpretation of β1 would be unwise.

 1 +  2 S © Christopher Dougherty 1999–2006 EARNINGS EXP The next term on the right side of the equation gives the effect of variations in S. A one year increase in S causes EARNINGS to increase by  2 dollars, holding EXP constant. S 11 pure S effect EARNINGS =  1 +  2 S +  3 EXP + u MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

 1 +  3 EXP © Christopher Dougherty 1999–2006 pure EXP effect S 11 EARNINGS EXP EARNINGS =  1 +  2 S +  3 EXP + u Similarly, the third term gives the effect of variations in EXP. A one year increase in EXP causes earnings to increase by  3 dollars, holding S constant. MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

Different combinations of S and EXP give rise to values of EARNINGS which lie on the plane shown in the diagram, defined by the equation EARNINGS = β1 + β2 S + β3 EXP. This is the nonstochastic/deterministic (nonrandom) component of the model, combining the pure S effect and the pure EXP effect.

The final element of the model is the disturbance term, u. This causes the actual values of EARNINGS to deviate from the plane. In the observation illustrated, u happens to have a positive value.

A sample consists of a number of observations generated in this way. Note that the interpretation of the model does not depend on whether S and EXP are correlated or not. However, we do assume that the effects of S and EXP on EARNINGS are additive: the impact of a difference in S on EARNINGS is not affected by the value of EXP, or vice versa.

The regression coefficients are derived using the same least squares principle used in simple regression analysis. The fitted value of Y in observation i depends on our choice of b1, b2, and b3. The residual ei in observation i is the difference between the actual and fitted values of Y.
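In symbols (the displays on the original slide are not preserved, so this is a standard reconstruction, writing the regressors as X2 and X3):

\hat{Y}_i = b_1 + b_2 X_{2i} + b_3 X_{3i}, \qquad e_i = Y_i - \hat{Y}_i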

We define RSS, the sum of the squares of the residuals, and choose b1, b2, and b3 so as to minimize it.
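Written out (again a standard reconstruction of the lost display):

\mathrm{RSS} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} \right)^2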

First we expand RSS, and then we use the first-order conditions for minimizing it.
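Differentiating RSS with respect to each coefficient and setting the derivatives equal to zero gives the normal equations (standard form; the slide's own display did not survive):

\frac{\partial \mathrm{RSS}}{\partial b_1} = -2 \sum_i \left( Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} \right) = 0

\frac{\partial \mathrm{RSS}}{\partial b_2} = -2 \sum_i X_{2i} \left( Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} \right) = 0

\frac{\partial \mathrm{RSS}}{\partial b_3} = -2 \sum_i X_{3i} \left( Y_i - b_1 - b_2 X_{2i} - b_3 X_{3i} \right) = 0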

We thus obtain three equations in three unknowns. Solving for b1, b2, and b3, we obtain the expressions below. (The expression for b3 is the same as that for b2, with the subscripts 2 and 3 interchanged everywhere.)
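The standard solutions, reconstructed here because the slide's displays were lost (lower-case letters denote deviations from sample means, e.g. x_{2i} = X_{2i} - \bar{X}_2 and y_i = Y_i - \bar{Y}):

b_2 = \frac{\sum x_{2i} y_i \sum x_{3i}^2 - \sum x_{3i} y_i \sum x_{2i} x_{3i}}{\sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2}, \qquad b_1 = \bar{Y} - b_2 \bar{X}_2 - b_3 \bar{X}_3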

The expression for b1 is a straightforward extension of the corresponding expression in simple regression analysis. However, the expressions for the slope coefficients are considerably more complex than the slope expression in simple regression. For the general case, with many explanatory variables, ordinary algebra is inadequate and it is necessary to switch to matrix algebra.
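In matrix notation (a standard result, not reproduced from the slide): stacking the observations as y = Xb + e, with X containing a column of ones followed by the columns of regressors, the normal equations become X'Xb = X'y, so that

b = (X'X)^{-1} X'y

provided X'X is invertible. The three-variable expressions above are the special case of this general formula.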

. reg EARNINGS S EXP
[Stata output: ANOVA table (Number of obs, F(2, 537), R-squared, Root MSE) and coefficient table for S, EXP, and _cons; the numerical values were not preserved in this transcript.]

Here is the regression output for the earnings function using Data Set 21.

It indicates that earnings increase by $2.68 for every extra year of schooling and by $0.56 for every extra year of work experience.
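As a quick check of scale using these slopes: two workers with equal experience whose schooling differs by four years are predicted to differ in hourly earnings by 4 × 2.68 = $10.72, while ten extra years of experience at given schooling corresponds to 10 × 0.56 = $5.60 per hour.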

Literally, the intercept indicates that an individual who had no schooling or work experience would have negative hourly earnings. Obviously, this is impossible. The lowest value of S in the sample was 6. We have obtained a nonsense estimate because we have extrapolated too far from the data range.

GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL

Suppose that you were particularly interested in the relationship between EARNINGS and S and wished to represent it graphically, using the sample data. A simple plot would be misleading.

Schooling is negatively correlated with work experience, as the sample correlation confirms:

. cor S EXP
(obs=540)
[correlation matrix for S and EXP; the values were not preserved in this transcript]

The plot fails to take account of this, and as a consequence the regression line underestimates the impact of schooling on earnings: omitted variable bias. (Later, we'll discuss the mathematical details of this distortion.) To eliminate the distortion, you purge both EARNINGS and S of their components related to EXP and then draw a scatter diagram using the purged variables.

. reg EARNINGS EXP
[Stata output: F(1, 538) = 2.98; the remaining statistics and the coefficient table for EXP and _cons were not preserved.]
. predict EEARN, resid

We start by regressing EARNINGS on EXP, as shown above. The residuals are the part of EARNINGS which is not related to EXP. 'predict' is the Stata command for saving the residuals from the most recent regression; we name them EEARN.

. reg S EXP
[Stata output: ANOVA table and coefficient table for EXP and _cons; values not preserved.]
. predict ES, resid

We do the same with S: we regress it on EXP and save the residuals as ES.

Now we plot EEARN against ES, and the scatter is a faithful representation of the relationship, both in terms of the slope of the trend line (the black line) and in terms of the variation about that line. As you would expect, the trend line is steeper than in the scatter diagram which did not control for EXP (reproduced here as the red line).

. reg EEARN ES
[Stata output: coefficient table for ES and _cons; the slope estimate and its standard error were not preserved in this transcript, and the intercept (8.10e–…) is effectively zero, as it must be in a regression of residuals on residuals.]

From multiple regression:
. reg EARNINGS S EXP
[coefficient table for S, EXP, and _cons, as shown earlier]

Here is the regression of EEARN on ES. We will content ourselves with verifying that the estimate of the slope coefficient is the same as that from the multiple regression. A mathematical proof that the technique works requires matrix algebra. This result is also called the Frisch-Waugh-Lovell theorem.
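The same verification can be run numerically without the course data set. Here is a minimal sketch in Python with NumPy rather than the lecture's Stata (the sample size matches the 540 observations used above, but the data-generating coefficients are invented for illustration):

import numpy as np

# Minimal numerical check of the Frisch-Waugh-Lovell theorem on synthetic data.
rng = np.random.default_rng(0)
n = 540
EXP = rng.uniform(0, 30, n)
S = 14 - 0.1 * EXP + rng.normal(0, 2.0, n)        # schooling negatively correlated with experience
EARNINGS = -26 + 2.7 * S + 0.6 * EXP + rng.normal(0, 10.0, n)

def ols(y, X):
    """Return OLS coefficients, with an intercept column prepended to X."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Full multiple regression: EARNINGS on S and EXP.
b_full = ols(EARNINGS, np.column_stack([S, EXP]))

# FWL route: purge EARNINGS and S of EXP, then regress residual on residual.
g = ols(EARNINGS, EXP)
EEARN = EARNINGS - (g[0] + g[1] * EXP)            # residuals from regressing EARNINGS on EXP
h = ols(S, EXP)
ES = S - (h[0] + h[1] * EXP)                      # residuals from regressing S on EXP
b_fwl = ols(EEARN, ES)

print(b_full[1], b_fwl[1])                        # the two slope estimates on S coincide

The two printed slopes agree to machine precision, which is exactly the Frisch-Waugh-Lovell result; as the next slide notes, the standard errors would differ slightly because of a degrees-of-freedom adjustment.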

Finally, a small and not very important technical point. You may have noticed that the standard error and t statistic do not quite match. The reason for this is that the number of degrees of freedom is overstated by 1 in the residuals regression. That regression has not made allowance for the fact that we have already used up 1 degree of freedom in removing EXP from the model.

PROPERTIES OF THE MULTIPLE REGRESSION COEFFICIENTS

A.1 The model is linear in parameters and correctly specified.
A.2 There does not exist an exact linear relationship among the regressors in the sample.
A.3 The disturbance term has zero expectation.
A.4 The disturbance term is homoscedastic.
A.5 The values of the disturbance term have independent distributions.
A.6 The disturbance term has a normal distribution.

Moving from the simple to the multiple regression model, we start by restating the regression model assumptions. Only A.2 is different: previously it was stated that there must be some variation in the X variable. We will explain the difference in one of the following lectures. Provided that the regression model assumptions are valid, the OLS estimators in the multiple regression model are unbiased and efficient, as in the simple regression model.
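To see why A.2 matters, suppose (a made-up illustration) that X_{3i} = 2 + 3 X_{2i} in every observation. In deviation form x_{3i} = 3 x_{2i}, so the denominator of the expression for b_2 given earlier is

\sum x_{2i}^2 \sum x_{3i}^2 - \left( \sum x_{2i} x_{3i} \right)^2 = 9 \left( \sum x_{2i}^2 \right)^2 - 9 \left( \sum x_{2i}^2 \right)^2 = 0,

and the slope coefficients are undefined. This is the exact linear relationship among regressors that A.2 rules out.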