

1 PROXY VARIABLES Suppose that a variable Y is hypothesized to depend on a set of explanatory variables $X_2, \ldots, X_k$ as shown above, and suppose that for some reason there are no data on $X_2$.

2 As we have seen, a regression of Y on $X_3, \ldots, X_k$ would yield biased estimates of the coefficients and invalid standard errors and tests.

3 Sometimes, however, these problems can be reduced or eliminated by using a proxy variable in place of $X_2$. A proxy variable is one that is hypothesized to be linearly related to the missing variable. In the present example, Z could act as a proxy for $X_2$.

4 The validity of the proxy relationship must be justified on the basis of theory, common sense, or experience. It cannot be checked directly because there are no data on $X_2$.

5 If a suitable proxy has been identified, the regression model can be rewritten as shown.
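The rewritten model is not reproduced in this transcript. A sketch of the substitution, assuming a linear model in $X_2, \ldots, X_k$ and an exact proxy relationship whose intercept and slope are written here as $\lambda$ and $\mu$ (these two symbols are an assumption of this sketch), would be:

```latex
\begin{align*}
Y   &= \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + u \\
X_2 &= \lambda + \mu Z \\
Y   &= \beta_1 + \beta_2(\lambda + \mu Z) + \beta_3 X_3 + \cdots + \beta_k X_k + u \\
    &= (\beta_1 + \beta_2\lambda) + \beta_2\mu Z + \beta_3 X_3 + \cdots + \beta_k X_k + u
\end{align*}
```

Under these assumptions the coefficients of $X_3, \ldots, X_k$ are unchanged, while the intercept and the coefficient of Z are composites of the original parameters and the proxy parameters.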

6 We thus obtain a model with all variables observable. If the proxy relationship is an exact one, and we fit this relationship, most of the regression results will be rescued.

7 The estimates of the coefficients of $X_3, \ldots, X_k$ will be the same as those that would have been obtained if it had been possible to regress Y on $X_2, \ldots, X_k$.
Comparison of regression with Z instead of $X_2$:
1. $b_3, \ldots, b_k$ same

8 The standard errors and t statistics of the coefficients of $X_3, \ldots, X_k$ will be the same as those that would have been obtained if it had been possible to regress Y on $X_2, \ldots, X_k$.
2. S.e. and t for $b_3, \ldots, b_k$ same

9 $R^2$ will be the same as it would have been if it had been possible to regress Y on $X_2, \ldots, X_k$.
3. $R^2$ same

10 The coefficient of Z will be an estimate of $\beta_2\mu$, where $\mu$ is the slope coefficient in the linear proxy relationship between $X_2$ and Z, and so it will not be possible to obtain an estimate of $\beta_2$ unless you are able to guess the value of $\mu$.
4. Not possible to obtain an estimate of $\beta_2$, unless $\mu$ known

11 However, the t statistic for Z will be the same as that which would have been obtained for $X_2$ if it had been possible to regress Y on $X_2, \ldots, X_k$, and so you are able to assess the significance of $X_2$, even if you are not able to estimate its coefficient.
5. t statistic for Z same as that for $X_2$

12 It will not be possible to obtain an estimate of $\beta_1$, since the intercept in the revised model is $(\beta_1 + \beta_2\lambda)$, where $\lambda$ is the intercept of the proxy relationship, but usually $\beta_1$ is of relatively little interest anyway.
6. Not possible to obtain an estimate of $\beta_1$
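The six results above can be checked numerically. The following is a minimal sketch using simulated data, not the data set used in these slides. The parameter values, the proxy relationship `x2 = lam + mu * z`, and the helper `ols` are all hypothetical choices made for illustration, and the helper is written with the standard library only (in practice a statistics package would be used).

```python
import random

def ols(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors, R^2).

    X is a list of rows, each beginning with 1.0 for the intercept.
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination on [X'X | I] with partial pivoting.
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
              for r, yi in zip(X, y))
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)
    se = [(rss / (n - k) * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se, 1 - rss / tss

random.seed(0)
n = 200
beta1, beta2, beta3 = 2.0, 1.5, -0.8   # hypothetical true parameters
lam, mu = 4.0, 0.5                     # hypothetical exact proxy: X2 = lam + mu*Z
z = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.6 * zi + random.gauss(0, 1) for zi in z]
x2 = [lam + mu * zi for zi in z]
y = [beta1 + beta2 * a + beta3 * c + random.gauss(0, 1) for a, c in zip(x2, x3)]

# Infeasible regression on X2, and feasible regression with Z in its place.
b_x2, se_x2, r2_x2 = ols([[1.0, a, c] for a, c in zip(x2, x3)], y)
b_z, se_z, r2_z = ols([[1.0, zi, c] for zi, c in zip(z, x3)], y)

print("coefficient of X3:", b_x2[2], b_z[2])
print("s.e. of X3 coefficient:", se_x2[2], se_z[2])
print("R^2:", r2_x2, r2_z)
print("coefficient of Z vs mu * coefficient of X2:", b_z[1], mu * b_x2[1])
print("t for X2 and t for Z:", b_x2[1] / se_x2[1], b_z[1] / se_z[1])
print("intercepts:", b_x2[0], b_z[0])
```

With an exact proxy, each printed pair should agree up to rounding error, except the intercepts, which differ by `lam` times the coefficient of X2.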

13 It is generally more realistic to hypothesize that the relationship between $X_2$ and Z is approximate, rather than exact. In that case the results listed above will hold approximately.

14 However, if Z is a poor proxy for $X_2$, the results will effectively be subject to measurement error (see Chapter 8). Further, it is possible that some of the other X variables will try to act as proxies for $X_2$, and there will still be a problem of omitted variable bias.
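A small simulation can illustrate this remaining bias. This is a sketch under assumed values, not an analysis of the data set used in these slides: `x3` is constructed to be correlated with `x2`, the proxy `z` is `x2` plus heavy noise, and the helper `ols_coefs` is a hypothetical standard-library routine written for this example.

```python
import random

def ols_coefs(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return [row[k] for row in a]

random.seed(1)
n = 5000
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [0.6 * a + random.gauss(0, 1) for a in x2]   # X3 correlated with X2
z = [a + random.gauss(0, 2) for a in x2]          # poor proxy: X2 plus heavy noise
# Hypothetical true model: the coefficient of X3 is -0.8.
y = [2.0 + 1.5 * a - 0.8 * c + random.gauss(0, 1) for a, c in zip(x2, x3)]

b3_full = ols_coefs([[1.0, a, c] for a, c in zip(x2, x3)], y)[2]   # X2 observed
b3_poor = ols_coefs([[1.0, zi, c] for zi, c in zip(z, x3)], y)[2]  # Z in place of X2
b3_omit = ols_coefs([[1.0, c] for c in x3], y)[1]                  # X2 omitted

print("X3 coefficient with X2 observed:", b3_full)
print("X3 coefficient with poor proxy: ", b3_poor)
print("X3 coefficient with X2 omitted: ", b3_omit)
```

In this setup the estimate obtained with the poor proxy lies between the other two: the noisy proxy removes only part of the omitted variable bias, because X3 is still picking up some of the effect of X2.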

15 The use of a proxy variable will be illustrated with an educational attainment model. We will suppose that educational attainment depends jointly on cognitive ability and family background.

16 As usual, ASVABC will be used as the measure of cognitive ability. However, there is no ‘family background’ variable in the data set. Indeed, it is difficult to conceive how such a variable might be defined.

17 Instead, we will try to find a proxy. One obvious variable is the mother's educational attainment, SM. However, father's educational attainment, SF, may also be relevant. So we will hypothesize that the family background index depends on both.

18 Thus we obtain a relationship expressing S as a function of ASVABC, SM, and SF.
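The equations behind this step are not reproduced in the transcript. One way to sketch them, assuming a linear model and writing the unobserved family background index as $INDEX$ with proxy parameters $\lambda$, $\mu_1$, and $\mu_2$ (all of these symbols are assumptions of this sketch), is:

```latex
\begin{align*}
S     &= \beta_1 + \beta_2\, ASVABC + \beta_3\, INDEX + u \\
INDEX &= \lambda + \mu_1\, SM + \mu_2\, SF \\
S     &= (\beta_1 + \beta_3\lambda) + \beta_2\, ASVABC
         + \beta_3\mu_1\, SM + \beta_3\mu_2\, SF + u
\end{align*}
```

On this reading, the coefficient of ASVABC retains its original interpretation, while the coefficients of SM and SF estimate the products $\beta_3\mu_1$ and $\beta_3\mu_2$ rather than $\beta_3$ itself.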

19 Here is the corresponding regression using EAEF Data Set 21.
. reg S ASVABC SM SF
[Stata regression output table for ASVABC, SM, SF, and _cons; F(3, 536); numerical values not recoverable]

20 Here is the regression of S on ASVABC alone.
. reg S ASVABC
[Stata regression output table for ASVABC and _cons; F(1, 538); numerical values not recoverable]

21 A comparison of the regressions indicates that the coefficient of ASVABC is biased upwards if we make no attempt to control for family background.
. reg S ASVABC SM SF
. reg S ASVABC
[Coefficient tables for the two regressions; numerical values not recoverable]

22 This is what we should expect. Both SM and SF are likely to have positive effects on educational attainment, and they are both positively correlated with ASVABC.
. cor ASVABC SM SF
(obs=570)
[Correlation matrix for ASVABC, SM, and SF; numerical values not recoverable]
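The direction of the bias follows from the standard omitted variable bias expression. As a sketch, treat family background as a single unobserved variable $F$ (a simplifying assumption made here, since the slides use SM and SF as two separate proxies) in the model $S = \beta_1 + \beta_2\, ASVABC + \beta_3 F + u$. If S is regressed on ASVABC alone, then in large samples:

```latex
\[
\operatorname{plim}\, b_2
  = \beta_2 + \beta_3\,
    \frac{\operatorname{Cov}(ASVABC, F)}{\operatorname{Var}(ASVABC)}
\]
```

With $\beta_3 > 0$ (family background raises attainment) and a positive covariance between ASVABC and $F$, the second term is positive, so the coefficient of ASVABC is overestimated when family background is left out.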

23 LIBRARY (a dummy variable equal to 1 if anyone in the family owned a library card when the respondent was 14) and SIBLINGS (number of brothers and sisters of the respondent) are two other variables in the data set which might act as proxies for family background.
. reg S ASVABC SM SF LIBRARY SIBLINGS
[Stata regression output table for ASVABC, SM, SF, LIBRARY, SIBLINGS, and _cons; F(5, 534); numerical values not recoverable]

24 The LIBRARY variable was one of three variables included in the National Longitudinal Survey of Youth to help pick up the influence of family background on education. Surprisingly, it has a negative coefficient, but it is not significant.

25 There is a tendency for parents who are ambitious for their children to limit their number, so SIBLINGS should be expected to have a negative coefficient. It does, but it is also not significant.

26 There are further background variables which may be relevant for educational attainment: faith, ethnicity, and region of residence. These variables are supplied in the data set, but it will be left to you to experiment with them.

Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 6.5 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course EC2020 Elements of Econometrics.