
PROXY VARIABLES

1 Suppose that a variable Y is hypothesized to depend on a set of explanatory variables X2, ..., Xk as shown above, and suppose that for some reason there are no data on X2.

2 As we have seen, a regression of Y on X3, ..., Xk would yield biased estimates of the coefficients and invalid standard errors and tests.

3 Sometimes, however, these problems can be reduced or eliminated by using a proxy variable in the place of X2. A proxy variable is one that is hypothesized to be linearly related to the missing variable. In the present example, Z could act as a proxy for X2.

4 The validity of the proxy relationship must be justified on the basis of theory, common sense, or experience. It cannot be checked directly because there are no data on X2.

5 If a suitable proxy has been identified, the regression model can be rewritten as shown.
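The slide graphics with the equations are not part of this transcript. The following is a reconstruction, not the original slide: it assumes a linear proxy relationship with intercept λ and slope μ (symbol names are assumptions), chosen so that the coefficient of Z and the intercept agree with the results discussed on the later slides.

```latex
\begin{align*}
Y   &= \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + u
       && \text{(hypothesized model)} \\
X_2 &= \lambda + \mu Z
       && \text{(assumed exact proxy relationship)} \\
Y   &= (\beta_1 + \beta_2 \lambda) + \beta_2 \mu Z + \beta_3 X_3 + \dots + \beta_k X_k + u
       && \text{(model with all variables observable)}
\end{align*}
```

Substituting the second line into the first gives the third: Z enters with coefficient β2μ, the intercept becomes β1 + β2λ, and the terms in X3, ..., Xk are unchanged.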

6 We thus obtain a model with all variables observable. If the proxy relationship is an exact one, and we fit this relationship, most of the regression results will be rescued.

7 The estimates of the coefficients of X3, ..., Xk will be the same as those that would have been obtained if it had been possible to regress Y on X2, ..., Xk.

Comparison of regression with Z instead of X2:
1. b3, ..., bk same

8 The standard errors and t statistics of the coefficients of X3, ..., Xk will be the same as those that would have been obtained if it had been possible to regress Y on X2, ..., Xk.

2. S.e. and t for b3, ..., bk same

9 R² will be the same as it would have been if it had been possible to regress Y on X2, ..., Xk.

3. R² same

10 The coefficient of Z will be an estimate of β2μ, and so it will not be possible to obtain an estimate of β2 unless you are able to guess the value of μ.

4. Not possible to obtain an estimate of β2, unless μ known

11 However, the t statistic for Z will be the same as that which would have been obtained for X2 if it had been possible to regress Y on X2, ..., Xk, and so you are able to assess the significance of X2, even if you are not able to estimate its coefficient.

5. t statistic for Z same as that for X2

12 It will not be possible to obtain an estimate of β1, since the intercept in the revised model is (β1 + β2λ), but usually β1 is of relatively little interest anyway.

6. Not possible to obtain an estimate of β1
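The six results above can be checked numerically. The sketch below is illustrative only: the data are simulated, and the parameter values, the variable names, and the helper `ols` are all assumptions made for this example rather than anything taken from the slides. It fits the regression once with the (normally unobservable) X2 and once with an exact proxy Z.

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept: coefficients, standard errors, t statistics, R-squared."""
    X = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    n, k = X.shape
    s2 = resid @ resid / (n - k)                      # residual variance estimate
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return b, se, b / se, r2

rng = np.random.default_rng(0)
n = 500
lam, mu = 2.0, 0.5                       # hypothetical proxy parameters
beta1, beta2, beta3 = 1.0, 3.0, -2.0     # hypothetical model parameters

Z = rng.normal(size=n)
X2 = lam + mu * Z                        # exact proxy relationship
X3 = 0.6 * Z + rng.normal(size=n)        # X3 is correlated with X2
Y = beta1 + beta2 * X2 + beta3 * X3 + rng.normal(size=n)

b_x2, se_x2, t_x2, r2_x2 = ols(Y, np.column_stack([X2, X3]))   # infeasible regression
b_z, se_z, t_z, r2_z = ols(Y, np.column_stack([Z, X3]))        # proxy regression

print("coefficient of X3:", b_x2[2], b_z[2])
print("s.e. of X3:       ", se_x2[2], se_z[2])
print("R-squared:        ", r2_x2, r2_z)
print("t for X2 vs t for Z:", t_x2[1], t_z[1])
print("coefficient of Z vs mu * b2:", b_z[1], mu * b_x2[1])
print("intercept vs b1 + lam * b2: ", b_z[0], b_x2[0] + lam * b_x2[1])
```

With an exact proxy the two regressions span the same set of regressors, so each pair of printed values should agree up to floating-point error. The last two lines show why β2 and β1 cannot be recovered separately without knowing μ and λ.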

13 It is generally more realistic to hypothesize that the relationship between X2 and Z is approximate, rather than exact. In that case the results listed above will hold approximately.

14 However, if Z is a poor proxy for X2, the results will effectively be subject to measurement error (see Chapter 8). Further, it is possible that some of the other X variables will try to act as proxies for X2, and there will still be a problem of omitted variable bias.
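To illustrate how the quality of the proxy matters, the sketch below simulates data and models a poor proxy as Z = X2 + noise. That is only one possible assumption about how a proxy can fail. The parameter values and the helper `coef_of_x3` are hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
beta1, beta2, beta3 = 1.0, 3.0, -2.0     # hypothetical model parameters

X2 = rng.normal(size=n)                  # the variable for which data are missing
X3 = 0.6 * X2 + rng.normal(size=n)       # an observed variable correlated with X2
Y = beta1 + beta2 * X2 + beta3 * X3 + rng.normal(size=n)

def coef_of_x3(noise_sd):
    """Coefficient of X3 when Y is regressed on a proxy Z = X2 + noise and X3."""
    Z = X2 + rng.normal(scale=noise_sd, size=n)
    X = np.column_stack([np.ones(n), Z, X3])
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return b[2]

b3_exact = coef_of_x3(0.0)   # exact proxy
b3_fair = coef_of_x3(0.5)    # moderately noisy proxy
b3_poor = coef_of_x3(3.0)    # poor proxy

print("true beta3:", beta3)
print("exact proxy:", b3_exact)
print("fair proxy: ", b3_fair)
print("poor proxy: ", b3_poor)
```

In this setup the coefficient of X3 moves further from its true value as the proxy gets noisier, because X3 increasingly picks up the effect of X2 that Z fails to capture.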

15 The use of a proxy variable will be illustrated with an educational attainment model. We will suppose that educational attainment depends jointly on cognitive ability and family background.

16 As usual, ASVABC will be used as the measure of cognitive ability. However, there is no ‘family background’ variable in the data set. Indeed, it is difficult to conceive how such a variable might be defined.

17 Instead, we will try to find a proxy. One obvious variable is the mother's educational attainment, SM. However, father's educational attainment, SF, may also be relevant. So we will hypothesize that the family background index depends on both.

18 Thus we obtain a relationship expressing S as a function of ASVABC, SM, and SF.
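The equations for this step are not included in the transcript. One way to write the argument is sketched below. The name INDEX for the unobserved family background index and all coefficient symbols are assumptions made for illustration, not the notation of the original slides.

```latex
\begin{align*}
S     &= \beta_1 + \beta_2\, ASVABC + \beta_3\, INDEX + u
         && \text{(attainment depends on ability and background)} \\
INDEX &= \lambda + \mu_1\, SM + \mu_2\, SF
         && \text{(assumed proxy relationship)} \\
S     &= (\beta_1 + \beta_3 \lambda) + \beta_2\, ASVABC + \beta_3 \mu_1\, SM + \beta_3 \mu_2\, SF + u
         && \text{(estimable relationship)}
\end{align*}
```

Under these assumptions the coefficients of SM and SF estimate the products β3μ1 and β3μ2, so the effect of the background index itself is not separately identified.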

19 Here is the corresponding regression using EAEF Data Set 21.

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------

20 Here is the regression of S on ASVABC alone.

. reg S ASVABC

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  274.19
       Model |  1081.97059     1  1081.97059           Prob > F      =  0.0000
    Residual |  2123.01275   538  3.94612035           R-squared     =  0.3376
-------------+------------------------------           Adj R-squared =  0.3364
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9865

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |    .148084   .0089431    16.56   0.000     .1305165    .1656516
       _cons |   6.066225   .4672261    12.98   0.000     5.148413    6.984036
------------------------------------------------------------------------------

21 A comparison of the regressions indicates that the coefficient of ASVABC is biased upwards if we make no attempt to control for family background: it is 0.148 in the regression of S on ASVABC alone, against 0.126 when SM and SF are included.

22 This is what we should expect. Both SM and SF are likely to have positive effects on educational attainment, and they are both positively correlated with ASVABC.

. cor ASVABC SM SF
(obs=570)

        |   ASVABC       SM       SF
--------+---------------------------
  ASVABC|   1.0000
      SM|   0.4202   1.0000
      SF|   0.4090   0.6241   1.0000
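The direction of the bias can be illustrated with the standard omitted-variable decomposition: the slope in the short regression equals the slope in the long regression plus, for each omitted variable, its long-regression coefficient times its slope in an auxiliary regression on the included variable. The sketch below uses simulated data, not the EAEF data set. The variable names `a`, `sm`, `sf`, `s` merely stand in for ASVABC, SM, SF and S, and every parameter value is an assumption.

```python
import numpy as np

def fit(y, *cols):
    """OLS coefficients (intercept first) of y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

rng = np.random.default_rng(2)
n = 540
a = rng.normal(50, 10, n)                                   # ability score
sm = 12 + 0.1 * (a - 50) + rng.normal(0, 2, n)              # positively related to a
sf = 12 + 0.1 * (a - 50) + 0.5 * (sm - 12) + rng.normal(0, 2, n)
s = 5 + 0.12 * a + 0.3 * sm + 0.4 * sf + rng.normal(0, 2, n)

b_long = fit(s, a, sm, sf)     # controls for both background variables
b_short = fit(s, a)            # omits them
d_sm = fit(sm, a)[1]           # auxiliary slope of sm on a
d_sf = fit(sf, a)[1]           # auxiliary slope of sf on a

implied = b_long[1] + b_long[2] * d_sm + b_long[3] * d_sf

print("slope on a, long regression: ", b_long[1])
print("slope on a, short regression:", b_short[1])
print("long slope + implied bias:   ", implied)
```

Because the omitted variables have positive coefficients and positive auxiliary slopes here, the short-regression slope exceeds the long-regression slope, which is the pattern seen in the comparison of the two regressions above.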

23 LIBRARY (a dummy variable equal to 1 if anyone in the family owned a library card when the respondent was 14) and SIBLINGS (number of brothers and sisters of the respondent) are two other variables in the data set which might act as proxies for family background.

. reg S ASVABC SM SF LIBRARY SIBLINGS

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  5,   534) =   63.21
       Model |  1191.57546     5  238.315093           Prob > F      =  0.0000
    Residual |  2013.40787   534  3.77042672           R-squared     =  0.3718
-------------+------------------------------           Adj R-squared =  0.3659
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9418

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1245327   .0099875    12.47   0.000      .104913    .1441523
          SM |   .0388414    .039969     0.97   0.332    -.0396743    .1173571
          SF |   .1035001   .0311842     3.32   0.001     .0422413    .1647588
     LIBRARY |  -.0355224   .2134634    -0.17   0.868    -.4548534    .3838086
    SIBLINGS |  -.0665348   .0408795    -1.63   0.104    -.1468392    .0137696
       _cons |   5.846517   .5681221    10.29   0.000     4.730489    6.962546
------------------------------------------------------------------------------

24 The LIBRARY variable was one of three variables included in the National Longitudinal Survey of Youth to help pick up the influence of family background on education. Surprisingly, it has a negative coefficient, but it is not significant.

25 There is a tendency for parents who are ambitious for their children to limit their number, so SIBLINGS should be expected to have a negative coefficient. It does, but it is also not significant.
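The significance statements for LIBRARY and SIBLINGS can be reproduced from the figures in the regression output. The sketch below recomputes the two t ratios. As an extra check that is not part of the original slides, it also computes an F statistic for the joint explanatory power of the two variables from the residual sums of squares of the regressions with and without them.

```python
# Coefficients and standard errors as reported in the regression output.
reported = {
    "LIBRARY": (-0.0355224, 0.2134634),
    "SIBLINGS": (-0.0665348, 0.0408795),
}

# t ratio = coefficient / standard error
t_ratios = {name: coef / se for name, (coef, se) in reported.items()}
for name, t in t_ratios.items():
    print(f"{name}: t = {t:.2f}")

# Joint F test (an addition for illustration): compare the residual sum of
# squares without LIBRARY and SIBLINGS to the residual sum of squares with them.
rss_restricted = 2023.61353      # reg S ASVABC SM SF
rss_unrestricted = 2013.40787    # reg S ASVABC SM SF LIBRARY SIBLINGS
extra_vars = 2
df_resid = 534

f_stat = ((rss_restricted - rss_unrestricted) / extra_vars) / (rss_unrestricted / df_resid)
print(f"F({extra_vars}, {df_resid}) = {f_stat:.2f}")
```

Both t ratios are below 2 in absolute value, and the F statistic of about 1.35 is well below the 5 percent critical value of roughly 3.0, so neither variable is significant individually or jointly in this specification.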

26 There are further background variables which may be relevant for educational attainment: faith, ethnicity, and region of residence. These variables are supplied in the data set, but it will be left to you to experiment with them.

27 Copyright Christopher Dougherty 2012. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 6.5 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse.

2012.11.09

