MEASUREMENT ERROR
1 In this sequence we will investigate the consequences of measurement errors in the variables in a regression model. To keep the analysis simple, we will confine it to the simple regression model.
2 We will start with measurement errors in the explanatory variable. Suppose that Y is determined by a variable Z, but Z is subject to measurement error, w. We will denote the measured explanatory variable X.
3 Substituting for Z from the second equation (X = Z + w), we can rewrite the model in terms of X.
4 We are thus able to express Y as a linear function of the observable variable X, with the disturbance term, u, being a compound of the disturbance term in the original model, v, and the measurement error, w.
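The slide equations are not reproduced in this transcript. The following is a reconstruction from the description above, not the original slide content. It assumes $\beta_1$ denotes the intercept (a symbol that does not appear in the transcript) and that v is the disturbance term of the original model:

```latex
\begin{align*}
Y &= \beta_1 + \beta_2 Z + v, \qquad X = Z + w \\
Y &= \beta_1 + \beta_2 (X - w) + v \\
  &= \beta_1 + \beta_2 X + u, \qquad u = v - \beta_2 w
\end{align*}
```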
5 However, if we fit this model using OLS, Assumption B.7 will be violated. X has a random component, the measurement error w.
6 And w is also one of the components of the compound disturbance term. Hence u is not distributed independently of X.
7 We will demonstrate that the OLS estimator of the slope coefficient is inconsistent and that in large samples it is biased downwards if $\beta_2$ is positive, and upwards if $\beta_2$ is negative.
8 We begin by writing down the OLS estimator and substituting for Y from the true model. In this case there are alternative versions of the true model. The analysis is simpler if you use the equation relating Y to X.
9 Simplifying, we decompose the slope coefficient into the true value and an error term as usual.
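The decomposition described above can be sketched as follows. This is a reconstruction under the assumption that the model is written as $Y = \beta_1 + \beta_2 X + u$, so that $Y_i - \bar{Y} = \beta_2 (X_i - \bar{X}) + (u_i - \bar{u})$:

```latex
\begin{align*}
b_2 &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \\
    &= \frac{\sum (X_i - \bar{X})\bigl(\beta_2 (X_i - \bar{X}) + (u_i - \bar{u})\bigr)}{\sum (X_i - \bar{X})^2} \\
    &= \beta_2 + \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})^2}
\end{align*}
```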
10 We have reached this point many times before. We would like to investigate whether $b_2$ is biased. This means taking the expectation of the error term.
11 However, it is not possible to obtain a closed-form expression for the expectation of the error term. Both its numerator and its denominator are functions of w and there are no expected value rules that can allow us to simplify.
12 As a second-best measure, we take plims and investigate what would happen in large samples. The plim rules often allow us to obtain analytical results when the expected value rules do not.
13 We focus on the error term. We would like to use the plim quotient rule. The plim of a quotient is the plim of the numerator divided by the plim of the denominator, provided that both of these limits exist. That is, plim (A/B) = plim A / plim B if A and B have probability limits and plim B is not 0.
14 However, as the expression stands, the numerator and the denominator of the error term do not have limits. The denominator increases indefinitely as the sample size increases. The numerator has no particular limit.
15 To deal with this problem, we divide both the numerator and the denominator by n.
16 It can be shown that the limit of the numerator is the covariance of X and u and the limit of the denominator is the variance of X.
17 Hence the numerator and the denominator of the error term have limits and we are entitled to implement the plim quotient rule. We need var(X) to be non-zero, but this will be the case assuming that there is some variation in X.
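Written out, the step of dividing by n and applying the quotient rule might look like this. It is a reconstruction from the description, with cov(X, u) and var(X) denoting the population covariance and variance:

```latex
\begin{align*}
\operatorname{plim} b_2
  &= \beta_2 + \frac{\operatorname{plim} \frac{1}{n}\sum (X_i - \bar{X})(u_i - \bar{u})}
                    {\operatorname{plim} \frac{1}{n}\sum (X_i - \bar{X})^2} \\
  &= \beta_2 + \frac{\operatorname{cov}(X, u)}{\operatorname{var}(X)}
\end{align*}
```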
18 We can decompose both the numerator and the denominator of the error term. We will start by substituting for X and u in the numerator.
19 We expand the expression using the first covariance rule.
20 If we assume that Z, v, and w are distributed independently of each other, the first 3 terms are 0. The last term gives us $-\beta_2 \sigma_w^2$.
21 We next expand the denominator of the error term. The first two terms are variances. The covariance is 0 if we assume w is distributed independently of Z.
22 Thus in large samples, $b_2$ is biased towards 0 and the size of the bias depends on the relative sizes of the variances of w and Z.
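The expansion of the numerator and denominator can be reconstructed as below. The ordering of the covariance terms is an assumption, chosen so that the first three terms vanish and the last one survives, as the text describes. $\sigma_Z^2$ and $\sigma_w^2$ denote the variances of Z and w:

```latex
\begin{align*}
\operatorname{cov}(X, u) &= \operatorname{cov}(Z + w,\; v - \beta_2 w) \\
  &= \operatorname{cov}(Z, v) - \beta_2 \operatorname{cov}(Z, w) + \operatorname{cov}(w, v) - \beta_2 \operatorname{cov}(w, w) \\
  &= 0 - 0 + 0 - \beta_2 \sigma_w^2 \\
\operatorname{var}(X) &= \operatorname{var}(Z + w) = \sigma_Z^2 + \sigma_w^2 + 2 \operatorname{cov}(Z, w) = \sigma_Z^2 + \sigma_w^2 \\
\operatorname{plim} b_2 &= \beta_2 - \beta_2 \frac{\sigma_w^2}{\sigma_Z^2 + \sigma_w^2}
\end{align*}
```

Because the fraction lies between 0 and 1, the limit has the same sign as $\beta_2$ but a smaller magnitude, which is the sense in which the estimator is biased towards 0.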
23 Since $b_2$ is an inconsistent estimator, it is safe to assume that it is biased in finite samples as well.
24 If our assumptions concerning Z, v, and w are incorrect, $b_2$ would almost certainly still be an inconsistent estimator, but the expression for the large-sample bias would be more complicated.
25 A further consequence of the violation of Assumption B.7 is that the standard errors, t tests, and F test are invalid.
26 The analysis will be illustrated with a simulation. The true model is Y = 2.0 + 0.8Z + v, with the values of Z drawn randomly from a normal distribution with mean 10 and variance 4, and the values of v being drawn from a normal distribution with mean 0 and variance 4.
27 X = Z + w, where w is drawn from a normal distribution with mean 0 and variance 1. With this information, we are able to determine plim $b_2$.
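As a quick check of the arithmetic, the limiting value can be computed from the simulation parameters. This is a minimal sketch: the function name `plim_slope` is hypothetical, and the formula is the large-sample expression under the independence assumptions stated earlier.

```python
def plim_slope(beta2, var_z, var_w):
    """Large-sample limit of the OLS slope when X = Z + w.

    Assumes Z, v, and w are distributed independently of each other.
    """
    return beta2 - beta2 * var_w / (var_z + var_w)


# Simulation parameters from the text: beta2 = 0.8, var(Z) = 4, var(w) = 1.
limit = plim_slope(beta2=0.8, var_z=4.0, var_w=1.0)
print(round(limit, 2))
```

The attenuation factor here is 1/(4 + 1) = 0.2, so the slope shrinks by 0.8 × 0.2 = 0.16.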
28 The figure shows the distributions of $b_2$ for sample size 20 and sample size 1,000, for 10 million samples. For both sample sizes, the distributions reveal that the OLS estimator is biased downwards.
29 Further, the figure suggests that, if the sample size were increased, the distribution would contract to the limiting value of 0.64.
30 There remains the question of whether the limiting value provides guidance to the mean of the distribution for a finite sample. In general, the mean will be different from the limiting value, but will approach it as the sample size increases.
31 In the present case, however, the mean of the distribution is almost exactly equal to 0.64, even for sample size 20.
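The simulation can be approximated on a much smaller scale. The sketch below uses only the Python standard library and far fewer replications than the 10 million in the slides. The replication counts, the seed, and the helper names `ols_slope` and `simulate_slopes` are assumptions made for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope coefficient from a simple regression of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(n, reps, rng):
    """Slopes from regressing Y on the mismeasured X = Z + w."""
    slopes = []
    for _ in range(reps):
        z = [rng.gauss(10.0, 2.0) for _ in range(n)]  # variance 4, so sd 2
        v = [rng.gauss(0.0, 2.0) for _ in range(n)]   # variance 4, so sd 2
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]   # variance 1
        y = [2.0 + 0.8 * zi + vi for zi, vi in zip(z, v)]
        x = [zi + wi for zi, wi in zip(z, w)]
        slopes.append(ols_slope(x, y))
    return slopes


rng = random.Random(12345)  # arbitrary seed, chosen for reproducibility
slopes_small = simulate_slopes(n=20, reps=2000, rng=rng)
slopes_large = simulate_slopes(n=1000, reps=200, rng=rng)

mean_small = statistics.fmean(slopes_small)
mean_large = statistics.fmean(slopes_large)
sd_small = statistics.stdev(slopes_small)
sd_large = statistics.stdev(slopes_large)

print(f"n=20:   mean={mean_small:.3f}, sd={sd_small:.3f}")
print(f"n=1000: mean={mean_large:.3f}, sd={sd_large:.3f}")
```

With these settings both means should land near 0.64 rather than 0.8, and the spread should be much narrower for the larger sample size.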
32 Measurement error in the dependent variable has less serious consequences. Suppose that the true dependent variable is Q, that the measured variable is Y, and that the measurement error is r.
33 We can rewrite the model in terms of the observable variables by substituting for Q from the second equation.
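The corresponding equations are not reproduced in this transcript. A reconstruction from the description, again assuming $\beta_1$ for the intercept and v for the disturbance term of the original model:

```latex
\begin{align*}
Q &= \beta_1 + \beta_2 X + v, \qquad Y = Q + r \\
Y &= \beta_1 + \beta_2 X + u, \qquad u = v + r
\end{align*}
```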
34 In this case the presence of the measurement error does not lead to a violation of Assumption B.7. If v satisfies that assumption in the original model, u will satisfy it in the revised one, unless for some strange reason r is not distributed independently of X.
35 The standard errors and tests will remain valid. However, the standard errors will tend to be larger than they would have been if there had been no measurement error, reflecting the fact that the variances of the coefficients are larger.
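A small simulation can illustrate this contrast. The slides do not describe a simulation for this case, so everything here is an assumption: the parameter values (borrowed from the earlier example), the variance of 4 for r, the sample size, the replication count, and the helper name `ols_slope`.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope coefficient from a simple regression of y on x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(2012)  # arbitrary seed, chosen for reproducibility
n, reps = 50, 2000         # assumed sample size and replication count

slopes_clean = []  # dependent variable observed without error (Y = Q)
slopes_noisy = []  # dependent variable observed with error (Y = Q + r)
for _ in range(reps):
    x = [rng.gauss(10.0, 2.0) for _ in range(n)]
    v = [rng.gauss(0.0, 2.0) for _ in range(n)]
    r = [rng.gauss(0.0, 2.0) for _ in range(n)]  # assumed variance 4
    q = [2.0 + 0.8 * xi + vi for xi, vi in zip(x, v)]
    y = [qi + ri for qi, ri in zip(q, r)]
    slopes_clean.append(ols_slope(x, q))
    slopes_noisy.append(ols_slope(x, y))

mean_clean = statistics.fmean(slopes_clean)
mean_noisy = statistics.fmean(slopes_noisy)
sd_clean = statistics.stdev(slopes_clean)
sd_noisy = statistics.stdev(slopes_noisy)

print(f"no error in Y:   mean={mean_clean:.3f}, sd={sd_clean:.3f}")
print(f"with error in Y: mean={mean_noisy:.3f}, sd={sd_noisy:.3f}")
```

Both means should stay close to the true slope of 0.8, while the spread of the estimates is wider when Y is measured with error.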
2012.11.12 Copyright Christopher Dougherty 2012. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Section 8.4 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/. Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse.