PRECISION OF THE REGRESSION COEFFICIENTS

Presentation transcript:

Simple regression model: Y = β₁ + β₂X + u

1 We have seen that the regression coefficients b₁ and b₂ are random variables. They provide point estimates of β₁ and β₂, respectively. In the last sequence we demonstrated that these point estimates are unbiased.

[Figure: probability density function of b₂, centred on β₂, with the standard deviation of the distribution indicated.]

2 In this sequence we will see that we can also obtain estimates of the standard deviations of the distributions. These will give some idea of their likely reliability and will provide a basis for tests of hypotheses.

3 Expressions (which will not be derived) for the variances of their distributions are as follows:

σ²(b₁) = σᵤ² ΣXᵢ² / (n Σ(Xᵢ − X̄)²),  σ²(b₂) = σᵤ² / Σ(Xᵢ − X̄)²

See Box 2.3 in the text for a proof of the expression for the variance of b₂.
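As a quick numerical check on these expressions (not part of the original slideshow), here is a minimal Python sketch that evaluates both variances for a given design of X and a given disturbance standard deviation; the function name and the example values are my own.

```python
import numpy as np

def ols_coef_variances(x, sigma_u):
    """Theoretical sampling variances of the OLS intercept (b1) and
    slope (b2) in the simple regression model Y = beta1 + beta2*X + u."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ssd = np.sum((x - x.mean()) ** 2)      # sum of squared deviations of X
    var_b2 = sigma_u ** 2 / ssd
    var_b1 = sigma_u ** 2 * np.sum(x ** 2) / (n * ssd)
    return var_b1, var_b2

# Example: X = 1, ..., 20 and sigma_u = 1
var_b1, var_b2 = ols_coef_variances(np.arange(1, 21), sigma_u=1.0)
print(var_b1, var_b2)   # approx 0.2158 and 0.00150
```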

4 We will focus on the implications of the expression for the variance of b₂. Looking at the numerator, we see that the variance of b₂ is proportional to σᵤ². This is as we would expect. The more noise there is in the model, the less precise will be our estimates.

[Figure: two scatter diagrams, each showing the fitted regression line (solid) and the nonstochastic relationship (dotted).]

5 This is illustrated by the diagrams above. The nonstochastic component of the relationship, Y = β₁ + β₂X, represented by the dotted line, is the same in both diagrams.

6 The values of X are the same, and the same random numbers have been used to generate the values of the disturbance term in the 20 observations.

7 However, in the right-hand diagram the random numbers have been multiplied by a factor of 5. As a consequence, the regression line, the solid line, is a much poorer approximation to the nonstochastic relationship.
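A small simulation in the same spirit can be run as follows. This is my own sketch; the true parameter values β₁ = 2.0 and β₂ = 0.5 are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(1.0, 21.0)            # the 20 values of X
u = rng.normal(0.0, 1.0, size=20)   # one set of random numbers, reused below

beta1, beta2 = 2.0, 0.5             # illustrative nonstochastic relationship

for factor in (1, 5):               # right-hand diagram: disturbances times 5
    y = beta1 + beta2 * x + factor * u
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1 = y.mean() - b2 * x.mean()
    print(f"noise x{factor}: b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With the larger disturbances, the fitted coefficients stray much further from 2.0 and 0.5.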

8 Looking at the denominator of the expression for the variance of b₂, the larger is the sum of the squared deviations of X, the smaller is the variance of b₂.

9 However, the size of the sum of the squared deviations depends on two factors: the number of observations, and the size of the deviations of Xᵢ about its sample mean. To discriminate between them, it is convenient to define the mean square deviation of X, MSD(X) = (1/n) Σ(Xᵢ − X̄)².

10 The variance expression may be rewritten as σ²(b₂) = σᵤ² / (n MSD(X)). From the expression in this form, it can be seen that the variance of b₂ is inversely proportional to n, the number of observations in the sample, controlling for MSD(X). The more information you have, the more accurate your estimates are likely to be.
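To isolate the 1/n effect (again a sketch of my own), one can duplicate the design of X, which leaves MSD(X) unchanged while doubling n; the theoretical variance of b₂ then halves exactly.

```python
import numpy as np

def var_b2(x, sigma_u=1.0):
    x = np.asarray(x, dtype=float)
    return sigma_u ** 2 / np.sum((x - x.mean()) ** 2)

x = np.arange(1.0, 21.0)
x2 = np.tile(x, 2)              # same MSD(X), twice the observations
print(var_b2(x), var_b2(x2))    # second value is exactly half the first
```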

11 A third implication of the expression is that the variance is inversely proportional to the mean square deviation of X. What is the reason for this?

[Figure: two scatter diagrams with different dispersions of X, each showing the fitted regression line (solid) and the nonstochastic relationship (dotted).]

12 In the diagrams above, the nonstochastic component of the relationship is the same and the same random numbers have been used for the 20 values of the disturbance term.

13 However, MSD(X) is much smaller in the right-hand diagram because the values of X are much closer together.

14 Hence in that diagram the position of the regression line is more sensitive to the values of the disturbance term, and as a consequence the regression line is likely to be relatively inaccurate.

15 The figure shows the distributions of the estimates of β₂ for X = 1, 2, ..., 20 and X = 9.1, 9.2, ..., 11 in a simulation with 10 million samples.

[Figure: sampling distributions of b₂ for the two designs, from 10 million samples.]

16 It confirms that the distribution of the estimates obtained with the high dispersion of X has a much smaller variance than that with the low dispersion of X.
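The comparison can be reproduced on a smaller scale with a sketch along the following lines (100,000 samples rather than 10 million; the true parameter values are again illustrative assumptions of my own).

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2, sigma_u = 2.0, 0.5, 1.0        # illustrative assumptions
designs = {
    "dispersed X":    np.arange(1.0, 21.0),   # X = 1, 2, ..., 20
    "concentrated X": np.linspace(9.1, 11.0, 20),  # X = 9.1, 9.2, ..., 11
}

for name, x in designs.items():
    dev = x - x.mean()
    ssd = np.sum(dev ** 2)
    u = rng.normal(0.0, sigma_u, size=(100_000, x.size))
    b2 = beta2 + (u @ dev) / ssd             # OLS slope for each sample
    print(f"{name}: empirical var(b2) = {b2.var():.5f}, "
          f"theoretical = {sigma_u ** 2 / ssd:.5f}")
```

The empirical variances track the theoretical ones, with the concentrated design roughly 100 times less precise.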

17 Of course, as can be seen from the variance expressions, it is really the ratio of MSD(X) to the variance of u that is important, rather than the absolute size of either.

18 We cannot calculate the variances exactly because we do not know the variance of the disturbance term. However, we can derive an estimator of σᵤ² from the residuals.

19 Clearly the scatter of the residuals around the regression line will reflect the unseen scatter of u about the line Yᵢ = β₁ + β₂Xᵢ, although in general the residual and the value of the disturbance term in any given observation are not equal to one another.

20 One measure of the scatter of the residuals is their mean square deviation, MSD(e) = (1/n) Σeᵢ². (Remember that the mean of the OLS residuals is equal to zero.) Intuitively this should provide a guide to the variance of u.
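The following sketch (mine, with illustrative data) verifies both points: the OLS residuals average to zero, and MSD(e) is easily computed from them.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(1.0, 21.0)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=20)   # illustrative data

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
e = y - (b1 + b2 * x)                 # OLS residuals

print(np.isclose(e.mean(), 0.0))      # True: OLS residuals average to zero
print(np.mean(e ** 2))                # MSD(e) = (1/n) * sum of e_i^2
```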

21 Before going any further, ask yourself the following question. Which line is likely to be closer to the points representing the sample of observations on X and Y: the true line Y = β₁ + β₂X or the fitted regression line Ŷ = b₁ + b₂X?

22 The answer is the regression line, because by definition it is drawn in such a way as to minimize the sum of the squares of the distances between it and the observations.

23 Hence the spread of the residuals will tend to be smaller than the spread of the values of u, and MSD(e) will tend to underestimate σᵤ².

24 Indeed, it can be shown that, when there is just one explanatory variable, the expected value of MSD(e) is given by E[MSD(e)] = ((n − 2)/n) σᵤ².

25 It follows that we can obtain an unbiased estimator of σᵤ² by multiplying MSD(e) by n/(n − 2). We will denote this sᵤ²: sᵤ² = (n/(n − 2)) MSD(e) = Σeᵢ² / (n − 2).
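A Monte Carlo check of both claims, under assumed parameter values of my own choosing: the average of MSD(e) across replications should be close to ((n − 2)/n)σᵤ², and scaling by n/(n − 2) should recover σᵤ².

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(1.0, 21.0)
n, sigma_u = x.size, 2.0                    # sigma_u is an assumed value
dev = x - x.mean()

u = rng.normal(0.0, sigma_u, size=(100_000, n))
y = 2.0 + 0.5 * x + u                       # illustrative true model
y_dev = y - y.mean(axis=1, keepdims=True)
b2 = y_dev @ dev / np.sum(dev ** 2)         # OLS slope, one per sample
e = y_dev - np.outer(b2, dev)               # residuals (b1 eliminated)
msd_e = np.mean(e ** 2, axis=1)             # MSD(e) for each sample

print(msd_e.mean())                  # close to (n - 2)/n * sigma_u^2 = 3.6
print(msd_e.mean() * n / (n - 2))    # close to sigma_u^2 = 4.0
```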

26 We can then obtain estimates of the standard deviations of the distributions of b₁ and b₂ by substituting sᵤ² for σᵤ² in the variance expressions and taking the square roots.
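The recipe can be written out directly. The sketch below is my own (it assumes the statsmodels package is available): it computes the standard errors by hand from sᵤ² and the variance expressions, then confirms them against library output.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = np.arange(1.0, 21.0)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=20)   # illustrative data

# Standard errors by hand, following the slides
n = x.size
dev = x - x.mean()
b2 = np.sum(dev * (y - y.mean())) / np.sum(dev ** 2)
b1 = y.mean() - b2 * x.mean()
e = y - b1 - b2 * x
s_u2 = np.sum(e ** 2) / (n - 2)           # unbiased estimator of sigma_u^2
se_b2 = np.sqrt(s_u2 / np.sum(dev ** 2))
se_b1 = np.sqrt(s_u2 * np.sum(x ** 2) / (n * np.sum(dev ** 2)))

print(se_b1, se_b2)
print(sm.OLS(y, sm.add_constant(x)).fit().bse)   # same values from statsmodels
```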

27 These are described as the standard errors of b₁ and b₂, 'estimates of the standard deviations' being a bit of a mouthful.

28 The standard errors of the coefficients always appear as part of the output of a regression. Here is the regression of hourly earnings on years of schooling discussed in a previous slideshow. The standard errors appear in the column to the right of the coefficients.

. reg EARNINGS S

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  1,   538) =
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |
       _cons |
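For readers working in Python rather than Stata, an equivalent regression could be run as sketched below; the file name earnings.csv and its column layout are hypothetical stand-ins for the dataset used in the slideshow.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file: a CSV with columns EARNINGS and S, corresponding
# to the variables used in the slideshow's regression.
df = pd.read_csv("earnings.csv")

result = smf.ols("EARNINGS ~ S", data=df).fit()
print(result.summary())   # coefficient table includes the Std. Err. column
```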

29 The Gauss–Markov theorem states that, provided that the regression model assumptions are valid, the OLS estimators are BLUE: best (most efficient) linear (functions of the values of Y) unbiased estimators of the parameters.

[Figure: probability density functions of b₂ for OLS and for another unbiased estimator; the OLS distribution is the more concentrated around β₂.]
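A simulation can illustrate what 'best' means here. The sketch below (my own, with illustrative parameter values) compares the OLS slope estimator with another linear unbiased estimator, the slope of the line through the first and last observations; both are unbiased, but OLS has the smaller variance, as the theorem guarantees.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.arange(1.0, 21.0)
dev = x - x.mean()
u = rng.normal(0.0, 1.0, size=(200_000, 20))
y = 2.0 + 0.5 * x + u                            # illustrative true model

b2_ols = (y - y.mean(axis=1, keepdims=True)) @ dev / np.sum(dev ** 2)
b2_end = (y[:, -1] - y[:, 0]) / (x[-1] - x[0])   # endpoints estimator

print(b2_ols.mean(), b2_end.mean())   # both close to 0.5 (unbiased)
print(b2_ols.var(),  b2_end.var())    # OLS variance is much smaller
```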

30 The proof of the theorem is not difficult, but it is not a high priority and we will take it on trust. See Section 2.7 of the text for a proof for the simple regression model.

Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 2.5 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course EC2020 Elements of Econometrics.