Presentation on theme: "Christopher Dougherty EC220 - Introduction to econometrics (chapter 1) Slideshow: goodness of fit Original citation: Dougherty, C. (2012) EC220 - Introduction."— Presentation transcript:
Four useful results: GOODNESS OF FIT 1 This sequence explains measures of goodness of fit in regression analysis. It is convenient to start by demonstrating four useful results. The first is that the mean value of the residuals must be zero.
GOODNESS OF FIT 2 The residual in any observation is given by the difference between the actual and fitted values of Y for that observation.
GOODNESS OF FIT 3 First substitute for the fitted value.
GOODNESS OF FIT 4 Now sum over all the observations.
GOODNESS OF FIT 5 Dividing through by n, we obtain the sample mean of the residuals in terms of the sample means of X and Y and the regression coefficients.
GOODNESS OF FIT 6 If we substitute for b₁, the expression collapses to zero.
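The slide equations did not survive in this transcript, so the steps just described are sketched here. The notation is an assumption rather than a copy of the slides: e_i is the residual, \hat{Y}_i the fitted value, and the fitted line is \hat{Y}_i = b_1 + b_2 X_i. The last line uses the standard OLS intercept formula b_1 = \bar{Y} - b_2 \bar{X}.

```latex
\begin{aligned}
e_i &= Y_i - \hat{Y}_i = Y_i - b_1 - b_2 X_i \\
\sum_{i=1}^{n} e_i &= \sum_{i=1}^{n} Y_i - n b_1 - b_2 \sum_{i=1}^{n} X_i \\
\bar{e} &= \bar{Y} - b_1 - b_2 \bar{X} \\
\bar{e} &= \bar{Y} - (\bar{Y} - b_2 \bar{X}) - b_2 \bar{X} = 0
\end{aligned}
```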
GOODNESS OF FIT 7 Next we will demonstrate that the mean of the fitted values of Y is equal to the mean of the actual values of Y.
GOODNESS OF FIT 8 Again, we start with the definition of a residual.
GOODNESS OF FIT 9 Sum over all the observations.
GOODNESS OF FIT 10 Divide through by n. The terms in the equation are the means of the residuals, actual values of Y, and fitted values of Y, respectively.
GOODNESS OF FIT 11 We have just shown that the mean of the residuals is zero. Hence the mean of the fitted values is equal to the mean of the actual values.
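A sketch of the second result, written out under the same assumed notation (e_i for the residual, \hat{Y}_i for the fitted value, and \bar{\hat{Y}} for the mean of the fitted values):

```latex
\begin{aligned}
e_i &= Y_i - \hat{Y}_i \\
\sum_{i=1}^{n} e_i &= \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} \hat{Y}_i \\
\bar{e} &= \bar{Y} - \bar{\hat{Y}} \\
\bar{e} = 0 \;&\Rightarrow\; \bar{\hat{Y}} = \bar{Y}
\end{aligned}
```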
GOODNESS OF FIT 12 Next we will demonstrate that the sum of the products of the values of X and the residuals is zero.
GOODNESS OF FIT 13 We start by replacing the residual with its expression in terms of Y and X.
GOODNESS OF FIT 14 We expand the expression.
GOODNESS OF FIT 15 The expression is equal to zero. One way of demonstrating this would be to substitute for b₁ and b₂ and show that all the terms cancel out.
GOODNESS OF FIT 16 A neater way is to recall the first-order condition for b₂ when deriving the regression coefficients. You can see that it is exactly what we need.
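The third result can be sketched as follows, again with assumed notation (e_i for the residual, RSS for the residual sum of squares being minimized). The second line is the first-order condition for the slope, obtained by differentiating RSS = \sum (Y_i - b_1 - b_2 X_i)^2 with respect to b_2. Apart from the factor of -2, it is the expanded expression on the first line.

```latex
\begin{aligned}
\sum_{i=1}^{n} X_i e_i
  &= \sum_{i=1}^{n} X_i (Y_i - b_1 - b_2 X_i)
   = \sum_{i=1}^{n} X_i Y_i - b_1 \sum_{i=1}^{n} X_i - b_2 \sum_{i=1}^{n} X_i^2 \\
\frac{\partial RSS}{\partial b_2}
  &= -2 \left( \sum_{i=1}^{n} X_i Y_i - b_1 \sum_{i=1}^{n} X_i - b_2 \sum_{i=1}^{n} X_i^2 \right) = 0
  \;\Rightarrow\; \sum_{i=1}^{n} X_i e_i = 0
\end{aligned}
```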
GOODNESS OF FIT 17 Finally we will demonstrate that the sum of the products of the fitted values of Y and the residuals is zero.
GOODNESS OF FIT 18 We start by substituting for the fitted value of Y.
GOODNESS OF FIT 19 We expand and rearrange.
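The substitution and expansion for the fourth result can be written out as below. The notation is assumed as before: \hat{Y}_i = b_1 + b_2 X_i is the fitted value and e_i the residual.

```latex
\sum_{i=1}^{n} \hat{Y}_i e_i
  = \sum_{i=1}^{n} (b_1 + b_2 X_i)\, e_i
  = b_1 \sum_{i=1}^{n} e_i + b_2 \sum_{i=1}^{n} X_i e_i
```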
GOODNESS OF FIT 20 The expression is equal to zero, given the first and third useful results.
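The four useful results can also be checked numerically. The following is a minimal sketch in plain Python, not part of the original slideshow. The data are made up for illustration and the helper name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Return (b1, b2) for the simple OLS regression line Y = b1 + b2 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Made-up illustrative data (an assumption, not taken from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
n = len(x)

b1, b2 = ols_fit(x, y)
fitted = [b1 + b2 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

mean_resid = sum(resid) / n                                # result 1: should be ~0
mean_fitted = sum(fitted) / n                              # result 2: should equal mean of Y
mean_y = sum(y) / n
sum_x_resid = sum(xi * ei for xi, ei in zip(x, resid))     # result 3: should be ~0
sum_fitted_resid = sum(fi * ei for fi, ei in zip(fitted, resid))  # result 4: should be ~0

print("mean residual:          ", mean_resid)
print("mean fitted - mean Y:   ", mean_fitted - mean_y)
print("sum of X * residual:    ", sum_x_resid)
print("sum of fitted * residual:", sum_fitted_resid)
```

Each printed value should be zero up to floating-point rounding error.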
GOODNESS OF FIT 21 We now come to the discussion of goodness of fit. One measure of the variation in Y is the sum of its squared deviations around its sample mean, often described as the Total Sum of Squares, TSS.
GOODNESS OF FIT 22 We will decompose TSS using the fact that the actual value of Y in any observation is equal to the sum of its fitted value and the residual.
GOODNESS OF FIT 23 We substitute for Yᵢ.
GOODNESS OF FIT 24 From the useful results, the mean of the fitted values of Y is equal to the mean of the actual values. Also, the mean of the residuals is zero.
GOODNESS OF FIT 25 Hence we can simplify the expression as shown.
GOODNESS OF FIT 26 We expand the squared terms on the right side of the equation.
GOODNESS OF FIT 27 We expand the third term on the right side of the equation.
GOODNESS OF FIT 28 The last two terms are both zero, given the first and fourth useful results.
GOODNESS OF FIT 29 Thus we have shown that TSS, the total sum of squares of Y, can be decomposed into ESS, the ‘explained’ sum of squares, and RSS, the residual (‘unexplained’) sum of squares.
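The decomposition described over the preceding slides can be sketched as one chain of equalities. The notation is assumed (e_i for the residual, \hat{Y}_i for the fitted value, \bar{\hat{Y}} for the mean of the fitted values), since the slide equations are not in the transcript.

```latex
\begin{aligned}
TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
  &= \sum_{i=1}^{n} \left( [\hat{Y}_i + e_i] - [\bar{\hat{Y}} + \bar{e}] \right)^2 \\
  &= \sum_{i=1}^{n} \left( [\hat{Y}_i - \bar{Y}] + e_i \right)^2 \\
  &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2
     + 2 \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})\, e_i \\
  &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2
     + 2 \sum_{i=1}^{n} \hat{Y}_i e_i - 2 \bar{Y} \sum_{i=1}^{n} e_i \\
  &= \underbrace{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}_{ESS}
     + \underbrace{\sum_{i=1}^{n} e_i^2}_{RSS}
\end{aligned}
```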
GOODNESS OF FIT 30 The words ‘explained’ and ‘unexplained’ were put in quotation marks because the explanation may in fact be false. Y might really depend on some other variable Z, and X might be acting as a proxy for Z. It would be safer to use the expression ‘apparently explained’ instead of ‘explained’.
GOODNESS OF FIT 31 The main criterion of goodness of fit, formally described as the coefficient of determination but usually referred to as R², is defined to be the ratio of ESS to TSS, that is, the proportion of the variance of Y explained by the regression equation.
GOODNESS OF FIT 32 Obviously we would like to locate the regression line so as to make the goodness of fit as high as possible, according to this criterion. Does this objective clash with our use of the least squares principle to determine b₁ and b₂?
GOODNESS OF FIT 33 Fortunately, there is no clash. To see this, rewrite the expression for R² in terms of RSS as shown.
GOODNESS OF FIT 34 The OLS regression coefficients are chosen in such a way as to minimize the sum of the squares of the residuals. Thus it automatically follows that they maximize R².
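The two expressions for the coefficient of determination, ESS/TSS and 1 - RSS/TSS, can be compared numerically. The sketch below is an illustration in plain Python, not the author's code. The data are made up and the helper names `ols_fit` and `sums_of_squares` are hypothetical. It also evaluates 1 - RSS/TSS for a deliberately perturbed line to show that moving away from the OLS coefficients lowers the criterion.

```python
def ols_fit(x, y):
    """Return (b1, b2) for the simple OLS regression line Y = b1 + b2 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def sums_of_squares(x, y, b1, b2):
    """Return (TSS, ESS, RSS) for the line Y = b1 + b2 * X on the data (x, y)."""
    y_bar = sum(y) / len(y)
    fitted = [b1 + b2 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return tss, ess, rss


# Made-up illustrative data (an assumption, not taken from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

b1, b2 = ols_fit(x, y)
tss, ess, rss = sums_of_squares(x, y, b1, b2)
r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss

# Any other line has a larger RSS, hence a smaller value of 1 - RSS/TSS.
_, _, rss_perturbed = sums_of_squares(x, y, b1 + 0.5, b2 - 0.1)
r2_perturbed = 1 - rss_perturbed / tss

print("TSS:", tss, " ESS + RSS:", ess + rss)
print("R2 = ESS/TSS:        ", r2_from_ess)
print("R2 = 1 - RSS/TSS:    ", r2_from_rss)
print("1 - RSS/TSS, perturbed line:", r2_perturbed)
```

Note that the identity ESS/TSS = 1 - RSS/TSS relies on the decomposition TSS = ESS + RSS, which holds for the OLS line. For the perturbed line only the RSS form is used.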
GOODNESS OF FIT 35 Another natural criterion of goodness of fit is the correlation between the actual and fitted values of Y. We will demonstrate that this is maximized by using the least squares principle to determine the regression coefficients.
GOODNESS OF FIT 36 We will start with the numerator and substitute for the actual value of Y, and its mean, in the first factor.
GOODNESS OF FIT 37 The mean value of the residuals is zero (first useful result). We rearrange a little.
GOODNESS OF FIT 38 We expand the expression. The last two terms are both zero (fourth and first useful results).
GOODNESS OF FIT 39 Thus the numerator simplifies to the sum of the squared deviations of the fitted values.
GOODNESS OF FIT 40 We have the same expression in the denominator, under a square root. Cancelling, we are left with the square root in the numerator.
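The steps above can be sketched as follows. The symbol r_{Y,\hat{Y}} for the sample correlation between the actual and fitted values is an assumption, as are e_i and \hat{Y}_i. The denominator uses the second useful result, that the mean of the fitted values equals \bar{Y}.

```latex
\begin{aligned}
r_{Y,\hat{Y}}
  &= \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y})}
          {\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 \, \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}} \\[4pt]
\sum_{i=1}^{n} (Y_i - \bar{Y})(\hat{Y}_i - \bar{Y})
  &= \sum_{i=1}^{n} \left( [\hat{Y}_i + e_i] - [\bar{Y} + \bar{e}] \right)(\hat{Y}_i - \bar{Y}) \\
  &= \sum_{i=1}^{n} \left( [\hat{Y}_i - \bar{Y}] + e_i \right)(\hat{Y}_i - \bar{Y}) \\
  &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} \hat{Y}_i e_i - \bar{Y} \sum_{i=1}^{n} e_i
   = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \\[4pt]
r_{Y,\hat{Y}}
  &= \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}
          {\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 \, \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}}
   = \sqrt{\frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
   = \sqrt{\frac{ESS}{TSS}}
\end{aligned}
```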
GOODNESS OF FIT 41 Thus the correlation coefficient is the square root of R². It follows that it is maximized by the use of the least squares principle to determine the regression coefficients.
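As a quick numerical check of this last result, the sketch below computes the sample correlation between the actual and fitted values and compares it with the square root of ESS/TSS. It is an illustration in plain Python under stated assumptions: the data are made up, and the helpers `ols_fit` and `correlation` are hypothetical names, not part of the original slideshow.

```python
import math


def ols_fit(x, y):
    """Return (b1, b2) for the simple OLS regression line Y = b1 + b2 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def correlation(u, v):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    cov = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    var_u = sum((ui - u_bar) ** 2 for ui in u)
    var_v = sum((vi - v_bar) ** 2 for vi in v)
    return cov / math.sqrt(var_u * var_v)


# Made-up illustrative data (an assumption, not taken from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

b1, b2 = ols_fit(x, y)
fitted = [b1 + b2 * xi for xi in x]
y_bar = sum(y) / len(y)

tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((fi - y_bar) ** 2 for fi in fitted)
r_squared = ess / tss
r_actual_fitted = correlation(y, fitted)

print("correlation(Y, fitted):", r_actual_fitted)
print("sqrt(R2):              ", math.sqrt(r_squared))
```

The two printed values should agree up to floating-point rounding error.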
Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Sections 1.5 and 1.6 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre. Individuals studying econometrics on their own and who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course 20 Elements of Econometrics.