
1 Christopher Dougherty EC220 - Introduction to econometrics (chapter 3). Slideshow: F tests in a multiple regression model. Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 3). [Teaching Resource] © 2012 The Author. This version available in LSE Learning Resources Online: May 2012. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms.

2 F TESTS OF GOODNESS OF FIT 1 This sequence describes two F tests of goodness of fit in a multiple regression model. The first relates to the goodness of fit of the equation as a whole.

3 F TESTS OF GOODNESS OF FIT 2 We will consider the general case where there are k – 1 explanatory variables. For the F test of goodness of fit of the equation as a whole, the null hypothesis, in words, is that the model has no explanatory power at all.

4 F TESTS OF GOODNESS OF FIT 3 Of course we hope to reject it and conclude that the model does have some explanatory power.

5 F TESTS OF GOODNESS OF FIT 4 The model will have no explanatory power if it turns out that Y is unrelated to any of the explanatory variables. Mathematically, therefore, the null hypothesis is that all the slope coefficients are zero: H0: β2 = β3 = ... = βk = 0.

6 F TESTS OF GOODNESS OF FIT 5 The alternative hypothesis is that at least one of these β coefficients is different from zero.

7 F TESTS OF GOODNESS OF FIT 6 In the multiple regression model there is a difference between the roles of the F and t tests. The F test tests the joint explanatory power of the variables, while the t tests test their explanatory power individually.

8 F TESTS OF GOODNESS OF FIT 7 In the simple regression model the F test was equivalent to the (two-sided) t test on the slope coefficient because the ‘group’ consisted of just one variable.

9 F TESTS OF GOODNESS OF FIT 8 The F statistic for the test was defined in the last sequence in Chapter 2: F(k – 1, n – k) = [ESS / (k – 1)] / [RSS / (n – k)], where ESS is the explained sum of squares and RSS is the residual sum of squares.

10 F TESTS OF GOODNESS OF FIT 9 It can be expressed in terms of R² by dividing the numerator and denominator by TSS, the total sum of squares: F(k – 1, n – k) = [R² / (k – 1)] / [(1 – R²) / (n – k)].

11 F TESTS OF GOODNESS OF FIT 10 ESS / TSS is the definition of R². RSS / TSS is equal to (1 – R²). (See the last sequence in Chapter 2.)
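As an arithmetical illustration, the R² form of the statistic is easy to compute directly. A minimal sketch in Stata, using a made-up R² value rather than the one from Data Set 21:

* Hypothetical illustration: F computed from R-squared.
* r2 is a placeholder value, not the figure from Data Set 21.
scalar r2 = 0.30
scalar k = 4
scalar n = 540
scalar F = (r2/(k-1)) / ((1-r2)/(n-k))
display F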

12 F TESTS OF GOODNESS OF FIT 11 The educational attainment model will be used as an example. We will suppose that S depends on ASVABC, the ability score, and on SM and SF, the highest grades completed by the mother and father of the respondent, respectively.

13 F TESTS OF GOODNESS OF FIT 12 The null hypothesis for the F test of goodness of fit is that all three slope coefficients are equal to zero. The alternative hypothesis is that at least one of them is non-zero.

14 F TESTS OF GOODNESS OF FIT 13 Here is the regression output using Data Set 21. [Stata output: . reg S ASVABC SM SF — ANOVA table and coefficient estimates for ASVABC, SM, SF, and _cons; the numeric values were not preserved in this transcript.]

15 F TESTS OF GOODNESS OF FIT 14 In this example k – 1, the number of explanatory variables, is equal to 3, and n – k, the number of degrees of freedom, is equal to 536.

16 F TESTS OF GOODNESS OF FIT 15 The numerator of the F statistic is the explained sum of squares divided by k – 1. In the Stata output these numbers are given in the Model row.

17 F TESTS OF GOODNESS OF FIT 16 The denominator is the residual sum of squares divided by the number of degrees of freedom remaining.

18 F TESTS OF GOODNESS OF FIT 17 Hence the F statistic is the ratio of these two quantities. All serious regression packages compute it for you as part of the diagnostics in the regression output.

19 F TESTS OF GOODNESS OF FIT 18 The critical value for F(3,536) is not given in the F tables, but we know it must be lower than that for F(3,500), which is given. The F statistic in the output far exceeds the tabulated critical value at the 0.1% level, so we easily reject H0 at the 0.1% level.
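Interpolating from printed tables is in any case unnecessary: Stata's invFtail() function returns exact critical values (and Ftail() gives p-values). A one-line check:

display invFtail(3, 536, 0.001)    // exact 0.1% critical value for F(3, 536)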

20 F TESTS OF GOODNESS OF FIT 19 This result could have been anticipated because both ASVABC and SF have highly significant t statistics. So we knew in advance that both β2 and β4 were non-zero.

21 F TESTS OF GOODNESS OF FIT 20 It is unusual for the F statistic not to be significant if some of the t statistics are significant. In principle it could happen, though. Suppose that you ran a regression with 40 explanatory variables, none being a true determinant of the dependent variable.

22 F TESTS OF GOODNESS OF FIT 21 Then the F statistic should be low enough for H0 not to be rejected. However, if you are performing t tests on the slope coefficients at the 5% level, with a 5% chance of a Type I error, on average 0.05 × 40 = 2 of the 40 variables could be expected to have ‘significant’ coefficients.
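This point is easy to demonstrate by simulation. A minimal sketch in Stata, generating a dependent variable and 40 regressors that are all pure noise (the sample size and seed are arbitrary choices of ours):

* Simulate 40 irrelevant regressors; expect roughly two 'significant'
* t statistics at the 5% level, and an insignificant overall F statistic.
clear
set seed 12345
set obs 200
generate y = rnormal()
forvalues i = 1/40 {
    generate x`i' = rnormal()
}
regress y x1-x40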

23 F TESTS OF GOODNESS OF FIT 22 The opposite can easily happen, though. Suppose you have a multiple regression model which is correctly specified and the R² is high. You would expect to have a highly significant F statistic.

24 F TESTS OF GOODNESS OF FIT 23 However, if the explanatory variables are highly correlated and the model is subject to severe multicollinearity, the standard errors of the slope coefficients could all be so large that none of the t statistics is significant.

25 F TESTS OF GOODNESS OF FIT 24 In this situation you would know that your model is a good one, but you are not in a position to pinpoint the contributions made by the explanatory variables individually.

26 F TESTS OF GOODNESS OF FIT 25 We now come to the other F test of goodness of fit. This is a test of the joint explanatory power of a group of variables when they are added to a regression model.

27 F TESTS OF GOODNESS OF FIT 26 For example, in the original specification, Y may be written as a simple function of X2: Y = β1 + β2X2 + u. In the second, we add X3 and X4: Y = β1 + β2X2 + β3X3 + β4X4 + u.

28 F TESTS OF GOODNESS OF FIT 27 The null hypothesis for the F test is that neither X3 nor X4 belongs in the model: H0: β3 = 0 and β4 = 0. The alternative hypothesis is that at least one of them does, perhaps both: H1: β3 ≠ 0 or β4 ≠ 0, or both.

29 F TESTS OF GOODNESS OF FIT 28 For this F test, and for several others which we will encounter, it is useful to think of the F statistic as having the structure indicated below:

F(cost in d.f., d.f. remaining) = (reduction in RSS / cost in d.f.) / (RSS remaining / degrees of freedom remaining)

30 F TESTS OF GOODNESS OF FIT 29 The ‘reduction in RSS’ is the reduction when the change is made, in this case, when the group of new variables is added.

31 F TESTS OF GOODNESS OF FIT 30 The ‘cost in d.f.’ is the reduction in the number of degrees of freedom remaining after making the change. In the present case it is equal to the number of new variables added, because that number of additional parameters is estimated.

32 F TESTS OF GOODNESS OF FIT 31 (Remember that the number of degrees of freedom in a regression equation is the number of observations, less the number of parameters estimated. In this example, it would fall from n – 2 to n – 4 when X3 and X4 are added.)

33 F TESTS OF GOODNESS OF FIT 32 The ‘RSS remaining’ is the residual sum of squares after making the change.

34 F TESTS OF GOODNESS OF FIT 33 The ‘degrees of freedom remaining’ is the number of degrees of freedom remaining after making the change.

35 F TESTS OF GOODNESS OF FIT 34 We will illustrate the test with an educational attainment example. Here is S regressed on ASVABC using Data Set 21. We make a note of the residual sum of squares. [Stata output: . reg S ASVABC — numeric values not preserved in this transcript.]

36 F TESTS OF GOODNESS OF FIT 35 Now we have added the highest grade completed by each parent (. reg S ASVABC SM SF). Does parental education have a significant impact? Well, we can see that a t test would show that SF has a highly significant coefficient, but we will perform the F test anyway. We make a note of RSS.

37 F TESTS OF GOODNESS OF FIT 36 The improvement in the fit on adding the parental variables is the reduction in the residual sum of squares.

38 F TESTS OF GOODNESS OF FIT 37 The cost is 2 degrees of freedom because 2 additional parameters have been estimated.

39 F TESTS OF GOODNESS OF FIT 38 The RSS remaining is the residual sum of squares after adding SM and SF.

40 F TESTS OF GOODNESS OF FIT 39 The number of degrees of freedom remaining is n – k, that is, 540 – 4 = 536.

41 F TESTS OF GOODNESS OF FIT 40 The F statistic is therefore F(2, 536) = (reduction in RSS / 2) / (RSS remaining / 536).
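The whole calculation can be scripted rather than done by hand. A minimal sketch in Stata, assuming Data Set 21 is loaded in memory with the variable names used in the slides (the scalar names are ours):

* Restricted model: S on ASVABC only; store the residual sum of squares.
quietly regress S ASVABC
scalar rss_r = e(rss)
* Unrestricted model: add SM and SF.
quietly regress S ASVABC SM SF
scalar rss_u = e(rss)
* F = (reduction in RSS / cost in d.f.) / (RSS remaining / d.f. remaining)
scalar F = ((rss_r - rss_u)/2) / (rss_u/e(df_r))
display F
display Ftail(2, e(df_r), F)    // p-value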

42 F TESTS OF GOODNESS OF FIT 41 The critical value of F(2,500) at the 0.1% level is given in the F tables. The critical value of F(2,536) must be lower, so we reject H0 and conclude that the parental education variables do have significant joint explanatory power.
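In practice the two-step RSS calculation is unnecessary: Stata's test command reports the same F statistic directly after the unrestricted regression:

regress S ASVABC SM SF
test SM SF    // joint test that the SM and SF coefficients are both zero; reports F(2, 536)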

43 F TESTS OF GOODNESS OF FIT 42 This sequence will conclude by showing that t tests are equivalent to marginal F tests when the additional group of variables consists of just one variable.

44 F TESTS OF GOODNESS OF FIT 43 Suppose that in the original model Y is a function of X2 and X3, and that in the revised model X4 is added.

45 F TESTS OF GOODNESS OF FIT 44 The null hypothesis for the F test of the explanatory power of the additional ‘group’ is that all the new slope coefficients are equal to zero. There is of course only one new slope coefficient, β4.

46 F TESTS OF GOODNESS OF FIT 45 The F test has the usual structure. We will illustrate it with an educational attainment model where S depends on ASVABC and SM in the original model and on SF as well in the revised model.

47 F TESTS OF GOODNESS OF FIT 46 Here is the regression of S on ASVABC and SM. We make a note of the residual sum of squares. [Stata output: . reg S ASVABC SM — numeric values not preserved in this transcript.]

48 F TESTS OF GOODNESS OF FIT 47 Now we add SF and again make a note of the residual sum of squares. [Stata output: . reg S ASVABC SM SF — numeric values not preserved in this transcript.]

49 F TESTS OF GOODNESS OF FIT 48 The reduction in the residual sum of squares is the reduction on adding SF.

50 F TESTS OF GOODNESS OF FIT 49 The cost is just the single degree of freedom lost when estimating β4.

51 F TESTS OF GOODNESS OF FIT 50 The RSS remaining is the residual sum of squares after adding SF.

52 F TESTS OF GOODNESS OF FIT 51 The number of degrees of freedom remaining after adding SF is 540 – 4 = 536.

53 F TESTS OF GOODNESS OF FIT 52 Hence the F statistic is F(1, 536) = (reduction in RSS / 1) / (RSS remaining / 536).
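The same scripted calculation as before carries over, with SF as the single added variable. A sketch under the same assumptions:

* Restricted model: S on ASVABC and SM.
quietly regress S ASVABC SM
scalar rss_r = e(rss)
* Unrestricted model: add SF.
quietly regress S ASVABC SM SF
scalar rss_u = e(rss)
* Cost is 1 d.f.; e(df_r) = 536 d.f. remain.
display ((rss_r - rss_u)/1) / (rss_u/e(df_r))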

54 F TESTS OF GOODNESS OF FIT 53 The critical value of F at the 0.1% significance level with 500 degrees of freedom is given in the F tables. The critical value with 536 degrees of freedom must be lower, so we reject H0 at the 0.1% level.

55 F TESTS OF GOODNESS OF FIT 54 The null hypothesis we are testing is exactly the same as for a two-sided t test on the coefficient of SF.

56 F TESTS OF GOODNESS OF FIT 55 We will perform the t test. The t statistic is the coefficient of SF divided by its standard error, as reported in the SF row of the regression output.

57 F TESTS OF GOODNESS OF FIT 56 The critical value of t at the 0.1% level with 500 degrees of freedom is given in the t tables. The critical value with 536 degrees of freedom must be lower. So we reject H0 again.

58 F TESTS OF GOODNESS OF FIT 57 It can be shown that the F statistic for the F test of the explanatory power of a ‘group’ of one variable must be equal to the square of the t statistic for that variable. (The difference in the last digit is due to rounding error.)
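The equivalence is easy to verify directly. A minimal sketch in Stata, again assuming the dataset is loaded (the scalar name is ours):

quietly regress S ASVABC SM SF
* t statistic for SF, and its square
scalar t_SF = _b[SF]/_se[SF]
display t_SF^2
* the marginal F test of SF alone reports the same value as F(1, 536)
test SF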

59 F TESTS OF GOODNESS OF FIT 58 It can also be shown that the critical value of F must be equal to the square of the critical value of t. (The critical values shown are for 500 degrees of freedom, but this must also be true for 536 degrees of freedom.)

60 F TESTS OF GOODNESS OF FIT 59 Hence the conclusions of the two tests must coincide.

61 F TESTS OF GOODNESS OF FIT 60 This result means that the t test of the coefficient of a variable is a test of its marginal explanatory power, after all the other variables have been included in the equation.

62 F TESTS OF GOODNESS OF FIT 61 If the variable is correlated with one or more of the other variables, its marginal explanatory power may be quite low, even if it genuinely belongs in the model.

63 F TESTS OF GOODNESS OF FIT 62 If all the variables are correlated, it is possible for all of them to have low marginal explanatory power and for none of the t tests to be significant, even though the F test for their joint explanatory power is highly significant.

64 F TESTS OF GOODNESS OF FIT 63 If this is the case, the model is said to be suffering from the problem of multicollinearity discussed in the previous sequence.

65 Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Section 3.5 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre. Individuals studying econometrics on their own, and who feel that they might benefit from participation in a formal course, should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course 20 Elements of Econometrics.

