EC220 - Introduction to econometrics (chapter 3)

Presentation transcript:

Christopher Dougherty
EC220 - Introduction to econometrics (chapter 3)
Slideshow: F tests in a multiple regression model
Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 3). [Teaching Resource] © 2012 The Author
This version available at: http://learningresources.lse.ac.uk/129/
Available in LSE Learning Resources Online: May 2012
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms.
http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/

F TESTS OF GOODNESS OF FIT

This sequence describes two F tests of goodness of fit in a multiple regression model. The first relates to the goodness of fit of the equation as a whole.

We will consider the general case where there are k – 1 explanatory variables. For the F test of goodness of fit of the equation as a whole, the null hypothesis, in words, is that the model has no explanatory power at all.

Of course we hope to reject it and conclude that the model does have some explanatory power.

The model will have no explanatory power if it turns out that Y is unrelated to any of the explanatory variables. Mathematically, therefore, the null hypothesis is that all the slope coefficients b2, ..., bk are zero: H0: b2 = ... = bk = 0.

The alternative hypothesis is that at least one of these b coefficients is different from zero.

In the multiple regression model there is a difference between the roles of the F and t tests. The F test tests the joint explanatory power of the variables, while the t tests test their explanatory power individually.

In the simple regression model the F test was equivalent to the (two-sided) t test on the slope coefficient because the ‘group’ consisted of just one variable.

The F statistic for the test was defined in the last sequence in Chapter 2: F(k – 1, n – k) = [ESS / (k – 1)] / [RSS / (n – k)], where ESS is the explained sum of squares and RSS is the residual sum of squares.

It can be expressed in terms of R2 by dividing the numerator and denominator by TSS, the total sum of squares: F(k – 1, n – k) = [R2 / (k – 1)] / [(1 – R2) / (n – k)].

ESS / TSS is the definition of R2, and RSS / TSS is equal to (1 – R2). (See the last sequence in Chapter 2.)
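As a quick sketch, the two forms of the F statistic can be checked numerically. The figures below are taken from the Stata output that appears later in this sequence (n = 540, k = 4); the variable names are illustrative, not part of the slideshow.

```python
# F statistic for overall goodness of fit, computed two equivalent ways.
ESS = 1181.36981   # explained (model) sum of squares
RSS = 2023.61353   # residual sum of squares
TSS = ESS + RSS
n, k = 540, 4

# Form 1: from the sums of squares.
F_from_ss = (ESS / (k - 1)) / (RSS / (n - k))

# Form 2: from R-squared, after dividing numerator and denominator by TSS.
R2 = ESS / TSS
F_from_r2 = (R2 / (k - 1)) / ((1 - R2) / (n - k))

print(round(F_from_ss, 2), round(F_from_r2, 2))  # both forms agree, about 104.30
```

The two forms are algebraically identical, so they must agree exactly, not just approximately.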

The educational attainment model will be used as an example. We will suppose that S depends on ASVABC, the ability score, and on SM and SF, the highest grade completed by the mother and the father of the respondent, respectively.

The null hypothesis for the F test of goodness of fit is that all three slope coefficients are equal to zero. The alternative hypothesis is that at least one of them is non-zero.

Here is the regression output using Data Set 21:

. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681

In this example, k – 1, the number of explanatory variables, is equal to 3 and n – k, the number of degrees of freedom, is equal to 536.

The numerator of the F statistic is the explained sum of squares divided by k – 1. In the Stata output these numbers are given in the Model row.

The denominator is the residual sum of squares divided by the number of degrees of freedom remaining.

Hence the F statistic is 104.30. All serious regression packages compute it for you as part of the diagnostics in the regression output.

The critical value for F(3,536) is not given in the F tables, but we know it must be lower than F(3,500), which is given. At the 0.1% level, this is 5.51. Hence we easily reject H0 at the 0.1% level.
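Rather than bracketing the critical value with tables, it can be computed exactly. A minimal sketch, assuming scipy is available (scipy is not part of the original material):

```python
from scipy.stats import f

# Exact critical value of F(3, 536) at the 0.1% significance level.
crit = f.isf(0.001, 3, 536)

# p-value of the observed statistic.
p = f.sf(104.30, 3, 536)

print(crit)      # just below the tabulated F(3, 500) value of 5.51
print(p < 0.001) # True: reject H0 at the 0.1% level
```

The observed statistic of 104.30 is so far above the critical value that the p-value is effectively zero.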

This result could have been anticipated because both ASVABC and SF have highly significant t statistics. So we knew in advance that both b2 and b4 were non-zero.

It is unusual for the F statistic not to be significant if some of the t statistics are significant. In principle it could happen, though. Suppose that you ran a regression with 40 explanatory variables, none being a true determinant of the dependent variable.

Then the F statistic should be low enough for H0 not to be rejected. However, if you are performing t tests on the slope coefficients at the 5% level, with a 5% chance of a Type I error, on average 2 of the 40 variables could be expected to have ‘significant’ coefficients.

The opposite can easily happen, though. Suppose you have a multiple regression model which is correctly specified and the R2 is high. You would expect to have a highly significant F statistic.

However, if the explanatory variables are highly correlated and the model is subject to severe multicollinearity, the standard errors of the slope coefficients could all be so large that none of the t statistics is significant.

In this situation you would know that your model is a good one, but you would not be in a position to pinpoint the contributions made by the explanatory variables individually.

We now come to the other F test of goodness of fit. This is a test of the joint explanatory power of a group of variables when they are added to a regression model.

For example, in the original specification Y may be written as a simple function of X2: Y = b1 + b2 X2 + u. In the second, we add X3 and X4: Y = b1 + b2 X2 + b3 X3 + b4 X4 + u.

The null hypothesis for the F test is that neither X3 nor X4 belongs in the model: H0: b3 = b4 = 0. The alternative hypothesis is that at least one of them does, perhaps both: H1: b3 ≠ 0 or b4 ≠ 0 or both.

For this F test, and for several others which we will encounter, it is useful to think of the F statistic as having the structure

F(cost in d.f., d.f. remaining) = (reduction in RSS / cost in d.f.) / (RSS remaining / d.f. remaining)

The ‘reduction in RSS’ is the reduction when the change is made, in this case, when the group of new variables is added.

The ‘cost in d.f.’ is the reduction in the number of degrees of freedom remaining after making the change. In the present case it is equal to the number of new variables added, because that number of new parameters are estimated.

(Remember that the number of degrees of freedom in a regression equation is the number of observations, less the number of parameters estimated. In this example, it would fall from n – 2 to n – 4 when X3 and X4 are added.)

The ‘RSS remaining’ is the residual sum of squares after making the change.

The ‘degrees of freedom remaining’ is the number of degrees of freedom remaining after making the change.

We will illustrate the test with an educational attainment example. Here is S regressed on ASVABC using Data Set 21. We make a note of the residual sum of squares.

. reg S ASVABC

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  1,   538) =  274.19
       Model |  1081.97059     1  1081.97059           Prob > F      =  0.0000
    Residual |  2123.01275   538  3.94612035           R-squared     =  0.3376
-------------+------------------------------           Adj R-squared =  0.3364
       Total |  3204.98333   539  5.94616574           Root MSE      =  1.9865

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |    .148084   .0089431    16.56   0.000     .1305165    .1656516
       _cons |   6.066225   .4672261    12.98   0.000     5.148413    6.984036

Now we have added the highest grade completed by each parent, giving the regression output shown at the start of this sequence. Does parental education have a significant impact? Well, we can see that a t test would show that SF has a highly significant coefficient, but we will perform the F test anyway. We make a note of RSS.

The improvement in the fit on adding the parental variables is the reduction in the residual sum of squares: 2123.01275 – 2023.61353 = 99.39922.

The cost is 2 degrees of freedom because 2 additional parameters have been estimated.

The remaining unexplained variation is the residual sum of squares after adding SM and SF.

The number of degrees of freedom remaining is n – k, that is, 540 – 4 = 536.

The F statistic is therefore F(2, 536) = (99.39922 / 2) / (2023.61353 / 536) = 13.16.

The critical value of F(2,500) at the 0.1% level is 7.00. The critical value of F(2,536) must be lower, so we reject H0 and conclude that the parental education variables do have significant joint explanatory power.
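The same conclusion can be reached with the exact critical value rather than the table bound. A sketch, assuming scipy is available:

```python
from scipy.stats import f

# Reduction-in-RSS form of the F statistic for the parental variables.
F = ((2123.01275 - 2023.61353) / 2) / (2023.61353 / 536)

# Exact critical value of F(2, 536) at the 0.1% level, slightly below
# the tabulated F(2, 500) value of 7.00.
crit = f.isf(0.001, 2, 536)

print(round(F, 2))  # 13.16
print(F > crit)     # True: reject H0 at the 0.1% level
```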

This sequence will conclude by showing that t tests are equivalent to marginal F tests when the additional group of variables consists of just one variable.

Suppose that in the original model Y is a function of X2 and X3, and that in the revised model X4 is added.

The null hypothesis for the F test of the explanatory power of the additional ‘group’ is that all the new slope coefficients are equal to zero. There is of course only one new slope coefficient, b4.

The F test has the usual structure. We will illustrate it with an educational attainment model where S depends on ASVABC and SM in the original model and on SF as well in the revised model.

Here is the regression of S on ASVABC and SM. We make a note of the residual sum of squares.

. reg S ASVABC SM

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =  147.36
       Model |  1135.67473     2  567.837363           Prob > F      =  0.0000
    Residual |  2069.30861   537  3.85346109           R-squared     =  0.3543
-------------+------------------------------           Adj R-squared =  0.3519
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.963

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1328069   .0097389    13.64   0.000     .1136758     .151938
          SM |   .1235071   .0330837     3.73   0.000     .0585178    .1884963
       _cons |   5.420733   .4930224    10.99   0.000     4.452244    6.389222

Now we add SF and again make a note of the residual sum of squares.

The reduction in the residual sum of squares is the reduction on adding SF: 2069.30861 – 2023.61353 = 45.69508.

F TESTS OF GOODNESS OF FIT F(cost in d.f., d.f. remaining) = (reduction in RSS / cost in d.f.) / (RSS remaining / d.f. remaining) The cost is just the single degree of freedom lost when estimating b4, the coefficient of SF. 49

F TESTS OF GOODNESS OF FIT F(cost in d.f., d.f. remaining) = (reduction in RSS / cost in d.f.) / (RSS remaining / d.f. remaining) The RSS remaining is the residual sum of squares after adding SF. 50

F TESTS OF GOODNESS OF FIT F(cost in d.f., d.f. remaining) = (reduction in RSS / cost in d.f.) / (RSS remaining / d.f. remaining) The number of degrees of freedom remaining after adding SF is 540 – 4 = 536. 51

F TESTS OF GOODNESS OF FIT F(cost in d.f., d.f. remaining) = (reduction in RSS / cost in d.f.) / (RSS remaining / d.f. remaining) Hence the F statistic is 12.10. 52
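The arithmetic behind this slide can be checked directly; a minimal sketch in Python, with the residual sums of squares and degrees of freedom taken from the two regression outputs above:

```python
# Residual sums of squares from the two Stata outputs above
rss_restricted = 2069.30861    # S regressed on ASVABC and SM
rss_unrestricted = 2023.61353  # S regressed on ASVABC, SM and SF

cost_df = 1                    # one degree of freedom lost estimating the SF coefficient
df_remaining = 540 - 4         # 536 degrees of freedom remain

# F = (reduction in RSS / cost in d.f.) / (RSS remaining / d.f. remaining)
f_stat = ((rss_restricted - rss_unrestricted) / cost_df) / (rss_unrestricted / df_remaining)
print(round(f_stat, 2))        # 12.1
```
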

F TESTS OF GOODNESS OF FIT F(cost in d.f., d.f. remaining) = (reduction in RSS / cost in d.f.) / (RSS remaining / d.f. remaining) The critical value of F(1, 500) at the 0.1% significance level is 10.96. The critical value with 536 denominator degrees of freedom must be lower, so we reject H0 at the 0.1% level. 53

F TESTS OF GOODNESS OF FIT F(cost in d.f., d.f. remaining) = (reduction in RSS / cost in d.f.) / (RSS remaining / d.f. remaining) The null hypothesis we are testing is exactly the same as for a two-sided t test on the coefficient of SF. 54

F TESTS OF GOODNESS OF FIT We will perform the t test. The t statistic is 3.48. 55

F TESTS OF GOODNESS OF FIT The critical value of t at the 0.1% level with 500 degrees of freedom is 3.31. The critical value with 536 degrees of freedom must be lower. So we reject H0 again. 56

F TESTS OF GOODNESS OF FIT It can be shown that the F statistic for the F test of the explanatory power of a ‘group’ of one variable must be equal to the square of the t statistic for that variable. (The difference in the last digit is due to rounding error.) 57
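The identity can be verified numerically from the figures on these slides; a quick check in Python, using the t statistic for SF and the F statistic computed earlier:

```python
# Figures read from the slides above
t_sf = 3.48            # t statistic for SF in the unrestricted regression
f_stat = 12.10         # F statistic for adding SF

# For a 'group' of one variable, F equals t squared; the small
# discrepancy reflects rounding of the reported statistics.
print(round(t_sf ** 2, 4))    # 12.1104
```
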

F TESTS OF GOODNESS OF FIT It can also be shown that the critical value of F must be equal to the square of the critical value of t. (The critical values shown are for 500 degrees of freedom, but this must also be true for 536 degrees of freedom.) 58
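The same squaring relationship holds for the critical values quoted on the earlier slides, again up to rounding:

```python
# Critical values at the 0.1% level with 500 degrees of freedom,
# as quoted on the earlier slides
t_crit = 3.31          # two-sided critical value of t
f_crit = 10.96         # critical value of F(1, 500)

# t_crit squared reproduces f_crit up to rounding
print(round(t_crit ** 2, 4))   # 10.9561
```
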

F TESTS OF GOODNESS OF FIT Hence the conclusions of the two tests must coincide. 59

F TESTS OF GOODNESS OF FIT This result means that the t test of the coefficient of a variable is a test of its marginal explanatory power, after all the other variables have been included in the equation. 60

F TESTS OF GOODNESS OF FIT If the variable is correlated with one or more of the other variables, its marginal explanatory power may be quite low, even if it genuinely belongs in the model. 61

F TESTS OF GOODNESS OF FIT If all the variables are correlated, it is possible for all of them to have low marginal explanatory power and for none of the t tests to be significant, even though the F test for their joint explanatory power is highly significant. 62

F TESTS OF GOODNESS OF FIT If this is the case, the model is said to be suffering from the problem of multicollinearity discussed in the previous sequence. 63

Copyright Christopher Dougherty 2011. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Section 3.5 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/. Individuals studying econometrics on their own and who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course 20 Elements of Econometrics www.londoninternational.ac.uk/lse. 11.07.25