Christopher Dougherty, EC220 - Introduction to econometrics (chapter 5). Slideshow: dummy classification with more than two categories


Original citation: Dougherty, C. (2012) EC220 - Introduction to econometrics (chapter 5). [Teaching Resource] © 2012 The Author. This version available at: Available in LSE Learning Resources Online: May 2012. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user credits the author and licenses their new creations under the identical terms.

DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES

This sequence explains how to extend the dummy variable technique to handle a qualitative explanatory variable which has more than two categories.

$$\mathit{COST} = \beta_1 + \delta_T \mathit{TECH} + \delta_W \mathit{WORKER} + \delta_V \mathit{VOC} + \beta_2 N + u$$

In the previous sequence we used a dummy variable to differentiate between regular and occupational schools when fitting a cost function.

In fact, there are two types of regular secondary school in Shanghai: general schools, which provide the usual academic education, and vocational schools.

As their name implies, the vocational schools are meant to impart occupational skills as well as give an academic education.

However, the vocational component of the curriculum is typically quite small and the schools are similar to the general schools. Often they are just general schools with a couple of workshops added.

Likewise, there are two types of occupational school: technical schools, which train technicians, and skilled workers’ schools, which train craftsmen.

So now the qualitative variable has four categories. The standard procedure is to choose one category as the reference category and to define dummy variables for each of the others.

In general it is good practice to select the most normal or basic category as the reference category, if one category is in some sense more normal or basic than the others.

In the Shanghai sample it is sensible to choose the general schools as the reference category. They are the most numerous and the other schools are variations of them.

Accordingly we will define dummy variables for the other three types. TECH will be the dummy for the technical schools: TECH is equal to 1 if the observation relates to a technical school, 0 otherwise.

Similarly we will define dummy variables WORKER and VOC for the skilled workers’ schools and the vocational schools.

Each of the dummy variables will have a coefficient which represents the extra overhead costs of the schools, relative to the reference category.

Note that you do not include a dummy variable for the reference category, and that is the reason that the reference category is usually described as the omitted category.

If an observation relates to a general school, the dummy variables are all 0 and the regression model is reduced to its basic components.

General school (TECH = WORKER = VOC = 0): $\mathit{COST} = \beta_1 + \beta_2 N + u$

If an observation relates to a technical school, TECH will be equal to 1 and the other dummy variables will be 0. The regression model simplifies as shown.

Technical school (TECH = 1; WORKER = VOC = 0): $\mathit{COST} = (\beta_1 + \delta_T) + \beta_2 N + u$

The regression model simplifies in a similar manner in the case of observations relating to skilled workers’ schools and vocational schools.

Skilled workers’ school (WORKER = 1; TECH = VOC = 0): $\mathit{COST} = (\beta_1 + \delta_W) + \beta_2 N + u$

Vocational school (VOC = 1; TECH = WORKER = 0): $\mathit{COST} = (\beta_1 + \delta_V) + \beta_2 N + u$

[Figure: COST plotted against N for technical, skilled workers’, vocational, and general schools, with intercepts $\beta_1 + \delta_T$, $\beta_1 + \delta_W$, $\beta_1 + \delta_V$, and $\beta_1$, and the shifts $\delta_T$, $\delta_W$, and $\delta_V$ marked.]

The diagram illustrates the model graphically. The $\delta$ coefficients are the extra overhead costs of running technical, skilled workers’, and vocational schools, relative to the overhead cost of general schools.

Note that we do not make any prior assumption about the size, or even the sign, of the $\delta$ coefficients. They will be estimated from the sample data.

Here are the data for the first 10 of the 74 schools. Note how the values of the dummy variables TECH, WORKER, and VOC are determined by the type of school in each observation.

[Table: School, Type, COST, N, TECH, WORKER, VOC for the first 10 schools. Types, in order: Technical, Technical, General, Workers’, General, Vocational, Vocational, Technical, Technical, Workers’.]
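As a minimal sketch of how the dummy values follow from the school type, the snippet below builds TECH, WORKER, and VOC for the ten school types listed above, with general schools as the omitted category. The names `school_types`, `DUMMIES`, and `make_dummies` are illustrative assumptions, not part of the original slides, and the COST and N columns are left out.

```python
# School types of the first ten observations, in the order listed in the sequence.
school_types = [
    "Technical", "Technical", "General", "Workers'", "General",
    "Vocational", "Vocational", "Technical", "Technical", "Workers'",
]

# Reference (omitted) category: it gets no dummy variable of its own.
REFERENCE = "General"

# Each dummy variable name mapped to the school type it flags (illustrative names).
DUMMIES = {"TECH": "Technical", "WORKER": "Workers'", "VOC": "Vocational"}
assert REFERENCE not in DUMMIES.values()


def make_dummies(school_type):
    """Return the 0/1 values of TECH, WORKER, and VOC for one school."""
    return {name: int(school_type == flagged) for name, flagged in DUMMIES.items()}


rows = [make_dummies(t) for t in school_types]

# Print one line per school: number, type, TECH, WORKER, VOC.
for number, (school_type, row) in enumerate(zip(school_types, rows), start=1):
    print(number, school_type, row["TECH"], row["WORKER"], row["VOC"])
```

A general school ends up with all three dummies equal to 0, which is what makes it the reference category.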

The scatter diagram shows the data for the entire sample, differentiating by type of school.

. reg COST N TECH WORKER VOC

[Stata regression output: COST on N, TECH, WORKER, VOC, and _cons; F(4, 69).]

Here is the Stata output for this regression. The coefficient of N indicates that the marginal cost per student per year is 343 yuan.

The coefficients of TECH, WORKER, and VOC are 154,000, 143,000, and 53,000, respectively, and should be interpreted as the additional annual overhead costs in yuan, relative to those of general schools.

The constant term is –55,000, indicating that the annual overhead cost of a general academic school is –55,000 yuan. Obviously this is nonsense and indicates that something is wrong with the model.

The top line shows the regression result in equation form. We will derive the implicit cost functions for each type of school.

$$\widehat{\mathit{COST}} = -55{,}000 + 154{,}000\,\mathit{TECH} + 143{,}000\,\mathit{WORKER} + 53{,}000\,\mathit{VOC} + 343N$$

In the case of a general school, the dummy variables are all 0 and the equation reduces to the intercept and the term involving N.

General school (TECH = WORKER = VOC = 0): $\widehat{\mathit{COST}} = -55{,}000 + 343N$

The annual marginal cost per student is estimated at 343 yuan. The annual overhead cost per school is estimated at –55,000 yuan. Obviously a negative amount is inconceivable.

The extra annual overhead cost for a technical school, relative to a general school, is 154,000 yuan. Hence we derive the implicit cost function for technical schools.

Technical school (TECH = 1; WORKER = VOC = 0): $\widehat{\mathit{COST}} = -55{,}000 + 154{,}000 + 343N = 99{,}000 + 343N$

And similarly the extra overhead costs of skilled workers’ and vocational schools, relative to those of general schools, are 143,000 and 53,000 yuan, respectively.

Skilled workers’ school (WORKER = 1; TECH = VOC = 0): $\widehat{\mathit{COST}} = -55{,}000 + 143{,}000 + 343N = 88{,}000 + 343N$

Vocational school (VOC = 1; TECH = WORKER = 0): $\widehat{\mathit{COST}} = -55{,}000 + 53{,}000 + 343N = -2{,}000 + 343N$

Note that in each case the annual marginal cost per student is estimated at 343 yuan. The model specification assumes that this figure does not differ according to type of school.
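The derivation of the four implicit cost functions can be sketched in a few lines of code. The snippet below uses the rounded estimates quoted in the sequence (–55,000, 154,000, 143,000, 53,000, and 343); the function names `overhead` and `fitted_cost` are hypothetical, chosen only for this illustration.

```python
# Rounded estimates quoted in the sequence (yuan).
INTERCEPT = -55_000      # estimated overhead cost of a general school (reference category)
MARGINAL_COST = 343      # coefficient of N, assumed common to all school types
EXTRA_OVERHEAD = {       # dummy coefficients, relative to general schools
    "General": 0,
    "Technical": 154_000,
    "Workers'": 143_000,
    "Vocational": 53_000,
}


def overhead(school_type):
    """Implied intercept of the cost function for one type of school."""
    return INTERCEPT + EXTRA_OVERHEAD[school_type]


def fitted_cost(school_type, n_students):
    """Fitted annual cost for a school of the given type and enrolment."""
    return overhead(school_type) + MARGINAL_COST * n_students


# Print the implied cost function for each type of school.
for school_type in EXTRA_OVERHEAD:
    print(f"{school_type}: COST = {overhead(school_type):,} + {MARGINAL_COST}N")
```

Because only the intercept changes, the fitted costs of two school types with the same enrolment always differ by the difference between their dummy coefficients, whatever the value of N.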

The four cost functions are illustrated graphically.

We can perform t tests on the coefficients in the usual way. The t statistic for N is 8.52, so the marginal cost is (very) significantly different from 0, as we would expect.

The t statistic for the technical school dummy is 5.76, indicating that the annual overhead cost of a technical school is (very) significantly greater than that of a general school, again as expected.

Similarly for skilled workers’ schools, the t statistic being

In the case of vocational schools, however, the t statistic is only 1.71, indicating that the overhead cost of such a school is not significantly greater than that of a general school.

This is not surprising, given that the vocational schools are not much different from the general schools.

Note that the null hypotheses for the tests on the coefficients of the dummy variables are that the overhead costs of the other schools are not different from those of the general schools.
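The t tests above can be checked mechanically. This is a rough sketch under stated assumptions: it uses the three t statistics quoted in the sequence (the value for WORKER is not given there), two-sided tests, and tabulated critical values for 60 degrees of freedom standing in for the 69 degrees of freedom of this regression. That substitution is slightly conservative, since critical values fall as the degrees of freedom rise. The name `strictest_rejection` is illustrative.

```python
# t statistics quoted in the sequence (the value for WORKER is not given there).
T_STATISTICS = {"N": 8.52, "TECH": 5.76, "VOC": 1.71}

# Two-sided critical values of t with 60 degrees of freedom, from standard tables.
# Assumption: 60 df stands in for the 69 df of this regression (conservative).
CRITICAL_T_60DF = {"5%": 2.000, "0.1%": 3.460}


def strictest_rejection(t_stat):
    """Return the strictest listed level at which H0 (coefficient = 0) is rejected."""
    rejected = [level for level, critical in CRITICAL_T_60DF.items()
                if abs(t_stat) > critical]
    return rejected[-1] if rejected else None


# Report the test decision for each coefficient.
for name, t_stat in T_STATISTICS.items():
    level = strictest_rejection(t_stat)
    verdict = f"rejected at the {level} level" if level else "not rejected at the 5% level"
    print(f"{name}: t = {t_stat}, H0 {verdict}")
```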

Finally we will perform an F test of the joint explanatory power of the dummy variables as a group. The null hypothesis is $H_0: \delta_T = \delta_W = \delta_V = 0$. The alternative hypothesis is that at least one $\delta$ is different from 0.

The residual sum of squares in the specification including the dummy variables is $5.41 \times 10^{11}$.

. reg COST N

[Stata regression output: COST on N and _cons; F(1, 72); Root MSE = 1.1e+05.]

The residual sum of squares in the specification excluding the dummy variables is $8.92 \times 10^{11}$.

The reduction in RSS when we include the dummies is therefore $(8.92 - 5.41) \times 10^{11}$. We will check whether this reduction is significant with the usual F test.

The numerator in the F ratio is the reduction in RSS divided by the cost, which is the 3 degrees of freedom given up when we estimate three additional coefficients (the coefficients of the dummies).

The denominator is RSS for the specification including the dummy variables, divided by the number of degrees of freedom remaining after they have been added.

The F ratio is therefore

$$F(3, 69) = \frac{(8.92 - 5.41)/3}{5.41/69} \approx 14.9$$

where the common power of ten in the two residual sums of squares cancels.

F tables do not give the critical value for 3 and 69 degrees of freedom, but it must be lower than the critical value with 3 and 60 degrees of freedom. This is 6.17, at the 0.1% significance level.
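The F test can be reproduced from the rounded figures in the text. This is a sketch, not the original calculation: it assumes both residual sums of squares share the power of ten $10^{11}$ (consistent with the Root MSE of 1.1e+05 on 72 degrees of freedom reported for the regression on N alone), and in any case that factor cancels in the ratio. The function name `f_ratio` and the constant names are illustrative.

```python
# Rounded residual sums of squares from the text.
# Assumption: both are in units of 10**11; the factor cancels in the F ratio.
RSS_RESTRICTED = 8.92e11     # COST regressed on N only
RSS_UNRESTRICTED = 5.41e11   # COST regressed on N, TECH, WORKER, VOC

EXTRA_COEFFICIENTS = 3       # degrees of freedom given up by adding the dummies
DF_REMAINING = 69            # degrees of freedom left in the fuller regression
CRITICAL_F_3_60 = 6.17       # 0.1% critical value for F(3, 60), as quoted in the text


def f_ratio(rss_restricted, rss_unrestricted, extra_coefficients, df_remaining):
    """F statistic for the joint explanatory power of a group of added variables."""
    numerator = (rss_restricted - rss_unrestricted) / extra_coefficients
    denominator = rss_unrestricted / df_remaining
    return numerator / denominator


f_stat = f_ratio(RSS_RESTRICTED, RSS_UNRESTRICTED, EXTRA_COEFFICIENTS, DF_REMAINING)
reject_h0 = f_stat > CRITICAL_F_3_60

print(f"F(3, 69) = {f_stat:.2f}; reject H0 at the 0.1% level: {reject_h0}")
```

Comparing against the F(3, 60) critical value is the same conservative shortcut used in the text: the true critical value for F(3, 69) is lower, so a rejection here also holds for the exact test.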

Thus we reject $H_0$ at the 0.1% significance level. This is not surprising, since the t tests show that TECH and WORKER have highly significant coefficients.

Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Section 5.2 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre. Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics or the University of London International Programmes distance learning course 20 Elements of Econometrics.