1 Chapter 5: Dummy Variables

2 DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES
We will now examine how to include qualitative explanatory variables in a regression model. Suppose that you have data on the annual recurrent expenditure, COST, and the number of students enrolled, N, for a sample of secondary schools of two types: regular and occupational. The occupational schools aim to provide skills for specific occupations, and they tend to be relatively expensive to run because they need to maintain specialized workshops.

3 © Christopher Dougherty 1999–2006
Suppose we want to estimate the cost of running an occupational school and a regular school. One way of dealing with the difference in costs would be to run separate regressions for the two types of school. However, this has the drawback that you would be running regressions on two small samples instead of one large one, with an adverse effect on the precision of the coefficient estimates.

4 Another way of handling the difference would be to hypothesize that the cost function for occupational schools has an intercept $\beta_1'$ that is greater than that for regular schools. Effectively, we are hypothesizing that the annual overhead cost is different for the two types of school, but the marginal cost is the same. The marginal cost assumption is not very plausible and we will relax it in due course.
Regular school (OCC = 0): $COST = \beta_1 + \beta_2 N + u$
Occupational school (OCC = 1): $COST = \beta_1' + \beta_2 N + u$

5 Let us define $\delta$ to be the difference in the intercepts: $\delta = \beta_1' - \beta_1$. Then $\beta_1' = \beta_1 + \delta$ and we can rewrite the cost function for occupational schools as shown.
Regular school (OCC = 0): $COST = \beta_1 + \beta_2 N + u$
Occupational school (OCC = 1): $COST = \beta_1 + \delta + \beta_2 N + u$

6 We can now combine the two cost functions by defining a dummy variable OCC that has value 0 for regular schools and 1 for occupational schools. (Dummy variables always have two values, 0 or 1.)
Combined equation: $COST = \beta_1 + \delta\,OCC + \beta_2 N + u$
Regular school (OCC = 0): $COST = \beta_1 + \beta_2 N + u$
Occupational school (OCC = 1): $COST = \beta_1 + \delta + \beta_2 N + u$

7 We will now fit a function of this type using actual data for a sample of 74 secondary schools in Shanghai.

8 The table shows the data for the first 10 schools in the sample. The annual cost is measured in yuan, one yuan being worth about 20 U.S. cents at the time. N is the number of students in the school. OCC is the dummy variable for the type of school.
School  Type          COST (yuan)  N    OCC
1       Occupational  345,000      623  1
2       Occupational  537,000      653  1
3       Regular       170,000      400  0
4       Occupational  526,000      663  1
5       Regular       100,000      563  0
6       Regular        28,000      236  0
7       Regular       160,000      307  0
8       Occupational   45,000      173  1
9       Occupational  120,000      146  1
10      Occupational   61,000       99  1
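The coding of OCC in the table can be sketched in a few lines of Python. This is only an illustration: the list `schools` holds the ten rows shown above, and the helper name `occ_dummy` is hypothetical, not something from the original slides.

```python
# First ten schools from the table: (type, COST in yuan, N).
schools = [
    ("Occupational", 345_000, 623),
    ("Occupational", 537_000, 653),
    ("Regular", 170_000, 400),
    ("Occupational", 526_000, 663),
    ("Regular", 100_000, 563),
    ("Regular", 28_000, 236),
    ("Regular", 160_000, 307),
    ("Occupational", 45_000, 173),
    ("Occupational", 120_000, 146),
    ("Occupational", 61_000, 99),
]


def occ_dummy(school_type: str) -> int:
    """Return 1 for an occupational school and 0 for a regular school."""
    return 1 if school_type == "Occupational" else 0


# One dummy value per observation, in the same order as the table.
occ = [occ_dummy(school_type) for school_type, _cost, _n in schools]
print(occ)  # [1, 1, 0, 1, 0, 0, 0, 1, 1, 1]
```

The resulting column is then used in the regression like any other explanatory variable.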

9 We now run the regression of COST on N and OCC, treating OCC just like any other explanatory variable, despite its artificial nature. The Stata output is shown below. We will begin by interpreting the regression coefficients.
. reg COST N OCC
  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  2,    71) =   56.86
   Model |  9.0582e+11     2  4.5291e+11               Prob > F      =  0.0000
Residual |  5.6553e+11    71  7.9652e+09               R-squared     =  0.6156
---------+------------------------------               Adj R-squared =  0.6048
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      =   89248
------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   331.4493   39.75844      8.337   0.000       252.1732    410.7254
     OCC |   133259.1   20827.59      6.398   0.000       91730.06    174788.1
   _cons |  -33612.55   23573.47     -1.426   0.158      -80616.71    13391.61
------------------------------------------------------------------------------

10 The regression results have been rewritten in equation form. From this equation we can derive cost functions for the two types of school by setting OCC equal to 0 or 1.
$\widehat{COST} = -34{,}000 + 133{,}000\,OCC + 331N$

11 If OCC is equal to 0, we get the equation for regular schools, as shown. It implies that the marginal cost per student per year is 331 yuan and that the annual overhead cost is –34,000 yuan. A negative overhead cost obviously makes no sense, and it suggests that the model is misspecified in some way. We will come back to this later. It is worth noting that the t statistic for the intercept indicates that it is not significantly different from zero.
$\widehat{COST} = -34{,}000 + 133{,}000\,OCC + 331N$
Regular school (OCC = 0): $\widehat{COST} = -34{,}000 + 331N$

12 The coefficient of the dummy variable is an estimate of $\delta$, the extra annual overhead cost of an occupational school. Putting OCC equal to 1, we estimate the annual overhead cost of an occupational school to be 99,000 yuan. The marginal cost is the same as for regular schools. It must be, given the model specification.
$\widehat{COST} = -34{,}000 + 133{,}000\,OCC + 331N$
Regular school (OCC = 0): $\widehat{COST} = -34{,}000 + 331N$
Occupational school (OCC = 1): $\widehat{COST} = -34{,}000 + 133{,}000 + 331N = 99{,}000 + 331N$
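As a rough check on the two derived cost functions, the sketch below plugs the unrounded Stata coefficients into the fitted equation. The function name `fitted_cost` and the 500-student school are illustrative assumptions, not part of the original material.

```python
# Unrounded coefficient estimates copied from the Stata output for
# "reg COST N OCC".
B1_HAT = -33612.55     # _cons: estimated overhead cost, regular schools
DELTA_HAT = 133259.1   # OCC: extra overhead cost, occupational schools
B2_HAT = 331.4493      # N: marginal cost per student


def fitted_cost(n_students: float, occ: int) -> float:
    """Fitted annual cost in yuan for a school with n_students students."""
    return B1_HAT + DELTA_HAT * occ + B2_HAT * n_students


# Setting N = 0 isolates the intercept of each cost function.
regular_intercept = fitted_cost(0, occ=0)
occupational_intercept = fitted_cost(0, occ=1)

# At any enrolment (500 is a hypothetical example), the gap between
# the two school types is the dummy coefficient.
gap_at_500 = fitted_cost(500, occ=1) - fitted_cost(500, occ=0)

print(round(regular_intercept, 2), round(occupational_intercept, 2))
print(round(gap_at_500, 1))
```

With unrounded coefficients the occupational intercept comes out at about 99,600 yuan; the 99,000 on the slide results from rounding the two terms to the nearest thousand before adding them.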

13 The scatter diagram shows the data and the two cost functions derived from the regression results.

14 In addition to the estimates of the coefficients, the regression results include standard errors and the usual diagnostic statistics. We will perform a t test on the coefficient of the dummy variable. Our null hypothesis is $H_0: \delta = 0$ and our alternative hypothesis is $H_1: \delta \neq 0$. In words, our null hypothesis is that there is no difference in the overhead costs of the two types of school. The t statistic is 6.40, so the null hypothesis is rejected at the 0.1% significance level.
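The t statistics in the Stata output are simply each coefficient divided by its standard error (for a null hypothesis of zero). A minimal sketch using the figures reported in the output; the dictionary layout is only an illustrative choice:

```python
# (coefficient, standard error) pairs copied from the Stata output for
# "reg COST N OCC".
estimates = {
    "N": (331.4493, 39.75844),
    "OCC": (133259.1, 20827.59),
    "_cons": (-33612.55, 23573.47),
}

# t statistic for H0: coefficient = 0.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name:>6}: t = {t:.3f}")
```

The values agree with the t column of the output to three decimal places.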

15 We can perform t tests on the other coefficients in the usual way. The t statistic for the coefficient of N is 8.34, so we conclude that the marginal cost is (very) significantly different from 0.

16 In the case of the intercept, the t statistic is –1.43, so we do not reject the null hypothesis $H_0: \beta_1 = 0$. Thus one explanation of the nonsensical negative overhead cost of regular schools might be that they do not actually have any overheads and the negative estimate is purely the result of random variation.

17 A more realistic version of this hypothesis is that $\beta_1$ is positive but small (as you can see, the 95 percent confidence interval includes positive values) and the error term is responsible for the negative estimate. As already noted, a further possibility is that the model is misspecified in some way. We will continue to develop the model in the next sequence.
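The confidence intervals in the output have the usual form, estimate plus or minus a critical t value times the standard error. The sketch below backs out the critical value implied by the reported intervals instead of looking it up, and checks that the interval for the intercept contains zero and some positive values. All numbers are copied from the Stata output; the variable names are illustrative.

```python
# (coefficient, standard error, lower bound, upper bound) from the
# Stata output for "reg COST N OCC".
rows = {
    "N": (331.4493, 39.75844, 252.1732, 410.7254),
    "OCC": (133259.1, 20827.59, 91730.06, 174788.1),
    "_cons": (-33612.55, 23573.47, -80616.71, 13391.61),
}

# Half-width of each interval divided by the standard error gives the
# critical t value that Stata used (71 degrees of freedom here).
implied_t_crit = {
    name: (upper - coef) / se for name, (coef, se, _lower, upper) in rows.items()
}

_coef, _se, cons_lower, cons_upper = rows["_cons"]
zero_inside = cons_lower < 0 < cons_upper

for name, t_crit in implied_t_crit.items():
    print(f"{name:>6}: implied critical t = {t_crit:.3f}")
print("interval for _cons contains zero:", zero_inside)
```

All three rows imply the same critical value, roughly 1.994, which is what one would expect for a two-sided 95 percent interval with 71 degrees of freedom.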

18 DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES
We will now study how to extend the dummy variable technique to handle a qualitative explanatory variable that has more than two categories. Previously, we used a dummy variable to differentiate between regular and occupational schools when fitting a cost function. In fact, there are two types of regular secondary school in Shanghai: general schools, which provide the usual academic education, and vocational schools.

19 As their name implies, the vocational schools are meant to impart occupational skills as well as give an academic education. However, the vocational component of the curriculum is typically quite small and the schools are similar to the general schools. Often they are just general schools with a couple of workshops added. Likewise, there are two types of occupational school: technical schools, which train technicians, and skilled workers’ schools, which train craftsmen.

20 So now the qualitative variable has four categories. The standard procedure is to choose one category as the reference category and to define dummy variables for each of the others. In general it is good practice to select the most normal or basic category as the reference category, if one category is in some sense more normal or basic than the others. In the Shanghai sample it is sensible to choose the general schools as the reference category. They are the most numerous and the other schools are variations of them.

21 Accordingly we will define dummy variables for the other three types. TECH will be the dummy for the technical schools: TECH is equal to 1 if the observation relates to a technical school, 0 otherwise. Similarly we will define dummy variables WORKER and VOC for the skilled workers’ schools and the vocational schools. Each dummy variable will have a coefficient that represents the extra overhead cost of that type of school, relative to the reference category. Note that you do not include a dummy variable for the reference category, which is why the reference category is usually described as the omitted category.

22 If an observation relates to a general school, the dummy variables are all 0 and the regression model is reduced to its basic components.
$COST = \beta_1 + \delta_T\,TECH + \delta_W\,WORKER + \delta_V\,VOC + \beta_2 N + u$
General school (TECH = WORKER = VOC = 0): $COST = \beta_1 + \beta_2 N + u$

23 If an observation relates to a technical school, TECH will be equal to 1 and the other dummy variables will be 0. The regression model simplifies as shown.
$COST = \beta_1 + \delta_T\,TECH + \delta_W\,WORKER + \delta_V\,VOC + \beta_2 N + u$
General school (TECH = WORKER = VOC = 0): $COST = \beta_1 + \beta_2 N + u$
Technical school (TECH = 1; WORKER = VOC = 0): $COST = (\beta_1 + \delta_T) + \beta_2 N + u$

24 The regression model simplifies in a similar manner in the case of observations relating to skilled workers’ schools and vocational schools.
$COST = \beta_1 + \delta_T\,TECH + \delta_W\,WORKER + \delta_V\,VOC + \beta_2 N + u$
General school (TECH = WORKER = VOC = 0): $COST = \beta_1 + \beta_2 N + u$
Technical school (TECH = 1; WORKER = VOC = 0): $COST = (\beta_1 + \delta_T) + \beta_2 N + u$
Skilled workers’ school (WORKER = 1; TECH = VOC = 0): $COST = (\beta_1 + \delta_W) + \beta_2 N + u$
Vocational school (VOC = 1; TECH = WORKER = 0): $COST = (\beta_1 + \delta_V) + \beta_2 N + u$

25 The diagram illustrates the model graphically. The $\delta$ coefficients are the extra overhead costs of running technical, skilled workers’, and vocational schools, relative to the overhead cost of general schools. Note that we do not make any prior assumption about the size, or even the sign, of the $\delta$ coefficients. They will be estimated from the sample data.
[Diagram: COST plotted against N, with four parallel lines for technical, skilled workers’, vocational, and general schools, having intercepts $\beta_1 + \delta_T$, $\beta_1 + \delta_W$, $\beta_1 + \delta_V$, and $\beta_1$, respectively.]

26 Here are the data for the first 10 of the 74 schools. Note how the values of the dummy variables TECH, WORKER, and VOC are determined by the type of school in each observation.
School  Type        COST (yuan)  N    TECH  WORKER  VOC
1       Technical   345,000      623  1     0       0
2       Technical   537,000      653  1     0       0
3       General     170,000      400  0     0       0
4       Workers’    526,000      663  0     1       0
5       General     100,000      563  0     0       0
6       Vocational   28,000      236  0     0       1
7       Vocational  160,000      307  0     0       1
8       Technical    45,000      173  1     0       0
9       Technical   120,000      146  1     0       0
10      Workers’     61,000       99  0     1       0
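The coding rule behind this table, one dummy per category except the reference category, can be sketched as follows. The mapping `DUMMY_FOR_TYPE` and the function `encode` are hypothetical names chosen for the illustration, and "Workers" stands in for the skilled workers' school label.

```python
# School types of the first ten observations, in table order.
types = [
    "Technical", "Technical", "General", "Workers", "General",
    "Vocational", "Vocational", "Technical", "Technical", "Workers",
]

# Each non-reference category gets its own dummy variable. "General"
# is the reference (omitted) category, so it has no entry.
DUMMY_FOR_TYPE = {"Technical": "TECH", "Workers": "WORKER", "Vocational": "VOC"}
DUMMY_NAMES = ("TECH", "WORKER", "VOC")


def encode(school_type: str) -> dict:
    """Return the TECH, WORKER and VOC values for one observation."""
    active = DUMMY_FOR_TYPE.get(school_type)  # None for the reference category
    return {name: int(name == active) for name in DUMMY_NAMES}


rows = [encode(t) for t in types]
for school_type, row in zip(types, rows):
    print(f"{school_type:<10} {row['TECH']} {row['WORKER']} {row['VOC']}")
```

A general school gets zeros in all three columns, which is what makes it the baseline against which the other intercepts are measured.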

27 The scatter diagram shows the data for the entire sample, differentiating by type of school.

28 Here is the Stata output for this regression. The coefficient of N indicates that the marginal cost per student per year is 343 yuan.
. reg COST N TECH WORKER VOC
  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11               Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09               R-squared     =  0.6320
---------+------------------------------               Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      =   88578
------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   154110.9   26760.41      5.759   0.000       100725.3    207496.4
  WORKER |   143362.4    27852.8      5.147   0.000       87797.57    198927.2
     VOC |   53228.64   31061.65      1.714   0.091      -8737.646    115194.9
   _cons |  -54893.09   26673.08     -2.058   0.043      -108104.4   -1681.748
------------------------------------------------------------------------------

29 The coefficients of TECH, WORKER, and VOC are 154,000, 143,000, and 53,000, respectively, and should be interpreted as the additional annual overhead costs, relative to those of general schools.

30 The constant term is –55,000, indicating that the annual overhead cost of a general academic school is –55,000 yuan. Obviously this is nonsense and indicates that something is wrong with the model specification.

31 The top line shows the regression result in equation form. We will derive the implicit cost functions for each type of school.
$\widehat{COST} = -55{,}000 + 154{,}000\,TECH + 143{,}000\,WORKER + 53{,}000\,VOC + 343N$
General school (TECH = WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 343N$

32 In the case of a general school, the dummy variables are all 0 and the equation reduces to the intercept and the term involving N. The annual marginal cost per student is estimated at 343 yuan. The annual overhead cost per school is estimated at –55,000 yuan. Obviously a negative amount is inconceivable.
$\widehat{COST} = -55{,}000 + 154{,}000\,TECH + 143{,}000\,WORKER + 53{,}000\,VOC + 343N$
General school (TECH = WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 343N$

33 The extra annual overhead cost for a technical school, relative to a general school, is 154,000 yuan. Hence we derive the implicit cost function for technical schools.
$\widehat{COST} = -55{,}000 + 154{,}000\,TECH + 143{,}000\,WORKER + 53{,}000\,VOC + 343N$
General school (TECH = WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 343N$
Technical school (TECH = 1; WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 154{,}000 + 343N = 99{,}000 + 343N$

34 Similarly, the extra overhead costs of skilled workers’ and vocational schools, relative to those of general schools, are 143,000 and 53,000 yuan, respectively.
$\widehat{COST} = -55{,}000 + 154{,}000\,TECH + 143{,}000\,WORKER + 53{,}000\,VOC + 343N$
General school (TECH = WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 343N$
Technical school (TECH = 1; WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 154{,}000 + 343N = 99{,}000 + 343N$
Skilled workers’ school (WORKER = 1; TECH = VOC = 0): $\widehat{COST} = -55{,}000 + 143{,}000 + 343N = 88{,}000 + 343N$
Vocational school (VOC = 1; TECH = WORKER = 0): $\widehat{COST} = -55{,}000 + 53{,}000 + 343N = -2{,}000 + 343N$

35 Note that in each case the annual marginal cost per student is estimated at 343 yuan. The model specification assumes that this figure does not differ according to type of school.
$\widehat{COST} = -55{,}000 + 154{,}000\,TECH + 143{,}000\,WORKER + 53{,}000\,VOC + 343N$
General school (TECH = WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 343N$
Technical school (TECH = 1; WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 154{,}000 + 343N = 99{,}000 + 343N$
Skilled workers’ school (WORKER = 1; TECH = VOC = 0): $\widehat{COST} = -55{,}000 + 143{,}000 + 343N = 88{,}000 + 343N$
Vocational school (VOC = 1; TECH = WORKER = 0): $\widehat{COST} = -55{,}000 + 53{,}000 + 343N = -2{,}000 + 343N$
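The four implicit cost functions can be reproduced from the unrounded Stata coefficients. The sketch below is illustrative only; the dictionary `coef` and the helper `intercept_for` are assumed names, not part of the original slides.

```python
# Coefficient estimates copied from the Stata output for
# "reg COST N TECH WORKER VOC".
coef = {
    "_cons": -54893.09,
    "TECH": 154110.9,
    "WORKER": 143362.4,
    "VOC": 53228.64,
    "N": 342.6335,
}


def intercept_for(dummy=None):
    """Estimated overhead cost: the constant plus the relevant dummy
    coefficient, or the constant alone for the reference category."""
    return coef["_cons"] + (coef[dummy] if dummy is not None else 0.0)


intercepts = {
    "General": intercept_for(),
    "Technical": intercept_for("TECH"),
    "Skilled workers": intercept_for("WORKER"),
    "Vocational": intercept_for("VOC"),
}

# Every school type shares the same slope, the coefficient of N.
for school_type, a in intercepts.items():
    print(f"{school_type:<16} COST = {a:>10.2f} + {coef['N']:.4f} N")
```

To the nearest thousand these intercepts are –55,000, 99,000, 88,000, and –2,000 yuan, matching the rounded equations on the slide.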

36 The four cost functions are illustrated graphically.

37 We can perform t tests on the coefficients in the usual way. The t statistic for N is 8.52, so the marginal cost is (very) significantly different from 0, as we would expect.

38 The t statistic for the technical school dummy is 5.76, indicating that the annual overhead cost of a technical school is (very) significantly greater than that of a general school, again as expected.

39 Similarly for skilled workers’ schools, the t statistic is 5.15, indicating that the annual overhead cost of a skilled workers’ school is (very) significantly greater than that of a general school, again as expected.

40 In the case of vocational schools, however, the t statistic is only 1.71, indicating that the overhead cost of such a school is not significantly greater than that of a general school at the 5% significance level. This is not surprising, given that the vocational schools are not much different from the general schools.

41 Note that the null hypotheses for the tests on the coefficients of the dummy variables are that the overhead costs of the other schools are not different from those of the general schools. Finally we will perform an F test of the joint explanatory power of the dummy variables as a group. The null hypothesis is $H_0: \delta_T = \delta_W = \delta_V = 0$. The alternative hypothesis is that at least one $\delta$ is different from 0.

42 The residual sum of squares in the specification including the dummy variables is $5.41 \times 10^{11}$.

43 The residual sum of squares in the specification excluding the dummy variables is $8.92 \times 10^{11}$. The reduction in RSS when we include the dummies is therefore $(8.92 - 5.41) \times 10^{11}$. We will check whether this reduction is significant with the usual F test.
. reg COST N
  Source |       SS       df       MS                  Number of obs =      74
---------+------------------------------               F(  1,    72) =   46.82
   Model |  5.7974e+11     1  5.7974e+11               Prob > F      =  0.0000
Residual |  8.9160e+11    72  1.2383e+10               R-squared     =  0.3940
---------+------------------------------               Adj R-squared =  0.3856
   Total |  1.4713e+12    73  2.0155e+10               Root MSE      = 1.1e+05
------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   339.0432   49.55144      6.842   0.000       240.2642    437.8222
   _cons |    23953.3   27167.96      0.882   0.381      -30205.04    78111.65
------------------------------------------------------------------------------

44 The numerator in the F ratio is the reduction in RSS divided by the cost, which is the 3 degrees of freedom given up when we estimate three additional coefficients (the coefficients of the dummies).

45 The denominator is RSS for the specification including the dummy variables, divided by the number of degrees of freedom remaining after they have been added (69). The F ratio is therefore 14.92.

46 F tables do not give the critical value for 3 and 69 degrees of freedom, but it must be lower than the critical value with 3 and 60 degrees of freedom. This is 6.17, at the 0.1% significance level. Thus we reject $H_0$ at a high significance level. This is not exactly surprising since t tests show that TECH and WORKER have highly significant coefficients.

47 © Christopher Dougherty 1999–2006 THE EFFECTS OF CHANGING THE REFERENCE CATEGORY So far, we chose general academic schools as the reference (omitted) category and defined dummy variables for the other categories.

48 This enabled us to compare the overhead costs of the other schools with those of general schools and to test whether the differences were significant. However, suppose that we were interested in testing whether the overhead costs of skilled workers’ schools were different from those of the other types of school. How could we do this? The simplest solution is to re-run the regression making skilled workers’ schools the reference category. Now we need to define a dummy variable GEN for the general schools instead.

49 The model is shown in equation form. Note that there is no longer a dummy variable for skilled workers’ schools since they form the reference category.

$COST = \beta_1 + \delta_T\,TECH + \delta_V\,VOC + \delta_G\,GEN + \beta_2 N + u$

50 In the case of observations relating to skilled workers’ schools, all the dummy variables are 0 and the model simplifies to the intercept and the term involving N.

Skilled workers’ school (TECH = VOC = GEN = 0): $COST = \beta_1 + \beta_2 N + u$

51 In the case of observations relating to technical schools, TECH is equal to 1 and the intercept increases by an amount $\delta_T$. Note that $\delta_T$ should now be interpreted as the extra overhead cost of a technical school relative to that of a skilled workers’ school.

Technical school (TECH = 1; VOC = GEN = 0): $COST = (\beta_1 + \delta_T) + \beta_2 N + u$

52 Similarly one can derive the implicit cost functions for vocational and general schools, their $\delta$ coefficients also being interpreted as their extra overhead costs relative to those of skilled workers’ schools.

Vocational school (VOC = 1; TECH = GEN = 0): $COST = (\beta_1 + \delta_V) + \beta_2 N + u$
General school (GEN = 1; TECH = VOC = 0): $COST = (\beta_1 + \delta_G) + \beta_2 N + u$

53 This diagram illustrates the model graphically. Note that the $\delta$ shifts are measured from the line for skilled workers’ schools.

[Figure: COST plotted against N, with parallel lines for technical, skilled workers’, vocational and general schools. The intercepts are $\beta_1 + \delta_T$, $\beta_1$, $\beta_1 + \delta_V$ and $\beta_1 + \delta_G$ respectively.]

54 Here are the data for the first 10 of the 74 schools with skilled workers’ schools as the reference category.

School  Type          COST     N    TECH  VOC  GEN
 1      Technical     345,000  623   1     0    0
 2      Technical     537,000  653   1     0    0
 3      General       170,000  400   0     0    1
 4      Workers’      526,000  663   0     0    0
 5      General       100,000  563   0     0    1
 6      Vocational     28,000  236   0     1    0
 7      Vocational    160,000  307   0     1    0
 8      Technical      45,000  173   1     0    0
 9      Technical     120,000  146   1     0    0
10      Workers’       61,000   99   0     0    0
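The dummy columns in this table follow mechanically from the school type once a reference category has been chosen. The following Python sketch shows one way to build them. The function name `make_dummies` and the short type labels are hypothetical, introduced only for this illustration, and the list holds the types of the first 10 schools.

```python
def make_dummies(school_types, reference):
    """Return one 0/1 column per category, omitting the reference category."""
    categories = sorted(set(school_types))
    if reference not in categories:
        raise ValueError(f"unknown reference category: {reference}")
    return {
        category: [1 if t == category else 0 for t in school_types]
        for category in categories
        if category != reference
    }

# Types of the first 10 schools, in the order shown in the table.
types = ["Technical", "Technical", "General", "Workers", "General",
         "Vocational", "Vocational", "Technical", "Technical", "Workers"]

# Skilled workers' schools as the reference category: no WORKER column.
dummies = make_dummies(types, reference="Workers")
for name, column in dummies.items():
    print(name, column)
```

Calling `make_dummies(types, reference="General")` instead would produce the columns for the earlier specification, with a column for skilled workers’ schools and none for general schools.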

55 Here is the Stata output for the regression. We will focus first on the regression coefficients.

. reg COST N TECH VOC GEN

  Source |       SS       df       MS              Number of obs =      74
---------+------------------------------           F(  4,    69) =   29.63
   Model |  9.2996e+11     4  2.3249e+11           Prob > F      =  0.0000
Residual |  5.4138e+11    69  7.8461e+09           R-squared     =  0.6320
---------+------------------------------           Adj R-squared =  0.6107
   Total |  1.4713e+12    73  2.0155e+10           Root MSE      =   88578

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87      0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22     -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8     -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56      3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------

56 The regression result is shown written as an equation.

$\widehat{COST} = 88{,}000 + 11{,}000\,TECH - 90{,}000\,VOC - 143{,}000\,GEN + 343N$

57 Putting all the dummy variables equal to 0, we obtain the equation for the reference category, the skilled workers’ schools.

Skilled workers’ school (TECH = VOC = GEN = 0): $\widehat{COST} = 88{,}000 + 343N$

58 Putting TECH equal to 1 and VOC and GEN equal to 0, we obtain the equation for the technical schools.

Technical school (TECH = 1; VOC = GEN = 0): $\widehat{COST} = 88{,}000 + 11{,}000 + 343N = 99{,}000 + 343N$

59 And similarly we obtain the equations for the vocational and general schools, putting VOC and GEN equal to 1 in turn. Note that the cost functions turn out to be exactly the same as when we used general schools as the reference category.

Vocational school (VOC = 1; TECH = GEN = 0): $\widehat{COST} = 88{,}000 - 90{,}000 + 343N = -2{,}000 + 343N$
General school (GEN = 1; TECH = VOC = 0): $\widehat{COST} = 88{,}000 - 143{,}000 + 343N = -55{,}000 + 343N$
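The claim that both specifications imply the same cost functions can be checked directly from the reported coefficients. The sketch below is an illustrative Python calculation, not part of the original Stata session. The dictionary and function names are assumptions, and the numbers are copied from the Stata outputs for the two regressions (general schools as reference, then skilled workers’ schools as reference).

```python
# Coefficients with general schools as the reference category.
general_ref = {"_cons": -54893.09, "TECH": 154110.9,
               "WORKER": 143362.4, "VOC": 53228.64}
# Coefficients with skilled workers' schools as the reference category.
workers_ref = {"_cons": 88469.29, "TECH": 10748.51,
               "VOC": -90133.74, "GEN": -143362.4}

def implied_intercepts(coefs, reference_name):
    """Intercept of each school type: constant plus that type's dummy shift."""
    base = coefs["_cons"]
    intercepts = {reference_name: base}  # the reference category has no shift
    for name, shift in coefs.items():
        if name != "_cons":
            intercepts[name] = base + shift
    return intercepts

from_general = implied_intercepts(general_ref, "GEN")
from_workers = implied_intercepts(workers_ref, "WORKER")

for school in ("GEN", "TECH", "WORKER", "VOC"):
    print(f"{school:7s} {from_general[school]:12.2f} {from_workers[school]:12.2f}")
```

The two columns agree up to the rounding in the printed Stata coefficients. The slope on N is 342.6335 in both regressions, so the fitted lines coincide.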

60 Consequently the scatter diagram with regression lines is exactly the same as before.

61 The goodness of fit, whether measured by $R^2$, RSS, or the standard error of the regression (the estimate of the standard deviation of u, here denoted Root MSE), is likewise not affected by the change.

62 But the t tests are affected. In particular, the null hypothesis that a dummy variable coefficient is equal to 0 now has a different meaning, because each coefficient measures a difference from a different reference category.

63 For example, the t statistic for the technical school coefficient is for the null hypothesis that the overhead costs of technical schools are the same as those of skilled workers’ schools. The t ratio in question is only 0.35, so the null hypothesis is not rejected.

64 The t ratio for the coefficient of VOC is –2.65, so one concludes that the overheads of vocational schools are significantly lower than those of skilled workers’ schools, at the 1% significance level.

65 General schools clearly have lower overhead costs than the skilled workers’ schools, according to the regression: the t ratio for the coefficient of GEN is –5.15.

66 Note that there are some differences in the standard errors as well. However, the standard error (and t statistic) of the coefficient of N is unaffected.

. reg COST N TECH WORKER VOC

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   154110.9   26760.41      5.759   0.000       100725.3    207496.4
  WORKER |   143362.4    27852.8      5.147   0.000       87797.57    198927.2
     VOC |   53228.64   31061.65      1.714   0.091      -8737.646    115194.9
   _cons |  -54893.09   26673.08     -2.058   0.043      -108104.4   -1681.748
------------------------------------------------------------------------------

. reg COST N TECH VOC GEN

------------------------------------------------------------------------------
    COST |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       N |   342.6335    40.2195      8.519   0.000       262.3978    422.8692
    TECH |   10748.51   30524.87      0.352   0.726      -50146.93    71643.95
     VOC |  -90133.74   33984.22     -2.652   0.010      -157930.4   -22337.07
     GEN |  -143362.4    27852.8     -5.147   0.000      -198927.2   -87797.57
   _cons |   88469.29   28849.56      3.067   0.003       30916.01    146022.6
------------------------------------------------------------------------------

67 The one test involving the dummy variables that can be performed with either specification is the test of whether the overhead costs of general schools and skilled workers’ schools are different. The choice of specification can make no difference to the outcome of this test. The only difference is caused by the fact that the regression coefficient has become negative in the second specification. The standard error is the same, so the t statistic has the same absolute magnitude and the outcome of the test must be the same.
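The symmetry of this test is easy to confirm numerically, since a t ratio is just the coefficient divided by its standard error. The short Python sketch below is an illustration with hypothetical variable names. The coefficient and standard error pairs are those reported by Stata for WORKER in the first specification and GEN in the second.

```python
# WORKER coefficient, with general schools as the reference category.
worker_vs_general = {"coef": 143362.4, "se": 27852.8}
# GEN coefficient, with skilled workers' schools as the reference category.
general_vs_worker = {"coef": -143362.4, "se": 27852.8}

def t_ratio(estimate):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate["coef"] / estimate["se"]

t_worker = t_ratio(worker_vs_general)
t_gen = t_ratio(general_vs_worker)

print(f"t(WORKER) = {t_worker:.3f}, t(GEN) = {t_gen:.3f}")
# prints: t(WORKER) = 5.147, t(GEN) = -5.147
```

Only the sign changes, which reflects the direction of the comparison and not the strength of the evidence.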

68 However, the standard errors of the coefficients of the other dummy variables are slightly larger in the second specification. This is because the skilled workers’ schools are less ‘normal’ or ‘basic’ than the general schools and there are fewer of them in the sample (only 17, as opposed to 28). As a consequence there is less precision in measuring the difference between their costs and those of the other schools than there was when general schools were the reference category.

