
DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES

[Figure: COST for occupational schools and regular schools.]

This sequence explains how you can include qualitative explanatory variables in your regression model. Suppose that you have data on the annual recurrent expenditure, COST, and the number of students enrolled, N, for a sample of secondary schools, of which there are two types: regular and occupational. The occupational schools aim to provide skills for specific occupations and they tend to be relatively expensive to run because they need to maintain specialized workshops. One way of dealing with the difference in the costs would be to run separate regressions for the two types of school.


However this would have the drawback that you would be running regressions with two small samples instead of one large one, with an adverse effect on the precision of the estimates of the coefficients.

Regular school (OCC = 0): COST = b1 + b2N + u
Occupational school (OCC = 1): COST = b1' + b2N + u

Another way of handling the difference would be to hypothesize that the cost function for occupational schools has an intercept b1' that is greater than that for regular schools.
[Figure: COST against N. Two parallel cost functions, with intercept b1' for occupational schools and b1 for regular schools.]

Effectively, we are hypothesizing that the annual overhead cost is different for the two types of school, but the marginal cost is the same. The marginal cost assumption is not very plausible and we will relax it in due course.

Let us define d to be the difference in the intercepts: d = b1' - b1.

Regular school (OCC = 0): COST = b1 + b2N + u
Occupational school (OCC = 1): COST = b1 + d + b2N + u

Then b1' = b1 + d and we can rewrite the cost function for occupational schools as shown.

Combined equation: COST = b1 + d OCC + b2N + u
(equivalently, COST = b1 + d D + b2N + u)

Regular school (OCC = 0): COST = b1 + b2N + u
Occupational school (OCC = 1): COST = b1 + d + b2N + u

We can now combine the two cost functions by defining a dummy variable OCC (D) that has value 0 for regular schools and 1 for occupational schools.

Dummy variables usually have two values, 0 or 1. If OCC is equal to 0, the cost function becomes that for regular schools. If OCC is equal to 1, the cost function becomes that for occupational schools.
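The way the combined equation collapses into the two separate cost functions can be sketched in a few lines of Python. The function name `expected_cost` and the parameter values are hypothetical, chosen only for illustration; they are not estimates from the Shanghai data.

```python
def expected_cost(n, occ, b1, d, b2):
    """Systematic part of COST = b1 + d*OCC + b2*N (disturbance term u omitted)."""
    return b1 + d * occ + b2 * n

# Hypothetical parameter values, for illustration only.
b1, d, b2 = 10_000.0, 50_000.0, 300.0
n = 500

regular = expected_cost(n, occ=0, b1=b1, d=d, b2=b2)       # b1 + b2*N
occupational = expected_cost(n, occ=1, b1=b1, d=d, b2=b2)  # (b1 + d) + b2*N

# The gap between the two functions is d at every value of N.
print(regular, occupational, occupational - regular)
```

Whatever value of N is chosen, the difference between the two results is d, which is what the parallel cost functions express.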

[Figure: scatter diagram of COST (yuan) against N for the sample, distinguishing occupational schools and regular schools.]

We will now fit a function of this type using actual data for a sample of 74 secondary schools in Shanghai.
School  Type          COST     N    OCC
1       Occupational  345,000  623  1
2       Occupational  537,000  653  1
3       Regular       170,000  400  0
4       Occupational  526,000  663  1
5       Regular       100,000  563  0
6       Regular        28,000  236  0
7       Regular       160,000  307  0
8       Occupational   45,000  173  1
9       Occupational  120,000  146  1
10      Occupational   61,000   99  1

The table shows the data for the first 10 schools in the sample. The annual cost is measured in yuan. N is the number of students in the school.

OCC is the dummy variable for the type of school.
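As a small illustration, the dummy can be generated from the school type rather than typed in by hand. The sketch below uses the ten schools listed in the table; the variable names are illustrative assumptions.

```python
# First ten schools from the sample: (type, COST in yuan, N).
schools = [
    ("Occupational", 345_000, 623),
    ("Occupational", 537_000, 653),
    ("Regular", 170_000, 400),
    ("Occupational", 526_000, 663),
    ("Regular", 100_000, 563),
    ("Regular", 28_000, 236),
    ("Regular", 160_000, 307),
    ("Occupational", 45_000, 173),
    ("Occupational", 120_000, 146),
    ("Occupational", 61_000, 99),
]

# OCC = 1 for an occupational school, 0 for a regular school.
occ = [1 if school_type == "Occupational" else 0 for school_type, _, _ in schools]
print(occ)
```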

Model 1: OLS, using observations 1-74
Dependent variable: COST

          coefficient   std. error   t-ratio   p-value
  ------------------------------------------------------------
  const   -33612.6      23573.5      -1.426    0.1583
  N          331.449       39.7584    8.337    3.97e-012  ***
  OCC     133259        20827.6       6.398    1.46e-08   ***

Mean dependent var   187418.0    S.D. dependent var   141969.9
Sum squared resid    5.66e+11    S.E. of regression   89248.09
R-squared            0.615637    Adjusted R-squared   0.604810
F(2, 71)             56.86072    P-value(F)           1.81e-15
Log-likelihood      -947.0092    Akaike criterion     1900.018
Schwarz criterion    1906.931    Hannan-Quinn         1902.776

We now run the regression of COST on N and OCC, treating OCC just like any other explanatory variable, despite its artificial nature.
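To make the mechanics concrete, here is a minimal sketch of the same regression on the ten schools listed earlier only. Because it uses 10 of the 74 observations, its estimates will differ from the Model 1 output. It relies on a standard property of OLS with a single 0/1 dummy: the common slope is the pooled within-group slope, and each group's fitted line passes through that group's means. The function name `fit_intercept_dummy` is an assumption made for this example, and this is not necessarily how a regression package computes the result.

```python
# (COST in yuan, N, OCC) for the ten schools listed earlier.
data = [
    (345_000, 623, 1), (537_000, 653, 1), (170_000, 400, 0), (526_000, 663, 1),
    (100_000, 563, 0), (28_000, 236, 0), (160_000, 307, 0), (45_000, 173, 1),
    (120_000, 146, 1), (61_000, 99, 1),
]

def mean(values):
    return sum(values) / len(values)

def fit_intercept_dummy(rows):
    """OLS of COST on a constant, N and OCC, via within-group deviations."""
    groups = {0: [], 1: []}
    for cost, n, occ in rows:
        groups[occ].append((cost, n))
    sxy = sxx = 0.0
    means = {}
    for occ, members in groups.items():
        cost_bar = mean([c for c, _ in members])
        n_bar = mean([n for _, n in members])
        means[occ] = (cost_bar, n_bar)
        sxy += sum((n - n_bar) * (c - cost_bar) for c, n in members)
        sxx += sum((n - n_bar) ** 2 for _, n in members)
    b2 = sxy / sxx                                # common marginal cost
    b1 = means[0][0] - b2 * means[0][1]           # intercept for regular schools
    d = (means[1][0] - b2 * means[1][1]) - b1     # extra intercept for occupational schools
    return b1, d, b2

b1_hat, d_hat, b2_hat = fit_intercept_dummy(data)
residuals = [cost - (b1_hat + d_hat * occ + b2_hat * n) for cost, n, occ in data]
print(b1_hat, d_hat, b2_hat)
```

The residuals are orthogonal to the constant, N, and OCC, which is the defining property of the OLS fit.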
Fitted COST = -34,000 + 133,000 OCC + 331N
(Fitted COST = -34,000 + 133,000 D + 331N)

The regression results have been rewritten in equation form. From it we can derive cost functions for the two types of school by setting OCC equal to 0 or 1.
Regular school (OCC = 0): Fitted COST = -34,000 + 331N

If OCC is equal to 0, we get the equation for regular schools, as shown. It implies that the marginal cost per student per year is 331 yuan and that the annual overhead cost is -34,000 yuan.

Obviously having a negative intercept does not make any sense at all and it suggests that the model is misspecified in some way. We will come back to this later.

The coefficient of the dummy variable is an estimate of d, the extra annual overhead cost of an occupational school.
Occupational school (OCC = 1): Fitted COST = -34,000 + 133,000 + 331N = 99,000 + 331N

Putting OCC equal to 1, we estimate the annual overhead cost of an occupational school to be 99,000 yuan. The marginal cost is the same as for regular schools. It must be, given the model specification.
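The two implicit cost functions can be recovered mechanically from the unrounded coefficients in the Model 1 output. This is a sketch only; the function name `fitted_cost` is an assumption made for the example.

```python
# Coefficient estimates copied from the Model 1 output.
const, coef_n, coef_occ = -33612.6, 331.449, 133259.0

def fitted_cost(n, occ):
    """Fitted COST for a school with n students; occ is 0 (regular) or 1 (occupational)."""
    return const + coef_occ * occ + coef_n * n

regular_intercept = fitted_cost(0, occ=0)
occupational_intercept = fitted_cost(0, occ=1)

# The cost of one extra student is the same for both types, by construction.
marginal_regular = fitted_cost(1, occ=0) - fitted_cost(0, occ=0)
marginal_occupational = fitted_cost(1, occ=1) - fitted_cost(0, occ=1)

print(regular_intercept, occupational_intercept)
print(marginal_regular, marginal_occupational)
```

The figure of 99,000 in the text comes from adding the rounded coefficients -34,000 and 133,000; the unrounded sum is about 99,646.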
[Figure: scatter diagram of COST against N with the two fitted cost functions, for occupational schools and regular schools.]

The scatter diagram shows the data and the two cost functions derived from the regression results.

We will perform a t test on the coefficient of the dummy variable. Our null hypothesis is H0: d = 0 and our alternative hypothesis is H1: d ≠ 0.

In words, our null hypothesis is that there is no difference in the overhead costs of the two types of school. The t statistic is 6.40, so it is rejected at the 0.1% significance level.

We can perform t tests on the other coefficients in the usual way. The t statistic for the coefficient of N is 8.34, so we conclude that the marginal cost is (very) significantly different from 0.

In the case of the intercept, the t statistic is -1.43, so we do not reject the null hypothesis H0: b1 = 0.
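Each t statistic quoted here is simply the coefficient divided by its standard error, for the null hypothesis that the true coefficient is 0. The sketch below reproduces the three t-ratios from the coefficients and standard errors in the Model 1 output; the dictionary layout is an illustrative choice.

```python
# (coefficient, standard error) pairs copied from the Model 1 output.
estimates = {
    "const": (-33612.6, 23573.5),
    "N": (331.449, 39.7584),
    "OCC": (133259.0, 20827.6),
}

# t ratio for H0: coefficient = 0.
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.3f}")
```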

Thus one explanation of the nonsensical negative overhead cost of regular schools might be that they do not actually have any overheads and our estimate is a random number.

A more realistic version of this hypothesis is that b1 is positive but small (as you can see, the 95 percent confidence interval includes positive values) and the error term is responsible for the negative estimate.
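The confidence interval mentioned here can be computed from the intercept estimate and its standard error. The sketch assumes a critical value of about 1.994 for a two-sided 5 percent test with 71 degrees of freedom; that figure is an approximate tabulated value supplied for this example, not something reported in the regression output.

```python
coef, se = -33612.6, 23573.5   # intercept and its standard error, from Model 1
t_crit = 1.994                 # assumed approximate critical value of t, 71 d.f., 5% two-sided

lower = coef - t_crit * se
upper = coef + t_crit * se

print(f"95% confidence interval for b1: [{lower:.0f}, {upper:.0f}]")
print("includes positive values:", upper > 0)
```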
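$$C_t = b_1 + b_2 Y_t + e_t, \qquad t = 1929, \ldots, 1970$$

$$D = \begin{cases} 1 & \text{if the characteristic is present} \\ 0 & \text{if the characteristic is not present} \end{cases}$$

$$D_t = \begin{cases} 1 & \text{if } t = 1941, \ldots, 1946 \\ 0 & \text{otherwise} \end{cases}$$

$$C_t = b_1 + d D_t + b_2 Y_t + e_t, \qquad t = 1929, \ldots, 1970$$

$$E(C_t) = \begin{cases} (b_1 + d) + b_2 Y_t & \text{when } D_t = 1 \\ b_1 + b_2 Y_t & \text{when } D_t = 0 \end{cases}$$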
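A time-series dummy of this kind is defined by a range of dates rather than by a category label. A minimal sketch of constructing it for the sample period 1929 to 1970, with illustrative variable names:

```python
years = list(range(1929, 1971))   # t = 1929, ..., 1970

# The dummy is 1 for 1941 through 1946 and 0 otherwise.
dummy = [1 if 1941 <= year <= 1946 else 0 for year in years]

print(len(years), sum(dummy))
```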

DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES

COST = b1 + dT TECH + dW WORKER + dV VOC + b2N + u
(COST = b1 + dT D1 + dW D2 + dV D3 + b2N + u)


In the previous sequence we used a dummy variable to differentiate between regular and occupational schools when fitting a cost function. In actual fact there are two types of regular secondary school in Shanghai. There are general schools, which provide the usual academic education, and vocational schools. As their name implies, the vocational schools are meant to impart occupational skills as well as give an academic education. However, the vocational component of the curriculum is typically quite small and the schools are similar to the general schools. Often they are just general schools with a couple of workshops added.

Likewise there are two types of occupational school. There are technical schools training technicians and skilled workers schools training craftsmen.

So now the qualitative variable has four categories. The standard procedure is to choose one category as the reference category and to define dummy variables for each of the others.


In general it is good practice to select the most normal or basic category as the reference category, if one category is in some sense more normal or basic than the others. In the Shanghai sample it is sensible to choose the general schools as the reference category. They are the most numerous and the other schools are variations of them. Accordingly we will define dummy variables for the other three types. TECH will be the dummy for the technical schools: TECH is equal to 1 if the observation relates to a technical school, 0 otherwise.

Similarly we will define dummy variables WORKER and VOC for the skilled workers schools and the vocational schools. Each of the dummy variables will have a coefficient which represents the extra overhead costs of the schools, relative to the reference category.
Note that you do not include a dummy variable for the reference category, and that is the reason that the reference category is usually described as the omitted category.

General school (TECH = WORKER = VOC = 0): COST = b1 + b2N + u
Technical school (TECH = 1; WORKER = VOC = 0): COST = (b1 + dT) + b2N + u
Skilled workers school (WORKER = 1; TECH = VOC = 0): COST = (b1 + dW) + b2N + u
Vocational school (VOC = 1; TECH = WORKER = 0): COST = (b1 + dV) + b2N + u

If an observation relates to a general school, the dummy variables are all 0 and the regression model is reduced to its basic components. If an observation relates to a technical school, TECH will be equal to 1 and the other dummy variables will be 0. The regression model simplifies as shown. The regression model simplifies in a similar manner in the case of observations relating to skilled workers schools and vocational schools.
[Figure: COST against N. Four parallel cost functions with intercepts b1 + dT (technical), b1 + dW (workers), b1 + dV (vocational), and b1 (general).]
The diagram illustrates the model graphically. The d coefficients are the extra overhead costs of running technical, skilled workers, and vocational schools, relative to the overhead cost of general schools.

Note that we do not make any prior assumption about the size, or even the sign, of the coefficients. They will be estimated from the sample data.

School  Type        COST     N    TECH  WORKER  VOC
1       Technical   345,000  623  1     0       0
2       Technical   537,000  653  1     0       0
3       General     170,000  400  0     0       0
4       Workers     526,000  663  0     1       0
5       General     100,000  563  0     0       0
6       Vocational   28,000  236  0     0       1
7       Vocational  160,000  307  0     0       1
8       Technical    45,000  173  1     0       0
9       Technical   120,000  146  1     0       0
10      Workers      61,000   99  0     1       0

Here are the data for the first 10 of the 74 schools. Note how the values of the dummy variables TECH, WORKER, and VOC are determined by the type of school in each observation.
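The rule for building the three dummies can be sketched as follows: one dummy per non-reference category, and none for the reference category. The names `REFERENCE` and `DUMMIES` are assumptions made for this example.

```python
# School types of the first ten observations.
types = [
    "Technical", "Technical", "General", "Workers", "General",
    "Vocational", "Vocational", "Technical", "Technical", "Workers",
]

REFERENCE = "General"   # omitted category: no dummy is defined for it
DUMMIES = {"TECH": "Technical", "WORKER": "Workers", "VOC": "Vocational"}

# Each dummy is 1 when the school belongs to its category, 0 otherwise.
rows = [
    {name: int(school_type == category) for name, category in DUMMIES.items()}
    for school_type in types
]

for school_type, row in zip(types, rows):
    print(school_type, row["TECH"], row["WORKER"], row["VOC"])
```

A general school has all three dummies equal to 0, and no school ever has more than one dummy equal to 1.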
[Figure: scatter diagram of COST against N for the entire sample, distinguishing technical schools, vocational schools, general schools, and workers' schools.]

The scatter diagram shows the data for the entire sample, differentiating by type of school.

Model 2: OLS, using observations 1-74
Dependent variable: COST

           coefficient   std. error   t-ratio   p-value
  -------------------------------------------------------------
  const    -54893.1      26673.1      -2.058    0.0434     **
  N           342.634       40.2195    8.519    2.25e-012  ***
  TECH     154111        26760.4       5.759    2.15e-07   ***
  WORKER   143362        27852.8       5.147    2.38e-06   ***
  VOC       53228.6      31061.6       1.714    0.0911     *

Mean dependent var   187418.0    S.D. dependent var   141969.9
Sum squared resid    5.41e+11    S.E. of regression   88578.37
R-squared            0.632050    Adjusted R-squared   0.610719
F(4, 69)             29.63132    P-value(F)           2.39e-14

Here is the output for this regression. The coefficient of N indicates that the marginal cost per student per year is 343 yuan. The coefficients of TECH, WORKER, and VOC are 154,000, 143,000, and 53,000, respectively, and should be interpreted as the additional annual overhead costs, relative to those of general schools. The constant term is -55,000, indicating that the annual overhead cost of a general academic school is -55,000 yuan. Obviously this is nonsense and indicates that something is wrong with the model.

Fitted COST = -55,000 + 154,000 TECH + 143,000 WORKER + 53,000 VOC + 343N
(Fitted COST = -55,000 + 154,000 D1 + 143,000 D2 + 53,000 D3 + 343N)

The top line shows the regression result in equation form. We will derive the implicit cost functions for each type of school.

General school (TECH = WORKER = VOC = 0): Fitted COST = -55,000 + 343N

In the case of a general school, the dummy variables are all 0 and the equation reduces to the intercept and the term involving N.

The annual marginal cost per student is estimated at 343 yuan. The annual overhead cost per school is estimated at -55,000 yuan. Obviously a negative amount is inconceivable.

Technical school (TECH = 1; WORKER = VOC = 0): Fitted COST = -55,000 + 154,000 + 343N = 99,000 + 343N

The extra annual overhead cost for a technical school, relative to a general school, is 154,000 yuan. Hence we derive the implicit cost function for technical schools.

Skilled workers school (WORKER = 1; TECH = VOC = 0): Fitted COST = -55,000 + 143,000 + 343N = 88,000 + 343N
Vocational school (VOC = 1; TECH = WORKER = 0): Fitted COST = -55,000 + 53,000 + 343N = -2,000 + 343N

And similarly the extra overhead costs of skilled workers and vocational schools, relative to those of general schools, are 143,000 and 53,000 yuan, respectively.

Note that in each case the annual marginal cost per student is estimated at 343 yuan. The model specification assumes that this figure does not differ according to type of school.
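The four implicit intercepts can be derived from the unrounded Model 2 coefficients by adding each dummy coefficient to the constant. This is a sketch with illustrative variable names.

```python
# Estimates copied from the Model 2 output.
const = -54893.1
coef_n = 342.634
extra_overhead = {
    "General": 0.0,          # reference category
    "Technical": 154111.0,
    "Workers": 143362.0,
    "Vocational": 53228.6,
}

# Intercept of each implicit cost function; the slope is coef_n for every type.
intercepts = {school: const + extra for school, extra in extra_overhead.items()}

for school, intercept in intercepts.items():
    print(f"{school}: Fitted COST = {intercept:.1f} + {coef_n} N")
```

The figures 99,000, 88,000, and -2,000 quoted in the text come from adding the rounded coefficients; the unrounded intercepts differ slightly.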

[Figure: scatter diagram of COST against N with the four fitted cost functions, for technical schools, vocational schools, general schools, and workers' schools.]

The four cost functions are illustrated graphically.


The t statistic for N is 8.52, so the marginal cost is (very) significantly different from 0. The t statistic for the technical school dummy is 5.76, indicating that the annual overhead cost of a technical school is (very) significantly greater than that of a general school, again as expected. In the case of vocational schools, the t statistic is only 1.71, indicating that the overhead cost of such a school is not significantly greater than that of a general school. This is not surprising, given that the vocational schools are not much different from the general schools.


Finally we will perform an F test of the joint explanatory power of the dummy variables as a group. The null hypothesis is H0: dT = dW = dV = 0. The alternative hypothesis is that at least one d is different from 0. The residual sum of squares in the specification including the dummy variables is 5.41 × 10^11.

Model 3: OLS, using observations 1-74
Dependent variable: COST

          coefficient   std. error   t-ratio   p-value
  -----------------------------------------------------------
  const   23953.3       27168.0      0.8817    0.3809
  N         339.043        49.5514   6.842     2.16e-09  ***

Mean dependent var   187418.0    S.D. dependent var   141969.9
Sum squared resid    8.92e+11    S.E. of regression   111280.6
R-squared            0.394023    Adjusted R-squared   0.385606
F(1, 72)             46.81636    P-value(F)           2.16e-09

The residual sum of squares in the specification excluding the dummy variables is 8.92 × 10^11. The reduction in RSS when we include the dummies is therefore (8.92 - 5.41) × 10^11. We will check whether this reduction is significant with the usual F test.

$$F(3, 69) = \frac{(8.92 \times 10^{11} - 5.41 \times 10^{11}) / 3}{5.41 \times 10^{11} / 69} = 14.92$$

p-value(F) = 1.38448e-007

The numerator in the F ratio is the reduction in RSS divided by the cost, which is the 3 degrees of freedom given up when we estimate three additional coefficients (the coefficients of the dummies). The denominator is RSS for the specification including the dummy variables, divided by the number of degrees of freedom remaining after they have been added.
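The arithmetic of the F ratio can be checked directly from the two residual sums of squares. The sketch uses the rounded RSS values quoted in the text; the variable names are illustrative.

```python
rss_without_dummies = 8.92e11   # Model 3: COST on N only
rss_with_dummies = 5.41e11      # Model 2: COST on N, TECH, WORKER, VOC

extra_coefficients = 3          # degrees of freedom given up: three dummy coefficients
df_remaining = 74 - 5           # observations minus coefficients estimated in Model 2

numerator = (rss_without_dummies - rss_with_dummies) / extra_coefficients
denominator = rss_with_dummies / df_remaining
f_stat = numerator / denominator

print(f"F({extra_coefficients}, {df_remaining}) = {f_stat:.2f}")
```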

TWO SETS OF DUMMY VARIABLES

COST = b1 + d OCC + e RES + b2N + u ( COST = b1 + d D + e E + b2 N + u )


To model the higher overhead costs of residential schools, we introduce a dummy variable RES which is equal to 1 for them and 0 for nonresidential schools. e is the extra annual overhead cost of a residential school, relative to that of a nonresidential one.
We will also make a distinction between occupational and regular schools, using the dummy variable OCC defined in the first sequence. (It would be better to use the four-category school classification, and in practice we would, but it would complicate the graphics.)

Regular, nonresidential (OCC = RES = 0; D = E = 0): COST = b1 + b2N + u
Regular, residential (OCC = 0, RES = 1; D = 0, E = 1): COST = (b1 + e) + b2N + u
Occupational, nonresidential (OCC = 1, RES = 0; D = 1, E = 0): COST = (b1 + d) + b2N + u
Occupational, residential (OCC = RES = 1; D = E = 1): COST = (b1 + d + e) + b2N + u

In the case of a nonresidential occupational school, RES is 0 and OCC is 1, so the overhead cost increases by d. If the school is both occupational and residential, it increases by ( d + e).

[Figure: COST against N. Four parallel cost functions with intercepts b1 + d + e (occupational, residential), b1 + d (occupational, nonresidential), b1 + e (regular, residential), and b1 (regular, nonresidential).]

The diagram illustrates the model graphically. Note that the effects of the different components of the model are assumed to be separate and additive in this specification.

Fitted COST = -29,000 + 110,000 OCC + 58,000 RES + 322N
(Fitted COST = -29,000 + 110,000 D + 58,000 E + 322N)

Regular, nonresidential (OCC = RES = 0): Fitted COST = -29,000 + 322N
Regular, residential (OCC = 0; RES = 1): Fitted COST = -29,000 + 58,000 + 322N = 29,000 + 322N
Occupational, nonresidential (OCC = 1; RES = 0): Fitted COST = -29,000 + 110,000 + 322N = 81,000 + 322N
Occupational, residential (OCC = RES = 1): Fitted COST = -29,000 + 110,000 + 58,000 + 322N = 139,000 + 322N

The cost functions for nonresidential and residential occupational schools are derived by putting OCC equal to 1 and RES equal to 0 and 1, respectively.
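With two sets of dummies, the four intercepts follow from every combination of OCC and RES. A minimal sketch using the rounded estimates from the fitted equation, with illustrative variable names:

```python
from itertools import product

# Rounded estimates from the fitted equation in the text.
const, coef_occ, coef_res, coef_n = -29_000, 110_000, 58_000, 322

# Intercept for each (OCC, RES) combination; the slope is coef_n in all four cases.
intercepts = {
    (occ, res): const + coef_occ * occ + coef_res * res
    for occ, res in product((0, 1), repeat=2)
}

for (occ, res), intercept in intercepts.items():
    print(f"OCC={occ}, RES={res}: Fitted COST = {intercept} + {coef_n} N")
```

Because the effects are additive, the residential premium is the same for regular and occupational schools.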

SLOPE DUMMY VARIABLES


[Figure: scatter diagram of COST against N with the parallel fitted cost functions for occupational schools and regular schools.]

Assumption: the marginal cost per student is the same for occupational and regular schools. Hence the cost functions are parallel. However, this is not a realistic assumption. Occupational schools incur expenditure on training materials that is related to the number of students. Looking at the scatter diagram, you can see that the cost function for the occupational schools should be steeper, and that for the regular schools should be flatter. We will relax the assumption of the same marginal cost by introducing what is known as a slope dummy variable. This is NOCC, defined as the product of N and OCC.

COST = b1 + d OCC + b2N + l(N*OCC) + u
(COST = b1 + d D + b2N + l(N*D) + u)

Regular school (OCC = NOCC = 0): COST = b1 + b2N + u
Occupational school (OCC = 1; NOCC = N): COST = (b1 + d) + (b2 + l)N + u

$$\frac{\partial COST}{\partial N} = b_2 + l \cdot OCC = \begin{cases} b_2 & \text{when } OCC = 0 \\ b_2 + l & \text{when } OCC = 1 \end{cases}$$


In the case of a regular school, OCC is 0 and hence so also is NOCC. The model reduces to its basic components. In the case of an occupational school, OCC is equal to 1 and NOCC is equal to N. The equation simplifies as shown. The model now allows the marginal cost per student to be an amount l greater than that in regular schools, as well as allowing the overhead costs to be different.

[Figure: COST against N. Cost functions with intercept b1 + d for occupational schools and b1 for regular schools; the occupational function is steeper by l.]

The diagram illustrates the model graphically.

Fitted COST = 51,000 - 4,000 OCC + 152N + 284(N*OCC)
(Fitted COST = 51,000 - 4,000 D + 152N + 284(N*D))

Regular school (OCC = NOCC = 0): Fitted COST = 51,000 + 152N
Occupational school (OCC = 1; NOCC = N): Fitted COST = 51,000 - 4,000 + 152N + 284N = 47,000 + 436N

Putting OCC, and hence NOCC, equal to 0, we get the cost function for regular schools. We estimate that their annual overhead costs are 51,000 yuan and their annual marginal cost per student is 152 yuan.
Putting OCC equal to 1, and hence NOCC equal to N, we estimate that the annual overhead costs of the occupational schools are 47,000 yuan and the annual marginal cost per student is 436 yuan.
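The slope dummy is just the product of N and OCC, so the two implied cost functions can be read off by evaluating the fitted equation at OCC = 0 and OCC = 1. The sketch below uses the rounded estimates from the text; the function names are assumptions made for the example.

```python
# Rounded estimates from the fitted equation in the text.
const, coef_occ, coef_n, coef_nocc = 51_000, -4_000, 152, 284

def fitted_cost(n, occ):
    nocc = n * occ   # slope dummy: equals N for occupational schools, 0 otherwise
    return const + coef_occ * occ + coef_n * n + coef_nocc * nocc

def implied_line(occ):
    """Return (overhead cost, marginal cost per student) for the given school type."""
    intercept = fitted_cost(0, occ)
    slope = fitted_cost(1, occ) - intercept
    return intercept, slope

print("regular:", implied_line(0))
print("occupational:", implied_line(1))
```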

[Figure: scatter diagram of COST against N with the fitted cost functions from the slope dummy specification, for occupational schools and regular schools.]

You can see that the cost functions fit the data much better than before and that the real difference is in the marginal cost, not the overhead cost.

[Figure: scatter diagram of COST against N with the parallel fitted cost functions from the earlier specification, for occupational schools and regular schools.]

Now we can see why we had a nonsensical negative estimate of the overhead cost of a regular school in previous specifications.
The assumption of the same marginal cost led to an estimate of the marginal cost that was a compromise between the marginal costs of occupational and regular schools. The cost function for regular schools was too steep and as a consequence the intercept was underestimated, actually becoming negative and indicating that something must be wrong with the specification of the model.
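The mechanism can be illustrated with simulated data. In the sketch below, costs are generated without any disturbance term from two lines with different slopes (the rounded slope dummy estimates are borrowed as the assumed true lines), both groups are assumed to have the same hypothetical set of school sizes, and a model with a common slope is then fitted. None of these numbers are observations from the Shanghai sample.

```python
# Hypothetical school sizes, assumed identical in both groups.
sizes = [200, 400, 600, 800, 1000, 1200]

# Simulated (COST, N) pairs with no disturbance term.
groups = {
    0: [(51_000 + 152 * n, n) for n in sizes],   # regular schools
    1: [(47_000 + 436 * n, n) for n in sizes],   # occupational schools
}

def mean(values):
    return sum(values) / len(values)

def fit_common_slope(groups):
    """OLS of COST on a constant, N and OCC: one slope, separate intercepts."""
    sxy = sxx = 0.0
    means = {}
    for occ, members in groups.items():
        cost_bar = mean([c for c, _ in members])
        n_bar = mean([n for _, n in members])
        means[occ] = (cost_bar, n_bar)
        sxy += sum((n - n_bar) * (c - cost_bar) for c, n in members)
        sxx += sum((n - n_bar) ** 2 for _, n in members)
    slope = sxy / sxx
    intercepts = {occ: c_bar - slope * n_bar for occ, (c_bar, n_bar) in means.items()}
    return slope, intercepts

slope, intercepts = fit_common_slope(groups)
print(slope, intercepts[0], intercepts[1])
```

In this setup the common slope comes out at 294, between 152 and 436, and the intercept for regular schools is -48,400 even though the line that generated their costs has an intercept of 51,000.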
