
Sociology 362

multinomial logit

• interpreting multinomial logistic regressions
• recovering equations/comparisons not estimated
• probability computations

The data for this exercise come from the 1991 General Social Survey. The categorical
dependent variable occ is coded as follows:

occ=0 if a worker's occupation is laborer, operative, or craft;
occ=1 if occupation is clerical, sales, or service;
occ=2 if occupation is managerial, technical, or professional.

The independent variables are: educ is years of schooling; age is age in years; sexx
is coded 1 for male, 0 for female; rural is coded 1 if the respondent grew up in a
rural area, 0 otherwise.

1. tab occ

occ | Freq. Percent Cum.


------------+-----------------------------------
0 | 172 27.17 27.17
1 | 248 39.18 66.35
2 | 213 33.65 100.00
------------+-----------------------------------
Total | 633 100.00

Let’s begin with the null model with no regressors:

2. mlogit occ,base(0)

Iteration 0: log likelihood = -688.49317

Multinomial regression Number of obs = 633


LR chi2(0) = 0.00
Prob > chi2 = .
Log likelihood = -688.49317 Pseudo R2 = 0.0000

------------------------------------------------------------------------------
occ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
1 |
_cons | .3659343 .0992281 3.688 0.000 .1714508 .5604177
---------+--------------------------------------------------------------------
2 |
_cons | .2137977 .1025124 2.086 0.037 .0128771 .4147183
------------------------------------------------------------------------------
(Outcome occ==0 is the comparison group)

The coefficients above are on the log-odds scale. In particular, they are the log odds
of being in occupation 1 versus 0 and occupation 2 versus 0. Hence, they should equal
the log ratios of the cell frequencies from the tabulation above. For category 1:

ln(248/172) = .3659343

and the same for category 2: ln(213/172) = .2137977.
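We can check this arithmetic directly. A quick sketch in Python, using the frequencies from the tabulation above:

```python
import math

n0, n1, n2 = 172, 248, 213  # frequencies of occ = 0, 1, 2 from tab occ

# Intercepts of the null model: log odds of each category versus base 0
b1 = math.log(n1 / n0)  # equation for occ = 1
b2 = math.log(n2 / n0)  # equation for occ = 2

print(round(b1, 4))  # ≈ .3659, matching the _cons for outcome 1
print(round(b2, 4))  # ≈ .2138, matching the _cons for outcome 2
```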


Now let’s add education to the model:

3. mlogit occ educ,base(0)

Iteration 0: log likelihood = -688.49317


Iteration 1: log likelihood = -578.97699
Iteration 2: log likelihood = -568.79391
Iteration 3: log likelihood = -568.46166
Iteration 4: log likelihood = -568.4611

Multinomial regression Number of obs = 633


LR chi2(2) = 240.06
Prob > chi2 = 0.0000
Log likelihood = -568.4611 Pseudo R2 = 0.1743

------------------------------------------------------------------------------
occ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
1 |
educ | .2175129 .0495753 4.388 0.000 .120347 .3146788
_cons | -2.341483 .6221847 -3.763 0.000 -3.560943 -1.122024
---------+--------------------------------------------------------------------
2 |
educ | .7404903 .0630034 11.753 0.000 .6170059 .8639747
_cons | -9.937645 .8608307 -11.544 0.000 -11.62484 -8.250448
------------------------------------------------------------------------------
(Outcome occ==0 is the comparison group)

To get the coefficients on the odds-ratio scale, we just add the option rrr, like so:

4. mlogit occ educ,base(0) rrr

Iteration 0: log likelihood = -688.49317


Iteration 1: log likelihood = -578.97699
Iteration 2: log likelihood = -568.79391
Iteration 3: log likelihood = -568.46166
Iteration 4: log likelihood = -568.4611

Multinomial regression Number of obs = 633


LR chi2(2) = 240.06
Prob > chi2 = 0.0000
Log likelihood = -568.4611 Pseudo R2 = 0.1743

------------------------------------------------------------------------------
occ | RRR Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
1 |
educ | 1.242981 .0616212 4.388 0.000 1.127888 1.369819
---------+--------------------------------------------------------------------
2 |
educ | 2.096963 .1321158 11.753 0.000 1.853371 2.372572
------------------------------------------------------------------------------
(Outcome occ==0 is the comparison group)
The interpretation of the odds ratio is analogous to logistic regression: for
category 1, exp(.2175129) = 1.242981, and similarly for category 2. This means that one
additional year of schooling multiplies the odds of being in occupation 1 rather than
0 by 1.2430, i.e., one year of schooling increases the odds of being in category 1
instead of 0 by about 24%. Similarly, the odds of being in category 2 instead of 0
are more than doubled (multiplied by 2.0970) for each one-year increase in schooling.
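The RRR column is just the exponentiated coefficient column, which we can verify by hand (a quick check in Python, using the coefficients reported above):

```python
import math

# educ coefficients from the log-odds output above
b1 = 0.2175129  # occ 1 vs 0
b2 = 0.7404903  # occ 2 vs 0

# Exponentiating a log-odds coefficient gives the RRR (odds-ratio) column
print(round(math.exp(b1), 4))  # ≈ 1.2430
print(round(math.exp(b2), 4))  # ≈ 2.0970
```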

Recovering coefficients not estimated


What is the effect of one additional year of schooling on the odds of being in
occupation 2 rather than 1? This comparison was not estimated because occupation 0
was chosen as the base category. Still, we can recover the relevant coefficients from
those that were reported above. For example, suppose we want the comparison of
occupation 2 to occupation 1, taking the latter as the base comparison category. Then
we have

b(educ; 2 vs 1) = b(educ; 2 vs 0) - b(educ; 1 vs 0) = .7404903 - .2175129 = .5229774

Hence, if one additional year of schooling increases the log odds of occ 2 instead of 0
by .7405, and increases the log odds of occ 1 instead of 0 by .2175, then it increases
the log odds of occ 2 versus occ 1 (taking occ 1 as the base category) by
.7405 - .2175 = .5230.

To get the odds ratio, we just take exp(.5230) = 1.687. Note that this is identical
(aside from rounding error) to the ratio of the odds ratio for category 2 to the odds
ratio for category 1 from the regression above with 0 as the base category:
2.0970/1.2430 = 1.687.
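Both routes to the recovered odds ratio can be checked numerically (a small sketch in Python, using the coefficients from the base(0) regression):

```python
import math

# educ coefficients from the base(0) regression above
b1_0 = 0.2175129  # occ 1 vs 0
b2_0 = 0.7404903  # occ 2 vs 0

# Recover the unestimated comparison: occ 2 vs occ 1
b2_1 = b2_0 - b1_0
print(round(b2_1, 4))  # ≈ .5230 on the log-odds scale

# Odds ratio two equivalent ways: exponentiate the difference,
# or take the ratio of the two reported RRRs
print(round(math.exp(b2_1), 3))                    # ≈ 1.687
print(round(math.exp(b2_0) / math.exp(b1_0), 3))   # ≈ 1.687
```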

In a similar fashion, all the intercepts and coefficients from a multinomial
regression that takes 1 as the base category can be recovered from the results above.
As an exercise, you should show how to do this so that you get the following results:

5. mlogit occ educ,base(1)

Iteration 0: log likelihood = -688.49317


Iteration 1: log likelihood = -578.97699
Iteration 2: log likelihood = -568.79391
Iteration 3: log likelihood = -568.46166
Iteration 4: log likelihood = -568.4611

Multinomial regression Number of obs = 633


LR chi2(2) = 240.06
Prob > chi2 = 0.0000
Log likelihood = -568.4611 Pseudo R2 = 0.1743

------------------------------------------------------------------------------
occ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
0 |
educ | -.2175129 .0495753 -4.388 0.000 -.3146788 -.120347
_cons | 2.341483 .6221847 3.763 0.000 1.122024 3.560943
---------+--------------------------------------------------------------------
2 |
educ | .5229774 .0514263 10.169 0.000 .4221837 .6237711
_cons | -7.596161 .7404896 -10.258 0.000 -9.047494 -6.144828
------------------------------------------------------------------------------
(Outcome occ==1 is the comparison group)
Note that the education coefficient for the comparison of occupation 0 to occupation 1
is identical in magnitude but opposite in sign to the education coefficient for the
comparison of occ 1 to occ 0.

Now that you know how to recover coefficients and odds ratios by hand, here’s a
command that does it automatically and covers all possibilities:

6. listcoef educ

mlogit (N=633): Factor Change in the Odds of occ

Variable: educ (sd= 2.71668)

Odds comparing |
Group 1 - Group 2 | b z P>|z| e^b e^bStdX
------------------+---------------------------------------------
1 -2 | -0.52298 -10.169 0.000 0.5928 0.2415
1 -0 | 0.21751 4.388 0.000 1.2430 1.8056
2 -1 | 0.52298 10.169 0.000 1.6870 4.1403
2 -0 | 0.74049 11.753 0.000 2.0970 7.4758
0 -1 | -0.21751 -4.388 0.000 0.8045 0.5538
0 -2 | -0.74049 -11.753 0.000 0.4769 0.1338
----------------------------------------------------------------
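The pattern behind the listcoef table is simple: every pairwise coefficient is a difference of base-category coefficients, b(j vs k) = b(j vs 0) - b(k vs 0), and e^b is its exponential. A sketch reproducing the b and e^b columns (row order differs from listcoef's):

```python
import math
from itertools import permutations

# educ coefficients relative to base category 0 (b for 0 itself is 0)
b = {0: 0.0, 1: 0.2175129, 2: 0.7404903}

# Every pairwise comparison j vs k is the difference b[j] - b[k];
# exponentiating gives the factor change in the odds (e^b column)
for j, k in permutations(b, 2):
    diff = b[j] - b[k]
    print(f"{j} -{k} | b = {diff:8.5f}  e^b = {math.exp(diff):6.4f}")
```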

Probability interpretations
How about computing the probability of being in each occupation for a given value of
schooling? To do this, ask Stata to compute the predicted probabilities with the
following command:

7. predict p0 p1 p2

Now to get these for each value of schooling, I did the following (which, by the way,
replaces the data in memory with the collapsed dataset, so save your data first):

8. collapse (mean) p0 p1 p2,by(educ)

Then I summed the probabilities for each vlaue of educ:

9. gen summ_p = p0 + p1 + p2

10. list educ p0 p1 p2 summ_p

educ p0 p1 p2 summ_p
1. 3 0.8438 0.1559 0.0004 1
2. 4 0.8127 0.1866 0.0008 1
3. 5 0.7768 0.2217 0.0015 1
4. 6 0.7359 0.2611 0.0030 1
5. 7 0.6899 0.3042 0.0059 1
6. 8 0.6385 0.3499 0.0115 1
7. 9 0.5817 0.3963 0.0220 1
8. 10 0.5192 0.4396 0.0412 1
9. 11 0.4506 0.4743 0.0751 1
10. 12 0.3763 0.4923 0.1314 1
11. 13 0.2977 0.4842 0.2181 1
12. 14 0.2194 0.4435 0.3371 1
13. 15 0.1485 0.3731 0.4784 1
14. 16 0.0919 0.2871 0.6210 1
15. 17 0.0525 0.2038 0.7437 1
16. 18 0.0281 0.1358 0.8360 1
17. 19 0.0144 0.0866 0.8990 1
18. 20 0.0072 0.0536 0.9392 1

Notice that for each value of education, the probabilities (as given by summ_p) sum to
1.
Here’s an example of computing the probabilities, for the case of educ=16 years. Using
the coefficients from the base(0) regression above:

P(occ=1) = exp(-2.341483 + .2175129*16)/D = exp(1.1387)/D = 3.123/10.877 = .2871
P(occ=2) = exp(-9.937645 + .7404903*16)/D = exp(1.9102)/D = 6.754/10.877 = .6210
P(occ=0) = 1/D = 1/10.877 = .0919

where D = 1 + exp(1.1387) + exp(1.9102) = 1 + 3.123 + 6.754 = 10.877.
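The educ=16 computation can be checked numerically. A sketch in Python, plugging the estimated coefficients into the multinomial logit probability formula:

```python
import math

# Coefficients from the mlogit occ educ, base(0) output above
cons1, b1 = -2.341483, 0.2175129  # equation for occ = 1
cons2, b2 = -9.937645, 0.7404903  # equation for occ = 2

def occ_probs(educ):
    """Predicted probabilities of occ = 0, 1, 2 at a given level of educ."""
    e1 = math.exp(cons1 + b1 * educ)
    e2 = math.exp(cons2 + b2 * educ)
    d = 1 + e1 + e2  # base category contributes exp(0) = 1
    return (1 / d, e1 / d, e2 / d)

p0, p1, p2 = occ_probs(16)
print(round(p0, 4), round(p1, 4), round(p2, 4))  # ≈ .0919, .2871, .6210
```

Plugging in other values of educ reproduces the other rows of the listed table.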

As an exercise, you should try to compute some of the other probabilities at some
other levels of education to make sure you know how.

Note that the effect of a one-year change in schooling on the probability of, say,
being in occupation 2 depends on the value of schooling that you start from. This is
just like the binary case and is due to the fact that the probabilities are a
nonlinear function of schooling.
