Let us consider Example 16.1 in Wooldridge (2010), concerning school and employment decisions for young men. The data contain information on employment and schooling for young men over several years. We will work with the data for 1987.
The outcome is status, coded 1=in school, 2=at home (meaning not in school and not working), and 3=working. The predictors are education, a quadratic in work experience, and an indicator for black.
We read the data from the Stata website, keep the year 1987, drop missing values, label the outcome, and fit the model. The argument reltol is used to increase the precision of the estimates.
> library(haven)
> library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
> library(nnet)
> keane <- read_dta("https://www.stata.com/data/jwooldridge/eacsap/keane.dta")
> keane <- filter(keane, year==87, !is.na(status)) |>
+ mutate(status=factor(status, labels=c("school","home","work")))
> ml <- multinom(status ~ educ + exper + expersq + black, data=keane, reltol=1e-12)
# weights: 18 (10 variable)
initial value 1886.317300
iter 10 value 1034.528631
iter 20 value 907.857379
iter 30 value 907.857229
final value 907.857225
converged
> summary(ml)
Call:
multinom(formula = status ~ educ + exper + expersq + black, data = keane,
reltol = 1e-12)
Coefficients:
(Intercept) educ exper expersq black
home 10.277875 -0.6736313 -0.1062149 -0.01251515 0.8130167
work 5.543798 -0.3146573 0.8487367 -0.07730035 0.3113612
Std. Errors:
(Intercept) educ exper expersq black
home 1.133336 0.06989987 0.1732820 0.02522912 0.3027231
work 1.086409 0.06510963 0.1569856 0.02292171 0.2815339
Residual Deviance: 1815.714
AIC: 1835.714
The results agree exactly with Table 16.1 in Wooldridge (2010, page 645).
Let us focus on the coefficient of black in the work equation, which is 0.311. Exponentiating, we obtain \(e^{0.311} \approx 1.37\).
Thus, the relative probability of working rather than being in school is 37% higher for blacks than for non-blacks with the same education and work experience. (Relative probabilities are also called relative odds.)
A common mistake is to interpret this coefficient as meaning that the probability of working is higher for blacks. It is only the relative probability of work over school that is higher. To obtain a fuller picture we need to consider the second equation as well. The coefficient of black in the home equation is 0.813. Exponentiating, we obtain \(e^{0.813} \approx 2.25\).
Thus, the relative probability of being at home rather than in school for blacks is more than double the corresponding relative probability for non-blacks with the same education and work experience.
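As a quick check, both exponentiated coefficients can be computed in one step. This is a minimal sketch that copies the two coefficients of black from the summary output rather than refitting the model; the names b_black and rrr are introduced here for illustration.

```r
# Coefficients of black, copied from the summary(ml) output above
b_black <- c(home = 0.8130167, work = 0.3113612)

# Relative-risk ratios: multiplicative effect of black on the
# relative probability of each outcome versus school (the baseline)
rrr <- exp(b_black)
print(round(rrr, 3))
```

With the fitted model still in the session, exp(coef(ml)[, "black"]) should give the same two ratios.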
In short, black is associated with an increase in the relative probability of work over school, but also a much larger increase in the relative probability of home over school. What happens with the actual probability of working depends on how these two effects balance out.
To determine the effect of black in the probability scale we need to compute marginal effects, which can be done using continuous or discrete calculations.
The continuous calculation is based on the derivative of the probability of working with respect to a predictor. Let \(\pi_{ij}=\Pr\{Y_i=j\}\) denote the probability that the i-th observation falls in the j-th category, which is given by \[ \pi_{ij} = \frac{e^{x_i'\beta_j}}{\sum_r e^{x_i'\beta_r}} \] where \(\beta_j = 0\) when j is the baseline or reference outcome, in this case school.
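To see the formula at work, the sketch below evaluates the three probabilities for a single covariate profile, once as non-black and once as black. The coefficients are copied from the summary output. The profile itself (12 years of education and 3 years of experience) is a hypothetical choice made for illustration, as is the helper mlogit_prob.

```r
# Coefficient matrix copied from summary(ml); school is the baseline (all zeros)
B <- rbind(
  school = c(0, 0, 0, 0, 0),
  home   = c(10.277875, -0.6736313, -0.1062149, -0.01251515, 0.8130167),
  work   = c(5.543798,  -0.3146573,  0.8487367, -0.07730035, 0.3113612)
)
colnames(B) <- c("(Intercept)", "educ", "exper", "expersq", "black")

# Multinomial logit probabilities for one covariate vector x
mlogit_prob <- function(x, B) {
  eta <- drop(B %*% x)       # linear predictor for each outcome
  exp(eta) / sum(exp(eta))   # normalize so the probabilities sum to one
}

# Hypothetical profile (an assumption): educ = 12, exper = 3, expersq = 9
x0 <- c(1, 12, 3, 9, 0)      # non-black
x1 <- c(1, 12, 3, 9, 1)      # black
p0 <- mlogit_prob(x0, B)
p1 <- mlogit_prob(x1, B)
print(round(rbind(nonblack = p0, black = p1), 4))
```

For this profile the ratio of work to school is higher when black is 1, yet the probability of working itself is lower, because the probability of being at home rises by more.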
Taking derivatives with respect to the k-th predictor we obtain, after some simplification, \[ \frac{\partial\pi_{ij}}{\partial x_{ik}} = \pi_{ij} ( \beta_{jk} - \sum_r \pi_{ir} \beta_{rk} ) \] noting again that the coefficient is zero for the baseline outcome.
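The simplification can be sketched as follows, using the shorthand \(\eta_{ij} = x_i'\beta_j\) and \(S_i = \sum_r e^{\eta_{ir}}\) (notation introduced here for convenience), so that \(\pi_{ij} = e^{\eta_{ij}}/S_i\). By the quotient rule, \[ \frac{\partial\pi_{ij}}{\partial x_{ik}} = \frac{\beta_{jk} e^{\eta_{ij}} S_i - e^{\eta_{ij}} \sum_r \beta_{rk} e^{\eta_{ir}}}{S_i^2} = \pi_{ij}\beta_{jk} - \pi_{ij} \sum_r \pi_{ir}\beta_{rk}, \] which factors into the expression above. Summing over outcomes gives \[ \sum_j \frac{\partial\pi_{ij}}{\partial x_{ik}} = \sum_j \pi_{ij}\beta_{jk} - \Big(\sum_j \pi_{ij}\Big) \sum_r \pi_{ir}\beta_{rk} = 0, \] so the marginal effects across the outcomes always add up to zero, as they must because the probabilities sum to one.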
To compute these we predict the probabilities and then apply the formula.
> b <- coef(ml)
> pr <- data.frame(predict(ml, type="p"))
> part <- pr$home*b["home","black"] + pr$work*b["work","black"]
> me <- data.frame(
+ school = pr$school * ( - part),
+ home = pr$home * (b["home", "black"] - part),
+ work = pr$work * (b["work", "black"] - part))
> summarize(me, school = mean(school), home = mean(home), work=mean(work))
school home work
1 -0.01838111 0.05897901 -0.0405979
We find that the average marginal effect of black on work is actually negative: -0.0406. This means that the probability of working is on average about four percentage points lower for blacks than for non-blacks with the same education and experience.
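One way to gain confidence in the derivative formula is to compare it with a finite-difference approximation at a single point. The sketch below does this using the coefficients copied from the summary output. The covariate vector and the helper probs are hypothetical, and black is treated as if it were continuous purely for the purpose of the check.

```r
# Coefficients copied from summary(ml); school is the baseline (all zeros)
B <- rbind(
  school = c(0, 0, 0, 0, 0),
  home   = c(10.277875, -0.6736313, -0.1062149, -0.01251515, 0.8130167),
  work   = c(5.543798,  -0.3146573,  0.8487367, -0.07730035, 0.3113612)
)
colnames(B) <- c("(Intercept)", "educ", "exper", "expersq", "black")

# Multinomial logit probabilities for a covariate vector x
probs <- function(x) {
  eta <- drop(B %*% x)
  exp(eta) / sum(exp(eta))
}

# Hypothetical covariate vector (an assumption): educ = 12, exper = 3, black = 0
x <- c(1, 12, 3, 9, 0)
p <- probs(x)

# Analytic marginal effect of black: pi_j * (beta_jk - sum_r pi_r * beta_rk)
bk <- B[, "black"]
analytic <- p * (bk - sum(p * bk))

# Central finite difference in the black coordinate
h <- 1e-6
step <- c(0, 0, 0, 0, h)
numerical <- (probs(x + step) - probs(x - step)) / (2 * h)

print(round(rbind(analytic, numerical), 6))
```

The two rows should agree to several decimal places, and each row should sum to zero up to rounding, matching the averaged effects reported above.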
For the discrete calculation we compute predicted probabilities by setting ethnicity to black and then to non-black and averaging:
> black <- colMeans(predict(ml, mutate(keane, black=1), type="p"))
> black
school home work
0.04507381 0.22741139 0.72751480
> notblack <- colMeans(predict(ml, mutate(keane, black=0), type="p"))
> notblack
school home work
0.06306482 0.16594933 0.77098585
> data.frame(avg.marginal.effect = black["work"] - notblack["work"])
avg.marginal.effect
work -0.04347105
We find that the average probability of working is 0.7275 if black and 0.7710 if not black, a difference of -0.0435, so the probability of working is on average just over four percentage points lower for blacks.
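The same subtraction can be done for all three outcomes at once. This short sketch copies the averaged probabilities from the output above instead of recomputing them from the model; the name ame is introduced here for illustration.

```r
# Average predicted probabilities copied from the output above
black    <- c(school = 0.04507381, home = 0.22741139, work = 0.72751480)
notblack <- c(school = 0.06306482, home = 0.16594933, work = 0.77098585)

# Discrete average marginal effect of black on every outcome
ame <- black - notblack
print(round(ame, 4))

# The effects sum to zero (up to rounding) because probabilities sum to one
print(sum(ame))
```

The lower probability of working is matched mostly by a higher probability of being at home, with a smaller reduction in the probability of being in school.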
The takeaway here is that multinomial logit coefficients can only be interpreted in terms of relative probabilities. To reach conclusions about actual probabilities we need to calculate continuous or discrete marginal effects.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. Second Edition. Cambridge, Massachusetts: The MIT Press.
Updated fall 2022