We will work with a dataset on union membership used in the Stata manuals and in my own paper on intra-class correlation for binary data. This is a subsample of the National Longitudinal Survey of Youth (NLSY) and has union membership information from 1970-88 for 4,434 women aged 14-26 in 1968. The data are available in the Stata website
. use https://www.stata-press.com/data/r14/union, clear (NLS Women 14-24 in 1968)
Here is a logit model
. gen southXt = south*year . logit union age grade not_smsa south##c.year Iteration 0: log likelihood = -13864.23 Iteration 1: log likelihood = -13547.326 Iteration 2: log likelihood = -13542.493 Iteration 3: log likelihood = -13542.49 Iteration 4: log likelihood = -13542.49 Logistic regression Number of obs = 26,200 LR chi2(6) = 643.48 Prob > chi2 = 0.0000 Log likelihood = -13542.49 Pseudo R2 = 0.0232 ─────────────┬──────────────────────────────────────────────────────────────── union │ Coefficient Std. err. z P>|z| [95% conf. interval] ─────────────┼──────────────────────────────────────────────────────────────── age │ .0207656 .0050001 4.15 0.000 .0109655 .0305658 grade │ .0489121 .0064308 7.61 0.000 .036308 .0615162 not_smsa │ -.2195163 .0355977 -6.17 0.000 -.2892865 -.1497461 1.south │ -1.530743 .4387964 -3.49 0.000 -2.390768 -.6707176 year │ -.0145607 .0057128 -2.55 0.011 -.0257576 -.0033638 │ south#c.year │ 1 │ .0110577 .0054775 2.02 0.044 .000322 .0217933 │ _cons │ -1.066526 .3414275 -3.12 0.002 -1.735711 -.3973401 ─────────────┴──────────────────────────────────────────────────────────────── . estimates store logit
Let us try a fixed-effects model first.
. xtlogit union age grade not_smsa south##c.year, i(id) fe note: multiple positive outcomes within groups encountered. note: 2,744 groups (14,165 obs) omitted because of all positive or all negative outcomes. Iteration 0: log likelihood = -4516.5881 Iteration 1: log likelihood = -4510.8906 Iteration 2: log likelihood = -4510.888 Iteration 3: log likelihood = -4510.888 Conditional fixed-effects logistic regression Number of obs = 12,035 Group variable: idcode Number of groups = 1,690 Obs per group: min = 2 avg = 7.1 max = 12 LR chi2(6) = 78.60 Log likelihood = -4510.888 Prob > chi2 = 0.0000 ─────────────┬──────────────────────────────────────────────────────────────── union │ Coefficient Std. err. z P>|z| [95% conf. interval] ─────────────┼──────────────────────────────────────────────────────────────── age │ .0710973 .0960536 0.74 0.459 -.1171643 .2593589 grade │ .0816111 .0419074 1.95 0.051 -.0005259 .163748 not_smsa │ .0224809 .1131786 0.20 0.843 -.199345 .2443069 1.south │ -2.856488 .6765694 -4.22 0.000 -4.182539 -1.530436 year │ -.0636853 .0967747 -0.66 0.510 -.2533602 .1259896 │ south#c.year │ 1 │ .0264136 .0083216 3.17 0.002 .0101036 .0427235 ─────────────┴──────────────────────────────────────────────────────────────── . estimates store fixed
Note how we lost 62% of our sample, down from 4434 to 1690 women. The 2744 drop outs are women who didn’t have variation in union membership over time. We will compare the estimates later.
Now we fit a random-effects model:
. xtlogit union age grade not_smsa south##c.year, i(id) re nolog Random-effects logistic regression Number of obs = 26,200 Group variable: idcode Number of groups = 4,434 Random effects u_i ~ Gaussian Obs per group: min = 1 avg = 5.9 max = 12 Integration method: mvaghermite Integration pts. = 12 Wald chi2(6) = 227.46 Log likelihood = -10540.274 Prob > chi2 = 0.0000 ─────────────┬──────────────────────────────────────────────────────────────── union │ Coefficient Std. err. z P>|z| [95% conf. interval] ─────────────┼──────────────────────────────────────────────────────────────── age │ .0156732 .0149895 1.05 0.296 -.0137056 .045052 grade │ .0870851 .0176476 4.93 0.000 .0524965 .1216738 not_smsa │ -.2511884 .0823508 -3.05 0.002 -.4125929 -.0897839 1.south │ -2.839112 .6413116 -4.43 0.000 -4.096059 -1.582164 year │ -.0068604 .0156575 -0.44 0.661 -.0375486 .0238277 │ south#c.year │ 1 │ .0238506 .0079732 2.99 0.003 .0082235 .0394777 │ _cons │ -3.009365 .8414963 -3.58 0.000 -4.658667 -1.360062 ─────────────┼──────────────────────────────────────────────────────────────── /lnsig2u │ 1.749366 .0470017 1.657245 1.841488 ─────────────┼──────────────────────────────────────────────────────────────── sigma_u │ 2.398116 .0563577 2.290162 2.511158 rho │ .6361098 .0108797 .6145307 .6571548 ─────────────┴──────────────────────────────────────────────────────────────── LR test of rho=0: chibar2(01) = 6004.43 Prob >= chibar2 = 0.000 . estimates store random
Here’s a table comparing the estimates for logit, random and
fixed-effects models. We use the equation
option so Stata
can match the estimates.
. estimates table logit random fixed, equation(1) ─────────────┬─────────────────────────────────────── Variable │ logit random fixed ─────────────┼─────────────────────────────────────── #1 │ age │ .02076564 .01567318 .07109727 grade │ .0489121 .08708515 .08161106 not_smsa │ -.21951628 -.25118839 .02248093 │ south │ 1 │ -1.5307427 -2.8391118 -2.8564878 │ year │ -.01456067 -.00686044 -.06368531 │ south#c.year │ 1 │ .01105767 .02385056 .02641356 │ _cons │ -1.0665258 -3.0093648 ─────────────┼─────────────────────────────────────── /lnsig2u │ 1.7493664 ─────────────┴───────────────────────────────────────
The main change is in the coefficient of not_smsa
. You
might think this indicates something wrong with the logit and
random-effects models, but note that only women who have moved
between standard metropolitan statistical areas and other places
contribute to the fixed-effects estimate. It seems reasonable to believe
that these women differ from the rest.
The random-effect coefficients are larger in magnitude than the ordinary logit coefficients. This is almost always the case. Omission of the random effect biases the coefficients towards zero.
The intra-class correlation of 0.636 in the random-effects model indicates a high correlation between a woman’s propensity to be a union member in different years, after controlling for education and residence.
My paper with Elo in the Stata journal shows how this coefficient can be interpreted in terms of an odds ratio, and translated into measures of manifest correlation such as Pearson’s r and Yule’s Q.
These measures can be calculated using xtrho
, a
post-estimation command available from the Stata journal website. In
Stata type search xtrho
, or
net describe st0031, from(https://www.stata-journal.com/software/sj3-1)
.
For the average woman the correlation between actual union membership in
any two years is 0.42 using Pearson’s r and 0.78 using Yule’s Q.
Rodríguez, G. and Elo, I. (2003). “Intra-class correlation in random-effects models for binary data”. The Stata Journal,3(1):32-46. https://journals.sagepub.com/doi/pdf/10.1177/1536867X0300300102
Updated fall 2022